Explore/agent app/Reliability and Effectiveness of Autonomous AI Agents in Supply Chain Management
R

Carol Xuan Long, David Simchi-Levi, Feng Zhu, Huangyuan Su, Andre P. Calmon, Flavio P. Calmon/Reliability and Effectiveness of Autonomous AI Agents in Supply Chain ManagementUnknown

This paper studies autonomous generative AI agents in multi-echelon supply chains using the MIT Beer Game. We identify four inference-time levers that shape performance: model selection, policies and guardrails, centralized data sharing, and prompt engineering. Model capability is the dominant factor: an out-of-the-box reasoning model exceeds human-level performance, and optimized reasoning models reduce costs by up to 67% relative to human teams. However, strong average performance masks substantial reliability risks. We introduce agent bullwhip: the amplification of run-to-run decision instability in autonomous multi-echelon systems. A central component is decision bullwhip, the portion of order variability generated by stochastic agent decisions rather than by changes in customer demand. We show that decision instability can amplify both across facilities at a fixed point in time and within the same facility over time, even when the demand path is held fixed. Repeated sampling, a natural test-time remedy, fails to meaningfully reduce this instability, suggesting that reliability requires changing the underlying decision policy rather than merely averaging over model outputs. To address this limitation, we propose a Group Relative Policy Optimization (GRPO)-based reinforcement-learning post-training framework that trains a shared base LLM using system-level supply-chain rewards. Post-training substantially reduces tail events, curtails agent bullwhip, and improves the reliability of autonomous supply-chain agents.

agent app
GitHubCompare
Refreshed 4d ago
OverviewActivity52wAlternativesDocs
Stars0
Forks0
HF Downloads30d
Last commit
Refreshed4d ago
Project healthUnknownNo activity data.
Production readinessResearch / EarlyBest for exploration and prototyping.
Risk notesUnknown licenseVerify license before production use.
AgentHub Score
48 / 100
Composite score from 6 signals. How we score →
Active project
48Score
Growth
40C
Activity
30C
Documentation
70C+
Maturity
45C
Community
42C
Production
58C
GitHub stars · 90 days0 +0.0%
30d90d1y
latest release
Commit activity · 52 weeksActive contributor activity
LowHigh
JunSepDecMarNow
Practical assessment
Should you use it?

✓ Best for

  • Research and experimentation
  • Prototype development
  • Learning agentic patterns

◎ Strengths

  • Active community
  • Open source
  • Well-documented API

✕ Not ideal for

  • Untested at scale without validation
  • Teams without AI/ML expertise

⚠ Watch-outs

  • Review changelog before updating
  • Verify license for commercial use
Technical details
What's inside
Language
License
Sourcearxiv
Open source✗ No
Commercial use
Docs
Demo

AgentHub Score

48
Score 48/100
Below average

Alternatives

C
crewai
26.1k · Multi-Agent
87
A
autogen
42.7k · Multi-Agent
71
S
smolagents
11.2k · Coding
84
O
openai-agents-python
9.4k · Multi-Agent
81
Compare all →

Recent activity

Latest commit —
Indexed by AgentHub crawler4d ago
Monitor for new releasesongoing