
THUDM/AgentBench · Stale

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

benchmark · Python · Apache-2.0 · Benchmark · Evaluation
Refreshed 4d ago
Stars: 3.5k
Forks: 257
HF Downloads · 30d
Last commit: 3mo ago
Project health: Stale. No commits in 112d.
Production readiness: MVP-ready. Suitable for non-critical production use.
Risk notes: Apache-2.0. Verify license before production use.
AgentHub Score: 73 / 100
Composite score from 6 signals.
Active project
Growth: 98 (A+)
Activity: 50 (C)
Documentation: 50 (C)
Maturity: 87 (A)
Community: 95 (A+)
Production: 58 (C)
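
The listing does not say how the six signals are combined into the composite score. As a rough check only, assuming equal weights (an assumption, not something this page states), the unweighted mean of the six signal scores listed above comes out to the headline 73. The function name `composite_score` is hypothetical and used here purely for illustration.

```python
# Signal scores exactly as listed on this page.
signals = {
    "Growth": 98,
    "Activity": 50,
    "Documentation": 50,
    "Maturity": 87,
    "Community": 95,
    "Production": 58,
}


def composite_score(scores):
    """Unweighted mean of the signal scores.

    Assumed formula for illustration; the page does not document
    how AgentHub actually weights its signals.
    """
    return sum(scores.values()) / len(scores)


print(composite_score(signals))  # 73.0
```

That the equal-weight mean matches the published figure is consistent with a simple average, but it does not rule out a different weighting that happens to give the same result.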
GitHub stars · 90 days: 3.5k (+14.2%)
Commit activity · 52 weeks: Active contributor activity
Practical assessment
Should you use it?

✓ Best for

  • Research and experimentation
  • Prototype development
  • Learning agentic patterns

◎ Strengths

  • Active community
  • Open source
  • Well-documented API

✕ Not ideal for

  • Large-scale deployments without prior validation
  • Teams without AI/ML expertise

⚠ Watch-outs

  • Review changelog before updating
  • Verify license for commercial use
Technical details
What's inside
Language: Python
License: Apache-2.0
Source: GitHub
Open source: Yes
Commercial use
Docs
Demo
Paper

AgentHub Score: 73/100 (Above average)

Alternatives

  • crewai · 26.1k · Multi-Agent · 87
  • autogen · 42.7k · Multi-Agent · 71
  • smolagents · 11.2k · Coding · 84
  • openai-agents-python · 9.4k · Multi-Agent · 81

Recent activity

Latest commit: 3mo ago
Indexed by AgentHub crawler: 4d ago
Monitor for new releases: ongoing
