Explore/benchmark/From Prompt to Process: a Process Taxonomy and Comparative Assessment of Frameworks Supporting AI Software Development Agents
F

Sanderson Oliveira de Macedo/From Prompt to Process: a Process Taxonomy and Comparative Assessment of Frameworks Supporting AI Software Development AgentsUnknown

AI tools for programming are no longer just autocomplete or chat assistants: they organize themselves as development frameworks, with process, roles, artifacts and verification. Recent surveys map agents and LLMs for software engineering, but a study centered on the operational frameworks that turn these capabilities into process is missing. We ran a directed search of primary sources, with a functional inclusion criterion and traction measurement, and selected six frameworks: GitHub Spec Kit, OpenSpec, BMAD Method, Get Shit Done (GSD), Spec Kitty and Reversa. Each attacks AI development through a different path: spec-driven development in full and lightweight variants, agent-driven agile planning, context engineering over the agent, worktree isolation and review, and recovery of operational specifications from legacy systems. Our central contribution is a six-dimension process taxonomy: specification, context, roles, execution, validation and portability, with a scoring rubric that turns it into a replicable instrument. We apply it to the six frameworks and an out-of-sample case, Spec-Flow. Two results stand out. Among frameworks that already adopt some process there is convergence: the isolated prompt loses centrality, and persistent artifacts, work contracts, traceability and human review become mechanisms that reduce ambiguity and coordinate agents. And no framework strongly covers all six dimensions, exposing a structural trade-off between process depth and portability across agents. We also found recurring risks: drift between specification and code, excessive trust in generated artifacts, fragility of community extensions, platform dependence and a lack of benchmarks for the complete process. We close with a research agenda for empirical evaluation, focused on intermediate-quality metrics, context governance, installation security and reproducibility.

benchmark
GitHubCompare
Refreshed 1d ago
OverviewActivity52wAlternativesDocs
Stars0
Forks0
HF Downloads30d
Last commit
Refreshed1d ago
Project healthUnknownNo activity data.
Production readinessResearch / EarlyBest for exploration and prototyping.
Risk notesUnknown licenseVerify license before production use.
AgentHub Score
48 / 100
Composite score from 6 signals. How we score →
Active project
48Score
Growth
40C
Activity
30C
Documentation
70C+
Maturity
45C
Community
42C
Production
58C
GitHub stars · 90 days0 +0.0%
30d90d1y
latest release
Commit activity · 52 weeksActive contributor activity
LowHigh
JunSepDecMarNow
Practical assessment
Should you use it?

✓ Best for

  • Research and experimentation
  • Prototype development
  • Learning agentic patterns

◎ Strengths

  • Active community
  • Open source
  • Well-documented API

✕ Not ideal for

  • Untested at scale without validation
  • Teams without AI/ML expertise

⚠ Watch-outs

  • Review changelog before updating
  • Verify license for commercial use
Technical details
What's inside
Language
License
Sourcearxiv
Open source✗ No
Commercial use
Docs
Demo

AgentHub Score

48
Score 48/100
Below average

Alternatives

C
crewai
26.1k · Multi-Agent
87
A
autogen
42.7k · Multi-Agent
71
S
smolagents
11.2k · Coding
84
O
openai-agents-python
9.4k · Multi-Agent
81
Compare all →

Recent activity

Latest commit —
Indexed by AgentHub crawler1d ago
Monitor for new releasesongoing