Specification-Driven Development Benchmark: Security Knowledge Transition

Oleg Grynets, Andrii Salyk, Vasyl Lyashkevych, Oleh Kaskun, Danyil Zhuravchak

AI-assisted software development is shifting from isolated code completion toward specification-driven generation, where business requirements, technical specifications, and acceptance criteria become operational input for LLM-based development agents. This shift creates a security problem: functional behavior is described explicitly, while security behavior remains implicit, generic, or postponed to post-generation review. As a result, generated systems can satisfy visible functional requirements while failing to preserve authorization rules, ownership boundaries, input validation, token rejection, sensitive data handling, and abuse-case semantics. This paper proposes a security knowledge operationalization approach for AI-assisted specification-driven development, combining two contributions. The first is a Multilayer Specification Security Model that represents security knowledge through traceable relations between system entities, threats, risks, requirements, implementation rules, controls, verification scenarios, and evidence. The second is a Security Knowledge Transition Method that transforms business and technical specifications into a validated security-enriched generation contract. We evaluate the approach through two empirical studies: a hidden-oracle study assessing whether an LLM-based pipeline can derive a structured security model from system context, and a backend generation study under three conditions (no explicit security requirements, ASVS-conditioned generation, and Multilayer Security Model conditioning). On a hidden 221-test black-box API suite, modal failures decreased from 50 in the baseline to 42 with ASVS conditioning and 36 with Multilayer Security Model conditioning, with the strongest improvements in application-specific categories such as business logic and admin safety.
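To put the reported failure counts in proportion, the following minimal sketch converts them into failure rates and relative reductions. It assumes that each modal failure corresponds to one failing test out of the 221 in the suite, which the abstract does not state explicitly. The names `TOTAL_TESTS`, `modal_failures`, `failure_rate`, and `relative_reduction` are illustrative and do not come from the paper.

```python
# Counts taken from the abstract; the one-failure-per-test reading is an assumption.
TOTAL_TESTS = 221
modal_failures = {
    "baseline (no explicit security requirements)": 50,
    "ASVS-conditioned": 42,
    "Multilayer Security Model": 36,
}


def failure_rate(failures: int, total: int = TOTAL_TESTS) -> float:
    """Fraction of the suite that failed."""
    return failures / total


def relative_reduction(failures: int, baseline: int) -> float:
    """Fractional drop in failures relative to the baseline condition."""
    return (baseline - failures) / baseline


baseline_failures = modal_failures["baseline (no explicit security requirements)"]

for condition, failures in modal_failures.items():
    rate = failure_rate(failures)
    drop = relative_reduction(failures, baseline_failures)
    print(f"{condition}: {failures}/{TOTAL_TESTS} failed "
          f"({rate:.1%}), {drop:.0%} fewer failures than baseline")
```

Under that assumption, the baseline fails roughly 22.6% of the suite, ASVS conditioning about 19.0%, and Multilayer Security Model conditioning about 16.3%, which corresponds to 16% and 28% fewer failures than the baseline, respectively.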


Project health: Unknown. No activity data.

Production readiness: Research / Early. Best for exploration and prototyping.

Risk notes: Unknown license. Verify license before production use.

AgentHub Score

48 / 100

Composite score from 6 signals.

Growth: 40 (C)
Activity: 30 (C)
Documentation: 70 (C+)
Maturity: 45 (C)
Community: 42 (C)
Production: 58 (C)


Practical assessment

Should you use it?

✓ Best for

Research and experimentation
Prototype development
Learning agentic patterns

◎ Strengths

Active community
Open source
Well-documented API

✕ Not ideal for

Untested at scale without validation
Teams without AI/ML expertise

⚠ Watch-outs

Review changelog before updating
Verify license for commercial use

Technical details

What's inside

Language: —

License: —

Source: arXiv

Open source: No

Commercial use: —

Docs: —

Demo: —

Paper: arXiv


