Beyond One-shot: AI Agents for Learning in Field Experiments — AI agent app

Junjie Luo, Ritu Agarwal, Gordon Gao

Organizations routinely run A/B tests, yet the data generated by one experiment is underused when designing subsequent interventions. Significant barriers stand in the way of extracting actionable knowledge from prior experimental data. We study whether tool-augmented agentic AI can automatically learn from experimental data to generate new interventions for subsequent experiments. Through two-stage field experiments in healthcare prescription messaging (693,139 patient visits), we compare a Human + Chatbot method (Stage 1: behavioral experts and conversational AI co-designing 13 message variants, 444,691 patient visits) against a Tool-Augmented Agentic AI method (Stage 2: AI autonomously extracting principles from Stage 1 data to generate 17 new variants, 248,448 patient visits). The Agentic AI method, equipped with analytical tools, structured Data-Information-Knowledge-Wisdom (DIKW) reasoning agents, and transparent evidence chains, produces superior interventions: the best AI-generated message achieved a 69.8% click-through rate (CTR), 6.5 percentage points above baseline. Critically, our results suggest that the value comes from domain-specific experimental data, not from general reasoning ability: frontier LLMs operating without experimental data failed to predict which interventions would succeed. The field experiments also revealed that general-purpose behavioral theories used for intervention design do not extend uniformly to specific healthcare contexts, which motivates an agentic AI approach to theory audits at field-experiment scale. Our research shows that tool-augmented AI can learn from experimental data and generate improved domain-relevant interventions, transforming behavioral experimentation from one-shot evaluation into a scalable system for cumulative design learning.
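The headline figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is not from the paper; it only restates the quoted numbers and derives two values the abstract does not state directly: the baseline CTR implied by "69.8%, 6.5 percentage points above baseline" (63.3%), and the corresponding relative lift. The function names are hypothetical, and it assumes the quoted lift is measured against a single baseline arm.

```python
# Figures quoted in the abstract above.
STAGE1_VISITS = 444_691   # Human + Chatbot stage
STAGE2_VISITS = 248_448   # Tool-Augmented Agentic AI stage
TOTAL_VISITS = 693_139
BEST_CTR_PCT = 69.8       # best AI-generated message, in percent
LIFT_PP = 6.5             # lift over baseline, in percentage points


def implied_baseline_pct(best_ctr_pct: float, lift_pp: float) -> float:
    """Baseline CTR (percent) implied by a CTR and its percentage-point lift."""
    return round(best_ctr_pct - lift_pp, 1)


def relative_lift(best_ctr_pct: float, baseline_pct: float) -> float:
    """Lift as a fraction of the baseline (derived here, not quoted in the abstract)."""
    return (best_ctr_pct - baseline_pct) / baseline_pct


if __name__ == "__main__":
    # The two stage counts should add up to the reported total.
    assert STAGE1_VISITS + STAGE2_VISITS == TOTAL_VISITS
    baseline = implied_baseline_pct(BEST_CTR_PCT, LIFT_PP)
    print(f"implied baseline CTR: {baseline}%")
    print(f"relative lift: {relative_lift(BEST_CTR_PCT, baseline):.1%}")
```

Keeping percentage points and relative lift as separate functions avoids the common mistake of reading "+6.5 percentage points" as a 6.5% relative improvement.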


Stars: 0

Forks: 0

HF Downloads (30d): n/a

Last commit: n/a

Refreshed: 14h ago

Project health: Unknown. No activity data.

Production readiness: Research / Early. Best for exploration and prototyping.

Risk notes: Unknown license. Verify license before production use.

AgentHub Score

48 / 100

Composite score from 6 signals.

Active project

Growth: 40 (C)
Activity: 30 (C)
Documentation: 70 (C+)
Maturity: 45 (C)
Community: 42 (C)
Production: 58 (C)

GitHub stars · 90 days: 0 (+0.0%)

Commit activity · 52 weeks: Active contributor activity

Practical assessment

Should you use it?

✓ Best for

Research and experimentation
Prototype development
Learning agentic patterns

◎ Strengths

Active community
Open source
Well-documented API

✕ Not ideal for

Use at scale without prior validation
Teams without AI/ML expertise

⚠ Watch-outs

Review changelog before updating
Verify license for commercial use

Technical details

What's inside

Language: n/a

License: n/a

Source: arXiv

Open source: No

Commercial use: n/a

Docs: n/a

Demo: n/a

Paper: arXiv

AgentHub Score

Score 48/100

Below average

Alternatives

crewai

26.1k · Multi-Agent

autogen

42.7k · Multi-Agent

smolagents

11.2k · Coding

openai-agents-python

9.4k · Multi-Agent


Recent activity

Latest commit: n/a

Indexed by AgentHub crawler: 14h ago

Monitor for new releases: ongoing