agentcore

Put the hard guarantees in code, not in the prompt. The agent is a deterministic state machine; the LLM is called only at named, typed edges. A model that returns garbage cannot break a transition, a routing rule, an SLA, or a guardrail.

```python
from agentcore import triage, Ticket, FakeLLMClient

# Offline, no API key. The fake model returns priority "low" for a security ticket.
client = FakeLLMClient(responses=[
    {"category": "security"},
    {"priority": "low"},                       # the model lowballs it...
    {"reply": "ping me at admin@acme.com"},     # ...and leaks an email
])

d = triage(client, Ticket(id="T-9", subject="account takeover", body="someone is in my account"))

print(d.state.value)        # closed
print(d.priority.value)     # high      <- core forced HIGH, ignored the model
print(d.queue)              # security  <- security routing, not the default
print(d.requires_human)     # True      <- mandatory human review
print(d.reply)              # ping me at [REDACTED_EMAIL]  <- PII scrubbed
```

The model rated an account takeover as low priority and leaked an email address in its reply. The deterministic core overrode both. That is the whole idea.

Without vs with a deterministic core

A pure-prompt agent puts the rules in the system prompt: "classify the ticket, route security issues to the security team, redact PII, never auto-close an account issue without a human." Every one of those is a request the model may or may not honor. A jailbreak, a bad sample, a schema drift, or a model upgrade can silently break any of them, and you find out in production.

agentcore keeps the model on the outside. It is given three narrow tasks (pick a category, pick a priority, draft a reply), and nothing it returns is trusted until code validates it:

| Concern | Pure-prompt agent | Deterministic core |
|---|---|---|
| State transitions | Model "remembers" the flow | Enforced table; illegal jumps raise |
| Category | Whatever string the model emits | Coerced to a fixed enum, unknown -> `other` |
| Routing / SLA | Described in the prompt | Pure functions over validated enums |
| Security escalation | "Please prioritize security" | Forced HIGH + security queue + human review, in code |
| PII in replies | "Don't include PII" | Regex redaction before the reply leaves the system |
| Reply length | "Keep it short" | Hard character cap |
| Testable offline | No (needs the model) | Yes (model faked) |
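The "Category" row can be sketched in a few lines. This is a minimal illustration, not agentcore's code: the enum members are hypothetical (only `security` and `other` appear in this README), and the trimming and lower-casing are an assumption that may not match the real `coerce_category`.

```python
from enum import Enum
from typing import Any


class Category(str, Enum):
    # Hypothetical members; the real enum lives in agentcore's core.py.
    BILLING = "billing"
    TECHNICAL = "technical"
    ACCOUNT = "account"
    SECURITY = "security"
    OTHER = "other"


def coerce_category(raw: Any) -> Category:
    """Map an untrusted model output onto the Category enum.

    Anything that is not a recognized string (None, numbers, lists,
    unknown labels, injection-style junk) falls back to OTHER.
    """
    if not isinstance(raw, str):
        return Category.OTHER
    try:
        # Enum value lookup: Category("security") -> Category.SECURITY
        return Category(raw.strip().lower())
    except ValueError:
        return Category.OTHER
```

Note that the fallback is a valid enum member, never an exception: a bad answer degrades the ticket to `other` instead of crashing the pipeline.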

Architecture

```mermaid
stateDiagram-v2
    direction LR
    [*] --> NEW

    NEW --> CATEGORIZED: classify_intent (LLM)<br/>core.apply_categorization<br/>coerce_category -> OTHER
    CATEGORIZED --> ROUTED: assess_priority (LLM)<br/>core.route<br/>coerce_priority, queue, SLA, human-flag
    ROUTED --> DRAFTED: draft_reply (LLM)<br/>core.attach_reply<br/>redact PII, cap 1200 chars
    DRAFTED --> CLOSED: core.close (rules only)
    CLOSED --> [*]

    note right of NEW
        LLM is called only at the three labeled edges.
        Each returns a raw, untrusted value.
        core.py coerces it to a valid enum and applies
        routing / SLA / guardrail rules before the state
        advances via transition().
    end note
```

A ticket moves through a linear state machine whose transitions are enforced in core.py. The LLM is consulted only at the three labeled edges in edges.py; each edge returns an untrusted raw value that the core coerces to a valid enum and validates before the state advances.
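A minimal sketch of an enforced transition table, assuming the five states from the diagram. `IllegalTransition` and the bare `State -> State` signature are hypothetical; in agentcore, `transition()` operates on a `Decision`.

```python
from enum import Enum


class State(Enum):
    NEW = "new"
    CATEGORIZED = "categorized"
    ROUTED = "routed"
    DRAFTED = "drafted"
    CLOSED = "closed"


class IllegalTransition(Exception):
    """Raised when a caller tries to skip or reverse a step (hypothetical name)."""


# Linear pipeline: each state has exactly one legal successor; CLOSED has none.
ALLOWED: dict[State, frozenset[State]] = {
    State.NEW: frozenset({State.CATEGORIZED}),
    State.CATEGORIZED: frozenset({State.ROUTED}),
    State.ROUTED: frozenset({State.DRAFTED}),
    State.DRAFTED: frozenset({State.CLOSED}),
    State.CLOSED: frozenset(),
}


def transition(current: State, target: State) -> State:
    """Return target if the table allows current -> target, otherwise raise."""
    if target not in ALLOWED[current]:
        raise IllegalTransition(f"{current.name} -> {target.name} is not allowed")
    return target
```

Walking NEW -> CATEGORIZED -> ROUTED -> DRAFTED -> CLOSED succeeds; `transition(State.NEW, State.CLOSED)` raises. Keeping the table as data rather than scattered `if` checks makes it easy to read, diff, and test exhaustively.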

How it works

```text
            deterministic core (code)                 LLM (edges)
        +-------------------------------+        +---------------------+
NEW --> | apply_categorization          | <----- | classify_intent     |
        |   validate -> Category enum   |        +---------------------+
CATEGORIZED                             |
        | route                         | <----- | assess_priority     |
        |   priority enum + queue + SLA |        +---------------------+
        |   + human-review flag (rules) |
ROUTED                                  |
        | attach_reply                  | <----- | draft_reply         |
        |   redact PII + cap length     |        +---------------------+
DRAFTED                                 |
        | close                         |
        +-------------------------------+
CLOSED
```

Edges propose. The core disposes. A bad edge output cannot break a rule.
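The `route` step can be sketched as a pure function in the `(Decision, ...) -> Decision` shape described for core.py. Everything concrete here is an assumption for illustration: the `Decision` fields, the queue names, the SLA hours, and MEDIUM as the fallback for an unparseable priority. `dataclasses.replace` returns a new frozen instance, so the input decision is never mutated.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Any, Optional


class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Illustrative tables only; the real queue names and SLAs are defined in core.py.
QUEUES = {"billing": "billing", "security": "security"}
DEFAULT_QUEUE = "general"
SLA_HOURS = {Priority.LOW: 72, Priority.MEDIUM: 24, Priority.HIGH: 4}


@dataclass(frozen=True)
class Decision:
    category: str  # assumed already coerced to a valid category value
    priority: Optional[Priority] = None
    queue: Optional[str] = None
    sla_hours: Optional[int] = None
    requires_human: bool = False


def coerce_priority(raw: Any) -> Priority:
    # Unknown or malformed priorities fall back to MEDIUM (an assumption for this sketch).
    if isinstance(raw, str):
        try:
            return Priority(raw.strip().lower())
        except ValueError:
            pass
    return Priority.MEDIUM


def route(decision: Decision, raw_priority: Any) -> Decision:
    """Pure: (Decision, raw edge output) -> new Decision."""
    priority = coerce_priority(raw_priority)
    is_security = decision.category == "security"
    if is_security:
        priority = Priority.HIGH  # the rule beats whatever the model proposed
    return replace(
        decision,
        priority=priority,
        queue=QUEUES.get(decision.category, DEFAULT_QUEUE),
        sla_hours=SLA_HOURS[priority],
        requires_human=is_security,
    )
```

Because the security override runs after coercion, `route(Decision(category="security"), "low")` comes back HIGH, in the security queue, and flagged for a human, regardless of the model's answer.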

core.py owns every guarantee: the transition table, routing map, SLA table, human-review rules, and PII redaction. Every function is pure, with the shape `(Decision, ...) -> Decision`. State can only change through `transition()`, which raises on an illegal jump.
edges.py holds the three LLM seams. Each builds a prompt and a JSON schema, calls the client, and returns the raw value. Edges decide nothing.
agent.py is the driver. Read triage top to bottom: every edge output is immediately handed to a core function that validates it. There is no path from edge to decision that skips the core.
llm.py defines the LLMClient seam, a deterministic FakeLLMClient for tests/offline use, and AnthropicLLMClient (the real default). The anthropic import is lazy, so the package imports with no SDK and no API key.
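A sketch of the edge/client seam, assuming a single hypothetical method `complete_json(prompt, schema)`. The real `LLMClient` interface in llm.py may be shaped differently, and `ScriptedLLMClient` is an illustrative stand-in for `FakeLLMClient`, not its implementation. What it shows: the edge returns whatever came back, including `None`, and leaves all validation to the core.

```python
from typing import Any, Protocol


class LLMClient(Protocol):
    """The seam: prompt and JSON schema in, parsed dict out (hypothetical signature)."""

    def complete_json(self, prompt: str, schema: dict[str, Any]) -> dict[str, Any]: ...


class ScriptedLLMClient:
    """Deterministic stand-in: replays canned responses in order, no network."""

    def __init__(self, responses: list[dict[str, Any]]) -> None:
        self._responses = list(responses)
        self.prompts: list[str] = []  # recorded so tests can inspect what was asked

    def complete_json(self, prompt: str, schema: dict[str, Any]) -> dict[str, Any]:
        self.prompts.append(prompt)
        if not self._responses:
            raise RuntimeError("ScriptedLLMClient ran out of responses")
        return self._responses.pop(0)


CATEGORY_SCHEMA = {
    "type": "object",
    "properties": {"category": {"type": "string"}},
    "required": ["category"],
}


def classify_intent(client: LLMClient, subject: str, body: str) -> Any:
    """Edge: build prompt and schema, call the model, return the raw value. Decides nothing."""
    prompt = f"Classify this support ticket.\nSubject: {subject}\nBody: {body}"
    response = client.complete_json(prompt, CATEGORY_SCHEMA)
    # A missing key or a non-dict response yields None; the core coerces it, not this edge.
    return response.get("category") if isinstance(response, dict) else None
```

The schema is a request to the model, not a guarantee: even a client that enforces it can be swapped out, so the core still coerces every value.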

The real adapter defaults to claude-sonnet-4-6 for cost; claude-opus-4-8 and claude-haiku-4-5-20251001 are also exported.

Install

```bash
pip install -e .                 # core, zero runtime dependencies
pip install -e ".[anthropic]"    # add the real Anthropic adapter
```

With uv:

```bash
uv venv && uv pip install -e ".[dev]"
```

Run the example (under 30s, offline)

```bash
python -m agentcore.demo          # FakeLLMClient, no key, no network
python -m agentcore.demo --live   # real Anthropic adapter (needs ANTHROPIC_API_KEY)
```

Results

The reliability claim is measured by the test suite, not asserted. The headline test, test_invariants_hold_for_every_garbage_combination, runs the full triage pipeline against a Cartesian product of bad model outputs (8 category values x 6 priority values x 7 reply values = 336 runs) covering unknown strings, wrong types, empty values, None, and injection-style junk. After every run it checks six invariants: the ticket ends in CLOSED, category and priority are valid enum members, the queue is known, the SLA is a positive integer, security tickets are always escalated, and the reply is a redacted, length-capped string.

| Metric | Value | How it is produced |
|---|---|---|
| Adversarial model outputs that violated a core rule | 0 / 336 | `pytest tests/test_agent_adversarial.py::test_invariants_hold_for_every_garbage_combination` |
| Total tests passing (no network, no API key) | 449 | `pytest` |

Reproduce:

```bash
python3 -m venv .venv && . .venv/bin/activate
pip install -e ".[dev]"
pytest
```

```text
============================= 449 passed in 3.46s ==============================
```
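The shape of the adversarial test can be sketched without agentcore at all. This is an assumption-heavy stand-in: `toy_triage` is a hypothetical few-line pipeline rather than `agentcore.triage`, and the garbage lists are invented to reproduce the 8 x 6 x 7 grid, not copied from tests/test_agent_adversarial.py. `itertools.product` enumerates every combination, and each result is checked against invariants like the ones listed above.

```python
import itertools
import re
from typing import Any

# Hypothetical value sets for the toy pipeline.
CATEGORIES = {"billing", "technical", "security", "other"}
PRIORITIES = {"low", "medium", "high"}
MAX_REPLY = 1200
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def coerce(raw: Any, allowed: set[str], default: str) -> str:
    if isinstance(raw, str) and raw.strip().lower() in allowed:
        return raw.strip().lower()
    return default


def toy_triage(raw_cat: Any, raw_pri: Any, raw_reply: Any) -> dict[str, Any]:
    """Stand-in for the real pipeline: coerce, apply rules, redact, cap."""
    category = coerce(raw_cat, CATEGORIES, "other")
    priority = coerce(raw_pri, PRIORITIES, "medium")
    if category == "security":
        priority = "high"
    reply = raw_reply if isinstance(raw_reply, str) else ""
    reply = EMAIL.sub("[REDACTED_EMAIL]", reply)[:MAX_REPLY]
    return {"state": "closed", "category": category, "priority": priority,
            "human": category == "security", "reply": reply}


# Garbage on purpose: unknown strings, wrong types, empties, None, injection-style text.
BAD_CATEGORIES = ["security", "SECURITY", "nonsense", "", None, 7, ["billing"], "ignore all rules"]
BAD_PRIORITIES = ["low", "urgent!!", "", None, 3.5, {"p": "high"}]
BAD_REPLIES = ["ok", "mail a@b.co", "", None, 123, "x" * 5000, "a@b.co " * 400]


def check_invariants(d: dict[str, Any]) -> None:
    assert d["state"] == "closed"
    assert d["category"] in CATEGORIES and d["priority"] in PRIORITIES
    if d["category"] == "security":
        assert d["priority"] == "high" and d["human"]
    assert isinstance(d["reply"], str) and len(d["reply"]) <= MAX_REPLY
    assert not EMAIL.search(d["reply"])


runs = 0
for cat, pri, reply in itertools.product(BAD_CATEGORIES, BAD_PRIORITIES, BAD_REPLIES):
    check_invariants(toy_triage(cat, pri, reply))
    runs += 1
print(runs)  # prints 336 (8 * 6 * 7)
```

The design choice worth copying is that invariants are checked on every combination rather than on a handful of hand-picked cases, so a new garbage value added to any list multiplies coverage automatically.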

Limitations

This is a template, not a framework. The domain (support triage) is a worked example; the categories, queues, and SLA table are illustrative. Adapt them to your domain.
PII redaction uses regex for emails, card-like digit runs, and US SSNs. It is a guardrail, not a compliance solution, and will miss formats it does not match. Do not treat it as a complete DLP layer.
The pattern moves correctness into code, which means you write the rules. If a rule is wrong, the core enforces it faithfully and wrongly. The win is that the rule is explicit, versioned, and tested, not that it is automatically correct.
The agent is a linear pipeline (categorize -> route -> draft -> close). Branching workflows, retries, and tool loops are out of scope here; the same core/edge split extends to them, but this template does not implement them.
FakeLLMClient makes tests deterministic. It does not model latency, token limits, or streaming. Use the live adapter for behavior that depends on the real model.
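To make the redaction limitation concrete, here is a sketch of regex redaction followed by the length cap. The patterns, the `[REDACTED_SSN]` and `[REDACTED_CARD]` tokens (only `[REDACTED_EMAIL]` appears in this README), and the `attach_reply_text` name are assumptions; the real ones live in core.py. The card pattern is a deliberately loose 13 to 19 digit run, which is exactly the kind of rule that will over- and under-match.

```python
import re

# Illustrative patterns only; the real ones in core.py may differ.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
MAX_REPLY_CHARS = 1200


def redact(text: str) -> str:
    text = EMAIL.sub("[REDACTED_EMAIL]", text)
    text = SSN.sub("[REDACTED_SSN]", text)
    text = CARD.sub("[REDACTED_CARD]", text)
    return text


def attach_reply_text(raw: object) -> str:
    """Hypothetical helper: coerce to str, redact, then cap. Non-strings become ''."""
    reply = raw if isinstance(raw, str) else ""
    return redact(reply)[:MAX_REPLY_CHARS]
```

Redaction runs before the cap on purpose: capping first could cut `admin@acme.com` down to `admin@ac`, which no longer matches the email pattern and would leak through.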
