Detect runaway exchange loops in any token economy — game balance, AI spend, or reward-function auditing.
Any system with exchange rules (tokens, compute budgets, reward points, or in-game currency) can develop runaway loops that its designers never intended. A profitable loop exists exactly when the log-weighted exchange graph contains a negative cycle, and once it exists someone will eventually exploit it; the only question is whether you find it first.
Game economies: A crafting loop that converts gold → silver → gems → gold at a net gain of 24x will be found by players within hours of launch — not by QA. Manual balance spreadsheets don't scale. Playtesting can't enumerate all cycles.
AI token budgets: An analytics pipeline discovers that re-expanding summaries and re-scoring them earns more "quality credit" than it costs in tokens — so it loops. By month-end the pipeline consumed 26x its monthly budget. 78% of teams have no per-workflow token alert; they see the overrun on the billing page, not in a dashboard.
Reward-function auditing: An RL agent finds a sequence of actions where each step earns more reward than it costs — a classic reward-hacking loop. balancelab encodes the reward structure as an exchange graph and detects the profitable cycle before the agent finds it.
balancelab treats any set of exchange rules as a directed graph, applies Bellman-Ford on log-weighted edges, and produces an exploit report with exact gain ratios — before launch or before your next billing cycle.
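Why log-weights work: if a rule gives `target_qty` of one item for `source_qty` of another, its rate is r = target_qty / source_qty, and a cycle's gain ratio is the product of its rates. Taking w = -ln r (the sign argument is the same for any log base) turns "product above 1" into "sum below 0", which is the negative-cycle condition Bellman-Ford detects. For the gold → silver → gems → gold loop:

$$
G = \prod_{i} r_i > 1 \iff \sum_{i} w_i = -\sum_{i} \ln r_i < 0,
\qquad
G = 3 \cdot 2 \cdot 4 = 24 \;\Longrightarrow\; \sum_{i} w_i = -\ln 24 \approx -3.18
$$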
```bash
balancelab scan --format json  # CI-friendly, fails on exploits
```
Model a multi-workflow AI pipeline as an economy where each model call costs tokens and each quality-score credit refills the budget. balancelab finds the loop before it inflates your bill 26×.
```python
from balancelab import EconomyGraph, EconomyRule, ExploitFinder, recommend_fixes

graph = EconomyGraph()
# Analytics pipeline: 200 budget tokens buy one quality_score credit,
# which redeems for 3500 new budget tokens (3500 / 200 = 17.5x per loop)
graph.add_rule(EconomyRule("budget_tokens", "quality_score", 200.0, 1.0, rule_id="score"))
graph.add_rule(EconomyRule("quality_score", "budget_tokens", 1.0, 3500.0, rule_id="redeem"))

report = ExploitFinder().find_exploits(graph)
# → ExploitReport(exploits=[ExploitPath(path=[budget_tokens → quality_score → budget_tokens], gain_ratio=17.5)])

for fix in recommend_fixes(report):
    print(fix.fix_type, fix.description)  # rate_cap: cap the redeem edge
```

See `examples/ai_token_budget_monitor.py` for a full walkthrough including budget projection and sensitivity analysis.
A crafting loop gold → silver → gems → gold at 24x gain will be found by players within hours of launch. balancelab encodes your exchange rules and proves whether arbitrage is possible — before it ships.
See examples/demo.py for the classic game economy example.
```mermaid
flowchart LR
    A[Define EconomyRule\nsource_item → target_item\nat source_qty:target_qty] --> B[Build EconomyGraph\ndirected exchange graph]
    B --> C[ExploitFinder\nBellman-Ford on\nlog-weight edges]
    C --> D{Negative cycle\ndetected?}
    D -->|Yes| E[ExploitPath\ngain_ratio > 1.0]
    D -->|No| F[Economy balanced\ntotal_found = 0]
    E --> G[ExploitReport\nall cycles ranked\nby gain_ratio]
```
Core primitives:

- `EconomyRule`: an immutable, content-addressed exchange (give `source_qty` of `source_item`, receive `target_qty` of `target_item`). ID = SHA-256[:16] of the rule parameters, so the same rule always produces the same ID.
- `EconomyGraph`: a directed graph of `EconomyRule`s. Supports neighbor traversal and serialization.
- `ExploitFinder`: converts each exchange rate (`target_qty / source_qty`) to a log-weight (`weight = -log(rate)`). Because a cycle's gain is the product of its rates, a negative cycle in the log-weight graph corresponds to a positive-gain cycle in the economy. Uses Bellman-Ford for O(V·E) detection.
- `ExploitPath`: a single circular trade path with its gain ratio (e.g., 24.0x).
- `ExploitReport`: the full scan result, with item count, rule count, all exploit paths, and timestamp.
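To make the `ExploitFinder` description concrete, here is a minimal, self-contained sketch of Bellman-Ford negative-cycle detection on log-weights. It is not balancelab's implementation: the `Rule` tuple and `find_profitable_cycle` function are hypothetical, parallel rules between the same pair are collapsed to the best rate, the `1e-12` float tolerance is an assumption, and it returns a single cycle rather than a ranked report.

```python
import math
from collections import namedtuple

# Hypothetical stand-in for EconomyRule: give source_qty of source,
# receive target_qty of target.
Rule = namedtuple("Rule", "source target source_qty target_qty")


def find_profitable_cycle(rules):
    """Return (cycle, gain_ratio) for one positive-gain cycle, or None.

    Bellman-Ford negative-cycle detection on log-weights w = -ln(rate),
    where rate = target_qty / source_qty.
    """
    # Collapse parallel rules between the same pair to the best rate.
    best = {}
    for r in rules:
        rate = r.target_qty / r.source_qty
        key = (r.source, r.target)
        best[key] = max(rate, best.get(key, 0.0))
    if not best:
        return None

    nodes = sorted({u for u, _ in best} | {v for _, v in best})
    edges = [(u, v, -math.log(rate)) for (u, v), rate in best.items()]

    # dist = 0 everywhere acts like a virtual source joined to every node,
    # so a cycle anywhere in the graph can be detected.
    dist = {n: 0.0 for n in nodes}
    pred = {n: None for n in nodes}

    last_updated = None
    for _ in range(len(nodes)):
        last_updated = None
        for u, v, w in edges:
            if dist[u] + w < dist[v] - 1e-12:  # tolerance for float noise
                dist[v] = dist[u] + w
                pred[v] = u
                last_updated = v
        if last_updated is None:
            return None  # converged: no negative cycle, economy is balanced

    # Still relaxing after |V| passes: a negative cycle exists. Step back
    # |V| times along predecessors to make sure we are standing on it.
    node = last_updated
    for _ in range(len(nodes)):
        node = pred[node]

    cycle = [node]
    cur = pred[node]
    while cur != node:
        cycle.append(cur)
        cur = pred[cur]
    cycle.append(node)
    cycle.reverse()  # predecessors were collected backwards

    gain = math.prod(best[(a, b)] for a, b in zip(cycle, cycle[1:]))
    return cycle, gain


rules = [
    Rule("gold", "silver", 1.0, 3.0),
    Rule("silver", "gems", 1.0, 2.0),
    Rule("gems", "gold", 1.0, 4.0),
]
print(find_profitable_cycle(rules))  # (['gold', 'silver', 'gems', 'gold'], 24.0)
```

Starting every distance at zero avoids choosing a source node, so a loop in a disconnected corner of the economy is still caught.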
Exchange rules and exploit reports are stored in a local SQLite database (`.balancelab/economy.db` by default). No server required.
| Feature | Details |
|---|---|
| Graph-based exploit detection | Bellman-Ford on the log-weight graph detects profitable (negative-weight) cycles |
| Content-addressed rules | Same exchange always produces the same ID — no duplicates |
| Gain ratio ranking | Every exploit path shows exact multiplier (e.g., 24.0x) |
| Offline / local-first | Single SQLite file, no server required |
| CI exit code | balancelab scan returns non-zero if exploits found |
| JSON output | Machine-readable output for downstream automation |
| Markdown output | Ready-to-paste GitHub PR comment |
| FastAPI REST server | /rule, /rules, /scan, /reports, /health endpoints |
| MCP server | Model Context Protocol integration for Claude and other agents |
| OpenAI tool spec | tools/openai-tools.json for GPT function calling |
| 45 tests | Comprehensive test suite covering all layers |
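The "Content-addressed rules" idea (ID = SHA-256[:16] of the rule parameters) can be sketched in a few lines. This is a hypothetical illustration: `rule_content_id` and its sorted-key JSON serialization are assumptions, so the IDs it produces will not match the ones balancelab stores.

```python
import hashlib
import json


def rule_content_id(source_item, target_item, source_qty, target_qty):
    """Hypothetical content-addressed ID: first 16 hex chars of SHA-256.

    The canonical serialization (sorted-key JSON, quantities as floats) is an
    assumption; balancelab's exact byte layout may differ.
    """
    payload = json.dumps(
        {
            "source_item": source_item,
            "target_item": target_item,
            "source_qty": float(source_qty),
            "target_qty": float(target_qty),
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


id_a = rule_content_id("gold", "silver", 1.0, 3.0)
id_b = rule_content_id("gold", "silver", 1, 3)    # same exchange, int quantities
id_c = rule_content_id("gold", "silver", 1.0, 4.0)
print(id_a == id_b, id_a == id_c)  # True False
```

Canonicalizing before hashing (fixed key order, quantities normalized to floats) is what makes "same exchange, same ID" hold regardless of how the caller spelled the numbers.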
```bash
pip install balancelab
```

```python
from balancelab.economy import EconomyRule, EconomyGraph, ExploitFinder
from balancelab.report import print_report

# Define your economy's exchange rules
graph = EconomyGraph()
graph.add_rule(EconomyRule("gold", "silver", 1.0, 3.0, rule_id="mint"))
graph.add_rule(EconomyRule("silver", "gems", 1.0, 2.0, rule_id="jeweler"))
graph.add_rule(EconomyRule("gems", "gold", 1.0, 4.0, rule_id="trader"))

# Find exploits
finder = ExploitFinder()
report = finder.find_exploits(graph)

# Display results
print_report(report)
# Exploit Report (id: a3f8b2c1d4e5f6a7)
# Items: 3  Rules: 3
# Exploits found: 1
# ┌──────────────────┬─────────────────────────────┬────────────┐
# │ ID               │ Path                        │ Gain Ratio │
# ├──────────────────┼─────────────────────────────┼────────────┤
# │ 4d7e9c2a1b8f3e6a │ gold → silver → gems → gold │     24.00x │
# └──────────────────┴─────────────────────────────┴────────────┘
```

```
balancelab [--db PATH] COMMAND [OPTIONS]
```

| Command | Description | Key options |
|---|---|---|
| `add SOURCE TARGET SRC_QTY TGT_QTY` | Add an exchange rule | `--rule-id LABEL`, `--db PATH` |
| `scan` | Find exploits in stored rules | `--format {rich,json}`, `--db PATH` |
| `report REPORT_ID` | Show a specific exploit report | `--format {rich,json}`, `--db PATH` |
| `log` | List all exploit reports | `--db PATH` |
| `status` | Show rule count and last scan | `--db PATH` |
| `simulate GRAPH_FILE` | Simulate economy from a JSON graph file | `--steps N`, `--strategy {greedy,balanced,exploit}`, `--format {rich,json}` |
| `fixes` | Show fix recommendations for the latest exploit report | `--report-id ID`, `--db PATH` |
Global options:

| Option | Default |
|---|---|
| `--db PATH` | `.balancelab/economy.db` |
Examples:

```bash
# Add exchange rules
balancelab add gold silver 1.0 3.0 --rule-id mint
balancelab add silver gems 1.0 2.0 --rule-id jeweler
balancelab add gems gold 1.0 4.0 --rule-id trader

# Scan for exploits
balancelab scan

# Machine-readable output (for CI)
balancelab scan --format json

# Review previous scans
balancelab log
```

balancelab ships a Model Context Protocol server. Add it to Claude Desktop in `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "balancelab": {
      "command": "balancelab-mcp"
    }
  }
}
```

Available MCP tools: `add_rule`, `scan_economy`, `list_reports`.

Install with MCP support: `pip install "balancelab[mcp]"`
You can also find balancelab on Smithery for one-click MCP installation.
balancelab ships a FastAPI server exposing all core operations over HTTP.
```bash
pip install "balancelab[api]"
uvicorn balancelab.api:app --reload
```

Available endpoints:

| Method | Path | Description |
|---|---|---|
| `POST` | `/rule` | Add an exchange rule |
| `GET` | `/rules` | List all rules |
| `POST` | `/scan` | Run exploit scan |
| `GET` | `/reports` | List all reports |
| `GET` | `/health` | Health check |
The full OpenAPI spec is in openapi.yaml.
Use tools/openai-tools.json for OpenAI function calling:
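```python
import json
import openai

# Load the balancelab tool definitions for function calling
with open("tools/openai-tools.json") as f:
    tools = json.load(f)

response = openai.chat.completions.create(
    model="gpt-4o",
    tools=tools,
    messages=[{"role": "user", "content": "Scan my economy for exploits"}],
)
# Tool calls requested by the model come back here; your code executes them.
tool_calls = response.choices[0].message.tool_calls
```

Beyond the core exploit detection primitives, balancelab exposes several higher-level APIs imported directly from the package: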
```python
from balancelab import (
    simulate,
    recommend_fixes,
    sensitivity_analysis,
    critical_path,
    BalanceFix,
    SensitivityResult,
    SimulationResult,
)
```

`simulate` runs a forward simulation of the economy. Strategies: `"greedy"` (apply every rule each step), `"balanced"` (one rule per source item), `"exploit"` (agent actively exploits known cycles).
```python
from balancelab import simulate
from balancelab.economy import EconomyGraph, EconomyRule

graph = EconomyGraph()
graph.add_rule(EconomyRule("gold", "silver", 1.0, 3.0))

result = simulate(graph, {"gold": 100.0, "silver": 0.0}, n_steps=50, agent_strategy="exploit")
print(result.inflation_detected)  # True/False
print(result.final_levels)        # {"gold": ..., "silver": ...}
```

`SimulationResult` fields: `steps`, `final_levels`, `violated_rules`, `inflation_detected`, `inflation_resource`, `summary`.
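The "greedy" strategy can be pictured with a toy loop that applies every rule once per step and converts the whole holding of each source item. This is a hypothetical sketch, not balancelab's `simulate`: `greedy_step`, the full-conversion semantics, and the 10x inflation threshold in `inflated_resources` are all assumptions. It shows how the 24x gold → silver → gems → gold cycle compounds step after step.

```python
def greedy_step(levels, rules):
    """Apply every rule once, in order, converting the full holding of each source."""
    levels = dict(levels)
    for source, target, source_qty, target_qty in rules:
        amount = levels.get(source, 0.0)
        levels[source] = 0.0
        levels[target] = levels.get(target, 0.0) + amount * target_qty / source_qty
    return levels


def inflated_resources(initial, final, threshold=10.0):
    """Items whose level grew by more than `threshold` times (hypothetical rule)."""
    return [
        item
        for item, start in initial.items()
        if start > 0 and final.get(item, 0.0) > threshold * start
    ]


rules = [
    ("gold", "silver", 1.0, 3.0),
    ("silver", "gems", 1.0, 2.0),
    ("gems", "gold", 1.0, 4.0),
]
start = {"gold": 100.0}
levels = start
for _ in range(3):
    levels = greedy_step(levels, rules)

print(levels)                             # {'gold': 1382400.0, 'silver': 0.0, 'gems': 0.0}
print(inflated_resources(start, levels))  # ['gold']
```

Each step multiplies the gold holding by 3 × 2 × 4 = 24, so three steps turn 100 gold into 100 × 24³ = 1,382,400.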
`recommend_fixes` suggests, for each exploit in an `ExploitReport`, the minimum intervention that neutralizes it.
```python
from balancelab import recommend_fixes

# report: an ExploitReport, e.g. from ExploitFinder().find_exploits(graph)
fixes = recommend_fixes(report)
for fix in fixes:
    print(fix.fix_type, fix.target_edge, fix.suggested_value, fix.description)
```

`BalanceFix` fields: `exploit_path`, `fix_type` (`"rate_cap"`, `"cooldown"`, `"daily_limit"`, `"require_prerequisite"`), `target_edge`, `suggested_value`, `description`, `estimated_reduction_pct`.
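For a single cycle, the smallest rate cap that neutralizes it follows directly from the gain ratio: dividing one edge's rate by the cycle's `gain_ratio` brings the product of rates down to exactly 1.0. The helper below is a hypothetical sketch of that arithmetic (`neutralizing_rate_cap` is not a balancelab function), and `recommend_fixes` may pick a different edge or fix type.

```python
def neutralizing_rate_cap(edge_rate: float, gain_ratio: float) -> float:
    """Largest rate for one edge that brings the cycle's gain ratio down to 1.0."""
    if gain_ratio <= 1.0:
        return edge_rate  # cycle already loses value; no cap needed
    return edge_rate / gain_ratio


# AI budget loop: 200 budget_tokens -> 1 quality_score -> 3500 budget_tokens.
gain = 3500.0 / 200.0                          # 17.5x per loop
redeem_cap = neutralizing_rate_cap(3500.0, gain)
print(redeem_cap)  # 200.0 -> redeem at most 200 tokens per quality_score

# Game loop gold -> silver -> gems -> gold (3 * 2 * 4 = 24x): cap the trader edge.
trader_cap = neutralizing_rate_cap(4.0, 3.0 * 2.0 * 4.0)
print(trader_cap)  # about 0.1667 gold per gem
```

After either cap the loop returns exactly what it costs, so repeating it no longer creates value.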
`sensitivity_analysis` ranks all economy nodes by how much they affect overall balance, in descending order of `impact_score`.
```python
from balancelab import sensitivity_analysis

# graph and report: the EconomyGraph and its ExploitReport from a previous scan
results = sensitivity_analysis(graph, report)
for r in results:
    print(r.node_id, r.impact_score, r.recommendation)
```

`SensitivityResult` fields: `node_id`, `node_type` (`"hub"`, `"source_only"`, `"target_only"`), `impact_score` (0–1), `connected_rules`, `exploit_involvement`, `recommendation` (`"monitor"`, `"rate-limit"`, `"gate"`).
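The `node_type` and `exploit_involvement` fields can be derived from the graph structure alone. The sketch below is hypothetical: `classify_nodes`, its plain-tuple inputs, counting connected rules as a number, and reading `"source_only"` as "appears only as a rule's source" are all assumptions, and it does not reproduce balancelab's `impact_score`.

```python
from collections import Counter


def classify_nodes(edges, exploit_paths):
    """Hypothetical per-node summary: (node_type, connected rules, exploit involvement).

    edges: (source_item, target_item) pairs, one per rule.
    exploit_paths: cycles such as ["gold", "silver", "gems", "gold"].
    """
    sources = {s for s, _ in edges}
    targets = {t for _, t in edges}

    connected = Counter()
    for s, t in edges:
        connected[s] += 1
        connected[t] += 1

    involvement = Counter()
    for path in exploit_paths:
        for node in set(path):  # count each cycle at most once per node
            involvement[node] += 1

    summary = {}
    for node in sorted(sources | targets):
        if node in sources and node in targets:
            node_type = "hub"
        elif node in sources:
            node_type = "source_only"
        else:
            node_type = "target_only"
        summary[node] = (node_type, connected[node], involvement[node])
    return summary


edges = [("gold", "silver"), ("silver", "gems"), ("gems", "gold"),
         ("ore", "gold"), ("gems", "trophy")]
summary = classify_nodes(edges, [["gold", "silver", "gems", "gold"]])
for node, info in summary.items():
    print(node, info)
```

Hubs that sit on exploit cycles (here `gems` and `gold`) are the natural candidates for rate limits or gates; pure sources and sinks cannot close a loop on their own.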
`critical_path` finds the sequence of nodes with the highest economic throughput (most exchange-rate flow), which helps identify the items that act as economic bottlenecks.
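One way to read "highest throughput" is the simple path whose chained exchange rates multiply to the largest value. The brute-force sketch below assumes that reading (`max_product_path` is hypothetical, and balancelab's actual definition may differ). It explores every simple path, so it suits only small graphs. On the gold/silver/gems rules it picks gems → gold → silver, because 4 × 3 = 12 beats every other path.

```python
import math


def max_product_path(rates):
    """Brute-force the simple path whose exchange rates multiply to the largest value.

    rates: {(source_item, target_item): target_qty / source_qty}
    Returns (path, product); explores every simple path, so only for small graphs.
    """
    adjacency = {}
    for (u, v), rate in rates.items():
        adjacency.setdefault(u, []).append((v, rate))

    best_path, best_log = None, -math.inf

    def dfs(node, path, log_total):
        nonlocal best_path, best_log
        if len(path) > 1 and log_total > best_log:
            best_path, best_log = list(path), log_total
        for nxt, rate in adjacency.get(node, []):
            if nxt not in path:  # simple paths only, so cycles cannot inflate the score
                path.append(nxt)
                dfs(nxt, path, log_total + math.log(rate))
                path.pop()

    for start in adjacency:
        dfs(start, [start], 0.0)
    return best_path, math.exp(best_log)


rates = {("gold", "silver"): 3.0, ("silver", "gems"): 2.0, ("gems", "gold"): 4.0}
path, product = max_product_path(rates)
print(" -> ".join(path), round(product, 6))  # gems -> gold -> silver 12.0
```

Summing logs instead of multiplying rates keeps long chains numerically stable and matches the log-weight framing used by `ExploitFinder`.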
```python
from balancelab import critical_path

# graph: an EconomyGraph, built as in the quick start
path = critical_path(graph)
print(" -> ".join(path))  # e.g. "gems -> gold -> silver"
```

| Approach | Scalability | Automation | Accuracy | Cost |
|---|---|---|---|---|
| balancelab | Graph-based, O(V·E) | Full CI | Mathematical | Free |
| Manual spreadsheet | Poor | None | Error-prone | Low |
| Playtesting | Poor | None | Incomplete | High |
| Custom scripts | Variable | Manual | Variable | Medium |
| LLM-only analysis | N/A | Partial | Hallucination risk | High |
| Cloud billing alerts | Post-facto | Reactive only | Accurate but late | $$ |
| Per-model rate limits | Coarse-grained | Partial | Misses cross-model loops | Low |
```
balancelab/
├── src/balancelab/
│   ├── __init__.py         # Public API
│   ├── economy.py          # EconomyRule, EconomyGraph, ExploitFinder
│   ├── store.py            # SQLite persistence
│   ├── report.py           # Rich/JSON/Markdown formatters
│   ├── cli.py              # Click CLI
│   ├── api.py              # FastAPI server
│   └── mcp_server.py       # MCP server
├── tests/
│   ├── test_economy.py     # Data model tests
│   ├── test_exploit.py     # Bellman-Ford exploit detection
│   ├── test_store.py       # SQLite CRUD
│   ├── test_report.py      # Formatters
│   ├── test_cli_runner.py  # CLI integration
│   └── test_api.py         # FastAPI endpoints
├── tools/openai-tools.json # OpenAI function calling spec
├── openapi.yaml            # Full OpenAPI 3.1 spec
├── examples/demo.py        # Standalone demo
└── smoke_test.py           # End-to-end smoke test
```
See how teams are using balancelab in production:
- Eliminating Economy Exploits Before Launch — Stellar Forge finds 5 exploits (including a 1,250x gain ratio) before shipping to 2M DAU
- Catching Reward Hacking in an AI Agent Token Economy — Orbital Systems catches synthetic-task reward hacking in simulation before it reaches production
Subscribe to The Silence Layer — weekly dispatches on production AI infrastructure, new releases, and the failure modes that production AI systems don't surface until it's too late.
