A local LLM autonomously plays Warsim: The Realm of Aslona. Sessions below are public; submit your own configuration to queue a run for the next time the operator's PC is online.
| Label | Status | Duration | Turns | Ruler | Kingdom | Difficulty | Model | Prompt | Submitter | Dev | Years | Gold Δ | Men Δ | Lands Δ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model | N (sessions) | Avg years elapsed | Best years elapsed | Best gold Δ | Best men Δ | Best lands Δ |
|---|---|---|---|---|---|---|
Configure a benevolent dictator, a bleeding-heart champion of the people, a ruthless warmonger — or anything else you want to see play out. Your agent runs the next time the operator's online and a slot's open.
Named, versioned system prompts. Editing a prompt creates a new immutable version — sessions stamped with an older version keep the exact text they ran with. Archived prompts disappear from the pickers below but their versions stay resolvable for old sessions.
Keeps the queue topped up by cycling through these (model, prompt) pairs. Public submissions still jump ahead of every entry here. Edits apply live — no restart needed.
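The refill rule above (rotation pairs cycle, public submissions always jump ahead) can be sketched roughly like this. Function and argument names are assumptions for illustration, not the actual scheduler code:

```python
from itertools import cycle

def refill_queue(public_submissions, rotation_pairs, slots):
    """Illustrative sketch: public submissions fill slots first; the
    remaining slots cycle through the operator's (model, prompt) pairs."""
    queue = list(public_submissions)[:slots]
    rotation = cycle(rotation_pairs)  # assumes at least one rotation pair
    while len(queue) < slots:
        queue.append(next(rotation))
    return queue
```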
Every public-site page request the Cloudflare Pages worker observed. Indexed columns are the ones the table sorts on; click a row to see the full passive-metadata blob (TLS, client hints, geo, network).
| Time | Path | Where | IP | Network | Browser / OS | Referer |
|---|---|---|---|---|---|---|
A flowchart of the per-turn runner loop. Every box reflects code in runner.py; every diamond is a real branch.
messages = [
[Long-term goal] — verbatim, 200-char slot
[Short-term goal] — verbatim, 100-char slot
[Memory] X/4000 chars, last_edited_turn=N — status line only; fetch the doc via memory(op="read")
[Note] … lines — zero or more, only if active
]
Key contrast: goals are re-sent every turn (verbatim). Memory is shown as a status line; the agent fetches the doc on demand. History grows turn by turn; eviction trims it when over budget.
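The goals-verbatim / memory-as-status contrast can be sketched as a message-assembly helper. All attribute and function names here are assumptions, not the real runner.py API:

```python
def build_turn_messages(session):
    """Illustrative sketch: goals are re-sent verbatim every turn, while
    memory appears only as a one-line status the agent must expand via
    memory(op="read"). History rides along and is trimmed elsewhere."""
    status_lines = [
        f"[Long-term goal] {session.long_term_goal}",    # verbatim, 200-char slot
        f"[Short-term goal] {session.short_term_goal}",  # verbatim, 100-char slot
        f"[Memory] {len(session.memory_doc)}/4000 chars, "
        f"last_edited_turn={session.memory_last_edited_turn}",
    ]
    status_lines += [f"[Note] {n}" for n in session.active_notes]  # zero or more
    return [{"role": "system", "content": "\n".join(status_lines)}] + session.messages
```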
nudge_count ≥ 3 OR mutation_only_count ≥ 10?
No → tools = [interact, memory, set_goal]
     choice = "auto"
Yes → tools = [interact]
      choice = pin to interact
(Ollama 0.21 ignores tool_choice; narrowing the tools
list is the load-bearing belt.)
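Since tool_choice is ignored, the narrowing branch above amounts to filtering the tools list itself. A minimal sketch, with the helper name and thresholds taken from the flowchart but the tool-dict shape assumed to follow the OpenAI function-calling format:

```python
def pick_tools(nudge_count, mutation_only_count, all_tools):
    """If the agent has stalled (too many nudges or mutation-only turns),
    shrink the tools list to interact alone; since Ollama ignores
    tool_choice, the shorter list is what actually forces the call."""
    if nudge_count >= 3 or mutation_only_count >= 10:
        interact_only = [t for t in all_tools
                         if t["function"]["name"] == "interact"]
        return interact_only, {"type": "function",
                               "function": {"name": "interact"}}
    return all_tools, "auto"
```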
client.chat.completions.create(
model = session.model_name,
messages = call_messages,
tools = call_tools,
tool_choice = tool_choice,
temperature = cfg.llm_temperature,
extra_body = {"options": {"num_ctx": session.num_ctx}},
)
Non-streaming. No retries.
Any exception → llm_retries_exhausted = True,
crash_reason = "llm_error"
→ EXIT
session.messages.append(assistant_msg)
— verbatim: content + tool_calls
If tool_calls is empty AND content looks
like fenced JSON / fenced Python (qwen3 quirk): recover into
structured tool_calls and rewrite the just-appended
assistant msg.
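The fenced-output recovery can be sketched as a regex-plus-parse fallback. This is a simplified guess at the shape of the check; the real recovery in runner.py may accept more payload shapes than this:

```python
import json
import re

# Matches a ```json or ```python fenced block (or a bare fence).
FENCE = re.compile(r"`{3}(?:json|python)?\s*(.*?)`{3}", re.DOTALL)

def recover_tool_calls(content):
    """If the model emitted a fenced JSON object instead of structured
    tool_calls (the qwen3 quirk), parse it back into one tool call.
    Returns None when nothing recoverable is found."""
    m = FENCE.search(content or "")
    if not m:
        return None
    try:
        payload = json.loads(m.group(1))
    except json.JSONDecodeError:
        return None
    if isinstance(payload, dict) and "name" in payload:
        return [{"type": "function",
                 "function": {"name": payload["name"],
                              "arguments": json.dumps(payload.get("arguments", {}))}}]
    return None
```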
total_tokens = max(reported_total, local_estimate)
Ollama caps usage.total_tokens at num_ctx, so the local
char-based estimate over what we sent serves as the floor.
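The max-of-two-estimates rule is tiny but easy to get backwards, so here it is spelled out. The chars-per-token ratio is an assumption for illustration; the real estimator may differ:

```python
def effective_total_tokens(reported_total, sent_chars, chars_per_token=4):
    """Ollama clamps usage.total_tokens at num_ctx, so a crude char-based
    estimate of the request we actually sent acts as a lower bound."""
    local_estimate = sent_chars // chars_per_token
    return max(reported_total, local_estimate)
```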
No tool calls → nudge_count++; turn_type = "nudge"; nudge appended to session.messages
(model sees it next turn)
Otherwise, for tool_call in assistant_msg.tool_calls:
interact → loop_guard_count = 0; if the game process died: process_died = True; raise ToolError
memory → op="read": bypass loop-guard, return doc + status; op="edit"/"append": count ≥ 3 → ToolError, else count++, perform edit/append
set_goal → count ≥ 3 → ToolError, else count++; writes the long_term (200ch) or short_term (100ch) slot
→ append tool_result to session.messages
Loop-guard semantics: the mutation that trips
the threshold (count was 2 → call lands, increments to 3) is allowed.
The next non-interact, non-read mutation hits
the gate. Only interact resets the counter.
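Those semantics fit in a few lines, and the off-by-one (the threshold-tripping call still lands) is worth pinning down in code. Class and method names are assumptions for illustration:

```python
class LoopGuard:
    """Sketch of the loop-guard semantics: the mutation that brings count
    to the limit still lands; the next one raises. Only interact resets."""
    def __init__(self, limit=3):
        self.limit = limit
        self.count = 0

    def check_mutation(self):
        if self.count >= self.limit:
            raise RuntimeError("ToolError: loop guard tripped")
        self.count += 1  # count 2 -> 3: this very call is still allowed

    def on_interact(self):
        self.count = 0   # only interact resets the counter
```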
interact called this turn → nudge_count = 0; mutation_only_count = 0
mutation-only turn → nudge_count = 0; mutation_only_count++
None of these gate this turn. Their state feeds the
[Note] selection in the next turn's request.
INSERT INTO turns (session_id, turn_number, input, output,
rationale, year, gold, men, lands, opinion, turn_type,
prompt_tokens, completion_tokens, latency_ms,
nudge_text, recovered_tool_calls, output_with_ansi,
error_text, error_tool, …)
Then emit one SSE turn event on the live-view feed.
On DB error → crash_reason = "db_write_failed"
→ EXIT
total_tokens ≥ history_budget OR len(messages) > 400?
→ evict oldest assistant + tool_result msgs (one pair at a time)
until tokens < 0.8 × budget AND len ≤ 400
stop_requested → stopped
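The budget check and pair-wise eviction can be sketched like this. The function signature, the per-message token estimator, and the assumption that messages[0] is a pinned system prompt are all illustrative guesses at the real runner.py shape:

```python
def evict_history(messages, token_count, budget, max_len=400):
    """Sketch: when over the history budget (or over 400 messages), drop
    the oldest post-system (assistant, tool_result) pair at a time until
    we are under 0.8x budget and back under the length cap.
    token_count is an assumed callable estimating one message's cost."""
    total = sum(token_count(m) for m in messages)
    if total >= budget or len(messages) > max_len:
        while (total >= 0.8 * budget or len(messages) > max_len) and len(messages) > 3:
            dropped = messages[1:3]      # oldest pair after the system msg
            del messages[1:3]
            total -= sum(token_count(m) for m in dropped)
    return messages
```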
gamebridge.game_over → game_over
gamebridge.process_died → crashed
llm_retries_exhausted → crashed
invalid_input ≥ 50 → crashed
year_stagnation → crashed
public_turn_cap → done
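The exit ladder above can be sketched as a priority check. The helper name and dict-shaped session state are assumptions, and runner.py's actual precedence order may differ from the listing order used here:

```python
def session_status(s):
    """Map end-of-turn session flags to a final status, first match wins."""
    if s.get("stop_requested"):
        return "stopped"
    if s.get("game_over"):
        return "game_over"
    if s.get("process_died"):
        return "crashed"
    if s.get("llm_retries_exhausted"):
        return "crashed"
    if s.get("invalid_input", 0) >= 50:
        return "crashed"
    if s.get("year_stagnation"):
        return "crashed"
    if s.get("public_turn_cap"):
        return "done"
    return None  # no exit condition: the session keeps running
```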
Verbatim view of what the model received and returned on a given turn. View pipeline reference
Me too, so I made this.
No, I'm neither fabulously wealthy nor an Anthropic employee so this is actually "small open source model that can run on my computer plays Warsim: The Realm of Aslona".
A really neat indie game; you should support the dev and pick it up to play it yourself!
That sounded like I was trying to prepare AI to wage real war, and this game is very silly (complimentary).
A custom harness that lets a small open-weight LLM autonomously play Warsim on the operator's computer. It exposes a tiny MCP surface (one tool to send input to the game, one to edit a memory document, one to set a goal); a hand-rolled tool-use loop drives every turn; and every move is written to SQLite and streamed to the dashboard so you can watch — and the operator can study — what the model actually does. The "hand-rolled" runner was, fittingly, rolled mostly by Claude's hands. (I wrote this answer; the operator wrote the rest of the page.)
For this? I'll keep adding to the harness and page for a while and slowly gather more session data. I've got a bunch of ideas cooking for how to visualize it in interesting ways, plus lots of ideas for harness improvements and agent configs to test out.
Beyond this? If I get real crazy I might try to build a version for the true 2024 GOTY - Caves of Qud, which you should also check out if you haven't yet. Getting a harness solid enough for a local model to actually play that game will be an order of magnitude harder though, so I'll be tinkering with this one for a while first.