A local LLM autonomously plays Warsim: The Realm of Aslona. Sessions below are public; submit your own configuration to queue a run for the next time the operator's PC is online.
| Label | Status | Duration | Turns | Ruler | Kingdom | Difficulty | Model | Prompt | Submitter | Dev | Years | Gold Δ | Men Δ | Lands Δ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model | N (sessions) | Avg years elapsed | Best years elapsed | Best gold Δ | Best men Δ | Best lands Δ |
|---|---|---|---|---|---|---|
Configure a benevolent dictator, a bleeding-heart champion of the people, a ruthless warmonger — or anything else you want to see play out. Your agent runs the next time the operator's online and a slot's open.
Named, versioned system prompts. Editing a prompt creates a new immutable version — sessions stamped with an older version keep the exact text they ran with. Archived prompts disappear from the pickers below but their versions stay resolvable for old sessions.
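The versioning rule above can be sketched as a hypothetical in-memory store (the real app presumably persists versions; the class and method names here are assumptions, only the immutability and archiving rules come from the text):

```python
# Minimal sketch: editing a prompt never mutates an existing version;
# it appends a new one. Archiving only hides a prompt from the pickers.
class PromptStore:
    def __init__(self):
        self._versions = {}   # (name, version) -> text, never overwritten
        self._latest = {}     # name -> latest version number
        self._archived = set()

    def save(self, name: str, text: str) -> int:
        version = self._latest.get(name, 0) + 1
        self._versions[(name, version)] = text   # immutable: new key every edit
        self._latest[name] = version
        return version

    def resolve(self, name: str, version: int) -> str:
        # Archived or superseded versions stay resolvable for old sessions.
        return self._versions[(name, version)]

    def archive(self, name: str) -> None:
        self._archived.add(name)  # hides from pickers only

    def pickable(self) -> list[str]:
        return [n for n in self._latest if n not in self._archived]
```

A session stamped with version 1 keeps resolving the exact text it ran with, even after edits and archiving.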
Keeps the queue topped up by cycling through these (model, prompt) pairs. Public submissions still jump ahead of every entry here. Edits apply live — no restart needed.
A flowchart of the per-turn runner loop. Every box reflects code in runner.py; every diamond is a real branch.
messages = [
[Long-term goal] — verbatim, 200-char slot
[Short-term goal] — verbatim, 100-char slot
[Memory] X/4000 chars, last_edited_turn=N — full doc on demand via memory(op="read")
[Note] … lines — zero or more, only if active
…history: prior turns' content + tool_calls…
]
Key contrast: goals are re-sent every turn (verbatim). Memory is shown as a status line; the agent fetches the doc on demand. History grows turn by turn; eviction trims it when over budget.
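A minimal sketch of that assembly, assuming field names like `long_term_goal` and `history` (the slot sizes and the status-line-only treatment of memory come from the text; the helper shape is an assumption):

```python
from types import SimpleNamespace

def build_messages(session, memory_chars: int, last_edited_turn: int, notes):
    status_lines = [
        f"[Long-term goal] {session.long_term_goal}",    # re-sent verbatim, 200-char slot
        f"[Short-term goal] {session.short_term_goal}",  # re-sent verbatim, 100-char slot
        # Memory is only a status line; the agent fetches the doc with memory(op="read").
        f"[Memory] {memory_chars}/4000 chars, last_edited_turn={last_edited_turn}",
    ]
    status_lines += [f"[Note] {n}" for n in notes]  # zero or more, only if active
    # History (content + tool_calls) grows turn by turn; eviction trims it later.
    return [{"role": "system", "content": "\n".join(status_lines)}, *session.history]
```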
nudge_count ≥ 3 OR mutation_only_count ≥ 10?
no →
tools = [interact, memory, set_goal]
choice = "auto"
yes →
tools = [interact]
choice = pin to interact
(Ollama 0.21 ignores tool_choice; narrowing the tools list is the load-bearing belt.)
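That branch can be sketched as follows (tool names stand in for the full tool schemas; the pinned-choice payload shape is illustrative):

```python
# Because Ollama 0.21 ignores tool_choice, the narrowed tools list is what
# actually forces an interact call; the pinned choice is belt-and-suspenders.
def select_tools(nudge_count: int, mutation_only_count: int):
    if nudge_count >= 3 or mutation_only_count >= 10:
        pinned = {"type": "function", "function": {"name": "interact"}}
        return ["interact"], pinned  # force-interact: one-entry tools list
    return ["interact", "memory", "set_goal"], "auto"
```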
client.chat.completions.create(
model = session.model_name,
messages = call_messages,
tools = call_tools,
tool_choice = tool_choice,
temperature = cfg.llm_temperature,
extra_body = {"options": {"num_ctx": session.num_ctx}},
)
Non-streaming. No retries.
Any exception → llm_retries_exhausted = True, crash_reason = "llm_error" → EXIT
session.messages.append(assistant_msg) — verbatim: content + tool_calls
If tool_calls is empty AND content looks
like fenced JSON / fenced Python (qwen3 quirk): recover into
structured tool_calls and rewrite the just-appended
assistant msg.
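One way that recovery could look (a sketch, not the project's implementation; the regex and the recovered tool-call shape are assumptions, and a fenced Python blob that isn't valid JSON is simply left alone here):

```python
import json
import re

# Match a fenced json/python block holding a single object, e.g. a qwen3-style
# tool call emitted as content instead of structured tool_calls.
_FENCE = re.compile(r"`{3}(?:json|python)?\s*(\{.*?\})\s*`{3}", re.DOTALL)

def recover_tool_calls(content: str):
    m = _FENCE.search(content or "")
    if not m:
        return None
    try:
        payload = json.loads(m.group(1))
    except json.JSONDecodeError:
        return None  # not recoverable as JSON; leave the message as-is
    # Rebuild a structured tool call from the fenced blob.
    return [{"type": "function",
             "function": {"name": payload.get("name", ""),
                          "arguments": json.dumps(payload.get("arguments", {}))}}]
```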
total_tokens = max(reported_total, local_estimate)
Ollama caps usage.total_tokens at num_ctx;
the local char-based estimate over what we sent is the floor.
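The floor computation is tiny; sketched here with a rough 4-chars-per-token heuristic, which is an assumption (the text only says the estimate is char-based):

```python
# Ollama caps usage.total_tokens at num_ctx, so when the true count exceeds
# the context window the report undercounts; a local char-based estimate over
# what was sent acts as the floor.
def effective_total_tokens(reported_total: int, sent_chars: int) -> int:
    local_estimate = sent_chars // 4  # assumed ~4 chars/token heuristic
    return max(reported_total, local_estimate)
```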
No tool calls → nudge:
nudge_count++; turn_type = "nudge"; nudge text appended to session.messages
(model sees it next turn)

Tool calls present → for tool_call in assistant_msg.tool_calls:
interact → loop_guard_count = 0; input forwarded to the game; if the game process is gone → process_died = True; raise ToolError
memory, op="read" → bypass loop-guard, return doc + status
memory, op="edit"/"append" → count ≥ 3 → ToolError; else count++, perform edit/append
set_goal → count ≥ 3 → ToolError; else count++, write long_term (200ch) or short_term (100ch) slot
→ append tool_result to session.messages
Loop-guard semantics: the mutation that trips
the threshold (count was 2 → call lands, increments to 3) is allowed.
The next non-interact, non-read mutation hits
the gate. Only interact resets the counter.
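Those semantics fit in a few lines (a sketch; the class shape is an assumption, the threshold, off-by-one behavior, and reset rule come from the text — note that memory reads bypass this counter entirely):

```python
class LoopGuard:
    LIMIT = 3

    def __init__(self):
        self.count = 0

    def on_mutation(self) -> bool:
        """True if this edit/append/set_goal call is allowed."""
        if self.count >= self.LIMIT:
            return False      # gate: the runner raises ToolError here
        self.count += 1       # count 2 -> 3: the tripping call still lands
        return True

    def on_interact(self) -> None:
        self.count = 0        # only interact resets the counter
```

Three mutations land, the fourth hits the gate, and a single interact reopens the window.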
interact called this turn → nudge_count = 0; mutation_only_count = 0
tool calls but no interact → nudge_count = 0; mutation_only_count++
None of these gate this turn. Their state feeds the
[Note] selection in the next turn's request.
INSERT INTO turns (session_id, turn_number, input, output,
rationale, year, gold, men, lands, opinion, turn_type,
prompt_tokens, completion_tokens, latency_ms,
nudge_text, recovered_tool_calls, output_with_ansi,
error_text, error_tool, …)
Then emit one SSE turn event on the live-view feed.
On DB error → crash_reason = "db_write_failed"
→ EXIT
Eviction: total_tokens ≥ history_budget OR len(messages) > 400 →
drop the oldest assistant + tool_result msgs (one pair at a time)
until tokens < 0.8 × budget AND len ≤ 400.

Exit checks:
stop_requested → stopped
gamebridge.game_over → game_over
gamebridge.process_died → crashed
llm_retries_exhausted → crashed
invalid_input ≥ 50 → crashed
year_stagnation → crashed
public_turn_cap → done
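The history-eviction pass above can be sketched as follows, under the stated thresholds (the token bookkeeping, `tokens_of_pair`, is stubbed out as an assumption):

```python
def evict_history(messages, total_tokens, history_budget, tokens_of_pair):
    if total_tokens < history_budget and len(messages) <= 400:
        return messages, total_tokens  # under budget: nothing to trim
    # Drop the oldest (assistant, tool_result) pair until back under the
    # hysteresis targets: 0.8 × budget and 400 messages.
    while messages and (total_tokens >= 0.8 * history_budget or len(messages) > 400):
        pair, messages = messages[:2], messages[2:]
        total_tokens -= tokens_of_pair(pair)
    return messages, total_tokens
```

Trimming to 0.8 × budget rather than just under it avoids re-triggering eviction on every subsequent turn.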
Verbatim view of what the model received and returned on a given turn. View pipeline reference
A behavioral research tool. A local large language model autonomously plays
Warsim: The Realm of Aslona — a kingdom-management roguelike — through a
tiny tool surface (interact, memory, set_goal).
The same uvicorn process runs the dashboard, drives the game subprocess, and writes
per-turn telemetry to SQLite.
Every game is one model + one system prompt + one autonomous loop. The viewer shows the live game screen, the model's last action and rationale, and the agent's current goal and memory document. No human is in the loop while a session runs.
To collect grounded signal on how small open-weight models handle long-horizon
agentic play: where they get stuck, how they use memory, when they fixate on a
single tool, what their goals drift toward. The runner scaffolds engagement
(anti-stuck nudges, force-interact overrides) but stays neutral on
strategy. It's hand-rolled — no agent SDK — so every decision the model makes is
visible end-to-end.