Skip to content

feat: Epiplexity Demo 1 - Game of Life Emergent Objects#10

Open
JNK234 wants to merge 5 commits into
mainfrom
feature/epiplexity-01-full-context
Open

feat: Epiplexity Demo 1 - Game of Life Emergent Objects#10
JNK234 wants to merge 5 commits into
mainfrom
feature/epiplexity-01-full-context

Conversation

@JNK234
Copy link
Copy Markdown
Collaborator

@JNK234 JNK234 commented Feb 26, 2026

Epiplexity Demo 1: Emergent Object Discovery

Validates Paradox 1 from From Entropy to Epiplexity (Finzi et al., 2026): arXiv:2601.03220

What This Does

  • Implements Conway's Game of Life with an LLM-based bounded observer
  • Observer learns to recognize emergent patterns (gliders, oscillators, stable blocks) that are NOT in the micro rules
  • Demonstrates two memory modes:
    • Bounded memory: Markovian observer (clears each tick) → no learning
    • Persistent memory: Accumulates history → learns patterns over time

Results

  • Bounded mode: Lower prediction accuracy
  • Persistent mode: Higher accuracy
  • Validates Paradox 1: Structure (patterns) emerges from deterministic computation and can be extracted by sufficiently powerful observers

Files

  • demos/epiplexity-01-emergent-objects/game_of_life.nlogo - NetLogo model
  • demos/epiplexity-01-emergent-objects/templates/ - YAML prompts for LLM
  • demos/epiplexity-01-emergent-objects/results/ - CSV output and plots
  • demos/epiplexity-01-emergent-objects/tests/ - Unit tests

How to Run

  1. Open game_of_life.nlogo in NetLogo 6.4+
  2. Configure LLM model in config.txt
  3. Click setup then go
  4. Check results/ folder for CSV and SVG outputs

@cursor
Copy link
Copy Markdown

cursor Bot commented Feb 26, 2026

You have run out of free Bugbot PR reviews for this billing cycle. This will reset on March 7.

To receive reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 229ed4d2d7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +374 to +375
_try_run_with_pynetlogo(args.ticks)
_generate_baseline_if_missing(rows=args.ticks, force=args.refresh_baseline)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Skip baseline generation after a successful live run

The script ignores the boolean result of _try_run_with_pynetlogo and always calls _generate_baseline_if_missing, which can silently replace real NetLogo outputs with simulated baseline data. In a clean results directory, a live run creates bounded-output.csv and persistent-output.csv but not demo-output.csv, so the baseline path runs and rewrites both files, corrupting the experiment summary and any downstream analysis.

Useful? React with 👍 / 👎.

link.load_model(str(MODEL_PATH))
link.command(f"set episode-length {ticks}")
link.command("run-episode-bounded")
link.command("run-episode-persistent")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reapply episode-length before persistent NetLogo run

The live path sets episode-length only once before run-episode-bounded, but run-episode calls setup, and setup-defaults resets episode-length to 50 in the NetLogo model. That means run-episode-persistent runs at 50 ticks whenever --ticks is not 50, producing unequal run lengths and biasing the bounded-vs-persistent comparison.

Useful? React with 👍 / 👎.

@JNK234
Copy link
Copy Markdown
Collaborator Author

JNK234 commented Feb 26, 2026

Code Review: PR #10 - Game of Life Emergent Objects

What Looks Good

  • Well-structured test harness with comprehensive validation and strict mode
  • Robust CSV parsing with proper error handling for field types
  • Smart fallback mechanism for baseline generation when pyNetLogo unavailable
  • Clear documentation explaining Paradox 1 and memory modes
  • Tests pass: bounded pred=0.44, persistent pred=0.86, lift=0.42

Issues Found

  1. Baseline overwrite bug: In main(), _try_run_with_pynetlogo() return value is ignored. After successful live run, _generate_baseline_if_missing() still runs and may overwrite real outputs.

  2. Magic numbers: Hardcoded thresholds (0.70, 0.10) could be CLI arguments.

Suggestions

  • Store and check _try_run_with_pynetlogo() result before baseline generation
  • Extract threshold constants to module level

Approval

Approved with minor fix recommended. Core logic correctly validates Paradox 1.

@JNK234 JNK234 force-pushed the feature/epiplexity-01-full-context branch from 229ed4d to 1e3e843 Compare February 26, 2026 06:45
JNK234 added 4 commits March 5, 2026 14:39
Major changes:
- Convert from .nlogo to .nlogox (NetLogo 7.0.3 XML format)
- Replace random observer walk with deterministic waypoint navigation
  visiting glider(10,10), blinker(30,30), block(20,20), random(25,25)
- Rich temporal memory: window-history-buffer stores (tick, grid, label)
  tuples instead of flat text labels
- Fair comparison: both episodes re-seed identically for same GoL state
- Add real-time GUI: prediction accuracy plot, label/prediction monitors,
  output widget with per-tick narration
- Add stop button for interrupting long episodes
- Configure for Ollama (local) instead of OpenAI
- Remove simulated baseline from test harness; require real CSV data
- Replace lower-case (not a NetLogo 7 builtin) with to-lower-case reporter
- Fix 'let label' shadowing NetLogo builtin turtle variable
- Update macro_predict.yaml template header for window history format

Known issues and improvements needed:

1. BLOCKING: Qwen 3 thinking mode — Qwen 3 models return empty content
   field and put all output in a 'thinking' field. The LLM extension
   reads message.content which is empty, so llm:choose falls back to
   random selection. Fix options:
   a) Use non-thinking models (qwen2.5, llama3, gemma2)
   b) Patch OllamaProvider.scala to pass "think": false in request body
   c) Patch OllamaProvider.parseProviderResponse to read message.thinking
      when message.content is empty

2. Glider drift — The glider moves away from its seed position (10,10)
   over GoL ticks, so the observer may see empty space at the waypoint.
   Options: track glider position dynamically, increase observation
   region, or accept it as part of the bounded-observer narrative.

3. BehaviorSpace headless — NetLogo 7.0.3 has a known bug where
   BehaviorSpace headless mode fails with "head of empty list" even
   on bundled sample models. Cannot use headless for automated testing.
   Workaround: use sbt test for compilation checks, NetLogo GUI for
   functional testing.

4. Config not tracked — config.txt is untracked (may contain API keys).
   Currently set to provider=ollama model=qwen3:4b. Should be updated
   to a non-thinking model (e.g., qwen2.5:3b) before running.
…e per-tick LLM observer

Completely rewrites the Game of Life demo so the LLM acts as a scientific
observer — describing patterns in free text, predicting next grid states,
and building theories over time. Memory vs no-memory is a live toggle.

Key changes:
- Per-tick LLM calls in `go` (describe + predict + periodic reflect)
- New seeds: R-pentomino, Gosper glider gun, pulsar (dramatic evolution)
- Interactive controls: memory-mode chooser, show-observations switch,
  reflect-every and episode-length sliders
- 3 new YAML templates (describe, predict_grid, reflect)
- Removed batch procedures (run-episode, run-comparison, analyze-results)
- Removed constrained-choice labeling (llm:choose from fixed list)
- Removed Python test harness (batch workflow obsolete)
- History management: bounded clears per tick, persistent caps at 20
- Grid prediction accuracy scored cell-by-cell (0-100%)
- Config monitor shows active file and model
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant