Add SO2 embedding modes and init-conditioned embedding interface for RL #5

Open
ht0324 wants to merge 2 commits into geometric-intelligence:main from ht0324:team-rl-huntae-initcond-pr

ht0324 (Collaborator) commented Mar 2, 2026

Summary

This PR adds SO2 embedding experiment infrastructure for RL and fixes embedding inference to use initial-angle conditioning (matching estimation training behavior).
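For readers unfamiliar with init conditioning, here is a minimal sketch of the idea, not the code in this PR. It assumes the SO2 init encoding is the usual (cos, sin) pair per joint angle. `ToyEmbedder`, `InitConditionedWrapper`, and `so2_encode` are hypothetical names; only the optional `init_pos` argument comes from the description above.

```python
import math
from typing import Optional, Sequence


def so2_encode(angles: Sequence[float]) -> list[float]:
    """Encode each angle (radians) as a (cos, sin) pair on the unit circle."""
    encoded: list[float] = []
    for theta in angles:
        encoded.extend((math.cos(theta), math.sin(theta)))
    return encoded


class ToyEmbedder:
    """Hypothetical stand-in for the estimation model's embedding interface."""

    def embed(self, obs: Sequence[float],
              init_pos: Optional[Sequence[float]] = None) -> list[float]:
        # Toy behavior (an assumption, for illustration only): prepend the
        # conditioning vector, if any, to the mean of the observation.
        cond = list(init_pos) if init_pos is not None else []
        return cond + [sum(obs) / len(obs)]


class InitConditionedWrapper:
    """Hypothetical wrapper that captures the init encoding at reset time."""

    def __init__(self, embedder: ToyEmbedder) -> None:
        self.embedder = embedder
        self._init_pos: Optional[list[float]] = None

    def reset(self, joint_angles: Sequence[float]) -> None:
        # Store the reset-time encoding so later steps can reuse it.
        self._init_pos = so2_encode(joint_angles)

    def embedding(self, obs: Sequence[float]) -> list[float]:
        # Pass the stored encoding on every inference call.
        return self.embedder.embed(obs, init_pos=self._init_pos)
```

The point of the wrapper is that the encoding is computed once per episode and reused, so inference sees the same conditioning input the estimation model was trained with.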

Changes

  • Added RL observation modes:
      • raw
      • embed_only
      • concat (raw + embedding)
  • Added SO2 RL configs for hidden sizes 64 and 128:
      • embed_only_so2_h64.yaml
      • embed_only_so2_h128.yaml
      • concat_so2_h64.yaml
      • concat_so2_h128.yaml
  • Updated embedded config to explicitly set observation_mode: embed_only.
  • Extended embedding interface to accept optional init_pos during inference.
  • Updated Reacher wrapper to extract reset-time SO2 init encoding and pass it during embedding computation.
  • Added SO2 hidden-size sweep configs/scripts:
      • rnn_so2_h64.yaml
      • rnn_so2_h128.yaml
      • scripts/train_so2_rnn_hidden_sweep.sh
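The three observation modes can be illustrated roughly as follows. This is a sketch under assumptions rather than the code in articulated/rl/environment.py: `build_observation` is a hypothetical helper, and observations are plain lists of floats to keep the example dependency-free.

```python
from typing import Sequence

OBSERVATION_MODES = ("raw", "embed_only", "concat")


def build_observation(raw: Sequence[float],
                      embedding: Sequence[float],
                      mode: str) -> list[float]:
    """Assemble the policy input for one of the three observation modes."""
    if mode == "raw":
        # Policy sees only the environment's own observation.
        return list(raw)
    if mode == "embed_only":
        # Policy sees only the learned embedding.
        return list(embedding)
    if mode == "concat":
        # Policy sees the raw observation followed by the embedding.
        return list(raw) + list(embedding)
    raise ValueError(
        f"unknown observation_mode {mode!r}; expected one of {OBSERVATION_MODES}"
    )
```

Note that the policy's input dimension changes with the mode: in concat it is the sum of the raw and embedding sizes, which is presumably why the h64 and h128 variants each get their own config.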

Validation

  • poetry run ruff check articulated/estimation/model.py articulated/rl/agent.py articulated/rl/environment.py
  • YAML parse checks for all new RL/estimation configs
  • bash -n scripts/train_so2_rnn_hidden_sweep.sh
  • End-to-end RL comparison rerun with init-conditioning enabled

Results

New comparison file:

  • logs/rl/comparison/obs_mode_comparison_initcond_20260228_014306.csv

Single-seed ranking (higher is better):

  • baseline_raw: -3.902
  • concat_h128: -8.234
  • embed_only_h64: -10.167
  • concat_h64: -10.419
  • embed_only_h128: -11.054
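To make the gaps to the baseline explicit, the ranking above can be reproduced with a short script. The scores are copied from the list above; the variable names are illustrative and do not reflect the column layout of the comparison CSV, which is not shown here.

```python
# Single-seed scores copied from the ranking above (higher is better).
results = {
    "baseline_raw": -3.902,
    "concat_h128": -8.234,
    "embed_only_h64": -10.167,
    "concat_h64": -10.419,
    "embed_only_h128": -11.054,
}

# Sort best-first and compute each variant's gap to the raw baseline.
ranked = sorted(results.items(), key=lambda kv: kv[1], reverse=True)
baseline = results["baseline_raw"]
gaps = {name: score - baseline for name, score in ranked}

for name, score in ranked:
    print(f"{name:<16} {score:8.3f}  gap to baseline: {gaps[name]:+.3f}")
```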

Notes

Raw baseline still performs best in current runs. Init conditioning improved some embedding variants (notably concat_h128). Multi-seed confirmation is recommended before final conclusions.
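For the multi-seed follow-up, one possible way to summarize each variant is the mean and the standard error across seeds. The helper below is a hypothetical sketch, not part of this PR, and it deliberately contains no results.

```python
import math
import statistics
from typing import Sequence


def summarize_seeds(returns: Sequence[float]) -> tuple[float, float]:
    """Return (mean, standard error of the mean) over per-seed scores."""
    if len(returns) < 2:
        raise ValueError("need at least two seeds to estimate spread")
    mean = statistics.fmean(returns)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    sem = statistics.stdev(returns) / math.sqrt(len(returns))
    return mean, sem
```

Calling `summarize_seeds` once per variant on its per-seed scores would show whether the differences in the single-seed ranking exceed seed-to-seed variation.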
