
feat(RL): PPO pipeline with GRU body-state embeddings for Reacher-v5 with 256 size #6

Open
PushpitaJoardar wants to merge 3 commits into geometric-intelligence:main from PushpitaJoardar:main

Conversation

@PushpitaJoardar (Collaborator)

Summary

Implemented the RL pipeline (RNN estimation) for Reacher-v5, supporting both
the raw-observation baseline and the GRU-embedded observation condition, as
described in the Body-State Manifold Learning proposal.

Changes

New Files

  • articulated/rl/environment.py — ReacherWithEmbedding wrapper (raw + embedded modes)
  • articulated/rl/agent.py — RLAgent with PPO, VecNormalize, all config fields
  • articulated/rl/train.py — Training script with eval and TensorBoard logging
  • articulated/rl/fit_pca.py — PCA fitting script for GRU embedding compression
  • articulated/configs/rl/baseline.yaml — Raw obs baseline (500K steps)
  • articulated/configs/rl/baseline_tuned.yaml — Tuned baseline (1M steps)
  • articulated/configs/rl/baseline_tuned2.yaml — Tuned baseline, lower LR
  • articulated/configs/rl/embedded.yaml — GRU-embedded obs config
  • articulated/configs/estimation/rnn_so2.yaml — RNN estimation config (SO2)
  • articulated/configs/estimation/gru_so2.yaml — GRU estimation config (SO2)
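For reference, the raw/embedded mode switch in the ReacherWithEmbedding wrapper presumably works along these lines (a minimal numpy-only sketch; the class shape, `embed_fn` hook, and mode names are illustrative assumptions, not the actual articulated/rl/environment.py code):

```python
import numpy as np

class ReacherWithEmbedding:
    """Sketch of an observation wrapper with 'raw' and 'embedded' modes.

    In 'raw' mode the underlying observation passes through unchanged;
    in 'embedded' mode a caller-supplied embed_fn (e.g. a GRU hidden
    state h_t) is concatenated onto the raw features.
    """

    def __init__(self, env, embed_fn=None, mode="raw"):
        assert mode in ("raw", "embedded")
        self.env = env
        self.embed_fn = embed_fn
        self.mode = mode

    def _observation(self, obs):
        if self.mode == "raw" or self.embed_fn is None:
            return obs
        h_t = self.embed_fn(obs)           # e.g. GRU hidden state
        return np.concatenate([h_t, obs])  # [h_t | raw features]

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        return self._observation(obs), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return self._observation(obs), reward, terminated, truncated, info
```

In the real module a `gymnasium.ObservationWrapper` subclass with a correspondingly enlarged `observation_space` would be the idiomatic base, so that SB3's shape checks see the embedded dimensionality.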

Modified Files

  • articulated/shared/robot_arm.py — Added RobotArm2DKinematics for SO(2)
  • articulated/estimation/datamodule.py — SO(2) manifold support
  • articulated/estimation/model.py — GRU support + get_embedding() interface
  • articulated/estimation/train.py — Training script updates

Results

| Condition                  | Mean Reward | Timesteps |
| -------------------------- | ----------- | --------- |
| Baseline PPO (raw obs)     | -3.80       | 500K      |
| Embedded RNN (val/acc=24%) | -9.67       | 1M        |
| Embedded GRU (val/acc=99%) | -6.19       | 1M        |

Notes

  • GRU with kappa=20, seq_length=50 achieves val/acc=0.993
  • Embedded obs = [h_t | cos/sin joints | target_pos | fingertip_vec]
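The PCA step in fit_pca.py is presumably along these lines (a scikit-learn sketch; the function name, component count, and dimensions are assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_embedding_pca(embeddings, n_components=8):
    """Fit PCA on a matrix of GRU hidden states (n_samples, hidden_dim)
    to keep the embedded observation compact for PPO."""
    pca = PCA(n_components=n_components)
    pca.fit(embeddings)
    return pca

# Hypothetical usage: compress hidden states before concatenating into
# [h_t | cos/sin joints | target_pos | fingertip_vec].
```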

