
feat(RL): PPO pipeline with GRU body-state embeddings for Reacher-v5 with 256 size #6

Open
PushpitaJoardar wants to merge 3 commits into geometric-intelligence:main from PushpitaJoardar:main

Conversation

@PushpitaJoardar (Collaborator)

Summary

Implemented the RL pipeline (RNN estimation) for Reacher-v5, supporting both
the raw-observation baseline and the GRU-embedded observation condition, as
described in the Body-State Manifold Learning proposal.

Changes

New Files

  • articulated/rl/environment.py — ReacherWithEmbedding wrapper (raw + embedded modes)
  • articulated/rl/agent.py — RLAgent with PPO, VecNormalize, all config fields
  • articulated/rl/train.py — Training script with eval and TensorBoard logging
  • articulated/rl/fit_pca.py — PCA fitting script for GRU embedding compression
  • articulated/configs/rl/baseline.yaml — Raw obs baseline (500K steps)
  • articulated/configs/rl/baseline_tuned.yaml — Tuned baseline (1M steps)
  • articulated/configs/rl/baseline_tuned2.yaml — Tuned baseline, lower LR
  • articulated/configs/rl/embedded.yaml — GRU-embedded obs config
  • articulated/configs/estimation/rnn_so2.yaml — RNN estimation config (SO2)
  • articulated/configs/estimation/gru_so2.yaml — GRU estimation config (SO2)
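For reference, the raw/embedded mode switch in the ReacherWithEmbedding wrapper presumably works along these lines (a minimal numpy-only sketch; the class shape, `embed_fn` hook, and mode names are illustrative assumptions, not the actual articulated/rl/environment.py code):

```python
import numpy as np

class ReacherWithEmbedding:
    """Sketch of an observation wrapper with 'raw' and 'embedded' modes.

    In 'raw' mode the underlying observation passes through unchanged;
    in 'embedded' mode a caller-supplied embed_fn (e.g. a GRU hidden
    state h_t) is concatenated onto the raw features.
    """

    def __init__(self, env, embed_fn=None, mode="raw"):
        assert mode in ("raw", "embedded")
        self.env = env
        self.embed_fn = embed_fn
        self.mode = mode

    def _observation(self, obs):
        if self.mode == "raw" or self.embed_fn is None:
            return obs
        h_t = self.embed_fn(obs)           # e.g. GRU hidden state
        return np.concatenate([h_t, obs])  # [h_t | raw features]

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        return self._observation(obs), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return self._observation(obs), reward, terminated, truncated, info
```

In the real module a `gymnasium.ObservationWrapper` subclass with a correspondingly enlarged `observation_space` would be the idiomatic base, so that SB3's shape checks see the embedded dimensionality.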

Modified Files

  • articulated/shared/robot_arm.py — Added RobotArm2DKinematics for SO(2)
  • articulated/estimation/datamodule.py — SO(2) manifold support
  • articulated/estimation/model.py — GRU support + get_embedding() interface
  • articulated/estimation/train.py — Training script updates

Results

| Condition                  | Mean Reward | Timesteps |
| -------------------------- | ----------- | --------- |
| Baseline PPO (raw obs)     | -3.80       | 500K      |
| Embedded RNN (val/acc=24%) | -9.67       | 1M        |
| Embedded GRU (val/acc=99%) | -6.19       | 1M        |

Notes

  • GRU with kappa=20, seq_length=50 achieves val/acc=0.993
  • Embedded obs = [h_t | cos/sin joints | target_pos | fingertip_vec]
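The PCA step in fit_pca.py is presumably along these lines (a scikit-learn sketch; the function name, component count, and dimensions are assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_embedding_pca(embeddings, n_components=8):
    """Fit PCA on a matrix of GRU hidden states (n_samples, hidden_dim)
    to keep the embedded observation compact for PPO."""
    pca = PCA(n_components=n_components)
    pca.fit(embeddings)
    return pca

# Hypothetical usage: compress hidden states before concatenating into
# [h_t | cos/sin joints | target_pos | fingertip_vec].
```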

