Jeff/fix/rosnav8 by jeff-hykin · Pull Request #1791 · dimensionalOS/dimos

jeff-hykin · 2026-04-16T05:16:44Z

Pre-requisite PRs

unity-sim — #1681 (merged)
docker — #1431 (merged)
dimos-viewer (separate repo PR) websockets and --connect
native_rebuild3 — #1599
rconnect2 — #1643

Will be perpetually re-testing hardware after merges/changes to ^

631af1e tested for 2 hours on the g1 (entire battery! instantly responsive after 2hrs)
c54578a044f892fe7f231b468f2f3512adf4282a (later) hardware tested and sim tested.

Problem

Robot blueprints each duplicated viewer backend selection logic (foxglove vs rerun vs rerun-web vs rerun-connect), making it error-prone to add new backends or change behavior. The RerunBridge also lacked a WebSocket server for dimos-viewer keyboard/click events in --connect mode.

Solution

port over the whole nav stack

Breaking Changes

None

How to Test

git checkout jeff/fix/rosnav8

# last hardware tested commit
git checkout c54578a044f892fe7f231b468f2f3512adf4282a

# simulation
uv run dimos run unitree-g1-nav-sim

# real
uv run dimos run unitree-g1-nav-onboard

Contributor License Agreement

I have read and approved the CLA.

…nto jeff/fix/rosnav8

…dimos into jeff/fix/rosnav8

…ative_rebuild2

Twelve fixes from Paul-style code review of #1780: native_module.py: - _maybe_build: use did_change(update=False) + post-success update_cache so failed builds never poison the rebuild cache - delete stray "Source files changed" log line that fired on every call - stop(): per-instance _stop_lock; capture proc/watchdog refs under the lock, do signal/wait/join outside the lock to avoid the watchdog-self- join deadlock; second caller no-ops via _stopping - _read_log_stream: receive pid as a parameter instead of reaching into self._process.pid (TOCTOU with stop()) - _child_preexec_linux + ctypes hoisted to module scope under a sys.platform guard; no more re-import per start() - _clear_nix_executable: gracefully handle cwd=None (skip the walk entirely), use .resolve() comparison so a symlinked cwd still terminates the walk, refuse to walk past cwd's tree change_detect.py: - fold max_file_size into _hash_files digest so different thresholds against the same cache_name can't corrupt each other - new _locked_cache_file context manager — flock the .hash file directly in "a+" mode; no more orphan .lock sidecars accumulating in the cache dir; did_change/update_cache/clear_cache all share the helper Tests: - new test_should_rebuild_true_bypasses_change_check for the explicit "always rebuild" path - new test_failed_build_does_not_mark_cache_clean for the update=False retry semantics Build system: - tiledb darwin overlay: filter patches by name instead of dropping all of them, so a future security patch from nixpkgs survives - livox_sdk_config.hpp: honor $TMPDIR on Darwin instead of hardcoding /tmp Perf script: - compute cache_name inline (using the same inspect-based source_file pattern) instead of calling the inlined _build_cache_name method All 26 tests across test_change_detect.py + test_native_module.py + test_native_rebuild.py pass. ruff + mypy clean. mid360_native and fastlio2_native still build end-to-end on aarch64-darwin. Linux drvPaths for libpqxx/tiledb/pcl/vtk verified unchanged by the overlay.

The class name already says "config for extended terrain map" — the docstring added no information beyond the name.

Both removed strings restated the identifier without adding info: - TUIControlConfig: "Configuration for the TUI controller." - TUIControlModule._handle_key: "Process a single keypress."

Per critic audit (commit 0171415 claimed reverts that didn't actually land). Real reverts now applied — verified per file by checking whether any nav_stack code imports/uses the changed API. Reverted to origin/dev: - dimos/msgs/geometry_msgs/PoseWithCovariance.py — only test_Odometry.py uses it (not nav_stack); copy-constructor consolidation isn't needed. - dimos/robot/unitree/go2/blueprints/smart/unitree_go2.py — wrong robot. - dimos/visualization/rerun/constants.py — RERUN_WEB_PORT change is unrelated to nav_stack. - pyproject.toml: dropped xxhash>=3.0.0 (unused) and the duplicate open3d-unofficial-arm without platform marker. Kept the navigation extra (gtsam-extended for PGO), the pytest override-dep, and the ruff main.cpp exclude patterns. - uv.lock regenerated against pruned pyproject.toml. Re-deleted (had been restored by my earlier blanket checkout): - dimos/robot/position_stream.py - dimos/robot/ros_command_queue.py - dimos/robot/unitree/rosnav.py Kept (verified in-scope): - dimos/core/global_config.py — `build_native` field is consumed by native_module.py which all nav_stack native modules inherit from. - dimos/core/native_module.py — used by terrain_analysis, far_planner, local_planner, path_follower, tare_planner, fastlio2. - dimos/msgs/nav_msgs/{ContourPolygons3D,GraphNodes3D,LineSegments3D}.py — imported by far_planner. - dimos/robot/config.py — G1 config (used by nav_stack's blueprint test) requires the new optional `end_effector_link` field. Reverting this broke test_g1_nav_sim_blueprint_importable. - dimos/simulation/unity/module.py — used by test_unity_sim, test_sim_pipeline, test_cross_wall_planning{,_simple}. test_sim_pipeline.py: tui_control was deleted from the branch upstream (commit 08c908e). Removed the dangling `test_tui_publishes_twist_via_transport` test that imported the now- gone module.

Just restated the function name and return type annotation.

Just restated the file's purpose ("Unit tests for Costmap + A* used by SimplePlanner") — already obvious from the path and what's tested.

Same description already lives in the module-level docstring above.

Just paraphrased the file's name and the MovementManager's own docstring.

Per project python-style rule (no magic numbers). Audited ``dimos/navigation/nav_stack/`` production code (excluded tests + auto- generated lcm files); pydantic field defaults like ``sensor_range: float = 20.0`` were left alone since the field name is the explanation. Changes: - ``modules/movement_manager/movement_manager.py``: extract ``MAX_CLICK_HORIZONTAL_M = 500.0`` and ``MAX_CLICK_VERTICAL_M = 50.0`` module-level constants for the click-bounds sanity check that was ``if abs(msg.x) > 500 or abs(msg.y) > 500 or abs(msg.z) > 50:``. - ``modules/pgo/pgo.py``: extract ``_MIN_ICP_INLIERS = 10`` (used by ``_icp`` to bail when the correspondence count is too low to constrain the alignment) and ``_MIN_KEYFRAMES_FOR_LOOP_SEARCH = 10`` (used by ``_SimplePGO.search_for_loops``). - ``main.py::_floor_quad`` (the ground-plane mesh helper): renamed ``s`` / ``z`` locals to ``half_size`` / ``z_below_ground``, and pulled the ``[40, 40, 40, 120]`` RGBA value out into a named ``floor_color_rgba`` local. Color-ramp values in ``_global_map_override`` and ``_terrain_map_override`` (``30 + z_norm * 30`` etc.) left as-is — they're already documented by the explicit ``Low z = (30, 80, 200) → High z = (60, 220, 100)`` comments above each block. Config-dict values being forwarded to sub-modules (e.g. ``"voxel_point_update_threshold": 100``) left as-is — the dict key is the explanation, same as a pydantic field default.

Class docstrings like "Test ICP matching functionality" and method docstrings like "PGO module should declare the right input/output ports" just paraphrased the identifiers above them. Removed: - TestICP class docstring - TestEdgeCases class docstring - TestPGOWrapper class docstring - test_pgo_module_has_correct_ports docstring - test_pgo_config_defaults docstring

Removed class and method docstrings that just paraphrased the identifiers (e.g. "Test that PGO produces a valid global map from keyframes" on TestGlobalMapAccumulation). Kept docstrings that add specifics about assertions or inputs.

Co-authored-by: Paul Nechifor <paul@nechifor.net>

…o jeff/fix/rosnav8

greptile-apps · 2026-04-28T06:46:38Z

Greptile Summary

This PR ports the full CMU SmartNav stack into the dimos framework: a composable create_nav_stack() factory replaces per-blueprint viewer-backend duplication, adding TerrainAnalysis, LocalPlanner, PathFollower, PGO (GTSAM iSAM2 loop closure), SimplePlanner (pure-Python A*), FarPlanner, and MovementManager. Two new G1 blueprints (unitree-g1-nav-onboard, unitree-g1-nav-sim) exercise the stack on real hardware and Unity sim respectively, with correct raw-vs-corrected odometry routing per CMU ICRA 2022 Fig. 11.

P1 (dds_sdk.py stand_up): if _get_fsm_id() returns None (DDS query failure), the stale variable makes None != FsmState.AI_MODE evaluate to True, issuing SetFsmId(AI_MODE) to the physical robot in an unknown state with no guard.

Confidence Score: 3/5

P1 in stand_up() could issue an unsolicited AI-mode command to the physical robot when DDS state query fails; safe to merge for sim, needs fix before onboard use.

One P1 (unconditional SetFsmId on None FSM ID in physical robot control) pulls the ceiling to 4; combined with the scope of the change (7000+ line net addition, new nav stack, new hardware SDK, new sim auto-download) a score of 3 is appropriate.

dimos/robot/unitree/g1/effectors/high_level/dds_sdk.py — stand_up() null-fsm guard

Important Files Changed

Filename	Overview
dimos/robot/unitree/g1/effectors/high_level/dds_sdk.py	New G1 high-level DDS SDK module; stale `fsm_id` in `stand_up()` can issue AI-mode command to robot when state is unknown (P1)
dimos/navigation/nav_stack/main.py	New composable nav stack factory (`create_nav_stack`) wiring TerrainAnalysis, LocalPlanner, PathFollower, FAR/Simple planner, PGO, and MovementManager with correct corrected-vs-raw odometry routing
dimos/navigation/nav_stack/modules/simple_planner/simple_planner.py	New pure-Python A* global planner with costmap, inflation, stuck detection, and lookahead; directly accesses private `Costmap` internals in `_classify_points` (P2)
dimos/navigation/nav_stack/modules/pgo/pgo.py	New Python PGO module with GTSAM iSAM2, ICP loop closure, voxel downsampling, and keyframe management; implementation looks correct
dimos/core/native_module.py	Refactored NativeModule with improved stop/watchdog locking, `prctl(PR_SET_PDEATHSIG)` on Linux, auto-build toggle, and `cli_name_override` for camelCase CLI args
dimos/simulation/unity/module.py	Added Google Drive scene auto-download (`gdown.download_folder`), terrain inclination fitting config, and sensor offsets; downloads entire Drive folder instead of target scene only (P2)
dimos/robot/unitree/g1/blueprints/navigation/unitree_g1_nav_onboard.py	New onboard navigation blueprint composing FastLio2, nav stack, G1HighLevelDdsSdk, and vis_module with correct odometry remappings
dimos/robot/unitree/g1/blueprints/navigation/unitree_g1_nav_sim.py	New sim navigation blueprint composing UnityBridgeModule, nav stack with SimplePlanner, and vis_module; terrain_map→terrain_map_ext remapping for Unity Z-height is correct
dimos/robot/config.py	Made `end_effector_link` optional with a clear error on manipulation use; added `height_clearance`, `width_clearance`, and `internal_odom_offsets` fields; moved imports to top of file

Sequence Diagram

sequenceDiagram
    participant UI as dimos-viewer / Foxglove
    participant MM as MovementManager
    participant SP as SimplePlanner / FarPlanner
    participant LP as LocalPlanner (C++)
    participant PF as PathFollower (C++)
    participant PGO as PGO (GTSAM)
    participant TA as TerrainAnalysis (C++)
    participant HW as G1HighLevelDdsSdk / Unity

    UI->>MM: clicked_point
    MM->>SP: goal (corrected_odometry frame)
    PGO-->>MM: corrected_odometry
    PGO-->>TA: corrected_odometry
    PGO-->>SP: corrected_odometry
    SP->>LP: way_point (map frame)
    TA->>LP: terrain_map (obstacles)
    LP->>PF: path (raw odom frame)
    PF->>MM: nav_cmd_vel
    MM->>HW: cmd_vel
    HW-->>PGO: odometry (raw SLAM)
    HW-->>LP: registered_scan
    PGO->>PGO: loop closure (ICP + iSAM2)
    PGO-->>UI: corrected_odometry + global_map

_{Reviews (1): Last reviewed commit: "test: hoist remaining inline ros1 import..." | Re-trigger Greptile}

greptile-apps · 2026-04-28T06:46:42Z

+                fsm_id = self._get_fsm_id()
+                if fsm_id == FsmState.ZERO_TORQUE:
+                    logger.info("Robot in zero torque, enabling damp mode...")
+                    self.loco_client.SetFsmId(FsmState.DAMP)
+                    time.sleep(self._standup_step_delay / 3)
+                if fsm_id != FsmState.AI_MODE:
+                    logger.info("Starting AI mode...")
+                    self.loco_client.SetFsmId(FsmState.AI_MODE)


Stale fsm_id causes unconditional AI-mode command on unknown state

fsm_id is fetched once at line 269. If _get_fsm_id() returns None (DDS query failure), the comparison None != FsmState.AI_MODE evaluates to True, so SetFsmId(FsmState.AI_MODE) is issued to the robot even when its current state is completely unknown — bypassing the ZERO_TORQUE → DAMP guard. Additionally, when the state is ZERO_TORQUE, after setting DAMP the stale fsm_id (still ZERO_TORQUE) causes the code to immediately issue another SetFsmId(AI_MODE) without re-querying, leaving no opportunity to confirm the transition succeeded.

Suggested change

fsm_id = self._get_fsm_id()

if fsm_id == FsmState.ZERO_TORQUE:

logger.info("Robot in zero torque, enabling damp mode...")

self.loco_client.SetFsmId(FsmState.DAMP)

time.sleep(self._standup_step_delay / 3)

if fsm_id != FsmState.AI_MODE:

logger.info("Starting AI mode...")

self.loco_client.SetFsmId(FsmState.AI_MODE)

fsm_id = self._get_fsm_id()

if fsm_id is None:

logger.warning("Could not read FSM ID; aborting stand_up to avoid unsafe state transition")

return False

if fsm_id == FsmState.ZERO_TORQUE:

logger.info("Robot in zero torque, enabling damp mode...")

self.loco_client.SetFsmId(FsmState.DAMP)

time.sleep(self._standup_step_delay / 3)

fsm_id = self._get_fsm_id() or FsmState.DAMP

if fsm_id != FsmState.AI_MODE:

greptile-apps · 2026-04-28T06:46:43Z

+        mask = heights > 0.0
+        if not np.any(mask):
+            return
+        xs = points[mask, 0]
+        ys = points[mask, 1]
+        hs = heights[mask]
+        cell_size = cm.cell_size
+        ixs = np.floor(xs / cell_size).astype(np.int64)
+        iys = np.floor(ys / cell_size).astype(np.int64)
+        # Group by cell and take max height per cell (vectorized).
+        keys = np.column_stack((ixs, iys))
+        _, inverse, counts = np.unique(keys, axis=0, return_inverse=True, return_counts=True)
+        max_h = np.full(len(counts), float("-inf"))
+        np.maximum.at(max_h, inverse, hs)
+        unique_keys = keys[np.unique(inverse, return_index=True)[1]]
+        dirty = False
+        for i in range(len(unique_keys)):
+            key = (int(unique_keys[i, 0]), int(unique_keys[i, 1]))
+            h = float(max_h[i])
+            prev = cm._heights.get(key, float("-inf"))
+            if h > prev:
+                cm._heights[key] = h
+                dirty = True
+        if dirty:
+            cm._blocked_dirty = True
+
+    def _fresh_costmap(self) -> Costmap:
+        return Costmap(
+            cell_size=self.config.cell_size,
+            obstacle_height=self.config.obstacle_height_threshold,
+            inflation_radius=self.config.inflation_radius,


_classify_points bypasses Costmap.update() and accesses private state

_classify_points writes directly to cm._heights and sets cm._blocked_dirty instead of calling the public cm.update(x, y, height) method. This tightly couples the caller to internal Costmap layout — any future change to how Costmap tracks heights or invalidates the blocked set will silently break the vectorized path here. The public API (update) already does the same max-over-history tracking in a single dict lookup.

greptile-apps · 2026-04-28T06:46:44Z

+
+    if not zip_path.exists():
+        raise FileNotFoundError(
+            f"Failed to download scene '{scene}'. "
+            f"Check https://drive.google.com/drive/folders/{_GDRIVE_FOLDER_ID}"
+        )
+
+    extract_dir = dest_dir / scene
+    if not extract_dir.exists():
+        logger.info(f"Extracting {zip_path}...")
+        with zipfile.ZipFile(zip_path, "r") as zf:


gdown.download_folder downloads entire folder instead of target scene

gdown.download_folder(id=_GDRIVE_FOLDER_ID, ...) fetches every file in the Google Drive folder, not just the requested scene. If the folder contains multiple large scene zips, this silently downloads them all before locating the one that was actually needed. Consider downloading by individual file ID or listing the folder first and filtering to the matching scene zip.

jeff-hykin added 30 commits April 11, 2026 19:36

native rebuild changes

817ddfd

fix: ruff format and update uv.lock

b07cefb

minor fixes

bf80a6b

delete docker nav

a5ff2ca

all vis changes

9423ae4

Merge branch 'jeff/fix/rconnect2' of github.com:dimensionalOS/dimos i…

cf099d8

…nto jeff/fix/rosnav8

-

3e0cdef

fixup

6d6151c

add a conventions.md to dev docs

ac62445

fixup cmd_vel timer edgecase leaks

36492b1

switch cmd_vel_scaling to be on rerun

306bd2b

add convention

f7277d9

fixup

75bf186

Merge branch 'jeff/fix/rconnect2' of github.com:dimensionalOS/dimos i…

9237a97

…nto jeff/fix/rosnav8

Merge branch 'jeff/feat/native_rebuild2' of github.com:dimensionalOS/…

78aba2d

…dimos into jeff/fix/rosnav8

fixup signal handling (looks worse but in practice behaves better)

b8b9981

cleanup

2718e73

make streaming

7641683

cleaner cache building logic (update file cache on check time)

46df237

make dependencies work on MacOS

c1743c9

add nix performance check

efc5aa9

get build working on macos

8ed96a6

add rebuild_on_change

3e38209

add perf test

bb69f20

limit the slow rebuild to macos

2a56733

missed edgecase

29875f8

simplify logging

446871a

add max_file_size

0dc5152

Merge branch 'dev' of github.com:dimensionalOS/dimos into jeff/feat/n…

417796a

…ative_rebuild2

jeff-hykin and others added 24 commits April 27, 2026 22:32

docs: drop tautological docstring on TerrainMapExtConfig

e9f52a2

The class name already says "config for extended terrain map" — the docstring added no information beyond the name.

docs: drop tautological docstrings in tui_control

e666a2b

Both removed strings restated the identifier without adding info: - TUIControlConfig: "Configuration for the TUI controller." - TUIControlModule._handle_key: "Process a single keypress."

-

08c908e

docs: drop tautological docstring on _icp

60406b7

Just restated the function name and return type annotation.

docs: drop tautological module docstring on test_simple_planner

4f0f7b8

Just restated the file's purpose ("Unit tests for Costmap + A* used by SimplePlanner") — already obvious from the path and what's tested.

docs: drop tautological docstring on TestCrossWallPlanning

b6e0888

Same description already lives in the module-level docstring above.

docs: drop tautological module docstring on test_movement_manager

7f1c214

Just paraphrased the file's name and the MovementManager's own docstring.

-

e2e24de

remove most tests

1da9d8e

-

5f3a06e

cleanup

6f29bc0

-

6f116e0

-

05fe984

-

35a15d9

-

1e7e0fe

Apply suggestion from @paul-nechifor

eb91246

Co-authored-by: Paul Nechifor <paul@nechifor.net>

-

a0d5fb8

Merge branch 'jeff/fix/rosnav8' of github.com:dimensionalOS/dimos int…

625375a

…o jeff/fix/rosnav8

-

2dd3ff7

test: hoist remaining inline ros1 imports in test_unity_sim

18367d8

jeff-hykin marked this pull request as ready for review April 28, 2026 06:41

greptile-apps Bot reviewed Apr 28, 2026

View reviewed changes

jeff-hykin added a commit that referenced this pull request Apr 28, 2026

fix: add pr_responses.yaml for unaddressed leshy comments on PR #1791

c485545

jeff-hykin removed the PlzReview label Apr 28, 2026

jeff-hykin marked this pull request as draft April 28, 2026 13:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Jeff/fix/rosnav8#1791

Jeff/fix/rosnav8#1791
jeff-hykin wants to merge 186 commits intodevfrom
jeff/fix/rosnav8

jeff-hykin commented Apr 16, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented Apr 28, 2026

Uh oh!

greptile-apps Bot Apr 28, 2026

Uh oh!

greptile-apps Bot Apr 28, 2026

Uh oh!

greptile-apps Bot Apr 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

jeff-hykin commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pre-requisite PRs

Problem

Solution

Breaking Changes

How to Test

Contributor License Agreement

Uh oh!

greptile-apps Bot commented Apr 28, 2026

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jeff-hykin commented Apr 16, 2026 •

edited

Loading