Description
NemoClaw onboarding fails when "Local Ollama" is selected.
[Environment]
Device: Brev via command line
Node.js: v22.22.2
npm: 10.9.7
Docker: 29.1.3
NemoClaw: v0.0.24
[Steps to Reproduce]
- Install NemoClaw: curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
- Select "7) Local Ollama (localhost:11434) — running (suggested)" -> "2) nemotron-3-nano:30b"
[Expected Result]
NemoClaw should onboard the local Ollama correctly.
[Actual Result]
Ollama starter models:
1) qwen2.5:7b
2) nemotron-3-nano:30b
3) Other...
No local Ollama models are installed yet. Choose one to pull and load now.
Choose model [1]: 2
Pulling Ollama model: nemotron-3-nano:30b
[GIN] 2026/04/24 - 06:12:03 | 200 | 44.604µs | 127.0.0.1 | HEAD "/"
pulling manifest
time=2026-04-24T06:12:04.520Z level=INFO source=download.go:179 msg="downloading a70437c41b3b in 25 1 GB part(s)"
time=2026-04-24T06:12:45.721Z level=INFO source=download.go:179 msg="downloading bca58c750377 in
pulling a70437c41b3b: 100% ▕█████████████████████████████████▏ 24 GB
pulling bca58c750377: 100% ▕█████████████████████████████████▏ 10 KB
pulling 12e88b2a8727: 100% ▕█████████████████████████████████▏ 28 B
pulling 12bee8c08a36: 100% ▕█████████████████████████████████▏ 488 B
verifying sha256 digest
writing manifest
success
Loading Ollama model: nemotron-3-nano:30b
time=2026-04-24T06:13:06.904Z level=INFO source=server.go:444 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 34269"
time=2026-04-24T06:13:07.118Z level=INFO source=server.go:259 msg="enabling flash attention"
time=2026-04-24T06:13:07.119Z level=INFO source=server.go:444 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/shadeform/.ollama/models/blobs/sha256-a70437c41b3b0b768c48737e15f8160c90f13dc963f5226aabb3a160f708d1ce --port 37637"
time=2026-04-24T06:13:07.119Z level=INFO source=sched.go:484 msg="system memory" total="98.3 GiB" free="94.9 GiB" free_swap="4.0 GiB"
time=2026-04-24T06:13:07.119Z level=INFO source=sched.go:491 msg="gpu memory" id=GPU-c6ecd61c-e96f-8366-5a00-ba7ad810947d library=CUDA available="78.8 GiB" free="79.3 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-04-24T06:13:07.119Z level=INFO source=server.go:771 msg="loading model" "model layers"=53 requested=-1
time=2026-04-24T06:13:07.130Z level=INFO source=runner.go:1417 msg="starting ollama engine"
time=2026-04-24T06:13:07.130Z level=INFO source=runner.go:1452 msg="Server listening on 127.0.0.1:37637"
time=2026-04-24T06:13:07.141Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:262144 KvCacheType: NumThreads:14 GPULayers:53[ID:GPU-c6ecd61c-e96f-8366-5a00-ba7ad810947d Layers:53(0..52)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-24T06:13:07.166Z level=INFO source=ggml.go:136 msg="" architecture=nemotron_h_moe file_type=Q4_K_M name="NVIDIA Nemotron 3 Nano 30B A3B BF16" description="" num_tensors=401 num_key_values=118
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-icelake.so
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA A100-SXM4-80GB, compute capability 8.0, VMM: yes, ID: GPU-c6ecd61c-e96f-8366-5a00-ba7ad810947d
load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so
time=2026-04-24T06:13:07.277Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,520,600,610,700,750,800,860,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2026-04-24T06:13:07.976Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:262144 KvCacheType: NumThreads:14 GPULayers:53[ID:GPU-c6ecd61c-e96f-8366-5a00-ba7ad810947d Layers:53(0..52)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-24T06:13:08.795Z level=INFO source=runner.go:1290 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:262144 KvCacheType: NumThreads:14 GPULayers:53[ID:GPU-c6ecd61c-e96f-8366-5a00-ba7ad810947d Layers:53(0..52)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-24T06:13:08.795Z level=INFO source=ggml.go:482 msg="offloading 52 repeating layers to GPU"
time=2026-04-24T06:13:08.795Z level=INFO source=ggml.go:489 msg="offloading output layer to GPU"
time=2026-04-24T06:13:08.795Z level=INFO source=ggml.go:494 msg="offloaded 53/53 layers to GPU"
time=2026-04-24T06:13:08.795Z level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="22.4 GiB"
time=2026-04-24T06:13:08.795Z level=INFO source=device.go:245 msg="model weights" device=CPU size="231.0 MiB"
time=2026-04-24T06:13:08.795Z level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="2.7 GiB"
time=2026-04-24T06:13:08.795Z level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="842.5 MiB"
time=2026-04-24T06:13:08.795Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="5.2 MiB"
time=2026-04-24T06:13:08.795Z level=INFO source=device.go:272 msg="total memory" size="26.1 GiB"
time=2026-04-24T06:13:08.795Z level=INFO source=sched.go:561 msg="loaded runners" count=1
time=2026-04-24T06:13:08.795Z level=INFO source=server.go:1364 msg="waiting for llama runner to start responding"
time=2026-04-24T06:13:08.795Z level=INFO source=server.go:1398 msg="waiting for server to become available" status="llm server loading model"
time=2026-04-24T06:13:12.558Z level=INFO source=server.go:1402 msg="llama runner started in 5.44 seconds"
[GIN] 2026/04/24 - 06:13:12 | 200 | 6.149578314s | 127.0.0.1 | POST "/api/generate"
[GIN] 2026/04/24 - 06:13:13 | 200 | 6.475856842s | 127.0.0.1 | POST "/api/generate"
[GIN] 2026/04/24 - 06:13:13 | 200 | 559.963362ms | 127.0.0.1 | POST "/v1/responses"
Responses API available — OpenClaw will use openai-responses.
ℹ Using chat completions API (Ollama tool calls require /v1/chat/completions)
[4/8] Setting up inference provider
──────────────────────────────────────────────────
✓ Active gateway set to 'nemoclaw'
[GIN] 2026/04/24 - 06:13:13 | 200 | 227.473µs | 127.0.0.1 | GET "/api/tags"
Local Ollama is responding on 127.0.0.1, but containers cannot reach the auth proxy at http://host.openshell.internal:11435. Ensure the Ollama auth proxy is running.
shadeform@brev-rsebf7yzk:~$ ps -elf | grep ollama
0 S shadefo+ 3938848 3851849 15 80 0 - 815477 futex_ 06:06 pts/1 00:01:33 ollama serve
0 S shadefo+ 3955115 1 0 80 0 - 253150 ep_pol 06:11 ? 00:00:00 /home/shadeform/.nvm/versions/node/v22.22.2/bin/node /home/shadeform/.nemoclaw/source/scripts/ollama-auth-proxy.js
0 S shadefo+ 3957711 3938848 7 80 0 - 59883714 futex_ 06:13 pts/1 00:00:12 /usr/local/bin/ollama runner --ollama-engine --model /home/shadeform/.ollama/models/blobs/sha256-a70437c41b3b0b768c48737e15f8160c90f13dc963f5226aabb3a160f708d1ce --port 37637
0 S shadefo+ 3964232 3851849 0 80 0 - 1653 pipe_r 06:16 pts/1 00:00:00 grep --color=auto ollama
shadeform@brev-rsebf7yzk:~$ ss -tlnp | grep 11435
LISTEN 0 511 0.0.0.0:11435 0.0.0.0:* users:(("node",pid=3955115,fd=21)
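The `ss` output above shows the auth proxy is listening on 0.0.0.0:11435 on the host, so the error message points at container-to-host connectivity: containers apparently cannot resolve or route to `host.openshell.internal`. A minimal sketch of the kind of TCP reachability probe involved (the helper name and the final example call are illustrative, not NemoClaw's actual check):

```python
import socket

def can_reach(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False

# On the host this succeeds (proxy bound to 0.0.0.0:11435); the onboarding
# failure suggests the equivalent probe from inside a container does not:
#   can_reach("host.openshell.internal", 11435)
```

Running such a probe both on the host (against 127.0.0.1:11435) and inside a container (against host.openshell.internal:11435) would isolate whether the problem is the proxy itself or the container's name resolution for that host alias.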
Bug Details
| Field       | Value |
| Priority    | Unprioritized |
| Action      | Dev - Open - To fix |
| Disposition | Open issue |
| Module      | Machine Learning - NemoClaw |
| Keyword     | NemoClaw, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Onboard, NemoClaw-SWQA-RelBlckr-Recommended |
[NVB#6110214]