
[Brev] Fails to onboard local Ollama #2425

@zNeill

Description

NemoClaw onboarding fails when "Local Ollama" is selected.

[Environment]
Device: Brev via command line
Node.js: v22.22.2
npm: 10.9.7
Docker: Docker version 29.1.3
NemoClaw: v0.0.24

[Steps to Reproduce]

  1. Install NemoClaw: curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
  2. Select "7) Local Ollama (localhost:11434) — running (suggested)", then "2) nemotron-3-nano:30b"

[Expected Result]
NemoClaw onboards the local Ollama instance successfully.

[Actual Result]

 Ollama starter models:
    1) qwen2.5:7b
    2) nemotron-3-nano:30b
    3) Other...

  No local Ollama models are installed yet. Choose one to pull and load now.

  Choose model [1]: 2
  Pulling Ollama model: nemotron-3-nano:30b
[GIN] 2026/04/24 - 06:12:03 | 200 |      44.604µs |       127.0.0.1 | HEAD     "/"
pulling manifest
time=2026-04-24T06:12:04.520Z level=INFO source=download.go:179 msg="downloading a70437c41b3b in 25 1 GB part(s)"
time=2026-04-24T06:12:45.721Z level=INFO source=download.go:179 msg="downloading bca58c750377 in …"
pulling a70437c41b3b: 100% ▕█████████████████████████████████▏  24 GB
pulling bca58c750377: 100% ▕█████████████████████████████████▏  10 KB
pulling 12e88b2a8727: 100% ▕█████████████████████████████████▏   28 B
pulling 12bee8c08a36: 100% ▕█████████████████████████████████▏  488 B
verifying sha256 digest
writing manifest
success
  Loading Ollama model: nemotron-3-nano:30b
time=2026-04-24T06:13:06.904Z level=INFO source=server.go:444 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 34269"
time=2026-04-24T06:13:07.118Z level=INFO source=server.go:259 msg="enabling flash attention"
time=2026-04-24T06:13:07.119Z level=INFO source=server.go:444 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/shadeform/.ollama/models/blobs/sha256-a70437c41b3b0b768c48737e15f8160c90f13dc963f5226aabb3a160f708d1ce --port 37637"
time=2026-04-24T06:13:07.119Z level=INFO source=sched.go:484 msg="system memory" total="98.3 GiB" free="94.9 GiB" free_swap="4.0 GiB"
time=2026-04-24T06:13:07.119Z level=INFO source=sched.go:491 msg="gpu memory" id=GPU-c6ecd61c-e96f-8366-5a00-ba7ad810947d library=CUDA available="78.8 GiB" free="79.3 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-04-24T06:13:07.119Z level=INFO source=server.go:771 msg="loading model" "model layers"=53 requested=-1
time=2026-04-24T06:13:07.130Z level=INFO source=runner.go:1417 msg="starting ollama engine"
time=2026-04-24T06:13:07.130Z level=INFO source=runner.go:1452 msg="Server listening on 127.0.0.1:37637"
time=2026-04-24T06:13:07.141Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:262144 KvCacheType: NumThreads:14 GPULayers:53[ID:GPU-c6ecd61c-e96f-8366-5a00-ba7ad810947d Layers:53(0..52)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-24T06:13:07.166Z level=INFO source=ggml.go:136 msg="" architecture=nemotron_h_moe file_type=Q4_K_M name="NVIDIA Nemotron 3 Nano 30B A3B BF16" description="" num_tensors=401 num_key_values=118
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-icelake.so
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA A100-SXM4-80GB, compute capability 8.0, VMM: yes, ID: GPU-c6ecd61c-e96f-8366-5a00-ba7ad810947d
load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so
time=2026-04-24T06:13:07.277Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,520,600,610,700,750,800,860,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2026-04-24T06:13:07.976Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:262144 KvCacheType: NumThreads:14 GPULayers:53[ID:GPU-c6ecd61c-e96f-8366-5a00-ba7ad810947d Layers:53(0..52)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-24T06:13:08.795Z level=INFO source=runner.go:1290 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:262144 KvCacheType: NumThreads:14 GPULayers:53[ID:GPU-c6ecd61c-e96f-8366-5a00-ba7ad810947d Layers:53(0..52)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-24T06:13:08.795Z level=INFO source=ggml.go:482 msg="offloading 52 repeating layers to GPU"
time=2026-04-24T06:13:08.795Z level=INFO source=ggml.go:489 msg="offloading output layer to GPU"
time=2026-04-24T06:13:08.795Z level=INFO source=ggml.go:494 msg="offloaded 53/53 layers to GPU"
time=2026-04-24T06:13:08.795Z level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="22.4 GiB"
time=2026-04-24T06:13:08.795Z level=INFO source=device.go:245 msg="model weights" device=CPU size="231.0 MiB"
time=2026-04-24T06:13:08.795Z level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="2.7 GiB"
time=2026-04-24T06:13:08.795Z level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="842.5 MiB"
time=2026-04-24T06:13:08.795Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="5.2 MiB"
time=2026-04-24T06:13:08.795Z level=INFO source=device.go:272 msg="total memory" size="26.1 GiB"
time=2026-04-24T06:13:08.795Z level=INFO source=sched.go:561 msg="loaded runners" count=1
time=2026-04-24T06:13:08.795Z level=INFO source=server.go:1364 msg="waiting for llama runner to start responding"
time=2026-04-24T06:13:08.795Z level=INFO source=server.go:1398 msg="waiting for server to become available" status="llm server loading model"
time=2026-04-24T06:13:12.558Z level=INFO source=server.go:1402 msg="llama runner started in 5.44 seconds"
[GIN] 2026/04/24 - 06:13:12 | 200 |  6.149578314s |       127.0.0.1 | POST     "/api/generate"
[GIN] 2026/04/24 - 06:13:13 | 200 |  6.475856842s |       127.0.0.1 | POST     "/api/generate"
[GIN] 2026/04/24 - 06:13:13 | 200 |  559.963362ms |       127.0.0.1 | POST     "/v1/responses"
  Responses API available — OpenClaw will use openai-responses.
  ℹ Using chat completions API (Ollama tool calls require /v1/chat/completions)

  [4/8] Setting up inference provider
  ──────────────────────────────────────────────────
✓ Active gateway set to 'nemoclaw'
[GIN] 2026/04/24 - 06:13:13 | 200 |     227.473µs |       127.0.0.1 | GET      "/api/tags"
  Local Ollama is responding on 127.0.0.1, but containers cannot reach the auth proxy at http://host.openshell.internal:11435. Ensure the Ollama auth proxy is running.
shadeform@brev-rsebf7yzk:~$ ps -elf | grep ollama
0 S shadefo+ 3938848 3851849 15  80   0 - 815477 futex_ 06:06 pts/1   00:01:33 ollama serve
0 S shadefo+ 3955115       1  0  80   0 - 253150 ep_pol 06:11 ?       00:00:00 /home/shadeform/.nvm/versions/node/v22.22.2/bin/node /home/shadeform/.nemoclaw/source/scripts/ollama-auth-proxy.js
0 S shadefo+ 3957711 3938848  7  80   0 - 59883714 futex_ 06:13 pts/1 00:00:12 /usr/local/bin/ollama runner --ollama-engine --model /home/shadeform/.ollama/models/blobs/sha256-a70437c41b3b0b768c48737e15f8160c90f13dc963f5226aabb3a160f708d1ce --port 37637
0 S shadefo+ 3964232 3851849  0  80   0 -  1653 pipe_r 06:16 pts/1    00:00:00 grep --color=auto ollama
shadeform@brev-rsebf7yzk:~$ ss -tlnp | grep 11435
LISTEN 0      511          0.0.0.0:11435      0.0.0.0:*    users:(("node",pid=3955115,fd=21)
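The failing step above is essentially a TCP reachability check from inside a container to the auth proxy at host.openshell.internal:11435. A minimal sketch of such a probe (plain Python; the name `can_connect` and the throwaway demo listener are illustrative, not NemoClaw code):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS resolution failure, connection refused, and timeout.
        return False

# Demo: bind a throwaway listener so the probe has something to hit.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]
reachable = can_connect("127.0.0.1", port)
srv.close()
```

Running `can_connect("host.openshell.internal", 11435)` inside the affected container would distinguish a DNS failure from a refused connection. Since `ss` shows the proxy listening on 0.0.0.0:11435 on the host, a missing host-gateway mapping for the container (e.g. Docker's `--add-host host.openshell.internal:host-gateway`, an assumption about this setup rather than a confirmed fix) is one plausible culprit.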

Bug Details

Priority: Unprioritized
Action: Dev - Open - To fix
Disposition: Open issue
Module: Machine Learning - NemoClaw
Keyword: NemoClaw, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Onboard, NemoClaw-SWQA-RelBlckr-Recommended

[NVB#6110214]

Metadata


Labels

- Local Models: Running NemoClaw with local models
- NV QA: Bugs found by the NVIDIA QA Team
- Platform: Brev: Support for Brev deployment
- UAT: Issues flagged for User Acceptance Testing
- bug: Something isn't working
- v0.0.28: Release target
