fix(skills): guard one-time setup with try/catch (best-effort instead of fatal) by lxistired · Pull Request #171 · OpenCoworkAI/open-cowork

lxistired · 2026-04-28T11:26:06Z

Problem

Skills directory setup runs once per ClaudeAgentRunner instance and currently lets any filesystem error abort chat startup. On fresh-install machines — Windows especially — we see ENOENT/EPERM races during the initial mkdir/symlink/copy work. A single broken built-in skill or a partially-created ~/.claude config dir prevents the user from sending any message at all.

Repro: install on a Windows box that has never run any Claude-related app, the very first chat fails because ~/.claude doesn't exist yet and one of the per-skill copy ops races against the parent dir creation.

Fix

Three layers of defensive try/catch plus one defensive parent mkdir:

Outer try around the whole one-time setup block. mkdir/readdir failures no longer bubble up — they get logged and the chat continues (without skills, but functional).
Per-skill try/catch in the built-in copy loop. One bad skill directory doesn't skip the rest.
Individual try/catches around syncUserSkillsToAppDir and syncConfiguredSkillsToRuntimeDir — they're independent failure modes (one walks ~/.claude/skills, the other walks the configured global skills dir) and shouldn't take each other down.
Defensive path.dirname(userSkillPath) mkdir before each per-skill copy/symlink — Windows fresh-install races and reports the parent missing even after the runtime skills dir was created a few statements above.

All failures go through logWarn (already imported), no behavior change on healthy machines.

Files

src/main/claude/agent-runner.ts — wrap one-time setup block (effective change ~29 lines via git diff -w; the larger raw diff is just indentation churn)

Test plan

Fresh Windows install (no ~/.claude) — chat starts, skills set up, no crash
Skills setup with one corrupted built-in skill — other skills still register
Same skills work as before on machines where setup already succeeded (no behavior change)
~/.claude/skills doesn't exist — syncUserSkillsToAppDir early-returns (existing behavior preserved)

Skills directory setup runs once per runner instance and currently lets any filesystem error abort the chat startup. On fresh-install machines (Windows especially) we see ENOENT/EPERM races during mkdir/symlink/copy ops — a single broken built-in skill or a partially-created ~/.claude config dir blocks the user from sending any message. Wrap with three layers of defensive try/catch: 1. Outer try around the whole one-time setup block (so mkdir/readdir failures don't propagate up). 2. Per-skill try/catch in the built-in copy loop (so one bad skill directory doesn't skip the rest). 3. Individual try/catches around syncUserSkillsToAppDir and syncConfiguredSkillsToRuntimeDir (independent failure modes). Also added a defensive parent mkdir before each per-skill copy/symlink — Windows fresh-install occasionally races and reports the parent missing even though we created the runtime skills dir a few lines up. Failures are logged via logWarn (non-fatal) and the chat continues without skills support.

github-actions

Findings

[Major] Transient setup failures become sticky for the lifetime of the runner. The new non-fatal outer catch means mkdirSync/readdirSync races now fall through, but _skillsSetupDone was already latched before the try, so later queries will skip setup entirely until some unrelated invalidation path runs. That leaves the Windows race described in the PR as a permanent loss of built-in/user skills for the rest of the chat session instead of recovering on the next message, evidence src/main/claude/agent-runner.ts:1496 with setup latch context at src/main/claude/agent-runner.ts:1408.

Summary

Review mode: initial

1 finding.
No targeted regression test was found under src/tests/ for retrying skills setup after a transient failure.

Testing

Not run (automation)

Open Cowork Bot

github-actions · 2026-04-30T09:29:26Z

+          } catch (e) {
+            logWarn('[ClaudeAgentRunner] syncConfiguredSkillsToRuntimeDir failed:', e);
+          }
+        } catch (setupErr) {


[MAJOR] This catch makes skills setup non-fatal, but _skillsSetupDone was already set to true earlier in the same block. If mkdirSync/readdirSync hits the transient Windows race described in the PR, every later query skips setup entirely until an explicit invalidateSkillsSetup() call, so the failure becomes sticky for the lifetime of this runner instead of recovering on the next message.

Suggested fix:

} catch (setupErr) { this._skillsSetupDone = false; logWarn('[ClaudeAgentRunner] Skills setup failed (non-fatal):', setupErr); }

hqhq1025 added bot-rerun Temporary label for rerunning bot automation and removed bot-rerun Temporary label for rerunning bot automation labels Apr 30, 2026

github-actions Bot reviewed Apr 30, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(skills): guard one-time setup with try/catch (best-effort instead of fatal)#171

fix(skills): guard one-time setup with try/catch (best-effort instead of fatal)#171
lxistired wants to merge 1 commit into
OpenCoworkAI:mainfrom
lxistired:fix/skills-missing-dir-enoent

lxistired commented Apr 28, 2026

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot Apr 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

lxistired commented Apr 28, 2026

Problem

Fix

Files

Test plan

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants