Skip to content

feat: Skill Bank M1+M2 — full Hub skill infrastructure#233

Open
ptone wants to merge 26 commits into
mainfrom
scion/skill-bank
Open

feat: Skill Bank M1+M2 — full Hub skill infrastructure#233
ptone wants to merge 26 commits into
mainfrom
scion/skill-bank

Conversation

@ptone

@ptone ptone commented Jun 11, 2026

Copy link
Copy Markdown
Owner

Summary

Implements Milestone 1 of the Skill Bank feature: the provisioning-side framework that enables templates to declare skills: references in scion-agent.yaml. Skills are resolved via a SkillResolver interface, downloaded, verified, and atomically installed into the agent's skill directory.

This milestone is tested with a mock resolver — real Hub integration comes in M3.

Commits

  1. Types + URI parser + mergeSkillReference struct in pkg/api/types.go, ParseSkillURI parser supporting full/shorthand/alias/bare name forms, ValidateSkillName (kebab-case, 1-64 chars), MergeScionConfig Skills append
  2. Resolver interface + download logicSkillResolver interface with batch resolve and per-skill errors, context injection (ContextWithSkillResolver), installResolvedSkills with staging + hash verification + atomic rename, downloadSkillFile with transport constraints
  3. Provisioning integration — Step 3d in provision.go, fail-closed logic, resolution record writing, container-script manifest extension (ResolvedSkills in ProvisionInputs)
  4. Polish + edge cases — Bundle hash mismatch, YAML parsing, enumerateLocalSkills, resolver error propagation tests

Safety Contracts Enforced

  • S1 (Fail-closed): Required skills with no resolver → provisioning error listing URIs and hint
  • S2 (Hash verification): Per-file SHA-256 via transfer.HashFile(), bundle hash via transfer.ComputeContentHash()
  • S3 (Path safety): Staging directory, path validation (no .., no abs, no backslashes, no NUL, no reserved names), atomic rename
  • S4 (Resolution record): agentHome/.scion/resolved-skills.json with exact versions, hashes, file manifests
  • S5 (Transport): HTTPS only (except localhost), 30s timeout, 10MB per file, no cross-host redirects, signed URLs never logged
  • S6 (Duplicate detection): Two skills resolving to same destination → error listing both URIs

Files Changed

File Change
pkg/api/types.go Add Skills []SkillReference to ScionConfig; add SkillReference struct
pkg/api/skill_uri.go (new) ParseSkillURI, ValidateSkillName, SkillURI type
pkg/api/skill_uri_test.go (new) Table-driven tests for all valid/invalid URI forms
pkg/agent/skill_resolver.go (new) SkillResolver interface, resolved types, context injection, download/stage/verify/install logic
pkg/agent/skill_resolver_test.go (new) Mock resolver tests, hash verification, path safety, duplicate detection
pkg/config/templates.go MergeScionConfig Skills append
pkg/config/templates_test.go Skills merge tests
pkg/agent/provision.go Step 3d: resolve, install, record skills
pkg/agent/provision_test.go Integration tests with mock resolver
pkg/harness/container_script_harness.go ResolvedSkills in ProvisionInputs, manifest reflection

Test Plan

  • URI parser: table-driven tests with all valid forms, invalid forms, edge cases
  • Resolver: mock resolver via httptest.NewTLSServer, hash verification pass/fail
  • Path safety: ../, absolute paths, backslashes, NUL bytes, reserved names
  • Hash verification: per-file and bundle-level mismatch detection
  • Duplicate destination detection
  • Download transport: HTTPS-only, size limit, cross-host redirect rejection
  • Provisioning integration: mock resolver success, required/no resolver failure, optional/no resolver success
  • YAML skills: field parsing roundtrip
  • All existing TestProvisionAgent* tests pass unchanged (no regression)

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements a skill resolution and installation mechanism for Scion agents, including URI parsing, secure downloading with transport and size constraints, atomic installation, and configuration merging. Feedback on the changes highlights several key issues: a directory creation oversight that prevents staging the resolution record for container-script harnesses, a file-sharing violation on Windows when deleting open files during download failures, a lack of de-duplication for merged skill configurations leading to installation conflicts, a performance bottleneck from not reusing the HTTP client, and a potential directory traversal bug if the skills directory path is empty.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread pkg/agent/provision.go
Comment on lines +828 to +830
if info, err := os.Stat(filepath.Dir(inputPath)); err == nil && info.IsDir() {
_ = os.WriteFile(inputPath, recordData, 0644)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

At this point in ProvisionAgent, the directory .scion/harness/inputs has not been created yet (it is created later during h.Provision or instruction injection). Because of the os.Stat check on the parent directory, resolved-skills.json will never be written, and container-script harnesses will not receive the resolved skills in their manifest.

We should explicitly create the directory using os.MkdirAll before writing the file to ensure it is correctly staged.

Suggested change
if info, err := os.Stat(filepath.Dir(inputPath)); err == nil && info.IsDir() {
_ = os.WriteFile(inputPath, recordData, 0644)
}
if err := os.MkdirAll(filepath.Dir(inputPath), 0755); err == nil {
_ = os.WriteFile(inputPath, recordData, 0644)
}

Comment on lines +390 to +403
f, err := os.Create(destPath)
if err != nil {
return fmt.Errorf("failed to create file: %w", err)
}
defer f.Close()

n, err := io.Copy(f, limitedReader)
if err != nil {
return fmt.Errorf("failed to write file: %w", err)
}
if n > maxSize {
_ = os.Remove(destPath)
return fmt.Errorf("file exceeds maximum size of %d bytes", maxSize)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

On Windows, attempting to delete an open file will fail with a permission/sharing violation error. Because defer f.Close() only runs when the function returns, the os.Remove(destPath) call on line 401 will fail on Windows.

Additionally, if io.Copy fails, we should explicitly close and clean up the partial/corrupted file from disk.

We should explicitly close the file before attempting to remove it.

Suggested change
f, err := os.Create(destPath)
if err != nil {
return fmt.Errorf("failed to create file: %w", err)
}
defer f.Close()
n, err := io.Copy(f, limitedReader)
if err != nil {
return fmt.Errorf("failed to write file: %w", err)
}
if n > maxSize {
_ = os.Remove(destPath)
return fmt.Errorf("file exceeds maximum size of %d bytes", maxSize)
}
f, err := os.Create(destPath)
if err != nil {
return fmt.Errorf("failed to create file: %w", err)
}
n, err := io.Copy(f, limitedReader)
f.Close()
if err != nil {
_ = os.Remove(destPath)
return fmt.Errorf("failed to write file: %w", err)
}
if n > maxSize {
_ = os.Remove(destPath)
return fmt.Errorf("file exceeds maximum size of %d bytes", maxSize)
}

Comment on lines +152 to +154
// S6: Detect duplicate destinations
destMap := make(map[string]string) // destName → URI
for _, skill := range skills {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If the same skill URI is declared in both a base template and an override template, they will be concatenated during MergeScionConfig. When resolving, this results in duplicate ResolvedSkill entries with the exact same URI.

Currently, installResolvedSkills will treat this as a conflict and fail provisioning, or redundantly download and install the same skill multiple times. We should de-duplicate the resolved skills by URI before performing conflict detection and installation.

	// De-duplicate skills by URI to avoid redundant downloads/installs
	uniqueSkills := make([]ResolvedSkill, 0, len(skills))
	seenURIs := make(map[string]bool)
	for _, skill := range skills {
		if !seenURIs[skill.URI] {
			seenURIs[skill.URI] = true
			uniqueSkills = append(uniqueSkills, skill)
		}
	}
	skills = uniqueSkills

	// S6: Detect duplicate destinations
	destMap := make(map[string]string) // destName → URI
	for _, skill := range skills {

Comment on lines +358 to +370
client := &http.Client{
Timeout: downloadTimeout,
CheckRedirect: func(req *http.Request, via []*http.Request) error {
// S5: No cross-host redirects
if len(via) > 0 && req.URL.Host != via[0].URL.Host {
return fmt.Errorf("cross-host redirect not allowed: %s → %s", via[0].URL.Host, req.URL.Host)
}
if len(via) >= 10 {
return fmt.Errorf("too many redirects")
}
return nil
},
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Creating a new http.Client on every call to downloadSkillFile prevents TCP connection reuse and connection pooling. If a skill contains multiple files, this will force a new TCP connection and TLS handshake for every single file, significantly slowing down downloads.

We should reuse a single shared http.Client instance (e.g., as a package-level variable) since http.Client is safe for concurrent use.

Please define sharedDownloadClient as a package-level variable to enable connection pooling:

var sharedDownloadClient = &http.Client{
	Timeout: downloadTimeout,
	CheckRedirect: func(req *http.Request, via []*http.Request) error {
		if len(via) > 0 && req.URL.Host != via[0].URL.Host {
			return fmt.Errorf("cross-host redirect not allowed: %s → %s", via[0].URL.Host, req.URL.Host)
		}
		if len(via) >= 10 {
			return fmt.Errorf("too many redirects")
		}
		return nil
	},
}
	client := sharedDownloadClient

Comment on lines +434 to +435
func enumerateLocalSkills(agentHome, skillsDir string) []SkillResolutionEntry {
skillsPath := filepath.Join(agentHome, skillsDir)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If skillsDir is empty, filepath.Join(agentHome, skillsDir) will resolve to agentHome. This would cause enumerateLocalSkills to read the entire agent home directory and incorrectly treat all its subdirectories (like .scion or workspace) as local skills.

We should add a defensive check to return nil immediately if skillsDir is empty.

func enumerateLocalSkills(agentHome, skillsDir string) []SkillResolutionEntry {
	if skillsDir == "" {
		return nil
	}
	skillsPath := filepath.Join(agentHome, skillsDir)

@ptone ptone changed the title feat: Skill Bank M1 — schema, URI parser, and provisioning framework feat: Skill Bank M1+M2 — full Hub skill infrastructure Jun 11, 2026
Scion Agent and others added 10 commits June 11, 2026 05:38
Add the foundational types for skill bank provisioning:

- SkillReference struct in pkg/api/types.go with URI, As, and Optional fields
- Skills field on ScionConfig for declaring skill dependencies in templates
- ParseSkillURI parser supporting full, shorthand, alias, and bare name forms
- ValidateSkillName for kebab-case name validation (1-64 chars)
- MergeScionConfig extension to append Skills slices
- Comprehensive table-driven tests for all valid/invalid URI forms
Implement the resolver framework and all safety contracts:

- SkillResolver interface with batch resolve, per-skill errors
- Context injection via ContextWithSkillResolver/SkillResolverFromContext
- installResolvedSkills: staging, per-file hash verification (S2),
  path safety validation (S3), atomic rename install
- downloadSkillFile: HTTPS-only (S5), 30s timeout, 10MB limit,
  cross-host redirect rejection, signed URL secrecy
- Duplicate destination detection (S6) before installation
- SkillResolutionRecord types for audit/diagnostic record (S4)
- Helper functions: enumerateLocalSkills, collectRequiredSkillURIs
- Comprehensive tests with httptest mock server
Wire the SkillResolver into ProvisionAgent as step 3d:

- Fail-closed (S1): required skills with no resolver → provisioning error
  listing URIs and a hint to connect Hub or mark optional
- Optional skills with no resolver → debug log and continue
- Resolve via context-injected SkillResolver, handle per-skill errors
- Install resolved skills via installResolvedSkills (staging + verify)
- Write resolution record to agentHome/.scion/resolved-skills.json (S4)
- Stage resolved-skills.json as container-script harness input
- Add ResolvedSkills field to ProvisionInputs manifest
- Add manifest reflection for resolved-skills.json in Provision()
- Integration tests: mock resolver success, required/no resolver failure,
  optional/no resolver success
- All existing provision tests pass unchanged (no regression)
Additional test coverage for the skill bank provisioning framework:

- Bundle hash mismatch detection (S2 bundle-level verification)
- enumerateLocalSkills: filters hidden dirs and files
- enumerateLocalSkills: handles non-existent directory
- Override existing local skill with resolved skill
- Empty skills list handling
- YAML skills field parsing (verifies scion-agent.yaml roundtrip)
- Resolver per-skill error on required skill → provisioning failure
- All existing provisioning tests continue to pass
…orm#399

- Add guard in provision.go when skills are resolved but harness has no
  skills directory configured
- Fix @ version split in skill_uri.go to only match after the last path
  separator (prevents confusion with URI authority credentials)
- Close file handle before os.Remove in skill_resolver.go size-exceeded
  path (Windows compatibility)
- Use filepath.ToSlash() on InstalledPath values in resolved-skills.json
  for Linux container compatibility
…skills

Introduces the database layer for the skill bank feature:

- Ent schemas for Skill and SkillVersion entities with appropriate
  indexes and enum constraints
- Store models (Skill, SkillVersion, SkillFilter) and SkillStore interface
  with CRUD operations and semver-aware version resolution
- Full Ent adapter implementation including ResolveSkillVersion with
  support for exact match, constraint ranges (^, ~), content-addressed
  lookup (sha256:), and "latest" resolution
- Storage helpers (SkillStoragePath, SkillStorageURI) for skill content
- CompositeStore integration embedding the new SkillStore
…ution

Implements the HTTP API layer for skills:

- CRUD endpoints: GET/POST /api/v1/skills, GET/PATCH/DELETE /api/v1/skills/{id}
- Version management: GET/POST /api/v1/skills/{id}/versions
- Upload/finalize workflow matching template pattern with signed URLs
- Batch resolve: POST /api/v1/skills/resolve with scope search order
  (user > project > global > core) and semver constraint matching
- Single-skill resolve and download endpoints
- Publication validation: SKILL.md required, size limits, version immutability
- Capability and scope action registration for "skill" resource type
- Route registration in server.go
Adds SkillService interface and implementation to hubclient:

- CRUD: List, Get, Create, Update, Delete
- Version management: PublishVersion, ListVersions, FinalizeVersion
- File transfer: RequestUploadURLs, UploadFile, DownloadFile
- Batch resolution: Resolve
- Client-side types mirroring Hub API request/response shapes
- Skills() accessor added to Client interface
Adds `scion skills` command group with subcommands:

- list: List skills with scope and search filtering
- show: Show skill details and version history
- create: Scaffold a local skill directory with SKILL.md template
- publish: Upload and publish a skill directory to Hub with version
- delete: Soft-delete a skill
- versions: List versions for a skill
- resolve: Debug/test skill URI resolution

Includes `scion skill` singular alias and --format json support
for all commands. Publish workflow: collect files, validate limits,
create/find skill, create draft version, upload via signed URLs,
finalize with content hash computation.
Adds comprehensive tests for the SkillStore implementation:

- CRUD operations (create, get, get-by-slug, update, delete)
- Soft delete via status=archived
- List with scope, name, and search filters
- Version CRUD and version immutability (unique index enforcement)
- Semver resolution: latest, exact, constraint (^, ~, >=), content hash
- Pre-release exclusion from latest/constraint resolution
- Draft version exclusion from resolution
- Unique slug per scope enforcement

Also fixes mockHubClient in templatecache tests to implement the
new Skills() method on the hubclient.Client interface.
@ptone ptone force-pushed the scion/skill-bank branch from 6338262 to f5b2468 Compare June 11, 2026 05:43
Scion Agent (skill-bank-m2-dev) added 16 commits June 11, 2026 05:53
- H1: Add authorization checks (CheckAccess) on all skill write
  endpoints (create, update, delete, publish, upload, finalize)
  following the template_handlers.go pattern with scope-aware checks
  for global, project, and user-scoped skills.

- M1: Thread projectID into determineScopeSearchOrder so project-
  scoped skills are found during default batch resolve search order
  (user > project > global > core).

- M3: Always filter on scope_id in GetSkillBySlug (including empty
  string for global/core) to prevent cross-tenant data leaks. Also
  filter on status=active to exclude archived skills from resolution.

- L2: Replace deprecated strings.Title with custom skillDisplayName.
- L5: Use sync.Once for thread-safe transfer client initialization.
- L6: Distinguish skill-found-but-version-not-resolved from skill-
  not-found in resolveSkill error messages.
- L7: Validate scope field against allowed values in createSkill.
Implement HubSkillResolver that bridges hubclient.SkillService to the
agent.SkillResolver interface. The resolver maps Hub response types
(hubclient.ResolvedSkill, DownloadURLInfo) to agent-internal types
(agent.ResolvedSkill, ResolvedFile), preserving As from the original
SkillReference (not from the Hub response).

Includes unit tests covering type mapping, As/Optional preservation,
per-skill errors, transport errors, multiple skills, and empty refs.
- Add ContextWithResolveProjectID/UserID context injection functions
  in skill_resolver.go for identity propagation.
- Update provision.go to populate ResolveOpts.ProjectID from context
  (falling back to config.ReadProjectID) and UserID from context.
- Inject HubSkillResolver in cmd/create.go local mode when a Hub
  client is available (lightweight EnsureHubReady with SkipSync).
- Inject HubSkillResolver in broker handlers.go from the Hub
  connection before both Provision and Start paths.
- Add ResolverNamer interface so the resolution record captures the
  actual resolver name ("hub") instead of hardcoded "mock".
H1 fix: In provision.go error iteration, when findRefByURI returns nil
(unmatched URI), the condition `ref != nil && !ref.Optional` was false,
silently skipping the error as optional. Changed to `ref == nil ||
!ref.Optional` so unmatched URIs are treated as required (fail-closed).

M1 fix: Wire ContextWithResolveUserID in broker handlers.go from
req.UserID. Local CLI mode has no user identity source on the Hub
client, so UserID remains empty there (acceptable — scoped resolution
is optional).
Add content-hash-keyed caching for resolved skills so the broker
doesn't re-download identical skill files on every provision. Reuses
the existing templatecache.Cache infrastructure with a separate cache
instance (500MB default) under ~/.scion/cache/skills/.

- CachingSkillResolver decorator wraps HubSkillResolver, injects cache
  into context, and logs cache hit/miss per resolved skill
- installOneSkill checks cache before downloading (copy on hit) and
  populates cache after successful download+verify
- Broker server initializes skCache in initHubIntegration() alongside
  the template and harness-config caches
- Handler wiring wraps resolver with CachingSkillResolver when cache
  is available
- latest/range constraints always re-resolve against the Hub but skip
  download when the content hash matches a cached entry
Wire SkillFilter.Tags in the store adapter with AND semantics (all tags
must match). Extend search to also match against the tags JSON field.
Parse tags query param in the list handler, add Tags to hub client
ListSkillsOptions, and add --tags flag to CLI list command.
Add deprecateSkillVersion handler (POST .../versions/{vid}/deprecate)
that transitions a published version to deprecated with a message and
optional replacement URI. Add DeprecateVersion to hub client and
'scion skills deprecate' CLI command.
Extend ResolvedSkillResponse, hubclient.ResolvedSkill, and agent
ResolvedSkill/SkillResolutionEntry with deprecation fields. Populate
deprecation metadata in handleSkillsResolve. Map fields through the hub
skill resolver. Emit deprecation warnings via util.Warnf during skill
installation. Update ResolveSkillVersion to include deprecated versions
so they still resolve (warnings, not errors).
Add download_count field to SkillVersion ent schema and regenerate.
Add DownloadCount to store model, IncrementSkillVersionDownloadCount to
store interface and ent adapter (atomic AddDownloadCount(1)). Fire-and-
forget increment in handleSkillsResolve goroutine. Display download
count per version in CLI show output.
Add tags column to skill list output. Fix deprecation warning to use
fmt.Fprintln(os.Stderr) instead of nonexistent util.Warnf. Add
DeprecateVersion to mock SkillService to satisfy updated interface.
M6-M1: latest and constraint resolution now prefer published versions
over deprecated. Pinned exact-version lookups still return deprecated
versions (correct: caller asked for a specific version and gets a
warning). Fallback to deprecated only when no published version matches.

M6-M2: Tag filtering wraps tag values in JSON quotes ("tag") to avoid
substring false positives (e.g. filtering by "review" no longer matches
skills tagged "preview").
M6-L1: Add server-side validation that deprecation message is non-empty.
M6-L4: Use TagsContainsFold (case-insensitive) for tag filtering to
match the case-insensitive behavior of search.
After resolver.Resolve(), validate that every requested skill URI appears
in either the resolved or error sets, reject unrequested/duplicate skills.
Fix HubSkillResolver to skip unknown URIs instead of silently accepting
zero-value refs. Enumerate local skills before installing registry skills
to prevent M2 resolution record duplication.
Add authorization checks to skill read endpoints that were missing them:
- listSkills: filter out skills where ActionRead is denied
- getSkill: check ActionRead before returning, 404 on denial
- handleSkillDownload: check ActionRead before generating URLs
- handleSkillsResolve: check ActionRead per resolved skill
- handleSkillResolveSingle: check ActionRead before resolving

Fix user scope authorization gap in createSkill: the user scope
fell through with no check, allowing arbitrary scopeId. Now enforces
scopeId = authenticated user's ID.

Add batch resolve item cap (max 50) to prevent abuse.

Add tests for all authorization behaviors.
Add cache hit hash verification in installOneSkill: after CopyToDir,
verify each file's SHA-256 matches the expected hash. On mismatch,
remove the cached copy and fall through to download.

Add nil cache guard to NewCachingSkillResolver: panic early if
cache is nil to prevent subtle nil-pointer bugs downstream.
…sion

These two read endpoints were missing ActionRead checks. Both now
fetch the parent skill, check ActionRead, and return 404 on denial,
matching the pattern used by all other skill read endpoints.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant