Skip to content

fix(runtime): replace bind-mount secret staging with env-var pipeline for stateless brokers#523

Merged
ptone merged 4 commits into
GoogleCloudPlatform:mainfrom
ptone:scion/fix-stateless-broker-secrets
Jun 28, 2026
Merged

fix(runtime): replace bind-mount secret staging with env-var pipeline for stateless brokers#523
ptone merged 4 commits into
GoogleCloudPlatform:mainfrom
ptone:scion/fix-stateless-broker-secrets

Conversation

@ptone

@ptone ptone commented Jun 28, 2026

Copy link
Copy Markdown
Member

Fixes #284: Secret staging previously relied on bind-mounting files into agent containers, which is incompatible with stateless broker deployments (Cloud Run, remote brokers). Replaces with an environment-variable pipeline that works in all deployment modes.

Scion Agent (mi-dev-d) added 2 commits June 27, 2026 17:14
…#284)

Replace host-filesystem bind-mount secret staging with a stateless
env-var-based pipeline. The broker serializes file and variable secrets
into a single base64-encoded JSON blob (SCION_STAGED_SECRETS) injected
via -e flag. sciontool init decodes and writes secrets to their target
paths before any child process starts.

This eliminates the broker's dependency on local storage for secrets,
enabling stateless and HA broker deployments.

Phase 1: file + variable secrets. Auth file staging (applyResolvedAuth)
is deferred to Phase 2.
M-1: serializeSecrets now truly deduplicates file secrets by target
path (last entry wins), matching old writeFileSecrets behavior.

M-2: Add TODO(#284) to insertVolumeFlags noting it can be removed
once Phase 2 eliminates remaining bind-mount usage.
@google-cla

google-cla Bot commented Jun 28, 2026

Copy link
Copy Markdown

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements Phase 1 of the stateless broker secret staging design, replacing host-filesystem bind-mounts with a single base64-encoded environment variable (SCION_STAGED_SECRETS) that is decoded and written inside the container by sciontool init. Feedback on these changes highlights a critical permission and ownership issue where files created by WriteStagedSecrets as root are unreadable by the non-root agent user; a solution is suggested to chown files using host UID/GID environment variables. Additionally, an improvement is recommended to make serialization more robust by decoding and re-encoding secrets on the host side to ensure a uniform base64 payload.

Comment thread pkg/runtime/common.go
Comment on lines +901 to 931
func WriteStagedSecrets(homeDir string, staged *StagedSecrets) error {
for _, fs := range staged.FileSecrets {
data, err := base64.StdEncoding.DecodeString(fs.Value)
if err != nil {
data = []byte(fs.Value)
}
seen[target] = true
entries = append(entries, secretMapEntry{
Name: s.Name,
Target: target,
Source: s.Name, // filename within secrets/ volume
})
}

if len(entries) == 0 {
return nil
}

secretsDir := filepath.Join(filepath.Dir(homeDir), "secrets")
if err := os.MkdirAll(secretsDir, 0700); err != nil {
return fmt.Errorf("failed to create secrets directory: %w", err)
if err := os.MkdirAll(filepath.Dir(fs.Target), 0755); err != nil {
return fmt.Errorf("failed to create directory for secret %s: %w", fs.Name, err)
}
if err := os.WriteFile(fs.Target, data, 0600); err != nil {
return fmt.Errorf("failed to write secret file %s: %w", fs.Name, err)
}
}

data, err := json.Marshal(entries)
if err != nil {
return fmt.Errorf("failed to marshal secret-map.json: %w", err)
if len(staged.VariableSecrets) > 0 {
scionDir := filepath.Join(homeDir, ".scion")
if err := os.MkdirAll(scionDir, 0700); err != nil {
return fmt.Errorf("failed to create .scion directory: %w", err)
}
data, err := json.Marshal(staged.VariableSecrets)
if err != nil {
return fmt.Errorf("failed to marshal secrets.json: %w", err)
}
if err := os.WriteFile(filepath.Join(scionDir, "secrets.json"), data, 0600); err != nil {
return fmt.Errorf("failed to write secrets.json: %w", err)
}
}

return os.WriteFile(filepath.Join(secretsDir, "secret-map.json"), data, 0600)
return nil
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Critical Permission & Ownership Bug

When sciontool init runs as root (PID 1), any directories and files created by WriteStagedSecrets will be owned by root:root with 0600 permissions. This prevents the non-root scion user (which runs the actual agent/harness process) from reading the staged secrets, causing immediate permission errors.

Solution

  1. Detect if we are running as root (os.Getuid() == 0).
  2. Retrieve the target host UID/GID from the SCION_HOST_UID and SCION_HOST_GID environment variables (which are already injected into the container).
  3. Automatically chown the created files and directories (only if they are under the agent's home directory to avoid modifying system directories like /etc/ssl).
  4. Simplify the decoding logic to strictly decode base64 without any fallbacks, matching the host-side serialization improvement.
func WriteStagedSecrets(homeDir string, staged *StagedSecrets) error {
	var uid, gid int
	if os.Getuid() == 0 {
		if uidStr := os.Getenv("SCION_HOST_UID"); uidStr != "" {
			if id, err := strconv.Atoi(uidStr); err == nil {
				uid = id
			}
		}
		if gidStr := os.Getenv("SCION_HOST_GID"); gidStr != "" {
			if id, err := strconv.Atoi(gidStr); err == nil {
				gid = id
			}
		}
	}

	for _, fs := range staged.FileSecrets {
		data, err := base64.StdEncoding.DecodeString(fs.Value)
		if err != nil {
			return fmt.Errorf("failed to decode secret %s: %w", fs.Name, err)
		}

		dir := filepath.Dir(fs.Target)
		if err := os.MkdirAll(dir, 0755); err != nil {
			return fmt.Errorf("failed to create directory for secret %s: %w", fs.Name, err)
		}
		if (uid > 0 || gid > 0) && strings.HasPrefix(dir, homeDir) {
			_ = os.Chown(dir, uid, gid)
		}

		if err := os.WriteFile(fs.Target, data, 0600); err != nil {
			return fmt.Errorf("failed to write secret file %s: %w", fs.Name, err)
		}
		if uid > 0 || gid > 0 {
			_ = os.Chown(fs.Target, uid, gid)
		}
	}

	if len(staged.VariableSecrets) > 0 {
		scionDir := filepath.Join(homeDir, ".scion")
		if err := os.MkdirAll(scionDir, 0700); err != nil {
			return fmt.Errorf("failed to create .scion directory: %w", err)
		}
		if uid > 0 || gid > 0 {
			_ = os.Chown(scionDir, uid, gid)
		}

		data, err := json.Marshal(staged.VariableSecrets)
		if err != nil {
			return fmt.Errorf("failed to marshal secrets.json: %w", err)
		}

		secretsPath := filepath.Join(scionDir, "secrets.json")
		if err := os.WriteFile(secretsPath, data, 0600); err != nil {
			return fmt.Errorf("failed to write secrets.json: %w", err)
		}
		if uid > 0 || gid > 0 {
			_ = os.Chown(secretsPath, uid, gid)
		}
	}

	return nil
}

Comment thread pkg/runtime/common.go Outdated
Comment on lines 831 to 841
// Re-encode raw values as base64 for uniform handling on the container side.
val := s.Value
if _, err := base64.StdEncoding.DecodeString(val); err != nil {
val = base64.StdEncoding.EncodeToString([]byte(val))
}

// Write to staging dir using the secret name as filename
hostPath := filepath.Join(secretsDir, s.Name)
if err := os.MkdirAll(filepath.Dir(hostPath), 0700); err != nil {
return nil, fmt.Errorf("failed to create secret directory: %w", err)
}
if err := os.WriteFile(hostPath, data, 0600); err != nil {
return nil, fmt.Errorf("failed to write secret file %s: %w", s.Name, err)
entry := StagedFileSecret{
Name: s.Name,
Target: target,
Value: val,
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To make the secret staging pipeline fully deterministic and robust, we should decode any base64-encoded secrets on the host side during serialization, and then always encode the resulting raw bytes to base64. This ensures that the container side receives a uniform, strictly base64-encoded payload and does not need to guess or fall back when decoding.

Suggested change
// Re-encode raw values as base64 for uniform handling on the container side.
val := s.Value
if _, err := base64.StdEncoding.DecodeString(val); err != nil {
val = base64.StdEncoding.EncodeToString([]byte(val))
}
// Write to staging dir using the secret name as filename
hostPath := filepath.Join(secretsDir, s.Name)
if err := os.MkdirAll(filepath.Dir(hostPath), 0700); err != nil {
return nil, fmt.Errorf("failed to create secret directory: %w", err)
}
if err := os.WriteFile(hostPath, data, 0600); err != nil {
return nil, fmt.Errorf("failed to write secret file %s: %w", s.Name, err)
entry := StagedFileSecret{
Name: s.Name,
Target: target,
Value: val,
}
var data []byte
if decoded, err := base64.StdEncoding.DecodeString(s.Value); err == nil {
data = decoded
} else {
data = []byte(s.Value)
}
entry := StagedFileSecret{
Name: s.Name,
Target: target,
Value: base64.StdEncoding.EncodeToString(data),
}

…teStagedSecrets

Address Gemini review findings on PR GoogleCloudPlatform#523:

1. serializeSecrets now normalizes base64 deterministically by always
   decoding first (to get raw bytes) then re-encoding, ensuring uniform
   base64 payload regardless of input format.

2. WriteStagedSecrets now uses strict base64 decoding (no fallback)
   since the host side guarantees base64-encoded values.

3. When running as root (PID 1 in containers), WriteStagedSecrets reads
   SCION_HOST_UID/GID env vars and chowns created directories and files
   under homeDir so the non-root scion user can read them.
@ptone ptone merged commit c845d00 into GoogleCloudPlatform:main Jun 28, 2026
7 of 9 checks passed
@ptone ptone deleted the scion/fix-stateless-broker-secrets branch June 28, 2026 12:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant