
Strip unused SBOM fields to reduce object size by ~52%#720

Open
slashben wants to merge 2 commits into main from feature/sbom-size-reduction


slashben (Contributor) commented Feb 11, 2026

Summary

This PR strips unnecessary fields from generated SBOMs at creation time to reduce memory consumption across the entire system (node-agent, etcd, synchronizer, storage, kubevuln).

Size reduction: ~3.6 MB / 52% for large images (tested with Elasticsearch 8.7.1: 6.89 MB → 3.29 MB)
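As a quick check on the headline figure, the reduction can be recomputed from the two sizes quoted above. This is plain arithmetic on the Elasticsearch 8.7.1 numbers; the helper name `reduction` is illustrative only.

```go
package main

import "fmt"

// reduction returns the absolute and relative size reduction between an
// original and a reduced object size, both given in the same unit (MB here).
func reduction(before, after float64) (saved, percent float64) {
	saved = before - after
	percent = saved / before * 100
	return saved, percent
}

func main() {
	// Elasticsearch 8.7.1 sizes quoted in the PR description.
	saved, pct := reduction(6.89, 3.29)
	fmt.Printf("saved %.2f MB (%.1f%%)\n", saved, pct)
}
```

Running it prints `saved 3.60 MB (52.2%)`, consistent with the ~3.6 MB / 52% claim.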

Changes

Fields stripped from pkg/sbommanager/v1/sbom_manager.go:

  • Package metadata and metadataType (JAR manifests, dpkg file lists, pomProperties, etc.) - saves ~3.19 MB / 46%
  • License locations - saves ~228 KB / 3.2%
  • Location accessPath and annotations - saves ~159 KB / 2.2%
  • Package foundBy cataloger name - saves ~15 KB
  • Source metadata and descriptor configuration - saves ~20 KB
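A minimal sketch of the stripping listed above, assuming simplified, hypothetical Go types that mirror only the fields this PR names (the real types in pkg/sbommanager/v1/sbom_manager.go and in Syft differ and carry many more fields). `stripSBOM` and the example values are illustrative, not the PR's code. Files and artifact relationships are not modeled here; in the PR they are preserved.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the Syft JSON model.
type Location struct {
	Path        string
	AccessPath  string            // stripped
	Annotations map[string]string // stripped
}

type License struct {
	Value     string
	Locations []Location // stripped
}

type Package struct {
	Name, Version, Type, PURL string
	CPEs                      []string // kept (needed for matching)
	Licenses                  []License
	Locations                 []Location
	FoundBy                   string // stripped: cataloger name
	MetadataType              string // stripped
	Metadata                  any    // stripped: JAR manifests, dpkg file lists, ...
}

type Document struct {
	Packages       []Package
	SourceMetadata any // stripped
	DescriptorConf any // stripped: descriptor configuration
}

// stripLocations keeps only the path of each location.
func stripLocations(locs []Location) []Location {
	out := make([]Location, len(locs))
	for i, l := range locs {
		out[i] = Location{Path: l.Path}
	}
	return out
}

// stripSBOM returns a copy of doc with the unused fields cleared and the
// matching fields (name, version, type, purl, CPEs) left intact.
func stripSBOM(doc Document) Document {
	out := Document{Packages: make([]Package, len(doc.Packages))}
	for i, p := range doc.Packages {
		licenses := make([]License, len(p.Licenses))
		for j, l := range p.Licenses {
			licenses[j] = License{Value: l.Value} // drop license locations
		}
		out.Packages[i] = Package{
			Name: p.Name, Version: p.Version, Type: p.Type, PURL: p.PURL,
			CPEs:      p.CPEs,
			Licenses:  licenses,
			Locations: stripLocations(p.Locations),
		}
	}
	return out // SourceMetadata and DescriptorConf are left zero
}

func main() {
	doc := Document{
		Packages: []Package{{
			Name: "log4j-core", Version: "2.17.1", Type: "java-archive",
			PURL:         "pkg:maven/org.apache.logging.log4j/log4j-core@2.17.1",
			FoundBy:      "example-cataloger",
			MetadataType: "example-metadata",
			Metadata:     map[string]string{"Manifest-Version": "1.0"},
			Locations:    []Location{{Path: "/app/lib/log4j-core.jar", AccessPath: "/app/lib/log4j-core.jar"}},
		}},
		SourceMetadata: map[string]string{"imageID": "sha256:abc"},
	}
	fmt.Printf("%+v\n", stripSBOM(doc).Packages[0])
}
```

Building fresh structs instead of zeroing fields in place leaves the input document untouched, which is convenient if the unstripped SBOM is still needed elsewhere.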

Why these fields are safe to remove

These fields are not used by:

  • Grype/kubevuln for vulnerability matching (uses name, version, type, purl, cpes)
  • Third-party backend (already stripped by synchronizer before sending)
  • Relevancy scanning (uses files and artifactRelationships which are preserved)
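The safety argument rests on matching reading only name, version, type, purl and CPEs. Assuming that holds, one way to check it is to project each package onto exactly those fields and confirm the projection does not change when the stripped fields are cleared. `Package` and `matchKey` are hypothetical; this is not Grype's matcher.

```go
package main

import (
	"fmt"
	"strings"
)

// Package is a hypothetical, simplified package record.
type Package struct {
	Name, Version, Type, PURL string
	CPEs                      []string
	FoundBy                   string
	Metadata                  any
}

// matchKey projects a package onto the fields the PR says Grype/kubevuln
// use for matching; fields not read here cannot influence the key.
func matchKey(p Package) string {
	return strings.Join([]string{p.Name, p.Version, p.Type, p.PURL, strings.Join(p.CPEs, ",")}, "|")
}

func main() {
	full := Package{
		Name: "openssl", Version: "3.0.2", Type: "deb",
		PURL:     "pkg:deb/ubuntu/openssl@3.0.2",
		CPEs:     []string{"cpe:2.3:a:openssl:openssl:3.0.2:*:*:*:*:*:*:*"},
		FoundBy:  "example-cataloger",
		Metadata: map[string]any{"files": []string{"/usr/bin/openssl"}},
	}
	stripped := full
	stripped.FoundBy, stripped.Metadata = "", nil
	fmt.Println(matchKey(full) == matchKey(stripped)) // true
}
```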

Impact

This change reduces memory pressure in:

  • node-agent (SBOM generation and storage)
  • etcd (CRD storage)
  • synchronizer (SBOM forwarding)
  • kubevuln (vulnerability scanning)

Testing

  • Existing tests continue to pass (no conversion logic tests exist)
  • Validated against ARMO backend field usage analysis
  • Files and artifactRelationships preserved for relevancy feature
  • CPEs preserved for Phase 2 (requires kubevuln changes)

Related

This is Phase 1 of a multi-phase optimization:

  • Phase 1 (this PR): Strip safe fields - saves 52%
  • Phase 2 (future): Enable CPE regeneration in kubevuln and strip CPEs - saves additional 11%
  • Phase 3 (future): Redesign relevancy feature to eliminate files/relationships - saves additional 32%
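If, as an assumption not stated in the PR, each phase's percentage is measured against the original SBOM size, the cumulative effect can be tallied as below. Should the later percentages instead be relative to the already-reduced size, the totals change. `phases` and `remainingAfter` are illustrative names.

```go
package main

import "fmt"

// Phase savings as quoted in the PR, in percent of the ORIGINAL size (assumption).
var phases = []struct {
	name    string
	percent int
}{
	{"Phase 1: strip safe fields", 52},
	{"Phase 2: strip CPEs", 11},
	{"Phase 3: drop files/relationships", 32},
}

// remainingAfter returns the percentage of the original size left after the first n phases.
func remainingAfter(n int) int {
	r := 100
	for _, p := range phases[:n] {
		r -= p.percent
	}
	return r
}

func main() {
	const originalMB = 6.89 // Elasticsearch 8.7.1 example from the summary
	for i, p := range phases {
		left := remainingAfter(i + 1)
		fmt.Printf("%-34s %3d%% left (~%.2f MB)\n", p.name, left, originalMB*float64(left)/100)
	}
}
```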

Summary by CodeRabbit

  • Optimization
    • Reduced SBOM file sizes by removing non-essential metadata and configuration information from generated reports, resulting in more efficient payload delivery.

This change strips unnecessary fields from generated SBOMs at creation time
to reduce memory consumption across the entire system (node-agent, etcd,
synchronizer, storage, kubevuln).

Fields stripped:
- Package metadata and metadataType (JAR manifests, dpkg file lists, etc.) - saves ~3.19 MB / 46%
- License locations - saves ~228 KB / 3.2%
- Location accessPath and annotations - saves ~159 KB / 2.2%
- Package foundBy cataloger name - saves ~15 KB
- Source metadata and descriptor configuration - saves ~20 KB

Total savings: ~3.6 MB / 52% for large images (tested with Elasticsearch 8.7.1)

These fields are not used by:
- Grype/kubevuln for vulnerability matching (uses name, version, type, purl, cpes)
- ARMO backend (already stripped by synchronizer)
- Relevancy scanning (uses files and artifactRelationships which are preserved)

Signed-off-by: Ben <ben@armosec.io>
Co-authored-by: Cursor <cursoragent@cursor.com>
coderabbitai bot commented Feb 11, 2026

📝 Walkthrough

The SBOM manager strips non-essential fields from emitted payloads to reduce size. Modifications remove Locations from licenses, VirtualPath and Annotations from locations, Metadata and Configuration from the root document, and FoundBy from packages. Control flow and public APIs remain unchanged.

Changes

  • SBOM Payload Optimization (pkg/sbommanager/v1/sbom_manager.go): Removed non-essential fields across multiple functions (toLicenses, toLocations, toSyftDocument, toSyftPackages) to reduce emitted SBOM payload size. Stripped Locations, VirtualPath, Annotations, Metadata, Configuration, and FoundBy fields with explanatory comments. Also removed unused encoding/json import.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A pruning of bytes, so nimble and neat,
Where rabbits trim SBOMs to make payloads sweet,
No Locations, no paths, no metadata cloud,
Smaller and swifter—the burrow's quite proud! 🐇✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Title check: ✅ Passed. The title clearly and specifically summarizes the main change: stripping unused SBOM fields to reduce object size by approximately 52%.
  • Description Check: ✅ Passed. Check skipped (CodeRabbit's high-level summary is enabled).


The json package is no longer used after stripping metadata marshaling.

Signed-off-by: Ben <ben@armosec.io>
Co-authored-by: Cursor <cursoragent@cursor.com>