Add path-aware matching for duplicate package names #10
Open
neddp wants to merge 3 commits into anthonyharrison:main from
When comparing SBOMs, packages with the same name at different filesystem locations were being deduplicated into a single entry, making it impossible to track version changes for a package embedded in multiple binaries (e.g., the Go stdlib compiled into 5 different executables).
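As a minimal illustration of the collision (the component tuples below are hypothetical sample data, not taken from a real SBOM), keying a dictionary on the package name alone lets the second `stdlib` entry silently overwrite the first. Keying on `(name, path)` keeps both:

```python
# Hypothetical components as (name, version, path); illustrative data only.
components = [
    ("stdlib", "go1.21.5", "/usr/bin/service-a"),
    ("stdlib", "go1.22.0", "/usr/bin/service-b"),
]

# Keyed by name only: the later entry overwrites the earlier one.
by_name = {name: version for name, version, _ in components}

# Keyed by (name, path): each embedded copy is tracked separately.
by_name_and_path = {(name, path): version for name, version, path in components}

print(len(by_name), len(by_name_and_path))  # prints "1 2"
```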
Instead of using just the package name as a key, I changed the code to use `(name, path)` tuples. For CycloneDX files, it looks for any property whose name contains both "location" and "path", so it works with syft's `syft:location:0:path` or similar conventions from other tools. When displaying differences, if a path exists, it shows the binary name, like `stdlib (service-a)`, so you know which binary it's in. SPDX formats don't have path metadata, so they just use an empty string to keep everything consistent.

You can now track the same package appearing in different binaries independently. If you have the Go stdlib embedded in 5 executables and one gets updated, you'll see exactly which one changed. It works with any SBOM generation tool that puts path info in properties (syft, grype, trivy, etc.), but also handles SBOMs that don't have paths at all: it just falls back gracefully. The output only shows the binary name when it's actually useful, keeping things clean.
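The matching scheme described above can be sketched roughly as follows. This is not the code from this PR: `extract_path`, `package_key`, `display_name` and `diff_versions` are hypothetical helpers, and the input is assumed to be CycloneDX JSON component dicts with a `properties` list of `{"name": ..., "value": ...}` objects. Components without a matching property (e.g. from SPDX) get an empty path, and the binary name is shown only when a path exists.

```python
from pathlib import PurePosixPath


def extract_path(component: dict) -> str:
    """Return the first property value whose name contains both "location" and "path".

    Hypothetical helper; returns "" when no such property exists (e.g. SPDX input).
    If several locations are present, only the first one is used.
    """
    for prop in component.get("properties", []):
        name = prop.get("name", "")
        if "location" in name and "path" in name:
            return prop.get("value", "")
    return ""


def package_key(component: dict) -> tuple[str, str]:
    """Key a component on (name, path) instead of name alone."""
    return (component["name"], extract_path(component))


def display_name(key: tuple[str, str]) -> str:
    """Show 'name (binary)' when a path is known, otherwise just 'name'."""
    name, path = key
    if not path:
        return name
    return f"{name} ({PurePosixPath(path).name})"


def diff_versions(old: list[dict], new: list[dict]) -> list[str]:
    """Report version changes for packages present in both SBOMs."""
    old_map = {package_key(c): c.get("version", "") for c in old}
    new_map = {package_key(c): c.get("version", "") for c in new}
    changes = []
    for key in sorted(old_map.keys() & new_map.keys()):
        if old_map[key] != new_map[key]:
            changes.append(f"{display_name(key)}: {old_map[key]} -> {new_map[key]}")
    return changes


def _component(version: str, path: str) -> dict:
    # Illustrative syft-style component; the values are made up.
    return {
        "name": "stdlib",
        "version": version,
        "properties": [{"name": "syft:location:0:path", "value": path}],
    }


old = [_component("go1.21.5", "/usr/bin/service-a"), _component("go1.21.5", "/usr/bin/service-b")]
new = [_component("go1.22.0", "/usr/bin/service-a"), _component("go1.21.5", "/usr/bin/service-b")]

print(diff_versions(old, new))  # prints ['stdlib (service-a): go1.21.5 -> go1.22.0']
```

Because the key falls back to `(name, "")` when no location property exists, SBOMs without path metadata behave as they did before: one entry per package name.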
Added 42 tests.
Fixes #8