Skip to content

perf(metrics): Optimize transitive reachability in LakosMetrics via SCC condensation and BitSet propagation #1628

@ThanosTsiamis

Description

@ThanosTsiamis

Problem Description

Currently, LakosMetrics computes architectural metrics by querying transitive component dependencies. The underlying implementation in MetricsComponentDependencyGraph resolves these queries by executing a fresh Depth-First Search (DFS) from scratch for every single component.

On complex codebase graphs with shared downstream subgraphs or deep/cyclic dependencies, this approach results in heavily redundant traversals. The runtime complexity scales toward O(V * (V + E)), causing high CPU overhead and severe memory allocation pressure due to the repeated generation of temporary HashSet instances.

Code Locations

  • com.tngtech.archunit.library.metrics.LakosMetrics (where getNumberOfTransitiveDependencies is called iteratively)
  • com.tngtech.archunit.library.metrics.MetricsComponentDependencyGraph (where getTransitiveDependenciesOf triggers a fresh traversal via addTransitiveDependenciesFrom)

Proposed Solution

Instead of a query-time DFS, we can shift to a robust, one-time precomputation model when the dependency graph is constructed. The proposed architecture applies graph-theoretic optimizations to decouple query performance from graph depth:

  1. Identify Strongly Connected Components (SCCs): Use Kosaraju's algorithm to find and collapse component cycles.
  2. DAG Condensation: Condense the original graph into a Directed Acyclic Graph (DAG) where each node represents an SCC.
  3. Topological BitSet Propagation: Determine the topological order of the DAG and propagate downstream reachability using fast bitwise operations (BitSet.or()). This ensures that shared tails are computed exactly once and combined at the hardware level.
  4. O(1) Lookups: Map the final bitsets back to their respective components and store them in an immutable structure, making all subsequent getTransitiveDependenciesOf calls instant.

Why This Is Better

Dimension Current Implementation Proposed Solution
Time Complexity O(V * (V + E)) worst-case O(V + E) for precomputation, O(1) per query
Redundant Work High (re-traverses identical sub-paths) Zero (sub-paths evaluated once on the DAG)
Allocation Footprint High (repeated transient HashSet creation) Minimal (reuses primitive indices & bit vectors)
Call-site Overhead Increases with graph size and density Constant-time index array offset lookup
Stack Security Vulnerable to recursion depth limits Guaranteed safe via explicit iterative stack frames

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions