Skip to content

Implement dense bitset strategy for a compact csr representation #3

@daniel-j-h

Description

@daniel-j-h

With a graph

(0 -> 1)
(0 -> 2)
(1 -> 0)
(1 -> 2)
(2 -> 1)

the TinyGraphIO interchange file format stores the compressed sparse row representation

offsets: [0, 2, 4, 5]
targets: [1, 2, 0, 2, 1]

where edges(v) = targets[offsets[v]] - targets[offsets[v + 1]]

Varints of Deltas

For a compact representation we use the following techniques

  1. The offsets array is monotonically increasing. We use delta encoding to transform the offsets array into small positive integers.
  2. We store the offsets deltas as varints so that small deltas can be represented in as little space as a single byte.
  3. We store the targets array as varints so that small targets can be represented in as little space as a single byte.

With delta encoded varints for offsets and varints for targets, the compressed sparse row graph can be compactly represented as

offsets = [0x0, 0x2, 0x2, 0x1]  # four bytes total
targets = [0x1, 0x2, 0x0, 0x2, 0x1]  # five bytes total

Dense Bitsets

An alternative is representing offsets as a dense bitset indicating when a new sub-range in targets starts

offsets = [1, 0, 1, 0, 1]  # 5 bits total, stored as 1 byte
targets = [1, 2, 0, 2, 1]  # 5 bytes total

This alternative representation is more space efficient for graphs with a relatively low number of edges compared to number of nodes.

Open Questions

Should we support both approaches and leave it to the user to decide? Or can we be smart and select the approach based on what the user graph looks like?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions