-
Notifications
You must be signed in to change notification settings - Fork 0
Open
Description
With a graph
(0 -> 1)
(0 -> 2)
(1 -> 0)
(1 -> 2)
(2 -> 1)
the TinyGraphIO interchange file format stores the compressed sparse row representation
offsets: [0, 2, 4, 5]
targets: [1, 2, 0, 2, 1]
where edges(v) = targets[offsets[v]] - targets[offsets[v + 1]]
Varints of Deltas
For a compact representation we use the following techniques
- The offsets array is monotonically increasing. We use delta encoding to transform the offsets array into small positive integers.
- We store the offsets deltas as varints so that small deltas can be represented in as little space as a single byte.
- We store the targets array as varints so that small targets can be represented in as little space as a single byte.
With delta encoded varints for offsets and varints for targets, the compressed sparse row graph can be compactly represented as
offsets = [0x0, 0x2, 0x2, 0x1] # four bytes total
targets = [0x1, 0x2, 0x0, 0x2, 0x1] # five bytes total
Dense Bitsets
An alternative is representing offsets as a dense bitset indicating when a new sub-range in targets starts
offsets = [1, 0, 1, 0, 1] # 5 bits total, stored as 1 byte
targets = [1, 2, 0, 2, 1] # 5 bytes total
This alternative representation is more space efficient for graphs with a relatively low number of edges compared to number of nodes.
Open Questions
Should we support both approaches and leave it to the user to decide? Or can we be smart and select the approach based on what the user graph looks like?
Metadata
Metadata
Assignees
Labels
No labels