Skip to content

Integrate nearest neighbor query into public API #70

@daniel-j-h

Description

@daniel-j-h

By now we have a spatial index that works on top of a z-order space filling curve

and we have the (optional) functionality in the public API for users to sort graph nodes by a z-order space filling curve

with the idea that it will benefit graph traversals due to better memory locality when we walk a graph e.g. during dijkstra's s-t shortest path.


We need to integrate the spatial index into the public API so that users can

  • start and end their search at locations and not node ids
  • get back a path of locations and not just node ids

Questions in terms of designing the API surface

  • Should we decouple the spatial index from the graph and expose both low-level structures to the user?
  • Or should we tighly couple the graph and the spatial index so that users can do graph.search(start, stop) with locations?

One key realization I had while building the spatial index: If we already sort graph nodes based on the z-order space filling curve, then the spatial index on the very same nodes is free! 😮 This is because all our spatial index is doing is sorting nodes by z value and then doing binary searches with search space pruning on the z values.

In practice what this could mean if we want to tighly couple graph and spatial index

  1. We always sort nodes by z order space filling curve and construct the graph based on that
  2. We then put the spatial index on top of it; in theory there is nothing to do in construction; in practice I believe we want to at least compute and cache the z values because they are not the cheapest to re-compute
  3. When the user searches for a location in the spatial index we simply do binary search on the z values and return indices as results
  4. These indices are the graph node ids to start searches from and stop searches at

What we then have to store for the graph and the spatial index

  • a permutation to reverse the z value sorting to return results back to users
  • one lng, lat array with one item per graph node, keyed by node id
  • one z value array with one item per graph node, keyed by node id, computed from the lng, lat (purely as cache and an optimization)

We should explore if this sort of tighly coupling is worth it for the synnergy effects: graph and spatial index will be very small.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions