Skip to content

Landmark graph size does not match n_landmark #60

@stanleyjs

Description

@stanleyjs

Hi,

I noticed that master fails test_landmark.test_landmark_knn_graph and test_landmark.test_landmark_knn_pygsp_graph.

The reason these tests fails is due to the shape of the landmark operator. The tests expect the landmark transitions operator to be (data.shape[0], n_landmark) and the landmark operator itself to be (n_landmark, n_landmark). I looked into why this is not true. It turns out that the current way of building clusters, MiniBatchKMeans, is not assigning to all n_landmark clusters, so you can have landmark graphs with <= n_landmark nodes. This happens in the tests. I verified by changing the random seed and checking len(np.unique(G.clusters)), and indeed the size of the cluster assignments changes based on the seed.

There's two ways to fix this bug.

  1. It's working as intended: n_landmarks is an upper bound, rather than an exact target. In this case, we just need to change the tests to reflect this.
  2. It's not working as intended: n_landmarks should be the exact size of the landmark graph. In this case, we need to either a) change MinibatchKMeans to an algorithm that assigns all clusters 100% of the time, or figure out which parameters of MinibatchKMeans ensures this.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions