-
Notifications
You must be signed in to change notification settings - Fork 17
Description
Hi,
I noticed that master fails test_landmark.test_landmark_knn_graph and test_landmark.test_landmark_knn_pygsp_graph.
The reason these tests fails is due to the shape of the landmark operator. The tests expect the landmark transitions operator to be (data.shape[0], n_landmark) and the landmark operator itself to be (n_landmark, n_landmark). I looked into why this is not true. It turns out that the current way of building clusters, MiniBatchKMeans, is not assigning to all n_landmark clusters, so you can have landmark graphs with <= n_landmark nodes. This happens in the tests. I verified by changing the random seed and checking len(np.unique(G.clusters)), and indeed the size of the cluster assignments changes based on the seed.
There's two ways to fix this bug.
- It's working as intended:
n_landmarksis an upper bound, rather than an exact target. In this case, we just need to change the tests to reflect this. - It's not working as intended:
n_landmarksshould be the exact size of the landmark graph. In this case, we need to either a) changeMinibatchKMeansto an algorithm that assigns all clusters 100% of the time, or figure out which parameters ofMinibatchKMeansensures this.