Description
I am trying to perform k-means classification on the potential distances of a query dataset.
To do so, I simply called the extend_to_data function on the query dataset.
However, I don't think this function returns the potential distance:
```python
def extend_to_data(self, data, **kwargs):
    """Build transition matrix from new data to the graph

    Creates a transition matrix such that `Y` can be approximated by
    a linear combination of landmarks. Any
    transformation of the landmarks can be trivially applied to `Y` by
    performing

    `transform_Y = transitions.dot(transform)`

    Parameters
    ----------
    Y: array-like, [n_samples_y, n_features]
        new data for which an affinity matrix is calculated
        to the existing data. `n_features` must match
        either the ambient or PCA dimensions

    Returns
    -------
    transitions : array-like, [n_samples_y, self.data.shape[0]]
        Transition matrix from `Y` to `self.data`
    """
    kernel = self.build_kernel_to_data(data, **kwargs)
    if sparse.issparse(kernel):
        pnm = sparse.hstack(
            [
                sparse.csr_matrix(kernel[:, self.clusters == i].sum(axis=1))
                for i in np.unique(self.clusters)
            ]
        )
    else:
        pnm = np.array(
            [
                np.sum(kernel[:, self.clusters == i], axis=1).T
                for i in np.unique(self.clusters)
            ]
        ).transpose()
    pnm = normalize(pnm, norm="l1", axis=1)
    return pnm
```

Rather, it gives me the transition matrix, which I think is the diffusion probability matrix (transitioned optimal_t times).
So, to transform the transition matrix into the informational distance, I copied the following from the _calculate_potential function:

```python
c = (1 - self.gamma) / 2
self._diff_potential = ((diff_op_t) ** c) / c
```

Here is my attempt at mapping the query data onto the reference dataset:
```r
phate.data <- Embeddings(reference.seurat, 'symphony')
phate.ref <- phate(
  phate.data,
  gamma = 0, knn = 10,
  ndim = 3, mds.solver = 'smacof', npca = NULL,
  knn.dist.method = 'euclidean', mds.dist.method = 'euclidean', seed = 333
)
reference.seurat[['phate']] <- CreateDimReducObject(embeddings = phate.ref$embedding, key = 'phate_', assay = 'RNA')
km <- kmeans(phate.ref$operator$diff_potential, centers = 7)
reference.seurat$phate.k <- as.character(km$cluster)
query_phate <- phate.ref$operator$transform(Embeddings(query.seurat, 'symphony'))
query.seurat[['phate']] <- CreateDimReducObject(embeddings = query_phate, key = 'phate_', assay = 'RNA')
query_diff_transform <- phate.ref$operator$graph$extend_to_data(Embeddings(query.seurat, 'symphony'))
query_diff_potential <- query_diff_transform^(0.5) / 0.5 # Because gamma = 0
query.seurat$phate.k <- clue::cl_predict(km, newdata = as.matrix(query_diff_potential), type = 'class_ids')
```

After merging reference.seurat and query.seurat, I visualized the phate dimensions and phate.k clusters.
The query.seurat points overlapped the reference.seurat points; however, the positions of the phate.k clusters were a little off.
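To make the question concrete, here is a minimal NumPy sketch of the two transformations I could apply to the query transitions. Everything in it is a toy stand-in: `power_potential`, `ref_diff_op`, and `query_transitions` are hypothetical names filled with random row-stochastic matrices, not PHATE or graphtools objects. I am assuming that the reference potential is computed from the diffusion operator raised to the power t, and that extend_to_data returns a one-step transition matrix. Variant A is what my R code does; Variant B diffuses the query transitions through the reference operator first, and I have not verified that this is the intended approach.

```python
import numpy as np


def power_potential(transitions, gamma=0.0):
    """Apply the quoted power transform: (P ** c) / c with c = (1 - gamma) / 2.

    Hypothetical helper; only the power branch quoted above is covered.
    """
    c = (1.0 - gamma) / 2.0
    if c == 0:
        # gamma = 1 gives c = 0, where the power transform is undefined
        raise ValueError("power transform is undefined for gamma = 1")
    return np.asarray(transitions, dtype=float) ** c / c


rng = np.random.default_rng(0)

# Toy sizes and random matrices (assumptions, not real PHATE output)
n_ref, n_query, t = 6, 3, 4

# Row-stochastic reference diffusion operator
ref_diff_op = rng.random((n_ref, n_ref))
ref_diff_op /= ref_diff_op.sum(axis=1, keepdims=True)

# Row-stochastic query-to-reference transitions (stand-in for extend_to_data output)
query_transitions = rng.random((n_query, n_ref))
query_transitions /= query_transitions.sum(axis=1, keepdims=True)

# Reference potential from the t-step diffusion operator
ref_diff_op_t = np.linalg.matrix_power(ref_diff_op, t)
ref_potential = power_potential(ref_diff_op_t, gamma=0.0)

# Variant A: transform the one-step query transitions directly (what I did)
query_potential_one_step = power_potential(query_transitions, gamma=0.0)

# Variant B: diffuse the query transitions through the reference operator first
query_diffused = query_transitions @ ref_diff_op_t
query_potential_diffused = power_potential(query_diffused, gamma=0.0)
```

Both variants have one column per reference point, so either can be passed to the k-means prediction step, but only Variant B has taken the same number of diffusion steps as the reference potential the clusters were fitted on.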
- Did I make a mistake?
- Is there a direct way to obtain the potential distance matrix of new data (the query)?
- Or is reference-based mapping with PHATE just not feasible?
Thank you.

