How does PHATE handle new data?

I am trying to assign the cells of a query dataset to k-means clusters using their potential distance.
To do so, I simply called the `extend_to_data` function on the query dataset.
However, I don't think this function returns the potential distance.

```python
    def extend_to_data(self, data, **kwargs):
        """Build transition matrix from new data to the graph

        Creates a transition matrix such that `Y` can be approximated by
        a linear combination of landmarks. Any
        transformation of the landmarks can be trivially applied to `Y` by
        performing

        `transform_Y = transitions.dot(transform)`

        Parameters
        ----------

        Y: array-like, [n_samples_y, n_features]
            new data for which an affinity matrix is calculated
            to the existing data. `n_features` must match
            either the ambient or PCA dimensions

        Returns
        -------

        transitions : array-like, [n_samples_y, self.data.shape[0]]
            Transition matrix from `Y` to `self.data`
        """
        kernel = self.build_kernel_to_data(data, **kwargs)
        if sparse.issparse(kernel):
            pnm = sparse.hstack(
                [
                    sparse.csr_matrix(kernel[:, self.clusters == i].sum(axis=1))
                    for i in np.unique(self.clusters)
                ]
            )
        else:
            pnm = np.array(
                [
                    np.sum(kernel[:, self.clusters == i], axis=1).T
                    for i in np.unique(self.clusters)
                ]
            ).transpose()
        pnm = normalize(pnm, norm="l1", axis=1)
        return pnm
```
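To make the dense branch above concrete, here is a minimal NumPy sketch on a toy kernel. The kernel values and the `clusters` labels are made up for illustration, and the row normalization is written by hand rather than through `normalize`. Going by the quoted code, the result is a one-step transition matrix from each query point to each landmark; nothing in this function raises a matrix to a power.

```python
import numpy as np

# Toy kernel from 3 query points to 5 reference points (values are made up).
kernel = np.array([
    [0.9, 0.1, 0.0, 0.0, 0.0],
    [0.0, 0.2, 0.6, 0.2, 0.0],
    [0.0, 0.0, 0.0, 0.5, 0.5],
])
# Hypothetical landmark (cluster) assignment of each reference point.
clusters = np.array([0, 0, 1, 1, 2])

# Sum the kernel weights over the reference points belonging to each landmark.
pnm = np.array([
    kernel[:, clusters == i].sum(axis=1) for i in np.unique(clusters)
]).T

# L1-normalize each row so it becomes a probability distribution over landmarks.
pnm = pnm / pnm.sum(axis=1, keepdims=True)
```

Each row of `pnm` sums to 1 and has one column per landmark, matching the `[n_samples_y, n_landmarks]` shape that the list comprehension in the quoted function produces.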

Instead, it returns a transition matrix, which I think is the diffusion probability matrix (diffused `optimal_t` times).

So, to convert the transition matrix into the potential, I copied the transform from the `_calculate_potential` function:

```python
        c = (1 - self.gamma) / 2
        self._diff_potential = ((diff_op_t) ** c) / c
```
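As a worked instance of the two lines above, here is a small sketch of the transform for a single value of `gamma`. The function name `power_potential` and the toy matrix are hypothetical, and the sketch deliberately covers only the branch shown in the snippet (it rejects `gamma = 1`, where `c` would be zero).

```python
import numpy as np

def power_potential(diff_op_t, gamma):
    """Apply (P ** c) / c with c = (1 - gamma) / 2, as in the snippet above."""
    if not -1 <= gamma < 1:
        raise ValueError("this sketch only covers -1 <= gamma < 1")
    c = (1 - gamma) / 2
    return np.asarray(diff_op_t, dtype=float) ** c / c

# Toy row-stochastic matrix standing in for the powered diffusion operator.
p = np.array([[0.25, 0.75], [0.64, 0.36]])
potential = power_potential(p, gamma=0)
```

With `gamma = 0` this gives `c = 0.5`, so the transform reduces to `2 * sqrt(p)`, which is the `^(0.5) / 0.5` step used in the R code.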

My attempt at mapping query data onto the reference dataset:

```r
phate.data <- Embeddings(reference.seurat, 'symphony')

phate.ref <- phate(
    phate.data,
    gamma = 0, knn = 10,
    ndim = 3, mds.solver = 'smacof', npca = NULL,
    knn.dist.method = 'euclidean', mds.dist.method = 'euclidean', seed = 333
)

reference.seurat[['phate']] <- CreateDimReducObject(embeddings=phate.ref$embedding, key='phate_', assay='RNA')
km <- kmeans(phate.ref$operator$diff_potential, centers = 7)
reference.seurat$phate.k <- as.character(km$cluster)

query_phate <- phate.ref$operator$transform(Embeddings(query.seurat, 'symphony'))
query.seurat[['phate']] <- CreateDimReducObject(embeddings=query_phate, key='phate_', assay='RNA')

query_diff_transform <- phate.ref$operator$graph$extend_to_data(Embeddings(query.seurat, 'symphony'))
query_diff_potential <- query_diff_transform^(0.5) / 0.5 # Because gamma = 0
query.seurat$phate.k <- clue::cl_predict(km, newdata=as.matrix(query_diff_potential), type='class_ids')
```

After merging `reference.seurat` and `query.seurat`, I visualized the PHATE dimensions and the `phate.k` clusters.
The `query.seurat` points overlapped the `reference.seurat` points; however, the `phate.k` cluster positions were a little off.

Reference:
![image](https://github.com/user-attachments/assets/f1b618c5-3f94-435a-98f5-858572cb3f35)

Query:
![image](https://github.com/user-attachments/assets/9f0f0ca1-0735-46fd-87d9-7df689acccf8)
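One possible explanation, which I have not verified, is a scale mismatch. The reference potential is computed from `diff_op_t`, while the quoted `extend_to_data` code only builds and normalizes a kernel, which looks like a single diffusion step. Under that assumption, a minimal sketch of one way to align the two is to push the query transitions through the powered reference operator before applying the potential transform. The function name `query_potential`, the toy matrices, and the choice of `t` are all hypothetical; this is not a confirmed PHATE API.

```python
import numpy as np

def query_potential(transitions, ref_diff_op, t, gamma=0):
    """Diffuse query-to-reference transitions t steps, then take the potential.

    transitions : (n_query, n_ref) row-stochastic matrix (assumed one-step)
    ref_diff_op : (n_ref, n_ref) row-stochastic reference diffusion operator
    """
    # t-step diffusion among the reference points.
    ref_diff_op_t = np.linalg.matrix_power(ref_diff_op, t)
    # Query rows expressed as distributions over reference points after diffusion.
    query_diff_op_t = transitions @ ref_diff_op_t
    # Same power transform as in the _calculate_potential excerpt (gamma != 1).
    c = (1 - gamma) / 2
    return query_diff_op_t ** c / c

# Toy data: random row-stochastic matrices standing in for the real operators.
rng = np.random.default_rng(0)
ref_diff_op = rng.random((6, 6))
ref_diff_op /= ref_diff_op.sum(axis=1, keepdims=True)
transitions = rng.random((4, 6))
transitions /= transitions.sum(axis=1, keepdims=True)

pot = query_potential(transitions, ref_diff_op, t=5, gamma=0)
```

Note that `transitions @ P^t` is effectively a walk of `t + 1` steps; using `t - 1` in the power instead is an equally plausible choice, and which one best matches the reference potential is an open question here.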

1) Did I make a mistake?
2) Is there a direct way to obtain the potential distance matrix of new data (the query)?
3) Or is reference-based mapping with PHATE just not feasible?

Thank you.

How does PHATE handle new data? #147
