Ts embedding #72
Conversation
- Updated Jupytext version from 1.15.2 to 1.19.1 in multiple notebooks.
- Added a new notebook for Contrastive Predictive Coding (CPC) for time series data, including configurations, data handling, model architecture, and training logic.
- Updated the `ts-bolt` dependency from version 0.0.6 to 0.0.7 in `pyproject.toml`.
…ctive coding notebook
…g notebook with new data handling
…ta handling in predictive coding
…nization; streamline imports and enhance DataLoader configurations.
…pochs to 300; modify normalization layer in CPCEncoder
Pull request overview
This PR appears to support time-series embedding experiments by adding new ML dependencies (FAISS + torchdr) and introducing/expanding notebooks for embedding visualization and Contrastive Predictive Coding (CPC).
Changes:
- Add embedding-related dependencies (e.g., `torchdr`, `faiss-gpu`) and update `ts-bolt`.
- Extend existing notebooks with embedding investigation sections and checkpoint reload workflows.
- Add new CPC notebooks plus YAML configs for multiple datasets.
Reviewed changes
Copilot reviewed 41 out of 45 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| requirements.txt | Adds FAISS/torchdr and large ML dependency set (torch, CUDA libs, sklearn/scipy, etc.). |
| pyproject.toml | Adds torchdr + faiss-gpu to main dependencies; bumps ts-bolt to 0.0.7. |
| .gitignore | Ignores dl/notebooks/data/predictive_coding/. |
| dl/notebooks/ts_dl_utils/embedding/__init__.py | Embedding package init (file present in PR scope). |
| dl/notebooks/tree_random_forest.py | Updates Jupytext metadata version. |
| dl/notebooks/tree_darts_random_forest.py | Updates Jupytext metadata version. |
| dl/notebooks/tree_darts_boosted_tree.py | Updates Jupytext metadata version. |
| dl/notebooks/tree_basics.py | Updates Jupytext metadata version. |
| dl/notebooks/transformer-ts-nixtla.py | Updates Jupytext metadata version. |
| dl/notebooks/transformer-ts-nixtla-testing.py | Updates Jupytext metadata version. |
| dl/notebooks/transformer-ts-nixtla-testing-m5.py | Updates Jupytext metadata version. |
| dl/notebooks/transformer-ts-nixtla_naive_data.py | Updates Jupytext metadata version. |
| dl/notebooks/transformer-explainer.py | Updates Jupytext metadata version. |
| dl/notebooks/transformer_timeseries_univariate.py | Adds save_hyperparameters() and a large embedding investigation section; updates kernelspec metadata. |
| dl/notebooks/transformer_history.py | Updates Jupytext metadata version. |
| dl/notebooks/transformer_embeddings.py | Adds a new (currently header-only) notebook stub. |
| dl/notebooks/timeseries-comparison.py | Updates Jupytext metadata version. |
| dl/notebooks/timeseries_gan.py | Updates Jupytext metadata version. |
| dl/notebooks/timeseries_data_box-cox.py | Updates Jupytext metadata version. |
| dl/notebooks/time-series-data-generation.py | Updates Jupytext metadata version. |
| dl/notebooks/time_vae.py | Updates kernelspec metadata; changes an output inspection to use .shape; adds embedding investigation section. |
| dl/notebooks/time_vae_poison.py | Updates kernelspec metadata; changes reload defaults/paths; adds embedding investigation section. |
| dl/notebooks/time_series_data_and_embedding.py | Updates Jupytext metadata version; removes leftover Jupytext cell markers. |
| dl/notebooks/tabpfn.py | Updates Jupytext metadata version. |
| dl/notebooks/rnn_timeseries.py | Updates Jupytext metadata version. |
| dl/notebooks/rnn_timeseries_comparison.py | Updates Jupytext metadata version. |
| dl/notebooks/rnn_phase_space.py | Updates Jupytext metadata version. |
| dl/notebooks/predictive_coding.py | Adds a full CPC experimentation notebook (classification + downstream evaluation). |
| dl/notebooks/predictive_coding_forecasting.py | Adds CPC experimentation notebook oriented toward forecasting/data module usage. |
| dl/notebooks/pendulum_dataset.py | Updates Jupytext metadata version. |
| dl/notebooks/neuralode_timeseries.py | Updates Jupytext metadata version. |
| dl/notebooks/lstm_properties.py | Updates Jupytext metadata version. |
| dl/notebooks/hierarchical_forecasting_mint.py | Updates Jupytext metadata version. |
| dl/notebooks/feedforward_neural_netwroks_timeseries.py | Updates Jupytext metadata version. |
| dl/notebooks/diffusion_process.py | Updates Jupytext metadata version. |
| dl/notebooks/diffusion_model.py | Updates Jupytext metadata version. |
| dl/notebooks/diffusion_model_timegrad.py | Updates Jupytext metadata version. |
| dl/notebooks/creating_time_series_datasets.py | Updates Jupytext metadata version. |
| dl/notebooks/configs/predictive_coding/config.synth.yaml | Adds CPC training config for synthetic series. |
| dl/notebooks/configs/predictive_coding/config.sleep.yaml | Adds CPC training config for Sleep dataset. |
| dl/notebooks/configs/predictive_coding/config.forda.yaml | Adds CPC training config for FordA dataset. |
| dl/notebooks/configs/predictive_coding/config.ecg5000.yaml | Adds CPC training config for ECG5000 dataset. |
| dl/notebooks/configs/predictive_coding/config.ecg200.yaml | Adds CPC training config for ECG200 dataset. |
| dl/notebooks/configs/predictive_coding/config.binaryheartbeat.yaml | Adds CPC training config for BinaryHeartbeat dataset. |
```toml
loguru = "^0.7.2"
tabulate = "^0.9.0"
dtaidistance = "^2.3.12"
torchdr = "^0.3"
faiss-gpu = "^1.7.2"
```
`torchdr` and especially `faiss-gpu` are added to the main (non-optional) Poetry dependencies. Since `requirements.txt` is generated from the main dependency set and is used by Netlify / the PDF docs workflow (`pip install -r requirements.txt`), this will force GPU/CUDA-specific installs during docs builds and on developer machines by default. Consider moving these into an optional dependency group (e.g. `[tool.poetry.group.torch]` or a dedicated embedding/notebook group) and keeping the main dependency set limited to what MkDocs needs.
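A minimal sketch of what such an optional group could look like (the group name `embedding` is illustrative, not taken from this PR):

```toml
# Hypothetical optional Poetry group; the group name is illustrative.
[tool.poetry.group.embedding]
optional = true

[tool.poetry.group.embedding.dependencies]
torchdr = "^0.3"
faiss-gpu = "^1.7.2"
```

Developers who need the embedding stack would then opt in with `poetry install --with embedding`, while docs builds keep installing only the main dependency set.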
```python
    def __init__(self, transformer: nn.Module):
        super().__init__()
        self.transformer = transformer
        self.save_hyperparameters()
```
`self.save_hyperparameters()` will capture the `transformer` module object passed into `__init__`. This can bloat checkpoints and often breaks `load_from_checkpoint` (which expects to reconstruct init args from saved hparams). Consider either ignoring the module (`save_hyperparameters(ignore=["transformer"])`) and requiring it to be passed on load, or saving only a serializable transformer config needed to reconstruct it.
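A minimal sketch of the suggested fix, assuming the class is a LightningModule (the class name mirrors the notebook; the import style is illustrative):

```python
import torch.nn as nn
from pytorch_lightning import LightningModule


class TransformerForecaster(LightningModule):
    def __init__(self, transformer: nn.Module):
        super().__init__()
        self.transformer = transformer
        # Exclude the module object from hparams: keeps checkpoints small and
        # lets load_from_checkpoint work without pickling the whole module.
        self.save_hyperparameters(ignore=["transformer"])
```

On reload the module is then supplied explicitly, e.g. `TransformerForecaster.load_from_checkpoint(ckpt_path, transformer=transformer)`.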
```python
load_from_checkpoint = (
    Path(
        # "lightning_logs/transformer_ts_1_step/version_9"
        "lightning_logs/transformer_ts_1_step/version_7"
    )
    / "checkpoints"
)
```
This notebook hard-codes a specific `lightning_logs/.../version_7` path for checkpoint loading. That makes the notebook non-reproducible for others and brittle when log versions change. Prefer deriving the path from `logger_1_step.log_dir` (already noted below) or parameterizing the version/checkpoint filename.
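A sketch of the derived-path approach, assuming the notebook's `logger_1_step` logger and a single saved checkpoint (the glob pattern is illustrative):

```python
from pathlib import Path

# Derive the checkpoint directory from the logger instead of hard-coding a
# version; Lightning's default ModelCheckpoint writes to <log_dir>/checkpoints/.
checkpoint_dir = Path(logger_1_step.log_dir) / "checkpoints"
checkpoint_path = next(checkpoint_dir.glob("*.ckpt"))
```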
```python
def embedding_extractor(
    forecaster: TransformerForecaster, x: torch.Tensor
) -> tuple[torch.Tensor]:
    """compute the embeddings based on the input

    :param forecaster: the trained forecaster
    :param x: input historical time series,
    """
    forecaster.transformer.to(x.device)
    x_embedding = forecaster.transformer.embedding(
        x.type_as(forecaster.transformer.embedding.weight)
    )
    x_positional = forecaster.transformer.positional_encoding(x_embedding)

    encoder_state = forecaster.transformer.encoder(x_positional)

    reversed = forecaster.transformer.reverse_embedding(encoder_state).squeeze(-1)

    return x_embedding, x_positional, encoder_state, reversed
```
`embedding_extractor` is annotated as returning `tuple[torch.Tensor]` but actually returns 4 tensors. Also, assigning to `reversed` shadows Python's built-in `reversed()` function. Update the return type annotation to match and rename the local variable to something like `reversed_embedding`/`decoder_in` for clarity.
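A sketch of the cleaned-up version (the body is unchanged from the diff; only the annotation and the variable name differ):

```python
def embedding_extractor(
    forecaster: TransformerForecaster, x: torch.Tensor
) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
    """Compute the embeddings based on the input.

    :param forecaster: the trained forecaster
    :param x: input historical time series
    """
    forecaster.transformer.to(x.device)
    x_embedding = forecaster.transformer.embedding(
        x.type_as(forecaster.transformer.embedding.weight)
    )
    x_positional = forecaster.transformer.positional_encoding(x_embedding)
    encoder_state = forecaster.transformer.encoder(x_positional)
    # Renamed from `reversed` to avoid shadowing the builtin.
    reversed_embedding = forecaster.transformer.reverse_embedding(encoder_state).squeeze(-1)
    return x_embedding, x_positional, encoder_state, reversed_embedding
```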
```diff
 )

-generated_samples_x.size()
+generated_samples_x.size
```
`generated_samples_x` is a NumPy array here, and `.size` is a scalar attribute (not a shape tuple). If the intent is to inspect the shape like in the other notebook, use `.shape` (or, if this were a tensor, use `.size()` with parentheses).
Suggested change:

```diff
-generated_samples_x.size
+generated_samples_x.shape
```
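For reference, the NumPy semantics this comment relies on (a throwaway example, not from the notebook):

```python
import numpy as np

a = np.zeros((32, 100, 1))
print(a.size)   # 3200 -- total number of elements, a plain int
print(a.shape)  # (32, 100, 1) -- the shape tuple, analogous to tensor.size()
```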
```python
        self, data, batch_idx: int = 0
    ) -> tuple[torch.Tensor, torch.Tensor]:
        """Run one prediction step.

        :param data: Mini-batch tuple ``(x, y)``.
        :param batch_idx: Batch index within the prediction epoch.
        :returns: Tuple of context and encoded tensors.
        """
        x = data.float()
        context, encoded = self.model(x)
```
`predict_step`'s docstring says `data` is a mini-batch tuple `(x, y)`, but the implementation treats it as a tensor and calls `.float()` directly. This will fail if you use `Trainer.predict(...)` with a real DataLoader. Consider changing the signature to accept `batch`, unpack `(x, _)`, and then run the model.
Suggested change:

```diff
-        self, data, batch_idx: int = 0
-    ) -> tuple[torch.Tensor, torch.Tensor]:
-        """Run one prediction step.
-
-        :param data: Mini-batch tuple ``(x, y)``.
-        :param batch_idx: Batch index within the prediction epoch.
-        :returns: Tuple of context and encoded tensors.
-        """
-        x = data.float()
-        context, encoded = self.model(x)
+        self, batch, batch_idx: int = 0
+    ) -> tuple[torch.Tensor, torch.Tensor]:
+        """Run one prediction step.
+
+        :param batch: Mini-batch tuple ``(x, y)`` where ``x`` has shape ``[B, C_in, T]``.
+        :param batch_idx: Batch index within the prediction epoch.
+        :returns: Tuple of context and encoded tensors.
+        """
+        batch_x, _ = batch
+        batch_x = batch_x.float()
+        context, encoded = self.model(batch_x)
```
```python
        :param batch_idx: Batch index within the prediction epoch.
        :returns: Tuple of context and encoded tensors.
        """
        x = data.permute(0, 2, 1).float()
```
Same issue as `predictive_coding.py`: the docstring describes `data` as a batch tuple, but `predict_step` treats it as a tensor and calls `.permute(...)` directly. If used with `Trainer.predict`, this will raise. Unpack the batch first (e.g., `x, _ = batch`) before permuting/casting.
Suggested change:

```diff
-        x = data.permute(0, 2, 1).float()
+        x, _ = data
+        x = x.permute(0, 2, 1).float()
```