Ts embedding #72
Conversation
- Updated Jupytext version from 1.15.2 to 1.19.1 in multiple notebooks.
- Added a new notebook for Contrastive Predictive Coding (CPC) for time series data, including configurations, data handling, model architecture, and training logic.
- Updated the `ts-bolt` dependency from version 0.0.6 to 0.0.7 in `pyproject.toml`.
…ctive coding notebook
…g notebook with new data handling
…ta handling in predictive coding
…nization; streamline imports and enhance DataLoader configurations.
…pochs to 300; modify normalization layer in CPCEncoder
Pull request overview
This PR appears to support time-series embedding experiments by adding new ML dependencies (FAISS + torchdr) and introducing/expanding notebooks for embedding visualization and Contrastive Predictive Coding (CPC).
Changes:
- Add embedding-related dependencies (e.g., `torchdr`, `faiss-gpu`) and update `ts-bolt`.
- Extend existing notebooks with embedding investigation sections and checkpoint reload workflows.
- Add new CPC notebooks plus YAML configs for multiple datasets.
Reviewed changes
Copilot reviewed 41 out of 45 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| requirements.txt | Adds FAISS/torchdr and large ML dependency set (torch, CUDA libs, sklearn/scipy, etc.). |
| pyproject.toml | Adds torchdr + faiss-gpu to main dependencies; bumps ts-bolt to 0.0.7. |
| .gitignore | Ignores dl/notebooks/data/predictive_coding/. |
| dl/notebooks/ts_dl_utils/embedding/__init__.py | Embedding package init (file present in PR scope). |
| dl/notebooks/tree_random_forest.py | Updates Jupytext metadata version. |
| dl/notebooks/tree_darts_random_forest.py | Updates Jupytext metadata version. |
| dl/notebooks/tree_darts_boosted_tree.py | Updates Jupytext metadata version. |
| dl/notebooks/tree_basics.py | Updates Jupytext metadata version. |
| dl/notebooks/transformer-ts-nixtla.py | Updates Jupytext metadata version. |
| dl/notebooks/transformer-ts-nixtla-testing.py | Updates Jupytext metadata version. |
| dl/notebooks/transformer-ts-nixtla-testing-m5.py | Updates Jupytext metadata version. |
| dl/notebooks/transformer-ts-nixtla_naive_data.py | Updates Jupytext metadata version. |
| dl/notebooks/transformer-explainer.py | Updates Jupytext metadata version. |
| dl/notebooks/transformer_timeseries_univariate.py | Adds save_hyperparameters() and a large embedding investigation section; updates kernelspec metadata. |
| dl/notebooks/transformer_history.py | Updates Jupytext metadata version. |
| dl/notebooks/transformer_embeddings.py | Adds a new (currently header-only) notebook stub. |
| dl/notebooks/timeseries-comparison.py | Updates Jupytext metadata version. |
| dl/notebooks/timeseries_gan.py | Updates Jupytext metadata version. |
| dl/notebooks/timeseries_data_box-cox.py | Updates Jupytext metadata version. |
| dl/notebooks/time-series-data-generation.py | Updates Jupytext metadata version. |
| dl/notebooks/time_vae.py | Updates kernelspec metadata; changes an output inspection to use .shape; adds embedding investigation section. |
| dl/notebooks/time_vae_poison.py | Updates kernelspec metadata; changes reload defaults/paths; adds embedding investigation section. |
| dl/notebooks/time_series_data_and_embedding.py | Updates Jupytext metadata version; removes leftover Jupytext cell markers. |
| dl/notebooks/tabpfn.py | Updates Jupytext metadata version. |
| dl/notebooks/rnn_timeseries.py | Updates Jupytext metadata version. |
| dl/notebooks/rnn_timeseries_comparison.py | Updates Jupytext metadata version. |
| dl/notebooks/rnn_phase_space.py | Updates Jupytext metadata version. |
| dl/notebooks/predictive_coding.py | Adds a full CPC experimentation notebook (classification + downstream evaluation). |
| dl/notebooks/predictive_coding_forecasting.py | Adds CPC experimentation notebook oriented toward forecasting/data module usage. |
| dl/notebooks/pendulum_dataset.py | Updates Jupytext metadata version. |
| dl/notebooks/neuralode_timeseries.py | Updates Jupytext metadata version. |
| dl/notebooks/lstm_properties.py | Updates Jupytext metadata version. |
| dl/notebooks/hierarchical_forecasting_mint.py | Updates Jupytext metadata version. |
| dl/notebooks/feedforward_neural_netwroks_timeseries.py | Updates Jupytext metadata version. |
| dl/notebooks/diffusion_process.py | Updates Jupytext metadata version. |
| dl/notebooks/diffusion_model.py | Updates Jupytext metadata version. |
| dl/notebooks/diffusion_model_timegrad.py | Updates Jupytext metadata version. |
| dl/notebooks/creating_time_series_datasets.py | Updates Jupytext metadata version. |
| dl/notebooks/configs/predictive_coding/config.synth.yaml | Adds CPC training config for synthetic series. |
| dl/notebooks/configs/predictive_coding/config.sleep.yaml | Adds CPC training config for Sleep dataset. |
| dl/notebooks/configs/predictive_coding/config.forda.yaml | Adds CPC training config for FordA dataset. |
| dl/notebooks/configs/predictive_coding/config.ecg5000.yaml | Adds CPC training config for ECG5000 dataset. |
| dl/notebooks/configs/predictive_coding/config.ecg200.yaml | Adds CPC training config for ECG200 dataset. |
| dl/notebooks/configs/predictive_coding/config.binaryheartbeat.yaml | Adds CPC training config for BinaryHeartbeat dataset. |
```toml
loguru = "^0.7.2"
tabulate = "^0.9.0"
dtaidistance = "^2.3.12"
torchdr = "^0.3"
faiss-gpu = "^1.7.2"
```
`torchdr` and especially `faiss-gpu` are added to the main (non-optional) Poetry dependencies. Since `requirements.txt` is generated from the main dependency set and is used by Netlify / the PDF docs workflow (`pip install -r requirements.txt`), this will force GPU/CUDA-specific installs during docs builds and on developer machines by default. Consider moving these into an optional dependency group (e.g. `[tool.poetry.group.torch]` or a dedicated embedding/notebook group) and keeping the main dependency set limited to what MkDocs needs.
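A minimal sketch of what such an optional group could look like (the group name `embedding` is illustrative, not taken from this PR):

```toml
# Hypothetical optional Poetry group; the group name is illustrative.
[tool.poetry.group.embedding]
optional = true

[tool.poetry.group.embedding.dependencies]
torchdr = "^0.3"
faiss-gpu = "^1.7.2"
```

Developers who need the embedding stack would then opt in with `poetry install --with embedding`, while docs builds keep installing only the main dependency set.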
```python
    def __init__(self, transformer: nn.Module):
        super().__init__()
        self.transformer = transformer
        self.save_hyperparameters()
```
`self.save_hyperparameters()` will capture the `transformer` module object passed into `__init__`. This can bloat checkpoints and often breaks `load_from_checkpoint` (which expects to reconstruct init args from saved hparams). Consider either ignoring the module (`save_hyperparameters(ignore=["transformer"])`) and requiring it to be passed on load, or saving only a serializable transformer config needed to reconstruct it.
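A minimal sketch of the suggested fix, assuming the class is a LightningModule (the class name mirrors the notebook; the import style is illustrative):

```python
import torch.nn as nn
from pytorch_lightning import LightningModule


class TransformerForecaster(LightningModule):
    def __init__(self, transformer: nn.Module):
        super().__init__()
        self.transformer = transformer
        # Exclude the module object from hparams: keeps checkpoints small and
        # lets load_from_checkpoint work without pickling the whole module.
        self.save_hyperparameters(ignore=["transformer"])
```

On reload the module is then supplied explicitly, e.g. `TransformerForecaster.load_from_checkpoint(ckpt_path, transformer=transformer)`.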
```python
load_from_checkpoint = (
    Path(
        # "lightning_logs/transformer_ts_1_step/version_9"
        "lightning_logs/transformer_ts_1_step/version_7"
    )
    / "checkpoints"
)
```
This notebook hard-codes a specific `lightning_logs/.../version_7` path for checkpoint loading. That makes the notebook non-reproducible for others and brittle when log versions change. Prefer deriving the path from `logger_1_step.log_dir` (already noted below) or parameterizing the version/checkpoint filename.
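A sketch of the derived-path approach, assuming the notebook's `logger_1_step` logger and a single saved checkpoint (the glob pattern is illustrative):

```python
from pathlib import Path

# Derive the checkpoint directory from the logger instead of hard-coding a
# version; Lightning's default ModelCheckpoint writes to <log_dir>/checkpoints/.
checkpoint_dir = Path(logger_1_step.log_dir) / "checkpoints"
checkpoint_path = next(checkpoint_dir.glob("*.ckpt"))
```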
```python
def embedding_extractor(
    forecaster: TransformerForecaster, x: torch.Tensor
) -> tuple[torch.Tensor]:
    """compute the embeddings based on the input

    :param forecaster: the trained forecaster
    :param x: input historical time series,
    """
    forecaster.transformer.to(x.device)
    x_embedding = forecaster.transformer.embedding(
        x.type_as(forecaster.transformer.embedding.weight)
    )
    x_positional = forecaster.transformer.positional_encoding(x_embedding)

    encoder_state = forecaster.transformer.encoder(x_positional)

    reversed = forecaster.transformer.reverse_embedding(encoder_state).squeeze(-1)

    return x_embedding, x_positional, encoder_state, reversed
```
`embedding_extractor` is annotated as returning `tuple[torch.Tensor]` but actually returns 4 tensors. Also, assigning to `reversed` shadows Python's built-in `reversed()` function. Update the return type annotation to match and rename the local variable to something like `reversed_embedding`/`decoder_in` for clarity.
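A sketch of the cleaned-up version (the body is unchanged from the diff; only the annotation and the variable name differ):

```python
def embedding_extractor(
    forecaster: TransformerForecaster, x: torch.Tensor
) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
    """Compute the embeddings based on the input.

    :param forecaster: the trained forecaster
    :param x: input historical time series
    """
    forecaster.transformer.to(x.device)
    x_embedding = forecaster.transformer.embedding(
        x.type_as(forecaster.transformer.embedding.weight)
    )
    x_positional = forecaster.transformer.positional_encoding(x_embedding)
    encoder_state = forecaster.transformer.encoder(x_positional)
    # Renamed from `reversed` to avoid shadowing the builtin.
    reversed_embedding = forecaster.transformer.reverse_embedding(encoder_state).squeeze(-1)
    return x_embedding, x_positional, encoder_state, reversed_embedding
```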
```diff
 )

-generated_samples_x.size()
+generated_samples_x.size
```
`generated_samples_x` is a NumPy array here, and `.size` is a scalar attribute (not a shape tuple). If the intent is to inspect the shape like in the other notebook, use `.shape` (or, if this were a tensor, use `.size()` with parentheses).
Suggested change:

```diff
-generated_samples_x.size
+generated_samples_x.shape
```
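For reference, the NumPy semantics this comment relies on (a throwaway example, not from the notebook):

```python
import numpy as np

a = np.zeros((32, 100, 1))
print(a.size)   # 3200 -- total number of elements, a plain int
print(a.shape)  # (32, 100, 1) -- the shape tuple, analogous to tensor.size()
```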
```python
        self, data, batch_idx: int = 0
    ) -> tuple[torch.Tensor, torch.Tensor]:
        """Run one prediction step.

        :param data: Mini-batch tuple ``(x, y)``.
        :param batch_idx: Batch index within the prediction epoch.
        :returns: Tuple of context and encoded tensors.
        """
        x = data.float()
        context, encoded = self.model(x)
```
`predict_step`'s docstring says `data` is a mini-batch tuple `(x, y)`, but the implementation treats it as a tensor and calls `.float()` directly. This will fail if you use `Trainer.predict(...)` with a real DataLoader. Consider changing the signature to accept `batch`, unpack `(x, _)`, and then run the model.
Suggested change:

```diff
-        self, data, batch_idx: int = 0
-    ) -> tuple[torch.Tensor, torch.Tensor]:
-        """Run one prediction step.
-
-        :param data: Mini-batch tuple ``(x, y)``.
-        :param batch_idx: Batch index within the prediction epoch.
-        :returns: Tuple of context and encoded tensors.
-        """
-        x = data.float()
-        context, encoded = self.model(x)
+        self, batch, batch_idx: int = 0
+    ) -> tuple[torch.Tensor, torch.Tensor]:
+        """Run one prediction step.
+
+        :param batch: Mini-batch tuple ``(x, y)`` where ``x`` has shape ``[B, C_in, T]``.
+        :param batch_idx: Batch index within the prediction epoch.
+        :returns: Tuple of context and encoded tensors.
+        """
+        batch_x, _ = batch
+        batch_x = batch_x.float()
+        context, encoded = self.model(batch_x)
```
```python
        :param batch_idx: Batch index within the prediction epoch.
        :returns: Tuple of context and encoded tensors.
        """
        x = data.permute(0, 2, 1).float()
```
Same issue as `predictive_coding.py`: the docstring describes `data` as a batch tuple, but `predict_step` treats it as a tensor and calls `.permute(...)` directly. If used with `Trainer.predict`, this will raise. Unpack the batch first (e.g., `x, _ = batch`) before permuting/casting.
Suggested change:

```diff
-        x = data.permute(0, 2, 1).float()
+        x, _ = data
+        x = x.permute(0, 2, 1).float()
```