
Ts embedding #72

Merged: emptymalei merged 26 commits into main from ts-embedding on Mar 22, 2026
Conversation

@emptymalei
Owner

No description provided.

- Updated Jupytext version from 1.15.2 to 1.19.1 in multiple notebooks.
- Added a new notebook for Contrastive Predictive Coding (CPC) for time series data, including configurations, data handling, model architecture, and training logic.
- Updated dependency for `ts-bolt` from version 0.0.6 to 0.0.7 in `pyproject.toml`.
- …nization; streamline imports and enhance DataLoader configurations.
- …pochs to 300; modify normalization layer in CPCEncoder.
Copilot AI review requested due to automatic review settings March 22, 2026 20:22
@emptymalei emptymalei merged commit 0918332 into main Mar 22, 2026
4 checks passed

Copilot AI left a comment


Pull request overview

This PR appears to support time-series embedding experiments by adding new ML dependencies (FAISS + torchdr) and introducing/expanding notebooks for embedding visualization and Contrastive Predictive Coding (CPC).

Changes:

  • Add embedding-related dependencies (e.g., torchdr, faiss-gpu) and update ts-bolt.
  • Extend existing notebooks with embedding investigation sections and checkpoint reload workflows.
  • Add new CPC notebooks plus YAML configs for multiple datasets.

Reviewed changes

Copilot reviewed 41 out of 45 changed files in this pull request and generated 7 comments.

File Description
requirements.txt Adds FAISS/torchdr and large ML dependency set (torch, CUDA libs, sklearn/scipy, etc.).
pyproject.toml Adds torchdr + faiss-gpu to main dependencies; bumps ts-bolt to 0.0.7.
.gitignore Ignores dl/notebooks/data/predictive_coding/.
dl/notebooks/ts_dl_utils/embedding/__init__.py Embedding package init (file present in PR scope).
dl/notebooks/tree_random_forest.py Updates Jupytext metadata version.
dl/notebooks/tree_darts_random_forest.py Updates Jupytext metadata version.
dl/notebooks/tree_darts_boosted_tree.py Updates Jupytext metadata version.
dl/notebooks/tree_basics.py Updates Jupytext metadata version.
dl/notebooks/transformer-ts-nixtla.py Updates Jupytext metadata version.
dl/notebooks/transformer-ts-nixtla-testing.py Updates Jupytext metadata version.
dl/notebooks/transformer-ts-nixtla-testing-m5.py Updates Jupytext metadata version.
dl/notebooks/transformer-ts-nixtla_naive_data.py Updates Jupytext metadata version.
dl/notebooks/transformer-explainer.py Updates Jupytext metadata version.
dl/notebooks/transformer_timeseries_univariate.py Adds save_hyperparameters() and a large embedding investigation section; updates kernelspec metadata.
dl/notebooks/transformer_history.py Updates Jupytext metadata version.
dl/notebooks/transformer_embeddings.py Adds a new (currently header-only) notebook stub.
dl/notebooks/timeseries-comparison.py Updates Jupytext metadata version.
dl/notebooks/timeseries_gan.py Updates Jupytext metadata version.
dl/notebooks/timeseries_data_box-cox.py Updates Jupytext metadata version.
dl/notebooks/time-series-data-generation.py Updates Jupytext metadata version.
dl/notebooks/time_vae.py Updates kernelspec metadata; changes an output inspection to use .shape; adds embedding investigation section.
dl/notebooks/time_vae_poison.py Updates kernelspec metadata; changes reload defaults/paths; adds embedding investigation section.
dl/notebooks/time_series_data_and_embedding.py Updates Jupytext metadata version; removes leftover Jupytext cell markers.
dl/notebooks/tabpfn.py Updates Jupytext metadata version.
dl/notebooks/rnn_timeseries.py Updates Jupytext metadata version.
dl/notebooks/rnn_timeseries_comparison.py Updates Jupytext metadata version.
dl/notebooks/rnn_phase_space.py Updates Jupytext metadata version.
dl/notebooks/predictive_coding.py Adds a full CPC experimentation notebook (classification + downstream evaluation).
dl/notebooks/predictive_coding_forecasting.py Adds CPC experimentation notebook oriented toward forecasting/data module usage.
dl/notebooks/pendulum_dataset.py Updates Jupytext metadata version.
dl/notebooks/neuralode_timeseries.py Updates Jupytext metadata version.
dl/notebooks/lstm_properties.py Updates Jupytext metadata version.
dl/notebooks/hierarchical_forecasting_mint.py Updates Jupytext metadata version.
dl/notebooks/feedforward_neural_netwroks_timeseries.py Updates Jupytext metadata version.
dl/notebooks/diffusion_process.py Updates Jupytext metadata version.
dl/notebooks/diffusion_model.py Updates Jupytext metadata version.
dl/notebooks/diffusion_model_timegrad.py Updates Jupytext metadata version.
dl/notebooks/creating_time_series_datasets.py Updates Jupytext metadata version.
dl/notebooks/configs/predictive_coding/config.synth.yaml Adds CPC training config for synthetic series.
dl/notebooks/configs/predictive_coding/config.sleep.yaml Adds CPC training config for Sleep dataset.
dl/notebooks/configs/predictive_coding/config.forda.yaml Adds CPC training config for FordA dataset.
dl/notebooks/configs/predictive_coding/config.ecg5000.yaml Adds CPC training config for ECG5000 dataset.
dl/notebooks/configs/predictive_coding/config.ecg200.yaml Adds CPC training config for ECG200 dataset.
dl/notebooks/configs/predictive_coding/config.binaryheartbeat.yaml Adds CPC training config for BinaryHeartbeat dataset.


Comment thread pyproject.toml
Comment on lines 20 to +24
loguru = "^0.7.2"
tabulate = "^0.9.0"
dtaidistance = "^2.3.12"
torchdr = "^0.3"
faiss-gpu = "^1.7.2"

Copilot AI Mar 22, 2026


torchdr and especially faiss-gpu are added to the main (non-optional) Poetry dependencies. Since requirements.txt is generated from the main dependency set and is used by Netlify / the PDF docs workflow (pip install -r requirements.txt), this will force GPU/CUDA-specific installs during docs builds and on developer machines by default. Consider moving these into an optional dependency group (e.g. [tool.poetry.group.torch] or a dedicated embedding/notebook group) and keeping the main dependency set limited to what MkDocs needs.
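If the review's suggestion were taken, one possible pyproject.toml layout would use an optional Poetry dependency group (the group name `embedding` and the Poetry 1.2+ group syntax are assumptions here, not code from this PR):

```toml
# Sketch only: keep GPU-heavy packages out of the main dependency set
# so the exported requirements.txt used by docs builds stays lightweight.
[tool.poetry.group.embedding]
optional = true

[tool.poetry.group.embedding.dependencies]
torchdr = "^0.3"
faiss-gpu = "^1.7.2"
```

Notebook users would then opt in with `poetry install --with embedding`, while the main set stays limited to what MkDocs needs.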

Comment on lines 219 to 223
def __init__(self, transformer: nn.Module):
super().__init__()
self.transformer = transformer
self.save_hyperparameters()


Copilot AI Mar 22, 2026


self.save_hyperparameters() will capture the transformer module object passed into __init__. This can bloat checkpoints and often breaks load_from_checkpoint (which expects to reconstruct init args from saved hparams). Consider either ignoring the module (save_hyperparameters(ignore=["transformer"])) and requiring it to be passed on load, or saving only a serializable transformer config needed to reconstruct it.

Comment on lines +519 to +524
load_from_checkpoint = (
Path(
# "lightning_logs/transformer_ts_1_step/version_9"
"lightning_logs/transformer_ts_1_step/version_7"
)
/ "checkpoints"

Copilot AI Mar 22, 2026


This notebook hard-codes a specific lightning_logs/.../version_7 path for checkpoint loading. That makes the notebook non-reproducible for others and brittle when log versions change. Prefer deriving the path from logger_1_step.log_dir (already noted below) or parameterizing the version/checkpoint filename.
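A torch-free sketch of one way to avoid the hard-coded `version_7` directory: resolve the newest `version_*` folder under the log root at runtime. The helper name and the `*.ckpt` glob pattern are assumptions for illustration, not code from the PR.

```python
from pathlib import Path


def latest_checkpoint(log_root: str) -> Path:
    """Return the newest checkpoint under a lightning_logs experiment dir.

    Hypothetical helper: picks the highest-numbered version_* directory,
    then the lexicographically last .ckpt inside its checkpoints/ folder.
    """
    versions = sorted(
        Path(log_root).glob("version_*"),
        key=lambda p: int(p.name.rsplit("_", 1)[-1]),
    )
    if not versions:
        raise FileNotFoundError(f"no version_* dirs under {log_root}")
    ckpts = sorted((versions[-1] / "checkpoints").glob("*.ckpt"))
    if not ckpts:
        raise FileNotFoundError(f"no .ckpt files in {versions[-1]}")
    return ckpts[-1]
```

When a logger object is already in scope, deriving the path from its `log_dir` (as the review notes) is simpler still.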

Comment on lines +547 to +565
def embedding_extractor(
forecaster: TransformerForecaster, x: torch.Tensor
) -> tuple[torch.Tensor]:
"""compute the embeddings based on the input

:param forecaster: the trained forecaster
:param x: input historical time series,
"""
forecaster.transformer.to(x.device)
x_embedding = forecaster.transformer.embedding(
x.type_as(forecaster.transformer.embedding.weight)
)
x_positional = forecaster.transformer.positional_encoding(x_embedding)

encoder_state = forecaster.transformer.encoder(x_positional)

reversed = forecaster.transformer.reverse_embedding(encoder_state).squeeze(-1)

return x_embedding, x_positional, encoder_state, reversed

Copilot AI Mar 22, 2026


embedding_extractor is annotated as returning tuple[torch.Tensor] but actually returns 4 tensors. Also, assigning to reversed shadows Python’s built-in reversed() function. Update the return type annotation to match and rename the local variable to something like reversed_embedding/decoder_in for clarity.
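A torch-free sketch of the two fixes the review asks for: a return annotation that matches the four returned values, and a local name that does not shadow the builtin `reversed()`. The list-based stand-ins for the embedding, positional-encoding, and encoder steps are illustrative only.

```python
def embedding_extractor_sketch(x: list) -> tuple[list, list, list, list]:
    """Illustrative stand-in: four return values, four-element tuple annotation."""
    x_embedding = [v * 2 for v in x]             # stand-in for the embedding layer
    x_positional = [v + 1 for v in x_embedding]  # stand-in for positional encoding
    encoder_state = x_positional                 # stand-in for the encoder
    # renamed from `reversed` so the builtin reversed() stays usable
    reversed_embedding = list(reversed(encoder_state))
    return x_embedding, x_positional, encoder_state, reversed_embedding
```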

)

- generated_samples_x.size()
+ generated_samples_x.size

Copilot AI Mar 22, 2026


generated_samples_x is a NumPy array here, and .size is a scalar attribute (not a shape tuple). If the intent is to inspect the shape like in the other notebook, use .shape (or if this were a tensor, use .size() with parentheses).

Suggested change
- generated_samples_x.size
+ generated_samples_x.shape
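A short illustration of the distinction the review draws, using a stand-in NumPy array (the `(8, 100, 1)` shape is made up for the example):

```python
import numpy as np

# hypothetical stand-in for generated_samples_x
generated = np.zeros((8, 100, 1))

print(generated.size)   # 800 -- total element count, a plain int
print(generated.shape)  # (8, 100, 1) -- the tuple you usually want to inspect
```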

Comment on lines +671 to +680
self, data, batch_idx: int = 0
) -> tuple[torch.Tensor, torch.Tensor]:
"""Run one prediction step.

:param data: Mini-batch tuple ``(x, y)``.
:param batch_idx: Batch index within the prediction epoch.
:returns: Tuple of context and encoded tensors.
"""
x = data.float()
context, encoded = self.model(x)

Copilot AI Mar 22, 2026


predict_step’s docstring says data is a mini-batch tuple (x, y), but the implementation treats it as a tensor and calls .float() directly. This will fail if you use Trainer.predict(...) with a real DataLoader. Consider changing the signature to accept batch, unpack (x, _), and then run the model.

Suggested change
-     self, data, batch_idx: int = 0
- ) -> tuple[torch.Tensor, torch.Tensor]:
-     """Run one prediction step.
-     :param data: Mini-batch tuple ``(x, y)``.
-     :param batch_idx: Batch index within the prediction epoch.
-     :returns: Tuple of context and encoded tensors.
-     """
-     x = data.float()
-     context, encoded = self.model(x)
+     self, batch, batch_idx: int = 0
+ ) -> tuple[torch.Tensor, torch.Tensor]:
+     """Run one prediction step.
+     :param batch: Mini-batch tuple ``(x, y)`` where ``x`` has shape ``[B, C_in, T]``.
+     :param batch_idx: Batch index within the prediction epoch.
+     :returns: Tuple of context and encoded tensors.
+     """
+     batch_x, _ = batch
+     batch_x = batch_x.float()
+     context, encoded = self.model(batch_x)
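The unpacking pattern from the suggestion, sketched without torch (the list-based batch and the float-cast stand-in for `self.model(x.float())` are placeholders):

```python
def predict_step_sketch(batch, batch_idx: int = 0):
    """Unpack the (x, y) tuple a DataLoader yields before touching x."""
    x, _ = batch                   # labels are unused at prediction time
    return [float(v) for v in x]   # stand-in for self.model(x.float())


# a DataLoader-style batch: (inputs, labels)
predict_step_sketch(([1, 2, 3], [0]))  # -> [1.0, 2.0, 3.0]
```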

:param batch_idx: Batch index within the prediction epoch.
:returns: Tuple of context and encoded tensors.
"""
x = data.permute(0, 2, 1).float()

Copilot AI Mar 22, 2026


Same issue as predictive_coding.py: the docstring describes data as a batch tuple, but predict_step treats it as a tensor and calls .permute(...). If used with Trainer.predict, this will raise. Unpack the batch first (e.g., x, _ = batch) before permuting/casting.

Suggested change
- x = data.permute(0, 2, 1).float()
+ x, _ = data
+ x = x.permute(0, 2, 1).float()
