ADD scalar_model #292
Conversation
Summary of Changes

Hello @michaelteboul, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a novel self-attention model, SelfAttentionModelScalar, which uses a dot product for embeddings instead of a distance.
Code Review
This pull request introduces a new SelfAttentionModelScalar for basket modeling, utilizing a dot product for embeddings instead of distance. The changes primarily involve adding this new model class, which inherits from BaseBasketModel. The implementation includes attention mechanisms, short-term and long-term utility calculations, and a custom loss function. While the overall structure is sound, several docstring inaccuracies, shape mismatches in comments, and a critical logical error in the compute_psi method's tf.einsum operations need to be addressed to ensure correctness and maintainability.
```python
if self.store_effects:
    theta_store = tf.gather(self.theta, indices=store_batch)
    # Compute the dot product along the last dimension
    store_preferences = tf.einsum("kj,klj->kl", theta_store, x_item)
```
The tf.einsum operation tf.einsum("kj,klj->kl", theta_store, x_item) is likely incorrect given the shapes of theta_store and x_item. theta_store is (batch_size, d) and x_item is (batch_size, d). To compute a batch-wise dot product, you should use tf.reduce_sum(theta_store * x_item, axis=-1) or tf.einsum("bd,bd->b", theta_store, x_item). The current einsum pattern implies x_item has an extra dimension L which it does not. This will lead to runtime errors or incorrect calculations.
```diff
- store_preferences = tf.einsum("kj,klj->kl", theta_store, x_item)
+ store_preferences = tf.reduce_sum(theta_store * x_item, axis=-1)
```
```python
price_effects = (
    -1
    # Compute the dot product along the last dimension
    * tf.einsum("kj,klj->kl", delta_store, beta_item)
```
Similar to the previous comment, the tf.einsum operation tf.einsum("kj,klj->kl", delta_store, beta_item) is incorrect. delta_store is (batch_size, latent_sizes["price"]) and beta_item is (batch_size, latent_sizes["price"]). For a batch-wise dot product, it should be tf.reduce_sum(delta_store * beta_item, axis=-1) or tf.einsum("bp,bp->b", delta_store, beta_item).
```diff
- * tf.einsum("kj,klj->kl", delta_store, beta_item)
+ * tf.reduce_sum(delta_store * beta_item, axis=-1)
```
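Both suggestions apply the same fix. As a quick sanity check, a minimal sketch (toy shapes, not the PR's code) showing that the proposed `reduce_sum` form and the `"bd,bd->b"` einsum form compute the same batch-wise dot product:

```python
import tensorflow as tf

# Made-up shapes for illustration: batch of 4 vectors of dimension 8.
theta_store = tf.random.normal((4, 8))
x_item = tf.random.normal((4, 8))

# Batch-wise dot product: one scalar per batch element -> shape (4,).
dot_reduce = tf.reduce_sum(theta_store * x_item, axis=-1)
dot_einsum = tf.einsum("bd,bd->b", theta_store, x_item)

tf.debugging.assert_near(dot_reduce, dot_einsum)  # passes: the two are equivalent
```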
```python
attention_weights = tf.ones_like(scaled_scores)  # Shape: (batch_size, L, 1)

else:
    # Masque de la diagonale, désactivé pour l'instant
    # (diagonal mask, deactivated for now)
```
The comment Masque de la diagonale, désactivé pour l'instant is in French. Please translate it to English for consistency. Also, the code scaled_scores = tf.where(diag_mask, tf.constant(-np.inf, dtype=scaled_scores.dtype), scaled_scores) does apply the diagonal mask, which contradicts the 'désactivé' (deactivated) part of the comment. Please either remove the masking code if it's truly deactivated or update the comment to reflect that it is active.
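For reference, a minimal sketch of the diagonal-masking pattern the review describes, with toy scores (names are illustrative, not the PR's code):

```python
import numpy as np
import tensorflow as tf

# Toy attention scores for baskets of L = 3 items, batch_size = 2.
scaled_scores = tf.random.normal((2, 3, 3))

# Mask the diagonal so an item does not attend to itself:
# set diagonal entries to -inf before the softmax.
diag_mask = tf.eye(3, batch_shape=[2], dtype=tf.bool)
masked_scores = tf.where(
    diag_mask, tf.constant(-np.inf, dtype=scaled_scores.dtype), scaled_scores
)
attention_weights = tf.nn.softmax(masked_scores, axis=-1)  # diagonal weights are 0
```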
```python
self,
n_items: int,
n_users: int,
n_stores: int,
```
```python
    shape=(n_stores, self.d)
),  # Dimension for 1 item: latent_sizes["preferences"]
```
```python
basket_batch: np.ndarray
    Batch of baskets (ID of items already in the baskets) (arrays) for each purchased item
    Shape must be (batch_size, max_basket_size)
store_batch: np.ndarray
    Batch of store IDs (integers) for each purchased item
    Shape must be (batch_size,)
```
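The `-1` padding convention for `basket_batch` matters later in `compute_batch_loss`, where padded entries are stripped with a ragged mask and re-padded with `n_items` (see the suggested code further down). A toy example of that transformation:

```python
import tensorflow as tf

# Toy baskets padded with -1 up to max_basket_size = 4; n_items = 10.
basket_batch = tf.constant([[3, 7, -1, -1], [2, -1, -1, -1]])
n_items = 10

# Drop the -1 padding, then re-pad with n_items (an out-of-range "no item" index).
ragged = tf.ragged.boolean_mask(basket_batch, basket_batch != -1)
padded = ragged.to_tensor(n_items)
print(padded.numpy())  # [[ 3  7]
                       #  [ 2 10]]  (padded to the longest remaining row)
```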
| """ | ||
| store_batch = tf.cast(store_batch, dtype=tf.int32) | ||
| price_batch = tf.cast(price_batch, dtype=tf.float32) | ||
| x_item = tf.gather(self.X, indices=item_batch) # Shape: (batch_size, L, d) |
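Whether `x_item` is `(batch_size, d)` or `(batch_size, L, d)` depends entirely on the rank of `item_batch`, which is the crux of the einsum comments above. A quick illustration of `tf.gather`'s behavior with toy values:

```python
import tensorflow as tf

# Toy embedding table: 10 items, d = 4.
X = tf.random.normal((10, 4))

# 1-D indices (batch_size,) -> embeddings of shape (batch_size, d).
print(tf.gather(X, indices=[1, 2, 3]).shape)                 # (3, 4)

# 2-D indices (batch_size, L) -> embeddings of shape (batch_size, L, d).
print(tf.gather(X, indices=[[1, 2], [3, 4], [5, 6]]).shape)  # (3, 2, 4)
```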
```python
    Whether to include item intercept in the model, by default True
price_effects: bool, optional
    Whether to include price effects in the model, by default True
epsilon_price: float, optional
    Epsilon value to add to prices to avoid NaN values (log(0)), by default 1e-4
```
```python
short_term_weight : float
    Weighting factor between long-term and short-term preferences.
```
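The diff does not show how this weight is applied; a plausible reading, sketched as a convex combination (the formula and tensor names here are assumptions, not confirmed by the PR):

```python
import tensorflow as tf

# Hypothetical values; the actual combination formula is not shown in the diff.
short_term_weight = 0.3
short_term_utility = tf.constant([0.5, -1.2])
long_term_utility = tf.constant([1.0, 0.4])

# Assumed convex combination of the two preference signals.
total_utility = (
    short_term_weight * short_term_utility
    + (1.0 - short_term_weight) * long_term_utility
)
```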
```python
) -> tuple[tf.Variable]:
    """Compute total loss.

    Parameters
    ----------
    item_batch: np.ndarray
        Batch of purchased item IDs (integers)
        Shape must be (batch_size,)
    basket_batch: np.ndarray
        Batch of baskets (ID of items already in the baskets) (arrays) for each purchased item
        Shape must be (batch_size, max_basket_size)
    future_batch: np.ndarray
        Batch of items to be purchased in the future (ID of items not yet in the
        basket) (arrays) for each purchased item
        Shape must be (batch_size, max_basket_size)
        Here for signature reasons, unused for this model
    store_batch: np.ndarray
        Batch of store IDs (integers) for each purchased item
        Shape must be (batch_size,)
    week_batch: np.ndarray
        Batch of week numbers (integers) for each purchased item
        Shape must be (batch_size,)
    price_batch: np.ndarray
        Batch of prices (floats) for each purchased item
        Shape must be (batch_size,)
    available_item_batch: np.ndarray
        List of availability matrices (indicating the availability (1) or not (0)
        of the products) (arrays) for each purchased item
        Shape must be (batch_size, n_items)
    user_batch: np.ndarray
        Batch of user IDs (integers) for each purchased item
        Shape must be (batch_size,)
    is_training: bool
        Whether the model is in training mode or not, to activate dropout if needed.
        True by default, because compute_batch_loss is only used during training.

    Returns
    -------
    tf.Variable
        Value of the loss for the batch (Hinge loss),
        Shape must be (1,)
    _: None
        Placeholder to match the signature of the parent class method
    """
```
The return type hint for compute_batch_loss is tuple[tf.Variable], but the function returns a tuple of two tf.Tensors (loss and loglikelihood). It should be tuple[tf.Tensor, tf.Tensor].
Additionally, the docstring states _: None for the second return value, but loglikelihood is actually returned. Please update the docstring to reflect this.
Finally, epsilon = 0.0 is used when computing loglikelihood. If tf.sigmoid(...) evaluates to 0, tf.math.log(0) will result in NaN. A small positive epsilon (e.g., 1e-8) should be used to prevent this.
```python
) -> tuple[tf.Tensor, tf.Tensor]:
    """Compute total loss.

    Parameters
    ----------
    item_batch: np.ndarray
        Batch of purchased item IDs (integers)
        Shape must be (batch_size,)
    basket_batch: np.ndarray
        Batch of baskets (ID of items already in the baskets) (arrays) for each purchased item
        Shape must be (batch_size, max_basket_size)
    future_batch: np.ndarray
        Batch of items to be purchased in the future (ID of items not yet in the
        basket) (arrays) for each purchased item
        Shape must be (batch_size, max_basket_size)
        Here for signature reasons, unused for this model
    store_batch: np.ndarray
        Batch of store IDs (integers) for each purchased item
        Shape must be (batch_size,)
    week_batch: np.ndarray
        Batch of week numbers (integers) for each purchased item
        Shape must be (batch_size,)
    price_batch: np.ndarray
        Batch of prices (floats) for each purchased item
        Shape must be (batch_size,)
    available_item_batch: np.ndarray
        List of availability matrices (indicating the availability (1) or not (0)
        of the products) (arrays) for each purchased item
        Shape must be (batch_size, n_items)
    user_batch: np.ndarray
        Batch of user IDs (integers) for each purchased item
        Shape must be (batch_size,)
    is_training: bool
        Whether the model is in training mode or not, to activate dropout if needed.
        True by default, because compute_batch_loss is only used during training.

    Returns
    -------
    batch_loss: tf.Tensor
        Value of the loss for the batch (Hinge loss),
        Shape must be (1,)
    loglikelihood: tf.Tensor
        Computed log-likelihood of the batch of items
        Approximated by difference of utilities between positive and negative samples
        Shape must be (1,)
    """
    _ = future_batch  # Unused for this model

    batch_size = len(item_batch)
    negative_samples = tf.stack(
        [
            self.get_negative_samples(
                available_items=available_item_batch[idx],
                purchased_items=basket_batch[idx],
                next_item=item_batch[idx],
                n_samples=self.n_negative_samples,
            )
            for idx in range(batch_size)
        ],
        axis=0,
    )  # Shape: (batch_size, n_negative_samples)

    item_batch = tf.cast(item_batch, tf.int32)
    negative_samples = tf.cast(negative_samples, tf.int32)
    augmented_item_batch = tf.cast(
        tf.concat([tf.expand_dims(item_batch, axis=-1), negative_samples], axis=1),
        dtype=tf.int32,
    )  # Shape: (batch_size, 1 + n_negative_samples)

    basket_batch_ragged = tf.cast(
        tf.ragged.boolean_mask(basket_batch, basket_batch != -1),
        dtype=tf.int32,
    )
    basket_batch = basket_batch_ragged.to_tensor(self.n_items)

    augmented_price_batch = tf.gather(
        params=price_batch, indices=augmented_item_batch, batch_dims=1
    )

    all_utilities = self.compute_batch_utility(
        item_batch=augmented_item_batch,
        basket_batch=basket_batch,
        store_batch=store_batch,
        week_batch=week_batch,
        price_batch=augmented_price_batch,
        available_item_batch=available_item_batch,
        user_batch=user_batch,
        is_training=is_training,
    )  # Shape: (batch_size, 1 + n_negative_samples)

    positive_samples_utility = tf.gather(params=all_utilities, indices=[0], axis=1)
    negative_samples_utility = tf.gather(
        params=all_utilities, indices=tf.range(1, self.n_negative_samples + 1), axis=1
    )  # (batch_size, n_negative_samples)

    ridge_regularization = self.l2_regularization * tf.add_n(
        [tf.nn.l2_loss(weight) for weight in self.trainable_weights]
    )
    epsilon = 1e-8
```
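The snippet ends at the epsilon definition, before the final loss expression. As a standalone illustration (toy values, not the PR's actual loss code) of why the reviewer's small positive epsilon is needed inside the log:

```python
import tensorflow as tf

# A large negative utility gap underflows: tf.sigmoid(-200.0) is exactly 0.0
# in float32, so log(sigmoid(...)) without an epsilon would be -inf and
# poison the gradients.
utility_diff = tf.constant([-200.0, 0.0, 3.0])

epsilon = 1e-8  # small positive constant, as suggested in the review
loglikelihood = tf.reduce_sum(tf.math.log(tf.sigmoid(utility_diff) + epsilon))
print(loglikelihood.numpy())  # finite: about log(1e-8) + log(0.5) + log(0.953)
```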
Coverage Report for Python 3.11
Coverage Report for Python 3.10
Coverage Report for Python 3.12