Description
Prerequisites
- I have searched the existing issues to avoid duplicates
- I understand that this is just a suggestion and might not be implemented
Problem Statement
Our current caching strategy is reactive. When a popular, high-traffic product's cache TTL expires, the first user request to hit the server after expiration experiences a significant latency penalty. This is because their request is responsible for triggering the database fetch to rebuild the cache.
While our current Redlock implementation successfully prevents a database stampede (thundering herd) by ensuring only one process rebuilds the cache, it does not solve this "first-user latency" problem. For best-selling items or products featured in a flash sale, this delay can negatively impact user experience and conversion rates.
Proposed Solution
I propose implementing a "proactive cache re-warming" (also known as pre-fetching) mechanism. This system would work as follows:
- Identify Imminent Expirations: A background worker/process would periodically scan Redis for keys that are about to expire (e.g., keys with a TTL of less than 2 minutes). It would focus on a specific subset of keys we identify as critical (e.g., product:*, category:main-page).
- Acquire Lock and Refresh: For each key nearing expiration, the worker will attempt to acquire a Redlock lock for that resource (e.g., lock:rewarm:product:123).
- Re-warm the Cache: Once the lock is acquired, the worker will fetch the fresh data from the primary database and update the cache, resetting its TTL.
This process ensures that the cache is refreshed before it expires, meaning live user traffic should almost never encounter a cold cache for these critical items.
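The selection step in the worker can be sketched as a small pure function. This is only an illustration of the filtering logic; the function name `keysToRewarm`, the two-minute threshold, and the prefix list are placeholders, and in the real worker the `{ key, ttl }` pairs would come from Redis SCAN plus TTL calls:

```javascript
// Decide which cached keys the background worker should re-warm.
// `entries` is a list of { key, ttl } pairs (ttl = remaining seconds,
// as returned by the Redis TTL command). Only keys that match a
// critical prefix and expire within `thresholdSeconds` are selected.
function keysToRewarm(entries, thresholdSeconds, criticalPrefixes) {
  return entries
    .filter(({ key }) => criticalPrefixes.some((p) => key.startsWith(p)))
    .filter(({ ttl }) => ttl > 0 && ttl <= thresholdSeconds)
    .map(({ key }) => key);
}

// Example: with a 120-second threshold, only the expiring product key
// is selected; the long-lived product key and the session key are not.
const selected = keysToRewarm(
  [
    { key: 'product:123', ttl: 90 },
    { key: 'product:456', ttl: 3600 },
    { key: 'session:abc', ttl: 30 },
  ],
  120,
  ['product:', 'category:']
);
// selected === ['product:123']
```

For each selected key, the worker would then attempt the Redlock acquisition and refresh described in steps 2 and 3 above.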
// Assumes node-redis v4 and Redlock v5 (which provides the `using` API).
const { createClient } = require('redis');
const Redlock = require('redlock').default;

const redisClient = createClient();

// Initialize Redlock.
// Pass an array of connected Redis clients.
const redlock = new Redlock([redisClient], {
  // The time in milliseconds between retries.
  retryDelay: 200,
  // The number of times to retry before failing.
  retryCount: 5,
});

redlock.on('error', (error) => {
  // It's important to monitor Redlock errors.
  console.error('A Redlock error has occurred:', error);
});

// Express-style request handler for a product page.
async function getProduct(req, res) {
  const productId = req.params.productId;
  const productKey = `product:${productId}`;
  const lockKey = `lock:product:${productId}`;
  const lockTTL = 10000; // lock lifetime in milliseconds
  const cacheTTL = 300;  // cache lifetime in seconds

  try {
    // 1. Check the cache first.
    const cachedProduct = await redisClient.get(productKey);
    if (cachedProduct) {
      console.log(`[CACHE HIT] Found product ${productId} in cache.`);
      return res.json({ source: 'cache', data: JSON.parse(cachedProduct) });
    }

    // 2. Cache miss: try to acquire a lock.
    console.log(`[CACHE MISS] Product ${productId} not in cache. Attempting to acquire lock...`);
    try {
      // The `using` block automatically releases the lock when done.
      // This is safer than manual unlocking.
      return await redlock.using([lockKey], lockTTL, async (signal) => {
        console.log(`[LOCK ACQUIRED] Lock for product ${productId} acquired. Rebuilding cache.`);

        // Check whether another process populated the cache while we were waiting for the lock.
        const productExists = await redisClient.get(productKey);
        if (productExists) {
          console.log(`[CACHE HIT] Product ${productId} was populated by another process. Serving from cache.`);
          return res.json({ source: 'cache', data: JSON.parse(productExists) });
        }

        // Fetch the fresh data from the database.
        const productData = await fetchProductFromDB(productId);

        // Set the data in the cache with a TTL.
        await redisClient.set(productKey, JSON.stringify(productData), { EX: cacheTTL });
        console.log(`[CACHE SET] Product ${productId} stored in cache.`);

        // Check that the lock is still active before responding.
        if (signal.aborted) {
          throw new Error('The lock for this operation has already expired.');
        }

        return res.json({ source: 'database', data: productData });
      });
    } catch (err) {
      // This block catches errors if the lock could not be acquired (e.g., it's already held).
      // It also catches errors from within the `using` block, like the signal.aborted error.
      console.log(`[LOCK FAILED] Could not acquire lock for product ${productId}. Another process is likely rebuilding the cache.`);
      return res.status(503).json({ error: 'Cache is being rebuilt, please retry shortly.' });
    }
  } catch (err) {
    console.error('Unexpected error while serving product:', err);
    return res.status(500).json({ error: 'Internal server error' });
  }
}
Alternatives Considered
- Simply use longer TTLs: We considered just increasing the cache TTLs to several hours. This is a blunt approach that increases the risk of serving stale data (e.g., price or stock changes not reflecting quickly). Our proposed solution keeps TTLs short while ensuring freshness.
- Rely on CDN caching: While we use a CDN, its cache can be purged, and it doesn't solve the problem of refreshing the data at the origin server. The origin still needs an efficient caching strategy.
Additional Context
This feature is especially critical for marketing events like flash sales or email campaigns, where we can predict a massive, sudden surge in traffic to specific product pages. Proactively warming the cache for these items before the campaign starts would ensure a smooth user experience.
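For planned campaigns, the warm-up pass could be a one-off script over the known product IDs. A minimal sketch, with the DB fetch and cache write injected as callbacks because the real client APIs may differ (`prewarmProducts`, `fetchFromDB`, and `cacheSet` are all hypothetical names, not existing project code):

```javascript
// Pre-warm the cache for a known list of campaign products.
// `fetchFromDB(id)` should return the product data (or null if missing);
// `cacheSet(key, value, ttlSeconds)` should write to the cache.
// Returns the list of IDs that were successfully warmed.
async function prewarmProducts(productIds, { fetchFromDB, cacheSet, ttlSeconds }) {
  const warmed = [];
  for (const id of productIds) {
    const data = await fetchFromDB(id);
    if (data == null) continue; // skip products that no longer exist
    await cacheSet(`product:${id}`, JSON.stringify(data), ttlSeconds);
    warmed.push(id);
  }
  return warmed;
}
```

In production, `cacheSet` would wrap something like `redisClient.set(key, value, { EX: ttlSeconds })` and `fetchFromDB` the existing product query; running the script shortly before the campaign start would leave every featured product hot in the cache.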
Priority
Critical