Conversation
jcollins-axway
left a comment
There was a problem hiding this comment.
Looks good from a high level, needs pounded testing wise
|
copilots review Code Review: APIGOV-31191 vs mainSummary
Findings
Positives
Recommendations
|
|
Not a bad review at all |
| // deleting state with success status - should NOT be cached | ||
| err := handler.Handle(NewEventContext(proto.Event_CREATED, nil, ri.Kind, ri.Name), nil, ri) | ||
| assert.Nil(t, err) | ||
| assert.Equal(t, []string{}, cm.GetAccessRequestCacheKeys()) |
There was a problem hiding this comment.
as we spoke of we prefer test cases over single tests
|
caching changes require an extra amount of testing as its usually to blame for our duplicate service issues |
|
Give me time to look at this closer |
| return false | ||
| } | ||
|
|
||
| cachedResources := cv.cacheMan.GetCachedResourcesByKind(filter.Group, filter.Kind) |
There was a problem hiding this comment.
I think we talked about this scope mismatch in sync. The server query being used is scoped to filter.Scope.Name because of GetKindLink. So its returning resources for that one envionrment,right?
but GetCachedResourcesByKind is returning every cached resource for this kind across all scopes. Different scopes - if len(serverMap) != len(cachedResources) { will not work because the cache side will be bigger than the server side.
|
|
||
| // validateAndRebuildCache validates the cache and rebuilds it if out of sync. | ||
| // Called when connection to Engage is restored. | ||
| func (es *EventSync) validateAndRebuildCache() { |
There was a problem hiding this comment.
so this is being called by onReconnect() while the event listener is already live. PublisingLock doesn't look to be guarding any event consumption. So Flush and SetSequence could possibly race with the listener writing to cache, no? And maybe messing up the sequence. I ran into this with my mulesoft stuff. Like the listener should be paused or wait before rebuilding. I despise race conditions. ugh
| } | ||
|
|
||
| func (es *EventSync) RebuildCache() { | ||
| // SDB - NOTE : Do we need to pause jobs. |
There was a problem hiding this comment.
Dang! @vivekschauhan @jcollins-axway I put this note in a very long time ago ;).
I was looking at another race condition down stream, but decided to see who called RebuildCache().
So RebuildCache is reachable by WithOnClientStop (poller) and With EventSyncError (stream) while the event listener go routine is alive, well technically alive.
In those paths the race is meh, maybe not a big deal because event flow has stopped like a harvester error or when the stream is not yet producing. But PublishingLock still doesn't coordinate with the listener. Maybe we should take a look at this again?
| agentStatus := newDAStatus(agentInstance.ResourceMeta, status, prevStatus, message) | ||
|
|
||
| // See if we need to rebuildCache | ||
| timeToRebuild, _ := a.shouldRebuildCache() |
There was a problem hiding this comment.
should we eat the error?
At least log the error/warn and then if requirement is to rebuild, then rebuild?
There was a problem hiding this comment.
maybe something like
timeToRebuild, err := a.shouldRebuildCache()
if err != nil {
a.logger.WithError(err).Warn("unable to determine cache rebuild state, triggering rebuild")
timeToRebuild = true
}
if timeToRebuild && a.rebuildCache != nil {
a.rebuildCache.RebuildCache()
}
| if value != nil { | ||
| logger := a.logger.WithField("cacheUpdateTime", value) | ||
| // get current cacheUpdateTime from x-agent-details | ||
| convToTimestamp, err := strconv.ParseInt(value.(string), 10, 64) |
There was a problem hiding this comment.
I know this is old and not part of your changes, but can we do a safe guard here
strVal, ok := value.(string)
if !ok {
logger.Warn("cacheUpdateTime is not a string, triggering rebuild")
return true, nil
}
Cache validation on start up and reconnect