Potential overflow in touch_cache

The FlowKit cache scoring mechanism (borrowed from https://github.com/dask/cachey) aims to retain cache entries that are frequently and/or recently used, by calculating a cache score with a contribution from each cache touch that decays exponentially over time, with more recent touches weighted more heavily.

This is achieved by adding to a cache entry's score a contribution proportional to $(1+\varepsilon)^t$ each time the cache entry is accessed (where $t$ is the "time" of the cache touch, with "time" defined as an incrementing counter of all cache touches across all cache entries).

The problem with this approach is that the value added to `cache_score_multiplier` grows exponentially with the number of cache touches, so in a long-running FlowKit instance the cache scores can get very large. In the worst case, if the number of cache touches exceeds `1024 * cache_half_life()`, attempting to touch a cache record will result in a double-precision overflow.

This can be mitigated by resetting the cache via `flowmachine.core.cache.reset_cache()`, which resets the counter. But ideally we would want to be able to keep the cache active indefinitely without resetting.

An alternative would be to scale down the previous cache score multiplier each time the cache record is touched, instead of increasing the value added for each cache touch. I.e instead of
```
UPDATE cache.cached
SET cache_score_multiplier = cache_score_multiplier + power( 1 + ln(2)/cache_half_life(),
                                                             nextval('cache.cache_touches') - 2
                                                           );
```
we could record the last cache touch counter value, and then do
```
SELECT nextval('cache.cache_touches') INTO this_cache_touch_number;
UPDATE cache.cached
SET cache_score_multiplier = cache_score_multiplier*power( 1-ln(2)/cache_half_life(),
                                                           this_cache_touch_number-cache_touch_number
                                                         ) + 1,
    cache_touch_number = this_cache_touch_number;
```
(or equivalently,
```
cache_score_multiplier = cache_score_multiplier*power( 2,
                                                       (cache_touch_number-this_cache_touch_number)/cache_half_life()
                                                     ) + 1
```
).

This would require adding a `cache_touch_number` column to `cache.cached`, and also we would need to scale down the cache score here: https://github.com/Flowminder/FlowKit/blob/f664158e0458572a00e59d9f5bfd9683a162ebc3/flowdb/bin/build/0030_utilities.sql#L270 by a factor of `power(2, (cache_touch_number - currval('cache.cache_touches'))/cache_half_life())`. Alternatively we could replace `(cache_touch_number-this_cache_touch_number)` with `EXTRACT('seconds' FROM last_accessed - NOW())`, and do away with the `cache_touches` counter altogether, so that the contribution from each cache touch decays exponentially with real time rather than total number of cache touches.

This approach is equivalent to the current one, but by normalising the cache score multipliers at each touch we prevent them from growing excessively large. There is then the possibility of underflow, instead of overflow, if a cache entry has been previously touched but not for a very long time - but in the event this occurs we could simply ignore the contribution from previous cache touches because it will be negligible compared to that from the new cache touch (i.e. set `cache_score_multiplier=1`).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Potential overflow in touch_cache #7028

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Potential overflow in touch_cache #7028

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions