Skip to content

OAK-12282 : defining a fixed bound for the AbstractDiskCache#2978

Open
pat-lego wants to merge 4 commits into
apache:trunkfrom
pat-lego:issues/OAK-12282
Open

OAK-12282 : defining a fixed bound for the AbstractDiskCache#2978
pat-lego wants to merge 4 commits into
apache:trunkfrom
pat-lego:issues/OAK-12282

Conversation

@pat-lego

Copy link
Copy Markdown
Contributor

The AbstractPersistentCache is unbound leading to OOM exceptions.

@rishabhdaim rishabhdaim requested review from joerghoh and jsedding and removed request for joerghoh June 26, 2026 11:05
@rishabhdaim rishabhdaim changed the title OAK-12282 defining a fixed bound for the AbstractDiskCache OAK-12282 : defining a fixed bound for the AbstractDiskCache Jun 26, 2026

@joerghoh joerghoh left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a lot of minor things:

  • Comments: AI writes too many comments which are 100% valid and helpful in the context of this PR, but not really helpful in a year in the context of a different problem.
  • some consistency

configuration.redisMinConnections(), configuration.redisMaxConnections(), configuration.redisMaxTotalConnections(), configuration.redisDBIndex(), redisCacheIOMonitor);
closer.register(redisCache);

// OAK-12282: expose the bounded write-queue kill switch. Requires a restart to take effect.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// OAK-12282: expose the bounded write-queue kill switch. Requires a restart to take effect.
// OAK-12282: expose the bounded write-queue kill switch.

That 2nd setence is a bit misleading

* registered with the Whiteboard to revert to the pre-fix unbounded queue.
* <strong>Note:</strong> changing this flag requires a process restart, as
* the executor is created at startup.
*/

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this for consistency reasons (or we would need to annotate every configuration parameter for the cache with the same).

() -> Long.valueOf(directory.listFiles().length),
() -> FileUtils.sizeOfDirectory(directory),
() -> evictionCount.get());
segmentCacheStats.setWriteDiscardCountSupplier(() -> discardCount.get());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very inconsistent. I would extend the c'tor of SegmentCacheStats by one more parameter.

Comment on lines +81 to +84
// Formerly Executors.newFixedThreadPool() was used here, which creates an unbounded
// LinkedBlockingQueue — allowing unlimited segment buffers to pile up in memory under
// high write load. The bounded queue (gated by FT_OAK_12282) prevents OOM by dropping
// write tasks when full; this is safe because the disk cache is an optimisation only.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is typical AI, but it won't help if in a year someone is looking at this code (If I want to understand what was here before, I can checkout the right git commit ...)

Suggested change
// Formerly Executors.newFixedThreadPool() was used here, which creates an unbounded
// LinkedBlockingQueue — allowing unlimited segment buffers to pile up in memory under
// high write load. The bounded queue (gated by FT_OAK_12282) prevents OOM by dropping
// write tasks when full; this is safe because the disk cache is an optimisation only.
// Limit the amount of Runnable in the queue to avoid memory issues. Discarding elements from
// the queue is not a problem, as caching is just an optimization; if it's dropped, it will be downloaded
// again and tried again.

BlockingQueue<Runnable> writeQueue = FT_OAK_12282_BOUNDED_WRITE_QUEUE_ENABLED.get()
? new LinkedBlockingQueue<>(WRITE_QUEUE_SIZE)
: new LinkedBlockingQueue<>();
executor = new ThreadPoolExecutor(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right now these 10 threads are named quite randomly, and therefor they are hard to detect; can you use a ThreadFactory to name them accordingly?

Comment on lines +93 to +94
discardCount.incrementAndGet();
logger.debug("Segment write task discarded: write queue full (capacity={})", WRITE_QUEUE_SIZE);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also log the number of discarded items from the queue

Suggested change
discardCount.incrementAndGet();
logger.debug("Segment write task discarded: write queue full (capacity={})", WRITE_QUEUE_SIZE);
long discarded = discardCount.incrementAndGet();
logger.debug("Segment write task discarded: write queue full (capacity={}, already discarded={})", WRITE_QUEUE_SIZE, discarded);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log should be throttled, as otherwise we risk flooding the logs if for some reason the write threads cannot keep up with the reads from Azure. See the LogSilencer class.

);
}

public void setWriteDiscardCountSupplier(@NotNull Supplier<Long> supplier) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentioned above, I would add this to the constructor for consistency reasons

private static final Logger logger = LoggerFactory.getLogger(AbstractPersistentCache.class);

public static final int THREADS = Integer.getInteger("oak.segment.cache.threads", 10);
public static final int WRITE_QUEUE_SIZE = Integer.getInteger("oak.segment.cache.writeQueueSize", THREADS * 100);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing to disk will be limited to the performance of the IO system and does not scale much with the number of threads that are writing. In fact, a single thread writing efficiently to disk will likely saturate the disk. The max queue size should be a function of the write IO of the disk and of the memory that we want to devote to buffering pending writes, it should not depend on the number of write threads. And we also don't want to cause an OOME only by increasing the number of threads. The number of writers and the size of the queue should be separate settings.

Since segments are around 256KB, a queue size of 100 would take up 25MB when full. I'd say this is enough, as writes to a local SSD should be faster than reading from the network, so a few write threads should be able to keep up with whatever we can get from Azure.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing to disk will be limited to the performance of the IO system and does not scale much with the number of threads that are writing

This is definitely true for the classic hard drives (aka "spinning rust"), but not anymore for NVME-SSDs. You get the full IOPS only when you execute operations at a massive scale (IIRC 256 concurrent threads). And this is applicable for both read and write. Assuming a modern operating system also the filesystem should be able to handle that.
For that I strongly recommend to keep these 10 as default.

Comment on lines +93 to +94
discardCount.incrementAndGet();
logger.debug("Segment write task discarded: write queue full (capacity={})", WRITE_QUEUE_SIZE);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log should be throttled, as otherwise we risk flooding the logs if for some reason the write threads cannot keep up with the reads from Azure. See the LogSilencer class.

* Name of the feature toggle for the OAK-12282 bounded write-queue fix.
* See {@link #FT_OAK_12282_BOUNDED_WRITE_QUEUE_ENABLED}.
*/
public static final String FT_OAK_12282 = "FT_OAK-12282";

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I think this is unnecessary. The change is quite safe. And if for some reason having a bound on the queue causes trouble, we can increase it's size to be effectively unlimited.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants