OAK-12282 : defining a fixed bound for the AbstractDiskCache#2978
OAK-12282 : defining a fixed bound for the AbstractDiskCache#2978pat-lego wants to merge 4 commits into
Conversation
joerghoh
left a comment
There was a problem hiding this comment.
a lot of minor things:
- Comments: AI writes too many comments which are 100% valid and helpful in the context of this PR, but not really helpful in a year in the context of a different problem.
- some consistency
| configuration.redisMinConnections(), configuration.redisMaxConnections(), configuration.redisMaxTotalConnections(), configuration.redisDBIndex(), redisCacheIOMonitor); | ||
| closer.register(redisCache); | ||
|
|
||
| // OAK-12282: expose the bounded write-queue kill switch. Requires a restart to take effect. |
There was a problem hiding this comment.
| // OAK-12282: expose the bounded write-queue kill switch. Requires a restart to take effect. | |
| // OAK-12282: expose the bounded write-queue kill switch. |
That 2nd setence is a bit misleading
| * registered with the Whiteboard to revert to the pre-fix unbounded queue. | ||
| * <strong>Note:</strong> changing this flag requires a process restart, as | ||
| * the executor is created at startup. | ||
| */ |
There was a problem hiding this comment.
I would remove this for consistency reasons (or we would need to annotate every configuration parameter for the cache with the same).
| () -> Long.valueOf(directory.listFiles().length), | ||
| () -> FileUtils.sizeOfDirectory(directory), | ||
| () -> evictionCount.get()); | ||
| segmentCacheStats.setWriteDiscardCountSupplier(() -> discardCount.get()); |
There was a problem hiding this comment.
Very inconsistent. I would extend the c'tor of SegmentCacheStats by one more parameter.
| // Formerly Executors.newFixedThreadPool() was used here, which creates an unbounded | ||
| // LinkedBlockingQueue — allowing unlimited segment buffers to pile up in memory under | ||
| // high write load. The bounded queue (gated by FT_OAK_12282) prevents OOM by dropping | ||
| // write tasks when full; this is safe because the disk cache is an optimisation only. |
There was a problem hiding this comment.
This comment is typical AI, but it won't help if in a year someone is looking at this code (If I want to understand what was here before, I can checkout the right git commit ...)
| // Formerly Executors.newFixedThreadPool() was used here, which creates an unbounded | |
| // LinkedBlockingQueue — allowing unlimited segment buffers to pile up in memory under | |
| // high write load. The bounded queue (gated by FT_OAK_12282) prevents OOM by dropping | |
| // write tasks when full; this is safe because the disk cache is an optimisation only. | |
| // Limit the amount of Runnable in the queue to avoid memory issues. Discarding elements from | |
| // the queue is not a problem, as caching is just an optimization; if it's dropped, it will be downloaded | |
| // again and tried again. |
| BlockingQueue<Runnable> writeQueue = FT_OAK_12282_BOUNDED_WRITE_QUEUE_ENABLED.get() | ||
| ? new LinkedBlockingQueue<>(WRITE_QUEUE_SIZE) | ||
| : new LinkedBlockingQueue<>(); | ||
| executor = new ThreadPoolExecutor( |
There was a problem hiding this comment.
right now these 10 threads are named quite randomly, and therefor they are hard to detect; can you use a ThreadFactory to name them accordingly?
| discardCount.incrementAndGet(); | ||
| logger.debug("Segment write task discarded: write queue full (capacity={})", WRITE_QUEUE_SIZE); |
There was a problem hiding this comment.
I would also log the number of discarded items from the queue
| discardCount.incrementAndGet(); | |
| logger.debug("Segment write task discarded: write queue full (capacity={})", WRITE_QUEUE_SIZE); | |
| long discarded = discardCount.incrementAndGet(); | |
| logger.debug("Segment write task discarded: write queue full (capacity={}, already discarded={})", WRITE_QUEUE_SIZE, discarded); |
There was a problem hiding this comment.
This log should be throttled, as otherwise we risk flooding the logs if for some reason the write threads cannot keep up with the reads from Azure. See the LogSilencer class.
| ); | ||
| } | ||
|
|
||
| public void setWriteDiscardCountSupplier(@NotNull Supplier<Long> supplier) { |
There was a problem hiding this comment.
as mentioned above, I would add this to the constructor for consistency reasons
| private static final Logger logger = LoggerFactory.getLogger(AbstractPersistentCache.class); | ||
|
|
||
| public static final int THREADS = Integer.getInteger("oak.segment.cache.threads", 10); | ||
| public static final int WRITE_QUEUE_SIZE = Integer.getInteger("oak.segment.cache.writeQueueSize", THREADS * 100); |
There was a problem hiding this comment.
Writing to disk will be limited to the performance of the IO system and does not scale much with the number of threads that are writing. In fact, a single thread writing efficiently to disk will likely saturate the disk. The max queue size should be a function of the write IO of the disk and of the memory that we want to devote to buffering pending writes, it should not depend on the number of write threads. And we also don't want to cause an OOME only by increasing the number of threads. The number of writers and the size of the queue should be separate settings.
Since segments are around 256KB, a queue size of 100 would take up 25MB when full. I'd say this is enough, as writes to a local SSD should be faster than reading from the network, so a few write threads should be able to keep up with whatever we can get from Azure.
There was a problem hiding this comment.
Writing to disk will be limited to the performance of the IO system and does not scale much with the number of threads that are writing
This is definitely true for the classic hard drives (aka "spinning rust"), but not anymore for NVME-SSDs. You get the full IOPS only when you execute operations at a massive scale (IIRC 256 concurrent threads). And this is applicable for both read and write. Assuming a modern operating system also the filesystem should be able to handle that.
For that I strongly recommend to keep these 10 as default.
| discardCount.incrementAndGet(); | ||
| logger.debug("Segment write task discarded: write queue full (capacity={})", WRITE_QUEUE_SIZE); |
There was a problem hiding this comment.
This log should be throttled, as otherwise we risk flooding the logs if for some reason the write threads cannot keep up with the reads from Azure. See the LogSilencer class.
| * Name of the feature toggle for the OAK-12282 bounded write-queue fix. | ||
| * See {@link #FT_OAK_12282_BOUNDED_WRITE_QUEUE_ENABLED}. | ||
| */ | ||
| public static final String FT_OAK_12282 = "FT_OAK-12282"; |
There was a problem hiding this comment.
Personally, I think this is unnecessary. The change is quite safe. And if for some reason having a bound on the queue causes trouble, we can increase it's size to be effectively unlimited.
The AbstractPersistentCache is unbound leading to OOM exceptions.