CASSANDRA-21209 Rework ZSTD dictionary compression logic to create a trainer per training by smiklosovic · Pull Request #4667 · apache/cassandra

smiklosovic · 2026-03-11T15:37:15Z



smiklosovic · 2026-03-12T16:42:55Z

src/java/org/apache/cassandra/db/compression/CompressionDictionaryScheduler.java

-            }
-            finally
-            {
-                refViewFragment.close();


I do not think what we did here was very smart (the same concept was already there before we started to use selectAndReference). Training is done asynchronously, so this method returns, and finally runs, before the sampling has actually finished. We should close in the callback, as done above, or only when we catch an exception, as done here.

Or no?

trainer.trainDictionaryAsync(force).addCallback

Does this "addCallback" turn it into a synchronous call? I do not think so. It just registers what should be done after training finishes; it is not a blocking call, I guess.
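To illustrate the concern, here is a minimal sketch that does not use Cassandra's classes. `Resource` is a hypothetical stand-in for the referenced view fragment, and `CompletableFuture.whenComplete` stands in for `addCallback` (an assumption made only for illustration; the real future type in the patch may behave differently in details). A latch forces the async work to outlive the submitting code, which is the situation described above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

class CloseInCallbackSketch
{
    /** Hypothetical stand-in for the referenced view fragment. */
    static final class Resource implements AutoCloseable
    {
        final AtomicBoolean closed = new AtomicBoolean(false);

        @Override
        public void close()
        {
            closed.set(true);
        }
    }

    /** Blocks until the latch opens; simulates sampling that outlives the submitting code. */
    static void awaitQuietly(CountDownLatch latch)
    {
        try
        {
            latch.await();
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Closes in finally. Returns true if the async task observed an already closed resource. */
    static boolean closedTooEarly()
    {
        Resource resource = new Resource();
        CountDownLatch submitterDone = new CountDownLatch(1);
        CompletableFuture<Boolean> sawClosed;
        try
        {
            sawClosed = CompletableFuture.supplyAsync(() -> {
                awaitQuietly(submitterDone);
                return resource.closed.get();
            });
        }
        finally
        {
            resource.close(); // runs right after submission, not when the task finishes
        }
        submitterDone.countDown();
        return sawClosed.join();
    }

    /** Closes in the completion callback, or on a submission failure. Returns true if the task saw an open resource that was closed afterwards. */
    static boolean closedOnlyAfterCompletion()
    {
        Resource resource = new Resource();
        CountDownLatch submitterDone = new CountDownLatch(1);
        CompletableFuture<Boolean> done;
        try
        {
            done = CompletableFuture.supplyAsync(() -> {
                awaitQuietly(submitterDone);
                return resource.closed.get();
            }).whenComplete((result, failure) -> resource.close()); // registers the action, does not block
        }
        catch (RuntimeException e)
        {
            resource.close(); // the task was never submitted, so release immediately
            throw e;
        }
        submitterDone.countDown();
        boolean sawClosed = done.join();
        return !sawClosed && resource.closed.get();
    }

    public static void main(String[] args)
    {
        System.out.println("closed too early: " + closedTooEarly());
        System.out.println("closed only after completion: " + closedOnlyAfterCompletion());
    }
}
```

In the first variant the task always finds the resource closed, because finally runs as soon as the task has been submitted. In the second, registering the callback returns immediately and the close happens only once the task is done.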

smiklosovic · 2026-03-12T16:56:13Z

src/java/org/apache/cassandra/db/compression/CompressionDictionaryScheduler.java

-        ScheduledExecutors.nonPeriodicTasks.submit(task);
+        try
+        {
+            trainer = ICompressionDictionaryTrainer.create(keyspaceName, tableName, compressionParams);


The whole execution chain (from manager.train) does everything it can to postpone trainer creation until it is absolutely necessary and everything else has checked out. Instantiating a trainer can be very memory-demanding (when the max sample size is non-trivial) because it allocates a direct ByteBuffer. We do not want to create a trainer that allocates a big buffer just to throw it away if something else goes wrong.
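A minimal sketch of that ordering, under stated assumptions: `Trainer`, `train`, and the two precondition checks below are hypothetical and are not the classes or checks from this patch. The only point is that the direct buffer is allocated after every cheap check has passed.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

class LazyTrainerSketch
{
    /** Counts constructions so the deferral is observable. */
    static final AtomicInteger trainersCreated = new AtomicInteger();

    /** Hypothetical trainer that allocates its sample buffer eagerly in the constructor. */
    static final class Trainer
    {
        final ByteBuffer samples;

        Trainer(int maxSampleBytes)
        {
            samples = ByteBuffer.allocateDirect(maxSampleBytes); // the expensive part
            trainersCreated.incrementAndGet();
        }
    }

    /**
     * Runs the cheap (hypothetical) precondition checks first and builds the
     * trainer only when all of them pass. Returns null when training is skipped.
     */
    static Trainer train(boolean dictionaryCompressionEnabled, int sstableCount, int maxSampleBytes)
    {
        if (!dictionaryCompressionEnabled)
            return null; // nothing allocated
        if (sstableCount == 0)
            return null; // nothing to sample from, nothing allocated
        return new Trainer(maxSampleBytes);
    }

    public static void main(String[] args)
    {
        train(false, 3, 1024);
        train(true, 0, 1024);
        System.out.println("trainers created after failed checks: " + trainersCreated.get());
        train(true, 3, 1024);
        System.out.println("trainers created after a passing run: " + trainersCreated.get());
    }
}
```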

smiklosovic added 3 commits March 11, 2026 16:35

do not start trainer on manager creation, treat failure of trainer cr…

0a5bc7f

…eation

refactoring of configuration parsing

3dc3ad7

Configuration parsing is scattered all over the place; I think it should be centralized and resolved in one place only.

trainer per training

ff973f4

smiklosovic changed the title ~~CASSANDRA-21209 do not start trainer on manager creation, treat failure of trainer creation~~ CASSANDRA-21209 Rework ZSTD dictionary compression logic to create a trainer per training Mar 12, 2026


fix

596182d

smiklosovic force-pushed the CASSANDRA-21209 branch from ccbef18 to 596182d Compare March 12, 2026 16:47


fixes

010004d

smiklosovic wants to merge 5 commits into apache:trunk from smiklosovic:CASSANDRA-21209
