Skip to content

CASSANDRA-21209 Rework ZSTD dictionary compression logic to create a trainer per training#4667

Open
smiklosovic wants to merge 5 commits intoapache:trunkfrom
smiklosovic:CASSANDRA-21209
Open

CASSANDRA-21209 Rework ZSTD dictionary compression logic to create a trainer per training#4667
smiklosovic wants to merge 5 commits intoapache:trunkfrom
smiklosovic:CASSANDRA-21209

Conversation

@smiklosovic
Copy link
Contributor

Thanks for sending a pull request! Here are some tips if you're new here:

  • Ensure you have added or run the appropriate tests for your PR.
  • Be sure to keep the PR description updated to reflect all changes.
  • Write your PR title to summarize what this PR proposes.
  • If possible, provide a concise example to reproduce the issue for a faster review.
  • Read our contributor guidelines
  • If you're making a documentation change, see our guide to documentation contribution

Commit messages should follow the following format:

<One sentence description, usually Jira title or CHANGES.txt summary>

<Optional lengthier description (context on patch)>

patch by <Authors>; reviewed by <Reviewers> for CASSANDRA-#####

Co-authored-by: Name1 <email1>
Co-authored-by: Name2 <email2>

The Cassandra Jira

there is configuration parsing all over the place, I think it should be
centralized and resolved from one code only
@smiklosovic smiklosovic changed the title CASSANDRA-21209 do not start trainer on manager creation, treat failure of trainer creation CASSANDRA-21209 Rework ZSTD dictionary compression logic to create a trainer per training Mar 12, 2026
}
finally
{
refViewFragment.close();
Copy link
Contributor Author

@smiklosovic smiklosovic Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think what we did here was too smart (same concept was there before we started to selectAndReference) because training is done asychronously, so this method returns and finally is called before the sampling is actually finished. We should close in callback, as done above, or only in case we catch exception, as done here.

Or no?

trainer.trainDictionaryAsync(force).addCallback

This "addCallback" makes synchronous call from that? I do not think so, it just registers what should be done after it is finished, but it is not a blocking call, I guess.

ScheduledExecutors.nonPeriodicTasks.submit(task);
try
{
trainer = ICompressionDictionaryTrainer.create(keyspaceName, tableName, compressionParams);
Copy link
Contributor Author

@smiklosovic smiklosovic Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whole execution chain (from manager.train) does everything to postpone trainer creation until it is absolutely necessary and all is OK, as the instantiation of a trainer might be memory-wise very demanding (when max sample size is not trivial) as it allocates a direct ByteBuffer. We do not want to create a trainer allocating a big buffer just to throw it away if something else goes south.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant