Skip to content

feat: add SnapshotManager#542

Open
zhjwpku wants to merge 7 commits intoapache:mainfrom
zhjwpku:add_snapshot_manager
Open

feat: add SnapshotManager#542
zhjwpku wants to merge 7 commits intoapache:mainfrom
zhjwpku:add_snapshot_manager

Conversation

@zhjwpku
Copy link
Collaborator

@zhjwpku zhjwpku commented Jan 28, 2026

The test cases are derived from Java's TestSnapshotManager.java. Since MergeAppend is not supported yet, some tests are omitted for now and can be added later.

The test cases are derived from Java's TestSnapshotManager.java.
Since MergeAppend is not supported yet, some tests are omitted for
now and can be added later.
Copy link
Contributor

@evindj evindj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM only question that I have is why is UpdateSnapshotReference and SnapShotManager both exposed. Also, is there a way we can integration test this change?

@zhjwpku zhjwpku force-pushed the add_snapshot_manager branch from 0ae7a0d to db23355 Compare February 10, 2026 10:26

private:
friend class PendingUpdate;
friend class SnapshotManager;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this required?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is necessary, because NewSetSnapshot and NewUpdateSnapshotReference have been moved to private.

}

Result<std::shared_ptr<SnapshotManager>> Transaction::NewSnapshotManager() {
// SnapshotManager has its own commit logic, so it is not added to the pending updates.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can still call AddUpdate just like other updates and do not set last_update_committed_ to false for SnapshotManager. However, it seems that SnapshotManager cannot be retried easily so I'm fine to keep the current approach as is.

Result<std::shared_ptr<SnapshotManager>> Table::NewSnapshotManager() {
ICEBERG_ASSIGN_OR_RAISE(
auto transaction, Transaction::Make(shared_from_this(), Transaction::Kind::kUpdate,
/*auto_commit=*/false));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here auto_commit means that users do not need to call Transaction::Commit() after calling SnapshotManager::Commit(). Should we set this to true and handle the commit logic internally?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done that.

}

Status SnapshotManager::Commit() {
transaction_->EnableAutoCommit();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to read and backup Transaction::auto_commit_ in the SnapshotManager ctor and then restore it before exiting this function, otherwise any CommitIfRefUpdatesExist() before SnapshotManager::Commit() will set Transaction::committed_ to true.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To verify this issue, we can create a SnapshotManager from Table & Transaction and then call SetCurrentSnapshot and RollbackXXX for multiple times to see it they can be committed successfully.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test cases have been added, and I also implemented the backup/restore logic. I'm a bit unsure why this is necessary, after SnapshotManager::Commit(), the transaction is already marked as committed, I'm not clear what auto_commit_ is intended for after the commit.

void SetUp() override {
InitializeFileIO();
RegisterTableFromResource("TableMetadataV2Valid.json");
RegisterMinimalTableFromResource("TableMetadataV2ValidMinimal.json");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks weird to register two resources in the per-case setup function but each case uses only one of them. Does it make sense to use class inheritance for each resource?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense, I've split it into two classes: UpdateTestBase/MinimalUpdateTestBase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants