
CASSANALYTICS-122: Use long for absolute times and support C* 5.0 extended localDeletionTime #183

Open

skoppu22 wants to merge 10 commits into apache:trunk from skoppu22:ltime

Conversation

Contributor

@skoppu22 skoppu22 commented Mar 17, 2026

  • Integers cannot hold absolute times beyond the year 2038
  • C* 5.0 fixed this and now supports an extended localDeletionTime. Cassandra Analytics needs to support this too, especially for CDC events.
  • Changed deletedAt in the Avro schema to long

CI run here: https://app.circleci.com/pipelines/github/skoppu22/cassandra-analytics/75/workflows/f6d5f03b-8373-4549-8daf-5f776f78ff96

@skoppu22 skoppu22 marked this pull request as draft March 17, 2026 17:02
@skoppu22 skoppu22 marked this pull request as ready for review March 18, 2026 10:39
Contributor

@jyothsnakonisa jyothsnakonisa left a comment


Thanks for working on this, Shailaja. Looks good to me; left a few minor comments.

Contributor

Do you want to revert these changes, or are they intentional?

Contributor Author

Intentional; the CI pipelines fail otherwise.

Contributor

This class looks very similar to "LocalTableSchemaStore"; can you check whether you can use that instead of adding this new class? You could make small modifications to LocalTableSchemaStore if required.

Contributor Author

LocalTableSchemaStore and the new Avro transformer tests are in different modules. LocalTableSchemaStore loads avsc files from the classpath, and those files don't exist in the other modules. Hence I'm using TestSchemaStore, which doesn't load any schema files and instead allows schemas to be registered directly.
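For illustration, the register-based approach described above could look roughly like the sketch below. The class and method names (TestSchemaStoreSketch, register, get) are hypothetical stand-ins, not the actual TestSchemaStore API:

```java
import java.util.HashMap;
import java.util.Map;

// Rough sketch of a register-based schema store: unlike a store that loads
// .avsc files from the classpath, schemas are registered programmatically,
// so tests in modules without the schema resources can still run.
public class TestSchemaStoreSketch
{
    private final Map<String, String> schemasByName = new HashMap<>();

    // Tests register schema JSON directly instead of loading .avsc resources
    public void register(String name, String schemaJson)
    {
        schemasByName.put(name, schemaJson);
    }

    public String get(String name)
    {
        String schema = schemasByName.get(name);
        if (schema == null)
        {
            throw new IllegalStateException("Schema not registered: " + name);
        }
        return schema;
    }
}
```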

Comment on lines +67 to +73
CORE_MAX_PARALLEL_FORKS: 2
CORE_TEST_MAX_HEAP_SIZE: "2048m"
CASSANDRA_USE_JDK11: <<parameters.use_jdk11>>
command: |
export GRADLE_OPTS="-Xmx2g -Dorg.gradle.jvmargs=-Xmx2g"
# Run compile/unit tests, skipping integration tests
./gradlew --stacktrace clean assemble check -x cassandra-analytics-integration-tests:test -Dcassandra.analytics.bridges.sstable_format=<<parameters.sstable_format>>
./gradlew --no-daemon --max-workers=2 --stacktrace clean assemble check -x cassandra-analytics-integration-tests:test -Dcassandra.analytics.bridges.sstable_format=<<parameters.sstable_format>>
Contributor

Instead of making CI pipeline changes, can you try tagging the relevant tests with @Tag("Sequential")?

--no-daemon flag makes sense.
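The tagging approach suggested here boils down to a runtime annotation that the build tool can filter on; the real mechanism is JUnit 5's @org.junit.jupiter.api.Tag combined with a Gradle filter such as useJUnitPlatform { excludeTags 'Sequential' }. A minimal stdlib-only model of the idea, with illustrative names:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Minimal model of tag-based test filtering. JUnit 5's @Tag works the same
// way: a runtime-retained annotation that the test runner inspects and the
// build filters on. TagSketch and SequentialDemo are hypothetical names.
public class TagSketch
{
    @Retention(RetentionPolicy.RUNTIME)
    public @interface Tag
    {
        String value();
    }

    // In the real suite this would be @Tag("Sequential") from JUnit 5
    @Tag("Sequential")
    public static class SequentialDemo
    {
    }

    // A runner (or build filter) checks the annotation to include/exclude tests
    public static boolean isSequential(Class<?> testClass)
    {
        Tag tag = testClass.getAnnotation(Tag.class);
        return tag != null && "Sequential".equals(tag.value());
    }

    public static void main(String[] args)
    {
        System.out.println(isSequential(TagSketch.SequentialDemo.class)); // prints true
    }
}
```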

Contributor Author

  • We don't know which tests cause this, so we can't tag them. Gradle gets killed while executing pipelines starting with spark3-*, which run the unit tests (gradlew check). Pipelines starting with int-* are running fine. This problem exists on trunk and is not created by this PR. Running CI with --info didn't give any clues, as it bloated the logs and CI killed Gradle.
  • And we would have the same problem of finding the culprit every time CI breaks.

Comment on lines +284 to +291
public static Map<String, Long> getTTL(CdcEvent event)
{
    CdcEvent.TimeToLive ttl = event.getTtl();
    if (ttl == null)
    {
        return null;
    }
    return mapOf(AvroConstants.TTL_KEY, ttl.ttlInSec, AvroConstants.DELETED_AT_KEY, ttl.expirationTimeInSec);
    return mapOf(AvroConstants.TTL_KEY, (long) ttl.ttlInSec, AvroConstants.DELETED_AT_KEY, ttl.expirationTimeInSec);
Contributor

Why convert TTL to long when the schema has it as an int? The change is probably unnecessary.

Contributor Author

Here we are creating a map with one entry (TTL_KEY, TTL value as int) and a second entry (DELETED_AT_KEY, expirationTimeInSec value as long), so we have to use the wider of int and long. Otherwise we would need to return Map<String, Number> so that Number can represent both int and long.
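The widening being described can be sketched as follows. TtlMapSketch and its literal keys are hypothetical stand-ins for the PR's getTTL and AvroConstants:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TtlMapSketch
{
    // Hypothetical stand-ins for AvroConstants.TTL_KEY / AvroConstants.DELETED_AT_KEY
    static final String TTL_KEY = "ttl";
    static final String DELETED_AT_KEY = "deletedAt";

    // The int ttl is widened to long so both entries fit in Map<String, Long>;
    // the alternative, Map<String, Number>, pushes casts onto every consumer.
    public static Map<String, Long> getTtl(int ttlInSec, long expirationTimeInSec)
    {
        Map<String, Long> map = new LinkedHashMap<>();
        map.put(TTL_KEY, (long) ttlInSec);            // widening cast: no data loss
        map.put(DELETED_AT_KEY, expirationTimeInSec); // already long, may be past 2038
        return map;
    }

    public static void main(String[] args)
    {
        // 4294967295 seconds is 2106-02-07T06:28:15Z: representable as long, not as int
        System.out.println(getTtl(86400, 4294967295L));
    }
}
```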


@ParameterizedTest
@MethodSource("org.apache.cassandra.cdc.test.TestVersionSupplier#testVersions")
public void testBasicInsertAvroEncoding(CassandraVersion version)
Contributor

The file adds a few test cases not directly related to this patch. Improving test coverage is in general a good thing, so I am fine with those test cases in this file.

Contributor Author

We have updated two Avro schema files, and there were no existing tests going through the Avro transformers, hence the additions. The testMaxSupportedTtlAvroEncoding test below in this file verifies that deletedAt is a long and that the max TTL is supported as expected on both C* 4.x and 5.0.

maxParallelForks = Math.max(Runtime.runtime.availableProcessors() * 2, 8)
maxHeapSize = System.getenv('CORE_TEST_MAX_HEAP_SIZE') ?: '3072m'
maxParallelForks = System.getenv('CORE_MAX_PARALLEL_FORKS')?.toInteger()
?: Math.max(Runtime.runtime.availableProcessors() * 2, 8)
Contributor

Please try out the Sequential test tag first.

Contributor Author

@skoppu22 skoppu22 Mar 19, 2026

As I mentioned above, the pipelines are broken on trunk, we don't know which test caused it, and gradlew check is getting killed on CI.

Dataset<Row> preExpiry = bulkReaderDataFrame(UDT_TTL_TABLE).load();
assertThat(preExpiry.collectAsList()).hasSize(ROW_COUNT);

Uninterruptibles.sleepUninterruptibly(80, TimeUnit.SECONDS);
Contributor

Please find a different way and avoid sleeping for 80 seconds in the test. We cannot have CI sleep for 80 seconds.

If there is no other way, please delete all the tests in this file. They are irrelevant to the patch anyway.

Contributor Author

These tests are relevant. CqlSet, CqlMap, etc. are implemented in the C* 4.0 bridge and extended (i.e., reused) in the C* 5.0 bridge. While the 4.0 bridge expects an int, the 5.0 bridge expects a long. I have modified these files below to use CqlType's method, which is implemented separately for each bridge. And there are no existing tests for collection TTLs going through this modified code path.

* Tests verifying Y2038 boundary behavior for {@link CqlType#tombstone} and {@link CqlType#expiring}
* in Cassandra 4.0.
*/
class CqlTypeY2038Test
Contributor

I do not think this test is valuable. Analytics/CDC does not set TTLs; it only reads TTL values from Cassandra. Those values are guaranteed to be valid already; otherwise the CQL write would have failed. Please delete this file.

Contributor Author

@skoppu22 skoppu22 Mar 19, 2026

We are just ensuring that the checkedCast code we added indeed complains when the time exceeds the range an int can accommodate. But the test can be removed if we are sure checkedCast does its job.
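For illustration, the JDK's Math.toIntExact shows the same fail-fast behavior at the 2038 boundary that a Guava-style checkedCast provides (assuming checkedCast here refers to such a range-checked narrowing):

```java
public class Y2038Sketch
{
    public static void main(String[] args)
    {
        long boundary = Integer.MAX_VALUE;             // 2038-01-19T03:14:07Z in epoch seconds
        System.out.println(Math.toIntExact(boundary)); // prints 2147483647: still fits an int

        try
        {
            Math.toIntExact(boundary + 1);             // one second past the boundary
        }
        catch (ArithmeticException e)
        {
            // narrowing fails loudly instead of silently wrapping negative
            System.out.println("overflow rejected");
        }
    }
}
```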

Comment on lines -115 to 117
rowBuilder.addCell(BufferCell.expiring(cd, timestamp, ttl, now, type().serialize(o),
CellPath.create(ByteBuffer.wrap(UUIDGen.getTimeUUIDBytes()))));
rowBuilder.addCell(CqlType.expiring(cd, timestamp, ttl, now, type().serialize(o),
CellPath.create(ByteBuffer.wrap(UUIDGen.getTimeUUIDBytes()))));
}
Contributor

I think this change can be reverted once you drop CqlTypeY2038Test and revert the change in CqlType

Contributor Author

BufferCell.expiring in the 4.0 bridge expects an int for nowInSec; the 5.0 bridge expects a long. CqlList, implemented in the C* 5.0 bridge, doesn't override this method to pass a long. Hence I modified this to use CqlType's method, which is implemented separately for each bridge and uses the appropriate type in the corresponding bridge. Without this change, we would have to override this method again in the 5.0 bridge to pass a long value.
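The dispatch being described can be sketched as below: the shared collection types call one hook, and each bridge supplies the narrowing (or not) that its Cassandra version expects. All class and method names here are illustrative, not the actual bridge API:

```java
public class BridgeDispatchSketch
{
    // Shared hook: collection types call this with a long nowInSec and let
    // each bridge decide how to hand it to its version of BufferCell.expiring
    public abstract static class CqlTypeSketch
    {
        public abstract String expiring(long timestamp, int ttl, long nowInSec);
    }

    public static class FourZeroBridge extends CqlTypeSketch
    {
        @Override
        public String expiring(long timestamp, int ttl, long nowInSec)
        {
            // 4.0 takes an int nowInSec: narrow once, here, with a range check
            return "4.0:" + Math.toIntExact(nowInSec);
        }
    }

    public static class FiveZeroBridge extends CqlTypeSketch
    {
        @Override
        public String expiring(long timestamp, int ttl, long nowInSec)
        {
            // 5.0 accepts a long directly, so no narrowing and no 2038 limit
            return "5.0:" + nowInSec;
        }
    }

    public static void main(String[] args)
    {
        CqlTypeSketch v4 = new FourZeroBridge();
        CqlTypeSketch v5 = new FiveZeroBridge();
        System.out.println(v4.expiring(1L, 60, 2147483647L)); // within int range: ok on 4.0
        System.out.println(v5.expiring(1L, 60, 2147483648L)); // past 2038: only 5.0 handles it
    }
}
```

This is the usual reason to route calls through an overridable method instead of a static utility: the version-specific narrowing lives in one place per bridge rather than in every collection type.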

Comment on lines -130 to +131
rowBuilder.addCell(BufferCell.expiring(cd, timestamp, ttl, now, valueType().serialize(entry.getValue()),
CellPath.create(keyType().serialize(entry.getKey()))));
rowBuilder.addCell(CqlType.expiring(cd, timestamp, ttl, now, valueType().serialize(entry.getValue()),
CellPath.create(keyType().serialize(entry.getKey()))));
Contributor

can be reverted too

Contributor Author

@skoppu22 skoppu22 Mar 19, 2026

CqlMap is implemented in the C* 4.0 bridge and not extended (i.e., reused) in the C* 5.0 bridge. While the 4.0 bridge expects an int for nowInSec, the 5.0 bridge expects a long. I modified this to use CqlType's method, which is implemented separately for each bridge and uses the appropriate type in the corresponding bridge. Without this change, we would have to implement CqlMap again in the 5.0 bridge and override this function to pass a long value.

Comment on lines -114 to +115
rowBuilder.addCell(BufferCell.expiring(cd, timestamp, ttl, now, ByteBufferUtil.EMPTY_BYTE_BUFFER,
CellPath.create(type().serialize(o))));
rowBuilder.addCell(CqlType.expiring(cd, timestamp, ttl, now, ByteBufferUtil.EMPTY_BYTE_BUFFER,
CellPath.create(type().serialize(o))));
Contributor

can revert

Contributor Author

Same as explained above for CqlMap.

CHANGES.txt Outdated
@@ -1,5 +1,6 @@
0.4.0
-----
* Fix year 2038 problem using long for absolute times and support C* 5.0 extended localDeletionTime
Contributor

The year 2038 problem sounds cool, but I feel it overshadows the actual change in this patch.

I would revise the change log entry to "Support extended deletion time in CDC for Cassandra 5.0" and update the JIRA title too.

Contributor Author

OK

