CASSANDRA-20012: Add blob type support to SAI #4685

Open
tmoschou wants to merge 3 commits into apache:cassandra-5.0 from tmoschou:tmoschou/CASSANDRA-20012/cassandra-5.0

@tmoschou

Motivation

Users frequently store fixed-size binary identifiers in blob columns (e.g. SHA-256 hashes, proprietary binary formats) and choose blobs over string representations to reduce disk space. Previously, creating a SAI index on a blob column was rejected with Unsupported type: blob, forcing workarounds like Base64 encoding.
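To make the space argument concrete, here is a minimal Java sketch comparing the size of a raw SHA-256 digest with its hex and Base64 string forms. The class and method names (`DigestEncodingSizes`, `sha256`, `toHex`, `toBase64`) are hypothetical and only serve the illustration; they are not part of this patch, and the figures ignore any per-cell storage overhead in Cassandra.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper for illustration only; not part of the patch.
final class DigestEncodingSizes {

    // Returns the raw 32-byte SHA-256 digest of the UTF-8 bytes of the input.
    static byte[] sha256(String input) {
        try {
            return MessageDigest.getInstance("SHA-256")
                                .digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Lowercase hex encoding: two characters per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Standard padded Base64: four characters per three input bytes, rounded up.
    static String toBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        byte[] digest = sha256("example");
        System.out.println("raw bytes:    " + digest.length);
        System.out.println("hex chars:    " + toHex(digest).length());
        System.out.println("base64 chars: " + toBase64(digest).length());
    }
}
```

For a 32-byte digest the hex form is 64 characters and the padded Base64 form is 44 characters, so storing the raw bytes in a blob column is the most compact of the three.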

Blob support was originally excluded due to concerns about indexing arbitrarily large blobs (e.g. serialized objects, pagination cursors, or media payloads that could be many kilobytes). This patch introduces a dedicated sai_blob_term_size_warn_threshold (default 1KiB) / sai_blob_term_size_fail_threshold (default 8KiB) guardrail, following the existing pattern of sai_string_term_size_*, sai_frozen_term_size_*, and sai_vector_term_size_*. This allows operators to configure blob term size limits independently.
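The warn/fail behaviour can be sketched roughly as follows. This is a simplified, standalone illustration and not the code in this patch: the class name `BlobTermSizeCheck`, the `Outcome` enum, and the assumption that a term must be strictly larger than a threshold to trigger it are all hypothetical. Only the default values (1KiB and 8KiB) come from the description above.

```java
// Hypothetical standalone sketch of a warn/fail size guardrail; not the patch's implementation.
final class BlobTermSizeCheck {

    enum Outcome { OK, WARN, FAIL }

    // Defaults taken from the PR description.
    static final int DEFAULT_WARN_THRESHOLD_BYTES = 1024;      // 1KiB
    static final int DEFAULT_FAIL_THRESHOLD_BYTES = 8 * 1024;  // 8KiB

    // Assumption: a term triggers a threshold only when it is strictly larger than it.
    static Outcome check(int termSizeBytes, int warnThresholdBytes, int failThresholdBytes) {
        if (termSizeBytes > failThresholdBytes) {
            return Outcome.FAIL;   // the write would be rejected
        }
        if (termSizeBytes > warnThresholdBytes) {
            return Outcome.WARN;   // the write proceeds with a client warning
        }
        return Outcome.OK;
    }

    // Convenience overload using the default thresholds.
    static Outcome check(int termSizeBytes) {
        return check(termSizeBytes, DEFAULT_WARN_THRESHOLD_BYTES, DEFAULT_FAIL_THRESHOLD_BYTES);
    }

    public static void main(String[] args) {
        System.out.println(check(32));        // a SHA-256 digest
        System.out.println(check(4 * 1024));  // a mid-sized payload
        System.out.println(check(64 * 1024)); // a large serialized object
    }
}
```

Under these assumptions a 32-byte hash passes silently, a 4KiB value produces a warning, and a 64KiB value is rejected, which matches the intent of keeping small identifiers indexable while discouraging large payloads.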

Summary

Add support for the blob CQL type in Storage Attached Index (SAI) as an equality-only (EQ) indexed literal type.

Before

CREATE TABLE mytable (id uuid, blob blob, PRIMARY KEY (id));
CREATE INDEX blob_idx ON mytable (blob) USING 'sai';
-- InvalidQueryException: Unsupported type: blob

After

CREATE TABLE mytable (id uuid, blob blob, PRIMARY KEY (id));
CREATE INDEX blob_idx ON mytable (blob) USING 'sai';
-- OK

INSERT INTO mytable (id, blob) VALUES (uuid(), 0xdeadbeef);
SELECT * FROM mytable WHERE blob = 0xdeadbeef;

  • Add CQL3Type.Native.BLOB to StorageAttachedIndex.SUPPORTED_TYPES
  • Add BytesType to EQ_ONLY_TYPES in IndexTermType and introduce a new BYTES capability so blob columns are classified as literal (trie-indexed) types
  • Add dedicated sai_blob_term_size_warn_threshold / sai_blob_term_size_fail_threshold guardrails (defaults: 1KiB / 8KiB)
  • Add BlobDataSet and parameterized CQL-level tests for the blob type, both standalone and within all collection variants (list, set, map keys/values/entries, frozen collections)
  • Add GuardrailSaiBlobTermSizeTest for the new guardrail
  • Update SAI documentation (sai-concepts.adoc, sai-faq.adoc) and cassandra.yaml / cass_yaml_file.adoc to reflect blob support

Performance

Each new parameterized test class adds ~7 seconds of runtime. With 11 new blob test classes, this adds roughly 75–80 seconds to the SAI type test suite. This follows the existing pattern used by all other SAI type tests (e.g. BooleanTest, InetTest).

Test plan

  • New unit tests: BlobTest, ListBlobTest, FrozenListBlobTest, SetBlobTest, FrozenSetBlobTest, MapBlobTest, FrozenMapBlobTest, MapKeysBlobTest, MapValuesBlobTest, MapEntriesBlobTest, MultiMapBlobTest
  • Updated IndexTermTypeTest to assert blob is a literal type
  • CI (CircleCI)

CASSANDRA-20012

@tmoschou tmoschou force-pushed the tmoschou/CASSANDRA-20012/cassandra-5.0 branch 2 times, most recently from 40cf76a to 7084d21 Compare March 19, 2026 07:50
@tmoschou tmoschou force-pushed the tmoschou/CASSANDRA-20012/cassandra-5.0 branch from 7084d21 to c0395a5 Compare March 19, 2026 07:57
@tmoschou tmoschou marked this pull request as ready for review March 19, 2026 23:43