Cloud: Add tip for Iceberg object_storage example with BigQuery #464
Conversation
✅ Deploy Preview for rp-cloud ready!
Important: Review skipped. Auto incremental reviews are disabled on this repository.
📝 Walkthrough
This pull request makes two documentation-related changes for the Iceberg + BigQuery integration feature. The local Antora playbook is updated to use a feature-specific branch (DOC-1306-iceberg-bigquery-integration) instead of the main branch for documentation sources, while preserving other branch patterns. Additionally, the REST catalog index documentation page restores its page-layout configuration and adds a TIP block advising users to utilize REST catalogs for production environments with a reference to filesystem-based catalog alternatives.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
Pre-merge checks and finishing touches
✅ Passed checks (5 passed)
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Jira integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- local-antora-playbook.yml (1 hunk)
- modules/manage/pages/iceberg/rest-catalog/index.adoc (1 hunk)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: micheleRP
Repo: redpanda-data/cloud-docs PR: 390
File: modules/manage/pages/schema-reg/schema-reg-authorization.adoc:4-4
Timestamp: 2025-08-15T04:45:28.695Z
Learning: In the Redpanda documentation system, content is single-sourced across multiple repositories (cloud-docs and docs repos). Include directives in the cloud-docs repo may reference files that exist in separate PRs in the docs repo. These PRs are linked via local-antora-playbook.yml for preview rendering, and the includes resolve correctly during the Antora build process when repositories are merged. The playbook is reverted to main before merging. This cross-repository single sourcing pattern is commonly used, so missing include targets should be verified against linked PRs in other repositories before flagging as errors.
Applied to files:
local-antora-playbook.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Redirect rules - rp-cloud
- GitHub Check: Header rules - rp-cloud
- GitHub Check: Pages changed - rp-cloud
🔇 Additional comments (1)
local-antora-playbook.yml (1)
18-18: Temporary playbook change for cross-repo single-sourcing; remember to revert before merge.
The branch configuration now targets the feature branch DOC-1306-iceberg-bigquery-integration to coordinate with the linked docs repo PR #1494. This is the correct approach for the preview phase. However, ensure the playbook is reverted to branches: main before the PR is merged, as per standard practice for this repository.
Based on learnings, this playbook change is temporary and standard for cross-repo single-sourcing efforts. Verify it will be reverted before final merge.
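The revert reminder above could be backed by a small pre-merge check. The following is a minimal sketch, not the repository's actual tooling: the playbook excerpt, the source URL, the `DOC-<number>` branch naming pattern, and the `playbook_is_reverted` helper are all assumptions made for illustration.

```bash
# Hypothetical playbook excerpt; the real file's URLs and branch patterns may differ.
playbook=$(mktemp)
cat > "$playbook" <<'EOF'
content:
  sources:
  - url: https://github.com/redpanda-data/docs
    branches: [DOC-1306-iceberg-bigquery-integration]
EOF

# Succeeds only when no DOC-<number> preview branch appears on a "branches:" line.
playbook_is_reverted() {
  ! grep -Eq '^[[:space:]]*branches:.*DOC-[0-9]+' "$1"
}

if playbook_is_reverted "$playbook"; then before="reverted"; else before="feature branch"; fi

# Simulate the revert to main, then check again.
printf 'content:\n  sources:\n  - url: https://github.com/redpanda-data/docs\n    branches: main\n' > "$playbook"
if playbook_is_reverted "$playbook"; then after="reverted"; else after="feature branch"; fi

echo "$before -> $after"
rm -f "$playbook"
```

Run against the real local-antora-playbook.yml, a check like this would fail while the preview branch is still configured and pass once the playbook is back on main.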
:page-layout: index

TIP: These guides are for integrating Iceberg topics with managed REST catalogs. Integrating with a REST catalog is recommended for production deployments. If it is not possible to use a REST catalog, you can use the xref:manage:iceberg/use-iceberg-catalogs.adoc#object-storage[filesystem-based catalog]. For an example of using the filesystem-based catalog to access Iceberg topics, see the https://www.redpanda.com/blog/iceberg-topics-redpanda-cloud-byoc-setup[Getting Started with Iceberg Topics on Redpanda BYOC^] blog post.
(No newline at end of file)
🧩 Analysis chain
Script executed:
#!/bin/bash
# Search for the use-iceberg-catalogs.adoc file
echo "=== Searching for use-iceberg-catalogs.adoc ==="
fd -t f "use-iceberg-catalogs.adoc"
# Check for the file and look for object-storage anchor
echo -e "\n=== Checking for [[object-storage]] anchor ==="
fd -t f "use-iceberg-catalogs.adoc" --exec rg -n "\[\[object-storage\]\]|== .*[Oo]bject.*[Ss]torage|=== .*[Oo]bject.*[Ss]torage" {} +
Repository: redpanda-data/cloud-docs
<shell_result>
Repository: redpanda-data/cloud-docs
Exit code: 1
stdout:
=== Searching for use-iceberg-catalogs.adoc ===
modules/manage/pages/iceberg/use-iceberg-catalogs.adoc
=== Checking for [[object-storage]] anchor ===
stderr:
+: No such file or directory (os error 2)
</shell_result>
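The `+: No such file or directory` error above appears consistent with the trailing `+` being handed to `rg` as a path: `{} +` is the batch terminator of `find -exec`, whereas `fd --exec` runs the command once per result and passes extra arguments through unchanged. A minimal sketch of the same anchor check using POSIX `find` and `grep`, run against a throwaway tree (the file contents, including the anchor placement, are illustrative assumptions, not the repository's real page):

```bash
# Build a throwaway tree that mimics the page path (illustrative content only).
workdir=$(mktemp -d)
mkdir -p "$workdir/modules/manage/pages/iceberg"
printf '= Use Iceberg Catalogs\n\n[[object-storage]]\n== Integrate filesystem-based catalog\n' \
  > "$workdir/modules/manage/pages/iceberg/use-iceberg-catalogs.adoc"

# find's batch form: '{} +' appends all matched files to a single grep call.
# -F treats the anchor as a literal string, so the brackets need no escaping.
anchor_hits=$(find "$workdir" -type f -name 'use-iceberg-catalogs.adoc' \
  -exec grep -nF '[[object-storage]]' {} +)

echo "$anchor_hits"
rm -rf "$workdir"
```

Because only one file matches here, `grep` prints the line number and matching line without a filename prefix.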
Script executed:
#!/bin/bash
# Search the file directly for object-storage anchor
echo "=== Checking for [[object-storage]] anchor or equivalent ==="
rg -n "\[\[object-storage\]\]|== .*[Oo]bject.*[Ss]torage|=== .*[Oo]bject.*[Ss]torage" modules/manage/pages/iceberg/use-iceberg-catalogs.adoc
echo -e "\n=== File content (first 100 lines) ==="
head -100 modules/manage/pages/iceberg/use-iceberg-catalogs.adoc
Repository: redpanda-data/cloud-docs
<shell_result>
Repository: redpanda-data/cloud-docs
Exit code: 0
stdout:
=== Checking for [[object-storage]] anchor or equivalent ===
=== File content (first 100 lines) ===
= Use Iceberg Catalogs
:description: Learn how to access Redpanda topic data stored in Iceberg tables, using table metadata or a catalog integration.
:page-categories: Iceberg, Tiered Storage, Management, High Availability, Data Replication, Integration
include::ROOT:manage:iceberg/use-iceberg-catalogs.adoc[tag=single-source]
</shell_result>
Script executed:
#!/bin/bash
# Search for object-storage anchor across all adoc files
echo "=== Searching for [[object-storage]] anchor across all .adoc files ==="
rg -n "\[\[object-storage\]\]" --type adoc
echo -e "\n=== Searching for object-storage section headers ==="
rg -n "== .*[Oo]bject" --type adoc | head -20
echo -e "\n=== Full content of use-iceberg-catalogs.adoc ==="
wc -l modules/manage/pages/iceberg/use-iceberg-catalogs.adoc
cat -n modules/manage/pages/iceberg/use-iceberg-catalogs.adoc
Repository: redpanda-data/cloud-docs
<shell_result>
Repository: redpanda-data/cloud-docs
Exit code: 0
stdout:
=== Searching for [[object-storage]] anchor across all .adoc files ===
=== Searching for object-storage section headers ===
=== Full content of use-iceberg-catalogs.adoc ===
4 modules/manage/pages/iceberg/use-iceberg-catalogs.adoc
1 = Use Iceberg Catalogs
2 :description: Learn how to access Redpanda topic data stored in Iceberg tables, using table metadata or a catalog integration.
3 :page-categories: Iceberg, Tiered Storage, Management, High Availability, Data Replication, Integration
4
5 include::ROOT:manage:iceberg/use-iceberg-catalogs.adoc[tag=single-source]
stderr:
unrecognized file type: adoc
unrecognized file type: adoc
</shell_result>
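The two `unrecognized file type: adoc` messages indicate that the `--type adoc` filter was rejected, so the first two searches in this script produced no results for that reason rather than because nothing matched. Filtering by filename glob avoids relying on a predefined type name. A minimal sketch using `grep --include` (available in GNU and BSD grep) on a throwaway tree; with ripgrep, `rg -n -g '*.adoc' PATTERN` is the analogous glob filter. File names and contents are illustrative assumptions:

```bash
# Throwaway tree with one .adoc file and one non-.adoc file containing the same text.
workdir=$(mktemp -d)
mkdir -p "$workdir/modules/manage/pages"
printf '[[object-storage]]\n== Object storage\n' > "$workdir/modules/manage/pages/a.adoc"
printf '[[object-storage]]\n' > "$workdir/modules/manage/pages/notes.txt"

# Recurse from the tree root but only open files ending in .adoc;
# -l lists matching files, -F matches the anchor literally.
adoc_hits=$(cd "$workdir" && grep -rlF --include='*.adoc' '[[object-storage]]' .)

echo "$adoc_hits"
rm -rf "$workdir"
```

Only the `.adoc` file is reported; the text file is skipped even though it contains the same string.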
Script executed:
#!/bin/bash
# Search for object-storage anchor without type filter
echo "=== Searching for [[object-storage]] anchor ==="
rg -n "\[\[object-storage\]\]"
echo -e "\n=== Searching for object-storage in any context ==="
rg -n "object-storage" | head -30
echo -e "\n=== Looking for the included file in ROOT module ==="
fd "use-iceberg-catalogs.adoc"
Repository: redpanda-data/cloud-docs
<shell_result>
Repository: redpanda-data/cloud-docs
Exit code: 0
stdout:
=== Searching for [[object-storage]] anchor ===
=== Searching for object-storage in any context ===
modules/manage/pages/cluster-maintenance/config-cluster.adoc:6:For a complete list of the cluster properties available in Redpanda Cloud, see xref:reference:properties/cluster-properties.adoc[Cluster Configuration Properties] and xref:reference:properties/object-storage-properties.adoc[Object Storage Properties].
modules/manage/pages/iceberg/rest-catalog/index.adoc:5:TIP: These guides are for integrating Iceberg topics with managed REST catalogs. Integrating with a REST catalog is recommended for production deployments. If it is not possible to use a REST catalog, you can use the xref:manage:iceberg/use-iceberg-catalogs.adoc#object-storage[filesystem-based catalog]. For an example of using the filesystem-based catalog to access Iceberg topics, see the https://www.redpanda.com/blog/iceberg-topics-redpanda-cloud-byoc-setup[Getting Started with Iceberg Topics on Redpanda BYOC^] blog post.
modules/reference/pages/properties/object-storage-properties.adoc:11:include::ROOT:reference:partial$properties/object-storage-properties.adoc[tags=redpanda-cloud;!deprecated;!exclude-from-docs]
modules/ROOT/nav.adoc:485:*** xref:reference:properties/object-storage-properties.adoc[]
modules/get-started/pages/whats-new-cloud.adoc:226:You can now xref:manage:cluster-maintenance/config-cluster.adoc#view-cluster-property-values[view the value of read-only cluster configuration properties] with `rpk cluster config` or with the Cloud API. Available properties are listed in xref:reference:properties/cluster-properties.adoc[Cluster Properties] and xref:reference:properties/object-storage-properties.adoc[Object Storage Properties].
=== Looking for the included file in ROOT module ===
modules/manage/pages/iceberg/use-iceberg-catalogs.adoc
</shell_result>
Script executed:
#!/bin/bash
# Search for the ROOT module include source
echo "=== Looking for the source of the include directive ==="
fd "use-iceberg-catalogs.adoc" -x rg -l "object-storage"
echo -e "\n=== Search for files in ROOT module partials ==="
fd -t d "ROOT" | head -5
find . -path "*/ROOT/*" -name "*iceberg*" -o -path "*/ROOT/*" -name "*catalog*" 2>/dev/null | head -20
echo -e "\n=== Check for any anchor definitions ==="
rg "\[\[.*storage.*\]\]" | grep -i iceberg
Repository: redpanda-data/cloud-docs
<shell_result>
Repository: redpanda-data/cloud-docs
Exit code: 1
stdout:
=== Looking for the source of the include directive ===
=== Search for files in ROOT module partials ===
modules/ROOT/
=== Check for any anchor definitions ===
</shell_result>
https://www.redpanda.com/blog/iceberg-topics-redpanda-cloud-byoc-setup
<web_search_result>
Summary of "Getting started with Iceberg Topics on Redpanda BYOC" (May 13, 2025) — key points from the Redpanda blog:
- Iceberg Topics (introduced in Redpanda 25.1) let Redpanda write Kafka topics automatically in Apache Iceberg format so downstream systems (BigQuery, Snowflake, Databricks, Athena, etc.) can query them with SQL. Benefits include data-lake interoperability, time-travel, schema evolution, ACID semantics, and large-scale performance. [1]
- BYOC beta features: self-service cluster-level Iceberg settings via rpk or Cloud API, integration with REST catalogs (e.g., Snowflake Open Catalog) or file-based catalogs (for BigQuery), and secure credential handling for catalogs. [1]
- GCP BYOC demo (step summary):
  - Prereqs: Redpanda Cloud account, install/update rpk, enable Iceberg integration. [1]
  - Create a topic and enable iceberg mode (choose mode like value_schema_id_prefix, value_schema_latest, or key_value). [1]
  - Produce data using Schema Registry (register schema, produce with registry wire format). [1]
  - Configure Tiered Storage to write Iceberg metadata/files to a GCS bucket. [1]
  - Create a BigQuery external table pointing to the Iceberg metadata file in GCS (CREATE EXTERNAL TABLE ... OPTIONS(format='ICEBERG', metadata_file_paths=[...])). Query via BigQuery as usual. [1]
- Links and further reading: Redpanda docs on Iceberg Topics, BigQuery/BigLake Iceberg guidance, and related Redpanda docs (dead letter queue, custom partitioning). [1]
Source: Redpanda blog — "Getting started with Iceberg Topics on Redpanda BYOC" (May 13, 2025). [1]
[1] https://www.redpanda.com/blog/iceberg-topics-redpanda-cloud-byoc-setup
</web_search_result>
Critical: Internal xref anchor is missing and will result in a broken link.
The external blog URL is valid and accessible, but the internal xref target contains a critical issue: the #object-storage anchor referenced in xref:manage:iceberg/use-iceberg-catalogs.adoc#object-storage does not exist in the target file. The file modules/manage/pages/iceberg/use-iceberg-catalogs.adoc exists but only contains an include directive and no [[object-storage]] anchor definition. This broken reference will cause the link to fail during documentation builds or when users click it.
The TIP block content and structure are well-written, but the internal link must be corrected before this change can be approved. Either:
- Add the [[object-storage]] anchor to the target file at the appropriate section, or
- Update the xref to point to the correct anchor location if it exists elsewhere
🤖 Prompt for AI Agents
In modules/manage/pages/iceberg/rest-catalog/index.adoc around lines 3 to 5 the
TIP contains an xref to
xref:manage:iceberg/use-iceberg-catalogs.adoc#object-storage but that anchor
does not exist in modules/manage/pages/iceberg/use-iceberg-catalogs.adoc (it
only includes another file); fix by either adding a persistent anchor
[[object-storage]] at the appropriate section in
modules/manage/pages/iceberg/use-iceberg-catalogs.adoc (or in the included file
it pulls in) so the xref resolves, or update the xref target here to the correct
existing anchor/path if the intended section uses a different anchor name.
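Either fix can be confirmed by searching for an explicit anchor definition. The sketch below assumes the two explicit AsciiDoc anchor forms, `[[id]]` and `[#id]`; `has_anchor` is a hypothetical helper, and the sample file contents are illustrative. It does not detect IDs that are auto-generated from section titles, and it only sees files in the local checkout, so single-sourced content that lives in the linked docs repo PR would need to be checked there.

```bash
# has_anchor ID FILE...: succeeds if any file defines ID as [[ID]] or [#ID].
# Hypothetical helper; it does not detect IDs auto-generated from section titles.
has_anchor() {
  anchor_id=$1
  shift
  grep -Eq "\[\[${anchor_id}\]\]|\[#${anchor_id}\]" "$@"
}

# A wrapper page that only contains an include directive, as in the output above.
sample=$(mktemp)
printf '= Use Iceberg Catalogs\n\ninclude::ROOT:manage:iceberg/use-iceberg-catalogs.adoc[tag=single-source]\n' > "$sample"
if has_anchor object-storage "$sample"; then wrapper_only="found"; else wrapper_only="missing"; fi

# Append an explicit anchor (illustrative placement) and check again.
printf '\n[#object-storage]\n== Integrate filesystem-based catalog\n' >> "$sample"
if has_anchor object-storage "$sample"; then with_anchor="found"; else with_anchor="missing"; fi

echo "$wrapper_only -> $with_anchor"
rm -f "$sample"
```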
Description
PR for single sourcing: redpanda-data/docs#1494
Resolves https://github.com/redpanda-data/documentation-private/issues/
Review deadline:
Page previews
Use Iceberg Catalogs > Integrate filesystem-based catalog
Integrate with REST Catalogs index page
Checks