Update docs for coco GA release by a-mccarthy · Pull Request #365 · NVIDIA/cloud-native-docs

a-mccarthy · 2026-03-17T17:25:15Z

No description provided.

github-actions · 2026-03-17T17:28:36Z

Documentation preview

https://nvidia.github.io/cloud-native-docs/review/pr-365

gpu-operator/kata-containers-deploy.rst

a-mccarthy · 2026-03-17T17:31:07Z

gpu-operator/kata-containers-deploy.rst

+#. Specify at least the following options when you install the Operator.
+   If you want to run Kata Containers by default on all worker nodes, also specify ``--set sandboxWorkloads.defaultWorkload=vm-passthrough``.
+
+   .. code-block:: console


The upstream doc calls out enabling NFD in the install command (and also disabling it in the kata-deploy install). Is that needed? Can you elaborate on why users should include those?

@jojimt - can you help here? See https://github.com/kata-containers/kata-containers/pull/12651/changes for what we currently suggest in the Kata docs.

gpu-operator/kata-containers-deploy.rst

confidential-containers/overview.rst

gpu-operator/confidential-containers-deploy.rst

Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>

confidential-containers/licensing.rst

confidential-containers/attestation.rst

confidential-containers/overview.rst

confidential-containers/confidential-containers-deploy.rst

a-mccarthy · 2026-04-01T13:48:46Z

confidential-containers/confidential-containers-deploy.rst

+Prerequisites
+=============
+
+* Use a supported platform for Confidential Containers.


In terms of other services needed, should we call out that folks need to have a secure container registry? Or any of the other services mentioned in the architecture image, https://nvidia.github.io/cloud-native-docs/review/pr-365/confidential-containers/latest/overview.html#architecture-overview? We talk about hardware, Kata, and the GPU Operator, but don't have as much detail about additional services and setup. @Hema-Bontha-NV @manuelh-dev

I defer to @Hema-Bontha-NV here. This is a good question. Ideally they would sign their container images or use a registry they trust with signed images, and ideally they'd have a trusted environment in which they are running trustee. This is however more for the production end-to-end scenario. Since this is our general deployment guide, we don't explain this in detail. Referring to such aspects though can make sense.

Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>

confidential-containers/overview.rst

a-mccarthy · 2026-04-01T14:19:53Z

confidential-containers/supported-platforms.rst

+
+.. _coco-supported-platforms:
+
+Limitations and Restrictions


@Hema-Bontha-NV @manuelh-dev are there any more limitations we need to call out? Also, we don't currently mention anything for OpenShift.

We should follow up on this. One note from the upstream Kata docs: https://github.com/kata-containers/kata-containers/blob/main/docs/use-cases/NVIDIA-GPU-passthrough-and-Kata-QEMU.md#deploy-pods-using-your-own-containers-and-manifests - deferring to @Hema-Bontha-NV here.

Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>

confidential-containers/attestation.rst

manuelh-dev · 2026-04-01T21:10:33Z

confidential-containers/attestation.rst

+
+During attestation, the GPU will be set to ready. As such, when running a workload that does attestation, it is not necessary to set the ``nvrc.smi.srs=1`` and ``RUST_LOG=debug`` kernel parameters.
+
+If attestation does not succeed, debugging is best done through the Trustee log. Debug mode can be enabled by setting the ``nvrc.smi.srs=1`` and ``RUST_LOG=debug`` kernel parameters in the Trustee environment.


nit, here I would not mention nvrc.smi.srs=1 in turn.

This parameter transitions the GPU into ready state. This is done automatically during attestation. I don't think we need to set this to debug attestation failures

Yes, this has nothing to do with debugging, but note that we do need this to be set in general now.

confidential-containers/supported-platforms.rst

confidential-containers/confidential-containers-deploy.rst

manuelh-dev · 2026-04-01T21:24:11Z

confidential-containers/confidential-containers-deploy.rst

+
+   .. code-block:: console
+
+      $ export VERSION="3.29.0"


Did we intentionally decide against using the command from https://github.com/kata-containers/kata-containers/blob/main/docs/use-cases/NVIDIA-GPU-passthrough-and-Kata-QEMU.md#kata-containers?

export VERSION=$(curl -sSL https://api.github.com/repos/kata-containers/kata-containers/releases/latest | jq .tag_name | tr -d '"')

That command uses the GitHub API to determine the latest version. With a hardcoded version, we either need to update it here whenever a newer version is released, or rely on users not to use this outdated version in a few months.
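
The trade-off between a pinned version and the latest release could be sketched as a small shell helper. This is only an illustration, not something proposed in the PR: the function name `resolve_kata_version` and its optional argument are hypothetical, while the API URL and the `jq`/`tr` pipeline are copied from the command quoted above.

```shell
# Hypothetical helper (not part of the docs under review).
# Prints the pinned version if one is passed; otherwise queries the GitHub API
# for the latest kata-containers release tag, as in the upstream command.
resolve_kata_version() {
  local pinned="${1:-}"
  if [ -n "$pinned" ]; then
    printf '%s\n' "$pinned"
    return 0
  fi
  curl -sSL https://api.github.com/repos/kata-containers/kata-containers/releases/latest \
    | jq .tag_name | tr -d '"'
}

# Pin a version for reproducible instructions; call with no argument to track
# the latest release instead (requires network access, curl, and jq).
export VERSION="$(resolve_kata_version "3.29.0")"
echo "$VERSION"
```

Pinning keeps the documented steps reproducible but has to be updated with each release; resolving at run time stays current but depends on network access and on `jq` being installed.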

manuelh-dev · 2026-04-01T21:41:27Z

confidential-containers/confidential-containers-deploy.rst

+Next Steps
+==========
+
+* Refer to the :doc:`Attestation <attestation>` page for more information on configuringattestation.


configuringattestation - missing whitespace

Adding the pod security policy to protect the shim-to-agent interface using the genpolicy tool is related to attestation. Where we reference attestation, we could at least mention something like "and pod security policies" and refer to the relevant documentation from the kata-containers repository: https://github.com/kata-containers/kata-containers/blob/main/docs/how-to/how-to-use-the-kata-agent-policy.md

dcmiddle · 2026-04-01T22:05:41Z

confidential-containers/attestation.rst

+
+During attestation, the GPU will be set to ready. As such, when running a workload that does attestation, it is not necessary to set the ``nvrc.smi.srs=1`` and ``RUST_LOG=debug`` kernel parameters.
+
+If attestation does not succeed, debugging is best done through the Trustee log. Debug mode can be enabled by setting the ``nvrc.smi.srs=1`` and ``RUST_LOG=debug`` kernel parameters in the Trustee environment.


I think nvrc.smi.srs only applies to the pod / CoCo UVM - it's saying set ready state true for the GPU.
And the Rust log level would be for Trustee.

Yes. Also, RUST_LOG=debug is not a kernel parameter. It's an environment variable. There is some info about enabling debug here.
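
To make the distinction in this thread concrete, here is a minimal sketch of the two kinds of settings. It assumes nothing about how Kata or Trustee actually consume them: the helper `append_kernel_param` and the variable `GUEST_KERNEL_PARAMS` are hypothetical and only illustrate that `nvrc.smi.srs=1` belongs on the guest kernel command line, while `RUST_LOG=debug` is an environment variable of the Trustee process.

```shell
# Hypothetical helper: append a parameter to a kernel command line string
# unless that exact parameter is already present.
append_kernel_param() {
  local cmdline="$1"
  local param="$2"
  case " $cmdline " in
    *" $param "*) printf '%s\n' "$cmdline" ;;                # already set
    *) printf '%s\n' "${cmdline:+$cmdline }$param" ;;        # append
  esac
}

# Kernel parameter: part of the kernel command line of the pod guest VM.
# GUEST_KERNEL_PARAMS is an illustrative variable, not a real Kata setting.
GUEST_KERNEL_PARAMS="$(append_kernel_param "" "nvrc.smi.srs=1")"

# Environment variable: exported in the environment where Trustee runs.
export RUST_LOG=debug

echo "guest kernel params: $GUEST_KERNEL_PARAMS"
echo "trustee env: RUST_LOG=$RUST_LOG"
```

How the kernel parameter is actually passed to the guest depends on the Kata configuration and is not shown here.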

Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>

a-mccarthy · 2026-04-02T16:21:49Z

confidential-containers/overview.rst

+.. image:: graphics/CoCo-Sample-Workflow.png
+   :alt: Sample Workflow for Securing Model IP on Untrusted Infrastructure with CoCo
+
+*Sample Workflow for Securing Model IP on Untrusted Infrastructure with CoCo*


@Hema-Bontha-NV can you share more about the workflow in this diagram? It shows steps 1-3, but we don't describe them in much detail.

a-mccarthy · 2026-04-02T16:48:36Z

confidential-containers/confidential-containers-deploy.rst

+Configure Image Pull Timeouts
+-----------------------------
+
+Using the guest-pull mechanism to securly manage images in your deployment scenarios means that pulling large images may take a significant amount of time and may delay container start.


What is the guest-pull mechanism? We reference it, but don't really explain it that well.

fitzthum

A few comments on the attestation stuff.

fitzthum · 2026-04-02T18:29:30Z

confidential-containers/attestation.rst

+To enable the remote verifier, add the following lines to the Trustee configuration file::
+
+   [attestation_service.verifier_config.nvidia_verifier]
+   type = "Remote"


This is no longer needed. Remote verifier is set by default for docker compose.

fitzthum · 2026-04-02T18:29:58Z

confidential-containers/attestation.rst

+
+Now, the guest can be used with attestation. For more information on how to provision Trustee with resources and policies, refer to the `Trustee documentation <https://confidentialcontainers.org/docs/attestation/>`_.
+
+During attestation, the GPU will be set to ready. As such, when running a workload that does attestation, it is not necessary to set the ``nvrc.smi.srs=1`` kernel parameters.


This is no longer true. You need to set nvrc.smi.srs=1 for the GPU to be set to ready.

fitzthum · 2026-04-02T18:30:27Z

confidential-containers/attestation.rst

+
+During attestation, the GPU will be set to ready. As such, when running a workload that does attestation, it is not necessary to set the ``nvrc.smi.srs=1`` and ``RUST_LOG=debug`` kernel parameters.
+
+If attestation does not succeed, debugging is best done through the Trustee log. Debug mode can be enabled by setting the ``nvrc.smi.srs=1`` and ``RUST_LOG=debug`` kernel parameters in the Trustee environment.


Yes, this has nothing to do with debugging, but note that we do need this to be set in general now.

fitzthum · 2026-04-02T18:31:34Z

confidential-containers/attestation.rst

+
+During attestation, the GPU will be set to ready. As such, when running a workload that does attestation, it is not necessary to set the ``nvrc.smi.srs=1`` and ``RUST_LOG=debug`` kernel parameters.
+
+If attestation does not succeed, debugging is best done through the Trustee log. Debug mode can be enabled by setting the ``nvrc.smi.srs=1`` and ``RUST_LOG=debug`` kernel parameters in the Trustee environment.


Yes. Also, RUST_LOG=debug is not a kernel parameter. It's an environment variable. There is some info about enabling debug here.


cdesiniotis mentioned this pull request Mar 18, 2026

Add docs for 26.3.0 release #353

Merged
