| #. Specify at least the following options when you install the Operator.
| If you want to run Kata Containers by default on all worker nodes, also specify ``--set sandboxWorkloads.defaultWorkload=vm-passthrough``.
|
| .. code-block:: console
The upstream doc calls out enabling NFD in the install command (and disabling it in the kata-deploy install). Is that needed? Can you elaborate on why users should include those options?
@jojimt - can you help here? See https://github.com/kata-containers/kata-containers/pull/12651/changes for what we currently suggest in the Kata docs.
Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>
| Prerequisites
| =============
|
| * Use a supported platform for Confidential Containers.
In terms of other services needed, should we call out that users need a secure container registry, or any of the other services shown in the architecture image at https://nvidia.github.io/cloud-native-docs/review/pr-365/confidential-containers/latest/overview.html#architecture-overview? We talk about hardware, Kata, and the GPU Operator, but don't give as much detail about additional services and setup. @Hema-Bontha-NV @manuelh-dev
I defer to @Hema-Bontha-NV here. This is a good question. Ideally, users would sign their container images or use a trusted registry with signed images, and ideally they would run Trustee in a trusted environment. However, this applies more to the production end-to-end scenario. Since this is our general deployment guide, we don't explain it in detail, but referring to these aspects can make sense.
|
| .. _coco-supported-platforms:
|
| Limitations and Restrictions
@Hema-Bontha-NV @manuelh-dev are there any more limitations we need to call out? Also, we don't currently mention anything for OpenShift.
We should follow up on this. One note from the upstream Kata docs: https://github.com/kata-containers/kata-containers/blob/main/docs/use-cases/NVIDIA-GPU-passthrough-and-Kata-QEMU.md#deploy-pods-using-your-own-containers-and-manifests - deferring to @Hema-Bontha-NV here.
|
| During attestation, the GPU will be set to ready. As such, when running a workload that does attestation, it is not necessary to set the ``nvrc.smi.srs=1`` and ``RUST_LOG=debug`` kernel parameters.
|
| If attestation does not succeed, debugging is best done through the Trustee log. Debug mode can be enabled by setting the ``nvrc.smi.srs=1`` and ``RUST_LOG=debug`` kernel parameters in the Trustee environment.
Nit: I would not mention nvrc.smi.srs=1 here.
This parameter transitions the GPU into the ready state, which is done automatically during attestation. I don't think we need to set it to debug attestation failures.
Yes, this has nothing to do with debugging, but note that we do need this to be set in general now.
|
| .. code-block:: console
|
| $ export VERSION="3.29.0"
Did we intentionally decide against using the command from https://github.com/kata-containers/kata-containers/blob/main/docs/use-cases/NVIDIA-GPU-passthrough-and-Kata-QEMU.md#kata-containers?
export VERSION=$(curl -sSL https://api.github.com/repos/kata-containers/kata-containers/releases/latest | jq .tag_name | tr -d '"')
That command uses the GitHub API to determine the latest version. If newer versions are released, we either need to update the version here or rely on users not to use this outdated version in a few months.
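One way to reconcile the two options raised above is to keep a pinned default and make the latest-release lookup opt-in. The following is a minimal sketch, not a proposal for the final doc text. It assumes a hypothetical USE_LATEST switch (not part of either doc) and that curl and jq are installed; the pinned value and the API URL are the ones quoted in this thread.

```shell
# Pinned default taken from the snippet under review.
KATA_PINNED_VERSION="3.29.0"

# USE_LATEST is a hypothetical opt-in switch used only in this sketch.
if [ "${USE_LATEST:-0}" = "1" ]; then
  # Ask the GitHub API for the latest release tag.
  # jq -r prints the raw string, so no extra step is needed to strip quotes.
  VERSION="$(curl -sSL https://api.github.com/repos/kata-containers/kata-containers/releases/latest | jq -r .tag_name)"
else
  # Respect a VERSION already set by the caller, otherwise fall back to the pin.
  VERSION="${VERSION:-$KATA_PINNED_VERSION}"
fi
export VERSION
echo "Using Kata Containers version: ${VERSION}"
```

With this shape the documented command stays reproducible by default, while users who want the newest release can run it with USE_LATEST=1.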
| Next Steps
| ==========
|
| * Refer to the :doc:`Attestation <attestation>` page for more information on configuringattestation.
configuringattestation - missing whitespace
Adding the pod security policy to protect the shim-to-agent interface using the genpolicy tool is related to attestation. Where we reference attestation, we could at least mention something like "and pod security policies" and refer to the relevant documentation from the kata-containers repository: https://github.com/kata-containers/kata-containers/blob/main/docs/how-to/how-to-use-the-kata-agent-policy.md
|
| During attestation, the GPU will be set to ready. As such, when running a workload that does attestation, it is not necessary to set the ``nvrc.smi.srs=1`` and ``RUST_LOG=debug`` kernel parameters.
|
| If attestation does not succeed, debugging is best done through the Trustee log. Debug mode can be enabled by setting the ``nvrc.smi.srs=1`` and ``RUST_LOG=debug`` kernel parameters in the Trustee environment.
I think nvrc.smi.srs only applies to the pod / CoCo UVM. It sets the GPU ready state to true.
And the Rust log level would be for Trustee.
Yes. Also, RUST_LOG=debug is not a kernel parameter. It's an environment variable. There is some info about enabling debug here.
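To make the distinction in this thread concrete, here is a minimal sketch of the two settings side by side. It makes several assumptions: KATA_KERNEL_PARAMS is a hypothetical variable name used only for illustration, how RUST_LOG reaches the Trustee services depends on how Trustee is deployed, and passing the kernel parameter through the Kata `io.katacontainers.config.hypervisor.kernel_params` pod annotation only works if the Kata configuration permits that annotation.

```shell
# RUST_LOG is an environment variable, not a kernel parameter.
# It belongs in the environment of the Trustee process whose logs you want to inspect.
export RUST_LOG=debug

# nvrc.smi.srs=1 is a guest kernel parameter for the pod's VM.
# KATA_KERNEL_PARAMS is a hypothetical name used only in this sketch.
KATA_KERNEL_PARAMS="nvrc.smi.srs=1"

# Print the pod annotation line that would carry the kernel parameter
# (assumes the annotation is enabled in the Kata configuration).
printf '%s: "%s"\n' "io.katacontainers.config.hypervisor.kernel_params" "$KATA_KERNEL_PARAMS"
```

Keeping the two settings in separate places in the doc (pod manifest versus Trustee environment) would avoid implying that either one is a debugging switch for the other component.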
| .. image:: graphics/CoCo-Sample-Workflow.png
| :alt: Sample Workflow for Securing Model IP on Untrusted Infrastructure with CoCo
|
| *Sample Workflow for Securing Model IP on Untrusted Infrastructure with CoCo*
@Hema-Bontha-NV can you share more about the workflow in this diagram? It shows steps 1-3, but we don't describe them in much detail.
| Configure Image Pull Timeouts
| -----------------------------
|
| Using the guest-pull mechanism to securly manage images in your deployment scenarios means that pulling large images may take a significant amount of time and may delay container start.
What is the guest-pull mechanism? We reference it, but don't really explain it.
fitzthum left a comment
A few comments on the attestation stuff.
| To enable the remote verifier, add the following lines to the Trustee configuration file::
|
| [attestation_service.verifier_config.nvidia_verifier]
| type = "Remote"
This is no longer needed. Remote verifier is set by default for docker compose.
|
| Now, the guest can be used with attestation. For more information on how to provision Trustee with resources and policies, refer to the `Trustee documentation <https://confidentialcontainers.org/docs/attestation/>`_.
|
| During attestation, the GPU will be set to ready. As such, when running a workload that does attestation, it is not necessary to set the ``nvrc.smi.srs=1`` kernel parameters.
This is no longer true. You need to set nvrc.smi.srs=1 for the GPU to be set to ready.