Add hosted evaluations section to eval docs #1040

Open
xeophon wants to merge 1 commit into main from add-hosted-evals-to-docs

Conversation

@xeophon (Contributor) commented Mar 19, 2026

Summary

  • add a short Hosted Evaluations section near the top of `docs/evaluation.md`
  • link the new section from the table of contents
  • document the basic `prime eval run --hosted` flow, including publishing first, `--follow`, TOML config usage, and the official hosted eval guide

Testing

  • Not run (not requested)

Note

Low Risk
Low risk documentation-only change; no code paths or CLI behavior are modified.

Overview
Adds a new Hosted Evaluations section to `docs/evaluation.md`, linked from the table of contents. The section describes how to run `prime eval run --hosted` against Hub-published environments (including `prime env push`, `--follow`, and TOML config usage) and points to the official hosted-evals guide for hosted-only flags.

Written by Cursor Bugbot for commit aee2394. This will update automatically on new commits.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

```bash
prime eval run configs/eval/benchmark-hosted.toml --hosted
```

For the full hosted workflow and hosted-only flags such as `--follow`, `--timeout-minutes`, `--allow-sandbox-access`, and `--custom-secrets`, see the official [Hosted Evaluations](https://docs.primeintellect.ai/tutorials-environments/hosted-evaluations) guide.

Missing skill update for hosted evaluations workflow

Low Severity

The PR adds a new Hosted Evaluations section to `docs/evaluation.md` documenting the `--hosted` flag, `--follow`, and other hosted-only flags, but `skills/evaluate-environments/SKILL.md` was not updated. The skill file only mentions "hosted eval workflows" in passing (line 63), and the example command there does not use `--hosted`. The project rule requires that changes to `docs/evaluation.md` that affect user-facing workflows are reflected in the corresponding skill file.

Triggered by project rule: BugBot Instructions

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aee2394a5d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +40 to +41
prime env push my-env
prime eval run my-team/my-env --hosted

[P2] Keep the pushed namespace consistent with the hosted slug

This example fails if someone copies it literally: `prime env push my-env` publishes under the caller's own namespace, but the next command runs `my-team/my-env`. Publishing to a team requires an explicit `--team <team>` on the push, so the doc should either keep the owner the same in both commands or show the team flag; otherwise readers end up targeting a hosted slug they never created.
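
A consistent version of the two commands might look like the dry-run sketch below. The `--team` flag is taken from this comment and has not been checked against the CLI here; `my-team` and `my-env` are placeholders, and the `run` helper and `DRY_RUN` variable are hypothetical additions so the snippet can be read and executed without the `prime` CLI installed.

```bash
# Placeholders: substitute your own team and environment names.
TEAM="my-team"
ENV_NAME="my-env"

# Hypothetical helper: print the command by default, execute it only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Publish under the team namespace (flag as described in the review comment) ...
run prime env push "$ENV_NAME" --team "$TEAM"
# ... then target the same owner/name slug for the hosted run.
run prime eval run "$TEAM/$ENV_NAME" --hosted
```

Deriving both commands from the same `TEAM` and `ENV_NAME` variables is the point of the sketch: the owner in the hosted slug cannot drift from the namespace used at publish time.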

Comment on lines +45 to +48
Hosted runs also support TOML configs:

```bash
prime eval run configs/eval/benchmark-hosted.toml --hosted
```

[P2] Explain that hosted TOML configs must use Hub slugs

The new hosted TOML example omits the main difference from the local TOML flow: hosted `[[eval]].env_id` entries must point at already-published Hub slugs like `owner/my-env`. Later in this same file, `env_id` is documented as an environment module name and every config example uses local IDs, so readers following this new snippet will naturally reuse local IDs such as `gsm8k` or `my-env` and get hosted runs that cannot resolve the environment.
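
As a rough illustration of the difference, the sketch below pre-checks a config before a hosted run. It assumes a Hub slug has exactly the `owner/name` shape mentioned above, and it scans for `env_id = "..."` lines with `sed` rather than parsing TOML properly; `is_hub_slug` and `check_hosted_config` are hypothetical helper names, not part of the `prime` CLI.

```bash
# Succeeds only for strings shaped like owner/name (exactly one slash, both parts non-empty).
is_hub_slug() {
  case "$1" in
    ""|/*|*/|*/*/*) return 1 ;;
    */*) return 0 ;;
    *) return 1 ;;
  esac
}

# Naive line-based scan of a config file: reports every env_id that is not an owner/name slug.
# This is not a TOML parser; it only recognizes lines of the form  env_id = "value".
check_hosted_config() {
  chc_status=0
  for chc_id in $(sed -n 's/^[[:space:]]*env_id[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1"); do
    if ! is_hub_slug "$chc_id"; then
      echo "not a Hub slug: $chc_id" >&2
      chc_status=1
    fi
  done
  return "$chc_status"
}

# Example: a published slug passes, a local module name does not.
if is_hub_slug "my-team/my-env"; then echo "hub slug: my-team/my-env"; fi
if ! is_hub_slug "gsm8k"; then echo "local id: gsm8k"; fi
```

A check like this would run before `prime eval run <config> --hosted`, so a config copied from the local examples is flagged instead of producing a hosted run that cannot resolve its environment.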
