OracleRubric class Experimental #1018

Open
Jgmedina95 wants to merge 6 commits into PrimeIntellect-ai:main from
Jgmedina95:pr/oracle-rubric-to-main

Conversation


@Jgmedina95 Jgmedina95 commented Mar 13, 2026

Description

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes

OracleRubric + Solubility Expert: What Changed

Summary

This update introduces a new OracleRubric class for scoring with external backends (instead of an LLM judge), and adds a concrete environment example, solubility_expert, that uses Rowan's solubility workflow API through rowan-python.

Why This Was Added

JudgeRubric is great when scoring is done by a judge model. For many tasks, scoring is better handled by:

  • a domain API,
  • a simulator,
  • a model server,
  • or any custom backend.

OracleRubric provides the same ergonomic pattern as JudgeRubric, but for backend-oracle scoring: whenever you need an external neural network, simulator, or model to score, grade, or simulate generated outputs.

OracleRubric Design

Location:

  • verifiers/rubrics/experimental/oracle_rubric.py

Compatibility shim:

  • verifiers/rubrics/oracle_rubric.py

Core API

rubric = vf.OracleRubric(
    oracle=my_backend,      # backend client or callable
    oracle_fn=call_backend, # optional adapter for backend invocation
)
rubric.add_reward_func(my_reward_func)

Key Behavior

  • oracle can be a callable backend or an object with .predict(...).
  • oracle_fn is optional and receives context (prompt, completion, answer, state, parsed response) plus the backend object. If omitted, the rubric calls the oracle directly.
  • Reward functions receive an injected oracle callable and can do:
    • result = await oracle(prompt, completion, answer, state)
  • Oracle measurements are cached per rollout (cache_measurements=True by default), so multiple reward funcs can reuse the same backend result.
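The behavior above can be sketched without the library. This is a minimal, self-contained illustration of the injection-plus-caching pattern: `my_backend`, `exact_match_reward`, and `make_cached_oracle` are hypothetical stand-ins, not the actual OracleRubric internals.

```python
import asyncio

# Hypothetical backend: any callable (or object with .predict) can play
# the oracle role. Here it just scores the completion against the answer.
def my_backend(prompt, completion, answer, state):
    return {"score": 1.0 if answer in completion else 0.0}

# A reward function in the OracleRubric style: it receives an injected
# `oracle` callable and awaits the backend result.
async def exact_match_reward(prompt, completion, answer, state, oracle):
    result = await oracle(prompt, completion, answer, state)
    return result["score"]

# Sketch of per-rollout caching: the first reward func triggers the
# backend call; later reward funcs in the same rollout reuse the result.
def make_cached_oracle(backend, state):
    async def oracle(prompt, completion, answer, st):
        if "oracle_result" not in state:
            state["oracle_result"] = backend(prompt, completion, answer, st)
        return state["oracle_result"]
    return oracle

state = {}
oracle = make_cached_oracle(my_backend, state)
reward = asyncio.run(
    exact_match_reward("p", "the answer is 42", "42", state, oracle)
)
# The backend result is now memoized in `state` for other reward funcs.
```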

Parallels to JudgeRubric

| Concept | JudgeRubric | OracleRubric |
| --- | --- | --- |
| Main goal | Score with judge LLM | Score with external oracle backend |
| Constructor backend arg | judge_client / model settings | oracle backend object/callable |
| Injected callable in reward funcs | judge(...) | oracle(...) |
| Parser usage | Parse completion before judge call | Parse completion before oracle call |
| State caching | Caches judge responses | Caches oracle results |
| Reward registration style | add_reward_func(...) | add_reward_func(...) |

The intended developer experience is intentionally parallel so users can switch scoring backends without changing rubric patterns.

Solubility Expert Example

Location:

  • environments/solubility_expert/solubility_expert.py
  • environments/solubility_expert/README.md

What It Demonstrates

  • A realistic OracleRubric usage in chemistry/SMILES editing.
  • A backend client (SolubilityPredictClient) that supports:
    • mock mode (offline),
    • Rowan API mode (real external call).
  • Directional reward scoring (higher / lower) based on oracle-returned solubility.
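The directional scoring idea can be sketched in a few lines. The rule below (reward 1.0 when solubility moves the requested way, else 0.0) is an illustrative guess at the shape of the reward, not the exact environment code.

```python
def directional_reward(start_solubility: float,
                       edited_solubility: float,
                       direction: str) -> float:
    """Reward an edit that moves solubility in the requested direction.

    `direction` is "higher" or "lower". A binary reward keeps the sketch
    simple; the real environment may scale by the size of the delta.
    """
    delta = edited_solubility - start_solubility
    if direction == "lower":
        delta = -delta
    return 1.0 if delta > 0 else 0.0
```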

Rowan API Integration

When use_rowan_api=True, the environment:

  1. reads API key from rowan_key (or ROWAN_API_KEY),
  2. calls:
    • rowan.submit_solubility_workflow(...)
  3. waits for completion via:
    • .wait_for_result().fetch_latest(in_place=True)
  4. extracts a solubility value from the returned workflow data,
  5. returns payload fields used by scoring:
    • edited_solubility
    • valid_predict_call
    • workflow_uuid
    • workflow_status
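Steps 4 and 5 (extracting the value and building the scoring payload) can be sketched as a pure function. The payload keys come from the list above; the shape of `workflow_data` (a dict with `data`, `uuid`, `status` fields) is an assumption for illustration, not the actual Rowan response schema.

```python
def build_score_payload(workflow_data: dict) -> dict:
    """Assemble the payload fields used by scoring from a completed
    solubility workflow snapshot (field layout assumed for illustration)."""
    solubility = workflow_data.get("data", {}).get("solubility")
    return {
        "edited_solubility": solubility,
        "valid_predict_call": solubility is not None,
        "workflow_uuid": workflow_data.get("uuid"),
        "workflow_status": workflow_data.get("status"),
    }
```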

How to Run the Example

Install dependency:

pip install rowan-python

Set key:

export rowan_key="<your_key>"

Run:

prime eval run solubility_expert --env-args '{"use_rowan_api": true}' -n 1

NOTE: the solubility example over the Rowan API is very slow, roughly 2 minutes per call. I mainly used it because it was one of the first APIs I found in the wild that fit this pattern; running the predictor locally would make it much faster.

Practical Takeaways

  • Use JudgeRubric when scoring should come from an LLM judge.
  • Use OracleRubric when scoring should come from a domain backend/API.
  • Keep reward logic in add_reward_func(...) functions and call injected oracle(...) directly for a consistent rubric authoring pattern.

Note

Medium Risk
Adds a new scoring primitive (OracleRubric) that executes arbitrary backend/oracle calls during rollout scoring and introduces an example env that can block on external Rowan API workflows, increasing runtime/failure-surface compared to pure in-process rubrics.

Overview
Introduces experimental OracleRubric, a JudgeRubric-style rubric that scores via an external oracle (callable, .predict(...) client, or custom oracle_fn) and caches oracle measurements per rollout to avoid duplicate backend calls.

Adds a new solubility_expert example environment that combines a similarity-based reward with an oracle-based directional solubility reward, with a mock offline predictor by default and an optional Rowan submit_solubility_workflow(...) integration (API key validation + configurable solvents/temps/credits).

Exports OracleRubric from verifiers/__init__.py, adds an experimental rubric README, and includes new unit tests covering initialization, oracle_fn wiring, answer-dict scoring, and within-rollout caching behavior.

Written by Cursor Bugbot for commit 8ca5f56. This will update automatically on new commits.

@Jgmedina95 Jgmedina95 changed the title Pr/oracle rubric to main OracleRubric class Experimental Mar 13, 2026

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.


system_prompt=system_prompt,
parser=similarity_rubric.parser,
rubric=rubric,
)


Missing environments/README.md update for new environment

Low Severity

A new environment solubility_expert was added to the environments/ folder but environments/README.md was not updated. The README does not list solubility_expert in any section, nor does it mention OracleRubric as a pattern. The project rule requires that any PR adding an environment must update the environments README to list it under the appropriate category and update the "What to look at for each pattern" section if applicable.


Triggered by project rule: BugBot Instructions

"info": {},
},
]
)


Mock oracle and dataset start_solubility values are inconsistent

Medium Severity

The hardcoded start_solubility values in the dataset are calibrated for Rowan's real API, not the mock estimate_solubility function. In mock mode (the default), estimate_solubility("CCO") returns ~0.40 but start_solubility is 0.61; estimate_solubility("c1ccccc1") returns ~0.14 but start_solubility is 0.70; estimate_solubility("CC(=O)N") returns ~0.49 but start_solubility is −0.42. Since directional scoring computes delta = edited_solubility − start_solubility, mock mode produces meaningless rewards — e.g., any edit to CC(=O)N automatically scores 1.0 because the mock always returns [0, 1] while the baseline is negative.

Additional Locations (1)

@Jgmedina95 Jgmedina95 marked this pull request as draft March 13, 2026 18:35
@Jgmedina95 Jgmedina95 marked this pull request as ready for review March 15, 2026 00:20