rt_seg is a Python 3.12.x package for segmenting reasoning traces into coherent chunks and (optionally) assigning a label to each chunk.
The main entry point is `RTSeg` (from `rt_segmentation.seg_factory`).
It orchestrates one or more segmentation engines and — if multiple engines are used — an offset aligner that fuses their boundaries into a single segmentation.
Install from PyPI:

```shell
pip install rt-seg
```

Or set up from source:

```shell
python3.12 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Note: requires an NVIDIA GPU (CUDA 12.4.1+).
```shell
docker build -f docker/Dockerfile -t mytui:gpu .
docker run -it --rm --gpus all mytui:gpu

# Or with Podman:
# podman build -f docker/Dockerfile -t mytui:gpu .
# podman run -it --rm --device nvidia.com/gpu=all mytui:gpu
```

Calling `RTSeg(trace)` produces:

- `offsets` (`list[tuple[int, int]]`): character offsets into the trace
- `labels` (`list[str]`): one label per segment
You can reconstruct segments via:

```python
segments = [trace[s:e] for (s, e) in offsets]
```

Most engines operate on a base segmentation first:

- `"clause"` (default): finer granularity
- `"sent"`: coarser segmentation
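To make the two base units concrete, here is a minimal, hypothetical splitter. It is a sketch of the idea only; rt_seg's actual clause/sentence logic is not shown in this README, and the function name and regexes below are assumptions:

```python
import re

def split_base_units(trace: str, seg_base_unit: str = "clause") -> list[tuple[int, int]]:
    """Illustrative splitter (NOT rt_seg's real implementation).

    Returns character offsets, mirroring the (start, end) convention above.
    "sent"   -> split after sentence-final punctuation
    "clause" -> additionally split at commas/semicolons (finer granularity)
    """
    if seg_base_unit == "sent":
        pattern = r"[^.!?]*[.!?]+\s*|[^.!?]+$"
    elif seg_base_unit == "clause":
        pattern = r"[^.!?,;]*[.!?,;]+\s*|[^.!?,;]+$"
    else:
        raise ValueError(f"unknown seg_base_unit: {seg_base_unit!r}")
    return [(m.start(), m.end()) for m in re.finditer(pattern, trace) if m.group()]
```

Because clause boundaries here are a superset of sentence boundaries, `"clause"` always yields at least as many base segments as `"sent"`, matching the finer/coarser distinction above.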
A single-engine example:

```python
from rt_seg import RTSeg
from rt_seg import RTRuleRegex

trace = "First step... Then second step... Finally conclude."

segmentor = RTSeg(
    engines=RTRuleRegex,
    seg_base_unit="clause",
)

offsets, labels = segmentor(trace)
for (s, e), label in zip(offsets, labels):
    print(label, "=>", trace[s:e])
```

If you pass multiple engines, you must provide an aligner.
```python
from rt_seg import RTSeg
from rt_seg import RTRuleRegex
from rt_seg import RTBERTopicSegmentation
from rt_seg import OffsetFusionGraph

segmentor = RTSeg(
    engines=[RTRuleRegex, RTBERTopicSegmentation],
    aligner=OffsetFusionGraph,
    label_fusion_type="concat",  # or "majority"
    seg_base_unit="clause",
)

offsets, labels = segmentor(trace)
```

Label fusion modes:

- `"majority"`: choose the most frequent label
- `"concat"`: concatenate labels (useful for debugging)
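As a sketch of what the two modes mean, here is a hedged reimplementation over per-engine label lists. It is illustrative only; `fuse_labels` and the `|` separator are assumptions, not rt_seg internals:

```python
from collections import Counter

def fuse_labels(per_engine_labels: list[list[str]],
                label_fusion_type: str = "majority") -> list[str]:
    """Fuse one label per segment from several engines (illustrative sketch).

    per_engine_labels[i][j] is engine i's label for segment j.
    """
    fused = []
    for segment_labels in zip(*per_engine_labels):  # one segment, across engines
        if label_fusion_type == "majority":
            fused.append(Counter(segment_labels).most_common(1)[0][0])
        elif label_fusion_type == "concat":
            fused.append("|".join(segment_labels))
        else:
            raise ValueError(f"unknown label_fusion_type: {label_fusion_type!r}")
    return fused
```

`Counter.most_common` breaks ties toward the first label encountered; `concat` preserves every engine's opinion, which is why it is handy for debugging.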
Available engines:

- `RTRuleRegex`, `RTNewLine`
- `RTLLMForcedDecoderBased`, `RTLLMSurprisal`, `RTLLMEntropy`, `RTLLMTopKShift`, `RTLLMFlatnessBreak`
- `RTLLMThoughtAnchor`, `RTLLMReasoningFlow`, `RTLLMArgument`
- `RTLLMOffsetBased`, `RTLLMSegUnitBased`
- `RTPRMBase`
- `RTBERTopicSegmentation`, `RTEmbeddingBasedSemanticShift`, `RTEntailmentBasedSegmentation`, `RTZeroShotSeqClassification`, `RTZeroShotSeqClassificationRF`, `RTZeroShotSeqClassificationTA`
You can override engine parameters at call time:

```python
offsets, labels = segmentor(
    trace,
    model_name="Qwen/Qwen2.5-7B-Instruct",
    chunk_size=200,
)
```

Available offset aligners:

- `OffsetFusionGraph`
- `OffsetFusionFuzzy`
- `OffsetFusionIntersect`
- `OffsetFusionMerge`
- `OffsetFusionVoting`
| Strategy | Behavior |
|---|---|
| Intersect | Conservative |
| Merge | Permissive |
| Voting / Graph / Fuzzy | Balanced (recommended) |
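One way to read the table is over sets of candidate boundary positions. The sketch below models the three behaviors under that assumption; it is illustrative, not the library's aligner code:

```python
def fuse_boundaries(per_engine: list[set[int]],
                    strategy: str = "voting",
                    min_votes: int = 2) -> set[int]:
    """Fuse boundary positions proposed by several engines (illustrative sketch).

    intersect -> keep only boundaries every engine proposes (conservative)
    merge     -> keep any proposed boundary (permissive)
    voting    -> keep boundaries proposed by >= min_votes engines (balanced)
    """
    if strategy == "intersect":
        return set.intersection(*per_engine)
    if strategy == "merge":
        return set.union(*per_engine)
    if strategy == "voting":
        return {b for b in set.union(*per_engine)
                if sum(b in engine for engine in per_engine) >= min_votes}
    raise ValueError(f"unknown strategy: {strategy!r}")
```

With `min_votes` ranging from 1 (equivalent to merge) up to the number of engines (equivalent to intersect), voting interpolates between the permissive and conservative extremes.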
A custom engine subclasses `SegBase` and implements `_segment`:

```python
from typing import Tuple, List

from rt_seg import SegBase

class MyEngine(SegBase):
    @staticmethod
    def _segment(trace: str, **kwargs) -> Tuple[List[tuple[int, int]], List[str]]:
        # Trivial engine: one segment spanning the whole trace.
        offsets = [(0, len(trace))]
        labels = ["UNK"]
        return offsets, labels
```

To reuse the built-in base segmentation inside an engine:

```python
base_offsets = SegBase.get_base_offsets(trace, seg_base_unit="clause")
```
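When writing an engine like this, it helps to sanity-check the offsets contract the examples rely on: in-bounds, ordered, non-overlapping spans. The checker below is a suggested test helper under that assumption, not part of rt_seg:

```python
def check_offsets(trace: str, offsets: list[tuple[int, int]]) -> None:
    """Assert each span is in-bounds, non-empty, sorted, and non-overlapping."""
    prev_end = 0
    for s, e in offsets:
        assert 0 <= s < e <= len(trace), f"span out of bounds: {(s, e)}"
        assert s >= prev_end, f"overlapping or unsorted span: {(s, e)}"
        prev_end = e
```

For instance, `check_offsets(trace, [(0, len(trace))])` accepts the trivial whole-trace output of the engine sketched above.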
A custom offset aligner implements `fuse`:

```python
from typing import List, Tuple

class MyOffsetFusion:
    @staticmethod
    def fuse(engine_offsets: List[List[Tuple[int, int]]], **kwargs):
        # Trivial aligner: keep the first engine's offsets unchanged.
        return engine_offsets[0]
```

To run the TUI:

```shell
python3.12 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python -m tui
```

If needed:

```shell
python src/tui.py
```

SurrealDB is required only for the full experiment pipeline.
```shell
docker run --rm -it \
  -p 8000:8000 \
  -v "$(pwd)/data:/data" \
  surrealdb/surrealdb:latest \
  start --user root --pass root file:/data/surreal.db
```

Endpoints:

- WebSocket: `ws://127.0.0.1:8000/rpc`
- HTTP: `http://127.0.0.1:8000`
Import data:

```shell
surreal import \
  --endpoint ws://127.0.0.1:8000/rpc \
  --username root \
  --password root \
  --namespace NR \
  --database RT \
  ./data/YOUR_EXPORT_FILE.surql
```

Connection config:

```json
{
  "user": "root",
  "pwd": "root",
  "ns": "NR",
  "db": "RT",
  "url": "ws://127.0.0.1:8000/rpc"
}
```

Run the experiments:

```shell
python src/eval_main.py
python src/evo.py
```

Requirements:

- Linux
- NVIDIA GPU
- NVIDIA driver
- Docker
- NVIDIA Container Toolkit
Verify:

```shell
nvidia-smi
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```

The host driver's CUDA version must be ≥ the container's CUDA version.
| Host | Container | Result |
|---|---|---|
| 12.8 | 12.4 | ✅ |
| 12.8 | 13.1 | ❌ |
| 13.x | 12.4 | ✅ |
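The rule behind the table is a plain version comparison: the host driver's CUDA version must be at least the container's. A small illustrative helper (not shipped with rt-seg):

```python
def cuda_compatible(host: str, container: str) -> bool:
    """Host driver CUDA must be >= container CUDA, compared as version tuples."""
    def as_tuple(v: str) -> tuple[int, ...]:
        return tuple(int(part) for part in v.split("."))
    return as_tuple(host) >= as_tuple(container)
```

Comparing integer tuples avoids string-comparison pitfalls (as strings, `"12.10" < "12.9"`).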
```shell
docker build -f docker/Dockerfile -t rt-seg:gpu .
./run_tui_app_docker.sh
```

Internally, this runs:

```shell
docker run -it --rm --gpus all rt-seg:gpu
```

RT-SEG provides:
- Modular segmentation engines
- Late fusion strategies
- LLM-based reasoning segmentation
- Reproducible DB-backed experiments
- GPU Docker deployment