rodionlim · Jan 16, 2026
diff --git a/src/quantlib_st/core/algos/forecast.py b/src/quantlib_st/core/algos/forecast.py
diff --git a/src/quantlib_st/estimators/README.md b/src/quantlib_st/estimators/README.md
@@ -1,19 +1,66 @@
 # estimators
 
-Small, focused volatility estimators.
+Small, focused estimators for volatility and signal scaling.
 
-- **robust_vol_calc** — Robust exponential volatility estimator for daily returns. Uses EWM std with an absolute minimum and an optional volatility floor.
+## Volatility Estimators (`vol.py`)
 
-- **mixed_vol_calc** — Blends short-term (robust) vol with a long-term slow vol component.
+- **robust_vol_calc** — Robust exponential volatility estimator for daily returns. Uses EWM std with an absolute minimum and an optional volatility floor.
+- **mixed_vol_calc** — Blends short-term (robust) vol with a long-term slow vol component.
 
-## Usage example
+### Usage
 
 ```python
 from quantlib_st.estimators.vol import robust_vol_calc
 vol = robust_vol_calc(returns_series)
 ```
 
-Notes
+## Forecast Scaling (`forecast_scalar.py`)
+
+In this modular framework, a **forecast** is a standardized number where positive values indicate a buy signal and negative values indicate a short signal.
+
+To ensure proper risk control and prevent any single rule from dominating the portfolio's returns, all forecasts are eventually **capped within the range of -20 to +20**.
+
+### Why a Forecast Scalar is Necessary
+
+To convert any trading rule output into this specific scale, we use a forecast scalar to ensure that the "average" signal has an expected absolute value of **10.0**.
+
+- **+10.0**: Represents an average buy.
+- **+20.0**: Represents a very strong buy (the cap).
+- **0.0**: Represents a neutral or weak signal.
+
+This consistency allows the rest of the framework—such as position sizing and volatility targeting—to function correctly without needing redesign for every new rule.
+
+### How to Calculate and Apply the Scalar
+
+The forecast scalar is a fixed multiplier used to convert the "raw" output of a trading rule (e.g., price differences, moving average crossovers) into this standardized interface.
+
+1.  **Measure the Average**: Calculate the average absolute value of the raw forecast outputs across a wide backtest of various instruments.
+2.  **The Formula**:
+    $$\text{Scalar} = \frac{\text{Target Average Absolute Forecast (10.0)}}{\text{Measured Average Absolute Raw Output}}$$
+3.  **Example**: If a rule naturally generates an average absolute output of 0.33, the forecast scalar would be **30** ($10 / 0.33 \approx 30$).
+
+### Common Scalar Examples
+
+Different rules require unique scalars based on their mathematical sensitivity:
+
+- **EWMAC Rules**: Variations like EWMAC 2,8 might use a scalar of ~10.6, while the slower EWMAC 64,256 uses ~1.87.
+- **Carry Rule**: Raw carry measures (which act like annualized Sharpe ratios) typically require a scalar of approximately **30**.
+
+### Usage
+
+```python
+from quantlib_st.estimators.forecast_scalar import forecast_scalar
+
+# cs_forecasts: TxN DataFrame of raw, unscaled signals across multiple instruments
+scalar_series = forecast_scalar(cs_forecasts, target_abs_forecast=10.0)
+
+# Apply to one instrument's raw signal, e.g. the first column (same index as the scalar)
+scaled_forecast = cs_forecasts.iloc[:, 0] * scalar_series
+```
+
+---
+
+## Notes
 
-- If you have price data, use `robust_daily_vol_given_price(price_series)` which resamples to business days
-  (taking the last price per business day) and computes differences to produce daily returns.
+- If you have price data for volatility estimation, use `robust_daily_vol_given_price(price_series)` which resamples to business days and computes differences to produce daily returns.
+- `forecast_scalar` supports an `estimated` mode where the scalar is computed on a rolling basis, or it can be used on a full backtest to find a fixed value for configuration.
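
Review note: the arithmetic in "How to Calculate and Apply the Scalar" can be checked by hand. The following is a minimal sketch in plain Python, assuming a fixed (non-rolling) scalar and a made-up history of raw outputs. `fixed_forecast_scalar` and `scale_and_cap` are hypothetical helper names used only for illustration; they are not part of `quantlib_st`.

```python
def fixed_forecast_scalar(raw_values, target_abs_forecast=10.0):
    """Target average absolute forecast divided by the measured average."""
    # Zeros are skipped, mirroring how forecast_scalar treats them as missing.
    abs_values = [abs(v) for v in raw_values if v != 0.0]
    measured_avg_abs = sum(abs_values) / len(abs_values)
    return target_abs_forecast / measured_avg_abs


def scale_and_cap(raw_value, scalar, cap=20.0):
    """Apply the scalar, then clip the result to the range [-cap, +cap]."""
    return max(-cap, min(cap, raw_value * scalar))


# Made-up raw rule outputs with an average absolute value of 0.5.
raw_history = [0.5, -0.5, 0.25, -0.75]
scalar = fixed_forecast_scalar(raw_history)

typical = scale_and_cap(0.25, scalar)      # scaled, well inside the cap
strong_buy = scale_and_cap(1.5, scalar)    # would exceed +20, so it is capped
strong_sell = scale_and_cap(-2.0, scalar)  # would go below -20, so it is capped
```

With this history the scalar works out to 10.0 / 0.5, and any scaled value beyond the cap is clipped to ±20, matching the range stated in the README.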
diff --git a/src/quantlib_st/estimators/forecast_scalar.py b/src/quantlib_st/estimators/forecast_scalar.py
@@ -0,0 +1,62 @@
+from copy import copy
+import pandas as pd
+import numpy as np
+
+
+def forecast_scalar(
+    cs_forecasts: pd.DataFrame,
+    target_abs_forecast: float = 10.0,
+    window: int = 250000,  ## JUST A VERY LARGE NUMBER TO USE ALL DATA
+    min_periods: int = 500,  # MINIMUM PERIODS BEFORE WE ESTIMATE A SCALAR
+    backfill: bool = True,  ## BACKFILL OUR FIRST ESTIMATE, SLIGHTLY CHEATING, BUT...
+) -> pd.Series:
+    """
+    Work out the scaling factor for cross-sectional forecasts such that
+    scaling_factor * forecast has an average absolute value equal to
+    target_abs_forecast (typically 10.0).
+
+    This implementation computes a rolling scalar based on historical forecast values.
+
+    :param cs_forecasts: forecasts, cross-sectionally (TxN DataFrame)
+    :type cs_forecasts: pd.DataFrame
+
+    :param target_abs_forecast: The target average absolute value for the scaled forecast
+    :type target_abs_forecast: float
+
+    :param window: Lookback window for computing the average absolute value
+    :type window: int
+
+    :param min_periods: Minimum number of periods before producing an estimate
+    :type min_periods: int
+
+    :param backfill: If True, backfills the first valid estimate to the start of the series
+    :type backfill: bool
+
+    :returns: pd.Series -- The computed scaling factors
+    """
+    # Canonicalize boolean if passed as string (e.g. from YAML)
+    if isinstance(backfill, str):
+        backfill = backfill.lower() in ("t", "true", "yes", "1")
+
+    # Treat zeros as missing (NaN) so that gaps in the data do not bias the average
+    copy_cs_forecasts = copy(cs_forecasts)
+    copy_cs_forecasts[copy_cs_forecasts == 0.0] = np.nan
+
+    # Take Cross-Sectional average first (median is more robust to outliers)
+    # We do this before the Time-Series average to avoid jumps in scalar
+    # when new markets are introduced.
+    if copy_cs_forecasts.shape[1] == 1:
+        x = copy_cs_forecasts.abs().iloc[:, 0]
+    else:
+        # ffill here ensures we have a view of the "current" forecast level across the pool
+        x = copy_cs_forecasts.ffill().abs().median(axis=1)
+
+    # Compute Rolling Time-Series average of absolute values
+    avg_abs_value = x.rolling(window=window, min_periods=min_periods).mean()
+
+    # Scaling factor is Target / Current Avg
+    scaling_factor = target_abs_forecast / avg_abs_value
+
+    if backfill:
+        scaling_factor = scaling_factor.bfill()
+
+    return scaling_factor
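
Review note: the order of operations in `forecast_scalar` (cross-sectional median of absolute values first, then a time-series average) is easier to follow on a tiny input. Below is a simplified sketch in plain Python. It assumes an expanding window, which is what the very large default `window` amounts to, and it leaves out the forward-fill and backfill steps. `expanding_forecast_scalar` is a hypothetical name, not the function in this diff.

```python
from statistics import median


def expanding_forecast_scalar(rows, target_abs_forecast=10.0, min_periods=2):
    """rows: one list of raw forecasts per date; None marks a missing value."""
    scalars = []
    running_sum = 0.0
    count = 0
    for row in rows:
        # Drop missing values and zeros, then take absolute values.
        abs_vals = [abs(v) for v in row if v is not None and v != 0.0]
        if abs_vals:
            # Cross-sectional median for this date.
            running_sum += median(abs_vals)
            count += 1
        if count >= min_periods:
            # Time-series average of the medians seen so far.
            scalars.append(target_abs_forecast / (running_sum / count))
        else:
            scalars.append(None)
    return scalars


rows = [
    [1.0, -3.0, None],   # median of |values| = 2.0, not enough history yet
    [2.0, -2.0, 2.0],    # median = 2.0, average so far = 2.0
    [None, None, None],  # nothing observed, the estimate is unchanged
    [4.0, -1.0, 0.0],    # zero is dropped, median = 2.5
]
scalars = expanding_forecast_scalar(rows)
```

Taking the median across instruments before averaging over time means a newly added market shifts the estimate only through the median, which is the motivation given in the code comments above.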
diff --git a/src/quantlib_st/systems/README.md b/src/quantlib_st/systems/README.md
@@ -1,35 +1,91 @@
 # Systems: Rules, TradingRules, SystemStage, System
 
-This folder mirrors the core architecture from `systems/` in the original codebase. The key idea is a *pipeline* that turns raw data into forecasts, positions, and P&L through composable stages.
+The key idea of an entire System is a _pipeline_ that turns raw data into forecasts, positions, and P&L through composable stages.
 
 ## Mental Model (High Level)
 
 Think of a trading system as a production line:
 
-1. **Rules**: Pure functions that transform market data into *signals* (e.g., trend, carry).
-2. **TradingRules**: A registry/wrapper that manages a *set of Rules* and exposes a consistent interface.
-3. **SystemStage**: A pipeline step that consumes outputs from earlier stages and produces new outputs.
-4. **System**: The orchestrator that wires stages together into a full strategy.
+1. **Rule Logic**: A pure Python function that calculates a signal (forecast).
+2. **TradingRule (Singular)**: A _specification_. It wraps the logic function with specific parameters (e.g., "Trend with a 32-day window").
+3. **Rules (Plural/Stage)**: A _collection_ (dictionary) of `TradingRule` objects. This is the stage that manages all your signals.
+4. **SystemStage**: A pipeline step that consumes outputs from earlier stages and produces new outputs.
+5. **System**: The orchestrator that wires stages together into a full strategy.
 
-## What is a Rule?
+## What is a TradingRule? (The Specification)
 
-A **Rule** is the smallest unit of trading logic. It takes price data (and possibly other inputs) and returns a *forecast series*.
+A `TradingRule` is NOT a time series. It is a **template** for a signal. It answers the question: _"How do I calculate this signal for any instrument I'm given?"_
 
-- Input: prices, instrument metadata, config params
-- Output: a forecast (typically normalized and capped)
-- Purpose: create a predictive signal in isolation
+It consists of:
 
-**Example mental model**: “If the 64-day moving average is above the 256-day average, produce a positive forecast.”
+- **Logic**: The Python function to call.
+- **Data Req**: What the function needs (e.g., "give me daily prices").
+- **Parameters**: The settings for this specific version (e.g., `window=32`).
 
-## What is TradingRules?
+**Mental Model**: If a "Moving Average Crossover" is a recipe, a `TradingRule` is a **printed copy of that recipe** with specific quantities written in.
 
-**TradingRules** is a container for multiple Rule functions, providing:
+## What is Rules? (The Collection/Stage)
 
-- A single interface to run or retrieve specific rules
-- Metadata (names, parameters)
-- Consistent access patterns for the pipeline
+The `Rules` stage (the `Rules` class) is a `SystemStage`. It contains a dictionary that maps **Names** to `TradingRule` objects. You don't usually have multiple different "Rules Stages" in one system; you have one Rules stage that contains every signal you might ever want to use for any instrument.
 
-**Example mental model**: “A toolbox that holds all my signals and lets the system query them by name.”
+- **What are "Names"?**: These are arbitrary labels you invent to identify a signal. For example: `"ewmac_8_32"`, `"carry"`, or `"my_fancy_signal"`. These names are used later when you want to look up a specific signal's performance.
+- **Relationship**: The `Rules` stage acts as a "Box of Recipes".
+- **Instruments**: One instrument (e.g., Gold) is passed through **every recipe in the box**.
+- **The Result**: If you have 3 trading rules in your stage, Gold will have 3 different signals. These 3 signals are later weighted and combined into a single forecast for Gold.
+
+### How is it linked to SystemStage?
+
+1. **`Rules` is a `SystemStage`**: Like all stages, it sits inside the `System`.
+2. **Data Flow**: The `System` tells the `Rules` stage: _"I need the forecast for Gold using the 'ewmac_8_32' rule."_
+3. **Execution**: The `Rules` stage looks up that **Name**, finds the corresponding `TradingRule` object, and executes it using Gold's price data.
+
+| Component         | Nature         | Example                                                   |
+| :---------------- | :------------- | :-------------------------------------------------------- |
+| **Name**          | Key (String)   | `"trend_fast"`                                            |
+| **`TradingRule`** | Value (Object) | A template saying: "Use EWMA logic with window 32."       |
+| **`Rules` Stage** | Map (Dict)     | `{ "trend_fast": <TradingRule>, "carry": <TradingRule> }` |
+
+## Where are Rules and Collections defined? (The Config)
+
+The definition of rules and which rules belong to which instrument happens in the **System Config** (usually a `.yaml` file or a Python dictionary).
+
+### 1. Global Rule Definitions
+
+In the config under `trading_rules`, you define the names and logic for every signal in your strategy. This is a **Global Collection**.
+
+```yaml
+trading_rules:
+  ewmac_8_32:
+    function: systems.provided.rules.ewmac
+    args: { Llookback: 32, Slookback: 8 }
+  carry:
+    function: systems.provided.rules.carry
+```
+
+### 2. Instrument-Specific Weights
+
+Under `forecast_weights`, you define which rules from the global collection apply to which instrument.
+
+```yaml
+forecast_weights:
+  GOLD:
+    ewmac_8_32: 0.5 # Gold uses these two rules
+    carry: 0.5
+  CORN:
+    ewmac_8_32: 1.0 # Corn only uses the trend rule
+```
+
+### 3. Summary of Storage and Execution
+
+- **Storage**: We store **one collection** of `TradingRule` objects for the whole `System`.
+- **Filtering**: We do NOT store a separate collection per instrument. Instead, we use the `forecast_weights` as a filter.
+- **Execution (On Demand)**: The system is "lazy". It only calculates a rule's forecast for Gold if Gold has a non-zero weight for that rule in the config.
+
+**Mental Model**:
+
+- **The Rules Stage**: A generic factory that knows how to make all types of signals.
+- **The Config**: A manager that says, "For Gold, I want 50% of the Fast Trend signal and 50% of the Carry signal."
+- **The System**: On demand, it fetches the prices for Gold, asks the Factory for those specific signals, and combines them.
 
 ## What is a SystemStage?
 
@@ -43,7 +99,7 @@ Typical stages include:
 - **Position sizing** → risk-targeted positions
 - **P&L accounting** → account curves
 
-Each stage is *stateless* in the sense that it does not own the whole system. It only knows its inputs and outputs.
+Each stage is _stateless_ in the sense that it does not own the whole system. It only knows its inputs and outputs.
 
 **Example mental model**: “A stage is a node in a DAG that transforms data.”
 
@@ -66,7 +122,6 @@ Rule (signal logic)  ->  TradingRules (signal collection)
                          |
                          v
 SystemStage (Forecasting) -> SystemStage (Scaling) -> SystemStage (Position) -> SystemStage (P&L)
-
 System (orchestrator)
 ```
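
Review note: the "one global collection, filtered by `forecast_weights`" design described in this README can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the `Rules` or `TradingRule` classes from this package: `ToyTradingRule`, the two rule functions, and `combined_forecast` are hypothetical names, and prices are plain lists rather than pandas objects.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ToyTradingRule:
    """A specification: the logic function plus the parameters to call it with."""
    function: Callable[..., float]
    args: Dict[str, int] = field(default_factory=dict)

    def call(self, prices):
        return self.function(prices, **self.args)


def momentum(prices, lookback):
    # Toy rule: latest price minus the price `lookback` steps earlier.
    return prices[-1] - prices[-1 - lookback]


def level_vs_mean(prices):
    # Toy rule: latest price minus the average of the whole history.
    return prices[-1] - sum(prices) / len(prices)


# One global collection of named rules for the whole system.
rules = {
    "mom_2": ToyTradingRule(momentum, {"lookback": 2}),
    "vs_mean": ToyTradingRule(level_vs_mean),
}

# Per-instrument weights act as the filter over that collection.
forecast_weights = {
    "GOLD": {"mom_2": 0.5, "vs_mean": 0.5},
    "CORN": {"mom_2": 1.0},
}


def combined_forecast(instrument, prices):
    # Only rules with a non-zero weight for this instrument are evaluated.
    weights = forecast_weights[instrument]
    return sum(
        w * rules[name].call(prices) for name, w in weights.items() if w != 0.0
    )


prices = [10.0, 11.0, 12.0, 15.0]
gold = combined_forecast("GOLD", prices)
corn = combined_forecast("CORN", prices)
```

Here Gold blends both rules while Corn only evaluates `mom_2`, so `vs_mean` is never computed for Corn.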
 

diff --git a/src/quantlib_st/core/algos/__init__.py → src/quantlib_st/systems/provided/__init__.py b/src/quantlib_st/core/algos/__init__.py → src/quantlib_st/systems/provided/__init__.py
diff --git a/src/quantlib_st/systems/provided/rules/README.md b/src/quantlib_st/systems/provided/rules/README.md
@@ -0,0 +1,37 @@
+# Trading Rules
+
+This directory contains standard implementations of common trading rules used to generate raw forecasts.
+
+## Breakout (`breakout.py`)
+
+The **Breakout** rule measures the current price's position relative to its recent range (high/low).
+
+### Why is it a "Breakout"?
+
+In technical analysis, a "breakout" occurs when a price moves outside a defined range of support or resistance. This strategy assumes that such a move signifies a shift in market sentiment and the beginning of a trend.
+
+In this implementation:
+
+- **Range Tracking**: It calculates the rolling maximum (`roll_max`) and minimum (`roll_min`) over a given `lookback` period (e.g., 20 days).
+- **Positioning**: It calculates where the current price sits relative to the midpoint of that range:
+  $$\text{signal} = 40.0 \times \frac{\text{price} - \text{midpoint}}{\text{max} - \text{min}}$$
+- **Signaling**:
+  - If the price is at the **20-day high**, the signal is **+20** (maximum bullish).
+  - If the price is at the **20-day low**, the signal is **-20** (maximum bearish).
+  - If the price is exactly at the midpoint, the signal is **0**.
+- **Smoothing**: The raw signal is smoothed with an Exponential Moving Average (`smooth`) to reduce high-frequency noise and "whipsaws" (false breakouts).
+
+## EWMAC (`ewmac.py`)
+
+The **Exponentially Weighted Moving Average Crossover (EWMAC)** is the "workhorse" trend-following rule.
+
+- **Dynamics**: It calculates the difference between a "fast" EWMA and a "slow" EWMA.
+- **Normalization**: The raw difference is divided by price volatility. This keeps signal strength comparable across instruments and over time, regardless of how volatile the market currently is.
+- **Interpretation**: A positive value indicates that the fast EWMA is above the slow EWMA, i.e. the short-term trend is above the long-term trend (bullish).
+
+## Carry (`carry.py`)
+
+The **Carry** rule captures the "income" generated by holding a position.
+
+- In futures contexts, this is usually the "roll yield" (the difference between the price of the current contract and the next one).
+- It is a "value" or "income" based strategy rather than a trend-based one.
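
Review note: since `ewmac.py` is not shown in this diff, here is a rough sketch of the EWMAC description above (fast EWMA minus slow EWMA, divided by volatility). It assumes a recursive EWMA with `alpha = 2 / (span + 1)` and a single constant volatility number supplied by the caller; the real rule may differ on both points. `ewma` and `toy_ewmac` are hypothetical names.

```python
def ewma(values, span):
    """Recursive exponentially weighted moving average."""
    alpha = 2.0 / (span + 1.0)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1.0 - alpha) * out[-1])
    return out


def toy_ewmac(prices, fast_span, slow_span, vol):
    """Fast EWMA minus slow EWMA, normalized by a volatility estimate."""
    fast = ewma(prices, fast_span)
    slow = ewma(prices, slow_span)
    return [(f - s) / vol for f, s in zip(fast, slow)]


uptrend = [100.0 + i for i in range(50)]
downtrend = [100.0 - i for i in range(50)]

up_signal = toy_ewmac(uptrend, 4, 16, vol=1.0)[-1]
down_signal = toy_ewmac(downtrend, 4, 16, vol=1.0)[-1]

# Doubling both the prices and the volatility leaves the forecast unchanged,
# which is the point of the normalization step.
doubled = [2.0 * p for p in uptrend]
up_signal_doubled = toy_ewmac(doubled, 4, 16, vol=2.0)[-1]
```

The sign follows the trend direction, and the last comparison shows why dividing by volatility makes the raw output comparable across instruments quoted on different scales.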
diff --git a/src/tests/core/algos/__init__.py → ...lib_st/systems/provided/rules/__init__.py b/src/tests/core/algos/__init__.py → ...lib_st/systems/provided/rules/__init__.py
diff --git a/src/quantlib_st/systems/provided/rules/breakout.py b/src/quantlib_st/systems/provided/rules/breakout.py
@@ -0,0 +1,38 @@
+import numpy as np
+
+
+def breakout(price, lookback=10, smooth=None):
+    """
+    :param price: The price or other series to use (assumed Tx1)
+    :type price: pd.DataFrame
+
+    :param lookback: Lookback in days
+    :type lookback: int
+
+    :param smooth: Smooth to apply in days. Must be less than lookback! Defaults to lookback/4
+    :type smooth: int
+
+    :returns: pd.DataFrame -- unscaled, uncapped forecast
+
+    With thanks to nemo4242 on elitetrader.com for vectorisation
+
+    """
+    if smooth is None:
+        smooth = max(int(lookback / 4.0), 1)
+
+    assert smooth < lookback
+
+    roll_max = price.rolling(
+        lookback, min_periods=int(min(len(price), np.ceil(lookback / 2.0)))
+    ).max()
+    roll_min = price.rolling(
+        lookback, min_periods=int(min(len(price), np.ceil(lookback / 2.0)))
+    ).min()
+
+    roll_mean = (roll_max + roll_min) / 2.0
+
+    # gives a nice natural scaling
+    output = 40.0 * ((price - roll_mean) / (roll_max - roll_min))
+    smoothed_output = output.ewm(span=smooth, min_periods=np.ceil(smooth / 2.0)).mean()
+
+    return smoothed_output
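
Review note: the `40.0` factor gives the ±20 end points quoted in the rules README, which can be confirmed on the latest observation alone. A minimal sketch in plain Python, assuming no smoothing and a full lookback window; `raw_breakout` is a hypothetical helper, not the `breakout` function above.

```python
def raw_breakout(prices, lookback):
    """Unsmoothed breakout value for the most recent price."""
    window = prices[-lookback:]
    roll_max, roll_min = max(window), min(window)
    if roll_max == roll_min:
        # A flat window has no range, so the ratio is undefined.
        return float("nan")
    roll_mean = (roll_max + roll_min) / 2.0
    return 40.0 * (prices[-1] - roll_mean) / (roll_max - roll_min)


at_high = raw_breakout([10.0, 12.0, 11.0, 14.0], lookback=4)  # latest is the max
at_low = raw_breakout([14.0, 12.0, 11.0, 10.0], lookback=4)   # latest is the min
at_mid = raw_breakout([10.0, 14.0, 12.0], lookback=3)         # latest is the midpoint
```

At the window high the price sits half a range above the midpoint, so the ratio is 0.5 and the value is 40 × 0.5 = 20. After the EWM smoothing in `breakout`, the output generally stays inside that range.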