30 changes: 0 additions & 30 deletions src/quantlib_st/core/algos/forecast.py

This file was deleted.

61 changes: 54 additions & 7 deletions src/quantlib_st/estimators/README.md
@@ -1,19 +1,66 @@
# estimators

Small, focused volatility estimators.
Small, focused estimators for volatility and signal scaling.

- **robust_vol_calc** — Robust exponential volatility estimator for daily returns. Uses EWM std with an absolute minimum and an optional volatility floor.
## Volatility Estimators (`vol.py`)

- **mixed_vol_calc** — Blends short-term (robust) vol with a long-term slow vol component.
- **robust_vol_calc** — Robust exponential volatility estimator for daily returns. Uses EWM std with an absolute minimum and an optional volatility floor.
- **mixed_vol_calc** — Blends short-term (robust) vol with a long-term slow vol component.

## Usage example
### Usage

```python
from quantlib_st.estimators.vol import robust_vol_calc
vol = robust_vol_calc(returns_series)
```

Notes
## Forecast Scaling (`forecast_scalar.py`)

In this modular framework, a **forecast** is a standardized number where positive values indicate a buy signal and negative values indicate a short signal.

To ensure proper risk control and prevent any single rule from dominating the portfolio's returns, all forecasts are eventually **capped within the range of -20 to +20**.

### Why a Forecast Scalar is Necessary

To convert any trading rule output into this specific scale, we use a forecast scalar to ensure that the "average" signal has an expected absolute value of **10.0**.

- **+10.0**: Represents an average buy.
- **+20.0**: Represents a very strong buy (the cap).
- **0.0**: Represents a neutral or weak signal.

This consistency allows the rest of the framework—such as position sizing and volatility targeting—to function correctly without needing redesign for every new rule.
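The capping step described above can be sketched as follows. This is a minimal illustration, not the library's own code: the helper name `cap_forecast` and the sample values are hypothetical, and only the ±20 cap comes from the text.

```python
import numpy as np

FORECAST_CAP = 20.0  # the cap stated above; the helper itself is hypothetical


def cap_forecast(scaled_forecast, cap=FORECAST_CAP):
    """Clip already-scaled forecasts into the range [-cap, +cap]."""
    return np.clip(scaled_forecast, -cap, cap)


# Made-up scaled forecasts, two of which exceed the cap
scaled = np.array([-35.0, -10.0, 0.0, 12.5, 27.0])
capped = cap_forecast(scaled)
print(capped)
```

Values already inside the range pass through unchanged, so capping only affects the extreme signals.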

### How to Calculate and Apply the Scalar

The forecast scalar is a fixed multiplier used to convert the "raw" output of a trading rule (e.g., price differences, moving average crossovers) into this standardized interface.

1. **Measure the Average**: Calculate the average absolute value of the raw forecast outputs across a wide backtest of various instruments.
2. **The Formula**:
$$\text{Scalar} = \frac{\text{Target Average Absolute Forecast (10.0)}}{\text{Measured Average Absolute Raw Output}}$$
3. **Example**: If a rule naturally generates an average absolute output of 0.33, the forecast scalar would be **30** ($10 / 0.33 \approx 30$).
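The arithmetic in the steps above can be checked with a small sketch. It assumes a single pooled array of raw outputs and a plain mean of absolute values; the function name `fixed_forecast_scalar` and the sample data are hypothetical, and the library's `forecast_scalar` additionally takes a cross-sectional median and a rolling window.

```python
import numpy as np


def fixed_forecast_scalar(raw_forecasts, target_abs_forecast=10.0):
    """Target average absolute forecast divided by the measured one."""
    avg_abs = np.nanmean(np.abs(raw_forecasts))
    return target_abs_forecast / avg_abs


# Made-up raw rule outputs whose average absolute value is 1/3
raw = np.array([0.5, -0.25, 0.25, -0.5, 0.125, -0.375])
scalar = fixed_forecast_scalar(raw)
scaled = raw * scalar

print(round(scalar, 2))                  # approximately 30
print(round(np.mean(np.abs(scaled)), 2))  # approximately 10
```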

### Common Scalar Examples

Different rules require unique scalars based on their mathematical sensitivity:

- **EWMAC Rules**: Variations like EWMAC 2,8 might use a scalar of ~10.6, while the slower EWMAC 64,256 uses ~1.87.
- **Carry Rule**: Raw carry measures (which act like annualized Sharpe ratios) typically require a scalar of approximately **30**.

### Usage

```python
from quantlib_st.estimators.forecast_scalar import forecast_scalar

# cs_forecasts: TxN DataFrame of raw, unscaled signals across multiple instruments
scalar_series = forecast_scalar(cs_forecasts, target_abs_forecast=10.0)

# Apply to raw signal
scaled_forecast = raw_signal * scalar_series
```

---

## Notes

- If you have price data, use `robust_daily_vol_given_price(price_series)` which resamples to business days
(taking the last price per business day) and computes differences to produce daily returns.
- If you have price data for volatility estimation, use `robust_daily_vol_given_price(price_series)` which resamples to business days and computes differences to produce daily returns.
- `forecast_scalar` supports an `estimated` mode where the scalar is computed on a rolling basis, or it can be used on a full backtest to find a fixed value for configuration.
62 changes: 62 additions & 0 deletions src/quantlib_st/estimators/forecast_scalar.py
@@ -0,0 +1,62 @@
from copy import copy
import pandas as pd
import numpy as np


def forecast_scalar(
    cs_forecasts: pd.DataFrame,
    target_abs_forecast: float = 10.0,
    window: int = 250000,  ## JUST A VERY LARGE NUMBER TO USE ALL DATA
    min_periods: int = 500,  # MINIMUM PERIODS BEFORE WE ESTIMATE A SCALAR
    backfill: bool = True,  ## BACKFILL OUR FIRST ESTIMATE, SLIGHTLY CHEATING, BUT...
) -> pd.Series:
    """
    Work out the scaling factor for cross-sectional forecasts such that T*x has an
    average absolute value equal to target_abs_forecast (typically 10.0).

    This implementation computes a rolling scalar based on historical forecast values.

    :param cs_forecasts: forecasts, cross-sectionally (TxN DataFrame)
    :type cs_forecasts: pd.DataFrame

    :param target_abs_forecast: The target average absolute value for the scaled forecast
    :type target_abs_forecast: float

    :param window: Lookback window for computing the average absolute value
    :type window: int

    :param min_periods: Minimum number of periods before producing an estimate
    :type min_periods: int

    :param backfill: If True, backfills the first valid estimate to the start of the series
    :type backfill: bool

    :returns: pd.Series -- The computed scaling factors
    """
    # Canonicalize boolean if passed as string (e.g. from YAML)
    if isinstance(backfill, str):
        backfill = backfill.lower() in ("t", "true", "yes", "1")

    # Remove zeros/nans to avoid bias from missing data
    copy_cs_forecasts = copy(cs_forecasts)
    copy_cs_forecasts[copy_cs_forecasts == 0.0] = np.nan

    # Take Cross-Sectional average first (median is more robust to outliers)
    # We do this before the Time-Series average to avoid jumps in scalar
    # when new markets are introduced.
    if copy_cs_forecasts.shape[1] == 1:
        x = copy_cs_forecasts.abs().iloc[:, 0]
    else:
        # ffill here ensures we have a view of the "current" forecast level across the pool
        x = copy_cs_forecasts.ffill().abs().median(axis=1)

    # Compute Rolling Time-Series average of absolute values
    avg_abs_value = x.rolling(window=window, min_periods=min_periods).mean()

    # Scaling factor is Target / Current Avg
    scaling_factor = target_abs_forecast / avg_abs_value

    if backfill:
        scaling_factor = scaling_factor.bfill()

    return scaling_factor
93 changes: 74 additions & 19 deletions src/quantlib_st/systems/README.md
@@ -1,35 +1,91 @@
# Systems: Rules, TradingRules, SystemStage, System

This folder mirrors the core architecture from `systems/` in the original codebase. The key idea is a *pipeline* that turns raw data into forecasts, positions, and P&L through composable stages.
The key idea of an entire System is a _pipeline_ that turns raw data into forecasts, positions, and P&L through composable stages.

## Mental Model (High Level)

Think of a trading system as a production line:

1. **Rules**: Pure functions that transform market data into *signals* (e.g., trend, carry).
2. **TradingRules**: A registry/wrapper that manages a *set of Rules* and exposes a consistent interface.
3. **SystemStage**: A pipeline step that consumes outputs from earlier stages and produces new outputs.
4. **System**: The orchestrator that wires stages together into a full strategy.
1. **Rule Logic**: A pure Python function that calculates a signal (forecast).
2. **TradingRule (Singular)**: A _specification_. It wraps the logic function with specific parameters (e.g., "Trend with a 32-day window").
3. **Rules (Plural/Stage)**: A _collection_ (dictionary) of `TradingRule` objects. This is the stage that manages all your signals.
4. **SystemStage**: A pipeline step that consumes outputs from earlier stages and produces new outputs.
5. **System**: The orchestrator that wires stages together into a full strategy.

## What is a Rule?
## What is a TradingRule? (The Specification)

A **Rule** is the smallest unit of trading logic. It takes price data (and possibly other inputs) and returns a *forecast series*.
A `TradingRule` is NOT a time series. It is a **template** for a signal. It answers the question: _"How do I calculate this signal for any instrument I'm given?"_

- Input: prices, instrument metadata, config params
- Output: a forecast (typically normalized and capped)
- Purpose: create a predictive signal in isolation
It consists of:

**Example mental model**: “If the 64-day moving average is above the 256-day average, produce a positive forecast.”
- **Logic**: The Python function to call.
- **Data Req**: What the function needs (e.g., "give me daily prices").
- **Parameters**: The settings for this specific version (e.g., `window=32`).

## What is TradingRules?
**Mental Model**: If a "Moving Average Crossover" is a recipe, a `TradingRule` is a **printed copy of that recipe** with specific quantities written in.

**TradingRules** is a container for multiple Rule functions, providing:
## What is Rules? (The Collection/Stage)

- A single interface to run or retrieve specific rules
- Metadata (names, parameters)
- Consistent access patterns for the pipeline
The `Rules` stage (the `Rules` class) is a `SystemStage`. It contains a dictionary that maps **Names** to `TradingRule` objects. You don't usually have multiple different "Rules Stages" in one system; you have one Rules stage that contains every signal you might ever want to use for any instrument.

**Example mental model**: “A toolbox that holds all my signals and lets the system query them by name.”
- **What are "Names"?**: These are arbitrary labels you invent to identify a signal. For example: `"ewmac_8_32"`, `"carry"`, or `"my_fancy_signal"`. These names are used later when you want to look up a specific signal's performance.
- **Relationship**: The `Rules` stage acts as a "Box of Recipes".
- **Instruments**: One instrument (e.g., Gold) is passed through **every recipe in the box**.
- **The Result**: If you have 3 trading rules in your stage, Gold will have 3 different signals. These 3 signals are later weighted and combined into a single forecast for Gold.

### How is it linked to SystemStage?

1. **`Rules` is a `SystemStage`**: Like all stages, it sits inside the `System`.
2. **Data Flow**: The `System` tells the `Rules` stage: _"I need the forecast for Gold using the 'ewmac_8_32' rule."_
3. **Execution**: The `Rules` stage looks up that **Name**, finds the corresponding `TradingRule` object, and executes it using Gold's price data.

| Component | Nature | Example |
| :---------------- | :------------- | :-------------------------------------------------------- |
| **Name** | Key (String) | `"trend_fast"` |
| **`TradingRule`** | Value (Object) | A template saying: "Use EWMA logic with window 32." |
| **`Rules` Stage** | Map (Dict) | `{ "trend_fast": <TradingRule>, "carry": <TradingRule> }` |
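The name-to-specification mapping in the table can be sketched in plain Python. Everything here is hypothetical and for illustration only: `TradingRuleSpec`, the toy `momentum` logic, and the rule names are stand-ins, not the classes or rules shipped in this package.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class TradingRuleSpec:
    """Hypothetical stand-in for a TradingRule: logic plus fixed parameters."""

    function: Callable[..., List[float]]
    args: Dict[str, Any] = field(default_factory=dict)

    def call(self, prices: List[float]) -> List[float]:
        # The same template is applied to whichever instrument's prices arrive
        return self.function(prices, **self.args)


def momentum(prices: List[float], window: int = 2) -> List[float]:
    """Toy rule logic: price change over `window` steps."""
    return [prices[i] - prices[i - window] for i in range(window, len(prices))]


# The "Rules stage": invented names mapped to specifications
rules = {
    "mom_fast": TradingRuleSpec(momentum, {"window": 1}),
    "mom_slow": TradingRuleSpec(momentum, {"window": 3}),
}

# One instrument passes through every recipe in the box
gold_prices = [100.0, 101.0, 103.0, 102.0, 106.0]
gold_forecasts = {name: rule.call(gold_prices) for name, rule in rules.items()}
print(gold_forecasts)
```

Two specifications share one logic function and differ only in their parameters, which is the recipe-versus-printed-copy distinction made earlier.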

## Where are Rules and Collections defined? (The Config)

The definition of rules and which rules belong to which instrument happens in the **System Config** (usually a `.yaml` file or a Python dictionary).

### 1. Global Rule Definitions

In the config under `trading_rules`, you define the names and logic for every signal in your strategy. This is a **Global Collection**.

```yaml
trading_rules:
  ewmac_8_32:
    function: systems.provided.rules.ewmac
    args: { Llookback: 32, Slookback: 8 }
  carry:
    function: systems.provided.rules.carry
```

### 2. Instrument-Specific Weights

Under `forecast_weights`, you define which rules from the global collection apply to which instrument.

```yaml
forecast_weights:
  GOLD:
    ewmac_8_32: 0.5 # Gold uses these two rules
    carry: 0.5
  CORN:
    ewmac_8_32: 1.0 # Corn only uses the trend rule
```

### 3. Summary of Storage and Execution

- **Storage**: We store **one collection** of `TradingRule` objects for the whole `System`.
- **Filtering**: We do NOT store a separate collection per instrument. Instead, we use the `forecast_weights` as a filter.
- **Execution (On Demand)**: The system is "lazy". It only calculates a rule's forecast for Gold if Gold has a non-zero weight for that rule in the config.

**Mental Model**:

- **The Rules Stage**: A generic factory that knows how to make all types of signals.
- **The Config**: A manager that says, "For Gold, I want 50% of the Fast Trend signal and 50% of the Carry signal."
- **The System**: On demand, it fetches the prices for Gold, asks the Factory for those specific signals, and combines them.
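A rough sketch of the filter-and-combine behaviour follows. The weights mirror the config example above; `fake_rule_forecast`, its hard-coded values, and `combined_forecast` are hypothetical stand-ins for the real stage calls, used only to show that rules without a weight are never evaluated.

```python
# Weights copied from the config example; everything else is a stand-in
forecast_weights = {
    "GOLD": {"ewmac_8_32": 0.5, "carry": 0.5},
    "CORN": {"ewmac_8_32": 1.0},
}

calls = []  # records which (instrument, rule) pairs were actually computed


def fake_rule_forecast(instrument, rule_name):
    """Hypothetical stand-in for asking the Rules stage for one forecast."""
    calls.append((instrument, rule_name))
    latest = {"ewmac_8_32": 12.0, "carry": -4.0}  # made-up forecast values
    return latest[rule_name]


def combined_forecast(instrument):
    """Weighted sum over only the rules with a non-zero weight."""
    weights = forecast_weights[instrument]
    return sum(
        weight * fake_rule_forecast(instrument, name)
        for name, weight in weights.items()
        if weight != 0.0
    )


gold = combined_forecast("GOLD")
corn = combined_forecast("CORN")
print(gold, corn)
print(calls)
```

Corn never triggers a carry calculation because carry has no weight for it in the config.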

## What is a SystemStage?

@@ -43,7 +99,7 @@ Typical stages include:
- **Position sizing** → risk-targeted positions
- **P&L accounting** → account curves

Each stage is *stateless* in the sense that it does not own the whole system. It only knows its inputs and outputs.
Each stage is _stateless_ in the sense that it does not own the whole system. It only knows its inputs and outputs.

**Example mental model**: “A stage is a node in a DAG that transforms data.”

@@ -66,7 +122,6 @@ Rule (signal logic) -> TradingRules (signal collection)
|
v
SystemStage (Forecasting) -> SystemStage (Scaling) -> SystemStage (Position) -> SystemStage (P&L)

System (orchestrator)
```

37 changes: 37 additions & 0 deletions src/quantlib_st/systems/provided/rules/README.md
@@ -0,0 +1,37 @@
# Trading Rules

This directory contains standard implementations of common trading rules used to generate raw forecasts.

## Breakout (`breakout.py`)

The **Breakout** rule measures the current price's position relative to its recent range (high/low).

### Why is it a "Breakout"?

In technical analysis, a "breakout" occurs when a price moves outside a defined range of support or resistance. This strategy assumes that such a move signifies a shift in market sentiment and the beginning of a trend.

In this implementation:

- **Range Tracking**: It calculates the rolling maximum (`roll_max`) and minimum (`roll_min`) over a given `lookback` period (e.g., 20 days).
- **Positioning**: It calculates where the current price sits relative to the midpoint of that range:
$$\text{signal} = 40.0 \times \frac{\text{price} - \text{midpoint}}{\text{max} - \text{min}}$$
- **Signaling**:
- If the price is at the **20-day high**, the signal is **+20** (maximum bullish).
- If the price is at the **20-day low**, the signal is **-20** (maximum bearish).
- If the price is exactly at the midpoint, the signal is **0**.
- **Smoothing**: The raw signal is smoothed with an Exponential Moving Average (`smooth`) to reduce high-frequency noise and "whipsaws" (false breakouts).
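The three signal values listed above can be reproduced with a short sketch. It is a simplification of the rule: the helper name `raw_breakout` is hypothetical, the price series are made up, and the smoothing step and partial-window `min_periods` handling are left out.

```python
import pandas as pd


def raw_breakout(price: pd.Series, lookback: int) -> pd.Series:
    """Unsmoothed breakout signal: position of price within its rolling range."""
    roll_max = price.rolling(lookback).max()
    roll_min = price.rolling(lookback).min()
    midpoint = (roll_max + roll_min) / 2.0
    return 40.0 * (price - midpoint) / (roll_max - roll_min)


rising = pd.Series([100.0, 102.0, 104.0, 106.0, 108.0])   # ends at the range high
falling = pd.Series([108.0, 106.0, 104.0, 102.0, 100.0])  # ends at the range low
middle = pd.Series([100.0, 110.0, 100.0, 110.0, 105.0])   # ends at the midpoint

at_high = raw_breakout(rising, lookback=5).iloc[-1]
at_low = raw_breakout(falling, lookback=5).iloc[-1]
at_mid = raw_breakout(middle, lookback=5).iloc[-1]
print(at_high, at_low, at_mid)
```

The factor of 40 is what maps the half-range distance (price at the extreme minus the midpoint) onto ±20.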

## EWMAC (`ewmac.py`)

The **Exponentially Weighted Moving Average Crossover (EWMAC)** is the "workhorse" trend-following rule.

- **Dynamics**: It calculates the difference between a "fast" EWMA and a "slow" EWMA.
- **Normalization**: The raw difference is divided by price volatility. This ensures that the signal strength is comparable across different instruments and over time, regardless of how volatile the market currently is.
- **Interpretation**: A positive value indicates the shorter-term trend is higher than the long-term trend (Bullish).
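The crossover and normalization can be sketched as below. This is an illustration under assumptions, not the contents of `ewmac.py`: the function name `ewmac_sketch`, the span defaults, and the use of an EWM standard deviation of price differences as the volatility measure are all choices made for this example.

```python
import numpy as np
import pandas as pd


def ewmac_sketch(price, fast_span=8, slow_span=32, vol_span=35):
    """Fast EWMA minus slow EWMA, divided by a price-change volatility estimate."""
    fast = price.ewm(span=fast_span, min_periods=1).mean()
    slow = price.ewm(span=slow_span, min_periods=1).mean()
    # Assumed volatility measure: EWM std of daily price differences
    vol = price.diff().ewm(span=vol_span, min_periods=10).std()
    return (fast - slow) / vol


# Synthetic upward-trending price with a deterministic wiggle
steps = np.arange(200)
price = pd.Series(np.linspace(100.0, 150.0, 200) + np.sin(steps))

up_forecast = ewmac_sketch(price)
down_forecast = ewmac_sketch(-price)  # mirrored series trends downward
print(up_forecast.iloc[-1] > 0, down_forecast.iloc[-1] < 0)
```

The output is still a raw forecast; it would need a forecast scalar and capping before use.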

## Carry (`carry.py`)

The **Carry** rule captures the "income" generated by holding a position.

- In futures contexts, this is usually the "roll yield" (the difference between the price of the current contract and the next one).
- It is a "value" or "income" based strategy rather than a trend-based one.
38 changes: 38 additions & 0 deletions src/quantlib_st/systems/provided/rules/breakout.py
@@ -0,0 +1,38 @@
import numpy as np


def breakout(price, lookback=10, smooth=None):
    """
    :param price: The price or other series to use (assumed Tx1)
    :type price: pd.DataFrame

    :param lookback: Lookback in days
    :type lookback: int

    :param smooth: Smooth to apply in days. Must be less than lookback! Defaults to lookback/4
    :type smooth: int

    :returns: pd.DataFrame -- unscaled, uncapped forecast

    With thanks to nemo4242 on elitetrader.com for vectorisation

    """
    if smooth is None:
        smooth = max(int(lookback / 4.0), 1)

    assert smooth < lookback

    roll_max = price.rolling(
        lookback, min_periods=int(min(len(price), np.ceil(lookback / 2.0)))
    ).max()
    roll_min = price.rolling(
        lookback, min_periods=int(min(len(price), np.ceil(lookback / 2.0)))
    ).min()

    roll_mean = (roll_max + roll_min) / 2.0

    # gives a nice natural scaling
    output = 40.0 * ((price - roll_mean) / (roll_max - roll_min))
    smoothed_output = output.ewm(span=smooth, min_periods=np.ceil(smooth / 2.0)).mean()

    return smoothed_output