BioScanCast is an open-source pipeline that uses large language models and automated web retrieval to produce forecasts for biosecurity-related questions.
The system gathers information from the internet, filters relevant sources, extracts structured insights, and produces probabilistic forecasts with confidence scores. The project also evaluates model forecasts against human expert forecasts.
This repository contains the full pipeline implementation, benchmarking framework, and tooling required to reproduce experiments.
- Build an open-source forecasting system for biosecurity questions.
- Benchmark model forecasts against human expert forecasters.
- Provide a fully reproducible research pipeline suitable for publication.
- Produce accessible outputs including technical documentation and public explanations.
The system follows a modular pipeline with five stages:

- Search stage: collect candidate sources from the internet.
- Filtering stage: identify credible and relevant sources.
- Extraction stage: retrieve and clean content from selected sources.
- Insight stage: extract structured information such as events and timelines.
- Forecasting stage: use structured information to generate forecasts and confidence scores.
Each stage is implemented as an independent module so developers can work on components without affecting the rest of the system.
bioscancast/
│
├── bioscancast/
│   ├── pipeline/
│   ├── stages/
│   ├── schemas/
│   ├── llm/
│   ├── retrieval/
│   ├── evaluation/
│   ├── datasets/
│   └── utils/
│
├── configs/
├── data/
├── scripts/
├── notebooks/
├── tests/
└── docs/
The sections below describe the purpose of each directory.
The bioscancast/ directory contains the main application code.
bioscancast/
This package implements the forecasting pipeline and supporting modules.
bioscancast/pipeline/
Responsible for coordinating execution across stages.
Files:
orchestrator.py Controls pipeline flow. Calls each stage sequentially.
pipeline_runner.py Entry point for running the full pipeline.
pipeline_types.py Shared data types used to pass outputs between stages.
Developers modifying pipeline order or execution logic should work here.
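As an illustration of the sequential flow the orchestrator coordinates, here is a minimal sketch; the stage callables, their signatures, and `run_pipeline` itself are hypothetical stand-ins, not the project's actual API:

```python
from typing import Callable, List

def run_pipeline(question: str, stages: List[Callable]) -> object:
    """Run each stage in order, feeding each stage's output to the next.

    Illustrative only: the real orchestrator passes typed schema objects
    between stages rather than arbitrary values.
    """
    data = question
    for stage in stages:
        data = stage(data)
    return data

# Toy stages standing in for search -> filtering -> extraction -> insight -> forecasting.
stages = [
    lambda q: [f"source for {q}"],            # search: question -> candidate sources
    lambda sources: sources,                  # filtering: pass-through in this sketch
    lambda sources: {"docs": sources},        # extraction: sources -> documents
    lambda docs: {"insights": docs["docs"]},  # insight: documents -> structured insights
    lambda ins: {"probability": 0.5, "evidence": ins["insights"]},  # forecasting
]

result = run_pipeline("Will X occur by 2030?", stages)
```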
bioscancast/stages/
Each stage of the forecasting pipeline is implemented in its own folder.
Stages should remain independent and communicate only through defined schemas.
bioscancast/stages/search_stage/
Purpose:
Generate candidate sources relevant to a forecasting question.
Expected tasks:
• build search queries
• call search APIs
• retrieve top results
Expected outputs:
List[SearchResult]
Example modules:
search_engine.py, query_builder.py, search_clients/
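A sketch of the kind of query expansion query_builder.py might perform; the function name and heuristic are illustrative, not the module's actual logic:

```python
def build_queries(question: str, keywords=None) -> list:
    """Expand a forecasting question into several search queries.

    Illustrative heuristic: the base question plus one query per
    supplied keyword, with duplicates removed in order.
    """
    keywords = keywords or []
    base = question.rstrip("?")
    queries = [base] + [f"{base} {kw}" for kw in keywords]
    seen, unique = set(), []
    for q in queries:
        if q not in seen:
            seen.add(q)
            unique.append(q)
    return unique

qs = build_queries("Will H5N1 spread to humans?", ["WHO report", "CDC update"])
```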
bioscancast/stages/filtering_stage/
Purpose:
Identify credible and relevant sources.
Expected tasks:
• LLM relevance classification
• source credibility checks
• removal of duplicate or low-quality URLs
Expected outputs:
List[FilteredURL]
Example modules:
url_ranker.py, source_validator.py, relevance_model.py
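The deduplication and low-quality-URL removal step could be sketched as follows; the domain blocklist and function name are placeholders, not the project's actual validation rules:

```python
from urllib.parse import urlparse

# Placeholder blocklist; real credibility checks would be richer than this.
LOW_QUALITY_DOMAINS = {"example-content-farm.com"}

def filter_urls(urls):
    """Drop duplicate URLs and URLs from known low-quality domains,
    preserving the original order of the survivors."""
    seen, kept = set(), []
    for url in urls:
        domain = urlparse(url).netloc.lower()
        if url in seen or domain in LOW_QUALITY_DOMAINS:
            continue
        seen.add(url)
        kept.append(url)
    return kept

urls = filter_urls([
    "https://who.int/report",
    "https://who.int/report",
    "https://example-content-farm.com/post",
])
```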
bioscancast/stages/extraction_stage/
Purpose:
Retrieve and normalize content from selected sources.
Expected tasks:
• scrape HTML pages
• download PDFs
• parse documents
• clean text
Expected outputs:
List[Document]
Example modules:
scraper.py, html_parser.py, pdf_parser.py, text_cleaner.py
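The cleaning step might look like this minimal sketch (illustrative of text_cleaner.py, not its actual implementation):

```python
import re

def clean_text(raw: str) -> str:
    """Strip residual HTML tags and normalize whitespace in scraped text."""
    text = re.sub(r"<[^>]+>", " ", raw)  # remove leftover HTML tags
    text = re.sub(r"\s+", " ", text)     # collapse runs of whitespace
    return text.strip()

cleaned = clean_text("<p>Outbreak  reported\n in   region</p>")
```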
bioscancast/stages/insight_stage/
Purpose:
Extract structured information from text.
Expected tasks:
• event extraction
• timeline construction
• key insight identification
Expected outputs:
DataFrame[InsightRecord]
Example modules:
information_extractor.py, event_parser.py, timeline_builder.py, dataframe_builder.py
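Timeline construction can be sketched as ordering extracted events by date; the `Event` record below is an illustrative stand-in for the real extracted-event schema:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Event:
    """Illustrative stand-in for an extracted-event record."""
    date: date
    description: str

def build_timeline(events):
    """Order extracted events chronologically (a sketch of timeline_builder.py)."""
    return sorted(events, key=lambda e: e.date)

timeline = build_timeline([
    Event(date(2024, 3, 1), "second case confirmed"),
    Event(date(2024, 1, 15), "first case reported"),
])
```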
bioscancast/stages/forecasting_stage/
Purpose:
Produce probabilistic forecasts based on extracted insights.
Expected tasks:
• generate model prompts
• apply reasoning models
• calibrate probabilities
• produce confidence scores
Expected outputs:
ForecastOutput
Example modules:
forecaster.py, prompt_templates.py, confidence_calibration.py
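One common way to combine several model probabilities into a single forecast is averaging in log-odds space; this is a generic aggregation heuristic offered as a sketch, and the project's actual calibration logic may differ:

```python
import math

def aggregate_forecasts(probs):
    """Combine probabilities by averaging their log-odds, then mapping
    the mean back through the logistic function. Assumes 0 < p < 1."""
    logits = [math.log(p / (1 - p)) for p in probs]
    mean_logit = sum(logits) / len(logits)
    return 1 / (1 + math.exp(-mean_logit))

p = aggregate_forecasts([0.6, 0.7, 0.65])
```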
bioscancast/schemas/
Defines structured data objects shared between pipeline stages.
Examples:
search_result.py, document.py, extracted_event.py, forecast_output.py
All stages should communicate using these schemas. Do not pass raw dictionaries between stages.
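A schema object might be a simple frozen dataclass; the field names below are illustrative, and the real definition lives in bioscancast/schemas/:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SearchResult:
    """Sketch of a shared schema object passed between stages.

    Field names are illustrative; frozen=True keeps records immutable
    so one stage cannot mutate another stage's output in place.
    """
    url: str
    title: str
    snippet: str

r = SearchResult(url="https://who.int/report", title="WHO report", snippet="...")
```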
bioscancast/llm/
Provides abstraction layers for language models.
Supported providers may include:
• OpenAI
• Anthropic
• Local models
Example files:
llm_client.py, openai_client.py, anthropic_client.py
Stages should call these clients rather than directly interacting with APIs.
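The abstraction layer can be sketched as a provider-agnostic base class; the class and method names below are hypothetical, not the actual interface in llm_client.py:

```python
from abc import ABC, abstractmethod

class LLMClient(ABC):
    """Provider-agnostic interface: stages depend on this abstraction
    instead of on any specific provider's API."""

    @abstractmethod
    def complete(self, prompt: str) -> str:
        """Return the model's completion for a prompt."""

class EchoClient(LLMClient):
    """Toy implementation used here in place of a real provider client."""

    def complete(self, prompt: str) -> str:
        return f"response to: {prompt}"

client: LLMClient = EchoClient()
out = client.complete("Summarize the document")
```

Swapping providers then means swapping the concrete client, with no changes inside the stages themselves.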
bioscancast/retrieval/
Tools for document retrieval and embedding.
Examples:
document_store.py, embedding_model.py, chunking.py
Used by extraction and insight stages.
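A minimal sketch of the kind of splitting chunking.py might provide (the function name and character-based strategy are assumptions):

```python
def chunk_text(text: str, size: int, overlap: int = 0):
    """Split text into fixed-size character chunks, with an optional
    overlap so context is shared across chunk boundaries.
    Assumes 0 <= overlap < size."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

chunks = chunk_text("abcdefghij", size=4, overlap=1)
```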
bioscancast/evaluation/
Contains benchmarking and evaluation logic.
Examples:
benchmark_loader.py, scoring.py, brier_score.py, calibration_metrics.py, human_comparison.py
Used to compare model forecasts against human forecasts.
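The Brier score is the standard metric here: the mean squared error between probabilistic forecasts and binary outcomes, where lower is better. A minimal sketch (the function signature is illustrative of brier_score.py):

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    A perfect forecaster scores 0.0; always forecasting 0.5 scores 0.25.
    """
    assert len(forecasts) == len(outcomes)
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

score = brier_score([0.8, 0.3], [1, 0])
```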
bioscancast/datasets/
Contains definitions for forecasting datasets and curated source lists.
Examples:
forecast_questions.py, biosecurity_sources.py
bioscancast/utils/
General purpose helpers used throughout the codebase.
Examples:
logging utilities, configuration loading, rate limiting, caching
configs/
Configuration files for models, scraping behavior, and pipeline parameters.
Examples:
model configuration, API settings, scraping limits, LLM prompt settings
These files allow experimentation without modifying code.
data/
Stores intermediate and benchmark data.
Subdirectories:
raw/          original scraped data
processed/    cleaned datasets
benchmarks/   forecasting evaluation datasets
scripts/
Command line tools used to run experiments and pipelines.
Examples:
run_pipeline.py Runs the full forecasting pipeline.
run_benchmark.py Evaluates model forecasts against benchmark datasets.
scrape_sources.py Bulk scraping utility for collecting documents.
evaluate_forecasts.py Computes evaluation metrics.
Scripts are intended for operational tasks rather than reusable code.
notebooks/
Used for exploratory analysis and experimentation.
Examples:
model experiments, prompt exploration, benchmark analysis
Notebook code should not be required for the main pipeline.
tests/
Unit and integration tests for pipeline components.
Examples:
stage-level tests, pipeline execution tests, schema validation tests
Each pipeline stage should include test coverage.
docs/
Project documentation and architecture notes.
Examples:
system architecture, pipeline design, benchmark methodology, API documentation
These documents support research publication and developer onboarding.
- Pipeline stages must remain modular.
- Data passed between stages must use schemas.
- Stages should not import logic from other stages.
- Experimental code should live in notebooks or scripts.
- Reproducibility is a core requirement.
Example:
python scripts/run_pipeline.py
Example benchmark run:
python scripts/run_benchmark.py
Developers should work within a single pipeline stage whenever possible. Changes that affect data contracts or schemas should be discussed before merging.
All contributions should include tests.