The ClimateSense KG is a continuously updated knowledge graph that integrates climate fact-checking data from multiple sources to combat climate misinformation. It links claim reviews published by fact-checking organizations with enriched data (text extracted from linked web pages, DBpedia entities, and classified factors), giving researchers a more comprehensive view of climate misinformation.
- Multi-source ingestion from major climate fact-checking organizations
- Data enrichment with:
  - Text extraction from URLs using trafilatura
  - Entity linking using DBpedia Spotlight
  - Factor classification using fine-tuned BERT models
- RDF output using Schema.org and the CIMPLE ontology
- Triple store deployment, with support for Virtuoso
- YAML-based configuration
- URI design patterns and RDF namespaces
- Public SPARQL endpoint: https://data.climatesense-project.eu/sparql
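To illustrate the entity-linking step, here is a minimal sketch of calling a DBpedia Spotlight annotate endpoint with the api_url, confidence and support values from the example configuration later in this README. It is not the pipeline's own implementation: the helper names build_spotlight_url, parse_spotlight_response and annotate are hypothetical, and the parsing assumes Spotlight's JSON response format, where each linked entity appears under "Resources" with "@surfaceForm" and "@URI" keys.

```python
import json
import urllib.parse
import urllib.request

# Values mirroring the dbpedia_spotlight section of the example config.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"
CONFIDENCE = 0.6
SUPPORT = 30
TIMEOUT = 20


def build_spotlight_url(text: str, confidence: float = CONFIDENCE, support: int = SUPPORT) -> str:
    """Encode the text and thresholds as query parameters (hypothetical helper)."""
    params = urllib.parse.urlencode(
        {"text": text, "confidence": confidence, "support": support}
    )
    return f"{SPOTLIGHT_URL}?{params}"


def parse_spotlight_response(payload: dict) -> list[tuple[str, str]]:
    """Return (surface form, DBpedia URI) pairs; a missing "Resources" key means no matches."""
    return [(r["@surfaceForm"], r["@URI"]) for r in payload.get("Resources", [])]


def annotate(text: str) -> list[tuple[str, str]]:
    """Call the live endpoint and parse its JSON answer (requires network access)."""
    request = urllib.request.Request(
        build_spotlight_url(text), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=TIMEOUT) as response:
        return parse_spotlight_response(json.load(response))
```

For example, annotate("Sea levels are rising because of climate change.") would return the linked surface forms with their DBpedia URIs; the exact matches depend on the live service and the thresholds.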
- Python 3.11+
- uv (for dependency management)
- just (for task automation)
- Docker & Docker Compose (for Docker setup)
Install:
git clone https://github.com/climatesense-project/climatesense-kg.git
cd climatesense-kg
just install

Run:
just run config/minimal.yaml

Requirements (Docker setup): Docker & Docker Compose
Initial Setup:
- Clone the repository and navigate to the docker directory:
  git clone https://github.com/climatesense-project/climatesense-kg.git
  cd climatesense-kg/docker
- Copy and configure environment variables:
  cp .env.example .env
  Edit .env to configure:
  - GITHUB_TOKEN: GitHub token used for private repositories
  - VIRTUOSO_HOST: Virtuoso host name (default: virtuoso)
  - VIRTUOSO_PORT: Virtuoso HTTP/SPARQL port (default: 8890)
  - VIRTUOSO_ISQL_PORT: Virtuoso ISQL port (default: 1111)
  - VIRTUOSO_USER: Virtuoso database user (default: dba)
  - VIRTUOSO_PASSWORD: Virtuoso database password (default: dba)
  - VIRTUOSO_ISQL_SERVICE_URL: Virtuoso ISQL HTTP endpoint (default: http://isql-service:8080)
  - ISQL_SERVICE_PORT: Published port for the ISQL helper service (default: 8080)
  - CIMPLE_FACTORS_API_URL: CIMPLE Factors API base URL (default: http://localhost:8000)
  - POSTGRES_HOST: Cache database host (default: postgres)
  - POSTGRES_PORT: Cache database port (default: 5432)
  - POSTGRES_DB: Cache database name (default: climatesense_cache)
  - POSTGRES_USER: Cache database user (default: postgres)
  - POSTGRES_PASSWORD: Cache database password (required)
  - ANALYTICS_SPARQL_ENDPOINT: Virtuoso SPARQL endpoint for analytics (default: http://virtuoso:8890/sparql)
  - ANALYTICS_ALLOWED_ORIGINS: Comma-separated origins permitted to call the analytics API (default: http://localhost:3000)
  - ANALYTICS_CACHE_TTL: Analytics API cache TTL in seconds (default: 60)
  - ANALYTICS_SPARQL_TIMEOUT: SPARQL timeout in seconds for analytics queries (default: 20)
  - NEXT_PUBLIC_ANALYTICS_API_URL: Base URL the dashboard uses for the analytics API (default: http://localhost:8000)
  - ANALYTICS_API_PORT: Published port for the analytics API container (default: 8000)
  - ANALYTICS_UI_PORT: Published port for the analytics UI container (default: 3000)
- Start the services:
  docker compose up -d
- Run the pipeline:
  docker compose run --rm pipeline run -c config/minimal.yaml
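Because these settings are plain environment variables, any service can read them with the documented defaults. The following is a minimal sketch, assuming a hypothetical load_settings helper that is not part of the repository: it applies a subset of the defaults listed above, treats POSTGRES_PASSWORD as required, and derives a SPARQL endpoint URL (using the /sparql path that Virtuoso exposes elsewhere in this README) and a PostgreSQL connection URI.

```python
import os
from collections.abc import Mapping
from urllib.parse import quote

# Subset of the defaults documented in the .env reference above.
DEFAULTS = {
    "VIRTUOSO_HOST": "virtuoso",
    "VIRTUOSO_PORT": "8890",
    "POSTGRES_HOST": "postgres",
    "POSTGRES_PORT": "5432",
    "POSTGRES_DB": "climatesense_cache",
    "POSTGRES_USER": "postgres",
    "ANALYTICS_ALLOWED_ORIGINS": "http://localhost:3000",
    "ANALYTICS_CACHE_TTL": "60",
}


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Merge environment values over the defaults (hypothetical helper)."""
    if not env.get("POSTGRES_PASSWORD"):
        raise RuntimeError("POSTGRES_PASSWORD is required")
    merged = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    # Percent-encode credentials so characters like '@' cannot break the URI.
    user = quote(merged["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    return {
        "sparql_endpoint": f"http://{merged['VIRTUOSO_HOST']}:{merged['VIRTUOSO_PORT']}/sparql",
        "postgres_uri": (
            f"postgresql://{user}:{password}@"
            f"{merged['POSTGRES_HOST']}:{merged['POSTGRES_PORT']}/{merged['POSTGRES_DB']}"
        ),
        # ANALYTICS_ALLOWED_ORIGINS is documented as comma-separated.
        "allowed_origins": [
            origin.strip()
            for origin in merged["ANALYTICS_ALLOWED_ORIGINS"].split(",")
            if origin.strip()
        ],
        "cache_ttl_seconds": int(merged["ANALYTICS_CACHE_TTL"]),
    }
```

Passing a plain dictionary instead of os.environ makes the helper easy to exercise without touching the real environment.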
The pipeline uses YAML-based configuration. Example config:
data_sources:
  - name: "claimreview_sample"
    type: "claimreviewdata"
    input_path: "samples/claimreviewdata-data"
  - name: "euroclimatecheck_sample"
    type: "euroclimatecheck"
    input_path: "samples/euroclimatecheck-data"
enrichment:
  url_text_extraction:
    enabled: true
    rate_limit_delay: 0.5
    timeout: 15
    max_retries: 2
  dbpedia_spotlight:
    enabled: true
    api_url: "https://api.dbpedia-spotlight.org/en/annotate"
    confidence: 0.6
    support: 30
    timeout: 20
    rate_limit_delay: 0.2
  bert_factors:
    enabled: true
    batch_size: 32
    max_length: 128
    timeout: 30
    rate_limit_delay: 0.1
output:
  format: "turtle"
  output_path: "data/rdf/{DATE}/{SOURCE}.ttl"
  base_uri: "http://data.climatesense-project.eu"
cache:
  cache_dir: "cache"
  default_ttl_hours: 24.0

You can use any PostgreSQL client to connect to the PostgreSQL cache database and run SQL queries:
-- Processing success rates by step
SELECT step, COUNT(*) AS total, COUNT(*) FILTER (WHERE success) AS successes
FROM cache_entries GROUP BY step;
-- Error analysis by domain
SELECT split_part(payload->'payload'->>'review_url', '/', 3) AS domain, COUNT(*) AS failures
FROM cache_entries WHERE success = false GROUP BY domain;

Once loaded into Virtuoso, query the knowledge graph using SPARQL:
- SPARQL Endpoint: http://localhost:8890/sparql
- Faceted Browser: http://localhost:8890/fct
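Both the local endpoint and the public one can also be queried from code over the standard SPARQL 1.1 Protocol. The sketch below is an illustration, not a client shipped with the project: it sends a query as a GET request with a "query" parameter and an Accept header for SPARQL JSON results, then flattens the result bindings into plain dictionaries. The helper names build_request, rows and run_query are hypothetical; pointing ENDPOINT at https://data.climatesense-project.eu/sparql should work the same way, assuming that endpoint accepts standard protocol GET requests.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://localhost:8890/sparql"

QUERY = """PREFIX schema: <http://schema.org/>
SELECT ?claim ?author
WHERE { ?claim a schema:ClaimReview ; schema:author ?author . }
LIMIT 10"""


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Encode the query as a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = f"{endpoint}?{urllib.parse.urlencode({'query': query})}"
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(results: dict) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into {variable: value} dicts; unbound variables are omitted."""
    return [
        {var: binding["value"] for var, binding in solution.items()}
        for solution in results["results"]["bindings"]
    ]


def run_query(query: str, endpoint: str = ENDPOINT, timeout: float = 20) -> list[dict[str, str]]:
    """Send the query and return flattened rows (requires a running endpoint)."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=timeout) as response:
        return rows(json.load(response))
```

With the Docker services running, run_query(QUERY) returns up to ten rows, each mapping "claim" and "author" to their values.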
Find claim reviews with the reviewed claim and its rating:
PREFIX schema: <http://schema.org/>
SELECT ?claim ?text ?rating
WHERE {
  ?claim a schema:ClaimReview ;
         schema:claimReviewed ?text ;
         schema:reviewRating ?rating .
}
LIMIT 10

List claim reviews with the fact-checking organization that authored them:
PREFIX schema: <http://schema.org/>
SELECT ?claim ?author
WHERE {
  ?claim a schema:ClaimReview ;
         schema:author ?author .
}
LIMIT 10

Development:
just setup-dev
just format # Format code with ruff
just check # Run linting and type checks
just pre-commit-all # Run pre-commit on all files

CLI usage:
# Display help
uv run climatesense-kg --help
# Run minimal pipeline with debug logging
uv run climatesense-kg run --config config/minimal.yaml --debug
# Run daily pipeline skipping data download and forcing full RDF regeneration
uv run climatesense-kg run --config config/daily.yaml --skip-download --force-regenerate

This project builds upon the work of the CIMPLE project and reuses components from: