
FactualBench

Overview

FactualBench is a large-scale Chinese factual QA dataset introduced in the EMNLP 2025 Findings paper "Exploring the Generalizability of Factual Hallucination Mitigation via Enhancing Precise Knowledge Utilization". The dataset contains 180,504 samples spanning 21 domains and is designed to evaluate and mitigate factual hallucinations in large language models through precise knowledge utilization.

Compared to earlier versions, we further refined the test split by:

  • Deduplicating questions that overlap with the training set
  • Removing low-quality or ambiguous samples
  • Applying time constraints to time-sensitive questions
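The exact deduplication procedure is not described here. One simple way to check a local copy for residual train/test overlap is exact matching on normalized question text. This is a sketch under assumptions, not the procedure used to build the v2 split: the normalization scheme (Unicode NFKC, then keeping only letters and digits) and the JSON key `question` are assumptions, and paraphrased duplicates will not be caught.

```python
import json
import unicodedata


def normalize(text: str) -> str:
    """Normalize a question for exact-match comparison (assumed scheme).

    NFKC folds full-width forms (e.g. "？" -> "?"); keeping only
    alphanumeric characters then drops whitespace and punctuation.
    CJK characters count as alphanumeric, so Chinese text is preserved.
    """
    text = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in text if ch.isalnum()).lower()


def load_questions(path: str, key: str = "question") -> list[str]:
    """Read one JSON object per line; `key` is an assumed field name."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line)[key] for line in f if line.strip()]


def find_overlap(train_qs: list[str], test_qs: list[str]) -> list[str]:
    """Return test questions whose normalized form also occurs in train."""
    train_keys = {normalize(q) for q in train_qs}
    return [q for q in test_qs if normalize(q) in train_keys]
```

Usage: `find_overlap(load_questions("FactualBench_train_v2.jsonl"), load_questions("FactualBench_test_v2.jsonl"))` returns the test questions that still have a normalized exact match in the training split. An empty list means no exact duplicates under this particular normalization.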

Repository Structure

.
├── FactualBench_train_v2.jsonl   # Training split
├── FactualBench_test_v2.jsonl    # Test split
├── evaluation_prompt.txt         # Prompt for model-based evaluation
└── evaluate.py                   # Script to compute accuracy

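We adopt a model-based judgment strategy for evaluation: a judge model grades responses using the prompt in evaluation_prompt.txt, and evaluate.py computes the resulting accuracy. In our paper, gpt-4-0125 is used as the automatic evaluator.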
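The accuracy step of a model-based evaluation can be illustrated as follows. This is a hedged sketch, not the logic of evaluate.py. `judge` is a hypothetical callable that stands in for the evaluator model (gpt-4-0125 in the paper) and returns True when a response is judged correct. The sample keys `question`, `standard_answer` and `domain` are assumptions. Per-domain accuracy is included here only as an illustrative extra.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical judge: (question, standard_answer, response) -> True if judged correct.
Judge = Callable[[str, str, str], bool]


def accuracy(
    samples: list[dict], responses: list[str], judge: Judge
) -> tuple[float, dict[str, float]]:
    """Return (overall accuracy, per-domain accuracy).

    `samples[k]` and `responses[k]` must refer to the same question.
    Sample keys "question", "standard_answer" and "domain" are assumptions.
    """
    if len(samples) != len(responses):
        raise ValueError("samples and responses must have the same length")
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for sample, response in zip(samples, responses):
        is_correct = judge(sample["question"], sample["standard_answer"], response)
        total[sample["domain"]] += 1
        correct[sample["domain"]] += int(is_correct)
    n = len(samples)
    overall = sum(correct.values()) / n if n else 0.0
    per_domain = {d: correct[d] / total[d] for d in total}
    return overall, per_domain
```

In practice, `judge` would wrap a call to the evaluator model built from evaluation_prompt.txt and parse its verdict. That wrapper is omitted because the prompt's placeholders and expected output format are not described here.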

Dataset Composition

| Domain | Chinese name | Test | Training | Total |
|---|---|---:|---:|---:|
| film & entertainment | 影视娱乐 | 191 | 54,433 | 54,624 |
| education & training | 教育培训 | 147 | 3,702 | 3,849 |
| physics, chemistry & mathematics & biology | 数理化生 | 178 | 9,171 | 9,349 |
| history & traditional culture | 历史国学 | 186 | 18,086 | 18,272 |
| biography | 人物百科 | 190 | 11,829 | 12,019 |
| politics & law | 政治法律 | 155 | 6,354 | 6,509 |
| economics & management | 经济管理 | 141 | 4,537 | 4,678 |
| computer science | 计算机科学 | 146 | 6,247 | 6,393 |
| medical | 医学 | 128 | 7,057 | 7,185 |
| sociology & humanity | 社会人文 | 187 | 8,494 | 8,681 |
| agriculture, forestry & fisheries & allied industries | 农林牧渔 | 138 | 3,725 | 3,863 |
| astronomy & geography | 天文地理 | 151 | 3,887 | 4,038 |
| sports & tourism | 运动旅游 | 143 | 4,867 | 5,010 |
| digital & automotive | 数码汽车 | 159 | 3,881 | 4,040 |
| industrial engineering | 工业工程 | 149 | 3,279 | 3,428 |
| military & war | 军武战争 | 142 | 2,568 | 2,710 |
| slang & memes | 网词网梗 | 104 | 529 | 633 |
| work & life | 工作生活 | 131 | 5,849 | 5,980 |
| high technology | 高新科技 | 112 | 310 | 422 |
| religion & culture | 信仰文化 | 122 | 508 | 630 |
| others | 其他 | - | 18,191 | 18,191 |
| Total | - | 3,000 | 177,504 | 180,504 |
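To sanity-check a local copy against the table above, one can count samples per domain in each split. This is a minimal sketch. It assumes each JSONL line carries the domain label under a key named `domain`, and that key name is an assumption.

```python
import json
from collections import Counter
from typing import Iterable


def count_domains(lines: Iterable[str], key: str = "domain") -> Counter:
    """Count samples per domain label from JSONL lines (key name assumed)."""
    counts: Counter = Counter()
    for line in lines:
        if line.strip():  # skip blank lines
            counts[json.loads(line)[key]] += 1
    return counts


def count_file(path: str, key: str = "domain") -> Counter:
    """Count samples per domain in one split file, e.g. FactualBench_test_v2.jsonl."""
    with open(path, encoding="utf-8") as f:
        return count_domains(f, key)
```

If the files match the table, `sum(count_file("FactualBench_test_v2.jsonl").values())` should be 3,000 (with 112 samples under 高新科技), and the training split should total 177,504.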

Data Format

Each sample in FactualBench consists of:

  • a question $Q_i$ (question),
  • a standard answer $X_i^0$ (standard answer),
  • three wrong answers $\{X_i^j\}_{j=1}^{3}$ (wrong answers), and
  • the domain $D_i$ it belongs to (domain).
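In code, one sample maps naturally onto a small typed record. The sketch below assumes JSON keys `question`, `standard_answer`, `wrong_answers` (a list of three strings) and `domain`, loosely mirroring the field names in parentheses above. These keys are assumptions, so they are kept in one editable `KEYS` mapping. Check them against the actual JSONL files before relying on this.

```python
import json
from dataclasses import dataclass

# Assumed JSON key names; adjust them to match the actual JSONL files.
KEYS = {
    "question": "question",
    "standard_answer": "standard_answer",
    "wrong_answers": "wrong_answers",
    "domain": "domain",
}


@dataclass(frozen=True)
class FactualSample:
    question: str                   # Q_i
    standard_answer: str            # X_i^0
    wrong_answers: tuple[str, ...]  # X_i^1, X_i^2, X_i^3
    domain: str                     # D_i (label such as 高新科技)

    @classmethod
    def from_json(cls, line: str) -> "FactualSample":
        """Parse one JSONL line into a sample, checking for three wrong answers."""
        obj = json.loads(line)
        wrong = tuple(obj[KEYS["wrong_answers"]])
        if len(wrong) != 3:
            raise ValueError(f"expected 3 wrong answers, got {len(wrong)}")
        return cls(
            question=obj[KEYS["question"]],
            standard_answer=obj[KEYS["standard_answer"]],
            wrong_answers=wrong,
            domain=obj[KEYS["domain"]],
        )
```

The record is frozen so that samples can be hashed or deduplicated safely. The explicit length check surfaces malformed lines early instead of failing later in evaluation.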

An Example

| Field | Content | English translation |
|---|---|---|
| Question $Q_i$ | 第一台微波量子放大器是在哪一年制成的? | In which year was the first microwave quantum amplifier made? |
| Standard answer $X_i^0$ | 第一台微波量子放大器是在1954年制成的。 | The first microwave quantum amplifier was made in 1954. |
| Wrong answer $X_i^1$ | 第一台微波量子放大器是在1958年制成的。 | The first microwave quantum amplifier was made in 1958. |
| Wrong answer $X_i^2$ | 第一台微波量子放大器是在1960年制成的。 | The first microwave quantum amplifier was made in 1960. |
| Wrong answer $X_i^3$ | 第一台微波量子放大器是在1962年制成的。 | The first microwave quantum amplifier was made in 1962. |
| Domain $D_i$ | 高新科技 | high technology |

Notice

  • The dataset is constructed from a publicly available Internet encyclopedia (Baidu Baike).
  • It may contain references to individuals, locations, or medical and physiological concepts that are publicly known.
  • The data is collected strictly for research purposes and without any intent to violate privacy or safety policies.
  • ⚠️ Despite quality control efforts, the dataset may still contain inaccuracies or outdated facts (knowledge cutoff: 2025). FactualBench should not be treated as an authoritative knowledge base.

Citation

If you find this dataset useful, please cite:

@inproceedings{zhang-etal-2025-exploring-generalizability,
    title = "Exploring the Generalizability of Factual Hallucination Mitigation via Enhancing Precise Knowledge Utilization",
    author = "Zhang, Siyuan  and
      Zhang, Yichi  and
      Dong, Yinpeng  and
      Su, Hang",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-emnlp.211/",
    doi = "10.18653/v1/2025.findings-emnlp.211",
    pages = "3936--3968",
    ISBN = "979-8-89176-335-7"
}
