ToMMeR is a lightweight probing model that extracts emergent mention detection capabilities from the early-layer representations of any LLM backbone, achieving high zero-shot recall across a wide set of 13 NER benchmarks.
Identifying which text spans refer to entities (mention detection) is both foundational for information extraction and a known performance bottleneck. We introduce ToMMeR, a lightweight model (<300K parameters) that probes mention detection capabilities from early LLM layers. Across 13 NER benchmarks, ToMMeR achieves 93% recall zero-shot, with over 90% precision when judged by an LLM, showing that it rarely produces spurious predictions despite its high recall. Cross-model analysis reveals that diverse architectures (14M-15B parameters) converge on similar mention boundaries (DICE >75%), confirming that mention detection emerges naturally from language modeling. When extended with span classification heads, ToMMeR achieves near-SOTA NER performance (80-87% F1 on standard benchmarks). Our work provides evidence that structured entity representations exist in early transformer layers and can be efficiently recovered with minimal parameters.
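To make the "lightweight probe" idea concrete, here is a schematic sketch (not ToMMeR's actual architecture; all names and shapes are illustrative) of how a low-rank bilinear head over a single layer of hidden states can score every (start, end) token pair while staying under 300K parameters:

```python
import torch
import torch.nn as nn

class SpanProbe(nn.Module):
    """Toy low-rank bilinear probe: scores every (start, end) token pair
    from one layer of hidden states. Illustrative only, not ToMMeR itself."""
    def __init__(self, d_model: int, rank: int = 64):
        super().__init__()
        self.start_proj = nn.Linear(d_model, rank, bias=False)
        self.end_proj = nn.Linear(d_model, rank, bias=False)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d_model)
        s = self.start_proj(hidden)                   # (batch, seq, rank)
        e = self.end_proj(hidden)                     # (batch, seq, rank)
        return torch.sigmoid(s @ e.transpose(1, 2))   # (batch, seq, seq)

probe = SpanProbe(d_model=2048, rank=64)
print(sum(p.numel() for p in probe.parameters()))  # 262144 parameters (< 300K)
hidden = torch.randn(1, 6, 2048)                   # fake early-layer activations
print(probe(hidden).shape)                          # torch.Size([1, 6, 6])
```

With a 2048-dimensional backbone and rank 64, such a head has 2 x 2048 x 64 = 262,144 parameters, which is why probes in this regime stay well below the 300K budget quoted above.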
```shell
uv pip install -e git+https://github.com/VictorMorand/llm2ner.git
```

We suggest using uv, a super fast package manager.
The following commands will clone the repo and install it within a new ready-to-use .venv with all dependencies in a few minutes.
```shell
git clone https://github.com/VictorMorand/llm2ner.git
cd llm2ner
uv sync
```

- All trained models are available on Hugging Face: 🤗 https://huggingface.co/llm2ner/models
- See the demo notebook `ToMMeR_Demo.ipynb`
```python
import llm2ner
from llm2ner import ToMMeR

tommer = ToMMeR.from_pretrained("llm2ner/ToMMeR-Llama-3.2-1B_L3_R64")

# load backbone LLM, optionally cutting the unused layers to save GPU memory
llm = llm2ner.utils.load_llm(tommer.llm_name, cut_to_layer=tommer.layer)
tommer.to(llm.device)
```
#### Raw Inference
```python
text = ["Large language models are awesome"]
print(f"Input text: {text[0]}")

# tokenize to shape (1, seq_len)
tokens = llm.tokenizer(text, return_tensors="pt")["input_ids"].to(llm.device)

# output raw span scores
output = tommer.forward(tokens, llm)  # (batch_size, seq_len, seq_len)
print(f"Raw Output shape: {output.shape}")

# use the given decoding strategy to infer entities
entities = tommer.infer_entities(tokens=tokens, model=llm, threshold=0.5, decoding_strategy="greedy")
str_entities = [llm.tokenizer.decode(tokens[0, b:e + 1]) for b, e in entities[0]]
print(f"Predicted entities: {str_entities}")
```
```
>>> Input text: Large language models are awesome
>>> Raw Output shape: torch.Size([1, 6, 6])
>>> Predicted entities: ['Large language models']
```

We also provide plotting options, outputting HTML for fancy notebook / web app display.
```python
import llm2ner
from llm2ner import ToMMeR

tommer = ToMMeR.from_pretrained("llm2ner/ToMMeR-Llama-3.2-1B_L3_R64")

# load backbone LLM, optionally cutting the unused layers to save GPU memory
llm = llm2ner.utils.load_llm(tommer.llm_name, cut_to_layer=tommer.layer)
tommer.to(llm.device)

text = "Large language models are awesome. While trained on language modeling, they exhibit emergent zero-shot abilities that make them suitable for a wide range of tasks, including Named Entity Recognition (NER)."

# fancy interactive output
outputs = llm2ner.plotting.demo_inference(
    text, tommer, llm,
    decoding_strategy="threshold",  # or "greedy" for flat segmentation
    threshold=0.5,  # default 50%
    show_attn=True,
)
```

Experimaestro is used to launch and monitor experiments. You can run an experiment training a ToMMeR model on the specified dataset with the following command:
```shell
uv run experimaestro run-experiment experiments/trainTokenMatching
```

We depend on several key packages:
- `experimaestro-python` for experiment management.
- `transformer-lens`, used for wrapping LLMs in a generic `HookedTransformer` class with a unified nomenclature for placing hooks. It is built upon the Hugging Face `transformers` library.
If you find this work useful, please cite the associated paper:
```bibtex
@misc{morand2025tommerefficiententity,
  title={ToMMeR -- Efficient Entity Mention Detection from Large Language Models},
  author={Victor Morand and Nadi Tomeh and Josiane Mothe and Benjamin Piwowarski},
  year={2025},
  eprint={2510.19410},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2510.19410},
}
```