Sharif-SLPL/tComp

Token-Level Attribution and Alignment Analysis for Machine Translation

This repository explores token-level contribution (attribution) analysis and word alignment quality for transformer-based machine translation models. It combines fairseq-based translation pipelines with novel interpretability tooling (tComp) and alignment evaluation (AER) to study how input and previously generated tokens contribute to each new token prediction. The repo contains reproducible notebooks, alignment utilities, configuration dataclasses for token-compositional analyses, and a local editable installation of fairseq for controlled experimentation.

Repository structure and contents

  • notebooks/: Reproducible experiments and demos
    • AER.ipynb: Alignment Error Rate (AER) evaluation workflow
    • changed_fairseq_usage_tcomp.ipynb: Usage examples of our modified fairseq library, which includes the tComp interpretability method, for translation and token-level analysis
  • alignment/: Alignment utilities
    • align.py: Alignment routines
    • aer.py: Alignment Error Rate computation utilities
  • tcomp_utils.py: Dataclasses for token-compositional analysis configuration and outputs (encoder/decoder fields)
  • fairseq-main/: Local editable copy of fairseq (installed in editable mode); contains upstream code, examples, and CLI tools
  • requirements.txt: Python package versions used in this project
  • set-up.sh: Convenience script to fetch and install fairseq in editable mode with required tokenization tools

Data and file description

  • Model files: A URL is provided for the pretrained WMT19 de-en model in notebooks/changed_fairseq_usage_tcomp.ipynb for local experimentation.
  • Input data: We provide paths/links to source/target texts used in experiments.
  • Derived data / results: Alignment outputs, attribution tensors, and metrics (e.g., AER) can be produced by the notebooks and scripts.

Installation

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
bash set-up.sh  # installs fairseq in editable mode and tokenization utilities
```

Notes

  • set-up.sh downloads upstream fairseq (main branch) and installs it with `pip install -e .`. Restart your Python kernel after installation if you are using notebooks.
  • CUDA builds in requirements.txt (e.g., torch==2.3.1+cu118) may require a matching CUDA toolkit/driver.
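As a quick post-installation sanity check, a small snippet like the following (a generic sketch, not part of this repo) can confirm which torch build is installed and whether it actually sees a CUDA device:

```python
def cuda_report():
    """Return a one-line description of the installed torch/CUDA setup."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        # CUDA-enabled build with a visible GPU
        return f"torch {torch.__version__}, CUDA {torch.version.cuda}"
    # Either a CPU-only build or no usable driver/device
    return f"torch {torch.__version__}, CPU only"

print(cuda_report())
```

If this reports "CPU only" despite a `+cu118` wheel being installed, the driver/toolkit mismatch mentioned above is the usual culprit.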

Usage and reproducibility

  • Notebooks:
    • Use notebooks/changed_fairseq_usage_*.ipynb to run our tComp interpretability method on a de-en transformer-based machine translation model.
    • Use notebooks/AER.ipynb to compute AER on alignment outputs.
  • Alignment utilities:
    • alignment/align.py: Produce alignments from parallel data or model outputs.
    • alignment/aer.py: Compute AER given gold and predicted alignments.
  • Token-level analysis:
    • Configure the tComp method via the tcomp_utils.tcompConfig dataclass (e.g., include biases, FFN approximation types, layer outputs).
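For reference, the standard AER metric (Och & Ney, 2000) that alignment/aer.py computes can be sketched as follows. This is a generic illustration over sure gold links S, possible gold links P, and predicted links A, not the repo's own implementation:

```python
def aer(sure, possible, predicted):
    """Alignment Error Rate: 1 - (|A∩S| + |A∩P|) / (|A| + |S|).

    sure/possible/predicted: iterables of (src_idx, tgt_idx) link pairs.
    Sure links are treated as a subset of possible links.
    """
    a, s = set(predicted), set(sure)
    p = set(possible) | s  # every sure link is also possible
    if not a and not s:
        return 0.0
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))

sure = {(0, 0), (1, 1)}
possible = {(0, 0), (1, 1), (2, 1)}

# Predicting exactly the sure links gives a perfect score
print(aer(sure, possible, sure))                     # → 0.0
# A prediction with one sure hit, one possible hit, one miss
print(aer(sure, possible, {(0, 0), (2, 1), (3, 2)})) # → 0.4
```

Lower is better: predictions are rewarded for covering sure links and only lightly penalized for extra links that are at least possible.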

How to cite

Example BibTeX stub:

@misc{thesis_attribution_alignment,
  author  = {Amirzadeh, Hamidreza},
  title   = {A Novel Token-Level Attribution and Alignment Analysis for Machine Translation},
  year    = {2025},
  howpublished = {Git repository},
  url     = {https://github.com/hamid-amir/tComp},
}

License

MIT

Contact

For questions or issues, please open an issue on the repository.
