GE2PE: Context-based Persian Grapheme-to-Phoneme Conversion Using Sequence-to-Sequence Models

Many Text-to-Speech (TTS) systems, particularly in low-resource settings, struggle to produce natural and intelligible speech from grapheme sequences. One remedy is Grapheme-to-Phoneme (G2P) conversion, which adds information to the input sequence and improves the TTS output. However, current G2P systems are not accurate or efficient enough for Persian text, owing to the language’s complexity and the absence of short vowels in Persian grapheme sequences. In this study, we aimed to improve resources for the Persian language. To that end, we introduced two new G2P training datasets, one manually labeled and the other machine-generated, containing over five million sentences and their corresponding phoneme sequences. We also proposed two new evaluation datasets for Persian sub-tasks such as Kasre-Ezafe detection, homograph disambiguation, and handling out-of-vocabulary words. Finally, we developed a new sentence-level end-to-end model to address the challenges of the Persian language. The model was trained with a two-step method, introduced in this thesis, that maximizes the impact of the manually labeled data. On the data and metrics proposed in this work, our model outperformed the state of the art by 0.04% in phoneme error rate (PER), 1.86% in word error rate (WER), 4.03% in Kasre-Ezafe recall, and 3.42% in homograph disambiguation accuracy.
Keywords: Grapheme-to-Phoneme Conversion, End-to-End Model, Semi-supervised Learning, Transformer

Data

Training Data
We use two datasets for training:

FarsDat_aligned, available in the data directory of this project.
Machine_generated, available using the link provided in the data directory.

Evaluation Data
We use two datasets for evaluation:

Kasre_test, available in the data directory of this project.
Homograph_test, also available in the data directory of this project.
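
The abstract reports results in PER and WER. As a rough illustration of how such error rates are commonly computed (edit distance divided by reference length), here is a minimal sketch. It is not taken from this repository: the function names, the choice to treat every non-space character as one phoneme symbol, and the example strings are all assumptions, and the exact metric definitions used in this work may differ.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> float:
    """Total edits divided by total reference length over a corpus."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_len = sum(len(r) for r in refs)
    return total_edits / total_len if total_len else 0.0


def phoneme_error_rate(refs: List[str], hyps: List[str]) -> float:
    # Assumption: each non-space character is one phoneme symbol.
    return error_rate([list(s.replace(" ", "")) for s in refs],
                      [list(s.replace(" ", "")) for s in hyps])


def word_error_rate(refs: List[str], hyps: List[str]) -> float:
    # Words are whitespace-separated phoneme strings.
    return error_rate([s.split() for s in refs], [s.split() for s in hyps])


# Illustrative strings only; they are not drawn from Kasre_test or Homograph_test.
reference = ["ketAbe man"]
hypothesis = ["ketAb man"]   # the final "e" of the first word is missing
per = phoneme_error_rate(reference, hypothesis)
wer = word_error_rate(reference, hypothesis)
```

In this example a single missing symbol changes one of nine reference phonemes but one of two reference words, which is why WER is typically much larger than PER for the same output.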

Training

To pre-train and fine-tune the model, use the training notebook (training.ipynb).
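
The two-step method can be pictured as a pre-training stage followed by a fine-tuning stage on the same model. The sketch below only illustrates that ordering and is not the code in the training notebook. The `Stage` class, `run_schedule`, the `train_fn` callback, and all hyperparameter values are hypothetical. The assignment of Machine_generated to pre-training and FarsDat_aligned to fine-tuning is also an assumption, based on the stated goal of maximizing the impact of the manually labeled data.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Stage:
    """One training stage (hypothetical container, not from the repository)."""
    name: str
    dataset: str
    epochs: int
    learning_rate: float


def run_schedule(model: Any,
                 stages: List[Stage],
                 train_fn: Callable[[Any, Stage], Any]) -> Any:
    """Run the stages in order, passing the model from one stage to the next."""
    for stage in stages:
        model = train_fn(model, stage)
    return model


# Assumed mapping of datasets to stages; hyperparameters are placeholders.
STAGES = [
    Stage("pre-train", "Machine_generated", epochs=1, learning_rate=1e-4),
    Stage("fine-tune", "FarsDat_aligned", epochs=3, learning_rate=1e-5),
]


def dummy_train(model: List[str], stage: Stage) -> List[str]:
    # Stand-in for a real training loop: it only records which stage ran.
    return model + [f"{stage.name}:{stage.dataset}"]


history = run_schedule([], STAGES, dummy_train)
```

In a real run, `train_fn` would wrap the actual training loop, and the fine-tuning stage would start from the weights produced by the pre-training stage.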

Testing

The GE2PE.py file contains the final module introduced in this thesis. Use the G2P_base notebook (G2P_base.ipynb) to download all the requirements and generate output with the final module.

Models

The final pre-trained and fine-tuned versions of GE2PE are accessible through the links provided in the Models.txt file.

