Pre-training/fine-tuning Seq2Seq model for spelling and/or grammar correction in French

For this project, one can use a randomly initialized or a pre-trained BART/T5 model.

Model

Pre-trained BART and T5 models can be found on the model hub.

Datasets

The dataset for this model can be prepared as described in this blog post.
One can make use of the OSCAR corpus. The dataset is also available through the datasets library: oscar · Datasets at Hugging Face.
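Since OSCAR only contains clean text, training pairs have to be synthesized. A minimal sketch of the (noisy → clean) pair format, using a tiny in-memory sample in place of OSCAR and accent stripping as one cheap, French-specific noise source (dropped diacritics are a common spelling error); the field names `input`/`target` are just illustrative:

```python
import unicodedata

def strip_accents(text: str) -> str:
    # Remove diacritics: "é" -> "e". Dropped accents are a common
    # French spelling error, so this yields cheap (noisy, clean) pairs.
    return "".join(
        c for c in unicodedata.normalize("NFD", text)
        if unicodedata.category(c) != "Mn"
    )

# Tiny in-memory stand-in for sentences streamed from OSCAR.
clean_sentences = [
    "Le château est très élégant.",
    "Elle a été reçue à l'université.",
]

# Seq2Seq pairs: the model learns to map noisy input back to clean text.
pairs = [{"input": strip_accents(s), "target": s} for s in clean_sentences]
print(pairs[0]["input"])  # Le chateau est tres elegant.
```

In the real project the same mapping would be applied over the streamed OSCAR French split, combined with richer noising.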

Available training scripts

As this will be a Seq2Seq model, the run_summarization_flax.py script can be used for training.
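A hypothetical invocation might look like the following; the flag names follow the summarization example's conventions and the model choice (a French BART such as BARThez) and hyperparameters are assumptions to adjust:

```shell
# Sketch only -- check the actual script version for the exact flag names.
python run_summarization_flax.py \
    --model_name_or_path moussaKam/barthez \
    --train_file pairs.json \
    --text_column input \
    --summary_column target \
    --output_dir ./fr-spell-corrector \
    --do_train \
    --num_train_epochs 3 \
    --per_device_train_batch_size 32 \
    --learning_rate 5e-5
```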

(Optional) Desired project outcome

The desired outcome is to train a spelling correction model for the French language. This can be showcased directly on the hub or with a Streamlit or Gradio app.

(Optional) Challenges

Implementing the dataset noising function would be the challenging part of the project.
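A minimal sketch of what such a noising function could look like, assuming character-level corruption (deletions, insertions, substitutions, adjacent swaps); the alphabet, probabilities, and operation mix are guesses to tune, and a real version would likely add word-level and accent-specific noise:

```python
import random

# Hypothetical alphabet for substitutions/insertions, including French accents.
FRENCH_CHARS = "abcdefghijklmnopqrstuvwxyzàâçéèêëîïôùûü"

def noise(text: str, prob: float = 0.1, seed: int = 0) -> str:
    """Corrupt text with random deletions, insertions, substitutions,
    and adjacent swaps, producing synthetic spelling errors."""
    rng = random.Random(seed)  # seeded for reproducible corpora
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isalpha() and rng.random() < prob:
            op = rng.choice(["delete", "insert", "substitute", "swap"])
            if op == "delete":
                pass  # drop the character
            elif op == "insert":
                out.append(c)
                out.append(rng.choice(FRENCH_CHARS))
            elif op == "substitute":
                out.append(rng.choice(FRENCH_CHARS))
            else:  # swap with the next character when possible
                if i + 1 < len(chars):
                    out.append(chars[i + 1])
                    out.append(c)
                    i += 1
                else:
                    out.append(c)
        else:
            out.append(c)
        i += 1
    return "".join(out)
```

With `prob=0` the text passes through unchanged, which is handy for mixing clean examples into the training data.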

(Optional) Links to read upon

https://www.microsoft.com/en-us/research/blog/speller100-zero-shot-spelling-correction-at-scale-for-100-plus-languages/

Hi @valhalla, I have previously worked on grammar correction for English using T5, and it gives great results. It would be an exciting task to do the same for French; I would like to be part of this project.

This one sounds interesting…

Awesome! Let’s define this project then :slight_smile:

Added you to the team @khalidsaifullaah and @Vaibhavbrkn. Let me know if you have any comments, either here or in the sheet.

Thanks @valhalla, but since I have new commitments to other projects, I can no longer be a part of this project.

Noted, removed you from the team.

interested in this project :heart_eyes: