About me

After graduating in 2015 from ENSIMAG (École Nationale Supérieure d’Informatique et de Mathématiques Appliquées, an engineering school specializing in computer science and applied mathematics), I obtained a master’s degree in NLP at Université Paris-Sorbonne. I am now a third-year PhD candidate at Université Paris-Sorbonne, in the STIH team, under the supervision of Karën Fort and Claude Montacié.

Curriculum vitae: CV (en), CV (fr)


My research investigates whether crowdsourcing (and games with a purpose) can be used to gather linguistic resources for so-called "less-resourced" languages.

I have developed two platforms: BISAME, which enables collaborative part-of-speech annotation of a corpus in Alsatian (a regional language of France), and Recettes de Grammaire, which collects recipes, dialectal and spelling variants, and POS tags for Alsatian, Recettes de Grammaire (gsw), and for Guadeloupean Creole, Recettes de Grammaire (gcf). Feel free to share these links, create an account, and contact me if you need further information!



  • Alice Millour and Karën Fort. Unsupervised Data Augmentation for Less-Resourced Languages with no Standardized Spelling. RANLP 2019, September 02-04, Varna, Bulgaria
  • Alice Millour, Marianne Grace Araneta, Ivana Lazić Konjik, Annalisa Raffone, Yann-Alan Pilatte and Karën Fort. Katana and Grand Guru: a Game of the Lost Words. 9th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, May 17-19, 2019, Poznań, Poland. 02106757
  • Alice Millour. Getting to Know the Speakers: a Survey of a Non-Standardized Language Digital Use. 9th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, May 17-19, 2019, Poznań, Poland.


  • Alice Millour and Karën Fort. À l’écoute des locuteurs : production participative de ressources langagières pour des langues non standardisées. Revue TAL, special issue on natural language processing for less-resourced languages, 59-3.
  • Alice Millour, Karën Fort, Krik: First Steps into Crowdsourcing POS tags for Kréyòl Gwadloupéyen, Proc. of the LREC workshop CCURL 2018, Miyazaki, Japan, May 2018. Pdf

  • Alice Millour, Karën Fort, Toward a Lightweight Solution for Less-resourced Languages: Creating a POS Tagger for Alsatian Using Voluntary Crowdsourcing, Language Resources and Evaluation Conference (LREC), Miyazaki, Japan, May 2018. Pdf


  • Alice Millour, Karën Fort, Delphine Bernhard and Lucie Steiblé. Vers une solution légère de production de données pour le TAL : création d’un tagger de l’alsacien par crowdsourcing bénévole. Proceedings of Traitement Automatique des Langues Naturelles (TALN), Orléans, France, June 2017. Oral presentation. Pdf

  • Alice Millour and Karën Fort. Why do we Need Games? Analysis of the Participation on a Crowdsourcing Annotation Platform. Symposium Games4NLP, Valencia, Spain, April 2017. Pdf

  • April 26: Invited seminar “Recherches linguistiques et corpus”, organized by Franck Neveu at Université Paris-Sorbonne: “Construire un corpus annoté pour une langue peu dotée : annotation collaborative en partie du discours d’un corpus de l’alsacien” (Building an annotated corpus for a less-resourced language: collaborative part-of-speech annotation of an Alsatian corpus).


  • My master’s thesis is available here: Master's Thesis (French only).


  • Co-organizer with Karën Fort of the first meeting of the Working Group on Variations of GDR Lift
  • Subreviewer for ACL 2019
  • Reviewer for the 6th Biennial Workshop on Less-Resourced Languages, co-located with LTC 2019
  • Reviewer for RECITAL 2019
  • Invited seminar “Cognition & Langage”, at IDMC (Institut des Sciences du Digital - Management & Cognition, Nancy), the 6th of January 2019, organized by Maxime Amblard and Manuel Rebuschi
  • Member of Task 1 (Quest game) during the Crowdfest organized by enetCollect in Brussels (22nd to 25th of January 2019). We developed the prototype of a game that fosters transgenerational language transmission while enabling the collection of linguistic resources.
  • Subreviewer for NAACL 2019
  • Subreviewer for the Web Conference 2019
  • Subreviewer for Information Processing and Management 2019
  • Subreviewer for TALN 2017
  • Member of the organizing committee of the DiLiTAL workshop, Diversité Linguistique et TAL (Linguistic Diversity and NLP), held during TALN 2017.

  • Student volunteer at TALN 2016.


Alice Millour
STIH Université Paris-Sorbonne
Maison de la recherche
28, rue Serpente
75006 Paris

Email: aliceI.millourdon't@wantspam! Sopleaseparis-sorleave bonneme.alonef!r