Funded by: FCT (Carnegie Mellon | Portugal Research Projects Program) [CMU-PT/HuMach/0053/2008]
Start date: 01 April 2009
Duration: 36 months


The main purpose of this project is to build a Portuguese version of REAP (“REAding Practice”), a tutoring system developed at LTI to support the teaching of a language to either native or foreign speakers. Students practice reading, with the focus on learning vocabulary in context.


  • INESC-ID – Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa, Portugal
  • Carnegie Mellon University (CMU)
  • Universidade do Algarve (UAlg)
  • Fundação da Universidade de Lisboa (FUL/UL)

Main Researchers

Nuno Mamede            | Jorge Baptista         | Maria do Céu Viana | Maxine Eskenazi
Isabel Trancoso        | Maria de Lurdes Cabral | Palmira Marrafa    | Adam Skory
David Martins de Matos | Joaquim Guerra         | Ana Isabel Mata    | Gabriel Parent
Thomas Pellegrini      | Neuza Baptista         |                    |
Luís Marujo            |                        |                    |
Rui Correia            |                        |                    |
Ana Cristina Mendes    |                        |                    |
Ricardo Ribeiro        |                        |                    |
Rui Amaral             |                        |                    |
José David Lopes       |                        |                    |


The development of REAP.PT started very recently within the framework of the CMU-Portugal program. A rudimentary version is already in place, resulting from the close cooperation of the LTI and L2F teams. The current proposal involves two major phases. In the first, the project teams will build a baseline version for European Portuguese, integrating all the linguistic tools and resources for that language, so as to obtain a system equivalent to the current English one, with the adaptations needed for this typologically different language. In the second phase, the project teams will concentrate on the very wide range of open research challenges raised by the possibility of learning from current texts on topics of the student's interest. In most of the 9 tasks of this project, two significant milestones can therefore be identified, corresponding to the development of the baseline and extended versions.

The project is structured into 9 main tasks: corpus collection (T1), readability (T2), interface (T3), definitions (T4), search (T5), question generation (T6), student modeling (T7), and oral comprehension (T8). The last task (T9) will report field trials.

Task 1 (Corpus Collection) will be responsible for crawling the web for Portuguese pages and filtering them, keeping only those that satisfy all the requirements. Task 2 (Readability) will assign a reading difficulty value to each document collected in the previous task. Task 3 (Interface) will adapt the REAP interface to Portuguese, and Task 4 (Definitions) will integrate a Portuguese dictionary into the system. Task 5 (Search) is responsible for retrieving texts that satisfy particular pedagogical constraints, such as reading level and text length. Task 6 (Question Generation) will automatically generate multiple-choice fill-in-the-blank questions, which will be used as practice exercises for focus words. Following the practice session, the system updates the student model, defined in Task 7, based on the student's performance. Task 8 will focus on oral comprehension practice, and Task 9 will evaluate the different versions of the system.
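As a rough illustration of Tasks 5 and 6 described above, the following Python sketch filters documents by reading level and length, and builds a multiple-choice fill-in-the-blank question for a focus word. It is not the REAP.PT implementation: the `Document` class, the `search` and `make_cloze` functions, the integer reading level, and the caller-supplied distractors are all assumptions made for this example.

```python
import random
import re
from dataclasses import dataclass


@dataclass
class Document:
    text: str
    reading_level: int  # hypothetical difficulty value, as a readability task might assign


def search(docs, level, max_words):
    """Keep documents at the requested reading level and within the word limit."""
    return [d for d in docs
            if d.reading_level == level and len(d.text.split()) <= max_words]


def make_cloze(sentence, focus_word, distractors, rng):
    """Blank out the first occurrence of the focus word and mix it with distractors.

    Returns the question stem, the shuffled options, and the index of the
    correct option. Distractors are supplied by the caller (an assumption;
    choosing good distractors is a research problem in its own right).
    """
    pattern = re.compile(r"\b" + re.escape(focus_word) + r"\b")
    stem, count = pattern.subn("_____", sentence, count=1)
    if count == 0:
        raise ValueError("focus word not found in sentence")
    options = [focus_word] + list(distractors)
    rng.shuffle(options)
    return stem, options, options.index(focus_word)
```

For example, calling `search` with a target level and a word limit returns the candidate texts, and `make_cloze(text, "analisar", ["avaliar", "ignorar", "pintar"], random.Random(0))` would then produce a stem with the focus word blanked out, plus four shuffled options. Passing a seeded `random.Random` keeps the option order reproducible.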


José Lopes, Maxine Eskenazi, and Isabel Trancoso, Towards Choosing Better Primes for Spoken Dialog Systems, In IEEE ASRU, Hawaii, 2011 (to be published).

André Silva, Nuno Mamede, Alfredo Ferreira, Jorge Baptista, João Fernandes, Towards a Serious Game for Portuguese Learning, 2nd International Conference on Serious Games Development and Applications (SGDA 2011), Oct. 2011, Springer-Verlag.

Ricardo Portela, Nuno Mamede, Jorge Baptista, Multiword Identification, Terceiro Simpósio de Informática (INFORUM 2011), Oct. 2011, pp. 663-674, Departamento de Engenharia Informática da Universidade de Coimbra.

Thomas Pellegrini, Rui Correia, Isabel Trancoso, Jorge Baptista, Nuno Mamede, Automatic generation of listening comprehension learning material in European Portuguese, In Interspeech, pages 1629-1632, Florence, August 2011.

Rui Correia, Thomas Pellegrini, Maxine Eskenazi, Isabel Trancoso, Jorge Baptista, Nuno J. Mamede, Listening Comprehension Games for Portuguese: Exploring the Best Features, In Speech and Language Technology in Education (SLaTE), August 2011.

Ana Mendes, Luísa Coheur, An approach for answer selection in Question Answering based on semantic relations, In Twenty-Second International Joint Conference on Artificial Intelligence, AAAI Press/International Joint Conferences on Artificial Intelligence, series Proceedings of the Twenty-Second International Joint Conference, pages 1852-1857, Barcelona, July 2011.

José Lopes, Isabel Trancoso, Alberto Abad, A Nativeness Classifier for TED Talks, In International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2011.

José Lopes, Isabel Trancoso, Rui Correia, Thomas Pellegrini, Hugo Meinedo, Nuno Mamede, Maxine Eskenazi, Multimedia Learning Materials, in IEEE Spoken Language Technology Workshop, IEEE, Berkeley, USA, December 2010.

Rui Correia, Jorge Baptista, Nuno Mamede, Isabel Trancoso, Maxine Eskenazi, Automatic Generation of Cloze Question Distractors, In Second Language Studies: Acquisition, Learning, Education and Technology, SLaTE: the ISCA SIG on Speech and Language Technology in Education, Waseda University, Tokyo, Japan, September 2010.

Fernando Batista, Helena Moniz, Isabel Trancoso, Hugo Meinedo, Ana Silva, Nuno J. Mamede, Extending the punctuation module for European Portuguese, In Proc. of Interspeech 2010, Makuhari, Japan, September 2010.

Thomas Pellegrini, Isabel Trancoso, Improving ASR error detection with non-decoder based features, in Interspeech 2010, Tokyo, September 2010.

Caroline Hagège, Jorge Baptista, Nuno Mamede, Caracterização e Processamento de Expressões Temporais em Português, Linguamática, vol. 2, n. 1, pages 63-76, April 2010.

Gracinda Carvalho, David Martins de Matos, and Vítor Rocio, Improving IdSay: a characterization of strengths and weaknesses in Question Answering systems for Portuguese, In International Conference on Computational Processing of Portuguese Language (PROPOR 2010), Springer, Porto Alegre, Brazil, April 2010.

Jorge Baptista, Neuza Costa, Joaquim Guerra, Marcos Zampieri, Maria Cabral, and Nuno Mamede, P-AWL: Academic Word List for Portuguese, In 9th International Conference on Computational Processing of the Portuguese Language (Propor 2010), Springer, vol. 6001, pages 120-123, Porto Alegre, Brazil, April 2010.

Jorge Baptista, Nuno Mamede, and Fernando Gomes, Auxiliary verbs and verbal chains in European Portuguese, In International Conference on Computational Processing of Portuguese Language (PROPOR 2010), Springer, vol. 6001, pages 110-119, Porto Alegre, Brazil, April 2010.

Thomas Pellegrini, Isabel Trancoso, Error detection in automatic transcriptions using Hidden Markov Models, In Language & Technology Conference, Poland, November 2009.

Luís Marujo, José Lopes, Nuno Mamede, Isabel Trancoso, Juan Pino, Maxine Eskenazi, Jorge Baptista, and Céu Viana, Porting REAP to European Portuguese. Proceedings of the SLaTE Workshop on Speech and Language Technology in Education. September 2009. Demo presented by José David Lopes at SLaTE.


