Project : calligramme
Section: Software
Linguistic Resource Development
To run Leopar on real lexicons, we needed to develop lexical resources. The general architecture is as follows:
Lexical resources are described in two different databases: one for morphological information and the other for syntactic information. The two databases are compiled into a morpho-syntactic lexicon that combines the two kinds of information. In this compiled lexicon, feature structures represent the morpho-syntactic features associated with each inflected form.
From a metagrammar, through XMG (see 5.2), we generate the anonymous tree descriptions that can occur in the target language (French); each tree description comes with a feature structure (called its interface) that describes how the tree should be anchored in the lexicon database.
Finally, we use feature structure unification to combine the grammatical and morpho-syntactic databases. When unification between the feature structure of a word (given by the morpho-syntactic database) and the interface of a tree description succeeds, the word anchors the corresponding tree description, which is then fully instantiated.
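The pipeline described above can be illustrated with a minimal Python sketch. It is not the Leopar implementation: it assumes flat feature structures whose values are disjunctions of atoms, and every name in it (`fs`, `unify`, `compile_lexicon`, `anchor`, the feature names and the toy entries for "mange") is hypothetical and chosen only for illustration. The actual lexicon formats and feature inventories may differ.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Assumption: a flat feature structure maps each feature name to the set of
# atomic values it still allows. Real Leopar structures may be richer.
FeatureStructure = Dict[str, FrozenSet[str]]


def fs(**features: str) -> FeatureStructure:
    """Build a feature structure; 'a|b' denotes a disjunction of atomic values."""
    return {name: frozenset(value.split("|")) for name, value in features.items()}


def unify(left: FeatureStructure, right: FeatureStructure) -> Optional[FeatureStructure]:
    """Return the unification of two flat feature structures, or None on a clash."""
    result = dict(left)
    for name, values in right.items():
        if name in result:
            common = result[name] & values
            if not common:  # no shared value: unification fails
                return None
            result[name] = common
        else:
            result[name] = values
    return result


def compile_lexicon(
    morphology: Dict[str, List[Tuple[str, FeatureStructure]]],
    syntax: Dict[str, List[FeatureStructure]],
) -> Dict[str, List[FeatureStructure]]:
    """Combine per-form morphological entries with per-lemma syntactic entries."""
    lexicon: Dict[str, List[FeatureStructure]] = {}
    for form, analyses in morphology.items():
        for lemma, morph_fs in analyses:
            for synt_fs in syntax.get(lemma, []):
                combined = unify(morph_fs, synt_fs)
                if combined is not None:
                    lexicon.setdefault(form, []).append(combined)
    return lexicon


def anchor(
    form: str,
    lexicon: Dict[str, List[FeatureStructure]],
    tree_descriptions: Dict[str, FeatureStructure],
) -> List[str]:
    """Names of the tree descriptions whose interface unifies with an entry of `form`."""
    anchored: List[str] = []
    for entry in lexicon.get(form, []):
        for name, interface in tree_descriptions.items():
            if unify(entry, interface) is not None and name not in anchored:
                anchored.append(name)
    return anchored


# Toy data (hypothetical): one inflected form, two syntactic frames for its lemma.
morphology = {"mange": [("manger", fs(cat="v", mood="ind|subj", pers="1|3", num="sg"))]}
syntax = {"manger": [fs(cat="v", subcat="transitive"), fs(cat="v", subcat="intransitive")]}
lexicon = compile_lexicon(morphology, syntax)

# Interfaces of three hypothetical tree descriptions.
trees = {
    "transitive_verb": fs(cat="v", subcat="transitive"),
    "intransitive_verb": fs(cat="v", subcat="intransitive"),
    "common_noun": fs(cat="n"),
}

print(anchor("mange", lexicon, trees))
```

In this sketch the form "mange" anchors the two verbal tree descriptions, while the nominal one is rejected because its `cat` value clashes with that of the lexicon entries.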
To this end, in addition to the tools that merge the different kinds of lexicons, Bruno Guillaume and Sylvain Pogodalla have developed a tool (http://www.loria.fr/equipes/calligramme/litote/LIB/LEX-READER/lex-reader.tar.gz) that produces Leopar-formatted morphological lexicons from external morphological lexicons. For now, it handles our own verb descriptions (about 6,000 verbs and 300,000 inflected forms, available at http://www.loria.fr/equipes/calligramme/litote/) and the morphological lexicon Morphalou (http://actarus.atilf.fr/morphalou/) provided by the ATILF (Analyse et Traitement Informatique de la Langue Française, http://www.atilf.fr/).
This tool is also used in the concordancers we provide (http://www.loria.fr/equipes/calligramme/litote/concordancer/), which are based on the Test Suites for Natural Language Processing (TSNLP, http://cl-www.dfki.uni-sb.de/tsnlp/) and on Jules Verne's Le tour du Monde en 80 jours (available from ABU : la Bibliothèque Universelle, http://abu.cnam.fr/). These concordancers are used in our project-team and in the Langue et Dialogue INRIA project-team to help grammar writers.
Two students at École des Mines (Damien Auricchio and Nelson Da Silva) also worked, during a three-month internship, on factorizing the morphological information of inflected forms and on comparing the UNITEX (http://www-igm.univ-mlv.fr/~unitex/) morphological lexicon with our own verb lexicon.