Project : atoll
Section: New Results
Keywords : Context-sensitive grammatical formalisms, shared parse forests, range concatenation grammars, lexical functional grammars, polynomial parse time, grammatical modularity, finite transducers.
Contextual Parsing with LFGs
Participants : Pierre Boullier, Benoît Sagot.
- MCS
Mildly Context-sensitive Grammars
- RCG
Range Concatenation Grammars
- TAG
Tree Adjoining Grammars
This year, our work mainly concentrates on the improvement of our Lexical Functional Grammar parser SXLFG which focus on the sharing of identical computations: recent advances include (among others) techniques known as cyclic functional structures, lazy evaluation, internal disambiguation.
Lexical Functional Grammar (LFG) is a grammatical theory assuming two parallel levels of syntactic representation: constituent structure (c-structure) and functional structure (f-structure).
C-structures have the form of context-free phrase structure trees;
F-structures are sets of pairs of attributes and values; attributes may be features, such as tense and gender, or functions, such as subject and object.
At least at a conceptual level, we may see an LFG parser as a two-phase process: the first phase is a CF parser which builds the C-structure while the second phase evaluates the F-structure on the tree built by the first phase. However, the CF-backbone of real linguistic grammars (including LFG) are usually massively ambiguous. For example, for a sentence, we have exceeded the capacity of a single floating point 32 bit word in counting its number of parse trees. In Atoll, we know how to handle such a combinatorial explosion of resulting tree structures. In the LFG context, this means that, for any given sentence w, we can compute in polynomial time a polynomial size parse forest which represents all the possible C-structures of w (See for example [5]). However, the efficient evaluation of F-structures on parse forests is still a research problem. Of course, the unfolding of the parse forest into single trees upon which F-structures are evaluated is not a viable method. We have designed and implemented a method which evaluates F-structures directly on a parse forest and which shares common [sub-]computations.
The coupling of our guided Earley parser with the previous shared computation of F-structures results in a new LFG parser called SXLFG. It is now able to handle cyclic F-structures, implements a lazy unification to optimise the computation of these structures, and allows the grammar writer to specify disambiguation heuristics that can be applied on F-structures associated with any node of the forest. This improvements w.r.t. the first version of SXLFG have resulted in parsers that run approximately 5 to 10 times faster.
Though this parser still needs to be improved, it is sufficiently mature to support full natural language descriptions. SXLFG is one of the three parsers used by Atoll in the EASy campaign (cf. 7.3). We are starting to use the new version of SXLFG to parse large corpora, in order to validate our parsing techniques but also to learn information from the resulting analyses.