wiki:User Documentation

Overview

The PET system for efficient processing of unification-based grammars is an industrial strength implementation of the typed feature structure formalism used in DELPH-IN grammars. PET reads the exact same source files (modulo some configuration options) as the LKB grammar development environment and produces identical results. In a nutshell, PET can be viewed as a high-efficiency batch processing and application delivery engine, while the LKB mainly targets interactive grammar development.

Some features of PET include:

  • unknown word support, instantiating generic lexical entries at run-time
  • subsumption-based ambiguity factoring (giving a significant improvement in parsing efficiency for long inputs)
  • parse ranking according to a statistical parse selection model
  • compilation of the (Common-Lisp) MRS code base also used in the LKB, enabling output of (R)MRSs
  • output of fragmentary analysis hypotheses in case of parse failures
  • lattice input (via the YY, PIC and SMAF input formats, cf. PetInput)
  • a variety of XML-based input formats that generalize the lattice-oriented YY input mode
  • pruning of the search space using a PCFG model

When installed, PET comprises the executable files cheap (bottom-up chart parser), flop (the grammar compiler) and fspp (tokenizer).

External Documentation

Lots of documentation still resides in another wiki hosted by the DELPH-IN group. Until it has been moved and cleaned up, the corresponding links can be found below:

Obtaining PET

Under https://pet.opendfki.de/repos/pet/main, you will find the current development version of PET, stored in a Subversion repository for versioned source code management. This repository also provides an issue tracker (report:1) for filing bug reports and feature requests, as well as the wiki you are currently reading.

To obtain the latest version, make sure you have a Subversion client installed, then issue the following command:

svn co https://pet.opendfki.de/repos/pet/main

There are no official binary distributions of PET, and users should expect to compile their own executable files from source (see the next section). Ubuntu users, however, can also obtain a compiled version of PET from the [http://cl.naist.jp/~eric-n/ubuntu-nlp/ Ubuntu NLP Repository].

Compiling and Installing PET

Current PET development is carried out exclusively in Linux (x86.32) environments, hence most (reasonably) recent Linux distributions should work well. PET ports for Solaris (sparc, using gcc) and Windows (x86.32, using either Cygwin or Borland C++) used to be supported, and in principle any platform for which a suitable C++ compiler is available (and for which the external libraries used by PET exist) should allow successful compilation. Your mileage may vary.

In order to compile PET with complete functionality, a number of external packages (Library Dependencies) need to be installed; in general, see the documentation for each of these packages, but some coarse instructions on versions that are known to work are available from the [wiki:"Library Dependencies"] page. Compiling without some of these packages should also be possible (giving up, for example, UniCode support, [incr tsdb()] integration, or the embedded MRS code), although these configurations have not been tested for quite some time. See ./configure --help for a list of all configuration options.

PET uses the GNU build system, making it easy to configure and install the package. Note that if you've checked out a PET branch from the Subversion repository, you have to run autoreconf -i once (requiring the autoconf and automake packages) in order to generate the necessary build files (this is needed neither for the source tarball distribution nor after svn update). Finally, you minimally have to execute the following commands:

{{{
./configure
make
make install   # (as root)
}}}

The README file and the ./configure --help command give detailed instructions on how to configure and compile PET.

As of December 2006, a patch is necessary in order to use the PET svn repository version with the latest version of LKB. See the following thread in the developers mailing list: http://lists.delph-in.net/archive/developers/2006/000691.html

Compiling a grammar

The grammar files have to be preprocessed with flop before they can be used with cheap. For example, for the ERG:

{{{
flop english.tdl
}}}

This command generates the compiled grammar english.grm.
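The compiled grammar can then be loaded by the cheap parser. As a minimal illustrative invocation (assuming cheap reads test sentences from standard input and that english.grm is in the current directory):

{{{
echo "The dog barks." | cheap english.grm
}}}

See the PetInput and PetOutput pages for the input formats and output options that apply to your setup.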

Running PET

The PET software has been used in a range of projects (and one commercial product), with grammars of several languages. A relatively large number of options and run-time parameters allow PET's behavior to be customized for various tasks. The two biggest sources of variation are (a) how input to the cheap parser is prepared for PET-internal processing and (b) in what form analysis results are output (or returned to the caller) after parsing; these are discussed on the separate PetInput and PetOutput pages, respectively. Many other aspects of PET's run-time behavior can be controlled through command-line options (see the PetOptions page), given to the flop or cheap binaries upon invocation, and through grammar-specific settings (see the PetParameters page), supplied in TDL syntax as part of each grammar. Since [https://pet.opendfki.de/browser/pet/main?rev=498 revision 498 in the main branch], PET employs a logging framework for configurable log output, which is described in PetLogging. Finally, when using PET as a processing client to the [incr tsdb()] profiler, some of the options and parameters are controlled from within the [incr tsdb()] environment.
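As a hypothetical example combining several of these mechanisms, the invocation below enables ambiguity packing, restricts output to the single best reading, and requests MRS output. The option names here are illustrative of commonly used cheap flags; consult the PetOptions page for the options actually supported by your PET version:

{{{
cheap -packing -nsolutions=1 -mrs english.grm < sentences.txt
}}}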

For an ongoing discussion on a PET API, cf. FeforPetApi.

Tips and Tricks

Some notes on robust parsing with PET: unknown word handling, memory limits and so on.

History

PET was originally developed by UlrichCallmeier at DFKI GmbH and Saarland University, and some of its design is documented in his 2001 MSc thesis. The software subsequently served to build a commercial email auto-response product (by YY Technologies, Mountain View, CA); in the process it was ported to Windows NT, generally `hardened' (eliminating memory leaks, increasing robustness to exceptional situations, et al.), and extended in functionality and interfaces (including UniCode support, unknown word support, server and API library modes, lattice input, and initial MRS support); most of this work was done by Ulrich with help from StephanOepen and BerndKiefer (of DFKI). As part of the EU-funded Deep Thought project, Ulrich and Stephan later added support for subsumption-based ambiguity factoring (giving a significant improvement in parsing efficiency for long inputs), facilities to rank alternate parses according to a statistical (Maximum Entropy) parse selection model (which, typically, one would obtain using the Redwoods tools and a hand-constructed treebank), and the ability to compile in the (Common-Lisp) MRS code base also used in the LKB, thus enabling output of (R)MRSs in various standard formats.

Towards the end of 2003, Ulrich retired from active PET development, and Bernd has since been the main developer (with occasional help from others, specifically Frederik Fouvry and Stephan). PET has seen a range of substantial additions in functionality since, including the ability to add (leaf) types at run-time, output fragmentary analysis hypotheses in case of parse failures, and an XML-based input format that generalizes the lattice-oriented YY input mode.

Yi Zhang (Saarland University) added the ability to do selective unpacking, greatly decreasing the memory consumption for n-best parsing. Bart Cramer added the possibility of constraining the search space using PCFG-guided pruning of tasks at the chart-cell level. Under https://pet.opendfki.de/repos/pet/main/resources/gm-training you will find code and instructions to train a generative model for chart pruning.

Last modified on 10/22/10 16:56:09