wiki:Road Map

Version 1 (modified by beki01, 11 years ago)


PET road map, resulting from discussion at the Paris meeting of DELPH-IN

First of all, the list of all completed items:

  • converge main & cm into one new main branch
  • code style recommendations fixed
  • poll for functionality, esp. in input part, has been done
  • no need for a ticketing system other than trac

The following are possible topics to discuss in this session. Please feel free to add any topics you think are missing.

  • after the completed items above: refactoring the code
    • have a sub-group working out a list of deprecated things
    • get rid of ECL as much as possible
      • remove input modes that rely on ECL, complete MRS treatment as far as necessary
    • remove as much of the load on "item" as possible
    • encourage developers to remove obsolete code and merge similar functionality where possible
    • refactoring into new code style, once it is fixed
    • documentation of (especially new) modules:
      • add scripts to illustrate use
      • necessary resources (data to train / models, training scripts, etc)
    • copyright notice updates?
  • GUI: which one, desired functionality?
  • XML-RPC / API enhancements
  • put the road map and TODO into the PET trac
  • implement the Lohuizen unifier to enable multi-threaded operation
  • release plans / procedures (LOGON?)
  • include C++ REPP implementation (RebeccaDridan has working code; it is not yet 100% compatible with the LISP version, but the differences are not necessarily bugs)


  • generation
    • MRS equivalence test
    • MRS rewriting (for trigger rules), also used in idiom checking
  • unique output separation for standard-io server mode (requested many times in the past)

PET API proposal, worked out at the Fefor meeting of DELPH-IN

The idea is to replace the cheap executable (with 'pipe' over stderr...) by a library. The library could in turn be wrapped in an executable (if someone really still needs this), or better in an XML-RPC or other socket server. The library could also be called directly from Java (via JNI) or from scripting languages like Python.

The main motivation is to have a more flexible configuration. Currently, the cheap binary has to be restarted (including the time-consuming grammar loading) whenever, for example, the number of readings to return is changed. With the new API, such an option could be set for each parse individually.
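To make the motivation concrete, here is a minimal sketch of a library-style handle that loads the grammar once and accepts options per parse. It is not the PET API: `ParserSession`, `ParseOptions`, `max_readings`, and the file name `grammar.grm` are hypothetical names chosen for illustration, and the "readings" are placeholders rather than real parser output.

```python
from dataclasses import dataclass, field


@dataclass
class ParseOptions:
    """Options that may differ between parses (hypothetical names)."""
    max_readings: int = 1


@dataclass
class ParserSession:
    """Stand-in for a library handle that owns one loaded grammar."""
    grammar_path: str
    grammar_loads: int = field(default=0, init=False)

    def __post_init__(self):
        # The expensive step happens exactly once, when the session is created.
        self._load_grammar()

    def _load_grammar(self):
        # Placeholder for the time-consuming grammar loading.
        self.grammar_loads += 1

    def parse(self, sentence, options=None):
        options = options or ParseOptions()
        # A real parser would build a chart; placeholder readings are used
        # here only to show that the limit is applied per call.
        candidates = [f"reading {i} of {sentence!r}" for i in range(1, 6)]
        return candidates[: options.max_readings]


session = ParserSession("grammar.grm")  # hypothetical grammar image
one = session.parse("Kim sleeps.", ParseOptions(max_readings=1))
three = session.parse("Kim sleeps.", ParseOptions(max_readings=3))
```

Both calls reuse the same session, so the number of readings changes without the grammar being reloaded.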

Moreover, the different configuration sources (command line, pseudo TDL config file, partly true TDL definitions in the grammar) should be unified.
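One way to picture a unified configuration is a single lookup layered over the three sources. The sketch below uses Python's `collections.ChainMap`. The precedence order (command line over config file over grammar definitions) and the option names are assumptions for illustration only, not something the proposal specifies.

```python
from collections import ChainMap


def unified_config(command_line, config_file, grammar_definitions):
    """Return one read-only view over all three configuration sources.

    ChainMap searches its mappings left to right, so the argument order
    encodes the assumed precedence: command line first, grammar last.
    """
    return ChainMap(command_line, config_file, grammar_definitions)


# Hypothetical option names and values, one dict per source.
grammar_definitions = {"start-symbols": ["root"], "max-readings": 1}
config_file = {"max-readings": 5, "verbosity": 1}
command_line = {"verbosity": 3}

config = unified_config(command_line, config_file, grammar_definitions)
```

Here `config["verbosity"]` comes from the command line, `config["max-readings"]` from the config file, and `config["start-symbols"]` from the grammar definitions.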

Generally, no new features will be added; the implicit interface will merely be made explicit and streamlined. For example, it will not be possible to exchange the grammar at runtime, as this would require fundamental changes in the code (and is not considered necessary).

Elements of the API

  • basic configuration like which grammar image to load, cheap command line options that cannot be modified at runtime
  • cheap command line options that can be modified at runtime
  • MRS globals (shared with LKB, readable also from file via pseudo TDL parser; cf. JerezTop discussion)
  • parse from preprocessor (SMAF) input (is raw text input obsolete?)
  • retrieve result chart in different formats
    • MRS
    • MRS-XML
    • RMRS
    • tree
    • HTML (?)
    • typed FS (FS-XML?), with feature path to sub-FS
    • fragments
  • access to type hierarchy
    • sub/supertype
    • type subsumption
    • type name and code
    • FS prototype (FS-XML?), but no feature structure unification
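The type hierarchy queries listed above (sub/supertype, subsumption, type name and code) can be sketched with a small read-only class. This is an illustration under stated assumptions, not PET's data structure: the class name, the method names, the toy hierarchy, and the rule that codes follow insertion order are all hypothetical.

```python
class TypeHierarchy:
    """Read-only view of a type hierarchy (hypothetical interface)."""

    def __init__(self, parents):
        # `parents` maps each type name to its immediate supertypes.
        self._parents = {name: tuple(sups) for name, sups in parents.items()}
        # Assumption: integer codes are assigned in insertion order.
        self._codes = {name: code for code, name in enumerate(self._parents)}
        self._names = {code: name for name, code in self._codes.items()}

    def code(self, name):
        return self._codes[name]

    def name(self, code):
        return self._names[code]

    def supertypes(self, name):
        """Immediate supertypes of `name`."""
        return self._parents[name]

    def subtypes(self, name):
        """Immediate subtypes of `name`."""
        return tuple(t for t, sups in self._parents.items() if name in sups)

    def subsumes(self, general, specific):
        """True if `general` is `specific` or one of its ancestors."""
        stack, seen = [specific], set()
        while stack:
            current = stack.pop()
            if current == general:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self._parents[current])
        return False


# Toy hierarchy for illustration only.
hierarchy = TypeHierarchy({
    "*top*": [],
    "sign": ["*top*"],
    "word": ["sign"],
    "phrase": ["sign"],
})
```

In line with the proposal, the sketch only answers queries about types and offers no feature structure unification.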

Feature requests

  • configurable start symbol for parsing (FrancisBond)
  • change lexicon during runtime (not possible for the built-in lexicon, but probably for lexDB)
  • change model during runtime