Configuration Branch

This page is mainly used for developing the configuration system for PET. Some of the issues mentioned here can also serve as documentation for users and developers.

General ideas

  • We probably need a list of systems on which PET is built and used.
    • Debian GNU/Linux Etch (testing), gcc 4.1.2
    • Fedora 5 and 6 on i386 and i686 machines (gcc 4)

Library dependencies

  • CppUnit - used for testing. This should be optional.
  • Log4cxx - a port of log4j to C++. It is a flexible and configurable logging utility that should replace fprintf(stderr,...). Optional. Important: Currently there is a bug in log4cxx that causes interactions with the autotools used to build PET. The solution is to use an unofficial version of log4cxx. If you install it in a non-standard location, do not forget to set LD_LIBRARY_PATH accordingly. Gcc 4.1.2 issues warnings with that version of log4cxx; you can suppress them by compiling with the option -fno-strict-aliasing. (Just configure PET with this option, e.g. ./configure CPPFLAGS=-fno-strict-aliasing --with-log4cxx=/home/tomek/usr/lo4cxx-0.10.0/)

You can use the options of the ./configure script to provide the paths to the above libraries. See ./configure --help for more information.

Configuration system

The configuration system consists of the following files: Configuration.cpp, Configuration.h, ConfigurationInternal.h, ConfigurationTest.cpp. You can generate documentation from the source code by running make doc.

Issues in the Configuration system

  • Should we put the configuration system into a separate namespace? - No.
  • How should we change global variables to configuration options? Where to use CallbackOption<T>, HandledOption<T> and ReferenceOption<T>?
  • Can/Should we implement HandledOption<T> and ReferenceOption<T> in terms of CallbackOption<T>?
  • Descriptions of options are not supported now. - Done, now they are.
  • Only a part of Configuration.h should be visible to the user; the rest is in header files only because we use templates. - Done.
  • There are no checks for getting a value of an uninitialized option.
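
One possible answer to the question of expressing HandledOption<T> and ReferenceOption<T> in terms of CallbackOption<T> is sketched below. The class interfaces shown here (get, set, the constructor arguments) are assumptions made for illustration and are not taken from Configuration.h; the sketch also assumes a C++11 compiler.

```cpp
#include <functional>
#include <utility>

// Hypothetical sketch, NOT the classes from Configuration.h.
// CallbackOption delegates every read and write to user-supplied functions.
template <typename T>
class CallbackOption {
public:
  CallbackOption(std::function<T()> getter,
                 std::function<void(const T&)> setter)
    : getter_(std::move(getter)), setter_(std::move(setter)) {}
  virtual ~CallbackOption() {}
  T get() const { return getter_(); }
  void set(const T& value) { setter_(value); }
private:
  std::function<T()> getter_;
  std::function<void(const T&)> setter_;
};

// ReferenceOption: the callbacks read and write an existing variable,
// e.g. a former global such as opt_filter.
template <typename T>
class ReferenceOption : public CallbackOption<T> {
public:
  explicit ReferenceOption(T& target)
    : CallbackOption<T>([&target]() { return target; },
                        [&target](const T& v) { target = v; }) {}
};

// HandledOption: the value is owned by the option object itself.
template <typename T>
class HandledOption : public CallbackOption<T> {
public:
  explicit HandledOption(const T& initial = T())
    : CallbackOption<T>([this]() { return value_; },
                        [this](const T& v) { value_ = v; }),
      value_(initial) {}
  // The callbacks capture `this`, so copying would leave them dangling.
  HandledOption(const HandledOption&) = delete;
  HandledOption& operator=(const HandledOption&) = delete;
private:
  T value_;
};
```

With this layout only CallbackOption<T> needs to be understood by the rest of the configuration code. The price is one indirect call per access, which may matter for options read in inner loops.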

Bernd: Things that we obviously missed when designing the Configuration class (25. Jan 06)

  • having another addOption method where we can specify a default value - Done. A parameter initial was added to both addOption(...) methods. Its default value is T(), which calls the default constructor for user-defined types and yields 0, 0.0, false, etc. for primitive types.
  • setting the value from a string / printing the value as string
    • would make most of the code in options.cpp obsolete (which was intended)
    • needs a (back and forth) conversion function with error handling
    • what if we want specialized conversion functions for certain options, e.g., a three-value option "default/passthrough/delete" that should internally be encoded as 0/1/-1? What means do we provide in Configuration so that we can have a default implementation for all basic types that can be overridden only by some options?
    • Initial version. - Initial support for these features was added, but it still lacks full error handling. A field was added to Option<T> that stores a pointer to IConverter<T>, the base class for the different converters between values and strings. There are two kinds of converters:
      • Converter<T> - the "default" one. It uses the iostream operators << and >> to convert a value to and from a string. It makes conversions like 1 <-> "1", 2 <-> "2", ... possible.
      • MapConverter<T> - for special cases. With this converter, a mapping like 0 <-> "default", 1 <-> "passthrough", -1 <-> "delete" is possible.
      • Further converters can be added by inheriting from IConverter<T>. Look at the tests to learn how to use these features.
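
The two kinds of converters described above could look roughly like the following sketch. The method names toString and fromString, the add method and the use of std::invalid_argument for errors are assumptions made for illustration; the actual IConverter<T> interface in Configuration.h may differ.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical interface modelled on the description above.
template <typename T>
class IConverter {
public:
  virtual ~IConverter() {}
  virtual std::string toString(const T& value) const = 0;
  virtual T fromString(const std::string& text) const = 0;
};

// Default converter: relies on the iostream operators << and >>.
template <typename T>
class Converter : public IConverter<T> {
public:
  std::string toString(const T& value) const {
    std::ostringstream out;
    out << value;
    return out.str();
  }
  T fromString(const std::string& text) const {
    std::istringstream in(text);
    T value = T();
    if (!(in >> value))  // extraction failed, e.g. "abc" for an int
      throw std::invalid_argument("cannot convert: " + text);
    return value;
  }
};

// Map-based converter for special cases such as 0 <-> "default".
template <typename T>
class MapConverter : public IConverter<T> {
public:
  // Registers one value/name pair in both directions.
  void add(const T& value, const std::string& name) {
    toName_[value] = name;
    toValue_[name] = value;
  }
  std::string toString(const T& value) const {
    typename std::map<T, std::string>::const_iterator it = toName_.find(value);
    if (it == toName_.end())
      throw std::invalid_argument("value has no name");
    return it->second;
  }
  T fromString(const std::string& text) const {
    typename std::map<std::string, T>::const_iterator it = toValue_.find(text);
    if (it == toValue_.end())
      throw std::invalid_argument("unknown name: " + text);
    return it->second;
  }
private:
  std::map<T, std::string> toName_;
  std::map<std::string, T> toValue_;
};
```

Note that this simplified Converter<T> does not reject trailing characters (an input like "1abc" is read as 1), which is one of the error-handling gaps a complete version would have to close.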

List of configuration options

Name | Type | How | Module | Description | Comments
opt_chart_man | bool | handled | lexproc | Allow lexical dependency filtering | done
opt_comment_passthrough | bool | handled | inpproc | Ignore/repeat input classified as comment: starts with '#' or '' | changed from int to bool, which influences behaviour of the program (todo)
opt_compute_qc | char* | handled | fs | Activate code that collects unification/subsumption failures for quick check computation, contains filename to write results to | done
opt_compute_qc_subs | bool | global in fs.cpp | fs | Activate failure registration for subsumption | done
opt_compute_qc_unif | bool | global in fs.cpp | fs | Activate failure registration for unification | done
opt_print_failure | bool | global in fs.cpp | fs | Log unification/subsumption failures (should be replaced by logging or new/different API functionality) | done
opt_default_les | bool | handled | lexproc | try to use default lexical entries if no regular entries could be found. Uses POS annotation, if available. | done
opt_derivation | bool | handled | tsdb | Store derivations in tsdb profile | done
opt_filter | bool | global in parse.cpp | parse | Use the static rule filter | done
opt_fullform_morph | bool | obsolete | NIL | (comment out and mark with TODO) | done
opt_hyper | bool | global in parse.cpp | parse | use hyperactive parsing | done
opt_jxchg_dir | string | handled | appl | the directory to write parse charts in jxchg format to (should be handled by an API function) | done
opt_key | int | handled | grammar | what is the key daughter used in parsing? 0: key-driven, 1: l-r, 2: r-l, 3: head-driven | done
opt_nqc_... | int | ignore | ignore | I'll have to have a closer look at these two | todo
opt_nresult | int | handled | parse | The number of results to print (should be an argument of an API function) | done
opt_nsolutions | int | global in parse.cpp | parse | The number of solutions until the parser is stopped, if not in packing mode, | done
opt_linebreaks | bool | handled | | | done (was it done correctly?)
opt_lattice | bool | global in item.cpp | parse | is the lattice structure specified in the input used to restrict the search space in parsing | done
opt_nsolutions | int | global in parse.cpp | parse | The number of solutions until the parser is stopped, if not in packing mode, if in selective unpacking mode and greater zero, the number of best trees that should be unpacked and if zero, exhaustive unpacking is done. | (repeated)
opt_nth_meaning | int | obsolete | parse | a limit on the number of meanings, which was only used in yy mode | todo
opt_mrs | char* | handled | mrs | determines if and which kind of MRS output is generated | done
opt_online_morph | bool | handled | inpproc | use the internal morphology (the regular expression style one) | done
opt_packing | int | global in parse.cpp | parse | a bit vector of flags: 1:equivalence 2:proactive 4:retroactive packing 8:selective 128:no unpacking | done
opt_partial | bool | handled | appl | in case of parse failure, find a set of chart edges that covers the chart in a good manner | done
opt_rulestatistics | bool | handled | tsdb | dump the per-rule statistics to the tsdb database | done
opt_server | int | unused | yy | run cheap as a server listening to the socket with the given number | todo
opt_shaping | bool | | | |
opt_shrink_mem | bool | | | |
opt_tok | t_id | handled | | | done
opt_tsdb | int | | | |
opt_tsdb_dir | string | handled | | | done
opt_yy | bool | | | |
opt_gplevel | u_int | global in item.cpp | parse | determine the level of grandparenting used in the models for selective unpacking | done
verbosity | int | | | |

flop-only options

Name | Type | How | Module | Description | Comments
opt_cmi | int | handled | appl | print information about morphological processing (different types depending on value) | done
opt_expand_all_instances | bool | handled | expand | expand all type definitions, except for pseudo types (remark: introduce a boolean variable at the beginning of delta_expand_types and set it with the handled option) | done
opt_full_expansion | bool | handled | expand | expand the feature structures fully to find possible inconsistencies | done
opt_glbdebug | bool | handled | hierarchy | print debugging information about glb type introduction (TODO: should be handled by logging) | todo
opt_inst_affixes | bool | global in full-form.cpp | | are the affixes instances? | todo - it's different, it's not handled by old configuration system
opt_minimal | bool | handled | | compute minimal fixed arity encoding | done
opt_no_sem | bool | ?? | | | done
opt_pre | bool | handled | | perform only the preprocessing stage (set local variable in fn process) | done
opt_propagate_status | bool | | | |
opt_unfill | bool | | | |
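
Since opt_packing is described in the list above as a bit vector of flags, reading it could be sketched as follows. The flag values come from the opt_packing entry; the constant names and the helper function are made up for this example and do not come from the PET sources.

```cpp
// Flag values as listed for opt_packing; the names are hypothetical.
enum PackingFlag {
  PACKING_EQUIVALENCE  = 1,
  PACKING_PROACTIVE    = 2,
  PACKING_RETROACTIVE  = 4,
  PACKING_SELECTIVE    = 8,
  PACKING_NO_UNPACKING = 128
};

// Returns true if the given flag bit is set in the option value.
bool packingEnabled(int packingValue, PackingFlag flag) {
  return (packingValue & flag) != 0;
}
```

For instance, an option value of 5 would combine equivalence (1) and retroactive (4) packing, so packingEnabled(5, PACKING_RETROACTIVE) holds while packingEnabled(5, PACKING_SELECTIVE) does not.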


Log4cxx defines several logging levels. They should be used in the following way:

  • FATAL - The program cannot continue running, i.e. when you would normally use fprintf(stderr,...); exit(1);
  • ERROR - A serious problem. The error message should be preserved even when running in time-critical conditions.
  • WARN - Minor problems. Messages will not be printed when running in time-critical conditions.
  • INFO - Just information about what is going on.
  • DEBUG - Messages used for debugging, e.g. function calls (parameters), values of expressions.

(See also Don't Use System.out.println!)

Pet defines some utilities in the files logging.cpp, logging.h and loggingTest.cpp. You can generate documentation from the source code by running make doc. To use log4cxx with Pet you need a configuration file; you can find an example called logging.conf in the svn repository.

Please do not use log4cxx's functions and macros for logging directly; stick to the macros LOG, LOG_ERROR and LOG_FATAL defined in logging.h. Note that if you compile Pet without log4cxx, LOG_ERROR and LOG_FATAL are replaced by fprintf calls, whereas LOG is dropped entirely. This lets you improve the performance of the program. The macro LOG_ONLY is defined for convenience: its argument is inserted in place of the macro call if and only if Pet was compiled with log4cxx, so LOG_ONLY(doSomething()); is equivalent to doSomething(); in a build with log4cxx and to an empty statement otherwise.
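
A minimal sketch of how a macro with the behaviour of LOG_ONLY can be written is given below. The guard symbol SKETCH_WITH_LOG4CXX is hypothetical (the preprocessor symbol that PET's build actually sets is not named on this page), and the definition is an illustration, not a copy of logging.h.

```cpp
// Hypothetical build flag; remove this line to simulate a build
// without log4cxx.
#define SKETCH_WITH_LOG4CXX 1

#ifdef SKETCH_WITH_LOG4CXX
#  define LOG_ONLY(...) __VA_ARGS__   // keep the argument as written
#else
#  define LOG_ONLY(...)               // drop the argument entirely
#endif

int helperCalls = 0;                  // counts calls made through LOG_ONLY

void doSomething() { ++helperCalls; }

int callThroughMacro() {
  LOG_ONLY(doSomething());            // expands to: doSomething();
  return helperCalls;
}
```

The variadic form is used so that arguments containing commas survive the expansion. Without the guard symbol the call disappears and only the trailing semicolon remains.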



Log messages with priority INFO, ERROR and FATAL:

  LOG(myLogger, Level::INFO, "This is info message with parameter: %d", parameter);

  LOG_ERROR(myLogger, "Error: no configuration file");

  LOG_FATAL(myLogger, "Cannot open input file");

Suppose we have the following piece of code in a program:

// Prints the items of d to the stream f as "[1, 2, ]".
void printData(FILE *f, Data *d) {
  fprintf(f, "[");
  for(int i = 0; i < d->nItems; i++)
    fprintf(f, "%d, ", d->items[i]);
  fprintf(f, "]");
}

void doSomething() {
  fprintf(ferr, "value of myData is: ");
  printData(ferr, myData);
}

With Pet's new logging system it becomes:

// Prints the items of d to the handler iph instead of a FILE*.
void printData(IPrintfHandler *iph, Data *d) {
  pbprintf(iph, "[");
  for(int i = 0; i < d->nItems; i++)
    pbprintf(iph, "%d, ", d->items[i]);
  pbprintf(iph, "]");
}

void doSomething() {
  // The buffer exists only in builds with log4cxx.
  LOG_ONLY( PrintfBuffer pb );
  // printData takes a pointer, so the address of the buffer is passed.
  LOG_ONLY( pbprintf(&pb, "value of myData is: ") );
  LOG_ONLY( printData(&pb, myData) );
  LOG(myLogger, Level::INFO, "%s", pb.getContents());
}

Please note that although getContents() returns char*, we must pass "%s" as the third argument of LOG. The same applies to LOG_FATAL and LOG_ERROR. If you forget this, the program will not compile without log4cxx.
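
To make the buffer-based example above easier to follow, here is a rough sketch of how a handler interface, an in-memory buffer and a printf-style front end can fit together. The classes below are stand-ins written for this page: the append method, the const char* return type of getContents() and the Data layout are assumptions, and the real declarations in logging.h may look different.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for the interface in logging.h.
class IPrintfHandler {
public:
  virtual ~IPrintfHandler() {}
  // Receives one already formatted chunk of text.
  virtual void append(const char* text) = 0;
};

// Collects everything printed to it in memory.
class PrintfBuffer : public IPrintfHandler {
public:
  void append(const char* text) { contents_ += text; }
  const char* getContents() const { return contents_.c_str(); }
private:
  std::string contents_;
};

// printf-style front end: measures the output, formats it into a
// temporary buffer, then hands the result to the handler.
int pbprintf(IPrintfHandler* iph, const char* format, ...) {
  va_list args;
  va_start(args, format);
  va_list copy;
  va_copy(copy, args);
  int needed = std::vsnprintf(NULL, 0, format, copy);  // length without '\0'
  va_end(copy);
  if (needed >= 0) {
    std::vector<char> tmp(needed + 1);
    std::vsnprintf(&tmp[0], tmp.size(), format, args);
    iph->append(&tmp[0]);
  }
  va_end(args);
  return needed;
}

// Assumed layout of the Data type used in the example above.
struct Data { int nItems; const int* items; };

void printData(IPrintfHandler* iph, const Data* d) {
  pbprintf(iph, "[");
  for (int i = 0; i < d->nItems; i++)
    pbprintf(iph, "%d, ", d->items[i]);
  pbprintf(iph, "]");
}
```

Because the text is accumulated first and logged once, a multi-part message ends up as a single log event instead of several fragments.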


In the future we might need a separate page on testing.

To run tests use make check. Tests are built and run by automake.

Last modified on 03/26/07 11:19:30