FLinK ver. 1.1 - documentation

Author: Daniele Pighin

FLinK stands for “a framework for the linearization of kernel functions”.

The software has been developed as part of my PhD to support our research on the linearization of kernel functions, whose main results so far are published in the following papers authored by me with my co-advisor Alessandro Moschitti:

(My thesis contains more in-depth information, but I didn’t make it public yet because I am still fixing things here and there).

Although it could possibly be estended to support a large variety of kernel functions, at the moment the framework only supports the Syntactic Tree Kernel [Collins and Duffy, 2001] and the Partial Tree Kernel [Moschitti, 2006].

SVM-Light-TK by Alessandro Moschitti is used for learning and classification in Tree Kernel spaces.

For more information about the linearization process, please refer to the aforementioned papers. For those already familiar with the work, the following section provides a very coarse summary of the most relevant information.

Note

This software is still actively used for research purposes.

As such, there are some wild hacks in the code that were necessary to extend its capabilities in one or way on another. Furthermore, when having to choose between efficiency and generality I generally opted for the latter, so that future extensions would possibly be easier.

Linearization crash-course

We call linearization the process by which a kernel space is projected onto a considerably lower dimensional space (for large data sets, the dimensionality reduction can be of 40 orders of magnitude or more) where only the most relevant features are accounted for.

Since we focus on tree kernel (TK) functions, the features are actually fragments, i.e. substructures of some larger tree generated according to the syntactic constraints enforced by the definition of a specific tree kernel function.

Unlike other feature selection techniques, our approach has the non-negligible side effect of making the most relevant features observable, since the most relevant fragments are explicitly stored in a dictionary.

Main activities of the linearization process

The linearization of a binary TK classifier can be decomposed into several activities:

  • Instead of exploring the fragment space encoded by a whole dataset, we prefer to only explore the fragments encoded by the most relevant trees.

    To this end, we use support vector machines and tree kernel functions to learn a TK model, thus only selecting the most relevant examples and assigning them a weight.

    This stage is called kernel space learning (KSL).

  • We mine the fragment space encoded by the model according to a norm- preservation principle. In other words, we try to select a minimum number of fragments so that the norm of the separating hyperplane’s gradient in the original TK space is largely preserved. According to statistical learning theory, this condition is sufficient to largely preserve the accuracy of the original TK classifier.

    To do so, we generate fragments in a small to large fashion, using a greedy heuristic that uses gradient components to direct an extremely narrow beam search in the huge tree kernel space. The fragments that satisfy the required relevance constraints, are explicitly stored in a fragment dictionary.

    This stage we call kernel space mining (KSM).

  • The information about the most relevant fragments stored in the dictionary can be used to linearize (or decode) the dataset and to project each tree onto a lower dimensional space, where each component marks the presence or absence of one of the fragments.

    This can be achieved by traversing the data structure implementing the dictionary with constraints coming from each input tree, and by observing which fragments are actually matched in the tree.

    This stage goes under the name of linear space generation (LSG).

  • Finally, the linearized data can be used to carry out learning and classification in the linear space.

All these activities as well as other supporting functions are directly accessible to programmers through the flink.activities module.

Linearization architectures

These activities can be combined and rearranged in many ways to achieve different ends and produce different results.

  • During LSG we can linearize the training data originally used to learn the model in the TK space, and then use it to learn a new optimization problem in the linear space.

    This configuration is called OPT.

  • To make TK learning faster, during KSL we can split the training data into several chunks, learn a set of models, mine each model independently and finally recombine all the relevant fragments into a single dictionary. If we linearize the training data and optimize a classifier in the linear space, the lower accuracy of the more local TK models will not affect the final accuracy too much.

    This configuration is called Split.

  • To avoid carrying out a new optimization in the linear space, we can linearize the support vectors in the original TK model and use this linearized model to classify a linearized test set.

    This configuration is called LIN.

The flink.linearize module presents the users with command line interfaces for linearizing TK classifiers according to any of these three models.

Releases

1.1 (February 2011)
  • Added support for Python 3.2.x
  • Fixed bug that would prevent dictionary merging from working
1.0 (August 2011)
Added full support for the PTK. Migrated core functionalities to Cython for better performance
0.9 (May 2010)
Full support for the STK kernel, partial support for PTK

Indices and tables