4.2. A Python3K interface to SVMLightTK: flink.pytk

The flink.pytk module provides a Python wrapper for the SVM-Light-TK tool by Alessandro Moschitti.

The module provides high level classes to handle SVM-Light-TK models directly from Python, as well as low level functions to evaluate tree kernel products, calculate the norm of a TK example, and more.

For learning and classification, the module provides two alternative interfaces: either a direct call to the corresponding functions in the SVM-Light-TK C layer, or the execution of the svm_learn and svm_classify programs in a separate process.

The former option requires fewer resources and should be faster, but since there are still some glitches in its implementation, the default is the latter. This behaviour can be changed by setting the configuration property pytk_use_native (see flink.config and flink.utils.props) to True.

The module raises a RuntimeError when loading if a native library for the present architecture is not found.

class Example(line)[source]

Very general class to handle SVM-Light-TK examples.

The constructor parameter line is an example line from an SVM-Light-TK datafile or model.

The example string is parsed into the following public fields:

  • label_string: the label of the example (it is not parsed as a double value to allow the class to handle compact multi-class files in which labels are actually class labels instead of numeric values);
  • trees: the list of trees found in the example, in order of appearance;
  • vectors: the list of vectors found in the example, in order of appearance.

trees and vectors are just lists of strings; no further processing of the input data is carried out.
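The parsing described above can be sketched as follows. This is an illustrative stand-in, not the module's own code: the class name ExampleSketch is hypothetical, and the delimiters (|BT| before each tree, |ET| after the last tree, |BV| between vectors, |EV| at the end) are assumed from the usual SVM-Light-TK input layout.

```python
class ExampleSketch:
    """Hypothetical stand-in for flink.pytk.Example (not the module's code)."""

    def __init__(self, line):
        # The first whitespace-delimited token is kept verbatim as the label.
        label, _, rest = line.strip().partition(" ")
        self.label_string = label
        # Assumed layout: trees come first and are closed by |ET|; vectors follow.
        if "|ET|" in rest:
            tree_part, _, vector_part = rest.partition("|ET|")
        else:
            tree_part, vector_part = "", rest
        # Trees and vectors are stored as plain strings, in order of appearance.
        self.trees = [t.strip() for t in tree_part.split("|BT|") if t.strip()]
        vector_part = vector_part.replace("|EV|", " ")
        self.vectors = [v.strip() for v in vector_part.split("|BV|") if v.strip()]


example = ExampleSketch(
    "+1 |BT| (S (NP (N dog)) (VP (V barks))) |BT| (S (N cat)) |ET| "
    "1:0.5 3:1.0 |BV| 2:0.25 |EV|"
)
```

Keeping the label as a string mirrors the documented behaviour, so a class label such as "sports" would be stored just as readily as "+1".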

class Model(path_to_model)[source]

Handler class for SVM-Light-TK models.

path_to_model is the path to a valid SVM-Light-TK model file.

get_kernel_params()[source]

Return an instance of KernelParams with the relevant parameters scanned from the model.

get_factories()[source]

Return the pair of classes to be used to handle, respectively, fragment generation and fragment indexing for the kernel type used to learn the model.

get_frag_module()[source]

Return the pair of classes to be used to handle, respectively, fragment generation and fragment indexing for the kernel type used to learn the model.

support_vectors()[source]

Scan a model, and for each support vector return a pair (x,y), where x is the label of the support vector and y is the associated data.

get_example_norm(data)[source]

Return the norm of the tree described by data according to the parameters of the model.
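As a usage sketch for support_vectors(), the helper below groups the data of each support vector by its label. It relies only on the documented (x, y) pairs. The helper name and the _StubModel class are hypothetical; the stub stands in for a real Model so that the snippet runs without a model file, and its contents are made-up placeholders.

```python
from collections import defaultdict


def group_support_vectors(model):
    """Map each support vector label to the list of its associated data."""
    groups = defaultdict(list)
    # Works whether support_vectors() returns a list or a generator.
    for label, data in model.support_vectors():
        groups[label].append(data)
    return dict(groups)


class _StubModel:
    """Hypothetical stand-in for flink.pytk.Model, used for a dry run only."""

    def support_vectors(self):
        yield "0.5", "|BT| (S (N dog)) |ET|"
        yield "-0.25", "|BT| (S (N cat)) |ET|"
        yield "0.5", "|BT| (S (V barks)) |ET|"


groups = group_support_vectors(_StubModel())
```

With a real model, the same call would be group_support_vectors(Model(path_to_model)).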

class KernelParams[source]

Facility class to handle the most relevant parameters of an SVM-Light-TK model.

Public fields:

  • kernel_name: name of the kernel function used to learn a model (supported kernels: linear, poly, STK and PTK);
  • decay_lambda: value of the “lambda” decay factor;
  • decay_mu: value of the “mu” decay factor;
  • normalize: True if the kernel is normalized, False otherwise;
  • poly_degree: degree of the polynomial kernel.

get_learn_parameters(kernel_type='linear', cost_factor=1, decay_mu=0.4, poly_degree=1, decay_lambda=0.4, normalize=True)[source]

Generate command line flags for svm_learn according to the given parameters.

kernel_type is the name of the kernel to use. Accepted values are: STK (for the Syntactic Tree Kernel), PTK (for the Partial Tree Kernel), poly (for a polynomial kernel) and linear (for the linear kernel).

cost_factor is the cost factor by which training errors on positive examples outweigh errors on negative examples.

decay_lambda and decay_mu are the decay factors for STK (lambda) and PTK (both lambda and mu) learning.

poly_degree is the degree of the polynomial kernel.

If normalize is set to True, then kernel output is normalized between 0 and 1.

The function returns a string with the appropriate flags to be passed to flink.pytk.learn().

parse_parameters(modelfile)[source]

Parse model parameters from the model identified by the path modelfile and return a KernelParams object.

parse_examples(datafile, ignore_non_vectors=True)[source]

A generator of examples/support vectors in a data file/model for tree-like data.

For each line in datafile containing a valid example (i.e. not containing model metadata), yield an Example object.

If ignore_non_vectors is False, then also yield non-example lines as unformatted strings.

parse_linear_examples(datafile)[source]

Parse the data file or model datafile, which contains linear (vector-only) examples.

For each non-metadata line, return a pair (x, y) where x is the label of the example (as a string) and y is the vector of linear features (as a string as well).
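Since y is returned as a raw string, callers that need numeric features have to parse it themselves. A minimal sketch, assuming the usual SVM-light "index:value" tokens with integer indices and an optional trailing "#" comment (the helper name vector_to_dict is hypothetical and not part of flink.pytk):

```python
def vector_to_dict(vector_string):
    """Convert a string such as '1:0.5 3:1.0' into {1: 0.5, 3: 1.0}."""
    features = {}
    # Assumption: anything after '#' is a comment and can be discarded.
    body = vector_string.split("#", 1)[0]
    for token in body.split():
        index, _, value = token.partition(":")
        features[int(index)] = float(value)
    return features


parsed = vector_to_dict("1:0.5 3:1.0 # first example")
```

It could be applied to each pair produced by parse_linear_examples(), for instance as vector_to_dict(y).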

norm(strnode, ker_par)[source]

Calculate the norm of the tree encoded by the string strnode according to the kernel parameters in the KernelParams instance ker_par.

count_fragments(strnode, ker_par)[source]

Count the fragments in the tree encoded by the string strnode according to the kernel parameters in the KernelParams instance ker_par.

TK(tree_a, tree_b, ker_par)[source]

Calculate the tree kernel product between the trees encoded by the strings tree_a and tree_b according to the kernel parameters in the KernelParams instance ker_par.
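The relation between a kernel product, the norm of an example and a normalized kernel can be sketched as below. This assumes the standard definitions (norm as the square root of the product of a tree with itself, normalization as division by the two norms); whether norm() and TK() follow exactly these definitions is not stated here. toy_kernel is a hypothetical stand-in that merely counts shared tokens and is not a tree kernel.

```python
import math
from collections import Counter


def toy_kernel(tree_a, tree_b):
    """Toy stand-in for a kernel product: dot product of token counts."""
    count_a = Counter(tree_a.replace("(", " ").replace(")", " ").split())
    count_b = Counter(tree_b.replace("(", " ").replace(")", " ").split())
    return float(sum(count_a[token] * count_b[token] for token in count_a))


def kernel_norm(kernel, tree):
    """Norm induced by the kernel: sqrt(K(t, t))."""
    return math.sqrt(kernel(tree, tree))


def normalized_kernel(kernel, tree_a, tree_b):
    """K(a, b) / (||a|| * ||b||), or 0.0 when either norm is zero."""
    denominator = kernel_norm(kernel, tree_a) * kernel_norm(kernel, tree_b)
    return kernel(tree_a, tree_b) / denominator if denominator else 0.0


similarity = normalized_kernel(toy_kernel, "(S (N dog))", "(S (N cat))")
```

For a kernel with non-negative values, this normalization keeps the output between 0 and 1, with 1 reached when a tree is compared with itself.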

learn(data, model, params, stdoutfile=None, stderrfile=None)[source]

Python interface to svm_learn.

data is the training file, model is the name of the output model and params is the string of command line options and flags for svm_learn.

If stdoutfile is not None, then redirect standard output to that file. Otherwise, redirect it to model + ".stdout".

If stderrfile is not None, then redirect standard error to that file. Otherwise, redirect it to model + ".stderr".
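The redirection rule can be sketched as follows. Both function names are hypothetical, and run_logged only illustrates running a program in a separate process with redirected streams; it is not the module's implementation.

```python
import subprocess


def resolve_log_paths(target, stdoutfile=None, stderrfile=None):
    """Apply the documented defaults: target + '.stdout' and target + '.stderr'."""
    out_path = stdoutfile if stdoutfile is not None else target + ".stdout"
    err_path = stderrfile if stderrfile is not None else target + ".stderr"
    return out_path, err_path


def run_logged(command, target, stdoutfile=None, stderrfile=None):
    """Run command (a list of arguments) with both output streams redirected."""
    out_path, err_path = resolve_log_paths(target, stdoutfile, stderrfile)
    with open(out_path, "w") as out, open(err_path, "w") as err:
        return subprocess.run(command, stdout=out, stderr=err).returncode


default_paths = resolve_log_paths("news.model")
```

For learn(), target corresponds to model; for classify(), it corresponds to predictions.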

classify(data, model, predictions, stdoutfile=None, stderrfile=None)[source]

Python interface to svm_classify.

data is the test file, model is the model to be used for classification and predictions is the output file of classifier decisions.

If stdoutfile is not None, then redirect standard output to that file. Otherwise, redirect it to predictions + ".stdout".

If stderrfile is not None, then redirect standard error to that file. Otherwise, redirect it to predictions + ".stderr".
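A typical train-then-classify sequence chains get_learn_parameters(), learn() and classify(). The sketch below uses only the signatures documented in this section. The module is passed in as an argument so that the snippet can be dry-run with the hypothetical _RecordingStub; the file names and the "<flags>" string are placeholders, not real output.

```python
def train_and_classify(pytk, train_file, test_file, model_file, predictions_file):
    """Learn an STK model from train_file, then classify test_file with it."""
    params = pytk.get_learn_parameters(
        kernel_type="STK", decay_lambda=0.4, normalize=True
    )
    pytk.learn(train_file, model_file, params)
    pytk.classify(test_file, model_file, predictions_file)
    return params


class _RecordingStub:
    """Hypothetical stand-in for flink.pytk that only records the calls made."""

    def __init__(self):
        self.calls = []

    def get_learn_parameters(self, **kwargs):
        self.calls.append(("get_learn_parameters", kwargs))
        return "<flags>"  # placeholder; the real function returns svm_learn flags

    def learn(self, data, model, params):
        self.calls.append(("learn", data, model, params))

    def classify(self, data, model, predictions):
        self.calls.append(("classify", data, model, predictions))


stub = _RecordingStub()
flags = train_and_classify(stub, "train.dat", "test.dat", "tk.model", "tk.pred")
```

With the real module, the first argument would be flink.pytk itself (after import flink.pytk).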

get_tk_model_norm(model)[source]

Read the gradient norm of the separating hyperplane from the standard output file produced while learning the model model.

get_factories(kernel_name)[source]

Return the pair of classes to be used to handle, respectively, fragment generation and fragment indexing for the given kernel type.

path_to_svm_classify()[source]

Return the path to the executable for svm_classify.

path_to_svm_learn()[source]

Return the path to the executable for svm_learn.

4.2.1. Mining relevant fragments in the kernel space: flink.pytk.miners

Pytk comes with several algorithms for fragment mining:

  • a greedy mining algorithm, flink.pytk.miners.greedy_miner()
  • an algorithm to generate all the fragments in a tree up to a given depth, flink.pytk.miners.full_miner()
  • an algorithm to generate a given number of random fragments, flink.pytk.miners.random_miner()

Of the three, only the first is actually available through the high level interface provided by flink.activities.mine, but users can easily write their own activities or applications by directly accessing these facilities.

greedy_miner(str model, str dictionary, double threshold, int min_frequency)

Apply the greedy mining algorithm described in this paper and store the most relevant fragments discovered in the TK model model in the dictionary file dictionary.

If dictionary is None, then the dictionary file is created in:

out_dict_path = model + ".model"

threshold and min_frequency are parameters of the mining algorithm. threshold is used to calculate the minimum relevance of a fragment to be included in the dictionary, according to the formula:

min_relevance = max_base_frag_relevance / threshold

where max_base_frag_relevance is the relevance of the heaviest of the base fragments encoded in a model. For the definition of what a base fragment is, and for more details on the mining algorithm, please refer to this paper.
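A worked instance of the formula, as a minimal sketch. The function names and the relevance values are made up for illustration, and whether the miner uses an inclusive comparison, or how min_frequency interacts with the cutoff, is not specified here.

```python
def minimum_relevance(base_fragment_relevances, threshold):
    """min_relevance = max_base_frag_relevance / threshold."""
    return max(base_fragment_relevances) / threshold


def select_fragments(fragment_relevances, base_fragment_relevances, threshold):
    """Keep the fragments whose relevance reaches the cutoff (inclusive by assumption)."""
    cutoff = minimum_relevance(base_fragment_relevances, threshold)
    return {
        fragment: relevance
        for fragment, relevance in fragment_relevances.items()
        if relevance >= cutoff
    }


# Heaviest base fragment has relevance 8.0; threshold 4.0 gives a cutoff of 2.0.
cutoff = minimum_relevance([2.0, 8.0, 4.0], 4.0)
kept = select_fragments(
    {"(NP (N))": 3.0, "(VP (V))": 1.5, "(S (NP) (VP))": 2.0},
    [2.0, 8.0, 4.0],
    4.0,
)
```

Under this reading, a larger threshold lowers the cutoff and therefore admits more fragments into the dictionary.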

4.2.2. Fragment generation and indexing

Note

The API for adding support for new kernels is by and large not documented at the moment.

The package flink.pytk.defs contains definitions of abstract classes and interfaces that can be used to implement support for new tree kernel families.

To add support for a new kernel, the programmer must provide:

  • an implementation of the abstract class flink.pytk.defs.Fragment.Fragment providing the machinery to generate the fragment space of the target kernel.

    If the name of the new kernel is <kernel_name>, the class implementing the Fragment interface should be called <kernel_name>_frag, and belong to the module flink.pytk.fragments.<kernel_name>_frag;

  • an implementation of the abstract class flink.pytk.defs.FragIdx.FragIdx providing facilities for fragment indexing.

    If the name of the new kernel is <kernel_name>, the class implementing the FragIdx interface should be called <kernel_name>_idx, and belong to the module flink.pytk.fragments.<kernel_name>_idx.

Then, flink.pytk should also be edited so as to recognize the newly added kernel.
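The naming convention above maps a kernel name to two module and class names. A minimal sketch of that mapping, with a loader built on importlib, is given below. How flink.pytk actually resolves the classes is not documented here, so both function names are hypothetical, and the kernel name is used verbatim (any case folding the module may apply is unknown).

```python
import importlib

PACKAGE = "flink.pytk.fragments"


def factory_locations(kernel_name):
    """Return ((module, class), (module, class)) for fragment generation and indexing."""
    frag = kernel_name + "_frag"
    idx = kernel_name + "_idx"
    return (PACKAGE + "." + frag, frag), (PACKAGE + "." + idx, idx)


def load_factories(kernel_name):
    """Import both modules and return the pair of classes they define."""
    classes = []
    for module_name, class_name in factory_locations(kernel_name):
        module = importlib.import_module(module_name)
        classes.append(getattr(module, class_name))
    return tuple(classes)


# "XTK" is a made-up kernel name used only to show the resulting names.
locations = factory_locations("XTK")
```

load_factories() raises ImportError when the modules do not exist, which gives a quick check that a new kernel was placed where the convention expects it.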

The package flink.pytk.fragments contains the actual implementation of these abstract facilities for the STK and PTK kernels.