Class FeatureBuilder

Multiple example builders might make use of the same features. A feature builder object can be used in different example builders that require the same feature set.

Instance Methods

__init__(self, featureSet, style=None)

getEntityType(self, entity)

getPOSSuperType(self, pos)

getTokenAnnotatedType(self, token, sentenceGraph)
Multiple entities may have the same head token.

getTokenFeatures(self, token, sentenceGraph, text=True, POS=True, annotatedType=True, stem=False, ontology=True)
Token features are features describing an isolated word token.

normalizeFeatureVector(self)
Some machine learning tasks require feature values to be normalized to the range [0,1].

setFeature(self, name, value=1)
Add a feature to the feature vector.

setFeatureVector(self, features, entity1=None, entity2=None)
When the feature builder builds features, they are written to this feature vector.

setTag(self, tag='')

Method Details

__init__(self, featureSet, style=None)
(Constructor)

Parameters:

featureSet (IdSet) - feature ids

getTokenAnnotatedType(self, token, sentenceGraph)

Multiple entities may have the same head token. This method returns a list of the types of all entities that have this token as their head token. If the FeatureBuilder.maximum flag is set, the list is truncated to a length of two, otherwise to a length of one. The truncation is needed because token features (to which the annotated type belongs) are combined into other features, and a large number of annotated type features can lead to an exponential increase in the number of combined features.
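The truncation rule can be sketched as follows. This is a minimal illustration, not the TEES implementation: the function name annotated_types, the entities_by_head mapping and the dictionary-based entities are hypothetical stand-ins, and sorting the types before truncating is an assumption made here so that the result is deterministic.

```python
def annotated_types(token, entities_by_head, maximum=False):
    """Return the (truncated) entity types for which `token` is the head.

    entities_by_head is a hypothetical mapping from a token to the list of
    entities headed by that token; each entity is a dict with a "type" key.
    """
    # Collect each distinct type once; sorting is an assumption of this sketch.
    types = sorted({entity["type"] for entity in entities_by_head.get(token, [])})
    # Keep two types when the "maximum" flag is set, otherwise one.
    limit = 2 if maximum else 1
    return types[:limit]


heads = {"t1": [{"type": "Protein"}, {"type": "Binding"}, {"type": "Gene"}]}
one_type = annotated_types("t1", heads)
two_types = annotated_types("t1", heads, maximum=True)
```

Capping the list keeps the number of combinations small: when the annotated types of several tokens are combined into one feature, each extra type per token multiplies the number of resulting feature names.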

getTokenFeatures(self, token, sentenceGraph, text=True, POS=True, annotatedType=True, stem=False, ontology=True)

Token features are features describing an isolated word token. These subfeatures are often combined into larger features such as n-grams. This method produces and caches a list of feature names for a token in the sentenceGraph sentence. The flags select which attributes are included in the feature name list.

Parameters:

token (cElementTree.Element) - a word token
sentenceGraph (SentenceGraph) - the sentence to which the token belongs
text (boolean)
POS (boolean)
stem (boolean)
annotatedType (boolean)
ontology (boolean)
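The produce-and-cache behaviour described above might look like the sketch below. Everything in it is an assumption for illustration: the TokenFeatureCache class, the attribute dictionary and the "txt_", "POS_" and "stem_" name prefixes are hypothetical, and only three of the five flags are shown.

```python
class TokenFeatureCache:
    """Builds feature names for a token once and reuses them afterwards."""

    def __init__(self):
        self._cache = {}

    def get(self, token_id, attributes, text=True, POS=True, stem=False):
        # The flags are part of the key, so different flag combinations
        # for the same token are cached separately.
        key = (token_id, text, POS, stem)
        if key not in self._cache:
            names = []
            if text:
                names.append("txt_" + attributes["text"])
            if POS:
                names.append("POS_" + attributes["POS"])
            if stem:
                names.append("stem_" + attributes["stem"])
            self._cache[key] = names
        return self._cache[key]


cache = TokenFeatureCache()
attrs = {"text": "binds", "POS": "VBZ", "stem": "bind"}
first = cache.get("t1", attrs)
second = cache.get("t1", attrs)  # served from the cache
with_stem = cache.get("t1", attrs, stem=True)
```

Caching pays off because the same token typically takes part in many combined features within one sentence.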

normalizeFeatureVector(self)

Some machine learning tasks require feature values to be normalized to the range [0,1]. The range is defined as the difference between the largest and smallest feature values in the current feature vector. If this method is used, it should be called as the last step, after all features have been generated.
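As a rough sketch of such a normalization, the function below rescales a feature dictionary in place with standard min-max scaling. The exact formula used by FeatureBuilder is not given here, so the scaling (value - smallest) / range and the handling of a zero range are assumptions, and the function name is hypothetical.

```python
def normalize_feature_vector(features):
    """Rescale the values of `features` in place to the range [0, 1]."""
    if not features:
        return
    smallest = min(features.values())
    largest = max(features.values())
    value_range = largest - smallest
    if value_range == 0:
        # Assumption of this sketch: leave a constant vector unchanged
        # rather than dividing by zero.
        return
    for name, value in list(features.items()):
        features[name] = (value - smallest) / value_range


vector = {"a": 2.0, "b": 4.0, "c": 6.0}
normalize_feature_vector(vector)
```

Because the range depends on every value in the vector, adding a feature after normalization would invalidate the scaling, which is why the call belongs at the very end.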

setFeature(self, name, value=1)

Add a feature to the feature vector. If the feature already exists, its current value is replaced with the new value. All features are prefixed with FeatureBuilder.tag.

Parameters:

name (str)
value (float)
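The overwrite and prefix behaviour can be illustrated with a small stand-in class. This is a sketch under assumptions: TaggedFeatures is hypothetical, it keys the vector by the prefixed name directly, and it leaves out any mapping of names to feature ids through the featureSet.

```python
class TaggedFeatures:
    """Minimal stand-in showing tag prefixing and value replacement."""

    def __init__(self):
        self.tag = ""
        self.features = {}

    def set_tag(self, tag=""):
        self.tag = tag

    def set_feature(self, name, value=1):
        # An existing feature with the same prefixed name is overwritten.
        self.features[self.tag + name] = value


tagged = TaggedFeatures()
tagged.set_tag("e1_")
tagged.set_feature("POS_NN")       # stored as "e1_POS_NN" with value 1
tagged.set_feature("POS_NN", 0.5)  # replaces the previous value
tagged.set_tag()                   # back to an empty prefix
tagged.set_feature("length", 3)
```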

setFeatureVector(self, features, entity1=None, entity2=None)

When the feature builder builds features, they are written to this feature vector.

Parameters:

features (dictionary) - a reference to the feature vector
entity1 (cElementTree.Element) - an entity used by trigger or edge feature builders
entity2 (cElementTree.Element) - an entity used by trigger or edge feature builders
