Module: TEES.ExampleBuilders.FeatureBuilders.FeatureBuilder

Class FeatureBuilder



Multiple example builders may use the same features. A feature builder object can therefore be shared by different example builders that require the same feature set.
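The sharing described above can be illustrated with a minimal sketch. All class and method names below (`TokenFeatureBuilder`, `buildTokenFeatures`, `TriggerExampleBuilder`, `EdgeExampleBuilder`) are hypothetical and do not reproduce the TEES implementation; a plain dict stands in for the feature id set. The point is that one builder instance, and therefore one name-to-id mapping, serves two example builders.

```python
from typing import Dict


class TokenFeatureBuilder:
    """Hypothetical minimal feature builder producing one text feature per token."""

    def __init__(self, featureSet: Dict[str, int]):
        self.featureSet = featureSet  # feature name -> integer id, shared by all users
        self.features: Dict[int, float] = {}

    def setFeatureVector(self, features: Dict[int, float]) -> None:
        self.features = features  # subsequent features are written into this dict

    def buildTokenFeatures(self, token: str) -> None:
        name = "txt_" + token.lower()  # lowercasing is an illustrative choice
        # Unseen names get the next free id (ids start at 1 here, an assumption).
        featureId = self.featureSet.setdefault(name, len(self.featureSet) + 1)
        self.features[featureId] = 1


class TriggerExampleBuilder:
    """Hypothetical example builder: one example per token."""

    def __init__(self, featureBuilder: TokenFeatureBuilder):
        self.featureBuilder = featureBuilder

    def buildExample(self, token: str) -> Dict[int, float]:
        features: Dict[int, float] = {}
        self.featureBuilder.setFeatureVector(features)
        self.featureBuilder.buildTokenFeatures(token)
        return features


class EdgeExampleBuilder:
    """Hypothetical example builder: one example per token pair."""

    def __init__(self, featureBuilder: TokenFeatureBuilder):
        self.featureBuilder = featureBuilder

    def buildExample(self, token1: str, token2: str) -> Dict[int, float]:
        features: Dict[int, float] = {}
        self.featureBuilder.setFeatureVector(features)
        for token in (token1, token2):
            self.featureBuilder.buildTokenFeatures(token)
        return features


# One builder and one feature set, shared by both example builders.
featureSet: Dict[str, int] = {}
sharedBuilder = TokenFeatureBuilder(featureSet)
triggerExample = TriggerExampleBuilder(sharedBuilder).buildExample("Binds")
edgeExample = EdgeExampleBuilder(sharedBuilder).buildExample("binds", "protein")
```

Because the builder is shared, the feature `txt_binds` receives the same id in both the trigger example and the edge example.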

Instance Methods

__init__(self, featureSet, style=None)

getEntityType(self, entity)

getPOSSuperType(self, pos)

getTokenAnnotatedType(self, token, sentenceGraph)
Multiple entities may have the same head token.

getTokenFeatures(self, token, sentenceGraph, text=True, POS=True, annotatedType=True, stem=False, ontology=True)
Token features are features describing an isolated word token.

normalizeFeatureVector(self)
Some machine learning tasks require feature values to be normalized to the range [0, 1].

setFeature(self, name, value=1)
Add a feature to the feature vector.

setFeatureVector(self, features, entity1=None, entity2=None)
Set the feature vector into which the feature builder places the features it builds.

setTag(self, tag='')

Method Details

__init__(self, featureSet, style=None)
(Constructor)

Parameters:
  • featureSet (IdSet) - feature ids
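To show what a set of feature ids provides, here is a hypothetical stand-in for IdSet. The class name `IdSetSketch`, its attributes and its methods are assumptions for illustration, not the TEES IdSet API. It assigns consecutive integer ids to feature names, can map ids back to names, and can be locked so that unseen names receive no id.

```python
from typing import Dict, Optional


class IdSetSketch:
    """Hypothetical stand-in for IdSet: a two-way mapping of names and integer ids."""

    def __init__(self, firstId: int = 1):
        self.nextFreeId = firstId
        self.ids: Dict[str, int] = {}    # name -> id
        self.names: Dict[int, str] = {}  # id -> name
        self.locked = False              # when True, unseen names get no id

    def getId(self, name: str) -> Optional[int]:
        if name not in self.ids:
            if self.locked:
                return None
            self.ids[name] = self.nextFreeId
            self.names[self.nextFreeId] = name
            self.nextFreeId += 1
        return self.ids[name]

    def getName(self, featureId: int) -> Optional[str]:
        return self.names.get(featureId)
```

Locking in this sketch lets a set built from training data be reused later without growing when new feature names appear.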

getTokenAnnotatedType(self, token, sentenceGraph)


Multiple entities may have the same head token. This method returns a list of the types of all entities that have this token as their head token. If the FeatureBuilder.maximum flag is set, the list is truncated to a length of two; otherwise, it is truncated to a length of one. This is done because token features (to which the annotated type belongs) are combined into other features, and a large number of annotated type features can then lead to an exponential increase in the number of features.
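The truncation rule can be sketched as a standalone function. This is a minimal sketch, not the TEES method: a hypothetical dict `headedTypes` stands in for sentenceGraph, and deduplicating and sorting the types before truncation is an assumption made so the result is deterministic.

```python
from typing import Dict, List


def getTokenAnnotatedType(tokenId: str, headedTypes: Dict[str, List[str]],
                          maximum: bool = False) -> List[str]:
    """Return the types of entities headed by tokenId, truncated to 1 or 2 items.

    headedTypes is a hypothetical stand-in for sentenceGraph: it maps a token id
    to the types of all entities whose head token it is.
    """
    # Deduplicate and sort (an assumption) so the truncation is deterministic.
    types = sorted(set(headedTypes.get(tokenId, [])))
    limit = 2 if maximum else 1  # the length limits described above
    return types[:limit]
```

The limit matters because combined features multiply: if each of k tokens contributes t annotated types, a feature combining them can take up to t^k values, for example 3^3 = 27 for a trigram with three types per token, versus a single value when each token contributes one type.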

getTokenFeatures(self, token, sentenceGraph, text=True, POS=True, annotatedType=True, stem=False, ontology=True)


Token features are features describing an isolated word token. These subfeatures are often combined into larger features, such as n-grams. This method produces and caches a list of feature names for a token in the sentence represented by sentenceGraph. The flags control which attributes are included in the feature name list.

Parameters:
  • token (cElementTree.Element) - a word token
  • sentenceGraph (SentenceGraph) - the sentence to which the token belongs
  • text (boolean)
  • POS (boolean)
  • stem (boolean)
  • annotatedType (boolean)
  • ontology (boolean)
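The flag-controlled, cached behaviour can be sketched as follows. This hypothetical `TokenFeatureCache` covers only the text, POS and annotatedType flags (stem and ontology are omitted), represents tokens as plain dicts rather than cElementTree elements, and uses feature name prefixes (`txt_`, `POS_`, `annType_`) chosen for illustration only.

```python
from typing import Dict, List, Tuple


class TokenFeatureCache:
    """Hypothetical sketch of flag-controlled, cached token feature names."""

    def __init__(self):
        # (token id, flags...) -> feature names; assumes token ids are unique
        # among the tokens whose features are cached.
        self.cache: Dict[Tuple, Tuple[str, ...]] = {}

    def getTokenFeatures(self, token: Dict[str, str], annotatedTypes: List[str],
                         text: bool = True, POS: bool = True,
                         annotatedType: bool = True) -> Tuple[str, ...]:
        # annotatedTypes is not part of the key: a token's types are assumed fixed.
        key = (token["id"], text, POS, annotatedType)
        if key not in self.cache:
            names: List[str] = []
            if text:
                names.append("txt_" + token["text"])
            if POS:
                names.append("POS_" + token["POS"])
            if annotatedType:
                names.extend("annType_" + t for t in annotatedTypes)
            # Stored as a tuple so callers cannot modify the cached entry.
            self.cache[key] = tuple(names)
        return self.cache[key]
```

Including the flags in the cache key means that requests with different flag combinations for the same token do not overwrite each other.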

normalizeFeatureVector(self)


Some machine learning tasks require feature values to be normalized to the range [0, 1]. The range is defined as the difference between the largest and smallest feature values in the current feature vector. If this method is used, it should be called as the last step, after all features have been generated.
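The text defines the range but not the exact rescaling formula, so the following sketch assumes min-max scaling, (value - min) / (max - min), applied in place to a dict mapping feature ids to values. It is an illustration under that assumption, not the TEES implementation.

```python
from typing import Dict


def normalizeFeatureVector(features: Dict[int, float]) -> None:
    """Rescale feature values in place to [0, 1] (min-max scaling, an assumption)."""
    if not features:
        return
    low = min(features.values())
    valueRange = max(features.values()) - low  # largest minus smallest value
    for featureId, value in list(features.items()):
        if valueRange == 0:
            # All values are equal; mapping them to 1.0 is an arbitrary choice.
            features[featureId] = 1.0
        else:
            features[featureId] = (value - low) / valueRange
```

Under this formula the smallest value maps to 0, which in a sparse format is equivalent to dropping that feature. Dividing by the range without subtracting the minimum keeps it non-zero, but can produce values above 1 when the minimum is positive. Either way, a feature set after normalization is not rescaled, which is why the method belongs at the end.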

setFeature(self, name, value=1)


Add a feature to the feature vector. If the feature already exists, its current value is replaced with the new value. All features are prefixed with FeatureBuilder.tag.

Parameters:
  • name (str)
  • value (float)
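The replace-on-write and tag-prefix semantics can be sketched with a hypothetical class, `TaggedFeatureVector`, that uses a plain dict as the feature id set. The tag `"e1_"` in the usage is also hypothetical; it only shows how a tag keeps otherwise identical feature names apart.

```python
from typing import Dict


class TaggedFeatureVector:
    """Hypothetical sketch of the setFeature/setTag semantics described above."""

    def __init__(self, featureSet: Dict[str, int]):
        self.featureSet = featureSet  # name -> id, grows as new names are seen
        self.features: Dict[int, float] = {}
        self.tag = ""

    def setTag(self, tag: str = "") -> None:
        self.tag = tag

    def setFeature(self, name: str, value: float = 1) -> None:
        fullName = self.tag + name  # every feature name is prefixed with the tag
        if fullName not in self.featureSet:
            self.featureSet[fullName] = len(self.featureSet) + 1
        # Plain assignment: an existing value is replaced, not accumulated.
        self.features[self.featureSet[fullName]] = value


fs: Dict[str, int] = {}
builder = TaggedFeatureVector(fs)
builder.setFeature("txt_binds")
builder.setTag("e1_")
builder.setFeature("txt_binds", 0.5)
builder.setFeature("txt_binds", 2)  # replaces 0.5
```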

setFeatureVector(self, features, entity1=None, entity2=None)


Set the feature vector into which the feature builder places the features it builds.

Parameters:
  • features (dictionary) - a reference to the feature vector
  • entity1 (cElementTree.Element) - an entity used by trigger or edge feature builders
  • entity2 (cElementTree.Element) - an entity used by trigger or edge feature builders
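The reference semantics can be illustrated with a hypothetical builder whose `buildEntityTypeFeatures` method and feature names are assumptions for this sketch. The builder stores a reference to the caller's dict, not a copy, along with the optional entities. Python 3's `xml.etree.ElementTree` is used because `cElementTree` was removed from the standard library in Python 3.9.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


class EntityAwareFeatureBuilder:
    """Hypothetical builder that writes into a caller-supplied feature vector."""

    def __init__(self):
        self.features: Optional[Dict[str, float]] = None
        self.entity1: Optional[ET.Element] = None
        self.entity2: Optional[ET.Element] = None

    def setFeatureVector(self, features: Dict[str, float],
                         entity1: Optional[ET.Element] = None,
                         entity2: Optional[ET.Element] = None) -> None:
        self.features = features  # keep a reference, so the caller sees all writes
        self.entity1 = entity1
        self.entity2 = entity2

    def buildEntityTypeFeatures(self) -> None:
        # One feature per entity that was supplied (hypothetical feature names).
        for index, entity in enumerate((self.entity1, self.entity2), start=1):
            if entity is not None:
                self.features["e%d_type_%s" % (index, entity.get("type"))] = 1


e1 = ET.Element("entity", {"id": "e1", "type": "Protein"})
e2 = ET.Element("entity", {"id": "e2", "type": "Binding"})
builder = EntityAwareFeatureBuilder()

edgeFeatures: Dict[str, float] = {}
builder.setFeatureVector(edgeFeatures, e1, e2)  # edge example: two entities
builder.buildEntityTypeFeatures()

triggerFeatures: Dict[str, float] = {}
builder.setFeatureVector(triggerFeatures, e1)  # trigger example: one entity
builder.buildEntityTypeFeatures()
```

Switching to a new dict before each example keeps the feature vectors of different examples separate while reusing the same builder.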