These functions read and write machine learning example files and
convert examples into final data forms. In memory, each example is a
4-tuple (or list) of the form (id, class, features, extra). id is a
string; class is an int (-1 or +1 for binary classification); features
is a dictionary of int:float pairs, where the int is the feature id and
the float is the feature value; extra is a dictionary of string:string
pairs holding additional information about the example.
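To make the layout concrete, here is a minimal sketch of one example in this format. The concrete values (the id string, feature ids, and the key in extra) are made up for illustration, and is_valid_example is a hypothetical helper, not part of this module.

```python
# Illustrative example in the 4-tuple format described above.
# All concrete values (id, feature ids, extra keys) are made up.
example = (
    "doc1.s2.x0",              # id: string
    1,                         # class: int (+1 or -1 for binary)
    {3: 1.0, 17: 0.5},         # features: feature id (int) -> value (float)
    {"source": "sentence 2"},  # extra: string -> string
)


def is_valid_example(ex):
    """Hypothetical helper: check that ex matches the described layout."""
    if len(ex) != 4:
        return False
    ex_id, ex_class, features, extra = ex
    return (
        isinstance(ex_id, str)
        and isinstance(ex_class, int)
        and isinstance(features, dict)
        and all(isinstance(k, int) and isinstance(v, float)
                for k, v in features.items())
        and isinstance(extra, dict)
        and all(isinstance(k, str) and isinstance(v, str)
                for k, v in extra.items())
    )
```

Since the description allows either a tuple or a list, the check relies only on length and position, not on the container type.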
removeDuplicates(examples)
Removes all but one of the examples that have the same class and
identical feature vectors.
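The behavior described for removeDuplicates can be sketched as follows. This is not the module's actual implementation: the name remove_duplicates_sketch is hypothetical, and keeping the first occurrence is an assumption, since the description does not say which of the duplicates survives.

```python
def remove_duplicates_sketch(examples):
    """Keep one example per (class, feature vector) combination.

    Assumption: the first occurrence is the one that is kept.
    """
    seen = set()
    unique = []
    for example in examples:
        # Dictionaries are unhashable, so a sorted tuple of
        # (feature id, value) pairs serves as the comparison key.
        key = (example[1], tuple(sorted(example[2].items())))
        if key not in seen:
            seen.add(key)
            unique.append(example)
    return unique


examples = [
    ("e1", 1, {1: 1.0, 2: 0.5}, {}),
    ("e2", 1, {2: 0.5, 1: 1.0}, {}),   # same class and features as e1
    ("e3", -1, {1: 1.0, 2: 0.5}, {}),  # same features, different class
]
unique_examples = remove_duplicates_sketch(examples)
```

In this usage, e2 is dropped as a duplicate of e1, while e3 is kept because its class differs.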
writeExamples(examples, filename, commentLines=None)
writePredictions(predictions, exampleFileName)
makeCorpusDivision(corpusElements, fraction=0.5, seed=0)
makeExampleDivision(examples, fraction=0.5)
divideExampleFile(exampleFileName, division, outputDir)