Ivelina Nikolova
2011-12-09 09:39:50 UTC
Dear list members,
I am interested in using Generalized Expectation classification with
Mallet and that is why I went through the examples on the webpage
http://mallet.cs.umass.edu/ge-classification.php
The quick start went well and my data was classified successfully,
but when I tried GE MaxEnt I got the following error:
---------------------------------
Training vectors loaded from baseball-hockey-constraints.train.vectors.unlabeled
Testing vectors loaded from baseball-hockey-constraints.train.vectors
Exception in thread "main" java.lang.RuntimeException: ladies
and
gentlemen
boys
girls
lend
me
....
Training and testing alphabets don't match!
at cc.mallet.classify.tui.Vectors2Classify.main(Vectors2Classify.java:270)
----------------------------------
This is how I imported my training and test sets. The test set is actually
the training set, before its labels were removed.
bin/mallet import-dir --input train/* --output baseball-hockey-constraints.train
bin/vectors2vectors --input baseball-hockey-constraints.train \
--output baseball-hockey-constr.train.unlabeled --hide-target
And this is how I run the classifier:
bin/mallet train-classifier \
--training-file baseball-hockey-constr.train.unlabeled \
--testing-file baseball-hockey-constraints.train \
--trainer "MaxEntGETrainer,gaussianPriorVariance=0.1,constraintsFile=\"test/constraints_baseball_hockey\"" \
--report test:accuracy
Could you please help me resolve this?
Are the training and test sets supposed to have the same alphabet? Can't I
run a test on an arbitrary test set in which some of the words occurring in
the training set do not appear?
Thank you very much in advance!
Ivelina Nikolova