Discussion:
Using the Mallet command-line tool to run an hLDA topic model
Michael O'Brien
2016-02-15 14:38:47 UTC
Permalink
Forgive the very basic question, but on Windows how can I get an hLDA
(hierarchical LDA) topic model using the Mallet command line, i.e. without
writing my own Java code?

I tried --help, but there isn't a parameter I can pass to bin\mallet
topic-model to say whether I want LDA or hLDA used.

Any pointers?

Michael
Antonio Jesús Hernández Blanco
2016-02-15 15:03:15 UTC
Permalink
Hi Michael,

LDA is the default in Mallet when you use the train-topics command. For hLDA,
use cc.mallet.topics.tui.HierarchicalLDATUI. To see its options, run this from
the command line: mallet run cc.mallet.topics.tui.HierarchicalLDATUI --help
_______________________________________________
mallet-dev mailing list
https://mailman.cs.umass.edu/mailman/listinfo/mallet-dev
--
Antonio Jesús Hernández Blanco
Doctoral Student
Computer Science
Department of Software and Computer Systems
University of Alicante
Mov. (+34) 622 64 90 50
(+593) 982 21 69 73
Tel. (+34) 965 90 00 00 (Ext. 2961)
Fax: (+34) 965 90 93 26
Michael O'Brien
2016-02-15 15:26:51 UTC
Permalink
Hola Antonio,

Thanks for responding.

Can I just confirm that you get the following?

C:\Mallet>bin\mallet cc.mallet.topics.tui.HierarchicalLDATUI --help
Mallet 2.0 commands:
  import-dir         load the contents of a directory into mallet instances (one per file)
  import-file        load a single file into mallet instances (one per line)
  import-svmlight    load a single SVMLight format data file into mallet instances (one per line)
  info               get information about Mallet instances
  train-classifier   train a classifier from Mallet data files
  classify-dir       classify data from a single file with a saved classifier
  classify-file      classify the contents of a directory with a saved classifier
  classify-svmlight  classify data from a single file in SVMLight format
  train-topics       train a topic model from Mallet data files
  infer-topics       use a trained topic model to infer topics for new documents
  evaluate-topics    estimate the probability of new documents given a trained model
  prune              remove features based on frequency or information gain
  split              divide data into testing, training, and validation portions
  bulk-load          for big input files, efficiently prune vocabulary and import docs
Include --help with any option for more information

I ask because it looks the same as the general info output, so it doesn't tell
me anything specific about running hLDA:

C:\Mallet>bin\mallet -info
Mallet 2.0 commands:
  import-dir         load the contents of a directory into mallet instances (one per file)
  import-file        load a single file into mallet instances (one per line)
  import-svmlight    load a single SVMLight format data file into mallet instances (one per line)
  info               get information about Mallet instances
  train-classifier   train a classifier from Mallet data files
  classify-dir       classify data from a single file with a saved classifier
  classify-file      classify the contents of a directory with a saved classifier
  classify-svmlight  classify data from a single file in SVMLight format
  train-topics       train a topic model from Mallet data files
  infer-topics       use a trained topic model to infer topics for new documents
  evaluate-topics    estimate the probability of new documents given a trained model
  prune              remove features based on frequency or information gain
  split              divide data into testing, training, and validation portions
  bulk-load          for big input files, efficiently prune vocabulary and import docs
Include --help with any option for more information

Antonio Jesús Hernández Blanco
2016-02-15 15:40:43 UTC
Permalink
Hi Michael,

You forgot to put "run" before
cc.mallet.topics.tui.HierarchicalLDATUI:

mallet run cc.mallet.topics.tui.HierarchicalLDATUI --help
Hierarchical LDA with a fixed tree depth.
--help TRUE|FALSE
  Print this command line option usage information. Give argument of
  TRUE for longer documentation
  Default is false
--prefix-code 'JAVA CODE'
  Java code you want run before any other interpreted code. Note that
  the text is interpreted without modification, so unlike some other Java
  code options, you need to include any necessary 'new's when creating
  objects.
  Default is null
--config FILE
  Read command option values from a file
  Default is null
--input FILENAME
  The filename from which to read the list of training instances. Use -
  for stdin. The instances must be FeatureSequence or
  FeatureSequenceWithBigrams, not FeatureVector
  Default is null
--testing FILENAME
  The filename from which to read the list of instances for held-out
  likelihood calculation. Use - for stdin. The instances must be
  FeatureSequence or FeatureSequenceWithBigrams, not FeatureVector
  Default is null
--output-state FILENAME
  The filename in which to write the Gibbs sampling state after at the
  end of the iterations. By default this is null, indicating that no file
  will be written.
  Default is null
--random-seed INTEGER
.....
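
For anyone scripting this, here is a minimal Python sketch of the rule described above: a fully qualified class name needs "run" in front of it, while a short command name is passed straight to the launcher. The helper name mallet_argv and the "contains a dot" heuristic are assumptions made for illustration; they are not part of Mallet. The function only builds an argument list and does not start Mallet.

```python
import re

# Hypothetical heuristic: a dotted identifier such as
# cc.mallet.topics.tui.HierarchicalLDATUI is treated as a Java class name.
_CLASS_NAME = re.compile(r"^[A-Za-z_]\w*(\.[A-Za-z_]\w*)+$")


def mallet_argv(target, *options, launcher=r"bin\mallet"):
    """Build the argument list for one call to the Mallet launcher.

    Class names get "run" inserted before them; short command names
    (train-topics, import-dir, ...) are passed through unchanged.
    """
    if _CLASS_NAME.match(target):
        return [launcher, "run", target, *options]
    return [launcher, target, *options]


# Print the argument list that would show the hLDA options.
print(mallet_argv("cc.mallet.topics.tui.HierarchicalLDATUI", "--help"))
```

The resulting list could be handed to subprocess.run from the Mallet install directory; that step is left out here because it depends on the local setup.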
Michael O'Brien
2016-02-15 15:49:02 UTC
Permalink
Ah, my mistake. Final question: does the "Mallet file" created using

mallet import-dir --input \data\johndoediary --output johndoediary.mallet --keep-sequence

work with the hLDA algorithm just like it did for LDA?


Antonio Jesús Hernández Blanco
2016-02-15 15:58:54 UTC
Permalink
Yes, the same file, "johndoediary.mallet", can be used for both LDA and hLDA.
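
The reason one file serves both is, as far as I understand it, that --keep-sequence makes the importer store each document as a FeatureSequence, which is what the hLDA help text quoted earlier asks for ("must be FeatureSequence or FeatureSequenceWithBigrams, not FeatureVector"). A small Python sketch of building that import call follows; import_dir_argv is a hypothetical helper name, and the paths are the ones from the question.

```python
def import_dir_argv(input_dir, output_file, keep_sequence=True,
                    launcher=r"bin\mallet"):
    """Build the argument list for a Mallet import-dir call.

    keep_sequence defaults to True because the topic-model tools in this
    thread expect sequence instances rather than feature vectors.
    """
    argv = [launcher, "import-dir", "--input", input_dir,
            "--output", output_file]
    if keep_sequence:
        argv.append("--keep-sequence")
    return argv


# Same directory and output file as in the question above.
print(import_dir_argv(r"\data\johndoediary", "johndoediary.mallet"))
```

Note the space between --output and the file name, which the command as originally typed was missing.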
Michael O'Brien
2016-02-15 16:09:02 UTC
Permalink
Great, thank you

Michael Muller
2016-02-15 20:07:55 UTC
Permalink
Dear All,

We have experimented with hLDA in Mallet. The feature set is not as rich
as that of train-topics. That said, there are interesting capabilities.


What we usually say (in a Windows environment) is:

bin\mallet hlda --input <INPUT>.mallet --num-top-words 10
--num-levels 3 --show-topics-interval 50 --output-state
topic-state-<OUTPUT>.txt --eta 0.1 --gamma 1.0 --alpha 10.0

(courtesy of Abbas Ghadrigolestani).

Of course, you will need to specify something in place of "<INPUT>" and
"<OUTPUT>", and you will need to have transformed your text file into a
.mallet file in a prior step, using bin\mallet import-file ...

You may also want to adjust the values of eta, gamma, and alpha (although
the solutions that I have run have seemed to be relatively insensitive to
changes of an order of magnitude in each).
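
To try that kind of order-of-magnitude change systematically, one could generate the command lines from a script. The Python sketch below uses only the options shown in the command above; the helper names hlda_argv and order_of_magnitude_sweep, and the example file names, are hypothetical. It builds argument lists and does not run Mallet.

```python
def hlda_argv(input_file, output_state, num_levels=3, num_top_words=10,
              show_topics_interval=50, eta=0.1, gamma=1.0, alpha=10.0,
              launcher=r"bin\mallet"):
    """Build the argument list for the hlda command shown above."""
    return [
        launcher, "hlda",
        "--input", input_file,
        "--num-top-words", str(num_top_words),
        "--num-levels", str(num_levels),
        "--show-topics-interval", str(show_topics_interval),
        "--output-state", output_state,
        "--eta", str(eta),
        "--gamma", str(gamma),
        "--alpha", str(alpha),
    ]


def order_of_magnitude_sweep(value):
    """Return value / 10, value, and value * 10, rounded to 6 significant digits."""
    return [float(f"{value * factor:.6g}") for factor in (0.1, 1.0, 10.0)]


# One command line per eta setting; the file names are placeholders.
for eta in order_of_magnitude_sweep(0.1):
    print(hlda_argv("diary.mallet", f"topic-state-eta-{eta}.txt", eta=eta))
```

The same loop can be repeated for gamma and alpha to check whether a particular corpus is as insensitive to these settings as described.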

I was initially confused by the format of the output. In the above
example, there are three levels. I am using my own words in the next
sentence: The output contains a single "root" topic, a number of
"daughter" subtopics, and a number of "granddaughter" subsubtopics. The
level of each topic is indicated by indentation. The parenthood of each
sub(sub)topic is indicated by sequential position.
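
To make that reading of the output concrete, here is a rough Python sketch that rebuilds the parent links from indentation and order alone. It assumes two spaces of indentation per level and uses made-up topic words; the real output may differ in indentation width and in what else appears on each line, so treat parse_topic_tree as an illustration of the idea rather than a parser for the actual file.

```python
def parse_topic_tree(text, indent_width=2):
    """Turn indented topic lines into records with level and parent index.

    A topic's parent is the most recent earlier line that is indented
    one level less. indent_width is an assumption, not a Mallet constant.
    """
    nodes = []
    last_at_level = {}  # level -> index of the latest node seen at that level
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        stripped = line.lstrip(" ")
        level = (len(line) - len(stripped)) // indent_width
        parent = last_at_level.get(level - 1)  # None for the root topic
        nodes.append({"level": level, "label": stripped.strip(), "parent": parent})
        last_at_level[level] = len(nodes) - 1
    return nodes


# Made-up three-level example: root, two daughters, one granddaughter each.
sample = (
    "diary day went\n"
    "  work office meeting\n"
    "    deadline report boss\n"
    "  home family dinner\n"
    "    garden weekend dog\n"
)
for node in parse_topic_tree(sample):
    print(node)
```

In this example the second granddaughter is attached to "home family dinner" rather than to the first daughter, purely because of where it appears in the sequence.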

Good luck,
--michael
-----
Michael Muller, PhD, IBM Research, Cambridge MA USA






From: mallet-dev-***@cs.umass.edu
To: mallet-***@cs.umass.edu
Date: 02/15/2016 12:00 PM
Subject: mallet-dev Digest, Vol 7, Issue 7
Sent by: "mallet-dev" <mallet-dev-***@cs.umass.edu>



Send mallet-dev mailing list submissions to
mallet-***@cs.umass.edu

To subscribe or unsubscribe via the World Wide Web, visit
https://mailman.cs.umass.edu/mailman/listinfo/mallet-dev
or, via email, send a message with subject or body 'help' to
mallet-dev-***@cs.umass.edu

You can reach the person managing the list at
mallet-dev-***@cs.umass.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of mallet-dev digest..."


Today's Topics:

1. Re: Using Mallet command tool to execute HDA topic model
(Michael O'Brien)


----------------------------------------------------------------------

Message: 1
Date: Mon, 15 Feb 2016 16:09:02 +0000
From: "Michael O'Brien" <***@gmail.com>
To: ***@gmail.com, mallet-***@cs.umass.edu
Subject: Re: Using Mallet command tool to execute HDA topic model
Message-ID:
<CAPG8FQjJe226XoZV0Qquqx3_7_86U57QyY-***@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Great, thank you

On Mon, 15 Feb 2016, 15:58 Antonio Jes?s Hern?ndez Blanco <
Post by Antonio Jesús Hernández Blanco
Yes, the same file "johndoediary.mallet", is used for LDA and hLDA.
Ah, My Mistake, Final question does the "Mallet File" created using
mallet
Post by Antonio Jesús Hernández Blanco
import-dir --input \data\johndoediary --outputjohndoediary.mallet \
--keep-sequence
work with the HDA algorithm just like it did for LDA
On Mon, 15 Feb 2016 at 15:40 Antonio Jes?s Hern?ndez Blanco <
Post by Antonio Jesús Hernández Blanco
Hi Michael,
You forgot to put the "run" before
cc.mallet.topics.tui.mallet.HierarchicalLDATUI
mallet run cc.mallet.topics.tui.HierarchicalLDATUI --help
Hierarchical LDA with a fixed tree depth.
--help TRUE|FALSE
Print this command line option usage information. Give argument of
TRUE for longer documentation
Default is false
--prefix-code 'JAVA CODE'
Java code you want run before any other interpreted code. Note that
the text is interpreted without modification, so unlike some other Java
code options, you need to include any necessary 'new's when creating
objects.
Default is null
--config FILE
Read command option values from a file
Default is null
--input FILENAME
The filename from which to read the list of training instances. Use
-
Post by Antonio Jesús Hernández Blanco
Post by Antonio Jesús Hernández Blanco
for stdin. The instances must be FeatureSequence or
FeatureSequenceWithBigrams, not FeatureVector
Default is null
--testing FILENAME
The filename from which to read the list of instances for held-out
likelihood calculation. Use - for stdin. The instances must be
FeatureSequence or FeatureSequenceWithBigrams, not FeatureVector
Default is null
--output-state FILENAME
The filename in which to write the Gibbs sampling state after at the
end of the iterations. By default this is null, indicating that no
file
Post by Antonio Jesús Hernández Blanco
Post by Antonio Jesús Hernández Blanco
will be written.
Default is null
--random-seed INTEGER
.....
Hola Antonio,
Thanks for responding.
Can I just confirm you get the following
C:\Mallet>bin\mallet cc.mallet.topics.tui.HierarchicalLDATUI --help
Mallet 2.0 commands:
  import-dir         load the contents of a directory into mallet instances (one per file)
  import-file        load a single file into mallet instances (one per line)
  import-svmlight    load a single SVMLight format data file into mallet instances (one per line)
  info               get information about Mallet instances
  train-classifier   train a classifier from Mallet data files
  classify-dir       classify data from a single file with a saved classifier
  classify-file      classify the contents of a directory with a saved classifier
  classify-svmlight  classify data from a single file in SVMLight format
  train-topics       train a topic model from Mallet data files
  infer-topics       use a trained topic model to infer topics for new documents
  evaluate-topics    estimate the probability of new documents given a trained model
  prune              remove features based on frequency or information gain
  split              divide data into testing, training, and validation portions
  bulk-load          for big input files, efficiently prune vocabulary and import docs
Include --help with any option for more information
Because it looks similar to the general info details, so doesn't tell me
anything specific about running HDA
C:\Mallet>bin\mallet -info
Mallet 2.0 commands:
  import-dir         load the contents of a directory into mallet instances (one per file)
  import-file        load a single file into mallet instances (one per line)
  import-svmlight    load a single SVMLight format data file into mallet instances (one per line)
  info               get information about Mallet instances
  train-classifier   train a classifier from Mallet data files
  classify-dir       classify data from a single file with a saved classifier
  classify-file      classify the contents of a directory with a saved classifier
  classify-svmlight  classify data from a single file in SVMLight format
  train-topics       train a topic model from Mallet data files
  infer-topics       use a trained topic model to infer topics for new documents
  evaluate-topics    estimate the probability of new documents given a trained model
  prune              remove features based on frequency or information gain
  split              divide data into testing, training, and validation portions
  bulk-load          for big input files, efficiently prune vocabulary and import docs
Include --help with any option for more information
On Mon, 15 Feb 2016 at 15:11 Antonio Jesús Hernández Blanco
Post by Antonio Jesús Hernández Blanco
Hi Michael,
LDA is by default in mallet when used train-topics option. Now, for HLDA
use cc.mallet.topics.tui.HierarchicalLDATUI, to know its options, from
command line: mallet run cc.mallet.topics.tui.HierarchicalLDATUI --help
Forgive the very basis question but on Windows how can I get a HDA topic
model using the Mallet command line i.e not writing my own Java code?
I tried the --help but there isn't a parameter I can pass to bin\mallet
topic-model to say I want LDA or HDA used
Any pointers?
Michael
Michael O'Brien
2016-02-16 16:35:15 UTC
Permalink
Hi Michael,

I'm still a bit confused by the output. Firstly, in a break from the LDA
output, the output file is space-delimited, not tab-delimited.

For portability I've replaced each space with "-" in the output below.

When I ran hLDA with --num-levels 3 I got the following output. I left out
the 1st entry because I thought the word could confuse the discussion.

11-1-0-1-course-2
11-1-0-2-retail-2
11-1-0-3-management-2

Working left to right:
The 1st number ranges from a min of 2 to a max of 95, but not sequentially
The 2nd number is one of (1, 3, 5, 7, 14, 24, 32)
The 3rd number is always zero
The 4th number increments sequentially from 0
The word appears in my corpus but is repeated elsewhere in the output
The number after the word appears to be the level number (0, 1, 2)

When I ran it with --num-levels 10 I got:
449 309 308 298 297 29 28 11 10 0 1 course 0
449 309 308 298 297 29 28 11 10 0 2 retail 5
449 309 308 298 297 29 28 11 10 0 3 management 4

Could you shed some light on the output?
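
One possible reading of these lines, offered as a sketch rather than a confirmed description of the format: both samples are consistent with the first num-levels fields being node IDs along one path of the tree (each path ends at 0, presumably the root), followed by a word ID, the word itself, and the level assigned to that token. Under that assumption the lines can be split as below. The names StateRow and parse_state_line are made up for this example and are not part of Mallet.

```python
from typing import NamedTuple, Tuple


class StateRow(NamedTuple):
    path: Tuple[int, ...]  # node IDs, assumed to run from the leaf up to the root
    type_index: int        # assumed to be the word's ID in the vocabulary
    word: str
    level: int             # tree level this token appears to be assigned to


def parse_state_line(line: str, num_levels: int) -> StateRow:
    """Split one space-delimited state line into its assumed fields."""
    fields = line.split()
    if len(fields) != num_levels + 3:
        raise ValueError(f"expected {num_levels + 3} fields, got {len(fields)}")
    path = tuple(int(f) for f in fields[:num_levels])
    type_index = int(fields[num_levels])
    word = fields[num_levels + 1]
    level = int(fields[num_levels + 2])
    return StateRow(path, type_index, word, level)


# Sample lines taken from the two runs quoted above.
row3 = parse_state_line("11 1 0 1 course 2", num_levels=3)
row10 = parse_state_line(
    "449 309 308 298 297 29 28 11 10 0 2 retail 5", num_levels=10
)
print(row3)
print(row10.path[-1], row10.word, row10.level)
```

If this reading holds, it would explain why the field just before the word ID is always zero: every path ends at the same root node.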
Post by Michael Muller
Dear All,
We have experimented with hLDA in Mallet. The feature-set is not as rich
as for train-topics. That being said, there are interesting capabilities.
bin\mallet hlda --input <INPUT>.mallet --num-top-words 10
--num-levels 3 --show-topics-interval 50 --output-state
topic-state-<OUTPUT>.txt --eta 0.1 --gamma 1.0 --alpha 10.0
(courtesy of Abbas Ghadrigolestani).
Of course, you will need to specify something in place of "<INPUT>" and
"<OUTPUT>"; and you will need to have transformed your text file into a
.mallet file, in a prior step using bin\mallet import-file ...
You may also want to adjust the values of eta, gamma, and alpha (although
the solutions that I have run have seemed to be relatively insensitive to
changes of an order of magnitude in each).
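
For anyone scripting several runs with different eta, gamma, and alpha values, the command above can be assembled programmatically. This is only a sketch: build_hlda_command is a hypothetical helper, the file names are placeholders borrowed from earlier in the thread, and the flags are exactly the ones quoted above. Whether the hlda subcommand exists may depend on the Mallet version; earlier in this thread the same tool was reached with mallet run cc.mallet.topics.tui.HierarchicalLDATUI.

```python
def build_hlda_command(input_file, output_state, num_levels=3,
                       num_top_words=10, show_topics_interval=50,
                       eta=0.1, gamma=1.0, alpha=10.0,
                       mallet=r"bin\mallet"):
    """Return the hLDA invocation as an argument list (nothing is executed)."""
    return [
        mallet, "hlda",
        "--input", input_file,
        "--num-top-words", str(num_top_words),
        "--num-levels", str(num_levels),
        "--show-topics-interval", str(show_topics_interval),
        "--output-state", output_state,
        "--eta", str(eta),
        "--gamma", str(gamma),
        "--alpha", str(alpha),
    ]


# Placeholder file names; substitute your own .mallet file and output path.
cmd = build_hlda_command("johndoediary.mallet", "topic-state-johndoediary.txt")
print(" ".join(cmd))
```

The list can then be handed to subprocess.run(cmd, check=True) from the Mallet install directory, which avoids quoting problems on Windows.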
I was initially confused by the format of the output. In the above
example, there are three levels. I am using my own words in the next
sentence: The output contains a single "root" topic, a number of
"daughter" subtopics, and a number of "granddaughter" subsubtopics. The
level of each topic is indicated by indentation. The parenthood of each
sub(sub)topic is indicated by sequential position.
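
To make the root / daughter / granddaughter nesting concrete, here is a small sketch that rebuilds such a tree from paths of node IDs and prints it with indentation. It assumes each path is listed from the leaf up to the root, which is a guess based on the state-file lines discussed in this thread. The first path (11, 1, 0) comes from that output; the other two are invented for illustration, and the indented listing is not Mallet's own printed format.

```python
def build_tree(paths):
    """Map each node ID to a sorted list of its children, given leaf-to-root paths."""
    children = {}
    for path in paths:
        root_to_leaf = list(reversed(path))
        # Link every node on the path to the node directly below it.
        for parent, child in zip(root_to_leaf, root_to_leaf[1:]):
            children.setdefault(parent, set()).add(child)
    return {node: sorted(kids) for node, kids in children.items()}


def render(tree, node=0, depth=0):
    """Return the subtree under `node` as indented lines, two spaces per level."""
    lines = ["  " * depth + str(node)]
    for child in tree.get(node, []):
        lines.extend(render(tree, child, depth + 1))
    return lines


# (11, 1, 0) is from the thread; the other two paths are hypothetical.
paths = [(11, 1, 0), (12, 1, 0), (14, 3, 0)]
tree = build_tree(paths)
print("\n".join(render(tree)))
```

Here node 0 plays the role of the root topic, nodes 1 and 3 are daughters, and nodes 11, 12, and 14 are granddaughters.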
Good luck,
--michael
-----
Michael Muller, PhD, IBM Research, Cambridge MA USA