Apache OpenNLP N-gram model creation example

A simple code example of building an N-gram model with Apache OpenNLP.


The code:

package com.denismigol.examples.nlp;

import opennlp.tools.ngram.NGramModel;
import opennlp.tools.tokenize.WhitespaceTokenizer;
import opennlp.tools.util.StringList;

/**
 * @author Denis Migol
 */
public class OpenNlpNGramDemo {
    public static void main(String[] args) {
        String text = "This is an example text for n-gram";
        System.out.println(text);

        StringList tokens = new StringList(WhitespaceTokenizer.INSTANCE.tokenize(text));
        System.out.println("Tokens: " + tokens);

        // Add every n-gram of length 2 to 3 (bigrams and trigrams) built from the tokens.
        NGramModel nGramModel = new NGramModel();
        nGramModel.add(tokens, 2, 3);

        // numberOfGrams() is the sum of all n-gram counts; each n-gram occurs once here.
        System.out.println("Total ngrams: " + nGramModel.numberOfGrams());
        for (StringList ngram : nGramModel) {
            System.out.println(nGramModel.getCount(ngram) + " - " + ngram);
        }
    }
}

Output:

This is an example text for n-gram
Tokens: [This,is,an,example,text,for,n-gram]
Total ngrams: 11
1 - [text,for,n-gram]
1 - [for,n-gram]
1 - [This,is,an]
1 - [is,an]
1 - [example,text]
1 - [an,example]
1 - [This,is]
1 - [text,for]
1 - [is,an,example]
1 - [an,example,text]
1 - [example,text,for]
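
Cross-checking the total:

The count of 11 in the output above follows from the token list: 7 tokens give 6 bigrams and 5 trigrams. As a rough cross-check, here is a minimal plain-Java sketch that enumerates the same n-grams with a sliding window. It does not use OpenNLP; the class name PlainNGramSketch and the helper ngrams are made up for this illustration, and it lists the n-grams in sliding-window order rather than in the order of the listing above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PlainNGramSketch {
    // Hypothetical helper (not part of OpenNLP): collects every contiguous
    // run of minLength..maxLength tokens using a sliding window.
    static List<List<String>> ngrams(List<String> tokens, int minLength, int maxLength) {
        List<List<String>> result = new ArrayList<>();
        for (int n = minLength; n <= maxLength; n++) {
            // The last valid window of size n starts at tokens.size() - n.
            for (int start = 0; start + n <= tokens.size(); start++) {
                result.add(new ArrayList<>(tokens.subList(start, start + n)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Split on whitespace, as the whitespace tokenizer does for this sentence.
        List<String> tokens = Arrays.asList("This is an example text for n-gram".split("\\s+"));
        List<List<String>> grams = ngrams(tokens, 2, 3);
        System.out.println("Total ngrams: " + grams.size());
        for (List<String> gram : grams) {
            System.out.println(gram);
        }
    }
}
```

Unlike NGramModel, this sketch keeps duplicates as separate list entries instead of counting them, which makes no difference for this sentence because no n-gram repeats.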