Getting Started with Apache OpenNLP

By Fahad Usman

Last time, I showed you how to get started with Stanford CoreNLP. This tutorial is about Apache OpenNLP. Here are some of its core features:

Features of OpenNLP

  • Named Entity Recognition (NER) − Using NER, you can extract the names of people, locations, etc. from a given text.

  • Summarise − Summarise paragraphs, articles, documents or collections of documents.

  • Searching − Search for a given string and find its synonyms, even if the word is altered or misspelled.

  • Tagging (POS) − Label each word in a text with its grammatical role (its part of speech) for further analysis.

  • Translation − Translate one language into another.

  • Information grouping − Group the textual information in a document into categories, much as parts of speech group words.

  • Natural Language Generation − Generate text from data held in a database, automating reports such as weather analyses or medical reports.

  • Feedback Analysis − Do you collect verbatim survey responses about your products or services? Analyse them to see how well the product or service is doing.

  • Speech recognition − OpenNLP does not recognise speech itself, but it can analyse the text that a speech-to-text system produces.
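To make the NER bullet a little more concrete, here is a toy sketch that uses only the JDK. It is a plain dictionary (gazetteer) lookup, not OpenNLP's statistical name finder, and the GazetteerNer class, its entries and the sample sentence are all made up for illustration. What it does share with OpenNLP's NER output is the shape of the result: each entity is a range of token positions plus a type label.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GazetteerNer {

	/** One entity mention: token positions [start, end) plus a type label. */
	public static final class Entity {
		final int start;
		final int end;
		final String type;

		Entity(int start, int end, String type) {
			this.start = start;
			this.end = end;
			this.type = type;
		}

		@Override
		public String toString() {
			return "[" + start + ".." + end + ") " + type;
		}
	}

	/** Greedy longest-match lookup of token sequences (up to maxLen tokens) in a hand-made dictionary. */
	public static List<Entity> find(String[] tokens, Map<String, String> gazetteer, int maxLen) {
		List<Entity> found = new ArrayList<>();
		int i = 0;
		while (i < tokens.length) {
			int matched = 0;
			// Try the longest candidate phrase starting at i first, then shorter ones
			for (int len = Math.min(maxLen, tokens.length - i); len > 0 && matched == 0; len--) {
				String phrase = String.join(" ", Arrays.copyOfRange(tokens, i, i + len));
				String type = gazetteer.get(phrase);
				if (type != null) {
					found.add(new Entity(i, i + len, type));
					matched = len;
				}
			}
			// Skip past a match, or move on by one token if nothing matched
			i += (matched > 0) ? matched : 1;
		}
		return found;
	}

	public static void main(String[] args) {
		// Hypothetical dictionary and sentence, for illustration only
		Map<String, String> gazetteer = new HashMap<>();
		gazetteer.put("Mike Smith", "person");
		gazetteer.put("New York", "location");

		String[] tokens = {"Mike", "Smith", "flew", "to", "New", "York", "."};
		for (Entity e : find(tokens, gazetteer, 3)) {
			System.out.println(e + " " + String.join(" ", Arrays.copyOfRange(tokens, e.start, e.end)));
		}
	}
}
```

A dictionary like this can only find names it already contains, which is the gap a trained statistical model is meant to close.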

Here is what you need. Tools of the trade:
  1. Eclipse (IDE for Java Developers)
  2. JDK (I am using JDK 11, the latest release at the time of writing)
So download these two tools and install them as normal. Once done, fire up Eclipse and click File -> New -> Project.

“OpenNLP supports the most common NLP tasks, such as tokenisation, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, language detection and coreference resolution.”

Select Java Project, give it a name and hit Finish.

 

Now right-click the project and select Configure -> Convert to Maven Project. Let it do its thing; it will then open the pom.xml file. Add the following lines to pom.xml, inside the <dependencies> element (create that element if it does not exist yet):

OpenNLP Tools Dependency

To use the OpenNLP Tools, define the following dependency:
<dependency>
  <groupId>org.apache.opennlp</groupId>
  <artifactId>opennlp-tools</artifactId>
  <version>1.9.0</version>
</dependency>

OpenNLP UIMA Annotators Dependency

If you also want the OpenNLP UIMA Annotators (they are not needed for this tutorial), define the following dependency:
<dependency>
  <groupId>org.apache.opennlp</groupId>
  <artifactId>opennlp-uima</artifactId>
  <version>1.9.0</version>
</dependency>
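Hit Save. Now head to the OpenNLP website and download the pre-built model files (.bin). I downloaded just one, en-sent.bin (the English sentence detection model), to test with. Save it in your project folder (for me that is /Users/Fahad/eclipse-workspace/openNLP), in a new folder called OpenNLP_models.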
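Two setup mistakes are easy to make at this point: Maven has not (yet) put opennlp-tools on the build path, or the model file is not where the program will look for it. The small class below is an optional sanity check using only the JDK; SetupCheck and its method names are hypothetical helpers, and the relative model path assumes the folder layout described above and that Eclipse runs the program from the project folder (its default). You can paste it into its own class once you have created the package in the next step.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SetupCheck {

	/** True if the named class can be loaded, i.e. the jar that contains it is on the classpath. */
	public static boolean onClasspath(String className) {
		try {
			// false = do not run static initialisers; we only want to know the class exists
			Class.forName(className, false, SetupCheck.class.getClassLoader());
			return true;
		} catch (ClassNotFoundException e) {
			return false;
		}
	}

	/** True if the path, resolved against the current working directory, is a readable regular file. */
	public static boolean isReadableFile(String path) {
		Path p = Paths.get(path).toAbsolutePath();
		return Files.isRegularFile(p) && Files.isReadable(p);
	}

	public static void main(String[] args) {
		String detectorClass = "opennlp.tools.sentdetect.SentenceDetectorME";
		// Assumed location, matching the folder created above; adjust if you saved it elsewhere
		String modelPath = "OpenNLP_models/en-sent.bin";

		System.out.println("opennlp-tools on classpath: " + onClasspath(detectorClass));
		System.out.println("Working directory: " + Paths.get("").toAbsolutePath());
		System.out.println(modelPath + " found: " + isReadableFile(modelPath));
	}
}
```

If the model shows as missing, the printed working directory tells you which folder the relative path is being resolved against.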

Now head back to Eclipse. Right-click the src folder, select New -> Package and create a new package (the code below uses openNLP as the package name). Then right-click the new package, select New -> Class, create a class named “SentenceDetection_RE” and paste in the following code:
package openNLP;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.util.Span;

public class SentenceDetection_RE {

	public static void main(String[] args) throws IOException {

		String paragraph = "Hi. How are you? Welcome to FahadUsman.com. "
				+ "I provide free tutorials on various technologies";

		// Load the pre-trained sentence detector model. The path is relative to the
		// project folder, which is Eclipse's default working directory for a Java
		// Application; otherwise use the full path, e.g.
		// /Users/Fahad/eclipse-workspace/openNLP/OpenNLP_models/en-sent.bin
		SentenceModel model;
		// try-with-resources closes the stream; if loading fails, the IOException
		// propagates out of main instead of leaving model null
		try (InputStream inputStream = new FileInputStream("OpenNLP_models/en-sent.bin")) {
			model = new SentenceModel(inputStream);
		}

		// Instantiate the maximum-entropy sentence detector with the loaded model
		SentenceDetectorME detector = new SentenceDetectorME(model);

		// Detect the character offsets of each sentence in the raw text
		Span[] spans = detector.sentPosDetect(paragraph);

		// Print each span ([start..end)) followed by the sentence text it covers
		for (Span span : spans) {
			System.out.println(span + " " + span.getCoveredText(paragraph));
		}
	}
}
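To get a feel for what sentPosDetect returns, here is a deliberately naive, rule-based splitter written with the JDK only. It is not how SentenceDetectorME works (OpenNLP uses a trained maximum-entropy model), and the NaiveSentenceSplitter class is a hypothetical illustration. It does produce the same kind of result, though: a start and end character offset for each sentence, printed in the same [start..end) style that OpenNLP's Span uses.

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveSentenceSplitter {

	/** Returns {start, end} character offsets (end exclusive) for each sentence found by a simple rule. */
	public static List<int[]> split(String text) {
		List<int[]> spans = new ArrayList<>();
		int start = -1; // -1 means "not currently inside a sentence"
		for (int i = 0; i < text.length(); i++) {
			char c = text.charAt(i);
			if (start < 0) {
				if (Character.isWhitespace(c)) {
					continue; // skip whitespace between sentences
				}
				start = i; // first non-blank character opens a new sentence
			}
			// Rule: '.', '?' or '!' followed by whitespace (or the end of the text) closes a sentence
			boolean terminator = c == '.' || c == '?' || c == '!';
			boolean boundary = i + 1 == text.length() || Character.isWhitespace(text.charAt(i + 1));
			if (terminator && boundary) {
				spans.add(new int[] {start, i + 1});
				start = -1;
			}
		}
		if (start >= 0) {
			// Trailing text without final punctuation still counts; trim trailing whitespace
			int end = text.length();
			while (end > start && Character.isWhitespace(text.charAt(end - 1))) {
				end--;
			}
			spans.add(new int[] {start, end});
		}
		return spans;
	}

	public static void main(String[] args) {
		String paragraph = "Hi. How are you? Welcome to FahadUsman.com. "
				+ "I provide free tutorials on various technologies";
		for (int[] s : split(paragraph)) {
			System.out.println("[" + s[0] + ".." + s[1] + ") " + paragraph.substring(s[0], s[1]));
		}
	}
}
```

The "FahadUsman.com." case survives this rule only because the first dot is not followed by a space; an abbreviation such as "Dr. Smith" would be wrongly split, which is the kind of case a trained model is meant to handle better.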
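Once done, run it as a Java Application (right-click the class and select Run As -> Java Application). The console should show one line per detected sentence, giving its start and end character offsets.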

That’s it! Enjoy, and good luck building your own custom models! 🙂
