Getting started with Stanford CoreNLP

Stanford CoreNLP

This is one of the best open source projects for natural language processing. I will show you how to set it up on your computer so you can get started and build your own models!

By Fahad Usman

I stumbled upon the Stanford CoreNLP open source project and started reading about it. It gives you built-in models that you can use for natural language processing tasks such as tokenization, sentence splitting and named-entity recognition. I wanted to play with this, and R provides a wrapper package called cleanNLP. To use that package you need to set up rJava and Python on your machine. However, R does not give you the ability to build your own custom models.

Hence you need to turn to Java. This was a painful experience, as I spent ages trying to get everything in place to start using this project.

So in this post, I will show you how you could set it up on your machine and start using it straight away!

Tools of the trade:

  1. Eclipse (IDE for Java Developers)
  2. JDK (I am using the latest JDK 11)

So download these two tools and install as normal.

“Stanford CoreNLP, a modern, regularly updated package, with the overall highest quality text analytics”

Once done, fire up Eclipse and click File -> New -> Project.

Give it a name and hit Finish.

No need to create a module file/name, so click Don’t Create. Eclipse will now open your newly created project.

Right-click on the project name and select Configure -> Convert to Maven Project.

Fill in the details and click Finish:

Add the following lines to pom.xml after the build tag:
<dependencies>
    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-corenlp</artifactId>
        <version>3.9.2</version>
    </dependency>
    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-corenlp</artifactId>
        <version>3.9.2</version>
        <classifier>models</classifier>
    </dependency>
</dependencies>
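For orientation, here is a rough sketch of where those lines sit in the generated pom.xml. This is an assumption about what the Maven conversion dialog produced on my machine: the groupId, artifactId and version near the top are placeholders for whatever you entered in that dialog.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <!-- placeholders: use the values you entered in the conversion dialog -->
  <groupId>com.example</groupId>
  <artifactId>corenlp-test</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <build>
    <sourceDirectory>src</sourceDirectory>
  </build>
  <!-- the two stanford-corenlp dependencies from above go here -->
  <dependencies>
  </dependencies>
</project>
```

The second dependency, with the `models` classifier, is the large jar that contains the trained English models; without it the pipeline will fail at runtime when it tries to load an annotator.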
and save the file. Now right-click on the src folder and click New -> Package.

Give it a name and click Finish as shown above. Now this is where the magic happens: you will see the Stanford CoreNLP libraries being downloaded automatically and added to your Maven Dependencies.

Now it is time to test that everything is fine. Right-click on your package, create a new class, and add the following code:
import java.io.*;
import java.util.*;
import edu.stanford.nlp.io.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.util.*;

/** App for testing whether the Maven distribution is working properly. */
public class StanfordCoreNLPEnglishTestApp {

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        // Run the CoreNLP command-line entry point against a sample file,
        // using the annotators listed in english.properties.
        String[] englishArgs = new String[] {
                "-file", "sample-english.txt",
                "-outputFormat", "text",
                "-props", "english.properties"
        };
        StanfordCoreNLP.main(englishArgs);
    }
}
I also created two txt files by right-clicking the project name -> New -> File, as shown:
I added this to sample-english.txt file:
President Barack Obama was born in Hawaii.  He was elected in 2008.
and this to english.properties file:

annotators = tokenize,cleanxml,ssplit,pos,lemma,ner,parse,depparse,coref,natlog,openie
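If you only want a quick smoke test, a lighter annotator line should also work. This is an assumption on my part: it is simply a trimmed subset of the list above, dropping the heavier annotators (parse, depparse, coref, natlog, openie), which load large models and take noticeably longer to start up.

```
annotators = tokenize,ssplit,pos,lemma,ner
```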

Save all this. Right-click on your Java file and select Run As -> Java Application. It will create a sample-english.txt.out file in your project home directory, which you can open in any editor. My final project structure looks like this:
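Beyond running CoreNLP's own command-line entry point, you can also call the pipeline programmatically, which is the natural next step once the setup works. The sketch below is my own minimal example, not code from the CoreNLP distribution; it assumes the same Maven dependencies as above and uses a small annotator set so it starts quickly.

```java
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

public class PipelineDemo {

    public static void main(String[] args) {
        // Build a pipeline with a small annotator set.
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // Annotate the same sample text used in sample-english.txt.
        Annotation document = new Annotation(
                "President Barack Obama was born in Hawaii. He was elected in 2008.");
        pipeline.annotate(document);

        // Print each token with its part-of-speech and named-entity tag.
        for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                System.out.printf("%s\t%s\t%s%n",
                        token.word(),
                        token.get(CoreAnnotations.PartOfSpeechAnnotation.class),
                        token.get(CoreAnnotations.NamedEntityTagAnnotation.class));
            }
        }
    }
}
```

With the NER annotator enabled, "Barack" and "Obama" should come out tagged PERSON and "Hawaii" tagged as a location, mirroring what you see in the .out file.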

Enjoy and good luck 🙂
