I am trying to use the open-nlp Ruby gem to access the Java OpenNLP processor through RJB (Ruby Java Bridge). I am not a Java programmer, so I don't know how to solve this. Any recommendations regarding resolving it, debugging it, collecting more information, etc. would be appreciated.
The environment is Windows 8, Ruby 1.9.3p448, Rails 4.0.0, JDK 1.7.0-40 x586. Gems are rjb 1.4.8 and louismullie/open-nlp 0.1.4. For the record, this file runs in JRuby but I experience other problems in that environment and would prefer to stay native Ruby for now.
In brief, the open-nlp gem fails with java.lang.NullPointerException, which surfaces in Ruby as a method_missing error. I hesitate to say why this is happening because I don't know, but it appears to me that the dynamically loaded Java object opennlp.tools.postag.POSTaggerME@1b5080a cannot be accessed, perhaps because OpenNLP::Bindings::Utils.tagWithArrayList isn't being set up correctly. OpenNLP::Bindings is Ruby. Utils, and its methods, are Java. And Utils is supposedly the "default" Jars and Class files, which may be important.
What am I doing wrong, here? Thanks!
The code I am running is copied straight out of github/open-nlp. My copy of the code is:
class OpennlpTryer
# From https://github.com/louismullie/open-nlp
# Hints: Dir.pwd; File.expand_path('../../Gemfile', __FILE__);
# Load the module
require 'open-nlp'
#require 'jruby-jars'
# Alias "write" to "print" to monkeypatch the NoMethod write error
java_import java.io.PrintStream
class PrintStream
java_alias(:write, :print, [java.lang.String])
# Display path of jruby-jars jars...
puts JRubyJars.core_jar_path # => path to jruby-core-VERSION.jar
puts JRubyJars.stdlib_jar_path # => path to jruby-stdlib-VERSION.jar
# Set an alternative path to look for the JAR files.
# Default is gem's bin folder.
# OpenNLP.jar_path = '/path_to_jars/'
OpenNLP.jar_path = File.join(ENV["GEM_HOME"],"gems/open-nlp-0.1.4/bin/")
puts OpenNLP.jar_path
# Set an alternative path to look for the model files.
# Default is gem's bin folder.
# OpenNLP.model_path = '/path_to_models/'
OpenNLP.model_path = File.join(ENV["GEM_HOME"],"gems/open-nlp-0.1.4/bin/")
puts OpenNLP.model_path
# Pass some alternative arguments to the Java VM.
# Default is ['-Xms512M', '-Xmx1024M'].
# OpenNLP.jvm_args = ['-option1', '-option2']
OpenNLP.jvm_args = ['-Xms512M', '-Xmx1024M']
# Redirect VM output to log.txt
OpenNLP.log_file = 'log.txt'
# Set default models for a language.
# OpenNLP.use :language
OpenNLP.use :english # Make sure this is lower case!!!!
# Simple tokenizer
sent = "The death of the poet was kept from his poems."
tokenizer = OpenNLP::SimpleTokenizer.new
tokens = tokenizer.tokenize(sent).to_a
# => %w[The death of the poet was kept from his poems .]
puts "Tokenize #{tokens}"
# Maximum entropy tokenizer, chunker and POS tagger
chunker = OpenNLP::ChunkerME.new
tokenizer = OpenNLP::TokenizerME.new
tagger = OpenNLP::POSTaggerME.new
sent = "The death of the poet was kept from his poems."
tokens = tokenizer.tokenize(sent).to_a
# => %w[The death of the poet was kept from his poems .]
puts "Tokenize #{tokens}"
tags = tagger.tag(tokens).to_a
puts "Tags #{tags}"
chunks = chunker.chunk(tokens, tags).to_a
puts "Chunks #{chunks}"
# Abstract Bottom-Up Parser
sent = "The death of the poet was kept from his poems."
parser = OpenNLP::Parser.new
parse = parser.parse(sent)
parse.get_text.should eql sent
parse.get_span.get_start.should eql 0
parse.get_span.get_end.should eql 46
parse.get_child_count.should eql 1
child = parse.get_children[0]
child.text # => "The death of the poet was kept from his poems."
child.get_child_count # => 3
child.get_head_index #=> 5
child.get_type # => "S"
puts "Child: #{child}"
# Maximum Entropy Name Finder*
# puts File.expand_path('.', __FILE__)
text = File.read('./spec/sample.txt').gsub("\n", "") # gsub! would return nil if the file contained no newlines
tokenizer = OpenNLP::TokenizerME.new
segmenter = OpenNLP::SentenceDetectorME.new
puts "Tokenizer: #{tokenizer}"
puts "Segmenter: #{segmenter}"
ner_models = ['person', 'time', 'money']
ner_finders = ner_models.map do |model|
  OpenNLP::NameFinderME.new("en-ner-#{model}.bin")
end
puts "NER Finders: #{ner_finders}"
sentences = segmenter.sent_detect(text)
puts "Sentences: #{sentences}"
named_entities = []
sentences.each do |sentence|
  tokens = tokenizer.tokenize(sentence)
  ner_models.each_with_index do |model, i|
    finder = ner_finders[i]
    name_spans = finder.find(tokens)
    name_spans.each do |name_span|
      start = name_span.get_start
      stop = name_span.get_end-1
      slice = tokens[start..stop].to_a
      named_entities << [slice, model]
    end
  end
end
puts "Named Entities: #{named_entities}"
# Loading specific models
# Just pass the name of the model file to the constructor. The gem will search for the file in the OpenNLP.model_path folder.
tokenizer = OpenNLP::TokenizerME.new('en-token.bin')
tagger = OpenNLP::POSTaggerME.new('en-pos-perceptron.bin')
name_finder = OpenNLP::NameFinderME.new('en-ner-person.bin')
# etc.
puts "Tokenizer: #{tokenizer}"
puts "Tagger: #{tagger}"
puts "Name Finder: #{name_finder}"
# Loading specific classes
# You may want to load specific classes from the OpenNLP library that are not loaded by default. The gem provides an API to do this:
# Default base class is opennlp.tools.
# => OpenNLP::SomeClassName
# Here, we specify another base class.
OpenNLP.load_class('SomeOtherClass', 'opennlp.tools.namefind')
# => OpenNLP::SomeOtherClass
The line which is failing is line 73: (tokens == the sentence being processed.)
tags = tagger.tag(tokens).to_a #
tagger.tag calls open-nlp/classes.rb line 13, which is where the error is thrown. The code there is:
class OpenNLP::POSTaggerME < OpenNLP::Base
  unless RUBY_PLATFORM =~ /java/
    def tag(*args)
      OpenNLP::Bindings::Utils.tagWithArrayList(@proxy_inst, args[0]) # <== Line 13
    end
  end
end
The Ruby error thrown at this point is: `method_missing': unknown exception (NullPointerException). Debugging this, I found the error java.lang.NullPointerException. args[0] is the sentence being processed. @proxy_inst is opennlp.tools.postag.POSTaggerME@1b5080a.
OpenNLP::Bindings sets up the Java environment. For example, it sets up the Jars to be loaded and the classes within those Jars. In line 54, it sets up defaults for RJB, which should set up OpenNLP::Bindings::Utils and its methods as follows:
# Add in Rjb workarounds.
unless RUBY_PLATFORM =~ /java/
  self.default_jars << 'utils.jar'
  self.default_classes << ['Utils', '']
end
utils.jar and Utils.java are in the CLASSPATH with the other Jars being loaded. They are being accessed, which is verified because the other Jars throw error messages if they are not present. The CLASSPATH is:
.;C:\Program Files (x86)Java\jdk1.7.0_40\lib;C:\Program Files (x86)Java\jre7\lib;D:\BitNami\rubystack-1.9.3-12\ruby\lib\ruby\gems\1.9.1\gems\open-nlp-0.1.4\bin
The applications Jars are in D:\BitNami\rubystack-1.9.3-12\ruby\lib\ruby\gems\1.9.1\gems\open-nlp-0.1.4\bin and, again, if they are not there I get error messages on other Jars. The Jars and Java files in ...\bin include:
Utils.java is as follows:
import java.util.Arrays;
import java.util.ArrayList;
import java.lang.String;
import opennlp.tools.postag.POSTagger;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.namefind.NameFinderME; // interface instead?
import opennlp.tools.util.Span;
// javac -cp '.:opennlp.tools.jar' Utils.java
// jar cf utils.jar Utils.class
public class Utils {
    public static String[] tagWithArrayList(POSTagger posTagger, ArrayList[] objectArray) {
        return posTagger.tag(getStringArray(objectArray));
    }
    public static Object[] findWithArrayList(NameFinderME nameFinder, ArrayList[] tokens) {
        return nameFinder.find(getStringArray(tokens));
    }
    public static Object[] chunkWithArrays(ChunkerME chunker, ArrayList[] tokens, ArrayList[] tags) {
        return chunker.chunk(getStringArray(tokens), getStringArray(tags));
    }
    public static String[] getStringArray(ArrayList[] objectArray) {
        String[] stringArray = Arrays.copyOf(objectArray, objectArray.length, String[].class);
        return stringArray;
    }
}
So, it should define tagWithArrayList and import opennlp.tools.postag.POSTagger. (By the way, as an experiment I changed every occurrence of POSTagger to POSTaggerME in this file. It changed nothing.)
The tools Jar file, opennlp-tools-1.5.2-incubating.jar, includes postag/POSTagger and POSTaggerME class files, as expected.
Error messages are:
D:\BitNami\rubystack-1.9.3-12\ruby\bin\ruby.exe -e $stdout.sync=true;$stderr.sync=true;load($0=ARGV.shift) D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb
.;C:\Program Files (x86)\Java\jdk1.7.0_40\lib;C:\Program Files (x86)\Java\jre7\lib;D:\BitNami\rubystack-1.9.3-12\ruby\lib\ruby\gems\1.9.1\gems\open-nlp-0.1.4\bin
Tokenize ["The", "death", "of", "the", "poet", "was", "kept", "from", "his", "poems", "."]
Tokenize ["The", "death", "of", "the", "poet", "was", "kept", "from", "his", "poems", "."]
D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:13:in `method_missing': unknown exception (NullPointerException)
from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:13:in `tag'
from D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:73:in `<class:OpennlpTryer>'
from D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'
from -e:1:in `load'
from -e:1:in `<main>'
Modified Utils.java:
import java.util.Arrays;
import java.util.Object;
import java.lang.String;
import opennlp.tools.postag.POSTagger;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.namefind.NameFinderME; // interface instead?
import opennlp.tools.util.Span;
// javac -cp '.:opennlp.tools.jar' Utils.java
// jar cf utils.jar Utils.class
public class Utils {
    public static String[] tagWithArrayList(POSTagger posTagger, Object[] objectArray) {
        return posTagger.tag(getStringArray(objectArray));
    }
    public static Object[] findWithArrayList(NameFinderME nameFinder, Object[] tokens) {
        return nameFinder.find(getStringArray(tokens));
    }
    public static Object[] chunkWithArrays(ChunkerME chunker, Object[] tokens, Object[] tags) {
        return chunker.chunk(getStringArray(tokens), getStringArray(tags));
    }
    public static String[] getStringArray(Object[] objectArray) {
        String[] stringArray = Arrays.copyOf(objectArray, objectArray.length, String[].class);
        return stringArray;
    }
}
Modified error messages:
Uncaught exception: uninitialized constant OpennlpTryer::ArrayStoreException
D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:81:in `rescue in <class:OpennlpTryer>'
D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:77:in `<class:OpennlpTryer>'
D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'
Revised error with Utils.java revised to "import java.lang.Object;":
Uncaught exception: uninitialized constant OpennlpTryer::ArrayStoreException
D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:81:in `rescue in <class:OpennlpTryer>'
D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:77:in `<class:OpennlpTryer>'
D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'
Rescue removed from OpennlpTryer shows error trapped in classes.rb:
Uncaught exception: uninitialized constant OpenNLP::POSTaggerME::ArrayStoreException
D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:16:in `rescue in tag'
D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:13:in `tag'
D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:78:in `<class:OpennlpTryer>'
D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'
Same error but with all rescues removed so it's "native Ruby"
Uncaught exception: unknown exception
D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:15:in `method_missing'
D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:15:in `tag'
D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:78:in `<class:OpennlpTryer>'
D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'
Revised Utils.java:
import java.util.Arrays;
import java.util.ArrayList;
import java.lang.String;
import opennlp.tools.postag.POSTagger;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.namefind.NameFinderME; // interface instead?
import opennlp.tools.util.Span;
// javac -cp '.:opennlp.tools.jar' Utils.java
// jar cf utils.jar Utils.class
public class Utils {
public static String[] tagWithArrayList(
System.out.println("Tokens: ("+objectArray.getClass().getSimpleName()+"): \n"+objectArray);
POSTagger posTagger, ArrayList[] objectArray) {
return posTagger.tag(getStringArray(objectArray));
public static Object[] findWithArrayList(NameFinderME nameFinder, ArrayList[] tokens) {
return nameFinder.find(getStringArray(tokens));
public static Object[] chunkWithArrays(ChunkerME chunker, ArrayList[] tokens, ArrayList[] tags) {
return chunker.chunk(getStringArray(tokens), getStringArray(tags));
public static String[] getStringArray(ArrayList[] objectArray) {
String[] stringArray = Arrays.copyOf(objectArray, objectArray.length, String[].class);
return stringArray;
I ran cavaj (a Java decompiler) on the Utils.class that I unzipped from utils.jar, and this is what I found. It differs from Utils.java by quite a bit. Both come installed with the open-nlp 0.1.4 gem. I don't know if this is the root cause of the problem, but this file is at the core of where it breaks and we have a major discrepancy. Which should we use?
import java.util.ArrayList;
import java.util.Arrays;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.postag.POSTagger;
public class Utils
public Utils()
public static String[] tagWithArrayList(POSTagger postagger, ArrayList aarraylist[])
return postagger.tag(getStringArray(aarraylist));
public static Object[] findWithArrayList(NameFinderME namefinderme, ArrayList aarraylist[])
return namefinderme.find(getStringArray(aarraylist));
public static Object[] chunkWithArrays(ChunkerME chunkerme, ArrayList aarraylist[], ArrayList aarraylist1[])
return chunkerme.chunk(getStringArray(aarraylist), getStringArray(aarraylist1));
public static String[] getStringArray(ArrayList aarraylist[])
String as[] = (String[])Arrays.copyOf(aarraylist, aarraylist.length, [Ljava/lang/String;);
return as;
Utils.java in use as of 10/07, compiled and compressed into utils.jar:
import java.util.Arrays;
import java.util.ArrayList;
import java.lang.String;
import opennlp.tools.postag.POSTagger;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.namefind.NameFinderME; // interface instead?
import opennlp.tools.util.Span;
// javac -cp '.:opennlp.tools.jar' Utils.java
// jar cf utils.jar Utils.class
public class Utils {
    public static String[] tagWithArrayList(POSTagger posTagger, ArrayList[] objectArray) {
        return posTagger.tag(getStringArray(objectArray));
    }
    public static Object[] findWithArrayList(NameFinderME nameFinder, ArrayList[] tokens) {
        return nameFinder.find(getStringArray(tokens));
    }
    public static Object[] chunkWithArrays(ChunkerME chunker, ArrayList[] tokens, ArrayList[] tags) {
        return chunker.chunk(getStringArray(tokens), getStringArray(tags));
    }
    public static String[] getStringArray(ArrayList[] objectArray) {
        String[] stringArray = Arrays.copyOf(objectArray, objectArray.length, String[].class);
        return stringArray;
    }
}
Failures are occurring in BindIt::Binding::load_klass in line 110 here:
# Private function to load classes.
# Doesn't check if initialized.
def load_klass(klass, base, name=nil)
  base += '.' unless base == ''
  fqcn = "#{base}#{klass}"
  name ||= klass
  if RUBY_PLATFORM =~ /java/
    rb_class = java_import(fqcn)
    if name != klass
      if rb_class.is_a?(Array)
        rb_class = rb_class.first
      end
      const_set(name.intern, rb_class)
    end
  else
    rb_class = Rjb::import(fqcn) # <== This is line 110
    const_set(name.intern, rb_class)
  end
end
The messages are as follows; however, they are inconsistent in terms of the particular class that is identified. Each run may display a different class: any of POSTagger, ChunkerME, or NameFinderME.
D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:110:in `import': opennlp/tools/namefind/NameFinderME (NoClassDefFoundError)
from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:110:in `load_klass'
from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:89:in `block in load_default_classes'
from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:87:in `each'
from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:87:in `load_default_classes'
from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:56:in `bind'
from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp.rb:14:in `load'
from D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:54:in `<class:OpennlpTryer>'
from D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'
from -e:1:in `load'
from -e:1:in `<main>'
The interesting point about these errors is that they originate at OpennlpTryer line 54 which, according to the stack trace above, is the call into `load` in open-nlp.rb.
At this point, OpenNLP fires up RJB which uses BindIt to load the jars and classes. This is well before the errors that I was seeing at the beginning of this question. However, I can't help but think it is all related. I really don't understand the inconsistency of these errors at all.
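One way to collect more information about an intermittent NoClassDefFoundError is to check, outside of Ruby, whether the JVM can see each class on a given classpath. The following is a minimal sketch under stated assumptions: the class name ClassProbe and the helper isLoadable are hypothetical, and it would have to be run with the same jars on the classpath that the gem loads. It does not reproduce how RJB or BindIt load classes.

```java
class ClassProbe {
    // Returns true if the named class can be located by this class's loader.
    // The class is not initialized, so static initializers do not run.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClassProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError, so it is covered here.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = args.length > 0 ? args : new String[] {
            "java.util.ArrayList",
            "opennlp.tools.postag.POSTaggerME",
            "opennlp.tools.chunker.ChunkerME",
            "opennlp.tools.namefind.NameFinderME"
        };
        for (String name : names) {
            System.out.println(name + " loadable: " + isLoadable(name));
        }
    }
}
```

If a class reports as not loadable here while its jar is believed to be on the classpath, the classpath entry itself is the first thing to re-check.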
I was able to add the logging function to Utils.java, compile it after adding an "import java.io.*", and compress it into the jar. However, I pulled it out because of these errors, as I didn't know whether or not it was involved. I don't think it was. However, because these errors occur during load, the method is never called anyway, so logging there won't help...
For each of the other jars, the jar is loaded then each class is imported using RJB. Utils is handled differently and is specified as the "default". From what I can tell, Utils.class is executed to load its own classes?
Later update on 10/07:
Here is where I am, I think. First, I have some problem replacing Utils.java, as I described earlier today. That problem probably needs to be solved before I can install a fix.
Second, I now understand the difference between POSTagger and POSTaggerME: the ME means Maximum Entropy. The test code is trying to call POSTaggerME, but it looks to me like Utils.java, as implemented, supports POSTagger. I tried changing the test code to call POSTagger, but it said it couldn't find an initializer. Looking at the source for each of these, and I am guessing here, I think that POSTagger exists solely to support POSTaggerME, which implements it.
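The "couldn't find an initializer" observation is what one would expect if POSTagger is an interface and POSTaggerME is a class implementing it. The sketch below illustrates that relationship with hypothetical stand-in types (Tagger and TaggerME are not OpenNLP classes): an interface has no constructor, yet a method declared with the interface as its parameter type accepts any implementing object.

```java
class InterfaceDemo {
    // Hypothetical stand-in for an interface such as POSTagger.
    interface Tagger {
        String[] tag(String[] tokens);
    }

    // Hypothetical stand-in for an implementing class such as POSTaggerME.
    static class TaggerME implements Tagger {
        public String[] tag(String[] tokens) {
            String[] tags = new String[tokens.length];
            java.util.Arrays.fill(tags, "TAG"); // placeholder tag for every token
            return tags;
        }
    }

    // Declared against the interface, like the Utils methods in the post.
    static String[] tagWith(Tagger tagger, String[] tokens) {
        return tagger.tag(tokens);
    }

    public static void main(String[] args) {
        // "new Tagger()" would not compile; the implementation is instantiated instead.
        String[] tags = tagWith(new TaggerME(), new String[] {"The", "death"});
        System.out.println(java.util.Arrays.toString(tags));
    }
}
```

Under that reading, a Utils method that takes a POSTagger parameter would accept a POSTaggerME instance without any change to Utils.java.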
The source is opennlp-tools file opennlp-tools-1.5.2-incubating-sources.jar.
What I don't get is the whole reason for Utils in the first place? Why aren't the jars/classes provided in bindings.rb enough? This feels like a bad monkeypatch. I mean, look what bindings.rb does in the first place:
# Default JARs to load.
self.default_jars = [
# Default namespace.
self.default_namespace = 'opennlp.tools'
# Default classes.
self.default_classes = [
# OpenNLP classes.
['AbstractBottomUpParser', 'opennlp.tools.parser'],
['DocumentCategorizerME', 'opennlp.tools.doccat'],
['ChunkerME', 'opennlp.tools.chunker'],
['DictionaryDetokenizer', 'opennlp.tools.tokenize'],
['NameFinderME', 'opennlp.tools.namefind'],
['Parser', 'opennlp.tools.parser.chunking'],
['Parse', 'opennlp.tools.parser'],
['ParserFactory', 'opennlp.tools.parser'],
['POSTaggerME', 'opennlp.tools.postag'],
['SentenceDetectorME', 'opennlp.tools.sentdetect'],
['SimpleTokenizer', 'opennlp.tools.tokenize'],
['Span', 'opennlp.tools.util'],
['TokenizerME', 'opennlp.tools.tokenize'],
# Generic Java classes.
['FileInputStream', 'java.io'],
['String', 'java.lang'],
['ArrayList', 'java.util']
# Add in Rjb workarounds.
unless RUBY_PLATFORM =~ /java/
  self.default_jars << 'utils.jar'
  self.default_classes << ['Utils', '']
end
I ran into the same problem today. I didn't quite understand why the Utils class was being used, so I modified the classes.rb file in the following way:
unless RUBY_PLATFORM =~ /java/
  def tag(*args)
    @proxy_inst.tag(args[0])
    #OpenNLP::Bindings::Utils.tagWithArrayList(@proxy_inst, args[0])
  end
end
With that change, the following test passes:
sent = "The death of the poet was kept from his poems."
tokens = tokenizer.tokenize(sent).to_a
# => %w[The death of the poet was kept from his poems .]
tags = tagger.tag(tokens).to_a
# => ["prop", "prp", "n", "v-fin", "n", "adj", "prop", "v-fin", "n", "adj", "punc"]
R_G Edit:
I tested that change and it eliminated the error. I am going to have to do more testing to ensure the outcome is what should be expected. However, following that same pattern, I made the following changes in classes.rb as well:
def chunk(tokens, tags)
  chunks = @proxy_inst.chunk(tokens, tags)
  # chunks = OpenNLP::Bindings::Utils.chunkWithArrays(@proxy_inst, tokens,tags)
  chunks.map { |c| c.to_s }
end
class OpenNLP::NameFinderME < OpenNLP::Base
  unless RUBY_PLATFORM =~ /java/
    def find(*args)
      @proxy_inst.find(args[0])
      # OpenNLP::Bindings::Utils.findWithArrayList(@proxy_inst, args[0])
    end
  end
end
This allowed the entire sample test to execute without failure. I will provide a later update regarding verification of the results.
As it turns out, this answer was key to the desired solution. However, the results were still inconsistent after that correction. We continued to drill down into it and implemented strong typing in the calls, as specified by RJB. This converts each call to use the _invoke method, whose parameters are the name of the desired method, the strong type signature, and the method's arguments. Andre's recommendation was key to the solution, so kudos to him. Here is the complete module. It eliminates the need for the Utils.class that was attempting to make these calls but failing. We plan to issue a GitHub pull request for the open-nlp gem to update this module:
require 'open-nlp/base'
class OpenNLP::SentenceDetectorME < OpenNLP::Base; end
class OpenNLP::SimpleTokenizer < OpenNLP::Base; end
class OpenNLP::TokenizerME < OpenNLP::Base; end
class OpenNLP::POSTaggerME < OpenNLP::Base
  unless RUBY_PLATFORM =~ /java/
    def tag(*args)
      @proxy_inst._invoke("tag", "[Ljava.lang.String;", args[0])
    end
  end
end
class OpenNLP::ChunkerME < OpenNLP::Base
  if RUBY_PLATFORM =~ /java/
    def chunk(tokens, tags)
      if !tokens.is_a?(Array)
        tokens = tokens.to_a
        tags = tags.to_a
      end
      tokens = tokens.to_java(:String)
      tags = tags.to_java(:String)
    end
  else
    def chunk(tokens, tags)
      chunks = @proxy_inst._invoke("chunk", "[Ljava.lang.String;[Ljava.lang.String;", tokens, tags)
      chunks.map { |c| c.to_s }
    end
  end
end
class OpenNLP::Parser < OpenNLP::Base
  def parse(text)
    tokenizer = OpenNLP::TokenizerME.new
    full_span = OpenNLP::Bindings::Span.new(0, text.size)
    parse_obj = OpenNLP::Bindings::Parse.new(
      text, full_span, "INC", 1, 0)
    tokens = tokenizer.tokenize_pos(text)
    tokens.each_with_index do |tok,i|
      start, stop = tok.get_start, tok.get_end
      token = text[start..stop-1]
      span = OpenNLP::Bindings::Span.new(start, stop)
      parse = OpenNLP::Bindings::Parse.new(text, span, "TK", 0, i)
    end
  end
end
class OpenNLP::NameFinderME < OpenNLP::Base
  unless RUBY_PLATFORM =~ /java/
    def find(*args)
      @proxy_inst._invoke("find", "[Ljava.lang.String;", args[0])
    end
  end
end
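The type strings passed to _invoke in the module have the same shape as the names the JVM itself reports for array classes. As a small illustration on the Java side only (the class SignatureDemo and the helper arraySignature are hypothetical names, and this says nothing about how RJB parses the string internally), the names can be printed directly:

```java
class SignatureDemo {
    // Concatenates the JVM names of the given array types,
    // e.g. String[] becomes "[Ljava.lang.String;".
    static String arraySignature(Class<?>... arrayTypes) {
        StringBuilder sb = new StringBuilder();
        for (Class<?> type : arrayTypes) {
            if (!type.isArray()) {
                // Non-array classes report plain names such as "java.lang.String",
                // which is a different shape, so they are rejected here.
                throw new IllegalArgumentException("not an array type: " + type);
            }
            sb.append(type.getName());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // One String[] parameter, as in the tag and find calls.
        System.out.println(arraySignature(String[].class));
        // Two String[] parameters, as in the chunk call.
        System.out.println(arraySignature(String[].class, String[].class));
    }
}
```

The leading "[" marks an array, "L...;" wraps the element class name, and multiple parameters are simply written one after another.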
I don't think you're doing anything wrong at all. You're also not the only one with this problem. It looks like a bug in Utils. Creating an ArrayList[] in Java doesn't make much sense. It's technically legal, but it would be an array of ArrayLists, which (a) is just plain odd, (b) is terrible practice with regard to Java generics, and (c) won't cast properly to String[] like the author intends in getStringArray().
Given the way the utility's written and the fact that OpenNLP does, in fact, expect to receive a String[] as input for its tag() method, my best guess is that the original author meant to have Object[] where they have ArrayList[] in the Utils class.
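To make the casting point concrete, here is a minimal sketch of the same Arrays.copyOf conversion applied to two inputs. The class CopyOfDemo and its helper names are hypothetical and independent of OpenNLP. An Object[] that holds strings converts cleanly, while an array whose elements are ArrayList instances fails at runtime with ArrayStoreException.

```java
import java.util.ArrayList;
import java.util.Arrays;

class CopyOfDemo {
    // Same conversion as getStringArray(), but accepting Object[].
    static String[] toStringArray(Object[] objectArray) {
        return Arrays.copyOf(objectArray, objectArray.length, String[].class);
    }

    // Returns true when the conversion fails with ArrayStoreException.
    static boolean failsToConvert(Object[] objectArray) {
        try {
            toStringArray(objectArray);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object[] tokens = {"The", "death", "of", "the", "poet"};
        System.out.println(Arrays.toString(toStringArray(tokens)));

        // An array of ArrayLists: each element is a list, not a String.
        ArrayList<?>[] lists = new ArrayList<?>[] {
            new ArrayList<String>(Arrays.asList("The", "death"))
        };
        System.out.println("ArrayList[] fails to convert: " + failsToConvert(lists));
    }
}
```

The failure only appears when the array actually contains non-String elements; an array of nulls would copy without complaint.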
To output to a file in the root of your project directory, try adjusting the logging like this (I added another line for printing the contents of the input array):
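try {
    File log = new File("log.txt");
    FileWriter fileWriter = new FileWriter(log);
    BufferedWriter bufferedWriter = new BufferedWriter(fileWriter);
    bufferedWriter.write("Tokens ("+objectArray.getClass().getSimpleName()+"): \r\n"+objectArray.toString()+"\r\n");
    // Print the contents of the input array
    bufferedWriter.write(Arrays.toString(objectArray)+"\r\n");
    bufferedWriter.close();
}
catch (Exception e) {
    e.printStackTrace();
}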
I am trying to retrieve the userCertificate associated with a domain name from Windows Active Directory, but I am having difficulties using the Java API.
For example, when I use the ldapsearch command-line tool, I am able to retrieve the certificate, as you can see below:
ldapsearch -h 192.xx.2.xx -D "CN=Administrator,CN=Users,DC=mmo,DC=co,DC=ca" -w Password -b "CN=rsa0,CN=Users,DC=mmo,DC=co,DC=ca" "userCertificate"
# extended LDIF
# LDAPv3
# base <CN=rsa0,CN=Users,DC=mmo,DC=co,DC=ca> with scope subtree
# filter: (objectclass=*)
# requesting: userCertificate
# rsa0, Users, mmo.co.ca
dn: CN=rsa0,CN=Users,DC=mmo,DC=co,DC=ca
userCertificate:: MIIDbTCCAlWgAwIBAgIEFbvHazANBgkqhkiG9w0BAQsFADBnMQswCQYDVQQG
# search result
search: 2
result: 0 Success
# numResponses: 2
# numEntries: 1
However, when I try to use a Java program, I am unable to retrieve it. Below is the sample Java program:
package CertStore;
import javax.naming.AuthenticationException;
import javax.naming.AuthenticationNotSupportedException;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.security.auth.x500.X500Principal;
import java.security.cert.*;
import java.util.*;
import java.io.*;
class CertStoreTest {
    CertStoreTest() {
        try {
            LDAPCertStoreParameters lcsp =
                new LDAPCertStoreParameters("192.xx.2.xx", 389);
            String referenceID = "CN=rsa0,CN=Users,DC=bmo,DC=co,DC=ca";
            X509CertSelector xcs = new X509CertSelector();
            CertStore cs = CertStore.getInstance("LDAP", lcsp);
            Collection certificates = cs.getCertificates((CertSelector)xcs);
            System.out.println("size: "+ certificates.size());
            Iterator certificate = certificates.iterator();
            while(certificate.hasNext()) {
                System.out.println(certificate.next());
            }
        } catch(Exception e) {
            e.printStackTrace();
        }
    }
    public static void main(String[] args) {
        System.out.println("main() called.");
        CertStoreTest test = new CertStoreTest();
    }
}
When I run this program, I get a size of 0 where I am expecting 1.
main() called.
size: 0
I also have OpenLDAP running on a Linux system. If I point the Java program above at that server, with the appropriate domain name information, Java is able to pull the certificate associated with that domain name.
I am not sure what I am missing when I try to retrieve the certificate from Windows Active Directory.
Can anyone shed some light on this? I have been stuck for a few days now.
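One visible difference between the two attempts is that the ldapsearch call binds with -D and -w credentials, while LDAPCertStoreParameters is given only a host and a port. Whether that explains the empty result is an assumption, not something the post confirms. A minimal sketch for testing it is to read the attribute through plain JNDI with an authenticated bind. The class LdapCertLookup and its method names are hypothetical, and the host, bind DN, password, and entry DN are supplied by the caller.

```java
import java.io.ByteArrayInputStream;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

class LdapCertLookup {
    // Builds a JNDI environment for a simple (DN + password) bind.
    static Hashtable<String, String> buildEnv(String host, int port, String bindDn, String password) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://" + host + ":" + port);
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, bindDn);
        env.put(Context.SECURITY_CREDENTIALS, password);
        return env;
    }

    // Reads userCertificate from one entry; returns null if it is absent or not binary.
    static X509Certificate fetch(Hashtable<String, String> env, String entryDn) throws Exception {
        DirContext ctx = new InitialDirContext(env);
        try {
            Attributes attrs = ctx.getAttributes(entryDn, new String[] {"userCertificate"});
            Attribute attr = attrs.get("userCertificate");
            if (attr == null || !(attr.get() instanceof byte[])) {
                return null;
            }
            byte[] der = (byte[]) attr.get();
            CertificateFactory cf = CertificateFactory.getInstance("X.509");
            return (X509Certificate) cf.generateCertificate(new ByteArrayInputStream(der));
        } finally {
            ctx.close();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 5) {
            System.out.println("usage: LdapCertLookup <host> <port> <bindDn> <password> <entryDn>");
            return;
        }
        Hashtable<String, String> env = buildEnv(args[0], Integer.parseInt(args[1]), args[2], args[3]);
        X509Certificate cert = fetch(env, args[4]);
        System.out.println(cert == null ? "no certificate found" : cert.getSubjectX500Principal().toString());
    }
}
```

If this returns the certificate while the CertStore query does not, the difference is more likely in how the CertStore binds or searches than in the directory data.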
I have a pre-trained model, Inception-v3. I want to remove the output layer and use it for image recognition. Here is the example given by TensorFlow:
Keras, the Python framework, has a method for this, model.layers.pop(). I tried to do the same with the TensorFlow Java API. First I tried to use dl4j, but when I imported the Keras model, I got an error like this:
2017-06-15 21:15:43 INFO KerasInceptionV3Net:52 - Importing Inception model from data/inception-model.json
2017-06-15 21:15:43 INFO KerasInceptionV3Net:53 - Importing Weights model from data/inception_v3_complete
Exception in thread "main" java.lang.RuntimeException: Unknown exception.
at org.bytedeco.javacpp.hdf5$H5File.allocate(Native Method)
at org.bytedeco.javacpp.hdf5$H5File.<init>(hdf5.java:12713)
at org.deeplearning4j.nn.modelimport.keras.Hdf5Archive.<init>(Hdf5Archive.java:61)
at org.deeplearning4j.nn.modelimport.keras.KerasModel$ModelBuilder.weightsHdf5Filename(KerasModel.java:603)
at org.deeplearning4j.nn.modelimport.keras.KerasModelImport.importKerasModelAndWeights(KerasModelImport.java:176)
at edu.usc.irds.dl.dl4j.examples.KerasInceptionV3Net.<init>(KerasInceptionV3Net.java:55)
at edu.usc.irds.dl.dl4j.examples.KerasInceptionV3Net.main(KerasInceptionV3Net.java:108)
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
#000: C:\autotest\HDF5110ReleaseRWDITAR\src\H5F.c line 579 in H5Fopen(): unable to open file
major: File accessibilty
minor: Unable to open file
#001: C:\autotest\HDF5110ReleaseRWDITAR\src\H5Fint.c line 1100 in H5F_open(): unable to open file: time = Thu Jun 15 21:15:44 2017,name = 'data/inception_v3_complete', tent_flags = 0
major: File accessibilty
minor: Unable to open file
#002: C:\autotest\HDF5110ReleaseRWDITAR\src\H5FD.c line 812 in H5FD_open(): open failed
major: Virtual File Layer
minor: Unable to initialize object
#003: C:\autotest\HDF5110ReleaseRWDITAR\src\H5FDsec2.c line 348 in H5FD_sec2_open(): unable to open file: name = 'data/inception_v3_complete', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0
major: File accessibilty
minor: Unable to open file
So I went back to TensorFlow. I'm going to modify the model in Keras and convert it to a TensorFlow graph. Here is my conversion script:
input_fld = './'
output_node_names_of_input_network = ["pred0"]
write_graph_def_ascii_flag = True
output_node_names_of_final_network = 'output_node'
output_graph_name = 'test2.pb'
from keras.models import load_model
import tensorflow as tf
import os
import os.path as osp
from keras.applications.inception_v3 import InceptionV3
from keras.applications.vgg16 import VGG16
from keras.models import Sequential
from keras.layers.core import Flatten, Dense, Dropout
from keras.layers.convolutional import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.optimizers import SGD
output_fld = input_fld + 'tensorflow_model/'
if not os.path.isdir(output_fld):
    os.mkdir(output_fld)
net_model = InceptionV3(weights='imagenet', include_top=True)
num_output = len(output_node_names_of_input_network)
pred = [None]*num_output
pred_node_names = [None]*num_output
for i in range(num_output):
    pred_node_names[i] = output_node_names_of_final_network+str(i)
    pred[i] = tf.identity(net_model.output[i], name=pred_node_names[i])
print('output nodes names are: ', pred_node_names)
from keras import backend as K
sess = K.get_session()
if write_graph_def_ascii_flag:
    f = 'only_the_graph_def.pb.ascii'
    tf.train.write_graph(sess.graph.as_graph_def(), output_fld, f, as_text=True)
    print('saved the graph definition in ascii format at: ', osp.join(output_fld, f))
from tensorflow.python.framework import graph_util
from tensorflow.python.framework import graph_io
constant_graph = graph_util.convert_variables_to_constants(sess, sess.graph.as_graph_def(), pred_node_names)
graph_io.write_graph(constant_graph, output_fld, output_graph_name, as_text=False)
print('saved the constant graph (ready for inference) at: ', osp.join(output_fld, output_graph_name))
I got the model as a .pb file, but when I put it into the TensorFlow LabelImage example, I got this error:
Exception in thread "main" java.lang.IllegalArgumentException: You must feed a value for placeholder tensor 'batch_normalization_1/keras_learning_phase' with dtype bool
[[Node: batch_normalization_1/keras_learning_phase = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
at org.tensorflow.Session.run(Native Method)
at org.tensorflow.Session.access$100(Session.java:48)
at org.tensorflow.Session$Runner.runHelper(Session.java:285)
at org.tensorflow.Session$Runner.run(Session.java:235)
at com.dlut.cmh.sheng.LabelImage.executeInceptionGraph(LabelImage.java:98)
at com.dlut.cmh.sheng.LabelImage.main(LabelImage.java:51)
I don't know how to solve this. Can anyone help me? Or is there another way to do this?
The error message you get from the TensorFlow Java API:
Exception in thread "main" java.lang.IllegalArgumentException: You must feed a value for placeholder tensor 'batch_normalization_1/keras_learning_phase' with dtype bool
[[Node: batch_normalization_1/keras_learning_phase = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
suggests that the model is constructed in a way that requires you to feed a boolean value for the tensor named batch_normalization_1/keras_learning_phase.
So, you'd have to include that in your call to run by changing:
try (Session s = new Session(g);
Tensor result = s.runner().feed("input",image).fetch("output").run().get(0)) {
to something like:
try (Session s = new Session(g);
Tensor learning_phase = Tensor.create(false);
Tensor result = s.runner().feed("input", image).feed("batch_normalization_1/keras_learning_phase", learning_phase).fetch("output").run().get(0)) {
The names of nodes you feed and fetch depend on the model, so it's possible that the names of the 'input' and 'output' nodes are different as well.
You might also want to consider using the TensorFlow SavedModel format (see also https://github.com/tensorflow/serving/issues/310#issuecomment-297015251)
Hope that helps
I need to train the Chunker in OpenNLP to classify the training data as a noun phrase. How do I proceed? The online documentation does not explain how to do it from within a program, without the command line. It says to use en-chunker.train, but how do you make that file?
EDIT: @Alaye
After running the code you gave in your answer, I get the following error that I cannot fix:
Indexing events using cutoff of 5
Computing event counts... done. 3 events
Dropped event B-NP:[w_2=bos, w_1=bos, w0=He, w1=reckons, w2=., w_1=bosw0=He, w0=Hew1=reckons, t_2=bos, t_1=bos, t0=PRP, t1=VBZ, t2=., t_2=bost_1=bos, t_1=bost0=PRP, t0=PRPt1=VBZ, t1=VBZt2=., t_2=bost_1=bost0=PRP, t_1=bost0=PRPt1=VBZ, t0=PRPt1=VBZt2=., p_2=bos, p_1=bos, p_2=bosp_1=bos, p_1=bost_2=bos, p_1=bost_1=bos, p_1=bost0=PRP, p_1=bost1=VBZ, p_1=bost2=., p_1=bost_2=bost_1=bos, p_1=bost_1=bost0=PRP, p_1=bost0=PRPt1=VBZ, p_1=bost1=VBZt2=., p_1=bost_2=bost_1=bost0=PRP, p_1=bost_1=bost0=PRPt1=VBZ, p_1=bost0=PRPt1=VBZt2=., p_1=bosw_2=bos, p_1=bosw_1=bos, p_1=bosw0=He, p_1=bosw1=reckons, p_1=bosw2=., p_1=bosw_1=bosw0=He, p_1=bosw0=Hew1=reckons]
Dropped event B-VP:[w_2=bos, w_1=He, w0=reckons, w1=., w2=eos, w_1=Hew0=reckons, w0=reckonsw1=., t_2=bos, t_1=PRP, t0=VBZ, t1=., t2=eos, t_2=bost_1=PRP, t_1=PRPt0=VBZ, t0=VBZt1=., t1=.t2=eos, t_2=bost_1=PRPt0=VBZ, t_1=PRPt0=VBZt1=., t0=VBZt1=.t2=eos, p_2=bos, p_1=B-NP, p_2=bosp_1=B-NP, p_1=B-NPt_2=bos, p_1=B-NPt_1=PRP, p_1=B-NPt0=VBZ, p_1=B-NPt1=., p_1=B-NPt2=eos, p_1=B-NPt_2=bost_1=PRP, p_1=B-NPt_1=PRPt0=VBZ, p_1=B-NPt0=VBZt1=., p_1=B-NPt1=.t2=eos, p_1=B-NPt_2=bost_1=PRPt0=VBZ, p_1=B-NPt_1=PRPt0=VBZt1=., p_1=B-NPt0=VBZt1=.t2=eos, p_1=B-NPw_2=bos, p_1=B-NPw_1=He, p_1=B-NPw0=reckons, p_1=B-NPw1=., p_1=B-NPw2=eos, p_1=B-NPw_1=Hew0=reckons, p_1=B-NPw0=reckonsw1=.]
Dropped event O:[w_2=He, w_1=reckons, w0=., w1=eos, w2=eos, w_1=reckonsw0=., w0=.w1=eos, t_2=PRP, t_1=VBZ, t0=., t1=eos, t2=eos, t_2=PRPt_1=VBZ, t_1=VBZt0=., t0=.t1=eos, t1=eost2=eos, t_2=PRPt_1=VBZt0=., t_1=VBZt0=.t1=eos, t0=.t1=eost2=eos, p_2B-NP, p_1=B-VP, p_2B-NPp_1=B-VP, p_1=B-VPt_2=PRP, p_1=B-VPt_1=VBZ, p_1=B-VPt0=., p_1=B-VPt1=eos, p_1=B-VPt2=eos, p_1=B-VPt_2=PRPt_1=VBZ, p_1=B-VPt_1=VBZt0=., p_1=B-VPt0=.t1=eos, p_1=B-VPt1=eost2=eos, p_1=B-VPt_2=PRPt_1=VBZt0=., p_1=B-VPt_1=VBZt0=.t1=eos, p_1=B-VPt0=.t1=eost2=eos, p_1=B-VPw_2=He, p_1=B-VPw_1=reckons, p_1=B-VPw0=., p_1=B-VPw1=eos, p_1=B-VPw2=eos, p_1=B-VPw_1=reckonsw0=., p_1=B-VPw0=.w1=eos]
Indexing... done.
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
at opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
at opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
at opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
at opennlp.tools.ml.model.TrainUtil.train(TrainUtil.java:53)
at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:253)
at com.oracle.crm.nlp.CustomChunker2.main(CustomChunker2.java:91)
Sorting and merging events... Process exited with exit code 1.
(My en-chunker.train had only the first 2 and last line of your sample data set.)
Could you please tell me why this is happening and how to fix it?
EDIT2: I got the Chunker to work, however it gives an error when I change the sentence in the training set to any sentence other than the one you've given in your answer. Can you tell me why that could be happening?
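A note on the log above, offered as a plausible reading rather than a confirmed diagnosis: it reports "Indexing events using cutoff of 5" and then drops all three events before failing in sortAndMerge. With only three training lines, no feature can occur five times, so nothing would survive the cutoff and the indexer would be left with an empty list. The sketch below is a stdlib-only simulation of that filtering, not OpenNLP code; the class and method names (CutoffSketch, survivingEvents) are hypothetical and the features are abbreviated from the log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CutoffSketch {
    // Keeps only features seen at least `cutoff` times across all events,
    // then drops every event that has no feature left.
    public static List<List<String>> survivingEvents(List<List<String>> events, int cutoff) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> event : events) {
            for (String feature : event) {
                counts.merge(feature, 1, Integer::sum);
            }
        }
        List<List<String>> kept = new ArrayList<>();
        for (List<String> event : events) {
            List<String> frequent = new ArrayList<>();
            for (String feature : event) {
                if (counts.get(feature) >= cutoff) {
                    frequent.add(feature);
                }
            }
            if (!frequent.isEmpty()) {
                kept.add(frequent);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Three events, as in the log (features abbreviated for illustration).
        List<List<String>> events = Arrays.asList(
            Arrays.asList("w0=He", "t0=PRP"),
            Arrays.asList("w0=reckons", "t0=VBZ"),
            Arrays.asList("w0=.", "t0=."));
        System.out.println(survivingEvents(events, 5).size()); // prints 0
        System.out.println(survivingEvents(events, 1).size()); // prints 3
    }
}
```

If this reading is right, either a much larger training file or a lower cutoff should avoid the empty list. As far as I know, OpenNLP exposes the cutoff through TrainingParameters (the CUTOFF_PARAM key), but check that against the version you are running.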
As stated in the OpenNLP documentation:
Sample sentence of the training data:
He PRP B-NP
reckons VBZ B-VP
the DT B-NP
current JJ I-NP
account NN I-NP
deficit NN I-NP
will MD B-VP
narrow VB I-VP
to TO B-PP
only RB B-NP
# # I-NP
1.8 CD I-NP
billion CD I-NP
in IN B-PP
September NNP B-NP
. . O
This is how you lay out your en-chunker.train file: one token per line, with the word, its POS tag, and its chunk tag separated by spaces. You can then create the corresponding .bin file using the CLI:
$ opennlp ChunkerTrainerME -model en-chunker.bin -lang en -data en-chunker.train -encoding UTF-8
or using the API:
import java.io.*;
import java.nio.charset.Charset;
import opennlp.tools.chunker.*;
import opennlp.tools.util.*;

public class SentenceTrainer {
    public static void trainModel(String inputFile, String modelFile)
            throws IOException {
        // Read the training data from inputFile rather than a hard-coded file name.
        MarkableFileInputStreamFactory factory = new MarkableFileInputStreamFactory(
                new File(inputFile));
        Charset charset = Charset.forName("UTF-8");
        ObjectStream<String> lineStream = new PlainTextByLineStream(factory, charset);
        ObjectStream<ChunkSample> sampleStream = new ChunkSampleStream(lineStream);
        ChunkerModel model;
        try {
            model = ChunkerME.train("en", sampleStream,
                    new DefaultChunkerContextGenerator(), TrainingParameters.defaultParams());
        } finally {
            sampleStream.close();
        }
        // Serialize the trained model to modelFile.
        OutputStream modelOut = null;
        try {
            modelOut = new BufferedOutputStream(new FileOutputStream(modelFile));
            model.serialize(modelOut);
        } finally {
            if (modelOut != null)
                modelOut.close();
        }
    }
}
and the main method will be:
import java.io.IOException;

public class Main {
    public static void main(String args[]) throws IOException {
        String inputFile = "//path//to//data.train";
        String modelFile = "//path//to//.bin";
        SentenceTrainer.trainModel(inputFile, modelFile);
    }
}
reference: this blog
hope this helps!
PS: Write the data as above into a .txt file and rename it with a .train extension; a plain trainingdata.txt will work too. That is how you make a .train file.
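If you would rather produce the training file from code than by hand, here is a minimal stdlib-only sketch. It assumes the layout shown above (one "word POS chunk-tag" line per token) plus an empty line between sentences; the class name TrainFileWriter and the output file name are just illustrative choices, and the three tokens are taken from the sample sentence.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TrainFileWriter {
    // Formats one sentence: each token becomes a "word POS chunk-tag" line.
    public static List<String> formatSentence(String[][] tokens) {
        List<String> lines = new ArrayList<>();
        for (String[] t : tokens) {
            if (t.length != 3) {
                throw new IllegalArgumentException(
                        "expected word, POS tag, chunk tag: " + Arrays.toString(t));
            }
            lines.add(t[0] + " " + t[1] + " " + t[2]);
        }
        return lines;
    }

    // Writes all sentences to the file, each followed by an empty separator line.
    public static void write(Path out, List<String[][]> sentences) throws IOException {
        List<String> lines = new ArrayList<>();
        for (String[][] sentence : sentences) {
            lines.addAll(formatSentence(sentence));
            lines.add("");
        }
        Files.write(out, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String[][] sentence = {
            {"He", "PRP", "B-NP"},
            {"reckons", "VBZ", "B-VP"},
            {".", ".", "O"},
        };
        List<String[][]> sentences = new ArrayList<>();
        sentences.add(sentence);
        write(Paths.get("en-chunker.train"), sentences);
    }
}
```

A real training set needs far more than one sentence; this only shows the file layout.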
For a university project I have to use arules (an R package) from Java. I have successfully integrated R and Java using JRI, but I don't understand how to get the output of "inspect(Groceries[1:1])". I have tried asString() and asString[](), but this gives me the following error:
Exception in thread "main" java.lang.NullPointerException
at TestR.main(TestR.java:11)
Also, how can I run summary(Groceries) from Java? How do I get the output of summary as a String or a String array?
R code:
Java code:
import org.rosuda.JRI.Rengine;
import org.rosuda.JRI.REXP;

public class TestR {
    public static void main(String[] args) {
        Rengine re = new Rengine(new String[]{"--no-save"}, false, null);
        REXP result = re.eval("inspect(Groceries[1:1])");
    }
}
It appears that the inspect function in pkg:arules returns NULL. The output you see is a "side effect". You can attempt to capture the output, but this is untested, since I don't have experience with this kind of cross-language integration. Try this instead:
REXP result = re.eval("capture.output( inspect(Groceries[1:1]) )");
In an R console session you will get:
rules <- apriori(Adult)
val <- inspect(rules[1000])
> str(val)
> val.co <- capture.output(inspect(rules[1000]))
> val.co
[1] " lhs rhs support confidence lift"
[2] "1 {education=Some-college, "
[3] " sex=Male, "
[4] " capital-loss=None} => {native-country=United-States} 0.1208181 0.9256471 1.031449"
But I haven't tested this in a non-interactive session. May need to muck with the file argument to capture.output, ... or it may not work at all.
So I have some prolog...
cobrakai$more operator.pl
Which defines some infix operators. I run it using SWI-Prolog and get the following (perfectly expected) results:
?- halt.
cobrakai$swipl -s operator.pl
% library(swi_hooks) compiled into pce_swi_hooks 0.00 sec, 3,992 bytes
% /Users/josephreddington/Documents/workspace/com.plancomps.prolog.helloworld/operator.pl compiled 0.00 sec, 992 bytes
Welcome to SWI-Prolog (Multi-threaded, 64 bits, Version 5.10.5)
Copyright (c) 1990-2011 University of Amsterdam, VU Amsterdam
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to redistribute it under certain conditions.
Please visit http://www.swi-prolog.org for details.
For help, use ?- help(Topic). or ?- apropos(Word).
?- be(a,c).
?- a be c.
?- +=(a,c).
ERROR: toplevel: Undefined procedure: (+=)/2 (DWIM could not correct goal)
?- halt.
cobrakai$swipl -s operator.pl
% library(swi_hooks) compiled into pce_swi_hooks 0.00 sec, 3,992 bytes
% /Users/josephreddington/Documents/workspace/com.plancomps.prolog.helloworld/operator.pl compiled 0.00 sec, 1,280 bytes
Welcome to SWI-Prolog (Multi-threaded, 64 bits, Version 5.10.5)
Copyright (c) 1990-2011 University of Amsterdam, VU Amsterdam
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to redistribute it under certain conditions.
Please visit http://www.swi-prolog.org for details.
For help, use ?- help(Topic). or ?- apropos(Word).
?- be(a,c).
?- a be c.
?- +=(a,c).
?- a += c.
?- halt.
However, when I use tuProlog to process the same file from Java, using the following code:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import alice.tuprolog.Prolog;
import alice.tuprolog.SolveInfo;
import alice.tuprolog.Theory;

public class Testinfixoperatorconstruction {
    public static void main(String[] args) throws Exception {
        Prolog engine = new Prolog();
        engine.addTheory(new Theory(readFile("/Users/josephreddington/Documents/workspace/com.plancomps.prolog.helloworld/operator.pl")));
        SolveInfo info = engine.solve("be(a,c).");
        info = engine.solve("a be c.");
    }

    // Read the whole file into one string, keeping the line breaks.
    private static String readFile(String file) throws IOException {
        StringBuilder stringBuilder = new StringBuilder();
        String ls = System.getProperty("line.separator");
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = reader.readLine()) != null) {
                stringBuilder.append(line).append(ls);
            }
        }
        return stringBuilder.toString();
    }
}
The prolog file does not parse - failing on the '+=' token.
Exception in thread "main" alice.tuprolog.InvalidTheoryException: Unexpected token '+='
at alice.tuprolog.TheoryManager.consult(TheoryManager.java:193)
at alice.tuprolog.Prolog.addTheory(Prolog.java:242)
at Testinfixoperatorconstruction.main(Testinfixoperatorconstruction.java:14)
We can try a slightly different approach, adding the operator directly in the java code with...
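public static void main(String[] args) throws Exception {
    Prolog engine = new Prolog();
    // Register both operators before loading the theory.
    engine.getOperatorManager().opNew("be", "xfx", 35);
    engine.getOperatorManager().opNew("+=", "xfx", 35);
    engine.addTheory(new Theory(readFile("/Users/josephreddington/Documents/workspace/com.plancomps.prolog.helloworld/operator.pl")));
    SolveInfo info = engine.solve("be(a,c).");
    info = engine.solve("a be c.");
}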
but we get the same error... :(
Can anyone tell me why this is happening? (and solutions would also be welcome).
SWI-Prolog may be too permissive when parsing directives. Try enclosing the operators in parentheses:
Edit: I tried using 2p.jar, which allowed me to spot the problem. You need to quote the operator's atom:
:-op(35,xfx, '+=').
X += Y.
p :- a += b.
The interactive 2p console accepts this syntax. Note that 2p.jar loads the tuProlog libraries by default.
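If the theory text is assembled in Java before being handed to the engine, one way to avoid forgetting the quotes is to always quote the operator name when generating the directive. The helper below is only a sketch: OpDirective, quoteAtom and opDirective are hypothetical names, it uses nothing from tuProlog, and the escaping (doubling single quotes, doubling backslashes) follows the usual Prolog convention, which I have not verified against tuProlog's parser.

```java
public class OpDirective {
    // Wraps an atom in single quotes, escaping backslashes and single quotes.
    public static String quoteAtom(String atom) {
        String escaped = atom.replace("\\", "\\\\").replace("'", "''");
        return "'" + escaped + "'";
    }

    // Builds an op/3 directive with the operator name always quoted.
    public static String opDirective(int priority, String type, String name) {
        return ":- op(" + priority + ", " + type + ", " + quoteAtom(name) + ").";
    }

    public static void main(String[] args) {
        System.out.println(opDirective(35, "xfx", "+=")); // prints :- op(35, xfx, '+=').
        System.out.println(opDirective(35, "xfx", "be")); // prints :- op(35, xfx, 'be').
    }
}
```

Quoting an alphanumeric name such as be is harmless, since 'be' and be denote the same atom, so the helper does not need to decide which names require quotes.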