I'm trying to calculate high-PMI terms for a particular token in pylucene. A colleague gave me some Java code that works, but I am having trouble translating it into Python. In particular, the code relies on a custom collector. Here's the initial query code:
def __init__(self, some_token, searcher, analyzer):
    super(PMICalculator, self).__init__()
    self.searcher = searcher
    self.analyzer = analyzer
    self.escaped_token = QueryParser.escape(some_token)
    self.query = QueryParser("text", self.analyzer).parse(self.escaped_token)
    self.term_count_collector = TermCountCollector(searcher)
    self.searcher.search(self.query, self.term_count_collector)
    self.terms = self.term_count_collector.getTerms()
Here's the Term Count Collector class: http://snipt.org/vgGi8
This code breaks at self.searcher.search with the error:
File <filename>, line 26, in __init__
self.searcher.search(self.query, self.term_count_collector)
lucene.JavaError: org.apache.jcc.PythonException: collect() takes exactly 2 arguments (3 given)
TypeError: collect() takes exactly 2 arguments (3 given)
Java stacktrace:
org.apache.jcc.PythonException: collect() takes exactly 2 arguments (3 given)
TypeError: collect() takes exactly 2 arguments (3 given)
at org.apache.pylucene.search.PythonHitCollector.collect(Native Method)
at org.apache.lucene.search.HitCollectorWrapper.collect(HitCollectorWrapper.java:46)
at org.apache.lucene.search.TermScorer.score(TermScorer.java:86)
at org.apache.lucene.search.TermScorer.score(TermScorer.java:74)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:252)
at org.apache.lucene.search.Searcher.search(Searcher.java:110)
I searched Google, but to no avail. I am new to Lucene and can't tell whether this feature is simply not supported by 2.9.4, whether it's a PyLucene issue, or whether my code is wrong. Please help!
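The stack trace shows the Java side (HitCollectorWrapper) calling collect with two values, which for a HitCollector are a document id and a score, so the Python method has to accept both in addition to self. Below is a minimal sketch of that mismatch in plain Python, with no Lucene involved. BrokenCollector, FixedCollector, and java_side_call are made-up names standing in for the real classes, and it is only an assumption that the linked TermCountCollector defines collect with a single parameter.

```python
class BrokenCollector(object):
    """Hypothetical collector whose collect() accepts only a doc id."""

    def __init__(self):
        self.docs = []

    def collect(self, doc):
        self.docs.append(doc)


class FixedCollector(object):
    """Hypothetical collector whose collect() accepts a doc id and a score."""

    def __init__(self):
        self.docs = []

    def collect(self, doc, score):
        self.docs.append((doc, score))


def java_side_call(collector):
    """Stand-in for the Java caller, which passes a doc id and a score."""
    collector.collect(7, 0.5)


try:
    java_side_call(BrokenCollector())
except TypeError as exc:
    # Same kind of arity error as in the traceback above.
    print("broken:", exc)

fixed = FixedCollector()
java_side_call(fixed)
print("fixed:", fixed.docs)
```

If the real collector follows the first shape, giving collect a second score parameter would be the first thing to try.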
I am getting the error below even though my Python code looks correct, and I don't know how to resolve it. Any help is much appreciated.
org.apache.nifi.processor.exception.ProcessException: javax.script.ScriptException: SyntaxError: no viable alternative at input '*' in <script> at line number 35 at column number 26
python code
def get_match_list(regEx, line):
    match = re.search(regEx, line)
    print(match)
    if match:
        match_list = [*match.groups()]  # this is the line the exception points to
        return match_list
    else:
        return []
It looks like Jython implements Python 2.7. Unpacking generalizations were introduced in Python 3.5, so you cannot use this syntax in Jython. An alternative way to convert the tuple to a list is list(match.groups()), which works fine in older versions of Python and in the current version of Jython (2.7.2).
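A minimal sketch of that alternative, written so that it should run on both Python 2.7 and Python 3. The sample pattern and input line are made up for illustration.

```python
import re


def get_match_list(regex, line):
    """Return the captured groups of the first match as a list, or [] if none."""
    match = re.search(regex, line)
    if match:
        # list(...) copies the tuple returned by groups(); no unpacking syntax needed.
        return list(match.groups())
    return []


# Hypothetical sample input, for illustration only.
print(get_match_list(r"(\w+)=(\d+)", "retries=3"))
```

The function returns an empty list when nothing matches, just as the original version does.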
What am I doing?
I am writing a data analysis program in Java which relies on R's arulesViz library to mine association rules.
What do I want?
My purpose is to store the rules in a String variable in Java so that I can process them later.
How does it work?
The code combines Java's String.format with rJava's eval. Its behavior can be summarized as:
1. Given properly formatted Java data structures, it creates a data frame in R.
2. It converts the newly created data frame into a transaction list using the arules library.
3. It runs the apriori algorithm on the transaction list, with the required thresholds passed as parameters.
4. It reorders the generated association rules.
5. Because the association rules cannot be printed directly, they are written to standard output with R's write method; the output is captured and stored in a variable, which turns the rules into a string.
6. It returns the string.
The code is the following:
// Step 1
Rutils.rengine.eval("dataFrame <- data.frame(as.factor(c(\"Red\", \"Blue\", \"Yellow\", \"Blue\", \"Yellow\")), as.factor(c(\"Big\", \"Small\", \"Small\", \"Big\", \"Tiny\")), as.factor(c(\"Heavy\", \"Light\", \"Light\", \"Heavy\", \"Heavy\")))");
//Step 2
Rutils.rengine.eval("transList <- as(dataFrame, 'transactions')");
//Step 3
Rutils.rengine.eval(String.format("info <- apriori(transList, parameter = list(supp = %f, conf = %f, maxlen = 2))", supportThreshold, confidenceThreshold));
// Step 4
Rutils.rengine.eval("orderedRules <- sort(info, by = c('count', 'lift'), order = FALSE)");
// Step 5
REXP res = Rutils.rengine.eval("rulesAsString <- paste(capture.output(write(orderedRules, file = stdout(), sep = ',', quote = TRUE, row.names = FALSE, col.names = FALSE)), collapse='\n')");
// Step 6
return res.asString().replaceAll("'", "");
What's wrong?
Running the code on Linux works perfectly, but when I try to run it on Windows, I get the following error referring to the return line:
Exception in thread "main" java.lang.NullPointerException
This is a common error I get whenever the R code produces a null result and passes it to Java. There's no way to syntax-check the R code inside Java, so whenever it's wrong, this error message appears.
However, when I run the R code in brackets in the R command line on Windows, it works flawlessly, so both the syntax and the data flow are OK.
Technical information
In Linux, I am using R with OpenJDK 10.
In Windows, I am currently using Oracle's latest JDK release, but running the program with OpenJDK 12 for Windows instead does not solve anything.
Everything is 64-bit.
The IDE used in both operating systems is IntelliJ IDEA 2019.
Screenshots (images not included): Linux run configuration and Windows run configuration.
I am trying to integrate Python and Java using Jep. From a Java program, I have loaded a random forest model from a pickle file (rf.pkl) as a sklearn.ensemble.forest.RandomForestClassifier object using Jep.
I want this loading to happen only once, so I want to call a Python function defined in the script prediction.py (which predicts using the rf model), passing the model as the "rfmodel" argument from Java.
But the argument sent from Java arrives in Python as a string. How can I retain its type in Python as sklearn.ensemble.forest.RandomForestClassifier?
Jep jep = new Jep();
jep.eval("import pickle");
jep.eval("clf = pickle.load(open('C:/Downloads/DSRFmodel.pkl', 'rb'))");
jep.eval("print(type(clf))");
Object randomForest = jep.getValue("clf");
jep.eval("import integration");
jep.set("arg1", requestId);
jep.set("arg2", randomForest);
jep.eval("result = integration.trainmodel(arg1, arg2)");
------------
python.py
import pickle

def trainmodel(requestid, rf):
    # When rf is printed here, its type is 'str'.
    print(type(rf))
When Jep converts a Python object into a Java object and does not recognize the Python type, it returns the string representation of the Python object; see this bug for discussion of that behavior. If you are running the latest version of Jep (3.8), you can override this behavior by passing a Java class to the getValue function. The PyObject class was created to serve as a generic wrapper around arbitrary Python objects. The following code should do what you want:
Jep jep = new Jep();
jep.eval("import pickle");
jep.eval("clf = pickle.load(open('C:/Downloads/DSRFmodel.pkl', 'rb'))");
jep.eval("print(type(clf))");
Object randomForest = jep.getValue("clf", PyObject.class);
jep.eval("import integration");
jep.set("arg1", requestId);
jep.set("arg2", randomForest);
jep.eval("result = integration.trainmodel(arg1, arg2)");
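On the Python side, a small guard can make the symptom obvious if a string representation arrives instead of the model. This is only a sketch: the predict check and the StubModel class are assumptions made for illustration and are not part of Jep or scikit-learn.

```python
def trainmodel(requestid, rf):
    """Reject a model that arrived as its string representation."""
    if isinstance(rf, str):
        raise TypeError("request %s: expected a model object, got a str" % requestid)
    if not hasattr(rf, "predict"):
        raise TypeError("request %s: object has no predict method" % requestid)
    # Return the type name so the caller can confirm what was received.
    return type(rf).__name__


class StubModel(object):
    """Hypothetical stand-in for the unpickled classifier."""

    def predict(self, rows):
        return [0 for _ in rows]


print(trainmodel("req-1", StubModel()))
```

With the real model passed through as a PyObject, the returned name should be that of the classifier class rather than a failure on str.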
I'm trying to use Java OpenCL (JOCL) from within JRuby, but am encountering a problem which I can't solve, even after much Google searching.
require 'java'
require 'JOCL-0.1.7.jar'
platforms = org.jocl.cl_platform_id.new
puts platforms.class
org.jocl.CL.clGetPlatformIDs(1, platforms, nil)
When I run this code using: jruby test.rb
I get the following error when the last line is uncommented:
#<Class:0x10191777e>
TypeError: cannot convert instance of class org.jruby.java.proxies.ConcreteJavaProxy to class [Lorg.jocl.cl_platform_id;
LukeTest at test.rb:29
(root) at test.rb:4
Just wondering whether anyone has an idea on how to solve this problem?
EDIT:
OK, so I think I've solved the first part of this problem by making platforms an array:
platforms = org.jocl.cl_platform_id[1].new
but that led to this error when adding the next couple of lines:
context_properties = org.jocl.cl_context_properties.new()
context_properties.addProperty(org.jocl.CL::CL_CONTEXT_PLATFORM, platforms[0])
CodegenUtils.java:98:in `human': java.lang.NullPointerException
from CodegenUtils.java:152:in `prettyParams'
from CallableSelector.java:462:in `argumentError'
from CallableSelector.java:436:in `argTypesDoNotMatch'
from RubyToJavaInvoker.java:248:in `findCallableArityTwo'
from InstanceMethodInvoker.java:66:in `call'
from CachingCallSite.java:332:in `cacheAndCall'
from CachingCallSite.java:203:in `call'
from test.rb:36:in `module__0$RUBY$LukeTest'
from test.rb:-1:in `module__0$RUBY$LukeTest'
from test.rb:4:in `__file__'
from test.rb:-1:in `load'
from Ruby.java:679:in `runScript'
from Ruby.java:672:in `runScript'
from Ruby.java:579:in `runNormally'
from Ruby.java:428:in `runFromMain'
from Main.java:278:in `doRunFromMain'
from Main.java:198:in `internalRun'
from Main.java:164:in `run'
from Main.java:148:in `run'
from Main.java:128:in `main'
For some reason, when I print the class of platforms[0], it's listed as NilClass!?
You are overlooking a very simple mistake. You write
platforms = org.jocl.cl_platform_id.new
but that line creates a single instance of the class org.jocl.cl_platform_id. You then pass that as the second parameter to org.jocl.CL.clGetPlatformIDs in
org.jocl.CL.clGetPlatformIDs(1, platforms, nil)
and that doesn't work, because the second argument of the method requires an (empty) array of org.jocl.cl_platform_id objects.
What the error says is: "I have something that is a proxy for a Java object and I can't turn it into an array of org.jocl.cl_platform_id objects, as you are asking me to do."
If you just say
platforms = []
and pass that in, it might just work :).
I am getting ClassVerifyErrors when attempting to load a class I have generated using ASM. On further inspection I can see that the JVM is correct: the method it is complaining about has an invalid MAX_STACK value. The strange thing is that I am using the options to automatically compute max stack and max locals, so this should not be a problem...
The method with the invalid value is very simple, and yet the result is bad bytecode.
I have written a class with the intended method and compared my ASM-generated class against what javac produces. The bytecodes match up; the only difference is the max stack, which is 0 in my class (which is wrong), while javac sets a value of 2.
I'd like to avoid having to calculate the max stack/locals myself.
Max stack and max locals calculation can produce the wrong results if the bytecode is not valid. You can verify that by running the generated code through the CheckClassAdapter.
For example,
ClassWriter cw = new ClassWriter(ClassWriter.COMPUTE_MAXS);
// generate code into cw instance...
PrintWriter pw = new PrintWriter(System.out);
CheckClassAdapter.verify(new ClassReader(cw.toByteArray()), true, pw);