I am out of my R depth. I defined a function nGrams (using RWeka) that worked fine when I tried it out, and sometimes it still does. I do not know how to figure out what environment it works in, what environment I am in when I want to use it, etc. Any quick tips or can you point me to a webpage that could help? If I have to put in a change environment command every time I use it, that is just fine. I really do not understand the issue.
Here is what I see in my console:
blog2gramfreq <- nGrams(cleanblogs100000, 2)
Error in ls(envir = envir, all.names = private) :
invalid 'envir' argument
Called from: top level
Called from: top level
Browse[1]>
structure(function (this, private = FALSE, ...)
{
envir <- attr(this, ".env")
ls(envir = envir, all.names = private)
}, export = FALSE, S3class = "Object", modifiers = "public")
I do see nGrams in my Global Environment window.
This came up on a Coursera class blog, and I did not find an answer there, at least not for R. Here is what worked for me when I received the "OutOfMemoryError : not enough java heap space" error in R. Set the option before loading rJava or RWeka, because the JVM reads it only when it starts:
options(java.parameters="-Xmx4000m")
What am I doing?
I am writing a data analysis program in Java that relies on R's arulesViz library to mine association rules.
What do I want?
I want to store the rules in a Java String variable so that I can process them later.
How does it work?
The code combines Java's String.format with rJava's eval. Its behavior can be summarized as follows:
1. Given properly formatted Java data structures, it creates a data frame in R.
2. It converts that data frame into a transaction list using the arules library.
3. It runs the apriori algorithm on the transaction list, with the required thresholds passed as parameters.
4. It reorders the generated association rules.
5. Because the association rules cannot be printed directly, it writes them to standard output with R's write method, captures that output, and stores it in a variable. At this point the association rules have been converted into a string.
6. It returns the string.
The code is the following:
// Step 1
Rutils.rengine.eval("dataFrame <- data.frame(as.factor(c(\"Red\", \"Blue\", \"Yellow\", \"Blue\", \"Yellow\")), as.factor(c(\"Big\", \"Small\", \"Small\", \"Big\", \"Tiny\")), as.factor(c(\"Heavy\", \"Light\", \"Light\", \"Heavy\", \"Heavy\")))");
// Step 2
Rutils.rengine.eval("transList <- as(dataFrame, 'transactions')");
// Step 3
Rutils.rengine.eval(String.format("info <- apriori(transList, parameter = list(supp = %f, conf = %f, maxlen = 2))", supportThreshold, confidenceThreshold));
// Step 4
Rutils.rengine.eval("orderedRules <- sort(info, by = c('count', 'lift'), order = FALSE)");
// Step 5
REXP res = Rutils.rengine.eval("rulesAsString <- paste(capture.output(write(orderedRules, file = stdout(), sep = ',', quote = TRUE, row.names = FALSE, col.names = FALSE)), collapse='\n')");
// Step 6
return res.asString().replaceAll("'", "");
What's wrong?
The code runs perfectly on Linux, but when I run it on Windows I get the following error, which points to the return line:
Exception in thread "main" java.lang.NullPointerException
I commonly get this error whenever the R code produces a null result and passes it to Java. There is no way to syntax-check the R code from inside Java, so whenever the R code is wrong, this error message appears.
However, when I run the same R code (the strings passed to eval) at the R command line on Windows, it works flawlessly, so both the syntax and the data flow are OK.
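One way to make this kind of failure easier to diagnose is to check the result for null before using it and to let R report its own errors as text. The following is only a minimal sketch under stated assumptions: the class SafeREval and both of its methods are hypothetical helpers, and the evaluator is passed in as a plain java.util.function.Function (a stand-in for Rutils.rengine.eval) so that the snippet compiles without JRI on the classpath.

```java
import java.util.function.Function;

// Hypothetical helper, not part of rJava/JRI.
final class SafeREval {

    // Wraps R code in tryCatch so that an R runtime error comes back
    // as a string starting with "R-ERROR:" instead of a failed evaluation.
    static String wrapInTryCatch(String rCode) {
        return "tryCatch({ " + rCode + " }, "
                + "error = function(e) paste('R-ERROR:', conditionMessage(e)))";
    }

    // Runs the code through the supplied evaluator (assumed to behave like
    // Rutils.rengine.eval) and fails with a readable message on a null
    // result, rather than with a NullPointerException further down.
    static <T> T evalOrThrow(Function<String, T> evaluator, String rCode) {
        T result = evaluator.apply(rCode);
        if (result == null) {
            throw new IllegalStateException("R evaluation returned null for: " + rCode);
        }
        return result;
    }
}
```

With JRI this could presumably be called as SafeREval.evalOrThrow(Rutils.rengine::eval, SafeREval.wrapInTryCatch(rCode)). The null check stays even with the wrapper, because code that fails to parse never reaches tryCatch.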
Technical information
In Linux, I am using R with OpenJDK 10.
In Windows, I am currently using Oracle's latest JDK release, but running the program with OpenJDK 12 for Windows does not change anything.
Everything is 64 bits.
The IDE used in both operating systems is IntelliJ IDEA 2019.
Screenshots
Linux run configuration:
Windows run configuration:
I want to parallelize my data writing process. I am writing a data frame to Oracle Database. This data has 4 million rows and 8 columns. It takes 6.5 hours without parallelizing.
When I try to go parallel, I get the error
Error in checkForRemoteErrors(val) :
7 nodes produced errors; first error: No running JVM detected. Maybe .jinit() would help.
I know this error, and I can solve it when I work on a single core. But I do not know how to tell the other cluster nodes where Java is located. Here is my code:
Sys.setenv(JAVA_HOME='C:/Program Files/Java/jre1.8.0_181')
library(rJava)
library(RJDBC)
library(DBI)
library(compiler)
library(dplyr)
library(data.table)
jdbcDriver <- JDBC("oracle.jdbc.OracleDriver", classPath = "C:/Program Files/directory/ojdbc6.jar", identifier.quote = "\"")
jdbcConnection <- dbConnect(jdbcDriver, "jdbc:oracle:thin:@//XXXXX", "YYYYY", "ZZZZZ")
Calling Sys.setenv(JAVA_HOME='C:/Program Files/Java/jre1.8.0_181') solves the problem on a single core. But when I go parallel:
library(parallel)
no_cores <- detectCores() - 1
cl <- makeCluster(no_cores)
clusterExport(cl, varlist = list("jdbcConnection", "brand3.merge.u"))
clusterEvalQ(cl, .libPaths("C:/Users/onur.boyar/Documents/R/win-library/3.5"))
clusterEvalQ(cl, library(RJDBC))
clusterEvalQ(cl, library(rJava))
parLapply(cl, 1:length(brand3.merge.u$CELL_PH_NUM), function(x) {
  dbSendUpdate(jdbcConnection,
               "INSERT INTO xxnvdw.an_cust_analytics VALUES(?,?,?,?,?,?,?,?)",
               brand3.merge.u[x, 1], brand3.merge.u[x, 2], brand3.merge.u[x, 3],
               brand3.merge.u[x, 4], brand3.merge.u[x, 5], brand3.merge.u[x, 6],
               brand3.merge.u[x, 7], brand3.merge.u[x, 8])
})
# brand3.merge.u is the data frame that I am trying to write.
I get the above error and I do not know how to set my Java location for other nodes.
I want to use parLapply since it is faster than foreach. Any help would be appreciated. Thanks!
JAVA_HOME environment variable
If the problem really is the location of Java, you could set the environment variable in your .Renviron file, which is most likely located at ~/.Renviron. Add the following line to that file and it will be propagated to every R session started by your user:
JAVA_HOME='C:/Program Files/Java/jre1.8.0_181'
Alternatively, you can just add that location to your PATH environment variable.
JVM Initialization via rJava
On the other hand, the error message may simply mean that the JVM has not been initialized on the workers, which you can fix by calling .jinit inside the worker function. A minimal example:
library(parallel)
cl <- makeCluster(detectCores())
parallel::parLapply(cl, 1:5, function(x) {
  rJava::.jinit()
  rJava::.jnew(class = "java/lang/Integer", x)$toString()
})
Working around Java use
This was not specifically asked, but you can also work around the need for Java dependency using ODBC drivers, which for Oracle should be accessible here:
con <- DBI::dbConnect(
  odbc::odbc(),
  Driver = "[your driver's name]",
  ...
)
I have a strange memory problem with the Java library of Z3, and I cannot figure out where it comes from. Oddly, I can't reproduce the problem on a Windows machine where I have Java 7 (although I most probably have a slightly older version of Z3 there). The problem occurs on Mac OS X 10.6.8 with Java 6 and Z3 v4.3.2. I have an application that uses Z3 for analysis, and I tracked the following piece of code down as the (initial) source of the problem:
Symbol eNames = con.mkSymbol(domainName);
Symbol[] symbols = new Symbol[values.length];
for (int i = 0; i < values.length; i++) symbols[i] = con.mkSymbol(values[i]);
System.out.println("Before ENUMSORT");
//EnumSort eSort = con.mkEnumSort(domainName, values);
EnumSort eSort = con.mkEnumSort(eNames,symbols);
System.out.println("After ENUM SORT ...");
When I run the application I get the following after "Before ENUMSORT" is printed:
java(55938,0x100501000) malloc: *** error for object 0x10200f1b8: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
I know that print statements are not a good way of debugging, especially for a memory problem, but the code is very difficult to debug because the error originates in JNI. When I look at the Z3 code here (https://github.com/Z3Prover/z3/blob/master/src/api/api_datatype.cpp), I cannot figure out what the source of the problem is. I assume that the Java method mkEnumSort calls Z3_mk_enumeration_sort. When I change the call to mkEnumSort in my code to a form like
EnumSort eSort = con.mkEnumSort(domainName,new String[]{"X","Y"});
the problem seems to go away. What could be the source of the problem?
Any help is highly appreciated.
Executing the following code in Java 7:
ScriptEngine scriptEngine = new ScriptEngineManager().getEngineByName("js");
Bindings b = scriptEngine.createBindings();
b.put("x", true);
scriptEngine.eval("x&y", b);
I get the error
sun.org.mozilla.javascript.internal.EcmaError: ReferenceError: "b" is not defined. (<Unknown Source>#1) in <Unknown Source> at line number 1
Is there an option to evaluate to null/false for undefined objects, like in JavaScript?
I know that an option will be to do something like "this.x&this.y" instead of "x&y", but I don't have control over that string (user entered).
I browsed a little bit through the Rhino code and it seems that there's no such option.
In the end I will prepend "this." to each variable. This is far from a desirable solution (I will not even accept my own answer :) ), but for the time being I have no other.
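A minimal sketch of that workaround, assuming the user-entered expression contains only identifiers, operators, numbers and property accesses: the class ThisPrefixer and its reserved-word list are hypothetical and deliberately incomplete, and string literals inside the expression are not handled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites bare identifiers such as x into this.x.
final class ThisPrefixer {

    // Words that must stay untouched (incomplete list, extend as needed).
    private static final Set<String> RESERVED = new HashSet<>(Arrays.asList(
            "true", "false", "null", "undefined", "this",
            "typeof", "new", "in", "instanceof"));

    // An identifier that is not preceded by a word character, '$' or '.',
    // so property names (a.b) and number suffixes (1e5) are skipped.
    private static final Pattern IDENT =
            Pattern.compile("(?<![\\w$.])[A-Za-z_$][\\w$]*");

    static String prefixWithThis(String expr) {
        Matcher m = IDENT.matcher(expr);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group();
            String replacement = RESERVED.contains(name) ? name : "this." + name;
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The rewritten string would then be passed to scriptEngine.eval in place of the raw user input, for example scriptEngine.eval(ThisPrefixer.prefixWithThis("x&y"), b).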
I used RWeka to call Weka functions directly in R.
I tried using meta learning (bagging) but failed.
My code is Bagging(classLabel ~ ., data = train, control = Weka_control(W = J48))
However, the following error pops up:
Error in Bagging(classLabel ~ ., data = train, control = Weka_control(W = J48)) :
unused argument(s) (data = train, control = Weka_control(W = J48))
I also tried several different base learners but always got the same error.
If you have successfully used meta learning in RWeka before, please let me know.
I just tried writing it another way:
optns <- Weka_control(W = "weka.classifiers.trees.REPTree")
Bagging <- make_Weka_classifier("weka/classifiers/meta/Bagging")
model <- Bagging(classLabel ~ ., data = dat, control = optns)
Surprisingly the R code works now.
-Credit Leo5188