I want to parallelize my data writing process. I am writing a data frame to Oracle Database. This data has 4 million rows and 8 columns. It takes 6.5 hours without parallelizing.
When I try to go parallel, I get the error
Error in checkForRemoteErrors(val) :
7 nodes produced errors; first error: No running JVM detected. Maybe .jinit() would help.
I know this error. I can solve it when I work with single cluster. But I do not know how to tell other clusters the location of Java. Here is my code
Sys.setenv(JAVA_HOME='C:/Program Files/Java/jre1.8.0_181')
library(rJava)
library(RJDBC)
library(DBI)
library(compiler)
library(dplyr)
library(data.table)
jdbcDriver =JDBC("oracle.jdbc.OracleDriver",classPath="C:/Program Files/directory/ojdbc6.jar", identifier.quote = "\"")
jdbcConnection =dbConnect(jdbcDriver, "jdbc:oracle:thin:#//XXXXX", "YYYYY", "ZZZZZ")
By using Sys.setenv(JAVA_HOME='C:/Program Files/Java/jre1.8.0_181') I solve the same problem for single core. But when I go parallel
library(parallel)
no_cores <- detectCores() - 1
cl <- makeCluster(no_cores)
clusterExport(cl, varlist = list("jdbcConnection", "brand3.merge.u"))
clusterEvalQ(cl, .libPaths("C:/Users/onur.boyar/Documents/R/win-library/3.5"))
clusterEvalQ(cl, library(RJDBC))
clusterEvalQ(cl, library(rJava))
parLapply(cl, 1:length(brand3.merge.u$CELL_PH_NUM), function(x) dbSendUpdate(jdbcConnection, "INSERT INTO xxnvdw.an_cust_analytics VALUES(?,?,?,?,?,?,?,?)", brand3.merge.u[x, 1], brand3.merge.u[x,2], brand3.merge.u[x,3],brand3.merge.u[x,4],brand3.merge.u[x,5],brand3.merge.u[x,6],brand3.merge.u[x,7],brand3.merge.u[x,8]))
#brand3.merge.u is my data frame that I try to write.
I get the above error and I do not know how to set my Java location for other nodes.
I want to use parLapply since it is faster than foreach. Any help would be appreciated. Thanks!
JAVA_HOME environment variable
If the problem really is with the location of Java, you could set the environment variable in your .Renviron file. It is likely located in ~/.Renviron. Add a line to that file and this will be propagated to all R session that run via your user:
JAVA_HOME='C:/Program Files/Java/jre1.8.0_181'
Alternatively, you can just add that location to your PATH environment variable.
JVM Initialization via rJava
On the other hand the error message may point to just a JVM not being initialized, which you can solve with .jinit, a minimal example:
library(parallel)
cl <- makeCluster(detectCores())
parallel::parLapply(cl, 1:5, function(x) {
rJava::.jinit()
rJava::.jnew(class = "java/lang/Integer", x)$toString()
})
Working around Java use
This was not specifically asked, but you can also work around the need for Java dependency using ODBC drivers, which for Oracle should be accessible here:
con <- DBI::dbConnect(
odbc::odbc(),
Driver = "[your driver's name]",
...
)
Related
What am I doing?
I am writing a data analysis program in Java which relies on R´s arulesViz library to mine association rules.
What do I want?
My purpose is to store the rules in a String variable in Java so that I can process them later.
How does it work?
The code works using a combination of String.format and eval Java and RJava instructions respectively, being its behavior summarized as:
Given properly formatted Java data structures, creates a data frame in R.
Formats the recently created data frame into a transaction list using the arules library.
Runs the apriori algorithm with the transaction list and some necessary values passed as parameter.
Reorders the generated association rules.
Given that the association rules cannot be printed, they are written to the standard output with R´s write method, capture the output and store it in a variable. We have converted the association rules into a string variable.
We return the string.
The code is the following:
// Step 1
Rutils.rengine.eval("dataFrame <- data.frame(as.factor(c(\"Red\", \"Blue\", \"Yellow\", \"Blue\", \"Yellow\")), as.factor(c(\"Big\", \"Small\", \"Small\", \"Big\", \"Tiny\")), as.factor(c(\"Heavy\", \"Light\", \"Light\", \"Heavy\", \"Heavy\")))");
//Step 2
Rutils.rengine.eval("transList <- as(dataFrame, 'transactions')");
//Step 3
Rutils.rengine.eval(String.format("info <- apriori(transList, parameter = list(supp = %f, conf = %f, maxlen = 2))", supportThreshold, confidenceThreshold));
// Step 4
Rutils.rengine.eval("orderedRules <- sort(info, by = c('count', 'lift'), order = FALSE)");
// Step 5
REXP res = Rutils.rengine.eval("rulesAsString <- paste(capture.output(write(orderedRules, file = stdout(), sep = ',', quote = TRUE, row.names = FALSE, col.names = FALSE)), collapse='\n')");
// Step 6
return res.asString().replaceAll("'", "");
What´s wrong?
Running the code in Linux Will work perfectly, but when I try to run it in Windows, I get the following error referring to the return line:
Exception in thread "main" java.lang.NullPointerException
This is a common error I have whenever the R code generates a null result and passes it to Java. There´s no way to syntax check the R code inside Java, so whenever it´s wrong, this error message appears.
However, when I run the R code in brackets in the R command line in Windows, it works flawlessly, so both the syntax and the data flow are OK.
Technical information
In Linux, I am using R with OpenJDK 10.
In Windows, I am currently using Oracle´s latest JDK release, but trying to run the program with OpenJDK 12 for Windows does not solve anything.
Everything is 64 bits.
The IDE used in both operating systems is IntelliJ IDEA 2019.
Screenshots
Linux run configuration:
Windows run configuration:
I'm trying the quickstart from here: http://datafu.incubator.apache.org/docs/datafu/getting-started.html
I tried nearly everything, but I'm sure it must be my fault somewhere. I tried already:
exporting PIG_HOME, CLASSPATH, PIG_CLASSPATH
starting pig with -cpdatafu-pig-incubating-1.3.0.jar
registering datafu-pig-incubating-1.3.0.jar locally and in hdfs => both succesful (at least no error shown)
nothing helped
Trying this on pig:
register datafu-pig-incubating-1.3.0.jar
DEFINE Median datafu.pig.stats.StreamingMedian();
data = load '/user/hduser/numbers.txt' using PigStorage() as (val:int);
data2 = FOREACH (GROUP data ALL) GENERATE Median(data);
or directly
data2 = FOREACH (GROUP data ALL) GENERATE datafu.pig.stats.StreamingMedian(data);
I get this name-resolve error:
2016-06-04 17:22:22,734 [main] ERROR org.apache.pig.tools.grunt.Grunt
- ERROR 1070: Could not resolve datafu.pig.stats.StreamingMedian using imports: [, java.lang., org.apache.pig.builtin.,
org.apache.pig.impl.builtin.] Details at logfile:
/home/hadoop/pig_1465053680252.log
When I look into the datafu-pig-incubating-1.3.0.jar it looks OK, everything in place. I also tried some Bag functions, same error then.
I think it's kind of a noob-error which I just don't see (as I did not find particular answers for datafu in SO or google), so thanks in advance for shedding some light on this.
Pig script is proper, the only thing that could break is that while registering datafu there were some class dependencies that coudn't been met.
Try to run locally (pig -x local) and see a detailed log.
Check also the version of pig - it should be newer than 0.14.0.
I'm trying to do a feature selection using the chi.squared function in FSelector package in R.
My dataset is about 132 variables X 192,000 rows.
chisquared.fs <- chi.squared(fo,df)
where fo contains the class variable: class ~.
I'm getting this error while running the code:
Error in .jcall("weka/filters/Filter", "Lweka/core/Instances;", "useFilter",
:java.lang.OutOfMemoryError: Java heap space
I know it is a Java memory leak error and I have already tried this before calling any libraries:
options( java.parameters = "-Xmx6g")
Any pointers would be really welcome.
Guys update: I had done what #copeg suggested without restarting R. I restarted R and with the options statement at the beginning before calling the libraries and it worked. Thanks for your suggestions.
I have a quite big (>2.5 GB) h2 database file. Driver version is 1.4.182. Everything worked fine but recently the DB stop to work with exception:
Błąd ogólny: "java.lang.NullPointerException"
General error: "java.lang.NullPointerException" [50000-182] HY000/50000 (Help)
org.h2.jdbc.JdbcSQLException: Błąd ogólny: "java.lang.NullPointerException"
General error: "java.lang.NullPointerException" [50000-182]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
at org.h2.message.DbException.get(DbException.java:168)
at org.h2.message.DbException.convert(DbException.java:295)
at org.h2.engine.Database.openDatabase(Database.java:297)
at org.h2.engine.Database.<init>(Database.java:260)
at org.h2.engine.Engine.openSession(Engine.java:60)
at org.h2.engine.Engine.openSession(Engine.java:167)
at org.h2.engine.Engine.createSessionAndValidate(Engine.java:145)
at org.h2.engine.Engine.createSession(Engine.java:128)
at org.h2.engine.Engine.createSession(Engine.java:26)
at org.h2.engine.SessionRemote.connectEmbeddedOrServer(SessionRemote.java:347)
at org.h2.jdbc.JdbcConnection.<init>(JdbcConnection.java:108)
at org.h2.jdbc.JdbcConnection.<init>(JdbcConnection.java:92)
at org.h2.Driver.connect(Driver.java:72)
at org.h2.server.web.WebServer.getConnection(WebServer.java:750)
at org.h2.server.web.WebApp.test(WebApp.java:895)
at org.h2.server.web.WebApp.process(WebApp.java:221)
at org.h2.server.web.WebApp.processRequest(WebApp.java:170)
at org.h2.server.web.WebThread.process(WebThread.java:137)
at org.h2.server.web.WebThread.run(WebThread.java:93)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.NullPointerException
at org.h2.mvstore.db.ValueDataType.compare(ValueDataType.java:102)
at org.h2.mvstore.MVMap.compare(MVMap.java:741)
at org.h2.mvstore.Page.binarySearch(Page.java:388)
at org.h2.mvstore.MVMap.put(MVMap.java:179)
at org.h2.mvstore.MVMap.put(MVMap.java:133)
at org.h2.mvstore.db.TransactionStore.rollbackTo(TransactionStore.java:491)
at org.h2.mvstore.db.TransactionStore$Transaction.rollback(TransactionStore.java:785)
at org.h2.mvstore.db.MVTableEngine$Store.initTransactions(MVTableEngine.java:223)
at org.h2.engine.Database.open(Database.java:736)
at org.h2.engine.Database.openDatabase(Database.java:266)
... 17 more
The problem occurs in my application and using H2 web frontend.
I have tried solution from similar question but I cannot downgrade H2 to 1.3.x as it cannot read 1.4.x DB files.
My questions are:
How to handle it? Is it to possible to make it work again? I have tried downgrade H2 to 1.4.177 but it didn't help.
Is there any way to at least recover data to other format? I could use other DB (Sqlite, etc.) however I would need a way to get to these data.
EDIT: updated stacktrace
EDIT 2: Result of using Recovery tool:
$ java -cp h2-1.4.182.jar org.h2.tools.Recover
Exception in thread "main" java.lang.IllegalStateException: Unknown tag 50 [1.4.182/6]
at org.h2.mvstore.DataUtils.newIllegalStateException(DataUtils.java:762)
at org.h2.mvstore.type.ObjectDataType.read(ObjectDataType.java:222)
at org.h2.mvstore.db.TransactionStore$ArrayType.read(TransactionStore.java:1792)
at org.h2.mvstore.db.TransactionStore$ArrayType.read(TransactionStore.java:1759)
at org.h2.mvstore.Page.read(Page.java:843)
at org.h2.mvstore.Page.read(Page.java:230)
at org.h2.mvstore.MVStore.readPage(MVStore.java:1813)
at org.h2.mvstore.MVMap.readPage(MVMap.java:769)
at org.h2.mvstore.Page.getChildPage(Page.java:252)
at org.h2.mvstore.MVMap.getFirstLast(MVMap.java:351)
at org.h2.mvstore.MVMap.firstKey(MVMap.java:218)
at org.h2.mvstore.db.TransactionStore.init(TransactionStore.java:169)
at org.h2.mvstore.db.TransactionStore.<init>(TransactionStore.java:117)
at org.h2.mvstore.db.TransactionStore.<init>(TransactionStore.java:81)
at org.h2.tools.Recover.dumpMVStoreFile(Recover.java:593)
at org.h2.tools.Recover.process(Recover.java:331)
at org.h2.tools.Recover.runTool(Recover.java:192)
at org.h2.tools.Recover.main(Recover.java:155)
I also noticed that two another files (.txt and .sql) has been created but they don't seem to contain a data.
I got the same situation last week with JIRA database, it took me few hours to do google regarding to this problem however, there are no answers which can resolve the situation.
I decided to take a look at H2 source code and I can recover the whole database with very simple code. I may not understand whole picture about the situation e.g. root cause, in which condition it happened, etc.
However, the cause is: when you connect to h2 file then H2 engine will look into auditLog and rollback the in-progress transactions, there are some data which have unknown type (id is 17) and H2 fails to rollback due to exception during recognize the type (id 17).
My code is simple, add h2 lib into your build path then just manual connect to file and clear auditLog because I think it is just the log and will not have big impact (someone may correct me).
Hopefully, you can resolve your problem as well.
public static void main(final String[] args) {
// open the store (in-memory if fileName is null)
final MVStore store = MVStore.open("C:\\temp\\h2db.mv.db");
final MVMap<Object, Object> openMap = store.openMap("undoLog");
openMap.clear();
// close the store (this will persist changes)
store.close();
}
Another solution for this problem:
Go to your home folder (~ in linux).
Move all files named [*.mv.db] to a backup with a different name. For example: mv xyz.mv.db xyz.mv.db.backup
Restart your database.
This seems to clear out MVStore meta-data used for H2 undo features, and resolves the NPE from the MV Store compare.
I had a similar problem:
[HY000][50000] Allgemeiner Fehler: "java.lang.NullPointerException"
General error: "java.lang.NullPointerException" [50000-176]
java.lang.NullPointerException
I tried to connect with IntelliJ to H2 file DB. H2 driver version was 1.3.176 but the DB file version had 1.3.161. So downgraded the driver to 1.3.161 in IntelliJ solved the problem completely.
I've been trying to query data from a PostgreSQL DB via R. I've tried skinning the cat with a few different packages (RODBC, RJDBC, DBI, RPostgres, etc), but I seem to keep getting driver errors. Oddly, I never have trouble using the same drivers/URL's and settings to connect to Postgres from SQLWorkbench/J.
I've tried using postgresql-9.2-1002.jdbc4.jar and postgresql-9.3-1100.jdbc41.jar, as well as the generic "PostgreSQL" driver in R. The two jar files are the (i) the driver I use all the time with SQLWorkbench/J and (ii) the slightly newer version of that same driver, respectively. Yet, when I try to use it...
drv_custom <- JDBC(driverClass = "org.postgresql.Driver", classPath="/Users/xxxx/postgresql-9.3-1100.jdbc41.jar")
I get an error:
Error in .jfindClass(as.character(driverClass)[1]) : class not found
OK, so next I try it with the generic driver:
drv_generic <- dbDriver("PostgreSQL")
and strangely, it doesn't want me to enter a username:
>con <- dbConnect(drv=drv_generic, "jdbc:postgresql://xxx.xxxxx.com", port=xxxx, uid="xxxx", password="xxxx")
>Error in postgresqlNewConnection(drv, ...) : unused argument (uid = "xxxx")
so I try it without user/uid:
con <- dbConnect(drv_generic, "jdbc:postgresql://padb-01.jiwiredev.com:5439", password="paraccel")
and get an error....
Error in postgresqlNewConnection(drv, ...) :
RS-DBI driver: (could not connect jdbc:postgresql://padb-01.xxx.com:5439#local on dbname "jdbc:postgresql://xxxx.xxxx.com:5439")
Apparently the syntax is wrong?
Then I circle back to trying the "custom" driver (either one of the .jar files from earlier) but without the driverClass specified.
drv_custom1 <- JDBC( classPath="/Users/xxxx/postgresql-9.2-1002.jdbc4.jar")
con <- dbConnect(drv=drv_custom1, "jdbc:postgresql://xxx.xxx.com", port=5439, uid="paraccel", pwd="paraccel")
and get this error:
Error in .jcall(drv#jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], :
RcallMethod: attempt to call a method of a NULL object.
I tried it again with a slight alteration to the syntax:
con <- dbConnect(drv=drv_custom1, url="jdbc:postgresql://xxxx.xxxx.com", port=xxxx, uid="xxxx", pwd="xxxxx",dsn="xxxx")
and got the same error. I tried a number of other variations/approaches as well. I think part of my confusion comes from the fact that the documentation is handled in a very piecemeal way between packages like DBI and those that build upon it like RJDBC, so that when I look at documentation such as ?dbConnect many of the options I need to specify are not even mentioned, and I've been working based off of miscellaneous Google search results related to these packages/errors.
One thread I found suggested trying
.jaddClassPath( "xxxxx/postgresql-9.2-1002.jdbc4.jar" )
first, but that didn't seem to help.
I also tried using
x <- PostgreSQL(max.con = 16, fetch.default.rec = 500, force.reload = FALSE)
to no avail and I experimented with RODBC as the driver.
UPDATE:
I tried using an older version of the driver (jdbc3 instead of jdbc4), restarting R, and detaching all unnecessary packages.
I was able to load the driver
> drv_custom <- JDBC(driverClass = "org.postgresql.Driver", classPath="/xxxxx/xxxxx/postgresql-9.3-1100.jdbc3.jar")
but I still can't connect...
> con <- dbConnect(drv=drv_custom, "jdbc:postgresql://xxxxx.xxxxx.com", port=5439, uid="xxxxx", pwd="xxxxx")
Error in .verify.JDBC.result(jc, "Unable to connect JDBC to ", url) :
Unable to connect JDBC to jdbc:postgresql://xxxxx.xxxx.com
This works for me:
library(RJDBC)
drv <- JDBC("org.postgresql.Driver","C:/R/postgresql-9.4.1211.jar")
con <- dbConnect(drv, url="jdbc:postgresql://host:port/dbname", user="<user name>", password="<password>")
The trick was to include port and dbname in the url. For some reason, jdbc:postgresql does not seem to like reading those information from the dbConnect parameters.
If you are not sure what the dbname is, it is perhaps postgres.
If you are not sure what the port is, it is perhaps 5432.
So a typical call would look like:
con <- dbConnect(drv, url="jdbc:postgresql://10.10.10.10:5432/postgres", user="<user name>", password="<password>")
You can get the jar file from https://jdbc.postgresql.org/
It took some troubleshooting on IRC, but here's what had to happen:
I needed to clear the workspace, detach RODBC and RJDBC, then restart
Load RPostgreSQL
Use only the generic driver
Modify the syntax to
con <- dbConnect(drv=drv_generic, "xxx.xxx.com", port=vvvv, user="ffff", password="ffff", dbname="ggg")
Note: removing the jdbc:postgresql: part was key. Those would've been necessary with RJDBC, but RJDBC turned out to be an unnecessarily painful route.