Apache Spark : TaskResultLost (result lost from block manager) Error On cluster - java

I have a Spark standalone cluster with 3 slaves on VirtualBox. My code is in Java, and it works fine with my small input datasets, whose inputs total around 100MB.
I set my virtual machines' RAM to 16GB, but when I run my code on big input files (about 2GB), I get this error after hours of processing, in my reduce part:
Job aborted due to stage failure: Total size of serialized results of 4 tasks (4.3GB) is bigger than spark.driver.maxResultSize
I edited spark-defaults.conf and assigned a higher value (2GB and then 4GB) to spark.driver.maxResultSize. It didn't help, and the same error showed up.
Now I am trying 8GB for spark.driver.maxResultSize, and my spark.driver.memory is the same as the RAM size (16GB). But I get this error:
TaskResultLost (result lost from block manager)
Any comments about this? I also include an image.
I don't know whether the problem is caused by the large value of maxResultSize or by something in the way the code collects RDDs. I also provide the mapper part of the code for a better understanding.
JavaRDD<Boolean[][][]> fragPQ = uData.map(new Function<String, Boolean[][][]>() {
    public Boolean[][][] call(String s) {
        Boolean[][][] PQArr = new Boolean[2][][];
        PQArr[0] = new Boolean[11000][];
        PQArr[1] = new Boolean[11000][];
        for (int i = 0; i < 11000; i++) {
            PQArr[0][i] = new Boolean[11000];
            PQArr[1][i] = new Boolean[11000];
            for (int j = 0; j < 11000; j++) {
                PQArr[0][i][j] = true;
                PQArr[1][i][j] = true;
            }
        }
        return PQArr;
    }
});

In general, this error shows that you are collecting/bringing a large amount of data onto the driver. This should never be done. You need to rethink your application logic.
Also, you don't need to modify spark-defaults.conf to set the property. Instead, you can specify such application-specific properties via the --conf option of spark-shell or spark-submit, depending on how you run the job.
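For example, here is a minimal sketch (the app name is hypothetical) of passing the same setting when building the SparkConf in code; note that spark.driver.memory must be set before the driver JVM starts, so pass that one via --conf or spark-defaults.conf rather than from code:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Equivalent to: spark-submit --conf spark.driver.maxResultSize=4g ...
SparkConf conf = new SparkConf()
        .setAppName("MyApp")                          // hypothetical app name
        .set("spark.driver.maxResultSize", "4g");
JavaSparkContext sc = new JavaSparkContext(conf);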

SOLVED:
The problem was solved by increasing the master's RAM size. I studied my case and found that, based on my design, assigning 32GB of RAM would be sufficient. Now, having done that, my program works fine and calculates everything correctly.

In my case, I got this error because a firewall was blocking the block manager ports between the driver and the executors.
The ports can be specified with spark.blockManager.port and spark.driver.blockManager.port.
See https://spark.apache.org/docs/latest/configuration.html#networking
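For example (the port numbers below are placeholders), the ports can be pinned when building the SparkConf so the firewall can be opened for them explicitly:
import org.apache.spark.SparkConf;

SparkConf conf = new SparkConf()
        .set("spark.blockManager.port", "40000")          // executor block manager port
        .set("spark.driver.blockManager.port", "40001");  // driver-side block manager port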

Related

Ignite consumes all memory and fails with OutOfMemory when iterating over cache

I'm trying to iterate over all cache entries using a ScanQuery and an iterator (so as not to pull them all from the distributed cache to the local client at once):
IgniteCache<Integer, Person> cache = ignite.getOrCreateCache("test2");
ScanQuery<Integer, Person> scan = new ScanQuery<>();
scan.setPageSize(256);
Iterator<Cache.Entry<Integer, Person>> it = cache.query(scan).iterator();
int id;
while(it.hasNext()) {
id = it.next().getValue().getId();
<...>
}
but the code above fails, consuming all available memory. At the same time, it works well when I get an iterator from the cache directly:
IgniteCache<Integer, Person> cache = ignite.getOrCreateCache("test2");
Iterator<Cache.Entry<Integer, Person>> it = cache.iterator();
int id;
while(it.hasNext()) {
id = it.next().getValue().getId();
<...>
}
Docs state that:
QueryCursor represents query result set and allows for transparent page-by-page iteration. Whenever user starts iterating over the last page, it will automatically request the next page in the background.
So why does the Ignite local node fail when iterating over the cache with ScanQuery?
UPD:
Person is an example name rather than the actual class name. The actual class contains one Integer and 10 String fields.
I have actually already set the page size to a smaller number: 256 instead of the default 1024. The behavior is the same with both the default and the smaller value.
When I try cache.query(scan).getAll(), things go the same way, except that I can't use the iterator in a while loop; the application just gets stuck until the OOM.
Exception msg:
Aug 31, 2018 6:16:15 PM org.apache.ignite.logger.java.JavaLogger error
SEVERE: Runtime error caught during grid runnable execution: Socket reader [id=105, name=tcp-disco-sock-reader-#13, nodeId=83e986dc-9fc1-433c-8953-2a11460376a0]
java.lang.OutOfMemoryError: Java heap space
full stacktrace: https://pastebin.com/raw/ZpHnRjx8
It looks like a known issue https://issues.apache.org/jira/browse/IGNITE-8892
It is already fixed and will be available in Apache Ignite 2.7
Can you please check your code with the latest master branch?
I have also experienced this behaviour with ScanQuery; there appears to be a memory leak in the cursor, where it holds on to references to iterated cache objects. As a workaround you may be able to use the SqlQuery interface, which doesn't exhibit the same behaviour, assuming your caches are set up for SQL access:
SqlQuery<Integer, Person> scan = new SqlQuery<>(Person.class, "1=1");
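For example, a rough sketch (cache and class names taken from the question) that also closes the cursor via try-with-resources, so whatever references the cursor holds are released as soon as iteration finishes:
import javax.cache.Cache;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.SqlQuery;

IgniteCache<Integer, Person> cache = ignite.getOrCreateCache("test2");
SqlQuery<Integer, Person> qry = new SqlQuery<>(Person.class, "1=1");
try (QueryCursor<Cache.Entry<Integer, Person>> cursor = cache.query(qry)) {
    for (Cache.Entry<Integer, Person> entry : cursor) {
        int id = entry.getValue().getId();
        // process id here
    }
}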

Save a spark RDD using mapPartition with iterator

I have some intermediate data that I need to store in HDFS and locally as well. I'm using Spark 1.6. In HDFS, as an intermediate form, I'm getting data in /output/testDummy/part-00000 and /output/testDummy/part-00001. I want to save these partitions locally using Java/Scala so that I can save them as /users/home/indexes/index.nt (by merging both locally) or as /users/home/indexes/index-0000.nt and /home/indexes/index-0001.nt separately.
Here is my code:
Note: testDummy is the same as test; the output has two partitions. I want to store them either separately or combined, but locally, in an index.nt file. I'd prefer to store them separately on the two data nodes. I'm using a cluster and submit the Spark job on YARN. I also added some comments noting how many times each line runs and what data I'm getting. How can I do this? Any help is appreciated.
val testDummy = outputFlatMapTuples.coalesce(Constants.INITIAL_PARTITIONS).saveAsTextFile(outputFilePathForHDFS+"/testDummy")
println("testDummy done") //1 time print
def savesData(iterator: Iterator[(String)]): Iterator[(String)] = {
println("Inside savesData") // now 4 times when coalesce(Constants.INITIAL_PARTITIONS)=2
println("iter size"+iterator.size) // 2 735 2 735 values
val filenamesWithExtension = outputPath + "/index.nt"
println("filenamesWithExtension "+filenamesWithExtension.length) //4 times
var list = List[(String)]()
val fileWritter = new FileWriter(filenamesWithExtension,true)
val bufferWritter = new BufferedWriter(fileWritter)
while (iterator.hasNext){ //iterator.hasNext is false
println("inside iterator") //0 times
val dat = iterator.next()
println("datadata "+iterator.next())
bufferWritter.write(dat + "\n")
bufferWritter.flush()
println("index files written")
val dataElements = dat.split(" ")
println("dataElements") //0
list = list.::(dataElements(0))
list = list.::(dataElements(1))
list = list.::(dataElements(2))
}
bufferWritter.close() //closing
println("savesData method end") //4 times when coal=2
list.iterator
}
println("before saving data into local") //1
val test = outputFlatMapTuples.coalesce(Constants.INITIAL_PARTITIONS).mapPartitions(savesData)
println("testRDD partitions "+test.getNumPartitions) //2
println("testRDD size "+test.collect().length) //0
println("after saving data into local") //1
PS: I followed this and this, but it's not exactly what I'm looking for. I got part of the way, but I'm not getting anything in index.nt.
A couple of things:
Never call Iterator.size if you plan to use the data later. Iterators are TraversableOnce. The only way to compute an Iterator's size is to traverse all of its elements, and after that there is no more data to be read.
Don't use transformations like mapPartitions for side effects. If you want to perform some type of IO, use actions like foreach / foreachPartition. It is bad practice and doesn't guarantee that a given piece of code will be executed only once.
A local path inside an action or transformation is a local path on a particular worker. If you want to write directly on the client machine, you should fetch the data first with collect or toLocalIterator, as sketched below. It could be better, though, to write to distributed storage and fetch the data later.
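A rough Java sketch of that last point (the RDD and output path are assumed): fetch partitions to the driver with toLocalIterator and write a single local file there, instead of opening local files inside mapPartitions on the workers:
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Iterator;
import org.apache.spark.api.java.JavaRDD;

public final class LocalIndexWriter {
    public static void writeLocally(JavaRDD<String> lines, String localPath) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(localPath))) {
            // toLocalIterator streams one partition at a time to the driver,
            // so the whole dataset never has to fit in driver memory at once.
            Iterator<String> it = lines.toLocalIterator();
            while (it.hasNext()) {
                writer.write(it.next());
                writer.newLine();
            }
        }
    }
}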
Java 7 provides means to watch directories.
https://docs.oracle.com/javase/tutorial/essential/io/notification.html
The idea is to create a watch service, register it with the directory of interest (specifying the events you care about, such as file creation, deletion, etc.), and then watch; you will be notified of any such events and can take whatever action you want.
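A minimal sketch of that idea (the watched directory is hypothetical):
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

public final class DirWatcher {
    public static void main(String[] args) throws Exception {
        Path dir = Paths.get("/users/home/indexes");      // hypothetical directory
        WatchService watcher = FileSystems.getDefault().newWatchService();
        dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE,
                              StandardWatchEventKinds.ENTRY_DELETE);
        while (true) {                                    // runs until interrupted
            WatchKey key = watcher.take();                // blocks until an event arrives
            for (WatchEvent<?> event : key.pollEvents()) {
                System.out.println(event.kind() + ": " + event.context());
            }
            if (!key.reset()) {
                break;                                    // directory no longer accessible
            }
        }
    }
}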
You will have to depend heavily on the Java HDFS API wherever applicable.
Run the program in the background, since it waits for events forever. (You can add logic to quit after you have done whatever you want.)
On the other hand, shell scripting will also help.
Be aware of the coherency model of the HDFS file system while reading files.
Hope this gives you some ideas.

JMX results are confusing

I have been trying to learn JMX for the last few days and have now gotten confused.
I have written a simple JMX program which uses the APIs of the java.lang.management package and tries to extract the PID, CPU time, and user time. In my results I am only getting the threads of the current JVM, which is my JMX program itself, but I thought I would get results for all Java processes running on the same machine. How can I get the PIDs, CPU time, and user time for all Java processes running in a JVM (Linux/Windows)?
How can I get the PIDs, CPU time, and user time for all non-Java processes running on my machine (Linux/Windows)?
My code is below:
public void update() throws Exception{
final ThreadMXBean bean = ManagementFactory.getThreadMXBean();
final long[] ids = bean.getAllThreadIds();
final ThreadInfo[] infos = bean.getThreadInfo(ids);
for (long id : ids) {
if (id == threadId) {
continue; // Exclude polling thread
}
final long c = bean.getThreadCpuTime(id);
final long u = bean.getThreadUserTime(id);
if (c == -1 || u == -1) {
continue; // Thread died
}
}
String name = null;
for (int i = 0; i < infos.length; i++) {
name = infos[i].getThreadName();
System.out.print("The name of the id is /n" + name);
}
}
I am always getting the result:
The name of the id is Attach Listener
The name of the id is Signal Dispatcher
The name of the id is Finalizer
The name of the id is Reference Handler
The name of the id is main
I have some other Java processes running on my machine, but they are not included in the results of the bean.getAllThreadIds() API.
Ah, now I see what you want to do. I'm afraid I have some bad news.
The APIs that are exposed through ManagementFactory allow you to monitor only the JVM in which your code is running. To monitor other JVMs, you have to use the JMX Remoting API (javax.management.remote), and that introduces a whole new range of issues you have to deal with.
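Roughly, connecting to another JVM looks like this (the service URL and port are placeholders, and the target JVM has to be started with remote JMX enabled, e.g. via the com.sun.management.jmxremote.* system properties):
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public final class RemoteThreadStats {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url =
                new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9010/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            // A proxy to the remote JVM's ThreadMXBean instead of the local one.
            ThreadMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                    connection, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
            for (long id : bean.getAllThreadIds()) {
                System.out.println(id + " -> " + bean.getThreadCpuTime(id) + " ns CPU time");
            }
        }
    }
}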
It sounds like what you want to do is basically write your own management console using the stock APIs provided by out-of-the-box JDK. Short answer: you can't get there from here. Slightly longer answer: you can get there from here, but the road is long, winding, uphill (nearly) the entire way, and when you're done you will most likely wish you had gone a different route (read that: use a management console that has already been written).
I recommend you use JConsole or some other management console to monitor your application(s). In my experience it is usually only important that a human (not a program) interpret the stats that are provided by the various MBeans whose references are obtainable through the ManagementFactory static methods. After all, if a program had access to, say, the amount of CPU used by some other process, what conceivable use would it have with that information (other than to provide it in some human-readable format)?

OrientDB slow write

OrientDB official site says:
On common hardware stores up to 150.000 documents per second, 10 billions of documents per day. Big Graphs are loaded in few milliseconds without executing costly JOIN such as the Relational DBMSs.
But, executing the following code shows that it's taking ~17000ms to insert 150000 simple documents.
import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
import com.orientechnologies.orient.core.record.impl.ODocument;
public final class OrientDBTrial {
public static void main(String[] args) {
ODatabaseDocumentTx db = new ODatabaseDocumentTx("remote:localhost/foo");
try {
db.open("admin", "admin");
long a = System.currentTimeMillis();
for (int i = 1; i < 150000; ++i) {
final ODocument foo = new ODocument("Foo");
foo.field("code", i);
foo.save();
}
long b = System.currentTimeMillis();
System.out.println(b - a + "ms");
for (ODocument doc : db.browseClass("Foo")) {
doc.delete();
}
} finally {
db.close();
}
}
}
My hardware:
Dell Optiplex 780
Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz
8GB RAM
Windows 7 64bits
What am I doing wrong?
Splitting the saves in 10 concurrent threads to minimize Java's overhead made it run in ~13000ms. Still far slower than what OrientDB front page says.
You can achieve that by using a 'Flat Database' and OrientDB as an embedded library in Java; see more explained here:
http://code.google.com/p/orient/wiki/JavaAPI
What you use is server mode, and it sends many requests to the OrientDB server. Judging by your benchmark, you got ~10,000 inserts per second, which is not bad; I think 10,000 requests/s is very good performance for any web server (and the OrientDB server actually is a web server: you can query it through HTTP, though I think Java uses the binary mode).
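A hedged sketch of the embedded mode that the link describes, reusing the code from the question; the database path is hypothetical, and the URL prefix ('plocal:' here, 'local:' in older releases) depends on your OrientDB version:
import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
import com.orientechnologies.orient.core.record.impl.ODocument;

ODatabaseDocumentTx db = new ODatabaseDocumentTx("plocal:/data/foo").create();
try {
    long a = System.currentTimeMillis();
    for (int i = 1; i < 150000; ++i) {
        ODocument foo = new ODocument("Foo");
        foo.field("code", i);
        foo.save();
    }
    System.out.println(System.currentTimeMillis() - a + "ms");
} finally {
    db.close();
}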
The numbers from the OrientDB site are benchmarked for a local database (with no network overhead), so if you use a remote protocol, expect some delays.
As Krisztian pointed out, reuse objects if possible.
Read the documentation first on how to achieve the best performance!
A few tips:
-> Do NOT instantiate a new ODocument on every iteration; reset and reuse one instead:
final ODocument doc = new ODocument();
for (...) {
    doc.reset();
    doc.setClassName("Class");
    // Put data to fields
    doc.save();
}
-> Do NOT rely on System.currentTimeMillis(): use perf4j or a similar tool to measure times, because the former measures global system time and hence includes the execution time of all other programs running on your system!

Java storedProcedure stops with OutOfMemoryError

I'm working on a Java project, running on Tomcat 6, which connects to a MySQL database. All procedures run as they should, both when testing locally and when testing on our customer's server. There is one exception, however, and that's a procedure which retrieves a whole lot of data to generate a report. The stored procedure takes about 13 minutes when executed from MySQL. When I run the application locally and connect to the online database, the procedure does work; the only time it doesn't work is when it is run on the client's server.
The client is pretty protective of his server, so we have limited control over it, but they do want us to solve the problem. When I check the log files, no errors are thrown from the function that executes the stored procedure. Putting some debug logs in the code shows that it does get to the execute call, but it doesn't log the debug line right after the call, nor does it log the error in the catch block, though it does get into the finally section.
They claim there are no time-out errors in the MySQL logs.
If anyone has any idea on what might cause this problem, any help will be appreciated.
Update:
After some nagging at the server administrator, I finally got access to the Catalina logs, and in those logs I finally found an error that has some meaning:
Exception in thread "Thread-16" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2894)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:117)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:407)
at java.lang.StringBuffer.append(StringBuffer.java:241)
at be.playlane.mink.database.SelectExportDataProcedure.bufferField(SelectExportDataProcedure.java:68)
at be.playlane.mink.database.SelectExportDataProcedure.extractData(SelectExportDataProcedure.java:54)
at org.springframework.jdbc.core.JdbcTemplate.processResultSet(JdbcTemplate.java:1033)
at org.springframework.jdbc.core.JdbcTemplate.extractReturnedResultSets(JdbcTemplate.java:947)
at org.springframework.jdbc.core.JdbcTemplate$5.doInCallableStatement(JdbcTemplate.java:918)
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:876)
at org.springframework.jdbc.core.JdbcTemplate.call(JdbcTemplate.java:908)
at org.springframework.jdbc.object.StoredProcedure.execute(StoredProcedure.java:113)
at be.playlane.mink.database.SelectExportDataProcedure.execute(SelectExportDataProcedure.java:29)
at be.playlane.mink.service.impl.DefaultExportService$ExportDataRunnable.run(DefaultExportService.java:82)
at java.lang.Thread.run(Thread.java:636)
Weird, though, that this doesn't log to the application logs, even though it is wrapped within a try/catch. Now, based upon the error, the problem lies within these methods:
public Object extractData(ResultSet rs) throws SQLException, DataAccessException
{
StringBuffer buffer = new StringBuffer();
try
{
// get result set meta data
ResultSetMetaData meta = rs.getMetaData();
int count = meta.getColumnCount();
// get the column names; column indices start from 1
for (int i = 1; i < count + 1; ++i)
{
String name = meta.getColumnName(i);
bufferField(name, i == count, buffer);
}
while (rs.next())
{
// get the column values; column indices start from 1
for (int i = 1; i < count + 1; ++i)
{
String value = rs.getString(i);
bufferField(value, i == count, buffer);
}
}
}
catch (Exception e)
{
logger.error("Failed to extractData SelectExportDataProcedue: ", e);
}
return buffer.toString();
}
private void bufferField(String field, boolean last, StringBuffer buffer)
{
try
{
if (field != null)
{
field = field.replace('\r', ' ');
field = field.replace('\n', ' ');
buffer.append(field);
}
if (last)
{
buffer.append('\n');
}
else
{
buffer.append('\t');
}
}
catch (Exception e)
{
logger.error("Failed to bufferField SelectExportDataProcedue: ", e);
}
}
The goal of these functions is to export a certain result set to an Excel file (which happens at a higher level).
So if anyone has tips on optimising this, they are very welcome.
Ok, your stack trace gives you the answer:
Exception in thread "Thread-16" java.lang.OutOfMemoryError: Java heap space
That's why you're not logging anything: the application is crashing (the thread, to be specific). Judging from your description, it sounds like you have a massive dataset that needs to be paged.
while (rs.next())
{
// get the column values; column indices start from 1
for (int i = 1; i < count + 1; ++i)
{
String value = rs.getString(i);
bufferField(value, i == count, buffer);
}
}
This is where your thread dies (probably). Basically, your StringBuffer runs out of memory. As for correcting it, there is a huge number of options. Throw more memory at the problem on the client side, either by configuring the JVM (here's a link):
How to set the maximum memory usage for JVM?
or, if you're already doing that, by putting more RAM into the machine.
From a programming perspective it sounds like this is a hell of a report. You could offload some of the number crunching to MySQL rather than buffering on your end (if possible), or, if this is a giant report I would consider streaming it to a File and then reading via a buffered stream to fill the report.
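For instance, a rough sketch (the helper name and temp-file location are made up) of the streaming approach: write each row to a temporary file as it is read instead of accumulating everything in a StringBuffer, then let the report layer read that file back through a buffered stream:
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

public final class StreamingExport {
    // Writes the result set as tab-separated rows straight to disk, so memory use
    // stays bounded no matter how large the report is.
    public static File writeToTempFile(ResultSet rs) throws SQLException, IOException {
        File out = File.createTempFile("export", ".tsv");   // hypothetical temp location
        BufferedWriter writer = new BufferedWriter(new FileWriter(out));
        try {
            ResultSetMetaData meta = rs.getMetaData();
            int count = meta.getColumnCount();
            while (rs.next()) {
                for (int i = 1; i <= count; i++) {
                    String value = rs.getString(i);
                    if (value != null) {
                        writer.write(value.replace('\r', ' ').replace('\n', ' '));
                    }
                    writer.write(i == count ? '\n' : '\t');
                }
            }
        } finally {
            writer.close();
        }
        return out;
    }
}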
It totally depends on what the report is. If it is tiny, I would aim at doing more work in SQL to minimize the result set. If it is a giant report then buffering is the other option.
Another possibility that you might be missing is that the ResultSet (depending on the implementation) is probably buffered. That means that instead of reading it all into strings, maybe your report can take the ResultSet object directly and print from it. The downside to this, of course, is that a stray SQL exception will kill your report.
Best of luck; I'd try the memory options first. You might be running with something hilariously small like 128MB, and the fix will be simple (I've seen this happen a lot on remotely administered machines).
