This question spans both Server Fault and Stack Overflow, so I just picked this one.
I get the following exception with some simple file-copy code. It's running on Windows Server 2003 x64:
Caused by: java.io.IOException: Insufficient system resources exist to complete the requested service
at sun.nio.ch.FileDispatcher.pwrite0(Native Method)
at sun.nio.ch.FileDispatcher.pwrite(Unknown Source)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
at sun.nio.ch.IOUtil.write(Unknown Source)
at sun.nio.ch.FileChannelImpl.write(Unknown Source)
at sun.nio.ch.FileChannelImpl.transferFromFileChannel(Unknown Source)
at sun.nio.ch.FileChannelImpl.transferFrom(Unknown Source)
at Tools.copy(Tools.java:473)
public static void copy(FileChannel input, FileChannel output) throws IOException {
    final long size = input.size();
    long pos = 0;
    while (pos < size) {
        // transfer at most FIFTY_MB (50 * 1024 * 1024) bytes per call
        final long count = (size - pos) > FIFTY_MB ? FIFTY_MB : (size - pos);
        pos += output.transferFrom(input, pos, count);
    }
}
The thing is, the server running this code is brand new and very powerful, so I don't understand what system resource it could possibly be running out of.
This looks like the error described here:
http://support.microsoft.com/kb/304101
But I've tried adding the registry edits from that article to increase the kernel paged-pool size, and that didn't help.
What I really don't get is that I've seen code that uses FileChannel.transferFrom with chunks much larger than 50 MB. I've seen that code work for files well over 1 GB in one chunk. But the file the server is getting stuck on is just 32 MB!
What is going on here? Is this a problem with FileChannel or Windows?
It may be related to "Bug" ID 4938442: Insufficient System Resources When Copying Large Files with NIO FileChannels.
Evaluation: Not a bug. This is most likely a file-server (or possibly client) configuration issue.
CUSTOMER SUBMITTED WORKAROUND:
Don't use NIO; we'd prefer to avoid this workaround since NIO offers a significant performance boost for large files (at least when performing local disk-to-local disk copies).
We can transfer using a smaller number of bytes. The actual number of bytes that may be copied without encountering this error seems to differ on Windows XP and Windows 2000 Server. Certainly a value of 32Mb appears to work.
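For what it's worth, here is a minimal sketch of that workaround applied to the copy method from the question: the same loop, but with the per-call transfer size made a parameter so it can be tuned below the roughly 32 MB value mentioned in the bug report. The 16 MB value in the usage comment is an assumption, not a verified safe limit.

// Sketch of the bug-report workaround: identical to copy() above, but the
// per-call transfer size is tunable instead of being fixed at 50 MB.
public static void copyChunked(FileChannel input, FileChannel output, long chunkSize) throws IOException {
    final long size = input.size();
    long pos = 0;
    while (pos < size) {
        long count = Math.min(size - pos, chunkSize);
        // transferFrom may copy fewer bytes than requested, so advance by the actual count
        pos += output.transferFrom(input, pos, count);
    }
}

// Example: copyChunked(in, out, 16L * 1024 * 1024); // 16 MB chunks -- assumed safe value to tune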
Related
Our main goal is to perform operations on a large amount of input data (around 80 GB). The problem is that even for smaller datasets, we often get Java heap space or other memory-related errors.
Our temporary solution was simply to specify a higher maximum heap size (using -Xmx locally, or by setting spark.executor.memory and spark.driver.memory for our Spark instance), but this does not seem to generalize well; we still run into errors for bigger datasets or higher zoom levels.
For better understanding, here is the basic concept of what we do with our data:
1. Load the data using HadoopGeoTiffRDD.spatial(new Path(path))
2. Map the data to the tiles of some zoom level:
   val extent = geotrellis.proj4.CRS.fromName("EPSG:4326").worldExtent
   val layout = layoutForZoom(zoom, extent)
   val metadata: TileLayerMetadata[SpatialKey] = dataSet.collectMetadata[SpatialKey](layout)
   val rdd = ContextRDD(dataSet.tileToLayout[SpatialKey](metadata), metadata)
   where layoutForZoom is basically the same as geotrellis.spark.tiling.ZoomedLayoutScheme.layoutForZoom
3. Perform some operations on the entries of the rdd using rdd.map and rdd.foreach for the mapped rdds.
4. Aggregate the results of four tiles which correspond to a single tile in a higher zoom level using groupByKey.
5. Go to 3 until we reach a certain zoom level.
The goal would be: Given a memory limit of X GB, partition and work on the data in a way that we consume at most X GB.
It seems like the tiling of the dataset via tileToLayout already takes too much memory on higher zoom levels (even for very small data sets). Are there any alternatives for tiling and loading the data according to some LayoutDefinition? As far as we understand, HadoopGeoTiffRDD.spatial already splits the dataset into small regions which are then divided into the tiles by tileToLayout. Is it somehow possible to directly load the dataset corresponding to the LayoutDefinition?
In our concrete scenario we have 3 workers with 2 GB RAM and 2 cores each. One of them also runs the Spark master, which gets its work via spark-submit from a driver instance. We tried configurations like this:
val conf = new SparkConf().setAppName("Converter").setMaster("spark://IP-ADDRESS:PORT")
.set("spark.executor.memory", "900m") // to be below the available 1024 MB of default slave RAM
.set("spark.memory.fraction", "0.2") // to get more usable heap space
.set("spark.executor.cores", "2")
.set("spark.executor.instances", "3")
An example of a heap space error at the tiling step (step 2):
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 2.0 failed 1 times, most recent failure: Lost task 1.0 in stage 2.0 (TID 5, 192.168.0.2, executor 1): java.lang.OutOfMemoryError: Java heap space
at scala.collection.mutable.ArrayBuilder$ofByte.mkArray(ArrayBuilder.scala:128)
at scala.collection.mutable.ArrayBuilder$ofByte.resize(ArrayBuilder.scala:134)
at scala.collection.mutable.ArrayBuilder$ofByte.sizeHint(ArrayBuilder.scala:139)
at scala.collection.IndexedSeqOptimized$class.slice(IndexedSeqOptimized.scala:115)
at scala.collection.mutable.ArrayOps$ofByte.slice(ArrayOps.scala:198)
at geotrellis.util.StreamingByteReader.getBytes(StreamingByteReader.scala:98)
at geotrellis.raster.io.geotiff.LazySegmentBytes.getBytes(LazySegmentBytes.scala:104)
at geotrellis.raster.io.geotiff.LazySegmentBytes.readChunk(LazySegmentBytes.scala:81)
at geotrellis.raster.io.geotiff.LazySegmentBytes$$anonfun$getSegments$1.apply(LazySegmentBytes.scala:99)
at geotrellis.raster.io.geotiff.LazySegmentBytes$$anonfun$getSegments$1.apply(LazySegmentBytes.scala:99)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:185)
at scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$15.apply(RDD.scala:1012)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$15.apply(RDD.scala:1010)
at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2118)
at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2118)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2119)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1026)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:1008)
at geotrellis.spark.TileLayerMetadata$.collectMetadataWithCRS(TileLayerMetadata.scala:147)
at geotrellis.spark.TileLayerMetadata$.fromRdd(TileLayerMetadata.scala:281)
at geotrellis.spark.package$withCollectMetadataMethods.collectMetadata(package.scala:212)
...
Update:
I extracted an example from my code and uploaded it to the repository at https://gitlab.com/hwuerz/geotrellis-spark-example. You can run the example locally using sbt run and selecting the class demo.HelloGeotrellis. This creates the tiles for a tiny input dataset example.tif according to our layout definition, starting at zoom level 20 (using two cores by default; this can be adjusted in the file HelloGeotrellis.scala). If level 20 somehow still works, it will most likely fail with higher values for bottomLayer.
To run the code on the Spark Cluster, I use the following command:
`sbt package && bash submit.sh --dataLocation /mnt/glusterfs/example.tif --bottomLayer 20 --topLayer 10 --cesiumTerrainDir /mnt/glusterfs/terrain/ --sparkMaster spark://192.168.0.8:7077`
Where submit.sh basically runs spark-submit (see the file in the repo).
The example.tif is included in the repo within the directory DebugFiles. In my setup the file is distributed via GlusterFS, which is why the path points to this location. The cesiumTerrainDir is just a directory where we store our generated output.
We think the main problem might be that using the given api calls, geotrellis loads the complete structure of the layout into the memory, which is too big for higher zoom levels. Is there any way to avoid this?
Is there any way I can log native memory usage from Java, i.e., either native memory directly or the total memory the process is using (e.g., by asking the OS)?
I'd like to run this on users' machines behind the scenes, so the NativeMemoryTracking command-line tool isn't really the most appealing option. I already log free/max/total heap sizes.
Background: A user of my software reported an exception (below) and I have no idea why. My program does use SWIG'd native code, but it's a simple API; I don't think it has a memory leak, and it wasn't in the stack trace (or run immediately before the error). My log indicated there was plenty of heap space available when the error occurred, so I'm really at a loss for how to track this down.
java.lang.OutOfMemoryError: null
at java.io.RandomAccessFile.writeBytes0(Native Method) ~[na:1.7.0_45]
at java.io.RandomAccessFile.writeBytes(Unknown Source) ~[na:1.7.0_45]
at java.io.RandomAccessFile.write(Unknown Source) ~[na:1.7.0_45]
The error occurred on Windows (7 or 10?) from within Java Web Start, configured with these parameters:
<java href="http://java.sun.com/products/autodl/j2se" initial-heap-size="768m" java-vm-args="" max-heap-size="900m" version="1.7+"/>
If you want to track down the JVM memory used by a certain method or certain lines of code, you can use the Runtime API.
import java.text.NumberFormat;

Runtime runtime = Runtime.getRuntime();
NumberFormat format = NumberFormat.getInstance();

long maxMemory = runtime.maxMemory();         // upper heap limit (-Xmx)
long allocatedMemory = runtime.totalMemory(); // heap currently reserved by the JVM
long freeMemory = runtime.freeMemory();       // unused portion of the reserved heap

System.out.println("free memory: " + format.format(freeMemory / 1024));
System.out.println("allocated memory: " + format.format(allocatedMemory / 1024));
System.out.println("max memory: " + format.format(maxMemory / 1024));
System.out.println("total free memory: " + format.format((freeMemory + (maxMemory - allocatedMemory)) / 1024));
I ended up using this code, which asks the OS for RSS and peak memory usage. It was straightforward for me to add since I already have a SWIG module set up. The code might not be thread-safe, though, since I hit a random malloc exception when I was testing it, so I'm not sure I want to keep it in there.
I'm really surprised the JVM doesn't provide a way to do this. Please let me know if there's a way.
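For what it's worth, one partial option from inside the JVM is the management API: MemoryMXBean reports heap and non-heap (metaspace/code cache) usage, and the non-standard com.sun.management.OperatingSystemMXBean can report the committed virtual memory of the whole process. This is only a sketch; it still will not account for every allocation made by a native library, and the com.sun.management cast is an assumption that holds on HotSpot/OpenJDK but may fail on other JVMs.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class MemoryLogger {
    public static void logMemory() {
        MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memoryBean.getHeapMemoryUsage();
        MemoryUsage nonHeap = memoryBean.getNonHeapMemoryUsage();

        System.out.println("heap used/committed: "
                + heap.getUsed() / (1024 * 1024) + "/" + heap.getCommitted() / (1024 * 1024) + " MB");
        System.out.println("non-heap used/committed: "
                + nonHeap.getUsed() / (1024 * 1024) + "/" + nonHeap.getCommitted() / (1024 * 1024) + " MB");

        // Non-standard API (HotSpot/OpenJDK): virtual memory committed for the whole process.
        java.lang.management.OperatingSystemMXBean osBean = ManagementFactory.getOperatingSystemMXBean();
        if (osBean instanceof com.sun.management.OperatingSystemMXBean) {
            long committedVm = ((com.sun.management.OperatingSystemMXBean) osBean).getCommittedVirtualMemorySize();
            System.out.println("process committed virtual memory: " + committedVm / (1024 * 1024) + " MB");
        }
    }
}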
Here is a snippet of code that builds a string showing the amount of memory used (MB) out of the total memory (MB). You can then log it however you want!
final long mb = 1024 * 1024;
Runtime instance = Runtime.getRuntime();
String mem = "Memory Used: " + (instance.totalMemory() - instance.freeMemory()) / mb + "MB (" +
        (int) ((instance.totalMemory() - instance.freeMemory()) * 1.0 / instance.totalMemory() * 100.0) + "%)";
public final void writeBytes(String s) throws IOException {
    int len = s.length();
    byte[] b = new byte[len];
    s.getBytes(0, len, b, 0);
    writeBytes(b, 0, len);
}
Looking at the source, it is possible that a sufficiently large String caused the out-of-memory error: writeBytes(String) allocates a byte[] the length of the whole string before writing it. I suspect that your heap log was written before this happened, which explains the free heap space you saw. I suggest you verify whether this is the case and, if so, limit the String size and/or increase the heap size.
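If that theory holds, one simple mitigation is to write the string in fixed-size slices so that writeBytes never has to allocate a byte[] the length of the whole string. A minimal sketch, assuming the data really is written via RandomAccessFile.writeBytes(String) and that a 1 MB slice size is acceptable (both are assumptions):

// Sketch: write a large String in fixed-size slices instead of one call,
// so writeBytes never allocates a byte[] the size of the whole string.
static void writeInChunks(java.io.RandomAccessFile file, String s) throws java.io.IOException {
    final int chunkSize = 1 << 20; // 1 MB of characters per call (tunable assumption)
    for (int start = 0; start < s.length(); start += chunkSize) {
        int end = Math.min(start + chunkSize, s.length());
        file.writeBytes(s.substring(start, end));
    }
}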
I request your kind help in solving the "Java command failed!" error, which is thrown whenever I try to tag an Arabic corpus of about 2 megabytes. I have searched the web and the Stanford POS tagger mailing list, but I did not find a solution. I read some posts on similar problems, and it was suggested that memory had run out. I am not sure of that; I still have 19 GB of free memory. I tried every possible solution offered, but the same error keeps showing.
I have an average command of Python and a good command of Linux. I am using Linux Mint 17 KDE 64-bit, Python 3.4, NLTK 3.0 alpha, and the Stanford POS tagger model for Arabic. This is my code:
import nltk
from nltk.tag.stanford import POSTagger
arabic_postagger = POSTagger("/home/mohammed/postagger/models/arabic.tagger", "/home/mohammed/postagger/stanford-postagger.jar", encoding='utf-8')
print("Executing tag_corpus.py...\n")
# Import corpus file
print("Importing data...\n")
file = open("test.txt", 'r', encoding='utf-8').read()
text = file.strip()
print("Tagging the corpus. Please wait...\n")
tagged_corpus = arabic_postagger.tag(nltk.word_tokenize(text))
If the corpus size is less than 1 MB (= 100,000 words), there is no error. But when I try to tag the 2 MB corpus, the following error message is shown:
Traceback (most recent call last):
File "/home/mohammed/experiments/current/tag_corpus2.py", line 17, in <module>
tagged_lst = arabic_postagger.tag(nltk.word_tokenize(text))
File "/usr/local/lib/python3.4/dist-packages/nltk-3.0a3-py3.4.egg/nltk/tag/stanford.py", line 59, in tag
return self.batch_tag([tokens])[0]
File "/usr/local/lib/python3.4/dist-packages/nltk-3.0a3-py3.4.egg/nltk/tag/stanford.py", line 81, in batch_tag
stdout=PIPE, stderr=PIPE)
File "/usr/local/lib/python3.4/dist-packages/nltk-3.0a3-py3.4.egg/nltk/internals.py", line 171, in java
raise OSError('Java command failed!')
OSError: Java command failed!
I intend to tag 300 Million words to be used in my Ph.D. research project. If I keep tagging 100 thousand words at a time, I will have to repeat the task 3000 times. It will kill me!
I really appreciate your kind help.
After your import lines, add this line:
nltk.internals.config_java(options='-Xmx2G')
This increases the maximum heap size that Java allows the Stanford POS Tagger to use. '-Xmx2G' raises the maximum allowable RAM to 2 GB instead of the default 512 MB (note that JVM options are case-sensitive: it is -Xmx, not -xmx).
See What are the Xms and Xmx parameters when starting JVMs? for more information
If you're interested in how to debug your code, read on.
So we see that the command fails when handling a huge amount of data, so the first thing to look at is how Java is initialized in NLTK before the Stanford tagger is called, from https://github.com/nltk/nltk/blob/develop/nltk/tag/stanford.py#L19 :
from nltk.internals import find_file, find_jar, config_java, java, _java_options
We see that the nltk.internals package is handling the different Java configurations and parameters.
Then we take a look at https://github.com/nltk/nltk/blob/develop/nltk/internals.py#L65 and we see that no value is set for the Java memory allocation.
In version 3.9.2, the StanfordTagger class constructor accepts a parameter called java_options which can be used to set the memory for the POSTagger and also the NERTagger.
E.g. pos_tagger = StanfordPOSTagger('models/english-bidirectional-distsim.tagger', path_to_jar='stanford-postagger-3.9.2.jar', java_options='-mx3g')
I found that the answer by @alvas did not work for me because StanfordTagger was overriding my memory setting with its built-in default of 1000m. Perhaps using nltk.internals.config_java after initializing StanfordPOSTagger might work, but I haven't tried that.
I am trying to find out which method/loop produced the java.lang.OutOfMemoryError: Java heap space in my app. I am pretty new to profiling Java apps using Eclipse Memory Analyzer.
In the image, it is clear that the JNI Local entry has the maximum retained heap, but we don't have any JNI calls in our app.
Please review and confirm whether the memory leak is caused by a call into native code (the JNI Local), by string iterations, or by something else.
For reference, here are the overview and thread stack trace from the .hprof file:
Accumulated Objects by Class
Label: java.lang.String (first 10 of 15,252,128 objects shown)
Number of Objects: 15,252,128
Used Heap Size: 488,068,096
Retained Heap Size: 2,317,743,632
Thread Details
Thread WorkManager(2)-8
Thread Properties
Object / Stack Frame java.lang.Thread # 0x69af09608
Name WorkManager(2)-8
Shallow Heap 112
Retained Heap 2,384,942,672
Context Class Loader org.jboss.util.loading.DelegatingClassLoader # 0x6912bf868
Is Daemon true
Total: 6 entries
Thread Stack
WorkManager(2)-8
at java.lang.StringCoding.decode(Ljava/lang/String;[BII)[C (StringCoding.java:185)
at java.lang.String.<init>([BIILjava/lang/String;)V (String.java:451)
at java.lang.String.<init>([BLjava/lang/String;)V (String.java:523)
at java.io.UnixFileSystem.list(Ljava/io/File;)[Ljava/lang/String; (Native Method)
at java.io.File.list()[Ljava/lang/String; (File.java:990)
at java.io.File.listFiles(Ljava/io/FilenameFilter;)[Ljava/io/File; (File.java:1107)
at com.adobe.idp.dsc.service.file.impl.WatchedFolderUtils.resolveDuplicateFile(Ljava/lang/String;Z)Ljava/io/File; (WatchedFolderUtils.java:515)
at com.adobe.idp.dsc.provider.service.file.write.impl.FileResultHandlerImpl.resolveDestination(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Z)Ljava/io/File; (FileResultHandlerImpl.java:308)
at com.adobe.idp.dsc.provider.service.file.write.impl.FileResultHandlerImpl.preserveFiles(Lcom/adobe/idp/dsc/registry/infomodel/Endpoint;Ljava/lang/String;Ljava/lang/String;)V (FileResultHandlerImpl.java:817)
at com.adobe.idp.dsc.provider.service.file.write.impl.FileResultHandlerImpl.saveResults(Lcom/adobe/idp/dsc/InvocationResponse;Lcom/adobe/idp/dsc/registry/infomodel/Endpoint;Ljava/lang/String;Ljava/lang/String;)V (FileResultHandlerImpl.java:493)
at com.adobe.idp.dsc.provider.service.file.write.impl.FileResultHandlerImpl.handleSuccess(Ljava/lang/Object;Ljava/util/Map;)V (FileResultHandlerImpl.java:104)
at com.adobe.idp.dsc.provider.service.file.write.impl.FileResultHandlerImpl.handleSuccess(Lcom/adobe/idp/dsc/InvocationResponse;)V (FileResultHandlerImpl.java:73)
at sun.reflect.GeneratedMethodAccessor2229.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Method.java:616)
at com.adobe.idp.dsc.component.impl.DefaultPOJOInvokerImpl.invoke(Lcom/adobe/idp/dsc/InvocationRequest;)Lcom/adobe/idp/dsc/InvocationResponse; (DefaultPOJOInvokerImpl.java:118)
at com.adobe.idp.dsc.interceptor.impl.InvocationInterceptor.intercept(Lcom/adobe/idp/dsc/component/ComponentContext;Lcom/adobe/idp/dsc/InvocationRequest;Lcom/adobe/idp/dsc/interceptor/RequestInterceptorChain;)Lcom/adobe/idp/dsc/InvocationResponse; (InvocationInterceptor.java:140)
at com.adobe.idp.dsc.interceptor.impl.RequestInterceptorChainImpl.proceed(Lcom/adobe/idp/dsc/component/ComponentContext;Lcom/adobe/idp/dsc/InvocationRequest;)Lcom/adobe/idp/dsc/InvocationResponse; (RequestInterceptorChainImpl.java:60)
at com.adobe.idp.dsc.interceptor.impl.DocumentPassivationInterceptor.intercept(Lcom/adobe/idp/dsc/component/ComponentContext;Lcom/adobe/idp/dsc/InvocationRequest;Lcom/adobe/idp/dsc/interceptor/RequestInterceptorChain;)Lcom/adobe/idp/dsc/InvocationResponse; (DocumentPassivationInterceptor.java:53)
at com.adobe.idp.dsc.interceptor.impl.RequestInterceptorChainImpl.proceed(Lcom/adobe/idp/dsc/component/ComponentContext;Lcom/adobe/idp/dsc/InvocationRequest;)Lcom/adobe/idp/dsc/InvocationResponse; (RequestInterceptorChainImpl.java:60)
at com.adobe.idp.dsc.transaction.interceptor.TransactionInterceptor$1.doInTransaction(Lcom/adobe/idp/dsc/transaction/TransactionContext;)Ljava/lang/Object; (TransactionInterceptor.java:74)
at com.adobe.idp.dsc.transaction.impl.ejb.adapter.EjbTransactionCMTAdapterBean.execute(Lcom/adobe/idp/dsc/transaction/TransactionContext;Lcom/adobe/idp/dsc/transaction/TransactionCallback;)Ljava/lang/Object; (EjbTransactionCMTAdapterBean.java:357)
at com.adobe.idp.dsc.transaction.impl.ejb.adapter.EjbTransactionCMTAdapterBean.doSupports(Lcom/adobe/idp/dsc/transaction/TransactionDefinition;Lcom/adobe/idp/dsc/transaction/TransactionCallback;)Ljava/lang/Object; (EjbTransactionCMTAdapterBean.java:227)
at sun.reflect.GeneratedMethodAccessor698.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Method.java:616)
at org.jboss.invocation.Invocation.performCall(Ljava/lang/Object;Ljava/lang/reflect/Method;[Ljava/lang/Object;)Ljava/lang/Object; (Invocation.java:386)
at org.jboss.ejb.StatelessSessionContainer$ContainerInterceptor.invoke(Lorg/jboss/invocation/Invocation;)Ljava/lang/Object; (StatelessSessionContainer.java:233)
at org.jboss.resource.connectionmanager.CachedConnectionInterceptor.invoke(Lorg/jboss/invocation/Invocation;)Ljava/lang/Object; (CachedConnectionInterceptor.java:156)
at org.jboss.ejb.plugins.StatelessSessionInstanceInterceptor.invoke(Lorg/jboss/invocation/Invocation;)Ljava/lang/Object; (StatelessSessionInstanceInterceptor.java:173)
at org.jboss.ejb.plugins.CallValidationInterceptor.invoke(Lorg/jboss/invocation/Invocation;)Ljava/lang/Object; (CallValidationInterceptor.java:63)
at org.jboss.ejb.plugins.AbstractTxInterceptor.invokeNext(Lorg/jboss/invocation/Invocation;Z)Ljava/lang/Object; (AbstractTxInterceptor.java:121)
at org.jboss.ejb.plugins.TxInterceptorCMT.runWithTransactions(Lorg/jboss/invocation/Invocation;)Ljava/lang/Object; (TxInterceptorCMT.java:378)
at org.jboss.ejb.plugins.TxInterceptorCMT.invoke(Lorg/jboss/invocation/Invocation;)Ljava/lang/Object; (TxInterceptorCMT.java:181)
at org.jboss.ejb.plugins.SecurityInterceptor.process(Lorg/jboss/invocation/Invocation;Z)Ljava/lang/Object; (SecurityInterceptor.java:228)
at org.jboss.ejb.plugins.SecurityInterceptor.invoke(Lorg/jboss/invocation/Invocation;)Ljava/lang/Object; (SecurityInterceptor.java:211)
at org.jboss.ejb.plugins.security.PreSecurityInterceptor.process(Lorg/jboss/invocation/Invocation;Z)Ljava/lang/Object; (PreSecurityInterceptor.java:97)
at org.jboss.ejb.plugins.security.PreSecurityInterceptor.invoke(Lorg/jboss/invocation/Invocation;)Ljava/lang/Object; (PreSecurityInterceptor.java:81)
at org.jboss.ejb.plugins.LogInterceptor.invoke(Lorg/jboss/invocation/Invocation;)Ljava/lang/Object; (LogInterceptor.java:205)
at org.jboss.ejb.plugins.ProxyFactoryFinderInterceptor.invoke(Lorg/jboss/invocation/Invocation;)Ljava/lang/Object; (ProxyFactoryFinderInterceptor.java:138)
at org.jboss.ejb.SessionContainer.internalInvoke(Lorg/jboss/invocation/Invocation;)Ljava/lang/Object; (SessionContainer.java:650)
at org.jboss.ejb.Container.invoke(Lorg/jboss/invocation/Invocation;)Ljava/lang/Object; (Container.java:1092)
at org.jboss.ejb.plugins.local.BaseLocalProxyFactory.invoke(Ljava/lang/Object;Ljava/lang/reflect/Method;[Ljava/lang/Object;)Ljava/lang/Object; (BaseLocalProxyFactory.java:436)
at org.jboss.ejb.plugins.local.StatelessSessionProxy.invoke(Ljava/lang/Object;Ljava/lang/reflect/Method;[Ljava/lang/Object;)Ljava/lang/Object; (StatelessSessionProxy.java:103)
at $Proxy289.doSupports(Lcom/adobe/idp/dsc/transaction/TransactionDefinition;Lcom/adobe/idp/dsc/transaction/TransactionCallback;)Ljava/lang/Object; (Unknown Source)
at com.adobe.idp.dsc.transaction.impl.ejb.EjbTransactionProvider.execute(Lcom/adobe/idp/dsc/transaction/TransactionDefinition;Lcom/adobe/idp/dsc/transaction/TransactionCallback;)Ljava/lang/Object; (EjbTransactionProvider.java:104)
Your stack trace shows the method UnixFileSystem.list(File), which is a native method, and it is creating a String array to hold the file names. The remaining question is why it is creating an array of roughly 16 million entries. The method starts with an array of size 16 and doubles it whenever the directory scan yields more entries than the current array size, so the number of entries is now somewhere between 8 million and 16 million. Either you really have that unbelievable number of files in that directory, or there is a problem in the loop-termination condition in that native code.
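As a quick way to check which of the two it is, it may help to count the entries in the watched folder without materializing all the names at once. Below is a small sketch using java.nio (the directory path is a placeholder); unlike File.list(), which builds the entire String[] up front, a DirectoryStream iterates over the entries one at a time.

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CountDirEntries {
    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args[0]); // path to the watched folder (placeholder)
        long count = 0;
        // Iterate lazily instead of loading every name into one big array.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path ignored : stream) {
                count++;
            }
        }
        System.out.println(dir + " contains " + count + " entries");
    }
}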
I am writing a Hadoop job which processes many files and creates multiple files from each file. I am using "MultipleOutputs" to write them. It works fine for a smaller number of files, but I get the following error for a large number of files.
The exception is raised on the MultipleOutputs.write(key, value, outputPath);
I have tried increasing the ulimit and -Xmx but to no avail.
2013-01-15 13:44:05,154 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hdfs.DFSOutputStream$Packet.<init>(DFSOutputStream.java:201)
at org.apache.hadoop.hdfs.DFSOutputStream.writeChunk(DFSOutputStream.java:1423)
at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:161)
at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:136)
at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:125)
at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:116)
at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:90)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:54)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.writeObject(TextOutputFormat.java:78)
at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.write(TextOutputFormat.java:99)
at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:386)
at com.demoapp.collector.MPReducer.reduce(MPReducer.java:298)
at com.demoapp.collector.MPReducer.reduce(MPReducer.java:28)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:595)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:433)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Any ideas?
If it doesn't work with a large number of files, it's probably because you've hit the max number of files that can be served by a datanode. This can be controlled with a property called dfs.datanode.max.xcievers in hdfs-site.xml.
As recommended here, you should bump its value to something that will allow your job to run correctly; they recommend 4096:
<property>
<name>dfs.datanode.max.xcievers</name>
<value>4096</value>
</property>
I increased the number of reduce tasks from 1 to 8 and increased the values of io.sort.mb and mapred.task.timeout.
Details
This link was helpful: Cloudera blog