I am writing a Hadoop job which processes many files and creates multiple output files from each input file. I am using "MultipleOutputs" to write them. It works fine for a small number of files, but I get the following error for a large number of files.
The exception is raised on the call MultipleOutputs.write(key, value, outputPath);
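For context, the usage being described looks roughly like the sketch below. This is a generic illustration of the MultipleOutputs pattern, not the actual reducer from the job; the class and field names are assumed.

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Hypothetical reducer illustrating the MultipleOutputs pattern described above.
public class FanOutReducer extends Reducer<Text, Text, Text, Text> {

    private MultipleOutputs<Text, Text> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<Text, Text>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            // baseOutputPath is derived per record, so each distinct path opens its own writer
            String baseOutputPath = "out/" + key.toString();
            mos.write(key, value, baseOutputPath);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close(); // releases every open RecordWriter and its underlying DFS output stream
    }
}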
I have tried increasing the ulimit and -Xmx but to no avail.
2013-01-15 13:44:05,154 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hdfs.DFSOutputStream$Packet.<init>(DFSOutputStream.java:201)
at org.apache.hadoop.hdfs.DFSOutputStream.writeChunk(DFSOutputStream.java:1423)
at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:161)
at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:136)
at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:125)
at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:116)
at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:90)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:54)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.writeObject(TextOutputFormat.java:78)
at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.write(TextOutputFormat.java:99)
at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:386)
at com.demoapp.collector.MPReducer.reduce(MPReducer.java:298)
at com.demoapp.collector.MPReducer.reduce(MPReducer.java:28)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:595)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:433)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Any ideas?
If it doesn't work with a large number of files, it's probably because you've hit the max number of files that can be served by a datanode. This can be controlled with a property called dfs.datanode.max.xcievers in hdfs-site.xml.
As recommended here, you should bump its value to something that will allow your job to run correctly; they recommend 4096:
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
I increased the number of reduce tasks from 1 to 8 and increased the values of io.sort.mb and mapred.task.timeout.
This link was helpful: Cloudera blog
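For illustration, these settings can be applied to the job configuration before submission. The sketch below is only a rough example: the property names are the ones mentioned above, but the concrete values are placeholders, since the answer does not say which values were used.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobTuning {
    public static Job buildJob() throws Exception {
        Configuration conf = new Configuration();
        // Placeholder values; the answer only says these settings were increased.
        conf.setInt("io.sort.mb", 256);                // map-side sort buffer in MB (default 100)
        conf.setLong("mapred.task.timeout", 1800000L); // task timeout in ms (default 600000)

        Job job = Job.getInstance(conf, "multiple-outputs-job"); // on Hadoop 1.x use new Job(conf, ...)
        job.setNumReduceTasks(8); // increased from 1 to 8, as described above
        return job;
    }
}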
Our main goal is to perform operations on a large amount of input data (around 80 GB). The problem is that even for smaller datasets, we often get Java heap space or other memory-related errors.
Our temporary solution was to simply specify a higher maximum heap size (using -Xmx locally or by setting spark.executor.memory and spark.driver.memory for our Spark instance), but this does not seem to generalize well: we still run into errors for bigger datasets or higher zoom levels.
For better understanding, here is the basic concept of what we do with our data:
1. Load the data using HadoopGeoTiffRDD.spatial(new Path(path))
2. Map the data to the tiles of some zoom level:
val extent = geotrellis.proj4.CRS.fromName("EPSG:4326").worldExtent
val layout = layoutForZoom(zoom, extent)
val metadata: TileLayerMetadata[SpatialKey] = dataSet.collectMetadata[SpatialKey](layout)
val rdd = ContextRDD(dataSet.tileToLayout[SpatialKey](metadata), metadata)
Where layoutForZoom is basically the same as geotrellis.spark.tiling.ZoomedLayoutScheme.layoutForZoom
3. Then we perform some operations on the entries of the mapped RDDs using rdd.map and rdd.foreach.
4. We aggregate the results of four tiles which correspond to a single tile in a higher zoom level using groupByKey.
5. Go to step 3 until we reach a certain zoom level.
The goal would be: Given a memory limit of X GB, partition and work on the data in a way that we consume at most X GB.
It seems like the tiling of the dataset via tileToLayout already takes too much memory on higher zoom levels (even for very small data sets). Are there any alternatives for tiling and loading the data according to some LayoutDefinition? As far as we understand, HadoopGeoTiffRDD.spatial already splits the dataset into small regions which are then divided into the tiles by tileToLayout. Is it somehow possible to directly load the dataset corresponding to the LayoutDefinition?
In our concrete scenario we have 3 workers with 2 GB RAM and 2 cores each. The Spark master also runs on one of them and gets its work via spark-submit from a driver instance. We tried configurations like this:
val conf = new SparkConf().setAppName("Converter").setMaster("spark://IP-ADDRESS:PORT")
.set("spark.executor.memory", "900m") // to be below the available 1024 MB of default slave RAM
.set("spark.memory.fraction", "0.2") // to get more usable heap space
.set("spark.executor.cores", "2")
.set("spark.executor.instances", "3")
An example of a heap space error at the tiling step (step 2):
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 2.0 failed 1 times, most recent failure: Lost task 1.0 in stage 2.0 (TID 5, 192.168.0.2, executor 1): java.lang.OutOfMemoryError: Java heap space
at scala.collection.mutable.ArrayBuilder$ofByte.mkArray(ArrayBuilder.scala:128)
at scala.collection.mutable.ArrayBuilder$ofByte.resize(ArrayBuilder.scala:134)
at scala.collection.mutable.ArrayBuilder$ofByte.sizeHint(ArrayBuilder.scala:139)
at scala.collection.IndexedSeqOptimized$class.slice(IndexedSeqOptimized.scala:115)
at scala.collection.mutable.ArrayOps$ofByte.slice(ArrayOps.scala:198)
at geotrellis.util.StreamingByteReader.getBytes(StreamingByteReader.scala:98)
at geotrellis.raster.io.geotiff.LazySegmentBytes.getBytes(LazySegmentBytes.scala:104)
at geotrellis.raster.io.geotiff.LazySegmentBytes.readChunk(LazySegmentBytes.scala:81)
at geotrellis.raster.io.geotiff.LazySegmentBytes$$anonfun$getSegments$1.apply(LazySegmentBytes.scala:99)
at geotrellis.raster.io.geotiff.LazySegmentBytes$$anonfun$getSegments$1.apply(LazySegmentBytes.scala:99)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:185)
at scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$15.apply(RDD.scala:1012)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$15.apply(RDD.scala:1010)
at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2118)
at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2118)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2119)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1026)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:1008)
at geotrellis.spark.TileLayerMetadata$.collectMetadataWithCRS(TileLayerMetadata.scala:147)
at geotrellis.spark.TileLayerMetadata$.fromRdd(TileLayerMetadata.scala:281)
at geotrellis.spark.package$withCollectMetadataMethods.collectMetadata(package.scala:212)
...
Update:
I extracted an example from my code and uploaded it to the repository at https://gitlab.com/hwuerz/geotrellis-spark-example. You can run the example locally using sbt run and selecting the class demo.HelloGeotrellis. This will create the tiles for a tiny input dataset example.tif according to our layout definition, starting at zoom level 20 (using two cores by default; this can be adjusted in the file HelloGeotrellis.scala). If level 20 somehow still works, it will most likely fail with higher values for bottomLayer.
To run the code on the Spark Cluster, I use the following command:
`sbt package && bash submit.sh --dataLocation /mnt/glusterfs/example.tif --bottomLayer 20 --topLayer 10 --cesiumTerrainDir /mnt/glusterfs/terrain/ --sparkMaster spark://192.168.0.8:7077`
Where submit.sh basically runs spark-submit (see the file in the repo).
The example.tif is included in the repo within the directory DebugFiles. In my setup the file is distributed via GlusterFS, which is why the path points to this location. The cesiumTerrainDir is just a directory where we store our generated output.
We think the main problem might be that, using the given API calls, GeoTrellis loads the complete structure of the layout into memory, which is too big for higher zoom levels. Is there any way to avoid this?
While training a classifier in Mallet, the process stopped because of an OutOfMemoryError. The MEMORY attribute in bin/mallet has already been set to 3GB. The size of the training file output.mallet is only 31 MB. I have tried to reduce the training data size, but it still throws the same error:
a161115#a161115-Inspiron-3250:~/dev/test_models/Mallet$ bin/mallet train-classifier --input output.mallet --trainer NaiveBayes --training-portion 0.0001 --num-trials 10
Training portion = 1.0E-4
Unlabeled training sub-portion = 0.0
Validation portion = 0.0
Testing portion = 0.9999
-------------------- Trial 0 --------------------
Trial 0 Training NaiveBayesTrainer with 7 instances
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at cc.mallet.types.Multinomial$Estimator.setAlphabet(Multinomial.java:309)
at cc.mallet.classify.NaiveBayesTrainer.setup(NaiveBayesTrainer.java:251)
at cc.mallet.classify.NaiveBayesTrainer.trainIncremental(NaiveBayesTrainer.java:200)
at cc.mallet.classify.NaiveBayesTrainer.train(NaiveBayesTrainer.java:193)
at cc.mallet.classify.NaiveBayesTrainer.train(NaiveBayesTrainer.java:59)
at cc.mallet.classify.tui.Vectors2Classify.main(Vectors2Classify.java:415)
I would appreciate any help or insights into this problem.
EDIT: this is my bin/mallet file.
#!/bin/bash
malletdir=`dirname $0`
malletdir=`dirname $malletdir`
cp=$malletdir/class:$malletdir/lib/mallet-deps.jar:$CLASSPATH
#echo $cp
MEMORY=10g
CMD=$1
shift
help()
{
cat <<EOF
Mallet 2.0 commands:
import-dir load the contents of a directory into mallet instances (one per file)
import-file load a single file into mallet instances (one per line)
import-svmlight load SVMLight format data files into Mallet instances
info get information about Mallet instances
train-classifier train a classifier from Mallet data files
classify-dir classify data from a single file with a saved classifier
classify-file classify the contents of a directory with a saved classifier
classify-svmlight classify data from a single file in SVMLight format
train-topics train a topic model from Mallet data files
infer-topics use a trained topic model to infer topics for new documents
evaluate-topics estimate the probability of new documents under a trained model
prune remove features based on frequency or information gain
split divide data into testing, training, and validation portions
bulk-load for big input files, efficiently prune vocabulary and import docs
Include --help with any option for more information
EOF
}
CLASS=
case $CMD in
import-dir) CLASS=cc.mallet.classify.tui.Text2Vectors;;
import-file) CLASS=cc.mallet.classify.tui.Csv2Vectors;;
import-svmlight) CLASS=cc.mallet.classify.tui.SvmLight2Vectors;;
info) CLASS=cc.mallet.classify.tui.Vectors2Info;;
train-classifier) CLASS=cc.mallet.classify.tui.Vectors2Classify;;
classify-dir) CLASS=cc.mallet.classify.tui.Text2Classify;;
classify-file) CLASS=cc.mallet.classify.tui.Csv2Classify;;
classify-svmlight) CLASS=cc.mallet.classify.tui.SvmLight2Classify;;
train-topics) CLASS=cc.mallet.topics.tui.TopicTrainer;;
infer-topics) CLASS=cc.mallet.topics.tui.InferTopics;;
evaluate-topics) CLASS=cc.mallet.topics.tui.EvaluateTopics;;
prune) CLASS=cc.mallet.classify.tui.Vectors2Vectors;;
split) CLASS=cc.mallet.classify.tui.Vectors2Vectors;;
bulk-load) CLASS=cc.mallet.util.BulkLoader;;
run) CLASS=$1; shift;;
*) echo "Unrecognized command: $CMD"; help; exit 1;;
esac
java -Xmx$MEMORY -ea -Djava.awt.headless=true -Dfile.encoding=UTF-8 -server -classpath "$cp" $CLASS "$@"
It's also worth mentioning that my original training file has 60,000 items. When I reduce the number of items (to 20,000 instances), training runs normally, but it uses about 10 GB of RAM.
Check the call to Java in bin/mallet and add the flag -Xmx3g, making sure there isn't another -Xmx flag in it already (if there is, edit that one).
I usually change both launcher files (mallet and mallet.bat) and set the memory to the maximum.
In mallet.bat:
java -Xmx%MALLET_MEMORY% -ea -Dfile.encoding=%MALLET_ENCODING% -classpath %MALLET_CLASSPATH% %CLASS% %MALLET_ARGS%
and in bin/mallet:
java -Xmx$MEMORY -ea -Djava.awt.headless=true -Dfile.encoding=UTF-8 -server -classpath "$cp" $CLASS "$@"
I replaced %MALLET_MEMORY% and $MEMORY with the memory I want, e.g. 4G.
I request your kind help in solving the "Java command failed" error, which is thrown whenever I try to tag an Arabic corpus of about 2 megabytes. I have searched the web and the Stanford POS tagger mailing list, but I did not find a solution. I read some posts on similar problems, and it was suggested that the memory had run out; I am not sure of that, since I still have 19 GB of free memory. I tried every possible solution offered, but the same error keeps showing up.
I have an average command of Python and a good command of Linux. I am using Linux Mint 17 KDE 64-bit, Python 3.4, an NLTK alpha release, and the Stanford POS tagger model for Arabic. This is my code:
import nltk
from nltk.tag.stanford import POSTagger
arabic_postagger = POSTagger("/home/mohammed/postagger/models/arabic.tagger", "/home/mohammed/postagger/stanford-postagger.jar", encoding='utf-8')
print("Executing tag_corpus.py...\n")
# Import corpus file
print("Importing data...\n")
file = open("test.txt", 'r', encoding='utf-8').read()
text = file.strip()
print("Tagging the corpus. Please wait...\n")
tagged_corpus = arabic_postagger.tag(nltk.word_tokenize(text))
If the corpus size is less than 1 MB (= 100,000 words), there is no error. But when I try to tag the 2 MB corpus, the following error message is shown:
Traceback (most recent call last):
File "/home/mohammed/experiments/current/tag_corpus2.py", line 17, in <module>
tagged_lst = arabic_postagger.tag(nltk.word_tokenize(text))
File "/usr/local/lib/python3.4/dist-packages/nltk-3.0a3-py3.4.egg/nltk/tag/stanford.py", line 59, in tag
return self.batch_tag([tokens])[0]
File "/usr/local/lib/python3.4/dist-packages/nltk-3.0a3-py3.4.egg/nltk/tag/stanford.py", line 81, in batch_tag
stdout=PIPE, stderr=PIPE)
File "/usr/local/lib/python3.4/dist-packages/nltk-3.0a3-py3.4.egg/nltk/internals.py", line 171, in java
raise OSError('Java command failed!')
OSError: Java command failed!
I intend to tag 300 million words for my Ph.D. research project. If I keep tagging 100 thousand words at a time, I will have to repeat the task 3,000 times. It will kill me!
I really appreciate your kind help.
After your import lines, add this line:
nltk.internals.config_java(options='-Xmx2G')
This will increase the maximum heap size that Java allows the Stanford POS tagger to use. The '-Xmx2G' option changes the maximum allowable heap to 2 GB instead of the default 512 MB.
See What are the Xms and Xmx parameters when starting JVMs? for more information
If you're interested in how to debug your code, read on.
So we see that the command fails when handling a huge amount of data, so the first thing to look at is how Java is initialized in NLTK before the Stanford tagger is called, from https://github.com/nltk/nltk/blob/develop/nltk/tag/stanford.py#L19:
from nltk.internals import find_file, find_jar, config_java, java, _java_options
We see that the nltk.internals package is handling the different Java configurations and parameters.
Then we take a look at https://github.com/nltk/nltk/blob/develop/nltk/internals.py#L65 and we see that no value is set for the Java memory allocation.
In version 3.9.2, the StanfordTagger class constructor accepts a parameter called java_options which can be used to set the memory for the POSTagger and also the NERTagger.
E.g. pos_tagger = StanfordPOSTagger('models/english-bidirectional-distsim.tagger', path_to_jar='stanford-postagger-3.9.2.jar', java_options='-mx3g')
I found that the answer by @alvas did not work for me because StanfordTagger was overriding my memory setting with its built-in default of 1000m. Perhaps using nltk.internals.config_java after initializing StanfordPOSTagger might work, but I haven't tried that.
I am getting an OutOfMemoryError in my app. I have taken a heap dump and analyzed it with MAT. While analyzing my app's memory usage I found the following suspects, but I am unable to understand the main cause behind them.
Please help me understand these leak suspects and what the relevant solution for them is.
Suspect 1
The thread org.apache.tomcat.util.threads.TaskThread # 0x2bdf5ff8 "ajp-bio-9002"-exec-5 keeps local variables with total size 113,973,288 (50.72%) bytes.
The memory is accumulated in one instance of "org.apache.tomcat.util.threads.TaskThread" loaded by "org.apache.catalina.loader.StandardClassLoader # 0x293b4488".
Thread Stack
"ajp-bio-9002"-exec-5
at java.util.Arrays.copyOf([CI)[C (Arrays.java:2882)
at java.lang.AbstractStringBuilder.expandCapacity(I)V (AbstractStringBuilder.java:100)
at java.lang.AbstractStringBuilder.append(C)Ljava/lang/AbstractStringBuilder; (AbstractStringBuilder.java:572)
at java.lang.StringBuffer.append(C)Ljava/lang/StringBuffer; (StringBuffer.java:320)
at org.apache.myfaces.renderkit.html.util.ReducedHTMLParser.consumeString(C)Ljava/lang/String; (ReducedHTMLParser.java:303)
at org.apache.myfaces.renderkit.html.util.ReducedHTMLParser.consumeAttrValue()Ljava/lang/String; (ReducedHTMLParser.java:327)
at org.apache.myfaces.renderkit.html.util.ReducedHTMLParser.parse()V (ReducedHTMLParser.java:579)
at org.apache.myfaces.renderkit.html.util.ReducedHTMLParser.parse(Ljava/lang/CharSequence;Lorg/apache/myfaces/renderkit/html/util/CallbackListener;)V (ReducedHTMLParser.java:66)
at org.apache.myfaces.renderkit.html.util.DefaultAddResource.parseResponse(Ljavax/servlet/http/HttpServletRequest;Ljava/lang/String;Ljavax/servlet/http/HttpServletResponse;)V (DefaultAddResource.java:699)
at org.apache.myfaces.webapp.filter.ExtensionsFilter.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;Ljavax/servlet/FilterChain;)V (ExtensionsFilter.java:157)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V (ApplicationFilterChain.java:243)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V (ApplicationFilterChain.java:210)
at org.apache.catalina.core.StandardWrapperValve.invoke(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;)V (StandardWrapperValve.java:240)
at org.apache.catalina.core.StandardContextValve.invoke(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;)V (StandardContextValve.java:164)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;)V (AuthenticatorBase.java:462)
at org.apache.catalina.core.StandardHostValve.invoke(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;)V (StandardHostValve.java:164)
at org.apache.catalina.valves.ErrorReportValve.invoke(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;)V (ErrorReportValve.java:100)
at org.apache.catalina.valves.AccessLogValve.invoke(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;)V (AccessLogValve.java:562)
at org.apache.catalina.core.StandardEngineValve.invoke(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;)V (StandardEngineValve.java:118)
at org.apache.catalina.ha.session.JvmRouteBinderValve.invoke(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;)V (JvmRouteBinderValve.java:218)
at org.apache.catalina.ha.tcp.ReplicationValve.invoke(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;)V (ReplicationValve.java:333)
at org.apache.catalina.connector.CoyoteAdapter.service(Lorg/apache/coyote/Request;Lorg/apache/coyote/Response;)V (CoyoteAdapter.java:395)
at org.apache.coyote.ajp.AjpProcessor.process(Lorg/apache/tomcat/util/net/SocketWrapper;)Lorg/apache/tomcat/util/net/AbstractEndpoint$Handler$SocketState; (AjpProcessor.java:301)
at org.apache.coyote.ajp.AjpProtocol$AjpConnectionHandler.process(Lorg/apache/tomcat/util/net/SocketWrapper;Lorg/apache/tomcat/util/net/SocketStatus;)Lorg/apache/tomcat/util/net/AbstractEndpoint$Handler$SocketState; (AjpProtocol.java:183)
at org.apache.coyote.ajp.AjpProtocol$AjpConnectionHandler.process(Lorg/apache/tomcat/util/net/SocketWrapper;)Lorg/apache/tomcat/util/net/AbstractEndpoint$Handler$SocketState; (AjpProtocol.java:169)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run()V (JIoEndpoint.java:302)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Ljava/lang/Runnable;)V (ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run()V (ThreadPoolExecutor.java:908)
at java.lang.Thread.run()V (Thread.java:662)
Suspect 2
One instance of "java.lang.StringBuffer" loaded by "<system class loader>" occupies 59,216,088 (26.35%) bytes. The instance is referenced by org.apache.myfaces.renderkit.html.util.ReducedHTMLParser # 0x276990e8, loaded by "org.apache.catalina.loader.WebappClassLoader # 0x29592038". The memory is accumulated in one instance of "char[]" loaded by "<system class loader>".
You can go to the "dominator_tree" view of Memory Analyzer (MAT) and expand the TaskThread. This will show you the retained heap of all the objects in that TaskThread, which might help you reach the part of your application code causing the issue.
It looks like org.apache.myfaces.renderkit.html.util.ReducedHTMLParser is the culprit. The javadoc for ReducedHTMLParser explains how it works: it buffers the entire HTML response in memory and then processes it. It looks like it is trying, and failing, to process a very large response.
This question spans both serverfault and stackoverflow so I just picked this one.
I get the following exception with some simple file copy code. It's running on Windows Server 2003 x64:
Caused by: java.io.IOException: Insufficient system resources exist to complete the requested service
at sun.nio.ch.FileDispatcher.pwrite0(Native Method)
at sun.nio.ch.FileDispatcher.pwrite(Unknown Source)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
at sun.nio.ch.IOUtil.write(Unknown Source)
at sun.nio.ch.FileChannelImpl.write(Unknown Source)
at sun.nio.ch.FileChannelImpl.transferFromFileChannel(Unknown Source)
at sun.nio.ch.FileChannelImpl.transferFrom(Unknown Source)
at Tools.copy(Tools.java:473)
private static final long FIFTY_MB = 50L * 1024 * 1024; // upper bound per transferFrom call

public static void copy(FileChannel input, FileChannel output) throws IOException {
    final long size = input.size();
    long pos = 0;
    while (pos < size) {
        // transfer at most FIFTY_MB per call; transferFrom returns the number of bytes actually copied
        final long count = (size - pos) > FIFTY_MB ? FIFTY_MB : (size - pos);
        pos += output.transferFrom(input, pos, count);
    }
}
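For reference, copy is called with channels obtained from plain file streams; a minimal sketch of such a call site (the file names here are assumptions) would be:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

public class CopyExample {
    public static void main(String[] args) throws IOException {
        // Open source and destination channels and hand them to Tools.copy above.
        try (FileInputStream in = new FileInputStream("source.dat");
             FileOutputStream out = new FileOutputStream("target.dat")) {
            FileChannel inputChannel = in.getChannel();
            FileChannel outputChannel = out.getChannel();
            Tools.copy(inputChannel, outputChannel);
        }
    }
}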
The thing is, the server running this code is brand new and super powerful, so I don't understand what system resource it could possibly be running out of.
This looks like the error described here:
http://support.microsoft.com/kb/304101
But I've tried adding the registry edits to increase the kernel paged pool size, and that didn't help.
What I really don't get is that I've seen code that uses FileChannel.transferFrom with chunks a lot larger than 50 MB. I've seen that code work for files well over 1 GB in one chunk. But the file the server is getting stuck on is just 32 MB!
What is going on here? Is this a problem with FileChannel or Windows?
It may be related to "Bug" ID 4938442: Insufficient System Resources When Copying Large Files with NIO FileChannels.
Evaluation: Not a bug. This is most likely a file-server (or possibly client) configuration issue.
CUSTOMER SUBMITTED WORKAROUND:
Don't use NIO; we'd prefer to avoid this workaround since NIO offers a significant performance boost for large files (at least when performing local disk-to-local disk copies).
We can transfer using a smaller number of bytes. The actual number of bytes that may be copied without encountering this error seems to differ on Windows XP and Windows 2000 Server. Certainly a value of 32 MB appears to work.
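A minimal sketch of that second workaround, applied to the copy loop from the question with the per-call transfer capped at 32 MB (the value the bug report says appears to work; whether it actually avoids the error is not guaranteed):

import java.io.IOException;
import java.nio.channels.FileChannel;

public final class ChunkedCopy {
    // 32 MB per call: the value the bug report says "appears to work" on Windows.
    private static final long MAX_CHUNK = 32L * 1024 * 1024;

    public static void copy(FileChannel input, FileChannel output) throws IOException {
        final long size = input.size();
        long pos = 0;
        while (pos < size) {
            // transferFrom may copy fewer bytes than requested, so advance by its return value
            long count = Math.min(MAX_CHUNK, size - pos);
            pos += output.transferFrom(input, pos, count);
        }
    }
}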