I have a matrix pipeline job that runs multiple stages (roughly 200), most of which are functional tests whose results are recorded by the following code:
stage('Report') {
    script {
        def summary = junit allowEmptyResults: true, testResults: "**/artifacts/${product}/test-reports/*.xml"
        def buildURL = "${env.BUILD_URL}"
        def TestAnalyzer = buildURL.replace("/${env.BUILD_NUMBER}", "/test_results_analyzer")
        def TestsURL = buildURL.replace("job/${env.JOB_NAME}/${env.BUILD_NUMBER}", "blue/organizations/jenkins/${env.JOB_NAME}/detail/${env.JOB_NAME}/${env.BUILD_NUMBER}/tests")
        slackSend(
            color: summary.failCount == 0 ? 'good' : 'warning',
            message: "*<${TestsURL}|Test Summary>* for *${env.JOB_NAME}* on *${env.HOSTNAME} - ${product}* <${env.BUILD_URL}| #${env.BUILD_NUMBER}> - <${TestAnalyzer}|${summary.totalCount} Tests>, Failed: ${summary.failCount}, Skipped: ${summary.skipCount}, Passed: ${summary.passCount}"
        )
    }
}
The problem is that this Report stage regularly fails with the following error:
> Archive JUnit-formatted test results 9m 25s
[2022-11-16T02:51:49.569Z] Recording test results
Java heap space
I have increased the heap space of the Jenkins server to 8 GB by modifying the systemd service configuration this way:
software-dev@magnet:~$ sudo cat /etc/systemd/system/jenkins.service.d/override.conf
[Service]
Environment="JAVA_OPTS=-Djava.awt.headless=true -Xmx8g"
This was taken into account, as I verified with the following command:
software-dev@magnet:~$ tr '\0' '\n' < /proc/$(pidof java)/cmdline
/usr/bin/java
-Djava.awt.headless=true
-Xmx10g
-jar
/usr/share/java/jenkins.war
--webroot=/var/cache/jenkins/war
--httpPort=8080
I have just increased the heap size to 10 GB and will wait for the result of tonight's build, but this amount of heap space really looks excessive, so I suspect that a plugin, maybe the JUnit one, may be buggy and consuming too much memory.
Is anyone aware of such a thing? Could there be workarounds?
More importantly, which methods could I use to track down whether one plugin is consuming too much memory?
I have some notions of Java from my CS degree, but I'm not familiar with the Jenkins development ecosystem.
Thank you in advance.
You can try splitting the tests into chunks/batches/groups, but this solution requires changes in the code; a rough sketch of the chunking idea follows the links below.
More details:
https://semaphoreci.com/community/tutorials/how-to-split-junit-tests-in-a-continuous-integration-environment
Grouping JUnit tests
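As a rough, language-agnostic illustration of that idea (sketched here in Python; the glob pattern and batch size are assumptions for illustration only, and the real grouping would normally live in your build tool or test framework), splitting simply means partitioning the test set so that each run, and therefore each report-recording step, handles a smaller slice:

import glob

def batches(items, size):
    """Yield successive fixed-size chunks of a list."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Hypothetical example: partition the JUnit XML report files into groups of 50,
# so each group can be run or recorded separately instead of all at once.
reports = sorted(glob.glob("**/artifacts/*/test-reports/*.xml", recursive=True))
for index, group in enumerate(batches(reports, 50)):
    print(f"batch {index}: {len(group)} report files")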
Related
Our main goal is to perform operations on a large amount of input data (around 80 GB). The problem is that even for smaller datasets, we often get Java heap space or other memory-related errors.
Our temporary solution was to simply specify a higher maximum heap size (using -Xmx locally, or by setting spark.executor.memory and spark.driver.memory for our Spark instance), but this does not seem to generalize well; we still run into errors for bigger datasets or higher zoom levels.
For better understanding, here is the basic concept of what we do with our data:
1. Load the data using HadoopGeoTiffRDD.spatial(new Path(path)).
2. Map the data to the tiles of some zoom level:
   val extent = geotrellis.proj4.CRS.fromName("EPSG:4326").worldExtent
   val layout = layoutForZoom(zoom, extent)
   val metadata: TileLayerMetadata[SpatialKey] = dataSet.collectMetadata[SpatialKey](layout)
   val rdd = ContextRDD(dataSet.tileToLayout[SpatialKey](metadata), metadata)
   where layoutForZoom is basically the same as geotrellis.spark.tiling.ZoomedLayoutScheme.layoutForZoom.
3. Then we perform some operations on the entries of the RDD using rdd.map and rdd.foreach for the mapped RDDs.
4. We aggregate the results of four tiles which correspond to a single tile in a higher zoom level using groupByKey.
5. Go to step 3 until we reach a certain zoom level.
The goal would be: Given a memory limit of X GB, partition and work on the data in a way that we consume at most X GB.
It seems like the tiling of the dataset via tileToLayout already takes too much memory on higher zoom levels (even for very small data sets). Are there any alternatives for tiling and loading the data according to some LayoutDefinition? As far as we understand, HadoopGeoTiffRDD.spatial already splits the dataset into small regions which are then divided into the tiles by tileToLayout. Is it somehow possible to directly load the dataset corresponding to the LayoutDefinition?
In our concrete scenario we have 3 workers with 2 GB RAM and 2 cores each. The Spark master also runs on one of them and gets its work via spark-submit from a driver instance. We tried configurations like this:
val conf = new SparkConf().setAppName("Converter").setMaster("spark://IP-ADDRESS:PORT")
  .set("spark.executor.memory", "900m") // to be below the available 1024 MB of default slave RAM
  .set("spark.memory.fraction", "0.2") // to get more usable heap space
  .set("spark.executor.cores", "2")
  .set("spark.executor.instances", "3")
An example of a heap space error at the tiling step (step 2):
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 2.0 failed 1 times, most recent failure: Lost task 1.0 in stage 2.0 (TID 5, 192.168.0.2, executor 1): java.lang.OutOfMemoryError: Java heap space
at scala.collection.mutable.ArrayBuilder$ofByte.mkArray(ArrayBuilder.scala:128)
at scala.collection.mutable.ArrayBuilder$ofByte.resize(ArrayBuilder.scala:134)
at scala.collection.mutable.ArrayBuilder$ofByte.sizeHint(ArrayBuilder.scala:139)
at scala.collection.IndexedSeqOptimized$class.slice(IndexedSeqOptimized.scala:115)
at scala.collection.mutable.ArrayOps$ofByte.slice(ArrayOps.scala:198)
at geotrellis.util.StreamingByteReader.getBytes(StreamingByteReader.scala:98)
at geotrellis.raster.io.geotiff.LazySegmentBytes.getBytes(LazySegmentBytes.scala:104)
at geotrellis.raster.io.geotiff.LazySegmentBytes.readChunk(LazySegmentBytes.scala:81)
at geotrellis.raster.io.geotiff.LazySegmentBytes$$anonfun$getSegments$1.apply(LazySegmentBytes.scala:99)
at geotrellis.raster.io.geotiff.LazySegmentBytes$$anonfun$getSegments$1.apply(LazySegmentBytes.scala:99)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:185)
at scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$15.apply(RDD.scala:1012)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$15.apply(RDD.scala:1010)
at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2118)
at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2118)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2119)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1026)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:1008)
at geotrellis.spark.TileLayerMetadata$.collectMetadataWithCRS(TileLayerMetadata.scala:147)
at geotrellis.spark.TileLayerMetadata$.fromRdd(TileLayerMetadata.scala:281)
at geotrellis.spark.package$withCollectMetadataMethods.collectMetadata(package.scala:212)
...
Update:
I extracted an example from my code and uploaded it to the repository at https://gitlab.com/hwuerz/geotrellis-spark-example. You can run the example locally using sbt run and selecting the class demo.HelloGeotrellis. This will create the tiles for a tiny input dataset example.tif according to our layout definition, starting at zoom level 20 (it uses two cores by default, which can be adjusted in the file HelloGeotrellis.scala). If level 20 somehow still works, it will most likely fail for higher values of bottomLayer.
To run the code on the Spark Cluster, I use the following command:
`sbt package && bash submit.sh --dataLocation /mnt/glusterfs/example.tif --bottomLayer 20 --topLayer 10 --cesiumTerrainDir /mnt/glusterfs/terrain/ --sparkMaster spark://192.168.0.8:7077`
Where submit.sh basically runs spark-submit (see the file in the repo).
The example.tif is included in the repo in the directory DebugFiles. In my setup the file is distributed via GlusterFS, which is why the path points to this location. The cesiumTerrainDir is just a directory where we store our generated output.
We think the main problem might be that, with the given API calls, GeoTrellis loads the complete structure of the layout into memory, which is too big for higher zoom levels. Is there any way to avoid this?
While training data in Mallet, the process stopped because of an OutOfMemoryError. The MEMORY attribute in bin/mallet has already been set to 3 GB. The size of the training file output.mallet is only 31 MB. I have tried to reduce the training data size, but it still throws the same error:
a161115@a161115-Inspiron-3250:~/dev/test_models/Mallet$ bin/mallet train-classifier --input output.mallet --trainer NaiveBayes --training-portion 0.0001 --num-trials 10
Training portion = 1.0E-4
Unlabeled training sub-portion = 0.0
Validation portion = 0.0
Testing portion = 0.9999
-------------------- Trial 0 --------------------
Trial 0 Training NaiveBayesTrainer with 7 instances
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at cc.mallet.types.Multinomial$Estimator.setAlphabet(Multinomial.java:309)
at cc.mallet.classify.NaiveBayesTrainer.setup(NaiveBayesTrainer.java:251)
at cc.mallet.classify.NaiveBayesTrainer.trainIncremental(NaiveBayesTrainer.java:200)
at cc.mallet.classify.NaiveBayesTrainer.train(NaiveBayesTrainer.java:193)
at cc.mallet.classify.NaiveBayesTrainer.train(NaiveBayesTrainer.java:59)
at cc.mallet.classify.tui.Vectors2Classify.main(Vectors2Classify.java:415)
I would appreciate any help or insights into this problem.
EDIT: this is my bin/mallet file:
#!/bin/bash
malletdir=`dirname $0`
malletdir=`dirname $malletdir`
cp=$malletdir/class:$malletdir/lib/mallet-deps.jar:$CLASSPATH
#echo $cp
MEMORY=10g
CMD=$1
shift
help()
{
cat <<EOF
Mallet 2.0 commands:
import-dir load the contents of a directory into mallet instances (one per file)
import-file load a single file into mallet instances (one per line)
import-svmlight load SVMLight format data files into Mallet instances
info get information about Mallet instances
train-classifier train a classifier from Mallet data files
classify-dir classify data from a single file with a saved classifier
classify-file classify the contents of a directory with a saved classifier
classify-svmlight classify data from a single file in SVMLight format
train-topics train a topic model from Mallet data files
infer-topics use a trained topic model to infer topics for new documents
evaluate-topics estimate the probability of new documents under a trained model
prune remove features based on frequency or information gain
split divide data into testing, training, and validation portions
bulk-load for big input files, efficiently prune vocabulary and import docs
Include --help with any option for more information
EOF
}
CLASS=
case $CMD in
import-dir) CLASS=cc.mallet.classify.tui.Text2Vectors;;
import-file) CLASS=cc.mallet.classify.tui.Csv2Vectors;;
import-svmlight) CLASS=cc.mallet.classify.tui.SvmLight2Vectors;;
info) CLASS=cc.mallet.classify.tui.Vectors2Info;;
train-classifier) CLASS=cc.mallet.classify.tui.Vectors2Classify;;
classify-dir) CLASS=cc.mallet.classify.tui.Text2Classify;;
classify-file) CLASS=cc.mallet.classify.tui.Csv2Classify;;
classify-svmlight) CLASS=cc.mallet.classify.tui.SvmLight2Classify;;
train-topics) CLASS=cc.mallet.topics.tui.TopicTrainer;;
infer-topics) CLASS=cc.mallet.topics.tui.InferTopics;;
evaluate-topics) CLASS=cc.mallet.topics.tui.EvaluateTopics;;
prune) CLASS=cc.mallet.classify.tui.Vectors2Vectors;;
split) CLASS=cc.mallet.classify.tui.Vectors2Vectors;;
bulk-load) CLASS=cc.mallet.util.BulkLoader;;
run) CLASS=$1; shift;;
*) echo "Unrecognized command: $CMD"; help; exit 1;;
esac
java -Xmx$MEMORY -ea -Djava.awt.headless=true -Dfile.encoding=UTF-8 -server -classpath "$cp" $CLASS "$@"
It's also worth mentioning that my original training file has 60,000 items. When I reduce the number of items (to 20,000 instances), training runs normally but uses about 10 GB of RAM.
Check the call to Java in bin/mallet and add the flag -Xmx3g, making sure there isn't already another -Xmx in it (if there is, edit that one instead).
I usually change both launcher files and set the memory to the maximum.
In mallet.bat:
java -Xmx%MALLET_MEMORY% -ea -Dfile.encoding=%MALLET_ENCODING% -classpath %MALLET_CLASSPATH% %CLASS% %MALLET_ARGS%
and in bin/mallet:
java -Xmx$MEMORY -ea -Djava.awt.headless=true -Dfile.encoding=UTF-8 -server -classpath "$cp" $CLASS "$@"
I replace %MALLET_MEMORY% and $MEMORY with the amount of memory I want, e.g. 4G.
I am using the Stanford POS tagger toolkit to tag lists of words from academic papers. Here is the relevant part of my code:
st = StanfordPOSTagger(stanford_tagger_path, stanford_jar_path, encoding = 'utf8', java_options = '-mx2048m')
word_tuples = st.tag(document)
document is a list of words derived from nltk.word_tokenize; they come from normal academic papers, so usually there are several thousand words (mostly 3000-4000). I need to process over 10000 files, so I keep calling these functions. My program works fine on a small test set with 270 files, but when the number of files gets bigger, the program gives this error (Java heap space is 2G):
raise OSError('Java command failed : ' + str(cmd))
OSError: Java command failed
Note that this error does not occur immediately after execution starts; it happens after the program has been running for some time. I really don't know the reason. Is this because my 3000-4000 words per document are too many? Thank you very much for your help! (Sorry for the poor formatting, the error output is too long.)
Here is my solution, after I faced the same error. Basically, increasing the Java heap size solved it.
import os
java_path = "C:\\Program Files\\Java\\jdk1.8.0_102\\bin\\java.exe"
os.environ['JAVAHOME'] = java_path
from nltk.tag.stanford import StanfordPOSTagger
path_to_model = "stanford-postagger-2015-12-09/models/english-bidirectional-distsim.tagger"
path_to_jar = "stanford-postagger-2015-12-09/stanford-postagger.jar"
tagger=StanfordPOSTagger(path_to_model, path_to_jar)
tagger.java_options='-mx4096m' ### Setting higher memory limit for long sentences
sentence = 'This is testing'
print(tagger.tag(sentence.split()))
I assume you have tried increasing the Java heap via the tagger settings, like so:
stanford.POSTagger([...], java_options="-mxSIZEm")
Cf. the docs; the default is 1000:
def __init__(self, [...], java_options='-mx1000m')
In order to test whether the problem is the size of the dataset, you can tokenize your text into sentences, e.g. using the Punkt tokenizer, and output them right after tagging (see the sketch below).
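A minimal sketch of that sentence-by-sentence approach, reusing the tagger construction from the question above (the model, jar, and input paths here are placeholders, not the asker's actual paths):

from nltk import sent_tokenize, word_tokenize
from nltk.tag.stanford import StanfordPOSTagger

# Placeholder paths -- point these at your local Stanford POS tagger installation.
stanford_tagger_path = "models/english-bidirectional-distsim.tagger"
stanford_jar_path = "stanford-postagger.jar"

st = StanfordPOSTagger(stanford_tagger_path, stanford_jar_path,
                       encoding='utf8', java_options='-mx2048m')

with open("paper.txt", encoding="utf-8") as f:
    text = f.read()

# Tag one sentence at a time and print each result immediately, so that a
# failure points at the offending sentence rather than at the whole document.
for sentence in sent_tokenize(text):
    print(st.tag(word_tokenize(sentence)))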
This is my Java code, in which I am querying data from Hive using Apache Spark SQL.
JavaSparkContext ctx = new JavaSparkContext(new SparkConf().setAppName("LoadData").setMaster("MasterUrl"));
HiveContext sqlContext = new HiveContext(ctx.sc());
List<Row> result = sqlContext.sql("Select * from Tablename").collectAsList();
When I run this code it throws java.lang.OutOfMemoryError: GC overhead limit exceeded. How can I solve this, or how do I increase the memory in the Spark configuration?
If you are using the spark-shell to run it, then you can use --driver-memory to bump the memory limit:
spark-shell --driver-memory Xg [other options]
If the executors are having problems, then you can adjust their memory limits with --executor-memory XG.
You can find more info on how exactly to set them in the guides: the submission guide for executor memory, and the configuration guide for driver memory.
Edit: since you are running it from NetBeans, you should be able to pass them as JVM arguments: -Dspark.driver.memory=XG and -Dspark.executor.memory=XG. I think it was in Project Properties under Run.
Have you found any solution to your issue yet? Please share it if you have :D
Here is my idea: RDD (and JavaRDD as well) has a method toLocalIterator(). The Spark documentation says:
The iterator will consume as much memory as the largest partition in this RDD.
This means the iterator will consume less memory than a List if the RDD is divided into many partitions. You can try something like this:
Iterator<Row> iter = sqlContext.sql("Select * from Tablename").javaRDD().toLocalIterator();
while (iter.hasNext()) {
    Row row = iter.next();
    // your code here
}
PS: it's just an idea and I haven't tested it yet.
I request your kind help in solving the "Java command failed" error, which keeps being thrown whenever I try to tag an Arabic corpus of about 2 megabytes. I have searched the web and the Stanford POS tagger mailing list, but I did not find a solution. I read some posts on similar problems, and it was suggested that the memory had run out; I am not sure of that, since I still have 19 GB of free memory. I tried every possible solution offered, but the same error keeps showing.
I have an average command of Python and a good command of Linux. I am using Linux Mint 17 KDE 64-bit, Python 3.4, an NLTK alpha release, and the Stanford POS tagger model for Arabic. This is my code:
import nltk
from nltk.tag.stanford import POSTagger
arabic_postagger = POSTagger("/home/mohammed/postagger/models/arabic.tagger", "/home/mohammed/postagger/stanford-postagger.jar", encoding='utf-8')
print("Executing tag_corpus.py...\n")
# Import corpus file
print("Importing data...\n")
file = open("test.txt", 'r', encoding='utf-8').read()
text = file.strip()
print("Tagging the corpus. Please wait...\n")
tagged_corpus = arabic_postagger.tag(nltk.word_tokenize(text))
If the corpus size is less than 1 MB (= 100,000 words), there is no error. But when I try to tag the 2 MB corpus, the following error message is shown:
Traceback (most recent call last):
File "/home/mohammed/experiments/current/tag_corpus2.py", line 17, in <module>
tagged_lst = arabic_postagger.tag(nltk.word_tokenize(text))
File "/usr/local/lib/python3.4/dist-packages/nltk-3.0a3-py3.4.egg/nltk/tag/stanford.py", line 59, in tag
return self.batch_tag([tokens])[0]
File "/usr/local/lib/python3.4/dist-packages/nltk-3.0a3-py3.4.egg/nltk/tag/stanford.py", line 81, in batch_tag
stdout=PIPE, stderr=PIPE)
File "/usr/local/lib/python3.4/dist-packages/nltk-3.0a3-py3.4.egg/nltk/internals.py", line 171, in java
raise OSError('Java command failed!')
OSError: Java command failed!
I intend to tag 300 million words for my Ph.D. research project. If I keep tagging 100 thousand words at a time, I will have to repeat the task 3,000 times. It will kill me!
I really appreciate your kind help.
After your import lines, add this line:
nltk.internals.config_java(options='-Xmx2G')
This will increase the maximum RAM that Java allows the Stanford POS tagger to use. '-Xmx2G' raises the maximum allowable heap to 2 GB instead of the default 512 MB.
See "What are the Xms and Xmx parameters when starting JVMs?" for more information.
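For context, a minimal sketch of where that call would go in the script from the question (the paths are the ones from the question; note that newer NLTK releases rename POSTagger to StanfordPOSTagger, so treat this as an untested adaptation for the NLTK alpha used there):

import nltk
from nltk.internals import config_java
from nltk.tag.stanford import POSTagger  # StanfordPOSTagger in newer NLTK releases

# Allow the Stanford tagger's JVM to use up to 2 GB of heap.
config_java(options='-Xmx2G')

arabic_postagger = POSTagger("/home/mohammed/postagger/models/arabic.tagger",
                             "/home/mohammed/postagger/stanford-postagger.jar",
                             encoding='utf-8')

with open("test.txt", 'r', encoding='utf-8') as f:
    text = f.read().strip()

tagged_corpus = arabic_postagger.tag(nltk.word_tokenize(text))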
If you're interested in how to debug your code, read on.
So we see that the command fails when handling a huge amount of data, so the first thing to look at is how Java is initialized in NLTK before the Stanford tagger is called, from https://github.com/nltk/nltk/blob/develop/nltk/tag/stanford.py#L19 :
from nltk.internals import find_file, find_jar, config_java, java, _java_options
We see that the nltk.internals package handles the different Java configurations and parameters.
Then we take a look at https://github.com/nltk/nltk/blob/develop/nltk/internals.py#L65 and see that no value is added there for Java's memory allocation by default.
In version 3.9.2, the StanfordTagger class constructor accepts a parameter called java_options which can be used to set the memory for the POSTagger and also the NERTagger.
E.g. pos_tagger = StanfordPOSTagger('models/english-bidirectional-distsim.tagger', path_to_jar='stanford-postagger-3.9.2.jar', java_options='-mx3g')
I found that the answer by @alvas did not work for me, because the StanfordTagger was overriding my memory setting with its built-in default of 1000m. Perhaps using nltk.internals.config_java after initializing the StanfordPOSTagger might work, but I haven't tried that.
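For anyone who wants to experiment with that untested idea, here is a minimal sketch combining the constructor option shown above with re-applying the setting after construction; the paths are placeholders, and whether the re-applied value actually wins depends on your NLTK version, so treat it purely as a starting point:

from nltk.internals import config_java
from nltk.tag.stanford import StanfordPOSTagger

# Placeholder paths -- point these at your own Stanford POS tagger installation.
pos_tagger = StanfordPOSTagger('models/english-bidirectional-distsim.tagger',
                               path_to_jar='stanford-postagger-3.9.2.jar',
                               java_options='-mx3g')

# Untested idea from the answer above: re-apply the JVM options after construction,
# in case a default overrides the earlier setting. Setting the java_options
# attribute mirrors what another answer in this thread does.
config_java(options='-Xmx3g')
pos_tagger.java_options = '-mx3g'

print(pos_tagger.tag('This is only a test'.split()))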