ZGC out of memory error when moving from G1GC - java

I have an application in Java 17 configured like so:
-Xms4096m -Xmx4096m -XX:+UseG1GC -XX:MaxMetaspaceSize=1024m
It's a rather large, memory-intensive web service that has grown over the years from the baseline configuration our other services use:
-Xms1024m -Xmx1024m -XX:+UseG1GC -XX:MaxMetaspaceSize=128m
I've started trialing ZGC on our company's web services to try to get a little more performance out of them, so I switched some of our baseline services over, and they seemed to perform fine:
-Xms1024m -Xmx1024m -XX:+UseZGC -XX:MaxMetaspaceSize=128m
but when I went to try our single large service on ZGC, I ran into an error:
-Xms4096m -Xmx4096m -XX:+UseZGC -XX:MaxMetaspaceSize=1024m
Results in:
Apr 4, 2022 @ 15:06:22.103 Error: Could not create the Java Virtual Machine.
Apr 4, 2022 @ 15:06:22.026 [0.983s][error][gc] Failed to allocate initial Java heap (4096M)
Apr 4, 2022 @ 15:06:21.952 [0.966s][error][gc] Failed to commit memory (Not enough space)
Apr 4, 2022 @ 15:06:21.881 [0.983s][error][gc] Forced to lower max Java heap size from 4096M(100%) to 3366M(82%)
Apr 4, 2022 @ 15:06:21.826 [0.982s][error][gc] Failed to commit memory (Not enough space)
Apr 4, 2022 @ 15:06:21.826 Error: A fatal exception has occurred. Program will exit.
Apr 4, 2022 @ 15:06:21.783 [0.934s][error][gc] Failed to commit memory (Not enough space)
Apr 4, 2022 @ 15:06:21.226 [0.979s][error][gc] Failed to commit memory (Not enough space)
Apr 4, 2022 @ 15:06:20.224 [0.591s][error][gc] Failed to commit memory (Not enough space)
I noticed a similar JDK issue logged against one of their tests here, and although I'm not using -XX:+UseLargePages, I believe the implication is that ZGC needs more memory headroom at startup than G1GC does. I'd definitely like to use ZGC on the bigger service, as I feel there's the most to gain there, but ZGC is new enough that I haven't found much information on it during my search. What are appropriate values/ratios for the flags when using ZGC? Are there official recommendations on when to use ZGC over Shenandoah or G1GC?
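For what it's worth, with -Xms equal to -Xmx the JVM tries to commit the entire 4096M heap during startup, and ZGC additionally multi-maps the heap on Linux, so it is less forgiving than G1 when the OS cannot actually back that commit. Before flipping the flag on a big service, it can help to compare the requested heap against what the OS reports. This is just a sketch, assuming a HotSpot-based JDK 17 runtime (the com.sun.management cast is HotSpot-specific, and the class name is mine):

import java.lang.management.ManagementFactory;

public class HeapHeadroomCheck {
    public static void main(String[] args) {
        // Heap ceiling this JVM was started with (-Xmx).
        long maxHeap = Runtime.getRuntime().maxMemory();

        // HotSpot-specific view of machine (or container) memory.
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean)
                        ManagementFactory.getOperatingSystemMXBean();

        long totalMem = os.getTotalMemorySize(); // JDK 14+
        long freeMem = os.getFreeMemorySize();   // JDK 14+

        System.out.printf("max heap : %5d MiB%n", maxHeap >> 20);
        System.out.printf("total mem: %5d MiB%n", totalMem >> 20);
        System.out.printf("free mem : %5d MiB%n", freeMem >> 20);
        // If -Xmx plus MaxMetaspaceSize plus thread stacks approaches
        // total memory, ZGC's up-front commit of the full heap can fail
        // even though the same flags appeared to work under G1.
    }
}

If max heap plus metaspace comes close to the reported total, that would match the "Forced to lower max Java heap size" message in the log above.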

Related

Jelastic GC agent does not work with Tomcat 8.5.x

I use the Jelastic platform from different providers: dogado.de (Jelastic version 4.6.2) and mirhosting.com (Jelastic version 4.6.1). I have some environments on both platforms. These environments have the following configuration:
Java 8
Apache Tomcat 8.5.3
MySQL 5.7.10
Tomcat prints the following info to the log file:
Server version: Apache Tomcat/8.5.3
Server number: 8.5.3.0
OS Name: Linux
OS Version: 2.6.32-042stab113.21
Architecture: amd64
Java Home: /usr/java/jdk1.8.0_72/jre
JVM Version: 1.8.0_72-b15
JVM Vendor: Oracle Corporation
CATALINA_BASE: /opt/repo/versions/8.5.3
CATALINA_HOME: /opt/repo/versions/8.5.3
I'm trying to enable the Jelastic GC agent, so I changed the conf/variables.conf file; it now contains the following line:
-javaagent:/opt/repo/versions/8.5.3/lib/jelastic-gc-agent.jar=debug=true,period=60
This should enable debug mode, and the agent should print info about memory being released every 60 seconds. For the previous Tomcat version (7.0.39, on the same platform but in another environment) it looks like this:
Jul 14, 2016 6:08:30 PM com.jelastic.java.gc.JelasticGCAgent$1 run
INFO: JelasticGCAgent - Start Full GC : [free memory] : 181834896 bytes
Jul 14, 2016 6:08:30 PM com.jelastic.java.gc.JelasticGCAgent$1 run
INFO: JelasticGCAgent - Finish Full GC : [free memory] : 74885120 bytes
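Judging by those log lines, the agent simply forces a full GC on a timer and logs free memory before and after. A minimal sketch of an agent with equivalent observable behavior, assuming only the standard java.lang.instrument premain mechanism (the class name and hard-coded period are mine, not Jelastic's):

import java.lang.instrument.Instrumentation;
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical stand-in for jelastic-gc-agent: logs free heap memory,
// requests a full GC, then logs free heap memory again, on each period.
public class PeriodicGcAgent {
    public static void premain(String agentArgs, Instrumentation inst) {
        long periodMillis = 60_000; // presumably parsed from "period=60" in the real agent
        new Timer("gc-agent", /* isDaemon = */ true).scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                Runtime rt = Runtime.getRuntime();
                System.out.println("Start Full GC : [free memory] : " + rt.freeMemory() + " bytes");
                System.gc(); // a request to the JVM, not a guarantee
                System.out.println("Finish Full GC : [free memory] : " + rt.freeMemory() + " bytes");
            }
        }, periodMillis, periodMillis);
    }
}

Packaged in a jar with a Premain-Class manifest entry, this reproduces the kind of output shown above, which can be handy for checking whether the agent mechanism itself works on Tomcat 8.5 independently of Jelastic's jar.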
However, it does not work for Tomcat 8: memory usage does not change at all, and there are no new messages in the log file. I asked the support teams of both providers how to fix this, but the issue is still there. More and more it looks like a bug in the Jelastic agent or in the platform as a whole.
Has anybody faced the same issue already? Are there any known ways to fix it? Maybe I need to use different jar files?
Any ideas are welcome, and of course I also want to ask the Jelastic team about this problem.
It seems to be a problem with the configuration-file parsing for variables.
As a workaround:
log into your Tomcat node via SSH
navigate to /opt/repo/versions/8.5.3/bin/
edit the variablesparser.sh file and change the third line from
CONFFILE='/opt/repo/versions/${Version}/conf/variables.conf'
to
CONFFILE="/opt/repo/versions/${Version}/conf/variables.conf"
(in single quotes the shell does not expand ${Version}, so the script was reading a literal, non-existent path and never picked up variables.conf)
restart Tomcat

Proton CEP: 100% CPU usage after a few hours

I have a Proton CEP instance deployed on my own server with 2 CPUs and 4GB RAM.
After leaving it running overnight, CPU usage increases heavily, up to 100% on each core. The command being executed is:
java -Djava.security.egd=file:/dev/./urandom -Djava.awt.headless=true -Xmx512m -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC -classpath /usr/share/tomcat/bin/bootstrap.jar:/usr/share/tomcat/bin/tomcat-juli.jar:/usr/share/java/commons-daemon.jar -Dcatalina.base=/usr/share/tomcat -Dcatalina.home=/usr/share/tomcat -Djava.endorsed.dirs= -Djava.io.tmpdir=/var/cache/tomcat/temp -Djava.util.logging.config.file=/usr/share/tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager org.apache.catalina.startup.Bootstrap start
Looking at the logs, I also see:
Feb 17, 2016 10:00:00 AM com.ibm.hrl.proton.server.executorServices.SimpleThreadFactory$ProtonExceptionHandler uncaughtException
SEVERE: Uncaught exception in thread: Thread[31516,5,main],exception: Java heap space
Feb 17, 2016 10:02:31 AM com.ibm.hrl.proton.server.executorServices.SimpleThreadFactory$ProtonExceptionHandler uncaughtException
SEVERE: Uncaught exception in thread: Thread[31643,5,main],exception: Java heap space
Although, from htop, it seems that up to 2 GB of RAM is still free.
Is this normal?
The same server is also running Orion, but that one is not experiencing issues.
The heap-space errors despite free system RAM make sense because the process is started with -Xmx512m, which caps the Java heap well below the machine's memory. The actual problem was that, due to a bug in our code, we were sending ever-larger requests with data to Orion, which forwarded them to Proton. Eventually the requests grew from 100 B to over 50 MB, causing Proton to stall as it was unable to process all the data in time.

Spring : Illegal access: this web application instance has been stopped already

I am working on a Spring-MVC application in which I compute statistics every night. The problem is, yesterday's computation failed, and I have this error and an hs_err_something.log file. The file basically says 'out of memory', but our servers have 32 GB of RAM and quite a lot of disk space too. Also, the server is fairly idle at night. Why am I getting this error? I will post the relevant code.
StatisticsServiceImpl:
@Override
@Scheduled(cron = "0 2 2 * * ?")
public void computeStatisticsForAllUsers() {
    // One of the counts computed per user as part of the statistics
    // ("person" comes from the surrounding loop over all users):
    int groupNotesCount = this.groupNotesService.getNoteCountForUser(person.getUsername());
}
GroupNotesDAOImpl:
@Override
public int getNoteCountForUser(String noteCreatorEmail) {
    Session session = this.sessionFactory.getCurrentSession();
    Query query = session.createQuery("select count(*) from GroupNotes as gn where gn.noteCreatorEmail=:noteCreatorEmail");
    query.setParameter("noteCreatorEmail", noteCreatorEmail);
    // count(*) comes back as a Long; avoid the Integer/String round-trip.
    return ((Number) query.uniqueResult()).intValue();
}
Error log:
Aug 05, 2015 2:02:02 AM org.apache.catalina.loader.WebappClassLoader loadClass
INFO: Illegal access: this web application instance has been stopped already. Could not load gn. The eventual following stack trace is caused by an error thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access, and has no functional impact.
java.lang.IllegalStateException
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1612)
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1571)
at com.journaldev.spring.dao.GroupNotesDAOImpl.getNoteCountForUser(GroupNotesDAOImpl.java:359)
hs_err.log file:
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 741867520 bytes for committing reserved memory.
# Possible reasons:
# The system is out of physical RAM or swap space
# In 32 bit mode, the process size limit was hit
# Possible solutions:
# Reduce memory load on the system
# Increase physical memory or swap space
# Check if swap backing store is full
# Use 64 bit Java on a 64 bit OS
# Decrease Java heap size (-Xmx/-Xms)
# Decrease number of Java threads
# Decrease Java thread stack sizes (-Xss)
# Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
# Out of Memory Error (os_linux.cpp:2673), pid=20080, tid=140319513569024
#
# JRE version: Java(TM) SE Runtime Environment (8.0_45-b14) (build 1.8.0_45-b14)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.45-b02 mixed mode linux-amd64 compressed oops)
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
What should I do? Any help would be nice. Thanks a lot.
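Since the hs_err file shows a failed native mmap of roughly 740 MB rather than an ordinary OutOfMemoryError on the heap, it is worth logging what the JVM has actually committed when the nightly job kicks off and comparing that with any per-process limits (ulimit -v, container limits) on the server. A minimal sketch using only standard JMX beans; calling it at the top of computeStatisticsForAllUsers() is my suggestion, not part of the original code:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

public final class JvmMemoryLogger {
    private JvmMemoryLogger() {}

    public static void logSnapshot() {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        MemoryUsage nonHeap = mem.getNonHeapMemoryUsage();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        // The hs_err file failed to map ~740 MB, so watch how close
        // committed memory gets to any per-process limit at job start.
        System.out.printf("heap used/committed/max: %d/%d/%d MB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
        System.out.printf("non-heap used/committed: %d/%d MB%n",
                nonHeap.getUsed() >> 20, nonHeap.getCommitted() >> 20);
        System.out.printf("live threads: %d (each one costs a stack, see -Xss)%n",
                threads.getThreadCount());
    }
}

If committed memory plus thread stacks approaches a ulimit or cgroup cap when the job runs, that would explain the mmap failure despite 32 GB of physical RAM.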

Tomcat7 starts too late on Ubuntu 14.04 x64 [Digitalocean]

I am using DigitalOcean and trying to install and start Tomcat on Ubuntu, but unfortunately I cannot get it working (I created new droplets and tried 10 times).
1 GB RAM, 30 GB SSD disk, Amsterdam 2, Ubuntu 14.04 x64
When I start Tomcat, it says "Tomcat started", but I cannot access the page from a browser, and ./shutdown.sh returns an error.
What could be the problem?
I noticed something just now: while I was writing this question, the Tomcat page was displayed. It took 28 minutes to display the page.
catalina.out says: INFO: Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [1,718,769] milliseconds.
Here are my installation steps (these steps work on a different VPS but not on DigitalOcean droplets):
Install oracle jdk
sudo apt-get install python-software-properties
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer
sudo apt-get install oracle-java7-set-default
java -version
java version "1.7.0_72"
Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.72-b04, mixed mode)
Set java path
sudo nano /etc/environment
JAVA_HOME="/usr/lib/jvm/java-7-oracle"
source /etc/environment
wget http://ftp.itu.edu.tr/Mirror/Apache/tomcat/tomcat-7/v7.0.56/bin/apache-tomcat-7.0.56.tar.gz
tar xvzf apache-tomcat-7.0.56.tar.gz
mv apache-tomcat-7.0.56/ apache-tomcat-7.0.56-server-1/
Start Tomcat
./startup.sh
Using CATALINA_BASE: /usr/local/apache-tomcat-7.0.56-server-1
Using CATALINA_HOME: /usr/local/apache-tomcat-7.0.56-server-1
Using CATALINA_TMPDIR: /usr/local/apache-tomcat-7.0.56-server-1/temp
Using JRE_HOME: /usr/lib/jvm/java-7-oracle/jre
Using CLASSPATH: /usr/local/apache-tomcat-7.0.56-server-1/bin/bootstrap.jar:/usr/local/apache-tomcat-7.0.56-server-1/bin/tomcat-juli.jar
Tomcat started.
Check port 8080
netstat -ln
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp6 0 0 :::8009 :::* LISTEN
tcp6 0 0 :::8080 :::* LISTEN
tcp6 0 0 :::22 :::* LISTEN
Check the process
ps -ef | grep tomcat
root 2825 1 1 14:23 pts/0 00:00:03 /usr/lib/jvm/java-7-oracle/jre/bin/java -Djava.util.logging.config.file=/usr/local/apache-tomcat-7.0.56-server-1/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.endorsed.dirs=/usr/local/apache-tomcat-7.0.56-server-1/endorsed -classpath /usr/local/apache-tomcat-7.0.56-server-1/bin/bootstrap.jar:/usr/local/apache-tomcat-7.0.56-server-1/bin/tomcat-juli.jar -Dcatalina.base=/usr/local/apache-tomcat-7.0.56-server-1 -Dcatalina.home=/usr/local/apache-tomcat-7.0.56-server-1 -Djava.io.tmpdir=/usr/local/apache-tomcat-7.0.56-server-1/temp org.apache.catalina.startup.Bootstrap start
Open the web site at port 8080: http://5.101.107.56:8080/ The page keeps loading... [content is displayed after 28 minutes or more]
Try to shut down Tomcat while the content is not displayed yet (before Tomcat starts properly):
./shutdown.sh
SEVERE: Could not contact localhost:8005. Tomcat may not be running.
Oct 17, 2014 2:40:29 PM org.apache.catalina.startup.Catalina stopServer
SEVERE: Catalina.stop:
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSoc
Check the logs
catalina.out
Oct 17, 2014 2:31:47 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["ajp-bio-8009"]
Oct 17, 2014 2:31:47 PM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 1492 ms
Oct 17, 2014 2:31:47 PM org.apache.catalina.core.StandardService startInternal
INFO: Starting service Catalina
Oct 17, 2014 2:31:47 PM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.56
Oct 17, 2014 2:31:47 PM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /usr/local/apache-tomcat-7.0.56-server-1/webapps/host-manager
I also installed nginx; navigating to http://5.XXX.XXX.XX/ opens the nginx welcome page immediately.
I checked catalina.out when I finally saw the page in the browser; it says:
Oct 17, 2014 2:31:47 PM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /usr/local/apache-tomcat-7.0.56-server-1/webapps/host-manager
Oct 17, 2014 3:00:27 PM org.apache.catalina.util.SessionIdGenerator createSecureRandom
INFO: Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [1,718,769] milliseconds.
Memory:
total used free shared buffers cached
Mem: 1017912 849512 168400 332 18780 688468
Replacing securerandom.source=file:/dev/urandom with securerandom.source=file:/dev/./urandom in $JAVA_PATH/jre/lib/security/java.security has solved my problem.
Even when file:/dev/urandom is specified, the JRE will still use /dev/random for SHA1PRNG (see bug JDK-4705093):
In SHA1PRNG, there is a SeedGenerator which does various things
depending on the configuration.
If java.security.egd or securerandom.source point to "file:/dev/random" or "file:/dev/urandom", we will use
NativeSeedGenerator, which calls super() which calls
SeedGenerator.URLSeedGenerator(/dev/random). (A nested class within
SeedGenerator.) The only things that changed in this bug was that
urandom will also trigger use of this code path.
If those properties point to another URL that exists, we'll initialize SeedGenerator.URLSeedGenerator(url). This is why
"file:///dev/urandom", "file:/./dev/random", etc. will work.
From Wikipedia on /dev/random:
In this implementation, the generator keeps an estimate of the number
of bits of noise in the entropy pool. From this entropy pool random
numbers are created. When read, the /dev/random device will only
return random bytes within the estimated number of bits of noise in
the entropy pool. /dev/random should be suitable for uses that need
very high quality randomness such as one-time pad or key generation.
When the entropy pool is empty, reads from /dev/random will block
until additional environmental noise is gathered. The intent is to
serve as a cryptographically secure pseudorandom number generator,
delivering output with entropy as large as possible. This is suggested
for use in generating cryptographic keys for high-value or long-term
protection.
Environmental noise?
The random number generator gathers environmental noise from device
drivers and other sources into an entropy pool. The generator also
keeps an estimate of the number of bits of noise in the entropy pool.
From this entropy pool random numbers are created.
In practice, that means Tomcat can block for an unknown amount of time.
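A quick way to see this locally is to time the first use of a SHA1PRNG instance. This is just a sketch (the class name is mine); run it once as-is and once with -Djava.security.egd=file:/dev/./urandom to compare:

import java.security.SecureRandom;

public class SeedTimer {
    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        // SHA1PRNG seeds itself lazily; the first nextBytes() call
        // is what blocks when the seed is read from /dev/random.
        SecureRandom rng = SecureRandom.getInstance("SHA1PRNG");
        rng.nextBytes(new byte[16]);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("First random bytes took " + elapsedMs + " ms");
    }
}

On an entropy-starved droplet the run with the egd override should finish in milliseconds, while the plain run may hang for minutes, mirroring the 28-minute startup seen above.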
This also works:
Actually, by setting the following in /etc/default/tomcat7, I was fine:
JAVA_OPTS="-Djava.security.egd=file:/dev/./urandom -Djava.awt.headless=true -Xmx1024m -XX:MaxPermSize=512m -XX:+UseConcMarkSweepGC"
Comment from https://www.digitalocean.com/community/tutorials/how-to-install-apache-tomcat-7-on-ubuntu-14-04-via-apt-get :
While using /dev/urandom as the source for entropy is a workaround that reduces the startup time for Tomcat, it is not a good idea because it can have unintended side effects.
Other components running in the Tomcat server (e.g. web applications) might depend on a securely initialized SecureRandom instance and there might be security issues when the entropy for the random numbers is not sufficient.
Actually, this is one of the reasons why using /dev/urandom does not work, but /dev/./urandom does. The SHA1PRNG heavily relies on a good seed. If the seed is not good, the random numbers are predictable. Therefore, the developer ensured that for this purpose /dev/random is used as the source of entropy, even if the JVM is configured to use /dev/urandom. There are two bug reports about this (bug 1, bug 2).
So instead of changing the entropy source to /dev/urandom, one should rather make sure that /dev/random has enough entropy. If the system has a hardware RNG, installing rng-tools should do the trick. Otherwise, installing haveged provides a very good source of entropy that does not rely on a special hardware RNG being present. In a virtual machine, rng-tools can use entropy from the host through a virtual hardware RNG. As an alternative, EGD could be used, but at the moment this software is not included in the Ubuntu repositories, so it is cumbersome to use.

neo4j failing to start after disk full

The /var disk filled up on my Debian server and Neo4j stopped, as expected. I freed up space on the disk, but the Neo4j server does not start, throwing the error given below in the logs. There is no Java or Neo4j process running on the server that I could kill.
Things were stable with my Neo4j setup for the past 6 months, with around 1000 nodes. I am a bit of a novice on the Java side, so please let me know if I have missed anything basic.
Command-line output of service neo4j-service restart:
Restarting Neo4j Graph Database: neo4j
WARNING: Max 1024 open files allowed, minimum of 40 000 recommended. See the Neo4j manual.
WARNING! You are using an unsupported Java runtime.
* Please use Oracle(R) Java(TM) 7 to run Neo4j Server. Download "Java Platform (JDK) 7" from: http://www.oracle.com/technetwork/java/javase/downloads/index.html
* Please see http://docs.neo4j.org/ for Neo4j Server installation instructions.
Using additional JVM arguments: -server -XX:+DisableExplicitGC -Dorg.neo4j.server.properties=conf/neo4j-server.properties -Djava.util.logging.config.file=conf/logging.properties -Dlog4j.configuration=file:conf/log4j.properties -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled
Starting Neo4j Server... WARNING: not changing user
process [4733]... waiting for server to be ready..... Failed to start within 120 seconds. Neo4j Server may have failed to start, please check the logs. failed!
Logfile:
Sep 26, 2014 3:31:38 PM org.neo4j.server.logging.Logger log
SEVERE: Failed to start Neo Server on port [7474]
Sep 26, 2014 3:33:00 PM org.neo4j.server.logging.Logger log
WARNING: You are using an unsupported Java runtime. Please use Oracle(R) Java(TM) Runtime Environment 7.
Sep 26, 2014 3:33:00 PM org.neo4j.server.logging.Logger log
INFO: Setting startup timeout to: 120000ms based on -1
Sep 26, 2014 3:33:03 PM org.neo4j.server.logging.Logger log
SEVERE:
org.neo4j.server.ServerStartupException: Starting Neo4j Server failed: Error starting org.neo4j.kernel.EmbeddedGraphDatabase, /var/lib/neo4j/data/graph.db
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:218)
at org.neo4j.server.Bootstrapper.start(Bootstrapper.java:87)
at org.neo4j.server.Bootstrapper.main(Bootstrapper.java:50)
Caused by: java.lang.RuntimeException: Error starting org.neo4j.kernel.EmbeddedGraphDatabase, /var/lib/neo4j/data/graph.db
at org.neo4j.kernel.InternalAbstractGraphDatabase.run(InternalAbstractGraphDatabase.java:330)
at org.neo4j.kernel.EmbeddedGraphDatabase.<init>(EmbeddedGraphDatabase.java:63)
at org.neo4j.graphdb.factory.GraphDatabaseFactory$1.newDatabase(GraphDatabaseFactory.java:92)
at org.neo4j.graphdb.factory.GraphDatabaseBuilder.newGraphDatabase(GraphDatabaseBuilder.java:198)
at org.neo4j.kernel.impl.recovery.StoreRecoverer.recover(StoreRecoverer.java:115)
at org.neo4j.server.preflight.PerformRecoveryIfNecessary.run(PerformRecoveryIfNecessary.java:59)
at org.neo4j.server.preflight.PreFlightTasks.run(PreFlightTasks.java:70)
at org.neo4j.server.AbstractNeoServer.runPreflightTasks(AbstractNeoServer.java:333)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:152)
... 2 more
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.impl.transaction.TxManager#61615142' was successfully initialized, but failed to start. Please see attached cause exception.
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:509)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:115)
at org.neo4j.kernel.InternalAbstractGraphDatabase.run(InternalAbstractGraphDatabase.java:307)
... 10 more
Caused by: org.neo4j.graphdb.TransactionFailureException: Unable to start TM
at org.neo4j.kernel.impl.transaction.TxManager.openLog(TxManager.java:824)
at org.neo4j.kernel.impl.transaction.TxManager.start(TxManager.java:198)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:503)
... 12 more
Caused by: java.io.IOException: Branch[����] found for [GlobalId[NENEOK|5930761323375553953|5569093129887285248|13248], BranchId[ ]] but no record list found in map
at org.neo4j.kernel.impl.transaction.TxLog.readBranchAddRecordInto(TxLog.java:520)
at org.neo4j.kernel.impl.transaction.TxLog.getDanglingRecords(TxLog.java:440)
at org.neo4j.kernel.impl.transaction.TxLog.recreateActiveTransactionState(TxLog.java:133)
at org.neo4j.kernel.impl.transaction.TxLog.<init>(TxLog.java:128)
at org.neo4j.kernel.impl.transaction.TxManager.openLog(TxManager.java:796)
... 14 more
Sep 26, 2014 3:33:03 PM org.neo4j.server.logging.Logger log
SEVERE: Failed to start Neo Server on port [7474]
I am using an OpenJDK 1.7 runtime environment, which I think I should upgrade, but I don't understand what the cause of the error is, since everything was working fine before. Thanks for any help!
You should never ever manually modify anything inside the graph.db directory unless you're 100% sure what you're doing.
To prevent the datastore directory from continuously growing, check your setting for keep_logical_logs in neo4j.properties; see http://docs.neo4j.org/chunked/stable/configuration-logical-logs.html.
Unless you require logical logs for online backup or cluster synchronization, you might be safe deleting the nioneo_logical.log.v* files. Make sure to have a backup first!
Also check your settings for open file limits, http://docs.neo4j.org/chunked/stable/linux-performance-guide.html#_setting_the_number_of_open_files.
There used to be an outdated version of Neo4j with a bug that could corrupt your datastore when running out of disk space. If that has happened to you, you either need to fix it manually at the binary level (which requires a lot of knowledge of Neo4j internals) or restore a backup from a time before you ran out of disk.
