After updating Jenkins to version 2.156 (from version 1.6), some of our build jobs get stuck after completing and before moving on to the post-build actions. The job itself finishes within 5 minutes (same as before), then hangs for 5-10 minutes before moving on.
I managed to narrow it down to this:
"Executor #10 for master : executing 03_masa #4390" Id=34464 Group=main TIMED_WAITING
at java.lang.Thread.sleep(Native Method)
at hudson.util.ProcessTree$WindowsOSProcess.killSoftly(ProcessTree.java:560)
at hudson.util.ProcessTree$WindowsOSProcess.killRecursively(ProcessTree.java:520)
at hudson.util.ProcessTree$Windows.killAll(ProcessTree.java:666)
at hudson.Launcher$LocalLauncher.kill(Launcher.java:955)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:510)
at hudson.model.Run.execute(Run.java:1810)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:429)
Referenced code can be found here (present since version 2.141).
threadDump #1, threadDump #2
Can we do something about it?
Jenkins 2.141 introduced a 2-minute wait on process termination (it seems the wait is multiplied by the number of processes created during your build):
https://github.com/jenkinsci/jenkins/commit/d8eac92ee9a1c19bf145763589f1c152607bf3ed
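To illustrate why the delay grows with the number of processes, here is a minimal Python sketch of a soft kill that polls until a deadline. It is not Jenkins' actual code: the function names, the back-off values and the fake clock are assumptions made for illustration, and only the 2-minute default comes from the commit linked above.

```python
class FakeClock:
    """Simulated clock so the example finishes instantly."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def wait_for_exit(is_running, timeout_s, clock):
    """Poll until the process exits or timeout_s elapses; return True if it exited."""
    deadline = clock.time() + timeout_s
    delay = 0.01  # assumed back-off: start small, cap at 1 second
    while is_running() and clock.time() < deadline:
        clock.sleep(delay)
        delay = min(delay * 2, 1.0)
    return not is_running()


def total_wait(process_count, timeout_s):
    """Simulated time spent on processes that never react to the soft kill."""
    clock = FakeClock()
    for _ in range(process_count):
        wait_for_exit(lambda: True, timeout_s, clock)
    return clock.time()


print(total_wait(3, 120))  # roughly 3 x 120 seconds
print(total_wait(3, 0))    # a timeout of 0 means no waiting
```

With three unresponsive processes the simulated wait is about six minutes, which is in the range of the 5-10 minute hang described in the question.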
I'm unsure why killSoftly does not work, but you can configure the timeout.
In your jenkins.xml, add this to the /service/arguments element (before the -jar), like so:
-DSoftKillWaitSeconds=0
After doing so and restarting Jenkins, you should be able to find the SoftKillWaitSeconds setting under /systemInfo, and your build time should be back to normal.
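As a rough illustration of where the flag goes, the Python sketch below inserts it before -jar in the arguments element. The sample XML is a made-up, trimmed-down jenkins.xml (your real file will have different paths and options), and add_jvm_flag is a hypothetical helper, not part of Jenkins.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down jenkins.xml; a real file has more elements and other values.
SAMPLE_JENKINS_XML = r"""<service>
  <id>jenkins</id>
  <executable>%BASE%\jre\bin\java</executable>
  <arguments>-Xrs -Xmx256m -jar "%BASE%\jenkins.war" --httpPort=8080</arguments>
</service>"""


def add_jvm_flag(xml_text, flag):
    """Return xml_text with flag placed before -jar in /service/arguments."""
    root = ET.fromstring(xml_text)
    arguments = root.find("arguments")
    if arguments is None or not arguments.text:
        raise ValueError("no <arguments> element found")
    if flag in arguments.text.split():
        return ET.tostring(root, encoding="unicode")  # flag already present
    # JVM options must precede -jar; anything after the jar is passed to Jenkins itself.
    new_text, count = re.subn(
        r"(?<!\S)-jar(?=\s)", lambda m: f"{flag} -jar", arguments.text, count=1
    )
    if count == 0:
        raise ValueError("no -jar option found in <arguments>")
    arguments.text = new_text
    return ET.tostring(root, encoding="unicode")


patched = add_jvm_flag(SAMPLE_JENKINS_XML, "-DSoftKillWaitSeconds=0")
print(patched)
```

Editing the file by hand works just as well; the point is only that the -D option has to come before -jar so the JVM, not Jenkins, receives it.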
I am writing to ask about a problem I encountered while learning BEAM, which is used for transportation simulation. First, I installed and verified Java 1.8, Gradle 7.5.1, and Git LFS 2.3.4 according to this guide (https://beam.readthedocs.io/en/latest/users.html). After that, I cloned the BEAM repository from GitHub and followed the subsequent steps. However, when I tried to run the beamville scenario, the build failed with this error: Process 'command 'D:\java\java_1.8\bin\java.exe'' finished with non-zero exit value 2.
I have tried my best to resolve this problem, but nothing has worked. Could you give me some advice?
Yesterday, all of a sudden, my projects on a Windows 10 machine stopped running in parallel due to file lock timeouts.
All my projects use the Gradle wrapper and provide a run task.
When I start the first run task, it works normally, but any subsequent run task fails with an error like this:
> .\gradlew run
Starting a Gradle Daemon, 1 busy and 4 stopped Daemons could not be reused, use --status for details
FAILURE: Build failed with an exception.
* What went wrong:
Gradle could not start your build.
> Could not create service of type FileAccessTimeJournal using GradleUserHomeScopeServices.createFileAccessTimeJournal().
> Timeout waiting to lock journal cache (C:\Users\injec\.gradle\caches\journal-1). It is currently in use by another Gradle instance.
Owner PID: 16440
Our PID: 12216
Owner Operation:
Our operation:
Lock file: C:\Users\injec\.gradle\caches\journal-1\journal-1.lock
The --status option shows:
> .\gradlew --status
PID STATUS INFO
12216 IDLE 6.9.1
16440 BUSY 6.9.1
14992 STOPPED (stop command received)
7856 STOPPED (other compatible daemons were started and after being idle for 0 minutes and not recently used)
26680 STOPPED (by user or operating system)
18556 STOPPED (by user or operating system)
I tried different tricks, like switching the Gradle version (5.6.1, 6.8.3, 6.9.1) and using the --stop option, but the error remains.
Adding --stacktrace to the run command reveals that not only the journal-1 cache is involved, but also other directories such as modules-2.
I didn't make any changes to my system, apart from regular Win10 updates.
How can the problem be fixed?
TIA
It's likely that a Gradle process exited abnormally and left the lock file behind. Check in Task Manager whether a process with ID 16440 exists, and if not, just remove the orphaned lock file C:\Users\injec\.gradle\caches\journal-1\journal-1.lock
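The check-then-delete step could be scripted roughly like this. It is only a sketch: pid_exists_windows relies on the Windows tasklist command, and remove_orphan_lock is a hypothetical helper, not something Gradle provides.

```python
import subprocess
from pathlib import Path


def pid_exists_windows(pid):
    """Return True if tasklist reports a process with this PID (Windows only)."""
    result = subprocess.run(
        ["tasklist", "/FI", f"PID eq {pid}", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=False,
    )
    # In CSV output the PID is a quoted field on the matching row.
    return f'"{pid}"' in result.stdout


def remove_orphan_lock(lock_file, owner_pid, pid_exists=pid_exists_windows):
    """Delete lock_file only if its owner process is gone; return True if deleted."""
    lock_path = Path(lock_file)
    if pid_exists(owner_pid):
        return False  # the owner is still running, so the lock is not orphaned
    if lock_path.exists():
        lock_path.unlink()
        return True
    return False


# Example call on Windows, using the PID and path from the question:
# remove_orphan_lock(r"C:\Users\injec\.gradle\caches\journal-1\journal-1.lock", 16440)
```

Note that in the --status output from the question, PID 16440 is listed as BUSY, so in that situation the helper would leave the lock file in place.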
This may be caused by the file-system permissions of C:\Users\injec\.gradle... You may also have overlooked one detail: you're calling .\gradlew instead of ./gradlew or gradlew.bat, which suggests that you are running in PowerShell or WSL rather than CMD. gradlew.bat run would run directly in CMD.
Check which file system the .gradle directory is on. Gradle does not work well on non-native file systems: https://github.com/gradle/gradle/issues/15881
File system watching supports the following file system types:
APFS
btrfs
ext3
ext4
XFS
HFS+
NTFS
Gradle also supports VirtualBox’s shared folders.
Network file systems like Samba and NFS are not supported.
Symlinks
File system watching is not compatible with symlinks. If your project files include symlinks, symlinked files do not benefit from file system watching optimizations.
Alternatively, you can disable file system watching for the build: https://docs.gradle.org/current/userguide/file_system_watching.html#disable
Gradle maintains a Virtual File System (VFS) to calculate what needs to be rebuilt on repeat builds of a project. By watching the file system, Gradle keeps the VFS current between builds.
This is a very strange issue I am facing. My Selenium scripts work completely fine when executed on my local system. I am also using Selenium Grid for remote execution. When I try to run the same scripts on my Jenkins, which is installed on a VM, my tests fail. I have set my implicit timeout to 20 seconds and the page load timeout to 100 seconds, but I still have the issue.
The error log on Jenkins for one of my failed tests is:
Element not found exception.
We have Jenkins set up to run jobs for a project our team is currently working on, but the jobs keep crashing with an OutOfMemory error.
The Jenkins environment is running on a virtual machine. The machine it is on has fairly decent specs and doesn't have too many VMs on it. Our SBT jobs run in a separate jobs list which has 8GB of RAM available.
Project build.properties sbt.version=0.13.9
Jenkins ver. 2.6
We are executing the following command for the job:
/usr/java/default/bin/java -Xmx2G -XX:+CMSClassUnloadingEnabled -XX:MaxMetaspaceSize=2G -Dsbt.override.build.repos=true -Dsbt.log.noformat=true -jar /usr/local/sbt/default/bin/sbt-launch.jar compile test:compile test universal:publish
Which produces the following throughout the log:
Exception in thread "Thread-40" java.io.EOFException
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2626)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1321)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
at org.scalatest.tools.Framework$ScalaTestRunner$Skeleton$1$React.react(Framework.scala:945)
at org.scalatest.tools.Framework$ScalaTestRunner$Skeleton$1.run(Framework.scala:934)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "Thread-29" java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:209)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.net.SocketInputStream.read(SocketInputStream.java:223)
at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2321)
at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2614)
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2624)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1321)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
at sbt.React.react(ForkTests.scala:114)
at sbt.ForkTests$$anonfun$mainTestTask$1$Acceptor$2$.run(ForkTests.scala:74)
at java.lang.Thread.run(Thread.java:745)
The dump file the job produces is here (pastebin.com/EM3qva5C).
We have tried different variations of the Java args, but all of them lead to the same result, so we are wondering whether something else is wrong and what we need to change to prevent the builds from failing.
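Your tests run in a forked JVM (note sbt.ForkTests in the stack trace), so the -Xmx2G passed to the sbt launcher does not apply to them; you have to give the forked JVM more memory separately.
Add the following line to build.sbt: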
javaOptions ++= Seq("-Xmx1G")
So, finally trying to come up from the stone age, I upgraded from 1.514 to 1.644 without realizing all slaves need to be running Java 1.7 as well. So I installed Java 7 on my master, swapped .war files to run 1.644 and started it up. The slaves didn't come up due to the aforementioned Java requirement. After stopping Jenkins and removing Java 1.7, I swapped back to the 1.514 .war and started Jenkins back up. Now my build history is gone from all jobs, with this error in the log:
WARNING: could not load /var/lib/jenkins/jobs/[job name removed]/builds/312
hudson.util.IOException2: Invalid directory name /var/lib/jenkins/jobs/YYMM Check and Build/builds/312
at hudson.model.Run.parseTimestampFromBuildDir(Run.java:354)
...
Caused by: java.text.ParseException: Unparseable date: "312"
at java.text.DateFormat.parse(DateFormat.java:354)
at hudson.model.Run.parseTimestampFromBuildDir(Run.java:352)
... 155 more
The only things I can find online relate to issues that were fixed pre-1.514. Anyone have any ideas? Thanks for helping.
Installed the latest version that works with Java 1.6: 1.607, and that fixed the issue as soon as it started up.
Your issues are likely related to the change in build directory naming; see JENKINS-24380 Migration.
In case you want to downgrade, there is an “unmigrate” script provided to reverse the migration of $JENKINS_HOME. To do this:
Start Jenkins ≥1.597.
Visit http://server/jenkins/JENKINS-24380/ and copy the unmigration instruction.
Shut down Jenkins completely.
Run the command as instructed by the step above.
Start Jenkins <1.597 with the same $JENKINS_HOME.
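For anyone wondering why the older release rejects the directory: the stack trace shows Run.parseTimestampFromBuildDir trying to parse the directory name "312" as a date. The Python sketch below reproduces the idea. The yyyy-MM-dd_HH-mm-ss pattern is my assumption about the pre-migration naming, and parse_build_dir is a made-up name, not a Jenkins function.

```python
from datetime import datetime

# Assumed pre-migration directory naming (timestamp-based), e.g. 2014-08-12_10-15-30.
OLD_BUILD_DIR_FORMAT = "%Y-%m-%d_%H-%M-%S"


def parse_build_dir(name):
    """Return the timestamp encoded in an old-style build directory name, or None."""
    try:
        return datetime.strptime(name, OLD_BUILD_DIR_FORMAT)
    except ValueError:
        return None  # analogous to the ParseException in the log


print(parse_build_dir("2014-08-12_10-15-30"))  # parses as a timestamp
print(parse_build_dir("312"))                  # number-named directory: not a date
```

A directory that the newer release has renamed to a plain build number cannot be read as a timestamp, which is why the old version reports every build as unloadable until the directories are unmigrated.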