Ehcache + Hibernate Log Output - java

I am receiving Ehcache logging output when using it as a Hibernate second-level cache. I do not understand the output or what it might mean, and it is being printed to the logs a lot:
DEBUG [net.sf.ehcache.store.disk.Segment] put added 0 on heap
DEBUG [net.sf.ehcache.store.disk.Segment] put updated, deleted 0 on heap
Could anyone shed some light on what this might mean? My second-level cache appears to be working, according to a print of the statistics:
INFO [com.atlaschase.falcon.commands.domain.AircraftCommandResolutionService] [ name = aircraftCache cacheHits = 824 onDiskHits = 0 offHeapHits = 0 inMemoryHits = 824 misses = 182 onDiskMisses = 182 offHeapMisses = 0 inMemoryMisses = 182 size = 91 averageGetTime = 1.0745527 evictionCount = 0 ]
Any help would be appreciated ..
Simon

This output is generated by the DiskStore, which IIRC is enabled by default in Ehcache. Basically, Ehcache overflows cached data from memory to disk. If you want to disable this functionality, set the overflowToDisk property to false:
<cache name="..." overflowToDisk="false" ... />
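If you configure caches programmatically rather than in ehcache.xml, the same setting can be applied through CacheConfiguration. A minimal sketch, assuming the Ehcache 2.x API (the cache name and size are illustrative placeholders):

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.config.CacheConfiguration;

// Illustrative values: "aircraftCache" and 10000 are placeholders
Cache cache = new Cache(
    new CacheConfiguration("aircraftCache", 10000) // name, maxElementsInMemory
        .overflowToDisk(false));                   // keep entries on heap only
CacheManager.getInstance().addCache(cache);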
Oh - can someone also confirm that the 'averageGetTime' is in milliseconds and not seconds?
Confirmed, milliseconds. Although the JavaDoc of Statistics.getAverageGetTime() is slightly confusing:
[...] Because ehcache support JDK1.4.2, each get time uses System.currentTimeMilis, rather than nanoseconds. The accuracy is thus limited.
I found the following code in LiveCacheStatisticsImpl:
public float getAverageGetTimeMillis() {
    // ...
    return (float) totalGetTimeTakenMillis.get() / hitCount;
}
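So the value is the total get time in milliseconds divided by the hit count. For reference, a minimal sketch of reading it at runtime, assuming the same Ehcache 2.x Statistics API quoted above:

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Statistics;

Cache cache = CacheManager.getInstance().getCache("aircraftCache");
Statistics stats = cache.getStatistics();
// Milliseconds, per the implementation above
System.out.printf("averageGetTime = %.2f ms%n", stats.getAverageGetTime());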

Related

Read tomcat jmxproxy bean

I spent a lot of time looking through the documentation to read memory data using a specific URL under webserver/manager/jmxproxy/.
In the documentation I see something like this:
http://webserver/manager/jmxproxy/?get=BEANNAME&att=MYATTRIBUTE&key=MYKEY
but when I try to fetch data from this URL I get Error - javax.management.InstanceNotFoundException
...
Name: java.lang:name=G1 Old Gen,type=MemoryPool
modelerType: sun.management.MemoryPoolImpl
Name: G1 Old Gen
Type: HEAP
Valid: true
Usage: {committed=3175088128, init=8136949760, max=8589934592, used=477330968}
PeakUsage: {committed=8136949760, init=8136949760, max=8589934592, used=478437064}
MemoryManagerNames: Array[java.lang.String] of length 2
G1 Old Generation
G1 Young Generation
UsageThreshold: 0
UsageThresholdExceeded: false
UsageThresholdCount: 0
UsageThresholdSupported: true
CollectionUsageThreshold: 0
CollectionUsageThresholdExceeded: false
CollectionUsageThresholdCount: 0
CollectionUsage: {committed=0, init=8136949760, max=8589934592, used=0}
CollectionUsageThresholdSupported: true
ObjectName: java.lang:type=MemoryPool,name=G1 Old Gen
...
I want to fetch the used heap space by URL; can someone help me?
I tried the link I found in the documentation and it returned good data, but I care about the G1 heap:
http://webserver:8080/manager/jmxproxy/?get=java.lang:type=Memory&att=HeapMemoryUsage&key=used
Result:
OK - Attribute get 'java.lang:type=Memory' - HeapMemoryUsage - key 'used' = 2004343816
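The G1 Old Gen pool can presumably be queried the same way, using the full ObjectName from the dump above (java.lang:type=MemoryPool,name=G1 Old Gen) with the space URL-encoded; I have not verified this exact query, so treat it as an assumption. A small Java sketch of fetching it (the host, port, and missing authentication handling are illustrative):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

public class JmxProxyClient {
    public static void main(String[] args) throws Exception {
        // ObjectName taken from the bean dump above; the space must be encoded
        String bean = URLEncoder.encode("java.lang:type=MemoryPool,name=G1 Old Gen", "UTF-8");
        URL url = new URL("http://webserver:8080/manager/jmxproxy/?get=" + bean
                + "&att=Usage&key=used");
        // Note: the manager app normally requires authentication (manager-jmx role)
        try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // e.g. OK - Attribute get ... key 'used' = 477330968
            }
        }
    }
}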

How can I get the CPU usage of a process with "tasklist" in Windows

I am writing a program in Java to periodically display the CPU and memory usage of a given process ID. My implementation invokes tasklist. It is pretty straightforward to get the memory usage by the following command:
tasklist /fi "memusage ge 0" /fi "pid eq 2076" /v
This will return the memory usage of process ID 2076, and I can use it for my task. By invoking the following command, I can extract the CPU time:
tasklist /fi "pid eq 2076" /fi "CPUTIME ge 00:00:00" /v
My question is, how would I go about getting the CPU usage of this process?
I found a post on Stack Overflow for my question, but the answer isn't clear and I don't understand what to type in the command to get what I need. The question was answered in 2008, and someone asked for clarification in 2013, but the person who answered hasn't replied.
Here is the post I found.
Memory is like a tea cup: it may be full or empty, and an instantaneous look at the cup lets you see how full of tea it is (that is your "memusage" command).
CPU is like a ski lift. It moves at a reasonably constant rate irrespective of whether you are riding it or not. It is not possible to determine your usage from a single instantaneous observation; we need to know how long you were riding it (that is your "cputime" command). You have to use the "cputime" command at least twice!
For example:
At 7:09 pm, you run the cputime command on your process, and it returns "28 minutes"
At 7:17 pm, you run the cputime command on your process again, and it returns "32 minutes"
From the first time you ran the cputime command to the second time, the usage has increased from 28 minutes to 32 minutes -- the process has used 4 minutes of CPU time.
From 7:09pm to 7:17pm is a duration of 8 minutes -- A total of 8 minutes of time were available, but your process just used 4 minutes: 4 / 8 = 50% average system usage.
If your system has multiple processors, then you can divide by the total number of CPUs to get an average per CPU - e.g. 50% / 2 = 25% average in a dual cpu system.
I used minutes above for ease of writing - in reality you may be looking at how many nanoseconds of CPU time the process used during a time window that is just milliseconds long.
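As a rough Java sketch of that two-sample approach (assuming an English-locale Windows tasklist whose /v CSV output has CPU Time as the second-to-last column; the 8-second window and the lack of error handling are illustrative, not production code):

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class CpuSampler {

    // Reads the accumulated CPU time (in seconds) of a PID via tasklist
    static long cpuTimeSeconds(int pid) throws Exception {
        Process p = new ProcessBuilder(
                "tasklist", "/fi", "pid eq " + pid, "/v", "/fo", "csv", "/nh").start();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line = r.readLine(); // assumes the process exists
            String[] cols = line.split("\",\"");
            String[] hms = cols[cols.length - 2].split(":"); // CPU Time, e.g. 0:00:28
            return Long.parseLong(hms[0]) * 3600 + Long.parseLong(hms[1]) * 60 + Long.parseLong(hms[2]);
        }
    }

    public static void main(String[] args) throws Exception {
        int pid = Integer.parseInt(args[0]);
        long cpu1 = cpuTimeSeconds(pid);
        long wall1 = System.nanoTime();
        Thread.sleep(8000); // the sampling window: 8 seconds instead of 8 minutes
        long cpu2 = cpuTimeSeconds(pid);
        long wall2 = System.nanoTime();
        double usage = (cpu2 - cpu1) / ((wall2 - wall1) / 1e9);
        System.out.printf("%.1f%% of one CPU%n", usage * 100);
    }
}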
tasklist does not provide the information you are looking for. I would suggest using PowerShell's Get-Counter. A comment on an answer from the Super User site looks to be on track for what you're after:
Get-Counter '\Process(*)\% Processor Time' |
    Select-Object -ExpandProperty CounterSamples |
    Select-Object -Property InstanceName, CookedValue |
    Where-Object { $_.InstanceName -notmatch "^(idle|_total|system)$" } |
    Sort-Object -Property CookedValue -Descending |
    Select-Object -First 25 |
    Format-Table InstanceName, @{L='CPU'; E={($_.CookedValue/100/$env:NUMBER_OF_PROCESSORS).ToString('P')}} -AutoSize
I once wrote a class:
// Requires: import java.lang.management.ManagementFactory;
private static class PerformanceMonitor {
    private final int availableProcessors = ManagementFactory.getOperatingSystemMXBean().getAvailableProcessors();
    private long lastSystemTime = 0;
    private long lastProcessCpuTime = 0;

    /**
     * Gets the CPU usage of the JVM.
     *
     * @return the CPU usage as a fraction between 0.0 and 1.0
     */
    private synchronized double getCpuUsage() {
        if (lastSystemTime == 0) {
            baselineCounters();
            return 0d;
        }
        long systemTime = System.nanoTime();
        long processCpuTime = 0;
        if (ManagementFactory.getOperatingSystemMXBean() instanceof com.sun.management.OperatingSystemMXBean) {
            processCpuTime = ((com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean()).getProcessCpuTime();
        }
        double cpuUsage = ((double) (processCpuTime - lastProcessCpuTime)) / ((double) (systemTime - lastSystemTime));
        lastSystemTime = systemTime;
        lastProcessCpuTime = processCpuTime;
        return cpuUsage / availableProcessors;
    }

    private void baselineCounters() {
        lastSystemTime = System.nanoTime();
        if (ManagementFactory.getOperatingSystemMXBean() instanceof com.sun.management.OperatingSystemMXBean) {
            lastProcessCpuTime = ((com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean()).getProcessCpuTime();
        }
    }
}
Which is used like:
private static final PerformanceMonitor _MONITOR = new PerformanceMonitor();
_MONITOR.getCpuUsage();
This returns the fraction of CPU consumed by this JVM's process (the first call only establishes a baseline and returns 0).
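To turn that into the periodic display the question asks for, one option is to poll the monitor on a schedule; a small sketch (the one-second period is arbitrary):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Poll once per second; the first call only records a baseline and returns 0
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
scheduler.scheduleAtFixedRate(
        () -> System.out.printf("JVM CPU: %.1f%%%n", _MONITOR.getCpuUsage() * 100),
        0, 1, TimeUnit.SECONDS);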

Graphite/Carbon how to get per-second metrics

I've dockerized Graphite and am working with this library to get metrics from an Apache Storm topology. I'm getting metrics data, but no matter what I do I can only get data points per minute, where I really need them per second.
As per this SO post I've set the retention policy to grab data every second. I've also set
conf.put("topology.builtin.metrics.bucket.size.secs", 1);
and
void initMetrics(TopologyContext context) {
    messageCountMetric = new CountMetric();
    context.registerMetric("digest_count", messageCountMetric, 1);
}
in the class that sets up the topology and in the bolt itself, respectively. To my understanding this should cause metrics to be reported every second. What am I missing here? How can I get metrics reported every second?
Thanks in advance, and happy holidays all!
Update 1
Here is my storage-schemas.conf file:
root@cdd13a16103a:/etc/carbon# cat storage-schemas.conf
# Schema definitions for Whisper files. Entries are scanned in order,
# and first match wins. This file is scanned for changes every 60 seconds.
#
# [name]
# pattern = regex
# retentions = timePerPoint:timeToStore, timePerPoint:timeToStore, ...
# Carbon's internal metrics. This entry should match what is specified in
# CARBON_METRIC_PREFIX and CARBON_METRIC_INTERVAL settings
[carbon]
pattern = ^carbon\.
retentions = 1s:6h,1min:7d,10min:5y
[default_1min_for_1day]
pattern = .*
retentions = 1s:6h,1min:7d,10min:5y
[test]
pattern = ^test.
retentions = 1s:6h,1min:7d,10min:5y
[storm]
pattern = ^storm.
retentions = 1s:6h,1min:7d,10min:5y
Here is my config setup:
Config conf = new Config();
conf.setDebug(false);
conf.put("topology.builtin.metrics.bucket.size.secs", 1);
conf.registerMetricsConsumer(GraphiteMetricsConsumer.class, 4);
conf.put("metrics.reporter.name", "com.verisign.storm.metrics.reporters.graphite.GraphiteReporter");
conf.put("metrics.graphite.host", "127.0.0.1");
conf.put("metrics.graphite.port", "2003");
conf.put("metrics.graphite.prefix", "storm.test");
In order to apply changes in storage-schemas.conf you have to:
restart the carbon daemons
delete the old *.wsp files, or use whisper-resize.py to apply the new scheme to existing ones
restart carbon-cache
make sure that DEFAULT_CACHE_DURATION in the webapp's local_settings.py is set to 1
make sure the nginx/apache2/uwsgi cache is set up correctly as well, if any
There are more whisper-* tools shipped with Graphite. The next one you may be interested in is whisper-info.py:
bash$ whisper-info.py /graphite/whisper/prod/some/metric.wsp
maxRetention: 1296000
xFilesFactor: 0.5
aggregationMethod: average
fileSize: 142600
Archive 0
retention: 691200
secondsPerPoint: 1
points: 11520
size: 138240
offset: 40
Archive 1
retention: 1296000
secondsPerPoint: 3600
points: 360
size: 4320
offset: 138280

Java Out of Memory exception in Ubuntu when using Flume/Hadoop

I'm getting an out-of-memory exception due to lack of Java heap space when I try to download tweets using Flume and pipe them into Hadoop.
I have set the heap space currently to 4GB in the mapred-site.xml of Hadoop, like so:
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx4096m</value>
</property>
I am hoping to download tweets continually for two days but can't get past 45 minutes without errors.
Since I do have the disk space to hold all of this, I am assuming the error is coming from Java having to handle so many things at once. Is there a way for me to slow down the speed at which these tweets are downloaded, or do something else to solve this problem?
Edit: flume.conf included
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <required>
TwitterAgent.sources.Twitter.consumerSecret = <required>
TwitterAgent.sources.Twitter.accessToken = <required>
TwitterAgent.sources.Twitter.accessTokenSecret = <required>
TwitterAgent.sources.Twitter.keywords = manchester united, man united, man utd, man u
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:50070/user/flume/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
Edit 2
I've tried increasing the memory to 8GB, which still doesn't help. I am assuming I am placing too many tweets into Hadoop at once and need to write them to disk and release the space again (or something to that effect). Is there a guide anywhere on how to do this?
Set the JAVA_OPTS value (e.g. JAVA_OPTS="-Xmx4096m") in flume-env.sh and restart the Flume agent; the mapred-site.xml setting only sizes MapReduce child tasks, not the Flume agent's own JVM.
It appears the problem had to do with the batch size and transactionCapacity: the sink's hdfs.batchSize (1000) was larger than the channel's transactionCapacity (100), and a sink batch has to fit within a single channel transaction. I changed them to the following:
TwitterAgent.sinks.HDFS.hdfs.batchSize = 100
TwitterAgent.channels.MemChannel.transactionCapacity = 1000
This works without me even needing to change the JAVA_OPTS value.

Log analysis: finding lines by time difference

I have a long log file generated with log4j, with 10 threads writing to the log.
I am looking for a log analyzer tool that can find lines where the user waited for a long time (i.e. where the difference between log entries for the same thread is more than a minute).
P.S. I am trying to use OtrosLogViewer, but it only offers filtering by certain values (for example, by thread ID) and does not compare between lines.
P.P.S. The new version of OtrosLogViewer has a "Delta" column that calculates the difference between adjacent log lines (in ms).
Thank you
This simple Python script may be enough. For testing, I analyzed my local Apache log, which BTW uses the Common Log Format, so you may even reuse the script as-is. I simply compute the difference between two subsequent requests and print the request line for deltas exceeding a certain threshold (1 second in my test). You may want to encapsulate the code in a function that also accepts a parameter with the thread ID, so you can filter further:
#!/usr/bin/env python
import re
from datetime import datetime

THRESHOLD = 1
last = None
for line in open("/var/log/apache2/access.log"):
    # You may insert here something like
    # if not re.match(THREAD_ID, line):
    #     continue
    # Python does not support %z here, hence the [:-6] to drop the timezone
    current = datetime.strptime(
        re.search(r"\[([^]]+)]", line).group(1)[:-6],
        "%d/%b/%Y:%H:%M:%S")
    if last is not None and (current - last).seconds > THRESHOLD:
        print re.search('"([^"]+)"', line).group(1)
    last = current
Based on @Raffaele's answer, I made some fixes so it works on any log file, skipping lines that don't begin with the requested date (e.g. a Jenkins console log).
In addition, I added max/min thresholds to filter out lines based on duration limits.
#!/usr/bin/env python
import re
from datetime import datetime

MIN_THRESHOLD = 80
MAX_THRESHOLD = 100
regCompile = r"\w+\s+(\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d).*"
filePath = "C:/Users/user/Desktop/temp/jenkins.log"
lastTime = None
lastLine = ""
with open(filePath, 'r') as f:
    for line in f:
        regexp = re.search(regCompile, line)
        if regexp:
            currentTime = datetime.strptime(regexp.group(1), "%Y-%m-%d %H:%M:%S")
            if lastTime is not None:
                duration = (currentTime - lastTime).seconds
                if MIN_THRESHOLD <= duration <= MAX_THRESHOLD:
                    print("#######################################################################################################################################")
                    print(lastLine)
                    print(line)
            lastTime = currentTime
            lastLine = line
Apache Chainsaw has a time delta column.
