I have a long log file generated with log4j, with 10 threads writing to the log.
I am looking for a log analyzer tool that can find lines where a user waited for a long time (i.e. where the difference between log entries for the same thread is more than a minute).
P.S. I am trying to use OtrosLogViewer, but it only filters by certain values (for example, by thread ID) and does not compare lines with each other.
P.P.S.
The new version of OtrosLogViewer has a "Delta" column that calculates the difference between adjacent log lines (in ms).
Thank you.
This simple Python script may be enough. For testing, I analyzed my local Apache log, which, BTW, uses the Common Log Format, so you may even reuse the script as-is. I simply compute the difference between two subsequent requests and print the request line for deltas exceeding a certain threshold (1 second in my test). You may want to encapsulate the code in a function that also accepts a thread ID parameter, so you can filter further.
#!/usr/bin/env python
import re
from datetime import datetime

THRESHOLD = 1  # seconds
last = None
for line in open("/var/log/apache2/access.log"):
    # You may insert here something like
    # if not re.match(THREAD_ID, line):
    #     continue
    # Python 2's strptime does not support %z, hence the [:-6],
    # which drops the trailing " -0700" UTC offset
    current = datetime.strptime(
        re.search(r"\[([^]]+)]", line).group(1)[:-6],
        "%d/%b/%Y:%H:%M:%S")
    # total_seconds() also counts whole days, unlike .seconds
    if last is not None and (current - last).total_seconds() > THRESHOLD:
        print(re.search('"([^"]+)"', line).group(1))
    last = current
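The question asks for gaps per thread rather than between any two adjacent lines. Below is a minimal sketch of the per-thread variant suggested above: it remembers the last timestamp seen for each thread and reports gaps above a threshold. It assumes a log4j ConversionPattern that starts with `%d{yyyy-MM-dd HH:mm:ss,SSS} [%t]`; that layout, the `find_long_waits` name and the sample lines are hypothetical, so adapt `LINE_RE` to your own pattern.

```python
import re
import sys
from datetime import datetime, timedelta

# ASSUMPTION: log4j layout "%d{yyyy-MM-dd HH:mm:ss,SSS} [%t] ...", e.g.
# "2015-12-20 10:15:02,123 [pool-1-thread-3] INFO Foo - message".
# Adjust LINE_RE to match your own ConversionPattern.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \[([^\]]+)\]")


def find_long_waits(lines, threshold=timedelta(minutes=1), thread=None):
    """Yield (thread, gap, previous_line, line) whenever two consecutive
    entries of the same thread are more than `threshold` apart.
    If `thread` is given, only that thread is considered."""
    last_seen = {}  # thread name -> (timestamp, line)
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # e.g. stack-trace continuation lines
        name = m.group(2)
        if thread is not None and name != thread:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if name in last_seen:
            prev_ts, prev_line = last_seen[name]
            gap = ts - prev_ts
            if gap > threshold:
                yield name, gap, prev_line, line
        last_seen[name] = (ts, line)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as log:
        for name, gap, prev, cur in find_long_waits(log):
            print("%s waited %s:\n  %s\n  %s" % (name, gap, prev.rstrip(), cur.rstrip()))
```

Pass the log file name as the first argument. Lines that don't match the pattern, such as stack-trace continuations, are skipped rather than treated as a new entry for any thread.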
Based on @Raffaele's answer, I made some fixes so that it works on any log file (skipping lines that don't begin with the requested date format, e.g. in a Jenkins console log).
In addition, I added min/max thresholds so that only gaps whose duration falls between the two limits are reported.
#!/usr/bin/env python
import re
from datetime import datetime

MIN_THRESHOLD = 80   # seconds
MAX_THRESHOLD = 100  # seconds
# A leading word (e.g. a log level), then a "YYYY-MM-DD HH:MM:SS" timestamp
regCompile = re.compile(r"\w+\s+(\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d).*")
filePath = "C:/Users/user/Desktop/temp/jenkins.log"
lastTime = None
lastLine = ""
with open(filePath, 'r') as f:  # the with block closes the file automatically
    for line in f:
        regexp = regCompile.search(line)
        if regexp:  # skip lines without a timestamp
            currentTime = datetime.strptime(regexp.group(1), "%Y-%m-%d %H:%M:%S")
            if lastTime is not None:
                duration = (currentTime - lastTime).total_seconds()
                if MIN_THRESHOLD <= duration <= MAX_THRESHOLD:
                    print("#" * 80)
                    print(lastLine)
                    print(line)
            lastTime = currentTime
            lastLine = line
Apache Chainsaw has a time delta column.
I am writing a program in Java to periodically display the CPU and memory usage of a given process ID. My implementation invokes tasklist. It is pretty straightforward to get the memory usage by the following command:
tasklist /fi "memusage ge 0" /fi "pid eq 2076" /v
This will return the memory usage of process ID 2076, and I can use this for my task. By invoking the following command, I can extract the CPU time:
tasklist /fi "pid eq 2076" /fi "CPUTIME ge 00:00:00" /v
My question is, how would I go about getting the CPU usage of this process?
I found a post on Stack Overflow about my question, but the answer isn't clear and I don't understand what to type in the command to get what I need. The question was answered in 2008; someone asked for clarification in 2013, but the person who answered hasn't replied.
Here is the post that I have found.
Memory is like a tea cup: it may be full or empty, and an instantaneous look at the cup allows you to see how full of tea it is (that is your "memusage" command).
CPU is like a ski lift. It moves at a reasonably constant rate irrespective of whether you are riding the lift or not. It is not possible to determine your usage in a single instantaneous observation; we need to know how long you were riding it for (that is your "cputime" command). You have to use the "cputime" command at least twice!
For example:
At 7:09 pm, you run the cputime command on your process, and it returns "28 minutes"
At 7:17 pm, you run the cputime command on your process again, and it returns "32 minutes"
From the first time you ran the cputime command to the second time, the usage has increased from 28 minutes to 32 minutes -- the process has used 4 minutes of CPU time.
From 7:09 pm to 7:17 pm is a duration of 8 minutes: a total of 8 minutes of time were available, but your process used just 4 minutes, so 4 / 8 = 50% average usage.
If your system has multiple processors, then you can divide by the total number of CPUs to get an average per CPU, e.g. 50% / 2 = 25% average in a dual-CPU system.
I used minutes above for ease of writing - in reality you may be looking at how many nanoseconds of CPU time the process used during a time window that is just milliseconds long.
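The two-sample arithmetic above fits in a small function. This is only a sketch in Python (the function names are hypothetical); it assumes you have already read the "CPU Time" column from two runs of `tasklist /v`, which I believe uses an H:MM:SS format, and noted when each run happened.

```python
def hms_to_seconds(hms):
    """Convert an 'H:MM:SS' string (assumed tasklist CPU Time format) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def average_cpu_percent(cpu_before_s, cpu_after_s, wall_before_s, wall_after_s, num_cpus=1):
    """Average CPU usage of one process between two samples, in percent.

    cpu_*_s:  cumulative CPU time of the process at each sample, in seconds.
    wall_*_s: wall-clock time at which each sample was taken, in seconds.
    num_cpus: divide by this to get the average per CPU.
    """
    elapsed = wall_after_s - wall_before_s
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing wall-clock times")
    used = cpu_after_s - cpu_before_s  # CPU time consumed in the window
    return 100.0 * used / elapsed / num_cpus
```

For the ski-lift example, `average_cpu_percent(28 * 60, 32 * 60, 0, 8 * 60)` gives 50.0, and adding `num_cpus=2` gives 25.0. If the column really only resolves whole seconds, sample over a window of at least several seconds, or the result will be very coarse.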
tasklist does not provide the information you are looking for. I would suggest using Get-Counter. A comment on an answer from the SuperUser site looks to be on track for what you're after.
Get-Counter '\Process(*)\% Processor Time' | Select-Object -ExpandProperty countersamples | Select-Object -Property instancename, cookedvalue | ? {$_.instanceName -notmatch "^(idle|_total|system)$"} | Sort-Object -Property cookedvalue -Descending | Select-Object -First 25 | ft InstanceName,@{L='CPU';E={($_.Cookedvalue/100/$env:NUMBER_OF_PROCESSORS).toString('P')}} -AutoSize
I once wrote a class:
// Requires: import java.lang.management.ManagementFactory;
private static class PerformanceMonitor {
    private final int availableProcessors = ManagementFactory.getOperatingSystemMXBean().getAvailableProcessors();
    private long lastSystemTime = 0;
    private long lastProcessCpuTime = 0;

    /**
     * Gets the CPU usage of the JVM.
     *
     * @return the CPU usage since the previous call, as a fraction of the
     *         total capacity of all processors (0.0 to 1.0); the first call
     *         only records a baseline and returns 0
     */
    private synchronized double getCpuUsage() {
        if (lastSystemTime == 0) {
            baselineCounters();
            return 0d;
        }
        long systemTime = System.nanoTime();
        long processCpuTime = readProcessCpuTime();
        // CPU nanoseconds used by this process / wall-clock nanoseconds elapsed
        double cpuUsage = ((double) (processCpuTime - lastProcessCpuTime)) / ((double) (systemTime - lastSystemTime));
        lastSystemTime = systemTime;
        lastProcessCpuTime = processCpuTime;
        return cpuUsage / availableProcessors;
    }

    private void baselineCounters() {
        lastSystemTime = System.nanoTime();
        lastProcessCpuTime = readProcessCpuTime();
    }

    // getProcessCpuTime() is only available on the com.sun.management
    // extension of the bean; otherwise this returns 0.
    private static long readProcessCpuTime() {
        java.lang.management.OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuTime();
        }
        return 0;
    }
}
Which is used like:
private static final PerformanceMonitor _MONITOR = new PerformanceMonitor();
_MONITOR.getCpuUsage();
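This returns the CPU usage of the process running this JVM since the previous call, as a fraction of total CPU capacity; the first call only records a baseline and returns 0.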
I'm running into a very curious problem. I have R code that doesn't contain any print statements (except my explicit calls to log the time taken), and yet the entire JSON gets dumped onto the R console. This is causing a serious performance issue with our module, and I need your help to track down the problem.
Here is just a part of the R file (due to company policies, I cannot post the entire source code, and I apologize for not giving more info):
#run time/online time series models
LAST_TIME_ID <- DATA[nrow(DATA),id];
LAST_TIME_ID <- strptime(LAST_TIME_ID,format="%d-%m-%Y %H:%M");
#timestamp tag computation using frequency
FREQUENCY_VEC <- rep(TIMESTAMP_FREQUENCY*60,PREDICTION_NUMBER);
FREQUENCY_VEC <- cumsum(FREQUENCY_VEC);
TIMESTAMP_TAGS <- LAST_TIME_ID + FREQUENCY_VEC;
TIMESTAMP_TAGS <- format(strptime(TIMESTAMP_TAGS,format="%Y-%m-%d %H:%M"),format="%d-%m-%Y %H:%M");
#prepare the prediction points data per tag into table format
PREDICTION_DATA <- NULL;
startTime <- Sys.time();
for (tag_index in 1:length(MODEL[,tag_id])) {
TEMP <- data.table(id=as.character(TIMESTAMP_TAGS),tag_id = MODEL[tag_index,tag_id], prediction = as.vector(MODEL[,Forecast][[tag_index]]));
PREDICTION_DATA <- data.table(rbind(PREDICTION_DATA,TEMP));
rm(TEMP);
};
endTime <- Sys.time();
print(paste("seconds consumed (prediction points data per tag into TEMP): ",(endTime-startTime)/1000));
#OUTPUT <- dcast(PREDICTION_DATA,id~tag_id); #into output
OUTPUT <- PREDICTION_DATA;
#compute final output in json format
js_object <- toJSON(OUTPUT,asIs = TRUE);
js_object;
I can assure you the rest of the code looks the same (i.e. no prints). I'm running my R code through Java (1.8) using RServe (REngine.jar) on Windows 8.
Any ideas/clues would be greatly appreciated.
The last line of your code snippet, where you just execute:
js_object;
prints the variable to the console.
Remove statements like this, where a value is evaluated without being assigned to anything.
When you put the object name js_object on its own at the end, R evaluates that object and prints its contents to the console. All you need to do is remove that line, and it should stop printing it out.
I am using the Stanford POS Tagger toolkit to tag lists of words from academic papers. Here is the relevant part of my code:
st = StanfordPOSTagger(stanford_tagger_path, stanford_jar_path, encoding = 'utf8', java_options = '-mx2048m')
word_tuples = st.tag(document)
document is a list of words derived from nltk.word_tokenize; they come from normal academic papers, so there are usually several thousand words (mostly 3,000 to 4,000). I need to process over 10,000 files, so I keep calling these functions. My program works fine on a small test set of 270 files, but when the number of files gets bigger, the program gives this error (Java heap space 2G):
raise OSError('Java command failed : ' + str(cmd))
OSError: Java command failed
Note that this error does not occur immediately after execution starts; it happens after the program has been running for some time. I really don't know the reason. Is it because 3,000 to 4,000 words are too many? Thank you very much for your help! (Sorry for the bad formatting; the error message is too long.)
Here is my solution, after I faced the same error. Basically, increasing the Java heap size solved it.
import os
java_path = "C:\\Program Files\\Java\\jdk1.8.0_102\\bin\\java.exe"
os.environ['JAVAHOME'] = java_path
from nltk.tag.stanford import StanfordPOSTagger
path_to_model = "stanford-postagger-2015-12-09/models/english-bidirectional-distsim.tagger"
path_to_jar = "stanford-postagger-2015-12-09/stanford-postagger.jar"
tagger=StanfordPOSTagger(path_to_model, path_to_jar)
tagger.java_options='-mx4096m' ### Setting higher memory limit for long sentences
sentence = 'This is testing'
print(tagger.tag(sentence.split()))
I assume you have tried increasing the Java heap size via the tagger settings, like so:
stanford.POSTagger([...], java_options="-mxSIZEm")
Cf. the docs; the default is 1000 MB:
def __init__(self, [...], java_options='-mx1000m')
To test whether it is a problem with the size of the dataset, you can tokenize your text into sentences, e.g. using the Punkt tokenizer, and output them right after tagging.
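A minimal sketch of that diagnostic: tag one sentence at a time and flush each result before moving on, so that when the Java command fails, the output shows how far the tagger got. The `tag_incrementally` helper is hypothetical; `tag_fn` stands in for the tagger's `tag` method from the question, and a dummy tagger is enough to try it out.

```python
import sys
from typing import Callable, Iterable, List, TextIO, Tuple

Tagged = List[Tuple[str, str]]


def tag_incrementally(sentences: Iterable[List[str]],
                      tag_fn: Callable[[List[str]], Tagged],
                      out: TextIO = sys.stdout) -> int:
    """Tag one sentence at a time and write each result as soon as it is ready.

    Returns the number of sentences tagged. If tag_fn raises part-way through,
    the last line already written to `out` shows which sentence succeeded last.
    """
    done = 0
    for index, sentence in enumerate(sentences):
        tagged = tag_fn(sentence)  # e.g. st.tag from the question
        out.write("%d\t%s\n" % (index, " ".join("%s/%s" % pair for pair in tagged)))
        out.flush()  # write the line out now, so nothing is lost if the next call fails
        done += 1
    return done
```

In real use, `sentences` could be `[nltk.word_tokenize(s) for s in nltk.sent_tokenize(text)]` (`sent_tokenize` uses Punkt) and `tag_fn` could be `st.tag`. As far as I know, NLTK's wrapper launches a separate Java process for every `tag` call, so this is a way to locate the failure, not a speed-up; `tag_sents` tags many sentences in a single call.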
I've dockerized graphite and am working with this library to get metrics from an Apache Storm topology. I'm getting metrics data, but no matter what I do I can only get data per minute where I really need the points to be per second.
As per this SO post I've set the retention policy to grab data every second. I've also set
conf.put("topology.builtin.metrics.bucket.size.secs", 1);
and
void initMetrics(TopologyContext context) {
messageCountMetric = new CountMetric();
context.registerMetric("digest_count", messageCountMetric, 1);
}
in the class that's setting up the topology and the bolt itself, respectively. To my understanding this should cause metrics to be reported every second. What am I missing here? How can I get metrics to be reported every second?
t/y in advance and happy holidays all!
update 1
here is my storage-schemas.conf file:
root@cdd13a16103a:/etc/carbon# cat storage-schemas.conf
# Schema definitions for Whisper files. Entries are scanned in order,
# and first match wins. This file is scanned for changes every 60 seconds.
#
# [name]
# pattern = regex
# retentions = timePerPoint:timeToStore, timePerPoint:timeToStore, ...
# Carbon's internal metrics. This entry should match what is specified in
# CARBON_METRIC_PREFIX and CARBON_METRIC_INTERVAL settings
[carbon]
pattern = ^carbon\.
retentions = 1s:6h,1min:7d,10min:5y
[default_1min_for_1day]
pattern = .*
retentions = 1s:6h,1min:7d,10min:5y
[test]
pattern = ^test.
retentions = 1s:6h,1min:7d,10min:5y
[storm]
pattern = ^storm.
retentions = 1s:6h,1min:7d,10min:5y
Here is my config setup:
Config conf = new Config();
conf.setDebug(false);
conf.put("topology.builtin.metrics.bucket.size.secs", 1);
conf.registerMetricsConsumer(GraphiteMetricsConsumer.class, 4);
conf.put("metrics.reporter.name", "com.verisign.storm.metrics.reporters.graphite.GraphiteReporter");
conf.put("metrics.graphite.host", "127.0.0.1");
conf.put("metrics.graphite.port", "2003");
conf.put("metrics.graphite.prefix", "storm.test");
In order to apply changes in storage-schemas.conf you have to:
restart carbons
delete old *.wsp files or use whisper-resize.py to apply the new schema
restart carbon-cache
make sure that DEFAULT_CACHE_DURATION in webapp's local_settings.py is set to 1
make sure nginx/apache2/uwsgi cache is set up correctly as well, if any
There are more whisper-* tools shipped with Graphite. The next one you may be interested in is whisper-info.py:
bash$ whisper-info.py /graphite/whisper/prod/some/metric.wsp
maxRetention: 1296000
xFilesFactor: 0.5
aggregationMethod: average
fileSize: 142600
Archive 0
retention: 691200
>> secondsPerPoint: 1
points: 11520
size: 138240
offset: 40
Archive 1
retention: 1296000
secondsPerPoint: 3600
points: 360
size: 4320
offset: 138280
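To know what whisper-info.py should report for a given `retentions` line, the layout can be worked out by hand: each archive holds retention ÷ secondsPerPoint points, and each point takes 12 bytes (the output above bears this out: 11520 × 12 = 138240, and 360 × 12 = 4320). The sketch below does that arithmetic; the function names are hypothetical, and the unit table is my assumption of how whisper reads the suffixes in storage-schemas.conf.

```python
import re

# ASSUMPTION: suffixes are matched by prefix against these names
# ("min" -> minutes, "y" -> 365 days), which is how I understand whisper to work.
UNIT_SECONDS = [("seconds", 1), ("minutes", 60), ("hours", 3600),
                ("days", 86400), ("weeks", 7 * 86400), ("years", 365 * 86400)]

METADATA_SIZE = 16      # file header (the first archive starts at offset 16 + 12 * archives)
ARCHIVE_INFO_SIZE = 12  # one header entry per archive
POINT_SIZE = 12         # one stored (timestamp, value) pair


def to_seconds(spec):
    """'10min' -> 600, '6h' -> 21600; a bare number is taken as seconds."""
    m = re.fullmatch(r"(\d+)([a-z]*)", spec.strip())
    if not m:
        raise ValueError("bad time spec: %r" % spec)
    value, unit = int(m.group(1)), m.group(2)
    if not unit:
        return value
    for name, seconds in UNIT_SECONDS:
        if name.startswith(unit):
            return value * seconds
    raise ValueError("unknown unit: %r" % unit)


def parse_retentions(retentions):
    """'1s:6h,1min:7d' -> [(1, 21600), (60, 10080)] as (secondsPerPoint, points)."""
    archives = []
    for part in retentions.split(","):
        precision, retention = part.strip().split(":", 1)
        seconds_per_point = to_seconds(precision)
        if retention.isdigit():  # a bare number is a point count
            points = int(retention)
        else:
            points = to_seconds(retention) // seconds_per_point
        archives.append((seconds_per_point, points))
    return archives


def whisper_file_size(archives):
    """Expected fileSize, in bytes, for a list of (secondsPerPoint, points)."""
    return (METADATA_SIZE + ARCHIVE_INFO_SIZE * len(archives)
            + POINT_SIZE * sum(points for _, points in archives))
```

For the `1s:6h,1min:7d,10min:5y` schema above, this gives archives of (1, 21600), (60, 10080) and (600, 262800), so a file created under that schema should show `secondsPerPoint: 1` and `points: 21600` for Archive 0. If whisper-info.py shows something else, the .wsp file most likely predates the schema change, which is why the delete/resize step is needed.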
I'm getting an out of memory exception due to lack of Java heap space when I try and download tweets using Flume and pipe them into Hadoop.
I have set the heap space currently to 4GB in the mapred-site.xml of Hadoop, like so:
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx4096m</value>
</property>
I am hoping to download tweets continually for two days but can't get past 45 minutes without errors.
Since I do have the disk space to hold all of this, I am assuming the error is coming from Java having to handle so many things at once. Is there a way for me to slow down the speed at which these tweets are downloaded, or do something else to solve this problem?
Edit: flume.conf included
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <required>
TwitterAgent.sources.Twitter.consumerSecret = <required>
TwitterAgent.sources.Twitter.accessToken = <required>
TwitterAgent.sources.Twitter.accessTokenSecret = <required>
TwitterAgent.sources.Twitter.keywords = manchester united, man united, man utd, man u
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:50070/user/flume/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
Edit 2
I've tried increasing the memory to 8GB which still doesn't help. I am assuming I am placing too many tweets in Hadoop at once and need to write them to disk and release the space again (or something to that effect). Is there a guide anywhere on how to do this?
Set the JAVA_OPTS value in flume-env.sh and start the Flume agent.
It appears the problem had to do with the batch size and transactionCapacity. I changed them to the following:
TwitterAgent.sinks.HDFS.hdfs.batchSize = 100
TwitterAgent.channels.MemChannel.transactionCapacity = 1000
This works without me even needing to change the JAVA_OPTS value.
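My understanding of why this fix works: the HDFS sink takes up to `hdfs.batchSize` events from the channel in a single transaction, and a memory-channel transaction can hold at most `transactionCapacity` events, so the original pairing (1000 vs. 100) cannot fit; `transactionCapacity` should in turn not exceed `capacity`. The helpers below are a hypothetical sanity check for a flume.conf under that assumed rule, not part of Flume itself.

```python
def parse_properties(text):
    """Minimal parser for 'key = value' lines such as flume.conf."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_batching(props, agent, sink, channel):
    """Return a list of problems with the sink/channel batch sizes.

    Assumed rule: hdfs.batchSize <= transactionCapacity <= capacity.
    """
    prefix = agent + "."
    batch = int(props[prefix + "sinks." + sink + ".hdfs.batchSize"])
    txn = int(props[prefix + "channels." + channel + ".transactionCapacity"])
    cap = int(props[prefix + "channels." + channel + ".capacity"])
    problems = []
    if batch > txn:
        problems.append("sink batchSize %d exceeds channel transactionCapacity %d" % (batch, txn))
    if txn > cap:
        problems.append("transactionCapacity %d exceeds capacity %d" % (txn, cap))
    return problems
```

With the original flume.conf, `check_batching(parse_properties(text), "TwitterAgent", "HDFS", "MemChannel")` reports the batchSize/transactionCapacity mismatch; with the changed values it returns an empty list.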