I am trying to learn JMX for the last few days and now got confuse here.
I have written a simple JMX programe which is using the APIs of package java.lang.management and trying to extract the Pid, CPU time, user time. In my result I am only getting the results of current JVM thread which is my JMX programe itself but I thought I should get the result of all the processes running over JVM on the same machine. How I will get the pids, cpu time, user time for all java processes running in JVM(LINUX/WDs).
How should I can get the pids, cpu time, user time for all non-java processes running in my machine(LINUX/WDs).
My code is below:
public void update() throws Exception{
final ThreadMXBean bean = ManagementFactory.getThreadMXBean();
final long[] ids = bean.getAllThreadIds();
final ThreadInfo[] infos = bean.getThreadInfo(ids);
for (long id : ids) {
if (id == threadId) {
continue; // Exclude polling thread
}
final long c = bean.getThreadCpuTime(id);
final long u = bean.getThreadUserTime(id);
if (c == -1 || u == -1) {
continue; // Thread died
}
}
String name = null;
for (int i = 0; i < infos.length; i++) {
name = infos[i].getThreadName();
System.out.print("The name of the id is /n" + name);
}
}
I am always getting the result:
The name of the id is Attach Listener
The name of the id is Signal Dispatcher
The name of the id is Finalizer
The name of the id is Reference Handler
The name of the id is main
I have some other java processes running on my machine they are not been included in the results of bean.getAllThreadIds() API..
Ah, now I see what you want to do. I'm afraid I have some bad news.
The APIs that are exposed through ManagementFactory allow you to monitor only the JVM in which your code is running. To monitor other JVMs, you have to use the JMX Remoting API (javax.management.remote), and that introduces a whole new range of issues you have to deal with.
It sounds like what you want to do is basically write your own management console using the stock APIs provided by out-of-the-box JDK. Short answer: you can't get there from here. Slightly longer answer: you can get there from here, but the road is long, winding, uphill (nearly) the entire way, and when you're done you will most likely wish you had gone a different route (read that: use a management console that has already been written).
I recommend you use JConsole or some other management console to monitor your application(s). In my experience it is usually only important that a human (not a program) interpret the stats that are provided by the various MBeans whose references are obtainable through the ManagementFactory static methods. After all, if a program had access to, say, the amount of CPU used by some other process, what conceivable use would it have with that information (other than to provide it in some human-readable format)?
Related
Every once in a while, a server or database error causes thousands of the same stack trace in the server log files. It might be a different error/stacktrace today than a month ago. But it causes the log files to rotate completely, and I no longer have visibility into what happened before. (Alternately, I don't want to run out of disk space, which for reasons outside my control right now is limited--I'm addressing that issue separately). At any rate, I don't need thousands of copies of the same stack trace--just a dozen or so should be enough.
I would like it if I could have log4j/log4j2/another system automatically collapse repetitive errors, so that they don't fill up the log files. For example, a threshold of maybe 10 or 100 exceptions from the same place might trigger log4j to just start counting, and wait until they stop coming, then output a count of how many more times they appeared.
What pre-made solutions exist (a quick survey with links is best)? If this is something I should implement myself, what is a good pattern to start with and what should I watch out for?
Thanks!
Will the BurstFilter do what you want? If not, please create a Jira issue with the algorithm that would work for you and the Log4j team would be happy to consider it. Better yet, if you can provide a patch it would be much more likely to be incorporated.
Log4j's BurstFilter will certainly help prevent you filling your disks. Remember to configure it so that it applies in as limited a section of code as you can, or you'll filter out messages you might want to keep (that is, don't use it on your appender, but on a particular logger that you isolate in your code).
I wrote a simple utility class at one point that wrapped a logger and filtered based on n messages within a given Duration. I used instances of it around most of my warning and error logs to protect the off chance that I'd run into problems like you did. It worked pretty well for my situation, especially because it was easier to quickly adapt for different situations.
Something like:
...
public DurationThrottledLogger(Logger logger, Duration throttleDuration, int maxMessagesInPeriod) {
...
}
public void info(String msg) {
getMsgAddendumIfNotThrottled().ifPresent(addendum->logger.info(msg + addendum));
}
private synchronized Optional<String> getMsgAddendumIfNotThrottled() {
LocalDateTime now = LocalDateTime.now();
String msgAddendum;
if (throttleDuration.compareTo(Duration.between(lastInvocationTime, now)) <= 0) {
// last one was sent longer than throttleDuration ago - send it and reset everything
if (throttledInDurationCount == 0) {
msgAddendum = " [will throttle future msgs within throttle period]";
} else {
msgAddendum = String.format(" [previously throttled %d msgs received before %s]",
throttledInDurationCount, lastInvocationTime.plus(throttleDuration).format(formatter));
}
totalMessageCount++;
throttledInDurationCount = 0;
numMessagesSentInCurrentPeriod = 1;
lastInvocationTime = now;
return Optional.of(msgAddendum);
} else if (numMessagesSentInCurrentPeriod < maxMessagesInPeriod) {
msgAddendum = String.format(" [message %d of %d within throttle period]", numMessagesSentInCurrentPeriod + 1, maxMessagesInPeriod);
// within throttle period, but haven't sent max messages yet - send it
totalMessageCount++;
numMessagesSentInCurrentPeriod++;
return Optional.of(msgAddendum);
} else {
// throttle it
totalMessageCount++;
throttledInDurationCount++;
return emptyOptional;
}
}
I'm pulling this from an old version of the code, unfortunately, but the gist is there. I wrote a bunch of static factory methods that I mainly used because they let me write a single line of code to create one of these for that one log message:
} catch (IOException e) {
DurationThrottledLogger.error(logger, Duration.ofSeconds(1), "Received IO Exception. Exiting current reader loop iteration.", e);
}
This probably won't be as important in your case; for us, we were using a somewhat underpowered graylog instance that we could hose down fairly easily.
I have a small problem with creating threads in EJB.OK I understand why i can not use them in EJB, but dont know how to replace them with the same functionality.I am trying to download 30-40 webpages/files and i need to start downloading of all files at the same time(approximately).This is need ,because if i run them in one thread in queue.It will excecute more than 3 minutes.
I try with #Asyncronious anotation, but nothing happened.
public void execute(String lang2, String lang1,int number) {
Stopwatch timer = new Stopwatch().start();
htmlCodes.add(URL2String(URLs.get(number)));
timer.stop();
System.out.println( number +":"+ Thread.currentThread().getName() + timer.elapsedMillis()+"miseconds");
}
private void findMatches(String searchedWord, String lang1, String lang2) {
articles = search(searchedWord);
for (int i = 0; i < articles.size(); i++) {
execute(lang1,lang2,i);
}
Here are two really good SO answers that can help. This one gives you your options, and this one explains why you shouldn't spawn threads in an ejb. The problem with the first answer is it doesn't contain a lot of knowledge about EJB 3.0 options. So, here's a tutorial on using #Asynchronous.
No offense, but I don't see any evidence in your code that you've read this tutorial yet. Your asynchronous method should return a Future. As the tutorial says:
The client may retrieve the result using one of the Future.get methods. If processing hasn’t been completed by the session bean handling the invocation, calling one of the get methods will result in the client halting execution until the invocation completes. Use the Future.isDone method to determine whether processing has completed before calling one of the get methods.
So I'm working on web scraping for a certain website. The problem is:
Given a set of URLs (in the order of 100s to 1000s), I would like to retrieve the HTML of each URL in an efficient manner, specially time-wise. I need to be able to do 1000s of requests every 5 minutes.
This should usually imply using a pool of threads to do requests from a set of not yet requested urls. But before jumping into implementing this, I believe that it's worth asking here since I believe this is a fairly common problem when doing web scraping or web crawling.
Is there any library that has what I need?
So I'm working on web scraping for a certain website.
Are you scraping a single server or is the website scraping from multiple other hosts? If it is the former, then the server you are scraping may not like too many concurrent connections from a single i/p.
If it is the latter, this is really a general question on how many outbound connections you should open from a machine. There is physical limit, but it is pretty large. Practically, it would depend on where that client is getting deployed. The better the connectivity, the higher number of connections it can accommodate.
You might want to look at the source code of a good download manager to see if they have a limit on the number of outbound connections.
Definitely user asynchronous i/o, but you would still do well to limit the number.
Your bandwidth utilization will be the sum of all of the HTML documents that you retrieve (plus a little overhead) no matter how you slice it (though some web servers may support compressed HTTP streams, so certainly use a client capable of accepting them).
The optimal number of concurrent threads depends a great deal on your network connectivity to the sites in question. Only experimentation can find an optimal number. You can certainly use one set of threads for retrieving HTML documents and a separate set of threads to process them to make it easier to find the right balance.
I'm a big fan of HTML Agility Pack for web scraping in the .NET world but cannot make a specific recommendation for Java. The following question may be of use in finding a good, Java based scraping platform
Web scraping with Java
I would start by researching asynchronous communication. Then take a look at Netty.
Keep in mind there is always a limit to how fast one can load a web page. For an average home connection, it will be around a second. Take this into consideration when programming your application.
http://wwww.Jsoup.org just for scrapping part! The thread pooling i think you should implement urself.
Update
if this approach is fitting your need, you can download the complete class files here:
http://codetoearn.blogspot.com/2013/01/concurrent-web-requests-with-thread.html
AsyncWebReader webReader = new AsyncWebReader(5/*number of threads*/, new String[]{
"http://www.google.com",
"http://www.yahoo.com",
"http://www.live.com",
"http://www.wikipedia.com",
"http://www.facebook.com",
"http://www.khorasannews.com",
"http://www.fcbarcelona.com",
"http://www.khorasannews.com",
});
webReader.addObserver(new Observer() {
#Override
public void update(Observable o, Object arg) {
if (arg instanceof Exception) {
Exception ex = (Exception) arg;
System.out.println(ex.getMessage());
} /*else if (arg instanceof List) {
List vals = (List) arg;
System.out.println(vals.get(0) + ": " + vals.get(1));
} */else if (arg instanceof Object[]) {
Object[] objects = (Object[]) arg;
HashMap result = (HashMap) objects[0];
String[] success = (String[]) objects[1];
String[] fail = (String[]) objects[2];
System.out.println("Failds");
for (int i = 0; i < fail.length; i++) {
String string = fail[i];
System.out.println(string);
}
System.out.println("-----------");
System.out.println("success");
for (int i = 0; i < success.length; i++) {
String string = success[i];
System.out.println(string);
}
System.out.println("\n\nresult of Google: ");
System.out.println(result.remove("http://www.google.com"));
}
}
});
Thread t = new Thread(webReader);
t.start();
t.join();
OrientDB official site says:
On common hardware stores up to 150.000 documents per second, 10
billions of documents per day. Big Graphs are loaded in few
milliseconds without executing costly JOIN such as the Relational
DBMSs.
But, executing the following code shows that it's taking ~17000ms to insert 150000 simple documents.
import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
import com.orientechnologies.orient.core.record.impl.ODocument;
public final class OrientDBTrial {
public static void main(String[] args) {
ODatabaseDocumentTx db = new ODatabaseDocumentTx("remote:localhost/foo");
try {
db.open("admin", "admin");
long a = System.currentTimeMillis();
for (int i = 1; i < 150000; ++i) {
final ODocument foo = new ODocument("Foo");
foo.field("code", i);
foo.save();
}
long b = System.currentTimeMillis();
System.out.println(b - a + "ms");
for (ODocument doc : db.browseClass("Foo")) {
doc.delete();
}
} finally {
db.close();
}
}
}
My hardware:
Dell Optiplex 780
Intel(R) Core(TM)2 Duo CPU E7500 # 2.93Ghz
8GB RAM
Windows 7 64bits
What am I doing wrong?
Splitting the saves in 10 concurrent threads to minimize Java's overhead made it run in ~13000ms. Still far slower than what OrientDB front page says.
You can achieve that by using 'Flat Database' and orientdb as an embedded library in java
see more explained here
http://code.google.com/p/orient/wiki/JavaAPI
what you use is server mode and it sends many requests to orientdb server,
judging by your benchmark you got ~10 000 inserts per seconds which is not bad,
e.g I think 10 000 requests/s is very good performance for any webserver
(and orientdb server actually is a webserver and you can query it through http, but I think java is using binary mode)
The numbers from the OrientDB site are benchmarked for a local database (with no network overhead), so if you use a remote protocol, expect some delays.
As Krisztian pointed out, reuse objects if possible.
Read the documentation first on how to achive the best performance!
Few tips:
-> Do NOT instantiate ODocument always:
final ODocument doc;
for (...) {
doc.reset();
doc.setClassName("Class");
// Put data to fields
doc.save();
}
-> Do NOT rely on System.currentTimeMillis() - use perf4j or similar tool to measure times, because the first one measures global system times hence includes execution time of all other programs running on your system!
I got an application written in java which runs on Unix and starts two sub-processes (via Runtime.getRuntime().exec()) on startup. If the application crashed for some reason, the sub processes won't get killed.
Now, I added a shutdown hook which gets fired on every crash, ok so far. But I'd like to send a SIGTERM signal (or at least SIGINT) on UNIX console for every sub process of the application. I should be able to find their process IDs via ps, but I did not make it to extract the PID correctly and send a signal for every process.
Can anyone help?
Thank you very much!
What I'm suggesting it is not an official feature, but a tricks.
This is how I get process id for my java applications. I never found another way.
public static final String getPid() {
try {
RuntimeMXBean runtimeBean = ManagementFactory.getRuntimeMXBean();
String name = runtimeBean.getName();
int k = name.indexOf('#');
if (k > 0)
return name.substring(0, k);
} catch (Exception ex) {
}
return null;
}
This works on win, mac and linux.