At the beginning the loop runs fast, but after a few hours it becomes slow and takes on average 10 seconds for a single iteration.
I created an ArrayList to collect all the processed data and save it in one call.
I also tried setting batch_size, but it gave no significant improvement.
I am using MySQL 5.7, InnoDB, utf8mb4, and inserting directly into the database is much faster.
Controller.java
// call 4000 times
Observable.fromArray(postTodayHolding).subscribe(System.out::println);
public @ResponseBody String postTodayHolding() throws IOException, ParseException {
List<AccountHolding> cshdlist = new ArrayList<AccountHolding>();
// pnlResultNormal size about 400
for (Integer q = 0; q < pnlResultNormal.size(); q += 2) {
AccountHolding cshd = new AccountHolding();
cshd.setIdDri(fk);
String partID = pnlResultNormal.get(q).text().replace("\u00a0", "");
String holding = pnlResultNormal.get(q+1).text().replace("\u00a0", "");
CcassParticipants results = pplist.stream().filter(itm -> partID.equals(itm.getPartId())).findAny().orElse(null);
cshd.setPartId(results.getId());
cshd.setHolding(getBDformatValue(holding));
cshdlist.add(cshd);
}
AccountHoldingRepository.save(cshdlist);
... another code
I think I found the problem: when I run 'top' in the terminal, the java process shows about 100% CPU all the time, and it has eaten almost all my RAM after a few hours of the cron job.
java 99.6 01:51:12 42/1 1 122 3627M 0B 219M 93688 72225 running *0[1] 0.00000 0.00000 501 1926111 467
I am confused by the following code:
public static void test(){
long currentTime1 = System.currentTimeMillis();
final int iBound = 10000000;
final int jBound = 100;
for(int i = 1;i<=iBound;i++){
int a = 1;
int tot = 10;
for(int j = 1;j<=jBound;j++){
tot *= a;
}
}
long updateTime1 = System.currentTimeMillis();
System.out.println("i:"+iBound+" j:"+jBound+"\nIt costs "+(updateTime1-currentTime1)+" ms");
}
That's the first version, it costs 443ms on my computer.
public static void test(){
long currentTime1 = System.currentTimeMillis();
final int iBound = 100;
final int jBound = 10000000;
for(int i = 1;i<=iBound;i++){
int a = 1;
int tot = 10;
for(int j = 1;j<=jBound;j++){
tot *= a;
}
}
long updateTime1 = System.currentTimeMillis();
System.out.println("i:"+iBound+" j:"+jBound+"\nIt costs "+(updateTime1-currentTime1)+" ms");
}
The second version costs 832ms.
The only difference is that I simply swapped the bounds of i and j.
This result is surprising. I tested the same code in C, and the difference there is not nearly as large.
Why do these two similar pieces of code perform so differently in Java?
My jdk version is openjdk-14.0.2
TL;DR - This is just a bad benchmark.
I did the following:
Create a Main class with a main method.
Copy in the two versions of the test as test1() and test2().
In the main method do this:
while(true) {
test1();
test2();
}
Here is the output I got (Java 8).
i:10000000 j:100
It costs 35 ms
i:100 j:10000000
It costs 33 ms
i:10000000 j:100
It costs 33 ms
i:100 j:10000000
It costs 25 ms
i:10000000 j:100
It costs 0 ms
i:100 j:10000000
It costs 0 ms
i:10000000 j:100
It costs 0 ms
i:100 j:10000000
It costs 0 ms
i:10000000 j:100
It costs 0 ms
i:100 j:10000000
It costs 0 ms
i:10000000 j:100
It costs 0 ms
....
So as you can see, when I run two versions of the same method alternately in the same JVM, the times for each method are roughly the same.
But more importantly, after a small number of iterations the time drops to ... zero! What has happened is that the JIT compiler has compiled the two methods and (probably) deduced that their loops can be optimized away.
It is not entirely clear why people are getting different times when the two versions are run separately. One possible explanation is that on the first run the JVM executable is being read from disk, and on the second run it is already cached in RAM. Or something like that.
Another possible explanation is that JIT compilation kicks in earlier (see footnote 1) with one version of test(), so the proportion of time spent in the slower interpreting (pre-JIT) phase is different between the two versions. (It may be possible to tease this out using JIT logging options.)
But it is immaterial really ... because the performance of a Java application while the JVM is warming up (loading code, JIT compiling, growing the heap to its working size, loading caches, etc) is generally speaking not important. And for the cases where it is important, look for a JVM that can do AOT compilation; e.g. GraalVM.
1 - This could be because of the way that the interpreter gathers stats. The general idea is that the bytecode interpreter accumulates statistics on things like branches until it has "enough". Then the JVM triggers the JIT compiler to compile the bytecodes to native code. When that is done, the code typically runs 10 or more times faster. The different looping patterns might make it reach "enough" earlier in one version compared to the other. NB: I am speculating here. I offer zero evidence ...
The bottom line is that you have to be careful when writing Java benchmarks because the timings can be distorted by various JVM warmup effects.
For more information read: How do I write a correct micro-benchmark in Java?
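To make the warm-up point concrete, here is a minimal hand-rolled sketch of a timing loop that runs both loop shapes in the same JVM, discards some warm-up rounds, and keeps the computed value live so the loops are not trivially dead code. The class name, the loop body and the round counts are assumptions chosen for illustration; for real measurements a harness such as JMH is the usual recommendation.

```java
public class WarmupBenchmark {
    // Same nested-loop shape as the question, but the result depends on
    // both indices and is returned to the caller.
    static int work(int iBound, int jBound) {
        int tot = 0;
        for (int i = 1; i <= iBound; i++) {
            for (int j = 1; j <= jBound; j++) {
                tot += i ^ j;
            }
        }
        return tot;
    }

    public static void main(String[] args) {
        int sink = 0;
        // Warm-up rounds: give the JIT a chance to compile work() first.
        for (int r = 0; r < 20; r++) {
            sink += work(10_000, 100);
            sink += work(100, 10_000);
        }
        // Measured rounds, alternating the two shapes.
        for (int r = 0; r < 5; r++) {
            long t0 = System.nanoTime();
            sink += work(10_000, 100);
            long t1 = System.nanoTime();
            sink += work(100, 10_000);
            long t2 = System.nanoTime();
            System.out.printf("long outer: %d us, long inner: %d us%n",
                    (t1 - t0) / 1_000, (t2 - t1) / 1_000);
        }
        // Printing the accumulated value keeps the results in use.
        System.out.println("sink=" + sink);
    }
}
```

Even with this structure the numbers are only indicative; the JIT is still free to transform the loops in ways a simple timer cannot see.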
I tested it myself and got the same kind of difference (around 16 ms and 4 ms).
After testing, I found that:
Declaring a variable 1M times takes less time than multiplying by 1 1M times.
How?
I summed the time over 100 runs of this:
final int nb = 100000000;
for(int i = 1;i<=nb;i++){
i *= 1;
i *= 1;
[... written 20 times]
i *= 1;
i *= 1;
}
And over 100 runs of this:
final int nb = 100000000;
for(int i = 1;i<=nb;i++){
int a = 0;
int aa = 0;
[... written 20 times]
int aaaaaaaaaaaaaaaaaaaaaa = 0;
int aaaaaaaaaaaaaaaaaaaaaaa = 0;
}
And I get 8 ms and 3 ms respectively, which seems to correspond to what you get.
You may get different results on a different processor.
You can find the answer in the first chapter of an algorithms book:
The cost of a declaration or an assignment is 1. In the first algorithm the two declarations and assignments in the outer loop run 10000000 times, and in the second one they run only 100 times, so you save time there.
In the first:
5 operations in the main loop and 3 in the inner loop -> the inner loop costs 3*100 = 300
then 300 + 5 = 305, and 305 * 10000000 = 3050000000
In the second:
3*10000000 = 30000000, then (30000000 + 5)*100 = 3000000500
So in theory the second one is faster, but I think it comes down to multiple CPUs: they can run 10000000 parallel jobs in the first but only 100 parallel jobs in the second, so the first one ends up faster.
I wrote some Java code just to test how my CPU behaves when it has many operations to do. The loop adds 1 to a variable over about 100000000000 iterations:
public class NoThread {
public static void main(String[] args) {
long s = System.currentTimeMillis();
int sum = 0;
for (int i=0;i<=1000000;i++){
for (int j=0;j<=10000;j++){
for (int k = 0;k<=10;k++){
sum++;
}
}
}
long k = System.currentTimeMillis();
System.out.println("Time" + (k-s)+ " " + sum );
}
}
The code finishes after 30 - 40 seconds.
Next I decided to split this operation across 10 threads to stress my CPU more, and made the program print the time when each thread ends:
public class WithThread {
public static void main(String[] args) {
Runnable[] run = new Runnable[10];
Thread[]thread = new Thread[10];
for (int i = 0; i<=9;i++){
run[i] = new Counter(i);
thread[i] = new Thread(run[i]);
thread[i].start();
}
}
}
and
public class Counter implements Runnable {
private int inc;
private int incc;
private int sum = 0;
private int id;
public Counter(int a){
id = a;
inc = a * 100000;
incc = (a+1)*100000;
}
@Override
public void run(){
long s = System.currentTimeMillis();
for (int i = inc;i<=incc;i++){
for (int j=0;j<=10000;j++){
for (int k = 0;k<=10;k++){
sum++;
}
}
}
long k = System.currentTimeMillis();
System.out.println("Time" + (k-s)+ " " + sum + " in thread " + id);
}
}
As a result the whole program ends in 18 - 20 seconds, so about two times faster. But when I looked at the time at which each thread ended, I found something interesting. Each thread had the same job to do, but 4 threads finished in a very short time (0.8 seconds) and the remaining 6 threads finished in 18 to 20 seconds. I started it again and now I had 6 fast threads and 4 slow ones. I ran it again: 7 fast and 3 slow. The number of fast and slow threads looks random. So my question is: why is there such a big difference between fast and slow threads? Why is the number of fast and slow threads so random, and is this specific to the language (Java), or is it the operating system, the CPU, or something else?
Before moving on to how threads and processors work, I'll explain it with a more familiar analogy.
Scenario
Location A ------------------------------ Location B
| |_____________________________|
| |
| 200 Metres
|
| Have to be carried to
400 Bags of Sand -------------------------- Location B
(In Location A)
So, the worker will have to carry each sandbag from Location A to Location B until all the sandbags are moved to Location B.
Let's just pretend that the worker is instantly teleported back (for argument's sake) to Location A (but not the other way around) once he arrives at Location B.
Case 1
Number of Workforce = 1 (number of men)
Time taken = 2 mins (Time for Moving 1 SandBag from Location A to Location B)
Total time taken to carry 400 Sandbags from Location A to Location B will be
Totaltime Taken = 2 x 400 = 800 mins
Case 2
Number of Workforce = 4 (number of men)
Time taken = 2 mins (Time for Moving 1 SandBag from Location A to Location B)
So now we're going to split the job equally among the available workforce.
Assigned Sandbag for Each worker = 400 / 4 = 100
Let's say everyone starts their job at the same time.
Total time taken for carrying 100 Sandbags from Location A to Location B for an individual workforce
TimeTaken for Individual Workforce = 2 x 100 = 200 mins
Since everyone had started their job at the same time, all the 400 Sandbags will be carried from Location A to Location B in 200 mins
Case 3
Number of Workforce = 4 (number of men)
Here, let's say that every man has to carry 4 sandbags from Location A to Location B in a single transfer.
Total Sandbags in Single transfer for every worker = 4 bags
Time taken = 12 mins (Time for Moving 4 SandBags from Location A to Location B in a single transfer)
Since everyone is forced to carry 4 sandbags at a time instead of 1, this greatly reduces their speed.
Consider this:
1) If I order you to carry 1 sandbag from A to B, you'll take 2 mins.
2) If I order you to carry 2 sandbags from A to B in one transfer, you'll take 5 mins instead of the theoretical 4 mins, because of the limits of the body and the weight being carried.
3) If I order you to carry 4 sandbags from A to B in one transfer, you'll take 12 mins instead of the theoretical 8 mins (extrapolating from point 1) or 10 mins (extrapolating from point 2), which is again down to human nature.
So now we're going to split the job equally among the available workforce.
Assigned Sandbag for Each worker = 400 / 4 = 100
Total transfers for Each worker = 100 / 4 = 25 Transfers
Calculating the time taken for single worker to complete his full job
Total time for Single worker = 12 mins x 25 transfers = 300 mins
So they've taken an additional 100 mins on top of the theoretical 200 mins (Case 2).
Case 4
Total Sandbags in Single transfer for every worker = 100 bags
Since this is impossible for anyone to do, the worker will just quit.
xx--------------------------------------------------------------------------------------xx
Threads and processors work on the same kind of principle.
Here
Workforce = No. of Processors
Total Sandbags = No.of Threads
Sandbags in a Single transfer = No.of threads a (1) processor is going to handle simultaneously
Assume
Available Processors = 4
Runtime.getRuntime().availableProcessors() // -> Syntax to get the no of available processors
Note: each case below corresponds to the real-world case with the same number above.
Case 1
for (int i=0;i<=1000000;i++){
for (int j=0;j<=10000;j++){
for (int k = 0;k<=10;k++){
sum++;
}
}
}
The whole operation is a serial process, so it takes as long as it is supposed to.
Case 2
for( int n = 1; n <= 4; n++ ){
Thread t = new Thread(new Runnable(){
public void run(){
for (int i=0;i<=250000;i++){ // 1000000 / 4 = 250000
for (int j=0;j<=10000;j++){
for (int k = 0;k<=10;k++){
sum++;
}
}
}
}
});
t.start();
}
Here each processor is going to handle 1 thread, so it'll take about 1/4th of the original time.
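As written, the Case 2 snippet has every thread incrementing one shared sum, and sum++ is not atomic, so the final count would be unreliable. The following is a minimal sketch of the same split that gives each task its own local counter and adds the partial results at the end. The class and method names and the smaller loop bounds are assumptions for illustration, not the original poster's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SplitCounter {
    // Counts the iterations of the nested loops for outer indices [from, to).
    static long countRange(int from, int to, int jBound, int kBound) {
        long sum = 0; // local to the task, so nothing is shared between threads
        for (int i = from; i < to; i++)
            for (int j = 0; j < jBound; j++)
                for (int k = 0; k < kBound; k++)
                    sum++;
        return sum;
    }

    // Splits the outer loop into one contiguous chunk per thread.
    static long countInParallel(int iBound, int jBound, int kBound, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = (iBound + threads - 1) / threads; // ceiling division
            for (int t = 0; t < threads; t++) {
                final int from = Math.min(iBound, t * chunk);
                final int to = Math.min(iBound, from + chunk);
                Callable<Long> task = () -> countRange(from, to, jBound, kBound);
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // waits for that chunk to finish
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        long start = System.currentTimeMillis();
        long total = countInParallel(10_000, 1_000, 10, threads);
        System.out.println("total=" + total + " in "
                + (System.currentTimeMillis() - start) + " ms on " + threads + " threads");
    }
}
```

Sizing the pool from availableProcessors() keeps the thread count at the 1:1 ratio this case describes.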
Case 3
for( int n = 1; n <= 16; n++ ){
Thread t = new Thread(new Runnable(){
public void run(){
for (int i=0;i<=62500;i++){ // 1000000 / 16 = 62500
for (int j=0;j<=10000;j++){
for (int k = 0;k<=10;k++){
sum++;
}
}
}
}
});
t.start();
}
In total 16 threads are created, and each processor has to handle 4 threads simultaneously. In practice this pushes the processor load to its maximum, which reduces the processor's efficiency and increases the execution time on each processor.
In total it takes 1/4th of (1/4th of the original time) plus the performance-degradation time (which will definitely be higher than 1/4th of the original time).
Case 4
for( int n = 1; n <= 100000; n++ ){ // 100000 - Just for argument sake
Thread t = new Thread(new Runnable(){
public void run(){
for (int i=0;i<=1000000;i++){
for (int j=0;j<=10000;j++){
for (int k = 0;k<=10;k++){
sum++;
}
}
}
}
});
t.start();
}
At this stage, creating and starting a thread (when the processor already has many threads on it) is more expensive than creating and starting the earlier threads was. As the number of simultaneous threads increases, the processor load grows hugely until the processor reaches its capacity, which leads to a system crash.
The reason the threads you created first had shorter execution times is that the processor's performance does not degrade during the initial stage. But as the for loop continues, the number of threads each processor has to handle grows beyond the fair ratio (1:1), so you start to see lag as the thread count on each processor increases.
I have an application which accesses about 2 million tweets from a MySQL database. Specifically, one of the fields holds the text of a tweet (with a maximum length of 140 characters). I am splitting every tweet into word n-grams where 1 <= n <= 3. For example, consider the sentence:
I am a boring sentence.
The corresponding nGrams are:
I
I am
I am a
am
am a
am a boring
a
a boring
a boring sentence
boring
boring sentence
sentence
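The listing above can be reproduced with a few lines of plain Java. This is only a sketch of the idea: the question itself uses Lucene's tokenizer and ShingleFilter, whereas the class below (a hypothetical name) simply splits on whitespace and drops punctuation, which is an assumption about how words are delimited.

```java
import java.util.ArrayList;
import java.util.List;

public class WordNGrams {
    // Returns every run of 1..maxN consecutive words, ordered by start position.
    static List<String> ngrams(String sentence, int maxN) {
        // Assumption: anything that is not a letter, digit or whitespace is noise.
        String cleaned = sentence.replaceAll("[^\\p{L}\\p{N}\\s]", " ").trim();
        List<String> out = new ArrayList<>();
        if (cleaned.isEmpty()) {
            return out;
        }
        String[] words = cleaned.split("\\s+");
        for (int start = 0; start < words.length; start++) {
            StringBuilder gram = new StringBuilder();
            // Grow the n-gram one word at a time, stopping at maxN or the end.
            for (int n = 1; n <= maxN && start + n <= words.length; n++) {
                if (n > 1) {
                    gram.append(' ');
                }
                gram.append(words[start + n - 1]);
                out.add(gram.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String g : ngrams("I am a boring sentence.", 3)) {
            System.out.println(g);
        }
    }
}
```

A five-word sentence yields 12 n-grams, so 2 million tweets can easily produce tens of millions of rows.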
With about 2 million tweets, I am generating a lot of data. Regardless, I am surprised to get a heap error from Java:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at com.mysql.jdbc.MysqlIO.nextRowFast(MysqlIO.java:2145)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1922)
at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:3423)
at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:483)
at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:3118)
at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:2288)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2709)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2728)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2678)
at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1612)
at twittertest.NGramFrequencyCounter.FreqCount(NGramFrequencyCounter.java:49)
at twittertest.Global.main(Global.java:40)
Here is the problem code statement (line 49) as given by the above output from Netbeans:
results = stmt.executeQuery("select * from tweets");
So, if I am running out of memory it must be that it is trying to return all the results at once and then storing them in memory. What is the best way to solve this problem? Specifically I have the following questions:
How can I process pieces of results rather than the whole set?
How would I increase the heap size? (If this is possible)
Feel free to include any suggestions, and let me know if you need more information.
EDIT
Instead of select * from tweets I partitioned the table into equally sized subsets of about 10% of the total size. Then I tried running the program. It looked like it was working fine but it eventually gave me the same heap error. This is strange to me because I have run the same program successfully in the past with 610,000 tweets. Now I have about 2,000,000 tweets, or roughly 3 times as much data. So if I split the data into thirds it should work, but I went further and split it into subsets of 10% each.
Is some memory not being freed? Here is the rest of the code:
results = stmt.executeQuery("select COUNT(*) from tweets");
int num_tweets = 0;
if(results.next())
{
num_tweets = results.getInt(1);
}
int num_intervals = 10; //split into equally sized subsets
int interval_size = num_tweets/num_intervals;
for(int i = 0; i < num_intervals-1; i++) //process 10% of the data at a time
{
results = stmt.executeQuery( String.format("select * from tweets limit %s, %s", i*interval_size, (i+1)*interval_size));
while(results.next()) //for each row in the tweets database
{
tweetID = results.getLong("tweet_id");
curTweet = results.getString("tweet");
int colPos = curTweet.indexOf(":");
curTweet = curTweet.substring(colPos + 1); //trim off the RT and retweeted
if(curTweet != null)
{
curTweet = removeStopWords(curTweet);
}
if(curTweet == null)
{
continue;
}
reader = new StringReader(curTweet);
tokenizer = new StandardTokenizer(Version.LUCENE_36, reader);
//tokenizer = new StandardFilter(Version.LUCENE_36, tokenizer);
//Set stopSet = StopFilter.makeStopSet(Version.LUCENE_36, stopWords, true);
//tokenizer = new StopFilter(Version.LUCENE_36, tokenizer, stopSet);
tokenizer = new ShingleFilter(tokenizer, 2, 3);
charTermAttribute = tokenizer.addAttribute(CharTermAttribute.class);
while(tokenizer.incrementToken()) //insert each nGram from each tweet into the DB
{
insertNGram.setInt(1, nGramID++);
insertNGram.setString(2, charTermAttribute.toString().toString());
insertNGram.setLong(3, tweetID);
insertNGram.executeUpdate();
}
}
}
Don't fetch all the rows from the table. Select only part of the data, based on your requirements, by putting a limit on the query. Since you are using a MySQL database, your query would be select * from tweets limit 0,10. Here 0 is the offset of the first row and 10 is the number of rows to return.
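A small sketch of how the paged queries could be built. In MySQL's two-argument form, the first number after limit is how many rows to skip and the second is how many rows to return, so the second value stays constant from page to page. The class name, the page size, and the order by tweet_id clause (added so that pages do not overlap) are assumptions for illustration.

```java
public class PageQueries {
    // Builds the query for one page: skip pageIndex * pageSize rows,
    // then return at most pageSize rows.
    static String pageQuery(int pageIndex, int pageSize) {
        long offset = (long) pageIndex * pageSize;
        return "select * from tweets order by tweet_id limit " + offset + ", " + pageSize;
    }

    public static void main(String[] args) {
        int pageSize = 1000; // hypothetical page size
        for (int page = 0; page < 3; page++) {
            System.out.println(pageQuery(page, pageSize));
        }
    }
}
```

Each page then holds at most pageSize rows in memory, however large the table is.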
You can always increase the heap size available to your JVM using the -Xmx argument. You should read up on all the knobs available to you (e.g. perm gen size). Google for other options or read this SO answer.
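To check that an -Xmx setting actually took effect, a tiny program like the following can print the maximum heap the JVM will try to use. The class name is hypothetical; Runtime.maxMemory() is the standard call.

```java
public class HeapInfo {
    // Maximum heap the JVM will attempt to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Running it as java -Xmx2g HeapInfo should report a value close to 2048 MB; the exact figure depends on the JVM and garbage collector.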
You probably can't do this kind of problem with a 32-bit machine. You'll want 64 bits and lots of RAM.
Another option would be to treat it as a map-reduce problem. Solve it on a cluster using Hadoop and Mahout.
Have you considered streaming the result set? Halfway down the page is a section on result set, and it addresses your problem (I think?) Write the n grams to a file, then process the next row? Or, am I misunderstanding your problem?
http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html
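For reference, the streaming mode described on that page is requested by creating a forward-only, read-only statement and setting its fetch size to Integer.MIN_VALUE. The sketch below assumes MySQL Connector/J on the classpath and an open Connection supplied by the caller; the class and method names are hypothetical, and the column names are taken from the question.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StreamingTweets {
    // Creates a statement that Connector/J reads row by row
    // instead of buffering the whole result set.
    static Statement streamingStatement(Connection conn) throws SQLException {
        Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(Integer.MIN_VALUE); // Connector/J's signal to stream
        return stmt;
    }

    // Visits every tweet while holding only the current row in memory.
    static long countTweets(Connection conn) throws SQLException {
        long rows = 0;
        try (Statement stmt = streamingStatement(conn);
             ResultSet rs = stmt.executeQuery("select tweet_id, tweet from tweets")) {
            while (rs.next()) {
                String tweet = rs.getString("tweet"); // process one row here
                if (tweet != null) {
                    rows++;
                }
            }
        }
        return rows;
    }
}
```

While a streaming result set is open, the same connection cannot run other statements, so the n-gram inserts would need a second connection or a file as an intermediate step.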
I am trying to measure the performance of our service by putting the data in a HashMap, like this:
X number of calls came back in Y ms. Below is my code, which is very simple. It starts a timer before hitting the service and, after the response comes back, measures the elapsed time.
private static void serviceCall() {
histogram = new HashMap<Long, Long>();
keys = histogram.keySet();
long total = 10;
long runs = total;
while (runs > 0) {
long start_time = System.currentTimeMillis();
// hitting the service
result = restTemplate
.getForObject("Some URL",String.class);
long difference = (System.currentTimeMillis() - start_time);
Long count = histogram.get(difference);
if (count != null) {
count++;
histogram.put(Long.valueOf(difference), count);
} else {
histogram.put(Long.valueOf(difference), Long.valueOf(1L));
}
runs--;
}
for (Long key : keys) {
Long value = histogram.get(key);
System.out.println("SERVICE MEASUREMENT, HG data, " + key + ":" + value);
}
}
Currently the output I am getting is something like this-
SERVICE MEASUREMENT, HG data, 166:1
SERVICE MEASUREMENT, HG data, 40:2
SERVICE MEASUREMENT, HG data, 41:4
SERVICE MEASUREMENT, HG data, 42:1
SERVICE MEASUREMENT, HG data, 43:1
SERVICE MEASUREMENT, HG data, 44:1
which means 1 call came back in 166 ms, 2 calls came back in 40 ms, and so on for the other outputs.
Problem Statement:-
What I am looking for now is something like this. I should have range setup like this-
X Number of calls came back in between 1 and 10 ms
Y Number of calls came back in between 11 and 20 ms
Z Number of calls came back in between 21 and 30 ms
P Number of calls came back in between 31 and 40 ms
T number of calls came back in between 41 and 50 ms
....
....
I number of calls came back in more than 100 ms
I would also like a way to configure the ranges. If I need to tweak the ranges in the future, I should be able to do so. How can I achieve this in my current program? Any suggestions will be of great help.
A histogram is a set of data arranged into "bins" of equal size. You should convert your time measurement to a bin and use that bin as the map key. This can be done simply by dividing your time value by the bin size. For example: time / 10L.
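A minimal sketch of that binning idea, with a configurable bin size and a single overflow bin for slow calls. The class and method names are hypothetical. It uses (time - 1) / binSize rather than time / binSize so that the ranges come out as 1-10, 11-20 and so on, as in the question, and it assumes the overflow threshold is a multiple of the bin size.

```java
import java.util.Map;
import java.util.TreeMap;

public class LatencyHistogram {
    private final long binSize; // width of each range in ms
    private final long maxMs;   // anything above this goes into one overflow bin
    private final Map<Long, Long> counts = new TreeMap<>();

    // Assumption: maxMs is a multiple of binSize.
    LatencyHistogram(long binSize, long maxMs) {
        this.binSize = binSize;
        this.maxMs = maxMs;
    }

    long binFor(long millis) {
        if (millis > maxMs) {
            return maxMs / binSize; // shared overflow bin
        }
        // Maps 1..10 to bin 0, 11..20 to bin 1, and so on for binSize 10;
        // a 0 ms reading is counted in bin 0 as well.
        return Math.max(0, millis - 1) / binSize;
    }

    void record(long millis) {
        counts.merge(binFor(millis), 1L, Long::sum);
    }

    long count(long bin) {
        return counts.getOrDefault(bin, 0L);
    }

    String label(long bin) {
        if (bin >= maxMs / binSize) {
            return "more than " + maxMs + " ms";
        }
        return (bin * binSize + 1) + "-" + ((bin + 1) * binSize) + " ms";
    }

    void print() {
        for (Map.Entry<Long, Long> e : counts.entrySet()) {
            System.out.println(e.getValue() + " calls came back in " + label(e.getKey()));
        }
    }

    public static void main(String[] args) {
        LatencyHistogram h = new LatencyHistogram(10, 100);
        long[] samples = {166, 40, 40, 41, 41, 41, 41, 42, 43, 44}; // timings from the question
        for (long s : samples) {
            h.record(s);
        }
        h.print();
    }
}
```

Changing the ranges later only means passing different constructor arguments, for example new LatencyHistogram(25, 200).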
Cross post from http://forums.oracle.com/forums/thread.jspa?threadID=2195025&tstart=0
There is a telecom application server (JAIN SLEE based) and the application running in it.
The application receives a message from the network, processes it, and sends a response back to the network.
The requirement for request/response latency is 250 ms for 95% of calls and 3000 ms for 99.999% of calls.
We use EDU.oswego.cs.dl.util.concurrent.ConcurrentHashMap, 1 instance. For one call (one call is several messages) processing the following methods are invoked:
"put", "get", "get", "get", then in 180 seconds "remove".
There are 4 threads which invoke these methods.
(A small note: working with ConcurrentHashMap is not the only activity. Also for one network message there are a lot of other activities: protocol message parsing, querying a DB, writing an SDR into a file, creating short living and long living objects.)
When we move from EDU.oswego.cs.dl.util.concurrent.ConcurrentHashMap to java.util.concurrent.ConcurrentHashMap, we see a performance degradation from 1400 to 800 calls per second.
With the new map, the first bottleneck that limits us to 800 calls per second is that latency no longer meets the requirement above.
This performance degradation is reproduced on hosts with the following CPU:
2 CPU x Quad-Core AMD Opteron 2356 2312 MHz, 8 HW threads in total,
2 CPU x Intel Xeon E5410 2.33 GHz, 8 HW threads in total.
It is not reproduced on X5570 CPU (Intel Xeon Nehalem X5570 2.93 GHz, 16 HW threads in total).
Has anybody faced similar issues? How can they be solved?
I assume you are talking about nanoseconds rather than milliseconds. (That is one million times smaller!)
OR the use of ConcurrentHashMap is a trivial portion of your delay.
EDIT: Have edited the example to be multi-threaded using 100 tasks.
/*
Average operation time for a map of 10,000,000 was 48 ns
Average operation time for a map of 5,000,000 was 51 ns
Average operation time for a map of 2,500,000 was 48 ns
Average operation time for a map of 1,250,000 was 46 ns
Average operation time for a map of 625,000 was 45 ns
Average operation time for a map of 312,500 was 44 ns
Average operation time for a map of 156,200 was 38 ns
Average operation time for a map of 78,100 was 34 ns
Average operation time for a map of 39,000 was 35 ns
Average operation time for a map of 19,500 was 37 ns
*/
public static void main(String... args) {
ExecutorService es = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
try {
for (int size = 100000; size >= 100; size /= 2)
test(es, size);
} finally {
es.shutdown();
}
}
private static void test(ExecutorService es, final int size) {
int tasks = 100;
final ConcurrentHashMap<Integer, String> map = new ConcurrentHashMap<Integer, String>(tasks*size);
List<Future> futures = new ArrayList<Future>();
long start = System.nanoTime();
for (int j = 0; j < tasks; j++) {
final int offset = j * size;
futures.add(es.submit(new Runnable() {
public void run() {
for (int i = 0; i < size; i++)
map.put(offset + i, "" + i);
int total = 0;
for (int j = 0; j < 10; j++)
for (int i = 0; i < size; i++)
total += map.get(offset + i).length();
for (int i = 0; i < size; i++)
map.remove(offset + i);
}
}));
}
try {
for (Future future : futures)
future.get();
} catch (Exception e) {
throw new AssertionError(e);
}
long time = System.nanoTime() - start;
System.out.printf("Average operation time for a map of %,d was %,d ns%n", size * tasks, time / tasks / 12 / size);
}
First, did you check that the hash map is indeed the culprit? Assuming that you did: there is a lock-free hash map designed to scale to hundreds of processors without introducing a lot of contention. It was written by Cliff Click, a well-known engineer from the original HotSpot compiler team, who now works on scaling the JDK to machines with hundreds of CPUs. So I assume that he knows what he is doing in that hash map implementation. More information about this hash map can be found in these slides.
Have you tried changing the concurrencyLevel of the ConcurrentHashMap? Try some lower values like 8, and try some bigger values. And remember that the performance and concurrency of ConcurrentHashMap depend on the quality of your hashCode function.
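For reference, the concurrency level is the third argument of the three-argument constructor of java.util.concurrent.ConcurrentHashMap. The sketch below uses a hypothetical helper and key/value types; the value 4 matches the four threads mentioned in the question, and the capacity is an arbitrary illustrative number. Note that from Java 8 onwards the implementation treats this argument only as a sizing hint, so its effect depends on the JDK in use.

```java
import java.util.concurrent.ConcurrentHashMap;

public class TunedMap {
    // Arguments of the constructor: initial capacity, load factor, concurrency level.
    static ConcurrentHashMap<String, String> create(int expectedEntries, int writerThreads) {
        return new ConcurrentHashMap<>(expectedEntries, 0.75f, writerThreads);
    }

    public static void main(String[] args) {
        // Hypothetical sizing: 100,000 live calls, 4 updating threads.
        ConcurrentHashMap<String, String> calls = create(100_000, 4);
        calls.put("call-1", "state");
        System.out.println(calls.get("call-1"));
        calls.remove("call-1");
        System.out.println(calls.size());
    }
}
```

Measuring a few different values under the real workload is the only reliable way to tell whether this setting matters for the hardware in question.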
And yes, java.util.concurrent.ConcurrentHashMap has the same origin (Doug Lea, from edu.oswego) as edu.oswego.cs.dl... , but he rewrote it completely so that it scales better.
I think it may be worth checking out the Javolution FastMap. It may be better suited to real-time applications.