Java: how can I avoid GC aka stop the world event?

Java: how can I avoid GC aka stop the world event? - java

Lets say I am reading a single incoming stream with millions of transaction per ms, is so fast that I can't afford to have a GC or the entire system will hang.
The functionality is very simple, it is merely to record every single packets that went pass NIC card. (hypothetical)
Is it even possible?
Are there design pattern for such implementation? I only know flyweight and resource pool design pattern.
Do I really need to code in C so that I can manage it?
1) I can have reasonable amount of ram but not ridiculous like 100gb (maybe 16gb)?
2) CPU processing is not an issue.
FAQ:
Must it be Java? No, please recommend me another language that can support most platform. (linux, aix, windows)

If you really want to handle everything passing through your network card. Java is the wrong language. Look into C possibly C++ or Assembler.
As you have been told a million transactions per milliseconds seem unrealistic, only achievable when you are able to split the work between multiple (read many many many) computers
There are many Garbage Collectors out there, go do some searching if anything is good for you.
If you really don't want the garbage collector to kick in, I think your only option is: Don't create garbage. Initialize an array of bytes as your memory to work in. Only use primitives, no Objects. It will be cumbersome, but it might be fast and I have been told this is the kind of stuff people working on real time systems do.

Assuming you meant millions of transactions per second, not ms, you can use Chronicle which promises up to 5-20m transactions per second, persisted.

I hope that millions of transactions per milliseconds is a joke or a hyperbole. No single computer can handle that much, particularly if a mere 100Gb counts as ridiculous amounts of RAM.
But ignoring the actual number of expected transactions, what is needed for this type of task is real-time Java. Provided that you want to stick with Java.
Either way you'll probably need a real-time OS first, because your application isn't going to be the only thing running on the computer and if the OS decides not to give your application control when it needs it, there's nothing you can do.
Update: If you only want to capture network traffic, don't reinvent the wheel, use libpcap.. AFAIK there's even a Java wrapper to it.

Related

"Give a rough estimate of the overhead incurred by each system call." - what? [duplicate]

I am a student in Computer Science and I am hearing the word "overhead" a lot when it comes to programs and sorts. What does this mean exactly?

It's the resources required to set up an operation. It might seem unrelated, but necessary.
It's like when you need to go somewhere, you might need a car. But, it would be a lot of overhead to get a car to drive down the street, so you might want to walk. However, the overhead would be worth it if you were going across the country.
In computer science, sometimes we use cars to go down the street because we don't have a better way, or it's not worth our time to "learn how to walk".

The meaning of the word can differ a lot with context. In general, it's resources (most often memory and CPU time) that are used, which do not contribute directly to the intended result, but are required by the technology or method that is being used. Examples:
Protocol overhead: Ethernet frames, IP packets and TCP segments all have headers, TCP connections require handshake packets. Thus, you cannot use the entire bandwidth the hardware is capable of for your actual data. You can reduce the overhead by using larger packet sizes and UDP has a smaller header and no handshake.
Data structure memory overhead: A linked list requires at least one pointer for each element it contains. If the elements are the same size as a pointer, this means a 50% memory overhead, whereas an array can potentially have 0% overhead.
Method call overhead: A well-designed program is broken down into lots of short methods. But each method call requires setting up a stack frame, copying parameters and a return address. This represents CPU overhead compared to a program that does everything in a single monolithic function. Of course, the added maintainability makes it very much worth it, but in some cases, excessive method calls can have a significant performance impact.

You're tired and cant do any more work. You eat food. The energy spent looking for food, getting it and actually eating it consumes energy and is overhead!
Overhead is something wasted in order to accomplish a task. The goal is to make overhead very very small.
In computer science lets say you want to print a number, thats your task. But storing the number, the setting up the display to print it and calling routines to print it, then accessing the number from variable are all overhead.

Wikipedia has us covered:
In computer science, overhead is
generally considered any combination
of excess or indirect computation
time, memory, bandwidth, or other
resources that are required to attain
a particular goal. It is a special
case of engineering overhead.

Overhead typically reffers to the amount of extra resources (memory, processor, time, etc.) that different programming algorithms take.
For example, the overhead of inserting into a balanced Binary Tree could be much larger than the same insert into a simple Linked List (the insert will take longer, use more processing power to balance the Tree, which results in a longer percieved operation time by the user).

For a programmer overhead refers to those system resources which are consumed by your code when it's running on a giving platform on a given set of input data. Usually the term is used in the context of comparing different implementations or possible implementations.
For example we might say that a particular approach might incur considerable CPU overhead while another might incur more memory overhead and yet another might weighted to network overhead (and entail an external dependency, for example).
Let's give a specific example: Compute the average (arithmetic mean) of a set of numbers.
The obvious approach is to loop over the inputs, keeping a running total and a count. When the last number is encountered (signaled by "end of file" EOF, or some sentinel value, or some GUI buttom, whatever) then we simply divide the total by the number of inputs and we're done.
This approach incurs almost no overhead in terms of CPU, memory or other resources. (It's a trivial task).
Another possible approach is to "slurp" the input into a list. iterate over the list to calculate the sum, then divide that by the number of valid items from the list.
By comparison this approach might incur arbitrary amounts of memory overhead.
In a particular bad implementation we might perform the sum operation using recursion but without tail-elimination. Now, in addition to the memory overhead for our list we're also introducing stack overhead (which is a different sort of memory and is often a more limited resource than other forms of memory).
Yet another (arguably more absurd) approach would be to post all of the inputs to some SQL table in an RDBMS. Then simply calling the SQL SUM function on that column of that table. This shifts our local memory overhead to some other server, and incurs network overhead and external dependencies on our execution. (Note that the remote server may or may not have any particular memory overhead associated with this task --- it might shove all the values immediately out to storage, for example).
Hypothetically we might consider an implementation over some sort of cluster (possibly to make the averaging of trillions of values feasible). In this case any necessary encoding and distribution of the values (mapping them out to the nodes) and the collection/collation of the results (reduction) would count as overhead.
We can also talk about the overhead incurred by factors beyond the programmer's own code. For example compilation of some code for 32 or 64 bit processors might entail greater overhead than one would see for an old 8-bit or 16-bit architecture. This might involve larger memory overhead (alignment issues) or CPU overhead (where the CPU is forced to adjust bit ordering or used non-aligned instructions, etc) or both.
Note that the disk space taken up by your code and it's libraries, etc. is not usually referred to as "overhead" but rather is called "footprint." Also the base memory your program consumes (without regard to any data set that it's processing) is called its "footprint" as well.

Overhead is simply the more time consumption in program execution. Example ; when we call a function and its control is passed where it is defined and then its body is executed, this means that we make our CPU to run through a long process( first passing the control to other place in memory and then executing there and then passing the control back to the former position) , consequently it takes alot performance time, hence Overhead. Our goals are to reduce this overhead by using the inline during function definition and calling time, which copies the content of the function at the function call hence we dont pass the control to some other location, but continue our program in a line, hence inline.

You could use a dictionary. The definition is the same. But to save you time, Overhead is work required to do the productive work. For instance, an algorithm runs and does useful work, but requires memory to do its work. This memory allocation takes time, and is not directly related to the work being done, therefore is overhead.

You can check Wikipedia. But mainly when more actions or resources are used. Like if you are familiar with .NET there you can have value types and reference types. Reference types have memory overhead as they require more memory than value types.

A concrete example of overhead is the difference between a "local" procedure call and a "remote" procedure call.
For example, with classic RPC (and many other remote frameworks, like EJB), a function or method call looks the same to a coder whether its a local, in memory call, or a distributed, network call.
For example:
service.function(param1, param2);
Is that a normal method, or a remote method? From what you see here you can't tell.
But you can imagine that the difference in execution times between the two calls are dramatic.
So, while the core implementation will "cost the same", the "overhead" involved is quite different.

Think about the overhead as the time required to manage the threads and coordinate among them. It is a burden if the thread does not have enough task to do. In such a case the overhead cost over come the saved time through using threading and the code takes more time than the sequential one.

To answer you, I would give you an analogy of cooking Rice, for example.
Ideally when we want to cook, we want everything to be available, we want pots to be already clean, rice available in enough quantities. If this is true, then we take less time to cook our rice( less overheads).
On the other hand, let's say you don't have clean water available immediately, you don't have rice, therefore you need to go buy it from the shops first and you need to also get clean water from the tap outside your house. These extra tasks are not standard or let me say to cook rice you don't necessarily have to spend so much time gathering your ingredients. Ideally, your ingredients must be present at the time of wanting to cook your rice.
So the cost of time spent in going to buy your rice from the shops and water from the tap are overheads to cooking rice. They are costs that we can avoid or minimize, as compared to the standard way of cooking rice( everything is around you, you don't have to waste time gathering your ingredients).
The time wasted in collecting ingredients is what we call the Overheads.
In Computer Science, for example in multithreading, communication overheads amongst threads happens when threads have to take turns giving each other access to a certain resource or they are passing information or data to each other. Overheads happen due to context switching.Even though this is crucial to them but it's the wastage of time (CPU cycles) as compared to the traditional way of single threaded programming where there is never a time wastage in communication. A single threaded program does the work straight away.

its anything other than the data itself, ie tcp flags, headers, crc, fcs etc..

List of performance improvement features that we can implement in java

May be this is a well known question, But i didn't find the best reference for this ques...
what is the formula to calculate and assign the default u-limit, verbose (for gc) and max heap memory value?
If there is no specific formula, what is the criteria to specify this for a particular machine.
If possible could anyone please explain these concepts also.
Is there any other concepts we need to consider for performance improvement?
How to tune the JVM for better performance,

Stop what you're doing right now.
Tuning the JVM is probably the last thing you should worry about. Until you've gone through every other performance trick in the book, the default settings should be just fine.
Firstly you need to profile your application and find out where the bottlenecks are. Specifically, you will want to know:
What functions /methods are consuming the majority of CPU time?
Where are all the memory allocations happening?
What kind of objects are taking up most space on the heap?
Then you should apply targeted optimisations to the areas that are causing problems. There are thousands of valid techniques, but here are the ones that I find are most useful:
Improve algorithms - anything that is taking up a decent chunk of CPU time and has complexity of O(n^2) or worse is probably a good candidate for improvement. Try to get it to O(n log n) or better.
Share immutable data - if you have a lot of copies of the same data then it makes sense to turn these into immutable objects and share a single instance. This can save a lot of memory (and has the nice effect of improving thread safety / concurrency)
Use primitive types - replace Integer with int etc. This saves memory and makes numerical operations faster.
Be lazy - don't compute things until they are definitely needed.
Cache things - if something is expensive to compute but frequently requested, store it in a cache after the first request. Use a cache backed by a SoftHashMap so that the memory can still be released if needed.
Offload work - Can you make use of multiple cores? Can the client application do some of the work for you?
After making any changes you then need to profile again. At the very least, you will want to confirm that your optimisations actually helped. Additionally, fixing one bottleneck will usually move the bottleneck to another part of the application. So you will need to identify the new place to focus next.
Repeat until your application is fast enough (as defined by your own or your customers' requirements).

Simple Multi-Threading in Java

Currently, I'm running on a thread-less model that isn't working simply because I'm running out of memory before I can process the data I'm being handed. I've made all the changes that I can to optimize the code, and it's still just not quite quick enough.
Clearly I should move on to a threaded model. I'm wondering what the simplest, easiest way to do the following is:
The main thread passes some info to the worker
That worker performs some work that I'll refactor out of the main method
The workers will disappear and new ones will be instantiated when needed
I've never worked with java threading and from what I've read up on it seems pretty complicated, even if what I'm looking for seems pretty simple.

If you have multiple independent units of work of equal priority, the best solution is generally some sort of work queue, where a limited number of threads (the number chosen to optimize performance) sit in a while(true) loop dequeuing work units from the queue and executing them.
Generally the optimum number of threads is going to be the number of processors +/- 1, though in some cases a larger number will be optimal if the threads tend to get stalled by disk I/O requests or some such.
But keep in mind that tuning the entire system may be required. Eg, you may need more disk arms, and certainly more RAM may be required.

I'd start by having a read through Java Concurrency as refresher ;)
In particular, I would spend some time getting to know the Executors API as it will do most of what you've described without a lot of the overhead of dealing with to many locks ;)

Distributing the memory consumption to multiple threads will not change overall memory consumption. From what I read out of your question, I would like to step forward and tell you: Increase the heap of the Java engine, this will help. Looks like you have to optimize the Java startup parameters and not your code. If I am wrong, then you will have to buffer the data. To Disk! Not to a thread in the same memory model.

Java overall arrays limit?

I am implementing a program that has about 2,000,000 (2 million) arrays each of size 16,512 (128 x 129) of integers. I only need to call 200 arrays at a time (that is 3.3 MB), but I wonder if I can expand the program to have more than 2 million (say 200 million) but I still only need to call 200 arrays at a time. So what is the limit of making more and more arrays while I don't use more than 200 arrays at a time?

I highly doubt that, unless you're running on a 64 bit machine with a lot of RAM and a very generous heap.
Let's calculate the memory you'll need for your data:
2,000,000*128*129*8/1024/1024/1024 = 30.8GB.
You'll need additional RAM for the JVM, the rest of your program, and the operating system.
Sounds like a poorly conceived solution to me.
If you mean "I only have 200 arrays in memory at a time" you can certainly do that, but you'll have to move the rest out to secondary storage or a relational database. Query for them, use them, GC them. It might not be the best solution, but it's hard to tell based on the little you've posted.
Update:
Does "trigger" mean "database trigger"?
Yes, you can store them on the disk. I can't guarantee that it'll perform. Your hard drive can certainly handle 30GB of data; it's feasible that it'll accomodate 300GB if it's large enough.
Just remember that you have to think about how you'll manage RAM. GC thrashing might be a problem. A good caching solution might be your friend here. Don't write one yourself.
What happens if that hard drive fails and you lose all that data? Do you back it up? Can your app afford to be down if the disk fails? Think about those scenarios, too. Good luck.

As long as you increase max heap size to make sure your application doesn't run out of memory, you shuold be fine.

As long as you don't keep references to arrays you no longer need, there is no hard limit. Old arrays will automatically get garbage collected, so you can keep allocating and abandoning arrays pretty much ad infinitum.
There is, of course, a limit on how many arrays you can keep around at any given time. This is limited by the amount of memory available to the JVM.

Java Disruptor pattern and low latency

Q1) Does anyone familiar with the Java Disruptor pattern know the size of messages they benchmarked their results against? I am writing a similar system (out of pure interest) and when I read the description of their testing there is no mention of the message size sent?
http://code.google.com/p/disruptor/wiki/PerformanceResults
Q2) Is the disruptor for computer to computer communications, or inter-process? I originally had the impression it was for computer to computer but their work is labelled "inter thread" messaging library?

Disruptor is not just within the same machine, it is withing a single process. When they say "inter-thread", they mean that it is for sending messages between threads of one process.
The message size is actually almost irrelevant because the messages don't get copied. The messages are all fixed at the beginning and reused, so it doesn't really matter how big they are.

Although Im not entirely familiar, just exploring it...
1) It looks like from the perf test folder in the src that they are using the ValueEvent class, which just holds a long, there is also some other xxxEvent classes that are used in other perf tests that are slightly bigger but from what i can gather so far, only a long is used within the ring buffer.
2) I would assume it is for completely same machine inter thread comms. the latency & uncertainty of comms across machines would make it extremely slow. (relatively) and then the project would also need to deal with socket comms, which I haven't seen in this lib.

1,Disruptor not care the size of message. but result should be linearly down by size of message(workload increased, the speed decreased)
In deed it's not care the message.
The KEY of the library is the ID of buffer. pointer, position, cursor, indicator, all both mean the same.
Disruptor self call it as "sequence"
Once the ID got, the whole world only owned by you!:) so ONLY one writer. the real key point.:)
2,not C2C, nor P2P:). just T2T. the T is thread. peter-lawrey have a great library Java-Chronicle, can be used in P2P case. a new article on java dzone: http://java.dzone.com/articles/ultra-fast-reliable-messaging
3, the core pattern should be capable to clone to cross boundary use cases. every thing is ID.
As to the message, customerized.
4, another important point, is the cache of volatile. a great example on github
5, JDK8 intro a new annotation #Contended, seems sexy. details about contended

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.