Which is faster:
if (this.foo != 1234)
    this.foo = 1234;
or
this.foo = 1234;
Is the penalty of a write high enough that one should check the value before writing, or is it faster to just write?
Wouldn't having a branch risk mispredictions and stall the CPU pipeline? But what if the field is volatile, with writes having a higher cost than reads?
Yes, it is easy to say that in isolation these operations are 'free', or to benchmark it, but that is not an answer.
There is a nice example illustrating this dilemma in the very recent talk by Sergey Kuksenko about hardware counters (slides 45-49), where the right answer for the same code depends on the data size! The idea is that the "compare and set" approach causes more branch misses and loads, but fewer stores and L1 store misses. The difference is subtle, and I can't even rationalize why one set of factors outweighs the other on small data sizes but becomes less significant on large data sizes.
So, measure, don't guess.
Both those operations are free: they really take almost no time!
Now if this code is in a loop, you should definitely favor the second option as it will minimize branch mispredictions.
Otherwise, what matters here is what makes the code more readable. And again, in my opinion, the second option is clearer.
Also, as mentioned in the comments, the assignment is an atomic operation (for references and for primitives other than non-volatile long and double), whereas the test-then-write sequence is not. Another advantage for the second option.
They are not free. They cost time and space. And branching in a tight loop can actually be very costly because of branch prediction (kids these days and their modern CPUs). See Mysticial's answer.
But mostly it's pointless. Either just set to what it should be or throw when it's not what you expect.
Any code you make me read had better have a good reason to exist.
What I think you are trying to do is express what you expect its value to be and assert that it should be that. Without context I can't tell you whether you should throw when your expectations are violated or simply assign to assert what it should be. But making your expectations clear and enforcing them one way or another is certainly worth a few CPU cycles. I'd rather you were a little slower than quickly giving me garbage in and garbage out.
I believe this is actually a general question rather than a Java-related one, because these operations are so low-level (CPU level, not JVM level).
First of all, let's see what the choice is. On one hand we have reading from memory + comparison + (optionally) writing to memory; on the other hand, just writing to memory.
Memory access is much more expensive than register operations (operations on data already loaded into the CPU). Therefore, the choice is read + (sometimes) write vs. write.
What is more expensive, read or write? Short answer: write. Long answer: write, but the difference is probably small and depends on the system's caching strategy. It is not easy to explain in a few words; you can learn more about caching in the beautiful book "Operating Systems" by William Stallings.
I think in practice you can ignore the distinction between read and write operations and just write without a test. That is because (returning to Java) your object with all its fields will most likely be in cache at that moment.
Another thing to consider is branch prediction - others already mentioned that this is the reason to just write value without test too.
It depends on what you're really interested in.
If this is a plain old vanilla program, not only does the fetch/compare/branch of the first scheme take extra time, but it's extra code and complexity, and even if the first scheme did actually save a minuscule amount of time (instead of costing time) it wouldn't be worth doing.
However, there are scenarios where it would be different. In an intensely multi-threaded environment with multiple processors, modifying shared storage can be expensive, since changes to that storage need to be propagated to other processors. In such an environment it could be well worth it to spend a few extra instructions to avoid "dirtying" the cache.
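A minimal sketch of that test-before-write idea for a shared field. The class name `SharedFlag` and its methods are hypothetical, not from the question. The assumption is that the field is read by many threads and rarely changes, so skipping a redundant store may avoid invalidating the cache line on other cores. Whether it actually pays off has to be measured.

```java
final class SharedFlag {
    // Hypothetical shared field, read and written by many threads.
    private volatile int foo;

    /** Unconditional write: always issues a volatile store. */
    void set(int value) {
        foo = value;
    }

    /**
     * Test-then-write: skips the store when the value is already there.
     * Returns true if a store was actually issued.
     */
    boolean setIfDifferent(int value) {
        if (foo != value) {
            foo = value;
            return true;
        }
        return false;
    }

    int get() {
        return foo;
    }
}
```

The check and the write are two separate steps, so this is not an atomic compare-and-set. It is only safe here because every writer would store the same kind of idempotent value.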
Related
When I was learning C, I was taught that if I wanted to loop strlen(string) times, I should save that value in an 'auxiliary' variable, say count, and put that in the for condition clause instead of the strlen call, so that it is not recomputed on every iteration.
Now that I have started learning Java, I noticed this is not quite the norm. I've seen lots of code and programmers doing just what I was told not to do in C.
What's the reason for this? Are they trading efficiency for 'readability'? Or does the compiler manage to fix that?
E: This is NOT a duplicate of the linked question. I'm not simply asking about string length; mine is a more general question.
In the old times, every function call was expensive, compilers were dumb, usable profilers yet to come, and computers slow. This way C macros and other terrible things were born. Java is not that old.
Efficiency is important, but the impact of most program parts on efficiency is very small. Reading code, however, takes programmers' time, and this is much more costly than CPU time. So we'd better optimize for readability most of the time and care about speed only in the most important places.
A local variable can make the code simpler, when it avoids repetitions of complicated expressions - this happens sometimes. It can make it faster, when it avoids expensive computation which the compiler can't do - this happens rather rarely. When neither condition is met, it's just a wasted line, so why bother?
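To illustrate the two cases, here is a small sketch. The class `LoopBounds` and the method `expensiveLimit` are made up for this example; `expensiveLimit` stands in for any computation the compiler cannot prove to be loop-invariant.

```java
import java.util.List;

final class LoopBounds {
    // Hypothetical stand-in for an expensive, loop-invariant computation.
    static int expensiveLimit(List<String> items) {
        int limit = 0;
        for (String s : items) {
            limit += s.length();
        }
        return limit;
    }

    // Cheap accessor in the loop condition: an extra variable buys nothing.
    static int countNonEmpty(List<String> items) {
        int count = 0;
        for (int i = 0; i < items.size(); i++) {
            if (!items.get(i).isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // Expensive call: evaluate it once and keep the result in a local.
    static int sumBelowLimit(List<String> items) {
        int limit = expensiveLimit(items);
        int sum = 0;
        for (int i = 0; i < limit; i++) {
            sum += i;
        }
        return sum;
    }
}
```

Writing `i < expensiveLimit(items)` in the second loop would recompute the limit on every iteration unless the JIT happened to hoist it, which is the rare case where the local variable is worth its line.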
I am a student in Computer Science and I am hearing the word "overhead" a lot when it comes to programs and sorts. What does this mean exactly?
It's the resources required to set up an operation. It might seem unrelated, but necessary.
It's like when you need to go somewhere, you might need a car. But, it would be a lot of overhead to get a car to drive down the street, so you might want to walk. However, the overhead would be worth it if you were going across the country.
In computer science, sometimes we use cars to go down the street because we don't have a better way, or it's not worth our time to "learn how to walk".
The meaning of the word can differ a lot with context. In general, it's resources (most often memory and CPU time) that are used, which do not contribute directly to the intended result, but are required by the technology or method that is being used. Examples:
Protocol overhead: Ethernet frames, IP packets and TCP segments all have headers, TCP connections require handshake packets. Thus, you cannot use the entire bandwidth the hardware is capable of for your actual data. You can reduce the overhead by using larger packet sizes and UDP has a smaller header and no handshake.
Data structure memory overhead: A linked list requires at least one pointer for each element it contains. If the elements are the same size as a pointer, this means a 50% memory overhead, whereas an array can potentially have 0% overhead.
Method call overhead: A well-designed program is broken down into lots of short methods. But each method call requires setting up a stack frame, copying parameters and a return address. This represents CPU overhead compared to a program that does everything in a single monolithic function. Of course, the added maintainability makes it very much worth it, but in some cases, excessive method calls can have a significant performance impact.
You're tired and can't do any more work. You eat food. Looking for food, getting it and actually eating it consumes energy, and that is overhead!
Overhead is something wasted in order to accomplish a task. The goal is to make overhead very, very small.
In computer science, let's say you want to print a number; that's your task. But storing the number, setting up the display to print it, calling routines to print it, and then accessing the number from a variable are all overhead.
Wikipedia has us covered:
"In computer science, overhead is generally considered any combination of excess or indirect computation time, memory, bandwidth, or other resources that are required to attain a particular goal. It is a special case of engineering overhead."
Overhead typically refers to the amount of extra resources (memory, processor, time, etc.) that different programming algorithms take.
For example, the overhead of inserting into a balanced binary tree could be much larger than the same insert into a simple linked list (the insert will take longer and use more processing power to balance the tree, which results in a longer perceived operation time for the user).
For a programmer, overhead refers to those system resources which are consumed by your code when it's running on a given platform on a given set of input data. Usually the term is used in the context of comparing different implementations or possible implementations.
For example, we might say that a particular approach might incur considerable CPU overhead, while another might incur more memory overhead, and yet another might be weighted towards network overhead (and entail an external dependency, for example).
Let's give a specific example: Compute the average (arithmetic mean) of a set of numbers.
The obvious approach is to loop over the inputs, keeping a running total and a count. When the last number is encountered (signaled by "end of file" EOF, or some sentinel value, or some GUI button, whatever) we simply divide the total by the number of inputs and we're done.
This approach incurs almost no overhead in terms of CPU, memory or other resources. (It's a trivial task).
Another possible approach is to "slurp" the input into a list, iterate over the list to calculate the sum, then divide that by the number of valid items in the list.
By comparison, this approach might incur arbitrary amounts of memory overhead.
In a particularly bad implementation we might perform the sum operation using recursion but without tail-call elimination. Now, in addition to the memory overhead for our list, we're also introducing stack overhead (which is a different sort of memory and is often a more limited resource than other forms of memory).
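The first two approaches can be sketched side by side. The class and method names below are invented for illustration. Both methods return the same mean; they differ only in how much extra memory they hold on to while computing it.

```java
import java.util.ArrayList;
import java.util.List;

final class Averages {
    // Running total and count: constant extra memory regardless of input size.
    static double streamingMean(Iterable<Double> values) {
        double total = 0.0;
        long count = 0;
        for (double v : values) {
            total += v;
            count++;
        }
        return count == 0 ? Double.NaN : total / count;
    }

    // "Slurp" everything into a list first: extra memory grows with the input.
    static double slurpedMean(Iterable<Double> values) {
        List<Double> copy = new ArrayList<>();
        for (Double v : values) {
            copy.add(v);
        }
        double total = 0.0;
        for (double v : copy) {
            total += v;
        }
        return copy.isEmpty() ? Double.NaN : total / copy.size();
    }
}
```

The list in `slurpedMean` is pure memory overhead for this task, since nothing in the computation needs the values again after they have been added to the total.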
Yet another (arguably more absurd) approach would be to post all of the inputs to some SQL table in an RDBMS and then simply call the SQL SUM function on that column of that table. This shifts our local memory overhead to some other server, and incurs network overhead and external dependencies on our execution. (Note that the remote server may or may not have any particular memory overhead associated with this task; it might shove all the values immediately out to storage, for example).
Hypothetically we might consider an implementation over some sort of cluster (possibly to make the averaging of trillions of values feasible). In this case any necessary encoding and distribution of the values (mapping them out to the nodes) and the collection/collation of the results (reduction) would count as overhead.
We can also talk about the overhead incurred by factors beyond the programmer's own code. For example, compilation of some code for 32- or 64-bit processors might entail greater overhead than one would see for an old 8-bit or 16-bit architecture. This might involve larger memory overhead (alignment issues) or CPU overhead (where the CPU is forced to adjust bit ordering or use non-aligned instructions, etc.) or both.
Note that the disk space taken up by your code and its libraries, etc. is not usually referred to as "overhead" but rather is called "footprint." The base memory your program consumes (without regard to any data set that it's processing) is called its "footprint" as well.
Overhead is simply extra time consumed during program execution. For example, when we call a function, control is passed to where the function is defined, its body is executed, and control is then passed back to the caller. The CPU has to go through this longer process (jumping to another place in memory, executing there, and jumping back), which costs time; hence overhead. One way to reduce this overhead is to declare the function inline (where the language offers it), which copies the body of the function to the call site, so control is not passed to some other location and the program continues in a line; hence "inline".
You could use a dictionary. The definition is the same. But to save you time, Overhead is work required to do the productive work. For instance, an algorithm runs and does useful work, but requires memory to do its work. This memory allocation takes time, and is not directly related to the work being done, therefore is overhead.
You can check Wikipedia. But mainly, overhead is when more actions or resources are used than the task itself strictly needs. For example, if you are familiar with .NET, you can have value types and reference types there. Reference types have memory overhead, as they require more memory than value types.
A concrete example of overhead is the difference between a "local" procedure call and a "remote" procedure call.
For example, with classic RPC (and many other remote frameworks, like EJB), a function or method call looks the same to a coder whether it's a local, in-memory call or a distributed, network call.
For example:
service.function(param1, param2);
Is that a normal method, or a remote method? From what you see here you can't tell.
But you can imagine that the difference in execution time between the two calls is dramatic.
So, while the core implementation will "cost the same", the "overhead" involved is quite different.
Think about the overhead as the time required to manage the threads and coordinate among them. It is a burden if a thread does not have enough work to do. In such a case the overhead cost outweighs the time saved by threading, and the code takes more time than the sequential version.
To answer you, I would give you an analogy of cooking Rice, for example.
Ideally, when we want to cook, we want everything to be available: we want the pots to be already clean and rice available in sufficient quantity. If this is true, then we take less time to cook our rice (less overhead).
On the other hand, let's say you don't have clean water available immediately and you don't have rice, so you need to go and buy rice from the shops first and also fetch clean water from the tap outside your house. These extra tasks are not part of cooking rice as such; to cook rice you don't necessarily have to spend so much time gathering your ingredients. Ideally, your ingredients are already present at the time you want to cook.
So the time spent going to buy your rice from the shops and fetching water from the tap is overhead to cooking rice. These are costs that we can avoid or minimize, compared to the standard way of cooking rice (everything is around you, and you don't have to waste time gathering your ingredients).
The time wasted in collecting ingredients is what we call overhead.
In computer science, for example in multithreading, communication overhead among threads occurs when threads have to take turns giving each other access to a certain resource, or when they pass information or data to each other. Overhead also arises from context switching. Even though this is crucial to the threads, it is wasted time (CPU cycles) compared to the traditional way of single-threaded programming, where no time is ever wasted on communication. A single-threaded program does the work straight away.
It's anything other than the data itself, i.e. TCP flags, headers, CRC, FCS, etc.
Maybe this is a well-known question, but I didn't find the best reference for this ques...
What is the formula to calculate and assign the default ulimit, verbose (for GC) and max heap memory values?
If there is no specific formula, what are the criteria for specifying these for a particular machine?
If possible, could anyone please explain these concepts as well?
Are there any other concepts we need to consider for performance improvement?
How do I tune the JVM for better performance?
Stop what you're doing right now.
Tuning the JVM is probably the last thing you should worry about. Until you've gone through every other performance trick in the book, the default settings should be just fine.
Firstly you need to profile your application and find out where the bottlenecks are. Specifically, you will want to know:
What functions/methods are consuming the majority of CPU time?
Where are all the memory allocations happening?
What kind of objects are taking up most space on the heap?
Then you should apply targeted optimisations to the areas that are causing problems. There are thousands of valid techniques, but here are the ones that I find are most useful:
Improve algorithms - anything that is taking up a decent chunk of CPU time and has complexity of O(n^2) or worse is probably a good candidate for improvement. Try to get it to O(n log n) or better.
Share immutable data - if you have a lot of copies of the same data then it makes sense to turn these into immutable objects and share a single instance. This can save a lot of memory (and has the nice effect of improving thread safety / concurrency)
Use primitive types - replace Integer with int etc. This saves memory and makes numerical operations faster.
Be lazy - don't compute things until they are definitely needed.
Cache things - if something is expensive to compute but frequently requested, store it in a cache after the first request. Use a cache backed by soft references (a SoftHashMap-style map) so that the memory can still be released if needed.
Offload work - Can you make use of multiple cores? Can the client application do some of the work for you?
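As a rough illustration of the "Cache things" item: the JDK does not ship a class called SoftHashMap, so the sketch below assumes a small hand-rolled cache built from `ConcurrentHashMap` and `SoftReference`. The class name `SoftCache` and the `loader` parameter are made up for this example.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class SoftCache<K, V> {
    // Values are held through soft references, so the GC may clear them
    // under memory pressure instead of throwing OutOfMemoryError.
    private final Map<K, SoftReference<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    // The loader is assumed to be expensive and to never return null.
    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        SoftReference<V> ref = entries.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Missing or already collected: recompute and store again.
            value = loader.apply(key);
            entries.put(key, new SoftReference<>(value));
        }
        return value;
    }
}
```

Under concurrent access the loader may occasionally run more than once for the same key. That is acceptable for a pure function and keeps the sketch free of locking.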
After making any changes you then need to profile again. At the very least, you will want to confirm that your optimisations actually helped. Additionally, fixing one bottleneck will usually move the bottleneck to another part of the application. So you will need to identify the new place to focus next.
Repeat until your application is fast enough (as defined by your own or your customers' requirements).
Is there a way to tell how expensive an operation is for the processor, in milliseconds or flops?
I would be interested in "instanceof" and casts (I heard they are very "expensive").
Are there some studies about that?
It will depend on which JVM you're using, and the cost of many operations can vary even within the same JVM, depending on the exact situation and how much optimization the JIT has performed.
For example, a virtual method call can still be inlined by the Hotspot JIT - so long as it hasn't been overridden by anything else. In some cases with the server JIT it can still be inlined with a quick type test, for up to a couple of types.
Basically, JITs are complex enough that there's unlikely to be a meaningful general-purpose answer to the question. You should benchmark your own specific situation in as real-world a way as possible. You should usually write code with the primary goals of simplicity and readability, but measure the performance regularly.
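For a very rough first look, a timing loop along these lines can be used. This is a crude sketch with invented names (`CrudeTimer`, `nanosPerCall`), not a substitute for a proper harness such as JMH. It warms the code up first so that the JIT has a chance to compile it, and it consumes the results so that the work is not optimized away entirely.

```java
import java.util.function.IntSupplier;

final class CrudeTimer {
    // Written to after each run so the JIT cannot discard the measured work.
    static volatile int blackhole;

    // Returns the average nanoseconds per call after a warm-up phase.
    static double nanosPerCall(IntSupplier op, int warmup, int iterations) {
        int sink = 0;
        for (int i = 0; i < warmup; i++) {
            sink += op.getAsInt();
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += op.getAsInt();
        }
        long elapsed = System.nanoTime() - start;
        blackhole = sink;
        return (double) elapsed / iterations;
    }
}
```

A call such as `CrudeTimer.nanosPerCall(() -> o instanceof String ? 1 : 0, 100_000, 1_000_000)` gives a ballpark figure only. The number includes the cost of the lambda call and varies between runs, JVMs and machines.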
The days when counting instructions or cycles could give you a good idea of the performance of some code are long gone, thanks to many, many optimizations happening on all levels of software execution.
This is especially true for VM-based languages, where the JVM can simply skip some steps because it knows that they're not necessary.
For example, I've read some time ago in an article (I'll try to find and link it eventually) that these two methods are pretty much equivalent in cost (on the HotSpot JVM, that is):
public void frobnicate1(Object o) {
    // explicit type check before the cast
    if (!(o instanceof SomeClass)) {
        throw new IllegalArgumentException("Oh Noes!");
    }
    frobnicateSomeClass((SomeClass) o);
}
public void frobnicate2(Object o) {
    // the cast alone performs the type check
    frobnicateSomeClass((SomeClass) o);
}
Obviously the first method does more work, but the JVM knows that the type of o has already been checked in the if and can actually skip the type-check on the cast later on and make it a no-op.
This and many other optimizations make counting "flops" or cycles pretty much useless.
Generally speaking an instanceof check is relatively cheap. On the HotSpot JVM it boils down to a numeric check of the type id in the object header.
This classic article describes why you should "Write Dumb Code".
There's also an article from 2002 that describes how instanceof is optimized in the HotSpot JVM.
Once the JVM has warmed up, most operations can be counted in nanoseconds (millionths of a millisecond). When talking about something being expensive, you usually have to say it's expensive relative to an alternative. It's next to impossible to describe something as expensive in all cases.
Usually, the most important expense is your time (and that of the other developers in your team). Using instanceof can be expensive in development and code support time because it often indicates a poor design. Using proper OOP techniques is usually a better idea. The 10 nanoseconds an instanceof might take are usually relatively trivial.
The cost of specific operations performed inside the CPU is almost never relevant for performance. If performance is bad, it's almost always because of IO (network, disk) or inefficient code. Writing efficient code is much more about finding a way to reduce the overall number of operations than about avoiding "costly" operations (except those that are orders of magnitude more costly, like IO).
I recently came across this in some code - basically someone trying to create a large object, coping when there's not enough heap to create it:
try {
    // try to perform an operation using a huge in-memory array
    byte[] massiveArray = new byte[BIG_NUMBER];
} catch (OutOfMemoryError oome) {
    // perform the operation in some slower but less
    // memory intensive way...
}
This doesn't seem right, since Sun themselves recommend that you shouldn't try to catch Error or its subclasses. We discussed it, and another idea that came up was explicitly checking for free heap:
if (Runtime.getRuntime().freeMemory() > SOME_MEMORY) {
    // quick memory-intensive approach
} else {
    // slower, less demanding approach
}
Again, this seems unsatisfactory - particularly in that picking a value for SOME_MEMORY is difficult to easily relate to the job in question: for some arbitrary large object, how can I estimate how much memory its instantiation might need?
Is there a better way of doing this? Is it even possible in Java, or is any idea of managing memory below the abstraction level of the language itself?
Edit 1: in the first example, it might actually be feasible to estimate the amount of memory a byte[] of a given length might occupy, but is there a more generic way that extends to arbitrary large objects?
Edit 2: as @erickson points out, there are ways to estimate the size of an object once it's created, but (ignoring a statistical approach based on previous object sizes) is there a way of doing so for yet-uncreated objects?
There also seems to be some debate as to whether it's reasonable to catch OutOfMemoryError - anyone know anything conclusive?
freeMemory() isn't quite right. You'd also have to add maxMemory() - totalMemory(). For example, assuming you start up the VM with a maximum heap of 100M, the JVM may at the time of your method call only be using (from the OS) 50M. Of that, let's say 30M is actually in use by the JVM. That means you'll show 20M free (roughly, because we're only talking about the heap here), but if you try to make your larger object, it'll attempt to grab the other 50M its contract allows it to take from the OS before giving up and erroring. So you'd actually (theoretically) have 70M available.
To make this more complicated, the 30M it reports as in use in the above example includes stuff that may be eligible for garbage collection. So you may actually have more memory available, if it hits the ceiling it'll try to run a GC to free more memory.
You can try to get around this bit by manually triggering System.gc(), except that that's not such a terribly good thing to do because
- it's not guaranteed to run immediately
- it will stop everything in its tracks while it runs
Your best bet (assuming you can't easily rewrite your algorithm to deal with smaller memory chunks, or write to a memory-mapped file, or something less memory intensive) might be to do a safe rough estimate of the memory needed and ensure that it's available before you run your function.
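The arithmetic from the 100M example above can be written down as a small helper. The class name `HeapEstimate` is invented for this sketch. The result is only an estimate, since it ignores garbage that a collection could still reclaim and says nothing about what other threads allocate in the meantime.

```java
final class HeapEstimate {
    // free + (max - total): what is free now plus what the heap may still grow by.
    static long available(long maxBytes, long totalBytes, long freeBytes) {
        return freeBytes + (maxBytes - totalBytes);
    }

    // The same estimate, read from the running JVM.
    static long availableBytes() {
        Runtime rt = Runtime.getRuntime();
        return available(rt.maxMemory(), rt.totalMemory(), rt.freeMemory());
    }
}
```

With the numbers from the example (max 100M, total 50M, of which 20M is free), `available(100, 50, 20)` gives 70.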
There are some kludges that you can use to estimate the size of an existing object; you could adapt some of these to predict the size of a yet-to-be created object.
However, in this case, I think it might be best to catch the Error. First of all, asking for the free memory doesn't account for what's available after garbage collection, which will be performed before raising an OOME. And, requesting a garbage collection with System.gc() isn't reliable. It's often explicitly disabled because it can wreck performance, and if it's not disabled… well, it can wreck performance when used unnecessarily.
It is impossible to recover from most errors. However, recoverability is up to the caller, not the callee. In this case, if you have a strategy to recover from an OutOfMemoryError, it is valid to catch it and fall back.
I guess that, in practice, it really comes down to the difference between the "slow" and "fast" way. If the "slow" method is fast enough, I'd stick with that, as it's safer and simpler. And, it seems to me, allowing it to be used as a fall back means that it is "fast enough." Don't let small optimizations derail the reliability of your application.
The "try to allocate and handle the error" approach is very dangerous.
What if you barely get your memory? A later OOM exception might occur because you brought things too close to the limits. Almost any library call will allocate memory at least briefly.
During your allocation, a different thread may receive an OOM exception while trying to allocate a relatively small object, even if your own allocation is destined to fail.
The only viable approach is your second one, with the corrections noted in other answers. But you have to be sure and leave extra "slop space" in the heap when you decide to use your memory intensive approach.
I don't believe that there's a reasonable, generic approach to this that could safely be assumed to be 100% reliable. Even the Runtime.freeMemory approach is vulnerable to the fact that you may actually have enough memory after a garbage collection, but you wouldn't know that unless you force a gc. But then there's no foolproof way to force a GC either. :)
Having said that, I suspect that if you really did know approximately how much you needed, and did run a System.gc() beforehand, and you're running in a simple single-threaded app, you'd have a reasonably decent shot at getting it right with the .freeMemory call.
If any of those constraints fail, though, and you get the OOM error, you're back at square one, and therefore are probably no better off than just catching the Error subclass. While there are some risks associated with this (Sun's VM does not make a lot of guarantees about what happens after an OOM... there's some risk of internal state corruption), there are many apps for which just catching it and moving on with life will leave you with no serious harm.
A more interesting question in my mind, however, is why are there cases where you do have enough memory to do this and others where you don't? Perhaps some more analysis of the performance tradeoffs involved is the real answer?
Catching the Error is definitely the worst approach. An Error happens when there is NOTHING you can do about it. Not even create a log; poof, like "... Houston, we lost the VM".
I didn't quite get the second reason. It was bad because it is hard to relate SOME_MEMORY to the operations? Could you rephrase it for me?
The only alternative I see is to use the hard disk as the memory (RAM/ROM as in the old days). I guess that is what you're pointing at in your "else slower, less demanding approach".
Every platform has its limits; Java supports as much RAM as your hardware is willing to give (well, actually as much as you allow by configuring the VM). In Sun's JVM implementation that can be done with the -Xmx option, for instance:
java -Xmx8g some.name.YourMemConsumingApp
Of course, you may end up trying to perform an operation that takes 10 GB of RAM.
If that's your case, then you should definitely swap to disk.
Additionally, using the strategy pattern could make for nicer code, although here it looks like overkill:
if (isEnoughMemory(SOME_MEMORY)) {
    strategy = new InMemoryStrategy();
} else {
    strategy = new DiskStrategy();
}
strategy.performTheAction();
But it may help if the "else" involves a lot of code and looks bad. Furthermore, if you can somehow use a third approach (like using a cloud for processing), you can add a third Strategy:
...
strategy = new ImaginaryCloudComputingStrategy();
...
:P
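A self-contained version of that strategy selection might look like the sketch below. Everything here is an assumption made for illustration: the `ProcessingStrategy` interface, the placeholder return values, the way `isEnoughMemory` is computed, and the 16 MB safety margin (the "slop space" mentioned in another answer).

```java
interface ProcessingStrategy {
    String performTheAction();
}

final class InMemoryStrategy implements ProcessingStrategy {
    public String performTheAction() {
        return "in-memory"; // placeholder for the fast, memory-hungry path
    }
}

final class DiskStrategy implements ProcessingStrategy {
    public String performTheAction() {
        return "disk"; // placeholder for the slower, disk-backed path
    }
}

final class StrategyChooser {
    // Assumed safety margin left free for everything else the app allocates.
    static final long SLOP_BYTES = 16L * 1024 * 1024;

    static boolean isEnoughMemory(long requiredBytes, long availableBytes) {
        return availableBytes - SLOP_BYTES >= requiredBytes;
    }

    static ProcessingStrategy choose(long requiredBytes) {
        Runtime rt = Runtime.getRuntime();
        // Estimate: free heap now plus what the heap is still allowed to grow by.
        long available = rt.maxMemory() - rt.totalMemory() + rt.freeMemory();
        return isEnoughMemory(requiredBytes, available)
                ? new InMemoryStrategy()
                : new DiskStrategy();
    }
}
```

The caller then only writes `StrategyChooser.choose(requiredBytes).performTheAction()`, and a further strategy can be added without touching the calling code.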
EDIT
After understanding the problem with the second approach: if there are times when you don't know how much RAM is going to be consumed but you do know how much you have left, you could use a mixed approach (RAM when you have enough, ROM [disk] when you don't).
Consider this theoretical problem.
Suppose you receive a file from a stream and don't know how big it is.
Then you perform some operation on that stream (encrypt it, for instance).
If you use RAM only, it would be very fast, but if the file is large enough to consume all of your app's memory, then you have to perform part of the operation in memory, then swap to a file and save temporary data there.
The VM will GC when running out of memory, you get more memory, and then you process the next chunk. This repeats until you have processed the whole stream.
while (!isDone()) {
    if (isMemoryLow()) {
        // Runtime.getRuntime().freeMemory() < SOME_MEMORY + some other validations
        swapToDisk(); // and make sure resources are GC'able
    }
    byte[] array = new byte[PREDEFINED_BUFFER_SIZE];
    process(array);
    process(array);
}
cleanUp();