Efficient memory utilisation? Switch case with strings - Java

I was wondering what would be the best way to save some resources and memory in my test app.
I know that creating objects eats up memory and slows down the app, and my code requires a switch over String values. So what would be better: an if-else chain over all the string values that assigns each of them an integer tag and then switches on that, or creating an enum and switching on it directly?
Number of entries = 40-50.

I know that creating objects eats up memory and slows down the app ...
This generalization is problematic:
Yes, creating an object consumes memory and takes time, but it doesn't necessarily slow down the app. It depends on what the alternative to creating the object is. It could well be that the alternative slows down the app more.
And even assuming that the "create an object" version uses more resources than the alternative, there's a good chance that the difference won't be significant.
What you seem to be doing here is premature optimization. My advice is:
let the JVM deal with the optimization (it can probably do it better than you anyway),
leave any hand optimization to later, and only do it if the measured performance of the app actually warrants it, and
use a CPU or memory profiler to guide you as to the parts of your code where going to the effort of hand optimizing is likely to have a good payoff.
As to the specifics of your use-case, it is not clear what the most efficient solution would be. It depends on factors such as how the strings are formed and how many branches the switch statement has. It is hard to predict without profiling the application with realistic input data.
A third option is to use an enum instead of a String or an int. That will be neater than implementing the String-to-int mapping with a pre-populated HashMap or similar.
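For illustration only, here is a minimal sketch of what the enum approach could look like (the Command name, its constants, and the handler are invented for the example):

enum Command { START, STOP, PAUSE }

static void handle(String input) {
    final Command cmd;
    try {
        cmd = Command.valueOf(input);       // maps the incoming String onto an enum constant
    } catch (IllegalArgumentException e) {
        throw new IllegalArgumentException("Unknown command: " + input);
    }
    switch (cmd) {                          // switching on the enum, not on a raw String
        case START: /* ... */ break;
        case STOP:  /* ... */ break;
        case PAUSE: /* ... */ break;
    }
}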

Don't worry about it. Java 7's switch on Strings effectively does that String-to-int mapping for you.
A switch with String cases is translated into two switches during compilation. The first maps each string to a unique integer—its position in the original switch. This is done by first switching on the hash code of the label. The corresponding case is an if statement that tests string equality; if there are collisions on the hash, the test is a cascading if-else-if. The second switch mirrors that in the original source code, but substitutes the case labels with their corresponding positions. This two-step process makes it easy to preserve the flow control of the original switch.
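As a rough sketch (not the exact bytecode, and with doFoo/doBar standing in for whatever the cases do), a switch like

switch (s) {
    case "foo": doFoo(); break;
    case "bar": doBar(); break;
}

is compiled into something conceptually like this:

int index = -1;
switch (s.hashCode()) {                 // first switch: on the hash code of the label
    case 101574:                        // "foo".hashCode()
        if (s.equals("foo")) index = 0;
        break;
    case 97299:                         // "bar".hashCode()
        if (s.equals("bar")) index = 1;
        break;
}
switch (index) {                        // second switch: mirrors the original flow control
    case 0: doFoo(); break;
    case 1: doBar(); break;
}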
Just sit back, take a sip of coffee, and let the compiler+JVM do all the heavy lifting for you. Making micro-optimizations in cases like this will more likely hurt performance than help.

Related

Creating new objects versus encoding data in primitives

Let's assume I want to store (integer) x/y values. What is considered more efficient: storing them in a primitive value like a long (which fits perfectly, since sizeof(long) = 2 * sizeof(int)) using bit operations like shift, or, and a mask, or creating a Point class?
Keep in mind that I want to create and store many(!) of these points (in a loop). Would there be a performance issue when using classes? The only reason I would prefer storing them in primitives over storing them in a class is the garbage collector. I guess generating new objects in a loop would trigger the GC way too often, is that correct?
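To make the comparison concrete, a packing scheme along these lines is what I mean (PackedPoint is just an illustrative name):

final class PackedPoint {
    static long pack(int x, int y) {
        return ((long) x << 32) | (y & 0xFFFFFFFFL);   // x in the high 32 bits, y in the low 32 bits
    }
    static int x(long p) { return (int) (p >> 32); }   // arithmetic shift preserves the sign of x
    static int y(long p) { return (int) p; }           // truncating cast recovers y, sign included
}

A long[] of such packed values then holds one point per slot with no per-object header.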
Of course packing those as a long[] is going to take less memory (and it is going to be contiguous in memory). For each Object (a Point) you will pay at least 12 bytes more for the two headers.
On the other hand, if you are creating them in a loop and escape analysis can prove they don't escape, the JIT can apply an optimization called "scalar replacement" (though it is very fragile), where your objects will not be allocated at all. Instead, those objects are "desugared" into their fields.
The general rule is that you should write code in whatever way is easiest to maintain and read. If and only if you see performance issues (via a profiler, say, or too many GC pauses) should you look at GC logs and potentially optimize the code.
As an addendum, the JDK code itself is full of such longs where each bit means something different, so the JDK developers do pack them. But then, neither I nor (I suspect) you are a JDK developer. There, such things matter; for us, I have serious doubts.

Java 8, memory wasted by duplicate strings

I'm investigating a memory leak in my Grails 3.3.10 server that is running on a Java 8 JVM. I took a heap dump from a production server that was running low on memory and analysed it with JXRay. The HTML report says that some memory is wasted on duplicate strings, with 19.6% overhead. Most of it is wasted on duplicates of the empty string "", and it is mostly coming from database reads. I have two questions about this.
Should I start interning strings or is it too costly of an operation to be worth it?
Quite a bit of my code deals with deeply nested JSON structures from Elasticsearch, and I didn't like the fragility of the code, so I made a small helper class to avoid typos when accessing data from the JSON.
public static final class S {
    public static final String author = "author";
    public static final String aspectRatio = "aspectRatio";
    public static final String userId = "userId";
    // ... etc.
}
That helps me avoid typos like so:
Integer userId = json.get("userid"); // Notice the lower case i. This returns null and fails silently
Integer userId = json.get(S.userId); // If I make a typo here the compiler will tell me.
I was reasonably happy about this, but now I'm second-guessing myself. Is this a bad idea for some reason? I haven't seen anyone else do this. That shouldn't cause any duplicate strings to be created, because they are created once and then referenced in my parsing code, right?
The problem with a String-holding class is that you are using the language against its design.
Classes are supposed to introduce types. A type that provides no utility because it is just "everything that can be said with a string" is rarely useful. While there are patterns like this in many programs, they typically introduce more behaviour than "all the stuff is here". For example, locale databases provide replacement strings for different languages.
I'd start by carving out the sensible enumerations. Error messages might easily be converted into enums, which can carry their string representations and convert to them automatically. That way you get your "typo detection" and a classification built in.
DiskErrors.DISK_NOT_FOUND
Prompts.ASK_USER_NAME
Prompts.ASK_USER_PASSWORD
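A minimal sketch of what such an enum could look like (the second constant and the message texts are invented for the example):

enum DiskErrors {
    DISK_NOT_FOUND("Your selected hard drive was not found"),
    DISK_READ_FAILED("The selected hard drive could not be read");

    private final String message;
    DiskErrors(String message) { this.message = message; }
    @Override public String toString() { return message; }   // the enum "auto-converts" to its user-facing text
}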
A side effect of changes like this can be exactly the saving you are after; but beware, these kinds of changes often come at the cost of readability.
Readability isn't what you think is easy to read, it's what a person who has never used the code would think is easy to read.
If I were to see a problem with "Your selected hard drive was not found", then I'd look through the code base for the string "Your selected hard drive was not found". That could land me in a few places:
In the block of code where the error message was raised.
In a table mapping that string to a name.
In many blocks of code where the same error message is raised.
With the table mapping, I can then do a second search, searching for where the name is used. That can land me with a few scenarios:
It is used in one place.
It is used in many places.
With one place, a kind of code maintenance problem arises. You now have a constant that is not used by any other part of the code maintained in a place that is not near where it is used. This means that to do any change that requires full understanding of the impact, someone has to keep the remote constant's value in mind to know if the logical change should be combined with an updated error message. It's not the updating of the error message that causes the extra chance for error, it's the fact that it is removed from the code being worked on.
With multiple places, I have to cycle through all of the matches, which is basically the same effort as the multiple string matches in the first step. So the table doesn't help me find the source of the error; it just adds extra steps that are not relevant to fixing the issue.
Now the table does have a distinct benefit in one scenario: when all the messages for a specific kind of issue should be updated at the same time. The problem is that such a scenario is rare and unlikely to happen. What is more likely is that an error message is not specific enough for a certain scenario but, after another scan of all the places it is used, turns out to be correct for the other scenarios. So the error message is split instead of updated in place, because the coupling enforced by the lookup table means one cannot modify some of the error messages without creating a new error message.
Problems like this come from developers slipping in features that appeal to developers.
In your case, you're building in an anti-typo system. Let me offer a better solution; because typos are real, and a real problem too.
Write a unit test to capture the expected output. It is rare that you will write the same typo twice, exactly the same way. Yes, it is possible, but coordinated typos will impact both systems the same. If you introduce a spelling error in your lookup table, and introduce it in the usage, the benefit would be a working program, but it would be hard to call it a quality solution (because the typos weren't protected against and are there in duplicate).
Have your code reviewed before submitting it to a build system. Reviews can get out of hand, especially with inflexible reviewers, but a good review should comment on "you spelled this wrong." If possible, review the code as a team, so you can explain your ideas as they make their comments. If you have difficulty working with people (or they have difficulty working with people) you will find peer review hard. I'm sorry if that happens, but if you can get a good peer review, it's the second "best" defense against these issues.
Sorry for the length of this reply, but I hope this gives you a chance to remember to "step back" from a solution and see how it impacts your future actions with the code.
And as for the "" String, focusing on why it is being set would probably be more effective in building a better product than patching the issue with interning (but I don't have access to your code base, so I might be wrong!)
Good luck
Q1: Should I start interning strings or is it too costly of an operation to be worth it?
It is hard to say without more information about how the strings are being created and their typical lifetime, but the general answer is No. It is generally not worth it.
(And interning won't fix your memory leak.)
Here are some of the reasons (a bit hand-wavey I'm afraid):
Interning a String doesn't prevent the string you are interning from being created. Your code still needs to create it and the GC still needs to collect it.
There is a hidden data structure that organizes the interned strings. That uses memory. It also costs CPU to check to see if a string is in the interning data structure and add it if needed.
The GC needs to do special (weak reference like) things with the interning data structure to prevent it from leaking. That is an overhead.
An interned string tends to live longer than a non-interned string. It is more likely to be tenured to the "old" heap, which extends its lifetime even further ... because the "old" heap is GC'ed less often.
If you are using the G1 collector AND the duplicate strings are typically long-lived, you might want to try enabling G1 string deduplication. Otherwise, you are probably better off just letting the GC deal with the strings. The Java GCs are designed to deal efficiently with lots of objects (such as strings) being created and thrown away soon afterwards.
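If you do want to experiment with that, the feature is switched on with JVM flags along these lines (assuming Java 8u20 or later, where string deduplication was introduced; the jar name is just a placeholder):

java -XX:+UseG1GC -XX:+UseStringDeduplication -jar app.jar

Note that it only applies to the G1 collector and only deduplicates strings that have survived a few GC cycles, so measure heap usage before and after to see whether it actually helps your workload.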
If it is your code that is creating the Java strings, then it might be worth tweaking it to avoid creating new zero length strings. Manually interning the zero length strings as per #ControlAltDel's comment is probably not worth the effort.
Finally, if you are going to try to reduce the duplication one way or another, I would strongly advise that you set things up so that you can measure the effects of the optimization:
Do you actually save memory?
Does this affect the rate of GC runs?
Does this affect GC pauses?
Does it affect request times / throughput?
If the measurements say that the optimization hasn't helped, you need to back it out.
Q2: Is this a bad idea for some reason? That shouldn't cause any duplicate strings to be created because they are created once and then referenced in my parsing code, right?
I can't think of any reason not to do that. It certainly doesn't lead directly to the creation of duplicate strings.
On the other hand, you won't reduce string duplication simply by doing that. String objects that represent literals get interned automatically.
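A small illustration of that point (the behaviour below follows from the rule that string literals are pooled by the JVM):

String a = "author";
String b = "author";
System.out.println(a == b);                     // true: both refer to the same pooled literal
System.out.println(new String("author") == a);  // false: an explicit new String is a separate object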

Can you/How do you save CPU and memory by choosing wisely [closed]

I understand the JVM optimizes some things for you (I'm not clear on which things yet), but let's say I were to do this:
while (true) {
    int var = 0;
}
would doing:
int var;
while (true) {
    var = 0;
}
take less space, since you aren't declaring a new variable every time and don't have to specify the type every time?
I understand I would really only need to put var outside of the while if I wanted to use it outside of that loop (instead of only being able to use it locally, as in the first example). Also, what about objects: would it be different from primitive types in that situation? I understand it's a small case, but a build-up of this kind of stuff could cause my application to use a lot of memory/CPU. I'm trying to use the fewest operations possible, but I don't completely understand what's going on behind the scenes.
If someone could help me out, or even maybe link me to somewhere I can learn about saving CPU by decreasing the number of operations, it would be highly appreciated. Please no books (unless they're free! :D), no way of getting one right now /:
Don't. Premature optimization is the root of all evil.
Instead, write your code as it makes most sense conceptually. Write it thoughtfully, yes. But don't think you can be a 'human compiler' and optimize and still write good code.
Once you have written your code (more or less naively, depending on your level of experience) you write performance tests for it. Try to think of different ways in which the code may be used (many times in a row, from front to back or reversed, many concurrent invocations etc) and try to cover these in test cases. Then benchmark your code.
If you find that some test cases are not performing well, investigate why. Measure parts of the test case to see where the time is going. Zoom into the parts where most time is spent.
Mostly, you will find weird loops where, upon reading the code again, you will think 'that was silly to write it that way. Of course this is slow' and easily fix it. In my experience most performance problems can be solved this way and 'hardcore optimization' is hardly ever needed.
In the end you will find that 99* percent of all performance problems can be solved by touching only 1 percent of the code. The other code never comes into play. This is why you should not 'prematurely' optimize. You will be spending valuable time optimizing code that had no performance issues in the first place. And making it less readable in the process.
Numbers made up of course but you know what I mean :)
Hot Licks points out that this isn't much of an answer, so let me expand on it with some good ol' performance tips:
Keep an eye out for I/O
Most performance problems are not in pure Java. Instead they are in interfacing with other systems. In particular, disk access is notoriously slow. So is the network. So minimize their use.
Optimize SQL queries
SQL queries will add seconds, even minutes, to your program's execution time if you don't watch out. So think about those very carefully. Again, benchmark them. You can write very optimized Java code, but if it first spends ten seconds waiting for the database to run some monster SQL query then it will never be fast.
Use the right kind of collections
Most performance problems are related to doing things lots of times. Usually when working with big sets of data. Putting your data in a Map instead of in a List can make a huge difference. Also there are specialized collection types for all sorts of performance requirements. Study them and pick wisely.
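As a rough sketch of the kind of difference meant here (the ids collection and the lookup key are invented for the example):

import java.util.*;

static boolean lookupDemo(Collection<String> ids) {
    List<String> list = new ArrayList<>(ids);   // contains() scans the list: O(n)
    Set<String> set = new HashSet<>(ids);       // contains() hashes the key: O(1) on average
    return list.contains("user-42") && set.contains("user-42");   // same answer, very different cost for large n
}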
Don't write code
When performance really matters, squeezing the last 'drops' out of some piece of code becomes a science all in itself. Unless you are writing some very exotic code, chances are great there will be some library or toolkit to solve your kind of problems. It will be used by many in the real world. Tried and tested. Don't try to beat that code. Use it.
We humble Java developers are end-users of code. We take the building blocks that the language and its ecosystem provide and tie them together to form an application. For the most part, performance problems are caused by us not using the provided tools correctly, or not using any tools at all for that matter. But we really need specifics to be able to discuss those. Benchmarking gives you that specificity. And when the slow code is identified, it is usually just a matter of changing a collection from a list to a map, or sorting it beforehand, or dropping a join from some query, etc.
Attempting to optimise code which doesn't need to be optimised increases complexity and decreases readability.
However, there are cases where improving readability also comes with improved performance.
For example,
if a numeric value cannot be null, use a primitive instead of a wrapper. This makes it clearer that the value cannot be null but also uses less memory and reduces pressure on the GC.
use a Set when you have a collection which cannot have duplicates. Often a List is used when in fact a Set would be more appropriate; depending on the operations you perform, this can also be faster by reducing time complexity.
consider using an enum with one instance for a singleton (if you have to use singletons at all). This is much simpler as well as faster than double-checked locking. Hint: try to only have stateless singletons.
writing simpler, well-structured code is also easier for the JIT to optimise. This is where trying to outsmart the JIT with more complex solutions will backfire, because you end up confusing the JIT, and what you think should be faster is actually slower. (And it's more complicated as well.)
try to reduce how much you write to the console (and I/O in general) in critical sections. Writing to the console is so expensive, both for the program and for the poor human having to read it, that it is worth spending more time producing concise console output.
try to use a StringBuilder when you have a loop of elements to append, as sketched below. Note: avoid using StringBuilder for one-liners that are just a series of append() calls, as this can actually be slower and harder to read.
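A minimal sketch of the StringBuilder point above (the items list is just a placeholder):

StringBuilder sb = new StringBuilder();
for (String item : items) {
    sb.append(item).append(',');   // appends in place; no intermediate String created per iteration
}
String joined = sb.toString();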
Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away. -- Antoine de Saint-Exupéry, French writer (1900-1944)
Developers like to solve hard problems, and there is a very strong temptation to solve problems which don't need to be solved. This is very common behaviour for developers with up to 10 years' experience (it was for me anyway ;). After about this point you have already solved most common problems before, and you start selecting the best/minimal set of solutions which will solve a problem. This is the point you want to get to in your career, and you will be able to develop quality software in far less time than you could before.
If you dream up an interesting problem to solve, go ahead and solve it in your own time, see what difference it makes, but don't include it in your working code unless you know (because you measured) that it really makes a difference.
However, if you find a simpler, elegant solution to a problem, it is worth including not because it might be faster (though it might be), but because it should make the code easier to understand and maintain, and that is usually a far more valuable use of your time. Successfully used software usually costs three times as much to maintain as it cost to develop. Do what will make life easier for the poor person who has to understand why you did something (which is harder if you didn't do it for any good reason in the first place), as this might be you one day ;)
A good example of when you might make an application slower to improve reasoning is the use of immutable values and concurrency. Immutable values are usually slower than mutable ones, sometimes much slower. However, when used with concurrency, mutable state is very hard to get provably right, and you need that because testing concurrent code is good but not reliable. With concurrency you have much more CPU to burn, so a bit more cost for using immutable objects is a very sensible trade-off. In some cases using immutable objects can allow you to avoid using locks and actually improve throughput, e.g. CopyOnWriteArrayList, if you have a high read-to-write ratio.

Is it beneficial (in terms of memory & space complexity) to condense a few lines of code into a single line? Is it worth it?

Example: a simple program for swapping two numbers.
int a = 10;
int b = 20;
a = a+b;
b = a-b;
a = a-b;
Now in the following piece of code:
a=a+b-(b=a);
What is the difference between these two pieces of code?
Additionally: what if the sum of the two values exceeds the legal limit of an integer, which differs between Java and C++?
Neither of these looks good to me. Readability is key. If you want to swap values, the most "obvious" way to do it is via a temporary value:
int a = 10;
int b = 20;
int tmp = a;
a = b;
b = tmp;
I neither know nor would I usually care whether this was as efficient as the "clever" approaches involving arithmetic. Until someone proves that the difference in performance is significant within a real application, I'd aim for the simplest possible code that works. Not just here, but for all code. Decide how well you need it to perform (and in what dimensions), test it, and change it to be more complicated but efficient if you need to.
(Of course, if you've got a swap operation available within your platform, use that instead... even clearer.)
In C++, the code yields undefined behavior because there's no sequence point in a+b-(b=a) and you're changing b and reading from it.
You're better off using std::swap(a, b); it is optimized for speed and much more readable than what you have there.
Since your specific code has already been commented upon, I would just add a general point. Writing one-liners doesn't really matter, because at the instruction level you cannot escape the number of steps your code is going to translate into in machine code. Most compilers will already optimize accordingly.
That is, unless the one-liner actually uses a different mechanism to achieve the goal. For example, when swapping two variables, if you avoid a third variable (and all the attendant hurdles such as overflow) by using bitwise operators, then you might have saved one memory location and thereby the access time to it; the XOR swap sketched below is one example.
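Shown only for illustration, since the temporary-variable version is clearer:

// XOR swap of two distinct int variables, no temporary
// (do not use when a and b refer to the same variable; it would zero both).
a ^= b;
b ^= a;   // b now holds the original a
a ^= b;   // a now holds the original b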
In practice, this is of almost no value and is trouble for readability as already mentioned in other answers. Professional programs need to be maintained by people so they should be easy to understand.
One definition of good code is: code that actually does what it appears to be doing.
Even you yourself would find it hard to fix your own code if it is written cleverly in terms of somewhat shortened but complex operations. Readability should always be prioritized, and most of the time the efficiency you really need comes from improving the design, the approach, or the data structures and algorithms, not from short one-liners.
Quoting Dijkstra: The competent programmer is fully aware of the limited size of his own skull. He therefore approaches his task with full humility, and avoids clever tricks like the plague.
A couple points:
Code should first reflect your intentions. After all, it's meant for humans to read. After that, if you really, really must, you can start to tweak the code for performance. Above all, never write code just to demonstrate a gimmick or bit-twiddling hack.
Breaking code onto multiple lines has absolutely no impact on performance.
Don't underestimate the compiler's optimizer. Just write the code as intuitively as possible, and the optimizer will ensure it has the best performance.
In this regard, the most descriptive, intuitive, fastest code, is:
std::swap(a, b);
Readability and instant understandability are what I personally value (and several others may vote for) when writing and reading code. They improve maintainability. In the particular example provided, it is difficult to understand immediately what the author is trying to achieve in those few lines.
The single-line code a=a+b-(b=a);, although very clever, does not obviously convey the author's intent to others.
In terms of efficiency, optimisation by the compiler will achieve that anyway.
In terms of Java at least, I remember reading that the JVM is optimized for normal, straightforward code, so often you just fool yourself if you try to do stuff like that.
Moreover, it looks awful.
OK, try this. Next time you have a strange bug, start by squashing up as much code into single lines as you can.
Wait a couple weeks so you've forgotten how it's supposed to work.
Try to debug it.
Of course it depends on the compiler, although I cannot foresee any kind of earth-shattering difference. The main cost is abstruse code.

Performance of Collection class in Java

All,
I have been going through a lot of sites that post about the performance of various Collection classes for various actions, i.e. adding an element, searching, and deleting. But I also notice that all of them describe different environments in which the tests were conducted, i.e. OS, memory, threads running, etc.
My question is whether there is any site/material that provides the same performance information on a best-test-environment basis, i.e. where the configuration is not an issue or a catalyst for the poor performance of any specific data structure.
[Updated]: For example, HashSet and LinkedHashSet both have a complexity of O(1) for inserting an element. However, Bruce Eckel's test claims that insertion is going to take more time for LinkedHashSet than for HashSet [http://www.artima.com/weblogs/viewpost.jsp?thread=122295]. So should I still go by the Big-O notation?
Here are my recommendations:
First of all, don't optimize :) Not that I am telling you to design crap software, but just to focus on design and code quality more than on premature optimization. Assuming you've done that, and now you really need to worry about which collection is best beyond purely conceptual reasons, let's move on to point 2.
Really, don't optimize yet (roughly stolen from M. A. Jackson)
Fine. So your problem is that even though you have theoretical time complexity formulas for best cases, worst cases and average cases, you've noticed that people say different things and that practical settings are a very different thing from theory. So run your own benchmarks! You can only read so much, and while you do that your code doesn't write itself. Once you're done with the theory, write your own benchmark - for your real-life application, not some irrelevant mini-application for testing purposes - and see what actually happens to your software and why. Then pick the best algorithm. It's empirical, it could be regarded as a waste of time, but it's the only way that actually works flawlessly (until you reach the next point).
Now that you've done that, you have the fastest app ever. Until the next update of the JVM. Or of some underlying component of the operating system your particular performance bottleneck depends on. Guess what? Maybe your clients have different ones. Here comes the fun: you need to be sure that your benchmark is valid for others, or in most cases (or have fun writing code for different cases). You need to collect data from users. LOTS. And then you need to do that over and over again to see what happens and whether it still holds true. And then re-write your code accordingly over and over again. (The now-terminated Engineering Windows 7 blog is actually a nice example of how user data collection helps to make educated decisions to improve user experience.)
Or you can... you know... NOT optimize. Platforms and compilers will change, but a good design should - on average - perform well enough.
Other things you can also do:
Have a look at the JVM's source code. It's very educative and you discover a herd of hidden things (I'm not saying that you have to use them...)
See that other thing on your TODO list that you need to work on? Yes, the one near the top but that you always skip because it's too hard or not fun enough. That one right there. Well get to it and leave the optimization thingy alone: it's the evil child of a Pandora's Box and a Moebius band. You'll never get out of it, and you'll deeply regret you tried to have your way with it.
That being said, I don't know why you need the performance boost so maybe you have a very valid reason.
And I am not saying that picking the right collection doesn't matter. Just that once you know which one to pick for a particular problem, and you've looked at the alternatives, then you've done your job without having to feel guilty. Collections usually have a semantic meaning, and as long as you respect it you'll be fine.
In my opinion, all you need to know about a data structure is the Big-O of the operations on it, not subjective measures from different architectures. Different collections serve different purposes.
Maps are dictionaries
Sets assert uniqueness
Lists provide grouping and preserve iteration order
Trees provide cheap ordering and quick searches on dynamically changing contents that require constant ordering
Edited to include bwawok's statement on the use case of tree structures
Update
From the javadoc on LinkedHashSet
Hash table and linked list implementation of the Set interface, with predictable iteration order.
...
Performance is likely to be just slightly below that of HashSet, due to the added expense of maintaining the linked list, with one exception: Iteration over a LinkedHashSet requires time proportional to the size of the set, regardless of its capacity. Iteration over a HashSet is likely to be more expensive, requiring time proportional to its capacity.
Now we have moved from the very general case of choosing an appropriate data-structure interface to the more specific case of which implementation to use. However, we still ultimately arrive at the conclusion that specific implementations are well suited for specific applications based on the unique, subtle invariant offered by each implementation.
What do you need to know about them, and why? The reason that benchmarks show a given JDK and hardware setup is so that they could (in theory) be reproduced. What you should get from benchmarks is an idea of how things will work. For an ABSOLUTE number, you will need to run it vs your own code doing your own thing.
The most important thing to know is the Big O runtime of various collections. Knowing that getting an element out of an unsorted ArrayList is O(n), but getting it out of a HashMap is O(1) is HUGE.
If you are already using the correct collection for a given job, you are 90% of the way there. The times when you need to worry about how fast you can, say, get items out of a HashMap should be pretty darn rare.
Once you leave single-threaded land and move into multi-threaded land, you will need to start worrying about things like ConcurrentHashMap vs Collections.synchronizedMap. Until you are multi-threaded, you can just not worry about this kind of stuff and focus on which collection to use for which purpose.
Update to HashSet vs LinkedHashSet
I haven't ever found a use case where I needed a LinkedHashSet (because if I care about order I tend to have a List, and if I care about O(1) gets I tend to use a HashSet). Realistically, most code will use ArrayList, HashMap, or HashSet. If you need anything else, you are in an "edge" case.
The different collection classes have different big-O performances, but all that tells you is how they scale as they get large. If your set is big enough the one with O(1) will outperform the one with O(N) or O(logN), but there's no way to tell what value of N is the break-even point, except by experiment.
Generally, I just use the simplest possible thing, and then if it becomes a "bottleneck", as indicated by operations on that data structure taking a large percentage of the time, I will switch to something with a better big-O rating. Quite often, either the number of items in the collection never comes near the break-even point, or there's another simple way to resolve the performance problem.
Both HashSet and LinkedHashSet have O(1) performance. The same goes for HashMap and LinkedHashMap (in fact, the former pair are implemented on top of the latter). This only tells you how these algorithms scale, not how they actually perform. In this case, LinkedHashSet does all the same work as HashSet but also always has to update a previous and a next pointer to maintain the order. This means that the constant factor (an important value when talking about actual algorithm performance) is lower for HashSet than for LinkedHashSet.
Thus, since these two have the same Big-O, they scale essentially the same: as n changes, both see the same change in performance, and with O(1) the performance, on average, does not change.
So now your choice is based on functionality and your requirements (which really should be what you consider first anyway). If you only need fast add and get operations, you should always pick HashSet. If you also need consistent ordering - such as last accessed or insertion order - then you must also use the Linked... version of the class.
I have used the "linked" class in production applications, well LinkedHashMap. I used this in one case for a symbol like table so wanted quick access to the symbols and related information. But I also wanted to output the information in at least one context in the order that the user defined those symbols (insertion order). This makes the output more friendly for the user since they can find things in the same order that they were defined.
If I had to sort millions of rows I'd try to find a different way. Maybe I could improve my SQL, improve my algorithm, or perhaps write the elements to disk and use the operating system's sort command.
I've never had a case where collections were the cause of my performance issues.
I created my own experiment with HashSets and LinkedHashSets. For add() and contains() the running time is O(1), not taking into account a large number of collisions. In my add() method for a LinkedHashSet-style structure, I put the object into a user-created hash table, which is O(1), and then put the object into a separate linked list to keep track of order. So to remove an element from the linked hash set you must find the element in the hash table and then search through the linked list that holds the order: the running times are O(1) + O(n) respectively, which is O(n) for remove().
