Why is string.intern() so slow? - java

Before anyone questions the fact of using string.intern() at all, let me say that I need it in my particular application for memory and performance reasons. [1]
So, until now I used String.intern() and assumed it was the most efficient way to do it. However, I have long noticed that it is a bottleneck in the software. [2]
Then, just recently, I tried replacing String.intern() with a huge map where I put/get the strings in order to obtain a unique instance each time. I expected this would be slower... but it was exactly the opposite! It was tremendously faster! Replacing intern() with put/get on a map (which achieves exactly the same thing) was more than an order of magnitude faster.
The question is: why is intern() so slow?!? Why isn't it simply backed by a map (or actually, just a customized set), which would be tremendously faster? I'm puzzled.
[1]: For the unconvinced ones: it is in natural language processing and has to process gigabytes of text, therefore it needs to avoid many instances of the same string (to avoid blowing up the memory) and needs referential string comparison to be fast enough.
[2]: Without it (using normal strings) it is impossible; with it, this particular step remains the most computation-intensive one.
EDIT:
Due to the surprising interest in this post, here is some code to test it out:
http://pastebin.com/4CD8ac69
And the results of interning a bit more than 1 million strings:
HashMap: 4 seconds
String.intern(): 54 seconds
To avoid warm-up / OS I/O caching effects and the like, the experiment was repeated with the order of the two benchmarks inverted:
String.intern(): 69 seconds
HashMap: 3 seconds
As you see, the difference is very noticeable, more than tenfold. (Using OpenJDK 1.6.0_22 64-bit; using the Sun JDK gave similar results, I think.)
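For reference, in case the pastebin disappears, here is a minimal sketch in the same spirit (not the original benchmark; the class name and string contents are illustrative):

import java.util.HashMap;
import java.util.Map;

public class InternBenchmark {
    static final int N = 1000000;

    public static void main(String[] args) {
        // Map-based interning: one get, plus one put for each new string.
        Map<String, String> pool = new HashMap<String, String>();
        int sink = 0; // accumulate something so the JIT can't discard the loops
        long t0 = System.nanoTime();
        for (int i = 0; i < N; i++) {
            String s = "string-" + i;
            String canonical = pool.get(s);
            if (canonical == null) {
                pool.put(s, s);
                canonical = s;
            }
            sink += canonical.length();
        }
        long mapMs = (System.nanoTime() - t0) / 1000000;

        // JVM interning via String.intern().
        t0 = System.nanoTime();
        for (int i = 0; i < N; i++) {
            sink += ("string-" + i).intern().length();
        }
        long internMs = (System.nanoTime() - t0) / 1000000;

        System.out.println("HashMap: " + mapMs + " ms, intern(): " + internMs
                + " ms (sink=" + sink + ")");
    }
}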

This article discusses the implementation of String.intern(). In Java 6 and 7, the implementation used a fixed-size (1009 buckets) hash table, so as the number of entries grew, performance degraded toward O(n). The fixed size can be changed using -XX:StringTableSize=N. Apparently, in Java 8 the default size is larger, but the issue remains.
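For example, the table can be resized at JVM startup; MyApp is a placeholder here, and -XX:+PrintStringTableStatistics (available in later HotSpot versions) prints the table's bucket statistics at VM exit:

java -XX:StringTableSize=1000003 -XX:+PrintStringTableStatistics MyApp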

Most likely reason for the performance difference: String.intern() is a native method, and calling a native method incurs massive overhead.
So why is it a native method? Probably because it uses the constant pool, which is a low-level VM construct.

@Michael Borgwardt said this in a comment:
intern() is not synchronized, at least at the Java language level.
I think that you mean that the String.intern() method is not declared as synchronized in the sourcecode of the String class. And indeed, that is a true statement.
However:
Declaring intern() as synchronized would only lock the current String instance, because it is an instance method, not a static method. So they couldn't implement string pool synchronization that way.
If you step back and think about it, the string pool has to perform some kind of internal synchronization. If it didn't it would be unusable in a multi-threaded application, because there is simply no practical way for all code that uses the intern() method to do external synchronization.
So, the internal synchronization that the string pool performs could be a bottleneck in a multi-threaded application that uses intern() heavily.
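By contrast, a user-level interner can choose a cheaper concurrency strategy and a resizable table. A minimal sketch using ConcurrentHashMap (the class name MapInterner is illustrative, not from the original post):

import java.util.concurrent.ConcurrentHashMap;

// Thread-safe interner backed by a resizable ConcurrentHashMap.
final class MapInterner {
    private final ConcurrentHashMap<String, String> pool =
            new ConcurrentHashMap<String, String>();

    String intern(String s) {
        String existing = pool.putIfAbsent(s, s); // atomic insert-if-missing
        return existing != null ? existing : s;
    }
}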

I can't speak from any great experience with it, but from the String docs:
"When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the {#link #equals(Object)} method, then the string from the pool returned. Otherwise, this String object is added to the pool and a reference to this String object is returned."
When dealing with large numbers of objects, any solution involving hashing will outperform one that doesn't. I think you're just seeing the result of misusing a Java language feature. Interning isn't there to act as a Map of strings for your use. You should use a Map for that (or Set, as appropriate). The String table is for optimization at the language level, not the app level.

Related

When will the new String() object in memory gets cleared after invoking intern() method

List<String> list = new ArrayList<>();
for (int i = 0; i < 1000; i++)
{
    StringBuilder sb = new StringBuilder();
    String string = sb.toString();
    string = string.intern();
    list.add(string);
}
In the above sample, after invoking string.intern(), when will the 1000 objects created on the heap (by sb.toString()) be cleared?
Edit 1:
If there is no guarantee that these objects will be cleared, and assuming that GC hasn't run, is using string.intern() itself pointless (in terms of memory usage)?
Is there any way to reduce memory usage / object creation while using intern() method?
Your example is a bit odd, as it creates 1000 empty strings. If you want to get such a list while consuming minimum memory, you should use
List<String> list = Collections.nCopies(1000, "");
instead.
If we assume that there is something more sophisticated going on, not creating the same string in every iteration, well, then there is no benefit in calling intern(). What will happen is implementation dependent. But when calling intern() on a string that is not in the pool, in the best case it will just be added to the pool, and in the worst case, another copy will be made and added to the pool.
At this point, we have no savings yet, but potentially created additional garbage.
Interning at this point can only save you some memory, if there are duplicates somewhere. This implies that you construct duplicate strings first, to look up their canonical instance via intern() afterwards, so having the duplicate string in memory until garbage collected, is unavoidable. But that’s not the real problem with interning:
in older JVMs, there was special treatment of interned strings that could result in worse garbage collection performance or even running out of resources (i.e. the fixed size "PermGen" space).
in HotSpot, the string pool holding the interned strings is a fixed size hash table, yielding hash collisions, hence, poor performance, when referencing significantly more strings than the table size.
Before Java 7, update 40, the default size was about 1,000, not even sufficient to hold all string constants for any nontrivial application without hash collisions, not to speak of manually added strings. Later versions use a default size of about 60,000, which is better, but still a fixed size that should discourage you from adding an arbitrary number of strings
the string pool has to obey the inter-thread semantics mandated by the language specification (as it is used for string literals), hence it needs to perform thread-safe updates that can degrade the performance
Keep in mind that you pay the price of the disadvantages named above even in cases where there are no duplicates, i.e. there is no space saving. Also, the acquired reference to the canonical string has to have a much longer lifetime than the temporary object used to look it up, to have any positive effect on the memory consumption.
The latter touches your literal question. The temporary instances are reclaimed when the garbage collector runs the next time, which will be when the memory is actually needed. There is no need to worry about when this will happen, but well, yes, up to that point, acquiring a canonical reference had no positive effect, not only because the memory hasn’t been reused up to that point, but also, because the memory was not actually needed until then.
This is the place to mention the new String Deduplication feature. This does not change string instances, i.e. the identity of these objects, as that would change the semantic of the program, but change identical strings to use the same char[] array. Since these character arrays are the biggest payload, this still may achieve great memory savings, without the performance disadvan­tages of using intern(). Since this deduplication is done by the garbage collector, it will only applied to strings that survived long enough to make a difference. Also, this implies that it will not waste CPU cycles when there still is plenty of free memory.
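For reference, assuming Java 8 update 20 or later, String Deduplication requires the G1 collector and is enabled with these flags (MyApp is a placeholder):

java -XX:+UseG1GC -XX:+UseStringDeduplication MyApp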
However, there might be cases where manual canonicalization is justified. Imagine we're parsing a source code file or XML file, or importing strings from an external source (Reader or database) where such canonicalization will not happen by default, but duplicates may occur with a certain likelihood. If we plan to keep the data for further processing for a longer time, we might want to get rid of duplicate string instances.
In this case, one of the best approaches is to use a local map, not being subject to thread synchronization, dropping it after the process, to avoid keeping references longer than necessary, without having to use special interaction with the garbage collector. This implies that occurrences of the same strings within different data sources are not canonicalized (but still being subject to the JVM’s String Deduplication), but it’s a reasonable trade-off. By using an ordinary resizable HashMap, we also do not have the issues of the fixed intern table.
E.g.
static List<String> parse(CharSequence input) {
    List<String> result = new ArrayList<>();
    Matcher m = TOKEN_PATTERN.matcher(input);
    CharBuffer cb = CharBuffer.wrap(input);
    HashMap<CharSequence,String> cache = new HashMap<>();
    while(m.find()) {
        result.add(
            cache.computeIfAbsent(cb.subSequence(m.start(), m.end()), Object::toString));
    }
    return result;
}
Note the use of the CharBuffer here: it wraps the input sequence and its subSequence method returns another wrapper with different start and end index, implementing the right equals and hashCode method for our HashMap, and computeIfAbsent will only invoke the toString method, if the key was not present in the map before. So, unlike using intern(), no String instance will be created for already encountered strings, saving the most expensive aspect of it, the copying of the character arrays.
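As a hypothetical usage sketch (TOKEN_PATTERN is not defined in the answer; a simple word pattern is assumed here, and parse() is the method above):

import java.util.List;
import java.util.regex.Pattern;

static final Pattern TOKEN_PATTERN = Pattern.compile("\\w+");

public static void main(String[] args) {
    List<String> tokens = parse("the cat sat on the mat");
    // Duplicate tokens share a single canonical String instance:
    System.out.println(tokens.get(0) == tokens.get(4)); // true, both "the"
}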
If we have a really high likelihood of duplicates, we may even save the creation of wrapper instances:
static List<String> parse(CharSequence input) {
    List<String> result = new ArrayList<>();
    Matcher m = TOKEN_PATTERN.matcher(input);
    CharBuffer cb = CharBuffer.wrap(input);
    HashMap<CharSequence,String> cache = new HashMap<>();
    while(m.find()) {
        cb.limit(m.end()).position(m.start());
        String s = cache.get(cb);
        if(s == null) {
            s = cb.toString();
            cache.put(CharBuffer.wrap(s), s);
        }
        result.add(s);
    }
    return result;
}
This creates only one wrapper per unique string, but also has to perform one additional hash lookup for each unique string when putting. Since the creation of a wrapper is quite cheap, you really need a significantly large number of duplicate strings, i.e. a small number of unique strings compared to the total number, to benefit from this trade-off.
As said, these approaches are very efficient, because they use a purely local cache that is just dropped afterwards. With this, we don’t have to deal with thread safety nor interact with the JVM or garbage collector in a special way.
You can open JMC and check GC activity under the Memory tab inside the MBean Server of the particular JVM, to see when it ran and how much it cleared. Still, there is no firm guarantee of when it will be called. You can also initiate a GC under Diagnostic Commands on a specific JVM.
Hope it helps.

Does the String Pool in Java behave like an LRU cache?

Strings are immutable and are managed in the String pool. I wish to know how this pool is managed. If there are a large number of String literals being used in an application (I understand a StringBuilder should be used when modifications like append and replace operations are frequent), then the pool enhances the performance of the application by not recreating new String objects again and again but reusing the same objects present in the pool; this is possible as Strings are immutable and doing so has no ill effect.
My question is how this String Pool is managed. Suppose some 'k' Strings are used with huge frequency, while a few other String objects are created once and never used again, and newer String literals keep being used.
In cases like these, does the String Pool behave like an LRU cache, holding references to the most recently used literals and removing the older, unused strings from the pool?
Does the String pool have a size, and can we control it in our application?
Edit :
Usually we give a size to the custom object pools we implement. I wonder why an LRU-like feature is not there for String pools. It could have been a feature, and even with large Strings there would not have been a problem. But I feel that's just the way it has been implemented, and I wanted to know why it is not there. I mean, presumably it is absent for some valid reason, and having this feature would have resulted in some ill effects. If someone could throw some light on those ill effects, it would be good.
String pool is not an LRU cache, since entries aren't taken out unless GC'd.
There are 2 ways to get entries in the String pool. String literals go there automatically, and new entries can be added with String.intern() unless the String already exists in the pool, in which case a reference to it is returned.
The values are garbage collected if there are no more references to them, which for String literals (e.g. String constants) is a bit harder to achieve than for strings that were intern()ed.
The implementation has changed a lot between Java 6 and Java 8 (and even between minor versions). The default size of the String pool is apparently 1009, but it can be changed with the -XX:StringTableSize=N parameter (since Java 7). This size is the table size of an internal hash table, so it can be tuned higher if you're using intern() a lot (for String literals alone, the default should be plenty). The size affects only the speed of the intern() call, not the number of Strings you can intern.
Basically unless you're using intern() heavily (presumably for a good reason), there's very little reason to worry about the String pool. Especially since it's no longer stored in PermGen, so it can't cause OutOfMemoryErrors very easily anymore.
Source.

Is it a sensible optimization to check whether a variable holds a specific value before writing that value?

if (var != X)
var = X;
Is it sensible or not? Will the compiler always optimize-out the if statement? Are there any use cases that would benefit from the if statement?
What if var is a volatile variable?
I'm interested in both C++ and Java answers as the volatile variables have different semantics in both of the languages. Also the Java's JIT-compiling can make a difference.
The if statement introduces a branch and an additional read that wouldn't happen if we always overwrote var with X, so it's bad. On the other hand, if var == X, then with this optimization we perform only a read and no write, which could have some effects on the cache. Clearly, there are trade-offs here. I'd like to know what it looks like in practice. Has anyone done any testing on this?
EDIT:
I'm mostly interested in how it looks in a multi-processor environment. In a trivial situation there doesn't seem to be much sense in checking the variable first. But when cache coherency has to be kept between processors/cores, the extra check might actually be beneficial. I just wonder how big an impact it can have. Also, shouldn't the processor do such an optimization itself? If var == X, assigning it the value X once more should not 'dirty up' the cache. But can we rely on this?
Yes, there are definitely cases where this is sensible, and as you suggest, volatile variables are one of those cases - even for single threaded access!
Volatile writes are expensive, both from a hardware and a compiler/JIT perspective. At the hardware level, these writes might be 10x-100x more expensive than a normal write, since write buffers have to be flushed (on x86, the details will vary by platform). At the compiler/JIT level, volatile writes inhibit many common optimizations.
Speculation, however, can only get you so far - the proof is always in the benchmarking. Here's a microbenchmark that tries your two strategies. The basic idea is to copy values from one array to another (pretty much System.arraycopy), with two variants - one which copies unconditionally, and one that checks to see if the values are different first.
Here are the copy routines for the simple, non-volatile case (full source here):
// no check
for (int i=0; i < ARRAY_LENGTH; i++) {
    target[i] = source[i];
}

// check, then set if unequal
for (int i=0; i < ARRAY_LENGTH; i++) {
    int x = source[i];
    if (target[i] != x) {
        target[i] = x;
    }
}
The results using the above code to copy an array length of 1000, using Caliper as my microbenchmark harness, are:
benchmark    arrayType  ns    linear runtime
CopyNoCheck  SAME        470  =
CopyNoCheck  DIFFERENT   460  =
CopyCheck    SAME       1378  ===
CopyCheck    DIFFERENT  1856  ====
This also includes about 150ns of overhead per run to reset the target array each time. Skipping the check is much faster - about 0.47 ns per element (or around 0.32 ns per element after we remove the setup overhead, so pretty much exactly 1 cycle on my box).
Checking is about 3x slower when the arrays are the same, and 4x slower when they are different. I'm surprised at how badly the check does, given that it is perfectly predicted. I suspect the culprit is largely the JIT - with a much more complex loop body, it may be unrolled fewer times, and other optimizations may not apply.
Let's switch to the volatile case. Here, I've used AtomicIntegerArray as my arrays of volatile elements, since Java doesn't have any native array types with volatile elements. Internally, this class is just writing straight through to the array using sun.misc.Unsafe, which allows volatile writes. The assembly generated is substantially similar to normal array access, other than the volatile aspect (and possibly range check elimination, which may not be effective in the AIA case).
Here's the code:
// no check
for (int i=0; i < ARRAY_LENGTH; i++) {
    target.set(i, source[i]);
}

// check, then set if unequal
for (int i=0; i < ARRAY_LENGTH; i++) {
    int x = source[i];
    if (target.get(i) != x) {
        target.set(i, x);
    }
}
And here are the results:
arrayType  benchmark      us     linear runtime
SAME       CopyCheckAI     2.85  =======
SAME       CopyNoCheckAI  10.21  ===========================
DIFFERENT  CopyCheckAI    11.33  ==============================
DIFFERENT  CopyNoCheckAI  11.19  =============================
The tables have turned. Checking first is ~3.5x faster than the usual method. Everything is much slower overall - in the check case, we are paying ~3 ns per loop, and in the worst cases ~10 ns (the times above are in us, and cover the copy of the whole 1000-element array). Volatile writes really are more expensive. There is about 1 ns of overhead included in the DIFFERENT case to reset the array on each iteration (which is why even the simple method is slightly slower for DIFFERENT). I suspect a lot of the overhead in the "check" case is actually bounds checking.
This is all single-threaded. If you actually had cross-core contention over a volatile, the results would be much, much worse for the simple method, and just about as good as the above for the check case (the cache line would just sit in the shared state - no coherency traffic needed).
I've also only tested the extremes of "every element equal" vs "every element different". This means the branch in the "check" algorithm is always perfectly predicted. If you had a mix of equal and different, you wouldn't get just a weighted combination of the times for the SAME and DIFFERENT cases - you do worse, due to misprediction (both at the hardware level, and perhaps also at the JIT level, which can no longer optimize for the always-taken branch).
So whether it is sensible, even for volatile, depends on the specific context - the mix of equal and unequal values, the surrounding code, and so on. I'd usually not do it for volatile alone in a single-threaded scenario, unless I suspected a large number of sets were redundant. In heavily multi-threaded structures, however, reading and then doing a volatile write (or other expensive operation, like a CAS) is a best practice, and you'll see it in quality code such as the java.util.concurrent structures.
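A minimal sketch of that read-before-write idiom (illustrative; an AtomicBoolean stands in for any volatile field):

import java.util.concurrent.atomic.AtomicBoolean;

class DoneFlag {
    private final AtomicBoolean done = new AtomicBoolean(false);

    void markDone() {
        if (!done.get()) {   // cheap volatile read; the cache line can stay shared
            done.set(true);  // volatile write (and invalidation) only when actually needed
        }
    }
}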
Is it a sensible optimization to check whether a variable holds a specific value before writing that value?
Are there any use cases that would benefit from the if statement?
It is when assignment is significantly more costly than an inequality comparison returning false.
An example would be a large* std::set, which may require many heap allocations to duplicate.
* for some definition of "large"
Will the compiler always optimize-out the if statement?
That's a fairly safe "no", as are most questions that contain both "optimize" and "always".
The C++ standard makes rare mention of optimizations, but never demands one.
What if var is a volatile variable?
Then it may perform the if, although volatile doesn't achieve what most people assume.
In general the answer is no. If you have a simple data type, the compiler will be able to perform any necessary optimizations. And in the case of types with a heavy operator=, it is the responsibility of operator= to choose the optimal way to assign a new value.
There are situations where even a trivial assignment of, say, a pointer-sized variable can be more expensive than a read and a branch (especially if the branch is predictable).
Why? Multithreading. If several threads are only reading the same value, they can all share that value in their caches. But as soon as you write to it, you have to invalidate the cacheline and get the new value the next time you want to read it or you have to get the updated value to keep your cache coherent. Both situations lead to more traffic between the cores and add latency to the reads.
If the branch is pretty unpredictable though it's probably still slower.
In C++, assigning a SIMPLE variable (that is, a normal integer or float variable) is definitely and always faster than checking if it already has that value and then setting it if it didn't. I would be very surprised if this wasn't true in Java too, but I don't know how complicated or simple things are in Java - I've written only a few hundred lines of it, and haven't actually studied how bytecode and JITed bytecode work.
Clearly, if the variable is very easy to check, but complicated to set, which could be the case for classes and other such things, then there may be a value. The typical case where you'd find this would be in some code where the "value" is some sort of index or hash, but if it's not a match, a whole lot of work is required. One example would be in a task-switch:
if (current_process != new_process_to_run)
    current_process = new_process_to_run;
Because here, a "process" is a complex object to alter, but the != can be done on the ID of the process.
Whether the object is simple or complex, the compiler will almost certainly not understand what you are trying to do here, so it will probably not optimize it away - but compilers are more clever than you think SOMETIMES, and more stupid at other times, so I wouldn't bet either way.
volatile should always force the compiler to read and write values to the variable, whether it "thinks" it is necessary or not, so it will definitely READ the variable and WRITE the variable. Of course, if the variable is volatile it probably means that it can change or represents some hardware, so you should be EXTRA careful with how you treat it yourself too... An extra read of a PCI-X card could incur several bus cycles (bus cycles being an order of magnitude slower than the processor speed!), which is likely to affect the performance much more. But then writing to a hardware register may (for example) cause the hardware to do something unexpected, and checking that we have that value first MAY make it faster, because "some operation starts over", or something like that.
It would be sensible if you had read-write locking semantics involved, whenever reading is usually less disruptive than writing.
In Objective-C you have the situation where assigning an object address to a pointer variable may require that the object be "retained" (reference count incremented). In such a case it makes sense to see if the value being assigned is the same as the value currently in the pointer variable, to avoid having to do the relatively expensive increment/decrement operations.
Other languages that use reference counting likely have similar scenarios.
But when assigning, say, an int or a boolean to a simple variable (outside of the multiprocessor cache scenario mentioned elsewhere) the test is rarely merited. The speed of a store in most processors is at least as fast as the load/test/branch.
In Java the answer is always no, since all assignments you can do in Java are primitive. In C++, the answer is still pretty much always no - if copying is so much more expensive than an equality check, the class in question should do that equality check itself.

What is the name of this locking technique?

I've got a gigantic Trove map and a method that I need to call very often from multiple threads. Most of the time this method shall return true. The threads are doing heavy number crunching and I noticed that there was some contention due to the following method (it's just an example, my actual code is a bit different):
synchronized boolean containsSpecial() {
    return troveMap.contains(key);
}
Note that it's an "append-only" map: once a key is added, it stays there forever (which is important for what comes next, I think).
I noticed that by changing the above to:
boolean containsSpecial() {
    if ( troveMap.contains(key) ) {
        // most of the time (>90%) we shall pass here, dodging lock-acquisition
        return true;
    }
    synchronized (this) {
        return troveMap.contains(key);
    }
}
I get a 20% speedup in my number crunching (verified across many runs, running for long periods, etc.).
Does this optimization look correct (knowing that once a key is there it shall stay there forever)?
What is the name for this technique?
EDIT
The code that updates the map is called way less often than the containsSpecial() method and looks like this (I've synchronized the entire method):
synchronized void addSpecialKeyValue( key, value ) {
    ....
}
This code is not correct.
Trove doesn't handle concurrent use itself; it's like java.util.HashMap in that regard. So, like HashMap, even seemingly innocent, read-only methods like containsKey() could throw a runtime exception or, worse, enter an infinite loop if another thread modifies the map concurrently. I don't know the internals of Trove, but with HashMap, rehashing when the load factor is exceeded, or removing entries can cause failures in other threads that are only reading.
If the operation takes a significant amount of time compared to lock management, using a read-write lock to eliminate the serialization bottleneck will improve performance greatly. In the class documentation for ReentrantReadWriteLock, there are "Sample usages"; you can use the second example, for RWDictionary, as a guide.
In this case, the map operations may be so fast that the locking overhead dominates. If that's the case, you'll need to profile on the target system to see whether a synchronized block or a read-write lock is faster.
Either way, the important point is that you can't safely remove all synchronization, or you'll have consistency and visibility problems.
It's called wrong locking ;-) Actually, it is some variant of the double-checked locking approach. And the original version of that approach is just plain wrong in Java.
Java threads are allowed to keep private copies of variables in their local memory (think: core-local cache of a multi-core machine). Any Java implementation is allowed to never write changes back into the global memory unless some synchronization happens.
So, it is very well possible that one of your threads has a local memory in which troveMap.contains(key) evaluates to true. Therefore, it never synchronizes and it never gets the updated memory.
Additionally, what happens when contains() sees an inconsistent memory state of the troveMap data structure?
Look up the Java memory model for the details. Or have a look at this book: Java Concurrency in Practice.
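For the record, the repaired form of double-checked locking needs a volatile field for safe publication (a sketch; Expensive is a placeholder type, and this requires Java 5 or later):

class Holder {
    static class Expensive { /* placeholder for a costly-to-build object */ }

    private volatile Expensive instance; // volatile is what makes the pattern safe

    Expensive get() {
        Expensive result = instance;     // unsynchronized fast path
        if (result == null) {
            synchronized (this) {
                result = instance;       // re-check under the lock
                if (result == null) {
                    instance = result = new Expensive();
                }
            }
        }
        return result;
    }
}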
This looks unsafe to me. Specifically, the unsynchronized calls will be able to see partial updates, either due to memory visibility (a previous put not getting fully published, since you haven't told the JMM it needs to be) or due to a plain old race. Imagine if TroveMap.contains has some internal variable that it assumes won't change during the course of contains. This code lets that invariant break.
Regarding the memory visibility, the problem with that isn't false negatives (you use the synchronized double-check for that), but that trove's invariants may be violated. For instance, if they have a counter, and they require that counter == someInternalArray.length at all times, the lack of synchronization may be violating that.
My first thought was to make troveMap's reference volatile, and to re-write the reference every time you add to the map:
synchronized (this) {
    troveMap.put(key, value);
    troveMap = troveMap;
}
That way, you're setting up a memory barrier such that anyone who reads the troveMap will be guaranteed to see everything that had happened to it before its most recent assignment -- that is, its latest state. This solves the memory issues, but it doesn't solve the race conditions.
Depending on how quickly your data changes, maybe a Bloom filter could help? Or some other structure that's more optimized for certain fast paths?
Under the conditions you describe, it's easy to imagine a map implementation for which you can get false negatives by failing to synchronize. The only way I can imagine obtaining false positives is an implementation in which key insertions are non-atomic and a partial key insertion happens to look like another key you are testing for.
You don't say what kind of map you have implemented, but the stock map implementations store keys by assigning references. According to the Java Language Specification:
Writes to and reads of references are always atomic, regardless of whether they are implemented as 32 or 64 bit values.
If your map implementation uses object references as keys, then I don't see how you can get in trouble.
EDIT
The above was written in ignorance of Trove itself. After a little research, I found the following post by Rob Eden (one of the developers of Trove) on whether Trove maps are concurrent:
Trove does not modify the internal structure on retrievals. However, this is an implementation detail not a guarantee so I can't say that it won't change in future versions.
So it seems like this approach will work for now but may not be safe at all in a future version. It may be best to use one of Trove's synchronized map classes, despite the penalty.
I think you would be better off with a ConcurrentHashMap, which doesn't need explicit locking and allows concurrent reads:
boolean containsSpecial() {
    return troveMap.containsKey(key);
}

void addSpecialKeyValue( key, value ) {
    troveMap.putIfAbsent(key, value);
}
Another option is using a ReadWriteLock, which allows concurrent reads but no concurrent writes:
ReadWriteLock rwlock = new ReentrantReadWriteLock();

boolean containsSpecial() {
    rwlock.readLock().lock();
    try {
        return troveMap.contains(key);
    } finally {
        rwlock.readLock().unlock();
    }
}

void addSpecialKeyValue( key, value ) {
    rwlock.writeLock().lock();
    try {
        //...
        troveMap.put(key, value);
    } finally {
        rwlock.writeLock().unlock();
    }
}
Why reinvent the wheel?
Simply use ConcurrentHashMap.putIfAbsent

Is one-instance-per-unique-immutable design pattern considered evil?

I was reading a chapter in Effective Java that talks about the advantages of keeping only one instance of an immutable object, such that we can do an object identity comparison x == y instead of comparing the values for equality.
Also, POJOs like java.awt.RenderingHints.Key often use the one-instance-per-unique-immutable design pattern:
Instances of this class are immutable and unique which means that tests for matches can be made using the == operator instead of the more expensive equals() method.
I can understand the speed boost with this approach, but wouldn't this design pattern eventually cause a memory leak?
Yes, it may cause memory growth (it's not a leak if it's intentional behavior). Whether it will or won't depends on just how the uniqueness contract is specified. For example, if you serialize one of these objects to disk, exit the scope in which it exists, and then deserialize it back from disk, one of two things happens: either you get the same object, or you get a different one. If you get the same object, then every object ever used in the life of the JVM needs to be kept, and you'll have memory growth. If you get a different object, then the objects only need to exist while there is a reference to them, and you won't have memory growth.
That is sometimes called the Flyweight pattern, especially if the space of possible objects is bounded.
Regarding implementing the cache, you can use a WeakHashMap (http://docs.oracle.com/javase/6/docs/api/java/util/WeakHashMap.html) or implement a bounded LRU cache.
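A minimal sketch of such a bounded LRU cache via LinkedHashMap's access-order mode (the class name and default capacity are illustrative):

import java.util.LinkedHashMap;
import java.util.Map;

// Evicts the least-recently-accessed entry once maxEntries is exceeded.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: iteration order becomes LRU
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}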
