long static Strings in short-lived objects

long static Strings in short-lived objects - java

This might be a stupid question (or just make me look stupid :)), however I would be interested in how to work with long String objects in the context of short-lived objects.
Think about long SQL queries in cron job or anonymous, command or function-like classes. These are very short-lived classes and even will use these long Strings once in their lifetime for most of the time. What is better? To construct a String inline and let it be collected with the instance, or make it static final anyway and let them sit in the memory useless until the classes next instantiation?

Well, there's only so much control you can have over what happens to the String.
Even if you create it inline, that String will most probably be added to the String constant pool of the JVM, and will be reused when you declare it again, so in practice, you'll probably be reusing the same String object either way.
Unless the String is so huge that it has an impact on your application's performance, I wouldn't worry about it and choose the option that seemed more readable to me.
If that String will be used only in one particular point of the code, inside a method, I would declare it inline, I prefer to have my variables in the smallest scope that I can, but opinions here may vary.
If there is no change whatsoever, and it seems to make sense in your particular use case, by all means declare the String as static, again, I doubt it will affect performance.

String constants go into the constant pool of a class, and cannot be optimized away, i.e. are handled sufficiently well.
Creating long strings one does not do statically. For SQL use prepared statements with a ? place holder. The same holds for strings with placeholders: use MessageFormat.
To be explicit. The following does not cost anything extra:
static final String s = "... long string ...";

When remaining memory is limited JVM will normally do perm gen space cleaning and unload unused/unreferenced classes. So having the long strings as static variable won't do much harm in my opinion

If you feel your Strings can occupy lots of memory then dont make them static or declare them using String literal. As both of these will get stored in permgen space and will almost mever be garbage collected [there is chance but slim, statics may never be garbage collected is you have not created your own classloader].
So create String using new operator so that will be created in heap and can be easily garbage collected i.e.
String str = new String("long string");
EDIT:
How strings are stored: http://www.ntu.edu.sg/home/ehchua/programming/java/J3d_String.html
EDIT:
There has been a long discussion below regarding how new String works. Argument presented is that new String will create 2 objects one in heap and one in pool. THIS IS WRONG, it is not true by default and you can force java to do it by calling intern method. In order to back my argument following is the javadoc from Strin class for intern method:
intern
public String intern() Returns a canonical representation for the
string object. A pool of strings, initially empty, is maintained
privately by the class String.
When the intern method is invoked, if the pool already contains a
string equal to this String object as determined by the equals(Object)
method, then the string from the pool is returned. Otherwise, this
String object is added to the pool and a reference to this String
object is returned.
It follows that for any two strings s and t, s.intern() == t.intern()
is true if and only if s.equals(t) is true.
All literal strings and string-valued constant expressions are
interned. String literals are defined in §3.10.5 of the Java Language
Specification
As can be seen by above doc that if new String always created an object in pool then intern method will be completely useless!! Also logically it doesn't makes any sense.
EDIT:
Also read the answer for this post : String pool creating two string objects for same String in Java

Related

Does string constant pool works with singleton annotated variables too?

object Keys {
#Singleton
const val KEY_Q = "question"
#Singleton
const val KEY_ID = "quesid"
}
I am using Singleton annotation with many string variables in my singleton class. I wanted to ask as by default a string variable is stored in the constant string pool, and during the updating process, JVM checks in the string pool if the same variable is available or not, if yes then it returns the reference of the same without creating a new one.
Now I want to ask, does this process works the same when we are using singleton annotation with our string variables. If yes then is there any benefit for me to use such a class with these annotations with different variables. I am a newbie to singletons please describe in detail. Thanks

Annotations make no difference to string pool behavior in Java. If your example was Java, the #Singleton annotations would not save memory1.
There is a very simple rule that covers what goes into the string pool in Java.
If a string is a result of evaluating a compile time constant expression then a single copy is placed in the string pool. The JLS specifies what a compile time constant expression is2.
The only other circumstance in which a string goes into the string pool3 is if some code explicitly calls intern on it.
However ...
In a modern JVM on modern hardware, it is most likely to be irrelevant whether a string goes into the string pool.
The string pool is part of the heap and is garbage collected like the rest of the heap.
The space taken by each individual string literal is most likely to be trivial compared to the rest of your application's memory usage. A few bytes, compared with megabytes ... or gigabytes.
If you think you can intern strings and exploit the special property of strings in the string pool (by using == to compare strings) you are treading on very dangerous grounds. This is a micro-optimization ... and it only works if you can be sure that you have interned all of the strings. (And besides, interning is more expensive that a few string comparisons, so your attempt at optimizing might be a failure.)
Finally, since Java 9, the GC performs automatic string de-duping for strings that have survived a few GC cycles. So if you really do have a lot of string data with a lot of duplicates, the best solution is to let the GC handle it.
1 - I cannot tell you exactly what this means for your example, because you are using syntax that is not valid Java. Java doesn't have a const, val or object keywords. This looks like Kotlin.
2 - Common examples are a string literal or a concatenation of string literals, but there are others; see JLS 15.28.
3 - It is implementation dependent whether the string itself or a copy of the string goes into the pool. But it is really difficult for an application to distinguish the different behaviors.

Why is String immutable in Java?

I was asked in an interview why String is immutable
I answered like this:
When we create a string in java like String s1="hello"; then an
object will be created in string pool(hello) and s1 will be
pointing to hello.Now if again we do String s2="hello"; then
another object will not be created but s2 will point to hello
because JVM will first check if the same object is present in
string pool or not.If not present then only a new one is created else not.
Now if suppose java allows string mutable then if we change s1 to hello world then s2 value will also be hello world so java String is immutable.
Can any body please tell me if my answer is right or wrong?

String is immutable for several reasons, here is a summary:
Security: parameters are typically represented as String in network connections, database connection urls, usernames/passwords etc. If it were mutable, these parameters could be easily changed.
Synchronization and concurrency: making String immutable automatically makes them thread safe thereby solving the synchronization issues.
Caching: when compiler optimizes your String objects, it sees that if two objects have same value (a="test", and b="test") and thus you need only one string object (for both a and b, these two will point to the same object).
Class loading: String is used as arguments for class loading. If mutable, it could result in wrong class being loaded (because mutable objects change their state).
That being said, immutability of String only means you cannot change it using its public API. You can in fact bypass the normal API using reflection. See the answer here.
In your example, if String was mutable, then consider the following example:
String a="stack";
System.out.println(a);//prints stack
a.setValue("overflow");
System.out.println(a);//if mutable it would print overflow

Java Developers decide Strings are immutable due to the following aspect design, efficiency, and security.
Design
Strings are created in a special memory area in java heap known as "String Intern pool". While you creating new String (Not in the case of using String() constructor or any other String functions which internally use the String() constructor for creating a new String object; String() constructor always create new string constant in the pool unless we call the method intern()) variable it searches the pool to check whether is it already exist.
If it is exist, then return reference of the existing String object.
If the String is not immutable, changing the String with one reference will lead to the wrong value for the other references.
According to this article on DZone:
Security
String is widely used as parameter for many java classes, e.g. network connection, opening files, etc. Were String not immutable, a connection or file would be changed and lead to serious security threat.
Mutable strings could cause security problem in Reflection too, as the parameters are strings.
Efficiency
The hashcode of string is frequently used in Java. For example, in a HashMap. Being immutable guarantees that hashcode will always the same, so that it can be cached without worrying the changes.That means, there is no need to calculate hashcode every time it is used.

We can not be sure of what was Java designers actually thinking while designing String but we can only conclude these reasons based on the advantages we get out of string immutability, Some of which are
1. Existence of String Constant Pool
As discussed in Why String is Stored in String Constant Pool article, every application creates too many string objects and in order to save JVM from first creating lots of string objects and then garbage collecting them. JVM stores all string objects in a separate memory area called String constant pool and reuses objects from that cached pool.
Whenever we create a string literal JVM first sees if that literal is already present in constant pool or not and if it is there, new reference will start pointing to the same object in SCP.
String a = "Naresh";
String b = "Naresh";
String c = "Naresh";
In above example string object with value Naresh will get created in SCP only once and all reference a, b, c will point to the same object but what if we try to make change in a e.g. a.replace("a", "").
Ideally, a should have value Nresh but b, c should remain unchanged because as an end user we are making the change in a only. And we know a, b, c all are pointing the same object so if we make a change in a, others should also reflect the change.
But string immutability saves us from this scenario and due to the immutability of string object string object Naresh will never change. So when we make any change in a instead of change in string object Naresh JVM creates a new object assign it to a and then make change in that object.
So String pool is only possible because of String's immutability and if String would not have been immutable, then caching string objects and reusing them would not have a possibility because any variable woulds have changed the value and corrupted others.
And That's why it is handled by JVM very specially and have been given a special memory area.
2. Thread Safety
An object is called thread-safe when multiple threads are operating on it but none of them is able to corrupt its state and object hold the same state for every thread at any point in time.
As we an immutable object cannot be modified by anyone after its creation which makes every immutable object is thread safe by default. We do not need to apply any thread safety measures to it such as creating synchronized methods.
So due to its immutable nature string object can be shared by multiple threads and even if it is getting manipulated by many threads it will not change its value.
3. Security
In every application, we need to pass several secrets e.g. user's user-name\passwords, connection URLs and in general, all of this information is passed as the string object.
Now suppose if String would not have been immutable in nature then it would cause a serious security threat to the application because these values are allowed to get changed and if it is allowed then these might get changed due to wrongly written code or any other person who have access to our variable references.
4. Class Loading
As discussed in Creating objects through Reflection in Java with Example, we can use Class.forName("class_name") method to load a class in memory which again calls other methods to do so. And even JVM uses these methods to load classes.
But if you see clearly all of these methods accepts the class name as a string object so Strings are used in java class loading and immutability provides security that correct class is getting loaded by ClassLoader.
Suppose if String would not have been immutable and we are trying to load java.lang.Object which get changed to org.theft.OurObject in between and now all of our objects have a behavior which someone can use to unwanted things.
5. HashCode Caching
If we are going to perform any hashing related operations on any object we must override the hashCode() method and try to generate an accurate hashcode by using the state of the object. If an object's state is getting changed which means its hashcode should also change.
Because String is immutable so the value one string object is holding will never get changed which means its hashcode will also not change which gives String class an opportunity to cache its hashcode during object creation.
Yes, String object caches its hashcode at the time of object creation which makes it the great candidate for hashing related operations because hashcode doesn't need to be calculated again which save us some time. This is why String is mostly used as HashMap keys.
Read More on Why String is Immutable and Final in Java.

Most important reason according to this article on DZone:
String Constant Pool
...
If string is mutable, changing the string with one reference will lead to the wrong value for the other references.
Security
String is widely used as parameter for many java classes, e.g. network connection, opening files, etc. Were String not immutable, a connection or file would be changed and lead to serious security threat.
...
Hope it will help you.

IMHO, this is the most important reason:
String is Immutable in Java because String objects are cached in
String pool. Since cached String literals are shared between multiple
clients there is always a risk, where one client's action would affect
all another client.
Ref: Why String is Immutable or Final in Java

You are right. String in java uses concept of String Pool literal. When a string is created and if the string already exists in the pool, the reference of the existing string will be returned, instead of creating a new object and returning its reference.If a string is not immutable, changing the string with one reference will lead to the wrong value for the other references.
I would add one more thing, since String is immutable, it is safe for multi threading and a single String instance can be shared across different threads. This avoid the usage of synchronization for thread safety, Strings are implicitly thread safe.

String is given as immutable by Sun micro systems,because string can used to store as key in map collection.
StringBuffer is mutable .That is the reason,It cannot be used as key in map object

The most important reason of a String being made immutable in Java is Security consideration. Next would be Caching.
I believe other reasons given here, such as efficiency, concurrency, design and string pool follows from the fact that String in made immutable. For eg. String Pool could be created because String was immutable and not the other way around.
Check Gosling interview transcript here
From a strategic point of view, they tend to more often be trouble free. And there are usually things you can do with immutables that you can't do with mutable things, such as cache the result. If you pass a string to a file open method, or if you pass a string to a constructor for a label in a user interface, in some APIs (like in lots of the Windows APIs) you pass in an array of characters. The receiver of that object really has to copy it, because they don't know anything about the storage lifetime of it. And they don't know what's happening to the object, whether it is being changed under their feet.
You end up getting almost forced to replicate the object because you don't know whether or not you get to own it. And one of the nice things about immutable objects is that the answer is, "Yeah, of course you do." Because the question of ownership, who has the right to change it, doesn't exist.
One of the things that forced Strings to be immutable was security. You have a file open method. You pass a String to it. And then it's doing all kind of authentication checks before it gets around to doing the OS call. If you manage to do something that effectively mutated the String, after the security check and before the OS call, then boom, you're in. But Strings are immutable, so that kind of attack doesn't work. That precise example is what really demanded that
Strings be immutable

String class is FINAL it mean you can't create any class to inherit it and change the basic structure and make the Sting mutable.
Another thing instance variable and methods of String class that are provided are such that you can't change String object once created.
The reason what you have added doesn't make the String immutable at all.This all says how the String is stored in heap.Also string pool make the huge difference in performance

In addition to the great answers, I wanted to add a few points. Like Strings, Array holds a reference to the starting of the array so if you create two arrays arr1 and arr2 and did something like arr2 = arr1 this will make the reference of arr2 same as arr1 hence changing value in one of them will result in change of the other one for example
public class Main {
public static void main(String[] args) {
int[] a = {1, 2, 3, 4};
int[] b = a;
a[0] = 8;
b[1] = 7;
System.out.println("A: " + a[0] + ", B: " + b[0]);
System.out.println("A: " + a[1] + ", B: " + b[1]);
//outputs
//A: 8, B: 8
//A: 7, B: 7
}
}
Not only that it would cause bugs in the code it also can(and will) be exploited by malicious user. Suppose if you have a system that changes the admin password. The user have to first enter the newPassword and then the oldPassword if the oldPassword is same as the adminPass the program change the password by adminPass = newPassword. let's say that the new password has the same reference as the admin password so a bad programmer may create a temp variable to hold the admin password before the users inputs data if the oldPassword is equal to temp it changes the password otherwise adminPass = temp. Someone knowing that could easily enter the new password and never enter the old password and abracadabra he has admin access. Another thing I didn't understand when learning about Strings why doesn't JVM create a new string for every object and have a unique place in memory for it and you can just do that using new String("str"); The reason you wouldn't want to always use new is because it's not memory efficient and it is slower in most cases read more.

If HELLO is your String then you can't change HELLO to HILLO. This property is called immutability property.
You can have multiple pointer String variable to point HELLO String.
But if HELLO is char Array then you can change HELLO to HILLO. Eg,
char[] charArr = 'HELLO';
char[1] = 'I'; //you can do this
Answer:
Programming languages have immutable data variables so that it can be used as keys in key, value pair. String variables are used as keys/indices, so they are immutable.

This probably has little to do with security because, very differently, security practices recommend using character arrays for passwords, not strings. This is because an array can be immediately erased when no longer needed. Differently, a string cannot be erased, because it is immutable. It may take long time before it is garbage collected, and even more before the content gets overwritten.
I think that immutability was chosen to allow sharing the strings and they fragments easily. String assignment, picking a substring becomes a constant time operation, and string comparison largely also, because of the reusable hash codes that are part of the string data structure and can be compared first.
From the other side, if the original string is huge (say large XML document), picking few symbols from there may prevent the whole document from being garbage collected. Because of that later Java versions seemed moved away from this immutability. Modern C++ has both mutable (std::string) and from C++17 also immutable (std::string_view) versions.

From the Security point of view we can use this practical example:
DBCursor makeConnection(String IP,String PORT,String USER,String PASS,String TABLE) {
// if strings were mutable IP,PORT,USER,PASS can be changed by validate function
Boolean validated = validate(IP,PORT,USER,PASS);
// here we are not sure if IP, PORT, USER, PASS changed or not ??
if (validated) {
DBConnection conn = doConnection(IP,PORT,USER,PASS);
}
// rest of the code goes here ....
}

Avoid duplicate Strings in Java

I want to ask a question about avoiding String duplicates in Java.
The context is: an XML with tags and attributes like this one:
<product id="PROD" name="My Product"...></product>
With JibX, this XML is marshalled/unmarshalled in a class like this:
public class Product{
private String id;
private String name;
// constructor, getters, setters, methods and so on
}
The program is a long-time batch processing, so Product objects are created, used, copied, etc.
Well, the question is:
When I analysed the execution with software like Eclipse memory analyzer (MAT), I found several duplicated Strings. For example, in the id attribute, the PROD value is duplicated around 2000 instances, etc.
How can I avoid this situation? Other attributes in Product class may change their value along the execution, but attrs like id, name... don't change so frequently.
I have readed something about String.intern() method, but I haven't used yet and I'm not sure it's a solution for this. Could I define the most frequent values in those attributes like static final constants in the class?
I hope I'd have expressed my question in a right way.
Any help or advice is very appreciated. Thanks in advance.

interning would be the right solution, if you really have a problem. Java stores String literals and a lot of other Strings in an internal pool and whenever a new String is about to be created, the JVM first checks, if the String is already in the pool. If yes, it will not create a new instance but pass the reference to the interned String object.
There are two ways to control this behaviour:
String interned = String.intern(aString); // returns a reference to an interned String
String notInterned = new String(aString); // creates a new String instance (guaranteed)
So maybe, the libraries really create new instances for all xml attribute values. This is possible and you won't be able to change it.
intern has a global effect. An interned String is immediatly available "for any object" (this view doesn't really make sense, but it may help to understand it).
So, lets say we have a line in class Foo, method foolish:
String s = "ABCD";
String literals are interned immediatly. JVM checks, if "ABCD" is already in the pool, if not, "ABCD" is stored in the pool. The JVM assigns a reference to the interned String to s.
Now, maybe in another class Bar, in method barbar:
String t = "AB"+"CD";
Then the JVM will intern "AB" and "CD" like above, create the concatenated String, look, if it is intered already, Hey, yes it is, and assign the reference to the interned String "ABCD" to t.
Calling "PROD".intern() may work or fail. Yes, it will intern the String "PROD". But there's a chance, that jibx really creates new Strings for attribute values with
String value = new String(getAttributeValue(attribute));
In that case, value will not have a reference to an interned String (even if "PROD" is in the pool) but a reference to a new String instance on the heap.
And, to the other question in your command: this happens at runtime only. Compiling simply creates class files, the String pool is a datastructure on the object heap and that is used by the JVM, that executes the application.

While String.intern() could solve that problem by reducing each value to a single unique String instance, it would introduce another problem: every intern()-ed String can survive for a long time in the JVM. If the IDs vary a lot (i.e. they are not part of a limited set, but can be any value), then this can have massive negative effects in the long run.
Edit: I used to claim that intern()-ed Strings can't ever be GCed, but #nanda proved me wrong with this JavaWorld article. While this somewhat reduces the problem introduced by intern() it's still not entirely removed: the pool provided by intern() can't be controlled and can have unexpected results with regards to garbage-collection).
Luckily Guava provides a solution in the form of the Interner interface and it's helper class Interners: Using Interners.newStrongInterner() you can create an object that can act as a "pool" of unique String objects much in the same way as String.intern() does, except that the pool is bound to that instance and if you discard the pool, then the content can become eligible for garbage collection as well.

Yes, interning is the correct solution and you'd done your homework (that is checking with profiler that this is the problem).
Interning can cause problem if you store too much. The permgen memory needs to be increased. Despite what some people said, interned Strings are also garbage collected, so if some strings are not used anymore, it will be object to be garbage collected.
Some supporting articles:
My blog: http://blog.firdau.si/2009/01/06/java-tips-memory-optimization-for-string/
Does intern garbage collected?: http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html
Busting the 'Busting String.intern() Myths': http://kohlerm.blogspot.com/2009/01/is-javalangstringintern-really-evil.html

An alternative solution:
You could try is to define an <xs:enumeration/> restriction on your #id attribute (if your domain model would allow such a thing). If JibX is as intelligent as JAXB or other XML-Java mapping standards, then this could be mapped as a Java enum with constant literals, which can be reused heavily.
I would try that for the ID value, since it kinda looks like an enumeration to me...

As everyone know, String objects can be created in two ways, by using the literals and through new operator.
If you use a literal like String test = "Sample"; then this will be cached in String object pool. So interning is not required here as by default the string object will be cached.
But if you create a string object like String test = new String("Sample"); then this string object will not be added to the string pool. So here we need to use String test = new String("Sample").intern(); to forcefully push the string object to the string cache.
So it is always advisable to use string literals than new operator.
So in your case private static final String id = "PROD"; is the right solution.

Java/.NET - reusing values

Do we reduce memory consumption when storing a String value that we use very frequently?
As far as I know, every time we do a "some text" declaration in code, a new String object is constructed, instead of using the address of an existing one with the same value. Is this correct?
Is there anything that can be done to make the memory more efficient by always addressing the same String rather than creating new ones?

.NET uses a string intern pool to store string.
The common language runtime conserves string storage by maintaining a table, called the intern pool, that contains a single reference to each unique literal string declared or created programmatically in your program. Consequently, an instance of a literal string with a particular value only exists once in the system.
Example below shows that the intern pool is used for literal string only. (s2 doesn't reference the same string as s1 even if the content is the same)
string s1 = "MyTest";
string s2 = new StringBuilder().Append("My").Append("Test").ToString();
string s3 = String.Intern(s2);
Console.WriteLine((Object)s2==(Object)s1); // Different references.
Console.WriteLine((Object)s3==(Object)s1); // The same reference.
Java does the same thing :
All literal strings and string-valued constant expressions are interned.

I wouldn't sweat it unless you can identify a genuine performance / memory issue you are trying to address. If you absolutely must optimize, then Javamex has a useful article showing some tips on how to save memory, such as by using Java's internal string pool.

From the api-doc for the java 2 sdk:
A pool of strings, initially empty, is maintained privately by the class String.
When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.
So there is pooling...
The article here makes some good points about it.

In Java, you're totally wrong. Indeed, like in .net (or at least as #madgnome says), there is a constant string pool. Notice that, additionnaly to all constant strings, you can push a string to this pool by calling String.intern(). But use this method with cares, as it may be slow due to nature of that pool.

What is the purpose of the expression "new String(...)" in Java?

While looking at online code samples, I have sometimes come across an assignment of a String constant to a String object via the use of the new operator.
For example:
String s;
...
s = new String("Hello World");
This, of course, compared to
s = "Hello World";
I'm not familiar with this syntax and have no idea what the purpose or effect would be.
Since String constants typically get stored in the constant pool and then in whatever representation the JVM has for dealing with String constants, would anything even be allocated on the heap?

The one place where you may think you want new String(String) is to force a distinct copy of the internal character array, as in
small=new String(huge.substring(10,20))
However, this behavior is unfortunately undocumented and implementation dependent.
I have been burned by this when reading large files (some up to 20 MiB) into a String and carving it into lines after the fact. I ended up with all the strings for the lines referencing the char[] consisting of entire file. Unfortunately, that unintentionally kept a reference to the entire array for the few lines I held on to for a longer time than processing the file - I was forced to use new String() to work around it, since processing 20,000 files very quickly consumed huge amounts of RAM.
The only implementation agnostic way to do this is:
small=new String(huge.substring(10,20).toCharArray());
This unfortunately must copy the array twice, once for toCharArray() and once in the String constructor.
There needs to be a documented way to get a new String by copying the chars of an existing one; or the documentation of String(String) needs to be improved to make it more explicit (there is an implication there, but it's rather vague and open to interpretation).
Pitfall of Assuming what the Doc Doesn't State
In response to the comments, which keep coming in, observe what the Apache Harmony implementation of new String() was:
public String(String string) {
value = string.value;
offset = string.offset;
count = string.count;
}
That's right, no copy of the underlying array there. And yet, it still conforms to the (Java 7) String documentation, in that it:
Initializes a newly created String object so that it represents the same sequence of characters as the argument; in other words, the newly created string is a copy of the argument string. Unless an explicit copy of original is needed, use of this constructor is unnecessary since Strings are immutable.
The salient piece being "copy of the argument string"; it does not say "copy of the argument string and the underlying character array supporting the string".
Be careful that you program to the documentation and not one implementation.

The only time I have found this useful is in declaring lock variables:
private final String lock = new String("Database lock");
....
synchronized(lock)
{
// do something
}
In this case, debugging tools like Eclipse will show the string when listing what locks a thread currently holds or is waiting for. You have to use "new String", i.e. allocate a new String object, because otherwise a shared string literal could possibly be locked in some other unrelated code.

String s1="foo"; literal will go in StringPool and s1 will refer.
String s2="foo"; this time it will check "foo" literal is already available in StringPool or not as now it exist so s2 will refer the same literal.
String s3=new String("foo"); "foo" literal will be created in StringPool first then through string arg constructor String Object will be created i.e "foo" in the heap due to object creation through new operator then s3 will refer it.
String s4=new String("foo"); same as s3
so System.out.println(s1==s2); //true due to literal comparison.
and System.out.println(s3==s4);// false due to object comparison(s3 and s4 is created at different places in heap)

The sole utility for this constructor described by Software Monkey and Ruggs seems to have disappeared from JDK7.
There is no longer an offset field in class String, and substring always use
Arrays.copyOfRange(char[] original, int from, int to)
to trim the char array for the copy.

Well, that depends on what the "..." is in the example. If it's a StringBuffer, for example, or a byte array, or something, you'll get a String constructed from the data you're passing.
But if it's just another String, as in new String("Hello World!"), then it should be replaced by simply "Hello World!", in all cases. Strings are immutable, so cloning one serves no purpose -- it's just more verbose and less efficient to create a new String object just to serve as a duplicate of an existing String (whether it be a literal or another String variable you already have).
In fact, Effective Java (which I highly recommend) uses exactly this as one of its examples of "Avoid creating unnecessary objects":
As an extreme example of what not to do, consider this statement:
String s = new String("stringette"); **//DON'T DO THIS!**
(Effective Java, Second Edition)

Here is a quote from the book Effective Java Third Edition (Item 17: Minimize Mutability):
A consequence of the fact that immutable objects can be shared freely
is that you never have to make defensive copies of them (Item
50). In fact, you never have to make any copies at all because the
copies would be forever equivalent to the originals. Therefore, you
need not and should not provide a clone method or copy constructor
(Item 13) on an immutable class. This was not well understood in the
early days of the Java platform, so the String class does have a copy
constructor, but it should rarely, if ever, be used.
So It was a wrong decision by Java, since String class is immutable they should not have provided copy constructor for this class, in cases you want to do costly operation on immutable classes, you can use public mutable companion classes which are StringBuilder and StringBuffer in case of String.

Generally, this indicates someone who isn't comfortable with the new-fashioned C++ style of declaring when initialized.
Back in the C days, it wasn't considered good form to define auto variables in an inner scope; C++ eliminated the parser restriction, and Java extended that.
So you see code that has
int q;
for(q=0;q<MAX;q++){
String s;
int ix;
// other stuff
s = new String("Hello, there!");
// do something with s
}
In the extreme case, all the declarations may be at the top of a function, and not in enclosed scopes like the for loop here.
IN general, though, the effect of this is to cause a String ctor to be called once, and the resulting String thrown away. (The desire to avoid this is just what led Stroustrup to allow declarations anywhere in the code.) So you are correct that it's unnecessary and bad style at best, and possibly actually bad.

There are two ways in which Strings can be created in Java. Following are the examples for both the ways:
1) Declare a variable of type String(a class in Java) and assign it to a value which should be put between double quotes. This will create a string in the string pool area of memory.
eg: String str = "JAVA";
2)Use the constructor of String class and pass a string(within double quotes) as an argument.
eg: String s = new String("JAVA");
This will create a new string JAVA in the main memory and also in the string pool if this string is not already present in string pool.

I guess it will depend on the code samples you're seeing.
Most of the times using the class constructor "new String()" in code sample are only to show a very well know java class instead of creating a new one.
You should avoid using it most of the times. Not only because string literals are interned but mainly because string are inmutable. It doesn't make sense have two copies that represent the same object.
While the article mensioned by Ruggs is "interesting" it should not be used unless very specific circumstances, because it could create more damage than good. You'll be coding to an implementation rather than an specification and the same code could not run the same for instance in JRockit, IBM VM, or other.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.