How does Java initialize String literal


The Javadoc says:
The String class represents character strings. All string literals in Java programs, such as "abc", are implemented as instances of this class.
We know that the String class has two fields, value[] and hash, and that String literals are stored in a String pool.
But I cannot figure out how a String literal is initialized before it is put into that pool. If I inspect the string literal later in a debugger, I can see that value[] and hash have somehow been populated.
Does JVM invoke a special instruction?

The JVM creates a new String object for the literal during constant pool resolution, unless an equal string has already been put into the string table (for example by a String.intern call).
It is not specified how the JVM creates and initializes such strings, so a JVM may do whatever it wants as long as the resulting object is a regular java.lang.String instance that can be accessed from application code.
As for the HotSpot JVM, it does not call any of the String constructors; it just allocates a new object on the heap and fills in the fields in C++ code, see java_lang_String::basic_create:
Handle java_lang_String::basic_create(int length, TRAPS) {
  assert(initialized, "Must be initialized");
  // Create the String object first, so there's a chance that the String
  // and the char array it points to end up in the same cache line.
  oop obj;
  obj = InstanceKlass::cast(SystemDictionary::String_klass())->allocate_instance(CHECK_NH);

  // Create the char array. The String object must be handlized here
  // because GC can happen as a result of the allocation attempt.
  Handle h_obj(THREAD, obj);
  typeArrayOop buffer;
  buffer = oopFactory::new_charArray(length, CHECK_NH);

  // Point the String at the char array
  obj = h_obj();
  set_value(obj, buffer);
  // No need to zero the offset, allocation zero'ed the entire String object
  assert(offset(obj) == 0, "initial String offset should be zero");
  //set_offset(obj, 0);
  set_count(obj, length);

  return h_obj;
}
The hash field of such a new object is initialized to zero. The actual hash code is calculated on the first call to String.hashCode.
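The effect of this can be observed from application code. The following is a minimal sketch (the class and method names are made up for illustration): identical literals resolve to one pooled instance, while a string built at run time only matches the literal by identity after intern().

```java
final class LiteralIdentityDemo {
    // Two occurrences of the same literal resolve to one pooled String instance.
    static boolean sameLiteralInstance() {
        String a = "abc";
        String b = "abc";
        return a == b;
    }

    // A String built at run time is a distinct object until it is interned.
    static boolean internedMatchesLiteral() {
        String built = new String(new char[] {'a', 'b', 'c'});
        return built != "abc" && built.intern() == "abc";
    }

    public static void main(String[] args) {
        System.out.println(sameLiteralInstance());    // true
        System.out.println(internedMatchesLiteral()); // true
        // The hash is computed from the characters, so it does not depend
        // on how the String object was created.
        System.out.println("abc".hashCode());         // 96354
    }
}
```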

Does JVM invoke a special instruction?
No. There is no "special" JVM instruction to do that.
The classloading infrastructure will most likely be creating the String objects that correspond to the literals using one of the String constructors; e.g. String(char[]) or String(byte[]). It will be getting the characters or bytes from the "constant pool" area in the ".class" file.

Set a breakpoint on the public String(char value[]) constructor of the String class and follow it.
Edit: Whoever downvoted this:
Firstly, did you try as per answer? I had tried and it stops exactly in that method. I had tried with two different IDEs.
Secondly, value member(as pointed in OP) is assigned in that method. If you indeed tried, you would have got that. Thirdly, hash member(as pointed in OP) is assigned a value on the call to hashCode() and hash member is assigned a value in it.
Clearly, you haven't tried. If you did, investigation on breakpoint would have led you to answers in OP:
1) how that String literal is initialized
and
2) ...somehow populated.

Related

How many String objects are created if we call any method like subString() or toUpperCase() on a string in Java

Please consider below 2 lines of Java code :
String s = new String("Hello");
String s1 = s.toUpperCase();
In the above code how many objects will be created when we call toUpperCase() method on String s. Will an object be created in heap as well as String constant pool or will it be created only in heap?
I know toUpperCase() will create a new object in heap. But want to know if it will also be placed in the String constant pool.

The answer is ... it depends.
The one thing that we can say with certainty is that neither toUpperCase() nor substring() will place a String in the string pool (1). The only String operation that is specified as (possibly) adding a String to the string pool is intern().
We cannot say for certain that toUpperCase() and substring() will generate a new String object. For example, in Java 11 str.substring(0) returns str itself. This is an implementation detail, but the javadocs for many operations are worded so that it would be valid to return this ... if this satisfied the requirements.
In the cases when a new String is created, it is implementation dependent how many objects are created. A Java string is typically represented as a String object with a reference to a backing char[] or byte[] that represents the string's characters. But in some Java releases, a backing array can be shared between multiple String objects ... under certain circumstances. And indeed substring was one of the methods that would create a String with a shared backing array.
Finally, as a commenter pointed out, this stuff is irrelevant to real world Java programming ... in all but the most extreme circumstances. Normally, you should just let the JVM deal with this kind of stuff.
(1) It is the string pool, not the "constant string pool". The "constant pool" is actually something that exists in class files. But the string pool may contain strings that do not correspond to Java constants.
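As a rough illustration of the points above, here is a small sketch (the class and method names are invented for this example). It checks by identity that the result of toUpperCase() is not the pooled literal, and that only intern() yields the pooled instance. The substring(0) line prints an implementation detail and is deliberately left without an expected value.

```java
final class PoolMembershipDemo {
    // toUpperCase() on a mixed-case string yields a String that is equal to,
    // but not the same object as, the pooled literal "HELLO".
    static boolean upperCaseIsNotPooled() {
        String s = new String("Hello");
        String upper = s.toUpperCase();
        return upper.equals("HELLO") && upper != "HELLO";
    }

    // intern() is the operation that hands back the pooled instance.
    static boolean internReturnsPooledInstance() {
        String upper = new String("Hello").toUpperCase();
        return upper.intern() == "HELLO";
    }

    public static void main(String[] args) {
        System.out.println(upperCaseIsNotPooled());
        System.out.println(internReturnsPooledInstance());
        String s = "Hello";
        // Implementation detail: may or may not be the same object.
        System.out.println(s.substring(0) == s);
    }
}
```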

Java String Immutability and Using same string value to create a new string

I know that the title of the question is not very clear, sorry about that, did not know how to put it up. I have a very basic java implementation question which I want to focus on application performance, but it also involves String creation pattern in java.
I understand the immutability concept of Strings in Java. What I am not sure about is that, I have read somewhere that the following will not make two different String objects:
String name = "Sambhav";
String myName= "Sambhav";
I want to know how does Java do that? Does it actually look for a String value in the program memory and check for its existence and if it does not exist then creates a new String object?
In that case obviously it is saving memory but there are performance issues.
Also lets say I have a code like this:
public void some_method(){
    String name = "Sambhav";
    System.out.println(name); // or any random stuff
}
Now on each call of this function, is there a new String being made and added to memory or am I using the same String object?
I am just curious to know about the insights of how all this is happening?
Also if we say that
String name = "Sambhav";
String myName= "Sambhav";
will not create a new object because of reference, what about
String name = new String("Sambhav");
String myName= new String("Sambhav");
Will Java still be able to catch that the string are the same and just point myName to the same object as created in the previous statement?

Strings are internally char arrays with some inherent capabilities to work with the underlying char array, e.g. the substring(int) and split(String) methods.
Strings are immutable, which means any attempt to change a String creates a new String and allocates memory for it, as below:
line 1. String a = new String("SomeString");
line 2. a = "SomeStringChanged";
Line 1 allocates a String on the heap containing "SomeString", referenced by variable a; the literal "SomeString" itself is in the String Pool.
Line 2 makes a refer to the pooled literal "SomeStringChanged", i.e. a no longer points to the heap copy of "SomeString", and the memory occupied by that copy is now eligible for GC.
No reuse here.
line 3. String b = "SomeStringChanged";
Now the literal "SomeStringChanged" is shared by variables a and b, i.e. they refer to the same String object, which lives in the 'String Pool'.
line 4. a = new String("SomeStringChanged");
Now a new String object containing "SomeStringChanged" is allocated and referenced by a.
No String object is reused here. (The literal "SomeStringChanged" is already in the String Pool, so no String Pool allocation happens.)
line 5. a = new String("SomeStringChanged").intern();
Now the objects allocated by new on lines 4 and 5 are no longer referenced, and variables a and b refer to the same String in the String Pool, which contains "SomeStringChanged". The same object is reused here, thanks to the intern() method.
line 6. String x = new String("SomeX");
line 7. String y = "SomeX";
Line 6 creates a new String object for SomeX on the heap, in addition to the pooled literal. Whether the underlying char array is duplicated or shared depends on the JDK implementation.
Line 7 does not allocate any memory for SomeX, since it is already in the String Pool.
line 8. String s = new String(someStringVariable);
Line 8 allocates only a single String object on the heap and nothing in the String Pool.
In conclusion, a String is only reused if it is written as a literal or is interned, i.e. only these two make use of the String Pool (which is in fact the idea behind char array reuse).
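The walkthrough above can be checked with identity comparisons. This is only a sketch (the class name and the result array layout are made up for the example); it shows which references end up pointing at the same object, not how memory is laid out internally.

```java
final class PoolWalkthroughDemo {
    // Returns, in order: literalsShared, newIsDistinct, internedShared, xyDistinctButEqual.
    static boolean[] run() {
        String a = new String("SomeString");           // line 1: new heap object
        a = "SomeStringChanged";                       // line 2: pooled literal
        String b = "SomeStringChanged";                // line 3: same pooled literal
        boolean literalsShared = (a == b);

        a = new String("SomeStringChanged");           // line 4: new heap object
        boolean newIsDistinct = (a != b);

        a = new String("SomeStringChanged").intern();  // line 5: back to the pooled instance
        boolean internedShared = (a == b);

        String x = new String("SomeX");                // line 6
        String y = "SomeX";                            // line 7
        boolean xyDistinctButEqual = (x != y) && x.equals(y);

        return new boolean[] {literalsShared, newIsDistinct, internedShared, xyDistinctButEqual};
    }

    public static void main(String[] args) {
        for (boolean result : run()) {
            System.out.println(result); // each line prints true
        }
    }
}
```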

Strings that you put in quotes in your source files "like that" are compile-time constants, and in case their contents match they are represented by a single entry in the constant pool inside your class's byte-code representation and thus represent a single String object at run-time.
String name = new String("Sambhav");
String myName= new String("Sambhav");
Those are explicitly different objects: a new String object will be created for each call, though it could reuse the char array of the underlying string (the one you provide in the constructor). This happens because the new keyword instructs Java to create a new object. And that is why name != myName in that case, even though name.equals(myName).

String name = new String("Sambhav");
String myName = new String("Sambhav");
Will Java still be able to catch that the string
are the same and just point myName to the same object as created in
the previous statement?
No: new always creates a distinct object. For literals and interned strings, however, the JVM keeps only one instance of each distinct value, which it locates by computing a hash.
Those String objects are kept in a String pool.
String pooling
String pooling (sometimes also called string canonicalisation) is a process of replacing several String objects with equal value but different identity with a single shared String object.
You can achieve this goal by keeping your own Map<String, String> (with possibly soft or weak references depending on your requirements) and using map values as canonicalised values.
Or you can use String.intern() method which is provided to you by JDK.
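A minimal sketch of the Map-based approach might look like the following. The class name StringCanonicalizer and its method are hypothetical, and this version holds strong references in a plain HashMap; a version with weak or soft references, as mentioned above, would need a different map.

```java
import java.util.HashMap;
import java.util.Map;

final class StringCanonicalizer {
    // Maps each distinct value to the first instance seen with that value.
    private final Map<String, String> pool = new HashMap<>();

    // Returns the canonical instance for s, registering s if it is the first of its value.
    String canonicalize(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        StringCanonicalizer canonicalizer = new StringCanonicalizer();
        String first = new String("Sambhav");
        String second = new String("Sambhav");
        System.out.println(first == second); // false: two distinct objects
        System.out.println(canonicalizer.canonicalize(first)
                == canonicalizer.canonicalize(second)); // true: one shared instance
    }
}
```

Unlike String.intern(), such a map is not thread-safe and never releases its entries, so it only suits cases where the set of values is bounded.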
Quick string pool differences by JVM version
In Java 6, this String pool was located in the PermGen memory. This memory is usually small and limited, so String.intern() shouldn't be used heavily there because you can run out of memory.
In Java 7 and 8 it was moved to the heap memory and is implemented with a hash-table-like data structure.
Since hash-table-like structures (HashMap, WeakHashMap) use a computed hash to access an entry in constant time on average, lookups are very fast.
As mentioned in this article:
Stay away from String.intern() method on Java 6 due to a fixed size memory area (PermGen) used for JVM string pool storage.
Java 7 and 8 implement the string pool in the heap memory. It means that you are limited by the whole application memory for string pooling in Java 7 and 8.
Use -XX:StringTableSize JVM parameter in Java 7 and 8 to set the string pool map size. It is fixed, because it is implemented as a hash map with lists in the buckets. Approximate the number of distinct strings in your application (which you intend to intern) and set the pool size equal to some prime number close to this value. It will allow String.intern() to run in the constant time and requires a rather small memory consumption per interned string (explicitly used Java WeakHashMap will consume 4-5 times more memory for the same task).
The default value of the -XX:StringTableSize parameter is 1009 in Java 7 before update 40, and 60013 from Java 7u40 onwards, including Java 8.

You are actually showing 3 different reasons why the Strings may use the same buffer internally. Note that sharing a buffer is only possible for separate instances because they are immutable; otherwise changes in the buffer would be reflected in the other variable values as well.
1. The compiler detects identical String literals; if the string literal is repeated, the compiler may simply point to the same object instance;
2. References to a String are pointing to the same object instance and are therefore identical by definition;
3. Buffer sharing may help during construction with new. If the runtime system sees that String contents may be shared then it may opt to do so; this behavior is however not guaranteed - it's implementation specific. The object instances should be different (but using them as separate instances would still not be wise).
As an example for #3, the Java 6 OpenJDK source will simply point to the same buffer. If the buffer is larger than the new String instance, a copy will be created so that the Garbage Collector can clear the larger string (otherwise the larger character buffer may be kept in memory indefinitely).
This all should not matter too much to you, unless you get careless and start using == for equality (or other constructs that confuse == with equals).

Object life cycle in java and memory management?

For the below statement in a program, how many objects will be created in heap memory and in the string constant pool?
I need clarity in object creation. Many sources I've read are not elaborating. I am confused when the object gets destroyed.
String a="MAM"+"BCD"+"EFG"+"GFE";
How many objects will be created?
I am looking for good material about the life cycle of objects, methods and classes and how the JVM handles them when they are dynamically changed and modified.

"MAM"+"BCD"+"EFG"+"GFE" is a compile-time constant expression and it compiles into "MAMBCDEFGGFE" string literal. JVM will create an instance of String from this literal when loading the class containing the above code and will put this String into the string pool. Thus String a = "MAM"+"BCD"+"EFG"+"GFE"; does not create any object, see JLS 15.18.1. String Concatenation Operator +
The String object is newly created (§12.5) unless the expression is a compile-time constant expression (§15.28).
It simply assigns a reference to String object in pool to local var a.
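A small sketch can make the compile-time folding visible (the class and method names are illustrative only). The concatenation of literals is the same pooled object as the single literal, a final local initialized with a literal still counts as a constant, and a non-final variable forces a run-time concatenation.

```java
final class ConstantFoldingDemo {
    // All operands are literals, so the compiler folds them into one constant.
    static boolean literalsAreFolded() {
        String a = "MAM" + "BCD" + "EFG" + "GFE";
        return a == "MAMBCDEFGGFE";
    }

    // A final local initialized with a constant is itself a constant variable.
    static boolean finalLocalIsFolded() {
        final String part = "MAM";
        String a = part + "BCDEFGGFE";
        return a == "MAMBCDEFGGFE";
    }

    // A non-final operand means the String is newly created at run time.
    static boolean nonFinalCreatesNewString() {
        String part = "MAM";
        String a = part + "BCDEFGGFE";
        return a != "MAMBCDEFGGFE" && a.equals("MAMBCDEFGGFE");
    }

    public static void main(String[] args) {
        System.out.println(literalsAreFolded());        // true
        System.out.println(finalLocalIsFolded());       // true
        System.out.println(nonFinalCreatesNewString()); // true
    }
}
```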

Only one object is created.
String s1 = "java";
String s2 = "ja" + "va";
System.out.println(s1 == s2);
The statement yields true.
String s1 = "java";
String s2 = "ja";
String s3 = s2 + "va";
System.out.println(s1 == s3);
The statement yields false.
So if at least one operand is not a constant (for example, a variable), the '+' operator generates a new String object at run time, outside the string pool.
The expression in your question has no such non-constant operand, so it creates only one object.

Exactly one object is created and placed in the constant pool, unless it already exists, in which case the existing object is used. The compiler concatenates string constants together, as specified in JLS 3.10.5 and 15.28.
A long string literal can always be broken up into shorter pieces and written as a (possibly parenthesized) expression using the string concatenation operator +
http://docs.oracle.com/javase/specs/jls/se7/html/jls-3.html#jls-3.10.5

Most answers seem to focus on the points that a) the complete expression is one compile-time constant and b) the line itself does not construct a new object but only a reference to one object.
However, no one so far has mentioned that the String itself contains a reference to an internal char[] (which is also in the constant pool).
Summary: There are two objects in the constant pool (String and char[]). The line does neither create nor destroy any object.
And regarding:
I am confused when the object gets destroyed.
No object is destroyed, since stuff in the constant pool will only be destroyed if the class itself would be unloaded. At most you can say, that the reference a will go out of scope eventually.

Only one object will be created since String a will compile into "MAMBCDEFGGFE".

Answers stating a single heap object in your example are correct. However, consider this code:
public class Tester
{
    public String a = "MAM";
    public String b = "BCD";
    public String c = "EFG";
    public String d = "GFE";

    public Tester()
    {
        String abcd = a + b + c + d;
    }
}
In this example, there are 7 strings being created. a,b,c and d are not compiled into a single constant - they are members. 1 string is then created for each + operator - semantically speaking, + is a concatenation but logically it is creating a new string in memory. The first 2 operator strings are discarded immediately and are now eligible for garbage collection but the memory churn still occurs.
Technically there is an 8th object: the instance of Tester.
Edit: This has been proved to be nonsense in the comments

Strings memory model

What I read about strings is when a string object is created in Java it is immutable. For example:
String s=new String();
s="abc";
s="xyz";
Does the String s no longer point to "abc"?
And another thing: what is size of s;
Is the String object analogous to a char* pointer in C in terms of the memory model?

No, a Java String is not like char* in C. If you are looking for an analogue, char[] in Java is closer.
String is a class that wraps a char array and provides a lot of functionality. Moreover it is immutable, i.e. you cannot change its content. You can only create another string. Additionally String is final, so you cannot subclass it.
String is a special class. Only String supports the + operator directly. No other class has operators defined on it; primitive wrappers (Integer, Double etc.) only take part in arithmetic through unboxing. A string constant in code, such as "foobar", produces a java.lang.String instance.

Yes, Strings in Java are immutable. That is to say, the content pointed to by a string variable cannot be changed after it has been initialized. Using your own examples to illustrate:
String s=new String();
In the above code, you have created a new, empty String and assigned it to the s variable.
s="abc";
You now have another String, this time with content "abc", and you have set the variable s to point to it. Your previous, empty string will get garbage collected at some point in the future.
s="xyz";
Similar to above. You now have another String, with content "xyz", and have set the variable s to point to it. The previous "abc" string is no longer referenced by s; because it comes from a literal, it stays in the string pool for as long as the class that defines it is loaded.
Note that at no point did you actually modify the empty string to become "abc", or modify the "abc" string to become "xyz". All operations on String that concatenate, convert case, or otherwise appear to modify the String actually return a new String with the function results. To illustrate:
String s = new String("Hello");
String b = s.concat(" World");
System.out.println(s); // This will NOT produce 'Hello World'
System.out.println(b); // Whereas, this will
As to the last part of your question, you would use the length() method to determine the 'size' of your string. Note that it returns the number of UTF-16 code units (Java chars), not bytes, so it might not be what you expect the length to be as defined in ANSI C.
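To make both points concrete, here is a short sketch (the class name and the sample string are chosen for illustration). It shows that concat() leaves the original untouched, and that length() counts UTF-16 code units, so a character outside the Basic Multilingual Plane counts as two.

```java
final class StringLengthDemo {
    // "a" followed by U+1F600, which needs a surrogate pair (two chars) in UTF-16.
    static final String SAMPLE = "a\uD83D\uDE00";

    // concat() returns a new String; the receiver keeps its original content.
    static boolean concatLeavesOriginalUnchanged() {
        String s = new String("Hello");
        String b = s.concat(" World");
        return s.equals("Hello") && b.equals("Hello World");
    }

    public static void main(String[] args) {
        System.out.println(concatLeavesOriginalUnchanged());            // true
        System.out.println(SAMPLE.length());                            // 3 code units
        System.out.println(SAMPLE.codePointCount(0, SAMPLE.length()));  // 2 code points
    }
}
```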

Que: Does the String s no longer point to "abc"?
Ans: No, it points to "xyz" in your code.
Que: what is size of s?
Ans: It's a bit tricky: s is a reference to "xyz". This is true of any declared variable that references an object. So it is better to say that s points to "xyz".
One important hint: an object and a reference to that object occupy different memory.
Immutable: an object is immutable if you can't change its state after instantiation.

What is the purpose of the expression "new String(...)" in Java?

While looking at online code samples, I have sometimes come across an assignment of a String constant to a String object via the use of the new operator.
For example:
String s;
...
s = new String("Hello World");
This, of course, compared to
s = "Hello World";
I'm not familiar with this syntax and have no idea what the purpose or effect would be.
Since String constants typically get stored in the constant pool and then in whatever representation the JVM has for dealing with String constants, would anything even be allocated on the heap?

The one place where you may think you want new String(String) is to force a distinct copy of the internal character array, as in
small=new String(huge.substring(10,20))
However, this behavior is unfortunately undocumented and implementation dependent.
I have been burned by this when reading large files (some up to 20 MiB) into a String and carving it into lines after the fact. I ended up with all the strings for the lines referencing the char[] consisting of the entire file. Unfortunately, that unintentionally kept a reference to the entire array for the few lines I held on to for longer than the file was being processed. I was forced to use new String() to work around it, since processing 20,000 files very quickly consumed huge amounts of RAM.
The only implementation agnostic way to do this is:
small=new String(huge.substring(10,20).toCharArray());
This unfortunately must copy the array twice, once for toCharArray() and once in the String constructor.
There needs to be a documented way to get a new String by copying the chars of an existing one; or the documentation of String(String) needs to be improved to make it more explicit (there is an implication there, but it's rather vague and open to interpretation).
Pitfall of Assuming what the Doc Doesn't State
In response to the comments, which keep coming in, observe what the Apache Harmony implementation of new String() was:
public String(String string) {
    value = string.value;
    offset = string.offset;
    count = string.count;
}
That's right, no copy of the underlying array there. And yet, it still conforms to the (Java 7) String documentation, in that it:
Initializes a newly created String object so that it represents the same sequence of characters as the argument; in other words, the newly created string is a copy of the argument string. Unless an explicit copy of original is needed, use of this constructor is unnecessary since Strings are immutable.
The salient piece being "copy of the argument string"; it does not say "copy of the argument string and the underlying character array supporting the string".
Be careful that you program to the documentation and not one implementation.

The only time I have found this useful is in declaring lock variables:
private final String lock = new String("Database lock");
....
synchronized(lock)
{
// do something
}
In this case, debugging tools like Eclipse will show the string when listing what locks a thread currently holds or is waiting for. You have to use "new String", i.e. allocate a new String object, because otherwise a shared string literal could possibly be locked in some other unrelated code.
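The risk with a literal lock can be shown with a small sketch. The classes ModuleA and ModuleB below are hypothetical stand-ins for unrelated code that happens to use the same literal: both end up holding the very same object, whereas new String(...) yields a private one.

```java
final class LockIdentityDemo {
    // Two unrelated classes that each "own" a lock written as the same literal.
    static final class ModuleA { static final String LOCK = "Database lock"; }
    static final class ModuleB { static final String LOCK = "Database lock"; }

    // Both fields refer to the single pooled literal, so the lock is shared.
    static boolean literalLocksAreShared() {
        return ModuleA.LOCK == ModuleB.LOCK;
    }

    // new String(...) creates a distinct object that no other code can obtain by literal.
    static boolean newStringLockIsDistinct() {
        String lock = new String("Database lock");
        return lock != ModuleA.LOCK && lock.equals(ModuleA.LOCK);
    }

    public static void main(String[] args) {
        System.out.println(literalLocksAreShared());   // true
        System.out.println(newStringLockIsDistinct()); // true
    }
}
```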

String s1="foo"; the literal goes into the string pool and s1 refers to it.
String s2="foo"; this time the "foo" literal is already in the string pool, so s2 refers to the same object.
String s3=new String("foo"); the "foo" literal is placed in the string pool first (if it is not already there), then the String-argument constructor creates a new String object "foo" on the heap because of the new operator, and s3 refers to it.
String s4=new String("foo"); same as s3.
So System.out.println(s1==s2); // true: both refer to the same pooled literal
and System.out.println(s3==s4); // false: s3 and s4 are distinct objects on the heap

The sole utility for this constructor described by Software Monkey and Ruggs seems to have disappeared from JDK7.
There is no longer an offset field in class String, and substring always uses
Arrays.copyOfRange(char[] original, int from, int to)
to trim the char array for the copy.

Well, that depends on what the "..." is in the example. If it's a StringBuffer, for example, or a byte array, or something, you'll get a String constructed from the data you're passing.
But if it's just another String, as in new String("Hello World!"), then it should be replaced by simply "Hello World!", in all cases. Strings are immutable, so cloning one serves no purpose -- it's just more verbose and less efficient to create a new String object just to serve as a duplicate of an existing String (whether it be a literal or another String variable you already have).
In fact, Effective Java (which I highly recommend) uses exactly this as one of its examples of "Avoid creating unnecessary objects":
As an extreme example of what not to do, consider this statement:
String s = new String("stringette"); // DON'T DO THIS!
(Effective Java, Second Edition)

Here is a quote from the book Effective Java Third Edition (Item 17: Minimize Mutability):
A consequence of the fact that immutable objects can be shared freely
is that you never have to make defensive copies of them (Item
50). In fact, you never have to make any copies at all because the
copies would be forever equivalent to the originals. Therefore, you
need not and should not provide a clone method or copy constructor
(Item 13) on an immutable class. This was not well understood in the
early days of the Java platform, so the String class does have a copy
constructor, but it should rarely, if ever, be used.
So it was arguably a wrong decision by Java's designers: since the String class is immutable, they should not have provided a copy constructor for it. In cases where you want to do costly operations on immutable classes, you can use their public mutable companion classes, which are StringBuilder and StringBuffer in the case of String.

Generally, this indicates someone who isn't comfortable with the new-fashioned C++ style of declaring when initialized.
Back in the C days, it wasn't considered good form to define auto variables in an inner scope; C++ eliminated the parser restriction, and Java extended that.
So you see code that has
int q;
for(q=0;q<MAX;q++){
    String s;
    int ix;
    // other stuff
    s = new String("Hello, there!");
    // do something with s
}
In the extreme case, all the declarations may be at the top of a function, and not in enclosed scopes like the for loop here.
In general, though, the effect of this is to cause a String ctor to be called once, and the resulting String thrown away. (The desire to avoid this is just what led Stroustrup to allow declarations anywhere in the code.) So you are correct that it's unnecessary and bad style at best, and possibly actually bad.

There are two ways in which Strings can be created in Java. Following are the examples for both the ways:
1) Declare a variable of type String (a class in Java) and assign it a value written between double quotes. This will create a string in the string pool area of memory.
eg: String str = "JAVA";
2) Use the constructor of the String class and pass a string (within double quotes) as an argument.
eg: String s = new String("JAVA");
This will create a new string JAVA on the heap, and also in the string pool if this string is not already present there.

I guess it will depend on the code samples you're seeing.
Most of the time, the class constructor "new String()" appears in code samples only to show a very well-known Java class instead of creating a new one.
You should avoid using it most of the time, not only because string literals are interned but mainly because strings are immutable. It doesn't make sense to have two copies that represent the same value.
While the article mentioned by Ruggs is "interesting", it should not be applied except in very specific circumstances, because it could do more harm than good. You'll be coding to an implementation rather than a specification, and the same code might not behave the same on, for instance, JRockit, IBM VM, or others.
