I know that when we use String literals as shown below, the String object is created in the String pool (if it doesn't already exist).
String str1= "hello";
String str2= "hello";
In the above case only one String object will be created in the pool.
But when we use the new keyword, it always creates a new String object in heap memory (even if there is already one in the String pool).
String str3=new String("hello"); // here a new object will be created in heap.
I am confused about how many objects will be created in the cases below, and where (pool or heap memory).
1) String s="Hello";
String s1 = new String ("Hello");
2) String s = new String("Hello");
String s1 = new String("Hello");
3) String s="Hello";
String s1=new String (s);
4) String s1 = new String ("Hello");
String s="Hello";
Every invocation of new String(...) will create a new instance. You can use String.intern() to get an instance from the pool.
String s="Hello";
String s1 = new String ("Hello");
System.out.println(System.identityHashCode(s)==System.identityHashCode(s1));
String si= new String ("Hello").intern();
String s1i = new String ("Hello").intern();
System.out.println(System.identityHashCode(si)==System.identityHashCode(s1i));
This prints false and true
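The same check can be written with == directly, since == on references already compares object identity. A minimal sketch (the class and method names are made up for illustration):

```java
class InternDemo {
    // A literal and a new String(...) with the same content are distinct objects.
    static boolean newCreatesDistinctObject() {
        String s = "Hello";
        String s1 = new String("Hello");
        return s != s1;
    }

    // Two interned copies resolve to the same pooled instance.
    static boolean internedCopiesAreSame() {
        String si = new String("Hello").intern();
        String s1i = new String("Hello").intern();
        return si == s1i;
    }

    public static void main(String[] args) {
        System.out.println(newCreatesDistinctObject()); // true
        System.out.println(internedCopiesAreSame());    // true
    }
}
```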
We can account for memory in Java’s String objects in the same way as
for any other object, except that aliasing is common for strings.
The standard String
implementation has four instance variables: a reference to a character array (8 bytes)
and three int values (4 bytes each). The first int value is an offset into the character array;
the second is a count (the string length).
In terms of the instance variable names in
the drawing on the figure, the string that is represented consists of the characters
value[offset] through value[offset + count - 1]. The third int value in String
objects is a hash code that saves recomputation in certain circumstances.
Therefore, each String object uses a total of 40 bytes (16 bytes for
object overhead plus 4 bytes for each of the three int instance variables plus 8 bytes for
the array reference plus 4 bytes of padding).
This space requirement is in addition to
the space needed for the characters themselves, which are in the array. The space needed
for the characters is accounted for separately because the char array is often shared
among strings. Since String objects are immutable, this arrangement allows the implementation
to save memory when String objects have the same underlying value[].
String values and substrings.
A String of length N typically uses 40 bytes (for the
String object) plus 24 + 2N bytes (for the array that contains the characters) for a
total of 64 + 2N bytes. But it is typical in string processing to work with substrings, and
Java’s representation is meant to allow us to do so without having to make copies of the string's characters!
Source: Algorithms 4th Edition
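The arithmetic in the quoted passage can be written out as a small helper. This is only a sketch of the textbook's model (a 64-bit JVM with a char[]-backed String that has offset and count fields); real JVMs differ, for example with compressed references or the compact byte[] representation used since Java 9. The class and constant names are made up for illustration.

```java
class StringMemoryModel {
    // 16 bytes object overhead + 3 ints (4 bytes each) + 8-byte array reference + 4 bytes padding
    static final int STRING_OBJECT_BYTES = 16 + 3 * 4 + 8 + 4; // 40
    // Overhead of the char[] itself in the textbook's model
    static final int ARRAY_OVERHEAD_BYTES = 24;

    // Estimated bytes for a String of n chars that does not share its array.
    static long estimatedBytes(int n) {
        return STRING_OBJECT_BYTES + ARRAY_OVERHEAD_BYTES + 2L * n;
    }

    public static void main(String[] args) {
        System.out.println(estimatedBytes(5)); // 74
    }
}
```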
How many objects are being created for your examples?
1) 4
2) 4
3) 3
4) 4
Note that each String object contains a char-array with the content of the string. So when creating a new String you actually create two objects.
1) 2) and 4)
Each line in your examples either creates a String in the pool which contains a char-array (therefore we have two objects) or create a new String which - again - contains a char-array. Note that in neither of these examples the strings share any of the content.
3)
This example is different since we use the first String (2 objects) to create the second String. In this case the second String will be a new object but it will use the very same char-array as the first one, therefore not creating a new one. This leads to a total of only 3 objects instead of 4.
One more example
String s1 = "Hello";
String s2 = "Hello";
In this case we will have only 2 objects, since s1 and s2 will both point to the same String object in the pool with the same char-array.
What actually happens when we concatenate a string S2 of size Y to a string S1 of size X (already present on the heap) using the + operator?
This is what I think:
If I execute the following function:
class StringConcatenation {
    String S1;

    String concat(String S2) {
        this.S1 = this.S1 + S2;
        return this.S1; // the method declares a String return type, so it must return a value
    }
}
If S1 was present in the string pool (which is stored in the heap) and we execute the concat method, then the method gets executed on the stack.
So, the CPU will need to copy the S1 in the stack => READ S1
As strings are immutable, Java must make a new object (let's name its reference as S3).
Now, contents of S1 and S2 are copied at new object => COPY S1 + COPY S2
Then the reference S1 points to the new object.
Therefore, the total time complexity is O(READ S1 + COPY S1 + COPY S2 ) = O(X + X + Y) = O(2*X + Y).
Is my thought process correct?
In the documentation you can read:
The Java language provides special support for the string concatenation operator ( + ), and for conversion of other objects to strings. String concatenation is implemented through the StringBuilder(or StringBuffer) class and its append method.
a = a + b is the equivalent of a += b or:
a = new StringBuilder()
.append(a)
.append(b)
.toString();
Disclaimer: The String class has undergone multiple changes to improve performance and space utilization. What happens when JIT compiles code is then entirely undefined. The following is a simplification, and ignores any optimizations that may or may not be applied.
String is a class that encapsulates a char[]. The array length is always exactly the length() of the string. The class, and the underlying array, is immutable.
class String {
private final char[] arr;
}
StringBuilder (and StringBuffer) is another class that encapsulates a char[], but the array is almost always larger than the number of characters in the array. The class, and the array, is mutable.
class StringBuilder {
private char[] arr;
private int len;
}
When you do string concatenation with the + operator, the compiler generates that as:
// Java code
s = s1 + s2 + s3;
// Generated code
s = new StringBuilder().append(s1).append(s2).append(s3).toString();
StringBuilder will initially create the array with length 16, and will re-allocate the array when needed. Worst case is that s1, s2, and s3 are all too large for the current array, so each append() call needs to re-size the array.
This means that the operation would progress as follows:
new StringBuilder() - Creates char[16].
append(s1) - Resizes arr, then copies chars from s1.arr to the array.
append(s2) - Resizes arr, copies existing content (chars from s1) to new array, then copies chars from s2.arr to the array.
append(s3) - Resizes arr, copies existing content (chars from s1 and s2) to new array, then copies chars from s3.arr to the array.
toString() - Create new String with char[] sized to exactly fit the characters in the StringBuilder, then copies the content (chars from s1, s2, and s3) to the new String.
All in all, the chars from s1 end up being copied 4 times.
If the string concatenation is S1 + S2, like in the question, then the characters from S1 are copied 2 or 3 times, and the characters from S2 are copied 2 times.
Since time complexity is generally worst case, that means O(3m + 2n), not the O(2m + n) suggested in the question. Of course, Big-O eliminates constant factors, so it is actually O(m + n).
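Under the simplified StringBuilder model described above, the resize copies can be avoided by sizing the builder up front. The following is a sketch under that assumption, not a statement about what the compiler or JIT actually emits; the class and method names are hypothetical.

```java
class ConcatDemo {
    // Sizes the builder for the final length, so append() never has to grow the array.
    // In the simplified model, each input is then copied once into the builder
    // and once more by toString().
    static String concatPresized(String s1, String s2) {
        StringBuilder sb = new StringBuilder(s1.length() + s2.length());
        sb.append(s1).append(s2);
        return sb.toString();
    }

    public static void main(String[] args) {
        String result = concatPresized("Hello, ", "world");
        System.out.println(result); // Hello, world
    }
}
```

The result is the same as with the + operator; only the number of intermediate array copies changes, and the total work stays O(m + n).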
I was reading about String in Java and trying to understand it.
At first it was easy to see how String s1="11" and String s2=new String("11") are created, and I understood the intern method as well.
But I came across this example (given by a friend) and it confused me about everything.
I need help understanding it.
String s1 = new String(new String("2")+new String("2"));
s1.intern();
String s2="22";
System.out.print(s1==s2); //=>true as output.
String s3 =new String (new String("2")+new String("2"));
s3.intern();
String s4="22";
System.out.print(s3==s4); //=>false as output.
The output of this code is true and false.
The part for s1 and s2 made sense and printed true as I expected, but I didn't understand the second part.
I hope someone can break the code down line by line and help me understand.
s1.intern(); adds s1 to the pool of strings, therefore the string "22" is now in the pool of strings. Therefore when you write s2 = "22" that's the same "22" as s1 and thus s1 == s2.
s3.intern() does NOT add s3 to the pool of strings because the string "22" is already there.
s3.intern() does return that same "22" which is s1 BUT IT IS NOT USED. Therefore s3 is not equal s4.
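The point about the unused return value can be shown in isolation. In this sketch (class and method names are hypothetical) the literal is used first, so it is certainly in the pool before intern() is called:

```java
class InternReturnDemo {
    // intern() never changes the object it is called on; only its return value is the pooled instance.
    static boolean ignoringReturnValue() {
        String literal = "22";
        String copy = new String(literal); // always a fresh object
        copy.intern();                     // result discarded
        return copy == literal;
    }

    static boolean usingReturnValue() {
        String literal = "22";
        String copy = new String(literal);
        copy = copy.intern();              // now refers to the pooled "22"
        return copy == literal;
    }

    public static void main(String[] args) {
        System.out.println(ignoringReturnValue()); // false
        System.out.println(usingReturnValue());    // true
    }
}
```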
Java has a heap and a stack.
The heap is where all objects are stored.
The stack is where local variables are stored.
There is also a pool (a kind of cache) for Strings and for small Integers.
As you know, a String can be created in more than one way,
such as new String("word") or just = "word". With the first form you always create a new object on the heap; with the second, the word is stored in a pool of strings. (The Java engineers wanted to avoid creating many objects for repeated words, so they added a special pool for them; the same is done for boxed Integers from -128 to 127.) Look at this example:
String wordOne ="hola";
String wordTwo = "hola";
String wordTres = "hola";
System.out.println(wordOne == wordTwo);
System.out.println(wordTres == wordTwo);
System.out.println(wordOne == wordTres);
String wordFour = new String("hola");
System.out.println(wordOne == wordFour);
Integer uno = 127;
Integer dos = 127;
System.out.println(uno == dos);
Integer tres = 128;
Integer cuatro = 128;
System.out.println(tres == cuatro);
String x = "word"; is stored in the special pool.
String y = new String("it is not");
I don't remember the rules for the pool very well, but in any case I recommend comparing all strings using wordX.equals(wordY).
Boxed numbers from -128 to 127 can also be compared using ==, but as with any objects it is safer to use equals. With numbers there is an alternative to equals: convert one of them to a primitive value (which is also lighter on memory).
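The three ways of comparing boxed numbers mentioned above can be sketched as follows. The class and method names are made up for illustration; the only behaviour relied on is that autoboxing caches values in the range -128 to 127.

```java
class BoxedCompareDemo {
    // Values in -128..127 are cached by autoboxing, so both variables share one object.
    static boolean sameCachedInstance() {
        Integer a = 127;
        Integer b = 127;
        return a == b;
    }

    // equals() compares the numeric value, so it works for any value.
    static boolean equalByValue() {
        Integer a = 128;
        Integer b = 128;
        return a.equals(b);
    }

    // Unboxing to int and comparing with == also compares the numeric value.
    static boolean equalAsPrimitives() {
        Integer a = 128;
        Integer b = 128;
        return a.intValue() == b.intValue();
    }

    public static void main(String[] args) {
        System.out.println(sameCachedInstance()); // true
        System.out.println(equalByValue());       // true
        System.out.println(equalAsPrimitives());  // true
    }
}
```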
When you create a string with the new keyword, the JVM creates a new String object in normal (non-pool) heap memory, and the literal is placed in the string constant pool. In your case, the variable s1 refers to the object in the heap (non-pool).
String s1 = new String(new String("2")+new String("2"));
But in the next line you are calling the intern() method.
When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.
Check Javadocs.
As "22" is not yet in the string pool, it is added to the pool and a reference to it is returned. When you write:
String s2="22";
it simply refers to the "22" in the string pool. Calling s3.intern() will not add anything to the pool, because "22" already exists there. Check the Javadocs for intern() again: if an equal string exists in the pool, the string from the pool is returned, and the object intern() was called on is left unchanged. Since the return value is not assigned, s3 still refers to a different object.
But s4 refers to the same object as s1 and s2.
You can print the objects' identity hash codes to check whether they are the same, like:
System.out.println(System.identityHashCode(s1));
Notice that the type String is capitalized and is not one of Java's 8 primitive types (int, boolean, double, char, etc.). This indicates that any instance of a String is an object that was built using the 'blueprint' of the class String. Because variables in Java that refer to objects only store the memory address where the actual object is stored, when you compare Strings with == it compares memory location.
String str1 = new String("hello");
String str2 = str1; //sets str1 and str2 pointing to same memory loc
if (str1 == str2){
//do stuff; the code will enter this if-statement in this case
}
The way to compare the values within objects in Java is with equals(), such as:
String str1 = new String("hello");
String str2 = new String("hello"); //str2 not same memory loc as str1
if (str1.equals(str2)){
//do stuff; the code will enter this if-statement in this case
}
This is a common error for beginners, since the primitive types are not objects and you CAN compare two ints for equality like:
int one = 1; //primitive types are NOT objects
int two = 2; //notice when I make an int, I don't have to say "new"
//which means a new **object**
if (one == two) {
//do stuff; in this case the program will not enter this if-statement
}
It seems that you understand everything but the meaning of the very last line. See my comment on the last line.
String s1 = new String(new String("2")+new String("2")); //declare AND initialize s1 as a new String object
s1.intern();
String s2="22"; //declare a new variable s2 and point it to the same object that s1 is pointing to
System.out.print(s1==s2);
String s3 =new String (new String("2")+new String("2"));
s3.intern();
String s4="22";
System.out.print(s3==s4); //check if s3 and s4 are stored in the same memory location = FALSE
In Java, object1 == object2 asks:
do object1 and object2 have the same address in memory?
object1.equals(object2)
asks whether they are equal, for example whether they have the same values in all fields.
So, for two Strings S1 and S2,
S1.equals(S2) means: do they have the same characters in the same sequence?
S1 == S2 means: are S1 and S2 stored at the same address in memory?
If i write :
String s = new String("abc");
It creates an object of type String with the value "abc".
And if i write :
String s = "abc";
This also creates an object with value "abc".
How is an object created without encountering the new keyword?
Also if i write:
s.append("def");
It creates two more string objects that is:
a. "abcdef"
b. "def"
So encountering anything within double quotes makes a new String object.
How does that happen?
It is called a String literal, and is specified in the JLS 3.10.5:
A string literal consists of zero or more characters enclosed in double quotes. Characters may be represented by escape sequences (§3.10.6) - one escape sequence for characters in the range U+0000 to U+FFFF, two escape sequences for the UTF-16 surrogate code units of characters in the range U+010000 to U+10FFFF.
"abc" is a String literal , defined by the specifications of the language.
Refer JLS 3.10.5:
A string literal consists of zero or more characters enclosed in double quotes. Characters may be represented by escape sequences (§3.10.6) - one escape sequence for characters in the range U+0000 to U+FFFF, two escape sequences for the UTF-16 surrogate code units of characters in the range U+010000 to U+10FFFF.
Suggested Reading:
Difference between string object and string literal.
Java String Pool
Also , read String is Really Special:
String literals are stored in a common pool. This facilitates sharing of storage for strings with the same contents to conserve storage. String objects allocated via new operator are stored in the heap, and there is no sharing of storage for the same contents.
String s1 = "Hello"; // String literal
String s2 = "Hello"; // String literal
String s3 = s1; // same reference
String s4 = new String("Hello"); // String object
String s5 = new String("Hello"); // String object
s1 == s1; // true, same pointer
s1 == s2; // true, s1 and s2 share storage in common pool
s1 == s3; // true, s3 is assigned same pointer as s1
s1.equals(s3); // true, same contents
s1 == s4; // false, different pointers
s1.equals(s4); // true, same contents
s4 == s5; // false, different pointers in heap
s4.equals(s5); // true, same contents
In Java, when you write "hello", the Java compiler creates a String object similar to new String("hello"). This object is maintained in the String pool (see String#intern()).
So when you write String i = "hello", Java checks whether "hello" already exists in the pool, as String#intern() does. If it does, the existing instance is reused; otherwise a new one is created and put in the pool.
As the other answers say, "abc" is a String literal. As for the second part of your question: executing s.append("def") creates two objects because String is immutable in Java.
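Note that String itself has no append method; the closest String method is concat, while append belongs to StringBuilder. A minimal sketch of what immutability means in practice (class and method names are hypothetical):

```java
class ImmutabilityDemo {
    // concat() returns a new String; the receiver is left untouched.
    static String afterDiscardedConcat() {
        String s = "abc";
        s.concat("def"); // result discarded
        return s;        // still "abc"
    }

    // Keeping the return value is the only way to see the concatenation.
    static String afterAssignedConcat() {
        String s = "abc";
        s = s.concat("def");
        return s;
    }

    // StringBuilder is mutable, so append() changes the builder in place.
    static String viaBuilder() {
        StringBuilder sb = new StringBuilder("abc");
        sb.append("def");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(afterDiscardedConcat()); // abc
        System.out.println(afterAssignedConcat());  // abcdef
        System.out.println(viaBuilder());           // abcdef
    }
}
```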
Short version: If you call string.substring(n,m).intern(), does the string table retain the substring or the original string?
...But I'm not sure that's the right question to ask, so here's the long version:
I'm working with legacy Java code (PCGen) that parses files by slurping each in as one big string and then using String.split, .trim, .substring, and StringTokenizer to decompose them into tokens. This is very efficient for parsing, because none of those methods copy the original string, but all point at parts of a shared char[].
After parsing is over, I want to reclaim some memory. Only a few small substrings of the original big string are needed, but the strong reference keeps the big string from being collected. And later I'm suffering OOM, I believe due in part to that huge heap impact of lots of parsed files.
I know I can trim the big string down via new String(String)(copy-on-write). And I know I can reduce string duplication via String.intern (which is important because there's a lot of redundancy in the parsed files). Do I need to use both to reclaim the greatest quantity of heap, or does .intern() do both? Reading the OpenJDK7 hotspot source code (hotspot/src/share/vm/classfile/symbolTable.cpp) it looks like the string table keeps the whole string and does not trim it for offset/length at all. So I think I need to make a new String and then intern that result. Right?
All that said, switching to a streaming parser would be a big win in terms of memory, but that's too big a change for the short term.
You can use new String(String) and the intern() method and this will take a copy as required for up to Java 7 update 4. From Java 7 update 5 substring will take a deeper copy, but you may still want to use intern(). Note: Java 7 uses the heap, not the perm gen to store String literals.
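import java.lang.reflect.Field;
import java.util.Arrays;

// The class name is only a placeholder so that the snippet compiles on its own.
public class SubstringValueCheck {
    public static void main(String[] args) {
        char[] chars = new char[128];
        Arrays.fill(chars, 'A');
        String a128 = new String(chars);
        printValueFor("a128", a128);
        String a16 = a128.substring(0, 16);
        printValueFor("a16", a16);
    }

    // Prints the identity hash and length of the String's backing array.
    // Relies on the private char[] field named "value", so it only works on JDKs with that layout.
    public static void printValueFor(String desc, String s) {
        try {
            Field value = String.class.getDeclaredField("value");
            value.setAccessible(true);
            char[] valueArr = (char[]) value.get(s);
            System.out.println(desc + ": " + Integer.toHexString(System.identityHashCode(valueArr)) + ", len=" + valueArr.length);
        } catch (Exception e) {
            throw new AssertionError(e);
        }
    }
}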
On Java 7 update 4 this prints:
a128: 513e86ec, len=128
a16: 53281264, len=16
I would expect that Java 6 does not do this.
We can test it. String holds its character array in a field
private final char value[];
let's see what happens after substring(); intern();
Field f = String.class.getDeclaredField("value");
f.setAccessible(true);
String s1 = "12345";
String s2 = s1.substring(1, 2);
String s3 = s2.intern();
System.out.println(f.get(s2) == f.get(s1));
System.out.println(f.get(s3) == f.get(s2));
output
true
true
that is, all 3 strings share the same character array
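The approach proposed in the question, copying the token first and interning the copy, can be sketched like this. It assumes a JDK where substring shares the parent's character array; where substring already copies, the extra new String is redundant but harmless. The class name, method name and sample data are made up for illustration.

```java
class TokenCompactor {
    // Copies the token into a fresh String, then returns the canonical pooled instance,
    // so the pool never keeps a reference to the large parent string.
    static String compactIntern(String token) {
        return new String(token).intern();
    }

    public static void main(String[] args) {
        String big = "name=Fighter;type=Class;name=Fighter";
        String first = compactIntern(big.substring(5, 12));
        String second = compactIntern(big.substring(29, 36));
        System.out.println(first);           // Fighter
        System.out.println(first == second); // true
    }
}
```

Because both tokens resolve to the same pooled instance, duplicates across the parsed file cost one String rather than one per occurrence.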
I'm trying to understand if this code below creates 12 objects for a string like "stephan"
public String reverse(String str) {
if ((null == str) || (str.length() <= 1)) {
return str;
}
return reverse(str.substring(1)) + str.charAt(0);
}
This recursively reverses a string, and I understand how it works. But I was wondering whether there is a relationship in this case between the length of the string and the number of String objects created through concatenation.
Yes, it will create tons of string objects.
Every recursive call to "reverse()" will create 2:
str.substring(1) will create a new String object
reverse() call will create a new string for its return value, but we will NOT count that since that's counted when analyzing that recursive call (e.g. it will be the string from bullet point #3 from the next reverse() call).
And since Java Strings are immutable, adding a char via "+" will create a second String object.
Therefore, for a string of length N, it will create (N-1)*2 objects (since a reverse of 1-char string does NOT create new strings); so for "stephan"'s 7 characters, it will create 6*2=12 string objects.
Theorem:
When a string is N characters long, #Phoenix's reverse implementation will create (N-1)*3 new objects.
Proof (by induction):
When str is 1 character long, it is returned directly. (1-1)*3 = 0.
When str is N characters long:
a new String will be created by .substring(1).
by the induction hypothesis, the call to reverse(...) will be returned after (N-2)*3 objects have been created.
a new StringBuilder will be created to append the string and first char (you can see this by de-compiling your byte-code).
a new String will be created by StringBuilder.toString()--this is the return value.
Altogether, there were 3 + (N-2)*3 = (N-2 + 1)*3 = (N-1)*3 objects created.
QED.
[Edit] StringBuilders:
StringBuilder (extending AbstractStringBuilder) does its own fancy footwork:
When a StringBuilder is constructed, it is initialized with a char[] of size 16.
When you append something larger than its present size, it throws that away and creates a new char[] of size (<old size> + <size of new data> + 1) * 2.
So, as soon as your input string is > 16 characters, you have essentially 2x as much StringBuilder capacity as you need. (When the input string size is less, you've got more char[] than you need.)
Considering Strings are essentially char[]s (with a few ints for good measure), you're effectively using 4 times the length of the substring in char[]s -- at each step. :(
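For comparison, here is a sketch of a reversal that avoids the per-character allocations counted above by using a single StringBuilder. It is an alternative to the recursive version, not the code from the question, and the class name is hypothetical.

```java
class ReverseDemo {
    // One StringBuilder and one result String, regardless of the input length.
    static String reverse(String str) {
        if (str == null || str.length() <= 1) {
            return str;
        }
        return new StringBuilder(str).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(reverse("stephan")); // nahpets
    }
}
```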