String objects and the heap


I am studying for the SCJP exam and working through a set of sample questions.
I am unsure about the answer to one of them and was hoping someone here could help me put it to bed.
Here is the question:
Given:
11. public String makinStrings() {
12. String s = "Fred";
13. s = s + "47";
14. s = s.substring(2, 5);
15. s = s.toUpperCase();
16. return s.toString();
17. }
How many String objects will be created when this method is invoked?
A. 1
B. 2
C. 3
D. 4
E. 5
F. 6
Thank you in advance for any help offered.
I greatly appreciate it.

Let's go through it line by line.
Line 11
An easy start, no strings created here.
Line 12
We're assigning the String "Fred" to s. Although it looks like a String is created here, this string will live in the constant pool. The JVMS section 2.17.6 Creation of New Class Instances guarantees that the objects for string literals will at latest be created when the surrounding class is loaded, which by definition is before the method is invoked. So no new string objects are created on this line.
Line 13
The literal string "47" is referenced, which again will have been created statically (as above). However there's also the invocation of the + operator, which will create a new String in order to hold the result of the concatenation. So that's the first string created.
Line 14
The substring method does indeed create a new String. In older JDKs (before Java 7 update 6) it shares the underlying character array with its parent - and so takes up hardly any extra memory - but since Strings are immutable, each different string representation requires a different String object. (This is probably a gotcha - my first instinctive response was "ah, strings created by substring are special", but of course it still has to create a new object.)
Line 15
As above - the uppercase representation is different, so a new String must be created to hold the result.
Line 16
Strings override the toString() method to simply return this - hence no additional String is created.
The scores on the doors
By my count that's three String objects created during this method (with two of those objects sharing the same underlying character array, and with two pre-existing objects referenced for the string literals).
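The count above can be checked with reference comparisons. The following is only a sketch, not part of the original question: the class name MakinStringsTrace and the trace() helper are made up for illustration, and it keeps every intermediate value so that object identity can be inspected.

```java
final class MakinStringsTrace {
    // Returns every intermediate value of s, in order of assignment.
    static String[] trace() {
        String s0 = "Fred";              // literal: taken from the constant pool
        String s1 = s0 + "47";           // runtime concatenation: new String
        String s2 = s1.substring(2, 5);  // "ed4": new String
        String s3 = s2.toUpperCase();    // "ED4": new String
        String s4 = s3.toString();       // returns this: no new object
        return new String[] { s0, s1, s2, s3, s4 };
    }

    public static void main(String[] args) {
        String[] t = trace();
        System.out.println(t[0] == "Fred"); // true: same pooled literal
        System.out.println(t[1] != t[0]);   // true: concatenation made a new object
        System.out.println(t[2] != t[1]);   // true: substring made a new object
        System.out.println(t[3] != t[2]);   // true: toUpperCase made a new object
        System.out.println(t[4] == t[3]);   // true: toString() returned this
    }
}
```

Three of the five references (s1, s2, s3) point to objects that did not exist before the call, which matches the answer of three.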

Actually, it would be possible to make the whole method into a single constant. It's possible, but the compiler isn't allowed to do so. Hence, there are 3 Strings created using 2 from the constant pool.
1. Fred47
2. ed4 (note: using same backing char[] as Fred47 though)
3. ED4
2 and 3 are pretty easy, as the compiler isn't allowed to optimize away these method invocations and the String content changes. String.toString() only returns this, so no new String there either. But let's have a look at line 13 using disassembled byte code (javap -c is your friend here):
public java.lang.String makinStrings();
Code:
0: ldc #16; //String Fred
2: astore_1
3: new #18; //class java/lang/StringBuilder
6: dup
7: aload_1
8: invokestatic #20; //Method java/lang/String.valueOf:(Ljava/lang/Object;)Ljava/lang/String;
11: invokespecial #26; //Method java/lang/StringBuilder."<init>":(Ljava/lang/String;)V
14: ldc #29; //String 47
16: invokevirtual #31; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
19: invokevirtual #35; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
// SNIP
}
As you see, "Fred" and "47" are loaded from the constant pool (ldc) to populate a StringBuilder that will finally become a String (StringBuilder.toString()).
So that makes 2 constant Strings plus 3 newly created Strings per method invocation.

I'd say 3: the ones at lines 12, 13 and 15.
The reason line 14 (substring) doesn't create a new object is the way String works internally. Because substring has to be fast (everything, including the compiler, relies on it), the String class keeps two pointers to the start and end of the string. Taking a substring only moves these pointers and does not "copy" the object into a new one.

It will create 5 String objects.
String is an immutable class, so an object is created for every new string.
Example:
public String makinStrings() {
String s = "Fred"; // creates 1: "Fred"
s = s + "47"; // creates 2: "47" and 3: "Fred47"
s = s.substring(2, 5); // creates 4: "ed4"
s = s.toUpperCase(); // creates 5: "ED4"
return s.toString();
}
So according to me it will create 5 objects.

Related

How can the methods `makeConcat` and `makeConcatWithConstants` in `StringConcatFactory` be used by directly calling the API?

I believe that since Java 9 string concatenation has been implemented using StringConcatFactory.
Since this is provided as an API in Java, how can the methods makeConcat and makeConcatWithConstants in StringConcatFactory be used by calling the API directly? So far I could not find any examples of the different ways to use it. Also, the Javadoc does not make clear to me what the parameters String name and MethodType concatType of makeConcat and makeConcatWithConstants, and the parameters String recipe and Object... constants of makeConcatWithConstants, mean and what should be passed to them.

You are not supposed to call this API directly. The class has been designed to provide bootstrap methods for an invokedynamic instruction, so its API is straight-forward for that use case, but not for direct invocations.
But the documentation is exhaustive:
Parameters
lookup - Represents a lookup context with the accessibility privileges of the caller. When used with invokedynamic, this is stacked automatically by the VM.
name - The name of the method to implement. This name is arbitrary, and has no meaning for this linkage method. When used with invokedynamic, this is provided by the NameAndType of the InvokeDynamic structure and is stacked automatically by the VM.
concatType - The expected signature of the CallSite. The parameter types represent the types of concatenation arguments; the return type is always assignable from String. When used with invokedynamic, this is provided by the NameAndType of the InvokeDynamic structure and is stacked automatically by the VM.
Emphasis added by me
Note how all parameters are normally provided by the JVM automatically on the basis of the invokedynamic bytecode instruction. In this context, it’s a single instruction consuming some arguments and producing a String, referring to this bootstrap method as the entity knowing how to do the operation.
When you want to invoke it manually, for whatever reasons, you’d have to do something like
String arg1 = "Hello";
char arg2 = ' ';
String arg3 = "StringConcatFactory";
MethodHandle mh = StringConcatFactory.makeConcat(
MethodHandles.lookup(), // normally provided by the JVM
"foobar", // normally provided by javac, but meaningless here
// method type is normally provided by the JVM and matches the invocation
MethodType.methodType(String.class, String.class, char.class, String.class))
.getTarget();
// we can now use the handle to perform a concatenation
// the argument types must match the MethodType specified above
String result = (String)mh.invokeExact(arg1, arg2, arg3);
System.out.println(result);
You could re-use the MethodHandle for multiple string concatenations, but you are bound to the parameter types you’ve specified during the bootstrapping.
For ordinary string concatenation expressions, each expression gets linked during its bootstrapping to a handle matching the fixed number of subexpressions and their compile-time types.
It’s not easy to imagine a scenario where using the API directly could have a benefit over just writing arg1 + arg2 + arg3, etc.
The makeConcatWithConstants bootstrap method allows specifying constant parts in addition to the potentially changing parameters. For example, when we have the code
String time = switch(LocalTime.now().get(ChronoField.HOUR_OF_DAY) / 6) {
case 0 -> "night"; case 1 -> "morning"; case 2 -> "afternoon";
case 3 -> "evening"; default -> throw new AssertionError();
};
System.out.println("Hello "+System.getProperty("user.name")+", good "+time+"!");
we have several constant parts which the compiler can merge to a single string, using the placeholder \1 to denote the places where the dynamic values have to be inserted, so the recipe parameter will be "Hello \1, good \1!". The other parameter, constants, will be unused. Then, the corresponding invokedynamic instruction only needs to provide the two dynamic values on the operand stack.
To make the equivalent manual invocation more interesting we assume the system property user.name to be invariant, hence we can provide it as a constant in the bootstrap invocation, use the placeholder \2 to reference it, and produce a handle only consuming one dynamic argument, the time string:
MethodHandle mh = StringConcatFactory.makeConcatWithConstants(
MethodHandles.lookup(), // normally provided by the JVM
"foobar", // normally provided by javac, but meaningless here
// method type is normally provided by the JVM and matches the invocation
MethodType.methodType(String.class, String.class),
"Hello \2, good \1!", // recipe, \1 binds a parameter, \2 a constant
System.getProperty("user.name") // the first constant to bind
).getTarget();
// we can now use the handle to perform a concatenation
// the argument types must match the MethodType specified above
String result = (String)mh.invokeExact(time);
System.out.println(result);
Ordinary Java code will rarely make use of the additional constants. The only scenario I know of, is the corner case of having \1 or \2 in the original constant strings. To prevent them from being interpreted as placeholders, those substrings will be provided as constants then.
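For readers who want to run the manual makeConcatWithConstants invocation, a self-contained version might look like the sketch below. It requires Java 9 or later. The class name ConcatWithConstantsDemo, the greet helper, and the sample values are made up for illustration, and the user name is passed in as an argument rather than read from the user.name system property so the result is predictable.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

final class ConcatWithConstantsDemo {
    // Binds userName as a constant (\2) and leaves the time as the
    // single dynamic argument (\1) of the resulting handle.
    static String greet(String userName, String time) {
        try {
            MethodHandle mh = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(),  // normally provided by the JVM
                    "greet",                 // arbitrary name, no meaning here
                    MethodType.methodType(String.class, String.class),
                    "Hello \2, good \1!",    // recipe
                    userName                 // the first constant to bind
            ).getTarget();
            // The argument types must match the MethodType above exactly.
            return (String) mh.invokeExact(time);
        } catch (Throwable t) {
            // invokeExact is declared to throw Throwable; wrap it for brevity.
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(greet("alice", "morning")); // Hello alice, good morning!
    }
}
```

In real code the handle would be created once and reused; it is rebuilt on every call here only to keep the sketch short.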
As demonstrated in this online code tester, the code
String time = switch(LocalTime.now().get(ChronoField.HOUR_OF_DAY) / 6) {
case 0 -> "night"; case 1 -> "morning"; case 2 -> "afternoon";
case 3 -> "evening"; default -> throw new AssertionError();
};
System.out.println("Hello "+System.getProperty("user.name")+", good "+time+"!");
String tmp = "prefix \1 "+time+" \2 suffix";
gets compiled to (irrelevant parts omitted):
0: invokestatic #1 // Method java/time/LocalTime.now:()Ljava/time/LocalTime;
3: getstatic #7 // Field java/time/temporal/ChronoField.HOUR_OF_DAY:Ljava/time/temporal/ChronoField;
6: invokevirtual #13 // Method java/time/LocalTime.get:(Ljava/time/temporal/TemporalField;)I
9: bipush 6
11: idiv
12: tableswitch { // 0 to 3
0: 44
1: 49
2: 54
3: 59
default: 64
}
44: ldc #17 // String night
46: goto 72
49: ldc #19 // String morning
51: goto 72
54: ldc #21 // String afternoon
56: goto 72
59: ldc #23 // String evening
61: goto 72
64: new #25 // class java/lang/AssertionError
67: dup
68: invokespecial #27 // Method java/lang/AssertionError."<init>":()V
71: athrow
72: astore_1
73: getstatic #31 // Field java/lang/System.out:Ljava/io/PrintStream;
76: ldc #37 // String user.name
78: invokestatic #39 // Method java/lang/System.getProperty:(Ljava/lang/String;)Ljava/lang/String;
81: aload_1
82: invokedynamic #43, 0 // InvokeDynamic #0:makeConcatWithConstants:(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
87: invokevirtual #47 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
90: aload_1
91: invokedynamic #53, 0 // InvokeDynamic #1:makeConcatWithConstants:(Ljava/lang/String;)Ljava/lang/String;
96: astore_2
BootstrapMethods:
0: #150 REF_invokeStatic java/lang/invoke/StringConcatFactory.makeConcatWithConstants:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;
Method arguments:
#151 Hello \u0001, good \u0001!
1: #150 REF_invokeStatic java/lang/invoke/StringConcatFactory.makeConcatWithConstants:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;
Method arguments:
#153 \u0002\u0001\u0002
#155 prefix \u0001
#157 \u0002 suffix

The documentation that you linked states, "These methods are typically used as bootstrap methods... to support the string concatenation feature". That means they are used by the compiler to set up String concatenation; in other words, they are not intended to be used by programmers directly (as I remember, they basically create a kind of lambda method to be called for the concatenation).
The easy way to use them: concatenate strings with the + operator and let the compiler do the job. If you really want to use those methods directly (why? it is not that trivial), create a class that uses concatenation and check the generated code with some decompiler.
See also: What is a bootstrap method? (not 100% related, except for the last sentence)

Creating a char array results in an Object in java bytecode

I've hit kind of a wall, trying to write a simple compiler in Java, using ASM. Basically, I am trying to add strings of characters together, and cannot work out why my code fails to do so. The problem lies with how the following lines of code compile:
char[] p;
p = "Hi";
p = p + i[0];
Where i is an initialized array. The line p = "Hi"; compiles as:
bipush 2;
newarray t_char;
dup;
bipush 0;
ldc h;
castore;
dup;
bipush 1;
ldc i;
castore;
Note that I am deliberately treating the string "Hi" as a char array, instead of directly as a String object. When decompiled, it reads as:
Object localObject1 = { 'H', 'i'};
And thus, as {'H', 'i'} is not a proper constructor for Object, the program does not execute. Now, my confusion, and the reason I came to Stack Overflow with this, is that when the line p = p + i[0]; is removed from the program, or replaced with one not using an array, such as p = p + 5;, the line p = "Hi"; compiles, again, in the exact same way:
bipush 2;
newarray t_char;
dup;
bipush 0;
ldc h;
castore;
dup;
bipush 1;
ldc i;
castore;
And when decompiled, the same line reads as:
char[] arrayOfChar1 = {'H', 'i'};
The program runs just fine. I have absolutely no idea what is going on here, nor any idea how to solve it.
To decompile the .class files, I am using this decompiler.
I would like to know why the exact same bytecode decompiles differently in these 2 cases.

In general, you can not expect to be able to recompile decompiled code. Compilation and decompilation are both lossy processes. In particular, bytecode does not have to contain explicit types like Java source code does, and the type checking rules for bytecode are much laxer than the source level type system.
This means that when decompiling the code, the decompiler has to guess at the type of local variables (unless the optional debugging metadata was included with the compiled class). In some cases, it guessed Object, which led to a compilation error. In other cases, it guessed char[]. If you want a more in depth explanation, you could dive into the decompiler source code, but the real issue is expecting the decompiler to magically give good results in the absence of type information in the first place.
Anyway, if you want to edit already compiled code, you shouldn't use a decompiler. Your best bet is to use an assembler/disassembler pair like Krakatau, which allows you to edit classfiles losslessly at the bytecode level (assuming you understand bytecode).

How to prevent "notice: 'StringBuilder sb' can be replaced with 'String'" in intellij-idea

When I use IntelliJ IDEA 14.03, it always gives me this notice: 'StringBuilder sb' can be replaced with 'String'.
Here are the details: I define an object named "sb" whose class is StringBuilder. Here is the code snippet that I tried:
StringBuilder sb = new StringBuilder("status=").append(status).append(" ,msg=").append(msg);
System.out.println(sb);
I just want to know what the benefit is if I change StringBuilder to String, and why the IDE always tells me to change the class type.

I think in order to understand why your IDE tells you to change StringBuilder to String, you should understand the differences between String, StringBuffer and StringBuilder.
String is immutable. That means that if you want to change something in your string, the original string is not modified; instead a new one is created that includes your changes. StringBuffer and StringBuilder are mutable. That means your changes are applied to the original object.
Another main difference between them is that String and StringBuffer are thread-safe while StringBuilder is not. There are also other differences, please have a look at this site to learn more about the differences.
If you compare String with StringBuilder, in most cases using String is more practical and logical if you are not sure what you will do with your string.
It is not always better to concatenate strings with the plus sign (+). For example, StringBuilder's append method makes more sense if you change your string in a loop, because of its mutability. Please read the comments in the code:
String a = "";
StringBuilder b = new StringBuilder();
for (int i = 0; i < 5; i++)
{
a += i; // String is immutable: each iteration creates a new object
b.append(i); // StringBuilder is mutable: each iteration reuses the existing object
}
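The immutability difference described above can be observed directly by keeping a second reference to the original object. This is a minimal sketch; the class and method names are invented for the example.

```java
final class MutabilityDemo {
    // After +=, the variable points to a different object and the
    // original String still holds its old content.
    static boolean stringConcatCreatesNewObject() {
        String a = "x";
        String before = a;
        a += 1;                       // a now refers to a new String "x1"
        return a != before && before.equals("x") && a.equals("x1");
    }

    // After append, it is still the same StringBuilder, now with new content.
    static boolean builderAppendKeepsSameObject() {
        StringBuilder b = new StringBuilder("x");
        StringBuilder before = b;
        b.append(1);                  // modifies the existing object in place
        return b == before && b.toString().equals("x1");
    }

    public static void main(String[] args) {
        System.out.println(stringConcatCreatesNewObject());  // true
        System.out.println(builderAppendKeepsSameObject());  // true
    }
}
```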
What your IDE does is simply show you best practices. That is why it is called a recommendation.
If you want to go your own way anyway and do not want IntelliJ to warn you about it, you can disable the corresponding inspection in the IDE settings.
EDIT
@CrazyCoder's comment is important to note here.
IDE is actually very smart here, it suggests you to change it for better code readability since internally compiler will generate exactly the same bytecode and your code will have the same performance and the same memory usage, but it will be easier to read. You get a readability benefit without any performance compromises. Similar question was asked and answered in IntelliJ IDEA forum some time ago.

I know it's an old question that was already answered with a very good answer, but just a tiny unrelated comment:
In such cases, for the sake of readability, (and since the compiler does the same anyway as Ad mentioned) I would use String.format.
instead of
StringBuilder sb = new StringBuilder("status=").append(status).append(" ,msg=").append(msg);
I find this more readable:
String.format("status=%s, msg=%s", status, msg);

Compare code 1:
String s1 = "a";
String s2 = "b";
String result = s1 + s2;
with code 2:
String s1 = "a";
String s2 = "b";
String result = new StringBuilder().append(s1).append(s2).toString();
After being compiled to bytecode, the two pieces of code give the same result. The compiler turns code 1 into StringBuilder calls in the bytecode:
L0
LINENUMBER 15 L0
LDC "a"
ASTORE 1
L1
LINENUMBER 16 L1
LDC "b"
ASTORE 2
L2
LINENUMBER 17 L2
NEW java/lang/StringBuilder
DUP
INVOKESPECIAL java/lang/StringBuilder.<init> ()V
ALOAD 1
INVOKEVIRTUAL java/lang/StringBuilder.append (Ljava/lang/String;)Ljava/lang/StringBuilder;
ALOAD 2
INVOKEVIRTUAL java/lang/StringBuilder.append (Ljava/lang/String;)Ljava/lang/StringBuilder;
INVOKEVIRTUAL java/lang/StringBuilder.toString ()Ljava/lang/String;
ASTORE 3
L3
LINENUMBER 18 L3
RETURN
L4
LOCALVARIABLE args [Ljava/lang/String; L0 L4 0
LOCALVARIABLE s1 Ljava/lang/String; L1 L4 1
LOCALVARIABLE s2 Ljava/lang/String; L2 L4 2
LOCALVARIABLE result Ljava/lang/String; L3 L4 3
MAXSTACK = 2
MAXLOCALS = 4
From the perspective of bytecode, IDEA considers these two ways of writing equivalent, and code 1 is more concise, so it recommends code 1.
PS: If you don't like it, you can turn off this prompt :)
The above test is based on JDK 1.8.

Although the code is easier to read with String concatenation, for Java 8 and below this is implemented with a StringBuilder, so on the surface it looks like you're just hurting yourself.
However, StringBuilder has a default capacity of 16. If you dig in the source code for StringBuilder you'll see that the realloc uses:
int newCapacity = (value.length << 1) + 2;
So, basically you double every time. So, for an arbitrary string of length, say, 100, you'll end up allocating space 4 times, with capacities of 16, 34, 70, and finally 142, copying the contents on each of the 3 expansions.
However, you could do something like:
new StringBuilder(128).append("status=").append(status).append(" ,msg=").append(msg).toString();
And, you've saved yourself 3 allocs and array copies.
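The saving can be worked out by applying the quoted growth rule literally. The sketch below only simulates that rule; it does not inspect the real StringBuilder internals, which may differ between JDK versions. The class name BuilderGrowth and the expansions helper are hypothetical.

```java
final class BuilderGrowth {
    // Counts how many times a buffer that starts at initialCapacity has to
    // grow to hold `needed` chars, assuming newCapacity = (old << 1) + 2.
    static int expansions(int initialCapacity, int needed) {
        int capacity = initialCapacity;
        int count = 0;
        while (capacity < needed) {
            capacity = (capacity << 1) + 2;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Default capacity is documented as 16.
        System.out.println(new StringBuilder().capacity()); // 16
        System.out.println(expansions(16, 100));            // 3 (16 -> 34 -> 70 -> 142)
        System.out.println(expansions(128, 100));           // 0
    }
}
```

Under that assumption, presizing to 128 avoids all three expansions and their array copies for a 100-character result.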
As I understand it, Java 9+ has a much better solution. So, knowing that this will be fixed, I usually use Strings unless I know performance is a concern, and then I revert to StringBuilder. This is definitely the case if you're working on embedded systems or Android, as resources are scarce. On a cloud server, not so much.
As for the IntelliJ inspector, I turned it off, but I usually add a JAVA10 tag in a comment, so that I can find these later and revert them when we migrate to 10. Maybe, I just need to remember to re-enable the inspector instead. :)
PS- I like Udi's answer. I'll look into the implementation.

And why the IDE always notify me to change the class type
In Java 8 both versions have the same effect, but string concatenation
String r = s1 + s2;
is more readable

Object for the String [duplicate]

How many objects will be created for this statement?
String a="b" +"c" +"d";
I tried asking different people; some say it will create 4 objects, some say 1 object.

Only one!
The expression "b" +"c" +"d" is a compile-time constant, so after compilation you will have just one String instance, namely "bcd".

For a constant concatenation: only the resulting String.
If there are String variables involved, a StringBuilder is new-ed and then everything appended, so the second object is the resulting String.
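Both cases can be told apart with a reference comparison against the literal "bcd". This is a small sketch with invented class and method names: a constant expression is folded by the compiler and resolves to the pooled literal, while a concatenation involving a non-final variable is computed at run time and yields a distinct object.

```java
final class ConstantFolding {
    // "b" + "c" + "d" is a compile-time constant, so it is the same
    // pooled object as the literal "bcd".
    static boolean constantConcatIsPooled() {
        String a = "b" + "c" + "d";
        return a == "bcd";
    }

    // b is an ordinary (non-final) variable, so the concatenation happens
    // at run time and produces a new, non-pooled String.
    static boolean runtimeConcatIsPooled() {
        String b = "b";
        String a = b + "c" + "d";
        return a == "bcd";
    }

    public static void main(String[] args) {
        System.out.println(constantConcatIsPooled()); // true
        System.out.println(runtimeConcatIsPooled());  // false
    }
}
```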

One!
Since you're creating ONE String object; "b", "c", "d" are constants, so the compiler won't allocate space for them on the stack.

The JLS clearly states that a constant expression results in 1 String.
That still results in more than one object, as the String itself is backed by a char[].

One object will be created!
String a="b" +"c" +"d"; bcd will show up in the class constant pool (as a literal object).
You can even see this in the bytecode:
LDC "bcd"
ASTORE 1

One object will be created at the constant pool

where do actual parameters in java store [duplicate]

If I pass a String literal to some method, as in:
String s=new String("stack");
String s2=s.concat("overflow");
where will the string "overflow" be stored?
One of my friends argues that it is created in the String constant pool, and I disagree with him.
Please let me know.
Thanks in advance.

All String literals go in the constant pool. The End. In this case, two constants, "stack" and "overflow", go into the pool. A new String is created that holds the same value as the "stack" in the pool, and then another String is created by concatenating the "overflow" from the constant pool to it.
Excerpt from javap -c -verbose Test:
Constant pool:
#1 = Methodref #10.#19 // java/lang/Object."<init>":()V
#2 = Class #20 // java/lang/String
#3 = String #21 // stack
#4 = Methodref #2.#22 // java/lang/String."<init>":(Ljava/lang/String;)V
#5 = String #23 // overflow
#6 = Methodref #2.#24 // java/lang/String.concat:(Ljava/lang/String;)Ljava/lang/String;
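The same relationships can be checked at run time with reference comparisons and String.intern(). This is only an illustrative sketch; the class name LiteralPoolDemo and its check() helper are made up.

```java
final class LiteralPoolDemo {
    static boolean[] check() {
        String s = new String("stack");
        String s2 = s.concat("overflow");
        return new boolean[] {
            s != "stack",                   // new String(...) is a separate heap object
            s.intern() == "stack",          // its canonical copy is the pooled literal
            s2 != "stackoverflow",          // the concat result is a new object as well
            s2.intern() == "stackoverflow"  // interning maps it to the pooled instance
        };
    }

    public static void main(String[] args) {
        for (boolean result : check()) {
            System.out.println(result);     // prints true four times
        }
    }
}
```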

This question is certainly undecidable, yet you can find out how a certain combination of java compiler and JVM does it.
As far as I can see, nothing could stop one from writing a java compiler that, when it sees a string constant, emits byte code to create that string in the heap in some way as long as the rules stated in the JLS concerning string literals are still maintained. For example, String.intern could maintain a global Map, and the compiler could compile a String literal as follows:
create a char array of the desired size
put character at index 0
put character at index 1
...
put character at index (length-1)
construct the actual string object
pass the String just created to String.intern and leave result on the stack
Actually, one could have a pre-processor changing all string constants to
(extra.HeapString.createString(new char[] { ... }))
and have createString create a String instance in such a way that the rules for String literals hold. And you couldn't write a program that could detect if it was compiled from the original source or from the preprocessed one (except through reflection on extra.HeapString).
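One way such a createString could look is sketched below. It is only a guess at what the answer has in mind: HeapString and createString are the answer's hypothetical names, the package declaration is omitted for brevity, and the body simply builds the String on the heap and hands it to String.intern so that the identity rules for literals still hold.

```java
final class HeapString {
    // Builds the string from its characters on the heap, then returns the
    // canonical (interned) instance, as the rules for literals require.
    static String createString(char[] chars) {
        return new String(chars).intern();
    }

    public static void main(String[] args) {
        String viaHelper = createString(new char[] { 'H', 'i' });
        // Indistinguishable from a real literal by reference comparison.
        System.out.println(viaHelper == "Hi"); // true
    }
}
```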

The string "stack" will be in the heap, the string "overflow" is in the constant pool, and the third string, "stackoverflow", the result of the concatenation, is in the constant pool.
