String comparison and interning - java

The following code (from an interview) produces an output of false, but I believe it should be true.
public static void main(String[] args) {
String a = "hello";
String b = a + "world";
String c = "helloworld";
System.out.println(b==c);
}
I thought that constant String expressions were interned, and a + "world" is a constant, so it should intern "helloworld".
Can someone explain why the output is false?

Java interns all Strings that are compile-time constants. However, only Strings built by concatenating String literals (or other constants) are considered compile-time constants and so are interned.
This is because a is not declared final, so the compiler does not treat it as a constant and has no idea what value it will hold at run time. For example, a could be declared as:
String a = new Date().toString();
Hence, c is a different instance of String than b.

When you do this,
String b=a+"world";
The compiler chooses a StringBuilder based concatenation of String objects like so,
StringBuilder sb = new StringBuilder(a);
sb.append("world");
String b = sb.toString();
This yields a different reference, hence returning false as in your case.
But if you use this,
String b="hello"+"world";
Then the compiler identifies it as a constant, and both the b and c variables reference the same literal in the constant pool. Hence it returns true.
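A minimal sketch of the distinction above; the class and method names are illustrative assumptions. It also covers a case the answers do not spell out: declaring a as final makes it a constant variable, so a + "world" becomes a constant expression (JLS §15.28) and is interned.

```java
final class ConstantConcatDemo {
    // a is an ordinary local variable, so a + "world" is evaluated at run time.
    static boolean nonFinalConcatIsSameObject() {
        String a = "hello";
        String b = a + "world";
        String c = "helloworld";
        return b == c;
    }

    // a is final and initialized with a literal, so a + "world" is a
    // compile-time constant and refers to the interned "helloworld".
    static boolean finalConcatIsSameObject() {
        final String a = "hello";
        String b = a + "world";
        String c = "helloworld";
        return b == c;
    }

    public static void main(String[] args) {
        System.out.println(nonFinalConcatIsSameObject()); // false
        System.out.println(finalConcatIsSameObject());    // true
    }
}
```

In both methods b.equals(c) is true; only the reference comparison differs.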

When you assign strings like in your example, a, b, and c are separate String objects. So when you compare them you get false because they are not the same object. == in Java does not do a character-by-character comparison of the string. That's what String.equals() is for.
This is a solid summary to read to understand: How do I compare strings in Java?

The code you are looking at does an equality comparison between two variables that point to two different String instances, which are different objects stored in different places in memory, and are therefore different even though the string they represent is the same.
To do a string comparison you would need to use
stringInstance.equals(anotherStringInstance)
If you did something like this
String a = "abcde";
String b = a;
Then you would get a == b to be true as both variables point to the same object.

Related

strings and memory allocation in java?

One thing that I always wondered: if I have a method like this:
String replaceStuff (String plainText) {
return plainText.replaceAll("&", "&amp;");
}
will it create new String objects every time for the "&" and the "&amp;", which get destroyed by the GC and then recreated on the next call?
E.g.
would it in theory be better to do something like this
final String A ="&";
final String AMP ="&amp;";
String replaceStuff (String plainText) {
return plainText.replaceAll(A, AMP);
}
I think this is probably more of a theoretical question than a real-life problem; I am just curious how memory management is handled in this respect.
No. String literals are interned. Even if you use an equal literal (or other constant) from elsewhere, you'll still refer to the same object:
Object x = "hello";
Object y = "he" + "llo";
System.out.println(x == y); // Guaranteed to print true.
EDIT: The JLS guarantees this in section 3.10.5
String literals-or, more generally, strings that are the values of constant expressions (§15.28)-are "interned" so as to share unique instances, using the method String.intern.
Section 15.28 shows the + operator being included as an operation which can produce a new constant from two other constants.
Nope, they're literals and therefore automatically interned to the constant pool.
The only way you'd create new strings each time would be to do:
String replaceStuff (String plainText) {
return plainText.replaceAll(new String("&"), new String("&amp;"));
}
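A small sketch of the point made in these answers, assuming the replacement string is the HTML entity "&amp;" and using hypothetical names (LiteralReuseDemo, replacement). Every call returns the same interned literal, whereas new String(...) allocates a fresh object each time.

```java
final class LiteralReuseDemo {
    // Hypothetical helper: hands back the literal used as the replacement.
    static String replacement() {
        return "&amp;";
    }

    static String replaceStuff(String plainText) {
        // Both arguments are literals, so no new String is created for them per call.
        return plainText.replaceAll("&", replacement());
    }

    public static void main(String[] args) {
        System.out.println(replacement() == replacement());             // true
        System.out.println(new String("&amp;") == new String("&amp;")); // false
        System.out.println(replaceStuff("a & b"));                      // a &amp; b
    }
}
```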
Strings are handled a little differently from normal objects by the GC.
For example, given
String a = "aaa";
String a1 = "aaa";
both a and a1 point to the same String object in memory until one of the variables is reassigned. Hence there is only one object in memory.
Also, if we change a and a1 to point to other strings, the value "aaa" still stays in the string pool and will be reused later by the JVM if required. The string is not GC'd while the class that defines the literal remains loaded.

Why do I get different results when comparing strings after using different concatenation in Java?

I was working on a basic Java program and found a very funny thing, which I am sharing with you. foo() gives the output (s==s1) = false and bar() gives (s==s1) = true.
I want to know why this happens.
public class StringTest
{
public static void main(String[] args){
foo();
bar();
}
public static void foo(){
String s = "str4";
String s1 = "str" + s.length();
System.out.println("(s==s1) = " + (s1==s));
}
public static void bar(){
String s = "str4";
String s1 = "str" + "4";
System.out.println("(s==s1) = " + (s1==s));
}
}
In the latter case, the compiler optimizes the string concatenation. As this can be done at compile time, both reference the same constant string object.
In the former case, the length() call can't be optimized during compile time. At runtime, a new string object is created, which is not identical to the string constant (but equal to it)
The string concatenation in bar() can be done at compile time, because it's an expression composed of nothing but compile-time constants. Although the length of the String s is obviously known at compile time, the compiler doesn't know that length() returns that known value, so it won't be used as a constant.
When you write a line of code like this:
String s1 = "str" + "4";
then the compiler is smart enough to optimize this to:
String s1 = "str4";
Literal strings in Java are managed in a string pool. When you have two literal strings that have the same content (such as s and s1 in your second example), then just one String object will be created which will be shared by the two variables.
The == operator in Java checks if two variables refer to the same object. Since there is only one String object in the second example, s == s1 will be true.
String s1 = "str" + s.length();
String s1 = "str" + "4";
In the first case s.length() returns a value of type int; in the second case the type is String.
Even though the number is 4 in both cases, the types are not the same :)
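A short sketch suggesting that it is the method call, rather than the int type, that prevents folding at compile time: an int literal inside the concatenation is still part of a constant expression under JLS §15.28. The class and method names are illustrative assumptions.

```java
final class IntConcatDemo {
    // "str" + 4 is a constant expression, so it refers to the interned "str4".
    static boolean intLiteralConcat() {
        String s = "str4";
        String s1 = "str" + 4;
        return s == s1;
    }

    // s.length() is a method call, so the concatenation happens at run time.
    static boolean methodCallConcat() {
        String s = "str4";
        String s1 = "str" + s.length();
        return s == s1;
    }

    public static void main(String[] args) {
        System.out.println(intLiteralConcat()); // true
        System.out.println(methodCallConcat()); // false
    }
}
```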
It probably has to do with the fact that foo() is probably creating a new String instance from s.length() (via toString()), whereas bar() is just concatenating a constant. I don't know the nitty-gritty of it, but my gut tells me it is in that direction.
If I needed to guess, I would say that the Java compiler performs some optimization on bar(). At compile time it is clear that "str" + "4" can be replaced by "str4", which (since Strings are immutable objects) is indeed the very same object as the "str4" String used to initialize s.
Within foo() the optimization is not that straightforward. In general the value of the s1 variable cannot be predicted very easily (although this example is quite straightforward). So the Java compiler will produce two different objects for s and s1.
The "==" operator does not compare the value of the Strings! It checks whether these are the same Objects. To compare the values of the Strings use the "equals" method like this:
String s = "str4";
String s1 = "str" + s.length();
System.out.println("s1.equals(s) = " + s1.equals(s)); // prints true
You should try playing with the intern method of the String class. Java keeps something like a dictionary where all distinct interned strings are stored. When you create a string that can be evaluated at compile time, Java looks it up in that dictionary. If it finds the string, it stores only a reference to the existing entry (which is what the intern method returns).
You should notice that:
"str4" == ("str" + "str4".length()) returns false, but
"str4" == ("str" + "str4".length()).intern() returns true, because intern() returns the pooled instance rather than the newly built object.

If == compares references in Java, why does it evaluate to true with these Strings?

As it is stated the == operator compares object references to check if they are referring to the same object on a heap. If so why am I getting the "Equal" for this piece of code?
public class Salmon {
public static void main(String[] args) {
String str1 = "Str1";
String str2 = "Str1";
if (str1 == str2) {
System.out.println("Equal");
} else {
System.out.println("Not equal");
}
}
}
The program will print Equal. (At least using the Sun Hotspot and suns Javac.) Here it is demonstrated on http://ideone.com/8UrRrk
This is due to the fact that string-literal constants are stored in a string pool and string references may be reused.
Further reading:
What is String literal pool?
String interning
This however:
public class Salmon {
public static void main(String[] args) {
String str1 = "Str1";
String str2 = new String("Str1");
if (str1 == str2) {
System.out.println("Equal");
} else {
System.out.println("Not equal");
}
}
}
Will print Not equal since new is guaranteed to introduce a fresh reference.
So, rule of thumb: Always compare strings using the equals method.
Java stores string literals in a string table internally during a run. The references to the two strings are identical because they refer to the same entry in memory. Hence, Equal.
Your statement is right, that == compares object references. Try the same thing with any other class but Strings and you won't get the same result.
This code won't print Equal.
But if the two strings were the same, this case would be special.
Now that you've updated your code, it is the case :
A simple (but not totally exact) explanation is that the compiler sees that the two strings are the same and does something like:
String str1 = "Str1";
String str2 = str1;
What really happens here is that the compiler will see the literal string and put it in the "String literal pool".
As a String can't be modified (it's immutable) the literal values of Strings (those found during compilation) are put in a "pool".
This way, if two literal strings have the same content (as in this particular case), memory isn't wasted storing "Str1" twice.
People, you are forgetting that the process of placing literal strings in the pool is called "interning". The class String has a method called intern(). This method puts any string into the pool, even if it is not in the pool initially (not literal). This means that code like this:
String a = "hello";
String b = new String("hello");
b = b.intern();
System.out.println(a == b);
will print "true".
Now, why would someone need this? As you can imagine, string comparison a.equals(b) might take a long time if strings are the same length but different close to the end.
(Just look at the .equals() source code.).
However, comparing references directly is the same as comparing integers (pointers in C speak), which is near instant.
So, what does this give you? Speed. If you have to compare the same strings many, many times, your program performance will benefit tremendously if you intern these strings. If however you are going to compare strings only once, there will be no performance gain as the interning process itself uses equals().
I hope this explains this.
thanks
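As a rough illustration of the interning idea above (not a benchmark), the sketch below builds equal strings at run time and shows that they only become reference-equal after intern(). The names InternDemo, runtimeCopy and sameAfterIntern are assumptions made for this example.

```java
final class InternDemo {
    // Builds a new String at run time, so the result is not a pooled literal.
    static String runtimeCopy(String s) {
        return new String(s.toCharArray());
    }

    // After interning, equal contents map to one canonical object.
    static boolean sameAfterIntern(String x, String y) {
        return x.intern() == y.intern();
    }

    public static void main(String[] args) {
        String a = runtimeCopy("hello");
        String b = runtimeCopy("hello");
        System.out.println(a == b);                          // false
        System.out.println(a.equals(b));                     // true
        System.out.println(sameAfterIntern(a, b));           // true
        System.out.println(sameAfterIntern("hello", "help")); // false
    }
}
```

Whether interning pays off depends on how often the same strings are compared; for one-off comparisons equals() is the simpler choice.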
Comments above have summed it up pretty well.
I don't have a Java environment handy, but attempting the following should clarify things for you (hopefully this works as I anticipate).
String str1 = "Str1";
String str2 = "Str"; str2 += "1";
Should now print Not equal
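A runnable version of the snippet above, wrapped in a hypothetical class (CompoundConcatDemo) so it can be compiled as is. Because str2 is not a constant variable, str2 += "1" is evaluated at run time and produces a new object.

```java
final class CompoundConcatDemo {
    static String compare() {
        String str1 = "Str1";
        String str2 = "Str";
        str2 += "1"; // run-time concatenation: a new String, not the pooled "Str1"
        return (str1 == str2) ? "Equal" : "Not equal";
    }

    public static void main(String[] args) {
        System.out.println(compare()); // Not equal
    }
}
```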

Java: Why can String equality be proven with ==?

I learned that it is from the devil to test String equality with == instead of String.equals(), because every String was a reference to its own object.
But if i use something like
System.out.println("Hello" == "Hello");
it prints true.
Why?
It doesn't. It's still a bad thing to do - you'll still be testing reference equality instead of value equality.
public class Test
{
public static void main(String[] args)
{
String x = "hello";
String y = new String(x);
System.out.println(x == y); // Prints false
}
}
If you're seeing == testing "work" now then it's because you genuinely have equal references. The most common reason for seeing this would probably be due to interning of String literals, but that's been in Java forever:
public class Test
{
public static void main(String[] args)
{
String x = "hello";
String y = "hel" + "lo"; // Concatenated at compile-time
System.out.println(x == y); // Prints true
}
}
This is guaranteed by section 3.10.5 of the Java Language Specification:
Each string literal is a reference (§4.3) to an instance (§4.3.1, §12.5) of class String (§4.3.3). String objects have a constant value. String literals-or, more generally, strings that are the values of constant expressions (§15.28)-are "interned" so as to share unique instances, using the method String.intern.
It hasn't changed. However, the Java Compiler uses string.intern() to make sure that identical strings in source code compile to same String object. If however you load a String from a File or Database it will not be the same object, unless you force this using String.intern() or some other method.
It is a bad idea, and you should still use .equals()
Look, this is a tricky concept.
There is a difference between:
// These are String literals
String a = "Hiabc";
String b = "abc";
String c = "abc";
and
// These are String objects.
String a = new String("Hiabc");
String b = new String("abc");
String c = new String("abc");
If your strings were objects, i.e.,
String b = new String("abc");
String c = new String("abc");
Then two different objects would have been created on the heap at two different memory locations, and doing
b == c
would have resulted in false.
But since your String b and String c are literals,
b == c
results in true. This is because two different objects were not created. Both b and c point to the same String in the string pool.
This is the difference. You are right, == compares memory locations. And that is the reason why, with the literals,
a.substring(2, 5) == b; // false: substring(2, 5) builds a new "abc" object at run time, and
b == c // true, because both b and c are literals and refer to the same pooled object.
To have two separate Strings with the same value at different locations on the heap, outside the string pool, you need to create String objects with new as shown above.
So, with the objects,
a.substring(2, 5) == b; // and
b == c; // will both be false, because the objects are stored at separate memory locations on the heap.
you have to use
a.substring(2, 5).equals(b);
b.equals(c);
in case of objects.
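The sketch below runs the literal case from this answer so the results can be checked directly; the class and method names are illustrative assumptions. Note that substring(2, 5) returns a String created at run time, so comparing it to a literal with == yields false even though the contents match.

```java
final class SubstringDemo {
    static boolean literalsSameObject() {
        String b = "abc";
        String c = "abc";
        return b == c; // both refer to the pooled literal
    }

    static boolean substringSameObject() {
        String a = "Hiabc";
        String b = "abc";
        return a.substring(2, 5) == b; // new object vs. pooled literal
    }

    static boolean substringEquals() {
        String a = "Hiabc";
        String b = "abc";
        return a.substring(2, 5).equals(b); // compares contents
    }

    public static void main(String[] args) {
        System.out.println(literalsSameObject());  // true
        System.out.println(substringSameObject()); // false
        System.out.println(substringEquals());     // true
    }
}
```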

String.equals versus == [duplicate]

This code separates a string into tokens and stores them in an array of strings, and then compares a variable with the first element ... why isn't it working?
public static void main(String...aArguments) throws IOException {
String usuario = "Jorman";
String password = "14988611";
String strDatos = "Jorman 14988611";
StringTokenizer tokens = new StringTokenizer(strDatos, " ");
int nDatos = tokens.countTokens();
String[] datos = new String[nDatos];
int i = 0;
while (tokens.hasMoreTokens()) {
String str = tokens.nextToken();
datos[i] = str;
i++;
}
//System.out.println (usuario);
if ((datos[0] == usuario)) {
System.out.println("WORKING");
}
}
Use the string.equals(Object other) function to compare strings, not the == operator.
The function checks the actual contents of the string, the == operator checks whether the references to the objects are equal. Note that string constants are usually "interned" such that two constants with the same value can actually be compared with ==, but it's better not to rely on that.
if (usuario.equals(datos[0])) {
...
}
NB: the compare is done on 'usuario' because that's guaranteed non-null in your code, although you should still check that you've actually got some tokens in the datos array otherwise you'll get an array-out-of-bounds exception.
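A compact sketch of the advice in this answer: compare with equals(), call it on the value known to be non-null, and check that a token exists before reading it. The class TokenCompareDemo and the helper firstTokenIs are hypothetical names used only for this example.

```java
import java.util.StringTokenizer;

final class TokenCompareDemo {
    // Returns true when the first space-separated token of data equals expected.
    static boolean firstTokenIs(String data, String expected) {
        StringTokenizer tokens = new StringTokenizer(data, " ");
        if (!tokens.hasMoreTokens()) {
            return false; // no tokens at all, so nothing to compare
        }
        // expected is the known non-null side, so call equals() on it.
        return expected.equals(tokens.nextToken());
    }

    public static void main(String[] args) {
        System.out.println(firstTokenIs("Jorman 14988611", "Jorman")); // true
        System.out.println(firstTokenIs("", "Jorman"));                // false
    }
}
```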
Meet Jorman
Jorman is a successful businessman and has 2 houses.
But others don't know that.
Is it the same Jorman?
When you ask neighbours from either Madison or Burke streets, the only thing they can tell you is his address on their own street.
Using the residence alone, it's tough to confirm that it's the same Jorman. Since they're 2 different addresses, it's just natural to assume that those are 2 different persons.
That's how the operator == behaves. So it will say that datos[0]==usuario is false, because it only compares the addresses.
An Investigator to the Rescue
What if we sent an investigator? We know that it's the same Jorman, but we need to prove it. Our detective will look closely at all physical aspects. With thorough inquiry, the agent will be able to conclude whether it's the same person or not. Let's see it happen in Java terms.
String's equals() method compares the Strings character by character in order to conclude that they are indeed equal.
That's how the String equals method behaves. So datos[0].equals(usuario) will return true, because it performs a logical comparison.
Note that in some cases the == operator can give the expected result because of the way Java handles strings: string literals are interned (see String.intern()) at compile time. So when you write, for example, "hello world" in two classes and compare those strings with ==, you get true, as the specification requires. But when the first string is a literal (i.e. written as "i am string literal") and the second is constructed at run time, e.g. with the new keyword as in new String("i am string literal"), the == (equality) operator returns false, because the two are different instances of the String class.
The only right way is to use .equals(): datos[0].equals(usuario). == only says whether two references are the same instance (i.e. have the same memory address).
Update: 01.04.2013 I updated this post due to the comments below, which are right. Originally I declared that interning (String.intern) is a side effect of JVM optimization. Although it certainly saves memory (which was what I meant by "optimization"), it is mainly a feature of the language.
The == operator checks if the two references point to the same object or not.
.equals() checks for the actual string content (value).
Note that the .equals() method belongs to class Object (the superclass of all classes). You need to override it as your class requires, but for String it is already implemented and it checks whether two strings have the same value or not.
Case1)
String s1 = "Stack Overflow";
String s2 = "Stack Overflow";
s1 == s2; // true
s1.equals(s2); // true
Reason: String literals created without new are stored in the string pool (in the PermGen area on older JVMs; on the main heap since Java 7). So both s1 and s2 point to the same object in the pool.
Case2)
String s1 = new String("Stack Overflow");
String s2 = new String("Stack Overflow");
s1 == s2; // false
s1.equals(s2); // true
Reason: If you create a String object using the `new` keyword a separate space is allocated to it on the heap.
equals() function is a method of Object class which should be overridden by programmer. String class overrides it to check if two strings are equal i.e. in content and not reference.
== operator checks if the references of both the objects are the same.
Consider the programs
String abc = "Awesome" ;
String xyz = abc;
if(abc == xyz)
System.out.println("Refers to same string");
Here the abc and xyz, both refer to same String "Awesome". Hence the expression (abc == xyz) is true.
String abc = "Hello World";
String xyz = new String("Hello World");
if(abc == xyz)
System.out.println("Refers to same string");
else
System.out.println("Refers to different strings");
if(abc.equals(xyz))
System.out.println("Contents of both strings are same");
else
System.out.println("Contents of strings are different");
Here abc and xyz are two different String objects with the same content "Hello World". Hence the expression (abc == xyz) is false whereas (abc.equals(xyz)) is true.
Hope you understood the difference between == and <Object>.equals()
Thanks.
== tests for reference equality.
.equals() tests for value equality.
Consequently, if you actually want to test whether two strings have the same value you should use .equals() (except in a few situations where you can guarantee that two strings with the same value will be represented by the same object eg: String interning).
== is for testing whether two strings are the same Object.
// These two have the same value
new String("test").equals("test") ==> true
// ... but they are not the same object
new String("test") == "test" ==> false
// ... neither are these
new String("test") == new String("test") ==> false
// ... but these are because literals are interned by
// the compiler and thus refer to the same object
"test" == "test" ==> true
// concatenation of string literals happens at compile time resulting in same objects
"test" == "te" + "st" ==> true
// but .substring() is invoked at runtime, generating distinct objects
"test" == "!test".substring(1) ==> false
It is important to note that == is much cheaper than equals() (a single pointer comparison instead of a loop), thus, in situations where it is applicable (i.e. you can guarantee that you are only dealing with interned strings) it can present an important performance improvement. However, these situations are rare.
Instead of
datos[0] == usuario
use
datos[0].equals(usuario)
== compares the references, whereas .equals() compares the values, which is what you want.
Let's analyze the following Java code to understand the identity and equality of Strings:
public static void testEquality(){
String str1 = "Hello world.";
String str2 = "Hello world.";
if (str1 == str2)
System.out.print("str1 == str2\n");
else
System.out.print("str1 != str2\n");
if(str1.equals(str2))
System.out.print("str1 equals to str2\n");
else
System.out.print("str1 doesn't equal to str2\n");
String str3 = new String("Hello world.");
String str4 = new String("Hello world.");
if (str3 == str4)
System.out.print("str3 == str4\n");
else
System.out.print("str3 != str4\n");
if(str3.equals(str4))
System.out.print("str3 equals to str4\n");
else
System.out.print("str3 doesn't equal to str4\n");
}
When the first line of code String str1 = "Hello world." executes, a string "Hello world." is created, and the variable str1 refers to it. Another string "Hello world." will not be created again when the next line of code executes because of optimization. The variable str2 also refers to the existing "Hello world.".
The operator == checks identity of two objects (whether two variables refer to same object). Since str1 and str2 refer to same string in memory, they are identical to each other. The method equals checks equality of two objects (whether two objects have same content). Of course, the content of str1 and str2 are same.
When the code String str3 = new String("Hello world.") executes, a new instance of string with content "Hello world." is created, and it is referred to by the variable str3. And then another instance of string with content "Hello world." is created again, and referred to by str4. Since str3 and str4 refer to two different instances, they are not identical, but their content is the same.
Therefore, the output contains four lines:
str1 == str2
str1 equals to str2
str3 != str4
str3 equals to str4
You should use string equals to compare two strings for equality, not operator == which just compares the references.
It will also work if you call intern() on the string before inserting it into the array.
Interned strings are reference-equal (==) if and only if they are value-equal (equals().)
public static void main (String... aArguments) throws IOException {
String usuario = "Jorman";
String password = "14988611";
String strDatos="Jorman 14988611";
StringTokenizer tokens=new StringTokenizer(strDatos, " ");
int nDatos=tokens.countTokens();
String[] datos=new String[nDatos];
int i=0;
while(tokens.hasMoreTokens()) {
String str=tokens.nextToken();
datos[i]= str.intern();
i++;
}
//System.out.println (usuario);
if(datos[0]==usuario) {
System.out.println ("WORKING");
}
}
Generally .equals is used for Object comparison, where you want to verify if two Objects have an identical value.
== for reference comparison (are the two Objects the same Object on the heap) & to check if the Object is null. It is also used to compare the values of primitive types.
The == operator compares object references in Java. You can use String's equals method.
String s = "Test";
if(s.equals("Test"))
{
System.out.println("Equal");
}
If you are comparing strings assigned from literals, both == and .equals will work, but for a String created with new you should use only .equals; == will not work there.
Example:
String a = "name";
String b = "name";
if(a == b) and (a.equals(b)) will return true.
But
String a = new String("name");
In this case if(a == b) will return false
So it's better to use the .equals method...
The == operator is a simple comparison of values.
For object references the (values) are the (references). So x == y returns true if x and y reference the same object.
I know this is an old question but here's how I look at it (I find very useful):
Technical explanations
In Java, all variables are either primitive types or references.
(If you need to know what a reference is: "Object variables" are just pointers to objects. So with Object something = ..., something is really an address in memory (a number).)
== compares the exact values. So it compares if the primitive values are the same, or if the references (addresses) are the same. That's why == often doesn't work on Strings; Strings are objects, and doing == on two string variables just compares if the address is same in memory, as others have pointed out. .equals() calls the comparison method of objects, which will compare the actual objects pointed by the references. In the case of Strings, it compares each character to see if they're equal.
The interesting part:
So why does == sometimes return true for Strings? Note that Strings are immutable. In your code, if you do
String foo = "hi";
String bar = "hi";
Since strings are immutable (when you call .trim() or something, it produces a new string, not modifying the original object pointed to in memory), you don't really need two different String("hi") objects. The compiler emits bytecode that creates only one "hi" object for identical literals. So if you do
if (foo == bar) ...
right after, they're pointing to the same object, and will return true. But you rarely intend this. Instead, you're asking for user input, which is creating new strings at different parts of memory, etc. etc.
Note: If you do something like baz = new String(bar), you explicitly get a new object, so baz == bar is false. But the main point is that when the compiler sees literal strings, it can easily share identical ones.
I don't know how it works in runtime, but I assume the JVM doesn't keep a list of "live strings" and check if a same string exists. (eg if you read a line of input twice, and the user enters the same input twice, it won't check if the second input string is the same as the first, and point them to the same memory). It'd save a bit of heap memory, but it's so negligible the overhead isn't worth it. Again, the point is it's easy for the compiler to optimize literal strings.
There you have it... a gritty explanation for == vs. .equals() and why it seems random.
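To make the point about run-time input concrete, here is a small sketch that reads the same line twice from an in-memory reader standing in for user input. RuntimeInputDemo and readTwice are hypothetical names; the reader setup is an assumption chosen so the example runs without a console.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class RuntimeInputDemo {
    // Returns { first == second, first.equals(second), interned copies identical }.
    static boolean[] readTwice(String line) {
        String input = line + "\n" + line + "\n";
        try (BufferedReader in = new BufferedReader(new StringReader(input))) {
            String first = in.readLine();
            String second = in.readLine();
            return new boolean[] {
                first == second,                  // separate objects built at run time
                first.equals(second),             // same contents
                first.intern() == second.intern() // same canonical pooled object
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = readTwice("hi");
        System.out.println(r[0]); // false
        System.out.println(r[1]); // true
        System.out.println(r[2]); // true
    }
}
```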
#Melkhiah66 You can use the equals method instead of the == operator to check equality.
intern() checks whether an equal string is already in the pool and returns the pooled instance, so comparing interned strings with == gives true when their contents match. The equals method compares the contents character by character and gets you the required result.
public class Demo
{
public static void main(String[] args)
{
String str1 = "Jorman 14988611";
String str2 = new StringBuffer("Jorman").append(" 14988611").toString();
String str3 = str2.intern();
System.out.println("str1 == str2 " + (str1 == str2)); //gives false
System.out.println("str1 == str3 " + (str1 == str3)); //gives true
System.out.println("str1 equals str2 " + (str1.equals(str2))); //gives true
System.out.println("str1 equals str3 " + (str1.equals(str3))); //gives true
}
}
The .equals() method checks whether the two strings have the same value and returns a boolean, whereas the == operator checks whether the two strings are the same object.
Someone said in a post higher up that == is used for int and for checking nulls.
It may also be used to compare boolean and char values.
Be very careful, though, and double-check that you are using a char and not a String.
For example
String strType = "a";
char charType = 'a';
for strings you would then check
This would be correct
if(strType.equals("a"))
// do something
but
if(charType.equals('a'))
// do something else
would be incorrect (it does not compile, because char is a primitive), you would need to do the following
if(charType == 'a')
// do something else
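The same distinction as a compilable sketch, using hypothetical overloads named isLetterA: the String version uses equals(), while the char version uses ==, since a primitive char has no methods.

```java
final class CharVsStringDemo {
    // String is an object type: compare contents with equals().
    static boolean isLetterA(String s) {
        return "a".equals(s); // literal first, so a null argument is safe
    }

    // char is a primitive: == compares the values directly.
    static boolean isLetterA(char c) {
        return c == 'a';
    }

    public static void main(String[] args) {
        System.out.println(isLetterA("a")); // true
        System.out.println(isLetterA('a')); // true
        System.out.println(isLetterA('b')); // false
    }
}
```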
a==b
Compares references, not values. The use of == with object references is generally limited to the following:
Comparing to see if a reference is null.
Comparing two enum values. This works because there is only one object for each enum constant.
You want to know if two references are to the same object
"a".equals("b")
Compares values for equality. Because this method is defined in the Object class, from which all other classes are derived, it's automatically defined for every class. However, it doesn't perform an intelligent comparison for most classes unless the class overrides it. It has been defined in a meaningful way for most Java core classes. If it's not defined for a (user) class, it behaves the same as ==.
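The last point can be sketched with two tiny user classes, both hypothetical: PlainName inherits equals() from Object and so behaves like ==, while ValueName overrides equals() (and hashCode(), to keep the two consistent) to compare contents.

```java
final class EqualsOverrideDemo {
    // No equals() override: Object.equals() compares references.
    static final class PlainName {
        final String value;
        PlainName(String value) { this.value = value; }
    }

    // equals() overridden to compare the wrapped value.
    static final class ValueName {
        final String value;
        ValueName(String value) { this.value = value; }

        @Override
        public boolean equals(Object o) {
            return o instanceof ValueName && value.equals(((ValueName) o).value);
        }

        @Override
        public int hashCode() {
            return value.hashCode();
        }
    }

    static boolean plainEquals() {
        return new PlainName("a").equals(new PlainName("a"));
    }

    static boolean valueEquals() {
        return new ValueName("a").equals(new ValueName("a"));
    }

    public static void main(String[] args) {
        System.out.println(plainEquals()); // false
        System.out.println(valueEquals()); // true
    }
}
```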
Use split rather than a tokenizer; it gives you the tokens directly.
For example:
String name="Harry";
String salary="25000";
String namsal="Harry 25000";
String[] s=namsal.split(" ");
for(int i=0;i<s.length;i++)
{
System.out.println(s[i]);
}
if(s[0].equals("Harry"))
{
System.out.println("Task Complete");
}
After this I am sure you will get better results.....
