Is the length member of the String class an integer constant? - java

I am trying to understand whether the length member of the String class is an integer constant. I'm thinking it is not.

It is NOT a compile time constant expression in the sense of JLS 15.28.
The expression str.length() is a method call, and no explicit method calls are constant expressions. This is true even if the String is a literal; i.e. "hello".length() is NOT a compile time constant expression according to the JLS.
It is a runtime constant, in the sense that once the String has been created, the length cannot change. This follows from the fact that Java strings are immutable.
You can confirm this by looking at the source code of any version of the (standard) String class. The code is liable to differ between versions, but the "runtime constancy" property will remain the same across all versions.
(Actually, the above is not completely accurate. If you are willing to use bad reflection to break encapsulation, you can modify the private members of a String and mutate it. That can change the value returned by length(). You could achieve the same effect via native code methods, and possibly other means. But ... just ... don't do it! The JLS says that if you do that kind of thing, the behavior of your Java code is unspecified.)
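To make the distinction between a compile-time constant (JLS 15.28) and a value that is merely fixed at run time concrete, the sketch below uses both as `switch` case labels, where only constant expressions are allowed. The class and method names are illustrative.

```java
class ConstantDemo {
    // A compile-time constant expression: final, primitive, initialized with a literal.
    static final int LITERAL_LEN = 5;

    // Also final and always 5 at run time, but NOT a constant expression,
    // because its initializer is a method invocation.
    static final int CALL_LEN = "hello".length();

    static String classify(int n) {
        switch (n) {
            case LITERAL_LEN:
                return "five";
            // case CALL_LEN:  // would not compile: "constant expression required"
            default:
                return "other";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(CALL_LEN)); // prints "five"
    }
}
```

Uncommenting the `case CALL_LEN:` line is a quick way to see the compiler reject the method call as a case label.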

According to the source code (jdk9) for the String class, length():
public int length() {
return value.length >> coder();
}
Where value is
@Stable
private final byte[] value;
And coder() is:
byte coder() {
return COMPACT_STRINGS ? coder : UTF16;
}
So it is essentially a runtime constant: Strings are immutable, and the length of value is fixed when the String is created and never changes afterwards.
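The arithmetic in `value.length >> coder()` can be reproduced by hand. The sketch below does not touch the JDK's private fields; it assumes, as with JDK 9 compact strings, that a Latin-1 string uses one byte per char (coder 0) and a UTF-16 string uses two bytes per char (coder 1), and builds comparable byte arrays with `getBytes`. The constant and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class CompactLengthSketch {
    static final byte LATIN1 = 0; // assumed: one byte per char
    static final byte UTF16 = 1;  // assumed: two bytes per char

    // Same shape as JDK 9's String.length(): byte count shifted right by the coder.
    static int lengthFromBytes(byte[] value, byte coder) {
        return value.length >> coder;
    }

    public static void main(String[] args) {
        String s = "caf\u00e9"; // "café", representable in Latin-1
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1); // 4 bytes
        byte[] utf16 = s.getBytes(StandardCharsets.UTF_16BE);    // 8 bytes, no byte-order mark
        System.out.println(lengthFromBytes(latin1, LATIN1)); // 4
        System.out.println(lengthFromBytes(utf16, UTF16));   // 4
        System.out.println(s.length());                      // 4
    }
}
```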

The length member of the String class is a method, returning an int value that represents the number of characters in the String (more precisely, the number of UTF-16 code units, i.e. char values).
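A short example of what "number of characters" means for `length()`: a character outside the Basic Multilingual Plane is stored as a surrogate pair, so it counts as two `char` values but one code point. The class name is illustrative.

```java
class CodeUnitsDemo {
    public static void main(String[] args) {
        String ascii = "hello";
        // U+1F600 (grinning face) needs a UTF-16 surrogate pair: two char values.
        String emoji = "\uD83D\uDE00";

        System.out.println(ascii.length());                          // 5
        System.out.println(emoji.length());                          // 2 (char values)
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1 (code point)
    }
}
```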

Related

What field is the value of the int wrapped by an Integer Java object stored in?

I am trying to figure out where the value is stored in an Integer (i.e., what field is it in, or is there some other way it knows its value)? Unless it's pertinent, I don't need to know 'how it's stored in memory'; I need to know how the Object knows what value it represents.
When I looked at the javadoc, I expected to see a field of Type int which stored the value as a primitive, but I don't see that.
The Javadoc says that
An object of type Integer contains a single field whose type is int.
But when I look at the fields, I see 5 static fields, and I think all of them store characteristics of the value, but not the value (BYTES, MAX_VALUE, MIN_VALUE, SIZE, TYPE).
First - this statement seems contradictory to me... there is more than a single field (there are 5)
Second - none of the fields 'store' the value...so how does the object know what value it is representing?
I am not a regular Java user, so be kind with your answers.
The value is stored in a private final int value in OpenJDK; the actual naming and other considerations are implementation-defined and may vary.
Because it's private, it's not listed in javadocs unless the Javadoc tool was called with the -private parameter -- this was not done when the official API docs were generated from sources.
The public fields are static, so their values pertain to the class itself, not to instances. There's nothing unexpected here - MAX_VALUE is a reasonable immutable (final) thing to pertain to Integers.
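Reflection can list the private field that Javadoc hides. The sketch below only reads field metadata with `getDeclaredFields`; reading the value itself would need `setAccessible`, which newer JDKs block by default for `java.lang`. On OpenJDK it is expected to report a single `private final int value`, but as noted above the name is an implementation detail. The class name is illustrative.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

class IntegerFieldsDemo {
    // Returns "modifiers type name" for every non-static field declared directly in Integer.
    static List<String> instanceFields() {
        List<String> result = new ArrayList<>();
        for (Field f : Integer.class.getDeclaredFields()) {
            int mods = f.getModifiers();
            if (!Modifier.isStatic(mods)) { // skip MAX_VALUE, MIN_VALUE, SIZE, ...
                result.add(Modifier.toString(mods) + " " + f.getType().getName() + " " + f.getName());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(instanceFields()); // on OpenJDK: [private final int value]
    }
}
```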
In the Integer class code there is a field
private final int value;
that holds the value as a primitive int.
The code to retrieve the stored value is intValue() whose implementation is
public int intValue() {
return value;
}
Properties MIN_VALUE, MAX_VALUE ... are static properties so there is no duplication between instances of the Integer class.
There are also other interesting static properties in the Integer class. In particular, there is a group of cached values between -128 and +127; in the implementation quoted below they are stored in the static property private static Integer[] intCache (OpenJDK keeps them in a nested IntegerCache class instead).
That is why it is better to use the method valueOf instead of explicitly creating a new Integer. The implementation of valueOf tries to use the cached values when available, with code similar to the following:
public static Integer valueOf(int val) {
if (val < MIN_CACHE || val > MAX_CACHE)
return new Integer(val);
synchronized (intCache) {
if (intCache[val - MIN_CACHE] == null)
intCache[val - MIN_CACHE] = new Integer(val);
return intCache[val - MIN_CACHE];
}
}
Browsing the source code of Java's base classes gives you ideas on how to improve your own code, but you need to know the details of the Java language well enough that code like the synchronized block above doesn't scare you off.
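The observable effect of the cache can be checked with `==`, which compares object identity. The `Integer.valueOf` Javadoc guarantees caching for -128 to 127; outside that range identity is not guaranteed either way, so `equals` is the right comparison. The helper name is illustrative.

```java
class IntegerCacheDemo {
    // True if two valueOf calls for v return the very same Integer object.
    static boolean sameInstance(int v) {
        return Integer.valueOf(v) == Integer.valueOf(v);
    }

    public static void main(String[] args) {
        System.out.println(sameInstance(127));  // true: guaranteed to be cached
        System.out.println(sameInstance(-128)); // true: guaranteed to be cached

        Integer c = Integer.valueOf(1000);
        Integer d = Integer.valueOf(1000);
        System.out.println(c.equals(d)); // true: compares the wrapped int values
    }
}
```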

Java type casting. Logic behind (String), .toString() and "" + int

Java is a strongly typed language. Hence, fewer bugs in compiled code and better readability, at the cost of slower coding. Hence, explicit type casting is required; this part is clear.
What is the difference between these snippets:
(String)myInt // obviously doesn't compile
Integer.toString(myInt)
"" + myInt
Is it purely for historical reasons, or is there some meaning behind it?
Which method is appropriate?
As was mentioned before, (String) myInt is a typecast. In Java, we can cast between primitive types, or along the object hierarchy (upcasts, plus downcasts that are checked at run time). Since int is a primitive and String is an object, we have a problem. Even autoboxing cannot resolve this dilemma, since Integer and String do not stand in an inheritance relationship. Therefore, it makes sense for (String) myInt to result in a compilation error.
The semantics of Integer.toString(myInt) and "" + myInt are identical. The specifics, however, are different.
When executing Integer.toString(myInt), a new String is constructed, containing a String-representation of myInt.
When executing "" + myInt, Java first constructs a global String constant with the value "" (this is done by the JVM; we do not see it [1]). Because the left operand of + is a String, the compiler treats + as string concatenation, so the right operand must be converted to a String. For primitives, the JVM "knows" how to convert them into Strings. For objects, toString() is called. Since Object has this method and each class is (at least implicitly) derived from Object, each object is guaranteed to have a toString() method. This is the second String constructed. Since Strings are immutable, the JVM might create a third String, representing the concatenation of the first two Strings [2].
Epilogue
And then, at execution time, the JIT-compiler strikes and most of this might be irrelevant since the JIT-optimized versions of both variants may look equal. Or not. Or maybe only sometimes. JIT does funny stuff. So in the end it is more a question of personal style than performance :)
[1] This is actually a lie. When writing
String s1 = "";
String s2 = "";
System.out.println(s1 == s2);
one will observe that the result is true, whereas one might expect false. This is because the JVM keeps a pool of all String constants (they are interned) to save memory.
[2] It is quite possible that the JVM "recognizes" that something + "" == "" + something == something and therefore does not create this third String. I have neither tested nor researched this.
I tend to prefer String.valueOf(int), myself, but yes there are multiple paths to get there.
In general, if you're converting the int to a String in isolation, you want Integer.toString(int), String.valueOf(int), or similar.
Your "" + myInt will be markedly less efficient than the above (though it usually doesn't matter): with javac before Java 9, it compiles to creating a new StringBuilder, appending "" to it, appending the int to it (which converts the int to its decimal text first), and then getting the String from the StringBuilder. (Since Java 9, javac instead emits an invokedynamic call bootstrapped by StringConcatFactory, which lets the runtime choose the concatenation strategy.) So while you might use it as part of a larger concatenation, on its own it's pretty poor.
Your (String) example doesn't work: You can't just cast an int to a String, you'll get a compile-time "incompatible types" error.
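The three working conversions discussed in this thread can be compared side by side; the commented-out cast is the one the compiler rejects. The class and method names are illustrative.

```java
class IntToStringDemo {
    // Returns the result of each working conversion, in a fixed order.
    static String[] convertAll(int myInt) {
        return new String[] {
            Integer.toString(myInt),
            String.valueOf(myInt),
            "" + myInt
        };
    }

    public static void main(String[] args) {
        // String s = (String) 42;  // does not compile: incompatible types
        for (String s : convertAll(42)) {
            System.out.println(s); // "42" three times
        }
    }
}
```

All three produce the same text; the choice between them is mostly about readability, with Integer.toString(int) or String.valueOf(int) stating the intent most directly.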
It's a language feature that "officially" overloads the + operator such that
"" + myInt
is compiled as if it were:
new StringBuilder().append("").append(myInt).toString();
and StringBuilder#append() converts myInt to its decimal text, the same text that String.valueOf(myInt) produces.
This feature was presumably added because without it, nobody would want to use the language; it would just be too painful.
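The "compiled as if" expansion can be written out explicitly and compared with the concatenation. This is a sketch of the pre-Java 9 javac translation described above, not a claim about the bytecode any particular compiler emits today. The names are illustrative.

```java
class ConcatExpansionDemo {
    static String concat(int myInt) {
        return "" + myInt;
    }

    // Hand-written equivalent of the expression above, as older javac versions translated it.
    static String expanded(int myInt) {
        return new StringBuilder().append("").append(myInt).toString();
    }

    public static void main(String[] args) {
        System.out.println(concat(7).equals(expanded(7))); // true
    }
}
```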
First of all, let's compare (String)val and val.toString().
The key difference between those two is that a cast is checked by the compiler (and, for reference types, at run time), while the toString() method is usually implemented by a programmer. That said, a cast to String succeeds only if the value actually is a String (String is final, so nothing can extend it), whereas you can implement toString() in any class. toString() works just like any other method; there is (almost) nothing special about it.
Now let's take a look at ""+int. I personally consider this an anti-pattern, because it's compiled into wildly inefficient code. A lot of useless objects are created; in short, the result is similar to manually concatenating a few string literals with a StringBuilder. Also, in an expression like ""+var where var is not a String, var's toString() is called implicitly.
I hope this was clear.
(String) - this works only on strings, so if you have this:
Object test = "test";
String test2 = (String) test;
It will work (since the object that test refers to is actually a String).
If you have something like this:
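Object test = Integer.valueOf(2);
String test2 = (String) test;
You get a ClassCastException at run time. (If test were declared as Integer instead of Object, the cast would not even compile, because Integer and String are unrelated classes.)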
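A runnable check of when a `(String)` cast succeeds, using a parameter whose static type is `Object` so that the compiler accepts the cast and the check happens at run time. Note that casting `null` always succeeds. The names are illustrative.

```java
class CastDemo {
    // Reports whether casting o to String succeeds at run time.
    static boolean castSucceeds(Object o) {
        try {
            String ignored = (String) o; // checked by the JVM when executed
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(castSucceeds("test"));             // true
        System.out.println(castSucceeds(Integer.valueOf(2))); // false
        System.out.println(castSucceeds(null));               // true: null can be cast to any reference type
    }
}
```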
Integer.toString(myInt)
This is what it does:
public static String toString(int i) {
if (i == Integer.MIN_VALUE)
return "-2147483648";
int size = (i < 0) ? stringSize(-i) + 1 : stringSize(i);
char[] buf = new char[size];
getChars(i, size, buf);
return new String(0, size, buf);
}
It just converts your number into a string, one character at a time.
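The character-by-character idea can be shown in isolation. This is a hypothetical, simplified sketch, not the JDK's implementation (which avoids the long arithmetic and uses lookup tables): it fills a buffer with digits from the right, widening to long so that negating Integer.MIN_VALUE cannot overflow.

```java
class DigitsSketch {
    // Simplified int-to-decimal conversion; illustrative only.
    static String toDecimal(int i) {
        if (i == 0) return "0";
        long n = i;                 // widen so -Integer.MIN_VALUE fits
        boolean negative = n < 0;
        if (negative) n = -n;

        char[] buf = new char[11];  // "-2147483648" is the longest case: 11 chars
        int pos = buf.length;
        while (n > 0) {
            buf[--pos] = (char) ('0' + (n % 10)); // lowest digit goes rightmost
            n /= 10;
        }
        if (negative) buf[--pos] = '-';
        return new String(buf, pos, buf.length - pos);
    }

    public static void main(String[] args) {
        System.out.println(toDecimal(-45));               // -45
        System.out.println(toDecimal(Integer.MIN_VALUE)); // -2147483648
    }
}
```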
"" + myInt
This tells the compiler that you want to join things as strings, not add two numbers (since a string comes first, it will concatenate). myInt will be turned into a string, or, if it is an object, its toString() method will be called.
Also, make sure you read about the String pool in the JVM to understand what happens here. Repeated concatenation can end up creating a lot of temporary strings (if you have a lot of +, especially in a loop; avoid that by using StringBuilder).
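Because a String cannot be modified, `s += i` inside a loop creates a new String on every iteration. The sketch below builds the whole result in one StringBuilder instead. The method name and the comma-separated format are illustrative choices.

```java
class JoinDemo {
    // Builds "0,1,2,...,n-1" with a single StringBuilder instead of repeated +=.
    static String joinNumbers(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) sb.append(',');
            sb.append(i); // append(int) writes the digits directly into the builder
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinNumbers(4)); // 0,1,2,3
    }
}
```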

In Java, what is the difference between using the toString method vs explicit String casting vs plus quotes (ex. myObj+"")?

What is the difference between plus quotes (+"") and using a "toString()" method or even explicitly casting with something like (String) myObject? Trade-offs?
myObject.toString()
vs.
myObject+""
or even vs.
(String) myObject
More specifically, is there any time using the myObj+"" method can get you into trouble?
Edited for clarity
EDIT 2:
It seems String.valueOf(myObj) is the preferred method for avoiding a NullPointerException. That said: is there ever a time when the following is false?
String.valueOf(myObj).equals(myObj+"")
As of Java 7, if you want to avoid a NullPointerException, you can simply use one of these:
Objects.toString( myObject )
Objects.toString( myObject, "defaultValueWhenMyObjectIsNull" )
In all versions of Java, the first of these can also be accomplished with the following, as noted by @NobuGames in the first comment below:
String.valueOf( myObject )
Each of the mechanisms you cite has a flaw.
myObject.toString() // throws NullPointerException if myObject is null.
myObject+"" // Hack; impairs understandability.
(String) myObject // throws ClassCastException unless myObject is a String or null
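The behaviour of each mechanism with a null reference can be checked directly. One subtlety: `myObject` is declared as `Object`, so `String.valueOf(myObject)` selects `valueOf(Object)`; a bare `String.valueOf(null)` would instead select the `valueOf(char[])` overload and throw. The class and helper names are illustrative.

```java
import java.util.Objects;

class NullConversionDemo {
    // Calls toString() directly, reporting the exception instead of propagating it.
    static String viaToString(Object o) {
        try {
            return o.toString();
        } catch (NullPointerException e) {
            return "<NullPointerException>";
        }
    }

    public static void main(String[] args) {
        Object myObject = null;
        System.out.println(viaToString(myObject));                   // <NullPointerException>
        System.out.println(myObject + "");                          // null (the four characters)
        System.out.println(String.valueOf(myObject));               // null (the four characters)
        System.out.println(Objects.toString(myObject, "fallback")); // fallback
        System.out.println((String) myObject);                      // null: casting null always succeeds
    }
}
```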
EDIT (after question edit)
is there any time using the myObj+"" method can get you into trouble?
Yes, you can confuse other programmers. The intent of the hack is not clear. This can lead to increased cost in time, and increased risk of someone "fixing" it.
However, in terms of just the compiler, you're fine. From the Java Language Specification, section 15.18.1, String concatenation operator +:
If only one operand expression is of type String, then string conversion (§5.1.11) is performed on the other operand to produce a string at run time.
And from that cited section 5.1.11: String conversion:
If the reference is null, it is converted to the string "null" (four ASCII characters n, u, l, l).
Otherwise, the conversion is performed as if by an invocation of the toString method of the referenced object with no arguments; but if the result of invoking the toString method is null, then the string "null" is used instead.
This second case leads to a difference that you asked about.
Is there ever a time when the following is false? String.valueOf(myObj).equals(myObj+"")
No, but there's a time when that throws a NullPointerException. When myObj is a non-null reference to an object whose toString() method returns null, then String.valueOf(myObj) will be null. Calling the equals method will throw the NullPointerException.
But I suspect you're asking whether there's ever a time the two have different values. Yes, they can have different values. Objects.toString() and String.valueOf() can return null values. The hack will always have a non-null value.
That said, returning null from toString() is somewhat bad form. The JLS acknowledges that it can happen, but the API implies that it should not. Personally, if I were concerned about this case, I would handle it in some way other than the hack.
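The corner case described above, an object whose toString() returns null, can be reproduced with a deliberately badly behaved class. `String.valueOf` passes the null through, while concatenation substitutes "null" as JLS 5.1.11 requires. The class names are illustrative.

```java
class NullToStringDemo {
    // Legal but poor form: toString() returns null.
    static class Weird {
        @Override
        public String toString() {
            return null;
        }
    }

    public static void main(String[] args) {
        Object myObj = new Weird();
        String a = String.valueOf(myObj); // returns myObj.toString(), i.e. null
        String b = myObj + "";            // string conversion substitutes "null"
        System.out.println(a == null);        // true
        System.out.println(b.equals("null")); // true
        // a.equals(b) would throw a NullPointerException here.
    }
}
```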
This code:
myObject+""
Is translated by the compiler to this:
new StringBuilder().append(myObject).append("").toString()
The StringBuilder append method does a null check on the input argument, appending the text "null".
The String class has an overloaded valueOf method, so you can also do:
String.valueOf(myObject)
Which will do a null check, returning the text "null".
Casting to String is going to be highly contextual, so more than one technique may apply here. If you are expecting to cast directly to String, my advice would be to prepare for the cast failing with a ClassCastException. If there is a chance the value can be null, then check for it. Or make the API promise not to hand you a null. That is, this is a separation of concerns and a layering of responsibilities.
However, IMO, any sufficiently complicated class ought to have a toString() method. It can be for debugging, or used as a property for computation, but it ought to be human readable. There are few cases where a human-readable version of the object is not warranted, in my experience.
Relying on overloading the + operator feels like a hack, yes.

Why is string.length() a method, and int[].length a property? [duplicate]

This question already has answers here:
length and length() in Java
Why is String.length() a method, and int[].length a property (see below)?
int[] nums = {2,4,7,12,43};
String phrase = "Hello, world.";
System.out.println(nums.length);
System.out.println(phrase.length());
I don't think there has to be a good reason, and I think there could be many reasons.
But one is that by making String#length() a method, it can be declared in an interface instead (in this case CharSequence). Interfaces cannot declare public instance fields.
This is what the String::length() method looked like in older JDKs such as Java 6, where String kept a separate count field:
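Because `length()` is declared on the `CharSequence` interface, code can work with any character sequence through that one method, something a public field could not offer. Array length, by contrast, stays a field. The helper name is illustrative.

```java
class CharSequenceDemo {
    // Works for String, StringBuilder, or any other CharSequence via the interface method.
    static int totalLength(CharSequence... parts) {
        int total = 0;
        for (CharSequence part : parts) {
            total += part.length();
        }
        return total;
    }

    public static void main(String[] args) {
        int[] nums = {2, 4, 7, 12, 43};
        System.out.println(nums.length); // 5: arrays expose length as a field
        System.out.println(totalLength("Hello", new StringBuilder(", world."))); // 13
    }
}
```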
public int length() {
return count;
}
So essentially count could've been called length and made public to be similar to arrays (it is final after all).
It was probably just a design decision. There may have been some contributing factors that we can speculate about (one of which could've been the CharSequence thing mentioned by Mark Peters).
Because String is not an array as such. The designers of Java designed arrays (which are objects) to have a public field named length.
On the other hand, a String has a method which gives the length instead. In general it is a more conventional approach to make member fields private and use methods to access them, but in the case of arrays it is not.
They're different objects with different signatures as far as you are concerned. A String is not a char[] (although internally it might be implemented that way).
No particular reason, I think. In fact, in C#, a very similar language, String.Length is a property: http://msdn.microsoft.com/en-us/library/system.string.length.aspx. But take a look at what the C# documentation has to say about it:
The Length property returns the number of Char objects in this instance, not the number of Unicode characters. The reason is that a Unicode character might be represented by more than one Char. Use the System.Globalization.StringInfo class to work with each Unicode character instead of each Char.
Why is int[].length a property?
Arrays are special objects in java, they have a simple attribute named length which is final.
There is no "class definition" of an array (you can't find it in any .class file), they're a part of the language itself.
The members of an array type are all of the following:
The public final field length, which contains the number of components of the array. length may be positive or zero.
The public method clone, which overrides the method of the same name in class Object and throws no checked exceptions. The return type of the clone method of an array type T[] is T[].
A clone of a multidimensional array is shallow, which is to say that it creates only a single new array. Subarrays are shared.
All the members inherited from class Object; the only method of Object that is not inherited is its clone method.
Resource: JLS 10.7
Why is String.length() a method?
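The array members listed above can be exercised directly: `clone()` returns `int[][]` without a cast, and the copy is shallow, so the rows are shared. The names are illustrative.

```java
class ArrayMembersDemo {
    // clone() on an int[][] already returns int[][]; no cast is needed.
    static int[][] shallowCopy(int[][] a) {
        return a.clone();
    }

    public static void main(String[] args) {
        int[][] grid = {{1, 2}, {3, 4}};
        int[][] copy = shallowCopy(grid);

        System.out.println(copy == grid);       // false: a new outer array
        System.out.println(copy[0] == grid[0]); // true: subarrays are shared
        copy[0][0] = 99;
        System.out.println(grid[0][0]);         // 99, visible through the original
        // grid.length = 3;  // does not compile: length is final
    }
}
```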

What are the benefits of using identical String literals instead of a final variable?

I've come across a class that includes multiple uses of a string literal, "foo".
What I'd like to know, is what are the benefits and impact (in terms of object creation, memory usage and speed) of using this approach instead of declaring the String as final and replacing all the literals with the final variable?
For example (although obviously not a real-world usage):
private static final String FINAL_STRING = "foo";
public void stringPrinter(){
for(int i=0;i<10;i++){
System.out.println(FINAL_STRING);
}
}
Versus:
public void stringPrinter(){
for(int i=0;i<10;i++){
System.out.println("foo");
}
}
Which is preferable and why (assuming the string value will remain constant)?
Would the above (second) example result in 10 String objects being created or would the JVM realise that only a single literal is actually used, and create a single reference. If so, is there any advantage for declaring the String as final (as in the first example)?
If the interpreted code does replace the string literal with a single reference, does that still apply if the same literal occurs in more than one place:
public void stringPrinter(){
for(int i=0;i<5;i++){
System.out.println("foo"); // first occurrence
System.out.println("foo"); // second occurrence
}
}
They will be exactly the same. The literal is interned (any compile time constant expression that results in that string shares the same instance as all other constants/literals) in both cases and a smart compiler+runtime should have no trouble reducing both to the most optimized example.
The advantage comes more in maintainability. If you want to change the literal, you would need only change one occurrence with a constant but you would need to search and change every instance if they were included inline.
From the JLS
Compile-time constants of type String are always "interned" so as to share unique instances, using the method String.intern.
So, no, there's only going to be one string object.
As Mark notes, this is strictly the question of maintainability and not performance.
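Interning can be observed with `==`, used here on purpose to compare identity (normal code should use `equals`). The literal and the constant share one instance, `new String` forces a distinct object, and `intern()` maps it back to the pooled one. The names are illustrative.

```java
class InternDemo {
    static final String FINAL_STRING = "foo";

    static boolean literalIsConstant() {
        return "foo" == FINAL_STRING; // same interned instance
    }

    static boolean freshIsDistinct() {
        return new String("foo") != "foo"; // new always creates a separate object
    }

    static boolean internReturnsPooled() {
        return new String("foo").intern() == "foo";
    }

    public static void main(String[] args) {
        System.out.println(literalIsConstant());   // true
        System.out.println(freshIsDistinct());     // true
        System.out.println(internReturnsPooled()); // true
    }
}
```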
The advantage is not in performance, but in maintainability and reliability.
Let me take a real example I came across just recently. A programmer created a function that took a String parameter that identified the type of a transaction. Then in the program he did string compares against this type. Like:
if (type.equals("stock"))
{ ... do whatever ... }
Then he called this function, passing it the value "Stock".
Do you notice the difference in capitalization? Neither did the original programmer. It proved to be a fairly subtle bug to figure out, because even looking at both listings, the difference in capitalization didn't strike me.
If instead he had declared a final static, say
final static String stock="stock";
Then the first time he typed Stock instead of stock as the argument, he would have gotten a compile-time error rather than a silent mismatch.
Better still in this example would have been to make an enum, but let's assume he actually had to write the string to an output file or something so it had to be a string.
Using final statics gives at least three advantages:
(1) If you mis-spell it, you get a compile-time error, rather than a possibly-subtle run-time error.
(2) A static can assign a meaningful name to a value. Which is more comprehensible:
if (employeeType.equals("R")) ...
or
if (employeeType.equals(EmployeeType.RETIRED)) ...
(3) When there are multiple related values, you can put a group of final statics together at the top of the program, thus informing future readers what all the possible values are. I've had plenty of times when I've seen a function compare a value against two or three literals. And that leaves me wondering: Are there other possible values, or is this it? (Better still is often to have an enum, but that's another story.)
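A minimal sketch of the enum alternative mentioned above, assuming the values still have to be read from or written to text (for example a file). `EmployeeType`, its constants, and the codes "A" and "R" are hypothetical.

```java
import java.util.Optional;

class EmployeeTypeDemo {
    // Hypothetical enum; the codes are illustrative.
    enum EmployeeType {
        ACTIVE("A"), RETIRED("R");

        private final String code;

        EmployeeType(String code) { this.code = code; }

        String code() { return code; }

        // Maps an external code back to a constant; unknown codes yield empty.
        static Optional<EmployeeType> fromCode(String code) {
            for (EmployeeType t : values()) {
                if (t.code.equals(code)) return Optional.of(t);
            }
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        EmployeeType t = EmployeeType.fromCode("R").orElseThrow(IllegalArgumentException::new);
        System.out.println(t == EmployeeType.RETIRED);              // true
        System.out.println(EmployeeType.fromCode("r").isPresent()); // false: codes are case-sensitive
        // EmployeeType.Retired would be a compile-time error, unlike a mistyped string.
    }
}
```

The enum lists every legal value in one place, which addresses point (3), and a misspelled constant fails at compile time, as in point (1).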
All String literals are kept in the String pool (which is shared across all classes).
Using a constant can make the code clearer, give the string some context and make the code easier to maintain, especially if the same string appears in multiple places.
Those string literals are interned, so no new String objects are created in the loop. Using the same literal twice could still be a code smell, though; but not in terms of speed or memory usage.
In the cases you are providing, I believe the biggest reason for having it declared as FINAL_STRING somewhere is to ensure it stays in one centralized location. There will only ever be one instance of that string constant, but the first example is far easier to maintain.
