String and Character Array in Java

I am a student who has just shifted from C++ to Java.
In Java what could be the main reason for defining separate data types for Strings and Char arrays? What is the difference between the two?
Since I have only studied C++, up till now I was under the impression that they are the same thing. Please clarify if possible.

String is immutable; a char array is not. A String is implemented with a character array underneath, but every time you try to modify it (with concatenation, replace, etc.) you get a new String object.
So a String behaves like a constant char array, but comes with syntactic sugar that makes it much easier to use. For example, the + operator also serves as the string concatenation operator.
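A minimal sketch of the difference described above. The class name StringVsCharArray and the sample values are made up for illustration.

```java
// Hypothetical demo class; names and values are illustrative only.
class StringVsCharArray {
    // Calls concat() and discards the result, then returns the original String.
    static String originalAfterConcat() {
        String s = "good";
        s.concat("ness");      // produces a new String; s itself is untouched
        return s;
    }

    // Changes one element of a char array in place and returns the array.
    static char[] mutatedArray() {
        char[] chars = {'g', 'o', 'o', 'd'};
        chars[0] = 'f';        // arrays are mutable
        return chars;
    }

    public static void main(String[] args) {
        System.out.println(originalAfterConcat());       // good
        System.out.println(new String(mutatedArray()));  // food
        String t = "good" + "ness";                      // + builds a new String
        System.out.println(t);                           // goodness
    }
}
```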

In Java, String is a basic system class that essentially wraps a char[]. There are several reasons why, for most uses, having a full class is preferable to directly handling arrays:
Strings are immutable; once you have a reference to some String, you know it's never going to change.
Strings provide useful methods that a bare array couldn't, such as length(), and have clearly-defined comparison semantics.
You never have to deal with string termination yourself.
Java has a special exception for the rule of "no operator overloading" to support string concatenation (with +).
Essentially, it's good OO practice to use a class to collect the desired behavior and the data structures in the same place, and String wraps up an array of characters with the useful operations that you want to perform on a string.

String is a class in Java and offers you methods and is also an Object.
A String-object is also immutable.
Internally, the value is stored as a char array.

There is a semantic difference. Just because data is stored the same way, this doesn't mean it's the same thing. Dates and Amounts may also have the same internal representation (long for a timestamp or fixed point amount of cash), but they're not the same. The char array could as well mean a 16-bit image.
In object orientation, it's good practice to model objects based on what they are and what they can do, and not by how they internally store their data. This allows you to encapsulate the data (and restrict or control (observer support) access with getters/setters, or even make the internal representation immutable or poolable), and provide appropriate methods for your objects.

String is immutable in Java, and string literals are stored in the String pool. Once a String is created, it stays in memory until it is garbage collected. Because a String is immutable, a password held in one remains in memory as readable text and cannot be erased, so a memory dump carries a greater risk of exposing it.
A char array, by contrast, is created on the heap, and you can overwrite its contents with dummy values once you are done with it.
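Overwriting a char array could look roughly like this. It is a sketch, not a complete security measure; the class name PasswordWipe and the sample password are assumptions for illustration.

```java
import java.util.Arrays;

// Hypothetical helper; the name is illustrative only.
class PasswordWipe {
    // Overwrites every character so the secret no longer sits in this array.
    static void wipe(char[] secret) {
        Arrays.fill(secret, '\0');
    }

    public static void main(String[] args) {
        char[] password = {'s', '3', 'c', 'r', 'e', 't'}; // illustrative value
        // ... use the password here ...
        wipe(password);
        System.out.println((int) password[0]); // 0
    }
}
```

A String offers no equivalent operation, because its contents cannot be changed after creation.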

The advantage to using the string object is all the methods available to it.
For example:
stringExample1.equals(stringExample2);
String stringExample3 = stringExample1.replace(substring1, substring2);

Related

How Java handles dynamic Boolean array as method parameter?

I'm using a class that has a method that accepts a boolean[].
This code does not raise any errors
public class myclass {
    void move(boolean[] myarray) {
        // Do stuff
    }
}
Now, I do a little C++ coding, and this would not work in the context of dynamic memory.
So this is essentially a java question:
In my case the array being received has a known length, but I want to know how you would handle this in Java if it is dynamic (as well as what I should do if it's not dynamic).
I'm guessing the compiler or JVM is going to handle this, but I want to know the speed optimizations I can implement.

Arrays in Java always have a fixed length. From The Java Tutorials, "The length of an array is established when the array is created."
If you wanted dynamic arrays, you'd use something from the Collections Framework, e.g. ArrayList.
In any case, a reference to the array (or collection) is passed into move(...), so there shouldn't be any difference in speed just for the function call.
When using the array, I'd expect (static) arrays to be dereferenced more quickly than going through the function calls to access elements of (dynamic) collections. However, to have a proper comparison, you'd need to provide more context of how your array is used.
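To make the contrast concrete, here is a small sketch. The class name FixedVsDynamic and the countTrue helpers are hypothetical and not part of the question's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo; names are illustrative only.
class FixedVsDynamic {
    // Works for a boolean[] of any length; only a reference is passed in.
    static int countTrue(boolean[] flags) {
        int n = 0;
        for (boolean f : flags) {
            if (f) n++;
        }
        return n;
    }

    // Same idea for a growable list.
    static int countTrue(List<Boolean> flags) {
        int n = 0;
        for (boolean f : flags) {
            if (f) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        boolean[] fixed = new boolean[3];   // length is set at creation
        fixed[0] = true;
        System.out.println(fixed.length);          // 3
        System.out.println(countTrue(fixed));      // 1

        List<Boolean> growable = new ArrayList<>();
        growable.add(true);
        growable.add(false);
        growable.add(true);
        growable.add(true);                 // the list grows as needed
        System.out.println(growable.size());       // 4
        System.out.println(countTrue(growable));   // 3
    }
}
```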

You should consider using ArrayList<>() whenever you need to iterate over collections of arbitrary length.
Using List is also good practice in the Java world. There is an article about programmers who use lists versus arrays, and those who use lists tend to produce fewer bugs.

Using Strings vs POJOs ...which one consumes more memory

I have a database column that contains varchar data. I would like to store this data in an ArrayList for comparison with another list. I can think of two ways to do this. One is to read the data, assign it to a String, and store that in the ArrayList. The other is to have a POJO with a getter and setter for this value, and store the POJO in the ArrayList. When I have to compare against another value, I either do a String comparison directly or fetch the value through the POJO's getter and then compare. While I feel that using the String saves a lot of code, I would like to know whether choosing one (String) over the other (POJO) has any memory implications. I usually need to compare around 1000 objects. So which one would consume less memory, and which one would run faster? I need to use Java 1.4.

In Java, every object you create takes up part of the heap. In the POJO case you need heap space for the POJO as well as heap space for the String it holds, plus the time cost of going through the getter/setter. I would prefer just Strings.

In Java, why are arrays objects? Are there any specific reasons?

Is there any reason why an array in Java is an object?

Because the Java Language Specification says so :)
In the Java programming language arrays are objects (§4.3.1), are dynamically created, and may be assigned to variables of type Object (§4.3.2). All methods of class Object may be invoked on an array.
So, unlike C++, Java provides true arrays as first-class objects:
There is a length member.
There is a clone() method which overrides the method of the same name in class Object.
Plus all the members of the class Object.
An exception is thrown if you attempt to access an array out of bounds.
Arrays are instantiated in dynamic memory.
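The points in this list can be checked with a few lines. The class name ArraysAreObjects and its helper methods are assumptions made for this sketch.

```java
// Hypothetical demo; names are illustrative only.
class ArraysAreObjects {
    // An array reference can be assigned to a variable of type Object.
    static boolean isObject(int[] a) {
        Object o = a;
        return o instanceof int[];
    }

    // clone() returns a new array with the same elements.
    static int[] copyOf(int[] a) {
        return a.clone();
    }

    // Reading one past the end throws ArrayIndexOutOfBoundsException.
    static boolean throwsOnBadIndex(int[] a) {
        try {
            int unused = a[a.length];
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(data.length);             // 3
        System.out.println(isObject(data));          // true
        System.out.println(copyOf(data) != data);    // true
        System.out.println(throwsOnBadIndex(data));  // true
    }
}
```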

Having arrays be objects means that you can do operations with them (e.g., someArray.count('foo')) instead of just doing it against them (e.g., count(someArray, 'foo')), which leads to more natural syntax.

Another point is that objects are mutable and are passed by reference. In arrays there aren't any fields/methods that you can use to change "properties" of the array, but you sure can mutate the element values. And the benefits of passing arrays by reference are pretty obvious (though functional programmers probably wish Java had immutable lists passed by value).
Edit: forgot to mention. In the period before autoboxing, it was helpful to be able to store arrays in collections, write them to ObjectStreams etc.

Probably because they wanted to get as close as possible to making everything an object. Native types are there for backward compatibility.

So that they get all the benefits thereof:
hashCode()
toString()
etc.
And arrays aren't 'primitive', so if they can't be primitive, they must be objects.

I'm not sure about the official reason.
However, it makes sense to me that they are objects because operations can be performed on them (such as taking the length) and it made more sense to support these operations as member functions rather than introduce new keywords. Other operations include clone(), the inherited operations of object, etc. Arrays are also hashable and potentially comparable.
This is different from C (and native arrays in C++), where your arrays are essentially pointers to a memory offset.

Why do some methods in Java modify the original value and some do not?

The method concat() does not modify the original value. It returns a new value.
like this:
String str = "good";
str.concat("ness");
System.out.println(str); //"good"
But some methods modify the original value. Why?
In Groovy:
def languages = ["Java", "Groovy", "JRuby"]
languages.reverse()
===> [JRuby, Groovy, Java]
println languages
===> [Java, Groovy, JRuby]
languages.sort()
===> [Groovy, JRuby, Java]
println languages
===> [Groovy, JRuby, Java]

String is immutable in Java. Any method that "modifies" a String must return a new instance of String.
From the Java API Specifications for the String class:
Strings are constant; their values cannot be changed after they are created.
The Java Language Specification defines this behavior in Section 4.3.3: The Class String.
Response to the edit:
It appears that an example in Groovy has been added. (I haven't used Groovy before, so my understanding of it may not be correct.)
From what I understand from looking at the example, there seems to be a languages list that is being reverse-ed and sort-ed -- those operations themselves do not modify the String objects contained in the list, but are acting upon the list itself.
Whether a list operation returns a new list, or modifies the existing list in place, is not related to the behavior of the String objects themselves.

The Java API was designed by many many different people, as such it's hard to keep everything consistent. I believe people generally accept that immutability (i.e., the internal states should not change) is a good thing now though, at least where value objects are concerned.
Another similar question would be, "Why are indexes sometimes 0-based (most of the time) and sometimes 1-based (JDBC)?" Again, I believe it's another situation of the API being too broad, and developers of different APIs not coordinating (I could be wrong here, though; if anyone knows the real reason for JDBC being 1-based, please let me know).

I think you mean str.concat("ness") instead. In this particular example with Strings, no method can mutate the object because Strings are designed to be immutable. In the library, you will find many methods that mutate the state of the object (e.g. StringBuffer.replace()) and others that don't (e.g. String.replace()). You'll have to read the API carefully to determine which is the case. Ultimately, this is a choice made by the library designer, who has to consider the functionality, ease of use, and conventions associated with the package he or she is writing.
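As a rough illustration of the two methods named above, the sketch below calls String.replace() and StringBuffer.replace() on the same text. The class name MutableVsImmutable and the sample strings are made up.

```java
// Hypothetical demo; names and values are illustrative only.
class MutableVsImmutable {
    // String.replace() returns a new String; the receiver is unchanged.
    static String afterStringReplace() {
        String s = "goodness";
        s.replace("good", "kind");   // result discarded on purpose
        return s;
    }

    // StringBuffer.replace() changes the buffer itself.
    static String afterBufferReplace() {
        StringBuffer sb = new StringBuffer("goodness");
        sb.replace(0, 4, "kind");    // replaces characters 0..3 in place
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(afterStringReplace());  // goodness
        System.out.println(afterBufferReplace());  // kindness
    }
}
```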

Because there are immutable and mutable classes.
String, as another answer points out, is an immutable class. Their value always stays the same once a String is created.
If you have an ArrayList<Integer> object, you can use its add function to add another Integer to the list. The add function changes the list in-place, instead of returning a new list. An ArrayList is mutable.
Response to Edit:
For your Groovy example, its designers probably sat down and noticed that more often one would want a new list that contains the reversed result, and keep the old list untouched. (Why? I don't know.) On the other hand, they may have noticed there are more cases where you do not want a new list containing the sorted result, so sort does its job in place. But I don't know and haven't used Groovy before, so this is just a guess.
In Ruby, I have heard there is a convention for this: functions that change objects in place have an exclamation mark after their name, and functions that return the result as a new object have none:
newObj = obj.sort   # new sorted list is returned
obj.sort!           # obj is sorted in place

To some extent this also has to do with programming style. Not changing the original object and creating new copies to reflect the changes is an idiom for safe programming. I believe Josh Bloch mentioned it in his book "Effective Java" (The first edition). Though I cannot remember the exact term he used for it.
In the case of String it returns a new object because String is immutable. However, across the Java API, you will see some places where the original object is changed and some places where a new object is returned. As someone pointed out earlier, it is because different people have worked on the API, and they bring their own programming styles.
On a slightly different note: keeping objects immutable adds safety to the code, and it also allows us to code in a certain way.
(new Date()).add(new Month(7)).add(new Day(4))
If every method on Date returns a new object instead of changing its own state, then we can write such code. This makes programs very readable.
However, keeping objects immutable may reduce the performance of the system if we have large objects.
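The Date, Month and Day calls above are hypothetical. For a runnable version of the same chaining style, here is a sketch that swaps in java.time.LocalDate (Java 8+), whose methods return new objects. The class name ImmutableChaining is an assumption.

```java
import java.time.LocalDate;

// Hypothetical demo using java.time.LocalDate in place of the Date API sketched above.
class ImmutableChaining {
    // Each call returns a new LocalDate, so the calls can be chained.
    static LocalDate shifted(LocalDate start) {
        return start.plusMonths(7).plusDays(4);
    }

    public static void main(String[] args) {
        LocalDate start = LocalDate.of(2020, 1, 1);
        System.out.println(shifted(start));  // 2020-08-05
        System.out.println(start);           // 2020-01-01 (unchanged)
    }
}
```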

What does the term "canonical form" or "canonical representation" in Java mean?

I have often heard this term being used, but I have never really understood it.
What does it mean, and can anyone give some examples/point me to some links?
EDIT: Thanks to everyone for the replies. Can you also tell me how the canonical representation is useful in equals() performance, as stated in Effective Java?

I believe there are two related uses of canonical: forms and instances.
A canonical form means that values of a particular type of resource can be described or represented in multiple ways, and one of those ways is chosen as the favored canonical form. (That form is canonized, like books that made it into the bible, and the other forms are not.) A classic example of a canonical form is paths in a hierarchical file system, where a single file can be referenced in a number of ways:
myFile.txt # in current working dir
../conf/myFile.txt # relative to the CWD
/apps/tomcat/conf/myFile.txt # absolute path using symbolic links
/u1/local/apps/tomcat-5.5.1/conf/myFile.txt # absolute path with no symlinks
The classic definition of the canonical representation of that file would be the last path. With local or relative paths you cannot globally identify the resource without contextual information. With absolute paths you can identify the resource, but cannot tell if two paths refer to the same entity. With two or more paths converted to their canonical forms, you can do all the above, plus determine if two resources are the same or not, if that is important to your application (solve the aliasing problem).
Note that the canonical form of a resource is not a quality of that particular form itself; there can be multiple possible canonical forms for a given type like file paths (say, lexicographically first of all possible absolute paths). One form is just selected as the canonical form for a particular application reason, or maybe arbitrarily so that everyone speaks the same language.
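In Java, one way to get such a canonical path is java.io.File.getCanonicalPath(). The sketch below builds its own temporary directory instead of using the Tomcat paths from the example; the class name CanonicalPaths and the file names are assumptions.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Hypothetical demo; names and paths are illustrative only.
class CanonicalPaths {
    // True if both paths resolve to the same canonical path.
    static boolean sameFile(File a, File b) throws IOException {
        return a.getCanonicalPath().equals(b.getCanonicalPath());
    }

    // Creates a temp file and reaches it through two different paths.
    static boolean demo() {
        try {
            File dir = Files.createTempDirectory("canon-demo").toFile();
            File conf = new File(dir, "conf");
            conf.mkdir();
            File direct = new File(dir, "myFile.txt");
            direct.createNewFile();
            File roundabout = new File(conf, "../myFile.txt");
            // The raw path strings differ, but the canonical forms agree.
            return !direct.getPath().equals(roundabout.getPath())
                    && sameFile(direct, roundabout);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());  // true
    }
}
```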
Forcing objects into their canonical instances is the same basic idea, but instead of determining one "best" representation of a resource, it arbitrarily chooses one instance of a class of instances with the same "content" as the canonical reference, then converts all references to equivalent objects to use the one canonical instance.
This can be used as a technique for optimizing both time and space. If there are multiple instances of equivalent objects in an application, then by forcing them all to be resolved as the single canonical instance of a particular value, you can eliminate all but one of each value, saving space and possibly time since you can now compare those values with reference identity (==) as opposed to object equivalence (equals() method).
A classic example of optimizing performance with canonical instances is collapsing strings with the same content. Calling String.intern() on two strings with the same character sequence is guaranteed to return the same canonical String object for that text. If you pass all your strings through that canonicalizer, you know equivalent strings are actually identical object references, i.e., aliases.
The enum types in Java 5.0+ force all instances of a particular enum value to use the same canonical instance within a VM, even if the value is serialized and deserialized. That is why you can use if (day == Days.SUNDAY) with impunity in Java if Days is an enum type. Doing this for your own classes is certainly possible, but takes care. Read Effective Java by Josh Bloch for details and advice.
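A short sketch of both cases. The class name CanonicalInstances is made up, and the Days enum is a stand-in for the one mentioned above.

```java
// Hypothetical demo; names are illustrative only.
class CanonicalInstances {
    enum Days { SATURDAY, SUNDAY }  // stand-in for the Days enum in the text

    // Two separately constructed Strings are equal but not identical.
    static boolean distinctBeforeIntern() {
        String a = new String("hello");
        String b = new String("hello");
        return a != b && a.equals(b);
    }

    // After intern(), both references point to the one canonical instance.
    static boolean identicalAfterIntern() {
        String a = new String("hello").intern();
        String b = new String("hello").intern();
        return a == b;
    }

    // Looking an enum constant up by name yields the same instance.
    static boolean enumIdentity() {
        return Days.valueOf("SUNDAY") == Days.SUNDAY;
    }

    public static void main(String[] args) {
        System.out.println(distinctBeforeIntern());  // true
        System.out.println(identicalAfterIntern());  // true
        System.out.println(enumIdentity());          // true
    }
}
```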

Wikipedia points to the term Canonicalization.
A process for converting data that has more than one possible representation into a "standard" canonical representation. This can be done to compare different representations for equivalence, to count the number of distinct data structures, to improve the efficiency of various algorithms by eliminating repeated calculations, or to make it possible to impose a meaningful sorting order.
The Unicode example made the most sense to me:
Variable-length encodings in the Unicode standard, in particular UTF-8, have more than one possible encoding for most common characters. This makes string validation more complicated, since every possible encoding of each string character must be considered. A software implementation which does not consider all character encodings runs the risk of accepting strings considered invalid in the application design, which could cause bugs or allow attacks. The solution is to allow a single encoding for each character. Canonicalization is then the process of translating every string character to its single allowed encoding. An alternative is for software to determine whether a string is canonicalized, and then reject it if it is not. In this case, in a client/server context, the canonicalization would be the responsibility of the client.
In summary, a standard form of representation for data. From this form you can then convert to any representation you may need.

A good example for understanding "canonical form/representation" is to look at the XML schema datatype definition of "boolean":
the "lexical representation" of boolean can be one of: {true, false, 1, 0} whereas
the "canonical representation" can only be one of {true, false}
This, in essence, means that
"true" and "1" get mapped to the canonical repr. "true" and
"false" and "0" get mapped to the canonical repr. "false"
see the w3 XML schema datatype definition for boolean
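The mapping above can be written as a tiny function. This is only a sketch of the idea, not an XML Schema library call; the class name XmlBooleanCanonicalizer and the method name canonical are made up.

```java
// Hypothetical helper; not part of any XML library.
class XmlBooleanCanonicalizer {
    // Maps each of the four lexical forms to one of the two canonical forms.
    static String canonical(String lexical) {
        switch (lexical) {
            case "true":
            case "1":
                return "true";
            case "false":
            case "0":
                return "false";
            default:
                throw new IllegalArgumentException("not a boolean lexical form: " + lexical);
        }
    }

    public static void main(String[] args) {
        System.out.println(canonical("1"));      // true
        System.out.println(canonical("false"));  // false
    }
}
```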

The word "canonical" is just a synonym for "standard" or "usual". It doesn't have any Java-specific meaning.

reduced to the simplest and most significant form without losing generality

An easy way to remember it is the way "canonical" is used in theological circles: canonical truth is the real truth, so if two people find it they have found the same truth. The same goes for a canonical instance. If you think you have found two of them (i.e. a.equals(b)), you really only have one (i.e. a == b). So equality implies identity in the case of canonical objects.
Now for the comparison. You have the choice of using a == b or a.equals(b), since they produce the same answer for canonical instances. But a == b is a comparison of references (the JVM can compare two references extremely rapidly, as they are just two bit patterns), whereas a.equals(b) is a method call and involves more overhead.

Another good example might be: you have a class that supports the use of cartesian (x, y, z), spherical (r, theta, phi) and cylindrical coordinates (r, phi, z). For purposes of establishing equality (equals method), you would probably want to convert all representations to one "canonical" representation of your choosing, e.g. spherical coordinates. (Or maybe you would want to do this in general - i.e. use one internal representation.) I am not an expert, but this did occur to me as maybe a good concrete example.

A canonical form means a naturally unique representation of the element

Canonical representation means viewing a character in different styles.
For example, if I write the letter A, another person may write the letter A in a different style :)
This is according to the field of optical character recognition.

The OP's questions about canonical form and how it can improve performance of the equals method can both be answered by extending the example provided in Effective Java.
Consider the following class:
import java.util.Objects;

public final class CaseInsensitiveString {
    private final String s;

    public CaseInsensitiveString(String s) {
        this.s = Objects.requireNonNull(s);
    }

    @Override
    public boolean equals(Object o) {
        // Case-insensitive comparison on every call
        return o instanceof CaseInsensitiveString && ((CaseInsensitiveString) o).s.equalsIgnoreCase(s);
    }
}
The equals method in this example has added cost by using String's equalsIgnoreCase method. As mentioned in the text
you may want to store a canonical form of the field so the equals method can do a cheap exact comparison on canonical forms rather than a more costly nonstandard comparison.
What does Joshua Bloch mean when he says canonical form? Well, I think Dónal's concise answer is very appropriate. We can store the underlying String field in the CaseInsensitiveString example in a standard way, perhaps the uppercase form of the String. Now, you can reference this canonical form of the CaseInsensitiveString, its uppercase variant, and perform cheap evaluations in your equals and hashCode methods.
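A sketch of that idea follows. It assumes the uppercase form, computed with Locale.ROOT, is an acceptable canonical form; for some characters this may not match equalsIgnoreCase exactly. The class name CanonicalCaseInsensitiveString is made up to keep it apart from the class above.

```java
import java.util.Locale;
import java.util.Objects;

// Hypothetical variant of the class above that stores a canonical (uppercase) form.
final class CanonicalCaseInsensitiveString {
    private final String canonical;

    CanonicalCaseInsensitiveString(String s) {
        // Canonicalize once, at construction time.
        this.canonical = Objects.requireNonNull(s).toUpperCase(Locale.ROOT);
    }

    @Override
    public boolean equals(Object o) {
        // Cheap exact comparison on the canonical forms.
        return o instanceof CanonicalCaseInsensitiveString
                && ((CanonicalCaseInsensitiveString) o).canonical.equals(canonical);
    }

    @Override
    public int hashCode() {
        return canonical.hashCode();
    }

    public static void main(String[] args) {
        CanonicalCaseInsensitiveString a = new CanonicalCaseInsensitiveString("Polish");
        CanonicalCaseInsensitiveString b = new CanonicalCaseInsensitiveString("polish");
        System.out.println(a.equals(b));                   // true
        System.out.println(a.hashCode() == b.hashCode());  // true
    }
}
```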

Canonical data in an RDBMS or in graph data:
Think of it as the "normalization" or "normal form" of data in an RDBMS. The same data appears in different tables, but it is represented by a unique identifier and mapped through that identifier in those tables.
or
Think of a single form of a piece of data in a graph database that is represented in many triples.
The major benefit is that it makes DML (data manipulation) more efficient, since you can upsert (insert/update) only one value instead of many.
