Why does String method regionMatches not delegate to overload method [closed]


The method boolean regionMatches(int toffset, String other, int ooffset, int len) of java.lang.String is implemented as
public boolean regionMatches(int toffset, String other, int ooffset,
int len) {
char ta[] = value;
int to = toffset;
char pa[] = other.value;
int po = ooffset;
// Note: toffset, ooffset, or len might be near -1>>>1.
if ((ooffset < 0) || (toffset < 0)
|| (toffset > (long)value.length - len)
|| (ooffset > (long)other.value.length - len)) {
return false;
}
while (len-- > 0) {
if (ta[to++] != pa[po++]) {
return false;
}
}
return true;
}
Since there is an overloaded method covering the same functionality, why is this method not implemented as simple delegation, like
public boolean regionMatches(int toffset, String other, int ooffset, int len) {
return regionMatches(false, toffset, other, ooffset, len);
}

First, this is an implementation-dependent choice, so you might encounter an alternative implementation that actually performs the delegation you suggest. That’s why it is important to specify which implementation you are referring to.
In case of Oracle’s JDK or OpenJDK, which seems to be the Java 8 implementation you’re referring to, the decision was most likely made for performance reasons. As you can see, the implementation of regionMatches with boolean ignoreCase parameter will re-check this parameter within the loop when two characters do not match.
It might have been the starting point for implementing both operations, but turned out to be a performance bottleneck for some cases. Usually, the decision to write a special implementation instead of handling an operation more generically, is made based on profiling widespread real life applications.
The specialized regionMatches implementation for the case-sensitive match consists of a very short, straightforward loop over the character arrays, which can have a dramatic impact on the efficiency of the HotSpot optimizer. E.g., it might compile this loop to native code that compares more than one character at a time.
Newer JDKs had to adapt the code because, since Java 9, a byte[] array is used instead of a char[] array and might contain ISO-8859-1 (Latin-1) or UTF-16 encoded data, so different scenarios have to be handled. The implementors took the opportunity to introduce delegation, though it is the other way round:
public boolean regionMatches(boolean ignoreCase, int toffset,
String other, int ooffset, int len) {
if (!ignoreCase) {
return regionMatches(toffset, other, ooffset, len);
}
// specialized case insensitive comparison follows
So now, you get the optimized case sensitive comparison whether you invoke regionMatches without the boolean parameter or with false. Further, the case insensitive match operation is also optimized in that the boolean parameter won’t be re-checked in a loop.
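As a quick illustration of the observable behavior (not of the JDK internals), the sketch below calls both overloads on the same input. The class and helper names are hypothetical and exist only for this example.

```java
// Illustrative sketch: RegionMatchesDemo and its helpers are made-up names.
class RegionMatchesDemo {
    // Case-sensitive overload without the boolean parameter.
    static boolean viaShortOverload(String s, int toffset, String other, int ooffset, int len) {
        return s.regionMatches(toffset, other, ooffset, len);
    }

    // Overload with ignoreCase = false; should agree with the method above.
    static boolean viaIgnoreCaseFalse(String s, int toffset, String other, int ooffset, int len) {
        return s.regionMatches(false, toffset, other, ooffset, len);
    }

    public static void main(String[] args) {
        String s = "Hello, World";
        System.out.println(viaShortOverload(s, 7, "WORLD", 0, 5));    // false
        System.out.println(viaIgnoreCaseFalse(s, 7, "WORLD", 0, 5));  // false
        System.out.println(s.regionMatches(true, 7, "WORLD", 0, 5));  // true
    }
}
```

Whichever overload does the delegating internally, callers should see the same result for the case-sensitive comparison.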

Related

java: read vs readNBytes of the InputStream instance

In java, InputStream class has methods read(byte[], int, int) and readNBytes(byte[], int, int). It seems that these two methods have exactly the same functionality, so I wonder what are the differences between them.

Edited for better visibility of discussion in the comments:
read() says it attempts to read "up to len bytes ... but a smaller number may be read. This method blocks until input data is available, end of file is detected, or an exception is thrown."
readNBytes() says "blocks until len bytes of input data have been read, end of stream is detected, or an exception is thrown."
Even though the JDK's implementation for InputStream is likely to give you identical results for both methods, the documented differences mean that other classes inheriting from it may behave differently.
E.g., given the stream '12345<end>', read(s,0,10) is allowed to return '123', whereas readNBytes() will keep going until it reaches the end of the stream and give you the whole thing.
Original answer:
You're right that the javadocs are very similar. When in doubt, always drop down to the source. Most IDEs make it easy to attach the OpenJDK source and let you drill down into it.
This is readNBytes from InputStream.java:
public int readNBytes(byte[] b, int off, int len) throws IOException {
Objects.requireNonNull(b);
if (off < 0 || len < 0 || len > b.length - off)
throw new IndexOutOfBoundsException();
int n = 0;
while (n < len) {
int count = read(b, off + n, len - n);
if (count < 0)
break;
n += count;
}
return n;
}
As you can see, it actually performs a call to read(byte[],int,int). The difference in this case is that if the number of bytes actually read is less than your specified len, it will call read() again until either len bytes have been read or read() signals the end of the stream.
Edit: Note that
This is OpenJDK's implementation of the base InputStream. Others may differ.
Subclasses of InputStream may also have their own overridden implementation. Consult the doc/source for the relevant class.
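The difference is easiest to see with a stream that deliberately returns short reads. The following is a minimal sketch, assuming Java 9 or later (for readNBytes); TrickleStream is a hypothetical subclass written for this example, not a JDK class.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ShortReadDemo {
    /** Hypothetical stream that hands out at most 3 bytes per read call. */
    static final class TrickleStream extends InputStream {
        private final byte[] data;
        private int pos = 0;

        TrickleStream(byte[] data) { this.data = data; }

        @Override public int read() {
            return pos < data.length ? (data[pos++] & 0xFF) : -1;
        }

        @Override public int read(byte[] b, int off, int len) {
            if (len == 0) return 0;
            if (pos >= data.length) return -1;          // end of stream
            int n = Math.min(3, Math.min(len, data.length - pos));
            System.arraycopy(data, pos, b, off, n);
            pos += n;
            return n;                                   // may be fewer than len
        }
    }

    // A single read call: allowed to return fewer bytes than requested.
    static int viaRead(byte[] data) {
        try (InputStream in = new TrickleStream(data)) {
            return in.read(new byte[10], 0, 10);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // readNBytes keeps calling read until len bytes or end of stream.
    static int viaReadNBytes(byte[] data) {
        try (InputStream in = new TrickleStream(data)) {
            return in.readNBytes(new byte[10], 0, 10);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "12345".getBytes(StandardCharsets.US_ASCII);
        System.out.println(viaRead(data));        // 3
        System.out.println(viaReadNBytes(data));  // 5
    }
}
```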

Chained ANDs or chained ORs best practice

Which is better in terms of best practice / efficiency?
if (x == 1
&& y == 1
&& z == 1)
{ do things }
or
if (x != 1 ||
y != 1 ||
z != 1)
{ don't do things and go to a different bit of logic.}
Is there any difference in efficiency when short circuiting ANDs and ORs? Is it (generally) better to check positively or negatively when multiple logical assertions need to be made?

For pure optimization of the code it depends on the case: prefer the form that will, on average, perform the fewest comparisons.
For code design it is also case-by-case. The if-cases should match what you are actually looking for. A function that tests if a string is inputted correctly for example. (the tests are made up)
public boolean isValidString (String s) {
if (s.isEmpty())
return false;
if (s.length() < 12)
return false;
if (s...)
return false;
return true;
}
In this case the most logical approach is the ||. It could be written as:
public boolean isValidString (String s) {
if (s.isEmpty() || s.length() < 12 || s...)
return false;
return true;
}
Using De Morgan's laws (http://en.wikipedia.org/wiki/De_Morgan%27s_laws), this could be rewritten as a chain of negated conditions joined with &&. However, that is not what we want to test, even though both forms yield the same result.
So stick to the logical approach in general cases.
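To make the equivalence concrete, here is a small sketch of both forms. It uses only the two made-up tests from the example (the elided third test is left out), and the class and method names are hypothetical.

```java
// Sketch only: the validity rules are the made-up ones from the example above.
class DeMorganDemo {
    // "Reject if any check fails" form, using ||.
    static boolean isValidOrForm(String s) {
        if (s.isEmpty() || s.length() < 12) {
            return false;
        }
        return true;
    }

    // De Morgan equivalent: !(a || b) == (!a && !b).
    static boolean isValidAndForm(String s) {
        return !s.isEmpty() && s.length() >= 12;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "short", "long enough string"}) {
            System.out.println(isValidOrForm(s) == isValidAndForm(s)); // true each time
        }
    }
}
```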

If you think about efficiency, then think about how often each case will occur. The condition most likely to decide the outcome (most likely false in an && chain, most likely true in an || chain) should be put first so the whole expression is short-circuited immediately.
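A rough way to see short-circuiting at work is to count how many operands actually get evaluated. The counter and helper names below are hypothetical and only serve the illustration.

```java
// Sketch: counts how many operands are evaluated before the result is known.
class ShortCircuitDemo {
    static int calls = 0;

    // Stand-in for an arbitrary check; records that it was evaluated.
    static boolean check(boolean result) {
        calls++;
        return result;
    }

    static int countAnd(boolean a, boolean b, boolean c) {
        calls = 0;
        boolean ignored = check(a) && check(b) && check(c);
        return calls;
    }

    static int countOr(boolean a, boolean b, boolean c) {
        calls = 0;
        boolean ignored = check(a) || check(b) || check(c);
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(countAnd(false, true, true));  // 1: stops at the first false
        System.out.println(countAnd(true, true, false));  // 3: the false operand comes last
        System.out.println(countOr(true, false, false));  // 1: stops at the first true
        System.out.println(countOr(false, false, true));  // 3: the true operand comes last
    }
}
```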

It is better to use "==" than "!=".
PMD recommends this as well.
The following is good and improves readability:
if (true) {
//
} else {
//
}
rather than
if (!true) {
//
} else {
//
}

Well, in some JVM implementations boolean values are stored as integers, with int value 1 meaning true and int value 0 meaning false. Also, comparison logic at the processor level is architecture dependent. Some machines might subtract the two operands, then add and then compare; others might compare byte by byte, etc. So, unless you are looking at a specific hardware architecture (which you shouldn't, at least for the Java programming language), I don't think this matters much.

JVM String methods implementation

The String class has some methods whose implementation I cannot understand... replace is one of them.
public String replace(CharSequence target, CharSequence replacement) {
return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}
Are there some significant advantages over a simpler and more efficient (fast!) method?
public static String replace(String string, String searchFor, String replaceWith) {
StringBuilder result=new StringBuilder();
int index=0;
int beginIndex=0;
while((index=string.indexOf(searchFor, index))!=-1){
result.append(string.substring(beginIndex, index)+replaceWith);
index+=searchFor.length();
beginIndex=index;
}
result.append(string.substring(beginIndex, string.length()));
return result.toString();
}
Stats with Java 7:
1,000,000 iterations
replace "b" with "x" in "a.b.c"
result: "a.x.c"
Times:
string.replace: 485ms
string.replaceAll: 490ms
optimized replace: 180ms
Code like the Java 7 split method is heavily optimized to avoid pattern compile / regex processing when possible:
public String[] split(String regex, int limit) {
/* fastpath if the regex is a
(1)one-char String and this character is not one of the
RegEx's meta characters ".$|()[{^?*+\\", or
(2)two-char String and the first char is the backslash and
the second is not the ascii digit or ascii letter.
*/
char ch = 0;
if (((regex.value.length == 1 &&
".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
(regex.length() == 2 &&
regex.charAt(0) == '\\' &&
(((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
((ch-'a')|('z'-ch)) < 0 &&
((ch-'A')|('Z'-ch)) < 0)) &&
(ch < Character.MIN_HIGH_SURROGATE ||
ch > Character.MAX_LOW_SURROGATE))
{
int off = 0;
int next = 0;
boolean limited = limit > 0;
ArrayList<String> list = new ArrayList<>();
while ((next = indexOf(ch, off)) != -1) {
if (!limited || list.size() < limit - 1) {
list.add(substring(off, next));
off = next + 1;
} else { // last one
//assert (list.size() == limit - 1);
list.add(substring(off, value.length));
off = value.length;
break;
}
}
// If no match was found, return this
if (off == 0)
return new String[]{this};
// Add remaining segment
if (!limited || list.size() < limit)
list.add(substring(off, value.length));
// Construct result
int resultSize = list.size();
if (limit == 0)
while (resultSize > 0 && list.get(resultSize - 1).length() == 0)
resultSize--;
String[] result = new String[resultSize];
return list.subList(0, resultSize).toArray(result);
}
return Pattern.compile(regex).split(this, limit);
}
Following the logic of the replace method:
public String replaceAll(String regex, String replacement) {
return Pattern.compile(regex).matcher(this).replaceAll(replacement);
}
The split implementation should be:
public String[] split(String regex, int limit) {
return Pattern.compile(regex).split(this, limit);
}
The performance losses are not far from the ones found on the replace methods. For some reason Oracle gives a fastpath approach on some methods and not others.

Are you sure your proposed method is indeed faster than the regex-based one used by the String class - not just for your own test input, but for every possible input that a program might throw at it? It relies on String.indexOf to do substring matching, which is itself a naive implementation that is subject to bad worst-case performance. It's entirely possible that Pattern implements a more sophisticated matching algorithm such as KMP to avoid redundant comparisons.
In general, the Java team takes performance of the core libraries very seriously, and maintains lots of internal benchmarks using a wide range of real-world data. I've never encountered a situation where regex processing was a bottleneck. My standing advice is to start by writing the simplest possible code that works correctly, and don't even begin to think about rewriting the Java built-ins until profiling proves that it's a bottleneck, and you've exhausted all other avenues of optimization.
Regarding your latest edit - first, I would not describe the split method as heavily optimized. It handles one special case that happens to be extremely common and is guaranteed not to suffer from the poor worst-case complexity described above for the naive string matching algorithm - that of splitting on a single-character, literal token.
It may very well be that the same special case could be optimized for replace, and would provide some measurable improvement. But look what it took to achieve that simple optimization - about 50 lines of code. Those lines of code come at a cost, especially when they're a part of what's probably the most widely-used class in the Java library. Cost comes in many forms:
Resources - That's 50 lines of code that some developer must spend time writing, testing, documenting, and maintaining for the lifetime of the Java language.
Risk - That's 50 opportunities for subtle bugs that slip past the initial testing.
Complexity - That's 50 extra lines of code that any developer who wants to understand how the method works must now take time to read and understand.
Your question now boils down to "why was this one method optimized to handle a special case, but not the other one?" or even more generally "why was this particular feature not implemented?" Nobody but the original author can answer that definitively, but the answer is almost always that either there is not sufficient demand for that feature, or that the benefit derived from having the feature is deemed not worth the cost of adding it.
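For the specific single-character case measured in the question, String already offers the replace(char, char) overload, which does not go through Pattern. The sketch below only shows that both overloads give the same result for that input; it makes no timing claims, and the class name is hypothetical.

```java
// Sketch: compares the results (not the speed) of the two replace overloads.
class ReplaceOverloadDemo {
    static String viaCharSequence(String s) {
        return s.replace("b", "x");   // CharSequence overload
    }

    static String viaChar(String s) {
        return s.replace('b', 'x');   // char overload, no regex involved
    }

    public static void main(String[] args) {
        System.out.println(viaCharSequence("a.b.c")); // a.x.c
        System.out.println(viaChar("a.b.c"));         // a.x.c
    }
}
```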

Difference between String.isEmpty() and String.equals("")

I created a "Color Chooser" with three textboxes where the user defines rgb values.
To check if values entered are correct (only numbers between 0-255) I'm using the following:
public Color getColor() {
if (tfRed.getText().equals("") || tfGreen.getText().equals("") || tfBlue.getText().equals("")) {
return new Color(0, 0, 0, 0);
} else {
if (tfRed.getText().matches("\\d+") && tfGreen.getText().matches("\\d+") && tfBlue.getText().matches("\\d+")) {
// ...
} else {
return new Color(0, 0, 0, 0);
}
}
}
What I'm asking: is it better to use String.isEmpty()? I never found a satisfying answer and I've always wondered if there is any difference.

I think isEmpty() is a bit more efficient. However a smart compiler may optimize the equals("") call anyway. From the OpenJDK source:
public boolean isEmpty() {
return count == 0;
}
public boolean equals(Object anObject) {
if (this == anObject) {
return true;
}
if (anObject instanceof String) {
String anotherString = (String)anObject;
int n = count;
if (n == anotherString.count) {
char v1[] = value;
char v2[] = anotherString.value;
int i = offset;
int j = anotherString.offset;
while (n-- != 0) {
if (v1[i++] != v2[j++])
return false;
}
return true;
}
}
return false;
}
Also the answer here on whether to use str.isEmpty() or "".equals(str) is spot on:
The main benefit of "".equals(s) is you don't need the null check (equals will check its argument and return false if it's null), which you seem to not care about. If you're not worried about s being null (or are otherwise checking for it), I would definitely use s.isEmpty(); it shows exactly what you're checking, you care whether or not s is empty, not whether it equals the empty string

Yes, use String.isEmpty(). It is semantically cleaner (performance is also slightly better, but that would be unnoticeable). If the instance can be null, use commons-lang StringUtils.isEmpty(string).

Since isEmpty() checks if the length of the String is 0 and "" is the only String with length 0, each String for which isEmpty() returns true would also return true to .equals(""). So technically, they do the same thing.
There might be a minimal difference in performance, but I wouldn't bother about that at all (I'd be very surprised if it were noticeable in production code).
Another difference is if you wrote "".equals(someString), then it would be "null-safe". In other words: if someString was null, this construct would simply evaluate to false and not throw a NullPointerException. If, however, you have someString.equals("") then this would not apply.
The most important difference is how it's read: isEmpty() makes the intention very clear: you want to treat empty strings differently. .equals("") is ever so slightly less clear ("if that string equals that other string, that happens to be empty").
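The null-handling difference can be sketched as follows. The class and method names are hypothetical; the behavior shown is that of the standard String methods.

```java
// Sketch of how the two styles behave when the string may be null.
class EmptyCheckDemo {
    // Literal first: equals(null) simply returns false.
    static boolean literalFirst(String s) {
        return "".equals(s);
    }

    // Calling a method on a null reference throws NullPointerException.
    static boolean viaIsEmpty(String s) {
        return s.isEmpty();
    }

    // Reports whether viaIsEmpty(null) throws, without crashing the caller.
    static boolean isEmptyThrowsOnNull() {
        try {
            viaIsEmpty(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(literalFirst(null));     // false
        System.out.println(literalFirst(""));       // true
        System.out.println(viaIsEmpty(""));         // true
        System.out.println(isEmptyThrowsOnNull());  // true
    }
}
```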

Typically, I like to use equals but in reverse, ie:
"".equals(someString);
Null-safe :)
But yeah, isEmpty() is a simpler operation but not so much that I see it making any significant performance contribution (unless you are writing embedded real-time stuff).

With myString.equals(""), no new String object is created on each call: the literal "" refers to a single interned constant, so it is not equivalent to myString.equals(new String("")).
Still, isEmpty() should be the better option because it only checks the length (although equals("") is very popular).

In theory, it is. For isEmpty(), only the internal metadata of the string has to be looked at (e.g., its length). For the comparison, you would expect slightly more case distinctions to be evaluated.
In practice, it does not matter. You would not observe the speed difference.
Rule of thumb: Use the method that is best understood / most readable by the programmer. If it is a test for an empty string, I think isEmpty() fits that purpose best.

isEmpty() is faster because it only compares the length integer field in String class to 0 while comparing to an empty string will at best compare references (similar speed) and at worst - run a loop with 0 iterations.
But the biggest difference is readability - isEmpty() is shorter and easier to grasp. BTW I wish there was an isBlank() shorthand for .trim().isEmpty()...

One more reason for using myString.equals("") or myString.length() == 0 is that the String#isEmpty() method was only introduced in Java 1.6.
So a reason not to use String#isEmpty() can be compatibility with earlier versions of Java.

It's partly a matter of history and legacy uses. isEmpty() was only added in JDK 6:
/**
* Returns <tt>true</tt> if, and only if, {@link #length()} is <tt>0</tt>.
*
* @return <tt>true</tt> if {@link #length()} is <tt>0</tt>, otherwise
* <tt>false</tt>
*
* @since 1.6
*/
public boolean isEmpty() {
Before that, everyone compared with "" to see if a String was empty or not. Old habits die hard, so loads of people keep using the "" comparison.
Of course, as mentioned by someone else already, if you use "".equals(someString) then it's automatically null safe. Many people combine the idea of isEmpty with null safeness, by making a static isEmpty method.
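A static helper of the kind described might look like the sketch below. The class and method names are hypothetical; libraries such as commons-lang provide their own equivalents.

```java
// Sketch of a null-safe emptiness check as a static helper.
class NullSafeStrings {
    // Treats null the same as the empty string.
    static boolean isNullOrEmpty(String s) {
        return s == null || s.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isNullOrEmpty(null));  // true
        System.out.println(isNullOrEmpty(""));    // true
        System.out.println(isNullOrEmpty("255")); // false
    }
}
```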

isEmpty was only introduced in 1.6. Check Since tag in javadoc.
Therefore, if you are compiling for 1.5 and lower equals("") is your only choice.
However, if version compatibility is not of your concern, I would use isEmpty. As Bozho pointed out it is semantically cleaner ( and a bit faster ).

I had always used .isEmpty()... until today, when I discovered that it does not exist in Java 5.
So :
In Java 6 and newer, we have the choice, and I recommend using .isEmpty(), it is easier to write and clearer to read.
In Java 5 and older we have to use .equals("").

String.equals("") is a bit slower than just an isEmpty() call. Strings store a count variable initialized in the constructor, since Strings are immutable.
isEmpty() compares the count variable to 0, while equals will check the type, string length, and then iterate over the string for comparison if the sizes match.
So to answer your question, isEmpty() will actually do a lot less! and that's a good thing.

Recursive method to determine if a string is a hex number - Java

This is a homework question that I am having a bit of trouble with.
Write a recursive method that determines if a String is a hex number.
Write javadocs for your method.
A String is a hex number if each and every character is either
0 or 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9
or a or A or b or B or c or C or d or D or e or E or f or F.
At the moment all I can see to test this is if the character at 0 of the string is one of these values he gave me then that part of it is a hex.
Any hints or suggestions to help me out?
This is what I have so far:
public boolean isHex(String string){
if (string.charAt(0)==//what goes here?){
//call isHex again on s.substring(1)
}else return false;
}

If you're looking for a good hex digit method:
boolean isHexDigit(char c) {
return Character.isDigit(c) || (Character.toUpperCase(c) >= 'A' && Character.toUpperCase(c) <= 'F');
}
Hints or suggestions, as requested:
All recursive methods call themselves with a different input (well, hopefully a different input!)
All recursive methods have a stop condition.
Your method signature should look something like this
boolean isHexString(String s) {
// base case here - an if condition
// logic and recursion - a return value
}
Also, don't forget that hex strings can start with "0x". This might be (more) difficult to do, so I would get the regular function working first. If you tackle it later, try to keep in mind that 0xABCD0x123 shouldn't pass. :-)
About substring: Straight from the String class source:
public String substring(int beginIndex, int endIndex) {
if (beginIndex < 0) {
throw new StringIndexOutOfBoundsException(beginIndex);
}
if (endIndex > count) {
throw new StringIndexOutOfBoundsException(endIndex);
}
if (beginIndex > endIndex) {
throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
}
return ((beginIndex == 0) && (endIndex == count)) ? this :
new String(offset + beginIndex, endIndex - beginIndex, value);
}
offset is a member variable of type int
value is a member variable of type char[]
and the constructor it calls is
String(int offset, int count, char value[]) {
this.value = value;
this.offset = offset;
this.count = count;
}
It's clearly an O(1) method, calling an O(1) constructor. It can do this because String is immutable. You can't change the value of a String, only create a new one. (Let's leave out things like reflection and sun.misc.Unsafe since they are extraneous solutions!) Since it can't be changed, you also don't have to worry about some other Thread messing with it, so it's perfectly fine to pass around like the village bicycle. Note that this is the older implementation; since Java 7 update 6, substring copies the characters into a new array, so it is O(n) in the length of the substring.

Since this is homework, I only give some hints instead of code:
Write a method that always tests the first character of a String if it fulfills the requirements. If not, return false, if yes, call the method again with the same String, but the first character missing. If it is only 1 character left and it is also a hex character then return true.
Pseudocode:
public boolean isHex(String testString) {
If String has 0 characters -> return true;
Else
If first character is a hex character -> call isHex with the remaining characters
Else if the first character is not a hex character -> return false;
}

When solving problems recursively, you generally want to solve a small part (the 'base case'), and then recurse on the rest.
You've figured out the base case - checking if a single character is hex or not.
Now you need to 'recurse on the rest'.
Here's some pseudocode (Python-ish) for reversing a string - hopefully you will see how similar methods can be applied to your problem (indeed, all recursive problems)
def ReverseString(str):
# base case (simple)
if len(str) <= 1:
return str
# recurse on the rest...
return last_char(str) + ReverseString(all_but_last_char(str))

Sounds like you should recursively iterate the characters in string and return the boolean AND of whether or not the current character is in [0-9A-Fa-f] with the recursive call...
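One possible shape of that idea, as a minimal sketch: the class name is hypothetical, and treating the empty string as the base case that returns true is an assumption (it matches the pseudocode earlier in this thread, but the assignment may want empty input rejected).

```java
// Sketch of the recursive "AND the first character with the rest" approach.
class HexCheck {
    // True for ASCII 0-9, a-f and A-F only.
    static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'a' && c <= 'f')
            || (c >= 'A' && c <= 'F');
    }

    /**
     * Determines recursively whether every character of s is a hex digit.
     * Assumption: the empty string is the base case and counts as valid.
     */
    static boolean isHex(String s) {
        if (s.isEmpty()) {
            return true;                                   // base case
        }
        // Check the first character, then recurse on the remainder.
        return isHexDigit(s.charAt(0)) && isHex(s.substring(1));
    }

    public static void main(String[] args) {
        System.out.println(isHex("DEADbeef01")); // true
        System.out.println(isHex("12G4"));       // false
    }
}
```

Because && short-circuits, the recursion stops at the first character that is not a hex digit.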

You have already received lots of useful answers. In case you want to train your recursive skills (and Java skills in general) a bit more I can recommend you to visit Coding Bat. You will find a lot of exercises together with automated tests.
