Apache StringUtils vs Java implementation of replace() - java

What would be the difference between Java 1.4.2's implementation of replace, and Apache 2.3's implementation? Is there a performance gain one over another?
Java 1.4.2 replace
Apache 2.3 replace

The String.replace() method you linked to takes two char values, so it only ever replaces one character with another (possibly multiple times, though).
The StringUtils.replace() method on the other hand takes String values as the search string and replacement, so it can replace longer substrings.
The comparable method in Java would be replaceAll(). replaceAll() is likely to be slower than the StringUtils method, because it supports regular expressions and thus introduces the overhead of compiling the search string first and running a regex search.
Note that Java 5 introduced String.replace(CharSequence, CharSequence) which does the same thing as StringUtils.replace(String,String) (except that it throws a NullPointerException if any of its arguments are null). Note that CharSequence is an interface implemented by String, so you can use plain old String objects here.
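The practical difference between the literal and the regex-based methods shows up as soon as the search string contains a regex metacharacter. A minimal sketch using only JDK methods (the class and helper names are made up for illustration):

```java
class ReplaceVsReplaceAll {
    // Literal replacement: the search string is taken as plain text.
    static String literal(String s) {
        return s.replace(".", "-");
    }

    // Regex replacement: an unescaped "." matches any character.
    static String regex(String s) {
        return s.replaceAll(".", "-");
    }

    // Regex replacement with the dot escaped so it is matched literally.
    static String regexEscaped(String s) {
        return s.replaceAll("\\.", "-");
    }

    public static void main(String[] args) {
        String s = "a.b.c";
        System.out.println(literal(s));      // a-b-c
        System.out.println(regex(s));        // -----
        System.out.println(regexEscaped(s)); // a-b-c
    }
}
```

So replace(CharSequence, CharSequence) is the safer default when the search text is not meant to be a pattern.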

public class Compare {
public static void main(String[] args) {
StringUtils.isAlphanumeric(""); // Overhead of static class initialization for StringUtils
String key = "0 abcdefghijklmno" + Character.toString('\n') + Character.toString('\r');
String key1 = replace1(key);
String key2 = replace2(key);
}
private static String replace1(String key) {
long start = System.nanoTime();
key = StringUtils.replaceChars(key, ' ', '_');
key = StringUtils.replaceChars(key, '\n', '_');
key = StringUtils.replaceChars(key, '\r', '_');
long end = System.nanoTime() - start;
System.out.println("Time taken : " + end);
return key;
}
public static String replace2(String word) {
long start = System.nanoTime();
char[] charArr = word.toCharArray();
int length = charArr.length;
for (int i = 0; i < length; i++) {
if (charArr[i] == ' ' || charArr[i] == '\n' || charArr[i] == '\r') {
charArr[i] = '_';
}
}
String temp = new String(charArr);
long end = System.nanoTime() - start;
System.out.println("Time taken : " + end);
return temp;
}
}
Time taken : 6400
Time taken : 5888
Times are almost the same!
I've edited the code to drop out overheads of replace2 which were not because of JDK implementation.

The 1.4.2 replace operates only on char arguments, whereas the Apache 2.3 one takes strings.

String.replace(char, char) can't replace whole strings
you can have null values with StringUtils.replace(..).
String.replace(CharSequence s1, CharSequence s2) will do the same thing if the first string is not-null. Otherwise it will throw a NullPointerException
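The null behaviour described above can be checked with the JDK alone. The sketch below is not Apache's implementation; replaceOrSame is a hypothetical helper that only mimics the null tolerance mentioned in this answer by returning the input unchanged.

```java
class NullSafeReplace {
    // Hypothetical helper: returns the input unchanged when any argument is
    // null or the search string is empty, instead of throwing.
    static String replaceOrSame(String text, String search, String replacement) {
        if (text == null || search == null || replacement == null || search.isEmpty()) {
            return text;
        }
        return text.replace(search, replacement);
    }

    // Returns true if the JDK's String.replace rejects a null search string.
    static boolean jdkReplaceThrowsOnNull() {
        String search = null;
        try {
            "a b".replace(search, "_");
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(replaceOrSame("a b", " ", "_"));  // a_b
        System.out.println(replaceOrSame("a b", null, "_")); // a b
        System.out.println(jdkReplaceThrowsOnNull());        // true
    }
}
```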

Apache's is quite a bit faster, if I recall correctly. Recommended.

To replace substrings of a string with other strings using StringUtil.replace, I tried the following, and it works fine for replacing multiple values in a single string.
String info = "[$FIRSTNAME$]_[$LASTNAME$]_[$EMAIL$]_[$ADDRESS$]";
String replacedString = StringUtil.replace(info, new String[] { "[$FIRSTNAME$]","[$LASTNAME$]","[$EMAIL$]","[$ADDRESS$]" }, new String[] { "XYZ", "ABC" ,"abc@abc.com" , "ABCD"});
This will replace the String value of info with newly provided value...

Related

Efficient way to replace chars in a string (java)?

I'm writing a small JAVA program which:
takes a text as a String
takes 2 arrays of chars
What I'm trying to do will sound like "find and replace", but it is not quite the same, so I thought it was important to clarify.
Anyway I want to take this text, find if any char from the first array match a char in the text and if so, replace it with the matching char (according to index) from the second char array.
I'll explain with an example:
lets say my text (String) is: "java is awesome!";
i have 2 arrays (char[]): "absm" and "!#*$".
The wished result is to change 'a' to '!' , 'b' to '#' and so on..
meaning the resulting text will be:
"java is awesome!" changed to -> "j!v! i* !we*o$e!"
What is the most efficient way of doing this and why?
I thought about looping the text, but then i found it not so efficient.
(StringBuilder/String class can be used)
StringBuilder sb = new StringBuilder(text);
for(int i = 0; i<text.length(); i ++)
{
for (int j = 0; j < firstCharArray.length;j++)
{
if (sb.charAt(i) == firstCharArray[j])
{
sb.setCharAt(i, secondCharArray[j]);
break;
}
}
}
This way is efficient because it uses a StringBuilder to change the characters in place (if you used Strings you would have to create new ones each time because they are immutable.) Also it minimizes the amount of passes you have to do (1 pass through the text string and n passes through the first array where n = text.length())
I guess you are looking for StringUtils.replaceEach, at least as a reference.
How efficient do you need it to be? Are you doing this for hundreds, thousands, millions of words???
I don't know if it's the most efficient, but you could use the String indexOf() method on each of your possible tokens. It will tell you if the token is there, and then you can replace the character at that index with the corresponding char from the other array.
Codewise, something like (this is half pseudo code by the way):
for(each of first array) {
int temp = YourString.indexOf(current array field);
if (temp >=0) {
replace with other array
}
}
Put the 2 arrays you have in a Map
Map<Character, Character> //or Map of Strings
where the key is "a", "b" etc... and the value is the character you want to substitute with - "#" etc....
Then simply replace the keys in your String with the values.
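A minimal sketch of the Map idea, assuming the two char arrays are parallel and of equal length. The class and method names are made up for illustration, and putIfAbsent (Java 8+) is used so that the first mapping wins if a character appears twice in the first array.

```java
import java.util.HashMap;
import java.util.Map;

class MapReplace {
    // Builds a lookup table from the two parallel arrays.
    static Map<Character, Character> buildTable(char[] from, char[] to) {
        Map<Character, Character> table = new HashMap<>();
        for (int i = 0; i < from.length; i++) {
            table.putIfAbsent(from[i], to[i]);
        }
        return table;
    }

    // Single pass over the text with one map lookup per character.
    static String replaceChars(String text, char[] from, char[] to) {
        Map<Character, Character> table = buildTable(from, to);
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            Character mapped = table.get(c);
            if (mapped != null) {
                sb.append(mapped.charValue());
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String result = replaceChars("java is awesome!",
                "absm".toCharArray(), "!#*$".toCharArray());
        System.out.println(result); // j!v! i* !we*o$e!
    }
}
```

For a handful of characters the boxing and hashing may well cost more than a short linear scan, so this is mainly worth it when the replacement table is large.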
For small stuff like this, an indexOf() search is likely to be faster than a map, while "avoiding" the inner loop of the accepted answer. Of course, the loop is still there, inside String.indexOf(), but it's likely to be optimized to a fare-thee-well by the JIT-compiler, because it's so heavily used.
static String replaceChars(String source, String from, String to)
{
StringBuilder dest = new StringBuilder(source);
for ( int i = 0; i < source.length(); i++ )
{
int foundAt = from.indexOf(source.charAt(i));
if ( foundAt >= 0 )
dest.setCharAt(i,to.charAt(foundAt));
}
return dest.toString();
}
Update: The Oracle/Sun JIT uses SIMD on at least some processors for indexOf(), making it even faster than one would guess.
Since the only way to know whether a character should be replaced is to check it, you (or any util method) have to loop through the whole text, one character after another. You can never achieve better complexity than O(n), where n is the number of characters in the text.
This utility class replaces a char or a group of chars in a String. It is equivalent to bash tr and perl tr///, aka transliterate.
/**
* Utility class that replaces chars of a String, aka, transliterate.
*
* It's equivalent to bash 'tr' and perl 'tr///'.
*
*/
public class ReplaceChars {
public static String replace(String string, String from, String to) {
return new String(replace(string.toCharArray(), from.toCharArray(), to.toCharArray()));
}
public static char[] replace(char[] chars, char[] from, char[] to) {
char[] output = chars.clone();
for (int i = 0; i < output.length; i++) {
for (int j = 0; j < from.length; j++) {
if (output[i] == from[j]) {
output[i] = to[j];
break;
}
}
}
return output;
}
/**
* For tests!
*/
public static void main(String[] args) {
// Example from: https://en.wikipedia.org/wiki/Caesar_cipher
String string = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG";
String from = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
String to = "XYZABCDEFGHIJKLMNOPQRSTUVW";
System.out.println();
System.out.println("Cesar cypher: " + string);
System.out.println("Result: " + ReplaceChars.replace(string, from, to));
}
}
This is the output:
Cesar cypher: THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG
Result: QEB NRFZH YOLTK CLU GRJMP LSBO QEB IXWV ALD

Remove all occurrences of char from string

I can use this:
String str = "TextX Xto modifyX";
str = str.replace('X','');//that does not work because there is no such character ''
Is there a way to remove all occurrences of character X from a String in Java?
I tried this and is not what I want: str.replace('X',' '); //replace with space
Try using the overload that takes CharSequence arguments (eg, String) rather than char:
str = str.replace("X", "");
Using
public String replaceAll(String regex, String replacement)
will work.
Usage would be str = str.replaceAll("X", "");.
Executing
"Xlakjsdf Xxx".replaceAll("X", "");
returns:
lakjsdf xx
If you want to do something with Java Strings, Commons Lang StringUtils is a great place to look.
StringUtils.remove("TextX Xto modifyX", 'X');
String test = "09-09-2012";
String arr [] = test.split("-");
String ans = "";
for(String t : arr)
ans+=t;
This is the example for where I have removed the character - from the String.
Hello Try this code below
public class RemoveCharacter {
public static void main(String[] args){
String str = "MXy nameX iXs farXazX";
char x = 'X';
System.out.println(removeChr(str,x));
}
public static String removeChr(String str, char x){
StringBuilder strBuilder = new StringBuilder();
char[] rmString = str.toCharArray();
for(int i=0; i<rmString.length; i++){
if(rmString[i] == x){
} else {
strBuilder.append(rmString[i]);
}
}
return strBuilder.toString();
}
}
I like using RegEx in this occasion:
str = str.replace(/X/g, '');
where g means global so it will go through your whole string and replace all X with '';
if you want to replace both X and x, you simply say:
str = str.replace(/X|x/g, '');
(see my fiddle here: fiddle)
Use replaceAll instead of replace
str = str.replaceAll("X", "");
This should give you the desired answer.
Evaluation of main answers with a performance benchmark which confirms concerns that the current chosen answer makes costly regex operations under the hood
To date the provided answers come in 3 main styles (ignoring the JavaScript answer ;) ):
Use String.replace(charsToDelete, ""); which uses regex under the hood in Java 8 and earlier
Use Lambda
Use simple Java implementation
In terms of code size clearly the String.replace is the most terse. The simple Java implementation is slightly smaller and cleaner (IMHO) than the Lambda (don't get me wrong - I use Lambdas often where they are appropriate)
Execution speed was, in order of fastest to slowest: simple Java implementation, Lambda and then String.replace() (that invokes regex).
By far the fastest implementation was the simple Java implementation, tuned so that it preallocates the StringBuilder buffer to the maximum possible result length and then simply appends to the buffer the chars that are not in the "chars to delete" string. This avoids any reallocations that would occur for Strings longer than 16 chars (the default capacity of a StringBuilder), and it avoids the "slide left" performance hit of deleting characters from a copy of the string that occurs in the Lambda implementation.
The code below runs a simple benchmark test, running each implementation 1,000,000 times and logs the elapsed time.
The exact results vary with each run but the order of performance never changes:
Start simple Java implementation
Time: 157 ms
Start Lambda implementation
Time: 253 ms
Start String.replace implementation
Time: 634 ms
The Lambda implementation (as copied from Kaplan's answer) may be slower because it performs a "shift left by one" of all characters to the right of the character being deleted. This would obviously get worse for longer strings with lots of characters requiring deletion. Also there might be some overhead in the Lambda implementation itself.
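The String.replace implementation uses regex and does a regex "compile" at each call (this describes Java 8 and earlier; newer JDKs implement String.replace(CharSequence, CharSequence) without regex). An optimization of this would be to use regex directly and cache the compiled pattern to avoid the cost of compiling it each time.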
package com.sample;
import java.util.function.BiFunction;
import java.util.stream.IntStream;
public class Main {
static public String deleteCharsSimple(String fromString, String charsToDelete)
{
StringBuilder buf = new StringBuilder(fromString.length()); // Preallocate to max possible result length
for(int i = 0; i < fromString.length(); i++)
if (charsToDelete.indexOf(fromString.charAt(i)) < 0)
buf.append(fromString.charAt(i)); // char not in chars to delete so add it
return buf.toString();
}
static public String deleteCharsLambda(String fromString1, String charsToDelete)
{
BiFunction<String, String, String> deleteChars = (fromString, chars) -> {
StringBuilder buf = new StringBuilder(fromString);
IntStream.range(0, buf.length()).forEach(i -> {
while (i < buf.length() && chars.indexOf(buf.charAt(i)) >= 0)
buf.deleteCharAt(i);
});
return (buf.toString());
};
return deleteChars.apply(fromString1, charsToDelete);
}
static public String deleteCharsReplace(String fromString, String charsToDelete)
{
return fromString.replace(charsToDelete, "");
}
public static void main(String[] args)
{
String str = "XXXTextX XXto modifyX";
String charsToDelete = "X"; // Should only be one char as per OP's requirement
long start, end;
System.out.println("Start simple");
start = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++)
deleteCharsSimple(str, charsToDelete);
end = System.currentTimeMillis();
System.out.println("Time: " + (end - start));
System.out.println("Start lambda");
start = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++)
deleteCharsLambda(str, charsToDelete);
end = System.currentTimeMillis();
System.out.println("Time: " + (end - start));
System.out.println("Start replace");
start = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++)
deleteCharsReplace(str, charsToDelete);
end = System.currentTimeMillis();
System.out.println("Time: " + (end - start));
}
}
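The optimization suggested above, using regex directly with a cached compiled pattern, could look roughly like this. It is a sketch rather than part of the benchmark; the class and method names are made up, and Pattern.quote is used so the search text is treated literally.

```java
import java.util.regex.Pattern;

class CachedPatternDelete {
    // Compiled once and reused across calls; Pattern.quote makes "X" literal.
    private static final Pattern X = Pattern.compile(Pattern.quote("X"));

    static String deleteX(String s) {
        return X.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(deleteX("XXXTextX XXto modifyX")); // Text to modify
    }
}
```

Whether this beats the simple loop would have to be measured; it only removes the per-call compile cost, not the cost of running the matcher.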
You will need to put the characters needs to be removed inside the square brackets during the time of replacement. The example code will be as following:
String s = "$116.42".replaceAll("[$]", "");
here is a lambda function which removes all characters passed as string
BiFunction<String,String,String> deleteChars = (fromString, chars) -> {
StringBuilder buf = new StringBuilder( fromString );
IntStream.range( 0, buf.length() ).forEach( i -> {
while( i < buf.length() && chars.indexOf( buf.charAt( i ) ) >= 0 )
buf.deleteCharAt( i );
} );
return( buf.toString() );
};
String str = "TextX XYto modifyZ";
deleteChars.apply( str, "XYZ" ); // –> "Text to modify"
This solution takes into account that the resulting String, unlike with replace(), never becomes larger than the starting String when removing characters. So it avoids the repeated allocating and copying while appending character-wise to the StringBuilder as replace() does.
Not to mention the pointless generation of Pattern and Matcher instances in replace() that are never needed for removal.
Unlike replace(), this solution can delete several characters in one swoop.
…another lambda
copying a new string from the original, but leaving out the character that is to be deleted
String text = "removing a special character from a string";
int delete = 'e';
int[] arr = text.codePoints().filter( c -> c != delete ).toArray();
String rslt = new String( arr, 0, arr.length );
gives: rmoving a spcial charactr from a string
package com.acn.demo.action;
public class RemoveCharFromString {
static String input = "";
public static void main(String[] args) {
input = "abadbbeb34erterb";
char token = 'b';
removeChar(token);
}
private static void removeChar(char token) {
// TODO Auto-generated method stub
System.out.println(input);
for (int i=0;i<input.length();i++) {
if (input.charAt(i) == token) {
input = input.replace(input.charAt(i), ' ');
System.out.println("MATCH FOUND");
}
input = input.replaceAll(" ", "");
System.out.println(input);
}
}
}
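You can use str = str.replace("X", ""); as mentioned before and you will be fine. For your information, '' is not an empty (or even a valid) character literal. '\0' is a valid literal, but it denotes the NUL character rather than "no character".
So str = str.replace('X', '\0'); compiles, but it substitutes a NUL for every X instead of removing it, and the string keeps its original length.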

Java - removing first character of a string

In Java, I have a String:
Jamaica
I would like to remove the first character of the string and then return amaica
How would I do this?
String str = "Jamaica".substring(1);
System.out.println(str);
Use the substring() function with an argument of 1 to get the substring from position 1 (after the first character) to the end of the string (leaving the second argument out defaults to the full length of the string).
public String removeFirstChar(String s){
return s.substring(1);
}
In Java, remove leading character only if it is a certain character
Use the Java ternary operator to check whether the character is there before removing it. This strips the leading character only if it is present; if passed an empty string, it returns the empty string (shown as blankstring below).
String header = "";
header = header.startsWith("#") ? header.substring(1) : header;
System.out.println(header);
header = "foobar";
header = header.startsWith("#") ? header.substring(1) : header;
System.out.println(header);
header = "#moobar";
header = header.startsWith("#") ? header.substring(1) : header;
System.out.println(header);
Prints:
blankstring
foobar
moobar
Java, remove all the instances of a character anywhere in a string:
String a = "Cool";
a = a.replace("o","");
//variable 'a' contains the string "Cl"
Java, remove the first instance of a character anywhere in a string:
String b = "Cool";
b = b.replaceFirst("o","");
//variable 'b' contains the string "Col"
Use substring() and give the number of characters that you want to trim from front.
String value = "Jamaica";
value = value.substring(1);
Answer: "amaica"
You can use the substring method of the String class that takes only the beginning index and returns the substring that begins with the character at the specified index and extending to the end of the string.
String str = "Jamaica";
str = str.substring(1);
substring() method returns a new String that contains a subsequence of characters currently contained in this sequence.
The substring begins at the specified start and extends to the character at index end - 1.
It has two forms. The first is
String substring(int FirstIndex)
Here, FirstIndex specifies the index at which the substring will
begin. This form returns a copy of the substring that begins at
FirstIndex and runs to the end of the invoking string.
String substring(int FirstIndex, int endIndex)
Here, FirstIndex specifies the beginning index, and endIndex specifies
the stopping point. The string returned contains all the characters
from the beginning index, up to, but not including, the ending index.
Example
String str = "Amiyo";
// prints substring from index 3
System.out.println("substring is = " + str.substring(3)); // Output 'yo'
you can do like this:
String str = "Jamaica";
str = str.substring(1, str.length());
return str;
or in general:
public String removeFirstChar(String str){
return str.substring(1, str.length());
}
public String removeFirst(String input)
{
return input.substring(1);
}
The key thing to understand in Java is that Strings are immutable: you can't change them. So it makes no sense to speak of 'removing a character from a string'. Instead, you make a NEW string with just the characters you want. The other posts in this question give you a variety of ways of doing that, but it's important to understand that these don't change the original string in any way. Any references you have to the old string will continue to refer to the old string (unless you change them to refer to a different string) and will not be affected by the newly created string.
This has a number of implications for performance. Each time you are 'modifying' a string, you are actually creating a new string with all the overhead implied (memory allocation and garbage collection). So if you want to make a series of modifications to a string and care only about the final result (the intermediate strings will be dead as soon as you 'modify' them), it may make more sense to use a StringBuilder or StringBuffer instead.
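Both points can be seen in a short sketch (names are made up for illustration): substring() leaves the original and any other reference to it untouched, while a StringBuilder lets several edits happen on one mutable buffer before a single final String is created.

```java
class ImmutabilityDemo {
    // Applies two edits to one mutable buffer and creates one result String.
    static String stripFirstAndLast(String s) {
        if (s.length() < 2) {
            return "";
        }
        StringBuilder sb = new StringBuilder(s);
        sb.deleteCharAt(0);               // drop the first character
        sb.deleteCharAt(sb.length() - 1); // drop the last character
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "Jamaica";
        String alias = original;                // second reference to the same String
        String shorter = original.substring(1); // a new String object

        System.out.println(original); // Jamaica (unchanged)
        System.out.println(alias);    // Jamaica (unchanged)
        System.out.println(shorter);  // amaica
        System.out.println(stripFirstAndLast(original)); // amaic
    }
}
```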
I came across a situation where I had to remove not only the first character (if it was a #), but the whole first run of such characters.
String myString = "###Hello World"; could be the starting point, but I would only want to keep the Hello World. This can be done as follows.
while (!myString.isEmpty() && myString.charAt(0) == '#') { // Remove all the # chars in front of the real string
myString = myString.substring(1, myString.length());
}
For OP's case, replace while with if and it works as well.
You can simply use substring().
String myString = "Jamaica";
String myStringWithoutJ = myString.substring(1);
The index passed to the method indicates where the resulting string starts. In this case it starts after the first position, because we don't want the "J" in "Jamaica".
Another solution, you can solve your problem using replaceAll with some regex ^.{1} (regex demo) for example :
String str = "Jamaica";
int nbr = 1;
str = str.replaceAll("^.{" + nbr + "}", "");//Output = amaica
My version of removing leading chars, one or multiple. For example, for String str1 = "01234", removing the leading '0' gives "1234". For String str2 = "000123" the result is "123", and for String str3 = "000" the result is the empty string "". Such functionality is often useful when converting numeric strings into numbers. The advantage of this solution over a regex (replaceAll(...)) is that it is much faster, which matters when processing a large number of Strings.
public static String removeLeadingChar(String str, char ch) {
int idx = 0;
while ((idx < str.length()) && (str.charAt(idx) == ch))
idx++;
return str.substring(idx);
}
##KOTLIN
#Its working fine.
tv.doOnTextChanged { text: CharSequence?, start, count, after ->
val length = text.toString().length
if (length==1 && text!!.startsWith(" ")) {
tv?.setText("")
}
}

Regex for specifying an empty string

I use a validator that requires a regex to be specified. In the case of validating against an empty string, I don't know how to generate such a regex. What regex can I use to match the empty string?
The regex ^$ matches only empty strings (i.e. strings of length 0). Here ^ and $ are the beginning and end of the string anchors, respectively.
If you need to check if a string contains only whitespaces, you can use ^\s*$. Note that \s is the shorthand for the whitespace character class.
Finally, in Java, matches attempts to match against the entire string, so you can omit the anchors should you choose to.
References
regular-expressions.info/Character classes and Anchors
API references
String.matches, Pattern.matches and Matcher.matches
Non-regex solution
You can also use String.isEmpty() to check if a string has length 0. If you want to see if a string contains only whitespace characters, then you can trim() it first and then check if it's isEmpty().
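The regex and non-regex checks described above can be put side by side in a small sketch. The class and method names are made up; the patterns are the ones from this answer, compiled once so they can be reused.

```java
import java.util.regex.Pattern;

class EmptyStringChecks {
    private static final Pattern EMPTY = Pattern.compile("^$");
    private static final Pattern BLANK = Pattern.compile("^\\s*$");

    // True only for a string of length 0.
    static boolean isEmptyByRegex(String s) {
        return EMPTY.matcher(s).matches();
    }

    // True for a string that is empty or contains only whitespace.
    static boolean isBlankByRegex(String s) {
        return BLANK.matcher(s).matches();
    }

    // Non-regex variant of the whitespace-only check.
    static boolean isBlankNoRegex(String s) {
        return s.trim().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isEmptyByRegex(""));     // true
        System.out.println(isEmptyByRegex(" "));    // false
        System.out.println(isBlankByRegex(" \t"));  // true
        System.out.println(isBlankNoRegex(" \t"));  // true
        System.out.println("".matches(""));         // true, anchors omitted
    }
}
```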
I don't know about Java specifically, but ^$ usually works (^ matches only at the start of the string, $ only at the end).
If you have to use regexp in Java for checking empty string you can simply use
testString.matches("")
please see examples:
String testString = "";
System.out.println(testString.matches(""));
or for checking if only white-spaces:
String testString = " ";
testString.trim().matches("");
but anyway using
testString.isEmpty();
testString.trim().isEmpty();
should be better from performance perspective.
public static void main(String[] args) {
String testString = "";
long startTime = System.currentTimeMillis();
for (int i =1; i <100000000; i++) {
// 50% of testStrings are empty.
if ((int)Math.round( Math.random()) == 0) {
testString = "";
} else {
testString = "abcd";
}
if (!testString.isEmpty()){
testString.matches("");
}
}
long endTime = System.currentTimeMillis();
System.out.println("Total testString.empty() execution time: " + (endTime-startTime) + "ms");
startTime = System.currentTimeMillis();
for (int i =1; i <100000000; i++) {
// 50% of testStrings are empty.
if ((int)Math.round( Math.random()) == 0) {
testString = "";
} else {
testString = "abcd";
}
testString.matches("");
}
endTime = System.currentTimeMillis();
System.out.println("Total testString.matches execution time: " + (endTime-startTime) + "ms");
}
Output:
C:\Java\jdk1.8.0_221\bin\java.exe
Total testString.empty() execution time: 11023ms
Total testString.matches execution time: 17831ms
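For checking an empty string, I guess there is no need for a regex at all.
You can check the length of the string directly.
In many cases the null check and the empty check are done together for extra safety,
like str != null && str.length() > 0 (the null check has to come first, otherwise length() throws a NullPointerException on a null reference).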

What is the easiest/best/most correct way to iterate through the characters of a string in Java?

Some ways to iterate through the characters of a string in Java are:
Using StringTokenizer?
Converting the String to a char[] and iterating over that.
What is the easiest/best/most correct way to iterate?
I use a for loop to iterate the string and use charAt() to get each character to examine it. Since the String is implemented with an array, the charAt() method is a constant time operation.
String s = "...stuff...";
for (int i = 0; i < s.length(); i++){
char c = s.charAt(i);
//Process char
}
That's what I would do. It seems the easiest to me.
As far as correctness goes, I don't believe that exists here. It is all based on your personal style.
Two options
for(int i = 0, n = s.length() ; i < n ; i++) {
char c = s.charAt(i);
}
or
for(char c : s.toCharArray()) {
// process c
}
The first is probably faster, the second is probably more readable.
Note that most of the other techniques described here break down if you're dealing with characters outside of the BMP (Unicode Basic Multilingual Plane), i.e. code points outside the U+0000 to U+FFFF range. This will only happen rarely, since the code points outside this are mostly assigned to dead languages. But there are some useful characters outside this, for example some code points used for mathematical notation, and some used to encode proper names in Chinese.
In that case your code will be:
String str = "....";
int offset = 0, strLen = str.length();
while (offset < strLen) {
int curChar = str.codePointAt(offset);
offset += Character.charCount(curChar);
// do something with curChar
}
The Character.charCount(int) method requires Java 5+.
Source: http://mindprod.com/jgloss/codepoint.html
In Java 8 we can solve it as:
String str = "xyz";
str.chars().forEachOrdered(i -> System.out.print((char)i));
str.codePoints().forEachOrdered(i -> System.out.print(Character.toChars(i)));
The method chars() returns an IntStream as mentioned in doc:
Returns a stream of int zero-extending the char values from this
sequence. Any char which maps to a surrogate code point is passed
through uninterpreted. If the sequence is mutated while the stream is
being read, the result is undefined.
The method codePoints() also returns an IntStream as per doc:
Returns a stream of code point values from this sequence. Any
surrogate pairs encountered in the sequence are combined as if by
Character.toCodePoint and the result is passed to the stream. Any
other code units, including ordinary BMP characters, unpaired
surrogates, and undefined code units, are zero-extended to int values
which are then passed to the stream.
How are char and code point different? As mentioned in this article:
Unicode 3.1 added supplementary characters, bringing the total number
of characters to more than the 2^16 = 65536 characters that can be
distinguished by a single 16-bit char. Therefore, a char value no
longer has a one-to-one mapping to the fundamental semantic unit in
Unicode. JDK 5 was updated to support the larger set of character
values. Instead of changing the definition of the char type, some of
the new supplementary characters are represented by a surrogate pair
of two char values. To reduce naming confusion, a code point will be
used to refer to the number that represents a particular Unicode
character, including supplementary ones.
Finally why forEachOrdered and not forEach ?
The behaviour of forEach is explicitly nondeterministic, whereas forEachOrdered performs an action for each element of this stream in the encounter order of the stream, if the stream has a defined encounter order. So forEach does not guarantee that the order is kept. Also check this question for more.
For difference between a character, a code point, a glyph and a grapheme check this question.
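The char versus code point distinction quoted above becomes visible with a string that contains one supplementary character. A small sketch (class and method names are made up), using U+1D538 MATHEMATICAL DOUBLE-STRUCK CAPITAL A written as its surrogate pair:

```java
class CharsVsCodePoints {
    // Number of 16-bit char values, same as length().
    static long charCount(String s) {
        return s.chars().count();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static long codePointCount(String s) {
        return s.codePoints().count();
    }

    // Rebuilds the string code point by code point, in encounter order.
    static String rebuild(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEachOrdered(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "a\uD835\uDD38b"; // 'a', U+1D538, 'b'
        System.out.println(charCount(s));          // 4
        System.out.println(codePointCount(s));     // 3
        System.out.println(rebuild(s).equals(s));  // true
    }
}
```

Casting each code point to char instead of using appendCodePoint would truncate the supplementary character.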
I agree that StringTokenizer is overkill here. Actually I tried out the suggestions above and took the time.
My test was fairly simple: create a StringBuilder with about a million characters, convert it to a String, and traverse each of them with charAt() / after converting to a char array / with a CharacterIterator a thousand times (of course making sure to do something on the string so the compiler can't optimize away the whole loop :-) ).
The result on my 2.6 GHz Powerbook (that's a mac :-) ) and JDK 1.5:
Test 1: charAt + String --> 3138msec
Test 2: String converted to array --> 9568msec
Test 3: StringBuilder charAt --> 3536msec
Test 4: CharacterIterator and String --> 12151msec
As the results are significantly different, the most straightforward way also seems to be the fastest one. Interestingly, charAt() of a StringBuilder seems to be slightly slower than the one of String.
BTW I suggest not to use CharacterIterator as I consider its abuse of the '\uFFFF' character as "end of iteration" a really awful hack. In big projects there's always two guys that use the same kind of hack for two different purposes and the code crashes really mysteriously.
Here's one of the tests:
int count = 1000;
...
System.out.println("Test 1: charAt + String");
long t = System.currentTimeMillis();
int sum=0;
for (int i=0; i<count; i++) {
int len = str.length();
for (int j=0; j<len; j++) {
if (str.charAt(j) == 'b')
sum = sum + 1;
}
}
t = System.currentTimeMillis()-t;
System.out.println("result: "+ sum + " after " + t + "msec");
There are some dedicated classes for this:
import java.text.*;
final CharacterIterator it = new StringCharacterIterator(s);
for(char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
// process c
...
}
If you have Guava on your classpath, the following is a pretty readable alternative. Guava even has a fairly sensible custom List implementation for this case, so this shouldn't be inefficient.
for(char c : Lists.charactersOf(yourString)) {
// Do whatever you want
}
UPDATE: As @Alex noted, with Java 8 there's also CharSequence#chars to use. Although the type is IntStream, it can be mapped to chars like:
yourString.chars()
.mapToObj(c -> Character.valueOf((char) c))
.forEach(c -> System.out.println(c)); // Or whatever you want
If you need to iterate through the code points of a String (see this answer) a shorter / more readable way is to use the CharSequence#codePoints method added in Java 8:
for(int c : string.codePoints().toArray()){
...
}
or using the stream directly instead of a for loop:
string.codePoints().forEach(c -> ...);
There is also CharSequence#chars if you want a stream of the characters (although it is an IntStream, since there is no CharStream).
If you need performance, then you must test on your environment. No other way.
Here is example code:
int tmp = 0;
String s = new String(new byte[64*1024]);
{
long st = System.nanoTime();
for(int i = 0, n = s.length(); i < n; i++) {
tmp += s.charAt(i);
}
st = System.nanoTime() - st;
System.out.println("1 " + st);
}
{
long st = System.nanoTime();
char[] ch = s.toCharArray();
for(int i = 0, n = ch.length; i < n; i++) {
tmp += ch[i];
}
st = System.nanoTime() - st;
System.out.println("2 " + st);
}
{
long st = System.nanoTime();
for(char c : s.toCharArray()) {
tmp += c;
}
st = System.nanoTime() - st;
System.out.println("3 " + st);
}
System.out.println("" + tmp);
On Java online I get:
1 10349420
2 526130
3 484200
0
On Android x86 API 17 I get:
1 9122107
2 13486911
3 12700778
0
I wouldn't use StringTokenizer, as it is one of the classes in the JDK that are legacy.
The javadoc says:
"StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead."
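As a rough illustration of that recommendation, the sketch below tokenizes the same comma-separated input both ways. The class and method names are invented for the example. One behavioral difference is worth knowing when migrating: StringTokenizer skips empty tokens, while split keeps them (except trailing ones).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class TokenizeDemo {
    // Legacy approach: StringTokenizer silently skips empty tokens.
    static List<String> withTokenizer(String s) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, ",");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // Recommended approach: String#split keeps empty tokens between delimiters.
    static List<String> withSplit(String s) {
        return Arrays.asList(s.split(","));
    }

    public static void main(String[] args) {
        String input = "a,b,,c";
        System.out.println(withTokenizer(input)); // [a, b, c]
        System.out.println(withSplit(input));     // [a, b, , c]
    }
}
```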
public class Main {
public static void main(String[] args) {
String myStr = "Hello";
String myStr2 = "World";
for (int i = 0; i < myStr.length(); i++) {
char result = myStr.charAt(i);
System.out.println(result);
}
for (int i = 0; i < myStr2.length(); i++) {
char result = myStr2.charAt(i);
System.out.print(result);
}
}
}
Output:
H
e
l
l
o
World
See The Java Tutorials: Strings.
public class StringDemo {
public static void main(String[] args) {
String palindrome = "Dot saw I was Tod";
int len = palindrome.length();
char[] tempCharArray = new char[len];
char[] charArray = new char[len];
// put original string in an array of chars
for (int i = 0; i < len; i++) {
tempCharArray[i] = palindrome.charAt(i);
}
// reverse array of chars
for (int j = 0; j < len; j++) {
charArray[j] = tempCharArray[len - 1 - j];
}
String reversePalindrome = new String(charArray);
System.out.println(reversePalindrome);
}
}
Put the length into an int len and use a for loop.
StringTokenizer is totally unsuited to the task of breaking a string into its individual characters. With String#split() you can do that easily by using a regex that matches the empty string, e.g.:
String[] theChars = str.split("|");
But StringTokenizer doesn't use regexes, and there's no delimiter string you can specify that will match the nothing between characters. There is one cute little hack you can use to accomplish the same thing: use the string itself as the delimiter string (making every character in it a delimiter) and have it return the delimiters:
StringTokenizer st = new StringTokenizer(str, str, true);
However, I only mention these options for the purpose of dismissing them. Both techniques break the original string into one-character strings instead of char primitives, and both involve a great deal of overhead in the form of object creation and string manipulation. Compare that to calling charAt() in a for loop, which incurs virtually no overhead.
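For comparison, the three techniques discussed above can be put side by side as follows. This is a sketch with made-up class and method names; the split result shown assumes Java 8 or later, where a zero-width match at the start of the input no longer produces a leading empty string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class SplitIntoCharsDemo {
    // Regex that matches the empty string between characters.
    static String[] viaSplit(String s) {
        return s.split("|");
    }

    // The hack: every character is a delimiter, and delimiters are returned as tokens.
    static List<String> viaTokenizer(String s) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, s, true);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // Plain loop: yields char primitives and creates no intermediate strings.
    static int countUpperCase(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isUpperCase(s.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", viaSplit("abc"))); // a b c
        System.out.println(viaTokenizer("abc"));               // [a, b, c]
        System.out.println(countUpperCase("aBC"));             // 2
    }
}
```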
Elaborating on this answer and this answer.
The answers above point out a problem with many of the solutions here: they don't iterate by code point value, so they would have trouble with any surrogate chars. The Java docs also outline the issue (see "Unicode Character Representations"). Anyhow, here's some code that uses some actual surrogate chars from the supplementary Unicode set and converts them back to a String. Note that Character.toChars() returns an array of chars: if you're dealing with surrogates, you'll necessarily have two chars. This code should work for any Unicode character.
String supplementary = "Some Supplementary: 𠜎𠜱𠝹𠱓";
supplementary.codePoints().forEach(cp ->
System.out.print(new String(Character.toChars(cp))));
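If streams are not available (before Java 8) or an index is needed, the same code-point-aware iteration can be written as a plain loop with String#codePointAt and Character#charCount. The following is a minimal sketch; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointLoop {
    // Walks the string one code point at a time, advancing by 1 or 2 chars.
    static List<Integer> codePointsOf(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.add(cp);
            i += Character.charCount(cp); // 2 for a surrogate pair, otherwise 1
        }
        return out;
    }

    public static void main(String[] args) {
        // "x" followed by U+2070E, a supplementary CJK ideograph.
        String s = "x" + new String(Character.toChars(0x2070E));
        System.out.println(s.length()); // 3 chars, but only 2 code points
        for (int cp : codePointsOf(s)) {
            System.out.println(String.format("U+%04X", cp)); // U+0078, then U+2070E
        }
    }
}
```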
This example code will help you out:
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
public class Solution {
public static void main(String[] args) {
HashMap<String, Integer> map = new HashMap<String, Integer>();
map.put("a", 10);
map.put("b", 30);
map.put("c", 50);
map.put("d", 40);
map.put("e", 20);
System.out.println(map);
Map sortedMap = sortByValue(map);
System.out.println(sortedMap);
}
public static Map sortByValue(Map unsortedMap) {
Map sortedMap = new TreeMap(new ValueComparator(unsortedMap));
sortedMap.putAll(unsortedMap);
return sortedMap;
}
}
class ValueComparator implements Comparator {
Map map;
public ValueComparator(Map map) {
this.map = map;
}
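public int compare(Object keyA, Object keyB) {
Comparable valueA = (Comparable) map.get(keyA);
Comparable valueB = (Comparable) map.get(keyB);
int byValue = valueB.compareTo(valueA); // descending by value
// Break ties on the key; otherwise the TreeMap treats keys with equal values as duplicates and drops them
return byValue != 0 ? byValue : ((Comparable) keyA).compareTo(keyB);
}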
}
There are typically two ways to iterate through a string in Java. Both have already been covered by other answers in this thread; I am just adding my version.
The first is charAt():
String s = sc.next(); // assuming a Scanner named sc is defined above
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i); // constant-time access, so it adds hardly any overhead
}
The second is toCharArray():
char[] str = s.toCharArray(); // takes O(n) time to copy the string's contents into a new char array
If performance is at stake, I recommend the first approach, since each charAt() call is constant time and nothing is copied. If it is not, the second one can make your work easier, because String is immutable in Java while the array copy can be modified freely.
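The point about immutability can be illustrated with a small sketch: toCharArray() hands back an independent copy, so edits to the array never touch the original String. The class name and the upperFirst helper are hypothetical and exist only for this example.

```java
class CharArrayCopyDemo {
    // Returns a new string with the first character upper-cased.
    static String upperFirst(String s) {
        if (s.isEmpty()) {
            return s;
        }
        char[] chars = s.toCharArray(); // O(n) copy of the string's contents
        chars[0] = Character.toUpperCase(chars[0]);
        return new String(chars);
    }

    public static void main(String[] args) {
        String original = "hello";
        String changed = upperFirst(original);
        System.out.println(original); // hello (unchanged, because String is immutable)
        System.out.println(changed);  // Hello
    }
}
```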
