I'm writing a small Java program which:
takes a text as a String
takes 2 arrays of chars
What I'm trying to do will sound like "find and replace", but it is not quite the same, so I thought it important to clarify.
I want to take this text, check whether any char from the first array matches a char in the text and, if so, replace it with the char at the same index in the second char array.
I'll explain with an example:
Let's say my text (String) is: "java is awesome!";
I have 2 arrays (char[]): "absm" and "!#*$".
The desired result is to change 'a' to '!', 'b' to '#', 's' to '*' and 'm' to '$',
meaning the resulting text will be:
"java is awesome!" changed to -> "j!v! i* !we*o$e!"
What is the most efficient way of doing this and why?
I thought about looping over the text, but that did not seem very efficient.
(StringBuilder/String class can be used)
StringBuilder sb = new StringBuilder(text);
for(int i = 0; i<text.length(); i ++)
{
for (int j = 0; j < firstCharArray.length;j++)
{
if (sb.charAt(i) == firstCharArray[j])
{
sb.setCharAt(i, secondCharArray[j]);
break;
}
}
}
This way is efficient because it uses a StringBuilder to change the characters in place (if you used Strings, you would have to create a new one for every change because they are immutable). It also needs only one pass through the text; for each of the n = text.length() characters it scans the first array at most once.
I guess you are looking for StringUtils.replaceEach, at least as a reference.
How efficient do you need it to be? Are you doing this for hundreds, thousands, millions of words???
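I don't know if it's the most efficient, but you could call the String indexOf() method for each of your possible tokens. It tells you whether the token is present, and you can then replace the char at that index with the corresponding char from the other array.
Codewise, something like this (yourString, firstArray and secondArray stand for your own variables):
StringBuilder sb = new StringBuilder(yourString);
for (int j = 0; j < firstArray.length; j++) {
    // indexOf returns -1 once there is no further occurrence
    int pos = yourString.indexOf(firstArray[j]);
    while (pos >= 0) {
        sb.setCharAt(pos, secondArray[j]);
        pos = yourString.indexOf(firstArray[j], pos + 1);
    }
}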
Put the 2 arrays you have in a Map
Map<Character, Character> //or Map of Strings
where the key is "a", "b" etc... and the value is the character you want to substitute with - "#" etc....
Then simply replace the keys in your String with the values.
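The Map idea above can be sketched as follows. This is only an illustration: the class name MapReplace and the method name replaceChars are made up for the example, and it assumes both arrays have the same length.

```java
import java.util.HashMap;
import java.util.Map;

class MapReplace {
    // Hypothetical helper: replaces each char of text that occurs in `from`
    // with the char at the same index in `to`.
    static String replaceChars(String text, char[] from, char[] to) {
        Map<Character, Character> mapping = new HashMap<>();
        for (int i = 0; i < from.length; i++) {
            mapping.put(from[i], to[i]);
        }
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Characters without a mapping are copied unchanged.
            char replacement = mapping.getOrDefault(c, c);
            sb.append(replacement);
        }
        return sb.toString();
    }
}
```

Calling MapReplace.replaceChars("java is awesome!", "absm".toCharArray(), "!#*$".toCharArray()) should give "j!v! i* !we*o$e!". For only a handful of characters the boxing overhead of the map may well outweigh its constant-time lookup.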
For small stuff like this, an indexOf() search is likely to be faster than a map, while "avoiding" the inner loop of the accepted answer. Of course, the loop is still there, inside String.indexOf(), but it's likely to be optimized to a fare-thee-well by the JIT-compiler, because it's so heavily used.
static String replaceChars(String source, String from, String to)
{
StringBuilder dest = new StringBuilder(source);
for ( int i = 0; i < source.length(); i++ )
{
int foundAt = from.indexOf(source.charAt(i));
if ( foundAt >= 0 )
dest.setCharAt(i,to.charAt(foundAt));
}
return dest.toString();
}
Update: The Oracle/Sun JIT uses SIMD on at least some processors for indexOf(), making it even faster than one would guess.
Since the only way to know whether a character should be replaced is to check it, you (or any utility method) have to loop through the whole text, one character after another. You can never achieve better complexity than O(n), where n is the number of characters in the text.
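Within that O(n) bound, the per-character cost can still be made constant with a translation table. The sketch below is one possible way to do it, not something taken from the answers here; the class name TableReplace is invented, and it assumes that a 65536-entry char table is an acceptable memory cost.

```java
class TableReplace {
    static String replaceChars(String text, String from, String to) {
        // Identity table covering every possible char value.
        char[] table = new char[Character.MAX_VALUE + 1];
        for (int i = 0; i < table.length; i++) {
            table[i] = (char) i;
        }
        // Overwrite the entries that should be translated.
        for (int i = 0; i < from.length(); i++) {
            table[from.charAt(i)] = to.charAt(i);
        }
        char[] out = text.toCharArray();
        for (int i = 0; i < out.length; i++) {
            out[i] = table[out[i]]; // one array lookup per character
        }
        return new String(out);
    }
}
```

Building the table costs more than a short text is worth, so this only pays off for long inputs or when the table is reused. Note that if `from` contains the same character twice, the later entry wins here.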
This utility class replaces a char or a group of chars of a String. It is equivalent to bash tr and perl tr///, i.e. it transliterates.
/**
* Utility class that replaces chars of a String, aka, transliterate.
*
* It's equivalent to bash 'tr' and perl 'tr///'.
*
*/
public class ReplaceChars {
public static String replace(String string, String from, String to) {
return new String(replace(string.toCharArray(), from.toCharArray(), to.toCharArray()));
}
public static char[] replace(char[] chars, char[] from, char[] to) {
char[] output = chars.clone();
for (int i = 0; i < output.length; i++) {
for (int j = 0; j < from.length; j++) {
if (output[i] == from[j]) {
output[i] = to[j];
break;
}
}
}
return output;
}
/**
* For tests!
*/
public static void main(String[] args) {
// Example from: https://en.wikipedia.org/wiki/Caesar_cipher
String string = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG";
String from = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
String to = "XYZABCDEFGHIJKLMNOPQRSTUVW";
System.out.println();
System.out.println("Cesar cypher: " + string);
System.out.println("Result: " + ReplaceChars.replace(string, from, to));
}
}
This is the output:
Cesar cypher: THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG
Result: QEB NRFZH YOLTK CLU GRJMP LSBO QEB IXWV ALD
I have an array
int arr[]={1,$,2,3,$,$,4,5}
and want the output to be
arr[]={1,2,3,4,5,$,$,$}
Can you please help me?
My code is:
public class ArrayTest
{
static void splitString(String str)
{
StringBuffer alpha = new StringBuffer(),
num = new StringBuffer(), special = new StringBuffer();
for (int i=0; i<str.length(); i++)
{
if (Character.isDigit(str.charAt(i)))
num.append(str.charAt(i));
else
special.append(str.charAt(i));
}
System.out.print(num);
System.out.print(special);
}
public static void main(String args[])
{
String str = "1,2,$,$,3,4";
splitString(str);
}
}
I am getting the O/P as 1234,,$,$,,
instead of 1,2,3,4,$,$
Sorting is not a single activity: it consists of ordering and comparing.
You can use Java's built-in sorting with your own comparator (a piece of code that knows how to compare two values).
Your comparator needs to make sure that the special sign is bigger than any other value (and equal to any other special sign). If neither of the compared values is the special sign, do an ordinary comparison.
Here is a link to another question that explains how to do that:
How to use Comparator in Java to sort
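A minimal sketch of such a comparator, assuming the values are held as String tokens and that every token other than "$" parses as an int (the class and method names are invented for the example):

```java
import java.util.Arrays;
import java.util.Comparator;

class SpecialLast {
    static final String SPECIAL = "$";

    static String[] sortSpecialLast(String[] tokens) {
        String[] copy = tokens.clone();
        Comparator<String> cmp = (a, b) -> {
            boolean aSpecial = a.equals(SPECIAL);
            boolean bSpecial = b.equals(SPECIAL);
            if (aSpecial && bSpecial) return 0;  // special signs are equal to each other
            if (aSpecial) return 1;              // special sign is bigger than any number
            if (bSpecial) return -1;
            // Neither is special: ordinary numeric comparison.
            return Integer.compare(Integer.parseInt(a), Integer.parseInt(b));
        };
        Arrays.sort(copy, cmp);
        return copy;
    }
}
```

With the tokens 1,$,2,3,$,$,4,5 this should produce 1,2,3,4,5,$,$,$. Note that it also sorts the numbers, which goes slightly beyond simply moving the special signs to the end.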
From your output I can see that your function includes the commas in the sorting as well. You must remove the commas before sorting the String:
str = str.replaceAll(",", "");
This line of code replaces all commas with nothing, in other words removes them. Now you can execute your sorting algorithm and add the commas back at the end:
String merge = num.toString() + special.toString();
String result = "";
for (int i = 0; i < merge.length(); ++i) {
result += merge.charAt(i) + ",";
}
This will put an additional comma at the end which you can remove very easily:
result = result.substring(0, result.length() - 1);
Now result holds the wanted result.
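As an alternative to stripping and re-adding commas by hand, the same result can be reached by splitting on the comma and joining again. This is a sketch under the assumption that the input is a comma-separated String; the names SpecialToEnd and moveSpecialToEnd are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SpecialToEnd {
    static String moveSpecialToEnd(String csv) {
        List<String> numbers = new ArrayList<>();
        List<String> specials = new ArrayList<>();
        for (String token : csv.split(",")) {
            // A token counts as a number only if it is non-empty and all digits.
            if (!token.isEmpty() && token.chars().allMatch(Character::isDigit)) {
                numbers.add(token);
            } else {
                specials.add(token);
            }
        }
        numbers.addAll(specials);
        // String.join puts the separator between elements only, so no trailing comma.
        return String.join(",", numbers);
    }
}
```

Because it works on whole tokens rather than single characters, this variant also keeps multi-digit numbers intact.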
For this Kata, I am given random function names in the PEP8 format and I am to convert them to camelCase.
(input)get_speed == (output)getSpeed ....
(input)set_distance == (output)setDistance
I have an understanding of one way of doing this, written in pseudo-code:
loop through the word,
if the letter is an underscore
then delete the underscore
then get the next letter and change to a uppercase
endIf
endLoop
return the resultant word
But I'm unsure of the best way of doing this. Would it be more efficient to create a char array and loop through the elements, deleting each underscore as it is found and changing the element at the next index to uppercase?
Or would it be better to use recursion:
function camelCase takes a string
if the length of the string is 0,
then return the string
endIf
if the character is a underscore
then change to nothing,
then find next character and change to uppercase
return the string taking away the character
endIf
finally return the function taking the first character away
Any thoughts, please? I'm looking for a good, efficient way of handling this problem. Thanks :)
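For reference, the recursive pseudo-code above could be written in Java roughly like this. It is a sketch only; the class name is invented, and it assumes a trailing underscore should simply be dropped.

```java
class RecursiveCamelCase {
    static String camelCase(String s) {
        if (s.isEmpty()) {
            return s;
        }
        if (s.charAt(0) == '_') {
            if (s.length() == 1) {
                return ""; // trailing underscore: nothing left to capitalise
            }
            // Drop the underscore, uppercase the next char, recurse on the rest.
            return Character.toUpperCase(s.charAt(1)) + camelCase(s.substring(2));
        }
        // Ordinary character: keep it and recurse on the remainder.
        return s.charAt(0) + camelCase(s.substring(1));
    }
}
```

Each call creates a new substring and a new stack frame, so for long names this does noticeably more work than a single loop over the characters.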
I would go with this:
divide given String by underscore to array
from second word until end take first letter and convert it to uppercase
join to one word
This will work in O(n) (it goes through the name 3 times). For the first step, use this function:
str.split("_");
for uppercase use this:
String newName = str.substring(0, 1).toUpperCase() + str.substring(1);
But make sure you check the size of the string first...
Edited - added implementation
It would look like this:
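public String camelCase(String str) {
    if (str == null || str.trim().length() == 0) return str;
    String[] split = str.split("_");
    // a name made up only of underscores splits into an empty array
    if (split.length == 0) return "";
    StringBuilder newStr = new StringBuilder(split[0]);
    for (int i = 1; i < split.length; i++) {
        // consecutive underscores produce empty pieces; skip them so substring(0, 1) cannot fail
        if (split[i].isEmpty()) continue;
        newStr.append(split[i].substring(0, 1).toUpperCase()).append(split[i].substring(1));
    }
    return newStr.toString();
}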
for inputs:
"test"
"test_me"
"test_me_twice"
it returns:
"test"
"testMe"
"testMeTwice"
It would be simpler to iterate over the string instead of recursing.
String pep8 = "do_it_again";
StringBuilder camelCase = new StringBuilder();
for(int i = 0, l = pep8.length(); i < l; ++i) {
if(pep8.charAt(i) == '_' && (i + 1) < l) {
camelCase.append(Character.toUpperCase(pep8.charAt(++i)));
} else {
camelCase.append(pep8.charAt(i));
}
}
System.out.println(camelCase.toString()); // prints doItAgain
The question you pose is whether to use an iterative or a recursive approach. For this case I'd go for the iterative approach because it's straightforward, easy to understand and doesn't require many resources (only one array, no new stack frames etc.), though that doesn't really matter for this example.
Recursion is good for divide-and-conquer problems, but I don't see that fitting the case well, although it's possible.
An iterative implementation of the algorithm you described could look like the following:
StringBuilder buf = new StringBuilder(input);
for(int i = 0; i < buf.length(); i++){
if(buf.charAt(i) == '_'){
buf.deleteCharAt(i);
if(i != buf.length()){ // check for end of string
buf.setCharAt(i, Character.toUpperCase(buf.charAt(i)));
}
}
}
return buf.toString();
The check for the end of the string is not part of the given algorithm and could be omitted if the input string never ends with '_'.
Is there a simple way (instead of traversing manually all the string, or loop for indexOf) to find how many times a character appears in a string?
Say we have "abdsd3$asda$asasdd$sadas" and we want to find that $ appears 3 times.
public int countChar(String str, char c)
{
int count = 0;
for(int i=0; i < str.length(); i++)
{ if(str.charAt(i) == c)
count++;
}
return count;
}
This is definitely the fastest way. Regexes are much, much slower here, and possibly harder to understand.
Functional style (Java 8, just for fun):
str.chars().filter(num -> num == '$').count()
Not optimal, but simple way to count occurrences:
String s = "...";
int counter = s.split("\\$", -1).length - 1;
Note:
Dollar sign is a special Regular Expression symbol, so it must be escaped with a backslash.
A backslash is a special symbol for escape characters such as newlines, so it must be escaped with a backslash.
The second argument of split prevents empty trailing strings from being removed.
You can use Apache Commons' StringUtils.countMatches(String string, String subStringToCount).
Since you're scanning the whole string anyway you can build a full character count and do any number of lookups, all for the same big-Oh cost (n):
public static Map<Character,Integer> getCharFreq(String s) {
Map<Character,Integer> charFreq = new HashMap<Character,Integer>();
if (s != null) {
for (Character c : s.toCharArray()) {
Integer count = charFreq.get(c);
int newCount = (count==null ? 1 : count+1);
charFreq.put(c, newCount);
}
}
return charFreq;
}
// ...
String s = "abdsd3$asda$asasdd$sadas";
Map<Character,Integer> counts = getCharFreq(s);
counts.get('$'); // => 3
counts.get('a'); // => 7
counts.get('s'); // => 6
A character frequency count is a common task for some applications (such as education) but not general enough to warrant inclusion with the core Java APIs. As such, you'll probably need to write your own function.
You can also use a for-each loop. I think it is simpler to read.
int occurrences = 0;
for(char c : yourString.toCharArray()){
if(c == '$'){
occurrences++;
}
}
I believe the "one liner" that you expected to get is this:
"abdsd3$asda$asasdd$sadas".replaceAll( "[^$]*($)?", "$1" ).length();
Remember that the requirements are:
(instead of traversing manually all the string, or loop for indexOf)
and let me add: that at the heart of this question it sounds like "any loop" is not wanted and there is no requirement for speed. I believe the subtext of this question is coolness factor.
Something a bit more functional, without Regex:
public static int count(String s, char c) {
return s.length()==0 ? 0 : (s.charAt(0)==c ? 1 : 0) + count(s.substring(1),c);
}
It's not tail-recursive, for the sake of clarity.
Traversing the string is probably the most efficient, though using Regex to do this might yield cleaner looking code (though you can always hide your traverse code in a function).
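A regex-based count might look like the sketch below (the class name RegexCount is made up). Pattern.quote is used so that characters with a special meaning in regular expressions, such as '$', are matched literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexCount {
    static int count(String s, char c) {
        // Quote the character so regex metacharacters are treated as literals.
        Matcher m = Pattern.compile(Pattern.quote(String.valueOf(c))).matcher(s);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }
}
```

The loop has not gone away; it has only moved into Matcher.find(), which is why this is unlikely to beat a plain traversal.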
Well there are a bunch of different utilities for this, e.g. Apache Commons Lang String Utils
but in the end, it has to loop over the string to count the occurrences one way or another.
Note also that the countMatches method above has the following signature so will work for substrings as well.
public static int countMatches(String str, String sub)
The source for this is (from here):
public static int countMatches(String str, String sub) {
if (isEmpty(str) || isEmpty(sub)) {
return 0;
}
int count = 0;
int idx = 0;
while ((idx = str.indexOf(sub, idx)) != -1) {
count++;
idx += sub.length();
}
return count;
}
I was curious if they were iterating over the string or using Regex.
This is simple code, but of course a little bit slower.
String s = ...;
int countDollar = s.length()-s.replaceAll("\\$","").length();
int counta = s.length()-s.replaceAll("a","").length();
An even better answer is here in a duplicate question
You could sort the string (treating it as a char array) and then do a modified binary search that counts occurrences. But I agree with @tofutim that traversing it is the most efficient: O(N) versus O(N * logN) + O(logN).
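To make the sort-then-search idea concrete, here is one way it could be done, using two lower-bound binary searches on the sorted chars. The class and method names are invented for the illustration.

```java
import java.util.Arrays;

class SortedCount {
    // Returns the first index in sorted[] whose value is >= key.
    static int lowerBound(char[] sorted, int key) {
        int lo = 0, hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    static int count(String s, char c) {
        char[] sorted = s.toCharArray();
        Arrays.sort(sorted); // O(N log N)
        // All occurrences of c sit between the lower bounds of c and c + 1.
        return lowerBound(sorted, c + 1) - lowerBound(sorted, c);
    }
}
```

This only makes sense if the sorted array is kept and queried many times; for a single lookup the sort alone costs more than a direct scan.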
There is another way to count how often each character occurs in a string.
Assuming we have a String such as
String str = "abfdvdvdfv";
we can count the number of times each character appears, traversing the string only once:
Map<String, Integer> map = new HashMap<>(); // needs java.util.Map and java.util.HashMap
for (int i = 0; i < str.length(); i++)
{
    String key = str.charAt(i) + "";
    Integer count = map.get(key);
    // the first occurrence starts at 1, later ones increment the stored count
    map.put(key, count == null ? 1 : count + 1);
}
We can then check the output by traversing the Map as
for (Map.Entry<String, Integer> entry:map.entrySet())
{
System.out.println(entry.getKey()+" count is : "+entry.getValue());
}
public static int countChars(String input,char find){
if(input.indexOf(find) != -1){
return countChars(input.substring(0, input.indexOf(find)), find)+
countChars(input.substring(input.indexOf(find)+1),find) + 1;
}
else {
return 0;
}
}
In Java is there a way to check the condition:
"Does this single character appear at all in string x"
without using a loop?
You can use string.indexOf('a').
This tells you whether the char a is present in the string:
it returns the index of the first occurrence of the character in
the character sequence represented by this object, or -1 if the
character does not occur.
String.contains() which checks if the string contains a specified sequence of char values
String.indexOf() which returns the index within the string of the first occurrence of the specified character or substring (there are 4 variations of this method)
I'm not sure what the original poster is asking exactly. Since indexOf(...) and contains(...) both probably use loops internally, perhaps they want to know whether this is possible at all without a loop? I can think of two ways offhand; one would of course be recursion:
public boolean containsChar(String s, char search) {
if (s.length() == 0)
return false;
else
return s.charAt(0) == search || containsChar(s.substring(1), search);
}
The other is far less elegant, but for completeness...:
/**
* Works for strings of up to 5 characters
*/
public boolean containsChar(String s, char search) {
if (s.length() > 5) throw new IllegalArgumentException();
try {
if (s.charAt(0) == search) return true;
if (s.charAt(1) == search) return true;
if (s.charAt(2) == search) return true;
if (s.charAt(3) == search) return true;
if (s.charAt(4) == search) return true;
} catch (IndexOutOfBoundsException e) {
// reached when the string is shorter than 5 characters
return false;
}
return false;
}
The number of lines grows as you need to support longer and longer strings, of course. But there are no loops or recursion at all. You can even remove the length check if you're concerned that length() uses a loop.
You can use 2 methods from the String class.
String.contains() which checks if the string contains a specified sequence of char values
String.indexOf() which returns the index within the string of the first occurrence of the specified character or substring or returns -1 if the character is not found (there are 4 variations of this method)
Method 1:
String myString = "foobar";
if (myString.contains("x")) {
// Do something.
}
Method 2:
String myString = "foobar";
if (myString.indexOf("x") >= 0) {
// Do something.
}
Links by: Zach Scrivena
String temp = "abcdefghi";
if(temp.indexOf("b")!=-1)
{
System.out.println("there is 'b' in temp string");
}
else
{
System.out.println("there is no 'b' in temp string");
}
If you need to check the same string often, you can calculate the character occurrences up front. This is an implementation that uses a bit set stored in a long array:
import java.io.Serializable;
public class FastCharacterInStringChecker implements Serializable {
private static final long serialVersionUID = 1L;
private final long[] l = new long[1024]; // 65536 / 64 = 1024
public FastCharacterInStringChecker(final String string) {
for (final char c: string.toCharArray()) {
final int index = c >> 6;
final int value = c - (index << 6);
l[index] |= 1L << value;
}
}
public boolean contains(final char c) {
final int index = c >> 6; // c / 64
final int value = c - (index << 6); // c - (index * 64)
return (l[index] & (1L << value)) != 0;
}}
To check if something does not exist in a string, you at least need to look at each character in a string. So even if you don't explicitly use a loop, it'll have the same efficiency. That being said, you can try using str.contains(""+char).
Is the below what you were looking for?
int index = string.indexOf(character);
return index != -1;
Yes, using the indexOf() method on the string class. See the API documentation for this method
String.contains(String) or String.indexOf(String) - suggested
"abc".contains("Z"); // false - correct
"zzzz".contains("Z"); // false - correct
"Z".contains("Z"); // true - correct
"😀and😀".contains("😀"); // true - correct
"😀and😀".contains("😂"); // false - correct
"😀and😀".indexOf("😀"); // 0 - correct
"😀and😀".indexOf("😂"); // -1 - correct
String.indexOf(int) and carefully considered String.indexOf(char) with char to int widening
"😀and😀".indexOf("😀".charAt(0)); // 0 though incorrect usage has correct output due to portion of correct data
"😀and😀".indexOf("😂".charAt(0)); // 0 -- incorrect usage and ambiguous result
"😀and😀".indexOf("😂".codePointAt(0)); // -1 -- correct usage and correct output
The discussion around what a character is tends to be ambiguous in the Java world
Can the value of char or Character be considered a single character?
No. In the context of Unicode characters, a char or Character can sometimes be only part of a single character and should not logically be treated as a complete character.
If not, what should be considered a single character (logically)?
Any system supporting character encodings for Unicode characters should consider a Unicode code point to be a single character.
So Java should make that loud and clear rather than exposing too many internal implementation details to users.
The String class is bad at abstraction (understanding the abstraction requires a confusingly large amount of knowledge about what it encapsulates, which makes it an anti-pattern).
How is it different from general char usage?
A char can only be mapped to a character in the Basic Multilingual Plane.
Only a code point (an int) can cover the complete range of Unicode characters.
Why is there this difference?
A char is a 16-bit unsigned value, and 2 bytes cannot represent every Unicode character. In the UTF-16 internal representation, some values in the 16-bit range have to be combined with another 16-bit value (a surrogate pair) to define a character.
Without getting too verbose, the usage of indexOf, charAt, length and similar methods should be more explicit. I sincerely hope Java will add new UnicodeString and UnicodeCharacter classes with clearly defined abstractions.
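The gap between char and code point can be seen with a single supplementary character. In this sketch the class name CodePointDemo and its helper methods are invented; U+1F600 is written as the escaped surrogate pair "\uD83D\uDE00" so the source stays ASCII.

```java
class CodePointDemo {
    // Number of 16-bit char units in the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    // Searches for a full code point rather than a single char unit.
    static boolean containsCodePoint(String s, int codePoint) {
        return s.indexOf(codePoint) >= 0;
    }
}
```

For the one-character string "\uD83D\uDE00", charCount reports 2 while codePointCount reports 1, and containsCodePoint finds 0x1F600 but not 0x1F602, even though both code points start with the same high surrogate.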
Reason to prefer contains and not indexOf(int)
In practice, many code flows treat a logical character as a char in Java.
In a Unicode context, char is not sufficient.
Although indexOf takes an int, the char-to-int conversion hides this from the user, who might write something like str.indexOf(someotherstr.charAt(0)) (unless aware of the exact context).
So treating everything as a CharSequence (e.g. a String) is better.
public static void main(String[] args) {
System.out.println("😀and😀".indexOf("😀".charAt(0))); // 0 though incorrect usage has correct output due to portion of correct data
System.out.println("😀and😀".indexOf("😂".charAt(0))); // 0 -- incorrect usage and ambiguous result
System.out.println("😀and😀".indexOf("😂".codePointAt(0))); // -1 -- correct usage and correct output
System.out.println("😀and😀".contains("😀")); // true - correct
System.out.println("😀and😀".contains("😂")); // false - correct
}
Semantics
char can handle most practical use cases. Still, it's better to use code points within a programming environment for future extensibility.
Code points should handle nearly all of the technical use cases around encodings.
Still, grapheme clusters fall outside the scope of the code point level of abstraction.
Storage layers can choose the char interface if ints are too costly (double the size). Unless storage cost is the only metric, it's still better to use code points. Also, it's better to treat storage as bytes and delegate semantics to the business logic built around storage.
Semantics can be abstracted at multiple levels. The code point should become the lowest level of interface, and other semantics can be built around code points in the runtime environment.
package com;
public class _index {
public static void main(String[] args) {
String s1="be proud to be an indian";
char ch=s1.charAt(s1.indexOf('e'));
int count = 0;
for(int i=0;i<s1.length();i++) {
if(s1.charAt(i)=='e'){
System.out.println("number of E:=="+ch);
count++;
}
}
System.out.println("Total count of E:=="+count);
}
}
static String removeOccurences(String a, String b)
{
    // Removes from a every character that appears in b.
    StringBuilder s2 = new StringBuilder(a);
    for (int i = 0; i < b.length(); i++) {
        String ch = String.valueOf(b.charAt(i));
        // StringBuilder.indexOf takes a String and returns -1 once no occurrence is left
        for (int k = s2.indexOf(ch); k >= 0; k = s2.indexOf(ch, k)) {
            s2.deleteCharAt(k);
        }
    }
    return s2.toString();
}
You can use this code. It checks whether the char is present: indexOf returns a value >= 0 if it is, and -1 otherwise. Here I am printing the letters that are not present in the input.
import java.util.Scanner;
public class Test {
public static void letters()
{
System.out.println("Enter input char");
Scanner sc = new Scanner(System.in);
String input = sc.next();
System.out.println("Output : ");
for (char alphabet = 'A'; alphabet <= 'Z'; alphabet++) {
if(input.toUpperCase().indexOf(alphabet) < 0)
System.out.print(alphabet + " ");
}
}
public static void main(String[] args) {
letters();
}
}
// Output example
Enter input char
nandu
Output :
B C E F G H I J K L M O P Q R S T V W X Y Z
If you look at the source code of indexOf in Java:
public int indexOf(int ch, int fromIndex) {
final int max = value.length;
if (fromIndex < 0) {
fromIndex = 0;
} else if (fromIndex >= max) {
// Note: fromIndex might be near -1>>>1.
return -1;
}
if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
// handle most cases here (ch is a BMP code point or a
// negative value (invalid code point))
final char[] value = this.value;
for (int i = fromIndex; i < max; i++) {
if (value[i] == ch) {
return i;
}
}
return -1;
} else {
return indexOfSupplementary(ch, fromIndex);
}
}
you can see that it uses a for loop to find a character. Note that each indexOf call in your code amounts to one loop.
So using a loop to find a single character is unavoidable.
However, if you want to find strings that can take several different forms, use a library such as java.util.regex, which offers more powerful pattern matching for characters or strings via regular expressions. For example, to find an email address in a string:
String regex = "^(.+)@(.+)$";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(email);
If you don't want to use regex, just use a loop with charAt and try to cover all cases in one loop.
Be careful: recursive methods have more overhead than loops, so recursion is not recommended here.
How about this (note that this snippet is JavaScript, not Java; the closest Java counterpart of includes is String.contains):
let text = "Hello world, welcome to the universe.";
let result = text.includes("world");
console.log(result); // true
The result will be true or false.
This always works for me.
You won't be able to check whether a char appears in a string without going over the string at least once using a loop or recursion (the built-in methods like indexOf also use a loop).
If the number of lookups for string x is much larger than the length of the string, then I would recommend using a Set, as that would be more efficient than repeatedly calling indexOf.
String s = "abc";
// Build a set so we can check if character exists in constant time O(1)
Set<Character> set = new HashSet<>();
int len = s.length();
for(int i = 0; i < len; i++) set.add(s.charAt(i));
// Now we can check without the need of a loop
// contains method of set doesn't use a loop unlike string's contains method
set.contains('a'); // true
set.contains('z'); // false
Using set you will be able to check if character exists in a string in constant time O(1) but you will also use additional memory ( Space complexity will be O(n) ).
Some ways to iterate through the characters of a string in Java are:
Using StringTokenizer?
Converting the String to a char[] and iterating over that.
What is the easiest/best/most correct way to iterate?
I use a for loop to iterate the string and use charAt() to get each character to examine it. Since the String is implemented with an array, the charAt() method is a constant time operation.
String s = "...stuff...";
for (int i = 0; i < s.length(); i++){
char c = s.charAt(i);
//Process char
}
That's what I would do. It seems the easiest to me.
As far as correctness goes, I don't believe that exists here. It is all based on your personal style.
Two options
for(int i = 0, n = s.length() ; i < n ; i++) {
char c = s.charAt(i);
}
or
for(char c : s.toCharArray()) {
// process c
}
The first is probably faster, the second is probably more readable.
Note that most of the other techniques described here break down if you're dealing with characters outside of the BMP (Unicode Basic Multilingual Plane), i.e. code points that are outside of the \u0000-\uFFFF range. This will only happen rarely, since the code points outside this are mostly assigned to dead languages. But there are some useful characters outside this, for example some code points used for mathematical notation, and some used to encode proper names in Chinese.
In that case your code will be:
String str = "....";
int offset = 0, strLen = str.length();
while (offset < strLen) {
int curChar = str.codePointAt(offset);
offset += Character.charCount(curChar);
// do something with curChar
}
The Character.charCount(int) method requires Java 5+.
Source: http://mindprod.com/jgloss/codepoint.html
In Java 8 we can solve it as:
String str = "xyz";
str.chars().forEachOrdered(i -> System.out.print((char)i));
str.codePoints().forEachOrdered(i -> System.out.print(Character.toChars(i))); // toChars also handles supplementary code points
The method chars() returns an IntStream as mentioned in doc:
Returns a stream of int zero-extending the char values from this
sequence. Any char which maps to a surrogate code point is passed
through uninterpreted. If the sequence is mutated while the stream is
being read, the result is undefined.
The method codePoints() also returns an IntStream as per doc:
Returns a stream of code point values from this sequence. Any
surrogate pairs encountered in the sequence are combined as if by
Character.toCodePoint and the result is passed to the stream. Any
other code units, including ordinary BMP characters, unpaired
surrogates, and undefined code units, are zero-extended to int values
which are then passed to the stream.
How are char and code point different? As mentioned in this article:
Unicode 3.1 added supplementary characters, bringing the total number
of characters to more than the 2^16 = 65536 characters that can be
distinguished by a single 16-bit char. Therefore, a char value no
longer has a one-to-one mapping to the fundamental semantic unit in
Unicode. JDK 5 was updated to support the larger set of character
values. Instead of changing the definition of the char type, some of
the new supplementary characters are represented by a surrogate pair
of two char values. To reduce naming confusion, a code point will be
used to refer to the number that represents a particular Unicode
character, including supplementary ones.
Finally, why forEachOrdered and not forEach?
The behaviour of forEach is explicitly nondeterministic, whereas forEachOrdered performs an action for each element of the stream in the encounter order of the stream, if the stream has a defined encounter order. So forEach does not guarantee that the order will be kept. Also check this question for more.
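The difference only becomes visible on a parallel stream. The sketch below (class and method names are made up) shows the ordered variant; it relies on the documented guarantee that forEachOrdered processes one element at a time in encounter order.

```java
class OrderedDemo {
    static String copyInOrder(String s) {
        StringBuilder sb = new StringBuilder();
        // Even on a parallel stream, forEachOrdered runs the action
        // in encounter order, one element at a time.
        s.chars().parallel().forEachOrdered(i -> sb.append((char) i));
        return sb.toString();
    }
}
```

Replacing forEachOrdered with forEach here would be unsafe: the characters could arrive in any order, and StringBuilder is not designed for concurrent appends.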
For difference between a character, a code point, a glyph and a grapheme check this question.
I agree that StringTokenizer is overkill here. Actually I tried out the suggestions above and took the time.
My test was fairly simple: create a StringBuilder with about a million characters, convert it to a String, and traverse each of them with charAt() / after converting to a char array / with a CharacterIterator a thousand times (of course making sure to do something on the string so the compiler can't optimize away the whole loop :-) ).
The result on my 2.6 GHz Powerbook (that's a mac :-) ) and JDK 1.5:
Test 1: charAt + String --> 3138msec
Test 2: String converted to array --> 9568msec
Test 3: StringBuilder charAt --> 3536msec
Test 4: CharacterIterator and String --> 12151msec
As the results are significantly different, the most straightforward way also seems to be the fastest one. Interestingly, charAt() of a StringBuilder seems to be slightly slower than the one of String.
BTW I suggest not to use CharacterIterator as I consider its abuse of the '\uFFFF' character as "end of iteration" a really awful hack. In big projects there's always two guys that use the same kind of hack for two different purposes and the code crashes really mysteriously.
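The '\uFFFF' problem mentioned above is easy to reproduce. In this sketch (the class name DonePitfall is invented) the usual iteration loop stops early as soon as the text itself contains that character, because it is indistinguishable from CharacterIterator.DONE.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

class DonePitfall {
    // Counts how many characters the standard iteration idiom visits.
    static int visited(String s) {
        CharacterIterator it = new StringCharacterIterator(s);
        int n = 0;
        for (char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
            n++;
        }
        return n;
    }
}
```

For "abc" all three characters are visited, but for "a\uFFFFb" the loop ends after the first one and 'b' is never seen.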
Here's one of the tests:
int count = 1000;
...
System.out.println("Test 1: charAt + String");
long t = System.currentTimeMillis();
int sum=0;
for (int i=0; i<count; i++) {
int len = str.length();
for (int j=0; j<len; j++) {
if (str.charAt(j) == 'b')
sum = sum + 1;
}
}
t = System.currentTimeMillis()-t;
System.out.println("result: "+ sum + " after " + t + "msec");
There are some dedicated classes for this:
import java.text.*;
final CharacterIterator it = new StringCharacterIterator(s);
for(char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
// process c
...
}
If you have Guava on your classpath, the following is a pretty readable alternative. Guava even has a fairly sensible custom List implementation for this case, so this shouldn't be inefficient.
for(char c : Lists.charactersOf(yourString)) {
// Do whatever you want
}
UPDATE: As @Alex noted, with Java 8 there's also CharSequence#chars to use. Although its type is IntStream, it can be mapped to chars like this:
yourString.chars()
.mapToObj(c -> Character.valueOf((char) c))
.forEach(c -> System.out.println(c)); // Or whatever you want
If you need to iterate through the code points of a String (see this answer) a shorter / more readable way is to use the CharSequence#codePoints method added in Java 8:
for(int c : string.codePoints().toArray()){
...
}
or using the stream directly instead of a for loop:
string.codePoints().forEach(c -> ...);
There is also CharSequence#chars if you want a stream of the characters (although it is an IntStream, since there is no CharStream).
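To make the difference between the two streams concrete, here is a small sketch (the class and method names are just for illustration). It counts the elements of chars() and codePoints() for a string containing one supplementary character, which is stored as a surrogate pair.

```java
public class CharsVsCodePoints {
    // Number of UTF-16 code units (Java chars) in the string.
    static long charCount(String s) {
        return s.chars().count();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static long codePointCount(String s) {
        return s.codePoints().count();
    }

    public static void main(String[] args) {
        // 'a', then U+1F600 written as a surrogate pair, then 'b'
        String s = "a\uD83D\uDE00b";
        System.out.println(charCount(s));      // 4
        System.out.println(codePointCount(s)); // 3
    }
}
```

If the text is known to contain only characters from the Basic Multilingual Plane, both counts are equal and either stream works.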
If you need performance, then you must test on your environment. No other way.
Here is some example code:
int tmp = 0;
String s = new String(new byte[64*1024]);
{
long st = System.nanoTime();
for(int i = 0, n = s.length(); i < n; i++) {
tmp += s.charAt(i);
}
st = System.nanoTime() - st;
System.out.println("1 " + st);
}
{
long st = System.nanoTime();
char[] ch = s.toCharArray();
for(int i = 0, n = ch.length; i < n; i++) {
tmp += ch[i];
}
st = System.nanoTime() - st;
System.out.println("2 " + st);
}
{
long st = System.nanoTime();
for(char c : s.toCharArray()) {
tmp += c;
}
st = System.nanoTime() - st;
System.out.println("3 " + st);
}
System.out.println("" + tmp);
On Java online I get:
1 10349420
2 526130
3 484200
0
On Android x86 API 17 I get:
1 9122107
2 13486911
3 12700778
0
I wouldn't use StringTokenizer, as it is one of the legacy classes in the JDK.
The javadoc says:
"StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead."
public class Main {
public static void main(String[] args) {
String myStr = "Hello";
String myStr2 = "World";
for (int i = 0; i < myStr.length(); i++) {
char result = myStr.charAt(i);
System.out.println(result);
}
for (int i = 0; i < myStr2.length(); i++) {
char result = myStr2.charAt(i);
System.out.print(result);
}
}
}
Output:
H
e
l
l
o
World
See The Java Tutorials: Strings.
public class StringDemo {
public static void main(String[] args) {
String palindrome = "Dot saw I was Tod";
int len = palindrome.length();
char[] tempCharArray = new char[len];
char[] charArray = new char[len];
// put original string in an array of chars
for (int i = 0; i < len; i++) {
tempCharArray[i] = palindrome.charAt(i);
}
// reverse array of chars
for (int j = 0; j < len; j++) {
charArray[j] = tempCharArray[len - 1 - j];
}
String reversePalindrome = new String(charArray);
System.out.println(reversePalindrome);
}
}
Store the length in an int len and use a for loop.
StringTokenizer is totally unsuited to the task of breaking a string into its individual characters. With String#split() you can do that easily by using a regex that matches nothing, e.g.:
String[] theChars = str.split("|");
But StringTokenizer doesn't use regexes, and there's no delimiter string you can specify that will match the nothing between characters. There is one cute little hack you can use to accomplish the same thing: use the string itself as the delimiter string (making every character in it a delimiter) and have it return the delimiters:
StringTokenizer st = new StringTokenizer(str, str, true);
However, I only mention these options for the purpose of dismissing them. Both techniques break the original string into one-character strings instead of char primitives, and both involve a great deal of overhead in the form of object creation and string manipulation. Compare that to calling charAt() in a for loop, which incurs virtually no overhead.
Elaborating on this answer and this answer.
Above answers point out the problem of many of the solutions here which don't iterate by code point value -- they would have trouble with any surrogate chars. The java docs also outline the issue here (see "Unicode Character Representations"). Anyhow, here's some code that uses some actual surrogate chars from the supplementary Unicode set, and converts them back to a String. Note that .toChars() returns an array of chars: if you're dealing with surrogates, you'll necessarily have two chars. This code should work for any Unicode character.
String supplementary = "Some Supplementary: 𠜎𠜱𠝹𠱓";
supplementary.codePoints().forEach(cp ->
System.out.print(new String(Character.toChars(cp))));
This example code will help you out!
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
public class Solution {
public static void main(String[] args) {
HashMap<String, Integer> map = new HashMap<String, Integer>();
map.put("a", 10);
map.put("b", 30);
map.put("c", 50);
map.put("d", 40);
map.put("e", 20);
System.out.println(map);
Map sortedMap = sortByValue(map);
System.out.println(sortedMap);
}
public static Map sortByValue(Map unsortedMap) {
Map sortedMap = new TreeMap(new ValueComparator(unsortedMap));
sortedMap.putAll(unsortedMap);
return sortedMap;
}
}
class ValueComparator implements Comparator {
Map map;
public ValueComparator(Map map) {
this.map = map;
}
public int compare(Object keyA, Object keyB) {
Comparable valueA = (Comparable) map.get(keyA);
Comparable valueB = (Comparable) map.get(keyB);
return valueB.compareTo(valueA);
}
}
There are typically two ways to iterate over a string in Java. Both have already been covered by several people in this thread; I'm just adding my version.
The first is charAt():
String s = sc.next(); // assuming a Scanner named sc is defined above
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i); // constant-time access, so it adds hardly any overhead
}
The second is toCharArray():
char[] str = s.toCharArray(); // takes O(n) time to copy the string's contents into a new char array
If performance is at stake, I recommend the first, since each charAt() call runs in constant time and nothing is copied. If it is not, the second can make your work easier, because the array can be modified in place while a String is immutable.
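As a rough sketch of why the array copy can be convenient (the class and method names are hypothetical), the helper below edits the copy in place and builds a new String from it, leaving the original untouched:

```java
public class CharArrayCopyDemo {
    // Returns a new string in which every occurrence of 'from' is replaced by 'to'.
    static String replaceChar(String s, char from, char to) {
        char[] chars = s.toCharArray(); // O(n) copy; the String itself is never modified
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] == from) {
                chars[i] = to; // arrays are mutable, so this edits the copy in place
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        String original = "java";
        String replaced = replaceChar(original, 'a', '!');
        System.out.println(original); // java
        System.out.println(replaced); // j!v!
    }
}
```

For a single replacement like this, String.replace(char, char) already does the job; the array approach pays off when several edits are made in one pass.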