Find first occurrence of two characters - java

So I have dealt with this problem before and thought there would be an accepted pattern for solving it, but I have yet to find one. I tried searching around and tinkering myself, and neither produced a satisfactory answer, so I am turning to SO.
String str = "blah1blah2";
I want to know whether the char '1' or '2' occurs first (this is just a made up example obviously). I know I could use str.indexOf() for 1 and 2 to compare, but this presents the problem of it possibly returning -1.
Let me know what would be a good way to tackle this.
FYI: I am working in Java but I think this sort of indexOf function is pretty common in other languages.

I don't know what degree of flexibility you require, but I would just do this the good-old-fashioned way of looping through the String, something like this:
public static char findFirstChar(String str, char c1, char c2) {
for (char c : str.toCharArray())
if (c == c1 || c == c2)
return c;
return 0;
}
Of course, this will return the char it encounters first, or 0 if neither char is found in the string.
If you want to search for an arbitrary number of characters:
public static char findFirstChar(String str, char ... chars) {
for (char c1 : str.toCharArray())
for (char c2 : chars)
if (c1 == c2)
return c1;
return 0;
}

I would say you should start by defining exactly what behavior you want. Assume your search terms are "1" and "2", what should be returned for each of the following strings?
"blah1blah2"
"blah2blah1"
"blahblah1"
"blahblah2"
"blahblah"
Write test cases for each of these, with your answer. Now make the tests pass. Simple!

I'm not sure there's another way but to check whether the characters are there before you compare:
String result;
int first = str.indexOf('1');
int second = str.indexOf('2');
if (first > -1 && second > -1) {
    // both are present: the smaller index occurs first
    result = first < second ? "1 before 2" : "2 before 1";
}
else {
    result = "one of them is not there";
}
System.out.println(result);
All depends on the results you expect

String str = "blah1blah2";
int indexOf1 = str.indexOf('1'); // char literal: indexOf(1) would search for code point 1
int indexOf2 = str.indexOf('2');
if (indexOf1 != -1)
{
    if (indexOf2 != -1)
    {
        if (indexOf1 < indexOf2) { /* 1 first */ }
        else { /* 2 first */ }
    }
    else
    { /* 1 is present, but 2 is not */ }
}
else
{
    if (indexOf2 != -1) { /* 2 is present, but 1 is not */ }
}
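The index comparisons above can be folded into one helper that simply ignores the -1 results. This is only a sketch: the class name FirstOccurrence and the method firstIndexOfAny are made up for illustration, and returning -1 when nothing matches is an assumption about the behavior you want.

```java
final class FirstOccurrence {
    /** Returns the smallest index at which any candidate occurs, or -1 if none occurs. */
    static int firstIndexOfAny(String s, char... candidates) {
        int best = -1;
        for (char c : candidates) {
            int i = s.indexOf(c);
            // Skip misses (-1); otherwise keep the smaller index seen so far.
            if (i >= 0 && (best < 0 || i < best)) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String str = "blah1blah2";
        int i = firstIndexOfAny(str, '1', '2');
        System.out.println(i < 0 ? "neither found" : "first is '" + str.charAt(i) + "' at index " + i);
    }
}
```

Once you have the index, str.charAt(index) tells you which of the candidates came first.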

Related

Check if all characters in string are the same with inbuilt methods without loops

How can I build a Java method that returns TRUE if all the characters in a String are the same using inbuilt methods - without using Regex, loops, or recursion?
Examples:
aaaaaa --> True
abaaaa --> False
Aaaaaa --> False
You can convert the string to an IntStream using .chars() and check if the stream has distinct count of 1 using .distinct().count() == 1.
String s = "aaaaaa";
boolean isAllCharsSame = s.chars().distinct().count() == 1;
isAllCharsSame will be true if all characters in the string s are same, otherwise false.
Edit:
String s = "aaaaaa";
boolean isAllCharsSame = s.codePoints().distinct().count() == 1;
.chars() won't work for Unicode codepoints like "👋👋👋", use .codePoints() for that. Thanks to @BasilBourque for pointing this out.
tl;dr
if ( "👋👋👋".codePoints().distinct().count() == 1 ) { … }
Code point, not char
Some of the other Answers use char. Unfortunately the char type is obsolete, unable to represent even half of the 143,859 characters in Unicode. Just try using the string "👋👋👋" instead of "aaa".
Instead use code point integer numbers.
Set < Integer > codePointsDistinct = "aaaaaaa".codePoints().boxed().collect( Collectors.toSet());
boolean allSameCharacter = ( codePointsDistinct.size() == 1 ) ;
See that code run live at IdeOne.com.
true
We can make that even more brief by asking the stream to eliminate duplicates by calling .distinct().
boolean allSameCharacter = ( "👋👋👋".codePoints().distinct().count() == 1 );
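To see the difference between chars() and codePoints() in one place, here is a small sketch. The class and method names are invented for this example, and note the assumption that an empty string should count as "not all the same" (distinct().count() is 0 for it).

```java
final class AllSameCheck {
    /** True when the string is non-empty and every code point is identical. */
    static boolean allSameCodePoint(String s) {
        // distinct() collapses repeats; an empty string gives a count of 0, so this returns false.
        return s.codePoints().distinct().count() == 1;
    }

    public static void main(String[] args) {
        // Three copies of U+1F44B, written as surrogate pairs to stay encoding-independent.
        String waves = "\uD83D\uDC4B\uD83D\uDC4B\uD83D\uDC4B";
        System.out.println(allSameCodePoint("aaaaaa"));
        System.out.println(allSameCodePoint("abaaaa"));
        System.out.println(allSameCodePoint(waves));
        // chars() sees the high and low surrogate as two different values.
        System.out.println(waves.chars().distinct().count());
    }
}
```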
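You can use String.replaceAll(). Quote the first character with java.util.regex.Pattern.quote so that a regex metacharacter such as '.' is matched literally:
String s = "aaaaaaa";
boolean same = s.replaceAll(Pattern.quote("" + s.charAt(0)), "").length() == 0;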
You can do it without a loop at all, using recursion. It's awful, but possible:
boolean singleChar(String s) {
// Add check to make sure there is at least one char.
char first = s.charAt(0);
return singleCharRecursive(first, 1, s);
}
boolean singleCharRecursive(char first, int idx, String s) {
return idx >= s.length()
|| (s.charAt(idx) == first && singleCharRecursive (first, idx+1, s));
}
Here's another answer.
There may be other ways to check every character against the first one, but this one works:
String t = "aaa";
char c = t.charAt(0);
long cnt = t.chars().filter(ch -> ch == c).count();
System.out.println(cnt==t.length());
You can use replace method:
boolean r = "aaaaaaa".replace("a", "").length() == 0;
System.out.println(r); // true

Check if string contains only Unicode values [\u0030-\u0039] or [\u0660-\u0669]

I need to check, in java, if a string is composed only of Unicode values [\u0030-\u0039] or [\u0660-\u0669]. What is the most efficient way of doing this?
Use \x for unicode characters:
^([\x{0030}-\x{0039}\x{0660}-\x{0669}]+)$
If the pattern should match an empty string too, use * instead of +.
Use this if you don't want to allow mixing characters from the two sets you provided:
^([\x{0030}-\x{0039}]+|[\x{0660}-\x{0669}]+)$
https://regex101.com/r/xqWL4q/6
As Holger mentioned in the comments below, \x{0030}-\x{0039} is equivalent to 0-9, so it can be substituted for better readability.
As said here, it's not clear whether you want to allow mixed occurrences of these digits or require all digits to come from one of the ranges.
A simple check for mixed digits would be string.matches("[0-9٠-٩]*") or, to avoid confusing changes of the read/write direction or if your source code encoding doesn't support all characters, string.matches("[0-9\u0660-\u0669]*").
Checking whether the string matches either range can be done using
string.matches("[0-9]*")||string.matches("[٠-٩]*") or
string.matches("[0-9]*")||string.matches("[\u0660-\u0669]*").
An alternative would be
string.chars().allMatch(c -> c >= '0' && c <= '9' || c >= '٠' && c <= '٩').
Or to check for either, string.chars().allMatch(c -> c >= '0' && c <= '9') || string.chars().allMatch(c -> c >= '٠' && c <= '٩')
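A compact way to compare the "mixed" and "either range" variants is to put both behind small methods. This is a sketch only: DigitRangeCheck and its method names are made up here, and, like the patterns above, both methods accept the empty string because of the * quantifier.

```java
final class DigitRangeCheck {
    // U+0660 to U+0669 are the Arabic-Indic digits; escapes keep the source free of bidirectional text.
    static boolean mixedDigits(String s) {
        return s.matches("[0-9\u0660-\u0669]*");
    }

    // Every character must come from the same one of the two ranges.
    static boolean singleRangeDigits(String s) {
        return s.matches("[0-9]*") || s.matches("[\u0660-\u0669]*");
    }

    public static void main(String[] args) {
        String mixed = "1\u0661"; // ASCII digit followed by an Arabic-Indic digit
        System.out.println(mixedDigits(mixed));
        System.out.println(singleRangeDigits(mixed));
    }
}
```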
Since these code points represent numerals in two different Unicode blocks,
I suggest checking whether each character is a numeral:
boolean isNumerals(String s) {
return !s.chars().anyMatch(v -> !Character.isDigit(v));
}
This will definitely match more than asked for, but in some cases or in more controlled environment it may be useful to make code more readable.
(edit)
Java API also allows to determine a unicode block of a specific character:
Character.UnicodeBlock arabic = Character.UnicodeBlock.ARABIC;
Character.UnicodeBlock latin = Character.UnicodeBlock.BASIC_LATIN;
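boolean isValidBlock(String s) {
    // UnicodeBlock.of returns null for characters outside any defined block,
    // so call equals on the constants to avoid a NullPointerException.
    return s.chars().allMatch(v ->
        arabic.equals(Character.UnicodeBlock.of(v)) ||
        latin.equals(Character.UnicodeBlock.of(v))
    );
}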
Combined with the check above will give exact result OP has asked for.
On the plus side - higher abstraction gives more flexibility, makes code more readable and is not dependent on exact encoding of string passed.
A simple solution using regex (see also the much better explanation by @Predicate: https://stackoverflow.com/a/60597367/12558456):
private boolean legalRegex(String s) {
return s.matches("^([\u0030-\u0039]|[\u0660-\u0669])*$");
}
faster but ugly solution: (needs a hashset of allowed chars)
private boolean legalCharactersOnly(String s) {
for (char c:s.toCharArray()) {
if (!allowedCharacters.contains(c)) {
return false;
}
}
return true;
}
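The loop above relies on an allowedCharacters set that is not shown. One way it could be built for the two ranges in the question is sketched below; the class name is hypothetical and the static initializer is just one possible way to fill the set.

```java
import java.util.HashSet;
import java.util.Set;

final class AllowedCharsCheck {
    // Assumed field: the original snippet refers to it but does not define it.
    private static final Set<Character> allowedCharacters = new HashSet<>();

    static {
        for (char c = '0'; c <= '9'; c++) allowedCharacters.add(c);
        // Arabic-Indic digits, U+0660 to U+0669.
        for (char c = '\u0660'; c <= '\u0669'; c++) allowedCharacters.add(c);
    }

    static boolean legalCharactersOnly(String s) {
        for (char c : s.toCharArray()) {
            if (!allowedCharacters.contains(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(legalCharactersOnly("0129"));
        System.out.println(legalCharactersOnly("12x"));
    }
}
```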
Here is a solution which works without regex for arbitrary unicode code points (outside of the Basic Multilingual Plane).
private final Set<Integer> codePoints = new HashSet<Integer>();
public boolean test(String string) {
for (int i = 0, codePoint = 0; i < string.length(); i += Character.charCount(codePoint)) {
codePoint = string.codePointAt(i);
if (!codePoints.contains(codePoint)) {
return false;
}
}
return true;
}

Comparing chars in Java

I want to check a char variable is one of 21 specific chars, what is the shortest way I can do this?
For example:
if(symbol == ('A'|'B'|'C')){}
Doesn't seem to be working. Do I need to write it like:
if(symbol == 'A' || symbol == 'B' etc.)
If your input is a character and the characters you are checking against are mostly consecutive you could try this:
if ((symbol >= 'A' && symbol <= 'Z') || symbol == '?') {
// ...
}
However if your input is a string a more compact approach (but slower) is to use a regular expression with a character class:
if (symbol.matches("[A-Z?]")) {
// ...
}
If you have a character you'll first need to convert it to a string before you can use a regular expression:
if (Character.toString(symbol).matches("[A-Z?]")) {
// ...
}
If you know all your 21 characters in advance you can write them all as one String and then check it like this:
char wanted = 'x';
String candidates = "abcdefghij...";
boolean hit = candidates.indexOf(wanted) >= 0;
I think this is the shortest way.
The first statement you have is probably not what you want... 'A'|'B'|'C' is actually doing bitwise operation :)
Your second statement is correct, but you will have 21 ORs.
If the 21 characters are "consecutive", the above solutions are fine.
If not you can pre-compute a hash set of valid characters and do something like
if (validCharHashSet.contains(symbol))...
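For what it's worth, the pre-computed set could look roughly like this. The candidate string "ABCXYZ?!" is a placeholder for your own 21 characters, and ValidChars / isValid are names invented for the sketch.

```java
import java.util.Set;
import java.util.stream.Collectors;

final class ValidChars {
    // Placeholder candidates; substitute the real list of characters.
    private static final Set<Character> VALID =
            "ABCXYZ?!".chars()
                      .mapToObj(c -> (char) c)   // box each UTF-16 unit as a Character
                      .collect(Collectors.toSet());

    static boolean isValid(char symbol) {
        return VALID.contains(symbol); // symbol is autoboxed to Character
    }

    public static void main(String[] args) {
        System.out.println(isValid('A'));
        System.out.println(isValid('D'));
    }
}
```

Keeping the set in a static final field means it is built once rather than on every check.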
you can use this:
if ("ABCDEFGHIJKLMNOPQRSTUVWXYZ".contains(String.valueOf(yourChar)))
note that you do not need to create a separate String with the letters A-Z.
It might be clearer written as a switch statement with fall through e.g.
switch (symbol){
case 'A':
case 'B':
// Do stuff
break;
default:
}
If you have specific chars, you could put them in a collection:
Collection<Character> specificChars = Arrays.asList('A', 'D', 'E'); // more chars
char symbol = 'Y';
System.out.println(specificChars.contains(symbol)); // false
symbol = 'A';
System.out.println(specificChars.contains(symbol)); // true
Using Guava:
if (CharMatcher.anyOf("ABC...").matches(symbol)) { ... }
Or if many of those characters are a range, such as "A" to "U" but some aren't:
CharMatcher.inRange('A', 'U').or(CharMatcher.anyOf("1379"))
You can also declare this as a static final field so the matcher doesn't have to be created each time.
private static final CharMatcher MATCHER = CharMatcher.anyOf("ABC...");
Option 2 will work. You could also use a Set<Character> or
char[] myCharSet = new char[] {'A', 'B', 'C', ...};
Arrays.sort(myCharSet);
if (Arrays.binarySearch(myCharSet, symbol) >= 0) { ... }
You can solve this easily by using the String.indexOf(char) method which returns -1 if the char is not in the String.
String candidates = "ABCDEFGHIJK";
if(candidates.indexOf(symbol) != -1){
//character in list of candidates
}
Yes, you need to write it like your second line. Java doesn't have the python style syntactic sugar of your first line.
Alternatively you could put your valid values into an array and check for the existence of symbol in the array.
A sketch of that approach (inside a method that returns boolean):
char[] candidates = { 'A', 'B', /* ... */ 'G' };
for (char c : candidates)
{
    if (symbol == c) { return true; }
}
return false;
One way to do it is with a List<Character> built using the overloaded convenience factory methods added in Java 9:
if (List.of('A','B','C','D','E').contains(symbol)) {
// do something
}
You can just write your chars as Strings and use the equals method.
For Example:
String firstChar = "A";
String secondChar = "B";
String thirdChar = "C";
if (firstChar.equalsIgnoreCase(secondChar) ||
(firstChar.equalsIgnoreCase(thirdChar))) // As many equals as you want
{
System.out.println(firstChar + " is the same as " + secondChar);
} else {
System.out.println(firstChar + " is different than " + secondChar);
}

How do I find out if first character of a string is a number?

In Java is there a way to find out if first character of a string is a number?
One way is
string.startsWith("1")
and do the above all the way till 9, but that seems very inefficient.
Character.isDigit(string.charAt(0))
Note that this will allow any Unicode digit, not just 0-9. You might prefer:
char c = string.charAt(0);
isDigit = (c >= '0' && c <= '9');
Or the slower regex solutions:
s.substring(0, 1).matches("\\d")
// or the equivalent
s.substring(0, 1).matches("[0-9]")
However, with any of these methods, you must first be sure that the string isn't empty. If it is, charAt(0) and substring(0, 1) will throw a StringIndexOutOfBoundsException. startsWith does not have this problem.
To make the entire condition one line and avoid length checks, you can alter the regexes to the following:
s.matches("\\d.*")
// or the equivalent
s.matches("[0-9].*")
If the condition does not appear in a tight loop in your program, the small performance hit for using regular expressions is not likely to be noticeable.
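Putting the empty-string guard and the ASCII versus Unicode distinction together, a helper might look like the following. This is only a sketch; the class and method names are made up, and treating null as "no leading digit" is an assumption.

```java
final class LeadingDigit {
    /** True only for a leading ASCII digit 0-9; empty or null input returns false. */
    static boolean startsWithAsciiDigit(String s) {
        if (s == null || s.isEmpty()) {
            return false; // avoids StringIndexOutOfBoundsException from charAt(0)
        }
        char c = s.charAt(0);
        return c >= '0' && c <= '9';
    }

    /** True for any leading Unicode decimal digit, e.g. Arabic-Indic digits as well. */
    static boolean startsWithUnicodeDigit(String s) {
        return s != null && !s.isEmpty() && Character.isDigit(s.charAt(0));
    }

    public static void main(String[] args) {
        System.out.println(startsWithAsciiDigit("9 lives"));
        System.out.println(startsWithAsciiDigit(""));
        System.out.println(startsWithUnicodeDigit("\u0663abc")); // U+0663 is an Arabic-Indic digit
    }
}
```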
Regular expressions are a very powerful but expensive tool. It is valid to use them to check whether the first character is a digit, but it is not so elegant :) I prefer this way:
public boolean isLeadingDigit(final String value){
final char c = value.charAt(0);
return (c >= '0' && c <= '9');
}
In Kotlin:
Suppose that you have a String like this:
private val phoneNumber = "9121111111"
First, take the first character as a one-character String:
val firstChar = phoneNumber.slice(0..0)
Then check whether it parses as a number, which gives you a Boolean:
firstChar.toIntOrNull() != null
A regular expression for "starts with a digit" is "^[0-9]":
Pattern pattern = Pattern.compile("^[0-9]"); // java.util.regex.Pattern and Matcher
Matcher matcher = pattern.matcher(string);
if (matcher.find()) {
    System.out.println("true");
}
I just came across this question and thought on contributing with a solution that does not use regex.
In my case I use a helper method:
public boolean notNumber(String input){
boolean notNumber = false;
try {
// must not start with a number
@SuppressWarnings("unused")
double checker = Double.valueOf(input.substring(0,1));
}
catch (Exception e) {
notNumber = true;
}
return notNumber;
}
Probably an overkill, but I try to avoid regex whenever I can.
To check whether the first character is a digit or a letter:
For a digit:
Character.isDigit(str.charAt(0)) // returns true
For a letter:
Character.isLetter(str.charAt(0)) // returns true

How can I check if a single character appears in a string?

In Java is there a way to check the condition:
"Does this single character appear at all in string x"
without using a loop?
You can use string.indexOf('a').
Per the documentation, it returns the index of the first occurrence of the character in the character sequence represented by this object, or -1 if the character does not occur.
String.contains() which checks if the string contains a specified sequence of char values
String.indexOf() which returns the index within the string of the first occurrence of the specified character or substring (there are 4 variations of this method)
I'm not sure what the original poster is asking exactly. Since indexOf(...) and contains(...) both probably use loops internally, perhaps he's looking to see if this is possible at all without a loop? I can think of two ways offhand. One would of course be recursion:
public boolean containsChar(String s, char search) {
if (s.length() == 0)
return false;
else
return s.charAt(0) == search || containsChar(s.substring(1), search);
}
The other is far less elegant, but completeness...:
/**
* Works for strings of up to 5 characters
*/
public boolean containsChar(String s, char search) {
if (s.length() > 5) throw new IllegalArgumentException();
try {
if (s.charAt(0) == search) return true;
if (s.charAt(1) == search) return true;
if (s.charAt(2) == search) return true;
if (s.charAt(3) == search) return true;
if (s.charAt(4) == search) return true;
} catch (IndexOutOfBoundsException e) {
// this should never happen...
return false;
}
return false;
}
The number of lines grows as you need to support longer and longer strings, of course. But there are no loops or recursion at all. You can even remove the length check if you're concerned that length() uses a loop.
You can use 2 methods from the String class.
String.contains() which checks if the string contains a specified sequence of char values
String.indexOf() which returns the index within the string of the first occurrence of the specified character or substring or returns -1 if the character is not found (there are 4 variations of this method)
Method 1:
String myString = "foobar";
if (myString.contains("x")) {
// Do something.
}
Method 2:
String myString = "foobar";
if (myString.indexOf("x") >= 0) {
// Do something.
}
Links by: Zach Scrivena
String temp = "abcdefghi";
if(temp.indexOf("b")!=-1)
{
System.out.println("there is 'b' in temp string");
}
else
{
System.out.println("there is no 'b' in temp string");
}
If you need to check the same string often, you can calculate the character occurrences up front. This is an implementation that uses a bit array stored in a long array:
public class FastCharacterInStringChecker implements Serializable {
private static final long serialVersionUID = 1L;
private final long[] l = new long[1024]; // 65536 / 64 = 1024
public FastCharacterInStringChecker(final String string) {
for (final char c: string.toCharArray()) {
final int index = c >> 6;
final int value = c - (index << 6);
l[index] |= 1L << value;
}
}
public boolean contains(final char c) {
final int index = c >> 6; // c / 64
final int value = c - (index << 6); // c - (index * 64)
return (l[index] & (1L << value)) != 0;
}}
To check if something does not exist in a string, you at least need to look at each character in a string. So even if you don't explicitly use a loop, it'll have the same efficiency. That being said, you can try using str.contains(""+char).
Is the below what you were looking for?
int index = string.indexOf(character);
return index != -1;
Yes, using the indexOf() method on the string class. See the API documentation for this method
String.contains(String) or String.indexOf(String) - suggested
"abc".contains("Z"); // false - correct
"zzzz".contains("Z"); // false - correct
"Z".contains("Z"); // true - correct
"😀and😀".contains("😀"); // true - correct
"😀and😀".contains("😂"); // false - correct
"😀and😀".indexOf("😀"); // 0 - correct
"😀and😀".indexOf("😂"); // -1 - correct
String.indexOf(int) and carefully considered String.indexOf(char) with char to int widening
"😀and😀".indexOf("😀".charAt(0)); // 0 though incorrect usage has correct output due to portion of correct data
"😀and😀".indexOf("😂".charAt(0)); // 0 -- incorrect usage and ambiguous result
"😀and😀".indexOf("😂".codePointAt(0)); // -1 -- correct usage and correct output
The discussion around "character" is ambiguous in the Java world
Can the value of char or Character be considered a single character?
No. In the context of Unicode, a char or Character can sometimes be only part of a single character and should not be treated as a complete character logically.
If not, what should be considered a single character (logically)?
Any system supporting character encodings for Unicode should consider a Unicode code point as a single character.
So Java should make that loud and clear rather than exposing so many internal implementation details to users.
The String class is bad at abstraction (it takes a confusingly large amount of understanding of its encapsulation to understand the abstraction 😒😒😒, and hence it is an anti-pattern).
How is it different from general char usage?
A char can only be mapped to a character in the Basic Multilingual Plane.
Only a code point (an int) can cover the complete range of Unicode characters.
Why is there this difference?
A char is a 16-bit unsigned value, and 2 bytes cannot represent all Unicode characters. In the UTF-16 internal representation, a 16-bit value sometimes has to be combined with a second 16-bit value (a surrogate pair) to define one character.
Without getting too verbose, the usage of indexOf, charAt, length and similar methods should be more explicit. I sincerely hope Java will add new UnicodeString and UnicodeCharacter classes with clearly defined abstractions.
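To make the char versus code point gap concrete, here is a small sketch that counts both and looks up a full code point. CodePointLookup and containsCodePoint are names invented for this example; the String methods it calls (length, codePointCount, indexOf(int)) are standard.

```java
final class CodePointLookup {
    /** Looks up a full code point; supplementary code points are matched as surrogate pairs. */
    static boolean containsCodePoint(String s, int codePoint) {
        return s.indexOf(codePoint) >= 0;
    }

    public static void main(String[] args) {
        // U+1F600, "and", U+1F600, with the emoji written as surrogate pairs.
        String text = "\uD83D\uDE00and\uD83D\uDE00";
        System.out.println(text.length());                         // UTF-16 units
        System.out.println(text.codePointCount(0, text.length())); // logical code points
        System.out.println(containsCodePoint(text, 0x1F600));
        System.out.println(containsCodePoint(text, 0x1F602));
        // A lone high surrogate still "matches", which is the ambiguity described above.
        System.out.println(text.indexOf('\uD83D'));
    }
}
```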
Reason to prefer contains and not indexOf(int)
Practically there are many code flows that treat a logical character as char in java.
In Unicode context, char is not sufficient
Though the indexOf takes in an int, char to int conversion masks this from the user and user might do something like str.indexOf(someotherstr.charAt(0))(unless the user is aware of the exact context)
So, treating everything as CharSequence (aka String) is better
public static void main(String[] args) {
System.out.println("😀and😀".indexOf("😀".charAt(0))); // 0 though incorrect usage has correct output due to portion of correct data
System.out.println("😀and😀".indexOf("😂".charAt(0))); // 0 -- incorrect usage and ambiguous result
System.out.println("😀and😀".indexOf("😂".codePointAt(0))); // -1 -- correct usage and correct output
System.out.println("😀and😀".contains("😀")); // true - correct
System.out.println("😀and😀".contains("😂")); // false - correct
}
Semantics
char can handle most practical use cases. Still, it's better to use code points within the programming environment for future extensibility.
Code points should handle nearly all of the technical use cases around encodings.
Still, grapheme clusters fall outside the scope of the code point level of abstraction.
Storage layers can choose the char interface if ints are too costly (doubled). Unless storage cost is the only metric, it's still better to use code points. Also, it's better to treat storage as bytes and delegate semantics to the business logic built around storage.
Semantics can be abstracted at multiple levels. The code point should be the lowest level of interface, and other semantics can be built around it in the runtime environment.
package com;
public class _index {
public static void main(String[] args) {
String s1="be proud to be an indian";
char ch=s1.charAt(s1.indexOf('e'));
int count = 0;
for(int i=0;i<s1.length();i++) {
if(s1.charAt(i)=='e'){
System.out.println("number of E:=="+ch);
count++;
}
}
System.out.println("Total count of E:=="+count);
}
}
static String removeOccurences(String a, String b)
{
    StringBuilder s2 = new StringBuilder(a);
    for (int i = 0; i < b.length(); i++) {
        char ch = b.charAt(i);
        System.out.println(ch + " first index" + a.indexOf(ch));
        // StringBuilder.indexOf takes a String; loop until ch is gone (index 0 included)
        for (int k = s2.indexOf(String.valueOf(ch)); k >= 0; k = s2.indexOf(String.valueOf(ch))) {
            s2.deleteCharAt(k);
            System.out.println("val of s2 : " + s2.toString());
        }
    }
    System.out.println(s2.toString());
    return s2.toString();
}
You can use this code. It checks whether the char is present: if it is, the return value is >= 0, otherwise it is -1. Here I am printing the letters that are not present in the input.
import java.util.Scanner;
public class Test {
public static void letters()
{
System.out.println("Enter input char");
Scanner sc = new Scanner(System.in);
String input = sc.next();
System.out.println("Output : ");
for (char alphabet = 'A'; alphabet <= 'Z'; alphabet++) {
if(input.toUpperCase().indexOf(alphabet) < 0)
System.out.print(alphabet + " ");
}
}
public static void main(String[] args) {
letters();
}
}
//Output Example
Enter input char
nandu
Output :
B C E F G H I J K L M O P Q R S T V W X Y Z
If you look at the source code of indexOf in Java:
public int indexOf(int ch, int fromIndex) {
final int max = value.length;
if (fromIndex < 0) {
fromIndex = 0;
} else if (fromIndex >= max) {
// Note: fromIndex might be near -1>>>1.
return -1;
}
if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
// handle most cases here (ch is a BMP code point or a
// negative value (invalid code point))
final char[] value = this.value;
for (int i = fromIndex; i < max; i++) {
if (value[i] == ch) {
return i;
}
}
return -1;
} else {
return indexOfSupplementary(ch, fromIndex);
}
}
You can see it uses a for loop to find a character. Note that each indexOf call in your code amounts to one loop.
So a loop is unavoidable when searching for a single character.
However, if you want to find a string that can take several forms, use a library such as java.util.regex, which can match a character or string pattern with regular expressions. For example, to find an email in a string:
String regex = "^(.+)@(.+)$";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(email);
If you don't want to use regex, just use a loop with charAt and try to cover all cases in one loop.
Be careful: recursive methods have more overhead than loops, so recursion is not recommended.
How about this (note that it is JavaScript, not Java):
let text = "Hello world, welcome to the universe.";
let result = text.includes("world");
console.log(result); // true
The result will be true or false.
This always works for me.
You won't be able to check whether a char appears in a string without going over the string at least once using a loop or recursion (built-in methods like indexOf also use a loop).
If the number of lookups against string x is much larger than the length of the string, then I would recommend using a Set data structure, as that would be more efficient than simply using indexOf.
String s = "abc";
// Build a set so we can check if character exists in constant time O(1)
Set<Character> set = new HashSet<>();
int len = s.length();
for(int i = 0; i < len; i++) set.add(s.charAt(i));
// Now we can check without the need of a loop
// contains method of set doesn't use a loop unlike string's contains method
set.contains('a'); // true
set.contains('z'); // false
Using set you will be able to check if character exists in a string in constant time O(1) but you will also use additional memory ( Space complexity will be O(n) ).
