indexOf Case Sensitive? - java

Is the indexOf(String) method case sensitive? If so, is there a case insensitive version of it?

The indexOf() methods are all case-sensitive. You can make them (roughly, in a broken way, but working for plenty of cases) case-insensitive by converting your strings to upper/lower case beforehand:
s1 = s1.toLowerCase(Locale.US);
s2 = s2.toLowerCase(Locale.US);
s1.indexOf(s2);

Is the indexOf(String) method case sensitive?
Yes, it is case sensitive:
@Test
public void indexOfIsCaseSensitive() {
assertTrue("Hello World!".indexOf("Hello") != -1);
assertTrue("Hello World!".indexOf("hello") == -1);
}
If so, is there a case insensitive version of it?
No, there isn't. You can convert both strings to lower case before calling indexOf:
@Test
public void caseInsensitiveIndexOf() {
assertTrue("Hello World!".toLowerCase().indexOf("Hello".toLowerCase()) != -1);
assertTrue("Hello World!".toLowerCase().indexOf("hello".toLowerCase()) != -1);
}

There is an ignore case method in StringUtils class of Apache Commons Lang library
indexOfIgnoreCase(CharSequence str, CharSequence searchStr)

Yes, indexOf is case sensitive.
The best way to do case insensitivity I have found is:
String original;
int idx = original.toLowerCase().indexOf(someStr.toLowerCase());
That will do a case insensitive indexOf().

Here is my solution which does not allocate any heap memory, therefore it should be significantly faster than most of the other implementations mentioned here.
public static int indexOfIgnoreCase(final String haystack,
final String needle) {
if (needle.isEmpty() || haystack.isEmpty()) {
// Fallback to legacy behavior.
return haystack.indexOf(needle);
}
for (int i = 0; i < haystack.length(); ++i) {
// Early out, if possible.
if (i + needle.length() > haystack.length()) {
return -1;
}
// Attempt to match substring starting at position i of haystack.
int j = 0;
int ii = i;
while (ii < haystack.length() && j < needle.length()) {
char c = Character.toLowerCase(haystack.charAt(ii));
char c2 = Character.toLowerCase(needle.charAt(j));
if (c != c2) {
break;
}
j++;
ii++;
}
// Walked all the way to the end of the needle, return the start
// position that this was found.
if (j == needle.length()) {
return i;
}
}
return -1;
}
And here are the unit tests that verify correct behavior.
@Test
public void testIndexOfIgnoreCase() {
assertThat(StringUtils.indexOfIgnoreCase("A", "A"), is(0));
assertThat(StringUtils.indexOfIgnoreCase("a", "A"), is(0));
assertThat(StringUtils.indexOfIgnoreCase("A", "a"), is(0));
assertThat(StringUtils.indexOfIgnoreCase("a", "a"), is(0));
assertThat(StringUtils.indexOfIgnoreCase("a", "ba"), is(-1));
assertThat(StringUtils.indexOfIgnoreCase("ba", "a"), is(1));
assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", " Royal Blue"), is(-1));
assertThat(StringUtils.indexOfIgnoreCase(" Royal Blue", "Royal Blue"), is(1));
assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "royal"), is(0));
assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "oyal"), is(1));
assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "al"), is(3));
assertThat(StringUtils.indexOfIgnoreCase("", "royal"), is(-1));
assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", ""), is(0));
assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "BLUE"), is(6));
assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "BIGLONGSTRING"), is(-1));
assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "Royal Blue LONGSTRING"), is(-1));
}

Yes, it is case-sensitive. You can do a case-insensitive indexOf by converting your String and the String parameter both to upper-case before searching.
String str = "Hello world";
String search = "hello";
str.toUpperCase().indexOf(search.toUpperCase());
Note that toUpperCase may not work in some circumstances. For instance this:
String str = "Feldbergstraße 23, Mainz";
String find = "mainz";
int idxU = str.toUpperCase().indexOf (find.toUpperCase ());
int idxL = str.toLowerCase().indexOf (find.toLowerCase ());
idxU will be 20, which is wrong! idxL will be 19, which is correct. What's causing the problem is that toUpperCase() converts the "ß" character into TWO characters, "SS", and this throws the index off.
Consequently, always stick with toLowerCase()
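A small sketch of why the length change matters. It assumes Locale.ROOT for the conversions (the answer above relies on the default locale), and the class and method names are made up for illustration:

```java
import java.util.Locale;

public class CaseConversionLengthDemo {

    // Index found after upper-casing both strings. It is shifted whenever
    // upper-casing changes the length of the text in front of the match.
    public static int indexViaUpperCase(String text, String find) {
        return text.toUpperCase(Locale.ROOT).indexOf(find.toUpperCase(Locale.ROOT));
    }

    // Index found after lower-casing both strings.
    public static int indexViaLowerCase(String text, String find) {
        return text.toLowerCase(Locale.ROOT).indexOf(find.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String text = "Feldbergstra\u00DFe 23, Mainz"; // \u00DF is the sharp s
        System.out.println(text.length());                          // 24
        System.out.println(text.toUpperCase(Locale.ROOT).length()); // 25
        System.out.println(indexViaUpperCase(text, "mainz"));       // 20
        System.out.println(indexViaLowerCase(text, "mainz"));       // 19
    }
}
```

Any index computed on a converted copy is only valid for the original string if the conversion preserved the length of everything before the match, so an in-place comparison (for example with regionMatches) may be the safer choice.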

What are you doing with the index value once returned?
If you are using it to manipulate your string, then could you not use a regular expression instead?
import static org.junit.Assert.assertEquals;
import org.junit.Test;
public class StringIndexOfRegexpTest {
@Test
public void testNastyIndexOfBasedReplace() {
final String source = "Hello World";
final int index = source.toLowerCase().indexOf("hello".toLowerCase());
final String target = "Hi".concat(source.substring(index
+ "hello".length(), source.length()));
assertEquals("Hi World", target);
}
@Test
public void testSimpleRegexpBasedReplace() {
final String source = "Hello World";
final String target = source.replaceFirst("(?i)hello", "Hi");
assertEquals("Hi World", target);
}
}
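If the index itself is still needed, the same case-insensitive matching can be done with java.util.regex directly. This is only a sketch; the class and method names are illustrative, and Pattern.quote is used on the assumption that the search string should be taken literally:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexIndexOf {

    // Returns the index of the first case-insensitive match of needle in
    // haystack, or -1 if there is none. Pattern.quote keeps regex
    // metacharacters in needle from being interpreted.
    public static int indexOfIgnoreCase(String haystack, String needle) {
        Pattern pattern = Pattern.compile(
                Pattern.quote(needle),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher matcher = pattern.matcher(haystack);
        return matcher.find() ? matcher.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfIgnoreCase("Hello World", "WORLD")); // 6
        System.out.println(indexOfIgnoreCase("Hello World", "o.w"));   // -1
    }
}
```

Compiling the pattern once and reusing it is probably worthwhile when the same search string is used many times.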

I've just looked at the source. It compares chars so it is case sensitive.

@Test
public void testIndexofCaseSensitive() {
TestCase.assertEquals(-1, "abcDef".indexOf("d") );
}

Yes, I am fairly sure it is. One method of working around that using the standard library would be:
int index = str.toUpperCase().indexOf("FOO");

Had the same problem.
I tried regular expressions and the Apache StringUtils.indexOfIgnoreCase method, but both were pretty slow...
So I wrote a short method myself...:
public static int indexOfIgnoreCase(final String chkstr, final String searchStr, int i) {
if (chkstr != null && searchStr != null && i > -1) {
int serchStrLength = searchStr.length();
char[] searchCharLc = new char[serchStrLength];
char[] searchCharUc = new char[serchStrLength];
searchStr.toUpperCase().getChars(0, serchStrLength, searchCharUc, 0);
searchStr.toLowerCase().getChars(0, serchStrLength, searchCharLc, 0);
int j = 0;
for (int checkStrLength = chkstr.length(); i < checkStrLength; i++) {
char charAt = chkstr.charAt(i);
if (charAt == searchCharLc[j] || charAt == searchCharUc[j]) {
if (++j == serchStrLength) {
return i - j + 1;
}
} else { // faster than: else if (j != 0) {
i = i - j;
j = 0;
}
}
}
return -1;
}
According to my tests it's much faster... (at least if your searchString is rather short).
If you have any suggestions for improvement or find bugs, it would be nice to let me know... (since I use this code in an application ;-)

The first question has already been answered many times. Yes, the String.indexOf() methods are all case-sensitive.
If you need a locale-sensitive indexOf() you could use the Collator. Depending on the strength value you set you can get case insensitive comparison, and also treat accented letters as the same as the non-accented ones, etc.
Here is an example of how to do this:
private int indexOf(String original, String search) {
Collator collator = Collator.getInstance();
collator.setStrength(Collator.PRIMARY);
for (int i = 0; i <= original.length() - search.length(); i++) {
if (collator.equals(search, original.substring(i, i + search.length()))) {
return i;
}
}
return -1;
}

Just to sum it up, 3 solutions:
using toLowerCase() or toUpperCase
using StringUtils of apache
using regex
Now, what I was wondering was which one is the fastest?
I'm guessing on average the first one.

I would like to lay claim to the ONE and only solution posted so far that actually works. :-)
Three classes of problems that have to be dealt with.
Non-transitive matching rules for lower and uppercase. The Turkish I problem has been mentioned frequently in other replies. According to comments in Android source for String.regionMatches, the Georgian comparison rules require additional conversion to lower-case when comparing for case-insensitive equality.
Cases where upper- and lower-case forms have a different number of letters. Pretty much all of the solutions posted so far fail, in these cases. Example: German STRASSE vs. Straße have case-insensitive equality, but have different lengths.
Binding strengths of accented characters. Locale AND context affect whether accents match or not. In French, the uppercase form of 'é' is 'E', although there is a movement toward using uppercase accents. In Canadian French, the upper-case form of 'é' is 'É', without exception. Users in both countries would expect "e" to match "é" when searching. Whether accented and unaccented characters match is locale-specific. Now consider: does "E" equal "É"? Yes. It does. In French locales, anyway.
I am currently using android.icu.text.StringSearch to correctly implement previous implementations of case-insensitive indexOf operations.
Non-Android users can access the same functionality through the ICU4J package, using the com.ibm.icu.text.StringSearch class.
Be careful to reference classes in the correct icu package (android.icu.text or com.ibm.icu.text) as Android and the JRE both have classes with the same name in other namespaces (e.g. Collator).
this.collator = (RuleBasedCollator)Collator.getInstance(locale);
this.collator.setStrength(Collator.PRIMARY);
....
StringSearch search = new StringSearch(
pattern,
new StringCharacterIterator(targetText),
collator);
int index = search.first();
if (index != StringSearch.DONE)
{
// remember that the match length may NOT equal the pattern length.
length = search.getMatchLength();
....
}
Test Cases (Locale, pattern, target text, expectedResult):
testMatch(Locale.US,"AbCde","aBcDe",true);
testMatch(Locale.US,"éèê","EEE",true);
testMatch(Locale.GERMAN,"STRASSE","Straße",true);
testMatch(Locale.FRENCH,"éèê","EEE",true);
testMatch(Locale.FRENCH,"EEE","éèê",true);
testMatch(Locale.FRENCH,"éèê","ÉÈÊ",true);
testMatch(new Locale("tr", "TR"),"TITLE","tıtle",true); // Turkish dotless I/i
testMatch(new Locale("tr", "TR"),"TİTLE","title",true); // Turkish dotted I/i
testMatch(new Locale("tr", "TR"),"TITLE","title",false); // Dotless-I != dotted i.
PS: As best as I can determine, the PRIMARY binding strength should do the right thing when locale-specific rules differentiate between accented and non-accented characters according to dictionary rules; but I don't know which locale to use to test this premise. Donated test cases would be gratefully appreciated.
--
Copyright notice: because StackOverflow's CC-BY_SA copyrights as applied to code-fragments are unworkable for professional developers, these fragments are dual licensed under more appropriate licenses here: https://pastebin.com/1YhFWmnU

But it's not hard to write one:
public class CaseInsensitiveIndexOfTest extends TestCase {
public void testOne() throws Exception {
assertEquals(2, caseInsensitiveIndexOf("ABC", "xxabcdef"));
}
public static int caseInsensitiveIndexOf(String substring, String string) {
return string.toLowerCase().indexOf(substring.toLowerCase());
}
}

Converting both strings to lower case is usually not a big deal, but it can be slow if any of the strings is long, and doing it inside a loop makes it worse. For this reason, I would recommend indexOfIgnoreCase.

static string Search(string factMessage, string b)
{
int index = factMessage.IndexOf(b, StringComparison.CurrentCultureIgnoreCase);
string line = null;
int i = index;
if (i == -1)
{ return "not matched"; }
else
{
while (factMessage[i] != ' ')
{
line = line + factMessage[i];
i++;
}
return line;
}
}

Here's a version closely resembling Apache's StringUtils version:
public int indexOfIgnoreCase(String str, String searchStr) {
return indexOfIgnoreCase(str, searchStr, 0);
}
public int indexOfIgnoreCase(String str, String searchStr, int fromIndex) {
// https://stackoverflow.com/questions/14018478/string-contains-ignore-case/14018511
if(str == null || searchStr == null) return -1;
if (searchStr.length() == 0) return fromIndex; // empty string found; use same behavior as Apache StringUtils
final int endLimit = str.length() - searchStr.length() + 1;
for (int i = fromIndex; i < endLimit; i++) {
if (str.regionMatches(true, i, searchStr, 0, searchStr.length())) return i;
}
return -1;
}

indexOf is case sensitive. This is because it uses the equals method to compare the elements in the list. The same thing goes for contains and remove.

Related

How can I test if a String contains the characters of a particular string?

I have a String with value
String rest="bac";
I have another String with value
String str="baack";
If i use
str.contains(rest);
it returns false. But i want the output to be true. As "baack" contains all the letters from string rest
Is it possible to do so? With or without this method?
Unfortunately, there is no standard method doing this, as far as I know.
If what you want is to check that the second string contains at least once every character of the first string, then you can check each character one by one with the following test:
boolean result = true;
for (char c : rest.toCharArray()) {
result &= str.indexOf(c) > -1;
}
return result;
Or alternatively:
for (char c : rest.toCharArray()) {
if (str.indexOf(c) == -1) {
return false;
}
}
return true;
It might not be optimal, but it works and it is simple to read.
Since the order is not important, your question turns to be whether the first set of character contains the second set of character.
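// Initialize the sets (generic types need Character, not the primitive char)
Set<Character> bigSet = new HashSet<>();
for (char c : str.toCharArray()) {
    bigSet.add(c);
}
Set<Character> smallSet = new HashSet<>();
for (char c : rest.toCharArray()) {
    smallSet.add(c);
}
// True only if every character of rest also occurs in str
return bigSet.containsAll(smallSet);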
Here is another way to make sure all characters from one string are in the second string. It is a lengthy way but it is one of the very basic ways to work with characters from String in java. I know it is not optimal but it serves the purpose.
String rest="bac";
String str="baack";
char[] strChar = str.toCharArray();
char[] restChar = rest.toCharArray();
int count = 0;
for(int i=0;i<restChar.length;i++){
for(int j=0;j<strChar.length; j++){
if(restChar[i] == strChar[j]){
count++;
break; // count each character of rest at most once
}
}
}
if(count>=restChar.length){
System.out.println("All the characters from: "+rest+" are in: "+str);
}
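On Java 8 or later, the same per-character check can be written as a one-liner over the characters of the required string. A minimal sketch, with an illustrative class and method name:

```java
public class ContainsAllChars {

    // True if every character of 'required' occurs at least once in 'text'.
    // Repeated characters in 'required' are not counted separately.
    public static boolean containsAllChars(String text, String required) {
        return required.chars().allMatch(c -> text.indexOf(c) >= 0);
    }

    public static void main(String[] args) {
        System.out.println(containsAllChars("baack", "bac")); // true
        System.out.println(containsAllChars("baack", "bad")); // false
    }
}
```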

Removing leading zero in java code

How can I remove leading zeros in Java code? I tried several regex-based approaches such as
s.replaceFirst("^0+(?!$)", "") and replaceAll("^0*", "")
but they do not seem to be supported at my current compiler compliance level (1.3): the call is underlined in red with the message that the method replaceFirst(String, String) is undefined for the type String.
Part of My Java code
public String proc_MODEL(Element recElement)
{
String SEAT = "";
try
{
SEAT = setNullToString(recElement.getChildText("SEAT")); // xml value =0000500
if (SEAT.length()>0)
{
SEAT = SEAT.replaceFirst("^0*", ""); //I need to remove leading zero to only 500
}
}
catch (Exception e)
{
e.printStackTrace();
return "501 Exception in proc_MODEL";
}
}
Appreciate for help.
If you want remove leading zeros, you could parse to an Integer and convert back to a String with one line like
String seat = "001";// setNullToString(recElement.getChildText("SEAT"));
seat = Integer.valueOf(seat).toString();
System.out.println(seat);
Output is
1
Of course if you intend to use the value it's probably better to keep the int
int s = Integer.parseInt(seat);
System.out.println(s);
replaceFirst() was introduced in 1.4 and your compiler pre-dates that.
One possibility is to use something like:
public class testprog {
public static void main(String[] args) {
String s = "0001000";
while ((s.length() > 1) && (s.charAt(0) == '0'))
s = s.substring(1);
System.out.println(s);
}
}
It's not the most efficient code in the world but it'll get the job done.
A more efficient segment without unnecessary string creation could be:
public class testprog {
public static void main(String[] args) {
String s = "0001000";
int pos = 0;
int len = s.length();
while ((pos < len-1) && (s.charAt(pos) == '0'))
pos++;
s = s.substring(pos);
System.out.println(s);
}
}
Both of those also handle the degenerate cases of an empty string and a string containing only 0 characters.
Using a java method str.replaceAll("^0+(?!$)", "") would be simple;
First parameter:regex -- the regular expression to which this string is to be matched.
Second parameter: replacement -- the string which would replace matched expression.
As stated in Java documentation, 'replaceFirst' only started existing since Java 1.4 http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replaceFirst(java.lang.String,%20java.lang.String)
Use this function instead:
String removeLeadingZeros(String str) {
while (str.indexOf("0")==0)
str = str.substring(1);
return str;
}

java find if the string contains 2 other strings

I have 2 strings "test" "bet" and another string a="tbtetse". I need to check if the "tbtetse" contains the other two strings.
I was thinking I could find all the anagrams of string a and then look for the other two strings in those, but it doesn't work that way, and my anagram code also fails for a lengthy string.
Could you please help with any other ways to solve it?
Assuming you're trying to test whether the letters in a can be used to form an anagram of the test strings test and bet: I recommend making a dictionary (HashMap or whatever) of character counts from string a, indexed by character. Build a similar dictionary for the words you're testing. Then make sure that a has at least as many instances of each character from the test strings as they have.
Edit: Alcanzar suggests arrays of length 26 for holding the counts (one slot for each letter). Assuming you're dealing with only English letters, that is probably less of a hassle than dictionaries. If you don't know the number of allowed characters, the dictionary route is necessary.
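A minimal sketch of the dictionary approach described above, assuming the two words must be formed from separate letters of a (so their counts are added together). The class and method names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

public class LetterCounts {

    // Counts how often each character occurs in the given words combined.
    static Map<Character, Integer> countChars(String... words) {
        Map<Character, Integer> counts = new HashMap<>();
        for (String word : words) {
            for (char c : word.toCharArray()) {
                counts.merge(c, 1, Integer::sum);
            }
        }
        return counts;
    }

    // True if 'letters' has at least as many of each character as the
    // words need in total.
    public static boolean canForm(String letters, String... words) {
        Map<Character, Integer> available = countChars(letters);
        for (Map.Entry<Character, Integer> needed : countChars(words).entrySet()) {
            if (available.getOrDefault(needed.getKey(), 0) < needed.getValue()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(canForm("tbtetse", "test", "bet")); // true
        System.out.println(canForm("tbtetse", "test", "bev")); // false
    }
}
```

Because the map is keyed by Character, this works for any alphabet, which is the case the answer has in mind when the set of allowed characters is not known.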
Check below code, it may help you.
public class StringTest {
public static void main(String[] args) {
String str1 = "test";
String str2 = "bev";
String str3 = "tbtetse";
System.out.println(isStringPresent(str1, str2, str3));
}
private static boolean isStringPresent(String str1, String str2, String str3) {
if ((str1.length() + str2.length()) != str3.length()) {
return false;
} else {
String[] str1Arr = str1.split("");
String[] str2Arr = str2.split("");
for (String string : str1Arr) {
if (!str3.contains(string)) {
return false;
}
}
for (String string : str2Arr) {
if (!str3.contains(string)) {
return false;
}
}
}
return true;
}
}
basically you need to count characters in both sets and compare them
void fillInCharCounts(String word,int[] counts) {
for (int i = 0; i<word.length(); i++) {
char ch = word.charAt(i);
int index = ch - 'a';
counts[index]++;
}
}
int[] counts1 = new int[26];
int[] counts2 = new int[26];
fillInCharCounts("test",counts1);
fillInCharCounts("bet",counts1);
fillInCharCounts("tbtetse",counts2);
boolean failed = false;
for (int i = 0; i<counts1.length; i++) {
if (counts1[i] > counts2[i]) {
failed = true;
}
}
if (failed) {
// the letters of a are not enough to form both words
} else {
// a contains enough letters for both words
}
If you are generalizing it, don't forget to call .toLowerCase() on the word before sending it in (or fix the counting method).
Pseudo code:
Make a copy of string "tbtetse".
Loop through each character in "test".
Do a indexOf() search for the character in your copied string and remove it if found.
If not found, fail.
Do the same for the string "bet".
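The pseudo code above could be sketched in Java roughly as follows, using a StringBuilder as the working copy. The class and method names are assumptions made for this example:

```java
public class RemoveLetters {

    // Removes each character of each word from a working copy of 'letters'
    // and fails as soon as a character is no longer available.
    public static boolean canForm(String letters, String... words) {
        StringBuilder remaining = new StringBuilder(letters);
        for (String word : words) {
            for (int i = 0; i < word.length(); i++) {
                int pos = remaining.indexOf(String.valueOf(word.charAt(i)));
                if (pos == -1) {
                    return false;
                }
                remaining.deleteCharAt(pos);
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(canForm("tbtetse", "test", "bet")); // true
        System.out.println(canForm("tbtetse", "test", "bev")); // false
    }
}
```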
class WordLetter {
char letter;
int nth; // Occurrence of that letter
...
}
One now can use Sets
Set<WordLetter>
// "test" = { t0 e0 s0 t1 }
Then testing reduces to set operations. If both words need to be present, a union can be tested. If both words must be formed from separate letters, a set of the concatenation can be tested.
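One possible way to try this idea without writing the full WordLetter class is to encode each letter together with its occurrence number as a short string. This is only a sketch under that assumption; the names are illustrative:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class OccurrenceSets {

    // Encodes each letter with its occurrence number,
    // e.g. "test" becomes { t0, e0, s0, t1 }.
    public static Set<String> letterSet(String word) {
        Map<Character, Integer> seen = new HashMap<>();
        Set<String> result = new HashSet<>();
        for (char c : word.toCharArray()) {
            int nth = seen.merge(c, 1, Integer::sum) - 1;
            result.add(c + String.valueOf(nth));
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> available = letterSet("tbtetse");
        // Both words formed from separate letters: test the concatenation.
        System.out.println(available.containsAll(letterSet("test" + "bet"))); // true
        // Both words merely present: test each set on its own (the union).
        System.out.println(available.containsAll(letterSet("test"))
                && available.containsAll(letterSet("bet")));                  // true
    }
}
```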

Find difference between two Strings

Suppose I have two long strings that are almost the same.
String a = "this is a example";
String b = "this is a examp";
The code above is just an example; the actual strings are quite long.
The problem is that one string has 2 more characters than the other.
How can I check which two characters those are?
You can use StringUtils.difference(String first, String second).
This is how they implemented it:
public static String difference(String str1, String str2) {
if (str1 == null) {
return str2;
}
if (str2 == null) {
return str1;
}
int at = indexOfDifference(str1, str2);
if (at == INDEX_NOT_FOUND) {
return EMPTY;
}
return str2.substring(at);
}
public static int indexOfDifference(CharSequence cs1, CharSequence cs2) {
if (cs1 == cs2) {
return INDEX_NOT_FOUND;
}
if (cs1 == null || cs2 == null) {
return 0;
}
int i;
for (i = 0; i < cs1.length() && i < cs2.length(); ++i) {
if (cs1.charAt(i) != cs2.charAt(i)) {
break;
}
}
if (i < cs2.length() || i < cs1.length()) {
return i;
}
return INDEX_NOT_FOUND;
}
To find the difference between 2 Strings you can use the StringUtils class and the difference method. It compares the two Strings, and returns the portion where they differ.
StringUtils.difference(null, null) = null
StringUtils.difference("", "") = ""
StringUtils.difference("", "abc") = "abc"
StringUtils.difference("abc", "") = ""
StringUtils.difference("abc", "abc") = ""
StringUtils.difference("ab", "abxyz") = "xyz"
StringUtils.difference("abcde", "abxyz") = "xyz"
StringUtils.difference("abcde", "xyz") = "xyz"
Without iterating through the strings you can only know that they are different, not where - and that only if they are of different length. If you really need to know what the different characters are, you must step through both strings in tandem and compare characters at the corresponding places.
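Stepping through both strings in tandem might look like the sketch below. The class and method names are illustrative, and returning -1 for equal strings is a convention chosen for this example:

```java
public class FirstDifference {

    // Index of the first position where a and b differ, or -1 if they are
    // equal. If one string is a prefix of the other, the length of the
    // shorter string is returned.
    public static int firstDifference(String a, String b) {
        int common = Math.min(a.length(), b.length());
        for (int i = 0; i < common; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        return a.length() == b.length() ? -1 : common;
    }

    public static void main(String[] args) {
        String a = "this is a example";
        String b = "this is a examp";
        int at = firstDifference(a, b);
        System.out.println(at);              // 15
        System.out.println(a.substring(at)); // le
    }
}
```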
The following Java snippet efficiently computes a minimal set of characters that have to be removed from (or added to) the respective strings in order to make the strings equal. It's an example of dynamic programming.
import java.util.HashMap;
import java.util.Map;
public class StringUtils {
/**
* Examples
*/
public static void main(String[] args) {
System.out.println(diff("this is a example", "this is a examp")); // prints (le,)
System.out.println(diff("Honda", "Hyundai")); // prints (o,yui)
System.out.println(diff("Toyota", "Coyote")); // prints (Ta,Ce)
System.out.println(diff("Flomax", "Volmax")); // prints (Fo,Vo)
}
/**
* Returns a minimal set of characters that have to be removed from (or added to) the respective
* strings to make the strings equal.
*/
public static Pair<String> diff(String a, String b) {
return diffHelper(a, b, new HashMap<>());
}
/**
* Recursively compute a minimal set of characters while remembering already computed substrings.
* Runs in O(n^2).
*/
private static Pair<String> diffHelper(String a, String b, Map<Long, Pair<String>> lookup) {
long key = ((long) a.length()) << 32 | b.length();
if (!lookup.containsKey(key)) {
Pair<String> value;
if (a.isEmpty() || b.isEmpty()) {
value = new Pair<>(a, b);
} else if (a.charAt(0) == b.charAt(0)) {
value = diffHelper(a.substring(1), b.substring(1), lookup);
} else {
Pair<String> aa = diffHelper(a.substring(1), b, lookup);
Pair<String> bb = diffHelper(a, b.substring(1), lookup);
if (aa.first.length() + aa.second.length() < bb.first.length() + bb.second.length()) {
value = new Pair<>(a.charAt(0) + aa.first, aa.second);
} else {
value = new Pair<>(bb.first, b.charAt(0) + bb.second);
}
}
lookup.put(key, value);
}
return lookup.get(key);
}
public static class Pair<T> {
public Pair(T first, T second) {
this.first = first;
this.second = second;
}
public final T first, second;
public String toString() {
return "(" + first + "," + second + ")";
}
}
}
To directly get only the changed section, and not just the end, you can use Google's Diff Match Patch.
List<Diff> diffs = new DiffMatchPatch().diffMain("stringend", "stringdiffend");
for (Diff diff : diffs) {
if (diff.operation == Operation.INSERT) {
return diff.text; // Return only single diff, can also find multiple based on use case
}
}
For Android, add: implementation 'org.bitbucket.cowwoc:diff-match-patch:1.2'
This package is far more powerful than just this feature, it is mainly used for creating diff related tools.
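String strDiffChop(String s1, String s2) {
    if (s1.length() > s2.length()) {
        // s1 is longer: return the part beyond the length of s2
        return s1.substring(s2.length());
    } else if (s2.length() > s1.length()) {
        // s2 is longer: return the part beyond the length of s1
        return s2.substring(s1.length());
    } else {
        return null;
    }
}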
Google's Diff Match Patch is good, but it was a pain to install into my Java maven project. Just adding a maven dependency did not work; eclipse just created the directory and added the lastUpdated info files. Finally, on the third try, I added the following to my pom:
<dependency>
<groupId>fun.mike</groupId>
<artifactId>diff-match-patch</artifactId>
<version>0.0.2</version>
</dependency>
Then I manually placed the jar and source jar files into my .m2 repo from https://search.maven.org/search?q=g:fun.mike%20AND%20a:diff-match-patch%20AND%20v:0.0.2
After all that, the following code worked:
import fun.mike.dmp.Diff;
import fun.mike.dmp.DiffMatchPatch;
DiffMatchPatch dmp = new DiffMatchPatch();
LinkedList<Diff> diffs = dmp.diff_main("Hello World.", "Goodbye World.");
System.out.println(diffs);
The result:
[Diff(DELETE,"Hell"), Diff(INSERT,"G"), Diff(EQUAL,"o"), Diff(INSERT,"odbye"), Diff(EQUAL," World.")]
Obviously, this was not originally written (or even ported fully) into Java. (diff_main? I can feel the C burning into my eyes :-) )
Still, it works. And for people working with long and complex strings, it can be a valuable tool.
To find the words that are different in the two lines, one can use the following code.
String[] strList1 = str1.split(" ");
String[] strList2 = str2.split(" ");
List<String> list1 = Arrays.asList(strList1);
List<String> list2 = Arrays.asList(strList2);
// Prepare a union
List<String> union = new ArrayList<>(list1);
union.addAll(list2);
// Prepare an intersection
List<String> intersection = new ArrayList<>(list1);
intersection.retainAll(list2);
// Subtract the intersection from the union
union.removeAll(intersection);
for (String s : union) {
System.out.println(s);
}
In the end, you will have a list of the words that appear in only one of the two lists. One can modify it easily to get only the words unique to the first list or to the second list. This can be done by removing the intersection from only list1 or list2 instead of from the union.
Computing the exact location can be done by adding up the lengths of each word in the split list (along with the splitting regex) or by simply doing String.indexOf("subStr").
On top of using StringUtils.difference(String first, String second) as seen in other answers, you can also use StringUtils.indexOfDifference(String first, String second) to get the index of where the strings start to differ. Ex:
StringUtils.indexOfDifference("abc", "dabc") = 0
StringUtils.indexOfDifference("abc", "abcd") = 3
where 0 is used as the starting index.
Another great library for discovering the difference between strings is DiffUtils at https://github.com/java-diff-utils. I used Dmitry Naumenko's fork:
public void testDiffChange() {
final List<String> changeTestFrom = Arrays.asList("aaa", "bbb", "ccc");
final List<String> changeTestTo = Arrays.asList("aaa", "zzz", "ccc");
System.out.println("changeTestFrom=" + changeTestFrom);
System.out.println("changeTestTo=" + changeTestTo);
final Patch<String> patch0 = DiffUtils.diff(changeTestFrom, changeTestTo);
System.out.println("patch=" + Arrays.toString(patch0.getDeltas().toArray()));
String original = "abcdefghijk";
String badCopy = "abmdefghink";
List<Character> originalList = original
.chars() // Convert to an IntStream
.mapToObj(i -> (char) i) // Convert int to char, which gets boxed to Character
.collect(Collectors.toList()); // Collect in a List<Character>
List<Character> badCopyList = badCopy.chars().mapToObj(i -> (char) i).collect(Collectors.toList());
System.out.println("original=" + original);
System.out.println("badCopy=" + badCopy);
final Patch<Character> patch = DiffUtils.diff(originalList, badCopyList);
System.out.println("patch=" + Arrays.toString(patch.getDeltas().toArray()));
}
The results show exactly what changed where (zero based counting):
changeTestFrom=[aaa, bbb, ccc]
changeTestTo=[aaa, zzz, ccc]
patch=[[ChangeDelta, position: 1, lines: [bbb] to [zzz]]]
original=abcdefghijk
badCopy=abmdefghink
patch=[[ChangeDelta, position: 2, lines: [c] to [m]], [ChangeDelta, position: 9, lines: [j] to [n]]]
For a simple use case like this, you can check the lengths of the strings and use the split function. For your example:
a.split(b)[1]
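Note that String.split treats its argument as a regular expression, and indexing [1] throws if the shorter string does not occur. A regex-free sketch of the same idea, assuming the shorter string is a prefix of the longer one (the names are illustrative):

```java
public class SuffixAfterPrefix {

    // Returns what follows 'shorter' at the start of 'longer', or null
    // when 'longer' does not start with 'shorter'. No regex is involved.
    public static String extraSuffix(String longer, String shorter) {
        return longer.startsWith(shorter)
                ? longer.substring(shorter.length())
                : null;
    }

    public static void main(String[] args) {
        System.out.println(extraSuffix("this is a example", "this is a examp")); // le
        System.out.println(extraSuffix("1+1=2", "1+1"));                         // =2
    }
}
```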
I think the Levenshtein algorithm and the 3rd party libraries brought out for this very simple (and perhaps poorly stated?) test case are WAY overblown.
Assuming your example does not mean that the two strings always differ at the end, I'd suggest the JDK's Arrays.mismatch (added in Java 9, with overloads for byte[], char[] and other array types) to find the first index where the two arrays differ.
String longer = "this is a example";
String shorter = "this is a examp";
int differencePoint = Arrays.mismatch( longer.toCharArray(), shorter.toCharArray() );
System.out.println( differencePoint );
You could now repeat the process if you suspect the second character is further along in the String.
Or, if as you suggest in your example the two characters are together, there is nothing further to do. Your answer then would be:
System.out.println( longer.charAt( differencePoint ) );
System.out.println( longer.charAt( differencePoint + 1 ) );
If your string contains characters outside of the Basic Multilingual Plane - for example emoji - then you have to use a different technique. For example,
String a = "a 🐣 is cuter than a 🐇.";
String b = "a 🐣 is cuter than a 🐹.";
int firstDifferentChar = Arrays.mismatch( a.toCharArray(), b.toCharArray() );
int firstDifferentCodepoint = Arrays.mismatch( a.codePoints().toArray(), b.codePoints().toArray() );
System.out.println( firstDifferentChar ); // prints 22!
System.out.println( firstDifferentCodepoint ); // prints 20, which is correct.
System.out.println( a.codePoints().toArray()[ firstDifferentCodepoint ] ); // prints out 128007
System.out.println( new String( Character.toChars( 128007 ) ) ); // this prints the rabbit glyph.
You may try this
String a = "this is a example";
String b = "this is a examp";
String ans= a.replace(b, "");
System.out.print(ans);
//ans=le

Is there an existing library method that checks if a String is all upper case or lower case in Java?

I know there are plenty of upper() methods in Java and other frameworks like Apache commons lang, which convert a String to all upper case.
Are there any common libraries that provide a method like isUpper(String s) and isLower(String s), to check if all the characters in the String are upper or lower case?
EDIT:
Many good answers about converting to Upper and comparing to this. I guess I should have been a bit more specific, and said that I already had thought of that, but I was hoping to be able to use an existing method for this.
Good comment about possible inclusion of this in apache.commons.lang.StringUtils.
Someone has even submitted a patch (20090310). Hopefully we will see this soon.
https://issues.apache.org/jira/browse/LANG-471
EDIT:
What I needed this method for, was to capitalize names of hotels that sometimes came in all uppercase. I only wanted to capitalize them if they were all lower or upper case.
I did run in to the problems with non letter chars mentioned in some of the posts, and ended up doing something like this:
private static boolean isAllUpper(String s) {
for(char c : s.toCharArray()) {
if(Character.isLetter(c) && Character.isLowerCase(c)) {
return false;
}
}
return true;
}
This discussion and differing solutions (with different problems), clearly shows that there is a need for a good solid isAllUpper(String s) method in commons.lang
Until then I guess that the myString.toUpperCase().equals(myString) is the best way to go.
Now in StringUtils isAllUpperCase
This if condition can get the expected result:
String input = "ANYINPUT";
if(input.equals(input.toUpperCase()))
{
// input is all upper case
}
else if (input.equals(input.toLowerCase()))
{
// input is all lower case
}
else
{
// input is mixed case
}
Not a library function unfortunately, but it's fairly easy to roll your own. If efficiency is a concern, this might be faster than s.toUpperCase().equals(s) because it can bail out early.
public static boolean isUpperCase(String s)
{
    for (int i=0; i<s.length(); i++)
    {
        if (!Character.isUpperCase(s.charAt(i)))
        {
            return false;
        }
    }
    return true;
}
Edit: As other posters and commenters have noted, we need to consider the behaviour when the string contains non-letter characters: should isUpperCase("HELLO1") return true or false? The function above will return false because '1' is not an upper case character, but this is possibly not the behaviour you want. An alternative definition which would return true in this case would be:
public static boolean isUpperCase2(String s)
{
    for (int i=0; i<s.length(); i++)
    {
        if (Character.isLowerCase(s.charAt(i)))
        {
            return false;
        }
    }
    return true;
}
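The difference between the two definitions is easiest to see side by side. The sketch below repeats both checks, renamed isStrictlyUpper and hasNoLowerCase purely for illustration, and runs them on a few inputs.

```java
class UpperCaseDefinitions {
    // Strict: every character must itself be an upper-case letter.
    static boolean isStrictlyUpper(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isUpperCase(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Lenient: digits, spaces and punctuation pass; only lower-case letters fail.
    static boolean hasNoLowerCase(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLowerCase(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"HELLO", "HELLO1", "Hello", ""}) {
            System.out.printf("\"%s\" strict=%b lenient=%b%n",
                    s, isStrictlyUpper(s), hasNoLowerCase(s));
        }
    }
}
```

Both variants return true for the empty string, because the loop body never runs. Add an explicit isEmpty() check if that is not what you want.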
Not that I know of.
You can copy the string, convert the copy to lower/upper case, and compare it to the original.
Or write a loop that checks each character for being lower or upper case.
This method might be faster than comparing a String to its upper-case version as it requires only 1 pass:
public static boolean isUpper(String s)
{
    for(char c : s.toCharArray())
    {
        if(! Character.isUpperCase(c))
            return false;
    }
    return true;
}
Please note that there might be some localization issues with different character sets. I don't have any first hand experience but I think there are some languages (like Turkish) where different lower case letters can map to the same upper case letter.
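To make the Turkish example concrete, here is a small sketch of how the letter i converts under different locales. The class and the helper sameUpperCase are made up for this illustration. It only shows that conversions are locale-dependent; whether that affects a particular all-upper check depends on how the check is written.

```java
import java.util.Locale;

class TurkishCaseDemo {
    // Hypothetical helper: do two strings upper-case to the same text in this locale?
    static boolean sameUpperCase(String a, String b, Locale locale) {
        return a.toUpperCase(locale).equals(b.toUpperCase(locale));
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        String dotlessI = "\u0131"; // Turkish lower-case dotless i

        // Locale-neutral rules: both 'i' and dotless i upper-case to 'I'.
        System.out.println(sameUpperCase("i", dotlessI, Locale.ROOT)); // true

        // Turkish rules: 'i' becomes dotted capital I (U+0130), dotless i becomes 'I'.
        System.out.println(sameUpperCase("i", dotlessI, turkish));     // false
        System.out.println("i".toUpperCase(turkish).equals("\u0130")); // true
    }
}
```

Passing an explicit Locale (for example Locale.ROOT) to toUpperCase and toLowerCase avoids results that change with the machine's default locale.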
Guava's CharMatchers tend to offer very expressive and efficient solutions to this kind of problem.
CharMatcher.javaUpperCase().matchesAllOf("AAA"); // true
CharMatcher.javaUpperCase().matchesAllOf("A SENTENCE"); // false
CharMatcher.javaUpperCase().or(CharMatcher.whitespace()).matchesAllOf("A SENTENCE"); // true
CharMatcher.javaUpperCase().or(CharMatcher.javaLetter().negate()).matchesAllOf("A SENTENCE"); // true
CharMatcher.javaLowerCase().matchesNoneOf("A SENTENCE"); // true
A static import for com.google.common.base.CharMatcher.* can help make these more succinct.
javaLowerCase().matchesNoneOf("A SENTENCE"); // true
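Try this; it may help.
import java.util.regex.Pattern;

// Upper-case ASCII letters and digits only; compiled once and reused.
private static final Pattern UPPER_ALNUM = Pattern.compile("[A-Z0-9]+");

public static boolean isUpperCase(String str) {
    // matches() requires the whole string to match, not just a prefix.
    return UPPER_ALNUM.matcher(str).matches();
}
To check for something else, just change the regex.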
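For instance, assuming the whole string has to match, a few variations of the pattern might look like the sketch below. The constant and method names are illustrative only. \p{Ll} is the Unicode lower-case letter category supported by java.util.regex, and \P{Ll} is its negation.

```java
import java.util.regex.Pattern;

class RegexCaseChecks {
    // ASCII upper-case letters and digits, at least one character.
    static final Pattern ASCII_UPPER_ALNUM = Pattern.compile("[A-Z0-9]+");
    // ASCII lower-case letters and digits, at least one character.
    static final Pattern ASCII_LOWER_ALNUM = Pattern.compile("[a-z0-9]+");
    // Any text that contains no Unicode lower-case letter (may be empty).
    static final Pattern NO_LOWER_CASE = Pattern.compile("\\P{Ll}*");

    static boolean matchesFully(Pattern pattern, String s) {
        // matches() succeeds only if the entire input fits the pattern.
        return pattern.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesFully(ASCII_UPPER_ALNUM, "ABC123"));     // true
        System.out.println(matchesFully(ASCII_UPPER_ALNUM, "ABc123"));     // false
        System.out.println(matchesFully(ASCII_LOWER_ALNUM, "abc123"));     // true
        System.out.println(matchesFully(NO_LOWER_CASE, "A SENTENCE"));     // true
        System.out.println(matchesFully(NO_LOWER_CASE, "\u00C9cole"));     // false
    }
}
```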
I realise that this question is quite old, but the accepted answer uses a deprecated API, and there's a question about how to do it using ICU4J. This is how I did it:
s.chars().filter(UCharacter::isLetter).allMatch(UCharacter::isUpperCase)
If you expect your input string to be short, you could go with myString.toUpperCase().equals(myString) as you suggested. It's short and expressive.
But you can also use streams:
boolean allUpper = myString.chars().noneMatch(Character::isLowerCase);
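One caveat with chars(): it iterates over UTF-16 code units, so a letter outside the Basic Multilingual Plane is seen as two surrogates, and neither half counts as lower case. If such characters can appear in your input, codePoints() is a possible alternative. A minimal sketch, with class and method names invented for the example:

```java
class CodePointCaseCheck {
    // Tests UTF-16 code units; each half of a surrogate pair is examined alone.
    static boolean noLowerCaseUnits(String s) {
        return s.chars().noneMatch(Character::isLowerCase);
    }

    // Tests whole Unicode code points, so supplementary letters are recognised.
    static boolean noLowerCaseCodePoints(String s) {
        return s.codePoints().noneMatch(Character::isLowerCase);
    }

    public static void main(String[] args) {
        // U+10428 is a lower-case letter from the Deseret block (a surrogate pair in UTF-16).
        String deseretSmall = new String(Character.toChars(0x10428));

        System.out.println(noLowerCaseUnits("HELLO"));               // true
        System.out.println(noLowerCaseCodePoints("HELLO"));          // true
        System.out.println(noLowerCaseUnits(deseretSmall));          // true (lower-case letter missed)
        System.out.println(noLowerCaseCodePoints(deseretSmall));     // false
    }
}
```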
You can use java.lang.Character.isUpperCase().
Then you can easily write a method that checks whether your string is upper case (with a simple loop).
Calling toUpperCase() on your string and then checking whether the result equals the original will probably be slower.
Here's a solution I came up with that is fairly universal: it doesn't require any libraries or special imports, should work with any version of Java, requires only a single pass, and should be much faster than any regex-based solution:
public static final boolean isUnicaseString(String input) {
    char[] carr = input.toCharArray();
    // Get the index of the first letter
    int i = 0;
    for (; i < carr.length; i++) {
        if (Character.isLetter(carr[i])) {
            break;
        }
    }
    // If we went all the way to the end above, then return true; no case at all is technically unicase
    if (i == carr.length) {
        return true;
    }
    // Determine if first letter is uppercase
    boolean firstUpper = Character.isUpperCase(carr[i]);
    for (; i < carr.length; i++) {
        // Check each remaining letter, stopping when the case doesn't match the first
        if (Character.isLetter(carr[i]) && Character.isUpperCase(carr[i]) != firstUpper) {
            return false;
        }
    }
    // If we didn't stop above, then it's unicase
    return true;
}
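To tie this back to the use case in the question (capitalizing hotel names only when they arrive in a single case), here is a rough sketch. It uses a compact, two-pass variant of the unicase check rather than the method above, and it ignores letters that have no case. The class name, isUnicase, capitalizeWords and normalize are all hypothetical names, and the capitalization rule (first character of each whitespace-separated word) is an assumption.

```java
import java.util.Locale;

class HotelNameNormalizer {
    // True unless the string contains both an upper-case and a lower-case letter.
    static boolean isUnicase(String s) {
        boolean anyUpper = s.codePoints().anyMatch(Character::isUpperCase);
        boolean anyLower = s.codePoints().anyMatch(Character::isLowerCase);
        return !(anyUpper && anyLower);
    }

    // Lower-cases everything, then upper-cases the first character of each word.
    static String capitalizeWords(String s) {
        StringBuilder out = new StringBuilder(s.length());
        boolean startOfWord = true;
        for (char c : s.toLowerCase(Locale.ROOT).toCharArray()) {
            out.append(startOfWord ? Character.toUpperCase(c) : c);
            startOfWord = Character.isWhitespace(c);
        }
        return out.toString();
    }

    // Leaves mixed-case names alone, since they were presumably typed deliberately.
    static String normalize(String name) {
        return isUnicase(name) ? capitalizeWords(name) : name;
    }

    public static void main(String[] args) {
        System.out.println(normalize("GRAND HOTEL"));  // Grand Hotel
        System.out.println(normalize("grand hotel"));  // Grand Hotel
        System.out.println(normalize("McDonald Inn")); // McDonald Inn
    }
}
```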
