Find difference between two Strings - java

Suppose I have two long strings that are almost the same.
String a = "this is a example";
String b = "this is a examp";
The code above is just an example; the actual strings are quite long.
The problem is that one string has 2 more characters than the other.
How can I find out which two characters those are?

You can use StringUtils.difference(String first, String second).
This is how they implemented it:
public static String difference(String str1, String str2) {
if (str1 == null) {
return str2;
}
if (str2 == null) {
return str1;
}
int at = indexOfDifference(str1, str2);
if (at == INDEX_NOT_FOUND) {
return EMPTY;
}
return str2.substring(at);
}
public static int indexOfDifference(CharSequence cs1, CharSequence cs2) {
if (cs1 == cs2) {
return INDEX_NOT_FOUND;
}
if (cs1 == null || cs2 == null) {
return 0;
}
int i;
for (i = 0; i < cs1.length() && i < cs2.length(); ++i) {
if (cs1.charAt(i) != cs2.charAt(i)) {
break;
}
}
if (i < cs2.length() || i < cs1.length()) {
return i;
}
return INDEX_NOT_FOUND;
}

To find the difference between 2 Strings you can use the StringUtils class and the difference method. It compares the two Strings, and returns the portion where they differ.
StringUtils.difference(null, null) = null
StringUtils.difference("", "") = ""
StringUtils.difference("", "abc") = "abc"
StringUtils.difference("abc", "") = ""
StringUtils.difference("abc", "abc") = ""
StringUtils.difference("ab", "abxyz") = "xyz"
StringUtils.difference("abcde", "abxyz") = "xyz"
StringUtils.difference("abcde", "xyz") = "xyz"

Without iterating through the strings you can only know that they are different, not where - and that only if they are of different length. If you really need to know what the different characters are, you must step through both strings in tandem and compare characters at the corresponding places.
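Stepping through both strings in tandem could look roughly like the sketch below. The class and method names are made up for illustration, and the sketch assumes a purely positional comparison: it does not try to re-align the strings after an insertion or deletion.

```java
import java.util.ArrayList;
import java.util.List;

final class TandemCompare {
    /** Returns every index at which the two strings differ, comparing position by position. */
    static List<Integer> differingIndexes(String a, String b) {
        List<Integer> result = new ArrayList<>();
        int max = Math.max(a.length(), b.length());
        for (int i = 0; i < max; i++) {
            // Past the end of the shorter string, every position counts as a difference.
            if (i >= a.length() || i >= b.length() || a.charAt(i) != b.charAt(i)) {
                result.add(i);
            }
        }
        return result;
    }
}
```

For the strings from the question this yields the indexes 15 and 16, i.e. the positions of the trailing "le".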

The following Java snippet efficiently computes a minimal set of characters that have to be removed from (or added to) the respective strings in order to make the strings equal. It's an example of dynamic programming.
import java.util.HashMap;
import java.util.Map;
public class StringUtils {
/**
* Examples
*/
public static void main(String[] args) {
System.out.println(diff("this is a example", "this is a examp")); // prints (le,)
System.out.println(diff("Honda", "Hyundai")); // prints (o,yui)
System.out.println(diff("Toyota", "Coyote")); // prints (Ta,Ce)
System.out.println(diff("Flomax", "Volmax")); // prints (Fo,Vo)
}
/**
* Returns a minimal set of characters that have to be removed from (or added to) the respective
* strings to make the strings equal.
*/
public static Pair<String> diff(String a, String b) {
return diffHelper(a, b, new HashMap<>());
}
/**
* Recursively compute a minimal set of characters while remembering already computed substrings.
* Runs in O(n^2).
*/
private static Pair<String> diffHelper(String a, String b, Map<Long, Pair<String>> lookup) {
long key = ((long) a.length()) << 32 | b.length();
if (!lookup.containsKey(key)) {
Pair<String> value;
if (a.isEmpty() || b.isEmpty()) {
value = new Pair<>(a, b);
} else if (a.charAt(0) == b.charAt(0)) {
value = diffHelper(a.substring(1), b.substring(1), lookup);
} else {
Pair<String> aa = diffHelper(a.substring(1), b, lookup);
Pair<String> bb = diffHelper(a, b.substring(1), lookup);
if (aa.first.length() + aa.second.length() < bb.first.length() + bb.second.length()) {
value = new Pair<>(a.charAt(0) + aa.first, aa.second);
} else {
value = new Pair<>(bb.first, b.charAt(0) + bb.second);
}
}
lookup.put(key, value);
}
return lookup.get(key);
}
public static class Pair<T> {
public Pair(T first, T second) {
this.first = first;
this.second = second;
}
public final T first, second;
public String toString() {
return "(" + first + "," + second + ")";
}
}
}

To directly get only the changed section, and not just the end, you can use Google's Diff Match Patch.
List<Diff> diffs = new DiffMatchPatch().diffMain("stringend", "stringdiffend");
for (Diff diff : diffs) {
if (diff.operation == Operation.INSERT) {
return diff.text; // Return only single diff, can also find multiple based on use case
}
}
For Android, add: implementation 'org.bitbucket.cowwoc:diff-match-patch:1.2'
This package is far more powerful than just this feature, it is mainly used for creating diff related tools.

String strDiffChop(String s1, String s2) {
// Assumes the shorter string is a prefix of the longer one
if (s1.length() > s2.length()) {
return s1.substring(s2.length());
} else if (s2.length() > s1.length()) {
return s2.substring(s1.length());
} else {
return null;
}
}

Google's Diff Match Patch is good, but it was a pain to install into my Java maven project. Just adding a maven dependency did not work; eclipse just created the directory and added the lastUpdated info files. Finally, on the third try, I added the following to my pom:
<dependency>
<groupId>fun.mike</groupId>
<artifactId>diff-match-patch</artifactId>
<version>0.0.2</version>
</dependency>
Then I manually placed the jar and source jar files into my .m2 repo from https://search.maven.org/search?q=g:fun.mike%20AND%20a:diff-match-patch%20AND%20v:0.0.2
After all that, the following code worked:
import fun.mike.dmp.Diff;
import fun.mike.dmp.DiffMatchPatch;
DiffMatchPatch dmp = new DiffMatchPatch();
LinkedList<Diff> diffs = dmp.diff_main("Hello World.", "Goodbye World.");
System.out.println(diffs);
The result:
[Diff(DELETE,"Hell"), Diff(INSERT,"G"), Diff(EQUAL,"o"), Diff(INSERT,"odbye"), Diff(EQUAL," World.")]
Obviously, this was not originally written (or even ported fully) into Java. (diff_main? I can feel the C burning into my eyes :-) )
Still, it works. And for people working with long and complex strings, it can be a valuable tool.

To find the words that are different in the two lines, one can use the following code.
String[] strList1 = str1.split(" ");
String[] strList2 = str2.split(" ");
List<String> list1 = Arrays.asList(strList1);
List<String> list2 = Arrays.asList(strList2);
// Prepare a union
List<String> union = new ArrayList<>(list1);
union.addAll(list2);
// Prepare an intersection
List<String> intersection = new ArrayList<>(list1);
intersection.retainAll(list2);
// Subtract the intersection from the union
union.removeAll(intersection);
for (String s : union) {
System.out.println(s);
}
In the end, you will have a list of the words that differ between the two lists. It can easily be modified to give only the differing words of the first list or of the second list rather than both at once: remove the intersection from only list1 or list2 instead of from the union.
The exact location can be computed by adding up the lengths of the words in the split list (along with the separator that was split on) or by simply calling String.indexOf("subStr").

On top of using StringUtils.difference(String first, String second) as seen in other answers, you can also use StringUtils.indexOfDifference(String first, String second) to get the index of where the strings start to differ. Ex:
StringUtils.indexOfDifference("abc", "dabc") = 0
StringUtils.indexOfDifference("abc", "abcd") = 3
where 0 is used as the starting index.

Another great library for discovering the difference between strings is DiffUtils at https://github.com/java-diff-utils. I used Dmitry Naumenko's fork:
public void testDiffChange() {
final List<String> changeTestFrom = Arrays.asList("aaa", "bbb", "ccc");
final List<String> changeTestTo = Arrays.asList("aaa", "zzz", "ccc");
System.out.println("changeTestFrom=" + changeTestFrom);
System.out.println("changeTestTo=" + changeTestTo);
final Patch<String> patch0 = DiffUtils.diff(changeTestFrom, changeTestTo);
System.out.println("patch=" + Arrays.toString(patch0.getDeltas().toArray()));
String original = "abcdefghijk";
String badCopy = "abmdefghink";
List<Character> originalList = original
.chars() // Convert to an IntStream
.mapToObj(i -> (char) i) // Convert int to char, which gets boxed to Character
.collect(Collectors.toList()); // Collect in a List<Character>
List<Character> badCopyList = badCopy.chars().mapToObj(i -> (char) i).collect(Collectors.toList());
System.out.println("original=" + original);
System.out.println("badCopy=" + badCopy);
final Patch<Character> patch = DiffUtils.diff(originalList, badCopyList);
System.out.println("patch=" + Arrays.toString(patch.getDeltas().toArray()));
}
The results show exactly what changed where (zero based counting):
changeTestFrom=[aaa, bbb, ccc]
changeTestTo=[aaa, zzz, ccc]
patch=[[ChangeDelta, position: 1, lines: [bbb] to [zzz]]]
original=abcdefghijk
badCopy=abmdefghink
patch=[[ChangeDelta, position: 2, lines: [c] to [m]], [ChangeDelta, position: 9, lines: [j] to [n]]]

For a simple use case like this, you can check the sizes of the strings and use the split function. For your example:
a.split(b)[1]
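One thing to keep in mind is that String.split treats its argument as a regular expression, so the one-liner above can misbehave when the shorter string contains characters such as "." or "+". A minimal sketch that avoids regular expressions, assuming the shorter string is a prefix of the longer one (the class and method names are hypothetical):

```java
final class SuffixDiff {
    /**
     * Returns the part of `longer` that follows `shorter`,
     * or null if `shorter` is not a prefix of `longer`.
     */
    static String extraSuffix(String longer, String shorter) {
        return longer.startsWith(shorter)
                ? longer.substring(shorter.length())
                : null;
    }
}
```

If you prefer to keep split, wrapping the argument as a.split(Pattern.quote(b)) should have the same effect for literal text.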

I think the Levenshtein algorithm and the 3rd party libraries brought out for this very simple (and perhaps poorly stated?) test case are WAY overblown.
Assuming your example does not suggest the two characters are always different at the end, I'd suggest the JDK's Arrays.mismatch( char[], char[] ) (available since Java 9) to find the first index where the two arrays differ.
String longer = "this is a example";
String shorter = "this is a examp";
int differencePoint = Arrays.mismatch( longer.toCharArray(), shorter.toCharArray() );
System.out.println( differencePoint );
You could now repeat the process if you suspect the second character is further along in the String.
Or, if as you suggest in your example the two characters are together, there is nothing further to do. Your answer then would be:
System.out.println( longer.charAt( differencePoint ) );
System.out.println( longer.charAt( differencePoint + 1 ) );
If your string contains characters outside of the Basic Multilingual Plane - for example emoji - then you have to use a different technique. For example,
String a = "a 🐣 is cuter than a 🐇.";
String b = "a 🐣 is cuter than a 🐹.";
int firstDifferentChar = Arrays.mismatch( a.toCharArray(), b.toCharArray() );
int firstDifferentCodepoint = Arrays.mismatch( a.codePoints().toArray(), b.codePoints().toArray() );
System.out.println( firstDifferentChar ); // prints 22!
System.out.println( firstDifferentCodepoint ); // prints 20, which is correct.
System.out.println( a.codePoints().toArray()[ firstDifferentCodepoint ] ); // prints out 128007
System.out.println( new String( Character.toChars( 128007 ) ) ); // this prints the rabbit glyph.
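The "repeat the process" idea mentioned above could be sketched like this. It is only a sketch under the assumption that the shorter string can be obtained from the longer one purely by deleting characters; the class and method names are invented for illustration. It uses the range overload of Arrays.mismatch (Java 9+) and works on char values, so the caveat about characters outside the Basic Multilingual Plane still applies.

```java
import java.util.Arrays;

final class ExtraChars {
    /** Collects the characters of `longer` that have no counterpart in `shorter`. */
    static String extraChars(String longer, String shorter) {
        char[] l = longer.toCharArray();
        char[] s = shorter.toCharArray();
        StringBuilder extra = new StringBuilder();
        int li = 0; // current position in longer
        int si = 0; // current position in shorter
        while (li < l.length) {
            // Relative index of the first mismatch between the two remaining ranges.
            int rel = Arrays.mismatch(l, li, l.length, s, si, s.length);
            if (rel < 0) {
                break; // the remaining ranges are identical
            }
            li += rel;
            si += rel;
            if (li >= l.length) {
                break; // longer is exhausted; the assumption did not hold
            }
            extra.append(l[li]); // this character is extra in longer
            li++;                // skip it and compare the rest again
        }
        return extra.toString();
    }
}
```

With the strings from the question, extraChars(longer, shorter) returns "le"; it also handles extra characters that are not adjacent, such as "abXcdYe" versus "abcde".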

You may try this
String a = "this is a example";
String b = "this is a examp";
String ans= a.replace(b, "");
System.out.print(ans);
//ans=le

Related

Sort Array List lexicographically ignoring integers

I have this code. I want to order a list of strings. Every item in the list consists of a three word sentence. I want to ignore the first word and sort the sentence lexicographically with the 2nd and 3rd words. If the 2nd or 3rd words contain an integer, I want to ignore sorting them but add them to the end of the list.
For example: (19th apple orange, 17th admin 7th, 19th apple table) should be sorted in the list as (19th apple orange, 19th apple table, 17th admin 7th)
So far my code only ignores the first word and sort lexicographically the rest of the lists
public static List<String> sortOrders(List<String> orderList) {
// Write your code here
Collections.sort( orderList,
(a, b) -> a.split(" +", 2)[1].compareTo( b.split(" +", 2)[1] )
);
return orderList;
}
In your compare method check for numbers first and then strings. You just have to add code to the steps you described:
Here's a pseudo code of what you described
...
(a,b) -> {
// Every item in the list consists of a three word sentence.
var awords = a.split(" ")
var bwords = b.split(" ")
// I want to ignore the first word
var as = awords[1] + " " + awords[2]
var bs ...
// and sort the sentence lexicographically with the 2nd and 3rd words.
var r = as.compareTo(bs)
// If the 2nd or 3rd words contain an integer, I want to ignore sorting them but add them to the end of the list
if ( as.matches(".*\\d.*") ) {
return 1 // a contains a number, so it goes after b
} else {
return r
}
}
...
It's not clear what to do if both have numbers, e.g. a 1 a vs a 1 b, but that's something you have to clarify.
So basically you just have to go, divide each of the statements in your problem and add some code that solves it (like the example below )
You might notice there are some gaps (like what to do if both of them contain numbers). Once you have a working solution you can clean it up.
Another alternative with a similar idea
var as = a.substring(a.indexOf(" ")) // "a b c" -> "b c"
var bs = b.substring(b.indexOf(" ")) // "a b c" -> "b c"
return as.matches(".*\\d.*") ? 1 : as.compareTo(bs);
Remember that the compare(T,T) method returns a value < 0 if a is "lower" than b. So if a has numbers, it will always be "higher" and the method should return 1; if b has numbers, then a will be "lower" and it should return -1; otherwise just compare the strings.
Here's the full program:
import java.util.*;
public class Sl {
public static void main(String ... args ) {
var list = Arrays.asList("19th apple orange", "17th admin 7th", "19th apple table");
Collections.sort(list, (a, b) -> {
// just use the last two words
var as = a.substring(a.indexOf(" "));
var bs = b.substring(b.indexOf(" "));
// if a has a number, will always be higher
return as.matches(".*\\d+.*") ? 1
// if b has a number, a will always be lower
: bs.matches(".*\\d+.*") ? -1
// if none of the above, compare lexicographically the strings
: as.compareTo(bs);
});
System.out.println(list);
}
}
If you aren't careful, you will get an error such as Exception in thread "main" java.lang.IllegalArgumentException: Comparison method violates its general contract!
To prevent that, you can create a comparator that splits off the first word and checks the remainder (the second and third words) for digits. If a contains digits and b does not, the comparator returns 1, so a is considered greater and is sent to the bottom of the list. Otherwise, if b contains digits, it returns -1, so a is considered smaller and b goes to the bottom. Only when neither contains digits are the remainders compared lexicographically.
public static List<String> sortOrders(List<String> orderList) {
Comparator<String> comp = (a, b) -> {
String[] aa = a.split("\\s+", 2);
String[] bb = b.split("\\s+", 2);
boolean aam = aa[1].matches(".*[0-9]+.*");
boolean bbm = bb[1].matches(".*[0-9]+.*");
return aam && !bbm ? 1 : bbm ? -1 :
aa[1].compareTo(bb[1]);
};
return orderList.stream().sorted(comp).toList();
}
If you want to preserve your original data, use the above. If you want to sort in place, then apply the Comparator defined above and use Collections.sort(data, comp).
I have tested this extensively using the following data generation code which generated random strings meeting your requirements. I suggest you test any answers you get (including this one) to ensure it satisfies your requirements.
String letters = "abcdefghijklmnopqrstuvwxyz";
Random r = new Random(123);
List<String> data = r.ints(200000, 1, 100).mapToObj(i -> {
StringBuilder sb = new StringBuilder();
boolean first = r.nextBoolean();
boolean second = r.nextBoolean();
int ltr = r.nextInt(letters.length());
String fstr = letters.substring(ltr,ltr+1);
ltr = r.nextInt(letters.length());
String sstr = letters.substring(ltr,ltr+1);
sb.append(fstr).append(first ? ltr : "").append(" ");
sb.append(fstr);
if (first) {
sb.append(r.nextInt(100));
}
sb.append(" ").append(sstr);
if (!first && second) {
sb.append(r.nextInt(100));
}
return sb.toString();
}).collect(Collectors.toCollection(ArrayList::new));
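Since the question leaves open what should happen when both items contain digits, here is one possible way to make that case explicit, as a minimal sketch. It assumes that items containing digits should simply keep their original relative order at the end of the list (the comparator returns 0 for such a pair and relies on the stable sort). The class and helper names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class OrderSorter {
    // Everything after the first word, e.g. "19th apple orange" -> "apple orange".
    private static String rest(String s) {
        String[] parts = s.trim().split("\\s+", 2);
        return parts.length > 1 ? parts[1] : "";
    }

    private static boolean hasDigit(String s) {
        return s.chars().anyMatch(Character::isDigit);
    }

    static final Comparator<String> COMP = (a, b) -> {
        String ra = rest(a);
        String rb = rest(b);
        boolean da = hasDigit(ra);
        boolean db = hasDigit(rb);
        if (da != db) {
            return da ? 1 : -1;   // the item with digits goes last
        }
        if (da) {
            return 0;             // assumption: both have digits, keep original order
        }
        return ra.compareTo(rb);  // neither has digits: plain lexicographic order
    };

    /** Returns a sorted copy; List.sort is stable, so ties keep their input order. */
    static List<String> sortOrders(List<String> orders) {
        List<String> copy = new ArrayList<>(orders);
        copy.sort(COMP);
        return copy;
    }
}
```

Because compare(a, b) and compare(b, a) always have opposite signs (or are both 0), this version should not trigger the "Comparison method violates its general contract!" exception.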

Problem handling multiple substrings on dynamic string

Currently I am having a hard time trying to figure out if there is a better way to refactor the following code.
Given the following:
String detail = "POTATORANDOMFOOD";
Let's say I want to assign variables with different parts of detail; the end result would look something like this.
String title = detail.substring(0, 6); // POTATO
String label = detail.substring(6, 12); // RANDOM
String tag = detail.substring(12, 16); // FOOD
Now let's say the length of the string detail constantly changes: sometimes it only contains "POTATORANDOM" and no "FOOD", and sometimes it contains even more characters, "POTATORANDOMFOODTODAY", so another variable would be used.
String title = detail.substring(0, 6); // POTATO
String label = detail.substring(6, 12); // RANDOM
String tag = detail.substring(12, 16); // FOOD
...
String etc = detail.substring(30, 40); // etc value from detail string
The issue with this is that, since the string is sometimes shorter or longer, we would run into a StringIndexOutOfBoundsException, which is not good.
So currently I have a naive way to handle this:
if (detail != null && !detail.isEmpty()) {
if (detail.length() >= 6) {
title = detail.substring(0, 6);
if (detail.length() >= 12) {
label = detail.substring(6, 12);
if (detail.length() >= 16) {
tag = detail.substring(12, 16);
.
.
.
}
}
}
}
This can get really messy, especially if, let's say, the string were to grow even more.
So my question is, what would be a good design pattern that would fit for this type of problem? I have tried the chain of responsibility design pattern but, the issue with this one is that it only returns a single value, while I am trying to return multiple ones if possible. This way I can assign multiple variables depending on the length of the string.
Any help/hints is greatly appreciated!
Edited:
The order and length are always the same. So title will always be first and it will always contain 6 characters. label will always be second and it will always contain 6 characters. tag will always be third and it will always contain 4 characters, etc.
If I was you, I would do the following:
Define a class to hold a Word definition
public class Word {
private final String name;
private final int startIndex;
private final int endIndex;
public Word(String name, int startIndex, int endIndex) {
this.name = name;
this.startIndex = startIndex;
this.endIndex = endIndex;
}
public String getName() { return name; }
public int getStartIndex() { return startIndex; }
public int getEndIndex() { return endIndex; }
}
Create a static list which holds all the possible words
public static final List<Word> WORDS = List.of(
new Word("title", 0, 6),
new Word("label", 6, 12),
new Word("tag", 12, 16),
...
);
Create a function that parses the String detail by walking this list until when the size of the string is exhausted
... and of course storing the elements into a Map<String, String> so that you can access them later.
public Map<String, String> parseDetail(String detail) {
Map<String, String> receivedWords = new LinkedHashMap<>(); //<-- map respecting insertion order
if (detail.isEmpty()) {
return receivedWords;
}
int parsedLength = 0;
for (Word word : WORDS) {
receivedWords.put(word.getName(), detail.substring(word.getStartIndex(), word.getEndIndex())); //<-- store the current word
parsedLength += word.getEndIndex() - word.getStartIndex(); //increase the parsedLength by the length of your word
if (parsedLength >= detail.length()) {
break; //<-- exit the loop when you're done with the parsing
}
}
return receivedWords;
}
To sum up:
Map<String, String> receivedWords = parseDetail(detail);
receivedWords.forEach((k, v) -> {
System.out.println("Key: " + k + ", value: " + v);
});
Output:
Key: title, value: POTATO
Key: label, value: RANDOM
Key: tag, value: FOOD
...
Tip 1: The input you receive looks pretty weird. I understand that you cannot change it but I would try to negotiate with the caller (if possible) a better way to send you their input (ideally a structured object, if not possible at least a string with some separator so that you can simply split by that character).
Tip 2: I have defined the list of words statically in the code. But I would instead define an external file (e.g. a JSON file, an XML file, or even a simple text file) that you parse dynamically to create the list. That will allow someone else to configure this file with the words/start index/end index without you having to do it in the code each time there is a change.
You could simply check the length of the total string to see if it has the RANDOM and the FOOD attributes before using substring()
String title = "", label = "", tag = "";
if (detail.length() >= 6)
title = detail.substring(0, 6);
if (detail.length() >= 12)
label = detail.substring(6, 12);
if (detail.length() >= 16)
tag = detail.substring(12,16);
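The repeated length checks above can be folded into one small helper, sketched below. The helper name is hypothetical, and returning an empty string for a missing field is an assumption; returning null or an Optional would work just as well.

```java
final class DetailFields {
    /**
     * Returns detail[start, end), or "" when the string is null
     * or too short to contain the whole field.
     */
    static String field(String detail, int start, int end) {
        if (detail == null || detail.length() < end) {
            return "";
        }
        return detail.substring(start, end);
    }
}
```

Usage would then be a flat list of assignments, for example String tag = DetailFields.field(detail, 12, 16);, with no nesting as more fields are added.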
I would suggest a regex approach:
public static void main(String[] args) {
String detail = "POTATORANDOMFOODTODAY";
Pattern p = Pattern.compile("(.{0,6})(.{0,6})(.{0,4})(.{0,5})");
Matcher m = p.matcher(detail);
m.find();
String title = m.group(1);
String label = m.group(2);
String tag = m.group(3);
String day = m.group(4);
System.out.println("title: " + title + ", label: " + label + ", tag: " + tag + ", day: " + day);
}
//output: title: POTATO, label: RANDOM, tag: FOOD, day: TODAY
If you have a lot of groups, I would suggest using named capturing groups. The approach above can be particularly difficult to maintain, as adding or removing a group in the middle of the regex upsets the previous numbering used via Matcher#group(int groupNumber). Using named capturing groups:
public static void main(String[] args) {
String detail = "POTATORANDOMFOODTODAY";
Pattern p = Pattern.compile("(?<title>.{0,6})(?<label>.{0,6})(?<tag>.{0,4})(?<day>.{0,5})");
Matcher m = p.matcher(detail);
m.find();
String title = m.group("title");
String label = m.group("label");
String tag = m.group("tag");
String day = m.group("day");
System.out.println("title: " + title + ", label: " + label + ", tag: " + tag + ", day: " + day);
}
//output: title: POTATO, label: RANDOM, tag: FOOD, day: TODAY
If the string is dynamic then it can essentially contain basically anything and since there can possibly be no whitespace(s) in the string the only way to know what a specific word (substring) might be is to play the string against a 'word list'. You can quickly come to realize how pivotal even a single whitespace (or separator character) can be within a string. Using the String#substring() method is only good if you already know what all the words within the detail string happen to be.
The simple solution would be to set acceptable rules as to how a specific string should be received. After all, why would you want to accept a string that contains multiple words without a separator character of some type to begin with. If the string has whitespaces in it, to separate the words contained within that string, a mere:
String[] words = string.split("\\s+");
line of code would do the trick. Bottom line, get rid of that nonsense of accepting strings containing multiple words with no separation mechanism included, even if that separation mechanism is by making use of the underscore ( _ ) character (or some other character). Well...if you can.
I suppose sometimes we just can't modify how we're dealt things (something like taxes) and how we receive specific strings is simply out of our control. If this is the case then one way to deal with this dilemma is to work against an established Word-List. This word list can range in size from a few words to hundreds of thousands of words. The situation you need to deal with will determine the word list size. If small enough, the word list can be contained within a String array or a collection such as an ArrayList. If really large, however, the word list would most likely be contained within a text file. The word list I most commonly use contains well over 370,000 individual words.
Here is an example of using a small Word-List contained within a List Interface:
String detail = "POTATORANDOMFOODTODAY";
List<String> wordList = Arrays.asList(new String[] {
"pumpkin", "carrot", "potato", "tomato", "lettus", "radish", "bean",
"pea", "food", "random", "today", "yesterday", "tomorrow",
});
// See if the detail string 'contains' any word-list words...
List<String> found = new ArrayList<>();
for (int i = 0; i < wordList.size(); i++) {
String word = wordList.get(i);
if (detail.toLowerCase().contains(word.toLowerCase())) {
found.add(word.toUpperCase());
}
}
/* Ensure the words within the list are in proper order.
That is, the same order as they are received within the
detail String. This is necessary since words from the
word-List can be found anywhere within the detail string. */
int startIndex = 0;
List<String> foundWords = new ArrayList<>();
String tmpStrg = "";
while (!tmpStrg.equals(detail)) {
for (int i = 0; i < found.size(); i++) {
String word = found.get(i);
if (detail.indexOf(word) == startIndex) {
foundWords.add(word);
startIndex = startIndex + word.length();
String procStrg = foundWords.toString().replace(", ", "");
tmpStrg = procStrg.substring(1, procStrg.length() - 1);
}
}
}
//Format and Display the required data
if (foundWords.isEmpty()) {
System.err.println("Couldn't find any required words!");
return; // or whatever...
}
String title = foundWords.get(0);
String label = foundWords.size() > 1 ? foundWords.get(1) : "N/A";
String[] tag = new String[1];
if (foundWords.size() > 2) {
tag = new String[foundWords.size()-2];
for (int i = 0; i < foundWords.size() - 2; i++) {
tag[i] = foundWords.get(i + 2);
}
}
else {
tag[0] = "N/A";
}
System.out.println("Title:\t" + title);
System.out.println("Label:\t" + label);
System.out.println("Tags:\t"
+ Arrays.toString(tag).substring(1, Arrays.toString(tag).length() - 1));
When the above code is run the console window would display:
Title: POTATO
Label: RANDOM
Tags: FOOD, TODAY
You can use the Stream API: use the filter() method to drop the cases that do not apply, then use map() to apply your existing logic. That should do the trick.
A switch statement could be an alternative; it adds more lines of code, but it removes the arrow-shaped nesting of all the ifs.
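One way to read the filter()/map() suggestion, as a minimal sketch: the field layout below is copied from the question, while the class name, the RANGES table, and the choice to silently skip fields that do not fit are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class StreamFields {
    // {start, end} pairs for title, label and tag, as given in the question.
    static final int[][] RANGES = { {0, 6}, {6, 12}, {12, 16} };

    /** Returns the fields that fit completely inside `detail`, in order. */
    static List<String> split(String detail) {
        return Arrays.stream(RANGES)
                .filter(r -> r[1] <= detail.length())   // keep only fields that fit
                .map(r -> detail.substring(r[0], r[1])) // existing substring logic
                .collect(Collectors.toList());
    }
}
```

For "POTATORANDOM" this gives [POTATO, RANDOM]; adding a new field only means adding one more pair to the table.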

how to print the longest of three strings?

Is there a quick way to select the longest of three strings (s1,s2,s3) using if/else method?
I'm using Java
I have tried using something like this
if (s1.length() > s2.length()) {
System.out.println(s1); ...
but did not get it right.
Don't try to program all possible combinations with an if-else construct, as the complexity will grow exponentially if you add more strings.
This solution works well for a small number of strings with a linear complexity:
String longest = s1;
if (s2.length() > longest.length()) {
longest = s2;
}
if (s3.length() > longest.length()) {
longest = s3;
}
System.out.println(longest);
For a larger number of strings, put them in a collection and find the longest using a loop.
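The collection-and-loop variant might look like the sketch below. The class and method names are hypothetical, and returning null for an empty list is an assumption; on ties the first of the longest strings wins.

```java
import java.util.List;

final class Longest {
    /** Returns the first longest string in the list, or null for an empty list. */
    static String longest(List<String> strings) {
        String longest = null;
        for (String s : strings) {
            // Strictly greater, so an earlier string wins a tie.
            if (longest == null || s.length() > longest.length()) {
                longest = s;
            }
        }
        return longest;
    }
}
```

This is the same comparison as in the three-string version, just applied once per element, so the work still grows linearly with the number of strings.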
You can use if / else if / else in C# (in case you aren't actually using Java, which it looks like you are) to handle this.
string current = str;
if(str2.Length > current.Length)
{
current = str2;
}
if (str3.Length > current.Length)
{
current = str3;
}
Unless using if/else is a requirement of this code, using a collection and LINQ will be a cleaner option.
List<string> strList = new List<string>
{
"str",
"strLen",
"strLength"
};
// This aggregate will return the longest string in a list.
string longestStr = strList.Aggregate("", (max, cur) => max.Length > cur.Length ? max : cur);
string a = "123";
string b = "1322";
string c = "122332";
if (a.Length > b.Length && a.Length > c.Length)
{
Console.WriteLine(a);
}
else if (b.Length > c.Length)
{
Console.WriteLine(b);
}
else
{
Console.WriteLine(c);
}
The if/else constructs in Java are the same as in C#, so you can use the solutions above. LINQ is similar to Streams in Java. In Java you can write:
public static void main(String args[]) {
printLongest("VampireApi", "C#-Api", "Java Api");
}
public static void printLongest(String... strings) {
java.util.Arrays
.stream(strings)
.sorted(java.util.Comparator.comparingInt(String::length).reversed())
.findFirst().ifPresent(System.out::println);
}
Create an array and put a string into each element, either in a loop or manually: String[] st = new String[3]; then st[0] = "aaa"; st[1] = "eff"; and so on. After this, loop over the array, take the length of the string at st[i], and keep the highest length seen so far in a variable max (which starts at 0), for example using the Math.max() function.
If the length (which is an integer) is larger than max, save that string in a String variable. The loop goes through every string in your array and updates max when needed. Afterwards you can either return or print the string that is the longest.
This is one of many ways; you could also use three ifs to check. This method works well with a larger number of strings.
Not using if-else as the OP asked, but a cleaner solution is this:
void longest(String a, String b, String c) {
String[] triplet = {a, b, c};
Arrays.sort(triplet, Comparator.comparingInt(String::length));
System.out.println(triplet[2]);
}

How to stop a while loop in correct place?

I need help because I can't stop the while loop in the correct place.
In this example, I want to keep removing the last character until the String seq equals one of the numbers in the list (867), then stop.
The more advanced goal is, after finding that String, to add it to a new ArrayList and then try to find the rest of the String (75). If it is found, it is also added to the new list.
All better solutions are very welcome. Thanks!
The ArrayList list can contain hundreds of elements (there are only 4 here), and some elements in the list can appear more than once. Finding the first match in the list is fine for me at the moment. If it is found, split 86775 into 867:75, remove 867 from the String seq and add it to the new ArrayList. Finally, we have to find 75; if it doesn't exist, split it into 7:5 and try to find 7 and 5. In my list, single digits always exist, so there is no problem with exceptions.
If you use the split method: I always need to find the longest number possible; if it is not found, I split by removing the last digit, but I need every digit to be accounted for.
The String being examined can have 20 chars or more. This is the simplest example possible.
I looked at many ways to solve this problem and I am not sure this is the best.
public class Main {
public static void main(String[] args) {
List<String> list = new ArrayList<String>();
list.add("567");
list.add("867");
list.add("86");
list.add("75");
System.out.println(list);
String seq = "86775";
System.out.println(seq + ": found ?: " + getPatternFound(String.valueOf(list), seq) + " times");
int nbChar = 0;
do {
getRemoveLastChar(seq, 1);
if (getPatternFound(String.valueOf(list), seq) == 0) ;
getRemoveLastChar(seq, 2);
nbChar++;
System.out.println(getRemoveLastChar(seq, nbChar));
}
while (nbChar < seq.length());
}
private static String getRemoveLastChar(String str, int nbChar) {
return str.substring(0, str.length() - nbChar);
}
private static int getPatternFound(String longString, String pat) {
Pattern pattern = Pattern.compile(pat);
Matcher matcher = pattern.matcher(longString);
int count = 0;
while (matcher.find())
count++;
return count;
}
}
[567, 867, 86, 30]
867755: found ?: 0 times
86775
8677
867 <- stop here
86
8
To answer the main question: how do you quit when done?
Of course you can do it in-loop with the break or continue keywords.
What I deem more elegant is the use of return in a separate method.
It makes the code easier to understand if later you have changes.
Here is an implementation that shows the behaviour as you stated:
import java.util.ArrayList;
public class Test {
public static void main(String[] args) {
ArrayList<String> list = new ArrayList<String>();
list.add("567");
list.add("867");
list.add("86");
list.add("75");
System.out.println("List = " + list);
String seq = "86775";
String result = search(list, seq);
}
/**
*
* @param list
* @param seq
* @return null means no match;
*/
private static String search(ArrayList<String> list, String seq) {
while (seq.length() > 0) {
System.out.println("analyzing seq: " + seq);
boolean broken = false;
for (String s : list) {
if (seq.contentEquals(s)) { // if matched
System.out.println("found match: " + s);
// do your logic, for example
seq = seq.replace(s, "");
// quit the loop at right place
return s;
}
}
seq = seq.substring(0, seq.length() - 1);
}
return null;
}
}
generates output:
List = [567, 867, 86, 75]
analyzing seq: 86775
analyzing seq: 8677
analyzing seq: 867
found match: 867
Judging by what you want to do with the result: the String API offers String.contains(CharSequence s) in combination with String.replace(CharSequence target, CharSequence replacement). These should solve your real problem much better than doing everything manually in loops.
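As a rough sketch of that suggestion: the class and method names below are made up for illustration, and it is an assumption that matching anywhere in the string (rather than only at the start) is what you want. If only a prefix should match, String.startsWith would take the place of contains.

```java
import java.util.Arrays;
import java.util.List;

final class ContainsReplaceSketch {

    // Hypothetical helper: returns the first entry of parts that occurs
    // somewhere in seq, or null when none of them does.
    static String firstContained(List<String> parts, String seq) {
        for (String part : parts) {
            if (seq.contains(part)) {
                return part;
            }
        }
        return null;
    }

    // Removes every occurrence of the first matching entry from seq.
    static String removeFirstContained(List<String> parts, String seq) {
        String match = firstContained(parts, seq);
        return match == null ? seq : seq.replace(match, "");
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("567", "867", "86", "75");
        System.out.println(firstContained(list, "86775"));       // prints 867
        System.out.println(removeFirstContained(list, "86775")); // prints 75
    }
}
```

Note that the first list entry that matches wins, so the order of the list decides between overlapping candidates such as "867" and "86".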

indexOf Case Sensitive?

Is the indexOf(String) method case sensitive? If so, is there a case insensitive version of it?
The indexOf() methods are all case-sensitive. You can make them (roughly, in a broken way, but working for plenty of cases) case-insensitive by converting your strings to upper/lower case beforehand:
s1 = s1.toLowerCase(Locale.US);
s2 = s2.toLowerCase(Locale.US);
s1.indexOf(s2);
Is the indexOf(String) method case sensitive?
Yes, it is case sensitive:
@Test
public void indexOfIsCaseSensitive() {
assertTrue("Hello World!".indexOf("Hello") != -1);
assertTrue("Hello World!".indexOf("hello") == -1);
}
If so, is there a case insensitive version of it?
No, there isn't. You can convert both strings to lower case before calling indexOf:
@Test
public void caseInsensitiveIndexOf() {
assertTrue("Hello World!".toLowerCase().indexOf("Hello".toLowerCase()) != -1);
assertTrue("Hello World!".toLowerCase().indexOf("hello".toLowerCase()) != -1);
}
There is an ignore-case method in the StringUtils class of the Apache Commons Lang library:
indexOfIgnoreCase(CharSequence str, CharSequence searchStr)
Yes, indexOf is case sensitive.
The best way I have found to do a case-insensitive search is:
String original;
int idx = original.toLowerCase().indexOf(someStr.toLowerCase());
That will do a case insensitive indexOf().
Here is my solution which does not allocate any heap memory, therefore it should be significantly faster than most of the other implementations mentioned here.
public static int indexOfIgnoreCase(final String haystack,
final String needle) {
if (needle.isEmpty() || haystack.isEmpty()) {
// Fallback to legacy behavior.
return haystack.indexOf(needle);
}
for (int i = 0; i < haystack.length(); ++i) {
// Early out, if possible.
if (i + needle.length() > haystack.length()) {
return -1;
}
// Attempt to match substring starting at position i of haystack.
int j = 0;
int ii = i;
while (ii < haystack.length() && j < needle.length()) {
char c = Character.toLowerCase(haystack.charAt(ii));
char c2 = Character.toLowerCase(needle.charAt(j));
if (c != c2) {
break;
}
j++;
ii++;
}
// Walked all the way to the end of the needle, return the start
// position that this was found.
if (j == needle.length()) {
return i;
}
}
return -1;
}
And here are the unit tests that verify correct behavior.
@Test
public void testIndexOfIgnoreCase() {
assertThat(StringUtils.indexOfIgnoreCase("A", "A"), is(0));
assertThat(StringUtils.indexOfIgnoreCase("a", "A"), is(0));
assertThat(StringUtils.indexOfIgnoreCase("A", "a"), is(0));
assertThat(StringUtils.indexOfIgnoreCase("a", "a"), is(0));
assertThat(StringUtils.indexOfIgnoreCase("a", "ba"), is(-1));
assertThat(StringUtils.indexOfIgnoreCase("ba", "a"), is(1));
assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", " Royal Blue"), is(-1));
assertThat(StringUtils.indexOfIgnoreCase(" Royal Blue", "Royal Blue"), is(1));
assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "royal"), is(0));
assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "oyal"), is(1));
assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "al"), is(3));
assertThat(StringUtils.indexOfIgnoreCase("", "royal"), is(-1));
assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", ""), is(0));
assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "BLUE"), is(6));
assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "BIGLONGSTRING"), is(-1));
assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "Royal Blue LONGSTRING"), is(-1));
}
Yes, it is case-sensitive. You can do a case-insensitive indexOf by converting your String and the String parameter both to upper-case before searching.
String str = "Hello world";
String search = "hello";
str.toUpperCase().indexOf(search.toUpperCase());
Note that toUpperCase may not work in some circumstances. For instance this:
String str = "Feldbergstraße 23, Mainz";
String find = "mainz";
int idxU = str.toUpperCase().indexOf (find.toUpperCase ());
int idxL = str.toLowerCase().indexOf (find.toLowerCase ());
idxU will be 20, which is wrong! idxL will be 19, which is correct. What's causing the problem is that toUpperCase() converts the "ß" character into TWO characters, "SS", and this throws the index off.
Consequently, prefer toLowerCase(). It is not fully safe either: a few characters, such as "İ" (U+0130) outside Turkish locales, also change length when lower-cased.
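A small self-contained check of the numbers above. The class and method names are only for illustration, and Locale.ROOT is passed explicitly (an addition not in the snippet above) so that the result does not depend on the machine's default locale.

```java
import java.util.Locale;

final class SharpSIndexDemo {

    // Searches after upper-casing both strings.
    static int indexViaUpperCase(String text, String find) {
        return text.toUpperCase(Locale.ROOT).indexOf(find.toUpperCase(Locale.ROOT));
    }

    // Searches after lower-casing both strings.
    static int indexViaLowerCase(String text, String find) {
        return text.toLowerCase(Locale.ROOT).indexOf(find.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // \u00DF is the German sharp s, written as an escape to avoid encoding issues.
        String str = "Feldbergstra\u00DFe 23, Mainz";
        System.out.println(indexViaUpperCase(str, "mainz")); // prints 20
        System.out.println(indexViaLowerCase(str, "mainz")); // prints 19
        System.out.println(str.charAt(19));                  // prints M
    }
}
```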
What are you doing with the index value once returned?
If you are using it to manipulate your string, then could you not use a regular expression instead?
import static org.junit.Assert.assertEquals;
import org.junit.Test;
public class StringIndexOfRegexpTest {
@Test
public void testNastyIndexOfBasedReplace() {
final String source = "Hello World";
final int index = source.toLowerCase().indexOf("hello".toLowerCase());
final String target = "Hi".concat(source.substring(index
+ "hello".length(), source.length()));
assertEquals("Hi World", target);
}
@Test
public void testSimpleRegexpBasedReplace() {
final String source = "Hello World";
final String target = source.replaceFirst("(?i)hello", "Hi");
assertEquals("Hi World", target);
}
}
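If you do need the index itself, the same regular-expression idea can return it. This is a minimal sketch with a made-up class name, using only java.util.regex. Pattern.quote makes sure that characters such as "." or "$" in the search string are taken literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexIndexOf {

    // Returns the index of the first case-insensitive occurrence of needle
    // in haystack, or -1 if there is none.
    static int indexOfIgnoreCase(String haystack, String needle) {
        Pattern pattern = Pattern.compile(
                Pattern.quote(needle),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher matcher = pattern.matcher(haystack);
        return matcher.find() ? matcher.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfIgnoreCase("Hello World", "WORLD"));   // prints 6
        System.out.println(indexOfIgnoreCase("price: $5.00", "$5."));    // prints 7
        System.out.println(indexOfIgnoreCase("Hello World", "goodbye")); // prints -1
    }
}
```

Compiling the pattern on every call is wasteful in a loop. For repeated searches with the same needle, keep the compiled Pattern around.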
I've just looked at the source. It compares chars so it is case sensitive.
@Test
public void testIndexofCaseSensitive() {
TestCase.assertEquals(-1, "abcDef".indexOf("d") );
}
Yes, I am fairly sure it is. One method of working around that using the standard library would be:
int index = str.toUpperCase().indexOf("FOO");
I had the same problem.
I tried a regular expression and the Apache StringUtils.indexOfIgnoreCase method, but both were pretty slow.
So I wrote a short method myself:
public static int indexOfIgnoreCase(final String chkstr, final String searchStr, int i) {
if (chkstr != null && searchStr != null && i > -1) {
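int serchStrLength = searchStr.length();
if (serchStrLength == 0) {
// An empty search string would index into zero-length arrays below;
// return early, mirroring String.indexOf("", i).
return Math.min(i, chkstr.length());
}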
char[] searchCharLc = new char[serchStrLength];
char[] searchCharUc = new char[serchStrLength];
searchStr.toUpperCase().getChars(0, serchStrLength, searchCharUc, 0);
searchStr.toLowerCase().getChars(0, serchStrLength, searchCharLc, 0);
int j = 0;
for (int checkStrLength = chkstr.length(); i < checkStrLength; i++) {
char charAt = chkstr.charAt(i);
if (charAt == searchCharLc[j] || charAt == searchCharUc[j]) {
if (++j == serchStrLength) {
return i - j + 1;
}
} else { // faster than: else if (j != 0) {
i = i - j;
j = 0;
}
}
}
return -1;
}
According to my tests it's much faster (at least if your search string is rather short).
If you have any suggestions for improvement or find bugs, please let me know (since I use this code in an application ;-)
The first question has already been answered many times. Yes, the String.indexOf() methods are all case-sensitive.
If you need a locale-sensitive indexOf() you could use the Collator. Depending on the strength value you set you can get case insensitive comparison, and also treat accented letters as the same as the non-accented ones, etc.
Here is an example of how to do this:
private int indexOf(String original, String search) {
Collator collator = Collator.getInstance();
collator.setStrength(Collator.PRIMARY);
for (int i = 0; i <= original.length() - search.length(); i++) {
if (collator.equals(search, original.substring(i, i + search.length()))) {
return i;
}
}
return -1;
}
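To see what the strength setting changes, here is a small variation of the method above with the locale and strength passed in as parameters. The class name and the sample text are assumptions for illustration. Like the original, it compares fixed-size windows, so it cannot find matches whose length differs from the search string.

```java
import java.text.Collator;
import java.util.Locale;

final class CollatorIndexOfDemo {

    // Same sliding-window search as above, with locale and strength passed in.
    static int indexOf(String original, String search, Locale locale, int strength) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(strength);
        for (int i = 0; i <= original.length() - search.length(); i++) {
            if (collator.equals(search, original.substring(i, i + search.length()))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "Le R\u00E9sum\u00E9"; // "Le Résumé", written with escapes
        // PRIMARY ignores case and accent differences.
        System.out.println(indexOf(text, "resume", Locale.US, Collator.PRIMARY));
        // TERTIARY distinguishes both, so the same search finds nothing.
        System.out.println(indexOf(text, "resume", Locale.US, Collator.TERTIARY));
    }
}
```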
Just to sum it up, 3 solutions:
using toLowerCase() or toUpperCase
using StringUtils of apache
using regex
Now, what I was wondering was which one is the fastest?
I'm guessing on average the first one.
I would like to lay claim to the ONE and only solution posted so far that actually works. :-)
Three classes of problems that have to be dealt with.
Non-transitive matching rules for lower and uppercase. The Turkish I problem has been mentioned frequently in other replies. According to comments in the Android source for String.regionMatches, the Georgian comparison rules require an additional conversion to lower case when comparing for case-insensitive equality.
Cases where upper- and lower-case forms have a different number of letters. Pretty much all of the solutions posted so far fail in these cases. Example: German STRASSE vs. Straße have case-insensitive equality, but have different lengths.
Binding strengths of accented characters. Locale AND context affect whether accents match or not. In French, the uppercase form of 'é' is 'E', although there is a movement toward using uppercase accents. In Canadian French, the upper-case form of 'é' is 'É', without exception. Users in both countries would expect "e" to match "é" when searching. Whether accented and unaccented characters match is locale-specific. Now consider: does "E" equal "É"? Yes. It does. In French locales, anyway.
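The first two classes of problems can be reproduced with the JDK alone. The following sketch (the class and method names are invented for illustration) only shows the symptoms; it is not a fix.

```java
import java.util.Locale;

final class CaseMappingPitfalls {

    static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    // True when upper-casing changes the number of chars in s.
    static boolean upperCaseChangesLength(String s) {
        return s.toUpperCase(Locale.ROOT).length() != s.length();
    }

    // Lower-cases s using Turkish rules, where 'I' maps to dotless i (U+0131).
    static String lowerTurkish(String s) {
        return s.toLowerCase(TURKISH);
    }

    public static void main(String[] args) {
        String strasse = "Stra\u00DFe"; // "Straße"
        System.out.println(upperCaseChangesLength(strasse));      // prints true
        // equalsIgnoreCase compares char by char, so the two forms never match.
        System.out.println("STRASSE".equalsIgnoreCase(strasse));  // prints false
        // The same input lower-cases differently depending on the locale.
        System.out.println(lowerTurkish("TITLE").equals("title")); // prints false
    }
}
```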
I am currently using android.icu.text.StringSearch to correctly reimplement my earlier case-insensitive indexOf operations.
Non-Android users can access the same functionality through the ICU4J package, using the com.ibm.icu.text.StringSearch class.
Be careful to reference classes in the correct icu package (android.icu.text or com.ibm.icu.text) as Android and the JRE both have classes with the same name in other namespaces (e.g. Collator).
this.collator = (RuleBasedCollator)Collator.getInstance(locale);
this.collator.setStrength(Collator.PRIMARY);
....
StringSearch search = new StringSearch(
pattern,
new StringCharacterIterator(targetText),
collator);
int index = search.first();
if (index != StringSearch.DONE)
{
// remember that the match length may NOT equal the pattern length.
length = search.getMatchLength();
....
}
Test Cases (Locale, pattern, target text, expectedResult):
testMatch(Locale.US,"AbCde","aBcDe",true);
testMatch(Locale.US,"éèê","EEE",true);
testMatch(Locale.GERMAN,"STRASSE","Straße",true);
testMatch(Locale.FRENCH,"éèê","EEE",true);
testMatch(Locale.FRENCH,"EEE","éèê",true);
testMatch(Locale.FRENCH,"éèê","ÉÈÊ",true);
testMatch(new Locale("tr", "TR"),"TITLE","tıtle",true); // Turkish dotless I/i
testMatch(new Locale("tr", "TR"),"TİTLE","title",true); // Turkish dotted I/i
testMatch(new Locale("tr", "TR"),"TITLE","title",false); // Dotless-I != dotted i.
PS: As best as I can determine, the PRIMARY binding strength should do the right thing when locale-specific rules differentiate between accented and non-accented characters according to dictionary rules, but I don't know which locale to use to test this premise. Donated test cases would be gratefully appreciated.
--
Copyright notice: because StackOverflow's CC-BY_SA copyrights as applied to code-fragments are unworkable for professional developers, these fragments are dual licensed under more appropriate licenses here: https://pastebin.com/1YhFWmnU
But it's not hard to write one:
public class CaseInsensitiveIndexOfTest extends TestCase {
public void testOne() throws Exception {
assertEquals(2, caseInsensitiveIndexOf("ABC", "xxabcdef"));
}
public static int caseInsensitiveIndexOf(String substring, String string) {
return string.toLowerCase().indexOf(substring.toLowerCase());
}
}
Converting both strings to lower case is usually not a big deal, but it is slow if either string is long, and doing it in a loop is worse still. For this reason, I would recommend indexOfIgnoreCase.
static string Search(string factMessage, string b)
{
int index = factMessage.IndexOf(b, StringComparison.CurrentCultureIgnoreCase);
string line = null;
int i = index;
if (i == -1)
{ return "not matched"; }
else
{
while (factMessage[i] != ' ')
{
line = line + factMessage[i];
i++;
}
return line;
}
}
Here's a version closely resembling Apache's StringUtils version:
public int indexOfIgnoreCase(String str, String searchStr) {
return indexOfIgnoreCase(str, searchStr, 0);
}
public int indexOfIgnoreCase(String str, String searchStr, int fromIndex) {
// https://stackoverflow.com/questions/14018478/string-contains-ignore-case/14018511
if(str == null || searchStr == null) return -1;
if (searchStr.length() == 0) return fromIndex; // empty string found; use same behavior as Apache StringUtils
final int endLimit = str.length() - searchStr.length() + 1;
for (int i = fromIndex; i < endLimit; i++) {
if (str.regionMatches(true, i, searchStr, 0, searchStr.length())) return i;
}
return -1;
}
indexOf is case sensitive. This is because it uses the equals method to compare the elements in the list. The same thing goes for contains and remove.
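For a List<String>, a case-insensitive lookup therefore has to be written by hand. A minimal sketch, with a hypothetical helper name, using String.equalsIgnoreCase in place of equals:

```java
import java.util.Arrays;
import java.util.List;

final class ListIndexOfIgnoreCase {

    // Returns the index of the first element equal to target, ignoring case,
    // or -1 if there is none. Null elements are skipped.
    static int indexOfIgnoreCase(List<String> items, String target) {
        for (int i = 0; i < items.size(); i++) {
            String item = items.get(i);
            if (item != null && item.equalsIgnoreCase(target)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Foo", "Bar", "Baz");
        System.out.println(names.indexOf("bar"));            // prints -1
        System.out.println(indexOfIgnoreCase(names, "bar")); // prints 1
    }
}
```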
