Java custom Sort by 2 parts of same string - java

I have seen other questions like this, but couldn't adapt any of the information to my code. Either because it wasn't specific to my issue or I couldn't get my head around the answer. So, I am hoping to ask "how" with my specific code. Tell me if more is needed.
I have various files (all jpg's) with names with the format "20140214-ddEventBlahBlah02.jpg" and "20150302-ddPsBlagBlag2".
I have a custom comparator in use that sorts things in a Windows OS fashion... i.e. 02,2,003,4,4b,4c,10, etc. Instead of the computer way of sorting, which was screwed up. Everything is good, except I now want to sort these strings using 2 criteria in the strings.
1) The date (in the beginning). i.e. 20150302
2) The rest of the filename after the "-" i.e. ddPsBlagBlag2
I am currently using the comparator for a project that displays these files in reverse order. They are displaying according to what was added most recently. i.e. 20150302 is displaying before 20140214. Which is good. But I would like the files, after being sorted by date in reverse order, to display by name in normal Windows OS ascending order (not in reverse).
Code:
Collections.sort(file, new Comparator<File>()
{
private final Comparator<String> NATURAL_SORT = new WindowsExplorerComparator();
@Override
public int compare(File o1, File o2)
{
return NATURAL_SORT.compare(o1.getName(), o2.getName());
}
});
Collections.reverse(file);
The code above takes the ArrayList of files and sorts it by file name using the custom WindowsExplorerComparator class. After being sorted, Collections.reverse() is called on the ArrayList.
Code:
class WindowsExplorerComparator implements Comparator<String>
{
private static final Pattern splitPattern = Pattern.compile("\\d\\.|\\s");
@Override
public int compare(String str1, String str2) {
Iterator<String> i1 = splitStringPreserveDelimiter(str1).iterator();
Iterator<String> i2 = splitStringPreserveDelimiter(str2).iterator();
while (true)
{
//Til here all is equal.
if (!i1.hasNext() && !i2.hasNext())
{
return 0;
}
//first has no more parts -> comes first
if (!i1.hasNext() && i2.hasNext())
{
return -1;
}
//first has more parts than i2 -> comes after
if (i1.hasNext() && !i2.hasNext())
{
return 1;
}
String data1 = i1.next();
String data2 = i2.next();
int result;
try
{
//If both datas are numbers, then compare numbers
result = Long.compare(Long.valueOf(data1), Long.valueOf(data2));
//If numbers are equal than longer comes first
if (result == 0)
{
result = -Integer.compare(data1.length(), data2.length());
}
}
catch (NumberFormatException ex)
{
//compare text case insensitive
result = data1.compareToIgnoreCase(data2);
}
if (result != 0) {
return result;
}
}
}
private List<String> splitStringPreserveDelimiter(String str) {
Matcher matcher = splitPattern.matcher(str);
List<String> list = new ArrayList<String>();
int pos = 0;
while (matcher.find()) {
list.add(str.substring(pos, matcher.start()));
list.add(matcher.group());
pos = matcher.end();
}
list.add(str.substring(pos));
return list;
}
}
The code above is the custom WindowsExplorerComparator class being used to sort the ArrayList.
So, an example of what I would like the ArrayList to look like after being sorted (and date sort reversed) is:
20150424-ssEventBlagV002.jpg
20150323-ssEventBlagV2.jpg
20150323-ssEventBlagV3.jpg
20150323-ssEventBlagV10.jpg
20141201-ssEventZoolander.jpg
20141102-ssEventApple1.jpg
As you can see, first sorted by date (and reversed), then sorted in ascending order by the rest of the string name.
Is this possible? Please tell me it's an easy fix.

You're close. Whenever something is not working, debug your program and make sure that your methods return what you expect. When I ran your program, the first thing I noticed was that EVERY compare iteration that attempted to convert a string to a Long threw a NumberFormatException. This was a big red flag, so I threw in some printlns to check the values of data1 and data2.
Here's my output:
Compare: 20150323-ssEventBlagV 20150424-ssEventBlagV00
Compare: 20150323-ssEventBlagV 20150323-ssEventBlagV
Compare: 3. 2.
Compare: 20150323-ssEventBlagV 20150424-ssEventBlagV00
Compare: 20150323-ssEventBlagV 20150323-ssEventBlagV
Compare: 3. 2.
Compare: 20150323-ssEventBlagV1 20150323-ssEventBlagV
Compare: 20150323-ssEventBlagV1 20150424-ssEventBlagV00
Compare: 20141201-ssEventZoolander.jpg 20150323-ssEventBlagV1
Compare: 20141201-ssEventZoolander.jpg 20150323-ssEventBlagV
Compare: 20141201-ssEventZoolander.jpg 20150323-ssEventBlagV
The big thing to notice here is that it's trying to convert 3. and 2. to long values, which of course won't work.
The simplest solution with your code is to change your regular expression. In the future you might prefer plain string iteration over regex, though; I feel regex complicates this problem more than it helps.
New regex: \\d+(?=\\.)|\\s
Changes:
\\d -> \\d+ - capture all digits before the period, not just the last one
\\. -> (?=\\.) - put the period in a lookahead, so it is checked but not consumed and your method doesn't append it to the digits
New debug output:
Compare: 20150323-ssEventBlagV 20150424-ssEventBlagV
Compare: 20150323-ssEventBlagV 20150323-ssEventBlagV
Compare: 3 2
Compare: 20150323-ssEventBlagV 20150323-ssEventBlagV
Compare: 10 3
Compare: 20141201-ssEventZoolander.jpg 20150323-ssEventBlagV
As you can see the numbers at the end are actually getting parsed correctly.
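To see the effect of the new expression in isolation, here is a minimal sketch that reuses the splitting logic from the question with the lookahead pattern. The class name SplitDemo and the method name split are made up for this illustration; they are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitDemo {
    // Digits that are directly followed by a '.', or a single whitespace character.
    private static final Pattern SPLIT = Pattern.compile("\\d+(?=\\.)|\\s");

    // Same approach as splitStringPreserveDelimiter in the question:
    // keep the text between matches and the matches themselves.
    static List<String> split(String str) {
        Matcher matcher = SPLIT.matcher(str);
        List<String> parts = new ArrayList<>();
        int pos = 0;
        while (matcher.find()) {
            parts.add(str.substring(pos, matcher.start()));
            parts.add(matcher.group());
            pos = matcher.end();
        }
        parts.add(str.substring(pos));
        return parts;
    }

    public static void main(String[] args) {
        // prints [20150323-ssEventBlagV, 10, .jpg]
        System.out.println(split("20150323-ssEventBlagV10.jpg"));
    }
}
```

Because the lookahead does not consume the period, the numeric token is "10" rather than "0.", so Long.valueOf can parse it.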
One more minor thing:
Your result for digit comparison is backwards
result = Long.compare(Long.valueOf(data1), Long.valueOf(data2));
should be either:
result = -Long.compare(Long.valueOf(data1), Long.valueOf(data2));
or
result = Long.compare(Long.valueOf(data2), Long.valueOf(data1));
because it's sorting them backwards.

There are a few things you should do:
First, you need to fix your split expression as @ug_ stated. However, I think splitting on numbers is more appropriate.
private static final Pattern splitPattern = Pattern.compile("\\d+");
which, for 20150323-ssEventBlagV2.jpg will result in
[, 20150323, -ssEventBlagV, 2, .jpg]
Second, perform a date comparison separate from your Long comparison. Using SimpleDateFormat will make sure you are only comparing numbers that are formatted as dates.
try {
SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd");
result = sdf.parse(data2).compareTo(sdf.parse(data1));
if (result != 0) {
return result;
}
} catch (final ParseException e) {
/* continue */
}
Last, swap the order of your Long compare
Long.compare(Long.valueOf(data2), Long.valueOf(data1));
And you should be good to go. Full code below.
private static final Pattern splitPattern = Pattern.compile("\\d+");
@Override
public int compare(String str1, String str2) {
Iterator<String> i1 = splitStringPreserveDelimiter(str1).iterator();
Iterator<String> i2 = splitStringPreserveDelimiter(str2).iterator();
while (true) {
// Til here all is equal.
if (!i1.hasNext() && !i2.hasNext()) {
return 0;
}
// first has no more parts -> comes first
if (!i1.hasNext() && i2.hasNext()) {
return -1;
}
// first has more parts than i2 -> comes after
if (i1.hasNext() && !i2.hasNext()) {
return 1;
}
String data1 = i1.next();
String data2 = i2.next();
int result;
try {
SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd");
result = sdf.parse(data1).compareTo(sdf.parse(data2));
if (result != 0) {
return result;
}
} catch (final ParseException e) {
/* continue */
}
try {
// If both datas are numbers, then compare numbers
result = Long.compare(Long.valueOf(data2),
Long.valueOf(data1));
// If numbers are equal than longer comes first
if (result == 0) {
result = -Integer.compare(data1.length(),
data2.length());
}
} catch (NumberFormatException ex) {
// compare text case insensitive
result = data1.compareToIgnoreCase(data2);
}
if (result != 0) {
return result;
}
}
}

You will need to edit your WindowsExplorerComparator class so that it performs this sorting. Given two file names as Strings, you need to determine which order they go in using the following high-level algorithm:
Are they the same? If yes, return 0.
Split each file name into two strings, the date portion and the name portion.
Convert the date portions to dates using the Java DateTime API and then compare the dates.
If the dates are the same, compare the two name portions using your current compare code and return the result from that.
This is a bit complicated and somewhat confusing, but you will have to do it in one comparator and put all of your custom logic in it.
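The steps above could be sketched roughly as follows. This is only an illustration under some assumptions: every name starts with an 8-digit yyyyMMdd date followed by "-", the helper name dateDescThenName is invented here, and String.CASE_INSENSITIVE_ORDER is just a stand-in for the WindowsExplorerComparator from the question.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class DateThenNameSort {
    // Hypothetical helper: newest date first, then the rest of the name ascending.
    // nameOrder is whatever comparator should order the part after the '-'.
    static Comparator<String> dateDescThenName(Comparator<String> nameOrder) {
        return (a, b) -> {
            // Assumption: both names contain '-' right after a yyyyMMdd date.
            int dashA = a.indexOf('-');
            int dashB = b.indexOf('-');
            LocalDate dateA = LocalDate.parse(a.substring(0, dashA), DateTimeFormatter.BASIC_ISO_DATE);
            LocalDate dateB = LocalDate.parse(b.substring(0, dashB), DateTimeFormatter.BASIC_ISO_DATE);
            // Operands swapped on purpose: later dates sort first.
            int byDate = dateB.compareTo(dateA);
            if (byDate != 0) {
                return byDate;
            }
            // Same date: fall back to the name portion in ascending order.
            return nameOrder.compare(a.substring(dashA + 1), b.substring(dashB + 1));
        };
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList(
                "20141102-ssEventApple1.jpg",
                "20150323-ssEventBlagV3.jpg",
                "20150424-ssEventBlagV002.jpg",
                "20150323-ssEventBlagV2.jpg"));
        // Stand-in for the natural-order comparator from the question.
        names.sort(dateDescThenName(String.CASE_INSENSITIVE_ORDER));
        System.out.println(names);
    }
}
```

Since the date direction is handled inside the comparator, a sort like this would not need the Collections.reverse() call afterwards.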

Related

Sort Array List lexicographically ignoring integers

I have this code. I want to order a list of strings. Every item in the list consists of a three word sentence. I want to ignore the first word and sort the sentence lexicographically with the 2nd and 3rd words. If the 2nd or 3rd words contain an integer, I want to ignore sorting them but add them to the end of the list.
For example: (19th apple orange, 17th admin 7th, 19th apple table) should be sorted in the list as (19th apple orange, 19th apple table, 17th admin 7th)
So far my code only ignores the first word and sort lexicographically the rest of the lists
public static List<String> sortOrders(List<String> orderList) {
// Write your code here
Collections.sort( orderList,
(a, b) -> a.split(" *", 2)[1].compareTo( b.split(" *", 2)[1] )
);
return orderList;
}
In your compare method check for numbers first and then strings. You just have to add code to the steps you described:
Here's a pseudo code of what you described
...
(a,b) -> {
// Every item in the list consists of a three word sentence.
var awords = a.split(" ")
var bwords = b.split(" ")
// I want to ignore the first word
var as = awords[1] + " " + awords[2]
var bs ...
// and sort the sentence lexicographically with the 2nd and 3rd words.
var r = as.compareTo(bs)
// If the 2nd or 3rd words contain an integer, I want to ignore sorting them but add them to the end of the list
if ( as.matches(".*\\d.*") ) {
return -1
} else {
return r
}
}
...
It's not clear what to do if both have numbers, e.g. a 1 a vs a 1 b, but that's something you have to clarify.
So basically you just have to go through your problem, split it into its individual statements, and add some code that solves each one (like the example below).
You might notice there are some gaps (like what to do if both of them have numbers). Once you have a working solution you can clean it up.
Another alternative with a similar idea
var as = a.substring(a.indexOf(" ")) // "a b c" -> "b c"
var bs = b.substring(b.indexOf(" ")) // "a b c" -> "b c"
return as.matches("\\d+") ? -1 : as.compareTo(bs);
Remember the compare(T,T) method returns < 0 if a is "lower" than b, so if a has numbers, it will always be "higher" thus should return 1, if b has numbers then a will be "lower", thus it should return -1, otherwise just compare the strings
Here's the full program:
import java.util.*;
public class Sl {
public static void main(String ... args ) {
var list = Arrays.asList("19th apple orange", "17th admin 7th", "19th apple table");
Collections.sort(list, (a, b) -> {
// just use the last two words
var as = a.substring(a.indexOf(" "));
var bs = b.substring(b.indexOf(" "));
// if a has a number, will always be higher
return as.matches(".*\\d+.*") ? 1
// if b has a number, a will always be lower
: bs.matches(".*\\d+.*") ? -1
// if none of the above, compare lexicographically the strings
: as.compareTo(bs);
});
System.out.println(list);
}
}
If you aren't careful, you will get an error such as Exception in thread "main" java.lang.IllegalArgumentException: Comparison method violates its general contract!
To prevent that, you can create a comparator that splits each string and checks the second and third words for integers. If the first string contains integers but the second one does not, the comparator returns 1, so the first is considered greater and goes toward the bottom of the list. Otherwise, if the second string contains integers, it returns -1, so the second string is the one that goes toward the bottom. If neither contains integers, the two are compared lexicographically.
public static List<String> sortOrders(List<String> orderList) {
Comparator<String> comp = (a, b) -> {
String[] aa = a.split("\\s+", 2);
String[] bb = b.split("\\s+", 2);
boolean aam = aa[1].matches(".*[0-9]+.*");
boolean bbm = bb[1].matches(".*[0-9]+.*");
return aam && !bbm ? 1 : bbm ? -1 :
aa[1].compareTo(bb[1]);
};
return orderList.stream().sorted(comp).toList();
}
If you want to preserve your original data, use the above. If you want to sort in place, then apply the Comparator defined above and use Collections.sort(data, comp).
I have tested this extensively using the following data generation code which generated random strings meeting your requirements. I suggest you test any answers you get (including this one) to ensure it satisfies your requirements.
String letters = "abcdefghijklmnopqrstuvwxyz";
Random r = new Random(123);
List<String> data = r.ints(200000, 1, 100).mapToObj(i -> {
StringBuilder sb = new StringBuilder();
boolean first = r.nextBoolean();
boolean second = r.nextBoolean();
int ltr = r.nextInt(letters.length());
String fstr = letters.substring(ltr,ltr+1);
ltr = r.nextInt(letters.length());
String sstr = letters.substring(ltr,ltr+1);
sb.append(fstr).append(first ? ltr : "").append(" ");
sb.append(fstr);
if (first) {
sb.append(r.nextInt(100));
}
sb.append(" ").append(sstr);
if (!first && second) {
sb.append(r.nextInt(100));
}
return sb.toString();
}).collect(Collectors.toCollection(ArrayList::new));
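One case worth checking with any of the comparators above is when both strings contain digits: a lambda that returns 1 (or -1) regardless of argument order gives the same sign for compare(a, b) and compare(b, a), which the Comparator contract does not allow. The sketch below is one possible way around that, assuming that two digit-containing entries should simply be treated as equal so a stable sort keeps their original relative order. The question leaves that case open, so this is an assumption, and the names OrderSort, rest, hasDigit and BY_REST are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class OrderSort {
    // Everything after the first run of whitespace, i.e. the 2nd and 3rd words.
    // Assumes each entry has at least two words.
    static String rest(String s) {
        return s.split("\\s+", 2)[1];
    }

    static boolean hasDigit(String s) {
        return s.matches(".*[0-9].*");
    }

    static final Comparator<String> BY_REST = (a, b) -> {
        String ra = rest(a);
        String rb = rest(b);
        boolean da = hasDigit(ra);
        boolean db = hasDigit(rb);
        if (da && db) return 0;   // assumption: digit entries are all "equal"
        if (da) return 1;         // only a has digits: a goes after b
        if (db) return -1;        // only b has digits: a goes before b
        return ra.compareTo(rb);  // neither has digits: lexicographic order
    };

    // Returns a sorted copy; List.sort is stable, so entries that compare
    // as equal keep their original relative order.
    static List<String> sortOrders(List<String> orderList) {
        List<String> sorted = new ArrayList<>(orderList);
        sorted.sort(BY_REST);
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortOrders(Arrays.asList(
                "19th apple table", "17th admin 7th", "19th apple orange")));
    }
}
```

With the sample data from the question this puts "19th apple orange" and "19th apple table" first and "17th admin 7th" last.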

how to print the longest of three strings?

Is there a quick way to select the longest of three strings (s1,s2,s3) using if/else method?
I'm using Java
I have tried using something like this
if (s1.length() > s2.length()) {
System.out.println(s1); ...
but did not get it right.
Don't try to program all possible combinations with an if-else construct, as the complexity will grow exponentially if you add more strings.
This solution works well for a small number of strings with a linear complexity:
String longest = s1;
if (s2.length() > longest.length()) {
longest = s2;
}
if (s3.length() > longest.length()) {
longest = s3;
}
System.out.println(longest);
For a larger number of strings, put them in a collection and find the longest using a loop.
You can use if, else if, else in C# (if you aren't actually using Java which it looks like you are) to handle this.
string current = str;
if(str2.Length > current.Length)
{
current = str2;
}
if (str3.Length > current.Length)
{
current = str3;
}
Unless using if/else is a requirement of this code, using a collection and LINQ will be a cleaner option.
List<string> strList = new List<string>
{
"str",
"strLen",
"strLength"
};
// This aggregate will return the longest string in a list.
string longestStr = strList.Aggregate("", (max, cur) => max.Length > cur.Length ? max : cur);
string a = "123";
string b = "1322";
string c = "122332";
if (a.Length > b.Length && a.Length > c.Length)
{
Console.WriteLine(a);
}
else if (b.Length > c.Length)
{
Console.WriteLine(b);
}
else
{
Console.WriteLine(c);
}
}
The if/else constructs in Java are the same as in C#, so you can use the solutions above. LINQ is similar to Streams in Java, where you can write:
public static void main(String args[]) {
printLongest("VampireApi","C#-Api","Java Api");
}
public static void printLongest(String ... strings){
java.util.Arrays
.stream(strings)
.sorted(java.util.Comparator.comparingInt(String::length).reversed())
.findFirst().ifPresent(System.out::println);
}
Create an array and put a string into each element (through a loop or manually): String[] st = new String[3]; then st[0]="aaa"; st[1]="eff"; and so on. After this, loop over the array, take the length of the string at st[i], and keep the highest length seen so far in a variable max (which starts at 0), for example using the Math.max() function.
If the length (which is an integer) is larger than max, save that string in a String variable. The loop goes through every string in your array and updates max when needed. Afterwards you can either return or print the longest string.
This is one of many ways; you could also write three ifs to check. This method also works well with a larger number of strings.
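The loop described above might look something like this. It is only a sketch: the class and method names are made up, and max starts at -1 rather than 0 so that an input consisting only of empty strings still yields a result.

```java
class LongestString {
    // Returns the first string with the greatest length, or null if none are given.
    static String longest(String... strings) {
        String best = null;
        int max = -1; // length of the longest string seen so far
        for (String s : strings) {
            if (s.length() > max) {
                max = s.length();
                best = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longest("aaa", "effff", "bb")); // prints effff
    }
}
```

On ties the earlier string wins, because the comparison is strictly greater-than.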
Not using if-else as the OP asked, but a cleaner solution is this:
void longest(String a, String b, String c) {
String[] triplet = {a, b, c};
Arrays.sort(triplet, Comparator.comparingInt(String::length));
System.out.println(triplet[2]);
}

Check if any part of a string input is not a number

I couldn't find an answer for this in Java, so I'll ask here. I need to check whether each of the 3 parts of a string input contains a number (int).
The input will be HOURS:MINUTES:SECONDS (e.g. 10:40:50, which will be 10 hours, 40 minutes and 50 seconds). So far I am getting the values into a String[] array by splitting the input on :. I have parsed the strings into ints and I am using an if statement to check if all 3 parts are equal to or larger than 0. The problem is that if I now use letters I will just get an error, but I want to check if any of the 3 parts contains a character that is not 0-9, and I don't know how.
First I thought something like this could work, but it really doesn't.
String[] inputString = input.split(":");
if(inputString.length == 3) {
String[] alphabet = {"a","b","c"};
if(ArrayUtils.contains(alphabet,input)){
gives error message
}
int hoursInt = Integer.parseInt(inputString[0]);
int minutesInt = Integer.parseInt(inputString[1]);
int secondsInt = Integer.parseInt(inputString[2]);
else if(hoursInt >= 0 || minutesInt >= 0 || secondsInt >= 0) {
successfull
}
else {
gives error message
}
else {
gives error message
}
In the end I just want to check if any of the three parts contains a character, and if it doesn't, run something.
If you are sure you always have to parse a String of the form/pattern HH:mm:ss
(describing a time of day),
you can try to parse it to a LocalTime, which will only work if the parts HH, mm and ss are actually valid integers and valid time values.
Do it like this and maybe catch an Exception for a wrong input String:
public static void main(String[] arguments) {
String input = "10:40:50";
String wrongInput = "ab:cd:ef";
LocalTime time = LocalTime.parse(input);
System.out.println(time.format(DateTimeFormatter.ISO_LOCAL_TIME));
try {
LocalTime t = LocalTime.parse(wrongInput);
} catch (DateTimeParseException dtpE) {
System.err.println("Input not parseable...");
dtpE.printStackTrace();
}
}
The output of this minimal example is
10:40:50
Input not parseable...
java.time.format.DateTimeParseException: Text 'ab:cd:ef' could not be parsed at index 0
at java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1949)
at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1851)
at java.time.LocalTime.parse(LocalTime.java:441)
at java.time.LocalTime.parse(LocalTime.java:426)
at de.os.prodefacto.StackoverflowDemo.main(StackoverflowDemo.java:120)
I would personally create my own helper methods for this, instead of using an external library such as Apache (unless you already plan on using the library elsewhere in the project).
Here is an example of what it could look like:
public static void main(String[] arguments) {
String time = "10:50:45";
String [] arr = time.split(":");
if (containsNumbers(arr)) {
System.out.println("Time contained a number!");
}
//You can put an else if you want something to happen when it is not a number
}
private static boolean containsNumbers(String[] arr) {
for (String s : arr) {
if (!isNumeric(s)) {
return false;
}
}
return true;
}
public static boolean isNumeric(String str) {
return str.matches("-?\\d+(\\.\\d+)?");
}
containsNumbers will take a String array as an input and use an enhanced for loop to iterate through all the String values, using the other helper method isNumeric that checks if the String is a number or not using regex.
This code has the benefit of not being dependent on Exceptions to handle any of the logic.
You can also modify this code to use a String as a parameter instead of an array, and let it handle the split inside of the method instead of outside.
Note that typically there are better ways to work with date and time, but I thought I would answer your literal question.
Example Runs:
String time = "sd:fe:gbdf";
returns false
String time = "as:12:sda";
returns false
String time = "10:50:45";
returns true
You can check the stream of characters.
If the filter does not detect a non-digit, return "Numeric"
Otherwise, return "Not Numeric"
String str = "922029202s9202920290220";
String result = str.chars()
.filter(c -> !Character.isDigit(c))
.findFirst().isEmpty() ? "Numeric"
: "Not Numeric";
System.out.println(result);
If you want to check with nested loop you can see this proposal:
Scanner scanner = new Scanner(System.in);
String [] inputString = scanner.nextLine().split(":");
for (int i = 0; i < inputString.length; i++) {
String current = inputString[i];
for (int k = 0; k < current.length(); k++) {
if (!Character.isDigit(current.charAt(k))) {
System.out.println("Error");
break;
}
}
}
you could use String.matches method :
String notANum= "ok";
String aNum= "7";
if(notANum.matches("^[0-9]+$")) sop("no way!");
if(aNum.matches("^[0-9]+$")) sop("yes of course!");
The code above would print:
yes of course!
The method accepts a regex; the one in the above example matches non-negative integers.
EDIT
I would use this instead :
if(input.matches("^\\d+:\\d+:\\d+$")) success;
else error
You don't have to split the string.
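As a rough, self-contained sketch of that idea (the class and method names here are invented for illustration), the check can be wrapped in a small helper and the parsing done only after it succeeds:

```java
import java.util.regex.Pattern;

class TimeInputCheck {
    // Three groups of one or more digits separated by ':'.
    private static final Pattern HMS = Pattern.compile("\\d+:\\d+:\\d+");

    // True only if the whole input is digits:digits:digits.
    static boolean isDigitsOnlyTime(String input) {
        return input != null && HMS.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDigitsOnlyTime("10:40:50")); // prints true
        System.out.println(isDigitsOnlyTime("ab:cd:ef")); // prints false
    }
}
```

Note that this only checks the shape of the input. A very long run of digits would still overflow Integer.parseInt, and values such as 99 minutes are accepted, so a range check may still be needed afterwards.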
I tried to improve your code; take a look. You can use a Java regex to validate the numbers. I also defined ranges for the time parts, so values like 24:61:61 are not allowed.
public class Regex {
static boolean range(int timeval,int min,int max)
{
boolean status=false;
if(timeval>=min && timeval<max)
{status=true;}
return status;
}
public static void main(String[] args) {
String regex = "[0-9]{1,2}";
String input ="23:59:59";
String msg="please enter valid time ";
String[] inputString = input.split(":");
if(inputString[0].matches(regex) && inputString[1].matches(regex) && inputString[2].matches(regex) )
{
if(Regex.range(Integer.parseInt(inputString[0]), 00, 24) &&Regex.range(Integer.parseInt(inputString[1]), 00, 60) && Regex.range(Integer.parseInt(inputString[2]), 00, 60))
{msg="converted time = " + Integer.parseInt(inputString[0]) + " : " +Integer.parseInt(inputString[1])+ " : " +Integer.parseInt(inputString[2]) ;}
}
System.out.println(msg);
}
}

Removing leading zero in java code

May I know how I can remove leading zeros in Java code? I tried several methods like regex tools
s.replaceFirst("^0+(?!$)", "") / replaceAll("^0*", "");
but it seems they are not supported at my current compiler compliance level (1.3); I get a red line stating that the method replaceFirst(String,String) is undefined for the type String.
Part of My Java code
public String proc_MODEL(Element recElement)
{
String SEAT = "";
try
{
SEAT = setNullToString(recElement.getChildText("SEAT")); // xml value =0000500
if (SEAT.length()>0)
{
SEAT = SEAT.replaceFirst("^0*", ""); //I need to remove leading zero to only 500
}
}
catch (Exception e)
{
e.printStackTrace();
return "501 Exception in proc_MODEL";
}
}
Appreciate for help.
If you want remove leading zeros, you could parse to an Integer and convert back to a String with one line like
String seat = "001";// setNullToString(recElement.getChildText("SEAT"));
seat = Integer.valueOf(seat).toString();
System.out.println(seat);
Output is
1
Of course if you intend to use the value it's probably better to keep the int
int s = Integer.parseInt(seat);
System.out.println(s);
replaceFirst() was introduced in 1.4 and your compiler pre-dates that.
One possibility is to use something like:
public class testprog {
public static void main(String[] args) {
String s = "0001000";
while ((s.length() > 1) && (s.charAt(0) == '0'))
s = s.substring(1);
System.out.println(s);
}
}
It's not the most efficient code in the world but it'll get the job done.
A more efficient segment without unnecessary string creation could be:
public class testprog {
public static void main(String[] args) {
String s = "0001000";
int pos = 0;
int len = s.length();
while ((pos < len-1) && (s.charAt(pos) == '0'))
pos++;
s = s.substring(pos);
System.out.println(s);
}
}
Both of those also handle the degenerate cases of an empty string and a string containing only 0 characters.
Using a java method str.replaceAll("^0+(?!$)", "") would be simple;
First parameter:regex -- the regular expression to which this string is to be matched.
Second parameter: replacement -- the string which would replace matched expression.
As stated in the Java documentation, replaceFirst has only existed since Java 1.4: http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replaceFirst(java.lang.String,%20java.lang.String)
Use this function instead:
String removeLeadingZeros(String str) {
while (str.indexOf("0")==0)
str = str.substring(1);
return str;
}

Find difference between two Strings

Suppose I have two long strings. They are almost same.
String a = "this is a example";
String b = "this is a examp";
The above code is just an example. The actual strings are quite long.
The problem is that one string has 2 more characters than the other.
How can I check which are those two character?
You can use StringUtils.difference(String first, String second).
This is how they implemented it:
public static String difference(String str1, String str2) {
if (str1 == null) {
return str2;
}
if (str2 == null) {
return str1;
}
int at = indexOfDifference(str1, str2);
if (at == INDEX_NOT_FOUND) {
return EMPTY;
}
return str2.substring(at);
}
public static int indexOfDifference(CharSequence cs1, CharSequence cs2) {
if (cs1 == cs2) {
return INDEX_NOT_FOUND;
}
if (cs1 == null || cs2 == null) {
return 0;
}
int i;
for (i = 0; i < cs1.length() && i < cs2.length(); ++i) {
if (cs1.charAt(i) != cs2.charAt(i)) {
break;
}
}
if (i < cs2.length() || i < cs1.length()) {
return i;
}
return INDEX_NOT_FOUND;
}
To find the difference between 2 Strings you can use the StringUtils class and the difference method. It compares the two Strings, and returns the portion where they differ.
StringUtils.difference(null, null) = null
StringUtils.difference("", "") = ""
StringUtils.difference("", "abc") = "abc"
StringUtils.difference("abc", "") = ""
StringUtils.difference("abc", "abc") = ""
StringUtils.difference("ab", "abxyz") = "xyz"
StringUtils.difference("abcde", "abxyz") = "xyz"
StringUtils.difference("abcde", "xyz") = "xyz"
Without iterating through the strings you can only know that they are different, not where - and that only if they are of different length. If you really need to know what the different characters are, you must step through both strings in tandem and compare characters at the corresponding places.
The following Java snippet efficiently computes a minimal set of characters that have to be removed from (or added to) the respective strings in order to make the strings equal. It's an example of dynamic programming.
import java.util.HashMap;
import java.util.Map;
public class StringUtils {
/**
* Examples
*/
public static void main(String[] args) {
System.out.println(diff("this is a example", "this is a examp")); // prints (le,)
System.out.println(diff("Honda", "Hyundai")); // prints (o,yui)
System.out.println(diff("Toyota", "Coyote")); // prints (Ta,Ce)
System.out.println(diff("Flomax", "Volmax")); // prints (Fo,Vo)
}
/**
* Returns a minimal set of characters that have to be removed from (or added to) the respective
* strings to make the strings equal.
*/
public static Pair<String> diff(String a, String b) {
return diffHelper(a, b, new HashMap<>());
}
/**
* Recursively compute a minimal set of characters while remembering already computed substrings.
* Runs in O(n^2).
*/
private static Pair<String> diffHelper(String a, String b, Map<Long, Pair<String>> lookup) {
long key = ((long) a.length()) << 32 | b.length();
if (!lookup.containsKey(key)) {
Pair<String> value;
if (a.isEmpty() || b.isEmpty()) {
value = new Pair<>(a, b);
} else if (a.charAt(0) == b.charAt(0)) {
value = diffHelper(a.substring(1), b.substring(1), lookup);
} else {
Pair<String> aa = diffHelper(a.substring(1), b, lookup);
Pair<String> bb = diffHelper(a, b.substring(1), lookup);
if (aa.first.length() + aa.second.length() < bb.first.length() + bb.second.length()) {
value = new Pair<>(a.charAt(0) + aa.first, aa.second);
} else {
value = new Pair<>(bb.first, b.charAt(0) + bb.second);
}
}
lookup.put(key, value);
}
return lookup.get(key);
}
public static class Pair<T> {
public Pair(T first, T second) {
this.first = first;
this.second = second;
}
public final T first, second;
public String toString() {
return "(" + first + "," + second + ")";
}
}
}
To directly get only the changed section, and not just the end, you can use Google's Diff Match Patch.
List<Diff> diffs = new DiffMatchPatch().diffMain("stringend", "stringdiffend");
for (Diff diff : diffs) {
if (diff.operation == Operation.INSERT) {
return diff.text; // Return only single diff, can also find multiple based on use case
}
}
For Android, add: implementation 'org.bitbucket.cowwoc:diff-match-patch:1.2'
This package is far more powerful than just this feature, it is mainly used for creating diff related tools.
String strDiffChop(String s1, String s2) {
// The extra characters of the longer string start at the length of the shorter one.
if (s1.length() > s2.length()) {
return s1.substring(s2.length());
} else if (s2.length() > s1.length()) {
return s2.substring(s1.length());
} else {
return null;
}
}
Google's Diff Match Patch is good, but it was a pain to install into my Java maven project. Just adding a maven dependency did not work; eclipse just created the directory and added the lastUpdated info files. Finally, on the third try, I added the following to my pom:
<dependency>
<groupId>fun.mike</groupId>
<artifactId>diff-match-patch</artifactId>
<version>0.0.2</version>
</dependency>
Then I manually placed the jar and source jar files into my .m2 repo from https://search.maven.org/search?q=g:fun.mike%20AND%20a:diff-match-patch%20AND%20v:0.0.2
After all that, the following code worked:
import fun.mike.dmp.Diff;
import fun.mike.dmp.DiffMatchPatch;
DiffMatchPatch dmp = new DiffMatchPatch();
LinkedList<Diff> diffs = dmp.diff_main("Hello World.", "Goodbye World.");
System.out.println(diffs);
The result:
[Diff(DELETE,"Hell"), Diff(INSERT,"G"), Diff(EQUAL,"o"), Diff(INSERT,"odbye"), Diff(EQUAL," World.")]
Obviously, this was not originally written (or even ported fully) into Java. (diff_main? I can feel the C burning into my eyes :-) )
Still, it works. And for people working with long and complex strings, it can be a valuable tool.
To find the words that are different in the two lines, one can use the following code.
String[] strList1 = str1.split(" ");
String[] strList2 = str2.split(" ");
List<String> list1 = Arrays.asList(strList1);
List<String> list2 = Arrays.asList(strList2);
// Prepare a union
List<String> union = new ArrayList<>(list1);
union.addAll(list2);
// Prepare an intersection
List<String> intersection = new ArrayList<>(list1);
intersection.retainAll(list2);
// Subtract the intersection from the union
union.removeAll(intersection);
for (String s : union) {
System.out.println(s);
}
In the end, you will have a list of the words that appear in only one of the two lists. One can easily modify it to get only the words unique to the first list or to the second list, rather than both at once. This can be done by removing the intersection from only list1 or list2 instead of from the union.
Computing the exact location can be done by adding up the lengths of each word in the split list (along with the splitting regex) or by simply doing String.indexOf("subStr").
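The one-sided variant mentioned above could be sketched like this. The class and method names are made up for the example, and like the code above it assumes words are separated by single spaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordDiff {
    // Words of `from` that do not occur anywhere in `other`.
    static List<String> onlyIn(String from, String other) {
        List<String> result = new ArrayList<>(Arrays.asList(from.split(" ")));
        result.removeAll(Arrays.asList(other.split(" ")));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(onlyIn("this is a example", "this is a examp")); // prints [example]
        System.out.println(onlyIn("this is a examp", "this is a example")); // prints [examp]
    }
}
```

Note that this works on whole words, so it reports "example" and "examp" as different words rather than the two extra characters.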
On top of using StringUtils.difference(String first, String second) as seen in other answers, you can also use StringUtils.indexOfDifference(String first, String second) to get the index of where the strings start to differ. Ex:
StringUtils.indexOfDifference("abc", "dabc") = 0
StringUtils.indexOfDifference("abc", "abcd") = 3
where 0 is used as the starting index.
Another great library for discovering the difference between strings is DiffUtils at https://github.com/java-diff-utils. I used Dmitry Naumenko's fork:
public void testDiffChange() {
final List<String> changeTestFrom = Arrays.asList("aaa", "bbb", "ccc");
final List<String> changeTestTo = Arrays.asList("aaa", "zzz", "ccc");
System.out.println("changeTestFrom=" + changeTestFrom);
System.out.println("changeTestTo=" + changeTestTo);
final Patch<String> patch0 = DiffUtils.diff(changeTestFrom, changeTestTo);
System.out.println("patch=" + Arrays.toString(patch0.getDeltas().toArray()));
String original = "abcdefghijk";
String badCopy = "abmdefghink";
List<Character> originalList = original
.chars() // Convert to an IntStream
.mapToObj(i -> (char) i) // Convert int to char, which gets boxed to Character
.collect(Collectors.toList()); // Collect in a List<Character>
List<Character> badCopyList = badCopy.chars().mapToObj(i -> (char) i).collect(Collectors.toList());
System.out.println("original=" + original);
System.out.println("badCopy=" + badCopy);
final Patch<Character> patch = DiffUtils.diff(originalList, badCopyList);
System.out.println("patch=" + Arrays.toString(patch.getDeltas().toArray()));
}
The results show exactly what changed where (zero based counting):
changeTestFrom=[aaa, bbb, ccc]
changeTestTo=[aaa, zzz, ccc]
patch=[[ChangeDelta, position: 1, lines: [bbb] to [zzz]]]
original=abcdefghijk
badCopy=abmdefghink
patch=[[ChangeDelta, position: 2, lines: [c] to [m]], [ChangeDelta, position: 9, lines: [j] to [n]]]
For a simple use case like this, you can check the lengths of the strings and use the split function. For your example:
a.split(b)[1]
I think the Levenshtein algorithm and the 3rd party libraries brought out for this very simple (and perhaps poorly stated?) test case are WAY overblown.
Assuming your example does not suggest the two extra characters are always at the end, I'd suggest the JDK's Arrays.mismatch( char[], char[] ) to find the first index where the two strings differ.
String longer = "this is a example";
String shorter = "this is a examp";
int differencePoint = Arrays.mismatch( longer.toCharArray(), shorter.toCharArray() );
System.out.println( differencePoint );
You could now repeat the process if you suspect the second character is further along in the String.
Or, if as you suggest in your example the two characters are together, there is nothing further to do. Your answer then would be:
System.out.println( longer.charAt( differencePoint ) );
System.out.println( longer.charAt( differencePoint + 1 ) );
If your string contains characters outside of the Basic Multilingual Plane - for example emoji - then you have to use a different technique. For example,
String a = "a 🐣 is cuter than a 🐇.";
String b = "a 🐣 is cuter than a 🐹.";
int firstDifferentChar = Arrays.mismatch( a.toCharArray(), b.toCharArray() );
int firstDifferentCodepoint = Arrays.mismatch( a.codePoints().toArray(), b.codePoints().toArray() );
System.out.println( firstDifferentChar ); // prints 22!
System.out.println( firstDifferentCodepoint ); // prints 20, which is correct.
System.out.println( a.codePoints().toArray()[ firstDifferentCodepoint ] ); // prints out 128007
System.out.println( new String( Character.toChars( 128007 ) ) ); // this prints the rabbit glyph.
You may try this
String a = "this is a example";
String b = "this is a examp";
String ans= a.replace(b, "");
System.out.print(ans);
//ans=le
