Java - How to validate this string? - java

Is there a tool in Java for the kind of task described below?
I have this hard-coded String: {[1;3] || [7;9;10-13]}
The curly brackets {} mean that the content is required
The square brackets [] mean a group that is required
The double pipe || means "OR"
Reading the string above, we get this:
It's required that SOME STRING has 1 AND 3, OR 7 AND 9 AND 10, 11, 12 AND 13
If true, it passes. If false, it does not pass.
I'm trying to hard-code this, but I feel there is an easier or a RIGHT WAY to do this type of validation.
What should I study to learn more about this?
I started with this code, but I feel it is not right:
//Gets the string
String requiredGroups = "{[1;3]||[7;9;10-13]}";
//Gets the groups that an Object belongs to
//It will return something like 5,7,9,10,11,12
List<Integer> groupsThatAnObjectIs = object.getListOfGroups();
//Validate if the Object is in the required groups
if ( DoTheObjectIsInRequiredGroups( groupsThatAnObjectIs, requiredGroups ) ) {
//Do something
}
I'm trying to use this iterator to get the required values from the requiredGroups variable
//Used for values like {[1;3]||[9;10;11-15]} and returns the required values
public static void IterateRequiredValues(String values, List<String> requiredItems) {
values = values.trim();
if( !values.equals("") && values.length() > 0 ) {
values = values.replace("{", "");
values = values.replace("}", "");
String arrayRequiredItems[];
if ( values.contains("||") ) {
arrayRequiredItems = values.split("||");
}
//NOTE: it's not done yet
}
}

The rules are not entirely clear to me.
For example, are you only focusing on || or do you also have &&?
Looking at your example, I gather that the && (AND) operators are implicit in the ;.
Nonetheless, I have made a code example (without much regex) that checks your rules.
First you need to begin with the || operator.
Put all the different OR alternatives into a String array.
Next, for each element of that array, check whether the input value contains all of its values.
If so, the input satisfies your rules.
If a rule contains a range, you must first expand the range into its individual values
and then check each of them the same way as a normal rule value.
Complete code example below.
package nl.stackoverflow.www.so;
import java.util.ArrayList;
import java.util.List;
public class App
{
private String rules = "{[1;3] || [7;9;10-13] || [34;32]}";
public static void main( String[] args )
{
new App();
}
public App() {
String[] values = {"11 12", "10 11 12 13", "1 2 3", "1 3", "32 23", "23 32 53 34"};
// Iterate over each value in String array
for (String value : values) {
if (isWithinRules(value)) {
System.out.println("Success: " + value);
}
}
}
private boolean isWithinRules(String inputValue) {
boolean result = false;
// || is a special char, so you need to escape it with \. and since \ is also a special char
// You need to escape the \ with another \ so \\| is valid for one | (pipe)
String[] orRules = rules.split("\\|\\|");
// Iterate over each or rules
for (String orRule : orRules) {
// Remove [] and {} from rules
orRule = orRule.replace("[", "");
orRule = orRule.replace("]", "");
orRule = orRule.replace("{", "");
orRule = orRule.replace("}", "");
orRule = orRule.trim();
// Split all and rules of or rule
String[] andRules = orRule.split(";");
boolean andRulesApply = true;
// Iterate over all and rules
for (String andRule : andRules) {
andRule = andRule.trim();
// check if andRule is range
if (andRule.contains("-")) {
String[] andRulesRange = andRule.split("-");
int beginRangeAndRule = Integer.parseInt(andRulesRange[0]);
int endRangeAndRule = Integer.parseInt(andRulesRange[1]);
List<String> andRangeRules = new ArrayList<String>();
// Add all values of the range, including the upper bound (10-13 means 10, 11, 12 and 13)
while (beginRangeAndRule <= endRangeAndRule) {
andRangeRules.add(Integer.toString(beginRangeAndRule));
beginRangeAndRule++;
}
for (String andRangeRule : andRangeRules) {
// Check if andRule does not contain in String inputValue
if (!valueContainsRule(inputValue, andRangeRule)) {
andRulesApply = false;
break;
}
}
} else {
// Check if andRule does not contain in String inputValue
if (!valueContainsRule(inputValue, andRule)) {
andRulesApply = false;
break;
}
}
}
// If andRules apply, break and set bool to true because string contains all andRules
if (andRulesApply) {
result = true;
break;
}
}
return result;
}
private boolean valueContainsRule(String val, String rule) {
// Compare whole whitespace-separated tokens; a plain val.contains(rule)
// would also accept "13" as containing both "1" and "3"
for (String token : val.trim().split("\\s+")) {
if (token.equals(rule)) {
return true;
}
}
return false;
}
}
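Since the question already has the object's groups as a List<Integer>, the same idea can be sketched with sets instead of string matching. This is only a sketch: the class and method names (GroupRuleSketch, parse, matches) are made up for illustration, and it assumes the rule string is well-formed (no empty groups, ranges written as low-high).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the original answer
class GroupRuleSketch {

    // Turns "{[1;3] || [7;9;10-13]}" into one set of required numbers per OR alternative
    static List<Set<Integer>> parse(String rule) {
        List<Set<Integer>> alternatives = new ArrayList<>();
        // Drop braces, brackets and whitespace: "1;3||7;9;10-13"
        String body = rule.replaceAll("[{}\\[\\]\\s]", "");
        for (String alternative : body.split("\\|\\|")) {
            Set<Integer> required = new HashSet<>();
            for (String item : alternative.split(";")) {
                if (item.contains("-")) {
                    // Expand a range such as 10-13 into 10, 11, 12, 13
                    String[] bounds = item.split("-");
                    int from = Integer.parseInt(bounds[0]);
                    int to = Integer.parseInt(bounds[1]);
                    for (int n = from; n <= to; n++) {
                        required.add(n);
                    }
                } else {
                    required.add(Integer.parseInt(item));
                }
            }
            alternatives.add(required);
        }
        return alternatives;
    }

    // True if the groups satisfy at least one OR alternative completely
    static boolean matches(String rule, List<Integer> groups) {
        Set<Integer> owned = new HashSet<>(groups);
        for (Set<Integer> required : parse(rule)) {
            if (owned.containsAll(required)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String rule = "{[1;3] || [7;9;10-13]}";
        System.out.println(matches(rule, Arrays.asList(1, 2, 3)));
        System.out.println(matches(rule, Arrays.asList(5, 7, 9, 10, 11, 12)));
    }
}
```

Working on integers avoids the substring problem entirely, because 1 and 13 are different set elements.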

Related

How to program a context-free grammar?

I have two classes here.
The CFG class takes a string array in its constructor that defines the context-free grammar. The SampleTest class is being used to test the CFG class by inputting the grammar (C) into the class, then inputting a string by the user, and seeing if that string can be generated by the context-free grammar.
The problem I'm running into is a stack overflow (obviously). I'm assuming that I just created a never-ending recursive function.
Could someone take a look at the processData() function and help me figure out how to configure it correctly? I'm basically using recursion to generate all possible strings that the CFG can create, then returning true if one of the generated possibilities matches the user's input (inString). Oh, and the wkString parameter is simply the string being generated by the grammar through each recursive iteration.
public class SampleTest {
public static void main(String[] args) {
// Language: strings that contain 0+ b's, followed by 2+ a's,
// followed by 1 b, and ending with 2+ a's.
String[] C = { "S=>bS", "S=>aaT", "T=>aT", "T=>bU", "U=>Ua", "U=>aa" };
String inString, startWkString;
boolean accept1;
CFG CFG1 = new CFG(C);
if (args.length >= 1) {
// Input string is command line parameter
inString = args[0];
char[] startNonTerm = new char[1];
startNonTerm[0] = CFG1.getStartNT();
startWkString = new String(startNonTerm);
accept1 = CFG1.processData(inString, startWkString);
System.out.println(" Accept String? " + accept1);
}
} // end main
} // end class
public class CFG {
private String[] code;
private char startNT;
CFG(String[] c) {
this.code = c;
setStartNT(c[0].charAt(0));
}
void setStartNT(char startNT) {
this.startNT = startNT;
}
char getStartNT() {
return this.startNT;
}
boolean processData(String inString, String wkString) {
if (inString.equals(wkString)) {
return true;
} else if (wkString.length() > inString.length()) {
return false;
}
// search for non-terminal in the working string
boolean containsNT = false;
for (int i = 0; i < wkString.length(); i++) {
// if one of the characters in the working string is a non-terminal
if (Character.isUpperCase(wkString.charAt(i))) {
// mark containsNT as true, and exit the for loop
containsNT = true;
break;
}
}
// if there isn't a non-terminal in the working string
if (containsNT == false) {
return false;
}
// for each production rule
for (int i = 0; i < this.code.length; i++) {
// for each character on the RHS of the production rule
for (int j = 0; j <= this.code[i].length() - 3; j++) {
if (Character.isUpperCase(this.code[i].charAt(j))) {
// make substitution for non-terminal, creating a new working string
String newWk = wkString.replaceFirst(Character.toString(this.code[i].charAt(0)), this.code[i].substring(3));
if (processData(inString, newWk) == true) {
return true;
}
}
}
} // end for loop
return false;
} // end processData
} // end class
Your grammar contains a left-recursive rule
U=>Ua
Recursive-descent parsers can't handle left-recursion, as you've just discovered.
You have two options: alter your grammar so that it is no longer left-recursive, or use a parsing algorithm that can handle it, such as LR(1). In your case, U matches "at least two a characters", so we can just move the recursion to the right:
U=>aU
and everything will be fine. This isn't always possible to do in such a nice way, but in your case, avoiding left-recursion is the easy solution.
You don't need this for loop: "for (int j = 0; j <= this.code[i].length() - 3; j++)". Just create a variable to hold the capital letter found in the non-terminal search you did above. Then, in your outer for loop, whenever a production rule in the String[] starts with that non-terminal, do your substitution and recursion.
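A minimal sketch of that last suggestion: find the leftmost non-terminal in the working string and apply only the rules whose left-hand side is that non-terminal. The names CfgSketch and derives are hypothetical. The sketch assumes rules in the "X=>rhs" format from the question, single uppercase letters as non-terminals, and right-hand sides of at least two characters, so that the working string grows on every step and the length check ends the recursion. A grammar with empty or single non-terminal right-hand sides could still recurse forever.

```java
// Hypothetical rewrite of the question's CFG class, for illustration only
class CfgSketch {
    private final String[] rules;

    CfgSketch(String[] rules) {
        this.rules = rules;
    }

    // True if 'target' can be derived from the working string 'working'
    boolean derives(String target, String working) {
        if (working.equals(target)) {
            return true;
        }
        if (working.length() > target.length()) {
            return false;
        }
        // Locate the leftmost non-terminal (an uppercase letter)
        int nt = -1;
        for (int i = 0; i < working.length(); i++) {
            if (Character.isUpperCase(working.charAt(i))) {
                nt = i;
                break;
            }
        }
        if (nt < 0) {
            return false; // only terminals left, and they differ from the target
        }
        // The terminals before the non-terminal can no longer change
        if (!target.startsWith(working.substring(0, nt))) {
            return false;
        }
        for (String rule : rules) {
            // Apply only rules whose left-hand side is the non-terminal we found
            if (rule.charAt(0) == working.charAt(nt)) {
                String next = working.substring(0, nt) + rule.substring(3) + working.substring(nt + 1);
                if (derives(target, next)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] grammar = { "S=>bS", "S=>aaT", "T=>aT", "T=>bU", "U=>Ua", "U=>aa" };
        System.out.println(new CfgSketch(grammar).derives("aabaa", "S"));
    }
}
```

Because each recursive call substitutes at a known position, a rule is never "applied" to a string that does not contain its left-hand side, so the working string always changes between calls.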

Check if any part of a string input is not a number

I couldn't find an answer for this in Java, so I'll ask here. I need to check whether all 3 parts of a string input contain a number (int).
The input will be HOURS:MINUTES:SECONDS (e.g. 10:40:50, which is 10 hours, 40 minutes and 50 seconds). So far I am splitting the input on : into a String[] array. I parse the strings into ints and use an if statement to check that all 3 parts are equal to or larger than 0. The problem is that if I enter letters I just get an error, but I want to check whether any of the 3 parts contains a character that is not 0-9, and I don't know how.
At first I thought something like this could work, but it really doesn't.
String[] inputString = input.split(":");
if(inputString.length == 3) {
String[] alphabet = {"a","b","c"};
if(ArrayUtils.contains(alphabet,input)){
gives error message
}
int hoursInt = Integer.parseInt(inputString[0]);
int minutesInt = Integer.parseInt(inputString[1]);
int secondsInt = Integer.parseInt(inputString[2]);
else if(hoursInt >= 0 || minutesInt >= 0 || secondsInt >= 0) {
successfull
}
else {
gives error message
}
else {
gives error message
}
In the end I just want to check if any of the three parts contains a non-digit character, and if none does, run something.
If you are sure you always have to parse a String of the form/pattern HH:mm:ss
(describing a time of day),
you can try to parse it to a LocalTime, which will only work if the parts HH, mm and ss are actually valid integers and valid time values.
Do it like this and maybe catch an Exception for a wrong input String:
public static void main(String[] arguments) {
String input = "10:40:50";
String wrongInput = "ab:cd:ef";
LocalTime time = LocalTime.parse(input);
System.out.println(time.format(DateTimeFormatter.ISO_LOCAL_TIME));
try {
LocalTime t = LocalTime.parse(wrongInput);
} catch (DateTimeParseException dtpE) {
System.err.println("Input not parseable...");
dtpE.printStackTrace();
}
}
The output of this minimal example is
10:40:50
Input not parseable...
java.time.format.DateTimeParseException: Text 'ab:cd:ef' could not be parsed at index 0
at java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1949)
at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1851)
at java.time.LocalTime.parse(LocalTime.java:441)
at java.time.LocalTime.parse(LocalTime.java:426)
at de.os.prodefacto.StackoverflowDemo.main(StackoverflowDemo.java:120)
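If only a yes/no answer is needed, the try/catch above can be wrapped in a small helper. This is a sketch with made-up names (LocalTimeCheckSketch, isValidTime), not part of the original answer.

```java
import java.time.LocalTime;
import java.time.format.DateTimeParseException;

// Hypothetical helper wrapping LocalTime.parse in a boolean check
class LocalTimeCheckSketch {

    // True if the input parses as an ISO local time such as 10:40:50
    static boolean isValidTime(String input) {
        try {
            LocalTime.parse(input);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidTime("10:40:50"));
        System.out.println(isValidTime("ab:cd:ef"));
    }
}
```

Note that LocalTime.parse uses the ISO local time format, in which the seconds are optional, so an input like 10:40 is also accepted. If all three parts must be present, an extra check on the number of parts is needed.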
I would personally create my own helper methods for this, instead of using an external library such as Apache (unless you already plan on using the library elsewhere in the project).
Here is an example of what it could look like:
public static void main(String[] arguments) {
String time = "10:50:45";
String [] arr = time.split(":");
if (containsNumbers(arr)) {
System.out.println("Time contained a number!");
}
//You can put an else if you want something to happen when it is not a number
}
private static boolean containsNumbers(String[] arr) {
for (String s : arr) {
if (!isNumeric(s)) {
return false;
}
}
return true;
}
public static boolean isNumeric(String str) {
return str.matches("-?\\d+(\\.\\d+)?");
}
containsNumbers will take a String array as an input and use an enhanced for loop to iterate through all the String values, using the other helper method isNumeric that checks if the String is a number or not using regex.
This code has the benefit of not being dependent on Exceptions to handle any of the logic.
You can also modify this code to use a String as a parameter instead of an array, and let it handle the split inside of the method instead of outside.
Note that typically there are better ways to work with date and time, but I thought I would answer your literal question.
Example Runs:
String time = "sd:fe:gbdf";
returns false
String time = "as:12:sda";
returns false
String time = "10:50:45";
returns true
You can check the stream of characters.
If the filter does not detect a non-digit, return "Numeric"
Otherwise, return "Not Numeric"
String str = "922029202s9202920290220";
String result = str.chars()
.filter(c -> !Character.isDigit(c))
.findFirst().isEmpty() ? "Numeric"
: "Not Numeric";
System.out.println(result);
If you want to check with a nested loop, here is one way to do it:
Scanner scanner = new Scanner(System.in);
String [] inputString = scanner.nextLine().split(":");
for (int i = 0; i < inputString.length; i++) {
String current = inputString[i];
for (int k = 0; k < current.length(); k++) {
if (!Character.isDigit(current.charAt(k))) {
System.out.println("Error");
break;
}
}
}
You could use the String.matches method:
String notANum= "ok";
String aNum= "7";
if (notANum.matches("^[0-9]+$")) System.out.println("no way!");
if (aNum.matches("^[0-9]+$")) System.out.println("yes of course!");
The code above would print:
yes of course!
The method accepts a regex; the one in the example above matches non-negative integers.
EDIT
I would use this instead:
if (input.matches("^\\d+:\\d+:\\d+$")) { /* success */ }
else { /* error */ }
You don't have to split the string.
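The single-regex idea can be taken a step further with capture groups, so the three parts are available for a range check without a separate split. The names TimeInputSketch and isValidTime are hypothetical, and the limits (hours below 24, minutes and seconds below 60) are an assumption about what counts as valid input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical validator, for illustration only
class TimeInputSketch {

    // Three groups of one or two digits separated by colons
    private static final Pattern TIME = Pattern.compile("(\\d{1,2}):(\\d{1,2}):(\\d{1,2})");

    static boolean isValidTime(String input) {
        Matcher m = TIME.matcher(input);
        if (!m.matches()) {
            return false; // wrong shape or a non-digit character somewhere
        }
        // Safe to parse: each group is known to hold one or two digits
        int hours = Integer.parseInt(m.group(1));
        int minutes = Integer.parseInt(m.group(2));
        int seconds = Integer.parseInt(m.group(3));
        return hours < 24 && minutes < 60 && seconds < 60;
    }

    public static void main(String[] args) {
        System.out.println(isValidTime("10:40:50"));
        System.out.println(isValidTime("ab:cd:ef"));
    }
}
```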
I tried to improve your code; take a look. You can use a Java regex to validate the numbers. I also defined a range for each time field, so values such as 24:61:61 are not allowed.
public class Regex {
static boolean range(int timeval,int min,int max)
{
boolean status=false;
if(timeval>=min && timeval<max)
{status=true;}
return status;
}
public static void main(String[] args) {
String regex = "[0-9]{1,2}";
String input ="23:59:59";
String msg="please enter valid time ";
String[] inputString = input.split(":");
if(inputString[0].matches(regex) && inputString[1].matches(regex) && inputString[2].matches(regex) )
{
if(Regex.range(Integer.parseInt(inputString[0]), 00, 24) &&Regex.range(Integer.parseInt(inputString[1]), 00, 60) && Regex.range(Integer.parseInt(inputString[2]), 00, 60))
{msg="converted time = " + Integer.parseInt(inputString[0]) + " : " +Integer.parseInt(inputString[1])+ " : " +Integer.parseInt(inputString[2]) ;}
}
System.out.println(msg);
}
}

Generating logic for possible output of all the combinations of string in a list [closed]

Hi, I am trying to generate a response of true or false depending on whether a code exists in the list. I am able to generate the response if the string contains a single bracketed group of values, for example "ABC(Q,E,1)EEE", but if a string has multiple bracketed groups, like "B(A,1)AA(E,Z)EE", I am not able to generate output for it. I am new to coding and building logic, so it would be great if someone could help.
public static void main(String[] args) {
Scanner scan = new Scanner(System.in);
System.out.println("Enter the code you want to check: ");
String input = scan.next();
List<String> codes = new ArrayList<>();
codes.add("ABC(Q,E,1)EEE");
codes.add("ABDCE(E,Z,X)E");
codes.add("B(A,1)AAEEE");
codes.add("R(1,2,3,4,5)RT(U,M,N,B,V,H)(Q,E,R,F,G,H)(R,Z)");
codes.add("B(A,1)AA(E,Z)EE");
for (Iterator<String> i = codes.iterator(); i.hasNext(); ) {
String code = i.next();
String prefix = code.substring(0, code.indexOf("("));
String suffix = code.substring(code.indexOf(")") + 1);
String middle = code.substring(code.indexOf("(") + 1, code.indexOf(")"));
String[] var = middle.split(",");
String[] result = new String[var.length];
for (int j = 0; j < var.length; j++) {
result[j] = prefix + var[j] + suffix;
if (result[j].equals(input)) {
System.out.println("True: This code is present");
}
}
}
}
Output (which works):
Enter the code you want to check:
BAAAEEE
True: This code is present
Output(not working):
Enter the code you want to check:
BAAAZEE
<gives no output>
Let me give you an example (for "ABC(Q,E,1)EEE") of what is being done: it makes the three possible outputs of this string, which are "ABCQEEE", "ABCEEEE" and "ABC1EEE". So if I give the input "ABCQEEE", it will generate these outputs internally and print True if the code is present anywhere in the list.
If all you have to do is output true or false depending on the user input, you can convert your code strings to regular expressions and check whether the input matches any regex in the list.
Steps:
Convert each element in your codes list to a regex
// convert "ABC(Q,E,1)EEE" to "ABC[QE1]EEE" to match each string starting with ABC followed by one of [QE1] and ending with EEE
//"R(1,2,3,4,5)RT(U,M,N,B,V,H)(Q,E,R,F,G,H)(R,Z)" to "R[12345]RT[UMNBVH][QERFGH][RZ]"
etc
Check if input matches one of the regexes
Example:
public static void main(String[] args) {
Scanner scan = new Scanner(System.in);
System.out.println("Enter the code you want to check: ");
String input = scan.next();
List<String> codes = new ArrayList<>();
codes.add("ABC(Q,E,1)EEE");
codes.add("ABDCE(E,Z,X)E");
codes.add("B(A,1)AAEEE");
codes.add("R(1,2,3,4,5)RT(U,M,N,B,V,H)(Q,E,R,F,G,H)(R,Z)");
codes.add("B(A,1)AA(E,Z)EE");
//list to store the modified strings
List<String> modifiedCodes = new ArrayList<>();
//for each string in list find every group like '('some chars')'
//[^)]* stops at the first ')', so each group is matched on its own
Pattern p = Pattern.compile("\\([^)]*\\)");
for (Iterator<String> i = codes.iterator(); i.hasNext();) {
String code = i.next();
StringBuffer sb = new StringBuffer ();
Matcher m = p.matcher(code);
while (m.find()) {
String match = m.group();
//if found a match replace '(' and ')' with '[' and ']' and remove commas
m.appendReplacement(sb, match.replace('(', '[').replace(')', ']').replace(",", ""));
}
m.appendTail(sb);
//add modified string to list
modifiedCodes.add(sb.toString());
}
boolean codeIsPresent = false;
for(String code: modifiedCodes){
//check if input matches one of the regex in the list 'modifiedCodes'
if (input.matches(code)) {
codeIsPresent = true;
System.out.println("True: This code is present");
break;
}
}
if(!codeIsPresent){
System.out.println("Code not found");
}
}
EDIT
how can we print the list of all the combinations of the string from
which it is getting the output? say, I just have a string
"BA(1,2,3)QW(A-Z,0-9)" and I want all the possible combinations of it
The above question from your comments is slightly different from the original post, so it might be better to post a new question. You can create your own algorithm with some kind of tree structure to solve the issue, but it can be very hackish and messy. I would suggest using a third-party library like generex if possible. You can download the jar from the Maven repository. With generex you can generate all the possible combinations:
public static void main(String args[]){
//change your input to a regular expression
//"BA(1,2,3)QW(A-Z,0-9)" to "BA[1-3]QW[A-Z0-9]" (one character from A-Z or 0-9)
Generex generex = new Generex("BA[1-3]QW[A-Z0-9]");
List<String> matchedStrs = generex.getAllMatchedStrings();
matchedStrs.forEach(System.out::println);
}
Try this.
Edited : Added code comments.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Scanner;
public class Main {
public static void main(String args[]) {
Scanner scan = new Scanner(System.in);
System.out.println("Enter the code you want to check: ");
String input = scan.next();
scan.close();
List<String> codes = new ArrayList<>();
codes.add("ABC(Q,E,1)EEE");
codes.add("ABDCE(E,Z,X)E");
codes.add("B(A,1)AAEEE");
codes.add("R(1,2,3,4,5)RT(U,M,N,B,V,H)(Q,E,R,F,G,H)(R,Z)");
codes.add("B(A,1)AA(E,Z)EE");
for (Iterator<String> i = codes.iterator(); i.hasNext();) {
String code = i.next();
List<String> codePossiblity = generatePossibilities(code);
// check if the input is in the list of all the possibility
for (String s : codePossiblity) {
if (s.contains(input)) {
System.out.println("True: This code is present");
}
}
}
}
/* This method removes the parentheses and generates all the possibilities.
* This method assumes that parentheses always come in pairs, thus
* for every opening parenthesis ["("] there is a closing parenthesis [")"]
* Example: if the "code" is [A(W,X)C(Y,Z)] then it will generate AWCY, AWCZ, AXCY and AXCZ
* (the comments below abbreviate the groups as (WX) and (YZ))
*
* @param code - The string which contains parentheses.
* @return a list of all the possibilities
*/
public static List<String> generatePossibilities(String code) {
// This will hold the left part of the possibleCodes (part before "(")
List<String> possibleCodeList = new LinkedList<>();
String s = code;
boolean first = true;
// Loop while an open parenthesis ["("] can be found
while (s.contains("(")) {
// Retrieve from the string the first substring where "(" starts and ends with ")"
// In the example, in the first iteration will be "WX"
// in the second iteration this will be "YZ"
String inside = s.substring(s.indexOf("(") + 1, s.indexOf(")"));
// Retrieve the list inside the "(" and ")"
// In the example, in the first iteration the list will have "W", "X"
// in the second iteration the list will have "Y", "Z"
String[] listOfChoices = inside.split(",");
// This will hold the right part of the possibleCodes (part after ")")
List<String> listOfCombinations = new LinkedList<>();
// Loop all the possible choices
for (String choice : listOfChoices) {
// If it is the first iteration then you need to include those characters before the "("
if (first) {
// add the characters before the "(" and the remaining characters after ")"
// In the example, the first iteration of this list ("W", "X") will add "AWC(YZ)"
// the second iteration of this list ("W", "X") will add "AXC(YZ)"
listOfCombinations.add(s.substring(0, s.indexOf("(")) + choice + s.substring(s.indexOf(")") + 1));
}
// Else just keep the choice itself
else {
// The text after ")" is already part of every entry in possibleCodeList,
// so only the choice is added here (adding the tail again would duplicate it)
// In the example, the first iteration of this list ("Y", "Z") will add "Y"
// the second iteration of this list ("Y", "Z") will add "Z"
listOfCombinations.add(choice);
}
}
// Remove the subtring before the ")", in the example this will be "C(YZ)"
s = s.substring(s.indexOf(")") + 1);
// If it is the first iteration then you just need to assign the listOfCombinations directly to possibleCodeList,
// since possibleCodeList is still empty
if (first) {
possibleCodeList = listOfCombinations;
first = false;
}
// Else combine the left and right part
else {
List<String> codePossiblity2 = new LinkedList<>();
// Iterate though all the list of possible codes since we want all the elements in the list to be concatenated with the right half of the string
// The list will have "AWC(YZ)" and "AXC(YZ)"
for (String possibleCodes : possibleCodeList) {
// Iterate the possible combinations of the right half of the original string (the second pair of "()")
// The list will have "Y" and "Z"
for (String sTmp : listOfCombinations) {
// Replace the string which are inside the "()" in the left half of the original string.
// Replace it with the right half of the original string
// In the string of "AWC(YZ)" replace "(YZ)" with "Y"
// In the string of "AWC(YZ)" replace "(YZ)" with "Z"
// In the string of "AXC(YZ)" replace "(YZ)" with "Y"
// In the string of "AXC(YZ)" replace "(YZ)" with "Z"
String t = possibleCodes.replace("(" + inside + ")", sTmp);
// add the newly created string to codePossiblity2
codePossiblity2.add(t);
}
// At the end of the loop above codePossiblity2 will have these values
// AWCY, AWCZ, AXCY and AXCZ
}
// overwrite the possibleCodeList since we have now a new left part of the string
possibleCodeList = codePossiblity2;
}
}
return possibleCodeList;
}
}
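For comparison, the same expansion can be sketched recursively: expand everything after the first group, then prepend the prefix and each choice. The names CodeExpansionSketch, expand and isPresent are hypothetical, and the sketch assumes well-formed, non-nested parentheses. Unlike the listing above, it compares with equals, so only a complete code counts as present.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical alternative, for illustration only
class CodeExpansionSketch {

    // Expands "A(W,X)C(Y,Z)" into AWCY, AXCY, AWCZ, AXCZ
    static List<String> expand(String code) {
        List<String> results = new ArrayList<>();
        int open = code.indexOf('(');
        if (open < 0) {
            results.add(code); // no group left: the code stands for itself
            return results;
        }
        int close = code.indexOf(')', open);
        String prefix = code.substring(0, open);
        String[] choices = code.substring(open + 1, close).split(",");
        // Expand the rest first, then combine it with every choice of this group
        for (String tail : expand(code.substring(close + 1))) {
            for (String choice : choices) {
                results.add(prefix + choice + tail);
            }
        }
        return results;
    }

    // True if the input equals one of the expansions of any code in the list
    static boolean isPresent(List<String> codes, String input) {
        for (String code : codes) {
            if (expand(code).contains(input)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> codes = Arrays.asList("ABC(Q,E,1)EEE", "B(A,1)AA(E,Z)EE");
        System.out.println(isPresent(codes, "BAAAZEE"));
    }
}
```

The list returned by expand also answers the follow-up about printing every combination, at least for groups that list their choices explicitly; ranges such as A-Z would need to be expanded first.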

Removing leading zero in java code

How can I remove leading zeros in Java code? I tried several approaches, such as regex-based methods
s.replaceFirst("^0+(?!$)", "") / replaceAll("^0*", "");
but they do not seem to be supported at my current compiler compliance level (1.3); I get a red underline stating that the method replaceFirst(String,String) is undefined for the type String.
Part of my Java code:
public String proc_MODEL(Element recElement)
{
String SEAT = "";
try
{
SEAT = setNullToString(recElement.getChildText("SEAT")); // xml value = 0000500
if (SEAT.length()>0)
{
SEAT = SEAT.replaceFirst("^0*", ""); // I need to remove the leading zeros to get only 500
}
}
catch (Exception e)
{
e.printStackTrace();
return "501 Exception in proc_MODEL";
}
return SEAT;
}
I'd appreciate any help.
If you want to remove leading zeros, you could parse to an Integer and convert back to a String in one line, like
String seat = "001";// setNullToString(recElement.getChildText("SEAT"));
seat = Integer.valueOf(seat).toString();
System.out.println(seat);
Output is
1
Of course if you intend to use the value it's probably better to keep the int
int s = Integer.parseInt(seat);
System.out.println(s);
replaceFirst() was introduced in 1.4 and your compiler pre-dates that.
One possibility is to use something like:
public class testprog {
public static void main(String[] args) {
String s = "0001000";
while ((s.length() > 1) && (s.charAt(0) == '0'))
s = s.substring(1);
System.out.println(s);
}
}
It's not the most efficient code in the world but it'll get the job done.
A more efficient segment without unnecessary string creation could be:
public class testprog {
public static void main(String[] args) {
String s = "0001000";
int pos = 0;
int len = s.length();
while ((pos < len-1) && (s.charAt(pos) == '0'))
pos++;
s = s.substring(pos);
System.out.println(s);
}
}
Both of those also handle the degenerate cases of an empty string and a string containing only 0 characters.
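The edge cases mentioned above can be checked with a small harness. This is a sketch that moves the second loop into a helper method; the names LeadingZeroSketch and stripLeadingZeros are made up for illustration. It uses only String.length, charAt and substring, so no regex support is needed.

```java
// Hypothetical harness around the index-based loop shown above
class LeadingZeroSketch {

    // Skips leading '0' characters but always keeps the last character
    static String stripLeadingZeros(String s) {
        int pos = 0;
        int len = s.length();
        while ((pos < len - 1) && (s.charAt(pos) == '0')) {
            pos++;
        }
        return s.substring(pos);
    }

    public static void main(String[] args) {
        String[] samples = { "", "0", "000", "0001000", "0000500" };
        for (int i = 0; i < samples.length; i++) {
            System.out.println("[" + samples[i] + "] -> [" + stripLeadingZeros(samples[i]) + "]");
        }
    }
}
```

An empty string stays empty, and a string of only zeros is reduced to a single 0 rather than to nothing.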
Using the Java method str.replaceAll("^0+(?!$)", "") would be simple (note that replaceAll, like replaceFirst, requires Java 1.4 or later):
First parameter: regex -- the regular expression to which this string is to be matched.
Second parameter: replacement -- the string which would replace the matched expression.
As stated in the Java documentation, 'replaceFirst' has only existed since Java 1.4: http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replaceFirst(java.lang.String,%20java.lang.String)
Use this function instead:
String removeLeadingZeros(String str) {
while (str.indexOf("0")==0)
str = str.substring(1);
return str;
}

Find difference between two Strings

Suppose I have two long strings. They are almost the same.
String a = "this is a example";
String b = "this is a examp";
The code above is just an example. The actual strings are quite long.
The problem is that one string has 2 more characters than the other.
How can I check which two characters those are?
You can use StringUtils.difference(String first, String second) from Apache Commons Lang.
This is how they implemented it:
public static String difference(String str1, String str2) {
if (str1 == null) {
return str2;
}
if (str2 == null) {
return str1;
}
int at = indexOfDifference(str1, str2);
if (at == INDEX_NOT_FOUND) {
return EMPTY;
}
return str2.substring(at);
}
public static int indexOfDifference(CharSequence cs1, CharSequence cs2) {
if (cs1 == cs2) {
return INDEX_NOT_FOUND;
}
if (cs1 == null || cs2 == null) {
return 0;
}
int i;
for (i = 0; i < cs1.length() && i < cs2.length(); ++i) {
if (cs1.charAt(i) != cs2.charAt(i)) {
break;
}
}
if (i < cs2.length() || i < cs1.length()) {
return i;
}
return INDEX_NOT_FOUND;
}
To find the difference between 2 Strings you can use the StringUtils class and the difference method. It compares the two Strings, and returns the portion where they differ.
StringUtils.difference(null, null) = null
StringUtils.difference("", "") = ""
StringUtils.difference("", "abc") = "abc"
StringUtils.difference("abc", "") = ""
StringUtils.difference("abc", "abc") = ""
StringUtils.difference("ab", "abxyz") = "xyz"
StringUtils.difference("abcde", "abxyz") = "xyz"
StringUtils.difference("abcde", "xyz") = "xyz"
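One consequence of these examples is that the argument order matters: the result is always the remainder of the second string. The sketch below re-implements the logic locally, following the listing shown earlier, so that it runs without the library; the class name DifferenceSketch is made up.

```java
// Local re-implementation for illustration; the real method lives in Apache Commons Lang
class DifferenceSketch {

    // Returns the part of str2 starting where it first differs from str1
    static String difference(String str1, String str2) {
        if (str1 == null) {
            return str2;
        }
        if (str2 == null) {
            return str1;
        }
        int i = 0;
        while (i < str1.length() && i < str2.length() && str1.charAt(i) == str2.charAt(i)) {
            i++;
        }
        return str2.substring(i);
    }

    public static void main(String[] args) {
        String a = "this is a example";
        String b = "this is a examp";
        System.out.println("[" + difference(a, b) + "]"); // remainder of the shorter string
        System.out.println("[" + difference(b, a) + "]"); // remainder of the longer string
    }
}
```

For the strings in the question, passing the shorter string first yields the two extra characters, while the other order yields an empty string.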
Without iterating through the strings you can only know that they are different, not where - and that only if they are of different length. If you really need to know what the different characters are, you must step through both strings in tandem and compare characters at the corresponding places.
The following Java snippet efficiently computes a minimal set of characters that have to be removed from (or added to) the respective strings in order to make the strings equal. It's an example of dynamic programming.
import java.util.HashMap;
import java.util.Map;
public class StringUtils {
/**
* Examples
*/
public static void main(String[] args) {
System.out.println(diff("this is a example", "this is a examp")); // prints (le,)
System.out.println(diff("Honda", "Hyundai")); // prints (o,yui)
System.out.println(diff("Toyota", "Coyote")); // prints (Ta,Ce)
System.out.println(diff("Flomax", "Volmax")); // prints (Fo,Vo)
}
/**
* Returns a minimal set of characters that have to be removed from (or added to) the respective
* strings to make the strings equal.
*/
public static Pair<String> diff(String a, String b) {
return diffHelper(a, b, new HashMap<>());
}
/**
* Recursively compute a minimal set of characters while remembering already computed substrings.
* Runs in O(n^2).
*/
private static Pair<String> diffHelper(String a, String b, Map<Long, Pair<String>> lookup) {
long key = ((long) a.length()) << 32 | b.length();
if (!lookup.containsKey(key)) {
Pair<String> value;
if (a.isEmpty() || b.isEmpty()) {
value = new Pair<>(a, b);
} else if (a.charAt(0) == b.charAt(0)) {
value = diffHelper(a.substring(1), b.substring(1), lookup);
} else {
Pair<String> aa = diffHelper(a.substring(1), b, lookup);
Pair<String> bb = diffHelper(a, b.substring(1), lookup);
if (aa.first.length() + aa.second.length() < bb.first.length() + bb.second.length()) {
value = new Pair<>(a.charAt(0) + aa.first, aa.second);
} else {
value = new Pair<>(bb.first, b.charAt(0) + bb.second);
}
}
lookup.put(key, value);
}
return lookup.get(key);
}
public static class Pair<T> {
public Pair(T first, T second) {
this.first = first;
this.second = second;
}
public final T first, second;
public String toString() {
return "(" + first + "," + second + ")";
}
}
}
To directly get only the changed section, and not just the end, you can use Google's Diff Match Patch.
List<Diff> diffs = new DiffMatchPatch().diffMain("stringend", "stringdiffend");
for (Diff diff : diffs) {
if (diff.operation == Operation.INSERT) {
return diff.text; // Return only single diff, can also find multiple based on use case
}
}
For Android, add: implementation 'org.bitbucket.cowwoc:diff-match-patch:1.2'
This package is far more powerful than just this feature, it is mainly used for creating diff related tools.
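String strDiffChop(String s1, String s2) {
// Returns the extra tail of the longer string; assumes the shorter string is a prefix of the longer one
if (s1.length() > s2.length()) {
return s1.substring(s2.length());
} else if (s2.length() > s1.length()) {
return s2.substring(s1.length());
} else {
return null;
}
}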
Google's Diff Match Patch is good, but it was a pain to install into my Java Maven project. Just adding a Maven dependency did not work; Eclipse just created the directory and added the lastUpdated info files. Finally, on the third try, I added the following to my pom:
<dependency>
<groupId>fun.mike</groupId>
<artifactId>diff-match-patch</artifactId>
<version>0.0.2</version>
</dependency>
Then I manually placed the jar and source jar files into my .m2 repo from https://search.maven.org/search?q=g:fun.mike%20AND%20a:diff-match-patch%20AND%20v:0.0.2
After all that, the following code worked:
import fun.mike.dmp.Diff;
import fun.mike.dmp.DiffMatchPatch;
DiffMatchPatch dmp = new DiffMatchPatch();
LinkedList<Diff> diffs = dmp.diff_main("Hello World.", "Goodbye World.");
System.out.println(diffs);
The result:
[Diff(DELETE,"Hell"), Diff(INSERT,"G"), Diff(EQUAL,"o"), Diff(INSERT,"odbye"), Diff(EQUAL," World.")]
Obviously, this was not originally written (or even ported fully) into Java. (diff_main? I can feel the C burning into my eyes :-) )
Still, it works. And for people working with long and complex strings, it can be a valuable tool.
To find the words that are different in the two lines, one can use the following code.
String[] strList1 = str1.split(" ");
String[] strList2 = str2.split(" ");
List<String> list1 = Arrays.asList(strList1);
List<String> list2 = Arrays.asList(strList2);
// Prepare a union
List<String> union = new ArrayList<>(list1);
union.addAll(list2);
// Prepare an intersection
List<String> intersection = new ArrayList<>(list1);
intersection.retainAll(list2);
// Subtract the intersection from the union
union.removeAll(intersection);
for (String s : union) {
System.out.println(s);
}
In the end, you will have a list of the words that appear in only one of the two lists. This is easy to modify if you want only the words unique to the first list or only those unique to the second: remove the intersection from list1 or list2 alone instead of from the union.
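As a minimal sketch of that one-sided variant, the following keeps only the words of the first string that never occur in the second. The class name WordDiff and the helper onlyInFirst are made up for illustration; splitting on a single space is carried over from the code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WordDiff {
    // Hypothetical helper: words of 'first' that do not occur anywhere in 'second'.
    static List<String> onlyInFirst(String first, String second) {
        // Copy into an ArrayList because Arrays.asList returns a fixed-size list.
        List<String> result = new ArrayList<>(Arrays.asList(first.split(" ")));
        result.removeAll(Arrays.asList(second.split(" ")));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(onlyInFirst("the quick brown fox", "the slow brown dog"));
        // prints [quick, fox]
    }
}
```

Swapping the two arguments gives the words unique to the second string.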
Computing the exact location can be done by adding up the lengths of each word in the split list (along with the splitting regex) or by simply doing String.indexOf("subStr").
On top of using StringUtils.difference(String first, String second) as seen in other answers, you can also use StringUtils.indexOfDifference(String first, String second) to get the index of where the strings start to differ. Ex:
StringUtils.indexOfDifference("abc", "dabc") = 0
StringUtils.indexOfDifference("abc", "abcd") = 3
where 0 is used as the starting index.
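If pulling in Commons Lang is not an option, comparable behavior for non-null strings can be sketched with the plain JDK. This is not the library's implementation; the class FirstDifference and its method are hypothetical names, and returning -1 for equal strings is chosen here to mirror what StringUtils.indexOfDifference does.

```java
public class FirstDifference {
    // Hypothetical helper: index of the first position where a and b differ,
    // or -1 if the two strings are equal. Both arguments are assumed non-null.
    static int indexOfDifference(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        for (int i = 0; i < limit; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        // One string is a prefix of the other (or they are equal).
        return a.length() == b.length() ? -1 : limit;
    }

    public static void main(String[] args) {
        System.out.println(indexOfDifference("abc", "dabc")); // prints 0
        System.out.println(indexOfDifference("abc", "abcd")); // prints 3
    }
}
```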
Another great library for discovering the difference between strings is DiffUtils at https://github.com/java-diff-utils. I used Dmitry Naumenko's fork:
public void testDiffChange() {
final List<String> changeTestFrom = Arrays.asList("aaa", "bbb", "ccc");
final List<String> changeTestTo = Arrays.asList("aaa", "zzz", "ccc");
System.out.println("changeTestFrom=" + changeTestFrom);
System.out.println("changeTestTo=" + changeTestTo);
final Patch<String> patch0 = DiffUtils.diff(changeTestFrom, changeTestTo);
System.out.println("patch=" + Arrays.toString(patch0.getDeltas().toArray()));
String original = "abcdefghijk";
String badCopy = "abmdefghink";
List<Character> originalList = original
.chars() // Convert to an IntStream
.mapToObj(i -> (char) i) // Convert int to char, which gets boxed to Character
.collect(Collectors.toList()); // Collect in a List<Character>
List<Character> badCopyList = badCopy.chars().mapToObj(i -> (char) i).collect(Collectors.toList());
System.out.println("original=" + original);
System.out.println("badCopy=" + badCopy);
final Patch<Character> patch = DiffUtils.diff(originalList, badCopyList);
System.out.println("patch=" + Arrays.toString(patch.getDeltas().toArray()));
}
The results show exactly what changed where (zero based counting):
changeTestFrom=[aaa, bbb, ccc]
changeTestTo=[aaa, zzz, ccc]
patch=[[ChangeDelta, position: 1, lines: [bbb] to [zzz]]]
original=abcdefghijk
badCopy=abmdefghink
patch=[[ChangeDelta, position: 2, lines: [c] to [m]], [ChangeDelta, position: 9, lines: [j] to [n]]]
For a simple use case like this, you can check the lengths of the strings and use the split function. This only works when the shorter string b is contained in the longer string a. For your example:
a.split(b)[1]
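One caveat: String.split treats its argument as a regular expression, so a shorter string containing characters such as ( or . may not match literally. A hedged sketch that quotes the argument and guards against a missing second element could look like this (SplitDiff and remainderAfter are illustrative names, not part of any library):

```java
import java.util.regex.Pattern;

public class SplitDiff {
    // Hypothetical helper: text of 'longer' that follows the first occurrence
    // of 'shorter', or "" if 'shorter' does not occur or nothing follows it.
    static String remainderAfter(String longer, String shorter) {
        // Pattern.quote makes split match 'shorter' literally;
        // the limit of 2 keeps everything after the first match in one piece.
        String[] parts = longer.split(Pattern.quote(shorter), 2);
        return parts.length > 1 ? parts[1] : "";
    }

    public static void main(String[] args) {
        System.out.println(remainderAfter("this is a example", "this is a examp")); // prints le
    }
}
```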
I think the Levenshtein algorithm and the 3rd party libraries brought out for this very simple (and perhaps poorly stated?) test case are WAY overblown.
Assuming your example does not suggest the two characters are always different at the end, I'd suggest the JDK's Arrays.mismatch( char[], char[] ) (available since Java 9) to find the first index where the two strings differ.
String longer = "this is a example";
String shorter = "this is a examp";
int differencePoint = Arrays.mismatch( longer.toCharArray(), shorter.toCharArray() );
System.out.println( differencePoint );
You could now repeat the process if you suspect the second character is further along in the String.
Or, if as you suggest in your example the two characters are together, there is nothing further to do. Your answer then would be:
System.out.println( longer.charAt( differencePoint ) );
System.out.println( longer.charAt( differencePoint + 1 ) );
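For the case where the differing characters are not adjacent, repeating the search could be sketched as below. It relies on the range overload of Arrays.mismatch (Java 9 or later) and only compares the part both strings have in common; the class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AllMismatches {
    // Hypothetical helper: every index (within the common length) where a and b differ.
    static List<Integer> allMismatches(String a, String b) {
        char[] x = a.toCharArray();
        char[] y = b.toCharArray();
        int common = Math.min(x.length, y.length);
        List<Integer> positions = new ArrayList<>();
        int from = 0;
        while (from < common) {
            // The range overload returns an index relative to 'from', or -1 if none.
            int rel = Arrays.mismatch(x, from, common, y, from, common);
            if (rel < 0) {
                break;
            }
            positions.add(from + rel);
            from += rel + 1; // continue just past the mismatch found
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(allMismatches("abcdefghijk", "abmdefghink")); // prints [2, 9]
    }
}
```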
If your string contains characters outside of the Basic Multilingual Plane - for example emoji - then you have to use a different technique. For example,
String a = "a 🐣 is cuter than a 🐇.";
String b = "a 🐣 is cuter than a 🐹.";
int firstDifferentChar = Arrays.mismatch( a.toCharArray(), b.toCharArray() );
int firstDifferentCodepoint = Arrays.mismatch( a.codePoints().toArray(), b.codePoints().toArray() );
System.out.println( firstDifferentChar ); // prints 22!
System.out.println( firstDifferentCodepoint ); // prints 20, which is correct.
System.out.println( a.codePoints().toArray()[ firstDifferentCodepoint ] ); // prints out 128007
System.out.println( new String( Character.toChars( 128007 ) ) ); // this prints the rabbit glyph.
You may try this
String a = "this is a example";
String b = "this is a examp";
String ans= a.replace(b, "");
System.out.print(ans);
//ans=le
