How to match the biggest digit in regex - java

I have some PDF files in my Downloads folder whose names follow a particular pattern. I need to take the latest saved file.
My code is:
import java.io.File;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LatestPdf {
    public static void main(String[] args) {
        String directory = System.getProperty("user.home") + "\\Downloads";
        File dir = new File(directory);
        for (File file : dir.listFiles()) {
            if (file.getName().endsWith(".pdf")) {
                String res = file.getName();
                match(res);
                //System.out.println(file.getName());
            }
        }
    }

    private static void match(String res) {
        String pattern = "[a-zA-Z][0-9][0-9]CR[0-9][0-9][0-9][0-9]-[a-zA-Z][a-zA-Z][a-zA-Z]-[A-Z]-[0-9] \\(\\d+\\).pdf";
        Pattern r = Pattern.compile(pattern);
        Matcher m = r.matcher(res);
        if (m.find()) {
            System.out.println("******* Match *********" + m.group());
        } else {
            System.out.println("******No match*******");
        }
    }
}
My output looks like this:
******* Match *********F90CR0010-HBR-C-4 (5).pdf
******* Match *********F90CR0010-HBR-C-4 (6).pdf
******* Match *********F90CR0010-HBR-C-4 (7).pdf
Now I need to find the file that has the greatest number inside the parentheses (). So in this case I need:
******* Match *********F90CR0010-HBR-C-4 (7).pdf
How can I match the greatest integer with a regex?
Thanks

A simple strategy may be to retrieve the number in parentheses, fill a sorted map that maps number -> filename, and finally get the filename associated with the greatest number. I don't think it's possible with a regex alone.
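A minimal sketch of the sorted-map idea, assuming the file names have the shape shown in the question's output. The class and method names (LatestPdfByMap, latest) are illustrative, and the hard-coded names stand in for the Downloads listing. It relies on TreeMap keeping its keys sorted, so lastEntry() returns the greatest number:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LatestPdfByMap {
    // Assumed name shape, taken from the question's output: "F90CR0010-HBR-C-4 (7).pdf".
    // Group 1 captures the number inside the parentheses.
    private static final Pattern NAME =
            Pattern.compile("[a-zA-Z]\\d{2}CR\\d{4}-[a-zA-Z]{3}-[A-Z]-\\d \\((\\d+)\\)\\.pdf");

    // Returns the name with the greatest number in parentheses, or null if no name matches.
    static String latest(List<String> names) {
        // A TreeMap keeps its keys sorted, so its last entry holds the greatest number.
        TreeMap<Integer, String> byNumber = new TreeMap<>();
        for (String name : names) {
            Matcher m = NAME.matcher(name);
            if (m.matches()) {
                byNumber.put(Integer.parseInt(m.group(1)), name);
            }
        }
        Map.Entry<Integer, String> last = byNumber.lastEntry();
        return last == null ? null : last.getValue();
    }

    public static void main(String[] args) {
        // Sample names from the question, standing in for the Downloads listing.
        List<String> names = Arrays.asList(
                "F90CR0010-HBR-C-4 (5).pdf",
                "F90CR0010-HBR-C-4 (6).pdf",
                "F90CR0010-HBR-C-4 (7).pdf");
        System.out.println(latest(names)); // F90CR0010-HBR-C-4 (7).pdf
    }
}
```

If two files carry the same number, the one seen later replaces the earlier one in the map.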

You can add a capturing group to your regex, and a counter to keep track of the greatest number:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreatestNumber {
    public static void main(String[] args) {
        int greater = 0;
        String greaterFile = "";
        // Same regex as before, plus a capturing group (\\d+) around the number in parentheses
        String pattern = "[a-zA-Z][0-9][0-9]CR[0-9][0-9][0-9][0-9]-[a-zA-Z][a-zA-Z][a-zA-Z]-[A-Z]-[0-9] \\((\\d+)\\).pdf";
        Pattern r = Pattern.compile(pattern);
        Matcher m = r.matcher("F90CR0010-HBR-C-4 (7).pdf");
        if (m.find()) {
            System.out.println("******* Match *********" + m.group());
            int number = Integer.parseInt(m.group(1)); // group 1 is the number
            if (number > greater) {
                greater = number;
                greaterFile = m.group();
            }
        } else {
            System.out.println("******No match*******");
        }
        System.out.println("Greater number is " + greater + " for " + greaterFile);
    }
}
Notice that the inner pair of parentheses in \\((\\d+)\\).pdf is not escaped: unescaped parentheses define a capturing group, while the escaped \\( and \\) match the literal parentheses in the file name.
The group can then be retrieved by its index: group 0 is the entire match, and group 1 is our number.
This is for one file, but you can easily transpose it to your context.
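Transposed to the question's loop, the counter idea might look like the sketch below. It assumes the file names have the shape shown in the question, uses the simplified pattern with the final dot escaped (\\.pdf) so it only matches a literal dot, and starts the counter at -1 so that a file numbered (0) is also picked up. The class and method names (LatestPdfByCounter, greatest) are hypothetical; new File(parent, child) is used instead of concatenating "\\Downloads" so the path also works outside Windows.

```java
import java.io.File;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LatestPdfByCounter {
    // Simplified pattern; group 1 is the number inside the parentheses.
    private static final Pattern NAME =
            Pattern.compile("[a-zA-Z]\\d{2}CR\\d{4}-[a-zA-Z]{3}-[A-Z]-\\d \\((\\d+)\\)\\.pdf");

    // Applies the counter idea to every name; returns "" when nothing matches.
    static String greatest(String[] names) {
        int greater = -1;
        String greaterFile = "";
        for (String name : names) {
            Matcher m = NAME.matcher(name);
            if (m.matches()) {
                int number = Integer.parseInt(m.group(1));
                if (number > greater) {
                    greater = number;
                    greaterFile = name;
                }
            }
        }
        return greaterFile;
    }

    public static void main(String[] args) {
        File dir = new File(System.getProperty("user.home"), "Downloads");
        // list() returns null if the directory does not exist or cannot be read.
        String[] names = dir.list();
        if (names == null) {
            System.out.println("Cannot read " + dir);
            return;
        }
        System.out.println("Greatest: " + greatest(names));
    }
}
```

Keeping greatest() separate from main makes it easy to try on a plain array of names without touching the file system.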
Edit: your regex can be simplified like this:
String pattern="[a-zA-Z]\\d{2}CR\\d{4}-[a-zA-Z]{3}-[A-Z]-\\d \\((\\d+)\\).pdf";
\\d means a digit, and {n} means the preceding element exactly n times.
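One detail that both versions of the regex share: the . before pdf is not escaped, and an unescaped . matches any single character. The sketch below (the class name EscapedDotDemo and the sample name ending in _pdf are hypothetical) compares the simplified regex as written with a variant that uses \\. to match only a literal dot:

```java
import java.util.regex.Pattern;

public class EscapedDotDemo {
    // The simplified regex, with the dot before "pdf" left unescaped.
    static final Pattern UNESCAPED =
            Pattern.compile("[a-zA-Z]\\d{2}CR\\d{4}-[a-zA-Z]{3}-[A-Z]-\\d \\((\\d+)\\).pdf");
    // The same regex with \\. so that only a literal '.' is accepted before "pdf".
    static final Pattern ESCAPED =
            Pattern.compile("[a-zA-Z]\\d{2}CR\\d{4}-[a-zA-Z]{3}-[A-Z]-\\d \\((\\d+)\\)\\.pdf");

    public static void main(String[] args) {
        String real = "F90CR0010-HBR-C-4 (7).pdf";
        String fake = "F90CR0010-HBR-C-4 (7)_pdf"; // hypothetical name with '_' instead of '.'
        System.out.println(UNESCAPED.matcher(real).matches()); // true
        System.out.println(UNESCAPED.matcher(fake).matches()); // true: '.' matches any character
        System.out.println(ESCAPED.matcher(real).matches());   // true
        System.out.println(ESCAPED.matcher(fake).matches());   // false
    }
}
```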

Related

Java check if a scan input is both a specific string and only three digits

I have a project that asks: "Order is entered by the user. The order either begins with FB or SB and then has three digits after those letters. Must check to be sure the order number is either letter code and only three digits." The project is in Java.
Example:
Create order number [FB or SB for type of gift and three integers]: FB343
I'm struggling to figure out how to validate both parts in one input.
That looks like a job for a regular expression. You can use a Pattern and a Matcher to test whether the given order matches the pattern: does it start with F or S, then B, and then exactly three digits? Like this:
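import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class OrderCheck {
    public static void main(String[] args) {
        String[] arr = { "SB123", "FB124", "CBXXX", "FB1234" };
        Pattern p = Pattern.compile("[SF]B\\d{3}");
        for (String s : arr) {
            Matcher m = p.matcher(s);
            System.out.printf("%s %b%n", s, m.matches()); // matches() requires the whole string to match
        }
    }
}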
Outputs
SB123 true
FB124 true
CBXXX false
FB1234 false
Regex should do the trick.
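Just run Pattern.matches() on the input with the regex ^((FB)|(SB){1})([0-9]{3})$ (the {1} is redundant but harmless).
Something like:
public class OrderValidator {
    // Pattern.matches(regex, input) returns true only if the whole input matches the regex
    static boolean isValidOrder(String input) {
        return java.util.regex.Pattern.matches("^((FB)|(SB){1})([0-9]{3})$", input);
    }
}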
So to further enhance the Order Number validation:
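String orderNumber = "fb323"; // The order number to validate.
int minNumber = 100;  // The smallest number an order may contain.
int maxNumber = 2500; // The largest number an order may contain.
// Check the format first: (?i) makes the match case-insensitive, \\d{3,} means three or more digits.
// Parsing only after a successful match avoids a NumberFormatException on input without digits.
boolean valid = false;
if (orderNumber.matches("(?i)[SF]B\\d{3,}")) {
    int curNumber = Integer.parseInt(orderNumber.replaceAll("\\D", ""));
    valid = curNumber >= minNumber && curNumber <= maxNumber;
}
if (valid) {
    System.out.println("VALID!");
} else {
    System.out.println("INVALID!");
}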

Splitting a String that has a particular structure

I have a string like this:
"330 Daniel T92435"
I need to extract the name "Daniel". I could simply write
string.substring(4,11);
but the position of the name ("Daniel") can vary,
and I don't want to use the split() method.
I was wondering whether there is a way to make substring read data until a whitespace is found.
If the input string always has the structure "someSymbols Name someSymbols", you can use the following regular expression to extract the name:
"[^\\s]+\\s+(\\p{Alpha}+)\\s+[^\\s]+"
\\p{Alpha} - alphabetic character;
\\s - white space;
[^\\s] - any symbol apart from the white space.
In the code below, Pattern is an object representing the regular expression. Matcher, in turn, is responsible for navigating the given string and finding the parts of it that match the pattern.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NameFinder {
    public static String findName(String source) {
        Pattern pattern = Pattern.compile("[^\\s]+\\s+(\\p{Alpha}+)\\s+[^\\s]+");
        Matcher matcher = pattern.matcher(source);
        String result = "no match was found";
        if (matcher.find()) {
            result = matcher.group(1); // group 1 corresponds to the first element enclosed in parentheses (\\p{Alpha}+)
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findName("330 Daniel T92435"));
    }
}
Output
Daniel
You can use the String.indexOf(" ") method:
int start = string.indexOf(" ")+1;
string.substring(start,start + 7);
Edit: You can use
int start = string.indexOf(" ")+1;
int end = string.indexOf(" ", start+1);
string.substring(start,end >= 0 ? end : string.length());
if you want to select the word after the first space and don't know how long it will be.
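The indexOf(" ") version only recognises a plain space. If tabs or repeated spaces may separate the fields, a sketch that scans with Character.isWhitespace could look like this (the class and method names, NameBetweenSpaces and secondToken, are hypothetical):

```java
public class NameBetweenSpaces {
    // Returns the second whitespace-separated token, or "" if there is none.
    // Unlike indexOf(" "), this treats tabs and runs of whitespace as separators too.
    static String secondToken(String s) {
        int i = 0;
        int n = s.length();
        while (i < n && !Character.isWhitespace(s.charAt(i))) i++; // skip the first token
        while (i < n && Character.isWhitespace(s.charAt(i))) i++;  // skip the separator(s)
        int start = i;
        while (i < n && !Character.isWhitespace(s.charAt(i))) i++; // read the second token
        return s.substring(start, i);
    }

    public static void main(String[] args) {
        System.out.println(secondToken("330 Daniel T92435"));   // Daniel
        System.out.println(secondToken("330\tDaniel  T92435")); // Daniel
    }
}
```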

Java extracting substring from sentences

There are operator phrases such as is, is not and does not contain. We have to find these phrases in a sentence and split the sentence after each one.
Input: if name is tom and age is not 45 or name does not contain tom then let me know.
Expected output:
If name is
tom and age is not
45 or name does not contain
tom then let me know
I tried the code below to split and extract, but "is" also occurs inside "is not", which my code cannot tell apart:
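import java.util.ArrayList;
import java.util.List;

public class OperatorCount {
    static List<String> operators = new ArrayList<>();
    // The input sentence from above
    static String str = "if name is tom and age is not 45 or name does not contain tom then let me know.";

    public static void loadOperators() {
        operators.add("is");
        operators.add("is not");
        operators.add("does not contain");
    }

    public static void main(String[] args) {
        loadOperators();
        for (String s : operators) {
            System.out.println(str.split(s).length - 1);
        }
    }
}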
Since a word can occur multiple times, and is and is not are different operators for you, split won't solve your use case. Ideally you would iterate:
1. Find the index of the operator.
2. Search for the next space or word.
3. Update your string to the remaining substring, from that index to the end.
I am not entirely sure what you are trying to achieve, but let's give it a shot.
For your case, a simple "workaround" might work just fine:
Sort the operators by their length, descending. This way the "largest match" will be found first. You can define "largest" either literally as the longest string or, preferably, as the number of words (number of spaces contained), so is a takes precedence over contains.
You'll need to make sure that no matches overlap, though. This can be done by comparing all matches' start and end indices and discarding overlaps by some criterion, such as "first match wins".
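A minimal sketch of that workaround, under stated assumptions: it sorts by character length (counting words instead would only change the comparator), uses plain substring matching with no word boundaries (so "is" would also match inside "this"), and resolves overlaps with "first accepted match wins". The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LongestFirstSplit {
    // Splits the input after each accepted operator match.
    static List<String> split(String input, List<String> operators) {
        // Longest operator first, so "is not" claims its text before "is" can.
        List<String> sorted = new ArrayList<>(operators);
        sorted.sort((a, b) -> Integer.compare(b.length(), a.length()));

        // Collect {start, end} of every match that does not overlap an already accepted one.
        List<int[]> accepted = new ArrayList<>();
        for (String op : sorted) {
            for (int i = input.indexOf(op); i >= 0; i = input.indexOf(op, i + 1)) {
                int start = i;
                int end = i + op.length();
                boolean overlaps = false;
                for (int[] a : accepted) {
                    if (start < a[1] && a[0] < end) {
                        overlaps = true;
                        break;
                    }
                }
                if (!overlaps) {
                    accepted.add(new int[] {start, end});
                }
            }
        }
        // Restore left-to-right order before cutting the string.
        accepted.sort((x, y) -> Integer.compare(x[0], y[0]));

        List<String> parts = new ArrayList<>();
        int from = 0;
        for (int[] a : accepted) {
            parts.add(input.substring(from, a[1]).trim()); // text up to and including the operator
            from = a[1];
        }
        parts.add(input.substring(from).trim()); // whatever follows the last operator
        return parts;
    }

    public static void main(String[] args) {
        String input = "if name is tom and age is not 45 or name does not contain tom then let me know.";
        for (String part : split(input, Arrays.asList("is", "is not", "does not contain"))) {
            System.out.println(part);
        }
    }
}
```

Because the operators are sorted inside split(), the order in which they are passed in no longer matters.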
This code does what you seem to be wanting to do (or what I guessed you are wanting to do):
import java.util.ArrayList;
import java.util.List;

public class OperatorSplitter {
    public static void main(String[] args) {
        List<String> operators = new ArrayList<>();
        operators.add("is");
        operators.add("is not");
        operators.add("does not contain");
        String input = "if name is tom and age is not 45 or name does not contain tom then let me know.";
        List<String> output = new ArrayList<>();
        int lastFoundOperatorsEndIndex = 0; // First start at the beginning of input
        for (String operator : operators) {
            int indexOfOperator = input.indexOf(operator); // Find current operator's position
            if (indexOfOperator > -1) { // If operator was found
                // Add the operator's length to its index so the operator is included
                int thisOperatorsEndIndex = indexOfOperator + operator.length();
                // Add everything up to and including the operator (trimmed of surrounding spaces)
                output.add(input.substring(lastFoundOperatorsEndIndex, thisOperatorsEndIndex).trim());
                lastFoundOperatorsEndIndex = thisOperatorsEndIndex; // Update start index for next operator
            }
        }
        // Add rest of input as last entry to output
        output.add(input.substring(lastFoundOperatorsEndIndex).trim());
        for (String part : output) { // Output to console
            System.out.println(part);
        }
    }
}
But it is highly dependent on the order of the sentence and of the operators. If we're talking about user input, the task will be much more complicated.
A better method using regular expressions (regExp) would be:
public class RegexOperatorSplit {
    public static void main(String... args) {
        // Define inputs
        String input1 = "if name is tom and age is not 45 or name does not contain tom then let me know.";
        String input2 = "the name is tom and he is 22 years old but the name does not contain jack, but merry is 24 year old.";
        // Output split strings
        for (String part : split(input1)) {
            System.out.println(part.trim());
        }
        System.out.println();
        for (String part : split(input2)) {
            System.out.println(part.trim());
        }
    }

    private static String[] split(String input) {
        // Define list of operators - 'is not' has to precede 'is'!!
        String[] operators = { "\\sis not\\s", "\\sis\\s", "\\sdoes not contain\\s", "\\sdoes contain\\s" };
        // Concatenate operators to regExp-String for search
        StringBuilder searchString = new StringBuilder();
        for (String operator : operators) {
            if (searchString.length() > 0) {
                searchString.append("|");
            }
            searchString.append(operator);
        }
        // Replace all operators by operator+\n and split resulting string at \n-character
        return input.replaceAll("(" + searchString.toString() + ")", "$1\n").split("\n");
    }
}
Notice the order of the operators! 'is' has to come after 'is not', otherwise 'is not' will always be split after 'is'.
You can prevent this by using a negative lookahead for the operator 'is'.
So "\\sis\\s" would become "\\sis(?! not)\\s" (reading like: "is", not followed by a " not").
A minimalist version (Java 8+, since String.join was added in Java 8) could look like this:
private static String[] split(String input) {
String[] operators = { "\\sis(?! not)\\s", "\\sis not\\s", "\\sdoes not contain\\s", "\\sdoes contain\\s" };
return input.replaceAll("(" + String.join("|", operators) + ")", "$1\n").split("\n");
}

How to find first occurrence of whitespace (tab, space, etc.) in Java?

So I have something like this
System.out.println(some_string.indexOf("\\s+"));
this gives me -1
but when I use a specific value like \t or a space
System.out.println(some_string.indexOf("\t"));
I get the correct index.
Is there any way I can get the index of the first occurrence of whitespace without using split, as my string is very long.
PS: if it helps, here is my requirement. I want the first number in the string, which is separated from the rest of the string by a tab or space, and I am trying to avoid split("\\s+")[0]. The string starts with that number and has a space or tab right after it.
The point is: indexOf() takes a char or a string, but not a regular expression.
Thus:
String input = "a\tb";
System.out.println(input);
System.out.println(input.indexOf('\t'));
prints 1 because there is a TAB char at index 1.
System.out.println(input.indexOf("\\s+"));
prints -1 because there is no substring \\s+ in your input value.
In other words: if you want to use the power of regular expressions, you can't use indexOf(). You would rather look towards String.matches(), for example, but that gives a boolean result, not an index.
If you intend to find the index of the first whitespace, you can iterate over the chars manually, like:
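// Returns the index of the first whitespace character, or -1 if there is none
static int indexOfWhitespace(String input) {
    for (int index = 0; index < input.length(); index++) {
        if (Character.isWhitespace(input.charAt(index))) {
            return index;
        }
    }
    return -1;
}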
Something of this sort might help, though there are better ways to do this.
class Sample{
public static void main(String[] args) {
String s = "1110 001";
int index = -1;
for(int i = 0; i < s.length(); i++ ){
if(Character.isWhitespace(s.charAt(i))){
index = i;
break;
}
}
System.out.println("Required Index : " + index);
}
}
Well, to search with a regular expression, you'll need to use the regular expression classes:
Pattern pat = Pattern.compile("\\s");
Matcher m = pat.matcher(s);
if ( m.find() ) {
System.out.println( "Found \\s at " + m.start());
}
The find method of the Matcher class locates the pattern in the string for which the matcher was created. If it succeeds, the start() method gives you the index of the first character of the match.
Note that you need to compile the pattern only once (you can even make it a constant). You just have to create a new Matcher for every string.
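Applied to the requirement in the question (a leading number followed by a space or tab), a sketch with a constant pattern might look like this. It assumes the number is a plain run of ASCII digits at the very start of the string; the class and method names (LeadingNumber, leadingNumber) are hypothetical. Matcher.lookingAt() anchors the match at the start of the input but, unlike matches(), does not require the whole string to match:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LeadingNumber {
    // Compiled once and reused for every string.
    // Assumption: digits at the very start, followed by a space or a tab.
    static final Pattern LEADING_NUMBER = Pattern.compile("(\\d+)[ \\t]");

    // Returns the leading number, or null if the string does not start that way.
    static String leadingNumber(String s) {
        Matcher m = LEADING_NUMBER.matcher(s);
        // lookingAt() only needs a match at the beginning, not over the whole (long) string.
        return m.lookingAt() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(leadingNumber("1110\t001 and the rest of a long line")); // 1110
        System.out.println(leadingNumber("no number here"));                        // null
    }
}
```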

Regex to allow only one punctuation character in Java string

I need to parse raw data and allow strings that contain letters and ONLY one punctuation character.
Here is what I have done so far:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProcessRawData {
    public static void main(String[] args) {
        String myData = "Australia India# America#!";
        ProcessRawData data = new ProcessRawData();
        data.process(myData);
    }

    public void process(String rawData) {
        String[] splitData = rawData.split(" ");
        for (String s : splitData) {
            System.out.println("My Data Elements: " + s);
            Pattern pattern = Pattern.compile("^[\\p{Alpha}\\p{Punct}]*$");
            Matcher matcher = pattern.matcher(s);
            if (matcher.matches()) {
                System.out.println("Allowed");
            } else {
                System.out.println("Not allowed");
            }
        }
    }
}
It prints the following:
My Data Elements: Australia
Allowed
My Data Elements: India#
Allowed
My Data Elements: America#!
Allowed
Expected: it should NOT allow America#!, as it contains more than one punctuation character.
I guess I need to use quantifiers, but I'm not sure where to place them so that ONLY one punctuation character is allowed.
Can someone help?
You should compile your Pattern outside the loop.
When using matches(), there's no need for ^ and $, since it'll match against the entire string anyway.
If you need at most one punctuation character, you need to match a single optional punctuation character, preceded and/or followed by optional alphabet characters.
Note that using \\p{Alpha} and \\p{Punct} excludes digits. No digit will be allowed. If you want to consider a digit as a special character, replace \\p{Punct} with \\P{Alpha} (uppercase P means not Alpha).
import java.util.regex.Pattern;

public class ProcessRawData {
    public static void main(String[] args) {
        process("Australia India# Amer$ca America#! America1");
    }

    public static void process(String rawData) {
        // Compiled once, outside the loop
        Pattern pattern = Pattern.compile("\\p{Alpha}*\\p{Punct}?\\p{Alpha}*");
        for (String s : rawData.split(" ")) {
            System.out.println("My Data Elements: " + s);
            if (pattern.matcher(s).matches()) {
                System.out.println("Allowed");
            } else {
                System.out.println("Not allowed");
            }
        }
    }
}
Output
My Data Elements: Australia
Allowed
My Data Elements: India#
Allowed
My Data Elements: Amer$ca
Allowed
My Data Elements: America#!
Not allowed
My Data Elements: America1
Not allowed
You may use
^\\p{Alpha}*(?:\\p{Punct}\\p{Alpha}*)?$
Explanation:
^ - start of string
\\p{Alpha}* - zero or more letters
(?:\\p{Punct}\\p{Alpha}*)? - one or zero (due to the ? quantifier) sequences of:
\\p{Punct} - a single occurrence of a punctuation symbol
\\p{Alpha}* - zero or more letters
$ - end of string.
Using it with String#matches will allow dropping the ^ and $ anchors since the pattern will then be anchored by default:
if (input.matches("\\p{Alpha}*(?:\\p{Punct}\\p{Alpha}*)?")) { ... }
You can do it with a simple negative look-ahead:
((?!\\p{Punct}{2}).)*
So your code becomes simply:
public void process(String rawData) {
    if (rawData.matches("((?!\\p{Punct}{2}).)*")) {
        System.out.println("Allowed");
    } else {
        System.out.println("Not allowed");
    }
}
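The regex just asserts that no character is a {Punct} immediately followed by another {Punct}. Note that this only rejects adjacent punctuation: a string such as "A#b!" still matches, and digits are accepted too.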
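To see the difference in practice, this sketch runs the lookahead regex next to the at-most-one regex \\p{Alpha}*(?:\\p{Punct}\\p{Alpha}*)? from the earlier answer. The class name and the sample "A#b!" are hypothetical:

```java
public class PunctuationRegexComparison {
    // Lookahead regex: rejects only two punctuation characters in a row.
    static final String NO_ADJACENT_PUNCT = "((?!\\p{Punct}{2}).)*";
    // Letters with at most one punctuation character anywhere.
    static final String AT_MOST_ONE_PUNCT = "\\p{Alpha}*(?:\\p{Punct}\\p{Alpha}*)?";

    public static void main(String[] args) {
        String[] samples = { "India#", "America#!", "A#b!" };
        for (String s : samples) {
            // String.matches() requires the whole string to match the regex
            System.out.printf("%-10s lookahead=%b atMostOne=%b%n",
                    s, s.matches(NO_ADJACENT_PUNCT), s.matches(AT_MOST_ONE_PUNCT));
        }
    }
}
```

"A#b!" passes the lookahead version but fails the at-most-one version, because its two punctuation characters are not adjacent.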
I hope this helps.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CountPunctuation {
    public static void main(String[] args) {
        process("Australia India# America#! America1");
    }

    public static void process(String rawData) {
        Pattern pNum = Pattern.compile("[0-9]");                          // any digit
        Pattern p = Pattern.compile("[^a-z]", Pattern.CASE_INSENSITIVE); // any non-letter
        for (String s : rawData.split(" ")) {
            if (pNum.matcher(s).find()) { // reject anything containing a digit
                System.out.println(s + ": Not Allowed");
                continue;
            }
            Matcher m = p.matcher(s);
            int count = 0;
            while (m.find()) { // count the non-letter characters
                count++;
            }
            if (count > 1) {
                System.out.println(s + ": Not Allowed");
            } else {
                System.out.println(s + ": Allowed");
            }
        }
    }
}
Output
Australia: Allowed
India#: Allowed
America#!: Not Allowed
America1: Not Allowed
You can use the following regex:
^[A-Za-z]*[!"\#$%&'()*+,\-.\/:;<=>?@\[\\\]^_`{|}~]?[A-Za-z]*$
This allows at most one punctuation character, at any position.
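Written as a Java string literal, the character-class approach needs extra escaping: every regex backslash is doubled, and the double quote is escaped for Java. The sketch below is one such translation, with the ASCII punctuation set !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ inside the class; the class and method names (OnePunctuation, isAllowed) are hypothetical:

```java
import java.util.regex.Pattern;

public class OnePunctuation {
    // Letters, then at most one ASCII punctuation character, then letters.
    // Inside the class, '-', '[', '\' and ']' are escaped; '^' is literal because it is not first.
    private static final Pattern ONE_PUNCT = Pattern.compile(
            "^[A-Za-z]*[!\"#$%&'()*+,\\-./:;<=>?@\\[\\\\\\]^_`{|}~]?[A-Za-z]*$");

    static boolean isAllowed(String s) {
        return ONE_PUNCT.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : "Australia India# Amer$ca America#! America1".split(" ")) {
            System.out.println(s + ": " + (isAllowed(s) ? "Allowed" : "Not allowed"));
        }
    }
}
```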
