I want to know whether this is the right pattern for matching strings like the following.
String samples
23.04.2019-30.04.2019
3.06.2019-20.06.2019
Pattern
private final Pattern TIMELINE_PATTERN = Pattern.compile("^\\d{2}.\\d{2}.\\d{4}-\\d{2}.\\d{2}.\\d{4}$");
If the day/month components could be one or two digit characters, then you should use this pattern:
^\d{1,2}\.\d{1,2}\.\d{4}-\d{1,2}\.\d{1,2}\.\d{4}$
Presumably the years might also not be fixed width, but a year earlier than 1000 is unlikely to appear, so we can fix the year at 4 digits. Also, a literal dot in a regex pattern needs to be escaped with a backslash.
Edit:
If you want to first validate the string, and then separate the two dates, then consider this:
String input = "3.06.2019-20.06.2019";
if (input.matches("\\d{1,2}\\.\\d{1,2}\\.\\d{4}-\\d{1,2}\\.\\d{1,2}\\.\\d{4}")) {
String[] dates = input.split("-");
System.out.println("date1: " + dates[0]);
System.out.println("date2: " + dates[1]);
}
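As a variation on the snippet above, the pattern can be compiled once and the two dates captured with groups instead of split. This is only a sketch: the names DateRangeParser, parse and looseMatches are made up for illustration. It also keeps the pattern from the question around to show why the unescaped dot is too permissive.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DateRangeParser {
    // Escaped dots match only a literal '.'; day and month may have one or two digits.
    private static final Pattern RANGE = Pattern.compile(
            "^(\\d{1,2}\\.\\d{1,2}\\.\\d{4})-(\\d{1,2}\\.\\d{1,2}\\.\\d{4})$");

    // The pattern from the question (unescaped dots, two-digit day), kept for comparison.
    private static final Pattern LOOSE = Pattern.compile(
            "^\\d{2}.\\d{2}.\\d{4}-\\d{2}.\\d{2}.\\d{4}$");

    /** Returns {start, end} if the input has the expected shape, otherwise empty. */
    static Optional<String[]> parse(String input) {
        Matcher m = RANGE.matcher(input);
        return m.matches()
                ? Optional.of(new String[] { m.group(1), m.group(2) })
                : Optional.empty();
    }

    static boolean looseMatches(String input) {
        return LOOSE.matcher(input).matches();
    }

    public static void main(String[] args) {
        parse("3.06.2019-20.06.2019")
                .ifPresent(d -> System.out.println(d[0] + " to " + d[1]));
        // An unescaped '.' accepts any character, so this prints true.
        System.out.println(looseMatches("23x04x2019-30y04y2019"));
    }
}
```

Note that the regex only checks the shape of the string; it does not check that the day and month are valid calendar values.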
There are two problems in your current regex:
The first quantifier needs to be {1,2} instead of just {2} to support either one or two digits.
You need to escape the dots.
The correct regex you need to use should be this,
^\d{1,2}\.\d{2}\.\d{4}-\d{2}\.\d{2}\.\d{4}$
Java code,
List<String> list = Arrays.asList("23.04.2019-30.04.2019", "3.06.2019-20.06.2019");
list.forEach(x -> {
System.out.println(x + " --> " + x.matches("^\\d{1,2}\\.\\d{2}\\.\\d{4}-\\d{2}\\.\\d{2}\\.\\d{4}$"));
});
Prints,
23.04.2019-30.04.2019 --> true
3.06.2019-20.06.2019 --> true
I would like to partially mask data using regex. Here is the input :
123-12345-1234567
And here is what I'd like as output :
1**-*****-*****67
I figured out how to do the replacement for the last group, but I don't know how to do it for the rest of the data.
String s = "123-12345-1234567";
System.out.println(s.replaceAll("\\d(?=\\d{2})", "*")); // output is *23-***45-*****67
Also, I'd like to use only regex because I have different type of data, so different type of mask. I don't want to create functions for each type of data.
For example :
AAAAAAAAA // becomes *******AA
12334567 // becomes 123*****
Thanks for your help !
We can use the following regex replacement approach:
String input = "123-12345-1234567";
String output = input.substring(0, 1) +
input.substring(1, input.length()-2).replaceAll("\\d", "*") +
input.substring(input.length()-2);
System.out.println(output); // 1**-*****-*****67
Here we concatenate together the first digit, followed by the middle portion with all digits replaced by *, along with the final two digits.
Edit: A pure regex solution, which, however, is more lines of code than the above and might be less performant.
String input = "123-12345-1234567";
String pattern = "^(\\d)(.*)(\\d{2})$";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(input);
if (m.find()) {
String output = m.group(1) + m.group(2).replaceAll("\\d", "*") + m.group(3);
System.out.println(output); // 1**-*****-*****67
}
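The substring approach from the first snippet can be generalized so that one helper covers several mask shapes. The following is a minimal sketch, not a definitive implementation: the class MiddleMasker, the method maskMiddle and its parameters keepStart and keepEnd are hypothetical names, and it assumes that only letters and digits should be masked while separators such as - stay visible.

```java
final class MiddleMasker {
    /**
     * Masks every letter or digit except those within the first keepStart
     * and the last keepEnd characters; other characters (such as '-') are kept.
     */
    static String maskMiddle(String input, int keepStart, int keepEnd) {
        if (keepStart < 0 || keepEnd < 0) {
            throw new IllegalArgumentException("keepStart and keepEnd must not be negative");
        }
        // Assumption: input too short to have a middle part is returned unchanged.
        if (input.length() <= keepStart + keepEnd) {
            return input;
        }
        int end = input.length() - keepEnd;
        return input.substring(0, keepStart)
                + input.substring(keepStart, end).replaceAll("[A-Za-z0-9]", "*")
                + input.substring(end);
    }

    public static void main(String[] args) {
        System.out.println(maskMiddle("123-12345-1234567", 1, 2)); // 1**-*****-*****67
        System.out.println(maskMiddle("AAAAAAAAA", 0, 2));         // *******AA
        System.out.println(maskMiddle("12334567", 3, 0));          // 123*****
    }
}
```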
Java supports a finite (bounded) quantifier in a lookbehind, so if you must use a single regex, you could use a pattern with an alternation to account for the different scenarios.
Using the lookarounds, you can select a single character at a time to be replaced by *.
Note that this is hard to maintain; it would be a better option to write separate functions for the different data formats, using separate patterns or string functions (perhaps accompanied by unit tests).
(?<=^\d{3,7})\d(?=\d*$)|(?<=^[A-Z]{0,6})[A-Z](?=[A-Z]*$)|\d(?<=^\d{2,3})(?=\d?-\d{5}-\d{7}$)|\d(?<=^\d{3}-\d{1,5}(?:-\d{1,5})?)
The separate parts match:
(?<=^\d{3,7})\d(?=\d*$) Match a digit asserting 3-7 digits to the left and only digits to the right
| Or
(?<=^[A-Z]{0,6})[A-Z](?=[A-Z]*$) Match A-Z asserting 0-6 chars to the left and only chars A-Z to the right
| Or
\d(?<=^\d{2,3})(?=\d?-\d{5}-\d{7}$) Match a digit asserting 2-3 digits to the left and optional digit, - with 5 digits and - with 7 digits to the right
| Or
\d(?<=^\d{3}-\d{1,5}(?:-\d{1,5})?) Match a digit asserting 3 digits to the left followed - and 1-5 digits and optionally - with 1-5 digits
String regex = "(?<=^\\d{3,7})\\d(?=\\d*$)|(?<=^[A-Z]{0,6})[A-Z](?=[A-Z]*$)|\\d(?<=^\\d{2,3})(?=\\d?-\\d{5}-\\d{7}$)|\\d(?<=^\\d{3}-\\d{1,5}(?:-\\d{1,5})?)";
String s1 = "123-12345-1234567";
String s2 = "AAAAAAAAA";
String s3 = "12334567";
System.out.println(s1.replaceAll(regex, "*"));
System.out.println(s2.replaceAll(regex, "*"));
System.out.println(s3.replaceAll(regex, "*"));
Output
1**-*****-*****67
*******AA
123*****
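public static void main(String[] args) {
// Mask each digit that has a character before it and at least two characters after it.
System.out.println("123-12345-1234567".replaceAll("(?<=.)\\d(?=.{2,})", "*"));
// Mask every character that has at least two characters after it.
System.out.println("AAAAAAAAA".replaceAll(".(?=.{2,})", "*"));
// Mask every character that has at least three characters before it (bounded lookbehind).
System.out.println("12334567".replaceAll("(?<=.{3}).", "*"));
}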
output:
1**-*****-*****67
*******AA
123*****
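Following the note above that separate rules per data format are easier to maintain than one large alternation, here is a hedged sketch of that idea. FormatMasker, RULES and mask are hypothetical names; the three format patterns are guesses based on the sample inputs in the question, and masking unknown formats completely is an assumption made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

final class FormatMasker {
    // Each rule pairs a format check with the masking step for that format.
    private static final Map<Pattern, UnaryOperator<String>> RULES = new LinkedHashMap<>();

    static {
        // 3-5-7 digit groups: keep the first digit and the last two characters.
        RULES.put(Pattern.compile("\\d{3}-\\d{5}-\\d{7}"),
                s -> s.replaceAll("(?<=.)\\d(?=.{2})", "*"));
        // Upper-case letters only: keep the last two characters.
        RULES.put(Pattern.compile("[A-Z]+"),
                s -> s.replaceAll(".(?=.{2})", "*"));
        // Digits only: keep the first three characters.
        RULES.put(Pattern.compile("\\d+"),
                s -> s.replaceAll("(?<=.{3}).", "*"));
    }

    static String mask(String input) {
        for (Map.Entry<Pattern, UnaryOperator<String>> rule : RULES.entrySet()) {
            if (rule.getKey().matcher(input).matches()) {
                return rule.getValue().apply(input);
            }
        }
        // Assumption: an unrecognized format is masked completely.
        return input.replaceAll(".", "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("123-12345-1234567")); // 1**-*****-*****67
        System.out.println(mask("AAAAAAAAA"));         // *******AA
        System.out.println(mask("12334567"));          // 123*****
    }
}
```

Each rule can then be unit-tested on its own, and adding a new data type means adding one entry rather than extending a single long regex.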
I'm trying to create a regex that will allow only digits followed by only one character after every digit within a Textfield
Regex that needs to match - \d*\+{1}
Regex in case it does not match - [^\d*\+){1}] will replace with "" (removes everything else)
final String regexFinalInteger = "\\d*\\+{1}";
numberElements.textProperty().addListener((observable, oldValueE, newValueE) -> {
if (!newValueE.matches(regexFinalInteger)) {
numberElements.setText(newValueE.replaceAll("[^\\d*\\+){1}]", ""));
}
});
I expect an output like 122+1+3, but the actual output can be 1++2+++4+123 (multiple consecutive + characters).
If I understood correctly, you want to replace multiple +s with only one.
I believe this would enable what you're looking for:
String regex = "[+](?=[+])";
String text = "122+1+3";
assertEquals("122+1+3", text.replaceAll(regex, ""));
text = "1++2+++4+123";
assertEquals("1+2+4+123", text.replaceAll(regex, ""));
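To connect this back to the listener in the question, both steps (dropping disallowed characters and collapsing repeated + signs) can be put into one plain string helper. This is a sketch under assumptions: the class PlusSeparatedInput and the method sanitize are made-up names, and a leading or trailing + is deliberately left alone.

```java
final class PlusSeparatedInput {
    /** Keeps only digits and '+', then collapses each run of '+' into a single '+'. */
    static String sanitize(String raw) {
        return raw.replaceAll("[^\\d+]", "")    // inside a character class, '+' is literal
                  .replaceAll("\\+{2,}", "+");  // two or more '+' in a row become one
    }

    public static void main(String[] args) {
        System.out.println(sanitize("1++2+++4+123")); // 1+2+4+123
        System.out.println(sanitize("12a+b+3"));      // 12+3
        System.out.println(sanitize("122+1+3"));      // 122+1+3
    }
}
```

Because the helper has no UI dependency, it could be called from the text listener with the new value and its result passed to setText.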
That is my first Java program, I'm sorry if it offends someone.
I am not quite sure of what is the correct regex for the period in Java. Here are some of my attempts. Sadly, they all meant any character.
String regex = "[0-9]*[.]?[0-9]*";
String regex = "[0-9]*['.']?[0-9]*";
String regex = "[0-9]*["."]?[0-9]*";
String regex = "[0-9]*[\.]?[0-9]*";
String regex = "[0-9]*[\\.]?[0-9]*";
String regex = "[0-9]*.?[0-9]*";
String regex = "[0-9]*\.?[0-9]*";
String regex = "[0-9]*\\.?[0-9]*";
But what I want is the actual "." character itself. Anyone have an idea?
What I'm trying to do actually is to write out the regex for a non-negative real number (decimals allowed). So the possibilities are: 12.2, 3.7, 2., 0.3, .89, 19
String regex = "[0-9]*['.']?[0-9]*";
Pattern pattern = Pattern.compile(regex);
String x = "5p4";
Matcher matcher = pattern.matcher(x);
System.out.println(matcher.find());
The last line is supposed to print false but prints true anyway. I think my regex is wrong though.
Update
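To match a non-negative decimal number that contains a decimal point, you need this regex:
^(?:\d*\.\d+|\d+\.\d*)$
or in Java syntax: "^(?:\\d*\\.\\d+|\\d+\\.\\d*)$"
The group makes both anchors apply to both alternatives; without it, ^ binds only to the first alternative and $ only to the second. Note that this pattern requires a dot, so a plain integer such as 19 is not matched.
String regex = "^(?:\\d*\\.\\d+|\\d+\\.\\d*)$";
String string = "123.43253";
if (string.matches(regex))
System.out.println("true");
else
System.out.println("false");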
Explanation for your original regex attempts:
[0-9]*\.?[0-9]*
with java escape it becomes :
"[0-9]*\\.?[0-9]*";
if you need to make the dot as mandatory you remove the ? mark:
[0-9]*\.[0-9]*
but this will accept just a dot without any digits as well. So, if you want the validation to require digits, use + (which means one or more) instead of * (which means zero or more). In that case it becomes:
[0-9]+\.[0-9]+
If you are on Kotlin, you can use an extension function:
fun String.findDecimalDigits() =
Pattern.compile("^[0-9]*\\.?[0-9]*").matcher(this).run { if (find()) group() else "" }!!
Your initial understanding was probably right, but you were being thrown because when using matcher.find(), your regex will find the first valid match within the string, and all of your examples would match a zero-length string.
I would suggest "^([0-9]+\\.?[0-9]*|\\.[0-9]+)$"
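The difference between find() and matches() described above can be seen in a small sketch. The class name FindVersusMatches and the helper isNonNegativeDecimal are made up for illustration; the two patterns are the all-optional one from the question and the stricter one suggested here.

```java
import java.util.regex.Pattern;

final class FindVersusMatches {
    // Every part is optional, so this pattern can match an empty string.
    static final Pattern LOOSE = Pattern.compile("[0-9]*\\.?[0-9]*");
    // Requires at least one digit and anchors the whole input.
    static final Pattern STRICT = Pattern.compile("^([0-9]+\\.?[0-9]*|\\.[0-9]+)$");

    static boolean isNonNegativeDecimal(String s) {
        return STRICT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(LOOSE.matcher("5p4").find());    // true: the leading "5" is enough
        System.out.println(LOOSE.matcher("5p4").matches()); // false: the whole input must match
        System.out.println(LOOSE.matcher("").matches());    // true: the empty string is accepted
        for (String s : new String[] { "12.2", "3.7", "2.", "0.3", ".89", "19", "5p4", "." }) {
            System.out.println(s + " -> " + isNonNegativeDecimal(s));
        }
    }
}
```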
There are actually two ways to match a literal dot (.). One is backslash-escaping, as you do with \\., and the other is to enclose it in a character class (square brackets), like [.]. Most special characters, including the dot, become literal inside square brackets. If all you want is to match a literal dot, \\. shows your intention more clearly than [.]. Use [] when you need to match one of several things; for example, the regex [\\d.] matches a single digit or a literal dot.
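A short sketch comparing the unescaped dot with the two literal forms, plus Pattern.quote, which is a third standard way to get a literal. The class LiteralDot and the helper splitCount are hypothetical names used only for this example.

```java
import java.util.regex.Pattern;

final class LiteralDot {
    /** Number of pieces produced by splitting text on the given regex. */
    static int splitCount(String text, String regex) {
        return text.split(regex).length;
    }

    public static void main(String[] args) {
        System.out.println("1x5".matches("1.5"));    // true: '.' matches any character
        System.out.println("1x5".matches("1\\.5"));  // false: escaped dot is literal
        System.out.println("1x5".matches("1[.]5"));  // false: dot in a character class is literal
        System.out.println("1.5".matches("1" + Pattern.quote(".") + "5")); // true
        System.out.println(splitCount("a.b.c", "\\.")); // 3
        System.out.println(splitCount("a.b.c", "."));   // 0: every character is a delimiter
    }
}
```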
I have tested all the cases.
public static boolean isDecimal(String input) {
return Pattern.matches("^[-+]?\\d*[.]?\\d+|^[-+]?\\d+[.]?\\d*", input);
}
I've got a string in my Java project which looks something like this
9201,92710,94500,920,1002
How can I insert a dot before the last digit of each number, so that it looks like this:
920.1,9271.0,9450.0,92.0,100.2
I had an attempt at it but I can't get the last number to get a dot.
numbers = numbers.replaceAll("([0-9],)", "\\.$1");
The result I got is
920.1,9271.0,9450.0,92.0,1002
Note: The length of the string is not always the same. It can be longer / shorter.
Check if string ends with ",". If not, append a "," to the string, run the same replaceAll, remove "," from end of String.
Split string by the "," delimiter, process each piece adding the "." where needed.
Just add a "." at numbers.length-1 to solve the issue with the last number
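The split-and-process suggestion above could look roughly like the following. It is a sketch, not the only way to do it: DotInserter, insertDots and dotBeforeLastDigit are made-up names, and empty pieces (for example from a trailing comma) are assumed to be left as they are.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

final class DotInserter {
    /** Inserts a dot before the last character of every comma-separated piece. */
    static String insertDots(String numbers) {
        // The limit -1 keeps trailing empty pieces, so a trailing comma survives.
        return Arrays.stream(numbers.split(",", -1))
                .map(DotInserter::dotBeforeLastDigit)
                .collect(Collectors.joining(","));
    }

    private static String dotBeforeLastDigit(String piece) {
        if (piece.isEmpty()) {
            return piece;
        }
        int cut = piece.length() - 1;
        return piece.substring(0, cut) + "." + piece.substring(cut);
    }

    public static void main(String[] args) {
        System.out.println(insertDots("9201,92710,94500,920,1002"));
        // 920.1,9271.0,9450.0,92.0,100.2
    }
}
```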
As your problem is not only inserting the dot before every comma, but also before end of string, you just must add this additional condition to your capturing group:
numbers = numbers.replaceAll("([0-9](,|$))", "\\.$1");
As suggested by Siguza, you could as well use a non-capturing group which is even more what a "human" would expect to be captured in the capturing group:
numbers = numbers.replaceAll("([0-9](?:,|$))", "\\.$1");
But as a non-capturing group is (although a really nice feature) not standard Regex and the overhead is not that significant here, I would recommend using the first option.
You could use word boundary:
numbers = numbers.replaceAll("(\\d)\\b", ".$1");
Your solution is fine, as long as you put a comma at the end like dan said.
So instead of:
numbers = numbers.replaceAll("([0-9],)", "\\.$1");
write:
numbers = (numbers+",").replaceAll("([0-9],)", "\\.$1");
numbers = numbers.substring(0, numbers.length() - 1);
You may use a positive lookahead to check for the , or end of string right after a digit and a zeroth backreference to the whole match:
String s = "9201,92710,94500,920,1002";
System.out.println(s.replaceAll("\\d(?=,|$)", ".$0"));
// => 920.1,9271.0,9450.0,92.0,100.2
Details:
\\d - exactly 1 digit...
(?=,|$) - that must be before a , or end of string ($).
A capturing variation:
String s = "9201,92710,94500,920,1002";
System.out.println(s.replaceAll("(\\d)(,|$)", ".$1$2"));
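To compare the lookahead and capturing forms with the word-boundary idea mentioned elsewhere in this thread, here is a small sketch. The class DotVariants and its method names are made up for this example. On the sample input all three agree, but the word-boundary version also fires before other non-word characters such as a space.

```java
final class DotVariants {
    static String lookahead(String s) {
        return s.replaceAll("\\d(?=,|$)", ".$0");     // $0 is the whole match (the digit)
    }

    static String capturing(String s) {
        return s.replaceAll("(\\d)(,|$)", ".$1$2");   // re-insert the digit and the comma
    }

    static String wordBoundary(String s) {
        return s.replaceAll("(\\d)\\b", ".$1");       // note the doubled backslash in "\\b"
    }

    public static void main(String[] args) {
        String s = "9201,92710,94500,920,1002";
        System.out.println(lookahead(s));    // 920.1,9271.0,9450.0,92.0,100.2
        System.out.println(capturing(s));    // 920.1,9271.0,9450.0,92.0,100.2
        System.out.println(wordBoundary(s)); // 920.1,9271.0,9450.0,92.0,100.2
        System.out.println(lookahead("12 34"));    // 12 3.4
        System.out.println(wordBoundary("12 34")); // 1.2 3.4
    }
}
```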
You were right to go for the replaceAll method, but your regex was not matching the end of the string, i.e. the last number.
Here is my take on your problem:
public static void main(String[] args) {
String numbers = "9201,92710,94500,920,1002";
System.out.println(numbers.replaceAll("(\\d,|\\d$)", ".$1"));
}
the regex (\\d,|\\d$) matches a digit followed by a comma \d,, OR | a digit followed by the end of the string \d$.
I have tested it and found to work.
As others have suggested, you could add a comma at the end, run replaceAll, and then remove it, but that seems like extra effort.
Example:
public static void main(String[] args) {
String numbers = "9201,92710,94500,920,1002";
//add on the comma
numbers += ",";
numbers = numbers.replaceAll("(\\d,)", "\\.$1");
//remove the comma
numbers = numbers.substring(0, numbers.length()-1);
System.out.println(numbers);
}
I have an extremely long string that I want to parse for a numeric value that occurs after the substring "ISBN". However, this grouping of 13 digits can be arranged differently via the "-" character. Examples: (these are all valid ISBNs) 123-456-789-123-4, OR 1-2-3-4-5-67891234, OR 12-34-56-78-91-23-4. Essentially, I want to use a regex pattern matcher on the potential ISBN to see if there is a valid 13 digit ISBN. How do I 'ignore' the "-" character so I can just regex for a \d{13} pattern? My function:
public String parseISBN (String sourceCode) {
int location = sourceCode.indexOf("ISBN") + 5;
String ISBN = sourceCode.substring(location); //substring after "ISBN" occurs
int i = 0;
while ( ISBN.charAt(i) != ' ' )
i++;
ISBN = ISBN.substring(0, i); //should contain potential ISBN value
Pattern pattern = Pattern.compile("\\d{13}"); //this clearly will find 13 consecutive numbers, but I need it to ignore the "-" character
Matcher matcher = pattern.matcher(ISBN);
if (matcher.find()) return ISBN;
else return null;
}
Alternative 1:
pattern.matcher(ISBN.replace("-", ""))
Alternative 2: Something like
Pattern.compile("(\\d-?){13}")
Demo of second alternative:
String ISBN = "ISBN: 123-456-789-112-3, ISBN: 1234567891123";
Pattern pattern = Pattern.compile("(\\d-?){13}");
Matcher matcher = pattern.matcher(ISBN);
while (matcher.find())
System.out.println(matcher.group());
Output:
123-456-789-112-3
1234567891123
Try this:
Pattern.compile("\\d(-?\\d){12}")
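One practical difference between this pattern and (\\d-?){13} is how a dash directly after the last digit is handled. The sketch below illustrates it; the class IsbnPatternComparison and the helper firstMatch are hypothetical names, and the sample text is invented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IsbnPatternComparison {
    /** Returns the first match of regex in text, or null if there is none. */
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "123-456-789-112-3- and more";
        // The optional dash after the 13th digit is consumed as part of the match.
        System.out.println(firstMatch("(\\d-?){13}", text));    // 123-456-789-112-3-
        // This form must end on a digit, so the trailing dash is left out.
        System.out.println(firstMatch("\\d(-?\\d){12}", text)); // 123-456-789-112-3
    }
}
```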
Use this pattern:
Pattern.compile("(?:\\d-?){13}")
and strip all dashes from the found isbn number
Do it in one step with a pattern recognizing everything, and optional dashes between digits. No need to fiddle with ISBN offset + substrings.
ISBN(\d(-?\d){12})
If you want the raw number, strip dashes from the first matched subgroup afterwards.
I am not a Java guy so I won't show you code.
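For readers who want the one-step idea in Java, a minimal sketch follows. It is an illustration rather than this answer's own code: IsbnFinder and findIsbn are made-up names, and allowing an optional colon or whitespace between "ISBN" and the first digit is an assumption about the input.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IsbnFinder {
    // "ISBN", optional ':' or whitespace (assumed), then 13 digits
    // with at most one dash between neighbouring digits.
    private static final Pattern ISBN =
            Pattern.compile("ISBN[:\\s]*(\\d(?:-?\\d){12})");

    /** Returns the 13 digits without dashes, or empty if no ISBN is found. */
    static Optional<String> findIsbn(String source) {
        Matcher m = ISBN.matcher(source);
        return m.find()
                ? Optional.of(m.group(1).replace("-", ""))
                : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(findIsbn("Title, ISBN 123-456-789-123-4, 2010")); // Optional[1234567891234]
        System.out.println(findIsbn("ISBN: 1-2-3-4-5-67891234"));            // Optional[1234567891234]
        System.out.println(findIsbn("ISBN 123-45 only"));                    // Optional.empty
    }
}
```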
If you're going to be calling the method a lot, the best thing you can do is not compile the Pattern inside it. Otherwise, each time you call the method you'll spend more time creating the regex than you will actually searching for it.
But after looking at your code again, I think you have a bigger problem, performance-wise. All that business of locating "ISBN" and then creating substrings to apply the regex to is completely unnecessary. Let the regex do that stuff; it's what they're for. The following regex finds the "ISBN" sentinel and the following thirteen digits, if they're there:
static final Pattern isbnPattern = Pattern.compile(
"\\bISBN[^A-Z0-9]*+(\\d(?:-*+\\d){12})", Pattern.CASE_INSENSITIVE );
The [^A-Z0-9]*+ gobbles up whatever characters may appear between the "ISBN" and the first digit. The possessive quantifier (*+) prevents needless backtracking; if the next character is not a digit, the regex engine immediately quits that match attempt and resumes scanning for another "ISBN" instance.
I used another possessive quantifier for the optional hyphens, plus a non-capturing group ((?:...)) for the repeated portion; that gives another slight performance gain over the capturing groups most of the other responders are using. But I used a capturing group for the whole number, so it can be extracted from the overall match easily. With these changes, your method reduces to this:
public String parseISBN (String source) {
Matcher m = isbnPattern.matcher(source);
return m.find() ? m.group(1) : null;
}
...and it's much more efficient, too. Note that we haven't addressed how the strings are getting into memory. If you're doing the I/O yourself, it's possible there are significant performance gains to be achieved in that area, too.
You can strip out the dashes with string manipulation, or you could use this:
"\\b(?:\\d-?){13}\\b"
It has the added bonus of making sure the string doesn't start or end with -.
Try stripping the dashes out, and regex the new string
you can try this
"(?:[0-9]{9}[0-9X]|[0-9]{13}|[0-9][0-9-]{11}[0-9X]|[0-9][0-9-]{15}[0-9])(?![0-9-])"