Matching a whitespace or empty string using regex in Java - java

I have this regex in Java:
String pattern = "(\\s)(\\d{2}-)(enero|febrero|marzo|abril|mayo|junio|julio|agosto|septiembre|octubre|noviembre|diciembre)(-\\d{4})(\\s)";
It works as intended, but I have two new problems when extracting valid dates:
1st problem:
If I have the String It was at 22-febrero-1999 and 10-enero-2009 and 01-diciembre-2000, I should get the string febrero-enero-diciembre, but I only get febrero-enero.
2nd problem:
If I have a single date in a String, like 12-octubre-1989, I get an empty String.
Why does my pattern require whitespace at the start and end of every date? Because I have to catch only valid months. In a String like adsadasd 12-validMonth-2999 asd 11-validMonth-1989 I should get both validMonth values, but I should never get a validMonth that is glued to other text: in a String like asdadsad12-validMonth-1989 asdadsad 23-validMonth-1989 I should only get the last validMonth.
PS: My Java code is:
String resultado = "";
String pattern = "(\\s)(\\d{2}-)(enero|febrero|marzo|abril|mayo|junio|julio|agosto|septiembre|octubre|noviembre|diciembre)(-\\d{4})(\\s)";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(fecha);
while (m.find()) {
resultado += m.group().split("-")[1] + "-";
}
return (resultado.compareTo("") == 0 ? "" : resultado.substring(0, resultado.length() - 1));

You might want to use a word boundary instead:
\\b(\\d{2}-)(enero|febrero|marzo|abril|mayo|junio|julio|agosto|septiembre|octubre|noviembre|diciembre)(-\\d{4})\\b
And I believe some of the month alternatives can be optimized a little bit (unfortunately this could reduce readability, but it may speed things up slightly):
\\b(\\d{2}-)((?:en|febr)ero|ma(?:rz|y)o|abril|ju[ln]io|agosto|(?:septiem|octu|noviem|diciem)bre)(-\\d{4})\\b
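The factored alternation is easy to get subtly wrong, so it may be worth checking it against the plain list of months. The sketch below (MonthPatternCheck and patternsAgree are hypothetical names) compiles both patterns from this answer and checks that each accepts all twelve months in a DD-month-YYYY date, that the factored one captures the full month name, and that both reject a non-month.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MonthPatternCheck {
    // Plain alternation, one branch per month
    static final Pattern PLAIN = Pattern.compile(
            "\\b(\\d{2}-)(enero|febrero|marzo|abril|mayo|junio|julio|agosto"
          + "|septiembre|octubre|noviembre|diciembre)(-\\d{4})\\b");
    // Factored alternation with shared prefixes/suffixes pulled out
    static final Pattern FACTORED = Pattern.compile(
            "\\b(\\d{2}-)((?:en|febr)ero|ma(?:rz|y)o|abril|ju[ln]io|agosto"
          + "|(?:septiem|octu|noviem|diciem)bre)(-\\d{4})\\b");

    static final String[] MONTHS = {
        "enero", "febrero", "marzo", "abril", "mayo", "junio",
        "julio", "agosto", "septiembre", "octubre", "noviembre", "diciembre"
    };

    // True if both patterns accept every month and both reject a non-month.
    static boolean patternsAgree() {
        for (String month : MONTHS) {
            String date = "01-" + month + "-2000";
            Matcher f = FACTORED.matcher(date);
            if (!PLAIN.matcher(date).matches() || !f.matches() || !f.group(2).equals(month)) {
                return false;
            }
        }
        String notAMonth = "01-junto-2000"; // hypothetical non-month
        return !PLAIN.matcher(notAMonth).matches() && !FACTORED.matcher(notAMonth).matches();
    }

    public static void main(String[] args) {
        System.out.println(patternsAgree());
    }
}
```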

Perhaps try using a \b instead of \s:
String pattern = "\\b(\\d{2}-)(enero|febrero|marzo|abril|mayo|junio|julio|agosto|septiembre|octubre|noviembre|diciembre)(-\\d{4})\\b";
This will only match strings where the first digit is not preceded by another word character (digit, letter, or underscore), and the last digit is not followed by a word character. I've also removed the capturing groups around the \b, because it would always be a zero-length string, if matched.
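Putting the \b pattern into the question's loop, a minimal sketch might look like this. It assumes the month is the only part you need, so it reads it from the single capturing group instead of splitting on -, and joins the results with a StringJoiner; MonthExtractor and extractMonths are illustrative names. Because \b is zero-width, it also matches at the very start and end of the input, which covers both failing cases from the question.

```java
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MonthExtractor {
    // \b is zero-width: it matches between a word char and a non-word char
    // (including the start and end of the input) and consumes nothing.
    private static final Pattern DATE = Pattern.compile(
            "\\b\\d{2}-(enero|febrero|marzo|abril|mayo|junio|julio|agosto"
          + "|septiembre|octubre|noviembre|diciembre)-\\d{4}\\b");

    // Returns the months of all dates joined with "-", or "" if there are none.
    static String extractMonths(String fecha) {
        StringJoiner months = new StringJoiner("-");
        Matcher m = DATE.matcher(fecha);
        while (m.find()) {
            months.add(m.group(1)); // group 1 is the month name
        }
        return months.toString();
    }

    public static void main(String[] args) {
        System.out.println(extractMonths(
                "It was at 22-febrero-1999 and 10-enero-2009 and 01-diciembre-2000"));
        System.out.println(extractMonths("12-octubre-1989"));
    }
}
```

StringJoiner returns an empty string when nothing was added, so the trailing-dash trimming from the question's code is no longer needed.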

I wouldn't use a word boundary as a delimiter.
I'd suggest using either whitespace or NOT-digit as the delimiter,
or no delimiter at all, with a validation range of numbers for day/year.
This way you may catch more embedded dates that are in close
proximity (adjacent) to letters and underscores.
Something like:
# "(?<!\\d)\\d{2}-(?:enero|febrero|marzo|abril|mayo|junio|julio|agosto|septiembre|octubre|noviembre|diciembre)-\\d{4}(?!\\d)"
(?<! \d ) # Not a digit before us
\d{2} - # Two digits followed by dash
(?: # A month
enero
| febrero
| marzo
| abril
| mayo
| junio
| julio
| agosto
| septiembre
| octubre
| noviembre
| diciembre
)
- \d{4} # Dash followed by four digits
(?! \d ) # Not a digit after us
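Java can compile the commented form above directly if you pass Pattern.COMMENTS, which ignores whitespace and #-comments inside the pattern. In this sketch (EmbeddedDates and months are hypothetical names) the month group is made capturing so the month can be read back. As this answer intends, a date glued to letters such as asdadsad12-enero-1989 is now matched too, while a date glued to extra digits is not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmbeddedDates {
    // Free-spacing form of the pattern above. With Pattern.COMMENTS the engine
    // ignores whitespace and treats '#' up to the end of a line as a comment,
    // so each comment line must be terminated with a newline.
    private static final Pattern DATE = Pattern.compile(
            "(?<! \\d )                 # Not a digit before us\n"
          + "\\d{2} -                   # Two digits followed by dash\n"
          + "( enero | febrero | marzo | abril | mayo | junio | julio\n"
          + "| agosto | septiembre | octubre | noviembre | diciembre )\n"
          + "- \\d{4}                   # Dash followed by four digits\n"
          + "(?! \\d )                  # Not a digit after us\n",
            Pattern.COMMENTS);

    // Returns the month of every date found, in order.
    static List<String> months(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            found.add(m.group(1)); // group 1: the month (made capturing here)
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(months("asdadsad12-enero-1989 asdadsad 23-marzo-1989"));
    }
}
```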

Related

Masking credit card number using regex

I am trying to mask a CC number so that the third character and the last three characters stay unmasked.
For example, 7108898787654351 should become **0**********351.
I have tried (?<=.{3}).(?=.*...). It leaves the last three characters unmasked, but it also leaves the first three unmasked.
Can you give me some pointers on how to leave only the 3rd character (plus the last three) unmasked?
You can use this regex with a lookahead and lookbehind:
str = str.replaceAll("(?<!^..).(?=.{3})", "*");
//=> **0**********351
RegEx Demo
RegEx Details:
(?<!^..): Negative lookbehind to assert that we don't have exactly the first 2 characters of the string behind us (this excludes the 3rd character from matching)
.: Match a character
(?=.{3}): Positive lookahead to assert that we have at least 3 characters ahead
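A quick way to check the replacement is a small self-contained class (CardMask is an illustrative name):

```java
class CardMask {
    // (?<!^..) : skip the char preceded by exactly the first two chars (index 2)
    // .        : the char to mask
    // (?=.{3}) : at least three chars must follow, so the last three are kept
    static String mask(String cc) {
        return cc.replaceAll("(?<!^..).(?=.{3})", "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351")); // **0**********351
    }
}
```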
I would suggest that regex isn't the only way to do this.
char[] m = new char[16]; // Or whatever length.
Arrays.fill(m, '*');
m[2] = cc.charAt(2);
m[13] = cc.charAt(13);
m[14] = cc.charAt(14);
m[15] = cc.charAt(15);
String masked = new String(m);
It might be more verbose, but it's a heck of a lot more readable (and debuggable) than a regex.
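The array version above hardcodes 16 characters. A sketch that sizes the array from the input instead, assuming the number has at least 3 characters (CardMaskNoRegex is a hypothetical name):

```java
import java.util.Arrays;

class CardMaskNoRegex {
    // Assumes cc has at least 3 characters (charAt(2) would throw otherwise).
    static String mask(String cc) {
        char[] m = new char[cc.length()];
        Arrays.fill(m, '*');
        m[2] = cc.charAt(2); // keep the 3rd character
        for (int i = Math.max(0, cc.length() - 3); i < cc.length(); i++) {
            m[i] = cc.charAt(i); // keep the last three characters
        }
        return new String(m);
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351")); // **0**********351
    }
}
```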
Here is another regular expression:
(?!(?:\D*\d){14}$|(?:\D*\d){1,3}$)\d
See the online demo
It may seem a bit unwieldy, but since a credit card should have 16 digits, I opted to use negative lookaheads that count how many digits (each optionally preceded by non-digits) remain up to the end of the string.
(?! - Negative lookahead
(?: - Open 1st non capture group.
\D*\d - Match zero or more non-digits and a single digit.
){14} - Close 1st non capture group and match it 14 times.
$ - End string anchor.
| - Alternation/OR.
(?: - Open 2nd non capture group.
\D*\d - Match zero or more non-digits and a single digit.
){1,3} - Close 2nd non capture group and match it 1 to 3 times.
$ - End string anchor.
) - Close negative lookahead.
\d - Match a single digit.
This would now mask any digit other than the third and last three regardless of their position (due to delimiters) in the formatted CC-number.
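In Java the pattern only needs its backslashes doubled. This sketch (CardMaskDigits is a hypothetical name) applies it with String.replaceAll to a plain and a dash-separated number; because only digits are counted, the separators survive.

```java
class CardMaskDigits {
    // Mask a digit unless exactly 14 digits (this one included) remain, i.e. the
    // 3rd digit, or only 1-3 remain, i.e. one of the last three digits.
    private static final String REGEX = "(?!(?:\\D*\\d){14}$|(?:\\D*\\d){1,3}$)\\d";

    static String mask(String cc) {
        return cc.replaceAll(REGEX, "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351"));
        System.out.println(mask("7108-8987-8765-4351"));
    }
}
```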
Assuming any dashes come after the first 3 digits, leave the 3rd digit unmatched and make sure there are always 3 digits at the end of the string:
(?<!^\d{2})\d(?=[\d-]*\d-?\d-?\d$)
Explanation
(?<! Negative lookbehind, assert what is on the left is not
^\d{2} Match 2 digits from the start of the string
) Close lookbehind
\d Match a digit
(?= Positive lookahead, assert what is on the right is
[\d-]* 0+ occurrences of either - or a digit
\d-?\d-?\d Match 3 digits with optional hyphens
$ End of string
) Close lookahead
Regex demo | Java demo
Example code
String regex = "(?<!^\\d{2})\\d(?=[\\d-]*\\d-?\\d-?\\d$)";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
String strings[] = { "7108898787654351", "7108-8987-8765-4351"};
for (String s : strings) {
Matcher matcher = pattern.matcher(s);
System.out.println(matcher.replaceAll("*"));
}
Output
**0**********351
**0*-****-****-*351
I don't think you should use a regex to do what you want. You could use a StringBuilder to create the required string:
String str = "7108-8987-8765-4351";
StringBuilder sb = new StringBuilder("*".repeat(str.length()));
for (int i = 0; i < str.length(); i++) {
if (i == 2 || i >= str.length() - 3) {
sb.replace(i, i + 1, String.valueOf(str.charAt(i)));
}
}
System.out.print(sb.toString()); // output: **0*************351
You may add a ^.{0,1} alternative to allow matching . when it is the first or second char in the string:
String s = "7108898787654351"; // **0**********351
System.out.println(s.replaceAll("(?<=.{3}|^.{0,1}).(?=.*...)", "*"));
// => **0**********351
The regex can be written as a PCRE compliant pattern, too: (?<=.{3}|^|^.).(?=.*...).
It is equivalent to
System.out.println(s.replaceAll("(?<!^..).(?=.*...)", "*"));
See the Java demo and a regex demo.
Regex details
(?<=.{3}|^.{0,1}) - there must be any three chars other than line break chars immediately to the left of the current location, or start of string, or a single char at the start of the string
(?<!^..) - a negative lookbehind that fails the match if the current location is immediately preceded by exactly the first two chars of the string (i.e. when the current char is the 3rd one)
. - any char but a line break char
(?=.*...) - there must be any three chars other than line break chars immediately to the right of the current location.
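To check that the two lookbehinds are interchangeable, a small sketch can run both over inputs of every length from 0 to 20 and compare the results. LookbehindEquivalence and its methods are hypothetical names, and the test inputs deliberately contain no line breaks, where . would behave differently.

```java
class LookbehindEquivalence {
    static String withAlternation(String s) {
        return s.replaceAll("(?<=.{3}|^.{0,1}).(?=.*...)", "*");
    }

    static String withNegativeLookbehind(String s) {
        return s.replaceAll("(?<!^..).(?=.*...)", "*");
    }

    // Compares both patterns on digit strings of length 0 through 20.
    static boolean agreeOnAllLengths() {
        StringBuilder sb = new StringBuilder();
        for (int len = 0; len <= 20; len++) {
            String s = sb.toString();
            if (!withAlternation(s).equals(withNegativeLookbehind(s))) {
                return false;
            }
            sb.append((char) ('0' + len % 10));
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeOnAllLengths());
        System.out.println(withAlternation("7108898787654351"));
    }
}
```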
If the CC number always has 16 digits, as it does in the example and as Visa and MasterCard CCs do, matches of the following regular expression can be replaced with an asterisk:
\d(?!\d{0,2}$|\d{13}$)
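A minimal Java sketch, assuming exactly 16 consecutive digits as this answer states (SixteenDigitMask is an illustrative name). After a digit is consumed, the lookahead rejects it if 0 to 2 digits follow (one of the last three) or exactly 13 follow (the 3rd digit).

```java
class SixteenDigitMask {
    static String mask(String cc) {
        return cc.replaceAll("\\d(?!\\d{0,2}$|\\d{13}$)", "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351")); // **0**********351
    }
}
```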

Using Regex Pattern and Matcher to print out number once only

This piece of code is part of my programme and I am trying to print out the last integer value of the string only whenever the operator and the equals sign are together (e.g. ^=, *=, etc.).
Hence, if I enter 4 4 ^ 4 ^ 4 ^=, I would only want to print out "4". The same counts if the number 4 is directly before the "^=", e.g. 4 4 ^ 4 ^ 4^=.
My code is this:
if ((input.endsWith("^=")) | (input.endsWith("*=")) |
(input.endsWith("+=")) | (input.endsWith("-=")) |
(input.endsWith("%=")) | (input.endsWith("/=")))
{
Pattern p = Pattern.compile("[^\\d]*[\\d]+[^\\d]+([\\d]+)");
Matcher m = p.matcher(input);
if (m.find()) {
System.out.println(m.group(1)); // second matched digits
}
}
Currently my code prints out the number 4 multiple times, but I only want to print it once. Any help is appreciated.
Thank you!
You might use:
([0-9]+)\h*[\^+%/*-]=(?!.*[\^+%/*-]=)
([0-9]+) Capture 1+ digits 0-9 in group 1
\h* Match 0+ horizontal whitespace chars
[\^+%/*-]= Match any of the listed followed by =
(?!.*[\^+%/*-]=) Negative lookahead, assert what is on the right does not contain an operator followed by an equals sign
Regex demo | Java demo
In Java
final String regex = "([0-9]+)\\h*[\\^+%/*-]=(?!.*[\\^+%/*-]=)";
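Wrapped into a small method, the pattern returns the number right before the last operator-equals pair. A sketch (LastOperand and lastOperand are hypothetical names) that returns null when there is no match:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastOperand {
    // Group 1: the digits immediately before the LAST "op=" in the input.
    private static final Pattern OPERAND_BEFORE_ASSIGN =
            Pattern.compile("([0-9]+)\\h*[\\^+%/*-]=(?!.*[\\^+%/*-]=)");

    static String lastOperand(String input) {
        Matcher m = OPERAND_BEFORE_ASSIGN.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(lastOperand("4 4 ^ 4 ^ 4 ^="));
        System.out.println(lastOperand("4 4 ^ 4 ^ 4^="));
    }
}
```

Since find() is called only once, the number is printed once per input.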
Try
(\d+)\s*[-+*/^%]=$
Find 1 or more digits and capture them
if they're followed by 0 or more spaces
followed by -, +, *, /, ^ or %
followed by =
followed by the end of the string
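Because this version is anchored with $, it only matches when the input ends with an operator followed by =, so it mostly makes the endsWith checks redundant. A sketch with a hypothetical AnchoredOperand class:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchoredOperand {
    // Digits, optional whitespace, one of - + * / ^ %, then '=' at the very end.
    private static final Pattern P = Pattern.compile("(\\d+)\\s*[-+*/^%]=$");

    static String lastOperand(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(lastOperand("4 4 ^ 4 ^ 4^="));
    }
}
```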

Regex to find Integers in particular string lines

I have this regex to find integers in a string (with newlines). However, I want to filter the results: I want the regex to find the number in certain lines, and not in others.
String:
String test= "ytrt.ytrwyt.ytreytre.test1,0,2,0"
+"sfgtr.ytyer.qdfre.uyeyrt.test2,0,8,0"
+"sfgtr.ytyer.qdfre.uyeyrt.test3,0,3,0";
pattern = "(?<=,)\\d+";
pr = Pattern.compile(pattern);
match = pr.matcher(test);
System.out.println();
if (match.find()) {
System.out.println("Found: " + match.group());
}
This regex finds the integers after the commas on all the lines. If I want a particular regex to find the integer in the line containing "test1", "test2", or "test3", how should I do this? I want to create three different regexes, but my regex skills are weak.
First regex should print out 2. The second 8 and the third 3.
You can expand your pattern to include test[123] in the lookbehind, which would match test1, test2, or test3:
String pattern = "(?<=test[123][^,]{0,100},[^,]{1,100},)\\d+";
Pattern pr = Pattern.compile(pattern);
Matcher match = pr.matcher(test);
System.out.println();
while (match.find()) {
System.out.println("Found: " + match.group());
}
The ,[^,]{1,100}, portion skips everything between the first two commas that follow testN.
I use {0,100} in place of * and {1,100} in place of + inside the lookbehind because the Java regex engine requires lookbehinds to have a bounded maximum length. If you need to allow skipping more than 100 characters, adjust the maximum length accordingly.
Demo.
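Since the question asked for one regex per line, a sketch can build the lookbehind for a single test number instead of test[123]. TestLineValue and valueFor are hypothetical names; the input is the question's string, concatenated without line separators exactly as written there.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TestLineValue {
    // Builds a pattern for one specific test number, e.g. valueFor(text, 2).
    // {0,100} and {1,100} keep the lookbehind's maximum length finite.
    static String valueFor(String text, int testNo) {
        Pattern p = Pattern.compile("(?<=test" + testNo + "[^,]{0,100},[^,]{1,100},)\\d+");
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String test = "ytrt.ytrwyt.ytreytre.test1,0,2,0"
                + "sfgtr.ytyer.qdfre.uyeyrt.test2,0,8,0"
                + "sfgtr.ytyer.qdfre.uyeyrt.test3,0,3,0";
        for (int n = 1; n <= 3; n++) {
            System.out.println("test" + n + ": " + valueFor(test, n));
        }
    }
}
```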
You can use the following Pattern and loop for this:
String test= "ytrt.ytrwyt.ytreytre.test1,0,2,0"
+ System.getProperty("line.separator")
+"sfgtr.ytyer.qdfre.uyeyrt.test2,0,8,0"
+ System.getProperty("line.separator")
+"sfgtr.ytyer.qdfre.uyeyrt.test3,0,3,0";
// "test" literal, then one or more digits, a comma,
// one or more digits, a comma, and finally group 1: your digits
Pattern p = Pattern.compile("test\\d+,\\d+,(\\d+)");
Matcher m = p.matcher(test);
while (m.find()) {
// prints back-reference to group 1
System.out.printf("Found: %s%n", m.group(1));
}
Output
Found: 2
Found: 8
Found: 3
You could also use capturing groups to extract the test number and the other number from the string:
String pattern = "test([123]),\\d+,(\\d+),";
...
while (match.find()) {
// get and parse the number after "test" (first capturing group)
int testNo = Integer.parseInt(match.group(1));
// get and parse the number you wanted to extract (second capturing group)
int num = Integer.parseInt(match.group(2));
System.out.println("test"+testNo+": " + num);
}
Which prints
test1: 2
test2: 8
test3: 3
Note: In this example, parsing the strings is only done for demonstration purposes, but it could be useful if you want to do something with the numbers, like storing them in an array.
Update: If you also want to match strings like "ytrt.ytrwyt.test1.ytrwyt,0,2,0", you could change pattern to "test([123])\\D*,\\d+,(\\d+)," to allow any number of non-digits to follow test1, test2 or test3 (preceding the comma-separated ints).
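A sketch of the updated pattern, collecting test number to value pairs into a LinkedHashMap so insertion order is kept (TestNumbers and extract are hypothetical names):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TestNumbers {
    // \D* lets non-digits such as ".ytrwyt" sit between testN and the first comma.
    private static final Pattern P = Pattern.compile("test([123])\\D*,\\d+,(\\d+),");

    static Map<Integer, Integer> extract(String text) {
        Map<Integer, Integer> result = new LinkedHashMap<>();
        Matcher m = P.matcher(text);
        while (m.find()) {
            result.put(Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("ytrt.ytrwyt.test1.ytrwyt,0,2,0"));
    }
}
```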

regular expression that starts with an underscore but does not contain any other underscore

I am trying to fetch the name of a file without the part from the last underscore until the end.
For example,
ABC_AA.xml should be ABC
and
ABC_AASD_AD_AA.xml should be ABC_AASD_AD
I am thinking about using a non-greedy quantifier with the exclusive ^ symbol.
I have tried this:
String nameToSearch = testName.replaceAll("_(^(_).)+\\.xml$", "");
How about using a simple substring instead of a regex:
String nameToSearch = testName.substring(0, testName.lastIndexOf('_'));
or, in case there may be no _ at all, you can use
String noSuffix = testName.substring(0, testName.lastIndexOf('.')); // remove ".xml"
int underscore = noSuffix.lastIndexOf('_');
String nameToSearch = underscore >= 0 ? noSuffix.substring(0, underscore) : noSuffix;
But if you really want to use a regex, you can try
testName.replaceAll("_[^_]*[.]xml$", "");
which will match (and remove) an _ followed by zero or more non-_ characters ([^_]*) and then .xml at the end.
In case there may be no _ at all, you can make _[^_]* optional with
testName.replaceAll("(_[^_]*)?[.]xml$", "");
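A sketch comparing the two approaches from this answer on the question's examples plus a name without an underscore. It uses the substring variant that tolerates a missing _ and assumes every name contains a dot; StripLastPart and its methods are hypothetical names.

```java
class StripLastPart {
    // Non-regex version: drop ".xml", then everything from the last '_' (if any).
    // Assumes the name contains a '.'.
    static String withSubstring(String name) {
        String noSuffix = name.substring(0, name.lastIndexOf('.'));
        int underscore = noSuffix.lastIndexOf('_');
        return underscore >= 0 ? noSuffix.substring(0, underscore) : noSuffix;
    }

    // Regex version: an optional "_..." part followed by ".xml" at the end.
    static String withRegex(String name) {
        return name.replaceAll("(_[^_]*)?[.]xml$", "");
    }

    public static void main(String[] args) {
        for (String name : new String[] {"ABC_AA.xml", "ABC_AASD_AD_AA.xml", "ABC.xml"}) {
            System.out.println(withSubstring(name) + " / " + withRegex(name));
        }
    }
}
```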
Simple.
Use groups and back-references, as such:
String input = "ABC_AASD_AD_AA.xml";
// replaceAll takes a regex:
//   (.+)    group 1: one or more characters, greedy (runs to the last underscore)
//   _       underscore
//   .+?     one or more characters, reluctant
//   \\.xml  escaped dot and extension
//   $1      replacement: back-reference to group 1
System.out.println(input.replaceAll("(.+)_.+?\\.xml", "$1"));
Output
ABC_AASD_AD
Note
Any input that does not match the pattern is returned unchanged.
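A runnable version of the back-reference answer (BackReference is an illustrative name), including an input that doesn't end in .xml to show it passes through unchanged:

```java
class BackReference {
    // Greedy group 1 runs to the LAST underscore; "$1" puts it back.
    static String strip(String input) {
        return input.replaceAll("(.+)_.+?\\.xml", "$1");
    }

    public static void main(String[] args) {
        System.out.println(strip("ABC_AASD_AD_AA.xml"));
        System.out.println(strip("README.txt"));
    }
}
```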
I believe this regex should work:
String repl = str.replaceFirst("_[^_]+$", "");
The ^ character can be used as "exclusive", i.e. to exclude certain characters, only as the first character of a character class inside []. [^_] matches any character that's not an underscore. Outside of square brackets, it means "the beginning of the source string".
So you're close. Try this:
String nameToSearch = testName.replaceAll("_[^_]+\\.xml$", "");
Or, if you want to handle file names ending in an underscore (i.e. change ABC_.xml to ABC), and remove the underscore in that case, change + (1 or more) to * (0 or more).
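The difference between + and * only shows up for a name whose last underscore is immediately followed by .xml. A sketch (PlusVersusStar is a hypothetical name) comparing the two:

```java
class PlusVersusStar {
    // [^_]+ needs at least one char between '_' and ".xml"
    static String plus(String name) {
        return name.replaceAll("_[^_]+\\.xml$", "");
    }

    // [^_]* also accepts "_" directly followed by ".xml"
    static String star(String name) {
        return name.replaceAll("_[^_]*\\.xml$", "");
    }

    public static void main(String[] args) {
        System.out.println(plus("ABC_.xml") + " vs " + star("ABC_.xml"));
    }
}
```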

Replacing variable numbers of items... regex?

Ok... I have an unsatisfactory solution to a problem.
The problem is I have input like so:
{sup 19}F({sup 3}He,t){sup 19}Ne(p){sup 18}F
and need output like so:
¹⁹F(³He,t)¹⁹Ne(p)¹⁸F
I use a series of replacements to first split each of the {sup xx} sections into {sup x}{sup x}, and then use a regex to match each of those and replace the characters with their single-character Unicode equivalents. The "problem" is that the {sup} sections can have numbers 1, 2 or 3 digits long (maybe more, I don't know), and I want to "expand" them into separate {sup} sections with one digit each. (I also have the same problem with {sub} for subscripts...)
My current solution looks like this (in java):
retval = retval.replaceAll("\\{sup ([1-9])([0-9])\\}", "{sup $1}{sup $2}");
retval = retval.replaceAll("\\{sup ([1-9])([0-9])([0-9])\\}", "{sup $1}{sup $2}{sup $3}");
My question: is there a way to do this in a single pass no matter how many digits ( or at least some reasonable number ) there are?
Yes, but it may be a bit of a hack, and you'll have to be careful it doesn't overmatch!
Regex:
(?:\{sup\s)?(\d)(?=\d*})}?
Replacement String:
{sup $1}
A short explanation:
(?: | start non-capturing group 1
\{ | match the character '{'
sup | match the substring: "sup"
\s | match any white space character
) | end non-capturing group 1
? | ...and repeat it once or not at all
( | start group 1
\d | match any character in the range 0..9
) | end group 1
(?= | start positive look ahead
\d | match any character in the range 0..9
* | ...and repeat it zero or more times
} | match the substring: "}"
) | end positive look ahead
} | match the substring: "}"
? | ...and repeat it once or not at all
In plain English: it matches a single digit, only when looking ahead there's a } with optional digits in between. If possible, the substrings {sup and } are also replaced.
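In Java the braces and backslashes need escaping in the string literal. This sketch (SplitSup is a hypothetical name) applies the first regex to the question's input and also shows the overmatching warned about above: a bare {123} gets split as well, leaving the original { behind.

```java
class SplitSup {
    // Optional "{sup ", one digit, lookahead for more digits then "}", optional "}"
    static String split(String s) {
        return s.replaceAll("(?:\\{sup\\s)?(\\d)(?=\\d*\\})\\}?", "{sup $1}");
    }

    public static void main(String[] args) {
        System.out.println(split("{sup 19}F({sup 3}He,t){sup 19}Ne(p){sup 18}F"));
        System.out.println(split("set={123}")); // overmatch
    }
}
```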
EDIT:
A better one is this:
(?:\{sup\s|\G)(\d)(?=\d*})}?
That way, digits like in the string "set={123}" won't be replaced. The \G in my second regex matches the spot where the previous match ended.
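The same sketch with the \G version (SplitSupAnchored is a hypothetical name): the bare {123} is now left alone, while {sup 123} is still expanded, because each digit after the first can only be matched where the previous match ended.

```java
class SplitSupAnchored {
    // Either "{sup " or \G (end of the previous match), then one digit
    static String split(String s) {
        return s.replaceAll("(?:\\{sup\\s|\\G)(\\d)(?=\\d*\\})\\}?", "{sup $1}");
    }

    public static void main(String[] args) {
        System.out.println(split("x{sup 123}y set={123}"));
    }
}
```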
The easiest way to do this kind of thing is with something like PHP's preg_replace_callback or .NET's MatchEvaluator delegates. Before Java 9, which added Matcher.replaceAll(Function<MatchResult, String>), Java had nothing like that built in, but it does expose the lower-level API that lets you implement it yourself. Here's one way to do it:
import java.util.regex.*;
public class Test
{
static String sepsup(String orig)
{
Pattern p = Pattern.compile("(\\{su[bp] )(\\d+)\\}");
Matcher m = p.matcher(orig);
StringBuffer sb = new StringBuffer();
while (m.find())
{
m.appendReplacement(sb, "");
for (char ch : m.group(2).toCharArray())
{
sb.append(m.group(1)).append(ch).append("}");
}
}
m.appendTail(sb);
return sb.toString();
}
public static void main (String[] args)
{
String s = "{sup 19}F({sup 3}He,t){sub 19}Ne(p){sup 18}F";
System.out.println(s);
System.out.println(sepsup(s));
}
}
result:
{sup 19}F({sup 3}He,t){sub 19}Ne(p){sup 18}F
{sup 1}{sup 9}F({sup 3}He,t){sub 1}{sub 9}Ne(p){sup 1}{sup 8}F
If you wanted, you could go ahead and generate the superscript and subscript characters and insert those instead.
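On Java 9 and later, Matcher.replaceAll(Function<MatchResult, String>) plays the role of preg_replace_callback, so the superscript and subscript characters can be produced in one pass. A sketch under that assumption (ScriptDigits and convert are hypothetical names), mapping each digit to the corresponding Unicode superscript or subscript digit:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScriptDigits {
    // Superscript digits 0-9 (1, 2 and 3 live in Latin-1, U+00B9/U+00B2/U+00B3)
    private static final String SUP = "\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079";
    // Subscript digits 0-9 (U+2080 to U+2089)
    private static final String SUB = "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089";
    private static final Pattern TAG = Pattern.compile("\\{(sup|sub) (\\d+)\\}");

    static String convert(String s) {
        // Matcher.replaceAll(Function<MatchResult, String>) requires Java 9+
        return TAG.matcher(s).replaceAll(mr -> {
            String table = mr.group(1).equals("sup") ? SUP : SUB;
            StringBuilder out = new StringBuilder();
            for (char c : mr.group(2).toCharArray()) {
                out.append(table.charAt(c - '0')); // \d is ASCII-only by default
            }
            return Matcher.quoteReplacement(out.toString());
        });
    }

    public static void main(String[] args) {
        System.out.println(convert("{sup 19}F({sup 3}He,t){sup 19}Ne(p){sup 18}F"));
    }
}
```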
Sure, this is a standard regular expression construct. You can find out about all the metacharacters in the Pattern Javadoc, but for your purposes you probably want the "+" metacharacter or the {1,3} greedy quantifier.
