Regular Expression is failing in java - java

I have file names that follow the pattern %payslip%.xml.gpg.
Below are some possible file names:
Taswkly_payslips_Pay27.xml.gpg
exec_payslip.xml.gpg
Cairns_payslips_adv_P27.xml.gpg
Could you please suggest a regex for the above file name pattern?
The fixed parts of the pattern are:
*payslip*.xml.gpg
Any help would be appreciated.

You can use this regex:
^.*payslip.*\.xml\.gpg$
^ start of the line
.* any character, zero or more times
payslip the string "payslip"
.* any character, zero or more times
\. the "." character
xml the string "xml"
\. the "." character
gpg the string "gpg"
$ end of the line
Also don't forget to double the backslashes when you write it as a Java string literal:
^.*payslip.*\\.xml\\.gpg$
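A minimal sketch of how the escaped pattern might be used from Java. The class name, the method name and the last (non-matching) file name are made up for illustration; the pattern and the other file names come from the question and answer above.

```java
import java.util.regex.Pattern;

class PayslipFileMatcher {
    // Pattern from the answer, written as a Java string literal (backslashes doubled).
    static final Pattern PAYSLIP = Pattern.compile("^.*payslip.*\\.xml\\.gpg$");

    // Returns true when the whole file name fits *payslip*.xml.gpg.
    static boolean isPayslipFile(String fileName) {
        return PAYSLIP.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        String[] names = {
            "Taswkly_payslips_Pay27.xml.gpg",
            "exec_payslip.xml.gpg",
            "Cairns_payslips_adv_P27.xml.gpg",
            "exec_payslip.xml" // hypothetical name without the .gpg suffix
        };
        for (String name : names) {
            System.out.println(name + " -> " + isPayslipFile(name));
        }
    }
}
```

Since matches() already requires the whole input to match, the ^ and $ anchors are redundant here, but they are harmless and keep the pattern usable with find() as well.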

Related

Java Regex to validate group field pattern example - abc.def.gh1

I am just writing some piece of java code where I need to validate groupId (maven) passed by user.
For example - com.fb.test1.
I have written a regex which says the string should not start or end with '.' and can contain alphanumeric characters delimited by '.':
[^\.][[a-zA-Z0-9]+\\.{0,1}]*[a-zA-Z0-9]$
But this regex is not able to detect consecutive '.' characters, for example com..fb.test. I added {0,1} after the dot to limit it to one occurrence, but it didn't work.
Any leads would be highly appreciated.
The quantifier {0,1} and the dot should not be inside the character class: there they are treated as literal characters, so the repeated class allows any number of dots as well as the characters {, comma and }.
You can also exclude a dot to the left using a negative lookbehind instead of matching an actual character that is not a dot.
In Java you could write the pattern as
(?<!\\.)[a-zA-Z0-9]+(?:\\.[a-zA-Z0-9]+)+[a-zA-Z0-9]$
Note that the $ makes sure that the match ends at the end of the string.
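A small sketch of how the pattern above might be applied to a groupId. It assumes the check is done with matches(), so the whole input has to conform; the class and method names are invented for this example, and the inputs beyond com.fb.test1 and com..fb.test are hypothetical.

```java
import java.util.regex.Pattern;

class GroupIdValidator {
    // Pattern from the answer above, unchanged.
    static final Pattern GROUP_ID =
        Pattern.compile("(?<!\\.)[a-zA-Z0-9]+(?:\\.[a-zA-Z0-9]+)+[a-zA-Z0-9]$");

    // matches() requires the entire string to match, not just a suffix of it.
    static boolean isValid(String groupId) {
        return GROUP_ID.matcher(groupId).matches();
    }

    public static void main(String[] args) {
        String[] inputs = { "com.fb.test1", "com..fb.test", ".com.fb", "com.fb." };
        for (String input : inputs) {
            System.out.println(input + " -> " + isValid(input));
        }
    }
}
```

As written, the pattern requires at least one dot and a final segment of at least two characters, so an input such as a.b is rejected. With find() instead of matches(), a tail of the input can still match (for com..fb.test the substring b.test does), so anchoring the whole input matters.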

Problem with regex creation if escape character is at the end of the parameter value

I get three parameters in a string. Each parameter is written in the form: Quotes, Name, Quotes, Equals sign, Quotes, Text, Quotes. The parameter separator is a space.
Example 1:
"param1"="Peter" "param2"="Harald" "param3"="Marie"
With java.util.regex.Matcher I can find any name and text by the following regex:
"([^"]*)"\s*=\s*"([^"]*)"
Now, however, there may be a quotation mark in the text. It is escaped with a backslash.
Example 2:
"param1"="Peter" "param2"="Har\"ald" "param3"="Marie"
I have built the following regex:
"([^"]*)"\s*=\s*("([^"]*(\\")*[^"]*)*[^\\]")
This works well for example 2, but is not a universal solution.
If the backslash is at the end of a parameter-value, the solution does not work anymore.
Example 3:
"param1"="Peter" "param2"="Harald\" "param3"="Marie"
If the backslash is at the end of the value, the matcher interprets "Harald\" " as the value of parameter 2 instead of "Harald\".
Do you have a universal solution for this problem? Thanks in advance for your input.
Kind regards
Dominik
You may use this regex in Java:
\"([^\"]*)\"\h*=\h*(\"[^\\\"]*(?:\\(?=\"(?:\h|$))|(?:\\.[^\\\"]*))*\")
RegEx Details:
\"([^\"]*)\": Match the quoted parameter name and capture the name in group #1
\h*=\h*: Match = surrounded with optional horizontal whitespace
(: Start capture group #2
\": Match opening "
[^\\\"]*: Match 0 or more of non-quote, non-backslash characters
(?:: Start non-capture group
\\: Match a \
(?=\"(?:\h|$)): Must be followed by a " that has a whitespace or the end of the line afterwards
|: OR
(?:\\.[^\\\"]*))*: Match an escaped character followed by 0 or more of non-quote, non-backslash characters; repeat the whole non-capture group 0 or more times
\": Match closing "
): End capture group #2
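The following sketch applies the regex with java.util.regex.Matcher to the three example strings from the question. The class name and the parse helper are hypothetical; the Java string literal is the regex above with every regex backslash doubled (the quote escapes are only needed by the string literal).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParamParser {
    // Regex from the answer as a Java string literal.
    static final Pattern PARAM = Pattern.compile(
        "\"([^\"]*)\"\\h*=\\h*(\"[^\\\\\"]*(?:\\\\(?=\"(?:\\h|$))|(?:\\\\.[^\\\\\"]*))*\")");

    // Maps each parameter name (group 1) to its value including the quotes (group 2).
    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PARAM.matcher(input);
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String example1 = "\"param1\"=\"Peter\" \"param2\"=\"Harald\" \"param3\"=\"Marie\"";
        String example2 = "\"param1\"=\"Peter\" \"param2\"=\"Har\\\"ald\" \"param3\"=\"Marie\"";
        String example3 = "\"param1\"=\"Peter\" \"param2\"=\"Harald\\\" \"param3\"=\"Marie\"";
        System.out.println(parse(example1));
        System.out.println(parse(example2));
        System.out.println(parse(example3));
    }
}
```

The lookahead only treats a trailing backslash as part of the value when the quote after it is followed by whitespace or the end of the line, so a value that contains the sequence backslash, quote, space in its middle would still be split early.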

Regular expression for matching texts before and after string

I would like to match URL strings which can be specified in the following manner.
xxx.yyy.com (For example, the regular expression should match all strings like 4xxx.yyy.com, xxx4.yyy.com, xxx.yyy.com, 4xxx4.yyy.com, 444xxx666.yyy.com, abcxxxdef.yyy.com etc).
I have tried to use
([a-zA-Z0-9]+$)xxx([a-zA-Z0-9]+$).yyy.com
([a-zA-Z0-9]*)xxx([a-zA-Z0-9]*).yyy.com
But they don't work. Please help me write a correct regular expression. Thanks in advance.
Note: I'm trying to do this in Java.
If you want to make sure there is an xxx and want to allow any non-whitespace characters before and after it, you can use \S*. If you want to match the whole string, you could add anchors at the start and end.
Remember to escape the dot to match it literally.
^\S*xxx\S*\.yyy\.com$
^ Start of string
\S*xxx\S* Match xxx between optional non whitespace chars
\.yyy Match .yyy
\.com Match .com
$ End of string
In Java, double the backslashes:
String regex = "^\\S*xxx\\S*\\.yyy\\.com$";
Or specify the characters on the left and right that you would allow to match in the character class:
^[0-9A-Za-z!##$%^&*()_+]*xxx[0-9A-Za-z!##$%^&*()_+]*\.yyy\.com$
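A short sketch that checks the first pattern against the host names listed in the question. The class and method names are invented for this example, and the two rejected inputs are hypothetical.

```java
import java.util.regex.Pattern;

class HostMatcher {
    // The anchored \S* pattern from the answer, as a Java string literal.
    static final Pattern HOST = Pattern.compile("^\\S*xxx\\S*\\.yyy\\.com$");

    static boolean matchesHost(String host) {
        return HOST.matcher(host).matches();
    }

    public static void main(String[] args) {
        String[] hosts = {
            "4xxx.yyy.com", "xxx4.yyy.com", "xxx.yyy.com",
            "4xxx4.yyy.com", "444xxx666.yyy.com", "abcxxxdef.yyy.com",
            "abc.yyy.com",  // hypothetical: no xxx, should be rejected
            "xxx.yyyXcom"   // hypothetical: shows why the dot must be escaped
        };
        for (String host : hosts) {
            System.out.println(host + " -> " + matchesHost(host));
        }
    }
}
```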

Illegal escape character error in Java regex

I've read the manual, and at the end there was an exercise:
Use a backreference to write an expression that will match a person's name only if that person's first name and last name are the same.
I've written the following program: http://pastebin.com/YkuUuP5M
But when I compile it, I'm getting an error:
PersonName.java:18: illegal escape character
p = Pattern.compile("([A-Z][a-zA-Z]+)\s+\1");
^
If I rewrite line 18 this way:
pattern = Pattern.compile(console.readLine("%nEnter your regex: "));
and type the pattern in the console, then the program works fine. Why can't I use the pattern as in the first version of the program, and is there some way to fix it?
You want to get this text into a string:
([A-Z][a-zA-Z]+)\s+\1
However, \ in a string literal in Java source code is the character used for escaping (e.g. "\t" for tab). Therefore you need to use "\\" in a string literal to end up with a single backslash in the resulting string. So you want:
"([A-Z][a-zA-Z]+)\\s+\\1"
Note that there's nothing regular-expression-specific to this. Any time you want to express a string containing a backslash in a Java string literal, you'll need to escape that backslash. Regular expressions and Windows filenames are just the most common cases for that.
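To illustrate, here is a minimal sketch that compiles the escaped literal and tries it on two names. The class name, the helper method and the sample names are assumptions made for this example; only the pattern comes from the question.

```java
import java.util.regex.Pattern;

class SameNameDemo {
    // In source code each regex backslash is written twice.
    static final String REGEX = "([A-Z][a-zA-Z]+)\\s+\\1";
    static final Pattern SAME_NAME = Pattern.compile(REGEX);

    // True when the first and last name are identical, e.g. "Smith Smith".
    static boolean hasSameFirstAndLastName(String fullName) {
        return SAME_NAME.matcher(fullName).matches();
    }

    public static void main(String[] args) {
        // The resulting string holds single backslashes: ([A-Z][a-zA-Z]+)\s+\1
        System.out.println(REGEX);
        System.out.println(hasSameFirstAndLastName("Smith Smith"));
        System.out.println(hasSameFirstAndLastName("John Smith"));
    }
}
```

Reading the pattern from the console works without doubling because the escaping rules apply only to string literals in source code, not to text read at run time.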

What does this regular expression mean?

In a recent interview I was asked to decipher this regex
^\^[^^]
Can you please help me with it. Also please provide some links where I can learn regex for interviews.
It matches strings that begin with ^ followed by any character other than ^.
So it would match:
^foo
^b
but not
foo
^^b
Explanation:
Caret (^) is a regex metacharacter with two different meanings:
Outside a character class (1st use in your regex) it works as the start anchor.
Inside a character class it negates the class if it is the first character of the class (3rd use in your regex).
Preceding a metacharacter with \ escapes it (makes it non-special). The 2nd use of ^ in your regex is escaped, so it matches a literal ^ in the string.
Inside a character class, a ^ which is not the first character of the class is treated literally. So the 4th use in your regex is a literal ^.
Some more examples to make it clear:
^a : Matches a string beginning with a
^ab : Matches a string beginning with a followed by b
[a] : Matches a string which contains an a
[^a] : Matches a string which contains at least one character other than a
^a[^a] : Matches a string beginning with an a followed by any character other than a
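Since the other questions on this page are about Java, here is a small sketch that checks the four sample strings above with java.util.regex. The class and method names are hypothetical; find() is used because the pattern only describes the beginning of the string.

```java
import java.util.regex.Pattern;

class CaretDemo {
    // ^\^[^^] as a Java string literal: only the backslash needs doubling.
    static final Pattern CARET = Pattern.compile("^\\^[^^]");

    // True when the input starts with ^ followed by a character other than ^.
    static boolean startsWithSingleCaret(String input) {
        return CARET.matcher(input).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] { "^foo", "^b", "foo", "^^b" }) {
            System.out.println(s + " -> " + startsWithSingleCaret(s));
        }
    }
}
```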
I'm testing this regex here however it does not seem to be valid.
The first ^ denotes the start of the line.
The first \ escapes the following \.
Thus the second "^" is not escaped
Finally, the first caret inside the square brackets [^ acts as the negation, and the second one ^] is not escaped, so as a result the regex is not valid.
IMHO the correct regexp should be ^\^[^\^]
Guys, kindly confirm. Many thanks
Match beginning of line or string
followed by a literal \
followed by the beginning of the line or string
followed by any character that is not a space, return or new line character
The first ^ is the beginning of the line.
The second one is a literal ^ character (the \ escapes the usual meaning of ^).
The third one, inside the brackets, says: a class of characters which does not include the character ^.
Some examples using Ruby:
ruby-1.9.2-p0 > "hello" =~ /^h/ # it found a match at position 0
=> 0
ruby-1.9.2-p0 > "hello" =~ /^e/ # nil means can't find it
=> nil
ruby-1.9.2-p0 > "he^llo" =~ /\^/ # found at position 2
=> 2
ruby-1.9.2-p0 > "he^llo"[/[^^]*/] # anything repeatedly but not including the ^ character
=> "he"
