Java conditional regex - java

I have such text:
120.65UAH Produkti Kvartal
5*14 14:24
Bal. 16603.52UAH
What I want to do:
If this text contains "5*14", I need to get 16603.52 via one java reg exp.
this
and this
and this
I tried to create conditional regexp like this:
(5*14 ([\d\.*]+)UAH)
(5*14 d{2}:d{2} Bal. ([\d\.*]+))
etc
But no luck, can you please share your th

You can use a regex like this:
(?=5\*14)[\s\S]*?(\d{5}\.\d{2})
Working demo
Update: you even don't need the look ahead, you can just use:
5\*14[\s\S]*?(\d{5}\.\d{2})

(\d*\.\d\d)(?>\w*)$
will match a group on the last set of DDDDD.DD in the line. You will need to take the contents of the first matching group.

If you have 5*14 before the float number you need to get, you can just use
(?s)\\b5\\*14\\b.*?\\b(\\d+\\.\\d+)
See demo. The value will be in Group 1. I also used Java escaping style.
Note that 5\*14 can match in 145*143 that is why I am using word boundaries \b. .*? with (?s) matches any number of any symbols but as few as possible. \d+\.\d+ matches simple float number (irrespective of the number of digits there are in it).
IDEONE demo:
String str = "120.65UAH Produkti Kvartal\n5*14 14:24\nBal. 16603.52UAH";
Pattern ptrn = Pattern.compile("(?s)\\b5\\*14\\b.*?\\b(\\d+\\.\\d+)");
Matcher matcher = ptrn.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Result: 16603.52

Related

Java regex, replace certain characters except

I have this string "u2x4m5x7" and I want replace all the characters but a number followed by an x with "".
The output should be:
"2x5x"
Just the number followed by the x.
But I am getting this:
"2x45x7"
I'm doing this:
String string = "u2x4m5x7";
String s = string.replaceAll("[^0-9+x]","");
Please help!!!
Here is a one-liner using String#replaceAll with two replacements:
System.out.println(string.replaceAll("\\d+(?!x)", "").replaceAll("[^x\\d]", ""));
Here is another working solution. We can iterate the input string using a formal pattern matcher with the pattern \d+x. This is the whitelist approach, of trying to match the variable combinations we want to keep.
String input = "u2x4m5x7";
Pattern pattern = Pattern.compile("\\d+x");
Matcher m = pattern.matcher(input);
StringBuilder b = new StringBuilder();
while(m.find()) {
b.append(m.group(0));
}
System.out.println(b)
This prints:
2x5x
It looks like this would be much simpler by searching to get the match rather than replacing all non matches, but here is a possible solution, though it may be missing a few cases:
\d(?!x)|[^0-9x]|(?<!\d)x
https://regex101.com/r/v6udph/1
Basically it will:
\d(?!x) -- remove any digit not followed by an x
[^0-9x] -- remove all non-x/digit characters
(?<!\d)x -- remove all x's not preceded by a digit
But then again, grabbing from \dx would be much simpler
Capture what you need to $1 OR any character and replace with captured $1 (empty if |. matched).
String s = string.replaceAll("(\\d+x)|.", "$1");
See this demo at regex101 or a Java demo at tio.run

Why does this regex capture the excluded character?

I have a regex like this:
(?:(\\s| |\\A|^))(?:#)[A-Za-z0-9]{2,}
What I am trying to do is find a pattern that starts with an # and has two or more characters after, however it can't start in the middle of a word.
I'm new to regex but was under the impression ?: matches but then excludes the character however my regex seems to match but include the characters. Ideally I'd like for "#test" to return "test" and "test#test" to not match at all.
Can anyone tell me what I've done wrong?
Thanks.
Your understanding is incorrect. The difference between (...) and (?:...) is only that the former also creates a numbered match group which can be referred to with a backreference from within the regex, or as a captured match group from code following the match.
You could change the code to use lookbehinds, but the simple and straightforward fix is to put ([A-Za-z0-9]{2,}) inside regular parentheses, like I have done here, and retrieve the first matched group. (The # doesn't need any parentheses around it in this scenario, but the ones you have are harmless.)
Try this : You could use word boundary to specify your condition.
public static void main(String[] args) {
String s1 = "#test";
String s2 = "test#test";
String pattern = "\\b#\\w{2,}\\b";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(s1);
m.find();
System.out.println(m.group());
}
o/p :
#test
throws `IllegalStateException` in the second case (s2)..
How about:
\W#[\S]{2}[\S]*
The strings caught by this regular expression needs to be trimmed and remove the first character.
I guess you better need the following one:
(?<=(?<!\w)#)\w{2,}
Debuggex Demo
Don't forget to escape the backslashes in Java since in a string literal:
(?<=(?<!\\w)#)\\w{2,}

a strange regular on look behind

i write a piece of program to fetch content from a string between ":"(may not have) and "#" and order guaranteed,for example a string like "url:123#my.com",the I fetch "123",or "123#my.com" then i fetch "123" ,too; so I write a regular expression to implement it ,but i can not work,behind is first version:
Pattern pattern = Pattern.compile("(?<=:?).*?(?=#)");
Matcher matcher = pattern.matcher("sip:+8610086#dmcw.com");
if (matcher.find()) {
Log.d("regex", matcher.group());
} else {
Log.d("regex", "not match");
}
it can not work because in the first case:"url:123#my.com" it will get the result:"url:123"
obviously not what i want:
so i write the second version:
Pattern pattern = Pattern.compile("(?<=:??).*?(?=#)");
but it get the error,somebody said java not support variable length in look behind;
so I try the third version:
Pattern pattern = Pattern.compile("(?<=:).*?(?=#)|.*?(?=#)");
and its result is same as the first version ,BUT SHOULD NOT THE FIRST CONDITION BE CONSIDERED FIRST?
it same as
Pattern pattern = Pattern.compile(".*?(?=#)|(?<=:).*?(?=#)");
not left to right! I consider I understood regular expression before ,but confused again.thanks in advance anyway.
Try this (slightly edited, see comments):
String test = "sip:+8610086#dmcw.com";
String test2 = "8610086#dmcw.com";
Pattern pattern = Pattern.compile("(.+?:)?(.+?)(?=#)");
Matcher matcher = pattern.matcher(test);
if (matcher.find()) {
System.out.println(matcher.group(2));
}
matcher = pattern.matcher(test2);
if (matcher.find()) {
System.out.println(matcher.group(2));
}
Output:
+8610086
8610086
Let me know if you need explanations on the pattern.
You really don't need any look-aheads or look-behinds here. What you need can be accomplished by using a a greedy quantifer and some alternation:
.*(?:^|:)([^#]+)
By default java regular expression quantifiers (*+{n}?) are all greedy (will match as many characters as possible until a match can't be found. They can be made lazy by using a question mark after the quantifier like so: .*?
You will want to output capture group 1 for this expression, outputting capture group 0 will return the entire match.
As you said, you can't do a variable lookbehind in java.
Then, you can do something like this, you don't need lookbehind or lookaround.
Regex: :?([^#:]*)#
Example In this example (forget about \n, its because of regex101) you will get in the first group what you need, and you don't have to do anything special. Sometimes the easiest solution is the best.

Can't get a match for regular expression in Java

This is the format/example of the string I want to get data:
<span style='display:block;margin-bottom:3px;'><a style='margin:4px;color:#B82933;font-size:120%' href='/cartelera/pelicula/18312'>Español </a></span><br><span style='display:block;margin-bottom:3px;'><a style='margin:4px;color:#FBEBC4;font-size:120%' href='/cartelera/pelicula/18313'>Subtitulada </a></span><br> </div>
And this is the regular expression I'm using for it:
"pelicula/([0-9]*)'>([\\w\\s]*)</a>"
I tested this regular expression in RegexPlanet, and it turned out OK, it gave me the expected result:
group(1) = 18313
group(2) = Subtitulada
But when I try to implement that regular expression in Java, it won't match anything. Here's the code:
Pattern pattern = Pattern.compile("pelicula/([0-9]*)'>([\\w\\s]*)</a>");
Matcher matcher = pattern.matcher(inputLine);
while(matcher.find()){
version = matcher.group(2);
}
}
What's the problem? If the regular expression is already tested, and in that same code I search for more patterns but I'm having trouble with two (I'm showing you here just one). Thank you in advance!
_EDIT__
I discovered the problem... If I check the sourcecode of the page it shows everything, but when I try to consume it from Java, it gets another sourcecode. Why? Because this page asks for your city so it can show information about that. I don't know if there's a workaround about that to actually access the information I want, but that's it.
Your regex is correct but it seems \w does not match ñ.
I changed the regex to
"pelicula/([0-9]*)'>(.*?)</a>"
and it seems to match both the occurrences.
Here I've used the reluctant *? operator to prevent .* match all characters in between first <a> till last <\a>
See What is the difference between `Greedy` and `Reluctant` regular expression quantifiers? for explanation.
#Bohemian is correct in pointing out that you might need to enable the Pattern.DOTALL flag as well if the text in <a> has line breaks
If your input is over several lines (ie it contains newline characters) you'll need to turn on "dot matches newline".
There are two way to do this:
Use the "dot matches newline" regex switch (?s) in your regex:
Pattern pattern = Pattern.compile("(?s)pelicula/([0-9]*)'>([\\w\\s]*)</a>");
or use the Pattern.DOTALL flag in the call to Pattern.compile():
Pattern pattern = Pattern.compile("pelicula/([0-9]*)'>([\\w\\s]*)</a>", Pattern.DOTALL);

Java Regex Question - Ignore Quotations

I am trying to write a program using regex. The format for an identifier, as I might have explained in another question of mine, is that it can only begin with a letter (and the rest of it can contain whatever). I have this part worked out for the most part.
However, anything within quotes cannot count as an identifier either.
Currently I am using Pattern pattern = Pattern.compile("[A-Za-z][_A-Za-z0-9]*"); as my pattern, which indicates that the first character can only be letters. So how can I edit this to check if the word is surrounded by quotations (and EXCLUSE those words)?
Use negative lookaround assertions:
"(?<!\")\\b[A-Za-z][_A-Za-z0-9]*\\b(?!\")"
Example:
Pattern pattern = Pattern.compile("(?<!\")\\b[A-Za-z][_A-Za-z0-9]*\\b(?!\")");
Matcher matcher = pattern.matcher("Foo \"bar\" baz");
while (matcher.find())
{
System.out.println(matcher.group());
}
Output:
Foo
baz
See it working online: ideone.
Use lookarounds.
"(?<![\"A-Za-z])[A-Z...
The (?<![\"A-Za-z]) part means "if the previous character is not a quotation mark or a letter".

Categories

Resources