How to express this pattern with java regex? - java

I'm working on a Java program that tries to extract a string with the format:
|something|=
from different string inputs, where |something| has the following constraints:
|something| cannot start with "*"
|something| cannot be equal to "name"
|something| can have any length
Can such a pattern be represented with a Java regular expression, or should I consider a different approach? Any help would be greatly appreciated!

This is a regular expression you can use:
\|(?!name\|)[^*].*\|=
You can get a nice explanation here.
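A minimal sketch of how this could be wired up in Java. It uses a slightly tightened variant of the expression above, on the assumption (not stated in the question) that |something| never contains a '|' itself; the class and method names are made up for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyExtractor {
    // \|          opening pipe
    // (?!name\|)  the key must not be exactly "name"
    // [^*|]       first character: anything except '*' or '|'
    // [^|]*       rest of the key (assumption: keys contain no '|')
    // \|=         closing pipe followed by '='
    private static final Pattern KEY =
            Pattern.compile("\\|(?!name\\|)([^*|][^|]*)\\|=");

    // Returns the first key found in the input, if any.
    static Optional<String> extract(String input) {
        Matcher m = KEY.matcher(input);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("x |foo|= 1"));  // Optional[foo]
        System.out.println(extract("|names|= 1"));  // Optional[names]
        System.out.println(extract("|name|= 1"));   // Optional.empty
        System.out.println(extract("|*foo|= 1"));   // Optional.empty
    }
}
```

Using [^|]* instead of .* keeps the match from running across several |...| groups on the same line.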

Related

Java Regex with colon

I want to match the string below.
RegEx I'm trying:
{1:F21XXXXXXXX9999123456}{4:{177:1007300\\d{2}8}{451:0}{108:XXX190876234-1}}
{1:F21XXXXXXXX9999123456}{4:{177:1007300\\d+
String to match:
{1:F21XXXXXXXX9999123456}{4:{177:1007300838}{451:0}{108:XXX190876234-1}}
I tried other ways as well, but I'm still having trouble getting a match. Any help would be appreciated.
Your first regexp (with the missing escapes added) is:
\{1:F21XXXXXXXX9999123456\}\{4:\{177:1007300\d{2}8\}\{451:0\}\{108:XXX190876234-1\}\}
When written as a Java String literal:
Pattern.compile("\\{1:F21XXXXXXXX9999123456\\}\\{4:\\{177:1007300\\d{2}8\\}\\{451:0\\}\\{108:XXX190876234-1\\}\\}");
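As an alternative to escaping every brace by hand, the fixed parts can be passed through Pattern.quote so that only the \d{2} piece stays an active regex. This is a sketch, not the answerer's code; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class BraceEscapeDemo {
    static final String TARGET =
            "{1:F21XXXXXXXX9999123456}{4:{177:1007300838}{451:0}{108:XXX190876234-1}}";

    // Quote the literal text on both sides; leave \d{2}8 as a real regex fragment.
    static Pattern build() {
        return Pattern.compile(
                Pattern.quote("{1:F21XXXXXXXX9999123456}{4:{177:1007300")
                + "\\d{2}8"
                + Pattern.quote("}{451:0}{108:XXX190876234-1}}"));
    }

    public static void main(String[] args) {
        System.out.println(build().matcher(TARGET).matches()); // true
    }
}
```

Pattern.quote wraps its argument in \Q...\E, so braces and other metacharacters inside it are treated literally.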

Matching regular expression OR and AND

I have a requirement to match the 1s OR (AND) 0s using a regular expression.
For example
^((1+|0+)|(1*,0*))1((1+|0+)|(1*,0*))1+((1+|0+)|(1*,0*))$
matches 01001110.
I would like to know how to match "either series of 1 or series of 0 or mixed series of 1 and 0"
I have tried the following, but it doesn't work for the string 1010.
^((1+|0+)|(1*,0*))1+((1+|0+)|(1*,0*))$
The idea is that in this problem, "A"s can be represented as 1+ OR 0+ or mix of both.
So I'm trying to derive the regular expression as shown above.
I am not sure if this is what you want, but if you want to match a series that contains only 1s and/or 0s, your regex can be (1|0)+, which can also be written with a character class as [01]+
If you're just matching binary digits, you can use:
Pattern any = Pattern.compile("[01]+");
If you want strictly alternating binary digits (no two equal digits in a row), you can use this with matches():
Pattern alt = Pattern.compile("(?!.*(?:00|11))[01]+");
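A small sketch of both checks, using matches() so the whole input must consist of binary digits. The "alternating" test here assumes that alternating means no two adjacent digits are equal, and expresses it with a negative lookahead; the class and method names are illustrative only.

```java
import java.util.regex.Pattern;

class BinaryPatterns {
    // Any non-empty mix of 0s and 1s.
    static final Pattern ANY = Pattern.compile("[01]+");
    // Binary digits with no "00" or "11" anywhere (assumed meaning of "alternating").
    static final Pattern ALTERNATING = Pattern.compile("(?!.*(?:00|11))[01]+");

    static boolean isBinary(String s) {
        return ANY.matcher(s).matches();
    }

    static boolean isAlternating(String s) {
        return ALTERNATING.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isBinary("01001110"));      // true
        System.out.println(isBinary("1010"));          // true
        System.out.println(isBinary("10,10"));         // false
        System.out.println(isAlternating("1010"));     // true
        System.out.println(isAlternating("01001110")); // false
    }
}
```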
Is this what you need?
^([01]+)$
Try this site for further tests:
http://www.regexplanet.com/advanced/java/index.html

Java - Regex to Split Tokens With Minimum Size and Delimiters

I know, I know, there are many similar questions, and I can say I have read all of them. But I am not good at regex, and I couldn't figure out the regular expression that I need.
I want to split a String in Java, and I have 4 constraints:
The delimiters are [.?!] (end of the sentence)
Decimal numbers shouldn't be tokenized
The delimiters shouldn't be removed.
The minimum size of each token should be 5
For example, for input:
"Hello World! This answer worth $1.45 in U.S. dollar. Thank you."
The output will be:
[Hello World!, This answer worth $1.45 in U.S. dollar., Thank you.]
Up to now I got the answer for three first constraints by this regex:
text.split("(?<=[.!?])(?<!\\d)(?!\\d)");
And I know I should use {5,} somewhere in my regex, but any combination that I tried doesn't work.
For cases like: "I love U.S. How about you?" it doesn't matter whether it gives me one or two sentences, as long as it doesn't tokenize S. as a separate sentence.
Finally, introducing a good tutorial of regex is appreciated.
UPDATE: As Chris mentioned in the comments, it is almost impossible to solve questions like this (covering all the cases that occur in natural languages) with regex. However, I found HamZa's answer the closest and the most useful one.
So, be careful! The accepted answer will not cover all possible use cases!
I'm basing my answer on a previously made regex.
The regex was basically (?<=[.?!])\s+(?=[a-z]) which means match any whitespace one or more times preceded with either ., ? or ! and followed by [a-z] (not forgetting the i modifier).
Now let's modify it to the needs of this question:
We'll first convert it to a JAVA regex: (?<=[.?!])\\s+(?=[a-z])
We'll add the i modifier to match case insensitive (?i)(?<=[.?!])\\s+(?=[a-z])
We'll put the expression in a positive lookahead to prevent the "eating" of the characters (delimiters in this case) : (?=(?i)(?<=[.?!])\\s+(?=[a-z]))
We'll add a negative lookbehind to check that there is no abbreviation in the format LETTER DOT LETTER DOT (note the doubled backslashes needed in a Java string): (?i)(?<=[.?!])(?<![a-z]\\.[a-z]\\.)\\s+(?=[a-z])
So our final regex looks like : (?i)(?<=[.?!])(?<![a-z]\\.[a-z]\\.)\\s+(?=[a-z]).
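The final expression can be tried out with a short program such as the sketch below (class and method names are made up for illustration). It splits on the whitespace between sentences, so the delimiters stay attached to their sentence.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SentenceSplitter {
    // Whitespace that follows . ? or !, is not preceded by LETTER DOT LETTER DOT,
    // and is followed by a letter.
    private static final Pattern BOUNDARY =
            Pattern.compile("(?i)(?<=[.?!])(?<![a-z]\\.[a-z]\\.)\\s+(?=[a-z])");

    static String[] split(String text) {
        return BOUNDARY.split(text);
    }

    public static void main(String[] args) {
        String input = "Hello World! This answer worth $1.45 in U.S. dollar. Thank you.";
        System.out.println(Arrays.toString(split(input)));
        // [Hello World!, This answer worth $1.45 in U.S. dollar., Thank you.]
        System.out.println(Arrays.toString(split("I love U.S. How about you?")));
        // [I love U.S. How about you?]
    }
}
```

Note that this does not enforce the minimum token length of 5 from the question; it only avoids the short "S." token by recognising the abbreviation shape.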
Some links:
Online tester, jump to JAVA
Explain tool (Not JAVA based)
THE regex tutorial
Java specific regex tutorial
SO regex chatroom
Some advanced nice regex-fu on SO
How does this regex find triangular numbers?
How can we match a^n b^n?
How does this Java regex detect palindromes?
How to determine if a number is a prime with regex?
"vertical" regex matching in an ASCII "image"
Can the for loop be eliminated from this piece of PHP code? ^-- See regex solution, although not sure if applicable in JAVA
What about the following regular expression?
(?<=[.!?])(?!\w{1,5})(?<!\d)(?!\d)
e.g.
import java.util.Arrays;
import java.util.regex.Pattern;
public class SplitDemo {
    private static final Pattern REGEX_PATTERN =
            Pattern.compile("(?<=[.!?])(?!\\w{1,5})(?<!\\d)(?!\\d)");
    public static void main(String[] args) {
        String input = "Hello World! This answer worth $1.45 in U.S. dollar. Thank you.";
        // The split points are zero-width, so each later token keeps its leading space,
        // and "U.S." still ends a token.
        System.out.println(Arrays.toString(REGEX_PATTERN.split(input)));
        // prints "[Hello World!,  This answer worth $1.45 in U.S.,  dollar.,  Thank you.]"
    }
}

How do I match regular expressions with strings?

In Java, what regular expression will match the following string (please note that the | is part of the string)?
|---0-1-2-5-3-||-3-----------|---------1-0-|-0-----------|
As stated in the comments above by Eyal Schneider and Naveed S:
if you need to match that exact string, use
str.equals("|---0-1-2-5-3-||-3-----------|---------1-0-|-0-----------|")
if you need to match a certain pattern within the string, use RegEx.
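The two options above might look like this in code. The pattern is only a guess at the general shape of such a line (runs of dashes and digits separated by one or more pipes); the question does not say which variations should be accepted, and the names are hypothetical.

```java
import java.util.regex.Pattern;

class TabLineMatch {
    static final String LINE =
            "|---0-1-2-5-3-||-3-----------|---------1-0-|-0-----------|";

    // Assumed shape: a leading pipe, then one or more groups of
    // dashes/digits, each closed by one or more pipes.
    static final Pattern SHAPE = Pattern.compile("\\|(?:[-\\d]+\\|+)+");

    // Exact comparison: no regex needed.
    static boolean isExact(String s) {
        return LINE.equals(s);
    }

    // Pattern comparison: accepts any line of the assumed shape.
    static boolean hasShape(String s) {
        return SHAPE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isExact(LINE));         // true
        System.out.println(hasShape(LINE));        // true
        System.out.println(isExact("|--0--|"));    // false
        System.out.println(hasShape("|--0--|"));   // true
        System.out.println(hasShape("|-1-|-x-|")); // false
    }
}
```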
A Tutorial on Java Regex
A tester where you can test your Regular Expressions

Regex: Match a string between two tags in a string

I am new to regex. I am stuck writing a regex for the scenario below. Can someone please help me solve this?
If I have a String like the following:
<Tag1 attr="test"/>
<Tag2>
<Tag4 attr="test"/>
<Tag5 attr="test"/>
</Tag2>
<Tag3 attr="test"/>
What's the regex to match 'test' between the <Tag2> and </Tag2> tags?
The output should match 'test' in both Tag4 and Tag5.
Any help would be highly appreciated.
Why are you using a regex for this? I am not familiar with the Java libraries, but I would imagine there is a library that would allow you to do XQueries using XPaths. That would be the simpler approach.
Here is a website that shows examples
Here is a SO question on XPath in Java
XPath is really more appropriate for this. This looks like duplicate post. Original
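For reference, a minimal sketch of the XPath route using only the JDK's built-in javax.xml APIs. The snippet in the question has several top-level elements, so it is wrapped in an artificial <root> element here to make it well-formed; that wrapper, the class name, and the method name are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathAttrDemo {
    // Returns the attr values of all direct children of <Tag2>.
    static List<String> attrsInsideTag2(String fragment) {
        try {
            // Wrap the fragment so the parser sees a single root element.
            String xml = "<root>" + fragment + "</root>";
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate("/root/Tag2/*/@attr", doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse or query the XML", e);
        }
    }

    public static void main(String[] args) {
        String fragment = "<Tag1 attr=\"test\"/><Tag2><Tag4 attr=\"test_one\"/>"
                + "<Tag5 attr=\"test_two\"/></Tag2><Tag3 attr=\"test\"/>";
        System.out.println(attrsInsideTag2(fragment)); // [test_one, test_two]
    }
}
```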
Perl has a couple of good xpath parsers on CPAN. But here's a good page on multiline regex parsing if you absolutely must use it.
Everything said before is totally true. However, if you still want to practice some regex, here's an alternative:
Doing it in one match is not possible since one of the inner groups will always be discarded (see this) , so you'll have to extract the inner passage first.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTagParse {
    static String html = "<Tag1 attr=\"test\"/><Tag2> <Tag4 attr=\"test_one\"/> <Tag5 attr=\"test_two\"/></Tag2><Tag3 attr=\"test\"/>";
    public static void main(String[] args) {
        // Step 1: isolate everything between <Tag2> and </Tag2> (DOTALL lets '.' cross line breaks).
        Matcher mat1 = Pattern.compile("<Tag2>(.*?)</Tag2>", Pattern.DOTALL).matcher(html);
        if (!mat1.find()) {
            return; // no <Tag2> element in the input
        }
        // Step 2: pull every attr value out of the tags inside that passage.
        Matcher mat2 = Pattern.compile("<[^<>]*attr=\"([^\"]+)\"[^<>]*>").matcher(mat1.group(1));
        while (mat2.find()) {
            System.out.println(mat2.group(1)); // prints test_one, then test_two
        }
    }
}
Anyway, you'd be much better off using XPath :)
I'm not in practice with java, but I can offer some guidance to the regular expression, I hope. If you know what the specific attribute and value is that you're looking for, you can use something like the following:
Pattern pattern = Pattern.compile("<tag[45][^>]*attr\\s*=\\s*[\"']test['\"][^>]*>", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher("<Tag1 attr='test'/><Tag2><Tag4 attr='test'/><Tag5 attr='test'/></Tag2><Tag3 attr='test'/>");
while (matcher.find()) { System.out.println(matcher.group()); } // find() scans for each match; matches() would require the whole input to match
the regex is made up of the following components:
match the literal string: <tag
followed by either a 4 or a 5 (the [45] designation)
followed by any number of characters other than > preceding the literal string: attr
followed by any number of spaces
followed by the literal character: =
followed by any number of spaces
followed by either the ' or " character
followed by the string literal: test
followed by either the ' or " character
followed by any character that is not >
followed by >
The point of adding some of these extra bits is simply to highlight that you may need or want to account for different coding styles, etc. Note: I took the easy way out by setting the pattern as case-insensitive, but you can omit that and change your expression to check for the appropriate case (for example, if your attribute value is case-sensitive, you can change the 'tag' literal to [tT][aA][gG] so that only the tag name is matched case-insensitively).
I'm apparently too slow to type, since jvataman has already answered your question, but perhaps there is some value in my writeup, so I'll post anyway.
