Regex for all Symbols except for | - java

Is there a way to use the pattern \p{Punct} in a Java regex while excluding the | symbol?
I tried \\p{Punct}&&[^|], but it didn't work.

What you have is nearly correct. The && intersection operator only works inside a character class, so the whole expression must be wrapped in brackets:
[\\p{Punct}&&[^|]]

Did you see this question: Punctuation Regex in Java?
I think you can modify that answer slightly to use (?![|])\\p{Punct}

Have you tried this class: [^\\P{Punct}|] (note the uppercase P)?
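The three suggestions above can be compared side by side. This is a minimal sketch; the class name PunctExceptPipe, the strip helper and the sample string are made up for illustration and are not from the original answers.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class PunctExceptPipe {
    // Three ways to say "any \p{Punct} character except |".
    static final String INTERSECTION = "[\\p{Punct}&&[^|]]";   // class intersection
    static final String LOOKAHEAD = "(?![|])\\p{Punct}";       // negative lookahead
    static final String DOUBLE_NEGATION = "[^\\P{Punct}|]";    // not (non-punctuation or |)

    // Removes every character matched by the given regex.
    static String strip(String regex, String input) {
        return Pattern.compile(regex).matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String sample = "a|b!c,d|e";
        for (String regex : new String[] {INTERSECTION, LOOKAHEAD, DOUBLE_NEGATION}) {
            // Each line should show the pipes kept and the other punctuation removed.
            System.out.println(regex + " -> " + strip(regex, sample));
        }
    }
}
```

All three patterns should leave the | characters in place and drop the ! and the comma, so which one to use is mostly a matter of readability.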

Related

why does this regex not work?

I am trying to match a string with a Java regex and I cannot succeed. I'm pretty new to Java, and with most of my experience being Linux-based regex, I've had no success. Can someone help me?
Below is the code that I'm using.
The regex is-
//vod//final\_\d{0,99}.\d{0,99}\\-Frag\d{0,99}
The line that I'm trying to match is
/vod/final_1.3Seg1-Frag1
where I want 1.3, 1 and 1 to be wildcarded.
Someone please help me out... :(
You are missing the Seg1 part. Also, you are escaping characters that do not need to be escaped. Try this regexp: /vod/final_\\d+\\.\\d+Seg1-Frag\\d+
This should work:
Pattern p = Pattern.compile( "/vod/final_\\d+\\.\\d+Seg\\d+-Frag\\d+" );
Notes: To protect special characters, you can use Pattern.quote()
When running into problems like this, start with a simple text and pattern and build from there. I.e. first try to match /, then /vod/, then /vod/final_1, etc.
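As a quick check of the pattern above, here is a small sketch. The class name VodPathMatch, the isVodPath helper and the second test string are assumptions added for illustration; only the pattern and the first input come from the thread.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class VodPathMatch {
    // Pattern from the answer above: only the literal dot needs escaping.
    static final Pattern VOD =
            Pattern.compile("/vod/final_\\d+\\.\\d+Seg\\d+-Frag\\d+");

    // Same pattern, with the fixed prefix protected by Pattern.quote().
    static final Pattern VOD_QUOTED =
            Pattern.compile(Pattern.quote("/vod/final_") + "\\d+\\.\\d+Seg\\d+-Frag\\d+");

    // True only if the whole line matches the pattern.
    static boolean isVodPath(String line) {
        return VOD.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(isVodPath("/vod/final_1.3Seg1-Frag1")); // true
        System.out.println(isVodPath("/vod/final_1x3Seg1-Frag1")); // false: the dot is literal
    }
}
```

Pattern.quote() is not strictly needed for this prefix, since it contains no regex metacharacters, but it shows how a fixed piece of text can be combined with a pattern safely.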
You're escaping too much. Don't escape /, _, -.
Something like:
/vod/final_\d{0,99}\.\d{0,99}Seg\d{0,99}-Frag\d{0,99}
Does this work?
/\/vod\/final\_\d{0,99}.\d{0,99}Seg\d-Frag\d{0,99}
Also, here's what I used to edit the regex you provided above: http://rubular.com/
It says it's for Ruby, but it also mentions that it works for Java.

How do I write this regex in Java?

Basically, for this regex
{(\(\(("\w{1,}",{0,1}){2}\),\(("[^:=;#"\)\(\{\}\[\]]{1,}",{0,1}){2}"[LR]{1}"\)\),{0,1}){1,}}
Which I've tested on Regexpal for this input:
{(("st0","sy0"),("st1","sy3","L")),(("st0","sy0"),("st1","^","L"))}
I now need this in Java. I can't seem to figure out how to convert it. Can somebody show me how?
You need to escape the special chars - specifically the backslashes and the quote marks.
The regular expression should work in Java; the main thing you have to do is escape the backslashes (and the double quotes) when writing it as a string literal.
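A possible conversion is sketched below. Every backslash is doubled and every double quote is escaped for the Java string literal. In addition, this sketch escapes the outer { and }, because java.util.regex tends to reject an unescaped leading { as an illegal repetition; that extra step is an addition to the answers above. The class name TransitionRegex and the isValid helper are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class TransitionRegex {
    // The regex from the question as a Java string literal.
    // Outer braces are escaped as \\{ and \\} (an assumption of this sketch).
    static final Pattern TRANSITIONS = Pattern.compile(
            "\\{(\\(\\((\"\\w{1,}\",{0,1}){2}\\),"
            + "\\((\"[^:=;#\"\\)\\(\\{\\}\\[\\]]{1,}\",{0,1}){2}\"[LR]{1}\"\\)\\),{0,1}){1,}\\}");

    // True only if the whole input matches the pattern.
    static boolean isValid(String input) {
        return TRANSITIONS.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Input from the question, with the quotes escaped for Java.
        String input =
                "{((\"st0\",\"sy0\"),(\"st1\",\"sy3\",\"L\")),"
                + "((\"st0\",\"sy0\"),(\"st1\",\"^\",\"L\"))}";
        System.out.println(isValid(input));
    }
}
```

Splitting the literal across two lines is only for readability; the concatenated string is the same regex as in the question apart from the escaped outer braces.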

Simple Java regex not working

I have this regex which is supposed to remove sentence delimiters (. and ?):
sentence = sentence.replaceAll("\\.|\\?$","");
It works fine; it converts
"I am Java developer." to "I am Java developer"
"Am I a Java developer?" to "Am I a Java developer"
But after deployment we found that it also replaces any other dots in the sentence as
"Hi.Am I a Java developer?" becomes "HiAm I a Java developer"
Why is this happening?
The pipe (|) has the lowest precedence of all operators. So your regex:
\\.|\\?$
is being treated as:
(\\.)|(\\?$)
which matches a . anywhere in the string and matches a ? at the end of the string.
To fix this you need to group the . and ? together as:
(?:\\.|\\?)$
You could also use:
[.?]$
Within a character class . and ? are treated literally so you need not escape them.
What you're saying with "\\.|\\?$" is "either a period anywhere, or a question mark as the last character".
I would recommend "[.?]$" instead in order to avoid the confusing escaping (and undesirable result, of course).
Your problem is caused by the low precedence of the alternation operator |. Your regular expression means match one of:
. anywhere or
? at the end of a line.
Use a character class instead:
"[.?]$"
You have forgotten to enclose the sentence-ending characters in parentheses:
sentence = sentence.replaceAll("(\\.|\\?)$","");
The better approach is to use [.?]$ like @Mark Byers suggested.
sentence = sentence.replaceAll("[.?]$","");
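The difference between the two patterns can be reproduced with the sentences from the question. This is a minimal sketch; the class name TrailingPunctuation and the method names buggy and fixed are made up for illustration.

```java
// Hypothetical demo class; the names here are illustrative only.
class TrailingPunctuation {
    // Original pattern: parsed as (\.)|(\?$), so every dot is removed.
    static String buggy(String sentence) {
        return sentence.replaceAll("\\.|\\?$", "");
    }

    // Character class anchored to the end: only a trailing . or ? is removed.
    static String fixed(String sentence) {
        return sentence.replaceAll("[.?]$", "");
    }

    public static void main(String[] args) {
        String s = "Hi.Am I a Java developer?";
        System.out.println(buggy(s)); // HiAm I a Java developer
        System.out.println(fixed(s)); // Hi.Am I a Java developer
    }
}
```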

How can I include a minus sign in this regex?

I'm trying to match the following three lines:
usemtl ftw
kd 1.2 3.2 3.1
v  -12.1892 -53.4267 -276.4055
My regex matches the first two:
^(\w+) ((\S+)( \S+)*) *$
I've tried a few variants to match the negative numbers, but they just stop anything from being matched:
^(\w+) (([\S-]+)( [\S-]+)*) *$
^(\w+) (((\S|-)+)( (\S|-)+)*) *$
What am I supposed to do here? - isn't a special character in regex, is it?
- is only a special character in character classes [...]
Your problem comes from v -12.1892 -53.4267 -276.4055 containing 2 spaces in between v and -12.18.... Your regex only matches one.
Try this regex instead:
^(\w+)\s*((\S+)( \S+)*) *$
Although your regex could be simplified to (not sure exactly what you want to match and capture though):
^(\w+)(\s*\S+)*$
See it on http://rubular.com/r/D86njdYzJF
Put it first in the class: [-\S]
Then it should work.
There are two spaces between v and -12.1892; that seems to be your problem. Also, to use a literal - inside a character class [...], escape it as \- (or place it first or last in the class).
The reason why it isn't matching is because your third line has two spaces between the v and -12.1892. Try this:
^(\w+) +(([\S]+)( [\S]+)*) *$ (the added + sign allows for multiple spaces)
Here is the jsfiddle to test it: http://jsfiddle.net/xewys/
The most basic regex I could think of to match your sample data was "(\S+\s+)+", but that might not be suitable for you - it seems too generic.
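The diagnosis shared by several answers (two spaces after v, not the minus sign) can be checked with a short sketch. The class name ObjLineMatch is made up for illustration, and the two-space test line is written out explicitly because the spacing is what matters here.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class ObjLineMatch {
    // Pattern from the question: exactly one space after the keyword.
    static final Pattern ORIGINAL = Pattern.compile("^(\\w+) ((\\S+)( \\S+)*) *$");

    // Variant from the answers: one or more spaces after the keyword.
    static final Pattern MULTI_SPACE = Pattern.compile("^(\\w+) +((\\S+)( \\S+)*) *$");

    public static void main(String[] args) {
        String twoSpaces = "v  -12.1892 -53.4267 -276.4055";
        String oneSpace = "v -12.1892 -53.4267 -276.4055";

        // \S already matches '-', so the minus sign is not the obstacle.
        System.out.println("-".matches("\\S"));                      // true
        System.out.println(ORIGINAL.matcher(oneSpace).matches());    // true
        System.out.println(ORIGINAL.matcher(twoSpaces).matches());   // false
        System.out.println(MULTI_SPACE.matcher(twoSpaces).matches()); // true
    }
}
```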

Refactor Regex Pattern - Java

I have the string aaaa_bb_cc to match and have written a regex pattern like
\\w{4}+\\_\\w{2}\\_\\w{2} and it works. Is there a simpler regex that does the same?
You don't need to escape the underscores:
\w{4}+_\w{2}_\w{2}
And you can collapse the last two parts, if you don't capture them anyway:
\w{4}+(?:_\w{2}){2}
Doesn't get shorter, though.
(Note: Re-add the needed backslashes for Java's strings, if you like; I prefer to omit them while talking about regular expressions :))
I sometimes do what I call "meta-regexing" as follows:
String pattern = "x{4}_x{2}_x{2}".replace("x", "[a-z]");
System.out.println(pattern); // prints "[a-z]{4}_[a-z]{2}_[a-z]{2}"
Note that this doesn't use \w, which can match an underscore. That is, your original pattern would match "__________".
If x really needs to be replaced with [a-zA-Z0-9], then just do it in the one place (instead of 3 places).
Other examples
Regex for metamap in Java
How do I convert CamelCase into human-readable names in Java?
Yes, you can use just \\w{4}_\\w{2}_\\w{2} or maybe \\w{4}(_\\w{2}){2}.
It looks like your \w does not need to match an underscore, so you can use [a-zA-Z0-9] instead:
[a-zA-Z0-9]{4}_[a-zA-Z0-9]{2}_[a-zA-Z0-9]{2}
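The point about \w also matching underscores is easy to verify. The sketch below compares the variants from the answers; the class name UnderscorePattern and the constant names are made up for illustration.

```java
// Hypothetical demo class; the names here are illustrative only.
class UnderscorePattern {
    static final String WITH_W = "\\w{4}_\\w{2}_\\w{2}";
    static final String COLLAPSED = "\\w{4}(?:_\\w{2}){2}";
    // "Meta-regex" trick from the answer above, with the stricter class.
    static final String ALNUM = "x{4}_x{2}_x{2}".replace("x", "[a-zA-Z0-9]");

    public static void main(String[] args) {
        String wanted = "aaaa_bb_cc";
        // Ten underscores, laid out as 4 + 1 + 2 + 1 + 2.
        String allUnderscores = "____" + "_" + "__" + "_" + "__";

        System.out.println(wanted.matches(WITH_W));            // true
        System.out.println(wanted.matches(COLLAPSED));         // true
        System.out.println(wanted.matches(ALNUM));             // true
        System.out.println(allUnderscores.matches(WITH_W));    // true: \w matches '_'
        System.out.println(allUnderscores.matches(ALNUM));     // false
    }
}
```

If strings made only of underscores must be rejected, the [a-zA-Z0-9] variant is the safer choice; otherwise the shorter \w form is fine.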
