Negating a Regular Expression for string replacement

I have the following code that removes the email address from a String in Java:
addressStr.replaceFirst("([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})", "")
So a string such as John Smith <john@smith.com> becomes John Smith <>. How do I negate it so that it instead removes everything that doesn't match the email address, leaving just john@smith.com as the final result?
I tried putting ^ and ?<= at the front, but that doesn't work.

Well, it's not the regex you need to change but the calling code. Your regex matches the e-mail address (in a weird way), and replaceFirst() removes it from the string.
So just use
Pattern regex = Pattern.compile("([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})");
Matcher regexMatcher = regex.matcher(addressStr);
if (regexMatcher.find()) {
    address = regexMatcher.group();
}
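A self-contained sketch of this find-and-extract approach follows. It assumes a deliberately simplified address pattern and the usual @ separator; the names ExtractEmail, SIMPLE_EMAIL and firstEmail are illustrative and not from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractEmail {
    // Simplified pattern for illustration only; it is not a full address validator.
    static final Pattern SIMPLE_EMAIL =
            Pattern.compile("[A-Za-z0-9_.+-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+");

    // Returns the first address found in the input, or null if there is none.
    static String firstEmail(String input) {
        Matcher m = SIMPLE_EMAIL.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstEmail("John Smith <john@smith.com>")); // john@smith.com
    }
}
```

Because the match itself is returned, nothing has to be "negated": whatever surrounds the address is simply ignored.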

A more complete Java regex for matching e-mail addresses, based on RFC 2822, would be as follows:
"(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])"
Take a look at https://www.rfc-editor.org/rfc/rfc2822#section-3.4.1 for more info on this.
It is a bit complicated, but it covers the address formats described there (yours does not allow addresses like bob+bib@gmail.com, which are valid).
For your problem, as stated multiple times, just use find (borrowing Tim Pietzcker's piece of code):
Pattern regex = Pattern.compile("(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])");
Matcher regexMatcher = regex.matcher(addressStr);
foundMatch = regexMatcher.find();

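You can try:
Matcher m = Pattern.compile(regexp).matcher(addressStr);
String mailId = m.find() ? m.group() : null;
The idea here is to get the matched string rather than trying to replace everything else with an empty string. Note that find() has to succeed before group() is called, and that Pattern.LITERAL must not be used, because it would treat the regex as plain text. You can extract the compiled Pattern into a field if this operation is repeated.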

Just don't replace.... use match(es) instead.

Related

Regular expression for hgsv notation in java

HGSV nomenclature has a pattern:
xxxxx.yyyy:charactersnumbercharacters
I would like to write a regex in Java that fetches all the tokens from the above. For example,
it should yield 5 tokens:
{ 'xxxxx', 'yyyy', 'characters', 'number' , 'characters'}
I have used simple splitting to fetch the tokens, but I don't think it is an optimal solution.
My current code is:
String hgsv = "BRAF.p:V600E";
String[] tokens = hgsv.split("\\."); // split() takes a regex, so the dot must be escaped
this.symbol = tokens[0];
String type = tokens[1].split(":")[0];
I would like to use Pattern and Matcher in Java, but I have no idea how to write a regex for the above tokens.
Any clue how to do that?
(I will need a regex anyway to separate the characters, number, and characters, so why not use a regex for the entire string?)
I found a link, but it is for Python; I need something similar in Java.

I think what you're probably looking for is to use capture groups, like this:
String s = "BRAF.p:V600E";
Pattern p = Pattern.compile("(\\w+)\\.(\\w+):([a-zA-Z]+)(\\d+)([a-zA-Z]+)");
Matcher m = p.matcher(s);
if (m.matches()) {
    String[] parts = {m.group(1),
                      m.group(2),
                      m.group(3),
                      m.group(4),
                      m.group(5)};
    // Prints "[BRAF, p, V, 600, E]"
    System.out.println(Arrays.toString(parts));
} else {
    // The input String is invalid.
}
That's really just a lot like a split, but it's more stable because you're using the pattern to validate the String beforehand.
Note that I have no idea if that is the exact right pattern that you should be using. I don't know the exact details of the HGSV notation you're talking about and your description is actually pretty vague. (What are e.g. xxxxx and yyyy? What are "characters"?) If you link me to some sort of specification or detailed description of this notation I can try to write a regex that's more definitely correct.
Anyhow, my example shows the basic idea. You might also see http://www.regular-expressions.info/brackets.html for more information.
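If numbered groups become hard to keep track of, named groups are one alternative. The following is only a sketch: the group names (symbol, type, ref, pos, alt) and the class name HgsvTokens are my own labels, not taken from any specification of the notation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HgsvTokens {
    // Same pattern as above; the group names are hypothetical labels for readability.
    static final Pattern HGSV = Pattern.compile(
            "(?<symbol>\\w+)\\.(?<type>\\w+):(?<ref>[a-zA-Z]+)(?<pos>\\d+)(?<alt>[a-zA-Z]+)");

    // Returns the five tokens, or null when the input does not fit the pattern.
    static String[] tokens(String input) {
        Matcher m = HGSV.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] {
            m.group("symbol"), m.group("type"), m.group("ref"), m.group("pos"), m.group("alt")
        };
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", tokens("BRAF.p:V600E"))); // BRAF, p, V, 600, E
    }
}
```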

Java String RegularExpressions

Team,
I have a task: I want to check whether a percentage such as 98% appears in a block of data.
I am trying to write a regex for it, but it keeps failing.
String str="OAM-2 OMFUL abmasc01 and prdrot01 98% users NB in host nus918pe locked.";
if(str.matches("[0-9][0-9]%"))
but it returns false.
A response would be truly appreciated.

Use the pattern/matcher/find method. matches applies the regex to the whole string.
Pattern pattern = Pattern.compile("[0-9]{2}%");
String test = "OAM-2 OMFUL abmasc01 and prdrot01 98% users NB in host nus918pe locked.";
Matcher matcher = pattern.matcher(test);
if(matcher.find()) {
System.out.println("Matched!");
}

Try:
str.matches(".*[0-9][0-9]%.*")
or (\d = digit):
str.matches(".*\\d\\d%.*")
The matching pattern should also match the characters that come before/after the 98% which is why you should add the .*
Comment:
You can use Pattern matcher like the others suggested, it's especially effective if you want to extract 98% out of the string - but if you're just looking to find if there's a match - I find .matches() to be simpler to use.

str.matches("[0-9][0-9]%") actually applies this regex ^[0-9][0-9]%$, which is anchored at start and end. Others have described solutions to this already.
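A small sketch that contrasts the two calls on the same pattern (the class and method names are illustrative):

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    static final Pattern PERCENT = Pattern.compile("[0-9][0-9]%");

    // True only if the entire input is two digits followed by '%'.
    static boolean wholeStringMatches(String input) {
        return PERCENT.matcher(input).matches();
    }

    // True if two digits followed by '%' occur anywhere in the input.
    static boolean containsMatch(String input) {
        return PERCENT.matcher(input).find();
    }

    public static void main(String[] args) {
        String str = "OAM-2 OMFUL abmasc01 and prdrot01 98% users NB in host nus918pe locked.";
        System.out.println(wholeStringMatches(str)); // false
        System.out.println(containsMatch(str));      // true
    }
}
```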

You can try the regex \d{1,2}(\.\d{0,2})?% which will match 98% as well as percentages with decimal values such as 98.56%.
Pattern pattern = Pattern.compile("\\d{1,2}(\\.\\d{0,2})?%");
String yourString= "OAM-2 OMFUL abmasc01 and prdrot01 98% users NB in host nus918pe locked.";
Matcher matcher = pattern.matcher(yourString);
while(matcher.find()) {
System.out.println(matcher.group());
}

How can I get all content between two pipes using regular expression

I have a String say
String s = "|India| vs Aus";
In this case result should be only India.
Second case :
String s = "Aus vs |India|";
In this case result should be only India.
Third case:
String s = "|India| vs |Aus|";
The result should contain only India and Aus; vs should not be present in the output.
In these scenarios any other word can appear in place of vs, e.g. the String can also be |India| in |Aus|, or |India| and |Sri Lanka| in |Aus|. I want the words that appear between two pipes, such as India, Sri Lanka, and Aus.
I want to do it in Java.
Any pointers would be helpful.

You would use a regex like...
\|[^|]+\|
...or...
\|.+?\|
You must escape the pipe because an unescaped pipe is the alternation (OR) operator in a regex.
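In Java source the backslash itself has to be doubled inside the string literal. A minimal sketch using the first pattern, with a capture group added so the pipes are left out of the result (the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BetweenPipes {
    // \|([^|]+)\| written as a Java string literal; group 1 is the text between the pipes.
    static final Pattern PIPED = Pattern.compile("\\|([^|]+)\\|");

    // Collects every piece of text that is enclosed in a pair of pipes.
    static List<String> extract(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PIPED.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("|India| and |Sri Lanka| in |Aus|")); // [India, Sri Lanka, Aus]
    }
}
```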

You are looking at something similar to this:
String s = "|India| vs |Aus|";
Pattern p = Pattern.compile("\\|(.*?)\\|");
Matcher m = p.matcher(s);
while(m.find()){
System.out.println(m.group(1));
}
You need to use the group to get the contents inside the parentheses in the regexp.

Getting value of $1 from matcher.replaceAll()

In my application I need to get the link and break it if it is longer than, for example, 10 characters.
The problem is that if I send the whole text, for example "this is my website www.stackoverflow.com", directly to this matcher
Pattern patt = Pattern.compile("(?i)\\b((?:https?://|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:\'\".,<>?«»“”‘’]))");
Matcher matcher = patt.matcher(text);
matcher.replaceAll("$1");
it shows the whole website, without breaking it.
What I am trying to do is get the value of $1, so that I can break the second occurrence while keeping the first one intact.
I already have another method that breaks the string up.
UPDATE
What I want to get is only the website, so that I can break it afterwards. It would help me a lot.

You can't use replaceAll; you should iterate through the matches and process each one individually. Java's Matcher already has an API for this:
// expanding on the example in the 'appendReplacement' JavaDoc:
Pattern p = Pattern.compile("..."); // your URL regexp
Matcher m = p.matcher(text);
StringBuffer sb = new StringBuffer();
while (m.find()) {
String truncatedURL = m.group(1).replaceFirst("^(.{10}).*","$1..."); // i iz smrt
m.appendReplacement(sb,
"<a href=\"http://$1\" target=\"_blank\">"); // simple replacement for $1
sb.append(truncatedURL);
sb.append("</a>");
}
m.appendTail(sb);
System.out.println(sb.toString());
(For performance, you should factor out compiled Patterns for the replace* calls inside the loop.)
Edit: use sb.append() so that you don't have to worry about escaping $ and \ in 'truncatedURL'.
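A compact, self-contained variation on the same loop is sketched below. It assumes a deliberately naive URL pattern (not the one from the question) and uses Matcher.quoteReplacement as another way to keep $ and \ in the URL from being interpreted; ShortenLinks and shorten are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ShortenLinks {
    // Deliberately naive URL pattern, used only to keep the sketch short.
    static final Pattern URL = Pattern.compile("\\b(?:https?://|www\\.)\\S+");

    // Replaces every URL longer than maxLen with its first maxLen characters plus "...".
    static String shorten(String text, int maxLen) {
        Matcher m = URL.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String url = m.group();
            String shown = url.length() > maxLen ? url.substring(0, maxLen) + "..." : url;
            // quoteReplacement stops '$' and '\' in the URL from being read as group references.
            m.appendReplacement(sb, Matcher.quoteReplacement(shown));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(shorten("this is my website www.stackoverflow.com", 10));
        // this is my website www.stacko...
    }
}
```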

I think that you have a similar problem to the one mentioned on this question
Java : replacing text URL with clickable HTML link
they suggested something like this (Java spells the POSIX classes as \p{Space} and \p{Alnum}):
String basicUrlRegex = "(.*://[^<>\\p{Space}]+[\\p{Alnum}/])";
myString.replaceAll(basicUrlRegex, "$1");

java email extraction regular expression?

I would like a regular expression that will extract email addresses from a String (using Java regular expressions).
That really works.

Here's the regular expression that really works.
I've spent an hour surfing on the web and testing different approaches,
and most of them didn't work although Google top-ranked those pages.
I want to share with you a working regular expression:
[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})
Here's the original link:
http://www.mkyong.com/regular-expressions/how-to-validate-email-address-with-regular-expression/

I had to add some dashes to the domain part to allow for them. So the final result in Java:
final String MAIL_REGEX = "([_A-Za-z0-9-]+)(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})";
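To pull every address out of a longer text with a pattern like this, one option is a find() loop. This is a rough sketch: it assumes the separator is the usual @, and AllEmails and extract are names made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllEmails {
    // Pattern from the answer above, written with '@' as the separator (an assumption).
    static final Pattern MAIL = Pattern.compile(
            "([_A-Za-z0-9-]+)(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})");

    // Returns every substring of the text that the pattern matches, in order of appearance.
    static List<String> extract(String text) {
        List<String> emails = new ArrayList<>();
        Matcher m = MAIL.matcher(text);
        while (m.find()) {
            emails.add(m.group());
        }
        return emails;
    }

    public static void main(String[] args) {
        System.out.println(extract("Contact bob@example.com or alice.smith@mail.example.org today."));
        // [bob@example.com, alice.smith@mail.example.org]
    }
}
```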

Install this regex tester plugin into Eclipse, and you'll have a whale of a time testing regexes:
http://brosinski.com/regex/.
Points to note:
In the plugin, use only one backslash for a character escape. But when you transcribe the regex into a Java/C# string you have to double the backslashes, because two levels of escaping apply: first the Java/C# string literal, and then the regex engine itself.
Surround the sections of the regex whose text you wish to capture with parentheses (round brackets). Then you can use the group functions in Java or C# regex to find out the values of those sections.
([_A-Za-z0-9-]+)(\.[_A-Za-z0-9-]+)@([A-Za-z0-9]+)(\.[A-Za-z0-9]+)
For example, using the above regex, the following string
abc.efg@asdf.cde
yields
start=0, end=16
Group(0) = abc.efg@asdf.cde
Group(1) = abc
Group(2) = .efg
Group(3) = asdf
Group(4) = .cde
Group 0 is always the whole matched text.
If you do not enclose any section in parentheses, you will only be able to detect a match, not capture parts of the text.
It might be less confusing to create a few regexes rather than one long catch-all regex, since you can test them one by one programmatically and then decide which regexes should be consolidated. This helps especially when you find a new email pattern that you had never considered before.
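The group listing above can be reproduced in plain Java with a few lines. This is a sketch under the assumption that the separator is @; GroupDemo and describe are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // The regex from above, with each backslash doubled for the Java string literal.
    static final Pattern P = Pattern.compile(
            "([_A-Za-z0-9-]+)(\\.[_A-Za-z0-9-]+)@([A-Za-z0-9]+)(\\.[A-Za-z0-9]+)");

    // Builds a listing of the match position and every group of the first match.
    static String describe(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        sb.append("start=").append(m.start()).append(", end=").append(m.end()).append('\n');
        for (int i = 0; i <= m.groupCount(); i++) {
            sb.append("Group(").append(i).append(") = ").append(m.group(i)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe("abc.efg@asdf.cde"));
    }
}
```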

A little late, but OK.
Here is what I use. Just paste it into the Firebug console and run it. Then look on the web page for a textarea (most likely at the bottom of the page) that will contain a comma-separated list of all email addresses found in A tags.
var jquery = document.createElement('script');
jquery.setAttribute('src', 'http://code.jquery.com/jquery-1.10.1.min.js');
document.body.appendChild(jquery);
var list = document.createElement('textarea');
list.setAttribute('id', 'emaillist'); // the id is what $("#emaillist") looks up below
document.body.appendChild(list);
var lijst = "";
$("#emaillist").val("");
$("a").each(function(idx,el){
    var mail = $(el).filter('[href*="@"]').attr("href");
    if(mail){
        lijst += mail.replace("mailto:", "")+",";
    }
});
$("#emaillist").val(lijst);

Android's built-in email address pattern (android.util.Patterns.EMAIL_ADDRESS) works well; note that it is part of the Android SDK, not of the standard Java library:
public static List<String> getEmails(@NonNull String input) {
    List<String> emails = new ArrayList<>();
    Matcher matcher = Patterns.EMAIL_ADDRESS.matcher(input);
    while (matcher.find()) {
        int matchStart = matcher.start(0);
        int matchEnd = matcher.end(0);
        emails.add(input.substring(matchStart, matchEnd));
    }
    return emails;
}
