How can I get all content between two pipes using regular expression

I have a String, say:
String s = "|India| vs Aus";
In this case the result should be only India.
Second case:
String s = "Aus vs |India|";
In this case the result should also be only India.
Third case:
String s = "|India| vs |Aus|";
The result should contain only India and Aus; vs should not be present in the output.
In these scenarios any other word can appear in place of vs. For example, the String can be |India| in |Aus|, or |India| and |Sri Lanka| in |Aus|. I want the words that appear between two pipes: India, Sri Lanka, Aus.
I want to do it in Java.
Any pointers will be helpful.

You would use a regex like...
\|[^|]+\|
...or...
\|.+?\|
You must escape the pipe because an unescaped pipe is the alternation ("or") operator in a regex.

You are looking at something similar to this:
String s = "|India| vs |Aus|";
Pattern p = Pattern.compile("\\|(.*?)\\|");
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.println(m.group(1));
}
You need to use group(1) to get the contents matched inside the parentheses in the regexp.
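As a fuller illustration of the two answers above, here is a minimal sketch that collects every pipe-delimited value into a list. The class name PipeExtractor and the method name extract are made up for this example; the pattern is the negated-character-class variant with a capturing group added.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answers.
class PipeExtractor {
    // \| is a literal pipe; ([^|]+) captures one or more non-pipe characters.
    private static final Pattern BETWEEN_PIPES = Pattern.compile("\\|([^|]+)\\|");

    static List<String> extract(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = BETWEEN_PIPES.matcher(input);
        while (m.find()) {
            result.add(m.group(1)); // group 1 is the text between the pipes
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("|India| vs Aus"));                   // [India]
        System.out.println(extract("|India| and |Sri Lanka| in |Aus|")); // [India, Sri Lanka, Aus]
    }
}
```

Because each match consumes its closing pipe, the text between two delimited words (such as vs) is never reported as a match.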

Related

How to get the middle strings with regex?

I have an input string that looks like this
DatalogSetupFile: BTS50xx1EJA\3.20\log_all.stp
The DatalogSetupFile: and \3.20\log_all.stp are constant. I wish to extract BTS50xx1EJA from the string. How should I do it?

You can write a regex that spells out the static parts literally and wraps the dynamic part in a capturing group, so that the dynamic content can be read back as a single group.
You can define the regex as follows (the literal backslashes and dots have to be escaped):
^(?:DatalogSetupFile:\s)(.*)(?:\\3\.20\\log_all\.stp)$
The first capturing group then contains your dynamic string.
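A minimal Java sketch of this capture-group approach follows. It uses Pattern.quote for the constant prefix and suffix so that the backslashes and dots in them need no manual escaping; the class name MiddleExtractor and the method name extractMiddle are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class illustrating the capture-group approach.
class MiddleExtractor {
    // Pattern.quote turns the constant parts into literals,
    // so the backslashes and dots in them are not treated as regex syntax.
    private static final Pattern LINE = Pattern.compile(
            "^" + Pattern.quote("DatalogSetupFile: ")
            + "(.*)"
            + Pattern.quote("\\3.20\\log_all.stp") + "$");

    // Returns the dynamic middle part, or null if the line has another shape.
    static String extractMiddle(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "DatalogSetupFile: BTS50xx1EJA\\3.20\\log_all.stp";
        System.out.println(extractMiddle(input)); // BTS50xx1EJA
    }
}
```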

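Give this regex a try (PCRE syntax):
\s\K[^\\]+
java.util.regex does not support \K, so in Java the same idea can be written with a lookbehind:
String myInputString = "DatalogSetupFile: BTS50xx1EJA\\3.20\\log_all.stp";
Pattern myPattern = Pattern.compile("(?<=\\s)[^\\\\]+");
Matcher myMatcher = myPattern.matcher(myInputString);
// find() must succeed before group() may be called
if (myMatcher.find()) {
    System.out.println(myMatcher.group(0));
}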

Regular expression for hgsv notation in java

HGSV nomenclature has a pattern:
xxxxx.yyyy:charactersnumbercharacters
I would like to write a regex in Java that fetches all the tokens from the above. For example, it should produce 5 tokens:
{ 'xxxxx', 'yyyy', 'characters', 'number' , 'characters'}
I have used a simple split to fetch the tokens, but I don't think it is an optimal solution.
My current code is:
String hgsv = "BRAF.p:V600E";
String[] tokens = hgsv.split("\\."); // split takes a regex, so the dot must be escaped
this.symbol = tokens[0];
String type = tokens[1].split(":")[0];
I would like to use Pattern and Matcher in Java, but I have no idea how to write a regex for the above tokens.
Any clue how to do that?
(I will be using a regex anyway to separate characters, number, characters, so why not use a regex for the entire string?)
I found a link, but it is in Python; I need something similar in Java.

I think what you're probably looking for is to use capture groups, like this:
String s = "BRAF.p:V600E";
Pattern p = Pattern.compile("(\\w+)\\.(\\w+):([a-zA-Z]+)(\\d+)([a-zA-Z]+)");
Matcher m = p.matcher(s);
if (m.matches()) {
String[] parts = {m.group(1),
m.group(2),
m.group(3),
m.group(4),
m.group(5)};
// Prints "[BRAF, p, V, 600, E]"
System.out.println(Arrays.toString(parts));
} else {
// The input String is invalid.
}
That's really just a lot like a split, but it's more stable because you're using the pattern to validate the String beforehand.
Note that I have no idea if that is the exact right pattern that you should be using. I don't know the exact details of the HGSV notation you're talking about and your description is actually pretty vague. (What are e.g. xxxxx and yyyy? What are "characters"?) If you link me to some sort of specification or detailed description of this notation I can try to write a regex that's more definitely correct.
Anyhow, my example shows the basic idea. You might also see http://www.regular-expressions.info/brackets.html for more information.
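The same idea can also be sketched with named capture groups, which makes it clearer which token is which. The group names (symbol, type, ref, position, alt) and the class name HgsvParser are invented for this illustration and are not taken from any specification of the notation; the pattern itself is the one from the answer above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser; the group names are illustrative assumptions.
class HgsvParser {
    private static final Pattern HGSV = Pattern.compile(
            "(?<symbol>\\w+)\\.(?<type>\\w+):(?<ref>[a-zA-Z]+)(?<position>\\d+)(?<alt>[a-zA-Z]+)");

    // Returns the five tokens keyed by name, or an empty map if the input is invalid.
    static Map<String, String> parse(String s) {
        Map<String, String> tokens = new LinkedHashMap<>();
        Matcher m = HGSV.matcher(s);
        if (m.matches()) {
            for (String name : new String[] {"symbol", "type", "ref", "position", "alt"}) {
                tokens.put(name, m.group(name));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints {symbol=BRAF, type=p, ref=V, position=600, alt=E}
        System.out.println(parse("BRAF.p:V600E"));
    }
}
```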

Regular expression for multiple words with * and space

My regular expression has the format "Exit* Order*". When I use it in Java, it does not work as expected.
String pattern = "Exit* Order*";
String ipLine = "Exiting orders";
Match: NO
String pattern = "Exit Order";
String ipLine = "Exit order";
Match: Yes.
Java Code:
Pattern patrn = Pattern.compile(pattern,Pattern.CASE_INSENSITIVE);
Matcher match = patrn.matcher(ipLine);
Can anyone tell me what the pattern should be in such cases?

I believe you are looking for something like:
"Exit.* Order.*"
or maybe something instead of .*, e.g. \S*, \w*, [A-Za-z]*.
Your current regular expression is looking for zero or more t and r on the ends of the words, e.g. it would match
Exi Orde
Exit Orde
Exitt Orde
Exi Order
Exi Orderr
...

Exit\\w* Order\\w*
You should use this; .* can match much more than intended. Also use the case-insensitive flag (Pattern.CASE_INSENSITIVE, or (?i) inside the pattern).

It seems like you just want to match "Exit Order" case-insensitively:
Try this:
if (str.matches("(?i)exit order"))
Or, to restrict the match to just your examples, where the "O" of "Order" may be "o", use:
if (str.matches("Exit [Oo]rder"))

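Java does not use shell-style (glob) wildcards, where * on its own means "any characters"; in a Java regex, * only repeats the element before it. See: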
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
Use this:
String pattern = "Exit.* Order.*";
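To compare the suggestions in this thread, here is a small sketch that runs the original pattern and two of the proposed ones against the sample input, using the same Pattern.CASE_INSENSITIVE flag as the question. The class name WildcardDemo and the helper method are made up for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the patterns discussed above.
class WildcardDemo {
    // Full-string, case-insensitive match, as in the question's code.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    public static void main(String[] args) {
        String ipLine = "Exiting orders";
        // t* and r* only repeat the letters t and r, so "ing" cannot be matched.
        System.out.println(matches("Exit* Order*", ipLine));       // false
        // \w* allows any further word characters after each prefix.
        System.out.println(matches("Exit\\w* Order\\w*", ipLine)); // true
        // .* also works, but would accept spaces and punctuation as well.
        System.out.println(matches("Exit.* Order.*", ipLine));     // true
    }
}
```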

Negating a Regular Expression for string replacement

I have the following code that can replace the email address in a String in Java:
addressStr.replaceFirst("([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})", "")
So, a string with John Smith <john@smith.com> would become John Smith <>. How do I negate it so that it will instead replace everything that doesn't match the email address and leave just john@smith.com as the final result?
I tried to put in the ^ and ?<= at the front but it doesn't work.

Well, it's not the regex you need to change but the calling code. Your regex matches the e-mail address (in a weird way), and replaceFirst() removes it from the string.
So just use
Pattern regex = Pattern.compile("([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})");
Matcher regexMatcher = regex.matcher(addressStr);
if (regexMatcher.find()) {
address = regexMatcher.group();
}
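Put together as a runnable sketch, the find-instead-of-replace approach might look like this. The class and method names are hypothetical, and the pattern is the asker's, written here with @ as the separator between local part and domain (an assumption about the intended pattern).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the pattern is the asker's, assuming '@' as separator.
class AddressFinder {
    private static final Pattern EMAIL = Pattern.compile(
            "([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})");

    // Returns the first e-mail address found in the text, or null if there is none.
    static String firstAddress(String text) {
        Matcher m = EMAIL.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstAddress("John Smith <john@smith.com>")); // john@smith.com
    }
}
```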

A much more complete Java regex for catching e-mail addresses would be as follows:
"(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])"
Take a look at https://www.rfc-editor.org/rfc/rfc2822#section-3.4.1 for more info on this.
It is a bit complicated, but it follows the RFC 2822 address syntax much more closely (yours does not allow addresses like bob+bib@gmail.com, which are valid).
For your problem, as stated multiple times, just use find() (borrowing Tim Pietzcker's piece of code):
Pattern regex = Pattern.compile("(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])");
Matcher regexMatcher = regex.matcher(addressStr);
foundMatch = regexMatcher.find();

You can try:
Matcher matcher = Pattern.compile(regexp).matcher(addressStr);
String mailId = matcher.find() ? matcher.group() : null;
The idea here is to get the matched string rather than trying to replace everything else with blanks. Note that find() has to succeed before group() can be called. You can extract the compiled pattern into a field if this operation is repeated often.

Just don't replace; search for the match instead (Matcher.find()).

java email extraction regular expression?

I would like a regular expression that will extract email addresses from a String (using Java regular expressions).
That really works.

Here's the regular expression that really works.
I've spent an hour surfing on the web and testing different approaches,
and most of them didn't work although Google top-ranked those pages.
I want to share with you a working regular expression:
[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})
Here's the original link:
http://www.mkyong.com/regular-expressions/how-to-validate-email-address-with-regular-expression/

I had to add some dashes to allow for them. So a final result in Javanese:
final String MAIL_REGEX = "([_A-Za-z0-9-]+)(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})";

Install this regex tester plugin into Eclipse, and you will have a whale of a time testing regexes:
http://brosinski.com/regex/.
Points to note:
In the plugin, use only one backslash for a character escape. When you transcribe the regex into a Java/C# string you have to double the backslashes, because two levels of escaping are involved: first the Java/C# string literal, and then the regex engine itself.
Surround the sections of the regex whose text you wish to capture with round brackets (parentheses). Then you can use the group functions of the Java or C# regex API to read the values of those sections.
([_A-Za-z0-9-]+)(\.[_A-Za-z0-9-]+)@([A-Za-z0-9]+)(\.[A-Za-z0-9]+)
For example, using the above regex, the following string
abc.efg@asdf.cde
yields
start=0, end=16
Group(0) = abc.efg@asdf.cde
Group(1) = abc
Group(2) = .efg
Group(3) = asdf
Group(4) = .cde
Group 0 is always the whole of the matched text.
If you do not enclose any section in parentheses, you will only be able to detect a match, not capture parts of the text.
It might be less confusing to create a few small regexes rather than one long catch-all regex, since you can test them programmatically one by one and then decide which ones should be consolidated. This helps especially when you find a new email pattern that you had never considered before.
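The group listing above can be reproduced in Java with a few lines. This is only a sketch: the class name GroupDemo is made up, and the pattern is the one from this answer written with @ as the separator (an assumption about the intended pattern) and with the backslashes doubled for the Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class showing how capture groups are read back.
class GroupDemo {
    private static final Pattern P = Pattern.compile(
            "([_A-Za-z0-9-]+)(\\.[_A-Za-z0-9-]+)@([A-Za-z0-9]+)(\\.[A-Za-z0-9]+)");

    // Returns group 0 (whole match) followed by groups 1..n, or an empty list if nothing matches.
    static List<String> groups(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        if (m.find()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> g = groups("abc.efg@asdf.cde");
        for (int i = 0; i < g.size(); i++) {
            System.out.println("Group(" + i + ") = " + g.get(i));
        }
    }
}
```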

A little late, but OK.
Here is what I use (JavaScript, not Java). Just paste it into the Firebug console and run it. Then look on the web page for a textarea (most likely at the bottom of the page); it will contain a comma-separated list of all email addresses found in A tags.
var jquery = document.createElement('script');
jquery.setAttribute('src', 'http://code.jquery.com/jquery-1.10.1.min.js');
document.body.appendChild(jquery);
var list = document.createElement('textarea');
list.setAttribute('id', 'emaillist'); // the id is what $("#emaillist") looks up below
document.body.appendChild(list);
var lijst = "";
$("#emaillist").val("");
$("a").each(function(idx,el){
var mail = $(el).filter('[href*="@"]').attr("href");
if(mail){
lijst += mail.replace("mailto:", "")+",";
}
});
$("#emaillist").val(lijst);

Android's built-in email address pattern (android.util.Patterns.EMAIL_ADDRESS) works well:
public static List<String> getEmails(@NonNull String input) {
List<String> emails = new ArrayList<>();
Matcher matcher = Patterns.EMAIL_ADDRESS.matcher(input);
while (matcher.find()) {
int matchStart = matcher.start(0);
int matchEnd = matcher.end(0);
emails.add(input.substring(matchStart, matchEnd));
}
return emails;
}
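Outside Android, where Patterns.EMAIL_ADDRESS is not available, the same loop can be sketched with plain java.util.regex. The pattern below is the MAIL_REGEX from the earlier answer, written with @ as the separator (an assumption about the intended pattern); the class name EmailExtractor is made up. It is a pragmatic pattern, not a full RFC-compliant one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical plain-Java counterpart of the Android snippet above.
class EmailExtractor {
    private static final Pattern MAIL = Pattern.compile(
            "([_A-Za-z0-9-]+)(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})");

    // Collects every substring of the input that the pattern accepts as an address.
    static List<String> getEmails(String input) {
        List<String> emails = new ArrayList<>();
        Matcher matcher = MAIL.matcher(input);
        while (matcher.find()) {
            emails.add(matcher.group()); // same text as input.substring(start, end)
        }
        return emails;
    }

    public static void main(String[] args) {
        // Prints [bob@example.com, alice.smith@mail.example.org]
        System.out.println(getEmails("Contact bob@example.com or alice.smith@mail.example.org today"));
    }
}
```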
