Regex for a particular pattern - java

I am trying to extract strings that look like the ones below using Java regex.
Automotive Vehicles (154949)
Cars (91364)
Auto Parts & Accessories (29987)
Motorcycles & Scooters (11648)
I have tried the following:
for (Element link : links) {
String cat = link.text();
String pattern = "(\\w+\\w+?\\s?.?\\w+)";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(cat);
while (m.find( )) {
System.out.println("Category: "+m.group(0));
}
}

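Extract the text and the number with a Vim-style regex (in Vim syntax, \( and \) delimit groups, while bare parentheses are literal):
\(.*\)(\(\d*\))
Group 1 is the text, group 2 is the number.
It's been a while since I've written regexes in Java, but I think the equivalent is: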
(.*)\((\d+)\)
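A minimal sketch of how that pattern might be used from Java, assuming each link's text holds exactly one category as in the question. The class name CategoryParser and the method parse are made up for illustration, and the lazy .*? plus \s* are small additions so that the name comes back without trailing whitespace.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CategoryParser {
    // Group 1: category name (lazy, so trailing spaces are left to \s*).
    // Group 2: the digits between the literal parentheses.
    private static final Pattern CATEGORY = Pattern.compile("(.*?)\\s*\\((\\d+)\\)");

    // Returns {name, count}, or null when the text does not have the expected shape.
    // (Hypothetical helper, not part of the original answers.)
    static String[] parse(String text) {
        Matcher m = CATEGORY.matcher(text.trim());
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] samples = {
            "Automotive Vehicles (154949)",
            "Auto Parts & Accessories (29987)"
        };
        for (String s : samples) {
            String[] parts = parse(s);
            System.out.println("Category: " + parts[0] + ", count: " + parts[1]);
        }
    }
}
```

Using matches() rather than find() means a string without a trailing "(number)" is rejected instead of being partially matched.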


Regex capture group within if statement in Java

I'm facing a stupid problem... I know how to use Pattern and Matcher objects to capture a group in Java.
However, I cannot find a way to use them with an if statement where each branch depends on a match (a simple example to illustrate the question; in reality, it's more complicated):
String input="A=B";
String output="";
if (input.matches("#.*")) {
output="comment";
} else if (input.matches("A=(\\w+)")) {
output="value of key A is ..."; //how to get the content of capturing group?
} else {
output="unknown";
}
Should I create a Matcher for each possible test?!
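Yes, you should. String.matches only returns a boolean, so you need a Matcher to read the contents of a capturing group.
Here is an example: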
Pattern p = Pattern.compile("Phone: (\\d{9})");
String str = "Phone: 123456789";
Matcher m = p.matcher(str);
if (m.find()) {
String g = m.group(1); // g should hold 123456789
}
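Applied to the if/else chain from the question, this might look like the sketch below. The class LineClassifier and the method classify are hypothetical names; the two patterns are the ones from the question, compiled once and reused. Matcher.matches() requires the whole input to match, just like String.matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineClassifier {
    private static final Pattern COMMENT = Pattern.compile("#.*");
    private static final Pattern KEY_A = Pattern.compile("A=(\\w+)");

    // Hypothetical helper: one Matcher per test, checked in order.
    static String classify(String input) {
        Matcher keyA = KEY_A.matcher(input);
        if (COMMENT.matcher(input).matches()) {
            return "comment";
        } else if (keyA.matches()) {
            // The Matcher that tested the input also holds the captured group.
            return "value of key A is " + keyA.group(1);
        } else {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("A=B"));      // value of key A is B
        System.out.println(classify("# a note")); // comment
        System.out.println(classify("C=D"));      // unknown
    }
}
```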

How to parse a string to get array of #tags out of the string?

So I have a string like
"#tag1 #tag2 #tag3 not_tag1 not_tag2 #tag4" (the space between tag2 and tag4 is there to indicate that there can be many spaces). From this string I want to parse just tag1, tag2 and so on. They are similar to the #tags we see on LinkedIn or any other social media. Is there an easy way to do this using regex or any other function in Java, or should I do it the hard way (i.e. using loops and conditions)?
The tag format should be "#" (to indicate that a tag is starting) and a space " " (to indicate the end of the tag). In between there can be letters or numbers, but the first character must be a letter.
Example:
input : "#tag1 #tag2 #tag3 not_tag1 not_tag2 #12tag #tag4"
output : ["tag1", "tag2", "tag3", "tag4"]
Split by the regex "#\w+".
EDIT: this is the correct regex, but split is not the right method.
Same regex as javadev suggested, but use a Matcher instead:
String input = "#tag1 #tag2 #tag3 not_tag1 not_tag2 #12tag #tag4";
Matcher matcher = Pattern.compile("#\\w+").matcher(input);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
The output includes the leading #, as expected. Note that \w also matches digits, so this prints #12tag as well.
Maybe something like:
public static void main(String[] args ) {
String input = "#tag1 #tag2 #tag3 not_tag1 not_tag2 #12tag #tag4";
Pattern pattern = Pattern.compile("#([A-Za-z][A-Za-z0-9]*) *"); // [A-z] would also match [, \, ], ^, _ and the backtick
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
}
worked for me :)
Output:
tag1
tag2
tag3
tag4
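Since the question asks for the tags as an array rather than printed lines, here is a small sketch that collects them into a list. The class TagExtractor and the method extractTags are illustrative names, and the pattern assumes tags consist of ASCII letters and digits only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagExtractor {
    // '#' followed by a letter, then any run of letters or digits.
    private static final Pattern TAG = Pattern.compile("#([A-Za-z][A-Za-z0-9]*)");

    // Hypothetical helper: returns the tags in order of appearance, without '#'.
    static List<String> extractTags(String input) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            tags.add(m.group(1)); // group 1 omits the leading '#'
        }
        return tags;
    }

    public static void main(String[] args) {
        String input = "#tag1 #tag2    #tag3 not_tag1 not_tag2 #12tag #tag4";
        System.out.println(extractTags(input)); // [tag1, tag2, tag3, tag4]
    }
}
```

If a real array is needed, tags.toArray(new String[0]) converts the list.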

Java Regex : Extract a specific pattern from a string "I_INSERT_TO_TOPIC_345674_123456_4.json"

I want to extract only "_123456_4" from this string using java Regex.
I_INSERT_TO_TOPIC_345674_123456_4.json
I have tried
Pattern.compile("(_([^_]*_[^_]))") and Pattern.compile("_" + "([^[0-9]]*)" + "_[0-9]"), but these do not work.
If you want the two groups of digits just before .json, you can use a capturing group to find the required match. You can modify the pattern as required.
Pattern p = Pattern.compile("(_\\d+_\\d+)\\.json");
Matcher matcher = p.matcher(s);
if (matcher.find()) {
String group = matcher.group(1);
}
_[0-9]*_[0-9]*(?=\\.)
You can try this and see whether it works.
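For reference, a self-contained sketch of the capturing-group approach run against the file name from the question. The class SuffixExtractor and the method extract are invented names, and the trailing $ anchor is an added assumption that ".json" is always the end of the string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SuffixExtractor {
    // Two underscore-prefixed digit runs immediately before ".json" at the end.
    private static final Pattern SUFFIX = Pattern.compile("(_\\d+_\\d+)\\.json$");

    // Hypothetical helper: returns the captured suffix, or null if absent.
    static String extract(String fileName) {
        Matcher m = SUFFIX.matcher(fileName);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "I_INSERT_TO_TOPIC_345674_123456_4.json";
        System.out.println(extract(s)); // _123456_4
    }
}
```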

Getting multiple matches via regex

I want to retrieve several strings from a larger string via Matcher and Pattern using a regex.
String str = "<strong>ABC</strong>123<strong>DEF</strong>";
Pattern pattern = Pattern.compile("<strong>(.*)</strong>");
Matcher matcher = pattern.matcher(str);
My problem is that the matcher gives me just one match, which spans everything between the first opening and the last closing strong tag:
ABC</strong>123<strong>DEF
My objective is to get 2 matches:
ABC
DEF
Thank you very much for your help.
You need a non-greedy regex:
Pattern pattern = Pattern.compile("<strong>.*?</strong>");
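Adding ? after a quantifier makes it non-greedy (lazy): it matches as few characters as possible, so each match stops at the first </strong> instead of the last one. Note that this pattern still returns the whole element, tags included.
If you only want ABC and DEF, you can do something like this using a lookbehind and a lookahead: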
String str = "<strong>ABC</strong>123<strong>DEF</strong>";
Pattern pattern = Pattern.compile("((?<=<strong>).*?(?=</strong>))");
Matcher matcher = pattern.matcher(str);
while(matcher.find())
{
System.out.println(matcher.group());
}
Searching for "regex lookahead" and "regex lookbehind" should turn up more information on these constructs.
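An alternative sketch that keeps the lazy quantifier but reads the text from a capturing group instead of using lookarounds. The class StrongText and the method extract are illustrative names, and the pattern assumes strong tags are not nested and carry no attributes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StrongText {
    // Lazy group: stops at the first closing tag after each opening tag.
    private static final Pattern STRONG = Pattern.compile("<strong>(.*?)</strong>");

    // Hypothetical helper: returns the text inside each <strong> element.
    static List<String> extract(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = STRONG.matcher(html);
        while (m.find()) {
            result.add(m.group(1)); // text between the tags only
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "<strong>ABC</strong>123<strong>DEF</strong>";
        System.out.println(extract(str)); // [ABC, DEF]
    }
}
```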
I recommend using jsoup to parse your HTML instead of regex, for example:
Document doc = Jsoup.parse("<strong>ABC</strong>123<strong>DEF</strong>");
// select your tag
Elements elements = doc.select("strong");
// get the iterator to traverse all elements
Iterator<Element> it = elements.iterator();
// loop through all elements and fetch their text
while (it.hasNext()) {
System.out.println(it.next().text());
}
Output:
ABC
DEF
Or get the output as a single string:
Document doc = Jsoup.parse("<strong>ABC</strong>123<strong>DEF</strong>");
Elements elements = doc.select("strong");
System.out.println(elements.text());
Output:
ABC DEF
Download jsoup and add it to your project as a dependency.

Extracting part of URL using java regular expression

I'm trying to extract part of a URL from text files.
For example:
/p/gnomecatalog/bugs/search/?q=status%3Aclosed-accepted+or+status%3Awont-fix+or+status%3Aclosed" class="search_bin"><span>Closed Tickets</span></a>
I would like to extract only
/p/gnomecatalog/bugs/search/?q=status%3Aclosed-accepted+or+status%3Awont-fix+or+status%3Aclosed
How could I do that using a regular expression? I tried the regex
"/p/*./bugs/*."
but it didn't work.
Try this:
"\/p.*\/bugs[^"]*"
It means: "/p",
then any characters,
then "/bugs",
then any characters except ".
You can use:
(\/p\/.*\/bugs\/.*?(?="))
Java code:
String REGEX = "(\\/p\\/.*\\/bugs\\/.*?(?=\"))";
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(line); // line is one line of the input text
while (m.find()) {
    String matched = m.group();
    System.out.println("Matched : " + matched);
}
OUTPUT
Matched : /p/gnomecatalog/bugs/search/?q=status%3Aclosed-accepted+or+status%3Awont-fix+or+status%3Aclosed
Here's another way:
(?i)/p/[a-z/]+bugs/[^ "]+
The (?i) in the beginning makes the regex case insensitive so you don't have to worry about that. Then after bugs/ it will continue until it reaches either a space or a ".
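A short sketch of that last pattern in Java, run against a shortened version of the line from the question. The class UrlExtractor and the method extract are hypothetical names; the pattern itself is unchanged apart from escaping the double quote for the Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlExtractor {
    // Case-insensitive; after "bugs/" the match runs up to a space or a double quote.
    private static final Pattern BUGS_URL = Pattern.compile("(?i)/p/[a-z/]+bugs/[^ \"]+");

    // Hypothetical helper: returns the first matching path, or null if none is found.
    static String extract(String line) {
        Matcher m = BUGS_URL.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "/p/gnomecatalog/bugs/search/?q=status%3Aclosed\" class=\"search_bin\">"
                + "<span>Closed Tickets</span></a>";
        System.out.println(extract(line)); // /p/gnomecatalog/bugs/search/?q=status%3Aclosed
    }
}
```

Because the project segment is matched with [a-z/]+, a project name containing digits or hyphens would not match; widening that character class is left as an adjustment for the actual data.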
