Extracting part of URL using java regular expression


I'm trying to extract part of a URL from text files.
For example, from:
/p/gnomecatalog/bugs/search/?q=status%3Aclosed-accepted+or+status%3Awont-fix+or+status%3Aclosed" class="search_bin"><span>Closed Tickets</span></a>
I would like to extract only:
/p/gnomecatalog/bugs/search/?q=status%3Aclosed-accepted+or+status%3Awont-fix+or+status%3Aclosed
How could I do that using a regular expression? I tried the regex
"/p/*./bugs/*."
but it didn't work.

Try this:
"\/p.*\/bugs[^"]*"
It means: "/p",
then: any characters,
then: "/bugs",
then: any characters except " (so the match stops just before the closing quote).
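A minimal Java sketch of this pattern, assuming one line of the HTML is already held in a string. The class and method names (BugsUrlExtractor, extract) are made up for illustration. A forward slash has no special meaning in a Java regex, so the backslashes before it are dropped here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BugsUrlExtractor {
    // "/p", any characters, "/bugs", then everything up to (not including) the next double quote
    private static final Pattern BUGS_URL = Pattern.compile("/p.*/bugs[^\"]*");

    // Hypothetical helper: returns the first match in the line, or null when nothing matches
    static String extract(String line) {
        Matcher m = BUGS_URL.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "/p/gnomecatalog/bugs/search/?q=status%3Aclosed\" class=\"search_bin\"><span>Closed Tickets</span></a>";
        System.out.println(extract(line));
    }
}
```

Because .* is greedy, a line that contains "/bugs" more than once would be matched from the first "/p" up to the last "/bugs"; that is fine for the sample line but worth keeping in mind.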

You can use:
(\/p\/.*\/bugs\/.*?(?="))
Java code:
// needs java.util.regex.Pattern and java.util.regex.Matcher;
// line is assumed to hold one line of the text file
String REGEX = "(\\/p\\/.*\\/bugs\\/.*?(?=\"))";
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(line);
while (m.find()) {
    String matched = m.group();
    System.out.println("Matched : " + matched);
}
Output:
Matched : /p/gnomecatalog/bugs/search/?q=status%3Aclosed-accepted+or+status%3Awont-fix+or+status%3Aclosed

Here's another way:
(?i)/p/[a-z/]+bugs/[^ "]+
The (?i) at the beginning makes the regex case-insensitive, so you don't have to worry about letter case. After bugs/ the match continues until it reaches either a space or a ".
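A minimal Java sketch of this variant, with made-up names (CaseInsensitiveBugsUrl, extract). One assumption worth stating: [a-z/]+ only allows letters and slashes between /p/ and bugs/, so a project name containing digits or hyphens would not match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaseInsensitiveBugsUrl {
    // (?i) switches on case-insensitive matching for the whole pattern;
    // [^ \"]+ consumes characters until the first space or double quote.
    private static final Pattern BUGS_URL = Pattern.compile("(?i)/p/[a-z/]+bugs/[^ \"]+");

    // Hypothetical helper: first match in the line, or null if there is none
    static String extract(String line) {
        Matcher m = BUGS_URL.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("<a href=\"/P/GnomeCatalog/Bugs/search/?q=x\" class=\"search_bin\">"));
    }
}
```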

Related

Java Regex : Extract a specific pattern from a string "I_INSERT_TO_TOPIC_345674_123456_4.json"

I want to extract only "_123456_4" from this string using java Regex.
I_INSERT_TO_TOPIC_345674_123456_4.json
I have tried
Pattern.compile("(_([^_]*_[^_]))") and Pattern.compile("_" + "([^[0-9]]*)" + "_[0-9]") but these do not work.

If you want the two groups of digits just before .json, you can use a capturing group to pick out the match. Adjust the pattern to suit your requirements.
String s = "I_INSERT_TO_TOPIC_345674_123456_4.json";
Pattern p = Pattern.compile("(_\\d+_\\d+)\\.json");
Matcher matcher = p.matcher(s);
if (matcher.find()) {
    String group = matcher.group(1); // "_123456_4"
}

_[0-9]*_[0-9]*(?=\\.)
You can try this and see whether it works.
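A minimal Java sketch of the lookahead pattern above; the names (TrailingDigitGroups, extract) are illustrative only. The lookahead (?=\\.) requires a dot right after the match without consuming it, so ".json" is not part of the result.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingDigitGroups {
    // underscore, digits, underscore, digits, and then a dot must follow (but is not consumed)
    private static final Pattern TAIL = Pattern.compile("_[0-9]*_[0-9]*(?=\\.)");

    // Hypothetical helper: the matched tail, or null when the file name has no such tail
    static String extract(String fileName) {
        Matcher m = TAIL.matcher(fileName);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("I_INSERT_TO_TOPIC_345674_123456_4.json"));
    }
}
```

Since [0-9]* also accepts zero digits, a stricter variant could use [0-9]+ instead if empty groups should be rejected.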

Java Regular expression for exacted matched case

I have been struggling to find all substrings that follow the syntax {//<some string>/<some string>} with a Java regular expression.
My regular expression should return matches such as: {//data/process_id}
Below is the string in which I want to find the matching syntax:
#process_id={//data/process_id}##history_id={//data/history_id}##Pdataxml={//data/dataxml}##Prules =_UNESCAPEXMLVALUE({//data/rules})##submitted_by={//data/submitted_by}##table_definition={//data/table_definition}
I tried the regex pattern below, but it did not work:
[a-zA-Z_/\\[\\]\\(\\)0-9|]+
Can someone please help me to solve this issue?

You can use the following regex:
\{\/\/[^\/{}\s]*\/[^\/{}\s]*\}
code:
String input = "#process_id={//data/process_id}##history_id={//data/history_id}##Pdataxml={//data/dataxml}##Prules =_UNESCAPEXMLVALUE({//data/rules})##submitted_by={//data/submitted_by}##table_definition={//data/table_definition}";
List<String> allMatches = new ArrayList<String>();
Matcher m = Pattern.compile("\\{\\/\\/[^\\/{}\\s]*\\/[^\\/{}\\s]*\\}").matcher(input);
while (m.find()) {
    allMatches.add(m.group());
}
System.out.println(allMatches);
output:
[{//data/process_id}, {//data/history_id}, {//data/dataxml}, {//data/rules}, {//data/submitted_by}, {//data/table_definition}]

Try this regex with a Matcher:
"\\{//([^/]+)/([^/}]+)}"
The parts are captured in groups 1 and 2.
Like this:
Matcher m = Pattern.compile("\\{//([^/]+)/([^/}]+)}").matcher(str);
while (m.find()) {
    String part1 = m.group(1);
    String part2 = m.group(2);
    // do something with the parts
}
To grab the whole path in one piece via m.group(), without the surrounding braces, use this regex with lookarounds:
"(?<=\\{)//[^/]+/[^/}]+(?=})"

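A minimal Java sketch of the lookaround approach from the last answer, collecting every match into a list. The names (BracedPathFinder, findPaths) are illustrative, and the closing brace is escaped here purely as a precaution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracedPathFinder {
    // (?<=\{) requires a preceding "{" and (?=\}) a following "}", neither is part of the match
    private static final Pattern PATH = Pattern.compile("(?<=\\{)//[^/]+/[^/}]+(?=\\})");

    // Hypothetical helper: every "//a/b" found between braces, in order of appearance
    static List<String> findPaths(String input) {
        List<String> paths = new ArrayList<>();
        Matcher m = PATH.matcher(input);
        while (m.find()) {
            paths.add(m.group());
        }
        return paths;
    }

    public static void main(String[] args) {
        System.out.println(findPaths("#process_id={//data/process_id}##history_id={//data/history_id}"));
    }
}
```
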
extract a set of a characters between some characters

I have a string email = John.Mcgee.r2d2#hitachi.com
How can I write Java code that uses a regex to extract just the r2d2?
I used this but got an error in Eclipse:
String email = John.Mcgee.r2d2#hitachi.com
Pattern pattern = Pattern.compile(".(.*)\#");
Matcher matcher = patter.matcher
for (Strimatcher.find()){
System.out.println(matcher.group(1));
}

To match after the last dot when the text may contain several dots, require that the captured sequence itself contain no dot:
(?<=[.])([^.]*)(?=#)
(?<=[.]) means "preceded by a single dot"
(?=#) means "followed by # sign"
Note that since dot . is a metacharacter, it needs to be escaped either with \ (doubled for Java string literal) or with square brackets around it.
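A minimal Java sketch of the lookbehind/lookahead pattern above, with illustrative names (LastSegmentBeforeHash, extract). The character class [.] is used for the literal dot, so no doubled backslash is needed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastSegmentBeforeHash {
    // preceded by a dot, contains no dot itself, followed by '#'
    private static final Pattern SEGMENT = Pattern.compile("(?<=[.])([^.]*)(?=#)");

    // Hypothetical helper: the captured segment, or null when the input has no such segment
    static String extract(String email) {
        Matcher m = SEGMENT.matcher(email);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("John.Mcgee.r2d2#hitachi.com"));
    }
}
```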

I'm not sure you posted the right code. Here is my rewrite of what it should look like:
String email = "John.Mcgee.r2d2#hitachi.com";
// escape the dot and keep dots out of the group, so only the last segment before # is captured
Pattern pattern = Pattern.compile("\\.([^.]*)#");
Matcher matcher = pattern.matcher(email);
while (matcher.find()) {
    // group(1) is the first capturing group of the current match
    System.out.println(matcher.group(1));
}
but I think you just want something like this:
String email = "John.Mcgee.r2d2#hitachi.com";
Pattern pattern = Pattern.compile("\\.([^.]*)#");
Matcher matcher = pattern.matcher(email);
if (matcher.find()) {
    System.out.println(matcher.group(1)); // r2d2
}

There is no need for Pattern; replaceAll is enough with the regex .*\.([^\.]+)#.*, which captures the group ([^\.]+) (one or more characters other than a dot) that sits between a dot \. and #:
email = email.replaceAll(".*\\.([^\\.]+)#.*", "$1");
Output
r2d2
If you want to go with Pattern, use the regex \\.([^\\.]+)# instead:
String email = "John.Mcgee.r2d2#hitachi.com";
Pattern pattern = Pattern.compile("\\.([^\\.]+)#");
Matcher matcher = pattern.matcher(email);
if (matcher.find()) {
    System.out.println(matcher.group(1)); // Output: r2d2
}
Another solution is to use split:
String[] split = email.replaceAll("#.*", "").split("\\.");
email = split[split.length - 1]; // Output: r2d2
Notes:
Strings in Java must be enclosed in double quotes: "John.Mcgee.r2d2#hitachi.com"
You don't need to escape # in Java, but you do have to escape the dot with a double backslash: \\.
There is no for-loop syntax like for (Strimatcher.find()){; you probably meant while.

Match Strings which begin with X and end with Y?

I want to match every file name which ends with .js and is stored in a directory called lib.
Therefore I created the following regular expression: (lib/)(.*?).js$.
I tested the expression (lib/)(.*?).js$ in a Regex Tester and matched this filename: src/main/lib/abc/DocumentHandler.js.
To use my expression in Java, I escaped it to: (lib/)(.*?)\\.js$.
Nevertheless, Java tells me that my expression does not match.
Here is my code:
String regEx = "(lib/)(.*?).js$";
String escapedRegEx = "(lib/)(.*?)\\.js$";
Pattern pattern = Pattern.compile(escapedRegEx);
Matcher matcher = pattern.matcher("src/main/lib/abc/DocumentHandler.js");
System.out.println("Matches: " + matcher.matches()); // false :-(
Did I forget to escape something?

Use Matcher.find() instead of Matcher.matches() when the pattern should match only part of the string.
As per the Javadoc:
Matcher#matches()
Attempts to match the entire region against the pattern.
Matcher#find()
Attempts to find the next subsequence of the input sequence that matches the pattern.
sample code:
String regEx = "(lib/)(.*)\\.js$";
String str = "src/main/lib/abc/DocumentHandler.js";
Pattern pattern = Pattern.compile(regEx);
Matcher matcher = pattern.matcher(str);
if (matcher.find()) { // <== returns true if found
    System.out.println("Matches: " + matcher.group());
    System.out.println("Path: " + matcher.group(2));
}
output:
Matches: lib/abc/DocumentHandler.js
Path: abc/DocumentHandler
Use Matcher#group(index) to get the text captured by a group, that is, by a part of the pattern enclosed in parentheses (...).
You can also use the String#matches() method, which has to match the whole string, so the pattern needs a leading (.*):
String regEx = "(.*)(/lib/)(.*?)\\.js$";
String str = "src/main/lib/abc/DocumentHandler.js";
System.out.println("Matched :" + str.matches(regEx)); // Matched : true
Note: Don't forget to escape the dot (.), which has a special meaning in a regex pattern: it matches any character other than a line terminator.

Try this RegEx pattern
String regEx = "(.*)(lib\\/)(.*)(\\.js$)";
Pattern pattern = Pattern.compile(regEx);
Matcher matcher = pattern.matcher("src/main/lib/abc/DocumentHandler.js");
It works for me.

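Firstly, you don't strictly need to escape the dot (unescaped, it matches any character, including a literal dot), and secondly, your pattern does not cover the first part of the string, which matches() requires.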
String regEx = "(.*)(lib/)(.*?).js$";
Pattern pattern = Pattern.compile(regEx);
Matcher matcher = pattern.matcher("src/main/lib/abc/DocumentHandler.js");
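The difference the answers above describe can be checked with a small sketch. The helper names (MatchesVersusFind, wholeMatch, partialMatch) are made up for illustration; the patterns and the path are the ones from the question and answers.

```java
import java.util.regex.Pattern;

class MatchesVersusFind {
    // matches(): the pattern has to cover the entire input
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // find(): the pattern only has to match somewhere inside the input
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String path = "src/main/lib/abc/DocumentHandler.js";
        System.out.println(wholeMatch("(lib/)(.*?)\\.js$", path));       // pattern starts at "lib/", input starts at "src/"
        System.out.println(partialMatch("(lib/)(.*?)\\.js$", path));     // a match inside the input is enough
        System.out.println(wholeMatch("(.*)(lib/)(.*?)\\.js$", path));   // leading (.*) absorbs "src/main/"
    }
}
```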

How to extract word from string?

Suppose I have a string:
String message = "you should try http://google.com/";
Now, I want to send "http://google.com/" to a new
String url
What I want to do is:
check if a "word" in the string begins with "http://" and extract that word, where a word is
something that's surrounded by spaces (general english definition of word).
I have no idea how to extract the string, and the best I can do is use startsWith on the whole string. How do I use startsWith on a word, and extract that word?
Sorry if this is a little bit difficult to explain.
Thanks in advance!
EDIT: Also, what should I do to extract the word from the REGEX operation? And how should I handle it if there is more than 1 url in the string?

Use the Pattern and Matcher classes.
String str = "blabla http://www.mywebsite.com blabla";
String regex = "((https?:\\/\\/)?(www.)?(([a-zA-Z0-9-]){2,}\\.){1,4}([a-zA-Z]){2,6}(\\/([a-zA-Z-_/.0-9#:+?%=&;,]*)?)?)";
Matcher m = Pattern.compile(regex).matcher(str);
if (m.find()) {
    String url = m.group(); // value "http://www.mywebsite.com"
}
This regex works for http://..., https://... and even www... URLs. Other regexes can easily be found online.

You can try this:
String str = "blabla http://www.mywebsite.com blabla";
Matcher m = Pattern.compile("(http://.*)").matcher(str);
if (m.find()) {
String url = (new StringTokenizer(m.group(), " ")).nextToken();
}

The "correct" way to perform this task is to split the String by whitespace -- String#split("\\s") -- and then pass each piece to the URL constructor. If a piece starts with your prefix and a MalformedURLException is thrown, it is invalid. The URL class constructor is far better tested and more robust than any solution that you or I could come up with. So please use it, and don't reinvent the wheel.
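A minimal sketch of this split-and-validate idea, assuming the "http://" prefix from the question; the names (UrlWordExtractor, extractUrls) are illustrative. On recent JDKs the URL constructors are marked deprecated, so this may compile with a warning.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

class UrlWordExtractor {
    // Hypothetical helper: returns every whitespace-separated word that starts
    // with "http://" and that the URL constructor accepts.
    static List<String> extractUrls(String message) {
        List<String> urls = new ArrayList<>();
        for (String word : message.split("\\s+")) {
            if (!word.startsWith("http://")) {
                continue;
            }
            try {
                new URL(word); // throws MalformedURLException for an invalid URL
                urls.add(word);
            } catch (MalformedURLException e) {
                // not a valid URL, skip this word
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(extractUrls("you should try http://google.com/"));
    }
}
```

Returning a list also covers the case of several URLs in one message.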

You can use a Java regex for this.
The following regex catches any string starting with http:// or https:// up to the next whitespace character:
Pattern urlPattern = Pattern.compile("(https?://\\S*)");
Matcher matcher = urlPattern.matcher(myString);
if (matcher.find()) {
    String url = matcher.group();
}
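The question also asks what to do when the string holds more than one URL. A minimal sketch, assuming the same "up to the next whitespace" notion of a URL and using made-up names (AllUrlFinder, findAll): looping with while instead of a single if collects every match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllUrlFinder {
    // http:// or https:// followed by one or more non-whitespace characters
    private static final Pattern URL_PATTERN = Pattern.compile("https?://\\S+");

    // Hypothetical helper: all URL-like substrings, in order of appearance
    static List<String> findAll(String text) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = URL_PATTERN.matcher(text);
        while (matcher.find()) {
            urls.add(matcher.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(findAll("see http://a.com and https://b.org/x?y=1 now"));
    }
}
```

Trailing punctuation such as a full stop directly after a URL would be included in the match, since it is not whitespace.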

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
