Extract part of a string from a URL - Java Regex

I'm trying to extract the string between the last '/' and the '.' of a URL. For example, given a URL like "some.com/part1/part2/part3/stringINeed.xyz", I need to extract "stringINeed" and nothing else.
So far I have tried the following, and it prints an empty line:
import java.util.*;
import java.lang.*;
import java.io.*;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
class Extract
{
public static void main (String[] args) throws java.lang.Exception
{
String str = "part1/part2/part3/stringINeed.xyz" ;
Pattern pattern = Pattern.compile("/(.*?).");
Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
System.out.println(matcher.group(1));
}
}
}
What is wrong with my code? Can anyone help?

Use this regex:
[^/.]+(?=\.[^.]+$)
In Java:
Pattern regex = Pattern.compile("[^/.]+(?=\\.[^.]+$)");
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
ResultString = regexMatcher.group();
}
Explanation
[^/.]+ matches any chars that are not a slash or a dot
The lookahead (?=\.[^.]+$) asserts that what follows is a dot followed by non-dots and the end of the string
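For context on why the original attempt prints an empty line: in "/(.*?)." the dot is unescaped, so it matches any character, and the lazy group (.*?) is satisfied by matching nothing. Here is a minimal sketch of the lookahead pattern from this answer applied to the URL from the question. The class name LastSegmentName and the helper extractName are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastSegmentName {
    // Same pattern as in the answer: a run of non-slash, non-dot characters
    // followed by a dot, an extension without dots, and the end of the input.
    private static final Pattern NAME = Pattern.compile("[^/.]+(?=\\.[^.]+$)");

    // Hypothetical helper: returns the matched name, or null when nothing matches.
    static String extractName(String url) {
        Matcher m = NAME.matcher(url);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // prints: stringINeed
        System.out.println(extractName("some.com/part1/part2/part3/stringINeed.xyz"));
        // prints: null (no extension, so the lookahead never succeeds)
        System.out.println(extractName("part1/part2/noExtension"));
    }
}
```

Returning null for "no match" is just one possible design choice; an Optional<String> would work equally well.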

Without regex
str.substring(str.lastIndexOf('/') + 1, str.lastIndexOf('.'));
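The one-liner above assumes there is a '.' after the last '/'. If there is not, substring throws a StringIndexOutOfBoundsException because the end index ends up smaller than the start index. A slightly more defensive sketch might look like this; the class FileStem and the method stem are hypothetical names, not part of the original answer.

```java
class FileStem {
    // Hypothetical helper: text between the last '/' and the last '.' that follows it.
    static String stem(String url) {
        int start = url.lastIndexOf('/') + 1;          // 0 when there is no slash at all
        int dot = url.lastIndexOf('.');
        // Ignore a dot that sits before the last slash (e.g. the one in "some.com").
        int end = dot >= start ? dot : url.length();
        return url.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(stem("some.com/part1/part2/part3/stringINeed.xyz")); // stringINeed
        System.out.println(stem("some.com/path/noExtension"));                  // noExtension
        System.out.println(stem("plain.txt"));                                  // plain
    }
}
```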

Related

Java - regular expression to check if begins and ends with certain characters

Considering a string in following format,
[ABCD:defg] [MSG:information] [MSG2:hello]
How can I write a regex that checks whether the line contains '[MSG:' followed by some message and ']', and extracts the text 'information' from the string above?

You can use the regex, \[MSG:(.*?)\] and extract the value of group(1).
Demo:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Main {
public static void main(String args[]) {
String str = "[ABCD:defg] [MSG:information] [MSG2:hello]";
Matcher matcher = Pattern.compile("\\[MSG:(.*?)\\]").matcher(str);
if (matcher.find())
System.out.println(matcher.group(1));
}
}
Output:
information

Your requirement would be something like
/\[MSG:.+?\]/ in standard regex notation (a greedy .+ would run on to the last ] on the line). However, I would suggest using String.indexOf to extract the information:
String str = ...
int idx = str.indexOf("MSG:");
int idx2 = str.indexOf("]", idx);
String val = str.substring(idx + "MSG:".length(), idx2);
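Note that indexOf returns -1 when the text is not found, so the snippet above would fail on a line without a MSG field. Here is a sketch of the same indexOf approach with those cases handled. The class MsgField and the method extractMsg are illustrative names, and searching for the full "[MSG:" prefix is my own assumption to avoid matching inside other tags.

```java
class MsgField {
    // Hypothetical helper: returns the text between "[MSG:" and the next "]",
    // or null when the line has no complete MSG field.
    static String extractMsg(String line) {
        String prefix = "[MSG:";
        int start = line.indexOf(prefix);
        if (start < 0) {
            return null;                 // no "[MSG:" in the line
        }
        start += prefix.length();
        int end = line.indexOf(']', start);
        if (end < 0) {
            return null;                 // opening tag without a closing bracket
        }
        return line.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(extractMsg("[ABCD:defg] [MSG:information] [MSG2:hello]")); // information
        System.out.println(extractMsg("[ABCD:defg] [MSG2:hello]"));                   // null
    }
}
```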

Regex to split the first from a "/token1/token2/token3"

I'm pretty rusty with regex, but I have the requirement to extract the first token of the following string:
Input: /token1/token2/token3
Required output: /token1
I have tried:
List<String> connectorPath = Splitter.on("^[/\\w+]+")
.trimResults()
.splitToList(actionPath);
Doesn't work for me, any ideas?

Instead of split, you can match
^/\\w+
Or if the string has 3 parts, use a capture group for the first part.
^(/\\w+)/\\w+/\\w+$
Java example
Pattern pattern = Pattern.compile("^/\\w+");
Matcher matcher = pattern.matcher("/token1/token2/token3");
if (matcher.find()) {
System.out.println(matcher.group(0));
}
Output
/token1

You can split on the / that is not at the string start using the (?!^)/ regex:
String[] res = "/token1/token2/token3".split("(?!^)/");
System.out.println(res[0]); // => /token1
(?!^) - a negative lookahead that matches a location not at the start of string
/ - a / char.
Using Guava:
Splitter splitter = Splitter.onPattern("(?!^)/").trimResults();
Iterable<String> iterable = splitter.split(actionPath);
String first = Iterables.getFirst(iterable, "");

You are over-complicating it.
Try the following regular expression: ^(\/\w+)(.+)$
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class PathSplitter {
public static void main(String args[]) {
String input = "/token1/token2/token3";
Pattern pattern = Pattern.compile("^(\\/\\w+)(.+)$");
Matcher matcher = pattern.matcher(input);
if (matcher.find()) {
System.out.println(matcher.group(1)); // /token1
System.out.println(matcher.group(2)); // /token2/token3
} else {
System.out.println("NO MATCH");
}
}
}

Remove double quotes from output Java

I am trying to extract a url from the string. But I am unable to skip the double quotes in the output.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Main {
public static void main(String[] args) {
String s1 = "<a id=\"BUTTON_LINK\" style=\"%%BUTTON_LINK%%\" target=\"_blank\" href=\"https://||domainName||/basketReviewPageLoadAction.do\">%%CHECKOUT%%</a>";
//System.out.println(s1);
Pattern pattern = Pattern.compile("\\s*(?i)href\\s*=\\s*(\"([^\"]*\")|'[^']*'|([^'\">\\s]+))");
Matcher matcher = pattern.matcher(s1);
if(matcher.find()){
String url = matcher.group(1);
System.out.println(url);
}
}
}
My Output is:
"https://||domainName||/basketReviewPageLoadAction.do"
Expected Output is:
https://||domainName||/basketReviewPageLoadAction.do
I cannot simply use a string replace: I have to add a few GET parameters to this output and then put it back into the original string.

Regex: (?<=href=")([^\"]*) Substitution: $1?params...
Details:
(?<=) Positive Lookbehind
() Capturing group
[^] Match a single character not present in the list
* Matches between zero and unlimited times
$1 Group 1.
Java code:
With replaceAll you can append your parameters (?abc=12 here) after the captured group $1, which in this case is the href value.
String text = "<a id=\"BUTTON_LINK\" style=\"%%BUTTON_LINK%%\" target=\"_blank\" href=\"https://||domainName||/basketReviewPageLoadAction.do\">%%CHECKOUT%%</a>";
text = text.replaceAll("(?<=href=\")([^\"]*)", String.format("$1%s", "?abc=12"));
System.out.print(text);
Output:
<a id="BUTTON_LINK" style="%%BUTTON_LINK%%" target="_blank" href="https://||domainName||/basketReviewPageLoadAction.do?abc=12">%%CHECKOUT%%</a>
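One caveat with replaceAll: in the replacement string, '$' and '\' have special meaning, so a parameter string containing them needs Matcher.quoteReplacement. Here is a small sketch, assuming the same lookbehind pattern as above. The class HrefParams, the method appendToHref and the example.com URL are made up for illustration.

```java
import java.util.regex.Matcher;

class HrefParams {
    // Hypothetical helper: appends `suffix` to the value of every double-quoted href attribute.
    static String appendToHref(String html, String suffix) {
        // quoteReplacement keeps '$' and '\' in the suffix from being read
        // as group references or escape characters.
        return html.replaceAll("(?<=href=\")([^\"]*)", "$1" + Matcher.quoteReplacement(suffix));
    }

    public static void main(String[] args) {
        String html = "<a href=\"https://example.com/page.do\">link</a>";
        // prints: <a href="https://example.com/page.do?price=$5">link</a>
        System.out.println(appendToHref(html, "?price=$5"));
    }
}
```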

You can try one of these options:
System.out.println(url.replaceAll("^\"|\"$", ""));
System.out.println(url.substring(1, url.length()-1));

It's ugly, but it seems to work. Hope this helps.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
class Main {
public static void main(String[] args) {
String s1 = "<a id=\"BUTTON_LINK\" style=\"%%BUTTON_LINK%%\" target=\"_blank\" href= \"https://||domainName||/basketReviewPageLoadAction.do\">%%CHECKOUT%%</a>";
//System.out.println(s1);
Pattern pattern = Pattern.compile("\\s*(?i)href\\s*=\\s*(\"([^\"]*)\"|'([^']*)'|([^'\">\\s]+))");
Matcher matcher = pattern.matcher(s1);
if (matcher.find()) {
String url = Stream.of(matcher.group(2), matcher.group(3),
matcher.group(4)).filter(s -> s != null).collect(Collectors.joining());
System.out.print(url);
}
}
}

This solution worked for now.
Pattern pattern = Pattern.compile("\\s*(?i)href\\s*=\\s*\"([^\"]*)");

You can try this (note that it removes every double quote in s1, not only the ones around the URL):
s1 = s1.replace("\"", "");

How to preserve delimiters while using String.split() in Java?

String TextValue = "hello{MyVar} Discover {MyVar2} {MyVar3}";
String[] splitString = TextValue.split("\\{*\\}");
The output I'm getting in splitString is [{MyVar, {MyVar2, {MyVar3].
But I need to preserve the {} delimiters, i.e. [{MyVar}, {MyVar2}, {MyVar3}].
I need a way to produce the output above.

Use something like so:
Pattern p = Pattern.compile("(\\{\\w+\\})");
String str = ...
Matcher m = p.matcher(str);
while(m.find())
System.out.println(m.group(1));
Note, the code above is untested but that will look for words within curly brackets and place them in a group. It will then go over the string and output any string which matches the expression above.

Thanks kelvin & npinti.
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class CreateMatcherExample {
public static void main(String[] args) {
String TextValue = "hello{MyVar} Discover {My_Var2} {My_Var3}";
String patternString = "\\{\\w+\\}";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(TextValue);
while(matcher.find()) {
System.out.println(matcher.group());
}
}
}
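If you want to stay with String.split(), one option is to split on zero-width lookarounds so the braces are not consumed. Here is a sketch showing both approaches side by side. The class BraceTokens and its method names are made up for illustration. Note that the split variant also returns the text between the placeholders, which may or may not be what you want.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BraceTokens {
    // Collects every {word} token, braces included (the find() approach from the answers).
    static List<String> placeholders(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("\\{\\w+\\}").matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Splits after each '}' and before each '{' without consuming the braces.
    static String[] splitKeepingBraces(String text) {
        return text.split("(?<=\\})|(?=\\{)");
    }

    public static void main(String[] args) {
        String text = "hello{MyVar} Discover {MyVar2} {MyVar3}";
        System.out.println(placeholders(text));                        // [{MyVar}, {MyVar2}, {MyVar3}]
        System.out.println(Arrays.toString(splitKeepingBraces(text))); // also includes "hello", " Discover ", " "
    }
}
```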

Regular expression not matching

I'm trying to write a small program that extracts information from a website. I only want the text between two strings, "ORIGIN" and "//". I'm not getting any errors, but nothing is printed to the screen. Could someone point out what I'm doing wrong?
import java.io.IOException;
import java.io.PrintStream;
import java.io.FileOutputStream;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.util.regex.*;
class main {
public static void main(String[] args) throws IOException {
Document doc = Jsoup.connect("http://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?val=293762&db=nuccore&dopt=genbank&extrafeat=976&fmt_mask=0&retmode=html&withmarkup=on&log$=seqview&maxplex=3&maxdownloadsize=1000000").get();
String text = doc.text();
String pattern1 = "ORIGIN";
String pattern2 = "//";
String regexString = Pattern.quote(pattern1) + "(.*?)" + Pattern.quote(pattern2);
Pattern pattern = Pattern.compile(regexString, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
String textInBetween = matcher.group(1);
}
Pattern p = Pattern.compile(Pattern.quote(pattern1) + "(.*?)" + Pattern.quote(pattern2));
Matcher m = p.matcher(text);
while (m.find()) {
System.out.println(m.group(1));
}
}
}

You need to use the DOTALL flag to match any possible newline characters
Pattern pattern = Pattern.compile(Pattern.quote(pattern1) + "(.*?)" +
Pattern.quote(pattern2), Pattern.DOTALL);

You have to compile the patterns with DOTALL modifier:
Pattern pattern = Pattern.compile(regexString, Pattern.MULTILINE | Pattern.DOTALL);
Pattern p = Pattern.compile(Pattern.quote(pattern1) + "(.*?)" + Pattern.quote(pattern2), Pattern.DOTALL);
This modifier allows the period . to match every character, including newlines. Without it, the dot matches every character except line terminators.
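To see the effect without hitting the network, here is a small sketch that runs the same pattern with and without DOTALL. The class DotAllDemo, the helper between and the sample text are made up for illustration; the sample is only a stand-in for the real page content.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotAllDemo {
    // Hypothetical helper: first text between `start` and `end`, or null if there is no match.
    static String between(String text, String start, String end, int flags) {
        Pattern p = Pattern.compile(Pattern.quote(start) + "(.*?)" + Pattern.quote(end), flags);
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up multi-line sample standing in for the downloaded text.
        String text = "ORIGIN\n1 gatcctcc\n61 ccgtacag\n//";
        // Without DOTALL the dot cannot cross the newlines, so nothing matches: prints null
        System.out.println(between(text, "ORIGIN", "//", 0));
        // With DOTALL the group spans the lines between the two markers.
        System.out.println(between(text, "ORIGIN", "//", Pattern.DOTALL));
    }
}
```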
