How to extract data using pattern and matcher - java

Hey, I am trying to map old URLs to new URLs. For example:
/oldapp/viewReview.do?action=show_references&bugId=xy12&queueName=OLLD-CodeReviews
to: newapp/review/reference?bugId=xy12&queueName=OLLD-CodeReviews
How can I use Pattern and Matcher to match the pattern and extract bugId and queueName from the URL? Please help.

Each pattern matches any characters, then ? or &, then the parameter name and =, then the value (one or more characters other than &) captured as group 1, then any trailing characters:
Pattern bugidp = Pattern.compile(".*[?&]bugId=([^&]+).*");
Pattern queuep = Pattern.compile(".*[?&]queueName=([^&]+).*");
Matcher bugidm = bugidp.matcher(url);
Matcher queuem = queuep.matcher(url);
if (bugidm.matches() && queuem.matches()) {
String bugid = bugidm.group(1);
String qname = queuem.group(1);
String newurl = String.format("newapp/review/reference?bugId=%s&queueName=%s",
bugid, qname);
} else {
// not found
}
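The two patterns above differ only in the parameter name, so the lookup can be factored into one helper. This is a minimal sketch, not the answerer's code: the class name UrlMapper and the helpers queryParam and mapUrl are made up for illustration. It uses find() instead of matches(), so the leading and trailing .* are not needed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UrlMapper {
    // Hypothetical helper: value of one query parameter, or null if it is absent.
    static String queryParam(String url, String name) {
        // Pattern.quote keeps the parameter name literal inside the regex.
        Pattern p = Pattern.compile("[?&]" + Pattern.quote(name) + "=([^&]+)");
        Matcher m = p.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    // Hypothetical helper: builds the new URL, or returns null when a parameter is missing.
    static String mapUrl(String oldUrl) {
        String bugId = queryParam(oldUrl, "bugId");
        String queueName = queryParam(oldUrl, "queueName");
        if (bugId == null || queueName == null) {
            return null;
        }
        return String.format("newapp/review/reference?bugId=%s&queueName=%s", bugId, queueName);
    }

    public static void main(String[] args) {
        System.out.println(mapUrl(
            "/oldapp/viewReview.do?action=show_references&bugId=xy12&queueName=OLLD-CodeReviews"));
    }
}
```

Returning null for a missing parameter mirrors the "not found" branch above; throwing an exception would be an equally reasonable choice.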

Related

Regex for finding mp4 in string

I want to get all .mp4 URLs from this string using a regex.
I also want to know how to get only the last .mp4 URL using a regex.
Thanks
contentType=application/x-mpegURL, url=https://video.twimg.com/amplify_video/822938952332144642/pl/BjHU8aBCbOgZNzXQ.m3u8},
Variant{bitrate=0, contentType=application/dash+xml, url=https://video.twimg.com/amplify_video/822938952332144642/pl/BjHU8aBCbOgZNzXQ.mpd},
Variant{bitrate=320000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/320x180/YqZ72rzLj3VWVhy4.mp4},
Variant{bitrate=832000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/640x360/A2vMgzo2ElpPP6TE.mp4},
Variant{bitrate=2176000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/1280x720/j9xbNzRZqEbYs_2s.mp4}]}]";
Regex:
https?.*?\.mp4
Literal http
Followed by an optional 's': s?
Remove the question mark if they will all use HTTPS.
Followed by as few characters as possible: .*?
Followed by an mp4 extension (literal dot) \.mp4
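One caveat when the whole listing is a single string: the lazy .*? can start at a non-mp4 URL (such as the .m3u8 one) and run on into the next URL until it reaches .mp4. The sketch below is one way around that, assuming URLs never contain whitespace: it swaps .*? for \S+? and calls find() in a loop to collect every match, so the last list element is the last .mp4 URL. The class name, method name, and the example.com URLs are placeholders, not part of the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Mp4Urls {
    // \S+? cannot cross whitespace, so a single match never spans two URLs.
    private static final Pattern MP4_URL = Pattern.compile("https?://\\S+?\\.mp4");

    static List<String> findAll(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = MP4_URL.matcher(text);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        // Placeholder input shaped like the listing in the question.
        String text = "url=https://example.com/pl/a.m3u8}, "
                + "Variant{contentType=video/mp4, url=https://example.com/vid/b.mp4}, "
                + "Variant{contentType=video/mp4, url=https://example.com/vid/c.mp4}]";
        List<String> urls = findAll(text);
        System.out.println(urls);
        if (!urls.isEmpty()) {
            System.out.println("last: " + urls.get(urls.size() - 1));
        }
    }
}
```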
Two approaches:
If you're sure the URLs will always begin with https:// and that no further mp4 appears on the line after the URL ends, then you can use
pattern = "https://.*mp4";
String[] arr = {
"contentType=application/x-mpegURL, url=https://video.twimg.com/amplify_video/822938952332144642/pl/BjHU8aBCbOgZNzXQ.m3u8}",
"Variant{bitrate=0, contentType=application/dash+xml, url=https://video.twimg.com/amplify_video/822938952332144642/pl/BjHU8aBCbOgZNzXQ.mpd}",
"Variant{bitrate=320000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/320x180/YqZ72rzLj3VWVhy4.mp4}",
"Variant{bitrate=832000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/640x360/A2vMgzo2ElpPP6TE.mp4}",
"Variant{bitrate=2176000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/1280x720/j9xbNzRZqEbYs_2s.mp4}]}]"
};
String pattern = "https://.*mp4";
Pattern r = Pattern.compile(pattern);
for (String line : arr) {
Matcher m = r.matcher(line);
if (m.find()) {
System.out.println(m.group(0));
} else {
System.out.println("NO MATCH");
}
}
Otherwise, to support all kinds of URLs, change your pattern to a general URL regex with a small modification (mp4 appended at the end):
String pattern =
"(((ht|f)tp(s?)\\:\\/\\/|~\\/|\\/)|www.)" +
"(\\w+:\\w+#)?(([-\\w]+\\.)+(com|org|net|gov" +
"|mil|biz|info|mobi|name|aero|jobs|museum" +
"|travel|[a-z]{2}))(:[\\d]{1,5})?" +
"(((\\/([-\\w~!$+|.,=]|%[a-f\\d]{2})+)+|\\/)+|\\?|#)?" +
"((\\?([-\\w~!$+|.,*:]|%[a-f\\d{2}])+=?" +
"([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)" +
"(&(?:[-\\w~!$+|.,*:]|%[a-f\\d{2}])+=?" +
"([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)*)*" +
"(#([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)?\\b"+"mp4";
Output:
NO MATCH
NO MATCH
https://video.twimg.com/amplify_video/822938952332144642/vid/320x180/YqZ72rzLj3VWVhy4.mp4
https://video.twimg.com/amplify_video/822938952332144642/vid/640x360/A2vMgzo2ElpPP6TE.mp4
https://video.twimg.com/amplify_video/822938952332144642/vid/1280x720/j9xbNzRZqEbYs_2s.mp4

How to split a long string in Java?

How to edit this string and split it into two?
String asd = {RepositoryName: CodeCommitTest,RepositoryId: 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef};
I want to make two strings.
String reponame;
String repoID;
reponame should be CodeCommitTest
repoID should be 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef
Can someone help me get it? Thanks
Here is Java code using a regular expression in case you can't use a JSON parsing library (which is what you probably should be using):
String pattern = "^\\{RepositoryName:\\s(.*?),RepositoryId:\\s(.*?)\\}$";
String asd = "{RepositoryName: CodeCommitTest,RepositoryId: 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef}";
String reponame = "";
String repoID = "";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(asd);
if (m.find()) {
reponame = m.group(1);
repoID = m.group(2);
System.out.println("Found reponame: " + reponame + " with repoID: " + repoID);
} else {
System.out.println("NO MATCH");
}
This code has been tested in IntelliJ and runs without error.
Output:
Found reponame: CodeCommitTest with repoID: 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef
Assuming there aren't quote marks in the input, and that the repository name and ID consist of letters, numbers, and dashes, then this should work to get the repository name:
Pattern repoNamePattern = Pattern.compile("RepositoryName: *([A-Za-z0-9\\-]+)");
Matcher matcher = repoNamePattern.matcher(asd);
if (matcher.find()) {
reponame = matcher.group(1);
}
and you can do something similar to get the ID. The above code just looks for RepositoryName:, possibly followed by spaces, followed by one or more letters, digits, or hyphen characters; then the group(1) method extracts the name, since it's the first (and only) group enclosed in () in the pattern.
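The same idea can be sketched for both fields at once. This assumes, as above, that the values contain only letters, digits, and hyphens; the class name RepoFields and the helper field are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RepoFields {
    // Hypothetical helper: the value that follows "<key>:", or null when the key is absent.
    static String field(String input, String key) {
        Pattern p = Pattern.compile(Pattern.quote(key) + ": *([A-Za-z0-9-]+)");
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String asd = "{RepositoryName: CodeCommitTest,RepositoryId: 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef}";
        String reponame = field(asd, "RepositoryName");
        String repoID = field(asd, "RepositoryId");
        System.out.println(reponame);
        System.out.println(repoID);
    }
}
```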

Matcher not finding matches

I'm trying to extract the numbers in the following string:
09/29/2014
I am currently using the code:
Pattern p = Pattern.compile("([0-9]{2})/([0-9]{2})/([0-9]{4})");
Matcher m = p.matcher(startDatepicker);
String startYear = m.group(3);
String startMonth = m.group(1);
String startDay = m.group(2);
startDatepicker contains: 09/29/2014
However, I am not getting any matches. I also tried escaping the forward slashes with \\, but that didn't work either.
Am I missing something?
Thanks for your help.
Before you can access the matched groups, you need to call find() on the matcher and check that it has found a match:
Pattern p = Pattern.compile("([0-9]{2})/([0-9]{2})/([0-9]{4})");
Matcher m = p.matcher(startDatepicker);
if (!m.find()) {
return;
}
String startYear = m.group(3);
String startMonth = m.group(1);
String startDay = m.group(2);
The call of m.find() positions the matcher on the first match.
You need to call find() before reading any groups; calling it repeatedly iterates through the matches.
Pattern p = Pattern.compile("([0-9]{2})/([0-9]{2})/([0-9]{4})");
Matcher m = p.matcher(startDatepicker);
while (m.find()) {
...
}
The find() method searches for occurrences of the regex in the input passed to p.matcher(). If multiple matches can be found, this method will find the first, and then move to the next match for each subsequent call.
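When the string should be exactly one date, matches() is an alternative to find(): it also has to be called before group(), and it only succeeds if the whole input fits the pattern. A minimal sketch follows; the class name DateParts and the method parse are illustrative, and the month/day order follows the group numbering used in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DateParts {
    private static final Pattern DATE = Pattern.compile("([0-9]{2})/([0-9]{2})/([0-9]{4})");

    // Returns {year, month, day} as strings, or null if the input does not fit the pattern.
    static String[] parse(String input) {
        Matcher m = DATE.matcher(input);
        if (!m.matches()) {   // the match attempt must happen before any group() call
            return null;
        }
        return new String[] { m.group(3), m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("09/29/2014");
        System.out.println(parts[0] + " " + parts[1] + " " + parts[2]);

        // Calling group() with no prior find()/matches() throws IllegalStateException.
        Matcher fresh = DATE.matcher("09/29/2014");
        try {
            fresh.group(1);
        } catch (IllegalStateException e) {
            System.out.println("group() before find(): " + e.getMessage());
        }
    }
}
```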

Pattern Matching with dynamic matcher

Sample input string : Customer ${/xml:Name} has Ordered Product ${/xml:product} of ${/xml:unit} units.
I am able to find the strings that match ${ ... } using "\\$\\{.*?\\}".
I resolve the value for each string from XML, and now I have to substitute the value back into the input string.
I am using this method:
Pattern MY_PATTERN = Pattern.compile("\\$\\{.*?\\}");
Matcher m = MY_PATTERN.matcher(inputstring);
while (m.find()) {
String s = m.group(0); // s is ${/xml:Name}
// escaping wild characters
s = s.replaceAll("${", "\\$\\{"); // s is \$\{/xml:Name}
s = s.replaceAll("}", "\\}"); // s is \$\{/xml:Name\}
Pattern inner_pattern = Pattern.compile(s);
Matcher m1 = inner_pattern.matcher(inputstring);
name = m1.replaceAll(xPathValues.get(s));
}
But I get a PatternSyntaxException at s = s.replaceAll("${", "\\$\\{");
You must escape the { too (and the $, which is otherwise an end-of-line anchor in a regex); try "\\$\\{" as the first argument.
Instead of:
s = s.replaceAll("${", "\\$\\{"); // s is \$\{/xml:Name}
s = s.replaceAll("}", "\\}"); // s is \$\{/xml:Name\}
You can use the non-regex method String#replace(CharSequence, CharSequence), which treats both arguments literally:
s = s.replace("${", "\\$\\{").replace("}", "\\}"); // s is \$\{/xml:Name\}
That's because { starts a quantifier in a regex: a{1,4} means a repeated 1 to 4 times, so it matches a, aa, aaa, or aaaa. Java tries to interpret your regex this way, so escape the {.
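If a matched placeholder really has to be reused as a pattern, java.util.regex can do the escaping itself. This is a small sketch, assuming a made-up replacement value ("Alice") and made-up class and method names: Pattern.quote makes the search text literal, and Matcher.quoteReplacement does the same for the replacement text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuoteDemo {
    // Hypothetical helper: replaces every literal occurrence of 'literal' with 'value'.
    static String replaceLiteral(String input, String literal, String value) {
        // Pattern.quote wraps the text so that $, { and } lose their regex meaning.
        Pattern p = Pattern.compile(Pattern.quote(literal));
        // quoteReplacement protects $ and \ in the replacement text.
        return p.matcher(input).replaceAll(Matcher.quoteReplacement(value));
    }

    public static void main(String[] args) {
        String input = "Customer ${/xml:Name} has Ordered Product ${/xml:product}";
        System.out.println(replaceLiteral(input, "${/xml:Name}", "Alice"));
    }
}
```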
Yes, you must escape the {, but I would rather capture what's inside the braces:
Pattern MY_PATTERN = Pattern.compile("\\$\\{/xml:(.*?)\\}");
Matcher m = MY_PATTERN.matcher(inputstring);
while (m.find()) {
name = m.group(1); // name is Name
...
}
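The whole substitution can also be done in one pass with appendReplacement and appendTail, which avoids compiling a second pattern per placeholder. This is a sketch under assumptions: the values map stands in for the XPath lookup (xPathValues in the question), the sample values are invented, unknown names are left untouched, and StringBuffer is used because the StringBuilder overloads of these methods only exist from Java 9.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderFiller {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{/xml:(.*?)\\}");

    static String fill(String input, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            // Keep the original placeholder when no value is known.
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);   // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("Name", "Alice");      // made-up sample values
        values.put("product", "Widget");
        values.put("unit", "3");
        System.out.println(fill(
            "Customer ${/xml:Name} has Ordered Product ${/xml:product} of ${/xml:unit} units.",
            values));
    }
}
```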

Get an array of Strings matching a pattern from a String

I have a long string let's say
I like this #computer and I want to buy it from #XXXMall.
I know the regular expression pattern is
Pattern tagMatcher = Pattern.compile("[#]+[A-Za-z0-9-_]+\\b");
Now I want to get all the hashtags in an array. How can I use this expression to get an array of all the hashtags from the string, something like
ArrayList hashtags = getArray(pattern, str)
You can write it like this:
private static List<String> getArray(Pattern tagMatcher, String str) {
Matcher m = tagMatcher.matcher(str);
List<String> l = new ArrayList<String>();
while(m.find()) {
String s = m.group(); //will give you "#computer"
s = s.substring(1); // will give you just "computer"
l.add(s);
}
return l;
}
You can also use \\w instead of A-Za-z0-9_, making the regex [#]+[\\w-]+\\b
This link would surely be helpful for achieving what you want.
It says:
The find() method searches for occurrences of the regular expressions
in the text passed to the Pattern.matcher(text) method, when the
Matcher was created. If multiple matches can be found in the text, the
find() method will find the first, and then for each subsequent call
to find() it will move to the next match.
The methods start() and end() will give the indexes into the text
where the found match starts and ends.
Example:
String text =
"This is the text which is to be searched " +
"for occurrences of the word 'is'.";
String patternString = "is";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);
int count = 0;
while(matcher.find()) {
count++;
System.out.println("found: " + count + " : "
+ matcher.start() + " - " + matcher.end());
}
You got the hint now.
Here is one way, using Matcher
Pattern tagMatcher = Pattern.compile("#+[-\\w]+\\b");
Matcher m = tagMatcher.matcher(stringToMatch);
ArrayList<String> hashtags = new ArrayList<>();
while (m.find()) {
hashtags.add(m.group());
}
I took the liberty of simplifying your regex. # does not need to be in a character class. [A-Za-z0-9_] is the same as \w, so [A-Za-z0-9-_] is the same as [-\w]
You can use:
String val="I like this #computer and I want to buy it from #XXXMall.";
String REGEX = "(?<=#)[A-Za-z0-9-_]+";
List<String> list = new ArrayList<String>();
Pattern pattern = Pattern.compile(REGEX);
Matcher matcher = pattern.matcher(val);
while(matcher.find()){
list.add(matcher.group());
}
(?<=#) is a positive lookbehind: it asserts that the match is immediately preceded by a literal #, without including the # in the match.
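A capturing group is another way to drop the #, without a lookbehind or substring(). This sketch reuses the simplified character class [-\w] from the earlier answer; the class name Hashtags and the method extract are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Hashtags {
    // Group 1 holds the tag text; the leading # characters stay outside the group.
    private static final Pattern TAG = Pattern.compile("#+([-\\w]+)");

    static List<String> extract(String text) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            tags.add(m.group(1));
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(extract("I like this #computer and I want to buy it from #XXXMall."));
    }
}
```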
You can use the following code to get the names:
String saa = "#{akka}nikhil#{kumar}aaaaa";
Pattern regex = Pattern.compile("#\\{(.*?)\\}");
Matcher m = regex.matcher(saa);
while(m.find()) {
String s = m.group(1);
System.out.println(s);
}
It will print
akka
kumar
