Using Regular Expression in Java to extract information from a String


I have an input String like this:
"I am Duc/N Ta/N Van/N"
The suffix "/N" marks a word as part of a person's name.
The expected output is:
Name: Duc Ta Van
How can I do this using a regular expression?

You can use Pattern and Matcher like this :
String input = "I am Duc/N Ta/N Van/N";
Pattern pattern = Pattern.compile("([^\\s]+)/N");
Matcher matcher = pattern.matcher(input);
String result = "";
while (matcher.find()) {
result+= matcher.group(1) + " ";
}
System.out.println("Name: " + result.trim());
Output
Name: Duc Ta Van
Another solution using Java 9+
From Java 9 onward, you can use Matcher::results like this:
String input = "I am Duc/N Ta/N Van/N";
String regex = "([^\\s]+)/N";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
String result = matcher.results().map(s -> s.group(1)).collect(Collectors.joining(" "));
System.out.println("Name: " + result); // Name: Duc Ta Van

Here is the regex to use to capture every "name" followed by /N:
(\w+)\/N
Now you just need to loop over every match in that String and concatenate the captured groups to get the result:
String pattern = "(\\w+)\\/N";
String test = "I am Duc/N Ta/N Van/N";
Matcher m = Pattern.compile(pattern).matcher(test);
StringBuilder sbNames = new StringBuilder();
while(m.find()){
sbNames.append(m.group(1)).append(" ");
}
System.out.println(sbNames.toString().trim()); // trim() drops the trailing space
Duc Ta Van
This covers the hardest part. I will let you adapt it to your needs.
Note:
In Java, escaping a forward slash is not required. I keep "(\\w+)\\/N" so that the same regex is used throughout this answer, but "(\\w+)/N" works as well.
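To compare the capture patterns used in the two answers above, here is a minimal sketch. The class name NameExtractor, the helper extract, and the hyphenated sample name are assumptions made for illustration; none of them come from the original question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NameExtractor {

    // Joins every group(1) capture of the given regex with single spaces.
    static String extract(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(m.group(1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "I am Duc/N Ta/N Van/N";
        // The escaped and unescaped slash behave identically in Java.
        System.out.println(extract("(\\w+)\\/N", input));   // Duc Ta Van
        System.out.println(extract("(\\w+)/N", input));     // Duc Ta Van

        // Hypothetical input: \w does not match '-', while [^\s] does.
        String hyphenated = "I am Jean-Luc/N";
        System.out.println(extract("(\\w+)/N", hyphenated));     // Luc
        System.out.println(extract("([^\\s]+)/N", hyphenated));  // Jean-Luc
    }
}
```

Which pattern is preferable depends on whether names may contain characters outside letters, digits, and underscore.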

I've used "[/N]+" as the regular expression.
[] = Matches characters inside the set
\/ = Matches the character / literally (case sensitive)
+ = Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
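One thing to keep in mind with this pattern: a character class matches any single character from the set, so "[/N]+" matches runs of '/' and 'N' in any order, not only the literal marker "/N". The sketch below assumes the pattern is applied with replaceAll (the answer does not say how it was used); the class name, method names, and the second sample input are hypothetical.

```java
public class CharClassDemo {

    // Removes every run of '/' or 'N' characters.
    static String stripWithClass(String s) {
        return s.replaceAll("[/N]+", "");
    }

    // Removes only the literal two-character sequence "/N".
    static String stripLiteral(String s) {
        return s.replaceAll("/N", "");
    }

    public static void main(String[] args) {
        String input = "I am Duc/N Ta/N Van/N";
        System.out.println(stripWithClass(input)); // I am Duc Ta Van
        System.out.println(stripLiteral(input));   // I am Duc Ta Van

        // Hypothetical name starting with 'N': the character class also eats that letter.
        String other = "I am Nam/N";
        System.out.println(stripWithClass(other)); // I am am
        System.out.println(stripLiteral(other));   // I am Nam
    }
}
```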

Related

avoid using multiple split method

I have a string like this:
//Locaton;RowIndex;maxRows=New York, NY_10007;1;4
From this I need to extract only the country name, New York.
How can this be done in a single step?
I used:
String str = "Locaton;RowIndex;maxRows=New York, NY_10007;1;4 ";
str = str.split("=")[1];
str = str.split(",")[0];
The above code contains several splits. How can I avoid this?
I want to get only the country name with a single statement.

Try to use this regular expression "=(.*?)," like this:
String str = "Locaton;RowIndex;maxRows=New York, NY_10007;1;4 ";
Pattern pattern = Pattern.compile("=(.*?),");
Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Output:
New York
matcher.group(1) returns the text captured by the first pair of parentheses in the pattern. Parentheses create a numbered capturing group, which stores the part of the string matched by the part of the regular expression inside the parentheses.
Input: "Locaton;RowIndex;maxRows=New York, NY_10007;1;4 "
Group 1: "New York"
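To make the difference between the whole match and a capturing group concrete, here is a small sketch. The class name GroupNumbersDemo and the helper firstMatchGroups are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumbersDemo {

    // Returns {group(0), group(1)} for the first match, or null if nothing matches.
    static String[] firstMatchGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1) };
    }

    public static void main(String[] args) {
        String str = "Locaton;RowIndex;maxRows=New York, NY_10007;1;4 ";
        String[] groups = firstMatchGroups("=(.*?),", str);
        // group(0) is everything the pattern matched, delimiters included.
        System.out.println("Group 0: \"" + groups[0] + "\""); // Group 0: "=New York,"
        // group(1) is only the part inside the first pair of parentheses.
        System.out.println("Group 1: \"" + groups[1] + "\""); // Group 1: "New York"
    }
}
```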

Use capture groups, which are well suited to extracting specific parts of a string.
String str = "Locaton;RowIndex;maxRows=New York, NY_10007;1;4 ";
String pattern = "(.*?=)(.*?)(,.*)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(str);
if (m.find()) {
System.out.println("Group 1: " + m.group(1));
System.out.println("Group 2: " + m.group(2));
System.out.println("Group 3: " + m.group(3));
}
Here is the output
Group 1: Locaton;RowIndex;maxRows=
Group 2: New York
Group 3: , NY_10007;1;4
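Since the question asks for a single statement, the same three-group pattern could also be used with String.replaceFirst and a "$2" back-reference. This is only a sketch: the class and method names are hypothetical, and note that replaceFirst returns the input unchanged when the pattern does not match.

```java
public class SingleCallExtract {

    // Replaces the whole matched text with the content of capture group 2.
    static String extractMiddle(String input) {
        return input.replaceFirst("(.*?=)(.*?)(,.*)", "$2");
    }

    public static void main(String[] args) {
        String str = "Locaton;RowIndex;maxRows=New York, NY_10007;1;4 ";
        System.out.println(extractMiddle(str)); // New York

        // No '=' or ',' present: the pattern does not match, so the input comes back as-is.
        System.out.println(extractMiddle("plain text")); // plain text
    }
}
```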

How to split a long string in Java?

How to edit this string and split it into two?
String asd = {RepositoryName: CodeCommitTest,RepositoryId: 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef};
I want to make two strings.
String reponame;
String repoID;
reponame should be CodeCommitTest
repoID should be 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef
Can someone help me with this? Thanks

Here is Java code using a regular expression in case you can't use a JSON parsing library (which is what you probably should be using):
String pattern = "^\\{RepositoryName:\\s(.*?),RepositoryId:\\s(.*?)\\}$";
String asd = "{RepositoryName: CodeCommitTest,RepositoryId: 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef}";
String reponame = "";
String repoID = "";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(asd);
if (m.find()) {
reponame = m.group(1);
repoID = m.group(2);
System.out.println("Found reponame: " + reponame + " with repoID: " + repoID);
} else {
System.out.println("NO MATCH");
}
This code has been tested in IntelliJ and runs without error.
Output:
Found reponame: CodeCommitTest with repoID: 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef

Assuming there aren't quote marks in the input, and that the repository name and ID consist of letters, numbers, and dashes, then this should work to get the repository name:
Pattern repoNamePattern = Pattern.compile("RepositoryName: *([A-Za-z0-9\\-]+)");
Matcher matcher = repoNamePattern.matcher(asd);
if (matcher.find()) {
reponame = matcher.group(1);
}
and you can do something similar to get the ID. The above code just looks for RepositoryName:, possibly followed by spaces, followed by one or more letters, digits, or hyphen characters; then the group(1) method extracts the name, since it's the first (and only) group enclosed in () in the pattern.
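As a sketch of the "something similar" for the ID, the same pattern can be parameterized by the key. The class name RepoFieldExtractor and the helper extractField are assumptions for illustration, and the character set is the one assumed in this answer (letters, digits, and hyphens).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepoFieldExtractor {

    // Finds "<key>:" followed by optional spaces and captures the value after it.
    // Returns null when the key is not present.
    static String extractField(String key, String text) {
        Pattern p = Pattern.compile(Pattern.quote(key) + ": *([A-Za-z0-9\\-]+)");
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String asd = "{RepositoryName: CodeCommitTest,RepositoryId: 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef}";
        System.out.println(extractField("RepositoryName", asd)); // CodeCommitTest
        System.out.println(extractField("RepositoryId", asd));   // 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef
    }
}
```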

Match Strings which begin with X and end with Y?

I want to match every file name which ends with .js and is stored in a directory called lib.
Therefore I created the following regular expression: (lib/)(.*?).js$.
I tested the expression (lib/)(.*?).js$ in a Regex Tester and matched this filename: src/main/lib/abc/DocumentHandler.js.
To use my expression in Java, I escaped it to: (lib/)(.*?)\\.js$.
Nevertheless, Java tells me that my expression does not match.
Here is my code:
String regEx = "(lib/)(.*?).js$";
String escapedRegEx = "(lib/)(.*?)\\.js$";
Pattern pattern = Pattern.compile(escapedRegEx);
Matcher matcher = pattern.matcher("src/main/lib/abc/DocumentHandler.js");
System.out.println("Matches: " + matcher.matches()); // false :-(
Did I forget to escape something?

Use Matcher.find() instead of Matcher.matches() to look for a match anywhere within the string.
As per the Javadoc:
Matcher#matches()
Attempts to match the entire region against the pattern.
Matcher#find()
Attempts to find the next subsequence of the input sequence that matches the pattern.
sample code:
String regEx = "(lib/)(.*)\\.js$";
String str = "src/main/lib/abc/DocumentHandler.js";
Pattern pattern = Pattern.compile(regEx);
Matcher matcher = pattern.matcher(str);
if (matcher.find()) { // <== returns true if found
System.out.println("Matches: " + matcher.group());
System.out.println("Path: " + matcher.group(2));
}
output:
Matches: lib/abc/DocumentHandler.js
Path: abc/DocumentHandler
Use Matcher#group(index) to get the text captured by a group, i.e. a part of the regex pattern enclosed in parentheses (...).
You can use the String#matches() method to match the whole string:
String regEx = "(.*)(/lib/)(.*?)\\.js$";
String str = "src/main/lib/abc/DocumentHandler.js";
System.out.println("Matched :" + str.matches(regEx)); // Matched :true
Note: Don't forget to escape the dot (.), which has a special meaning in a regex pattern: it matches any character other than a newline.
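The difference between matches() and find() can be checked side by side with the pattern from the question. This is a minimal sketch; the class name MatchesVsFind and its two helpers are hypothetical.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {

    // True only if the pattern covers the entire input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if the pattern matches any part of the input.
    static boolean findsWithin(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String path = "src/main/lib/abc/DocumentHandler.js";
        String regEx = "(lib/)(.*?)\\.js$";

        System.out.println(matchesWhole(regEx, path));        // false: "src/main/" is not covered
        System.out.println(findsWithin(regEx, path));         // true
        System.out.println(matchesWhole(".*" + regEx, path)); // true: leading part now covered
    }
}
```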

Try this RegEx pattern
String regEx = "(.*)(lib\\/)(.*)(\\.js$)";
Pattern pattern = Pattern.compile(regEx);
Matcher matcher = pattern.matcher("src/main/lib/abc/DocumentHandler.js");
It's working for me:

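Firstly, the pattern matches this input even without escaping the dot (although an unescaped . matches any character, so escaping it is more precise). Secondly, matches() has to cover the entire string, and your pattern does not match the first part of it.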
String regEx = "(.*)(lib/)(.*?).js$";
Pattern pattern = Pattern.compile(regEx);
Matcher matcher = pattern.matcher("src/main/lib/abc/DocumentHandler.js");
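A quick way to see what the unescaped dot changes is to try both variants against a file name that has no real ".js" extension. The sketch below is illustrative only: the class name, the helper, and the second path are hypothetical.

```java
public class EscapedDotDemo {

    // Whole-string match, equivalent to Pattern.matches(regex, input).
    static boolean wholeMatch(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        String unescaped = "(.*)(lib/)(.*?).js$";
        String escaped = "(.*)(lib/)(.*?)\\.js$";

        String real = "src/main/lib/abc/DocumentHandler.js";
        System.out.println(wholeMatch(unescaped, real)); // true
        System.out.println(wholeMatch(escaped, real));   // true

        // Hypothetical path ending in "Xjs": only the unescaped dot accepts it.
        String bogus = "src/main/lib/abc/DocumentHandlerXjs";
        System.out.println(wholeMatch(unescaped, bogus)); // true
        System.out.println(wholeMatch(escaped, bogus));   // false
    }
}
```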

RegEx: Grabbing value between quotation marks from string

This is related to: RegEx: Grabbing values between quotation marks.
If there is a String like this:
HYPERLINK "hyperlink_funda.docx" \l "Sales"
The regex given on the link
(["'])(?:(?=(\\?))\2.)*?\1
is giving me
[" HYPERLINK ", " \l ", " "]
Which regex will return the values enclosed in quotation marks (specifically between the \" marks)?
["hyperlink_funda.docx", "Sales"]
I am using Java, with String.split(String regex).

That regex is not meant to be used with the .split() method. Instead, use a Pattern with capturing groups:
{
Pattern pattern = Pattern.compile("([\"'])((?:(?=(\\\\?))\\3.)*?)\\1");
Matcher matcher = pattern.matcher(" HYPERLINK \"hyperlink_funda.docx\" \\l \"Sales\" ");
while (matcher.find())
System.out.println(matcher.group(2));
}
Output:
hyperlink_funda.docx
Sales

I think you are misunderstanding the nature of the String.split method. Its job is to find a way of splitting a string by matching the features of the separator, not by matching features of the strings you want returned.
Instead you should use a Pattern and a Matcher:
String txt = " HYPERLINK \"hyperlink_funda.docx\" \\l \"Sales\" ";
String re = "\"([^\"]*)\"";
Pattern p = Pattern.compile(re);
Matcher m = p.matcher(txt);
ArrayList<String> matches = new ArrayList<String>();
while (m.find()) {
String match = m.group(1);
matches.add(match);
}
System.out.println(matches);
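To illustrate the point about split describing the separator rather than the wanted values, the sketch below splits the same text on the quote character and compares that with the Matcher approach. The class name SplitVsMatcher and both helper names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitVsMatcher {

    // split() treats the regex as the separator, so text outside the quotes is returned too.
    static String[] splitOnQuotes(String txt) {
        return txt.split("\"");
    }

    // The Matcher describes the wanted values directly and captures them in group 1.
    static List<String> quotedValues(String txt) {
        List<String> values = new ArrayList<>();
        Matcher m = Pattern.compile("\"([^\"]*)\"").matcher(txt);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String txt = " HYPERLINK \"hyperlink_funda.docx\" \\l \"Sales\" ";
        // Five pieces: the quoted values sit at the odd indexes, mixed with the surrounding text.
        System.out.println(Arrays.toString(splitOnQuotes(txt)));
        System.out.println(quotedValues(txt)); // [hyperlink_funda.docx, Sales]
    }
}
```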

Get an array of Strings matching a pattern from a String

I have a long string, let's say:
I like this #computer and I want to buy it from #XXXMall.
I know the regular expression pattern is
Pattern tagMatcher = Pattern.compile("[#]+[A-Za-z0-9-_]+\\b");
Now I want to get all the hashtags in an array. How can I use this expression to get an array of all hashtags from the string, something like:
ArrayList hashtags = getArray(pattern, str)

You can write it like this:
private static List<String> getArray(Pattern tagMatcher, String str) {
Matcher m = tagMatcher.matcher(str);
List<String> l = new ArrayList<String>();
while(m.find()) {
String s = m.group(); //will give you "#computer"
s = s.substring(1); // will give you just "computer"
l.add(s);
}
return l;
}
You can also use \\w in place of A-Za-z0-9_, making the regex [#]+[\\w-]+\\b

The following explanation of find() should help you achieve what you want:
The find() method searches for occurrences of the regular expressions
in the text passed to the Pattern.matcher(text) method, when the
Matcher was created. If multiple matches can be found in the text, the
find() method will find the first, and then for each subsequent call
to find() it will move to the next match.
The methods start() and end() will give the indexes into the text
where the found match starts and ends.
Example:
String text =
"This is the text which is to be searched " +
"for occurrences of the word 'is'.";
String patternString = "is";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);
int count = 0;
while(matcher.find()) {
count++;
System.out.println("found: " + count + " : "
+ matcher.start() + " - " + matcher.end());
}
That should give you the idea.

Here is one way, using Matcher
Pattern tagMatcher = Pattern.compile("#+[-\\w]+\\b");
Matcher m = tagMatcher.matcher(stringToMatch);
ArrayList<String> hashtags = new ArrayList<>();
while (m.find()) {
hashtags.add(m.group());
}
I took the liberty of simplifying your regex. # does not need to be in a character class. [A-Za-z0-9_] is the same as \w, so [A-Za-z0-9-_] is the same as [-\w]
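On Java 9 or later, the same collection loop could also be written with Matcher.results(), which streams one MatchResult per match. This is a sketch under that assumption; the class name HashtagStream and the method hashtags are hypothetical.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class HashtagStream {

    private static final Pattern TAG = Pattern.compile("#+[-\\w]+\\b");

    // Collects every full match (including the leading '#') into a list.
    static List<String> hashtags(String text) {
        return TAG.matcher(text)
                  .results()               // Stream<MatchResult>, available since Java 9
                  .map(MatchResult::group) // group() is the whole match
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "I like this #computer and I want to buy it from #XXXMall.";
        System.out.println(hashtags(text)); // [#computer, #XXXMall]
    }
}
```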

You can use :
String val="I like this #computer and I want to buy it from #XXXMall.";
String REGEX = "(?<=#)[A-Za-z0-9-_]+";
List<String> list = new ArrayList<String>();
Pattern pattern = Pattern.compile(REGEX);
Matcher matcher = pattern.matcher(val);
while(matcher.find()){
list.add(matcher.group());
}
(?<=#) is a positive lookbehind: it asserts that the match is immediately preceded by a literal # without including the # in the match.
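The effect of the lookbehind is easiest to see next to the same pattern with a plain leading #. The sketch below is illustrative; the class name LookbehindDemo and the helper findAll are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindDemo {

    // Returns the full text of every match of the regex in the input.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String val = "I like this #computer and I want to buy it from #XXXMall.";
        // A plain '#' is consumed and therefore part of the match.
        System.out.println(findAll("#[A-Za-z0-9-_]+", val));      // [#computer, #XXXMall]
        // The lookbehind only checks for '#', so it stays out of the match.
        System.out.println(findAll("(?<=#)[A-Za-z0-9-_]+", val)); // [computer, XXXMall]
    }
}
```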

You can use the following code to get the names:
String saa = "#{akka}nikhil#{kumar}aaaaa";
Pattern regex = Pattern.compile("#\\{(.*?)\\}");
Matcher m = regex.matcher(saa);
while(m.find()) {
String s = m.group(1);
System.out.println(s);
}
It will print
akka
kumar
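The ? in (.*?) makes the quantifier reluctant, which is what keeps each capture inside its own pair of braces. As a sketch of what happens without it (the class name LazyVsGreedy and the helper captures are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyVsGreedy {

    // Returns group(1) of every match of the regex in the input.
    static List<String> captures(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String saa = "#{akka}nikhil#{kumar}aaaaa";
        // Reluctant: stops at the first closing brace after each "#{".
        System.out.println(captures("#\\{(.*?)\\}", saa)); // [akka, kumar]
        // Greedy: runs on to the last closing brace in the string.
        System.out.println(captures("#\\{(.*)\\}", saa));  // [akka}nikhil#{kumar]
    }
}
```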
