Java - regular expression to check if begins and ends with certain characters

Consider a string in the following format:
[ABCD:defg] [MSG:information] [MSG2:hello]
How can I write a regex that checks whether the line contains '[MSG:' followed by some message and ']', and extracts the text 'information' from the string above?

You can use the regex, \[MSG:(.*?)\] and extract the value of group(1).
Demo:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Main {
    public static void main(String[] args) {
        String str = "[ABCD:defg] [MSG:information] [MSG2:hello]";
        // The reluctant (.*?) stops at the first ']' after "[MSG:"
        Matcher matcher = Pattern.compile("\\[MSG:(.*?)\\]").matcher(str);
        if (matcher.find()) {
            System.out.println(matcher.group(1));
        }
    }
}
Output:
information

Your requirement would be something like
/\[MSG:[^\]]+\]/ in standard regex notation (a greedy .+ would run on to the last ']' on the line). Alternatively, you could use String.indexOf to extract the information:
String str = ...
// Assumes both "MSG:" and "]" are present; indexOf returns -1 otherwise
int idx = str.indexOf("MSG:");
int idx2 = str.indexOf("]", idx);
String val = str.substring(idx + "MSG:".length(), idx2);
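The indexOf approach can be made safe against lines that lack the marker. The following is a minimal sketch, not the answerer's code: the class name MsgFieldExtractor and the method extractMsg are hypothetical, and it searches for "[MSG:" (including the bracket) on the assumption that the field always starts with one.

```java
import java.util.Optional;

class MsgFieldExtractor {
    // Hypothetical helper: returns the text between "[MSG:" and the next ']', if both exist.
    static Optional<String> extractMsg(String line) {
        String marker = "[MSG:";
        int start = line.indexOf(marker);
        if (start < 0) {
            return Optional.empty();   // no "[MSG:" in the line
        }
        int from = start + marker.length();
        int end = line.indexOf(']', from);
        if (end < 0) {
            return Optional.empty();   // opening marker without a closing ']'
        }
        return Optional.of(line.substring(from, end));
    }

    public static void main(String[] args) {
        // Prints Optional[information]
        System.out.println(extractMsg("[ABCD:defg] [MSG:information] [MSG2:hello]"));
        // Prints Optional.empty
        System.out.println(extractMsg("[ABCD:defg] [MSG2:hello]"));
    }
}
```

Returning an Optional lets the caller tell "no MSG field" apart from an empty message, which a bare substring call cannot do.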

Related

Regex to split the first from a "/token1/token2/token3"

I'm pretty rusty with regex, but I have the requirement to extract the first token of the following string:
Input: /token1/token2/token3
Required output: /token1
I have tried:
List<String> connectorPath = Splitter.on("^[/\\w+]+")
.trimResults()
.splitToList(actionPath);
Doesn't work for me, any ideas?

Instead of splitting, you can match the first token. The patterns below are written as they appear inside a Java string literal:
^/\\w+
Or, if the string always has three parts, use a capture group for the first part:
^(/\\w+)/\\w+/\\w+$
Java example
Pattern pattern = Pattern.compile("^/\\w+");
Matcher matcher = pattern.matcher("/token1/token2/token3");
if (matcher.find()) {
    System.out.println(matcher.group(0));
}
Output
/token1

You can split on the / that is not at the string start using the (?!^)/ regex:
String[] res = "/token1/token2/token3".split("(?!^)/");
System.out.println(res[0]); // => /token1
(?!^) - a negative lookahead that fails at the start of the string, so the match can begin anywhere except index 0
/ - a literal / character.
Using Guava:
Splitter splitter = Splitter.onPattern("(?!^)/").trimResults();
Iterable<String> iterable = splitter.split(actionPath);
String first = Iterables.getFirst(iterable, "");
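If only the first token is needed, the same lookahead split can be capped with a limit so that the rest of the path is left untouched. This is a minimal JDK-only sketch; the class FirstPathToken and the method firstToken are hypothetical names used for illustration.

```java
class FirstPathToken {
    // Hypothetical helper: returns the leading "/token" of a path such as "/token1/token2/token3".
    static String firstToken(String path) {
        // (?!^)/ matches a '/' anywhere except at index 0; limit 2 stops after the first split.
        String[] parts = path.split("(?!^)/", 2);
        return parts[0];
    }

    public static void main(String[] args) {
        System.out.println(firstToken("/token1/token2/token3")); // /token1
        System.out.println(firstToken("/token1"));               // /token1
    }
}
```

With the limit of 2, the second array element (when present) is the unsplit remainder, here token2/token3.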

You are over-complicating it.
Try the following regular expression: ^(\/\w+)(.+)$
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class PathSplitter {
    public static void main(String[] args) {
        String input = "/token1/token2/token3";
        Pattern pattern = Pattern.compile("^(\\/\\w+)(.+)$");
        Matcher matcher = pattern.matcher(input);
        if (matcher.find()) {
            System.out.println(matcher.group(1)); // /token1
            System.out.println(matcher.group(2)); // /token2/token3
        } else {
            System.out.println("NO MATCH");
        }
    }
}

Remove double quotes from output Java

I am trying to extract a url from the string. But I am unable to skip the double quotes in the output.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Main {
    public static void main(String[] args) {
        String s1 = "<a id=\"BUTTON_LINK\" style=\"%%BUTTON_LINK%%\" target=\"_blank\" href=\"https://||domainName||/basketReviewPageLoadAction.do\">%%CHECKOUT%%</a>";
        //System.out.println(s1);
        Pattern pattern = Pattern.compile("\\s*(?i)href\\s*=\\s*(\"([^\"]*\")|'[^']*'|([^'\">\\s]+))");
        Matcher matcher = pattern.matcher(s1);
        if (matcher.find()) {
            String url = matcher.group(1);
            System.out.println(url);
        }
    }
}
My Output is:
"https://||domainName||/basketReviewPageLoadAction.do"
Expected Output is:
https://||domainName||/basketReviewPageLoadAction.do
I cannot simply do a string replace: I have to add a few GET parameters to this URL and then put it back into the original string.

Regex: (?<=href=")([^\"]*) Substitution: $1?params...
Details:
(?<=) Positive Lookbehind
() Capturing group
[^] Match a single character not present in the list
* Matches between zero and unlimited times
$1 Group 1.
Java code:
With replaceAll you can append your parameters (here ?abc=12) to the text captured by group $1, which in this case is the href value.
String text = "<a id=\"BUTTON_LINK\" style=\"%%BUTTON_LINK%%\" target=\"_blank\" href=\"https://||domainName||/basketReviewPageLoadAction.do\">%%CHECKOUT%%</a>";
text = text.replaceAll("(?<=href=\")([^\"]*)", String.format("$1%s", "?abc=12"));
System.out.print(text);
Output:
<a id="BUTTON_LINK" style="%%BUTTON_LINK%%" target="_blank" href="https://||domainName||/basketReviewPageLoadAction.do?abc=12">%%CHECKOUT%%</a>
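One caveat with replaceAll is that '$' and '\' have special meaning in the replacement string, so parameters containing them need escaping. The sketch below shows one way to handle that with Matcher.quoteReplacement; the class HrefParamAppender and the method appendToHref are hypothetical, and the pattern assumes a lowercase, double-quoted href with no spaces around '='.

```java
import java.util.regex.Matcher;

class HrefParamAppender {
    // Hypothetical helper: appends a query string to the value of a double-quoted href attribute.
    static String appendToHref(String html, String query) {
        // quoteReplacement escapes '$' and '\' so the query is inserted literally after group 1.
        String replacement = "$1" + Matcher.quoteReplacement(query);
        return html.replaceAll("(?<=href=\")([^\"]*)", replacement);
    }

    public static void main(String[] args) {
        String html = "<a target=\"_blank\" href=\"https://||domainName||/basketReviewPageLoadAction.do\">%%CHECKOUT%%</a>";
        // The '$' in the query would otherwise be read as a group reference.
        System.out.println(appendToHref(html, "?abc=12&price=$5"));
    }
}
```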

You can try one of these options:
System.out.println(url.replaceAll("^\"|\"$", ""));
System.out.println(url.substring(1, url.length()-1));

It's ugly, but it seems to work. Hope this helps.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
class Main {
    public static void main(String[] args) {
        String s1 = "<a id=\"BUTTON_LINK\" style=\"%%BUTTON_LINK%%\" target=\"_blank\" href= \"https://||domainName||/basketReviewPageLoadAction.do\">%%CHECKOUT%%</a>";
        //System.out.println(s1);
        Pattern pattern = Pattern.compile("\\s*(?i)href\\s*=\\s*(\"([^\"]*)\"|'([^']*)'|([^'\">\\s]+))");
        Matcher matcher = pattern.matcher(s1);
        if (matcher.find()) {
            String url = Stream.of(matcher.group(2), matcher.group(3), matcher.group(4))
                    .filter(s -> s != null)
                    .collect(Collectors.joining());
            System.out.print(url);
        }
    }
}

This solution worked for now.
Pattern pattern = Pattern.compile("\\s*(?i)href\\s*=\\s*\"([^\"]*)");

You can try this:
// Note: this removes every double quote in s1, not only the ones around the URL
s1 = s1.replace("\"", "");

How to preserve delimiters while using String.split() in Java?

String TextValue = "hello{MyVar} Discover {MyVar2} {MyVar3}";
String[] splitString = TextValue.split("\\{*\\}");
The output I get in splitString is [{MyVar, {MyVar2, {MyVar3].
But I need to preserve the delimiters {}, i.e. [{MyVar}, {MyVar2}, {MyVar3}].
I need a way to produce the output above.

Use something like so:
Pattern p = Pattern.compile("(\\{\\w+\\})");
String str = ...
Matcher m = p.matcher(str);
while (m.find()) {
    System.out.println(m.group(1));
}
Note, the code above is untested but that will look for words within curly brackets and place them in a group. It will then go over the string and output any string which matches the expression above.
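If String.split() itself has to be used, zero-width lookarounds keep the braces because the split points consume no characters. This is a sketch of that alternative technique, not the matcher loop above; the class BraceTokenSplitter and the method braceTokens are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class BraceTokenSplitter {
    // Hypothetical helper: splits around {...} tokens without consuming the braces,
    // then keeps only the pieces that are brace-delimited.
    static List<String> braceTokens(String text) {
        // (?<=\}) splits just after a '}', (?=\{) splits just before a '{'; both are zero-width.
        String[] pieces = text.split("(?<=\\})|(?=\\{)");
        List<String> tokens = new ArrayList<>();
        for (String piece : pieces) {
            if (piece.startsWith("{") && piece.endsWith("}")) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [{MyVar}, {MyVar2}, {MyVar3}]
        System.out.println(braceTokens("hello{MyVar} Discover {MyVar2} {MyVar3}"));
    }
}
```

The raw split also yields the text between tokens ("hello", " Discover ", " "), which is why the sketch filters the pieces afterwards.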

Thanks kelvin & npinti.
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class CreateMatcherExample {
    public static void main(String[] args) {
        String TextValue = "hello{MyVar} Discover {My_Var2} {My_Var3}";
        String patternString = "\\{\\w+\\}";
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(TextValue);
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}

Regex after a special character in Java

I am using a regex in Java to get a specific output from a list of rooms at my university.
An excerpt from the list looks like this:
(A55:G260) Laboratorium 260
(A55:G292) Grupperom 292
(A55:G316) Grupperom 316
(A55:G366) Grupperom 366
(HDS:FLØYEN) Fløyen (appendix)
(ODO:PC-STUE) Pulpakammeret (PC-stue)
(SALEM:KONF) Konferanserom
I want to get the value that comes between the colon and the parenthesis.
The regex I am using at the moment is:
pattern = Pattern.compile("[:]([A-Za-z0-9ÆØÅæøå-]+)");
matcher = pattern.matcher(room.text());
I've included ÆØÅ, because some of the rooms have Norwegian letters in them.
Unfortunately, the output also includes the building code (e.g. "A55"). It comes out like this:
A55
A55
A55
:G260
:G292
:G316
Any ideas on how to solve this?

The problem is not your regular expression. You need to reference group(1) for the match result.
while (matcher.find()) {
    System.out.println(matcher.group(1));
}
However, you may consider using a negated character class instead.
pattern = Pattern.compile(":([^)]+)");
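To make the difference between group(0) and group(1) concrete, here is a small sketch that applies the negated character class to several room lines. The class RoomCodeExtractor and the method roomCodes are hypothetical, and the sample lines are taken from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RoomCodeExtractor {
    // ':' followed by one or more characters that are not ')'; group 1 excludes the colon.
    private static final Pattern ROOM = Pattern.compile(":([^)]+)");

    // Hypothetical helper: collects the room code from each "(BUILDING:ROOM) description" line.
    static List<String> roomCodes(List<String> lines) {
        List<String> codes = new ArrayList<>();
        for (String line : lines) {
            Matcher m = ROOM.matcher(line);
            if (m.find()) {
                // group(0) would be ":G260"; group(1) is "G260"
                codes.add(m.group(1));
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "(A55:G260) Laboratorium 260",
                "(ODO:PC-STUE) Pulpakammeret (PC-stue)",
                "(SALEM:KONF) Konferanserom");
        System.out.println(roomCodes(lines)); // [G260, PC-STUE, KONF]
    }
}
```

Because the class [^)] accepts any character except ')', no explicit list of Norwegian letters is needed.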

You can try a regex like this:
public static void main(String[] args) {
    String s = "(HDS:FLØYEN) Fløyen (appendix)";
    // Select everything after ":" up to the first ")" and replace the whole string with the selected data
    System.out.println(s.replaceAll(".*?:(.*?)\\).*", "$1"));
    String s1 = "ODO:PC-STUE) Pulpakammeret (PC-stue)";
    System.out.println(s1.replaceAll(".*?:(.*?)\\).*", "$1"));
}
Output:
FLØYEN
PC-STUE

You can also try plain String operations, as follows:
String val = "(HDS:FLØYEN) Fløyen (appendix)";
if (val.contains(":")) {
    String valSub = val.split("\\s")[0];
    System.out.println(valSub);
    valSub = valSub.substring(1, valSub.length() - 1);
    String valA = valSub.split(":")[0];
    String valB = valSub.split(":")[1];
    System.out.println(valA);
    System.out.println(valB);
}
Output:
(HDS:FLØYEN)
HDS
FLØYEN

import java.util.regex.Matcher;
import java.util.regex.Pattern;
class test {
    public static void main(String[] args) {
        // String to be scanned to find the pattern.
        String line = "(HDS:FLØYEN) Fløyen (appendix)";
        String pattern = ":([^)]+)";
        // Create a Pattern object
        Pattern r = Pattern.compile(pattern);
        // Now create matcher object.
        Matcher m = r.matcher(line);
        while (m.find()) {
            System.out.println(m.group(1));
        }
    }
}

how to get character length of the unicode along with space in java

I need to find the length of my string "பாரதீய ஜனதா இளைஞர் அணி தலைவர் அனுராக்சிங் தாகூர் எம்.பி. நேற்று தேர்தல் ஆணையர் வி.சம்பத்". I get a length of 45, but I expect 59. I need to extend the regular expression so that spaces and dots (.) are counted as well. My code:
import java.util.*;
import java.lang.*;
import java.util.regex.*;
class UnicodeLength {
    public static void main(String[] args) {
        String s = "பாரதீய ஜனதா இளைஞர் அணி தலைவர் அனுராக்சிங் தாகூர் எம்பி நேற்று தேர்தல் ஆணையர் விசம்பத்";
        List<String> characters = new ArrayList<String>();
        Pattern pat = Pattern.compile("\\p{L}\\p{M}*");
        Matcher matcher = pat.matcher(s);
        while (matcher.find()) {
            characters.add(matcher.group());
        }
        // Test if we have the right characters and length
        System.out.println(characters);
        System.out.println("String length: " + characters.size());
    }
}

The code below worked for me. There were three issues that I fixed:
I added a check for spaces to your regular expression.
I added a check for punctuation to your regular expression.
I pasted the string from your comment into the string in your code. They weren't the same!
Here's the code:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class UnicodeLength {
    public static void main(String[] args) {
        String s = "பாரதீய ஜனதா இளைஞர் அணி தலைவர் அனுராக்சிங் தாகூர் எம்.பி. நேற்று தேர்தல் ஆணையர் வி.சம்பத்";
        List<String> characters = new ArrayList<String>();
        // Punctuation, or a letter with its combining marks, or a single space
        Pattern pat = Pattern.compile("\\p{P}|\\p{L}\\p{M}*| ");
        Matcher matcher = pat.matcher(s);
        while (matcher.find()) {
            characters.add(matcher.group());
        }
        // Test if we have the right characters and length
        int i = 1;
        for (String character : characters) {
            System.out.println(String.format("%d = [%s]", i++, character));
        }
        System.out.println("Characters Size: " + characters.size());
    }
}
It's probably worth pointing out that your code is remarkably similar to the solution to another Stack Overflow question. One comment on that solution in particular led me to the missing punctuation check in your code, and made me notice that the string in your comment did not match the string in your code.
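The counting idea can be checked on a short input whose expected result is easy to verify by hand. The sketch below is an illustration under stated assumptions: the class VisibleCharCounter and the method count are hypothetical, and it uses \s (any whitespace) where the answer uses a literal space. Digits and symbols are not counted by this pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VisibleCharCounter {
    // One unit = a letter plus any combining marks, or one punctuation sign, or one whitespace.
    private static final Pattern UNIT = Pattern.compile("\\p{L}\\p{M}*|\\p{P}|\\s");

    // Hypothetical helper: counts user-visible units according to the pattern above.
    static int count(String text) {
        Matcher m = UNIT.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Tamil TA + vowel sign AA, a space, Tamil VA + vowel sign I, and a dot,
        // written as escapes so the source file stays ASCII.
        String sample = "\u0BA4\u0BBE \u0BB5\u0BBF.";
        System.out.println(sample.length()); // 6 UTF-16 code units
        System.out.println(count(sample));   // 4 units: two letter+mark pairs, one space, one dot
    }
}
```

String.length() reports UTF-16 code units, which is why it differs from the count a reader would expect for scripts that use combining vowel signs.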
