Java pattern matching using regex


I am new to Java and to pattern matching. I am reading this string from a file; written as a literal without escaping it would give a compilation error. I have a string as follows:
String str = "find(\"128.210.16.48\",\"Hello Everyone\")" ; // no compile error
I want to extract the values "128.210.16.48" and "Hello Everyone" from the above string. These values are not constant.
Can you please give me some suggestions?
Thanks

I suggest using the String#split() method, but if you still want a regex pattern, try the one below and read the match from group 1.
("[^"][\d\.]+"|"[^)]*+)
Online demo
Sample code:
String str = "find(\"128.210.16.48\",\"Hello Everyone\")";
String regex = "(\"[^\"][\\d\\.]+\"|\"[^)]*+)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}
output:
"128.210.16.48"
"Hello Everyone"
Pattern explanation:
( group and capture to \1:
" '"'
[^"] any character except: '"'
[\d\.]+ any character of: digits (0-9), '\.' (1 or more times (matching the most amount possible))
" '"'
| OR
" '"'
[^)]*+ any character except: ')' (0 or more times (matching the most amount possible), possessive, so nothing is given back by backtracking)
) end of \1

Try with String.split()
String str = "find(\"128.210.16.48\",\"Hello Everyone\")" ;
System.out.println(str.split(",")[0].split("\"")[1]);
System.out.println(str.split(",")[1].split("\"")[1]);
Output:
128.210.16.48
Hello Everyone
Explanation:
First split the string on the comma (,) and take the first element, str.split(",")[0]. Then split that element on the double quote ("), split("\"")[1], and take the second element of the resulting array. The second value is extracted the same way, starting from str.split(",")[1].

The accepted answer is fine, but if for some reason you (or whoever finds this question) still want to use a regex instead of String.split, here's something:
String str = "find(\"128.210.16.48\",\"Hello Everyone\")" ; // no compile error
String regex1 = "\".+?\"";
Pattern pattern1 = Pattern.compile(regex1);
Matcher matcher1 = pattern1.matcher(str);
while (matcher1.find()) {
    System.out.println("Matcher 1 found (trimmed): " + matcher1.group().replace("\"",""));
}
Output:
Matcher 1 found (trimmed): 128.210.16.48
Matcher 1 found (trimmed): Hello Everyone
Note: this will only work if " is used only as a delimiter and never appears inside a value. See Braj's demo from the comments here as an example.
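As a variation on the regex answers above, a capturing group can return the text between the quotes directly, so no replace() call is needed afterwards. This is only a minimal sketch: the class name QuotedValues and the method extract are made up for illustration, and it keeps the same assumption as the note above, namely that " never appears inside a value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any answer above.
class QuotedValues {
    // A quote, then group 1 = any run of non-quote characters, then a quote.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    static List<String> extract(String input) {
        List<String> values = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(input);
        while (matcher.find()) {
            // Group 1 excludes the surrounding quotes.
            values.add(matcher.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String str = "find(\"128.210.16.48\",\"Hello Everyone\")";
        for (String value : extract(str)) {
            System.out.println(value);
        }
    }
}
```

With the string from the question this prints 128.210.16.48 and Hello Everyone on separate lines.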

Related

How parse key-value with regex

I use Kotlin / Java to parse a string.
My regex:
\[\'(.*?)[\]]=\'(.*?)(?!\,)[\']
Text to parse:
someArray1['key1'] = 'value1', someArray2['key2'] = 'value2', ignoreText=ignore, some['key3'] = 'value3', ignoreMe['ignore']=ignore, some['key4'] = 'value4'..
I need this result:
key1=value1
key2=value2
key3=value3
key4=value4
Thanks for help

Another regex for you
\['(\w+)'\]\s+(=)\s+'(\w+)'
Regex101 Demo Fiddle
Java test code
String str = "someArray1['key1'] = 'value1', someArray2['key2'] = 'value2', ignoreText=ignore, some['key3'] = 'value3', ignoreMe['ignore']=ignore, some['key4'] = 'value4'..";
String regex = "\\['(\\w+)'\\]\\s+(=)\\s+'(\\w+)'";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
    System.out.println(matcher.group(1) + matcher.group(2) + matcher.group(3));
}
Test result:
key1=value1
key2=value2
key3=value3
key4=value4

A few notes about the pattern that you tried:
In your pattern you are not matching the spaces around the equals sign.
Also note that the lookahead in (?!\,)[\'] has no effect: it asserts that the next character is not a comma, and then [\'] requires that same character to be a single quote, so the assertion always succeeds.
You don't have to escape the ' and single characters do not have to be in a character class.
You can use a negated character class to capture the values between the single quotes, which prevents .*? from matching too much, as the dot can match any character.
You might write the pattern as
\['([^']*)'\]\h+=\h+'([^']*)'
The pattern matches:
\[' Match ['
( Capture group 1
[^']* Match optional chars other than '
) Close group 1
'\] Match ']
\h+=\h+ Match an equals sign between 1 or more horizontal whitespace characters
'([^']*)' Capture group 2 which has the same pattern as group 1
Regex demo | Java demo
Example
String regex = "\\['([^']*)'\\]\\h+=\\h+'([^']*)'";
String string = "someArray1['key1'] = 'value1', someArray2['key2'] = 'value2', ignoreText=ignore, some['key3'] = 'value3', ignoreMe['ignore']=ignore, some['key4'] = 'value4'..";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    System.out.println(matcher.group(1) + "=" + matcher.group(2));
}
Output
key1=value1
key2=value2
key3=value3
key4=value4
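The remark above about (?!\,)[\'] can be checked with a small experiment: a pattern with the lookahead and a pattern with only the quote should find matches at exactly the same positions. This is just an illustrative sketch; the class LookaheadCheck, the method matchStarts and the shortened input string are made up for the demonstration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper used only to compare two patterns.
class LookaheadCheck {
    // Returns the start index of every match of regex in input.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String input = "some['key3'] = 'value3', x";
        // (?!,)' means "the next char is not a comma", then a quote.
        // The next char is the quote itself, so the lookahead cannot fail.
        System.out.println(matchStarts("(?!,)'", input));
        System.out.println(matchStarts("'", input));
    }
}
```

Both lines print the same list of positions, which is why the lookahead can simply be dropped.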

In Java, how do you tokenize a string that contains the delimiter in the tokens?

Let's say I have the string:
String toTokenize = "prop1=value1;prop2=String test='1234';int i=4;;prop3=value3";
I want the tokens:
prop1=value1
prop2=String test='1234';int i=4;
prop3=value3
For backwards compatibility, I have to use the semicolon as a delimiter. I have tried wrapping code in something like CDATA:
String toTokenize = "prop1=value1;prop2=<![CDATA[String test='1234';int i=4;]]>;prop3=value3";
But I can't figure out a regular expression to ignore the semicolons that are within the cdata tags.
I've tried escaping the non-delimiter:
String toTokenize = "prop1=value1;prop2=String test='1234'\\;int i=4\\;;prop3=value3";
But then there is an ugly mess of removing the escape characters.
Do you have any suggestions?

You may match either <![CDATA...]]> or any char other than ;, 1 or more times, to match the values. To match the keys, you may use a regular \w+ pattern:
(\w+)=((?:<!\[CDATA\[.*?]]>|[^;])+)
See the regex demo.
Details
(\w+) - Group 1: one or more word chars
= - a = sign
((?:<!\[CDATA\[.*?]]>|[^;])+) - Group 2: one or more sequences of
<!\[CDATA\[.*?]]> - a <![CDATA[...]]> substring
| - or
[^;] - any char but ;
See a Java demo:
String rx = "(\\w+)=((?:<!\\[CDATA\\[.*?]]>|[^;])+)";
String s = "prop1=value1;prop2=<![CDATA[String test='1234';int i=4;]]>;prop3=value3";
Pattern pattern = Pattern.compile(rx);
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    System.out.println(matcher.group(1) + " => " + matcher.group(2));
}
Results:
prop1 => value1
prop2 => <![CDATA[String test='1234';int i=4;]]>
prop3 => value3
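The values above still carry the <![CDATA[...]]> wrapper. If the wrapper should be stripped after matching, one option is a second, small regex applied to each value. The following is only a sketch under that assumption; the class CdataProps and the method parse are hypothetical names, and the first pattern is the one from the answer above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper built around the pattern from the answer above.
class CdataProps {
    private static final Pattern PAIR =
            Pattern.compile("(\\w+)=((?:<!\\[CDATA\\[.*?]]>|[^;])+)");
    // Group 1 is the text inside one CDATA wrapper.
    private static final Pattern CDATA = Pattern.compile("<!\\[CDATA\\[(.*?)]]>");

    static Map<String, String> parse(String input) {
        // LinkedHashMap keeps the keys in the order they were found.
        Map<String, String> props = new LinkedHashMap<>();
        Matcher matcher = PAIR.matcher(input);
        while (matcher.find()) {
            // Replace each CDATA wrapper with its inner text.
            String value = CDATA.matcher(matcher.group(2)).replaceAll("$1");
            props.put(matcher.group(1), value);
        }
        return props;
    }

    public static void main(String[] args) {
        String s = "prop1=value1;prop2=<![CDATA[String test='1234';int i=4;]]>;prop3=value3";
        parse(s).forEach((key, value) -> System.out.println(key + " => " + value));
    }
}
```

For the sample string, prop2 then maps to String test='1234';int i=4; without the wrapper, while prop1 and prop3 are unchanged.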

Prerequisites:
All your tokens start with prop.
prop does not occur anywhere in the input other than at the beginning of a token.
I'd just replace every ;prop with ~prop (assuming ~ does not occur in the input).
Your string then becomes:
"prop1=value1~prop2=String test='1234';int i=4;~prop3=value3"
You can then tokenize using the ~ delimiter.
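The replace-then-split idea described above might look like this in code. It is a sketch only: the class PropTokenizer and the method tokenize are made-up names, and it relies on the stated prerequisites plus the extra assumption that ~ never occurs in the input.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; assumes every token starts with "prop"
// and that '~' never appears in the input.
class PropTokenizer {
    static List<String> tokenize(String input) {
        // Mark only the semicolons that precede a new "prop" token,
        // then split on the marker.
        String marked = input.replace(";prop", "~prop");
        return Arrays.asList(marked.split("~"));
    }

    public static void main(String[] args) {
        String toTokenize = "prop1=value1;prop2=String test='1234';int i=4;;prop3=value3";
        for (String token : tokenize(toTokenize)) {
            System.out.println(token);
        }
    }
}
```

For the string from the question this prints the three tokens asked for, with the semicolons inside prop2 left untouched.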

Java repeated character regex with condition

I have a large database and I want to check it for capitalization errors. I use this pattern to find repeated characters. The pattern works, but I need to add start and end conditions on the string.
Pattern:
(\w)\1+
Target String:
Javaaa
result: aaa
I want to add a condition to the regex: the string must start with Ja and end with a. The result must be only the repeated characters.
(I don't want to handle this programmatically; the regex alone should do it, if possible.)
(I'm doing this with String.replaceAll(regex, string), not with the Pattern or Matcher classes.)

You may use a lookahead anchored at the leading word boundary:
\b(?=Ja\w*a\b)\w*?((\w)\2+)\w*\b
See the regex demo
Details:
\b - leading word boundary
(?=Ja\w*a\b) - a positive lookahead that requires the whole word to start with Ja, then it can have 0+ word characters and end with a
\w*? - 0+ word characters but as few as possible
((\w)\2+) - Group 1 matching identical consecutive characters
\w* - any remaining word characters (0 or more)
\b - trailing word boundary.
The result you are seeking is in Group 1.
String s = "Prooo\nJavaaa";
Pattern pattern = Pattern.compile("\\b(?=Ja\\w*a\\b)\\w*?((\\w)\\2+)\\w*\\b");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}
See the Java demo.
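Since the question asks for String.replaceAll rather than Pattern and Matcher, the same pattern can also be used with a $1 replacement, which swaps each matching word for its Group 1. This is a sketch, not a tested production solution; the class RepeatedChars and the method keepRepeats are hypothetical names.

```java
// Hypothetical helper showing the pattern above with String.replaceAll.
class RepeatedChars {
    private static final String REGEX =
            "\\b(?=Ja\\w*a\\b)\\w*?((\\w)\\2+)\\w*\\b";

    static String keepRepeats(String input) {
        // Each word that starts with "Ja" and ends with "a" is replaced
        // by its first run of repeated characters (Group 1).
        // Words that do not match are left as they are.
        return input.replaceAll(REGEX, "$1");
    }

    public static void main(String[] args) {
        System.out.println(keepRepeats("Javaaa"));
        System.out.println(keepRepeats("Prooo"));
    }
}
```

Here Javaaa becomes aaa, while Prooo is left unchanged because it fails the lookahead.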

Another code example (inspired by @Wiktor Stribizew's code), following your expected input and output format.
public static void main( String[] args )
{
    String[] input =
        { "Javaaa", "Javaaaaaaaaa", "Javaaaaaaaaaaaaaaaaaa", "Paoooo", "Paoooooooo", "Paooooooooxxxxxxxxx" };
    for ( String str : input )
    {
        System.out.println( "Target String :" + str );
        Pattern pattern = Pattern.compile( "((.)\\2+)" );
        Matcher matcher = pattern.matcher( str );
        while ( matcher.find() )
        {
            System.out.println( "result: " + matcher.group() );
        }
        System.out.println( "---------------------" );
    }
    System.out.println( "Finish" );
}
Output:
Target String :Javaaa
result: aaa
---------------------
Target String :Javaaaaaaaaa
result: aaaaaaaaa
---------------------
Target String :Javaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaa
---------------------
Target String :Paoooo
result: oooo
---------------------
Target String :Paoooooooo
result: oooooooo
---------------------
Target String :Paooooooooxxxxxxxxx
result: oooooooo
result: xxxxxxxxx
---------------------
Finish

Java Split String by colon on both side

Can you suggest an approach to split a String like this:
:31C:150318
:31D:150425 IN BANGLADESH
:20:314015040086
I tried to parse that string with the regular expression
:[A-za-z]|\\d:
but it is not working. Please suggest a regular expression that splits the string with 20, 31C, 31D etc. as keys and 150318, 150425 IN BANGLADESH etc. as values.
If I use string.split(":"), it does not serve my purpose.
If a string is like:
:20: MY VALUES : ARE HERE
then it is split into 3 strings: key 20 is associated with "MY VALUES", and "ARE HERE" is not associated with key 20.

You may use matching instead of splitting, since you need to match a specific colon in the string.
The regex that captures the text between the first and second colon into one group, and everything after the second colon into another, looks like
^:([^:]*):(.*)$
See demo. The ^ asserts the beginning of the string, ([^:]*) matches and captures into Group 1 zero or more characters other than :, and (.*) matches and captures into Group 2 the rest of the string. $ asserts the position at the end of a single-line string (as . matches any symbol but a newline without the Pattern.DOTALL modifier).
String s = ":20:AND:HERE";
Pattern pattern = Pattern.compile("^:([^:]*):(.*)$");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    System.out.println("Key: " + matcher.group(1) + ", Value: " + matcher.group(2) + "\n");
}
Result for this demo: Key: 20, Value: AND:HERE

You can use the following to split:
^[:]+([^:]+):

Try with split function of String class
String[] splited = string.split(":");
For your requirements:
String c = ":31D:150425 IN BANGLADESH:todasdsa";
c=c.substring(1);
System.out.println("C="+c);
String key= c.substring(0,c.indexOf(":"));
String value = c.substring(c.indexOf(":")+1);
System.out.println("key="+key+" value="+value);
Result:
C=31D:150425 IN BANGLADESH:todasdsa
key=31D value=150425 IN BANGLADESH:todasdsa
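The substring/indexOf steps above can also be written with the two-argument String.split(regex, limit): a limit of 2 stops after the first colon, so any later colons stay in the value. The sketch below applies that to several lines at once; the class ColonFields and the method parse are made-up names, and it assumes one field per line, each starting with a colon.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; assumes one ":KEY:VALUE" field per line.
class ColonFields {
    static Map<String, String> parse(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        // \R matches any line break (Java 8 and later).
        for (String line : text.split("\\R")) {
            if (!line.startsWith(":")) {
                continue; // skip lines that do not look like a field
            }
            // Drop the leading colon, then split on the first remaining colon only.
            String[] parts = line.substring(1).split(":", 2);
            if (parts.length == 2) {
                fields.put(parts[0], parts[1]);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String text = ":31C:150318\n:31D:150425 IN BANGLADESH\n:20: MY VALUES : ARE HERE";
        parse(text).forEach((key, value) -> System.out.println("key=" + key + " value=" + value));
    }
}
```

Values are kept exactly as they appear, including leading spaces; call trim() on them if that is not wanted.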

Regex for matching pattern within quotes

I have some input data such as
some string with 'hello' inside 'and inside'
How can I write a regex so that the quoted text is returned for all occurrences, no matter how many times it is repeated?
I have code that returns a single quoted string, but I want it to return multiple occurrences:
String mydata = "some string with 'hello' inside 'and inside'";
Pattern pattern = Pattern.compile("'(.*?)+'");
Matcher matcher = pattern.matcher(mydata);
while (matcher.find())
{
    System.out.println(matcher.group());
}

This finds all occurrences for me:
String mydata = "some '' string with 'hello' inside 'and inside'";
Pattern pattern = Pattern.compile("'[^']*'");
Matcher matcher = pattern.matcher(mydata);
while (matcher.find())
{
    System.out.println(matcher.group());
}
Output:
''
'hello'
'and inside'
Pattern description:
' // opening quote
[^'] // any character that is not a single quote
* // zero or more of those non-quote characters
' // closing quote

I believe this should fit your requirements (note that \w+ does not match spaces, so a quoted phrase such as 'and inside' would be skipped):
\'\w+\'

\'.*?' is the regex you are looking for.
