Java repeated character regex with condition - java

I have a large database and I want to check it for capitalization errors. I use this pattern to find repeated characters. The pattern works, but I need to add start and end conditions on the string.
Pattern:
(\w)\1+
Target String:
Javaaa
result: aaa
I want to add a condition to the regex: the string must start with Ja and end with a. The result must contain only the repeated characters.
(I don't want to handle this programmatically; the regex alone should do it, if possible.)
(I am doing this with String.replaceAll(regex, string), not with the Pattern or Matcher classes.)

You may use a lookahead anchored at the leading word boundary:
\b(?=Ja\w*a\b)\w*?((\w)\2+)\w*\b
See the regex demo
Details:
\b - leading word boundary
(?=Ja\w*a\b) - a positive lookahead that requires the whole word to start with Ja, then it can have 0+ word characters and end with a
\w*? - 0+ word characters but as few as possible
((\w)\2+) - Group 1 matching identical consecutive characters
\w* - any remaining word characters (0 or more)
\b - trailing word boundary.
The result you are seeking is in Group 1.
String s = "Prooo\nJavaaa";
Pattern pattern = Pattern.compile("\\b(?=Ja\\w*a\\b)\\w*?((\\w)\\2+)\\w*\\b");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}
See the Java demo.
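Since the question asks for String.replaceAll rather than Pattern/Matcher, the same pattern can probably be used with a $1 replacement, so that each matching word is replaced by Group 1. This is only a minimal sketch; the class name RepeatedRunDemo and the method name keepRepeatedRun are made up for illustration.

```java
class RepeatedRunDemo {
    // Same pattern as in the answer above: the word must start with "Ja" and end with "a".
    private static final String REGEX = "\\b(?=Ja\\w*a\\b)\\w*?((\\w)\\2+)\\w*\\b";

    // Replaces every qualifying word with its first run of repeated characters (Group 1).
    static String keepRepeatedRun(String input) {
        return input.replaceAll(REGEX, "$1");
    }

    public static void main(String[] args) {
        System.out.println(keepRepeatedRun("Javaaa")); // prints "aaa"
    }
}
```

Words that fail the lookahead, or that contain no repeated run, are left unchanged in the returned string, so this approach only isolates the repeated characters when the input is a single qualifying word.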

Another code example (inspired by @Wiktor Stribizew's code), following your expected input and output format. Note that this pattern matches any run of repeated characters and does not enforce the start and end conditions.
public static void main( String[] args )
{
    String[] input =
        { "Javaaa", "Javaaaaaaaaa", "Javaaaaaaaaaaaaaaaaaa", "Paoooo", "Paoooooooo", "Paooooooooxxxxxxxxx" };
    for ( String str : input )
    {
        System.out.println( "Target String :" + str );
        Pattern pattern = Pattern.compile( "((.)\\2+)" );
        Matcher matcher = pattern.matcher( str );
        while ( matcher.find() )
        {
            System.out.println( "result: " + matcher.group() );
        }
        System.out.println( "---------------------" );
    }
    System.out.println( "Finish" );
}
Output:
Target String :Javaaa
result: aaa
---------------------
Target String :Javaaaaaaaaa
result: aaaaaaaaa
---------------------
Target String :Javaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaa
---------------------
Target String :Paoooo
result: oooo
---------------------
Target String :Paoooooooo
result: oooooooo
---------------------
Target String :Paooooooooxxxxxxxxx
result: oooooooo
result: xxxxxxxxx
---------------------
Finish

Related

How parse key-value with regex

I use Kotlin / Java to parse a string.
My regex:
\[\'(.*?)[\]]=\'(.*?)(?!\,)[\']
Text to parse:
someArray1['key1'] = 'value1', someArray2['key2'] = 'value2', ignoreText=ignore, some['key3'] = 'value3', ignoreMe['ignore']=ignore, some['key4'] = 'value4'..
I need this result:
key1=value1
key2=value2
key3=value3
key4=value4
Thanks for help
Another regex for you
\['(\w+)'\]\s+(=)\s+'(\w+)'
Regex101 Demo Fiddle
Java test code
String str = "someArray1['key1'] = 'value1', someArray2['key2'] = 'value2', ignoreText=ignore, some['key3'] = 'value3', ignoreMe['ignore']=ignore, some['key4'] = 'value4'..";
String regex = "\\['(\\w+)'\\]\\s+(=)\\s+'(\\w+)'";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
    System.out.println(matcher.group(1) + matcher.group(2) + matcher.group(3));
}
Test result:
key1=value1
key2=value2
key3=value3
key4=value4
A few notes about the pattern that you tried
In your pattern you are not matching the spaces around the equals sign.
Also note that the lookahead in (?!\,)[\'] is redundant: it asserts that the next character is not a comma and then matches a single quote, so the assertion always succeeds.
You don't have to escape the ', and single characters do not have to be placed in a character class.
You can use a pattern with a negated character class to capture the values between the single quotes to prevent .*? matching too much as the dot can match any character.
You might write the pattern as
\['([^']*)'\]\h+=\h+'([^']*)'
The pattern matches:
\[' Match ['
( Capture group 1
[^']* Match optional chars other than '
) Close group 1
'\] Match ']
\h+=\h+ Match an equals sign between 1 or more horizontal whitespace characters
'([^']*)' Capture group 2 which has the same pattern as group 1
Regex demo | Java demo
Example
String regex = "\\['([^']*)'\\]\\h+=\\h+'([^']*)'";
String string = "someArray1['key1'] = 'value1', someArray2['key2'] = 'value2', ignoreText=ignore, some['key3'] = 'value3', ignoreMe['ignore']=ignore, some['key4'] = 'value4'..";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    System.out.println(matcher.group(1) + "=" + matcher.group(2));
}
Output
key1=value1
key2=value2
key3=value3
key4=value4
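If the pairs are needed for further processing rather than printing, the two capture groups could be collected into a map. The following is a sketch under that assumption; the class name KeyValueDemo and the method name parsePairs are hypothetical, and a LinkedHashMap is used only to keep the pairs in the order they appear in the text.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueDemo {
    // Negated character classes keep each capture inside its own pair of single quotes.
    private static final Pattern PAIR =
            Pattern.compile("\\['([^']*)'\\]\\h+=\\h+'([^']*)'");

    // Returns key -> value for every ['key'] = 'value' occurrence, in input order.
    static Map<String, String> parsePairs(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher matcher = PAIR.matcher(text);
        while (matcher.find()) {
            result.put(matcher.group(1), matcher.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "some['key3'] = 'value3', ignoreMe['ignore']=ignore, some['key4'] = 'value4'..";
        System.out.println(parsePairs(text)); // prints {key3=value3, key4=value4}
    }
}
```

Entries without whitespace around the equals sign, such as ignoreMe['ignore']=ignore, are skipped because \h+ requires at least one horizontal whitespace character on each side.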

In Java, how do you tokenize a string that contains the delimiter in the tokens?

Let's say I have the string:
String toTokenize = "prop1=value1;prop2=String test='1234';int i=4;;prop3=value3";
I want the tokens:
prop1=value1
prop2=String test='1234';int i=4;
prop3=value3
For backwards compatibility, I have to use the semicolon as a delimiter. I have tried wrapping code in something like CDATA:
String toTokenize = "prop1=value1;prop2=<![CDATA[String test='1234';int i=4;]]>;prop3=value3";
But I can't figure out a regular expression to ignore the semicolons that are within the cdata tags.
I've tried escaping the non-delimiter:
String toTokenize = "prop1=value1;prop2=String test='1234'\\;int i=4\\;;prop3=value3";
But then there is an ugly mess of removing the escape characters.
Do you have any suggestions?
You may match either <![CDATA...]]> or any char other than ;, 1 or more times, to match the values. To match the keys, you may use a regular \w+ pattern:
(\w+)=((?:<!\[CDATA\[.*?]]>|[^;])+)
See the regex demo.
Details
(\w+) - Group 1: one or more word chars
= - a = sign
((?:<!\[CDATA\[.*?]]>|[^;])+) - Group 2: one or more sequences of
<!\[CDATA\[.*?]]> - a <![CDATA[...]]> substring
| - or
[^;] - any char but ;
See a Java demo:
String rx = "(\\w+)=((?:<!\\[CDATA\\[.*?]]>|[^;])+)";
String s = "prop1=value1;prop2=<![CDATA[String test='1234';int i=4;]]>;prop3=value3";
Pattern pattern = Pattern.compile(rx);
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    System.out.println(matcher.group(1) + " => " + matcher.group(2));
}
Results:
prop1 => value1
prop2 => <![CDATA[String test='1234';int i=4;]]>
prop3 => value3
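The values above still carry the <![CDATA[...]]> wrapper. If the raw value is wanted, one possible follow-up step is to strip the wrapper from Group 2 after matching. This is a sketch, not part of the original answer; the class name CdataTokenizer and the method name parse are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CdataTokenizer {
    // Same key/value pattern as above.
    private static final Pattern PAIR =
            Pattern.compile("(\\w+)=((?:<!\\[CDATA\\[.*?]]>|[^;])+)");

    // Maps each key to its value with any CDATA wrapper removed.
    static Map<String, String> parse(String s) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher matcher = PAIR.matcher(s);
        while (matcher.find()) {
            // Keep only the text between <![CDATA[ and ]]>.
            String value = matcher.group(2).replaceAll("<!\\[CDATA\\[(.*?)]]>", "$1");
            result.put(matcher.group(1), value);
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "prop1=value1;prop2=<![CDATA[String test='1234';int i=4;]]>;prop3=value3";
        parse(s).forEach((k, v) -> System.out.println(k + " => " + v));
    }
}
```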
Prerequisite:
All your tokens start with prop
There is no prop in the file other than the beginning of a token
I'd just replace every ;prop with ~prop.
Then your string becomes:
"prop1=value1~prop2=String test='1234';int i=4;~prop3=value3";
You can then tokenize using the ~ delimiter (this assumes ~ never appears in the values).
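The replace-then-split idea can be sketched as follows. It relies on the prerequisites listed above (every token starts with prop, and prop appears nowhere else) plus the extra assumption that ~ does not occur in the data; the class name TildeSplitDemo and the method name tokenize are hypothetical.

```java
class TildeSplitDemo {
    // Turns each ";prop" boundary into "~prop", then splits on "~".
    static String[] tokenize(String s) {
        return s.replace(";prop", "~prop").split("~");
    }

    public static void main(String[] args) {
        String toTokenize = "prop1=value1;prop2=String test='1234';int i=4;;prop3=value3";
        for (String token : tokenize(toTokenize)) {
            System.out.println(token);
        }
    }
}
```

Semicolons that are not directly followed by prop stay inside their token, which is why the second token keeps its trailing semicolon.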

Java pattern matching using regex

I am new to Java and to pattern matching. I am reading this string from a file (the quotes are escaped below only so that the example compiles). I have a string as follows:
String str = "find(\"128.210.16.48\",\"Hello Everyone\")" ; // no compile error
I want to extract the values "128.210.16.48" and "Hello Everyone" from the string above. These values are not constant.
Can you please give me some suggestions?
Thanks
I suggest using the String#split() method, but if you still want a regex pattern, try this one and take the match from group 1.
("[^"][\d\.]+"|"[^)]*+)
Online demo
Sample code:
String str = "find(\"128.210.16.48\",\"Hello Everyone\")";
String regex = "(\"[^\"][\\d\\.]+\"|\"[^)]*+)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}
output:
"128.210.16.48"
"Hello Everyone"
Pattern explanation:
(            group and capture to \1:
  "          '"'
  [^"]       any character except: '"'
  [\d\.]+    any character of: digits (0-9), '\.' (1 or more times (matching the most amount possible))
  "          '"'
 |           OR
  "          '"'
  [^)]*+     any character except: ')' (0 or more times (matching the most amount possible, without backtracking))
)            end of \1
Try with String.split()
String str = "find(\"128.210.16.48\",\"Hello Everyone\")" ;
System.out.println(str.split(",")[0].split("\"")[1]);
System.out.println(str.split(",")[1].split("\"")[1]);
Output:
128.210.16.48
Hello Everyone
Edit:
Explanation:
For the first value, split the string on the comma (,) and take the first element, str.split(",")[0]. Then split that element on the double quote (") with split("\"") and take the second element, [1]. The second value is extracted the same way from str.split(",")[1].
The accepted answer is fine, but if you (or anyone else who finds this question) still want to use a regex instead of String.split, here is one option:
String str = "find(\"128.210.16.48\",\"Hello Everyone\")" ; // no compile error
String regex1 = "\".+?\"";
Pattern pattern1 = Pattern.compile(regex1);
Matcher matcher1 = pattern1.matcher(str);
while (matcher1.find()) {
    System.out.println("Matcher 1 found (trimmed): " + matcher1.group().replace("\"",""));
}
Output:
Matcher 1 found (trimmed): 128.210.16.48
Matcher 1 found (trimmed): Hello Everyone
Note: this will only work if " is only used as a separator character. See Braj's demo as an example from the comments here.
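A variation on the same idea is to put the text between the quotes in a capture group, so that no trimming is needed afterwards. This is a sketch under the same assumption that " is used only as a delimiter; the class name QuotedValuesDemo and the method name quotedValues are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedValuesDemo {
    // Group 1 captures everything between a pair of double quotes.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Returns the contents of every double-quoted segment, without the quotes.
    static List<String> quotedValues(String s) {
        List<String> values = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(s);
        while (matcher.find()) {
            values.add(matcher.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String str = "find(\"128.210.16.48\",\"Hello Everyone\")";
        System.out.println(quotedValues(str)); // prints [128.210.16.48, Hello Everyone]
    }
}
```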

Pattern matching for character and end of line

I have a string which is in following format:
I am extracting this Hello:A;B;C, also Hello:D;E;F
How do I extract the strings A;B;C and D;E;F?
I have written the code snippet below, but it is not able to extract the last match, D;E;F:
Pattern pattern = Pattern.compile("(?<=Hello:).*?(?=,)");
The $ anchor matches at the end of the input (or at the end of each line when Pattern.MULTILINE is set).
Thus this should work:
Pattern pattern = Pattern.compile("(?<=Hello:).*?(?=,|$)");
So you look ahead for either a comma or the end of the input.
Test.
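To see the difference the extra alternative makes, the lookahead pattern with ,|$ can be run against the string from the question. This is a minimal sketch; the class name HelloExtractor and the method name extract are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HelloExtractor {
    // Lazily matches text after "Hello:" up to the next comma or the end of the input.
    private static final Pattern PATTERN = Pattern.compile("(?<=Hello:).*?(?=,|$)");

    static List<String> extract(String s) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(s);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String s = "I am extracting this Hello:A;B;C, also Hello:D;E;F";
        System.out.println(extract(s)); // prints [A;B;C, D;E;F]
    }
}
```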
Try this:
String test = "I am extracting this Hello:Word;AnotherWord;YetAnotherWord, also Hello:D;E;F";
// any word optionally followed by ";" three times, the whole thing followed by either two non-word characters or EOL
Pattern pattern = Pattern.compile("(\\w+;?){3}(?=\\W{2,}|$)");
Matcher matcher = pattern.matcher(test);
while (matcher.find()) {
    System.out.println(matcher.group());
}
Output:
Word;AnotherWord;YetAnotherWord
D;E;F
Assuming you mean omitting certain patterns in a string:
String s = "I am extracting this Hello:A;B;C, also Hello:D;E;F" ;
ArrayList<String> tokens = new ArrayList<String>();
tokens.add( "A;B;C" );
tokens.add( "D;E;F" );
for( String tok : tokens )
{
    if( s.contains( tok ) )
    {
        s = s.replace( tok, "");
    }
}
System.out.println( s );

Use variables in pattern

I need to get a word between two other words, and I'm using Pattern and Matcher.
Pattern p = Pattern.compile("Hello(.*?)GoodBye");
Matcher m = p.matcher(line);
In this example I'm getting the word between Hello and GoodBye, and it works.
What I want to do is replace Hello and GoodBye with variables such as:
String StartDelemiter = "Hello";
String EndDelemiter = "GoodBye";
How should I write it in Pattern p = Pattern.compile(---);? I tried:
Pattern p = Pattern.compile( "{ "+StartDelemiter +" (.*?) "+EndDelemiter+" }" );
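But the application crashes!
You need to escape { and } with backslashes, because an unescaped { that does not form a valid quantifier makes Pattern.compile throw a PatternSyntaxException. Something like: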
Pattern p = Pattern.compile( "\\{ "+StartDelemiter +" (.*?) "+EndDelemiter+" \\}" );
Curly braces are regex quantifier syntax:
<pattern>{n} Match exactly n times
<pattern>{n,} Match at least n times
<pattern>{n,m} Match at least n but not more than m times
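If the braces were not meant to be part of the text being matched, another option is to drop them and build the pattern directly from the variables. Wrapping each delimiter in Pattern.quote makes it match literally even when it contains regex metacharacters such as { or }. The sketch below assumes that is the intent; the class name DelimiterDemo and the method name between are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimiterDemo {
    // Returns the text between the first start/end delimiter pair, or null if there is none.
    static String between(String line, String startDelimiter, String endDelimiter) {
        // Pattern.quote treats the delimiters as literal text, not as regex syntax.
        Pattern p = Pattern.compile(
                Pattern.quote(startDelimiter) + "(.*?)" + Pattern.quote(endDelimiter));
        Matcher m = p.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(between("Hello world GoodBye", "Hello", "GoodBye")); // prints " world "
        System.out.println(between("x {abc} y", "{", "}"));                     // prints "abc"
    }
}
```

Note that any spaces placed inside the concatenated string literals become part of the pattern, so they would have to appear in the input as well.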
