Regex for italic markdown


I've been trying for hours: I need a regex that selects everything inside underscores.
Example:
_italic_
The only condition is that it must ignore \_ (a backslash followed by an underscore).
So this should be a match (all the text inside the _ characters):
_italic some text 123 \_*%&$ _
So far I have this regex:
(\_.*?\_)(?!\\\_)
But it is not ignoring the \_.
Which regex would work?

You can use
(?s)(?<!\\)(?:\\{2})*_((?:[^\\_]|\\.)+)_
See the regex demo. Details:
(?s) - an inline embedded flag option equal to Pattern.DOTALL
(?<!\\)(?:\\{2})* - a position that is not immediately preceded by a backslash, followed by zero or more sequences of double backslashes
_ - an underscore
((?:[^\\_]|\\.)+) - Capturing group 1: one or more occurrences of any char other than \ and _, or any escaped char (a \ followed by any one char)
_ - an underscore
See the Java demo:
List<String> strs = Arrays.asList("xxx _italic some text 123 \\_*%&$ _ xxx",
"\\_test_test_");
String regex = "(?s)(?<!\\\\)(?:\\\\{2})*_((?:[^\\\\_]|\\\\.)+)_";
Pattern p = Pattern.compile(regex);
for (String str : strs) {
Matcher m = p.matcher(str);
List<String> result = new ArrayList<>();
while(m.find()) {
result.add(m.group(1));
}
System.out.println(str + " => " + String.join(", ", result));
}
Output:
xxx _italic some text 123 \_*%&$ _ xxx => italic some text 123 \_*%&$
\_test_test_ => test
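To see what the (?:\\{2})* part and the \\. alternative actually do, here is a minimal sketch that wraps the same pattern in a reusable method. The class name ItalicExtractor and the method extract are illustrative names, not part of the answer. The inputs are Java string literals, so "\\\\" in source is two backslashes in the actual text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ItalicExtractor {
    // Same pattern as in the answer: an unescaped "_" (optionally preceded by pairs
    // of backslashes), then one or more non-"\"/non-"_" chars or escaped chars, then "_"
    static final Pattern ITALIC =
            Pattern.compile("(?s)(?<!\\\\)(?:\\\\{2})*_((?:[^\\\\_]|\\\\.)+)_");

    // Returns the content of every italic span (capturing group 1)
    static List<String> extract(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = ITALIC.matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Text _a\_b_ : the escaped \_ stays inside the span
        System.out.println(extract("_a\\_b_"));  // prints [a\_b]
        // Text \\_x_ : \\ is an escaped backslash, so the _ after it opens a span
        System.out.println(extract("\\\\_x_"));  // prints [x]
        // Text \_y z : the _ is escaped, so no span starts
        System.out.println(extract("\\_y z"));   // prints []
    }
}
```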

Related

How to parse key-value pairs with regex

I use Kotlin/Java to parse a string.
My regex:
\[\'(.*?)[\]]=\'(.*?)(?!\,)[\']
Text to parse:
someArray1['key1'] = 'value1', someArray2['key2'] = 'value2', ignoreText=ignore, some['key3'] = 'value3', ignoreMe['ignore']=ignore, some['key4'] = 'value4'..
I need this result:
key1=value1
key2=value2
key3=value3
key4=value4
Thanks for the help.

Another regex for you
\['(\w+)'\]\s+(=)\s+'(\w+)'
Regex101 Demo Fiddle
Java test code
String str = "someArray1['key1'] = 'value1', someArray2['key2'] = 'value2', ignoreText=ignore, some['key3'] = 'value3', ignoreMe['ignore']=ignore, some['key4'] = 'value4'..";
String regex = "\\['(\\w+)'\\]\\s+(=)\\s+'(\\w+)'";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group(1) + matcher.group(2) + matcher.group(3));
}
Test result:
key1=value1
key2=value2
key3=value3
key4=value4

A few notes about the pattern that you tried
In your pattern you are not matching the spaces around the equals sign.
Also note that the lookahead in (?!\,)[\'] is redundant: it asserts that the next character is not a comma and then matches a single quote, which can never be a comma, so the assertion always succeeds.
You don't have to escape the ', and single characters do not have to be in a character class.
You can use a negated character class to capture the values between the single quotes. This prevents .*? from matching too much, because the dot can match any character, including quotes and brackets.
You might write the pattern as
\['([^']*)'\]\h+=\h+'([^']*)'
The pattern matches:
\[' Match ['
( Capture group 1
[^']* Match zero or more chars other than '
) Close group 1
'\] Match ']
\h+=\h+ Match an equals sign between 1 or more horizontal whitespace characters
'([^']*)' Capture group 2 which has the same pattern as group 1
Regex demo | Java demo
Example
String regex = "\\['([^']*)'\\]\\h+=\\h+'([^']*)'";
String string = "someArray1['key1'] = 'value1', someArray2['key2'] = 'value2', ignoreText=ignore, some['key3'] = 'value3', ignoreMe['ignore']=ignore, some['key4'] = 'value4'..";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(1) + "=" + matcher.group(2));
}
Output
key1=value1
key2=value2
key3=value3
key4=value4
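The point about .*? matching too much can be checked directly. This sketch runs a lazy-dot version of the pattern and the negated-class version against the same input; the class KeyValueCompare and the method pairs are illustrative names, and \h requires Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyValueCompare {
    static final String TEXT = "someArray1['key1'] = 'value1', someArray2['key2'] = 'value2', "
            + "ignoreText=ignore, some['key3'] = 'value3', ignoreMe['ignore']=ignore, "
            + "some['key4'] = 'value4'..";

    // Collects "key=value" from groups 1 and 2 of every match
    static List<String> pairs(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group(1) + "=" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Lazy dot: may run across quotes and brackets until the rest of the pattern fits
        String lazyDot = "\\['(.*?)'\\]\\h+=\\h+'(.*?)'";
        // Negated class: a captured key or value can never contain a single quote
        String negated = "\\['([^']*)'\\]\\h+=\\h+'([^']*)'";
        pairs(lazyDot, TEXT).forEach(System.out::println);
        System.out.println("---");
        pairs(negated, TEXT).forEach(System.out::println);
    }
}
```

With the lazy dot, the attempt that starts at ignoreMe['ignore'] keeps expanding until the rest of the pattern can succeed, so the fourth result comes out as ignore']=ignore, some['key4=value4. The negated-class version prints only key1=value1 through key4=value4.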

In Java, how do you tokenize a string that contains the delimiter in the tokens?

Let's say I have the string:
String toTokenize = "prop1=value1;prop2=String test='1234';int i=4;;prop3=value3";
I want the tokens:
prop1=value1
prop2=String test='1234';int i=4;
prop3=value3
For backwards compatibility, I have to use the semicolon as a delimiter. I have tried wrapping code in something like CDATA:
String toTokenize = "prop1=value1;prop2=<![CDATA[String test='1234';int i=4;]]>;prop3=value3";
But I can't figure out a regular expression to ignore the semicolons that are within the cdata tags.
I've tried escaping the non-delimiter:
String toTokenize = "prop1=value1;prop2=String test='1234'\\;int i=4\\;;prop3=value3";
But then there is an ugly mess of removing the escape characters.
Do you have any suggestions?

You may match either <![CDATA[...]]> or any char other than ;, 1 or more times, to match the values. To match the keys, you may use a regular \w+ pattern:
(\w+)=((?:<!\[CDATA\[.*?]]>|[^;])+)
See the regex demo.
Details
(\w+) - Group 1: one or more word chars
= - a = sign
((?:<!\[CDATA\[.*?]]>|[^;])+) - Group 2: one or more sequences of
<!\[CDATA\[.*?]]> - a <![CDATA[...]]> substring
| - or
[^;] - any char but ;
See a Java demo:
String rx = "(\\w+)=((?:<!\\[CDATA\\[.*?]]>|[^;])+)";
String s = "prop1=value1;prop2=<![CDATA[String test='1234';int i=4;]]>;prop3=value3";
Pattern pattern = Pattern.compile(rx);
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
System.out.println(matcher.group(1) + " => " + matcher.group(2));
}
Results:
prop1 => value1
prop2 => <![CDATA[String test='1234';int i=4;]]>
prop3 => value3
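The question asks for the token prop2=String test='1234';int i=4; without the wrapper. Assuming that is wanted, one option is to strip the CDATA markers after matching. In the sketch below, the second pattern and the names CdataTokens and tokens are additions for illustration, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CdataTokens {
    // Key/value pattern from the answer: a key, "=", then CDATA blocks or non-";" chars
    static final Pattern PAIR = Pattern.compile("(\\w+)=((?:<!\\[CDATA\\[.*?]]>|[^;])+)");
    // Added for illustration: captures the content of a <![CDATA[...]]> wrapper
    static final Pattern CDATA = Pattern.compile("<!\\[CDATA\\[(.*?)]]>");

    // Returns "key=value" tokens with any CDATA wrapper removed from the value
    static List<String> tokens(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = PAIR.matcher(s);
        while (m.find()) {
            // "$1" inserts the captured CDATA content as-is
            String value = CDATA.matcher(m.group(2)).replaceAll("$1");
            out.add(m.group(1) + "=" + value);
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "prop1=value1;prop2=<![CDATA[String test='1234';int i=4;]]>;prop3=value3";
        tokens(s).forEach(System.out::println);
    }
}
```

This prints prop1=value1, prop2=String test='1234';int i=4; and prop3=value3, one per line.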

Prerequisites:
All your tokens start with prop
The string ;prop appears nowhere except at the beginning of a token, and the ~ character does not appear in the string at all
I'd just replace every ;prop with ~prop.
Then your string becomes:
"prop1=value1~prop2=String test='1234';int i=4;~prop3=value3"
You can then tokenize using the ~ delimiter.
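The replace-then-split approach fits in a couple of lines. This is a minimal sketch that assumes the prerequisites above hold; the class name PropTokens is illustrative.

```java
import java.util.Arrays;

public class PropTokens {
    // Assumes: every token starts with "prop", ";prop" only occurs between tokens,
    // and the "~" character never appears in the input
    static String[] tokens(String s) {
        // Mark token boundaries with "~", then split on it ("~" is not a regex metacharacter)
        return s.replace(";prop", "~prop").split("~");
    }

    public static void main(String[] args) {
        String toTokenize = "prop1=value1;prop2=String test='1234';int i=4;;prop3=value3";
        Arrays.stream(tokens(toTokenize)).forEach(System.out::println);
    }
}
```

Only the ; directly before prop3 is consumed, so the token for prop2 keeps its trailing semicolon: prop2=String test='1234';int i=4;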

Regular expression split string on colon

I have a string
String l = "name: kumar age: 22 relationship: single "
It comes from the UI dynamically, and now I need to split the above string into:
name: kumar
age: 22
relationship: single
My code is:
Pattern ptn = Pattern.compile("([^\\s]+( ?= ?[^\\s]*)?)");
Matcher mt = ptn.matcher(l);
while(mt.find())
{
String col_dat=mt.group(0);
if(col_dat !=null && col_dat.length()>0)
{
System.out.println("\t"+col_dat );
}
}
Any suggestions will be appreciated. Thank you.

You can use this regex:
\S+\s*:\s*\S+
Or this:
\w+\s*:\s*\w+
Demo: https://regex101.com/r/EgXlcD/6
Regex:
\S+ - 1 or more non space characters
\s* - 0 or more space characters
\w+ - 1 or more \w, i.e. [A-Za-z0-9_] characters.
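A runnable sketch of the first pattern applied with Matcher.find() to the string from the question; the class and method names are illustrative. Because \S+ stops at whitespace, this assumes every key and every value is a single word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ColonPairs {
    // Non-space run, optional spaces, a colon, optional spaces, non-space run
    static final Pattern PAIR = Pattern.compile("\\S+\\s*:\\s*\\S+");

    // Returns each "key: value" match in order of appearance
    static List<String> pairs(String l) {
        List<String> out = new ArrayList<>();
        Matcher m = PAIR.matcher(l);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String l = "name: kumar age: 22 relationship: single ";
        pairs(l).forEach(System.out::println);
    }
}
```

This prints name: kumar, age: 22 and relationship: single on separate lines.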

Java repeated character regex with condition

I have a large database and I want to check it for capitalization errors. I use this pattern for repeated characters. The pattern works, but I need to add a start and end condition on the string.
Pattern:
(\w)\1+
Target String:
Javaaa
result: aaa
I want to add a condition to the regex: the word must start with Ja and end with a. The result must be only the repeated characters.
(I don't want to check this programmatically; I want only the regex to do it, if that is possible.
I'm doing this with String.replaceAll(regex, string), not with the Pattern or Matcher classes.)

You may use a lookahead anchored at the leading word boundary:
\b(?=Ja\w*a\b)\w*?((\w)\2+)\w*\b
See the regex demo
Details:
\b - leading word boundary
(?=Ja\w*a\b) - a positive lookahead that requires the whole word to start with Ja, then it can have 0+ word characters and end with a
\w*? - 0+ word characters but as few as possible
((\w)\2+) - Group 1: a word character (captured in Group 2) followed by one or more copies of itself, i.e. a run of identical consecutive characters
\w* - any remaining word characters (0 or more)
\b - trailing word boundary.
The result you are seeking is in Group 1.
String s = "Prooo\nJavaaa";
Pattern pattern = Pattern.compile("\\b(?=Ja\\w*a\\b)\\w*?((\\w)\\2+)\\w*\\b");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(1));
}
See the Java demo.
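Since the question uses String.replaceAll rather than Pattern/Matcher, here is a sketch of the same regex with replaceAll and the $1 group reference: each word that matches is replaced by its Group 1, and words that fail the Ja...a condition are left unchanged. The class name RepeatedRun is illustrative.

```java
public class RepeatedRun {
    // Regex from the answer; Group 1 is the run of repeated characters
    static final String REGEX = "\\b(?=Ja\\w*a\\b)\\w*?((\\w)\\2+)\\w*\\b";

    public static void main(String[] args) {
        // The whole matching word is replaced by Group 1
        System.out.println("Javaaa".replaceAll(REGEX, "$1")); // aaa
        // Starts with Ja and ends with a, but has no repeated characters: unchanged
        System.out.println("Java".replaceAll(REGEX, "$1"));   // Java
        // Does not start with Ja: unchanged
        System.out.println("Prooo".replaceAll(REGEX, "$1"));  // Prooo
    }
}
```

On a longer text, replaceAll rewrites only the matching words and leaves the rest of the string in place.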

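Another code example (inspired by Wiktor Stribiżew's code), following your expected input and output format. Note that this simpler pattern, ((.)\2+), does not enforce the start-with-Ja / end-with-a condition; it reports every run of repeated characters.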
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedChars
{
public static void main( String[] args )
{
String[] input =
{ "Javaaa", "Javaaaaaaaaa", "Javaaaaaaaaaaaaaaaaaa", "Paoooo", "Paoooooooo", "Paooooooooxxxxxxxxx" };
// A character (group 2) followed by one or more copies of itself; compiled once, outside the loop
Pattern pattern = Pattern.compile( "((.)\\2+)" );
for ( String str : input )
{
System.out.println( "Target String :" + str );
Matcher matcher = pattern.matcher( str );
while ( matcher.find() )
{
System.out.println( "result: " + matcher.group() );
}
System.out.println( "---------------------" );
}
System.out.println( "Finish" );
}
}
Output:
Target String :Javaaa
result: aaa
---------------------
Target String :Javaaaaaaaaa
result: aaaaaaaaa
---------------------
Target String :Javaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaa
---------------------
Target String :Paoooo
result: oooo
---------------------
Target String :Paoooooooo
result: oooooooo
---------------------
Target String :Paooooooooxxxxxxxxx
result: oooooooo
result: xxxxxxxxx
---------------------
Finish

Java pattern matching using regex

I am new to Java coding and to pattern matching. I am reading this string from a file; typed directly into code as-is, it would give a compilation error. I have a string as follows:
String str = "find(\"128.210.16.48\",\"Hello Everyone\")" ; // no compile error
I want to extract the "128.210.16.48" value and "Hello Everyone" from the string above. These values are not constant.
Can you please give me some suggestions?
Thanks

I suggest using the String#split() method, but if you are still looking for a regex pattern, try this one and get the matched group at index 1.
("[^"][\d\.]+"|"[^)]*+)
Online demo
Sample code:
String str = "find(\"128.210.16.48\",\"Hello Everyone\")";
String regex = "(\"[^\"][\\d\\.]+\"|\"[^)]*+)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
output:
"128.210.16.48"
"Hello Everyone"
Pattern explanation:
( group and capture to \1:
" '"'
[^"] any character except '"'
[\d\.]+ any character of: digits (0-9), '.' (1 or more times, matching as many as possible)
" '"'
| OR
" '"'
[^)]*+ any character except ')' (0 or more times, possessive: matching as many as possible and never giving any back)
) end of \1

Try with String.split()
String str = "find(\"128.210.16.48\",\"Hello Everyone\")" ;
System.out.println(str.split(",")[0].split("\"")[1]);
System.out.println(str.split(",")[1].split("\"")[1]);
Output:
128.210.16.48
Hello Everyone
Explanation:
First split the string on the comma (,). Take the first element of that array, str.split(",")[0], split it again on the double quote (") with split("\"") and take the second element, [1]. The second value is extracted the same way from str.split(",")[1].

The accepted answer is fine, but if for some reason you (or whoever finds this question) still want to use a regex instead of String.split, here's an option:
String str = "find(\"128.210.16.48\",\"Hello Everyone\")" ; // no compile error
String regex1 = "\".+?\"";
Pattern pattern1 = Pattern.compile(regex1);
Matcher matcher1 = pattern1.matcher(str);
while (matcher1.find()){
System.out.println("Matcher 1 found (trimmed): " + matcher1.group().replace("\"",""));
}
Output:
Matcher 1 found (trimmed): 128.210.16.48
Matcher 1 found (trimmed): Hello Everyone
Note: this only works if " is used solely as a separator character and never appears inside a value. See Braj's demo in the comments for an example.
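A small variant of the regex above, sketched here as an assumption rather than taken from the thread: a capturing group inside the quotes returns the values without the quote characters, so the replace call is not needed. It has the same limitation, since [^"]* cannot span a quote inside a value. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedArgs {
    // Group 1 captures everything between a pair of double quotes
    static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Returns the content of each quoted value, without the quotes
    static List<String> quotedValues(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = QUOTED.matcher(s);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "find(\"128.210.16.48\",\"Hello Everyone\")";
        quotedValues(str).forEach(System.out::println);
    }
}
```

This prints 128.210.16.48 and Hello Everyone on separate lines.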
