Java regex except combination of symbols

I'm trying to find a substring that may contain any character but does not include the combination "[%".
For example:
Input: atrololo[%trololo
Output: atrololo
Input: tro[tro%tro[%trololo
Output: tro[tro%tro
I already wrote a regex that takes any symbol except [ or %:
[A-Za-z-0-9\s!-$-&/:-#\\-`\{-~]*
I need to add something like [^("[%")] at the end of my expression, but I can't figure out how to write it.
You can check my regex at
https://www.regex101.com/
Use this as the test string:
sdfasdsdfasa##!55#321!2h/ хf[[[[[sds d
asgfdgsdf[[[%for (int i = 0; i < 5; i++){}%]
[% fo%][%r(int i = 0; i < 5; i++){ %]*[%}%]
[%for(int i = 0; i < 5; i++){%][%=i%][%}%]
[%#n%]<[%# n + m %]*[%#%]>[%#%]
%?s.equals(""TEST"")%]TRUE[%#3%]![%#%][%?%]
Kind regards.

You could use a negative-lookahead-based regex like the one below to get the part before the [%:
^(?:(?!\[%).)*
(?:(?!\[%).)* matches any character, zero or more times, as long as that character is not the start of [%.
String s = "tro[tro%tro[%trololo";
Pattern regex = Pattern.compile("^(?:(?!\\[%).)*");
Matcher matcher = regex.matcher(s);
while(matcher.find()){
System.out.println(matcher.group()); // output : tro[tro%tro
}
OR
A lookahead-based regex (note that this one only matches when the input actually contains [%):
^.*?(?=\[%)
Pattern regex = Pattern.compile("^.*?(?=\\[%)");
OR
You could split the input string based on the regex \[% and get the parts you want.
String s = "tro[tro%tro[%trololo";
String[] part = s.split("\\[%");
System.out.println(part[0]); // output : tro[tro%tro

Using your input/output pairs as the spec:
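String input = "atrololo[%trololo"; // the starting string
String output = input.replaceAll("\\[%.*", ""); // "atrololo"; .* stops at a line break, so use "(?s)\\[%.*" for multi-line input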
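To compare the four approaches above side by side, here is a small sketch that runs each one on the two inputs from the question plus a hypothetical third input with no [% at all (the class name and that third input are illustrative). It suggests that the lazy lookahead pattern finds nothing when [% is absent, while the tempered pattern, split and replaceAll all return the whole string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BeforeBracketPercent {
    // Tempered token: consume characters as long as "[%" does not start here.
    static final Pattern TEMPERED = Pattern.compile("^(?:(?!\\[%).)*");
    // Lazy match that must be followed by "[%"; fails when "[%" is missing.
    static final Pattern LAZY_LOOKAHEAD = Pattern.compile("^.*?(?=\\[%)");

    // Returns the first match of the pattern, or null when there is none.
    static String firstMatch(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] inputs = {"atrololo[%trololo", "tro[tro%tro[%trololo", "no marker here"};
        for (String s : inputs) {
            System.out.println("tempered:   " + firstMatch(TEMPERED, s));
            System.out.println("lookahead:  " + firstMatch(LAZY_LOOKAHEAD, s));
            System.out.println("split:      " + s.split("\\[%")[0]);
            System.out.println("replaceAll: " + s.replaceAll("\\[%.*", ""));
        }
    }
}
```

If the input may lack the [% marker, the tempered pattern or the split/replaceAll variants seem the safer choice.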

Related

Matching columns containing aggregates with regex

I'm trying to design a regular expression to identify certain columns in a string. This is the input string:
GENDER = Y OR (SUM(TOTAL_AMOUNT) > 100 AND SUM(TOTAL_AMOUNT) < 600)
I'm trying to match SUM(TOTAL_AMOUNT) from above string.
This is the regex I've tried:
SUM([a-zA-Z])
But it doesn't match properly. Could someone tell me what I'm doing wrong with my regex? Thanks in advance.
Sample Code:
List<String> input = new ArrayList<>();
Matcher m = Pattern.compile("SUM([a-zA-Z])").matcher(str);
while (m.find())
input.add(m.group(1));
You can use
String str = "GENDER = Y OR (SUM(TOTAL_AMOUNT) > 100 AND SUM(TOTAL_AMOUNT) < 600)";
Matcher matcher = Pattern.compile("SUM\\([^()]+\\)").matcher(str);
List<String> input = new ArrayList<>();
while (matcher.find()) {
input.add(matcher.group());
}
System.out.println(input);
The pattern matches:
SUM\( - a SUM( string
[^()]+ - one or more chars other than ( and )
\) - a ) char.
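Note that I am using matcher.group() in the code to get the full match, since there is no capturing group in the pattern (so you can't use matcher.group(1) here).
As for your original pattern: the unescaped parentheses in SUM([a-zA-Z]) form a capturing group instead of matching literal parentheses, so it looks for SUM followed by exactly one letter. It never matches SUM(TOTAL_AMOUNT), because the character after SUM is (, not a letter.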
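If you only want the column name inside SUM(...), which is what your group(1) call seems to aim for, a capturing group can be placed around the inner part. A minimal sketch (the class and method names are illustrative), which also checks that the original pattern finds nothing in this input:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SumColumns {
    // Group 1 captures whatever sits between "SUM(" and the next ")".
    static final Pattern SUM_ARG = Pattern.compile("SUM\\(([^()]+)\\)");

    static List<String> sumArguments(String str) {
        List<String> columns = new ArrayList<>();
        Matcher m = SUM_ARG.matcher(str);
        while (m.find()) {
            columns.add(m.group(1)); // inner column name only, without SUM( and )
        }
        return columns;
    }

    public static void main(String[] args) {
        String str = "GENDER = Y OR (SUM(TOTAL_AMOUNT) > 100 AND SUM(TOTAL_AMOUNT) < 600)";
        System.out.println(sumArguments(str)); // [TOTAL_AMOUNT, TOTAL_AMOUNT]
        // The question's pattern: SUM followed by one letter, which never occurs here.
        System.out.println(Pattern.compile("SUM([a-zA-Z])").matcher(str).find()); // false
    }
}
```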

How would I replace this function with a regex replace

I have a file name with this format yy_MM_someRandomString_originalFileName.
Example:
02_01_fEa3129E_my Pic.png
I want to replace the first two underscores with / so that the example becomes:
02/01/fEa3129E_my Pic.png
replaceAll can't do this, because it would replace every underscore, and the original file names may contain underscores as well.
@Test
void test() {
final var input = "02_01_fEa3129E_my Pic.png";
final var formatted = replaceNMatches(input, "_", "/", 2);
assertEquals("02/01/fEa3129E_my Pic.png", formatted);
}
private String replaceNMatches(String input, String regex,
String replacement, int numberOfTimes) {
for (int i = 0; i < numberOfTimes; i++) {
input = input.replaceFirst(regex, replacement);
}
return input;
}
I solved this using a loop, but is there a pure regex way to do this?
EDIT: the solution should let me change a parameter to increase the number of replaced underscores from 2 to n.
You could use two capturing groups and reference them in the replacement, so that each matched _ is replaced by /:
^([^_]+)_([^_]+)_
Replace with:
$1/$2/
For example:
String regex = "^([^_]+)_([^_]+)_";
String string = "02_01_fEa3129E_my Pic.png";
String subst = "$1/$2/";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
String result = matcher.replaceFirst(subst);
System.out.println(result);
Result
02/01/fEa3129E_my Pic.png
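To cover the EDIT (n underscores instead of 2), one option is to build the same pattern and replacement in a loop: n copies of ([^_]*)_ and $1/$2/.../$n/. This is a sketch under the assumption that a generated pattern counts as "pure regex" for your purposes; the class and method names are hypothetical. If the input has fewer than n underscores, the pattern does not match and the string comes back unchanged.

```java
import java.util.regex.Pattern;

public class ReplaceFirstN {
    // Builds ^([^_]*)_([^_]*)_... with n groups and the replacement $1/$2/.../$n/.
    static String replaceFirstNUnderscores(String input, int n) {
        StringBuilder regex = new StringBuilder("^");
        StringBuilder replacement = new StringBuilder();
        for (int i = 1; i <= n; i++) {
            regex.append("([^_]*)_");
            replacement.append('$').append(i).append('/');
        }
        // replaceFirst is enough: the pattern is anchored at the start of the string.
        return Pattern.compile(regex.toString()).matcher(input).replaceFirst(replacement.toString());
    }

    public static void main(String[] args) {
        System.out.println(replaceFirstNUnderscores("02_01_fEa3129E_my Pic.png", 2)); // 02/01/fEa3129E_my Pic.png
        System.out.println(replaceFirstNUnderscores("02_01_fEa3129E_my Pic.png", 3)); // 02/01/fEa3129E/my Pic.png
    }
}
```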
Your current solution has a few problems:
It is inefficient: each replaceFirst call starts from the beginning of the string, so it iterates over the same leading characters many times.
It has a bug: because of point 1, searching from the beginning instead of from the last modified position can match text that a previous replacement inserted.
For instance, if we want to replace a single character two times, each with X (abc -> XXc), using code like
String input = "abc";
input = input.replaceFirst(".", "X"); // replaces a with X -> Xbc
input = input.replaceFirst(".", "X"); // replaces X with X -> Xbc
we end up with Xbc instead of XXc, because the second replaceFirst replaces X with X instead of b with X.
To avoid this kind of problem, you can rewrite your code to use the Matcher#appendReplacement and Matcher#appendTail methods, which iterate over the input only once and replace each matched part with the value you want:
private static String replaceNMatches(String input, String regex,
String replacement, int numberOfTimes) {
Matcher m = Pattern.compile(regex).matcher(input);
StringBuilder sb = new StringBuilder();
int i = 0;
while(i++ < numberOfTimes && m.find() ){
m.appendReplacement(sb, replacement); // replaces currently matched part with replacement,
// and writes replaced version to StringBuilder
// along with text before the match
}
m.appendTail(sb); //lets add to builder text after last match
return sb.toString();
}
Usage example:
System.out.println(replaceNMatches("abcdefgh", "[efgh]", "X", 2)); //abcdXXgh
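The bug described above can be checked directly. This sketch puts the question's loop next to the appendReplacement version from this answer (the class name and the name loopReplaceFirst are hypothetical; the StringBuilder overloads of appendReplacement and appendTail assume Java 9 or later):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceNDemo {
    // Single left-to-right pass: each find() continues after the previous match.
    static String replaceNMatches(String input, String regex, String replacement, int numberOfTimes) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i++ < numberOfTimes && m.find()) {
            m.appendReplacement(sb, replacement); // text before the match + replacement
        }
        m.appendTail(sb); // text after the last replaced match
        return sb.toString();
    }

    // The loop from the question: every call restarts at index 0.
    static String loopReplaceFirst(String input, String regex, String replacement, int numberOfTimes) {
        for (int i = 0; i < numberOfTimes; i++) {
            input = input.replaceFirst(regex, replacement);
        }
        return input;
    }

    public static void main(String[] args) {
        System.out.println(loopReplaceFirst("abc", ".", "X", 2)); // Xbc
        System.out.println(replaceNMatches("abc", ".", "X", 2));  // XXc
        System.out.println(replaceNMatches("02_01_fEa3129E_my Pic.png", "_", "/", 2)); // 02/01/fEa3129E_my Pic.png
    }
}
```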

Converting string data from an array of data arranged in columns

I'm trying to convert a string array that is itself a part of another array fed into Java from an external file.
There are two parts to this question:
How do I convert the string's substring elements to doubles or ints?
How do I skip the header which is itself a part of the string?
I have the following piece of code, which does NOT give me an error but doesn't give me any output either. The data is arranged in columns, so I'm not sure which delimiter to pass to split(). I've tried \r, \n, ",", " " and nothing works.
str0 = year.split(",");
year = year.trim();
int[] yearData = new int[str0.length-1];
for(i = 0; i < str0.length-1; i++) {
yearData[i] = Integer.parseInt(str0[i]);
System.out.println(yearData[i]);
}
The code you provided doesn't work as it stands. Instead, consider the example below, which uses a regular expression to find all the numbers in a string. By changing the regular expression you can extract the substrings you need and skip the header part. (The sign class is written [+-], because | inside a character class is a literal |, and the alternatives are grouped so that the optional sign also applies to numbers like -.243.)
String regEx = "[+-]?(\\d+(\\.\\d*)?|\\.\\d+)"; // optional sign, then 12, 12. or 12.5, or .5
String str = "256 is the square of 16 and -2.5 squared is 6.25 " + "and -.243 is less than 0.1234.";
Pattern pattern = Pattern.compile(regEx);
Matcher m = pattern.matcher(str);
while(m.find()) {
System.out.println(m.group()); // prints 256, 16, -2.5, 6.25, -.243, 0.1234, one per line
}
Try something like this:
year = year.trim(); // This should come before the split()...
str0 = year.split("[\\s,;]+"); // split() uses RegEx: any run of whitespace, commas or semicolons
int[] yearData = new int[str0.length-1]; // one slot per value, none for the header
for(i = 0; i < str0.length-1; i++) {
yearData[i] = Integer.parseInt(str0[i + 1]); // str0[0] is the header, so skip it
System.out.println(yearData[i]);
}
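Putting both parts of the question together, here is a self-contained sketch. It assumes the column holds a single-word header followed by one value per line (the input string, class and method names are hypothetical). It trims, splits on whitespace, commas or semicolons, skips the first token, and parses the rest; for decimal columns, Double.parseDouble and a double[] would take the place of Integer.parseInt and int[].

```java
import java.util.Arrays;

public class ColumnParse {
    // Assumes tokens[0] is a one-word header and every later token is an integer.
    static int[] parseColumn(String column) {
        String[] tokens = column.trim().split("[\\s,;]+");
        int[] values = new int[tokens.length - 1];
        for (int i = 1; i < tokens.length; i++) { // start at 1 to skip the header
            values[i - 1] = Integer.parseInt(tokens[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        String year = "Year\r\n1990\r\n1991\r\n1992\r\n"; // hypothetical file contents
        System.out.println(Arrays.toString(parseColumn(year))); // [1990, 1991, 1992]
    }
}
```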

extract values from string with Regular Expression

I have this Java code:
String msg = "*1*20*11*30*IGNORE*53*40##";
String regex = "\\*1\\*(.*?)\\*11\\*(.*?)\\*(.*?)\\*53\\*(.*?)##";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(msg);
if (matcher.find()) {
for (int i = 0; i < matcher.groupCount(); i++) {
System.out.println(matcher.group((i+1)));
}
}
The output is:
20
30
IGNORE
40
How do I change the regex so that the string IGNORE is ignored?
I want whatever is written in that position not to be returned by the matcher.
The positions holding 20, 30 and 40 are values I need to extract; IGNORE in my case is a protocol-specific counter that I don't need.
Always ignore the 3rd parameter:
Simply don't create a capture (don't use parentheses).
\\*1\\*(.*?)\\*11\\*(.*?)\\*.*?\\*53\\*(.*?)##
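A minimal runnable sketch of the non-capturing version above (class and method names are illustrative; the second message is a hypothetical one with a different counter). Because the third field is matched but not captured, the groups are only 20, 30 and 40, whatever the counter holds:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IgnoreThirdField {
    // The third field is matched by .*? but not captured, so groupCount() is 3.
    static final Pattern P = Pattern.compile("\\*1\\*(.*?)\\*11\\*(.*?)\\*.*?\\*53\\*(.*?)##");

    static List<String> values(String msg) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(msg);
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(values("*1*20*11*30*IGNORE*53*40##")); // [20, 30, 40]
        System.out.println(values("*1*20*11*30*7*53*40##"));      // [20, 30, 40]
    }
}
```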
Ignore independently of position:
You need to capture the IGNORE part just like you're doing, and check in your loop if it needs to be ignored:
String msg = "*1*20*11*30*IGNORE*53*40##";
String regex = "\\*1\\*(.*?)\\*11\\*(.*?)\\*(.*?)\\*53\\*(.*?)##";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(msg);
if (matcher.find()) {
for (int i = 0; i < matcher.groupCount(); i++) {
if (!matcher.group(i+1).equals("IGNORE")) {
System.out.println(matcher.group(i+1));
}
}
}
You can use a tempered greedy token to make sure you do not get a match when IGNORE is in-between the 2nd and 3rd capture groups:
\\*1\\*(.*?)\\*11\\*(.*?)\\*(?:(?!IGNORE).)*\\*53\\*(.*?)##
In this case, the field between the 2nd and 3rd capture groups cannot contain IGNORE, so the whole match fails when it does.
The token is useful when you need to match the closest window between two subpatterns that does not contain some substring.
In case you just do not want the 3rd group to be equal to IGNORE, use a negative look-ahead:
\\*1\\*(.*?)\\*11\\*(.*?)\\*(?!IGNORE\\*)(.*?)\\*53\\*(.*?)##
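Note that both the tempered-token and the negative-lookahead patterns reject the whole message when the counter is IGNORE, rather than skipping that field. A small sketch to check this (the class name and the second message are hypothetical):

```java
import java.util.regex.Pattern;

public class RejectIgnore {
    static final Pattern TEMPERED = Pattern.compile(
            "\\*1\\*(.*?)\\*11\\*(.*?)\\*(?:(?!IGNORE).)*\\*53\\*(.*?)##");
    static final Pattern LOOKAHEAD = Pattern.compile(
            "\\*1\\*(.*?)\\*11\\*(.*?)\\*(?!IGNORE\\*)(.*?)\\*53\\*(.*?)##");

    public static void main(String[] args) {
        String ignored = "*1*20*11*30*IGNORE*53*40##";
        String kept = "*1*20*11*30*7*53*40##"; // hypothetical message with an ordinary counter
        System.out.println(TEMPERED.matcher(ignored).find());  // false
        System.out.println(LOOKAHEAD.matcher(ignored).find()); // false
        System.out.println(TEMPERED.matcher(kept).find());     // true
        System.out.println(LOOKAHEAD.matcher(kept).find());    // true
    }
}
```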
Split the input on the *<number>* delimiters, treating *IGNORE as an optional part of the delimiter, after first trimming off the prefix and suffix:
String[] parts = msg.replaceAll("^\\*\\d\\*|##$","").split("(\\*IGNORE)?\\*\\d+\\*");
Some test code:
String msg = "*1*20*11*30*IGNORE*53*40##";
String[] parts = msg.replaceAll("^\\*\\d\\*|##$","").split("(\\*IGNORE)?\\*\\d+\\*");
System.out.println(Arrays.toString(parts));
Output:
[20, 30, 40]

How to find the text between ( and )

I have a few strings which are like this:
text (255)
varchar (64)
...
I want to find the number between ( and ) and store it in a string; that is, store these lengths as strings.
I have the rest of it figured out except for the regex parsing part.
I'm having trouble figuring out the regex pattern.
How do I do this?
The sample code is going to look like this:
Matcher m = Pattern.compile("<I CANT FIGURE OUT WHAT COMES HERE>").matcher("text (255)");
Also, I'd like to know if there's a cheat sheet for regex from which one can directly pick up regex patterns.
I would use plain string matching:
String s = "text (255)";
int open = s.indexOf('(');
int end = open < 0 ? -1 : s.indexOf(')', open + 1);
if (end < 0) {
// no "(...)" pair found
} else {
int num = Integer.parseInt(s.substring(open + 1, end).trim());
}
You can use a regex, since sometimes that makes your code simpler, but that doesn't mean you should in all cases. I suspect this is a case where a simple indexOf and substring will be not only faster and shorter but, more importantly, easier to understand.
You can use this pattern to match any text between parentheses:
\(([^)]*)\)
Or this to match just numbers (with possible whitespace padding):
\(\s*(\d+)\s*\)
Of course, to use this in a string literal, you have to escape the \ characters:
Matcher m = Pattern.compile("\\(\\s*(\\d+)\\s*\\)")...
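Since the question wants the lengths stored as strings, here is one way to apply the padded pattern to each column type and keep group(1) as a String. The third and fourth inputs and the class and method names are hypothetical, added to show the whitespace padding and the no-match case:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParenLength {
    // Group 1: the digits between ( and ), allowing whitespace padding.
    static final Pattern LENGTH = Pattern.compile("\\(\\s*(\\d+)\\s*\\)");

    static String lengthOf(String columnType) {
        Matcher m = LENGTH.matcher(columnType);
        return m.find() ? m.group(1) : null; // null when there is no "(digits)"
    }

    public static void main(String[] args) {
        String[] types = {"text (255)", "varchar (64)", "char ( 8 )", "date"};
        List<String> lengths = new ArrayList<>();
        for (String t : types) {
            lengths.add(lengthOf(t));
        }
        System.out.println(lengths); // [255, 64, 8, null]
    }
}
```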
Here is some example code:
import java.util.regex.*;
class Main
{
public static void main(String[] args)
{
String txt="varchar (64)";
String re1=".*?"; // Non-greedy match on filler
String re2="\\((\\d+)\\)"; // Round Braces 1
Pattern p = Pattern.compile(re1+re2,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(txt);
if (m.find())
{
String rbraces1=m.group(1);
System.out.print("("+rbraces1.toString()+")"+"\n");
}
}
}
This will print out the first (int) it finds in the input string, txt.
The regex \((\d+)\) matches any number between ( and ).
int index1 = string.indexOf("(");
int index2 = string.indexOf(")");
String intValue = string.substring(index1 + 1, index2); // the end index is exclusive, so no -1
Matcher m = Pattern.compile("\\((\\d+)\\)").matcher("text (255)");
if (m.find()) {
int len = Integer.parseInt (m.group(1));
System.out.println (len);
}
