Using a regex pattern match for a pipe-delimited string in Java

I am trying to do a pattern match on a pipe-delimited string in Java, but I am not sure what is going wrong. I need help from experts.
A|Bill Access Key|CBEBALCOM|
D|215325775|20210507|9|BALCOM SYSTEMS LTD|||
I|1|Back of Duplex Page|
I have a file with records as above. I want to find the record starting with 'D', read the pipe-delimited values that follow it, and store them in a POJO.
I first tried to read the values by applying a pattern, but it does not find a match.
String pattern = "r'D((?:\"(.*?)\"))'";
Pattern r = Pattern.compile(pattern); Matcher m = r.matcher(line);
if (m.find( )) {
System.out.println("value1" + m.group(0) );
System.out.println("value2" + m.group(1) );
} else {
System.out.println("No Match");
}

You might try it like this rather than using a pure regex solution:
stream the lines of the file
filter for lines starting with D|
drop the leading D| (substring from index 2) and split the rest on | into at most three parts
create a new array holding just the first two values
collect the arrays in a List<String[]>
// try-with-resources closes the file once the stream has been consumed
try (Stream<String> lines = Files.lines(Path.of("Myfile.txt"))) {
    List<String[]> list = lines
        .filter(str -> str.startsWith("D|"))
        .map(str -> str.substring(2).split("\\|", 3))
        .map(arr -> new String[]{arr[0], arr[1]})
        .collect(Collectors.toList());
} catch (IOException ioe) {
    ioe.printStackTrace();
}
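To go from the filtered lines to a POJO, one option is to split each D record once and copy the fields into a small class. The following is only a sketch: the class name DRecord and the field names (accountId, date, count, company) are assumptions, because the question does not say what the columns mean.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical POJO; the field names are guesses, not taken from the question.
final class DRecord {
    final String accountId;
    final String date;
    final String count;
    final String company;

    DRecord(String accountId, String date, String count, String company) {
        this.accountId = accountId;
        this.date = date;
        this.count = count;
        this.company = company;
    }

    // Parses one line such as "D|215325775|20210507|9|BALCOM SYSTEMS LTD|||".
    // Returns null when the line is not a D record or has too few fields.
    static DRecord parse(String line) {
        if (line == null || !line.startsWith("D|")) {
            return null;
        }
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        String[] fields = line.split("\\|", -1);
        if (fields.length < 5) {
            return null;
        }
        return new DRecord(fields[1], fields[2], fields[3], fields[4]);
    }

    // Keeps only the lines that parse as D records.
    static List<DRecord> parseAll(List<String> lines) {
        List<DRecord> records = new ArrayList<>();
        for (String line : lines) {
            DRecord record = parse(line);
            if (record != null) {
                records.add(record);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        DRecord record = parse("D|215325775|20210507|9|BALCOM SYSTEMS LTD|||");
        System.out.println(record.accountId + " / " + record.company);
    }
}
```

Splitting on the escaped pipe ("\\|") matters here, because an unescaped | is the regex alternation operator.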

Related

How can I get non-matching groups using a Matcher in Java?

I'm trying to write a Java regex to catch some groups of words from a String using a Matcher.
Say I have this string: "Hello, we are #happy# to see you today".
I would like to get two groups of matches, one having
Hello, we are
to see you today
and the other
happy
So far, I was only able to match the word between the #s using this Pattern:
Pattern p = Pattern.compile("#(.+?)#");
I've read about negative lookahead and lookaround, played a bit with it but without success.
I assume I should do some sort of negation of the regex so far, but I couldn't come up with anything.
Any help would be really appreciated, thank you.
From comment:
I may run into a string with more than one instance of words wrapped by #, such as "#Hello# kind #stranger#"
From comment:
I need to apply some different style format to both the text inside and outside.
Since you need to apply different styling, the code needs to process each block of text separately and needs to know whether the text is inside or outside a #..# section.
Note, in the following code, it will silently skip the last #, if there is an odd number of them.
String input = ...
for (Matcher m = Pattern.compile("([^#]+)|#([^#]+)#").matcher(input); m.find(); ) {
if (m.start(1) != -1) {
String outsideText = m.group(1);
System.out.println("Outside: \"" + outsideText + "\"");
} else {
String insideText = m.group(2);
System.out.println("Inside: \"" + insideText + "\"");
}
}
Output for input = "Hello, we are #happy# to see you today"
Outside: "Hello, we are "
Inside: "happy"
Outside: " to see you today"
Output for input = "#Hello# kind #stranger#"
Inside: "Hello"
Outside: " kind "
Inside: "stranger"
Output for input = "This #text# has unpaired # characters"
Outside: "This "
Inside: "text"
Outside: " has unpaired "
Outside: " characters"
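If the pieces are needed for later styling rather than printing, the same loop can collect them in order. This is a minimal sketch using the same pattern; the class name HashSegments and the "in:" / "out:" prefixes are made up for illustration, and a real implementation would probably use a small segment type instead of prefixed strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HashSegments {
    // Group 1: text outside #..#; group 2: text inside #..#.
    private static final Pattern SEGMENT = Pattern.compile("([^#]+)|#([^#]+)#");

    // Returns the pieces in input order, each tagged with "in:" or "out:".
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = SEGMENT.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                parts.add("out:" + m.group(1));
            } else {
                parts.add("in:" + m.group(2));
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("#Hello# kind #stranger#"));
    }
}
```

As in the code above, an unpaired # is skipped silently, so the text on both sides of it is reported as outside text.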
The best I could do is split the string into groups, then merge groups 1 and 4:
(^.*)(\#(.+?)\#)(.*)
EDIT: Taking remarks from the comments :
(^[^\#]*)(?:\#(.+?)\#)([^\#]*)
Thanks to @Lino, we no longer capture the useless group containing the # characters, and the groups outside the #...# section now capture anything except # rather than any character.
Is this solution fine?
Pattern pattern = Pattern.compile("([^#]+)|#([^#]*)#");
Matcher matcher = pattern.matcher("Hello, we are #happy# to see you today");
List<String> notBetween = new ArrayList<>(); // not surrounded by #
List<String> between = new ArrayList<>(); // surrounded by #
while (matcher.find()) {
    if (Objects.nonNull(matcher.group(1))) notBetween.add(matcher.group(1));
    if (Objects.nonNull(matcher.group(2))) between.add(matcher.group(2));
}
System.out.println("Printing group 1");
for (String string : notBetween) {
    System.out.println(string);
}
System.out.println("Printing group 2");
for (String string : between) {
    System.out.println(string);
}

How to split records on white space when different lines have spaces at different positions

I have a file with records as below, and I am trying to split each record on white space and replace the separators with commas.
file:
a 3w 12 98 header P6124
e 4t 2 100 header I803
c 12L 11 437 M12
BufferedReader reader = new BufferedReader(new FileReader("/myfile.txt"));
String line = reader.readLine();
while (line != null) {
    System.out.println(line);
    // split the current line before reading the next one
    String[] splitLine = line.split("\\s+");
    line = reader.readLine();
}
If the data is separated by multiple white spaces, I usually go for regex replace -> split('\\s+') or split(" +").
But in the above case, record c doesn't have the header field. Hence the regex "\s+" or " +" will just skip the missing field, and I get c,12L,11,437,M12 instead of c,12L,11,437,,M12
How do I properly split the lines based on any delimiter in this case so that I get data in the below format:
a,3w,12,98,header,P6124
e,4t,2,100,header,I803
c,12L,11,437,,M12
Could anyone let me know how I can achieve this ?
Maybe you can try a more complicated approach: use a regex that matches exactly six fields on each line and explicitly handles a missing value for the fifth one.
I rewrote your example adding some console log in order to clarify my suggestion:
public class RegexTest {
private static final String Input = "a 3w 12 98 header P6124\n" +
"e 4t 2 100 header I803\n" +
"c 12L 11 437 M12";
public static void main(String[] args) throws Exception {
BufferedReader reader = new BufferedReader(new StringReader(Input));
String line = null;
Pattern pattern = Pattern.compile("^([^ ]+) +([^ ]+) +([^ ]+) +([^ ]+) +([^ ]+)? +([^ ]+)$");
do {
line = reader.readLine();
System.out.println(line);
if(line != null) {
String[] splitLine = line.split("\\s+");
System.out.println(splitLine.length);
System.out.println("Line: " + line);
Matcher matcher = pattern.matcher(line);
System.out.println("matches: " + matcher.matches());
System.out.println("groups: " + matcher.groupCount());
for(int i = 1; i <= matcher.groupCount(); i++) {
System.out.printf(" Group %d has value '%s'\n", i, matcher.group(i));
}
}
} while (line != null);
}
}
The key is that the pattern used to match each line requires a sequence of six fields:
for each field, the value is described as [^ ]+
separators between fields are described as " +" (one or more spaces)
the fifth (nullable) field is described as ([^ ]+)?, an optional group
each value is captured as a group using parentheses: ( ... )
start (^) and end ($) of each line are marked explicitly
Then, each line is matched against the given pattern, obtaining six groups: you can access each group using matcher.group(index), where index is 1-based because group(0) returns the full match.
This is a more complex approach but I think it can help you to solve your problem.
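The groups can then be joined with commas to produce the output the question asks for. The sketch below reuses the pattern from above; the class name SixFieldCsv is hypothetical, and it assumes that a missing fifth field leaves at least two spaces between the fourth and sixth fields, because the pattern needs a separator on each side of the optional group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SixFieldCsv {
    // Six space-separated fields; the fifth one is optional.
    private static final Pattern LINE = Pattern.compile(
        "^([^ ]+) +([^ ]+) +([^ ]+) +([^ ]+) +([^ ]+)? +([^ ]+)$");

    // Returns the comma-separated form, or null when the line does not match.
    static String toCsv(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= m.groupCount(); i++) {
            if (i > 1) {
                sb.append(',');
            }
            String value = m.group(i); // null when the optional group is absent
            if (value != null) {
                sb.append(value);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCsv("a 3w 12 98 header P6124"));
        System.out.println(toCsv("c 12L 11 437  M12")); // two spaces before M12
    }
}
```

Checking matches() before reading the groups matters: calling group(i) after a failed match throws an IllegalStateException.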
Put a limit on the number of whitespace chars that may be used to split the input.
In the case of your example data, a maximum of 5 works:
String[] splitLine = line.split("\\s{1,5}");
Are you just trying to switch your delimiters from spaces to commas?
In that case:
cat myFile.txt | sed 's/ */ /g' | sed 's/ /,/g'
*edit: added a stage to strip out lists of more than two spaces, replacing them with just the two spaces needed to retain the double comma.

How to create a regex that accepts specific characters?

I have this regex:
^[a-zA-Z0-9_#.#$%&'*+-/=?^`{|}~!(),:;<>[-\]]{8,}$
I need a regex to accept a minimum word length of 8, letters(uppercase & lowercase), numbers and these characters:
!#$%&'*+-/=?^_`{|}~"(),:;<>#[]
It works when I tested it here.
This is how I used it in Java Android.
public static final String regex = "^[a-zA-Z0-9_#.#$%&'*+-/=?^`{|}~!(),:;<>[-\\]]{8,}$";
This is the error that I received.
java.util.regex.PatternSyntaxException: Missing closing bracket in character class near index 49
^[a-zA-Z0-9_#.#$%&'*+-/=?^`{|}~!(),:;<>[-\]]{8,}$
If you just want to test if a given input string matches your pattern, you may use String#matches directly, e.g.
String regex = "[a-zA-Z0-9_#.#$%&'*+-/=?^`{|}~!(),:;<>\\[\\]-]{8,}";
String input = "Jon#Skeet#123";
if (input.matches(regex)) {
System.out.println("Found a match");
}
else {
System.out.println("No match");
}
If you wanted to parse a larger input text and identify such matching words, then you would want to use a formal Pattern and Matcher. But, I don't see the need for this just based on your question.
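The reason for the exception is that in Java's regex flavor an unescaped [ inside a character class opens a nested class (a union), so the engine keeps looking for a second closing bracket. Escaping both brackets and placing the hyphen last avoids this. The following sketch shows the idea; the class name AllowedChars is hypothetical, and the exact character list (including @) is an assumption about what the question intends.

```java
import java.util.regex.Pattern;

final class AllowedChars {
    // '[' and ']' are escaped; '-' comes last so it is read as a literal
    // rather than as a range operator.
    private static final Pattern ALLOWED = Pattern.compile(
        "[a-zA-Z0-9_.!#$%&'*+/=?^`{|}~\"(),:;<>@\\[\\]-]{8,}");

    // True when the whole input is at least 8 characters long and uses
    // only the allowed characters.
    static boolean isValid(String input) {
        return input != null && ALLOWED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Jon#Skeet#123"));
        System.out.println(isValid("short"));
    }
}
```

Compiling the pattern once into a static field also avoids recompiling it on every call, which String#matches would do.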
You have to use the Pattern and Matcher classes; this may help you.
Follow this tutorial: https://www.mkyong.com/regular-expressions/how-to-validate-password-with-regular-expression/
Here is one example.
try {
Pattern pattern;
Matcher matcher;
final String PASSWORD_PATTERN = "((?=.*\\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%]).{6,20})";
pattern = Pattern.compile(PASSWORD_PATTERN);
matcher = pattern.matcher(password_string );
if(matcher.matches()){
Log.e("TAG", "TRUE");
}else{
Log.e("TAG", "FALSE");
}
} catch (RuntimeException e) {
return false;
}

JAVA Get text from String

Hi I get this String from server :
id_not="autoincrement"; id_obj="-"; id_tr="-"; id_pgo="-"; typ_not=""; tresc="Nie wystawił"; datetime="-"; lon="-"; lat="-";
I need to create a new String, e.g. String word, and assign it the value I get from tresc="Nie wystawił"
As @Jan suggests in a comment, you can use a regex, for example:
String str = "id_not=\"autoincrement\"; id_obj=\"-\"; id_tr=\"-\"; id_pgo=\"-\"; typ_not=\"\"; tresc=\"Nie wystawił\"; datetime=\"-\"; lon=\"-\"; lat=\"-\";";
Pattern p = Pattern.compile("tresc(.*?);");
Matcher m = p.matcher(str);
if (m.find()) {
System.out.println(m.group());
}
Output
tresc="Nie wystawił";
If you want to get only the value of tresc you can use :
Pattern p = Pattern.compile("tresc=\"(.*?)\";");
Matcher m = p.matcher(str);
if (m.find()) {
System.out.println(m.group(1));
}
Output
Nie wystawił
Something along the lines of
Pattern p = Pattern.compile("tresc=\"([^\"]+)\"");
Matcher m = p.matcher(stringFromServer);
if(m.find()) {
String whatYouWereLookingfor = m.group(1);
}
should do the trick. JSON parsing might be much better in the long run if you need additional values.
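If more than one value is needed and the server format cannot be changed, the same regex idea extends to collecting every key="value" pair into a map. This is a minimal sketch under the assumption that keys are plain word characters and values never contain a double quote; the class name ServerFields is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ServerFields {
    // key="value": group 1 is the key, group 2 the (possibly empty) value.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    // Returns all pairs in the order they appear in the input.
    static Map<String, String> parse(String input) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        // The last letter of the value is written as a Unicode escape.
        String fromServer = "id_not=\"autoincrement\"; typ_not=\"\"; "
            + "tresc=\"Nie wystawi\u0142\"; lon=\"-\"; lat=\"-\";";
        String word = parse(fromServer).get("tresc");
        System.out.println(word);
    }
}
```

Using [^\"]* rather than [^\"]+ lets empty values such as typ_not="" be captured as empty strings instead of being skipped.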
Your question is unclear, but I think you get a string from the server and want the value of tresc from it. You can first search for tresc in the string, like:
serverString.substring(serverString.indexOf("tresc") + x, serverString.length());
Here, replace x with how many characters past the start of tresc you want to begin picking from.
Read up on substring and delimiters.
As the values are separated by semicolons, another solution could be:
int start = serverString.indexOf("tresc");
// index of the first ";" after tresc, i.e. the end of that entry
int delimiter = serverString.indexOf(";", start);
// indexOf returns -1 when nothing is found, so check and account for it
if (start != -1 && delimiter != -1) {
String subString = serverString.substring(start, delimiter);
}
This gives you the tresc part (tresc="Nie wystawił"), which you can then use any way you want.

Regular Expressions in Java: Matching a date value surrounded by other data

I have a lot of files I am retrieving data from, and I have hit a wall with date values surrounded by other data. I am using Java, and the regular expression I am using works for the variable string_i_currently_match however I need it to match example_string_i_need_to_match
String example_string_i_need_to_match = "data 10/12/2010, data, data";
String string_i_currently_match = "10/12/2010,";
Pattern pattern = Pattern.compile(
"^(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)\\d\\d(?:,)$"
);
Matcher matcher = pattern.matcher(fileString);
boolean found = false;
while (matcher.find()) {
System.out.printf("I found the text \"%s\" starting at " +
"index %d and ending at index %d.\n",
matcher.group(), matcher.start(), matcher.end());
found = true;
}
if(!found){
System.out.println("No match found.");
}
Perhaps it's because I'm exhausted, but I can't get it to match. Any help, even pointers would be greatly appreciated.
Edit: To clarify, I do not want to match data, data but just get the index of the date itself.
The ^ sign matches the start of the string and $ matches the end. Removing those allows the pattern to match dates within the string.
Like this:
"(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)\\d\\d(?:,)"
This will match your date:
[\d]{2}/[\d]{2}/[\d]{4}
In what you posted, you made at least one error: because of the ^ anchor, the pattern only matches a date at the start of the string.
String ResultString = null;
try {
Pattern regex = Pattern.compile("\\b[0-9]{2}/[0-9]{2}/[0-9]{4}\\b");
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
ResultString = regexMatcher.group();
}
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
}
Unless I am overlooking something this should match your date.
See it working here : http://ideone.com/HETGU
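Since the question asks for the index of the date, the unanchored pattern can be combined with Matcher.start(). The sketch below keeps the day/month/year validation from the question and only drops ^ and $; the class name DateFinder and the method name indexOfDate are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DateFinder {
    // The question's pattern without the ^ and $ anchors.
    private static final Pattern DATE = Pattern.compile(
        "(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)\\d\\d(?:,)");

    // Returns the index where the first date starts, or -1 if there is none.
    static int indexOfDate(String text) {
        Matcher m = DATE.matcher(text);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfDate("data 10/12/2010, data, data"));
    }
}
```

Note that the match itself still includes the trailing comma, so the date ends one character before m.end().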
