Java Regular Expression Multiline - java

I'm trying to get the result of a match with two lines and more, this is my text in a file (for JOURNAL ENTRIES for Wincor ATM):
DEMANDE SOLDE
N° CARTE : 1500000001180006
OPERATION NO. : 585068
========================================
RETRAIT
N° CARTE 1600001002200006
OPERATION NO. : 585302
MONTANT : MAD 200.00
========================================
... etc.
There are more lines, repeated for each operation: retrait (ATM withdrawal), demande de solde (balance inquiry). I want to get a result like: RETRAIT\nN° CARTE 1600001002200006
My java code:
String filename="20140604.jrn";
File file=new File(filename);
String regexe = ".*RETRAIT^\r\n.*CARTE.*\\d{16}"; // Work with .*CARTE.*\\d{16}: result: N° CARTE : 1500000001180006 N° CARTE 1600001002200006
Pattern pattern = Pattern.compile(regexe,Pattern.MULTILINE);
try {
BufferedReader in = new BufferedReader(new FileReader(file));
while (in.ready()) {
String s = in.readLine();
Matcher matcher = pattern.matcher(s);
while (matcher.find()) { // find the next match
System.out.println("found the pattern \"" + matcher.group());
}
}
in.close();
}
catch(IOException e) {
System.out.println("File 20140604.jrn not found");
}
Any Solution Please ?

I am unable to test this right now, but it looks like you have the boundary special character '^' in the wrong spot. It is trying to match RETRAIT followed by the beginning of a line followed by newline characters, when the beginning of the line won't start until after the newline characters.
UPDATE:
With an online java regex tool, I've been able to test this:
^RETRAIT\s*\w+.*CARTE\s+\d{16}
which matches what you want in multiline mode. The \s special character consumes whitespace (including carriage return and new line), which is more resilient than checking explicitly for \n or \r.
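One more thing to watch for: the loop in the question calls readLine(), so the matcher only ever sees one line at a time and a pattern that spans two lines cannot match there. Below is a minimal sketch of one way around that, assuming the journal is small enough to load into a single string. The class name JournalMatcher, the method findWithdrawals and the ISO-8859-1 charset are illustrative assumptions, not part of the original post.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JournalMatcher {
    // MULTILINE lets ^ match at the start of every line, not only at the start of the input.
    private static final Pattern RETRAIT =
            Pattern.compile("^RETRAIT\\s*\\w+.*CARTE\\s+\\d{16}", Pattern.MULTILINE);

    // Returns every "RETRAIT + card number" match found in the whole text.
    static List<String> findWithdrawals(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = RETRAIT.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        String text;
        if (args.length > 0) {
            // Assumption: the journal is ISO-8859-1 encoded; adjust to the real encoding.
            text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.ISO_8859_1);
        } else {
            // Inline sample taken from the question, used when no file is given.
            text = "DEMANDE SOLDE\nN\u00B0 CARTE : 1500000001180006\nOPERATION NO. : 585068\n"
                 + "RETRAIT\nN\u00B0 CARTE 1600001002200006\nOPERATION NO. : 585302\n";
        }
        for (String match : findWithdrawals(text)) {
            System.out.println("found the pattern \"" + match + "\"");
        }
    }
}
```

The key design choice is to run the matcher over the whole file content instead of over each line, so that \s* can consume the line break between RETRAIT and the card line.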

Related

Regex for replacing Exact String match [duplicate]

My input:
1. end
2. end of the day or end of the week
3. endline
4. something
5. "something" end
Based on the above discussions, If I try to replace a single string using this snippet, it removes the appropriate words from the line successfully
public class DeleteTest {
public static void main(String[] args) {
// TODO Auto-generated method stub
try {
File file = new File("C:/Java samples/myfile.txt");
File temp = File.createTempFile("myfile1", ".txt", file.getParentFile());
String delete="end";
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
PrintWriter writer = new PrintWriter(new OutputStreamWriter(new FileOutputStream(temp)));
for (String line; (line = reader.readLine()) != null;) {
line = line.replaceAll("\\b"+delete+"\\b", "");
writer.println(line);
}
reader.close();
writer.close();
}
catch (Exception e) {
System.out.println("Something went Wrong");
}
}
}
My output If I use the above snippet:(Also my expected output)
1.
2. of the day or of the week
3. endline
4. something
5. "something"
But when I include more words to delete, and for that purpose when I use Set, I use the below code snippet:
public static void main(String[] args) {
// TODO Auto-generated method stub
try {
File file = new File("C:/Java samples/myfile.txt");
File temp = File.createTempFile("myfile1", ".txt", file.getParentFile());
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
PrintWriter writer = new PrintWriter(new OutputStreamWriter(new FileOutputStream(temp)));
Set<String> toDelete = new HashSet<>();
toDelete.add("end");
toDelete.add("something");
for (String line; (line = reader.readLine()) != null;) {
line = line.replaceAll("\\b"+toDelete+"\\b", "");
writer.println(line);
}
reader.close();
writer.close();
}
catch (Exception e) {
System.out.println("Something went Wrong");
}
}
I get my output as: (It just removes the space)
1. end
2. endofthedayorendoftheweek
3. endline
4. something
5. "something" end
Can you help me with this?
You need to create an alternation group out of the set with
String.join("|", toDelete)
and use as
line = line.replaceAll("\\b(?:"+String.join("|", toDelete)+")\\b", "");
The pattern will look like
\b(?:end|something)\b
See the regex demo. Here, (?:...) is a non-capturing group that is used to group several alternatives without creating a memory buffer for the capture (you do not need it since you remove the matches).
Or, better, compile the regex before entering the loop:
Pattern pat = Pattern.compile("\\b(?:" + String.join("|", toDelete) + ")\\b");
...
line = pat.matcher(line).replaceAll("");
UPDATE:
To allow matching whole "words" that may contain special chars, you need to Pattern.quote those words to escape those special chars, and then you need to use unambiguous word boundaries, (?<!\w) instead of the initial \b to make sure there is no word char before and (?!\w) negative lookahead instead of the final \b to make sure there is no word char after the match.
In Java 8, you may use this code:
Set<String> nToDel = new HashSet<>();
nToDel = toDelete.stream()
.map(Pattern::quote)
.collect(Collectors.toCollection(HashSet::new));
String pattern = "(?<!\\w)(?:" + String.join("|", nToDel) + ")(?!\\w)";
The regex will look like (?<!\w)(?:\Q+end\E|\Qsomething-\E)(?!\w). Note that the symbols between \Q and \E are parsed as literal symbols.
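To see the quoting and the lookaround boundaries working together, here is a small self-contained sketch. The class name WordRemover and its methods are made up for illustration, and it joins the quoted words with Collectors.joining rather than collecting into a second set first; a LinkedHashSet is used only to keep the alternation order predictable.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class WordRemover {
    // Builds (?<!\w)(?:\Qword1\E|\Qword2\E|...)(?!\w) from the given words.
    static Pattern buildPattern(Set<String> words) {
        String alternation = words.stream()
                .map(Pattern::quote)              // escape regex metacharacters in each word
                .collect(Collectors.joining("|"));
        return Pattern.compile("(?<!\\w)(?:" + alternation + ")(?!\\w)");
    }

    // Removes every whole-word occurrence; surrounding spaces are left untouched.
    static String remove(Pattern pattern, String line) {
        return pattern.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        Set<String> toDelete = new LinkedHashSet<>();
        toDelete.add("end");
        toDelete.add("something");
        Pattern pattern = buildPattern(toDelete);   // compiled once, reused for every line

        String[] lines = {
            "1. end",
            "2. end of the day or end of the week",
            "3. endline",
            "4. something",
            "5. \"something\" end"
        };
        for (String line : lines) {
            System.out.println(remove(pattern, line));
        }
    }
}
```

Because only the words are removed, the spaces around them stay in the line; trim or collapse whitespace afterwards if that matters for the output file.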
The problem is that you're not creating the correct regex for replacing the words in the set.
"\\b"+toDelete+"\\b" will produce this String \b[end, something]\b which is not what you need.
To fix that you can do something like this:
for(String del : toDelete){
line = line.replaceAll("\\b"+del+"\\b", "");
}
What this does is to go through the set, produce a regex from each word and remove that word from the line String.
Another approach will be to produce a single regex from all the words in the set.
Eg:
String regex = "";
for(String word : toDelete){
regex+=(regex.isEmpty() ? "" : "|") + "(\\b"+word+"\\b)";
}
....
line = line.replaceAll(regex, ""); // replaceAll, not replace: replace() treats its argument as literal text
This should produce a regex that looks something like this: (\bend\b)|(\bsomething\b)

Validating a Text File content using regex

The Input text file has content as following :
TIMINCY........ many arbitrary characters incl. white spaces and tabs
DETAIL........ many arbitrary characters incl. white spaces and tabs
DETAIL........ many arbitrary characters incl. white spaces and tabs
DETAIL........ many arbitrary characters incl. white spaces and tabs
.
.
. (any number of lines containing DETAILS)
TIMINCY........ many arbitrary characters incl. white spaces and tabs
DETAIL........ many arbitrary characters incl. white spaces and tabs
DETAIL........ many arbitrary characters incl. white spaces and tabs
DETAIL........ many arbitrary characters incl. white spaces and tabs
.
.
.(so on)
Q: I need to validate the file using regex so that if the file's content is NOT
in accordance with respect to the pattern given above then I can throw CustomException.
Please let me know if you can help. Any help is appreciated.
String patternString = "TMINCY"+"[.]\\{*\\}"+";"+"["+"DETAILS"+"[.]\\{*\\}"+";"+"]"+"\\{*\\}"+"]"+"\\{*\\};";
Pattern pattern = Pattern.compile(patternString );
String messageString = null;
StringBuilder builder = new StringBuilder();
try (BufferedReader reader = Files.newBufferedReader(curracFile.toPath(), charset)) {
String line;
while ((line = reader.readLine()) != null) {
builder.append(line);
builder.append(NEWLINE_CHAR_SEQUENCE);
}
messageString = builder.toString();
} catch (IOException ex) {
LOGGER.error(FILE_CREATION_ERROR, ex.getCause());
throw new BusinessConversionException(FILE_CREATION_ERROR, ex);
}
System.out.println("messageString is::"+messageString);
return pattern.matcher(messageString).matches();
But it is Returning FALSE for correct file. Please help me with the regex.
What about something like "^(TIMINCY|DETAIL)[\.]+[a-zA-Z\s.]+" (in a Java string literal, each backslash must be doubled)
"^" - matches the start of the line
"(TIMINCY|DETAIL)" - matches TIMINCY or DETAIL
"[\.]+" - matches the dot character occurring one or more times
"[a-zA-Z\s.]+" - here you put the allowed characters, occurring one or more times
Reference: Oracle Documentation
You could try line by line when you're iterating over the lines
Pattern p = Pattern.compile("^(?:TIMINCY|DETAIL)[.]{8}.*"); // DETAIL, as in the sample file
//Explanation:
// ^ : Matches the beginning of the string.
// (?:): non capturing group.
// [.]{8}: Matches a dot (".") eight times in a row.
// .*: Matches everything until the end of the string
// | : Regex OR operator
String line = reader.readLine();
Matcher m;
while (line != null) {
m = p.matcher(line);
if(!m.matches()) // Matcher.matches() takes no argument
throw new CustomException("Not valid");
builder.append(line);
builder.append(NEWLINE_CHAR_SEQUENCE);
line = reader.readLine();
}
Also: Matcher.matches() returns true only if the ENTIRE STRING matches your regular expression. I would recommend using Matcher.find() to look for patterns you don't want.
Matcher (Java 7)
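The line-by-line check above accepts any mix of TIMINCY and DETAIL lines, so it does not verify the ordering shown in the question. A rough sketch of how the ordering could be enforced on top of it follows. It assumes that the file must start with a TIMINCY line and that every TIMINCY line is followed by at least one DETAIL line (the question does not say whether zero DETAIL lines are allowed). The class name BlockOrderValidator is hypothetical, and the simplified per-line patterns ignore the run of dots.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class BlockOrderValidator {
    // Simplified per-line patterns: only the leading keyword is checked here.
    private static final Pattern HEADER = Pattern.compile("TIMINCY.*");
    private static final Pattern DETAIL = Pattern.compile("DETAIL.*");

    // Returns true when the lines form one or more blocks of
    // "TIMINCY line followed by at least one DETAIL line".
    static boolean isValid(List<String> lines) {
        boolean inBlock = false;        // a TIMINCY line has been seen
        boolean blockHasDetail = false; // the current block has at least one DETAIL line
        for (String line : lines) {
            if (HEADER.matcher(line).matches()) {
                if (inBlock && !blockHasDetail) {
                    return false;       // previous TIMINCY had no DETAIL lines
                }
                inBlock = true;
                blockHasDetail = false;
            } else if (DETAIL.matcher(line).matches()) {
                if (!inBlock) {
                    return false;       // DETAIL before any TIMINCY
                }
                blockHasDetail = true;
            } else {
                return false;           // unknown line type
            }
        }
        return inBlock && blockHasDetail;
    }

    public static void main(String[] args) {
        List<String> good = Arrays.asList(
                "TIMINCY........ abc", "DETAIL........ x", "DETAIL........ y",
                "TIMINCY........ def", "DETAIL........ z");
        List<String> bad = Arrays.asList("DETAIL........ x", "TIMINCY........ abc");
        System.out.println(isValid(good));
        System.out.println(isValid(bad));
    }
}
```

In the question's code, the caller would throw the custom exception whenever isValid returns false. Checking line by line also avoids building one large string and running a heavily nested pattern over it.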

matching between array and content of file without using regex

Is it possible to match the contents of an array against the contents of a file without using regex?
Please reply.
Suppose I have a text file containing these sentences:
the sql is the best book for jon.
book sql is the best title for jon.
the html for author asr.
book java for famous writer amr.
and I have stored these strings in an array:
sql html java
jon asr amr
I want to search the file for the contents of the array. For example, if "sql" and "jon" are in the same sentence in the text file, then write out the sentence,
with all words before "sql" named as prefix, all words between "sql" and "jon" named as middle, and all words after "jon" named as suffix.
I tried to write this code:
String book[][] = {{"sql","html","java"},{"jon","asr","amr"}};
String input;
try {
BufferedReader br = new BufferedReader(new FileReader(new File("sample.txt") ));
input= br.readLine();
while ((input)!= null)
{
if((book[0][0].contains(input))&( book[1][0]).contains(input)){
System.out.println();
if((book[0][1].contains(input))&( book[1][1]).contains(input)){
System.out.println();
if((book[0][2].contains(input))&( book[1][2]).contains(input)){
System.out.println();
}
else
System.out.println("not match");
}}
}} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
I don't know how to write the code to extract the prefix, middle and suffix.
The expected output is:
the sentence is : the sql is the best book for jon.
prefix is :the
middle is:is the best book for
suffix is: null
and so on...
You should use Pattern class for that. http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
Tutorial http://docs.oracle.com/javase/tutorial/essential/regex/
Sorry, I'm not going to write the exact code.
The pattern will look like
"(.*)(?:sql|html|java)(.*)(?:jon|asr|amr)(.*)"
Then, in Matcher you will find your prefix, middle and suffix as matcher.group(1), matcher.group(2) and matcher.group(3).
Here is the code you need:
String line = "the sql is the best book for jon.";
String regex = "(.*)(sql|html|java)(.*)(jon|asr|amr)(.*)";
Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(line);
matcher.find();
String prefix = matcher.group(1);
String firstMatch = matcher.group(2);
String middle = matcher.group(3);
String secondMatch = matcher.group(4);
String suffix = matcher.group(5);
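Since the question asks for a solution without regex, the same split can also be sketched with String.indexOf and substring. The class SentenceSplitter and its split method are illustrative names, not from the thread, and the sketch assumes the first word must appear before the second one in the sentence.

```java
class SentenceSplitter {
    // Returns {prefix, middle, suffix}, or null when the sentence does not
    // contain 'first' followed later by 'second'.
    static String[] split(String sentence, String first, String second) {
        int firstStart = sentence.indexOf(first);
        if (firstStart < 0) {
            return null;
        }
        int afterFirst = firstStart + first.length();
        int secondStart = sentence.indexOf(second, afterFirst);
        if (secondStart < 0) {
            return null;
        }
        String prefix = sentence.substring(0, firstStart).trim();
        String middle = sentence.substring(afterFirst, secondStart).trim();
        String suffix = sentence.substring(secondStart + second.length()).trim();
        return new String[] {prefix, middle, suffix};
    }

    public static void main(String[] args) {
        String[][] book = {{"sql", "html", "java"}, {"jon", "asr", "amr"}};
        String[] sentences = {
            "the sql is the best book for jon.",
            "book sql is the best title for jon.",
            "the html for author asr.",
            "book java for famous writer amr."
        };
        for (String sentence : sentences) {
            // Try each (title, author) pair stored at the same column index.
            for (int i = 0; i < book[0].length; i++) {
                String[] parts = split(sentence, book[0][i], book[1][i]);
                if (parts != null) {
                    System.out.println("the sentence is : " + sentence);
                    System.out.println("prefix is : " + parts[0]);
                    System.out.println("middle is : " + parts[1]);
                    System.out.println("suffix is : " + parts[2]);
                }
            }
        }
    }
}
```

Note that indexOf matches substrings, so "java" would also be found inside "javascript"; splitting the sentence into words first would avoid that if it matters.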

strip data from a text file

I'm going to start by posting what the data in the text file looks like. This is just 4 lines of it; the actual file is a couple hundred lines long.
Friday, September 9 2011
-STV 101--------05:00 - 23:59 SSB 4185 Report Printed on 9/08/2011 at 2:37
0-AH 104--------07:00 - 23:00 AH GYM Report Printed on 9/08/2011 at 2:37
-BG 105--------07:00 - 23:00 SH GREAT HALL Report Printed on 9/08/2011 at 2:37
What I want to do with this text file is ignore the first line with the date on it, and then ignore the '-' on the next line but read in the "STV 101", "5:00" and "23:59" save them to variables and then ignore all other characters on that line and then so on for each line after that.
Here is how I am currently reading the lines entirely. And then I just call this function once the user has put the path in the scheduleTxt JTextfield. It can read and print each line out fine.
public void readFile () throws IOException
{
try
{
FileInputStream fstream = new FileInputStream(scheduleTxt.getText());
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
while ((strLine = br.readLine()) != null)
{
System.out.println (strLine);
}
in.close();
}
catch (Exception e){//Catch exception if any
System.err.println("Error: " + e.getMessage());
}
}
UPDATE: it turns out I also need to strip Friday out of the top line and put it in a variable as well
Thanks! Beef.
Did not test it thoroughly, but this regular expression would capture the info you need in groups 2, 5 and 7: (Assuming you're only interested in "AH 104" in the example of "0-AH 104----")
^(\S)*-(([^-])*)(-)+((\S)+)\s-\s((\S)+)\s(.)*
String regex = "^(\\S)*-(([^-])*)(-)+((\\S)+)\\s-\\s((\\S)+)\\s(.)*";
Pattern pattern = Pattern.compile(regex);
while ((strLine = br.readLine()) != null){
Matcher matcher = pattern.matcher(strLine);
boolean matchFound = matcher.find();
if (matchFound){
String s1 = matcher.group(2);
String s2 = matcher.group(5);
String s3 = matcher.group(7);
System.out.println (s1 + " " + s2 + " " + s3);
}
}
The expression could be tuned with non-capturing groups in order to capture only the information you want.
Explanation of the regexp's elements:
1. ^(\S)*- Matches group of non-whitespace characters ended by -. Note: Could have been ^(.)*- instead, would not work if there are whitespaces before the first -.
2. (([^-])*) Matches group of every character except -.
3. (-)+ Matches group of one or more -.
4. ((\S)+) Matches group of one or more non-white-space characters. This is captured in group 5.
5. \s-\s Matches group of white-space followed by - followed by whitespace.
6. ((\S)+) Same as 4. This is captured in group 7.
7. \s(.)* Matches white-space followed by anything, which will be skipped.
More info on regular expression can be found on this tutorial.
There are also several useful cheatsheets around. When designing/debugging an expression, a regexp testing tool can prove quite useful, too.
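As a sketch of the tuning mentioned in this answer, the helper groups can simply be dropped so that only the three wanted values are captured, as groups 1, 2 and 3. The second pattern, for pulling the weekday out of the date line (the question's UPDATE), is an assumption about that line's format based on the single sample "Friday, September 9 2011". The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScheduleParser {
    // Same structure as the original expression, but only room, start and end are captured.
    private static final Pattern ENTRY =
            Pattern.compile("^\\S*-([^-]*)-+(\\S+)\\s-\\s(\\S+)\\s.*");
    // Assumed format of the date line: "<weekday>, <rest of date>".
    private static final Pattern DAY = Pattern.compile("^([A-Za-z]+),\\s.*");

    // Returns {room, start, end}, or null if the line is not a schedule entry.
    static String[] parseEntry(String line) {
        Matcher m = ENTRY.matcher(line);
        return m.find() ? new String[] {m.group(1), m.group(2), m.group(3)} : null;
    }

    // Returns the weekday from the date line, or null if the line does not look like one.
    static String parseDay(String line) {
        Matcher m = DAY.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] lines = {
            "Friday, September 9 2011",
            "-STV 101--------05:00 - 23:59 SSB 4185 Report Printed on 9/08/2011 at 2:37",
            "0-AH 104--------07:00 - 23:00 AH GYM Report Printed on 9/08/2011 at 2:37"
        };
        for (String line : lines) {
            String day = parseDay(line);
            String[] entry = parseEntry(line);
            if (day != null) {
                System.out.println("day: " + day);
            } else if (entry != null) {
                System.out.println(entry[0] + " " + entry[1] + " " + entry[2]);
            }
        }
    }
}
```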

Java regex matching

I have a bunch of lines in a text file and I want to match this ${ALPHANUMERIC characters} and replace it with ${SAME ALPHANUMERIC characters plus _SOMETEXT(CONSTANT)}.
I've tried this expression ${(.+)} but it didn't work and I also don't know how to do the replace regex in java.
thank you for your feedback
Here is some of my code :
BufferedReader br = new BufferedReader(new FileReader(file));
String line;
StringBuilder sb = new StringBuilder();
while ((line = br.readLine()) != null) {
Pattern p = Pattern.compile("\\$\\{.+\\}");
Matcher m = p.matcher(line); // get a matcher object
if(m.find()) {
System.out.println("MATCH: "+m.group());
//TODO
//REPLACE STRING
//THEN APPEND String Builder
}
}
OK, the above works, but it only finds my variable and not the whole line. For example, here is my input:
some text before ${VARIABLE_NAME} some text after
some text before ${VARIABLE_NAME2} some text after
some text before some text without variable some text after
... etc
So I just want to replace ${VARIABLE_NAME} or ${VARIABLE_NAME2} with ${VARIABLE_NAME_SOMETHING} or ${VARIABLE_NAME2_SOMETHING} respectively, but leave the preceding and following text on the line as it is.
EDIT:
I thought of a way like this:
if(line.contains("\\${([a-zA-Z0-9 ]+)}")){
System.out.println(line);
}
if(line.contains("\\$\\{.+\\}")){
System.out.println(line);
}
My idea was to capture the line containing this, then replace , but the regex is not ok, it works with pattern/matcher combination though.
EDIT II
I feel like I'm getting closer to the solution here, here is what I've come up with so far :
if(line.contains("$")){
System.out.println(line.replaceAll("\\$\\{.+\\}", "$1" +"_SUFFIX"));
}
What I meant by $1 is the string you just matched replace it with itself + _SUFFIX
I would use the String.replaceAll() method like so:
String old = "some string data";
String updated = old.replaceAll("\\$([a-zA-Z0-9]+)", "($1) CONSTANT"); // "new" is a reserved word; groups are referenced as $1 in Java
The $ is a special regular expression character that represents the end of a line. You'll need to escape it in order to match it. You'll also need to escape the backslash that you use for escaping the dollar sign because of the way Java handles strings.
Once you have your text in a string, you should be able to do the following:
str.replaceAll("\\$\\{([a-zA-Z0-9 ]+)\\}", "\\${$1 _SOMETEXT(CONSTANT)}")
If you have other characters in your variable names (i.e. underscores, symbols, etc...) then just add them to the character class that you are matching for.
Edit: If you want to use a Pattern and Matcher then there are still a few changes. First, you probably want to compile your Pattern outside of the loop. Second, you can use this, although it is more verbose.
Pattern p = Pattern.compile("\\$\\{(.+?)\\}"); // the capture group is needed so that $1 is defined
Matcher m = p.matcher(line);
sb.append(m.replaceAll("\\${$1 _SOMETEXT(CONSTANT)}"));
THE SOLUTION :
while ((line = br.readLine()) != null) {
if(line.contains("$")){
sb.append(line.replaceAll("\\$\\{(.+)\\}", "\\${$1" +"_SUFFIX}") + "\n");
}else{
sb.append(line + "\n");
}
}
line = line.replaceAll("\\$\\{\\w+", "$0_SOMETHING");
There's no need to check for the presence of $ or whatever; that's part of what replaceAll() does. Anyway, contains() is not regex-powered like find(); it just does a plain literal text search.
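Putting the pieces of this thread together, a compact sketch of the replacement might look like the following. The class VariableSuffixer and the method addSuffix are illustrative names, and the pattern assumes that variable names consist only of word characters (letters, digits and underscore).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VariableSuffixer {
    // \w+ stops at the closing brace, so several variables on one line are handled separately.
    private static final Pattern VARIABLE = Pattern.compile("\\$\\{(\\w+)\\}");

    // Appends the suffix inside every ${NAME} and leaves the rest of the line untouched.
    static String addSuffix(String line, String suffix) {
        // In the replacement, \$ is a literal dollar sign and $1 is the captured name.
        String replacement = "\\${$1" + Matcher.quoteReplacement(suffix) + "}";
        return VARIABLE.matcher(line).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String[] lines = {
            "some text before ${VARIABLE_NAME} some text after",
            "some text before ${VARIABLE_NAME2} some text after",
            "some text before some text without variable some text after"
        };
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(addSuffix(line, "_SUFFIX")).append("\n");
        }
        System.out.print(sb);
    }
}
```

Compared with the greedy .+ used earlier in the thread, \w+ cannot run past the first closing brace, which matters when a line holds more than one variable.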
