Java regex matching

I have a bunch of lines in a text file and I want to match ${ALPHANUMERIC characters} and replace it with ${SAME ALPHANUMERIC characters plus _SOMETEXT(CONSTANT)}.
I've tried the expression ${(.+)}, but it didn't work, and I also don't know how to do the replacement in Java.
Thank you for your feedback.
Here is some of my code:
BufferedReader br = new BufferedReader(new FileReader(file));
String line;
StringBuilder sb = new StringBuilder();
while ((line = br.readLine()) != null) {
Pattern p = Pattern.compile("\\$\\{.+\\}");
Matcher m = p.matcher(line); // get a matcher object
if(m.find()) {
System.out.println("MATCH: "+m.group());
//TODO
//REPLACE STRING
//THEN APPEND String Builder
}
}
OK, the code above works, but it only finds my variable and not the whole line. For example, here is my input:
some text before ${VARIABLE_NAME} some text after
some text before ${VARIABLE_NAME2} some text after
some text before some text without variable some text after
... etc
So I just want to replace ${VARIABLE_NAME} (or ${VARIABLE_NAME2}) with ${VARIABLE_NAME_SOMETHING} (or ${VARIABLE_NAME2_SOMETHING}), but leave the text before and after it on the line as it is.
EDIT:
I thought of a way like this:
if(line.contains("\\${([a-zA-Z0-9 ]+)}")){
System.out.println(line);
}
if(line.contains("\\$\\{.+\\}")){
System.out.println(line);
}
My idea was to capture the line containing this and then replace it, but the regex is not OK here; it does work with the Pattern/Matcher combination, though.
EDIT II
I feel like I'm getting closer to the solution; here is what I've come up with so far:
if(line.contains("$")){
System.out.println(line.replaceAll("\\$\\{.+\\}", "$1" +"_SUFFIX"));
}
What I mean by $1 is: take the string that was just matched and replace it with itself + _SUFFIX.

I would use the String.replaceAll() method like so:
`String old = "some string data";
String updated = old.replaceAll("\\$\\{([a-zA-Z0-9]+)\\}", "\\${$1_CONSTANT}");`

The $ is a special regular-expression character that represents the end of a line, so you'll need to escape it in order to match it literally. The same applies to {, which Java would otherwise try to parse as a repetition quantifier such as {2}. You'll also need to escape the backslash that you use for escaping, because of the way Java handles string literals.
Once you have your text in a string, you should be able to do the following:
str = str.replaceAll("\\$\\{([a-zA-Z0-9_]+)\\}", "\\${$1_SOMETEXT}");
If your variable names contain other characters (symbols, etc.), just add them to the character class you are matching.
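To make the escaping rules concrete, here is a small sketch that checks whether the unescaped and escaped forms compile and then applies the escaped one. The class and method names (`EscapingDemo`, `compiles`, `addSuffix`) are only illustrative, and the character class assumes variable names made of letters, digits and `_`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class EscapingDemo {
    // Returns true if Java accepts the regex, false if Pattern.compile rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Appends a suffix inside every ${NAME} placeholder (NAME = letters, digits, '_').
    // In the replacement, "\\$" emits a literal '$' and "$1" inserts capture group 1;
    // quoteReplacement keeps any '$' or '\' in the suffix from being interpreted.
    static String addSuffix(String line, String suffix) {
        return line.replaceAll("\\$\\{([a-zA-Z0-9_]+)\\}",
                "\\${$1" + Matcher.quoteReplacement(suffix) + "}");
    }

    public static void main(String[] args) {
        // An unescaped '{' after "\\$" is read as a broken {n} quantifier ("Illegal repetition").
        System.out.println(compiles("\\${([a-zA-Z0-9_]+)}"));     // false
        System.out.println(compiles("\\$\\{([a-zA-Z0-9_]+)\\}")); // true
        System.out.println(addSuffix("some text before ${VARIABLE_NAME} some text after", "_SOMETEXT"));
    }
}
```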
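Edit: If you want to use a Pattern and Matcher, a few changes are still needed. First, you probably want to compile your Pattern outside of the loop. Second, the pattern needs a capture group for $1 to refer to; without one, replaceAll() throws an IndexOutOfBoundsException. You can then use the following, although it is more verbose:
Pattern p = Pattern.compile("\\$\\{([a-zA-Z0-9_]+)\\}"); // compile once, before the loop
Matcher m = p.matcher(line);                              // inside the loop
sb.append(m.replaceAll("\\${$1_SOMETEXT}"));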
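Putting those two suggestions together, a minimal sketch that compiles the pattern once and rewrites every line read from a `BufferedReader`. A `StringReader` stands in for the question's `FileReader` so it runs without a file; the class name and the `_SOMETEXT` suffix are placeholders.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderRewriter {
    // Compiled once and reused for every line, instead of once per loop iteration.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([a-zA-Z0-9_]+)\\}");

    // Rewrites every ${NAME} to ${NAME_SOMETEXT}, keeping the rest of each line unchanged.
    static String rewrite(BufferedReader br) throws IOException {
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            Matcher m = PLACEHOLDER.matcher(line);
            sb.append(m.replaceAll("\\${$1_SOMETEXT}")).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String input = "some text before ${VARIABLE_NAME} some text after\n"
                + "some text before some text without variable some text after\n";
        System.out.print(rewrite(new BufferedReader(new StringReader(input))));
    }
}
```

Lines without a placeholder pass through untouched, so no separate `contains("$")` check is needed.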

THE SOLUTION:
while ((line = br.readLine()) != null) {
if(line.contains("$")){
sb.append(line.replaceAll("\\$\\{(.+)\\}", "\\${$1" +"_SUFFIX}") + "\n");
}else{
sb.append(line + "\n");
}
}
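One caveat with `(.+)` in the solution above: `.+` is greedy, so on a line with two placeholders it runs to the last `}` and only the final name gets the suffix. A sketch comparing it with `[^}]+`, which cannot cross a closing brace (a reluctant `(.+?)` would also work); the class and method names are illustrative.

```java
class GreedyDemo {
    // Greedy ".+" extends to the LAST '}' on the line, swallowing everything in between.
    static String greedy(String line) {
        return line.replaceAll("\\$\\{(.+)\\}", "\\${$1_SUFFIX}");
    }

    // "[^}]+" stops at the first '}', so each placeholder is matched on its own.
    static String bounded(String line) {
        return line.replaceAll("\\$\\{([^}]+)\\}", "\\${$1_SUFFIX}");
    }

    public static void main(String[] args) {
        String line = "a ${X} b ${Y} c";
        System.out.println(greedy(line));  // a ${X} b ${Y_SUFFIX} c
        System.out.println(bounded(line)); // a ${X_SUFFIX} b ${Y_SUFFIX} c
    }
}
```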

line = line.replaceAll("\\$\\{\\w+", "$0_SOMETHING");
There's no need to check for the presence of $ or whatever; that's part of what replaceAll() does. Anyway, contains() is not regex-powered like find(); it just does a plain literal text search.
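A quick way to see the difference described above: the same string passed to `contains()` and to `Pattern.find()` gives different results, and the one-liner needs no `$` check at all. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

class ContainsVsRegex {
    // The same text used both as a literal and as a regex.
    static final String REGEX = "\\$\\{.+\\}";

    // contains() searches for the literal characters \$\{.+\} and does no regex matching.
    static boolean literal(String line) {
        return line.contains(REGEX);
    }

    // find() interprets the same string as a regular expression.
    static boolean regex(String line) {
        return Pattern.compile(REGEX).matcher(line).find();
    }

    // The one-liner: $0 is the whole match ("${NAME"); the '}' is not consumed, so it stays.
    static String addSuffix(String line) {
        return line.replaceAll("\\$\\{\\w+", "$0_SOMETHING");
    }

    public static void main(String[] args) {
        String line = "some text before ${VARIABLE_NAME} some text after";
        System.out.println(literal(line));   // false
        System.out.println(regex(line));     // true
        System.out.println(addSuffix(line)); // some text before ${VARIABLE_NAME_SOMETHING} some text after
    }
}
```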

Related

Validating a Text File content using regex

The input text file has the following content:
TIMINCY........ many arbitrary characters incl. white spaces and tabs
DETAIL........ many arbitrary characters incl. white spaces and tabs
DETAIL........ many arbitrary characters incl. white spaces and tabs
DETAIL........ many arbitrary characters incl. white spaces and tabs
.
.
. (any number of lines containing DETAILS)
TIMINCY........ many arbitrary characters incl. white spaces and tabs
DETAIL........ many arbitrary characters incl. white spaces and tabs
DETAIL........ many arbitrary characters incl. white spaces and tabs
DETAIL........ many arbitrary characters incl. white spaces and tabs
.
.
.(so on)
Q: I need to validate the file using regex so that if the file's content does NOT follow the pattern given above, I can throw a CustomException.
Please let me know if you can help. Any help is appreciated.
String patternString = "TMINCY"+"[.]\\{*\\}"+";"+"["+"DETAILS"+"[.]\\{*\\}"+";"+"]"+"\\{*\\}"+"]"+"\\{*\\};";
Pattern pattern = Pattern.compile(patternString );
String messageString = null;
StringBuilder builder = new StringBuilder();
try (BufferedReader reader = Files.newBufferedReader(curracFile.toPath(), charset)) {
String line;
while ((line = reader.readLine()) != null) {
builder.append(line);
builder.append(NEWLINE_CHAR_SEQUENCE);
}
messageString = builder.toString();
} catch (IOException ex) {
LOGGER.error(FILE_CREATION_ERROR, ex.getCause());
throw new BusinessConversionException(FILE_CREATION_ERROR, ex);
}
System.out.println("messageString is::"+messageString);
return pattern.matcher(messageString).matches();
But it returns false for a correct file. Please help me with the regex.
What about something like "^(TIMINCY|DETAIL)[\\.]+[a-zA-Z\\s.]+" (written as a Java string literal, so each backslash is doubled)?
"^" - matches the start of the line
"(TIMINCY|DETAIL)" - matches TIMINCY or DETAIL
"[\.]+" - matches the dot character one or more times
"[a-zA-Z\s.]+" - the allowed characters, which must occur one or more times (add any other characters your lines may contain)
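As a sketch, here is that pattern compiled from its Java string literal and applied per line with `find()`; the class and method names are illustrative. Because the pattern is not anchored at the end, `find()` only checks how each line starts; use `matches()` (or append `$`) if the whole line has to consist of the listed characters.

```java
import java.util.regex.Pattern;

class LinePrefixCheck {
    // The regex from the answer; "\." and "\s" are written as "\\." and "\\s" in Java source.
    static final Pattern LINE = Pattern.compile("^(TIMINCY|DETAIL)[\\.]+[a-zA-Z\\s.]+");

    // find() succeeds if the START of the line fits the pattern; the rest is not checked.
    static boolean startsValid(String line) {
        return LINE.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(startsValid("TIMINCY........ many arbitrary characters")); // true
        System.out.println(startsValid("DETAIL........\tspaces and tabs"));           // true
        System.out.println(startsValid("SUMMARY........ other record"));              // false
    }
}
```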
Reference: Oracle Documentation
You could check the lines one by one while you're iterating over them:
Pattern p = Pattern.compile("^(?:TIMINCY|DETAIL)[.]{8}.*");
//Explanation:
// ^ : Matches the beginning of the string.
// (?:): non-capturing group.
// [.]{8}: Matches a dot (".") eight times in a row.
// .*: Matches everything until the end of the string
// | : Regex OR operator
String line = reader.readLine();
Matcher m;
while (line != null) {
    m = p.matcher(line);
    if (!m.matches())   // matches() takes no argument; the input was given to p.matcher(line)
        throw new CustomException("Not valid");
    builder.append(line);
    builder.append(NEWLINE_CHAR_SEQUENCE);
    line = reader.readLine();
}
Also note that Matcher.matches() returns true only if the ENTIRE string matches your regular expression; I would recommend using Matcher.find() to look for patterns you don't want.
Matcher (Java 7)
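If the goal is the question's original one (validating the whole file with a single `matches()` call, which must cover the entire string), a whole-file pattern could look like the sketch below. It assumes `NEWLINE_CHAR_SEQUENCE` is `"\n"` and that every record name is followed by at least one dot, and it uses `IllegalArgumentException` in place of the question's `CustomException`; the class and method names are hypothetical. For very large files the line-by-line check may be the safer choice, since Java's regex engine can hit a `StackOverflowError` on very long inputs with repeated groups.

```java
import java.util.regex.Pattern;

class FileStructureCheck {
    // One block = a TIMINCY line followed by zero or more DETAIL lines;
    // the whole content must be one or more such blocks, each line ending in '\n'.
    private static final Pattern FILE = Pattern.compile(
            "(?:TIMINCY\\.+[^\\n]*\\n(?:DETAIL\\.+[^\\n]*\\n)*)+");

    // matches() requires the pattern to cover the ENTIRE content.
    static boolean isValid(String content) {
        return FILE.matcher(content).matches();
    }

    // Hypothetical stand-in for the question's CustomException-based check.
    static void validate(String content) {
        if (!isValid(content)) {
            throw new IllegalArgumentException("File does not follow the TIMINCY/DETAIL layout");
        }
    }

    public static void main(String[] args) {
        String ok = "TIMINCY........ a b\nDETAIL........ c\tdef\nTIMINCY........ g\n";
        System.out.println(isValid(ok));                        // true
        System.out.println(isValid("DETAIL........ c\n" + ok)); // false: starts with DETAIL
    }
}
```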

Find multiple pattern matches and replace each one in a text file

I want to find multiple strings in a text file and replace each pattern with its own replacement.
Example
I have the following patterns to match and replace, respectively:
I want to find the patterns 1. "cd", 2. "kj", 3. "by" and replace them with 1. "sdi", 2. "ge", 3. "bi".
BufferedReader cd = new BufferedReader(new FileReader("text.txt"));
String line;
Pattern pattern = Pattern.compile("cd", Pattern.CASE_INSENSITIVE);
Matcher matcher;
while ((line = cd.readLine()) != null) {
    matcher = pattern.matcher(line);
    if (matcher.find()) {
        line = matcher.replaceAll("sdi");
        System.out.println(line);
    }
}
cd.close(); // close once, after the loop
This simple code works for a single pattern. Is there another way to handle several patterns?
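Why don't you just do something like this?
yourstring = yourstring.replaceAll("cd", "sdi").replaceAll("kj", "ge").replaceAll("by", "bi");
This way, String.replaceAll() compiles and applies each pattern for you. Note that, unlike the code in the question, these replacements are case-sensitive; prefix each pattern with (?i) if you need case-insensitive matching.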
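Chained `replaceAll()` calls can also re-replace text produced by an earlier call (with a→b followed by b→c, an "a" ends up as "c"). A single-pass alternative, sketched under the assumption that the keys are lower-case literals: build one alternation from a map, then use `Matcher.appendReplacement` to look up each match. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiReplace {
    // Replaces every occurrence of any key with its value in ONE pass over the text,
    // ignoring case like the question's Pattern.CASE_INSENSITIVE. Keys must be lower-case.
    static String replaceEach(String text, Map<String, String> replacements) {
        if (replacements.isEmpty()) {
            return text;
        }
        // Build an alternation such as \Qcd\E|\Qkj\E|\Qby\E so each key is matched literally.
        StringBuilder alternation = new StringBuilder();
        for (String key : replacements.keySet()) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(key));
        }
        Matcher m = Pattern.compile(alternation.toString(), Pattern.CASE_INSENSITIVE).matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Look up the replacement for the text that was actually matched.
            String value = replacements.get(m.group().toLowerCase(Locale.ROOT));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("cd", "sdi");
        map.put("kj", "ge");
        map.put("by", "bi");
        System.out.println(replaceEach("CD kj by abc", map)); // sdi ge bi abc
    }
}
```

If one key can be a prefix of another, sort the keys longest-first before building the alternation, since a regex alternation takes the first branch that matches.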

Reading a specific text in Java

This is kind of a follow-up to my other question, "simple Java Regex read between two".
Now my code looks like this. I am reading the contents of a file and scanning for whatever is between src and -t1. Running this code returns 1 correct link, but the source file contains 10, and I can't figure out the loop. I thought another way might be to write to a second file on disk and remove the first link from the original source, but I can't code that either:
File workfile = new File("page.txt");
BufferedReader br = new BufferedReader(new FileReader(workfile));
String line;
while ((line = br.readLine()) != null) {
//System.out.println(line);
String url = line.split("<img src=")[1].split("-t1")[0];
System.out.println(url);
}
br.close();
I think you want something like this:
import java.util.regex.*;
Pattern urlPattern = Pattern.compile("<img src=(.*?)-t1");
while ((line = br.readLine()) != null) {
Matcher m = urlPattern.matcher (line);
while (m.find()) {
System.out.println(m.group(1));
}
}
The regular expression looks for strings beginning with <img src= and ending with -t1 (and looks for the shortest substrings possible, so that more than one can be found in the line). The part in parentheses is a "capture group" to capture the text that gets matched; this is called group 1. Then, for each line, we loop on find() to find all occurrences in each line. Each time we find one, we print what's in group 1.
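The same loop, wrapped in a small helper that collects every link on a line. Unlike `line.split("<img src=")[1]`, which throws `ArrayIndexOutOfBoundsException` on a line without a link, it simply returns an empty list. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImgSrcExtractor {
    // Reluctant ".*?" stops at the first "-t1" after each "<img src=", so several links per line are found.
    private static final Pattern URL = Pattern.compile("<img src=(.*?)-t1");

    static List<String> extract(String line) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(line);
        while (m.find()) {
            urls.add(m.group(1)); // group 1 = the text between "<img src=" and "-t1"
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(extract("<img src=a.jpg-t1.x> <img src=b-t1>")); // [a.jpg, b]
        System.out.println(extract("no links on this line"));               // []
    }
}
```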

Replace a particular String from a text file

I'm trying to replace the occurrences of a certain String in a given text file. Here's the code I've written:
BufferedReader tempFileReader = new BufferedReader(new InputStreamReader(new FileInputStream(tempFile)));
File tempFileBuiltForUse = new File("C:\\testing\\anotherTempFile.txt");
Writer changer = new BufferedWriter(new FileWriter(tempFileBuiltForUse));
String lineContents ;
while( (lineContents = tempFileReader.readLine()) != null)
{
Pattern pattern = Pattern.compile("/.");
Matcher matcher = pattern.matcher(lineContents);
String lineByLine = null;
while(matcher.find())
{
lineByLine = lineContents.replaceAll(matcher.group(),System.getProperty("line.separator"));
changer.write(lineByLine);
}
}
changer.close();
tempFileReader.close();
Suppose the contents of my tempFile are:
This/DT is/VBZ a/DT sample/NN text/NN ./.
I want anotherTempFile to contain:
This/DT is/VBZ a/DT sample/NN text/NN .
followed by a new line.
But I'm not getting the desired output, and I can't see where I'm going wrong. :-(
Kindly help. :-)
A dot means "any character" in regular expressions. Try escaping it:
Pattern pattern = Pattern.compile("\\./\\.");
(You need two backslashes to escape the backslash itself inside the String, so that Java knows you want a literal backslash and not an escape sequence such as the newline character \n.)
In a regex, the dot (.) matches any character (except newlines), so it needs to be escaped if you want it to match a literal dot. Also, you appear to be missing the first dot in your regex since you want the pattern to match ./.:
Pattern pattern = Pattern.compile("\\./\\.");
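An alternative to escaping each dot by hand is `Pattern.quote()`, which wraps the text in `\Q...\E` so every character inside is literal. A sketch comparing the three forms on tokens from the sample text (the class and field names are illustrative):

```java
import java.util.regex.Pattern;

class LiteralPatternDemo {
    // Both of these match only the three characters "./." literally.
    static final Pattern ESCAPED = Pattern.compile("\\./\\.");          // regex: \./\.
    static final Pattern QUOTED = Pattern.compile(Pattern.quote("./.")); // regex: \Q./.\E
    // Unescaped, each '.' matches any character, so "s/V" inside "is/VBZ" also qualifies.
    static final Pattern UNESCAPED = Pattern.compile("./.");

    static boolean found(Pattern p, String text) {
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(found(UNESCAPED, "is/VBZ"));   // true
        System.out.println(found(ESCAPED, "is/VBZ"));     // false
        System.out.println(found(QUOTED, "text/NN ./.")); // true
    }
}
```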
Your regular expression has a problem. Also, you don't have to use Pattern and Matcher; simply use the replaceAll() method of the String class for the replacement. It would be easier. Try the code below:
BufferedReader tempFileReader = new BufferedReader(
        new InputStreamReader(new FileInputStream("c:\\test.txt")));
File tempFileBuiltForUse = new File("C:\\anotherTempFile.txt");
Writer changer = new BufferedWriter(new FileWriter(tempFileBuiltForUse));
String lineContents;
while ((lineContents = tempFileReader.readLine()) != null) {
    // Every line is written, whether or not it contained a match.
    String lineByLine = lineContents.replaceAll("\\./\\.", System.getProperty("line.separator"));
    changer.write(lineByLine);
}
changer.close();
tempFileReader.close();
/. is a regular expression matching a slash followed by any single character.
Change it to "/\\.".
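To compare the suggested patterns against the output the question asks for, here is a sketch applying each one to the sample line. It uses `"\n"` instead of `System.getProperty("line.separator")` so the result is easy to check, and the names are illustrative. In the file loop itself, note that the question's code only writes a line when `find()` succeeds, so lines without a match never reach the output file.

```java
class SlashDotDemo {
    static final String SAMPLE = "This/DT is/VBZ a/DT sample/NN text/NN ./.";

    // Replaces every match of the regex with a newline.
    static String breakAt(String line, String regex) {
        return line.replaceAll(regex, "\n");
    }

    public static void main(String[] args) {
        // "/." : '/' plus ANY character, so "/D", "/V", "/N" ... are all replaced.
        System.out.print(breakAt(SAMPLE, "/."));
        // "\\./\\." : removes the whole "./." including the final period.
        System.out.print(breakAt(SAMPLE, "\\./\\."));
        // "/\\." : removes only "/." and keeps the period before it.
        System.out.print(breakAt(SAMPLE, "/\\."));
    }
}
```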

strip data from a text file

I'm going to start by posting what the data in the text file looks like. This is just 4 lines of it; the actual file is a couple hundred lines long.
Friday, September 9 2011
-STV 101--------05:00 - 23:59 SSB 4185 Report Printed on 9/08/2011 at 2:37
0-AH 104--------07:00 - 23:00 AH GYM Report Printed on 9/08/2011 at 2:37
-BG 105--------07:00 - 23:00 SH GREAT HALL Report Printed on 9/08/2011 at 2:37
What I want to do with this text file is ignore the first line with the date on it, then ignore the '-' on the next line but read in "STV 101", "05:00" and "23:59", save them to variables, ignore all other characters on that line, and so on for each line after that.
Here is how I am currently reading the lines in full. I call this function once the user has put the path in the scheduleTxt JTextField. It can read and print each line fine.
public void readFile () throws IOException
{
try
{
FileInputStream fstream = new FileInputStream(scheduleTxt.getText());
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
while ((strLine = br.readLine()) != null)
{
System.out.println (strLine);
}
in.close();
}
catch (Exception e){//Catch exception if any
System.err.println("Error: " + e.getMessage());
}
}
UPDATE: it turns out I also need to strip "Friday" out of the top line and put it in a variable as well.
Thanks! Beef.
I did not test it thoroughly, but this regular expression should capture the info you need in groups 2, 5 and 7 (assuming you're only interested in "AH 104" in the example "0-AH 104----"):
^(\S)*-(([^-])*)(-)+((\S)+)\s-\s((\S)+)\s(.)*
String regex = "^(\\S)*-(([^-])*)(-)+((\\S)+)\\s-\\s((\\S)+)\\s(.)*";
Pattern pattern = Pattern.compile(regex);
while ((strLine = br.readLine()) != null){
Matcher matcher = pattern.matcher(strLine);
boolean matchFound = matcher.find();
if (matchFound){
String s1 = matcher.group(2);
String s2 = matcher.group(5);
String s3 = matcher.group(7);
System.out.println (s1 + " " + s2 + " " + s3);
}
}
The expression could be tuned with non-capturing groups in order to capture only the information you want.
Explanation of the regexp's elements:
^(\S)*- Matches a run of non-whitespace characters ended by -. Note: it could have been ^(.)*- instead; it would not work if there is whitespace before the first -.
(([^-])*) Matches a run of any characters except -. This is captured in group 2.
(-)+ Matches one or more -.
((\S)+) Matches one or more non-whitespace characters. This is captured in group 5.
\s-\s Matches whitespace, followed by -, followed by whitespace.
((\S)+) Same as the fourth element above. This is captured in group 7.
\s(.)* Matches whitespace followed by anything, which is skipped.
More info on regular expressions can be found in this tutorial.
There are also several useful cheat sheets around. When designing or debugging an expression, a regex testing tool can prove quite useful, too.
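A tuned variant along the lines the answer suggests: only the three wanted fields are captured (as groups 1 to 3), and a second pattern picks up the day name from the first line for the question's UPDATE. The class and method names are hypothetical, and the patterns assume the layout of the four sample lines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScheduleLineParser {
    // First line, e.g. "Friday, September 9 2011": capture the leading word before the comma.
    private static final Pattern DAY = Pattern.compile("^([A-Za-z]+),");
    // Schedule line: skip up to the first '-', capture the room, skip the dashes,
    // then capture the start and end times on either side of " - ".
    private static final Pattern ENTRY = Pattern.compile("^\\S*-([^-]+)-+(\\S+)\\s-\\s(\\S+)");

    // Returns the day name, or null when the line does not start with "Word,".
    static String parseDay(String line) {
        Matcher m = DAY.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    // Returns {room, start, end}, or null when the line does not look like a schedule entry.
    static String[] parseEntry(String line) {
        Matcher m = ENTRY.matcher(line);
        return m.find() ? new String[] {m.group(1), m.group(2), m.group(3)} : null;
    }

    public static void main(String[] args) {
        System.out.println(parseDay("Friday, September 9 2011")); // Friday
        String[] e = parseEntry("-STV 101--------05:00 - 23:59 SSB 4185 Report Printed on 9/08/2011 at 2:37");
        System.out.println(e[0] + " | " + e[1] + " | " + e[2]);   // STV 101 | 05:00 | 23:59
    }
}
```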
