I want to find several string patterns in a text file and replace each one with its own replacement.
Example
I have the following patterns to match, each with its own replacement:
I want to find the patterns 1. "cd", 2. "kj", 3. "by" and replace them with 1. "sdi", 2. "ge", 3. "bi".
BufferedReader cd = new BufferedReader(new FileReader("text.txt"));
String line;
Pattern pattern = Pattern.compile("cd", Pattern.CASE_INSENSITIVE);
Matcher matcher;
while ((line = cd.readLine()) != null) {
    matcher = pattern.matcher(line);
    if (matcher.find()) {
        line = matcher.replaceAll("sdi");
    }
    System.out.println(line);
}
// Close the reader once, after the whole file has been read
cd.close();
This simple code works for a single pattern. Is there a way to handle several patterns?
Why don't you just do something like this?
yourstring.replaceAll("cd", "sdi").replaceAll("kj", "ge").replaceAll("by", "bi");
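This way, each replaceAll() call compiles its regex and applies it for you, so you don't have to manage Pattern and Matcher objects yourself. Note that replaceAll() is case-sensitive by default; prefix a pattern with (?i), e.g. "(?i)cd", to keep the case-insensitive matching of your original code.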
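If the list of patterns grows, the chained calls can be driven from a table instead. The following is only a minimal sketch under stated assumptions: the class name MultiReplace and the helpers defaultRules() and replaceAllPairs() are made up for illustration, and the rules are applied one after another, so a later rule also sees text produced by an earlier one.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiReplace {
    // Hypothetical rule table: pattern -> replacement, kept in insertion order.
    static Map<Pattern, String> defaultRules() {
        Map<Pattern, String> rules = new LinkedHashMap<>();
        rules.put(Pattern.compile("cd", Pattern.CASE_INSENSITIVE), "sdi");
        rules.put(Pattern.compile("kj", Pattern.CASE_INSENSITIVE), "ge");
        rules.put(Pattern.compile("by", Pattern.CASE_INSENSITIVE), "bi");
        return rules;
    }

    // Applies every rule in turn to the given line and returns the result.
    static String replaceAllPairs(String line, Map<Pattern, String> rules) {
        for (Map.Entry<Pattern, String> rule : rules.entrySet()) {
            // quoteReplacement keeps '$' and '\' in a replacement literal
            String replacement = Matcher.quoteReplacement(rule.getValue());
            line = rule.getKey().matcher(line).replaceAll(replacement);
        }
        return line;
    }

    public static void main(String[] args) {
        System.out.println(replaceAllPairs("CD kj by", defaultRules())); // prints "sdi ge bi"
    }
}
```

The patterns are compiled once, so the same table can be reused for every line read from the file.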
I have a code snippet to convert an input stream into a String. I then use java.util.regex.Matcher to find something inside the string.
The following works for me:
StringBuilder sb = new StringBuilder();
InputStream ins; // the InputStream data
BufferedReader br = new BufferedReader(new InputStreamReader(ins));
br.lines().forEach(sb::append);
br.close();
String data = sb.toString();
Pattern pattern = Pattern.compile(".*My_PATTERN:(.*)");
Matcher matcher = pattern.matcher(data);
if (matcher.find()) {
    String searchedStr = matcher.group(1); // I find a match here
}
But if I try to replace BufferedReader with Apache IOUtils, I do not find any matches with the same string.
InputStream ins; // the InputStream data
String data = IOUtils.toString(ins, StandardCharsets.UTF_8);
Pattern pattern = Pattern.compile(".*My_PATTERN:(.*)");
Matcher matcher = pattern.matcher(data);
if (matcher.find()) {
    String searchedStr = matcher.group(1); // I cannot find a match here
}
I have tried with other "StandardCharsets" apart from UTF-8 but none have worked.
I am unable to understand what is different here that would cause IOUtils to not match. Can someone kindly help me out here?
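The first snippet removes line breaks (BufferedReader.lines() returns each line without its terminator), the second keeps them. By default the dot does not match line terminators, so (.*) cannot extend past the end of a line.
So you should make the pattern match across lines:
In the pattern, with embedded flags at the start (s = DOTALL, m = MULTILINE):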
Pattern pattern = Pattern.compile("(?sm).*My_PATTERN:(.*)");
In the pattern, without flags, using a character class that matches any character:
Pattern pattern = Pattern.compile("[\\s\\S]*My_PATTERN:([\\s\\S]*)");
With flags
Pattern pattern = Pattern.compile(".*My_PATTERN:(.*)", Pattern.MULTILINE | Pattern.DOTALL);
All of these let the group's value include line breaks.
Or remove the line breaks first, e.g.: data = data.replaceAll("\\r?\\n", "");
See:
https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#compile(java.lang.String,%20int)
https://docs.oracle.com/javase/tutorial/essential/regex/pattern.html
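To see the effect of DOTALL in isolation, here is a small sketch. The class name DotallDemo, the helper capture() and the sample input are assumptions made for illustration; only the pattern comes from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotallDemo {
    // Returns group 1 of the first match, or null when nothing matches.
    static String capture(String data, int flags) {
        Matcher m = Pattern.compile(".*My_PATTERN:(.*)", flags).matcher(data);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up input that still contains its line breaks
        String data = "first line\nMy_PATTERN:abc\ndef";
        // Without flags the group stops at the end of the line
        System.out.println(capture(data, 0));
        // With DOTALL the dot also matches '\n', so the group runs to the end of the input
        System.out.println(capture(data, Pattern.DOTALL));
    }
}
```

With this input the first call captures "abc" and the second captures "abc", a line break, then "def".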
I am trying to find strings from one text file that are present in another. I have 2 text files, file1.txt and file2.txt the contents of which are as below :
file1.txt
Hello
Second Line
Text line
Final Line
file2.txt
Final Linee
Text llline
line 3 of file2
Helloo
The code I have is as below :
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regex {
    public static void main(String[] args) throws IOException {
        BufferedReader inputFile = new BufferedReader(new FileReader("file1.txt"));
        String line;
        String pattern;
        while ((line = inputFile.readLine()) != null) {
            System.out.println(line);
            // The patterns file is re-read for every line of file1.txt
            BufferedReader patternsFile = new BufferedReader(new FileReader("file2.txt"));
            while ((pattern = patternsFile.readLine()) != null) {
                Pattern r = Pattern.compile(pattern);
                System.out.println(r);
                Matcher m = r.matcher(line);
                if (m.find()) {
                    System.out.println("Line corresponding to pattern in file1.txt : " + line);
                }
            }
            patternsFile.close();
        }
        inputFile.close();
    }
}
However, the above code only returns the lines from file1.txt that match some pattern from file2.txt. What I want instead is to find the lines whose closest string in file2.txt is within an edit distance of n letters. So for example if n=1, then the output should be :
Hello
Final Line
and if n=2 then it should output
Hello
Final Line
Text line
I am starting out with Java, and have absolutely no experience with it. Therefore any and all help would be appreciated.
Thank you
Okay, I can give two tips.
First of all, you may want to look at Apache Lucene if you are writing a text analyser or something similar, or if you need strong matching features.
Secondly, if you are looking for something more "minimal", you can implement a Cosine Similarity algorithm, which is really interesting and well worth a look.
Then you can re-implement it and adapt it to your code.
You can find an implementation in Apache Commons Text.
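Since the question asks for an edit distance of n letters, a hand-written Levenshtein distance may be the most direct fit. Note that this is a different measure from the cosine similarity suggested above. The sketch below is only an illustration: the class name EditDistanceFilter and the helpers distance() and within() are hypothetical, and the two lists simply hard-code the file contents shown in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EditDistanceFilter {
    // Levenshtein distance: minimum number of single-character insertions,
    // deletions and substitutions needed to turn a into b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Keeps the lines that are within distance n of at least one candidate.
    static List<String> within(List<String> lines, List<String> candidates, int n) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            for (String candidate : candidates) {
                if (distance(line, candidate) <= n) {
                    result.add(line);
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList("Hello", "Second Line", "Text line", "Final Line");
        List<String> file2 = Arrays.asList("Final Linee", "Text llline", "line 3 of file2", "Helloo");
        System.out.println(within(file1, file2, 1)); // [Hello, Final Line]
        System.out.println(within(file1, file2, 2)); // [Hello, Text line, Final Line]
    }
}
```

The results come back in the order of file1.txt. For real files, the two lists could be filled with Files.readAllLines() instead of being hard-coded.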
This is kind of a followup to my other question simple Java Regex read between two
Now my code looks like this. I am reading the contents of a file, scanning for whatever between src and -t1. Running this code will return 1 correct link but the source file contains 10 and I can't figure out the loop. I thought another way might be to write to a second file on disk and remove the first link from the original source but I can't code that either:
File workfile = new File("page.txt");
BufferedReader br = new BufferedReader(new FileReader(workfile));
String line;
while ((line = br.readLine()) != null) {
//System.out.println(line);
String url = line.split("<img src=")[1].split("-t1")[0];
System.out.println(url);
}
br.close();
I think you want something like
import java.util.regex.*;
Pattern urlPattern = Pattern.compile("<img src=(.*?)-t1");
while ((line = br.readLine()) != null) {
Matcher m = urlPattern.matcher (line);
while (m.find()) {
System.out.println(m.group(1));
}
}
The regular expression looks for strings beginning with <img src= and ending with -t1 (and looks for the shortest substrings possible, so that more than one can be found in the line). The part in parentheses is a "capture group" to capture the text that gets matched; this is called group 1. Then, for each line, we loop on find() to find all occurrences in each line. Each time we find one, we print what's in group 1.
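As a self-contained illustration of the find() loop described above, the following sketch collects every captured link from one line. The class name ImgSrcExtractor, the helper extract() and the sample line are assumptions made for the example; the pattern is the one from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImgSrcExtractor {
    // Compiled once and reused for every line
    private static final Pattern URL_PATTERN = Pattern.compile("<img src=(.*?)-t1");

    // Returns the text between "<img src=" and "-t1" for every occurrence in the line.
    static List<String> extract(String line) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL_PATTERN.matcher(line);
        while (m.find()) {
            urls.add(m.group(1)); // group 1 is the lazily matched part in parentheses
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up line containing two links
        System.out.println(extract("<img src=a.jpg-t1><img src=b.jpg-t1>")); // [a.jpg, b.jpg]
    }
}
```

Unlike the split()-based version in the question, this returns an empty list for a line without any link instead of throwing an ArrayIndexOutOfBoundsException.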
I'm trying to replace the occurrences of a certain String in a given text file. Here's the code I've written:
BufferedReader tempFileReader = new BufferedReader(new InputStreamReader(new FileInputStream(tempFile)));
File tempFileBuiltForUse = new File("C:\\testing\\anotherTempFile.txt");
Writer changer = new BufferedWriter(new FileWriter(tempFileBuiltForUse));
String lineContents ;
while( (lineContents = tempFileReader.readLine()) != null)
{
Pattern pattern = Pattern.compile("/.");
Matcher matcher = pattern.matcher(lineContents);
String lineByLine = null;
while(matcher.find())
{
lineByLine = lineContents.replaceAll(matcher.group(),System.getProperty("line.separator"));
changer.write(lineByLine);
}
}
changer.close();
tempFileReader.close();
Suppose the contents of my tempFile are:
This/DT is/VBZ a/DT sample/NN text/NN ./.
I want the anotherTempFile to contain :
This/DT is/VBZ a/DT sample/NN text/NN .
with a new line.
But I'm not getting the desired output. And I'm not able to see where I'm going wrong. :-(
Kindly help. :-)
A dot means "any character" in regular expressions. Try to escape it:
Pattern pattern = Pattern.compile("\\./\\.");
(You need two backslashes to escape the backslash itself inside the String literal, so that Java passes a real backslash to the regex engine instead of reading an escape sequence such as \n, the newline character.)
In a regex, the dot (.) matches any character (except newlines), so it needs to be escaped if you want it to match a literal dot. Also, you appear to be missing the first dot in your regex since you want the pattern to match ./.:
Pattern pattern = Pattern.compile("\\./\\.");
Your regular expression has a problem. Also you don't have to use the Pattern and matcher. Simply use replaceAll() method of the String class for the replacement. It would be easier. Try the code below:
BufferedReader tempFileReader = new BufferedReader(
new InputStreamReader(new FileInputStream("c:\\test.txt")));
File tempFileBuiltForUse = new File("C:\\anotherTempFile.txt");
Writer changer = new BufferedWriter(new FileWriter(tempFileBuiltForUse));
String lineContents;
while ((lineContents = tempFileReader.readLine()) != null) {
String lineByLine = lineContents.replaceAll("\\./\\.", System.getProperty("line.separator"));
changer.write(lineByLine);
}
changer.close();
tempFileReader.close();
As a regular expression, /. means a slash followed by any character.
Change it to "/\\." so that only a slash followed by a literal dot matches.
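The three patterns discussed in this thread behave quite differently on the sample line. The sketch below compares them; the class name DotEscapeDemo and the helper replace() are made up for illustration, and "\n" is used as the replacement instead of the platform line separator so that the result is the same on every system.

```java
class DotEscapeDemo {
    // Replaces every match of the regex in the input with a newline character.
    static String replace(String input, String regex) {
        return input.replaceAll(regex, "\n");
    }

    public static void main(String[] args) {
        String line = "This/DT is/VBZ a/DT sample/NN text/NN ./.";
        // "/."     : slash + any character, so it also eats the first letter of each tag
        System.out.println(replace(line, "/."));
        // "/\\."   : slash + literal dot, keeps the sentence-final "."
        System.out.println(replace(line, "/\\."));
        // "\\./\\.": literal dot + slash + literal dot, removes the final "." as well
        System.out.println(replace(line, "\\./\\."));
    }
}
```

Only the second pattern leaves the trailing "." in place, which is the output the question asks for.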
I have a bunch of lines in a text file and I want to match this ${ALPHANUMERIC characters} and replace it with ${SAME ALPHANUMERIC characters plus _SOMETEXT(CONSTANT)}.
I've tried the expression ${(.+)} but it didn't work, and I also don't know how to do a regex replace in Java.
thank you for your feedback
Here is some of my code :
BufferedReader br = new BufferedReader(new FileReader(file));
String line;
StringBuilder sb = new StringBuilder();
while ((line = br.readLine()) != null) {
Pattern p = Pattern.compile("\\$\\{.+\\}");
Matcher m = p.matcher(line); // get a matcher object
if(m.find()) {
System.out.println("MATCH: "+m.group());
//TODO
//REPLACE STRING
//THEN APPEND String Builder
}
}
OK, the above works, but it only finds my variable and not the whole line. For example, here is my input:
some text before ${VARIABLE_NAME} some text after
some text before ${VARIABLE_NAME2} some text after
some text before some text without variable some text after
... etc
So I just want to replace ${VARIABLE_NAME} or ${VARIABLE_NAME2} with ${VARIABLE_NAME_SOMETHING} or ${VARIABLE_NAME2_SOMETHING}, but leave the preceding and following text of the line as it is.
EDIT:
I thought of a way like this:
if(line.contains("\\${([a-zA-Z0-9 ]+)}")){
System.out.println(line);
}
if(line.contains("\\$\\{.+\\}")){
System.out.println(line);
}
My idea was to capture the line containing this and then replace it, but the regex does not work with contains(). It does work with the Pattern/Matcher combination, though.
EDIT II
I feel like I'm getting closer to the solution here, here is what I've come up with so far :
if(line.contains("$")){
System.out.println(line.replaceAll("\\$\\{.+\\}", "$1" +"_SUFFIX"));
}
What I mean by $1 is: take the string that was just matched and replace it with itself plus _SUFFIX.
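I would use the String.replaceAll() method like so:
String old = "some string data";
String updated = old.replaceAll("\\$\\{([a-zA-Z0-9]+)\\}", "\\${$1 CONSTANT}");
Note that in Java the back-reference in the replacement string is written $1, not \1.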
The $ is a special regular expression character that represents the end of a line. You'll need to escape it in order to match it. You'll also need to escape the backslash that you use for escaping the dollar sign because of the way Java handles strings.
Once you have your text in a string, you should be able to do the following:
str.replaceAll("\\$\\{([a-zA-Z0-9 ]+)\\}", "\\${$1 _SOMETEXT(CONSTANT)}")
If you have other characters in your variable names (i.e. underscores, symbols, etc...) then just add them to the character class that you are matching for.
Edit: If you want to use a Pattern and Matcher then there are still a few changes. First, you probably want to compile your Pattern outside of the loop. Second, you can use this, although it is more verbose.
Pattern p = Pattern.compile("\\$\\{(.+)\\}"); // the parentheses create group 1, which $1 refers to
Matcher m = p.matcher(line);
sb.append(m.replaceAll("\\${$1 _SOMETEXT(CONSTANT)}"));
THE SOLUTION :
while ((line = br.readLine()) != null) {
if(line.contains("$")){
sb.append(line.replaceAll("\\$\\{(.+)\\}", "\\${$1" +"_SUFFIX}") + "\n");
}else{
sb.append(line + "\n");
}
}
line = line.replaceAll("\\$\\{\\w+", "$0_SOMETHING");
There's no need to check for the presence of $ or whatever; that's part of what replaceAll() does. Anyway, contains() is not regex-powered like find(); it just does a plain literal text search.
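Putting the pieces of this thread together, a compact version might look like the sketch below. It is only an illustration: the class name VariableSuffixer and the helper addSuffix() are hypothetical, and the pattern assumes that variable names consist of word characters (letters, digits and underscores).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VariableSuffixer {
    // ${NAME} where NAME is one or more word characters; group 1 is NAME
    private static final Pattern VARIABLE = Pattern.compile("\\$\\{(\\w+)\\}");

    // Rewrites every ${NAME} in the line to ${NAME + suffix}; other text is left untouched.
    static String addSuffix(String line, String suffix) {
        // "\\$" is a literal dollar sign in the replacement, "$1" is the captured name
        String replacement = "\\${$1" + Matcher.quoteReplacement(suffix) + "}";
        return VARIABLE.matcher(line).replaceAll(replacement);
    }

    public static void main(String[] args) {
        System.out.println(addSuffix("some text before ${VARIABLE_NAME} some text after", "_SUFFIX"));
        System.out.println(addSuffix("some text without variable", "_SUFFIX"));
    }
}
```

Because the name is matched with \\w+ rather than .+, two variables on the same line are handled separately, and lines without a variable pass through unchanged, so no contains() check is needed.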