Replace a particular String from a text file - java

I'm trying to replace the occurrences of a certain String in a given text file. Here's the code I've written:
BufferedReader tempFileReader = new BufferedReader(new InputStreamReader(new FileInputStream(tempFile)));
File tempFileBuiltForUse = new File("C:\\testing\\anotherTempFile.txt");
Writer changer = new BufferedWriter(new FileWriter(tempFileBuiltForUse));
String lineContents ;
while( (lineContents = tempFileReader.readLine()) != null)
{
Pattern pattern = Pattern.compile("/.");
Matcher matcher = pattern.matcher(lineContents);
String lineByLine = null;
while(matcher.find())
{
lineByLine = lineContents.replaceAll(matcher.group(),System.getProperty("line.separator"));
changer.write(lineByLine);
}
}
changer.close();
tempFileReader.close();
Suppose the contents of my tempFile are:
This/DT is/VBZ a/DT sample/NN text/NN ./.
I want the anotherTempFile to contain :
This/DT is/VBZ a/DT sample/NN text/NN .
with a new line.
But I'm not getting the desired output. And I'm not able to see where I'm going wrong. :-(
Kindly help. :-)

A dot means "any character" in regular expressions. Try escaping it:
Pattern pattern = Pattern.compile("\\./\\.");
(You need two backslashes to escape the backslash itself inside the String literal, so that Java knows you want a literal backslash and not an escape sequence such as the newline character \n.)

In a regex, the dot (.) matches any character (except newlines), so it needs to be escaped if you want it to match a literal dot. Also, you appear to be missing the first dot in your regex since you want the pattern to match ./.:
Pattern pattern = Pattern.compile("\\./\\.");
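To make the difference concrete, here is a minimal sketch that counts what each pattern matches on the sample line from the question. The class and method names (DotEscapeDemo, countMatches) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotEscapeDemo {
    // Unescaped dot: a slash followed by ANY character.
    static final Pattern UNESCAPED = Pattern.compile("/.");
    // Escaped dots: the literal three-character token "./."
    static final Pattern ESCAPED = Pattern.compile("\\./\\.");

    // Counts non-overlapping matches of p in s.
    static int countMatches(Pattern p, String s) {
        Matcher m = p.matcher(s);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Replaces the literal "./." token with a single dot.
    static String fix(String line) {
        return ESCAPED.matcher(line).replaceAll(".");
    }

    public static void main(String[] args) {
        String line = "This/DT is/VBZ a/DT sample/NN text/NN ./.";
        System.out.println(countMatches(UNESCAPED, line)); // every "/X" pair
        System.out.println(countMatches(ESCAPED, line));   // only the final "./."
        System.out.println(fix(line));
    }
}
```

The unescaped pattern matches every tag separator ("/D", "/V", ...), which is why the original loop rewrote the line several times over.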

Your regular expression has a problem. Also you don't have to use the Pattern and matcher. Simply use replaceAll() method of the String class for the replacement. It would be easier. Try the code below:
BufferedReader tempFileReader = new BufferedReader(
new InputStreamReader(new FileInputStream("c:\\test.txt")));
File tempFileBuiltForUse = new File("C:\\anotherTempFile.txt");
Writer changer = new BufferedWriter(new FileWriter(tempFileBuiltForUse));
String lineContents;
while ((lineContents = tempFileReader.readLine()) != null) {
String lineByLine = lineContents.replaceAll("\\./\\.", System.getProperty("line.separator"));
changer.write(lineByLine);
}
changer.close();
tempFileReader.close();
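A variation on the same idea, as a sketch only: it uses try-with-resources so both files are closed even when an exception is thrown, and it replaces "./." with "." followed by a line break, which is what the desired output in the question seems to ask for. The class name, the temp-file names and the choice of UTF-8 are assumptions, not something the original code specifies.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReplaceInFile {
    // Copies 'in' to 'out', turning each literal "./." into "." and
    // ending every output line with the platform line separator.
    static void rewrite(Path in, Path out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line.replaceAll("\\./\\.", "."));
                writer.newLine();
            }
        }
    }

    // Runs rewrite() on a throwaway temp file and returns the result.
    static String demo() {
        try {
            Path in = Files.createTempFile("tagged", ".txt");
            Path out = Files.createTempFile("cleaned", ".txt");
            Files.write(in, "This/DT is/VBZ a/DT sample/NN text/NN ./."
                    .getBytes(StandardCharsets.UTF_8));
            rewrite(in, out);
            String result = new String(Files.readAllBytes(out), StandardCharsets.UTF_8);
            Files.delete(in);
            Files.delete(out);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```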

As a regular expression, /. means a slash followed by any character.
Change it to "/\\.".

Related

Regex for replacing Exact String match [duplicate]

My input:
1. end
2. end of the day or end of the week
3. endline
4. something
5. "something" end
Based on the above discussions, If I try to replace a single string using this snippet, it removes the appropriate words from the line successfully
public class DeleteTest {
public static void main(String[] args) {
// TODO Auto-generated method stub
try {
File file = new File("C:/Java samples/myfile.txt");
File temp = File.createTempFile("myfile1", ".txt", file.getParentFile());
String delete="end";
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
PrintWriter writer = new PrintWriter(new OutputStreamWriter(new FileOutputStream(temp)));
for (String line; (line = reader.readLine()) != null;) {
line = line.replaceAll("\\b"+delete+"\\b", "");
writer.println(line);
}
reader.close();
writer.close();
}
catch (Exception e) {
System.out.println("Something went Wrong");
}
}
}
My output If I use the above snippet:(Also my expected output)
1.
2. of the day or of the week
3. endline
4. something
5. "something"
But when I include more words to delete, and for that purpose when I use Set, I use the below code snippet:
public static void main(String[] args) {
// TODO Auto-generated method stub
try {
File file = new File("C:/Java samples/myfile.txt");
File temp = File.createTempFile("myfile1", ".txt", file.getParentFile());
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
PrintWriter writer = new PrintWriter(new OutputStreamWriter(new FileOutputStream(temp)));
Set<String> toDelete = new HashSet<>();
toDelete.add("end");
toDelete.add("something");
for (String line; (line = reader.readLine()) != null;) {
line = line.replaceAll("\\b"+toDelete+"\\b", "");
writer.println(line);
}
reader.close();
writer.close();
}
catch (Exception e) {
System.out.println("Something went Wrong");
}
}
I get my output as: (It just removes the space)
1. end
2. endofthedayorendoftheweek
3. endline
4. something
5. "something" end
Can you help me with this?
You need to create an alternation group out of the set with
String.join("|", toDelete)
and use as
line = line.replaceAll("\\b(?:"+String.join("|", toDelete)+")\\b", "");
The pattern will look like
\b(?:end|something)\b
Here, (?:...) is a non-capturing group: it groups several alternatives without storing the matched text as a capture (you do not need one, since you only remove the matches).
Or, better, compile the regex before entering the loop:
Pattern pat = Pattern.compile("\\b(?:" + String.join("|", toDelete) + ")\\b");
...
line = pat.matcher(line).replaceAll("");
UPDATE:
To allow matching whole "words" that may contain special chars, you need to Pattern.quote those words to escape those special chars, and then you need to use unambiguous word boundaries, (?<!\w) instead of the initial \b to make sure there is no word char before and (?!\w) negative lookahead instead of the final \b to make sure there is no word char after the match.
In Java 8, you may use this code:
Set<String> nToDel = toDelete.stream()
.map(Pattern::quote)
.collect(Collectors.toCollection(HashSet::new));
String pattern = "(?<!\\w)(?:" + String.join("|", nToDel) + ")(?!\\w)";
The regex will look like (?<!\w)(?:\Q+end\E|\Qsomething-\E)(?!\w). Note that the symbols between \Q and \E are parsed as literal symbols.
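Putting the pieces of this answer together, a self-contained sketch could look like the following. WordRemover, buildPattern and remove are hypothetical names, and the pattern is compiled once and reused for every line.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class WordRemover {
    // Builds (?<!\w)(?:\Qword1\E|\Qword2\E|...)(?!\w) from the given words.
    // Assumes the set is not empty.
    static Pattern buildPattern(Set<String> words) {
        String alternation = words.stream()
                .map(Pattern::quote)              // escape regex metacharacters
                .collect(Collectors.joining("|"));
        return Pattern.compile("(?<!\\w)(?:" + alternation + ")(?!\\w)");
    }

    // Deletes every whole-word match from the line.
    static String remove(Pattern pattern, String line) {
        return pattern.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        Set<String> toDelete = new LinkedHashSet<>(Arrays.asList("end", "something"));
        Pattern pattern = buildPattern(toDelete); // compile once, outside any loop
        System.out.println(remove(pattern, "2. end of the day or end of the week"));
        System.out.println(remove(pattern, "3. endline"));
    }
}
```

Only the matched words are removed; the spaces around them stay, so "end of" becomes " of" with its leading space intact.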
The problem is that you're not creating the correct regex for replacing the words in the set.
"\\b"+toDelete+"\\b" will produce this String \b[end, something]\b which is not what you need.
To fix that you can do something like this:
for(String del : toDelete){
line = line.replaceAll("\\b"+del+"\\b", "");
}
What this does is to go through the set, produce a regex from each word and remove that word from the line String.
Another approach will be to produce a single regex from all the words in the set.
Eg:
String regex = "";
for(String word : toDelete){
regex+=(regex.isEmpty() ? "" : "|") + "(\\b"+word+"\\b)";
}
....
line = line.replaceAll(regex, "");
This should produce a regex that looks something like this: (\bend\b)|(\bsomething\b)
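To see why concatenating the Set itself only stripped spaces, it may help to print the regex that the concatenation produces. The sketch below is illustrative (SetConcatDemo and brokenRegex are invented names) and uses a LinkedHashSet so that the toString() order is predictable.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class SetConcatDemo {
    // Reproduces the concatenation from the question: the Set is converted
    // with toString(), which yields "[end, something]".
    static String brokenRegex(Set<String> toDelete) {
        return "\\b" + toDelete + "\\b";
    }

    public static void main(String[] args) {
        Set<String> toDelete = new LinkedHashSet<>(Arrays.asList("end", "something"));
        String regex = brokenRegex(toDelete);
        // The brackets form a character class, so the regex matches ONE
        // character out of e, n, d, comma, space, s, o, m, t, h, i, g
        // that has a word boundary on both sides.
        System.out.println(regex);
        // A space between two words satisfies that, so only spaces vanish.
        System.out.println("2. end of the day".replaceAll(regex, ""));
    }
}
```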

Find multiple pattern matches and replace each one in a text file

I want to find multiple string matches in a text file and replace each one with its own replacement.
Example
I have the following patterns to match and replace, respectively:
I want to find the patterns 1. "cd", 2. "kj", 3. "by" and replace them with 1. "sdi", 2. "ge", 3. "bi".
BufferedReader cd = new BufferedReader(new FileReader("text.txt"));
String line;
Pattern pattern = Pattern.compile("cd",Pattern.CASE_INSENSITIVE);
Matcher matcher;
while ((line = cd.readLine()) != null) {
matcher = pattern.matcher(line);
if (matcher.find()) {
line = matcher.replaceAll("sdi");
System.out.println(line);
}
}
cd.close();
This simple code works for single pattern match. Is there any other way possible to do?
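Why don't you just do something like this?
yourstring.replaceAll("cd", "sdi").replaceAll("kj", "ge").replaceAll("by", "bi");
This way, String.replaceAll compiles and applies each pattern for you. Note that the replacements run one after another, so a later pattern can also match text that an earlier replacement inserted.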
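If the chained calls become unwieldy, or if one replacement must not be re-matched by a later pattern, a single pass over the text is an alternative. The following is a sketch under assumptions: MultiReplace and replaceEach are invented names, the map keys are expected to be lower case, and matching is case-insensitive as in the question.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class MultiReplace {
    // Replaces every key of 'replacements' with its value in one pass.
    // Keys are assumed to be lower case.
    static String replaceEach(String input, Map<String, String> replacements) {
        if (replacements.isEmpty()) {
            return input;
        }
        String alternation = replacements.keySet().stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Matcher m = Pattern.compile(alternation, Pattern.CASE_INSENSITIVE).matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = replacements.get(m.group().toLowerCase(Locale.ROOT));
            // quoteReplacement keeps '$' and '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("cd", "sdi");
        map.put("kj", "ge");
        map.put("by", "bi");
        System.out.println(replaceEach("cd kj by CD", map));
    }
}
```

Because each position in the input is examined once, text produced by one replacement is never fed into another, unlike with chained replaceAll calls.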

Java Pattern/ Matcher

This is a sample text: \1f\1e\1d\020028. I cannot modify the input text, I am reading long string of texts from a file.
I want to extract the following: \1f, \1e, \1d, \02
For this, I have written the following regular expression pattern: "\\[a-fA-F0-9]"
I am using the Pattern and Matcher classes, but my matcher is not able to find the pattern using the mentioned regular expression. I have tested this regex with the text on some online regex websites and surprisingly it works there.
Where am I going wrong?
Original code:
public static void main(String[] args) {
String inputText = "\1f\1e\1d\02002868BF03030000000000000000S023\1f\1e\1d\03\0d";
inputText = inputText.replace("\\", "\\\\");
String regex = "\\\\[a-fA-F0-9]{2}";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(inputText);
while (m.find()) {
System.out.println(m.group());
}
}
Output: Nothing is printed
(answer changed after OP added more details)
Your string
String inputText = "\1f\1e\1d\02002868BF03030000000000000000S023\1f\1e\1d\03\0d";
Doesn't actually contain any \ literals, because according to the Java Language Specification, section 3.10.6 (Escape Sequences for Character and String Literals), \xxx is interpreted as the character whose Unicode code point has the octal (base 8) value xxx.
Example: \123 = 1*8^2 + 2*8^1 + 3*8^0 = 1*64 + 2*8 + 3*1 = 64+16+3 = 83, which represents the character S.
If the string in your question is written exactly the same way in your text file, then you should write it as
String inputText = "\\1f\\1e\\1d\\02002868BF03030000000000000000S023\\1f\\1e\\1d\\03\\0d";
(with escaped \ which now will represent literal).
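A quick way to check this is to look at the characters the compiler actually puts into the string. The sketch below uses invented names (OctalEscapeDemo, OCTAL, ESCAPED) and a shortened literal.

```java
public class OctalEscapeDemo {
    // "\1f" is the octal escape \1 (code point 1) followed by the letter f.
    static final String OCTAL = "\1f";
    // "\\1f" is a literal backslash followed by the characters 1 and f.
    static final String ESCAPED = "\\1f";

    public static void main(String[] args) {
        System.out.println("\123");               // octal 123 = decimal 83 = 'S'
        System.out.println(OCTAL.length());       // two characters, no backslash
        System.out.println((int) OCTAL.charAt(0));
        System.out.println(OCTAL.indexOf('\\'));  // -1: nothing for the regex to find
        System.out.println(ESCAPED.length());     // three characters: \ 1 f
    }
}
```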
(older version of my answer)
It is hard to tell what exactly you did wrong without seeing your code. You should be able to find at least \1, \1, \1, \0 since your regex can match one \ and one hexadecimal character placed after it.
Anyway this is how you can find results you mentioned in question:
String text = "\\1f\\1e\\1d\\020028";
Pattern p = Pattern.compile("\\\\[a-fA-F0-9]{2}");
// ^^^--we want to find two hexadecimal
// characters after \
Matcher m = p.matcher(text);
while (m.find())
System.out.println(m.group());
Output:
\1f
\1e
\1d
\02
You need to read the file properly and replace '\' characters with '\\'. Assume that there is file called test_file in your project with this content:
\1f\1e\1d\02002868BF03030000000000000000S023\1f\1e\1d\03\0d
Here is the code to read the file and extract values:
public static void main(String[] args) throws IOException, URISyntaxException {
Test t = new Test();
t.test();
}
public void test() throws IOException {
BufferedReader br =
new BufferedReader(
new InputStreamReader(
getClass().getResourceAsStream("/test_file.txt"), "UTF-8"));
String inputText;
while ((inputText = br.readLine()) != null) {
inputText = inputText.replace("\\", "\\\\");
Pattern pattern = Pattern.compile("\\\\[a-fA-F0-9]{2}");
Matcher match = pattern.matcher(inputText);
while (match.find()) {
System.out.println(match.group());
}
}
}
Try adding a . at the end, like:
\\[a-fA-F0-9].
If you don't want to modify the input string, you could try something like:
static public void main(String[] argv) {
String s = "\1f\1e\1d\020028";
Pattern regex = Pattern.compile("[\\x00-\\x1f][0-9A-Fa-f]");
Matcher match = regex.matcher(s);
while (match.find()) {
char[] c = match.group().toCharArray();
System.out.println(String.format("\\%d%s",c[0]+0, c[1])) ;
}
}
Yes, it's not perfect, but you get the idea.

reading character like ö and ü from file in eclipse

I have an input file which contains some words like bört and übuk. When I read a line with the following code I get these strange results. How can I solve it?
String line = bufferedReader.readLine();
if (line == null) { break; }
String[] words = line.split("\\W+");
for (String word : words) {
System.out.println(word);
output is
b
rt
and
buk
Try creating a BufferedReader that decodes the file as UTF-8:
FileInputStream fis = new FileInputStream(new File("someFile.txt"));
InputStreamReader isr = new InputStreamReader(fis, "UTF-8");
BufferedReader bufferedReader = new BufferedReader(isr);
It seems that your problem is that the standard character class \\W is the negation of \\w, which by default represents only the characters [a-zA-Z0-9_], so split("\\W+") will split on every character that is not in this class, including ö and ü in your case.
To solve this problem and also include Unicode characters, you can compile your regex with the Pattern.UNICODE_CHARACTER_CLASS flag, which enables the Unicode version of predefined character classes and POSIX character classes. To use this flag you can add (?U) at the start of the regex:
String[] words = line.split("(?U)\\W+");
Demo:
String line = "bört and übuk";
String[] words = line.split("(?U)\\W+");
for (String word : words)
System.out.println(word);
Output:
bört
and
übuk
You need something like this :-
BufferedReader bufferReader = new BufferedReader(
new InputStreamReader(new FileInputStream(fileDir), "UTF-8"));
Here, instead of UTF-8, you can put whichever encoding you need to support while reading the file.
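The two suggestions in this thread (decode the file with an explicit charset, and split with the Unicode-aware \W) can be combined. The following sketch assumes the file is UTF-8 and uses made-up names (Utf8WordsDemo, words, demo); the sample words are written with \u escapes so the example does not depend on the encoding of the source file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class Utf8WordsDemo {
    // Reads the file as UTF-8 and splits each line on runs of
    // non-word characters, with (?U) making \W Unicode-aware.
    static List<String> words(Path file) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.split("(?U)\\W+")) {
                    if (!word.isEmpty()) {
                        result.add(word);
                    }
                }
            }
        }
        return result;
    }

    // Writes "bört and übuk" to a temp file in UTF-8 and reads it back.
    static List<String> demo() {
        try {
            Path file = Files.createTempFile("words", ".txt");
            Files.write(file, "b\u00f6rt and \u00fcbuk".getBytes(StandardCharsets.UTF_8));
            List<String> result = words(file);
            Files.delete(file);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```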

Java regex matching

I have a bunch of lines in a text file and I want to match this ${ALPHANUMERIC characters} and replace it with ${SAME ALPHANUMERIC characters plus _SOMETEXT(CONSTANT)}.
I've tried this expression ${(.+)} but it didn't work and I also don't know how to do the replace regex in java.
thank you for your feedback
Here is some of my code :
BufferedReader br = new BufferedReader(new FileReader(file));
String line;
StringBuilder sb = new StringBuilder();
while ((line = br.readLine()) != null) {
Pattern p = Pattern.compile("\\$\\{.+\\}");
Matcher m = p.matcher(line); // get a matcher object
if(m.find()) {
System.out.println("MATCH: "+m.group());
//TODO
//REPLACE STRING
//THEN APPEND String Builder
}
}
OK, the code above works, but it only finds my variable and not the whole line. For example, here is my input:
some text before ${VARIABLE_NAME} some text after
some text before ${VARIABLE_NAME2} some text after
some text before some text without variable some text after
... etc
so I just want to replace the ${VARIABLE_NAME} or ${VARIABLE_NAME} with ${VARIABLE_NAME2_SOMETHING} but leave preceding and following text line as it is
EDIT:
I thought of a way like this:
if(line.contains("\\${([a-zA-Z0-9 ]+)}")){
System.out.println(line);
}
if(line.contains("\\$\\{.+\\}")){
System.out.println(line);
}
My idea was to capture the line containing this, then replace , but the regex is not ok, it works with pattern/matcher combination though.
EDIT II
I feel like I'm getting closer to the solution here, here is what I've come up with so far :
if(line.contains("$")){
System.out.println(line.replaceAll("\\$\\{.+\\}", "$1" +"_SUFFIX"));
}
What I meant by $1 is the string you just matched replace it with itself + _SUFFIX
I would use the String.replaceAll() method like so:
String old = "some string data";
String updated = old.replaceAll("\\$([a-zA-Z0-9]+)", "($1) CONSTANT");
The $ is a special regular expression character that represents the end of a line. You'll need to escape it in order to match it. You'll also need to escape the backslash that you use for escaping the dollar sign because of the way Java handles strings.
Once you have your text in a string, you should be able to do the following:
str.replaceAll("\\$\\{([a-zA-Z0-9 ]+)\\}", "\\${$1 _SOMETEXT(CONSTANT)}")
If you have other characters in your variable names (i.e. underscores, symbols, etc...) then just add them to the character class that you are matching for.
Edit: If you want to use a Pattern and Matcher then there are still a few changes. First, you probably want to compile your Pattern outside of the loop. Second, you can use this, although it is more verbose.
Pattern p = Pattern.compile("\\$\\{(.+?)\\}");
Matcher m = p.matcher(line);
sb.append(m.replaceAll("\\${$1 _SOMETEXT(CONSTANT)}"));
THE SOLUTION :
while ((line = br.readLine()) != null) {
if(line.contains("$")){
sb.append(line.replaceAll("\\$\\{(.+)\\}", "\\${$1" +"_SUFFIX}") + "\n");
}else{
sb.append(line + "\n");
}
}
line = line.replaceAll("\\$\\{\\w+", "$0_SOMETHING");
There's no need to check for the presence of $ or whatever; that's part of what replaceAll() does. Anyway, contains() is not regex-powered like find(); it just does a plain literal text search.
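For reference, a compact sketch of the whole replacement with a capturing group. PlaceholderSuffix and addSuffix are hypothetical names, and the pattern assumes variable names consist only of word characters (letters, digits, underscore).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderSuffix {
    // ${NAME} where NAME is one or more word characters; group 1 is NAME.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    // Rewrites every ${NAME} in the line as ${NAME<suffix>} and leaves
    // the surrounding text untouched.
    static String addSuffix(String line, String suffix) {
        // In the replacement, \$ is a literal dollar sign and $1 is group 1.
        String replacement = "\\${$1" + Matcher.quoteReplacement(suffix) + "}";
        return PLACEHOLDER.matcher(line).replaceAll(replacement);
    }

    public static void main(String[] args) {
        System.out.println(addSuffix("some text before ${VARIABLE_NAME} some text after", "_SUFFIX"));
        System.out.println(addSuffix("some text without variable", "_SUFFIX"));
        // contains() does a literal search, so a regex passed to it never matches here.
        System.out.println("a ${b} c".contains("\\$\\{.+\\}"));
    }
}
```

Using \w+ instead of .+ keeps the match from running across two placeholders on the same line.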
