Java String Replacement

I have a string like this in a file:
<script>
Evening</script>
I have written code to replace this string, but it does not match the newline character.
That is, I want to replace the string above with:
<h1>Done</h1>
The code is:
package stringreplace;
import java.io.*;
public class stringreplace {
/**
* @param args
*/
public static void main(String[] args) {
// TODO Auto-generated method stub
FileReader fr = null;
BufferedReader br = null;
try
{
fr = new FileReader("G://abc.html");
br = new BufferedReader(fr);
String newtext="";
String line="";
String matchExist1 = "<script>\r\nEvening</script>";
String newpattern = "<h1>Done</h1>";
String matchExist2 = "</body>";
String newpattern2 = "<script>alpha</script></body>";
StringBuffer sb = new StringBuffer();
while((line=br.readLine())!=null)
{
int ind2 = line.indexOf(matchExist1);
System.out.println(ind2);
int ind3 = line.indexOf(matchExist2);
if((ind2==-1) || (ind3==-1))
{
line = line.replaceFirst(matchExist1,newpattern);
line = line.replaceFirst(matchExist2,newpattern2);
sb.append(line+"\n");
}
//sb.append(line+"\n");
else if((ind2!=-1) || (ind3!=-1))
{
String tag = "</body>";
line = line.replaceFirst("</body>",tag);
sb.append(line+"\n");
}
}
br.close();
FileWriter fw = new FileWriter("G://abc.html");
fw.write(sb.toString());
fw.close();
System.out.println("done");
System.out.println(sb);
}
catch (Exception e)
{
System.out.println(e);
}
}
}
But it does not match the newline character.

Since you read only one line at a time, you cannot match a pattern that spans two lines; readLine() also strips the line terminator, so line never contains \r\n. First change the reading code so that the string you search holds at least two lines. Once that is done, @sterna's answer will do the trick.

You can't be sure what the newline looks like in your file (\n, \r\n or \r). So rather than matching a specific sequence, I would use \s+, which matches one or more whitespace characters, including all newline characters.
String matchExist1 = "<script>\\s+Evening</script>";
Edit:
Of course, you first have to fix the problem mgc described (+1). Then you can make use of my answer!
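A minimal sketch that combines both answers: read the whole file into one string, then apply the \s+ pattern. The class name ScriptBlockReplacer and the default file name abc.html are placeholders for illustration, and UTF-8 is an assumed encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ScriptBlockReplacer {

    // \s+ matches any run of whitespace, so it covers "\n", "\r\n" and indentation.
    static String replaceScriptBlock(String html) {
        return html.replaceFirst("<script>\\s+Evening</script>", "<h1>Done</h1>");
    }

    public static void main(String[] args) throws IOException {
        // Placeholder path; pass the real file as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "abc.html");
        // Read the whole file so the pattern can see both lines at once.
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        Files.write(file, replaceScriptBlock(content).getBytes(StandardCharsets.UTF_8));
    }
}
```

Because the match runs against the full content, the line terminator between <script> and Evening is still present when the pattern is applied.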

Related

Is there any built-in function in Java to remove unwanted data from extracted data?

I extracted some text from a text file, but now I want only some specific words from that text.
What I tried is to read from that text file and search using a keyword:
FileReader fr = new FileReader("D:\\PDFTOEXCEL\\Extractionfrompdf.txt");
BufferedReader br = new BufferedReader(fr);
String s;
String keyword = "dba COPIEFacture ";
while ((s = br.readLine()) != null) {
    if (s.contains(keyword)) {
        System.out.println(s);
    }
}
br.close();
I got output like this: dba COPIEFacture du 28/05/2018 n° 10077586115Récapitulatif de vote facture
But I want only 28/05/2018. Please help me.
You'll need to use String manipulation methods.
It's difficult to know the best way to do it without seeing other outputs, but you could probably use split() and indexOf() to retrieve the date.
There are other, probably more complex, methods. For example, here's a StackOverflow answer about retrieving dates from strings using a regex pattern.
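As a rough sketch of the regex route, assuming the date always has the shape dd/MM/yyyy as in the sample output (the class and method names are made up for illustration, and the pattern checks only the shape, not whether the date is valid):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateExtractor {

    // Two digits, slash, two digits, slash, four digits.
    private static final Pattern DATE = Pattern.compile("\\b\\d{2}/\\d{2}/\\d{4}\\b");

    // Returns the first date-shaped token in the line, or null if there is none.
    static String firstDate(String line) {
        Matcher m = DATE.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Sample line from the question, with non-ASCII characters written as Unicode escapes.
        String line = "dba COPIEFacture du 28/05/2018 n\u00b0 10077586115R\u00e9capitulatif de vote facture";
        System.out.println(firstDate(line));
    }
}
```

This does not depend on the keyword preceding the date, so it keeps working if the surrounding words change.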
This will do the trick.
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
public class Main {
public static void main(String[] args) {
FileReader fr;
String keyword = "dba COPIEFacture du ";
String textToFind = "28/05/2018"; // The length usually will not change,
// so you can use the constant 10 (the length) instead
StringBuilder sb = new StringBuilder();
try {
fr = new FileReader("D:\\PDFTOEXCEL\\Extractionfrompdf.txt");
int i;
while ((i = fr.read()) != -1) {
sb.append((char) i);
}
int start = sb.indexOf(keyword) + keyword.length();
int end = start + textToFind.length();
System.out.print(sb.substring(start, end)); //output: 28/05/2018
fr.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
}

How to remove the duplicate string?

I have two files on my drive, each containing some text. I want to display their contents in the console, but a string that appears in both should be displayed only once rather than twice.
Code:
public class read {
public static void main(String[] args) {
try{
File file = new File("D:\\file1.txt");
FileReader fileReader = new FileReader(file);
BufferedReader br = new BufferedReader(fileReader);
StringBuffer stringBuffer = new StringBuffer();
String line;
while((line = br.readLine()) != null){
stringBuffer.append(line);
stringBuffer.append("\n");
}
fileReader.close();
System.out.println("Contents of file1:");
String first = stringBuffer.toString();
System.out.println(first);
File file1 = new File("D:\\file2.txt");
FileReader fileReader1 = new FileReader(file1);
BufferedReader br1 = new BufferedReader(fileReader1);
StringBuffer stringBuffer1 = new StringBuffer();
String line1;
while((line1 = br1.readLine()) != null){
stringBuffer1.append(line1);
stringBuffer1.append("\n");
}
fileReader1.close();
System.out.println("Contents of file2:");
String second = stringBuffer1.toString();
System.out.println(second);
System.out.println("answer:");
System.out.println(first+second);
}catch (IOException e) {
// TODO: handle exception
e.printStackTrace();
}
}
}
Output is:
answer:
hi hello
how are you
hi ya
i am fine
But I want to compare both strings, and if the same string is repeated, it should be displayed only once.
The output I expect is:
answer:
hi hello
how are you
ya
i am fine
Here "hi" is found in both strings, so I need to delete the duplicate.
How can I do that? Please help.
Thanks in advance.
You can pass your lines through this method to parse out duplicate words:
// needs imports: java.util.HashSet, java.util.Set, java.util.StringJoiner
// store unique previous words
static Set<String> words = new HashSet<>();
static String removeDuplicateWords(String line) {
StringJoiner sj = new StringJoiner(" ");
// split on whitespace to get distinct words
for (String word : line.split("\\s+")) {
// try to add word to the set
if (words.add(word)) {
// if the word was added (=not seen before), append to the result
sj.add(word);
}
}
return sj.toString();
}
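To show how this fits the question, here is a self-contained sketch that runs the same idea over the combined text of both files. The class name DuplicateWordFilter and the inlined strings are illustrative; in the real program first and second would come from the two files.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.StringJoiner;

class DuplicateWordFilter {

    // Words already printed; shared across all lines of both inputs.
    private final Set<String> seen = new HashSet<>();

    // Keeps only the words of this line that have not been seen before.
    String filterLine(String line) {
        StringJoiner sj = new StringJoiner(" ");
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty() && seen.add(word)) {
                sj.add(word);
            }
        }
        return sj.toString();
    }

    // Filters a multi-line text, dropping lines that end up empty.
    static String filterText(String text) {
        DuplicateWordFilter filter = new DuplicateWordFilter();
        StringJoiner out = new StringJoiner("\n");
        for (String line : text.split("\\R")) {
            String kept = filter.filterLine(line);
            if (!kept.isEmpty()) {
                out.add(kept);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String first = "hi hello\nhow are you\n";  // stands in for file1.txt
        String second = "hi ya\ni am fine\n";      // stands in for file2.txt
        System.out.println("answer:");
        System.out.println(filterText(first + second));
    }
}
```

Keeping the set in an instance field rather than a static one means each run starts with an empty history.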

Remove '#' symbol from the beginning of the string in Java

Sample data in the CSV file:
##Troubleshooting DHCP Configuration
#Module 3: Point-to-Point Protocol (PPP)
##Configuring HDLC Encapsulation
Hardware is HD64570
So I want to get the lines as:
#Troubleshooting DHCP Configuration
Module 3: Point-to-Point Protocol (PPP)
#Configuring HDLC Encapsulation
Hardware is HD64570
I have written this sample code:
public class ReadCSV {
public static BufferedReader br = null;
public static void main(String[] args) {
ReadCSV obj = new ReadCSV();
obj.run();
}
public void run() {
String sCurrentLine;
try {
br = new BufferedReader(new FileReader("D:\\compare\\Genre_Subgenre.csv"));
try {
while ((sCurrentLine = br.readLine()) != null) {
if(sCurrentLine.charAt(0) == '#'){
System.out.println(sCurrentLine);
}
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
I am getting the error below:
##Troubleshooting DHCP Configuration
#Module 3: Point-to-Point Protocol (PPP)
##Configuring HDLC Encapsulation
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 0
at java.lang.String.charAt(Unknown Source)
at example.ReadCSV.main(ReadCSV.java:19)
Please suggest how to do this.
Steps:
1. Read the CSV file line by line
2. Use line.replaceFirst("#", "") to remove the first # from each line
3. Write the modified lines to an output stream (file or String), whichever suits you
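The steps above could look roughly like this. It is a sketch under assumptions: the file names input.csv and output.csv are placeholders, UTF-8 is assumed, and the pattern is anchored as "^#" so that only a '#' at the very start of a line is removed (the plain "#" from the steps would also remove a '#' in the middle of a line that has none at the start).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class HashStripper {

    // Removes one leading '#' from every line; other lines pass through unchanged.
    static List<String> stripFirstHash(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            result.add(line.replaceFirst("^#", ""));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder paths; pass the real ones as arguments.
        Path in = Paths.get(args.length > 0 ? args[0] : "input.csv");
        Path out = Paths.get(args.length > 1 ? args[1] : "output.csv");
        List<String> lines = Files.readAllLines(in, StandardCharsets.UTF_8);
        Files.write(out, stripFirstHash(lines), StandardCharsets.UTF_8);
    }
}
```

Empty lines are handled without any special case, because replaceFirst simply finds nothing to replace.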
If the variable s contains the content of the CSV file as a String,
s = s.replace("##", "#");
will replace all occurrences of "##" with "#".
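You need something like String line = buffer.readLine().
Check the first character of the line with line.charAt(0) == '#' (guard against empty lines first, since charAt(0) throws StringIndexOutOfBoundsException on an empty string).
Get the new string with String newLine = line.substring(1).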
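The exception in the question is what charAt(0) throws on an empty line. One way to sidestep it, sketched here with a made-up helper name, is to use startsWith, which returns false for an empty string:

```java
class LeadingHash {

    // startsWith("#") is false for "", so substring(1) is never called on an empty line.
    static String removeLeadingHash(String line) {
        return line.startsWith("#") ? line.substring(1) : line;
    }

    public static void main(String[] args) {
        String[] samples = {
            "##Troubleshooting DHCP Configuration",
            "#Module 3: Point-to-Point Protocol (PPP)",
            "",
            "Hardware is HD64570"
        };
        for (String s : samples) {
            System.out.println(removeLeadingHash(s));
        }
    }
}
```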
This is a rather trivial question. Rather than do the work for you, I'll outline the steps that you need to take without gifting you the answer.
1. Read in the file line by line.
2. Take the first line and check if the first character of this line is a #. If it is, create a substring of this line excluding the first character (or use fileLine.replaceFirst("#", "");).
3. Store this line somewhere in an array-like data structure, or simply replace the current variable with the edited one (fileLine = fileLine.replaceFirst("#", "");).
4. Repeat until no more lines are left in the file.
5. If you want to save these changes to the file, simply overwrite the old file with the new lines (e.g. a FileWriter opened with its append parameter set to false overwrites the file).
Make an attempt and show us what you have tried; people are more likely to help if they believe you have thoroughly attempted the problem yourself first.
package stackoverflow.q_25054783;
import java.util.Arrays;
public class RemoveHash {
public static void main(String[] args) {
String [] strArray = new String [3];
strArray[0] = "##Troubleshooting DHCP Configuration";
strArray[1] = "#Module 3: Point-to-Point Protocol (PPP)";
strArray[2] = "##Configuring HDLC Encapsulation";
System.out.println("Original array: " + Arrays.toString(strArray));
for (int i = 0; i < strArray.length; i++) {
strArray[i] = strArray[i].replaceFirst("#", "");
}
System.out.println("Updated array: " + Arrays.toString(strArray));
}
}
//Output:
//Original array: [##Troubleshooting DHCP Configuration, #Module 3: Point-to-Point Protocol (PPP), ##Configuring HDLC Encapsulation]
//Updated array: [#Troubleshooting DHCP Configuration, Module 3: Point-to-Point Protocol (PPP), #Configuring HDLC Encapsulation]
OpenCSV reads CSV file line by line and gives you an array of strings, where each string is one comma separated value, right? Thus, you are operating on a string.
You want to remove '#' symbol from the beginning of the string (if it is there). Correct?
Then this should do it:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String [] nextLine;
while ((nextLine = reader.readNext()) != null) {
// startsWith is safe even when the first value is empty
if (nextLine.length > 0 && nextLine[0].startsWith("#")) {
nextLine[0] = nextLine[0].substring(1);
}
}
This removes a leading '#' from each line of the CSV file and writes the result back:
// needs imports: java.io.*, java.nio.charset.StandardCharsets, java.util.ArrayList, java.util.List
private static List<String> getFileContentWithoutLeadingHash(File f) {
    try (BufferedReader input = new BufferedReader(new InputStreamReader(new FileInputStream(f), StandardCharsets.UTF_8))) {
        List<String> lines = new ArrayList<String>();
        for (String line = input.readLine(); line != null; line = input.readLine()) {
            // Remove only a leading '#'; startsWith is safe for empty lines
            lines.add(line.startsWith("#") ? line.substring(1) : line);
        }
        return lines;
    } catch (IOException e) {
        e.printStackTrace();
        System.exit(1);
        return null;
    }
}
private static void writeFile(List<String> lines, File f) {
    try (BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(f), StandardCharsets.UTF_8))) {
        for (String line : lines) {
            bw.write(line);
            bw.newLine();
        }
        bw.flush();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
public static void main(String[] args) {
    File f = new File("file/path");
    List<String> lines = getFileContentWithoutLeadingHash(f);
    // FileOutputStream truncates the existing file, so no delete is needed
    writeFile(lines, f);
}

Replace a String inside a file using java

I have a TXT file in which I'd like to change this String
<!DOCTYPE Publisher
PUBLIC "-//Springer-Verlag//DTD A++ V2.4//EN" "http://devel.springer.de/A++/V2.4/DTD/A++V2.4.dtd">
into this one <!DOCTYPE Publisher> using Java.
I wrote the following function, but it does not seem to work.
public void replace() {
try {
File file = new File("/home/zakaria/Bureau/PhD/test2/file.txt");
BufferedReader reader = new BufferedReader(new FileReader(file));
String line = "", oldtext = "";
while((line = reader.readLine()) != null) {
oldtext += line + "\n";
}
reader.close();
String newtext = oldtext
.replaceAll("<!DOCTYPE Publisher\nPUBLIC \"-//Springer-Verlag//DTD A++ V2.4//EN\" \"http://devel.springer.de/A++/V2.4/DTD/A++V2.4.dtd\">",
"<!DOCTYPE Publisher>");
FileWriter writer = new FileWriter("/home/zakaria/Bureau/PhD/test2/file.txt");
writer.write(newtext);
writer.close();
} catch (IOException ioe) {
ioe.printStackTrace();
}
}
What did I do wrong?
Try this simple code:
public static void replace() {
try {
File file = new File("resources/abc.txt");
BufferedReader reader = new BufferedReader(new FileReader(file));
String line = "", oldtext = "";
boolean found = false;
while ((line = reader.readLine()) != null) {
if (line.trim().startsWith("<!DOCTYPE Publisher")) {
found = true;
}
if (line.trim().endsWith("A++V2.4.dtd\">")) {
oldtext += "<!DOCTYPE Publisher>\n";
found = false;
continue;
}
if (found) {
continue;
}
oldtext += line + "\n";
}
reader.close();
FileWriter writer = new FileWriter("resources/file.txt");
writer.write(oldtext);
writer.close();
} catch (IOException ioe) {
ioe.printStackTrace();
}
}
You are fortunate, to start with, that it didn't change anything at all.
Otherwise you would have lost your original file...
Never modify a file in place!
Create a temporary file, write the modified content to it, and only then rename it to the original file.
Also, the string you want to replace contains regex metacharacters (+ and .), so it must be quoted before being used as a pattern; and you don't want to use .replace(), since this will replace all occurrences.
Do it like this:
final String quoted
= Pattern.quote("<!DOCTYPE Publisher\nPUBLIC \"-//Springer-Verlag//DTD A++ V2.4//EN\" \"http://devel.springer.de/A++/V2.4/DTD/A++V2.4.dtd\">");
final Pattern pattern = Pattern.compile(quoted);
final Path victim = Paths.get("/home/zakaria/Bureau/PhD/test2/file.txt");
final Path tmpfile = Files.createTempFile("tmp", "foo");
final byte[] content = Files.readAllBytes(victim);
final String s = new String(content, StandardCharsets.UTF_8);
final String replacement = pattern.matcher(s).replaceFirst("<!DOCTYPE Publisher>");
try (
final OutputStream out = Files.newOutputStream(tmpfile);
) {
out.write(replacement.getBytes(StandardCharsets.UTF_8));
out.flush();
}
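// REPLACE_EXISTING is required because the target file already exists
Files.move(tmpfile, victim, StandardCopyOption.REPLACE_EXISTING);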
If the text you want to eliminate is on the second and subsequent lines, as in your demo-input
<!DOCTYPE Publisher
PUBLIC "-//Springer-Verlag//DTD A++ V2.4//EN"
"http://devel.springer.de/A++/V2.4/DTD/A++V2.4.dtd">
and no lines between the first and last in the tag contain a closing >, then you can do the following:
while(more lines to process)
    if "<!DOCTYPE Publisher" is not found
        read line and output it
    else
        //This is the first line in a <!DOCTYPE tag
        read the line and output it, appending '>' to the end
        while the next line does NOT end with a '>'
            discard it (don't output it)
        discard the line that ends with '>' as well (it closes the tag)
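The pseudocode above might be written in Java roughly as follows. This is a sketch: the class name DoctypeTrimmer is invented, and it works on a list of lines so the file handling can stay separate. Note that the line ending in '>' is also dropped, since it is the tail of the original tag.

```java
import java.util.ArrayList;
import java.util.List;

class DoctypeTrimmer {

    // Collapses a multi-line <!DOCTYPE Publisher ...> tag into a single closed line.
    static List<String> trim(List<String> lines) {
        List<String> out = new ArrayList<>();
        boolean skipping = false;
        for (String line : lines) {
            if (skipping) {
                // Inside the tag: drop the line; the one ending in '>' is the last to drop.
                if (line.trim().endsWith(">")) {
                    skipping = false;
                }
                continue;
            }
            String trimmed = line.trim();
            if (trimmed.startsWith("<!DOCTYPE Publisher") && !trimmed.endsWith(">")) {
                // First line of the tag: close it immediately and skip the rest.
                out.add(trimmed + ">");
                skipping = true;
            } else {
                out.add(line);
            }
        }
        return out;
    }
}
```

As in the pseudocode, this relies on no line between the first and last line of the tag ending with '>'.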
Try with this regexp:
String newtext = oldtext.replaceAll(
"<!DOCTYPE Publisher\nPUBLIC \"-\\/\\/Springer-Verlag\\/\\/DTD A[+][+] V2[.]4\\/\\/EN\"[ ]\"http:\\/\\/devel[.]springer[.]de\\/A[+][+]\\/V2[.]4\\/DTD\\/A[+][+]V2[.]4[.]dtd\">", "<!DOCTYPE Publisher>");
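The only changes are escaping the forward slashes and putting the dots and plus signs between square brackets, so that they are matched literally instead of being interpreted as regex metacharacters. (Escaping the forward slashes is not required in Java, but it is harmless.)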
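To illustrate why the original replaceAll call found nothing, here is a small comparison. In a Java regex, "A++" means "one or more A" (with a possessive quantifier), not the literal text A++, so the unescaped pattern never matches. The class name is invented for the demo; Pattern.quote and String.replace both treat the search text literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralReplaceDemo {

    static final String OLD = "<!DOCTYPE Publisher\nPUBLIC \"-//Springer-Verlag//DTD A++ V2.4//EN\" "
            + "\"http://devel.springer.de/A++/V2.4/DTD/A++V2.4.dtd\">";
    static final String NEW = "<!DOCTYPE Publisher>";

    // OLD interpreted as a regex: "A++" is a quantifier, so the literal text is not matched.
    static String asRegex(String text) {
        return text.replaceAll(OLD, NEW);
    }

    // OLD quoted, so every character is matched literally.
    static String quoted(String text) {
        return text.replaceAll(Pattern.quote(OLD), Matcher.quoteReplacement(NEW));
    }

    // String.replace never uses regex; it replaces every literal occurrence.
    static String literal(String text) {
        return text.replace(OLD, NEW);
    }

    public static void main(String[] args) {
        String text = OLD + "\n<rest of file>";
        System.out.println(asRegex(text).equals(text)); // unchanged input means no match
        System.out.println(quoted(text));
        System.out.println(literal(text));
    }
}
```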

Problems trying to retrieve information from txt file

I'm stuck on an issue in my application. I have a text file containing a piece of code that I need to retrieve into a String variable. What is the best way to do this? I tried the samples below, but they are logically incorrect or incomplete. Take a look:
1. Reading line by line:
BufferedReader bfr = new BufferedReader(new FileReader(Node));
String line = null;
try {
while( (line = bfr.readLine()) != null ){
line.contentEquals("d.href");
System.out.println(line);
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
2. Reading character by character:
BufferedReader bfr = new BufferedReader(new FileReader(Node));
int i = 0;
try {
while ((i = bfr.read()) != -1) {
char ch = (char) i;
System.out.println(Character.toString(ch));
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
};
3. Reading with Scanner:
BufferedReader bfr = new BufferedReader(new FileReader(Node));
int wordCount = 0, totalcount = 0;
Scanner s = new Scanner(googleNode);
while (s.hasNext()) {
totalcount++;
if (s.next().contains("(?=d.href).*?(}=?)")) wordCount++;
}
System.out.println(wordCount+" "+totalcount);
With (1.) I'm having difficulty finding d.href, which marks the start of the code piece.
With (2.) I can't find a way to store d.href as a string and retrieve the rest of the information.
With (3.) I can correctly find d.href, but I can't retrieve pieces of the text.
Could anyone help me, please?
To answer my own question: I used Scanner to read the text file word by word. contains("window.maybeRedirectForGBV") returns a boolean, and next() returns a String. So I stopped scanning one word before the one I wanted, then called next() once more to store the following word in a String variable. From that point you only need to process the string the way you want. Hope this helps.
String stringSplit = null;
Scanner s = new Scanner(Node);
while (s.hasNext()) {
if (s.next().contains("window.maybeRedirectForGBV")){
stringSplit = s.next();
break;
}
}
You can use regular expressions like this:
Pattern pattern = Pattern.compile("^\\s*d\\.href([^=]*)=(.*)$");
// Groups: 1-----1 2--2
// Possibly spaces, "d.href", any characters not '=', the '=', any chars.
....
Matcher m = pattern.matcher(line);
if (m.matches()) {
String dHrefSuffix = m.group(1);
String value = m.group(2);
System.out.println(value);
break;
}
BufferedReader will do.
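Putting the regex answer into a complete form, a sketch might look like this. The class name, the method names and the placeholder file name input.txt are assumptions for illustration; the pattern is the one given above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DHrefFinder {

    // Optional spaces, "d.href", any characters except '=', the '=', then the value.
    private static final Pattern D_HREF = Pattern.compile("^\\s*d\\.href([^=]*)=(.*)$");

    // Returns the trimmed text after '=' on the first matching line, or null if none matches.
    static String firstValue(List<String> lines) {
        for (String line : lines) {
            Matcher m = D_HREF.matcher(line);
            if (m.matches()) {
                return m.group(2).trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder path; pass the real file as the first argument.
        String path = args.length > 0 ? args[0] : "input.txt";
        List<String> lines = Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8);
        System.out.println(firstValue(lines));
    }
}
```

Since matches() tests the whole line, a line that merely contains d.href somewhere in the middle is ignored.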
