I'm trying to read in a file and modify the text, but I need to keep new lines when doing so. For example, if I were to read in a file that contained:
This is some text.
This is some more text.
It would just read in as
This is some text.This is some more text.
How do I keep that line break? I think it has something to do with the /n escape character. I've seen BufferedReader and FileReader used, but we haven't learned those in my class yet, so is there another way? What I've tried is something like this:
if (ch == 10)
{
ch = '\n';
fileOut.print(ch);
}
10 is the ASCII table code for a new line, so I thought Java could recognize it as that, but it doesn't.
In Java 8:
You can read lines using:
List<String> yourFileLines = Files.readAllLines(Paths.get("your_file"));
Then collect strings:
String collect = yourFileLines.stream().filter(StringUtils::isNotBlank).collect(Collectors.joining(" "));
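Note that joining with " " and filtering out blank lines flattens the text, which is the opposite of what the question asks for. A minimal sketch of the same idea that keeps the line breaks might look like the following. The class and method names are made up for illustration, upper-casing stands in for whatever modification is needed, and "\n" is used as the separator (System.lineSeparator() would be the platform-specific alternative).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class KeepLineBreaks {
    // Hypothetical helper: modifies every line (here: upper-cases it) and
    // joins the results with "\n" so blank lines and line breaks survive.
    static String modifyKeepingLines(List<String> lines) {
        return lines.stream()
                .map(String::toUpperCase)
                .collect(Collectors.joining("\n"));
    }

    // Reads the file as UTF-8 (an assumption about the input) and applies the helper.
    static String readAndModify(Path file) throws IOException {
        return modifyKeepingLines(Files.readAllLines(file, StandardCharsets.UTF_8));
    }
}
```

Because readAllLines() strips the terminators, the separator passed to joining() decides what the output line breaks look like.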
The problem is that you (probably) want to read your file a line at a time and then write it back a line at a time, keeping empty lines.
The following source does that: it reads the input file one line at a time and writes it back one line at a time (keeping empty lines).
The only catch is that it may change the line terminator. You may read a Unix file and write a DOS file, or vice versa, depending on the system you are running on and the origin of the file you are reading.
Keeping the original line terminators can introduce a lot of complexity; read the BufferedReader and PrintWriter API docs for more information.
public void process(File input, File output) {
    // try-with-resources flushes and closes the reader and the writer
    try (BufferedReader reader = new BufferedReader(
             new InputStreamReader(new FileInputStream(input), "UTF-8"));
         PrintWriter writer = new PrintWriter(
             new OutputStreamWriter(new FileOutputStream(output), "UTF-8"))) {
        String line;
        while ((line = reader.readLine()) != null) {
            String processed = processLine(line);
            writer.println(processed); // appends the platform line separator
        }
    } catch (IOException e) {
        // Some exception management
    }
}
public String processLine(String line) {
    return line;
}
/n should be \n
if (ch == 10)
{
ch = '\n';
fileOut.print(ch);
}
Is that a typo?
ch = '/n';
otherwise use
ch = '\n';
I'm importing a file into my code and trying to print it. the file contains
i don't like cake.
pizza is good.
i don’t like "cookies" to.
17.
29.
The second "don’t" uses a right single quotation mark, and when I print it the output is
don�t
The question mark is printed as a blank square. Is there a way to convert it to a regular apostrophe?
EDIT:
public class Somethingsomething {
public static void main(String[] args) throws FileNotFoundException,
IOException {
ArrayList<String> list = new ArrayList<String>();
File file = new File("D:\\project1Test.txt");//D:\\project1Test.txt
if(file.exists()){//checks if file exist
FileInputStream fileStream = new FileInputStream(file);
InputStreamReader input = new InputStreamReader(fileStream);
BufferedReader reader = new BufferedReader(input);
String line;
while( (line = reader.readLine()) != null) {
list.add(line);
}
for(int i = 0; i < list.size(); i ++){
System.out.println(list.get(i));
}
}
}}
it should print as normal but the second "don't" has a white block on the apostrophe
this is the file I'm using https://www.mediafire.com/file/8rk7nwilpj7rn7s/project1Test.txt
edit: if it helps, the full document where the character is found is here:
https://www.nytimes.com/2018/03/25/business/economy/labor-professionals.html
It’s all about character encoding. The way characters are represented isn't always the same and they tend to get misinterpreted.
Characters are usually stored as numbers that depend on the encoding standard (and there are many of them). For plain ASCII characters most encodings agree: "a" is 97 (hex 61) in both ASCII and UTF-8. They differ for characters outside ASCII. The right single quotation mark (U+2019), for example, is the single byte 0x92 in windows-1252 but the three bytes E2 80 99 in UTF-8.
Now when you see funny characters such as the question mark in this case (called the replacement character, U+FFFD), it usually means that text in one encoding is being read as if it were in another, and the replacement character stands in for the unknown or misinterpreted character.
To fix your problem you need to tell your reader to read your file using a specific character encoding, say SOME-CHARSET.
Replace this:
InputStreamReader input = new InputStreamReader(fileStream);
with this:
InputStreamReader input = new InputStreamReader(fileStream, "SOME-CHARSET");
A list of charsets is available here. Unfortunately, you might want to go through them one by one. A short list of most common ones could be found here.
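As a rough sketch of the fix described above, the following reads a file with an explicit charset and then maps the right single quotation mark to a plain apostrophe. It assumes, purely for illustration, that the file was saved as windows-1252; the class and method names are hypothetical, and the actual encoding of your file may well be different.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ReadWithCharset {
    // Reads all lines using the given charset and replaces the typographic
    // apostrophe (U+2019) with the ASCII apostrophe.
    static List<String> readLines(Path file, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line.replace('\u2019', '\''));
            }
        }
        return lines;
    }
}
```

A call would look like ReadWithCharset.readLines(Paths.get("D:\\project1Test.txt"), Charset.forName("windows-1252")), with the charset name swapped for whichever one matches the file.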
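Your problem is almost certainly the encoding scheme you are using. You can read a file in almost any encoding scheme you want; just tell Java how your input was encoded. UTF-8 is common on Linux. The Windows native encoding depends on the locale: typically windows-1252 (Cp1252) on Western European systems and Cp1250 on Central European ones.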
This is the sort of problem you have all the time if you are processing files created on a different OS.
See here and Here
I'll give you a different approach...
Use the appropriate means for reading plain text files. Try this:
public static String getTxtContent(String path)
{
try(BufferedReader br = new BufferedReader(new FileReader(path)))
{
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null) {
sb.append(line);
sb.append(System.lineSeparator());
line = br.readLine();
}
return sb.toString();
}catch(IOException fex){ return null; }
}
I found this code from another question
private void updateLine(String toUpdate, String updated) throws IOException {
BufferedReader file = new BufferedReader(new FileReader(data));
String line;
String input = "";
while ((line = file.readLine()) != null)
input += line + "\n";
input = input.replace(toUpdate, updated);
FileOutputStream os = new FileOutputStream(data);
os.write(input.getBytes());
file.close();
os.close();
}
This is my file before I replace some lines
example1
example2
example3
But when I replace a line, the file now looks like this
example1example2example3
Which makes it impossible to read the file when there are a lot of lines in it.
How would I go about editing the code above so that my file keeps the line breaks it had at the start?
Use System.lineSeparator() instead of \n.
while ((line = file.readLine()) != null)
input += line + System.lineSeparator();
The issue is that on Unix systems, the line separator is \n while on Windows systems, it's \r\n.
In Java versions older than Java 7, you would have to use System.getProperty("line.separator") instead.
As pointed out in the comments, if you have concerns about memory usage, it would be wise to not store the entire output in a variable, but write it out line-by-line in the loop that you're using to process the input.
If you read and modify line by line, you have the advantage that you don't need to fit the whole file in memory. I'm not sure whether this is possible in your case, but streaming is generally a good thing to aim for. Here it would also remove the need to concatenate the string, and you don't need to choose a line terminator, because you can write each transformed line with println(). It does require writing to a different file, which is generally a good thing because it is crash safe: if you rewrite a file in place and the program is aborted, you lose data.
private void updateLine(String toUpdate, String updated) throws IOException {
    // try-with-resources closes both files even if an exception is thrown
    try (BufferedReader file = new BufferedReader(new FileReader(data));
         PrintWriter writer = new PrintWriter(new File(data + ".out"), "UTF-8")) {
        String line;
        while ((line = file.readLine()) != null) {
            line = line.replace(toUpdate, updated);
            writer.println(line); // writes the platform line separator
        }
        // PrintWriter swallows IOExceptions, so check for errors explicitly
        if (writer.checkError())
            throw new IOException("cannot write");
    }
}
In this case, it assumes that you need to do the replace only on complete lines, not multiple lines. I also added an explicit encoding and use a writer, as you have a string to output.
This is because you use OutputStream which is better for handling binary data. Try using PrintWriter and don't add any line terminator at the end of the lines. Example is here
I am writing a class that will read lines from a log file when it is updated.
I am using Apache VFS2 to get a method called when a file is updated. My main issue is that I don't want to read a line from the file if the line is not complete yet, as in it does not have a "\n" or "\r" line separator character at the end. I think I have looked at all the Java libraries I can for reading lines, but they all discard the EOF and line termination information, so I don't think I can use them.
Instead I am looking at reading it in byte by byte and then checking the result to then discard all stuff that comes after the last line separator. I was wondering what you folks thoughts on the best method for doing this is.
So for example:
2013-Jul-01_14:07:17.875 - Connection to Message Bus is reestablished<LF>
2013-Jul-01_14:07:17.875 - Connection to Message Bus is reestablished<LF>
2013-Jul-01_14:15:08.205 - No connection to Message Bus - reestablish before we can publish<LF>
2013-Jul-01_14:15:08.205 - NOT A REAL LINE PLEASE DONT READ
I want to read in the first 3 but not the fourth, as it doesn't have a line feed or carriage return character (<LF> or <CR>).
I have looked at Apache commons-io Tailer stuff but I cant tell if that will give me "incomplete" lines (and I realize I will have to ditch the VFS2 stuff to use it).
So pseudo-code:
private void ingestFileObject(FileObject file) {
BufferedInputStream bs = new BufferedInputStream(file.getContent().getInputStream());
StringBuilder result = new StringBuilder();
while (bs.available() > 0) {
result.append((char) bs.read());
}
bs.close();
String resultString = result.toString();
// determine what part of resultString comes after the last carriage return/line separator (using the regex [\\r\\n]+ ?)
// remove the offending part of the String.
}
Or any other solutions completely ignoring my pseudo-code are welcome at this point too...
Thanks
Would using Scanner help you?
Scanner scanner = new Scanner(file);
// hasNextLine() blocks until more input is available; note that it also returns true for a final line that has no terminator
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
//do processing.
}
This is what I ended up doing:
BufferedReader bufReader = new BufferedReader(new InputStreamReader(file.getContent().getInputStream()));
StringBuilder result = new StringBuilder();
int readInInt = -1;
String charsSinceLastLineSep = "";
if (bufReader.ready()) {
while (-1 != (readInInt = bufReader.read())) {
char readInChar = (char) readInInt;
// if new line reset line buffer, otherwise add to buffer
if (readInChar == '\n' || readInChar == '\r') {
charsSinceLastLineSep = "";
} else {
charsSinceLastLineSep += readInChar;
}
result.append(readInChar);
}
}
bufReader.close();
// remove all characters added since the last carriage return or newline was found, since
// they belong to a line that is not yet a complete log line
String resultString = result.substring(0, result.length() - charsSinceLastLineSep.length());
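The trimming step can also be done without tracking a second buffer, by looking for the last line separator in the text that was read. This is only a sketch under the same assumption as above (a line counts as complete once it ends in '\n' or '\r'); the class and method names are made up for illustration.

```java
public class CompleteLines {
    // Returns the prefix of text that ends with the last '\n' or '\r'.
    // Anything after that separator is treated as an incomplete line and dropped.
    static String keepCompleteLines(String text) {
        int lastSeparator = Math.max(text.lastIndexOf('\n'), text.lastIndexOf('\r'));
        // lastSeparator is -1 when there is no separator, which yields an empty string
        return text.substring(0, lastSeparator + 1);
    }
}
```

With this helper, the last step above would become keepCompleteLines(result.toString()).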
I am writing a small Java app which will scan a text file for any instances of a particular word. It needs a feature whereby it can report that an instance of the word was found to be, for example, the 14th word in the file, on the third line.
For this I tried to use the following code, which I thought would check whether the input contained a newline (\n) character and then increment a line variable that I created:
FileInputStream fileStream = new FileInputStream("src/file.txt");
DataInputStream dataStream = new DataInputStream(fileStream);
BufferedReader buffRead = new BufferedReader(new InputStreamReader(dataStream));
String strLine;
String Sysnewline = System.getProperty("line.separator");
CharSequence newLines = Sysnewline;
int lines = 1;
while ((strLine = buffRead.readLine()) != null)
{
if(strLine.contains(newLines))
{
System.out.println("Line Found");
lines++;
}
}
System.out.println("Total Number Of Lines In File: " + lines);
This does not work for me; it simply displays 0 at the end of this file. I know the data is being placed into strLine during the while loop, because if I change the code slightly to output the line, it successfully gets each line from the file.
Would anyone happen to know the reason why the above code does not work?
Read the javadocs for readLine.
Returns:
A String containing the contents of the line, not including any line-termination characters, or null if the end of the stream has been reached
readLine() strips newlines. Just increment every iteration of the loop. Also, you're overcomplicating your file reading code. Just do new BufferedReader(new FileReader("src/file.txt"))
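Putting that together, a minimal sketch of the word search might look like this. It counts one line per readLine() call and one word per whitespace-separated token; the class name, the method name and the format of the reported strings are all assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

public class WordLocator {
    // Returns one entry per occurrence of target, e.g. "word 14, line 3".
    // Word numbers run across the whole input; line numbers start at 1.
    static List<String> locate(Reader source, String target) throws IOException {
        List<String> hits = new ArrayList<>();
        int lineNumber = 0;
        int wordNumber = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lineNumber++; // readLine() already consumed the line terminator
                for (String word : line.trim().split("\\s+")) {
                    if (word.isEmpty()) {
                        continue; // blank line: split() yields one empty token
                    }
                    wordNumber++;
                    if (word.equals(target)) {
                        hits.add("word " + wordNumber + ", line " + lineNumber);
                    }
                }
            }
        }
        return hits;
    }
}
```

For a file, the call would be WordLocator.locate(new FileReader("src/file.txt"), "someWord"). Punctuation attached to a word is not stripped in this sketch.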
I'm having some problems with the FileReader class.
How do I specify an offset in the lines it goes through, and how do I tell it when to stop?
Let's say I want it to go through each line in a .txt file, but only lines 100-200 and then stop?
How would I do this? Right now I'm using ReadLine() but I don't think there's a way to specify offset with that.
Any fast help is VERY appreciated. Thanks.
You can't. FileReader only reads characters; wrapped in a BufferedReader it reads a line at a time, but neither can jump straight to a given line. You can of course write your own code that extends or wraps it to skip the unneeded lines.
An aside: be CAREFUL using FileReader or FileWriter. They use the platform default character set (UTF-8 by default since Java 18, locale-dependent before that). If you want to force a character set, use OutputStreamWriter or InputStreamReader. Example:
Writer w = new FileWriter(file);
can be replaced by
Writer w = new OutputStreamWriter(new FileOutputStream(file), "UTF-8"); // the second argument sets the character set
An alternative: If you have FIXED-WIDTH text, then look at RandomAccessFile which lets you seek to any position. This doesn't help you much unless you have fixed width text or an index to skip to a line. But it is handy :)
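To illustrate the fixed-width case, here is a small sketch. It assumes every record occupies exactly the same number of bytes, terminator included, and that the text is single-byte (ASCII-like), since RandomAccessFile.readLine() does not decode multi-byte encodings. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class FixedWidthSeek {
    // Jumps directly to record number zeroBasedIndex and reads it.
    // recordLength is the size of one record in bytes, including its line terminator.
    static String readRecord(String path, int recordLength, long zeroBasedIndex) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(zeroBasedIndex * recordLength);
            return raf.readLine(); // returns the text up to, not including, the terminator
        }
    }
}
```

Reading line 100 of such a file is then readRecord(path, recordLength, 99), with no need to read the first 99 lines.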
Read all the lines but use another variable to count which line you are on. Call continue if you are on a line that you don't want to process (say, before the 100th line) and break when you will not want to process any more lines (after the 200th line).
There is not a way to tell the reader to only read certain lines, you can just use a counter to do it.
try (BufferedReader in = new BufferedReader(new FileReader("infilename"))) {
    String str;
    int lineNumber = 0;
    while ((str = in.readLine()) != null) {
        lineNumber++;
        if (lineNumber < 100) {
            continue; // not yet at the first wanted line
        }
        if (lineNumber > 200) {
            break; // past the last wanted line, stop reading
        }
        System.out.println("Line " + lineNumber + ": " + str);
    }
} catch (IOException e) {
    e.printStackTrace(); // don't swallow I/O errors silently
}
BufferedReader in = new BufferedReader(new FileReader("foo.in"));
for(int i=0;i<100;i++,in.readLine()){}
String line101 = in.readLine();
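On Java 8 or later the same skip-then-read idea can be written with a stream. This is just a sketch: the class and method names are invented for the example, UTF-8 is an assumed encoding, and the earlier lines are still read and discarded behind the scenes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LineRange {
    // Returns lines first..last (1-based, inclusive) of the file.
    static List<String> readRange(Path file, int first, int last) throws IOException {
        // Files.lines() is lazy; try-with-resources closes the underlying file
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.skip(first - 1)
                        .limit(last - first + 1)
                        .collect(Collectors.toList());
        }
    }
}
```

For the question's case the call would be LineRange.readRange(Paths.get("foo.in"), 100, 200).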