Extending reader, how to return my "read"? - java

As part of my Java course, I wrote a "zip" writer and reader that work the way the Huffman algorithm does.
My class extends Reader and has a Reader object r.
In my main method, I have these lines:
input = new BufferedReader(new HuffmanReader(new FileReader("output.hff")));
String str = input.readLine();
It should return the decompressed string I wrote to the file, after decompressing it, of course. But it returns the first line of the file!
My read function:
public int read(char[] cbuf, int off, int len) throws IOException {
//...
r.read(buffer,0,8192)
//does the decompress process
String fnlStr = ... //The final result
cbuf = fnlStr.toCharArray();
//close streams
return cbuf.length;
}
My Debug window shows this:
HuffmanReader.read(char[], int, int) line: 23
BufferedReader.fill() line: not available
BufferedReader.readLine(boolean) line: not available
BufferedReader.readLine() line: not available
Run.main(String[]) line: 23
It calls my read function twice. How can I stop the BufferedReader from calling the read function again?

You don't return the data you read from the method like you would usually. Instead, when read is called, the caller gives you the array cbuf, which is essentially the address of a chunk of memory, and tells you to write len chars into it.
When you do cbuf = fnlStr.toCharArray(), you're just replacing your local copy of that address with another address, but you're not actually changing the memory you were supposed to write to. You need to either iterate over the array you are given in a for loop and write to it, or use System.arraycopy if you have constructed another buffer that contains the result.
E.g., the following read method will always read "Test\n":
public int read(char[] cbuf, int off, int len) throws IOException {
char[] result = "Test\n".toCharArray();
int numRead = Math.min(len, result.length);
System.arraycopy(result, 0, cbuf, off, numRead);
return numRead;
}
Replacing the "Test\n" literal with your decompressed string should get you started. Of course, you will still have to manage how much of your source you have already consumed.
And as to BufferedReader calling read twice: you shouldn't care how often it's called. Simply get the data from your underlying source, write it to cbuf and return the number of chars you have written. If there is nothing left to read, return -1 to signal the end of the stream (in which case BufferedReader will stop calling read).
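To make the bookkeeping concrete, here is a minimal sketch of a Reader that hands out an already-decoded string piece by piece and remembers how much it has delivered. The class name DecodedTextReader and the helper readLines are hypothetical, and the string passed to the constructor is only a stand-in for whatever your Huffman decoding produces.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for HuffmanReader: the decoded text is supplied up front
// instead of being produced by a real decompression step.
final class DecodedTextReader extends Reader {
    private final char[] decoded;
    private int pos = 0; // how many chars have already been handed to callers

    DecodedTextReader(String decodedText) {
        this.decoded = decodedText.toCharArray();
    }

    @Override
    public int read(char[] cbuf, int off, int len) {
        if (len == 0) {
            return 0;
        }
        if (pos >= decoded.length) {
            return -1; // end of stream: the caller stops asking for more
        }
        int n = Math.min(len, decoded.length - pos);
        System.arraycopy(decoded, pos, cbuf, off, n); // write into the caller's array
        pos += n;
        return n;
    }

    @Override
    public void close() {
        // nothing to release in this sketch
    }

    // Helper for the usage example: collects every line through a BufferedReader.
    static List<String> readLines(String decodedText) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new DecodedTextReader(decodedText))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

It does not matter how many times BufferedReader calls read here: each call continues from pos, and the call after the last chunk returns -1.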
As an aside, Reader is meant to read character streams, while InputStream is for binary data (it's basically the same thing, just with byte[] instead of char[] and without using a charset). Since compressed files are binary, you might want to switch your FileReader to a FileInputStream.
I could imagine weird bugs if, for some reason, the charset you encode with isn't the same you decode with. Or less dramatically, you might use more space than you think, if one 16-bit code unit in UTF-16 needs 3 8-bit code units in UTF-8.
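The charset concern can be illustrated with a small sketch: pushing arbitrary bytes through a charset decoder and back does not necessarily give you the same bytes. The class and method names below are hypothetical, and UTF-8 is just one example charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BinaryRoundTrip {
    // Decodes the bytes as UTF-8 text and encodes that text back to bytes.
    static byte[] throughUtf8(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xFF can never appear in valid UTF-8, so the decoder replaces it.
        byte[] compressed = {(byte) 0xFF, 0x41};
        byte[] roundTripped = throughUtf8(compressed);
        System.out.println("same bytes after round trip: " + Arrays.equals(compressed, roundTripped));
        System.out.println("length before/after: " + compressed.length + "/" + roundTripped.length);
    }
}
```

Reading the compressed file through an InputStream avoids this decoding step entirely.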

You are reading only the first line. Change the first part to something like:
input = new BufferedReader(new HuffmanReader(new FileReader("output.hff")));
ArrayList<String> list = new ArrayList<String>();
String line;
while ((line = input.readLine()) != null) {
list.add(line);
}
And also, to fix that your method is being called twice, make a boolean and set it to true after you have done your things in the method. Then in the beginning of that method, check if that boolean is true. If it is, return from the method so it won't be executing things after it again.

Related

QuestionMark instead of String in java

public class Example {
public static void main(String[] args) throws IOException {
byte[] bytes = new byte[100];
InputStreamReader fileInputStream = new InputStreamReader(new FileInputStream("/Users/deni/Desktop/Input.txt"));
while (fileInputStream.read() != -1) {
int i = 0;
bytes[i] = (byte) fileInputStream.read();
i++;
}
String string = new String(bytes, StandardCharsets.UTF_8);
System.out.println(string);
}
}
The file contains only a number, but when I run this method I get a question mark.
What is the source of this problem and how can I solve it? I am from Russia. I have read some other posts related to this problem, but they didn't help me. I tried running javac -J-Duser.language=en Example java.
Thank you.
You either want to use a Reader and read directly into char[]/String (the preferred method) or use an InputStream, read into a byte[] and transfer that into a String later on. You combine both, effectively double-decoding the input, which might accidentally work for pure ASCII text, but will mangle all other text.
You don't actually fill the byte[] array because you declare int i = 0 inside the loop meaning you'll only ever fill the byte[0] with a value.
You ignore every second read() result by calling read() twice in your loop and checking one return value for -1 and storing the other in bytes[]. Instead, you want to call read() once, assign it to a variable, check that variable for -1 and if it isn't use that variable to store in the target.
Files.readString(Path, Charset) does everything you're trying to do without having to implement it manually.
But in the interest of actually interacting with your code, this is a functioning method with as few changes as possible:
char[] chars = new char[100];
InputStreamReader reader = new InputStreamReader(new FileInputStream("/Users/deni/Desktop/Input.txt"), StandardCharsets.UTF_8);
int i = 0;
int c;
while ((c = reader.read()) != -1) {
chars[i] = (char) c;
i++;
}
String string = new String(chars, 0, i);
System.out.println(string);
This is still needlessly complicated and breaks if the input is more than 100 chars (shorter input is fine, because new String(chars, 0, i) only uses the chars that were actually read). But it should demonstrate the necessary changes.
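For comparison, here is a minimal sketch of the Files.readString route mentioned above. It assumes Java 11 or newer; the class name, the helper names and the temporary file (a stand-in for Input.txt) are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class ReadWholeFile {
    // Reads the whole file as UTF-8 text in one call (Java 11+).
    static String readUtf8(Path path) {
        try {
            return Files.readString(path, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes text to a temporary file (a stand-in for Input.txt) and reads it back.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("input", ".txt");
            try {
                Files.writeString(tmp, text, StandardCharsets.UTF_8);
                return readUtf8(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("12345"));
    }
}
```

There is no fixed-size array here, so the length of the input no longer matters.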

How to remove empty lines from a text file? Java

I have a text file that I want to read in, removing all non-alphabetic characters and white space, including the empty lines, and then converting the text to lowercase. This is what I have so far in terms of the code:
public static String replace(String file ){
String plaintext = "";
try{
Scanner input = new Scanner(new File(file));
while(input.hasNext()){
//text = text + input.nextLine();
plaintext = input.nextLine();
plaintext = plaintext.replaceAll("[^a-zA-Z]", "");
plaintext = plaintext.toLowerCase();
System.out.println(plaintext);
}
input.close();
}
catch(FileNotFoundException e){
System.out.println("File not found. ");
}
return "";
}//end of replace method
The only problem I'm facing is that I am not sure how to remove the blank lines in between each paragraph of the text file. My output shows like this:
csthesciencethatdealswiththetheoryandmethodsofprocessinginformationindigitalcomputersthedesignofcomputerhardwareandsoftwareandthe
applicationsofcomputers
itthedevelopmentimplementationandmaintenanceof
computerhardwareandsoftwaresystemstoorganizeandcommunicateinformation
electronicallyabbreviationit
computersaremanmadetoolsthataidusinsolvingotherproblemsabiologististryingtofigureouthowlifeworksphysicistsandchemistsaretryingtofigureouthowitemsreactinouruniversemathematiciansaretryingtofigureoutrulesformanmadesystems
anyresearchproblemthatmayimproveacomputerscapabilityofhelpingsolveaproblemoranyresearchproblemthatshedslightaboutanewwaytodosomethingwithacomputerispartofcs
mostexcitingresearchmedicalapplicationsexpertsystemsfordiagnosis
The code below should work; please note that it makes use of the JSR 203 API as far as file handling is concerned (since Java 7; in store for 10+ years for you to use) and of Java 8 for streams and their associated methods. Also note that it won't work with code points outside the BMP:
public static String trimFile(final String file)
throws IOException
{
final StringBuilder sb = new StringBuilder();
final Path path = Paths.get(file);
try (
final Reader r = Files.newBufferedReader(path);
) {
int c;
while ((c = r.read()) != -1)
if (Character.isLetter(c))
sb.appendCodePoint(c);
}
return sb.toString();
}
A little explanation here:
we don't need anything else than a Reader given the requirements; and even though Files.newBufferedReader() returns a BufferedReader, we don't care about reading line by line, therefore we downgrade it to a Reader, and we trust the JRE implementation to do its thing;
the read() method of a Reader returns an int... This is fine given the requirements (ie, we should not expect code points outside the BMP);
the Character.isLetter() implementation used is the one that takes an int as an argument, well, so what: as stated in the previous point, we don't expect code points outside the BMP, in which case this method behaves the same way as its counterpart expecting a char, so no harm done;
we do, however, have to use the appendCodePoint method of StringBuilder; this class's append method taking an int as an argument would append the string representation of the integer, which is not what we want.
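If code points outside the BMP did matter, one possible variation (a different technique from the reader loop above, not a drop-in replacement) is to iterate over the code points of text that is already in memory. This is only a sketch: the class and method names are hypothetical, and it also lowercases the letters, as the question asked.

```java
final class LetterFilter {
    // Keeps only letters, judged per Unicode code point, and lowercases them.
    static String lettersOnly(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints()
            .filter(Character::isLetter)    // the int overload, so surrogate pairs are handled
            .map(Character::toLowerCase)
            .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(lettersOnly("CS: the science\n\nthat deals with 2 things."));
    }
}
```

Because line breaks are not letters, the empty lines disappear along with everything else that is filtered out.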

Why is FileInputStream read() method wrongly reading question mark (ascii: 63) when put into infinite loop?

There are some similar questions on the site, but they all deal with different scenarios. So, I'm asking it here:
package Assign6B;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
public class FileOpsDemo {
public static void main(String[] args) throws IOException
{
FileInputStream inputFile = null;
FileOutputStream outputFile = null;
try
{
inputFile = new FileInputStream("s:/inputFile.txt");
outputFile = new FileOutputStream("s:/outputFile.txt");
char c;
while(( c = (char) inputFile.read()) != -1)
{
System.out.println((char)c);
outputFile.write(c);
}
System.out.println("File transfer complete!");
}
finally
{
if (inputFile != null)
inputFile.close();
if (outputFile != null)
outputFile.close();
}
}
}
This is my code. In the while loop condition, first I had set it to typecast the int output by read() into a char. The result was that it went into an infinite loop with all characters being converted to '?' (ascii: 63). Then I realized my mistake of char conversion and changed it.
But when I changed my while condition to "=-2" (without the char conversion) (this condition will never occur and hence put it into infinite loop). Here too, even if there is no char conversion the first (say, 10) valid characters of the file are still being converted into '?'. (After it reaches EOF, all invalid chars become '?' - I'm assuming this is given).
Why is this happening? At least the valid characters of the file should be read properly until it encounters the EOF and starts feeding on invalid chars!
Why is this happening?
The problem is in this line:
while(( c = (char) inputFile.read()) != -1)
You are doing the following:
Reading a byte from the file. This gives you an int which is either a byte in the range 0 to 255, or -1.
You are casting that value to a char. For the byte, that gives a char value in the range 0 to 255. For -1 the cast will give you '\uffff'.
You assign that value to c.
You then test the value against -1. This is where it goes wrong. In the case where read returned -1, you will now be evaluating this '\uffff' == -1. The LHS is converted to an int value ... 0x0000ffff ... and that is compared to 0xffffffff. They are different.
Then you print '\uffff' ... which is being converted to a '?' when output as a character in your default charset.
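The steps above can be reproduced without any file at all. The following is a small sketch with hypothetical class and method names; the int argument plays the role of the value returned by read().

```java
final class EofCastDemo {
    // Mirrors the broken loop condition: the int is narrowed to char first.
    static boolean eofDetectedAfterCharCast(int readResult) {
        char c = (char) readResult;
        return c == -1; // c is widened back to int as 0x0000FFFF, which is never -1
    }

    // Keeps the value as an int, exactly as read() returned it.
    static boolean eofDetectedAsInt(int readResult) {
        return readResult == -1;
    }

    public static void main(String[] args) {
        System.out.println("after char cast: " + eofDetectedAfterCharCast(-1));
        System.out.println("as int:          " + eofDetectedAsInt(-1));
        System.out.println("(char) -1 is U+" + Integer.toHexString((char) -1).toUpperCase());
    }
}
```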
There are two major mistakes in the code. First, the conversion int -> char -> int is not going to work; see above.
Second, and more important:
you should not be trying to use an InputStream (which is byte oriented) to read data as characters, and
you should not be trying to write character data to an OutputStream.
Depending on what you are actually trying to achieve here, you should either:
read and write bytes ... without a spurious "conversion" to char in the middle, OR
use a FileReader and FileWriter to do the conversions properly for the platform default characterset.
(There are some other points that could be made about buffering, choosing an alternate charset, etc, but this Answer is already getting too long.)
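As a sketch of the first option (reading and writing bytes with no char conversion in between), something along these lines should do. The class and method names are hypothetical, and the in-memory helper exists only to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class ByteCopy {
    // Copies everything from in to out in chunks, never converting bytes to chars.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n; // stays an int so that -1 remains recognizable as end of stream
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    // In-memory helper for the usage example; real code would pass file streams.
    static byte[] copyInMemory(byte[] source) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            copy(new ByteArrayInputStream(source), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

With files, copy would be called with a FileInputStream and a FileOutputStream opened in a try-with-resources block, which also replaces the manual finally clause.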
Just change this section of code - once you convert to a char, you cannot compare it to an integer successfully, so your while exit condition is never met.
int c;
while ((c = inputFile.read()) != -1) {
System.out.println((char) c);
outputFile.write(c);
}
Also, using the java.nio.file API (Files.readAllLines and Files.write, in this form available since Java 8) is much simpler:
public static void main(String[] args) throws IOException {
List<String> lines = Files.readAllLines(Paths.get("s:/inputFile.txt"));
Files.write(Paths.get("s:/outputFile.txt"), lines);
}
Typecasting the result of in.read() to char is bad style. Characters should only be read from a Reader - in your case you could use an InputStreamReader:
inputFile = new FileInputStream("s:/inputFile.txt");
outputFile = new FileOutputStream("s:/outputFile.txt");
Reader inputReader = new InputStreamReader(inputFile, StandardCharsets.UTF_8);
Writer outputWriter = new OutputStreamWriter(outputFile, StandardCharsets.UTF_8);
char[] cbuf = new char[4096];
int read;
while( (read = inputReader.read(cbuf)) >= 0)
{
System.out.println(new String(cbuf, 0, read));
outputWriter.write(cbuf, 0, read);
}
This example furthermore does not copy byte-by-byte (massive speed improvement) and it applies UTF-8 as charset.

Optimum time to perform an operation: within, or after loop

I am reading a file to parse later on. The file is not likely to exceed an MB in size, so this is perhaps not a crucial question for me at this stage. But for best practise reasons, I'd like to know when is the optimum time to perform an operation.
Example:
Using a method I've pasted from http://www.dzone.com/snippets/java-read-file-string, I am reading a buffer into a string. I would now like to remove all whitespace. My method is currently this:
private String listRaw;
public boolean readList(String filePath) throws java.io.IOException {
StringBuffer fileData = new StringBuffer(1024);
BufferedReader reader = new BufferedReader(
new FileReader(filePath));
char[] buf = new char[1024];
int numRead=0;
while((numRead=reader.read(buf)) != -1){
String readData = String.valueOf(buf, 0, numRead);
fileData.append(readData);
buf = new char[1024];
}
reader.close();
listRaw = fileData.toString().replaceAll("\\s","");
return true;
}
So, I remove all whitespace from the string at the time I store it - in its entirety - to a class variable.
To me, this means less processing but more memory usage. Would I be better off applying the replaceAll() operation on the readData variable as I append it to fileData for best practise reasons? Using more processing but avoiding passing superfluous whitespace around.
I imagine this has little impact for a small file like the one I am working on, but what if it's a 200MB log file?
Is it entirely case-dependent, or is there a consensus I'd do better to follow?
Thanks for the input everybody. I'm sure you've helped to aim my mindset in the right direction for writing Java.
I've updated my code to take into consideration the points raised. Including the suggestion by Don Roby that at some point, I may want to keep spaces. Hopefully things read better now!
private String listRaw;
public boolean readList(String filePath) throws java.io.IOException {
StringBuilder fileData = new StringBuilder(51200);
BufferedReader reader = new BufferedReader(new FileReader(filePath));
char[] buf = new char[51200];
boolean spaced = false;
int numRead;
while((numRead = reader.read(buf)) != -1){
// only the first numRead chars of buf hold fresh data
for(int i=0;i<numRead;i++) {
char c = buf[i];
if (c != '\t' && c != '\r' && c != '\n') {
if (c == ' ') {
if (spaced) {
continue;
}
spaced = true;
} else {
spaced = false;
}
fileData.append(c);
}
}
}
reader.close();
listRaw = fileData.toString().trim();
return true;
}
You'd better create and apply the regexp replacement only once, at the end. But you would gain much more by:
initializing the StringBuilder with a reasonable size
avoiding the creation of a String inside the loop, and appending the read characters directly to the StringBuilder
avoiding the instantiation of a new char buffer, for nothing, at each iteration.
To avoid an unnecessary long temporary String creation, you could read char by char, and only append the char to the StringBuilder if it's not a whitespace. In the end, the StringBuilder would contain only the good characters, and you wouldn't need any replaceAll() call.
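A minimal sketch of that char-by-char approach could look like the following. The names are hypothetical, and it uses Character.isWhitespace, which is somewhat broader than the \s class used by replaceAll, so treat the exact set of skipped characters as an assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class WhitespaceSkippingRead {
    // Appends only non-whitespace chars, so no replaceAll() is needed afterwards.
    // Pass a BufferedReader for files, otherwise single-char reads get slow.
    static String readWithoutWhitespace(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder(1024);
        int c;
        while ((c = reader.read()) != -1) {
            if (!Character.isWhitespace(c)) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    // In-memory helper for the usage example.
    static String strip(String text) {
        try (Reader r = new StringReader(text)) {
            return readWithoutWhitespace(r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(strip("item one\n\titem two\r\n"));
    }
}
```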
There are actually several very significant inefficiencies in this code, and you'd have to fix them before worrying about the relatively less important issue you've raised.
First, don't create a new buf object on each iteration of the loop -- use the same one! There's no problem with doing so -- the new data overwrites the old, and you save on object allocation (which is one of the more expensive operations you can do.)
Second, similarly, don't create a String to call append() -- use the form of append that takes a char array and an offset (0, in this case) and length (numRead, in this case.) Again, you create one less object per loop iteration.
Finally, to come to the question you actually asked: doing it in the loop would create a String object per iteration, but with the tuning we've just done, you're creating zero objects per iteration -- so removing the whitespace at the end of the loop is the clear winner!
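Put together, the tuned loop might look roughly like this sketch. The class and method names are hypothetical, and the buffer size of 1024 is simply carried over from the question.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class ReadThenStrip {
    // Reads everything using one reusable buffer and no per-iteration String.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder(1024);
        char[] buf = new char[1024]; // allocated once, overwritten on every read
        int numRead;
        while ((numRead = reader.read(buf)) != -1) {
            sb.append(buf, 0, numRead); // append(char[], offset, length)
        }
        return sb.toString();
    }

    // In-memory helper: whitespace is removed once, after the loop has finished.
    static String stripAfterReading(String text) {
        try (Reader r = new StringReader(text)) {
            return readAll(r).replaceAll("\\s", "");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(stripAfterReading("item one\n\titem two\r\n"));
    }
}
```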
Depending somewhat on the parse you're going to do, you may well be better off not removing the spaces in a separate step at all, and just ignore them during the parse.
It's also reasonably rare to want to remove all whitespace. Are you sure you don't want to just replace multiple spaces with single spaces?
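If collapsing is what you actually want, a one-line regular expression is usually enough. This is only a sketch with hypothetical names; trimming the ends is an extra assumption on my part.

```java
final class CollapseWhitespace {
    // Replaces every run of whitespace with a single space and trims the ends.
    static String collapse(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(collapse("item   one\n\n\titem  two "));
    }
}
```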

What is the quickest / most efficient way to append a char to a file loaded into memory?

read_data = new BufferedReader( new FileReader(args[0]) );
data_buffer = new StringBuffer();
int i;
while(read_data.ready())
{
while((i = read_data.read()) != -1)
{
data_buffer.append((char)i);
}
}
data_buffer.append(System.getProperty("line.separator"));
What I'm trying to do is, read an entire .txt file into a string and append a newline to the string. And then be able to process this string later on by creating a new Scanner by passing data_buffer.toString(). Obviously on really large files this process takes up a lot of time, and all I want to do is just append a newline to the .txt file I've read into memory.
I'm aware the whole idea seems a bit hacky or weird, but are there any quicker methods?
Cheers :)
The fastest way to do something is often to not do it at all.
Why don't you modify the parsing code in such way that the newline at the end is not required? If you are appending it each time, you could as well change the code to behave as if it were there while it really isn't.
The next thing I would try would be to avoid creating a huge String char by char, as this is indeed rather costly. You can create a Scanner based on an InputStream and it will probably be much faster than reading data into a String and parsing that. You can override your FileInputStream to return a virtual newline character at the end of the file, thus avoiding the instantiation of the pasted string.
And if you absolutely positively did have to read the data into a buffer, you would probably be better off by reading into a byte array using the array-based read() methods of the stream - much faster than byte by byte. Since you can know the size of the file in advance, you could allocate your buffer with space for the extra end-of-line marker and insert it into the array. In contrast to creating a StringBuffer and making a String out of it, this does not require a full copy of the buffer.
From what I can tell, what you are actually trying to do is to read a file in such a way that it always appears to have a line separator at the end of the last line.
If that is the case, then you could do this by implementing a subtype of FilterReader, and have it "insert" an extra character or two if required when it reaches the end of the character stream.
The code to do this won't be trivial, but it will avoid the time and space overhead of buffering the entire file in memory.
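A rough sketch of such a FilterReader subtype follows. The class name is hypothetical, it always appends '\n' rather than the platform line separator to keep the logic short, and it only covers the two read methods (skip, mark and reset are left as inherited), so treat it as a starting point rather than a finished implementation.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class TrailingNewlineReader extends FilterReader {
    private boolean lastWasNewline = false; // did the last char delivered end a line?
    private boolean finished = false;       // has the underlying reader hit its end?

    TrailingNewlineReader(Reader in) {
        super(in);
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (finished) {
            return -1;
        }
        int n = super.read(cbuf, off, len);
        if (n != -1) {
            if (n > 0) {
                lastWasNewline = cbuf[off + n - 1] == '\n';
            }
            return n;
        }
        finished = true;
        if (lastWasNewline) {
            return -1; // the stream already ends with a newline
        }
        cbuf[off] = '\n'; // insert the missing newline as one extra char
        return 1;
    }

    @Override
    public int read() throws IOException {
        char[] one = new char[1];
        return read(one, 0, 1) == -1 ? -1 : one[0];
    }

    // In-memory helper for the usage example: drains the reader into a String.
    static String readThrough(String text) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new TrailingNewlineReader(new StringReader(text))) {
            char[] buf = new char[16];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

Since Scanner accepts any Readable, an instance wrapping a FileReader could be passed straight to the Scanner constructor without buffering the file in a String first.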
If all you're doing is passing the resulting file in to a Scanner, you should create a Readable for the file and send that to Scanner.
Here's an example (untested):
public class NLReader implements Readable {
Reader r;
boolean atEndOfReader = false;
boolean atEnd = false;
public NLReader(Reader r) {
this.r = r;
}
public int read(CharBuffer cb) throws IOException {
if (!atEndOfReader) {
int result = r.read(cb);
if (result == -1) {
atEndOfReader = true;
} else {
return result;
}
}
if (!atEnd) {
String nl = System.getProperty("line.separator");
cb.append(nl);
atEnd = true;
return nl.length();
}
return -1;
}
}
This only reads the file once, and never copies it (unlike your StringBuffer -- and you should be using StringBuilder instead unless you really need the synchronization of StringBuffer).
This also doesn't load the actual file into memory, which reduces memory pressure as well.
