I'm trying to write a mini FTP application that reads binary data from a file and sends it to a client. The program usually sends the file, but not completely (e.g. I send a text file and the content arrives blank). I think it may be because I use a FileReader to read lines, although I do not quite understand why this would be a problem.
Here is the relevant code:
File file = new File(rootDirectory, name);
int filenum = (int)file.length();
long filelen = file.length();
System.out.println("File is: " + filenum + " bytes long");
socketOut.writeLong(filelen);
fileIn = new BufferedReader(new FileReader(file));
System.out.println("Sending: " + name);
while((line = fileIn.readLine()) != null){
socketOut.writeBytes(line);
socketOut.flush();
}
The problem is that Readers/Writers handle text (as opposed to Input-/OutputStreams). FileReader internally uses the default operating system encoding to convert bytes into characters. That conversion will never do for binary files. Also note that readLine discards the line ending (\r\n, \r or \n), so writeBytes(line) never sends it. As of Java 7 you can do
Files.copy(file.toPath(), socketOut);
instead of the while loop.
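To put that together with the length prefix from the question, here is a minimal sketch. The class and method names (BinaryFileSender, send) are made up for illustration, and it assumes the caller owns the socket stream and closes it later:

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original code.
final class BinaryFileSender {
    /** Writes an 8-byte length header followed by the raw file bytes. */
    static void send(Path file, OutputStream out) throws IOException {
        DataOutputStream socketOut = new DataOutputStream(out);
        socketOut.writeLong(Files.size(file)); // same length prefix as in the question
        Files.copy(file, socketOut);           // byte for byte, no charset decoding
        socketOut.flush();                     // flush once, after the whole file
    }
}
```

Because no Reader is involved, bytes such as 0x00, 0xFF or line endings should reach the client unchanged.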
Joop's solution is perfect for Java 7 (or later). If you are stuck on an older version (or want to extend your tool arsenal anyway), have a look at the following free libraries:
Apache Commons IO (actually all Apache Commons libraries are worth a look): there you can do IOUtils.copy(...)
Google Guava: a little more complicated, but flexible. Use ByteSource.copyTo(ByteSink)
I like the caching in the Google libraries, pretty neat
If you don't have Java 7 and don't want to add external libraries, the canonical copy loop in Java for streams is as follows:
while ((count = in.read(buffer)) > 0)
{
out.write(buffer, 0, count);
}
where count is an int, and buffer is a byte[] of any non-zero size. It doesn't have to be anywhere near the size of the file. I usually use 8192.
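Wrapped in a method, the loop above might look like the sketch below. The class name StreamCopy and the returned byte count are my own additions for illustration:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical utility wrapping the copy loop shown above.
final class StreamCopy {
    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // any non-zero size works
        long total = 0;
        int count;
        while ((count = in.read(buffer)) > 0) {
            // Write only the bytes actually read, not the whole buffer.
            out.write(buffer, 0, count);
            total += count;
        }
        return total;
    }
}
```

The method closes neither stream, so the caller decides when the socket or file is finished.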
Related
I am writing a CSV file with the help of CSVWriter (Java), but when I run the code on a Unix box with a large number of records (around 9000) it creates an empty file.
When I run the same code locally (Eclipse) on Windows, it works fine for the same large file. Why?
I noticed one thing: if there are around 3000 records, it works fine on the Unix box too.
The issue occurs only with large files.
I also tried the writer.writeNext() method instead of writeAll(), but the same issue occurs on the Unix box. :(
Note: the file does not contain any special characters; it is in English.
Code:
CSVReader reader = new CSVReader(new FileReader(inputFile), ',','"');
List<String[]> csvBody = reader.readAll();
int listSize = csvBody.size();
if(listSize > 0){
String renameFileNamePath = outputFolder + "//"+ existingFileName.replaceFirst("file1", "file2");
File newFile = new File(renameFileNamePath);
CSVWriter writer = new CSVWriter(new FileWriter(newFile), ',');
for(int row=1 ; row < listSize; row++){
String timeKeyOrTransactionDate = null;
timeKeyOrTransactionDate = year+"-"+month+"-"+day+" 00:00:00";
csvBody.get(row)[0] = timeKeyOrTransactionDate ;
}
//Write to CSV file which is open
writer.writeAll(csvBody);
writer.flush();
writer.close();
}
reader.close();
The readAll and writeAll methods should only be used with small datasets - otherwise avoid it like the plague. Use the readNext and writeNext methods instead so you don't have to read the entire file into memory.
Note the readNext will return null once you have no more data (end of Stream or end of file). I will have to update the javadocs to mention that.
Disclaimer - I am the maintainer of the opencsv project. So please take the "avoid like plague" seriously. Really that was only put there because most files are usually small and can fit in memory but when in doubt of how big your dataset will be avoid putting it all in memory.
A data error. The Linux machine probably uses the UTF-8 Unicode encoding. This can throw an error on the first malformed UTF-8 byte sequence encountered, which the single-byte Windows encoding simply accepts.
You are using the old utility class FileReader (there is also the equally flawed FileWriter), which uses the default platform encoding; that makes the software platform dependent.
You need to do:
Charset charset = Charset.forName("Windows-1252"); // Windows Latin-1
For reading
BufferedReader br = Files.newBufferedReader(inputFile.toPath(), charset);
For writing
Path newFile = Paths.get(renameFileNamePath);
BufferedWriter bw = Files.newBufferedWriter(newFile, charset);
CSVWriter writer = new CSVWriter(bw, ',');
The above assumes a single byte encoding, but probably will work for most other single byte encodings too.
A pity that the file is not in UTF-8, allowing any script.
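As a small sketch of the difference an explicit charset makes: the class name CharsetDemo and the sample bytes mentioned below are invented for illustration. The decoder behind Files.newBufferedReader is strict and reports malformed input as an exception, whereas FileReader silently substitutes a replacement character.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of the original code.
final class CharsetDemo {
    /** Reads the first line of a file using an explicitly chosen charset. */
    static String firstLine(Path file, Charset charset) throws IOException {
        // The charset is passed in, so the result no longer depends on the
        // platform default (file.encoding).
        try (BufferedReader br = Files.newBufferedReader(file, charset)) {
            return br.readLine();
        }
    }
}
```

For a file containing the Windows-1252 byte 0xE9 (é) followed by a comma, calling firstLine with Charset.forName("windows-1252") decodes it, while StandardCharsets.UTF_8 fails with a MalformedInputException.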
The issue has been resolved. The output directory was also shared with a loader application, and the loader checks for files every minute; it picked up the CSV file before it had been written and loaded it into the DB at zero KB.
So I used a BufferedWriter instead of a FileWriter, wrote the data to a tmp file first and then renamed it to file2, and it worked.
Thanks to all of you for your help and valuable suggestions.
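For anyone hitting the same race, a rough sketch of the write-to-temp-then-rename idea using java.nio.file. The class and method names are made up, UTF-8 is an arbitrary choice, and it assumes the loader ignores files ending in .tmp:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical helper, not the poster's actual code.
final class TempThenRename {
    /** Writes all lines to a temp file, then renames it to the final name. */
    static void writeLines(Path target, List<String> lines) throws IOException {
        // Same directory as the target, so the rename stays on one file system.
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "csv", ".tmp");
        try (BufferedWriter bw = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                bw.write(line);
                bw.newLine();
            }
        }
        // The final name appears only once the content is complete. ATOMIC_MOVE
        // throws AtomicMoveNotSupportedException if the file system cannot do it.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

The watcher then sees either no file2 at all or a fully written one, never a half-written file.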
We have an issue unzipping bz2 files in Java, whereby the input stream thinks it's finished after reading ~3% of the file.
We would welcome any suggestions for how to decompress and read large bz2 files which have to be processed line by line.
Here are the details of what we have done so far:
For example, a bz2 file is 2.09 GB in size and uncompressed it is 24.9 GB
The code below only reads 343,800 lines of the actual ~10 million lines the file contains.
Modifying the code to decompress the bz2 into a text file (FileInputStream straight into the CompressorInputStream) results in a file of ~190 MB - irrespective of the size of the bz2 file.
I have tried setting a buffer value of 2048 bytes, but this has no effect on the outcome.
We have executed the code on Windows 64 bit and Linux/CentOS both with the same outcome.
Could the buffered reader come to an empty, "null" line and cause the code to exit the while-loop?
import org.apache.commons.compress.compressors.*;
import java.io.*;
...
CompressorInputStream is = new CompressorStreamFactory()
.createCompressorInputStream(
new BufferedInputStream(
new FileInputStream(filePath)));
lineNumber = 0;
line = "";
br = new BufferedReader(
new InputStreamReader(is));
while ((line = br.readLine()) != null) {
this.processLine(line, ++lineNumber);
}
Even this code, which forces an exception when the end of the stream is reached, has exactly the same result:
byte[] buffer = new byte[1024];
int len = 1;
while (len == 1) {
out.write(buffer, 0, is.read(buffer));
out.flush();
}
There is nothing obviously wrong with your code; it should work. This means the problem must be elsewhere.
Try to enable logging (i.e. print the lines as you process them). Make sure there are no gaps in the input (maybe write the lines to a new file and do a diff). Use bzip2 --test to make sure the input file isn't buggy. Check whether it always fails for the same line (maybe the input contains odd characters or binary data?)
The issue lies with the bz2 files: they were created using a version of Hadoop which includes bad block headers inside the files.
Current Java solutions stumble over this, while others ignore it or handle it somehow.
Will look for a solution/workaround.
I've been Googling up some answers, and I can't seem to find the best one.
Here's what I have so far for reading internal files on Android:
fis = openFileInput("MY_FILE");
StringBuilder fileContent = new StringBuilder("");
byte[] buffer = new byte[fis.available()];
while (fis.read(buffer) != -1) {
fileContent.append(new String(buffer));
}
MYVARIABLE = fileContent.toString();
fis.close();
It used to leave a lot of whitespace, but I just used the .available method to only return what I need.
Is there a faster or shorter way to write this? I can't seem to find any good ones in the API guide.
1). API for available() says it should not be used for the purposes you need:
Note that this method provides such a weak guarantee that it is not very useful in practice.
Meaning it may not give you the file size.
2). When you read something into RAM, take into account that the file can be large, so try to avoid spending extra RAM. For this, a relatively small (1~8 KB) buffer is used to read from the source and then append to the result. On the other hand, using buffers that are too small (say, several bytes) slows down reading significantly.
3). Reading bytes differs from reading characters, because a single character may be represented by more than one byte (depending on the encoding). To read chars, use the specific classes that are aware of the encoding and know how to convert bytes to chars properly. InputStreamReader, for instance, is one such class.
4). The encoding used for reading should be the encoding that was used for persisting the data.
Taking all of the above into account, I would use something like this:
public static String getStringFromStream(InputStream in, String encoding)
throws IOException {
InputStreamReader reader;
if (encoding == null) {
// This constructor sets the character converter to the encoding
// specified in the "file.encoding" property and falls back
// to ISO 8859_1 (ISO-Latin-1) if the property doesn't exist.
reader = new InputStreamReader(in);
} else {
reader = new InputStreamReader(in, encoding);
}
StringBuilder sb = new StringBuilder();
final char[] buf = new char[1024];
int len;
while ((len = reader.read(buf)) > 0) {
sb.append(buf, 0, len);
}
return sb.toString();
}
5). Make sure to always close an InputStream when done working with it.
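One way to guarantee the close is try-with-resources. This is only a sketch: it assumes Java 7 language features are available on your Android build, and the class name StreamText is made up:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical variant of the method above that also closes the stream.
final class StreamText {
    /** Decodes the whole stream with the given charset, then closes it. */
    static String readAll(InputStream in, Charset charset) throws IOException {
        // Closing the reader also closes the wrapped InputStream,
        // even if read() throws.
        try (Reader reader = new InputStreamReader(in, charset)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int len;
            while ((len = reader.read(buf)) != -1) {
                sb.append(buf, 0, len); // append only the chars actually read
            }
            return sb.toString();
        }
    }
}
```

With the code from the question this would be called as StreamText.readAll(openFileInput("MY_FILE"), StandardCharsets.UTF_8), assuming the file was saved as UTF-8.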
Sure, there is more than one way to read text from a file in Java/Android. This is mostly because the Java API contains several generations of IO APIs. For instance, classes from the java.nio package were created to be more efficient; however, there is usually no strong reason to use them (don't fall into the sin of premature optimization).
I am downloading databases from the network, which are between 100 Kbytes and 500 Kbytes large. Here is my code (removed useless code):
URLConnection uConnection = downloadUrl.openConnection();
InputStream iS = uConnection.getInputStream();
BufferedInputStream bIS = new BufferedInputStream(iS);
byte[] buffer = new byte[1024];
FileOutputStream fOS = new FileOutputStream(db);
int bufferLength = 0;
while ((bufferLength = bIS.read(buffer)) > 0) {
fOS.write(buffer, 0, bufferLength);
}
fOS.close();
My problem is that it takes a long time for the while loop to finish. Have I messed up the code somewhere? It shouldn't take that long for such small files, should it? I'm talking about 1 minute for three files not larger than 1 MB altogether... Thanks in advance!
"Slow" is really rather ambiguous. That being said, considering what you're trying to do you shouldn't be using a BufferedInputStream and your buffer is way too small.
The buffered wrappers are for optimizing small reads/writes. Since all you're doing is trying to read a ton of data as fast as you can, you should just read directly from the InputStream, and use a large buffer (Say, 64k since the underlying native code is probably going to chunk at that size anyway).
byte[] buffer = new byte[65536];
...
while ((bufferLength = iS.read(buffer, 0, buffer.length)) > 0) {
...
I've found the real solution in JDK 1.7: it is reliable, fast and simple, and will almost certainly draw a veil over the older java.io solutions. Although the web is still full of examples that copy files in Java using In/Out streams, I warmly suggest everyone use a simple method: java.nio.file.Files.copy(Path origin, Path destination), with optional parameters for replacing the destination and copying file attributes; the companion method Files.move can even attempt an atomic move (if permitted by the underlying OS). That's a really good job, and long awaited! You can easily convert code from copy(File file1, File file2) by appending ".toPath()" to each File instance (e.g. file1.toPath(), file2.toPath()). Note also that the boolean method Files.isSameFile(file1.toPath(), file2.toPath()) is already used inside the copy method above, but you can call it yourself wherever you need it. If you can't upgrade to 1.7, using community libraries from Apache (commons-io) or Google (Guava) is still recommended.
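A minimal sketch of that conversion. The wrapper class FileCopy is hypothetical, and whether to replace the target and copy attributes is a choice made here for illustration:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical wrapper for code that still works with java.io.File.
final class FileCopy {
    /** Copies source over target, replacing it if it already exists. */
    static void copy(File source, File target) throws IOException {
        Files.copy(source.toPath(), target.toPath(),
                StandardCopyOption.REPLACE_EXISTING,  // overwrite an existing target
                StandardCopyOption.COPY_ATTRIBUTES);  // keep e.g. last-modified time
    }
}
```

Without REPLACE_EXISTING the call throws FileAlreadyExistsException when the target is already there.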
This problem seems to happen inconsistently. We are using a java applet to download a file from our site, which we store temporarily on the client's machine.
Here is the code that we are using to save the file:
URL targetUrl = new URL(urlForFile);
InputStream content = (InputStream)targetUrl.getContent();
BufferedInputStream buffered = new BufferedInputStream(content);
File savedFile = File.createTempFile("temp",".dat");
FileOutputStream fos = new FileOutputStream(savedFile);
int letter;
while((letter = buffered.read()) != -1)
fos.write(letter);
fos.close();
Later, I try to access that file by using:
ObjectInputStream keyInStream = new ObjectInputStream(new FileInputStream(savedFile));
Most of the time it works without a problem, but every once in a while we get the error:
java.io.StreamCorruptedException: invalid stream header: 0D0A0D0A
which makes me believe that it isn't saving the file correctly.
I'm guessing that the operations you've done with getContent and BufferedInputStream have treated the file like an ASCII file, converting newlines or carriage returns into carriage return + newline (0x0D0A), which has confused ObjectInputStream (which expects serialized data objects).
If you are using an FTP URL, the transfer may be occurring in ASCII mode.
Try appending ";type=I" to the end of your URL.
Why are you using ObjectInputStream to read it?
As per the javadoc:
An ObjectInputStream deserializes primitive data and objects previously written using an ObjectOutputStream.
Probably the error comes from the fact you didn't write it with ObjectOutputStream.
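To illustrate what the "invalid stream header" message is checking, here is a small sketch with made-up class and method names. ObjectOutputStream starts every stream with the magic bytes AC ED followed by a version, and the ObjectInputStream constructor rejects anything else:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.StreamCorruptedException;

// Hypothetical demo class, for illustration only.
final class StreamHeaderDemo {
    /** Serializes a value the way ObjectInputStream expects to read it. */
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    /** Returns false if the data does not start with the serialization header. */
    static boolean hasValidHeader(byte[] data) throws IOException {
        // The constructor reads and verifies the header immediately.
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return true;
        } catch (StreamCorruptedException e) {
            return false;
        }
    }
}
```

A file that begins with 0D 0A 0D 0A fails this check, which suggests that whatever was saved did not start with data written by ObjectOutputStream.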
Try reading it with FileInputStream only.
Here's a sample for binary ( although not the most efficient way )
Here's another used for text files.
There are 3 big problems in your sample code:
You're not just treating the input as bytes
You're needlessly pulling the entire object into memory at once
You're doing multiple method calls for every single byte read and written -- use the array based read/write!
Here's a redo:
URL targetUrl = new URL(urlForFile);
// URL has no getInputStream(); openStream() returns the raw byte stream
InputStream is = targetUrl.openStream();
File savedFile = File.createTempFile("temp",".dat");
FileOutputStream fos = new FileOutputStream(savedFile);
int count;
byte[] buff = new byte[16 * 1024];
// read(byte[]) returns the number of bytes read, or -1 at end of stream
while((count = is.read(buff)) != -1) {
    fos.write(buff, 0, count);
}
fos.close();
is.close();
You could also step back from the code and check to see if the file on your client is the same as the file on the server. If you get both files on an XP machine, you should be able to use the FC utility to do a compare (check FC's help if you need to run this as a binary compare as there is a switch for that). If you're on Unix, I don't know the file compare program, but I'm sure there's something.
If the files are identical, then you're looking at a problem with the code that reads the file.
If the files are not identical, focus on the code that writes your file.
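If no compare tool is handy, the same check can be sketched in Java. The class name FileCompare is made up, and the sketch reads both files fully into memory, which should be fine for small downloads like these:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper for a byte-for-byte comparison of two small files.
final class FileCompare {
    /** Returns true if both files have exactly the same bytes. */
    static boolean sameContent(Path a, Path b) throws IOException {
        if (Files.size(a) != Files.size(b)) {
            return false; // different lengths can never match
        }
        return Arrays.equals(Files.readAllBytes(a), Files.readAllBytes(b));
    }
}
```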
Good luck!