Java: Open and read a file using InputStreamReader

I'm trying to read a binary file (PDF, DOC, ZIP) using InputStreamReader. I achieved that using FileInputStream, saving the contents of the file into a byte array, but I've been asked to do it using InputStreamReader. So I'm trying to open and read a PDF file, for example, using:
File file = new File(inputFileName);
Reader in = new InputStreamReader(new FileInputStream(file));
char[] fileContent = new char[(int) file.length()];
in.read(fileContent);
in.close();
and then save that content to another PDF file using:
File outfile = new File(outputFile);
Writer out = new OutputStreamWriter(new FileOutputStream(outfile));
out.write(fileContent);
out.close();
Everything goes fine (no exceptions or errors), but when I try to open the new file, it either says it's corrupted or has the wrong encoding.
Any suggestions?
PS1: I specifically need to do this using InputStreamReader.
PS2: It works fine when reading/writing .txt files.

String, char, Reader, and Writer are for text in Java. This text is Unicode, and hence text in all scripts can be combined.
byte[], InputStream, and OutputStream are for binary data. If the data represents text, it must be associated with some encoding.
The bridge between text and binary data always involves a conversion.
In your case:
Reader in = new InputStreamReader(new FileInputStream(file), encoding);
Reader in = new InputStreamReader(new FileInputStream(file)); // Platform's encoding
The second version is non-portable, as other machines may use a different default encoding.
In your case, do not use an InputStreamReader for binary data. The conversion can only corrupt the data.
Maybe what they meant was: do not read everything into a single byte array. In that case, use a BufferedInputStream and read small byte arrays (a buffer) repeatedly, as in the sketch below.
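A minimal sketch of that buffered, bytes-only copy (the file names are placeholders; java.io imports assumed):
// Copy binary data in 8 KB chunks; no Reader/Writer means no charset conversion.
try (InputStream in = new BufferedInputStream(new FileInputStream(inputFileName));
     OutputStream out = new BufferedOutputStream(new FileOutputStream(outputFile))) {
    byte[] buffer = new byte[8192];
    int n;
    while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);
    }
}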

Do not use the Reader/Writer API. Use binary streams instead:
File inFile = new File("...");
File outFile = new File("...");
try (FileChannel in = new FileInputStream(inFile).getChannel();
     FileChannel out = new FileOutputStream(outFile).getChannel()) {
    in.transferTo(0, inFile.length(), out);
}
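On Java 7 and later, the same binary copy can also be done in one call with java.nio.file.Files (a minimal sketch):
Files.copy(inFile.toPath(), outFile.toPath(), StandardCopyOption.REPLACE_EXISTING);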

Related

Special characters in UTF-8 text file

I have an input file that comes in ANSI Unix file format, and I convert that file into UTF-8.
Before converting to UTF-8, there is a special character like this in the input file:
»
After converting to UTF-8, it becomes this:
û
When I process my file as is, without converting to UTF-8, all the special characters disappear and data is lost as well.
But when I process the file after converting it to UTF-8, all the data appears in the output file, with the same special characters I get after the UTF-8 conversion.
ANSI to UTF-8 (could be wrong, please correct me if I'm wrong somewhere):
FileInputStream fis = new FileInputStream("inputtextfile.txt");
InputStreamReader isr = new InputStreamReader(fis, "ISO-8859-1");
Reader in = new BufferedReader(isr);
FileOutputStream fos = new FileOutputStream("outputfile.txt");
OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8");
Writer out = new BufferedWriter(osw);
int ch;
out.write("\uFEFF"); // UTF-8 BOM
while ((ch = in.read()) > -1) {
    out.write(ch);
}
out.close();
in.close();
After this I am processing my file further for the final output.
I'm using the Talend ETL tool (a Java-based ETL tool) for creating the final output out of the generated UTF-8 file.
What I want is to process my file so that I get the same special characters in the output as in the input file.
I'm using Java 1.8 for this whole process. I'm stuck in this situation and have never dealt with special characters before.
Any suggestion would be helpful.

When do I need to specify the encoding while writing the file to the disk?

I have a sample method which copies one file to another using InputStream and OutputStream. In this case, the source file is encoded in UTF-8. Even if I don't specify the encoding while writing to disk, the destination file has the correct encoding. But if I have to write a java.lang.String to a file, I need to specify the encoding. Why is that?
public static void copyFile() {
    String sourceFilePath = "C://my_encoded.txt";
    InputStream inStream = null;
    OutputStream outStream = null;
    try {
        String targetFilePath = "C://my_target.txt";
        File sourceFile = new File(sourceFilePath);
        outStream = new FileOutputStream(targetFilePath);
        inStream = new FileInputStream(sourceFile);
        byte[] buffer = new byte[1024];
        int length;
        // copy the file content in bytes
        while ((length = inStream.read(buffer)) > 0) {
            outStream.write(buffer, 0, length);
        }
        inStream.close();
        outStream.close();
        System.out.println("File " + targetFilePath + " copied successfully!");
    } catch (IOException e) {
        e.printStackTrace();
    }
}
My guess is that since the source file has the correct encoding and since we read and write one byte at a time, it works fine. And java.lang.String is UTF-16 by default, and if we write it to the file, it reads one byte at a time instead of 2 bytes and hence garbage values. Is that correct, or am I completely wrong in my understanding?
You are copying the file byte by byte, so you don't need to care about character encoding.
As a rule of thumb:
Use the various InputStream and OutputStream implementations for byte-wise processing (like file copy).
There are some convenience methods to handle text directly like PrintStream.println(). Be careful because most of them use the default platform specific encoding.
Use the various Reader and Writer implementations for reading and writing text.
If you need to convert between byte-wise and text processing use InputStreamReader and OutputStreamWriter with explicit file encoding.
Do not rely on the default encoding. The default character encoding is platform specific (e.g. Windows-ANSI aka Cp1252 for Windows, usually UTF-8 on Linux).
Example: If you need to read a UTF-8 text file:
BufferedReader reader =
new BufferedReader(new InputStreamReader(new FileInputStream(inFile), "UTF-8"));
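The matching writer side, if you need to write UTF-8 text, could look like this (a minimal sketch; outFile is a placeholder):
Writer writer =
    new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outFile), "UTF-8"));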
Avoid using a FileReader, because a FileReader always uses the platform default encoding (the same goes for FileWriter).
A special case: If you need random access to a file you should use RandomAccessFile. With it you can read and write data blocks at arbitrary positions. You can read and write raw byte blocks or you can use convenience methods to read and write text. But you should read the documentation carefully. E.g. the methods readUTF() and writeUTF() use a modified UTF-8 encoding.
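For illustration, a minimal RandomAccessFile sketch (the file name and offset are arbitrary, and the file is assumed to already contain data at that offset):
try (RandomAccessFile raf = new RandomAccessFile("data.bin", "rw")) {
    raf.seek(16);            // jump to byte offset 16
    int b = raf.read();      // read one raw byte there
    raf.seek(16);
    raf.write(b ^ 0xFF);     // overwrite that byte in place
}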
InputStream, OutputStream, Reader, Writer and RandomAccessFile form the basic IO functionality, enough for most use cases. For advanced IO (e.g. memory mapped files, ...) have a look at package java.nio.
Just read your code! (For the copy part at least ;-) )
When you copy between the two files, you copy byte by byte, so there is no conversion to a String.
When you write a String into a file, you need to convert it (sometimes indirectly) into an array of bytes (byte[]). That is where you need to specify your encoding.
When you read a file to get a String, you need to know its encoding in order to do it properly. Java doesn't 'skip' any byte, but you need to make a conversion once again: from a byte[] to a String. A short illustration follows.
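A minimal sketch of that two-way conversion (the sample text is arbitrary; java.nio.charset.StandardCharsets assumed):
// Same text, different byte counts depending on the encoding chosen.
byte[] utf8 = "héllo".getBytes(StandardCharsets.UTF_8);        // 6 bytes: 'é' takes 2
byte[] latin1 = "héllo".getBytes(StandardCharsets.ISO_8859_1); // 5 bytes: 'é' takes 1
String back = new String(utf8, StandardCharsets.UTF_8);        // "héllo" again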

Java: write a byte to a file

I am writing a file compressor in Java that produces a binary file. The problem is the following: how can I write a byte to a new file so that the total size is only 1 byte? I am doing this:
FileOutputStream saveFile = new FileOutputStream("SaveObj3.sav");
// Create an ObjectOutputStream to put objects into save file.
ObjectOutputStream save = new ObjectOutputStream(saveFile);
save.writeByte(0);
save.close();
saveFile.close();
That should write only a single byte to the file, but when I look at the size, it occupies 7 bytes. Does anyone know how I can write only one byte? Is there a better way?
Don't use ObjectOutputStream. Use the FileOutputStream directly:
FileOutputStream out = new FileOutputStream("SaveObj3.sav");
out.write(0);
out.close();
As JB Nizet noticed, the documentation of the ObjectOutputStream constructor states that this object also
writes the serialization stream header to the underlying stream
which explains the additional bytes: the header alone is the 4-byte magic/version sequence 0xAC 0xED 0x00 0x05, and writeByte wraps the value in block-data framing (marker and length), for 7 bytes in total.
To prevent this behaviour you can just use other streams like FileOutputStream or DataOutputStream:
FileOutputStream saveFile = new FileOutputStream("c:/SaveObj3.sav");
DataOutputStream save = new DataOutputStream(saveFile);
save.writeByte(0);
save.close();
You can use the Files class provided by Java 7. It's easier than you might expect.
It can be performed in one line:
byte[] bytes = "message output to be written in file".getBytes(StandardCharsets.UTF_8);
Files.write(Paths.get("outputpath.txt"), bytes);
If you have a File object, you can just replace:
Paths.get("outputpath.txt")
with:
yourOutputFile.toPath()
To write only one byte, as you want, you can do the following:
Files.write(Paths.get("outputpath.txt"), new byte[1]);
In the file properties:
size: 1 byte

Java: read a UTF-8 encoded file, character by character

I have a file saved as UTF-8 (saved by my application, in fact). How do you read it character by character?
File file = new File(folder+name);
FileInputStream fis = new FileInputStream(file);
BufferedInputStream bis = new BufferedInputStream(fis);
DataInputStream dis = new DataInputStream(bis);
The two options seem to be:
char c = (char) dis.readByte();
char c = dis.readChar();
The first option works as long as you only have ASCII characters stored, i.e. English.
The second option reads the first and second bytes of the file as one character.
The original file is being written as follows:
File file = File.createTempFile("file", "txt");
FileWriter fstream = new FileWriter(file);
BufferedWriter out = new BufferedWriter(fstream);
You don't want a DataInputStream; that's for reading raw bytes. Use an InputStreamReader, which lets you specify the encoding of the input (UTF-8 in your case).
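A minimal sketch of that approach (file being the File from the question):
Reader reader = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), "UTF-8"));
int c;
while ((c = reader.read()) != -1) {
    // process (char) c, one decoded character at a time
}
reader.close();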
You should be aware that in the Java world you use streams to process bytes, and readers/writers to process characters. These two are not the same, and you should choose the right one to handle what you have.
Have a look at http://java.sun.com/docs/books/tutorial/i18n/text/stream.html to see how to work with characters in a byte-oriented world.
The Sun Java Tutorial is a highly recommended learning resource.
Use a Reader (e.g. BufferedReader):
Reader reader = new BufferedReader(new FileReader(file));
char c = (char) reader.read();
You can read individual bytes, and when you hit a byte that is less than 128 (i.e. the high bit is 0), it is a complete single-byte character; bytes with the high bit set belong to multi-byte sequences.
I'm no Java expert, but I would assume that there are better ways. Maybe some way of telling the reader what encoding it is in...
Edit: see dmazzoni's answer.

Corrupt file when using Java to download file

This problem seems to happen inconsistently. We are using a Java applet to download a file from our site, which we store temporarily on the client's machine.
Here is the code that we are using to save the file:
URL targetUrl = new URL(urlForFile);
InputStream content = (InputStream)targetUrl.getContent();
BufferedInputStream buffered = new BufferedInputStream(content);
File savedFile = File.createTempFile("temp",".dat");
FileOutputStream fos = new FileOutputStream(savedFile);
int letter;
while((letter = buffered.read()) != -1)
fos.write(letter);
fos.close();
Later, I try to access that file by using:
ObjectInputStream keyInStream = new ObjectInputStream(new FileInputStream(savedFile));
Most of the time it works without a problem, but every once in a while we get the error:
java.io.StreamCorruptedException: invalid stream header: 0D0A0D0A
which makes me believe that it isn't saving the file correctly.
I'm guessing that the operations you've done with getContent and BufferedInputStream have treated the file like an ASCII file, converting newlines or carriage returns into carriage return + newline (0x0D0A), which has confused ObjectInputStream (which expects serialized data objects).
If you are using an FTP URL, the transfer may be occurring in ASCII mode.
Try appending ";type=I" to the end of your URL.
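For example (hypothetical host and path): ftp://example.com/pub/file.dat;type=I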
Why are you using ObjectInputStream to read it?
As per the Javadoc:
An ObjectInputStream deserializes primitive data and objects previously written using an ObjectOutputStream.
The error probably comes from the fact that you didn't write the file with an ObjectOutputStream.
Try reading it with a FileInputStream only.
Here's a sample for binary (although not the most efficient way).
Here's another, used for text files.
There are 3 big problems in your sample code:
You're not treating the input as just bytes
You're needlessly pulling the entire object into memory at once
You're doing multiple method calls for every single byte read and written -- use the array-based read/write!
Here's a redo:
URL targetUrl = new URL(urlForFile);
InputStream is = targetUrl.openStream();
File savedFile = File.createTempFile("temp",".dat");
FileOutputStream fos = new FileOutputStream(savedFile);
int count;
byte[] buff = new byte[16 * 1024];
while((count = is.read(buff)) != -1) {
fos.write(buff, 0, count);
}
fos.close();
is.close();
You could also step back from the code and check to see if the file on your client is the same as the file on the server. If you get both files on an XP machine, you should be able to use the FC utility to do a compare (check FC's help if you need to run this as a binary compare as there is a switch for that). If you're on Unix, I don't know the file compare program, but I'm sure there's something.
If the files are identical, then you're looking at a problem with the code that reads the file.
If the files are not identical, focus on the code that writes your file.
Good luck!
