Writing a byte array to a UTF-8-encoded file - Java

Given a byte array in UTF-8 encoding (the result of Base64-decoding a String), what is the correct way to write it to a file in UTF-8 encoding?
Is the following source code (writing the array byte by byte) correct?
OutputStreamWriter osw = new OutputStreamWriter(
new FileOutputStream(tmpFile), Charset.forName("UTF-8"));
for (byte b: buffer)
osw.write(b);
osw.close();

Don't use a Writer. Just use the OutputStream. A complete solution using try-with-resources looks as follows:
try (FileOutputStream fos = new FileOutputStream(tmpFile)) {
fos.write(buffer);
}
Or even better, as Jon points out below:
Files.write(Paths.get(tmpFile), buffer);
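Putting the question's whole flow together, a minimal sketch, assuming the input is a Base64 String and tmpFile is a String path (as the Files.write call above implies); the values below are only placeholders:
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

// Placeholder inputs standing in for the question's data.
String base64String = "SGVsbG8sIHdvcmxkIQ==";  // "Hello, world!" encoded as Base64
String tmpFile = "out.txt";

// The decoded bytes already are the UTF-8 representation of the text,
// so they are written verbatim; no Writer and no re-encoding are involved.
byte[] buffer = Base64.getDecoder().decode(base64String);
Files.write(Paths.get(tmpFile), buffer);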

Related

Java OutputStreamWriter to ByteArrayInputStream

I am writing a CSV file in a very old Java application, so I cannot use the new Java 8 streams.
Writer writer = new OutputStreamWriter(new FileOutputStream("file.csv"));
writer.append("data,");
writer.append("data,");
...
Then I need to transform the writer object into a ByteArrayInputStream.
How can I do it?
Thanks in advance.
Best regards.
This depends on what you are trying to do.
If you are writing a bunch of data to the file and THEN reading the file, you will want to use a FileInputStream in place of your ByteArrayInputStream.
If you want to write a bunch of data to a byte array, then you should take a look at using a ByteArrayOutputStream. If you then need to read the byte array as a ByteArrayInputStream, you can pass the ByteArrayOutputStream's bytes into the input stream as shown below. Keep in mind this only works for writing and THEN reading; you cannot use this like a buffer.
//Create output stream
ByteArrayOutputStream out = new ByteArrayOutputStream();
//Create Writer
Writer writer = new OutputStreamWriter(out);
//Write stuff
...
//Close writer
writer.close();
//Create input stream using the byte array from out as input.
ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray());
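Applied to the CSV case from the question, a minimal sketch with placeholder rows; an explicit charset is assumed to be UTF-8 (use whatever the consumer of the CSV expects), and the string charset name keeps it compatible with old Java versions:
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;

ByteArrayOutputStream out = new ByteArrayOutputStream();
Writer writer = new OutputStreamWriter(out, "UTF-8");
writer.append("header1,header2\n");
writer.append("data,data\n");
writer.close(); // flushes the encoder, so out now holds all the bytes

// Only after closing the writer is the byte array complete and safe to wrap.
ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray());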
Short answer: you can't.
A ByteArrayInputStream is just not assignable from an OutputStreamWriter.
Since what you probably want is the written data, you can just read the file back into a byte[] and then construct a ByteArrayInputStream from it:
File file = new File("S:\\Test.java");
byte[] content = new byte[(int) file.length()];
try (FileInputStream fis = new FileInputStream(file)) {
    fis.read(content, 0, content.length); // assumes one read fills the array for a local file
}
ByteArrayInputStream bais = new ByteArrayInputStream(content);
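If the application is on Java 7 or newer, a shorter alternative (a sketch of the same idea, just letting NIO do the reading and the closing):
import java.io.ByteArrayInputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

byte[] content = Files.readAllBytes(Paths.get("S:\\Test.java"));
ByteArrayInputStream bais = new ByteArrayInputStream(content);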

What can replace fileInputStream.available()?

When learning Java IO, I found that FileInputStream has an available() method, which can be equal to the file size when reading local files. So if you can directly find out the size of the file, is it still necessary to decorate the stream with a BufferedInputStream when you need to read the entire file?
like this:
FileInputStream fileInputStream=new FileInputStream("F:\\test.txt");
byte[] data=new byte[fileInputStream.available()];
if (fileInputStream.read(data)!=-1) {
System.out.println(new String(data));
}
or
BufferedReader bufferedReader=new BufferedReader(new
FileReader("F:\\test.txt"));
StringBuilder stringBuilder=new StringBuilder();
for (String line;(line=bufferedReader.readLine())!=null;){
stringBuilder.append(line);
}
System.out.println(stringBuilder.toString());
or
BufferedInputStream bufferedInputStream=new BufferedInputStream(new FileInputStream("F:\\test.txt"));
byte[] data=new byte[bufferedInputStream.available()];
if (bufferedInputStream.read(data)!=-1) {
System.out.println(new String(data));
}
What are the pros and cons of these methods? Which one is better?
Thanks.
You are wrong about the meaning of available(). It returns an estimate of the number of bytes you can read without blocking. From the documentation:
Note that while some implementations of InputStream will return the total number of bytes in the stream, many will not. It is never correct to use the return value of this method to allocate a buffer intended to hold all data in this stream.
So, if you want to convert a stream to a byte array, you should use a utility library such as Commons IO's IOUtils:
byte[] out = IOUtils.toByteArray(stream);
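If adding a library is not an option, here is a minimal plain-JDK sketch of the same thing: read fixed-size chunks until end of stream, never trusting available() for sizing. (On Java 9 and later, InputStream.readAllBytes() does this for you.)
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.InputStream;

InputStream stream = new FileInputStream("F:\\test.txt");
ByteArrayOutputStream out = new ByteArrayOutputStream();
byte[] chunk = new byte[8192];
int n;
// read() returns -1 only at end of stream; available() cannot tell you that.
while ((n = stream.read(chunk)) != -1) {
    out.write(chunk, 0, n);
}
stream.close();
byte[] data = out.toByteArray();
System.out.println(new String(data, "UTF-8"));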

Extract tar.gz file in memory in Java

I'm using the Apache Compress library to read a .tar.gz file, something like this:
final TarArchiveInputStream tarIn = initializeTarArchiveStream(this.archiveFile);
try {
TarArchiveEntry tarEntry = tarIn.getNextTarEntry();
while (tarEntry != null) {
byte[] btoRead = new byte[1024];
BufferedOutputStream bout = new BufferedOutputStream(new FileOutputStream(destPath)); //<- I don't want this!
int len = 0;
while ((len = tarIn.read(btoRead)) != -1) {
bout.write(btoRead, 0, len);
}
bout.close();
tarEntry = tarIn.getNextTarEntry();
}
tarIn.close();
}
catch (IOException e) {
e.printStackTrace();
}
Is it possible not to extract this into a separate file, and read it in memory somehow? Maybe into a giant String or something?
You could replace the file stream with a ByteArrayOutputStream.
i.e. replace this:
BufferedOutputStream bout = new BufferedOutputStream(new FileOutputStream(destPath)); //<- I don't want this!
with this:
ByteArrayOutputStream bout = new ByteArrayOutputStream();
and then after closing bout, use bout.toByteArray() to get the bytes.
Is it possible not to extract this into a separate file, and read it in memory somehow? Maybe into a giant String or something?
Yeah, sure.
Just replace the code in the inner loop that is opening files and writing to them with code that writes to a ByteArrayOutputStream ... or a series of such streams.
The natural representation of the data that you read from the TAR (like that) will be bytes / byte arrays. If the bytes are properly encoded characters, and you know the correct encoding, then you can convert them to strings. Otherwise, it is better to leave the data as bytes. (If you attempt to convert non-text data to strings, or if you convert using the wrong charset/encoding, you are liable to mangle it ... irreversibly.)
Obviously, you are going to need to think through some of these issues yourself, but the basic idea should work ... provided you have enough heap space.
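As a concrete sketch of that idea (reusing initializeTarArchiveStream and this.archiveFile from the question, and assuming the whole archive fits in memory): collect each entry into its own byte array, keyed by entry name.
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;

Map<String, byte[]> entries = new HashMap<String, byte[]>();
try (TarArchiveInputStream tarIn = initializeTarArchiveStream(this.archiveFile)) {
    TarArchiveEntry tarEntry;
    while ((tarEntry = tarIn.getNextTarEntry()) != null) {
        if (tarEntry.isDirectory()) {
            continue; // nothing to buffer for directory entries
        }
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        byte[] btoRead = new byte[1024];
        int len;
        while ((len = tarIn.read(btoRead)) != -1) {
            bout.write(btoRead, 0, len);
        }
        entries.put(tarEntry.getName(), bout.toByteArray());
    }
}
// Only if an entry is known to be text in a known encoding:
// String text = new String(entries.get("some/entry.txt"), "UTF-8");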
Copy the value of btoRead to a String, e.g.
String s = new String(btoRead, 0, len);
and keep appending to the string until the end of the file is reached.

Java Charset InputStreamReader, File Channel Differences

I'm trying to read a (Japanese) file that is encoded as a UTF-16 file.
When I read it using an InputStreamReader with a charset of "UTF-16", the file is read correctly:
try {
InputStreamReader read = new InputStreamReader(new FileInputStream("JapanTest.txt"), "UTF-16");
BufferedReader in = new BufferedReader(read);
String str;
while((str=in.readLine())!=null){
System.out.println(str);
}
in.close();
}catch (Exception e){
System.out.println(e);
}
However, when I use file channels and read from a byte array, the Strings aren't always converted correctly:
File f = new File("JapanTest.txt");
fis = new FileInputStream(f);
channel = fis.getChannel();
MappedByteBuffer buffer = channel.map( FileChannel.MapMode.READ_ONLY, 0L, channel.size());
buffer.position(0);
int get = Math.min(buffer.remaining(), 1024);
byte[] barray = new byte[1024];
buffer.get(barray, 0, get);
Charset charSet = Charset.forName("UTF-16");
//endOfLinePos is a calculated value and defines the number of bytes to read
rowString = new String(barray, 0, endOfLinePos, charSet);
System.out.println(rowString);
The problem I've found is that I can only read characters correctly if the MappedByteBuffer is at position 0. If I increment the position of the MappedByteBuffer and then read a number of bytes into a byte array, which is then converted to a string using the charset UTF-16, then the bytes are not converted correctly. I haven't faced this issue if a file is encoded in UTF-8, so is this only an issue with UTF-16?
More Details:
I need to be able to read any line from the file channel, so to do this I build a list of line ending byte positions and then use those positions to be able to get the bytes for any given line and then convert them to a string.
The code unit of UTF-16 is 2 bytes, not a single byte as in UTF-8. The byte patterns and the single-byte code unit length make UTF-8 self-synchronizing: a reader can start at any point, and if it lands on a continuation byte it can either backtrack or lose at most a single character.
With UTF-16 you must always work with pairs of bytes: you cannot start or stop reading at an odd byte offset. You also must know the endianness, and use either UTF-16LE or UTF-16BE when not reading from the start of the file, because there will be no BOM.
You can also encode the file as UTF-8.
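For illustration, a minimal sketch of decoding one line under those constraints; decodeLine is a hypothetical helper, lineStart/lineEnd are the precomputed even byte offsets of a line, and UTF_16LE is an assumption about the file's actual byte order (swap in UTF_16BE if needed):
import java.nio.MappedByteBuffer;
import java.nio.charset.StandardCharsets;

// Both offsets must be even, because every UTF-16 code unit is two bytes.
static String decodeLine(MappedByteBuffer buffer, int lineStart, int lineEnd) {
    byte[] line = new byte[lineEnd - lineStart];
    buffer.position(lineStart);
    buffer.get(line);
    // Plain "UTF-16" looks for a BOM on every call (and defaults to big-endian),
    // which is why mid-file reads came out wrong; name the byte order explicitly.
    return new String(line, StandardCharsets.UTF_16LE);
}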
Possibly, the InputStreamReader does some transformations the normal new String(...) does not. As a work-around (and to verify this assumption) you could try to wrap the data read from the channel like new InputStreamReader( new ByteArrayInputStream( barray ) ).
Edit: Forget that :) - Channels.newReader() would be the way to go.
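For reference, a minimal sketch of the Channels.newReader() route; it decodes the channel sequentially from its current position, so it does not replace the random-access line index, but it shows the charset being applied at the channel boundary:
import java.io.BufferedReader;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;

try (FileChannel channel = FileChannel.open(Paths.get("JapanTest.txt"));
     BufferedReader in = new BufferedReader(Channels.newReader(channel, "UTF-16"))) {
    String str;
    while ((str = in.readLine()) != null) {
        System.out.println(str);
    }
}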

PrintWriter and byte[] problem

byte[] data = (byte[])opBinding.execute();
PrintWriter out = new PrintWriter(outputStream);
out.println(data);
out.flush();
out.close();
but instead of text I get #84654. How can I write a byte[] with a PrintWriter? I need the byte[] and not a String because I have encoding problems with čćžšđ.
You can use the OutputStream directly to write the bytes:
outputStream.write(data);
PrintWriter is meant for text data, not binary data.
It sounds like you should quite possibly be converting your byte[] to a String, and then writing that string out - assuming the PrintWriter you're writing to uses an encoding which supports the characters you're interested in.
You'll also need to know the encoding that the original text data has been encoded in for the byte[], in order to successfully convert to text to start with.
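For example, a minimal sketch reusing opBinding and outputStream from the question, and assuming the bytes are UTF-8 text (substitute whatever charset the data was actually encoded with, on both sides):
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

byte[] data = (byte[]) opBinding.execute();
// Decode with the charset the bytes were produced in...
String text = new String(data, StandardCharsets.UTF_8);
// ...and encode the output with an explicit charset so čćžšđ survive.
PrintWriter out = new PrintWriter(new OutputStreamWriter(outputStream, StandardCharsets.UTF_8));
out.println(text);
out.flush();
out.close();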
The problem is that your code implicitly calls data.toString() before passing the result to your println statement.
try this
byte[] data = (byte[])opBinding.execute();
PrintWriter out = new PrintWriter(outputStream);
out.println(new String(data));
out.flush();
out.close();
It worked for me when I used
PrintWriter out = new PrintWriter(System.out);
Also, println converts the byte array to a String using its toString() method, so that may be the reason for your encoding problem.
