Decompress Gzip file and store in variable - java

I have searched a lot for converting bytes to a string, but my question is a little different, so please read ahead.
Currently I have a gzip file which I can decompress using the code from http://www.mkyong.com/java/how-to-decompress-file-from-gzip-file/.
This code stores my decompressed output in a file, but how do I store it in a variable? I am using this code currently:
public String unGunzipFile(String compressedFile, String decompressedFile) {
    byte[] buffer = new byte[1024];
    try {
        FileInputStream fileIn = new FileInputStream(compressedFile);
        GZIPInputStream gZIPInputStream = new GZIPInputStream(fileIn);
        FileOutputStream fileOutputStream = new FileOutputStream(decompressedFile);
        StringBuffer str = new StringBuffer();
        int bytes_read;
        while ((bytes_read = gZIPInputStream.read(buffer)) > 0) {
            String s = new String(buffer);
            str.append(s);
            fileOutputStream.write(buffer, 0, bytes_read);
        }
        gZIPInputStream.close();
        fileOutputStream.close();
        System.out.println("The file was decompressed successfully!");
        System.out.println(str);
        String final_string = str.toString();
        return final_string;
    } catch (IOException ex) {
        ex.printStackTrace();
        return null;
    }
}
Since I convert the whole buffer to a string, near the end, when bytes_read is less than 1024, I end up getting some stale leftover data in my StringBuffer. The file has no such data, since fileOutputStream.write(buffer, 0, bytes_read); limits the write to the bytes actually read.
How do I fix this?
Thanks in advance.

Use the String(byte[] bytes, int offset, int length) constructor, which lets you specify how many bytes to convert, i.e.
String s = new String(buffer, 0, bytes_read);
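As a minimal sketch, here is the loop from the question with only that line changed (note this still decodes with the platform default charset and can split multi-byte characters across buffer boundaries, so pass an explicit Charset if you know the file's encoding):
while ((bytes_read = gZIPInputStream.read(buffer)) > 0) {
    str.append(new String(buffer, 0, bytes_read));
    fileOutputStream.write(buffer, 0, bytes_read);
}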

Instead of using String s = new String(buffer), I suggest using the constructor
public String(byte[] bytes, int offset, int length)
which might help you.

Related

How can I decompress a stream in c# like this java snippet code?

I'm trying to convert this Java snippet to C#, but I'm a bit confused about it.
This is the Java code:
My attempt is the following, but there are some errors in gis.Read, because it wants a char* and not a byte[], and in the String constructor for the same reason.
public static String decompress(InputStream input) throws IOException
{
    final int BUFFER_SIZE = 32;
    GZIPInputStream gis = new GZIPInputStream(input, BUFFER_SIZE);
    StringBuilder string = new StringBuilder();
    byte[] data = new byte[BUFFER_SIZE];
    int bytesRead;
    while ((bytesRead = gis.read(data)) != -1) {
        string.append(new String(data, 0, bytesRead));
    }
    gis.close();
    // is.close();
    return string.toString();
}
I expected to get a readable string.
You need to transform the bytes to characters first. For that, you need to know the encoding.
In your code, you could have replaced new String(data, 0, bytesRead) with Encoding.UTF8.GetString(data, 0, bytesRead) to do that. However, I would handle this slightly differently.
StreamReader is a useful class to read bytes as text in C#. Just wrap it around your GZipStream and let it do its magic.
public static string Decompress(Stream input)
{
    // Note this buffer size is REALLY small.
    // You could stick with the default buffer size of the StreamReader (1024).
    const int BUFFER_SIZE = 32;
    string result = null;
    using (var gis = new GZipStream(input, CompressionMode.Decompress, leaveOpen: true))
    using (var reader = new StreamReader(gis, Encoding.UTF8, true, BUFFER_SIZE))
    {
        result = reader.ReadToEnd();
    }
    return result;
}

Corrupted file text while reading

I have the following code:
BlobDomain blobDomain = null;
OutputStream out = null;
try {
    blobDomain = new BlobDomain();
    out = blobDomain.getBinaryOutputStream();
    byte[] buffer = new byte[8192];
    int bytesRead = 0;
    while ((bytesRead = in.read(buffer, 0, 8192)) != -1) {
        out.write(buffer, 0, bytesRead);
        String line = (new String(buffer));
        fullText += line;
    }
} catch (Exception e) {
    // do nothing
} finally {
    if (out != null) {
        try {
            out.close();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
}
When I print the fullText, what I see for larger files is that the end part of the text is added again, so the full text has some lines repeated at the end. Any suggestions on what is wrong here?
The reason that you are getting this is that you are writing the entire buffer every time to your String. Thus, when you reach the end of the file you may not have read exactly the amount of bytes that your buffer is sized at. The old data is still in the buffer and will also be written to your String.
One option to solve this may be to read your data into a String first and then write the String to the output stream. This should also be faster than appending to a String after each read.
Save inputStream to String:
java.util.Scanner s = new java.util.Scanner(in).useDelimiter("\\A");
fullText = s.hasNext() ? s.next() : "";
Write String to output stream:
out.write(fullText.getBytes());
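Putting the two fragments together, a minimal sketch of the replacement loop might look like this (assuming the blob contains UTF-8 text; in, out, and fullText are as in the question):
// Read the entire input stream into a String first (assumes UTF-8 text)
java.util.Scanner s = new java.util.Scanner(in, "UTF-8").useDelimiter("\\A");
String fullText = s.hasNext() ? s.next() : "";
// Then write the String back out to the blob's output stream
out.write(fullText.getBytes("UTF-8"));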
If you want to keep your code as-is, then construct the String from only the bytes actually read. For example:
String line = new String(buffer, 0, bytesRead);

How to use ByteStream to read 1Mb of a file into a string

What I have now is using FileInputStream
int length = 1024*1024;
FileInputStream fs = new FileInputStream(new File("foo"));
fs.skip(offset);
byte[] buf = new byte[length];
int bufferSize = fs.read(buf, 0, length);
String s = new String(buf, 0, bufferSize);
I'm wondering how I can achieve the same result using ByteStreams in the Guava library.
Thanks a lot!
Here's how you could do it with Guava:
byte[] bytes = Files.asByteSource(new File("foo"))
.slice(offset, length)
.read();
String s = new String(bytes, Charsets.US_ASCII);
There are a couple of problems with your code (though it may work fine for files, it won't necessarily work for other types of stream):
fs.skip(offset);
This doesn't necessarily skip all offset bytes. You have to either check the number of bytes it actually skipped via the return value, looping until you've skipped the full amount, or use something that does that for you, such as ByteStreams.skipFully.
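As an illustrative sketch, a manual skip loop might look like this (ByteStreams.skipFully does the equivalent for you, including the case where skip() returns 0 without reaching the end of the stream):
long remaining = offset;
while (remaining > 0) {
    long skipped = fs.skip(remaining);
    if (skipped == 0) {
        // skip() can return 0 without being at EOF; probe with a single-byte read
        if (fs.read() == -1) {
            throw new EOFException("stream ended before skipping " + offset + " bytes");
        }
        remaining--;
    } else {
        remaining -= skipped;
    }
}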
int bufferSize = fs.read(buf, 0, length);
Again, this won't necessarily read all length bytes, and the number of bytes it does read can be an arbitrary amount--you can't rely on it in general.
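Again as a sketch, you would loop until the buffer is full or the stream ends (Guava's ByteStreams.readFully does this for you, throwing EOFException if the stream is too short); the US_ASCII charset here just follows the answer's earlier example:
int total = 0;
while (total < length) {
    int n = fs.read(buf, total, length - total);
    if (n == -1) {
        break; // the stream ended before 'length' bytes were read
    }
    total += n;
}
String s = new String(buf, 0, total, Charsets.US_ASCII);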
String s = new String(buf, 0, bufferSize);
This implicitly uses the system default Charset, which usually isn't a good idea--and when you do want it, it's best to make it explicit with Charset.defaultCharset().
Also note that in general, a certain number of bytes may not translate to a legal sequence of characters depending on the Charset being used (i.e. if it's ASCII you're fine, if it's Unicode, not so much).
Why try to use Guava when it's not necessary? In this case, it looks like what you're looking for is exactly a RandomAccessFile.
File file = new File("foo");
long offset = ... ;
try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
    byte[] buffer = new byte[1024 * 1024];
    raf.seek(offset);
    raf.readFully(buffer);
    return new String(buffer, Charset.defaultCharset());
}
I'm not aware of a more elegant solution:
public static void main(String[] args) throws IOException {
    final int offset = 20;
    StringBuilder to = new StringBuilder();
    CharStreams.copy(CharStreams.newReaderSupplier(new InputSupplier<InputStream>() {
        @Override
        public InputStream getInput() throws IOException {
            FileInputStream fs = new FileInputStream(new File("pom.xml"));
            ByteStreams.skipFully(fs, offset);
            return fs;
        }
    }, Charset.defaultCharset()), to);
    System.out.println(to);
}
The only advantage is that you can save some GC time when the content is really big, since it is appended straight to the StringBuilder without going through an intermediate String.

Java: Outputting text file to Console

I'm attempting to output a text file to the console with Java. I was wondering what is the most efficient way of doing so?
I've researched several methods however, it's difficult to discern which is the least performance impacted solution.
Outputting a text file to the console would involve reading in each line in the file, then writing it to the console.
Is it better to use:
A BufferedReader with a FileReader, reading in lines and doing a bunch of System.out.println calls?
BufferedReader in = new BufferedReader(new FileReader("C:\\logs\\"));
while (in.readLine() != null) {
    System.out.println(blah blah blah);
}
in.close();
A Scanner reading each line in the file and doing System.out.println calls?
while (scanner.hasNextLine()) {
    System.out.println(blah blah blah);
}
Thanks.
If all you want to do is print the contents of a file (and don't want to print the next int/double/etc.) to the console then a BufferedReader is fine.
Your code as it is won't produce the result you're after, though. Try this instead:
BufferedReader in = new BufferedReader(new FileReader("C:\\logs\\log001.txt"));
String line = in.readLine();
while (line != null) {
    System.out.println(line);
    line = in.readLine();
}
in.close();
I wouldn't get too hung up about it, though because it's more likely that the main bottleneck will be the ability of your console to print the information that Java is sending it.
If you're not interested in the character-based data the text file contains, just stream it "raw" as bytes.
InputStream input = new BufferedInputStream(new FileInputStream("C:/logs.txt"));
byte[] buffer = new byte[8192];
try {
    for (int length = 0; (length = input.read(buffer)) != -1;) {
        System.out.write(buffer, 0, length);
    }
} finally {
    input.close();
}
This saves the cost of unnecessarily massaging between bytes and characters and also scanning and splitting on newlines and appending them once again.
As to the performance, you may find this article interesting. According to the article, a FileChannel with a 256K byte array, read through a wrapped ByteBuffer and written directly from the byte array, is the fastest way.
FileInputStream input = new FileInputStream("C:/logs.txt");
FileChannel channel = input.getChannel();
byte[] buffer = new byte[256 * 1024];
ByteBuffer byteBuffer = ByteBuffer.wrap(buffer);
try {
    for (int length = 0; (length = channel.read(byteBuffer)) != -1;) {
        System.out.write(buffer, 0, length);
        byteBuffer.clear();
    }
} finally {
    input.close();
}
If it's a relatively small file, a one-line Java 7+ way to do this is:
System.out.println(new String(Files.readAllBytes(Paths.get("logs.txt"))));
See https://docs.oracle.com/javase/7/docs/api/java/nio/file/package-summary.html for more details.
Cheers!
If all you want is to dump the file contents to the console as efficiently as possible, with no processing in between, then converting the data into characters and finding line breaks is unnecessary overhead. Instead, you can just read blocks of bytes from the file and write them straight out to System.out:
package toconsole;

import java.io.BufferedInputStream;
import java.io.FileInputStream;

public class Main {

    public static void main(String[] args) {
        BufferedInputStream bis = null;
        byte[] buffer = new byte[8192];
        int bytesRead = 0;
        try {
            bis = new BufferedInputStream(new FileInputStream(args[0]));
            while ((bytesRead = bis.read(buffer)) != -1) {
                System.out.write(buffer, /* start */ 0, /* length */ bytesRead);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { bis.close(); } catch (Exception e) { /* meh */ }
        }
    }
}
In case you haven't come across this kind of idiom before, the statement in the while condition both assigns the result of bis.read to bytesRead and then compares it to -1. So we keep reading bytes into the buffer until we are told that we're at the end of the file. And we use bytesRead in System.out.write to make sure we write only the bytes we've just read, as we can't assume all files are a multiple of 8 kB long!
FileInputStream input = new FileInputStream("D:\\Java\\output.txt");
FileChannel channel = input.getChannel();
byte[] buffer = new byte[256 * 1024];
ByteBuffer byteBuffer = ByteBuffer.wrap(buffer);
try {
    for (int length = 0; (length = channel.read(byteBuffer)) != -1;) {
        System.out.write(buffer, 0, length);
        byteBuffer.clear();
    }
} finally {
    input.close();
}

Path temp = Files.move(
        Paths.get("D:\\Java\\output.txt"),
        Paths.get("E:\\find\\output.txt"));
if (temp != null) {
    System.out.println("File renamed and moved successfully");
} else {
    System.out.println("Failed to move the file");
}
For Java 11 you could use a more convenient approach:
Files.copy(Path.of("file.txt"), System.out);
Or, for faster output:
var out = new BufferedOutputStream(System.out);
Files.copy(Path.of("file.txt"), out);
out.flush();

Reading text file in J2ME

I'm trying to read a resource (asdf.txt), but if the file is bigger than 5000 bytes, (for example) 4700 null characters end up appended to the content variable. Is there any way to remove them (or to set the right size of the buffer)?
Here is the code:
String content = "";
try {
    InputStream in = this.getClass().getResourceAsStream("asdf.txt");
    byte[] buffer = new byte[5000];
    while (in.read(buffer) != -1) {
        content += new String(buffer);
    }
} catch (Exception e) {
    e.printStackTrace();
}
The simplest way is to do the correct thing: use a Reader to read text data:
public String readFromFile(String filename, String enc) throws Exception {
    String content = "";
    Reader in = new InputStreamReader(this.getClass().getResourceAsStream(filename), enc);
    StringBuffer temp = new StringBuffer(1024);
    char[] buffer = new char[1024];
    int read;
    while ((read = in.read(buffer, 0, buffer.length)) != -1) {
        temp.append(buffer, 0, read);
    }
    content = temp.toString();
    return content;
}
Note that you definitely should define the encoding of the text file you want to read.
And note that both your code and this example code work equally well on Java SE and J2ME.
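For example, a hypothetical call for a UTF-8 encoded resource:
String content = readFromFile("asdf.txt", "UTF-8");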
