Count the bytes written to file via BufferedWriter formed by GZIPOutputStream - java

I have a BufferedWriter as shown below:
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
new GZIPOutputStream( hdfs.create(filepath, true ))));
String line = "text";
writer.write(line);
I want to find out how many bytes were written to the file without querying the file like this:
hdfs = FileSystem.get( new URI( "hdfs://localhost:8020" ), configuration );
filepath = new Path("path");
hdfs.getFileStatus(filepath).getLen();
because that adds overhead, which I want to avoid.
I also can't do this:
line.getBytes().length;
because it gives the size before compression.

You can use the CountingOutputStream from the Apache Commons IO library.
Place it between the GZIPOutputStream and the file OutputStream (hdfs.create(..)).
After writing the content and closing the writer, so that all compressed data has been flushed, you can read the number of written bytes from the CountingOutputStream instance.

If this isn't too late, you are using Java 1.7+, and you don't want to pull in an entire library like Guava or Commons IO, you can extend GZIPOutputStream and obtain the data from the associated Deflater like so:
public class MyGZIPOutputStream extends GZIPOutputStream {
public MyGZIPOutputStream(OutputStream out) throws IOException {
super(out);
}
public long getBytesRead() {
return def.getBytesRead();
}
public long getBytesWritten() {
return def.getBytesWritten();
}
public void setLevel(int level) {
def.setLevel(level);
}
}
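A usage sketch for this approach, under a few assumptions: the class and method names below are hypothetical, a ByteArrayOutputStream stands in for hdfs.create(...), and the counters are read after finish() but before close(). The Deflater counters cover only the deflate payload, so the GZIP header and trailer that GZIPOutputStream adds are not included in them.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Condensed, hypothetical copy of the subclass: exposes the protected Deflater's counters.
class DeflaterCountingGzip extends GZIPOutputStream {
    DeflaterCountingGzip(OutputStream out) throws IOException {
        super(out);
    }

    long getBytesRead() { return def.getBytesRead(); }       // uncompressed bytes consumed
    long getBytesWritten() { return def.getBytesWritten(); } // compressed bytes produced
}

class DeflaterCountingDemo {
    // Returns {uncompressedBytes, deflatedBytes, bytesInSink} for the given text.
    static long[] compress(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for hdfs.create(...)
        long read;
        long deflated;
        try (DeflaterCountingGzip gz = new DeflaterCountingGzip(sink)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
            gz.finish();                 // completes compression, Deflater is still usable
            read = gz.getBytesRead();    // read the counters before the stream is closed
            deflated = gz.getBytesWritten();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new long[] { read, deflated, sink.size() };
    }

    public static void main(String[] args) {
        long[] r = compress("text text text text");
        System.out.println("in=" + r[0] + " deflated=" + r[1] + " total=" + r[2]);
    }
}
```

If the exact on-disk size matters, the total in the sink is the number to trust, since it also contains the header and trailer.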

You can write your own subclass of OutputStream that counts the bytes passed to its write methods, and place it beneath the GZIPOutputStream.
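A minimal sketch of such a subclass, assuming it sits between the GZIPOutputStream and the file stream so that it sees compressed bytes. ByteCountingStream and ByteCountingDemo are hypothetical names, and the sink parameter stands in for hdfs.create(...).

```java
import java.io.BufferedWriter;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: counts every byte that is passed on to the wrapped stream.
class ByteCountingStream extends FilterOutputStream {
    private long count;

    ByteCountingStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len); // bypasses FilterOutputStream's byte-by-byte loop
        count += len;
    }
    // write(byte[]) is inherited and delegates to write(byte[], int, int).

    long getCount() {
        return count;
    }
}

class ByteCountingDemo {
    // Writes one line through the writer chain and returns the compressed byte count.
    static long compressedSize(String line, OutputStream sink) {
        ByteCountingStream counter = new ByteCountingStream(sink);
        try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
                new GZIPOutputStream(counter), StandardCharsets.UTF_8))) {
            writer.write(line);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counter.getCount(); // complete only after close() has flushed everything
    }
}
```

The count is only final once the writer has been closed, because both the BufferedWriter and the GZIPOutputStream hold data back until then.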

This is similar to the response by Olaseni, but I moved the counting into the BufferedOutputStream rather than the GZIPOutputStream, and this is more robust, since def.getBytesRead() in Olaseni's answer is not available after the stream has been closed.
With the implementation below, you can supply your own AtomicLong to the constructor so that you can assign the CountingBufferedOutputStream in a try-with-resources block, but still retrieve the count after the block has exited (i.e. after the file is closed).
public static class CountingBufferedOutputStream extends BufferedOutputStream {
private final AtomicLong bytesWritten;
public CountingBufferedOutputStream(OutputStream out) throws IOException {
super(out);
this.bytesWritten = new AtomicLong();
}
public CountingBufferedOutputStream(OutputStream out, int bufSize) throws IOException {
super(out, bufSize);
this.bytesWritten = new AtomicLong();
}
public CountingBufferedOutputStream(OutputStream out, int bufSize, AtomicLong bytesWritten)
throws IOException {
super(out, bufSize);
this.bytesWritten = bytesWritten;
}
// write(byte[]) is deliberately not overridden: the inherited implementation
// calls write(b, 0, b.length), so overriding it as well would count those bytes twice.
@Override
public void write(byte[] b, int off, int len) throws IOException {
super.write(b, off, len);
bytesWritten.addAndGet(len);
}
@Override
public synchronized void write(int b) throws IOException {
super.write(b);
bytesWritten.incrementAndGet();
}
public long getBytesWritten() {
return bytesWritten.get();
}
}
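The try-with-resources usage described above might look like the following sketch. SharedCounterStream is a hypothetical, trimmed-down variant with only the AtomicLong constructor, the buffer size of 8192 is an arbitrary choice, and the sink parameter stands in for hdfs.create(...).

```java
import java.io.BufferedOutputStream;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;
import java.util.zip.GZIPOutputStream;

// Hypothetical trimmed-down variant: counts into an AtomicLong owned by the caller.
class SharedCounterStream extends BufferedOutputStream {
    private final AtomicLong bytesWritten;

    SharedCounterStream(OutputStream out, int bufSize, AtomicLong bytesWritten) {
        super(out, bufSize);
        this.bytesWritten = bytesWritten;
    }

    // write(byte[]) is left alone: it is inherited and ends up in the method below.
    @Override
    public synchronized void write(byte[] b, int off, int len) throws IOException {
        super.write(b, off, len);
        bytesWritten.addAndGet(len);
    }

    @Override
    public synchronized void write(int b) throws IOException {
        super.write(b);
        bytesWritten.incrementAndGet();
    }
}

class SharedCounterDemo {
    static long writeCompressed(String line, OutputStream sink) {
        AtomicLong count = new AtomicLong(); // declared outside the try block, so it outlives it
        try (Writer writer = new BufferedWriter(new OutputStreamWriter(
                new GZIPOutputStream(new SharedCounterStream(sink, 8192, count)),
                StandardCharsets.UTF_8))) {
            writer.write(line);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count.get(); // still readable although every stream is closed
    }
}
```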

Related

Append to ObjectOutputStream iteratively

I want to add lots of data to a file. I defined the HYB class since my object consists of different types of data (String and byte[]). I used ObjectOutputStream and ObjectInputStream to write and read from the file. But my code does not print the expected result. To write my code I used code in the following pages:
How can I append to an existing java.io.ObjectStream?
ClassCastException when Appending Object OutputStream
I tried to debug my code to find the problem, but I could not. This is my code:
import java.io.*;
import java.io.BufferedOutputStream;
import java.util.*;
public class HYB implements Serializable
{
private static final long serialVersionUID = 1L;
private List<byte[]> data = new ArrayList<>();
public void addRow(String s,byte[] a)
{
data.add(s.getBytes()); // add encoding if necessary
data.add(a);
}
@Override public String toString()
{
StringBuilder sb = new StringBuilder();
synchronized (data)
{
for(int i=0;i<data.size();i+=2)
{
sb.append(new String(data.get(i)));
sb.append(Arrays.toString(data.get(i+1))+"\n");
}
}
return sb.toString();
}
private static void write(File storageFile, HYB hf)
throws IOException {
ObjectOutputStream oos = getOOS(storageFile);
oos.writeObject(hf);
oos.flush();
oos.close();
}
public static ObjectOutputStream getOOS(File file) throws IOException
{
if (file.exists()) {
return new AppendableObjectOutputStream(new FileOutputStream(file, true));
} else {
return new ObjectOutputStream(new FileOutputStream(file));
}
}
private static ObjectInputStream getOIS(FileInputStream fis)
throws IOException {
long pos = fis.getChannel().position();
return pos == 0 ? new ObjectInputStream(fis) :
new AppendableObjectInputStream(fis);
}
private static class AppendableObjectOutputStream extends
ObjectOutputStream {
public AppendableObjectOutputStream(OutputStream out)
throws IOException {
super(out);
}
@Override
protected void writeStreamHeader() throws IOException {
}
}
private static class AppendableObjectInputStream extends ObjectInputStream {
public AppendableObjectInputStream(InputStream in) throws IOException {
super(in);
}
@Override
protected void readStreamHeader() throws IOException {
// do not read a header
}
}
public static void main(String[] args) throws FileNotFoundException, IOException, ClassNotFoundException
{
File x=new File ("test");
HYB hf1 = new HYB();
hf1.addRow("fatemeh",new byte[] {11,12,13});
hf1.addRow("andisheh",new byte[] {14,15,16});
write(x,hf1);
HYB hf = new HYB();
hf.addRow("peter",new byte[] {1,2,3});
hf.addRow("jaqueline",new byte[] {4,5,6});
write(x,hf);
FileInputStream fis = new FileInputStream(x);
HYB hf2 = (HYB) getOIS(fis).readObject();
System.out.println(hf2);
}
}
expected results:
fatemeh[11, 12, 13]
andisheh[14, 15, 16]
peter[1, 2, 3]
jaqueline[4, 5, 6]
actual results:
fatemeh[11, 12, 13]
andisheh[14, 15, 16]
Writing the two HYB objects to the ObjectOutputStream doesn't merge them into a single HYB object; the stream still contains two HYB objects, of which your code reads only one. If you made a second call to readObject(), the second one would be retrieved and could be printed to the screen. So you could wrap the readObject() and println() calls in a loop that keeps reading and printing until there's nothing else to read from the stream.
You are writing two HYB objects to the stream, but only reading one out.
You need to readObject() twice.
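A read-until-end loop could be sketched as follows. This is a simplified illustration, not the asker's program: it assumes all objects were written through a single ObjectOutputStream into a byte array, and ReadAllObjectsDemo and its methods are hypothetical names. The loop stops when readObject() reports the end of the stream with an EOFException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

class ReadAllObjectsDemo {
    // Serializes each value as its own top-level object in one stream.
    static byte[] writeAll(List<String> values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            for (String value : values) {
                oos.writeObject(value);
            }
        }
        return bytes.toByteArray();
    }

    // Calls readObject() repeatedly until the stream is exhausted.
    static List<Object> readAll(byte[] data) throws IOException, ClassNotFoundException {
        List<Object> result = new ArrayList<>();
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            while (true) {
                try {
                    result.add(ois.readObject());
                } catch (EOFException endOfStream) {
                    break; // nothing left to read
                }
            }
        }
        return result;
    }

    // Convenience wrapper that turns checked exceptions into unchecked ones.
    static List<Object> roundTrip(List<String> values) {
        try {
            return readAll(writeAll(values));
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Each element of the returned list corresponds to one writeObject() call, so two written objects come back as two entries.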

Logging InputStream

I created an InputStream class that extends CipherInputStream. I want to log all data read from my InputStream (which I later use as input to a parser), so I did the following:
public class MyInputStream extends CipherInputStream {
private OutputStream logStream = new ByteArrayOutputStream();
.....
@Override
public int read() throws IOException {
int read = super.read();
logStream.write(read);
return read;
}
@Override
public int read(byte[] b, int off, int len) throws IOException {
int read = super.read(b, off, len);
if (read > 0) {
logStream.write(b, off, read);
}
return read;
}
@Override
public int read(byte[] buffer) throws IOException {
int read = super.read(buffer);
if (read()>0) {
logStream.write(buffer);
}
return read;
}
@Override
public void close() throws IOException {
log();
super.close();
}
public void log() {
String logStr = new String(((ByteArrayOutputStream) logStream).toByteArray(), Charset.defaultCharset());
Log.d(getClass(), logStr);
try {
logStream.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
My actual stream contains something like this:
<response>
<result>0</result>
</response>
but the log shows a mangled version like this:
<<response>
<resultt >0</resullt>
</respoonse>
[and (?) symbol at the end]
Thanks for any help!
You can combine TeeInputStream and Logger.stream():
new TeeInputStream(
yourStream,
Logger.stream(Level.INFO, this)
);
If you want to see the log in logcat, try Log.i(String tag, String message) or System.out.println(""). Both work. You can also use Log.d, Log.w and Log.e.
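A tee-style stream can also be sketched with the standard library alone. The mangled log in the question is consistent with bytes being logged more than once: read(byte[]) calls read() inside its if condition, which consumes and logs an extra byte, then logs the whole buffer instead of only the bytes read, and a read(byte[]) that delegates to the overridden three-argument read logs the same data again. The single-byte read() also logs the -1 end marker. The sketch below avoids these by wrapping the stream and logging in exactly two places. CapturingInputStream is a hypothetical name, and UTF-8 content is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical tee-style wrapper: copies every byte that was actually read into a buffer.
class CapturingInputStream extends FilterInputStream {
    private final ByteArrayOutputStream captured = new ByteArrayOutputStream();

    CapturingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b != -1) {
            captured.write(b); // do not log the end-of-stream marker
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n > 0) {
            captured.write(buf, off, n); // log only the bytes just read
        }
        return n;
    }
    // read(byte[]) is inherited; FilterInputStream routes it to read(byte[], int, int).

    String capturedText() {
        return new String(captured.toByteArray(), StandardCharsets.UTF_8);
    }
}

class CapturingDemo {
    // Reads the text through the wrapper with mixed read calls and returns what was captured.
    static String readThrough(String text) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        try (CapturingInputStream in = new CapturingInputStream(new ByteArrayInputStream(data))) {
            byte[] chunk = new byte[4];
            in.read();                       // one single-byte read
            while (in.read(chunk) != -1) { } // chunked reads until end of stream
            return in.capturedText();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In the question's setting the wrapper would be placed around the CipherInputStream rather than extending it, so the decryption logic stays untouched.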

Extending OutputStream class; write(int) method

So my goal is to implement the write method in the class OutputStream to create a new class NumStream, which basically converts ints to Strings. Here is my sample code:
import java.io.*;
public class NumStream extends OutputStream {
public void write(int c) throws IOException {
// What goes here?
}
public static void main(String[] args) {
NumStream ns = new NumStream();
PrintWriter pw = new PrintWriter(new OutputStreamWriter(ns));
pw.println("123456789 and ! and # ");
pw.flush(); // needed for anything to happen, try taking it out
}
}
I've tried several different approaches. The program always compiles, but when I run it, nothing happens. So far I've tried using switch statements to produce this result:
public void write(int c) throws IOException {
StringBuffer sb = new StringBuffer();
switch (c) {
case 1: sb.append("1");
break;
//etc. through 9
I'm unsure of what to do or try next to produce a result. :/ Any tips to steer me in the right direction?
I had the same problem too. Here is the solution:
public class MyOutputStream extends OutputStream {
// Must be initialized, otherwise append() throws a NullPointerException
private final StringBuilder annotatedText = new StringBuilder();
public MyOutputStream() {
// Custom constructor
}
@Override
public void write(int b) {
// Only the low 8 bits of b are data; treating them as a code point is correct for single-byte characters only
int[] bytes = {b & 0xFF};
write(bytes, 0, bytes.length);
}
public void write(int[] bytes, int offset, int length) {
String s = new String(bytes, offset, length);
annotatedText.append(s);
}
public void myPrint() {
System.out.println(annotatedText);
}
}
All we need to do is implement the write method correctly, as the example above shows.
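One reason the switch in the question never matches is that write(int) receives byte values: the character '1' arrives as 49, not as 1. The sketch below assumes the goal is to replace each digit character with its English name and pass everything else through; that reading of the goal, the class name DigitNamingStream, and the ASCII-only input are all assumptions.

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical example stream: spells out digit characters, keeps other characters.
class DigitNamingStream extends OutputStream {
    private static final String[] NAMES = {
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine" };

    // A field, so the collected text survives between write(int) calls.
    private final StringBuilder text = new StringBuilder();

    @Override
    public void write(int c) {
        // c is a byte value: compare against '1' (49), not against 1
        if (c >= '0' && c <= '9') {
            text.append(NAMES[c - '0']).append(' ');
        } else {
            text.append((char) c); // assumes single-byte (ASCII) input
        }
    }

    String getText() {
        return text.toString();
    }

    static String convert(String input) {
        DigitNamingStream ns = new DigitNamingStream();
        PrintWriter pw = new PrintWriter(new OutputStreamWriter(ns, StandardCharsets.US_ASCII));
        pw.print(input);
        pw.flush(); // pushes the buffered characters through write(int)
        return ns.getText();
    }
}
```

Calling DigitNamingStream.convert("12!") goes through the same PrintWriter and OutputStreamWriter chain as the question's main method.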

how to convert PrintWriter to String or write to a File?

I am generating a dynamic page using JSP, and I want to save the complete generated page in a file as an archive.
In JSP, everything is written to PrintWriter out = response.getWriter();
At the end of page, before sending response to client I want to save this page, either in file or in buffer as string for later treatment.
How can I save the PrintWriter content or convert it to a String?
To get a string from the output of a PrintWriter, you can pass a StringWriter to a PrintWriter via the constructor:
@Test
public void writerTest(){
StringWriter out = new StringWriter();
PrintWriter writer = new PrintWriter(out);
// use writer, e.g.:
writer.print("ABC");
writer.print("DEF");
writer.flush(); // optional here: this PrintWriter does no buffering of its own, and StringWriter.flush() does nothing
String result = out.toString();
assertEquals("ABCDEF", result);
}
Why not use StringWriter instead? I think this should be able to provide what you need.
So for example:
StringWriter strOut = new StringWriter();
...
String output = strOut.toString();
System.out.println(output);
It depends on how the PrintWriter is constructed and then used.
If the PrintWriter is constructed first and then passed to code that writes to it, you could use the Decorator pattern: create a subclass of Writer that takes the PrintWriter as a delegate and forwards calls to it, but also keeps a copy of the content that you can then archive.
public class DecoratedWriter extends Writer
{
private final Writer delegate;
private final StringWriter archive = new StringWriter();
//pass in the original PrintWriter here
public DecoratedWriter( Writer delegate )
{
this.delegate = delegate;
}
public String getForArchive()
{
return this.archive.toString();
}
public void write( char[] cbuf, int off, int len ) throws IOException
{
this.delegate.write( cbuf, off, len );
this.archive.write( cbuf, off, len );
}
public void flush() throws IOException
{
this.delegate.flush();
this.archive.flush();
}
public void close() throws IOException
{
this.delegate.close();
this.archive.close();
}
}
You cannot get it from the PrintWriter object alone. It passes the data on and does not hold any content itself, so it isn't the object to look at to get the entire string.
I think the best way is to prepare your response in another object such as a StringBuffer, flush its content to the response, and afterwards save the content stored in that variable to the file.
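That buffer-first idea could be sketched like this, using a StringWriter as the in-memory buffer. The class BufferedPageDemo, its methods, and the placeholder page body are hypothetical. In a servlet or JSP the responseWriter argument would be the writer obtained from the response.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class BufferedPageDemo {
    // Renders into memory first, then sends the same text to the client and to an archive file.
    static String renderAndArchive(Writer responseWriter, Path archiveFile) {
        StringWriter buffer = new StringWriter();
        PrintWriter page = new PrintWriter(buffer);
        page.println("<html><body>generated content</body></html>"); // placeholder page body
        page.flush();

        String html = buffer.toString();
        try {
            responseWriter.write(html); // stand-in for the response writer
            responseWriter.flush();
            Files.write(archiveFile, html.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return html;
    }

    // Small self-check: returns {text sent to the client, text stored in the file}.
    static String[] demo() {
        try {
            Path file = Files.createTempFile("page", ".html");
            StringWriter client = new StringWriter();
            renderAndArchive(client, file);
            String archived = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            Files.delete(file);
            return new String[] { client.toString(), archived };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The trade-off is that the whole page is held in memory before anything reaches the client, which is fine for ordinary pages but worth considering for very large responses.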
This helped me obtain a SOAP-able object as an XML string:
JAXBContext jc = JAXBContext.newInstance(o.getClass());
Marshaller m = jc.createMarshaller();
StringWriter writer = new StringWriter();
m.marshal( o, new PrintWriter(writer) );
return writer.toString();
Along similar lines to what cdc is doing - you can extend PrintWriter and then create and pass around an instance of this new class.
Call getArchive() to get a copy of the data that's passed through the writer.
public class ArchiveWriter extends PrintWriter {
private StringBuilder data = new StringBuilder();
public ArchiveWriter(Writer out) {
super(out);
}
public ArchiveWriter(Writer out, boolean autoFlush) {
super(out, autoFlush);
}
public ArchiveWriter(OutputStream out) {
super(out);
}
public ArchiveWriter(OutputStream out, boolean autoFlush) {
super(out, autoFlush);
}
public ArchiveWriter(String fileName) throws FileNotFoundException {
super(fileName);
}
public ArchiveWriter(String fileName, String csn) throws FileNotFoundException, UnsupportedEncodingException {
super(fileName, csn);
}
public ArchiveWriter(File file) throws FileNotFoundException {
super(file);
}
public ArchiveWriter(File file, String csn) throws FileNotFoundException, UnsupportedEncodingException {
super(file, csn);
}
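@Override
public void write(int c) {
super.write(c);
data.append((char) c); // single characters, e.g. from print(char)
}
@Override
public void write(char[] cbuf, int off, int len) {
super.write(cbuf, off, len);
data.append(cbuf, off, len);
}
@Override
public void write(String s, int off, int len) {
super.write(s, off, len);
data.append(s, off, off + len); // append(CharSequence, start, end) takes an end index, not a length
}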
public String getArchive() {
return data.toString();
}
}

Record size of objects as they're being serialized?

What's the best way to record the size of certain objects as they are being serialized? For example, once objects of type A, B, C are serialized, record the size of their serialized bytes. We can get the size of the entire object graph via getBytes, but we'd like to break it down as to what are the largest contributors to the overall serialized size.
ObjectOutputStream offers writeObjectOverride, but we don't want to rewrite the serialization process. In simplified terms, we need to be aware of when we encounter a certain object prior to serialization, record the total current byte count, and then after it's serialized, take the difference of byte counts. It seems like encompassing writeSerialData would work, but the method is private.
Ideas?
Thanks.
--- UPDATE ---
The answers/suggestions below are insightful. Below is what I have so far. Let me know your thoughts. Thanks.
// extend to get a handle on outputstream
class MyObjectOutputStream extends ObjectOutputStream {
private OutputStream out;
public MyObjectOutputStream(OutputStream out) throws IOException {
super(out);
this.out = out;
}
public OutputStream getOut() {
return this.out;
}
}
// counter
public static class CounterOutputStream extends FilterOutputStream {
private int bytesWritten = 0;
...
public int getBytesWritten() {
return this.bytesWritten;
}
public void resetCounter() {
bytesWritten = 0;
}
private void update(int len) {
bytesWritten += len;
}
}
// go serialize
ByteArrayOutputStream out = new ByteArrayOutputStream();
ObjectOutputStream oos = new MyObjectOutputStream(new CounterOutputStream(out, 1024));
// record serialized size of this class; do this for every interested class
public class MyInterestingObject {
...
private void writeObject(ObjectOutputStream out) throws IOException {
CounterOutputStream counter = null;
if (out instanceof MyObjectOutputStream) {
counter = (CounterOutputStream)((MyObjectOutputStream)out).getOut();
counter.resetCounter();
}
// continue w/ standard serialization of this object
out.defaultWriteObject();
if (counter != null) {
logger.info(this.getClass() + " bytes written: " + counter.getBytesWritten());
// TODO: store in context or somewhere to be aggregated post-serialization
}
}
}
The simplest solution would be to wrap the OutputStream you're using with an implementation that will count bytes written.
import java.io.IOException;
import java.io.OutputStream;
public class CountingOutputStream extends OutputStream {
private int count;
private OutputStream out;
public CountingOutputStream(OutputStream out) {
this.out = out;
}
public void write(byte[] b) throws IOException {
out.write(b);
count += b.length;
}
public void write(byte[] b, int off, int len) throws IOException {
out.write(b, off, len);
count += len;
}
public void flush() throws IOException {
out.flush();
}
public void close() throws IOException {
out.close();
}
public void write(int b) throws IOException {
out.write(b);
count++;
}
public int getBytesWritten() {
return count;
}
}
Then you would use it like this:
CountingOutputStream s = new CountingOutputStream(out);
ObjectOutputStream o = new ObjectOutputStream(s);
o.writeObject("example"); // writeObject, with any Serializable argument
o.close();
// s.getBytesWritten()
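To break the total down per object, the counter can be sampled before and after each writeObject() call, as in the sketch below. SizeCounter and PerObjectSizeDemo are hypothetical names, and the approach only measures top-level objects. Because later objects can reuse class descriptors and back-references written earlier in the same stream, the numbers depend on the order of writing and should be read as approximate contributions.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical counting stream, written as a FilterOutputStream.
class SizeCounter extends FilterOutputStream {
    private long count;

    SizeCounter(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        count += len;
    }

    long getCount() {
        return count;
    }
}

class PerObjectSizeDemo {
    // Returns the number of bytes each top-level object added to the stream.
    static long[] sizesOf(Object... objects) {
        SizeCounter counter = new SizeCounter(new ByteArrayOutputStream());
        long[] sizes = new long[objects.length];
        try (ObjectOutputStream oos = new ObjectOutputStream(counter)) {
            oos.flush(); // push the stream header out so it is not charged to the first object
            for (int i = 0; i < objects.length; i++) {
                long before = counter.getCount();
                oos.writeObject(objects[i]);
                oos.flush(); // ObjectOutputStream buffers internally
                sizes[i] = counter.getCount() - before;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sizes;
    }
}
```

The flush() after each object is what makes the difference meaningful; without it, part of an object's bytes may still sit in the ObjectOutputStream's internal buffer when the counter is sampled.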
You could implement Externalizable rather than Serializable on any objects you need to capture such data from. You could then implement field-by-field byte counting in the writeExternal method, maybe by handing off to a utility class. Something like
public void writeExternal(ObjectOutput out) throws IOException
{
super.writeExternal(out);
out.writeUTF(this.myString == null ? "" : this.myString);
ByteCounter.getInstance().log("MyClass", "myString", this.myString);
}
Another hackish way would be to stick with Serializable, but to use the readResolve or writeReplace hooks to capture whatever data you need, e.g.
import java.io.*;
public class Test implements Serializable
{
private String s;
public Test(String s)
{
this.s = s;
}
private Object readResolve()
{
System.err.format("%s,%s,%s,%d\n", "readResolve", "Test", "s", s.length());
return this;
}
private Object writeReplace()
{
System.err.format("%s,%s,%s,%d\n", "writeReplace", "Test", "s", s.length());
return this;
}
public static void main(String[] args) throws Exception
{
File tmp = File.createTempFile("foo", "tmp");
ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(tmp));
Test test = new Test("hello world");
out.writeObject(test);
out.close();
ObjectInputStream in = new ObjectInputStream(new FileInputStream(tmp));
test = (Test)in.readObject();
in.close();
}
}
