Add flush() into Object serialization - java

I have this serialization and deserialization classes which I use to send objects into Kafka:
public class SaleRequestFactorySerializer implements Serializable, Serializer<SaleRequestFactory> {

    @Override
    public byte[] serialize(String topic, SaleRequestFactory data)
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try
        {
            ObjectOutputStream outputStream = new ObjectOutputStream(out);
            outputStream.writeObject(data);
            out.close();
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
        return out.toByteArray();
    }
}
public class SaleResponseFactoryDeserializer implements Serializable, Deserializer<SaleRequestFactory> {

    @Override
    public SaleRequestFactory deserialize(String topic, byte[] data)
    {
        SaleRequestFactory saleRequestFactory = null;
        try
        {
            ByteArrayInputStream bis = new ByteArrayInputStream(data);
            ObjectInputStream in = new ObjectInputStream(bis);
            saleRequestFactory = (SaleRequestFactory) in.readObject();
            in.close();
        }
        catch (IOException | ClassNotFoundException e)
        {
            e.printStackTrace();
        }
        return saleRequestFactory;
    }
}
It's not clear to me whether I need to add flush() calls in order to prevent a memory leak. Can you guide me on what I'm missing in my code?

You don't need to; the garbage collector doesn't magically get confused by unclosed streams. The concern with not closing streams in general is that the OS resource they represent may take a very long time to be released if the GC never kicks in because there's no memory pressure. There's no 'leaking' whatsoever with these.
BAIS, BAOS, OIS, OOS, etc - these don't actually represent any OS resources. Contrast to e.g. FileInputStream, Files.newInputStream, socket.getInputStream(), etc.
Regardless of whether it represents a resource or not, closing any stream will flush as part of the operation.
You can improve this code considerably by ditching OIS and OOS; a little web searching will show that almost everybody, notably including most of the Java language designers over at Oracle itself, doesn't particularly like Java's built-in serialization. The protocol is tightly bound to Java (so it's not possible to document the scheme without saying: "Well, uh, take this class file, load up a JVM, deserialize it there; otherwise, there's no way to read this thing"), which is not great. It's also not a particularly efficient algorithm, and produces rather large blobs of binary data. Look instead into serialization schemes which don't use Java's built-in mechanism, such as Jackson or GSON, which serialize objects into JSON and back out. JSON isn't an efficient format either, but at least it's easy to read with eyeballs, or from other languages, and it is easily specced.
If you want as much performance as you can squeeze out, write a protobuf based serializer.
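For illustration, here's a rough sketch of what a JSON-based replacement could look like, using Jackson (just one of the options named above). It assumes SaleRequestFactory is a plain bean Jackson can map (getters plus a no-arg constructor), and it reuses Kafka's SerializationException instead of printing stack traces:

import java.io.IOException;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serializer;

public class SaleRequestFactoryJsonSerializer implements Serializer<SaleRequestFactory> {

    private final ObjectMapper mapper = new ObjectMapper();

    @Override
    public byte[] serialize(String topic, SaleRequestFactory data) {
        try {
            // JSON bytes instead of Java's built-in serialization format
            return mapper.writeValueAsBytes(data);
        } catch (JsonProcessingException e) {
            throw new SerializationException("Failed to serialize SaleRequestFactory", e);
        }
    }
}

public class SaleRequestFactoryJsonDeserializer implements Deserializer<SaleRequestFactory> {

    private final ObjectMapper mapper = new ObjectMapper();

    @Override
    public SaleRequestFactory deserialize(String topic, byte[] data) {
        try {
            return mapper.readValue(data, SaleRequestFactory.class);
        } catch (IOException e) {
            throw new SerializationException("Failed to deserialize SaleRequestFactory", e);
        }
    }
}

GSON or any other JSON mapper slots in the same way; a protobuf version would follow the same pattern with a generated message class and its toByteArray()/parseFrom() methods.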
If you want to stick with this code instead, note that e.printStackTrace() is the worst way to handle an exception. An exception happened - your choice is to dump part of the error into System.err, beyond the reach of loggers and completely invisible on most server deployments, the rest into the void, and then keep on going like nothing went wrong, returning empty byte arrays, most likely causing either wild goose chases (with your app 'silently doing nothing', and you wondering what's going on), or a cascade of errors (where the weird state you're returning, such as empty arrays, causes other errors, which, if you handle them the same way, cause still other errors, resulting in a single problem sending a flurry of hundreds of cascading stack traces to System.err, all of them except the first utterly irrelevant). It also means you have to dirty up your code - for example, you tend to run into an issue where the compiler demands you initialize a variable before using it - even though you always set that variable, unless exceptions occur.
The right way to handle an exception is to actually handle it (and logging it is not handling it). If that's not an option, then the right way is to throw an exception. Give callers a chance. If you can't throw the exception straight onwards, wrap it. If you can't do that, wrap it in a RuntimeException. Thus, the "¯\_(ツ)_/¯ I have no clue and I don't want to be bothered" line of code is not e.printStackTrace() but throw new RuntimeException("Unhandled", e); - this has none of those problems: it lists ALL the info about the error, it will not cause execution to continue with weird data returned, and log frameworks have a chance to see this stuff.
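Applied to the serializer above, that looks roughly like this (a sketch; it keeps Java serialization, only swaps the error handling, and lets try-with-resources do the close, which also flushes):

@Override
public byte[] serialize(String topic, SaleRequestFactory data) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (ObjectOutputStream outputStream = new ObjectOutputStream(out)) {
        outputStream.writeObject(data);
    } catch (IOException e) {
        // no stack-trace-and-carry-on: callers and log frameworks get the full story
        throw new RuntimeException("Unhandled", e);
    }
    return out.toByteArray();
}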

Related

What is the use case for null(Input/Output)Stream API in Java?

With Java 11, I could initialize an InputStream as:
InputStream inputStream = InputStream.nullInputStream();
But I am unable to understand a potential use case of InputStream.nullInputStream or a similar API for OutputStream
i.e. OutputStream.nullOutputStream.
From the API Javadocs, I could figure out that it
Returns a new InputStream that reads no bytes. The returned stream is
initially open. The stream is closed by calling the close() method.
Subsequent calls to close() have no effect. While the stream is open,
the available(), read(), read(byte[]), ...
skip(long), and transferTo() methods all behave as if end of stream
has been reached.
I went through the detailed release notes further, which state:
There are various times where I would like to use methods that require
as a parameter a target OutputStream/Writer for sending output, but
would like to execute those methods silently for their other effects.
This corresponds to the ability in Unix to redirect command output to
/dev/null, or in DOS to append command output to NUL.
Yet I fail to understand what those methods are in the statement ".... execute those methods silently for their other effects" (blame my lack of hands-on experience with the APIs).
Can someone help me understand what is the usefulness of having such an input or output stream with a help of an example if possible?
Edit: One of a similar implementation I could find on browsing further is apache-commons' NullInputStream, which does justify the testing use case much better.
Sometimes you want to have a parameter of InputStream type, but also to be able to choose not to feed your code with any data. In tests it's probably easier to mock it but in production you may choose to bind null input instead of scattering your code with ifs and flags.
compare:
class ComposableReprinter {
    void reprint(InputStream is) throws IOException {
        System.out.println(is.read());
    }

    void bla() throws IOException {
        reprint(InputStream.nullInputStream());
    }
}
with this:
class ControllableReprinter {
    void reprint(InputStream is, boolean forReal) throws IOException {
        if (forReal) {
            System.out.println(is.read());
        }
    }

    void bla() throws IOException {
        reprint(new ByteArrayInputStream(new byte[0]), false);
    }
}
or this:
class NullableReprinter {
    void reprint(InputStream is) throws IOException {
        if (is != null) {
            System.out.println(is.read());
        }
    }

    void bla() throws IOException {
        reprint(null);
    }
}
It makes more sense with output IMHO. Input is probably more for consistency.
This approach is called Null Object: https://en.wikipedia.org/wiki/Null_object_pattern
I see it as a safer (1) and more expressive (2) alternative to initialising a stream variable with null.
1. No worries about NPEs.
2. [Output|Input]Stream is an abstraction. In order to return a null/empty/mock stream, you had to deviate from the core concept down to a specific implementation.
I think nullOutputStream is very easy and clear: just to discard output (similar to > /dev/null) and/or for testing (no need to invent an OutputStream).
An (obviously basic) example:
PrintStream out = ... // an easy way to either print to System.out or discard all prints, e.g. by wrapping OutputStream.nullOutputStream()
out.println("yeah... or not");
exporter.exportTo(out); // discard or real export?
Regarding nullInputStream, it's probably more for testing (I don't like mocks) and for APIs that either require an input stream, or (this now being more probable) deliver an input stream that does not contain any data or can't deliver one at all, and where null is not a viable option:
importer.importDocument("name", /* input stream... */);
InputStream inputStream = content.getInputStream(); // better to get no data to read than to get a null
When you test that importer, you can just use a nullInputStream there, again instead of inventing your own InputStream or instead of using a mock. Other use cases here rather look like a workaround or misuse of the API ;-)
Regarding the return of an InputStream: that rather makes sense. If you haven't any data you may want to return that nullInputStream instead of null so that callers do not have to deal with null and can just read as they would if there was data.
Finally, these are just convenience methods to make our lives easier without adding another dependency ;-) and as others already stated (comments/answers), it's basically an implementation of the null object pattern.
Using the null*Stream might also have the benefit that tests are executed faster... if you stream real data (of course... depending on size, etc.) you may just slow down your tests unnecessarily and we all want tests to complete fast, right? (some will put in mocks here... well...)
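For example, here's a small sketch of the output side (ReportGenerator and writeReport are made-up names): the method just takes an OutputStream, and the caller decides between real output and the null stream, with no flags or null checks inside:

import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class ReportGenerator {

    // The method only cares that it gets *an* OutputStream.
    void writeReport(OutputStream out) throws IOException {
        out.write("report body".getBytes(StandardCharsets.UTF_8));
    }

    void run(boolean verbose) throws IOException {
        // Same call either way: real output or /dev/null-style silence.
        writeReport(verbose ? System.out : OutputStream.nullOutputStream());
    }
}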

Is put-ing to a ByteBuffer then writing it to a file more efficient than writing the individual fields?

I want to write ONLY the values of the data members of an object into a file, so here I can't use serialization since it writes a whole lot of other information which I don't need. Here is what I have implemented, in two ways: one using a ByteBuffer and the other without using it.
Without using ByteBuffer:
1st method
public class DemoSecond {
    byte characterData;
    byte shortData;
    byte[] integerData;
    byte[] stringData;

    public DemoSecond(byte characterData, byte shortData, byte[] integerData,
            byte[] stringData) {
        super();
        this.characterData = characterData;
        this.shortData = shortData;
        this.integerData = integerData;
        this.stringData = stringData;
    }

    public static void main(String[] args) {
        DemoSecond dClass = new DemoSecond((byte) 'c', (byte) 0x7, new byte[] { 3, 4 },
                new byte[] { (byte) 'p', (byte) 'e', (byte) 'n' });
        File checking = new File("c:/objectByteArray.dat");
        try {
            if (!checking.exists()) {
                checking.createNewFile();
            }
            // POINT A
            FileOutputStream bo = new FileOutputStream(checking);
            bo.write(dClass.characterData);
            bo.write(dClass.shortData);
            bo.write(dClass.integerData);
            bo.write(dClass.stringData);
            // POINT B
            bo.close();
        } catch (FileNotFoundException e) {
            System.out.println("FNF");
            e.printStackTrace();
        } catch (IOException e) {
            System.out.println("IOE");
            e.printStackTrace();
        }
    }
}
Using ByteBuffer: One more thing is that the size of the data members will always remain fixed, i.e. characterData = 1 byte, shortData = 1 byte, integerData = 2 bytes and stringData = 3 bytes. So the total size of this class is ALWAYS 7 bytes.
2nd method
// POINT A
FileOutputStream bo = new FileOutputStream(checking);
ByteBuffer buff= ByteBuffer.allocate(7);
buff.put(dClass.characterData);
buff.put(dClass.shortData);
buff.put(dClass.integerData);
buff.put(dClass.stringData);
bo.write(buff.array());
// POINT B
I want to know which of the two methods is more optimized, and why.
The above class DemoSecond is just a sample class.
My original classes will be of size 5 to 50 bytes. I don't think size will be the issue here.
But each of my classes is of fixed size like the DemoSecond
Also, there are many objects of this type which I am going to write into the binary file.
PS
if I use serialization it also writes the words "characterData", "shortData", "integerData", "stringData" and other information which I don't want to write in the file. What I am concerned with here is THEIR VALUES ONLY. In the case of this example that's: 'c', 7, 3, 4, 'p', 'e', 'n'. I want to write only these 7 bytes into the file, NOT the other information which is USELESS to me.
As you are doing file I/O, you should bear in mind that the I/O operations are likely to be very much slower than any work done by the CPU in your output code. To a first approximation, the cost of I/O is an amount proportional to the amount of data you are writing, plus a fixed cost for each operating system call made to do the I/O.
So in your case you want to minimise the number of operating system calls made to do the writing. This is done by buffering data in the application, so the application performs fewer but larger operating system calls.
Using a byte buffer, as you have done, is one way of doing this, so your ByteBuffer code will be more efficient than your FileOutputStream code.
But there are other considerations. Your example is not performing many writes. So it is likely to be very fast anyway. Any optimisation is likely to be a premature optimisation. Optimisations tend to make code more complicated and harder to understand. To understand your ByteBuffer code a reader needs to understand how a ByteBuffer works in addition to everything they need to understand for the FileOutputStream code. And if you ever change the file format, you are more likely to introduce a bug with the ByteBuffer code (for example, by having too small a buffer).
Buffering of output is commonly done. So it should not surprise you that Java already provides code to help you. That code will have been written by experts, tested and debugged. Unless you have special requirements you should always use such code rather than writing your own. The code I am referring to is the BufferedOutputStream class.
To use it simply adapt your code that does not use the ByteBuffer, by changing the line of your code that opens the file to
OutputStream bo = new BufferedOutputStream(new FileOutputStream(checking));
The two methods differ only in the byte buffer allocated.
If you are concerned about unnecessary write operations to the file, there is already a BufferedOutputStream you can use, for which the buffer is allocated internally, and if you are writing to the same output stream multiple times, it is definitely more efficient than allocating a buffer manually every time.
It would be simplest to use a DataOutputStream around a BufferedOutputStream around the FileOutputStream.
NB You can't squeeze 'shortData' into a byte. Use the various primitives of DataOutputStream, and use the corresponding ones of DataInputStream when reading them back.
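A rough sketch of that suggestion (the logical field types char/short/int/String are an assumption, since your class stores everything pre-converted to bytes; note that DataOutputStream's encodings make the record a few bytes larger than the hand-packed 7):

import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// checking is the File from the question
try (DataOutputStream dos = new DataOutputStream(
        new BufferedOutputStream(new FileOutputStream(checking)))) {
    dos.writeChar('c');      // 2 bytes
    dos.writeShort(0x7);     // 2 bytes
    dos.writeInt(0x0304);    // 4 bytes
    dos.writeUTF("pen");     // 2-byte length prefix + the string bytes
} catch (IOException e) {
    e.printStackTrace();
}
// Read it back with DataInputStream in the same order:
// readChar(), readShort(), readInt(), readUTF().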

Reading and writing objects via GZIP streams?

I am new to Java. I want to learn to use GZIP streams. I have already tried this:
ArrayList<SubImage> myObject = new ArrayList<SubImage>(); // SubImage is a Serializable class
ObjectOutputStream compressedOutput = new ObjectOutputStream(
new BufferedOutputStream(new GZIPOutputStream(new FileOutputStream(
new File("....")))));
compressedOutput.writeObject(myObject);
and
ObjectInputStream compressedInput = new ObjectInputStream(
new BufferedInputStream(new GZIPInputStream(new FileInputStream(
new File("....")))));
myObject=(ArrayList<SubImage>)compressedInput.readObject();
The program writes myObject to a file without throwing any exception, but when it reaches the line
myObject=(ArrayList<SubImage>)compressedInput.readObject();
it throws this exception:
Exception in thread "main" java.io.EOFException: Unexpected end of ZLIB input stream
How can I solve this problem?
You have to flush and close your output stream. Otherwise, at the least, the BufferedOutputStream will not write everything to the file (it writes in big chunks to avoid penalizing performance).
If you call compressedOutput.flush() and compressedOutput.close() it will suffice.
You can try writing a simple string object and checking if the file is well written.
How? If you write an xxx.txt.gz file you can open it with your preferred zip app and look at the xxx.txt. If the app complains, then the content is not fully written.
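For example, a sketch of the writing side using try-with-resources, which closes the whole chain for you (closing the ObjectOutputStream flushes the buffer and lets GZIPOutputStream write its trailer); same classes as in your code:

try (ObjectOutputStream compressedOutput = new ObjectOutputStream(
        new BufferedOutputStream(new GZIPOutputStream(new FileOutputStream(
                new File("....")))))) {
    compressedOutput.writeObject(myObject);
} // close() cascades down the chain and finishes the gzip stream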
Extended answer to a comment: compressing the data even more
Changing serialization
You could change the standard serialization of SubImage object if it's an object of your own. Check java.io.Serializable javadoc to know how to do it. It's pretty straightforward.
Writing just what you need
Serialization has the drawback that it needs to write "it's a SubImage" just before every instance you write. That's not necessary if you know what's going to be there beforehand. So you could try to serialize it more manually.
To write your list, instead of writing an object, write directly the values that make up your list. You will need just a DataOutputStream (ObjectOutputStream implements DataOutput, so you can use it anyway).
dos.writeInt(yourList.size()); // tell how many items
for (SubImage si : yourList) {
    // write every field, in order (this should be a method called writeSubImage :)
    dos.writeInt(...);
    dos.writeInt(...);
    ...
}

// to read the thing just:
int size = dis.readInt();
for (int i = 0; i < size; i++) {
    // read every field, in the same order (this should be a method called readSubImage :)
    dis.readInt(...);
    dis.readInt(...);
    ...
    // create the subimage
    // add it to the list you are recreating
}
This method is more manual, but if:
you know what's going to be written, and
you won't need this kind of serialization for many types,
then it's pretty affordable and definitely more compact than the Serializable counterpart.
Bear in mind that there are alternative frameworks to serialize objects or create string messages (XStream for XML, Google Protocol Buffers for binary messages, and so on). Those frameworks can work directly with binary or produce a string that can then be written out.
If your app will need more of this, or you're just curious, maybe you should look at them.
Alternative serialization frameworks
I just looked on SO and found several questions (and answers) addressing this issue:
https://stackoverflow.com/search?q=alternative+serialization+frameworks+java
I've found that XStream is pretty easy and straightforward to use. And JSON is a pretty readable and succinct format (and Javascript compatible, which could be a plus :).
I would go for:
Object -> JSON -> OutputStreamWriter(UTF-8) -> GZippedOutputStream -> FileOutputStream
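As a sketch of that chain (Gson is an arbitrary pick of JSON library here, and the file name is made up; myObject is the ArrayList<SubImage> from the question):

import com.google.gson.Gson;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// ...
try (Writer writer = new OutputStreamWriter(
        new GZIPOutputStream(new FileOutputStream("subimages.json.gz")),
        StandardCharsets.UTF_8)) {
    new Gson().toJson(myObject, writer); // Object -> JSON -> UTF-8 writer -> gzip -> file
} catch (IOException e) {
    throw new UncheckedIOException(e);
}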

Can objects be buffered during java serialization?

I have a very large object which I wish to serialize. During the process of serialization, it comes to occupy some 130MB of heap as a weblogic.utils.io.UnsyncByteArrayOutputStream. I am using a BufferedOutputStream to speed up writing the data to disk, which reduces the amount of time for which this object is held in memory.
Is it possible to use a buffer to reduce the size of the object in memory though? It would be good if there was a way to serialize it x bytes at a time and write those bytes to disk.
Sample code follows if it is of any use. There's not much to go on though I don't think. If it's the case that there needs to be a complete in-memory copy of the object to be serialised (and therefore no concept of a serialization buffer) then I suppose I am stuck.
ObjectOutputStream tmpSerFileObjectStream = null;
OutputStream tmpSerFileStream = null;
BufferedOutputStream bufferedStream = null;
try {
    tmpSerFileStream = new FileOutputStream(tmpSerFile);
    bufferedStream = new BufferedOutputStream(tmpSerFileStream);
    tmpSerFileObjectStream = new ObjectOutputStream(bufferedStream);
    tmpSerFileObjectStream.writeObject(siteGroup);
    tmpSerFileObjectStream.flush();
} catch (InvalidClassException invalidClassEx) {
    throw new SiteGroupRepositoryException(
            "Problem encountered with class being serialised", invalidClassEx);
} catch (NotSerializableException notSerializableEx) {
    throw new SiteGroupRepositoryException(
            "Object to be serialized does not implement " + Serializable.class,
            notSerializableEx);
} catch (IOException ioEx) {
    throw new SiteGroupRepositoryException(
            "Problem encountered while writing ser file", ioEx);
} catch (Exception ex) {
    throw new SiteGroupRepositoryException(
            "Unexpected exception encountered while writing ser file", ex);
} finally {
    if (tmpSerFileObjectStream != null) {
        try {
            tmpSerFileObjectStream.close();
            if (null != tmpSerFileStream) tmpSerFileStream.close();
            if (null != bufferedStream) bufferedStream.close();
        } catch (IOException ioEx) {
            logger.warn("Exception caught on trying to close ser file stream", ioEx);
        }
    }
}
This is wrong on so many levels. This is a massive abuse of serialization. Serialization is mostly intended for temporarily storing an object, for example:
session objects between Tomcat server restarts,
transferring objects between JVMs (load balancing at a website).
Java's serialization makes no effort to handle long-term storage of objects (no versioning support) and may not handle large objects well.
For something so big, I would suggest some investigation first:
Ensure that you are not trying to persist the entire JVM Heap.
Look for member variables that can be labeled as 'transient' to avoid including them in the serialization (perhaps you have references to service objects).
Consider possibility that there is a memory leak and the object is excessively large.
If everything is indeed correct, you will have to research alternatives to java.io.Serializable. Taking more control via java.io.Externalizable might work. But I would suggest something like a JSON or XML representation.
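For reference, taking control via java.io.Externalizable looks roughly like this (the SiteGroup fields here are invented for the sketch; the point is that you decide exactly what gets written and in what order):

import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

public class SiteGroup implements Externalizable {

    private String name;
    private long memberCount;

    public SiteGroup() { } // public no-arg constructor is required

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // Only the data you choose, no class metadata per field
        out.writeUTF(name);
        out.writeLong(memberCount);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        name = in.readUTF();
        memberCount = in.readLong();
    }
}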
Update:
Investigate :
Google's Protocol Buffers
Facebook's Thrift
Avro
Cisco's Etch
Take a look at benchmarks of these as well.
What is the "siteGroup" object that you're trying to save? I ask, because it's unlikely that any one object is 130MB in size, unless it has a ginormous list/array/map/whatever in it -- and if that's the case, the answer would be to persist that data in a database.
But if there's no monster collection in the object, then the problem is likely that the object tree contains references to a bagillion objects, and the serialization of course does a deep copy (this fact has been used as a shortcut to implement clone() a lot of times), so everything gets cataloged all at once in a top-down fashion.
If that's the problem, then the solution would be to implement your own serialization scheme where each object gets serialized in a bottom-up fashion, possibly in multiple files, and only references are maintained to other objects, instead of the whole thing. This would allow you to write each object out individually, which would have the effect you're looking for: smaller memory footprint due to writing the data out in chunks.
However, implementing your own serialization, like implementing a clone() method, is not all that easy. So it's a cost/benefit thing.
It sounds like whatever runtime you are using has a less-than-ideal implementation of object serialization that you likely don't have any control over.
A similar complaint is mentioned here, although it is quite old.
http://objectmix.com/weblogic/523772-outofmemoryerror-adapter.html
Can you use a newer version of weblogic? Can you reproduce this in a unit test? If so, try running it under a different JVM and see what happens.
I don't know about weblogic (that is - JRockit I suppose) serialization in particular: honestly I see no reason for using ByteArrayOutputStreams...
You may want to implement java.io.Externalizable if you need more control on how your object is serialized - or switch to an entirely different serialization system (eg: Terracotta) if you don't want to write read/write methods yourself (if you have many big classes).
Why does it occupy all those bytes as an unsync byte array output stream?
That's not how default serialization works. You must have some special code in there to make it do that. Solution: don't.

Should java try blocks be scoped as tightly as possible?

I've been told that there is some overhead in using the Java try-catch mechanism. So, while it is necessary to put methods that throw checked exception within a try block to handle the possible exception, it is good practice performance-wise to limit the size of the try block to contain only those operations that could throw exceptions.
I'm not so sure that this is a sensible conclusion.
Consider the two implementations below of a function that processes a specified text file.
Even if it is true that the first one incurs some unnecessary overhead, I find it much easier to follow. It is less clear where exactly the exceptions come from just from looking at statements, but the comments clearly show which statements are responsible.
The second one is much longer and complicated than the first. In particular, the nice line-reading idiom of the first has to be mangled to fit the readLine call into a try block.
What is the best practice for handling exceptions in a function where multiple exceptions could be thrown in its definition?
This one contains all the processing code within the try block:
void processFile(File f)
{
    try
    {
        // construction of FileReader can throw FileNotFoundException
        BufferedReader in = new BufferedReader(new FileReader(f));
        // call of readLine can throw IOException
        String line;
        while ((line = in.readLine()) != null)
        {
            process(line);
        }
    }
    catch (FileNotFoundException ex)
    {
        handle(ex);
    }
    catch (IOException ex)
    {
        handle(ex);
    }
}
This one contains only the methods that throw exceptions within try blocks:
void processFile(File f)
{
    FileReader reader;
    try
    {
        reader = new FileReader(f);
    }
    catch (FileNotFoundException ex)
    {
        handle(ex);
        return;
    }

    BufferedReader in = new BufferedReader(reader);
    String line;
    while (true)
    {
        try
        {
            line = in.readLine();
        }
        catch (IOException ex)
        {
            handle(ex);
            break;
        }
        if (line == null)
        {
            break;
        }
        process(line);
    }
}
The basic premise here is false: the size of a try block makes no difference in performance. Performance is affected by actually raising exceptions at runtime, and that's independent of the size of the try block.
However, keeping try blocks small can lead to better programs.
You might catch exceptions to recover and proceed, or you might catch them simply to report them to the caller (or to a human, via some UI).
In the first case, failures from which you can recover are often very specific, and this leads to smaller try blocks.
In the second case, where an exception is caught so that it can be wrapped by another exception and re-thrown, or displayed to the user, small try blocks mean that you know more precisely which operation failed, and the higher-level context in which that call was made. This allows you to create more specific error reports.
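A small illustration of that second case (ConfigLoadException is a made-up wrapper type, configPath an assumed java.nio.file.Path, and Properties/Files are the standard library classes): the try block covers exactly one operation, so the wrapping exception can say precisely what failed:

Properties props = new Properties();
try (InputStream in = Files.newInputStream(configPath)) {
    props.load(in);
} catch (IOException e) {
    // small try block: we know it was the config file read that failed
    throw new ConfigLoadException("Could not read config file: " + configPath, e);
}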
Of course, there are… exceptions (sorry!) to these guidelines. For example, in some cases very specific error reports could be a security problem.
It might be useful to know what effect a try block has on the compiled code. It doesn't change the compiled instructions at all! (Of course, the corresponding catch block does, since it's like any other code.)
A try block creates an entry in the exception table associated with the method. This table has a range of source instruction counters, an exception type, and a destination instruction. When an exception is raised, this table is examined to see if there is an entry with a matching type, and a range that includes the instruction that raised the exception. If there is, execution branches to the corresponding destination instruction.
The important thing to realize is that this table isn't consulted (and has no effect on running performance) unless it's needed. (Neglecting a little overhead in the loading of the class.)
I've been told that there is some overhead in using the Java try-catch mechanism.
Absolutely. And there's overhead to method calls, too. But you shouldn't put all your code in one method.
Not to toot the premature optimization horn, but the focus should be on ease of reading, organization, etc. Language constructs rarely impact performance as much as system organization and choice of algorithms.
To me, the first is easiest to read.
No. The only thing that you should be considering is where you can reasonably handle the exception and what resources you need to reclaim (with finally).
This is premature optimization at its worst. Don't do it.
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil" - Knuth.
There is very, very little benefit to the 2nd method. After all, if you can successfully open a file but not read from it, then there is something very wrong with your computer. Thus knowing that the IOException came from the readLine() method is very rarely useful. Also, as you know, different exceptions are thrown for different problems anyway (FileNotFoundException, etc.).
As long as you scope it with a 'logical' block, i.e. opening, reading, and closing a file in one go, I would go with the first method. It's much simpler to read and, especially when dealing with IO, the processor cycles used by the try-catch overhead would be minimal, if any.
Putting the try blocks around the specific code that may throw an exception, makes it, in my opinion easier to read. You're likely to want to display a different message for each error and provide instructions to the user, which will be different depending on where the error occurs.
However, the performance issue that most people refer to is related to raising the exception, not to the try block itself.
In other words, as long as you never have an error raised, the try block won't noticeably affect performance. You shouldn't consider a try block just another flow control construct and raise an error to branch through your code. That's what you want to avoid.
The second method will generate a compiler error that reader may not have been initialized. You can get around that by initializing it to null, but that just means you could get an NPE, and there's no advantage to that.
