Using a SerialBlob vs byte[]

I am using Hibernate to store and retrieve data from a MySQL database. I was using a byte array but came across the SerialBlob class. I can use the class successfully, but I can't seem to find any difference between using a SerialBlob and a byte array. Does anyone know the basic differences, or in which situations you would use a SerialBlob in lieu of a byte[]?

You are right that the SerialBlob is just a thin abstraction around a byte[], but:
Are you working in a team?
Do you sometimes make mistakes?
Are you lazy with writing comments?
Do you sometimes forget what your code from a year ago actually does?
If you answered any of the above questions with a yes, you should probably use SerialBlob.
It's basically the same with any other abstraction around a simple data structure (think ByteBuffer, for example) or another class. You want to use it over byte[], because:
It's more descriptive. A byte[] could be some sort of cache, it could be a circular buffer, it could be some sort of integrity checking mechanism gone wrong. But if you use SerialBlob, it's obvious that this is just a blob of binary data from the database / to be stored in the database.
Instead of manual array handling, you use methods on the class, which is, again, easier to read if you don't know the code. Even trivial array manipulation must be comprehended by the reader of your code. A method with a good name is self-descriptive.
This is helpful for your teammates and also for you when you'll read this code in a year.
It's more error-proof. Every time you write new code, there's a good chance you have introduced a bug. It may not be visible at first, but it is probably in there. The SerialBlob code has been used by thousands of people around the world, so you are very unlikely to hit a bug in it.
Even if you're sure you got your byte array handling right, because it's so straightforward, what if somebody else finds your code in half a year and starts "optimizing" things? What if they reuse an old blob, or mess with your magic array padding? Every single off-by-one error in index manipulation will corrupt your data, and that might not be detected right away (you are writing unit tests, aren't you?).
It restricts you to only a handful of possible interactions. This might actually look like a demerit, but it's not! It ensures you won't be using your blob as a local temporary variable after you're done with it. It ensures you won't try to make a String out of it or anything silly. It makes sure you'll only use it as a blob. Again, clarity and safety.
It's already written and always looks the same. You don't have to write a new implementation for every project, or read ten different implementations in ten different projects. If you'll ever see a SerialBlob in anyone's project, the usage will be clear to you. Everyone uses the same one.
TL;DR: A few years ago (or maybe still in C), using a byte[] would be OK. In Java (and OOP in general), try to use a specific class designed for the job instead of a primitive (low-level) structure, as it more clearly describes your intent, produces fewer errors and reduces the length of your code in the long run.
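As a rough illustration of the points above, here is a minimal sketch of reading part of a blob through SerialBlob instead of indexing the array by hand. The class name BlobDemo and the helper readRange are made up for this example; SerialBlob and its getBytes(long pos, int length) method come from javax.sql.rowset.serial. Note that blob positions are 1-based.

```java
import java.sql.SQLException;
import java.util.Arrays;
import javax.sql.rowset.serial.SerialBlob;

class BlobDemo {
    // Hypothetical helper: copies `length` bytes starting at the
    // 1-based blob position `pos`.
    static byte[] readRange(byte[] raw, long pos, int length) {
        try {
            SerialBlob blob = new SerialBlob(raw); // copies the array
            return blob.getBytes(pos, length);
        } catch (SQLException e) { // SerialException is a subclass
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = {10, 20, 30, 40, 50};
        // Position 2 is the second byte, so this yields 20, 30, 40.
        System.out.println(Arrays.toString(readRange(raw, 2, 3)));
    }
}
```

The blob works on its own copy of the data, so later changes to the original array do not leak into it.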

Related

What are some real use cases for the methods mark and reset in BufferedReader?

I'm trying to find out, what are the methods mark() and reset() of BufferedReader really useful for?
I understand what they are doing, but for going back and forth in some text I never used them - usually I solve this problem by reading either a sequence of chars or the whole line in an array or StringBuilder and go back and forth through it.
I believe there must be some reason why these methods are present in BufferedReader and the other Reader implementations that support them, but I can't work out what it is.
Does the usage of mark() & reset provide some benefit compared to reading the data in our own array and navigating through it?
I've searched through the codebase of one of the large projects I'm working on (mainly a Java backend using Spring Boot), with lots of dependencies on the classpath, and the only thing for which the mark & reset methods were used (in only very few libraries) was skipping an optional BOM character at the beginning of a text file. And even for this simple use case, I find it a bit contrived to do it that way.
Also, I searched other tutorials and Stack Overflow (e.g. What are mark and reset in BufferedReader?) and couldn't find any explanation of why one would actually solve these kinds of problems using mark & reset. All code examples only explain what the methods do on "hello world" examples (jumping from one position in the stream back to a previous position for no particular reason). Nowhere could I find an explanation of why someone should actually use them over other approaches that sound more elegant and don't really perform worse.

I haven't used them myself, but a case that springs to mind is where you want to copy the data into a structure that needs to be sized correctly.
When reading streams and copying data into a target data structure (perhaps after parsing it), you always have the problem that you don't know how big to make your target in advance. The mark/reset feature lets you mark, read the stream, parse it quickly to calculate the size, reset, allocate the memory, and then re-parse, copying the data this time. There are of course other ways of doing it (e.g., using your own dynamic buffer), but if your code is already centered around the Reader concept then mark/reset lets you stay with that.
That said, even BufferedReader's own readLine method doesn't use this technique (it creates a StringBuffer internally).
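The two-pass idea described above could look roughly like the sketch below. The class MarkResetDemo, the method readLines and the maxChars parameter are made up for this example; maxChars is an assumed upper bound on the input size, because reset() may fail once more characters than the read-ahead limit have been consumed after mark().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class MarkResetDemo {
    // Reads all remaining lines into an exactly sized array in two passes.
    static String[] readLines(BufferedReader reader, int maxChars) {
        try {
            reader.mark(maxChars);              // remember the current position
            int count = 0;
            while (reader.readLine() != null) { // pass 1: only count the lines
                count++;
            }
            reader.reset();                     // jump back to the mark
            String[] lines = new String[count]; // allocate the exact size
            for (int i = 0; i < count; i++) {   // pass 2: copy the data
                lines[i] = reader.readLine();
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BufferedReader reader = new BufferedReader(new StringReader("a\nbb\nccc"));
        String[] lines = readLines(reader, 64);
        System.out.println(lines.length); // 3
    }
}
```

The cost is that the reader has to buffer everything between mark() and reset(), so this only suits inputs with a known, modest size bound.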

Understanding the importance of serialisation in Java

I was just introduced to the concept of serialisation in Java and while I 'get' the fundamentals, I can't help but feel like it's a bit of an overkill. My logic is that if I have pointers to the objects and I know how many bytes they take up in memory, why can't I just theoretically write these bytes to some txt file, along with some extra bytes to indicate the type? With this, can't I just read these bytes back and restore my original object?
The amount of detail my book goes into serialisation is giving me a good indication that I'm not really understanding the importance of this and that there is probably something more subtle than just writing out all the bytes exactly as they are. Any help is greatly appreciated! (I have some background in c++ if that helps)

Why can't I just theoretically write these bytes to some txt file, along with some extra bytes to indicate the type? With this, can't I just read these bytes back and restore my original object?
How could anyone ever read them back in? Say I'm writing code that's supposed to read in your file. Please tell me what the third byte means so that I can decode it properly.
What if the internal representation of the object contains pointers to other objects that might be in different memory locations the next time the program runs? For example, it is quite common to manage identical strings by having internal references to the same internal string object. How will writing that reference to a file be sensible given that the internal string object may not exist in the next run?
To write data to a file, you need to write it out in some specific format that actually contains all the information you need to be able to read back in. What happens to work internally for this program at this time just won't do as there's no guarantee another program at another time can make sense of it.

What you suggest works provided:
the order and type of fields doesn't change. Note this is not set at compile time.
the byte order doesn't change.
you don't have any references, e.g. no String, enum, List or Map.
the name and package of the type don't change.
We at Chronicle use a form of serialization which supports this, as it's much faster, but it's very limiting. You have to be very aware of those limitations and have a problem which is suitable. We also have a form of serialization which has none of these constraints, but it is slower.
The purpose of Java Serialization is to support arbitrary object graphs even if data is exchanged between systems which might arrange the data differently.
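To make the point about references and object graphs concrete, here is a minimal sketch using the standard ObjectOutputStream and ObjectInputStream. The class GraphDemo and the helper roundTrip are made up for this example. A list holds the same array twice; after a round trip the copy still contains one shared array, even though its memory address has nothing to do with the original.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

class GraphDemo {
    // Serializes an object graph to bytes and reads it back.
    static Object roundTrip(Object graph) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(graph);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] shared = {1, 2, 3};
        List<Object> graph = new ArrayList<>();
        graph.add(shared);
        graph.add(shared); // second reference to the same array

        List<?> copy = (List<?>) roundTrip(graph);
        System.out.println(copy.get(0) == copy.get(1)); // true: sharing survives
        System.out.println(copy.get(0) == shared);      // false: it is a new object
    }
}
```

Dumping raw memory would have written the array's address twice, which means nothing in the next run; the serialization format records "same object as before" instead.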

Is there a way to output variables as Java sees them so I can isolate methods for testing or asking for assistance?

I'm learning Java and find myself sending methods around while asking for help, but my problem is that I have many methods and the data is modified in each of them. I often have to send large files when only one area is relevant (it makes my SO questions excessively long as well).
But for some of the stuff I do, I can't get the data output as a string in a format that I can feed back in later. For example, if I add data to a list of Points (like this: new Point(0, 0)), then when I output the results I get something like this (with sample data):
[java.awt.Point[x=970,y=10], java.awt.Point[x=65,y=10], java.awt.Point[x=729,y=10]]
I get errors when I assign this to a variable and send it to my method I want to test/show. I basically have two goals:
If I want help on a single method (that's part of a much larger class), I want to be able to send the least amount of code to the person helping me (ideally just the method itself and the inputs, which I'm unable to capture exactly right now).
When I test my code, I would like a way to isolate a method so I don't have to run a large file when all I care about is improving one method.
I am pretty sure I'm not the first person to come across this problem. How can I approach this?
UPDATE: Here's an example,
double[] data = new double[] {.05, .02, -.03, .04, .01};
System.out.println(data); // output is: [D@13fcf0ce
If I make a new variable of this and send it to a method I get errors. I have 30 methods in a class. I want to have a friend help me with one. I'm trying to avoid sending 29 methods that are irrelevant to the person. So I need a way to capture the data, but printing it doesn't seem to capture it in a way I can send to methods.

Java outputs variables in a way that is human-readable (although it depends on the object's toString method). The output of toString is (unsurprisingly) a String. Unless you have a parsing mechanism to turn a string back into the original object, it's a one-way operation.
There should be no need to turn it back into the original object, however. If you're trying to isolate a function and sample data, the easiest thing to do is encapsulate it in a test and some data--there are many different ways to do this and communicate it to someone else.
I'm still unclear on your use case, however. If it's an SO question, all you should need to do is show the code in question, provide a minimal amount of data that shows the problem, and you're done. This could be done in a self-contained example where you simply create the data in code, as a unit test, or by showing the string output as you've already done.
If you're trying to communicate the issue to a tech support tier, then the best mechanism depends entirely on what they're equipped to handle--they'll tell you if you didn't do it right, believe me.
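A self-contained example of the kind mentioned above might look like the sketch below: the sample data is created in code and a single method is called on it. The class IsolatedExample and the method mean are hypothetical stand-ins for the one method you want help with. For arrays, java.util.Arrays.toString gives a readable printout instead of the default type-and-hash form.

```java
import java.util.Arrays;

class IsolatedExample {
    // Hypothetical method under test; replace with the method in question.
    static double mean(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // The input is written as a Java literal, so anyone can paste and run it.
        double[] data = {.05, .02, -.03, .04, .01};
        System.out.println(Arrays.toString(data)); // [0.05, 0.02, -0.03, 0.04, 0.01]
        System.out.println(mean(data));
    }
}
```

Sending this one file is enough; the other 29 methods never need to leave your machine.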

You can use a debugger and step through your code. You can 'watch' variables so that you get their actual value, rather than their toString representation. Debuggers are part and parcel of all the major IDEs, such as Eclipse, NetBeans and IntelliJ.
As to your questions about isolation and testing, this is much more of a design problem. Ideally your methods should be self-contained, reducing coupling. What you could do is learn to break down your problem into smaller chunks until it can't be broken down any further. Once you do this, you start building methods which tackle each part of the problem separately.
Once you have your method, you test it on its own (thus reducing the amount of things which can go wrong, as opposed to testing tons of code at once). If you are satisfied, you integrate the method with your code and test again. If something then goes wrong, you will know that your last module is the problem since it broke your system. You can get some more information about this here.

Lightweight way to persist objects in Java

I'm trying to design a lightweight way to store persistent data in Java. I've already got a very efficient way to serialize POJOs to DataOutputStreams (and back), but I'm trying to think of a good way to ensure that changes to the data in the POJOs gets serialized when necessary.
This is for a client-side app where I'm trying to keep the size of the eventual distributable as low as possible, so I'm reluctant to use anything that would pull-in heavy-weight dependencies. Right now my distributable is almost 10MB, and I don't want it to get much bigger.
I've considered DB4O but it's too heavy - I need something light. Really it's probably more a design pattern I need, rather than a library.
Any ideas?

The 'lightest weight' persistence option will almost surely be simply marking some classes Serializable and reading/writing from some fixed location. Are you trying to accomplish something more complex than this? If so, it's time to bundle hsqldb and use an ORM.
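A minimal sketch of the "mark it Serializable and write it to a fixed location" option, using only the standard library. The names SettingsStore and Settings, and the fields user and fontSize, are invented for this example.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class SettingsStore {
    // Hypothetical POJO; any Serializable class works the same way.
    static class Settings implements Serializable {
        private static final long serialVersionUID = 1L;
        String user;
        int fontSize;
    }

    static void save(Settings settings, Path file) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(settings);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Settings load(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (Settings) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed location for the demo; a real app would pick its own fixed path.
        Path file = Paths.get(System.getProperty("java.io.tmpdir"), "settings-demo.ser");
        Settings s = new Settings();
        s.user = "alice";
        s.fontSize = 12;
        save(s, file);
        System.out.println(load(file).user); // alice
    }
}
```

This adds no dependencies to the distributable, at the price of rewriting the whole object on every save.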
If your users are tech savvy, or you're just worried about initial payload, there are libraries which can pull dependencies at runtime, such as Grape.

If you already have a compact data output format in bytes (which I assume you have if you can persist efficiently to a DataOutputStream) then an efficient and general technique is to use run-length-encoding on the difference between the previous byte array output and the new byte array output.
Points to note:
If the object has not changed, the difference in byte arrays will be an array of zeros and hence will compress very small....
For the first time you serialize the object, consider the previous output to be all zeros so that you communicate a complete set of data
You probably want to be a bit clever when the object has variable-sized substructures....
You can also try zipping the difference rather than RLE - might be more efficient in some cases where you have a large object graph with a lot of changes
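The technique in the points above can be sketched as follows. Two things here are assumptions of this example and not part of the answer: the "difference" is taken to be a byte-wise XOR (which is zero wherever nothing changed and is its own inverse), and the run-length format is a simple sequence of (runLength, value) byte pairs with runs capped at 255. Both arrays must have the same length; variable-sized data would need the extra care the answer mentions. The class name DeltaRle is made up.

```java
import java.io.ByteArrayOutputStream;

class DeltaRle {
    // XOR of two equally sized arrays: zero wherever the bytes are unchanged.
    // Applying it again to `previous` and the diff yields `current`.
    static byte[] diff(byte[] previous, byte[] current) {
        if (previous.length != current.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        byte[] d = new byte[current.length];
        for (int i = 0; i < d.length; i++) {
            d[i] = (byte) (previous[i] ^ current[i]);
        }
        return d;
    }

    // Encodes data as (runLength, value) pairs, runLength in 1..255.
    static byte[] encode(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < data.length) {
            int run = 1;
            while (i + run < data.length && run < 255 && data[i + run] == data[i]) {
                run++;
            }
            out.write(run);
            out.write(data[i]);
            i += run;
        }
        return out.toByteArray();
    }

    // Reverses encode().
    static byte[] decode(byte[] encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length; i += 2) {
            int run = encoded[i] & 0xFF; // run length is stored unsigned
            for (int j = 0; j < run; j++) {
                out.write(encoded[i + 1]);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] previous = new byte[1000];
        byte[] current = previous.clone();
        current[500] = 7; // a single changed byte
        byte[] packed = encode(diff(previous, current));
        System.out.println(packed.length + " bytes instead of " + current.length);
    }
}
```

To restore the new state, decode the stored delta and call diff(previous, delta) again.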

In Java: is there a way to create a subarray that will point to a portion of a bigger array?

Learning Java, so be gentle please. Ideally I need to create an array of bytes that will point to a portion of a bigger array:
byte[] big = new byte[1000];
// C-style code starts
load(file,big);
byte[100] sub = big + 200;
// C-style code ends
I know this is not possible in Java and there are two work-arounds that come to mind and would include:
Either copying a portion of big into sub by iterating through big.
Or writing my own class that takes a reference to big + offset + size and implements the "subarray" through accessor methods, using big as the actual underlying data structure.
The task I am trying to solve is to load a file into memory and then gain read-only access to the records stored within the file through a class. Speed is paramount, hence ideally I'd like to avoid copying or accessor methods. And since I'm learning Java, I'd like to stick with it.
Any other alternatives I've got? Please do ask questions if I didn't explain the task well enough.

Creating an array as a "view" of another array is not possible in Java. But you could use java.nio.ByteBuffer, which is basically the class you suggest in work-around #2. For instance:
ByteBuffer subBuf = ByteBuffer.wrap(big, 200, 100).slice().asReadOnlyBuffer();
No copying involved (some object creation, though). As a standard library class, I'd also assume that ByteBuffer is more likely to receive special treatment wrt. "JIT" optimizations by the JVM than a custom one.
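A slightly fuller sketch of the one-liner above shows that the buffer really is a view and not a copy. The class SubArrayView and the helper view are made-up names; the ByteBuffer calls are the standard java.nio ones.

```java
import java.nio.ByteBuffer;

class SubArrayView {
    // Read-only view of big[offset .. offset + length); no bytes are copied.
    static ByteBuffer view(byte[] big, int offset, int length) {
        return ByteBuffer.wrap(big, offset, length).slice().asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        byte[] big = new byte[1000];
        big[200] = 42;
        ByteBuffer sub = view(big, 200, 100);

        System.out.println(sub.get(0));      // 42: index 0 of the view is big[200]
        big[201] = 7;                        // change the backing array afterwards
        System.out.println(sub.get(1));      // 7: the view sees the change
        System.out.println(sub.remaining()); // 100
    }
}
```

Any attempt to write through the view, such as sub.put(0, (byte) 1), throws ReadOnlyBufferException.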

If you want to read a file fast and with low-level access, check the java nio stuff. Here's an example from java almanac.
You can use a mapped byte buffer to navigate within the file content.

Take a look at the source for java.lang.String (it'll be in the src.zip or src.jar). In older JDK versions (before Java 7u6) you will see that it has an array of chars plus an offset and a count. So, yes, the solution is to use a class to do it.
Here is a link to the source online.
The variables of interest are:
value
offset
count
substring is probably a good method to look at as a starting point.
If you want to read directly from the file, make use of the java.nio.channels.FileChannel class, specifically the map() method - that will let you use memory-mapped I/O, which will be very fast and use less memory than copying to arrays.
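A minimal sketch of the FileChannel.map() approach follows. The class MappedRecords and both helper names are invented for this example, and the temp file merely stands in for a real data file. The mapping stays valid after the channel is closed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedRecords {
    // Maps the whole file read-only; the content is not copied into a byte[].
    static MappedByteBuffer mapReadOnly(Path file) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: writes `content` to a temp file, maps it, and reads one byte.
    static byte writeThenReadMapped(byte[] content, int index) {
        try {
            Path file = Files.createTempFile("records", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, content);
            return mapReadOnly(file).get(index); // absolute get, no position change
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeThenReadMapped(new byte[] {1, 2, 3, 4}, 2)); // 3
    }
}
```

Records can then be exposed as slices of the mapped buffer, in the same way as with a buffer wrapped around an array.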
