Casting vs Parsing vs Serialization in Java: What are the differences? - java

These 3 terms all deal with converting from one form to another, which makes them seem similar and confusing. In general, which unique features make them distinct? In what situations should each be used?

They are similar in that all 3 deal with converting data from one representation to another (almost; casting reference types is a bit special).
1. Casting
In Java casting does two different things, depending on whether you're casting references or primitive values:
Casting a reference simply changes the type of the reference; it does not change anything about the object. For example:
Object a = "a string constant";
String b = (String) a;
After running this code both a and b will point to the exact same object (of type String representing the value "a string constant"). The difference is just that a is an Object type reference and b is a String type reference. This limits what you can call (so a.length() won't compile, but b.length() will work).
Casting a reference type will only succeed when the object being referenced is actually of a compatible type. So if a was initialized as new Object() in the code block above, then the cast on the second line would fail with a ClassCastException.
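To make the reference-cast behaviour concrete, here is a minimal sketch. The class and method names (ReferenceCastDemo, lengthIfString, castToStringFails) are made up for illustration; only instanceof, the cast itself and ClassCastException come from the language.

```java
class ReferenceCastDemo {
    // Returns the length if o is a String, or -1 otherwise.
    static int lengthIfString(Object o) {
        if (o instanceof String) {
            String s = (String) o; // safe: the check above guarantees the runtime type
            return s.length();
        }
        return -1;
    }

    // Returns true if casting o to String throws a ClassCastException.
    static boolean castToStringFails(Object o) {
        try {
            String s = (String) o;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object a = "a string constant";
        String b = (String) a;
        System.out.println(a == b);                           // true: same object, two reference types
        System.out.println(lengthIfString(a));                // 17
        System.out.println(castToStringFails(new Object()));  // true
    }
}
```

Checking with instanceof before casting is the usual way to avoid the exception when the runtime type is not known in advance.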
casting a primitive type does potentially change the value in question, depending on the range and resolution of the target type:
int i = 1000;
char c = (char) i;
byte b = (byte) i;
Here the int value 1000 is cast both to char and to byte. The first cast just leaves c equal to 1000, because char can hold values up to 65535. But byte can't hold the value 1000 (its range is -128 to 127), so only the lowest 8 bits are kept and the result is -24.
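The narrowing described above can be checked with a short sketch. The class and helper names here are hypothetical; the casts themselves are plain Java.

```java
class PrimitiveCastDemo {
    // Narrowing to byte keeps only the low 8 bits (interpreted as signed).
    static byte toByte(int i) {
        return (byte) i;
    }

    // Narrowing to char keeps only the low 16 bits (interpreted as unsigned).
    static char toChar(int i) {
        return (char) i;
    }

    public static void main(String[] args) {
        int i = 1000;                         // binary 11 1110 1000
        System.out.println((int) toChar(i));  // 1000: fits in 16 bits
        System.out.println(toByte(i));        // -24: low 8 bits are 1110 1000
        System.out.println((int) 3.99);       // 3: floating point to int drops the fraction
    }
}
```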
2. Parsing
Parsing is about converting textual data to a more specific representation. The simplest example of parsing is something like this:
String s = "1000";
int i = Integer.parseInt(s);
s holds the textual representation of the number 1000, i.e. the Unicode characters U+0031 U+0030 U+0030 U+0030. Integer.parseInt takes that text representation and converts it into an int type.
However, parsing can describe a wide variety of processes, ranging from simple ones as above, through slightly more complex ones like parsing a decimal number or a date, up to building arbitrarily complex object trees.
As an example: The Java compiler will parse the Java source code and convert it into an internal representation that will then be further processed.
According to some definitions parsing can also apply to non-text inputs, as long as the input is some set of symbols (which could just be bytes), but that interpretation is rather rare.
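A few of the simpler parsing cases mentioned above might look like this. This is a sketch only; the class name and the helpers isInt and parseIsoDate are invented for the example, while Integer.parseInt, Double.parseDouble and LocalDate.parse are standard JDK methods.

```java
import java.time.LocalDate;

class ParsingDemo {
    // Text that is not a valid int makes Integer.parseInt throw NumberFormatException.
    static boolean isInt(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // LocalDate.parse expects ISO-8601 text such as "2016-03-15".
    static LocalDate parseIsoDate(String s) {
        return LocalDate.parse(s);
    }

    public static void main(String[] args) {
        System.out.println(Integer.parseInt("1000") + 1);          // 1001
        System.out.println(Double.parseDouble("1234.567"));        // 1234.567
        System.out.println(parseIsoDate("2016-03-15").getYear());  // 2016
        System.out.println(isInt("12a"));                          // false
    }
}
```

In every case the input is text and the output is a typed value, and invalid text is reported with an exception rather than silently converted.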
3. Serialization
Serialization is the process of turning data or program state into something that can easily be stored or transferred. Usually that means into a byte stream (or more directly, a byte[]).
Similarly to parsing, serialization can apply to anything from very simple one-value transformations up to serializing whole object trees and writing them to files.
In Java Serialization usually refers to the mechanism surrounding ObjectOutputStream and ObjectInputStream, but the term is also used to describe the general concept (i.e. other formats can also be described as "serialization").
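As a rough illustration of the ObjectOutputStream / ObjectInputStream mechanism, the following sketch turns an object into a byte[] and back. The class name and the two helper methods are assumptions made for this example; wrapping the checked exceptions is just a convenience.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;

class SerializationDemo {
    // Object -> bytes
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Bytes -> object (the caller has to cast the result back to the expected type)
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> original = new ArrayList<>(Arrays.asList("a", "b"));
        byte[] data = serialize(original);
        Object copy = deserialize(data);
        System.out.println(copy.equals(original)); // true: equal content
        System.out.println(copy == original);      // false: a new object was created
    }
}
```

Note that deserialization ends with a reference cast, so all three concepts can show up in the same piece of code.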

Related

How to distinguish long and double-values when deserializing with moshi?

My goal is to synchronize arbitrary rows of data by using the JSON format.
As I do not know the exact scheme for the rows (it is a general sync method), my datamodel apparently has to rely on "Object". So in Java I will have an array of Map<String,Object> to be synchronized with the server.
Translating such a row into JSON would give something like
{{"string":"stringvalue"},{"double1":1234.567},{"double2":1234.0},{"long":1234}}
so far, so good - no problem with moshi - everything works as expected.
Now the Problem: When I try to deserialize that JSON with moshi, I get back a double-value for the "long" member. Moshi converts all numbers to Doubles. But unfortunately not all numbers can be safely converted to doubles. Very big integers (aka longs) have a problem with the limited precision of doubles. And rounding-effects also might exist.
I opened an issue with moshi, but unfortunately that was closed. Maybe I wasn't clear enough. (Issue 192)
JSON has no concept of integer - only numbers and Strings. But the subtle detail from "double2" from the example above might lead to a solution for my problem:
If a number does not contain a decimal-point, it is an integer and should be converted to a long.
As longs can not be losslessly converted to doubles, I need a method to intercept the parser before the value is converted to double. But how to do that?
Moshi has this handy concept of JsonAdapters - but unfortunately I currently do not see how I can use them in this case:
The input type of such a JsonAdapter would have to be Object because I cannot cast a generated double to long. So I have to intercept the parser before it converts any value.
But how to return more than one type from there? (I would have to return String, Double or Long from there - or if I can limit the inputs to only numbers I would at least have to return Longs or Doubles.)
(My backend is written in PHP and automatically produces the desired output: Integers are written without a decimal-point.)
I am afraid it's not possible without changing the Moshi source code. The JSON string source passes through JsonReader, which converts all numbers to double. I could not find a way to alter this behavior since all subclasses are package-private.
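Independent of Moshi, the rule proposed in the question ("no decimal point means long") can be sketched in plain Java. Everything named here (JsonNumberDemo, parseJsonNumber, survivesDouble) is hypothetical and is not part of Moshi's API; it only shows the decision on the raw number text and why a round trip through double is lossy.

```java
class JsonNumberDemo {
    // Hypothetical helper: choose Long or Double from the number's textual form.
    static Number parseJsonNumber(String literal) {
        boolean looksIntegral = literal.indexOf('.') < 0
                && literal.indexOf('e') < 0
                && literal.indexOf('E') < 0;
        if (looksIntegral) {
            try {
                return Long.parseLong(literal);
            } catch (NumberFormatException e) {
                // does not fit in a long: fall back to double below
            }
        }
        return Double.parseDouble(literal);
    }

    // True if the long value is unchanged after a round trip through double.
    static boolean survivesDouble(long value) {
        return (long) (double) value == value;
    }

    public static void main(String[] args) {
        System.out.println(parseJsonNumber("1234").getClass().getSimpleName());   // Long
        System.out.println(parseJsonNumber("1234.0").getClass().getSimpleName()); // Double
        System.out.println(survivesDouble(1234L));             // true
        System.out.println(survivesDouble(9007199254740993L)); // false: 2^53 + 1 is not representable
    }
}
```

The catch is that this decision needs access to the number as text, which is exactly the point at which the question wants to intercept the parser.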

Vertx: My JsonArray consisting of Long values becomes suddenly a mix of Integer- and Long-Values?

I have following JsonArray consisting of Long Values:
[1234567873,852369471,9517,789 ,4826,96127435]
Long , Long ,Long,Long,Long, Long
After sent over the eventbus there is a JsonArray consisting of Integers and Longs:
[1234567873,852369471,9517 ,789 ,4826 ,96127435]
Long , Long ,Integer,Integer,Integer,Long
Obviously Vertx downsizes small Long numbers to save memory, which is why I get a ClassCastException (cannot cast from Integer to Long) when I try the following code:
List<Long> collect = jsonArray.stream().map(element -> (Long) element).collect(Collectors.toList());
In contrast, the following code works:
for (int jsonArrayIndex = 0; jsonArrayIndex < jsonArray.size(); jsonArrayIndex++) {
    Long longValue = jsonArray.getLong(jsonArrayIndex);
}
How can this work?
JSON does not have a concept of Long or Integer, only the much more general concept of number (see the JSON spec at json.org). Therefore it is up to your code to decide what object (i.e. Integer or Long) to parse a particular JSON number into.
By using jsonArray.stream() (which returns a Stream<Object>) you are allowing the Vertx library to decide which object to parse each element into. In this case, it chooses the most appropriate type for each element one by one, i.e. some get parsed into a Long and some into an Integer. If you had any numbers like 1.5, these would likely be parsed into Double (you would need to check this though).
However, by using the jsonArray.getLong() method, you are telling Vertx that you have decided what object to parse each element into (a Long), and therefore it will not try to be clever by selecting the most appropriate type for each element. If you had any number like 1.5 here, this method would likely throw an exception (again, you would have to check this).
Even though you can't cast from Integer to Long (because of the class hierarchy: Integer would have to extend Long, and it doesn't), you can convert from int to long, which is probably what this method does behind the scenes.
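The difference between casting the boxed type and converting the numeric value can be reproduced without Vertx. The sketch below uses a plain List in place of the JsonArray; the class and method names are invented, and going through Number.longValue() is only an assumption about what getLong might do internally.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class NumberWideningDemo {
    // Works for any mix of Integer and Long, because both extend Number.
    static List<Long> toLongs(List<?> values) {
        return values.stream()
                .map(element -> ((Number) element).longValue())
                .collect(Collectors.toList());
    }

    // Returns true if a direct reference cast to Long throws ClassCastException.
    static boolean castToLongFails(Object element) {
        try {
            Long ignored = (Long) element;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Object> mixed = Arrays.asList(1234567873L, 9517, 789, 96127435L);
        System.out.println(castToLongFails(mixed.get(1))); // true: the element is an Integer
        System.out.println(toLongs(mixed));                // [1234567873, 9517, 789, 96127435]
    }
}
```

Replacing (Long) element with ((Number) element).longValue() in the stream version from the question should therefore avoid the exception.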

why do i have to use Integer.parseInt?

I am new to Java so forgive me if this is a silly question.
First I did the following (this is a section from my code):
public static void team()
{
int score = JOptionPane.showInputDialog("Enter the team score...");
calculate(score);
}
But I kept getting an error saying: "Incompatible types string cannot be converted to int".
So I thought I may need to use parsing. I did the following and it worked:
public static void team()
{
int myScore;
String score = JOptionPane.showInputDialog("Enter the team score...");
myScore = Integer.parseInt(score);
calculate(myScore);
}
I would like to know why there is a need for me to parse the variable "score". In the first piece of code, I declared the variable score as an int, so would it not store the input (which I expect to be an integer) in score, which is then passed into the method calculate? I understand that parsing is when you convert a String into an int. However in this case, I do not expect the user to input a string, so why is there a need for parsing?
The simple reason is: because JOptionPane.showInputDialog() returns a String, regardless of what you actually want the user to input.
There's no mechanism in Java to do automatic conversion based on the type of your destination variable myScore (though some other languages can do this, e.g. implicit conversions in Scala).
You could (in principle) create a custom dialog that returns an int , e.g. by getting the user to choose from a pulldown list of numbers, and then no parsing would be needed. But otherwise, you have to do parsing explicitly.
The call JOptionPane.showInputDialog("Enter the team score..."); returns a String, which you tried to store in an int. That is why you are getting the "incompatible types" error.
See Javadocs for more information.
Because a String is stored differently than an integer. Integers are stored in binary, whereas a String is a collection of characters.
Example:
The number 14 would be stored as 1110 in Binary. However, as a String, it's stored as 00110001 00110100
Take a look at this http://www.roubaixinteractive.com/PlayGround/Binary_Conversion/The_Characters.asp
Integer.parseInt() basically does the dirty work for you: it looks at each character in turn and, as long as they all correspond to valid digits, converts the text into binary form to be stored in an int.
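To show the idea, here is a deliberately simplified hand-written version of that conversion. It is not the JDK's actual implementation: parseDigits is a made-up name, and the sketch ignores signs and overflow.

```java
class ManualParseDemo {
    // Simplified sketch: non-negative decimal digits only, no overflow check.
    static int parseDigits(String s) {
        if (s.isEmpty()) {
            throw new NumberFormatException("empty input");
        }
        int result = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                throw new NumberFormatException("not a digit: " + c);
            }
            // '0'..'9' have consecutive character codes, so c - '0' is the digit's value
            result = result * 10 + (c - '0');
        }
        return result;
    }

    public static void main(String[] args) {
        int n = parseDigits("14");
        System.out.println(Integer.toBinaryString(n));   // 1110
        System.out.println(Integer.toBinaryString('1')); // 110001, the character code of '1'
        System.out.println(Integer.toBinaryString('4')); // 110100, the character code of '4'
    }
}
```

In real code you would of course keep using Integer.parseInt and catch NumberFormatException for input that is not a number.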

Can all tag numbers be extracted from a given protobuf serialization?

Given a protobuf serialization is it possible to get a list of all tag numbers that are in the message? Generally is it possible to view the structure of the message without the defining .proto files?
Most APIs will indeed have some form of reader-based API that allows you to enumerate a raw protobuf stream. However, that by itself is not enough to fully understand the data, since without the schema the interpretation is ambiguous:
a varint could be zig-zag encoded (sint32/sint64), or not (int32/int64/uint32/uint64) - radically changing the meaning, or a boolean, or an enum
a fixed-32/fixed-64 could be a signed or unsigned integer, or could be an IEEE754 float/double
a length-prefixed chunk could be a UTF-8 string, a BLOB, a sub-message, or a "packed" repeated set of primitives; if it is a sub-message, you'll have to repeat recursively
So... yes and no. Certainly you can get the field numbers of the outermost message.
Another approach would be to use the regular API against a type with no members (message Naked {}), and then query the unexpected data (i.e. all of it) via the "extension" API that many implementations provide.
You can get all the tag numbers which appear in one particular message, but you won't get any nested messages - and you won't know the types of those fields, only some subset of possible types.
If you look at the wire encoding, you can see that (for example) byte arrays, strings and nested messages are all encoded the same way - so you may know that "field 3 is a length-prefixed binary field" but you won't know whether that means it's a nested message, a string or a byte array.
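For the outermost message, the field numbers can be listed by walking the wire format by hand. The following is a minimal sketch that uses no protobuf library. ProtoTagScanner and its methods are invented names; it assumes well-formed input and does not handle the deprecated group wire types (3 and 4).

```java
import java.util.ArrayList;
import java.util.List;

class ProtoTagScanner {
    private final byte[] buf;
    private int pos;

    private ProtoTagScanner(byte[] buf) {
        this.buf = buf;
    }

    // Lists the field numbers of the top-level message, in order of appearance.
    static List<Integer> fieldNumbers(byte[] message) {
        ProtoTagScanner scanner = new ProtoTagScanner(message);
        List<Integer> numbers = new ArrayList<>();
        while (scanner.pos < scanner.buf.length) {
            long key = scanner.readVarint();
            numbers.add((int) (key >>> 3));     // upper bits: field number
            scanner.skipValue((int) (key & 7)); // lower 3 bits: wire type
        }
        return numbers;
    }

    // Base-128 varint: 7 payload bits per byte, high bit set means "more bytes follow".
    private long readVarint() {
        long value = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            byte b = buf[pos++];
            value |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
        }
        throw new IllegalArgumentException("varint too long");
    }

    private void skipValue(int wireType) {
        switch (wireType) {
            case 0: readVarint(); break;   // varint
            case 1: pos += 8; break;       // fixed 64-bit
            case 2: {                      // length-prefixed: string, bytes, sub-message or packed
                int length = (int) readVarint();
                pos += length;
                break;
            }
            case 5: pos += 4; break;       // fixed 32-bit
            default:
                throw new IllegalArgumentException("unsupported wire type " + wireType);
        }
    }

    public static void main(String[] args) {
        // Field 1 = varint 150, field 2 = length-prefixed "testing"
        byte[] message = {0x08, (byte) 0x96, 0x01, 0x12, 0x07,
                          0x74, 0x65, 0x73, 0x74, 0x69, 0x6e, 0x67};
        System.out.println(fieldNumbers(message)); // [1, 2]
    }
}
```

As the answers point out, this only yields field numbers and wire types. Whether the 7 bytes of field 2 are a string, a byte array or a nested message cannot be decided from the bytes alone.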

Java: String.substring() with long type parameters

I have a large string (an RSS Article to be more precise) and I want to get the word in a specific startIndex and endIndex. String provides the substring method, but only using ints as its parameters. My start and end indexes are of type long.
What is the best way to get the word from a String using start and end indexes of type long?
My first solution was to start trimming the String and get it down so I can use ints. Didn't like where it was going. Then I looked at Apache Commons Lang but didn't find anything. Any good solutions?
Thank you.
Update:
Just to provide a little more information.
I am using a tool called General Architecture for Text Engineering (GATE) which scans a String and returns a list of Annotations. An annotation holds a type of a word (Person, Location, etc) and the start and end indexes of that word .
For the RSS, I use ROME, which reads an RSS feed and contains the body of the article in a String.
There is no point doing this on a String because a String can hold at most 2^31 - 1 characters. Internally the string's characters are held in a char[], and all of the API methods use int as the type for lengths, positions and offsets.
The same restriction applies to StringBuffer and StringBuilder; i.e. an int length.
A StringReader is backed by a String, so that won't help.
Both CharBuffer and ByteBuffer have the same restriction; i.e. an int length.
A bare array of a primitive type is limited to an int length.
In short, you are going to have to implement your own "long string" type that internally holds its characters in (for example) an array of arrays of characters.
(I tried a Google search but I couldn't spot an existing implementation of long strings that looked credible. I guess there's not a lot of call for monstrously large strings in Java ...)
By the way, if you anticipate that the strings are never going to be this large, you should just convert your long offsets to int. A cast would work, but you might want to check the range and throw an exception if you ever get an offset >= 2^31.
A String is backed by a char[], and arrays can only be indexed with ints (and can consequently only hold 2^31 - 1 characters). If you have long indexes, just cast them to ints - if they're larger than Integer.MAX_VALUE, your program is broken.
You'd better use a java.io.Reader. This class supports the methods skip(long n) and read(char[] cbuf). But please note that they return a count (skip returns a long, read returns an int) of how many characters were actually skipped / read, so you need to call those methods in a loop.
Probably it would be better not to use String but StringReader.
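For the common case where the offsets are declared as long but always fit in an int, a range-checked conversion is enough. A minimal sketch follows; the class and method names are made up, while Math.toIntExact is a standard method (Java 8 and later) that throws ArithmeticException on overflow.

```java
class LongSubstringDemo {
    // Converts the long offsets with a range check, then delegates to String.substring.
    static String substring(String text, long start, long end) {
        return text.substring(Math.toIntExact(start), Math.toIntExact(end));
    }

    public static void main(String[] args) {
        String article = "hello world";
        System.out.println(substring(article, 6L, 11L)); // world
        try {
            substring(article, 0L, 1L << 31);            // 2^31 does not fit in an int
        } catch (ArithmeticException e) {
            System.out.println("offset out of int range");
        }
    }
}
```

Compared with a bare (int) cast, this fails loudly instead of silently wrapping around to a negative index.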
