How to distinguish long and double-values when deserializing with moshi? - java

My goal is to synchronize arbitrary rows of data using JSON.
As I do not know the exact schema of the rows (it is a general sync method), my data model apparently has to rely on "Object". So in Java I will have an array of Map<String,Object> to be synchronized with the server.
Translating such a row into JSON would give something like
{"string":"stringvalue","double1":1234.567,"double2":1234.0,"long":1234}
So far, so good: no problem with Moshi, everything works as expected.
Now the problem: when I try to deserialize that JSON with Moshi, I get back a double value for the "long" member. Moshi converts all numbers to Doubles. But unfortunately not all numbers can be safely converted to doubles. Very big integers (aka longs) exceed the limited precision of doubles, so rounding effects can occur.
I opened an issue with moshi, but unfortunately that was closed. Maybe I wasn't clear enough. (Issue 192)
JSON has no concept of integer - only numbers and Strings. But the subtle detail from "double2" from the example above might lead to a solution for my problem:
If a number does not contain a decimal-point, it is an integer and should be converted to a long.
As not every long can be losslessly converted to a double, I need a way to intercept the parser before the value is converted to double. But how to do that?
Moshi has this handy concept of JsonAdapters, but unfortunately I currently do not see how I can use them in this case:
The input type of such a JsonAdapter would have to be Object, because I cannot cast a generated double back to a long. So I have to intercept the parser before it converts any value.
But how to return more than one type from there? (I would have to return String, Double or Long from there - or if I can limit the inputs to only numbers I would at least have to return Longs or Doubles.)
(My backend is written in PHP and automatically produces the desired output: Integers are written without a decimal-point.)

I am afraid it's not possible without changing the Moshi source code. The JSON source string passes through JsonReader, which converts all numbers to double. I could not find a way to alter this behavior, since all subclasses are package-private.
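For illustration, the rule proposed in the question (a literal without a decimal point is an integer) can be sketched independently of any JSON library. This is not Moshi API: the class NumberLiterals and its methods are hypothetical names, and how the raw number text is obtained from the parser is left open. Treating an exponent as a sign of a floating-point value is an additional assumption.

```java
import java.math.BigDecimal;

// Hypothetical helper, not part of Moshi: picks the Java type from the raw JSON number text.
class NumberLiterals {

    // Returns a Long for integer literals that fit into 64 bits, otherwise a Double.
    static Number parse(String literal) {
        boolean looksFloating = literal.indexOf('.') >= 0
                || literal.indexOf('e') >= 0
                || literal.indexOf('E') >= 0;
        if (!looksFloating) {
            try {
                return Long.valueOf(literal);
            } catch (NumberFormatException outOfRange) {
                // Integer literal outside the long range: fall back to double below.
            }
        }
        return Double.valueOf(literal);
    }

    // True if the long cannot be represented exactly as a double.
    static boolean losesPrecisionAsDouble(long value) {
        return new BigDecimal((double) value).compareTo(BigDecimal.valueOf(value)) != 0;
    }

    public static void main(String[] args) {
        System.out.println(parse("1234").getClass().getSimpleName());   // Long
        System.out.println(parse("1234.0").getClass().getSimpleName()); // Double
        System.out.println(losesPrecisionAsDouble(9007199254740993L));  // true (2^53 + 1)
    }
}
```

Whether such a rule is appropriate depends on the producer of the JSON; it only works if integers are reliably written without a decimal point, as described for the PHP backend in the question.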

Related

Casting vs Parsing vs Serialization in Java: What are the differences?

These 3 terms deal with the conversion from one form to another, which seems similar and confusing. In general, which unique features make them distinct? Under what situation(s) what should each be used?
They are similar in that all 3 deal in converting data from one representation to another one (almost, casting reference types is a bit special).
1. Casting
In Java casting does two different things, depending on whether you're casting references or primitive values:
Casting a reference simply changes the type of the reference; it does not change anything about the object. For example:
Object a = "a string constant";
String b = (String) a;
After running this code both a and b will point to the exact same object (of type String representing the value "a string constant"). The difference is just that a is an Object type reference and b is a String type reference. This limits what you can call (so a.length() won't work, but b.length() will work).
Casting a reference type will only succeed when the object being referenced is actually of a compatible type. So if a was initialized as new Object() in the code block above, then the cast on the second line would fail with a ClassCastException.
Casting a primitive type does potentially change the value in question, depending on the range and resolution of the target type:
int i = 1000;
char c = (char) i;
byte b = (byte) i;
Here the int value 1000 is cast both to char and to byte. The first cast just leaves c equal to 1000. But byte can't hold the value 1000, so it will be truncated to -24.
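Both kinds of cast can be tried out in a few lines. The following is a small sketch; the class CastingDemo and its method names are made up for illustration. The instanceof check is one common way to avoid the ClassCastException mentioned above.

```java
// Illustrative only: class and method names are not from any library.
class CastingDemo {

    // Reference cast guarded by instanceof, so no ClassCastException can occur.
    static int lengthIfString(Object o) {
        if (o instanceof String) {
            String s = (String) o; // same object, now seen through a String reference
            return s.length();
        }
        return -1;
    }

    // Narrowing primitive cast: only the low 8 bits of the int survive.
    static byte narrowToByte(int i) {
        return (byte) i;
    }

    // char is an unsigned 16-bit type, so 1000 fits without loss.
    static char narrowToChar(int i) {
        return (char) i;
    }

    public static void main(String[] args) {
        System.out.println(lengthIfString("a string constant")); // 17
        System.out.println(lengthIfString(new Object()));        // -1
        System.out.println(narrowToByte(1000));                  // -24
        System.out.println((int) narrowToChar(1000));            // 1000
    }
}
```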
2. Parsing
Parsing is about converting textual data to a more specific representation. The simplest example of parsing is something like this:
String s = "1000";
int i = Integer.parseInt(s);
s holds the textual representation of the number 1000, i.e. the Unicode characters U+0031 U+0030 U+0030 U+0030. Integer.parseInt takes that text representation and converts it into an int type.
However, parsing can describe a wide variety of processes ranging from simple ones as above, slightly more complex ones like parsing a decimal number or date up to arbitrarily complex object trees.
As an example: The Java compiler will parse the Java source code and convert it into an internal representation that will then be further processed.
According to some definitions parsing can also apply to non-text inputs, as long as the input is some set of symbols (which could just be bytes), but that interpretation is rather rare.
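A few more parsing calls from the standard library, as a rough sketch. The class ParsingDemo and the helper tryParseInt are hypothetical names; the point is only that each call turns text into a more specific type and rejects text that does not fit the expected format.

```java
import java.time.LocalDate;

// Illustrative only: shows text being converted into typed values.
class ParsingDemo {

    // Returns the parsed int, or null when the text is not a valid integer.
    static Integer tryParseInt(String text) {
        try {
            return Integer.parseInt(text);
        } catch (NumberFormatException notANumber) {
            return null;
        }
    }

    // Parses a decimal number such as "1234.567".
    static double parseDecimal(String text) {
        return Double.parseDouble(text);
    }

    // Parses an ISO-8601 date such as "2020-08-07".
    static LocalDate parseIsoDate(String text) {
        return LocalDate.parse(text);
    }

    public static void main(String[] args) {
        System.out.println(tryParseInt("1000"));                    // 1000
        System.out.println(tryParseInt("12x"));                     // null
        System.out.println(parseDecimal("1234.567"));               // 1234.567
        System.out.println(parseIsoDate("2020-08-07").getYear());   // 2020
    }
}
```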
3. Serialization
Serialization is the process of turning data or program state into something that can easily be stored or transferred. Usually that means into a byte stream (or more directly, a byte[]).
Similarly to parsing, serialization can range from very simple one-value transformations up to serializing whole object trees and writing them to files.
In Java Serialization usually refers to the mechanism surrounding ObjectOutputStream and ObjectInputStream, but the term is also used to describe the general concept (i.e. other formats can also be described as "serialization").
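As a minimal sketch of that mechanism, the following turns an object into a byte[] and back using ObjectOutputStream and ObjectInputStream. The class SerializationDemo and its method names are made up for this example; checked exceptions are wrapped only to keep the usage short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustrative only: round-trips a Serializable object through a byte array.
class SerializationDemo {

    static byte[] serialize(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Convenience wrapper that hides the checked exceptions.
    static Object roundTrip(Serializable value) {
        try {
            return deserialize(serialize(value));
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object copy = roundTrip("a string constant");
        System.out.println(copy.equals("a string constant")); // true
    }
}
```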

Java: Org.JSON Library writes inconsistent datatypes [closed]

I read and write a set of numerical values to a JSON file - they are all the 'double' datatype in Java. An example is the following:
double length = 22.4;
double height = 13.1;
double width = 17.9;
double mass = 15.0;
A JSONArray, 'objects', will then contain JSONObjects, each of which will contain the above measurements. Using the same values as above, it will look something like this (pay attention to the datatypes):
"objects" :
[
    {
        "length" : 22.4,
        "width" : 17.9,
        "height" : 13.1,
        "mass" : 15
    }
]
The first issue posed by such a document is that reading 'length', 'width' and 'height' returns a 'BigDecimal', which seems unusual for the org.json library, which to my knowledge only returns double and integer. This is a problem because I need to convert the 'BigDecimal' values back to double although I wrote the JSON object using standard Java double values. Additionally, the org.json library seems to convert some 'double' values to integers when read (notice how mass is '15' and not '15.0'). This is a problem because, as you can probably tell, the numerical value of "mass" may be different for another JSONObject (and thus indeed be a double). You can probably imagine the hell that is iterating through this JSONArray when random values are integers...
In summary, my problem is the fact that reading JSON files with the org.json library means that numerical values are only returned as doubles or BigDecimals. My question is if I can specify what datatype I would like org.json to write? All I want, is to read and write all my numerical values as a double - I don't want org.json to choose which datatype it will write to the JSON file.
My ideas were the following:
Writing everything as a string and converting back to double when reading
My attempts to solve this that haven't worked or are redundant to me:
Using the valueOf method to change between datatypes
Writing 9999 conditions to find the right instances to convert back to double (which sucks)
Using doubleValue() method
All I want, is to read and write all my numerical values as a double - I don't want org.json to choose which datatype it will write to the JSON file.
For what it is worth, the JSON library is not choosing a data type. It is choosing a character based representation to express a value. And there is some logical justification for it doing what it is doing. JSON is (or at least originated as) a subset of Javascript. In Javascript, integer and floating point are not distinct types. There is just one primitive data type for all numbers in Javascript.
I don't think that this can be solved using the org.json code. The stripping of trailing zeros after a decimal point happens in the static JSONObject::numberToString. Since it is a static method, you can't override it in a subclass.
(Take a look at the source code.)
My suggestions:
1. Work with the current behavior. For example, your code could be written to know which fields are supposed to have floating point values, and use doubleValue() to retrieve their values.
2. Find an alternative JSON library that doesn't do this. (Based on my reading of the code, I think Jackson would be OK. I haven't checked any other libraries ...)
3. You could create your own private fork of the org.json library and modify it, but then you would be stuck with the overhead of maintaining that fork indefinitely.
I think that option 3 is a bad idea. I have only included it for completeness.
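The first suggestion could look roughly like this. To keep the sketch self-contained it uses a plain Map as a stand-in for a parsed JSON object rather than org.json itself; NumericFields and asDouble are hypothetical names. Since Integer, Double and BigDecimal all extend Number, one doubleValue() call covers every case without a chain of conditions.

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: the Map stands in for whatever the JSON library returns.
class NumericFields {

    // Returns the field as a double, whichever Number subclass the parser produced.
    static double asDouble(Map<String, Object> row, String key) {
        Object value = row.get(key);
        if (value instanceof Number) {
            return ((Number) value).doubleValue();
        }
        throw new IllegalArgumentException("Field '" + key + "' is not numeric: " + value);
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("length", new BigDecimal("22.4")); // as if read back as BigDecimal
        row.put("mass", 15);                       // as if read back as Integer
        System.out.println(asDouble(row, "length")); // 22.4
        System.out.println(asDouble(row, "mass"));   // 15.0
    }
}
```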

Weird! Long value round off issue in spring boot

When I try to serialize an object to JSON using Jackson object mapper, it works perfectly.
{"id":1291741231928705024,"uuid":null,"email":"kannanrbk.r#gmail.com"}
However, when I access it through a Spring REST controller, the long values are rounded off in the last 3 digits.
I read existing questions on Stack Overflow; most of them suggest changing the datatype to String. But we use the Long value in most places, so changing the datatype would require some refactoring.
I did my initial analysis:
We are using Jackson ObjectMapper
From Spring, it indirectly calls MappingJackson2HttpMessageConverter
This problem might be somewhere around the JSONParser, where it treats any number as a double (15 digits) and after that, it's rounded off
Is there any way to fix this issue?
There is no problem with Jackson/Java/Spring Boot, but with JavaScript/Browser.
Trying to reproduce the issue I serialized the same object and got this using curl:
$ curl localhost:8080
{"id":1291741231928705024,"uuid":null,"email":"kannanrbk.r#gmail.com"}
Here the number is correctly serialized.
The same JSON viewed in Firefox's JSON viewer shows a rounded number. However, the "Raw Data" tab displays the number correctly.
In JavaScript 1291741231928705024 is not a safe integer (see Number.isSafeInteger()):
Number.isSafeInteger(1291741231928705024);
false
The number is greater than 2^53 - 1 so it gets rounded. Even more confusing situations are possible in JavaScript:
> 1291741231928705024 === 1291741231928705022
true
Possible solution
First of all, check whether your client is affected by this kind of problem. If it can safely deserialize such numbers, then you're safe.
Or you can serialize longs as Strings (as you mentioned in the question), this is what Twitter proposes in its Twitter IDs (snowflake) article:
To allow Javascript and JSON parsers to read the IDs, Twitter objects include a string version of any ID when responding with JSON. Status, User, Direct Message, Saved Search and other IDs in the Twitter API are therefore returned as both an integer and a string in JSON responses.
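On the Java side, the same check can be sketched as follows. SafeIntegers and its methods are hypothetical names, and the choice to emit a quoted string only for unsafe values is just one possible policy, not what Jackson or Spring do by default.

```java
// Illustrative only: mirrors JavaScript's Number.isSafeInteger for Java longs.
class SafeIntegers {

    // 2^53 - 1, the largest integer a JavaScript number can hold without gaps.
    static final long MAX_SAFE_INTEGER = (1L << 53) - 1;

    static boolean isSafeInteger(long value) {
        return value >= -MAX_SAFE_INTEGER && value <= MAX_SAFE_INTEGER;
    }

    // What a double-based client effectively sees after parsing the number.
    static long throughDouble(long value) {
        return (long) (double) value;
    }

    // Renders the id as a JSON number when safe, otherwise as a JSON string.
    static String toJsonValue(long id) {
        return isSafeInteger(id) ? Long.toString(id) : "\"" + id + "\"";
    }

    public static void main(String[] args) {
        System.out.println(isSafeInteger(1291741231928705024L)); // false
        System.out.println(throughDouble(1291741231928705022L)); // 1291741231928705024
        System.out.println(toJsonValue(1291741231928705024L));   // "1291741231928705024"
    }
}
```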
Try with bigInt for your primary key

Java & MySQL: Store and read a 365-position bit array. HOW?

I am currently working with Java and MySQL, and I found an issue I don't know how to solve.
I have a class that stores a String of 365 positions that represents a Binary String "010111010010100...", and I would like to be able to store and read that field from the database.
Once it is read, I will perform an AND Logic operation with another bitarray.
I read about the BitSet class, that allows the logical operators (AND, OR, XOR, ...) between them. I tried it, but I didn't like the solutions I got. I could also try to transform the String to a byte array, and then store and read it from the database, in order to later perform the logic AND operation, but not sure if I would need to always create a BitSet, and how performant could it be.
I don't know which is the most performant way to do what I want:
Convert the Binary String in another element.
Store that element in the database (in the case of BitSet I tried to define the Database field as BLOB, but I had a lot of issues transforming the BitSet to BLOB and reading the BLOB to a BitSet).
Read the element from the database (at this point would be great to directly work with the element without making any cast or transformation).
Perform a logic AND with another bitarray and get the result.
I have tried a lot of options, but they didn't work.
Could someone help me with this problem and how to better approach it from the performance point of view?
Thanks!
Storing bits in a string is a bit weird. I used a long to store a number and performed bitwise operations on that, but that won't work for you, since you need many more bits. If it has to remain a string, maybe you can write a short loop that applies the AND operator to each character of the string, something like this:
StringBuilder data = new StringBuilder(365);
for (int i = 0; i < 365; i++) {
    // AND: the result is 1 only if both strings have a 1 at this position
    data.append(stringname.charAt(i) == '1' && binarystring.charAt(i) == '1' ? '1' : '0');
}
Go through your string and compare it with the binary string (the one you want to AND it with): if both characters are '1', append 1, otherwise append 0.
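For comparison, here is a sketch of the BitSet route mentioned in the question. YearBits and its method names are made up for illustration. The byte[] returned by toByteArray() is at most 46 bytes for 365 bits and could be written to a BLOB or VARBINARY column (for example via PreparedStatement.setBytes); that storage step is an assumption and is not shown here.

```java
import java.util.BitSet;

// Illustrative only: converts between binary strings, BitSets and byte arrays.
class YearBits {

    static BitSet fromBinaryString(String bits) {
        BitSet set = new BitSet(bits.length());
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') {
                set.set(i);
            }
        }
        return set;
    }

    // The length is passed explicitly because a BitSet does not remember trailing zeros.
    static String toBinaryString(BitSet set, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(set.get(i) ? '1' : '0');
        }
        return sb.toString();
    }

    // Logical AND that leaves both inputs untouched.
    static BitSet and(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    public static void main(String[] args) {
        BitSet a = fromBinaryString("1100");
        BitSet b = fromBinaryString("1010");
        byte[] stored = a.toByteArray();          // what would go into the database
        BitSet restored = BitSet.valueOf(stored); // what would come back out
        System.out.println(toBinaryString(and(restored, b), 4)); // 1000
    }
}
```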

Can all tag numbers be extracted from a given protobuf serialization?

Given a protobuf serialization is it possible to get a list of all tag numbers that are in the message? Generally is it possible to view the structure of the message without the defining .proto files?
Most APIs will indeed have some form of reader-based API that allows you to enumerate a raw protobuf stream. However, that by itself is not enough to fully understand the data, since without the schema the interpretation is ambiguous:
a varint could be zig-zag encoded (sint32/sint64), or not (int32/int64/uint32/uint64) - radically changing the meaning, or a boolean, or an enum
a fixed-32/fixed-64 could be a signed or unsigned integer, or could be an IEEE754 float/double
a length-prefixed chunk could be a UTF-8 string, a BLOB, a sub-message, or a "packed" repeated set of primitives; if it is a sub-message, you'll have to repeat recursively
So... yes and no. Certainly you can get the field numbers of the outermost message.
Another approach would be to use the regular API against a type with no members (message Naked {}), and then query the unexpected data (i.e. all of it) via the "extension" API that many implementations provide.
You can get all the tag numbers which appear in one particular message, but you won't get any nested messages - and you won't know the types of those fields, only some subset of possible types.
If you look at the wire encoding, you can see that (for example) byte arrays, strings and nested messages are all encoded the same way - so you may know that "field 3 is a length-prefixed binary field" but you won't know whether that means it's a nested message, a string or a byte array.
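To make the "field numbers of the outermost message" part concrete, here is a rough sketch that walks the wire format by hand, without any protobuf library. RawFieldScanner is a hypothetical name; it handles wire types 0, 1, 2 and 5 only, rejects the deprecated group types, and does no bounds checking on truncated input. Each key is a varint holding (field number << 3) | wire type.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: lists field numbers of a serialized message without a schema.
class RawFieldScanner {

    // Returns the field numbers of the outermost message, in order of appearance.
    static List<Integer> fieldNumbers(byte[] data) {
        List<Integer> fields = new ArrayList<>();
        int[] pos = {0}; // single-element array so readVarint can advance the cursor
        while (pos[0] < data.length) {
            long key = readVarint(data, pos);
            int wireType = (int) (key & 0x7);
            fields.add((int) (key >>> 3));
            switch (wireType) {
                case 0: // varint: int/uint/sint/bool/enum cannot be told apart here
                    readVarint(data, pos);
                    break;
                case 1: // fixed 64-bit: integer or double
                    pos[0] += 8;
                    break;
                case 2: // length-prefixed: string, bytes, sub-message or packed values
                    int length = (int) readVarint(data, pos);
                    pos[0] += length;
                    break;
                case 5: // fixed 32-bit: integer or float
                    pos[0] += 4;
                    break;
                default:
                    throw new IllegalArgumentException("Unsupported wire type: " + wireType);
            }
        }
        return fields;
    }

    // Decodes a base-128 varint starting at pos[0] and advances pos[0] past it.
    static long readVarint(byte[] data, int[] pos) {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            byte b = data[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("Malformed varint");
    }

    public static void main(String[] args) {
        // Field 1 = varint 150, field 2 = length-prefixed "testing".
        byte[] message = {0x08, (byte) 0x96, 0x01, 0x12, 0x07, 't', 'e', 's', 't', 'i', 'n', 'g'};
        System.out.println(fieldNumbers(message)); // [1, 2]
    }
}
```

Descending into a length-prefixed field as if it were a sub-message would be a guess, for the reasons given above, so the sketch skips those bytes.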
