I am sending a big array of data. Which is more efficient: concatenating the data with a delimiter symbol, or sending it as a JSONArray?
The data is being sent from an Android client to Apache PHP.
Example of concatenated data:
data1_data2_data3_data4
Example of a JSONArray:
{ "Data": [data1, data2, data3, data4] }
It completely depends on your use case. From your example, here are some thoughts:
In terms of bytes sent, concatenation is slightly better, as JSON adds some metadata and symbols.
In terms of ease of use, JSON clearly wins, as there are libraries and standards. If you just have plain data without any _, concatenated data is fine. But what happens if one of your values contains a _? You will need to escape it and keep track of your custom format all over your code... (And that's just the tip of the iceberg.)
In general, my advice is: always use standard data serialization schemes. If the size of the serialized data is a concern, have a look at binary standards (for example, protobuf).
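For illustration, a minimal sketch of building that payload on the Android side with the built-in org.json classes (the method name and the list of values are hypothetical):

    import org.json.JSONArray;
    import org.json.JSONException;
    import org.json.JSONObject;
    import java.util.List;

    // Builds {"Data": [ ... ]} from a list of values, ready to be sent
    // as the request body to the PHP endpoint.
    static String buildPayload(List<String> values) throws JSONException {
        JSONArray data = new JSONArray();
        for (String value : values) {
            data.put(value);          // no escaping or delimiter rules to worry about
        }
        JSONObject payload = new JSONObject();
        payload.put("Data", data);
        return payload.toString();
    }

On the PHP side, json_decode gives you the array back directly, with no custom splitting or escaping code to maintain.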
It doesn't really matter. If you're asking about optimizing the transfer size in bytes, the difference is minimal.
However, the concatenated data example that you gave will require more processing on the recipient's side, as your script will have to split the sent data on the delimiter and turn it into a usable object.
So best to stick with the usual JSON object, as I don't think you will gain any optimization this way.
Depends on what you mean by optimization.
Realistically speaking, even if you were to parse it with a custom-made function/class vs. a built-in function (like json_decode in PHP), the time difference would be minimal or irrelevant.
If you can stick to the standard, then do it. Send it as proper JSON, not some weirdly concatenated string.
The advantages outweigh anything else.
Concatenating the data will be slightly more compact, but you want to make sure your data does not contain "_", or else handle the delimiter properly (e.g. by escaping it).
I have a large DB result object that I want to convert to a JSON string and pass as a message to pubsub. I am new to Java 8 streams and I cannot figure out the best way to use them here. I usually use
new Gson().toJson(myArrayList.toArray(), T[].class), but I want to avoid storing large objects in memory, so I want to avoid building the ArrayList as one of the answers suggested.
Given that your arraylist is already a giant object, what you're attempting to avoid is, at worst, a constant factor of 2x (that you need 2x the memory you already need now). In other words, there is no point in trying to stream your JSON here.
If you have a stream of objects which are being read one at a time from someplace (say, you're iterating through a DB resultset and converting each 'row' into an object e.g. with JOOQ or JDBI or hibernate, and you want to send these out via JSON), then what you want makes plenty of sense.
In other words, the advice is: Don't bother, you won't gain anything useful. If you want to spend time to make this code less memory intensive, then go back to the code that ends up making a gigantic arraylist and change that to make a stream (lower-case s. Could be an actual java.util.stream.Stream, but more likely an iterator or just a different code structure).
Only then worry about the JSON part. Once you've done that, investigate this wiki page on streaming GSON.
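As a rough illustration of that approach, a sketch that writes rows one at a time with Gson's JsonWriter instead of materializing an ArrayList first (the Row type and the iterator are hypothetical stand-ins for your DB result):

    import com.google.gson.Gson;
    import com.google.gson.stream.JsonWriter;
    import java.io.OutputStream;
    import java.io.OutputStreamWriter;
    import java.nio.charset.StandardCharsets;
    import java.util.Iterator;

    public class StreamingJson {
        // Hypothetical row type standing in for one DB result row.
        static class Row { String id; String value; }

        // Streams the rows out as one JSON array; only one Row is held in memory at a time.
        static void writeRows(Iterator<Row> rows, OutputStream out) throws Exception {
            Gson gson = new Gson();
            try (JsonWriter writer = new JsonWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
                writer.beginArray();
                while (rows.hasNext()) {
                    gson.toJson(rows.next(), Row.class, writer);
                }
                writer.endArray();
            }
        }
    }

This only pays off if the rows really do arrive one at a time; if you already have the full ArrayList, you are only saving the cost of the output string, as noted above.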
I was just introduced to the concept of serialisation in Java and while I 'get' the fundamentals, I can't help but feel like it's a bit of overkill. My logic is that I have pointers to the objects and I know how many bytes they take up in memory. Why can't I just theoretically write these bytes to some txt file, along with some extra bytes to indicate the type? With this, can't I just read these bytes back and restore my original object?
The amount of detail my book goes into on serialisation is a good indication that I'm not really understanding its importance, and that there is probably something more subtle than just writing out all the bytes exactly as they are. Any help is greatly appreciated! (I have some background in C++ if that helps.)
Why can't I just theoretically write these bytes to some txt file, along with some extra bytes to indicate the type? With this, can't I just read these bytes back and restore my original object?
How could anyone ever read them back in? Say I'm writing code that's supposed to read in your file. Please tell me what the third byte means so that I can decode it properly.
What if the internal representation of the object contains pointers to other objects that might be in different memory locations the next time the program runs? For example, it is quite common to manage identical strings by having internal references to the same internal string object. How will writing that reference to a file be sensible given that the internal string object may not exist in the next run?
To write data to a file, you need to write it out in some specific format that actually contains all the information you need to be able to read back in. What happens to work internally for this program at this time just won't do as there's no guarantee another program at another time can make sense of it.
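To make that concrete, a minimal sketch of the alternative: each field is written in an explicitly defined format, so any reader that knows the format can rebuild the object regardless of where it lived in memory (the Point class is hypothetical):

    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    class Point {
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        // Documented format: two 4-byte big-endian ints, x then y.
        void writeTo(DataOutputStream out) throws IOException {
            out.writeInt(x);
            out.writeInt(y);
        }

        static Point readFrom(DataInputStream in) throws IOException {
            return new Point(in.readInt(), in.readInt());
        }
    }

The format, not the in-memory layout, is the contract between writer and reader; Java serialization does the same kind of thing for you, for arbitrary object graphs.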
What you suggest works provided:
the order and type of the fields don't change. Note this is not fixed at compile time.
the byte order doesn't change.
you don't have any references, e.g. no String, enum, List or Map.
the name and package of the type doesn't change.
At Chronicle, we use a form of serialization which supports this, as it's much faster, but it's very limiting. You have to be very aware of those limitations and have a problem they suit. We also have a form of serialization which has none of these constraints, but it is slower.
The purpose of Java Serialization is to support arbitrary object graphs even if data is exchanged between systems which might arrange the data differently.
Hey fellow programmers,
I am working on a Java application where speed is key. I need to deal with a stream of JSON (requests to the server return a JSON object that I continuously parse to analyze later on). The JSON object is about 2000 characters long, so I was wondering if it wouldn't be quicker to just treat it as a string (using indexOf, substring, etc.) instead of using a JSON parser. (I used both Jackson and Json-lib without a noticeable difference.) Will it save me a couple of milliseconds?
Thank you !
It depends what you need to know from it, but in general, I think it's better to use a JSON parser. The parser will be highly optimized, so it will beat your own attempts if you need to read many values. Also, the parser will ignore whitespace, while you have to take care of it explicitly.
Checking something yourself is harder than you think. For instance, if you need to know whether a property 'x' exists, you cannot just check for the existence of the string x, because it can also be part of a value. You cannot look for x:, because maybe there is a space between them. And if you found x, do you know if it is in the right place? Is it part of the right object, or maybe of a sub-object you didn't expect to be there at all?
Before you know it, you are writing a parser yourself.
If you can't notice the difference, don't bother and use the parser, because it is the easiest, safest and most flexible choice. Only start optimizing if you need to.
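If speed really does turn out to matter, a middle ground is a streaming parser, which avoids building the full object tree while still handling whitespace, nesting and escaping for you. A rough sketch with Jackson's streaming API (the field name "price" is hypothetical, and this naive version does not check which object the field belongs to):

    import com.fasterxml.jackson.core.JsonFactory;
    import com.fasterxml.jackson.core.JsonParser;
    import com.fasterxml.jackson.core.JsonToken;

    public class FastExtract {
        private static final JsonFactory FACTORY = new JsonFactory();

        // Pulls a single field out of a JSON document without building a tree.
        static String extractPrice(String json) throws Exception {
            try (JsonParser parser = FACTORY.createParser(json)) {
                while (parser.nextToken() != null) {
                    if (parser.getCurrentToken() == JsonToken.FIELD_NAME
                            && "price".equals(parser.getCurrentName())) {
                        parser.nextToken();          // advance to the value
                        return parser.getText();
                    }
                }
            }
            return null;
        }
    }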
I'm trying to design a lightweight way to store persistent data in Java. I've already got a very efficient way to serialize POJOs to DataOutputStreams (and back), but I'm trying to think of a good way to ensure that changes to the data in the POJOs get serialized when necessary.
This is for a client-side app where I'm trying to keep the size of the eventual distributable as low as possible, so I'm reluctant to use anything that would pull in heavyweight dependencies. Right now my distributable is almost 10MB, and I don't want it to get much bigger.
I've considered DB4O, but it's too heavy - I need something light. Really, it's probably more a design pattern I need than a library.
Any ideas?
The 'lightest weight' persistence option will almost surely be simply marking some classes Serializable and reading/writing from some fixed location. Are you trying to accomplish something more complex than this? If so, it's time to bundle hsqldb and use an ORM.
If your users are tech savvy, or you're just worried about initial payload, there are libraries which can pull dependencies at runtime, such as Grape.
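For the plain Serializable route, a minimal sketch of reading and writing from a fixed location might look like this (the Settings class and file name are hypothetical):

    import java.io.*;

    class Settings implements Serializable {
        private static final long serialVersionUID = 1L;
        String userName;
        int windowWidth;
    }

    public class SettingsStore {
        private static final File FILE = new File("settings.ser");

        // Writes the whole object graph to a fixed location.
        static void save(Settings settings) throws IOException {
            try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(FILE))) {
                out.writeObject(settings);
            }
        }

        // Reads it back; returns null if nothing has been saved yet.
        static Settings load() throws IOException, ClassNotFoundException {
            if (!FILE.exists()) return null;
            try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(FILE))) {
                return (Settings) in.readObject();
            }
        }
    }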
If you already have a compact data output format in bytes (which I assume you have if you can persist efficiently to a DataOutputStream) then an efficient and general technique is to use run-length-encoding on the difference between the previous byte array output and the new byte array output.
Points to note:
If the object has not changed, the difference between the byte arrays will be an array of zeros and hence will compress down to very little.
The first time you serialize the object, consider the previous output to be all zeros, so that you communicate a complete set of data.
You probably want to be a bit clever when the object has variable-sized substructures.
You can also try zipping the difference rather than RLE - it might be more efficient in some cases where you have a large object graph with a lot of changes.
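A rough sketch of the idea under those assumptions (XOR is used as the difference, and the run-length encoding here is deliberately simplistic; a real format would need to handle length changes and define the encoding more carefully):

    import java.io.ByteArrayOutputStream;

    public class DeltaEncoder {
        // Byte-wise difference between the previous and current serialized forms.
        // Unchanged regions become runs of zero. For the first write, pass an
        // empty array as 'previous' so the full data is communicated.
        static byte[] diff(byte[] previous, byte[] current) {
            byte[] d = new byte[current.length];
            for (int i = 0; i < current.length; i++) {
                byte prev = i < previous.length ? previous[i] : 0;
                d[i] = (byte) (current[i] ^ prev);
            }
            return d;
        }

        // Simplistic RLE: a zero byte is written as (0, runLength); any other byte as-is.
        static byte[] runLengthEncode(byte[] data) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int i = 0;
            while (i < data.length) {
                if (data[i] == 0) {
                    int run = 0;
                    while (i < data.length && data[i] == 0 && run < 255) { run++; i++; }
                    out.write(0);
                    out.write(run);
                } else {
                    out.write(data[i]);
                    i++;
                }
            }
            return out.toByteArray();
        }
    }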
I need to store easily parsable data in a file as an alternative to a database-backed solution (not up for debate). Since it's going to be storing lots of data, a lightweight syntax would be preferable. This does not necessarily need to be human-readable, but it should be parsable. Note that there are going to be multiple types of fields/columns, some of which might be used and some of which won't.
From my limited experience without a database, I see several options, all with issues:
CSV - I could technically do this, and it is very light. However, the parsing would be an issue, and then it would suck if I wanted to add a column. Multi-language support is iffy, mainly people's own custom parsers.
XML - This is the perfect solution on many fronts except when it comes to parsing and overhead. That's a lot of tags and would generate a giant file, and parsing would be very resource-consuming. However, virtually every language supports XML.
JSON - This is the middle ground, but I don't really want to do this, as it's an awkward syntax and parsing is non-trivial. Language support is iffy.
So all have their disadvantages. But what would be best when trying to aim for language support AND a somewhat small file size?
How about SQLite? This would allow you to basically embed the "DB" in your application without requiring a separate DB backend.
Also, if you end up using a DB backend later, it should be fairly easy to switch over.
If that's not suitable, I'd suggest one of the DBM-like stores for key-value lookups, such as Berkeley DB or tdb.
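A minimal sketch of the embedded-SQLite route via JDBC (assuming a SQLite JDBC driver such as xerial's sqlite-jdbc is on the classpath; the table and file names are hypothetical):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class SqliteExample {
        public static void main(String[] args) throws Exception {
            // Opens (or creates) a single-file database next to the application.
            try (Connection conn = DriverManager.getConnection("jdbc:sqlite:data.db")) {
                try (Statement st = conn.createStatement()) {
                    st.execute("CREATE TABLE IF NOT EXISTS entries (name TEXT, value TEXT)");
                }
                try (PreparedStatement ps = conn.prepareStatement("INSERT INTO entries VALUES (?, ?)")) {
                    ps.setString(1, "data1");
                    ps.setString(2, "42");
                    ps.executeUpdate();
                }
                try (Statement st = conn.createStatement();
                     ResultSet rs = st.executeQuery("SELECT name, value FROM entries")) {
                    while (rs.next()) {
                        System.out.println(rs.getString("name") + " = " + rs.getString("value"));
                    }
                }
            }
        }
    }

Switching to a full DB backend later is then mostly a matter of changing the JDBC URL.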
If you're just using the basics of all these formats, all of the parsers are trivial. If CSV is an option, then for XML and JSON you're talking about blocks of name/value pairs, so there's not even a recursive structure involved. json.org has support for pretty much any language.
That said.
I don't see what the problem is with CSV. If people write bad parsers, then too bad. If you're concerned about compatibility, adopt the default CSV model from Excel. Anyone who can't parse CSV from Excel isn't going to get far in this world. The weakest support you find in CSV is for embedded newlines and carriage returns. If your data doesn't have these, then it's not a problem. The only other issue is embedded quotation marks, and those are escaped in CSV. If you don't have those either, then it's even more trivial.
As for "adding a column", you have that problem with all of these. If you add a column, you get to rewrite the entire file. I don't see this being a big issue either.
If space is your concern, CSV is the most compact, followed by JSON, followed by XML. None of the resulting files can be easily updated; they would pretty much all need to be rewritten for any change in the data. CSV has the advantage that it's easily appended to, as there's no closing element (which JSON and XML have).
JSON is probably your best bet (it's lightish, fast to parse, and self-descriptive, so you can add your new columns as time goes by). You've said parsable - do you mean using Java? There are JSON libraries for Java to take the pain out of most of the work. There are also various lightweight in-memory databases that can persist to a file (in case "not an option" means you don't want a big separate database).
If this is just for logging some data quickly to a file, I find tab-delimited files easier to parse than CSV, so if it's a flat text file you're looking for, I'd go with that (as long as you don't have tabs in the feed, of course). If you have fixed-size columns, you could use fixed-length fields. That is even quicker because you can seek.
If it's unstructured data that might need some analysis, I'd go for JSON.
If it's structured data and you envision ever doing any querying on it... I'd go with sqlite.
When I needed a solution like this, I wrote up a simple representation of the data prefixed with its length. For example, "Hi" would be represented (in hex) as 02 48 69.
To form rows, just nest this operation (the first number is the number of fields, followed by the fields). For example, if field 0 contains "Hi" and field 1 contains "abc", it would be:
Num of fields | Field length | Data  | Field length | Data
02            | 02           | 48 69 | 03           | 61 62 63
You can also use the first row as names for the columns.
(I have to say this is kind of a DB backend).
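A small sketch of encoding one row in that length-prefixed format (single-byte lengths are an assumption; fields longer than 255 bytes would need a wider prefix):

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    public class LengthPrefixedRow {
        // Number of fields, then for each field its length followed by its bytes.
        static byte[] encodeRow(String... fields) throws IOException {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeByte(fields.length);
            for (String field : fields) {
                byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
                out.writeByte(bytes.length);
                out.write(bytes);
            }
            return buffer.toByteArray();
        }

        public static void main(String[] args) throws IOException {
            byte[] row = encodeRow("Hi", "abc");   // yields 02 02 48 69 03 61 62 63
            for (byte b : row) System.out.printf("%02x ", b);
        }
    }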
You can use CSV, and if you only add columns to the end, this is simple to handle: if you have fewer columns than you expect, use the default value for the "missing" fields.
If you want to be able to change the order/use of fields, you can add a heading row. i.e. the first row has the names of the columns. This can be useful when you are trying to read the data.
If you are forced to use a flat file, why not develop your own format? You should be able to tweak the overhead and customize it as much as you want (which is good if you are parsing lots of data).
Data entries will be either of fixed or variable length; there are advantages to forcing some entries to a fixed length, but you will need to create a method for delimiting both. If you have different "types" of rows, write all the rows of each type in a chunk. Each chunk of rows will have a header. Use one header to describe the type of the chunk, and another to describe the columns and their sizes. Determine how you will use the headers to describe each chunk.
e.g. (H is a header, C is the column descriptions and D is a data entry):
H Phone Numbers
C num(10) type
D 1234567890 Home
D 2223334444 Cell
H Addresses
C house(5) street postal(6) province
D 1234_ "some street" N1G5K6 Ontario
I'd say that if you want to store rows and columns, you've got to use a DB. The reason is simple: modifying the structure with any approach other than an RDBMS will require significant effort, and you mentioned that you want to change the structure in the future.