Information File Store - Java

I am looking for the best possible way to create a file store. In this file store I will be storing information such as contact details. I will need to modify the details in the text file. I just wanted to get some opinions on what would be the best way to do this.

It depends on the size of the project.
For small projects: you can use something simple like XML, JSON, etc...
For bigger projects: you should use an SQL database, like MySQL or SQLite.
SQLite is particularly handy: it is a simple SQL database library that stores the whole database in a single file.
Since you want something simple with basic file I/O, I would suggest taking one of the simple formats, like XML, JSON or CSV. Or, for a more compact result, write your own binary files using DataOutputStream and DataInputStream.
Your file format could be something like this:
arbitrary number of bytes: fixed header (e.g. "PendoContactsFormat")
4 bytes (an int): number of contacts
(for each contact:)
2 bytes (a short): number of fields in this contact
(for each field in this contact:)
2 bytes: size of the field header
n bytes: field header
2 bytes: size of the field value
n bytes: field value
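A rough sketch of how you could implement that format (illustrative names only; note that DataOutputStream.writeUTF conveniently writes exactly a 2-byte length followed by the string's bytes, matching the size-prefixed fields above):

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;

public class ContactStore {
    private static final byte[] HEADER = "PendoContactsFormat".getBytes(StandardCharsets.US_ASCII);

    // Each contact is a map of field header -> field value.
    public static void write(File file, List<Map<String, String>> contacts) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.write(HEADER);                      // fixed header
            out.writeInt(contacts.size());          // 4 bytes: number of contacts
            for (Map<String, String> contact : contacts) {
                out.writeShort(contact.size());     // 2 bytes: number of fields
                for (Map.Entry<String, String> field : contact.entrySet()) {
                    out.writeUTF(field.getKey());   // 2-byte size + field header
                    out.writeUTF(field.getValue()); // 2-byte size + field value
                }
            }
        }
    }

    public static List<Map<String, String>> read(File file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            byte[] header = new byte[HEADER.length];
            in.readFully(header);
            if (!Arrays.equals(header, HEADER)) throw new IOException("Bad header");
            int contactCount = in.readInt();
            List<Map<String, String>> contacts = new ArrayList<>();
            for (int i = 0; i < contactCount; i++) {
                int fieldCount = in.readShort();
                Map<String, String> contact = new LinkedHashMap<>();
                for (int j = 0; j < fieldCount; j++) {
                    contact.put(in.readUTF(), in.readUTF());
                }
                contacts.add(contact);
            }
            return contacts;
        }
    }
}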

CSV may be the right format if you want to be able to edit this file manually and let non-programmers edit it.

Related

Simplest format to read/write huge files

I need to write huge files (more than 1 million lines) and send the file to a different machine, where I need to read it with a Java BufferedReader, one line at a time.
I was using indented JSON format, but it turned out to be not very handy:
it requires too much coding and consumes extra RAM/CPU.
I'm looking for something that looks like this:
client:id="1" name="jack" address="House N°1\nCity N°3 \n Country 1" age="20"
client:id="2" name="alice" address="House N°2\nCity N°5 \n Country 2" age="30"
vehicle:id="1" model="ford" hp="250" fuel="diesel"
vehicle:id="2" model="nissan" hp="190" fuel="diesel"
This way I can read the objects one at a time.
I know about URL encoding & Base64, but I'm trying to keep the lines short and readable.
So any suggestions please!
With huge files, any textual data format, especially markup formats like JSON, YAML or XML, is not a very good solution.
I suggest using a universal binary format, like Google Protocol Buffers or ASN.1.
Google Protocol Buffers is much easier to get started with.
Of course, if you just need Java-to-Java data transfer, you can use Java's out-of-the-box serialization.
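For the Java-to-Java case, a minimal sketch of built-in serialization (the Client class and its fields are hypothetical, modeled on the records above):

import java.io.*;

// Hypothetical record type matching the sample lines in the question.
class Client implements Serializable {
    private static final long serialVersionUID = 1L;
    int id;
    String name;
    String address;
    int age;

    Client(int id, String name, String address, int age) {
        this.id = id; this.name = name; this.address = address; this.age = age;
    }
}

public class SerializationDemo {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        // Write objects one after another to a single stream.
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream("clients.bin")))) {
            out.writeObject(new Client(1, "jack", "House N°1", 20));
            out.writeObject(new Client(2, "alice", "House N°2", 30));
        }
        // Read them back one at a time, in the same order.
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream("clients.bin")))) {
            Client c1 = (Client) in.readObject();
            Client c2 = (Client) in.readObject();
            System.out.println(c1.name + ", " + c2.name);
        }
    }
}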
What about reading/writing files in binary format using DataInputStream and DataOutputStream?
Of course, your data must have a fixed structure, but as a benefit you'll get smaller file sizes and faster reading/writing.
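A sketch of that idea with the client records from the question (the exact field layout is an assumption):

import java.io.*;

public class BinaryRecords {
    public static void main(String[] args) throws IOException {
        // Write records with a fixed structure: id, name, address, age.
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream("clients.dat")))) {
            out.writeInt(2); // record count up front, so the reader knows when to stop
            writeClient(out, 1, "jack", "House N°1\nCity N°3\nCountry 1", 20);
            writeClient(out, 2, "alice", "House N°2\nCity N°5\nCountry 2", 30);
        }
        // Read the records back one at a time, in the same field order.
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream("clients.dat")))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                int id = in.readInt();
                String name = in.readUTF();
                String address = in.readUTF();
                int age = in.readInt();
                System.out.println(id + " " + name + " " + age);
            }
        }
    }

    private static void writeClient(DataOutputStream out, int id, String name,
                                    String address, int age) throws IOException {
        out.writeInt(id);
        out.writeUTF(name);
        out.writeUTF(address);
        out.writeInt(age);
    }
}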

What is the fastest file / way to parse a large data file?

So I am working on a GAE project. I need to look up cities, country names and country codes for sign-ups, LBS, etc.
Now I figured that putting all the information in the Datastore is rather stupid, as it will be used quite frequently and it's going to eat up my Datastore quota for no reason, especially since these lists aren't going to change, so it's pointless to put them in the Datastore.
Now that leaves me with a few options:
API - No budget for paid services, free ones are not exactly reliable.
Upload Parse-able file - Favorable option as I like the certainty that the data will always be there.
So I got the files needed from GeoNames (link has source files for all countries in case someone needs it). The file for each country is a regular UTF-8 tab delimited file which is great.
However, now that I have the option to choose how to format and access the data, the question is:
What is the best way to format and retrieve data systematically from a static file in a Java servlet container?
The best way being the fastest and least resource-hungry method.
Valid options:
TXT file, tab delimited
Static XML file
Java Class with Tons of enums
I know that importing the country files as Java enums and going through their values will be very fast, but do you think this is going to affect memory beyond reasonable limits? On the other hand, every time I need to access a record, the loop will go through a few thousand lines until it finds the required record... reading line by line, so no memory issues, but incredibly slow... I have had some experience with parsing an Excel file in a Java servlet, and it took something like 20 seconds just to parse 250 records; at a larger scale, the response time WILL time out (no doubt about it). So, is XML anything like Excel?
Thank you very much, guys! Please provide opinions; anything and everything is appreciated!
The easiest and fastest way would be to have the file as a static web resource, under the WEB-INF folder, and on application startup have a context listener load the file into memory.
In memory, it should be a Map, keyed by whatever you want to search by. This will give you near-constant access time.
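A minimal sketch of such a listener (the file name countries.csv, its "code,name" layout and the attribute name are assumptions for illustration):

import java.io.*;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import javax.servlet.*;

// Loads /WEB-INF/countries.csv into a Map once, at application startup.
public class CountryDataLoader implements ServletContextListener {
    @Override
    public void contextInitialized(ServletContextEvent sce) {
        Map<String, String> countries = new ConcurrentHashMap<>();
        ServletContext ctx = sce.getServletContext();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                ctx.getResourceAsStream("/WEB-INF/countries.csv"), "UTF-8"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(",", 2); // "code,name"
                if (parts.length == 2) countries.put(parts[0], parts[1]);
            }
        } catch (IOException e) {
            throw new RuntimeException("Failed to load country data", e);
        }
        ctx.setAttribute("countries", countries); // servlets look it up from here
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) { }
}

Servlets can then fetch the map with getServletContext().getAttribute("countries") and do a constant-time get() per lookup.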
Memory consumption would only matter if the file is really big. A hundred thousand records, for example, are not worth optimizing if you access them this often.
The static file should be plain text or CSV; they are read and parsed most efficiently. There is no need for XML formatting, as parsing it would be slower.
If the list is really big, you can break it up into multiple smaller files, and only parse those when they are required. A reasonable, easy partitioning would be to break it up by country, but any other partitioning would work (for example, based on the first few characters of the name).
You could also consider building this Map in memory once, then serializing it to a binary file and including that binary file as a static resource; that way you would only have to deserialize the Map, and there would be no need to parse/process a text file and build the objects yourself.
Improvements on the data file
An alternative to having the static resource file as a text/CSV file or a serialized Map would be to have it as a binary data file with your own custom file format.
Using DataOutputStream you can write data to a binary file in a very compact and efficient way. Then you could use DataInputStream to load data from this custom file.
This solution has the advantage that the file could be much smaller (compared to plain text / CSV / a serialized Map), and loading it would be much faster (because DataInputStream doesn't parse numbers from text, for example; it reads the bytes of a number directly).
Hold the data in source form as XML. At start of day, or when it changes, read it into memory: that's the only time you incur the parsing cost. There are then two main options:
(a) your in-memory form is still an XML tree, and you use XPath/XQuery to query it.
(b) your in-memory form is something like a Java HashMap
If the data is very simple then (b) is probably best, but it only allows you to do one kind of query, which is hard-coded. If the data is more complex or you have a variety of possible queries, then (a) is more flexible.
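A sketch of option (a), assuming a hypothetical countries.xml shaped like <countries><country code="DE"><name>Germany</name></country>...</countries>:

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class XmlLookup {
    public static void main(String[] args) throws Exception {
        // Parse once at startup and keep the tree in memory.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse("countries.xml");

        // Any ad-hoc query can then run against the in-memory tree.
        XPath xpath = XPathFactory.newInstance().newXPath();
        String name = xpath.evaluate("/countries/country[@code='DE']/name", doc);
        System.out.println(name); // Germany
    }
}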

How may I store images in a database using the Inubit tool set?

I am learning Inubit. I want to know, how may I store images in a database using the Inubit tool set?
The question is more than a year old. I guess you solved it by now.
For all others coming here, let me sketch out the typical way you'd do that.
0. (optional) Compress data.
Depending on the compression of the image (e.g. it is a GIF, PDF or uncompressed TIFF, and not a JPEG), you might want to compress it via a Compressor module first, to reduce the needed database space and increase overall performance in the next steps. Be sure to compress the binary data and not the base64-encoded string (see next step)!
1. Encode the binary stream to base64.
Depending on where you get the image data from, chances are it is already base64-encoded, e.g. you used a file connector to retrieve it from disk with the appropriate option checked, or used a web service connector. If you really have a binary data stream, convert it to base64 using an encoder module (better self-documenting) or a variable assignment using the XPath function isxp:encode (more concise).
2. Save the encoded data via a database connector.
Well, the details of doing this right are pretty much database-specific. The cheap trick that should work on any database is storing the base64 string simply as a string in a TEXT / CLOB column. This will waste about three times as much space in the database as the original binary data, since base64 is poorly packed. Doing it right would mean constructing a forced SQL query in an XSLT that decodes the base64 string to binary and stores it. Here is some reference to how it can be done in Oracle.
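For the cheap-trick variant, here is a plain JDBC sketch (the table and column names images/id/data are hypothetical):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class StoreImage {
    // Stores a base64-encoded image in a TEXT/CLOB column.
    // Assumed schema: images(id INT, data TEXT)
    public static void store(Connection conn, int id, String base64Image) throws SQLException {
        String sql = "INSERT INTO images (id, data) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, id);
            ps.setString(2, base64Image);
            ps.executeUpdate();
        }
    }
}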
Hope this might be of some help.
Cheers,
Jörn
Jörn Willhöft
Willhöft IT-Beratung GmbH, Berlin, Germany
You do not store the image in the database; you only record the path to the image. The image will be stored on the server.
Here is an example of how to store the path to the image: How to insert multiple images path to database

Getting metadata of an APNG image

I am trying to get the metadata of an APNG image at the moment. I have been able to get different frames from one APNG file flawlessly, and I am using PNGJ (a really great standalone Java library for reading and writing PNG images), but I am not able to get the info that is stored against every APNG frame, like the delay of each frame.
At the moment I am just able to get the simple PNG image info that is stored in the header part by using
PngReader pngr = FileHelper.createPngReader(File);
pngr.imgInfo;
But I don't know how to get at the information stored against the fcTL chunk. How can I do that?
You omitted the information that you are using the PNGJ library. As I mentioned in the other answer, this library does not parse APNG chunks (fcTL, fdAT). It loads them (you can inspect them in the ChunksList property), but they will be instantiated as "UNKNOWN" chunks, hence the binary data will be left in raw form. If you want to look inside the content of the fcTL chunks, you'd either parse the binary yourself, or implement the logic for that chunk type yourself and register it in the reader (here's an example for a custom chunk).
Look at how you're currently reading the 4-byte integer 'seq' from fdAT.
You can read information from fcTL the same way.
Just keep in mind that some info is stored in fcTL as 4 bytes, some as 2 bytes, and some as 1 byte.
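For reference, a sketch of decoding the raw fcTL payload by hand, following the field layout in the APNG specification (how you obtain the raw byte[] from PNGJ's chunk list is up to you):

import java.nio.ByteBuffer;

public class FctlInfo {
    // Parses the 26-byte data of an fcTL chunk (layout per the APNG spec).
    public static void parse(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw); // PNG data is big-endian, ByteBuffer's default
        int seq      = buf.getInt();            // 4 bytes: sequence number
        int width    = buf.getInt();            // 4 bytes: frame width
        int height   = buf.getInt();            // 4 bytes: frame height
        int xOffset  = buf.getInt();            // 4 bytes: x offset
        int yOffset  = buf.getInt();            // 4 bytes: y offset
        int delayNum = buf.getShort() & 0xFFFF; // 2 bytes: delay numerator
        int delayDen = buf.getShort() & 0xFFFF; // 2 bytes: delay denominator (0 means 100)
        int dispose  = buf.get() & 0xFF;        // 1 byte: dispose_op
        int blend    = buf.get() & 0xFF;        // 1 byte: blend_op
        double delay = (double) delayNum / (delayDen == 0 ? 100 : delayDen);
        System.out.println("frame " + seq + ": " + width + "x" + height
                + " at (" + xOffset + "," + yOffset + "), delay " + delay
                + "s, dispose " + dispose + ", blend " + blend);
    }
}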

Java NIO - How to efficiently parse a file containing both ascii and binary data?

I have some data files looking something like this:
text
header
"lots of binary data hear"
/header
more text
header
"more binary data"
/header
....
Most of the files are around 1-5MB in size. It's very unlikely that I will have to deal with any files larger than approximately 30MB.
I'm fairly new to Java NIO and the API looks a bit like a jungle to me. Could anyone give me some pointers on how I should go about parsing a file like this?
Would it be possible to have multiple threads consuming data from different parts of the file? The file will just be open for reading.
Redesign the file. That's a terrible design.
The question is how you would know whether you're reading text or binary data. If there is a clear demarcation between the text and binary regions (like a marker, or a defined block size), then I suspect Preon would be able to help you out. Preon has support for reading both text and binary data in a useful way. And since I'm pretty sure your binary data represents something else, you might also be able to decode the binary bits into a more useful data structure than just an array.
You can use FileChannel.map() and read the file like an array.
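A minimal sketch of that approach (the file name and the offsets read are placeholders):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedRead {
    public static void main(String[] args) throws Exception {
        // Map the whole file into memory; fine for files up to ~30MB as described.
        try (RandomAccessFile raf = new RandomAccessFile("data.bin", "r");
             FileChannel channel = raf.getChannel()) {
            MappedByteBuffer buf = channel.map(
                    FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // Absolute, array-like access at any position, text or binary alike.
            byte first = buf.get(0);       // first byte of the file
            int value  = buf.getInt(16);   // a 4-byte int at a hypothetical offset
            System.out.println(first + " " + value);
        }
    }
}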
