How to implement a RowLoader in GemFireXD? - java

How do I write RowLoader Java code to load data from a sample.csv file into a GemFireXD database?

The GemFireXD distribution includes a JDBCRowLoader source example; look in the examples directory. In your case you will have to decide which fields of your CSV to treat as the primary key, parse the CSV, and return rows as needed.
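The parse-and-look-up part of that advice can be sketched independently of GemFireXD. The following is only a rough illustration, not the bundled JDBCRowLoader: the class name CsvRowIndex and the method findRow are hypothetical, the key is assumed to be a single column, and the CSV is assumed to contain no quoted fields or embedded commas. A real RowLoader would wrap something like this behind the callback interface shown in the shipped example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of GemFireXD): indexes CSV rows by one key column.
class CsvRowIndex {
    private final Map<String, String[]> rowsByKey = new HashMap<>();

    // Reads comma-separated lines (no quoting support) and indexes them by keyColumn.
    CsvRowIndex(BufferedReader reader, int keyColumn) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split(",", -1);
            rowsByKey.put(fields[keyColumn].trim(), fields);
        }
    }

    // Returns the fields of the row with the given key, or null if there is none.
    String[] findRow(String key) {
        return rowsByKey.get(key);
    }

    public static void main(String[] args) throws IOException {
        String csv = "1,Alice,London\n2,Bob,Paris\n";
        CsvRowIndex index = new CsvRowIndex(new BufferedReader(new StringReader(csv)), 0);
        System.out.println(String.join("|", index.findRow("2"))); // prints 2|Bob|Paris
    }
}
```

For a real file, the StringReader would be replaced by a reader over sample.csv.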

You can check the IMPORT_DATA_EX and IMPORT_TABLE_EX procedures to load data into GemFireXD.
Since you mentioned the CSV format, IMPORT_DATA_EX might be the recommended way to do it, since you can also tweak the number of threads and the constraints while loading the data. It's definitely one of the fastest ways to do it, but please note that the CSV file must be available from the node on which you issue the command.
You might also want to consider starting a peer member with host-data=false.
Reference: http://gemfirexd.docs.pivotal.io/latest/userguide/index.html#reference/system_procedures/derby/rrefimportdataproc_ex.html

Related

Store multiple values in a file - best format?

I want to store multiple values (String, Int and Date) in a file via Java in Android Studio.
I don't have much experience in this area, so I googled a bit, but I didn't find the solution I was looking for. Maybe you can recommend something?
What I've tried so far:
Android offers a SharedPreferences feature, which allows a user to save a primitive value for a key. But I have multiple values for a key, so that won't work for me.
Another option is to save the data as a file on external storage. So far, so good. But I want to keep the file size to a minimum and load the file as fast as possible, and that's where I'm stuck. If I save all values directly as plain text, I would have to parse the .txt file by hand to load the data, which will take time when there are many entries.
Is there a possibility to save multiple entries with multiple values for a particular key in an efficient way?
No need to reinvent the wheel. Most probably the best option for your case is a database. Look into SQLite or Realm.
You don’t divulge enough details about your data structure or volume, so it is difficult to give a specific solution.
Generally speaking, you have these three choices.
Serialize a collection
I have multiple values for a key
You could use a Map with a List or Set as its value. This has been discussed countless times on Stack Overflow.
Then use Serialization to write and read to storage.
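As a minimal sketch of this option: the class names MultiValueStore and Entry and their fields are invented for illustration, and the example round-trips through a byte array to stay self-contained. On a device you would presumably wrap a FileOutputStream and FileInputStream instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example: several (String, int, Date) entries stored under one key.
class MultiValueStore {
    static class Entry implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int count;
        final Date date;

        Entry(String name, int count, Date date) {
            this.name = name;
            this.count = count;
            this.date = date;
        }
    }

    // Writes the whole map with Java's built-in object serialization.
    static byte[] serialize(Map<String, List<Entry>> data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(data);
        }
        return bytes.toByteArray();
    }

    // Reads the map back; the unchecked cast is safe only for data written by serialize().
    @SuppressWarnings("unchecked")
    static Map<String, List<Entry>> deserialize(byte[] raw)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (Map<String, List<Entry>>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, List<Entry>> data = new HashMap<>();
        data.computeIfAbsent("groceries", k -> new ArrayList<>())
            .add(new Entry("milk", 2, new Date(0L)));
        data.computeIfAbsent("groceries", k -> new ArrayList<>())
            .add(new Entry("bread", 1, new Date(0L)));

        Map<String, List<Entry>> restored = deserialize(serialize(data));
        System.out.println(restored.get("groceries").size()); // prints 2
    }
}
```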
Text file
Write a text file.
Use Tab-delimited or CSV format if appropriate. I suggest using the Apache Commons CSV library for that.
Database
If you have much data, or concurrency issues with multiple threads, use a database such as the H2 Database Engine.

What is the fastest file / way to parse a large data file?

So I am working on a GAE project. I need to look up cities, country names and country codes for sign-ups, LBS, etc.
I figured that putting all this information in the Datastore would be wasteful: it will be read quite frequently and would eat into my Datastore quota for no reason, especially since these lists aren't going to change.
Now that leaves me with a few options:
API - No budget for paid services, free ones are not exactly reliable.
Upload Parse-able file - Favorable option as I like the certainty that the data will always be there.
So I got the files needed from GeoNames (link has source files for all countries in case someone needs it). The file for each country is a regular UTF-8 tab delimited file which is great.
However, now that I have the option to choose how to format and access the data, the question is:
What is the best way to format and retrieve data systematically from a static file in a Java servlet container?
The best way being the fastest, and least resource hungry method.
Valid options:
TXT file, tab delimited
XML file Static
Java Class with Tons of enums
I know that importing country files as Java enums and going through their values will be very fast, but do you think this is going to affect memory beyond reasonable limits? On the other hand, if I read the file line by line, every lookup will loop through a few thousand lines until it finds the required record: no memory issues, but incredibly slow. I have some experience with parsing an Excel file in a Java servlet, and it took something like 20 seconds just to parse 250 records; at large scale, responses WILL time out (no doubt about it). So is XML anything like Excel?
Thank you very much guys !! Please provide opinions, all and anything is appreciated !
Easiest and fastest way would be to have the file as a static web resource file, under the WEB-INF folder and on application startup, have a context listener to load the file into memory.
In memory, it should be a Map, keyed by whatever you want to search by. This gives you roughly constant access time.
Memory consumption only matters if the data is really big. A hundred thousand records, for example, are not worth optimizing if you need to access them many times.
The static file should be in plain text or CSV format, since these are read and parsed most efficiently. There is no need for XML formatting, as parsing it would be slower.
If the list is really big, you can break it up into multiple, smaller files, and only parse those and only when they are required. A reasonable, easy partitioning would be to break it up by country, but any other partitioning would work (like based on its name using the first few characters from its name).
You could also consider building this Map in memory once, serializing it to a binary file, and including that binary file as a static resource. That way you would only have to deserialize the Map at startup, with no need to parse the text file and build the objects yourself.
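A minimal sketch of the load-once-into-a-Map idea follows. The class name CountryLookup and the two-column "code, tab, name" layout are assumptions made for illustration; the real GeoNames files have more columns. In a servlet container, load would presumably be called once from a ServletContextListener's contextInitialized method, with the reader opened on the resource under WEB-INF.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical loader: parses a tab-delimited file once and keeps it in a HashMap.
class CountryLookup {
    // Assumes each line looks like "code<TAB>name"; lines with fewer fields are skipped.
    static Map<String, String> load(Reader source) throws IOException {
        Map<String, String> namesByCode = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split("\t", -1);
                if (fields.length >= 2) {
                    namesByCode.put(fields[0], fields[1]);
                }
            }
        }
        return namesByCode;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> countries = load(new StringReader("DE\tGermany\nFR\tFrance\n"));
        System.out.println(countries.get("FR")); // prints France
    }
}
```

After startup, every lookup is a single Map.get call rather than a scan through the file.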
Improvements on the data file
An alternative to having the static resource file as a text/CSV file or a serialized Map data file would be to have it as a binary data file, for which you could create your own custom file format.
Using DataOutputStream you can write data to a binary file in a very compact and efficient way. Then you could use DataInputStream to load data from this custom file.
This solution has the advantages that the file could be much smaller (compared to plain text / CSV / a serialized Map) and that loading it would be much faster (because DataInputStream does not parse numbers from text, for example; it reads the bytes of a number directly).
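One possible shape for such a custom format is sketched below. The record layout (a record count, followed by a UTF string and an int per record) and the names BinaryCityFile and populationByCity are invented for this example. It writes to a byte array so that it runs on its own; a FileOutputStream or resource stream would take its place in practice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical custom binary format: [int count] then count x ([UTF name][int population]).
class BinaryCityFile {
    static byte[] write(Map<String, Integer> populationByCity) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(populationByCity.size());
            for (Map.Entry<String, Integer> entry : populationByCity.entrySet()) {
                out.writeUTF(entry.getKey());   // length-prefixed string
                out.writeInt(entry.getValue()); // four raw bytes, no text parsing on read
            }
        }
        return bytes.toByteArray();
    }

    static Map<String, Integer> read(byte[] raw) throws IOException {
        Map<String, Integer> result = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String city = in.readUTF();
                int population = in.readInt();
                result.put(city, population);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> cities = new LinkedHashMap<>();
        cities.put("Berlin", 3500000);
        cities.put("Paris", 2100000);
        System.out.println(read(write(cities)).get("Paris")); // prints 2100000
    }
}
```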
Hold the data in source form as XML. At start of day, or when it changes, read it into memory: that's the only time you incur the parsing cost. There are then two main options:
(a) your in-memory form is still an XML tree, and you use XPath/XQuery to query it.
(b) your in-memory form is something like a Java HashMap.
If the data is very simple then (b) is probably best, but it only allows you to do one kind of query, which is hard-coded. If the data is more complex or you have a variety of possible queries, then (a) is more flexible.
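Option (a) can be sketched with the XML and XPath APIs that ship with the JDK. The document layout (a countries root with country elements carrying a code attribute and a name child) and the class name XmlCountryQuery are assumptions made for this example only. The code value is concatenated into the XPath expression, so it is assumed to be a trusted value without quote characters.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example: parse the XML once, then query the in-memory tree with XPath.
class XmlCountryQuery {
    // Parsing cost is paid here, once, at startup or when the data changes.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the country name for the code, or an empty string if nothing matches.
    // 'code' is assumed to be trusted input (no quotes), since it is concatenated.
    static String nameForCode(Document doc, String code) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("/countries/country[@code='" + code + "']/name", doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<countries>"
                + "<country code='DE'><name>Germany</name></country>"
                + "<country code='FR'><name>France</name></country>"
                + "</countries>";
        Document doc = parse(xml);
        System.out.println(nameForCode(doc, "FR")); // prints France
    }
}
```

Other queries only need a different XPath expression, which is the flexibility referred to above; a HashMap answers just the one lookup it was keyed for.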

Best way to create a file and then use the data later

Java: Best way to store data in a file.
I am comparing two versions of a file and recording the differences between them as Inserted, Deleted or Changed. The data needs to be logged in a format similar to this:
Version_old=1.28 Version_new=1.29
Operation=Changed,SourceLineFrom=55,SourceLineTo=55 TargetFileFrom=55 TargetFileTo= 55
Operation=Delete, SourceLineFrom=57,SourceLineTo=59 TargetFilefrom=57 TargetFileTo= -
The data is needed later on. Can anyone suggest the best and easiest format in which to save this data? It has to be retrieved later for processing.
I would look at the format produced by the git diff tool. It's clear, it can easily be parsed, and I'm sure that there are existing parsers for it.
Too many options to really be helpful.
It sounds like maybe XML; at least you get free parsers.
Another alternative is to store the data using JSON format. This is mainly possible because each change set is constructed as a name-value pair (a Map basically).

Best file format regarding standard string and integer data?

For my project, I need to store info about protocols (the data sent (most likely integers) and in the order it's sent) and info that might be formatted something like this:
'ID' 'STRING' 'ADDITIONAL INTEGER DATA'
This info will be read by a Java program and stored in memory for processing, but I don't know what would be the most sensible format to store this data in?
EDIT: Here's some extra information:
1) I will be using this data in a game server.
2) Since it is a game server, speed is not the primary concern, since this data will primarily be read and utilized during startup, which shouldn't occur very often.
3) I would like to keep memory consumption at a minimum, however.
4) The second data "example" will be used as a "dictionary" to look up names of specific in-game items, their stats and other integer data (and therefore might become very large, unlike the first data containing the protocol information, where each file will only note small protocol bites, like a login protocol for instance).
5) And yes, I would like the data to be "human-editable".
EDIT 2: Here are the choices that I've made:
JSON - For the protocol descriptions
CSV - For the dictionaries
There are many factors that could come to weigh--here are things that might help you figure this out:
1) Speed/memory usage: If the data needs to load very quickly or is very large, you'll probably want to consider rolling your own binary format.
2) Portability/compatibility: Balanced against #1 is the consideration that you might want to use the data elsewhere, with programs that won't read a custom binary format. In this case, your heavy hitters are probably going to be CSV, dBase, XML, and my personal favorite, JSON.
3) Simplicity: Delimited formats like CSV are easy to read, write, and edit by hand. Either use double-quoting with proper escaping or choose a delimiter that will not appear in the data.
If you could post more info about your situation and how important these factors are, we might be able to guide you further.
How about XML, JSON or CSV ?
I've written a similar protocol-specification using XML. (Available here.)
I think it is a good match, since it captures the hierarchical nature of specifying messages / network packages / fields etc. The order of fields is well defined and so on.
I even wrote a code-generator that generated the message sending / receiving classes with methods for each message type in XSLT.
The only drawback as I see it is the verbosity. If you have a really simple structure of the specification, I would suggest you use some simple home-brewed format and write a parser for it using a parser-generator of your choice.
In addition to the formats suggested by others here (CSV, XML, JSON, etc.) you might consider storing the info in a Java properties file. (See the java.util.Properties class.) The code is already there for you, so all you have to figure out is the properties names (or name prefixes) you want to use.
The Properties class also provides for storing/loading properties in a simple XML format.
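A minimal sketch of the properties-file idea, using only java.util.Properties. The key scheme ("item.1001.name" and so on) and the class name ItemDictionary are invented for illustration, and the example stores to a String so that it runs on its own; a FileWriter and FileReader would be used for a real file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;

// Hypothetical item dictionary kept in the human-editable .properties format.
class ItemDictionary {
    // Serializes the properties to the standard key=value text form.
    static String save(Properties items) throws IOException {
        StringWriter out = new StringWriter();
        items.store(out, "item dictionary");
        return out.toString();
    }

    // Parses key=value text back into a Properties object.
    static Properties parse(String text) throws IOException {
        Properties items = new Properties();
        items.load(new StringReader(text));
        return items;
    }

    public static void main(String[] args) throws IOException {
        Properties items = new Properties();
        items.setProperty("item.1001.name", "Iron Sword");
        items.setProperty("item.1001.damage", "12");

        Properties loaded = parse(save(items));
        // All values are strings, so integer data has to be parsed by the caller.
        int damage = Integer.parseInt(loaded.getProperty("item.1001.damage"));
        System.out.println(loaded.getProperty("item.1001.name") + " " + damage); // prints Iron Sword 12
    }
}
```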

Document Management System - Database Design

I'm writing my own Document Management System (DMS) in Java (the ones available don't satisfy my needs).
The documents shall be described by the Qualified DublinCore Metadata Standard. The easiest way to do this, in my opinion, is to pack the key-value pairs into an RDF model with an XML representation.
To store the metadata for all documents I have two ideas (the document files themselves will be stored in the filesystem):
Store all metadata of all documents in a single XML file
Make an XML file for each document and store it either in the filesystem or in an RDBMS (like the H2 database engine for Java). A key-value database won't solve this because the keys for one document are not unique.
Since (many) documents are linked to each other, the first approach may be better for analysing the data, but the second approach may be much faster.
Which solution would you recommend? Or are there any better solutions?
Stefan
I don't know how your analysis works, but if you need the complete graph in memory to do it, then use variant 1 (store all metadata of all documents in a single XML file), because variant 2 gains you nothing (only extra work) in this scenario.
added
If the extra work for variant 2 is not too much, then I recommend variant 2, because it can be more scalable.
You could update or add document metadata by writing only a small XML file instead of a huge one.
It depends on which XML parser you use, but in some cases it is faster to parse several smaller XML files than one huge one (though this strongly depends on the amount of data).
Have you considered using MongoDB and GridFS? http://www.mongodb.org/display/DOCS/GridFS+Specification
You can store your documents directly in MongoDB as binary and even store the associated metadata for each file in any format you want. It can store documents even if they have the same name, and it will generate its own unique IDs.
BTW: even if it does not belong to your question: have a look at a JCR (Java Content Repository) implementation like JackRabbit. You could use it to store your documents and maybe your meta data too.
I'd look into a NoSQL document solution like CouchDB to see if it could help you.
I don't like the file system solution; there's no abstraction whatsoever to help you there.
If you are always accessing all documents, neither of your approaches would be slower than the other. But I would recommend the second approach. When it comes to analyzing the data, you'll need to read all documents, so it makes no difference whether they are in different files or in one file.
