XML as data store. Insert, remove, delete - java

I was planning to use XML to store the data for a Java DVD database application I'm writing. I know that the word "database" is right there in the title, but XML just seemed so much more portable, was human readable and (I assumed before looking into it) simpler to implement.
Parsing XML seems to be the easiest thing in the world... even creating a new XML file isn't much trouble, but changing records, inserting them or deleting them, I can only see how to do by creating a fresh XML file.
Am I missing something? Or is the thing that I'm missing that I should switch over to a database format (but then is there some wonderful database format I've not heard of, that's totally portable and that users won't need to install anything separate to use? :) )
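For reference, the read-modify-rewrite cycle described above can be sketched with the DOM API that ships with the JDK. This is only a minimal sketch: the class name DvdXmlFile and the file layout (a dvds root holding dvd elements with an id attribute and a title child) are assumptions made for illustration, not something from the question. Every change loads the whole file into memory and writes the whole file back, which is fine for a small collection but is exactly the "fresh XML file" behaviour the question mentions.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical layout: <dvds><dvd id="1"><title>Alien</title></dvd></dvds>
final class DvdXmlFile {
    private final Path file;

    DvdXmlFile(Path file) {
        this.file = file;
    }

    // Parse the whole file into a DOM tree, or start an empty <dvds> document.
    private Document load() throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        if (Files.notExists(file) || Files.size(file) == 0) {
            Document doc = builder.newDocument();
            doc.appendChild(doc.createElement("dvds"));
            return doc;
        }
        try (InputStream in = Files.newInputStream(file)) {
            return builder.parse(in);
        }
    }

    // Serialize the whole tree back to disk, replacing the previous content.
    private void save(Document doc) throws Exception {
        try (OutputStream out = Files.newOutputStream(file)) {
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
        }
    }

    // Insert: append a new <dvd> element under the root, then rewrite the file.
    void insert(String id, String title) throws Exception {
        Document doc = load();
        Element dvd = doc.createElement("dvd");
        dvd.setAttribute("id", id);
        Element titleElement = doc.createElement("title");
        titleElement.setTextContent(title);
        dvd.appendChild(titleElement);
        doc.getDocumentElement().appendChild(dvd);
        save(doc);
    }

    // Delete: remove the first <dvd> with a matching id, then rewrite the file.
    boolean delete(String id) throws Exception {
        Document doc = load();
        NodeList dvds = doc.getElementsByTagName("dvd");
        for (int i = 0; i < dvds.getLength(); i++) {
            Element dvd = (Element) dvds.item(i);
            if (id.equals(dvd.getAttribute("id"))) {
                dvd.getParentNode().removeChild(dvd);
                save(doc);
                return true;
            }
        }
        return false;
    }

    int count() throws Exception {
        return load().getElementsByTagName("dvd").getLength();
    }
}
```

Note that nothing here protects against a crash halfway through a write; that gap is what the answers below mean when they bring up ACID guarantees.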

the most popular way to use a file as a database is probably with sqlite http://www.sqlite.org/ and that's what i would use if i were solving your problem (it's pretty much a standard SQL database, but uses just one file as storage). another, pure-java option is apache derby http://db.apache.org/derby/
however, pure xml databases do exist (and were quite fashionable about 10 years ago - the "nosql" of their time) - the associated standards are xpath http://en.wikipedia.org/wiki/XPath and xquery http://en.wikipedia.org/wiki/Xquery . i haven't used it, but it seems like basex http://basex.org/open-source/ is an open-source implementation that you could use (and it does claim to provide ACID guarantees - http://basex.org/products/ ).
if you're more familiar with xml than sql i don't see any great harm in using an xml database for a small project. just structure your code so that most of the program doesn't care what the storage is (ie by providing a neutral interface). then if xml doesn't work out you can switch to sql by re-implementing just that interface and leaving the rest of your program alone (and if it does work, post back here saying so - it would be interesting to know).
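a minimal sketch of the "neutral interface" idea, assuming hypothetical names (DvdStore, InMemoryDvdStore, DvdCatalog) that are not from any library. the rest of the program only ever sees the interface, so an xml-backed or sql-backed class could replace the in-memory one later without touching the callers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Storage-neutral contract (hypothetical): callers depend only on this.
interface DvdStore {
    void save(String id, String title);
    Optional<String> findTitle(String id);
    boolean delete(String id);
    List<String> listIds();
}

// Simplest possible implementation; an XML- or SQL-backed class would
// implement the same four methods.
final class InMemoryDvdStore implements DvdStore {
    private final Map<String, String> titlesById = new LinkedHashMap<>();

    @Override
    public void save(String id, String title) {
        titlesById.put(id, title);
    }

    @Override
    public Optional<String> findTitle(String id) {
        return Optional.ofNullable(titlesById.get(id));
    }

    @Override
    public boolean delete(String id) {
        return titlesById.remove(id) != null;
    }

    @Override
    public List<String> listIds() {
        return new ArrayList<>(titlesById.keySet());
    }
}

// Example of application code that does not care what the storage is.
final class DvdCatalog {
    private final DvdStore store;

    DvdCatalog(DvdStore store) {
        this.store = store;
    }

    String describe(String id) {
        return store.findTitle(id)
                .map(title -> id + ": " + title)
                .orElse(id + ": (unknown)");
    }
}
```

switching storage then means writing one new class that implements DvdStore and changing the single line where the store is constructed.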

If you're going to have a web-based front end, it seems that a regular database is the way to go as the back end. I don't believe your users would have a need to download anything new, since that's all taken care of server-side. A real database also has the ACID advantage over a pseudobase; it should be atomic, consistent, isolated, and durable, and I can't imagine XML would be a good substitute in those respects.

Related

XML vs JSON vs SQLite for only reading data

I have a collection of 350 locations in the United States with each containing about 25 subcategories. The data structure looks something like this:
Location (ex: Albany, NY)
--> Things to do
--> Population
... 23 More
Which of the following would be best for loading this data into the app: JSON, XML, or SQLite? Just to clarify, I don't need to edit this data in any way. I simply need to read it so that the information can be loaded into TextViews.
Edit:
I'm attempting to implement Room and XML and so far the XML seems to be the simplest to implement. Is it bad practice to use the XML solution? It doesn't seem to be using too many resources and it isn't running slow at all when tested on a few devices. Would it still be a better practice to implement the Room solution?
Undoubtedly, among all of these RDB is the most efficient one, both in terms of storage and query response. I personally do not see any point in using xml and json as these have been traditionally used for exchange of data and are inefficient for storage and queries.
I would suggest that you evaluate the following:
a) How are you going to store the data: a single file vs. multiple files (for example, by subject)?
b) Are you going to be doing updates on the strings or just appending? (SQL is better suited for updates, but if it is just reading data after batch processing, flat files might be better suited.)
c) How complex are the queries that you want to implement? XML and SQL are better suited than JSON for queries that address metadata (date stored, original location address, etc.).
Once you determine what you want to optimize: whether it is on adding metadata, fast updates, fast querying, ease of storage, fast retrieval of subject files, etc. then you can decide the tradeoffs with other less important goals. In this specific instance the devil is very much in the details.
In most cases it would be better to use a database because it increases readability and maintainability, especially if you want to show this information inside a kind of list view. If you use JSON or XML you'll have to parse the data yourself and write a lot of code to switch between things or to load them with good performance. Consider the case of using Room, LiveData and a RecyclerView: this will reduce the code you need and improve (a lot) the performance and readability of your app code. By the way, you should provide more information about how you want to use this information and where you want to show it. XML (or the Android resource system) should be used if you plan to use the resource system itself, with its qualifiers, to reduce your work. Most of the time JSON is used to communicate with the outside world or with another app in an easy way, or for REST requests/responses.
The one option that wouldn't make sense to use at all for your use case is SQLite. Unless you plan on running specific queries on the data for preprocessing before loading it into your view, it isn't worth the overhead (even if I don't imagine the overhead is large with 350 locations).
XML and JSON serve the same use case without much difference; read up on their specifics on this website: https://www.json.org/xml.html
I would personally go for JSON due to the simplicity of the format.
Edit:
@simo-r's argument is also a valid one with regard to the readability of your code. While there are libraries that can make reading JSON/XML easier, Android has really good SQLite support by default, so it might make sense to use it. Ultimately it comes down to your personal preference and where you see the project growing.
I have a collection of 350 locations in the United States with each containing about 25 subcategories.
The main issue is scalability
Will you, in the next few years, keep just a few hundred locations, or do you imagine, that, if your software becomes successful, your data would grow to many thousands of locations?
If yes: choose SQLite because it can store many records efficiently. Don't forget to have a good database schema with appropriate indexes. See this and read about database normalization. Also, an SQLite database could later be migrated (with some effort) to PostgreSQL.
If no (your data has just a few megabytes): keep JSON or XML. The data is in the page cache.
Consider also YAML, and sometimes a mixed approach.
Don't forget to document how your data is organized and accessed.
See also the data persistence chapter of this draft report
If you are simply going to bind the data to text views, you can just store the text in strings.xml. As simple as that.
Go with JSON.
Advantages :
Low overhead ( Vs SQLite )
Lightweight parsers like Jackson are available, with which you can easily convert your data into a custom object or data structure if you need to.
Maintainable, as most developers understand the format.
I would suggest using JSON. Reasons below:
JSON vs XML
JSON is lighter than XML and takes fewer resources (network and storage), which improves the performance of the app.
JSON parsing is easy and, as mentioned above, it's trivial.
JSON is friendly to javascript, in case it's required.
JSON vs SQLite
A data set of 350 entries with about 25 attributes each can easily be managed with JSON. An RDBMS is not required.
SQLite becomes an overhead. It's an extra layer and layer comes with a cost. Especially if the application is containerized, the architecture becomes complicated. One needs to deal with volume mapping etc, in case of JSON you can keep the data as part of the application code.
Importantly, since the data is static, keep the application stateless by keeping the data alongside the codebase. This makes a lot more sense from an architectural perspective.
Problem
You have a fixed set of information with a simple structure that you wish to deliver to clients.
Questions to Reflect On
Do I expect this information to ever be significantly changed or modified?
Do I expect to increase the amount of information available?
What kind of help do I have? Do they have a background in software engineering or is it someone of a different profession that has to wear a lot of hats?
What is the scale of the project? Are you expecting a large amount of users or just people interested in a very niche application?
JSON or XML
JSON and XML provide similar services: they are both data interchange formats. If the information is not expected to grow, both might be a great option. If it's public information, just serve these files statically over nginx. You can point a worker with limited software engineering experience to update these files; they're just files in a folder presented in a human-readable format... it's extremely simple to do. These updates should be minor and infrequent.
JavaScript Object Notation(JSON) Pros
solid browser and backend support
small size and fast parsing by the javascript engine
very human readable, easy for the untrained eye to make changes
Extensible Markup Language(XML) Pros
standard meta-data option
supports namespaces
solid backend support and is often baked into frameworks
This article explains XML and JSON differences really well (in 2020) if these highlights were not sufficient for your investigation.
Database System
There are a plethora of database systems out there. Their job is to efficiently retrieve specific information from a large volume of data stored. The key reason to use databases is scalability. Scalability means a number of things; I view it as adapting to drastic change. If you expect this information to frequently change or grow, go with a database.
Object Relational Mapping (ORM)
Databases can be cumbersome to use. I would recommend using an ORM on top of them. These encapsulate a database and make it more user friendly (language specific). Room makes sense in your use case, especially for Java Android development. Encapsulation also allows you to migrate to other databases later without changing your code. Here's a good article that discusses Room and SQLite!
Miscellaneous
"Is it bad practice to use an XML solution?"
No. The important thing is that it works, is understandable, and runs efficiently. Just keep in mind that XML and JSON are data interchange formats and they do THAT job well. This stackoverflow discussion may be helpful to gain a better picture of what that means; be sure to read more than just the accepted answer.
"It doesn't seem to be using too many resources and it isn't running slow at all when tested on a few devices."
Although testing for functionality is great, keep in mind that your test is not a load test and does not verify what you're trying to confirm. I would explore load testing, Wikipedia is a good place to start!

Which is the best way to handle big CSV files (Java, MySQL, MongoDB)

I need to handle a big CSV file with more than 750,000 rows of data. Each line has 1000+ characters and ~50 columns, and I am really not sure what's the best (or at least a good and sufficient) way to handle and manipulate this kind of data.
I need to do the following steps:
Compare the values of two columns and write the result to a new column (this one seems easy)
Compare values of two lines and do stuff. (e.g delete if one value is duplicated.)
Compare values of two different files.
My Problem is that this is currently done with PHP and/ or Excel and the limits are nearly exceeded + this takes a long time to process and will be no longer possible when the files get even bigger.
I have 3 different possibilities in mind:
Use MySQL, create a table (or two) and do the comparing, adding or deleting part. (I am not really familiar with SQL and would have to learn it; also, it should be done automatically, so there is the problem that you can't create tables from CSV files.)
Use Java, creating objects in an ArrayList or LinkedList, and do "the stuff" (the operations would be easy, but handling that much data will probably be the problem)
(Is it even possible to save that many files in Java or does it crash / is there a good tool etc.?)
Use Clojure along with MongoDB to add files from CSV to MongoDB and read files using Mongo.
(Name additional possibilities if you have another idea ..)
All in all I am not a Pro in any of these but would like to solve this problem / get some hints or even your opinion.
Thanks in advance
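For the plain-Java option, a minimal sketch of the first two steps (comparing two columns into a new column, and dropping rows whose key value was already seen) might look like the following. It streams the file line by line, so only the set of keys is held in memory rather than every row. The class name, method name and column indexes are hypothetical, and the sketch assumes a header-less file whose fields contain no quoted commas; a real CSV library would be needed otherwise.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

final class CsvDedup {
    // Copies rows from in to out, skipping rows whose key column was already
    // seen, and appends a true/false column telling whether colA equals colB.
    // Returns the number of rows written.
    static int process(Path in, Path out, int keyCol, int colA, int colB) throws IOException {
        Set<String> seenKeys = new HashSet<>();
        int written = 0;
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // ignore blank lines
                }
                // Naive split: assumes no quoted fields containing commas.
                String[] fields = line.split(",", -1);
                if (!seenKeys.add(fields[keyCol])) {
                    continue; // duplicate key, drop the row
                }
                boolean same = fields[colA].equals(fields[colB]);
                writer.write(line + "," + same);
                writer.newLine();
                written++;
            }
        }
        return written;
    }
}
```

A call such as CsvDedup.process(Path.of("in.csv"), Path.of("out.csv"), 0, 1, 2) would then treat column 0 as the key and compare columns 1 and 2. Comparing two different files can follow the same pattern: load the keys of one file into a set, then stream the other.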
Since in our company we work a lot with huge csv files here are some ideas:
because these files are in our case always exported from some other relational database we always use PostgreSQL, MySQL or golang + SQLite to be able to use simple plain SQL queries which are in these cases most simple and reliable solution
number of rows you describe is quite low from the point of view of all these databases so do not worry
all have native internal solution for import / export of CSV - which works much quicker than anything created manually
for repeated standard checks I use golang + SQLite with :memory: database - this is definitely the quickest solution
MySQL is definitely very good and quick for the checks you described, but the choice of database also depends on how sophisticated an analysis you need to do later - for example, MySQL up to 5.7 does not have window functions, which you might need later - so consider using PostgreSQL in some cases too...
I normally use PostgreSQL for this kind of tasks. PostgreSQL COPY allows importing CSV data easily. Then you get a table with your CSV data and the power for SQL (and a reasonable database) to do basically anything you want with the data.
I am pretty sure MySQL has similar capabilities for importing CSV; I just generally prefer PostgreSQL.
I would not use Java for CSV processing. It will be too much code and, unless you take care of indices, the processing will not be performant. An SQL database is much better equipped for tabular data processing (which should not be a surprise).
I wouldn't use MongoDB, my impression is that it is less powerful in update operations compared to an SQL database. But this is just an opinion, take it with a grain of salt.
You should try Python with the pandas package. On a machine with enough memory (say 16GB) it should be able to handle your CSV files with ease. The main thing is - anyone with some experience with pandas will be able to develop a quick script for you and tell you in a few minutes if your job is doable or not. To get you started:
import pandas
df = pandas.read_csv('filename.csv')
You might need to specify the column type if you get into memory issues.
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
I'd suggest using Spark. Even on a standalone machine the performance is incredible. You can use Scala or Python to handle your data. It's flexible and you can do processing that would be hard to do in plain Java or a relational database.
The other choices are great also, but I'd consider Spark for all analytics needs from now on.

C#/Java Best practice to save data locally

Let's say I want to make a program in C#/.NET for a video club to store clients and their rentals.
What is the best and most modern (and standalone) way to store this data? XML, binary serialization, SQLite, Access?
I will also need to query the data often. For example, a client comes in, I search for him by name, I find him and I add a new rental for him. Also (now or in the future) there will be data for DVDs too (adding DVDs, and soon there will be a relation between DVDs and clients). It's like a database, but because a database server is not standalone, I want my program to work as long as the user has .NET installed on his PC. I could use MySQL, but that needs a MySQL server installed...
What do you think is the best and most modern solution?
Various SQL options are good enough: SQLite, SQL CE, SQL Express.
Access may be the easiest starting solution as you can visualise the data without much trouble.
if it is not critical data, simply store it in an xml or json file using xml to object mapping like JAXB or xsd.exe or JSON to object mapping like GSON.
If it is critical best is to go with DB.
Since it's a small solution use either ms sql express or mysql. Both should be free (as in free beer).
Sql express is best for .net, mysql for java.
You can use an embedded database. If you don't like SQL databases (some can be embedded), you can try one of the many free "NoSQL" databases; http://nosql-database.org/ has a list of 122.
I think the best solution is the simplest which works. I don't think the most modern is necessarily going to be the best. For example, I'd be looking at the number of members. Is this number as big as from a large movie store chain? Or is for a small group of people? If only a small number, is it possible for you to use Excel or similar?
Are you choosing C# because you want to learn it or are you driven by the end result. i.e. a working and useful asset tracking system? (Which many already exist of course)
Other than that, many databases can be stored in a single file and are quite easy to create. I think there is a flat file driver for SQLite, and JDBC certainly has one.
You may also wish to consider the need to inform your members of your privacy policy with dealing with their personal information (eg. How do you keep private Mr Bloggs's dirty movie rental collection). We often forget this stuff in our eagerness for creating a cool application!
Good luck!

Recommend practices for storing application transient config

I am writing a small personal file server with Play! and it's my first web application. What are the recommended practices for storing preferences that users can modify through a preference panel?
My first idea was to use a property file in the conf directory, but I must be able to modify it during runtime. Is the conf directory writable, whatever the deployment option ?
Are there built-in options for that or is there a better approach ?
As said by Kim Stebel, the usual solution is to use what your application already has, which is most of the time a database engine, relational or not. That's because most of the time, user preferences come after some other data were already persisted.
But in your case, it seems that the file system is your persistence engine, and you don't seem to need transactions or excessively good read/write performance for the discussed feature, so I would keep that part as simple as possible until some other persistence engine is needed: I would just serialize the user preference object to some text format (JSON or XML come to mind) and save it in the filesystem: no mapping hell for now, no premature choice (and even the possibility to corrupt^W edit your user preferences with your favorite text editor, directly on the server, yeah ;)
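To make the "serialize to a text file" idea concrete, here is a minimal sketch from the Java side of the ecosystem, using the XML form of java.util.Properties that ships with the JDK instead of one of the mapping libraries named below. The class name PreferenceFile and the preference keys are made up for illustration, and where the file may live (inside or outside the conf directory) is left as an assumption to check against your deployment.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

final class PreferenceFile {
    // Write all preferences to a human-editable XML file, replacing old content.
    static void save(Properties prefs, Path file) throws IOException {
        try (OutputStream out = Files.newOutputStream(file)) {
            prefs.storeToXML(out, "user preferences");
        }
    }

    // Read preferences back; a missing file simply yields empty preferences.
    static Properties load(Path file) throws IOException {
        Properties prefs = new Properties();
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                prefs.loadFromXML(in);
            }
        }
        return prefs;
    }
}
```

Usage would be along the lines of loading once at startup, calling setProperty("theme", "dark") from the preference panel, and saving after each change. This only covers flat key/value preferences; nested preference objects are where the mapping libraries start to pay off.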
That being said, there is a ton of good framework for that job, in Scala or from the Java ecosystem.
For the XML mapping, I don't think the Scala native library is the best choice. It's easy to produce an XML structure with it, but the mapping from XML to Scala objects is at best horrible.
XStream (http://x-stream.github.io/) is quite good for that, but you will have to use Java collections, or add your own (and that wasn't my idea of the 'most simple').
For JSON mapping, there are several really good libraries in Scala. Google and other stackoverflowers may have more details, but I know there are at least these two:
Lift-JSON (https://github.com/lift/lift/tree/master/framework/lift-base/lift-json/) - I used that one, and even if the API seems sometimes strange to me and the doc is a little too light, the automatic deserialization to case class is really cool;
Jerkson (https://github.com/codahale/jerkson) is reported to be quite simple and good
Hope it helps,
The usual solution would be to store the settings in a database. Is there any reason not to use a database?

Alternative of Storing data except databases like mysql,sql etc

I have completed my Address Book project in core Java, in which my data is stored in a database (MySQL).
I am facing a problem: when I run my program on another computer, I have to create the whole database again.
So please tell me any alternative for storing my data without using database software like MySQL, SQL etc.
You can use an embedded database such as HSQLDB, Derby (a.k.a. JavaDB), H2, ...
All of those can run without any additional software installation and can be made to act like just another library.
I would suggest using an embeddable, lightweight database such as SQLite. Check it out.
From the features page (under the section Suggested Uses For SQLite):
Application File Format. Rather than using fopen() to write XML or some proprietary format into disk files used by your application, use an SQLite database instead. You'll avoid having to write and troubleshoot a parser, your data will be more easily accessible and cross-platform, and your updates will be transactional.
The whole point of StackOverflow was so that you would not have to email around questions/answers :)
You could store data in a filesystem, memory (use serialisation etc) which are simple alternatives to DB. You can even use HSQLDB which can be run completely in memory
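As a minimal sketch of the serialisation route, assuming the address book can be represented as a list of strings (the class name AddressBookFile is hypothetical), standard Java object serialization can write the whole list to one file and read it back on another machine without any database installed:

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;

final class AddressBookFile {
    // Write the whole list to a single binary file, replacing old content.
    static void save(ArrayList<String> entries, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(entries);
        }
    }

    // Read the list back; the cast is safe only for files written by save().
    @SuppressWarnings("unchecked")
    static ArrayList<String> load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (ArrayList<String>) in.readObject();
        }
    }
}
```

The trade-offs are that the file is not human readable, every save rewrites everything, and there is no query language: searching means loading the list and filtering it in memory.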
If your data is not so big, you may use a simple txt file and store everything in it, then load it into memory. But this will change the way you modify/query data.
Database software like mysql, sql etc provides an abstraction in terms of implementation effort. If you wish to avoid using the same, you can think of having your own database like XML or flat files. XML is still a better choice as XML parsers or handlers are available. Putting your data in your customised database/flat files will not be manageable in the long run.
Why don't you explore sqlite? It is file based, means you don't need to install it separately and still you have the standard SQL to retrieve or interact with the data? I think, sqlite will be a better choice.
Just use a prevayler (.org). Faster and simpler than using a database.
I assume from your question that you want some form of persistent storage on the local file system of the machine your application runs on. In addition to that, you need to decide how the data in your application is to be used, and the volume of it. Do you need a database? Are you going to be searching the data by different fields? Do you need a query language? Is the data small enough to fit into a simple data structure in memory? How resilient does it need to be? The answers to these types of questions will help lead to the correct choice of storage. It could be that all you need is a simple CSV file, XML or similar. There are a host of lightweight databases such as SQLite, Berkeley DB, JavaDB etc. - but whether or not you need the power of a database is up to your requirements.
A store that I'm using a lot these days is Neo4j. It's a graph database and is not only easy to use but also is completely in Java and is embedded. I much prefer it to a SQL alternative.
In addition to the other answers about embedded databases: I have been working on an object database that directly serializes Java objects without the need for an ORM. Its name is Sofof and I use it in my projects. It has many features, which are described on its website.
