Java ORM related question - SQL Vs Google DB (Big Table?) GAE

I was wondering about the following two options when one is not using SQL tables but ORM based DBs (Example - when you are using GAE)
Would the second option be less efficient?
Requirement:
There is an object. The object has a collection of similar items. I need to store this object. Example, say the object is a tree and it has a collection of leaves.
Option 1:
Traditional SQL type structure:
Table for the Tree (with TreeId as the identifier for a row in the Table.)
Table for the Leaves (where each leaf has a TreeId and to show the leaves of a tree, I query all leaves where the TreeId is the Id of the tree.)
Here, the Tree structure DOES NOT have a field with leaves.
Option 2:
ORM / GAE Tables:
Using the same example above,
I have an object for Tree where the object has a collection (Set/List in Java/C++) of leaves.
I store and retrieve the Tree together with the leaves (as the leaves are implemented as a Set in the Tree object)
My question is, will the second option be less efficient than the first?
If so, why? Are there other alternatives?
Thank you!
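To make the two shapes concrete, here is a minimal plain-Java sketch. It is not GAE or JPA code: the class names (LeafRow, TreeWithLeaves, TreeStorageSketch) are hypothetical, and the in-memory list only stands in for a leaf table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Option 1: each leaf is its own row and points back to its tree by id.
class LeafRow {
    final long leafId;
    final long treeId;

    LeafRow(long leafId, long treeId) {
        this.leafId = leafId;
        this.treeId = treeId;
    }
}

// Option 2: the tree object owns its leaves and is stored and loaded as one unit.
class TreeWithLeaves {
    final long treeId;
    final List<String> leaves = new ArrayList<>();

    TreeWithLeaves(long treeId) {
        this.treeId = treeId;
    }
}

class TreeStorageSketch {
    // Option 1 read path: select the leaf rows whose treeId matches.
    // A real datastore would answer this from an index rather than a scan.
    static List<LeafRow> leavesOf(List<LeafRow> leafTable, long treeId) {
        List<LeafRow> result = new ArrayList<>();
        for (LeafRow row : leafTable) {
            if (row.treeId == treeId) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<LeafRow> leafTable = Arrays.asList(
                new LeafRow(1, 10), new LeafRow(2, 10), new LeafRow(3, 11));
        System.out.println(leavesOf(leafTable, 10).size()); // leaves of tree 10

        TreeWithLeaves tree = new TreeWithLeaves(10);
        tree.leaves.add("leaf-1");
        tree.leaves.add("leaf-2");
        System.out.println(tree.leaves.size()); // leaves arrive with the tree
    }
}
```

In the second shape every read of the tree also reads its leaves, so the cost depends mainly on how large the collection gets and on how often the tree is needed without them.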

It would be better to use Hibernate (for Java) or another ORM framework than an ORM database.
1. ORM databases are mostly amateur projects.
2. Nobody values that experience. You will be a much stronger specialist if you know PostgreSQL with an ORM framework than if you know only some ORM database.
3. There are many standards in the RDBMS world and none for ORM databases.
4. RDBMS support and community make that choice safer in the long term.
5. Efficiency is a tricky question. I am almost 80% sure that finding a row with "name = 'Alex'" will be faster in an RDBMS than in an ORM database, because the ORM database will need to unpack the object for this operation.
PS: I understand that my post is almost off-topic, but I think it contains some good things to think about.

Related

Adding custom fields in my application

I have a SaaS product built with Spring MVC and Hibernate. SaaS products generally allow users to customize the product, for example by adding extra fields to a table. I want to give users the flexibility to create custom fields in the tables for themselves. Please suggest all viable solutions for achieving this. Thank you so much for your help.

I'm guessing you're trying to back this with a relational database. The primary problem is that relational databases store things in tables, and tables don't really handle free-form data well.
So one solution is to use a document structure that is flexible, like XML (and perhaps ditch the database) but databases have features which are nice, so let's also consider the database-using approaches.
You could create a "custom field" table which would have columns (composite primary key) for
ExtendedTable
ColumnName
but you'd also have to store the data somewhere
(ExtendedKey)
DataItem
And now we get into the really nasty bits. How would you apply constraints to this data? I mean, what would the type be of a DataItem? A general solution would be quite complex (being a type of free form database). Hopefully you could limit the solution to solve only the problems you require solved.
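A minimal in-memory sketch of that "custom field" table, assuming every DataItem is stored as a String. The class CustomFieldStore and its methods are hypothetical names for illustration; the key fields mirror the columns listed above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class CustomFieldStore {
    // Composite key: which table is extended, which custom column, which row.
    static final class Key {
        final String extendedTable;
        final String columnName;
        final long extendedKey;

        Key(String extendedTable, String columnName, long extendedKey) {
            this.extendedTable = extendedTable;
            this.columnName = columnName;
            this.extendedKey = extendedKey;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Key)) {
                return false;
            }
            Key k = (Key) other;
            return extendedKey == k.extendedKey
                    && extendedTable.equals(k.extendedTable)
                    && columnName.equals(k.columnName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(extendedTable, columnName, extendedKey);
        }
    }

    // Assumption: every DataItem is a String, so no type or constraint is enforced.
    private final Map<Key, String> dataItems = new HashMap<>();

    void put(String table, String column, long rowKey, String dataItem) {
        dataItems.put(new Key(table, column, rowKey), dataItem);
    }

    // Returns null when the row has no value for that custom column.
    String get(String table, String column, long rowKey) {
        return dataItems.get(new Key(table, column, rowKey));
    }
}
```

Storing everything as a String is exactly where the constraint problem shows up: nothing in this structure stops a "birthday" field from holding "banana".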
Another approach is to use a single "extra" column that contains an XML record which embeds its own "column and value" extensions, but if you wanted to display a table of the data efficiently, you'd have to parse out every XML document in every field, which is not ideal.
Neither one of these approaches will work well with the existing SQL query language, so you'll then start building your own query language.
I suggest you go back and look at real data requirements, instead of sweeping them under the table with a "and anything else one might want" set of columns on your table.

Your requirement is a well-suited use case for NoSQL databases (like MongoDB).
Dynamically creating relational database tables and columns (modifying schemas) in response to user requests is not a best practice, because it involves DDL operations. These are very powerful, and if you don't handle them carefully, the whole application's database can end up in an inconsistent state.

Database vs Solr vs Graph DB(Neo4j)

I'm thinking about possible solution (tool) for my issue.
There is a collection of locations with a huge number (more than 600 000) of elements. Locations have names (in different languages) and are represented in a tree structure: region->country->admin division->city->zip. Users can add custom locations, but I expect this to happen rarely. The application should efficiently support searching by location name and type, building a hierarchical name (e.g. "London->England->United Kingdom"), and building a subtree of locations (e.g. all countries in Europe and the cities in those countries).
I've considered three solutions.
Plain database: locations will be held in tables and the main building logic will be implemented in Java code. With this solution I am worried about performance, because searching, building the tree and creating custom locations can involve additional table joins.
SOLR: at first glance this task is exactly what Solr is for: the data set changes rarely and we need to search by name. But I'm not sure whether Solr's pivot feature will satisfy the tree-building needs. I'm also not sure Solr search will be much better than a plain DB, because the search is not that difficult (just searching by names, which are short strings).
Graph DB Neo4j: it seems useful for building trees and subtrees. But I'm not sure about search performance (it seems I would have to use the community edition, which lacks some useful performance features such as caching).
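For the plain-database option, the hierarchical name could be assembled in Java by walking parent references once the rows are loaded. This is only a sketch: Location, LocationNames and the parentId field are assumed names, and the map stands in for whatever lookup the application uses.

```java
import java.util.HashMap;
import java.util.Map;

class Location {
    final long id;
    final String name;
    final Long parentId; // null for a root location

    Location(long id, String name, Long parentId) {
        this.id = id;
        this.name = name;
        this.parentId = parentId;
    }
}

class LocationNames {
    // Walks from the given location up to the root, joining names with "->".
    static String hierarchicalName(Map<Long, Location> byId, long id) {
        StringBuilder sb = new StringBuilder();
        Location current = byId.get(id);
        while (current != null) {
            if (sb.length() > 0) {
                sb.append("->");
            }
            sb.append(current.name);
            current = current.parentId == null ? null : byId.get(current.parentId);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Long, Location> byId = new HashMap<>();
        byId.put(1L, new Location(1, "United Kingdom", null));
        byId.put(2L, new Location(2, "England", 1L));
        byId.put(3L, new Location(3, "London", 2L));
        System.out.println(hierarchicalName(byId, 3)); // London->England->United Kingdom
    }
}
```

With at most five levels (region to zip), the walk is a handful of map lookups per name, so the cost of this step is small compared with loading the rows.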

A plain database is a big NO, as an RDBMS is not optimized for relationship-based queries. For example: show me the people who eat in the same restaurant as I do and also belong to the same region. To make it more complex, a DB query can be a killer when levels of relationship have to be calculated, such as when I am your second-level friend because one or more of your friends are also my friends.
SOLR: Solr is a good option, but you have to look at its performance impact. With so many rows to index it can be a memory killer. Go through these first before implementing Solr.
http://wiki.apache.org/solr/SolrPerformanceProblems
http://wiki.apache.org/solr/SolrPerformanceFactors
Solr is also not a good solution for more logical searches, as you have to learn all of it before going for it.
Neo4j (or any other graph DB) is a perfect solution. I have implemented all three of these technologies myself, and in my experience Neo4j was the best for such a requirement.
However, you must see how to backup the database and how to recover it in case of a crash.
All the best.

Performance of Graph vs. Relational databases

I am working on a project where tons of graph operations are performed in near real-time. We are currently using Hibernate, MySQL and EhCache but considering moving all the graph-related persistence to a graph database like Neo4j or Titan.
Can graph databases perform better than Hibernate+relational? I just want to make sure we are not going to replace six of one with half a dozen of the other.

The deeper the object graph, the more the performance advantage swings to object/graph databases.
Relational database performance drops off markedly with more than seven JOINs.
Geometric systems such as CAD/CAM, with deep object graphs for bills of materials, outperform their relational counterparts.
Relational databases have one huge advantage: relational algebra and a clear separation between the data and the "how" of accessing and manipulating it. But they are not perfect for every problem.

The advantage you have when moving to Neo4j (or some other graph DB) is that query time remains (almost) constant and hence predictable, regardless of growth in data volume. It is always better to do a proof of concept based on your data domain, as generalized answers are generally not applicable to NoSQL databases.
Taken from here.

Both graph and relational databases rely on caches to improve query performance. However, an edge traversal in a graph database is usually a constant time operation, and the edge is typically cached if the vertex is cached. With an RDBMS, a foreign key traversal requires a B-Tree index lookup on the target table which takes O(log n) time. When the index doesn't fit in the cache, the database would have to perform disk-seek operations which are slow.
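The difference between the two access paths can be sketched in plain Java. This is an analogy under stated assumptions, not how either kind of database is implemented: Vertex and TraversalSketch are hypothetical names, and java.util.TreeMap (a red-black tree with O(log n) lookups) stands in for a B-Tree index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class Vertex {
    final String name;
    final List<Vertex> neighbours = new ArrayList<>(); // edges held as direct references

    Vertex(String name) {
        this.name = name;
    }
}

class TraversalSketch {
    // Graph style: following an edge is a reference dereference, independent of graph size.
    static Vertex follow(Vertex from, int edgeIndex) {
        return from.neighbours.get(edgeIndex);
    }

    // Relational style: the foreign key is resolved through a sorted index,
    // which costs O(log n) in the number of rows in the target table.
    static String resolveForeignKey(TreeMap<Long, String> targetIndex, long foreignKey) {
        return targetIndex.get(foreignKey);
    }

    public static void main(String[] args) {
        Vertex alice = new Vertex("alice");
        Vertex bob = new Vertex("bob");
        alice.neighbours.add(bob);
        System.out.println(follow(alice, 0).name);

        TreeMap<Long, String> personIndex = new TreeMap<>();
        personIndex.put(1L, "alice");
        personIndex.put(2L, "bob");
        System.out.println(resolveForeignKey(personIndex, 2L));
    }
}
```

Both calls return the same answer here; the sketch only shows why the second one grows with table size while the first does not.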
Check out Bitsy. If your graph fits in memory, it is very fast for queries and updates. Or you can go with another Blueprints implementation, like Neo4J and Titan, which can handle larger datasets.

If you're using Hibernate then you're persisting domain objects which by their nature ARE object graphs.
Relational databases are tabular structures; they cope with these relationships but break down fast. In addition, Hibernate has a nasty habit of pulling in the entire database with joins.
Given that Neo4j was designed with object relations as its core function and you're doing domain persistence, this natural design fit is sure to be better.
Also, Neo4j uses Lucene (a stupid fast search index) for its index lookups and can jump straight to your node for traversal.
Bottom line: Neo4j was designed for mind-blowing scale and exactly the idea of graph-related data. You're not going wrong on scaling, but you will find the tools/libraries aren't as mature for the job as they are for a classic DB connection.

Abstraction layer for table partitioning - JPA

Facts
Database: PostgreSQL (latest)
Programming language: Java
Problem statement (simplified)
We have 2 tables - overview and details. There could be millions of rows in "overview" and each row of "overview" can have millions of rows associated with it in "details". The foreign key details.overview_id refers to overview.id. Most queries are of the general form
SELECT * FROM details WHERE overview_id = xxx AND details.id > yyy AND details.id < zzz;
If we have a single table for details, the queries will be too slow (although the queries on details are almost always on primary keys).
More on the nature of DB activities: INSERT and UPDATE on overview happens infrequently. INSERT on details happen at a rapid pace, while UPDATE on the same table almost never happens and bulk DELETE happens sometimes.
What we already have
In the past we used raw SQL to partition the table "details" against each row in "overview". (In practice, we did not actually partition, instead we created new tables based on a template. These tables did not have any column called overview_id (saving storage space), instead we had a separate table that did the mapping between overview.id and the table-name of the specific partition table.) So, as you can understand, the partitions had to be generated on the fly as new rows were inserted in overview and partitions were dropped as rows were deleted from overview. All of this was managed inside the application. The application-database interaction has been blazing fast, but the application code is fairly complex, implying it is hard to maintain. Also, with raw SQL lying around everywhere, it is hard to scale the DB horizontally - we have to reinvent what most JPA providers have already done.
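The mapping described above could be isolated in one small class so that the rest of the application never builds table names itself. The sketch below is an assumption about how that might look, not the original application code: PartitionRouter, the "details_" naming scheme and the in-memory map (standing in for the separate mapping table) are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class PartitionRouter {
    // Stands in for the table that maps overview.id to a partition table name.
    private final Map<Long, String> tableByOverviewId = new HashMap<>();

    // Called when a row is inserted into overview; the caller would also
    // create the table from the template.
    String register(long overviewId) {
        String table = "details_" + overviewId; // hypothetical naming scheme
        tableByOverviewId.put(overviewId, table);
        return table;
    }

    // Called when a row is deleted from overview; the caller would also drop the table.
    void unregister(long overviewId) {
        tableByOverviewId.remove(overviewId);
    }

    // Builds the range query for one partition. There is no overview_id column,
    // and the id bounds are left as JDBC placeholders.
    String rangeQuery(long overviewId) {
        String table = tableByOverviewId.get(overviewId);
        if (table == null) {
            throw new IllegalArgumentException("no partition for overview " + overviewId);
        }
        return "SELECT * FROM " + table + " WHERE id > ? AND id < ?";
    }
}
```

Because the table name is derived only from a numeric id, the string concatenation does not open an injection path; the user-supplied bounds still go through placeholders.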
Current goal
Currently we are exploring options for a mechanism by which this partitioning can happen behind the scene - possibly by a JPA provider (I understand that this is not part of the JPA spec), so that we can focus on the application while the underlying framework/layer takes care of the scalability issues.
I looked at openJPA Slice and EclipseLink. Both of them provide partition (shard) management across hosts. We certainly need that. But we also need partition management within a single host. However, if there is a better or more elegant solution to this or if there is a totally different angle to look at this, I will be really glad to know about that.
I will appreciate any insight you can provide.
Thanks.
Prajesh

Have you looked into using Postgres's table partitioning?
http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html

Thank you all for your comments/answers till date. We decided to stick to what we already have (see the section named "what we already have"), with minor modifications.

"Select or create" from database, in Java

I've done all my database development for the past few years in Ruby, mostly using ActiveRecord. Now I'm stuck using Java for a project, and it feels so verbose and hamfisted, I'm wondering if I'm doing things wrong.
In an ORM paradigm, if I want to insert into related tables, I'd do something like
# Joe Bob got a new car
p = Person.find_or_create_by_name("Joe Bob");
Car.new({:make=>"Toyota", :plate=>"ABC 123", :owner=>p});
In Java, at least using JDBC directly, I'm going to have to do the Person lookup by hand, insert if it doesn't exist, then create the Car entry.
Of course, in real life, it's more than just 2 tables and the pain scales exponentially. Surely there's a better way?

You can use ORM solutions for Java - there are various solutions available.
Links worth looking at:
Hibernate - http://www.hibernate.org/ - probably the leading Java ORM solution
SO Question - Hibernate, iBatis, Java EE or other Java ORM tool
Having said that, I've usually found that for complex applications ORM frequently causes more trouble than it is worth (and yes, this does include Ruby projects with Activerecord). Sometimes it really does make sense to just get at the data directly via SQL rather than attempt to force on object-oriented facade on top of it.

The better way is to learn SQL! The ORM you like so much writes SQL for you behind the scenes.
So you can make a quick helper function that tries to select the record, and if it doesn't exist creates it for you.
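A minimal sketch of such a helper, with the storage hidden behind a hypothetical PersonStore interface so the example runs without a database. In a JDBC implementation, findIdByName would run a SELECT and insert would run an INSERT and return the generated key; the interface, the in-memory class and the person table are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

interface PersonStore {
    Optional<Long> findIdByName(String name); // e.g. SELECT id FROM person WHERE name = ?
    long insert(String name);                 // e.g. INSERT INTO person (name) VALUES (?)
}

// In-memory stand-in so the helper can be exercised without a database.
class InMemoryPersonStore implements PersonStore {
    private final Map<String, Long> idsByName = new HashMap<>();
    private long nextId = 1;

    @Override
    public Optional<Long> findIdByName(String name) {
        return Optional.ofNullable(idsByName.get(name));
    }

    @Override
    public long insert(String name) {
        long id = nextId++;
        idsByName.put(name, id);
        return id;
    }
}

class People {
    // Select first; insert only when the select finds nothing.
    // Against a real database, two concurrent callers could both insert,
    // so a unique constraint on the name column is still advisable.
    static long findOrCreate(PersonStore store, String name) {
        return store.findIdByName(name).orElseGet(() -> store.insert(name));
    }

    public static void main(String[] args) {
        PersonStore store = new InMemoryPersonStore();
        long first = findOrCreate(store, "Joe Bob");
        long second = findOrCreate(store, "Joe Bob");
        System.out.println(first == second); // same person both times
    }
}
```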
In MySQL you can use INSERT IGNORE ..... which will insert the row only if it doesn't exist.
And here is a special bit of SQL you may like (MySQL only):
INSERT INTO table (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id), c=3;
This tries to insert the record; if it doesn't already exist, the statement returns the auto_increment value as usual, which you retrieve in your program.
But if it does exist, the row is updated instead (only c is set in that case), and the cool part is that it sets LAST_INSERT_ID() just as an insert would.
So either way you get the ID field, all in a single bit of SQL.
SQL is a very nice language - you should learn it and not rely on the pseudo-language of an ORM.

If you are using JDBC directly, you need to do the lookup yourself and create the person if it does not exist. There is no better way with plain JDBC.
But you can use Hibernate, which handles the O-R mapping for you and reduces the boilerplate.
Since you come from Ruby, if you find it painful to write all the SQL queries and JDBC boilerplate, then the better way is to use an ORM. I recommend one of the following:
Hibernate
JPA (if you want to be able to change the ORM implementation, use JPA)

Sormula contains an active record package. The save method will update an existing record or insert if no record exists.
See the active record example on the web site.
Also see org.sormula.tests.active.SaveTest.java within the project:
SormulaTestAR record = new SormulaTestAR();
record.attach(getActiveDatabase()); // record needs to know data source
record.setId(8002);
record.setType(8);
record.setDescription("Save one AR 2");
record.save();

Looks like I'm late here, however, ActiveJDBC will do what you want in Java:
Person p = Person.findOrCreateIt("name", "Joe Bob");
Car car = Car.createIt("make", "Toyota", "plate", "ABC 123", "owner", p);
There is a ton more it can do; check it out at: http://javalite.io/
