I have a problem dealing with close to 100,000 XML records. The task is to construct a schema mapping from the XML schema of these records to relations.
Any ideas in this field are welcome. Please propose an algorithm or a methodology that can be followed to achieve this schema mapping.
Any related work would also be helpful.
Thanks
Peter
We're thin on details here, but it may be that you don't need an algorithm or methodology as much as you need really good tools. Altova has a set of XML tools; some of them can help you map XML documents to a SQL database. (I'm not sure whether they will help you create tables based on XML document elements.) You can download Altova MissionKit here and use it free for 30 days.
I'm sure they're not the only player in this market.
Disclaimer: I have no relationship with Altova. I've used XMLSpy briefly during a contract job for a Fortune 500 a while back. It worked well, and without surprises.
I'm looking for the same functionality in Java that is present in PHP's Doctrine ORM.
I can describe an entity, and the doctrine:migrations:diff console command will produce a nice migration that generates the SQL for updating the database schema. Then I can bring the current schema up to date with the new fields/tables etc. by running doctrine:migrations:migrate.
But what's the equivalent way to do this in Java? I've tried Flyway, but had no luck because it can't just take the entities and generate a diff: https://github.com/flyway/flyway/issues/648#issuecomment-64208848
So I tried Liquibase instead, but I don't understand how to do it there.
I've tried asking this question in the #java room on irc.freenode.net, but the closest answer was "let hibernate create the schema and have liquibase extract it" (thank you, k5_).
but… how?
Many sources say that I should use something called hibernate.hbm2ddl.auto… but how? What is it?
I'm surprised that there are no "for dummies" examples of this important functionality anywhere on the internet.
Writing all migrations by hand seems terrible to me, and I just don't believe that no solutions exist.
please, help me
thank you in advance!
UPD
In my particular case I'm using Hibernate, and I have basic entities defined (with OneToMany and ManyToOne mappings).
I want to start with a clean database and update its schema with migrations.
So my goal is to generate these migrations automatically by comparing the entity definitions to the current database schema state.
UPD 2
The second-closest answer is (thanks once again, k5_): "configure hibernate to create a schema when it doesn't exist (just add the configuration option you posted to the persistence.xml). Start the server and it will create a schema in the database. Use Liquibase to extract the schema, or use it to create a diff over two databases."
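For the record, hibernate.hbm2ddl.auto is the Hibernate setting that controls automatic DDL handling at startup (validate, update, create, create-drop); setting it to create or update is what the IRC answer means by "configure hibernate to create a schema". To make the "let Hibernate create the schema" half concrete, here's a minimal sketch (not the one true way) using the Hibernate 4-style org.hibernate.tool.hbm2ddl.SchemaExport to write the DDL for your annotated entities to a file, which you can then point Liquibase at, e.g. as the reference side of a diff. The entity, dialect, and file name below are placeholders.

```java
import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.cfg.Configuration;
import org.hibernate.tool.hbm2ddl.SchemaExport;

public class GenerateDdl {

    // Placeholder entity standing in for your real mapped classes.
    @Entity
    public static class Book {
        @Id
        Long id;
        String title;
    }

    public static void main(String[] args) {
        // A dialect is required so Hibernate knows which SQL flavour to emit.
        Configuration cfg = new Configuration()
                .setProperty("hibernate.dialect",
                        "org.hibernate.dialect.PostgreSQL82Dialect")
                .addAnnotatedClass(Book.class);

        SchemaExport export = new SchemaExport(cfg);
        export.setOutputFile("schema.sql"); // the DDL lands here for Liquibase
        export.setDelimiter(";");
        // script = true -> write the DDL out; export = false -> don't touch any database
        export.create(true, false);
    }
}
```

From there, the diff itself is Liquibase's job (its diffChangeLog command compares two databases or a database against a reference), which matches the second half of the quoted advice.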
I was wondering if there exists a tool for Neo4j that can read an XSD file and use it to enforce a schema on Neo4j.
I'm a newbie with graph databases, but I'm starting to appreciate the schema-less approach. There are a lot of projects out there pumping in large amounts of non-sequential data and making sense of it all, which is really cool.
I've come across some requirements that call for control over what properties a node or edge can have given a certain label, and what labels an edge can have given the labels of its source and destination nodes. The schema is also subject to change, although not frequently.
As I understand it, the standard practice is to control the schema from the application itself, which to me doesn't seem like it should be a best practice. For example, the picky developers from Oracle land create views for applications to interact with, and then apply triggers to the views that execute the appropriate transactions when the application attempts to insert or update on the view.
I would be looking for a similar device in Neo4j and since I already have the XSD files, it would be a lot less work overall to simply dump them into a folder and have it use those for reference on what to enforce.
This is something I'm willing to write myself unless there's already a library out there for this. I have a day job after all. :)
Thanks!
Not only does this tool not exist, but it couldn't even exist without more work on standardizing how XML is stored in neo4j. There are key differences between the XML model and the neo4j model.
There's a Python application here that can import XML into neo4j (documents, not schemas). But given the way it does it, there are several things to keep in mind:
There's no obvious mapping from XML elements/attributes onto neo4j nodes/properties. You'd think that elements should be nodes and attributes properties, but a better graph model would usually differ from that. For example, XML namespaces would make great nodes because they connect to so many other things (e.g. all elements defined in a namespace), yet typically they're attributes. Maybe namespaces should be labels? That might also be reasonable, but there's no standard answer here.
XML trees have sequence, and sequence matters; graphs don't. Say you have an XML element with two children, A and B. In neo4j you might have a node connected to two other nodes, but you need a way of expressing (probably via a relationship property) that A comes before B. That's of course doable in neo4j, but as far as I know there's no agreement about how to do it. So maybe you pick a sequence attribute and give it an integer value. Seems reasonable… but now your schema validation software depends on that design choice, and XML stored in neo4j any other way won't validate.
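To illustrate one such (non-standard) design choice, here's a hedged sketch using the Neo4j Java driver that records document order with a seq property on the child relationships. The URI, credentials, and the CHILD/seq names are all assumptions, not a convention anyone has agreed on:

```java
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Session;

public class XmlOrderExample {
    public static void main(String[] args) {
        // Connection details are placeholders.
        try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                AuthTokens.basic("neo4j", "password"));
             Session session = driver.session()) {
            // One possible convention (not a standard): a `seq` property on the
            // CHILD relationship preserves the document order of A and B.
            session.run(
                "CREATE (p:Element {name: 'parent'}) " +
                "CREATE (a:Element {name: 'A'}) " +
                "CREATE (b:Element {name: 'B'}) " +
                "CREATE (p)-[:CHILD {seq: 1}]->(a) " +
                "CREATE (p)-[:CHILD {seq: 2}]->(b)");
            // Reading the children back in document order then requires:
            //   MATCH (p)-[r:CHILD]->(c) RETURN c ORDER BY r.seq
        }
    }
}
```

Any validation tool would have to hard-code exactly this convention, which is the dependency problem described above.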
There's a host of XML processing options that matter in schema validation that wouldn't in a graph, for example whether or not you care about ignoring whitespace nodes, strict vs. lax schema validation, and so on.
Look, neo4j is great, but if you really need to validate a pile of XML documents it's probably not your best choice, because of these mismatches between the graph model and XML's document model. Possible options are to validate the documents before they go into neo4j, or to come up with a way of synthesizing XML documents from what's in neo4j and then validating the result once it's outside the graph database, as an XML file.
I want to get the schema of one table from my HSQLDB database. At the moment everything is generated with Hibernate. Is there a fast and easy way to get the schema out of HSQLDB?
I appreciate your answer!
I often use SchemaSpy to generate an HTML description of a schema, including relationship graphs. It uses a JDBC driver to fetch the schema information, so it works with any DB that has one.
http://schemaspy.sourceforge.net
Try HSQLDB's SCRIPT statement: it writes the DDL for the whole database (plus the data of MEMORY tables) to a file.
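A minimal sketch of invoking it over JDBC; the JDBC URL, credentials, and output path are placeholders for your setup:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class DumpHsqldbSchema {
    public static void main(String[] args) throws Exception {
        // Placeholder URL/credentials; point these at your database.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hsqldb:file:mydb", "SA", "");
             Statement stmt = conn.createStatement()) {
            // SCRIPT writes the database's DDL (CREATE TABLE etc.) to the file.
            stmt.execute("SCRIPT '/tmp/schema.sql'");
        }
    }
}
```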
Same question here: look here too.
I am trying to map large XML documents, specified by a large set of XSDs, to another large XML document (about 2,500 lines). The mapping is not exactly one-to-one, but it's relatively close, with maybe 30-40 elements changing, some needing to be concatenated, or basic filtering logic performed on them. I've found Altova MapForce to be a good solution, but it seems to be overkill as far as the features it provides. Another option I've explored is building a custom mapping framework using JAXB, but I fear I would be rebuilding a product like MapForce, and I estimate it would take a few hundred man-hours.
I have found very little online about XML Mapping, with the biggest finds being a handful of commercial product solutions, all of which seem a bit overkill.
Any ideas?
If you are using the Eclipse IDE, you have an option along the lines of File > New > "Create bean from XSD schema". It's very useful!
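Building on that, here's a rough sketch of the "generated beans plus hand-written mapping" approach with JAXB. The two classes below are stand-ins for what xjc or Eclipse would generate from your XSDs, and the field names are invented for the example:

```java
import java.io.File;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlRootElement;

public class XmlMapper {

    // Stand-ins for classes generated from your source and target XSDs.
    @XmlRootElement(name = "source")
    public static class Source {
        public String firstName;
        public String lastName;
    }

    @XmlRootElement(name = "target")
    public static class Target {
        public String fullName;
    }

    public static void main(String[] args) throws Exception {
        Unmarshaller u = JAXBContext.newInstance(Source.class).createUnmarshaller();
        Source src = (Source) u.unmarshal(new File("input.xml"));

        // Hand-written mapping: mostly one-to-one, with the few
        // concatenations/filters done in plain Java.
        Target dst = new Target();
        dst.fullName = src.firstName + " " + src.lastName;

        Marshaller m = JAXBContext.newInstance(Target.class).createMarshaller();
        m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
        m.marshal(dst, new File("output.xml"));
    }
}
```

For 30-40 changed elements this stays manageable; the few hundred man-hours only materialize if you try to make the mapping layer generic like MapForce.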
I have a query doing ILIKE on some 11 string or text fields of a table that is not big (500,000 rows), but evidently too big for ILIKE: the search query takes around 20 seconds. The database is Postgres 8.4.
I need to make this search much faster.
What came to my mind:
I added an extra tsvector column assembled from all the columns that need to be searched, and created a full-text index on it (see the sketch after this list). The full-text search was quite fast. But… I cannot map this tsvector type in my .hbm files, so this idea fell through (in any case I thought of it more as a temporary solution).
Hibernate Search. (Heard about it for the first time today.) It seems promising, but I'd like an experienced opinion on it, since I don't want to take on a new, possibly non-trivial API for something that could be done more simply.
Lucene
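For reference, here is roughly what option 1 looks like. The SQL is issued over JDBC just to keep everything in Java; the table and column names are made up, and the connection details are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class FullTextSetup {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "secret");
             Statement stmt = conn.createStatement()) {
            // One tsvector column concatenating the searchable text columns...
            stmt.execute("ALTER TABLE items ADD COLUMN search_vec tsvector");
            stmt.execute("UPDATE items SET search_vec = to_tsvector('english', "
                    + "coalesce(name, '') || ' ' || coalesce(description, ''))");
            // ...indexed with GIN so full-text queries are fast on 8.4.
            stmt.execute("CREATE INDEX items_search_idx "
                    + "ON items USING gin(search_vec)");
            // Queries then look like:
            //   SELECT * FROM items
            //   WHERE search_vec @@ plainto_tsquery('english', 'words');
        }
    }
}
```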
In any case, this has come up now with this table, but I would like the solution to be more generic and applicable to future cases involving full-text search.
All advice appreciated!
Thanx
I would strongly recommend Hibernate Search, which provides a very easy-to-use bridge between Hibernate and Lucene. Remember you will be using both here. You simply annotate the properties on your domain classes which you wish to be able to search over. Then, when you update/insert/delete an entity that is enabled for searching, Hibernate Search updates the relevant indexes. This only happens if the transaction in which the database changes occur is committed, i.e. if it's rolled back, the indexes are left untouched.
So to answer your questions:
Yes, you can index specific columns on specific tables. You also have the ability to tokenize the contents of a field so that you can match on parts of it.
It's not hard to use at all: you work out which properties you wish to search on, tell Hibernate Search where to keep its indexes, and then use the EntityManager/Session interfaces to load the entities you have searched for.
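A minimal sketch of what that looks like with the classic Hibernate Search annotations; the entity and field names are invented for the example, and the index location would be set via the hibernate.search.default.indexBase property in your Hibernate configuration:

```java
import java.util.List;

import javax.persistence.Entity;
import javax.persistence.Id;

import org.apache.lucene.search.Query;
import org.hibernate.Session;
import org.hibernate.search.FullTextSession;
import org.hibernate.search.Search;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.query.dsl.QueryBuilder;

@Entity
@Indexed // Hibernate Search keeps a Lucene index in sync for this entity
public class Article {

    @Id
    private Long id;

    @Field // tokenized by the analyzer, so you can match on parts of the text
    private String title;

    @Field
    private String body;

    // A keyword search over the indexed fields; `session` comes from your
    // existing Hibernate setup.
    public static List<?> search(Session session, String terms) {
        FullTextSession fts = Search.getFullTextSession(session);
        QueryBuilder qb = fts.getSearchFactory().buildQueryBuilder()
                .forEntity(Article.class).get();
        Query query = qb.keyword().onFields("title", "body")
                .matching(terms).createQuery();
        return fts.createFullTextQuery(query, Article.class).list();
    }
}
```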
Since you're already using Hibernate and Lucene, Hibernate Search is an excellent choice.
What Hibernate Search primarily provides is a mechanism to keep your Lucene indexes updated when data changes, plus the ability to leverage what you already know about Hibernate to simplify your searches against the Lucene indexes.
You'll be able to specify which fields of each entity you want indexed, and to add multiple index types as needed (e.g. stemmed and full text). You'll also be able to manage the index graph for associations, so you can make fairly complex queries through Search/Lucene.
I have found that it's best to rely on Hibernate Search for the text heavy searches, but revert to plain old Hibernate for more traditional searching and for hydrating complex object graphs for result display.
I recommend Compass. It's an open-source project built on top of Lucene that provides a simpler API than raw Lucene. It integrates nicely with many common Java libraries and frameworks such as Spring and Hibernate.
I have used Lucene in the past to index database tables. The solution works great, but remember that you need to maintain the index: either you update the index every time your objects are persisted, or you have a daemon indexer that dumps the database tables into your Lucene index.
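If you do go the raw-Lucene route, the indexing side is small. A hedged sketch against a recent Lucene API; the index directory and field names are placeholders, and a real daemon would loop over table rows via JDBC instead of the single hard-coded row here:

```java
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class TableIndexer {
    public static void main(String[] args) throws Exception {
        try (FSDirectory dir = FSDirectory.open(Paths.get("lucene-index"));
             IndexWriter writer = new IndexWriter(dir,
                     new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            // The primary key is stored but not tokenized, so hits can be
            // mapped back to database rows.
            doc.add(new StringField("id", "42", Field.Store.YES));
            doc.add(new TextField("body", "row text to search", Field.Store.NO));
            writer.addDocument(doc);
        }
    }
}
```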
Have you considered Solr? It's built on top of Lucene and offers automatic indexing from a DB and a REST API.
A year ago I would have recommended Compass. It was good at what it does, and technically still happily runs along in the application I developed and maintain.
However, there's no more development on Compass, with efforts having switched to ElasticSearch. From that project's website I cannot quite determine if it's ready for the Big Time yet or even actually alive.
So I'm switching to Hibernate Search which doesn't give me that good a feeling but that migration is still in its initial stages, so I'll reserve judgement for a while longer.
All these projects are based on Lucene. If you want to implement very advanced features, I'd advise you to use Lucene directly. If not, you can use Solr, which is a powerful API on top of Lucene that can help you index and search a DB.