Do I need to normalize this MySQL db? - java

I have a classifieds website which uses SOLR to search for whatever ads the user wants to find... SOLR then returns the IDs of all the matches found. I then use the IDs to fetch and display the ads from a MySQL table.
Currently I have one huge table containing everything in MySQL.
Sometimes some of the fields are empty because, for instance, an apartment has no "model" but a car does.
Is this a problem for me if I use SOLR like I do?
Thanks

Ask yourself these questions:
Is your current implementation slow or prone to error?
Are you adding a lot of "hacks" in order to display content or fetch data correctly due to the de-normalization of your database?
In the long run, will you benefit from normalizing the table?
Hope that helps. It all depends on your situation! Personally, I build databases normalized and then de-normalize as needed to keep things speedy.

If you are using SOLR, why not serve the complete ad from SOLR instead of MySQL, and save the DB round trip?
One huge table is usually not a good option at all.
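For example, if the ad's display fields are stored in the index, a single SolrJ query can return everything needed to render the results. A minimal sketch, assuming a hypothetical "ads" core and field names that are not from the question's schema:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrDocument;

    public class AdSearch {
        public static void main(String[] args) throws Exception {
            // Hypothetical core URL; point this at your own Solr instance
            HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/ads").build();
            SolrQuery q = new SolrQuery("category:cars");
            // These fields must be marked as stored in the index for this to work
            q.setFields("id", "title", "price", "description");
            for (SolrDocument doc : solr.query(q).getResults()) {
                System.out.println(doc.getFieldValue("title") + " - " + doc.getFieldValue("price"));
            }
            solr.close();
        }
    }

The trade-off is index size: stored fields make the index bigger, but they remove the MySQL lookup from the hot path entirely.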

how to create "has many" between two documents in couchdb?

Basically I am wondering how you would do in CouchDB what you would do in MySQL: store the username and password in one table, and link the user id as a foreign key in another table of tasks?
Should I just use MySQL for the user authentication part and CouchDB to store lots of user-submitted documents? That is, create a random unique token to link each user to their "documents" in CouchDB?
Also, I am looking to store Java objects in CouchDB and retrieve them to be used directly in my application. Which Java-CouchDB library does this? Ektorp's example seems more complicated compared to couchdb4j.
I do not know Java very well, but I suggest using the simplest tool you can find. CouchDB is very simple, and it is usually most beneficial to access it with simple tools too.
Yes, if you will have many relationships in the data, MySQL will help. However, CouchDB can do some simple has-many queries.
First, there is view collation. You use map/reduce, and for every "child" document you emit a key pointing to the parent document. When you query with ?key=parent, you get the full list of children. (The wiki explains it pretty well.)
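In Java, Ektorp can declare such a view for you. A hedged sketch, assuming a Jackson-mapped Task class whose parentId field holds the owning user's id (all names here are illustrative, not from the question):

    import java.util.List;
    import org.ektorp.CouchDbConnector;
    import org.ektorp.support.CouchDbDocument;
    import org.ektorp.support.CouchDbRepositorySupport;
    import org.ektorp.support.View;

    // "Child" document: each task points at its owning user
    class Task extends CouchDbDocument {
        private String parentId;
        public String getParentId() { return parentId; }
        public void setParentId(String parentId) { this.parentId = parentId; }
    }

    // The map function emits one key per child, pointing at the parent
    @View(name = "by_parent",
          map = "function(doc) { if (doc.parentId) { emit(doc.parentId, null); } }")
    public class TaskRepository extends CouchDbRepositorySupport<Task> {
        public TaskRepository(CouchDbConnector db) {
            super(Task.class, db);
            initStandardDesignDocument(); // pushes the view to CouchDB
        }

        // Equivalent of querying the view with ?key=<userId>
        public List<Task> findByUser(String userId) {
            return queryView("by_parent", userId);
        }
    }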
Secondly, I suggest the article What's new in CouchDB 0.11 which shows how to use document _ids to link between two documents.
Good luck!

Any reference for good Datamining tools in Java?

We are working on an internship project for a company. The project itself consists of data mining. The database we have to work with is huge (gigabytes).
Sadly, the DB itself is very poorly structured, with inconsistent values and, most importantly, no primary or foreign keys. So in our simple servlet modules that extract and show the inconsistent data, the queries take forever to run and show up in the servlet.
As n00b programmers we do not know about joins and such things in the DB. We are using MySQL as our DB server. The DB is composed of real-time data from telecom towers.
To find sample inconsistencies in table values we are using a combination of multiple queries, the output of one query serving as the input to another, like:
"SELECT distinct(tow_id) FROM 'tower_data' WHERE TIME_STAMP LIKE ? ";
//query for finding tower-id.
"SELECT time_stamp FROM tower_data WHERE 'TIME_STAMP' LIKE ? AND 'PARAM_CODE' = ? AND 'TOW_ID'=? GROUP BY time_stamp HAVING count( * ) >1";
//query for finding time stamps with duplicate data.
And so on.
Also there are some 10 tables in the database. We need to combine 2-3 tables to get values for custom queries.
After finding all the inconsistent values for multiple factors, we have to do data cleansing, removal of noise, data prediction and such tasks in the next stage.
So we thought we could apply some Java data mining tools which would in turn apply some algorithm to speed up the data retrieval.
Please guide us towards some good data mining tools. Any guidance towards optimizing/rewriting the queries would also be highly appreciated.
I'm not 100% sure it will help in your case, but have a look at google-refine...
Since you seem to have a lot of badly structured data, I do not think data-mining will help.
You may consider using Apache Hadoop for going over all this data and finding inconsistencies. You can use Amazon EC2 for a simple and relatively cheap way to run Hadoop. You can also use Hadoop to port the databases to a better schema, provided that you can build one.
EDIT: I guess you can also do some things within MySQL. Use EXPLAIN to find the slow parts of your query; LIKE is usually slow, and maybe you can reformulate the query into something faster. Maybe you can also sort the table by timestamp and then look at sub-ranges. Again, you first need an efficient way to get at the data, and then you can try to mine it. Good luck.
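For instance, here is a hedged sketch of checking the plan from JDBC after adding an index; the composite index and the parameter values are assumptions, only the tower_data column names come from the question:

    import java.sql.*;

    public class ExplainDuplicates {
        public static void main(String[] args) throws SQLException {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost/telecom", "user", "pass")) {
                try (Statement st = conn.createStatement()) {
                    // Assumed composite index covering the WHERE and GROUP BY columns
                    st.execute("CREATE INDEX idx_tower ON tower_data (tow_id, param_code, time_stamp)");
                }
                String sql = "EXPLAIN SELECT time_stamp FROM tower_data "
                           + "WHERE tow_id = ? AND param_code = ? "
                           + "GROUP BY time_stamp HAVING COUNT(*) > 1";
                try (PreparedStatement ps = conn.prepareStatement(sql)) {
                    ps.setInt(1, 42); // hypothetical tower id
                    ps.setInt(2, 7);  // hypothetical parameter code
                    try (ResultSet rs = ps.executeQuery()) {
                        // Print the access type, chosen key, and estimated row count
                        while (rs.next()) {
                            System.out.println(rs.getString("type") + " key=" + rs.getString("key")
                                    + " rows=" + rs.getString("rows"));
                        }
                    }
                }
            }
        }
    }

If EXPLAIN reports type=ALL and key=NULL, the query is scanning the whole table and the index (or the query shape) needs rethinking.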

Should I use Lucene only for search?

Our website needs to give out data to the world. This is open-source data that we have stored, and we want to make it publicly available. It's about 2 million records.
We've implemented the search of these records using Lucene, which is fine, however we'd like to show an individual record (say the user clicks on it after the search is done) and provide more detailed information for that record.
This more detailed information, however, isn't stored in the index directly... there are many-to-many relationships, and we use our relational database (MySQL) to provide it.
For example, a single record belongs to a category; we want the user to be able to click on that category and see the rest of the records within it (there are lots more associations like this).
My question is, should we use Lucene also to store this sort of information and retrieve it through simple search (category:apples), or should MySQL continue doing this logical job? Should I use Lucene only for the search part?
EDIT
I would like to point out that all of our records are pretty static; changes are made to this data once every week or so.
Lucene's strength lies in rapidly building an index of a set of documents and allowing you to search over them. If this "detailed information" does not need to be indexed or searched over, then don't store it in Lucene.
Lucene is not a database, it's an index.
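A common way to split the work, sketched below: keep only the searchable fields plus the primary key in Lucene, then hydrate the detailed record from MySQL. The index path, field names, and table names are assumptions, not the asker's schema:

    import java.nio.file.Paths;
    import java.sql.*;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.*;
    import org.apache.lucene.store.FSDirectory;

    public class SearchThenFetch {
        public static void main(String[] args) throws Exception {
            DirectoryReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("/path/to/index")));
            IndexSearcher searcher = new IndexSearcher(reader);
            // Lucene answers "which records match?"
            TopDocs hits = searcher.search(new TermQuery(new Term("category", "apples")), 20);
            try (Connection conn = DriverManager.getConnection("jdbc:mysql://localhost/opendata", "user", "pass");
                 PreparedStatement ps = conn.prepareStatement("SELECT * FROM records WHERE id = ?")) {
                for (ScoreDoc sd : hits.scoreDocs) {
                    // Only the key is stored in the Lucene document
                    String id = searcher.doc(sd.doc).get("id");
                    ps.setString(1, id);
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            // Detailed, relational fields come from MySQL
                            System.out.println(rs.getString("title"));
                        }
                    }
                }
            }
            reader.close();
        }
    }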
You want to use Lucene to store data? I think that's OK. I've used Solr (http://lucene.apache.org/solr/), which is built on top of Lucene, to work as the search engine and also store extra data related to each record for front-end display. It worked with 500k records for me, and with 2 million records I think it should be fine.

dynamic object relation mapping

I am trying to create an application in Java which pulls records out of a database and maps them to objects. It does that without knowing what the schema of the database looks like. All I want to do is fetch all rows from all tables and store them somewhere. There could be a thousand tables with thousands of records each. The application doesn't know the name of any table or attribute; it should map "on the fly". I looked at Hibernate but it doesn't give me what I want for this app. I don't want to create hard-coded XML files and classes for mapping. Any ideas how I can accomplish this?
Thanks
Oracle has a bunch of data dictionary views for metadata.
ALL_TABLES and ALL_TAB_COLUMNS would be the first places to start. Then you'd build ad-hoc queries based on what you get out of there. Not sure whether you have to deal with all data types (dates, blobs, spatial, user-defined...).
Not sure what you mean by "store them somewhere". If you start thinking CSV or XML files, you'll need to escape various characters from VARCHAR2 columns.
If you are looking for some generic extract/unload routines, you should look at what is already available in the database or open-source/commercially.
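In plain JDBC the portable equivalent is DatabaseMetaData. A rough sketch (the connection URL is an assumption) that discovers every table at runtime and dumps each row as a column-name-to-value map:

    import java.sql.*;
    import java.util.*;

    public class SchemaWalker {
        public static void main(String[] args) throws SQLException {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost/mydb", "user", "pass")) {
                DatabaseMetaData meta = conn.getMetaData();
                // Enumerate all base tables, no schema knowledge required
                try (ResultSet tables = meta.getTables(null, null, "%", new String[] {"TABLE"})) {
                    while (tables.next()) {
                        String table = tables.getString("TABLE_NAME");
                        try (Statement st = conn.createStatement();
                             ResultSet rows = st.executeQuery("SELECT * FROM " + table)) {
                            ResultSetMetaData rsmd = rows.getMetaData();
                            while (rows.next()) {
                                Map<String, Object> record = new LinkedHashMap<>();
                                for (int i = 1; i <= rsmd.getColumnCount(); i++) {
                                    record.put(rsmd.getColumnName(i), rows.getObject(i));
                                }
                                // "Store them somewhere" is up to you; here we just print
                                System.out.println(table + ": " + record);
                            }
                        }
                    }
                }
            }
        }
    }

Each row ends up as a generic map rather than a typed class, which is about as far as "on the fly" mapping can go without code generation.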
MyBatis provides a pretty simple way to map data results to objects and back, maybe check that out?
http://code.google.com/p/mybatis/
Not to be flip, but for this task, you might want to check out Ruby on Rails and its ActiveRecord approach

Adding Java Objects to database

For a university assignment I have a Prize object which contains either text, image, or video content. I would like to persist this information into a BLOB field within an Apache Derby database (which will be running on a low-powered PDA). How can I add this data to the database?
Thanks in advance.
In the article Five Steps to Managing Unstructured Data with Derby you can read how to do this.
It describes how to insert binary data into a column with the BLOB datatype in Apache Derby using JDBC.
I assume you'll be connecting via JDBC. If so, simply write your SQL and look at the setBlob method of a PreparedStatement. Should be pretty straightforward.
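A minimal sketch using setBinaryStream (a sibling of setBlob), assuming a hypothetical prize table with an INT id and a BLOB content column; the table and column names are illustrative, not from the assignment:

    import java.io.*;
    import java.sql.*;

    public class PrizeDao {
        public void savePrize(Connection conn, int id, File contentFile)
                throws SQLException, IOException {
            String sql = "INSERT INTO prize (id, content) VALUES (?, ?)";
            try (PreparedStatement ps = conn.prepareStatement(sql);
                 InputStream in = new FileInputStream(contentFile)) {
                ps.setInt(1, id);
                // Stream the bytes rather than buffering the whole file in memory,
                // which matters on a low-powered PDA
                ps.setBinaryStream(2, in, (int) contentFile.length());
                ps.executeUpdate();
            }
        }
    }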
Serialization is the easy way to do it. However, if possible, you could make it look like a real database table, with a structure containing id (bigint), datatype (smallint), creationdate (date) and data (blob), and have the client code save the object's data there. This way you could do searches like "get all video prizes created between January 1st 2008 and January 15th 2009", and old data wouldn't break if your class changed too much for serialization to keep working.
This sort of solution would also be easy to extend in the future if the need arose. I understand this is a school assignment and such a need most likely won't ever surface, but if your teacher/professor knows his stuff, I bet he's willing to give an extra point or two for doing this exercise this way, since it takes a bit more time and shows that you can prepare in advance for coping with the ever-changing landscape of software development.
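Sketched concretely, the table described above and the kind of query it enables might look like this (the table name and type codes are illustrative):

    String ddl = "CREATE TABLE prize_data ("
               + " id BIGINT PRIMARY KEY,"
               + " datatype SMALLINT,"   // e.g. 1 = text, 2 = image, 3 = video
               + " creationdate DATE,"
               + " data BLOB)";
    String videoPrizes = "SELECT id, data FROM prize_data "
                       + "WHERE datatype = 3 AND creationdate BETWEEN ? AND ?";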
If you are using NetBeans (I assume Eclipse has similar functionality) you can set up your database schema and then create new Java entity classes from the database, and it will generate the appropriate JPA classes for you.
http://hendrosteven.wordpress.com/2008/03/06/simple-jpa-application-with-netbeans/
This is nice as it allows you to focus on your code rather than the database glue code.
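The generated classes look roughly like this hedged sketch; the entity, table, and column names are assumptions about what the wizard would emit for this schema:

    import javax.persistence.*;

    @Entity
    @Table(name = "PRIZE")
    public class Prize implements java.io.Serializable {
        @Id
        private Long id;

        @Lob
        @Column(name = "CONTENT")
        private byte[] content; // text, image, or video bytes end up in the BLOB

        public byte[] getContent() { return content; }
        public void setContent(byte[] content) { this.content = content; }
    }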
The best solution is to use Derby, because your app stays multi-platform: Derby itself is developed in Java.
