Apache Solr for single database column suggestions - java

I have a relational database with a few tables. Some of them have columns that I want to enable autocompletion / autocorrection on (e.g. titles, tags, categories).
I have seen that Apache Solr, which builds on Lucene indexing, can offer such functionality. Data can also be fed into Solr from a relational database.
My question is: is this the best way I can get autocomplete and autocorrect services for my entities? Or am I killing a mosquito with a bazooka here?
Solr requires a lot of resources (memory and so on), and I wonder if something far simpler could do the trick for me.

How many unique values do you have in titles, tags, and categories? A few thousand? Then I think you can get away with using a trie data structure. A few million records in those columns? Then Solr / Elasticsearch might be a good option.
I have used a trie for autosuggestion. Building a trie is expensive, but you can store the trie in Memcached or even SQL and update it periodically when new data is added to your columns.
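The trie approach above can be sketched in plain Java. This is a minimal prefix trie for serving autocomplete over a column's distinct values; the class and method names are illustrative, not from any library, and it assumes values are lowercased on insert:

```java
import java.util.*;

// Minimal prefix trie for autocomplete over a column's distinct values.
// Illustrative sketch; not a library API.
class SuggestTrie {
    private final Map<Character, SuggestTrie> children = new HashMap<>();
    private boolean terminal;

    // Add one distinct value (e.g. a title or tag), lowercased.
    void insert(String word) {
        SuggestTrie node = this;
        for (char c : word.toLowerCase().toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new SuggestTrie());
        }
        node.terminal = true;
    }

    // Return up to `limit` stored values starting with `prefix`.
    List<String> suggest(String prefix, int limit) {
        SuggestTrie node = this;
        for (char c : prefix.toLowerCase().toCharArray()) {
            node = node.children.get(c);
            if (node == null) return Collections.emptyList();
        }
        List<String> out = new ArrayList<>();
        collect(node, new StringBuilder(prefix.toLowerCase()), out, limit);
        return out;
    }

    private void collect(SuggestTrie node, StringBuilder path, List<String> out, int limit) {
        if (out.size() >= limit) return;
        if (node.terminal) out.add(path.toString());
        for (Map.Entry<Character, SuggestTrie> e : node.children.entrySet()) {
            path.append(e.getKey());
            collect(e.getValue(), path, out, limit);
            path.deleteCharAt(path.length() - 1);
        }
    }
}
```

The whole structure can be rebuilt periodically from `SELECT DISTINCT` over the relevant columns and cached (e.g. serialized into Memcached), as the answer suggests.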

Related

How to join a record set that is returned from a web service with one of your SQL tables

I thought about this solution: get the data from the web service, insert it into a table, and then join it with the other table, but it will affect performance, and afterwards I would have to delete all that data.
Are there other ways to do this?
You don't return a record set from a web service. HTTP knows nothing about your database or result sets.
HTTP requests and responses are strings. You'll have to parse out the data, turn it into queries, and manipulate it.
Performance depends a great deal on things like having proper indexes on columns in WHERE clauses, the nature of the queries, and a lot of details that you don't provide here.
This sounds like a classic case of "client versus server". Why don't you write a stored procedure that does all that work on the database server? You are describing a lot of work to bring a chunk of data to the middle tier, manipulate it, put it back, and then delete it. I'd figure out how to have the database do it if I could.
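If the data does stay in the middle tier, the parsing step mentioned above might look like this in Java. A minimal sketch using only the JDK; the `<row>`/field payload shape and the class name are assumptions, not the asker's actual web service:

```java
import org.w3c.dom.*;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.*;

// Hypothetical sketch: turn an XML payload from a web service into rows
// (field name -> value) that could feed a temporary table for the join.
class WebServiceRows {
    static List<Map<String, String>> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList rows = doc.getElementsByTagName("row");
            List<Map<String, String>> out = new ArrayList<>();
            for (int i = 0; i < rows.getLength(); i++) {
                NodeList fields = rows.item(i).getChildNodes();
                Map<String, String> record = new LinkedHashMap<>();
                for (int j = 0; j < fields.getLength(); j++) {
                    Node n = fields.item(j);
                    if (n.getNodeType() == Node.ELEMENT_NODE) {
                        record.put(n.getNodeName(), n.getTextContent());
                    }
                }
                out.add(record);
            }
            return out;
        } catch (Exception e) {
            throw new RuntimeException("could not parse payload", e);
        }
    }
}
```

Each resulting map could then drive a JDBC `PreparedStatement` batch insert into a (global) temporary table, so the actual join still happens in SQL.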
No, you don't need to save anything into the database; there are a number of ways to convert XML to a table without persisting it.
For example, in an Oracle database you can use XMLTable/XMLType/XQuery/dbms_xml
to convert the XML result from the web service into a table and then use it in your queries.
For example:
If you use Oracle 12c you can use JSON_QUERY: Oracle 12c JSON
XMLTable: oracle-xmltable-tutorial
This week's discussion about converting XML into table data
It is common to think about applications having a three-tier structure: user interface, "business logic"/middleware, and backend data management. The idea of pulling records from a web service and (temporarily) inserting them into a table in your SQL database has some advantages, as the "join" you wish to perform can be quickly implemented in SQL.
Oracle (as other SQL DBMS) features temporary tables which are optimized for just such tasks.
However this might not be the best approach given your concerns about performance. It's a guess that your "middleware" layer is written in Java, given the tags placed on the Question, and the lack of any explicit description suggests you may be attempting a two-tier design, where user interface programs connect directly with the backend data management resources.
Given your apparent investment in Oracle products, you might find it worthwhile to incorporate Oracle Middleware elements in your design. In particular Oracle Fusion Middleware promises to enable "data integration" between web services and databases.

Hibernate Search, Lucene or any other alternative?

I have a query doing ILIKE on some 11 string or text fields of a table which is not big (500,000 rows), but obviously too big for ILIKE; the search query takes around 20 seconds. The database is Postgres 8.4.
I need to implement this search to be much faster.
What came to my mind:
I made an additional TSVECTOR column assembled from all the columns that need to be searched, and created a full-text index on it. The full-text search was quite fast. But... I cannot map this TSVECTOR type in my .hbm files, so this idea fell through (in any case I thought of it more as a temporary solution).
Hibernate Search. (Heard about it for the first time today.) It seems promising, but I'd like an experienced opinion on it, since I don't want to get into a new API, possibly not the simplest one, for something that could be done more simply.
Lucene
In any case, this has happened now with this table, but I would like the solution to be more generic and applicable to future cases related to full-text searches.
All advices appreciated!
Thanx
I would strongly recommend Hibernate Search, which provides a very easy to use bridge between Hibernate and Lucene. Remember you will be using both here. You simply annotate properties on your domain classes which you wish to be able to search over. Then, when you update/insert/delete an entity which is enabled for searching, Hibernate Search simply updates the relevant indexes. This will only happen if the transaction in which the database changes occurred was committed, i.e. if it's rolled back the indexes will not be broken.
So to answer your questions:
Yes you can index specific columns on specific tables. You also have the ability to Tokenize the contents of the field so that you can match on parts of the field.
It's not hard to use at all: you simply work out which properties you wish to search on, tell Hibernate where to keep its indexes, and then use the EntityManager/Session interfaces to load the entities you have searched for.
Since you're already using Hibernate and Lucene, Hibernate Search is an excellent choice.
What Hibernate Search will primarily provide is a mechanism to have your Lucene indexes updated when data is changed, and the ability to maximize what you already know about Hibernate to simplify your searches against the Lucene indexes.
You'll be able to specify what specific fields in each entity you want to be indexed, as well as adding multiple types of indexes as needed (e.g., stemmed and full text). You'll also be able to manage the index graph for associations, so you can make fairly complex queries through Search/Lucene.
I have found that it's best to rely on Hibernate Search for the text heavy searches, but revert to plain old Hibernate for more traditional searching and for hydrating complex object graphs for result display.
I recommend Compass. It's an open source project built on top of Lucene that provides a simpler API (than Lucene). It integrates nicely with many common Java libraries and frameworks such as Spring and Hibernate.
I have used Lucene in the past to index database tables. The solution works great, but remember that you need to maintain the index: either you update the index every time your objects are persisted, or you have a daemon indexer that dumps the database tables into your Lucene index.
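To make concrete what such an index maintains, here is a toy in-memory inverted index in plain Java. This is not the Lucene API, only an illustration of the token-to-row-ids structure that Lucene manages for you, and which must be kept in sync with the table on every insert/update:

```java
import java.util.*;

// Toy inverted index: token -> set of row ids.
// Illustrative only; Lucene does this (plus scoring, storage, analysis) for real.
class ToyIndex {
    private final Map<String, Set<Long>> postings = new HashMap<>();

    // Called whenever a row is inserted or updated: (re-)index its text.
    void index(long rowId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(rowId);
            }
        }
    }

    // Return ids of rows whose indexed text contains every query token.
    Set<Long> search(String query) {
        Set<Long> result = null;
        for (String token : query.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue;
            Set<Long> hits = postings.getOrDefault(token, Collections.emptySet());
            if (result == null) result = new TreeSet<>(hits);
            else result.retainAll(hits);
        }
        return result == null ? Collections.emptySet() : result;
    }
}
```

A lookup here is a hash-map access per token instead of a scan over 11 columns, which is why a 20-second ILIKE query can drop to milliseconds once the text is indexed.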
Have you considered Solr? It's built on top of Lucene and offers automatic indexing from a DB and a Rest API.
A year ago I would have recommended Compass. It was good at what it does, and technically still happily runs along in the application I developed and maintain.
However, there's no more development on Compass, with efforts having switched to ElasticSearch. From that project's website I cannot quite determine if it's ready for the Big Time yet or even actually alive.
So I'm switching to Hibernate Search which doesn't give me that good a feeling but that migration is still in its initial stages, so I'll reserve judgement for a while longer.
All these projects are based on Lucene. If you want to implement very advanced features, I advise you to use Lucene directly. If not, you can use Solr, which is a powerful API on top of Lucene that can help you index and search from a DB.

Does a simple document-based database exist?

Is there a database out there that I can use for a really basic project that stores the schema in terms of documents representing an individual database table?
For example, if I have a schema made up of 5 tables (one, two, three, four and five), then the database would be made up of 5 documents in some sort of "simple" encoding (e.g. json, xml etc)
I'm writing a Java based app so I would need it to have a JDBC driver for this sort of database if one exists.
CouchDB, and you can use it with Java.
DBSlayer is also lightweight, with a MySQL adapter. I guess this will make life a little easier.
I haven't used it for a bit, but HyperSQL has worked well in the past, and it's quite quick to set up:
"... offers a small, fast multithreaded and transactional database engine which offers in-memory and disk-based tables and supports embedded and server modes."
CouchDB works well (#zengr). You may also want to look at MongoDB.
Comparing Mongo DB and Couch DB
Java Tutorial - MongoDB
Also check http://jackrabbit.apache.org/ , not quite a DB but should also work.

"Should I use multiple indices in Solr?", and some other quick Q

Imagine a classifieds website, a very simple one where users don't have login details.
I currently have this with MySQL as the db. The db has several tables because of the categories, but one main table for the classified itself; seven tables in total in my case.
I want to use only Solr as the "db", because some people on SO think it would be better, and I agree, if it works that is.
Now, I have some quick questions about doing this:
Should I have multiple schema.xml or solrconfig.xml files?
How do I query multiple indices?
How would this (having multiple indices) affect performance and do I need a more powerful machine (memory, cpu etc...) for managing this?
Would you eventually go with only Solr, instead of what I planned to do, which is to use Solr to search and return ID numbers which I use to query and find the classifieds in MySQL?
I have some 300,000 records today, and they probably won't be increasing.
I have not tested how the records would affect performance when using Solr with MySQL, because I am still creating the website, but when using only MySQL it is quite slow.
I am hoping it will be better with Solr + MySQL, but as I said, if it is possible I will go with only Solr.
Thanks
4: If an item has status fields that get updated much more frequently than the rest of the record, then it's better to store that information in a database and retrieve it when you access the item. For example, if you stored your library book holdings in a Solr index, you would store the 'borrowed' status in a database. Updating Solr can take a fair bit of resources, and if you don't need to search on a field it doesn't really need to be in Solr.
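The split described above amounts to: search Solr for matching ids, then fetch the volatile status column from the database in one round trip. A minimal sketch in Java; the table name `book_status` and column `borrowed` are assumptions carried over from the library example:

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical sketch: Solr returns matching ids; fast-changing fields
// (e.g. 'borrowed') live in the database and are fetched in one query.
class StatusLookup {
    // Build a parameterized IN-list query for the ids Solr returned.
    static String statusQuery(List<Long> ids) {
        String placeholders = ids.stream()
                .map(id -> "?")
                .collect(Collectors.joining(", "));
        return "SELECT id, borrowed FROM book_status WHERE id IN (" + placeholders + ")";
    }
}
```

Binding the ids to the placeholders via a JDBC `PreparedStatement` keeps the lookup to a single round trip per page of search results, so the frequently-updated field never has to touch the Solr index.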

Questions about SOLR documents and some more

Website: Classifieds website (users may put ads, search ads etc)
I plan to use SOLR for searching and then return results as ID numbers only, then use those ID numbers to query MySQL, and lastly display the results for those IDs.
Currently I have around 30 tables in MySQL, one for each category.
1- Do you think I should do it differently than above?
2- Should I use only one SOLR document, or multiple documents? Also, is document the same as a SOLR index?
3- Would it be better to only use SOLR and skip MySQL, knowing that I have a lot of columns in each table? Personally I am much better at using MySQL than SOLR.
4- Say the user wants to search for cars in a specific region, how is this type of querying performed/done in SOLR? Ex: q=cars&region=washington possible?
You may think there is a lot of info about SOLR out there, but there isn't, and especially not about using PHP with SOLR and a SOLR PHP client... Maybe I will write something when I have learned all this... Or maybe one of you could write something up!
Thanks again for all help...
First, the definitions: a Solr/Lucene document is roughly the equivalent of a database row. An index is roughly the same as a database table.
I recommend trying to store all the classified-related information in Solr. Querying Solr and then the database is inefficient and very likely unnecessary.
Querying in a specific region would be something like q=cars+region:washington assuming you have a region field in Solr.
The Solr wiki has tons of good information and a pretty good basic tutorial. Of course this can always be improved, so if you find anything that isn't clear please let the Solr team know about it.
I can't comment on the PHP client since I don't use PHP.
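For reference, a query like q=cars+region:washington is just an HTTP GET against a core's select handler. A minimal sketch in Java using only the JDK; the base URL shape and the `region` field are assumptions, since the asker's schema isn't shown:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical sketch: build a Solr select URL for a fielded query.
class SolrUrl {
    // base is e.g. "http://localhost:8983/solr/classifieds" (core name assumed).
    static String select(String base, String query) {
        try {
            return base + "/select?q=" + URLEncoder.encode(query, "UTF-8") + "&wt=json";
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always supported
        }
    }
}
```

The returned URL can be fetched with any HTTP client; wt=json asks Solr for a JSON response, which mirrors what the admin interface shows while you develop the query.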
Solr is going to return its results in a syntax easily parsable using SimpleXML. You could also use the SolPHP client library: http://wiki.apache.org/solr/SolPHP.
Solr is really quite efficient. I suggest putting as much data into your Solr index as necessary to retrieve everything in one hit from Solr. This could mean much less database traffic for you.
If you've installed the example Solr application (comes with Jetty), then you can develop Solr queries using the admin interface. The URI of the result is pretty much what you'd be constructing in PHP.
The most difficult part when beginning with Solr is getting the solrconfig.xml and the schema.xml files correct. I suggest starting with a very basic config, and restart your web app each time you add a field. Starting off with the whole schema.xml can be confusing.
2- Should I use only one SOLR document, or multiple documents? Also, is document the same as a SOLR index?
3- Would it be better to only use SOLR and skip MySQL, knowing that I have a lot of columns in each table? Personally I am much better at using MySQL than SOLR.
A document is a single entry in a Solr index. Take into account that you can build only one Solr index per Solr core. A core acts as an independent Solr server inside the same Solr installation.
http://wiki.apache.org/solr/CoreAdmin
You can build one index merging some table contents, and some other indexes to perform second-level searches...
Would you give more details about your architecture and data?
As suggested by others, you can store and index your MySQL data and run queries against the Solr index, making MySQL unnecessary.
You don't need to store and index only IDs, query Solr for IDs, and then run a MySQL query to get the additional data for each ID; you can store the other data corresponding to the IDs in Solr itself.
Regarding the Solr PHP client: you don't need to use it; it is recommended to use the REST-like Solr web API directly. You can use a PHP function like file_get_contents("http://IP:port/solr/core/select?q=query&start=0&rows=100&wt=json") or use curl with PHP if you need to. Both ways are about equally efficient. Since wt=json, this will return the data as JSON; then use the PHP function json_decode($returned_data) to get the data as an object.
If you need to ask anything just reply.
