How to add more synonyms to a SOLR SE?


My application is related to health care, so I would like queries that use "heart" to also return results that include "cardiac".
That is just one example; I have many more synonyms I need to load.
How can Solr be taught about those synonyms?

See the Solr documentation. You'll want to configure a SynonymFilter on a field in your schema, and then define all your synonyms in a synonyms.txt file. The format of that file is described best in the docs, so I won't go into it here.
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
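
To illustrate what such a synonym list does at query time, here is a minimal plain-Java sketch. It is not Solr's implementation and does not use Solr's API; SynonymSketch, parse and expand are hypothetical names. It assumes the comma-separated format of synonyms.txt (for example "heart, cardiac"), skips "#" comment lines, and ignores explicit "=>" mappings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: mimics the effect of equivalent-synonym
// groups, not the actual SynonymFilter code.
class SynonymSketch {

    // Turns lines such as "heart, cardiac" into a map from each term
    // to the full group of terms it is equivalent to.
    static Map<String, Set<String>> parse(List<String> lines) {
        Map<String, Set<String>> groups = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            // Skip blanks, comments, and "=>" mappings (not handled in this sketch).
            if (trimmed.isEmpty() || trimmed.startsWith("#") || trimmed.contains("=>")) {
                continue;
            }
            Set<String> group = new LinkedHashSet<>();
            for (String term : trimmed.split(",")) {
                String t = term.trim().toLowerCase(Locale.ROOT);
                if (!t.isEmpty()) {
                    group.add(t);
                }
            }
            for (String term : group) {
                groups.computeIfAbsent(term, k -> new LinkedHashSet<>()).addAll(group);
            }
        }
        return groups;
    }

    // Expands every whitespace-separated query token with its synonyms,
    // keeping first-seen order and dropping duplicates.
    static List<String> expand(String query, Map<String, Set<String>> groups) {
        Set<String> out = new LinkedHashSet<>();
        for (String token : query.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            out.add(token);
            out.addAll(groups.getOrDefault(token, Collections.<String>emptySet()));
        }
        return new ArrayList<>(out);
    }
}
```

With the single line "heart, cardiac" loaded, a query for "heart surgery" would be expanded to the terms heart, cardiac and surgery, which is the behaviour the question asks for.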

Related

How to read Lucene indexes from Solr

I have an existing web application which uses Lucene to create indexes. Now, as per the requirements, I have to set up Solr, which will serve as a search engine for many other web applications including my web app. I do not want to create indexes within Solr. Hence, I need to tell Solr to read the indexes from Lucene instead of creating indexes within Solr and reading from its own.
As a beginner with Solr, I first used Nutch to create indexes and then used those indexes within Solr. But I don't know how to make Solr read indexes from Lucene. I did not find any documentation about this. Kindly advise how to achieve this.

It is not possible in any reliable way.
It's like saying you built an application in Ruby and now want to use Rails on top of the existing database structure. Solr (like Rails) has its own expectations about naming and workflows around the Lucene core, and there is no migration path.
You can try using Luke to confirm the differences in internal data structures between Lucene and Solr for yourself.

I have never done that before, but as Solr is built on Lucene, you can try these steps. dataDir is the main point here.
I am assuming you are deploying it in /usr/local, so change accordingly, and that you have basic knowledge of Solr configuration.
Download Solr and copy dist/apache-solr-x.x.x.war to tomcat/webapps
Copy example/solr/conf to /usr/local/solr/
Set solr.home to /usr/local/solr
In solrconfig.xml, change dataDir to /usr/local/solr/data (Solr looks for the index directory inside)
change schema.xml accordingly ie. you need to change fields,

Hibernate Search Configuration Help

I am trying to configure Hibernate Search for my application by reading several web tutorials. The majority use annotations, but I use XML mapping. Also, many tutorials say to use Spring and Maven, while I don't use these.
Can someone help and provide a starting point for configuring Hibernate Search? Many web tutorials are not working for me.
The application is a gwt application using gilead with hibernate on the back end

As pointed out in the previous answer, Hibernate Search does not have an XML configuration. You can configure Hibernate via XML, but not Search. Since Hibernate Search 3.3 there is an alternative, however, which is the programmatic configuration API - http://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#hsearch-mapping-programmaticapi
There is an object called SearchMapping. Once instantiated, it offers a fluent API to configure Search the same way you would with annotations. Add the configured SearchMapping instance to your Hibernate Configuration instance using the key *hibernate.search.model_mapping* and Search will automatically bootstrap together with Hibernate Core. There is not much to it. You don't need Spring.
Once Search is running you probably want to write a few lines of code to index your existing database. There is code for that in the online manual.
Then you need some searches. Have a look at how to create a FullTextQuery. Your system probably gets some sort of search input in one form or another. Your task is to transform the search input from the "frontend" into a Lucene query which you can then pass to Search in order to execute the search and return managed objects.
Finally, Maven is a completely different thing: now we are talking about build systems. Using Maven you can automatically download the artifacts from the JBoss Maven repository. However, there are also dist bundles on SourceForge if this is more what you are after. Check http://www.hibernate.org/subprojects/search/download for Search download information.
Hope this helps.

http://docs.jboss.org/hibernate/search/3.3/reference/en-US/html_single/#d0e43
Hibernate Search, however, has itself its own set of annotations (@Indexed, @DocumentId, @Field,...) for which there exists so far no alternative configuration.
I also remember seeing something like this in "Hibernate Search in Action", where the author said that there's not much demand for non-annotation configuration (I don't have my copy now, so, I may be wrong). I guess that there is still not enough demand.
Note that Hibernate itself can be configured via XML, and I assume that you can mix both (XML for Hibernate mappings, annotations for Hibernate Search mappings).

use compass-lucene as caching technique

Any example of scenarios other than doing search for which I could use "compass"?
Let's say we have a page that lists the top 10 most viewed articles. How can Compass be used to show this kind of result? Is there any demo/sample project on this to refer to? Jira would definitely be a good example, but its source code is not available. I want to know how to maximize the benefits of using compass-lucene in an application.
May I know where I can download an annotation-based spring-compass JPA example? The nightly build I downloaded is XML-based.

Any example of scenarios other than doing search for which I could use "compass"?
Well, AFAIK, search is what it has been designed for.
Lets say we have a page that list top 10 most view article. How to use compass to show this kind of results.
I'm not sure this is a good use case. Compass is, in my opinion, useful when you want to search across the whole application business domain model, i.e. not only, let's say, articles (in that case, you can just query the database).
Now, let's imagine that your domain objects are searchable classes and have a searchable property numberOfViews. I guess it would be possible to search on this property (refer to the whole Searching section).
Any demo/sample project on this to refer to?
The compass distribution includes samples: a Library basic example that highlights the main features of Compass::Core and a Petclinic sample that shows how to add compass to an existing application (the Spring petclinic sample).
I want to know how to maximize the benefits of using compass-lucene in an application.
Read the author's blog, Compass wiki and the reference documentation :)
May i know where I can download spring-compass jpa #annotated example?
As mentioned, the spring-compass sample included in compass distribution is based on Spring's petclinic which doesn't use annotations (see SPR-2960). Just in case, petclinic annotated entities are attached to SPR-2960 so feel free to use them.

Questions about SOLR documents and some more

Website: Classifieds website (users may put ads, search ads etc)
I plan to use SOLR for searching and have it return only the IDs of the results, then use those IDs to query MySQL, and lastly display the results for those IDs.
Currently I have around 30 tables in MySQL, one for each category.
1- Do you think I should do it differently than above?
2- Should I use only one SOLR document, or multiple documents? Also, is document the same as a SOLR index?
3- Would it be better to only use SOLR and skip MySQL, knowing that I have a lot of columns in each table? Personally I am much better at using MySQL than SOLR.
4- Say the user wants to search for cars in a specific region, how is this type of querying performed/done in SOLR? Ex: q=cars&region=washington possible?
You may think there is a lot of info about SOLR out there, but there isn't, and especially not about using PHP with SOLR and a SOLR PHP client... Maybe I will write something when I have learned all this... Or maybe one of you could write something up!
Thanks again for all help...

First, the definitions: a Solr/Lucene document is roughly the equivalent of a database row. An index is roughly the same as a database table.
I recommend trying to store all the classified-related information in Solr. Querying Solr and then the database is inefficient and very likely unnecessary.
Querying in a specific region would be something like q=cars+region:washington assuming you have a region field in Solr.
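
As a rough illustration of building such a request from code, here is a small Java sketch. The core URL, the class name SolrQueryUrl and the method selectUrl are hypothetical, and it assumes a region field exists in the schema. Note that it swaps in Solr's fq (filter query) parameter for the region restriction instead of putting both clauses into q; that is one common way to do it, not the only one. It needs Java 10 or later for URLEncoder.encode(String, Charset).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: only builds the request URL, it does not contact Solr.
class SolrQueryUrl {

    // coreUrl is assumed to look like http://host:port/solr/<core>.
    static String selectUrl(String coreUrl, String text, String field, String value) {
        // Encode each parameter value so characters such as ':' and spaces are URL-safe.
        String q = URLEncoder.encode(text, StandardCharsets.UTF_8);
        String fq = URLEncoder.encode(field + ":" + value, StandardCharsets.UTF_8);
        return coreUrl + "/select?q=" + q + "&fq=" + fq + "&wt=json";
    }

    public static void main(String[] args) {
        // "ads" is an assumed core name used only for this example.
        System.out.println(selectUrl("http://localhost:8983/solr/ads", "cars", "region", "washington"));
    }
}
```

The main text search stays in q, while the region restriction goes into the filter, so the same filter can be reused across different searches.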
The Solr wiki has tons of good information and a pretty good basic tutorial. Of course this can always be improved, so if you find anything that isn't clear please let the Solr team know about it.
I can't comment on the PHP client since I don't use PHP.

Solr returns its results as XML, which is easy to parse using SimpleXML. You could also use the SolPHP client library: http://wiki.apache.org/solr/SolPHP.
Solr is really quite efficient. I suggest putting as much data into your Solr index as necessary to retrieve everything in one hit from Solr. This could mean much less database traffic for you.
If you've installed the example Solr application (comes with Jetty), then you can develop Solr queries using the admin interface. The URI of the result is pretty much what you'd be constructing in PHP.
The most difficult part when beginning with Solr is getting the solrconfig.xml and schema.xml files correct. I suggest starting with a very basic config and restarting your web app each time you add a field. Starting off with the whole schema.xml can be confusing.

2- Should I use only one SOLR document, or multiple documents? Also, is document the same as a SOLR index?
3- Would it be better to Only use SOLR and skip MySQL knowing that I have alot of columns in each table? Personally I am much better at using MySQL than SOLR.
A document is a single entry ("an instance") in a Solr index. Take into account that you can build only one Solr index per Solr core. A core acts as an independent Solr server within the same Solr installation.
http://wiki.apache.org/solr/CoreAdmin
You can build one index that merges the contents of some tables, and other indexes to perform second-level searches...
Could you give more details about your architecture and data?

As suggested by others, you can store and index your MySQL data in Solr and run your queries against the Solr index, which makes MySQL unnecessary.
You don't need to store and index only the IDs, query Solr for the IDs, and then run a MySQL query to get the additional data for each ID. You can just store the other data corresponding to the IDs in Solr itself.
Regarding the Solr PHP client: you don't need one, and it is recommended to use Solr's REST-like web API directly. You can use a PHP function like file_get_contents("http://IP:port/solr/core/select?q=query&start=0&rows=100&wt=json"), or use curl from PHP if you need to. Both ways are about equally efficient. Because of wt=json, this returns the data as JSON. Then use the PHP function json_decode($returned_data) to get that data as an object.
If you need to ask anything, just reply.

What JDBC tools do you use for synchronization of data sources?

I'm hoping to find out what tools folks use to synchronize data between databases. I'm looking for a JDBC solution that can be used as a command-line tool.
There used to be a tool called Sync4J that used the SyncML framework but this seems to have fallen by the wayside.

I have heard that the Data Replication Service provided by Db4O is really good. It allows you to use Hibernate to back onto an RDBMS - I don't think it supports JDBC though (http://www.db4o.com/about/productinformation/drs/Default.aspx?AspxAutoDetectCookieSupport=1)
There is an open source project called Daffodil, but I haven't investigated it at all. (https://daffodilreplicator.dev.java.net/)
The one I am currently considering using is called SymmetricDS (http://symmetricds.sourceforge.net/)
There are others, they each do it slightly differently. Some use triggers, some poll, some use intercepting JDBC drivers. You need to decide what technical limitations you are under to determine which one you really want to use.
Wikipedia provides a nice overview of different techniques (http://en.wikipedia.org/wiki/Multi-master_replication) and also provides a link to another alternative DBReplicator (http://dbreplicator.org/).

If you have a model and DAO layer that exists already for your codebase, you can just create your own sync framework, it isn't hard.
Copying data is as simple as:
read an object from database A
remove database metadata (uuid, etc)
insert into database B
Syncing requires some knowledge of what has been synced already. You can either do it at runtime by getting a list of uuids from TableInA and TableInB and working out which entries are new, or you can have a table of items that need to be synced (populated by a trigger upon insert/update in TableInA) and run from that. Your tool can be a TimerTask, so the databases are kept synced at the time granularity that you desire.
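
The runtime variant (compare the uuids on both sides and copy what is missing) could look roughly like the sketch below. It is only an illustration under simplifying assumptions: the two tables are stood in for by in-memory maps from uuid to row data, and SyncSketch and copyMissing are hypothetical names. In a real tool the maps would be replaced by calls to your existing DAO layer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: Map<uuid, rowData> stands in for TableInA / TableInB.
class SyncSketch {

    // Copies every row whose uuid exists in source but not yet in target.
    // Returns the number of rows copied. Rows already present are left untouched,
    // so updates and conflicts are deliberately out of scope here.
    static int copyMissing(Map<String, String> source, Map<String, String> target) {
        int copied = 0;
        for (Map.Entry<String, String> row : source.entrySet()) {
            if (!target.containsKey(row.getKey())) {
                target.put(row.getKey(), row.getValue());
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        Map<String, String> tableInA = new LinkedHashMap<>();
        tableInA.put("uuid-1", "first row");
        tableInA.put("uuid-2", "second row");
        Map<String, String> tableInB = new LinkedHashMap<>();
        tableInB.put("uuid-1", "first row");
        System.out.println("rows copied: " + copyMissing(tableInA, tableInB));
    }
}
```

Running the same method again copies nothing, so it is safe to call repeatedly from a scheduled task.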
However there is probably some tool out there that does it all without any of this implementation faff, and each implementation would be different based on business needs anyway. In addition at the database level there will be replication tools.

True synchronization requires some data that I hope your database schema has (you can read the SyncML doc to see how they proceed). Sync4J won't help you much, it's really high-level and XML oriented. If you don't foresee any conflicts (which means: really easy synchronisation), you could try with a lightweight ETL like Enhydra Octopus.

I'm primarily using Oracle at the moment, and the most full-featured route I've come across is Red Gate's Data Compare:
http://www.red-gate.com/products/oracle-development/data-compare-for-oracle/
This old blog gives a good summary of the solution routes available:
http://www.novell.com/coolsolutions/feature/17995.html
The JDBC-specific offerings I've come across have been very basic. The solution mentioned by Aidos seems the most feature complete if you want to go down the publish-subscribe route:
http://symmetricds.codehaus.org/
Hope this helps.
