I'm looking for a full-text search framework to search my blog. I have gone through the Solr getting-started guide, but Solr feels like a black-box framework to me: in my experience, when it raises an error, it is hard to debug and to find out what is happening in its internal code.
So my question is: is there a Java search framework that I can use with pure Java (without XML configuration or a separate binary)?
Solr is open source; you can browse it here: http://lucene.apache.org/solr/
You normally would not need to debug it to begin with. Solr just wraps the Lucene engine in a REST API with a nice web interface, so what would your actual problem be?
If you want alternatives, nowadays the best (and practically the only) alternative with lots of support would be Elasticsearch; you can read more here: http://www.elasticsearch.org/overview/
Have a blast.
I am trying to get this project from GitHub to work.
It can be found here: https://github.com/neo4j/java-rest-binding
Has anyone already packaged it as a JAR? I am trying to connect to a local Neo4j store. Any other suggestions would be appreciated; I just want quick access to node zero.
As a complement to Axel's answer, here is a good tutorial with lots of examples of Neo4j usage (core APIs, indexing, traversals, Cypher and REST interactions): https://github.com/jimwebber/neo4j-tutorial.
In particular, the Koan11 class illustrates how to call the built-in REST API, and Koan12 shows how to roll your own API via unmanaged extensions.
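For the "quick access to node zero" case, you may not need the binding at all. Here is a minimal sketch that talks to the REST API with the JDK's own HTTP client (Java 11+). It assumes a local server on the default port 7474 exposing the legacy `/db/data/node/{id}` endpoint of older Neo4j versions; newer releases may not have that endpoint, so check the URL against your version. The class name is made up for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class NodeZeroRequest {
    // Assumption: local Neo4j server, default port 7474, legacy REST API under /db/data.
    static final String BASE_URL = "http://localhost:7474/db/data";

    // Builds a GET request for a node by id, asking for a JSON response.
    static HttpRequest nodeRequest(long nodeId) {
        return HttpRequest.newBuilder()
                .uri(URI.create(BASE_URL + "/node/" + nodeId))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Sends the request; throws an IOException if no server is listening.
        HttpResponse<String> response =
                client.send(nodeRequest(0), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body()); // the node's JSON representation
    }
}
```

Keeping request construction separate from sending makes it easy to point the same code at another host or node id.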
I recently posted an answer to a similar question here:
Neo4j, REST API, java - cypher queries
From there you can grab my pom.xml and my Java file for your quick example.
Also you might find this tutorial helpful.
http://thought-bytes.blogspot.com/2013/07/getting-started-with-neo4j-java-rest-heroku.html
I have a set of categorized text files. I want to categorize another large set of text files to use in my research. Is there a good way to compare them?
I think SVM-based methods would be useful, but is there a simple, well-documented library for using such algorithms?
I don't know much about SVM, but LingPipe might be really helpful for you. The link is a tutorial specifically about categorization of documents (automatic or guided).
Also, look into the inter-related search products Lucene (a search library), Solr (search server app), and Carrot2 (for 'clustering' search results). There should be some interesting work in that space for you.
Mallet is another awesome library to look into. It has good command-line tools to help you get started, and a Java API once you start integrating it with the rest of your system.
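Before reaching for SVMs, a much simpler baseline can tell you whether the categories are separable at all. The sketch below is a swapped-in technique, not SVM and not what LingPipe or Mallet do internally: it represents each file as bag-of-words term counts, pools each category's training files, and assigns a new file to the category with the highest cosine similarity (a nearest-centroid classifier). Plain Java; the class name and the naive tokenizer are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CentroidClassifier {
    // Lowercases, splits on runs of non-letters, and counts each token.
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^\\p{L}]+")) {
            if (!token.isEmpty()) counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // Cosine similarity between two sparse term-count vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += (double) e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) dot += (double) e.getValue() * other;
        }
        for (int v : b.values()) normB += (double) v * v;
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the category whose pooled training text is most similar to the document.
    static String classify(Map<String, List<String>> training, String document) {
        Map<String, Integer> docVector = termCounts(document);
        String best = null;
        double bestScore = -1;
        for (Map.Entry<String, List<String>> category : training.entrySet()) {
            // Summing the category's vectors gives the same cosine as using its centroid.
            Map<String, Integer> pooled = new HashMap<>();
            for (String text : category.getValue()) {
                termCounts(text).forEach((term, n) -> pooled.merge(term, n, Integer::sum));
            }
            double score = cosine(docVector, pooled);
            if (score > bestScore) {
                bestScore = score;
                best = category.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, List<String>> training = Map.of(
                "sports", List.of("the team won the match", "a great goal in the final match"),
                "cooking", List.of("bake the bread in the oven", "add salt to the soup"));
        System.out.println(classify(training, "who scored the winning goal in the match")); // sports
    }
}
```

A real setup would at least drop stop words (note how much weight "the" gets here) or weight terms by TF-IDF; the libraries above take care of that kind of detail.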
I want to know which query classes Solr uses for querying, and what the differences are between querying with Lucene and with Solr.
I am not sure what you are asking, but Solr is basically a search/indexing server. It has an external HTTP-based API for sending documents to be indexed and for searching them.
One of the core pieces of Solr is Lucene. This is the library that actually indexes and searches the documents.
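To make "actually indexes and searches" a bit more concrete: at its heart a search library keeps an inverted index that maps each term to the documents containing it. The toy below is only a conceptual sketch in plain Java (hypothetical class name, no scoring, everything in memory); Lucene's real index is segment-based and far more elaborate.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TinyInvertedIndex {
    // Maps each term to the sorted ids of the documents that contain it.
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private final List<String> documents = new ArrayList<>();

    // Indexing: tokenize the text and record the document id under every term.
    public int add(String text) {
        int id = documents.size();
        documents.add(text);
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Searching: intersect the posting sets of all query terms (AND semantics).
    public Set<Integer> searchAll(String... terms) {
        TreeSet<Integer> result = null;
        for (String term : terms) {
            Set<Integer> docs = postings.getOrDefault(term.toLowerCase(), new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public String document(int id) {
        return documents.get(id);
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add("Solr wraps Lucene in an HTTP server");
        index.add("Lucene is a Java search library");
        index.add("Elasticsearch is also built on Lucene");
        System.out.println(index.searchAll("lucene", "java")); // [1]
    }
}
```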
If you need the API/query information for Solr (which should mirror that of Lucene very closely), look at lucene.apache.org.
Solr gives you a distributed search engine that is exposed as a web service to your client application. If you are asking how to use it on the client side, just look at the SolrJ API. If you are asking about Solr's internal APIs and classes, you could start from the QueryComponent class, e.g. http://lucene.apache.org/solr/api/org/apache/solr/handler/component/QueryComponent.html.
Lucene is the technology Solr uses to perform searches.
I'm not 100% sure what you are asking, but if it's "how do I query Solr?", then you simply visit or curl a URL that contains the Solr query, e.g.
price:[0 TO 1000]
or
name:test
The first part (before the colon) is the field, and the second part is the search, which can be text, a numeric range, etc.
There is plenty of documentation about this on the Solr wiki.
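If you build such URLs from Java, remember to URL-encode the query, since characters like `:`, `[` and spaces are not safe in a URL. A minimal sketch using only the JDK (Java 10+ for the `Charset` overload of `URLEncoder.encode`), assuming Solr on its default port 8983 and a core named `collection1` (adjust both for your installation); `q`, `wt` and `rows` are standard Solr request parameters, while the class name is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class SolrQueryUrl {
    // Assumption: default port 8983 and a core named "collection1".
    static final String SELECT_URL = "http://localhost:8983/solr/collection1/select";

    // Builds a select URL from request parameters, URL-encoding each value.
    static String selectUrl(Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        params.forEach((name, value) ->
                query.add(name + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8)));
        return SELECT_URL + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order
        params.put("q", "price:[0 TO 1000]"); // field:value syntax, here a range
        params.put("wt", "json");             // response format
        params.put("rows", "10");             // number of results to return
        System.out.println(selectUrl(params));
        // http://localhost:8983/solr/collection1/select?q=price%3A%5B0+TO+1000%5D&wt=json&rows=10
    }
}
```

You can paste the printed URL into a browser or hand it to curl; SolrJ does the same encoding for you if you use it instead.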
Let me know what your actual problem is and I'll gladly help.
There are several advantages to using Solr 1.4 (out-of-the-box faceted search, grouping, replication, HTTP administration vs. Luke, ...).
Even if I embed search functionality in my Java application, I could use SolrJ to avoid the HTTP overhead when using Solr. Is SolrJ recommended at all?
So, when would you recommend using "pure" Lucene? Does it have better performance or require less RAM? Is it easier to unit-test?
PS: I am aware of this question.
If you have a web application, use Solr - I've tried integrating both, and Solr is easier. Otherwise, if you don't need Solr's features (the one that comes to mind as being most important is faceted search), then use Lucene.
If you want to embed your search functionality completely within your application and do not want to maintain a separate process like Solr, using Lucene is probably preferable. For example, a desktop application might need some search functionality (like the Eclipse IDE, which uses Lucene to search its documentation). You probably don't want this kind of application to launch a heavy process like Solr.
Here is one situation where I have to use Lucene.
Given a set of documents, find out the most common terms in them.
Here, I need to access the term vectors of each document (using the low-level TermVectorMapper API). With Lucene, that's quite easy.
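For comparison, here is the same "most common terms" computation over raw text in plain Java. It is only a sketch: it re-tokenizes each document with a naive regex instead of reading Lucene's stored term vectors (which is exactly the work Lucene saves you), and the class and method names are hypothetical.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CommonTerms {
    // Per-document term frequencies, roughly what a term vector stores for one field.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) tf.merge(token, 1, Integer::sum);
        }
        return tf;
    }

    // Sums the per-document frequencies and returns the k most frequent terms,
    // breaking ties alphabetically so the result is deterministic.
    static List<String> topTerms(List<String> documents, int k) {
        Map<String, Integer> totals = new HashMap<>();
        for (String doc : documents) {
            termFrequencies(doc).forEach((term, n) -> totals.merge(term, n, Integer::sum));
        }
        Comparator<Map.Entry<String, Integer>> byCountDescThenTerm =
                Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.comparingByKey());
        return totals.entrySet().stream()
                .sorted(byCountDescThenTerm)
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "Solr wraps Lucene", "Lucene is a search library", "Solr adds faceting to Lucene");
        System.out.println(topTerms(docs, 2)); // [lucene, solr]
    }
}
```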
Another use case is very specialized ordering of search results. For example, I want a search for an author name (an author who has written multiple books) to return one book from each store in the first 10 results. In this case, I run a search against each book store, and to build the final results I pick one result from each store. Here you are essentially doing multiple searches to generate the final results, and having access to Lucene's low-level APIs definitely helps.
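The merging step can be as simple as a round-robin over the per-store result lists. A minimal sketch, assuming each store's hits have already been fetched by a separate search and arrive in rank order; the map from store name to titles is a hypothetical input format, and the search itself is out of scope here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PerStoreInterleave {
    // Interleaves each store's ranked results round-robin, so the first page shows
    // one hit per store before any store contributes a second hit.
    static List<String> interleave(Map<String, List<String>> resultsByStore, int limit) {
        List<String> merged = new ArrayList<>();
        for (int rank = 0; merged.size() < limit; rank++) {
            boolean addedAny = false;
            for (List<String> storeResults : resultsByStore.values()) {
                if (rank < storeResults.size() && merged.size() < limit) {
                    merged.add(storeResults.get(rank));
                    addedAny = true;
                }
            }
            if (!addedAny) break; // every store's list is exhausted
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, List<String>> results = new LinkedHashMap<>(); // store order = map order
        results.put("storeA", List.of("A1", "A2", "A3"));
        results.put("storeB", List.of("B1"));
        results.put("storeC", List.of("C1", "C2"));
        System.out.println(interleave(results, 10)); // [A1, B1, C1, A2, C2, A3]
    }
}
```

If there are at least as many stores as result slots, the first page contains exactly one book per store; otherwise the remaining slots are filled with each store's next-best hit.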
One more reason to go for Lucene used to be getting new goodies ASAP. This is no longer true, as the two projects have been merged and will have synchronized releases.
I'm surprised nobody mentioned NRT - Near Real Time search, available with Lucene, but not with Solr (yet).
Use Solr if you are more concerned about scalability than performance and use Lucene if you are more concerned about performance than scalability.
I have heard a lot about Lucene being one of the best search engine libraries in Java. Is there a similar (equally powerful) library for Ruby?
Well, there's Ferret, which is a port of Lucene to Ruby. Also, Lucene is very easy to use from JRuby, if that's an option for you.
Depending on your needs, you might also want to take a look at Solr, which is a higher-level front-end built on Lucene. There is a Ruby interface, solr-ruby, that interacts with Solr via HTTP.
Ferret is what you're looking for:
"Ferret is a high-performance, full-featured text search engine library written for Ruby. It is inspired by Apache Lucene Java project."
I would try one of these in combination with Sphinx:
Thinking Sphinx
http://freelancing-god.github.com/ts/en/rails3.html
Riddle
http://riddle.freelancing-gods.com/
http://blog.evanweaver.com/files/doc/fauna/ultrasphinx/files/README.html
CLucene is a cross-platform C++ port of Lucene. It can be wrapped and used from any high-level language (there are also a few legacy Ruby projects you could start with). See:
http://sourceforge.net/projects/clucene
http://clucene.git.sourceforge.net/git/gitweb.cgi?p=clucene/clucene;a=summary
Unfortunately, in most cases Ferret is not what you're looking for: it has recurring issues with re-indexing speed, index corruption, and segfaults on the server. I think most people are moving to Solr, Sphinx, and Xapian. I also recall seeing some Tsearch/PostgreSQL apps mentioned; Tsearch seems to be an industrial-strength solution.
Take a look here
Full Text Searching with Rails