Full-Text-Search of database [closed]


Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
I am looking for a performant and readable way to implement full-text search. I have a lot of requirements for the search; see the list below.
Requirements
Performance
My database is growing very fast. Loading all data into the heap and doing some .stream() magic is not an option. The search should be performed by the DBMS.
Readability
I need an easy solution. A complex query like the one in How to implement simple full text search in JPA (Spring Data JPA)? (see option #2) is also not a solution. I would need several JOINs, and the resulting query becomes too complex.
The overhead of an "index field" is also not an option (too much joined data).
Concurrency
The application needs to be scalable (with n instances), so a solution with Lucene is not a good fit; here is an example.
No mixing of technologies
I don't want to spread the logic across different systems. This means the whole search logic should be defined in Java. Combining the Java logic with views or SQL functions should be avoided.
Options discovered so far
QueryDsl
This is my old solution, but it's very complex and caused a lot of problems with the auto-generated classes.
Lucene
I like this, but there is one big problem: the index. Keeping the index up to date on all instances is overkill.
Very long @Query
The resulting query gets too complex to handle.
Java.stream()...
// roughly:
List<User> matches = getAllUsers().stream()
        .filter(user -> user.getName().contains(searchTerm)
                || user.getSex().contains(searchTerm)
                || user.getAge().toString().equals(searchTerm)
                /* || ... further fields */)
        .collect(Collectors.toList());
I have too much data to do that, so this solution will not scale well either.
Specification Interface
My preferred solution. But maybe there are other (and better) solutions?
SearchField or similar
Too many JOINS. Too much data.
?
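The stream-based filter above runs in the JVM after every row has been loaded. To push the same OR-conditions down to the DBMS, they can be expressed as one parameterized WHERE clause. The following is a minimal sketch in plain Java, assuming a hypothetical users table with name, sex and age columns; it only builds the SQL and its bind values and is not the Spring Data Specification API, although composing Specification predicates with OR leads to a similar clause.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SearchQuerySketch {

    /** A parameterized SQL string plus its bind values, in placeholder order. */
    static final class Query {
        final String sql;
        final List<Object> params;

        Query(String sql, List<Object> params) {
            this.sql = sql;
            this.params = params;
        }
    }

    /**
     * Builds one query so the DBMS does the filtering instead of the JVM.
     * The table and column names (users, name, sex, age) are hypothetical.
     */
    static Query buildUserSearch(String searchTerm) {
        List<Object> params = new ArrayList<>();
        // Case-insensitive substring match on the text columns. The term is
        // bound as a parameter, never concatenated into the SQL string.
        String pattern = "%" + searchTerm.toLowerCase(Locale.ROOT) + "%";
        StringBuilder where = new StringBuilder("LOWER(name) LIKE ? OR LOWER(sex) LIKE ?");
        params.add(pattern);
        params.add(pattern);
        // Compare the numeric column only when the term is a (small) integer.
        if (searchTerm.matches("\\d{1,9}")) {
            where.append(" OR age = ?");
            params.add(Integer.parseInt(searchTerm));
        }
        return new Query("SELECT id, name, sex, age FROM users WHERE " + where, params);
    }

    public static void main(String[] args) {
        Query q = buildUserSearch("42");
        System.out.println(q.sql);
        System.out.println(q.params);
    }
}
```

Two caveats apply to this sketch: `%` and `_` inside the search term act as LIKE wildcards unless escaped, and a leading-wildcard LIKE generally cannot use a B-tree index, so on large tables a database-native full-text index (for example PostgreSQL `tsvector` or a MySQL `FULLTEXT` index) is usually what makes this fast.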
Question
What are your experiences with full-text search in a Spring Boot application? Do you know a solution that meets my requirements?

If you have got as far as Lucene, then the next step is Solr. I haven't used the options you mentioned above, but I have certainly worked with Solr and can safely say that it is worth a try, for speed and ease of use.
Of the four constraints you listed, I feel the first three are taken care of by Solr.
Performance: Solr is a proven candidate in this area.
Readability: I assume you mean readability of code. Though this depends on how the code and design are done, the Solr part is quite friendly to code, understand and maintain because it lacks JOINs and other RDBMS concepts.
Concurrency: From the official documentation at lucene.apache.org/solr:
Both Lucene and Solr were designed to scale to support large implementations with minimal custom coding.
and that Solr can do the following in this regard:
distributing an index across multiple servers
replicating an index on multiple servers
merging indexes
no mixing of technologies: With the option of using Solr, you have at least two technologies: Java and Solr. I am not sure if you wanted to keep your solution to pure Java/JEE. If that is the case, then this may not satisfy that need.
However, this requirement:
The search should be performed by the DBMS.
is surely not taken care of.
Also, I can't think of a way other than a custom design for this:
Keep the index up2date on all instances is a bit overkill.
A warning: It may take some time to get a good grasp on Solr if you are new to it.

You may consider Apache Solr for searching.

Related

Morphia vs Spring Data Mongo [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed last month.
I am using Java. I have to use an ORM framework with MongoDB as the database. I have two options: Morphia or Spring Data MongoDB. As far as I have been able to find out, Spring Data MongoDB is better to use since:
1) It provides better out-of-the-box DAO classes.
2) It has a larger community.
Are there any performance differences between the two, and if so, which one is better under which conditions? I also have a multitenancy requirement. After a little searching I found that there is a very simple custom implementation for this in Spring Data MongoDB, but in Morphia it is somewhat difficult. Is achieving multitenancy in Morphia difficult (i.e. do we need to write a lot of boilerplate code)?

I have been using Spring Data, and I feel it somehow lags as far as maturity is concerned.
It's good for all practical purposes, but Spring Data is slow to expose the features Mongo provides in their full glory, especially when it comes to aggregation.
As far as performance goes, Spring Data doesn't lag, IMO.
Sometimes I get weird behavior: some of its annotations silently don't work in some places, and for the life of me I cannot figure out why.
But as an overall implementation it's quite helpful in that it provides a robust structure on which your application can grow. It is also easy if you are coming from an SQL background, since you can draw a parallel between JdbcTemplate and MongoTemplate (though one needs to be cautious).
I seriously considered using Morphia, but dropped the idea since Spring Data provided a more structured approach. It looks like with Morphia we would have to implement some structure on our own, which has pros and cons, but you usually want to avoid doing that: there is a risk of boilerplate code, and new team members face a learning curve for 'your' structure.
On the pro side, I am sure Morphia provides more extensibility, letting you get the most out of Mongo's features. It is also lightweight compared to Spring Data.

Morphia is the way to go. Pretty stable, very good Play integration and offers access to all Mongo driver features if you need more torque. Reference resolution, entity embedding are working as expected. You get lifecycle annotations too, which are pretty useful for boilerplate persistence code.
I personally like Spring Data because of the Hades project... You don't need to implement the DAOs; you just write the interface and Spring Data automatically provides the implementation. However, the Spring Data MongoDB implementation seemed a little buggy in my initial trial. If you have hard deadlines and are working on a production-quality product, it is probably wise to choose Morphia.

Choosing an ORM for Android project (min. API level 7) [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
I currently have an application whose primary performance issue is its file-based database consisting of JSON responses.
I'd like to rewrite my application to use an SQLite database.
Since I'm lazy, I'd like to use some kind of ORM.
So far I have found only these big ORM libraries:
ORMLite
GreenDAO ORM
DB4O
ActiveAndroid
My primary goal is to raise performance when working with data as much as possible.
But I've found some possible issues with these libraries.
ORMLite uses annotations, which is a big performance issue on pre-Honeycomb devices due to this bug.
GreenDAO uses some kind of code generator, which would slow down development, as I would have to write the generator and then use the generated code. I don't really like this idea.
DB4O is JPA, which I've always considered slow and heavy on memory usage, and therefore unsuitable for low-end devices (remember, Android API v7).
Re @ChenKinnrot:
The estimated load should be sufficient to think about using an ORM.
In my case it is about 25-30 unique tables and at least 10 table joins (2-4 tables at a time), with about 300-500 unique fields (columns).
So my questions are:
Should I use ORM/JPA layer in Android application?
If so, what library would you recommend me to use? (and please add some arguments too)

I've used ORMLite and found it straightforward once you get the hang of it (a few hours), quite powerful, and it didn't cause any performance problems (app tested on Gingerbread on an HTC Desire and an HTC Hero).
I will be using it again in any projects I need to use a DB for.

An ORM layer is appealing.
However, in practice I either write a simple ORM myself or use the Content Provider paradigm, which does not cooperate well with ORM.
I have looked into some existing ORM libraries (mainly ORMLite and ActiveAndroid), but they all scared me away, as they don't seem easy to get started with.
"We're talking about 25-30 unique tables, and at least 10 table joins.
About 300-500 unique fields (columns)"
If you have fixed and limited patterns of how the data will be queried, I would recommend writing the ORM/SQL yourself.
My 2 cents.

If you are worried about your app's performance, I'd recommend greenDAO. It will save you from writing lots of boring code, so code generation should not be an issue. It will also generate entities and DB unit tests for you.

I have some knowledge to share:
An ORM is by definition slower than writing your own SQL. It's supposed to simplify data-access code and provide a generic solution, and generic means it runs slower than queries you write yourself, if you know SQL well.
The real question is how much performance you need. If you need the best possible, don't consider any data-mapping framework; use only an SQL-generation framework that helps you write queries faster but gives you full control of everything.
If you don't need to get the most out of the SQL database, use an ORM. I have no experience with the ORMs you mentioned, so I can't say which one to choose.
And your DB is not that big or complex, so the time you'll save with an ORM is not an issue.

In my experience, ORM engines have brought a lot of benefits. However, there was one case where I had to deal with performance problems.
I had to load about 10,000 rows from the database, and with a standard implementation (I was using ORMLite), it took about 1 minute to complete (depending on the device CPU).
When you need to read a lot of data from the database, you can execute plain SQL and parse the results yourself (in my case, I only needed to query 3 columns from the table). ORMLite also allows you to retrieve raw results. This way, performance increased about tenfold: all 10,000 rows were loaded in 5 seconds or less!

Java commercial-friendly R-tree implementation? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
I need a commercial-friendly (Apache License, LGPL, Mozilla Public License etc.) R-tree implementation in Java, in order to replace the GeoNames web service for time zones, as suggested in the question "Determine timezone from latitude/longitude without using web services like Geonames.org". I have found a few, but I was wondering if someone has evaluated or used them in practice.

https://github.com/rweeks/util/blob/master/src/com/newbrightidea/util/RTree.java
LGPL implementation of an R-tree by Russ Weeks. It looks very simple and clear and does not depend on external libraries.
http://www.mischiefblog.com/?p=171
http://www.mischiefbox.com/blog/uploads/rtree.jar
LGPL implementation of R-Tree by Chris Jones. Another simple and clear solution.
http://www.khelekore.org/prtree/
CPL 1.0 implementation of Priority R-Tree by Robert Olofsson
http://jsi.sourceforge.net/
LGPL - project aims to maintain a high performance Java version of the RTree spatial indexing algorithm.

First of all, let me point out that if you look up the nearest city to the given coordinates, it might not be in the same time zone! What you really need, in my opinion, is information about the location's administrative affiliation: at minimum the country, but in some cases even more than that, e.g. the state. That information can be retrieved with the Google Maps API and then correlated with more detailed TZ information.
Second, there is a free alternative to GeoNames: EarthTools. There are some limitations to the service itself (number of requests, etc.), but it's good, tested, and working just fine for me.
Third, if you are willing to import the data into a DB, most current DB implementations provide geospatial indexes that you can use. If you need that information embedded in your application, you can use H2 Database (an embedded Java DB) with the H2Spatial addition, although I've tried it and cannot fully recommend it. Neo4j has a great spatial index implementation.
Additionally, you can use Solr for geospatial searches. It's nice, quick and easy to implement. I'm actually in the middle of migrating my DB searches to Solr...
Last, but not least, below you'll find some of the ones I've tested a while back:
JSI - LGPL
GeoTools - LGPL; overkill, it will give you far more than you need... but it's great!
There are possibly a few more out there, but these are the ones I've tested so far.

A simple RTree Java class I created:
https://github.com/hadmir/rtree/blob/master/RTree.java
All objects are stored inside two int[] arrays, so it is really easy to persist (to a file). Also, the fact that adding new rects doesn't create any objects means that you can insert millions of rectangles into the RTree and the JVM will not go up in flames. This is useful for geo projects, where object counts are usually enormous.
Only 2D rectangles are stored (so, for complex object you need to find bounding rectangle). Query returns all rects (IDs of rects) intersecting or overlapping with "query rectangle".
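The storage idea and the query semantics described above can be illustrated with a much simpler structure. The following is a minimal sketch, not the linked class: it keeps rectangles in one flat int[] (four ints per rectangle) and answers "which rectangles intersect this query rectangle" by a linear scan. A real R-tree applies the same overlap test to node bounding boxes so that it can skip whole subtrees; that tree is deliberately omitted here, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Rectangles stored flat as [minX, minY, maxX, maxY, ...] in one int[].
 * Queries are a linear scan, NOT an R-tree; only the overlap test is shared.
 */
public class FlatRectStore {
    private int[] coords = new int[16];
    private int count = 0;

    /** Adds a rectangle and returns its ID (its insertion index). */
    public int add(int minX, int minY, int maxX, int maxY) {
        if ((count + 1) * 4 > coords.length) {
            // Occasional array growth; no per-rectangle object is allocated.
            coords = Arrays.copyOf(coords, coords.length * 2);
        }
        int base = count * 4;
        coords[base] = minX;
        coords[base + 1] = minY;
        coords[base + 2] = maxX;
        coords[base + 3] = maxY;
        return count++;
    }

    /** True if two closed rectangles share at least one point. */
    static boolean intersects(int aMinX, int aMinY, int aMaxX, int aMaxY,
                              int bMinX, int bMinY, int bMaxX, int bMaxY) {
        return aMinX <= bMaxX && bMinX <= aMaxX
            && aMinY <= bMaxY && bMinY <= aMaxY;
    }

    /** Returns the IDs of all stored rectangles overlapping the query rectangle. */
    public List<Integer> query(int minX, int minY, int maxX, int maxY) {
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < count; id++) {
            int b = id * 4;
            if (intersects(coords[b], coords[b + 1], coords[b + 2], coords[b + 3],
                           minX, minY, maxX, maxY)) {
                hits.add(id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        FlatRectStore store = new FlatRectStore();
        store.add(0, 0, 10, 10);   // id 0
        store.add(20, 20, 30, 30); // id 1
        store.add(5, 5, 25, 25);   // id 2
        System.out.println(store.query(8, 8, 12, 12)); // [0, 2]
    }
}
```

Because the whole data set is one primitive array, persisting it is a matter of writing `count` and the first `count * 4` ints to a file.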

How mature is Ebean or Siena? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Recently I have heard a lot of complaints about Hibernate, and indeed I have had some painful experiences with Hibernate too. So I read about Ebean and Siena.
Both have interesting approaches. Unfortunately, database access layers are easy to write, but only when your project grows and you have to handle large database tables do you find out whether they are good or not. So it's really difficult to evaluate such a tool. Hibernate is well known, and you can be sure that you can solve your problem with it. Sometimes you need to learn a lot, but you can solve it.
How is it with Ebean? Are there any real world applications? Which databases are supported? Is it reliable?
After searching a little more, I see that there are a lot more ORM frameworks, so is there at least one reliable one?

Rob (Ebean Committer) here.
Ebean is about 4+ years old now. I would say it is fairly mature. The supported DBs include Oracle, MySQL, Postgres, H2 and SQL Server (and recently SQLite). Ebean does things that other ORMs don't, such as Autofetch (automatic query tuning), so I'm not sure how that fits into a 'maturity rating'. IMO the Ebean community is relatively small, though, so you probably need to go to the Ebean Google group to engage with them.
Any real-world applications? Yes, but you are best off asking the Ebean community about that. Certainly there is good support for batch processing (batch size, turning off cascading persist for a transaction, etc.) and large-query support that I don't see in JPA etc. (you might get something similar with Hibernate's Sessionless support).
Hopefully this might answer some small parts of your question anyway.
Cheers, Rob.

I'm currently a developer of Siena, though not for very long. Let me explain why I became a developer on this project.
I went to Siena because I wanted to use Play+GAE and Siena appeared to be a good start for GAE DB and I really wanted to avoid JDO/JPA.
Then I began to really appreciate Siena for its straightforward, light and easy approach and its very simple APIs. It doesn't pretend to be an all-in-one abstraction layer like JDO or the greatest standard DB API like JPA. It really reminded me of DB APIs from Python/Ruby, and it fits my point of view: I want a simple DB API that allows me to solve the great majority of my problems, and when I have a more complex problem, I will use the lower-layer APIs, but certainly not an abstraction layer such as Hibernate.
The possibility of making my code work on GAE DB or JDBC was also a good aspect. Once again, Siena doesn't pretend to provide exactly the same things in both worlds, because SQL and NoSQL are not really compatible (but then ORM doesn't really fit the SQL model either :) ).
But once again, it is quite practical to be able to rely on the same APIs in several DBs.
Finally, the library is ONE jar and you don't have to retrieve the whole universe to use it.
So I gradually became a committer on Siena because I wanted to take part in this nice little adventure.
Now the Siena team is working on a new version that keeps the same simple APIs, brings interesting new features and really improves all the backend code to make it even easier to extend with support for new DBs.
Siena is a pragmatic API driven by user experiences and that's why I like it ;)
Pascal

We've had really great experience with MyBatis, which is not an ORM per se, but another class of persistence manager, an SQL Mapper. Using it you start with SQL statements and direct it on how to map result rows into POJOs. It's conceptually easy to understand and tune with not much magic going on inside. It's ideal if you are comfortable with SQL or need to work with an established schema.
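The SQL-mapper idea (start from an SQL statement, then say explicitly how each result row becomes a POJO) can be shown without any library. The following is a minimal plain-Java sketch under stated assumptions: rows are modeled as `Map<String, Object>` already fetched from the database, and the `authors` table, its columns and the `MappedStatement` class are hypothetical. This is not MyBatis's API; MyBatis itself declares statements in XML mapper files or annotations such as `@Select` and executes them over JDBC.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class SqlMapperSketch {

    /** Plain POJO the rows are mapped into. */
    static final class Author {
        final long id;
        final String name;

        Author(long id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public String toString() {
            return "Author(" + id + ", " + name + ")";
        }
    }

    /** An SQL statement paired with an explicit row-to-object mapping. */
    static final class MappedStatement<T> {
        final String sql;
        final Function<Map<String, Object>, T> rowMapper;

        MappedStatement(String sql, Function<Map<String, Object>, T> rowMapper) {
            this.sql = sql;
            this.rowMapper = rowMapper;
        }

        /** Maps already-fetched rows; a real mapper would execute sql first. */
        List<T> map(List<Map<String, Object>> rows) {
            List<T> result = new ArrayList<>();
            for (Map<String, Object> row : rows) {
                result.add(rowMapper.apply(row));
            }
            return result;
        }
    }

    // The SQL is written by hand; the mapping from columns to fields is explicit.
    static final MappedStatement<Author> SELECT_AUTHORS = new MappedStatement<>(
        "SELECT author_id, author_name FROM authors",
        row -> new Author(((Number) row.get("author_id")).longValue(),
                          (String) row.get("author_name")));

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("author_id", 7L);
        row.put("author_name", "Ada");
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row);
        System.out.println(SELECT_AUTHORS.map(rows)); // [Author(7, Ada)]
    }
}
```

Nothing happens behind the scenes here: the query text and the column-to-field mapping are both visible, which is the "not much magic" property the answer describes.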

Besides Ebean and Siena:
You can try JIRM which is focused on CRUDing immutable objects (yes I'm the author).
There is also jOOQ and Joist.
I feel that JIRM minimizes the number of DTOs because the domain objects are immutable and do not inherit, implement and/or are not "enhanced/instrumented". This is not the case with Siena and Ebean.
Also, because the objects are immutable, there is more focus on per-column updating instead of updating the whole object, which makes more sense given today's AJAX interfaces (compared to the old model of POSTing the whole bean).
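The per-column idea can be sketched in plain Java: an immutable object is "changed" by creating a new instance, and comparing the old and new instances tells you exactly which columns to update. This is a minimal illustration under assumptions, not JIRM's API; the `customers` table, its columns and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

public class ImmutableUpdateSketch {

    /** Immutable domain object: "changing" it returns a new instance. */
    static final class Customer {
        final long id;
        final String name;
        final String email;

        Customer(long id, String name, String email) {
            this.id = id;
            this.name = name;
            this.email = email;
        }

        Customer withEmail(String newEmail) {
            return new Customer(id, name, newEmail);
        }

        /** Column values keyed by (hypothetical) column name, in a stable order. */
        Map<String, Object> columns() {
            Map<String, Object> cols = new LinkedHashMap<>();
            cols.put("name", name);
            cols.put("email", email);
            return cols;
        }
    }

    /**
     * Builds an UPDATE that touches only the columns that differ between two
     * versions. Bind values are appended to paramsOut in placeholder order.
     * Returns empty when nothing changed.
     */
    static Optional<String> updateSql(Customer before, Customer after, List<Object> paramsOut) {
        List<String> sets = new ArrayList<>();
        Map<String, Object> old = before.columns();
        for (Map.Entry<String, Object> e : after.columns().entrySet()) {
            if (!Objects.equals(old.get(e.getKey()), e.getValue())) {
                sets.add(e.getKey() + " = ?");
                paramsOut.add(e.getValue());
            }
        }
        if (sets.isEmpty()) {
            return Optional.empty();
        }
        paramsOut.add(after.id);
        return Optional.of("UPDATE customers SET " + String.join(", ", sets) + " WHERE id = ?");
    }

    public static void main(String[] args) {
        Customer c = new Customer(1L, "Ann", "ann@old.example");
        List<Object> params = new ArrayList<>();
        updateSql(c, c.withEmail("ann@new.example"), params).ifPresent(System.out::println);
        System.out.println(params); // [ann@new.example, 1]
    }
}
```

An AJAX form that edits only the e-mail field then maps naturally to a one-column UPDATE rather than a write of the whole bean.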

What about using EJB3, for instance with JBoss (www.jboss.org)?

Lucene Interview Questions [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 11 years ago.
I am interviewing candidates for a position developing an application which relies heavily on Lucene. In addition to the usual questions I ask, I'd like to be able to ask one or two Lucene-specific questions that will give me a rough idea of how familiar they are with the library. The problem is that I have no experience with Lucene myself. Any suggestions?

A couple of questions I would ask:
What is the Lucene data structure? (inverted index)
How does Lucene compute the relevancy of a document? (vector space model, boolean model)
What is a segment? (a portion of the index)
How is text indexed? (analyzers, tokenizers)
What is a document? (collection of fields)
What does the Lucene query syntax look like? (boolean query, boost, fuzzy searches)
How does it differ from a relational database, and when would you use one over the other?
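For an interviewer without Lucene experience, it may help to have a concrete picture of the first two answers (inverted index, relevance scoring) to compare candidates' explanations against. The following is a toy sketch in plain Java, not Lucene's API: a map from term to the documents containing it, a trivial "analyzer" that lowercases and splits on non-alphanumerics, and classic tf-idf scoring with OR semantics. Real Lucene adds segments, compressed postings, richer analyzers and, in recent versions, BM25 scoring by default; all names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Toy inverted index: term -> (docId -> term frequency), scored with tf-idf. */
public class TinyInvertedIndex {
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private int docCount = 0;

    /** Minimal "analyzer": lowercase, then split on anything not a letter or digit. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Indexes a document and returns its ID. */
    public int add(String text) {
        int docId = docCount++;
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new HashMap<>())
                    .merge(docId, 1, Integer::sum);
        }
        return docId;
    }

    /** Returns docId -> score for documents containing any query term. */
    public Map<Integer, Double> search(String query) {
        Map<Integer, Double> scores = new TreeMap<>();
        for (String term : analyze(query)) {
            Map<Integer, Integer> docs = postings.get(term);
            if (docs == null) {
                continue; // term does not occur in any document
            }
            // Rarer terms get a higher weight (inverse document frequency).
            double idf = Math.log((double) docCount / docs.size()) + 1.0;
            for (Map.Entry<Integer, Integer> e : docs.entrySet()) {
                scores.merge(e.getKey(), e.getValue() * idf, Double::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add("Lucene is a search library");   // doc 0
        index.add("Solr is a search server");      // doc 1
        index.add("Lucene Lucene inverted index"); // doc 2
        System.out.println(index.search("lucene").keySet()); // [0, 2]
    }
}
```

The key point a candidate should be able to explain is visible in `search`: the query never scans documents; it looks up each term's posting list directly, which is also why this kind of lookup differs from a `LIKE` scan in a relational database.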

This is a tricky task. You're looking for the guy who knows more about Lucene than you do; therefore, you can't be a reliable judge of the candidates' knowledge (although you should be able to at least eliminate the ones who obviously know less than you).
My advice is to ask the candidates to explain to you some aspect of Lucene that you are confused about. When the interview's over, you can look it up to see if the answer made sense. This has the added benefit of testing their ability to communicate complex ideas. (And if the answer is "I don't know", then you should take that as a good sign: people who are willing to admit their ignorance are worth a lot more than those who aren't.)

If the candidate has a long history of Java development, familiarity with the Lucene API shouldn't be that important. Someone unfamiliar with Lucene might take a little longer to get started, but in the long run I would feel much more comfortable with a very experienced Java developer than with a somewhat experienced Java developer who has Lucene experience. In fact, I might prefer a very experienced non-Java programmer if their portfolio was impressive.
