Let's suppose we have a table with the columns id and post_text, containing data such as:
id post_text
Rent hotel Mohali near bus stop.
Hotel mohali
Rent hotel Delhi
Rent hotel Lima, lower prices.
Let's suppose I search for the words "Rent mohali"; then I should get only the first result, i.e.
id post_text
Rent hotel Mohali near bus stop.
because both words, "Rent" and "Mohali", are present in it.
#mysql, #java, #searching, #precise-search
If you're using MySQL and you'd like dynamic searching, you can use the full-text search feature, which is more powerful than LIKE, but be aware that you have to change the structure of your database to add a full-text index. Besides matching by relevance, full-text search offers more options to filter the query.
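For example, assuming the table is named posts (MySQL 5.6+ supports full-text indexes on InnoDB), a sketch might look like:

```sql
-- One-time schema change: add a full-text index on the column.
ALTER TABLE posts ADD FULLTEXT INDEX ft_post_text (post_text);

-- Boolean mode: every +term must be present, so only rows
-- containing both "Rent" and "Mohali" are returned.
SELECT id, post_text
FROM posts
WHERE MATCH(post_text) AGAINST('+Rent +Mohali' IN BOOLEAN MODE);
```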
The LIKE keyword, on the other hand, is less powerful and not dynamic, but you don't have to change your database structure with indexes or use natural language mode. If you go that way, you have to split the search into multiple LIKE conditions to match both words in one query.
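With LIKE, one way is to split the user's input on spaces in Java and AND one LIKE condition per word; a sketch of the resulting query (table name assumed):

```sql
-- Both words must appear somewhere in post_text; note that the
-- leading % prevents index use, so this scans the whole table.
SELECT id, post_text
FROM posts
WHERE post_text LIKE '%Rent%'
  AND post_text LIKE '%Mohali%';
```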
I have the following problem for which I’m trying to find an elegant solution.
The front-end shows a search input field. The user is allowed to enter any string. I want to send a request to the backend and get all entities which contain the entered value in any column. It is unknown in which table and which column the entered string can be found.
Example:
Entities: Human, Building, Planet
All of them have 15+ attributes.
Now, the user enters the number 12 into the search field. I want the backend to find any entries in the database which contain the number 12 which could be a person’s shoe size or the street number the building is in and so on.
As technologies for the backend, I’m using JPA (Hibernate) and a Postgres database.
I’d like to implement this as generically as possible so I don’t have to modify anything when new tables or new attributes come along.
What I already tried is getting all table names like that:
SELECT col.table_name
FROM information_schema.columns col
JOIN information_schema.tables tab
  ON tab.table_schema = col.table_schema
 AND tab.table_name = col.table_name
 AND tab.table_type = 'BASE TABLE'
WHERE col.table_schema NOT IN ('information_schema', 'pg_catalog')
GROUP BY col.table_name
Since JPA queries use the Java class name instead of the table name, though, this is of no use. The same problem applies to attributes. Using native queries to avoid this problem would make things even more complicated.
Is there any good solution to search the whole database for a specific value?
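One stdlib-only sketch of the generic matching idea uses reflection, so new attributes are picked up automatically. This filters already-loaded entities in Java rather than pushing the search into Postgres, and the entity classes here are assumptions, but it shows the generic attribute scan:

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class GenericSearch {

    // Hypothetical entities with a few of their attributes.
    public static class Building { public String street = "Main"; public int number = 12; }
    public static class Human { public String name = "Ana"; public int shoeSize = 12; }
    public static class Planet { public String name = "Mars"; public long radiusKm = 3390; }

    // Returns every entity where some attribute, rendered as a string,
    // contains the search term. New fields are found automatically.
    public static List<Object> search(List<?> entities, String term) {
        List<Object> hits = new ArrayList<>();
        for (Object entity : entities) {
            for (Field f : entity.getClass().getDeclaredFields()) {
                f.setAccessible(true);
                try {
                    Object value = f.get(entity);
                    if (value != null && value.toString().contains(term)) {
                        hits.add(entity);
                        break; // one matching attribute is enough
                    }
                } catch (IllegalAccessException ignored) {
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Object> all = List.of(new Building(), new Human(), new Planet());
        // "12" matches the building's street number and the human's shoe size.
        System.out.println(search(all, "12").size()); // prints 2
    }
}
```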
Let's say I have a table with two columns:
city
name (of a person).
I also have a Java "city" object which contains:
city name
a list of all the people in that city
So now I have two options to get the data:
1. First use DISTINCT to get a list of all the cities. Then, for each city, query the database again, using WHERE to get only records where the person lives in that city. Then I can store this in a City object.
2. Get a list of all the data, using ORDER BY to order by the city name. Then loop through all the records and start storing them in City objects. When I detect that the city name changes, I can create a new City object and store the records in that.
Which of these methods is faster / better practice? Or is there some better way of getting this information than these two methods? I am using Oracle database.
A database query is a relatively expensive operation - you need to communicate with another server over the network, it then may need to access its disk, compute a result, return it to you, etc. You'd want to minimize these as much as possible. Having a single query and going over its results is by far a better idea than having multiple queries, unless you have some killer reason not to do so - which doesn't seem to be the case here, at least not from the information you shared.
Short answer is #2. You want to make as few queries to the database as possible. With #2, if I got it correctly, you will do a join of city/people and then create the objects.
Better way: use JPA/Hibernate, e.g. see http://www.baeldung.com/hibernate-one-to-many
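For reference, a minimal JPA mapping for that one-to-many relationship might look like this (entity and field names are assumptions, not taken from the question's schema):

```java
import javax.persistence.*;
import java.util.List;

@Entity
public class City {
    @Id @GeneratedValue
    private Long id;

    private String name;

    // One query (or a join fetch) loads the city together with its people.
    @OneToMany(mappedBy = "city")
    private List<Person> people;
}

@Entity
class Person {
    @Id @GeneratedValue
    private Long id;

    private String name;

    @ManyToOne
    private City city;
}
```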
Answer #2 is optimal in all cases.
You'll need to code the logic in Java to differentiate when you change from one city to the next one.
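A minimal sketch of that change-detection loop, assuming the rows arrive already ordered by city and each row is {city, personName}:

```java
import java.util.ArrayList;
import java.util.List;

public class CityLoader {

    public static class City {
        public final String name;
        public final List<String> people = new ArrayList<>();
        public City(String name) { this.name = name; }
    }

    // rows must already be ordered by city (ORDER BY city in SQL).
    public static List<City> load(List<String[]> rows) {
        List<City> cities = new ArrayList<>();
        City current = null;
        for (String[] row : rows) {
            // City name changed (or first row): start a new City object.
            if (current == null || !current.name.equals(row[0])) {
                current = new City(row[0]);
                cities.add(current);
            }
            current.people.add(row[1]);
        }
        return cities;
    }

    public static void main(String[] args) {
        List<City> cities = load(List.of(
            new String[]{"Lima", "Ana"},
            new String[]{"Lima", "Bob"},
            new String[]{"Oslo", "Eve"}));
        System.out.println(cities.size()); // prints 2
    }
}
```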
Alternatively, if you were using MyBatis the solution becomes very simple by using "collections". These perform a single database call and retrieve the whole Java tree you specify, including all sublists in multiple levels. Very performant and also easy to code.
I have an Employee object, for example; Employee contains several fields (300+) like name, department, salary, age, account, etc.
The entire Employee table is cached in a Java List object, which contains 2+ million records.
Requirement
The user can search on any field present in the Employee object, like employee name like "Sehwag" and age > 30, or salary > 100000. Based on the user's search we have to show a filtered list of Employees.
Due to performance issues we are not querying the DB; we want to apply the user's search criteria to the cached Java List mentioned earlier.
Is there an API / framework / any other solution with which we can query Java objects?
The approach below is what I am trying, but it doesn't feel like a good one:
iterating over the Employee list and applying the user's search criteria to each Employee object. Knowing which of the 300 fields the user selected as search criteria is challenging; I have written a lot of enum-mapping logic and some additional logic for every field to make it work.
It may work for the current requirement, but I'm thinking of using an API or framework or a better way to solve it!
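For what it's worth, plain java.util.function predicates with streams can already replace the hand-written enum mapping: each searchable field maps to one small Predicate, and the user's criteria are combined with and/or. A minimal sketch (the Employee fields here are a tiny subset of the real 300):

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class EmployeeFilter {

    public static class Employee {
        public final String name;
        public final int age;
        public final long salary;
        public Employee(String name, int age, long salary) {
            this.name = name; this.age = age; this.salary = salary;
        }
    }

    // Apply user-selected criteria, combined as composable predicates.
    public static List<Employee> filter(List<Employee> cache,
                                        Predicate<Employee> criteria) {
        return cache.stream().filter(criteria).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Employee> cache = List.of(
            new Employee("Sehwag", 35, 200_000),
            new Employee("Rahul", 25, 90_000));

        // (name like "Sehwag" AND age > 30) OR salary > 100000
        Predicate<Employee> criteria = ((Predicate<Employee>)
            e -> e.name.contains("Sehwag")).and(e -> e.age > 30)
            .or(e -> e.salary > 100_000);

        System.out.println(filter(cache, criteria).size()); // prints 1
    }
}
```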
thanks in advance for your help.
First of all: if you either don't want to or don't have the chance to change the existing solution, take a look at Querydsl.
There is a querydsl-collections module which fits your need exactly. See more at http://www.querydsl.com/ and http://www.querydsl.com/static/querydsl/latest/reference/html/ch02s08.html
However, if you have a chance to review/rebuild the solution, you should consider something more appropriate for large-volume querying. I suggest exploring NoSQL databases (MongoDB) or indexing tools such as Lucene, or Elasticsearch, which adds a RESTful layer on top of Lucene.
I hope it helps.
I tried CQEngine's SQL-based queries (https://github.com/npgall/cqengine); it suits my requirement.
Below are some useful links:
https://dzone.com/articles/getting-started-cqengine-linq
https://mvnrepository.com/artifact/com.googlecode.cqengine/cqengine
I'm working with a pretty niche requirement to model a relational structure within Solr and thought that a custom polyfield would be the most suitable solution to my problem. In short, each record in the index will have a number of embargo and expiry dates for when the content should be considered 'available'. These dates are grouped with another kind of categorisation (let's say by device), so for example, any given item in the index may be available for mobile users between two dates, but only available for desktop users between another two dates.
Much like the currency and the latlon types, I would index the values as a comma separated list representing each availability window, for example:
mobile,2013-09-23T00:00:00Z,2013-09-30T00:00:00Z
So, a single index record could look like
{
id: "1234",
text: ["foobarbaz"],
availability: [
"mobile,2013-09-23T00:00:00Z,2013-09-30T00:00:00Z",
"pc,2013-09-22T00:00:00Z,2013-09-30T00:00:00Z"
]
}
The custom type would do the job of parsing the incoming value and storing it accordingly. Is this a viable solution? How would I approach the custom logic required at query time to filter by device and then make sure that NOW is within the provided dates?
My attempt so far has been based on the Currency field type, but now I've dialled it back to just storing the string in its un-parsed state. If I could prove that the filtering I want is even possible before using the polyfield features, then I'll know if it's worth continuing.
Does anybody else have any experience writing custom (poly)fields, or doing anything similar to what I'm doing?
Thanks!
If you want to be able to filter and search on these ranges, I don't think you'll have much luck storing records like that. It would make more sense to me to have a more structured document, something like:
id: "1234",
text: ["foobarbaz"],
mobileavailabilitystart: "2013-09-23T00:00:00Z",
mobileavailabilityend: "2013-09-30T00:00:00Z",
pcavailabilitystart: "2013-09-22T00:00:00Z",
pcavailabilityend: "2013-09-30T00:00:00Z"
Indexing the full contents of a csv line in Lucene/Solr, in a single field, would allow you to perform full-text searches on it, but would not be a good way to support querying for a specific element of it.
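With a structure like that, the query-time filtering becomes an ordinary date-range filter query; an untested sketch, using the field names above, might be:

```
fq=mobileavailabilitystart:[* TO NOW] AND mobileavailabilityend:[NOW TO *]
```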
Our website needs to give out data to the world. This is open-source data that we have stored, and we want to make it publicly available. It's about 2 million records.
We've implemented the search of these records using Lucene, which is fine, however we'd like to show an individual record (say the user clicks on it after the search is done) and provide more detailed information for that record.
This more detailed information, however, isn't stored in the index directly; there are many-to-many relationships, and we use our relational database (MySQL) to provide this information.
So, for example, a single record belongs to a category; we want the user to be able to click on that category and see the rest of the records within it (there are lots more associations like this).
My question is, should we use Lucene also to store this sort of information and retrieve it through simple search (category:apples), or should MySQL continue doing this logical job? Should I use Lucene only for the search part?
EDIT
I would like to point out that all of our records are pretty static.... changes are made to this data once every week or so.
Lucene's strength lies in rapidly building an index of a set of documents and allowing you to search over them. If this "detailed information" does not need to be indexed or searched over, then don't store it in Lucene.
Lucene is not a database, it's an index.
You want to use Lucene to store data? I think it's OK. I've used Solr (http://lucene.apache.org/solr/), which is built on top of Lucene, to work as a search engine and to store more data related to the record for front-end display. It worked with 500k records for me, and with 2 million records I think it should be fine.
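If you go that route, fields that exist only for display can be stored but not indexed, so they are retrievable with the result without bloating the search index; a schema.xml sketch (the field name is an assumption):

```xml
<!-- returned with search results, but not itself searchable -->
<field name="detail_html" type="string" indexed="false" stored="true"/>
```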