Architect Predictive Search on 30-50K objects? [closed] - java

I have to build a search feature where the GUI provides a search field for looking up objects in an Oracle database. There are currently 30K objects to search, and they will grow over time by roughly 300-400 per month.
As part of the requirement, when the user types any text into the search field, for example "ABC", all objects in the DB that contain "ABC" should appear in a datatable, as if the system were predicting results based on what the user has typed.
The question is: how do you architect such a feature?
The simple way is to load everything into a JavaScript object in the GUI and search it there; since JS is fast, performance presumably won't be an issue.
Another way is to run a query against the database every time the user types text into the search field. This does not seem ideal, as it puts unnecessary load on the database.
Is there a better way to architect this feature? Please share your thoughts.

Premature optimization is seldom useful.
Growth of 300-400 objects per month on a 30K base is nothing at all for any DB to handle.
Loading all 30K objects at once into the browser is awful and may hurt performance, whereas querying the DB will not have this problem until you have lots and lots of users hitting it.
You should build the service against the database, and then, if and when you reach a bottleneck, think about optimization tricks such as caching frequent queries.
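As a rough sketch of the query-per-keystroke approach (the OBJECTS table and NAME column are hypothetical names for illustration), a parameterized JDBC lookup against Oracle might look like this:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

public class PredictiveSearchDao {

    private final String jdbcUrl;   // e.g. "jdbc:oracle:thin:@//host:1521/service"
    private final String user;
    private final String password;

    public PredictiveSearchDao(String jdbcUrl, String user, String password) {
        this.jdbcUrl = jdbcUrl;
        this.user = user;
        this.password = password;
    }

    /** Returns at most {@code limit} object names containing the typed text. */
    public List<String> suggest(String typedText, int limit) throws Exception {
        // Bind variables prevent SQL injection and let Oracle reuse the plan.
        String sql = "SELECT name FROM objects "
                   + "WHERE UPPER(name) LIKE '%' || UPPER(?) || '%' "
                   + "FETCH FIRST ? ROWS ONLY";   // Oracle 12c+ row limiting
        List<String> results = new ArrayList<>();
        try (Connection con = DriverManager.getConnection(jdbcUrl, user, password);
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, typedText);
            ps.setInt(2, limit);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    results.add(rs.getString("name"));
                }
            }
        }
        return results;
    }
}
```

On the client side, debounce keystrokes (say, at most one request per 250 ms pause) so the DB sees one query per pause rather than one per character. Note that a leading-wildcard LIKE cannot use a plain B-tree index, so if substring matching becomes slow at larger scale, Oracle Text is the usual next step.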

Related

Which collection to use to store and search a million records in Java? [closed]

I would like to know the right collection to use in Java to store and search strings/data among millions of entries.
Assume you want it for Strings.
Assume you want it for objects, searching on multiple values.
Please note that I am looking for the best performance in getting search results quickly, and I want to know which Java collection can do this. The input is an arbitrary string and is not necessarily sorted.
If you want to search a data structure that large, none of the available collections in Java will really be sufficient, because you would need to hold all the data in memory, which would require a really powerful machine.
However, there are existing solutions to your problem; this is called full-text search.
Take a look at Apache Lucene or Elasticsearch (which uses Apache Lucene under the hood).
For a simpler solution, any relational database should also do the trick.
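To make the Lucene suggestion concrete, here is a minimal indexing-and-querying sketch (written against the Lucene 8/9 API; the "name" field and sample documents are made up for illustration):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class LuceneSketch {
    public static void main(String[] args) throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer();
        Directory index = new ByteBuffersDirectory(); // in-memory; use FSDirectory for millions of docs

        // Index a couple of sample documents.
        try (IndexWriter writer = new IndexWriter(index, new IndexWriterConfig(analyzer))) {
            for (String name : new String[] {"ABC deluxe widget", "plain widget"}) {
                Document doc = new Document();
                doc.add(new TextField("name", name, Field.Store.YES));
                writer.addDocument(doc);
            }
        }

        // Query the index: matches documents containing the term, case-insensitively.
        try (DirectoryReader reader = DirectoryReader.open(index)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            Query query = new QueryParser("name", analyzer).parse("abc");
            for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
                System.out.println(searcher.doc(hit.doc).get("name"));
            }
        }
    }
}
```

Unlike a HashMap, the inverted index built here lets you match individual words (and, with the right analyzers, prefixes or fuzzy terms) without scanning every entry.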

Best massive data persistent storage with TTL? [closed]

We are building a system that needs to put tons of data into persistent storage for a fixed amount of time: 30 to 60 days. Since the data is not critical (we can lose some, for example when a virtual machine goes down) and we don't want to pay the price of persisting it synchronously with every request (latency is critical for us), we were thinking about either buffering and batching the data or sending it asynchronously.
The data is append-only; we need to persist 2-3 items per request, and the system processes ~10K requests per second on multiple hosts scaled horizontally.
We are hesitating between Mongo (3.x?) and Cassandra, but we could go with any other solution. Does anyone here have experience or hints for solving this kind of problem? We are running some PoCs, but we might not find all the problems early enough, and a pivot could be costly.
I can't comment on MongoDB, but I can speak to Cassandra. Cassandra does indeed have a TTL feature with which you can expire data after a certain time. You have to plan for it, though, because TTLs add some overhead during a process Cassandra runs called compaction; see http://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_write_path_c.html and http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_expire_c.html
As long as you size for that kind of workload, you should be OK. That said, Cassandra really excels with event-driven data: time series, product catalogs, click-stream data, etc.
If you aren't familiar with Patrick McFadin, meet your new best friend: https://www.youtube.com/watch?v=tg6eIht-00M
And of course, there are plenty of free tutorials and training here: https://academy.datastax.com/
EDIT: one more idea for expiring data 'safely' and with the least overhead, from a sharp guy by the name of Ryan Svihla: https://lostechies.com/ryansvihla/2014/10/20/domain-modeling-around-deletes-or-using-cassandra-as-a-queue-even-when-you-know-better/
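To make the TTL idea concrete, here is a minimal sketch using the DataStax Java driver 4.x. The keyspace, table (assume CREATE TABLE events (id uuid PRIMARY KEY, payload text)), contact point, and datacenter name are all assumptions for illustration:

```java
import java.net.InetSocketAddress;
import java.util.UUID;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;

public class TtlWriter {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("127.0.0.1", 9042))
                .withLocalDatacenter("datacenter1")   // assumed DC name
                .withKeyspace("events_ks")            // assumed keyspace
                .build()) {

            // 2,592,000 seconds = 30 days; Cassandra applies the TTL per write.
            PreparedStatement insert = session.prepare(
                "INSERT INTO events (id, payload) VALUES (?, ?) USING TTL 2592000");

            // executeAsync keeps the request path non-blocking, matching the
            // "don't pay persistence latency on every request" requirement.
            // join() here only so the demo waits before closing the session;
            // in a real hot path you'd register a callback instead.
            session.executeAsync(insert.bind(UUID.randomUUID(), "payload-bytes"))
                   .toCompletableFuture()
                   .join();
        }
    }
}
```

Because the TTL is set per write, you can also vary it per request if different records need 30- versus 60-day retention.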

Which one is faster for searching an item: hitting the DB or iterating a list of values fetched from the DB? [closed]

In my web application I have an employee table with employee id, name, designation, salary, and so on as attributes, and it may contain thousands of records. I want to search by employee name. Which will be faster: hitting the DB on every search, or building a list of employee names once in a Java bean and iterating over it for each search? Which is better?
Even if you have millions of records, it is by far better to hit the database per request. To speed this up, add an index on the name column of your employee table and the queries will be faster.
If the data in your employee table doesn't vary much, you have another option: a cache in front of the table. Access will be even faster, since lookups hit the cache (usually RAM), but this comes at the cost of a more complex design, with policies for cache retrieval and a period for refreshing the cached data.
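As a sketch of that caching option (the class name, the Supplier-based loader, and the refresh interval are illustrative choices, not a prescribed design), a periodically refreshed read-only snapshot could look like this:

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Periodically refreshed read-only cache of employee names. */
public class EmployeeNameCache {

    private final AtomicReference<List<String>> snapshot = new AtomicReference<>(List.of());
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    /** loader is whatever DAO call fetches the names from the DB (an assumption here). */
    public EmployeeNameCache(Supplier<List<String>> loader, long refreshSeconds) {
        // Reload the full snapshot on a fixed schedule; readers never block writers.
        scheduler.scheduleAtFixedRate(
            () -> snapshot.set(List.copyOf(loader.get())),
            0, refreshSeconds, TimeUnit.SECONDS);
    }

    public boolean contains(String name) {
        return snapshot.get().stream().anyMatch(n -> n.equalsIgnoreCase(name));
    }
}
```

The trade-off is the staleness window: a name added to the DB is invisible until the next refresh, which is exactly the "period for refreshing the cached data" cost mentioned above.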
It depends on a few things.
Hitting the DB is an I/O operation, so if a specific screen or process does many lookups within one flow, it is better to load the list from the DB once and reuse it, provided you can be sure the employee list won't be changed in the DB by another process, or that such changes are not critical for you.
If the screen/process makes only a few lookups, it should hit the DB directly.
Remember that hitting the DB many times also puts load on it and can slow it down; it can't handle an infinite number of requests.
Hope that helps.

What exactly is a CRUD table [closed]

What is a CRUD table and what is it used for? Is this just another name for a hash table?
I know that CRUD stands for create, retrieve, update, and delete. Are these not just the functions of a regular DB table?
Could someone give an example, maybe in Java?
There is no CRUD table. CRUD refers to operations on a table: create, retrieve, update, and delete. Those operations can be executed on any table. They are bundled together as they are the most basic operations.
A large number of applications that people write are 'crud'. They are simply creating entries in a database, reading them, updating them, and deleting them. Managing users, bug tracking, retail stock inventories... all mostly CRUD with various business logic wrapped around it from time to time.
There isn't such a thing as a CRUD table. It's just the most common type of application you will find out there, and a good bet for what most programmers find themselves writing time and time again.
That the name 'crud' is synonymous with 'dirt, filth, waste, or something of poor quality' reflects part of the disdain many people have for writing such applications. In the same vein, some people jokingly expand the acronym as "Create, Retrieve, Alter, Purge".
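Since the question asks for a Java example: here is a minimal JDBC sketch of the four CRUD operations against a hypothetical users(id, name) table (the table, column, and class names are made up for illustration):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class UserCrud {

    private final Connection con;

    public UserCrud(String jdbcUrl) throws Exception {
        con = DriverManager.getConnection(jdbcUrl);
    }

    // C: create a row
    public void create(int id, String name) throws Exception {
        try (PreparedStatement ps =
                 con.prepareStatement("INSERT INTO users (id, name) VALUES (?, ?)")) {
            ps.setInt(1, id);
            ps.setString(2, name);
            ps.executeUpdate();
        }
    }

    // R: retrieve a row
    public String read(int id) throws Exception {
        try (PreparedStatement ps =
                 con.prepareStatement("SELECT name FROM users WHERE id = ?")) {
            ps.setInt(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("name") : null;
            }
        }
    }

    // U: update a row
    public void update(int id, String newName) throws Exception {
        try (PreparedStatement ps =
                 con.prepareStatement("UPDATE users SET name = ? WHERE id = ?")) {
            ps.setString(1, newName);
            ps.setInt(2, id);
            ps.executeUpdate();
        }
    }

    // D: delete a row
    public void delete(int id) throws Exception {
        try (PreparedStatement ps =
                 con.prepareStatement("DELETE FROM users WHERE id = ?")) {
            ps.setInt(1, id);
            ps.executeUpdate();
        }
    }
}
```

Frameworks like JPA/Hibernate or Spring Data generate essentially this plumbing for you, which is part of why so many applications boil down to CRUD plus business logic.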

Matching multiple items [closed]

I am running an e-commerce startup. I am currently facing the problem of matching multiple items as they come in from the seller portal. Ideally, when a user searches for an item I would display it once, not multiple times (once per seller, each with a different details page/picture).
Similar to this
As part of this effort, I was thinking of using NLP to compare the description pages of multiple items and group them together. Can someone point me to algorithms or libraries that can match the description content of items from different sellers?
We are using Java, so any APIs you can suggest will be helpful!
As someone researching in this area, I can only suggest you read Data Matching by Peter Christen. It will guide you through the matching process and covers techniques that are useful in general.
However, the whole problem is harder than it looks, because the generic approaches (an inverted index plus some similarity metric) don't perform very well. You will get a lot of false-positive matches in your dataset, which usually turns into a manual review of match candidates. As this is very costly, a classifier is usually used to distinguish good matches from poor or wrong ones. But as a startup you won't have much training data for such a supervised classifier.
Here are a few quick tips that I have found very useful in my daily work:
Expect a lot of garbage (accessories are the largest product segment in e-commerce and will pollute your matches)
Normalize your data well, at best according to your database standards
Use word-level models instead of character-level ones, e.g. a bigram model (see the sketch after this list)
Use the product name and pricing information; all other attributes are usually huge garbage cans
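As a sketch of the word-level bigram tip (the sample product names and the suggested threshold are made-up illustrations), Jaccard similarity over word bigrams is a simple, dependency-free starting point:

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Jaccard similarity over word-level bigrams, per the tips above. */
public class BigramMatcher {

    static Set<String> wordBigrams(String text) {
        // Crude normalization: lowercase and split on non-word characters.
        String[] words = text.toLowerCase(Locale.ROOT).split("\\W+");
        Set<String> bigrams = new HashSet<>();
        for (int i = 0; i + 1 < words.length; i++) {
            bigrams.add(words[i] + " " + words[i + 1]);
        }
        return bigrams;
    }

    static double jaccard(String a, String b) {
        Set<String> ba = wordBigrams(a), bb = wordBigrams(b);
        if (ba.isEmpty() && bb.isEmpty()) return 0.0;
        Set<String> inter = new HashSet<>(ba);
        inter.retainAll(bb);                     // intersection of bigram sets
        Set<String> union = new HashSet<>(ba);
        union.addAll(bb);                        // union of bigram sets
        return (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        String s1 = "Apple iPhone 6 16GB Space Gray";
        String s2 = "iPhone 6 16GB space gray by Apple";
        System.out.printf("similarity = %.2f%n", jaccard(s1, s2));
    }
}
```

Pairs scoring above a tuned threshold (say 0.4, to be validated on your own data) become match candidates for review or auto-merging; combining this score with a price-difference check implements the last tip above.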
