Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I am running an e-commerce startup. I am currently facing the problem of matching items as sellers enter them through the seller portal. Ideally, when a user searches for an item, I would display it once rather than showing the same item multiple times (each from a different seller with a different details page/picture).
As part of this effort, I was thinking of using NLP to compare the description pages of multiple items and group them together. Can someone point me to algorithms or libraries that can match the description content of items from different sellers?
We are using Java, so any APIs that you can suggest will be helpful!
As someone researching in this area, I suggest reading Data Matching by Peter Christen. It guides you through the matching process and covers techniques that can be used in general.
However, the whole problem is a bit more difficult because the generic approaches (an inverted index plus some similarity metric) don't perform very well. You will get a lot of false-positive matches in your dataset, which usually turns into a manual review of match candidates. As this is very costly, a classifier is usually used to distinguish between a good match and a not-so-good or wrong match. But as a startup you won't have much training data to train such a supervised classifier.
Here are a few quick tips that I have found very useful in my daily work:
Expect a lot of garbage (the accessory business is the largest in e-commerce in terms of product count and will pollute your matches)
Normalize your data well, ideally according to your database standards
Use word-level models instead of character-level ones (e.g. a Bigram model)
Use the product name and pricing information; all other attributes are usually huge garbage cans
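To make the tips above a bit more concrete, here is a minimal sketch of name normalization, a word-level similarity score, and a price sanity check. It is only an illustration under my own assumptions: the class `ProductMatcher`, the Jaccard measure over single-word tokens, the 0.6 similarity threshold and the 25% price tolerance are all hypothetical choices, not something taken from the book or from a particular library.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: compares two listings by normalized product name and price.
final class ProductMatcher {

    // Normalize: lowercase, turn every run of non-alphanumeric characters into a space,
    // then split into a set of word-level tokens.
    static Set<String> tokens(String name) {
        String normalized = name.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", " ")
                .trim();
        if (normalized.isEmpty()) {
            return new HashSet<>();
        }
        return new HashSet<>(Arrays.asList(normalized.split(" ")));
    }

    // Jaccard similarity: size of the intersection divided by size of the union.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    // Prices count as "close" when they differ by at most the given fraction of the larger one.
    static boolean pricesClose(double p1, double p2, double tolerance) {
        double max = Math.max(p1, p2);
        return max == 0.0 || Math.abs(p1 - p2) / max <= tolerance;
    }

    // Assumed thresholds (0.6 and 25%) are arbitrary and would need tuning on real data.
    static boolean isCandidateMatch(String nameA, double priceA, String nameB, double priceB) {
        return jaccard(tokens(nameA), tokens(nameB)) >= 0.6
                && pricesClose(priceA, priceB, 0.25);
    }

    public static void main(String[] args) {
        System.out.println(isCandidateMatch(
                "Apple iPhone 6 16GB Silver", 499.0,
                "apple iphone-6 16gb (silver)", 520.0));
        System.out.println(isCandidateMatch(
                "Apple iPhone 6 16GB Silver", 499.0,
                "Apple iPhone 6 Silicone Case", 15.0));
    }
}
```

The price check is what keeps accessories (cases, chargers) from matching the device they belong to even when the names overlap heavily. Pairs that pass would still be candidates for review, not confirmed matches.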
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I would like to know which Java collection is the right one for storing and searching strings/data among millions of entries.
Assume you want it for String.
Assume you want it for objects, and search for multiple values.
Please note that I am looking for the best performance, so that search results come back quickly. I am looking for a Java collection that can do this and want to know which one is the right choice. The input is an arbitrary string and not necessarily sorted.
Actually, if you want to search such a large data structure, none of the available collections in Java would be sufficient, as you would need to store all the data in memory, which would require a really powerful computer.
However, there are existing solutions to your problem, known as full-text search.
Take a look at Apache Lucene or Elasticsearch (which uses Apache Lucene under the hood).
For a simpler solution you could also use any relational database, which should also do the trick.
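To give a rough idea of what a full-text engine does internally, here is a toy inverted index that maps each word to the documents containing it. This is not Lucene's API and not how Lucene stores data; `TinyInvertedIndex` is a hypothetical, in-memory class that only illustrates the lookup idea.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index: word -> ids of the documents that contain the word.
final class TinyInvertedIndex {
    private final List<String> documents = new ArrayList<>();
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Stores the text and registers every word in it; returns the new document id.
    int add(String text) {
        int id = documents.size();
        documents.add(text);
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Returns all documents containing the word, in insertion order.
    List<String> search(String word) {
        Set<Integer> ids = postings.getOrDefault(
                word.toLowerCase(Locale.ROOT), Collections.<Integer>emptySet());
        List<String> result = new ArrayList<>();
        for (int id : ids) {
            result.add(documents.get(id));
        }
        return result;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add("red apple");
        index.add("green apple pie");
        index.add("banana");
        System.out.println(index.search("Apple"));
    }
}
```

A lookup costs one hash access instead of a scan over every stored string. A real engine adds on-disk storage, ranking, and text analysis on top of this, which is why the libraries above are the practical choice.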
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
I'm building my first mobile app and I have been studying Java for about 2 weeks now, so I'm pretty new.
The app that I'm trying to make is a grocery list app.
Basically, I want to sort my grocery list by department. For example:
Fruit items will go in a fruitArray
the meat items will go in meatArray
etc.
Should I create a separate file for each department (fruit, meat, bakery), or can I store all my arrays in the same file and access them separately?
I did some searching but I cannot find a topic that answers my question.
I would like to know what the best way to do it would be and, if the answer is to put the arrays in the same file, how I can do that.
Thanks
When you're doing or learning something new, do whatever works for you. Consider trying both, as it's experience you'll gain for the future.
Some general things to take note of, though:
Storing all items in one file could potentially create one massive file (regardless of the project you're working on; I'm generalizing now)
Storing items in different files increases the overhead of having to handle and manage different files.
These are things you will have to consider and things you'll learn from experience, based on the project and its various aspects.
For the sake of this question, I think storing different items in different files would potentially be a better, simpler solution for now, but it really depends on the entire project and how you plan to implement it. Hope this at least somewhat answers your question. Happy coding and happy learning!
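Since the question also asks how the "same file" variant could look, here is one possible sketch of it. It keeps all departments in a single class, using a map from department name to a list of items instead of one array per department. The class name `GroceryList` and its methods are made up for illustration, and it uses lists instead of arrays because lists can grow.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical single-file variant: every department's items live in one map.
final class GroceryList {
    // LinkedHashMap keeps departments in the order they were first added.
    private final Map<String, List<String>> itemsByDepartment = new LinkedHashMap<>();

    // Adds an item, creating the department's list on first use.
    void add(String department, String item) {
        List<String> items = itemsByDepartment.get(department);
        if (items == null) {
            items = new ArrayList<>();
            itemsByDepartment.put(department, items);
        }
        items.add(item);
    }

    // Returns a read-only view of one department's items (empty if unknown).
    List<String> itemsIn(String department) {
        List<String> items = itemsByDepartment.get(department);
        return items == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(items);
    }

    Set<String> departments() {
        return itemsByDepartment.keySet();
    }

    public static void main(String[] args) {
        GroceryList list = new GroceryList();
        list.add("Fruit", "apple");
        list.add("Meat", "chicken");
        list.add("Fruit", "banana");
        for (String department : list.departments()) {
            System.out.println(department + ": " + list.itemsIn(department));
        }
    }
}
```

Adding a new department then needs no new file or field, which is the main trade-off against the one-file-per-department approach.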
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I have to build search functionality where the GUI provides a search field for searching objects in an Oracle database. There are currently 30K objects to search, but the number will grow over time by approximately 300-400 per month.
As part of the requirement, when the user types any text in the search field, for example "ABC", all objects in the DB that contain ABC should appear in a data table, as if the system were predicting results based on what the user has typed in the search field.
The question is how to architect such a feature.
A simple way is to load everything into a JavaScript object in the GUI and run the search on it. Since JS is ridiculously fast, performance won't be an issue.
Another way is to run a query against the database every time the user types text in the search field. This does not seem convenient, as it will put unnecessary load on the database.
Is there a better way to architect this feature? Please share your thoughts.
Premature optimization is seldom useful.
Growth of 300-400 objects per month on a base of 30k objects is nothing at all for any DB to handle.
Loading all 30k objects at once in the browser is awful and may hurt performance, while querying the DB will not have this problem until you have lots and lots of users accessing it.
You should build the service on the database, and then if/when you reach a bottleneck you can think about optimization tricks such as caching frequent queries on the database.
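If that bottleneck ever shows up, caching frequent queries could look roughly like the sketch below. It is only an illustration: `QueryCache` is a hypothetical class, the database is represented by a plain `Function` instead of a real Oracle query, and placing the cache in the application layer with least-recently-used eviction is my assumption, not the only option.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical LRU cache in front of a database lookup.
final class QueryCache {
    private final Function<String, List<String>> database; // stand-in for the real query
    private final Map<String, List<String>> cache;
    private int databaseCalls = 0;

    QueryCache(final int maxEntries, Function<String, List<String>> database) {
        this.database = database;
        // accessOrder = true makes iteration order least-recently-used first,
        // so removeEldestEntry evicts the least recently used search term.
        this.cache = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns cached rows when available, otherwise queries the database once and stores the result.
    synchronized List<String> search(String term) {
        List<String> cached = cache.get(term);
        if (cached != null) {
            return cached;
        }
        databaseCalls++;
        List<String> rows = database.apply(term);
        cache.put(term, rows);
        return rows;
    }

    synchronized int databaseCalls() {
        return databaseCalls;
    }
}
```

A real version would also need to expire or invalidate entries when the underlying objects change, which this sketch leaves out.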
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
In Data Structures and Algorithms in Java, the advantages of ordered arrays are stated. I wish I had a real example for one of them. This is not homework, just self-clarification. What are some real cases where insertion/deletion is infrequent but searches are frequent? Anything would help, even a pointer to some GitHub repository. Thank you.
An example would be a dictionary. After it is built, it can be looked up millions of times. As in your paper dictionary, the words in it had better be sorted.
While I like leeyuiwah's answer, a more common domain in a commercial context is a database of some entity, for example customers or employees, for which you normally create a view. That's why we index them (to make retrieval faster). Indeed, after some records are inserted, most operations will be retrievals, which include a search (based on complicated conditions or a simple identifier).
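The dictionary case can be sketched in a few lines: sort once when the dictionary is built, then answer every lookup with a binary search. The class `SortedDictionary` is a made-up name for illustration; `Arrays.sort` and `Arrays.binarySearch` are the standard JDK methods.

```java
import java.util.Arrays;

// Hypothetical read-mostly dictionary backed by an ordered array.
final class SortedDictionary {
    private final String[] words;

    // Pay the sorting cost once, at build time.
    SortedDictionary(String[] unsortedWords) {
        words = unsortedWords.clone();
        Arrays.sort(words);
    }

    // Each lookup is a binary search: O(log n) comparisons on the sorted array.
    boolean contains(String word) {
        return Arrays.binarySearch(words, word) >= 0;
    }

    public static void main(String[] args) {
        SortedDictionary dictionary =
                new SortedDictionary(new String[] {"pear", "apple", "zebra", "mango"});
        System.out.println(dictionary.contains("mango"));
        System.out.println(dictionary.contains("kiwi"));
    }
}
```

Inserting a new word would require shifting elements to keep the array ordered, which is why this layout only pays off when searches far outnumber insertions and deletions.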
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I made a website with many pages; each page contains a sample essay. The homepage has a search field. I'm attempting to design a system where a user can type in a word and, when they click 'search', multiple paragraphs containing the searched word from the sample essay pages are loaded onto the page.
I'm 14 and have been programming for about 2 years, can anyone please explain to me the programming languages/technologies I'll need to accomplish this task and provide suggestions as to how I can achieve my task.
All I have so far are the web pages with articles and a custom search page I've made with PHP. Any suggestions?
The programming language does not really matter for the way the problem is solved. You can implement it in whichever language you are comfortable with.
There are two basic ways to solve the problem:
1) Use a crawler that creates an index of the words found on the different pages, then use that index to look up the searched word
or
2) When the user has entered the search expression, start crawling the pages and check whether the search expression is found
Of course, both solutions have different (dis)advantages.
For example:
In 1) you need to do an initial crawl (and update it later on when the pages change)
In 1) you need to store the crawl result in some sort of database
In 1) you will receive instant search results
In 2) you don't need a database/datastore
In 2) you will have to wait until all pages are searched before showing the final result list
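The difference between 1) and 2) can be sketched as follows. This is a simplified illustration, written in Java although the same idea carries over to PHP: the pages are assumed to be already available as plain text in a map (page name to text), paragraphs are assumed to be separated by blank lines, and the class `EssaySearch` with its methods is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical comparison of the two approaches over in-memory page texts.
final class EssaySearch {

    // Approach 2): scan every paragraph of every page at query time.
    static List<String> scan(Map<String, String> pages, String word) {
        String needle = word.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String text : pages.values()) {
            for (String paragraph : text.split("\\n\\s*\\n")) {
                if (words(paragraph).contains(needle)) {
                    hits.add(paragraph.trim());
                }
            }
        }
        return hits;
    }

    // Approach 1): crawl once and remember, for each word, the paragraphs containing it.
    static Map<String, List<String>> buildIndex(Map<String, String> pages) {
        Map<String, List<String>> index = new HashMap<>();
        for (String text : pages.values()) {
            for (String paragraph : text.split("\\n\\s*\\n")) {
                for (String word : words(paragraph)) {
                    index.computeIfAbsent(word, k -> new ArrayList<>()).add(paragraph.trim());
                }
            }
        }
        return index;
    }

    // Distinct lowercase words of a paragraph.
    static Set<String> words(String paragraph) {
        Set<String> result = new LinkedHashSet<>();
        for (String word : paragraph.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                result.add(word);
            }
        }
        return result;
    }
}
```

With the index, a search is a single `index.get(word)` call, but the index has to be rebuilt or updated whenever an essay changes. With `scan`, nothing has to be stored, but every search pays for reading all pages again.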