Creating a webpage crawler that finds and matches user input [closed] - java

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
I made a website with many pages; each page contains a sample essay. The homepage has a search field. I'm attempting to design a system where a user can type in a word and, when they click 'search', the paragraphs containing that word from the sample-essay pages are loaded onto the page.
I'm 14 and have been programming for about 2 years. Can anyone please explain which programming languages/technologies I'll need to accomplish this task and suggest how I can achieve it?
All I have so far are the web pages with articles and a custom search page I've made with PHP. Any suggestions?

The programming language does not really matter for solving this problem. You can implement it in whichever language you are comfortable with.
There are two basic ways to solve the problem:
1) Use a crawler that creates an index of the words found on the different pages, then use that index to look up the searched word
or
2) When the user has entered the search expression, start crawling the pages and check whether the search expression is found
Of course, both solutions have different (dis)advantages.
For example:
In 1) you need to do an initial crawl (and update it later on when the pages change)
In 1) you need to store the crawl result in some sort of database
In 1) you will receive instant search results
In 2) you don't need a database/datastore
In 2) you will have to wait until all pages are searched before showing the final result list
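The index-based approach can be sketched roughly as follows. This is only an illustration in Java (the page's tag), not a full crawler: it assumes the page text has already been fetched, that paragraphs are separated by blank lines, and that an in-memory map stands in for the database. The class name EssayIndex and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: indexes paragraphs by the words they contain.
final class EssayIndex {
    // word -> paragraphs containing it (insertion order preserved)
    private final Map<String, Set<String>> index = new HashMap<>();

    // "Crawl" step: split one page's text into paragraphs and index every word.
    // Assumption: paragraphs are separated by blank lines.
    void addPage(String pageText) {
        for (String paragraph : pageText.split("\\n\\s*\\n")) {
            String trimmed = paragraph.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Split on anything that is not a letter or digit.
            for (String word : trimmed.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
                if (!word.isEmpty()) {
                    index.computeIfAbsent(word, k -> new LinkedHashSet<>()).add(trimmed);
                }
            }
        }
    }

    // Lookup step: return every indexed paragraph that contains the word.
    List<String> search(String word) {
        Set<String> hits = index.get(word.trim().toLowerCase(Locale.ROOT));
        return hits == null ? new ArrayList<>() : new ArrayList<>(hits);
    }
}
```

Calling addPage once per essay page and then search("cats") returns the matching paragraphs without re-reading any page, which is where the instant results of approach 1) come from. The same structure carries over to PHP or any other language.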

Related

exact speech word recognition [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
I used the Android Speech-to-Text API (RecognizerIntent) to recognize the word said by the user. The problem is that it returns the word after autocorrection. I want it to return the exact word (without correction) said by the user. Please suggest another Android library that offers this, or explain how I can get this behavior from Android's built-in speech-to-text API.
I also looked at the Google API for this, but it is paid and also AI-based.
I want it to return the exact word (without correction) said by the user.
I think you misunderstand what speech recognition is capable of doing.
A speech recognition system is only capable of recognizing an uttered word as being one of a number of possible words. It doesn't ... and cannot ... tell you with 100% accuracy what the speaker actually said.
This applies to any speech recognition system, including a human listener. (How many times have you had to ask someone to "Say that again please" ?)
The only way to determine with absolute certainty the exact words that were spoken is to ask the person who spoke them to type them in! (And even then, they may not give you a 100% accurate answer, in some cases.)
In short, what you want is not possible. Software cannot do it. Humans cannot do it, even if they believe that they can [1]. You need to adjust your expectations.
[1] The Two Ronnies - Four Candles sketch
Identifying / recommending better (more accurate) speech recognition software or services is off-topic.

Should I store one ArrayList per file or should I store all my ArrayLists in the same file? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
I'm building my first mobile app, and I have been studying Java for about 2 weeks now, so I'm pretty new.
The app that I'm trying to make is a grocery list app.
Basically, I want to sort my grocery list by department. For example:
Fruit items will go in a fruitArray
Meat items will go in a meatArray
etc.
Should I create a single file for each department (fruit, meat, bakery), or can I store all my arrays in the same file and access them separately?
I did some searching but could not find a topic that answers my question.
I would like to know the best way to do it and, if that is to put the arrays in the same file, how I can do that.
Thanks ☺
When you're doing or learning something new, do whatever works for you. Consider trying both approaches, as you'll gain experience for the future.
Some general things to take note of, though:
Storing all items in one file could potentially create one massive file (regardless of the project you're working on; I'm generalizing here).
Storing items in different files increases the overhead of having to handle and manage multiple files.
These are things you will have to weigh and will learn from experience, depending on the project and its various aspects.
For the sake of this question, I think storing different items in different files would potentially be a better, simpler solution for now, but it really depends on the entire project and how you plan to implement it. Hope this at least somewhat answers your question. Happy coding and happy learning!

Architect Predictive Search on 30-50K objects? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
I have to build search functionality where the GUI provides a search field to search objects in an Oracle database. There are currently 30K objects to search, and they will grow in number over time, by approximately 300-400 per month.
As part of the requirement, when the user types any text in the search field, for example "ABC", all objects in the DB that contain ABC should appear in a datatable, as if the system were predicting results based on what the user has typed in the search field.
The question is how to architect such a feature.
A simple way is to load everything into a JavaScript object in the GUI and run the search on it. Since JS is ridiculously fast, performance won't be an issue.
Another way is to run a query against the database every time the user types text in the search field. This does not seem convenient, as it will put unnecessary load on the database.
Is there a better way to architect this feature? Please share your thoughts.
Premature optimization is seldom useful.
Growth of 300-400 objects per month on a base of 30k objects is nothing at all for any DB to handle.
Loading all 30k objects at once in the browser is a poor choice and may hurt performance, while querying the DB will not have this problem until you have a very large number of users accessing it.
You should build the service on top of the database, and then if/when you reach a bottleneck you can think about optimization tricks such as caching frequent queries.
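A rough sketch of the database-backed variant in Java with JDBC is shown below. The table name objects, the column name, the class name ObjectSearch, and the 20-row cap are all assumptions made for illustration and do not come from the question. The user's text is passed as a bind parameter, and LIKE wildcards in the input are escaped so that typing % or _ is treated literally.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical data-access helper for the search field.
final class ObjectSearch {
    // Assumed table/column names; ROWNUM caps the result size (Oracle syntax).
    static final String SQL =
        "SELECT name FROM objects "
      + "WHERE UPPER(name) LIKE ? ESCAPE '\\' AND ROWNUM <= 20";

    // Turn user input into a case-insensitive "contains" pattern,
    // escaping the LIKE wildcards % and _ (and the escape character itself).
    static String toLikePattern(String input) {
        String escaped = input.trim().toUpperCase(Locale.ROOT)
            .replace("\\", "\\\\")
            .replace("%", "\\%")
            .replace("_", "\\_");
        return "%" + escaped + "%";
    }

    // Runs the query with the pattern as a bind parameter (no string concatenation).
    static List<String> search(Connection conn, String input) throws SQLException {
        List<String> names = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setString(1, toLikePattern(input));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString(1));
                }
            }
        }
        return names;
    }
}
```

A pattern with a leading wildcard generally cannot use an ordinary index, but on a table of this size a full scan is likely to be cheap. Waiting until the user has typed a few characters, or pauses typing, before sending the request would reduce the number of queries further.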

XML Document Parsing in Java [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
I need to parse an XML document in Java for a web service I'm making, and save its contents.
I need to save the name of the tags, if the tag has attributes save the attributes, and then save the data within those tags. These three items will be inserted into a database table with the three columns tags, attributes, and data.
I'm using the following java libraries:
javax.xml.parsers.DocumentBuilder
javax.xml.parsers.DocumentBuilderFactory
org.w3c.dom.Document, org.w3c.dom.NodeList
org.xml.sax.InputSource.
Any help would be much appreciated.
DISCLAIMER: I don't want to plagiarize, so I didn't include code but included links to other tutorials that are VERY helpful for this topic.
First, you should read the W3C DOM Java API documentation, because it describes many useful functions that are closely related to your question.
Second, this website contains a useful tutorial that's easy to understand, and it contains the information you need to get the attributes of tags.
Third, this website gives you info on how to get the tag name when you are looping through elements.
Fourth, you should always read the related API documentation and search Google first, and only then post a question if you still have no clue after a LONG period of time.
Lastly, you should post a different question, or research databases FIRST, before asking about the database part here. This question should only be about XML document parsing in Java.
We are not supposed to do the work for you, so the API documentation (and Google) is the best help for you.
API: https://docs.oracle.com/javase/7/docs/api/org/w3c/dom/package-summary.html
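For illustration only, here is a minimal sketch that uses the classes listed in the question to collect, for every element, its tag name, its attributes, and the text directly inside it. The class name XmlRows, the {tag, attributes, data} row layout, and the name=value attribute formatting are assumptions, and the database insert is left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: one row per element, as {tag, attributes, data}.
final class XmlRows {
    static List<String[]> parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            List<String[]> rows = new ArrayList<>();
            // "*" matches every element, returned in document order.
            NodeList elements = doc.getElementsByTagName("*");
            for (int i = 0; i < elements.getLength(); i++) {
                Element el = (Element) elements.item(i);
                rows.add(new String[] { el.getTagName(), attributes(el), ownText(el) });
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Attributes rendered as space-separated name=value pairs (assumed format).
    static String attributes(Element el) {
        NamedNodeMap attrs = el.getAttributes();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(attr.getNodeName()).append('=').append(attr.getNodeValue());
        }
        return sb.toString();
    }

    // Text directly inside the element, excluding text of nested elements.
    static String ownText(Element el) {
        StringBuilder sb = new StringBuilder();
        for (Node c = el.getFirstChild(); c != null; c = c.getNextSibling()) {
            short type = c.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(c.getNodeValue());
            }
        }
        return sb.toString().trim();
    }
}
```

Each returned row could then be written to the three-column table from the question, for example with a JDBC PreparedStatement.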

Matching multiple items [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 9 years ago.
I am running an e-commerce startup. I am currently facing the problem of matching identical items that are entered by different sellers through the seller portal. Ideally, when a user searches for an item, I would display it once rather than multiple times (each from a different seller with a different details page/picture).
Similar to this
As part of this effort, I was thinking of using NLP to compare the item description pages and group matching items together. Can someone point me to algorithms or libraries that can match the description content of items from different sellers?
We are using Java, so any APIs that you can suggest will be helpful!
As someone researching in this area, I can only suggest that you read Data Matching by Peter Christen. It will guide you through the matching process and covers some techniques that can be used in general.
However, the whole problem is a bit more difficult, as the generic approaches (an inverted index plus some similarity metric) don't perform very well. You will have a lot of false-positive matches in your dataset, which usually leads to a manual review process for match candidates. As this is very costly, a classifier is usually used to distinguish between a good match and a not-so-good or wrong match. But as a startup you won't have much training data to train such a supervised classifier.
Here are a few quick tips that I have found very useful in my daily work:
Expect a lot of garbage (accessories are the largest segment in e-commerce in terms of number of products and will pollute your matches)
Normalize your data well, ideally according to your database standards
Use word-level models instead of character-level ones (e.g. a bigram model)
Use the product name and pricing information; all other attributes are usually full of garbage
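As a rough illustration of the normalization and word-level tips, the sketch below compares two product names by the overlap of their word bigrams. Reading "bigram model" as pairs of adjacent words, and scoring with Jaccard similarity, are assumptions made here rather than something the answer specifies. The class name NameMatcher is hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: compares product names by shared word bigrams.
final class NameMatcher {
    // Normalize: lowercase, turn punctuation/whitespace runs into single spaces.
    static String normalize(String name) {
        return name.toLowerCase(Locale.ROOT)
                   .replaceAll("[^\\p{L}\\p{Nd}]+", " ")
                   .trim();
    }

    // Word-level bigrams: every pair of adjacent words in the normalized name.
    // A single-word name is represented by that word alone.
    static Set<String> wordBigrams(String name) {
        Set<String> bigrams = new HashSet<>();
        String normalized = normalize(name);
        if (normalized.isEmpty()) {
            return bigrams;
        }
        String[] words = normalized.split(" ");
        if (words.length == 1) {
            bigrams.add(words[0]);
            return bigrams;
        }
        for (int i = 0; i + 1 < words.length; i++) {
            bigrams.add(words[i] + " " + words[i + 1]);
        }
        return bigrams;
    }

    // Jaccard similarity: shared bigrams divided by all distinct bigrams.
    static double similarity(String a, String b) {
        Set<String> x = wordBigrams(a);
        Set<String> y = wordBigrams(b);
        if (x.isEmpty() && y.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(x);
        common.retainAll(y);
        Set<String> union = new HashSet<>(x);
        union.addAll(y);
        return (double) common.size() / union.size();
    }
}
```

A score like this only produces match candidates. As noted above, accessories with near-identical names will still score highly, so price information and a review or classification step remain necessary.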
