Right now, I'm trying to get the results from Google in Java, by searching for a term. I'm using a desktop program, not an applet.
That in itself isn't complicated, but then Google gave me a 403 error. Anyway, I added Referer and User-Agent headers and then it worked.
Now, my problem is that I don't get the results page from Google. Instead, I get their script which gets the results page.
My code right now simply uses a GET request on "http://www.google.com/search?q=" + Dork;
Then it outputs each line.
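For reference, a request of the kind described above might look like the following minimal sketch. It assumes Java 11+ and the built-in java.net.http client; the class name, method names, and header values are illustrative placeholders, not the ones from the actual program.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class GoogleSearchFetch {

    // Builds the GET request. The header values are placeholders (assumptions),
    // chosen only to show where User-Agent and Referer are set.
    static HttpRequest buildRequest(String dork) {
        String query = URLEncoder.encode(dork, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create("http://www.google.com/search?q=" + query))
                .header("User-Agent", "Mozilla/5.0 (X11; Linux x86_64)")
                .header("Referer", "http://www.google.com/")
                .GET()
                .build();
    }

    // Sends the request and returns the raw response body as a string.
    static String fetch(String dork) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<String> response =
                client.send(buildRequest(dork), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Without arguments, only show the request that would be sent.
        HttpRequest request = buildRequest(args.length > 0 ? args[0] : "dork");
        System.out.println(request.method() + " " + request.uri());
        // With a search term as the first argument, fetch and print each line.
        if (args.length > 0) {
            fetch(args[0]).lines().forEach(System.out::println);
        }
    }
}
```

The body returned by fetch is the raw HTML and inline script exactly as the server sends it; nothing in this sketch renders the page or runs its JavaScript.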
Here is what I get when I run my program:
<.!doctype html><.head><.title>dork - Google Search<./title><.script>window.google={kEI:"9myaS-Date).getTime()}}};try{}catch(u){}window.google.jsrt_kill=1;
align:center}#logo{display:block;overflow:hidden;position:relative;width:103px;height:37px;
<./ script><./div>
Lots of stuff like that. I shortened it (A LOT) and put in dots to fit it here.
So my big question is:
How do I turn this whole mess into the nice results page I get when searching Google with a browser?
Any help would be seriously appreciated, and I really need the answer fast.
Also, please keep in mind that I do NOT want to use Google's API for this.
Thanks in advance!
Jack is right: take a look at the Google AJAX APIs. If you want nicely formatted results, brush up on your HTML and CSS.
A friend and I decided to code a Discord bot in Java, using JDA. The idea is for the bot to give you a requested Minecraft recipe (a picture of it). However, we don't want to have to download every single recipe (there are way too many of them). So I was wondering if there's something we can use that would give us the recipes with pictures and everything, like an API or a website that we can access from the Java code that would return something we can use. (No code attached since we haven't really done anything and it would serve no purpose.)
I am not sure if an API exists that can do that for you. If the problem is spending the time to download every single recipe, I might recommend creating a web scraper that could get that data for you. Getting the images from a site like https://www.minecraftcrafting.info/ would be fairly straightforward using Python and Beautiful Soup (https://pypi.org/project/beautifulsoup4/). Hope that helps and good luck!
Hello everyone, here is my problem.
I want to extract 2 words from a website; the words are "won" or "loss". If I can find those 2 words on the website, I will be able to write the program I am working on. The problems I have are...
When I write a Java program to get the HTML code from the site, it only gives me the HTML code that is not changing, i.e. it doesn't give me the dynamic PHP code parts.
When I "inspect elements" on the website, it gives me exactly what I want. It says I either won or loss in the HTML tags. However, if I simply view source, it doesn't show me the dynamic PHP code that you would see when inspecting elements.
Is there a way for me to write code that looks at "inspect elements" for the website and keeps track of the part of the HTML code that is changing between "win" or "loss"?
I've had trouble with something like this before and since you lack details I will give you the best answer I can...
More information that would be helpful, if you edit your question, includes:
Code... Show me what you've got
The HTML code
APIs or frameworks used in your application
So the issue seems to be that when you request the site, the information is not there. Normally this doesn't happen, since most webpages display their information at load time.
These days we do a lot of stuff with JavaScript, so that is probably the part you are having problems with. JavaScript can load information onto the page dynamically at any time. It need not be at load time, and even if it looks to the eye like the information is there when the page loads, it may not be, since it happens too fast to notice.
Look into the JavaScript code and see if you can find a GET, POST, or PUT request, and see if you can follow that to where it loads the data into the page. Then mimic the request in your program.
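As a rough sketch of that last step, assuming Java 11+ and the built-in java.net.http client: once you have found the URL the page's script calls, you can request it directly and search the response for the word you need. The endpoint URL, the class and method names, and the X-Requested-With header are all assumptions made for illustration; the real values have to be read off the site's own requests.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OutcomeFetcher {

    // Hypothetical endpoint: replace with the URL found in the page's JavaScript.
    static final String DATA_URL = "http://example.com/game/status";

    // Matches "won" or "loss" as whole words, ignoring case.
    private static final Pattern OUTCOME =
            Pattern.compile("\\b(won|loss)\\b", Pattern.CASE_INSENSITIVE);

    // Returns the first "won"/"loss" found in the response body, lower-cased.
    static Optional<String> findOutcome(String body) {
        Matcher m = OUTCOME.matcher(body);
        return m.find() ? Optional.of(m.group(1).toLowerCase(Locale.ROOT)) : Optional.empty();
    }

    // Requests the data URL directly, the way the page's script would.
    static String fetch(String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                // Some servers check this header on AJAX calls (an assumption here).
                .header("X-Requested-With", "XMLHttpRequest")
                .GET()
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    public static void main(String[] args) throws Exception {
        // Offline demonstration on a made-up response body.
        System.out.println(findOutcome("{\"result\":\"won\"}"));
        // Pass a real URL (for example, your replacement for DATA_URL) to query a server.
        if (args.length > 0) {
            System.out.println(findOutcome(fetch(args[0])));
        }
    }
}
```

The browser's developer tools (the network tab next to "inspect elements") show which request actually carries the changing data.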
I'm using PHP to scrape some information off webpages; however, I've discovered that the info I'm trying to scrape from the pages is loaded through some manner of AJAX/JavaScript. I thought I remembered that cURL could iterate through the JavaScript, but I've found that that's not the case.
I seem to remember some sort of backend "web browser" library/function that could trace through JavaScript and AJAX, to get at the final page result that a full-functioned browser would arrive at.
Is there a library or function that can do this? Any ideas on how to go about this, other than having to manually trace through the scripts/redirects myself? It doesn't have to be pretty -- I'm just looking to scrape the resulting text.
Maybe not in PHP, but in other languages there are: Watir/WatiN, Selenium, watir/selenium-webdriver, capybara-webkit, Celerity, Node.js (which runs JS directly), and PhantomJS. There are also iMacros and similar commercial options.
But I usually find that I can get the data I want without any of these, just by looking at the requests the page is making, recreating them, and parsing the responses.
I don't think there is such a library. If you're really desperate and you have lots of time on your hands, then you can, of course, download the source code of Firefox, for example, and build yourself something useful. However, I don't think this is going to be the best use of your or anybody else's resources.
Note that even Google's indexing bot does not process AJAX. Here is what Google has to say about it. It's quite possible that the site you're dealing with supports this, in which case you can try using Google's technique, but on the whole, unfortunately, you're out of luck.
I am looking to develop an app that will take login details from the user, go to a website, login, return values on the web page and then display them to the user on the phone.
Does Java have this functionality? Will I need to use JavaScript instead, maybe? Do these answers depend on the website that I am trying to access?
In my head I figure that I could just read in the parameters as strings or chars, parse the webpage for the appropriate form and "paste" the appropriate value into the form "box". However, I have never attempted anything like this with coding, so I am completely new to the idea and don't really know where to start. I tried googling around, but any information that I found was either irrelevant or conflicting.
I'm not looking for the code to do it, because I will not really learn anything from that, but a finger in the right direction would be great. I really do want to try to get better at programming, so that's why I've started to give myself these little side projects.
Any help that can be offered would be great.
Ian,
You can try using the http-client (http://hc.apache.org/httpclient-3.x/) lib from Apache. It lets you programmatically access a website (from Java code). You will need to do the following things:
Use the http-client lib to POST the data to the web site.
Receive the html response.
Use some html parser or xpath to retrieve the values from the response html.
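The three steps above can be sketched roughly as follows. Note that this sketch swaps in the JDK's own java.net.http client (Java 11+) instead of the Apache library so that it has no dependencies, and it uses a simple regular expression as a stand-in for a real HTML parser or XPath. The field names, the element id, and the class and method names are hypothetical.

```java
import java.net.CookieManager;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class LoginScrape {

    // One shared client with a cookie store, so a session cookie set by the
    // login response is sent again on later requests.
    static final HttpClient CLIENT = HttpClient.newBuilder()
            .cookieHandler(new CookieManager())
            .build();

    // Encodes form fields as application/x-www-form-urlencoded.
    static String encodeForm(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Step 1 and 2: POST the form data and receive the HTML response.
    static String post(String url, Map<String, String> fields) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(fields)))
                .build();
        return CLIENT.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Step 3: pull the text of the element with the given id out of the HTML.
    // A regex is only a stand-in here; it handles simple markup with no nested tags.
    static Optional<String> textOfElement(String html, String id) {
        Pattern p = Pattern.compile("id=\"" + Pattern.quote(id) + "\"[^>]*>([^<]*)<");
        Matcher m = p.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Offline demonstration with made-up field names and markup.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "ian");
        fields.put("password", "p&ss word");
        System.out.println(encodeForm(fields));
        System.out.println(textOfElement("<span id=\"balance\" class=\"x\"> 42.50 </span>", "balance"));
    }
}
```

Which field names the form expects, and which URL it posts to, depends entirely on the site; both can be read from the form's HTML source.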
You would need a script which accesses the webpage and enters the data, but in my opinion this is illegal. Because you are accessing a secured area and are able to look into sensitive data. Also accessing the page via a script is "botting" - most pages have safety precautions to prevent the execution of scripts, because most of them are harmful.
In my opinion there is no legal and easy solution to this.
I'm writing a Perl program that was doing a simple get command to retrieve results and process them. But the site has been updated and now has a java component that handles the results (so the actual data is not in the source code anymore).
This is the site:
http://wro.westchesterclerk.com/legalsearch.aspx
Try putting in:
Index Number: 11103
Year: 2009
I want to be able to programmatically enter the "index number" and "year" at the bottom of the form where it says "search by number" and then retrieve the results listed next to it.
I've written many programs in Perl that simply pass variables via the URL and the results are listed in the source code, so it's easy to parse. (Using LWP::Simple)
Like:
$html = get("http://www.url.com?id=$somenum&year=$someyear");
But this is totally new to me and I don't know where to begin.
I'm somewhat familiar with LWP::UserAgent and Mechanize.
I'd really appreciate any help.
Thanks!
That sort of question gets asked a lot. The standard answer is Wireshark.
I just used it on that website with the test data you gave and extracted the single POST request responsible for the results. Replaying that request lets you bypass JavaScript altogether.
It might be more logical for you to use one of the modules which drives a browser. Something like Mozilla::Mechanize or the Selenium tools.
A browser knows best how to interact with the server using AJAX and re-render the DOM and so on, so build your script on top of that ability.
What you're asking to do in this case is hard. Not impossible, but hard.
Method A:
You can sift through their JavaScript code. What their "AJAX" is doing is making a GET/POST request to another web page and dynamically loading the results. If you can decipher what that link is and the proper arguments, you can continue to use get. I would recommend getting the Firebug plugin and any other tool that will help you de-obfuscate their JavaScript.
Another Method:
If your program could access a web browser (one with javascript: URL support, like Firefox), you could programmatically go to these addresses, then wait a moment and get your data.
http://wro.westchesterclerk.com/legalsearch.aspx
javascript: function go() { document.getElementById('ctl00_tbSearchArea__ctl1_cphLegalSearch_splMain_tmpl0_tbLegalSearchType__ctl0_txtIndexNo').value=11109; document.getElementById('ctl00_tbSearchArea__ctl1_cphLegalSearch_splMain_tmpl0_tbLegalSearchType__ctl0_txtYear').value='09';searchClick();} go();
This is a method we have used along with mozembed to programmatically get around this stuff. Recently we switched to WebKit. And to keep this from taking up a video display, we have used Xvfb/Xvnc to create a virtual desktop to load the browser in.
Those are the methods I have come up with so far. Let me know if you come up with another. Also, I hope I helped.