I am trying to download data from AJAX-based websites, but crawler4j doesn't support AJAX crawling. My project is based on crawler4j, so I want to keep using it, but with this functionality added. If there is any way to do this, please share a link and a snippet.
Check this pull request by MadEgg; it may solve your issue. You can also comment on the related issue on the project page to emphasize the need for AJAX crawling within the crawler4j framework.
Related
I'm looking for a java-framework which enables me to easily communicate with a website.
What I'd like to do is for example:
log into a website
open various pages
read information
submit information into forms
send ajax-requests
read ajax-response
What I'm not looking for is a browser-automation tool like Selenium. I'm trying to have my application communicate with the website directly.
That's the general outline. If you can think of a better solution for the following problem, I'm more than willing to follow your advice (:
We're working with a web application that has a gruesome GUI. Unfortunately, we have no means to tinker with the application or request changes to it. What I'd like to build is a client that logs into the application, fetches the data, and displays it in a more appropriate manner along with additional information based on that data, while also providing tools to process this data and submit it back to the web application.
Thanks in advance.
Selenium is available for Java. You can download it from here: http://www.seleniumhq.org/download/
Here is a tutorial:
https://www.airpair.com/selenium/posts/selenium-tutorial-with-java
How the Selenium WebDriver works
The Firefox WebDriver will open a real browser window (Firefox) for you, so you can actually see what's going on. If you don't want a visible browser window, you can use a headless driver instead:
HtmlUnitDriver
PhantomJSDriver
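A minimal sketch of driving a site headlessly with HtmlUnitDriver, assuming Selenium and the HtmlUnit driver are on the classpath; the URL and the field names are placeholders, not a real site:

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.htmlunit.HtmlUnitDriver;

public class HeadlessLoginExample {
    public static void main(String[] args) {
        // true enables JavaScript support, which AJAX-heavy pages need
        WebDriver driver = new HtmlUnitDriver(true);
        driver.get("https://example.com/login"); // placeholder URL

        // Field names are hypothetical; inspect the real form to find them
        driver.findElement(By.name("username")).sendKeys("user");
        driver.findElement(By.name("password")).sendKeys("secret");
        driver.findElement(By.id("login-button")).click();

        // After login, navigate and read pages exactly as with a visible browser
        System.out.println(driver.getTitle());
        driver.quit();
    }
}
```

The same code works unchanged with FirefoxDriver if you do want to watch the session in a real browser window.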
Take a look at
http://hc.apache.org/httpcomponents-client-ga/quickstart.html
It's not a framework but a library; still, it should provide the methods you need to interact with your web application.
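A rough sketch of the HttpClient 4.x quickstart pattern linked above: one GET to read a page, one POST to submit a form. The URLs and form field names here are placeholders; the client keeps session cookies between the two calls, which is what makes a login-then-fetch flow work:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

public class HttpClientSketch {
    public static void main(String[] args) throws Exception {
        try (CloseableHttpClient client = HttpClients.createDefault()) {
            // Submit a login form (placeholder URL and field names)
            HttpPost post = new HttpPost("https://example.com/login");
            List<NameValuePair> form = new ArrayList<>();
            form.add(new BasicNameValuePair("username", "user"));
            form.add(new BasicNameValuePair("password", "secret"));
            post.setEntity(new UrlEncodedFormEntity(form));
            client.execute(post).close();

            // Read a page; the session cookie from the login is reused
            HttpGet get = new HttpGet("https://example.com/data");
            try (CloseableHttpResponse response = client.execute(get)) {
                String html = EntityUtils.toString(response.getEntity());
                System.out.println(html.length());
            }
        }
    }
}
```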
I've got a problem: I want to parse a page (e.g. this one) to collect information about the offered apps and save this information to a database.
Moreover, I am using crawler4j to visit every (available) page. The problem, as far as I can see, is that crawler4j needs links to follow in the page source.
But in this case the hrefs are generated by JavaScript, so crawler4j never gets new links to visit or pages to crawl.
So my idea was to use Selenium so that I can inspect elements as in a real browser like Chrome or Firefox (I'm quite new to this).
But, to be honest, I don't know how to get the "generated" HTML instead of the static page source.
Can anybody help me?
To inspect elements, you do not need the Selenium IDE; just use Firefox with the Firebug extension. With the developer tools add-on you can also view a page's original source as well as the generated source (this matters mainly for server-generated pages, e.g. PHP).
Crawler4j cannot handle JavaScript like this; that is better left to another, more advanced crawling library. See this response:
Web Crawling (Ajax/JavaScript enabled pages) using java
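To answer the "generated HTML" part of the question directly: Selenium's getPageSource() returns the DOM as it stands after the page's JavaScript has run, which is exactly the link-bearing HTML that crawler4j never sees. A minimal sketch, with a placeholder URL (and ChromeDriver installed separately):

```java
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class GeneratedHtmlExample {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("https://example.com/app-listing"); // placeholder
            // getPageSource() reflects the current DOM, including
            // JavaScript-generated hrefs, not the raw server response
            String renderedHtml = driver.getPageSource();
            System.out.println(renderedHtml.length());
        } finally {
            driver.quit();
        }
    }
}
```

You could feed that rendered HTML into your existing parsing code, or into crawler4j's link extraction, instead of the raw source.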
We maintain an online book shop using Java Wicket, which also has a search function for finding books in a MySQL database. If someone searches for "Fiction" in the search box, the results contain metadata for each hit. If the user clicks a title, it takes him/her to a details page.
I have a requirement to add a review box/form to the details page using Spring. We also maintain two other websites that need similar work, so we decided to develop this feature as a small, individual, re-usable application and then integrate it into the Java Wicket or Zend framework sites.
What I need to do is:
1) get details from the form, such as name, email, and review text (JSP or HTML, jQuery)
2) show the entered review on the web page and, when the form is submitted, post those details to MySQL (update the reviews table in the db) via Spring REST
Is it really possible to accomplish this using Spring REST, or are there other options?
Can anyone give some ideas on how to approach this requirement?
Sure, you can do this. If you implement the new feature in JavaScript, you can include it in both frameworks. I would suggest a JavaScript framework like AngularJS (perhaps with Bootstrap for the UI), as it will make calling the REST service easier.
In addition, you need to deploy your REST service as a separate application, or bundle it with your existing application that contains the Wicket app.
Here you'll find an example of calling a REST service with AngularJS: How to access the services from RESTful API in my angularjs page?
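On the server side, the Spring endpoint the page would call could look roughly like this. The class, field, and path names are invented for illustration, and the MySQL insert is left as a stub:

```java
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ReviewController {

    // Simple payload object; Spring binds the posted JSON to these fields
    public static class Review {
        public String name;
        public String email;
        public String text;
    }

    @PostMapping("/api/reviews") // hypothetical path
    public Review submit(@RequestBody Review review) {
        // TODO: validate the input and insert into the MySQL reviews table
        return review; // echo back so the page can render it immediately
    }
}
```

Because the endpoint is plain HTTP + JSON, the same service can back the review form on the Wicket sites and the Zend site alike.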
I need to create a Java program that goes to a website's login page, logs in, then goes to another page of the site and submits a form. I know how to submit a form, but my problem is the login part. This script should work with multiple sites; some use cookies and some use sessions. Is there any way to solve my problem?
I can't show you any code because I don't know where to begin. Should I first submit the login form and then separately go to the submission page? Please help me, or tell me how I could solve this problem: I want to submit a form on various sites automatically; only I will be using this script. Until now I've created a script in JavaScript: I opened the sites in iframes, discovered that in Google Chrome I can control external iframes too, and used JavaScript to fill in the forms automatically. But my problem is that I also need to submit files and images, and I can't do that using only JavaScript. If it's not possible to do this in Java, please help me find another solution; I need to make it fully automated.
You can use the Apache HttpClient library for logging in to websites from Java.
I would take a look at the Selenium RC framework and APIs. It's a test automation tool but there's no reason why you couldn't use it for doing programmatic logins to websites. It has client libraries for many languages including Java.
Using Selenium RC you can write Java code that can load, navigate, and fill in forms programmatically. You can target form input fields by field name or class, and the Java API allows you to load multipart data into a form.
Selenium comes in two flavours: the older Selenium RC and the newer WebDriver. Both are capable of doing what you want, though they go about it in slightly different ways. The documentation provides some good examples to get you started.
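A minimal sketch in the newer WebDriver flavour: log in, open the submission page, and attach a file. All URLs, field names, and the file path are placeholders you would replace per site:

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class LoginAndUploadExample {
    public static void main(String[] args) {
        WebDriver driver = new FirefoxDriver();
        try {
            driver.get("https://example.com/login"); // placeholder
            driver.findElement(By.name("username")).sendKeys("user");
            driver.findElement(By.name("password")).sendKeys("secret");
            driver.findElement(By.cssSelector("button[type=submit]")).click();

            // Cookies/session state from the login stay in the driver
            driver.get("https://example.com/submit"); // placeholder
            // sendKeys on a file input uploads a local file -- the part
            // that plain in-page JavaScript cannot do
            driver.findElement(By.name("attachment"))
                  .sendKeys("/path/to/image.png");
            driver.findElement(By.id("submit-form")).click();
        } finally {
            driver.quit();
        }
    }
}
```

Because the driver carries the session, there is no need to handle cookies versus server-side sessions differently across sites.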
I need to screen scrape some data from a website, because it isn't available via their web service. When I've needed to do this previously, I've written the Java code myself using Apache's HTTP client library to make the relevant HTTP calls to download the data. I figured out the relevant calls I needed to make by clicking through the relevant screens in a browser while using the Charles web proxy to log the corresponding HTTP calls.
As you can imagine, this is a fairly tedious process, and I'm wondering if there's a tool that can actually generate the Java code that corresponds to a browser session. I expect the generated code wouldn't be as pretty as code written manually, but I could always tidy it up afterwards. Does anyone know if such a tool exists? Selenium is one possibility I'm aware of, though I'm not sure if it supports this exact use case.
Thanks,
Don
I would also add +1 for HtmlUnit, since its functionality is very powerful: if you need behaviour "as though a real browser were scraping and using the page", it's definitely the best option available. HtmlUnit executes (if you want it to) the JavaScript in the page.
It currently has full-featured support for all the main JavaScript libraries and will execute JS code that uses them. Correspondingly, you can get handles to the JavaScript objects in the page programmatically from within your test.
If, however, the scope of what you are trying to do is smaller, more along the lines of reading some HTML elements where you don't much care about JavaScript, then NekoHTML should suffice. It's similar to JDOM, giving programmatic, rather than XPath, access to the tree. You would probably need Apache's HttpClient to retrieve the pages.
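A short sketch of the HtmlUnit approach described above: fetch a page, let its JavaScript run, then read the resulting DOM. The URL is a placeholder, and the timeout is an arbitrary choice:

```java
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HtmlUnitScrapeExample {
    public static void main(String[] args) throws Exception {
        try (WebClient client = new WebClient()) {
            client.getOptions().setJavaScriptEnabled(true);
            HtmlPage page = client.getPage("https://example.com/data");
            // Give in-page AJAX calls a moment to finish (up to 5 seconds)
            client.waitForBackgroundJavaScript(5_000);
            // asXml() serializes the DOM *after* script execution
            System.out.println(page.asXml().length());
        }
    }
}
```

From the HtmlPage you can also query individual elements (getElementById, getByXPath) rather than serializing the whole document.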
The manageability.org blog has an entry which lists a whole bunch of web page scraping tools for Java. However, I do not seem to be able to reach it right now, but I did find a text only representation in Google's cache here.
You should take a look at HtmlUnit - it was designed for testing websites but works great for screen scraping and navigating through multiple pages. It takes care of cookies and other session-related stuff.
I would say HtmlUnit and Selenium are personally my two favourite tools for screen scraping.
A tool called The Grinder allows you to script a session to a site by going through its proxy. The output is Python (runnable in Jython).