Fetching data from other sites and displaying it in our page? - java

Is there any way to get data from other sites and display it in our JSP pages dynamically?
See this URL: http://www.dictionary30.com/meaning/Misty
On that page there is a block titled
Wikipedia Meaning and Definition on 'Misty'
In that block they are fetching the data from Wikipedia and displaying it on dictionary30.
Question:
How are they fetching the Wikipedia data into their site?
I need to display data like that in my JSP page by fetching it from another site.

You can use URLConnection and read the other site's data.
Or, better, use JSoup; it will also parse out the specific data you need from the other site.
For your case:
// parse the page at the given URL, with a 10-second timeout
Document document = Jsoup.parse(new URL("http://www.dictionary30.com/meaning/Misty"), 10000);
// select the div with id "contentbox" and print its inner HTML
Element div = document.select("div[id=contentbox]").first();
System.out.println(div.html());

You can fetch the data from the other site on your server side using URLConnection and hand it to your JSP page.
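For reference, a minimal sketch of that URLConnection approach (the class name and servlet wiring here are just examples, not part of the original answer):
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class RemoteHtmlFetcher {
    // downloads the raw HTML of another page so it can be handed to a JSP
    public static String fetch(String address) throws Exception {
        URLConnection conn = new URL(address).openConnection();
        StringBuilder html = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }
}
In a servlet you could then do something like request.setAttribute("remoteHtml", RemoteHtmlFetcher.fetch(url)) and render that attribute in the JSP.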

Do make sure you get permission first from the site owners before doing anything like that.
Most people don't take kindly to their data being leeched by others, especially as it costs them money and doesn't generate any (advertising) income.
It's also very risky in that your own site/application will quickly break as soon as the site you're leeching from changes its layout.

Related

Fetching data from another website with JSOUP

Basically, I need a table with all the possible books that exist, and I don't want to build that myself, because I'm a very lazy person xD. So, my question is: can I use a site that I have in mind, cut off the parts of that site I don't need and keep only the search part (maybe with some changes to the layout), then run the search, find the book and store in my database only the data that makes sense for me? Is that possible? I heard that JSoup could help.
So, I just want some tips. (thx for reading).
the site: http://www.isbn.bn.br/website/consulta/cadastro
Yes, you can do that using Jsoup. The main problem is that the URL you shared relies on JavaScript, so you'll either need to use Selenium to force the JS to execute, or get the individual book's URL and parse that directly.
The way to parse a web page using Jsoup is:
Document document = Jsoup.connect("YOUR-URL-GOES-HERE")
.userAgent("Mozilla/5.0")
.get();
Then you retrieve the whole HTML in a Document, so you can get any Element contained in the Document using CSS selectors. For example, if you want to retrieve the title of the page, you can use:
Elements elements = document.select("title");
And the same goes for every HTML tag that you want to retrieve information from. You can check the Jsoup documentation and some of the examples explained there: Jsoup
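Putting the two snippets above together, a minimal self-contained sketch (using the URL from the question) could look like this:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class TitleScraper {
    public static void main(String[] args) throws Exception {
        // fetch and parse the page, identifying as a regular browser
        Document document = Jsoup.connect("http://www.isbn.bn.br/website/consulta/cadastro")
                .userAgent("Mozilla/5.0")
                .get();
        // select every <title> element and print its combined text
        Elements elements = document.select("title");
        System.out.println(elements.text());
    }
}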
I hope it helps you!

Developing app to detect webpage change

I'm trying to make a desktop app with java to track changes made to a webpage as a side project and also to monitor when my professors add content to their webpages. I did a bit of research and my current approach is to use the Jsoup library to retrieve the webpage, run it through a hashing algorithm, and then compare the current hash value with a previous hash value.
Is this a recommended approach? I'm open to suggestions and ideas since before I did any research I had no clue how to start nor what jsoup was.
One potential problem with your hashing method: if the page contains any dynamically generated content that changes on each refresh, as many modern websites do, your program will report that the page is constantly changing. Hashing the whole page will only work if the site does not employ any of this dynamic content (ads, hit counter, social media, etc.).
What specifically are you looking for that has changed? Perhaps new assignments being posted? You likely do not want to monitor the entire page for changes anyway. Therefore, you should use an HTML parser -- this is where Jsoup comes in.
First, parse the page into a Document object:
Document doc = Jsoup.parse(htmlString);
You can now perform a number of methods on the Document object to traverse the HTML Nodes. (See Jsoup docs on DOM navigation methods)
For instance, say there is a table on the site, and each row of the table represents a different assignment. The following code would get the table by its ID and each of its rows by selecting the table's <tr> tags.
Element assignTbl = doc.getElementById("assignmentTable");
Elements tblRows = assignTbl.getElementsByTag("tr");
for (Element tblRow : tblRows) {
    // inspect or store each row's HTML here, for example:
    System.out.println(tblRow.html());
}
You will need to somehow view the webpage's source code (such as Inspect Element in Google Chrome) to figure out the page's structure and design your code accordingly. This way, not only would the algorithm be more reliable, but you could take it much further, such as extracting the details of the assignment that has changed. (If you would like assistance, please edit your question with the target page's HTML.)
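As a sketch of how the two ideas could fit together (the URL and table ID are placeholders, and SHA-256 is just one reasonable choice of hash):
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class PageChangeChecker {
    // Hash only the part of the page we care about, so dynamic content
    // elsewhere on the page doesn't trigger false positives.
    public static String hashAssignments(String url) throws Exception {
        Document doc = Jsoup.connect(url).get();
        Element assignTbl = doc.getElementById("assignmentTable"); // placeholder ID
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(assignTbl.html().getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString(); // compare against the previously stored value
    }
}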

Providing input data for web scraping

I want to scrape data from the following site:
http://www.upmandiparishad.in/commodityWiseAll.aspx
There are two input elements, Commodity and Date. How do I provide these values and retrieve the resulting information?
To extract data from a web page from Java, you can use jsoup.
To provide those input values, you first need to understand how your browser sends them to the server.
Basically, there are two most common methods for a request-response between a client and a server:
GET - Requests data from a specified resource
POST - Submits data to be processed to a specified resource
You can find more about them here.
When you select the Commodity and the Date input values, you can investigate the methods used to provide those values to the server by examining network requests. For example, in Chrome, you can press F12 and select the Network tab to check the information being sent to and from the browser.
Once you have found out how the data is provided, you can form your HTTP request accordingly and send the same data via jsoup or a similar library.
For example, here is how you can provide simple input fields to your request:
Document doc = Jsoup.connect("http://example.com/")
.data("some_input_1", "some_data_1")
.data("some_input_2", "some_data_2")
.post();
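Note that this particular page is an ASP.NET page (.aspx), and such pages usually require hidden form fields like __VIEWSTATE and __EVENTVALIDATION to be sent back with the POST. A rough sketch of how that could look with Jsoup follows; the field names used for Commodity and Date are assumptions you would confirm in the Network tab:
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class CommodityScraper {
    public static void main(String[] args) throws Exception {
        String url = "http://www.upmandiparishad.in/commodityWiseAll.aspx";
        // first GET the page to pick up the hidden ASP.NET state fields
        Connection.Response initial = Jsoup.connect(url)
                .method(Connection.Method.GET)
                .execute();
        Document form = initial.parse();
        String viewState = form.select("input[name=__VIEWSTATE]").val();
        String eventValidation = form.select("input[name=__EVENTVALIDATION]").val();
        // then POST those fields back together with your own input values
        Document result = Jsoup.connect(url)
                .cookies(initial.cookies())
                .data("__VIEWSTATE", viewState)
                .data("__EVENTVALIDATION", eventValidation)
                .data("commodity_field_name", "Wheat")    // assumed field name
                .data("date_field_name", "01/01/2015")    // assumed field name
                .post();
        System.out.println(result.text());
    }
}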
This is of course just to get you started; it is by no means a complete answer. You need to show real effort on your side to search for answers online, as there are plenty.
Here are just a few to get you started:
http://www.mkyong.com/java/how-to-send-http-request-getpost-in-java/
http://simplescrape.sourceforge.net/
http://www.xyzws.com/Javafaq/how-to-use-httpurlconnection-post-data-to-web-server/139
http://www.javaworld.com/article/2077532/learn-java/java-tip-34--posting-via-java.html

Wikipedia Page Id from URL

I am parsing through a Wikipedia dump in Java. In my module I want to know the page IDs of the internal wiki pages that are referred to by the current page. Getting the internal links, and thus the URLs, from a page is easy. But how do I get the page ID from a URL?
Do I have to use some MediaWiki tool for this? If yes, how?
Is there any other alternative?
For example, for http://en.wikipedia.org/wiki/United_States
I want to get its page ID, i.e. 3434750.
You can use the API for that. Specifically, the query would look something like:
http://en.wikipedia.org/w/api.php?action=query&titles=United_States
(You can also specify more than one page title in the titles parameter, separated by |.)
As an alternative, you could download the page.sql dump (1 GB compressed for the English Wikipedia), which also contains this information. To actually query it, you could either import it into a MySQL database and query that, or parse the SQL directly.
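If you do go through the API from Java, a minimal sketch could look like this (using format=json and a simple regex instead of a full JSON parser, which is just one way to do it):
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageIdLookup {
    public static String pageId(String title) throws Exception {
        String api = "https://en.wikipedia.org/w/api.php?action=query&format=json&titles="
                + URLEncoder.encode(title, "UTF-8");
        StringBuilder json = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(api).openStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                json.append(line);
            }
        }
        // the response contains "pageid":<number> for each resolved title
        Matcher m = Pattern.compile("\"pageid\":(\\d+)").matcher(json);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(pageId("United_States")); // should print 3434750
    }
}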
If you can't use the API, you can always get the page ID from the info page, reached by appending ?action=info to the URL. That should make a better starting point for a parser.
For your example above: https://en.wikipedia.org/wiki/United_States?action=info

How can I send a newsletter with xPages content?

I have some content displayed using computed fields inside a repeat in my XPage.
I now need to be able to send out a newsletter (by email) every week with the content of this repeat. The content can be both plain text and HTML.
My site is also translated into different languages, so I need the code to be able to specify the language and return the content in that language.
I am thinking about creating a scheduled LotusScript or Java agent that somehow reads the content of the repeat. Is this possible? If so, some sample code to get me started would be great.
Edit: the content is only available to logged-in users.
thanks
Thomas
Use a Java agent, and instead of going to the content natively, open the page over the web as if in a browser, then process the result. (You could also make a special version of the web page that hides all extraneous content, if you wanted.)
How is the data for the repeat evaluated? Can it be translated into a LotusScript database.search?
If so, it would be best to forget about the actual XPage and concentrate on getting the same data via LotusScript, then write your scheduled agent to loop through the document collection and generate the email that way.
Going through the XPage would generate a lot of extra work: you would need to be authenticated as the user (if the data in the repeat differs from one user to the next) to get exactly the data that that particular user would see, and then you would have to parse the page to extract the data.
If the newsletter is complicated enough that you want to use an XPage rather than build the HTML yourself in the agent, you could build a single XPage that changes what it renders based on a special query string, then in your agent get the HTML from a URLConnection and pass that HTML into the body of your email.
You could build the URL based on a view that shows documents with today's date.
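A rough sketch of the fetch side of that idea (the URL, query string and use of basic authentication are all assumptions; how you then build and send the mail is a separate step):
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class NewsletterFetcher {
    // fetches the rendered HTML of the XPage; the content is only visible to
    // logged-in users, so some form of authentication is required
    public static String fetchRenderedXPage(String url, String user, String password)
            throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        String auth = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes("UTF-8"));
        conn.setRequestProperty("Authorization", "Basic " + auth);
        StringBuilder html = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }
}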
I would solve this by giving the user a teaser on what to read and give them a link to the full content.
You should check out Weihang Chen's (my colleague) article about rendering an XPage as MIME and sending it as a mail.
http://www.bleedyellow.com/blogs/weihang/entry/render_a_xpages_programmtically_and_send_it_as_a_mail?lang=en_us
We got this working in house and it is very convenient.
He describes 3 different approaches to the problem.
