I am trying to create an Android application that saves webpages for offline browsing. I was able to save the webpage itself, but not its contents (images, JavaScript files, etc.). Is there a way to do this programmatically? I use Eclipse and test my work on an emulator.
Hm, I am afraid you will have to parse the HTML yourself (with a proper library, I mean) and store all the resources (CSS, JS, images, videos, etc.) too.
See how it is done in a Java crawler: open source crawlers
You will need to search for all images, JavaScript files, CSS files, etc. and download them, saving them at the same relative path as the HTML files. This assumes the HTML is coded with relative paths (images/image.png) and not absolute paths (http://www.domain.com/image/image.png).
You can pretty easily search the HTML string for <img, <script, <link, etc. and parse from there, or you can use a third-party HTML parser.
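Here is a rough sketch of the "search the HTML string" approach in plain Java, with no libraries. A regular expression pulls the src/href values out of <img>, <script> and <link> tags. A helper then maps each relative reference to the same relative location under the directory where the page was saved. The names ResourceRefs, extractResourceRefs and localTarget are made up for illustration. The regex is only a crude stand-in for a real HTML parser: it ignores comments, srcset, CSS url(...) references and so on.

```java
import java.net.URI;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResourceRefs {
    // Crude pattern: an <img>, <script> or <link> tag, then the first src= or href= inside it.
    // Group 3 is the attribute value, quoted or unquoted. This is not a real HTML parser.
    private static final Pattern RESOURCE_ATTR = Pattern.compile(
            "<(img|script|link)\\b[^>]*?\\b(src|href)\\s*=\\s*[\"']?([^\"'\\s>]+)",
            Pattern.CASE_INSENSITIVE);

    /** Returns the src/href values of img, script and link tags, in document order. */
    static List<String> extractResourceRefs(String html) {
        List<String> refs = new ArrayList<>();
        Matcher m = RESOURCE_ATTR.matcher(html);
        while (m.find()) {
            refs.add(m.group(3));
        }
        return refs;
    }

    /**
     * Maps a relative reference such as "images/image.png" to the same relative
     * location under pageDir, so the saved HTML can still find it offline.
     * Absolute URLs, root-relative paths and paths escaping pageDir are rejected.
     */
    static Path localTarget(Path pageDir, String relativeRef) {
        URI ref = URI.create(relativeRef);   // throws IllegalArgumentException on e.g. spaces
        String path = ref.getPath();         // drops any ?query or #fragment
        if (ref.isAbsolute() || ref.getAuthority() != null || path == null
                || path.isEmpty() || path.startsWith("/")) {
            throw new IllegalArgumentException("not a relative path reference: " + relativeRef);
        }
        Path base = pageDir.toAbsolutePath().normalize();
        Path target = base.resolve(path).normalize();
        if (!target.startsWith(base)) {
            throw new IllegalArgumentException("reference escapes the page directory: " + relativeRef);
        }
        return target;
    }

    public static void main(String[] args) {
        String html = "<LINK REL=\"STYLESHEET\" HREF=\"htmlatex.css\">\n<img src=p10012.gif>";
        Path pageDir = Path.of("saved-page");  // hypothetical output directory
        for (String ref : extractResourceRefs(html)) {
            System.out.println(ref + " -> " + localTarget(pageDir, ref));
        }
    }
}
```

Absolute URLs are rejected rather than mapped. This matches the assumption that the page uses relative paths. A page with absolute URLs would also need its HTML rewritten to point at the local copies.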
I know how to download a webpage's source in Java. But a webpage also contains image, CSS and JS URLs, which need to be downloaded separately, like:
<LINK REL="STYLESHEET" HREF="htmlatex.css">
<img src=p10012.gif>
If I only download the source of a webpage, rendering it offline will still require htmlatex.css and p10012.gif, so that content will be missing in offline mode. My objective is to download all the contents of a webpage programmatically and provide them as assets of an Android app. How can I do that in Java?
Note: please let me know if my question is not clear enough.
I would suggest using the JSoup library for this, as it's a pretty good HTML parser. You can parse the HTML and then iterate over the resources to download them. I am not sure, but there may already be an example covering exactly what you asked.
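Here is a minimal sketch of that idea. It assumes the jsoup library (org.jsoup:jsoup) is on the classpath. It uses jsoup's CSS-style selectors and Element.absUrl(), which resolves an attribute against the page's URL. The class name JsoupResources and the choice of selectors are illustrative. The actual downloading of each URL is left out.

```java
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Set;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupResources {
    /** Collects absolute URLs of the images, scripts and stylesheets a page references. */
    static Set<String> resourceUrls(Document doc) {
        Set<String> urls = new LinkedHashSet<>();  // keeps first-seen order, drops duplicates
        for (Element img : doc.select("img[src]")) {
            urls.add(img.absUrl("src"));  // resolved against the document's base URI
        }
        for (Element script : doc.select("script[src]")) {
            urls.add(script.absUrl("src"));
        }
        for (Element css : doc.select("link[rel=stylesheet][href]")) {
            urls.add(css.absUrl("href"));
        }
        urls.remove("");  // absUrl returns "" when a URL cannot be resolved
        return urls;
    }

    public static void main(String[] args) throws IOException {
        // Fetches and parses the page; jsoup sets the base URI from the request URL.
        Document doc = Jsoup.connect(args[0]).get();
        for (String url : resourceUrls(doc)) {
            System.out.println(url);
        }
    }
}
```

For a page you already have on disk or in a String, Jsoup.parse(html, baseUri) gives the same Document without a network request. You need to supply the baseUri so that absUrl can resolve relative paths.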
I am developing a Java project with a sub-module that needs to extract contents [text, image, color] from a webpage and compare them with another webpage. I am planning to use the WinHTTrack software to download the webpage locally, but the problem is that it doesn't save it as HTML. How can I download a webpage with an HTML extension using software such as WinHTTrack [or is just saving the webpage with Ctrl+S enough?]. I also plan to use an HTML parser to extract the three content types [text, image, color] after downloading the webpage locally. Which parser should I go with?
Well, I use HTTrack and it fetches HTML files as well. You are probably treating the WinHTTrack project file as the only output file, but if you look inside the project directory there are HTML files (together with images, etc.). I would suggest using http://htmlparser.sourceforge.net/. It is a Java library, and since your project is a Java project it should be fairly easy to use. You can also save a whole website locally using org.htmlparser.parserapplications.SiteCapturer (and specify whether resources such as images should be captured as well). Hope it helps.
For a given page, say "http://www.yahoo.com", how can I calculate the total size of the downloaded files, for example image files, JavaScript files, and CSS files?
I know about the htmlparser jar, but it does not seem to support an element for CSS files.
As Graeme mentioned, both the Firebug add-on for Firefox (a great tool for web developers btw) and the developer tools in Chrome will give you the info you want.
However, if you don't want to download anything, you can use this online service:
http://www.websiteoptimization.com/services/analyze/
It will tell you how many bytes are downloaded for a webpage, including images, style sheets, scripts and everything else.
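If you would rather compute the figure in code, one option is to download each resource and count the bytes. This sketch assumes you already have the list of resource URLs, for example from parsing the page's <img>, <script> and <link> tags. The class name PageSize is hypothetical. The code counts body bytes as delivered by the stream, not HTTP headers. If the connection transparently decompresses responses, that count can differ from the size on the wire.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class PageSize {
    /** Reads the stream to the end and returns how many bytes it delivered. */
    static long countBytes(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }

    /** Downloads every URL once and sums the body sizes (headers are not counted). */
    static long totalSize(List<URL> urls) throws IOException {
        long total = 0;
        for (URL url : urls) {
            try (InputStream in = url.openStream()) {
                total += countBytes(in);
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Each argument is one resource URL (page, image, script, stylesheet...).
        List<URL> urls = new ArrayList<>();
        for (String arg : args) {
            urls.add(URI.create(arg).toURL());
        }
        System.out.println(totalSize(urls) + " bytes");
    }
}
```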
I have a WebView in my Android application which loads (WebView.loadUrl()) different local HTML files from the phone's internal storage. I would like to apply some custom CSS styles to them.
Now, I could have my app edit every HTML file and add a link reference to the CSS file.
I could also read the file contents, add the CSS link, and use WebView.loadData() to load it.
But is it possible to do this more simply and efficiently?
Note: The HTML files are downloaded from a website. So editing them manually is not possible in this case, but once downloaded they can be edited via the app if necessary.
One possibility (I have not tried this):
WebView.loadDataWithBaseURL(String baseUrl, String data, ..)
takes a baseUrl that the document uses to resolve relative URLs. Look at the CSS URL and construct baseUrl so that the CSS URL references your local CSS file.
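Here is a sketch of the read-and-inject variant from the question, in plain Java. It inserts a <link> to a stylesheet just before </head>. The names CssInjector and injectStylesheet and the href custom.css are hypothetical. On Android, you could pass the modified string to loadDataWithBaseURL(baseUrl, html, "text/html", "UTF-8", null). Set baseUrl to the directory of the original file (a file:// URL ending in '/') so the page's relative references, and custom.css, resolve there. That call is not shown as runnable code because it needs the Android SDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CssInjector {
    // Matches the first closing head tag, in any letter case.
    private static final Pattern HEAD_CLOSE = Pattern.compile("</head\\s*>", Pattern.CASE_INSENSITIVE);

    /**
     * Returns html with a stylesheet link added just before </head>, or at the very
     * start when there is no </head>. cssHref is assumed to be a trusted, simple path
     * (it is not escaped, so a double quote in it would break the tag).
     */
    static String injectStylesheet(String html, String cssHref) {
        String link = "<link rel=\"stylesheet\" href=\"" + cssHref + "\">";
        Matcher m = HEAD_CLOSE.matcher(html);
        if (m.find()) {
            return html.substring(0, m.start()) + link + html.substring(m.start());
        }
        return link + html;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Saved page</title></head><body>Hi</body></html>";
        String injected = injectStylesheet(html, "custom.css");
        System.out.println(injected);
        // On Android (not runnable here), the result could then be shown with:
        // webView.loadDataWithBaseURL(baseUrl, injected, "text/html", "UTF-8", null);
    }
}
```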
How can I download a page and its images, given the URL, using Java? A similar question is already available, but when I tried its answer it saved only the page and not the images.
You'll have to download the page (HTML), then parse the HTML, find the <img> tags in there, and download the images (the src attribute of the <img> tag is the URL of the image file). Likewise you might want to download accompanying CSS or JavaScript files for the page.
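Here is a plain-Java sketch of those steps, for images only:

1. Download the page.
2. Find the src of each <img> with a crude regex.
3. Resolve each src against the page URL with URI.resolve.
4. Copy each image into an output directory with Files.copy.

The class name, the file-naming rule (last path segment, or "index" when that is empty) and the UTF-8 assumption are illustrative choices, not part of the answer above. CSS and JavaScript files could be handled the same way.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageImageDownloader {
    // Crude: the src attribute of each <img> tag, quoted or unquoted.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*[\"']?([^\"'\\s>]+)", Pattern.CASE_INSENSITIVE);

    /** Resolves every <img src> in html against the page's own URI. */
    static List<URI> imageUris(URI pageUri, String html) {
        List<URI> uris = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            uris.add(pageUri.resolve(m.group(1)));  // "p10012.gif" -> same directory as the page
        }
        return uris;
    }

    /** Copies the resource at uri into dir, named after the last path segment. */
    static Path download(URI uri, Path dir) throws IOException {
        String path = uri.getPath() == null ? "" : uri.getPath();
        String name = path.substring(path.lastIndexOf('/') + 1);
        Path target = dir.resolve(name.isEmpty() ? "index" : name);
        try (InputStream in = uri.toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        URI pageUri = URI.create(args[0]);                       // e.g. the page URL
        Path outDir = Files.createDirectories(Path.of(args[1])); // e.g. "saved-page"
        Path pageFile = download(pageUri, outDir);
        // Assumes the page is UTF-8; a real tool would honour the declared charset.
        String html = new String(Files.readAllBytes(pageFile), StandardCharsets.UTF_8);
        for (URI img : imageUris(pageUri, html)) {
            System.out.println("saved " + download(img, outDir));
        }
    }
}
```

Saving everything flat into one directory keeps the sketch short. However, images with the same file name in different folders would overwrite each other, and the saved HTML would still point at the original relative paths. Mirroring the relative directory structure avoids both problems.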
Java makes it easy to copy files from a Web site