I want to save an RSS feed to an xml document on my computer. I'm using XPath with Java to parse the XML myself, so all I want is a file that contains the source (XML) I see when I view the source of the website's RSS page.
In other words, instead of copying and pasting the source of the RSS page into a file I save as an XML file, I'd like to write a program that pulls this for me.
You don't even need a library to do that!
Simply create a URL object for the RSS feed you want to "download" and call its openConnection() method to get a URLConnection.
You can then call the connection's getInputStream() method. From this InputStream you can read the unparsed source of the RSS document (you should wrap it in a BufferedInputStream).
The source can then be held in memory as a String or written directly to disk with a FileOutputStream.
An example implementation can be found here: https://gist.github.com/2320294
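The steps above can be sketched roughly as follows. This is a minimal illustration rather than the code from the linked gist; the class name FeedDownloader and the method name saveFeed are made up for this example, and it copies raw bytes so the saved file matches what the server sent.

```java
import java.io.BufferedInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical helper class; the names are illustrative only.
class FeedDownloader {

    // Copies the raw bytes behind feedUrl into the file at targetPath.
    static void saveFeed(String feedUrl, String targetPath) throws IOException {
        URLConnection connection = URI.create(feedUrl).toURL().openConnection();
        try (InputStream in = new BufferedInputStream(connection.getInputStream());
             OutputStream out = new FileOutputStream(targetPath)) {
            byte[] buffer = new byte[8192];
            int read;
            // read() returns -1 once the end of the stream is reached
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
    }
}
```

A call such as FeedDownloader.saveFeed(feedAddress, "feed.xml") would then write the feed to feed.xml in the working directory, where feedAddress stands for whatever feed URL you want to save.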
You can use Apache Commons HttpClient to fetch the file from the web. The library is very convenient to use. Here's the official tutorial.
I have an Android app that works with large HTML files (a whole book). Reading a whole HTML file at once is not a good idea for many reasons (performance, memory usage, etc.).
I would prefer to read the file one tag at a time, if that's possible. My HTML file looks like this:
<main_tag>some text here</main_tag>
<main_tag><sub_tag>something</sub_tag><sub_tag>another thing</sub_tag></main_tag>
My main tags are h1 ... h6 and p, and I want to read my file based on these tags. All other tags are nested inside the main tags and should be read together with their main tag.
Any idea how I can achieve this? Performance is a real issue here.
All you need is the Android XML pull API; read the documentation for org.xmlpull.v1.XmlPullParser.
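To give an idea of what pull parsing looks like, here is a rough sketch. Note that it uses the JDK's StAX XMLStreamReader instead of Android's XmlPullParser so that it runs on a plain JVM; the event loop has the same shape (ask for the next event, react to start tags, text and end tags). The class name BlockReader and the method name readBlocks are invented for this example, and it assumes the file is well-formed XML with a single root element wrapped around the main tags.

```java
import java.io.Reader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.BiConsumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class; the names are illustrative only.
class BlockReader {

    private static final Set<String> MAIN_TAGS =
            new HashSet<>(Arrays.asList("h1", "h2", "h3", "h4", "h5", "h6", "p"));

    // Streams through the document and hands each main tag, together with the
    // text of everything nested inside it, to the handler as soon as it closes.
    static void readBlocks(Reader source, BiConsumer<String, String> handler)
            throws XMLStreamException {
        XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(source);
        String currentTag = null;          // main tag being collected, or null
        int depth = 0;                     // nesting depth inside the main tag
        StringBuilder text = new StringBuilder();
        while (xml.hasNext()) {
            int event = xml.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                if (currentTag != null) {
                    depth++;               // a sub tag inside the main tag
                } else if (MAIN_TAGS.contains(xml.getLocalName())) {
                    currentTag = xml.getLocalName();
                    depth = 0;
                    text.setLength(0);
                }
            } else if (event == XMLStreamConstants.CHARACTERS && currentTag != null) {
                text.append(xml.getText());
            } else if (event == XMLStreamConstants.END_ELEMENT && currentTag != null) {
                if (depth == 0) {
                    handler.accept(currentTag, text.toString());
                    currentTag = null;     // block finished, wait for the next one
                } else {
                    depth--;
                }
            }
        }
        xml.close();
    }
}
```

Only one block is held in memory at a time, which is the point of the pull approach for a book-sized file. On Android the same loop would be written against XmlPullParser's own event constants and methods.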
I have an XML file at this link:
http://nchc.dl.sourceforge.net/project/trialxml/options.xml
I have downloaded and parsed it successfully and also built a dynamic UI, but I have not used any of the predefined functions such as getFirstChild() or getNextSibling(), which leaves my parser unable to handle complex XML files with around 6-7 levels of menus.
Please help me understand how to traverse an XML file and dynamically create a UI.
Try using a DOM parser to parse your XML document.
See this link for further details:
http://tutorials.jenkov.com/java-xml/dom.html
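As a rough illustration of DOM traversal with getFirstChild() and getNextSibling(), the sketch below walks every element recursively, so the number of menu levels does not matter. The class name MenuWalker and the method name outline are invented for this example, and it only builds an indented outline of element names; creating the actual UI widgets at each step is left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative only.
class MenuWalker {

    // Parses the XML text and returns one line per element,
    // indented by two spaces per nesting level.
    static List<String> outline(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> lines = new ArrayList<>();
        walk(doc.getDocumentElement(), 0, lines);
        return lines;
    }

    private static void walk(Node node, int depth, List<String> lines) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < depth; i++) {
                line.append("  ");
            }
            lines.add(line.append(node.getNodeName()).toString());
        }
        // Visit every child in document order, whatever the depth.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, depth + 1, lines);
        }
    }
}
```

In a real app, the place where a line is added to the list is where you would create the menu entry for that element instead.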
Actually, I am attempting to extract data from a PDF file, but I didn't find any example on the internet. Is there any way I can use the JPedal library to open a PDF file and read its data?
You can use PDFBox from Apache.
I am not familiar with JPedal, but I write lots of code that generates and processes PDF files. I use iText and highly recommend it. If you have a specific question on how to process a PDF file, let me know.
Which APIs in Java help in extracting table metadata from a PDF and presenting that table in a web page?
The result should be that when the source of the page is viewed, it shows the HTML code of that table.
iText is useful in this context:
http://itextpdf.com/
I assume you need a PDF library for Java.
PDFBox is one of the popular libraries for PDF manipulation, and I think it is worth a look.
Try the Metadata Extract Tool, which extracts metadata from specific file types, including PDF. Then you can parse the XML output with any Java XML parser. Once you're able to parse it, the elements can easily be laid out in your view page.
How do I parse a PDF file and write the content to a Word file using Java?
For parsing a PDF file in Java, you can use Apache PDFBox: http://incubator.apache.org/pdfbox/
For reading/writing Word (or other Office) file formats in Java, try POI: http://poi.apache.org/
Both are free.
Try the iText Java library:
iText is an ideal library for developers looking to enhance web- and other applications with dynamic PDF document generation and/or manipulation.
It can be used for your parsing step.
As for generating Word documents, the OpenOffice Java API might be able to generate Word-compatible docs (I have no personal experience with this API).
You might want to try any of these:
http://incubator.apache.org/pdfbox/
https://pdf-renderer.dev.java.net/
Once you can read the contents of the PDF file, you can just as well store them in an ODT file or a text file. For ODT files, try http://odftoolkit.openoffice.org.
Best!
You could use iText if the source PDF is mostly text. Images and such are quite hard to handle while parsing. If it's text only, it's as easy as 10 lines of code. See the iText manual for examples.
For writing Word files there's only Apache POI. It can be a little tricky to figure out, but for such a simple task it shouldn't be a problem.