Save content of HTML in local storage - java

i want to fetch xml file from the links like
http://api.worldbank.org/countries/GBR/indicators/NY.GDP.MKTP.KD.ZG?date=2004:2012
it returns a xml file, i don't know how to save this file in my folder named "temp" using java or javascripts, actually i don't want to display this result of that link to the user, I'm generating such links dynamically.
please help!!!

I recommend you to use an HTML parser library like jsoup in this situation. Please have a look at the below steps for better under standing:
1. Download jsoup core library (jsoup-1.6.1.jar) from http://jsoup.org/download
2. Add the jsoup-1.6.1.jar file to your classpath.
3. Try the below code to save the xml file from the URL.
package com.overflow.stack;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
/**
*
* #author sarath_sivan
*/
public class XmlExtractor {
public static StringBuilder fetchXmlContent(String url) throws IOException {
StringBuilder xmlContent = new StringBuilder();
Document document = Jsoup.connect(url).get();
xmlContent.append(document.body().html());
return xmlContent;
}
public static void saveXmlFile(StringBuilder xmlContent, String saveLocation) throws IOException {
FileWriter fileWriter = new FileWriter(saveLocation);
BufferedWriter bufferedWriter = new BufferedWriter(fileWriter);
bufferedWriter.write(xmlContent.toString());
bufferedWriter.close();
System.out.println("Downloading completed successfully..!");
}
public static void downloadXml() throws IOException {
String url = "http://api.worldbank.org/countries/GBR/indicators/NY.GDP.MKTP.KD.ZG?date=2004:2012";
String saveLocation = System.getProperty("java.io.tmpdir")+"sarath.xml";
XmlExtractor.saveXmlFile(XmlExtractor.fetchXmlContent(url), saveLocation);
}
public static void main(String[] args) throws IOException {
XmlExtractor.downloadXml();
}
}
4. Once the above code is executed successfully, a file named "sarath.xml" should be there in your temp folder.
Thank you!

Well your body is XML not HTML, just retrieve it using Apache HttpClient, and pump the read InputStream to a FileOutputStream. What was the problem? Do you want to save parsed content in a formatted form?

public String execute() {
try {
String url = "http://api.worldbank.org/countries/GBR/indicators/NY.GDP.MKTP.KD.ZG?date=2004:2012";
String saveLocation = System.getProperty("java.io.tmpdir")+"sarath.xml";
XmlExtractor.saveXmlFile(XmlExtractor.fetchXmlContent(url), saveLocation);
} catch (Exception e) {
e.printStackTrace();
addActionError(e.getMessage());
}
return SUCCESS;
}

Related

how to add data to XML files in java

how can i add data to an xml file and append to it ohter data if it exists ?
i tried the following but this code only creates one node of values and does not append to the file. it always removes the existing one and add the new one to it.
import java.beans.XMLEncoder;
import java.beans.XMLDecoder;
import java.io.*;
public class Main {
public static void main(String[] args) {
Question quest = new Question("Mouhib 9a7boun ?", "EYY");
try {
FileOutputStream fos = new FileOutputStream(new File("./quest.xml"));
XMLEncoder encoder = new XMLEncoder(fos);
encoder.writeObject(quest);
encoder.close();
fos.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
If you want to append addtional data to an existing file you can use FileOutputStream(File file, boolean append).
BUT this will not lead to a valid xml-file.
You can use XML DOM editing methods:
XML DOM Add Nodes

Servlet does not open txt

Servlet is very good looking and reading files that have English names like hello.txt. It does not want to read files that have a Russian name, such pushkin.txt. Is anyone able to help to solve this problem?
Here is the code:
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
public class servlet extends HttpServlet {
/**
*
*/
private static final long serialVersionUID = 1L;
public static List<String> getFileNames(File directory, String extension) {
List<String> list = new ArrayList<String>();
File[] total = directory.listFiles();
for (File file : total) {
if (file.getName().endsWith(extension)) {
list.add(file.getName());
}
if (file.isDirectory()) {
List<String> tempList = getFileNames(file, extension);
list.addAll(tempList);
}
}
return list;
}
#SuppressWarnings("resource")
protected void doPost(HttpServletRequest request, HttpServletResponse response)
throws ServletException, IOException{
response.setContentType("text/html; charset=UTF-8");
String myName = request.getParameter("text");
List<String> files = getFileNames(new File("C:\\Users\\vany\\Desktop\\test"), "txt");
for (String string : files) {
if (myName.equals(string)) {
try {
File file = new File("C:\\Users\\vany\\Desktop\\test\\" + string);
FileReader reader = new FileReader(file);
int b;
PrintWriter writer = response.getWriter();
writer.print("<html>");
writer.print("<head>");
writer.print("<title>HelloWorld</title>");
writer.print("<body>");
writer.write("<div>");
while((b = reader.read()) != -1) {
writer.write((char) b);
}
writer.write("</div>");
writer.print("</body>");
writer.print("</html>");
}
finally {
if(reader != null) {
try{
reader.close();
} catch (Exception e) {
e.printStackTrace();
}
}
}
}
}
}
}
The question is relevant, the problem is not solved
I thought that you have a problem whith the statements
for (String string : files) {
if (myName.equals(string)) {
I would compare in this way
for (File file: files) {
if (myName.equals(file.getName())) {
I hope that it help you.
Note: Thanks for the comments, you can try it.
Greetings
First of all I would use a debugger to check what's wrong with that code. It's quite difficult to find a bug without running the code. If you don't want to use a debugger print out all filenames found in the directory to ensure that some file names were found:
for (String string : files) {
System.out.println(string)
....
If files were found I would check whether I have rights to write to them. It might be that the application has not proper permissions to write in selected directory.
Are files "hello.txt" and pushkin.txt directly inside "C:\Users\vany\Desktop\test\" folder? Or is pushkin.txt file in another folder from "C:\Users\vany\Desktop\test\"?
Can you show us how you invoke the servlet?
If you have pushkin.txt in another folder and you invoke the servlet with something like "folder\pushkin.txt" it will not work because getFileNames() returns file names (without folder) and "myName.equals(string)" fails as "folder\pushkin.txt" is not equal to "pushkin.txt"

Convert DOC file to DOCX with Java

I need to use DOCX files (actually the XML contained in them) in a Java software I'm currently developing, but some people in my company still use the DOC format.
Do you know if there is a way to convert a DOC file to the DOCX format using Java ? I know it's possible using C#, but that's not an option
I googled it, but nothing came up...
Thanks
You may try Aspose.Words for Java. It allows you to load a DOC file and save it as DOCX format. The code is very simple as shown below:
// Open a document.
Document doc = new Document("input.doc");
// Save document.
doc.save("output.docx");
Please see if this helps in your scenario.
Disclosure: I work as developer evangelist at Aspose.
Check out JODConverter to see if it fits the bill. I haven't personally used it.
Use newer versions of jars jodconverter-core-4.2.2.jar and jodconverter-local-4.2.2.jar
String inputFile = "*.doc";
String outputFile = "*.docx";
LocalOfficeManager localOfficeManager = LocalOfficeManager.builder()
.install()
.officeHome(getDefaultOfficeHome()) //your path to openoffice
.build();
try {
localOfficeManager.start();
final DocumentFormat format
= DocumentFormat.builder()
.from(DefaultDocumentFormatRegistry.DOCX)
.build();
LocalConverter
.make()
.convert(new FileInputStream(new File(inputFile)))
.as(DefaultDocumentFormatRegistry.getFormatByMediaType("application/msword"))
.to(new File(outputFile))
.as(format)
.execute();
} catch (OfficeException ex) {
Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
} catch (FileNotFoundException ex) {
Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
} finally {
OfficeUtils.stopQuietly(localOfficeManager);
}
JODConvertor calls OpenOffice/LibreOffice via a network protocol. It can therefore 'do anything you can do in OpenOffice'. This includes converting formats. But it only does as good a job as whatever version of OpenOffice you are running. I have some art in one of my docs, and it doesn't convert them as I hoped.
JODConvertor is no longer supported, according to the google code web site for v3.
To get JOD to do the job you need to do something like
private static void transformBinaryWordDocToDocX(File in, File out)
{
OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager);
DocumentFormat docx = converter.getFormatRegistry().getFormatByExtension("docx");
docx.setStoreProperties(DocumentFamily.TEXT,
Collections.singletonMap("FilterName", "MS Word 2007 XML"));
converter.convert(in, out, docx);
}
private static void transformBinaryWordDocToW2003Xml(File in, File out)
{
OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager);;
DocumentFormat w2003xml = new DocumentFormat("Microsoft Word 2003 XML", "xml", "text/xml");
w2003xml.setInputFamily(DocumentFamily.TEXT);
w2003xml.setStoreProperties(DocumentFamily.TEXT, Collections.singletonMap("FilterName", "MS Word 2003 XML"));
converter.convert(in, out, w2003xml);
}
private static OfficeManager officeManager;
#BeforeClass
public static void setupStatic() throws IOException {
/*officeManager = new DefaultOfficeManagerConfiguration()
.setOfficeHome("C:/Program Files/LibreOffice 3.6")
.buildOfficeManager();
*/
officeManager = new ExternalOfficeManagerConfiguration().setConnectOnStart(true).setPortNumber(8100).buildOfficeManager();
officeManager.start();
}
#AfterClass
public static void shutdownStatic() throws IOException {
officeManager.stop();
}
For this to work you need to be running LibreOffice as a networked server ( I could not get the 'run on demand' part of JODConvertor to work under windows with LO 3.6 very well )
To convert DOC file to HTML look at this
(Convert Word doc to HTML programmatically in Java)
Use this: http://poi.apache.org/
Or use this :
XWPFDocument docx = new XWPFDocument(OPCPackage.openOrCreate(new File("hello.docx")));
XWPFWordExtractor wx = new XWPFWordExtractor(docx);
String text = wx.getText();
System.out.println("text = "+text);
I needed the same conversion ,after researching a lot found Jodconvertor can be useful in it , you can download the jar from
https://code.google.com/p/jodconverter/downloads/list
Add jodconverter-core-3.0-beta-4-sources.jar file to your project lib
//1) Create OfficeManger Object
OfficeManager officeManager = new DefaultOfficeManagerConfiguration()
.setOfficeHome(new File("/opt/libreoffice4.4"))
.buildOfficeManager();
officeManager.start();
// 2) Create JODConverter converter
OfficeDocumentConverter converter = new OfficeDocumentConverter(
officeManager);
// 3)Create DocumentFormat for docx
DocumentFormat docx = converter.getFormatRegistry().getFormatByExtension("docx");
docx.setStoreProperties(DocumentFamily.TEXT,
Collections.singletonMap("FilterName", "MS Word 2007 XML"));
//4)Call convert funtion in converter object
converter.convert(new File("doc/AdvancedTable.doc"), new File(
"docx/AdvancedTable.docx"), docx);
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
public class TestCon {
/**
* #param args
*/
public static void main(String[] args) {
// TODO Auto-generated method stub
POIFSFileSystem fs = null;
Document document = new Document();
try {
System.out.println("Starting the test");
fs = new POIFSFileSystem(new FileInputStream("C:/Users/312845/Desktop/a.doc"));
HWPFDocument doc = new HWPFDocument(fs);
WordExtractor we = new WordExtractor(doc);
OutputStream file = new FileOutputStream(new File("C:/Users/312845/Desktop/test.docx"));
System.out.println("Document testing completed");
} catch (Exception e) {
System.out.println("Exception during test");
e.printStackTrace();
} finally {
// close the document
document.close();
}
}
}

Identifying file type in Java

Please help me to find out the type of the file which is being uploaded.
I wanted to distinguish between excel type and csv.
MIMEType returns same for both of these file. Please help.
I use Apache Tika which identifies the filetype using magic byte patterns and globbing hints (the file extension) to detect the MIME type. It also supports additional parsing of file contents (which I don't really use).
Here is a quick and dirty example on how Tika can be used to detect the file type without performing any additional parsing on the file:
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.HashMap;
import org.apache.tika.metadata.HttpHeaders;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.metadata.TikaMetadataKeys;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.xml.sax.helpers.DefaultHandler;
public class Detector {
public static void main(String[] args) throws Exception {
File file = new File("/pats/to/file.xls");
AutoDetectParser parser = new AutoDetectParser();
parser.setParsers(new HashMap<MediaType, Parser>());
Metadata metadata = new Metadata();
metadata.add(TikaMetadataKeys.RESOURCE_NAME_KEY, file.getName());
InputStream stream = new FileInputStream(file);
parser.parse(stream, new DefaultHandler(), metadata, new ParseContext());
stream.close();
String mimeType = metadata.get(HttpHeaders.CONTENT_TYPE);
System.out.println(mimeType);
}
}
I hope this will help. Taken from an example not from mine:
import javax.activation.MimetypesFileTypeMap;
import java.io.File;
class GetMimeType {
public static void main(String args[]) {
File f = new File("test.gif");
System.out.println("Mime Type of " + f.getName() + " is " +
new MimetypesFileTypeMap().getContentType(f));
// expected output :
// "Mime Type of test.gif is image/gif"
}
}
Same may be true for excel and csv types. Not tested.
I figured out a cheaper way of doing this with java.nio.file.Files
public String getContentType(File file) throws IOException {
return Files.probeContentType(file.toPath());
}
- or -
public String getContentType(Path filePath) throws IOException {
return Files.probeContentType(filePath);
}
Hope that helps.
Cheers.
A better way without using javax.activation.*:
URLConnection.guessContentTypeFromName(f.getAbsolutePath()));
If you are already using Spring this works for csv and excel:
import org.springframework.mail.javamail.ConfigurableMimeFileTypeMap;
import javax.activation.FileTypeMap;
import java.io.IOException;
public class ContentTypeResolver {
private FileTypeMap fileTypeMap;
public ContentTypeResolver() {
fileTypeMap = new ConfigurableMimeFileTypeMap();
}
public String getContentType(String fileName) throws IOException {
if (fileName == null) {
return null;
}
return fileTypeMap.getContentType(fileName.toLowerCase());
}
}
or with javax.activation you can update the mime.types file.
The CSV will start with text and the excel type is most likely binary.
However the simplest approach is to try to load the excel document using POI. If this fails try to load the file as a CSV, if that fails its possibly neither type.

Why am I getting a NullPointerException when trying to read a file?

I use this test to convert txt to pdf :
package convert.pdf;
//getResourceAsStream(String name) : Returns an input stream for reading the specified resource.
//toByteArray : Get the contents of an InputStream as a byte[].
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.commons.io.IOUtils;
import convert.pdf.txt.TextConversion;
public class TestConversion {
private static byte[] readFilesInBytes(String file) throws IOException {
return IOUtils.toByteArray(TestConversion.class.getResourceAsStream(file));
}
private static void writeFilesInBytes(byte[] file, String name) throws IOException {
IOUtils.write(file, new FileOutputStream(name));
}
//just change the extensions and test conversions
public static void main(String args[]) throws IOException {
ConversionToPDF algorithm = new TextConversion();
byte[] file = readFilesInBytes("/convert/pdf/text.txt");
byte[] pdf = algorithm.convertDocument(file);
writeFilesInBytes(pdf, "text.pdf");
}
}
Problem:
Exception in thread "main" java.lang.NullPointerException
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1025)
at org.apache.commons.io.IOUtils.copy(IOUtils.java:999)
at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:218)
at convert.pdf.TestConversion.readFilesInBytes(TestConversion.java:17)
at convert.pdf.TestConversion.main(TestConversion.java:28)
I use the debugger, and the problem seems to be located here :
private static byte[] readFilesInBytes(String file) throws IOException {
return IOUtils.toByteArray(TestConversion.class.getResourceAsStream(file));
}
What is my problem?
Sounds like the resource probably doesn't exist with that name.
Are you aware that Class.getResourceAsStream() finds a resource relative to that class's package, whereas ClassLoader.getResourceAsStream() doesn't? You can use a leading forward slash in Class.getResourceAsStream() to mimic this, so
Foo.class.getResourceAsStream("/bar.png")
is roughly equivalent to
Foo.class.getClassLoader().getResourceAsStream("bar.png")
Is this actually a file (i.e. a specific file on the normal file system) that you're trying to load? If so, using FileInputStream would be a better bet. Use Class.getResourceAsStream() if it's a resource bundled in a jar file or in the classpath in some other way; use FileInputStream if it's an arbitrary file which could be anywhere in the file system.
EDIT: Another thing to be careful of, which has caused me problems before now - if this has worked on your dev box which happens to be Windows, and is now failing on a production server which happens to be Unix, check the case of the filename. The fact that different file systems handle case-sensitivity differently can be a pain...
Are you checking to see if the file exists before you pass it to readFilesInBytes()? Note that Class.getResourceAsStream() returns null if the file cannot be found. You probably want to do:
private static byte[] readFilesInBytes(String file) throws IOException {
File testFile = new File(file);
if (!testFile.exists()) {
throw new FileNotFoundException("File " + file + " does not exist");
}
return IOUtils.toByteArray(TestConversion.class.getResourceAsStream(file));
}
or better yet:
private static byte[] readFilesInBytes(String file) throws IOException {
InputStream stream = TestConversion.class.getResourceAsStream(file);
if (stream == null) {
throw new FileNotFoundException("readFilesInBytes: File " + file
+ " does not exist");
}
return IOUtils.toByteArray(stream);
}
This class reads a TXT file in the classpath and uses TextConversion to convert to PDF, then save the pdf in the file system.
Here TextConversion code :
package convert.pdf.txt;
//Conversion to PDF from text using iText.
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import convert.pdf.ConversionToPDF;
import convert.pdf.ConvertDocumentException;
import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Font;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;
public class TextConversion implements ConversionToPDF {
public byte[] convertDocument(byte[] documents) throws ConvertDocumentException {
try {
return this.convertInternal(documents);
} catch (DocumentException e) {
throw new ConvertDocumentException(e);
} catch (IOException e) {
throw new ConvertDocumentException(e);
}
}
private byte[] convertInternal(byte[] documents) throws DocumentException, IOException {
Document document = new Document();
ByteArrayOutputStream pdfResultBytes = new ByteArrayOutputStream();
PdfWriter.getInstance(document, pdfResultBytes);
document.open();
BufferedReader reader = new BufferedReader( new InputStreamReader( new ByteArrayInputStream(documents) ) );
String line = "";
while ((line = reader.readLine()) != null) {
if ("".equals(line.trim())) {
line = "\n"; //white line
}
Font fonteDefault = new Font(Font.COURIER, 10);
Paragraph paragraph = new Paragraph(line, fonteDefault);
document.add(paragraph);
}
reader.close();
document.close();
return pdfResultBytes.toByteArray();
}
}
And here the code to ConversionToPDF :
package convert.pdf;
// Interface implemented by the conversion algorithms.
public interface ConversionToPDF {
public byte[] convertDocument(byte[] documentToConvert) throws ConvertDocumentException;
}
I think the problem come from my file system (devbox on windows and server is Unix).
I will try to modify my classpath.
This problem may be caused by calling methods on test.txt, which can be a folder shortcut. In other words, you're calling a method on a file that doesn't exist, resulting in a NullPointerException.

Categories

Resources