I am converting a .doc file into PDF format on Android using the following libraries:
itext-1.4.8.jar
poi-3.0-FINAL.jar
poi-scratchpad-3.2-FINAL.jar
Here is my sample code:
package com.example.converter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import android.content.Context;
import android.os.Environment;
import android.widget.LinearLayout;
import com.lowagie.text.Document;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
public class TestCon extends LinearLayout {
    // AssetManager.open() returns a plain InputStream, so no FileInputStream cast
    InputStream infile;
    private static String FILE = Environment.getExternalStorageDirectory()
            + "/MyReport.pdf";
    public TestCon(Context context) {
        super(context);
        my_method(context);
    }
    public void my_method(Context context) {
        POIFSFileSystem fs = null;
        Document document = new Document();
        try {
            infile = context.getApplicationContext().getAssets().open("test.doc");
        } catch (IOException e1) {
            e1.printStackTrace();
            return; // nothing to convert without the input file
        }
        try {
            System.out.println("Starting the test");
            fs = new POIFSFileSystem(infile);
            HWPFDocument doc = new HWPFDocument(fs);
            WordExtractor we = new WordExtractor(doc);
            OutputStream file = new FileOutputStream(FILE);
            PdfWriter writer = PdfWriter.getInstance(document, file);
            Range range = doc.getRange();
            document.open();
            writer.setPageEmpty(true);
            document.newPage();
            writer.setPageEmpty(true);
            String[] paragraphs = we.getParagraphText();
            for (int i = 0; i < paragraphs.length; i++) {
                org.apache.poi.hwpf.usermodel.Paragraph pr = range
                        .getParagraph(i);
                // CharacterRun run = pr.getCharacterRun(i);
                // run.setBold(true);
                // run.setCapitalized(true);
                // run.setItalic(true);
                // Strip line breaks from the paragraph text
                paragraphs[i] = paragraphs[i].replaceAll("\\cM?\r?\n", "");
                System.out.println("Length:" + paragraphs[i].length());
                System.out.println("Paragraph" + i + ": "
                        + paragraphs[i].toString());
                // add the paragraph to the document
                document.add(new Paragraph(paragraphs[i]));
            }
            System.out.println("Document testing completed");
        } catch (Exception e) {
            System.out.println("Exception during test");
            e.printStackTrace();
        } finally {
            // close the document (iText also closes the PDF output stream here)
            document.close();
        }
    }
}
But I am getting this error:
[2013-05-10 12:39:12 - Dex Loader] Unable to execute dex: Multiple dex files define Lorg/apache/poi/generator/FieldIterator;
[2013-05-10 12:39:12 - converter] Conversion to Dalvik format failed: Unable to execute dex: Multiple dex files define Lorg/apache/poi/generator/FieldIterator;
I have removed android-support-v4.jar from the libs folder, according to this answer about the error, but I am still getting the same error :(
Please help me solve this issue.
Has anyone done doc-to-PDF conversion? If so, please share your code.
I will be very thankful :)
Regards
The problem is that you are including something twice or more:
Multiple dex files define Lorg/apache/poi/generator/FieldIterator
Review your build path for duplicated libraries.
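To see which jars actually collide, a small JDK-only sketch can scan a folder of jars (assumed here to be the project's libs directory; pass another path as the first argument) and print every .class entry that appears in more than one jar. DuplicateClassFinder is a hypothetical helper, not part of the Android tooling.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class DuplicateClassFinder {
    /** Maps each .class entry to the jars containing it, keeping only entries found in 2+ jars. */
    public static Map<String, List<String>> findDuplicates(File libDir) throws IOException {
        Map<String, List<String>> owners = new TreeMap<>();
        File[] jars = libDir.listFiles((dir, name) -> name.endsWith(".jar"));
        if (jars == null) {
            return Collections.emptyMap(); // not a directory
        }
        Arrays.sort(jars); // deterministic jar order in the output
        for (File jar : jars) {
            try (JarFile jf = new JarFile(jar)) {
                Enumeration<JarEntry> entries = jf.entries();
                while (entries.hasMoreElements()) {
                    String name = entries.nextElement().getName();
                    if (name.endsWith(".class")) {
                        owners.computeIfAbsent(name, k -> new ArrayList<>()).add(jar.getName());
                    }
                }
            }
        }
        // Drop classes defined by a single jar; what remains are the duplicates
        owners.values().removeIf(list -> list.size() < 2);
        return owners;
    }

    public static void main(String[] args) throws IOException {
        File libDir = new File(args.length > 0 ? args[0] : "libs");
        findDuplicates(libDir).forEach((cls, jars) -> System.out.println(cls + " -> " + jars));
    }
}
```

Running it against the folder that holds the POI jars should show which pair defines org/apache/poi/generator/FieldIterator.class; keeping only one of them (ideally a matched set of POI jars from a single release) is the usual fix.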
In addition, once this is resolved, you'll probably have to add this line to the project.properties file:
dex.force.jumbo=true
This enables the dex tool's jumbo mode, which works around the 65,536 limit on string references for a while (it does not lift the separate 65,536-method limit).
Related
I am trying to convert HTML to PDF using openhtmltopdf, which I added as a Maven dependency.
Then I wrote the Main class (example below). The problem is that I need landscape page orientation and a way to adjust the font (right now everything runs off the page). What should I do?
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.FileSystems;
import org.jsoup.Jsoup;
import org.jsoup.helper.W3CDom;
import org.jsoup.nodes.Document;
import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;
public class Main {
    public static void main(String[] args) {
        try {
            // HTML file - Input
            File inputHTML = new File("D:\\Загрузки от 13.05\\htmltopdf\\4874063.html");
            // Converted PDF file - Output
            String outputPdf = "D:\\Загрузки от 13.05\\htmltopdf\\Test.pdf";
            Main htmlToPdf = new Main();
            // Create well-formed HTML
            org.w3c.dom.Document doc = htmlToPdf.createWellFormedHtml(inputHTML);
            System.out.println("Starting conversion to PDF...");
            htmlToPdf.xhtmlToPdf(doc, outputPdf);
        } catch (IOException e) {
            System.out.println("Error while converting HTML to PDF " + e.getMessage());
            e.printStackTrace();
        }
    }
    // Creating a well-formed document
    private org.w3c.dom.Document createWellFormedHtml(File inputHTML) throws IOException {
        Document document = Jsoup.parse(inputHTML, "UTF-8");
        document.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
        System.out.println("HTML parsing done...");
        return new W3CDom().fromJsoup(document);
    }
    private void xhtmlToPdf(org.w3c.dom.Document doc, String outputPdf) throws IOException {
        // Base URI used to resolve relative resources (images, CSS, fonts)
        String baseUri = FileSystems.getDefault()
                .getPath("F:/", "Anshu/NetJs/Programs/", "src/main/resources/template")
                .toUri()
                .toString();
        // try-with-resources closes the stream even if rendering fails
        try (OutputStream os = new FileOutputStream(outputPdf)) {
            PdfRendererBuilder builder = new PdfRendererBuilder();
            builder.withUri(outputPdf);
            builder.toStream(os);
            // add external font
            //builder.useFont(new File(getClass().getClassLoader().getResource("fonts/PRISTINA.ttf").getFile()), "PRISTINA");
            builder.withW3cDocument(doc, baseUri);
            builder.run();
        }
        System.out.println("PDF creation completed");
    }
}
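openhtmltopdf follows CSS paged media, so one way to get a landscape page and smaller text is an @page rule plus a body font size in a stylesheet. A minimal sketch, assuming the HTML is read into a String first: it injects a style block just before </head> (or prepends it when there is no head, relying on jsoup's lenient parser to move it). LandscapeCss is a hypothetical name, and the A4 size, 10 mm margins, and 10pt font are placeholder values to tune.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LandscapeCss {
    // Assumed values: A4 landscape, 10 mm margins, 10pt body text; adjust for your HTML.
    static final String PAGE_CSS = "<style>"
            + "@page { size: A4 landscape; margin: 10mm; }"
            + " body { font-size: 10pt; }"
            + " img, table { max-width: 100%; }"
            + "</style>";

    // Case-insensitive match so both </head> and </HEAD> are found
    private static final Pattern HEAD_END = Pattern.compile("</head>", Pattern.CASE_INSENSITIVE);

    /** Inserts PAGE_CSS just before the first </head>; if there is none, prepends it. */
    static String injectPageCss(String html) {
        Matcher m = HEAD_END.matcher(html);
        if (m.find()) {
            return html.substring(0, m.start()) + PAGE_CSS + html.substring(m.start());
        }
        return PAGE_CSS + html;
    }
}
```

The result can then go through Jsoup.parse(String) instead of Jsoup.parse(File, String); with jsoup you could equally append a style element to document.head(). Some openhtmltopdf versions also offer a default page size setting on the builder; check the API of the version you use.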
I tried to read a .docx file in Java.
But I am getting the errors "The constructor XWPFDocument(FileInputStream) is undefined" at line 16 (the new XWPFDocument(fis) statement) and "Type mismatch: cannot convert from XWPFParagraph[] to List" at line 18 (the document.getParagraphs() statement).
Below is my code.
Classes used (from the POI jars):
org.apache.poi.xwpf.usermodel.XWPFDocument;
org.apache.poi.xwpf.usermodel.XWPFParagraph;
Can anyone please tell me why I am getting these errors and how to resolve them?
Thanks in advance!
package com.readindDocx;
import java.io.File;
import java.io.FileInputStream;
import java.util.List;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
public class ReadingDocument {
    public static void main(String[] args) {
        try {
            File file = new File("D:/SampleWordFile.docx");
            FileInputStream fis = new FileInputStream(file.getAbsolutePath());
            XWPFDocument document = new XWPFDocument(fis);
            List<XWPFParagraph> paragraphs = document.getParagraphs();
            for (XWPFParagraph para : paragraphs) {
                System.out.println(para.getText());
            }
            fis.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
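Both messages mean the XWPF classes on the classpath expose a different API from the one the code was written against (for example, a getParagraphs() that returns an array), which points to a POI version mismatch; in current POI releases XWPFDocument has an InputStream constructor and getParagraphs() returns List<XWPFParagraph>. Independently of POI, if only the paragraph text is needed, a .docx is a ZIP archive whose body lives in word/document.xml, so a JDK-only sketch can read it directly. This is a swapped-in technique, not POI: it concatenates the w:t text of each w:p paragraph, ignores tabs, breaks, headers, and footnotes, and may count text-box paragraphs twice. DocxParagraphs is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DocxParagraphs {
    // WordprocessingML main namespace used by <w:p> (paragraph) and <w:t> (text) elements
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static List<String> readParagraphs(String path) throws Exception {
        try (ZipFile zip = new ZipFile(path)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("Not a .docx: word/document.xml is missing");
            }
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for getElementsByTagNameNS
            Document doc;
            try (InputStream in = zip.getInputStream(entry)) {
                doc = dbf.newDocumentBuilder().parse(in);
            }
            List<String> paragraphs = new ArrayList<>();
            NodeList ps = doc.getElementsByTagNameNS(W_NS, "p");
            for (int i = 0; i < ps.getLength(); i++) {
                // A paragraph's text is split across runs; join all its <w:t> pieces
                NodeList texts = ((Element) ps.item(i)).getElementsByTagNameNS(W_NS, "t");
                StringBuilder sb = new StringBuilder();
                for (int j = 0; j < texts.getLength(); j++) {
                    sb.append(texts.item(j).getTextContent());
                }
                paragraphs.add(sb.toString());
            }
            return paragraphs;
        }
    }

    public static void main(String[] args) throws Exception {
        for (String p : readParagraphs("D:/SampleWordFile.docx")) {
            System.out.println(p);
        }
    }
}
```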
I'm currently using the method below to take screenshots and store them in a folder called 'Screenshots'. What I want is to take these screenshots and paste them into a Word document, grouped by the test case they belong to.
Is this possible? If so, could somebody please guide me?
public String FailureScreenshotAndroid(String name) {
    try {
        Date d = new Date();
        String date = d.toString().replace(":", "_").replace(" ", "_");
        TakesScreenshot t = (TakesScreenshot) driver;
        File f1 = t.getScreenshotAs(OutputType.FILE); // Temporary location
        String permanentLocation = System.getProperty("user.dir") + "\\Screenshots\\" + name + date + ".png";
        File f2 = new File(permanentLocation);
        FileUtils.copyFile(f1, f2);
        return permanentLocation;
    } catch (Exception e) {
        String msg = e.getMessage();
        return msg;
    }
}
Try the following:
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.Toolkit;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.util.concurrent.TimeUnit;
import javax.imageio.ImageIO;
import org.apache.poi.util.Units;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFRun;
public class TakeScreenshots {
    public static void main(String[] args) {
        try {
            XWPFDocument docx = new XWPFDocument();
            XWPFRun run = docx.createParagraph().createRun();
            // The folder d:/xyz/ must already exist
            FileOutputStream out = new FileOutputStream("d:/xyz/doc1.docx");
            for (int counter = 1; counter <= 5; counter++) {
                captureScreenShot(run);
                TimeUnit.SECONDS.sleep(1);
            }
            docx.write(out);
            out.flush();
            out.close();
            docx.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    // Captures the full screen to a temporary PNG, adds it to the run, then deletes the file
    public static void captureScreenShot(XWPFRun run) throws Exception {
        String screenshot_name = System.currentTimeMillis() + ".png";
        BufferedImage image = new Robot()
                .createScreenCapture(new Rectangle(Toolkit.getDefaultToolkit().getScreenSize()));
        File file = new File("d:/xyz/" + screenshot_name);
        ImageIO.write(image, "png", file);
        InputStream pic = new FileInputStream(file);
        run.addBreak();
        // Width and height are 350 points each, converted to EMUs as addPicture expects
        run.addPicture(pic, XWPFDocument.PICTURE_TYPE_PNG, screenshot_name, Units.toEMU(350), Units.toEMU(350));
        pic.close();
        file.delete();
    }
}
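The answer puts every picture into one run, so it does not yet group screenshots by test case. One option is to record which screenshots each test case produced and, when writing the document, emit the test-case name before its pictures. ScreenshotRegistry below is a hypothetical JDK-only helper for the bookkeeping part; how you obtain the current test name (for example from a TestNG or JUnit listener) is left open.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: remembers which screenshots belong to which test case, in order. */
public class ScreenshotRegistry {
    // LinkedHashMap keeps test cases in the order they were first recorded
    private final Map<String, List<String>> byTestCase = new LinkedHashMap<>();

    /** Records a screenshot path, e.g. the value returned by FailureScreenshotAndroid. */
    public void record(String testCase, String screenshotPath) {
        byTestCase.computeIfAbsent(testCase, k -> new ArrayList<>()).add(screenshotPath);
    }

    /** Test cases in first-seen order, each with its screenshots in capture order. */
    public Map<String, List<String>> grouped() {
        return Collections.unmodifiableMap(byTestCase);
    }
}
```

When building the .docx, iterate over grouped(): for each entry, create a paragraph whose run gets the test-case name via XWPFRun.setText(...), then add that test case's images with run.addPicture(...) as in the answer.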
I wrote the following code to fetch results as XML responses and write some of their content to a file from Java. It does this by receiving an XML response for each of about 700,000 queries to a public database.
However, before the code can write to the file, it is stopped by a seemingly random exception (from the server) at a random point. I tried writing to the file from inside the for loop itself, but was not able to. So I tried storing the chunks from the received responses in a Java HashMap and writing the HashMap to the file in a single call. But before the code receives all the responses in the for loop and stores them in the HashMap, it stops with an exception (sometimes as late as the 15,000th iteration). Is there a more efficient way to write to a file in Java when this many iterations are needed to fetch the data?
The local file that I use for this code is here.
My code is:
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.StringWriter;
import java.net.URL;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;
import org.json.XML;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
public class random {
    static FileWriter fileWriter;
    static PrintWriter writer;
    public static void main(String[] args) {
        // HashMap to store the MeSH values for each PMID
        Map<String, String> universalMeSHMap = new HashMap<String, String>();
        try {
            // FileWriter for MeSH terms (append mode)
            fileWriter = new FileWriter("/home/user/eclipse-workspace/pmidtomeshConverter/src/main/resources/outputFiles/pmidMESH.txt", true);
            writer = new PrintWriter(fileWriter);
            // Read the PMIDs from this file
            String filePath = "file_attached_to_Post.txt";
            String line = null;
            BufferedReader bufferedReader = new BufferedReader(new FileReader(filePath));
            String[] pmidsAll = null;
            int x = 0;
            try {
                // Read only the first line; it holds all the comma-separated PMIDs
                while (((line = bufferedReader.readLine()) != null) && x < 1) {
                    pmidsAll = line.split(",");
                    x++;
                }
            } finally {
                bufferedReader.close();
            }
            // List of strings containing the PMIDs
            List<String> pmidList = Arrays.asList(pmidsAll);
            // Iterate through the PMIDs and fetch each XML record from PubMed via the eUtilities API
            for (int i = 0; i < pmidList.size(); i++) {
                String baseURL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&rettype=abstract&id=";
                // Strip brackets, backslashes, and quotes from the PMID
                String indPMID_p0 = pmidList.get(i).toString().replace("[", "");
                String indPMID_p1 = indPMID_p0.replace("]", "");
                String indPMID_p2 = indPMID_p1.replace("\\", "");
                String indPMID_p3 = indPMID_p2.replace("\"", "");
                // Fetch the XML response from eUtilities into a Document object
                Document doc = parseXML(new URL(baseURL + indPMID_p3));
                // Convert the retrieved XML into a Java String
                String xmlString = xml2String(doc);
                // Convert the Java String into a JSON object
                JSONObject jsonWithMeSH = XML.toJSONObject(xmlString);
                // -------------------------------------------------------------------
                // Getting the MeSH terms from a JSON Object
                // -------------------------------------------------------------------
                JSONObject ind_MeSH = jsonWithMeSH.getJSONObject("PubmedArticleSet").getJSONObject("PubmedArticle").getJSONObject("MedlineCitation");
                // List to store multiple MeSH types
                List<String> list_MeSH = new ArrayList<String>();
                if (ind_MeSH.has("MeshHeadingList")) {
                    for (int j = 0; j < ind_MeSH.getJSONObject("MeshHeadingList").getJSONArray("MeshHeading").length(); j++) {
                        list_MeSH.add(ind_MeSH.getJSONObject("MeshHeadingList").getJSONArray("MeshHeading").getJSONObject(j).getJSONObject("DescriptorName").get("content").toString());
                    }
                } else {
                    list_MeSH.add("null");
                }
                universalMeSHMap.put(indPMID_p3, String.join("\t", list_MeSH));
                writer.write(indPMID_p3 + ":" + String.join("\t", list_MeSH) + "\n");
                System.out.println("Completed iteration for " + i + " PMID");
            }
            // Write the collected map to the file (each line was also written inside the loop above)
            for (Map.Entry<String, String> entry : universalMeSHMap.entrySet()) {
                writer.append(entry.getKey() + ":" + entry.getValue() + "\n");
            }
            System.out.print("Completed writing the file");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (TransformerException e) {
            e.printStackTrace();
        } finally {
            // writer is null if the output file could not be opened
            if (writer != null) {
                writer.flush();
                writer.close();
            }
        }
    }
    private static String xml2String(Document doc) throws TransformerException {
        TransformerFactory transfac = TransformerFactory.newInstance();
        Transformer trans = transfac.newTransformer();
        trans.setOutputProperty(OutputKeys.METHOD, "xml");
        trans.setOutputProperty(OutputKeys.INDENT, "yes");
        trans.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", Integer.toString(2));
        StringWriter sw = new StringWriter();
        StreamResult result = new StreamResult(sw);
        DOMSource source = new DOMSource(doc.getDocumentElement());
        trans.transform(source, result);
        String xmlString = sw.toString();
        return xmlString;
    }
    private static Document parseXML(URL url) throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        // Close the HTTP stream after each request
        try (InputStream in = url.openStream()) {
            Document doc = db.parse(in);
            doc.getDocumentElement().normalize();
            return doc;
        }
    }
    private static String readAll(Reader rd) throws IOException {
        StringBuilder sb = new StringBuilder();
        int cp;
        while ((cp = rd.read()) != -1) {
            sb.append((char) cp);
        }
        return sb.toString();
    }
    public static JSONObject readJsonFromUrl(String url) throws IOException, JSONException {
        InputStream is = new URL(url).openStream();
        try {
            BufferedReader rd = new BufferedReader(new InputStreamReader(is, Charset.forName("UTF-8")));
            String jsonText = readAll(rd);
            JSONObject json = new JSONObject(jsonText);
            return json;
        } finally {
            is.close();
        }
    }
}
This is what it prints on the console before the exception:
Completed iteration for 0 PMID
Completed iteration for 1 PMID
Completed iteration for 2 PMID
Completed iteration for 3 PMID
Completed iteration for 4 PMID
Completed iteration for 5 PMID
It keeps writing until the exception below appears.
So at a random point in the loop, I get this exception:
java.io.FileNotFoundException: https://dtd.nlm.nih.gov/ncbi/pubmed/out/pubmed_190101.dtd
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1890)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:263)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:647)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1304)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(XMLEntityManager.java:1270)
at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(XMLDTDScannerImpl.java:264)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(XMLDocumentScannerImpl.java:1161)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(XMLDocumentScannerImpl.java:1045)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:959)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:842)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:771)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:243)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
at pmidtomeshConverter.Convert2MeSH.parseXML(Convert2MeSH.java:240)
at pmidtomeshConverter.Convert2MeSH.main(Convert2MeSH.java:121)
You want your parser to ignore the DTD when parsing.
Use this feature:
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
See the Xerces documentation for other features.
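A quick way to confirm the feature works without contacting NCBI is to parse a document whose DOCTYPE points at an unreachable DTD. In this sketch the example.invalid host is a reserved name that never resolves, so parsing only succeeds if the external DTD is skipped; NoExternalDtd is a hypothetical class name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

public class NoExternalDtd {
    // Configure the factory once and reuse it for every document
    private static final DocumentBuilderFactory DBF = createFactory();

    private static DocumentBuilderFactory createFactory() {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        try {
            // Xerces feature from the answer: do not fetch the DTD named in the DOCTYPE
            dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Parser does not support the DTD feature", e);
        }
        return dbf;
    }

    public static Document parse(String xml) throws Exception {
        return DBF.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE PubmedArticleSet SYSTEM \"https://example.invalid/pubmed.dtd\">"
                + "<PubmedArticleSet><PubmedArticle/></PubmedArticleSet>";
        Document doc = parse(xml);
        System.out.println(doc.getDocumentElement().getTagName()); // prints "PubmedArticleSet"
    }
}
```

DocumentBuilderFactory is not guaranteed to be thread-safe, so a shared instance like this suits the single-threaded loop in the question.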
There is no need to use a Map; just write directly to the file. For better performance use a BufferedWriter.
I would also check that the server doesn't impose a rate limit or anything of that nature (you can guess that from the error you're getting). When parsing or downloading fails, save the response to a separate file; that way you will be able to diagnose the issue better.
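If the failures turn out to be transient server errors, retrying with a growing delay often gets past them. Retry.withBackoff below is a hypothetical JDK-only helper: it retries only on IOException (which includes the FileNotFoundException in the question's stack trace) and doubles the wait after each failed attempt. The attempt count and base delay are assumptions to tune.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public final class Retry {
    private Retry() {}

    /**
     * Calls task up to maxAttempts times, sleeping baseDelayMillis, then 2x, 4x, ...
     * between failed attempts. Only IOException is retried; other exceptions propagate
     * immediately. After the last failed attempt, the last IOException is rethrown.
     */
    public static <T> T withBackoff(Callable<T> task, int maxAttempts, long baseDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(baseDelayMillis << (attempt - 1)); // exponential backoff
                }
            }
        }
        throw last;
    }
}
```

In the question's loop this would wrap the download, for example Retry.withBackoff(() -> parseXML(new URL(baseURL + indPMID_p3)), 5, 1000). NCBI's E-utilities usage guidelines also ask clients to stay at or below about three requests per second without an API key, so a short pause between requests may avoid many of the errors in the first place.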
I would also invest some time into implementing a restart mechanism, such that you can restart the process from the last failed location instead of starting from the beginning every time. It can be as simple as providing a skip counter as input to skip the first N requests.
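A variant of the restart idea uses the output file itself as the checkpoint instead of a skip counter: on startup, read which PMIDs are already written, skip them, and append new results through a BufferedWriter. This is a sketch under assumptions: each line has the form pmid:result as in the question's code, and fetch is a hypothetical stand-in for the download-and-parse step (a java.util.function.Function cannot throw checked exceptions, so the real step would need wrapping). ResumableWriter is likewise a hypothetical name.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class ResumableWriter {
    /** PMIDs already present in the output file (the text before the first ':' on each line). */
    static Set<String> alreadyDone(Path out) throws IOException {
        Set<String> done = new HashSet<>();
        if (Files.exists(out)) {
            for (String line : Files.readAllLines(out, StandardCharsets.UTF_8)) {
                int colon = line.indexOf(':');
                if (colon > 0) {
                    done.add(line.substring(0, colon));
                }
            }
        }
        return done;
    }

    /** Appends "pmid:result" for every PMID not yet in the file; returns how many were written. */
    static int run(List<String> pmids, Path out, Function<String, String> fetch) throws IOException {
        Set<String> done = alreadyDone(out);
        int written = 0;
        try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (String pmid : pmids) {
                if (done.contains(pmid)) {
                    continue; // finished in an earlier run
                }
                w.write(pmid + ":" + fetch.apply(pmid));
                w.newLine();
                if (++written % 1000 == 0) {
                    w.flush(); // periodic flush bounds the loss if the JVM is killed
                }
            }
        }
        return written;
    }
}
```

Because the writer is opened in try-with-resources, an exception thrown mid-run still flushes and closes the file, so the next run resumes after the last completed PMID. A hard kill can lose up to 999 buffered lines or leave a truncated last line, which you would want to delete before restarting.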
You should reuse the DocumentBuilderFactory instead of creating a new one for every request. More importantly, you may want to disable loading of the external DTD and DTD validation altogether (unless you want only valid documents, in which case it's good to catch that exception and dump the bad XML to a separate file for review).
private static DocumentBuilderFactory dbf;
public static void main(String[] args) throws ParserConfigurationException {
    dbf = DocumentBuilderFactory.newInstance();
    // Don't fetch the external DTD named in the DOCTYPE
    dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
    // Don't validate documents against a DTD
    dbf.setFeature("http://xml.org/sax/features/validation", false);
    ...
}
private static Document parseXML(URL url) throws ParserConfigurationException, SAXException, IOException {
    DocumentBuilder db = dbf.newDocumentBuilder();
    // try-with-resources closes the HTTP stream after parsing
    try (InputStream in = url.openStream()) {
        Document doc = db.parse(in);
        doc.getDocumentElement().normalize();
        return doc;
    }
}
How do I read Word comments (annotations) from a Microsoft Word document?
Please provide some example code if possible.
Thank you.
Finally, I found the answer.
Here is the code snippet:
File file = null;
FileInputStream fis = null;
HWPFDocument document = null;
Range commentRange = null;
try {
    file = new File(fileName);
    fis = new FileInputStream(file);
    document = new HWPFDocument(fis);
    // Comment (annotation) text is stored in its own range of the document
    commentRange = document.getCommentsRange();
    int numComments = commentRange.numParagraphs();
    for (int i = 0; i < numComments; i++) {
        String comments = commentRange.getParagraph(i).text();
        // Strip line breaks; trim() also removes the trailing paragraph mark
        comments = comments.replaceAll("\\cM?\r?\n", "").trim();
        if (!comments.equals("")) {
            System.out.println("comment :- " + comments);
        }
    }
} catch (Exception e) {
    e.printStackTrace();
} finally {
    if (fis != null) {
        try {
            fis.close();
        } catch (IOException ignored) {
        }
    }
}
I am using POI poi-3.5-beta7-20090719.jar and poi-scratchpad-3.5-beta7-20090717.jar. The other archives, poi-ooxml-3.5-beta7-20090717.jar and poi-dependencies-3.5-beta7-20090717.zip, are only needed if you want to work with the OpenXML-based file formats.
Thanks to Mark B, who actually found this solution.
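The cleanup line in the snippet is easy to misread: \cM is the regex escape for control-M (a carriage return), so the pattern only removes \r\n and \n sequences. Word typically ends each paragraph's text with a bare \r, and it is trim() that removes it (trim() drops every character at or below U+0020 at both ends). The sketch below isolates that step so it can be checked without a .doc file; CommentText is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

public class CommentText {
    /** Same cleanup as the snippet: strip CR/LF pairs and LFs, then trim the bare trailing CR. */
    static String clean(String raw) {
        return raw.replaceAll("\\cM?\r?\n", "").trim();
    }

    /** Keeps only non-empty comments, mirroring the snippet's empty-string check. */
    static List<String> nonEmpty(List<String> rawParagraphs) {
        List<String> out = new ArrayList<>();
        for (String p : rawParagraphs) {
            String c = clean(p);
            if (!c.isEmpty()) {
                out.add(c);
            }
        }
        return out;
    }
}
```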
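Get the HWPFDocument object (for example, by passing a Word document in an input stream).
Then call getSummaryInformation(), which returns a SummaryInformation object; its getComments() method gives you the document-level Comments property. Note that this is file metadata, not the inline review comments (annotations).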
Please refer to the following link; it may fulfill your requirements:
http://bihag.wordpress.com/2009/11/04/how-to-read-comments-from-word-with-poi-jav/#comment-13
I am also new to Apache POI. Here is my program, and it works fine: it extracts the text of a .doc file into a text file. I hope it helps. Before you run it, add the corresponding POI jars to your classpath.
/*
 * RtfDocTextExtract.java
 *
 * Created on April 12, 2010, 9:46 AM
 */
import java.io.FileInputStream;
import java.io.FileWriter;
import org.apache.poi.extractor.ExtractorFactory;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
/**
 * Extracts the paragraph text of a .doc file into a plain-text file.
 *
 * @author ChandraMouil V
 */
public class RtfDocTextExtract {
    // Extracts the text of the .doc file at filePath into E:/resume-template.txt
    public static void meth(String filePath) {
        // try-with-resources closes both streams even if extraction fails
        try (FileInputStream fis = new FileInputStream(filePath);
             FileWriter fw = new FileWriter("E:/resume-template.txt")) {
            POIFSFileSystem fileSystem = new POIFSFileSystem(fis);
            WordExtractor oleTextExtractor = (WordExtractor) ExtractorFactory.createExtractor(fileSystem);
            String[] paragraphText = oleTextExtractor.getParagraphText();
            for (String paragraph : paragraphText) {
                fw.write(paragraph);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    public static void main(String[] args) {
        meth(args.length > 0 ? args[0] : "D:/DummyRichTextFormat.doc");
    }
}