I'm writing an app to display and edit .doc files, using Apache POI's HWPF. I can already read text from a .doc file and write to one. However, my reader can only read .doc files created by MS Office. It can't read a file created by my writer, even though MS Office opens that file and displays all of its content correctly. The reader always fails with this error:
Exception in thread "main" java.lang.RuntimeException:java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at org.apache.poi.hwpf.extractor.WordExtractor.getText(WordExtractor.java:322)
at ReadPOI.main(ReadPOI.java:18)
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at org.apache.poi.hwpf.usermodel.Range.binarySearchStart(Range.java:1016)
at org.apache.poi.hwpf.usermodel.Range.findRange(Range.java:1095)
at org.apache.poi.hwpf.usermodel.Range.initParagraphs(Range.java:982)
at org.apache.poi.hwpf.usermodel.Range.numParagraphs(Range.java:311)
at org.apache.poi.hwpf.converter.AbstractWordConverter.processParagraphes(AbstractWordConverter.java:1058)
at org.apache.poi.hwpf.converter.WordToTextConverter.processSection(WordToTextConverter.java:435)
at org.apache.poi.hwpf.converter.AbstractWordConverter.processSingleSection(AbstractWordConverter.java:1126)
at org.apache.poi.hwpf.converter.AbstractWordConverter.processDocument(AbstractWordConverter.java:722)
at org.apache.poi.hwpf.extractor.WordExtractor.getText(WordExtractor.java:304)
... 1 more
Is there any difference between a file created by MS Office and a file created by my writer, and how can I fix this? Please help. My demo code in Java is below. Thank you.
My reader:
import java.io.File;
import java.io.FileInputStream;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.hwpf.usermodel.Range;
public class ReadPOI
{
    public static void main(String[] args) throws Exception
    {
        File file = new File("Test.doc");
        FileInputStream fin = new FileInputStream(file);
        HWPFDocument doc = new HWPFDocument(fin);
        Range range = doc.getRange();
        WordExtractor extractor = new WordExtractor(doc);
        System.out.println("starting\n" + extractor.getText() + "end\n");
        fin.close();
    }
}
My Writer:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.hwpf.HWPFDocument;
public class WritePOI
{
    public static void main(String[] args) throws Exception
    {
        File file = new File("Template.doc");
        FileInputStream fin = new FileInputStream(file);
        HWPFDocument doc = new HWPFDocument(fin);
        doc.getRange().replaceText("Haha\n", false);
        FileOutputStream fout = new FileOutputStream("Test.doc");
        doc.write(fout);
        fout.close();
        fin.close();
    }
}
This is a bug in WordExtractor.getText() that is still present as of version 3.10-FINAL. It should not throw:
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:571)
at java.util.ArrayList.get(ArrayList.java:349)
at org.apache.poi.hwpf.usermodel.Range.binarySearchStart(Range.java:1016)
getText() is not marked as deprecated in the API, but the documentation says that getTextFromPieces() is faster. I double-checked it with your example and it works.
So in ReadPOI use:
System.out.println(extractor.getTextFromPieces());
Or:
String[] dataArray = extractor.getParagraphText();
for (int i = 0; i < dataArray.length; i++)
{
    System.out.println("\n–" + dataArray[i]);
}
Here is what I've tried
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
public class GZIPCompression {
    public static void main(String[] args) throws IOException {
        File file = new File("gziptest.zip");
        try ( OutputStream os = new GZIPOutputStream(new FileOutputStream(file, true))) {
            os.write("test".getBytes());
        }
        try ( GZIPInputStream inStream = new GZIPInputStream(new FileInputStream(file))) {
            while (inStream.available() > 0) {
                System.out.print((char) inStream.read());
            }
        }
    }
}
Based on what I've read, this should append "test" to the end of gziptest.zip, but when I run the code, the file doesn't get modified at all. The strange thing is that if I change FileOutputStream(file, true) to FileOutputStream(file, false), the file does get modified, but its original contents are overwritten, which is of course not what I want.
I am using JDK 14.0.1.
A couple of things here.
Zip and GZip are different formats. If you are doing a gzip test, your file should have the extension .gz, not .zip.
To properly append "test" to the end of the gzip data, first use a GZIPInputStream to read the file, then tack "test" onto the uncompressed text, and then write the result back out through a GZIPOutputStream.
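The read, append, and rewrite steps described above could look roughly like the following. This is only a minimal sketch that uses the standard java.util.zip classes. The class name GzipAppend, its helper methods, and the file name gziptest.gz are made up for illustration, and the whole file is held in memory, which is only reasonable for small files.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of the original question.
public class GzipAppend {

    // Decompress the whole gzip file into memory.
    public static byte[] readAll(Path gzipFile) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gzipFile));
             ByteArrayOutputStream buffer = new ByteArrayOutputStream()) {
            byte[] chunk = new byte[4096];
            int n;
            // read() returns -1 at end of stream; do not rely on available()
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return buffer.toByteArray();
        }
    }

    // Compress the data and replace the file's previous contents.
    public static void writeAll(Path gzipFile, byte[] data) throws IOException {
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(gzipFile))) {
            out.write(data);
        }
    }

    // Read the old uncompressed content, add the new text, and rewrite the file.
    // A file that does not exist yet is treated as empty.
    public static void append(Path gzipFile, String text) throws IOException {
        byte[] old = Files.exists(gzipFile) ? readAll(gzipFile) : new byte[0];
        byte[] extra = text.getBytes(StandardCharsets.UTF_8);
        byte[] combined = new byte[old.length + extra.length];
        System.arraycopy(old, 0, combined, 0, old.length);
        System.arraycopy(extra, 0, combined, old.length, extra.length);
        writeAll(gzipFile, combined);
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get("gziptest.gz"); // assumed file name
        append(file, "test");
        System.out.println(new String(readAll(file), StandardCharsets.UTF_8));
    }
}
```

Each run of main adds another "test" to the uncompressed content, and the file stays a single valid gzip stream.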
I am trying to create a Word document with Apache POI that contains a JPEG picture. I found code for this on Stack Overflow. However, when I run the code, a .docx file is created whose size suggests that it contains the image, but I can't open it.
My code is the following:
import org.apache.poi.util.Units;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.xwpf.usermodel.BreakType;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
public class SimpleImages {
    public static void main(String[] args) throws Exception {
        XWPFDocument doc = new XWPFDocument();
        XWPFParagraph p = doc.createParagraph();
        XWPFRun r = p.createRun();
        //for(String imgFile : args) {
        String imgFile = "mosaic.jpg";
        int format = XWPFDocument.PICTURE_TYPE_JPEG;
        r.setText(imgFile);
        r.addBreak();
        // Units.toEMU() takes points, so the picture is 200x200 points
        try (FileInputStream in = new FileInputStream(imgFile)) {
            r.addPicture(in, format, imgFile, Units.toEMU(200), Units.toEMU(200));
        }
        r.addBreak(BreakType.PAGE);
        //}
        // try-with-resources closes the output stream even if write() fails
        try (FileOutputStream out = new FileOutputStream("images.docx")) {
            doc.write(out);
        }
    }
}
When I try to open the .docx file, I get this message:
the file file.docx cannot be opened because there are problems with the contents.
I had the same problem, and it has been resolved. I was using POI 3.10, and that version was the culprit. After I updated to 3.12, the issue went away.
I have finished reading .doc files, and now I'm trying to read the content of a .docx file. I found many code samples when searching, but none of them worked. Here is my code for reference:
import java.io.*;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.Document;
import com.itextpdf.text.Paragraph;
public class createPdfForDocx {
public static void main(String[] args) {
InputStream fs = null;
Document document = new Document();
XWPFWordExtractor extractor = null ;
try {
fs = new FileInputStream("C:\\DATASTORE\\test.docx");
//XWPFDocument hdoc=new XWPFDocument(fs);
XWPFDocument hdoc=new XWPFDocument(OPCPackage.open(fs));
//XWPFDocument hdoc=new XWPFDocument(fs);
extractor = new XWPFWordExtractor(hdoc);
OutputStream fileOutput = new FileOutputStream(new File("C:/DATASTORE/test.pdf"));
PdfWriter.getInstance(document, fileOutput);
document.open();
String fileData=extractor.getText();
System.out.println(fileData);
document.add(new Paragraph(fileData));
System.out.println(" pdf document created");
} catch(IOException e) {
System.out.println("IO Exception");
e.printStackTrace();
} catch(Exception ex) {
ex.printStackTrace();
}finally {
document.close();
}
}//end of main()
}//end of class
For the above code I get the following exception:
org.apache.poi.POIXMLException: java.lang.reflect.InvocationTargetException
at org.apache.poi.xwpf.usermodel.XWPFFactory.createDocumentPart(XWPFFactory.java:60)
at org.apache.poi.POIXMLDocumentPart.read(POIXMLDocumentPart.java:277)
at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:186)
at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:107)
at pagecode.createPdfForDocx.main(createPdfForDocx.java:20)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:67)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:521)
at org.apache.poi.xwpf.usermodel.XWPFFactory.createDocumentPart(XWPFFactory.java:58)
... 4 more
Caused by: java.lang.NoSuchMethodError: org/openxmlformats/schemas/wordprocessingml/x2006/main/CTStyles.getStyleList()Ljava/util/List;
at org.apache.poi.xwpf.usermodel.XWPFStyles.onDocumentRead(XWPFStyles.java:78)
at org.apache.poi.xwpf.usermodel.XWPFStyles.<init>(XWPFStyles.java:59)
... 9 more
Please help
Thank you
This is covered in the Apache POI FAQ. The entry you want is: I'm using the poi-ooxml-schemas jar, but my code is failing with "java.lang.NoClassDefFoundError: org/openxmlformats/schemas/something"
The short answer is to replace the poi-ooxml-schemas jar with the full ooxml-schemas-1.1 jar. The full answer is given in the FAQ.
When reading Excel or .docx files, make sure that all of the POI jars and their dependencies are on the classpath. Missing jars are a common cause of errors like this one.
I am writing an updater. I have this code:
package main;
import java.io.*;
import java.net.*;
import java.util.*;
import java.util.regex.Pattern;
import java.lang.*;
import static java.lang.System.out;
public class UpdaterCore
{
    public static void main(String args[]) throws IOException
    {
        // Download the file that holds the newest version number
        BufferedInputStream inv = new BufferedInputStream(
                new URL("http://unicombox.tk/update/nv").openStream());
        FileOutputStream fosv = new FileOutputStream("nv");
        BufferedOutputStream boutv = new BufferedOutputStream(fosv, 1024);
        byte data[] = new byte[1024];
        int count;
        // Write only the bytes actually read; the last chunk is usually shorter than the buffer
        while ((count = inv.read(data, 0, 1024)) >= 0)
        {
            boutv.write(data, 0, count);
        }
        boutv.close();
        inv.close();
        //end version download
        Scanner VersionReader = new Scanner(new File("v")).useDelimiter(",");
        int currentVersion = VersionReader.nextInt();
        VersionReader.close();
        Scanner NewVersionReader = new Scanner(new File("nv")).useDelimiter(",");
        int newVersion = NewVersionReader.nextInt();
        NewVersionReader.close();
        if (newVersion > currentVersion) {
            BufferedInputStream in = new BufferedInputStream(
                    new URL("http://unicombox.tk/update/update.zip").openStream());
            FileOutputStream fos = new FileOutputStream("update.zip");
            BufferedOutputStream bout = new BufferedOutputStream(fos, 1024);
            byte data1[] = new byte[1024];
            int count1;
            // Same rule as above: write exactly the number of bytes read
            while ((count1 = in.read(data1, 0, 1024)) >= 0)
            {
                bout.write(data1, 0, count1);
            }
            bout.close();
            in.close();
            out.println("Update successfully downloaded!");
        }
        else {
            out.println("You have the latest version!");
        }
    }
}
It gets the new version from a server, and then compares it to its current version. If the new version is greater than the current version, it downloads the update.
I am having one big problem. My program can never find the files "v" and "nv"!
"v" and "nv" are in the same folder as the compiled jar, yet I get a FileNotFound.
What am I doing wrong?
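The download-and-compare logic described in the question can be sketched with in-memory streams, which makes it easy to try without a server. This is only an illustration: the class UpdateCheck and its methods are hypothetical names, and the version files are assumed to hold an integer followed by a comma, as the Scanner delimiter in the question suggests. Note that the copy loop writes only the bytes that were actually read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Hypothetical helper class, not part of the original question.
public class UpdateCheck {

    // Copy a stream, writing exactly as many bytes as each read() returned.
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[1024];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    // Parse the first comma-delimited integer, e.g. "7," (assumed file format).
    public static int parseVersion(InputStream in) {
        try (Scanner scanner = new Scanner(in, "UTF-8").useDelimiter(",")) {
            return scanner.nextInt();
        }
    }

    // An update is needed only when the server's version is strictly newer.
    public static boolean updateAvailable(int currentVersion, int newVersion) {
        return newVersion > currentVersion;
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for the contents of the "v" and "nv" files
        InputStream current = new ByteArrayInputStream("3,".getBytes(StandardCharsets.UTF_8));
        InputStream latest = new ByteArrayInputStream("4,".getBytes(StandardCharsets.UTF_8));
        boolean update = updateAvailable(parseVersion(current), parseVersion(latest));
        System.out.println(update ? "Update needed" : "You have the latest version!");

        // Stand-in for the downloaded update: 2500 bytes is not a multiple of the buffer size
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(new byte[2500]), target);
        System.out.println("Copied " + copied + " bytes");
    }
}
```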
Get the path to the directory where the .jar file is placed like this:
// import java.io.File;
// import java.net.URLDecoder;
// throws java.io.UnsupportedEncodingException
String path = UpdaterCore.class.getProtectionDomain().getCodeSource().getLocation().getPath();
String decodedPath = URLDecoder.decode(path, "UTF-8");
// When running from a jar, the location is the jar file itself, so take its parent directory
File jarDir = new File(decodedPath).getParentFile();
System.out.println(jarDir);
and then create the File instance like this:
new File(jarDir, "v")
You are probably running the program from a different directory, perhaps one level up.
You can call getAbsolutePath() on those java.io.File objects to find out which path you are really trying to read from.
Or just use Marek Sebera's solution; it is fine.
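A quick way to apply the getAbsolutePath() suggestion is to print where the relative names actually resolve. The sketch below uses only java.io.File and the standard user.dir system property. The class name WhereAmI and its helper method are made up for illustration.

```java
import java.io.File;

// Hypothetical diagnostic class, not part of the original program.
public class WhereAmI {

    // Returns the absolute path that a relative file name resolves to.
    public static String resolved(String relativeName) {
        return new File(relativeName).getAbsolutePath();
    }

    public static void main(String[] args) {
        // Relative paths are resolved against the working directory, not the jar's location
        System.out.println("Working directory: " + System.getProperty("user.dir"));
        for (String name : new String[] {"v", "nv"}) {
            System.out.println(name + " -> " + resolved(name)
                    + " (exists: " + new File(name).exists() + ")");
        }
    }
}
```

If the printed paths do not point at the folder that contains the jar, the program was started from a different working directory.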
I use this test to convert a .txt file to PDF:
package convert.pdf;
//getResourceAsStream(String name) : Returns an input stream for reading the specified resource.
//toByteArray : Get the contents of an InputStream as a byte[].
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.commons.io.IOUtils;
import convert.pdf.txt.TextConversion;
public class TestConversion {
    private static byte[] readFilesInBytes(String file) throws IOException {
        return IOUtils.toByteArray(TestConversion.class.getResourceAsStream(file));
    }
    private static void writeFilesInBytes(byte[] file, String name) throws IOException {
        // IOUtils.write() does not close the stream, so close it here
        try (FileOutputStream out = new FileOutputStream(name)) {
            IOUtils.write(file, out);
        }
    }
    //just change the extensions and test conversions
    public static void main(String args[]) throws IOException {
        ConversionToPDF algorithm = new TextConversion();
        byte[] file = readFilesInBytes("/convert/pdf/text.txt");
        byte[] pdf = algorithm.convertDocument(file);
        writeFilesInBytes(pdf, "text.pdf");
    }
}
Problem:
Exception in thread "main" java.lang.NullPointerException
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1025)
at org.apache.commons.io.IOUtils.copy(IOUtils.java:999)
at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:218)
at convert.pdf.TestConversion.readFilesInBytes(TestConversion.java:17)
at convert.pdf.TestConversion.main(TestConversion.java:28)
I use the debugger, and the problem seems to be located here :
private static byte[] readFilesInBytes(String file) throws IOException {
return IOUtils.toByteArray(TestConversion.class.getResourceAsStream(file));
}
What is my problem?
Sounds like the resource probably doesn't exist with that name.
Are you aware that Class.getResourceAsStream() finds a resource relative to that class's package, whereas ClassLoader.getResourceAsStream() doesn't? You can use a leading forward slash in Class.getResourceAsStream() to mimic this, so
Foo.class.getResourceAsStream("/bar.png")
is roughly equivalent to
Foo.class.getClassLoader().getResourceAsStream("bar.png")
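To make the naming rule concrete, the small sketch below mirrors the resolution documented for Class.getResourceAsStream(): a leading slash makes the name absolute, and otherwise the class's package is prepended. It only computes names and does not load anything. ResourceNames and toClassLoaderName are hypothetical names chosen for this illustration.

```java
// Hypothetical helper class, for illustration only.
public class ResourceNames {

    // Returns the name that would be passed on to the class loader
    // for a resource requested through the given class.
    public static String toClassLoaderName(Class<?> owner, String name) {
        if (name.startsWith("/")) {
            return name.substring(1); // absolute: just drop the leading slash
        }
        String className = owner.getName();
        int lastDot = className.lastIndexOf('.');
        if (lastDot < 0) {
            return name; // class is in the default package
        }
        // relative: prefix the package, with dots turned into slashes
        return className.substring(0, lastDot).replace('.', '/') + "/" + name;
    }

    public static void main(String[] args) {
        System.out.println(toClassLoaderName(String.class, "bar.png"));  // prints java/lang/bar.png
        System.out.println(toClassLoaderName(String.class, "/bar.png")); // prints bar.png
    }
}
```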
Is this actually a file (i.e. a specific file on the normal file system) that you're trying to load? If so, using FileInputStream would be a better bet. Use Class.getResourceAsStream() if it's a resource bundled in a jar file or in the classpath in some other way; use FileInputStream if it's an arbitrary file which could be anywhere in the file system.
EDIT: Another thing to be careful of, which has caused me problems before now - if this has worked on your dev box which happens to be Windows, and is now failing on a production server which happens to be Unix, check the case of the filename. The fact that different file systems handle case-sensitivity differently can be a pain...
Are you checking that the file exists before you pass it to readFilesInBytes()? Note that Class.getResourceAsStream() returns null if the resource cannot be found. You could check the file system first:
// needs java.io.File and java.io.FileNotFoundException
private static byte[] readFilesInBytes(String file) throws IOException {
    File testFile = new File(file);
    if (!testFile.exists()) {
        throw new FileNotFoundException("File " + file + " does not exist");
    }
    return IOUtils.toByteArray(TestConversion.class.getResourceAsStream(file));
}
or better yet, since a classpath resource is not necessarily a file on disk, check the stream itself:
// needs java.io.InputStream and java.io.FileNotFoundException
private static byte[] readFilesInBytes(String file) throws IOException {
    InputStream stream = TestConversion.class.getResourceAsStream(file);
    if (stream == null) {
        throw new FileNotFoundException("readFilesInBytes: File " + file
                + " does not exist");
    }
    return IOUtils.toByteArray(stream);
}
This class reads a TXT file from the classpath and uses TextConversion to convert it to PDF, then saves the PDF to the file system.
Here is the TextConversion code:
package convert.pdf.txt;
//Conversion to PDF from text using iText.
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import convert.pdf.ConversionToPDF;
import convert.pdf.ConvertDocumentException;
import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Font;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;
public class TextConversion implements ConversionToPDF {
    public byte[] convertDocument(byte[] documents) throws ConvertDocumentException {
        try {
            return this.convertInternal(documents);
        } catch (DocumentException e) {
            throw new ConvertDocumentException(e);
        } catch (IOException e) {
            throw new ConvertDocumentException(e);
        }
    }
    private byte[] convertInternal(byte[] documents) throws DocumentException, IOException {
        Document document = new Document();
        ByteArrayOutputStream pdfResultBytes = new ByteArrayOutputStream();
        PdfWriter.getInstance(document, pdfResultBytes);
        document.open();
        BufferedReader reader = new BufferedReader( new InputStreamReader( new ByteArrayInputStream(documents) ) );
        String line = "";
        while ((line = reader.readLine()) != null) {
            if ("".equals(line.trim())) {
                line = "\n"; //white line
            }
            Font fonteDefault = new Font(Font.COURIER, 10);
            Paragraph paragraph = new Paragraph(line, fonteDefault);
            document.add(paragraph);
        }
        reader.close();
        document.close();
        return pdfResultBytes.toByteArray();
    }
}
And here is the code for ConversionToPDF:
package convert.pdf;
// Interface implemented by the conversion algorithms.
public interface ConversionToPDF {
    public byte[] convertDocument(byte[] documentToConvert) throws ConvertDocumentException;
}
I think the problem comes from my file system (my dev box runs Windows and the server runs Unix).
I will try modifying my classpath.
This problem may occur because test.txt is not a regular file; it could be, for example, a shortcut to a folder. In other words, you are calling a method on a file that does not exist, which results in a NullPointerException.