How to get pictures and tables from .docx document using apache poi? - java

I tried to extract the whole content of a .docx file into a text area in Java, but all I get is the text, without the images or tables. Any advice? Thanks in advance.
My code is:
try{
JFileChooser chooser = new JFileChooser();
chooser.showOpenDialog(null);
XWPFDocument doc = new XWPFDocument(new
FileInputStream(chooser.getSelectedFile()));
XWPFWordExtractor extract = new XWPFWordExtractor(doc);
content.setText(extract.getText());
content.setFont(new Font("Serif", Font.ITALIC, 16));
content.setLineWrap(true);
content.setWrapStyleWord(true);
content.setBackground(Color.white);
} catch(Exception e){
JOptionPane.showMessageDialog(null, e);
}
}

To extract tables, use List<XWPFTable> table = doc.getTables().
See the example below:
public static void readWordDocument() {
try {
String fileName = "C:\\sample.docx";
// XWPFDocument reads only the .docx (OOXML) format, so reject everything else
if(!fileName.endsWith(".docx")) {
throw new FileFormatException();
} else {
XWPFDocument doc = new XWPFDocument(new FileInputStream(fileName));
List<XWPFTable> table = doc.getTables();
for (XWPFTable xwpfTable : table) {
List<XWPFTableRow> row = xwpfTable.getRows();
for (XWPFTableRow xwpfTableRow : row) {
List<XWPFTableCell> cell = xwpfTableRow.getTableCells();
for (XWPFTableCell xwpfTableCell : cell) {
if(xwpfTableCell!=null)
{
System.out.println(xwpfTableCell.getText());
List<XWPFTable> itable = xwpfTableCell.getTables();
if(itable.size()!=0)
{
for (XWPFTable xwpfiTable : itable) {
List<XWPFTableRow> irow = xwpfiTable.getRows();
for (XWPFTableRow xwpfiTableRow : irow) {
List<XWPFTableCell> icell = xwpfiTableRow.getTableCells();
for (XWPFTableCell xwpfiTableCell : icell) {
if(xwpfiTableCell!=null)
{
System.out.println(xwpfiTableCell.getText());
}
}
}
}
}
}
}
}
}
}
} catch(FileFormatException e) {
e.printStackTrace();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
To extract images, use List<XWPFPictureData> piclist=docx.getAllPictures().
See the example below:
public static void extractImages(String src){
try{
//create file inputstream to read from a binary file
FileInputStream fs=new FileInputStream(src);
//create office word 2007+ document object to wrap the word file
XWPFDocument docx=new XWPFDocument(fs);
//get all images from the document and store them in the list piclist
List<XWPFPictureData> piclist=docx.getAllPictures();
//traverse through the list and write each image to a file
Iterator<XWPFPictureData> iterator=piclist.iterator();
int i=0;
while(iterator.hasNext()){
XWPFPictureData pic=iterator.next();
byte[] bytepic=pic.getData();
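BufferedImage imag=ImageIO.read(new ByteArrayInputStream(bytepic));
//ImageIO.read returns null for formats it cannot decode (for example EMF or WMF), so skip those
if(imag!=null){
ImageIO.write(imag, "jpg", new File("D:/imagefromword"+i+".jpg"));
}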
i++;
}
}catch(Exception e){e.printStackTrace();System.exit(-1);}
}
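Re-encoding every picture as JPEG, as above, changes the image format and only works for images that ImageIO can decode. As an alternative sketch that does not use POI at all: a .docx file is a zip archive, and in the standard layout the embedded images are stored as entries under word/media/. Assuming that layout, the raw image bytes can be copied out unchanged with the JDK's java.util.zip classes. The class name DocxMediaExtractor and the method extractMedia are made up for this illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of Apache POI.
final class DocxMediaExtractor {
    // Assumption: embedded images live under this folder inside the archive.
    private static final String MEDIA_PREFIX = "word/media/";

    /** Copies every entry under word/media/ into outDir and returns the written files. */
    static List<Path> extractMedia(Path docx, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        List<Path> written = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(Files.newInputStream(docx))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (entry.isDirectory() || !name.startsWith(MEDIA_PREFIX)) {
                    continue;
                }
                // Keep only the last path segment so an entry name cannot escape outDir.
                String fileName = name.substring(name.lastIndexOf('/') + 1);
                if (fileName.isEmpty()) {
                    continue;
                }
                Path target = outDir.resolve(fileName);
                // Reads up to the end of the current entry; the bytes are written unchanged.
                Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
                written.add(target);
            }
        }
        return written;
    }
}
```

A call such as DocxMediaExtractor.extractMedia(Paths.get("C:\\sample.docx"), Paths.get("D:\\images")) would then leave files like image1.png in the output folder with their original extensions.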

Related

Read three column and write two column in excel using selenium java

I need to read three columns (Name, Empid and empcode) from Excel, insert them into text boxes and save the record; the setid and reqid fields are then generated. Next, I need to save the setid and reqid after the Name, Empid and empcode columns in the same Excel file. I have tried several code samples from the internet, but none of them work well.
Below is the code for writing to Excel:
public static XSSFWorkbook wb;
public static XSSFSheet sheet;
public static FileInputStream fs;
public static FileOutputStream out;
public void WriteExcelFile()
{
String str1 = ackgreqid1.getText();
File file = new File(System.getProperty("user.dir") + "/src/main/java/com/testdata/FreeTestData.xlsx");
try {
fs = new FileInputStream(file);
} catch (FileNotFoundException e1) {
e1.printStackTrace();
}
try {
wb = new XSSFWorkbook(fs);
sheet = wb.getSheet("TestData");
} catch (IOException e) {
System.out.println("unable to find workbook or worksheet");
e.printStackTrace();
}
Iterator<Row> sheetrow = sheet.rowIterator();
// Row firstRow = sheetrow.next();
// int rowCount = sheet.getLastRowNum();
// = firstRow.cellIterator();
// Iterator<Cell> cell = firstRow.cellIterator();
while (sheetrow.hasNext()) {
Row value = sheetrow.next();
// System.out.println(value);
Iterator<Cell> cell = value.cellIterator();
Cell cellValue = cell.next();
String cellData = cellValue.getStringCellValue();
while (cell.hasNext()) {
if (cellData.equalsIgnoreCase("Step-Up")) {
int lastCellNum = value.getLastCellNum();
// System.out.println(lastCellNum);
Cell createCell = value.createCell(lastCellNum);
createCell.setCellValue(str1);
try {
out = new FileOutputStream(file);
wb.write(out);
// cell.next();
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
out.close();
break;
} catch (IOException e) {
e.printStackTrace();
}
}
} else {
break;
}
}

Apache POI docx - target docx file is corrupt after being modified

I wrote some code using Apache POI 3.6.
The code inserts paragraphs, tables and images into one docx file. Because the paragraphs, tables and images come from different docx files, I need to read the content from each of those files and then insert it into the target file.
But I found that the target file is only readable after the first content insert; after further inserts it is corrupt. Is there anything that needs to be changed?
My method is defined as follows:
public class DocDomGroupUtilImpl implements DocDomGroupUtil {
private FileInputStream fips;
private XWPFDocument document;
private FileOutputStream fops;
@Override
public void generateDomGroupFile(File templateFile, List<Item> items) {
// initial parameters
Map<String, String> parameters = new HashMap<String, String>();
for (Item item : items) {
parameters.put(item.getParaName(), item.getParaValue());
}
// get domGroup type
String templateFileName = templateFile.getName();
String type = templateFileName.substring(0, templateFileName.indexOf("."));
try {
fips = new FileInputStream(templateFile);
document = new XWPFDocument(fips);
// create tempt file for document domGroup, named like
// docDomGroup_<type>_<domName>.docx
String domGroupFilePath = CONSTAINTS.temptDomGroupPath + "docDomGroup_" + type + "_"
+ parameters.get("$DomGroupName_Value") + ".docx";
File domGroupFile = new File(domGroupFilePath);
if (domGroupFile.exists()) {
domGroupFile.delete();
}
domGroupFile.createNewFile();
fops = new FileOutputStream(domGroupFile, true);
// modified the groupName
// replace content
String regularExpression = "\\$(.*)_Value";
List<XWPFParagraph> paragraphs = document.getParagraphs();
for (XWPFParagraph paragraph : paragraphs) {
List<XWPFRun> runs = paragraph.getRuns();
if (runs != null) {
for (XWPFRun run : runs) {
String text = run.getText(0);
if (text != null && Pattern.matches(regularExpression, text)) {
text = parameters.get(text);
run.setText(text, 0);
}
}
}
}
document.write(fops);
close();
// copy all the information from dom related files
File dir = new File(CONSTAINTS.temptDomPath);
for (File file : dir.listFiles()) {
if (!file.isDirectory()) {
fops = new FileOutputStream(domGroupFile, true);
fips = new FileInputStream(file);
document = new XWPFDocument(fips);
document.write(fops);
}
close();
}
// clean up tempt dom folder removeAllDomFile();
removeAllDomFile();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
close();
}
}
/**
* remove all the generated tempt dom files
*/
private void removeAllDomFile() {
// loop directory and delete tempt file
File dir = new File(CONSTAINTS.temptDomPath);
for (File file : dir.listFiles())
if (!file.isDirectory()) {
file.delete();
}
}
private void close() {
try {
fips.close();
fops.flush();
fops.close();
document.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}

Apache POI event API Update existing Excel sheet

I have a large Excel file with several worksheets.
I want to process just one sheet in the file: read values from two columns and update two columns.
Using this code, I am able to read data from the sheet, but I cannot figure out how to save the output back.
public class ExcelFunctions {
private class ExcelData implements SheetContentsHandler {
private Record rec ;
public void startRow(int rowNum) {
rec = new Record();
output.put("R"+rowNum, rec);
}
public void endRow(int rowNum) {
}
public void cell(String cellReference, String formattedValue,
XSSFComment comment) {
int thisCol = (new CellReference(cellReference)).getCol();
if(thisCol==7){
try {
rec.setK1(formattedValue);
} catch (Exception e) {
}
}
if(thisCol==8){
try {
rec.setK2(formattedValue);
} catch (Exception e) {
}
}
if(thisCol == 27){
String key = rec.full_key();
System.out.println(key);
///////Process Matched Key...get Data
//////Set value to column 27
}
if(thisCol == 28){
String key = rec.full_key();
System.out.println(key);
///////Process Matched Key...get Data
//////Set value to column 28
}
}
public void headerFooter(String text, boolean isHeader, String tagName) {
}
}
///////////////////////////////////////
private final OPCPackage xlsxPackage;
private final Map<String, Record> output;
public ExcelFunctions(OPCPackage pkg, Map<String, Record> output) {
this.xlsxPackage = pkg;
this.output = output;
}
public void processSheet(
StylesTable styles,
ReadOnlySharedStringsTable strings,
SheetContentsHandler sheetHandler,
InputStream sheetInputStream)
throws IOException, ParserConfigurationException, SAXException {
DataFormatter formatter = new DataFormatter();
InputSource sheetSource = new InputSource(sheetInputStream);
try {
XMLReader sheetParser = SAXHelper.newXMLReader();
ContentHandler handler = new XSSFSheetXMLHandler(
styles, null, strings, sheetHandler, formatter, false);
sheetParser.setContentHandler(handler);
sheetParser.parse(sheetSource);
} catch(ParserConfigurationException e) {
throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage());
}
}
public void process()
throws IOException, OpenXML4JException, ParserConfigurationException, SAXException {
ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(this.xlsxPackage);
XSSFReader xssfReader = new XSSFReader(this.xlsxPackage);
StylesTable styles = xssfReader.getStylesTable();
XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
boolean found = false;
while (iter.hasNext() && !found) {
InputStream stream = iter.next();
String sheetName = iter.getSheetName();
if(sheetName.equals("All Notes") ){
processSheet(styles, strings, new ExcelData(), stream);
found = true;
}
stream.close();
}
}
@SuppressWarnings("unused")
public static void main(String[] args) throws Exception {
File xlsxFile = new File("C:\\Users\\admin\\Downloads\\Unique Name Macro\\big.xlsm");
if (!xlsxFile.exists()) {
System.err.println("Not found or not a file: " + xlsxFile.getPath());
return;
}
// The package open is instantaneous, as it should be.
OPCPackage p = OPCPackage.open(xlsxFile.getPath(), PackageAccess.READ_WRITE);
Map<String, Record> output = new HashMap<String, Record>();
ExcelFunctions xlFunctions = new ExcelFunctions(p, output);
xlFunctions.process();
p.close();
if (output != null){
for(Record rec : output.values()){
System.out.println(rec.full_key());
}
}
}
}
The file is very large and I only want to use the event API.
I have successfully tested saving with the code below.
But it loads the whole file into memory (causing the application to crash), while I only need to edit one sheet.
public static void saveToExcel(String ofn, Map<String, Record> data) {
FileInputStream infile;
try {
infile = new FileInputStream(new File("C:\\Users\\admin\\Downloads\\Unique Name Macro\\big.xlsm"));
XSSFWorkbook workbook = new XSSFWorkbook (infile);
XSSFSheet sheet = workbook.getSheet("All Notes");
for(Record rec : data.values()){
Row dataRow = sheet.getRow(rec.getRownum()-1);
setCellValue(dataRow, 26, "SomeValue");
setCellValue(dataRow, 27, "SomeValue");
}
FileOutputStream out = new FileOutputStream(new File(ofn));
workbook.write(out);
infile.close();
out.close();
workbook.close();
}
catch (IOException e) {
// covers FileNotFoundException as well as read/write failures
e.printStackTrace();
}
}
private static void setCellValue(Row row,int col, String value){
Cell c0 = row.getCell(col);
if (c0 == null){
c0 = row.createCell(col);
}
c0.setCellValue(value);
}
I don't think POI provides anything out of the box that allows you to do that.
Therefore you might be better off unzipping the XLSX/XLSM file (it is actually a bunch of XML files inside a zip archive) and reading the XML files as text files or with a normal XML parser, so that you can easily write out the changed XML and zip everything up again to produce the XLSX/XLSM file.
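The unzip-and-rewrite idea can be sketched with the JDK's java.util.zip classes alone. This is only an illustration under stated assumptions: the class XlsxEntryRewriter, the interface EntryTransformer and the method rewriteEntry are made-up names, and the caller has to know which archive entry holds the sheet (a name such as xl/worksheets/sheet1.xml is an assumption; the mapping from the sheet name to its file is recorded in xl/workbook.xml and its relationships file). Every entry is streamed from the source archive to a new archive, and only the chosen entry is passed through the transformer, so the workbook is never loaded into memory as a whole.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Apache POI.
final class XlsxEntryRewriter {

    /** Hypothetical callback: reads the old entry content and writes the new content. */
    interface EntryTransformer {
        // Implementations must not close either stream.
        void transform(InputStream oldContent, OutputStream newContent) throws IOException;
    }

    /** Copies source to target, replacing the content of entryName via the transformer. */
    static void rewriteEntry(Path source, Path target, String entryName,
                             EntryTransformer transformer) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(Files.newInputStream(source));
             ZipOutputStream zout = new ZipOutputStream(Files.newOutputStream(target))) {
            ZipEntry entry;
            byte[] buffer = new byte[8192];
            while ((entry = zin.getNextEntry()) != null) {
                // A fresh ZipEntry carries only the name; size and CRC are recomputed on write.
                zout.putNextEntry(new ZipEntry(entry.getName()));
                if (entry.getName().equals(entryName)) {
                    transformer.transform(zin, zout);
                } else {
                    // Copy all other entries through unchanged.
                    int read;
                    while ((read = zin.read(buffer)) != -1) {
                        zout.write(buffer, 0, read);
                    }
                }
                zout.closeEntry();
            }
        }
    }
}
```

For a very large sheet the transformer should itself work in a streaming fashion, for example with a StAX reader and writer, rather than buffering the whole XML. Note also that cell text is often stored in xl/sharedStrings.xml rather than in the sheet file, so writing new string values may mean using inline strings or updating that file as well.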

edit .doc file header java

I need to edit the header of .doc and .docx files while maintaining the style of the document.
I tried doing it using:
POI API: I managed to read the file header but couldn't find how to replace text in it and save the result with the original style.
public static void mFix(String iFilePath , HashMap<String, String> iOldNewCouples)
{
aOldNewCouples = iOldNewCouples;
try {
if(iFilePath==null)
return;
File file = new File(iFilePath);
FileInputStream fis=new FileInputStream(file.getAbsolutePath());
HWPFDocument document=new HWPFDocument(fis);
WordExtractor extractor = new WordExtractor(document); // read the doc as rtf
String fileData = extractor.getHeaderText();
String fileDataResult =fileData ;
for (Entry<String, String> entry : aOldNewCouples.entrySet())
{
if(fileData.contains(entry.getKey())) {
System.out.println("replace " +entry.getKey());
// replace on the running result so that earlier replacements are not discarded
fileDataResult = fileDataResult.replace(entry.getKey(), entry.getValue());
}
}
document.getHeaderStoryRange().replaceText(fileData, fileDataResult);
saveWord(iFilePath ,document);
fis.close();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace( );
}
}
private static void saveWord(String filePath, HWPFDocument doc) throws FileNotFoundException, IOException
{
FileOutputStream fileOutputStream = null;
try{
fileOutputStream = new FileOutputStream(new File(filePath.replace(".doc", "-test.doc")));
BufferedOutputStream buffOutputStream = new BufferedOutputStream(fileOutputStream);
doc.write(buffOutputStream);
buffOutputStream.close();
fileOutputStream.close();
}
finally{
if( fileOutputStream != null)
fileOutputStream.close();
}
}
I also tried the docx4j API for .docx: I found how to edit the header but didn't find how to keep the style.
public static void mFix(String iFilePath , HashMap<String, String> iOldNewCouples) {
aOldNewCouples = iOldNewCouples;
WordprocessingMLPackage output;
try {
output = WordprocessingMLPackage.load(new java.io.File(iFilePath));
replaceText(output.getDocumentModel().getSections().get(0).getHeaderFooterPolicy().getDefaultHeader());
output.save(new File(iFilePath));
}
catch (Exception e) {
e.printStackTrace();
}
}
public static void replaceText(ContentAccessor c) throws Exception
{
for (Object p: c.getContent())
{
if (p instanceof ContentAccessor)
replaceText((ContentAccessor) p);
else if (p instanceof JAXBElement)
{
Object v = ((JAXBElement) p).getValue();
if (v instanceof ContentAccessor)
replaceText((ContentAccessor) v);
else if (v instanceof org.docx4j.wml.Text)
{
org.docx4j.wml.Text t = (org.docx4j.wml.Text) v;
String text = t.getValue();
if (text != null)
{
boolean flag = false;
for (Entry<String, String> entry : aOldNewCouples.entrySet())
{
if(text.contains(entry.getKey())) {
flag =true;
text = text.replaceAll(entry.getKey(), entry.getValue());
t.setSpace("preserve");
t.setValue(text);
}
}
}
}
}
}
}
I would like to have examples for those APIs.
If there is another free solution for this for Java projects, please describe it with an example.
Thanks
Tami

Trying to set an Arabic sentence in Word using Apache POI?

I am trying to generate a Word document using the Apache POI API, and I want to put an Arabic sentence into the document, but the words do not stay in order. For example, instead of "شهادة بالملك" I get "بالملك شهادة".
public class word {
public static void main (String [] args) {
XWPFDocument docx = new XWPFDocument();
try {
XWPFParagraph tmpParagraph = docx.createParagraph();
XWPFRun tmpRun = tmpParagraph.createRun();
tmpRun.setText("شهادة بالملك");
tmpRun.setFontSize(18);
tmpRun.setFontFamily("Calibri (Corps)");
tmpRun.setBold(true);
tmpRun.setColor("003894");
tmpParagraph.setAlignment(ParagraphAlignment.LEFT);
tmpRun.setUnderline(UnderlinePatterns.SINGLE);
tmpParagraph.setSpacingAfter(300);
FileOutputStream fos = new FileOutputStream("Word2.docx");
docx.write(fos);
fos.close();
}
catch (Exception e ) {
e.printStackTrace();
}
}
}
This is the answer:
public class word {
public enum TextOrientation {
LTR,
RTL
}
public static void main (String [] args) {
XWPFDocument docx = new XWPFDocument();
try {
XWPFParagraph tmpParagraph = docx.createParagraph();
XWPFRun tmpRun = tmpParagraph.createRun();
tmpRun.setText("شهادة بالملك");
tmpRun.setFontSize(18);
tmpRun.setFontFamily("Calibri (Corps)");
tmpRun.setBold(true);
tmpRun.setColor("003894");
tmpParagraph.setAlignment(ParagraphAlignment.CENTER);
tmpRun.setUnderline(UnderlinePatterns.SINGLE);
tmpParagraph.setSpacingAfter(300);
setOrientation(tmpParagraph, TextOrientation.RTL);
FileOutputStream fos = new FileOutputStream("Word2.docx");
docx.write(fos);
fos.close();
}
catch (Exception e ) {
e.printStackTrace();
}
}
private static void setOrientation(XWPFParagraph par, TextOrientation orientation) {
if ( par.getCTP().getPPr()==null ) {
par.getCTP().addNewPPr();
}
if ( par.getCTP().getPPr().getBidi()==null ) {
par.getCTP().getPPr().addNewBidi();
}
par.getCTP().getPPr().getBidi().setVal(orientation==TextOrientation.RTL?STOnOff.ON:STOnOff.OFF);
}
}
