I am using the iText7(v7.1.1) to create PDF-file.
Environment: java version "1.7.0_45".
For IVS(Ideographic Variation Sequence) see below
http://blogs.adobe.com/CCJKType/files/2017/09/iuc32-lunde-s5t3.pdf
Sample-Code see below
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.element.Table;
import com.itextpdf.layout.element.Paragraph;
import com.itextpdf.kernel.font.PdfFont;
import com.itextpdf.kernel.font.PdfFontFactory;
import com.itextpdf.io.font.PdfEncodings;
import java.io.File;
public class SimpleTableIVS {
public static final String DEST = "SimpleTableIVS.pdf";
public static void main(String[] args) throws Exception {
File file = new File(DEST);
new SimpleTableIVS().manipulatePdf(DEST);
}
protected void manipulatePdf(String dest) throws Exception {
PdfDocument pdfDoc = new PdfDocument(new PdfWriter(dest));
Document doc = new Document(pdfDoc);
// UTF-8 encoding table and Unicode characters
// http://www.utf8-chartable.com/unicode-utf8-table.pl?start=131072&unicodeinhtml=hex&htmlent=1
// http://www.utf8-chartable.com/unicode-utf8-table.pl?start=33792&number=1024
byte[] bUtfA = {(byte)0xd8, (byte)0x40, (byte)0xdc, (byte)0x0b}; // U+2000B [IVS:2000B_E0103]
byte[] bUtfB = {(byte)0x84, (byte)0x5b}; // U+845B, [IVS: 845B_E0103]
// After Add "Ideographic Variation Selector"
byte[] bUtfC = {(byte)0xdb, (byte)0x40, (byte)0xdd, (byte)0x01}; // U+E0101
byte[] bUtfD = {(byte)0xdb, (byte)0x40, (byte)0xdd, (byte)0x02}; // U+E0102
//PdfFont font = PdfFontFactory.createFont("C:/Windows/Fonts/msmincho.ttc,0", PdfEncodings.IDENTITY_H);
//PdfFont font = PdfFontFactory.createFont("C:/Windows/Fonts/meiryo.ttc,0", PdfEncodings.IDENTITY_H);
PdfFont font = PdfFontFactory.createFont("C:/Program Files/ipamjm/ipamjm.ttf", PdfEncodings.IDENTITY_H);
String strUtfA = new String(bUtfA, "UTF-16");
String strUtfB = new String(bUtfB, "UTF-16");
String strUtfC = strUtfA + (new String(bUtfC, "UTF-16"));
String strUtfD = strUtfB + (new String(bUtfD, "UTF-16"));
Table table = new Table(4);
table.addCell(new Paragraph("\u200d" + strUtfA).setFont(font).setFontSize(12));
table.addCell(new Paragraph("\u200d" + strUtfB).setFont(font).setFontSize(12));
table.addCell(new Paragraph("\u200d" + strUtfC).setFont(font).setFontSize(12));
table.addCell(new Paragraph("\u200d" + strUtfD).setFont(font).setFontSize(12));
doc.add(table);
doc.close();
}
}
Unfortunately, Ideographic Variation Sequences are not currently supported in iText. Supporting it is feasible, but not very easy, and thus is not the highest priority at the moment.
An internal development ticket has been created for that and it would be great to implement this feature in the next versions.
Related
I'm a rookie, really. I'm building my first project (if I can finish it).
I want to extract PDF text with formatting and location, and then write to .docx file. I checked the PDFBox API documentation, but I'm not sure if I want to get the location of the text, then should I traverse the rows? Or traverse the characters? I studied these three carefully.
Text coordinates when stripping from PDFBox
Get font of each line using PDFBox
How to extract font styles of text contents using pdfbox?
And here is my DEMO:
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;
import java.io.IOException;
import java.util.List;
public class PDFTextExtractor extends PDFTextStripper {
/**
* Instantiate a new PDFTextStripper object.
*
* #throws IOException If there is an error loading the properties.
*/
public PDFTextExtractor() throws IOException {
}
String prevFont = "";
#Override
protected void writeString(String text, List<TextPosition> textPositions) throws IOException {
StringBuilder sb = new StringBuilder();
for (TextPosition position : textPositions){
String font = position.getFont().getName();
float x = position.getX();
float y = position.getY();
float fontSize = position.getFontSizeInPt();
if (font != null && !font.equals(prevFont)){
sb.append("[").append(font.split("-")[0]).append("+").append(font.split("-")[1]).append("+").append(fontSize).append("]");
prevFont = font;
}
sb.append(position.getUnicode());
}
writeString(sb.toString());
}
#Override
public String getText(PDDocument doc) throws IOException {
return super.getText(doc);
}
}
And i calling it like here:
FileOutputStream outputStream = new FileOutputStream(EXPORT_PATH + file.getName().split("\\.")[0] + ".docx");
try (PDDocument originalPDF = PDDocument.load(file);
XWPFDocument doc = new XWPFDocument()) {
//get All pages
PDPageTree pageList = originalPDF.getDocumentCatalog().getPages();
for (PDPage page : pageList){
//Parse Content
PDFTextStripper stripper = new PDFTextExtractor();
stripper.setSortByPosition(true);
String ss = stripper.getText(originalPDF);
System.out.println(ss);
//Write Content
XWPFParagraph paragraph = doc.createParagraph();
XWPFRun run = paragraph.createRun();
run.setText(ss);
run.addBreak(BreakType.PAGE);
}
doc.write(outputStream);
originalPDF.close();
outputStream.close();
}
#JasonPlutext,
Hi Jason! I tried the above code but it just replaces an totally the image deleting the whole template.
I would like to just replace/add a particular relationship of the image ,say
<Relationship Id="rId8" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="../media/image10.png"/>
in place of rId8 i would like to replace rId7 image.
My Source Code:
public static void main(String[] args) throws Exception {
String inputfilepath = "C:\\Users\\saranyac\\QUERIES\\Estimation\\PPT-PSR\\PSR_Dev0ps\\PSRAutomationTemplate.pptx";
PresentationMLPackage presentationMLPackage = (PresentationMLPackage)OpcPackage.load(new java.io.File(inputfilepath));
MainPresentationPart pp = presentationMLPackage.getMainPresentationPart();
SlidePart slidePart = presentationMLPackage.getMainPresentationPart().getSlide(0);
SlideLayoutPart layoutPart = slidePart.getSlideLayoutPart();
System.out.println("SlidePart Name:::::"+slidePart.getPartName().getName());
String layoutName = layoutPart.getJaxbElement().getCSld().getName();
System.out.println("layout: " + layoutPart.getPartName().getName() + " with cSld/#name='" + layoutName + "'");
System.out.println("Master: " + layoutPart.getSlideMasterPart().getPartName().getName());
System.out.println("layoutPart.getContents()::::::::s: " + layoutPart.getContents());
//layoutPart.setContents( (SldLayout)XmlUtils.unmarshalString(SAMPLE_PICTURE, Context.jcPML));
// Add image part
File file = new File("C:\\Users\\saranyac\\PPT-PSR\\PSR_Dev0ps\\ppt\\media\\image10.png" );
BinaryPartAbstractImage imagePart
= BinaryPartAbstractImage.createImagePart(presentationMLPackage, slidePart, file);
Relationship rel = pp.getRelationshipsPart().getRelationshipByID("rId8");
System.out.println("Relationship:::::::s: " +imagePart.getSourceRelationship().getId());
// pp.removeSlide(rel);
java.util.HashMap<String, String>mappings = new java.util.HashMap<String, String>();
mappings.put("rId8", imagePart.getSourceRelationship().getId());
String outputfilepath = "C:\\Work\\24Jan2018_CheckOut\\PPT-TRAILS\\Success.pptx";
//presentationMLPackage.save(new java.io.File(outputfilepath));
SaveToZipFile saver = new SaveToZipFile(presentationMLPackage);
saver.save(outputfilepath);
System.out.println("\n\n done .. saved " + outputfilepath);
}
Please help me how to replace an image in the generated PPT.
With Regards,
Saranya
See https://github.com/plutext/docx4j/blob/master/src/samples/pptx4j/org/pptx4j/samples/TemplateReplaceSimple.java (just added):
package org.pptx4j.samples;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import javax.xml.bind.JAXBException;
import org.apache.commons.io.FileUtils;
import org.docx4j.TraversalUtil;
import org.docx4j.TraversalUtil.CallbackImpl;
import org.docx4j.dml.CTBlip;
import org.docx4j.dml.CTBlipFillProperties;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.OpcPackage;
import org.docx4j.openpackaging.packages.PresentationMLPackage;
import org.docx4j.openpackaging.parts.Part;
import org.docx4j.openpackaging.parts.PresentationML.SlidePart;
import org.docx4j.openpackaging.parts.WordprocessingML.BinaryPartAbstractImage;
import org.pptx4j.Pptx4jException;
/**
* Example of how to replace text and images in a Pptx.
*
* Text is replaced using the familiar VariableReplace approach.
*
* Images are replaced by replacing their byte content.
*
* #author jharrop
*
*/
public class TemplateReplaceSimple {
public static void main(String[] args) throws Docx4JException, Pptx4jException, JAXBException, IOException {
// Input file
String inputfilepath = System.getProperty("user.dir") + "/sample-docs/pptx/image.pptx";
// String replacements
HashMap<String, String> mappings = new HashMap<String, String>();
mappings.put("colour", "green");
// Image replacements
List<ImageReplacementDetails> imageReplacements = new ArrayList<ImageReplacementDetails>();
ImageReplacementDetails example1 = new ImageReplacementDetails();
example1.slideIndex = 0;
example1.imageRelId = "rId2";
example1.replacementImageBytes = FileUtils.readFileToByteArray(new File("test.png"));
imageReplacements.add(example1);
PresentationMLPackage presentationMLPackage =
(PresentationMLPackage)OpcPackage.load(new java.io.File(inputfilepath));
// First, the text replacements
List<SlidePart> slideParts=
presentationMLPackage.getMainPresentationPart().getSlideParts();
for (SlidePart slidePart : slideParts) {
slidePart.variableReplace(mappings);
}
// Second, the image replacements.
// We have a design choice here.
// Either we can replace text placeholders with images,
// or we can replace existing images with new images, but keep the XML specifying size etc
// Here I opt for the latter, so what we need is the relId and image bytes.
for( ImageReplacementDetails ird : imageReplacements) {
// its a bit inefficient to potentially traverse a single slide
// multiple times, but I've done it this way to keep this example simple
SlidePart slidePart=
presentationMLPackage.getMainPresentationPart().getSlide(ird.slideIndex);
SlidePicFinder traverser = new SlidePicFinder();
new TraversalUtil(slidePart.getJaxbElement().getCSld().getSpTree().getSpOrGrpSpOrGraphicFrame(), traverser);
for(org.pptx4j.pml.Pic pic : traverser.pics) {
CTBlipFillProperties blipFill = pic.getBlipFill();
if (blipFill!=null) {
CTBlip blip = blipFill.getBlip();
if (blip.getEmbed()!=null) {
String relId = blip.getEmbed();
// is this the one we want?
if (relId.equals(ird.imageRelId)) {
Part part = slidePart.getRelationshipsPart().getPart(relId);
try {
BinaryPartAbstractImage imagePart = (BinaryPartAbstractImage)part;
// you'll need to ensure that you replace like with like,
// ie png for png, not eg jpeg for png!
imagePart.setBinaryData(ird.replacementImageBytes);
} catch (ClassCastException cce) {
System.out.println(part.getClass().getName());
}
} else {
System.out.println(relId + " isn't a match for this replacement. ");
}
} else {
System.out.println("No a:blip/#r:embed");
}
}
}
}
System.out.println("\n\n saving .. \n\n");
String outputfilepath = System.getProperty("user.dir") + "/OUT_VariableReplace.pptx";
presentationMLPackage.save(new java.io.File(outputfilepath));
System.out.println("\n\n done .. \n\n");
}
static class ImageReplacementDetails {
int slideIndex;
String imageRelId;
byte[] replacementImageBytes;
}
static class SlidePicFinder extends CallbackImpl {
List<org.pptx4j.pml.Pic> pics = new ArrayList<org.pptx4j.pml.Pic>();
public List<Object> apply(Object o) {
if (o instanceof org.pptx4j.pml.Pic) {
pics.add((org.pptx4j.pml.Pic) o);
System.out.println("added pic");
}
return null;
}
}
}
I have found kiwiwings answer to the question of how you can embed files into Excel using Apache POI, but unfortunately his answer only covers HSSF spreadsheets (the XLS format), and we are currently using the new XSSF format (XLSX), and the solution proposed for HSSF spreadsheets will not work. I tried porting it, but the final nail in the coffin comes from the fact that there is no HSSFObjectData equivalent in the XSSF world.
This is what I have done so far - I have found a way to attach the files to the Excel file. This code does it:
private PackagePart packageNotebook(
final OPCPackage pkg,
final String notebookTable,
final String taskId,
final String notebookName,
final byte[] contents
) throws InvalidFormatException, IOException
{
final PackagePartName partName =
PackagingURIHelper.createPartName( "/notebook/" + notebookTable + "/" + taskId + "/" + notebookName );
pkg.addRelationship( partName, TargetMode.INTERNAL, PackageRelationshipTypes.CUSTOM_XML );
final PackagePart part = pkg.createPart( partName, "text/xml" );
IOUtils.write( contents, part.getOutputStream() );
return part;
}
I was also able to create the image I wanted to use as the anchor in the Excel file. What I am unable to do, however, is to "link" that image to the embedded content, as kiwiwings was able to do in his reply.
My end goal is to have an XLSX Excel file with embedded objects in it, in such a way that the user can double click in the anchor I open in the cells and then be able to edit the file, just like you would do if you were embedding a file using the Excel client.
Does anyone have a working example on how to do that?
I've applied a patch via #60586, so embedding is now much easier. The following snipplet is taken from the corresponding JUnit test.
Workbook wb1 = new XSSFWorkbook();
Sheet sh = wb1.createSheet();
int picIdx = wb1.addPicture(getSamplePng(), Workbook.PICTURE_TYPE_PNG);
byte samplePPTX[] = getSamplePPT(true);
int oleIdx = wb1.addOlePackage(samplePPTX, "dummy.pptx", "dummy.pptx", "dummy.pptx");
Drawing<?> pat = sh.createDrawingPatriarch();
ClientAnchor anchor = pat.createAnchor(0, 0, 0, 0, 1, 1, 3, 6);
pat.createObjectData(anchor, oleIdx, picIdx);
Compared to the HSSF/Package Manager stuff this was more straight forward. :)
So as usual I've started by creating the necessary file via Excel 2016 and checked what's inside the xmls. Office likes to put a lot of AlternateContent tags in - in the below solution I've removed those wrappers and directly provided the elements within the original "Choice" elements, at least with Excel2016 it works ... - be aware that the embeddings in the original files of Excel2016 couldn't be opened in Libre Office 5 / Excel-Viewer, so your users need a normal installation.
As I've implemented it in the full developer codebase of POI, you might need use the full schemas.
The preview pictures will be replaced when the users tries to open the embedded objects. It would be nice if POIs WMF packages could be used to generate preview images on the fly, but I only have implemented them read-only up till now :(
If the embedded elements can't be opened, please drop me a line of your users Office installation and I try to downgrade accordingly.
package org.apache.poi.xssf;
import java.awt.geom.Rectangle2D;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.HashSet;
import java.util.Set;
import javax.xml.namespace.QName;
import org.apache.poi.POIXMLDocument;
import org.apache.poi.hpsf.ClassID;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.openxml4j.opc.PackagePart;
import org.apache.poi.openxml4j.opc.PackagePartName;
import org.apache.poi.openxml4j.opc.PackageRelationship;
import org.apache.poi.openxml4j.opc.PackageRelationshipTypes;
import org.apache.poi.openxml4j.opc.PackagingURIHelper;
import org.apache.poi.openxml4j.opc.TargetMode;
import org.apache.poi.poifs.filesystem.Ole10Native;
import org.apache.poi.poifs.filesystem.Ole10NativeException;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xslf.usermodel.XMLSlideShow;
import org.apache.poi.xslf.usermodel.XSLFTextBox;
import org.apache.poi.xssf.usermodel.XSSFClientAnchor;
import org.apache.poi.xssf.usermodel.XSSFDrawing;
import org.apache.poi.xssf.usermodel.XSSFPicture;
import org.apache.poi.xssf.usermodel.XSSFRelation;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.apache.xmlbeans.XmlCursor;
import org.apache.xmlbeans.XmlException;
import org.apache.xmlbeans.XmlObject;
import org.junit.Test;
import org.openxmlformats.schemas.drawingml.x2006.main.CTOfficeArtExtension;
import org.openxmlformats.schemas.drawingml.x2006.main.CTOfficeArtExtensionList;
import org.openxmlformats.schemas.drawingml.x2006.spreadsheetDrawing.CTPicture;
import org.openxmlformats.schemas.drawingml.x2006.spreadsheetDrawing.CTTwoCellAnchor;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTOleObject;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTOleObjects;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTWorksheet;
public class TestEmbed {
static final String drawNS = "http://schemas.microsoft.com/office/drawing/2010/main";
static final String relationshipsNS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
// write some embedded objects to sheet
#Test
public void write() throws IOException, InvalidFormatException {
XSSFWorkbook wb = new XSSFWorkbook();
XSSFSheet sh = wb.createSheet();
int imgPptId = addImageToWorkbook(wb, "ppt-icon.jpg", Workbook.PICTURE_TYPE_JPEG);
int imgPckId = addImageToWorkbook(wb, "PackageIcon.png", Workbook.PICTURE_TYPE_PNG);
String imgPckRelId = addImageToSheet(sh, imgPckId, Workbook.PICTURE_TYPE_PNG);
String imgPptRelId = addImageToSheet(sh, imgPptId, Workbook.PICTURE_TYPE_JPEG);
// embed two different HTML pages via package manager
XSSFClientAnchor imgAnchor1 = new XSSFClientAnchor(0, 0, 0, 0, 1, 1, 3, 3);
String oleRelId1 = addHtml(sh, 1);
int shapeId1 = addImageToShape(sh, imgAnchor1, imgPckId);
addObjectToShape(sh, imgAnchor1, shapeId1, oleRelId1, imgPckRelId, "Objekt-Manager-Shellobjekt");
XSSFClientAnchor imgAnchor2 = new XSSFClientAnchor(0, 0, 0, 0, 5, 1, 7, 3);
String oleRelId2 = addHtml(sh, 2);
int shapeId2 = addImageToShape(sh, imgAnchor2, imgPckId);
addObjectToShape(sh, imgAnchor2, shapeId2, oleRelId2, imgPckRelId, "Objekt-Manager-Shellobjekt");
// embed a slideshow (no package manager needed)
XSSFClientAnchor imgAnchor3 = new XSSFClientAnchor(0, 0, 0, 0, 1, 5, 7, 10);
String oleRelId3 = addSlideShow(sh, 1);
int shapeId3 = addImageToShape(sh, imgAnchor3, imgPptId);
addObjectToShape(sh, imgAnchor3, shapeId3, oleRelId3, imgPptRelId, "Presentation");
FileOutputStream fos = new FileOutputStream("bla.xlsx");
wb.write(fos);
fos.close();
wb.close();
}
// read Ole10Native objects from workbook
#Test
public void read() throws IOException, XmlException, Ole10NativeException {
XSSFWorkbook wb = new XSSFWorkbook(new FileInputStream("bla.xlsx"));
XSSFSheet sheet = wb.getSheetAt(0);
CTWorksheet cws = sheet.getCTWorksheet();
if (!cws.isSetOleObjects()) {
System.out.println("sheet has no ole objects");
} else {
Set<Integer> processedShapes = new HashSet<Integer>();
for (XmlObject xOleObj : cws.getOleObjects().selectPath("declare namespace p='"+XSSFRelation.NS_SPREADSHEETML+"' .//p:oleObject")) {
XmlCursor cur = xOleObj.newCursor();
String shapeId = cur.getAttributeText(new QName("shapeId"));
String relId = cur.getAttributeText(new QName(relationshipsNS, "id"));
cur.dispose();
if (processedShapes.contains(Integer.valueOf(shapeId))) {
continue;
}
processedShapes.add(Integer.valueOf(shapeId));
PackagePart pp = sheet.getRelationById(relId).getPackagePart();
if ("application/vnd.openxmlformats-officedocument.oleObject".equals(pp.getContentType())) {
InputStream is = pp.getInputStream();
POIFSFileSystem poifs = new POIFSFileSystem(is);
is.close();
Ole10Native ole10 = Ole10Native.createFromEmbeddedOleObject(poifs);
poifs.close();
System.out.println("Filename: "+ole10.getFileName()+" - content length: "+ole10.getDataSize());
}
}
}
wb.close();
}
// add a dummy html to the embeddings folder
private static String addHtml(XSSFSheet sh, int oleId) throws IOException, InvalidFormatException {
String html10 = "<html><body><marquee>This is the end. Html-id: "+oleId+"</marquee></body></html>";
Ole10Native ole10 = new Ole10Native("html"+oleId+".html", "html"+oleId+".html", "html"+oleId+".html", html10.getBytes("ISO-8859-1"));
ByteArrayOutputStream bos = new ByteArrayOutputStream(500);
ole10.writeOut(bos);
POIFSFileSystem poifs = new POIFSFileSystem();
poifs.getRoot().createDocument(Ole10Native.OLE10_NATIVE, new ByteArrayInputStream(bos.toByteArray()));
poifs.getRoot().setStorageClsid(ClassID.OLE10_PACKAGE);
final PackagePartName pnOLE = PackagingURIHelper.createPartName( "/xl/embeddings/oleObject"+oleId+".bin" );
final PackagePart partOLE = sh.getWorkbook().getPackage().createPart( pnOLE, "application/vnd.openxmlformats-officedocument.oleObject" );
PackageRelationship prOLE = sh.getPackagePart().addRelationship( pnOLE, TargetMode.INTERNAL, POIXMLDocument.OLE_OBJECT_REL_TYPE );
OutputStream os = partOLE.getOutputStream();
poifs.writeFilesystem(os);
os.close();
poifs.close();
return prOLE.getId();
}
// add a dummy slideshow to the embeddings folder
private static String addSlideShow(XSSFSheet sh, int pptId) throws IOException, InvalidFormatException {
XMLSlideShow ppt = new XMLSlideShow();
XSLFTextBox tb = ppt.createSlide().createTextBox();
tb.setText("this is the end - PPT-ID: "+pptId);
tb.setAnchor(new Rectangle2D.Double(100,100,100,100));
final PackagePartName pnPPT = PackagingURIHelper.createPartName( "/xl/embeddings/sample"+pptId+".pptx" );
final PackagePart partPPT = sh.getWorkbook().getPackage().createPart( pnPPT, "application/vnd.openxmlformats-officedocument.presentationml.presentation" );
PackageRelationship prPPT = sh.getPackagePart().addRelationship( pnPPT, TargetMode.INTERNAL, POIXMLDocument.PACK_OBJECT_REL_TYPE );
OutputStream os = partPPT.getOutputStream();
ppt.write(os);
os.close();
ppt.close();
return prPPT.getId();
}
private static int addImageToWorkbook(XSSFWorkbook wb, String fileName, int pictureType) throws IOException {
FileInputStream fis = new FileInputStream(fileName);
int imgId = wb.addPicture(fis, pictureType);
fis.close();
return imgId;
}
private static String addImageToSheet(XSSFSheet sh, int imgId, int pictureType) throws InvalidFormatException {
final PackagePartName pnIMG = PackagingURIHelper.createPartName( "/xl/media/image"+(imgId+1)+(pictureType == Workbook.PICTURE_TYPE_PNG ? ".png" : ".jpeg") );
PackageRelationship prIMG = sh.getPackagePart().addRelationship( pnIMG, TargetMode.INTERNAL, PackageRelationshipTypes.IMAGE_PART );
return prIMG.getId();
}
private static int addImageToShape(XSSFSheet sh, XSSFClientAnchor imgAnchor, int imgId) {
XSSFDrawing pat = sh.createDrawingPatriarch();
XSSFPicture pic = pat.createPicture(imgAnchor, imgId);
CTPicture cPic = pic.getCTPicture();
int shapeId = (int)cPic.getNvPicPr().getCNvPr().getId();
cPic.getNvPicPr().getCNvPr().setHidden(true);
CTOfficeArtExtensionList extLst = cPic.getNvPicPr().getCNvPicPr().addNewExtLst();
// https://msdn.microsoft.com/en-us/library/dd911027(v=office.12).aspx
CTOfficeArtExtension ext = extLst.addNewExt();
ext.setUri("{63B3BB69-23CF-44E3-9099-C40C66FF867C}");
XmlCursor cur = ext.newCursor();
cur.toEndToken();
cur.beginElement(new QName(drawNS, "compatExt", "a14"));
cur.insertAttributeWithValue("spid", "_x0000_s"+shapeId);
return shapeId;
}
private static void addObjectToShape(XSSFSheet sh, XSSFClientAnchor imgAnchor, int shapeId, String objRelId, String imgRelId, String progId) {
CTWorksheet cwb = sh.getCTWorksheet();
CTOleObjects oo = cwb.isSetOleObjects() ? cwb.getOleObjects() : cwb.addNewOleObjects();
CTOleObject ole1 = oo.addNewOleObject();
ole1.setProgId(progId);
ole1.setShapeId(shapeId);
ole1.setId(objRelId);
XmlCursor cur1 = ole1.newCursor();
cur1.toEndToken();
cur1.beginElement("objectPr", XSSFRelation.NS_SPREADSHEETML);
cur1.insertAttributeWithValue("id", relationshipsNS, imgRelId);
cur1.insertAttributeWithValue("defaultSize", "0");
cur1.beginElement("anchor", XSSFRelation.NS_SPREADSHEETML);
cur1.insertAttributeWithValue("moveWithCells", "1");
CTTwoCellAnchor anchor = CTTwoCellAnchor.Factory.newInstance();
anchor.setFrom(imgAnchor.getFrom());
anchor.setTo(imgAnchor.getTo());
XmlCursor cur2 = anchor.newCursor();
cur2.copyXmlContents(cur1);
cur2.dispose();
cur1.toParent();
cur1.toFirstChild();
cur1.setName(new QName(XSSFRelation.NS_SPREADSHEETML, "from"));
cur1.toNextSibling();
cur1.setName(new QName(XSSFRelation.NS_SPREADSHEETML, "to"));
cur1.dispose();
}
}
currently I'm trying to convert a PDF to PDF/A.
However somehow I don't know if I can convert the colorspace is there any way by doing so?
this is my code, yet:
PDDocumentInformation info = doc.getDocumentInformation();
System.out.println("Page Count=" + doc.getNumberOfPages());
System.out.println("Title=" + info.getTitle());
System.out.println("Author=" + info.getAuthor());
System.out.println("Subject=" + info.getSubject());
System.out.println("Keywords=" + info.getKeywords());
System.out.println("Creator=" + info.getCreator());
System.out.println("Producer=" + info.getProducer());
System.out.println("Creation Date=" + info.getCreationDate());
System.out.println("Modification Date=" + info.getModificationDate());
System.out.println("Trapped=" + info.getTrapped());
PDDocumentCatalog cat = doc.getDocumentCatalog();
XMPMetadata xmp = XMPMetadata.createXMPMetadata();
PDFAIdentificationSchema pdfaid = xmp.createAndAddPFAIdentificationSchema();
pdfaid.setConformance("A");
pdfaid.setPart(3);
pdfaid.setAboutAsSimple(null);
DublinCoreSchema dublinCoreSchema = xmp.createAndAddDublinCoreSchema();
dublinCoreSchema.setTitle(info.getTitle());
dublinCoreSchema.addCreator(info.getAuthor());
AdobePDFSchema adobePDFSchema = xmp.createAndAddAdobePDFSchema();
adobePDFSchema.setProducer(info.getProducer());
XMPBasicSchema xmpBasicSchema = xmp.createAndAddXMPBasicSchema();
xmpBasicSchema.setCreatorTool(info.getCreator());
xmpBasicSchema.setCreateDate(info.getCreationDate());
xmpBasicSchema.setModifyDate(info.getModificationDate());
xmp.addSchema(pdfaid);
XmpSerializer serializer = new XmpSerializer();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
serializer.serialize(xmp, baos, true);
InputStream colorProfile = PdfConverter.class.getResourceAsStream("/sRGBColorSpaceProfile.icm");
PDOutputIntent oi = new PDOutputIntent(doc, colorProfile);
oi.setInfo("sRGB IEC61966-2.1");
oi.setOutputCondition("sRGB IEC61966-2.1");
oi.setOutputConditionIdentifier("sRGB IEC61966-2.1");
oi.setRegistryName("http://www.color.org");
cat.addOutputIntent(oi);
PDMetadata metadata = new PDMetadata(doc);
metadata.importXMPMetadata(baos.toByteArray());
cat.setMetadata(metadata);
The colorspace gets added however on validation i get:
2.3.2 : Unexpected key in Graphic object definition, The ColorSpace is unknown
For every page/element whatever, it appears quite often.
Could I do anything against it? Like converting the ColorsSpace? Using antoher library?
I have found this trick to convert pdf to pdfA.
Fill the PDF form
Convert it to image
Create a valid PDFA form as explained in PDFBox website
Fill the image created as the result
In this example, I used : OoPdfFormExample.pdf that can be found easily in internet.
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDMetadata;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;
import org.apache.pdfbox.pdmodel.graphics.color.PDOutputIntent;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.preflight.Format;
import org.apache.pdfbox.preflight.PreflightDocument;
import org.apache.pdfbox.preflight.ValidationResult;
import org.apache.pdfbox.preflight.exception.SyntaxValidationException;
import org.apache.pdfbox.preflight.parser.PreflightParser;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.apache.pdfbox.tools.imageio.ImageIOUtil;
import org.apache.xmpbox.XMPMetadata;
import org.apache.xmpbox.schema.DublinCoreSchema;
import org.apache.xmpbox.schema.PDFAIdentificationSchema;
import org.apache.xmpbox.type.BadFieldValueException;
import org.apache.xmpbox.xml.XmpSerializer;
import javax.xml.transform.TransformerException;
import java.awt.image.BufferedImage;
import java.io.*;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Random;
public class CreatePDFAFile {
private static final String OUTPUT_DIR = "tmp";
static String separator = FileSystems.getDefault().getSeparator();
public static void main(String[] args) throws IOException {
Path tmpDir = getRandomPath();
String fileInput = fillForm("template/OoPdfFormExample.pdf", tmpDir);
String image = PDF2Image(fileInput, tmpDir);
String pdfa = createPDFA(image, tmpDir);
checkPDFAValidation(pdfa);
}
private static String fillForm(String formTemplate, Path path) throws IOException {
String fileOut = path + separator + "FillForm.pdf";
try (PDDocument pdfDocument = PDDocument.load(new File(formTemplate))) {
PDAcroForm acroForm = pdfDocument.getDocumentCatalog().getAcroForm();
if (acroForm != null) {
acroForm.getField(acroForm.getFields().get(0).getFullyQualifiedName()).setValue("TEST");
}
acroForm.refreshAppearances();
acroForm.flatten();
pdfDocument.save(fileOut);
}
return fileOut;
}
public static String PDF2Image(String fileInput, Path path) {
String fileName = "";
try (final PDDocument document = PDDocument.load(new File(fileInput))) {
PDFRenderer pdfRenderer = new PDFRenderer(document);
for (int page = 0; page < document.getNumberOfPages(); ++page) {
BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
fileName = path + separator + "image-" + page + ".png";
ImageIOUtil.writeImage(bim, fileName, 300);
}
} catch (IOException e) {
System.err.println("Exception while trying to create pdf document - " + e);
}
return fileName;
}
public static String createPDFA(String imagePath, Path path) throws IOException {
try (PDDocument doc = new PDDocument()) {
PDPage page = new PDPage();
doc.addPage(page);
PDFont font = PDType0Font.load(doc, new File("template" + separator + "LiberationSans-Regular.ttf"));
if (!font.isEmbedded()) {
throw new IllegalStateException("PDF/A compliance requires that all fonts used for"
+ " text rendering in rendering modes other than rendering mode 3 are embedded.");
}
try (PDPageContentStream contents = new PDPageContentStream(doc, page)) {
contents.beginText();
contents.setFont(font, 12);
contents.newLineAtOffset(100, 700);
contents.showText("");
contents.endText();
}
// add XMP metadata
XMPMetadata xmp = XMPMetadata.createXMPMetadata();
String fileName = path + separator + "FinalPDFAFile.pdf";
try {
DublinCoreSchema dc = xmp.createAndAddDublinCoreSchema();
dc.setTitle(fileName);
PDFAIdentificationSchema id = xmp.createAndAddPFAIdentificationSchema();
id.setPart(1);
id.setConformance("B");
XmpSerializer serializer = new XmpSerializer();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
serializer.serialize(xmp, baos, true);
PDMetadata metadata = new PDMetadata(doc);
metadata.importXMPMetadata(baos.toByteArray());
doc.getDocumentCatalog().setMetadata(metadata);
} catch (BadFieldValueException | TransformerException e) {
throw new IllegalArgumentException(e);
}
// sRGB output intent
InputStream colorProfile = new FileInputStream(new File("template/sRGB.icc"));
PDOutputIntent intent = new PDOutputIntent(doc, colorProfile);
intent.setInfo("");
intent.setOutputCondition("");
intent.setOutputConditionIdentifier("");
intent.setRegistryName("");
doc.getDocumentCatalog().addOutputIntent(intent);
PDImageXObject pdImage = PDImageXObject.createFromFile(imagePath, doc);
try (PDPageContentStream contentStream = new PDPageContentStream(doc, page, PDPageContentStream.AppendMode.APPEND, true, true)) {
float scale = 1 / 5f;
contentStream.drawImage(pdImage, 20, 20, pdImage.getWidth() * scale, pdImage.getHeight() * scale);
}
doc.save(fileName);
return fileName;
}
}
private static void checkPDFAValidation(String fileName) throws IOException {
ValidationResult result = null;
PreflightParser parser = new PreflightParser(fileName);
try {
parser.parse(Format.PDF_A1B);
PreflightDocument document = parser.getPreflightDocument();
document.validate();
// Get validation result
result = document.getResult();
document.close();
} catch (SyntaxValidationException e) {
result = e.getResult();
}
if (result.isValid()) {
System.out.println("The file " + fileName + " is a valid PDF/A-1b file");
} else {
System.out.println("The file" + fileName + " is not valid, error(s) :");
for (ValidationResult.ValidationError error : result.getErrorsList()) {
System.out.println(error.getErrorCode() + " : " + error.getDetails());
}
}
}
private static Path getRandomPath() throws IOException {
String path = generateRandom();
Path tmpDir = Paths.get(OUTPUT_DIR + separator + path + separator);
Files.createDirectory(tmpDir);
return tmpDir;
}
private static String generateRandom() {
String aToZ = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
Random rand = new Random();
StringBuilder res = new StringBuilder();
for (int i = 0; i < 17; i++) {
int randIndex = rand.nextInt(aToZ.length());
res.append(aToZ.charAt(randIndex));
}
return res.toString();
}
}
Some questions about using Zxing...
I write the following code to read barcode from an image:
public class BarCodeDecode
{
/**
* #param args
*/
public static void main(String[] args)
{
try
{
String tmpImgFile = "D:\\FormCode128.TIF";
Map<DecodeHintType,Object> tmpHintsMap = new EnumMap<DecodeHintType, Object>(DecodeHintType.class);
tmpHintsMap.put(DecodeHintType.TRY_HARDER, Boolean.TRUE);
tmpHintsMap.put(DecodeHintType.POSSIBLE_FORMATS, EnumSet.allOf(BarcodeFormat.class));
tmpHintsMap.put(DecodeHintType.PURE_BARCODE, Boolean.FALSE);
File tmpFile = new File(tmpImgFile);
String tmpRetString = BarCodeUtil.decode(tmpFile, tmpHintsMap);
//String tmpRetString = BarCodeUtil.decode(tmpFile, null);
System.out.println(tmpRetString);
}
catch (Exception tmpExpt)
{
System.out.println("main: " + "Excpt err! (" + tmpExpt.getMessage() + ")");
}
System.out.println("main: " + "Program end.");
}
}
public class BarCodeUtil
{
private static BarcodeFormat DEFAULT_BARCODE_FORMAT = BarcodeFormat.CODE_128;
/**
* Decode method used to read image or barcode itself, and recognize the barcode,
* get the encoded contents and returns it.
* #param whatFile image that need to be read.
* #param config configuration used when reading the barcode.
* #return decoded results from barcode.
*/
public static String decode(File whatFile, Map<DecodeHintType, Object> whatHints) throws Exception
{
// check the required parameters
if (whatFile == null || whatFile.getName().trim().isEmpty())
throw new IllegalArgumentException("File not found, or invalid file name.");
BufferedImage tmpBfrImage;
try
{
tmpBfrImage = ImageIO.read(whatFile);
}
catch (IOException tmpIoe)
{
throw new Exception(tmpIoe.getMessage());
}
if (tmpBfrImage == null)
throw new IllegalArgumentException("Could not decode image.");
LuminanceSource tmpSource = new BufferedImageLuminanceSource(tmpBfrImage);
BinaryBitmap tmpBitmap = new BinaryBitmap(new HybridBinarizer(tmpSource));
MultiFormatReader tmpBarcodeReader = new MultiFormatReader();
Result tmpResult;
String tmpFinalResult;
try
{
if (whatHints != null && ! whatHints.isEmpty())
tmpResult = tmpBarcodeReader.decode(tmpBitmap, whatHints);
else
tmpResult = tmpBarcodeReader.decode(tmpBitmap);
// setting results.
tmpFinalResult = String.valueOf(tmpResult.getText());
}
catch (Exception tmpExcpt)
{
throw new Exception("BarCodeUtil.decode Excpt err - " + tmpExcpt.toString() + " - " + tmpExcpt.getMessage());
}
return tmpFinalResult;
}
}
I try to read the following two images that contains code128 and QRCode.
It can work for the code128 but not for the QRCode.
Any one knows why...
Please go through this link for complete Tutorial. The author of this code is Joe. I have not developed this code, so I am just doing Copy paste to make sure this is available in case link is broken.
The author is using ZXing(Zebra Crossing Library) you can download it from here, for this tutorial.
QR Code Write and Read Program in Java:
package com.javapapers.java;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import javax.imageio.ImageIO;
import com.google.zxing.BarcodeFormat;
import com.google.zxing.BinaryBitmap;
import com.google.zxing.EncodeHintType;
import com.google.zxing.MultiFormatReader;
import com.google.zxing.MultiFormatWriter;
import com.google.zxing.NotFoundException;
import com.google.zxing.Result;
import com.google.zxing.WriterException;
import com.google.zxing.client.j2se.BufferedImageLuminanceSource;
import com.google.zxing.client.j2se.MatrixToImageWriter;
import com.google.zxing.common.BitMatrix;
import com.google.zxing.common.HybridBinarizer;
import com.google.zxing.qrcode.decoder.ErrorCorrectionLevel;
public class QRCode {
public static void main(String[] args) throws WriterException, IOException,
NotFoundException {
String qrCodeData = "Hello World!";
String filePath = "QRCode.png";
String charset = "UTF-8"; // or "ISO-8859-1"
Map<EncodeHintType, ErrorCorrectionLevel> hintMap = new HashMap<EncodeHintType, ErrorCorrectionLevel>();
hintMap.put(EncodeHintType.ERROR_CORRECTION, ErrorCorrectionLevel.L);
createQRCode(qrCodeData, filePath, charset, hintMap, 200, 200);
System.out.println("QR Code image created successfully!");
System.out.println("Data read from QR Code: "
+ readQRCode(filePath, charset, hintMap));
}
public static void createQRCode(String qrCodeData, String filePath,
String charset, Map hintMap, int qrCodeheight, int qrCodewidth)
throws WriterException, IOException {
BitMatrix matrix = new MultiFormatWriter().encode(
new String(qrCodeData.getBytes(charset), charset),
BarcodeFormat.QR_CODE, qrCodewidth, qrCodeheight, hintMap);
MatrixToImageWriter.writeToFile(matrix, filePath.substring(filePath
.lastIndexOf('.') + 1), new File(filePath));
}
public static String readQRCode(String filePath, String charset, Map hintMap)
throws FileNotFoundException, IOException, NotFoundException {
BinaryBitmap binaryBitmap = new BinaryBitmap(new HybridBinarizer(
new BufferedImageLuminanceSource(
ImageIO.read(new FileInputStream(filePath)))));
Result qrCodeResult = new MultiFormatReader().decode(binaryBitmap,
hintMap);
return qrCodeResult.getText();
}
}
Maven dependency for the ZXing QR Code library:
<dependency>
<groupId>com.google.zxing</groupId>
<artifactId>core</artifactId>
<version>2.2</version>
</dependency>
<dependency>
<groupId>com.google.zxing</groupId>
<artifactId>javase</artifactId>
<version>2.2</version>
</dependency>
Curiously your code works for me, but I had to remove the follow hint.
tmpHintsMap.put(DecodeHintType.PURE_BARCODE, Boolean.FALSE);
When my image is not pure barcode, this hint broke my result.
Thank you!
This Code worked for me.
public static List<string> ScanForBarcodes(string path)
{
return ScanForBarcodes(new Bitmap(path));
}
public static List<string> ScanForBarcodes(Bitmap bitmap)
{
// initialize a new Barcode reader.
BarcodeReader reader = new BarcodeReader
{
TryHarder = true, // TryHarder is slower but recognizes more Barcodes
PossibleFormats = new List<BarcodeFormat> // in the ZXing There is an Enum where all supported barcodeFormats were contained
{
BarcodeFormat.CODE_128,
BarcodeFormat.QR_CODE,
//BarcodeFormat. ... ;
}
};
return reader.DecodeMultiple(bitmap).Select(result => result.Text).ToList(); // return only the barcode string.
// If you want the full Result use: return reader.DecodeMultiple(bitmap);
}
Did you use this (ZXing.Net) Lib?