pdfbox get ArrayIndexOutOfBoundsException with TrueTypeFont.getUnicodeCmap

I am using PDFBox to render a PDF file to an image, but I get an ArrayIndexOutOfBoundsException in the TrueTypeFont.getUnicodeCmap method: cmapTable.getCmaps() is empty, so cmapTable.getCmaps()[0] is out of bounds. Here is the call stack:
java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.fontbox.ttf.TrueTypeFont.getUnicodeCmap(TrueTypeFont.java:566)
at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.<init>(PDCIDFontType2.java:183)
at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.<init>(PDCIDFontType2.java:70)
at org.apache.pdfbox.pdmodel.font.PDFontFactory.createDescendantFont(PDFontFactory.java:125)
at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:128)
at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:83)
at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:123)
at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:60)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:815)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:472)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:446)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:149)
at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:189)
at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:208)
at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:139)
at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:80)
And here is the getUnicodeCmap method in PDFBox:
public CmapSubtable getUnicodeCmap(boolean isStrict) throws IOException
{
CmapTable cmapTable = getCmap();
if (cmapTable == null)
{
if (isStrict)
{
throw new IOException("The TrueType font does not contain a 'cmap' table");
}
else
{
return null;
}
}
CmapSubtable cmap = cmapTable.getSubtable(CmapTable.PLATFORM_UNICODE,
CmapTable.ENCODING_UNICODE_2_0_FULL);
if (cmap == null)
{
cmap = cmapTable.getSubtable(CmapTable.PLATFORM_UNICODE,
CmapTable.ENCODING_UNICODE_2_0_BMP);
}
if (cmap == null)
{
cmap = cmapTable.getSubtable(CmapTable.PLATFORM_WINDOWS,
CmapTable.ENCODING_WIN_UNICODE_BMP);
}
if (cmap == null)
{
// Microsoft's "Recommendations for OpenType Fonts" says that "Symbol" encoding
// actually means "Unicode, non-standard character set"
cmap = cmapTable.getSubtable(CmapTable.PLATFORM_WINDOWS,
CmapTable.ENCODING_WIN_SYMBOL);
}
if (cmap == null)
{
if (isStrict)
{
throw new IOException("The TrueType font does not contain a Unicode cmap");
}
else
{
// fallback to the first cmap (may not be Unicode, so may produce poor results)
cmap = cmapTable.getCmaps()[0];
}
}
return cmap;
}
I found that this file contains fonts with a custom encoding, and there is a warning comment in the PDFBox code: "fallback to the first cmap (may not be Unicode, so may produce poor results)". So I suspect that the custom font encoding caused this problem. Is this correct?
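To illustrate the failure mode itself: the fallback branch indexes element 0 of the array without checking that the array has any elements, and "ArrayIndexOutOfBoundsException: 0" is what that produces for an empty array. Below is a minimal sketch of a guarded fallback. It uses a plain String array as a stand-in for the CmapSubtable[] returned by getCmaps(), and the class and method names are hypothetical, not PDFBox API.

```java
public class CmapFallbackSketch {
    // Stand-in for the fallback branch: return the first subtable,
    // or null when the cmap table has no subtables at all.
    static <T> T firstOrNull(T[] subtables) {
        return (subtables == null || subtables.length == 0) ? null : subtables[0];
    }

    public static void main(String[] args) {
        String[] empty = new String[0];
        String[] one = {"platform 1, encoding 0"};
        System.out.println(firstOrNull(empty)); // null instead of an exception
        System.out.println(firstOrNull(one));   // the first (possibly non-Unicode) entry
    }
}
```

Whether returning null is acceptable depends on how the caller handles a missing cmap, which this sketch does not cover.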

Related

How to fix the itext 7 exception Unexpected ColorSpace

I am trying to read a PDF document using iText 7.1.9 and I get an exception that looks like the following:
com.itextpdf.kernel.pdf.canvas.parser.util.InlineImageParsingUtils$InlineImageParseException: Unexpected ColorSpace: /R9.
at com.itextpdf.kernel.pdf.canvas.parser.util.InlineImageParsingUtils.getComponentsPerPixel(InlineImageParsingUtils.java:257)
at com.itextpdf.kernel.pdf.canvas.parser.util.InlineImageParsingUtils.computeBytesPerRow(InlineImageParsingUtils.java:271)
at com.itextpdf.kernel.pdf.canvas.parser.util.InlineImageParsingUtils.parseUnfilteredSamples(InlineImageParsingUtils.java:298)
at com.itextpdf.kernel.pdf.canvas.parser.util.InlineImageParsingUtils.parseSamples(InlineImageParsingUtils.java:345)
at com.itextpdf.kernel.pdf.canvas.parser.util.InlineImageParsingUtils.parse(InlineImageParsingUtils.java:163)
at com.itextpdf.kernel.pdf.canvas.parser.util.PdfCanvasParser.parse(PdfCanvasParser.java:119)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.processContent(PdfCanvasProcessor.java:283)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.processPageContent(PdfCanvasProcessor.java:306)
at com.itextpdf.kernel.pdf.canvas.parser.PdfDocumentContentParser.processContent(PdfDocumentContentParser.java:77)
at com.itextpdf.kernel.pdf.canvas.parser.PdfDocumentContentParser.processContent(PdfDocumentContentParser.java:90)
I suppose this method needs to be fixed in the iText 7 library:
private static int getComponentsPerPixel(PdfName colorSpaceName, PdfDictionary colorSpaceDic) {
if (colorSpaceName == null) {
return 1;
} else if (colorSpaceName.equals(PdfName.DeviceGray)) {
return 1;
} else if (colorSpaceName.equals(PdfName.DeviceRGB)) {
return 3;
} else if (colorSpaceName.equals(PdfName.DeviceCMYK)) {
return 4;
} else {
if (colorSpaceDic != null) {
PdfArray colorSpace = colorSpaceDic.getAsArray(colorSpaceName);
if (colorSpace != null) {
if (PdfName.Indexed.equals(colorSpace.getAsName(0))) { // this check fails: colorSpace.getAsName(0) returns /ICCBased
return 1;
}
} else {
PdfName tempName = colorSpaceDic.getAsName(colorSpaceName);
if (tempName != null) {
return getComponentsPerPixel(tempName, colorSpaceDic);
}
}
}
throw (new InlineImageParseException("Unexpected ColorSpace: {0}.")).setMessageParams(new Object[]{colorSpaceName});
}
}
I don't understand why I get it and how to fix it.

This issue was fixed in iText more than a year ago. The fix was first released in version 7.1.16.
Thus, please update your iText version.
For details, the fix was added in commit df4013c8f141059a91373008ac4a7013f6be0852, authored on 2021-04-14 10:36:24 and committed on 2021-05-25 15:11:36, with the comment:
Add support processing inline image with ICCBased color space in resources
DEVSIX-5295
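For context, an ICCBased color space is written as an array of the form [/ICCBased stream], and the /N entry of that stream's dictionary gives the number of color components (1, 3, or 4). The sketch below shows what such a lookup could look like. It models the color space resources with plain Java collections instead of iText classes, all names in it are hypothetical, and it is not the code from the commit.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ComponentsPerPixelSketch {
    // resources maps a color space name to an array (List) such as
    // ["/Indexed", ...] or ["/ICCBased", streamDictionary].
    static int componentsPerPixel(String name, Map<String, Object> resources) {
        if (name == null || name.equals("/DeviceGray")) return 1;
        if (name.equals("/DeviceRGB")) return 3;
        if (name.equals("/DeviceCMYK")) return 4;
        Object entry = resources == null ? null : resources.get(name);
        if (entry instanceof List) {
            List<?> array = (List<?>) entry;
            Object family = array.isEmpty() ? null : array.get(0);
            if ("/Indexed".equals(family)) return 1;
            if ("/ICCBased".equals(family) && array.size() > 1 && array.get(1) instanceof Map) {
                // /N in the ICC profile stream dictionary holds the component count
                Object n = ((Map<?, ?>) array.get(1)).get("/N");
                if (n instanceof Integer) return (Integer) n;
            }
        }
        throw new IllegalArgumentException("Unexpected ColorSpace: " + name);
    }

    public static void main(String[] args) {
        Map<String, Object> iccStream = new HashMap<>();
        iccStream.put("/N", 3);
        Map<String, Object> resources = new HashMap<>();
        resources.put("/R9", Arrays.asList("/ICCBased", iccStream));
        System.out.println(componentsPerPixel("/R9", resources)); // 3
    }
}
```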

Apache POI XSLF remove shadow from text on the slide

I have a pptx file with a simple presentation. It has a background image with white text on it, and this text has a shadow. I need to simplify the presentation and remove all of these things (set the background to white, set the font color to black, and remove the shadows).
Changing the background and font colors is pretty straightforward, like this:
SlideShow ppt = SlideShowFactory.create(inputStream);
List<Slide> slides = ppt.getSlides();
for (int i = 0; i < slides.size(); i++) {
    Slide slide = slides.get(i);
    ((XSLFSlide) slide).getBackground().setFillColor(Color.white);
    XSLFTextShape[] shapes = ((XSLFSlide) slide).getPlaceholders();
    for (XSLFTextShape textShape : shapes) {
        List<XSLFTextParagraph> textparagraphs = textShape.getTextParagraphs();
        for (XSLFTextParagraph para : textparagraphs) {
            List<XSLFTextRun> textruns = para.getTextRuns();
            for (XSLFTextRun incomingTextRun : textruns) {
                incomingTextRun.setFontColor(Color.black);
            }
        }
    }
}
But I can't figure out how to remove the shadows. Here is an example before and after.
I tried calling getShadow() on the TextShape, but it returns null, and XSLFTextRun has no methods to manage text shadows. For HSLF I saw that there is setShadowed() for TextRun.
But how do I deal with shadows in XSLF?
Thanks!
UPDATE:
Thanks to Axel Richter for a really valuable answer.
In my document I found two cases of shadowed text.
The first one is as Axel described; the solution is to remove the shadow from the CTRegularTextRun. I also found out that XSLFTextParagraph.getTextRuns() may contain line break objects, so before casting the result of XSLFTextRun.getXmlObject() it is a good idea to check that it is an instance of CTRegularTextRun and not CTTextLineBreak.
Code:
private void clearShadowFromTextRun(XSLFTextRun run) {
if (run.getXmlObject() instanceof CTRegularTextRun) {
CTRegularTextRun cTRun = (CTRegularTextRun) run.getXmlObject();
if (cTRun.getRPr() != null) {
if (cTRun.getRPr().getEffectLst() != null) {
if (cTRun.getRPr().getEffectLst().getOuterShdw() != null) {
cTRun.getRPr().getEffectLst().unsetOuterShdw();
}
}
}
}
}
Second case: the slide master contains style definitions for body and title. So if we want to remove all shadows completely, we should clear them there too.
Code:
private void clearSlideMastersShadowStyles(XMLSlideShow ppt) {
List<XSLFSlideMaster> slideMasters = ppt.getSlideMasters();
for (XSLFSlideMaster slideMaster : slideMasters) {
CTSlideMaster ctSlideMaster = slideMaster.getXmlObject();
if (ctSlideMaster.getTxStyles() != null) {
if (ctSlideMaster.getTxStyles().getTitleStyle() != null) {
clearShadowsFromStyle(ctSlideMaster.getTxStyles().getTitleStyle());
}
if (ctSlideMaster.getTxStyles().getBodyStyle() != null) {
clearShadowsFromStyle(ctSlideMaster.getTxStyles().getBodyStyle());
}
if (ctSlideMaster.getTxStyles().getOtherStyle() != null) {
clearShadowsFromStyle(ctSlideMaster.getTxStyles().getOtherStyle());
}
}
}
}
private void clearShadowsFromStyle(CTTextListStyle ctTextListStyle) {
if (ctTextListStyle.getLvl1PPr() != null) {
if (ctTextListStyle.getLvl1PPr().getDefRPr() != null)
if (ctTextListStyle.getLvl1PPr().getDefRPr().getEffectLst() != null)
if (ctTextListStyle.getLvl1PPr().getDefRPr().getEffectLst().getOuterShdw() != null)
ctTextListStyle.getLvl1PPr().getDefRPr().getEffectLst().unsetOuterShdw();
}
// same for the other 8 levels. Ugly, huh...
}

Text shadow settings are not yet implemented in XSLFTextRun, but of course they are set in the XML.
A run with shadowed text looks like this:
<a:r>
<a:rPr lang="de-DE" smtClean="0" dirty="0" b="1">
<a:effectLst>
<a:outerShdw dir="2700000" algn="tl" dist="38100" blurRad="38100">
<a:srgbClr val="000000">
<a:alpha val="43137"/>
</a:srgbClr>
</a:outerShdw>
</a:effectLst>
</a:rPr>
<a:t>The text...</a:t>
</a:r>
As you can see, there is an rPr (run properties) element containing an effectLst, which in turn contains an outerShdw element. We can use the ooxml-schemas classes and methods to set and unset this.
...
incomingTextRun.setFontColor(Color.black);
org.openxmlformats.schemas.drawingml.x2006.main.CTRegularTextRun cTRun = (org.openxmlformats.schemas.drawingml.x2006.main.CTRegularTextRun)incomingTextRun.getXmlObject();
if (cTRun.getRPr() != null) {
if (cTRun.getRPr().getEffectLst() != null) {
if (cTRun.getRPr().getEffectLst().getOuterShdw() != null) {
cTRun.getRPr().getEffectLst().unsetOuterShdw();
}
}
}
...

Using PDFBox to remove Optional Content Groups that are not enabled

I'm using apache PDFBox from java, and I have a source PDF with multiple optional content groups. What I am wanting to do is export a version of the PDF that includes only the standard content and the optional content groups that were enabled. It is important for my purposes that I preserve any dynamic aspects of the original.... so text fields are still text fields, vector images are still vector images, etc. The reason that this is required is because I intend to ultimately be using a pdf form editor program that does not know how to handle optional content, and would blindly render all of them, so I want to preprocess the source pdf, and use the form editing program on a less cluttered destination pdf.
I've been trying to find something that could give me any hints on how to do this with google, but to no avail. I don't know if I'm just using the wrong search terms, or if this is just something that is outside of what the PDFBox API was designed for. I rather hope it's not the latter. The info shown here does not seem to work (converting the C# code to java), because despite the pdf I'm trying to import having optional content, there does not seem to be any OC resources when I examine the tokens on each page.
for(PDPage page:pages) {
PDResources resources = page.getResources();
PDFStreamParser parser = new PDFStreamParser(page);
parser.parse();
Collection tokens = parser.getTokens();
...
}
I'm truly sorry for not having any more code to show what I've tried so far, but I've just been poring over the java API docs for about 8 hours now trying to figure out what I might need to do this, and just haven't been able to figure it out.
What I DO know how to do is add text, lines, and images to a new PDPage, but I do not know how to retrieve that information from a given source page to copy it over, nor how to tell which optional content group such information is part of (if any). I am also not sure how to copy form fields in the source pdf over to the destination, nor how to copy the font information over.
Honestly, if there's a web page out there that I wasn't able to find with google with the searches that I tried, I'd be entirely happy to read up more about it, but I am really quite stuck here, and I don't know anyone personally that knows about this library.
Please help.
EDIT:
Trying what I understand from what was suggested below, I've written a loop to examine each XObject on the page as follows:
PDResources resources = pdPage.getResources();
Iterable<COSName> names = resources.getXObjectNames();
for(COSName name:names) {
PDXObject xobj = resources.getXObject(name);
PDFStreamParser parser = new PDFStreamParser(xobj.getStream().toByteArray());
parser.parse();
Object [] tokens = parser.getTokens().toArray();
for(int i = 0;i<tokens.length-1;i++) {
Object obj = tokens[i];
if (obj instanceof COSName && obj.equals(COSName.OC)) {
i++;
obj = tokens[i]; // reuse the variable declared above; redeclaring it does not compile
if (obj instanceof COSName) {
PDPropertyList props = resources.getProperties((COSName)obj);
if (props != null) {
...
However, after an OC key, the next entry in the tokens array is always an Operator tagged as "BMC". Nowhere am I finding any info that I can recognize from the named optional content groups.

Here's a robust solution for removing marked content blocks (open to feedback if anyone finds anything that isn't working right). You should be able to adjust for OC blocks...
This code properly handles nesting and removal of resources (xobject, graphics state and fonts - easy to add others if needed).
public class MarkedContentRemover {
private final MarkedContentMatcher matcher;
/**
* Creates a remover that strips every marked content section accepted by the given matcher.
*/
public MarkedContentRemover(MarkedContentMatcher matcher) {
this.matcher = matcher;
}
public int removeMarkedContent(PDDocument doc, PDPage page) throws IOException {
ResourceSuppressionTracker resourceSuppressionTracker = new ResourceSuppressionTracker();
PDResources pdResources = page.getResources();
PDFStreamParser pdParser = new PDFStreamParser(page);
PDStream newContents = new PDStream(doc);
OutputStream newContentOutput = newContents.createOutputStream(COSName.FLATE_DECODE);
ContentStreamWriter newContentWriter = new ContentStreamWriter(newContentOutput);
List<Object> operands = new ArrayList<>();
Operator operator = null;
Object token;
int suppressDepth = 0;
boolean resumeOutputOnNextOperator = false;
int removedCount = 0;
while (true) {
operands.clear();
token = pdParser.parseNextToken();
while(token != null && !(token instanceof Operator)) {
operands.add(token);
token = pdParser.parseNextToken();
}
operator = (Operator)token;
if (operator == null) break;
if (resumeOutputOnNextOperator) {
resumeOutputOnNextOperator = false;
suppressDepth--;
if (suppressDepth == 0)
removedCount++;
}
if (OperatorName.BEGIN_MARKED_CONTENT_SEQ.equals(operator.getName())
|| OperatorName.BEGIN_MARKED_CONTENT.equals(operator.getName())) {
COSName contentId = (COSName)operands.get(0);
final COSDictionary properties;
if (operands.size() > 1) {
Object propsOperand = operands.get(1);
if (propsOperand instanceof COSDictionary) {
properties = (COSDictionary) propsOperand;
} else if (propsOperand instanceof COSName) {
properties = pdResources.getProperties((COSName)propsOperand).getCOSObject();
} else {
properties = new COSDictionary();
}
} else {
properties = new COSDictionary();
}
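// also count sections nested inside an already suppressed one, so their EMC does not end the suppression early
if (suppressDepth > 0 || matcher.matches(contentId, properties)) {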
suppressDepth++;
}
}
if (OperatorName.END_MARKED_CONTENT.equals(operator.getName())) {
if (suppressDepth > 0)
resumeOutputOnNextOperator = true;
}
else if (OperatorName.SET_GRAPHICS_STATE_PARAMS.equals(operator.getName())) {
resourceSuppressionTracker.markForOperator(COSName.EXT_G_STATE, operands.get(0), suppressDepth == 0);
}
else if (OperatorName.DRAW_OBJECT.equals(operator.getName())) {
resourceSuppressionTracker.markForOperator(COSName.XOBJECT, operands.get(0), suppressDepth == 0);
}
else if (OperatorName.SET_FONT_AND_SIZE.equals(operator.getName())) {
resourceSuppressionTracker.markForOperator(COSName.FONT, operands.get(0), suppressDepth == 0);
}
if (suppressDepth == 0) {
newContentWriter.writeTokens(operands);
newContentWriter.writeTokens(operator);
}
}
if (resumeOutputOnNextOperator)
removedCount++;
newContentOutput.close();
page.setContents(newContents);
resourceSuppressionTracker.updateResources(pdResources);
return removedCount;
}
private static class ResourceSuppressionTracker{
// if the boolean is TRUE, then the resource should be removed. If the boolean is FALSE, the resource should not be removed
private final Map<COSName, Map<COSName, Boolean>> tracker = new HashMap<>();
public void markForOperator(COSName resourceType, Object resourceNameOperand, boolean preserve) {
if (!(resourceNameOperand instanceof COSName)) return;
if (preserve) {
markForPreservation(resourceType, (COSName)resourceNameOperand);
} else {
markForRemoval(resourceType, (COSName)resourceNameOperand);
}
}
public void markForRemoval(COSName resourceType, COSName refId) {
if (!resourceIsPreserved(resourceType, refId)) {
getResourceTracker(resourceType).put(refId, Boolean.TRUE);
}
}
public void markForPreservation(COSName resourceType, COSName refId) {
getResourceTracker(resourceType).put(refId, Boolean.FALSE);
}
public void updateResources(PDResources pdResources) {
for (Map.Entry<COSName, Map<COSName, Boolean>> resourceEntry : tracker.entrySet()) {
for(Map.Entry<COSName, Boolean> refEntry : resourceEntry.getValue().entrySet()) {
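if (refEntry.getValue().equals(Boolean.TRUE)) {
// remove from the dictionary of the tracked resource type (XObject, ExtGState or Font), not always XObject
COSDictionary typeDict = pdResources.getCOSObject().getCOSDictionary(resourceEntry.getKey());
if (typeDict != null) {
typeDict.removeItem(refEntry.getKey());
}
}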
}
}
}
private boolean resourceIsPreserved(COSName resourceType, COSName refId) {
// FALSE in the tracker means "keep"; an absent or TRUE entry is not preserved
return Boolean.FALSE.equals(getResourceTracker(resourceType).get(refId));
}
private Map<COSName, Boolean> getResourceTracker(COSName resourceType){
if (!tracker.containsKey(resourceType)) {
tracker.put(resourceType, new HashMap<>());
}
return tracker.get(resourceType);
}
}
}
Helper class:
public interface MarkedContentMatcher {
public boolean matches(COSName contentId, COSDictionary props);
}

Optional Content Groups are marked with BDC and EMC. You will have to navigate through all of the tokens returned from the parser and remove the "section" from the array. Here is some C# code that was posted a while ago in the question "How to delete an optional content group alongwith its content from pdf using pdfbox?"
I investigated that (converting it to Java) but couldn't get it to work as expected. I managed to remove the content between BDC and EMC and then save the result using the same technique as the sample, but the PDF was corrupted. Perhaps that is due to my lack of C# knowledge (related to Tuples etc.).
Here is what I came up with. As I said, it doesn't work; perhaps you or someone else (mkl, Tilman Hausherr) can spot the flaw.
OCGDelete (PDDocument doc, int pageNum, String OCName) {
PDPage pdPage = (PDPage) doc.getDocumentCatalog().getPages().get(pageNum);
PDResources pdResources = pdPage.getResources();
PDFStreamParser pdParser = new PDFStreamParser(pdPage);
int ocgStart
int ocgLength
Collection tokens = pdParser.getTokens();
Object[] newTokens = tokens.toArray()
try {
for (int index = 0; index < newTokens.length; index++) {
obj = newTokens[index]
if (obj instanceof COSName && obj.equals(COSName.OC)) {
// println "Found COSName at "+index /// Found Optional Content
startIndex = index
index++
if (index < newTokens.size()) {
obj = newTokens[index]
if (obj instanceof COSName) {
prop = pdRes.getProperties(obj)
if (prop != null && prop instanceof PDOptionalContentGroup) {
if ((prop.getName()).equals(delLayer)) {
println "Found the Layer to be deleted"
println "prop Name was " + prop.getName()
index++
if (index < newTokens.size()) {
obj = newTokens[index]
if ((obj.getName()).equals("BDC")) {
ocgStart = index
println("OCG Start " + ocgStart)
ocgLength = -1
index++
while (index < newTokens.size()) {
ocgLength++
obj = newTokens[index]
println " Loop through relevant OCG Tokens " + obj
if (obj instanceof Operator && (obj.getName()).equals("EMC")) {
println "the next obj was " + obj
println "after that " + newTokens[index + 1] + "and then " + newTokens[index + 2]
println("OCG End " + ocgLength++)
break
}
index++
}
if (endIndex > 0) {
println "End Index was something " + (startIndex + ocgLength)
}
}
}
}
}
}
}
}
}
}
catch (Exception ex){
println ex.message()
}
for (int i = ocgStart; i < ocgStart+ ocgLength; i++){
newTokens.removeAt(i)
}
PDStream newContents = new PDStream(doc);
OutputStream output = newContents.createOutputStream(COSName.FLATE_DECODE);
ContentStreamWriter writer = new ContentStreamWriter(output);
writer.writeTokens(newTokens);
output.close();
pdPage.setContents(newContents);
}
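The token walk described in this answer can be sketched on a simplified model. In a content stream an optional content section appears as `/OC /SomeName BDC ... EMC`, so the two operands in front of BDC have to be dropped together with the operator, and nested BMC/BDC sections have to be counted so that the matching EMC closes the section. The sketch below assumes that tokens are plain strings rather than PDFBox objects and that `layerNames` already holds the property names (such as /MC0) that resolve to the layers to delete. Both are simplifications, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class MarkedContentSketch {
    static List<String> removeSections(List<String> tokens, Set<String> layerNames) {
        List<String> out = new ArrayList<>();
        int depth = 0; // > 0 while inside a section that is being removed
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            if (depth > 0) {
                if (t.equals("BDC") || t.equals("BMC")) depth++;
                else if (t.equals("EMC")) depth--;
                continue; // drop everything inside, including the closing EMC
            }
            boolean startsSection = t.equals("/OC")
                    && i + 2 < tokens.size()
                    && layerNames.contains(tokens.get(i + 1))
                    && tokens.get(i + 2).equals("BDC");
            if (startsSection) {
                depth = 1;
                i += 2; // also skip the property name and the BDC operator
                continue;
            }
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "q", "/OC", "/MC0", "BDC", "(hidden)", "Tj",
                "/Span", "BMC", "(x)", "Tj", "EMC", "EMC",
                "/OC", "/MC1", "BDC", "(shown)", "Tj", "EMC", "Q");
        System.out.println(removeSections(tokens, Collections.singleton("/MC0")));
        // [q, /OC, /MC1, BDC, (shown), Tj, EMC, Q]
    }
}
```

Starting the removal at the BDC operator instead of at /OC would leave two stray operands in the stream, which is one way to end up with a corrupted page.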

Highlight a text displayed in eclipse editor area programatically [duplicate]

A question about Eclipse PDE development: I am writing a small plugin for Eclipse and have the following:
* an org.eclipse.ui.texteditor.ITextEditor
* a line number
How can I automatically jump to that line and mark it? It's a pity that the API seems only to support offsets (see: ITextEditor.selectAndReveal()) within the document but no line numbers.
The best would be - although this doesn't work:
ITextEditor editor = (ITextEditor)IDE.openEditor(PlatformUI.getWorkbench().getActiveWorkbenchWindow().getActivePage(), file, true );
editor.goto(line);
editor.markLine(line);
Is this possible in some way? I did not find a solution.

In the class DetailsView I found the following method:
private static void goToLine(IEditorPart editorPart, int lineNumber) {
if (!(editorPart instanceof ITextEditor) || lineNumber <= 0) {
return;
}
ITextEditor editor = (ITextEditor) editorPart;
IDocument document = editor.getDocumentProvider().getDocument(
editor.getEditorInput());
if (document != null) {
IRegion lineInfo = null;
try {
// line count internally starts with 0, and not with 1 like in
// the GUI
lineInfo = document.getLineInformation(lineNumber - 1);
} catch (BadLocationException e) {
// ignored because line number may not really exist in document,
// we guess this...
}
if (lineInfo != null) {
editor.selectAndReveal(lineInfo.getOffset(), lineInfo.getLength());
}
}
}

Even though org.eclipse.ui.texteditor.ITextEditor deals with offsets, you should be able to use your line number with the selectAndReveal() method once it has been converted to an offset.
See this thread and this thread.
Try something along the line of:
((ITextEditor)org.eclipse.jdt.ui.JavaUI.openInEditor(compilationUnit)).selectAndReveal(int, int);
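Both answers rely on translating a line number into an offset and a length, which is what IDocument.getLineInformation does for the editor's document. As a rough illustration of that mapping on a plain string, here is a standalone sketch. The class and method names are made up, and it only treats \n as a line break.

```java
public class LineRegionSketch {
    // Returns {offset, length} of the 1-based line, excluding the line
    // delimiter, or null if the line does not exist.
    static int[] lineRegion(String text, int lineNumber) {
        if (lineNumber <= 0) return null;
        int offset = 0;
        for (int line = 1; line < lineNumber; line++) {
            int nl = text.indexOf('\n', offset);
            if (nl < 0) return null; // fewer lines than requested
            offset = nl + 1;
        }
        int end = text.indexOf('\n', offset);
        if (end < 0) end = text.length();
        return new int[] {offset, end - offset};
    }

    public static void main(String[] args) {
        int[] region = lineRegion("first\nsecond\nthird", 2);
        System.out.println(region[0] + ", " + region[1]); // 6, 6
    }
}
```

The two numbers correspond to the arguments that selectAndReveal(offset, length) expects.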

What to do with iText "Unexpected color space /CS0" type of exceptions

I have some files generated by an unknown source that open just fine in PDF viewers (Reader/Foxit), but iText fails to process them. For one particular file I get:
Exception in thread "main" java.lang.IllegalArgumentException: Unexpected colorspace /CS0
at com.itextpdf.text.pdf.parser.InlineImageUtils.getComponentsPerPixel(InlineImageUtils.java:238)
at com.itextpdf.text.pdf.parser.InlineImageUtils.computeBytesPerRow(InlineImageUtils.java:251)
at com.itextpdf.text.pdf.parser.InlineImageUtils.parseUnfilteredSamples(InlineImageUtils.java:280)
at com.itextpdf.text.pdf.parser.InlineImageUtils.parseInlineImageSamples(InlineImageUtils.java:320)
at com.itextpdf.text.pdf.parser.InlineImageUtils.parseInlineImage(InlineImageUtils.java:153)
at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.processContent(PdfContentStreamProcessor.java:370)
at com.itextpdf.text.pdf.parser.PdfReaderContentParser.processContent(PdfReaderContentParser.java:79)
Sometimes the /CS0 color space changes to /CS1 through /CS9 (or something similar).
Is it an iText bug (I'm using Java 1.7, iText 5.4.1) or are my PDF files just broken? Even if the PDF files are broken, is there any way I can fix them? (Adobe Reader seems to do that somehow, but unfortunately opening the file and saving it again does not work.)

I'm not familiar with the PDF specification, so I don't know whether the PDFs I worked with were valid or not. I did, however, manage to solve the problem by changing the method getComponentsPerPixel(...) in the iText class com.itextpdf.text.pdf.parser.InlineImageUtils from:
private static int getComponentsPerPixel(PdfName colorSpaceName, PdfDictionary colorSpaceDic){
if (colorSpaceName == null)
return 1;
if (colorSpaceName.equals(PdfName.DEVICEGRAY))
return 1;
if (colorSpaceName.equals(PdfName.DEVICERGB))
return 3;
if (colorSpaceName.equals(PdfName.DEVICECMYK))
return 4;
if (colorSpaceDic != null){
PdfArray colorSpace = colorSpaceDic.getAsArray(colorSpaceName);
if (colorSpace != null){
if (PdfName.INDEXED.equals(colorSpace.getAsName(0))){
return 1;
}
}
}
throw new IllegalArgumentException("Unexpected color space " + colorSpaceName);
}
to
private static int getComponentsPerPixel(PdfName colorSpaceName, PdfDictionary colorSpaceDic){
if (colorSpaceName == null)
return 1;
if (colorSpaceName.equals(PdfName.DEVICEGRAY))
return 1;
if (colorSpaceName.equals(PdfName.DEVICERGB))
return 3;
if (colorSpaceName.equals(PdfName.DEVICECMYK))
return 4;
if (colorSpaceDic != null){
PdfArray colorSpace = colorSpaceDic.getAsArray(colorSpaceName);
if (colorSpace != null){
if (PdfName.INDEXED.equals(colorSpace.getAsName(0))){
return 1;
}
} /* Begin mod # */ else {
PdfName tempName = colorSpaceDic.getAsName(colorSpaceName);
if(tempName != null) return(getComponentsPerPixel(tempName, colorSpaceDic));
} /* End mod */
}
throw new IllegalArgumentException("Unexpected color space " + colorSpaceName);
}
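The added branch covers the case where the resource dictionary maps the color space name to another name (for example /CS0 to /DeviceRGB) rather than to an array. One thing to keep in mind is that such a recursion does not terminate if a broken file maps a name to itself. Below is a minimal sketch of the same lookup with a hop limit. It uses a plain Map of names instead of iText classes, the names are hypothetical, and the limit of 8 is an arbitrary assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class ColorSpaceAliasSketch {
    static int componentsPerPixel(String name, Map<String, String> aliases) {
        // follow name-to-name entries, but give up after a few hops
        for (int hops = 0; hops < 8; hops++) {
            if (name == null || name.equals("/DeviceGray")) return 1;
            if (name.equals("/DeviceRGB")) return 3;
            if (name.equals("/DeviceCMYK")) return 4;
            String target = aliases.get(name);
            if (target == null) break; // not an alias: nothing more to resolve
            name = target;
        }
        throw new IllegalArgumentException("Unexpected color space " + name);
    }

    public static void main(String[] args) {
        Map<String, String> aliases = new HashMap<>();
        aliases.put("/CS0", "/DeviceRGB");
        System.out.println(componentsPerPixel("/CS0", aliases)); // 3
    }
}
```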
