reversed Arabic when printing PDF - java

I'm trying to print Arabic in some PDF documents using the Java code found here :
http://www.java2s.com/Code/Java/PDF-RTF/ArabicTextinPDF.htm
The example works great, except that the text comes out backwards. For example, changing the example slightly :
String txt = "\u0623\u0628\u062c\u062f\u064a\u0629 \u0639\u0631\u0628\u064a\u0629";
System.out.println(txt);
g2.drawString(txt, 100, 30);
What is printed on the screen are the same characters but in the opposite direction, compared to the PDF. The console output is correct, the PDF is not.
I don't want to simply reverse the characters because otherwise I would lose bi-directional support ...
Thanks much

IIRC, iText supports Arabic shaping at a higher level than drawString. Let's see here...
Ah! ColumnText.showTextAligned(PdfContentByte canvas, int alignment, Phrase phrase, float x, float y, float rotation, int runDirection, int arabicOptions)
Alignment is one of Element.ALIGN_*. Run direction is one of PdfWriter.RUN_DIRECTION_*. Arabic options are bit flags, ColumnText.AR_*
That should do the trick, with one caveat: I'm not sure that it'll handle multiple directions in the same phrase. Your test string has CJKV, Arabic, and Latin characters, so there should be two direction changes.
Good luck.

Figured it out, here is the complete process :
document.open();
java.awt.Font font = new java.awt.Font("times", 0, 30);
PdfContentByte cb = writer.getDirectContent();
java.awt.Graphics2D g2 = cb.createGraphicsShapes(PageSize.A4.width(), PageSize.A4.height());
g2.setFont(font);
String txt = "日本人 أبجدية عربية Dès Noël où";
System.out.println(txt);
java.awt.font.FontRenderContext frc = g2.getFontRenderContext();
java.awt.font.TextLayout layout = new java.awt.font.TextLayout(txt, font, frc);
layout.draw(g2, 15, 55);
g2.dispose();
document.close();
You'll notice it handles multiple languages with bi-directional support. The only drawback is that the text in the resulting PDF cannot be copied and pasted, because it is drawn as graphics rather than as text. I can live with that.

Unicode Arabic (or anything else) is always in logical order in a Java program. Some PDFs are made in visual order, though this is quite rare in the modern world. The program you cite might be a hack that ends up with PDFs that work, sort of, for some purposes.
If I were you, I'd start by examining some PDFs produced in Arabic by some modern tool.
This sort of 'graphics' approach to PDF construction seems risky to me at best.
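To make the "logical order" point concrete, here is a minimal sketch using the JDK's java.text.Bidi class, which reports the directional runs of a string that is stored in logical (typing) order. The class name BidiRuns and the sample string are made up for illustration, and nothing here touches PDF generation.

```java
import java.text.Bidi;

final class BidiRuns {
    /** Describes each directional run of a logically ordered string as "start-limit:level". */
    static String describe(String logical) {
        // Base direction is taken from the first strong character, defaulting to LTR.
        Bidi bidi = new Bidi(logical, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < bidi.getRunCount(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(bidi.getRunStart(i)).append('-').append(bidi.getRunLimit(i))
               .append(':').append(bidi.getRunLevel(i));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Latin, Arabic, Latin: the chars stay in logical order in memory;
        // an odd level marks a run that is displayed right to left.
        System.out.println(describe("abc \u0623\u0628\u062c def"));
    }
}
```

An even level means the run is laid out left to right and an odd level means right to left. A renderer that ignores these levels and draws the chars one after another is what produces "backwards" Arabic.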


Is it possible to redact PDF areas with PDFBox by position?

The Context
Currently, I have a solution where I loop through a PDF and draw black rectangles throughout it.
So I already have a list of PDRectangle objects representing exactly the areas I need to fill/cover on the PDF, hiding all the text I want hidden.
The Problems
Problem number 1: The text underneath the black rectangle is easily copied, searchable, or extracted by other tools.
I solved this by flattening my pdf (converting it into an image so that it becomes a single layer document and the black rectangle can no longer be tricked). Same solution as described here:
Disable pdf-text searching with pdfBox
This is not an actual redacting, it's more like a workaround.
Which leads me to
Problem number 2:
My final PDF becomes an image document, where I lose all the pdf properties, including searching, copying... also it's a much slower process. I wanted to keep all the pdf properties while the redacted areas are not readable by any means.
What I want to accomplish
That being said, I'd like to know whether it is possible, and how, to do actual redaction with PDFBox: blacking out rectangular areas (I already have all the positions I need) while keeping the PDF properties and making sure the redacted areas cannot be read.
Note: I'm aware of the problems PDFBox had with the old ReplaceText function, but here I have the positions I need to make sure I'd blank precisely the areas I need.
Also, I'm accepting other free library suggestions.
Technical Specification:
PDFBox 2.0.21
Java 11.0.6+10, AdoptOpenJDK
MacOS Catalina 10.15.4, 16gb, x86_64
My Code
This is how I draw the black rectangle:
private void draw(PDPage page, PDRectangle hitPdRectangle) throws IOException {
PDPageContentStream content = new PDPageContentStream(pdDocument, page,
PDPageContentStream.AppendMode.APPEND, false, false);
content.setNonStrokingColor(0f);
content.addRect(hitPdRectangle.getLowerLeftX(),
hitPdRectangle.getLowerLeftY() -0.5f,
hitPdRectangle.getUpperRightX() - hitPdRectangle.getLowerLeftX(),
hitPdRectangle.getUpperRightY() - hitPdRectangle.getLowerLeftY());
content.fill();
content.close();
}
This is how I convert it into an Image PDF:
private PDDocument createNewRedactedPdf() throws IOException {
PDFRenderer pdfRenderer = new PDFRenderer(pdDocument);
PDDocument redactedDocument = new PDDocument();
for (int pageIndex = 0; pageIndex < pdDocument.getNumberOfPages(); pageIndex++) {
BufferedImage image = pdfRenderer.renderImageWithDPI(pageIndex, 200);
String formatName = "jpg";
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ImageIO.write(image, formatName, baos);
byte[] bimg = baos.toByteArray();
PDPage page = pdDocument.getPage(pageIndex);
float pageWidth = page.getMediaBox().getWidth();
float pageHeight = page.getMediaBox().getHeight();
PDPage pageDraw = new PDPage(new PDRectangle(pageWidth, pageHeight));
redactedDocument.addPage(pageDraw);
String imgSuffixName = pageIndex + "." + formatName;
PDImageXObject img = PDImageXObject.createFromByteArray(redactedDocument, bimg,
pdDocument.getDocument().getDocumentID() + imgSuffixName);
try (PDPageContentStream contentStream
= new PDPageContentStream(redactedDocument, pageDraw, PDPageContentStream.AppendMode.OVERWRITE, false)) {
contentStream.drawImage(img, 0, 0, pageWidth, pageHeight);
}
}
return redactedDocument;
}
Any thoughts?
What you want to have, a true redaction feature, is possible to implement based on PDFBox but it requires a lot of coding on top of it (similar to the pdfSweep add-on implemented on top of iText).
In particular you have found out yourself that it does not suffice to draw black rectangles over the areas to redact as text extraction or copy&paste from a viewer usually completely ignores whether text is visible or covered by something.
Thus, in the code you have to find the actual instructions drawing the text to redact and remove them. But you cannot simply remove them without replacement; otherwise additional text on the same line may be moved by your redaction.
But you cannot simply replace them with the same number of spaces or a move-right by the width of the removed text: Just consider the case of a table you want to redact a column from with only "yes" and "no" entries. If after redaction a text extractor returns three spaces where there was a "yes" and two spaces where there was a "no", anyone looking at those results knows what there was in the redacted area.
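The length leak described above can be shown with a toy sketch that has nothing to do with PDF internals. Both method names and the "[REDACTED]" placeholder are hypothetical, chosen only to contrast a length-preserving replacement with a constant one.

```java
final class RedactionLeak {
    /** Naive replacement: one space per removed character, so the length survives. */
    static String lengthPreserving(String secret) {
        return " ".repeat(secret.length());
    }

    /** Constant replacement: every redacted value extracts to the same text. */
    static String constant(String secret) {
        return "[REDACTED]";
    }

    public static void main(String[] args) {
        for (String cell : new String[] {"yes", "no"}) {
            System.out.println("'" + lengthPreserving(cell) + "' vs '" + constant(cell) + "'");
        }
    }
}
```

With the length-preserving variant, an extractor still sees three characters for "yes" and two for "no". The constant variant removes that signal, though on a real page the layout of the surrounding text then needs to be repositioned.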
You also have to clean up instructions around the actual text drawing instruction. Consider the example of the column to redact with "yes"/"no" information again, but this time for more clarity the "yes" is drawn in green and the "no" in red. If you only replace the text drawing instructions, someone with an extractor that also extracts attributes like the color will immediately know the redacted information.
In case of tagged PDFs, the tag attributes have to be inspected too. In particular, there is an attribute ActualText which contains the actual text represented by the tagged instructions (in particular for screen readers). If you only remove the text drawing instructions but leave the tags with their attributes, anyone using a screen reader may not even realize that you tried to redact something, as the screen reader reads out the complete, original text.
For a proper redaction, therefore, you essentially have to interpret all the current instructions, determine the actual content they draw, and create a new set of instructions which draws the same content without unnecessary extra instructions which may give away something about the redacted content.
And here we only looked at redacting the text; redacting vector and bitmap graphics on a PDF page has a similar amount of challenges to overcome for proper redaction.
...
Thus, the code required for actual redaction is beyond the scope of a Stack Overflow answer. Nonetheless, the items above may help someone implementing a redactor avoid the typical traps of overly naive redaction code.

LibGDX RTL font rendering?

I am creating an application that should support two languages, English, and Hebrew.
The problem is that Hebrew is a right-to-left language and English is a left-to-right language, and LibGDX does not support RTL fonts.
I have created the bitmap for the font and everything works.
But when I write in Hebrew, the words come out reversed. When I write solely in Hebrew I can work around this by reversing the words using a StringBuilder, but that's a cheap fix. What if I want to implement a chat, or character names?
From what I can see, the easiest solution is to use Hiero. See the thread "Hiero Rendering Arabic fonts Right to Left", which describes a recent provision to accommodate RTL.
From there it becomes increasingly difficult. There are quite a few questions about this issue (one example Showing non-western language from right to left in libgdx (Android)) and fewer solutions.
You have the option of creating a library of glyphs of strings for commonly used words or expression, though this is a painstaking process to set up and there is an overhead in terms of time when using chat, as there is with your string reversal.
The discussion on the libgdx GitHub, "Support for complex font rendering (Chinese, Arabic, ...)", goes into these and more options, including work done to support Windows (sridharsundaram/complexscriptlayout), which, although it is not Android, may be worth investigating for further development ideas.
On the positive side, there has been a growing number of recent developments on this front, so RTL and bidi text should become easier for developers using libgdx.
Also of interest is the support issue "Right-To-Left Text Rendering Support #787", as it contains breadcrumb trails from people with the same issue developing resources.
As of right now, there really isn't a way to render Right to Left text, as shown by this thread about it. So the only way to really do it is to reverse the text with StringBuilder, and then display that. A more efficient way to render the reversed text is to create a method that will display the text accordingly, so you don't have to reverse it every time you try to write Right to Left text. If you create a method, you will be able to implement the RTL text into chats, names, or other graphics that require RTL fonts.
I also recommend converting your bitmap font to a .ttf file so that it is easy to use your custom font while keeping good quality. You can then use the FreeTypeFontGenerator to render your font nicely. If you cannot convert your bitmap to a font, you can still apply the reverse-then-draw approach from the method below to your existing bitmap font. A really good alternative is Hiero, where you can select the RTL text check box.
Here is an example of a method that you could create to render the text (using the FreeTypeFontGenerator):
// Keep the generator as a field so that it is not created each time
FreeTypeFontGenerator generator = new FreeTypeFontGenerator(Gdx.files.internal("fontFile.ttf"));
public void renderRTL(float x, float y, int fontSize, String text) {
    // Reverses the text given
    String outputText = new StringBuilder(text).reverse().toString();
    // Creates the font (this is expensive; cache the font per size if you call this every frame)
    FreeTypeFontGenerator.FreeTypeFontParameter parameter = new FreeTypeFontGenerator.FreeTypeFontParameter();
    parameter.size = fontSize;
    parameter.characters = "ALL HEBREW CHARACTERS"; // Put all your Hebrew characters in this String
    scoreFont = generator.generateFont(parameter);
    scoreFont.setColor(Color.BLACK);
    batch.begin(); // Lets you draw on the screen
    scoreFont.draw(batch, outputText, x, y); // Actually draws the text
    batch.end();
    // Do not dispose the generator here: it is reused on the next call.
    // Call generator.dispose() once, for example in your dispose() method.
}
Make sure to add all of these dependencies into your build.gradle file (in the dependencies section):
compile "com.badlogicgames.gdx:gdx-freetype:$gdxVersion"
natives "com.badlogicgames.gdx:gdx-freetype-platform:$gdxVersion:natives-armeabi"
natives "com.badlogicgames.gdx:gdx-freetype-platform:$gdxVersion:natives-armeabi-v7a"
natives "com.badlogicgames.gdx:gdx-freetype-platform:$gdxVersion:natives-x86"
You could optionally add a color parameter (in RGBA format) to your method for more functionality. Hope it helps!
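Reversing the whole string also reverses any Latin words or digits inside it, which matters for chat messages and names. A hedged alternative is to let the JDK's java.text.Bidi split the text into directional runs and reverse only the right-to-left runs. The sketch below assumes a hypothetical helper class VisualOrder; it uses only the JDK, does not mirror brackets, and does not handle combining marks such as niqqud.

```java
import java.text.Bidi;

final class VisualOrder {
    /** Converts logically ordered text to a left-to-right visual order for a simple renderer. */
    static String toVisual(String logical) {
        if (logical.isEmpty()) {
            return logical;
        }
        Bidi bidi = new Bidi(logical, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        int count = bidi.getRunCount();
        byte[] levels = new byte[count];
        String[] runs = new String[count];
        for (int i = 0; i < count; i++) {
            levels[i] = (byte) bidi.getRunLevel(i);
            String run = logical.substring(bidi.getRunStart(i), bidi.getRunLimit(i));
            // Odd levels are right-to-left runs: reverse their characters only.
            runs[i] = (levels[i] & 1) == 1 ? new StringBuilder(run).reverse().toString() : run;
        }
        // Put the runs themselves into visual order.
        Bidi.reorderVisually(levels, 0, runs, 0, count);
        return String.join("", runs);
    }

    public static void main(String[] args) {
        // The Latin words keep their spelling; only the Hebrew run is reversed.
        System.out.println(toVisual("abc \u05D0\u05D1 def"));
    }
}
```

The result of toVisual can then be passed to the font's draw call in place of the fully reversed string.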
Use this solution on github :
https://github.com/ultra-deep/libgdx-rtl-support

List of possible causes / workarounds for broken font metrics in Java? Or do I have to switch to LaTeX?

I'm working on an application that I thought would be fairly simple / straightforward. All it does is render a string into a BufferedImage in a particular font (from a file, not installed on the system). I'm making some progress, but after several days of struggling with it, Java seems very fickle about font metrics. Sometimes it likes to give good font metrics and sometimes it likes to return some combination of all zeroes and all 240s (regardless of the size of the font) for no readily apparent reason.
In a few cases I found that just calling f = f.deriveFont(myFloat) would cause total failure of font metrics, going from valid numbers to all zeroes, whereas calling f = f.deriveFont(Font.PLAIN, (int) myInt) would preserve the metrics so the new font instance would work properly. (Yes, I know that's supposed to be a float, but for some reason calling it with a float was in several cases breaking the font metrics again.)
Through a combination of TextAttributes in the Font and RenderingHints in the Graphics2D object, I've managed to get most of the fonts to work... but I still have a handful of OpenType fonts (.otf) that have broken metrics in Java (JSE/JDK 7) while they display fine in the Windows 7 font viewer.
So I chose one of these fonts and I stripped out everything in my application and am just getting the most raw view of the font that I can in a Main method. The echo() method in this code is just a shorthand for System.out.println(). (Also, I'm sorry for the sloppiness of this code - I'm just throwing in anything I can find that might potentially yield new information at this point.)
BufferedImage img = new BufferedImage(730, 70, TYPE_INT_ARGB);
Graphics2D g = img.createGraphics();
FontRenderContext frc = g.getFontRenderContext();
String text = "Hello World";
File media = new File("z:\\path\\to\\fonts\\LondonDoodles.otf");
Font f7 = Font.createFont(Font.TRUETYPE_FONT, media);
f7 = f7.deriveFont(Font.PLAIN, (int) 20);
TextLayout lay = new TextLayout(text, f7, frc);
Rectangle2D r = lay.getPixelBounds(frc, 0, 0);
echo("NumGlyphs=" + f7.getNumGlyphs());
echo("bounds=" + r); // these bounds are all zeros
// use an array of GlyphCodes instead of a String
// to try and get at the glyphs directly without going through the CMAP
GlyphVector gv = f7.createGlyphVector(frc, new int[] { 1, 2, 3, 4, 5 });
echo("Glyph0Logical=" + gv.getGlyphLogicalBounds(0).getBounds2D()); // all 240s, go figure
echo("Glyph0Pixel=" + gv.getGlyphPixelBounds(0, frc, 0, 0)); // all zeros
echo("GlyphVectorLogical=" + gv.getLogicalBounds()); // random assortment of zeros & 240s
echo("GlyphVectorPixel=" + gv.getPixelBounds(frc, 0, 0)); // all zeros
echo(g.getFontMetrics(f7)); // 240s
LineMetrics lm = f7.getLineMetrics(text, frc);
echo("LineMetrics=" + lm.getAscent() + "/" + lm.getDescent() + "/" + lm.getHeight() + "/" + lm.getLeading()); // 240s
The output from this code looks like this:
NumGlyphs=33
bounds=java.awt.Rectangle[x=0,y=0,width=0,height=0]
Glyph0Logical=java.awt.geom.Rectangle2D$Float[x=240.0,y=240.0,w=240.0,h=240.0]
Glyph0Pixel=java.awt.Rectangle[x=0,y=0,width=0,height=0]
GlyphVectorLogical=java.awt.geom.Rectangle2D$Float[x=0.0,y=240.0,w=0.0,h=240.0]
GlyphVectorPixel=java.awt.Rectangle[x=0,y=0,width=0,height=0]
sun.font.FontDesignMetrics[font=java.awt.Font[family=London Doodles,name=LondonDoodles,style=plain,size=20]ascent=-239, descent=240, height=241]
LineMetrics=-240.0/240.0/240.0/240.0
It's obviously possible to get correct font metrics from the .otf file this font came in, because the Windows 7 font viewer and another app written in C are able to render it onto a PNG like I'm trying to do here. I just can't figure out why Java doesn't like it.
It's not throwing an IOException or a FontFormatException, so I know it's reading the file and creating an ostensibly valid Font object.
In the code above, I created the GlyphVector using an int-array, which is the GlyphCodes indexed from 1 to the number of glyphs in the font (in this case 33) so it wouldn't have to search the CMAP that translates from Unicode CodePoints to glyphs and potentially not find those characters.
Also you can see in the echo of the FontMetrics that the Font is reporting a size of 20pts, so it's not like it's been set to size=0 or anything. The 240s in the output don't change, regardless of what size I set on the Font.
After several days of trying to come up with search terms for Google, I still can't find mention of this specific problem. It seems really odd. You would think that font metrics seemingly randomly breaking left and right (though I'm sure it's not actually random) would be something people would be asking about if they were having this issue, particularly given that all the fonts I've tested seem to consistently report a combination of zeros and 240s. That latter part, that they all consistently report 240s, seems oddly specific for a problem that can't be easily Googled.
Anyway, I'm wondering if anybody's got a list of items (text attributes, graphics settings, JRE flags, OS environment variables, font-creation programs, flags within the font file, etc.) that might cause bad font metrics in Java? Or maybe a list of workarounds for broken font metrics? Or are fonts just horrible in Java and I should be looking into LaTeX? (Which to be honest, I know nothing about at this point.)
Thanks!
EDIT: Well... this is kind of frustrating... I thought that if I dug into the "open source" for OpenJDK, maybe I could produce a derived implementation of Font that could resolve these metrics issues... but apparently that's been deliberately prevented by making all the necessary internals private, static and final. See: Fonts, how to extend them ... and I can't replace core classes at runtime - see: Replacing java class? ... so the only alternative if I want to try and fix the metrics that way is to recreate the entire Java font architecture with full copies of java.awt.Font, sun.font.Font2D, etc. and then draw the glyphs onto the Graphics2D object manually, since Graphics2D.drawGlyphVector() and Graphics.drawString() won't work without Font or a derivative of Font because they used the Font class as the argument to Graphics.setFont() instead of declaring an interface for Font to implement. ... unless I'm misreading the answer on that 2nd Stack Overflow reference? Am I misreading that? Could I create a custom class loader that would substitute the Font class with a modified version?
EDIT: Maybe I should have done more research before I posted the last edit, but for anyone who's reading this question and wondering, yes, you can substitute modified versions of core java classes with a JVM argument. See: http://media.techtarget.com/tss/static/articles/content/CovertJava/Sams-CovertJava-15.pdf So what I'm doing right now is digging through reams of core classes to try and find the place where metrics are read in from the font file. If I can find that, then I can substitute that class with a modified version that will correct the broken metrics.

iText drawing in JavaGraphics2D in CMYK

If I use
PdfContentByte cb = writer.getDirectContent();
cb.setColorFill(new CMYKColor(c, m, y, k));
it's straightforward. However, I have some Swing components that draw themselves via Graphics2D, so it is very convenient to use something like this:
PdfContentByte cb = writer.getDirectContent();
Graphics2D g2 = cb.createGraphics(w, h);
mySwingComponent.paint(g2);
g2.dispose();
It works fine, but the colors are translated from Java's sRGB to CMYK by iText. I want to draw directly with CMYK colors. I am trying to do it like this in my Swing component:
class MySwingComponent extends JComponent {
    @Override
    public void paint(Graphics g) {
        Graphics2D g2 = (Graphics2D) g; // iText passes in its own Graphics2D
        g2.setColor(new com.lowagie.text.pdf.CMYKColor(0, 0, 0, 1));
    }
}
Unfortunately it just does not work. Is there some way exactly to specify which CMYK color will be painted?
P.S. The background of my problem is that if I draw something in grayscale, then in Adobe Illustrator the color in the PDF is not just (0, 0, 0, 0.4), but something like (0.1, 0.15, 0.2, 0.4).
UPD: I have a solution now. Just using
g2.setPaint(new CMYKColor(1f, 0.0f, 0.0f, 0.0f));
in your paint(Graphics2D g2)
will force iText to produce a CMYK PDF.
I had your exact problem. I messed around with the API but finally had to look at the source code. I came up with two solutions.
The first solution is to modify the source code to fit your needs (after all, isn't that the definition of free software?). The source code can be extracted from the itextpdf-5.1.3-sources.jar file (or whatever version of the library you have). The line causing the CMYK/RGB issue is line 1650 of the PdfGraphics2D.java file (com/itextpdf/text/pdf/PdfGraphics2D.java). You should see a line that says:
cb.setColorFill(new BaseColor(color));
If you want a quick and dirty fix, simply change that line to:
cb.setColorFill(new CMYKColor(0f, 0f, 0f, 1f));
This, of course, limits you to one color, but now that you know which line is handling the actual color, you can modify the class and add some functionality/state (if you need it). You'll need to add
import com.itextpdf.text.pdf.CMYKColor;
to the top of the file as well. N.B. Line 1650 handles fills. If you're doing strokes, simply modify the same thing in the else statement (it should be clear when you look at the file).
Compile the source:
javac -cp path/to/itextpdf-5.1.3.jar path/to/PdfGraphics2D.java
Change to the root of the itextpdf-5.1.3-sources folder and update the jar:
jar uf path/to/itextpdf-5.1.3.jar com/itextpdf/text/pdf/PdfGraphics2D.class
And that's it! Your PDF file will now render the color using the CMYK value you specified. This is great for something simple, but if you need more functionality, you will have to modify the PdfGraphics2D class some more. I was personally using this to draw CMYK black fonts using the drawGlyphVector method.
Second solution:
If the first solution doesn't work for you, you can always edit/parse the PDF page content directly (its operators look like PostScript). In your method that is creating the PDF, add the line Document.compress = false; after you instantiate the PdfWriter. Now you can view the PDF file in a text editor. Search around and you'll find some lines like
0 0 0 1 k or 0 0 0 rg
These lines are setting colors (CMYK black and RGB black, respectively). A lowercase operator after the color values (which are floats, it seems) means fill and an uppercase one means stroke. So 0 0 0 1 K would be a CMYK black stroke and so forth.
You could read the PDF in line by line and basically do a "search and replace" (in Java, programmatically, of course) for lines ending in "rg". Hope that makes sense. Not terribly fast, since this requires an extra disk read and write...
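The "search and replace" idea could be sketched roughly as follows. This assumes the file was written uncompressed and that each color operator sits on its own line, as in the examples above; the class name RgbToCmykOps is hypothetical, and the RGB-to-CMYK formula is the naive one (K = 1 - max(R, G, B)), not a color-managed conversion.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RgbToCmykOps {
    private static final String NUM = "([0-9]*\\.?[0-9]+)";
    // Matches a line of the form "r g b rg" (fill) or "r g b RG" (stroke).
    private static final Pattern RGB_OP =
            Pattern.compile("^\\s*" + NUM + "\\s+" + NUM + "\\s+" + NUM + "\\s+(rg|RG)\\s*$");

    /** Rewrites an RGB color operator line as the CMYK equivalent; other lines pass through. */
    static String convertLine(String line) {
        Matcher m = RGB_OP.matcher(line);
        if (!m.matches()) {
            return line;
        }
        double r = Double.parseDouble(m.group(1));
        double g = Double.parseDouble(m.group(2));
        double b = Double.parseDouble(m.group(3));
        double k = 1.0 - Math.max(r, Math.max(g, b));
        double c = 0, mg = 0, y = 0;
        if (k < 1.0) { // avoid dividing by zero for pure black
            c = (1.0 - r - k) / (1.0 - k);
            mg = (1.0 - g - k) / (1.0 - k);
            y = (1.0 - b - k) / (1.0 - k);
        }
        String op = m.group(4).equals("rg") ? "k" : "K"; // keep fill vs stroke
        return fmt(c) + " " + fmt(mg) + " " + fmt(y) + " " + fmt(k) + " " + op;
    }

    /** Formats with up to four decimals and no trailing zeros. */
    private static String fmt(double v) {
        String s = String.format(Locale.ROOT, "%.4f", v);
        return s.replaceAll("0+$", "").replaceAll("\\.$", "");
    }

    public static void main(String[] args) {
        System.out.println(convertLine("0.6 0.6 0.6 rg"));
    }
}
```

With this formula a neutral gray maps to black ink only, which is the behavior the question asks for. Note that changing the length of a content stream in place also invalidates the stream length and cross-reference offsets, so a real implementation would need to rewrite the stream through a PDF library rather than edit the file as plain text.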
Hope that helps.

Incorrect / missing font metrics in Java?

Using a certain font, I use Java's TextLayout to determine its ascent, descent, and leading. (see Java's TextLayout tutorial here)
In my specific case I'm using Arial Unicode MS, font size 8. Using the following code:
Font font = new Font("Arial Unicode MS", 0, 8);
TextLayout layout = new TextLayout("Pp", font,
new FontRenderContext(null, true, true));
System.out.println( "Ascent: "+layout.getAscent());
System.out.println( "Descent: "+layout.getDescent());
System.out.println( "Leading: "+layout.getLeading());
Java gives me the following values:
Ascent: 8.550781
Descent: 2.1679688
Leading: 0.0
So far so good. However if I use the sum of these values as my line spacing for various lines of text, this differs by quite a bit from the line spacing used in OpenOffice, Microsoft Word, etc.: it is smaller. When using default single line spacing Word and OO seem to have a line spacing of around 13.7pt (instead of 10.7pt like I computed using Java's font metrics above).
Any idea why this is, and whether I can somehow access the font information Word and OpenOffice seem to be accessing which leads to this different line spacing?
Things I've tried so far:
adding all glyphs to a glyph vector with font.getNumGlyphs() etc. - still get the same font metrics values
using multiple lines as described here - each line I get has the same font metrics as outlined above.
using FontMetrics' methods such as getLeading()
Zarkonnen doesn't deserve his downvotes as he's on the right lines. Many Java fonts appear to return zero for their leading when perhaps they shouldn't. Maybe it is down to this bug: I don't know. It would appear to be down to you to put this whitespace back in.
Typographical line height is usually defined as ascent + descent + leading. Ascent and descent are measured upwards and downwards from the baseline that characters sit on, and the leading is the space between the descent of one line and the ascent of the line underneath.
But leading is not fixed. You can set the leading in most word-processing and typographical software. Word calls this the line spacing. The original question is probably asking how Microsoft Word calculates its single line spacing. Microsoft's recommendations for OpenType fonts seem to suggest that software on different platforms calculates it differently. (Maybe this is why Java now returns zero?)
A quick bit of Googling around seems to indicate that a rule of thumb for leading is 120% of ascent+descent for single-line spacing, or a fixed point spacing; say 2pts leading between all lines. In the absence of any hard or fast rule I can find, I would say it boils down to the legibility of the text you're presenting, and you should just go with what you think looks best.
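As a worked example of the two rules of thumb above, applied to the ascent and descent values reported in the question: the class name LineSpacing is hypothetical, and neither rule is claimed to be what Word or OpenOffice actually does.

```java
import java.util.Locale;

final class LineSpacing {
    /** Line spacing as a multiple of ascent + descent (factor 1.2 for the 120% rule). */
    static double proportional(double ascent, double descent, double factor) {
        return (ascent + descent) * factor;
    }

    /** Line spacing with a fixed amount of leading added between lines. */
    static double fixedLeading(double ascent, double descent, double leading) {
        return ascent + descent + leading;
    }

    public static void main(String[] args) {
        double ascent = 8.550781;   // value reported in the question
        double descent = 2.1679688; // value reported in the question
        System.out.printf(Locale.ROOT, "%.2f%n", proportional(ascent, descent, 1.2));
        System.out.printf(Locale.ROOT, "%.2f%n", fixedLeading(ascent, descent, 2.0));
    }
}
```

This gives roughly 12.86pt for the 120% rule and 12.72pt with 2pt of fixed leading. Both are closer to, but still short of, the roughly 13.7pt observed in Word.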
Are Word and OO including the white space between lines, while Java isn't?
So in Word / OO, your number is Ascent + Descent + Whitespace, while in Java you just have Ascent + Descent?
