Masking the Aadhaar number in a UIDAI Aadhaar PDF

Does anybody know how to mask the Aadhaar number in an Aadhaar PDF downloaded from the UIDAI website?
I have already tried the iText code below. It does not work for an Aadhaar PDF, but it works for other, ordinary PDFs:
public static void main(String[] args) throws IOException, DocumentException {
File file = new File(DEST);
file.getParentFile().mkdirs();
new MainClassName().manipulatePdf(SRC, DEST);
}
public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
PdfReader reader = new PdfReader(src);
PdfDictionary dict = reader.getPageN(1);
PdfObject object = dict.getDirectObject(PdfName.CONTENTS);
PdfArray refs = null;
if (dict.get(PdfName.CONTENTS).isArray()) {
refs = dict.getAsArray(PdfName.CONTENTS);
} else if (dict.get(PdfName.CONTENTS).isIndirect()) {
refs = new PdfArray(dict.get(PdfName.CONTENTS));
}
for (int i = 0; i < refs.getArrayList().size(); i++) {
PRStream stream = (PRStream) refs.getDirectObject(i);
byte[] data = PdfReader.getStreamBytes(stream);
stream.setData(new String(data).replace("8989 7890 4567", "XXXX XXXX 4567").getBytes());
}
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
stamper.close();
reader.close();
}
I also tried the suggestions from this post (i.e., using PDFBox):
Search and replace text in PDF using JAVA
That does not work for an Aadhaar PDF either, although it works for an ordinary PDF.
(I know a masked Aadhaar PDF can be downloaded from the UIDAI website itself, but I need to do the masking in Java.)

You can convert the PDF to an image, use Google Cloud OCR to locate the number,
mask that text, and convert the image back to a PDF.
Here is a sample project in JavaScript:
https://github.com/rinkeshjain/aadhaar-mask
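Whichever route produces the text (OCR output or an extracted text layer), the masking step itself is plain string work. The sketch below is a minimal illustration in Java, assuming the number appears as twelve digits in three groups of four, optionally separated by a space or hyphen, as in the question's sample value. The class and method names are hypothetical and not part of any library.

```java
import java.util.regex.Pattern;

final class AadhaarMasker {

    // Three groups of four digits with an optional space or hyphen between them.
    // The lookarounds stop the pattern from matching inside a longer run of digits.
    private static final Pattern AADHAAR =
            Pattern.compile("(?<!\\d)(\\d{4})([ -]?)(\\d{4})([ -]?)(\\d{4})(?!\\d)");

    /** Replaces the first eight digits of every match with X, keeping separators and the last four digits. */
    static String mask(String text) {
        return AADHAAR.matcher(text).replaceAll("XXXX$2XXXX$4$5");
    }

    public static void main(String[] args) {
        System.out.println(mask("Aadhaar: 8989 7890 4567"));
        // prints "Aadhaar: XXXX XXXX 4567"
    }
}
```

This only rewrites text. In the OCR route the masked region still has to be painted over in the image before it is converted back to a PDF.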

Related

Merging of different file format into single pdf using itext is giving corrupt pdf

I have a multipart form in which files of different types (PDF, PNG, JPG, etc.) are uploaded in a single request. I need to read these files and create a single merged PDF. Does iText support merging different file types like this?
I am able to merge all the image types, but when I try to merge a PDF with an image I run into issues. Is there a sample for this? The code below creates the final PDF without the images.
public void generateMergedPDF(Map<String, String> dataMap, MultipartFile[] files) throws Exception {
ClassPathResource resource = new ClassPathResource("templatepdfForm.pdf");
FileOutputStream userInfo = new FileOutputStream(new File("C:\\pdf\\megepath\\updateddocument.pdf"));
//Update field values in template: starts
PdfReader reader = new PdfReader(resource.getInputStream());
PdfStamper stamper = new PdfStamper(reader, userInfo);
stamper.setFormFlattening(true);
AcroFields form = stamper.getAcroFields();
Map<String, Item> fieldMap = form.getFields();
for (String key : fieldMap.keySet()) {
String fieldValue = dataMap.get(key);
if (fieldValue != null) {
form.setField(key, fieldValue);
}
}
stamper.close();
//Update field values in template: ends
Document mergePdfDoc = new Document();
PdfCopy pdfCopy;
boolean smartCopy = false;
FileOutputStream finalFile = new FileOutputStream("C:\\pdf\\finalmergedfile.pdf");
//Merge updated pdf with multipart content
if(smartCopy)
pdfCopy = new PdfSmartCopy(mergePdfDoc, finalFile);
else
pdfCopy = new PdfCopy(mergePdfDoc, finalFile);
PdfWriter writer = PdfWriter.getInstance(mergePdfDoc, finalFile);
mergePdfDoc.open();
PdfReader mergeReader = new PdfReader(new FileInputStream(new File("C:\\pdf\\megepath\\updateddocument.pdf")));
pdfCopy.addDocument(mergeReader);
pdfCopy.freeReader(mergeReader);
mergeReader.close();
PdfReader[] pdfReader = new PdfReader[files.length];
for(int i=0; i<=files.length-1;i++) {
if(FileContentType.APPLICATION_TYPE.getContentTypes().contains(files[i].getContentType())) {
//To add multipart pdf content
pdfReader[i] = new PdfReader(files[i].getInputStream());
pdfCopy.addDocument(pdfReader[i]);
pdfCopy.freeReader(pdfReader[i]);
pdfReader[i].close();
}else if(FileContentType.IMAGE_TYPE.getContentTypes().contains(files[i].getContentType())) {
//To add multipart image content
System.out.println("Image Type Loop");
Image fileImage = Image.getInstance(files[i].getBytes());
mergePdfDoc.setPageSize(fileImage);
mergePdfDoc.newPage();
claimImage.setAbsolutePosition(0, 0);
mergePdfDoc.add(fileImage);
}
}
pdfCopy.setMergeFields();
//mergePdfDoc.close(); //If i enable this close stream closed error
//writer.close(); //If i enable this close stream closed error
memInfo.close();
finalFile.close();
}

Removing all embedded files in iTextSharp

As I've become interested in iTextSharp, I need to learn C#. Since I know a bit of AutoHotkey (a simple yet powerful scripting language for Windows), it is easier for me. However, I often come across code written in Java which is said to be easily converted to C#. Unfortunately, I have some problems with it. Let's have a look at the original code written by Bruno Lowagie.
public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
PdfReader reader = new PdfReader(src);
PdfDictionary root = reader.getCatalog();
PdfDictionary names = root.getAsDict(PdfName.NAMES);
names.remove(PdfName.EMBEDDEDFILES);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
stamper.close();
}
This is what I have managed to write on my own:
static void removeFiles(string sourceFilePath, string destFilePath)
{
try
{
// read src file
FileStream inputStream = new FileStream(sourceFilePath, FileMode.Open, FileAccess.Read, FileShare.None);
Document source = new Document();
// open for reading
PdfWriter reader = PdfReader.GetInstance(inputStream);
source.Open();
// create dest file
FileStream outputStream = new FileStream(destFilePath, FileMode.Create, FileAccess.ReadWrite, FileShare.None);
Document dest = new Document();
PdfWriter writer = PdfWriter.GetInstance(dest, inputStream); // open stream from src
// remove embedded files from dest
PdfDictionary root = dest.getCatalog().getPdfObject();
PdfDictionary names = root.getAsDictionary(PdfName.Names);
names.remove(PdfName.EmbeddedFiles);
// close all
source.Close();
dest.Close();
}
catch (Exception ex)
{
}
}
Unfortunately, there are many errors such as:
'Document' does not contain a definition for 'getCatalog' and no extension method 'getCatalog'
'PdfReader' does not contain a definition for 'GetInstance'
This is what I have managed to do after countless hours of coding and googling.

There are some iTextSharp examples available. See for instance: How to read a PDF Portfolio using iTextSharp
I don't know much about C#, but this is a first attempt to fix your code:
static void RemoveFiles(string sourceFilePath, string destFilePath)
{
// read src file
FileStream inputStream = new FileStream(sourceFilePath, FileMode.Open, FileAccess.Read, FileShare.None);
// open for reading
PdfReader reader = new PdfReader(inputStream);
FileStream outputStream = new FileStream(destFilePath, FileMode.Create, FileAccess.ReadWrite, FileShare.None);
PdfStamper stamper = new PdfStamper(reader, outputStream);
// remove embedded files
PdfDictionary root = reader.Catalog;
PdfDictionary names = root.GetAsDict(PdfName.NAMES);
names.Remove(PdfName.EMBEDDEDFILES);
// close all
stamper.Close();
reader.Close();
}
Note that I don't understand why you were using Document and PdfWriter. You should use PdfStamper instead. Also: you are only removing the document-level attachments. If there are file attachment annotations, they will still be present in the PDF.

Extracting an embedded object from a pdf

I had embedded a byte array into a pdf file (Java).
Now I am trying to extract that same array.
The array was embedded as a "MOVIE" file.
I couldn't find any clue on how to do that...
Any ideas?
Thanks!
EDIT
I used this code to embed the byte array:
public static void pack(byte[] file) throws IOException, DocumentException{
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(RESULT));
writer.setPdfVersion(PdfWriter.PDF_VERSION_1_7);
writer.addDeveloperExtension(PdfDeveloperExtension.ADOBE_1_7_EXTENSIONLEVEL3);
document.open();
RichMediaAnnotation richMedia = new RichMediaAnnotation(writer, new Rectangle(0,0,0,0));
PdfFileSpecification fs
= PdfFileSpecification.fileEmbedded(writer, null, "test.avi", file);
PdfIndirectReference asset = richMedia.addAsset("test.avi", fs);
RichMediaConfiguration configuration = new RichMediaConfiguration(PdfName.MOVIE);
RichMediaInstance instance = new RichMediaInstance(PdfName.MOVIE);
RichMediaParams flashVars = new RichMediaParams();
instance.setAsset(asset);
configuration.addInstance(instance);
RichMediaActivation activation = new RichMediaActivation();
richMedia.setActivation(activation);
PdfAnnotation richMediaAnnotation = richMedia.createAnnotation();
richMediaAnnotation.setFlags(PdfAnnotation.FLAGS_PRINT);
writer.addAnnotation(richMediaAnnotation);
document.close();
}

I have written a brute force method to extract all streams in a PDF and store them as a file without an extension:
public static final String SRC = "resources/pdfs/image.pdf";
public static final String DEST = "results/parse/stream%s";
public static void main(String[] args) throws IOException {
File file = new File(DEST);
file.getParentFile().mkdirs();
new ExtractStreams().parse(SRC, DEST);
}
public void parse(String src, String dest) throws IOException {
PdfReader reader = new PdfReader(src);
PdfObject obj;
for (int i = 1; i <= reader.getXrefSize(); i++) {
obj = reader.getPdfObject(i);
if (obj != null && obj.isStream()) {
PRStream stream = (PRStream)obj;
byte[] b;
try {
b = PdfReader.getStreamBytes(stream);
}
catch(UnsupportedPdfException e) {
b = PdfReader.getStreamBytesRaw(stream);
}
FileOutputStream fos = new FileOutputStream(String.format(dest, i));
fos.write(b);
fos.flush();
fos.close();
}
}
}
Note that I get all PDF objects that are streams as a PRStream object. I also use two different methods:
When I use PdfReader.getStreamBytes(stream), iText will look at the filter. For instance: page content streams consist of PDF syntax that is compressed using /FlateDecode. By using PdfReader.getStreamBytes(stream), you will get the uncompressed PDF syntax.
Not all filters are supported in iText. Take for instance /DCTDecode which is the filter used to store JPEGs inside a PDF. Why and how would you "decode" such a stream? You wouldn't, and that's when we use PdfReader.getStreamBytesRaw(stream) which is also the method you need to get your AVI-bytes from your PDF.
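To make the difference between the two calls concrete, the following standalone sketch uses only java.util.zip (no iText) to mimic what a /FlateDecode filter does. The deflated bytes stand in for what a raw read would return, and inflating them recovers the readable PDF syntax. The class name and the sample content stream are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class FlateDemo {

    /** Compresses bytes in zlib format, the format /FlateDecode expects. */
    static byte[] deflate(byte[] plain) {
        Deflater deflater = new Deflater();
        deflater.setInput(plain);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Decompresses zlib data, which is the "decoded" view of such a stream. */
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("Truncated or unsupported deflate data");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Not valid deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String content = "BT /F1 12 Tf 72 720 Td (Hello) Tj ET";
        byte[] raw = deflate(content.getBytes(StandardCharsets.ISO_8859_1)); // the "raw" view
        byte[] decoded = inflate(raw);                                       // the "decoded" view
        System.out.println(raw.length + " raw bytes decode to: "
                + new String(decoded, StandardCharsets.ISO_8859_1));
    }
}
```

For a JPEG or AVI payload there is nothing comparable to inflate, which is why the raw bytes are the ones to save.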
This example already gives you the methods you'll certainly need to extract PDF streams. Now it's up to you to find the path to the stream you need. That calls for iText RUPS. With iText RUPS you can look at the internal structure of a PDF file. In your case, you need to find the annotations as is done in this question: All links of existing pdf change the action property to inherit zoom - iText library
You loop over the page dictionaries, then loop over the /Annots array of this dictionary (if it's present), but instead of checking for /Link annotations (which is what was asked in the question I refer to), you have to check for /RichMedia annotations and from there examine the assets until you find the stream that contains the AVI file. RUPS will show you how to dive into the annotation dictionary.

extracting one page from pdf file using iText

I want to return a single page of a PDF file from a Java servlet (to reduce the download size), using the iText library.
I am using this code:
try {
PdfReader reader = new PdfReader(input);
Document document = new Document(reader.getPageSizeWithRotation(page_number) );
PdfSmartCopy copy1 = new PdfSmartCopy(document, response.getOutputStream());
copy1.setFullCompression();
document.open();
copy1.addPage(copy1.getImportedPage(reader, page_i) );
copy1.freeReader(reader);
reader.close();
document.close();
} catch (DocumentException e) {
e.printStackTrace();
}
This code returns the page, but the file is large, sometimes as large as the original file, even though it contains just one page.

I have downloaded a single file from your repository: Abdomen.pdf
I have then used the following code to "burst" that PDF:
public static void main(String[] args) throws DocumentException, IOException {
PdfReader reader = new PdfReader("resources/Abdomen.pdf");
int n = reader.getNumberOfPages();
reader.close();
String path;
PdfStamper stamper;
for (int i = 1; i <= n; i++) {
reader = new PdfReader("resources/Abdomen.pdf");
reader.selectPages(String.valueOf(i));
path = String.format("results/abdomen/p-%s.pdf", i);
stamper = new PdfStamper(reader,new FileOutputStream(path));
stamper.close();
reader.close();
}
}
To "burst" means to split into separate pages. While the original file Abdomen.pdf is 72,570 KB (about 70.8 MB), the separate pages are much smaller.
I cannot reproduce the problem you describe.

A more up-to-date and much cleaner approach (iText 5.5.6 and up):
/**
* Manipulates a PDF file src with the file dest as result
* @param src the original PDF
* @param dest the resulting PDF
* @throws IOException
* @throws DocumentException
*/
public void manipulatePdf(String src, String dest)
throws IOException, DocumentException {
PdfReader reader = new PdfReader(src);
SmartPdfSplitter splitter = new SmartPdfSplitter(reader);
int part = 1;
while (splitter.hasMorePages()) {
splitter.split(new FileOutputStream("results/merge/part_" + part + ".pdf"), 200000);
part++;
}
reader.close();
}

Editing PDF text using Java

Is there a way I can edit a PDF from Java?
I have a PDF document containing text placeholders that I need to replace from Java, but all the libraries I have seen create PDFs from scratch and offer only limited editing functionality.
Is there any way I can edit a PDF, or is this impossible?

You can do it with iText. I tested it with the following code, which adds a chunk of text and a red circle on top of each page of an existing PDF.
/* requires itextpdf-5.1.2.jar or similar */
import java.io.*;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.*;
public class AddContentToPDF {
public static void main(String[] args) throws IOException, DocumentException {
/* example inspired from "iText in action" (2006), chapter 2 */
PdfReader reader = new PdfReader("C:/temp/Bubi.pdf"); // input PDF
PdfStamper stamper = new PdfStamper(reader,
new FileOutputStream("C:/temp/Bubi_modified.pdf")); // output PDF
BaseFont bf = BaseFont.createFont(
BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED); // set font
//loop on pages (1-based)
for (int i=1; i<=reader.getNumberOfPages(); i++){
// get object for writing over the existing content;
// you can also use getUnderContent for writing in the bottom layer
PdfContentByte over = stamper.getOverContent(i);
// write text
over.beginText();
over.setFontAndSize(bf, 10); // set font and size
over.setTextMatrix(107, 740); // set x,y position (0,0 is at the bottom left)
over.showText("I can write at page " + i); // set text
over.endText();
// draw a red circle
over.setRGBColorStroke(0xFF, 0x00, 0x00);
over.setLineWidth(5f);
over.ellipse(250, 450, 350, 550);
over.stroke();
}
stamper.close();
}
}
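The coordinates passed to setTextMatrix are in PDF user space, where the default unit is 1/72 inch and the origin is at the bottom left. When a position is known in millimetres measured from the top-left corner, a small conversion helps. The sketch below is plain Java with hypothetical names; the 842 pt page height is an assumption matching the A4 size used by iText's PageSize.A4.

```java
final class PdfCoordinates {

    static final double POINTS_PER_INCH = 72.0;
    static final double MM_PER_INCH = 25.4;

    /** Converts millimetres to PDF points (default user space units). */
    static double mmToPoints(double mm) {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    /** Converts a distance measured down from the top edge into a PDF y coordinate. */
    static double yFromTop(double pageHeightPt, double distanceFromTopPt) {
        return pageHeightPt - distanceFromTopPt;
    }

    public static void main(String[] args) {
        double pageHeight = 842; // A4 height in points (assumed page size)
        double x = mmToPoints(25.4);                       // 1 inch from the left edge
        double y = yFromTop(pageHeight, mmToPoints(25.4)); // 1 inch below the top edge
        System.out.println("x=" + x + " y=" + y);
    }
}
```

The resulting x and y values are what would be passed to over.setTextMatrix in the example above.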

I modified the code I found a bit, and the following worked for me:
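import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.*;
public class Principal {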
public static final String SRC = "C:/tmp/244558.pdf";
public static final String DEST = "C:/tmp/244558-2.pdf";
public static void main(String[] args) throws IOException, DocumentException {
File file = new File(DEST);
file.getParentFile().mkdirs();
new Principal().manipulatePdf(SRC, DEST);
}
public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
PdfReader reader = new PdfReader(src);
PdfDictionary dict = reader.getPageN(1);
PdfObject object = dict.getDirectObject(PdfName.CONTENTS);
PdfArray refs = null;
if (dict.get(PdfName.CONTENTS).isArray()) {
refs = dict.getAsArray(PdfName.CONTENTS);
} else if (dict.get(PdfName.CONTENTS).isIndirect()) {
refs = new PdfArray(dict.get(PdfName.CONTENTS));
}
for (int i = 0; i < refs.getArrayList().size(); i++) {
PRStream stream = (PRStream) refs.getDirectObject(i);
byte[] data = PdfReader.getStreamBytes(stream);
stream.setData(new String(data).replace("NULA", "Nulo").getBytes());
}
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
stamper.close();
reader.close();
}
}
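One caveat with the new String(data) and getBytes() round trip used here: both calls use the platform default charset, and a content stream may contain bytes that are not valid in that charset. The standalone sketch below uses no iText classes and a made-up byte sequence. It shows how a UTF-8 round trip alters such bytes while ISO-8859-1 preserves every byte value, so passing StandardCharsets.ISO_8859_1 explicitly is one way to avoid the problem.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class StreamTextReplace {

    /** Decodes the bytes, replaces the text, and re-encodes with the same charset. */
    static byte[] replace(byte[] data, String from, String to, Charset charset) {
        return new String(data, charset).replace(from, to).getBytes(charset);
    }

    /** Made-up stream bytes: readable syntax followed by two non-ASCII bytes. */
    static byte[] sample() {
        return new byte[] {'(', 'N', 'U', 'L', 'A', ')', ' ', 'T', 'j', ' ',
                (byte) 0xE9, (byte) 0xFF};
    }

    public static void main(String[] args) {
        byte[] latin1 = replace(sample(), "NULA", "Nulo", StandardCharsets.ISO_8859_1);
        byte[] utf8 = replace(sample(), "NULA", "Nulo", StandardCharsets.UTF_8);
        // The ISO-8859-1 result keeps the trailing 0xE9 0xFF bytes unchanged;
        // the UTF-8 result replaces them with encoded U+FFFD characters.
        System.out.println("ISO-8859-1: " + Arrays.toString(latin1));
        System.out.println("UTF-8:      " + Arrays.toString(utf8));
    }
}
```

Even with a safe charset, this kind of replacement only works when the text is stored in the stream as a readable, contiguous string, which is not the case for every PDF.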

Take a look at iText and this sample code

Take a look at aspose and this sample code

I've done this using LibreOffice Draw.
You start by manually opening a pdf in Draw, checking that it renders OK, and saving it as a Draw .odg file.
That's a zipped xml file, so you can modify it in code to find and replace the placeholders.
Next (from code) you use a command line call to Draw to generate the pdf.
Success!
The main issue is that Draw doesn't handle fonts embedded in a pdf. If the font isn't also installed on your system - then it will render oddly, as Draw will replace it with a standard one that inevitably has different sizing.
If this approach is of interest, I'll put together some shareable code.
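As a rough illustration of the find-and-replace step described above, here is a minimal Java sketch using only java.util.zip. It assumes the placeholder appears as one unbroken string inside content.xml (Draw may split text across several XML elements, in which case this will not find it). The class name, the ${NAME} placeholder and the tiny sample package are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class OdgPlaceholderReplacer {

    /** Returns a copy of the package with the placeholder replaced inside content.xml. */
    static byte[] replaceInContentXml(byte[] odg, String placeholder, String value) {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(odg));
             ZipOutputStream out = new ZipOutputStream(result)) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                byte[] data = in.readAllBytes(); // bytes of the current entry only (Java 9+)
                if (entry.getName().equals("content.xml")) {
                    String xml = new String(data, StandardCharsets.UTF_8);
                    data = xml.replace(placeholder, escapeXml(value))
                            .getBytes(StandardCharsets.UTF_8);
                }
                writeEntry(out, entry.getName(), data);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result.toByteArray();
    }

    static void writeEntry(ZipOutputStream out, String name, byte[] data) throws IOException {
        ZipEntry entry = new ZipEntry(name);
        if (name.equals("mimetype")) {
            // ODF expects the mimetype entry to be stored uncompressed.
            entry.setMethod(ZipEntry.STORED);
            entry.setSize(data.length);
            entry.setCompressedSize(data.length);
            CRC32 crc = new CRC32();
            crc.update(data);
            entry.setCrc(crc.getValue());
        }
        out.putNextEntry(entry);
        out.write(data);
        out.closeEntry();
    }

    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Reads one entry as UTF-8 text, or returns null if it is absent. */
    static String readEntry(byte[] zip, String name) {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (entry.getName().equals(name)) {
                    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a tiny stand-in package for the demo; a real .odg has more entries. */
    static byte[] samplePackage() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            writeEntry(out, "mimetype",
                    "application/vnd.oasis.opendocument.graphics".getBytes(StandardCharsets.US_ASCII));
            writeEntry(out, "content.xml",
                    "<text:p>Dear ${NAME}</text:p>".getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] updated = replaceInContentXml(samplePackage(), "${NAME}", "A & B");
        System.out.println(readEntry(updated, "content.xml"));
        // prints "<text:p>Dear A &amp; B</text:p>"
    }
}
```

The resulting bytes can be written to an .odg file and converted from the command line, for example with soffice --headless --convert-to pdf file.odg, assuming a LibreOffice installation is on the PATH.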
