Multi-page document scanning - Java

I'm making a Java program that scans a document and saves it to a PDF. It works like a charm for a single-page document: I run a shell command from Java, create a BufferedImage from the process's InputStream, and then build the PDF document using iText.
Process p = Runtime.getRuntime().exec("scanimage --resolution=300 --format png --device-name " + device.getName());
BufferedImage bI = ImageIO.read(p.getInputStream());
The trouble begins when trying to scan a multi-page document (batch scanning). Namely, I don't know what to do with the resulting InputStream.
Process p = Runtime.getRuntime().exec("scanimage --batch --resolution=300 --format png --device-name " + device.getName());
I see a possible workaround: saving the images to temporary files and then building the PDF document from those files. However, I would like to avoid that. Is there a way to acquire an array of BufferedImages from the InputStream given by scanimage?
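For reference, the temporary-file workaround mentioned above could look roughly like the sketch below: let scanimage --batch write numbered files to disk, then read each page back as a BufferedImage. The --batch=scan_%03d.png file pattern and the use of the current working directory are assumptions of this sketch, not tested code.
import java.awt.image.BufferedImage;
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.imageio.ImageIO;

// Sketch: let scanimage --batch write numbered files, then read them back.
Process p = Runtime.getRuntime().exec(
        "scanimage --batch=scan_%03d.png --resolution=300 --format png --device-name " + device.getName());
p.waitFor(); // scanimage exits once the feeder is empty; the pages are now on disk

File[] scans = new File(".").listFiles((dir, name) -> name.matches("scan_\\d+\\.png"));
Arrays.sort(scans); // zero-padded names keep the page order
List<BufferedImage> pages = new ArrayList<>();
for (File scan : scans) {
    pages.add(ImageIO.read(scan));
}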

PDF with forms to simple image PDF

How can I transform a PDF with forms made in Adobe LiveCycle into a simple image PDF using Java?
I tried using Apache PDFBox, but it can't save a PDF with forms as an image.
This is what I tried (from this question: Convert PDF files to images with PDFBox):
String pdfFilename = "PDForm_1601661791_587488.pdf";
try (PDDocument document = PDDocument.load(new File(pdfFilename))) {
    PDFRenderer pdfRenderer = new PDFRenderer(document);
    for (int page = 0; page < document.getNumberOfPages(); ++page) {
        BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
        ImageIOUtil.writeImage(bim, pdfFilename + "-" + (page + 1) + ".png", 300);
    }
} catch (IOException ex) {
    Logger.getLogger(StartClass.class.getName()).log(Level.SEVERE, null, ex);
}
But it is not working; the result is an image that only says "The document you are trying to load requires Adobe Reader 8 or higher."
I guess it is not possible; I tried many libraries and none of them worked. Presumably the PDF is an XFA (LiveCycle) form, so renderers that don't support XFA only see that placeholder page.
This is how I solved the problem:
I used an external tool - PDFCreator.
In PDFCreator I created a special printer that prints and saves the PDF without asking any questions (you have these options in PDFCreator).
This is simple to reproduce in PDFCreator because the Debug section has an option to load a config file, so I keep this file prepared: I just install PDFCreator and load the config file.
If you use my INI file from the link above, you should know that the resulting PDF is automatically saved in the folder "current user folder/Desktop/temporary".
The rest of the job is done from Java using Adobe Reader; in my case the code is:
ProcessBuilder pb = new ProcessBuilder(adobePath, "/t", path+"/"+filename, printerName);
Process p = pb.start();
This code opens my PDF in Adobe Reader, prints the PDF to the specified virtual printer, and exits automatically.
"adobePath" is the path to the adobe executable
path+"/"+filename is the path to my PDF.
"printerName" is the name of the virtual printer created in PDFCreator
So this is not a pure Java solution, and in the future I intend to use Apache PDFBox to generate my PDFs in a format that is compatible with browsers and all readers, but this works as well.

Creating Thumbnail from video on S3 without downloading

There are some video files (mostly .mp4) stored in S3, and they can be rather big. I need to get thumbnail images for the video files, say the frame at 0.5 seconds (to skip a possible black screen, etc.).
I can create the thumbnail if I download the whole file, but that takes too long, so I am trying to avoid it and download only a minimal fragment.
I know how to download the first N bytes from AWS S3 (a request with a specified range), but the problem is that the resulting piece of the video file is corrupted and is not recognized as a valid video.
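For reference, the ranged download mentioned above can be expressed with the AWS SDK for Java v1 roughly as in the sketch below; the bucket name, object key, and byte count are placeholders.
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.S3Object;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Sketch: fetch only the first ~1 MB of the object via a ranged GET.
GetObjectRequest request = new GetObjectRequest(bucketName, objectKey)
        .withRange(0, 999999); // inclusive byte range
S3Object object = s3client.getObject(request);
try (InputStream in = object.getObjectContent()) {
    Files.copy(in, Paths.get("fragment.mp4"), StandardCopyOption.REPLACE_EXISTING);
}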
I tried to emulate retrieving the header bytes with this code
import java.io.FileInputStream;
import java.io.FileOutputStream;

public class Test {
    public static void main(String[] args) throws Exception {
        try (FileInputStream fis = new FileInputStream("D://temp//1.mp4");
             FileOutputStream fos = new FileOutputStream("D://temp//1_cut.mp4")) {
            byte[] buf = new byte[1000000];
            int read = fis.read(buf);  // a single read may return fewer bytes than requested
            fos.write(buf, 0, read);   // write only what was actually read
            fos.flush();
            System.out.println("Done");
        }
    }
}
to work with a static file, but the resulting 1_cut.mp4 is not valid: neither a player nor the avconv library can recognize it.
Is there any way to download just a fragment of the video file and create an image from that fragment?
Not sure if you need a full Java implementation, but if your file is accessible through a direct or signed URL on S3 and you are OK with using ffmpeg, then the following should do the trick.
ffmpeg -i $amazon_s3_signed_url -ss 00:00:00.500 -vframes 1 thumbnail.png
You can use the Amazon Java SDK to create a pre-signed URL and then execute the command above to create a thumbnail.
GeneratePresignedUrlRequest generatePresignedUrlRequest =
        new GeneratePresignedUrlRequest(bucketName, objectKey);
generatePresignedUrlRequest.setMethod(HttpMethod.GET);
generatePresignedUrlRequest.setExpiration(expiration);
URL url = s3client.generatePresignedUrl(generatePresignedUrlRequest);
String urlString = url.toString();

Runtime run = Runtime.getRuntime();
Process proc = run.exec("ffmpeg -i " + urlString + " -ss 00:00:00.500 -vframes 1 thumbnail.png");
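As a side note, if you want to wait for ffmpeg to finish and see its output, a ProcessBuilder variant of the same call might look roughly like this sketch (assuming ffmpeg is on the PATH):
// Sketch: same ffmpeg call with explicit arguments, waiting for completion.
ProcessBuilder pb = new ProcessBuilder(
        "ffmpeg",
        "-i", urlString,         // pre-signed S3 URL from the code above
        "-ss", "00:00:00.500",   // seek to 0.5 s
        "-vframes", "1",         // grab a single frame
        "thumbnail.png");
pb.inheritIO();                  // forward ffmpeg's console output
int exitCode = pb.start().waitFor();
System.out.println("ffmpeg exit code: " + exitCode);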
Your current approach of just downloading a fixed number of sequential bytes would require you to repair the partial file that you downloaded, which is a huge amount of work; with MP4, the metadata (the moov atom) is often stored at the end of the file, so a leading fragment may not even be decodable.
One alternative solution to your question might be the following:
You would need to forward all disk I/O read requests from your decoder to the S3 bucket. If your decoder (avconv) supports reading from an InputStream, here is a good example of how to override the read method:
How InputStream's read() method is implemented?
Another alternative is to use existing drivers that let you access the S3 bucket as if it were a local drive:
Windows: https://tntdrive.com/
Linux: https://tecadmin.net/mount-s3-bucket-centosrhel-ubuntu-using-s3fs/#

How to merge many PDFs

I want to ask how to merge more than 100k PDF files (each PDF is around 160 KB) into one PDF file.
Tutorial
I already read this tutorial; that code works for a few PDFs, but when I tried it with 10k PDF files I got the error "java.lang.OutOfMemoryError: GC overhead limit exceeded".
I already tried using -Xmx or -Xms; the error then became "Java heap space".
I am also using pdf.flushCopiedObjects(firstSourcePdf);, but it doesn't help. Or maybe I am using it incorrectly?
File file = new File(pathName);
File[] listFile = file.listFiles();
if (listFile == null) {
    throw new Exception("File not Found at " + pathName);
}
Arrays.sort(listFile); // sort all source files by name

PdfADocument pdf = new PdfADocument(new PdfWriter(dest),
        PdfAConformanceLevel.PDF_A_1A,
        new PdfOutputIntent("Custom", "", "http://www.color.org",
                "sRGB IEC61966-2.1", null));

// Setting some required parameters
pdf.setTagged();
pdf.getCatalog().setLang(new PdfString("en-US"));
pdf.getCatalog().setViewerPreferences(
        new PdfViewerPreferences().setDisplayDocTitle(true));
PdfDocumentInfo info = pdf.getDocumentInfo();
info.setTitle("iText7 PDF/A-1a example");

// Create PdfMerger instance
PdfMerger merger = new PdfMerger(pdf);

// Merge the pages of each source document into the target
for (File filePdf : listFile) {
    System.out.println("filePdf = " + filePdf.getName());
    PdfDocument firstSourcePdf = new PdfDocument(new PdfReader(filePdf));
    merger.merge(firstSourcePdf, 1, firstSourcePdf.getNumberOfPages());
    pdf.flushCopiedObjects(firstSourcePdf);
    firstSourcePdf.close();
}
pdf.close();
Thank You
This is a known issue when merging a large number of PDF documents (or large PDFs).
iText will try to make the resulting PDF as small as possible. It does this by trying to reuse objects. For instance, if you have an image that occurs multiple times, instead of embedding that image every time, it will embed it once and simply use a reference for the other occurrences.
That means iText has to keep all objects in memory, because there is no way of knowing beforehand whether an object will get reused.
A solution that usually helps is splitting the process in batches.
Instead of merging 1000 files into 1, try merging the 1000 files in pairs (resulting in 500 documents), then merge those in pairs (resulting in 250 documents), and so on.
That allows iText to flush its buffers regularly, which should stop the memory overhead from crashing the VM. A sketch of this pairwise approach is shown below.
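A minimal sketch of the batching idea, reusing the listFile array from the question; mergePair and tmpDir are placeholders (mergePair stands for whatever two-document merge you already have, e.g. the PdfMerger code above), and the sketch is not tested against iText's memory behaviour.
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: merge the files pairwise, round by round, until one document remains.
List<File> current = new ArrayList<>(Arrays.asList(listFile));
int round = 0;
while (current.size() > 1) {
    List<File> next = new ArrayList<>();
    for (int i = 0; i < current.size(); i += 2) {
        if (i + 1 == current.size()) {
            next.add(current.get(i));                  // odd file out, carry it forward
        } else {
            File out = new File(tmpDir, "round" + round + "_" + i + ".pdf");
            mergePair(current.get(i), current.get(i + 1), out); // placeholder: merge exactly two PDFs
            next.add(out);
        }
    }
    current = next;
    round++;
}
// current.get(0) now points to the fully merged PDF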
If it doesn't have to be iText, you could try using a command line application that supports merging of files. PDFtk, QPDF and HexaPDF CLI (note: I'm the author of HexaPDF) are some CLI tools that support basic PDF file merging.

How to convert a .raw file to an image file programmatically on a rooted Android device

I am trying to create a screenshot-capturing app for Android that captures a screenshot of the device screen by running an adb command programmatically. I have tried every link from Stack Overflow and other sites, but without much success yet. Can anybody help me out here? I have followed this link, Android take screenshot on rooted device, to create the .raw file, but now I am stuck on how to convert the raw file to an image in Android.
Android cannot do that by itself. You need an external decoder. There are libraries that can do decoding, demosaicing, white balance, etc. and convert to RGB. This is a digital negative and needs a lot of image processing before it is viewable on Android.
Check out dcraw; it's a library for decoding all types of RAW files.
If the raw file is actually an image file, then you can change the extension to .jpg or .png:
File ss = new File(ssDir, "ss.raw");
change this to
File ss = new File(ssDir, "ss.jpg");
But capturing screenshots requires root permission.
To execute with su permission:
File f = new File(Environment.getExternalStoragePublicDirectory(Environment.DIRECTORY_PICTURES), "screenshot.jpg");
Process localProcess = Runtime.getRuntime().exec("su");
OutputStream os = localProcess.getOutputStream();
// send the screencap command to the root shell, then close the shell
os.write(("/system/bin/screencap -p " + f.toString() + "\n").getBytes("ASCII"));
os.write("exit\n".getBytes("ASCII"));
os.flush();
os.close();
localProcess.waitFor();
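Once screencap -p has written the file, it can be decoded into a Bitmap directly; a minimal sketch (BitmapFactory detects the format from the file contents, not the extension):
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;

// Sketch: load the captured screenshot back as a Bitmap.
Bitmap screenshot = BitmapFactory.decodeFile(f.toString());
if (screenshot == null) {
    // decoding failed, e.g. the file contains raw framebuffer data rather than PNG/JPEG
}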

Unable to load image from the same package in Java?

In my Java package, I have a file called 'prog.ico'. I'm trying to load this file via the following code:
java.net.URL url = this.getClass().getResource("prog.ico");
java.awt.Image image = ImageIO.read( url );
System.out.println("image: " + image);
This gives the output:
image: null
What am I doing wrong? The .ico file exists in the same package as the class from which I'm running this code.
It seems that the .ico image format is not supported. See this question and its answer to get around this.
To prevent link rot: This solution recommends using Image4J to process .ico files.
I've written a plugin for ImageIO that adds support for .ICO (MS Windows Icon) and .CUR (MS Windows Cursor) formats.
You can get it from GitHub here: https://github.com/haraldk/TwelveMonkeys/tree/master/imageio/imageio-ico
After you have installed the plugin, you should be able to read your icon using the code in your original post.
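If you want to verify that the plugin is actually picked up at runtime, a quick check could look like this sketch:
import javax.imageio.ImageIO;

// Sketch: check whether any registered ImageIO reader claims the "ico" suffix.
boolean icoSupported = ImageIO.getImageReadersBySuffix("ico").hasNext();
System.out.println("ICO reader registered: " + icoSupported);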
I think you have to go through a FileInputStream to wrap the file:
File file = new File("prog.ico");
FileInputStream fis = new FileInputStream(file);
BufferedImage image = ImageIO.read(fis); //reading the image file
