How to write extracted image from pdf to a file

Hopefully this is simple.
I am using PDFBox to extract images from a PDF and want to write them to a folder, but I don't get any output (the folder has read and write permissions).
I suspect I am not writing to the output stream properly.
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;
public final class JavaImgExtactor
{
public static void main(String[] args) throws IOException{
Stuff();
}
@SuppressWarnings("resource")
public static void Stuff() throws IOException{
File inFile = new File("/Users/sebastianzeki/Documents/Images Captured with Proc Data Audit.pdf");
PDDocument document = new PDDocument();
//document=null;
try {
document = PDDocument.load(inFile);
} catch (Exception e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
List pages = document.getDocumentCatalog().getAllPages();
Iterator iter = pages.iterator();
while (iter.hasNext()) {
PDPage page = (PDPage) iter.next();
System.out.println("page"+page);
PDResources resources = page.getResources();
Map pageImages = resources.getImages();
if (pageImages != null) {
Iterator imageIter = pageImages.keySet().iterator();
System.out.println("Success"+imageIter);
while (imageIter.hasNext()) {
String key = (String) imageIter.next();
PDXObjectImage image = (PDXObjectImage) pageImages.get(key);
FileOutputStream out = new FileOutputStream("/Users/sebastianzeki/Documents/ImgPDF.jpg");
try {
image.write2OutputStream(out);
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
}
}
}

You are not closing the output stream, and the file name is always the same, so each image overwrites the previous one.
try (FileOutputStream out = new FileOutputStream("/Users/sebastianzeki/Documents/ImgPDF" + key + ".jpg")) {
    image.write2OutputStream(out);
} catch (Exception e) {
    e.printStackTrace();
}
try-with-resources will automatically close out. I am not sure whether key is usable as part of a file name.
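The two fixes above (closing each stream and using a distinct name per image) can be sketched with the JDK alone. This is not PDFBox code: the `Map<String, byte[]>` merely stands in for the page's image map, `UniqueImageFiles`, `safeFileName` and `writeAll` are hypothetical names introduced here, and the sanitizing rule is an assumption about which characters are acceptable in a file name.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

class UniqueImageFiles {

    // Hypothetical helper: replace anything outside [A-Za-z0-9._-] so the key is safe in a file name.
    static String safeFileName(String key) {
        return key.replaceAll("[^A-Za-z0-9._-]", "_");
    }

    // Writes each entry to its own file; try-with-resources closes every stream.
    static int writeAll(Map<String, byte[]> images, Path dir) throws IOException {
        int written = 0;
        for (Map.Entry<String, byte[]> entry : images.entrySet()) {
            Path target = dir.resolve("ImgPDF-" + safeFileName(entry.getKey()) + ".jpg");
            try (OutputStream out = new FileOutputStream(target.toFile())) {
                out.write(entry.getValue());
            }
            written++;
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder data standing in for extracted image bytes.
        Map<String, byte[]> images = new LinkedHashMap<>();
        images.put("Im0", new byte[] {1, 2, 3});
        images.put("Im/1", new byte[] {4, 5});
        Path dir = Files.createTempDirectory("img-out");
        System.out.println(writeAll(images, dir) + " files written to " + dir);
    }
}
```

Two different keys could still map to the same sanitized name, so appending a running counter to the file name may be a safer choice.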

image.write2OutputStream(out); writes the bytes from the image object to the out FileOutputStream object, but it doesn't flush the buffer of out.
Adding this should do the job:
out.flush();

Related

Cyrillic text coming from Document Properties is corrupt in PDF file in docx4j

I am trying to convert a docx file to PDF using docx4j 3.7.7. The PDF is generated properly, but a docProperty containing Cyrillic text does not come through; it is rendered as #####. A normal paragraph with Cyrillic text is rendered properly. The issue is reproducible only on Linux. On Windows, the docProperty is converted properly.
The file for testing can be found here
file
Below is the code :
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import org.docx4j.Docx4J;
import org.docx4j.convert.out.FOSettings;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
public class TestRussian {
public static void main(String[] args) {
new TestRussian().convertWordToPdf();
}
public void convertWordToPdf() {
FileOutputStream fileOutputStream =null;
try {
File file = new File("Test1.docx");
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(file);
boolean checkViaFo = Docx4J.pdfViaFO();
FOSettings foSettings = Docx4J.createFOSettings();
fileOutputStream= new FileOutputStream("PDFRussian1.pdf");
foSettings.setWmlPackage(wordMLPackage);
//Getting error in update() during complex field update
//FieldUpdater updater = new FieldUpdater(wordMLPackage);
//updater.update(true);
Docx4J.toPDF(wordMLPackage,fileOutputStream);
System.out.println("Done");
} catch (Exception ex) {
} finally {
try {
if (fileOutputStream != null) {
fileOutputStream.close();
}
} catch (IOException e) {
}
}
}
}
I have read something about MERGEFORMAT and CHARFORMAT but don't know much about them.

Couldn't append the text onto a Google Drive File

I am trying to append text to a text file on Google Drive, but when I write, the whole file is overwritten. Why can't I just add the text at the end of the file?
DriveFile file = Drive.DriveApi.getFile(mGoogleApiClient, id);
file.open(mGoogleApiClient, DriveFile.MODE_WRITE_ONLY, null).setResultCallback(new ResultCallback<DriveApi.DriveContentsResult>() {
@Override
public void onResult(DriveApi.DriveContentsResult driveContentsResult) {
msg.Log("ContentsOpenedCallBack");
if (!driveContentsResult.getStatus().isSuccess()) {
Log.i("Tag", "On Connected Error");
return;
}
final DriveContents driveContents = driveContentsResult.getDriveContents();
try {
msg.Log("onWrite");
OutputStream outputStream = driveContents.getOutputStream();
Writer writer = new OutputStreamWriter(outputStream);
writer.append(et.getText().toString());
writer.close();
driveContents.commit(mGoogleApiClient, null);
} catch (IOException e) {
e.printStackTrace();
}
}
});

Finally I've found the answer to append the text on the drive document.
DriveContents contents = driveContentsResult.getDriveContents();
try {
String input = et.getText().toString();
ParcelFileDescriptor parcelFileDescriptor = contents.getParcelFileDescriptor();
FileInputStream fileInputStream = new FileInputStream(parcelFileDescriptor
.getFileDescriptor());
// Read to the end of the file.
fileInputStream.read(new byte[fileInputStream.available()]);
// Append to the file.
FileOutputStream fileOutputStream = new FileOutputStream(parcelFileDescriptor
.getFileDescriptor());
Writer writer = new OutputStreamWriter(fileOutputStream);
writer.write("\n"+input);
writer.close();
driveContentsResult.getDriveContents().commit(mGoogleApiClient, null);
} catch (IOException e) {
e.printStackTrace();
}

The reason is that commit's default resolution strategy is to overwrite existing files. Check the API docs and see if there is a way to append changes.
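The difference between overwriting and appending is easy to see with plain `java.io` on a local file. This is only an analogy, not the Drive API: the two-argument constructor `FileOutputStream(file, true)` opens the file in append mode, while the one-argument constructor truncates it. `AppendDemo` and `appendLine` are names made up for this sketch.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class AppendDemo {

    // Appends one line; the 'true' flag keeps the existing bytes instead of truncating the file.
    static void appendLine(Path file, String text) throws IOException {
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(file.toFile(), true), StandardCharsets.UTF_8)) {
            writer.write(text);
            writer.write('\n');
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("append-demo", ".txt");
        appendLine(file, "first");
        appendLine(file, "second");
        // Both lines are present because the second call did not truncate the file.
        System.out.print(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }
}
```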

For anyone facing this problem in 2017:
Google has some methods to append data (here's a link).
Copying the method from Google didn't entirely work for me, so here is a class that appends data. (Please note this is a modified version of the linked code.)
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import android.content.Context;
import android.content.SharedPreferences;
import android.os.Bundle;
import android.os.ParcelFileDescriptor;
import android.preference.PreferenceManager;
import android.util.Log;
import com.google.android.gms.common.api.Result;
import com.google.android.gms.common.api.ResultCallback;
import com.google.android.gms.drive.Drive;
import com.google.android.gms.drive.DriveApi.DriveContentsResult;
import com.google.android.gms.drive.DriveApi.DriveIdResult;
import com.google.android.gms.drive.DriveContents;
import com.google.android.gms.drive.DriveFile;
import com.google.android.gms.drive.DriveId;
/**
* An activity to illustrate how to edit contents of a Drive file.
*/
public class EditContentsActivity extends BaseDemoActivity {
private static final String TAG = "EditContentsActivity";
@Override
public void onConnected(Bundle connectionHint) {
super.onConnected(connectionHint);
final ResultCallback<DriveIdResult> idCallback = new ResultCallback<DriveIdResult>() {
@Override
public void onResult(DriveIdResult result) {
if (!result.getStatus().isSuccess()) {
showMessage("Cannot find DriveId. Are you authorized to view this file?");
return;
}
DriveId driveId = result.getDriveId();
DriveFile file = driveId.asDriveFile();
new EditContentsAsyncTask(EditContentsActivity.this).execute(file);
}
};
SharedPreferences sp= PreferenceManager.getDefaultSharedPreferences(EditContentsActivity.this);
Drive.DriveApi.fetchDriveId(getGoogleApiClient(), EXISTING_FILE_ID)
.setResultCallback(idCallback);
}
public class EditContentsAsyncTask extends ApiClientAsyncTask<DriveFile, Void, Boolean> {
public EditContentsAsyncTask(Context context) {
super(context);
}
@Override
protected Boolean doInBackgroundConnected(DriveFile... args) {
DriveFile file = args[0];
SharedPreferences sp=PreferenceManager.getDefaultSharedPreferences(EditContentsActivity.this);
System.out.println("0"+sp.getString("drive_id","1"));
DriveContentsResult driveContentsResult=file.open(getGoogleApiClient(), DriveFile.MODE_READ_WRITE, null).await();
System.out.println("1");
if (!driveContentsResult.getStatus().isSuccess()) {
return false;
}
DriveContents driveContents = driveContentsResult.getDriveContents();
try {
System.out.println("2");
ParcelFileDescriptor parcelFileDescriptor = driveContents.getParcelFileDescriptor();
FileInputStream fileInputStream = new FileInputStream(parcelFileDescriptor
.getFileDescriptor());
// Read to the end of the file.
fileInputStream.read(new byte[fileInputStream.available()]);
System.out.println("3");
// Append to the file.
FileOutputStream fileOutputStream = new FileOutputStream(parcelFileDescriptor
.getFileDescriptor());
Writer writer = new OutputStreamWriter(fileOutputStream);
writer.write("hello world");
writer.close();
System.out.println("4");
driveContents.commit(getGoogleApiClient(), null).await();
return true;
} catch (IOException e) {
e.printStackTrace();
}
return false;
};
@Override
protected void onPostExecute(Boolean result) {
if (!result) {
showMessage("Error while editing contents");
return;
}
showMessage("Successfully edited contents");
}
}
}
EXISTING_FILE_ID is the file's resource ID. See the linked page if you need to find the resource ID.

see the content of a .bson file using java

I have a very large .bson file.
Now I have two questions:
How can I see the content of that file? (I know it can be done with "bsondump", but that command is slow, especially for a large database. In fact, I only want to see the structure of the file.)
How can I see the content of that file using Java?

You can easily read/parse a bson file in Java using a BSONDecoder instance such as BasicBSONDecoder or DefaultBSONDecoder. These classes are included in mongo-java-driver.
Here's a simple example of a Java implementation of bsondump.
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import org.bson.BSONDecoder;
import org.bson.BSONObject;
import org.bson.BasicBSONDecoder;
public class BsonDump {
public void bsonDump(String filename) throws FileNotFoundException {
File file = new File(filename);
InputStream inputStream = new BufferedInputStream(new FileInputStream(file));
BSONDecoder decoder = new BasicBSONDecoder();
int count = 0;
try {
while (inputStream.available() > 0) {
BSONObject obj = decoder.readObject(inputStream);
if(obj == null){
break;
}
System.out.println(obj);
count++;
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} finally {
try {
inputStream.close();
} catch (IOException e) {
}
}
System.err.println(String.format("%s objects read", count));
}
public static void main(String args[]) throws Exception {
if (args.length < 1) {
//TODO usage
throw new IllegalArgumentException("Expected <bson filename> argument");
}
String filename = args[0];
BsonDump bsonDump = new BsonDump();
bsonDump.bsonDump(filename);
}
}
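If only the overall structure matters (how many documents there are and how large each one is), a full decode may be unnecessary. In the BSON format, every top-level document starts with a little-endian int32 holding its total size in bytes, including those four bytes. The sketch below uses only the JDK and jumps from one length prefix to the next; `BsonSizes` and `documentSizes` are hypothetical names, and it assumes the file is a plain concatenation of documents, as in a dump file.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class BsonSizes {

    // Returns the size of each top-level document without decoding its fields.
    static List<Integer> documentSizes(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        List<Integer> sizes = new ArrayList<>();
        while (true) {
            int b0 = data.read();
            if (b0 < 0) {
                break; // clean end of input
            }
            // Remaining three bytes of the little-endian int32 length prefix.
            int b1 = data.readUnsignedByte();
            int b2 = data.readUnsignedByte();
            int b3 = data.readUnsignedByte();
            int size = b0 | (b1 << 8) | (b2 << 16) | (b3 << 24);
            if (size < 5) {
                // The smallest valid document is 4 length bytes plus a terminating 0x00.
                throw new IOException("Invalid BSON document size: " + size);
            }
            // Skip the body; the size includes the 4 prefix bytes already consumed.
            int remaining = size - 4;
            while (remaining > 0) {
                int skipped = data.skipBytes(remaining);
                if (skipped <= 0) {
                    throw new EOFException("Truncated BSON document");
                }
                remaining -= skipped;
            }
            sizes.add(size);
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("Expected <bson filename> argument");
            return;
        }
        try (InputStream in = new BufferedInputStream(Files.newInputStream(Paths.get(args[0])))) {
            System.out.println(documentSizes(in).size() + " documents");
        }
    }
}
```

Because the bodies are skipped rather than parsed, this only reports counts and sizes; field names and types still require a decoder such as the one shown above.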

How do i periodically flush the rows in excel using jxl while do the close at the last

Below is my code.
I first create the headers in the init method.
Then in pushDataRows I fill the rows.
The problem is that once I call write and flush in the init method, nothing else appears in my Excel sheet.
The number of rows I write to Excel could be huge. The idea of using flush is to free memory periodically.
Please tell me what mistake I am making here. How do I achieve what I intend?
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import jxl.Workbook;
import jxl.WorkbookSettings;
import jxl.write.Label;
import jxl.write.WritableSheet;
import jxl.write.WritableWorkbook;
import jxl.write.WriteException;
import jxl.write.biff.RowsExceededException;
public class ExportTranscriptDetailsToExcel implements IExportToFormat {
private static final ILogger logger = ILabsPlatform.getLogger(ExportTranscriptDetailsToExcel.class.getName());
OutputStream outputStream;
List<String> labels;
WritableWorkbook workbook;
WritableSheet sheet0;
public ExportTranscriptDetailsToExcel() {
this.outputStream = null;
workbook = null;
sheet0 = null;
}
@Override
public void init(String sheetName, List<String> labels, OutputStream outputStream) throws IOException,
RowsExceededException, WriteException {
this.outputStream = outputStream;
this.labels = labels;
WorkbookSettings workbookSettings = new WorkbookSettings();
workbookSettings.setEncoding("Cp1252");
workbook = Workbook.createWorkbook(outputStream, workbookSettings);
sheet0 = workbook.createSheet(sheetName, 0);
for (int i = 0; i < labels.size(); ++i) {
Label label = new Label(i, 0, labels.get(i));
sheet0.addCell(label);
}
workbook.write();
outputStream.flush();
}
@Override
public void pushDataRows(Object listOfResultRow) {
if ((listOfResultRow == null)) {
return;
}
String fieldName = null;
String fieldValue = null;
@SuppressWarnings("unchecked")
ArrayList<Map<String, Object>> interactionMap = (ArrayList<Map<String, Object>>) listOfResultRow;
try {
int i = 1;// the data rows starts from row1
for (Map<String, Object> element : interactionMap) {
for (int j = 0; j < labels.size(); j++) {
fieldName = labels.get(j);
Object ob = element.get(fieldName);
if (ob != null) {
fieldValue = ob.toString();
}
if (fieldValue == null) {
fieldValue = "-";
}
System.out.println("***********************fieldName:" + fieldName);
System.out.println("***********************fieldValue:" + fieldValue);
Label label1 = new Label(j, i, fieldValue);
fieldValue = null;
sheet0.addCell(label1);
}
i++;
}
} catch (Exception e) {
System.out.println(e.getMessage());
}
try {
workbook.write();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
try {
outputStream.flush();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
@Override
public void done() {
try {
workbook.close();
} catch (WriteException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}

As suggested here
To get around the memory problem, you can signal jxl to use temporary files when writing. This will write data to a temporary file during execution rather than storing it in memory.
You need to adjust your WorkbookSettings:
workbookSettings.setUseTemporaryFileDuringWrite(true);
workbookSettings.setTemporaryFileDuringWriteDirectory(new File("your_temporary_directory"));
Replace your_temporary_directory above with a temporary directory of your choice.
Also note that this feature is available in jxl version >= 2.6.9.

I think I have found the problem with my code. This came out of trial and error, so I would welcome confirmation that I am correct.
In the init method, if I only call outputStream.flush() then I don't see a problem. I think calling workbook.write() effectively closes the stream for any further writing.
So basically call outputStream.flush() every time you want to flush out of memory.
Call workbook.write() and workbook.close() only at the very end.

Convert pdf to byte[] and vice versa with pdfbox

I've read the documentation and the examples but I'm having a hard time putting it all together. I'm trying to take a test PDF file, convert it to a byte array, convert the byte array back into a PDF, and then write that PDF to disk.
It probably doesn't help much, but this is what I've got so far:
package javaapplication1;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.pdfbox.cos.COSStream;
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
public class JavaApplication1 {
private COSStream stream;
public static void main(String[] args) {
try {
PDDocument in = PDDocument.load("C:\\Users\\Me\\Desktop\\JavaApplication1\\in\\Test.pdf");
byte[] pdfbytes = toByteArray(in);
PDDocument out;
} catch (Exception e) {
System.out.println(e);
}
}
private static byte[] toByteArray(PDDocument pdDoc) throws IOException, COSVisitorException {
ByteArrayOutputStream out = new ByteArrayOutputStream();
try {
pdDoc.save(out);
pdDoc.close();
} catch (Exception ex) {
System.out.println(ex);
}
return out.toByteArray();
}
public void PDStream(PDDocument document) {
stream = new COSStream(document.getDocument().getScratchFile());
}
}

You can use Apache Commons IO, which is essential in any Java project IMO.
Then you can use FileUtils's readFileToByteArray(File file) and writeByteArrayToFile(File file, byte[] data).
(here is commons-io, which is where FileUtils is: http://commons.apache.org/proper/commons-io/download_io.cgi )
For example, I just tried this here and it worked beautifully.
try {
File file = new File("/example/path/contract.pdf");
byte[] array = FileUtils.readFileToByteArray(file);
FileUtils.writeByteArrayToFile(new File("/example/path/contract2.pdf"), array);
} catch (IOException e) {
e.printStackTrace();
}
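If adding a dependency is not an option, the same round trip can be done with `java.nio.file.Files` from the JDK (Java 7 or later). This is a minimal sketch: `PdfBytesRoundTrip`, `copyViaBytes` and the temporary files are placeholders invented here, and the bytes are copied as-is without parsing the PDF.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class PdfBytesRoundTrip {

    // Reads the whole file into memory, then writes the same bytes to the target.
    static byte[] copyViaBytes(Path source, Path target) throws IOException {
        byte[] bytes = Files.readAllBytes(source);
        Files.write(target, bytes);
        return bytes;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder input; in practice this would be an existing PDF file.
        Path source = Files.createTempFile("in", ".pdf");
        Files.write(source, "%PDF-1.4 placeholder".getBytes(StandardCharsets.US_ASCII));
        Path target = Files.createTempFile("out", ".pdf");
        byte[] bytes = copyViaBytes(source, target);
        System.out.println(bytes.length + " bytes copied to " + target);
    }
}
```

Both approaches load the entire file into memory, which is fine for typical documents but worth keeping in mind for very large ones.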
