List pdf Attachments using PDFBox (Java)

I'd like to get the filenames of all attachments (embedded files) in a PDF document. I've been searching for a long time, but my code still doesn't work.
What I tried:
File input = new File(inputfile); // Input File Path, Given as param from args[]
pd = PDDocument.load(input);
PDDocumentNameDictionary names = new PDDocumentNameDictionary(pd.getDocumentCatalog());
PDEmbeddedFilesNameTreeNode efTree = names.getEmbeddedFiles();
Map<String, COSObjectable> existedNames = efTree.getNames();
System.out.println(existedNames);//Print Embedded-Filenames to console
pd.close();
I'm not sure whether the contents of a Map can even be printed to the console this way. I'm coding in Eclipse, which doesn't report any errors, but when I run the JAR file I always get: NullPointerException at org.apache.pdfbox.pdmodel.PDDocument.getDocumentCatalog(PDDocument.java:778)
Any ideas or help? Many thanks.
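On the side question of printing a Map: passing a Map to System.out.println is legal Java and prints each key and value through toString(). A minimal sketch using only the JDK, with placeholder keys standing in for the embedded-file names (the class name PrintMapExample and the helper describeKeys are made up for illustration):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PrintMapExample {
    // Joins the keys of a map into one comma-separated line.
    static String describeKeys(Map<String, ?> map) {
        return String.join(", ", map.keySet());
    }

    public static void main(String[] args) {
        // Placeholder data standing in for the embedded-file name map.
        Map<String, Object> names = new LinkedHashMap<>();
        names.put("invoice.xml", new Object());
        names.put("notes.txt", new Object());

        // Printing the map directly calls toString() on every key and value.
        System.out.println(names);

        // Iterating the entries gives control over what is printed.
        for (Map.Entry<String, Object> entry : names.entrySet()) {
            System.out.println("Embedded file: " + entry.getKey());
        }
        System.out.println(describeKeys(names));
    }
}
```

So the println call itself is unlikely to be the source of the NullPointerException; the stack trace points at getDocumentCatalog(), which runs earlier.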

Finally found a solution. For anyone with the same problem, the following code worked for me:
PDDocument pd;
File input = new File(inputfile); // Input File
pd = PDDocument.load(input);
//Writes all embedded Filenames (from pdf document) into Logfile
try{
PDDocumentCatalog catalog = pd.getDocumentCatalog();
PDDocumentNameDictionary names = catalog.getNames();
PDEmbeddedFilesNameTreeNode embeddedFiles = names.getEmbeddedFiles();
Map<String, COSObjectable> embeddedFileNames = embeddedFiles.getNames();
//For-Each Loop is used to list all embedded files (if there is more than one)
for (Map.Entry<String, COSObjectable> entry : embeddedFileNames.entrySet())
{
//You might need to configure the logger first
logger.info("Inputfile: " + inputfile + ", found embedded file: " + entry.getKey());
}
}
catch (Exception e){
System.out.println("Document has no attachments. ");
}

Here's the ExtractEmbeddedFiles example from the source code download:
public final class ExtractEmbeddedFiles
{
private ExtractEmbeddedFiles()
{
}
/**
* This is the main method.
*
* @param args The command line arguments.
*
* @throws IOException If there is an error parsing the document.
*/
public static void main( String[] args ) throws IOException
{
if( args.length != 1 )
{
usage();
System.exit(1);
}
else
{
PDDocument document = null;
try
{
File pdfFile = new File(args[0]);
String filePath = pdfFile.getParent() + System.getProperty("file.separator");
document = PDDocument.load(pdfFile );
PDDocumentNameDictionary namesDictionary =
new PDDocumentNameDictionary( document.getDocumentCatalog() );
PDEmbeddedFilesNameTreeNode efTree = namesDictionary.getEmbeddedFiles();
if (efTree != null)
{
Map<String, PDComplexFileSpecification> names = efTree.getNames();
if (names != null)
{
extractFiles(names, filePath);
}
else
{
List<PDNameTreeNode<PDComplexFileSpecification>> kids = efTree.getKids();
for (PDNameTreeNode<PDComplexFileSpecification> node : kids)
{
names = node.getNames();
extractFiles(names, filePath);
}
}
}
// extract files from annotations
for (PDPage page : document.getPages())
{
for (PDAnnotation annotation : page.getAnnotations())
{
if (annotation instanceof PDAnnotationFileAttachment)
{
PDAnnotationFileAttachment annotationFileAttachment = (PDAnnotationFileAttachment) annotation;
PDComplexFileSpecification fileSpec = (PDComplexFileSpecification) annotationFileAttachment.getFile();
PDEmbeddedFile embeddedFile = getEmbeddedFile(fileSpec);
extractFile(filePath, fileSpec.getFilename(), embeddedFile);
}
}
}
}
finally
{
if( document != null )
{
document.close();
}
}
}
}
private static void extractFiles(Map<String, PDComplexFileSpecification> names, String filePath)
throws IOException
{
for (Entry<String, PDComplexFileSpecification> entry : names.entrySet())
{
String filename = entry.getKey();
PDComplexFileSpecification fileSpec = entry.getValue();
PDEmbeddedFile embeddedFile = getEmbeddedFile(fileSpec);
extractFile(filePath, filename, embeddedFile);
}
}
private static void extractFile(String filePath, String filename, PDEmbeddedFile embeddedFile)
throws IOException
{
String embeddedFilename = filePath + filename;
File file = new File(filePath + filename);
System.out.println("Writing " + embeddedFilename);
FileOutputStream fos = null;
try
{
fos = new FileOutputStream(file);
fos.write(embeddedFile.toByteArray());
}
finally
{
IOUtils.closeQuietly(fos);
}
}
private static PDEmbeddedFile getEmbeddedFile(PDComplexFileSpecification fileSpec )
{
// search for the first available alternative of the embedded file
PDEmbeddedFile embeddedFile = null;
if (fileSpec != null)
{
embeddedFile = fileSpec.getEmbeddedFileUnicode();
if (embeddedFile == null)
{
embeddedFile = fileSpec.getEmbeddedFileDos();
}
if (embeddedFile == null)
{
embeddedFile = fileSpec.getEmbeddedFileMac();
}
if (embeddedFile == null)
{
embeddedFile = fileSpec.getEmbeddedFileUnix();
}
if (embeddedFile == null)
{
embeddedFile = fileSpec.getEmbeddedFile();
}
}
return embeddedFile;
}
/**
* This will print the usage for this program.
*/
private static void usage()
{
System.err.println( "Usage: java " + ExtractEmbeddedFiles.class.getName() + " <input-pdf>" );
}
}
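The example above reads names from the root node and, failing that, from one level of kids. A PDF name tree can be nested more deeply, so a recursive walk may be needed for some documents. The sketch below shows only the traversal idea with a hypothetical NameNode class; it is not the PDFBox API, and with PDFBox the same recursion would go through getNames() and getKids() on the real node type.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NameTreeWalk {
    // Hypothetical stand-in for a name-tree node: names and kids may each be null.
    static final class NameNode {
        final Map<String, String> names;
        final List<NameNode> kids;

        NameNode(Map<String, String> names, List<NameNode> kids) {
            this.names = names;
            this.kids = kids;
        }
    }

    // Collects every name in the tree, visiting kids at any depth.
    static List<String> collectNames(NameNode node) {
        List<String> result = new ArrayList<>();
        if (node == null) {
            return result;
        }
        if (node.names != null) {
            result.addAll(node.names.keySet());
        }
        if (node.kids != null) {
            for (NameNode kid : node.kids) {
                result.addAll(collectNames(kid));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> leafNames = new LinkedHashMap<>();
        leafNames.put("a.txt", "spec-a");
        leafNames.put("b.txt", "spec-b");
        NameNode leaf = new NameNode(leafNames, null);
        // Two levels of intermediate nodes above the leaf.
        NameNode middle = new NameNode(null, Collections.singletonList(leaf));
        NameNode root = new NameNode(null, Collections.singletonList(middle));
        System.out.println(collectNames(root));
    }
}
```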

Related

Apache POI docx - target docx file is corrupt after modification

I wrote some code using Apache POI 3.6.
The code inserts paragraphs, tables and images into one docx file. Because the paragraphs, tables and images come from different docx files, I need to read the content from each of those files and then insert it into the target file.
But I found that the target file is only readable after the first content insert. Is there anything that needs to be changed?
My method is defined as follows:
public class DocDomGroupUtilImpl implements DocDomGroupUtil {
private FileInputStream fips;
private XWPFDocument document;
private FileOutputStream fops;
@Override
public void generateDomGroupFile(File templateFile, List<Item> items) {
// initial parameters
Map<String, String> parameters = new HashMap<String, String>();
for (Item item : items) {
parameters.put(item.getParaName(), item.getParaValue());
}
// get domGroup type
String templateFileName = templateFile.getName();
String type = templateFileName.substring(0, templateFileName.indexOf("."));
try {
fips = new FileInputStream(templateFile);
document = new XWPFDocument(fips);
// create tempt file for document domGroup, named like
// docDomGroup_<type>_<domName>.docx
String domGroupFilePath = CONSTAINTS.temptDomGroupPath + "docDomGroup_" + type + "_"
+ parameters.get("$DomGroupName_Value") + ".docx";
File domGroupFile = new File(domGroupFilePath);
if (domGroupFile.exists()) {
domGroupFile.delete();
}
domGroupFile.createNewFile();
fops = new FileOutputStream(domGroupFile, true);
// modified the groupName
// replace content
String regularExpression = "\\$(.*)_Value";
List<XWPFParagraph> paragraphs = document.getParagraphs();
for (XWPFParagraph paragraph : paragraphs) {
List<XWPFRun> runs = paragraph.getRuns();
if (runs != null) {
for (XWPFRun run : runs) {
String text = run.getText(0);
if (text != null && Pattern.matches(regularExpression, text)) {
text = parameters.get(text);
run.setText(text, 0);
}
}
}
}
document.write(fops);
close();
// copy all the information from dom related files
File dir = new File(CONSTAINTS.temptDomPath);
for (File file : dir.listFiles()) {
if (!file.isDirectory()) {
fops = new FileOutputStream(domGroupFile, true);
fips = new FileInputStream(file);
document = new XWPFDocument(fips);
document.write(fops);
}
close();
}
// clean up tempt dom folder removeAllDomFile();
removeAllDomFile();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
close();
}
}
/**
* remove all the generated tempt dom files
*/
private void removeAllDomFile() {
// loop directory and delete tempt file
File dir = new File(CONSTAINTS.temptDomPath);
for (File file : dir.listFiles())
if (!file.isDirectory()) {
file.delete();
}
}
private void close() {
try {
fips.close();
fops.flush();
fops.close();
document.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}

How to build a zip file with a size of 400 GB in java

I need to download all the documents from an Alfresco site that contains 400 GB of documents.
The code below works for creating a small zip file (about 1 GB); beyond that it uses too much memory.
I don't want to keep the whole ZipOutputStream in memory. I would like to use memory only for each document as it is copied to the zip file, or use a temporary file that is overwritten for each document.
What is the best practice for this kind of problem?
This piece of code is called from my main:
FolderImpl sitoFolder = (FolderImpl) cmisObject;
List<Tree<FileableCmisObject>> sitoFolderDescendants = sitoFolder.getDescendants(-1);
byte[] zipFile = createZipFILE(sitoFolderDescendants);
String rootPath = cartella_download_file;
File dir = new File(rootPath + File.separator);
if (!dir.exists()) {
dir.mkdirs();
}
Date date = new Date();
SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
String stringDate = sdf.format(date);
String nameZipFile = sitoFolder.getName().replaceAll("\\s","");
File serverFile = new File(dir.getAbsolutePath() + File.separator + stringDate+"_"+nameZipFile+".zip");
BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(new FileOutputStream(serverFile));
IOUtils.write(zipFile, bufferedOutputStream);
bufferedOutputStream.close();
//Returns the zip file
private byte[] createZipFILE(List<Tree<FileableCmisObject>> list) throws IOException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ByteTransform byteTransform = new ByteTransform();
try {
ReportDocument reportDocument = new ReportDocument();
ZipOutputStream zos = new ZipOutputStream(baos);
for (Tree<FileableCmisObject> aList : list) {
traverseTree(aList, zos, reportDocument);
}
zos.close();
return baos.toByteArray();
} catch (IOException exc) {
reportLog.error(exc.getMessage());
} finally {
baos.close();
}
return new byte[0];
}
private void traverseTree(Tree<FileableCmisObject> tree, ZipOutputStream zos, ReportDocument reportDocument) {
for (int i=0; i<tree.getChildren().size(); i++) {
Tree<FileableCmisObject> child = tree.getChildren().get(i);
if (CmisUtil.isDocument(child.getItem())) {
Document document = (Document) child.getItem();
try {
addToZipFile(document, zos);
} catch (IOException ioExc) {
appLog.error(ioExc.getMessage());
}
} else if(CmisUtil.isFolder(child.getItem())) {
Folder folder = (Folder) child.getItem();
if (folder.getChildren().getTotalNumItems() == 0) {
try {
addToZipFolder(folder, zos);
} catch (IOException ioExc) {
appLog.error(ioExc.getMessage());
}
}
}
traverseTree(child, zos, reportDocument);
}
}
//Service method to add documents to the zip file
private void addToZipFile(Document document, ZipOutputStream zos) throws IOException {
InputStream inputStream = document.getContentStream().getStream();
String path = document.getPaths().get(0).replace(sito_export_path, "");
ZipEntry zipEntry = new ZipEntry(path);
zos.putNextEntry(zipEntry);
IOUtils.copy(inputStream, zos, 1024);
inputStream.close();
zos.closeEntry();
}
//Service method to add empty folder to the zip file
private void addToZipFolder(Folder folder, ZipOutputStream zos) throws IOException {
String path = folder.getPaths().get(0).replace(sito_export_path, "");
ZipEntry zipEntry = new ZipEntry(path.concat("/"));
zos.putNextEntry(zipEntry);
}

I solved it. I first created a directory on the server and then created the zip file directly in that directory.
The mistake was writing all the files to a ByteArrayOutputStream first and only then to the zip file.
File serverFile = new File(dir.getAbsolutePath() + File.separator + stringDate+"_"+nameZipFile+".zip");
FileOutputStream fileOutputStream = new FileOutputStream(serverFile);
ZipArchiveOutputStream zos = new ZipArchiveOutputStream(fileOutputStream);
for (Tree<FileableCmisObject> aList : sitoFolderDescendants) {
traverseTree(aList, zos, reportDocument);
}
zos.close();
In the finally block I close the FileOutputStream.
Then I changed the service methods to use ZipArchiveOutputStream and ZipArchiveEntry.
private void addToZipFolder(Folder folder, ZipArchiveOutputStream zos) throws IOException {
String path = folder.getPaths().get(0).replace(sito_export_path, "");
ZipArchiveEntry zipEntry = new ZipArchiveEntry(path.concat("/"));
appLog.info("adding empty folder " + folder.getName() + " to the zip file");
zos.putArchiveEntry(zipEntry);
zos.closeArchiveEntry();
}
private void addToZipFile(Document document, ZipArchiveOutputStream zos) throws IOException {
InputStream inputStream = document.getContentStream().getStream();
String path = document.getPaths().get(0).replace(sito_export_path, "");
ZipArchiveEntry entry = new ZipArchiveEntry(path);
entry.setSize(document.getContentStreamLength());
zos.putArchiveEntry(entry);
byte buffer[] = new byte[1024];
while (true) {
int nRead = inputStream.read(buffer, 0, buffer.length);
if (nRead <= 0) {
break;
}
zos.write(buffer, 0, nRead);
}
inputStream.close();
zos.closeArchiveEntry();
}
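The same idea can be sketched with the JDK alone. A Java array holds at most about 2^31 - 1 elements, so a ByteArrayOutputStream cannot hold 400 GB no matter how large the heap is; writing the zip straight to a file keeps only one small buffer in memory. The sketch below swaps in java.util.zip.ZipOutputStream for the Commons Compress classes used above and reads from local files instead of CMIS documents. The names StreamingZip and zipFiles are made up for illustration.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class StreamingZip {
    // Writes each source file into the zip at target, holding only one small buffer in memory.
    static void zipFiles(List<Path> sources, Path target) throws IOException {
        try (OutputStream fileOut = new BufferedOutputStream(Files.newOutputStream(target));
             ZipOutputStream zos = new ZipOutputStream(fileOut)) {
            byte[] buffer = new byte[8192];
            for (Path source : sources) {
                zos.putNextEntry(new ZipEntry(source.getFileName().toString()));
                try (InputStream in = Files.newInputStream(source)) {
                    int read;
                    // read() returns -1 at end of stream.
                    while ((read = in.read(buffer)) != -1) {
                        zos.write(buffer, 0, read);
                    }
                }
                zos.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo input: two small files in a temporary directory.
        Path dir = Files.createTempDirectory("zip-demo");
        Path a = Files.write(dir.resolve("a.txt"), "hello".getBytes(StandardCharsets.UTF_8));
        Path b = Files.write(dir.resolve("b.txt"), "world".getBytes(StandardCharsets.UTF_8));
        Path zip = dir.resolve("out.zip");
        zipFiles(Arrays.asList(a, b), zip);
        System.out.println("Wrote " + zip);
    }
}
```

The try-with-resources blocks close the streams even if copying one entry fails, which replaces the manual close calls in a finally block.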

I created a download-as-zip feature for Alfresco 3.4.d using the following code. I have not tested it with gigabyte-sized files because I don't have that much data, but it may help you.
This is a Java-backed WebScript.
/*
* This class creates a zip file based on the given node parameter.
* */
public class ZipContents extends AbstractWebScript {
private static Log logger = LogFactory.getLog(ZipContents.class);
private static final int BUFFER_SIZE = 1024;
private static final String MIMETYPE_ZIP = "application/zip";
private static final String TEMP_FILE_PREFIX = "alf";
private static final String ZIP_EXTENSION = ".zip";
private ContentService contentService;
private NodeService nodeService;
private NamespaceService namespaceService;
private DictionaryService dictionaryService;
private StoreRef storeRef;
private String encoding;
public void setNodeService(NodeService nodeService) {
this.nodeService = nodeService;
}
public void setContentService(ContentService contentService) {
this.contentService = contentService;
}
public void setNamespaceService(NamespaceService namespaceService) {
this.namespaceService = namespaceService;
}
public void setDictionaryService(DictionaryService dictionaryService) {
this.dictionaryService = dictionaryService;
}
public void setStoreUrl(String url) {
this.storeRef = new StoreRef(url);
}
public void setEncoding(String encoding) {
this.encoding = encoding;
}
public void execute(WebScriptRequest req, WebScriptResponse res) throws IOException {
String nodes = req.getParameter("nodes");
if (nodes == null || nodes.length() == 0) {
throw new WebScriptException(HttpServletResponse.SC_BAD_REQUEST, "nodes");
}
List<String> nodeIds = new ArrayList<String>();
StringTokenizer tokenizer = new StringTokenizer(nodes, ",");
if (tokenizer.hasMoreTokens()) {
while (tokenizer.hasMoreTokens()) {
nodeIds.add(tokenizer.nextToken());
}
}
String filename = req.getParameter("filename");
if (filename == null || filename.length() == 0) {
throw new WebScriptException(HttpServletResponse.SC_BAD_REQUEST, "filename");
}
String noaccentStr = req.getParameter("noaccent");
if (noaccentStr == null || noaccentStr.length() == 0) {
throw new WebScriptException(HttpServletResponse.SC_BAD_REQUEST, "noaccent");
}
try {
res.setContentType(MIMETYPE_ZIP);
res.setHeader("Content-Transfer-Encoding", "binary");
res.addHeader("Content-Disposition", "attachment;filename=\"" + unAccent(filename) + ZIP_EXTENSION + "\"");
res.setHeader("Cache-Control", "must-revalidate, post-check=0, pre-check=0");
res.setHeader("Pragma", "public");
res.setHeader("Expires", "0");
createZipFile(nodeIds, res.getOutputStream(), new Boolean(noaccentStr));
} catch (RuntimeException e) {
throw new WebScriptException(HttpServletResponse.SC_BAD_REQUEST, e.getMessage());
}
}
public void createZipFile(List<String> nodeIds, OutputStream os, boolean noaccent) throws IOException {
File zip = null;
try {
if (nodeIds != null && !nodeIds.isEmpty()) {
zip = TempFileProvider.createTempFile(TEMP_FILE_PREFIX, ZIP_EXTENSION);
FileOutputStream stream = new FileOutputStream(zip);
CheckedOutputStream checksum = new CheckedOutputStream(stream, new Adler32());
BufferedOutputStream buff = new BufferedOutputStream(checksum);
ZipArchiveOutputStream out = new ZipArchiveOutputStream(buff);
out.setEncoding(encoding);
out.setMethod(ZipArchiveOutputStream.DEFLATED);
out.setLevel(Deflater.BEST_COMPRESSION);
if (logger.isDebugEnabled()) {
logger.debug("Using encoding '" + encoding + "' for zip file.");
}
try {
for (String nodeId : nodeIds) {
NodeRef node = new NodeRef(storeRef, nodeId);
addToZip(node, out, noaccent, "");
}
} catch (Exception e) {
logger.error(e.getMessage(), e);
throw new WebScriptException(HttpServletResponse.SC_BAD_REQUEST, e.getMessage());
} finally {
out.close();
buff.close();
checksum.close();
stream.close();
if (nodeIds.size() > 0) {
InputStream in = new FileInputStream(zip);
try {
byte[] buffer = new byte[BUFFER_SIZE];
int len;
while ((len = in.read(buffer)) > 0) {
os.write(buffer, 0, len);
}
} finally {
IOUtils.closeQuietly(in);
}
}
}
}
} catch (Exception e) {
logger.error(e.getMessage(), e);
throw new WebScriptException(HttpServletResponse.SC_BAD_REQUEST, e.getMessage());
} finally {
// try and delete the temporary file
if (zip != null) {
zip.delete();
}
}
}
public void addToZip(NodeRef node, ZipArchiveOutputStream out, boolean noaccent, String path) throws IOException {
QName nodeQnameType = this.nodeService.getType(node);
// Special case : links
if (this.dictionaryService.isSubClass(nodeQnameType, ApplicationModel.TYPE_FILELINK)) {
NodeRef linkDestinationNode = (NodeRef) nodeService.getProperty(node, ContentModel.PROP_LINK_DESTINATION);
if (linkDestinationNode == null) {
return;
}
// Duplicate entry: check if link is not in the same space of the
// link destination
if (nodeService.getPrimaryParent(node).getParentRef().equals(nodeService.getPrimaryParent(linkDestinationNode).getParentRef())) {
return;
}
nodeQnameType = this.nodeService.getType(linkDestinationNode);
node = linkDestinationNode;
}
String nodeName = (String) nodeService.getProperty(node, ContentModel.PROP_NAME);
nodeName = noaccent ? unAccent(nodeName) : nodeName;
if (this.dictionaryService.isSubClass(nodeQnameType, ContentModel.TYPE_CONTENT)) {
ContentReader reader = contentService.getReader(node, ContentModel.PROP_CONTENT);
if (reader != null) {
InputStream is = reader.getContentInputStream();
String filename = path.isEmpty() ? nodeName : path + '/' + nodeName;
ZipArchiveEntry entry = new ZipArchiveEntry(filename);
entry.setTime(((Date) nodeService.getProperty(node, ContentModel.PROP_MODIFIED)).getTime());
entry.setSize(reader.getSize());
out.putArchiveEntry(entry);
byte buffer[] = new byte[BUFFER_SIZE];
while (true) {
int nRead = is.read(buffer, 0, buffer.length);
if (nRead <= 0) {
break;
}
out.write(buffer, 0, nRead);
}
is.close();
out.closeArchiveEntry();
} else {
logger.warn("Could not read content of: " + nodeName);
}
} else if (this.dictionaryService.isSubClass(nodeQnameType, ContentModel.TYPE_FOLDER)
&& !this.dictionaryService.isSubClass(nodeQnameType, ContentModel.TYPE_SYSTEM_FOLDER)) {
List<ChildAssociationRef> children = nodeService.getChildAssocs(node);
if (children.isEmpty()) {
String folderPath = path.isEmpty() ? nodeName + '/' : path + '/' + nodeName + '/';
ZipArchiveEntry entry = new ZipArchiveEntry(folderPath);
entry.setSize(0);
entry.setTime(((Date) nodeService.getProperty(node, ContentModel.PROP_MODIFIED)).getTime());
out.putArchiveEntry(entry);
out.closeArchiveEntry();
} else {
for (ChildAssociationRef childAssoc : children) {
NodeRef childNodeRef = childAssoc.getChildRef();
addToZip(childNodeRef, out, noaccent, path.isEmpty() ? nodeName : path + '/' + nodeName);
}
}
} else {
logger.info("Unmanaged type: " + nodeQnameType.getPrefixedQName(this.namespaceService) + ", filename: " + nodeName);
}
}
/**
* ZipEntry() does not convert filenames from Unicode to platform (waiting
* Java 7) http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4244499
*
* @param s
* @return
*/
public static String unAccent(String s) {
String temp = Normalizer.normalize(s, Normalizer.NFD, 0);
return temp.replaceAll("[^\\p{ASCII}]", "");
}
}
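The unAccent method above calls a three-argument Normalizer.normalize, so it presumably relies on a Normalizer class other than java.text.Normalizer. Assuming the JDK class is acceptable, the same accent stripping can be sketched as follows (the class name UnAccent is made up for illustration):

```java
import java.text.Normalizer;

public class UnAccent {
    // Decomposes accented characters (NFD), then drops everything outside ASCII.
    static String unAccent(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // \u00e9 is "e" with an acute accent.
        System.out.println(unAccent("Relev\u00e9 d'\u00e9t\u00e9.zip"));
    }
}
```

Characters that have no canonical decomposition into an ASCII base letter, for example "ø", are removed entirely rather than replaced.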

TessAPI "Error during processing." in Spring MVC and Tomcat 7

I am using Tesseract OCR in Spring MVC to convert some images into searchable PDF files.
I used TessAPI with a TessResultRenderer, which works perfectly fine in a simple Java project. But it does not work in a Spring MVC project deployed on Tomcat 7. I get the error at this line:
int result = api.TessBaseAPIProcessPages1(handle, imagepath, null, 0,
renderer);
I am getting the following error:
Error during processing.
Below is my code:
public class ImageToPDF {
private static String datapath = "F:/Projects Dev/Pfe Projects/GedWeb/.";
private static String language = "fra";
private static TessAPI api;
private static TessBaseAPI handle;
public ImageToPDF() {
super();
api = new TessDllAPIImpl().getInstance();
handle = api.TessBaseAPICreate();
}
public void finalize() throws Throwable {
super.finalize();
}
public void convert(String imagepath) {
File file = new File(imagepath);
String nomfile = FilenameUtils.removeExtension(file.getName());
String parentfolder = file.getParent();
String output = parentfolder + File.separator + nomfile + ".txt";
int set_only_init_params = FALSE;
int oem = TessOcrEngineMode.OEM_DEFAULT;
PointerByReference configs = null;
int configs_size = 0;
String[] params = { "load_system_dawg", "tessedit_char_whitelist" };
String vals[] = { "Relevé", "" }; // 0123456789-.IThisalotfpnex
PointerByReference vars_vec = new PointerByReference();
vars_vec.setPointer(new StringArray(params));
PointerByReference vars_values = new PointerByReference();
vars_values.setPointer(new StringArray(vals));
NativeSize vars_vec_size = new NativeSize(params.length);
api.TessBaseAPISetOutputName(handle, output);
int rc = api.TessBaseAPIInit4(handle, datapath, language, oem, configs,
configs_size, vars_vec, vars_values, vars_vec_size,
set_only_init_params);
System.out.println(rc);
if (rc != 0) {
api.TessBaseAPIDelete(handle);
System.err.println("Could not initialize tesseract.");
return;
}
TessResultRenderer renderer = api.TessHOcrRendererCreate();
api.TessResultRendererInsert(renderer, api.TessBoxTextRendererCreate());
api.TessResultRendererInsert(renderer, api.TessTextRendererCreate());
String dataPath = api.TessBaseAPIGetDatapath(handle);
api.TessResultRendererInsert(renderer,
api.TessPDFRendererCreate(dataPath));
api.TessResultRendererBeginDocument(renderer, imagepath);
int result = api.TessBaseAPIProcessPages1(handle, imagepath, null, 0,
renderer);
api.TessResultRendererEndDocument(renderer);
System.out.println(handle.toString());
System.out.println(renderer.toString());
if (result == FALSE) {
System.err.println("Error during processing.");
return;
}
File file1 = new File(imagepath);
String parent = file1.getParent();
String nomfile1 = FilenameUtils.removeExtension(file1.getName());
String outputbase = parent + File.separator + nomfile1;
for (; renderer != null; renderer = api
.TessResultRendererNext(renderer)) {
String ext = api.TessResultRendererExtention(renderer).getString(0);
System.out
.println(String
.format("TessResultRendererExtention: %s\nTessResultRendererTitle: %s\nTessResultRendererImageNum: %d",
ext, api.TessResultRendererTitle(renderer)
.getString(0),
api.TessResultRendererImageNum(renderer)));
PointerByReference data = new PointerByReference();
IntByReference dataLength = new IntByReference();
result = api
.TessResultRendererGetOutput(renderer, data, dataLength);
if (result == TRUE) {
if (ext.equals("pdf")) {
int length = dataLength.getValue();
byte[] bytes = data.getValue().getByteArray(0, length);
try {
File file3 = new File(outputbase + "." + ext);
// create parent dirs when necessary
if (file3.getParentFile() != null) {
file3.getParentFile().mkdirs();
}
FileOutputStream bw = new FileOutputStream(
file3.getAbsoluteFile());
bw.write(bytes);
bw.close();
} catch (IOException e) {
e.printStackTrace();
}
} else {
String result1 = data.getValue().getString(0);
}
}
}
api.TessDeleteResultRenderer(renderer);
api.TessBaseAPIDelete(handle);
}
}
Any suggestions?

PRJ file not generated for null CoordinateReferenceSystem

I am trying to create a shapefile from a text file, using the GeoTools library.
It creates the shapefile, but I get a warning that the coordinate reference system is null, even though I am setting it!
The data are in UTM zone 30N (WGS84). First I tried the default coordinate system (WGS84), then I decoded the CRS from its EPSG code. It returns null.
public static void main(String[] args) throws Exception {
File[] files = getFiles("C://ArtCSVFiles//");
for (int i=0;i<files.length;i++) {
String outputFilePath = "C://Art//" +files[i].getName()+".shp";
//String inputFilePath = "C://ParkTxtFiles//ParkCluster0Mp10Dist0.005.csv";
String inputFilePath = files[i].getAbsolutePath();
final SimpleFeatureType TYPE = DataUtilities.createType("Location", "location:Point"); // see createFeatureType();
FeatureCollection<SimpleFeatureType, SimpleFeature> collection = FeatureCollections.newCollection();
BufferedReader reader = new BufferedReader(new FileReader(files[i]));
try {
GeometryFactory factory = JTSFactoryFinder.getGeometryFactory(null);
for (String line = reader.readLine(); line != null; line = reader.readLine()) {
String split[] = line.split("\\,");
double longitude = Double.parseDouble(split[0]);
double latitude = Double.parseDouble(split[1]);
Point point = factory.createPoint(new Coordinate(longitude, latitude));
SimpleFeature feature = SimpleFeatureBuilder.build(TYPE, new Object[]{point}, null);
collection.add(feature);
}
} finally {
reader.close();
}
File newFile = getNewShapeFile(files[i], outputFilePath);
DataStoreFactorySpi factory = new ShapefileDataStoreFactory();
Map<String, Serializable> create = new HashMap<String, Serializable>();
create.put("url", newFile.toURI().toURL());
create.put("create spatial index", Boolean.TRUE);
ShapefileDataStore newDataStore = (ShapefileDataStore) factory.createNewDataStore(create);
newDataStore.createSchema(TYPE);
newDataStore.forceSchemaCRS(DefaultGeographicCRS.WGS84);
Transaction transaction = new DefaultTransaction("create");
String typeName = newDataStore.getTypeNames()[0];
FeatureStore<SimpleFeatureType, SimpleFeature> featureStore;
featureStore = (FeatureStore<SimpleFeatureType, SimpleFeature>)
newDataStore.getFeatureSource(typeName);
featureStore.setTransaction(transaction);
try {
featureStore.addFeatures(collection);
transaction.commit();
} catch (Exception problem) {
problem.printStackTrace();
transaction.rollback();
} finally {
transaction.close();
}
//System.exit(0); // we are actually exiting because we will use a Swing JFileChooser
}
}
public static File[] getFiles(String args) {
return new File(args).listFiles();
}
private static File getNewShapeFile(File file, String outputFilePath) {
String path = file.getAbsolutePath();
String newPath = path.substring(0,path.length()-4)+".shp";
File newFile = new File(outputFilePath);
if( newFile.equals( file )){
System.out.println("Cannot replace "+file);
System.exit(0);
}
return newFile;
}
private static File getCSVFile(String[] args) throws FileNotFoundException {
File file;
if (args.length == 0){
JFileChooser chooser = new JFileChooser();
chooser.setDialogTitle("Open CSV file");
chooser.setFileFilter( new FileFilter(){
public boolean accept( File f ) {
return f.isDirectory() || f.getPath().endsWith("csv") || f.getPath().endsWith("CSV");
}
public String getDescription() {
return "Comma Separated Value";
}
});
int returnVal = chooser.showOpenDialog( null );
if(returnVal != JFileChooser.APPROVE_OPTION) {
System.exit(0);
}
file = chooser.getSelectedFile();
System.out.println("Opening CSV file: " + file.getName());
}
else {
file = new File( args[0] );
}
if (!file.exists()){
throw new FileNotFoundException( file.getAbsolutePath() );
}
return file;
}
/**
* Here is how you can use a SimpleFeatureTypeBuilder to create
* the schema for your shapefile dynamically.
* <p>
* This method is an improvement on the original example as we
* are specifying DefaultGeographicCRS.WGS84 and a maximum field length.
* <p>
* @return SimpleFeatureType
*/
static SimpleFeatureType createFeatureType() throws FactoryException {
SimpleFeatureTypeBuilder builder = new SimpleFeatureTypeBuilder();
builder.setName( "Location" );
CoordinateReferenceSystem crs = CRS.decode("EPSG:32630");
builder.setCRS(crs);
//add attributes in order
builder.add("Location", Point.class );
builder.length(15).add( "Name", String.class );
System.out.println(builder.crs(crs));
//build the type
final SimpleFeatureType LOCATION = builder.buildFeatureType();
return LOCATION;
}
}

Please change
final SimpleFeatureType TYPE = DataUtilities.createType("Location", "location:Point");
to
final SimpleFeatureType TYPE = createFeatureType();
so that the createFeatureType() method you already defined, which sets the CRS, is actually used.
Or change
final SimpleFeatureType TYPE = DataUtilities.createType("Location", "location:Point");
to
final SimpleFeatureType TYPE = DataUtilities.createType("Location",
"location:Point:srid=4326," +
"name:String," +
"number:Integer"
);
and delete the method createFeatureType().

Edit .doc file header in Java

I need to edit the headers of .doc and .docx files while keeping the document's style.
I tried the following:
POI API: I managed to read the file header, but couldn't find out how to replace text in it and save the result with the original style.
public static void mFix(String iFilePath , HashMap<String, String> iOldNewCouples)
{
aOldNewCouples = iOldNewCouples;
try {
if(iFilePath==null)
return;
File file = new File(iFilePath);
FileInputStream fis=new FileInputStream(file.getAbsolutePath());
HWPFDocument document=new HWPFDocument(fis);
WordExtractor extractor = new WordExtractor(document); // read the doc as rtf
String fileData = extractor.getHeaderText();
String fileDataResult =fileData ;
for (Entry<String, String> entry : aOldNewCouples.entrySet())
{
if(fileData.contains(entry.getKey())) {
System.out.println("replace " +entry.getKey());
fileDataResult = fileData.replace(entry.getKey(), entry.getValue());
}
}
document.getHeaderStoryRange().replaceText(fileData, fileDataResult);
saveWord(iFilePath ,document);
fis.close();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace( );
}
}
private static void saveWord(String filePath, HWPFDocument doc) throws FileNotFoundException, IOException
{
FileOutputStream fileOutputStream = null;
try{
fileOutputStream = new FileOutputStream(new File(filePath.replace(".doc", "-test.doc")));
BufferedOutputStream buffOutputStream = new BufferedOutputStream(fileOutputStream);
doc.write(buffOutputStream);
buffOutputStream.close();
fileOutputStream.close();
}
finally{
if( fileOutputStream != null)
fileOutputStream.close();
}
}
I also tried the docx4j API for docx: I found out how to edit the header, but not how to keep the style.
public static void mFix(String iFilePath , HashMap<String, String> iOldNewCouples) {
aOldNewCouples = iOldNewCouples;
WordprocessingMLPackage output;
try {
output = WordprocessingMLPackage.load(new java.io.File(iFilePath));
replaceText(output.getDocumentModel().getSections().get(0).getHeaderFooterPolicy().getDefaultHeader());
output.save(new File(iFilePath));
}
catch (Exception e) {
e.printStackTrace();
}
}
public static void replaceText(ContentAccessor c) throws Exception
{
for (Object p: c.getContent())
{
if (p instanceof ContentAccessor)
replaceText((ContentAccessor) p);
else if (p instanceof JAXBElement)
{
Object v = ((JAXBElement) p).getValue();
if (v instanceof ContentAccessor)
replaceText((ContentAccessor) v);
else if (v instanceof org.docx4j.wml.Text)
{
org.docx4j.wml.Text t = (org.docx4j.wml.Text) v;
String text = t.getValue();
if (text != null)
{
boolean flag = false;
for (Entry<String, String> entry : aOldNewCouples.entrySet())
{
if(text.contains(entry.getKey())) {
flag =true;
text = text.replaceAll(entry.getKey(), entry.getValue());
t.setSpace("preserve");
t.setValue(text);
}
}
}
}
}
}
}
I would like to see examples for these APIs.
If there are other free solutions for this in Java projects, please describe them with an example.
Thanks,
Tami
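One detail in the POI version of mFix is separate from the style question: each loop iteration computes fileDataResult from the original fileData, so only the last matching pair survives. A minimal JDK-only sketch of applying every pair to the running result instead (the class ReplaceAllPairs and the method applyReplacements are made up for illustration):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReplaceAllPairs {
    // Applies every old->new pair to the running result, so earlier replacements are kept.
    static String applyReplacements(String text, Map<String, String> pairs) {
        String result = text;
        for (Map.Entry<String, String> entry : pairs.entrySet()) {
            // String.replace treats the key as literal text, not as a regular expression.
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> pairs = new LinkedHashMap<>();
        pairs.put("ACME", "Globex");
        pairs.put("2019", "2020");
        System.out.println(applyReplacements("ACME report 2019", pairs));
    }
}
```

This only addresses the text substitution; it does not by itself preserve formatting in the header.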
