get Horizontal Text with PDFBox [duplicate]


I am using PDFBox to extract words from a PDF file and insert them into a database table. In my tests, text that is rotated 90 degrees clockwise in the PDF produces gibberish when I extract the words.
For example, the word database in the file yields both atabase and database as two different words, although atabase does not appear anywhere in the PDF file.
When I rotated the original file upright and ran the extraction again, it worked as expected. I understand this could be a limitation of PDFBox itself.
So, for someone trying to index a rotated PDF file, is there a way to handle this?
Code snippet (just for reference):
    String[] lines = text.split("\\r?\\n");
    // prepare the statement once and reuse it for every word
    preparedStatement = con1.prepareStatement(sql);
    for (String line : lines) {
        String[] words = line.split(" ");
        System.out.println("Line: " + line);
        for (String word : words) {
            // strip one or more special characters from the beginning
            // and the end of the word, then insert it into the table
            word = word.replaceAll("([\\W]+$)|(^[\\W]+)", "");
            preparedStatement.setString(1, path1);
            preparedStatement.setString(2, word);
            System.out.println("Token: " + word);
            preparedStatement.executeUpdate();
        }
    }
    preparedStatement.close();
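The token clean-up in the loop above can be checked without a database or a PDF. The following is only a sketch: the class name WordTokens and the method tokens are made up for illustration, and splitting on "\\s+" and skipping empty tokens is an assumption about what is wanted, not something the snippet above does.

```java
import java.util.ArrayList;
import java.util.List;

public class WordTokens {
    // Hypothetical helper: splits a line on runs of whitespace and strips
    // non-word characters from both ends of every token, using the same
    // regular expression as the snippet in the question.
    public static List<String> tokens(String line) {
        List<String> result = new ArrayList<>();
        for (String raw : line.trim().split("\\s+")) {
            String word = raw.replaceAll("([\\W]+$)|(^[\\W]+)", "");
            if (!word.isEmpty()) {
                // tokens that consisted only of punctuation are dropped
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [The, database, e-mail, done]
        System.out.println(tokens("The  (database), e-mail -- done."));
    }
}
```

Punctuation inside a word (as in e-mail) is kept; only leading and trailing special characters are removed. This does nothing about the rotated-text problem itself, it only makes the tokenizing step testable on its own.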

This is the PDFBox ExtractText command line utility, which can detect rotations since 2.0.13 (PDFBOX-4371). That release had a bug with type 3 fonts (PDFBOX-4390); the fix is in the repository, in the code below, and in 2.0.14. Later code may have been improved since then. The current 2.0.* source can be found here.
To extract text from rotated files, use the "-rotationMagic" option. It first detects the angle of every glyph and collects these angles (AngleCollector). In a second pass it runs one extraction per angle: the page is rotated so that this angle becomes 0, and all glyphs at other angles are discarded (FilteredTextStripper). The order of extraction is by angle, which may or may not make sense if there are several different angles on a page.
The PDF is modified while extracting, so don't use this on documents you are saving.
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.pdfbox.tools;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.io.IOUtils;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.PDDocumentNameDictionary;
import org.apache.pdfbox.pdmodel.PDEmbeddedFilesNameTreeNode;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.filespecification.PDComplexFileSpecification;
import org.apache.pdfbox.pdmodel.common.filespecification.PDEmbeddedFile;
import org.apache.pdfbox.pdmodel.encryption.AccessPermission;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;
import org.apache.pdfbox.util.Matrix;
/**
* This is the main program that simply parses the pdf document and transforms it
* into text.
*
* @author Ben Litchfield
* @author Tilman Hausherr
*/
public final class ExtractText
{
private static final Log LOG = LogFactory.getLog(ExtractText.class);
private static final String PASSWORD = "-password";
private static final String ENCODING = "-encoding";
private static final String CONSOLE = "-console";
private static final String START_PAGE = "-startPage";
private static final String END_PAGE = "-endPage";
private static final String SORT = "-sort";
private static final String IGNORE_BEADS = "-ignoreBeads";
private static final String DEBUG = "-debug";
private static final String HTML = "-html";
private static final String ALWAYSNEXT = "-alwaysNext";
private static final String ROTATION_MAGIC = "-rotationMagic";
private static final String STD_ENCODING = "UTF-8";
/*
* debug flag
*/
private boolean debug = false;
/**
* private constructor.
*/
private ExtractText()
{
//static class
}
/**
* Infamous main method.
*
* @param args Command line arguments, should be one and a reference to a file.
*
* @throws IOException if there is an error reading the document or extracting the text.
*/
public static void main( String[] args ) throws IOException
{
// suppress the Dock icon on OS X
System.setProperty("apple.awt.UIElement", "true");
ExtractText extractor = new ExtractText();
extractor.startExtraction(args);
}
/**
* Starts the text extraction.
*
* @param args the commandline arguments.
* @throws IOException if there is an error reading the document or extracting the text.
*/
public void startExtraction( String[] args ) throws IOException
{
boolean toConsole = false;
boolean toHTML = false;
boolean sort = false;
boolean separateBeads = true;
boolean alwaysNext = false;
boolean rotationMagic = false;
String password = "";
String encoding = STD_ENCODING;
String pdfFile = null;
String outputFile = null;
// Defaults to text files
String ext = ".txt";
int startPage = 1;
int endPage = Integer.MAX_VALUE;
for( int i=0; i<args.length; i++ )
{
if( args[i].equals( PASSWORD ) )
{
i++;
if( i >= args.length )
{
usage();
}
password = args[i];
}
else if( args[i].equals( ENCODING ) )
{
i++;
if( i >= args.length )
{
usage();
}
encoding = args[i];
}
else if( args[i].equals( START_PAGE ) )
{
i++;
if( i >= args.length )
{
usage();
}
startPage = Integer.parseInt( args[i] );
}
else if( args[i].equals( HTML ) )
{
toHTML = true;
ext = ".html";
}
else if( args[i].equals( SORT ) )
{
sort = true;
}
else if( args[i].equals( IGNORE_BEADS ) )
{
separateBeads = false;
}
else if (args[i].equals(ALWAYSNEXT))
{
alwaysNext = true;
}
else if (args[i].equals(ROTATION_MAGIC))
{
rotationMagic = true;
}
else if( args[i].equals( DEBUG ) )
{
debug = true;
}
else if( args[i].equals( END_PAGE ) )
{
i++;
if( i >= args.length )
{
usage();
}
endPage = Integer.parseInt( args[i] );
}
else if( args[i].equals( CONSOLE ) )
{
toConsole = true;
}
else
{
if( pdfFile == null )
{
pdfFile = args[i];
}
else
{
outputFile = args[i];
}
}
}
if( pdfFile == null )
{
usage();
}
else
{
Writer output = null;
PDDocument document = null;
try
{
long startTime = startProcessing("Loading PDF "+pdfFile);
if( outputFile == null && pdfFile.length() >4 )
{
outputFile = new File( pdfFile.substring( 0, pdfFile.length() -4 ) + ext ).getAbsolutePath();
}
document = PDDocument.load(new File( pdfFile ), password);
AccessPermission ap = document.getCurrentAccessPermission();
if( ! ap.canExtractContent() )
{
throw new IOException( "You do not have permission to extract text" );
}
stopProcessing("Time for loading: ", startTime);
if( toConsole )
{
output = new OutputStreamWriter( System.out, encoding );
}
else
{
if (toHTML && !STD_ENCODING.equals(encoding))
{
encoding = STD_ENCODING;
System.out.println("The encoding parameter is ignored when writing html output.");
}
output = new OutputStreamWriter( new FileOutputStream( outputFile ), encoding );
}
startTime = startProcessing("Starting text extraction");
if (debug)
{
System.err.println("Writing to " + outputFile);
}
PDFTextStripper stripper;
if(toHTML)
{
// HTML stripper can't work page by page because of startDocument() callback
stripper = new PDFText2HTML();
stripper.setSortByPosition(sort);
stripper.setShouldSeparateByBeads(separateBeads);
stripper.setStartPage(startPage);
stripper.setEndPage(endPage);
// Extract text for main document:
stripper.writeText(document, output);
}
else
{
if (rotationMagic)
{
stripper = new FilteredTextStripper();
}
else
{
stripper = new PDFTextStripper();
}
stripper.setSortByPosition(sort);
stripper.setShouldSeparateByBeads(separateBeads);
// Extract text for main document:
extractPages(startPage, Math.min(endPage, document.getNumberOfPages()),
stripper, document, output, rotationMagic, alwaysNext);
}
// ... also for any embedded PDFs:
PDDocumentCatalog catalog = document.getDocumentCatalog();
PDDocumentNameDictionary names = catalog.getNames();
if (names != null)
{
PDEmbeddedFilesNameTreeNode embeddedFiles = names.getEmbeddedFiles();
if (embeddedFiles != null)
{
Map<String, PDComplexFileSpecification> embeddedFileNames = embeddedFiles.getNames();
if (embeddedFileNames != null)
{
for (Map.Entry<String, PDComplexFileSpecification> ent : embeddedFileNames.entrySet())
{
if (debug)
{
System.err.println("Processing embedded file " + ent.getKey() + ":");
}
PDComplexFileSpecification spec = ent.getValue();
PDEmbeddedFile file = spec.getEmbeddedFile();
if (file != null && "application/pdf".equals(file.getSubtype()))
{
if (debug)
{
System.err.println(" is PDF (size=" + file.getSize() + ")");
}
InputStream fis = file.createInputStream();
PDDocument subDoc = null;
try
{
subDoc = PDDocument.load(fis);
if (toHTML)
{
// will not really work because of HTML header + footer
stripper.writeText( subDoc, output );
}
else
{
extractPages(1, subDoc.getNumberOfPages(),
stripper, subDoc, output, rotationMagic, alwaysNext);
}
}
finally
{
fis.close();
IOUtils.closeQuietly(subDoc);
}
}
}
}
}
}
stopProcessing("Time for extraction: ", startTime);
}
finally
{
IOUtils.closeQuietly(output);
IOUtils.closeQuietly(document);
}
}
}
private void extractPages(int startPage, int endPage,
PDFTextStripper stripper, PDDocument document, Writer output,
boolean rotationMagic, boolean alwaysNext) throws IOException
{
for (int p = startPage; p <= endPage; ++p)
{
stripper.setStartPage(p);
stripper.setEndPage(p);
try
{
if (rotationMagic)
{
PDPage page = document.getPage(p - 1);
int rotation = page.getRotation();
page.setRotation(0);
AngleCollector angleCollector = new AngleCollector();
angleCollector.setStartPage(p);
angleCollector.setEndPage(p);
angleCollector.writeText(document, new NullWriter());
// rotation magic
for (int angle : angleCollector.getAngles())
{
// prepend a transformation
// (we could skip these parts for angle 0, but it doesn't matter much)
PDPageContentStream cs = new PDPageContentStream(document, page,
PDPageContentStream.AppendMode.PREPEND, false);
cs.transform(Matrix.getRotateInstance(-Math.toRadians(angle), 0, 0));
cs.close();
stripper.writeText(document, output);
// remove prepended transformation
((COSArray) page.getCOSObject().getItem(COSName.CONTENTS)).remove(0);
}
page.setRotation(rotation);
}
else
{
stripper.writeText(document, output);
}
}
catch (IOException ex)
{
if (!alwaysNext)
{
throw ex;
}
LOG.error("Failed to process page " + p, ex);
}
}
}
private long startProcessing(String message)
{
if (debug)
{
System.err.println(message);
}
return System.currentTimeMillis();
}
private void stopProcessing(String message, long startTime)
{
if (debug)
{
long stopTime = System.currentTimeMillis();
float elapsedTime = ((float)(stopTime - startTime))/1000;
System.err.println(message + elapsedTime + " seconds");
}
}
/**
* This will print the usage requirements and exit.
*/
private static void usage()
{
String message = "Usage: java -jar pdfbox-app-x.y.z.jar ExtractText [options] <inputfile> [output-text-file]\n"
+ "\nOptions:\n"
+ " -password <password> : Password to decrypt document\n"
+ " -encoding <output encoding> : UTF-8 (default) or ISO-8859-1, UTF-16BE,\n"
+ " UTF-16LE, etc.\n"
+ " -console : Send text to console instead of file\n"
+ " -html : Output in HTML format instead of raw text\n"
+ " -sort : Sort the text before writing\n"
+ " -ignoreBeads : Disables the separation by beads\n"
+ " -debug : Enables debug output about the time consumption\n"
+ " of every stage\n"
+ " -alwaysNext : Process next page (if applicable) despite\n"
+ " IOException (ignored when -html)\n"
+ " -rotationMagic : Analyze each page for rotated/skewed text,\n"
+ " rotate to 0° and extract separately\n"
+ " (slower, and ignored when -html)\n"
+ " -startPage <number> : The first page to start extraction (1 based)\n"
+ " -endPage <number> : The last page to extract (1 based, inclusive)\n"
+ " <inputfile> : The PDF document to use\n"
+ " [output-text-file] : The file to write the text to";
System.err.println(message);
System.exit( 1 );
}
}
/**
* Collect all angles while doing text extraction. Angles are in degrees and rounded to the closest
* integer (to avoid slight differences from floating point arithmetic resulting in similarly
* angled glyphs being treated separately). This class must be constructed for each page so that the
* angle set is initialized.
*/
class AngleCollector extends PDFTextStripper
{
private final Set<Integer> angles = new TreeSet<Integer>();
AngleCollector() throws IOException
{
}
Set<Integer> getAngles()
{
return angles;
}
@Override
protected void processTextPosition(TextPosition text)
{
Matrix m = text.getTextMatrix();
m.concatenate(text.getFont().getFontMatrix());
int angle = (int) Math.round(Math.toDegrees(Math.atan2(m.getShearY(), m.getScaleY())));
angle = (angle + 360) % 360;
angles.add(angle);
}
}
/**
* TextStripper that only processes glyphs that have angle 0.
*/
class FilteredTextStripper extends PDFTextStripper
{
FilteredTextStripper() throws IOException
{
}
@Override
protected void processTextPosition(TextPosition text)
{
Matrix m = text.getTextMatrix();
m.concatenate(text.getFont().getFontMatrix());
int angle = (int) Math.round(Math.toDegrees(Math.atan2(m.getShearY(), m.getScaleY())));
if (angle == 0)
{
super.processTextPosition(text);
}
}
}
/**
* Dummy output.
*/
class NullWriter extends Writer
{
@Override
public void write(char[] cbuf, int off, int len) throws IOException
{
// do nothing
}
@Override
public void flush() throws IOException
{
// do nothing
}
@Override
public void close() throws IOException
{
// do nothing
}
}
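The angle calculation shared by AngleCollector and FilteredTextStripper can be tried in isolation. The sketch below copies only the arithmetic from the listing above and uses plain doubles in place of the PDFBox Matrix; the class name GlyphAngle and the method angleOf are hypothetical, and the sample values stand in for what m.getShearY() and m.getScaleY() would return.

```java
public class GlyphAngle {
    // Same formula as in AngleCollector: the angle in degrees, rounded to
    // the closest integer and normalized into the range 0..359.
    public static int angleOf(double shearY, double scaleY) {
        int angle = (int) Math.round(Math.toDegrees(Math.atan2(shearY, scaleY)));
        return (angle + 360) % 360;
    }

    public static void main(String[] args) {
        System.out.println(angleOf(0.0, 1.0));   // prints 0 (unrotated glyph)
        System.out.println(angleOf(1.0, 0.0));   // prints 90
        System.out.println(angleOf(0.0, -1.0));  // prints 180
        System.out.println(angleOf(-1.0, 0.0));  // prints 270
    }
}
```

Rounding to whole degrees is what lets glyphs with tiny floating point differences fall into the same angle group, and the normalization keeps negative results from atan2 from showing up as separate angles in the set.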

Related

Apache POI Streaming API doesn't recognize Excel (xlsx) content

I have a class which ingests .xlsx-files. I took it from this example and modified it for my needs:
https://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/xssf/eventusermodel/XLSX2CSV.java
Now the application processes some files just fine and others not at all. If I change a single field, or even a single character, in one of the failing files and save it again, the whole content is processed correctly. Does anyone have an idea what the reason might be? (IMHO it lies somewhere within the original Excel files.)
In case it helps, here is my code:
package com.goodgamestudios.icosphere.service.fileReader;
import com.goodgamestudios.icosphere.datamodel.DataSet;
import com.goodgamestudios.icosphere.datamodel.Tuple;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.ss.usermodel.BuiltinFormats;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.xssf.model.StylesTable;
import org.apache.poi.xssf.usermodel.XSSFCellStyle;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;
public class ExcelFileReader implements FileReader {
static final Logger LOG = LoggerFactory.getLogger(ExcelFileReader.class);
private SheetHandler handler;
@Override
public DataSet getDataFromFile(File file) throws IOException {
LOG.info("Start ingesting file {}", file.getName());
try {
OPCPackage pkg = OPCPackage.open(file);
XSSFReader reader = new XSSFReader(pkg);
StylesTable styles = reader.getStylesTable();
ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(pkg);
SharedStringsTable sst = reader.getSharedStringsTable();
XMLReader parser = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
handler = new SheetHandler(styles, strings, 24);
parser.setContentHandler(handler);
// rId2 found by processing the Workbook
// Seems to either be rId# or rSheet#
System.out.println("yooooo 1");
InputStream sheet2 = reader.getSheet("rId2");
System.out.println("yooooo 2");
InputSource sheetSource = new InputSource(sheet2);
System.out.println("yooooo 3");
parser.parse(sheetSource);
LOG.debug("{} rows parsed", handler.getData().getRows().size() + 1);
sheet2.close();
return handler.getData();
} catch (OpenXML4JException | SAXException ex) {
LOG.warn("Unable to parse file {}", file.getName(), ex);
}
return null;
}
/**
* See org.xml.sax.helpers.DefaultHandler javadocs
*
* Derived from http://poi.apache.org/spreadsheet/how-to.html#xssf_sax_api
* <p/>
* Also see Standard ECMA-376, 1st edition, part 4, pages 1928ff, at
* http://www.ecma-international.org/publications/standards/Ecma-376.htm
* <p/>
* A web-friendly version is http://openiso.org/Ecma/376/Part4
*/
private static class SheetHandler extends DefaultHandler {
boolean isFirstRow = true;
private int quantityOfColumns;
private int currentColumnNumber = 1;
int currentRowNumber = 1;
private int rowNumberOfLastCell = 1;
private DataSet data = new DataSet();
private Tuple tuple;
/**
* Table with styles
*/
private StylesTable stylesTable;
/**
* Table with unique strings
*/
private ReadOnlySharedStringsTable sharedStringsTable;
/**
* Number of columns to read starting with leftmost
*/
private final int minColumnCount;
// Set when V start element is seen
private boolean vIsOpen;
// Set when cell start element is seen;
// used when cell close element is seen.
private xssfDataType nextDataType;
// Used to format numeric cell values.
private short formatIndex;
private String formatString;
private final DataFormatter formatter;
// The last column printed to the output stream
private int lastColumnNumber = -1;
// Gathers characters as they are seen.
private StringBuffer value;
static final Logger LOG = LoggerFactory.getLogger(SheetHandler.class);
private SheetHandler(StylesTable styles,
ReadOnlySharedStringsTable strings,
int cols) {
this.stylesTable = styles;
this.sharedStringsTable = strings;
this.minColumnCount = cols;
this.value = new StringBuffer();
this.nextDataType = xssfDataType.NUMBER;
this.formatter = new DataFormatter();
LOG.debug("Sheethandler created");
}
/*
* (non-Javadoc)
* @see org.xml.sax.helpers.DefaultHandler#startElement(java.lang.String, java.lang.String, java.lang.String, org.xml.sax.Attributes)
*/
public void startElement(String uri, String localName, String name,
Attributes attributes) throws SAXException {
System.out.println("yooooooooooo start:uri:" + uri + " localname: " + localName + " name: " + name);
if ("inlineStr".equals(name) || "v".equals(name)) {
vIsOpen = true;
// Clear contents cache
value.setLength(0);
} // c => cell
else if ("c".equals(name)) {
// Get the cell reference
String r = attributes.getValue("r");
int firstDigit = -1;
for (int c = 0; c < r.length(); ++c) {
if (Character.isDigit(r.charAt(c))) {
firstDigit = c;
break;
}
}
currentColumnNumber = nameToColumn(r.substring(0, firstDigit));
System.out.println("colu mn " + currentColumnNumber);
// Set up defaults.
this.nextDataType = xssfDataType.NUMBER;
this.formatIndex = -1;
this.formatString = null;
String cellType = attributes.getValue("t");
String cellStyleStr = attributes.getValue("s");
if ("b".equals(cellType)) {
nextDataType = xssfDataType.BOOL;
} else if ("e".equals(cellType)) {
nextDataType = xssfDataType.ERROR;
} else if ("inlineStr".equals(cellType)) {
nextDataType = xssfDataType.INLINESTR;
} else if ("s".equals(cellType)) {
nextDataType = xssfDataType.SSTINDEX;
} else if ("str".equals(cellType)) {
nextDataType = xssfDataType.FORMULA;
} else if (cellStyleStr != null) {
// It's a number, but almost certainly one
// with a special style or format
XSSFCellStyle style = null;
if (cellStyleStr != null) {
int styleIndex = Integer.parseInt(cellStyleStr);
style = stylesTable.getStyleAt(styleIndex);
} else if (stylesTable.getNumCellStyles() > 0) {
style = stylesTable.getStyleAt(0);
}
if (style != null) {
this.formatIndex = style.getDataFormat();
this.formatString = style.getDataFormatString();
if (this.formatString == null) {
this.formatString = BuiltinFormats.getBuiltinFormat(this.formatIndex);
}
}
}
}
}
/*
* (non-Javadoc)
* @see org.xml.sax.helpers.DefaultHandler#endElement(java.lang.String, java.lang.String, java.lang.String)
*/
public void endElement(String uri, String localName, String name)
throws SAXException {
String thisStr = null;
// v => contents of a cell
if ("v".equals(name)) {
// Process the value contents as required.
// Do now, as characters() may be called more than once
switch (nextDataType) {
case BOOL:
char first = value.charAt(0);
thisStr = first == '0' ? "FALSE" : "TRUE";
break;
case ERROR:
thisStr = "\"ERROR:" + value.toString() + '"';
break;
case FORMULA:
// A formula could result in a string value,
// so always add double-quote characters.
thisStr = '"' + value.toString() + '"';
break;
case INLINESTR:
// TODO: have not seen an example of this, so it's untested.
XSSFRichTextString rtsi = new XSSFRichTextString(value.toString());
thisStr = '"' + rtsi.toString() + '"';
break;
case SSTINDEX:
String sstIndex = value.toString();
try {
int idx = Integer.parseInt(sstIndex);
XSSFRichTextString rtss = new XSSFRichTextString(sharedStringsTable.getEntryAt(idx));
thisStr = rtss.toString();
} catch (NumberFormatException ex) {
System.out.println("Failed to parse SST index '" + sstIndex + "': " + ex.toString());
}
break;
case NUMBER:
String n = value.toString();
if (this.formatString != null && n.length() > 0) {
thisStr = formatter.formatRawCellContents(Double.parseDouble(n), this.formatIndex, this.formatString);
} else {
thisStr = n;
}
break;
default:
thisStr = "(TODO: Unexpected type: " + nextDataType + ")";
break;
}
// Output after we've seen the string contents
// Emit commas for any fields that were missing on this row
if (lastColumnNumber == -1) {
lastColumnNumber = 0;
}
for (int i = lastColumnNumber; i < currentColumnNumber; ++i) {
}
// Might be the empty string.
System.out.println(thisStr);
if (isFirstRow) {
data.getHeaders().add(thisStr);
} else {
tuple.getRowEntries()[currentColumnNumber] = thisStr;
}
// Update column
if (currentColumnNumber > -1) {
lastColumnNumber = currentColumnNumber;
}
} else if ("row".equals(name)) {
// We're onto a new row
System.out.println("nextrow");
lastColumnNumber = -1;
System.out.println("yoooooo tuple:" + tuple);
if (isFirstRow) {
isFirstRow = false;
quantityOfColumns = data.getHeaders().size();
tuple = new Tuple(quantityOfColumns);
} else if (!tuple.isEmpty()) {
data.addRow(tuple);
tuple = new Tuple(quantityOfColumns);
}
}
}
/**
* Captures characters only if a suitable element is open. Originally
* was just "v"; extended for inlineStr also.
*/
public void characters(char[] ch, int start, int length)
throws SAXException {
if (vIsOpen) {
value.append(ch, start, length);
}
}
/**
* Converts an Excel column name like "C" to a zero-based index.
*
* @param name
* @return Index corresponding to the specified name
*/
private int nameToColumn(String name) {
int column = -1;
for (int i = 0; i < name.length(); ++i) {
int c = name.charAt(i);
column = (column + 1) * 26 + c - 'A';
}
return column;
}
public DataSet getData() {
return data;
}
}
/**
* The type of the data value is indicated by an attribute on the cell. The
* value is usually in a "v" element within the cell.
*/
enum xssfDataType {
BOOL,
ERROR,
FORMULA,
INLINESTR,
SSTINDEX,
NUMBER,
}
}
Here are XML examples of a working and a non-working worksheet:
http://www.file-upload.net/download-10909789/not_working.xml.html
http://www.file-upload.net/download-10909790/working.xml.html
and here are the xlsx files:
http://www.file-upload.net/download-10909802/not_working.xlsx.html
http://www.file-upload.net/download-10909803/working.xlsx.html
Thanks!

The problem was that LibreOffice Calc saves the first worksheet under "rId2", whereas MS Office does so under "rId1". So now I go through the sheet IDs until a sheet with content is parsed or no more sheets are found. This works with both files:
private void parseFirstWorksheetWithContent(XSSFReader reader) throws IOException, InvalidFormatException, SAXException {
//Sheet-ID seems to differ, seems to be "rId2" for files saved by MS Excel and "rId1" for those saved by LibreOffice Calc
try {
for (int i = 1; handler.getData().isEmpty(); i++) {
parseSheet(reader, "rId" + i);
}
} catch (IllegalArgumentException e) {
//No more sheets, file empty
}
}
private void parseSheet(XSSFReader reader, String sheetId) throws InvalidFormatException, SAXException, IOException {
XMLReader parser = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
parser.setContentHandler(handler);
InputStream sheetStream = reader.getSheet(sheetId);
InputSource sheetSource = new InputSource(sheetStream);
parser.parse(sheetSource);
sheetStream.close();
}
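The loop above ends either when a sheet has produced rows or when getSheet() rejects an unknown id with an IllegalArgumentException. Here is a minimal, library-free sketch of that probing logic. The loader function and the fake workbook are stand-ins I made up for reader.getSheet() plus parsing; they are not part of POI:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

class SheetProbe {
    /**
     * Tries "rId1", "rId2", ... in order and returns the first id whose sheet
     * yields at least one row, or null if the ids run out first.
     * The loader (hypothetical) must throw IllegalArgumentException for an
     * unknown id, which is the behaviour the loop above relies on.
     */
    public static String firstSheetWithContent(Function<String, List<String>> loader) {
        for (int i = 1; ; i++) {
            String id = "rId" + i;
            try {
                if (!loader.apply(id).isEmpty()) {
                    return id;
                }
            } catch (IllegalArgumentException e) {
                return null; // no more sheets, file is empty
            }
        }
    }

    public static void main(String[] args) {
        // Fake workbook: rId1 is a part without rows, rId2 is the real sheet.
        Function<String, List<String>> fake = id -> {
            switch (id) {
                case "rId1": return Collections.<String>emptyList();
                case "rId2": return Arrays.asList("header", "row 1");
                default: throw new IllegalArgumentException("no such relationship: " + id);
            }
        };
        System.out.println(firstSheetWithContent(fake));
    }
}
```

If guessing ids feels fragile, XSSFReader also has getSheetsData(), which iterates over the sheet streams without naming relationship ids; that may be a simpler route.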

Java getShopItem() and getShopUser methods

Hi guys, I am trying to write a method, getShopItem(), for the Shop class that returns a reference to a ShopItem object stored in itemsList when given the item's itemCode value. Below is my attempt, which has been unsuccessful and will not compile:
public shop item.getshopitem(shopItem)
{
return this.shopitem
}
Shop Class:
public class Shop
{
private ArrayList<Shop> shopCollection;
private ArrayList<ShopUser> usersList;
private ArrayList<Tool> toolsList;
private int toolCount;
private String toolName;
private int power;
private int timesBorrowed;
private boolean rechargeable;
private int itemCode;
private int cost;
private double weight;
private boolean onLoan;
private JFrame myFrame;
private FileDialog fileDialogBox;
private int mode;
public void ReadToolData (String data) throws FileNotFoundException,NoSuchElementException
{
// public void FileDialogBox() throws FileNotFoundException
{
if ( userSelectMode() ) // LOAD or SAVE if successful
{
String path = "E:/LEWIS BC 2/java project/project 1 part 3/items_all.txt"; // start browsing in root on drive E:
// String path = "E:/LEWIS BC 2/java project/project 1 part 3/userData.txt"; // start browsing in root on drive E:
setUpFileDialog(path);
fileDialogBox.setVisible(true);
}
else
{
String message = "No option selected, aborting";
String title = "Error";
JOptionPane.showMessageDialog(myFrame, message, title,
JOptionPane.WARNING_MESSAGE);
return; // or System.exit(1)
}
// see what the user has selected
String fileName = fileDialogBox.getFile();
if ( fileName==null )
{
String message = "Selection cancelled by user, aborting";
String title = "Error";
JOptionPane.showMessageDialog(myFrame, message, title,
JOptionPane.WARNING_MESSAGE);
return; // or System.exit(2)
}
String directoryPath = fileDialogBox.getDirectory();
String message = "File selected: " + fileName + "\nFolder: " + directoryPath;
JOptionPane.showMessageDialog(myFrame, message, "File Selected",
JOptionPane.INFORMATION_MESSAGE);
// now create a File object
File fileObject = new File(directoryPath, fileName);
// note that this does NOT create an actual file -- it simply
// creates a reference, or a "handle", for a file
// normally would now do something useful with the File object !!
// -- let's check if the file exists and, if so, when it was last modified
if ( fileObject.exists() )
{
Date whenModified = new Date(fileObject.lastModified());
DateFormat df = DateFormat.getDateTimeInstance();
message = "Time file was last modified: " + df.format(whenModified);
}
else
message = "File " + fileObject.getAbsolutePath() + " does not exist";
JOptionPane.showMessageDialog(myFrame, message, "Do Something Useful",
JOptionPane.INFORMATION_MESSAGE);
}
try
{ // The name of the file which we will read from
String filename = "items_all.txt";
// Prepare to read from the file, using a Scanner object
File file = new File(filename);
Scanner in = new Scanner(file);
ArrayList<Tool> shops = new ArrayList<Tool>();
// Read each line until end of file is reached
while (in.hasNextLine())
{
// Read an entire line, which contains all the details for 1 account
String line = in.nextLine();
// Make a Scanner object to break up this line into parts
Scanner lineBreaker = new Scanner(line);
// 1st part is the toolCount
try
{
// need to put checks for empty line & comments
int toolCount = lineBreaker.nextInt();
// 2nd part is the toolName
String toolName = lineBreaker.next();
// 3rd part is the amount of money or the Cost...
int cost = lineBreaker.nextInt();
int total = lineBreaker.nextInt();
}
catch (InputMismatchException e)
{
System.out.println("File not found1.");
}
catch (NoSuchElementException e)
{
System.out.println("File not found2");
}
}
}
catch (FileNotFoundException e)
{
System.out.println("File not found");
} // Make an ArrayList to store all the
// Return the ArrayList
// return shops;
}
public static void main(String[] args)
{
String test = "245.34,2456 345.2,34.12,23456 23,4";
Scanner input = new Scanner(test);
input.useDelimiter(",");
while (input.hasNextDouble())
{
System.out.println(input.nextDouble());
}
System.out.println("I get to the end");
}
/**
* Default Constructor for Testing
*/
public void extractTokens(Scanner scanner ) throws IOException, FileNotFoundException
{
//extracts tokens from the text file
File text = new File("E:/LEWIS BC 2/java project/project 1 part 3/items_all.txt");
String toolName = scanner.next();
String itemCode = scanner.next();
String power = scanner.next();
String timesBorrowed = scanner.next();
String onLoan = scanner.next();
String cost = scanner.next();
String weight = scanner.next();
extractTokens(scanner);
// System.out.println(parts.get(1)); // "en"
}
/**
* Creates a collection of tools to be stored in a tool list
*/
public Shop(String toolName, int power,int timesborrowed,boolean rechargeable,int itemCode,int cost,double weight,int toolcount,boolean onLoan)
{
toolsList = new ArrayList<Tool>();
toolName = new String();
power = 0;
timesborrowed = 0;
rechargeable = true;
itemCode = 001;
cost = 100;
weight = 0.0;
toolCount = 0;
onLoan = true;
}
/**
* Default Constructor for Testing
*/
public Shop()
{
// initialise instance variables
toolName = "Spanner";
itemCode = 001;
timesBorrowed = 0;
power = 0;
onLoan = true;
rechargeable = true;
itemCode = 001;
cost = 100;
weight = 0.0;
toolCount = 0;
}
//
// /**
// * Creates a collection of tools to be stored in a tool list
// */
// public Shop() {
// this.toolsList = new ArrayList<Tool>();
// this.toolName = toolName;
// this.power = power;
// this.timesborrowed = timesborrowed;
//
//
// }
//
// /**
// * Default Constructor for Testing
// */
// public Shop(){
// // Call the previous defined constructor
//
// }
/**
* Reads ElectronicToolData data from a text file
*
* @param <code>fileName</code> a <code>String</code>, the name of the
* text file in which the data is stored.
*
* @throws FileNotFoundException
*/
//
// while (there are more lines in the data file )
// {
// lineOfText = next line from scanner
// if( line starts with // )
// { // ignore }
// else if( line is blank )
// { // ignore }
// else
// { code to deal with a line of ElectricTool data }
// }
private boolean userSelectMode()
{
String[] options = {"LOAD", "SAVE"};
String instr = "Select LOAD for read access, SAVE for write access";
String title = "Select Mode";
int button = JOptionPane.showOptionDialog(myFrame, instr, title,
JOptionPane.DEFAULT_OPTION,
JOptionPane.QUESTION_MESSAGE,
null, options, null);
boolean success;
if ( button==0 )
{
mode = FileDialog.LOAD;
success = true;
}
else if ( button==1 )
{
mode = FileDialog.SAVE;
success = true;
}
else
success = false;
return success;
}
private void setUpFileDialog(String path)
{
String fileDialogTitle = null;
if ( mode == FileDialog.LOAD )
fileDialogTitle = "Open";
else if ( mode == FileDialog.SAVE)
fileDialogTitle ="Save As";
else
{
// defensive programming -- this should never happen !
System.out.println("*** Unexpected Error -- Aborting ***");
System.exit(1);
}
fileDialogBox = new FileDialog(myFrame, fileDialogTitle, mode);
fileDialogBox.setDirectory(path); // start browsing in folder
// corresponding to path
}
/**
* Creates a tool collection and populates it using data from a text file
*/
public Shop(String fileName) throws FileNotFoundException
{
this();
ReadToolData(fileName);
}
/**
* Adds a tool to the collection
*
* @param <code>tool</code> a <code>Tool</code> object, the tool to be added
*/
public void storeTool(Tool tool)
{
toolsList.add(tool);
}
/**
* Shows a tool by printing its details. This includes
* its position in the collection.
*
* @param <code>listPosition</code> the position of the tool
*/
public void showTool(int listPosition)
{
Tool tool;
if( listPosition < toolsList.size() )
{
tool = toolsList.get(listPosition);
System.out.println("Position " + listPosition + ": " + tool);
}
}
/**
* Returns how many tools are stored in the collection
*
* @return the number of tools in the collection
*/
public int numberOfToolls()
{
return toolsList.size();
}
/**
* Displays all the tools in the collection
*
*/
public void showAllTools()
{
System.out.println("Shop");
System.out.println("===");
int listPosition = 0;
while( listPosition<toolsList.size() ) //for each loop
{
showTool(listPosition);
listPosition++;
}
System.out.println(listPosition + " tools shown" ); // display number of tools shown
}
public void printShopiteDetails()
{
// The name of the file to open.
String fileName = "items_all.txt";
// This will reference one line at a time
String line = null;
try {
// FileReader reads text files in the default encoding.
FileReader fileReader =
new FileReader(fileName);
// Always wrap FileReader in BufferedReader.
BufferedReader bufferedReader =
new BufferedReader(fileReader);
while((line = bufferedReader.readLine()) != null) {
System.out.println(line);
}
// Always close files.
bufferedReader.close();
}
catch(FileNotFoundException ex) {
System.out.println(
"Unable to open file '" +
fileName + "'");
}
catch(IOException ex) {
System.out.println(
"Error reading file '"
+ fileName + "'");
// Or we could just do this:
// ex.printStackTrace();
}
}
/**
* Adds a shop item to the shop.
*
* @param <code>shop</code> a <code>Shop</code> object, the shop item to be added to the shop.
*/
public void storeShopitem(Shop shop)
{
shopCollection.add(shop);
}
/**
* Adds a shop user to the shop.
*
* @param <code>shopUser</code> a <code>ShopUser</code> object, the new shop user to be added to the shop.
*/
public void storeShopUser(ShopUser shopUser)
{
usersList.add(shopUser);
}
/**
* Accessor method processhireRequest
*
* @return shopuser and shopitem objects
*/
public void processHireRequest(ShopItem shopItem, ShopUser shopUser)
{
this.itemCode = itemCode;
}
/**
* Accessor method processhireRequest
*
* @return shopuser and shopitem objects
*/
public void processReturnRequest(ShopItem shopItem, ShopUser shopUser)
{
this.itemCode = itemCode;
}
}
ShopItem class:
public abstract class ShopItem
{
private ArrayList<Tool> toolsList;
Shop shop;
private int toolCount;
private String toolName;
private int power;
private int timesBorrowed;
private boolean rechargeable;
private int itemCode;
private int cost;
private double weight;
private boolean onLoan;
private static JFrame myFrame;
private String Tool;
private String ElectricTool;
private String HandTool;
private String Perishable;
private String Workwear;
private String ShopUserID;
public void ReadToolData (String data) throws FileNotFoundException,NoSuchElementException
{
// shows the directory of the text file
File file = new File("E:/LEWIS BC 2/java project/project 1 part 3/ElectricToolData.txt");
Scanner S = new Scanner (file);
// prints out the data
System.out.println();
// prints out the
System.out.println();
S.nextLine();
S.nextLine();
S.nextLine();
S.nextLine();
S.nextInt ();
}
/**
* Creates a collection of tools to be stored in a tool list
*/
public ShopItem(String toolName, int power,int timesborrowed,boolean rechargeable,int itemCode,int cost,double weight,int toolcount,boolean onLoan,boolean ShopUserID)
{
toolsList = new ArrayList<Tool>();
rechargeable = true;
power = 0;
timesborrowed = 0;
// ShopUserID = new String();
toolName = new String();
itemCode = 001;
cost = 100;
weight = 0.0;
toolCount = 0;
onLoan = true;
// ShopUserID = null;
}
/**
* Default Constructor for Testing
*/
public ShopItem()
{
// initialise instance variables
rechargeable = true;
power = 0;
timesBorrowed = 0;
ShopUserID = "SU002171";
toolName = "Spanner";
itemCode = 001;
cost = 100;
weight = 0.0;
toolCount = 0;
onLoan = true;
}
/**
* Reads ElectronicToolData data from a text file
*
* @param <code>fileName</code> a <code>String</code>, the name of the
* text file in which the data is stored.
*
* @throws FileNotFoundException
*/
public void readData(String fileName) throws FileNotFoundException
{
//
// while (there are more lines in the data file )
// {
// lineOfText = next line from scanner
// if( line starts with // )
// { // ignore }
// else if( line is blank )
// { // ignore }
// else
// { code to deal with a line of ElectricTool data }
// }
myFrame = new JFrame("Testing FileDialog Box");
myFrame.setBounds(200, 200, 800, 500);
myFrame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
myFrame.setVisible(true);
{
FileDialog fileBox = new FileDialog(myFrame,
"Open", FileDialog.LOAD);
fileBox.setVisible(true);
}
{
File dataFile = new File(fileName);
Scanner scanner = new Scanner(dataFile);
while( scanner.hasNext() )
{
String info = scanner.nextLine();
System.out.println(info);
}
scanner.close();
}
}
/**
* Default Constructor for Testing
*/
public void extractTokens(Scanner scanner) throws IOException, FileNotFoundException
{
// extracts tokens from the scanner
File text = new File("E:/LEWIS BC 2/java project/java project part 3/step_5_data.txt");
String ToolName = scanner.next();
int itemCode = scanner.nextInt();
int cost = scanner.nextInt();
int weight = scanner.nextInt();
int timesBorrowed = scanner.nextInt();
boolean rechargeable = scanner.nextBoolean();
boolean onLoan = scanner.nextBoolean();
String ShopUserID = scanner.next();
extractTokens(scanner);
// System.out.println(parts.get(1)); // "en"
}
/**
* Creates a tool collection and populates it using data from a text file
*/
public ShopItem(String fileName) throws FileNotFoundException
{
this();
ReadToolData(fileName);
}
/**
* Adds a tool to the collection
*
* @param <code>tool</code> a <code>Tool</code> object, the tool to be added
*/
public void storeTool(Tool tool)
{
toolsList.add(tool);
}
/**
* Shows a tool by printing its details. This includes
* its position in the collection.
*
* @param <code>listPosition</code> the position of the tool
*/
public void showTool(int listPosition)
{
Tool tool;
if( listPosition < toolsList.size() )
{
tool = toolsList.get(listPosition);
System.out.println("Position " + listPosition + ": " + tool);
}
}
/**
* Returns how many tools are stored in the collection
*
* @return the number of tools in the collection
*/
public int numberOfToolls()
{
return toolsList.size();
}
/**
* Displays all the tools in the collection
*
*/
public void showAllTools()
{
System.out.println("Shop");
System.out.println("===");
int listPosition = 0;
while( listPosition<toolsList.size() ) //for each loop
{
showTool(listPosition);
listPosition++;
}
System.out.println(listPosition + " tools shown" ); // display number of tools shown
}
public void printAllDetails()
{
// The name of the file to open.
String fileName = "ElectricToolDataNew.txt";
// This will reference one line at a time
String line = null;
try {
// FileReader reads text files in the default encoding.
FileReader fileReader =
new FileReader(fileName);
// Always wrap FileReader in BufferedReader.
BufferedReader bufferedReader =
new BufferedReader(fileReader);
while((line = bufferedReader.readLine()) != null) {
System.out.println(line);
}
// Always close files.
bufferedReader.close();
}
catch(FileNotFoundException ex) {
System.out.println(
"Unable to open file '" +
fileName + "'");
}
catch(IOException ex) {
System.out.println(
"Error reading file '"
+ fileName + "'");
// Or we could just do this:
// ex.printStackTrace();
}
}
}
Is the above code correct, and should it work?
Any answers or help would be greatly appreciated.

public shop item.getshopitem(shopItem)
This isn't valid Java. You don't need to prefix the method name with anything; give it a return type and a typed parameter instead:
public ShopItem getshopitem(int shopItem)

I think the method you are trying to write is
public ShopItem getShopItem(int shopItem)
{
return this.shopItem;
}
Although, in the actual method implementation, you will want to search your list of ShopItems for the given item code.
If you are going to be storing item codes that correspond to ShopItems, I suggest using a HashMap with item codes as the keys and ShopItems as the values.
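Here is a minimal sketch of both suggestions, assuming ShopItem exposes its code through a getItemCode() accessor. The classes in the question have no such method yet, and ShopItem there is abstract, so the trimmed-down ShopItem and Shop below are hypothetical stand-ins rather than the asker's classes:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, trimmed-down stand-in for the question's ShopItem class.
class ShopItem {
    private final int itemCode;
    private final String name;

    public ShopItem(int itemCode, String name) {
        this.itemCode = itemCode;
        this.name = name;
    }

    public int getItemCode() { return itemCode; }
    public String getName() { return name; }
}

class Shop {
    private final List<ShopItem> itemsList = new ArrayList<>();
    // Optional index from item code to item, kept in step with itemsList.
    private final Map<Integer, ShopItem> itemsByCode = new HashMap<>();

    public void storeShopItem(ShopItem item) {
        itemsList.add(item);
        itemsByCode.put(item.getItemCode(), item);
    }

    /** Linear search: returns the item with the given code, or null if none is stored. */
    public ShopItem getShopItem(int itemCode) {
        for (ShopItem item : itemsList) {
            if (item.getItemCode() == itemCode) {
                return item;
            }
        }
        return null;
    }

    /** Same lookup through the HashMap; also returns null for an unknown code. */
    public ShopItem getShopItemFast(int itemCode) {
        return itemsByCode.get(itemCode);
    }

    public static void main(String[] args) {
        Shop shop = new Shop();
        shop.storeShopItem(new ShopItem(1, "Spanner"));
        shop.storeShopItem(new ShopItem(2, "Drill"));
        System.out.println(shop.getShopItem(2).getName());
        System.out.println(shop.getShopItemFast(99));
    }
}
```

The list search is fine for a handful of items; the map only pays off when lookups by code are frequent, at the cost of keeping two collections in sync.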

Control output file of a script

This is a continuation of this question.
I need help making my TableToCSV (a class that converts an .html table to CSV) write its output to a database rather than to a .csv file. I created a BufferedReader that loads a .csv into the database, but I can't get the two to connect. Please help me make the output file of TableToCSV go into my BufferedReader.
TableToCSV
/*
* [TableToCSV.java]
*
* Summary: Extracts rows in CSV tables to CSV form. Extracts data from all tables in the input. Output in xxx.csv.
*
* Copyright: (c) 2011-2014 Roedy Green, Canadian Mind Products, http://mindprod.com
*
* Licence: This software may be copied and used freely for any purpose but military.
* http://mindprod.com/contact/nonmil.html
*
* Requires: JDK 1.6+
*
* Created with: JetBrains IntelliJ IDEA IDE http://www.jetbrains.com/idea/
*
* Version History:
* 1.0 2011-01-23 initial version.
* 1.1 2011-01-25 allow you to specify encoding
*/
package com.mindprod.csv;
import com.mindprod.common11.Misc;
import com.mindprod.entities.DeEntifyStrings;
import com.mindprod.hunkio.HunkIO;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import static java.lang.System.err;
import static java.lang.System.out;
/**
* Extracts rows in CSV tables to CSV form. Extracts data from all tables in the input. Output in xxx.csv.
* <p/>
* Use: java.exe com.mindprod.TableToCSV xxxx.html
* It also strips tags and converts entities back to UTF-8 characters.
*
* @author Roedy Green, Canadian Mind Products
* @version 1.1 2011-01-25 allow you to specify encoding
* @since 2011-01-23
*/
public final class TableToCSV
{
// ------------------------------ CONSTANTS ------------------------------
/**
* how to use the command line
*/
private static final String USAGE = "TableToCSV needs the name of an HTML file on the commandline, " +
"nothing else. Output will be in xxx.csv.";
// -------------------------- PUBLIC INSTANCE METHODS --------------------------
/**
* Constructor to convert an HTML table to CSV. Strips out entities and tags.
*
* @param file CSV file to be packed to remove excess space and quotes.
* @param separatorChar field separator character, usually ',' in North America,
* ';' in Europe and sometimes '\t' for
* tab for the output file. It is tab for the input file.
* Note this is a 'char' not a "string".
* @param quoteChar character used to quote fields containing awkward chars.
* @param commentChar character to treat as comments.
* @param encoding encoding of the input and output file.
*
* @throws java.io.IOException if problems reading/writing file
*/
@SuppressWarnings({ "WeakerAccess" })
public TableToCSV( final File file, final char separatorChar, final char quoteChar, final char commentChar,
final Charset encoding ) throws IOException
{
String outFilename = Misc.getCanOrAbsPath( file );
outFilename = outFilename.substring( 0, outFilename.length() - 5 ) + ".csv";
final File outFile = new File( outFilename );
// writer, quoteLevel, separatorChar, quoteChar, commentChar, trim
final PrintWriter pw = new PrintWriter( new OutputStreamWriter( new BufferedOutputStream( new FileOutputStream(
outFile ), 32768 ), encoding ) );
final CSVWriter w = new CSVWriter( pw, 0 /* minimal */, separatorChar, quoteChar, commentChar, true );
// read the entire html file into RAM.
String big = HunkIO.readEntireFile( file, encoding );
int from = 0;
// our parser is forgiving, works even if </td> </tr> missing.
while ( true )
{
// find <tr
final int trStart = big.indexOf( "<tr", from );
if ( trStart < 0 )
{
break;
}
from = trStart + 3;
final int trEnd = big.indexOf( '>', from );
if ( trEnd < 0 )
{
break;
}
while ( true )
{
// search for <td>...</td>
final int tdStart = big.indexOf( "<td", from );
if ( tdStart < 0 )
{
break;
}
from = tdStart + 3;
final int tdEnd = big.indexOf( '>', from );
if ( tdEnd < 0 )
{
break;
}
from = tdEnd + 1;
final int startField = tdEnd + 1;
final int slashTdStart = big.indexOf( "</td", from );
final int lookaheadTd = big.indexOf( "<td", from );
final int lookaheadSlashTr = big.indexOf( "</tr", from );
final int lookaheadTr = big.indexOf( "<tr", from );
int endField = Integer.MAX_VALUE;
if ( slashTdStart >= 0 && slashTdStart < endField )
{
endField = slashTdStart;
}
if ( lookaheadTd >= 0 && lookaheadTd < endField )
{
endField = lookaheadTd;
}
if ( lookaheadSlashTr >= 0 && lookaheadSlashTr < endField )
{
endField = lookaheadSlashTr;
}
if ( lookaheadTr >= 0 && lookaheadTr < endField )
{
endField = lookaheadTr;
}
if ( endField == Integer.MAX_VALUE )
{
break;
}
from = endField + 3;
final int slashTdEnd = big.indexOf( '>', from );
if ( slashTdEnd < 0 )
{
break;
}
String field = big.substring( startField, endField );
field = DeEntifyStrings.flattenHTML( field, ' ' );
w.put( field );
from = slashTdEnd + 1;
final int lookTd = big.indexOf( "<td", from );
final int lookTr = big.indexOf( "<tr", from );
if ( lookTr >= 0 && lookTr < lookTd || lookTd < 0 )
{
break;
}
}
w.nl();
}
out.println( w.getLineCount() + " rows extracted from table to csv" );
w.close();
}
// --------------------------- main() method ---------------------------
/**
* Simple command line interface to TableToCSV. Converts one HTML file to a CSV file, extracting tables,
* with entities stripped.
* Must have extension .html <br> Use java com.mindprod.TableToCSV somefile.html . You can use TableToCSV
* constructor
* in your own programs.
*
* @param args name of csv file to remove excess quotes and space
*/
public static void main( String[] args )
{
if ( args.length != 1 )
{
throw new IllegalArgumentException( USAGE );
}
String filename = args[ 0 ];
if ( !filename.endsWith( ".html" ) )
{
throw new IllegalArgumentException( "Bad Extension. Input must be a .html file.\n" + USAGE );
}
final File file = new File( filename );
try
{
// file, separatorChar, quoteChar, commentChar, encoding
new TableToCSV( file, ',', '\"', '#', CSV.UTF8Charset );
}
catch ( IOException e )
{
err.println();
e.printStackTrace( err );
err.println( "CSVToTable failed to export" + Misc.getCanOrAbsPath( file ) );
err.println();
}
}// end main
}
And here is my BufferedReader
BufferedReader br=new BufferedReader(new FileReader(newFile));
String line;
while((line=br.readLine())!=null)
{
String[]value = line.split(",");
String sql = "INSERT into main ( , Ticket #, Status, Priority, Department, Account Name) "
+ "values ('"+value[0]+"','"+value[1]+"','"+value[2]+"','"+value[3]+"','"+value[4]+"','"+value[5]+"')";
PreparedStatement pst = DatabaseConnection.ConnectDB().prepareStatement(sql);
pst.executeUpdate();
}
br.close();
}
catch(Exception e)
{
JOptionPane.showMessageDialog(null, e);
}
}
}
});

Did you test your database code? Does it work? (Hint: your SQL statement is wrong.) Is auto-commit on? If not, aren't you supposed to close() the statement/connection?
I would refactor the code and move the SQL statement and the creation of the prepared statement outside of the loop:
String sql = "INSERT INTO MAIN(\"Ticket #\", \"Status\", \"Priority\", \"Department\", \"Account Name\") VALUES (?, ?, ?, ?, ?)";
PreparedStatement pst = DatabaseConnection.ConnectDB().prepareStatement(sql);
Then inside your while loop you just set the parameters before executing:
pst.setString(1, value[0]);
pst.setString(2, value[1]); //...
And finally, don't forget to close() the statement / connection too!
pst.close();
DatabaseConnection.ConnectDB().close(); ???
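Putting those points together, here is a rough sketch, assuming the CSV has five comma-separated fields per row and no quoted commas. CsvToDb, parseRow and insertRows are names I made up; only the java.sql calls are real API, and the Connection would come from your own DatabaseConnection.ConnectDB():

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class CsvToDb {
    // Column names taken from the question; identifier quoting may differ per database.
    static final String INSERT_SQL =
        "INSERT INTO MAIN(\"Ticket #\", \"Status\", \"Priority\", \"Department\", \"Account Name\") "
        + "VALUES (?, ?, ?, ?, ?)";
    static final int COLUMNS = 5;

    /** Splits one CSV line naively; returns null when the row has too few fields. */
    public static String[] parseRow(String line) {
        String[] value = line.split(",", -1); // -1 keeps trailing empty fields
        return value.length < COLUMNS ? null : value;
    }

    /** Prepares the statement once, binds each row, and closes the statement when done. */
    public static int insertRows(Connection con, BufferedReader br)
            throws IOException, SQLException {
        int inserted = 0;
        try (PreparedStatement pst = con.prepareStatement(INSERT_SQL)) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] value = parseRow(line);
                if (value == null) {
                    continue; // skip malformed rows instead of failing mid-file
                }
                for (int i = 0; i < COLUMNS; i++) {
                    pst.setString(i + 1, value[i].trim());
                }
                inserted += pst.executeUpdate();
            }
        }
        return inserted;
    }
}
```

The caller keeps the Connection in a variable and closes it once. Calling ConnectDB() again just to close it may, depending on how ConnectDB() is written, open and close a second connection while the first one stays open.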

Thumbnail of a PDF page (Java) [closed]

How can I generate a thumbnail image of pages in a PDF document, using Java?

I think http://pdfbox.apache.org/ will do what you're looking for, since you can create an image from a page and then scale the image.
From their example code (written against the older PDFBox 1.x API):
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.pdfbox;
import java.awt.HeadlessException;
import java.awt.Toolkit;
import java.awt.image.BufferedImage;
import javax.imageio.ImageIO;
import java.util.List;
import org.apache.pdfbox.exceptions.InvalidPasswordException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.util.PDFImageWriter;
/**
* Convert a PDF document to an image.
*
* @author Ben Litchfield
* @version $Revision: 1.6 $
*/
public class PDFToImage
{
private static final String PASSWORD = "-password";
private static final String START_PAGE = "-startPage";
private static final String END_PAGE = "-endPage";
private static final String IMAGE_FORMAT = "-imageType";
private static final String OUTPUT_PREFIX = "-outputPrefix";
private static final String COLOR = "-color";
private static final String RESOLUTION = "-resolution";
private static final String CROPBOX = "-cropbox";
/**
* private constructor.
*/
private PDFToImage()
{
//static class
}
/**
* Infamous main method.
*
* @param args Command line arguments, should be one and a reference to a file.
*
* @throws Exception If there is an error parsing the document.
*/
public static void main( String[] args ) throws Exception
{
String password = "";
String pdfFile = null;
String outputPrefix = null;
String imageFormat = "jpg";
int startPage = 1;
int endPage = Integer.MAX_VALUE;
String color = "rgb";
int resolution;
float cropBoxLowerLeftX = 0;
float cropBoxLowerLeftY = 0;
float cropBoxUpperRightX = 0;
float cropBoxUpperRightY = 0;
try
{
resolution = Toolkit.getDefaultToolkit().getScreenResolution();
}
catch( HeadlessException e )
{
resolution = 96;
}
for( int i = 0; i < args.length; i++ )
{
if( args[i].equals( PASSWORD ) )
{
i++;
if( i >= args.length )
{
usage();
}
password = args[i];
}
else if( args[i].equals( START_PAGE ) )
{
i++;
if( i >= args.length )
{
usage();
}
startPage = Integer.parseInt( args[i] );
}
else if( args[i].equals( END_PAGE ) )
{
i++;
if( i >= args.length )
{
usage();
}
endPage = Integer.parseInt( args[i] );
}
else if( args[i].equals( IMAGE_FORMAT ) )
{
i++;
imageFormat = args[i];
}
else if( args[i].equals( OUTPUT_PREFIX ) )
{
i++;
outputPrefix = args[i];
}
else if( args[i].equals( COLOR ) )
{
i++;
color = args[i];
}
else if( args[i].equals( RESOLUTION ) )
{
i++;
resolution = Integer.parseInt(args[i]);
}
else if( args[i].equals( CROPBOX ) )
{
i++;
cropBoxLowerLeftX = Float.valueOf(args[i]).floatValue();
i++;
cropBoxLowerLeftY = Float.valueOf(args[i]).floatValue();
i++;
cropBoxUpperRightX = Float.valueOf(args[i]).floatValue();
i++;
cropBoxUpperRightY = Float.valueOf(args[i]).floatValue();
}
else
{
if( pdfFile == null )
{
pdfFile = args[i];
}
}
}
if( pdfFile == null )
{
usage();
}
else
{
if(outputPrefix == null)
{
outputPrefix = pdfFile.substring( 0, pdfFile.lastIndexOf( '.' ));
}
PDDocument document = null;
try
{
document = PDDocument.load( pdfFile );
//document.print();
if( document.isEncrypted() )
{
try
{
document.decrypt( password );
}
catch( InvalidPasswordException e )
{
if( args.length == 4 )//they supplied the wrong password
{
System.err.println( "Error: The supplied password is incorrect." );
System.exit( 2 );
}
else
{
//they didn't supply a password and the default of "" was wrong.
System.err.println( "Error: The document is encrypted." );
usage();
}
}
}
int imageType = 24;
if ("bilevel".equalsIgnoreCase(color))
{
imageType = BufferedImage.TYPE_BYTE_BINARY;
}
else if ("indexed".equalsIgnoreCase(color))
{
imageType = BufferedImage.TYPE_BYTE_INDEXED;
}
else if ("gray".equalsIgnoreCase(color))
{
imageType = BufferedImage.TYPE_BYTE_GRAY;
}
else if ("rgb".equalsIgnoreCase(color))
{
imageType = BufferedImage.TYPE_INT_RGB;
}
else if ("rgba".equalsIgnoreCase(color))
{
imageType = BufferedImage.TYPE_INT_ARGB;
}
else
{
System.err.println( "Error: the number of bits per pixel must be 1, 8 or 24." );
System.exit( 2 );
}
//if a crop box was specified, call the crop box modification method
//changeCropBoxes(PDDocument document,float a, float b, float c,float d)
if ( cropBoxLowerLeftX!=0 || cropBoxLowerLeftY!=0 || cropBoxUpperRightX!=0 || cropBoxUpperRightY!=0 )
{
changeCropBoxes(document,cropBoxLowerLeftX, cropBoxLowerLeftY, cropBoxUpperRightX, cropBoxUpperRightY);
}
//Make the call
PDFImageWriter imageWriter = new PDFImageWriter();
boolean success = imageWriter.writeImage(document, imageFormat, password,
startPage, endPage, outputPrefix, imageType, resolution);
if (!success)
{
System.err.println( "Error: no writer found for image format '"
+ imageFormat + "'" );
System.exit(1);
}
}
catch (Exception e)
{
System.err.println(e);
}
finally
{
if( document != null )
{
document.close();
}
}
}
}
/**
* This will print the usage requirements and exit.
*/
private static void usage()
{
System.err.println( "Usage: java org.apache.pdfbox.PDFToImage [OPTIONS] <PDF file>\n" +
" -password <password> Password to decrypt document\n" +
" -imageType <image type> (" + getImageFormats() + ")\n" +
" -outputPrefix <output prefix> Filename prefix for image files\n" +
" -startPage <number> The first page to start extraction(1 based)\n" +
" -endPage <number> The last page to extract(inclusive)\n" +
" -color <string> The color depth (valid: bilevel, indexed, gray, rgb, rgba)\n" +
" -resolution <number> The bitmap resolution in dpi\n" +
" -cropbox <number> <number> <number> <number> The page area to export\n" +
" <PDF file> The PDF document to use\n"
);
System.exit( 1 );
}
private static String getImageFormats()
{
StringBuffer retval = new StringBuffer();
String[] formats = ImageIO.getReaderFormatNames();
for( int i = 0; i < formats.length; i++ )
{
retval.append( formats[i] );
if( i + 1 < formats.length )
{
retval.append( "," );
}
}
return retval.toString();
}
private static void changeCropBoxes(PDDocument document,float a, float b, float c,float d)
{
List pages = document.getDocumentCatalog().getAllPages();
for( int i = 0; i < pages.size(); i++ )
{
System.out.println("resizing page");
PDPage page = (PDPage)pages.get( i );
PDRectangle rectangle = new PDRectangle();
rectangle.setLowerLeftX(a);
rectangle.setLowerLeftY(b);
rectangle.setUpperRightX(c);
rectangle.setUpperRightY(d);
page.setMediaBox(rectangle);
page.setCropBox(rectangle);
}
}
}
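The example above writes full-size page images; the "then scale the image" step is not part of it. A minimal sketch of that step, using only java.awt, could look like the following. The class and method names (Thumbnails, scaleToFit) are made up for illustration, and it assumes you already have the rendered page as a BufferedImage.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of PDFBox.
class Thumbnails {
    /**
     * Returns a copy of the image scaled down so that its longer side is at most
     * maxSize pixels, keeping the aspect ratio. Smaller images are not enlarged.
     */
    static BufferedImage scaleToFit(BufferedImage source, int maxSize) {
        int longerSide = Math.max(source.getWidth(), source.getHeight());
        double scale = Math.min(1.0, (double) maxSize / longerSide);
        int width = Math.max(1, (int) Math.round(source.getWidth() * scale));
        int height = Math.max(1, (int) Math.round(source.getHeight() * scale));

        BufferedImage thumbnail = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = thumbnail.createGraphics();
        try {
            // Bilinear interpolation gives smoother results than the default nearest-neighbor
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                               RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(source, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }
        return thumbnail;
    }
}
```

The result can then be written out with ImageIO.write(thumbnail, "png", file). For very large source images, scaling down in several steps may look better than a single pass.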

You could also have a look at JPedal (details at http://www.jpedal.org/pdf_thumbnail.php)

ICEpdf is the best free library that I've seen for reading PDFs. JPedal is awesome, but not free.
If you're going to be generating images from PDFs that the general public can send you, I assure you (from experience) that you'll get PDFs that will crash the JVM (for example, many-layered PDFs made up entirely of vector graphics). This PDF is an example that will crash many libraries (but is a perfectly valid PDF without anything funny like JavaScript, etc.).
We tried a multitude of libraries and eventually resorted to delegating thumbnail creation to ImageMagick, which is a highly optimized C program for image manipulation.
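As a rough sketch of what that delegation can look like from Java: run ImageMagick as a separate process with a timeout, so that a problematic PDF takes down the child process and not your JVM. The class and method names here are made up for illustration. It assumes the ImageMagick `convert` executable is on the PATH (ImageMagick 7 installations may expose it as `magick` instead) and that Ghostscript is installed, which ImageMagick normally relies on to read PDFs.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper around the ImageMagick command line.
class ImageMagickThumbnailer {
    /** Builds the command for a thumbnail of the first page, at most maxSize pixels per side. */
    static List<String> command(String pdfPath, String outputPath, int maxSize) {
        return Arrays.asList(
            "convert",                             // assumed executable name
            pdfPath + "[0]",                       // [0] selects the first page
            "-thumbnail", maxSize + "x" + maxSize, // fit inside the box, keep aspect ratio
            outputPath);
    }

    /** Runs the command and returns true if it finished in time and produced the output file. */
    static boolean run(String pdfPath, String outputPath, int maxSize, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(pdfPath, outputPath, maxSize))
            .inheritIO() // show ImageMagick's messages on our own stdout/stderr
            .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly(); // give up on PDFs that take too long
            return false;
        }
        return process.exitValue() == 0 && new File(outputPath).isFile();
    }
}
```

Passing the arguments as a list avoids shell quoting problems with file names supplied by users.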

This post covers not only PDF but also many other file types, such as Office documents, images, and text.
