pdfbox2.0.4 convert pdf with Chinese to png

I have imported pdfbox-2.0.4.jar, fontbox-2.0.4.jar and commons-logging-1.1.1.jar into Eclipse Kepler. The program runs on Windows 10.
The console prints lots of warnings like this:
org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
WARNING: Using fallback font ArialUnicodeMS for CID-keyed TrueType font KaiTi_GB2312.
And the resulting image file does not show the whole content. How can I fix it?
My code looks like this:
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.apache.pdfbox.tools.imageio.ImageIOUtil; // in pdfbox-tools-2.0.4.jar

public class PdfboxTest {
    private static final String filePath = "xxx";
    private static final String outputFilePath = "xxx";

    public static void change(File inputFile, File outputFolder) throws IOException {
        String totalFileName = inputFile.getName();
        String fileName = totalFileName.substring(0, totalFileName.lastIndexOf("."));
        PDDocument doc = null;
        try {
            doc = PDDocument.load(inputFile);
            PDFRenderer pdfRenderer = new PDFRenderer(doc);
            // Render every page at 300 DPI and write it as <fileName><pageIndex>.png
            for (int pageIndex = 0; pageIndex < doc.getNumberOfPages(); pageIndex++) {
                BufferedImage bim = pdfRenderer.renderImageWithDPI(pageIndex, 300, ImageType.RGB);
                File outputFile = new File(outputFolder, fileName + pageIndex + ".png");
                ImageIOUtil.writeImage(bim, outputFile.getPath(), 300);
            }
        } finally {
            // Close exactly once, here, so the document is released even if rendering fails
            if (doc != null) {
                doc.close();
            }
        }
    }

    public static void main(String[] args) {
        File inputFile = new File(filePath);
        File outputFolder = new File(outputFilePath);
        if (!outputFolder.exists()) {
            outputFolder.mkdirs();
        }
        try {
            change(inputFile, outputFolder);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

As noted in the comments, the best solution is to install the missing font KaiTi_GB2312. The message "Using fallback font" means that the PDF references that font without embedding it, and PDFBox cannot find it on your computer either, so it falls back to another font, in this case ArialUnicodeMS. Sadly, such fallbacks are not always perfect, which is why some glyphs are missing from the rendered image.
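A quick way to confirm the diagnosis is to check whether Java can see a font with that name on the machine. The sketch below (the class name FontCheck is hypothetical) compares the PostScript and full names of all fonts reported by java.awt.GraphicsEnvironment against the name from the warning. Treat it as a rough check only: as I understand it, PDFBox builds its own font list by scanning the operating system's font folders, so AWT and PDFBox may not agree in every case.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class FontCheck {
    /** Returns true if any of the given names equals wanted, ignoring case. */
    public static boolean containsFontName(List<String> names, String wanted) {
        for (String name : names) {
            if (name.equalsIgnoreCase(wanted)) {
                return true;
            }
        }
        return false;
    }

    /** Collects the PostScript and full names of every font Java can see on this machine. */
    public static List<String> installedFontNames() {
        List<String> names = new ArrayList<>();
        for (Font font : GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts()) {
            names.add(font.getPSName());
            names.add(font.getFontName(Locale.ROOT));
        }
        return names;
    }

    public static void main(String[] args) {
        String wanted = "KaiTi_GB2312"; // the font named in the PDFBox warning
        boolean found = containsFontName(installedFontNames(), wanted);
        System.out.println(wanted + (found ? " is installed" : " is NOT installed"));
    }
}
```

If the font is reported as missing, installing KaiTi_GB2312 and re-running the conversion should make the fallback warning go away.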

Related

Java File.renameTo not working

I wrote code that renames all the .jpg files in a directory to 1 through n (n being the number of files).
For example, if there are 50 .jpg files, after running the program they are renamed to 1.jpg, 2.jpg, and so on up to 50.jpg.
But I am facing a problem: if I manually rename a file, say 50.jpg to aaa.jpg, running the program again doesn't rename that file.
I have spent a whole day trying to resolve this issue.
Kindly help me.
Code:
public class Renaming {
    private static String path; // string for storing the path

    public static void main(String[] args) {
        FileReader fileReader = null; // FileReader for opening the file
        BufferedReader bufferedReader = null; // BufferedReader for buffering the file's data
        try {
            fileReader = new FileReader("input.txt"); // create the FileReader, passing the file name
            bufferedReader = new BufferedReader(fileReader); // create the BufferedReader
            path = bufferedReader.readLine();
            fileReader.close();
            bufferedReader.close();
        } catch (FileNotFoundException e) { // thrown when the file is not found
            e.printStackTrace();
        } catch (IOException e) { // other I/O errors
            e.printStackTrace();
        } finally {
            File directory = new File(path);
            File[] files = directory.listFiles(); // store all the files in an array
            int file_counter = 1;
            for (int file_no = 0; file_no < files.length; file_no++) {
                String Extension = getFileExtension(files[file_no]); // get the file extension
                // rename only regular files with a jpg/JPG extension
                if (files[file_no].isFile() && (Extension.equals("jpg") || Extension.equals("JPG"))) {
                    File new_file = new File(path + "\\" + files[file_no].getName()); // make the new file
                    new_file.renameTo(new File(path + "\\" + String.valueOf(file_no + 1) + ".jpg")); // rename the file
                    System.out.println(new_file.toString());
                    file_counter++; // increment the file counter
                }
            }
        }
    }

    private static String getFileExtension(File file) { // utility function for getting the file extension
        String name = file.getName();
        try {
            return name.substring(name.lastIndexOf(".") + 1); // get the extension after the "."
        } catch (Exception e) {
            return "";
        }
    }
}

First of all, you should use / as the path separator. It works on Windows, Linux and macOS.
Here is my version, which renames all files in the given folder. I hope it helps. I used newer JDK features to keep the code short.
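import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class App {
    private String path = null;
    public static int index = 1;

    public App(String path) {
        // Only accept the path if it points to an existing directory
        if (Files.isDirectory(Paths.get(path))) {
            this.path = path;
        }
    }

    public void rename() throws IOException {
        if (this.path != null) {
            // try-with-resources closes the directory stream opened by Files.list
            try (Stream<Path> files = Files.list(Paths.get(this.path))) {
                files.forEach(f -> {
                    String fileName = f.getFileName().toString();
                    // keep the original extension (the text after the last '.')
                    String extension = fileName.replaceAll("^.*\\.([^.]+)$", "$1");
                    try {
                        Files.move(f, Paths.get(this.path + "/" + App.index + "." + extension));
                        App.index++;
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                });
            }
        }
    }

    public static void main(String[] args) throws IOException {
        App app = new App("c:/Temp/");
        app.rename();
    }
}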
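Neither version guards against name collisions. File.renameTo only returns false on failure (its Javadoc warns it might not succeed if a file with the destination name already exists), and the question's code ignores that return value; Files.move throws FileAlreadyExistsException instead. When some files are already called 1.jpg, 2.jpg, ..., renaming another file onto one of those names can therefore fail. A minimal sketch, assuming only .jpg files should be renamed and that they are numbered in sorted-name order, avoids this with two passes: first move every .jpg to a temporary name, then to its final number. The class name and the renaming-tmp- prefix are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SequentialRenamer {
    /**
     * Renames every regular *.jpg file (extension matched case-insensitively) in dir
     * to 1.jpg, 2.jpg, ... in sorted-name order and returns how many files were renamed.
     */
    public static int renameJpgsSequentially(Path dir) throws IOException {
        List<Path> jpgs;
        // Collect the whole listing first, so the directory is not modified while being iterated
        try (Stream<Path> entries = Files.list(dir)) {
            jpgs = entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".jpg"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        // Pass 1: move everything to temporary names so no final name (e.g. "1.jpg")
        // can collide with a file that has not been renamed yet.
        // Assumption: no existing file already uses the hypothetical "renaming-tmp-" prefix.
        List<Path> temps = new ArrayList<>();
        for (int i = 0; i < jpgs.size(); i++) {
            temps.add(Files.move(jpgs.get(i), dir.resolve("renaming-tmp-" + i + ".tmp")));
        }
        // Pass 2: give the files their final sequential names.
        for (int i = 0; i < temps.size(); i++) {
            Files.move(temps.get(i), dir.resolve((i + 1) + ".jpg"));
        }
        return temps.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(renameJpgsSequentially(Paths.get(args[0])) + " files renamed");
    }
}
```

Using Path.resolve instead of string concatenation also sidesteps the "\\" versus "/" separator question entirely.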

Convert .doc with images to .html using xdocreport

I am converting a .docx file to HTML using the following code:
    private static final String docName = "This is a test page.docx";
    private static final String outputlFolderPath = "C://";
    String htmlNamePath = "docHtml1.html";
    String zipName = "_tmp.zip";
    static File docFile = new File(outputlFolderPath + docName);
    File zipFile = new File(zipName);

    public void ConvertWordToHtml() {
        try {
            InputStream doc = new FileInputStream(new File(outputlFolderPath + docName));
            System.out.println("InputStream" + doc);
            XWPFDocument document = new XWPFDocument(doc);
            XHTMLOptions options = XHTMLOptions.create(); //.URIResolver(new FileURIResolver(new File("word/media")));;
            String root = "target";
            File imageFolder = new File(root + "/images/" + doc);
            options.setExtractor(new FileImageExtractor(imageFolder));
            options.URIResolver(new FileURIResolver(imageFolder));
            OutputStream out = new FileOutputStream(new File(htmlPath()));
            XHTMLConverter.getInstance().convert(document, out, options);
        } catch (Exception ex) {
            ex.printStackTrace(); // don't swallow conversion errors silently
        }
    }

    public static void main(String[] args) throws IOException, ParserConfigurationException, Exception {
        Convertion cwoWord = new Convertion();
        cwoWord.ConvertWordToHtml();
    }

    public String htmlPath() {
        return outputlFolderPath + htmlNamePath;
    }

    public String zipPath() {
        // d:/_tmp.zip
        return outputlFolderPath + zipName;
    }
The above code converts .docx to HTML fine. The issue comes when I try to convert a document that contains graphics such as a circle (shown in the screenshot): in that case, the graphics don't show up in the HTML file.
Please help me keep the graphics from the document in the HTML file after conversion. Thanks in advance.

You can embed the images in the html by using the following code:
Base64ImageExtractor imageExtractor = new Base64ImageExtractor();
options.setExtractor(imageExtractor);
options.URIResolver(imageExtractor);
where Base64ImageExtractor looks like:
public class Base64ImageExtractor implements IImageExtractor, IURIResolver {
    private byte[] picture;

    public void extract(String imagePath, byte[] imageData) throws IOException {
        this.picture = imageData;
    }

    private static final String EMBED_IMG_SRC_PREFIX = "data:;base64,";

    public String resolve(String uri) {
        StringBuilder sb = new StringBuilder(picture.length + EMBED_IMG_SRC_PREFIX.length())
                .append(EMBED_IMG_SRC_PREFIX)
                .append(Base64Utility.encode(picture));
        return sb.toString();
    }
}
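Two details of the class above are worth noting. It keeps only the most recently extracted picture, so resolve returns that picture whatever uri it is given; this works only if the converter resolves each image immediately after extracting it. And the prefix data:;base64, carries no MIME type. The sketch below keeps one entry per image path and adds a type based on the file extension, using java.util.Base64 (Java 8+) instead of Base64Utility. It deliberately does not implement the xdocreport interfaces, whose package names differ between versions; the assumption is that your extract(imagePath, imageData) would call put and your resolve(uri) would call toDataUri, with the converter passing the same path to both. The class name DataUriImageStore is hypothetical.

```java
import java.util.Base64;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class DataUriImageStore {
    private final Map<String, byte[]> images = new HashMap<>();

    /** Remembers the bytes of one image; meant to be called from an extractor's extract(imagePath, imageData). */
    public void put(String imagePath, byte[] imageData) {
        images.put(imagePath, imageData);
    }

    /** Returns a data: URI for a stored image, or the path unchanged if nothing was stored under it. */
    public String toDataUri(String imagePath) {
        byte[] data = images.get(imagePath);
        if (data == null) {
            return imagePath;
        }
        return "data:" + mimeTypeFor(imagePath) + ";base64," + Base64.getEncoder().encodeToString(data);
    }

    /** Guesses a MIME type from the file extension; unknown types fall back to application/octet-stream. */
    public static String mimeTypeFor(String path) {
        String lower = path.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".png")) return "image/png";
        if (lower.endsWith(".jpg") || lower.endsWith(".jpeg")) return "image/jpeg";
        if (lower.endsWith(".gif")) return "image/gif";
        if (lower.endsWith(".bmp")) return "image/bmp";
        return "application/octet-stream";
    }
}
```

For example, after put("word/media/image1.png", new byte[]{1, 2, 3}), toDataUri("word/media/image1.png") returns data:image/png;base64,AQID.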

Getting java.lang.ClassNotFoundException: org.apache.pdfbox.io.RandomAccessRead console error after pdfbox request

I'm working on a servlet for a web project. I have version 2.0.0 of the PDFBox library, and my code works in a plain Java application. This is my code:
pdfManager.java:
public class pdfManager {
    private PDFParser parser;
    private PDFTextStripper pdfStripper;
    private PDDocument pdDoc;
    private COSDocument cosDoc;
    private String Text;
    private String filePath;
    private File file;

    public pdfManager() {
    }

    public String ToText() throws IOException {
        this.pdfStripper = null;
        this.pdDoc = null;
        this.cosDoc = null;
        file = new File(filePath);
        parser = new PDFParser(new RandomAccessFile(file, "r")); // update for PDFBox V 2.0
        parser.parse();
        cosDoc = parser.getDocument();
        pdfStripper = new PDFTextStripper();
        pdDoc = new PDDocument(cosDoc);
        pdDoc.getNumberOfPages();
        pdfStripper.setStartPage(1);
        pdfStripper.setEndPage(10);
        // reading text from page 1 to 10
        // if you want to get text from full pdf file use this code
        // pdfStripper.setEndPage(pdDoc.getNumberOfPages());
        Text = pdfStripper.getText(pdDoc);
        return Text;
    }

    public void setFilePath(String filePath) {
        this.filePath = filePath;
    }
}
The servlet code:
PrintWriter out = response.getWriter() ;
out.println("\ndata we gottoo : ") ;
pdfManager pdfManager = new pdfManager();
pdfManager.setFilePath("/Users/rami/Desktop/pdf2.pdf");
System.out.println(pdfManager.ToText());
This is called in the doGet method.

The library you need is not on the classpath, or some other problem occurs when the class loader tries to load the library's classes. If you are running on a server, make sure the library is in the server's classpath folder. This can be done by hand, or your application has to provide/deliver it itself. Since it's not clear how your app is deployed or delivered, there are many possible causes.
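In a web application, library jars are normally placed in the WAR's WEB-INF/lib folder (or in the server's shared library folder); as far as I know, in PDFBox 2.0.x org.apache.pdfbox.io.RandomAccessRead ships in the main pdfbox jar, so that jar is the one to look for. To see whether a given class loader can actually find a class, and from which jar, a small diagnostic like the following can help; calling it from doGet with getClass().getClassLoader() tests the web application's own class loader. The class and method names are hypothetical.

```java
import java.net.URL;
import java.security.CodeSource;

public class ClasspathCheck {
    /** Reports whether className can be loaded by loader and, if so, which jar or folder it came from. */
    public static String describe(String className, ClassLoader loader) {
        try {
            // false: don't run static initializers, we only want to locate the class
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // Core JDK classes have no CodeSource; library classes usually report their jar
            URL location = (source == null) ? null : source.getLocation();
            return className + " found at " + (location == null ? "(JDK / unknown)" : location);
        } catch (ClassNotFoundException e) {
            return className + " NOT on the classpath of " + loader;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        System.out.println(describe("org.apache.pdfbox.io.RandomAccessRead", loader));
    }
}
```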

java setting font in jar file

I'm trying to set the font "bilboregular.ttf", found in the resources folder under the root of the jar file. If I run from NetBeans, the font is set successfully; however, if I run from the jar file, it isn't.
CODE:
main method: {
    String s = System.getProperty("user.dir") + "\\src\\resources\\" + "bilboregular.ttf";
    File a = new File(s);
    if (a.exists()) {
        this.loadFont(s);
    } else {
        this.loadFont(this.getClass().getResource("/resources/" + "bilboregular.ttf").getFile());
    }
}

public void loadFont(String s) {
    try {
        font = java.awt.Font.createFont(java.awt.Font.TRUETYPE_FONT, new File(s));
        font = font.deriveFont(Font.PLAIN, 20);
        GraphicsEnvironment ge = GraphicsEnvironment.getLocalGraphicsEnvironment();
        ge.registerFont(font);
        setFonts();
    } catch (Exception ex) {
    }
}

public boolean setFonts() {
    jLabel3.setFont(font);
    jLabel4.setFont(font);
    return true;
}

Your main can't use getClass() because it's static (assuming that by "main" you mean public static void main(String args[])).
Try this code:
InputStream is = getClass().getResourceAsStream("/resources/bilboregular.ttf");
Font f = Font.createFont(Font.TRUETYPE_FONT, is);
If this doesn't work, please post the structure of your JAR and the thrown exception (e.g. use a Logger in your catch block).
By the way, an empty exception block is not a good idea here, since I/O operations can fail in many ways.
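The jar case fails because getResource(...).getFile() returns only the path part of a URL; for a resource inside a jar it looks something like file:/path/app.jar!/resources/bilboregular.ttf, which new File(...) cannot open. Reading the resource as a stream works both from the IDE and from the jar. The sketch below assumes the font is packaged at /resources/bilboregular.ttf as in the question; the class FontResources is hypothetical, and using the class literal FontResources.class instead of getClass() lets it be called from a static method such as main.

```java
import java.awt.Font;
import java.awt.FontFormatException;
import java.awt.GraphicsEnvironment;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

public class FontResources {
    /**
     * Loads a TrueType font packaged as a classpath resource (works from the IDE and from a jar),
     * registers it with the local GraphicsEnvironment and returns it derived to the given size.
     */
    public static Font loadFontResource(String resourcePath, float size)
            throws IOException, FontFormatException {
        // try-with-resources skips close() if the stream is null
        try (InputStream in = FontResources.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                // Fail loudly instead of silently leaving the default font in place
                throw new FileNotFoundException("Resource not found on classpath: " + resourcePath);
            }
            Font base = Font.createFont(Font.TRUETYPE_FONT, in);
            GraphicsEnvironment.getLocalGraphicsEnvironment().registerFont(base);
            return base.deriveFont(Font.PLAIN, size);
        }
    }
}
```

In the question's code, font = FontResources.loadFontResource("/resources/bilboregular.ttf", 20f); followed by setFonts() would replace both branches of the File-based lookup.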

Why is java.awt.FontFormatException thrown even though I can open the .ttf file in Windows?

The code is below:
Font getFont(String fontPath) throws IOException, FontFormatException {
    fis = new FileInputStream(fontPath);
    bis = new java.io.BufferedInputStream(fis);
    Font tmp = null;
    try {
        tmp = Font.createFont(Font.TRUETYPE_FONT, bis);
    } catch (FontFormatException e) {
        e.printStackTrace();
        bis.close();
        fis = new FileInputStream(fontPath);
        bis = new java.io.BufferedInputStream(fis);
        tmp = Font.createFont(Font.TYPE1_FONT, bis);
    }
    return tmp.deriveFont(96.0f);
}
This code does not always work!
It throws java.awt.FontFormatException for some font files, even though those font files can be opened correctly in Windows. Why?
It first throws:
java.awt.FontFormatException: Font name not found
and then throws:
java.awt.FontFormatException: bad pfa font

Try running this on your computer. Replace rootPath with the path to your fonts.
import java.awt.Font;
import java.io.File;

public class FontLoader {
    public static void main(String[] args) {
        String rootPath = "/Library/Fonts";
        File root = new File(rootPath);
        if (root.canRead()) {
            String[] fontFiles = root.list();
            Font font = null;
            for (String fontFile : fontFiles) {
                try {
                    System.out.println(fontFile);
                    // Try to load every file in the folder as a TrueType font
                    font = Font.createFont(Font.TRUETYPE_FONT, new File(root + "/" + fontFile));
                    System.out.println(font);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
You should be able to run this any number of times without failures. If so, you might want to restructure your code: the stacking of close/open calls on your input streams looks suspicious. I say this without seeing where you declare those instance variables and what other code might be manipulating them.
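The second message, "bad pfa font", comes from the fallback that retries the file as Type 1, so it mainly says the file is not a Type 1 font either. Rather than trying one type and then the other, you can inspect the first bytes of the file to decide which type to pass to Font.createFont. The sketch below recognizes common signatures (TrueType, OpenType with CFF outlines, TrueType collections, and Type 1 in PFB or PFA form); the class name and the enum are hypothetical. A file identified as TrueType can still be rejected by Java's parser: "Font name not found" suggests Java could not read a usable name from the font's name table.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FontSniffer {
    public enum Kind { TRUETYPE, OPENTYPE_CFF, TRUETYPE_COLLECTION, TYPE1_PFB, TYPE1_PFA, UNKNOWN }

    /** Guesses the font format from the first bytes of the file. */
    public static Kind sniff(byte[] data) {
        if (data.length >= 4) {
            String tag = new String(data, 0, 4, StandardCharsets.ISO_8859_1);
            if (data[0] == 0x00 && data[1] == 0x01 && data[2] == 0x00 && data[3] == 0x00) {
                return Kind.TRUETYPE;            // sfnt version 1.0
            }
            if (tag.equals("true")) return Kind.TRUETYPE;            // Apple TrueType tag
            if (tag.equals("OTTO")) return Kind.OPENTYPE_CFF;        // OpenType with CFF outlines
            if (tag.equals("ttcf")) return Kind.TRUETYPE_COLLECTION; // .ttc collection
        }
        // PFB files start with the segment marker 0x80 0x01
        if (data.length >= 2 && (data[0] & 0xFF) == 0x80 && data[1] == 0x01) {
            return Kind.TYPE1_PFB;
        }
        // PFA files are plain PostScript text
        if (startsWith(data, "%!PS-AdobeFont") || startsWith(data, "%!FontType1")) {
            return Kind.TYPE1_PFA;
        }
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, String prefix) {
        byte[] p = prefix.getBytes(StandardCharsets.US_ASCII);
        return data.length >= p.length && Arrays.equals(Arrays.copyOf(data, p.length), p);
    }
}
```

One way to use it: read the file once with Files.readAllBytes, call sniff on the result, and pass new ByteArrayInputStream(data) to Font.createFont with Font.TRUETYPE_FONT or Font.TYPE1_FONT accordingly, which also avoids reopening the stream as the question's code does. Whether a given JDK accepts CFF-based OpenType files or collections through createFont may vary, so those kinds are the first suspects if the exception persists.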
