batik converting two svg to a single pdf

batik converting two svg to a single pdf - java

I'm trying to use Batik to do the following task:
I have few set of SVGs graphs, I need to create one .PDF file which basically has some texts followed by a SVG converted graph then few more text followed by another SVG converted graph and so on.
stupidly I tried this but with no surprise the second transcoder gets ignored(no Exception), I'm not sure how to tackle this:
Transcoder transcoder = new PDFTranscoder();
TranscoderInput transcoderInput = new TranscoderInput(new FileInputStream(new File(DESKTOP + "svg1.svg")));
TranscoderInput transcoderInput1 = new TranscoderInput(new FileInputStream(new File(DESKTOP + "svg2.svg")));;
TranscoderOutput transcoderOutput = new TranscoderOutput(new FileOutputStream(new File(DESKTOP+"results.pdf")));
transcoder.transcode(transcoderInput, transcoderOutput);
transcoder.transcode(transcoderInput1, transcoderOutput);
so in short I have two problems:
How to add few SVG into a .PDF?
How to add text along side them?

You can create a new svg with these two svg files side by side or upside down, then convert new svg to .pdf file.
or
if you want to create two svg file as two different pages in a single pdf file, then convert these two svg file to pdf files then merge two pdf files to single pdf file with itext 2.1.7(open source) or ghostscript

Related

Converting PDF to GRAYSCALE using PDFBox without images?

im using Apache PDFBox,
I want to convert a RGB PDF file to another GRAYSCALE file WITHOUT using images method because its making huge file size -_- !!
so this my steps:
Export a (A4) First.pdf from Adobe InDesign, contain images, texts, vector-objects.
I read the First.pdf file. Done!
using LayerUtility, copy pages from First.pdf rotate them and put them to NEW PDF file (A4) Second.pdf. Done!
this method preferred because i need vector-objects to reduce the size.
then, i want to save this as GRAY-SCALE PDF file (Second-grayscale.pdf)
and this my code (not all):
PDDocument documentFirst = PDDocument.load("First.pdf"));
// Second.pdf its empty always
PDDocument documentSecond = PDDocument.load("Second.pdf"));
for (int page = 0; page < documentSecond.getNumberOfPages(); page++) {
// get current page from documentSecond
PDPage tempPage = documentSecond.getPage(page);
// create content contentStream
PDPageContentStream contentStream = new PDPageContentStream(documentSecond, tempPage);
// create layerUtility
LayerUtility layerUtility = new LayerUtility(documentSecond);
// importPageAsForm from documentFirst
PDFormXObject form = layerUtility.importPageAsForm(documentFirst, page);
// saveGraphicsState
contentStream.saveGraphicsState();
// rotate the page
Matrix matrix;
matrix.rotate(Math.toRadians(90));
contentStream.transform(matrix);
// draw the rotated page from documentFirst to documentSecond
contentStream.drawForm(form);
contentStream.close();
}
// save the new document
documentSecond.save("Second.pdf");
documentSecond.close();
documentFirst.close();
// now convert it to GRAYSCALE or do it in the Loop above!
well, i just start using Apache Box this week, i have followed some
example, but most are old and not working, until now i did what i
need, just need the Grayscale :)!!
if there are other solutions in java using open-source library
or a free tools. (i found with Ghost Script and Python)
i read this example but i didn't understand it and there are a functions deprecated!:
https://github.com/lencinhaus/pervads/blob/master/libs/pdfbox/src/java/org/apache/pdfbox/ConvertColorspace.java
its about PDF Specs, and changing Color Space...

You mentioned you would be interested in a Ghostscript based solution as far as I understood.
If you are able to call GS from your command line you can do color to grayscale conversion with this command line
gs -sDEVICE=pdfwrite -sProcessColorModel=DeviceGray -sColorConversionStrategy=Gray -dOverrideICC -o out.pdf -f input.pdf
my answer is taken from How to convert a PDF to grayscale from command line avoiding to be rasterized?

Small rendered images from PDF slides

I'm using Ghost4J library (http://ghost4j.sourceforge.net) to split a PDF file of slides into several images. The problem I have is that I get images where the slides are in the corner and very small. I want my images to get the format of the page from the PDF, but I don't know how to do it. Here is the code I'm using.
PDFDocument examplePDF = new PDFDocument();
String filePath="input.pdf";
File file=new File(filePath);
examplePDF.load(file);
List<org.ghost4j.document.Document> docs=examplePDF.explode();
SimpleRenderer renderer = new SimpleRenderer();
renderer.setResolution(300);
int counter=0;
for ( org.ghost4j.document.Document d : docs){
List<Image> img=renderer.render(d);
ImageIO.write((RenderedImage) img.get(0), "png", new File(
(counter+ 1) + ".png"));
counter++;
}
I think the problem is in the explode method that doesn't take into account that my original pdf didn't have standard pdf page size.
PD. I first tried the code from the second answer of this question but that gave me a heap space error when the document have a lot of pages.

Would you consider using ImageMagick instead?
convert -density 300 input.pdf output.png
would give you output-1.png, output-2.png, etc.

How to extract data from a .docx file including image, table, formula etc?

I am doing a task in which i have to extract data from word document mainly images, tables and special texts(formula etc) .
I am able to save image from a word file it is downloaded from web but when i am applying same code to my .docx file than it is giving error.
Code for same is
//create file inputstream to read from a binary file
FileInputStream fs=new FileInputStream(filename);
//create office word 2007+ document object to wrap the word file
XWPFDocument docx=new XWPFDocument(fs);
//get all images from the document and store them in the list piclist
List<XWPFPictureData> piclist=docx.getAllPictures();
//traverse through the list and write each image to a file
Iterator<XWPFPictureData> iterator=piclist.iterator();
System.out.println(piclist.size());
while(iterator.hasNext()){
XWPFPictureData pic=iterator.next();
byte[] bytepic=pic.getData();
int i=0;
BufferedImage imag=ImageIO.read(new ByteArrayInputStream(bytepic));
//captureimage(imag,i,flag,j);
if(imag != null)
{
ImageIO.write(imag, "jpg", new File("D:/imagefromword"+i+".jpg"));
}else{
System.out.println("imag is empty");
}
It is giving incorrect format error. But I cannot change the doc file.
Secondly for above code if i am having more then one image and when i am saving this than every time it saving save image. Suppose we have 3 images then it will save 3 images but all three will be latest one.
Any help will be appreciated.

Without actual error one can only guess.
But there are two POI implementations HWPF and XWPF depending which version of word document your read the old doc one or xml-new-one docx. Typically the format error comes when you try to open the doc using the wrong one.
Also you need the full poi-ooxml-schemas jar to read more complicated documents.

convert a xls file as pdf without poi or jxl

So I summarize my problem. I would like to convert an xls file to PDF, while using java. .
I find two examples
The first is with Openoffice
import officetools.OfficeFile; // from officetools.jar
FileInputStream fis = new FileInputStream(new File("test.doc"));
FileOutputStream fos = new FileOutputStream(new File("test.pdf"));
OfficeFile f = new OfficeFile(fis,"localhost","8100", false);
f.convert(fos,"pdf");
But unfortunately I have to install it :(
I also find this example, two command line with vb (call pdf creator)
DoCmd.OpenReport "repClient", acViewPreview, "NumClient = 2"
DoCmd.OutputTo acOutputReport, "PDF", "d: \ test.pdf"
is there somthing like that on java !!!!
(Note I used for my first solution (jxl, appach poi) but formatting pdf generated is not like when I do save as PDF with Microsoft Excel)
think you in advance

I think you can stream the data from the excel document using
apache POI
library. You can pass this stream of data in
iText library API.
iText library API definitely has a function which writes stream data into PDF file. With iText, you can be sure of pdf formatting as it is widely used in organizations for PDF generation. Infact many reporting tool also use iText to generate PDF reports.

Apache Batik - Exception: "content not allowed in prolog" converting PDF to JPEG

I am getting the an exception when I try to convert some PDF to JPEG, with the message "content not allowed in prolog". I am performing a two-step opertaion converting SVG to PDF and then PDF to image.
I am facing this issue when I try to do the direct process which Batik is meant for.
Here is my code.
File pdfFile= new File("path to pdffile");
InputStream in = new java.io.FileInputStream(pdfFile);
JPEGTranscoder transcoder = new JPEGTranscoder();
transcoder.addTranscodingHint(JPEGTranscoder.KEY_XML_PARSER_CLASSNAME,
"org.apache.crimson.parser.XMLReaderImpl");
transcoder.addTranscodingHint(JPEGTranscoder.KEY_QUALITY,
new Float(1.0));
TranscoderInput input = new TranscoderInput(in);
OutputStream ostream = new FileOutputStream("path to jp file here!");
TranscoderOutput output = new TranscoderOutput(ostream);
transcoder.transcode(input, output);
ostream.close();

Batik transcoders expect to be given SVG data as input, not pdf documents.

I can't find anything that suggests you can convert PDF to JPEG with Batik. Btw, Batik is an XML graphics library, not a general transcoder library. Converting from PDF to JPEG is not what it's for, it's for displaying and/or transcoding SVG input to some image output.
Now, that being said, is there any reason you can't go straight from SVG to JPEG? Or use Batik to do the transcoding from SVG to PDF, then use another library besides Batik to transcode PDF to JPEG? What are your concerns involving this approach?

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

batik converting two svg to a single pdf - java

Related

Converting PDF to GRAYSCALE using PDFBox without images?

Small rendered images from PDF slides

How to extract data from a .docx file including image, table, formula etc?

convert a xls file as pdf without poi or jxl

Apache Batik - Exception: "content not allowed in prolog" converting PDF to JPEG

Categories

Resources