@JasonPlutext,
Hi Jason! I tried the above code, but it replaces the whole template with the image instead of replacing just the image.
I would like to replace/add only one particular image relationship, say:
<Relationship Id="rId8" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="../media/image10.png"/>
In place of rId8, I would like to use the rId7 image.
My source code:
public static void main(String[] args) throws Exception {
String inputfilepath = "C:\\Users\\saranyac\\QUERIES\\Estimation\\PPT-PSR\\PSR_Dev0ps\\PSRAutomationTemplate.pptx";
PresentationMLPackage presentationMLPackage = (PresentationMLPackage)OpcPackage.load(new java.io.File(inputfilepath));
MainPresentationPart pp = presentationMLPackage.getMainPresentationPart();
SlidePart slidePart = presentationMLPackage.getMainPresentationPart().getSlide(0);
SlideLayoutPart layoutPart = slidePart.getSlideLayoutPart();
System.out.println("SlidePart Name:::::"+slidePart.getPartName().getName());
String layoutName = layoutPart.getJaxbElement().getCSld().getName();
System.out.println("layout: " + layoutPart.getPartName().getName() + " with cSld/#name='" + layoutName + "'");
System.out.println("Master: " + layoutPart.getSlideMasterPart().getPartName().getName());
System.out.println("layoutPart.getContents()::::::::s: " + layoutPart.getContents());
//layoutPart.setContents( (SldLayout)XmlUtils.unmarshalString(SAMPLE_PICTURE, Context.jcPML));
// Add image part
File file = new File("C:\\Users\\saranyac\\PPT-PSR\\PSR_Dev0ps\\ppt\\media\\image10.png" );
BinaryPartAbstractImage imagePart
= BinaryPartAbstractImage.createImagePart(presentationMLPackage, slidePart, file);
Relationship rel = pp.getRelationshipsPart().getRelationshipByID("rId8");
System.out.println("Relationship:::::::s: " +imagePart.getSourceRelationship().getId());
// pp.removeSlide(rel);
java.util.HashMap<String, String>mappings = new java.util.HashMap<String, String>();
mappings.put("rId8", imagePart.getSourceRelationship().getId());
String outputfilepath = "C:\\Work\\24Jan2018_CheckOut\\PPT-TRAILS\\Success.pptx";
//presentationMLPackage.save(new java.io.File(outputfilepath));
SaveToZipFile saver = new SaveToZipFile(presentationMLPackage);
saver.save(outputfilepath);
System.out.println("\n\n done .. saved " + outputfilepath);
}
Please help me replace an image in the generated PPT.
With Regards,
Saranya
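For reference, a relationship Target such as "../media/image10.png" is resolved relative to the folder of the part that owns the .rels file. So if that relationship lives in a slide's .rels file (an assumption; the question does not say which part it belongs to), it points at ppt/media/image10.png inside the .pptx ZIP. The sketch below does that lookup with the JDK's DOM parser only. The class and method names are hypothetical, and external targets (TargetMode="External") are not handled.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class RelsLookup {

    /**
     * Returns the package part name that relationship relId points at, or null if absent.
     * relsXml: contents of a .rels file; sourceDir: folder of the owning part, e.g. "ppt/slides".
     */
    public static String resolveTarget(String relsXml, String relId, String sourceDir) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(relsXml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse .rels XML", e);
        }
        NodeList rels = doc.getElementsByTagNameNS("*", "Relationship");
        for (int i = 0; i < rels.getLength(); i++) {
            Element rel = (Element) rels.item(i);
            if (relId.equals(rel.getAttribute("Id"))) {
                // Internal targets are relative; normalize() folds the "../" segment
                String joined = Paths.get(sourceDir).resolve(rel.getAttribute("Target")).normalize().toString();
                return joined.replace('\\', '/'); // ZIP part names always use '/'
            }
        }
        return null;
    }
}
```

With the relationship from the question and sourceDir "ppt/slides", this returns "ppt/media/image10.png".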
See https://github.com/plutext/docx4j/blob/master/src/samples/pptx4j/org/pptx4j/samples/TemplateReplaceSimple.java (just added):
package org.pptx4j.samples;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import javax.xml.bind.JAXBException;
import org.apache.commons.io.FileUtils;
import org.docx4j.TraversalUtil;
import org.docx4j.TraversalUtil.CallbackImpl;
import org.docx4j.dml.CTBlip;
import org.docx4j.dml.CTBlipFillProperties;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.OpcPackage;
import org.docx4j.openpackaging.packages.PresentationMLPackage;
import org.docx4j.openpackaging.parts.Part;
import org.docx4j.openpackaging.parts.PresentationML.SlidePart;
import org.docx4j.openpackaging.parts.WordprocessingML.BinaryPartAbstractImage;
import org.pptx4j.Pptx4jException;
/**
* Example of how to replace text and images in a Pptx.
*
* Text is replaced using the familiar VariableReplace approach.
*
* Images are replaced by replacing their byte content.
*
* @author jharrop
*
*/
public class TemplateReplaceSimple {
public static void main(String[] args) throws Docx4JException, Pptx4jException, JAXBException, IOException {
// Input file
String inputfilepath = System.getProperty("user.dir") + "/sample-docs/pptx/image.pptx";
// String replacements
HashMap<String, String> mappings = new HashMap<String, String>();
mappings.put("colour", "green");
// Image replacements
List<ImageReplacementDetails> imageReplacements = new ArrayList<ImageReplacementDetails>();
ImageReplacementDetails example1 = new ImageReplacementDetails();
example1.slideIndex = 0;
example1.imageRelId = "rId2";
example1.replacementImageBytes = FileUtils.readFileToByteArray(new File("test.png"));
imageReplacements.add(example1);
PresentationMLPackage presentationMLPackage =
(PresentationMLPackage)OpcPackage.load(new java.io.File(inputfilepath));
// First, the text replacements
List<SlidePart> slideParts=
presentationMLPackage.getMainPresentationPart().getSlideParts();
for (SlidePart slidePart : slideParts) {
slidePart.variableReplace(mappings);
}
// Second, the image replacements.
// We have a design choice here.
// Either we can replace text placeholders with images,
// or we can replace existing images with new images, but keep the XML specifying size etc
// Here I opt for the latter, so what we need is the relId and image bytes.
for( ImageReplacementDetails ird : imageReplacements) {
// it's a bit inefficient to potentially traverse a single slide
// multiple times, but I've done it this way to keep this example simple
SlidePart slidePart=
presentationMLPackage.getMainPresentationPart().getSlide(ird.slideIndex);
SlidePicFinder traverser = new SlidePicFinder();
new TraversalUtil(slidePart.getJaxbElement().getCSld().getSpTree().getSpOrGrpSpOrGraphicFrame(), traverser);
for(org.pptx4j.pml.Pic pic : traverser.pics) {
CTBlipFillProperties blipFill = pic.getBlipFill();
if (blipFill!=null) {
CTBlip blip = blipFill.getBlip();
if (blip.getEmbed()!=null) {
String relId = blip.getEmbed();
// is this the one we want?
if (relId.equals(ird.imageRelId)) {
Part part = slidePart.getRelationshipsPart().getPart(relId);
try {
BinaryPartAbstractImage imagePart = (BinaryPartAbstractImage)part;
// you'll need to ensure that you replace like with like,
// ie png for png, not eg jpeg for png!
imagePart.setBinaryData(ird.replacementImageBytes);
} catch (ClassCastException cce) {
System.out.println(part.getClass().getName());
}
} else {
System.out.println(relId + " isn't a match for this replacement. ");
}
} else {
System.out.println("No a:blip/@r:embed");
}
}
}
}
System.out.println("\n\n saving .. \n\n");
String outputfilepath = System.getProperty("user.dir") + "/OUT_VariableReplace.pptx";
presentationMLPackage.save(new java.io.File(outputfilepath));
System.out.println("\n\n done .. \n\n");
}
static class ImageReplacementDetails {
int slideIndex;
String imageRelId;
byte[] replacementImageBytes;
}
static class SlidePicFinder extends CallbackImpl {
List<org.pptx4j.pml.Pic> pics = new ArrayList<org.pptx4j.pml.Pic>();
public List<Object> apply(Object o) {
if (o instanceof org.pptx4j.pml.Pic) {
pics.add((org.pptx4j.pml.Pic) o);
System.out.println("added pic");
}
return null;
}
}
}
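The sample above swaps an image by overwriting the bytes of its image part, so the slide XML that sets size and position stays intact. Because a .pptx is a ZIP package, the same byte-swap idea can be sketched with java.util.zip alone: copy every entry and substitute new bytes for one part name. This is an illustrative alternative, not docx4j's API; the class name is hypothetical. The like-for-like caveat still applies (PNG for PNG), and every slide that references the same media part will show the new image.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class PptxMediaSwap {

    /**
     * Copies every entry from in to out, writing newBytes instead of the
     * original content for the entry named partName. Returns true if it was found.
     */
    public static boolean replaceEntry(InputStream in, OutputStream out,
                                       String partName, byte[] newBytes) throws IOException {
        boolean replaced = false;
        byte[] buffer = new byte[8192];
        try (ZipInputStream zin = new ZipInputStream(in);
             ZipOutputStream zout = new ZipOutputStream(out)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                // New entry object so the old size/CRC are not reused
                zout.putNextEntry(new ZipEntry(entry.getName()));
                if (entry.getName().equals(partName)) {
                    zout.write(newBytes);
                    replaced = true;
                } else {
                    int n;
                    while ((n = zin.read(buffer)) != -1) {
                        zout.write(buffer, 0, n);
                    }
                }
                zout.closeEntry();
            }
        }
        return replaced;
    }
}
```

For the question's file, in would be opened on PSRAutomationTemplate.pptx, out on a new output file, and partName would be "ppt/media/image10.png" (the part that rId8 targets, assuming it is referenced from a slide).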
I’m currently using PDFBox to read the text of a set of pdfs that I’ve inherited.
I’m only interested in reading the text, not making any changes to the file.
The code that works for most of the files is:
File pdfFile = myPath.toFile();
PDDocument document = PDDocument.load(pdfFile );
Writer sw = new StringWriter();
PDFTextStripper stripper = new PDFTextStripper();
stripper.setStartPage( 1 );
stripper.writeText( document, sw );
String documentText = sw.toString();
For most files, I wind up with the text in the documentText field.
But for 3 of the 24 files, documentText is "\r\n" for the first, "\r\n\r\n" for the second, and "\r\n\r\n\r\n" for the third. The three files are not consecutive; several good files sit between each of them.
The File is derived from a java.nio.Path. The WindowsFileAttribute that is part of the Path has a size of 279K, so the file is not empty on disk.
I can open the file and view the data, and it looks like the other files that my code reads.
I’m using Java 8.0.121, and PDFBox 2.0.4. (this is the latest version, I believe.)
Any suggestions? Is there a better way to read the text? (I’m not interested in the formatting, or fonts used, just the text.)
Thanks.
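A result made up only of line breaks usually means PDFTextStripper found no text objects at all, which is typical of scanned, image-only pages that would need OCR instead; that is a plausible explanation, not a confirmed one. A first step is to list exactly which files come back blank and inspect those. The sketch below does only that; the class name is hypothetical and the extractor function is a placeholder for the PDFBox code in the question.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class BlankTextCheck {

    /** True when the extracted text is null or contains only whitespace and line separators. */
    public static boolean isBlank(String extracted) {
        return extracted == null || extracted.trim().isEmpty();
    }

    /** Applies extractor to each file and returns the files whose text came back blank. */
    public static List<Path> findBlank(List<Path> files, Function<Path, String> extractor) {
        List<Path> blank = new ArrayList<>();
        for (Path file : files) {
            if (isBlank(extractor.apply(file))) {
                blank.add(file);
            }
        }
        return blank;
    }
}
```

With PDFBox 2.0, the extractor could be a lambda that loads the file with PDDocument.load(path.toFile()) in a try-with-resources block, returns new PDFTextStripper().getText(document), and wraps any IOException in an UncheckedIOException.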
Reading multiple PDF docs using pdfbox in java
package readwordfile;
import java.io.BufferedReader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
/**
* This is an example on how to extract words from PDF document
*
* @author saravanan
*/
public class GetWordsFromPDF extends PDFTextStripper {
static List<String> words = new ArrayList<String>();
public GetWordsFromPDF() throws IOException {
}
/**
* @param args
* @throws IOException If there is an error parsing the document.
*/
public static void main(String[] args) throws IOException {
String files;
// FileWriter fs = new FileWriter("C:\\Users\\saravanan\\Desktop\\New Text Document (2).txt");
// FileInputStream fstream1 = new FileInputStream("C:\\Users\\saravanan\\Desktop\\New Text Document (2).txt");
// DataInputStream in1 = new DataInputStream(fstream1);
// BufferedReader br1 = new BufferedReader(new InputStreamReader(in1));
String path = "C:\\Users\\saravanan\\Desktop\\New folder\\"; //local folder path name
File folder = new File(path);
File[] listOfFiles = folder.listFiles();
for (int i = 0; i < listOfFiles.length; i++) {
if (listOfFiles[i].isFile()) {
files = listOfFiles[i].getName();
if (files.endsWith(".pdf") || files.endsWith(".PDF")) {
String nfiles = "C:\\Users\\saravanan\\Desktop\\New folder\\";
String fileName1 = nfiles + files;
System.out.print("\n\n" + files+"\n");
PDDocument document = null;
try {
document = PDDocument.load(new File(fileName1));
PDFTextStripper stripper = new GetWordsFromPDF();
stripper.setSortByPosition(true);
stripper.setStartPage(1); // PDFTextStripper page numbers start at 1
stripper.setEndPage(document.getNumberOfPages());
Writer dummy = new OutputStreamWriter(new ByteArrayOutputStream());
stripper.writeText(document, dummy);
int x = 0;
System.out.println("");
for (String word : words) {
if (word.startsWith("xxxxxx")) { //here you can give your pdf doc starting word
x = 1;
}
if (x == 1) {
if (!(word.endsWith("YYYYYY"))) { //here you can give your pdf doc ending word
System.out.print(word + " ");
// fs.write(word);
} else {
x = 0;
break;
}
}
}
} finally {
if (document != null) {
document.close();
words.clear();
}
}
}
}
}
}
/**
* Override the default functionality of PDFTextStripper.writeString()
*
* @param str
* @param textPositions
* @throws java.io.IOException
*/
@Override
protected void writeString(String str, List<TextPosition> textPositions) throws IOException {
String[] wordsInStream = str.split(getWordSeparator());
if (wordsInStream != null) {
for (String word : wordsInStream) {
words.add(word); //store the pdf content into the List
}
}
}
}
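The start/end marker loop in main() can be pulled out into a pure method, which makes it easy to test without a PDF. The sketch below mirrors that loop: collection starts at the first word beginning with the start marker (that word included) and stops, without including it, at the first word from there on that ends with the end marker. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class MarkerWindow {

    /** Returns the words between the start marker (inclusive) and the end marker (exclusive). */
    public static List<String> between(List<String> words, String startPrefix, String endSuffix) {
        List<String> out = new ArrayList<>();
        boolean inside = false;
        for (String word : words) {
            if (word.startsWith(startPrefix)) {
                inside = true; // same as setting x = 1 in the original loop
            }
            if (inside) {
                if (word.endsWith(endSuffix)) {
                    break; // the end-marker word itself is not collected
                }
                out.add(word);
            }
        }
        return out;
    }
}
```

Inside main(), the printing loop would then become a call to between(words, "xxxxxx", "YYYYYY") followed by printing the returned list.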
I've been looking for an easy way to add IDs to HTML tags, and I spent a few hours jumping from one tool to another before I came up with this little test that solved my issue. Since my sprint backlog is almost empty, I have some time to share it. Feel free to clean it up, and enjoy it if QA asks you to add IDs. Just change the tag and path, and run it :)
I had some trouble writing a proper lambda here due to lack of coffee today...
How do I replace only the first occurrence, in a single lambda? My files had many lines with the same tags.
private void replace(String path, String replace, String replaceWith) {
try (Stream<String> lines = Files.lines(Paths.get(path))) {
List<String> replaced = lines
.map(line -> line.replace(replace, replaceWith))
.collect(Collectors.toList());
Files.write(Paths.get(path), replaced);
} catch (IOException e) {
e.printStackTrace();
}
}
The code above replaced matches on every line, because the text to replace also appeared on later lines. A proper Matcher-based replace with an auto-increment inside this method body would be better than preparing the replaceWith value before the call. If I ever need this again, I'll post an improved version.
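On the "first occurrence only" question: instead of a stateful lambda over a line stream, one option is to work on the whole file content, where String.replaceFirst touches only the first match. Because replaceFirst takes a regex, the sketch quotes both the target and the replacement so characters such as $ or ( are treated literally. The class and method names are hypothetical, and UTF-8 is assumed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceFirst {

    /** Replaces only the first literal occurrence of target in text. */
    public static String replaceFirstLiteral(String text, String target, String replacement) {
        return text.replaceFirst(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    /** Reads the whole file, replaces the first occurrence, and writes it back. */
    public static void replaceFirstInFile(Path path, String target, String replacement) throws IOException {
        String content = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        String updated = replaceFirstLiteral(content, target, replacement);
        Files.write(path, updated.getBytes(StandardCharsets.UTF_8));
    }
}
```

Reading the file as bytes also keeps its original line endings, which Files.lines plus Files.write would normalize.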
Final version to not waste more time (phase green):
import org.junit.Test;
import org.junit.runner.RunWith;
import org.mockito.runners.MockitoJUnitRunner;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
@RunWith(MockitoJUnitRunner.class)
public class RepalceInFilesWithAutoIncrement {
private int incremented = 100;
/**
* The tag you would like to add Id to
* */
private static final String tag = "label";
/**
* Regex to find the tag
* */
private static final Pattern TAG_REGEX = Pattern.compile("<" + tag + " (.+?)/>", Pattern.DOTALL);
private static final Pattern ID_REGEX = Pattern.compile("id=", Pattern.DOTALL);
@Test
public void replaceInFiles() throws IOException {
String nextId = " id=\"" + tag + "_%s\" ";
String path = "C:\\YourPath";
try (Stream<Path> paths = Files.walk(Paths.get(path))) {
paths.forEach(filePath -> {
if (Files.isRegularFile(filePath)) {
try {
List<String> foundInFiles = getTagValues(readFile(filePath.toAbsolutePath().toString()));
if (!foundInFiles.isEmpty()) {
for (String tagEl : foundInFiles) {
incremented++;
String id = String.format(nextId, incremented);
String replace = tagEl.split("\\r?\\n")[0];
replace = replace.replace("<" + tag, "<" + tag + id);
replace(filePath.toAbsolutePath().toString(), tagEl.split("\\r?\\n")[0], replace, false);
}
}
} catch (IOException e) {
e.printStackTrace();
}
}
});
}
System.out.println(String.format("Finished with (%s) changes", incremented - 100));
}
private String readFile(String path)
throws IOException {
byte[] encoded = Files.readAllBytes(Paths.get(path));
return new String(encoded, StandardCharsets.UTF_8);
}
private List<String> getTagValues(final String str) {
final List<String> tagValues = new ArrayList<>();
final Matcher matcher = TAG_REGEX.matcher(str);
while (matcher.find()) {
if (!ID_REGEX.matcher(matcher.group()).find())
tagValues.add(matcher.group());
}
return tagValues;
}
private void replace(String path, String replace, String replaceWith, boolean log) {
if (log) {
System.out.println("path = [" + path + "], replace = [" + replace + "], replaceWith = [" + replaceWith + "], log = [" + log + "]");
}
try (Stream<String> lines = Files.lines(Paths.get(path))) {
List<String> replaced = new ArrayList<>();
boolean alreadyReplaced = false;
for (String line : lines.collect(Collectors.toList())) {
if (line.contains(replace) && !alreadyReplaced) {
line = line.replace(replace, replaceWith);
alreadyReplaced = true;
}
replaced.add(line);
}
Files.write(Paths.get(path), replaced);
} catch (IOException e) {
e.printStackTrace();
}
}
}
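The "Matcher with an auto-increment" idea mentioned earlier can be done in a single pass over each file's text with Matcher.appendReplacement, so each ID is computed at the moment its tag is matched. This sketch keeps the same assumptions as the test above: self-closing tags matched by "<tag (.+?)/>", and any tag already containing "id=" left alone (which, like ID_REGEX, also skips attributes such as data-id=). The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AutoIdTagger {

    /** Adds id="<tag>_N" to each self-closing tag without an id, numbering from firstId upward. */
    public static String addIds(String html, String tag, int firstId) {
        Pattern tagPattern = Pattern.compile("<" + Pattern.quote(tag) + " (.+?)/>", Pattern.DOTALL);
        Matcher m = tagPattern.matcher(html);
        StringBuffer out = new StringBuffer(); // appendReplacement requires StringBuffer on Java 8
        int next = firstId;
        while (m.find()) {
            String element = m.group();
            String replacement = element.contains("id=")
                    ? element // already has an id: keep it unchanged
                    : "<" + tag + " id=\"" + tag + "_" + (next++) + "\" " + m.group(1) + "/>";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

In the test above, the counter starts at 100 and is pre-incremented, so passing 101 as firstId gives the same numbering. For real HTML, a parser such as Jsoup is more robust than this regex.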
Try it with Jsoup.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class JsoupTest {
public static void main(String argv[]) {
String html = "<html><head><title>Try it with Jsoup</title></head>"
+ "<body><p>P first</p><p>P second</p><p>P third</p></body></html>";
Document doc = Jsoup.parse(html);
Elements ps = doc.select("p"); // The tag you would like to add Id to
int i = 12;
for(Element p : ps){
p.attr("id",String.valueOf(i));
i++;
}
System.out.println(doc.toString());
}
}
Currently I'm trying to convert a PDF to PDF/A.
However, I don't know whether I can convert the color space. Is there any way to do so?
This is my code so far:
PDDocumentInformation info = doc.getDocumentInformation();
System.out.println("Page Count=" + doc.getNumberOfPages());
System.out.println("Title=" + info.getTitle());
System.out.println("Author=" + info.getAuthor());
System.out.println("Subject=" + info.getSubject());
System.out.println("Keywords=" + info.getKeywords());
System.out.println("Creator=" + info.getCreator());
System.out.println("Producer=" + info.getProducer());
System.out.println("Creation Date=" + info.getCreationDate());
System.out.println("Modification Date=" + info.getModificationDate());
System.out.println("Trapped=" + info.getTrapped());
PDDocumentCatalog cat = doc.getDocumentCatalog();
XMPMetadata xmp = XMPMetadata.createXMPMetadata();
PDFAIdentificationSchema pdfaid = xmp.createAndAddPFAIdentificationSchema();
pdfaid.setConformance("A");
pdfaid.setPart(3);
pdfaid.setAboutAsSimple(null);
DublinCoreSchema dublinCoreSchema = xmp.createAndAddDublinCoreSchema();
dublinCoreSchema.setTitle(info.getTitle());
dublinCoreSchema.addCreator(info.getAuthor());
AdobePDFSchema adobePDFSchema = xmp.createAndAddAdobePDFSchema();
adobePDFSchema.setProducer(info.getProducer());
XMPBasicSchema xmpBasicSchema = xmp.createAndAddXMPBasicSchema();
xmpBasicSchema.setCreatorTool(info.getCreator());
xmpBasicSchema.setCreateDate(info.getCreationDate());
xmpBasicSchema.setModifyDate(info.getModificationDate());
xmp.addSchema(pdfaid);
XmpSerializer serializer = new XmpSerializer();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
serializer.serialize(xmp, baos, true);
InputStream colorProfile = PdfConverter.class.getResourceAsStream("/sRGBColorSpaceProfile.icm");
PDOutputIntent oi = new PDOutputIntent(doc, colorProfile);
oi.setInfo("sRGB IEC61966-2.1");
oi.setOutputCondition("sRGB IEC61966-2.1");
oi.setOutputConditionIdentifier("sRGB IEC61966-2.1");
oi.setRegistryName("http://www.color.org");
cat.addOutputIntent(oi);
PDMetadata metadata = new PDMetadata(doc);
metadata.importXMPMetadata(baos.toByteArray());
cat.setMetadata(metadata);
The color profile gets added; however, on validation I get:
2.3.2 : Unexpected key in Graphic object definition, The ColorSpace is unknown
It appears many times, for seemingly every page or element.
Is there anything I can do about it? For example, converting the color space, or using another library?
I found this trick to convert a PDF to PDF/A:
1. Fill the PDF form.
2. Convert it to an image.
3. Create a valid PDF/A document as explained on the PDFBox website.
4. Draw the resulting image into it.
In this example, I used OoPdfFormExample.pdf, which is easy to find on the internet.
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDMetadata;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;
import org.apache.pdfbox.pdmodel.graphics.color.PDOutputIntent;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.preflight.Format;
import org.apache.pdfbox.preflight.PreflightDocument;
import org.apache.pdfbox.preflight.ValidationResult;
import org.apache.pdfbox.preflight.exception.SyntaxValidationException;
import org.apache.pdfbox.preflight.parser.PreflightParser;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.apache.pdfbox.tools.imageio.ImageIOUtil;
import org.apache.xmpbox.XMPMetadata;
import org.apache.xmpbox.schema.DublinCoreSchema;
import org.apache.xmpbox.schema.PDFAIdentificationSchema;
import org.apache.xmpbox.type.BadFieldValueException;
import org.apache.xmpbox.xml.XmpSerializer;
import javax.xml.transform.TransformerException;
import java.awt.image.BufferedImage;
import java.io.*;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Random;
public class CreatePDFAFile {
private static final String OUTPUT_DIR = "tmp";
static String separator = FileSystems.getDefault().getSeparator();
public static void main(String[] args) throws IOException {
Path tmpDir = getRandomPath();
String fileInput = fillForm("template/OoPdfFormExample.pdf", tmpDir);
String image = PDF2Image(fileInput, tmpDir);
String pdfa = createPDFA(image, tmpDir);
checkPDFAValidation(pdfa);
}
private static String fillForm(String formTemplate, Path path) throws IOException {
String fileOut = path + separator + "FillForm.pdf";
try (PDDocument pdfDocument = PDDocument.load(new File(formTemplate))) {
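PDAcroForm acroForm = pdfDocument.getDocumentCatalog().getAcroForm();
// Only touch the form if the template has one; otherwise refreshAppearances() would throw a NullPointerException
if (acroForm != null) {
acroForm.getField(acroForm.getFields().get(0).getFullyQualifiedName()).setValue("TEST");
// regenerate the field appearances, then flatten them into the page content
acroForm.refreshAppearances();
acroForm.flatten();
}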
pdfDocument.save(fileOut);
}
return fileOut;
}
public static String PDF2Image(String fileInput, Path path) {
String fileName = "";
try (final PDDocument document = PDDocument.load(new File(fileInput))) {
PDFRenderer pdfRenderer = new PDFRenderer(document);
for (int page = 0; page < document.getNumberOfPages(); ++page) {
BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
fileName = path + separator + "image-" + page + ".png";
ImageIOUtil.writeImage(bim, fileName, 300);
}
} catch (IOException e) {
System.err.println("Exception while trying to create pdf document - " + e);
}
return fileName;
}
public static String createPDFA(String imagePath, Path path) throws IOException {
try (PDDocument doc = new PDDocument()) {
PDPage page = new PDPage();
doc.addPage(page);
PDFont font = PDType0Font.load(doc, new File("template" + separator + "LiberationSans-Regular.ttf"));
if (!font.isEmbedded()) {
throw new IllegalStateException("PDF/A compliance requires that all fonts used for"
+ " text rendering in rendering modes other than rendering mode 3 are embedded.");
}
try (PDPageContentStream contents = new PDPageContentStream(doc, page)) {
contents.beginText();
contents.setFont(font, 12);
contents.newLineAtOffset(100, 700);
contents.showText("");
contents.endText();
}
// add XMP metadata
XMPMetadata xmp = XMPMetadata.createXMPMetadata();
String fileName = path + separator + "FinalPDFAFile.pdf";
try {
DublinCoreSchema dc = xmp.createAndAddDublinCoreSchema();
dc.setTitle(fileName);
PDFAIdentificationSchema id = xmp.createAndAddPFAIdentificationSchema();
id.setPart(1);
id.setConformance("B");
XmpSerializer serializer = new XmpSerializer();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
serializer.serialize(xmp, baos, true);
PDMetadata metadata = new PDMetadata(doc);
metadata.importXMPMetadata(baos.toByteArray());
doc.getDocumentCatalog().setMetadata(metadata);
} catch (BadFieldValueException | TransformerException e) {
throw new IllegalArgumentException(e);
}
// sRGB output intent; try-with-resources closes the ICC profile stream after PDOutputIntent has copied it
try (InputStream colorProfile = new FileInputStream(new File("template/sRGB.icc"))) {
PDOutputIntent intent = new PDOutputIntent(doc, colorProfile);
intent.setInfo("");
intent.setOutputCondition("");
intent.setOutputConditionIdentifier("");
intent.setRegistryName("");
doc.getDocumentCatalog().addOutputIntent(intent);
}
PDImageXObject pdImage = PDImageXObject.createFromFile(imagePath, doc);
try (PDPageContentStream contentStream = new PDPageContentStream(doc, page, PDPageContentStream.AppendMode.APPEND, true, true)) {
float scale = 1 / 5f;
contentStream.drawImage(pdImage, 20, 20, pdImage.getWidth() * scale, pdImage.getHeight() * scale);
}
doc.save(fileName);
return fileName;
}
}
private static void checkPDFAValidation(String fileName) throws IOException {
ValidationResult result = null;
PreflightParser parser = new PreflightParser(fileName);
try {
parser.parse(Format.PDF_A1B);
PreflightDocument document = parser.getPreflightDocument();
document.validate();
// Get validation result
result = document.getResult();
document.close();
} catch (SyntaxValidationException e) {
result = e.getResult();
}
if (result.isValid()) {
System.out.println("The file " + fileName + " is a valid PDF/A-1b file");
} else {
System.out.println("The file " + fileName + " is not valid, error(s):");
for (ValidationResult.ValidationError error : result.getErrorsList()) {
System.out.println(error.getErrorCode() + " : " + error.getDetails());
}
}
}
private static Path getRandomPath() throws IOException {
String path = generateRandom();
Path tmpDir = Paths.get(OUTPUT_DIR + separator + path + separator);
Files.createDirectory(tmpDir);
return tmpDir;
}
private static String generateRandom() {
String aToZ = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
Random rand = new Random();
StringBuilder res = new StringBuilder();
for (int i = 0; i < 17; i++) {
int randIndex = rand.nextInt(aToZ.length());
res.append(aToZ.charAt(randIndex));
}
return res.toString();
}
}
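The answer draws the rendered page image with float scale = 1 / 5f. Since PDF user space units are 1/72 inch, an image rendered at 300 DPI maps back to its source page size at 72/300 = 0.24 points per pixel, so 0.2 draws it somewhat smaller than the original page. The sketch below computes the scale instead; the class and method names are hypothetical, and only the 72-units-per-inch convention comes from the PDF specification.

```java
public class PageFit {

    /** Points per pixel for an image rendered at the given DPI (PDF units are 1/72 inch). */
    public static float pointsPerPixel(float dpi) {
        return 72f / dpi;
    }

    /** Largest uniform scale at which a widthPx x heightPx image fits inside boxWidth x boxHeight points. */
    public static float fitScale(int widthPx, int heightPx, float boxWidth, float boxHeight) {
        return Math.min(boxWidth / widthPx, boxHeight / heightPx);
    }
}
```

In createPDFA, the fixed scale could be replaced by fitScale(pdImage.getWidth(), pdImage.getHeight(), page.getMediaBox().getWidth() - 40, page.getMediaBox().getHeight() - 40), which keeps the image inside the existing 20-point margin on every side.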
I am having trouble parsing some HTML files from a directory into an output directory. I'm using Jsoup to remove the HTML tags and writing the result to an output directory, but some of the data is lost when I test it. In the end, I want to use the parsed files to populate a HashMap so that I can sort the words by frequency and then, in a separate directory, sort them alphabetically. This compiles and runs, but I get stuck at the very end when it comes to writing the output. Code would be lovely, but I'm mainly interested in the steps needed to set this whole thing up. Thank you.
Update: Here is the code.
Update: I also feel like Jsoup is getting rid of data.
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Scanner;
import org.jsoup.Jsoup;
public class Parser {
public static File infolder = new File("input folder folder path goes here");
static String temp = "";
static ArrayList<String> list = new ArrayList<String>();
public static void main(String[] args) throws FileNotFoundException
{
String outfolder = "output folder path goes here";
File theDir = new File(outfolder);
// if the directory does not exist, create it
if (!theDir.exists()) {
System.out.println("creating directory: " + outfolder);
boolean result = theDir.mkdir();
if (result) {
System.out.println("DIR created");
}
}
System.out.println("Reading files under the folder " + infolder.getAbsolutePath());
parseFiles(infolder);
// System.out.println();
}
public static void parseFiles(final File folder) throws FileNotFoundException
{
PrintWriter out = null;
for (final File fileEntry : folder.listFiles()) {
if (fileEntry.isFile()) {
temp = fileEntry.getName();
if ((temp.substring(temp.lastIndexOf('.') + 1, temp.length()).toLowerCase()).equals("html")) {
System.out.println("File= " + folder.getAbsolutePath() + "\\" + fileEntry.getName());
File file = new File(folder.getAbsolutePath() + "\\" + fileEntry.getName());
ArrayList<String> filetext = new ArrayList<String>();
Scanner in = new Scanner(file);
while (in.hasNextLine()) {
filetext.add(in.nextLine());
}
String filename = "tokenfile" + fileEntry.getName();
try {
out = new PrintWriter(new BufferedWriter(new FileWriter("C:/Users/bounty213/Desktop/Output/" + filename + ".txt", true)));
}
catch (IOException e) {
//exception handling left as an exercise for the reader
}
String parsed;
for (String word : filetext) {
parsed = Jsoup.parse(word).text();
System.out.println(parsed);
out.println(parsed);
}
out.close();
}
}
}
}
}
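One possible set of steps: (1) read each .html file into a single String rather than line by line, since handing Jsoup one line at a time means a tag split across lines is parsed as a broken fragment, which may explain some of the lost data; (2) call Jsoup.parse(html).text() once per file; (3) count words in a HashMap; (4) write one view sorted by frequency and one sorted alphabetically. The sketch covers steps 3 and 4 with the JDK only. The class name and the tokenization rule (letters, digits and apostrophes, lower-cased) are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class WordCounts {

    /** Counts lower-cased tokens, splitting on anything that is not a letter, digit or apostrophe. */
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}']+")) {
            if (!token.isEmpty()) { // a leading separator yields one empty token
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Alphabetical view: a TreeMap keeps its keys sorted. */
    public static Map<String, Integer> alphabetical(Map<String, Integer> counts) {
        return new TreeMap<>(counts);
    }

    /** Frequency view: highest count first, ties broken alphabetically. */
    public static List<Map.Entry<String, Integer>> byFrequency(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries;
    }
}
```

Each view can then be written to its own output directory with a PrintWriter, one "word count" line per entry.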
Some questions about using ZXing...
I wrote the following code to read a barcode from an image:
public class BarCodeDecode
{
/**
* @param args
*/
public static void main(String[] args)
{
try
{
String tmpImgFile = "D:\\FormCode128.TIF";
Map<DecodeHintType,Object> tmpHintsMap = new EnumMap<DecodeHintType, Object>(DecodeHintType.class);
tmpHintsMap.put(DecodeHintType.TRY_HARDER, Boolean.TRUE);
tmpHintsMap.put(DecodeHintType.POSSIBLE_FORMATS, EnumSet.allOf(BarcodeFormat.class));
tmpHintsMap.put(DecodeHintType.PURE_BARCODE, Boolean.FALSE);
File tmpFile = new File(tmpImgFile);
String tmpRetString = BarCodeUtil.decode(tmpFile, tmpHintsMap);
//String tmpRetString = BarCodeUtil.decode(tmpFile, null);
System.out.println(tmpRetString);
}
catch (Exception tmpExpt)
{
System.out.println("main: " + "Excpt err! (" + tmpExpt.getMessage() + ")");
}
System.out.println("main: " + "Program end.");
}
}
public class BarCodeUtil
{
private static BarcodeFormat DEFAULT_BARCODE_FORMAT = BarcodeFormat.CODE_128;
/**
* Decode method used to read image or barcode itself, and recognize the barcode,
* get the encoded contents and returns it.
* @param whatFile image that needs to be read.
* @param whatHints hints used when reading the barcode.
* @return decoded results from barcode.
*/
public static String decode(File whatFile, Map<DecodeHintType, Object> whatHints) throws Exception
{
// check the required parameters
if (whatFile == null || whatFile.getName().trim().isEmpty())
throw new IllegalArgumentException("File not found, or invalid file name.");
BufferedImage tmpBfrImage;
try
{
tmpBfrImage = ImageIO.read(whatFile);
}
catch (IOException tmpIoe)
{
throw new Exception(tmpIoe.getMessage());
}
if (tmpBfrImage == null)
throw new IllegalArgumentException("Could not decode image.");
LuminanceSource tmpSource = new BufferedImageLuminanceSource(tmpBfrImage);
BinaryBitmap tmpBitmap = new BinaryBitmap(new HybridBinarizer(tmpSource));
MultiFormatReader tmpBarcodeReader = new MultiFormatReader();
Result tmpResult;
String tmpFinalResult;
try
{
if (whatHints != null && ! whatHints.isEmpty())
tmpResult = tmpBarcodeReader.decode(tmpBitmap, whatHints);
else
tmpResult = tmpBarcodeReader.decode(tmpBitmap);
// setting results.
tmpFinalResult = String.valueOf(tmpResult.getText());
}
catch (Exception tmpExcpt)
{
throw new Exception("BarCodeUtil.decode Excpt err - " + tmpExcpt.toString() + " - " + tmpExcpt.getMessage());
}
return tmpFinalResult;
}
}
I tried to read the following two images, which contain a Code 128 barcode and a QR code.
It works for the Code 128 barcode but not for the QR code.
Does anyone know why?
Please go through this link for the complete tutorial. The author of this code is Joe. I did not write it; I am copying it here so it stays available in case the link breaks.
The author uses ZXing (Zebra Crossing library), which you can download from here for this tutorial.
QR Code Write and Read Program in Java:
package com.javapapers.java;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import javax.imageio.ImageIO;
import com.google.zxing.BarcodeFormat;
import com.google.zxing.BinaryBitmap;
import com.google.zxing.EncodeHintType;
import com.google.zxing.MultiFormatReader;
import com.google.zxing.MultiFormatWriter;
import com.google.zxing.NotFoundException;
import com.google.zxing.Result;
import com.google.zxing.WriterException;
import com.google.zxing.client.j2se.BufferedImageLuminanceSource;
import com.google.zxing.client.j2se.MatrixToImageWriter;
import com.google.zxing.common.BitMatrix;
import com.google.zxing.common.HybridBinarizer;
import com.google.zxing.qrcode.decoder.ErrorCorrectionLevel;
public class QRCode {
public static void main(String[] args) throws WriterException, IOException,
NotFoundException {
String qrCodeData = "Hello World!";
String filePath = "QRCode.png";
String charset = "UTF-8"; // or "ISO-8859-1"
Map<EncodeHintType, ErrorCorrectionLevel> hintMap = new HashMap<EncodeHintType, ErrorCorrectionLevel>();
hintMap.put(EncodeHintType.ERROR_CORRECTION, ErrorCorrectionLevel.L);
createQRCode(qrCodeData, filePath, charset, hintMap, 200, 200);
System.out.println("QR Code image created successfully!");
System.out.println("Data read from QR Code: "
+ readQRCode(filePath, charset, hintMap));
}
public static void createQRCode(String qrCodeData, String filePath,
String charset, Map hintMap, int qrCodeheight, int qrCodewidth)
throws WriterException, IOException {
BitMatrix matrix = new MultiFormatWriter().encode(
new String(qrCodeData.getBytes(charset), charset),
BarcodeFormat.QR_CODE, qrCodewidth, qrCodeheight, hintMap);
MatrixToImageWriter.writeToFile(matrix, filePath.substring(filePath
.lastIndexOf('.') + 1), new File(filePath));
}
public static String readQRCode(String filePath, String charset, Map hintMap)
throws FileNotFoundException, IOException, NotFoundException {
BinaryBitmap binaryBitmap = new BinaryBitmap(new HybridBinarizer(
new BufferedImageLuminanceSource(
ImageIO.read(new FileInputStream(filePath)))));
Result qrCodeResult = new MultiFormatReader().decode(binaryBitmap,
hintMap);
return qrCodeResult.getText();
}
}
Maven dependency for the ZXing QR Code library:
<dependency>
<groupId>com.google.zxing</groupId>
<artifactId>core</artifactId>
<version>2.2</version>
</dependency>
<dependency>
<groupId>com.google.zxing</groupId>
<artifactId>javase</artifactId>
<version>2.2</version>
</dependency>
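Curiously, your code works for me, but I had to remove the following hint:
tmpHintsMap.put(DecodeHintType.PURE_BARCODE, Boolean.FALSE);
When my image is not a pure barcode, this hint broke the result. This is likely because ZXing only checks whether the PURE_BARCODE key is present, not its value (the DecodeHintType documentation says it doesn't matter what it maps to), so even Boolean.FALSE switches the QR code reader into pure-barcode mode.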
Thank you!
This code worked for me.
public static List<string> ScanForBarcodes(string path)
{
return ScanForBarcodes(new Bitmap(path));
}
public static List<string> ScanForBarcodes(Bitmap bitmap)
{
// initialize a new Barcode reader.
BarcodeReader reader = new BarcodeReader
{
TryHarder = true, // TryHarder is slower but recognizes more Barcodes
PossibleFormats = new List<BarcodeFormat> // ZXing's BarcodeFormat enum lists all supported barcode formats
{
BarcodeFormat.CODE_128,
BarcodeFormat.QR_CODE,
//BarcodeFormat. ... ;
}
};
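var results = reader.DecodeMultiple(bitmap); // can be null when no barcode is found
if (results == null) return new List<string>();
return results.Select(result => result.Text).ToList(); // return only the barcode strings
// If you want the full Result objects, use: return reader.DecodeMultiple(bitmap);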
}
Did you use this (ZXing.Net) library?