Java - Variable number of variables

I wrote code to find all URLs within a PDF file and replace the one(s) that match the parameters passed from a PHP script.
It works fine when a single URL is passed, but I don't know how to handle more than one URL. I'm guessing I need a loop that reads the array length and calls the changeURL method with the correct parameters.
I actually made it work with if statements (if myarray.length < 4 do this, if it is < 6 do that, if < 8 ...), but I'm guessing this is not the optimal way, so I removed it and want to try something else.
Parameters passed from PHP (in this order):
args[0] - Location of original PDF
args[1] - Location of new PDF
args[2] - URL 1 (URL to be changed)
args[3] - URL 1a (URL that will replace URL 1)
args[4] - URL 2 (URL to be changed)
args[5] - URL 2a (URL that will replace URL 2)
args...
and so on... up to maybe around 16 args, depending on how many URLs the PDF file contains.
Here's the code:
Main.java
public class Main {
public static void main(String[] args) {
if (args.length >= 4) {
URLReplacer.changeURL(args);
} else {
System.out.println("PARAMETER MISSING FROM PHP");
}
}
}
URLReplacer.java
public class URLReplacer {
public static void changeURL(String... a) {
try (PDDocument doc = PDDocument.load(a[0])) {
List<?> allPages = doc.getDocumentCatalog().getAllPages();
for (int i = 0; i < allPages.size(); i++) {
PDPage page = (PDPage) allPages.get(i);
List annotations = page.getAnnotations();
for (int j = 0; j < annotations.size(); j++) {
PDAnnotation annot = (PDAnnotation) annotations.get(j);
if (annot instanceof PDAnnotationLink) {
PDAnnotationLink link = (PDAnnotationLink) annot;
PDAction action = link.getAction();
if (action instanceof PDActionURI) {
PDActionURI uri = (PDActionURI) action;
String oldURL = uri.getURI();
if (a[2].equals(oldURL)) {
//System.out.println("Page " + (i + 1) + ": Replacing " + oldURL + " with " + a[3]);
uri.setURI(a[3]);
}
}
}
}
}
doc.save(a[1]);
} catch (IOException | COSVisitorException e) {
e.printStackTrace();
}
}
}
I have tried all sorts of loops, but with my limited Java skills I did not have any success.
Also, if you notice any dodgy code, kindly let me know so I can learn the best practices from more experienced programmers.

Your main problem, as I understand it, is the "variable number of variables", and you have to send them from PHP to Java. There are two ways:
1. You can transmit them one by one, as in your example.
2. Or you can send them in a structure; there are several structures to choose from.
JSON is rather simple in PHP; there are multiple examples here: encode json using php?
For Java, see: Decoding JSON String in Java.
There are other formats too (like XML), but that seems too complex for this.

I'd structure your method to accept specific parameters. I used a Map for the URLs; a custom object would be another option.
Also notice how the loops have changed; it might give you some hints on Java idioms.
public static void changeURL(String originalPdf, String targetPdf, Map<String, String> urls) {
try (PDDocument doc = PDDocument.load(originalPdf)) {
List<PDPage> allPages = doc.getDocumentCatalog().getAllPages();
for (PDPage page : allPages) {
for (PDAnnotation annot : page.getAnnotations()) {
if (annot instanceof PDAnnotationLink) {
PDAnnotationLink link = (PDAnnotationLink) annot;
PDAction action = link.getAction();
if (action instanceof PDActionURI) {
PDActionURI uri = (PDActionURI) action;
// look the old URL up directly instead of looping over every map entry
String newURL = urls.get(uri.getURI());
if (newURL != null) {
uri.setURI(newURL);
}
}
}
}
}
doc.save(targetPdf);
} catch (IOException | COSVisitorException e) {
e.printStackTrace();
}
}
If you have to get the URL and PDF locations from the command line, call changeURL like this:
public static void main(String[] args) {
// two PDF paths followed by complete (old URL, new URL) pairs
if (args.length >= 4 && args.length % 2 == 0) {
String originalPdf = args[0];
String targetPdf = args[1];
Map<String, String> urls = new HashMap<>();
for (int i = 2; i < args.length; i += 2) {
urls.put(args[i], args[i + 1]);
}
URLReplacer.changeURL(originalPdf, targetPdf, urls);
} else {
System.out.println("PARAMETER MISSING FROM PHP");
}
}
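If PHP ever passes an odd number of URL arguments, the last old URL has no replacement partner, so it is worth validating the count before pairing. A minimal sketch of just the pairing step, with an explicit error instead of an ArrayIndexOutOfBoundsException; the class and method names (UrlArgs, toReplacementMap) are hypothetical, and a LinkedHashMap is used only so the pairs keep the order PHP sent them in:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: turns "in.pdf out.pdf old1 new1 old2 new2 ..." into old -> new pairs. */
class UrlArgs {

    /** Pairs args[2], args[3], ... into old-URL -> new-URL entries; rejects incomplete input. */
    static Map<String, String> toReplacementMap(String[] args) {
        if (args.length < 4 || args.length % 2 != 0) {
            throw new IllegalArgumentException(
                "Expected: original.pdf target.pdf (oldUrl newUrl)+ but got " + args.length + " arguments");
        }
        // LinkedHashMap keeps the pairs in the order they were passed
        Map<String, String> urls = new LinkedHashMap<>();
        for (int i = 2; i < args.length; i += 2) {
            urls.put(args[i], args[i + 1]);
        }
        return urls;
    }

    public static void main(String[] args) {
        String[] example = {"in.pdf", "out.pdf", "http://a.example", "http://b.example",
                            "http://c.example", "http://d.example"};
        System.out.println(toReplacementMap(example));
        // {http://a.example=http://b.example, http://c.example=http://d.example}
    }
}
```

The resulting map can be passed straight to the Map-based changeURL shown above.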

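Off the top of my head, you could do something like this. It reuses your existing varargs changeURL once per URL pair; after the first call, args[0] is set to args[1], so each later call opens and re-saves the PDF written by the previous one: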
public static void main(String[] args) {
if (args.length >= 4 && args.length % 2 == 0) {
for(int i = 2; i < args.length; i += 2) {
URLReplacer.changeURL(args[0], args[1], args[i], args[i+1]);
args[0] = args[1];
}
} else {
System.out.println("PARAMETER MISSING FROM PHP");
}
}

Related

System.out.format not working in java for loop

Below is the relevant code. Basically, the code is supposed to output the URL, name and version of each GitHub release defined by GetUpdateInfo.getInfo().
GetUpdateInfo.getInfo (NOTE: the strings login, oauth and password are omitted for security reasons):
import java.io.IOException;
import java.util.List;
import org.kohsuke.github.*;
import org.apache.commons.lang3.ArrayUtils;
public class GetUpdateInfo {
public static UpdateInfo getInfo() throws IOException {
String version = "";
String url = "";
String[] urls = {};
String[] names = {};
String[] versions = {};
String[] releases = {};
GitHub github = GitHubBuilder.fromEnvironment(login, password, oauth).build();
//Get the repo name from the organization
GHOrganization gho = github.getOrganization("NuovoLauncher-Mods");
PagedIterable<GHRepository> repos = gho.listRepositories();
List<GHRepository> repos_list = repos.asList();
for(int i=0; i < repos_list.size(); i++) {
GHRepository repo_test = repos_list.get(i);
GHRelease latest = repo_test.getLatestRelease();
ArrayUtils.add(releases, latest.toString());
ArrayUtils.add(names, latest.getName());
ui.setName(names);
ui.setRelease(releases);
List<GHAsset> assets = latest.getAssets();
for( int x = 0; x < assets.size(); x++ ) {
GHAsset asset = assets.get(x);
url = asset.getBrowserDownloadUrl();
version = url.split("/")[7];
System.out.format("URL: %s, Name: %s, Latest Release: %s. Version %s\n", url, latest.getName(), latest, version);
ArrayUtils.add(urls, url);
ArrayUtils.add(versions, version);
ui.setURL(urls);
ui.setVersion(versions);
}
}
return ui;
}
public static void main(String[] args) throws Exception {
GetUpdateInfo.getInfo();
}
}
DownloadUpdate.runner:
public static void runner() throws Exception {
String system_type = System.getProperty("os.name");
File fpath = new File("");
UpdateInfo ui = GetUpdateInfo.getInfo();
for(int i = 0; i < ui.getName().length; i++) {
System.out.format("URL: %s, Name: %s, Version: %s%n", ui.getURL()[i], ui.getName()[i], ui.getVersion()[i]);
System.out.format("Downloading %s-%s%n", ui.getName()[i], ui.getVersion()[i]);
// os.name is e.g. "Windows 10", so compare with startsWith rather than equals
if(system_type.startsWith("Windows")) {
fpath = new File(System.getProperty("user.home") + "/AppData/Roaming/.minecraft/mods/" + ui.getName()[i] + "-" + ui.getVersion()[i] + ".jar");
} else {
// Linux and everything else use ~/.minecraft/mods
fpath = new File(System.getProperty("user.home") + "/.minecraft/mods/" + ui.getName()[i] + "-" + ui.getVersion()[i] + ".jar");
}
String url = ui.getURL()[i];
FileUtils.copyURLToFile(new URL(url), fpath);
}
}
public static void main(String[] args) throws Exception {
System.out.println("DEBUG START");
DownloadUpdate.runner();
}
}
Looking at the code, I cannot see why it is not producing the expected output; I get no output on the console, only the line stating that the code is being executed. No exceptions either.
EDIT: the variable ui is not being returned properly. For example, ui.getName()[0] throws an ArrayIndexOutOfBoundsException because the array length is zero. Seeing this, I now understand why the for loop isn't behaving as expected. Is this a scope issue? What am I missing here?
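An obvious problem with your code is the use of ArrayUtils.add: you have to assign its result back to the array variable. Arrays in Java have a fixed length, so ArrayUtils.add cannot grow the array you pass in; it returns a new array that contains the added element.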
Use it like this:
releases = ArrayUtils.add(releases, latest.toString());
names = ArrayUtils.add(names, latest.getName());
and later in the for-loop:
urls = ArrayUtils.add(urls, url);
versions = ArrayUtils.add(versions, version);
Also, you don't need to pass the arrays to ui in every loop iteration:
ui.setURL(urls);
ui.setVersion(versions);
Calling these once, after the loops have completed, is sufficient.
An alternative would be to use List<String> instead of arrays in the first place. If you have control over the UpdateInfo class, change it to use lists too; otherwise, create arrays from the lists before you set them in UpdateInfo.
As general advice, I would recommend getting rid of your static methods. Create instances and keep your credentials (login, password, oauth) as member fields, or even pass in the whole GitHub instance. This way you would have an easier time writing proper tests.
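A minimal sketch of both points with the standard library only: Arrays.copyOf stands in for ArrayUtils.add to show that a "grown" array is a new object, and a List accumulates values in place before a single toArray call. The class, method names and release data are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: accumulate in Lists, convert to arrays once after the loop. */
class CollectThenConvert {

    /** Shows the pitfall: copying into a longer array returns a NEW array; the original is unchanged. */
    static int lengthAfterForgettingToReassign() {
        String[] names = {};
        Arrays.copyOf(names, names.length + 1); // result discarded, like an unassigned ArrayUtils.add
        return names.length;                     // still 0
    }

    /** Collects one column of hypothetical (name, version) rows; a List grows in place. */
    static String[] collectColumn(String[][] rows, int column) {
        List<String> values = new ArrayList<>();
        for (String[] row : rows) {
            values.add(row[column]);            // no reassignment needed inside the loop
        }
        return values.toArray(new String[0]);   // convert once, e.g. before ui.setName(...)
    }

    public static void main(String[] args) {
        String[][] releases = {{"mod-a", "1.0"}, {"mod-b", "2.3"}}; // made-up data
        System.out.println(lengthAfterForgettingToReassign());           // 0
        System.out.println(Arrays.toString(collectColumn(releases, 0))); // [mod-a, mod-b]
        System.out.println(Arrays.toString(collectColumn(releases, 1))); // [1.0, 2.3]
    }
}
```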

Unsupported data type when getting mail JPG images

I'm trying to get the inline images of a mail, for which I have the following code:
protected void setCidAttachments(Message message, MensajeEmail mensajeEmail) {
try {
MimeMultipart mimeMultipart = (MimeMultipart) message.getDataHandler().getContent();
for (int k = 0; k < mimeMultipart.getCount(); k++) {
MimeBodyPart part = (MimeBodyPart) mimeMultipart.getBodyPart(k);
processPart(part, mensajeEmail);
}
}
catch (Exception e) {
log.error("Error getting attachments with cid", e);
}
}
private void processPart (BodyPart part, MensajeEmail mensajeEmail) throws MessagingException, IOException {
String type = getContentType(part);
StringBuilder content = new StringBuilder(mensajeEmail.getContenido());
if (isImage(type) && part.getDataHandler() != null && part.getDataHandler().getContent() != null) {
if (part.getDataHandler().getContent() instanceof MimeMultipart) {
MimeMultipart p = (MimeMultipart) part.getDataHandler().getContent();
for (int i = 0; i < p.getCount(); i++) {
BodyPart subpart = p.getBodyPart(i != p.getCount() - 1 ? i + 1 : i);
processPart(subpart, mensajeEmail);
}
} else {
mensajeEmail.setContenido(getInlineImage(part, content));
}
}
}
private String getInlineImage (BodyPart part, StringBuilder content) throws MessagingException, IOException {
Base64 decoder64 = new Base64();
ByteArrayOutputStream bos = new ByteArrayOutputStream();
// Get type
String type = getContentType(part);
// Get Content-ID
String contentId = getContentId(part);
// Replace
if (contentId.length() > 0) {
part.getDataHandler().writeTo(bos);
int start = content.indexOf("src=\"cid:" + contentId + "\"") + 5;
if (start > 4) {
int length = contentId.length() + 4;
content.replace(start, start + length, "data:" + (isImage(type) ? type : "image/png;") + " base64," + decoder64.encodeToString(bos.toByteArray()));
}
}
bos.close();
return content.toString();
}
private String getContentId (BodyPart part) throws MessagingException {
Enumeration headers = part.getAllHeaders();
while (headers.hasMoreElements()) {
Header header = (Header)headers.nextElement();
if (header.getName().equalsIgnoreCase("Content-ID"))
return cleanContentId(header.getValue());
}
return "";
}
private String getContentType (BodyPart part) throws MessagingException {
return part.getContentType().split(" ")[0];
}
private boolean isImage (String mime) {
return !mime.equals("text/html;") && !mime.equals("text/plain;");
}
private String cleanContentId (String contentId) {
if (contentId.charAt(0) == '<') contentId = contentId.substring(1, contentId.length() - 1);
return contentId;
}
This works perfectly fine when I send PNG images (which makes me think my code is indeed correct). However, when I try to send a JPG image, I get the following exception:
javax.activation.UnsupportedDataTypeException: Unknown image type image/jpeg; name=sony-car-796x418.jpg
at org.apache.geronimo.activation.handlers.AbstractImageHandler.getContent(AbstractImageHandler.java:57)
at javax.activation.DataSourceDataContentHandler.getContent(DataHandler.java:795)
at javax.activation.DataHandler.getContent(DataHandler.java:542)
at es.enxenio.fcpw.plinper.daemons.email.AbstractProtocoloObtencionEmail.processPart(AbstractProtocoloObtencionEmail.java:378)
...
Is the framework really not able to work with JPG images? Is there some way I can fix this?
EDIT: Gmail doesn't even let me send JPG images, so it's probably not a very common format for mail images. That makes me think it might not be widely implemented, which could be why Java doesn't seem to be able to work with it.
I found the problem. This line
if (isImage(type) && part.getDataHandler() != null && part.getDataHandler().getContent()
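shouldn't check whether the type is an image, but whether it is a multipart. Otherwise getContent() gets called on a JPG image part as if it might be a multipart. For some reason PNG images don't mind this, which is why it was working. Here are the changed parts of the code (processPart is now first called with the Message itself, so getContentType, getContentId and getInlineImage also need to take a Part instead of a BodyPart):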
protected void setCidAttachments(Message message, MensajeEmail mensajeEmail) {
try {
processPart(message, mensajeEmail);
}
catch (Exception e) {
log.error("Error getting attachments with cid", e);
}
}
private void processPart(Part part, MensajeEmail mensajeEmail) throws MessagingException, IOException {
String type = getContentType(part);
StringBuilder content = new StringBuilder(mensajeEmail.getContenido());
if (isMultipart(type) && part.getDataHandler() != null && part.getDataHandler().getContent() instanceof MimeMultipart) {
MimeMultipart multipart = (MimeMultipart) part.getDataHandler().getContent();
for (int i = 0; i < multipart.getCount(); i++) {
BodyPart subpart = multipart.getBodyPart(i);
processPart(subpart, mensajeEmail);
}
}
else {
mensajeEmail.setContenido(getInlineImage(part, content));
}
}
private boolean isMultipart(String mime) {
return (Pattern.matches("multipart/.*", mime));
}
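Separately from the multipart fix: getInlineImage in the question builds "data:" + type + " base64,", and because getContentType keeps the trailing ';' from the header, the result looks like data:image/png; base64,... with a space. That doesn't match the data URI syntax of RFC 2397, data:[<mediatype>][;base64],<data>. A minimal sketch using java.util.Base64 instead of the question's Base64 class; DataUris and toDataUri are hypothetical names, and the MIME type is assumed to be passed without parameters or a trailing ';':

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Sketch: build an RFC 2397 data URI for an inline image. */
class DataUris {

    /** mimeType must be the bare type, e.g. "image/jpeg" (no trailing ';' and no parameters). */
    static String toDataUri(String mimeType, byte[] bytes) {
        // Format: data:<mediatype>;base64,<data>  (no space anywhere)
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        byte[] fake = "hi".getBytes(StandardCharsets.US_ASCII); // stand-in for real image bytes
        System.out.println(toDataUri("image/png", fake));      // data:image/png;base64,aGk=
    }
}
```

In getInlineImage, the bytes would be bos.toByteArray() and the type the header value with its parameters and ';' removed.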
I got a similar exception running an app on Eclipse OSGi with Java 11, using the bundles javax.mail.glassfish 1.4.1 and javax.activation 1.1.0 (I got these two from https://eclipse.org/orbit):
javax.activation.UnsupportedDataTypeException: Unknown image type image/jpeg; name=image001.jpg
at org.apache.geronimo.activation.handlers.AbstractImageHandler.getContent(AbstractImageHandler.java:57)
at javax.activation.DataHandler.getContent(DataHandler.java:147)
at javax.mail.internet.MimeBodyPart.getContent(MimeBodyPart.java:652)
at my.code.calling.getcontent.MyClass(MyClass.java:802)
The package org.apache.geronimo.activation.handlers is included in the javax.transaction 1.1.0 bundle.
I resolved the problem by commenting out (with #) the image/gif and image/jpeg handlers in the file META-INF/mailcap inside the javax.activation bundle:
## <apache license disclaimer> http://www.apache.org/licenses/LICENSE-2.0
##
## $Rev$ $Date: 2008/04/09 19:25:23 $
##
text/plain;; x-java-content-handler=org.apache.geronimo.activation.handlers.TextPlainHandler
text/html;; x-java-content-handler=org.apache.geronimo.activation.handlers.TextHtmlHandler
text/xml;; x-java-content-handler=org.apache.geronimo.activation.handlers.TextXmlHandler
#image/gif;; x-java-content-handler=org.apache.geronimo.activation.handlers.ImageGifHandler
#image/jpeg;; x-java-content-handler=org.apache.geronimo.activation.handlers.ImageJpegHandler
multipart/*;; x-java-content-handler=org.apache.geronimo.activation.handlers.MultipartHandler
There's no image/png entry here, which is why PNGs were not a problem in the first place.
So by commenting out the GIF and JPEG handlers, attachments of these types are now handled like PNGs: getContent() just yields an InputStream instead of an AWT Image, which I think is what those Geronimo image handlers would produce if everything worked as intended.
Some internals: the Geronimo AbstractImageHandler in javax.activation 1.1.0 tries to determine the image type from the javax.mail.glassfish 1.4.1 method IMAPBodyPart.getContent(), but this returns the MIME type including parameters, e.g. "image/jpeg; name=sony-car-796x418.jpg", which isn't understood and ultimately leads to the UnsupportedDataTypeException.
javax.mail.glassfish also has a META-INF/mailcap file, whose image/* section interestingly looks like this:
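To illustrate the mismatch: any code that compares the full header value with a bare type such as image/jpeg will never match once parameters like name=... are attached. A minimal sketch of stripping the parameters first; MimeTypes.baseType is a hypothetical helper, not part of JavaMail or JAF (in application code, javax.mail.internet.ContentType and its getBaseType() method can do this parsing as well):

```java
import java.util.Locale;

/** Sketch: strip parameters from a Content-Type value before comparing types. */
class MimeTypes {

    /** "image/jpeg; name=sony-car-796x418.jpg" -> "image/jpeg" */
    static String baseType(String contentType) {
        int semicolon = contentType.indexOf(';');
        String base = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        // type and subtype are case-insensitive, so normalise before comparing
        return base.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String header = "image/jpeg; name=sony-car-796x418.jpg";
        System.out.println(header.equals("image/jpeg"));           // false
        System.out.println(baseType(header).equals("image/jpeg")); // true
    }
}
```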
# can't support image types because java.awt.Toolkit doesn't work on servers
#
#image/gif;; x-java-content-handler=com.sun.mail.handlers.image_gif
#image/jpeg;; x-java-content-handler=com.sun.mail.handlers.image_jpeg
That could be a lead, though I still got the original JPEG exception in a GUI application as well.
Another thing: this error doesn't occur for me when running the same setup with Java 8 instead of 11. That probably has something to do with Java 8 shipping its own javax.activation package, which was removed from the JDK in Java 11.

Write a Java program that downloads the first 100 comics of the webcomic XKCD. Be sure to use https:// for all URLs

This is what I have so far. I am having trouble downloading comics 1-100 starting at https://xkcd.com/1/, and I know I am supposed to go through the source code of the website. However, I can't figure out how to get the first 100 comics into the folder I set it to save to. For example, I want https://xkcd.com/1/ (view-source:https://xkcd.com/1/), https://xkcd.com/2/ (view-source:https://xkcd.com/2/), and so on up to comic 100. I know the img src is at line 50, but again, I don't know how to approach it.
public static void main(String[] args) {
URL imgURL = null;
for (int web = 1; web <= 100; web++) {
try {
imgURL = new URL("https://imgs.xkcd.com/comics/barrel_cropped_(1).jpg");
InputStream stream = imgURL.openStream();
Files.copy(stream, Paths.get("file/WebComics" + web + ".png"));
System.out.println("Done!");
} catch (Exception e) {
e.printStackTrace();
System.out.println("Error!");
}
}
}
}
Add the jsoup library jar to your project, then try this:
static void do_page(int id) throws IOException {
Document doc = Jsoup.connect("https://xkcd.com/" + id).get();
Elements imgs = doc.select("#comic img");
for (Element e: imgs) {
System.out.println(e.attr("src"));
}
}
Then call the do_page function in a loop:
for (int i = 1; i <= 100; i++) {
do_page(i);
}
Now, instead of printing the src, you can download the images however you see fit, for example with jsoup again.
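Two details the assignment cares about: the src that jsoup prints may be protocol-relative (starting with //), and the question saves every file as .png even when the image is a JPEG. A minimal sketch of the remaining steps using only the standard library; XkcdDownload and its methods are hypothetical names, the file/ directory and WebComicsN naming follow the question, and the src would come from e.attr("src") in do_page (jsoup's e.attr("abs:src") also resolves it against the page URL):

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Sketch: turn an <img src> from an xkcd page into an https URL and save it with its real extension. */
class XkcdDownload {

    /** Resolves a (possibly protocol-relative) src against the https comic page URL. */
    static String imageUrl(int comicId, String src) {
        URI page = URI.create("https://xkcd.com/" + comicId + "/");
        return page.resolve(src).toString();
    }

    /** Builds e.g. file/WebComics1.jpg, keeping the extension of the actual image. */
    static Path targetFile(String directory, int comicId, String imageUrl) {
        String path = URI.create(imageUrl).getPath();
        int dot = path.lastIndexOf('.');
        String extension = dot >= 0 ? path.substring(dot) : "";
        return Paths.get(directory, "WebComics" + comicId + extension);
    }

    /** Downloads one image; REPLACE_EXISTING lets the program be re-run without errors. */
    static void download(String imageUrl, Path target) throws IOException {
        Files.createDirectories(target.getParent());
        try (InputStream in = URI.create(imageUrl).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Example src for comic 1; in the loop it would come from jsoup instead.
        String url = imageUrl(1, "//imgs.xkcd.com/comics/barrel_cropped_(1).jpg");
        download(url, targetFile("file", 1, url));
    }
}
```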

Using PDFBox to remove Optional Content Groups that are not enabled

I'm using Apache PDFBox from Java, and I have a source PDF with multiple optional content groups. What I want to do is export a version of the PDF that includes only the standard content and the optional content groups that are enabled. It is important for my purposes that I preserve any dynamic aspects of the original, so text fields are still text fields, vector images are still vector images, etc. This is required because I intend to use a PDF form editor program that does not know how to handle optional content and would blindly render all of it, so I want to preprocess the source PDF and use the form editing program on a less cluttered destination PDF.
I've been trying to find hints on how to do this with Google, but to no avail. I don't know if I'm just using the wrong search terms, or if this is just something outside of what the PDFBox API was designed for. I rather hope it's not the latter. The info shown here does not seem to work (converting the C# code to Java), because even though the PDF I'm importing has optional content, there do not seem to be any OC resources when I examine the tokens on each page.
for(PDPage page:pages) {
PDResources resources = page.getResources();
PDFStreamParser parser = new PDFStreamParser(page);
parser.parse();
Collection tokens = parser.getTokens();
...
}
I'm truly sorry for not having any more code to show what I've tried so far, but I've just been poring over the java API docs for about 8 hours now trying to figure out what I might need to do this, and just haven't been able to figure it out.
What I DO know how to do is add text, lines, and images to a new PDPage, but I do not know how to retrieve that information from a given source page to copy it over, nor how to tell which optional content group such information is part of (if any). I am also not sure how to copy form fields in the source pdf over to the destination, nor how to copy the font information over.
Honestly, if there's a web page out there that I wasn't able to find with google with the searches that I tried, I'd be entirely happy to read up more about it, but I am really quite stuck here, and I don't know anyone personally that knows about this library.
Please help.
EDIT:
Trying what I understand from what was suggested below, I've written a loop to examine each XObject on the page as follows:
PDResources resources = pdPage.getResources();
Iterable<COSName> names = resources.getXObjectNames();
for(COSName name:names) {
PDXObject xobj = resources.getXObject(name);
PDFStreamParser parser = new PDFStreamParser(xobj.getStream().toByteArray());
parser.parse();
Object [] tokens = parser.getTokens().toArray();
for(int i = 0;i<tokens.length-1;i++) {
Object obj = tokens[i];
if (obj instanceof COSName && obj.equals(COSName.OC)) {
i++;
obj = tokens[i]; // reuse obj; declaring it a second time would not compile
if (obj instanceof COSName) {
PDPropertyList props = resources.getProperties((COSName)obj);
if (props != null) {
...
However, after an OC key, the next entry in the tokens array is always an Operator tagged as "BMC". Nowhere am I finding any info that I can recognize from the named optional content groups.
Here's a robust solution for removing marked content blocks (open to feedback if anyone finds anything that isn't working right). You should be able to adjust for OC blocks...
This code properly handles nesting and removal of resources (xobject, graphics state and fonts - easy to add others if needed).
public class MarkedContentRemover {
private final MarkedContentMatcher matcher;
/**
* @param matcher decides which marked-content blocks (BMC/BDC ... EMC) are removed
*/
public MarkedContentRemover(MarkedContentMatcher matcher) {
this.matcher = matcher;
}
public int removeMarkedContent(PDDocument doc, PDPage page) throws IOException {
ResourceSuppressionTracker resourceSuppressionTracker = new ResourceSuppressionTracker();
PDResources pdResources = page.getResources();
PDFStreamParser pdParser = new PDFStreamParser(page);
PDStream newContents = new PDStream(doc);
OutputStream newContentOutput = newContents.createOutputStream(COSName.FLATE_DECODE);
ContentStreamWriter newContentWriter = new ContentStreamWriter(newContentOutput);
List<Object> operands = new ArrayList<>();
Operator operator = null;
Object token;
int suppressDepth = 0;
boolean resumeOutputOnNextOperator = false;
int removedCount = 0;
while (true) {
operands.clear();
token = pdParser.parseNextToken();
while(token != null && !(token instanceof Operator)) {
operands.add(token);
token = pdParser.parseNextToken();
}
operator = (Operator)token;
if (operator == null) break;
if (resumeOutputOnNextOperator) {
resumeOutputOnNextOperator = false;
suppressDepth--;
if (suppressDepth == 0)
removedCount++;
}
if (OperatorName.BEGIN_MARKED_CONTENT_SEQ.equals(operator.getName())
|| OperatorName.BEGIN_MARKED_CONTENT.equals(operator.getName())) {
COSName contentId = (COSName)operands.get(0);
final COSDictionary properties;
if (operands.size() > 1) {
Object propsOperand = operands.get(1);
if (propsOperand instanceof COSDictionary) {
properties = (COSDictionary) propsOperand;
} else if (propsOperand instanceof COSName) {
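// the named property list may be missing from /Properties; fall back to an empty dictionary
PDPropertyList propertyList = pdResources.getProperties((COSName)propsOperand);
properties = propertyList != null ? propertyList.getCOSObject() : new COSDictionary();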
} else {
properties = new COSDictionary();
}
} else {
properties = new COSDictionary();
}
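// inside a removed block, nested BMC/BDC blocks are removed too; counting them
// keeps their EMC from ending the suppression early
if (suppressDepth > 0 || matcher.matches(contentId, properties)) {
suppressDepth++;
}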
}
if (OperatorName.END_MARKED_CONTENT.equals(operator.getName())) {
if (suppressDepth > 0)
resumeOutputOnNextOperator = true;
}
else if (OperatorName.SET_GRAPHICS_STATE_PARAMS.equals(operator.getName())) {
resourceSuppressionTracker.markForOperator(COSName.EXT_G_STATE, operands.get(0), suppressDepth == 0);
}
else if (OperatorName.DRAW_OBJECT.equals(operator.getName())) {
resourceSuppressionTracker.markForOperator(COSName.XOBJECT, operands.get(0), suppressDepth == 0);
}
else if (OperatorName.SET_FONT_AND_SIZE.equals(operator.getName())) {
resourceSuppressionTracker.markForOperator(COSName.FONT, operands.get(0), suppressDepth == 0);
}
if (suppressDepth == 0) {
newContentWriter.writeTokens(operands);
newContentWriter.writeTokens(operator);
}
}
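// the stream ended right after the EMC that closes the outermost removed block
if (resumeOutputOnNextOperator && suppressDepth == 1)
removedCount++;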
newContentOutput.close();
page.setContents(newContents);
resourceSuppressionTracker.updateResources(pdResources);
return removedCount;
}
private static class ResourceSuppressionTracker{
// if the boolean is TRUE, then the resource should be removed. If the boolean is FALSE, the resource should not be removed
private final Map<COSName, Map<COSName, Boolean>> tracker = new HashMap<>();
public void markForOperator(COSName resourceType, Object resourceNameOperand, boolean preserve) {
if (!(resourceNameOperand instanceof COSName)) return;
if (preserve) {
markForPreservation(resourceType, (COSName)resourceNameOperand);
} else {
markForRemoval(resourceType, (COSName)resourceNameOperand);
}
}
public void markForRemoval(COSName resourceType, COSName refId) {
if (!resourceIsPreserved(resourceType, refId)) {
getResourceTracker(resourceType).put(refId, Boolean.TRUE);
}
}
public void markForPreservation(COSName resourceType, COSName refId) {
getResourceTracker(resourceType).put(refId, Boolean.FALSE);
}
public void updateResources(PDResources pdResources) {
for (Map.Entry<COSName, Map<COSName, Boolean>> resourceEntry : tracker.entrySet()) {
for(Map.Entry<COSName, Boolean> refEntry : resourceEntry.getValue().entrySet()) {
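if (refEntry.getValue().equals(Boolean.TRUE)) {
// remove from the matching resource category (/XObject, /ExtGState or /Font), not always /XObject
COSDictionary resourceDict = pdResources.getCOSObject().getCOSDictionary(resourceEntry.getKey());
if (resourceDict != null) {
resourceDict.removeItem(refEntry.getKey());
}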
}
}
}
}
private boolean resourceIsPreserved(COSName resourceType, COSName refId) {
// FALSE in the tracker means "preserve"; a missing entry means "not seen yet"
return Boolean.FALSE.equals(getResourceTracker(resourceType).get(refId));
}
private Map<COSName, Boolean> getResourceTracker(COSName resourceType){
return tracker.computeIfAbsent(resourceType, k -> new HashMap<>());
}
}
}
Helper interface:
public interface MarkedContentMatcher {
public boolean matches(COSName contentId, COSDictionary props);
}
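The core of the remover above is the suppressDepth counter. A stripped-down sketch of the same depth-counting idea with plain strings instead of PDFBox types (Op, MarkedContentSketch and removeMarkedContent are hypothetical names, not PDFBox API). Every BMC/BDC that appears inside a removed block also increments the depth, so the EMC of a nested block does not end the removal early. For optional content, the matcher would check for the /OC tag and the property list of a group that is not enabled:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Model of marked-content removal on a flat list of operators. NOT PDFBox; illustrates the counting only. */
class MarkedContentSketch {

    /** One content-stream operator with its operands, e.g. "/OC /MC0 BDC". */
    static final class Op {
        final String operator;
        final List<String> operands;
        Op(String operator, String... operands) {
            this.operator = operator;
            this.operands = Arrays.asList(operands);
        }
        @Override public String toString() {
            return operands.isEmpty() ? operator : String.join(" ", operands) + " " + operator;
        }
    }

    /** Drops every BMC/BDC ... EMC block accepted by the matcher, including blocks nested inside it. */
    static List<Op> removeMarkedContent(List<Op> ops, Predicate<Op> matcher) {
        List<Op> kept = new ArrayList<>();
        int suppressDepth = 0;
        for (Op op : ops) {
            boolean begin = op.operator.equals("BMC") || op.operator.equals("BDC");
            if (begin && (suppressDepth > 0 || matcher.test(op))) {
                suppressDepth++;              // nested begins must be balanced as well
                continue;
            }
            if (op.operator.equals("EMC") && suppressDepth > 0) {
                suppressDepth--;              // closes the innermost suppressed block
                continue;
            }
            if (suppressDepth == 0) {
                kept.add(op);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Op> ops = Arrays.asList(
            new Op("q"),
            new Op("BDC", "/OC", "/MC0"),   // optional content block to remove
            new Op("BMC", "/Span"),         // nested, unmatched marked content
            new Op("Tj", "(hidden)"),
            new Op("EMC"),
            new Op("EMC"),
            new Op("Tj", "(visible)"),
            new Op("Q"));
        System.out.println(removeMarkedContent(ops, op -> op.operands.contains("/MC0")));
        // [q, (visible) Tj, Q]
    }
}
```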
Optional content groups are marked with BDC and EMC. You will have to go through all of the tokens returned from the parser and remove that "section" from the array. Here is some C# code that was posted a while ago: How to delete an optional content group alongwith its content from pdf using pdfbox?
I investigated that (converting it to Java) but couldn't get it to work as expected. I managed to remove the content between BDC and EMC and save the result using the same technique as the sample, but the PDF was corrupted. Perhaps that is due to my lack of C# knowledge (related to Tuples etc.).
Here is what I came up with (in Groovy-style syntax). As I said, it doesn't work; perhaps you or someone else (mkl, Tilman Hausherr) can spot the flaw.
OCGDelete (PDDocument doc, int pageNum, String OCName) {
PDPage pdPage = (PDPage) doc.getDocumentCatalog().getPages().get(pageNum);
PDResources pdResources = pdPage.getResources();
PDFStreamParser pdParser = new PDFStreamParser(pdPage);
int ocgStart
int ocgLength
Collection tokens = pdParser.getTokens();
Object[] newTokens = tokens.toArray()
try {
for (int index = 0; index < newTokens.length; index++) {
obj = newTokens[index]
if (obj instanceof COSName && obj.equals(COSName.OC)) {
// println "Found COSName at "+index /// Found Optional Content
startIndex = index
index++
if (index < newTokens.size()) {
obj = newTokens[index]
if (obj instanceof COSName) {
prop = pdRes.getProperties(obj)
if (prop != null && prop instanceof PDOptionalContentGroup) {
if ((prop.getName()).equals(delLayer)) {
println "Found the Layer to be deleted"
println "prop Name was " + prop.getName()
index++
if (index < newTokens.size()) {
obj = newTokens[index]
if ((obj.getName()).equals("BDC")) {
ocgStart = index
println("OCG Start " + ocgStart)
ocgLength = -1
index++
while (index < newTokens.size()) {
ocgLength++
obj = newTokens[index]
println " Loop through relevant OCG Tokens " + obj
if (obj instanceof Operator && (obj.getName()).equals("EMC")) {
println "the next obj was " + obj
println "after that " + newTokens[index + 1] + "and then " + newTokens[index + 2]
println("OCG End " + ocgLength++)
break
}
index++
}
if (endIndex > 0) {
println "End Index was something " + (startIndex + ocgLength)
}
}
}
}
}
}
}
}
}
}
catch (Exception ex){
println ex.message()
}
for (int i = ocgStart; i < ocgStart+ ocgLength; i++){
newTokens.removeAt(i)
}
PDStream newContents = new PDStream(doc);
OutputStream output = newContents.createOutputStream(COSName.FLATE_DECODE);
ContentStreamWriter writer = new ContentStreamWriter(output);
writer.writeTokens(newTokens);
output.close();
pdPage.setContents(newContents);
}
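Two things in the snippet above appear likely to corrupt the content stream: ocgStart is set to the index of the BDC operator, so its operands (/OC and the property name) are left behind without an operator, and removing newTokens[i] while i keeps increasing skips every other token, because the remaining tokens shift left after each removal. A minimal sketch of removing one whole block, operands included, in a single step. It uses plain strings instead of PDFBox's COSName/Operator tokens, ocName is the property resource name used in the content stream (such as /MC0) rather than the layer's display name, and OcgTokenRemoval/removeBlock are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: remove "/OC /name BDC ... EMC" from a flat token list, operands included. */
class OcgTokenRemoval {

    /** Removes the first block tagged /OC with property name ocName; returns true if one was removed. */
    static boolean removeBlock(List<String> tokens, String ocName) {
        for (int i = 0; i + 2 < tokens.size(); i++) {
            if (tokens.get(i).equals("/OC") && tokens.get(i + 1).equals(ocName)
                    && tokens.get(i + 2).equals("BDC")) {
                int depth = 0;
                for (int j = i + 2; j < tokens.size(); j++) {
                    String t = tokens.get(j);
                    if (t.equals("BDC") || t.equals("BMC")) {
                        depth++;
                    } else if (t.equals("EMC") && --depth == 0) {
                        // clear [i, j] in one call, so no index shifting is involved
                        tokens.subList(i, j + 1).clear();
                        return true;
                    }
                }
                return false; // no matching EMC: leave the stream untouched
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> tokens = new ArrayList<>(Arrays.asList(
            "q", "/OC", "/MC0", "BDC", "(hidden)", "Tj", "EMC", "(visible)", "Tj", "Q"));
        removeBlock(tokens, "/MC0");
        System.out.println(tokens); // [q, (visible), Tj, Q]
    }
}
```

With real PDFBox tokens, the string comparisons would become instanceof Operator checks with getName() and COSName comparisons, and the edited list is what gets passed to ContentStreamWriter.writeTokens.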

How can I get a Javascript variable value in Java?

In my current project I have to read a JavaScript file from the web and extract an object from it. The variable's value can change from time to time, so I have to read it instead of hard-coding it into my Android app.
Say I want to extract the following variable (and parse the string using JSONObject after that, which is trivial):
var abc.xyz = {
"a": ["one", "two", "three"],
"b": ["four", "five"]
}
I have a problem with this. Do I have to implement some compiler-like scanner just to look for the name and get its value, or is there an existing tool I can use?
The JavaScript file is not as simple as this example. It contains a lot of other code. So a simple new JSONObject() or something will not do.
There are many libraries for parsing JSON in Java; there is a list on JSON.org.
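Read the file with Java. Note that InputStream.toString() does not return the stream's content, so read the bytes first (readAllBytes() needs Java 9+); and new JSONObject(...) only succeeds if the text is pure JSON:
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import org.json.JSONObject;
URL url = new URL("http://example.com/foo.js");
try (InputStream urlInputStream = url.openStream()) {
JSONObject json = new JSONObject(new String(urlInputStream.readAllBytes(), StandardCharsets.UTF_8));
}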
In the end, I coded it myself:
//remove /* ... */ block comments (// line comments are not handled)
private String removeComment(String html){
String commentA = "/*";
String commentB = "*/";
int indexA = html.indexOf(commentA);
// look for the end marker only after the start marker
int indexB = indexA == -1 ? -1 : html.indexOf(commentB, indexA + commentA.length());
while(indexA != -1 && indexB != -1){
html = html.substring(0, indexA) + html.substring(indexB + commentB.length());
indexA = html.indexOf(commentA);
indexB = indexA == -1 ? -1 : html.indexOf(commentB, indexA + commentA.length());
}
return html;
}
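//find variable with name varName and return its brace-delimited value
private String findVar(String varName, String html, char lBrace, char rBrace){
int nameIndex = html.indexOf(varName);
if(nameIndex == -1){
throw new IllegalArgumentException("Variable not found: " + varName);
}
String tmp = html.substring(nameIndex);
int braceIndex = tmp.indexOf(lBrace);
if(braceIndex == -1){
throw new IllegalArgumentException("No '" + lBrace + "' after " + varName);
}
tmp = tmp.substring(braceIndex);
int braceCount = 0;
int index = 0;
// count braces until the first one is balanced (braces inside string literals are not handled)
while(index < tmp.length()){
if(tmp.charAt(index) == lBrace){
braceCount ++;
}else if(tmp.charAt(index) == rBrace){
braceCount --;
}
index ++;
if(braceCount == 0){
return tmp.substring(0, index);
}
}
throw new IllegalArgumentException("Unbalanced braces after " + varName);
}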
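findVar counts every brace it sees, so a brace inside a string value (for example "one}") throws the count off. A minimal sketch of a variant that skips over double-quoted string literals; it assumes strings use double quotes with backslash escapes (single-quoted and template strings, comments and regex literals are not handled), and JsObjectExtractor/extract are hypothetical names:

```java
/** Sketch: like findVar, but ignores braces inside double-quoted string literals. */
class JsObjectExtractor {

    /** Returns the balanced {...} text after varName, or null if it can't be found. */
    static String extract(String script, String varName) {
        int nameIndex = script.indexOf(varName);
        if (nameIndex < 0) return null;
        int start = script.indexOf('{', nameIndex);
        if (start < 0) return null;
        int depth = 0;
        boolean inString = false;
        for (int i = start; i < script.length(); i++) {
            char c = script.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++;                      // skip the escaped character
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                return script.substring(start, i + 1);
            }
        }
        return null;                          // unbalanced braces
    }

    public static void main(String[] args) {
        String js = "var x = 1; abc.xyz = {\"a\": [\"one}\", \"two\"], \"b\": [\"four\"]}; foo();";
        System.out.println(extract(js, "abc.xyz")); // {"a": ["one}", "two"], "b": ["four"]}
    }
}
```

The returned text can then be handed to new JSONObject(...), as in the question.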
