I'm trying to extract the content of an MTOM message using the code below:
Iterator i = msg.getAttachments();
while (i.hasNext())
{
AttachmentPart att = (AttachmentPart)i.next();
Object obj = att.getContent();
}
Here msg is a SOAPMessage with a MIME content type, but the rawContent comes back as null and the code crashes when getting the AttachmentPart.
Is there any other way to get the MTOM content, such as finding the boundaries and looping through the parts?
I ended up with the code below:
// inputStream is the raw message body (an java.io.InputStream instance, not the type name)
MimeMultipart mp = new MimeMultipart(new ByteArrayDataSource(inputStream, "multipart/related"));
int count = mp.getCount();
for (int i = 0; i < count; i++) {
BodyPart bodyPart = mp.getBodyPart(i);
String content = new String(read(bodyPart));
String partContentType = bodyPart.getContentType();
if(partContentType.toLowerCase().contains(SOAPConstants.SOAP_1_2_CONTENT_TYPE)) {
//process SOAP 1.2
}
if(partContentType.toLowerCase().contains(SOAPConstants.SOAP_1_1_CONTENT_TYPE)) {
//process SOAP 1.1
}
if(partContentType.toLowerCase().contains("application/octet-stream")) {
// process binary part
}
}
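For the "getting boundaries and looping through" idea, here is a minimal JDK-only sketch of what that would look like. The class MultipartSplitter and its method names are made up for illustration, and it treats the body as text, so it only shows the mechanics; real MTOM binary parts have to be handled as bytes, which is what MimeMultipart above does for you.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of SAAJ or JavaMail.
final class MultipartSplitter {
    private static final Pattern BOUNDARY =
            Pattern.compile("boundary=\"?([^\";]+)\"?", Pattern.CASE_INSENSITIVE);

    /** Extracts the boundary parameter from a multipart Content-Type header value. */
    static String boundaryOf(String contentType) {
        Matcher m = BOUNDARY.matcher(contentType);
        if (!m.find()) {
            throw new IllegalArgumentException("no boundary parameter in: " + contentType);
        }
        return m.group(1).trim();
    }

    /** Returns each part (headers plus content) found between "--boundary" delimiters. */
    static List<String> splitParts(String body, String boundary) {
        String delimiter = "--" + boundary;
        List<String> parts = new ArrayList<>();
        int start = body.indexOf(delimiter);
        while (start >= 0) {
            int contentStart = start + delimiter.length();
            if (body.startsWith("--", contentStart)) {
                break; // "--boundary--" closes the message
            }
            int next = body.indexOf(delimiter, contentStart);
            if (next < 0) {
                break; // truncated message: no further delimiter
            }
            parts.add(body.substring(contentStart, next).trim());
            start = next;
        }
        return parts;
    }
}
```

Each returned part still starts with its own headers (Content-Type, Content-ID and so on), followed by a blank line and the content, so the content-type checks from the loop above would apply to those header lines.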
I'm using Apache PDFBox from Java, and I have a source PDF with multiple optional content groups. I want to export a version of the PDF that includes only the standard content and the optional content groups that were enabled. It is important for my purposes that I preserve any dynamic aspects of the original, so text fields are still text fields, vector images are still vector images, etc. This is required because I ultimately intend to use a PDF form editor that does not know how to handle optional content and would blindly render all of it, so I want to preprocess the source PDF and run the form editor on a less cluttered destination PDF.
I've been trying to find something with Google that could give me hints on how to do this, but to no avail. I don't know if I'm just using the wrong search terms, or if this is outside of what the PDFBox API was designed for. I rather hope it's not the latter. The info shown here does not seem to work (after converting the C# code to Java), because although the PDF I'm trying to import has optional content, there do not seem to be any OC resources when I examine the tokens on each page.
for(PDPage page:pages) {
PDResources resources = page.getResources();
PDFStreamParser parser = new PDFStreamParser(page);
parser.parse();
Collection tokens = parser.getTokens();
...
}
I'm truly sorry for not having more code to show what I've tried so far, but I've been poring over the Java API docs for about 8 hours now trying to work out what I need, and just haven't been able to figure it out.
What I DO know how to do is add text, lines, and images to a new PDPage, but I do not know how to retrieve that information from a given source page to copy it over, nor how to tell which optional content group such information is part of (if any). I am also not sure how to copy form fields from the source PDF to the destination, nor how to copy the font information over.
Honestly, if there's a web page out there that my Google searches didn't turn up, I'd be entirely happy to read up on it, but I am really quite stuck here, and I don't know anyone personally who knows this library.
Please help.
EDIT:
Trying what I understand from what was suggested below, I've written a loop to examine each XObject on the page as follows:
PDResources resources = pdPage.getResources();
Iterable<COSName> names = resources.getXObjectNames();
for(COSName name:names) {
PDXObject xobj = resources.getXObject(name);
PDFStreamParser parser = new PDFStreamParser(xobj.getStream().toByteArray());
parser.parse();
Object [] tokens = parser.getTokens().toArray();
for(int i = 0;i<tokens.length-1;i++) {
Object obj = tokens[i];
if (obj instanceof COSName && obj.equals(COSName.OC)) {
i++;
Object next = tokens[i]; // renamed: obj is already declared in this scope
if (next instanceof COSName) {
PDPropertyList props = resources.getProperties((COSName) next);
if (props != null) {
...
However, after an OC key, the next entry in the tokens array is always an Operator tagged as "BMC". Nowhere am I finding any info that I can recognize from the named optional content groups.
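For what it's worth, content streams are written in postfix order, so the operands come before the operator that consumes them: an optional-content block normally starts with the three tokens /OC /SomeName BDC, where the middle name is the key to look up in the page's Properties resources. A bare /OC BMC has no property operand at all. Below is a rough sketch of scanning for the first form. It models tokens as plain strings so it runs without PDFBox; MarkedContentScan and the token spellings are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: real code would test COSName / Operator instances instead of strings.
final class MarkedContentScan {
    /** Returns the property names used by "/OC /<name> BDC" sequences, in order of appearance. */
    static List<String> optionalContentNames(List<String> tokens) {
        List<String> names = new ArrayList<>();
        for (int i = 2; i < tokens.size(); i++) {
            // Operands precede the operator: tokens[i-2] is the tag, tokens[i-1] the property name.
            boolean isBdc = "BDC".equals(tokens.get(i));
            boolean taggedOc = "/OC".equals(tokens.get(i - 2));
            boolean hasName = tokens.get(i - 1).startsWith("/");
            if (isBdc && taggedOc && hasName) {
                names.add(tokens.get(i - 1).substring(1));
            }
        }
        return names;
    }
}
```

With PDFBox the same idea would mean finding the Operator first and then looking backwards at the operands collected before it, rather than looking forwards from the OC name.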
Here's a robust solution for removing marked content blocks (open to feedback if anyone finds anything that isn't working right). You should be able to adjust for OC blocks...
This code properly handles nesting and removal of resources (xobject, graphics state and fonts - easy to add others if needed).
public class MarkedContentRemover {
private final MarkedContentMatcher matcher;
/**
* Creates a remover that strips every marked-content block accepted by the given matcher.
*/
public MarkedContentRemover(MarkedContentMatcher matcher) {
this.matcher = matcher;
}
public int removeMarkedContent(PDDocument doc, PDPage page) throws IOException {
ResourceSuppressionTracker resourceSuppressionTracker = new ResourceSuppressionTracker();
PDResources pdResources = page.getResources();
PDFStreamParser pdParser = new PDFStreamParser(page);
PDStream newContents = new PDStream(doc);
OutputStream newContentOutput = newContents.createOutputStream(COSName.FLATE_DECODE);
ContentStreamWriter newContentWriter = new ContentStreamWriter(newContentOutput);
List<Object> operands = new ArrayList<>();
Operator operator = null;
Object token;
int suppressDepth = 0;
boolean resumeOutputOnNextOperator = false;
int removedCount = 0;
while (true) {
operands.clear();
token = pdParser.parseNextToken();
while(token != null && !(token instanceof Operator)) {
operands.add(token);
token = pdParser.parseNextToken();
}
operator = (Operator)token;
if (operator == null) break;
if (resumeOutputOnNextOperator) {
resumeOutputOnNextOperator = false;
suppressDepth--;
if (suppressDepth == 0)
removedCount++;
}
if (OperatorName.BEGIN_MARKED_CONTENT_SEQ.equals(operator.getName())
|| OperatorName.BEGIN_MARKED_CONTENT.equals(operator.getName())) {
COSName contentId = (COSName)operands.get(0);
final COSDictionary properties;
if (operands.size() > 1) {
Object propsOperand = operands.get(1);
if (propsOperand instanceof COSDictionary) {
properties = (COSDictionary) propsOperand;
} else if (propsOperand instanceof COSName) {
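// The name may not be present in the page's Properties resources; fall back to an empty dictionary.
PDPropertyList propertyList = pdResources.getProperties((COSName) propsOperand);
properties = propertyList != null ? propertyList.getCOSObject() : new COSDictionary();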
} else {
properties = new COSDictionary();
}
} else {
properties = new COSDictionary();
}
if (matcher.matches(contentId, properties)) {
suppressDepth++;
}
}
if (OperatorName.END_MARKED_CONTENT.equals(operator.getName())) {
if (suppressDepth > 0)
resumeOutputOnNextOperator = true;
}
else if (OperatorName.SET_GRAPHICS_STATE_PARAMS.equals(operator.getName())) {
resourceSuppressionTracker.markForOperator(COSName.EXT_G_STATE, operands.get(0), suppressDepth == 0);
}
else if (OperatorName.DRAW_OBJECT.equals(operator.getName())) {
resourceSuppressionTracker.markForOperator(COSName.XOBJECT, operands.get(0), suppressDepth == 0);
}
else if (OperatorName.SET_FONT_AND_SIZE.equals(operator.getName())) {
resourceSuppressionTracker.markForOperator(COSName.FONT, operands.get(0), suppressDepth == 0);
}
if (suppressDepth == 0) {
newContentWriter.writeTokens(operands);
newContentWriter.writeTokens(operator);
}
}
if (resumeOutputOnNextOperator)
removedCount++;
newContentOutput.close();
page.setContents(newContents);
resourceSuppressionTracker.updateResources(pdResources);
return removedCount;
}
private static class ResourceSuppressionTracker{
// if the boolean is TRUE, then the resource should be removed. If the boolean is FALSE, the resource should not be removed
private final Map<COSName, Map<COSName, Boolean>> tracker = new HashMap<>();
public void markForOperator(COSName resourceType, Object resourceNameOperand, boolean preserve) {
if (!(resourceNameOperand instanceof COSName)) return;
if (preserve) {
markForPreservation(resourceType, (COSName)resourceNameOperand);
} else {
markForRemoval(resourceType, (COSName)resourceNameOperand);
}
}
public void markForRemoval(COSName resourceType, COSName refId) {
if (!resourceIsPreserved(resourceType, refId)) {
getResourceTracker(resourceType).put(refId, Boolean.TRUE);
}
}
public void markForPreservation(COSName resourceType, COSName refId) {
getResourceTracker(resourceType).put(refId, Boolean.FALSE);
}
public void updateResources(PDResources pdResources) {
for (Map.Entry<COSName, Map<COSName, Boolean>> resourceEntry : tracker.entrySet()) {
for(Map.Entry<COSName, Boolean> refEntry : resourceEntry.getValue().entrySet()) {
if (refEntry.getValue().equals(Boolean.TRUE)) {
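// Remove from the dictionary of the tracked resource type (XObject, ExtGState, Font), not always XObject.
COSDictionary typeDictionary = pdResources.getCOSObject().getCOSDictionary(resourceEntry.getKey());
if (typeDictionary != null) {
typeDictionary.removeItem(refEntry.getKey());
}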
}
}
}
}
private boolean resourceIsPreserved(COSName resourceType, COSName refId) {
// FALSE in the tracker means "preserve"; an untracked resource is not yet preserved.
return Boolean.FALSE.equals(getResourceTracker(resourceType).get(refId));
}
private Map<COSName, Boolean> getResourceTracker(COSName resourceType){
if (!tracker.containsKey(resourceType)) {
tracker.put(resourceType, new HashMap<>());
}
return tracker.get(resourceType);
}
}
}
Helper class:
public interface MarkedContentMatcher {
public boolean matches(COSName contentId, COSDictionary props);
}
Optional Content Groups are marked with BDC and EMC. You will have to navigate through all of the tokens returned from the parser and remove the "section" from the array. Here is some C# code that was posted a while ago: "How to delete an optional content group alongwith its content from pdf using pdfbox?"
I investigated that (converting it to Java) but couldn't get it to work as expected. I managed to remove the content between BDC and EMC and then save the result using the same technique as the sample, but the PDF was corrupted. Perhaps that is down to my lack of C# knowledge (Tuples etc.).
Here is what I came up with. As I said, it doesn't work; perhaps you or someone else (mkl, Tilman Hausherr) can spot the flaw.
OCGDelete (PDDocument doc, int pageNum, String OCName) {
PDPage pdPage = (PDPage) doc.getDocumentCatalog().getPages().get(pageNum);
PDResources pdResources = pdPage.getResources();
PDFStreamParser pdParser = new PDFStreamParser(pdPage);
int ocgStart = -1;
int ocgLength = 0;
pdParser.parse(); // tokens are only available after parsing
List<Object> newTokens = new ArrayList<>(pdParser.getTokens()); // a List, so size() and removeAt() apply
try {
for (int index = 0; index < newTokens.length; index++) {
obj = newTokens[index]
if (obj instanceof COSName && obj.equals(COSName.OC)) {
// println "Found COSName at "+index /// Found Optional Content
startIndex = index
index++
if (index < newTokens.size()) {
obj = newTokens[index]
if (obj instanceof COSName) {
prop = pdResources.getProperties(obj)
if (prop != null && prop instanceof PDOptionalContentGroup) {
if ((prop.getName()).equals(OCName)) {
println "Found the Layer to be deleted"
println "prop Name was " + prop.getName()
index++
if (index < newTokens.size()) {
obj = newTokens[index]
if ((obj.getName()).equals("BDC")) {
ocgStart = index
println("OCG Start " + ocgStart)
ocgLength = -1
index++
while (index < newTokens.size()) {
ocgLength++
obj = newTokens[index]
println " Loop through relevant OCG Tokens " + obj
if (obj instanceof Operator && (obj.getName()).equals("EMC")) {
println "the next obj was " + obj
println "after that " + newTokens[index + 1] + "and then " + newTokens[index + 2]
println("OCG End " + ocgLength++)
break
}
index++
}
if (endIndex > 0) {
println "End Index was something " + (startIndex + ocgLength)
}
}
}
}
}
}
}
}
}
}
catch (Exception ex){
println ex.getMessage()
}
for (int i = ocgStart; i < ocgStart+ ocgLength; i++){
newTokens.removeAt(i)
}
PDStream newContents = new PDStream(doc);
OutputStream output = newContents.createOutputStream(COSName.FLATE_DECODE);
ContentStreamWriter writer = new ContentStreamWriter(output);
writer.writeTokens(newTokens);
output.close();
pdPage.setContents(newContents);
}
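To make the BDC/EMC navigation concrete, here is a small sketch of the removal step on its own. It is not PDFBox code: tokens are modelled as plain strings, and MarkedContentFilter is a made-up name. It builds a new list instead of deleting by index from the list being walked (deleting by index while moving forward shifts the remaining elements), and it counts nested BMC/BDC so that the matching EMC is found.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for filtering parsed content-stream tokens.
final class MarkedContentFilter {
    /** Copies tokens, skipping every "/OC /<propertyName> BDC ... EMC" block including nested blocks. */
    static List<String> dropOptionalContent(List<String> tokens, String propertyName) {
        List<String> kept = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            boolean startsTarget = i + 2 < tokens.size()
                    && "/OC".equals(tokens.get(i))
                    && ("/" + propertyName).equals(tokens.get(i + 1))
                    && "BDC".equals(tokens.get(i + 2));
            if (!startsTarget) {
                kept.add(tokens.get(i));
                i++;
                continue;
            }
            i += 3;        // skip the tag, the property name and the BDC operator
            int depth = 1; // we are now inside one marked-content block
            while (depth > 0 && i < tokens.size()) {
                String token = tokens.get(i++);
                if ("BDC".equals(token) || "BMC".equals(token)) {
                    depth++;
                } else if ("EMC".equals(token)) {
                    depth--;
                }
            }
        }
        return kept;
    }
}
```

Note that the block start is the /OC operand, two tokens before BDC, so both operands go away together with the operator. The filtered list would then be written back with ContentStreamWriter as in the code above.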
I am trying to fetch the HTML source of an email message so I can parse it.
For some reason content.getCount() returns 2,
and sometimes content.getBodyPart(i) is actually a nested Multipart object.
How do I distinguish between the elements returned by content.getBodyPart(i)?
Is this the correct way to fetch the HTML source?
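ParsedEmailData procesMultiPart(Multipart content) throws MessagingException, IOException {
for (int i = 0; i < content.getCount(); i++) {
BodyPart bodyPart = content.getBodyPart(i);
Object body = bodyPart.getContent();
if (body instanceof Multipart) {
// Recurse into the nested multipart, but keep scanning the siblings if it yields nothing.
ParsedEmailData nested = procesMultiPart((Multipart) body);
if (nested != null) {
return nested;
}
} else if (body instanceof String) {
return parseEmailBody((String) body);
}
}
return null; // no text part found
}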
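A count of 2 is common when the message carries a plain-text and an HTML alternative of the same body, so checking only for String picks up whichever comes first. In JavaMail the parts can be told apart by MIME type, e.g. bodyPart.isMimeType("text/html") or bodyPart.isMimeType("multipart/*"). The sketch below shows the depth-first search on a stand-in tree so that it runs without JavaMail; MimeNode and its factory methods are invented for illustration and are not JavaMail classes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for javax.mail Part/Multipart, for illustration only.
final class MimeNode {
    final String contentType;      // e.g. "text/html" or "multipart/alternative"
    final String text;             // body of a leaf part, null for multiparts
    final List<MimeNode> children; // nested parts, empty for leaves

    private MimeNode(String contentType, String text, List<MimeNode> children) {
        this.contentType = contentType;
        this.text = text;
        this.children = children;
    }

    static MimeNode leaf(String contentType, String text) {
        return new MimeNode(contentType, text, Collections.<MimeNode>emptyList());
    }

    static MimeNode multipart(String subtype, MimeNode... children) {
        return new MimeNode("multipart/" + subtype, null, Arrays.asList(children));
    }

    /** Depth-first search for the first text/html leaf; returns null when there is none. */
    static String findHtml(MimeNode node) {
        if (node.contentType.startsWith("multipart/")) {
            for (MimeNode child : node.children) {
                String html = findHtml(child);
                if (html != null) {
                    return html; // stop at the first HTML part found
                }
            }
            return null;
        }
        return node.contentType.startsWith("text/html") ? node.text : null;
    }
}
```

The important detail is that a nested multipart returning nothing does not end the search; the loop moves on to the next sibling.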
I am trying to upload a file to an S3 bucket, and before doing the upload I set the file's metadata. The upload fails with an error saying the signature doesn't match. Below is the code I am using:
public URL send(File f, HashMap<String,String> metadata, String type) throws Exception {
String path = type+"/"+f.getName();
InitiateMultipartUploadRequest req = new InitiateMultipartUploadRequest(container, secretKey).withKey(path);
req.setCannedACL(CannedAccessControlList.AuthenticatedRead);
if (metadata != null) {
ObjectMetadata objectMetadata = new ObjectMetadata();
Set<String> keys = metadata.keySet();
Iterator<String> i = keys.iterator();
while (i.hasNext()) {
String key = i.next();
objectMetadata.addUserMetadata(key, metadata.get(key));
}
req.setObjectMetadata(objectMetadata);
}
InitiateMultipartUploadResult res = s3client.initiateMultipartUpload(req);
String uploadId = res.getUploadId();
long fileSize = f.length();
//check the size doesn't exceed max limit
if (fileSize > MAX_OBJ_SIZE) {
throw new Exception("Object size exceeds repository limit");
}
long chunkSize = 1024 * 1024 * 16;
int chunks = (int) (fileSize/chunkSize + 2);
List<PartETag> chunkList = new ArrayList<PartETag>();
long pos = 0;
try {
for (int i = 1; i < chunks; i++) {
if ((chunks -i) < 2) {
chunkSize = fileSize - pos;
}
UploadPartRequest upReq = new UploadPartRequest()
.withBucketName(container).withKey(path)
.withUploadId(uploadId).withPartNumber(i)
.withFileOffset(pos).withFile(f)
.withPartSize(chunkSize);
PartETag pTag = null;
// repeat the upload until it succeeds.
boolean repeat;
do {
repeat = false; // reset switch
try {
// Upload part and add response to our list.
pTag = s3client.uploadPart(upReq).getPartETag();
}
catch (Exception ex) {
repeat = true; // repeat
}
} while (repeat);
chunkList.add(pTag);
pos = pos + chunkSize;
}
CompleteMultipartUploadRequest compl = new CompleteMultipartUploadRequest(
container, secretKey, uploadId, chunkList).withKey(path);
CompleteMultipartUploadResult complRes = s3client.completeMultipartUpload(compl);
return new URL(URLDecoder.decode(complRes.getLocation(), "UTF-8"));
}
catch (Exception ex) {
// Abort under the same key the upload was initiated with (path), and keep the original cause.
s3client.abortMultipartUpload(new AbortMultipartUploadRequest(container,
path, uploadId));
throw new Exception("File upload error: " + ex, ex);
}
}
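As a side note on the part arithmetic in the loop above (it is probably unrelated to the signature error): with fileSize/chunkSize + 2 the last part ends up with zero bytes whenever the file size is an exact multiple of the chunk size. A rough sketch of computing the offsets and sizes up front is below; PartPlan is a made-up helper and does not touch the AWS SDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: plans (offset, size) pairs for a multipart upload.
final class PartPlan {
    /** Returns one {offset, size} pair per part; the last part holds the remainder. */
    static List<long[]> plan(long fileSize, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<long[]> parts = new ArrayList<>();
        for (long pos = 0; pos < fileSize; pos += chunkSize) {
            long size = Math.min(chunkSize, fileSize - pos);
            parts.add(new long[] {pos, size});
        }
        return parts;
    }
}
```

Each pair would feed withFileOffset and withPartSize, with the part number being the list index plus one.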
Below is the error I am getting:
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: 0805716BBD0662AB, AWS Error Code: SignatureDoesNotMatch, AWS Error Message: The request signature we calculated does not match the signature you provided. Check your key and signing method., S3 Extended Request ID: wNAzUyrLZgWCazZFe3KpMHO0uh0FM5FF7fiwBzN1A2YDEYS5hKZBYh5nWSjIhnhG
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:767)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:414)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:228)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3316)
at com.amazonaws.services.s3.AmazonS3Client.initiateMultipartUpload(AmazonS3Client.java:2401)
at net.timbusproject.storage.awss3.S3Client.send(S3Client.java:134)
Line 134 in S3Client.java, where the error occurs, is:
InitiateMultipartUploadResult res = s3client.initiateMultipartUpload(req);
The upload works fine if I don't attach any metadata, i.e., if I comment out the line below, the upload works:
req.setObjectMetadata(objectMetadata);
I am unable to figure out why the request fails when metadata is set. Am I missing a step in the upload process?
I was able to work around this problem by URL encoding the metadata keys and values.
objectMetadata.addUserMetadata(URLEncoder.encode(key, "UTF-8"), URLEncoder.encode(metadata.get(key),"UTF-8"));
Apparently the metadata contains some offending characters that break the AWS calls. This workaround lets the upload complete without error and also sets the metadata, but the strings remain URL-encoded, which can be a problem later.
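One way to live with the encoded strings is to keep the encoding and decoding in one place, so whatever reads the metadata back decodes it symmetrically. A minimal sketch, assuming UTF-8 on both sides; MetadataCodec is a made-up helper, not part of the AWS SDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: URL-encodes metadata before upload and decodes it after download.
final class MetadataCodec {
    static Map<String, String> encode(Map<String, String> raw) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            out.put(encode(e.getKey()), encode(e.getValue()));
        }
        return out;
    }

    static Map<String, String> decode(Map<String, String> stored) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : stored.entrySet()) {
            out.put(decode(e.getKey()), decode(e.getValue()));
        }
        return out;
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always available", ex);
        }
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always available", ex);
        }
    }
}
```

The encoded map is what would go into addUserMetadata; anything that later reads the object's user metadata would pass it through decode first.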
I am searching for a .txt file that is contained in a change set.
Then I need to recreate the file's full directory path locally on my PC.
For example, if there is a file called "test.txt" located at:
Project1-->Folder1-->Folder2-->test.txt
So far I have managed to find the file.
Now I need to fetch its full directory path and create a matching one on my PC.
Result on my PC:
Folder1-->Folder2-->test.txt
This is what I did to search for the file within a change set and retrieve it:
public IFileItem getTextFileFile(IChangeSet changeSet, ITeamRepository repository) throws TeamRepositoryException{
IVersionableManager vm = SCMPlatform.getWorkspaceManager(repository).versionableManager();
List changes = changeSet.changes();
IFileItem toReturn = null;
for(int i=0;i<changes.size();i++) {
Change change = (Change) changes.get(i);
IVersionableHandle after = change.afterState();
if( after != null && after instanceof IFileItemHandle) {
IFileItem fileItem = (IFileItem) vm.fetchCompleteState(after, null);
if(fileItem.getName().contains(".txt")) {
toReturn = fileItem;
break;
} else {
continue;
}
}
}
if(toReturn == null){
throw new TeamRepositoryException("Could not find the file");
}
return toReturn;
}
I use RTC:4
Win:XP
Thanks in advance.
I obtained an IConfiguration as follows:
IWorkspaceManager workspaceManager = SCMPlatform.getWorkspaceManager(repository);
IWorkspaceSearchCriteria wsSearchCriteria = WorkspaceSearchCriteria.FACTORY.newInstance();
wsSearchCriteria.setKind(IWorkspaceSearchCriteria.STREAMS);
wsSearchCriteria.setPartialOwnerNameIgnoreCase(projectAreaName);
List<IWorkspaceHandle> workspaceHandles = workspaceManager.findWorkspaces(wsSearchCriteria, Integer.MAX_VALUE, Application.getMonitor());
IWorkspaceConnection workspaceConnection = workspaceManager.getWorkspaceConnection(workspaceHandles.get(0),Application.getMonitor());
IComponentHandle component = changeSet.getComponent();
IConfiguration configuration = workspaceConnection.configuration(component);
List lst = new ArrayList<String>();
lst=configuration.locateAncestors(lst,Application.getMonitor());
=========================================
Now, to get the full path of the file item, I wrote the following method, adapted from:
https://jazz.net/forum/questions/94927/how-do-i-find-moved-from-location-for-a-movedreparented-item-using-rtc-4-java-api
=========================================
private String getFullPath(List ancestor, ITeamRepository repository)
throws TeamRepositoryException {
String directoryPath = "";
for (Object ancestorObj : ancestor) {
IAncestorReport ancestorImpl = (IAncestorReport) ancestorObj;
for (Object nameItemPairObj : ancestorImpl.getNameItemPairs()) {
NameItemPairImpl nameItemPair = (NameItemPairImpl) nameItemPairObj;
Object item = SCMPlatform.getWorkspaceManager(repository)
.versionableManager()
.fetchCompleteState(nameItemPair.getItem(), null);
String pathName = "";
if (item instanceof IFolder) {
pathName = ((IFolder) item).getName();
}
else if (item instanceof IFileItem) {
pathName = ((IFileItem) item).getName();
}
if (!pathName.equals(""))
directoryPath = directoryPath + "\\" + pathName;
}
}
return directoryPath;
}
=========================================
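Once the folder names are known, recreating the structure locally is plain java.nio work. Below is a minimal sketch, assuming the names are collected into a list (instead of being joined with "\\") and that the file content is already available as bytes from wherever your code retrieves it; LocalPathBuilder is a made-up helper and not part of the RTC API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: maps repository path segments onto a local directory.
final class LocalPathBuilder {
    /** Resolves the segments under root and rejects paths that would escape it. */
    static Path resolve(Path root, List<String> segments) {
        if (segments.isEmpty()) {
            throw new IllegalArgumentException("no path segments given");
        }
        Path base = root.normalize();
        Path target = base;
        for (String segment : segments) {
            target = target.resolve(segment);
        }
        target = target.normalize();
        if (!target.startsWith(base) || target.equals(base)) {
            throw new IllegalArgumentException("path escapes the root: " + segments);
        }
        return target;
    }

    /** Creates the missing parent folders, then writes the file content. */
    static Path write(Path root, List<String> segments, byte[] content) {
        Path target = resolve(root, segments);
        try {
            Files.createDirectories(target.getParent());
            return Files.write(target, content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For the example above, the segments would be Folder1, Folder2 and test.txt, resolved under whichever local root folder you choose.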