Downloading large amounts of data and storing them on Android & iOS - java

So I have this API that downloads from our web service, but the web service sends the data as a ZIP file instead of a JSON stream or something similar.
The files can get quite large, but they are not saved as a ZIP file on the device; instead they are unzipped and then saved in a Realm database.
This seems like an extremely complicated way to do this, and I would like to remove the ZIP part and turn it into a JSON streaming service instead.
Is that a valid approach, or is there something else I should be doing?
For context, the app is basically a form viewer that is intended to have an offline mode.
[WebMethod]
public string AndroidGetFormByID(string sessionID, int formID)
{
    JObject json = new JObject();
    UserDetails user = DBUserHelper.GetUserBySessionID(new Guid(sessionID));
    if (user == null)
    {
        json["Error"] = "Not logged in";
        return json.ToString(Newtonsoft.Json.Formatting.None);
    }
    Client client = Client.GetClient(user.ClientID);
    var formTemplateRecord = SqlInsertUpdate.SelectQuery("SELECT JSON, CreatedDate FROM FormTemplates WHERE ID=#ID AND clientID=#clientID", "FormsConnectionString", new List<SqlParameter> { new SqlParameter("#ID", formID), new SqlParameter("#clientID", client.ID) }).GetFirstRow();
    var formJson = formTemplateRecord["JSON"]?.ToString();
    if (formJson == null)
    {
        json["Error"] = "No such form";
        return json.ToString(Newtonsoft.Json.Formatting.None);
    }
    json = JObject.Parse(formJson);
    json["formID"] = formID;
    try
    {
        json["created"] = Convert.ToDateTime(formTemplateRecord["CreatedDate"]).ToString("dd/MM/yyyy");
    }
    catch (Exception e)
    {
        // ignore a missing or malformed CreatedDate
    }
    MemoryStream convertedFormData = new MemoryStream();
    try
    {
        // ToByteArray() is presumably an in-house string extension (e.g. Encoding.UTF8.GetBytes)
        using (MemoryStream ms = new MemoryStream(json.ToString(Newtonsoft.Json.Formatting.None).ToByteArray()))
        {
            ms.Seek(0, SeekOrigin.Begin);
            using (ZipFile zipedForm = new ZipFile())
            {
                zipedForm.AddEntry(json["title"].ToString() + "_" + json["formID"].ToString(), ms);
                zipedForm.Save(convertedFormData);
            }
        }
    }
    catch (Exception ex)
    {
        return ex.Message.ToString();
    }
    return Convert.ToBase64String(convertedFormData.ToArray());
}
Here is a bit of Java code for context on how it is being used:
private void getForms(WeakReference<Context> contextWeakReference, List<Integer> ids)
{
    AtomicInteger atomicReference = new AtomicInteger();
    Observable.interval(1, TimeUnit.SECONDS)
            .map(aLong -> ids.get(aLong.intValue()))
            .take(ids.size())
            .flatMap(integer -> {
                atomicReference.set(integer);
                GetFormsListener.setCurrentItem(ids.indexOf(integer) + 1);
                FormDBHelper.updateTemplateDownloading(contextWeakReference, atomicReference.get(), -1, FormIOHelper.FORM_STATUS.DOWNLOADING.toString());
                return ServiceGenerator.createService().androidGetFormByID(ClientUtils.loginDetailsConstructor.sessionID, String.valueOf(integer));
            }, 1)
            .map(base64 -> {
                final Context context = contextWeakReference.get();
                if (context == null)
                    throw new NullPointerException();
                AppUtils.LogToConsole(Log.ASSERT, "Reached Here Before Write Form", AppUtils.getLoggedTime());
                final File file = FormIOHelper.checkFormFileExists(context.getFilesDir(), atomicReference.get(), "Library", FormIOHelper.FOLDERS.TEMPLATES.toString());
                FormIOHelper.writeForm(file, base64);
                AppUtils.LogToConsole(Log.ASSERT, "Reached Here After Write Form", AppUtils.getLoggedTime());
                return file;
            })
            .map(file -> {
                JsonObject formObject = null;
                try {
                    JsonObject jsonObject = FormIOHelper.getFormFromZipFileAndStrip(file);
                    formObject = FormDBHelper.stripFormJson(contextWeakReference, jsonObject, -1);
                } catch (Throwable e) {
                    ErrorLog.log(e);
                    FormDBHelper.updateTemplateDownloading(contextWeakReference, atomicReference.get(), -1, FormIOHelper.FORM_STATUS.ERROR.toString());
                }
                if (formObject == null)
                    return new JsonArray();
                JsonArray jsonElements;
                if (formObject.has("embeddedFiles") && formObject.get("embeddedFiles").isJsonArray())
                    jsonElements = formObject.get("embeddedFiles").getAsJsonArray();
                else
                    jsonElements = new JsonArray();
                if (jsonElements.size() > 0) {
                    final List<DownloadableFilesConstructor> downloadableFilesConstructorList = FormIOHelper.setEmbeddedFiles(jsonElements);
                    Context context = contextWeakReference.get();
                    if (context == null)
                        return jsonElements;
                    DownloadableFilesDBHelper.saveData(context, downloadableFilesConstructorList);
                }
                return jsonElements;
            })
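The chain as posted is truncated: nothing runs until a terminal subscribe is attached. A minimal sketch of the missing subscriber (the handler bodies are hypothetical):

            .subscribe(
                    jsonElements -> { /* per-form success, e.g. advance the progress UI */ },
                    throwable -> ErrorLog.log(throwable)
            );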

You can try out Google Gson's streaming support (JsonReader). It helps when downloading large amounts of data through a JSON REST API, because records can be parsed and persisted one at a time instead of buffering the whole response.
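A minimal sketch of that approach on the Android side, using Gson's JsonReader (the response shape and field names here are assumptions for illustration; adjust to your actual API):

import com.google.gson.stream.JsonReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public final class FormStreamReader {
    // Reads a response shaped like [{"formID":1,"title":"..."}, ...] one object
    // at a time, so the full payload is never held in memory.
    public static void readForms(InputStream body) throws IOException {
        try (JsonReader reader = new JsonReader(new InputStreamReader(body, StandardCharsets.UTF_8))) {
            reader.beginArray();
            while (reader.hasNext()) {
                reader.beginObject();
                int formId = -1;
                String title = null;
                while (reader.hasNext()) {
                    switch (reader.nextName()) {
                        case "formID": formId = reader.nextInt(); break;
                        case "title":  title = reader.nextString(); break;
                        default:       reader.skipValue(); // ignore unknown fields
                    }
                }
                reader.endObject();
                // Persist each form here (e.g. write it to Realm) before reading the next one.
                System.out.println("form " + formId + ": " + title);
            }
            reader.endArray();
        }
    }
}

Each element is handed off as soon as it is parsed, so memory use stays flat no matter how many forms the response contains.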

Related

Can not extract text via Apache Tika using Lucee

I would like to extract text from pdf, docx etc. via Lucee 5+ (5.2.9), but unfortunately I get an empty result set. I have used several Apache Tika versions (runnable jar with Java 1.8.0) that might fit my specific Lucee and Java requirements, but the result set always remains empty.
exract.cfc
component {
    public any function init() {
        _setTikaJarPath( GetDirectoryFromPath( GetCurrentTemplatePath() ) & "tika-app-1.19.1.jar" );
        return this;
    }

    private struct function doParse( required any fileContent, boolean includeMeta=true, boolean includeText=true ) {
        var result = {};
        var is = "";
        var jarPath = _getTikaJarPath();
        if ( IsBinary( arguments.fileContent ) ) {
            is = CreateObject( "java", "java.io.ByteArrayInputStream" ).init( arguments.fileContent );
        } else {
            // TODO: support plain string input (i.e. html)
            return {};
        }
        try {
            var parser = CreateObject( "java", "org.apache.tika.parser.AutoDetectParser", jarPath );
            var ch = CreateObject( "java", "org.apache.tika.sax.BodyContentHandler", jarPath ).init( -1 );
            var md = CreateObject( "java", "org.apache.tika.metadata.Metadata", jarPath ).init();
            parser.parse( is, ch, md );
            if ( arguments.includeMeta ) {
                result.metadata = {};
                for ( var key in md.names() ) {
                    var mdval = md.get( key );
                    if ( !isNull( mdval ) ) {
                        result.metadata[ key ] = _removeNonUnicodeChars( mdval );
                    }
                }
            }
            if ( arguments.includeText ) {
                result.text = _removeNonUnicodeChars( ch.toString() );
            }
        } catch( any e ) {
            result = { error = e };
        }
        return result;
    }

    public function read( required string filename ) {
        var result = {};
        if ( !fileExists( filename ) ) {
            result.error = "#filename# does not exist.";
            return result;
        }
        var f = createObject( "java", "java.io.File" ).init( filename );
        var fis = createObject( "java", "java.io.FileInputStream" ).init( f );
        try {
            result = doParse( fis );
        } catch( any e ) {
            result.error = e;
        }
        fis.close();
        return result;
    }

    private string function _removeNonUnicodeChars( required string potentiallyDirtyString ) {
        return ReReplace( arguments.potentiallyDirtyString, "[^\x20-\x7E]", "", "all" );
    }

    // GETTERS AND SETTERS
    private string function _getTikaJarPath() {
        return _tikaJarPath;
    }

    private void function _setTikaJarPath( required string tikaJarPath ) {
        _tikaJarPath = arguments.tikaJarPath;
    }
}
and the code that I use to run it:
<cfset takis = new exract()>
<cfset files = directoryList(expandPath("./sources"))>
<cfloop index="f" array="#files#">
    <cfif not findNoCase(".DS_Store", f)>
        <cfdump var="#takis.read(f)#" label="#f#">
    </cfif>
</cfloop>
I think the problem is a class clash: the Lucee core engine already loads a version of Tika, meaning the one you point to is ignored. But the loaded version doesn't behave as expected, returning empty strings as you've seen.
I've solved this by using OSGi to load the desired Tika version. This involves editing the Manifest of the tika-app jar to include basic OSGi metadata and then loading it via my osgiLoader.
There is a pre-built Tika bundle available but I haven't been able to get it to work with Lucee.
Here's how to convert the latest tika-app jar to OSGi:
1. Open "tika-app-1.28.2.jar" with 7-Zip.
2. Open META-INF, then select MANIFEST.MF and press F4 to open it in a text editor.
3. Add the following to the end of the file:
Bundle-Name: Apache Tika App Bundle
Bundle-SymbolicName: apache-tika-app-bundle
Bundle-Description: Apache Tika App jar converted to an OSGi bundle
Bundle-ManifestVersion: 2
Bundle-Version: 1.28.2
Bundle-ClassPath: .,tika-app-1.28.2.jar
4. Save, choosing to update the archive when prompted.
You can then call the jar using osgiLoader as follows:
extractor.cfc
component {
    property name="loader" type="object";
    property name="tikaBundle" type="struct";

    public extractor function init( required object loader, required struct tikaBundle ){
        variables.loader = arguments.loader
        variables.tikaBundle = arguments.tikaBundle
        return this
    }

    public string function parseToString( required string filePath ){
        try{
            var fileStream = CreateObject( "java", "java.io.FileInputStream" ).init( JavaCast( "string", arguments.filePath ) )
            var tikaObject = loader.loadClass( "org.apache.tika.Tika", tikaBundle.path, tikaBundle.name, tikaBundle.version )
            var result = tikaObject.parseToString( fileStream )
        }
        finally{
            fileStream.close()
        }
        return result
    }
}
(The following script assumes extractor.cfc, the modified Tika jar, the osgiLoader.cfc and the document to be processed are in the same directory.)
index.cfm
<cfscript>
docPath = ExpandPath( "test.pdf" )
loader = New osgiLoader()
tikaBundle = {
    version: "1.28.2"
    ,name: "apache-tika-app-bundle"
    ,path: ExpandPath( "tika-app-1.28.2.jar" )
}
extractor = New extractor( loader, tikaBundle )
result = extractor.parseToString( docPath )
dump( result )
</cfscript>
Another way to get the right version loaded is to use JavaLoader. For some reason I couldn't get it to work with the latest tika-app jar (1.28.2), but 1.19.1 does seem to work.
Hacking the existing extension
I would advise you to raise an issue with Preside to change their extension to avoid the clash, but as a temporary hack you could try amending it yourself as follows:
First, add your modified Tika bundle and the osgiLoader.cfc to the /preside-ext-tika/services/ directory.
Next, change line 14 of DocumentMetadataService.cfc so the name of the Tika jar path matches your modified bundle.
_setTikaJarPath( GetDirectoryFromPath( GetCurrentTemplatePath( ) ) & "tika-app-1.28.2.jar" );
Then, modify lines 33-35 of the same cfc to replace:
var parser = CreateObject( "java", "org.apache.tika.parser.AutoDetectParser", jarPath );
var ch = CreateObject( "java", "org.apache.tika.sax.BodyContentHandler" , jarPath ).init(-1);
var md = CreateObject( "java", "org.apache.tika.metadata.Metadata" , jarPath ).init();
with the following:
var loader = New osgiLoader();
var tikaBundle = { version: "1.28.2", name: "apache-tika-app-bundle" };
var parser = loader.loadClass( "org.apache.tika.parser.AutoDetectParser", jarPath, tikaBundle.name, tikaBundle.version )
var ch = loader.loadClass( "org.apache.tika.sax.BodyContentHandler" , jarPath, tikaBundle.name, tikaBundle.version ).init(-1)
var md = loader.loadClass( "org.apache.tika.metadata.Metadata" , jarPath, tikaBundle.name, tikaBundle.version ).init()
NB: I don't have Preside so can't test it in context.

AWS SDK V2 S3 fetch object is not fetching objects more than 1000

I am using AWS SDK version 2.16.78, but the ListObjectsRequest is not fetching more than 1000 objects.
I went through the documentation but wasn't able to find how to set the continuation token.
I am using the following code snippet:
try {
    ListObjectsRequest listObjects = ListObjectsRequest
            .builder()
            .bucket(bucketName)
            .build();
    ListObjectsResponse res = s3.listObjects(listObjects);
    List<S3Object> objects = res.contents();
    for (ListIterator iterVals = objects.listIterator(); iterVals.hasNext(); ) {
        S3Object myValue = (S3Object) iterVals.next();
        System.out.print("\n The name of the key is " + myValue.key());
    }
} catch (S3Exception e) {
    System.err.println(e.awsErrorDetails().errorMessage());
    System.exit(1);
}
The above code is only fetching 1000 s3 objects.
As you indicated, AWS will only return up to 1000 of the objects in a bucket:
Returns some or all (up to 1,000) of the objects in a bucket.
Amazon S3 lists objects in alphabetical order. You can take advantage of this fact and provide a marker to the key that should be used to start with in the next requests, if appropriate:
try {
    ListObjectsRequest listObjectsRequest = ListObjectsRequest
            .builder()
            .bucket(bucketName)
            .build();
    ListObjectsResponse listObjectsResponse = null;
    String lastKey = null;
    do {
        if (listObjectsResponse != null) {
            // Resume listing right after the last key of the previous page
            listObjectsRequest = listObjectsRequest.toBuilder()
                    .marker(lastKey)
                    .build();
        }
        listObjectsResponse = s3.listObjects(listObjectsRequest);
        List<S3Object> objects = listObjectsResponse.contents();
        // Iterate over results
        for (S3Object myValue : objects) {
            String key = myValue.key();
            System.out.print("\n The name of the key is " + key);
            // Update the value of the last key processed
            lastKey = key;
        }
    } while (listObjectsResponse.isTruncated());
} catch (S3Exception e) {
    System.err.println(e.awsErrorDetails().errorMessage());
    System.exit(1);
}
Something very similar can be achieved with v2 of the list objects API, using the ListObjectsV2Request startAfter method.
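For example, as a fragment only (with lastKey tracked the same way as in the previous loop):

ListObjectsV2Request listObjectsRequest = ListObjectsV2Request
        .builder()
        .bucket(bucketName)
        .startAfter(lastKey) // begin listing after this key, analogous to the v1 marker
        .build();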
With v2, you can use ListObjectsV2Response and a continuation token as well. Something similar to:
try {
    ListObjectsV2Request listObjectsRequest = ListObjectsV2Request
            .builder()
            .bucket(bucketName)
            .build();
    ListObjectsV2Response listObjectsResponse = null;
    String nextContinuationToken = null;
    do {
        if (listObjectsResponse != null) {
            // Pass the token returned by the previous page of results
            listObjectsRequest = listObjectsRequest.toBuilder()
                    .continuationToken(nextContinuationToken)
                    .build();
        }
        listObjectsResponse = s3.listObjectsV2(listObjectsRequest);
        nextContinuationToken = listObjectsResponse.nextContinuationToken();
        List<S3Object> objects = listObjectsResponse.contents();
        // Iterate over results
        for (S3Object myValue : objects) {
            System.out.print("\n The name of the key is " + myValue.key());
        }
    } while (listObjectsResponse.isTruncated());
} catch (S3Exception e) {
    System.err.println(e.awsErrorDetails().errorMessage());
    System.exit(1);
}
Finally, you can use the listObjectsV2Paginator method to iterate over the results, similar to how listNextBatchOfObjects was used in v1 of the API. See for instance this related v1 code and these two related SO questions.
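A minimal sketch of the paginator approach (assuming the same s3 client and bucketName as above; ListObjectsV2Iterable lives in software.amazon.awssdk.services.s3.paginators):

ListObjectsV2Request listObjectsRequest = ListObjectsV2Request
        .builder()
        .bucket(bucketName)
        .build();
// The paginator tracks continuation tokens internally and lazily issues
// follow-up requests as iteration crosses page boundaries.
ListObjectsV2Iterable pages = s3.listObjectsV2Paginator(listObjectsRequest);
for (S3Object object : pages.contents()) {
    System.out.println("The name of the key is " + object.key());
}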
All the mappings between operations from v1 and v2 versions of the API are documented here.

Obtain Kerberos ticket in C/C++

Does anyone know how to get a ticket from the Key Distribution Center (KDC) using the MIT krb5 API in C/C++?
I already have a working Java Client which uses GSS-API to obtain a ticket from the KDC (using a local TGT) and forwards it to a Java Server.
The server accepts the security context using the following logic:
private GSSContext acceptSecurityContext(Subject serverSubject, final byte[] kerberosServiceTicket) {
    return Subject.doAs(serverSubject, (PrivilegedAction<GSSContext>) () -> {
        GSSContext gssContext;
        try {
            gssContext = manager.createContext((GSSCredential) null);
        } catch (GSSException ex) {
            LOGGER.warn("Could not create Kerberos gssContext: " + ex.getMessage(), ex);
            return null;
        }
        try {
            gssContext.acceptSecContext(kerberosServiceTicket, 0, kerberosServiceTicket.length);
        } catch (GSSException ex) {
            LOGGER.warn("Could not accept security context: " + ex.getMessage(), ex);
            return null;
        }
        return gssContext;
    });
}
I am trying to implement a C client - similar to the Java one - using the MIT krb5 API, and I can't seem to make it work. So far this is my C client code:
krb5_context context;
krb5_ccache ccache;
krb5_creds *outCreds = NULL;
krb5_creds inCreds;
int retval;
char *principal = "...";

retval = krb5_init_secure_context(&context);
...
retval = krb5_cc_default(context, &ccache);
...
memset(&inCreds, 0, sizeof(inCreds));
retval = krb5_parse_name(context, principal, &inCreds.server);
...
retval = krb5_cc_get_principal(context, ccache, &inCreds.client);
...
retval = krb5_get_credentials(context, 0, ccache, &inCreds, &outCreds);
...
// also tried using the following: krb5Ticket->enc_part.ciphertext.data
// (maybe this is the correct way, but I should somehow decrypt it and use krb5Ticket->enc_part2 ?)
// retval = krb5_decode_ticket(&outCreds->ticket, &krb5Ticket);
// ...
// note: ticket.data is binary and not NUL-terminated, so use ticket.length rather than strlen()
char *base64KerberosTicket = base64_encode(outCreds->ticket.data, outCreds->ticket.length);
char *response = loginKerberos(base64KerberosTicket);
...
After some further reading it appears that my approach was not correct for my use case. I should have used GSS-API directly.
The following snippet works:
gss_name_t get_spn(char *spn)
{
    OM_uint32 maj_stat;
    OM_uint32 min_stat;
    gss_buffer_desc name_buf = GSS_C_EMPTY_BUFFER;
    gss_name_t spn_gss_name = GSS_C_NO_NAME;

    name_buf.value = spn;
    name_buf.length = strlen(name_buf.value);
    maj_stat = gss_import_name(&min_stat, &name_buf, GSS_KRB5_NT_PRINCIPAL_NAME, &spn_gss_name);
    if (GSS_ERROR(maj_stat))
    {
        display_status("Major status", maj_stat, GSS_C_GSS_CODE);
        display_status("Minor status", min_stat, GSS_C_MECH_CODE);
    }
    return spn_gss_name;
}

char* init_sec_context(char *spn)
{
    OM_uint32 maj_stat;
    OM_uint32 min_stat;
    OM_uint32 flags = GSS_C_REPLAY_FLAG | GSS_C_SEQUENCE_FLAG | GSS_C_MUTUAL_FLAG;
    gss_ctx_id_t gss_context = GSS_C_NO_CONTEXT;
    gss_name_t spn_gss_name = get_spn(spn);
    gss_buffer_desc output_token;
    char *base64_encoded_kerberos_token = NULL;

    maj_stat = gss_init_sec_context(
        &min_stat,                 // minor_status
        GSS_C_NO_CREDENTIAL,       // claimant_cred_handle
        &gss_context,              // context_handle
        spn_gss_name,              // target_name
        GSS_C_NO_OID,              // mech_type of the desired mechanism
        flags,                     // req_flags
        0,                         // time_req for the context to remain valid; 0 for default lifetime
        GSS_C_NO_CHANNEL_BINDINGS, // channel bindings
        GSS_C_NO_BUFFER,           // input token
        NULL,                      // actual_mech_type
        &output_token,             // output token
        NULL,                      // ret_flags
        NULL                       // time_rec
    );
    if (GSS_ERROR(maj_stat))
    {
        ...
    }
    else if (output_token.length != 0)
    {
        base64_encoded_kerberos_token = base64_encode(output_token.value, output_token.length, &(output_token.length));
    }
    if (gss_context != GSS_C_NO_CONTEXT)
    {
        gss_delete_sec_context(&min_stat, &gss_context, GSS_C_NO_BUFFER);
    }
    if (spn_gss_name != GSS_C_NO_NAME)
    {
        gss_release_name(&min_stat, &spn_gss_name);
    }
    gss_release_buffer(&min_stat, &output_token);
    return base64_encoded_kerberos_token;
}

Using PDFBox to remove Optional Content Groups that are not enabled

I'm using Apache PDFBox from Java, and I have a source PDF with multiple optional content groups. What I want to do is export a version of the PDF that includes only the standard content and the optional content groups that were enabled. It is important for my purposes that I preserve any dynamic aspects of the original... so text fields are still text fields, vector images are still vector images, etc. This is required because I ultimately intend to use a PDF form editor program that does not know how to handle optional content and would blindly render all of it, so I want to preprocess the source PDF and use the form editing program on a less cluttered destination PDF.
I've been trying to find anything on Google that could give me a hint on how to do this, but to no avail. I don't know if I'm just using the wrong search terms, or if this is outside of what the PDFBox API was designed for; I rather hope it's not the latter. The info shown here does not seem to work (converting the C# code to Java), because despite the PDF I'm trying to import having optional content, there do not seem to be any OC resources when I examine the tokens on each page.
for (PDPage page : pages) {
    PDResources resources = page.getResources();
    PDFStreamParser parser = new PDFStreamParser(page);
    parser.parse();
    Collection tokens = parser.getTokens();
    ...
}
I'm truly sorry for not having more code to show what I've tried so far, but I've been poring over the Java API docs for about 8 hours now trying to figure out what I might need, and just haven't been able to work it out.
What I DO know how to do is add text, lines, and images to a new PDPage, but I do not know how to retrieve that information from a given source page to copy it over, nor how to tell which optional content group such information is part of (if any). I am also not sure how to copy form fields in the source pdf over to the destination, nor how to copy the font information over.
Honestly, if there's a web page out there that I wasn't able to find with google with the searches that I tried, I'd be entirely happy to read up more about it, but I am really quite stuck here, and I don't know anyone personally that knows about this library.
Please help.
EDIT:
Trying what I understand from what was suggested below, I've written a loop to examine each XObject on the page as follows:
PDResources resources = pdPage.getResources();
Iterable<COSName> names = resources.getXObjectNames();
for (COSName name : names) {
    PDXObject xobj = resources.getXObject(name);
    PDFStreamParser parser = new PDFStreamParser(xobj.getStream().toByteArray());
    parser.parse();
    Object[] tokens = parser.getTokens().toArray();
    for (int i = 0; i < tokens.length - 1; i++) {
        Object obj = tokens[i];
        if (obj instanceof COSName && obj.equals(COSName.OC)) {
            i++;
            obj = tokens[i];
            if (obj instanceof COSName) {
                PDPropertyList props = resources.getProperties((COSName) obj);
                if (props != null) {
                    ...
However, after an OC key, the next entry in the tokens array is always an Operator tagged as "BMC". Nowhere am I finding any info that I can recognize from the named optional content groups.
Here's a robust solution for removing marked content blocks (open to feedback if anyone finds anything that isn't working right). You should be able to adjust for OC blocks...
This code properly handles nesting and removal of resources (xobject, graphics state and fonts - easy to add others if needed).
public class MarkedContentRemover {
    private final MarkedContentMatcher matcher;

    public MarkedContentRemover(MarkedContentMatcher matcher) {
        this.matcher = matcher;
    }

    public int removeMarkedContent(PDDocument doc, PDPage page) throws IOException {
        ResourceSuppressionTracker resourceSuppressionTracker = new ResourceSuppressionTracker();
        PDResources pdResources = page.getResources();
        PDFStreamParser pdParser = new PDFStreamParser(page);
        PDStream newContents = new PDStream(doc);
        OutputStream newContentOutput = newContents.createOutputStream(COSName.FLATE_DECODE);
        ContentStreamWriter newContentWriter = new ContentStreamWriter(newContentOutput);
        List<Object> operands = new ArrayList<>();
        Operator operator = null;
        Object token;
        int suppressDepth = 0;
        boolean resumeOutputOnNextOperator = false;
        int removedCount = 0;
        while (true) {
            operands.clear();
            token = pdParser.parseNextToken();
            while (token != null && !(token instanceof Operator)) {
                operands.add(token);
                token = pdParser.parseNextToken();
            }
            operator = (Operator) token;
            if (operator == null) break;
            if (resumeOutputOnNextOperator) {
                resumeOutputOnNextOperator = false;
                suppressDepth--;
                if (suppressDepth == 0)
                    removedCount++;
            }
            if (OperatorName.BEGIN_MARKED_CONTENT_SEQ.equals(operator.getName())
                    || OperatorName.BEGIN_MARKED_CONTENT.equals(operator.getName())) {
                COSName contentId = (COSName) operands.get(0);
                final COSDictionary properties;
                if (operands.size() > 1) {
                    Object propsOperand = operands.get(1);
                    if (propsOperand instanceof COSDictionary) {
                        properties = (COSDictionary) propsOperand;
                    } else if (propsOperand instanceof COSName) {
                        properties = pdResources.getProperties((COSName) propsOperand).getCOSObject();
                    } else {
                        properties = new COSDictionary();
                    }
                } else {
                    properties = new COSDictionary();
                }
                if (matcher.matches(contentId, properties)) {
                    suppressDepth++;
                }
            }
            if (OperatorName.END_MARKED_CONTENT.equals(operator.getName())) {
                if (suppressDepth > 0)
                    resumeOutputOnNextOperator = true;
            }
            else if (OperatorName.SET_GRAPHICS_STATE_PARAMS.equals(operator.getName())) {
                resourceSuppressionTracker.markForOperator(COSName.EXT_G_STATE, operands.get(0), suppressDepth == 0);
            }
            else if (OperatorName.DRAW_OBJECT.equals(operator.getName())) {
                resourceSuppressionTracker.markForOperator(COSName.XOBJECT, operands.get(0), suppressDepth == 0);
            }
            else if (OperatorName.SET_FONT_AND_SIZE.equals(operator.getName())) {
                resourceSuppressionTracker.markForOperator(COSName.FONT, operands.get(0), suppressDepth == 0);
            }
            if (suppressDepth == 0) {
                newContentWriter.writeTokens(operands);
                newContentWriter.writeTokens(operator);
            }
        }
        if (resumeOutputOnNextOperator)
            removedCount++;
        newContentOutput.close();
        page.setContents(newContents);
        resourceSuppressionTracker.updateResources(pdResources);
        return removedCount;
    }

    private static class ResourceSuppressionTracker {
        // if the boolean is TRUE, the resource should be removed; if FALSE, it must be kept
        private final Map<COSName, Map<COSName, Boolean>> tracker = new HashMap<>();

        public void markForOperator(COSName resourceType, Object resourceNameOperand, boolean preserve) {
            if (!(resourceNameOperand instanceof COSName)) return;
            if (preserve) {
                markForPreservation(resourceType, (COSName) resourceNameOperand);
            } else {
                markForRemoval(resourceType, (COSName) resourceNameOperand);
            }
        }

        public void markForRemoval(COSName resourceType, COSName refId) {
            if (!resourceIsPreserved(resourceType, refId)) {
                getResourceTracker(resourceType).put(refId, Boolean.TRUE);
            }
        }

        public void markForPreservation(COSName resourceType, COSName refId) {
            getResourceTracker(resourceType).put(refId, Boolean.FALSE);
        }

        public void updateResources(PDResources pdResources) {
            for (Map.Entry<COSName, Map<COSName, Boolean>> resourceEntry : tracker.entrySet()) {
                for (Map.Entry<COSName, Boolean> refEntry : resourceEntry.getValue().entrySet()) {
                    if (refEntry.getValue().equals(Boolean.TRUE)) {
                        // remove from the sub-dictionary for this resource type
                        // (/XObject, /ExtGState or /Font)
                        pdResources.getCOSObject().getCOSDictionary(resourceEntry.getKey()).removeItem(refEntry.getKey());
                    }
                }
            }
        }

        private boolean resourceIsPreserved(COSName resourceType, COSName refId) {
            return getResourceTracker(resourceType).getOrDefault(refId, Boolean.FALSE);
        }

        private Map<COSName, Boolean> getResourceTracker(COSName resourceType) {
            if (!tracker.containsKey(resourceType)) {
                tracker.put(resourceType, new HashMap<>());
            }
            return tracker.get(resourceType);
        }
    }
}
Helper class:
public interface MarkedContentMatcher {
    public boolean matches(COSName contentId, COSDictionary props);
}
Optional content groups are marked with BDC and EMC. You will have to navigate through all of the tokens returned from the parser and remove the "section" from the array. Here is some C# code that was posted a while ago: How to delete an optional content group alongwith its content from pdf using pdfbox?
I investigated that (converting it to Java) but couldn't get it to work as expected. I managed to remove the content between BDC and EMC and then save the result using the same technique as the sample, but the PDF was corrupted. Perhaps that is down to my lack of C# knowledge (related to tuples etc.).
Here is what I came up with; as I said, it doesn't work, so perhaps you or someone else (mkl, Tilman Hausherr) can spot the flaw.
void OCGDelete(PDDocument doc, int pageNum, String OCName) {
    PDPage pdPage = (PDPage) doc.getDocumentCatalog().getPages().get(pageNum)
    PDResources pdResources = pdPage.getResources()
    PDFStreamParser pdParser = new PDFStreamParser(pdPage)
    int ocgStart
    int ocgLength
    Collection tokens = pdParser.getTokens()
    def newTokens = new ArrayList(tokens)
    try {
        for (int index = 0; index < newTokens.size(); index++) {
            def obj = newTokens[index]
            if (obj instanceof COSName && obj.equals(COSName.OC)) {
                // found optional content
                def startIndex = index
                index++
                if (index < newTokens.size()) {
                    obj = newTokens[index]
                    if (obj instanceof COSName) {
                        def prop = pdResources.getProperties(obj)
                        if (prop != null && prop instanceof PDOptionalContentGroup) {
                            if ((prop.getName()).equals(OCName)) {
                                println "Found the layer to be deleted"
                                println "prop name was " + prop.getName()
                                index++
                                if (index < newTokens.size()) {
                                    obj = newTokens[index]
                                    if ((obj.getName()).equals("BDC")) {
                                        ocgStart = index
                                        println "OCG start " + ocgStart
                                        ocgLength = -1
                                        index++
                                        while (index < newTokens.size()) {
                                            ocgLength++
                                            obj = newTokens[index]
                                            println "Loop through relevant OCG tokens " + obj
                                            if (obj instanceof Operator && (obj.getName()).equals("EMC")) {
                                                println "the next obj was " + obj
                                                println "after that " + newTokens[index + 1] + " and then " + newTokens[index + 2]
                                                println "OCG end " + ocgLength++
                                                break
                                            }
                                            index++
                                        }
                                        if (ocgLength > 0) {
                                            println "End index was " + (startIndex + ocgLength)
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
    catch (Exception ex) {
        println ex.message
    }
    // NB: removing by a rising index shifts the remaining elements as it goes
    for (int i = ocgStart; i < ocgStart + ocgLength; i++) {
        newTokens.removeAt(i)
    }
    PDStream newContents = new PDStream(doc)
    OutputStream output = newContents.createOutputStream(COSName.FLATE_DECODE)
    ContentStreamWriter writer = new ContentStreamWriter(output)
    writer.writeTokens(newTokens)
    output.close()
    pdPage.setContents(newContents)
}

Java BufferedImages from .ts File

How can I get a BufferedImage (a frame at a defined position) from a .ts file in Java? I would prefer not to use any JNI / CLI wrapper if possible.
System.out.println(JCodecUtil.detectFormat(file));
Demuxer demuxer = JCodecUtil.createDemuxer(JCodecUtil.detectFormat(file), file);
for (DemuxerTrack demuxerTrack : demuxer.getVideoTracks()) {
    Packet packet;
    while ((packet = demuxerTrack.nextFrame()) != null) {
        System.out.println("frame " + packet.getDuration());
    }
}
The output of the first snippet is just:
[ERROR] . (:0): Format MPEG_TS is not supported
MPEG_TS
[ERROR] . (:0): Format MPEG_TS is not supported
for (DemuxerTrack demuxerTrack : JCodecUtil.createM2TSDemuxer(file, TrackType.VIDEO).v1.getTracks()) {
    Packet packet;
    while ((packet = demuxerTrack.nextFrame()) != null) {
        System.out.println("frame : " + ImageIO.read(new ByteArrayInputStream(packet.getData().array())));
    }
}
In this snippet, it just outputs null for each frame. How can I fix this?
// 'data' here is the ByteBuffer of a video packet from the M2TS demuxer,
// i.e. packet.getData() in the loop above
Picture tmp = Picture.create(1920, 1088, ColorSpace.YUV420);
VideoDecoder vd = JCodecUtil.createVideoDecoder(JCodecUtil.detectDecoder(data.duplicate()), data.duplicate());
Picture pic = vd.decodeFrame(data, tmp.getData());
BufferedImage buf = AWTUtil.toBufferedImage(pic);
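Putting the two snippets together, a rough sketch (assuming JCodec 0.2.x and reusing only the calls shown above):

for (DemuxerTrack demuxerTrack : JCodecUtil.createM2TSDemuxer(file, TrackType.VIDEO).v1.getTracks()) {
    Packet packet;
    while ((packet = demuxerTrack.nextFrame()) != null) {
        ByteBuffer data = packet.getData();
        Picture tmp = Picture.create(1920, 1088, ColorSpace.YUV420);
        VideoDecoder vd = JCodecUtil.createVideoDecoder(JCodecUtil.detectDecoder(data.duplicate()), data.duplicate());
        Picture pic = vd.decodeFrame(data.duplicate(), tmp.getData());
        if (pic != null) {
            BufferedImage image = AWTUtil.toBufferedImage(pic);
            // hand the frame off here, e.g. collect it or write it to disk
        }
    }
}

Creating the decoder once per track rather than once per packet would be cheaper; it is done per packet here only to mirror the snippet above.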
