I'm trying to read a PDF file and I get this exception:
com.itextpdf.text.exceptions.InvalidPdfException: The document has no page root (meaning: it's an invalid PDF).
at com.itextpdf.text.pdf.PdfReader.readPages(PdfReader.java:1248)
at com.itextpdf.text.pdf.PdfReader.readPdf(PdfReader.java:739)
at com.itextpdf.text.pdf.PdfReader.<init>(PdfReader.java:181)
at com.itextpdf.text.pdf.PdfReader.<init>(PdfReader.java:219)
at com.itextpdf.text.pdf.PdfReader.<init>(PdfReader.java:207)
at com.itextpdf.text.pdf.PdfReader.<init>(PdfReader.java:197)
at com.mitech.med.watermark.Test2.main(Test2.java:11)
I'm using iText 5.5.10.
This is my code:
public static void main(String[] args) {
    try {
        PdfReader reader = new PdfReader("C:/Users/matteo.fusi/Downloads/testPDF/1142.pdf");
    } catch (Exception e) {
        e.printStackTrace();
    }
}
This is the link to the PDF document:
https://drive.google.com/file/d/0B2IrLGj9wefRVFZxSUhkN0o0N1k/view?usp=sharing
Thanks in advance
Regards
Matteo
I get the same issue on iText 5.5.10. I haven't yet looked into what changed in the latest version, but it works fine on iText 5.3.4. You could try that version.
The PDF in question is broken.
This is the page tree root dictionary object:
1 0 obj
<</Type /Pages /Count 1
/Kids[
4 0 R
]
/Type /Page
/MediaBox [ 0 0 595 842 ]
/ProcSet [/PDF /Text /ImageB /ImageC /ImageI]
/Resources <<
/Font << /F0 6 0 R /F1 7 0 R /F2 8 0 R /F3 9 0 R /F4 10 0 R /F5 11 0 R /F6 12 0 R /F7 13 0 R /F8 14 0 R /F9 15 0 R /F10 16 0 R /F11 17 0 R /F12 18 0 R /F13 19 0 R /F14 20 0 R >>
/XObject <<
/Im0 5 0 R >>
>>
>>endobj
As you can see, the key Type occurs twice, once with the value Pages and once with the value Page. But the specification ISO 32000-1 clearly states in section 7.3.7 - Dictionary Objects:
Multiple entries in the same dictionary shall not have the same key.
(This, by the way, is a fairly obvious requirement for dictionary objects in general...)
The result of such a defect may differ between PDF processors; these are the most obvious cases:
They might explicitly check for such problems and reject the file outright.
They might not check but use the first value assigned to the key.
They might not check but use the last value assigned to the key.
iText appears to be of the third kind. As far as iText is concerned, therefore, the page tree root dictionary has a Type value Page. But the specification requires the Type of a page tree node to be Pages. Thus, iText throws the observed exception.
Firstly, I would try an alternative that seems to work for a lot of people:
http://pdfbox.apache.org/ <- inspired by an older post I researched on Stack Overflow
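If the goal is just to get the file open, something like the following might work (an untested sketch, assuming PDFBox 2.x on the classpath; the file names are placeholders): PDFBox's lenient parser loads the broken document, and re-saving it may normalize defects such as the duplicate /Type key before you hand the result to iText.

import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;

public class RepairPdf {
    public static void main(String[] args) throws Exception {
        // PDFBox tolerates many structural defects that iText rejects;
        // loading and re-saving rewrites the document structure.
        try (PDDocument doc = PDDocument.load(new File("1142.pdf"))) {
            doc.save(new File("1142-repaired.pdf"));
        }
    }
}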
Secondly, while debugging the issue I found this check inside iText's PdfReader:
rootPages == null || (!PdfName.PAGES.equals(rootPages.get(PdfName.TYPE))
&& !PdfName.PAGES.equals(rootPages.get(new PdfName("Types"))))
This condition is true for your document (its page root's Type is Page, not Pages), hence your problem. I believe it might be a bug in the current version of iText.
DFTBA
I have 3 h2o models:
$ ls dataset/mojo
1. DeepLearning_model_python_1582176092021_2.zip
2. StackedEnsemble_BestOfFamily_AutoML_20200220_073620.zip
3. Word2Vec_model_python_1582176092021_1.zip
The binary models for these 3 were generated on v3.28.0.3, but I am trying to upgrade the h2o version and productionize them on v3.30.0.5.
So I converted those 3 binaries successfully to MOJO models (as listed above).
When trying to upload these MOJO models using h2o.upload_mojo, for Word2Vec alone I am getting this error:
In [15]: w2v_path = 'dataset/mojo/Word2Vec_model_python_1582176092021_1.zip'
In [16]: w2v_model = h2o.upload_mojo(w2v_path)
generic Model Build progress: | (failed) | 0%
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-16-734005ed70a8> in <module>
----> 1 w2v_model = h2o.upload_mojo(w2v_path)
~/.envs/h2o-test/lib/python3.8/site-packages/h2o/h2o.py in upload_mojo(mojo_path)
2149 frame_key = response["destination_frame"]
2150 mojo_estimator = H2OGenericEstimator(model_key = get_frame(frame_key))
-> 2151 mojo_estimator.train()
2152 print(mojo_estimator)
2153 return mojo_estimator
~/.envs/h2o-test/lib/python3.8/site-packages/h2o/estimators/estimator_base.py in train(self, x, y, training_frame, offset_column, fold_column, weights_column, validation_frame, max_runtime_secs, ignored_columns, model_id, verbose)
113 validation_frame=validation_frame, max_runtime_secs=max_runtime_secs,
114 ignored_columns=ignored_columns, model_id=model_id, verbose=verbose)
--> 115 self._train(parms, verbose=verbose)
116
117 def train_segments(self, x=None, y=None, training_frame=None, offset_column=None, fold_column=None,
~/.envs/h2o-test/lib/python3.8/site-packages/h2o/estimators/estimator_base.py in _train(self, parms, verbose)
205 return
206
--> 207 job.poll(poll_updates=self._print_model_scoring_history if verbose else None)
208 model_json = h2o.api("GET /%d/Models/%s" % (rest_ver, job.dest_key))["models"][0]
209 self._resolve_model(job.dest_key, model_json)
~/.envs/h2o-test/lib/python3.8/site-packages/h2o/job.py in poll(self, poll_updates)
75 if self.status == "FAILED":
76 if (isinstance(self.job, dict)) and ("stacktrace" in list(self.job)):
---> 77 raise EnvironmentError("Job with key {} failed with an exception: {}\nstacktrace: "
78 "\n{}".format(self.job_key, self.exception, self.job["stacktrace"]))
79 else:
OSError: Job with key $03010a64051932d4ffffffff$_8d0c64127137bd1eef16202889cf4fca failed with an exception: java.lang.IllegalArgumentException: Unsupported MOJO model 'word2vec'.
stacktrace:
java.lang.IllegalArgumentException: Unsupported MOJO model 'word2vec'.
at hex.generic.Generic$MojoDelegatingModelDriver.computeImpl(Generic.java:99)
at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:248)
at hex.generic.Generic$MojoDelegatingModelDriver.compute2(Generic.java:78)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1557)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
The other two models upload without any issues and return a valid model_id. Any idea what the issue here is? From the docs I understood that all three model types are supported by MOJO.
I tried this with a cluster of 2 pods on K8s with 2Gi/1cpu each, but it results in the same outcome as above.
Word2Vec is not currently in the list of allowed algos to import back into H2O.
The documentation is a little bit confusing and needs improvement. MOJO is a way to take H2O models into production; those are usable outside of H2O using H2O's genmodel. Some of those MOJOs can be imported back into H2O and inspected, but not all of them. The first two algorithms you listed are supported; unfortunately, Word2Vec is not.
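Even though importing back into H2O fails, the Word2Vec MOJO should still be usable for scoring through genmodel directly. A minimal, untested sketch, assuming the h2o-genmodel dependency matching your runtime is on the classpath (the zip path is the one from your question):

import hex.genmodel.MojoModel;
import hex.genmodel.easy.EasyPredictModelWrapper;

public class LoadWord2VecMojo {
    public static void main(String[] args) throws Exception {
        // Load the MOJO with genmodel; no running H2O cluster is needed.
        MojoModel mojo = MojoModel.load("dataset/mojo/Word2Vec_model_python_1582176092021_1.zip");
        System.out.println("Model category: " + mojo.getModelCategory());

        // Wrap it for row-by-row scoring, e.g. word embedding lookups.
        EasyPredictModelWrapper wrapper = new EasyPredictModelWrapper(mojo);
    }
}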
I've created a JIRA to track this issue. We should be able to enable at least scoring.
I'm trying to set up Morena 7 in my Java application, but I can't configure my scanner from my code; it ignores the settings I set.
Even though my scanner works with the example projects they provide, with every supported setting.
I have searched the web for explanations but I have found little to no documentation.
This is the code I use to scan; it is identical to the sample given in the tutorial document:
public void scan() throws Exception {
    Manager manager = Manager.getInstance();
    List devices = manager.listDevices();
    if (devices.isEmpty()) {
        System.out.println("No scanners detected");
        return;
    }
    Device device = (Device) devices.get(0);
    if (device instanceof Scanner) {
        Scanner scanner = (Scanner) device;
        scanner.setMode(Scanner.RGB_8);
        scanner.setResolution(75);
        scanner.setFrame(100, 100, 500, 500);
        BufferedImage bimage = SynchronousHelper.scanImage(scanner);
        // Do the necessary processing with bimage
        manager.close();
    } else {
        System.out.println("Please Connect A Scanner");
    }
}
When I run this code, I get back an image, but with the scanner's default values; every setting like color, resolution, and scan area (frame) is ignored.
First, I think one reason could be that Morena 7 always spools the scanner data into a file. Unfortunately, you cannot access the scanner data before it is written to that file. So if you want to scan bilevel images, you will get a JPEG image with grey levels. Morena saves scanner data as JPEG on Mac OS X and as BMP on Windows.
You should check the temp file Morena 7 creates. Assuming you use the class SynchronousHelper from the Morena example, you can edit the scanImage method, which just loads the temp file using ImageIO.
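Something like this hypothetical helper inside scanImage could log what actually arrived (the temp-file parameter is whatever Morena hands back):

import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

class ScanDebug {
    // Load the temp file Morena wrote and dump its basic properties,
    // to verify whether the requested resolution and mode took effect.
    static BufferedImage loadAndInspect(File tempFile) throws Exception {
        BufferedImage image = ImageIO.read(tempFile);
        System.err.println("Scanned file: " + tempFile.getAbsolutePath());
        System.err.println("Size: " + image.getWidth() + "x" + image.getHeight()
                + " px, BufferedImage type: " + image.getType());
        return image;
    }
}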
If I check this temp file (on Mac OS X), all the values I set, such as resolution and color mode, are honored. Perhaps your scanner does not support some of these settings? Or Morena does something wrong while saving the image.
Also check the system error output. It should look something like the following, where you can see that I set the resolution to 400dpi and the color mode to bilevel (ICScannerPixelDataTypeBW with bitDepth 1).
Functional unit: ICScannerFunctionalUnitFlatbed <0x7fefe850f4e0>:
pixelDataType : ICScannerPixelDataTypeBW
supportedBitDepths : <NSMutableIndexSet: 0x7fefe850f4b0>[number of indexes: 2 (in 2 ranges), indexes: (1 8)]
bitDepth : 1
supportedDocumentTypes : <NSMutableIndexSet: 0x7fefede9a9f0>[number of indexes: 6 (in 2 ranges), indexes: (1-5 10)]
documentType : 1
physicalSize : [width = 8.50 inches, height = 14.00 inches]
measurementUnit : 0
supportedResolutions : <NSMutableIndexSet: 0x7fefedee4390>[number of indexes: 7 (in 7 ranges), indexes: (100 150 200 300 400 600 1200)]
preferredResolutions : <NSMutableIndexSet: 0x7fefedee4390>[number of indexes: 7 (in 7 ranges), indexes: (100 150 200 300 400 600 1200)]
resolution : 400
overviewResolution : 150
supportedScaleFactors : <NSMutableIndexSet: 0x7fefedec3dd0>[number of indexes: 1 (in 1 ranges), indexes: (100)]
preferredScaleFactors : <NSMutableIndexSet: 0x7fefedec3dd0>[number of indexes: 1 (in 1 ranges), indexes: (100)]
scaleFactor : 100
acceptsThresholdForBlackAndWhiteScanning : NO
usesThresholdForBlackAndWhiteScanning : NO
thresholdForBlackAndWhiteScanning : 0
templates : (null)
vendorFeatures : (null)
state : 0x00000001
So I have a program that collects a bunch of data and continuously concatenates it into a string, with a single white space between each entry. During my close routine I print the string into a txt file using a BufferedWriter. About 50% of the time the data shows up as (mostly) Chinese symbols. Is the VM doing some weird Unicode stuff? Why does this only occur sometimes?
I've looked around on other forums and have not seen other instances of this problem. None of the other CS majors I know understand what is happening.
EDIT: the data is all integer numbers ranging from 0 to 1365.
UPDATE: upon further research I found this, which makes me think I may need a PrintStream rather than a BufferedWriter. Can anyone speak to that? I tested PrintStream and found that I cannot construct it with a FileWriter as I would a BufferedWriter, which means I need more research before I can write to my txt file.
UPDATE: printing to the console does not make this error occur. I will accept an answer that explains why Notepad (the program I am using to open the txt) sometimes displays numbers and sometimes displays symbols.
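(For reference: PrintStream wraps an OutputStream or a File rather than a Writer, which is why it cannot be built from a FileWriter. A minimal sketch of two constructors that do exist, using the same path as my code below:)

import java.io.File;
import java.io.FileOutputStream;
import java.io.PrintStream;

public class PrintStreamDemo {
    public static void main(String[] args) throws Exception {
        // Constructed from a File; uses the platform default encoding.
        PrintStream ps = new PrintStream(new File("C:\\Users\\HPlaptop\\Desktop\\MouseData.txt"));
        ps.println("1365 767 1365 767");
        ps.close();

        // Or from an OutputStream, with autoflush and an explicit charset.
        PrintStream psUtf8 = new PrintStream(
                new FileOutputStream("C:\\Users\\HPlaptop\\Desktop\\MouseData.txt"), true, "UTF-8");
        psUtf8.println("1365 767 1365 767");
        psUtf8.close();
    }
}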
Here is the relevant code:
// fields
private static BufferedWriter out;
private File saveFile;
String data;

// inside constructor
this.saveFile = new File("C:\\Users\\HPlaptop\\Desktop\\MouseData.txt");
this.saveFile.delete();
try { this.saveFile.createNewFile(); }
catch (IOException e) { System.out.println("File creation error"); }
try { out = new BufferedWriter(new FileWriter("C:\\Users\\HPlaptop\\Desktop\\MouseData.txt")); }
catch (IOException e) { System.out.println("IO Error"); }

this.control.addWindowListener(new WindowAdapter() {
    public void windowClosing(WindowEvent e) {
        // there is a method call here but the basics are below;
        // write/close throw the checked IOException, so they need a try/catch
        try {
            out.write(data);
            out.close();
        } catch (IOException ex) {
            System.out.println("IO Error on close");
        }
        System.exit(0);
    }
});
Here is an example data set printed correctly:
1365 767 1365 767 1365 767 1364 767 1353 756 1268 692 1114 604 980 488 812 334 744 283 694 244 593 150 473 81 328 13 207 0 124 0 115 0 102 0 99 6 107 13 132 20 173 32 187 31 190 25 194 20 201 17 215 14 221 10 224 7 224 7 224 7 226 6 226 6 226 6 226 6 226 6 226 6 226 6
This data set was taken seconds later and is not what I want
㐀ㄹ㈠㤰㐠㔸㈠㈱㐠㠶㈠㐱㐠㘲㈠㘰㌠㠷ㄠ㔹㌠㌳ㄠ㌹㈠㘹㈠㈠㠷㈠㜳㈠㐶㈠㐷㈠㐶㈠㔷㈠㌶㈠㔵㈠㐵㈠㠰㈠㤴ㄠ㔲㈠㤴㐠‶㐲‹㌱㈠㘴〠㈠㘴〠㈠㘴〠㈠㜴〠㈠㠴〠㈠㠴〠㈠㜴㠠㈠㔴ㄠ‶㐲‵㤱㈠㔴ㄠ‹㐲‵㠱㈠㜴ㄠ‶㐲‹ㄱ㈠〵ㄠ‰㔲‰〱
The BufferedWriter is not making an error, and the code is correct except for one redundancy:
this.saveFile.delete();
try { this.saveFile.createNewFile(); }
catch (IOException e) { System.out.println("File creation error"); }
is unnecessary, because
new FileWriter
creates (or truncates) the file anyway.
The error in reading the data occurs when the file is opened. Depending on which program opens it, different results are displayed, because of the way that software interprets the bytes. Notepad was displaying symbols because its encoding-detection heuristic misread the plain ASCII digits as UTF-16, turning each pair of bytes into a single (mostly CJK) character. The console did not try to guess an encoding and just displayed what was written to it. Opening the file with a program that does not second-guess the encoding will show the data correctly.
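If you want Notepad to stop guessing, one option (a sketch of mine, not something the code above requires) is to write through an OutputStreamWriter with an explicit charset and lead with a byte-order mark, so the editor gets an unambiguous signal:

import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public class SaveMouseData {
    public static void main(String[] args) throws Exception {
        String data = "1365 767 1365 767 1364 767";
        try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream("C:\\Users\\HPlaptop\\Desktop\\MouseData.txt"),
                StandardCharsets.UTF_8))) {
            // Writing U+FEFF first emits the UTF-8 byte-order mark
            // (EF BB BF), which Notepad recognizes instead of guessing.
            out.write('\uFEFF');
            out.write(data);
        }
    }
}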
Since you did not provide an example of what data you write into the stream, you are probably experiencing the bush hid the facts phenomenon.
I'm developing a GWT/App Engine application using Eclipse. The problem below occurred after upgrading App Engine from 1.6.3 to 1.6.4. After the upgrade, my application would not work at all. Unfortunately, I had deleted my old App Engine plugins, so I was unable to roll back to 1.6.3. After 2 hours of banging my head against the wall, I decided to recreate my Eclipse project. The project worked again, except for the following anomaly:
I'm using BCrypt to implement one-way hashing of passwords. Before yesterday, this worked fine, with password encodes and checks completing very fast, probably in a few milliseconds. Now these operations take on the order of 2 minutes! Using the debugger, I paused the application to see if I could figure out what was going on. Each time I pause, I get a stack trace such as the following:
Thread [798744730#qtp-2080190228-3] (Suspended)
Class<T>.forName0(String, boolean, ClassLoader) line: not available [native method]
Class<T>.forName(String) line: 186
RuntimeHelper.checkRestricted(boolean, String, String, String) line: 63
Runtime.checkRestricted(boolean, String, String, String) line: 63
BCrypt.encipher(int[], int) line: 496
BCrypt.key(byte[]) line: 558
BCrypt.crypt_raw(byte[], byte[], int) line: 622
BCrypt.hashpw(String, String) line: 681
BCrypt.checkpw(String, String) line: 749
BCrypt.encipher() is as follows: (line 496 is shown below in a line comment)
private final void encipher(int lr[], int off) {
    int i, n, l = lr[off], r = lr[off + 1];
    l ^= P[0];
    for (i = 0; i <= BLOWFISH_NUM_ROUNDS - 2;) {
        // Feistel substitution on left word
        n = S[(l >> 24) & 0xff];
        n += S[0x100 | ((l >> 16) & 0xff)];
        n ^= S[0x200 | ((l >> 8) & 0xff)];
        n += S[0x300 | (l & 0xff)];
        r ^= n ^ P[++i]; //*** LINE 496 *****
        // Feistel substitution on right word
        n = S[(r >> 24) & 0xff];
        n += S[0x100 | ((r >> 16) & 0xff)];
        n ^= S[0x200 | ((r >> 8) & 0xff)];
        n += S[0x300 | (r & 0xff)];
        l ^= n ^ P[++i];
    }
    lr[off] = r ^ P[BLOWFISH_NUM_ROUNDS + 1];
    lr[off + 1] = l;
}
Depending on when I pause the debugger, different lines in BCrypt are the caller of Runtime.checkRestricted(), but it appears that Runtime.checkRestricted() is called continuously. Since this happens inside nested loops, I'm thinking this is the cause. I then went hunting for a way to prevent this checkRestricted() call from happening. No luck.
I have a somewhat complicated application structure that contains three Google web applications (Eclipse projects). I'll call them:
Base
Store
App
where Store depends upon Base, and App depends upon both Store and Base. I use an Ant task to build the Base and Store projects into JAR files and copy them to the App/war/WEB-INF/lib folder.
Originally, I had BCrypt in its own Eclipse project, and my Ant task would also JAR this and copy it to App/war/WEB-INF/lib. This worked fine for the past few months until now. To try to work around the current problem, I tried moving the BCrypt class (it contains only 1 class) directly into the Base project, with the same result, then into the Store project, again with the same result. Since my app currently calls BCrypt methods only from the Store project, I figured either of these might work. They did, functionally, but an encipher() call still takes 2 minutes to complete.
When I click on Runtime or RuntimeHelper in the stack trace, Eclipse reports Source Not Found, and I can find nothing about them in Google searches.
Questions:
Why is every line in BCrypt subjected to a checkRestricted() call? This doesn't seem normal.
More importantly, any idea on how to fix this problem?
I don't know what to look at next. Any ideas would be very welcome, even if you don't know the ultimate solution.
Thanks very much.
Rick
I just created a ticket at the GAE code project: http://code.google.com/p/googleappengine/issues/detail?id=7277&thanks=7277&ts=1333530915
Maybe we'll get an answer there, as using BCrypt is also recommended by an official tutorial: http://code.google.com/p/google-web-toolkit-incubator/wiki/LoginSecurityFAQ
Google fixed the bug and released a new version (1.6.4.1) today:
http://code.google.com/p/googleappengine/downloads/detail?name=appengine-java-sdk-1.6.4.1.zip
Yes, there is a regression in 1.6.4.
See Is google app engine 1.6.4 slower in local? There are comments there by Googlers working on the GAE SDK.
I am curious about the runhprof output. I am mainly concerned with the memory section. It looks like there are multiple entries for the same class. Why would that be?
Is there a way to get hprof to print how much memory the instances of a particular class take up, as one value per class?
Also, what tools do you use besides 'hat' to analyze the output?
I ran the java command with jvm arg:
-Xrunhprof:heap=sites,depth=4,format=a,file=prof/hprof_dump.txt
Here is a brief snippet of the output; some classes are listed multiple times.
SITES BEGIN (ordered by live bytes) Tue Jul 28 19:33:41 2009
          percent           live          alloc'ed   stack  class
 rank   self  accum    bytes  objs       bytes     objs  trace  name
    1 29.75% 29.75%   700080 43755   576000016 36000001 307483  java.lang.Double
    2  7.13% 36.88%   167840  5245      370432    11576 300993  clojure.lang.PersistentHashMap$LeafNode
    3  2.09% 38.98%    49296  2054       60048     2502 301295  clojure.lang.Symbol
    4  2.09% 41.07%    49200     3       49200        3 301071  char[]
    5  1.33% 42.40%    31344  1306       68088     2837 300998  clojure.lang.PersistentHashMap$BitmapIndexedNode
    6  1.10% 43.50%    25800   645       25800      645 301050  clojure.lang.Var
    7  1.05% 44.54%    24624     3       24624        3 301069  byte[]
    8  0.86% 45.40%    20184   841       49608     2067 301003  clojure.lang.PersistentHashMap$INode[]
    9  0.78% 46.18%    18304   572       58720     1835 301308  clojure.lang.PersistentList
   10  0.75% 46.93%    17568   549       17568      549 308832  java.lang.String[]
   11  0.70% 47.62%    16416     2       16416        2 301036  byte[]
Eclipse Memory Analyzer is excellent. It loads the dump file very quickly, produces lots of nice reports about the heap dump, and lets you query the dump for objects/classes using a SQL-like language. Love it.
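For example, a small OQL sketch (the @usedHeapSize and @retainedHeapSize attributes are built into MAT) that lists each java.lang.Double from the snippet above together with its shallow and retained size:

SELECT d.@usedHeapSize, d.@retainedHeapSize FROM java.lang.Double d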