How to solve java.lang.ExceptionInInitializerError in Java code?

An error occurs on the line below and I do not understand what it means. I am using the qoppa.jar library. How can I solve this issue? Can anyone help me?
m_LoadedDoc = new PDFDocument(new FilePDFSource((String) path[0]), PDFViewer.this);
java.lang.ExceptionInInitializerError
at com.qoppa.android.pdfProcess.PDFDocument$1.b(Unknown Source)
at com.qoppa.android.pdfViewer.e.p.b(Unknown Source)
at com.qoppa.android.pdfProcess.PDFDocument.b(Unknown Source)
at com.qoppa.android.pdfProcess.PDFDocument.<init>(Unknown Source)
at com.qoppa.android.pdfProcess.PDFDocument.<init>(Unknown Source)
at com.pdfplugin.PDFViewer$LoadDocument.doInBackground(PDFViewer.java:469)

There is a magic line that needs to be called before reading the document:
// Magic: Register asset manager for font loading.
StandardFontTF.mAssetMgr = getContext().getAssets();
// Now you can read the document.
PDFDocument doc = new PDFDocument(new FilePDFSource(path), PDFViewer.this);
You might also need to include the assets/fonts and assets/cmaps directories from the sample project.
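For background, ExceptionInInitializerError means that a static initializer threw an exception, and the underlying exception is available through getCause(). The sketch below is only an illustration and is not Qoppa's code: InitErrorDemo, FontTable and assetManager are hypothetical names that mimic a static initializer depending on a field that has not been assigned yet.

```java
// Hypothetical illustration; none of these names come from the Qoppa library.
class InitErrorDemo {
    // Stands in for a static field that must be assigned before first use.
    static Object assetManager;

    static class FontTable {
        // Not a compile-time constant, so reading it triggers class initialization.
        static final int SIZE = load();

        static int load() {
            // Throws NullPointerException if assetManager was never assigned.
            return assetManager.hashCode();
        }
    }

    // Returns the exception that made the static initializer fail, or null on success.
    static Throwable rootCause() {
        try {
            int unused = FontTable.SIZE;
            return null;
        } catch (ExceptionInInitializerError e) {
            return e.getCause();
        }
    }

    public static void main(String[] args) {
        System.out.println("Root cause: " + rootCause());
    }
}
```

Logging getCause() in the catch block around the failing call is often the quickest way to see what the static initializer was missing.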

Related

Java heap space when adding documents to List of documents

I am using import org.w3c.dom.Document; for document.
I have this block of code that parses the XML files in the ArrayList fileList. There are more than 2000 XML files to parse, each around 30-50 KB. I have no problem parsing the files:
try {
for(int i = 0; i < fileList.size(); i++) {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(fileList.get(i)); //<------ error will point here when docList.add(doc) is uncommented.
docList.add(doc);
}
} catch (ParserConfigurationException | SAXException | IOException e) {
e.printStackTrace();
}
but whenever I add them to the list this error comes up:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl.createChunk(Unknown Source)
at com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl.ensureCapacity(Unknown Source)
at com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl.createNode(Unknown Source)
at com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl.createDeferredTextNode(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractDOMParser.characters(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
at com.test.parser.Parser.getDocs(Parser.java:146)
at com.test.parser.Parser.main(Parser.java:50)
Commenting out docList.add(doc) does not produce this exception. Any idea why this is happening?
EDIT: I added -Xmx1024M to VMArguments in Run Configurations and it worked.
Commenting out docList.add(doc) does not produce this exception. Any idea why this is happening?
It's simple: if the doc reference is not stored in docList, it is overwritten by a new object on each iteration (Document doc = builder.parse(fileList.get(i));), so the doc from the previous iteration becomes unreachable, an object without any reference. The JVM garbage collector can reclaim it quickly, so during the loop only a couple of doc objects need to be on the heap at any time.
But with docList.add(doc) active, you still hold references to all the doc objects created in the loop: exactly fileList.size() instances. The garbage collector cannot collect them (and remove them from the heap), because docList holds valid, active references to them.
How to avoid the OutOfMemoryError? Parse and process the documents one by one, releasing the DOM object of the previous document before moving on, or consider using a streaming parser such as SAXParser.
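The streaming approach mentioned above could look roughly like the following. This is a minimal sketch, assuming that only a small part of each file is actually needed. The element name "title" and the class StreamingTitles are made up for illustration, and the input is a String rather than a file so that the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class StreamingTitles {
    // Collects the text of every <title> element without building a DOM tree.
    static List<String> extractTitles(String xml) {
        List<String> titles = new ArrayList<>();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    private StringBuilder current; // non-null only while inside <title>

                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attrs) {
                        if ("title".equals(qName)) {
                            current = new StringBuilder();
                        }
                    }

                    @Override
                    public void characters(char[] ch, int start, int length) {
                        if (current != null) {
                            current.append(ch, start, length);
                        }
                    }

                    @Override
                    public void endElement(String uri, String localName, String qName) {
                        if ("title".equals(qName)) {
                            titles.add(current.toString());
                            current = null;
                        }
                    }
                });
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return titles;
    }

    public static void main(String[] args) {
        System.out.println(extractTitles("<doc><title>A</title><body>x</body></doc>"));
    }
}
```

Only the extracted strings are retained, so memory use depends on what you keep rather than on the size of the parsed documents.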
Right-click the project folder.
Click Run As -> Run Configurations -> open the Arguments tab -> add to the VM arguments:
-Xmx512M press Enter
-Xmx2048M
Apply and Run.
Note that the option is case-sensitive: it is -Xmx with a capital X.
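To confirm that the setting reached the JVM, you can print the maximum heap size at startup. This is a small sketch using the standard Runtime API; the class name HeapInfo is arbitrary.

```java
class HeapInfo {
    // Maximum heap the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

The reported value is usually somewhat below the -Xmx figure, because the JVM excludes part of the heap from this number.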

Exception while trying to load openNLP POS models

I have been trying to use POS Models for POS tagging, but while loading the Models I get the following exception, and this happens for both maxent as well as perceptron models:
java.io.EOFException: Unexpected end of ZLIB input stream
at java.util.zip.InflaterInputStream.fill(Unknown Source)
at java.util.zip.InflaterInputStream.read(Unknown Source)
at java.util.zip.ZipInputStream.read(Unknown Source)
at java.io.DataInputStream.readFully(Unknown Source)
at java.io.DataInputStream.readLong(Unknown Source)
at java.io.DataInputStream.readDouble(Unknown Source)
at opennlp.model.BinaryFileDataReader.readDouble(BinaryFileDataReader.java:53)
at opennlp.model.AbstractModelReader.readDouble(AbstractModelReader.java:75)
at opennlp.model.AbstractModelReader.getParameters(AbstractModelReader.java:146)
at opennlp.perceptron.PerceptronModelReader.constructModel(PerceptronModelReader.java:69)
at opennlp.model.GenericModelReader.constructModel(GenericModelReader.java:59)
at opennlp.model.AbstractModelReader.getModel(AbstractModelReader.java:87)
at opennlp.tools.util.model.GenericModelSerializer.create(GenericModelSerializer.java:35)
at opennlp.tools.util.model.GenericModelSerializer.create(GenericModelSerializer.java:31)
at opennlp.tools.util.model.BaseModel.loadModel(BaseModel.java:231)
at opennlp.tools.util.model.BaseModel.<init>(BaseModel.java:190)
at opennlp.tools.postag.POSModel.<init>(POSModel.java:86)
at nlpcheck.NlpPOC.POSTag(NlpPOC.java:54)
at nlpcheck.NlpPOC.main(NlpPOC.java:86)
I have tried loading the tokenization model (en-token.bin), and it loads and works fine.
The following is the Java snippet that I am using to load the model:
InputStream is = new FileInputStream(MODEL_PATH);
POSModel model = new POSModel(is);
I have downloaded the models (en-pos-perceptron.bin, en-pos-maxent.bin) from http://www.opennlp.org/models-1.5/.
It turns out the model files hosted on the site mentioned above were corrupt. I was also trying a different tool, GATE (General Architecture for Text Engineering), which uses the same model files, so I copied its copies, put them on the build path, and it worked.
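The stack trace shows the model being read through ZipInputStream, so one way to spot a truncated download before handing it to the library is to read the archive to the end yourself. The following is a sketch under that assumption; ArchiveCheck and sampleArchive are hypothetical helpers, and the sample archive is generated in memory only to keep the example self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Random;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ArchiveCheck {
    // Reads every entry to the end; a truncated or corrupt archive fails with an IOException.
    static boolean isReadable(InputStream in) {
        byte[] buffer = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(in)) {
            int entries = 0;
            while (zip.getNextEntry() != null) {
                while (zip.read(buffer) != -1) {
                    // Discard the data; we only care whether it can be read.
                }
                entries++;
            }
            return entries > 0;
        } catch (IOException e) {
            return false;
        }
    }

    // Builds a small in-memory archive so the check can be tried without a real model file.
    static byte[] sampleArchive() {
        byte[] payload = new byte[10_000];
        new Random(42).nextBytes(payload);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("model.bin"));
            zip.write(payload);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

For a real file you would pass new FileInputStream(MODEL_PATH) to isReadable. A file that passes this check is not guaranteed to be a valid model, but a file that fails it is certainly damaged.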

FontFactory (lowagie), Java, getting UnsupportedEncodingException when trying to use UniJIS-UCS2-H (for Japanese)

I am using com.lowagie.text.FontFactory in generating a PDF file and am trying to use a custom font, KozMinPro-Regular, which provides support for Japanese characters, as we have a need to support this. I have found examples from searching that show how to do this similar to how I am doing it below and these examples assume that UniJIS-UCS2-H encoding is supported but when I try this I am getting the exception below that says this encoding is not supported. I would appreciate if anyone may have any insight into this. Thanks
FontFactory.register("/usr/share/fonts/truetype/KozMinPro-Regular.ttf", "JapaneseCompatible");
contentFont = FontFactory.getFont("JapaneseCompatible", "UniJIS-UCS2-H", true, 11, Font.BOLD);
headerFont = FontFactory.getFont("JapaneseCompatible", "UniJIS-UCS2-H", true, 11, Font.BOLD);
The exception I get:
Exception: [.ReportPdfView] Exception caught during generation of pdf file. Cause: UniJIS-UCS2-H
ExceptionConverter: java.io.UnsupportedEncodingException: UniJIS-UCS2-H
at java.lang.StringCoding.encode(StringCoding.java:286)
at java.lang.String.getBytes(String.java:954)
at com.lowagie.text.pdf.PdfEncodings.convertToBytes(Unknown Source)
at com.lowagie.text.pdf.TrueTypeFont.<init>(Unknown Source)
at com.lowagie.text.pdf.BaseFont.createFont(Unknown Source)
at com.lowagie.text.pdf.BaseFont.createFont(Unknown Source)
at com.lowagie.text.pdf.BaseFont.createFont(Unknown Source)
at com.lowagie.text.FontFactoryImp.getFont(Unknown Source)
at com.lowagie.text.FontFactoryImp.getFont(Unknown Source)
at com.lowagie.text.FontFactory.getFont(Unknown Source)
at com.lowagie.text.FontFactory.getFont(Unknown Source)
You need iTextAsian.jar, which provides CJK support.
See:
http://itextpdf.sourceforge.net/ for earlier versions of iText, or
http://sourceforge.net/projects/itext/files/extrajars/ for later versions of iText (extrajars.zip contains iTextAsian.jar).

Why is WSDL parser still importing external documents?

I tried to turn off importing documents in WSDL4J (1.6.2) in the way suggested
by the API documentation:
wsdlReader.setFeature("javax.wsdl.importDocuments", false);
In fact, it stops importing files declared with the wsdl:import tag, but it does not stop importing files declared with xs:import tags.
The following code snippet [see at the end of the letter] for the example file
http://www.ibspan.waw.pl/~gawinec/example.wsdl
returns the following exception:
javax.wsdl.WSDLException: WSDLException (at /definitions/types/xs:schema):
faultCode=OTHER_ERROR: An error occurred trying to resolve schema referenced
at 'EchoExceptions.xsd', relative to
'http://www.ibspan.waw.pl/~gawinec/example.wsdl'.:
java.io.FileNotFoundException: This file was not found:
http://www.ibspan.waw.pl/~gawinec/EchoExceptions.xsd
at com.ibm.wsdl.xml.WSDLReaderImpl.parseSchema(Unknown Source)
at com.ibm.wsdl.xml.WSDLReaderImpl.parseSchema(Unknown Source)
at com.ibm.wsdl.xml.WSDLReaderImpl.parseTypes(Unknown Source)
at com.ibm.wsdl.xml.WSDLReaderImpl.parseDefinitions(Unknown Source)
at com.ibm.wsdl.xml.WSDLReaderImpl.readWSDL(Unknown Source)
at com.ibm.wsdl.xml.WSDLReaderImpl.readWSDL(Unknown Source)
at com.ibm.wsdl.xml.WSDLReaderImpl.readWSDL(Unknown Source)
at com.ibm.wsdl.xml.WSDLReaderImpl.readWSDL(Unknown Source)
at com.ibm.wsdl.xml.WSDLReaderImpl.readWSDL(Unknown Source)
at IsolatedExample.main(IsolatedExample.java:15)
Caused by: java.io.FileNotFoundException: This file was not found:
http://www.ibspan.waw.pl/~gawinec/EchoExceptions.xsd
at com.ibm.wsdl.util.StringUtils.getContentAsInputStream(Unknown Source)
... 10 more
Can you suggest any solution to this problem? I just don't want to import
external XML schemata.
Regards,
Maciej
import javax.wsdl.WSDLException;
import javax.wsdl.factory.WSDLFactory;
import javax.wsdl.xml.WSDLReader;
public class IsolatedExample {
public static void main(String[] args) {
WSDLFactory wsdlFactory;
try {
wsdlFactory = WSDLFactory.newInstance();
WSDLReader wsdlReader = wsdlFactory.newWSDLReader();
wsdlReader.setFeature("javax.wsdl.verbose", false);
wsdlReader.setFeature("javax.wsdl.importDocuments", false);
wsdlReader.readWSDL("http://www.ibspan.waw.pl/~gawinec/example.wsdl");
} catch (WSDLException e) {
e.printStackTrace();
}
}
}
A quick look at WSDL4J (it's been a while since I've worked directly with this project) suggests that there is no option specifically to prevent the reading of imported schemas. You may have stumbled upon a bug in WSDL4J's mechanism for deserializing schemas. That said, if you're not interested in the contents of any schemas, including those inlined in the WSDL document, you can register your own extension registry (simply modify the PopulatedExtensionRegistry class to leave out the SchemaDeserializer).
Specifically, leave out the following lines:
mapExtensionTypes(Types.class, SchemaConstants.Q_ELEM_XSD_1999,
SchemaImpl.class);
registerDeserializer(Types.class, SchemaConstants.Q_ELEM_XSD_1999,
new SchemaDeserializer());
registerSerializer(Types.class, SchemaConstants.Q_ELEM_XSD_1999,
new SchemaSerializer());
mapExtensionTypes(Types.class, SchemaConstants.Q_ELEM_XSD_2000,
SchemaImpl.class);
registerDeserializer(Types.class, SchemaConstants.Q_ELEM_XSD_2000,
new SchemaDeserializer());
registerSerializer(Types.class, SchemaConstants.Q_ELEM_XSD_2000,
new SchemaSerializer());
mapExtensionTypes(Types.class, SchemaConstants.Q_ELEM_XSD_2001,
SchemaImpl.class);
registerDeserializer(Types.class, SchemaConstants.Q_ELEM_XSD_2001,
new SchemaDeserializer());
registerSerializer(Types.class, SchemaConstants.Q_ELEM_XSD_2001,
new SchemaSerializer());
I haven't used Java for webservices, but have you tried setting an absolute path to the schemas you import? Perhaps it's trying to load a local file.
You could also try sniffing the wire to see if you're making a request, perhaps it's malformed.
$0.02

Unmarshaling xml with html entities using JAXB

I need to load Wikipedia revision histories into POJOs, so I'm using JAXB to unmarshal the Wikipedia data dump (well, individual pages of it). The problem is that the text nodes occasionally contain entities that are not defined in the Wikipedia XML dump, e.g. ° (`&deg;`). Please keep in mind that I do not know the complete set of entities that I need to be able to read. My input file is 3 TB, so let's just assume that everything HTML can render is in there.
How can I configure JAXB to handle entities that are not valid xml?
Here is the SAX Exception that JAXB throws when it encounters an undefined entity:
Exception in thread "main" javax.xml.bind.UnmarshalException
- with linked exception:
[org.xml.sax.SAXParseException: The entity "deg" was referenced, but not declared.]
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:315)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.createUnmarshalException(UnmarshallerImpl.java:481)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:199)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:168)
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:137)
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:184)
at com.stottlerhenke.tools.wikiparse.WikipediaIO.readPage(WikipediaIO.java:73)
at com.stottlerhenke.tools.wikiparse.WikipediaIO.main(WikipediaIO.java:53)
Caused by: org.xml.sax.SAXParseException: The entity "deg" was referenced, but not declared.
at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEntityReference(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:195)
Edit: The input that triggered that exception is the complete revision history for the wikipedia article on the Arctic Circle. The XSD used to generate the JAXB classes is here: http://www.mediawiki.org/xml/export-0.3.xsd
Edit: The source of this problem was an error on my part -- I was using an initial extractor that did not maintain encoded entities properly. However, I did find a way around this, should anyone have the problem I thought I had. See below.
Resolving entities is not JAXB's job. It's the job of the underlying XML parser.
What you could do is:
read the data yourself using DOM
replace all unresolved entities by something you wish
then, let JAXB handle the result
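One variation on these steps, sketched below, does the replacement on the raw text before any XML parser sees it, since a parser would reject the undeclared entity in the same way JAXB does. The entity table here is a tiny hand-picked sample, not the full HTML set, and EntityPreprocessor and its method names are made up for illustration. A plain DOM parse stands in for the JAXB step so the example runs with the JDK alone.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EntityPreprocessor {
    private static final Pattern NAMED_ENTITY = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    // Sample mapping from HTML entity name to Unicode code point (deliberately incomplete).
    private static final Map<String, Integer> HTML_ENTITIES = new HashMap<>();
    static {
        HTML_ENTITIES.put("deg", 176);
        HTML_ENTITIES.put("nbsp", 160);
        HTML_ENTITIES.put("eacute", 233);
    }

    // Rewrites known HTML entities as numeric character references, which XML always accepts.
    // Names not in the table (including XML's own &amp;, &lt;, ...) are left untouched.
    static String replaceHtmlEntities(String xml) {
        Matcher m = NAMED_ENTITY.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Integer codePoint = HTML_ENTITIES.get(m.group(1));
            String replacement = (codePoint != null) ? "&#" + codePoint + ";" : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Parses the cleaned text and returns the text content of the root element.
    static String parseText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(replaceHtmlEntities(xml))));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

Note that this simple regex also rewrites matches inside CDATA sections, so it is only a starting point. For a multi-terabyte input the same substitution would need to be applied in a streaming fashion rather than on one String.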
This is a hack, but it works in a pinch.
I downloaded the html entity definitions from w3.org, and set the doctype of the input xml file to xhtml-transitional, but directed the doctype url to a local dtd:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "xhtml1-transitional.dtd">
xhtml1-transitional.dtd, in turn, requires:
xhtml-lat1.ent
xhtml-special.ent
xhtml-symbol.ent
which I downloaded and put alongside xhtml1-transitional.dtd
(All files are available at: http://www.w3.org/TR/xhtml1/DTD/ )
Like I said, ugly as hell, but it did seem to do the job.
