Convert XPS file to text with Java

I want to convert an .xps file to text with Java or Kotlin.
Aspose is too expensive for me.
I found Java-AXP, which should do the job, but I did not find any documentation or sample code for it.
I managed to add the Java-AXP core to my project libraries in IntelliJ, so I can now access its classes from within my project. But actually understanding how to use the library to get the text conversion is beyond me.
The only example code for the library I found is this:
File b = new File("D:\\Chemia\\Clients\\Clients\\Docs\\Equipment\\CCP\\XPSOCRDemo.xps");
IXPSAccess access = new XPSFileAccessImpl(b);
IXPSFileAccess xpsFileAccess = access.getFileAccess();
XPSDocumentAccessImpl xpsimpl = new XPSDocumentAccessImpl(access);
int docunum = xpsimpl.getFirstDocNum();
IDocumentStructure structure = xpsimpl.getDocumentStructure(docunum);
List<IOutlineEntry> list = (List<IOutlineEntry>)
        structure.getDocumentStructureOutline().getDocumentOutline().getOutlineEntry();
list.stream().forEach(restu -> {
    System.out.println(restu.getOutlineTarget());
});
However, I get the same null pointer exception as the OP and I can't fix it. I'd need working example code to continue on my own.
So how can I use Java-AXP? Or are there alternative libraries? I am open to using anything.
Thanks for any help.
Edit:
@Abra Thanks.
Exception in thread "main" java.lang.IllegalStateException: structure must not be null
at MainKt.main(main.kt:80)
at MainKt.main(main.kt)
73 | val structure = xpsimpl.getDocumentStructure(0)
80 | val list = structure.documentStructureOutline?.documentOutline?.outlineEntry as List<IOutlineEntry>?
xpsimpl.getDocumentStructure(docunum) returns null.
I cannot use an online converter for my purpose because the XPS files contain sensitive information.
Edit: Here is the solution I found.
I found this repo which uses Java-AXP. However, there seemed to be some differences from the java-axp-core.jar I had taken from Google Code, so I copied the javaaxp folder into my project.
It also needs Apache Tika, so I downloaded tika-app-2.1.0.jar from Apache Tika and added it to my project via the Project Structure menu.
In XPSZipFileAccess.java the import for IOUtils was wrong, so I changed
import org.apache.tika.io.IOUtils;
to
import org.apache.commons.io.IOUtils;
Now Java-AXP could resolve all references.
Then I copied XPSParser.java to the root of my project and made sure all its imports resolve.
In DocSaver.java, line 90 shows how XPSParser is used, and I adapted that code so I could convert the XPS file to text:
import java.io.File
import java.io.FileInputStream
import org.apache.tika.metadata.Metadata
import org.apache.tika.parser.ParseContext
import org.apache.tika.sax.BodyContentHandler

val xpsFile = File(path + "filename.xps")
val inputStream = FileInputStream(xpsFile)
val metadata = Metadata()
val handler = BodyContentHandler()                                // collects the extracted body text
XPSParser().parse(inputStream, handler, metadata, ParseContext()) // XPSParser is the class copied from the repo
val docContents = handler.toString()                              // plain text of the XPS document
println(docContents)
inputStream.close()
Hope it helps someone.

Related

Spring Rest Docs snippet template is ignored

I am attempting to create a custom snippet template for Spring REST Docs documentation purposes. I have been following the reference guide.
My first problem was in IntelliJ, when creating a .snippet file at src/test/resources/org/springframework/restdocs/templates/asciidoctor/path-parameters.snippet as instructed by the guide. IntelliJ registers it as a jShell snippet file, which at a quick glance is not the same as an AsciiDoc snippet. So I went into Settings -> Editor -> File Types and changed jShell Snippet from *.snippet to *.snippetOld. I then created a file type called Snippet and set its pattern to *.snippet. This fixes the problem of the snippet template being read as a jShell file, and the .snippet template I created no longer gets compile/validation errors.
But now, when I run my mockMvc test after deleting the previously generated adoc files, the path-parameters.adoc file, which should have used my custom template (or, failing that, the default template), is not generated at all.
In my mockMvc test I have the following:
// Given
RestDocumentationResultHandler document = makeDocument("name-of-method");
document.document(
        pathParameters(/* do path parameters */),
        requestParameters(
                parameterWithName("Month").description("The month requested").attributes(
                        key("type").value("integer"), key("constraints").value("more than 0 & less than 13.")),
                parameterWithName("Year").description("The year requested").attributes(
                        key("type").value("integer"), key("constraints").value("more than 1970 and less than current year"))),
        responseField(/* do response fields */)
);
// When
mvc.perform(get(REQUEST_PATH, USERID)
        .contentType(MediaType.APPLICATION_JSON)
        .param("month", "8")
        .param("year", "2018"))
        // Then
        .andExpect(status().isOk())
        .andDo(document);
And my snippet template is the following:
|===
|Parameter|Description|Type|Constraints
|{{parameter}}
|{{description}}
|{{type}}
|{{constraints}}
Could somebody point out what I've done wrong or differently from the reference guide, and how I could fix it to get my template working?
I suspect the reason you did not get any output is that you confused path parameters with request parameters: your documentation config specifies request parameters, but you customized the path-parameters template.
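As a rough sketch of the difference (the endpoint, snippet identifier, and parameter names below are made up for illustration, not taken from your test): query-string parameters documented with requestParameters(...) are rendered through request-parameters.snippet, while pathParameters(...) renders through path-parameters.snippet and additionally requires building the request from a URI template via RestDocumentationRequestBuilders:
import static org.springframework.restdocs.mockmvc.MockMvcRestDocumentation.document;
import static org.springframework.restdocs.mockmvc.RestDocumentationRequestBuilders.get;
import static org.springframework.restdocs.request.RequestDocumentation.parameterWithName;
import static org.springframework.restdocs.request.RequestDocumentation.pathParameters;
import static org.springframework.restdocs.request.RequestDocumentation.requestParameters;

// Fragment of a MockMvc test method
mvc.perform(get("/users/{userId}/report", USERID).param("month", "8"))
        .andDo(document("report-example",
                // query-string parameter -> rendered with request-parameters.snippet
                requestParameters(parameterWithName("month").description("The month requested")),
                // URI-template variable -> rendered with path-parameters.snippet
                pathParameters(parameterWithName("userId").description("The user id"))));
So a custom template only takes effect if its file name matches the snippet you actually configure (request-parameters.snippet here, rather than path-parameters.snippet).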
Here is the complete list of snippet templates:
curl-request.snippet
http-request.snippet
http-response.snippet
httpie-request.snippet
links.snippet
path-parameters.snippet
request-body.snippet
request-fields.snippet
request-headers.snippet
request-parameters.snippet
request-part-body.snippet
request-part-fields.snippet
request-parts.snippet
response-body.snippet
response-fields.snippet
response-headers.snippet
Source:
https://github.com/spring-projects/spring-restdocs/tree/master/spring-restdocs-core/src/main/resources/org/springframework/restdocs/templates/asciidoctor

Did Matlab 2017a change how it imports external Java classes?

I'm calling PDFBox from Matlab to figure out how many pages there are in a PDF. Everything works great with Matlab 2016b and prior. I can import the library and load a PDF without a problem:
import org.apache.pdfbox.pdmodel.PDDocument;
pdfFile = PDDocument.load(filename);
When I run the same thing in 2017a, I get the following error:
No method 'load' with matching signature found for class
'org.apache.pdfbox.pdmodel.PDDocument'.
I can change the line after the import so that the function signature matches:
jFilename = java.lang.String(filename);
pdfFile = PDDocument.load(jFilename.getBytes());
However, this causes PDFBox to have problems when I call load:
Java exception occurred:
java.io.IOException: Error: End-of-File, expected line
at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1111)
at org.apache.pdfbox.pdfparser.COSParser.parseHeader(COSParser.java:1874)
at org.apache.pdfbox.pdfparser.COSParser.parsePDFHeader(COSParser.java:1853)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:242)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1093)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1071)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1053)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1038)
This error seems to occur regardless of the PDF I'm trying to load. I'm getting the same exception with PDFBox 1.8.10 and 2.0.6.
I'm left with 2 questions:
Did Matlab 2017a change how it passes strings to Java? I didn't see anything in the release notes about this.
What could be causing the PDFBox error? Matlab is still on Java 1.7 in 2017a so I wouldn't think there should be any difference in how PDFBox works.
It seems like the method you are calling is from PDDocument version 1.8.11.
In the latest version, PDDocument 2.0.2, the load overload that accepts a file name (String) no longer exists. Passing jFilename.getBytes() doesn't help either, because the byte[] overload treats those bytes as the PDF content itself, which is why the parser immediately reports an unexpected end of file.
Change your code to the following, and it should work:
pdfFile = PDDocument.load(java.io.File(filename));
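For reference, a minimal plain-Java sketch of the PDFBox 2.x API (the file name and class name here are made up) that loads a document and reads the page count, which is what the original Matlab code was after:
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;

public class PageCount {
    public static void main(String[] args) throws IOException {
        // PDFBox 2.x: load(File) replaces the old load(String filename) overload
        try (PDDocument doc = PDDocument.load(new File("example.pdf"))) {
            System.out.println("Pages: " + doc.getNumberOfPages());
        }
    }
}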

Apache Pig with DataFu: cannot resolve UDFs

I'm trying the quickstart from here: http://datafu.incubator.apache.org/docs/datafu/getting-started.html
I tried nearly everything, but I'm sure it must be my fault somewhere. I tried already:
exporting PIG_HOME, CLASSPATH, PIG_CLASSPATH
starting pig with -cp datafu-pig-incubating-1.3.0.jar
registering datafu-pig-incubating-1.3.0.jar locally and in HDFS => both successful (at least no error shown)
Nothing helped.
Trying this on pig:
register datafu-pig-incubating-1.3.0.jar
DEFINE Median datafu.pig.stats.StreamingMedian();
data = load '/user/hduser/numbers.txt' using PigStorage() as (val:int);
data2 = FOREACH (GROUP data ALL) GENERATE Median(data);
or directly
data2 = FOREACH (GROUP data ALL) GENERATE datafu.pig.stats.StreamingMedian(data);
I get this name-resolve error:
2016-06-04 17:22:22,734 [main] ERROR org.apache.pig.tools.grunt.Grunt
- ERROR 1070: Could not resolve datafu.pig.stats.StreamingMedian using imports: [, java.lang., org.apache.pig.builtin.,
org.apache.pig.impl.builtin.] Details at logfile:
/home/hadoop/pig_1465053680252.log
When I look into the datafu-pig-incubating-1.3.0.jar it looks OK, everything is in place. I also tried some Bag functions and got the same error.
I think it's kind of a noob error which I just don't see (I did not find specific answers for DataFu on SO or Google), so thanks in advance for shedding some light on this.
The Pig script is fine; the only thing that could break is that some class dependencies could not be met while registering DataFu.
Try to run locally (pig -x local) and look at the detailed log.
Also check the version of Pig - it should be newer than 0.14.0.

SURF and SIFT algorithms don't work in OpenCV 3.0 Java

I am using OpenCV 3.0 (the latest version) in Java, but when I use the SURF or SIFT algorithm it doesn't work and throws an exception that says: OpenCV Error: Bad argument (Specified feature detector type is not supported.) in cv::javaFeatureDetector::create
I have googled, but the answers given to this kind of question did not solve my problem. If anyone knows about this problem, please let me know.
Thanks in advance!
Update: The third line of the code below throws the exception.
Mat img_object = Imgcodecs.imread("data/img_object.jpg");
Mat img_scene = Imgcodecs.imread("data/img_scene.jpg");
FeatureDetector detector = FeatureDetector.create(FeatureDetector.SURF);
MatOfKeyPoint keypoints_object = new MatOfKeyPoint();
MatOfKeyPoint keypoints_scene = new MatOfKeyPoint();
detector.detect(img_object, keypoints_object);
detector.detect(img_scene, keypoints_scene);
If you compile OpenCV from source, you can fix the missing bindings by editing opencv/modules/features2d/misc/java/src/cpp/features2d_manual.hpp yourself.
I fixed it by making the following changes:
(line 6)
#ifdef HAVE_OPENCV_FEATURES2D
#include "opencv2/features2d.hpp"
#include "opencv2/xfeatures2d.hpp"
#include "features2d_converters.hpp"
...(line 121)
case SIFT:
fd = xfeatures2d::SIFT::create();
break;
case SURF:
fd = xfeatures2d::SURF::create();
break;
...(line 353)
case SIFT:
de = xfeatures2d::SIFT::create();
break;
case SURF:
de = xfeatures2d::SURF::create();
break;
The only requirement is that you build the opencv_contrib optional module along with your sources (you can download the git project from https://github.com/Itseez/opencv_contrib and just set its local path in OpenCV's ccmake settings).
Oh, and keep in mind that SIFT and SURF are non-free software ^^;
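Once OpenCV is rebuilt with those changes, the question's original code should work unchanged. A minimal sketch of detection plus descriptor extraction with the OpenCV 3.0 Java bindings (the class name and image path here are made up for illustration):
import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.MatOfKeyPoint;
import org.opencv.features2d.DescriptorExtractor;
import org.opencv.features2d.FeatureDetector;
import org.opencv.imgcodecs.Imgcodecs;

public class SurfDemo {
    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
        Mat img = Imgcodecs.imread("data/img_object.jpg");

        // These create() calls only succeed after rebuilding with the patched
        // features2d_manual.hpp and the opencv_contrib xfeatures2d module.
        FeatureDetector detector = FeatureDetector.create(FeatureDetector.SURF);
        DescriptorExtractor extractor = DescriptorExtractor.create(DescriptorExtractor.SURF);

        MatOfKeyPoint keypoints = new MatOfKeyPoint();
        detector.detect(img, keypoints);

        Mat descriptors = new Mat();
        extractor.compute(img, keypoints, descriptors);
        System.out.println("Keypoints: " + keypoints.toArray().length);
    }
}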
That is because they are not free in newer versions of OpenCV (3+). I faced that problem some time ago. You have to:
Download OpenCV (if you have not)
Download the nonfree part from opencv github repo
Generate the makefiles with cmake -DBUILD_SHARED_LIBS=OFF, specifying the nonfree part with the -DOPENCV_EXTRA_MODULES_PATH=../opencv_contrib/modules option, and build with make -j8 (or however many parallel jobs you want)
Edit the features2d_manual.hpp file, including opencv2/xfeatures2d.hpp and adding the necessary code for the SIFT and SURF cases, which are commented out and not defined:
fd = xfeatures2d::SIFT::create(); for the SIFT detector and de = xfeatures2d::SIFT::create(); for the SIFT extractor. Do the same for SURF if you want to use it too.
I wrote this post explaining step by step how to compile the non-free OpenCV part in order to use non-free algorithms like SIFT or SURF.
Compile OpenCV non-free part.
I believe changing the features2d module (the FeatureDetector class or any other classes from features2d_manual.hpp) to enable methods from OpenCV contrib modules is a less attractive approach because it leads to a circular dependency between the "core" OpenCV and extensions (which can be non-free or experimental).
There is another way to fix this issue without touching the features2d classes. Making changes in the xfeatures2d CMakeLists.txt as described here leads to generation of Java wrappers for SIFT and SURF - opencv-310.jar now has an org.opencv.xfeatures2d package. A fix was also required in /opencv/modules/java/generator/gen_java.py, namely inserting the 2 lines marked #added below:
def addImports(self, ctype):
    if ctype.startswith('vector_vector'):
        self.imports.add("org.opencv.core.Mat")
        self.imports.add("org.opencv.utils.Converters")
        self.imports.add("java.util.List")
        self.imports.add("java.util.ArrayList")
        self.addImports(ctype.replace('vector_vector', 'vector'))
    elif ctype.startswith('Feature2D'):                       #added
        self.imports.add("org.opencv.features2d.Feature2D")   #added
    elif ctype.startswith('vector'):
        self.imports.add("org.opencv.core.Mat")
        self.imports.add('java.util.ArrayList')
        if type_dict[ctype]['j_type'].startswith('MatOf'):
            self.imports.add("org.opencv.core." + type_dict[ctype]['j_type'])
        else:
            self.imports.add("java.util.List")
            self.imports.add("org.opencv.utils.Converters")
            self.addImports(ctype.replace('vector_', ''))
After these changes the wrappers are generated successfully. However, the main problem remains: how to use these wrappers from Java. For example, SIFT.create() returns the pointer to a new SIFT object, but calling any of its methods (for instance detect()) crashes Java. I also noticed that using MSER.create() directly from Java leads to the same crash.
So it looks like the problem is isolated to the way the Feature2D.create() methods are wrapped in Java. The solution seems to be the following (again in /opencv/modules/java/generator/gen_java.py):
Find the string:
ret = "%(ctype)s* curval = new %(ctype)s(_retval_);return (jlong)curval->get();" % { 'ctype':fi.ctype }
Replace it with the following:
ret = "%(ctype)s* curval = new %(ctype)s(_retval_);return (jlong)curval;" % { 'ctype':fi.ctype }
Rebuild OpenCV. That is it: all create() methods will start working properly for all children of the Feature2D class, including the experimental and non-free ones. The FeatureDetector/DescriptorExtractor wrappers can be deprecated, I think, as Feature2D is much easier to use.
BUT! I'm not sure whether the suggested fix is safe for other OpenCV modules. Is there a scenario when (jlong)curval needs to be dereferenced? It looks like the same fix was already suggested here.
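With the rebuilt opencv-310.jar, usage from Java would then look roughly like this. This is only a sketch based on the wrapper names described above (the class name and image path are made up), not a verified API:
import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.MatOfKeyPoint;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.xfeatures2d.SIFT;

public class SiftWrapperDemo {
    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
        Mat img = Imgcodecs.imread("data/img_object.jpg");

        // SIFT.create() only works once the (jlong)curval fix above is applied
        SIFT sift = SIFT.create();
        MatOfKeyPoint keypoints = new MatOfKeyPoint();
        sift.detect(img, keypoints);
        System.out.println("Keypoints: " + keypoints.toArray().length);
    }
}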

Creating index and adding mapping in Elasticsearch with java api gives missing analyzer errors

Code is in Scala. It is extremely similar to Java code.
Code that our map indexer uses to create index: https://gist.github.com/a16e5946b67c6d12b2b8
Utilities that the above code uses to create index and mapping: https://gist.github.com/4f88033204cd761abec0
Errors that java gives: https://gist.github.com/d6c835233e2b606a7074
Response of http://elasticsearch.domain/maps/_settings after running code and getting errors: https://gist.github.com/06ca7112ce1b01de3944
JSON FILES:
https://gist.github.com/bbab15d699137f04ad87
https://gist.github.com/73222e300be9fffd6380
Attached are the JSON files I'm loading in. I have confirmed that it is loading the right JSON files and properly outputting them as a string into .loadFromSource and .setSource.
Any ideas why it can't find the analyzers even though they are in _settings? If I run these JSON files via curl, they work fine and properly set up the mapping.
The code I was using to create the index (found here: Define custom ElasticSearch Analyzer using Java API) was creating settings in the index like:
"index.settings.analysis.filter.my_snow.type": "stemmer",
That is, everything ended up nested under an extra settings path, so the analyzers were never registered where the mapping expected them.
I changed my indexing code to the following to fix this:
def createIndex(client: Client, indexName: String, indexFile: String) {
  // Create index
  client.admin().indices().prepareCreate(indexName)
    .setSource(Utils.loadFileAsString(indexFile))
    .execute()
    .actionGet()
}
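If you prefer to keep settings and mappings separate rather than shipping the whole index definition in one source, a rough Java sketch with the pre-5.x transport client (the type name, method arguments, and JSON variable names here are assumptions, not taken from the gists) would look like this; the point is that the analysis section then sits at the top level of the index settings, where the mapping can reference those analyzers:
import org.elasticsearch.client.Client;

// Minimal sketch: settingsJson holds only the "analysis" section,
// mappingJson holds the mapping for the given type.
public final class IndexCreator {
    public static void createIndex(Client client, String indexName,
                                   String settingsJson, String mappingJson) {
        client.admin().indices().prepareCreate(indexName)
              .setSettings(settingsJson)      // analyzers/filters registered here
              .addMapping("map", mappingJson) // mapping can now reference them
              .execute()
              .actionGet();
    }
}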
