How to use a Tika custom parser in a jar file? - java

I'm trying to write a custom Apache Tika parser (for DICOM medical images), and package it as a plugin in a jar file.
I'm following the instructions from http://tika.apache.org/1.18/parser_guide.html, and took these projects as models:
https://github.com/Gagravarr/VorbisJava (the tika part),
https://github.com/Gagravarr/MPXJ-Tika.
So, I created a Maven project, wrote a parser class and a org.apache.tika.parser.Parser file in the resources folder, built the project with mvn install, and I now have a jar file.
My question is, how do I make Tika use this new parser? The instructions on the Tika wiki say:
To install a plugin, download it according to instructions below and
drop the jar(s) on your classpath. Tika will auto detect the plugin.
I tried to do this with java -classpath /path/to/my-parser.jar ... but it doesn't seem to work:
java -classpath /path/to/my-parser.jar -jar tika-app-1.18.jar --list-parsers
doesn't list the new parser, for instance.
I'm not a Java person, and I'm really not sure what "drop the jar on your classpath" means. I would really appreciate it if someone could point me in the right direction! Thanks.

You've sadly made a common Java newbie mistake: for various historic reasons, when the java command is given -jar it takes its classpath only from that jar, and it will ignore the -classpath parts you've given.
If you want to run the Apache Tika App on the command line, with an extra parser jar or two added, what you need to do is something like:
java -classpath tika-app.jar:my-extra-parser.jar org.apache.tika.cli.TikaCLI --list-parsers
That calls the main Tika App entry point (the same class that -jar runs by default), with both the Tika App jar and your custom extra jar on the classpath. On Windows, separate the classpath entries with ; instead of :.
You may also find the Troubleshooting Apache Tika guide from the Tika wiki useful when developing custom plugins like this!
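To check whether the plugin jar really is on the classpath, a small diagnostic along the following lines can help. It is only a sketch: the class name ServiceFileFinder is made up for illustration, and it assumes the parser is registered through a META-INF/services/org.apache.tika.parser.Parser file, as the parser guide describes. It prints the effective classpath and every copy of that service file the class loader can see.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Tika: lists the service registration
// files that are visible on the current classpath.
class ServiceFileFinder {

    // Returns every META-INF/services/<serviceName> resource the loader can see.
    static List<URL> find(ClassLoader loader, String serviceName) throws IOException {
        return Collections.list(loader.getResources("META-INF/services/" + serviceName));
    }

    public static void main(String[] args) throws IOException {
        // With -jar, this property holds only the jar named after -jar.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));

        // Assumption: this is the service file name the custom parser jar ships.
        String service = "org.apache.tika.parser.Parser";
        for (URL url : find(ClassLoader.getSystemClassLoader(), service)) {
            System.out.println("found: " + url);
        }
    }
}
```

Run it with the same -classpath value as the Tika command, plus the directory that holds the compiled class. If no "found" line points into your parser jar, the jar is not on the classpath or the service file is not packaged where expected.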

Related

Java code reuse with pre compiled class including Main

I recently downloaded all the Apache POI downloadables, specifically poi-examples-3.11-20141221.jar, which includes pre-compiled examples like "How to Use".
The problem is I can't run the pre-compiled classes without Eclipse.
Specifics: poi-examples-3.11-20141221.jar
-> org.apache.poi.xssf.eventusermodel
-> XLSX2CSV.class
XLSX2CSV is already compiled with main() and I simply want to run it without Eclipse.
Links to other tutorials about referencing classes and jars in Java would also be helpful.
I'm new here so please be gentle.
Run it with the jar on the classpath, but not with the -jar option.
For example, for the .xls to csv converter example XLS2CSVmra you'd do something like:
java -classpath poi-3.12-beta1.jar:poi-examples-3.12-beta1.jar org.apache.poi.hssf.eventusermodel.examples.XLS2CSVmra
Make sure that all the POI jars you need are on the classpath (.xlsx / XSSF needs more), along with any of their dependencies from the lib directory. See the POI Components Page for details of which jars you need for what.
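If the list of jars gets long, the classpath string can be generated rather than typed. The following is a minimal sketch, assuming the dependency jars sit together in one directory; the class name ClasspathBuilder and the default directory name "lib" are illustrative, not part of POI.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: joins every .jar in a directory into a classpath string.
class ClasspathBuilder {

    // Uses File.pathSeparator, which is ":" on Unix-like systems and ";" on Windows.
    static String fromDirectory(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(p -> p.getFileName().toString().endsWith(".jar"))
                    .sorted()
                    .map(Path::toString)
                    .collect(Collectors.joining(File.pathSeparator));
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the jars live in ./lib unless a directory is passed in.
        Path dir = Paths.get(args.length > 0 ? args[0] : "lib");
        System.out.println(fromDirectory(dir));
    }
}
```

The printed string can be pasted after -classpath. The java launcher also accepts a wildcard entry such as "lib/*" (quoted so the shell does not expand it), which picks up every jar in that directory without a helper.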

How do I get the opennlp api files recognized to compile?

I've downloaded the OpenNLP tools, and was able to get the command line tools to run after adding the right paths to my bash file, but I can't figure out how to get the api files to work with my IDE (Netbeans).
For the command line tools I'm pointing the path to the /bin directory. There are four jar files in the /lib directory: jwnl-1.3.3.jar, opennlp-maxent-3.0.3.jar, opennlp-tools-1.5.3.jar, opennlp-uima-1.5.3.jar.
Any help on where to put these files, and how to access the opennlp tools api in Netbeans would be greatly appreciated.
To use the API (either training or tagging), you only need opennlp-tools-1.5.3.jar. In NetBeans, if you have external library dependencies, see this question:
How to add a JAR in NetBeans

Can't open a .jar file on my mac

I'm trying to download jsoup on my Mac (Mountain Lion). I've downloaded the jsoup.jar file and installed the latest Java 7 from the site. But here is the problem: when I double-click the .jar file it tells me:
The Java JAR file “jsoup-1.7.2.jar” could not be launched. Check the
console.
I can't even find the console! Can someone help me? I've read a lot of answers about this topic, but they all talk about Java 6, which has different settings that I can't find.
EDIT
I also tried from the terminal with this command:
java -jar /Users/Ben/Downloads/jsoup-1.7.2.jar
but it tells me:
Failed to load Main-Class manifest attribute from
/Users/Ben/Downloads/jsoup-1.7.2.jar
The JSoup JAR is not executable, so you are not going to be able to 'run' it in any of the ways you described. You are supposed to include it in your project classpath and use classes from it to do your parsing (after importing them of course).
You might want to refer to the JSoup Guide for examples on using the library.
I guess you are trying to run the jsoup library as a standalone application, assuming it to be an executable jar. Everything indicates that the jar file you are using is NOT an executable jar, hence it won't work.
jsoup.jar is supposed to be used as a Java library, and you will need to write Java code to be able to use its HTML parsing capabilities.
If you are using an IDE like IntelliJ, you can open the module settings for a particular project and select Libraries. There'll be an option to add a particular external library from the Maven repository after which you can download the JAR and include it in your project's dependencies.
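Whether a jar is executable comes down to whether its manifest has a Main-Class attribute, which is what the "Failed to load Main-Class manifest attribute" message is complaining about. The sketch below checks that with the standard java.util.jar API; the class name MainClassCheck is made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper: reports the Main-Class entry of a jar, if it has one.
class MainClassCheck {

    // Empty when the jar has no manifest or the manifest has no Main-Class.
    static Optional<String> mainClassOf(Path jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Manifest manifest = jarFile.getManifest(); // null if the jar has no manifest
            if (manifest == null) {
                return Optional.empty();
            }
            return Optional.ofNullable(
                    manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS));
        }
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            String verdict = mainClassOf(Paths.get(arg))
                    .map(name -> "executable, Main-Class is " + name)
                    .orElse("no Main-Class, use it as a library on the classpath");
            System.out.println(arg + ": " + verdict);
        }
    }
}
```

Pass one or more jar paths as arguments. A jar reported as having no Main-Class cannot be started with java -jar or by double-clicking; it has to go on the classpath of a program that uses it.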

Java application deploy

I have a Swing desktop application and have created a jar file which depends on a library (which is kept in ./lib/) and a .txt file in the same folder. To execute the jar I have written a .bat file which checks whether Java is installed. If it is, I run the jar file with the command:
javaw -jar TagEdit.jar
Now there are two problems I am facing with this:
I would prefer a single executable, if possible.
Because I'm using a bat file, the console is visible in the background (which looks kind of weird). Is it possible to turn it off?
Java is everywhere, and there are lots of applications that are built in Java and packaged in a setup, or given as an exe. I Googled a lot but could not find a way to create a setup or an exe for the software. How is such software packaged?
Have tried jlaunch, but could not get that to work correctly.
Himz, Eclipse can automatically build a so-called "fat-jar" for you. It is a jar that contains all the dependencies you need.
If you are a happy Maven user, then you have two brilliant alternatives - the shade plugin, and the assembly plugin. They both can produce a "fat-jar" for you. :)
There are various answers to this.
javaw.exe will execute the jar without the console appearing behind it.
But I feel this isn't really the best way.
I think you should investigate using Java Web Start: you create a JNLP file and have the jar downloaded from the web, and I think you can also have a desktop icon.
If you don't want that,
I think you can get or buy binary wrappers for the jar.
You could convert it to an executable. Try Googling java to exe.
Once that is done, you could package it up as an installer using NSIS.

What is the process to compile Nutch into one Jar file (and run it)?

I'm trying to run the Nutch crawler in a way that I can access all its functionality through one JAR file that contains all its dependencies.
For instance,
java -jar nutch-all-1.2.jar -crawl <other params>
and at a later stage, call it with hadoop.
Currently, doing a
java -jar nutch-1.2.jar
on the JAR file that exists in the nutch directory results in the error,
Failed to load Main-Class manifest attribute from
nutch-1.2.jar
I believe this happens because this particular JAR does not contain the manifest XML files, or other dependent JARs. What would you recommend as the best method to build nutch into one JAR for this purpose?
Thanks!
I realized after much looking around that to run Nutch off the command line in a simple manner, the nutch.job file can be used instead. The syntax is,
hadoop jar nutch-1.0.job org.apache.nutch.crawl.Crawl urls -dir crawl -depth 1
