Generating training data using Tess4J

I am using Tess4J to use Tesseract programmatically, which works great for recognition tasks.
Now I want to add features that help generate traineddata files from text, as described in this wiki article, but from Java/Tess4J. It doesn't matter whether I have to use the "NEW Automated method" or the "Old Manual method"; either is fine.
Does Tess4J support this, or is there another Java binding that can train Tesseract?

Training is performed by separate executables, not by Tesseract itself, and they are not exposed as an API or library. For Java-based Tesseract training, you may want to check out the jTessBoxEditor project.
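Because the training tools are standalone command-line programs rather than library calls, one way to drive them from Java is to launch them as external processes. This is a minimal sketch, assuming the Tesseract training executables are installed and on the PATH; the class `TrainingRunner` is hypothetical, and the actual command line for each training step should come from the Tesseract training documentation for your version.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper; assumes the training executables are on the PATH. */
class TrainingRunner {

    /** Outcome of one external command: exit code plus merged stdout/stderr. */
    record Result(int exitCode, String output) {}

    /** Runs one command in workDir and waits for it to finish. */
    static Result run(Path workDir, List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command); // no shell, no quoting issues
        pb.directory(workDir.toFile());
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process process = pb.start();
        String output;
        // Drain the output before waiting, so a full pipe cannot block the child.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            output = reader.lines().collect(Collectors.joining("\n"));
        }
        return new Result(process.waitFor(), output);
    }

    /** Runs the commands in order, stopping at the first non-zero exit code. */
    static List<Result> runAll(Path workDir, List<List<String>> commands)
            throws IOException, InterruptedException {
        List<Result> results = new ArrayList<>();
        for (List<String> command : commands) {
            Result r = run(workDir, command);
            results.add(r);
            if (r.exitCode() != 0) {
                break; // later training steps depend on the outputs of earlier ones
            }
        }
        return results;
    }
}
```

Each training step is passed as its own `List<String>`, and each `Result` keeps the tool's output for logging; stopping at the first failure avoids running later steps on missing input files.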

Related

How to set up the RapidMiner library in Android Studio

I am currently working on an object recognition app. I'm using Android Studio, and I have created a Neural Network model in RapidMiner Studio and saved it as PMML. I want to apply this model to a set of extracted features in Android Studio to obtain a prediction (e.g., is the object a fruit, vegetable, or nut?). However, I'm not able to integrate the RapidMiner library into Android Studio. I've downloaded "rapidminer-extension-template" from https://github.com/rapidminer/rapidminer-extension-template. Is that the right thing to download?
I have been looking for a working solution for four days, but I haven't found one.
Do I have to use the RapidMiner library to apply the PMML model, or can I use something else?

The extension template is used to create new extensions for RapidMiner, which can provide new Operators.
In your case, what you need is an interpreter for the PMML model on Android. This project looks promising, but I haven't tested it myself.
Also note that RapidMiner's Write PMML Operator currently does not support Neural Network models. According to its help text, the supported models are:
Decision Tree Models
Rule Models
Naive Bayes models for nominal attributes
Linear Regression Models
Logistic Regression Models
Centroid based Cluster models like models of k-means and k-medoids
Feel free to ask follow-up questions, or re-post this one, in the RapidMiner community forum; especially for questions about extension development, you can find qualified help there.

I downloaded the pmml-evaluator library and added it to my project. The documentation on GitHub helped me achieve what I was looking for; it was able to read the PMML model and provide a prediction.
Also, even though RapidMiner's website says that the PMML writer supports only the models listed above, I was able to save the Neural Net model and use it for prediction in Android Studio.
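To check which model type an exported PMML file actually contains (for example, whether RapidMiner wrote a NeuralNetwork element rather than one of the officially listed model types), the file can be inspected with the XML parser in javax.xml.parsers, which is available both in the JDK and on Android. This is a minimal sketch: the class `PmmlInspector` is hypothetical, and the set of element names covers only the model families discussed here, not every PMML model type.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

/** Hypothetical helper: lists the PMML model elements a document contains. */
class PmmlInspector {

    // Some PMML model element names, including the families discussed above.
    // Not exhaustive: PMML defines further model types (e.g. MiningModel).
    private static final Set<String> MODEL_ELEMENTS = new HashSet<>(Arrays.asList(
            "TreeModel", "RuleSetModel", "NaiveBayesModel", "RegressionModel",
            "GeneralRegressionModel", "ClusteringModel", "NeuralNetwork"));

    static List<String> modelTypes(InputStream pmml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // PMML elements live in a versioned namespace
        Element root = factory.newDocumentBuilder().parse(pmml).getDocumentElement();

        List<String> found = new ArrayList<>();
        // Model elements appear as direct children of the root <PMML> element.
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && MODEL_ELEMENTS.contains(n.getLocalName())) {
                found.add(n.getLocalName());
            }
        }
        return found;
    }
}
```

Using `getLocalName()` with a namespace-aware parser means the check works regardless of which PMML version namespace the exporter declared.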

CMUSphinx support for other languages

Currently I am using Sphinx4 in a Java desktop application just to match speech against the words in the xxx.gram file. I have not installed Sphinx on my system; I'm only using sphinx4.jar on my project's classpath, and it has worked fine so far.
Now I want to use it with Spanish, Portuguese, and Chinese. How can I do that without installing Sphinx? Is there a .jar file, lightweight library, or service (even one other than Sphinx) that can do this?

You can download the Spanish model here:
http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Spanish%20Voxforge/voxforge-es-0.1.1.tar.gz/download
Portuguese and Chinese are not supported. If you need them, you can build models from your own audio; for details, see
http://cmusphinx.sourceforge.net/wiki/tutorialam
To make data collection easier, you can use audiobooks.
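When you switch to a different model (for example the Spanish VoxForge one), every word in your .gram file must also appear in that model's pronunciation dictionary. A minimal sketch of such a check, assuming the usual Sphinx dictionary layout of one entry per line with the word first and alternate pronunciations written as `word(2)`; the class `DictionaryCheck` is hypothetical, and the grammar words are passed in as a plain list rather than parsed from the JSGF file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: finds grammar words missing from a Sphinx-style dictionary. */
class DictionaryCheck {

    /** Reads headwords from lines like "hola o l a" (word, then its phones). */
    static Set<String> readWords(Reader dictionary) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader in = new BufferedReader(dictionary)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.strip();
                if (line.isEmpty()) continue;
                String word = line.split("\\s+", 2)[0];
                // Alternate pronunciations are written as word(2), word(3), ...
                int paren = word.indexOf('(');
                if (paren > 0) word = word.substring(0, paren);
                // Simplification: compare case-insensitively.
                words.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    /** Returns the grammar words, in input order, that the dictionary lacks. */
    static Set<String> missing(Collection<String> grammarWords, Set<String> dictionaryWords) {
        Set<String> result = new LinkedHashSet<>();
        for (String w : grammarWords) {
            if (!dictionaryWords.contains(w.toLowerCase(Locale.ROOT))) {
                result.add(w);
            }
        }
        return result;
    }
}
```

Any word reported as missing needs a dictionary entry (or must be removed from the grammar) before the new model can recognize it.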

Generating CCD document using MDHT API

I am trying to use the MDHT API to generate CCD documents. My approach is as follows:
I downloaded the Java runtime libraries, placed them on the classpath, and am writing code to generate all the sections with the MDHT API.
Writing the code for each section is taking a long time and is somewhat complicated, so I wonder whether I am missing something. Is there an open-source MDHT GUI that generates code for each section, or am I moving in the right direction?
I am currently stuck on the Medications and Immunizations sections. Can anyone point me to examples or tutorials for each section? I have already read the user guide and developer's guide.
Any help is appreciated.

I think the MDHT API only provides a model for the CCD document; I don't know whether it includes a default implementation for generating CCD documents.
Either way, you may be better off generating the XML yourself with the DOM or StAX API.
CCD example link
Another, better API that I found is Mirth; follow the link:
Mirth User guide
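If you take the plain-XML route with StAX, the javax.xml.stream writer in the JDK is enough to emit the document. This is a minimal sketch only: it writes a bare ClinicalDocument root in the HL7 v3 namespace (urn:hl7-org:v3) containing a single section, and it is not a conformant CCD, since the required header elements and templateIds are omitted. The class `CdaSkeletonWriter` and its parameters are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical sketch: writes a skeletal CDA body with StAX. NOT a conformant CCD. */
class CdaSkeletonWriter {

    static final String HL7_NS = "urn:hl7-org:v3";

    static String write(String sectionTitle, String narrative) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartDocument();
        w.writeStartElement("ClinicalDocument");
        w.writeDefaultNamespace(HL7_NS); // child elements inherit this default namespace
        // Header elements (typeId, templateId, id, code, ...) are omitted in this sketch.
        // Body path: component/structuredBody/component/section
        for (String name : new String[] {"component", "structuredBody", "component", "section"}) {
            w.writeStartElement(name);
        }
        w.writeStartElement("title");
        w.writeCharacters(sectionTitle);
        w.writeEndElement(); // title
        w.writeStartElement("text");
        w.writeCharacters(narrative); // the writer escapes markup characters such as '<'
        w.writeEndElement(); // text
        for (int i = 0; i < 4; i++) {
            w.writeEndElement(); // section, component, structuredBody, component
        }
        w.writeEndElement(); // ClinicalDocument
        w.writeEndDocument();
        w.close();
        return out.toString();
    }
}
```

Each additional section (Medications, Immunizations, and so on) would be another component/section pair under structuredBody; the coded entries and templateIds those sections require are where MDHT's models save the most work.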

The best place to look for help/sample code is the developers forum: Eclipse Community Forums » Model Driven Health Tools.
You may need to create a (free) account to get access.
You could also post your specific Medications/Immunizations section question in the forums to get a more targeted answer.
Another good resource is CDA Tools: MDHT Developers Guide; look at "Produce CDA Content using MDHT API".
There are countless examples of building documents in the test code projects.
Download the All In One (MDHT_CDATools) from GitHub (linked on the MDHT project site) and look at the code in the test projects, such as org.openhealthtools.mdht.cda.consol.example.
The closest current equivalent of a GUI-based document-building application in MDHT would be to use the All In One to modify the existing models and generate sample snippets or documents that way (or to create a model from scratch that includes only what you need). If you want the entire document produced as XML, you can then generate the instance from GeneralHeaderConstraints rather than from one of the many child templates (which would give you only snippets). Either way, this is not really what the model interface is intended for (it is aimed more at conformance), and it would take you far longer than using the API itself, which uses the existing models to generate conformant content quickly.

1-D barcode scanner (using images from a capture device) implementation in Java

I am a student, and for a project I have to implement a 1-D barcode-based attendance marking system. While searching the web, I learned that barcode readers are fairly expensive to buy, so instead I want to capture images of barcodes with a capture device (most likely a webcam) and then process them to extract the content encoded in them.
I found a few projects online that do this, but they use the .NET Framework, which I am not very familiar with. The only Java project I found is http://sourceforge.net/projects/javabarcoderead/, but I cannot get the JAR file it provides to run.
So I would like to know which algorithms or methods can be used for this, or any project that could give me some insight into how to proceed.
Happy Coding...

You're right, it would be very difficult to use a library with no documentation and no source code.
I'd suggest using ZXing. It's a well-documented library with lots of examples.
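To give some insight into what such a library does internally: 1-D decoders typically sample a horizontal scanline across the barcode, binarize the pixels into dark and light, and measure the widths of the alternating bars and spaces, which are then matched against the symbology's patterns (EAN-13, Code 128, and so on). The sketch below covers only that first step on a java.awt.image.BufferedImage, using a simple mean-luminance threshold; it decodes no symbology, and the class `ScanlineRuns` is hypothetical.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the first stage of a 1-D barcode reader. */
class ScanlineRuns {

    /** One run of same-colored pixels along the scanline. */
    record Run(boolean dark, int width) {}

    /** Luminance (0-255) of a packed RGB pixel, using Rec. 601 weights. */
    static int luminance(int rgb) {
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        return (299 * r + 587 * g + 114 * b) / 1000;
    }

    /**
     * Binarizes row y against its mean luminance and returns the widths of the
     * alternating dark/light runs from left to right.
     */
    static List<Run> runs(BufferedImage img, int y) {
        int w = img.getWidth();
        int[] lum = new int[w];
        long sum = 0;
        for (int x = 0; x < w; x++) {
            lum[x] = luminance(img.getRGB(x, y));
            sum += lum[x];
        }
        double threshold = (double) sum / w; // mean luminance of this scanline

        List<Run> result = new ArrayList<>();
        boolean currentDark = lum[0] < threshold;
        int width = 0;
        for (int x = 0; x < w; x++) {
            boolean dark = lum[x] < threshold;
            if (dark == currentDark) {
                width++;
            } else {
                result.add(new Run(currentDark, width));
                currentDark = dark;
                width = 1;
            }
        }
        result.add(new Run(currentDark, width));
        return result;
    }
}
```

With webcam images, a single global threshold is fragile under uneven lighting, which is one reason libraries like ZXing use more elaborate binarizers and try several scanlines.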

Image Processing via Standalone application

I'm developing a Content-Based Image Retrieval (CBIR) project whose front end will be written in Java.
The main issue is choosing a tool for the image processing. Matlab provides a lot of functionality for CBIR, but the main problem with using it is that Matlab must be installed on every computer that runs the application.
Is there another way to build the project (using other tools or drivers) so that the application runs without requiring any additional tools?
Or can I develop the entire application in Matlab and deploy it as a standalone application?
Thank you.

There are plenty of image processing libraries. For Java, for example, there is ImageJ, and the Apache Commons project also has one. If you need higher-level computer vision functionality, there is OpenCV, a C++ library that also has Java bindings.
You can also develop the entire application in Matlab, but deploying it as a standalone application requires licensing MathWorks Builder NE (which can be expensive). Matlab is very good for research and prototyping.
Other options also lend themselves to quick prototyping, for example Python with PIL.
I think the bottom line is that there are plenty of options.

Java image utilities library: a Java library for loading, editing, analyzing, and saving pixel image files.
It supports various file formats.
It provides demo applications for the command line and also comes with an AWT GUI toolkit.

As carlosdc already pointed out, Matlab is an excellent tool for prototyping. However, it offers limited options for UI programming: GUIDE is OK for small projects but hinders more than it helps on bigger ones.
With MATLAB Builder JA you can compile your Matlab code into Java classes.
For plotting time series in real time, however, libraries like JFreeChart are much slower.

I think OpenCV is one of the best image processing libraries available. Java Advanced Imaging (JAI) is also quite good, although it doesn't have as many features or examples. Color similarity would be simple to implement in JAI, but shape similarity would probably involve more code.
If you choose OpenCV, there are at least two Java binding implementations. The one my group uses is this one; it has some Processing dependencies.
Whichever library you choose, be prepared for some frustration: Matlab users are used to all the convenient features it provides, and when they port their code to other languages they end up having to write a lot more code.
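As a point of reference for the color-similarity case, a global color histogram compared by histogram intersection is one classic CBIR technique, and it needs nothing beyond the JDK's java.awt.image. This is a minimal sketch, not JAI or OpenCV code; the class `ColorHistogram` and the bin count (4 levels per RGB channel, 64 bins in total) are arbitrary choices for illustration.

```java
import java.awt.image.BufferedImage;

/** Hypothetical sketch: global RGB color histograms compared by histogram intersection. */
class ColorHistogram {

    static final int LEVELS = 4; // quantization levels per channel -> 4^3 = 64 bins

    /** Normalized histogram: each bin holds the fraction of pixels that fall into it. */
    static double[] of(BufferedImage img) {
        double[] hist = new double[LEVELS * LEVELS * LEVELS];
        int w = img.getWidth(), h = img.getHeight();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = img.getRGB(x, y);
                int r = ((rgb >> 16) & 0xFF) * LEVELS / 256;
                int g = ((rgb >> 8) & 0xFF) * LEVELS / 256;
                int b = (rgb & 0xFF) * LEVELS / 256;
                hist[(r * LEVELS + g) * LEVELS + b]++;
            }
        }
        double total = (double) w * h;
        for (int i = 0; i < hist.length; i++) {
            hist[i] /= total;
        }
        return hist;
    }

    /** Histogram intersection: 1.0 for identical color distributions, 0.0 for disjoint ones. */
    static double similarity(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.min(a[i], b[i]);
        }
        return sum;
    }
}
```

For retrieval, the histograms of the database images would be computed once and the query ranked by `similarity`; shape features, as noted above, take considerably more code.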

Well, after a long search I finally found a way to deploy Matlab code together with Java as a standalone application.
The steps are simple:
1. Get the javabuilder.jar file, located at:
Matlab\toolbox\javabuilder\jar\javabuilder.jar
2. Type deploytool at the Matlab command line.
3. When the deploytool window opens, create a new Java project.
4. Select the Matlab files that you want to use.
5. deploytool converts the .m files into a .jar file.
6. Use both JAR files (javabuilder.jar and the generated one) to develop the Java application that calls your Matlab code.
That is how you can create a standalone application from Matlab code.