Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Can anyone suggest a good open-source image processing library in Java?
I want to develop an OMR reader using it.
There are a number of options out there, each with their own features and drawbacks. If you want to discuss your needs in more detail, I can touch on the specific attributes of each library as it relates to your project:
ImageJ - http://rsbweb.nih.gov/ij/index.html -- Note that ImageJ is primarily a self-contained application. However, the underlying API is very easy to use in your own applications without having to invoke the GUI.
Fiji - http://pacific.mpi-cbg.de/wiki/index.php/Main_Page -- This is ImageJ with a number of additional features. I have no personal experience with this library, but it looks promising.
JAI - http://www.oracle.com/technetwork/articles/javaee/jai-142803.html -- This is Sun's image processing Java offering. Limited in functionality, but it can be used as a basis for more powerful libraries.
jMagick - http://www.jmagick.org/index.html -- This is just a Java wrapper around ImageMagick and uses JNI to interface with the ImageMagick API
Apache Sanselan - http://commons.apache.org/imaging/ -- This library mostly does image IO, but it has a handful of features that can facilitate image analysis.
JIU (Java Imaging Utilities) - http://sourceforge.net/projects/jiu/ -- A Java library for loading, editing, analyzing and saving pixel image files.
Endrov - http://www.endrov.net/wiki/index.php?title=Main_Page -- Endrov is a multi-purpose image analysis program. I get the impression that the underlying API is usable outside of the application, but it also seems that not everything is implemented in Java. I have no personal experience with this library and am only throwing it in because it seems to have a number of useful features.
JAI
Marvin Image Processing Framework
http://marvinproject.sourceforge.net
and the dead-simple one: imgscalr
I would suggest using JAI, as mentioned, for the imaging side, but for writing an OMR application you will also need template registration. This can be achieved using OpenCV, which works with Java (as well as many other languages and platforms).
Without good image registration, regardless of image processing library, you will end up missing some of the marks on some scans, as you will find that some scans are shifted due to the way scanners work.
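To make the point about shifted scans concrete, here is a minimal sketch using only the JDK's BufferedImage. It assumes registration has already produced a translation (dx, dy) for the page; the class name MarkReader, the luma threshold, and the fill ratio are hypothetical choices for illustration, not part of JAI or OpenCV.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

/** Hypothetical helper: measures how dark one answer bubble is on a registered scan. */
final class MarkReader {

    /**
     * Fraction of pixels darker than threshold (0-255 luma) inside the template
     * region after shifting it by the registration offset (dx, dy).
     */
    static double darkRatio(BufferedImage scan, Rectangle region, int dx, int dy, int threshold) {
        int dark = 0;
        int total = 0;
        for (int y = region.y + dy; y < region.y + dy + region.height; y++) {
            for (int x = region.x + dx; x < region.x + dx + region.width; x++) {
                if (x < 0 || y < 0 || x >= scan.getWidth() || y >= scan.getHeight()) {
                    continue; // part of the shifted region fell off the page
                }
                int rgb = scan.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                int luma = (299 * r + 587 * g + 114 * b) / 1000;
                if (luma < threshold) {
                    dark++;
                }
                total++;
            }
        }
        return total == 0 ? 0.0 : (double) dark / total;
    }

    /** A bubble counts as marked when at least minFill of its pixels are dark. */
    static boolean isMarked(BufferedImage scan, Rectangle region, int dx, int dy,
                            int threshold, double minFill) {
        return darkRatio(scan, region, dx, dy, threshold) >= minFill;
    }
}
```

With the wrong offset the same bubble reads as only partly filled, which is exactly how marks get missed on shifted scans.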
Closed 5 years ago.
Hoping that somebody here knows about a good one: I'm looking for a (free to use) C++ library with a class hierarchy and methods resembling the Java API, with at least the I/O & networking part of it, specifically HTTP handling.
I work mainly with C & Java, but for this particular project C++ is recommended, so I thought of adopting a good set of C++ libraries without facing a steep learning curve.
Thanks in advance for any recommendation.
Qt is IMHO very Java-like. For example, it prefers Java-style iterators over the STL ones. Qt includes networking (examples) and much other stuff (like scripting via JavaScript).
Have you looked at the Boost libraries?
Boost.IOStreams provides a framework for defining streams, stream buffers and i/o filters.
Asio - Portable networking, including sockets, timers, hostname resolution and socket iostreams.
Many others....
The Boost libraries provide similar capabilities as compared to the Java API, but they very much 'look and feel' - appropriately - like a C++ library.
There is also the option of using something like POCO, which is slightly simpler than using something like Boost, while still being cross platform.
While the only time I used HTTP in Java was a long time ago, the interface for the POCO library looks fairly simple to use. It gives an example of basic FTP usage, something like this:
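// Needs <fstream>, <memory>, Poco/URI.h, Poco/URIStreamOpener.h, Poco/StreamCopier.h and Poco/Net/FTPStreamFactory.h.
Poco::Net::FTPStreamFactory::registerFactory();  // teach the default opener about ftp:// URIs
std::ofstream localFile(inputFile, std::ios_base::out | std::ios_base::binary);
Poco::URI uri(inputURL);
// open() returns an owning raw pointer, so hand it straight to a smart pointer.
std::unique_ptr<std::istream> ptrFtpStream(Poco::URIStreamOpener::defaultOpener().open(uri));
Poco::StreamCopier::copyStream(*ptrFtpStream, localFile);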
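For comparison, the Java idiom that this POCO snippet resembles might look roughly like the following. It is only a sketch using the standard java.net and java.nio.file classes; the class name UrlCopy is made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: the Java counterpart of "open a URI, copy the stream to a file". */
final class UrlCopy {

    /** Streams whatever source points at into target and returns the number of bytes copied. */
    static long copy(URI source, Path target) throws IOException {
        // URL.openStream() picks the protocol handler from the URI scheme,
        // much like POCO's URIStreamOpener does with its registered factories.
        try (InputStream in = source.toURL().openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

The shape is the same in both languages: resolve a URI to an input stream, then copy that stream into a local file.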
A C++ library that looked like a Java one would be a bad library, IMHO. The two languages are so very different that what is good design for one will almost inevitably be bad design for the other.
You can take a look at Mindroid, which is primarily oriented to embedded programming:
Mindroid is an application framework (with focus on messaging and concurrency) that lets you create applications using a set of reusable components - just like Android. The name Mindroid has two different meanings. On one hand Mindroid is a minimal set of core Android classes and on the other hand these classes also form Android's mind (at least in my opinion).
Closed 2 years ago.
Is there an open source Java alternative to GraphViz? I'm aware of the existence of Grappa, which basically wraps the Graph interface to GraphViz as a Java API. However, the layout is still done by the GraphViz binaries.
I'm looking for a pure-Java, open source library providing the same functions and layout algorithms as GraphViz.
You can have a look at JUNG (Java Universal Network/Graph Framework) which has visualization and analytics functions. It's open source.
Interestingly, the Eclipse project has an SWT/JFace component/framework capable of displaying and generating (import/export) Graphviz's 'DOT' format, in pure Java:
ZEST (home page & download links)
See http://wiki.eclipse.org/Graphviz_DOT_as_a_DSL_for_Zest for usage examples.
Although ZEST is touted as an Eclipse plugin, it does seem that the DOT-manipulation APIs can be used standalone and external to an Eclipse installation.
To clarify, the DOT functionality is a part of the ZEST 2 functionality, which itself is a sub-component of the GEF4 project.
Cheers
Rich
Update (May 2017) https://github.com/nidi3/graphviz-java
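Whichever tool ends up doing the layout, the DOT input itself is plain text and easy to produce from Java. Below is a minimal sketch that turns an adjacency map into a DOT digraph. DotWriter and its methods are hypothetical names; this is not the API of ZEST or graphviz-java.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper: serializes an adjacency map as Graphviz DOT text. */
final class DotWriter {

    /** Emits one node statement per key, followed by one edge statement per listed successor. */
    static String toDot(String graphName, Map<String, List<String>> adjacency) {
        StringBuilder sb = new StringBuilder();
        sb.append("digraph ").append(quote(graphName)).append(" {\n");
        for (Map.Entry<String, List<String>> entry : adjacency.entrySet()) {
            String from = quote(entry.getKey());
            sb.append("  ").append(from).append(";\n"); // keeps nodes without edges visible
            for (String to : entry.getValue()) {
                sb.append("  ").append(from).append(" -> ").append(quote(to)).append(";\n");
            }
        }
        sb.append("}\n");
        return sb.toString();
    }

    /** DOT quoted IDs only need embedded double quotes escaped. */
    private static String quote(String id) {
        return "\"" + id.replace("\"", "\\\"") + "\"";
    }
}
```

The resulting string can be written to a .dot file and handed to any renderer that understands DOT.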
You could look at JGraph, though I have never used it so cannot comment on how it compares to GraphViz.
yFiles seems to provide all this, but it's not free and not really cheap either. But then again it seems to be a very professional product (haven't used it, except in yEd, which can be used for free).
I guess ZGRViewer is what you want. I really like ZGRViewer and AJaPaD.
I worked with yFiles about four years ago, and it was excellent. It's costly (though less than JGraph, apparently) but I work in a CS research lab and had access to their generous academic pricing.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
We are currently researching ways of enhancing image quality prior to submission to OCR. The OCR engine we are currently utilizing is the Scansoft API from Nuance (v15). We were researching the Lead Tools but have since decided to look elsewhere. The licensing costs associated with Lead Tools are just too great. To start with, we are looking for simple image enhancement features such as deskewing, despeckling, line removal, punch hole removal, sharpening, etc. We are running a mix of .NET and Java software, but a Java solution would be preferred.
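To give a sense of scale for the simpler items on that list, here is a rough despeckling sketch in plain Java (JDK only). It whitens dark pixels that have too few dark neighbours. The class name, the grey threshold, and the neighbour count are assumptions for illustration; commercial toolkits use more robust filters.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: removes isolated dark specks from a scanned page. */
final class Despeckle {
    private static final int WHITE = 0xFFFFFF;

    /**
     * Returns a copy of src in which every dark pixel with fewer than
     * minDarkNeighbors dark pixels among its 8 neighbours is painted white.
     */
    static BufferedImage removeSpecks(BufferedImage src, int threshold, int minDarkNeighbors) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // Decisions are based on src only, so the result does not depend on scan order.
                boolean speck = isDark(src, x, y, threshold)
                        && darkNeighbors(src, x, y, threshold) < minDarkNeighbors;
                out.setRGB(x, y, speck ? WHITE : (src.getRGB(x, y) & 0xFFFFFF));
            }
        }
        return out;
    }

    /** Out-of-bounds coordinates count as light, so page borders do not protect specks. */
    static boolean isDark(BufferedImage img, int x, int y, int threshold) {
        if (x < 0 || y < 0 || x >= img.getWidth() || y >= img.getHeight()) {
            return false;
        }
        int rgb = img.getRGB(x, y);
        int gray = (((rgb >> 16) & 0xFF) + ((rgb >> 8) & 0xFF) + (rgb & 0xFF)) / 3;
        return gray < threshold;
    }

    private static int darkNeighbors(BufferedImage img, int x, int y, int threshold) {
        int count = 0;
        for (int dy = -1; dy <= 1; dy++) {
            for (int dx = -1; dx <= 1; dx++) {
                if ((dx != 0 || dy != 0) && isDark(img, x + dx, y + dy, threshold)) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

Deskewing and line removal are considerably harder than this, which is where the libraries suggested in the answers come in.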
Kofax is good for pre-processing, but for the types of cleanup you are talking about may be overkill unless the images are really bad. Unless your specialty is in image processing, I'd recommend working with a provider that does the image cleanup and the OCR so you can focus on the value you actually add.
We license the OCR development kit from ABBYY (ABBYY SDK) and have found it to be superb for both image processing and OCR. The API is quite extensive, and the sample apps, help and support have been beyond impressive. I definitely recommend taking a look.
Disclaimer: I work for Atalasoft
We have those functions and run-time royalty-free licensing for .NET.
http://www.atalasoft.com/products/dotimage/
We also have OCR components including a .NET wrapper for Abbyy, Tesseract and others and Searchable PDF generation (image on top of text in a PDF)
Not sure if this would be quite up to the standards that you guys would need, but perhaps you should look at some of the Paint.Net APIs. I don't know how easy it would be to extract their image processing algorithms for use in your project, but I believe they do some of the things you are looking for. Plus it is an open source project with an MIT License, so it should be pretty friendly for business use.
Research about KOFAX VRS at KOFAX.com
Maybe JMagick: it is an open source Java interface to ImageMagick, implemented as a thin Java Native Interface (JNI) layer over the ImageMagick API. It's licensed under the LGPL, so it shouldn't be a problem license-wise.
http://sourceforge.net/projects/jmagick/
I would suggest Intel for its zero-cost runtime licensing.
Depends on the number and quality of the original images. Managed code and imaging toolkits will work, but they're not always the best solution if you have several million images to process. For small batches and tight budgets, I agree with the previous posters that projects like Aforge, Paint.NET, and other open source computer vision libraries will do the trick. Of course, you are on your own if the results are not improving... At least this lets you put everything you need under one application for a low cost.
If you are processing several hundred thousand images a month, then I would suggest you divide the process into smaller workflow steps and tweak each one until your cost per image gets as close to zero as you can. You will find that the OCR results rise quickly at first and then level off sooner than you expected. (I'm not a big fan of OCR, but it has its place.)
I use a commercial Windows product from Recogniform to process and clean up the images prior to OCR in batch mode, using scripts adjusted for various kinds of images. If an image fails QC or is rejected by the OCR engine, it is "repaired" by hand using a custom .NET application built with Atalasoft's toolkit. Batch process everything and only touch what fails.
Closed 7 years ago.
I would like to read the text and binary attachments in a saved Outlook message (.msg file) from a Java application, without resorting to native code (JNI, Java Native Interface).
Apache POI-HSMF seems to be in the right direction, but it's in very early stages of development...
msgparser is a small open source Java library that parses Outlook .msg files and provides their content using Java objects. msgparser uses the Apache POI - POIFS library to parse the message files which use the OLE 2 Compound Document format.
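Since .msg files are OLE 2 Compound Documents, a cheap sanity check before handing a file to any parser is to look at the 8-byte compound-file signature. The sketch below uses only the JDK; the class name Ole2Sniffer is hypothetical and this is not part of msgparser or POI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper: checks for the OLE 2 Compound Document signature. */
final class Ole2Sniffer {
    // Every OLE 2 compound file starts with these 8 bytes.
    private static final byte[] MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** True when the given leading bytes begin with the OLE 2 signature. */
    static boolean hasOle2Header(byte[] head) {
        return head.length >= MAGIC.length
                && Arrays.equals(Arrays.copyOf(head, MAGIC.length), MAGIC);
    }

    /** Reads the first 8 bytes of the file and checks them against the signature. */
    static boolean looksLikeOle2(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[MAGIC.length];
            int filled = 0;
            while (filled < head.length) {
                int read = in.read(head, filled, head.length - filled);
                if (read < 0) {
                    break; // file is shorter than the signature
                }
                filled += read;
            }
            return filled == head.length && hasOle2Header(head);
        }
    }
}
```

A matching signature only says the container is OLE 2 (old .doc and .xls files match too), so the actual message still has to be parsed by a library such as the ones mentioned here.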
You could use Apache POIFS, which seems to be a little more mature, but that would appear to duplicate the efforts of POI-HSMF.
You could use POI-HSMF and contribute changes to get the features you need working. That's often how FOSS projects like that expand.
You could use com4j, j-Interop, or some other COM-level interop feature and interact directly with the COM interfaces that provide access to the structured document. That would be much easier than trying to hit it directly through JNI.
Have you tried to use Jython with the Python win32 extensions (http://www.jython.org/Project/ + http://python.net/crew/mhammond/win32/)?
If this is for a "personal" or "internal" project Jython with Python may be a very good choice. If you are building a "shrink wrapped" software package this may not be the best option.
Apache POI-HSMF.
You can start from the example at the link below.
http://svn.apache.org/viewvc/poi/trunk/src/scratchpad/examples/src/org/apache/poi/hsmf/examples/Msg2txt.java?revision=821500&view=markup&pathrev=821500
For more detail, read the library docs.
Closed 7 years ago.
Has anybody attempted to extract text from a PDF using an OCR library and Java? What did you find to be the most reliable library for text extraction? Most of the approaches I've seen (tesseract, GOCR) are C libraries that would require some JNI code to be written.
I'm familiar with pdfbox, which is now an Apache incubator project at version 0.8.x, but its text extraction isn't always accurate. I'm looking for an alternative approach that is somewhat more reliable.
I've not tried Asprise JavaPDF yet (I'm in the process of trying it), but I wanted to know more about the OCR approach, if it's possible.
Any help would be appreciated.
If you have a text-based PDF, I'd strongly recommend PDFTextStream. It's not free, but licensing is reasonable, and it is much much better than PDFBox. PDFBox chokes on many PDF files which are generated by newer tools, and is not too consistent about PDFs it can handle. PDFTextStream handles any PDF I throw at it, including PDFs with embedded PNG images, which PDFBox can not do.
If you heckle the PDFTextStream folks to add OCR, they may listen up.
We use ABBYY FineReader Engine 11. They have a Java wrapper.
Pros:
It works great with all the languages (English, Russian, Uzbek, etc.) and does real OCR: even if a PDF has no text layer, it renders the pages first and then runs OCR on them.
Cons:
It costs money. You have to buy a developer license and an end-user license.
And it is EXTREMELY slow.
If you want to run OCR on a text-based PDF, you may have to convert it to an image first.
You can use Java wrappers of Tesseract - tesjeract or Tess4J - to perform OCR. However, for PDF, you'll need to convert to image (PNG or TIFF) first before feeding it to the OCR engine.
VietOCR calls Tesseract executable to perform the text extraction. It uses GhostScript to do PDF-to-image conversion.
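The same two-step pipeline (PDF to image, then image to text) can be driven from Java by calling the command-line tools, with no JNI involved. The sketch below only builds the command lines. It assumes gs and tesseract are on the PATH (on Windows the Ghostscript console binary has a different name), and the class name, file names, and 300 dpi setting are illustrative. This is not how VietOCR itself is implemented.

```java
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: builds command lines for a Ghostscript + Tesseract OCR pipeline. */
final class OcrCommands {

    /** Ghostscript call that renders each PDF page to a 24-bit PNG at the given resolution. */
    static List<String> pdfToPng(Path pdf, Path outputPattern, int dpi) {
        return Arrays.asList(
                "gs", "-dNOPAUSE", "-dBATCH",
                "-sDEVICE=png16m",
                "-r" + dpi,
                "-sOutputFile=" + outputPattern, // e.g. page-%03d.png, one file per page
                pdf.toString());
    }

    /** Tesseract call that writes recognized text to outputBase plus a ".txt" suffix. */
    static List<String> ocr(Path image, Path outputBase, String language) {
        return Arrays.asList(
                "tesseract", image.toString(), outputBase.toString(),
                "-l", language);
    }
}
```

Each list can be passed to new ProcessBuilder(command).inheritIO().start().waitFor(), and the resulting text file read back with Files.readAllLines.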