I'm trying to read a PDF file with images of a scanned document.
Asprise OCR gives me this text as the OCR output:
<error: failed to read pdf. error code: ÿ>
There are a few possible causes: 1) the PDF file doesn't exist; or 2) the PDF file is malformed.
We recommend that you get the latest version of Asprise OCR and Barcode Recognition Library for Java, C#, VB.NET, Python, C/C++ and Delphi and refer to the Developer’s Guide to Asprise OCR SDK for Java, C#, VB.NET, Python, C/C++ & Delphi Pascal. Alternatively, you can contact our support team directly.
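Both causes can be ruled out before the file reaches the OCR engine. The following is a minimal Java sketch that is not part of the Asprise API. It assumes that a valid PDF begins with the `%PDF-` header bytes; some readers tolerate junk before the header, so this check is stricter than they are. The default file name `scan.pdf` is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PdfPrecheck {

    /** Returns null if the file looks like a PDF, otherwise a short reason. */
    static String check(Path path) throws IOException {
        if (!Files.exists(path)) {
            return "file does not exist";
        }
        if (!Files.isRegularFile(path) || !Files.isReadable(path)) {
            return "not a readable regular file";
        }
        // A PDF normally starts with "%PDF-" followed by a version such as "1.7".
        byte[] header = new byte[5];
        int read;
        try (InputStream in = Files.newInputStream(path)) {
            read = in.readNBytes(header, 0, header.length); // 0 for an empty file
        }
        String magic = new String(header, 0, read, StandardCharsets.US_ASCII);
        if (!magic.equals("%PDF-")) {
            return "missing %PDF- header (file may be malformed or not a PDF)";
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // "scan.pdf" is a hypothetical default; pass the real path as an argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "scan.pdf");
        String problem = check(path);
        System.out.println(problem == null ? "header looks OK" : "problem: " + problem);
    }
}
```

A passing check does not guarantee the file is intact: damage further in (for example a broken cross-reference table) cannot be detected from the header alone.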
I use the Apache PDFBox library to create PDF files. The files it creates have an XFA structure. Applications on PC, Mac or Linux can read these files without any problems, but Android devices cannot. Instead, I see the following message in the PDF file:
"Please wait... If this message is not eventually replaced by the proper contents of the document, your PDF viewer may not be able to display this type of document".
I am trying to find a way to create PDF files that Android devices can read, but I cannot find any information on how to do it.
Has anyone done something like that?
If you generate a non-XFA PDF you'll probably have more luck with it. The XFA spec is large, complicated and not well supported. Adobe Reader supports it but not many other readers (on Android or on desktop).
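To check whether a generated file actually contains XFA, a rough byte-level heuristic is to look for the `/XFA` key, which sits in the AcroForm dictionary of XFA forms. This Java sketch is a shortcut under stated assumptions, not a PDF parser: in PDF 1.5+ files that dictionary may live inside a compressed object stream, so "not found" does not prove the file is XFA-free. A PDF library answers the question properly (PDFBox 2.x, for example, offers `PDAcroForm.hasXFA()`). The default file name `output.pdf` is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class XfaSniffer {

    /**
     * Heuristic only: reports whether the raw bytes contain the name "/XFA".
     * A false result does NOT prove the file is free of XFA, because the
     * AcroForm dictionary may be stored in a compressed object stream.
     */
    static boolean mentionsXfa(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data is safe to scan.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return raw.contains("/XFA");
    }

    public static void main(String[] args) throws IOException {
        // "output.pdf" is a hypothetical default; pass the real path as an argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "output.pdf");
        byte[] bytes = Files.readAllBytes(path);
        System.out.println(mentionsXfa(bytes)
                ? "XFA key found: many non-Adobe viewers may not render this form"
                : "no uncompressed /XFA key found (heuristic only)");
    }
}
```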
I'm looking for a Java library that can take a PDF and create a thumbnail image (PNG) of the first page on an Android device.
I found "Create thumbnail image for PDF in Java", but it will not work on Android because it depends on AWT.
Can anyone suggest a free library that works on Android?
Apache PDFBox.
The Apache PDFBox™ library is an open source Java tool for working
with PDF documents. This project allows creation of new PDF documents,
manipulation of existing documents and the ability to extract content
from documents. Apache PDFBox also includes several command line
utilities. Apache PDFBox is published under the Apache License v2.0.
On its own, it is not compatible with Android, so you may want to look at the Android port, the PdfBox-Android library.
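Whichever renderer you use on Android (for example the PdfBox-Android port, or the platform's `android.graphics.pdf.PdfRenderer` on API 21+), you still have to choose a bitmap size for the thumbnail. The following is a minimal, framework-independent Java sketch of that step. It assumes the page size is known in PDF points (1/72 inch); the 200 x 200 px box is a hypothetical example.

```java
public class ThumbnailSize {

    /** Returns {width, height} in pixels that fit inside the box and keep the page's aspect ratio. */
    static int[] fitWithin(float pageWidthPt, float pageHeightPt, int maxWidthPx, int maxHeightPx) {
        if (pageWidthPt <= 0 || pageHeightPt <= 0 || maxWidthPx <= 0 || maxHeightPx <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // The smaller scale factor guarantees that both dimensions fit in the box.
        double scale = Math.min(maxWidthPx / (double) pageWidthPt,
                                maxHeightPx / (double) pageHeightPt);
        int w = Math.max(1, (int) Math.round(pageWidthPt * scale));
        int h = Math.max(1, (int) Math.round(pageHeightPt * scale));
        return new int[] {w, h};
    }

    public static void main(String[] args) {
        // US Letter is 612 x 792 points; fit it into a hypothetical 200 x 200 px box.
        int[] size = fitWithin(612f, 792f, 200, 200);
        System.out.println(size[0] + " x " + size[1]);
    }
}
```

On Android the resulting size would typically be passed to `Bitmap.createBitmap(w, h, Bitmap.Config.ARGB_8888)` before rendering the page, and the bitmap saved with `Bitmap.compress(Bitmap.CompressFormat.PNG, 100, out)`.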
If you want a web service, Datalogics PDF WebAPI is an option.
I'm evaluating transformers to find a Visio-to-PDF transformer, and found that there is none. Is there an external third-party API that can convert Visio to PDF on both Linux and Windows? I found the easyPDF API, but it only supports the Windows platform.
Any help is appreciated.
Are there any Java APIs or tools that can convert handwritten scanned documents to text files?
I have tried Google Tesseract and a few other tools, but I am not getting satisfactory results for handwritten scanned documents.
It is strange that other answers here point to OCR tools when the question clearly asks about handwriting recognition.
Handwriting recognition is an even more difficult area than OCR, and the number of available technologies is very small. I don't think you will find an open source tool for it, but there are a few commercial vendors:
http://www.a2ia.com
http://www.parascript.com/
I don't know whether they have a Java API, but it is best to start your research by contacting them.
You can try the Java OCR Project, though I think you might have to write the output to a text file yourself.
Also, handwriting varies from one individual to another, so I guess you will need to select good training data to get good results.
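Writing the recognized text to a file is the easy part. The following is a minimal Java sketch that assumes the OCR library has already returned the text as a `String`. The helper `saveAsText` and the file name `page1.png` are hypothetical, not part of the Java OCR Project. The text is saved as UTF-8 next to the source image, with the extension changed to `.txt`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SaveRecognizedText {

    /** Hypothetical helper: writes the text to "<image name without extension>.txt" beside the image. */
    static Path saveAsText(Path imagePath, String recognizedText) throws IOException {
        String name = imagePath.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        // resolveSibling returns just the new name if the image path has no parent directory.
        Path target = imagePath.resolveSibling(base + ".txt");
        Files.write(target, recognizedText.getBytes(StandardCharsets.UTF_8));
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder: in practice this string would come from your OCR library.
        String text = "recognized text goes here";
        Path out = saveAsText(Paths.get("page1.png"), text);
        System.out.println("wrote " + out);
    }
}
```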
Have a look at these:
Java OCR
Java OCR is a suite of pure Java libraries for image processing and character recognition. It provides a modular structure for easier deployment.
GOCR
GOCR is an OCR program, developed under the GNU Public License. It converts scanned images of text back to text files.
My MCA final-year project is to extract data from images (JPG, GIF, etc.).
I want to recognize the data in an image.
I have used Java OCR, but it is not working.
Are there any open source libraries which can help me?
Have a look at ZXing, a barcode image-processing library: http://code.google.com/p/zxing/downloads/list
MATLAB has a trainable OCR that has been used to break CAPTCHAs. Unfortunately, the group that broke the CAPTCHAs didn't release their source code. However, here is example code for training MATLAB's OCR.
The MATLAB code can be compiled into a Java package for use in your Java project (this requires MATLAB's compiler tooling).
Here is a Java-based OCR tool. The page claims that the tool can distinguish triangles and other patterns from letters, and they have provided sample images too. The code is open source and downloadable.
Did you try Asprise?
Tesseract is an open source OCR tool, but it's not written in Java. See Tesseract in action.
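Even though Tesseract is not written in Java, a Java program can still drive its command-line tool. The following is a minimal sketch under these assumptions: the `tesseract` executable is installed and on the `PATH`, your version accepts `stdout` as the output base so the text is printed rather than written to a file (older releases may not), and the matching language data (here `eng`) is installed. The file name `scan.png` is hypothetical. As the other answers note, results on handwriting are likely to be much weaker than on printed text.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class TesseractCli {

    /** Builds the command line; "stdout" as the output base makes tesseract print the text. */
    static List<String> command(String imagePath, String language) {
        return Arrays.asList("tesseract", imagePath, "stdout", "-l", language);
    }

    /** Runs the external tesseract program and returns the recognized text. */
    static String recognize(String imagePath, String language) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command(imagePath, language));
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // show tesseract's diagnostics on our stderr
        Process process = pb.start();
        String text;
        try (InputStream in = process.getInputStream()) {
            text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("tesseract exited with code " + exit);
        }
        return text;
    }

    public static void main(String[] args) throws Exception {
        String image = args.length > 0 ? args[0] : "scan.png"; // hypothetical input file
        System.out.println(recognize(image, "eng"));
    }
}
```

Shelling out keeps the Java side free of native bindings; the trade-off is that the tool must be installed separately on every machine that runs the program.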