I am working with Alfresco 4.2 Community Edition. Now I need some kind of scanning feature to scan hard copies of documents and upload them.
I have googled but haven't found any good solution.
In addition to Alfresco you need so-called capture software, which handles the scanning, conversion to PDF, OCR and filing into Alfresco. There are several solutions available on the market, varying in quality and concept.
Here is a (not complete) list of working solutions I know of, in order of cost:
Quikdrop (Client-Installation): simple .NET-Client with Scan-Client, PDF-Conversion, OCR and limited Metadata-Support
Kofax-Express with Alfresco-Connector from ic-solution (Client-Installation): professional Capture Client supporting barcodes, scan optimizations, guided metadata extraction, validations, delivery to Alfresco supporting document types & metadata
Ephesoft (Server-Installation): web based capture solution available as a community, cloud and commercial version
Abbyy Flexicapture (Server-Installation): Local Capture Clients with a central Capture / Transformation and Extraction Service
Kofax with Alfresco-Kofax-Connector (Server-Installation): Local Capture Clients with a central Capture / Transformation and Extraction Service
The answer to your question is probably not directly related to Alfresco. Alfresco is excellent at managing documents, but it can't help until you get the documents into it.
So first you have to scan the documents with a scanner and really any scanning software out there. Once you do, you upload the documents using something like:
CIFS - you just mount an Alfresco folder on your desktop, like any other network drive, and move the scanned documents into that folder. Usually you'll create an Alfresco rule on that folder to move the documents away, email somebody, start a workflow or anything really.
You can upload the documents using Explorer or Share. It is probably not efficient if you have a lot of documents to upload.
You can use another application to connect to Alfresco using the upload API and send the documents in (see the CMIS sketch after this list).
You email the scanned documents to Alfresco (provided that you have configured an inbound email box in Alfresco).
Use Alfresco's built-in FTP server to upload the documents.
There are more ways to get the documents in; these are, I think, the common ones.
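For the upload-API route, here is a minimal sketch using the Apache Chemistry OpenCMIS client. The endpoint URL, credentials, folder path and file name below are assumptions; adjust them for your repository.

import java.io.ByteArrayInputStream;
import java.util.HashMap;
import java.util.Map;

import org.apache.chemistry.opencmis.client.api.Folder;
import org.apache.chemistry.opencmis.client.api.Session;
import org.apache.chemistry.opencmis.client.api.SessionFactory;
import org.apache.chemistry.opencmis.client.runtime.SessionFactoryImpl;
import org.apache.chemistry.opencmis.commons.PropertyIds;
import org.apache.chemistry.opencmis.commons.SessionParameter;
import org.apache.chemistry.opencmis.commons.data.ContentStream;
import org.apache.chemistry.opencmis.commons.enums.BindingType;
import org.apache.chemistry.opencmis.commons.enums.VersioningState;

public class ScanUploader {
    public static void main(String[] args) throws Exception {
        // Connection parameters - URL, user and password are assumptions for a local Alfresco
        Map<String, String> params = new HashMap<String, String>();
        params.put(SessionParameter.USER, "admin");
        params.put(SessionParameter.PASSWORD, "admin");
        params.put(SessionParameter.ATOMPUB_URL, "http://localhost:8080/alfresco/cmisatom");
        params.put(SessionParameter.BINDING_TYPE, BindingType.ATOMPUB.value());

        SessionFactory factory = SessionFactoryImpl.newInstance();
        Session session = factory.getRepositories(params).get(0).createSession();

        // Target folder in the repository (hypothetical path)
        Folder scans = (Folder) session.getObjectByPath("/Scanned Documents");

        // Content of the scanned PDF - in real code this comes from the scanning software
        byte[] pdfBytes = "...pdf bytes...".getBytes();
        ContentStream content = session.getObjectFactory().createContentStream(
                "invoice-0001.pdf", pdfBytes.length, "application/pdf",
                new ByteArrayInputStream(pdfBytes));

        Map<String, Object> props = new HashMap<String, Object>();
        props.put(PropertyIds.OBJECT_TYPE_ID, "cmis:document");
        props.put(PropertyIds.NAME, "invoice-0001.pdf");

        scans.createDocument(props, content, VersioningState.MAJOR);
    }
}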
You can use ChronoScan (http://www.chronoscan.org). It has a CMIS module to scan/OCR and send documents directly to Alfresco, SharePoint, etc. as text PDF or other formats.
The software is free for non-commercial use (with a nag screen), and is very similar to solutions ten times its price (Kofax Express, etc.).
In addition to zladuric's answer, I would like to add that there is software like Ephesoft and Kofax that, for example, can aid in the extraction of metadata from the scanned documents.
Related
I am working on a web application based on Java/JSP and I need to include a web based file explorer that connects to an Amazon S3 bucket that can handle the basic file explorer tasks such as upload, delete, rename and navigation. Any suggestions would be appreciated.
Have you tried simply creating a wrapper around s3cmd? That would be the ultra-lazy way to do it, since all the protocol handling is done for you and all you have to do is feed it input and parse its output.
S3cmd is extremely well documented and pretty simple to use.
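A minimal sketch of such a wrapper in Java, assuming s3cmd is installed and configured on the server; the class and method names are made up for illustration, and the caller still has to parse the returned output lines.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

/** Thin wrapper that shells out to the s3cmd command line tool. */
public class S3CmdWrapper {

    /** Lists keys under the given bucket/prefix via "s3cmd ls". */
    public List<String> list(String bucketAndPrefix) throws IOException, InterruptedException {
        return run("s3cmd", "ls", "s3://" + bucketAndPrefix);
    }

    /** Uploads a local file via "s3cmd put". */
    public List<String> upload(String localPath, String bucketAndKey) throws IOException, InterruptedException {
        return run("s3cmd", "put", localPath, "s3://" + bucketAndKey);
    }

    /** Deletes an object via "s3cmd del". */
    public List<String> delete(String bucketAndKey) throws IOException, InterruptedException {
        return run("s3cmd", "del", "s3://" + bucketAndKey);
    }

    private List<String> run(String... command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).redirectErrorStream(true).start();
        List<String> output = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream()));
        String line;
        while ((line = reader.readLine()) != null) {
            output.add(line); // raw s3cmd output - parse as needed
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("s3cmd exited with code " + exitCode + ": " + output);
        }
        return output;
    }
}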
I want to write a client application for a site (e.g. to read something from the site, add some comments, likes etc.). I don't have access to the site's sources and there isn't any API for working with it. So in my Android application I decided to parse this site (it has static pages) using the jsoup library.
Using this library I'm going to write an unofficial API for my purposes to work with this site, and then use it in my Android application.
Can somebody tell me whether this is good practice, or are there better ways to do it? Is it a good idea at all to parse a site on an Android device?
As I wrote in a comment - in general, building your own application on top of a third-party web service is not a good idea. If you want to do it anyway, you have 2 options:
Use jsoup (or any other HTML parser, if one exists) and parse the third-party content on the device (see the sketch after this list)
Set up some middleware server to parse content and serve it in some more convenient way.
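For the first option, here is a minimal jsoup sketch. The URL and CSS selectors are hypothetical - you have to inspect the real site's HTML to find the right ones, and they will break whenever the site changes its markup.

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SiteScraper {
    public static void main(String[] args) throws IOException {
        // Fetch and parse a static page (URL is made up for illustration)
        Document doc = Jsoup.connect("http://example.com/posts")
                .userAgent("Mozilla/5.0")
                .timeout(10000)
                .get();

        // Selectors depend entirely on the site's markup
        Elements posts = doc.select("div.post");
        for (Element post : posts) {
            String title = post.select("h2.title").text();
            String likes = post.select("span.like-count").text();
            System.out.println(title + " (" + likes + " likes)");
        }
    }
}

On Android, remember that the fetch must run off the main thread (e.g. in an AsyncTask or a background thread).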
The second option has a few advantages - you can fix the application without forcing users to update it, and you'll probably save a bit of the device's bandwidth. Of course, the disadvantage is that you have to pay for the server.
The general problem with applications like that is that every single change to the layout, skin or server configuration can cause your application to stop working, and parsing HTML needs much more work than just connecting to an existing API.
Moreover - publishing your application can cause some legal issues (copyright) and is against Google Play's policy:
Do not post an app where the primary functionality is to: Drive affiliate traffic to a website or Provide a webview of a website not owned or administered by you (unless you have permission from the website owner/administrator to do so)
I've searched and searched, coming across questions that address parts of the problem, but nothing comprehensive. I'm using GWT and Eclipse to develop a website that uses Highcharts to make some fancy plots.
The idea is that the user will be able to select one of their local data files of type csv and upon selection of the file, the plot will be rendered using their data and our fancy algorithms.
We don't want to send enormous amounts of data to the server, as this will become costly and time-consuming for the user. Is there a way to process, or at least pre-process, the user's data using Java code in a GWT/Eclipse project?
Any help is greatly appreciated!
This is a duplicate of GWT Toolkit: preprocessing files on client side
One of the answers points to these links:
http://code.google.com/p/gwt-nes-port/wiki/FileAPI - GWT wrapper for HTML5 File API
http://www.html5rocks.com/en/tutorials/file/dndfiles/ - HTML5 FileAPI
But, alas, the FileAPI is pretty new: http://caniuse.com/fileapi
The other alternative you have, to avoid the server, is a text area to paste the CSV data into, which you then read using GWT. This is a common trick, and I think you can even copy+paste from certain spreadsheet programs this way.
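A rough sketch of the text-area trick in GWT follows; the widget layout and the assumption that every CSV cell is numeric are mine, and the parsed rows would then be handed to your Highcharts wrapper.

import java.util.ArrayList;
import java.util.List;

import com.google.gwt.core.client.EntryPoint;
import com.google.gwt.event.dom.client.ClickEvent;
import com.google.gwt.event.dom.client.ClickHandler;
import com.google.gwt.user.client.ui.Button;
import com.google.gwt.user.client.ui.RootPanel;
import com.google.gwt.user.client.ui.TextArea;

public class CsvPasteEntryPoint implements EntryPoint {

    @Override
    public void onModuleLoad() {
        final TextArea csvBox = new TextArea();
        csvBox.setVisibleLines(15);
        Button plotButton = new Button("Plot");

        plotButton.addClickHandler(new ClickHandler() {
            @Override
            public void onClick(ClickEvent event) {
                // Parse the pasted CSV entirely on the client - nothing is sent to the server
                List<double[]> data = new ArrayList<double[]>();
                for (String row : csvBox.getText().split("\n")) {
                    if (row.trim().isEmpty()) {
                        continue; // skip blank lines
                    }
                    String[] cells = row.split(",");
                    double[] values = new double[cells.length];
                    for (int j = 0; j < cells.length; j++) {
                        values[j] = Double.parseDouble(cells[j].trim());
                    }
                    data.add(values);
                }
                // Hand "data" to the plotting code here (e.g. a Highcharts wrapper)
            }
        });

        RootPanel.get().add(csvBox);
        RootPanel.get().add(plotButton);
    }
}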
You cannot do it in a universal way in GWT in all browsers currently. GWT translates to JavaScript, which does not have the required privileges to process files on the client side.
For a more detailed answer you can refer to: How to retrieve file from GWT FileUpload component?
I am working on a product in which we need a feature to crawl a user-given URL and publish a separate mobile site for him. In the crawling process we want to crawl the site content, CSS, images and scripts. The product also does other activities, like scheduling some marketing activities and so on. What I want to ask is:
What is the best practice and open-source framework for this task?
Should we do it in the application itself, or should there be another server for this activity (if it generates load)? Keep in mind that we have around 1 lakh (100,000) users visiting every month and publishing their mobile sites from the website, and around 1-2k concurrent users.
The application is built in Java on the Java EE platform, using Spring and Hibernate as server-side technologies.
We used Berkeley DB Java Edition for managing an off-heap queue of links to crawl and for distinguishing between links pending download and ones already downloaded.
For parsing HTML, TagSoup is the best choice for the wild internet.
Batik is the choice for parsing CSS and SVG.
PDFBox is awesome and allows you to extract links from PDFs (see the sketch after this list).
The Quartz scheduler is an industry-proven choice for event scheduling.
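As an illustration of the PDFBox point above, here is a small sketch that pulls URI links out of a PDF's link annotations. It is written against the PDFBox 2.x API, and the file name is made up.

import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.interactive.action.PDActionURI;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationLink;

public class PdfLinkExtractor {
    public static void main(String[] args) throws IOException {
        // File name is hypothetical
        try (PDDocument document = PDDocument.load(new File("crawled.pdf"))) {
            for (PDPage page : document.getPages()) {
                // Link annotations carry the clickable URIs of a PDF
                for (PDAnnotation annotation : page.getAnnotations()) {
                    if (annotation instanceof PDAnnotationLink) {
                        PDAnnotationLink link = (PDAnnotationLink) annotation;
                        if (link.getAction() instanceof PDActionURI) {
                            System.out.println(((PDActionURI) link.getAction()).getURI());
                        }
                    }
                }
            }
        }
    }
}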
And yes, you will need one or more servers for crawling, one server for aggregating results and scheduling tasks, and perhaps another server for the web front end and back end.
This worked well for http://linktiger.com and http://pagefreezer.com
I'm implementing a crawling project based on the Selenium HtmlUnit driver. I think it's really the best Java framework for automating a headless browser.
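A minimal sketch of that setup; the URL is a placeholder, and depending on your Selenium version HtmlUnitDriver may ship in a separate artifact from selenium-java.

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.htmlunit.HtmlUnitDriver;

public class HeadlessCrawl {
    public static void main(String[] args) {
        // "true" enables JavaScript support in the headless browser
        WebDriver driver = new HtmlUnitDriver(true);
        try {
            driver.get("http://example.com"); // placeholder URL
            System.out.println("Title: " + driver.getTitle());

            // Collect the links on the page for further crawling
            for (WebElement anchor : driver.findElements(By.tagName("a"))) {
                System.out.println(anchor.getAttribute("href"));
            }
        } finally {
            driver.quit();
        }
    }
}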
I would like to use a web crawler to crawl a particular website. The website is a learning management system where many students upload their assignments, project presentations and so on. My question is: can I use a web crawler to download the files that have been uploaded to the learning management system? After I download them I would like to create an index on them so as to query the set of documents. Users could then use my application as a search engine. Can a crawler do this? I know about WebEater (a crawler written in Java).
Download the files in Java in a single thread.
Parse the files (you can get ideas from the parse plugins of Nutch).
Create an index with Lucene (see the sketch after this list).
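For step 3, here is a minimal Lucene indexing sketch. It is written against a recent Lucene API (5.x and later), and the index directory, field names and sample values are assumptions.

import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class DocumentIndexer {
    public static void main(String[] args) throws IOException {
        // Index location is an assumption
        Directory indexDir = FSDirectory.open(Paths.get("lucene-index"));
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());

        try (IndexWriter writer = new IndexWriter(indexDir, config)) {
            // One Lucene Document per downloaded/parsed file
            Document doc = new Document();
            doc.add(new StringField("path", "assignments/report1.pdf", Field.Store.YES));
            doc.add(new TextField("content", "extracted text of the file goes here", Field.Store.NO));
            writer.addDocument(doc);
        }
        // Later, an IndexSearcher over the same directory answers the users' queries
    }
}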
If you want to use a real web crawler, use http://www.httrack.com/
It offers many options for copying websites or content on web pages, including Flash. It works on Windows and Mac.
Then you can do steps 2 and 3 as suggested above.