Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
I am trying to use a named entity recognizer (NER) to extract product names from a given text.
For example:
Input text: "Google makes google fit"
Expected output: Google Fit (Product)
Is there a tool already available for this?
(I tested AlchemyAPI, but it is not suitable for extracting product names.)
If no such tool exists, how can I train my own model to do this?
The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.
It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services. OpenNLP also includes maximum entropy and perceptron based machine learning.
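To train your own product model with OpenNLP, the name finder expects training text with one tokenized sentence per line and each entity wrapped in `<START:type> ... <END>` tags. A minimal sketch, assuming your annotations are stored as token ranges, that writes sentences in that format; the `Annotation` record and `toOpenNlpLine` method are hypothetical helpers, not part of OpenNLP:

```java
import java.util.List;

class ProductTrainingData {
    /** Hypothetical annotation: tokens [start, end) form one entity of the given type. */
    record Annotation(int start, int end, String type) {}

    /**
     * Renders one tokenized sentence in OpenNLP's name finder training format:
     * space-separated tokens, each entity wrapped in "<START:type> ... <END>".
     * Annotations must be sorted by start, non-empty and non-overlapping.
     */
    static String toOpenNlpLine(String[] tokens, List<Annotation> annotations) {
        StringBuilder line = new StringBuilder();
        int next = 0; // index of the next annotation to open or close
        for (int i = 0; i < tokens.length; i++) {
            if (next < annotations.size() && annotations.get(next).start() == i) {
                line.append("<START:").append(annotations.get(next).type()).append("> ");
            }
            line.append(tokens[i]).append(' ');
            if (next < annotations.size() && annotations.get(next).end() == i + 1) {
                line.append("<END> ");
                next++;
            }
        }
        return line.toString().trim();
    }

    public static void main(String[] args) {
        String[] tokens = {"Google", "makes", "google", "fit"};
        List<Annotation> entities = List.of(new Annotation(2, 4, "product"));
        // prints: Google makes <START:product> google fit <END>
        System.out.println(toOpenNlpLine(tokens, entities));
    }
}
```

A file of such lines can then be given to OpenNLP's name finder trainer (for example the `TokenNameFinderTrainer` command-line tool). Expect to need a substantial number of annotated sentences before the model generalizes to product names it has not seen.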
Closed 6 years ago.
In one of my NLP assignments I have to read PDF files and extract information from them. Using Java, I can read the textual content of a PDF and apply our NLP algorithms to it, but I also need to extract the information in the PDF's tables. I have tried to read them, but I cannot get them into a proper format. How can I read tables from a PDF document? Is there a library in OpenNLP, GATE, or Stanford NLP that can do this?
Unfortunately, PDFs do not store tables as structures. You have to apply some serious coordinate math to figure out (or estimate) where a table is, where its columns are and where its rows are.
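As an illustration of the kind of coordinate math involved, here is a minimal sketch that groups positioned text fragments into table rows by their vertical position. It assumes you have already extracted each fragment's text and coordinates with some PDF library; the `TextChunk` record and the `yTolerance` parameter are hypothetical, and detecting column boundaries (for example from gaps between x positions) is left out:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TableRowGrouper {
    /** Hypothetical input: a text fragment and the position of its top-left corner. */
    record TextChunk(double x, double y, String text) {}

    /**
     * Groups chunks into rows: a chunk joins the current row if its y coordinate is within
     * yTolerance of the row's first chunk; otherwise it starts a new row. Cells in each row
     * are ordered left to right.
     */
    static List<List<String>> groupIntoRows(List<TextChunk> chunks, double yTolerance) {
        List<TextChunk> sorted = new ArrayList<>(chunks);
        sorted.sort(Comparator.comparingDouble(TextChunk::y));

        List<List<String>> rows = new ArrayList<>();
        List<TextChunk> current = new ArrayList<>();
        double rowY = 0;
        for (TextChunk chunk : sorted) {
            if (!current.isEmpty() && Math.abs(chunk.y() - rowY) > yTolerance) {
                rows.add(cellsLeftToRight(current));
                current = new ArrayList<>();
            }
            if (current.isEmpty()) {
                rowY = chunk.y(); // the first chunk of a row is its reference y position
            }
            current.add(chunk);
        }
        if (!current.isEmpty()) {
            rows.add(cellsLeftToRight(current));
        }
        return rows;
    }

    private static List<String> cellsLeftToRight(List<TextChunk> row) {
        List<TextChunk> byX = new ArrayList<>(row);
        byX.sort(Comparator.comparingDouble(TextChunk::x));
        List<String> cells = new ArrayList<>();
        for (TextChunk chunk : byX) {
            cells.add(chunk.text());
        }
        return cells;
    }

    public static void main(String[] args) {
        List<TextChunk> chunks = List.of(
                new TextChunk(10, 100, "Name"), new TextChunk(80, 100.5, "Qty"),
                new TextChunk(10, 120, "Apple"), new TextChunk(80, 119.8, "3"));
        System.out.println(groupIntoRows(chunks, 2.0)); // [[Name, Qty], [Apple, 3]]
    }
}
```

Real tables need many more heuristics (merged cells, multi-line cells, ruled borders), which is why a dedicated table-extraction tool is usually the better choice.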
For PDFs, Apache Tika doesn't have any special table handling (it does for MS Word, MS PowerPoint and many other formats, but not for PDFs).
To extract tables as tables from PDFs, you might consider tabulapdf; see also John Hewson's recommendation. There are also commercial tools that likely do a decent job of extracting tables from PDFs: ABBYY FineReader and Nuance's PDF products.
Closed 8 years ago.
Within my organization, we maintain a SharePoint site that stores a large number of files related to previous and ongoing projects. These files can be Word, PDF and PPT files. We are interested in building a solution with the following functionality:
1) Advanced search: return the set of files that match the keywords entered by users. Ideally, the parts of each returned file that relate directly to the search keywords would also be marked with a label (for example, a highlight color).
2) Analysis: enable users to perform some kinds of analysis on the SharePoint site, such as social network analysis of the people who authored the SharePoint files.
Is there any commercial software or open source library that can handle these tasks?
This response is assuming you are using SharePoint 2010 or 2013.
Consider using faceted search. If you have an Enterprise CAL, you can easily set this up. The trick is making sure the metadata for the facets is available. This would give you the search behavior you're looking for, but not the interaction and tagging.
For those, it would be best to create a custom solution that leverages term sets in managed metadata. SharePoint 2010 has conditional formatting that you could use for color coding; however, it is deprecated in 2013.
I hope these directions help, but ultimately you will likely need a combination of custom code and event handlers.
Closed 4 years ago.
I am looking for a good library or some project that has been done in the area of SMS text normalization. I have found some good research projects like this one.
I am using Java as the programming language.
The concept in a nutshell is to take SMS-style text like "tel him 2 go home nw" and convert it to normal English text: "tell him to go home now".
Why not just download a dictionary from a site like this one: http://smsdictionary.co.uk/abbreviations and use string replacement?
Dictionary substitution does not cut it, since it misses context in translations. e.g. do you translate '2' to 'to', 'too' or 'two'?
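To make the ambiguity concrete: with a plain lookup table, "2" always becomes the same word. A toy sketch of using context instead picks, for each token, the dictionary candidate that most often follows the previous output word. The `CANDIDATES` dictionary and `BIGRAM_COUNTS` numbers below are hypothetical stand-ins for real data, and a trained statistical model uses far more context than a single preceding word:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SmsNormalizer {
    // Hypothetical abbreviation dictionary: SMS token -> candidate expansions.
    static final Map<String, List<String>> CANDIDATES = Map.of(
            "2", List.of("to", "too", "two"),
            "tel", List.of("tell"),
            "nw", List.of("now"));

    // Hypothetical counts of "previous-word candidate" pairs, standing in for
    // statistics gathered from a corpus of normal English text.
    static final Map<String, Integer> BIGRAM_COUNTS = Map.of(
            "him to", 5,
            "have two", 4,
            "me too", 3);

    /** Replaces each token with the candidate that most often follows the previous output word. */
    static String normalize(String sms) {
        List<String> out = new ArrayList<>();
        for (String token : sms.trim().toLowerCase().split("\\s+")) {
            // Tokens not in the dictionary are kept unchanged.
            List<String> options = CANDIDATES.getOrDefault(token, List.of(token));
            String prev = out.isEmpty() ? "<s>" : out.get(out.size() - 1);
            String best = options.get(0);
            int bestCount = -1;
            for (String option : options) {
                int count = BIGRAM_COUNTS.getOrDefault(prev + " " + option, 0);
                if (count > bestCount) { // strict '>' means ties keep the earlier candidate
                    best = option;
                    bestCount = count;
                }
            }
            out.add(best);
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(normalize("tel him 2 go home nw")); // tell him to go home now
        System.out.println(normalize("i have 2 cats"));        // i have two cats
    }
}
```

Even this toy shows the limitation: with only one word of left context, "i have 2 go" would also come out as "i have two go", which is why training a proper model on a corpus is the better route.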
You can get a corpus and train a statistical model yourself, either using Moses (http://www.statmt.org/moses/) or Phrasal (http://nlp.stanford.edu/software/phrasal/).
As an author of the Stanford one (http://www-nlp.stanford.edu/sms/translate.php), I could be convinced to offer a REST-based API for such a service, but I don't know how much demand there is for it...
Closed 1 year ago.
I am building a search engine in Java. It should search 80 documents for the word entered in a text box, then display the number of matching documents and the number of times the word appears in each document.
To start, I imported all the .txt files and created a Search class.
I need to build an index of every word in the 80 documents so that I can compare the entered word with the indexed words and return the results.
Any suggestions for getting started would be appreciated!
Regards,
Humam.
Any suggestions for a start would be grateful!
Absolutely - Lucene:
Apache Lucene(TM) is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
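Whether you use Lucene or roll your own, the core data structure is an inverted index: a map from each word to the documents that contain it, with a per-document count. A minimal in-memory sketch using only the standard library; the class and method names are illustrative, and the tokenization (lowercasing and splitting on non-alphanumeric characters) is a simplifying assumption:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class InvertedIndex {
    // word -> (document name -> occurrences of the word in that document)
    private final Map<String, Map<String, Integer>> postings = new HashMap<>();

    /** Lowercases the text, splits it on runs of non-letter/non-digit characters and counts each word. */
    void addDocument(String docName, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty()) {
                continue; // split yields a leading empty string if the text starts with a delimiter
            }
            postings.computeIfAbsent(token, k -> new TreeMap<>()).merge(docName, 1, Integer::sum);
        }
    }

    /** Returns document name -> count for the word; empty if no document contains it. */
    Map<String, Integer> search(String word) {
        Map<String, Integer> hits = postings.get(word.toLowerCase(Locale.ROOT));
        return hits == null ? Map.of() : Collections.unmodifiableMap(hits);
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.addDocument("a.txt", "The cat sat on the mat.");
        index.addDocument("b.txt", "A dog, not a cat.");
        Map<String, Integer> hits = index.search("cat");
        // prints: 2 matching documents: {a.txt=1, b.txt=1}
        System.out.println(hits.size() + " matching documents: " + hits);
    }
}
```

The number of matching documents is `search(word).size()`, and each entry gives the count for one document. Lucene builds the same kind of structure, but with configurable analyzers, relevance scoring and on-disk storage.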
Take a look at SQLite's full-text search (FTS) capabilities. They should do pretty much what you want.
Closed 7 years ago.
Can someone suggest a good open source Java library for reading ECG data in MFER, HL7, or other formats?
There are a number of options for a Java library that parses HL7. For example, you could use the HAPI library, available on SourceForge at http://hl7api.sourceforge.net/. There is also a .NET version of that library at http://nhapi.sourceforge.net/home.php, for those who prefer that platform. Another Java-based option is HL7Comm at http://nule.org/wp/?page_id=63.
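For a sense of what these libraries parse: an HL7 v2 message is a sequence of segments separated by carriage returns, with fields separated by `|` by default, and observation values carried in OBX segments. A rough sketch, assuming that default encoding and ignoring escape sequences, repetitions and components, that pulls OBX-5 (the observation value) out of each OBX segment. The class name and the sample message are made up for illustration; for real messages, use a library such as HAPI:

```java
import java.util.ArrayList;
import java.util.List;

class Hl7v2Sketch {
    /**
     * Returns OBX-5 (the observation value) of every OBX segment in an HL7 v2 message.
     * Assumes the default '|' field separator, treats CR and/or LF as segment terminators,
     * and does not handle escape sequences, repetitions or components.
     */
    static List<String> obxValues(String message) {
        List<String> values = new ArrayList<>();
        for (String segment : message.split("[\\r\\n]+")) {
            String[] fields = segment.split("\\|", -1); // -1 keeps empty trailing fields
            if (fields[0].equals("OBX") && fields.length > 5) {
                values.add(fields[5]); // fields[0] is the segment name, so fields[n] is OBX-n
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String message = "MSH|^~\\&|ECGAPP|HOSP|||20240101||ORU^R01|1|P|2.3\r"
                + "OBX|1|NM|HR^Heart rate||72|/min\r"
                + "OBX|2|ST|RHYTHM^Rhythm||Sinus rhythm";
        System.out.println(obxValues(message)); // [72, Sinus rhythm]
    }
}
```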
More options can be found on the Wikipedia page at http://en.wikipedia.org/wiki/Health_Level_7#Open_source_tools, and at http://www.hl7.org.au/HL7-Tools.htm.
For ECG processing in general, see the OpenECG portal at http://www.openecg.net/.
One of the standard ECG analysis software libraries is WFDB, from PhysioNet:
http://www.physionet.org/physiotools/wag/wag.htm
Two American National Standards, ANSI/AAMI EC38:1998 (Ambulatory Electrocardiographs) and ANSI/AAMI EC57:1998 (Testing and Reporting Performance Results of Cardiac Rhythm and ST Segment Measurement Algorithms) require the use of several of the WFDB applications for evaluation of certain devices and algorithms.
Wrappers have been written so that you can use the code from Java:
http://www.physionet.org/physiotools/wfdb-swig.shtml