I was wondering if anybody knew of any good Perl modules and/or Java classes for sentiment analysis. I have read about LingPipe, but the program will eventually be used commercially, so something open-source would be better. I also looked into GATE, but its documentation on sentiment analysis is sparse at best.
Have a look at Rate_Sentiment in the WebService::GoogleHack module at CPAN. There's more information about the project at SourceForge.
I just added a sentiment analysis library to my Social Media Analytics Research Toolkit. The blog post / announcement is here. It's in R, not in Java, but there's a good interface between R and Java in the toolkit, so you can write your "glue code" in Java to call the R library. There's also an R - Python interface in the toolkit.
There's supposed to be an R / Perl interface too, but I haven't been able to contact the maintainer about bugs, so I took it out of the build.
You might want to take a look at the LingPipe (Java) sentiment analysis tutorial at:
http://alias-i.com/lingpipe/demos/tutorial/sentiment/read-me.html
and GATE (http://gate.ac.uk/sentiment/)
For more general-purpose NLP tools, see the Stanford Parser (http://nlp.stanford.edu/software/lex-parser.shtml), NLTK (Python) (http://www.nltk.org/), etc.
I'm not aware of any similar open source tools for Perl, although there are some good basic references out there to get you started, e.g.:
Bilisoly, R. (2008). Practical Text Mining with Perl. Wiley. ISBN 978-0-470-17643-6.
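As a concrete starting point in Java, here is a minimal sketch of the simplest approach to sentiment analysis: scoring text against a polarity lexicon, with naive negation handling. The class name LexiconSentiment, the tiny word list, and the negation rule are hypothetical choices for illustration; this does not use LingPipe, GATE, or any other library mentioned here, and a real system would use a much larger lexicon or a trained classifier.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy lexicon-based sentiment scorer (hypothetical; not any library's API).
class LexiconSentiment {
    // Tiny hand-made lexicon: word -> polarity weight. A real system would load
    // a published lexicon instead of hard-coding a handful of words.
    private static final Map<String, Integer> LEXICON = Map.of(
            "good", 1, "great", 2, "love", 2, "excellent", 2,
            "bad", -1, "awful", -2, "hate", -2, "poor", -1);

    // Words that flip the polarity of the next sentiment word found.
    private static final Set<String> NEGATORS = Set.of("not", "never", "no");

    /** Returns the summed polarity of all lexicon words in the text. */
    static int score(String text) {
        int total = 0;
        boolean negate = false;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            if (token.isEmpty()) continue;
            if (NEGATORS.contains(token)) {
                negate = true;
                continue;
            }
            Integer weight = LEXICON.get(token);
            if (weight != null) {
                total += negate ? -weight : weight;
                negate = false; // negation only applies to one sentiment word
            }
        }
        return total;
    }

    /** Maps the numeric score to a coarse label. */
    static String label(String text) {
        int s = score(text);
        return s > 0 ? "positive" : s < 0 ? "negative" : "neutral";
    }

    public static void main(String[] args) {
        System.out.println(label("I love this phone, the camera is great")); // positive
        System.out.println(label("The battery is not good"));                // negative
    }
}
```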
I'm really interested in parser combinators, especially ones that can deal with left-recursive and ambiguous grammars. I know the fabulous Superpower by Nicholas Blumhardt, but it can't handle these kinds of grammars.
I've found some GLL parser combinator libraries, like this one: https://github.com/djspiewak/gll-combinators, but it's written in Scala, which is a big inconvenience for me (I don't know that language).
Are there any libraries like these for C# (or Java)?
Thank you very much.
I did a compiler project in Java using IntelliJ IDEA with the ANTLR 4 plugin, and there are good resources for it online. The official book, "The Definitive ANTLR 4 Reference", is quite good, and the official documentation is nice too.
ANTLR 4 can handle directly left-recursive rules, and it resolves ambiguities by choosing the first matching alternative. It can generate parsers for several target languages, including C# and Java.
You can also start from their ready-made grammars for many different languages.
Edit:
ANTLR 4 is a tool for Language Recognition, a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees.
It's a parser generator, NOT a parser combinator library.
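Since the question is about left recursion specifically: for the common case of left-recursive binary-operator rules such as expr := expr '-' term | term, many combinator libraries avoid infinite recursion with a chainl1-style combinator (the name comes from Haskell's Parsec), which parses term (op term)* iteratively and folds the results to the left. The following stdlib-only Java sketch shows that idea. CombinatorDemo, Parser, Result, and the other names are hypothetical and not taken from Superpower, gll-combinators, or ANTLR, and the sketch does not handle general ambiguous or indirectly left-recursive grammars, which is what GLL-style libraries are for.

```java
import java.util.Optional;
import java.util.function.BinaryOperator;

// Minimal parser-combinator sketch; all names are hypothetical, not a real library's API.
class CombinatorDemo {
    // A successful parse: the value read plus the input position just after it.
    record Result<T>(T value, int next) {}

    // A parser tries to read a T from the input, starting at position pos.
    interface Parser<T> {
        Optional<Result<T>> parse(String in, int pos);
    }

    // Matches one or more decimal digits.
    static Parser<Integer> number() {
        return (in, pos) -> {
            int end = pos;
            while (end < in.length() && Character.isDigit(in.charAt(end))) end++;
            if (end == pos) return Optional.empty();
            return Optional.of(new Result<>(Integer.parseInt(in.substring(pos, end)), end));
        };
    }

    // Matches a single operator character and yields the function it denotes.
    static Parser<BinaryOperator<Integer>> op(char c, BinaryOperator<Integer> f) {
        return (in, pos) -> {
            if (pos < in.length() && in.charAt(pos) == c) return Optional.of(new Result<>(f, pos + 1));
            return Optional.empty();
        };
    }

    // Ordered choice: try p, and fall back to q only if p fails.
    static <T> Parser<T> or(Parser<T> p, Parser<T> q) {
        return (in, pos) -> {
            Optional<Result<T>> r = p.parse(in, pos);
            return r.isPresent() ? r : q.parse(in, pos);
        };
    }

    // chainl1: parses term (operator term)* in a loop and folds left, giving the
    // left-associative result of  expr := expr op term | term  without the
    // parser ever calling itself at the same input position.
    static <T> Parser<T> chainl1(Parser<T> term, Parser<BinaryOperator<T>> operator) {
        return (in, pos) -> {
            Optional<Result<T>> first = term.parse(in, pos);
            if (first.isEmpty()) return Optional.empty();
            T acc = first.get().value();
            int cur = first.get().next();
            while (true) {
                Optional<Result<BinaryOperator<T>>> o = operator.parse(in, cur);
                if (o.isEmpty()) break;
                Optional<Result<T>> rhs = term.parse(in, o.get().next());
                if (rhs.isEmpty()) break; // a trailing operator is left unconsumed
                acc = o.get().value().apply(acc, rhs.get().value());
                cur = rhs.get().next();
            }
            return Optional.of(new Result<>(acc, cur));
        };
    }

    // expr := expr ('-' | '+') number | number, expressed with chainl1.
    static Parser<Integer> arithmetic() {
        return chainl1(number(), or(op('-', (a, b) -> a - b), op('+', (a, b) -> a + b)));
    }

    public static void main(String[] args) {
        System.out.println(arithmetic().parse("10-3-2", 0).get().value()); // 5, i.e. (10-3)-2
    }
}
```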
I am doing a project and I need to find a machine learning library written in Java that specializes in document classification. Can anyone give me some examples?
Here are two well-known Java libraries:
Stanford Classifier - http://nlp.stanford.edu/software/classifier.shtml
GATE - http://osdir.com/ml/ai.gate.general/2007-05/msg00003.html, https://gate.ac.uk/sale/tao/splitch19.html#chap:ml
It depends on the kind of ML you are looking for.
There is the linguistic part of the problem (parsing documents, extracting entities, etc.), which can significantly improve the results, and there is the ML algorithms part.
For the latter, look at Apache Mahout, for example, especially if you plan to deal with a lot of data; it comes with document classification examples. The Stanford Classifier is also a good choice to start with.
The machine learning frameworks MALLET (http://mallet.cs.umass.edu/classification.php) and Weka (http://www.cs.waikato.ac.nz/ml/weka/) can both do document classification, and both are easier to get started with than, say, Mahout or Spark.
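To give an idea of what these frameworks do at their core, here is a minimal, stdlib-only Java sketch of multinomial Naive Bayes with add-one (Laplace) smoothing, a standard baseline for document classification. The class name, the tokenizer, and the toy training data are hypothetical; this is not the API of MALLET, Weka, Mahout, or the Stanford Classifier, which offer far more than this.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Multinomial Naive Bayes text classifier with add-one (Laplace) smoothing.
// Toy sketch only; not any library's API.
class NaiveBayesDemo {
    private final Map<String, Integer> docsPerClass = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> totalWords = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Lowercases and splits on runs of non-word characters.
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("\\W+");
    }

    // Adds one labelled document to the word and document counts.
    void train(String label, String text) {
        totalDocs++;
        docsPerClass.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String w : tokenize(text)) {
            if (w.isEmpty()) continue;
            counts.merge(w, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(w);
        }
    }

    // Picks the class maximising log P(c) + sum over words of log P(w | c).
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerClass.keySet()) {
            double score = Math.log(docsPerClass.get(label) / (double) totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int denom = totalWords.getOrDefault(label, 0) + vocabulary.size();
            for (String w : tokenize(text)) {
                if (w.isEmpty()) continue;
                score += Math.log((counts.getOrDefault(w, 0) + 1) / (double) denom);
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesDemo nb = new NaiveBayesDemo();
        nb.train("sports", "the team won the match");
        nb.train("sports", "a great goal in the final match");
        nb.train("tech", "the new phone has a fast processor");
        nb.train("tech", "this laptop ships with more memory");
        System.out.println(nb.classify("who won the final")); // sports
    }
}
```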
I'm starting a project in which sentiment analysis is going to take center stage. Specifically, we'll be doing sentiment analysis of Twitter, Facebook, YouTube and other social network data.
I know of OpenNLP from Apache. It looks great, but I think it's a little heavyweight for what I want to do, in addition to its dependence on Hadoop and the like. I haven't used it before, so I may be wrong in my assessment.
I've seen Stanford NLP mentioned elsewhere on this site, but I can't find a good starting point for this library, such as a tutorial.
Also, I've read about sentiment analysis APIs like AlchemyAPI on this site, but I want a solution I'm fully in control of. I just want a library I can bundle with my application.
In a nutshell, I'm looking for a lightweight solution that I can set up on my local PC. A pointer to a good starting point for Stanford NLP or OpenNLP would also be very much appreciated.
UPDATE:
I've gone through the UIMA documentation. Its support for third-party components such as the OpenNLP components, in addition to its built-in text processing capabilities, makes it an attractive starting point, and its open architecture makes me feel it's ideal for what I want to achieve. Additional recommendations or advice would still be very much appreciated.
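The component-based design that makes UIMA attractive can be illustrated in a few lines. Below is a hypothetical, stdlib-only Java sketch of the same idea: independent annotators that each read from and write to a shared document object and run in sequence, so a tokenizer or sentiment component can be swapped without touching the others. Document, Annotator, Tokenizer, and SentimentTagger are made-up names, not UIMA types (UIMA's own shared structure, the CAS, is considerably richer), and the word list is a toy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical pluggable-annotator pipeline, loosely in the spirit of UIMA.
// None of these types are UIMA classes.
class PipelineDemo {
    // Shared document object that each component reads from and adds to.
    static final class Document {
        final String text;
        final Map<String, Object> annotations = new HashMap<>();
        Document(String text) { this.text = text; }
    }

    // Each processing step is an independent, swappable component.
    interface Annotator {
        void process(Document doc);
    }

    // Splits the text into lowercase word tokens and stores them under "tokens".
    static final class Tokenizer implements Annotator {
        public void process(Document doc) {
            List<String> tokens = new ArrayList<>();
            for (String t : doc.text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!t.isEmpty()) tokens.add(t);
            }
            doc.annotations.put("tokens", tokens);
        }
    }

    // Sums polarities from a tiny made-up word list; depends on the Tokenizer's output.
    static final class SentimentTagger implements Annotator {
        private final Map<String, Integer> lexicon =
                Map.of("love", 1, "great", 1, "hate", -1, "broken", -1);

        @SuppressWarnings("unchecked")
        public void process(Document doc) {
            List<String> tokens = (List<String>) doc.annotations.get("tokens");
            int score = 0;
            for (String t : tokens) score += lexicon.getOrDefault(t, 0);
            doc.annotations.put("sentiment", score);
        }
    }

    static List<Annotator> defaultPipeline() {
        return List.of(new Tokenizer(), new SentimentTagger());
    }

    // Runs the components in order over one document.
    static Document run(List<Annotator> pipeline, String text) {
        Document doc = new Document(text);
        for (Annotator a : pipeline) a.process(doc);
        return doc;
    }

    public static void main(String[] args) {
        Document doc = run(defaultPipeline(), "I love the new update, great work!");
        System.out.println(doc.annotations.get("sentiment")); // 2
    }
}
```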
You should also take a look at:
SenticNet
CLIPS
All of these are easy to integrate with Python. You can also use NLTK, a great library for doing NLP.
I am trying to do sentiment analysis for non-English languages like Japanese, Chinese, German, etc. I want to know whether any machine translator is available for translating documents in these languages into English. I am working in Java, so I need to be able to call the API or tool from Java.
I have already used the Google Translate API, so please suggest something other than it.
Sentiment analysis is highly dependent on both the culture and the domain of practice (see http://en.wikipedia.org/wiki/Sentiment_analysis). We are working on SA for scientific texts, and this is undoubtedly still a research area. So I don't think you will find anything off-the-shelf, either for the human language or for the SA.
There are plenty of different machine translation APIs: Google, Microsoft, Yandex, IBM, PROMT, Systran, Baidu, etc.
You may want to look at our recent evaluation study (November 2017): https://www.slideshare.net/KonstantinSavenkov/state-of-the-machine-translation-by-intento-november-2017-81574321
However, it's not clear how MT quality scores are correlated with good sentiment analysis on the results. That's something we are going to explore soon.
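Whichever MT service you pick from the list above, one way to structure the Java side is to hide the provider behind a small interface and run an English sentiment component on its output, which also makes it straightforward to compare providers on your own data. The sketch below assumes hypothetical Translator and SentimentScorer interfaces and uses a stub phrase table rather than any real provider's API; it does nothing about the culture and domain issues raised earlier.

```java
import java.util.Locale;
import java.util.Map;

// Translate-then-analyze sketch. Translator and SentimentScorer are hypothetical
// interfaces; no real MT provider's API is used here.
class TranslateThenScore {
    // Wraps whichever MT service you pick so providers can be swapped or compared.
    interface Translator {
        String toEnglish(String text, String sourceLang);
    }

    // Any English-only sentiment component (lexicon, trained model, ...).
    interface SentimentScorer {
        int score(String englishText);
    }

    private final Translator translator;
    private final SentimentScorer scorer;

    TranslateThenScore(Translator translator, SentimentScorer scorer) {
        this.translator = translator;
        this.scorer = scorer;
    }

    // English input is scored directly; everything else is translated first.
    int analyze(String text, String sourceLang) {
        String english = "en".equals(sourceLang) ? text : translator.toEnglish(text, sourceLang);
        return scorer.score(english);
    }

    // Demo wiring: a stub phrase table stands in for a real MT call.
    static TranslateThenScore demo() {
        Map<String, String> phrases = Map.of(
                "das ist gut", "that is good",
                "das ist schlecht", "that is bad");
        Translator stub = (text, lang) -> phrases.getOrDefault(text.toLowerCase(Locale.ROOT), text);
        SentimentScorer toy = english -> english.contains("good") ? 1 : english.contains("bad") ? -1 : 0;
        return new TranslateThenScore(stub, toy);
    }

    public static void main(String[] args) {
        TranslateThenScore pipeline = demo();
        System.out.println(pipeline.analyze("Das ist gut", "de"));      // 1
        System.out.println(pipeline.analyze("Das ist schlecht", "de")); // -1
    }
}
```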
I'm developing a small programming language based mostly on the C99 standard. I've already written a fairly decent lexer in Java, and now I'm looking to generate a Java parser from the grammar. I know there's Bison, but that seems to only generate C code. I'm looking for an application that will allow me to input my grammar and create a full parser class in Java code. Reading other SO posts on related topics, I've found ANTLR, but I'm wondering if anyone on SO knows about a better tool.
thanks!
Another couple to look at are JavaCC and SableCC (it has been a long time since I looked at SableCC).
I've been quite impressed by BNFC, which is able to generate parsers in Java as well as in C, C++, C#, F#, Haskell, and OCaml.
The JFlex home page at http://jflex.de indicates where to find Bison-like tools that can target Java:
http://byaccj.sourceforge.net/
http://www2.cs.tum.edu/projects/cup/
http://www.antlr.org/
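For comparison, ANTLR and JavaCC generate top-down, recursive-descent parsers, while CUP and BYACC/J generate table-driven LALR parsers in the style of Bison. For a small language, a hand-written recursive-descent parser on top of an existing lexer is also an option. Below is a minimal sketch for a hypothetical C-like subset (int declarations with +, -, *, / and parentheses); the grammar, the regex stand-in lexer, and all names are assumptions, and it evaluates while parsing instead of building an AST.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-written recursive-descent parser for a tiny, hypothetical C-like subset:
//   program := decl*            decl   := "int" NAME "=" expr ";"
//   expr    := term (("+" | "-") term)*
//   term    := factor (("*" | "/") factor)*
//   factor  := NUMBER | NAME | "(" expr ")"
class RecursiveDescentDemo {
    // Stand-in regex lexer: numbers, identifiers, or any single non-space symbol.
    // In practice you would adapt the tokens from your existing lexer instead.
    static List<String> lex(String src) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("\\d+|[A-Za-z_]\\w*|\\S").matcher(src);
        while (m.find()) out.add(m.group());
        out.add("<EOF>"); // cannot collide with a real token: symbols are single chars
        return out;
    }

    private final List<String> tokens;
    private int pos = 0;
    final Map<String, Integer> vars = new HashMap<>(); // declared variables and values

    RecursiveDescentDemo(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return tokens.get(pos); }

    // Consumes the next token if it equals s.
    private boolean accept(String s) {
        if (!peek().equals(s)) return false;
        pos++;
        return true;
    }

    // Consumes s or reports a syntax error.
    private void expect(String s) {
        if (!accept(s)) throw new IllegalStateException("expected '" + s + "', got '" + peek() + "'");
    }

    // Consumes an identifier that is not the keyword "int".
    private String name() {
        String t = peek();
        if (!t.matches("[A-Za-z_]\\w*") || t.equals("int")) {
            throw new IllegalStateException("expected a name, got '" + t + "'");
        }
        pos++;
        return t;
    }

    void program() {
        while (!peek().equals("<EOF>")) decl();
    }

    private void decl() {
        expect("int");
        String n = name();
        expect("=");
        int value = expr();
        expect(";");
        vars.put(n, value);
    }

    // Loops instead of left recursion give left associativity: 10 - 3 - 2 == 5.
    private int expr() {
        int v = term();
        while (true) {
            if (accept("+")) v += term();
            else if (accept("-")) v -= term();
            else return v;
        }
    }

    private int term() {
        int v = factor();
        while (true) {
            if (accept("*")) v *= factor();
            else if (accept("/")) v /= factor();
            else return v;
        }
    }

    private int factor() {
        String t = peek();
        if (t.matches("\\d+")) { pos++; return Integer.parseInt(t); }
        if (accept("(")) { int v = expr(); expect(")"); return v; }
        String n = name();
        Integer v = vars.get(n);
        if (v == null) throw new IllegalStateException("undefined variable " + n);
        return v;
    }

    public static void main(String[] args) {
        RecursiveDescentDemo p = new RecursiveDescentDemo(lex("int a = 2 + 3 * 4; int b = (a - 4) / 2;"));
        p.program();
        System.out.println(p.vars.get("a") + " " + p.vars.get("b")); // 14 5
    }
}
```

Because expr and term loop instead of calling themselves first, the grammar avoids left recursion, which a recursive-descent parser cannot handle directly.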