Machine translation for multilingual sentiment analysis - Java

I am trying to do sentiment analysis for non-English languages like Japanese, Chinese, German, etc. I want to know if any machine translator is available for translating documents in these languages into English. I am working in Java, so I should be able to call the API or tool.
I have already used the Google Translate API, so please suggest anything apart from it.

Sentiment analysis is highly dependent on both the culture and the domain of practice (see http://en.wikipedia.org/wiki/Sentiment_analysis). We are working on SA for scientific texts, and this is undoubtedly still a research area. So I don't think you will find anything off-the-shelf, either for the human-language side or for the SA itself.

There are plenty of different Machine Translation APIs: Google, Microsoft, Yandex, IBM, PROMT, Systran, Baidu, etc.
I may refer you to our recent evaluation study (November 2017): https://www.slideshare.net/KonstantinSavenkov/state-of-the-machine-translation-by-intento-november-2017-81574321
However, it is not clear how MT quality scores correlate with the quality of sentiment analysis on the translated output. That's something we are going to explore soon.
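If it helps, most of these services expose a plain REST endpoint, so calling them from Java is largely the same boilerplate. Below is a minimal sketch using java.net.http (Java 11+); the endpoint URL and JSON field names are hypothetical placeholders, and a real service will also require an API key, so check the vendor's documentation:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class TranslateDemo {
        public static void main(String[] args) throws Exception {
            // Hypothetical endpoint and JSON shape; real vendors (Microsoft,
            // Yandex, IBM, ...) each have their own URL, fields, and auth header.
            String body = "{\"text\":\"Das Produkt ist hervorragend\",\"from\":\"de\",\"to\":\"en\"}";
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://mt.example.com/v1/translate")) // placeholder URL
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            // The translated text would then be fed into the English SA pipeline
            System.out.println(response.body());
        }
    }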

Related

Natural Language Processing (NLP) with Java

I'm starting a project in which sentiment analysis is going to take center stage. Specifically, we'll be doing sentiment analysis of Twitter, Facebook, YouTube and other social network data.
I know of OpenNLP from Apache. It appears great, but I think it's a little heavyweight for what I want to do, in addition to its dependence on Hadoop and the like. I haven't used it before, and I may be wrong in my assessment of it.
I've seen Stanford NLP mentioned elsewhere on this site. I can't seem to find a good starting point for this library, such as a tutorial.
Also, I've read about sentiment analysis APIs like AlchemyAPI on this site, but I want a solution I'm fully in control of. I just want a library I can bundle with my application.
In a nutshell, I'm looking for a solution that is lightweight and that I can set up on my local PC. Also, a pointer to a good starting point for Stanford NLP or OpenNLP would be very much appreciated.
UPDATE:
I've gone through the UIMA documentation, and its support for components such as the OpenNLP components and other third-party components, in addition to its built-in text processing capabilities, makes it an attractive starting point. Its open architecture makes me feel it's ideal for what I want to achieve. Additional recommendations or advice will still be very much appreciated.
You should also take a look at:
SenticNet
CLIPS
All of these are easy to integrate with Python. You can also use NLTK, a great library for doing NLP.
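Since the question asks for a Java starting point, here is a minimal sketch of Stanford CoreNLP's sentiment annotator, assuming the stanford-corenlp jar and its models are on the classpath. It is just one possible entry point, not a full recipe:

    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
    import edu.stanford.nlp.util.CoreMap;

    import java.util.Properties;

    public class SentimentDemo {
        public static void main(String[] args) {
            // tokenize -> split sentences -> parse -> sentiment
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

            Annotation doc = pipeline.process("The tutorial was clear and very helpful.");
            for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
                // One of: Very Negative, Negative, Neutral, Positive, Very Positive
                String label = sentence.get(SentimentCoreAnnotations.SentimentClass.class);
                System.out.println(label + "\t" + sentence);
            }
        }
    }

Everything runs locally, which matches the "fully in control" requirement; the trade-off is that the parsing models make it slower and heavier than a plain lexicon-based approach.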

Grammar description for the Java language

I'm writing a bachelor's thesis on "Analysis of the source code in Java applications". There are a few points the written part must include. One of them is "a brief description of the grammar and syntax of Java." Since this is a bachelor's thesis, sources of information must be verifiable - books, the official Java site, etc. Unfortunately, I cannot find this information on the Java website (maybe I'm just looking carelessly). If possible, it is easier for me to use online resources than books.
Can anyone advise me where to find this information from a verified source? Of course, we were taught the syntax and semantics of Java in certain subjects at school, but that does not seem like an "official source".
Thank you all.
Use the official Java Language Specification from Oracle.
http://docs.oracle.com/javase/specs/jls/se7/html/jls-2.html
I think that the Code Conventions for the Java Programming Language is a good place to start.
For static code analysis in Java, you can find a number of automated tools; see:
Static Code Analysis Tools
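As a concrete illustration of such tooling, here is a minimal sketch using the open-source JavaParser library (https://javaparser.org), which implements the grammar from the Java Language Specification; this is just one library choice among several:

    import com.github.javaparser.StaticJavaParser;
    import com.github.javaparser.ast.CompilationUnit;
    import com.github.javaparser.ast.body.MethodDeclaration;

    public class GrammarDemo {
        public static void main(String[] args) {
            String source = "class Foo { void bar() {} int baz() { return 0; } }";
            // Parse the source into an abstract syntax tree (AST) per the JLS grammar
            CompilationUnit cu = StaticJavaParser.parse(source);
            // Walk the AST and print every method name
            cu.findAll(MethodDeclaration.class)
              .forEach(m -> System.out.println("method: " + m.getNameAsString()));
        }
    }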
Have you tried Google Scholar? Scholar only returns proper scientific results.
For instance, you might like this article.

Algorithms/methods to compile forum discussions into categorized articles or information?

I'm designing and coding a knowledge-based community sharing system (forum, Q&A, article sharing between students, professors and experts) in Java, for the web.
I need to use some data mining/text processing techniques and algorithms to analyse the discussions between experts and students (discussions are categorized using tags) and create proper notes and compilations on specific similar topics.
I'm not an expert on such algorithms or the tools available. It'd be great if anyone could provide me with some pointers or explain how I can proceed with this problem.
Thanks!!
For categorization of articles, you can use the LSA (Latent Semantic Analysis) technique; a bare-bones sketch of the underlying term-vector idea follows the tool list below.
You can check these tools for text processing.
LingPipe: a toolkit for processing text.
Lucene: text mining.
Solr: a powerful text search tool.
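To see the core idea behind grouping similar discussions before reaching for LSA or one of the tools above, here is a self-contained sketch of cosine similarity over raw term frequencies; LSA adds a dimensionality-reduction step (SVD) on top of exactly this kind of term vector:

    import java.util.HashMap;
    import java.util.Map;

    public class SimilarityDemo {

        // Build a term-frequency vector from raw text
        static Map<String, Integer> termFreq(String text) {
            Map<String, Integer> tf = new HashMap<>();
            for (String token : text.toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    tf.merge(token, 1, Integer::sum);
                }
            }
            return tf;
        }

        // Cosine similarity between two term-frequency vectors
        static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
            double dot = 0;
            for (Map.Entry<String, Integer> e : a.entrySet()) {
                dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            }
            double normA = a.values().stream().mapToDouble(v -> v * v).sum();
            double normB = b.values().stream().mapToDouble(v -> v * v).sum();
            return dot / (Math.sqrt(normA) * Math.sqrt(normB));
        }

        public static void main(String[] args) {
            Map<String, Integer> d1 = termFreq("how to tune garbage collection in java");
            Map<String, Integer> d2 = termFreq("java garbage collector tuning tips");
            Map<String, Integer> d3 = termFreq("best pasta recipes from italy");
            System.out.printf("d1 vs d2: %.3f%n", cosine(d1, d2)); // related topics, higher score
            System.out.printf("d1 vs d3: %.3f%n", cosine(d1, d3)); // unrelated topics, near zero
        }
    }

Documents whose similarity exceeds some threshold can then be grouped under the same tag or compiled into one note.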
Start reading up on Text Mining. There is no general answer to your question because it is not precise enough. You must be more precise about your aims; then people can suggest methods for achieving them. Your "analyze" is way too broad: counting the number of words is "analyzing", too!
So: what do you want to recognize, group, or predict?

Java Grammar syntax analyzer (ASCII to graph)

I am developing an assistant for typing database commands, aimed at DBAs: these commands have many parameters, and an assistant will help a lot with their job. For this assistant, I need the grammar of the commands, but database vendors (Oracle, DB2) do not provide that information in any machine-readable format; the only thing available is the documentation.
One example of a DB2 command is: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0001933.html
For this reason, I am trying to analyze the grammar diagrams, or railroad diagrams (http://en.wikipedia.org/wiki/Syntax_diagram), but I have not found anything in Java that could help me. I would like a reverse-engineering tool that takes the ASCII (textual) representation of the grammar and creates a graph in Java. Then, with the graph in Java, the assistant could propose options for the command currently being typed.
One example of the assistant: http://www.youtube.com/watch?v=5sBoUHJupvs
If you have information about how to analyze grammar diagrams with Java (not generate them), I would appreciate it.
The closest tool I've seen is the Grammar Recovery System by Ralf Lammel. It depends on the railroad diagrams being accessible as text strings, which is generally not how they are found; you appear to be lucky in the DB2 case. Ralf's work points in the right direction.
Considering that such diagrams are usually rendered as just a set of pixels (PLSQL's are like this in the PDF files provided as documentation), you have several sets of problems: recognizing graphical entities from pixels, assembling them into actual representations of the railroad diagrams, and then using those representations in your assistant.
I think this is a long, hard, impractical approach. If you got it to work, you'd discover the diagrams are slightly wrong in many places (read Ralf's paper or find out the hard way), and therefore unusable for a tool that is supposed to produce the "right" stuff to help your DBAs.
Of course, you are objecting to the other long, hard, "impractical" approach: reading the documentation, producing grammars that match, and then validating those grammars against the real world. Yes, this is a tough slog too, but it actually does produce useful results. You need to find vendors that have done this and will make it available to you.
ANTLR.org offers a variety of grammars. Have you checked there?
My company offers grammars and tools for processing them. We have done this for PLSQL and SQL2011 but not yet DB2.
Given a grammar, you now need to use it to provide "advice" to your users. Your users aren't going to type in a complete "program"; they want to generate fragments (e.g., SELECT statements). Now you need a parser that will process grammar fragments and at least say "legal" or "not". Most won't do that. Our DMS Software Reengineering Toolkit will do that.
To provide advice, you need to be able to walk the grammar (much as you considered doing for railroad diagrams) to compute "what is legal next". That's actually pretty hard (in fact, it is roughly equivalent to what an LR/GLR parser generator does when building its tables). Our DMS engine does this during syntax-error repair by traversing its GLR parse tables (since that work is already encoded in the tables!). That's not easy to do, as it is a peculiar variant of the GLR parsing algorithm. You might do better with an Earley parser, which keeps around all possible parses as a set of choices; you could simply inspect each one.
But this looks like quite a lot of work, and I think you'll be surprised by the amount of machinery you need.
The best work in this area is Harmonia, which produces incremental editors for code. Our DMS engine's parser is based on earlier work done by this project, because we are interested in the incrementality aspect.
You can try using ANTLR http://www.antlr.org/
It will not be able to understand an ASCII representation of the grammar, but it is powerful enough to do anything else you need, if you don't mind spending the time to learn the software.
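To make the "what is legal next" idea concrete, here is a toy sketch that hand-encodes a tiny fragment of an invented DB2-like command as a graph of token successors and suggests completions. The command, tokens, and structure are made up for illustration; a real assistant would generate this graph from a grammar rather than writing it by hand:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class GrammarGraphDemo {
        // Each known token maps to the tokens that may legally follow it.
        // Hand-encoded fragment of an invented command:
        //   BACKUP DATABASE <name> [ONLINE | OFFLINE]
        static final Map<String, List<String>> NEXT = new HashMap<>();
        static {
            NEXT.put("BACKUP", List.of("DATABASE"));
            NEXT.put("DATABASE", List.of("<name>"));
            NEXT.put("<name>", List.of("ONLINE", "OFFLINE", "<end>"));
        }

        // Given what the DBA has typed so far, propose the legal next tokens.
        static List<String> suggest(String typed) {
            String[] tokens = typed.trim().split("\\s+");
            String last = tokens[tokens.length - 1];
            // Anything that is not a known keyword is treated as the <name> placeholder
            if (!NEXT.containsKey(last)) {
                last = "<name>";
            }
            return NEXT.getOrDefault(last, List.of());
        }

        public static void main(String[] args) {
            System.out.println(suggest("BACKUP"));                // [DATABASE]
            System.out.println(suggest("BACKUP DATABASE"));       // [<name>]
            System.out.println(suggest("BACKUP DATABASE sales")); // [ONLINE, OFFLINE, <end>]
        }
    }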

Inference algorithm of Shenoy and Shafer

I have heard of this algorithm, but is it a good one to use with Bayesian belief networks? Hugin is based on it, and I'm looking for a book or article on this algorithm.
The algorithm is described in this paper. It is quite detailed and should be a good starting point.
I haven't kept track of this research area for a while, but I can point you towards the CiteSeerX search engine if you don't know it already (http://citeseerx.ist.psu.edu/).
Searching for papers that cite Shenoy and Shafer's "An axiomatic framework for Bayesian and belief function propagation" (1990) will give you a list of other researchers who have tried to apply the algorithm.
I am not familiar with the algorithm, but another place to check for information would be a search on Google Scholar.
Pulcinella is a tool for Propagating Uncertainty through Local Computations, based on the general framework of valuation systems proposed by Shenoy and Shafer:

Pulcinella is freely available for educational and strictly non-commercial use. Pulcinella is written in Common Lisp. It has been tested on Allegro CL on Macintosh, and on Lucid CL, Allegro CL, and CLisp on a Sun. The code is just "pure" Common Lisp, so it should also run on any other reasonable implementation of Common Lisp (well, you know...). Alternatively, you can get Pulcinella by anonymous ftp from ftp://aass.oru.se/pub/saffiotti. The Pulcinella tar archive includes a few examples, taken from the User's Manual. If you fetch this program, you are expected to send a letter to the address below, stating that you will use Pulcinella for research and non-commercial use only.
Also, here are some references.
Even more references:
An Algorithm for Bayesian Belief Network Construction from Data
A Tutorial on Learning With Bayesian Networks
http://en.wikipedia.org/wiki/Bayesian_network#External_links
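For intuition about the inference task these algorithms solve, here is a tiny worked example. It is not the Shenoy-Shafer architecture itself, but plain exact inference by enumeration on a two-node network (Rain -> WetGrass) with made-up probabilities; Shenoy-Shafer organizes this same kind of marginalization as message passing over a join tree so that it scales to larger networks:

    public class EnumerationDemo {
        static final double P_RAIN = 0.2;           // prior P(Rain)
        static final double P_WET_GIVEN_RAIN = 0.9; // P(WetGrass | Rain)
        static final double P_WET_GIVEN_DRY  = 0.1; // P(WetGrass | no Rain)

        public static void main(String[] args) {
            // Joint probabilities of the two worlds consistent with WetGrass = true
            double rainAndWet = P_RAIN * P_WET_GIVEN_RAIN;      // 0.18
            double dryAndWet  = (1 - P_RAIN) * P_WET_GIVEN_DRY; // 0.08
            // Bayes' rule: P(Rain | WetGrass) = P(Rain, Wet) / P(Wet)
            double posterior = rainAndWet / (rainAndWet + dryAndWet);
            System.out.printf("P(Rain | WetGrass) = %.3f%n", posterior); // ~0.692
        }
    }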
