Natural Language Processing (NLP) with Java

I'm starting a project in which sentiment analysis is going to take center stage. Specifically, we'll be doing sentiment analysis of Twitter, Facebook, YouTube and other social network data.
I know of OpenNLP from Apache. It looks great, but I think it's a little heavyweight for what I want to do, in addition to its dependence on Hadoop and the like. I haven't used it before, though, so I may be wrong in my assessment.
I've seen Stanford NLP mentioned elsewhere on this site, but I can't seem to find a good starting point for the library, such as a tutorial.
Also, I've read about Sentiment Analysis APIs like AlchemyAPI on this site, but I want a solution I'm fully in control of; I just want a library I can bundle with my application.
In a nutshell, I'm looking for a solution that is lightweight and that I can set up on my local PC. A pointer to a good starting point for Stanford NLP or OpenNLP would also be much appreciated.
UPDATE:
I've gone through the UIMA documentation, and its support for third-party components such as OpenNLP, in addition to its built-in text processing capabilities, makes it an attractive starting point. Its open architecture makes me feel it's ideal for what I want to achieve. Additional recommendations or advice will still be much appreciated.

You should also take a look at:
SenticNet
CLIPS
All of these are easy to integrate with Python. You can also use NLTK, a great library for doing NLP.
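Since the question also asks for a starting point with Stanford NLP: a minimal sketch of its sentiment pipeline, assuming a recent Stanford CoreNLP release (3.5 or later) with the models jar on the classpath, looks roughly like this:

import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

public class SentimentSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // The sentiment annotator builds on the parser, hence this pipeline
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        Annotation doc = pipeline.process("I love this phone, but the battery life is terrible.");
        for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
            // Yields a label such as "Positive", "Negative" or "Neutral" per sentence
            String label = sentence.get(SentimentCoreAnnotations.SentimentClass.class);
            System.out.println(label + "\t" + sentence);
        }
    }
}

This runs entirely locally and can be bundled with the application, which matches the "library I can bundle" requirement.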

Related

What does Twitter's architecture diagram look like?

I'm building a service somewhat like Twitter and am in the process of creating the design.
I've looked at Twitter's open source projects on GitHub, and at some alternative open source projects, as examples for parts of the design, so I have a general idea of what is needed to accomplish my task.
However, I failed to find Twitter's actual architecture diagram, or an article containing an overview of it.
What does Twitter's architecture diagram look like?
Thanks
It would be difficult to find a single, grand architecture diagram explaining all Twitter services. However, you can find high-level overviews and articles dealing with specific parts of it.
An overview from a Twitter engineering lead is here.
You can follow the Twitter engineering blog for updates.
This video is about their move from Ruby on Rails to the JVM.
A list of Twitter scale numbers.
An article on tweaking Twitter's architecture.
This one deals with their storage backend.
Here is the updated architecture and details about some of the design decisions. It's interesting to note that Twitter uses a write-intensive architecture to make reads perform better (O(1), in fact). The High Scalability article points to the original InfoQ presentation.
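To make the write-intensive idea concrete, here is a toy sketch (not Twitter's actual code) of fan-out-on-write: each tweet is pushed to every follower's precomputed timeline at posting time, so reading a timeline becomes a single lookup.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FanOutOnWrite {
    private final Map<String, List<String>> followers = new HashMap<>();
    private final Map<String, Deque<String>> timelines = new HashMap<>();

    void follow(String follower, String followee) {
        followers.computeIfAbsent(followee, k -> new ArrayList<>()).add(follower);
    }

    void tweet(String author, String text) {
        // The expensive part: one write per follower at posting time...
        for (String f : followers.getOrDefault(author, Collections.emptyList())) {
            timelines.computeIfAbsent(f, k -> new ArrayDeque<>()).addFirst(author + ": " + text);
        }
    }

    Deque<String> timeline(String user) {
        // ...so the read side is a single map lookup, independent of tweet volume.
        return timelines.getOrDefault(user, new ArrayDeque<>());
    }
}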
Here is a great overview as of 19 Jan 2017 in case anyone else happens upon this thread. It covers lessons learned and the evolution of their various data solutions over time.
Infrastructure Behind Twitter

Using Mahout in Java Application

I want to write a Java application (for university) that uses Latent Dirichlet Allocation (LDA). The only framework I found that offers LDA was Mahout.
I have quite some experience in Java programming, even though I would not consider myself a Java pro (I am coming from PHP).
The application will not be used in a distributed computing context, so the Mahout/Hadoop way might be over the top, but if I am right it should at least work.
My Problem:
The Mahout wiki etc. does not really help me; in fact, I do not understand a single word. I don't want to use Mahout in that "terminal way"; I just want to load the classes into my application and do something like this:
documents = obj.load(Documents);
mahout.doLDA(documents);
(I know it will not be that easy, but I am sure you know what I mean.)
Thanks
Mahout's libraries can be used in local mode, without a full Hadoop cluster. You can look at the examples from the "Mahout in Action" book to see how this can be done.
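For instance, here is a minimal sketch of driving Mahout's LDA (the CVB implementation in Mahout 0.6+) from plain Java. With no cluster configuration on the classpath, Hadoop falls back to its in-process LocalJobRunner, so everything runs on one machine. The option names below are assumptions; run the driver without arguments to print its real option list.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.clustering.lda.cvb.CVB0Driver;

public class LocalLda {
    public static void main(String[] args) throws Exception {
        // CVB0Driver implements Hadoop's Tool interface, so ToolRunner can
        // invoke it programmatically, just as the command line would.
        ToolRunner.run(new Configuration(), new CVB0Driver(), new String[] {
            "-i", "/tmp/vectors",    // term-frequency vectors (e.g. from seq2sparse)
            "-o", "/tmp/lda-topics", // where the topic model is written
            "-k", "20"               // number of topics (flag names are assumptions)
        });
    }
}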

library for text classification in java

I have a set of categorized text files. I want to categorize another large set of text files to use in my research. Is there a good way to compare them?
I think SVM-based methods are useful, but is there a simple, well-documented library for using such algorithms?
I don't know much about SVM, but LingPipe might be really helpful for you. The link is a tutorial specifically about categorization of documents (automatic or guided).
Also, look into the inter-related search products Lucene (a search library), Solr (search server app), and Carrot2 (for 'clustering' search results). There should be some interesting work in that space for you.
Mallet is another awesome library to look into. It has good command-line tools to help you get started, and a Java API for when you start integrating it with the rest of your system.
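To give a feel for Mallet's Java API, here is a minimal sketch of training and applying a naive Bayes text classifier; the document strings and category names are made up for illustration, and real code would read your labeled files instead.

import java.util.ArrayList;
import cc.mallet.classify.Classifier;
import cc.mallet.classify.NaiveBayesTrainer;
import cc.mallet.pipe.CharSequence2TokenSequence;
import cc.mallet.pipe.FeatureSequence2FeatureVector;
import cc.mallet.pipe.Pipe;
import cc.mallet.pipe.SerialPipes;
import cc.mallet.pipe.Target2Label;
import cc.mallet.pipe.TokenSequence2FeatureSequence;
import cc.mallet.types.Instance;
import cc.mallet.types.InstanceList;

public class TextCategorizer {
    public static void main(String[] args) {
        // Pipe: map the label, tokenize the text, build a feature vector
        ArrayList<Pipe> pipes = new ArrayList<Pipe>();
        pipes.add(new Target2Label());
        pipes.add(new CharSequence2TokenSequence());
        pipes.add(new TokenSequence2FeatureSequence());
        pipes.add(new FeatureSequence2FeatureVector());
        InstanceList training = new InstanceList(new SerialPipes(pipes));

        // Instance(data, target, name, source)
        training.addThruPipe(new Instance("the match went to extra time", "sports", "doc1", null));
        training.addThruPipe(new Instance("the election results were close", "politics", "doc2", null));

        Classifier classifier = new NaiveBayesTrainer().train(training);

        // Classify a new document through the same pipe
        // (the dummy "sports" label only satisfies Target2Label; it is ignored)
        InstanceList testing = new InstanceList(training.getPipe());
        testing.addThruPipe(new Instance("a late goal decided the game", "sports", "doc3", null));
        System.out.println(classifier.classify(testing.get(0)).getLabeling().getBestLabel());
    }
}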

Need help picking a datamining/neural-network API

I'm planning on building a feature for an e-commerce platform I developed in Java to display related products in much the same way Amazon does. There are a few different metrics for relating products that I want to explore.
Purchase history (purchased at the same time)
Related by family/type (similar product classifications)
Intentionally related (boosting results; "Buy this!")
While I would probably be able to develop my own datamining library, it wouldn't be very portable and I dare say it wouldn't be very good either.
There are several packages out there for doing this sort of thing, but I don't feel like I'm in a position to evaluate which package or solution would work best for me. Any input, anecdotal or from personal experience, would be greatly appreciated.
Note: I've tagged this as neural networking because of a Python talk I attended where a neural-network-like approach was used for data mining, but I'm not convinced a neural network is the best choice for this job.
Take a look at Apache Mahout.
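For the "purchased at the same time" metric in particular, Mahout's Taste API is a natural fit. A minimal sketch, assuming a preferences.csv of userID,itemID,value rows built from your order history (file name and user ID are illustrative):

import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class RelatedProducts {
    public static void main(String[] args) throws Exception {
        // preferences.csv: userID,itemID,value (e.g. purchase counts or ratings)
        DataModel model = new FileDataModel(new File("preferences.csv"));
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
        List<RecommendedItem> items = recommender.recommend(42L, 5); // top 5 for user 42
        for (RecommendedItem item : items) {
            System.out.println(item.getItemID() + " " + item.getValue());
        }
    }
}

For "people who bought X also bought Y" style output, an item-based recommender (GenericItemBasedRecommender) is often a better match than this user-based one.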
There are some artificial intelligence techniques used for data mining, such as C4.5 or ID3; these algorithms do classification. Other techniques, such as ant clustering, neural networks, or genetic algorithms, are also used for classification purposes in data mining.
As far as algorithms go, I don't know much, but ID3/C4.5 can be programmed fairly easily.
Hope this helps.
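To illustrate why ID3 is considered easy to program: its core is just picking the attribute split with the highest information gain. A self-contained sketch of that computation (the "buy"/"skip" labels are made up for the example):

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Id3Core {
    // Shannon entropy of a list of class labels
    static double entropy(List<String> labels) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String l : labels) {
            Integer c = counts.get(l);
            counts.put(l, c == null ? 1 : c + 1);
        }
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / labels.size();
            h -= p * (Math.log(p) / Math.log(2)); // log base 2
        }
        return h;
    }

    // Information gain of splitting `labels` into the given partitions
    static double infoGain(List<String> labels, List<List<String>> partitions) {
        double remainder = 0.0;
        for (List<String> part : partitions) {
            remainder += ((double) part.size() / labels.size()) * entropy(part);
        }
        return entropy(labels) - remainder;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("buy", "buy", "skip", "skip");
        // A perfect split: each partition is pure, so the gain equals the entropy
        List<List<String>> split = Arrays.asList(
                Arrays.asList("buy", "buy"), Arrays.asList("skip", "skip"));
        System.out.println(infoGain(all, split)); // prints 1.0
    }
}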

tool to graph method calls over time

I'm looking for a tool that can graph method calls over time for a Java app. Perhaps a profiler or another log-parsing tool?
I know I could write something in Python, and I'll work towards doing that. I was just hoping not to reinvent the wheel.
edit:
What I ended up doing was writing some Python to parse my logs and take snapshots at 5-second intervals. Then I used Google Docs and a spreadsheet to visualize my data with a chart that had 2 columns of data: time and frequency. Google Docs was super useful; use "move chart to own sheet" for a nice full-size view. I'll post my Python when I clean it up a bit.
Here is the output graph for the method I specify in my comment.
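For anyone wanting to stay in Java, a rough equivalent of that bucketing step might look like the following; the method name and the leading epoch-millisecond timestamp are assumptions about the log format.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.TreeMap;

// Buckets matching log lines into 5-second windows and prints
// "time,frequency" rows ready for a spreadsheet chart.
public class CallFrequency {
    public static void main(String[] args) throws IOException {
        TreeMap<Long, Integer> buckets = new TreeMap<Long, Integer>();
        for (String line : Files.readAllLines(Paths.get(args[0]))) {
            if (!line.contains("MyService.myMethod")) continue; // hypothetical method name
            long millis = Long.parseLong(line.split(" ")[0]);   // assumes leading epoch-millis
            long bucket = (millis / 5000) * 5000;               // 5-second window
            buckets.merge(bucket, 1, Integer::sum);
        }
        System.out.println("time,frequency");
        buckets.forEach((t, n) -> System.out.println(t + "," + n));
    }
}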
Check out JProfiler. I wouldn't suggest writing your own tool; this is a space with lots of players already... unless you're really looking for something to do. :-)
You can also check out the NetBeans profiler; it's quite straightforward if your application is standard Java code (I mean, it's a bit more complicated with projects deployed in GlassFish, for instance).
(Screenshot from Dr. Dobb's, via Google Images.)
EDIT: Sorry, after another look at your question, this is not exactly what you were looking for, but it might be interesting anyway.
YourKit Java Profiler is probably the most powerful Java profiler out there. It is not free but not unreasonably expensive either. If it doesn't have the feature you are looking for, I kinda doubt any application would.
VisualVM is a visual tool integrating several commandline JDK tools and lightweight profiling capabilities. Designed for both production and development time use, it further enhances the capability of monitoring and performance analysis for the Java SE platform.
