Java or Python for Natural Language Processing [closed] - java

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I would like to know which programming language is better for natural language processing: Java or Python? I have found lots of questions and answers about it, but I am still lost in choosing which one to use.
I also want to know which NLP library to use for Java, since there are lots of libraries (LingPipe, GATE, OpenNLP, StanfordNLP). For Python, most programmers recommend NLTK.
But if I am to do some text processing or information extraction from unstructured data (just free-form plain English text) to get some useful information, what is the best option? Java or Python? Which library is suitable?
Updated
What I want to do is extract useful product information from unstructured data (e.g. users post different forms of advertisements about mobiles or laptops in not very standard English).

Java vs Python for NLP is very much a matter of preference or necessity. Depending on the company or project, you'll need to use one or the other, and often there isn't much of a choice unless you're heading the project.
Other than NLTK (www.nltk.org), there are other libraries for text processing in Python:
TextBlob: http://textblob.readthedocs.org/en/dev/
Gensim: http://radimrehurek.com/gensim/
Pattern: http://www.clips.ua.ac.be/pattern
spaCy: http://spacy.io
Orange: http://orange.biolab.si/features/
PyNLPl (pronounced "pineapple"): https://github.com/proycon/pynlpl
(for more, see https://pypi.python.org/pypi?%3Aaction=search&term=natural+language+processing&submit=search)
For Java, there are tons of others, but here's a list:
Freeling: http://nlp.lsi.upc.edu/freeling/
OpenNLP: http://opennlp.apache.org/
LingPipe: http://alias-i.com/lingpipe/
Stanford CoreNLP: http://stanfordnlp.github.io/CoreNLP/ (comes with wrappers for other languages, python included)
CogComp NLP: https://github.com/CogComp/cogcomp-nlp
For a nice comparison of basic string processing, see http://nltk.googlecode.com/svn/trunk/doc/howto/nlp-python.html
For a useful comparison of GATE vs UIMA vs OpenNLP, see https://www.assembla.com/spaces/extraction-of-cost-data/wiki/Gate-vs-UIMA-vs-OpenNLP?version=4
If you're uncertain which language to go for in NLP, personally I'd say, "any language that will give you the desired analysis/output"; see Which language or tools to learn for natural language processing?
Here's a pretty recent (2017) list of NLP tools: https://github.com/alvations/awesome-community-curated-nlp
An older list of NLP tools (2013): http://web.archive.org/web/20130703190201/http://yauhenklimovich.wordpress.com/2013/05/20/tools-nlp
Other than language processing tools, you would very much need machine learning tools to incorporate into NLP pipelines.
There's a whole range in Python and Java, and once again it's up to preference and whether the libraries are user-friendly enough:
Machine learning libraries in Python:
Sklearn (Scikit-learn): http://scikit-learn.org/stable/
Milk: http://luispedro.org/software/milk
Scipy: http://www.scipy.org/
Theano: http://deeplearning.net/software/theano/
PyML: http://pyml.sourceforge.net/
pyBrain: http://pybrain.org/
Graphlab Create (Commercial tool but free academic license for 1 year): https://dato.com/products/create/
(for more, see https://pypi.python.org/pypi?%3Aaction=search&term=machine+learning&submit=search)
Machine learning libraries in Java:
Weka: http://www.cs.waikato.ac.nz/ml/weka/index.html
Mallet: http://mallet.cs.umass.edu/
Mahout: https://mahout.apache.org/
With the recent (2015) deep learning tsunami in NLP, possibly you could consider: https://en.wikipedia.org/wiki/Comparison_of_deep_learning_software
I'll avoid listing deep learning tools out of non-favoritism / neutrality.
Other Stack Overflow questions that also asked for NLP/ML tools:
Machine Learning and Natural Language Processing
What are good starting points for someone interested in natural language processing?
Natural language processing
Natural Language Processing in Java (NLP)
Is there a good natural language processing library
Simple Natural Language Processing Startup for Java
What libraries offer basic or advanced NLP methods?
Latest good languages and books for Natural Language Processing, the basics
(For NER) Entity Extraction/Recognition with free tools while feeding Lucene Index
(With PHP) NLP programming tools using PHP?
(With Ruby) https://stackoverflow.com/questions/3776361/ruby-nlp-libraries

The question is very open-ended. That said, rather than choosing one, here is a comparison organized by the language you would like to use (there are good libraries available in both).
Python
In terms of Python, the first place you should look at is the Python Natural Language Toolkit. As they note in their description, NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.
There is also some excellent Python-based code that originated from Google's Natural Language Toolkit project. You can find a link to that code here on GitHub.
Java
The first place to look would be Stanford's Natural Language Processing Group. All of the software distributed there is written in Java. All recent distributions require Oracle Java 6+ or OpenJDK 7+. Distribution packages include components for command-line invocation, jar files, a Java API, and source code.
Another great, more general option that you see in a lot of machine learning environments is Weka. Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.
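To make the task from the question a bit more concrete (pulling product details out of free-form ads), here is a minimal sketch in plain Python with only the standard library. It is a rule-based baseline, not what NLTK or Stanford's tools do internally, and the brand list, patterns, field names, and sample ad are all made up for illustration.

```python
import re

# Hypothetical brand list and patterns, chosen only for illustration.
BRANDS = ("samsung", "apple", "nokia", "dell", "lenovo", "hp")

BRAND_RE = re.compile(r"\b(" + "|".join(BRANDS) + r")\b", re.IGNORECASE)
# Storage or memory sizes such as "16gb", "16 GB" or "1 tb".
SIZE_RE = re.compile(r"\b(\d+)\s*(gb|tb)\b", re.IGNORECASE)
# Prices written as "$250", "$ 1,200" or "250 usd".
PRICE_RE = re.compile(r"\$\s*(\d[\d,]*)|\b(\d[\d,]*)\s*usd\b", re.IGNORECASE)


def extract_product_info(ad_text):
    """Return a dict holding only the fields that were found in one ad."""
    info = {}
    brand = BRAND_RE.search(ad_text)
    if brand:
        info["brand"] = brand.group(1).lower()
    sizes = SIZE_RE.findall(ad_text)
    if sizes:
        info["sizes"] = [f"{number}{unit.upper()}" for number, unit in sizes]
    price = PRICE_RE.search(ad_text)
    if price:
        # Exactly one of the two alternatives captured the digits.
        digits = price.group(1) or price.group(2)
        info["price"] = int(digits.replace(",", ""))
    return info


ad = "selling my SAMSUNG galaxy, 16gb, gud condition!! only $250 call me"
print(extract_product_info(ad))
```

Rules like these break quickly on misspellings and unseen brands, which is where the tokenizers, taggers, and classifiers from the libraries listed in the answers come in.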

Related

Quantitative finance/mathematical library for Java [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
I am currently working on a project which relies on a couple of mathematical/statistical/financial tools which I would ideally like to include in one (perhaps several) libraries. What I would like to have is:
1. Statistical measures - modes, variance etc.
2. Probability distributions + sampling from them available
3. Financial models for e.g. option pricing
Does anyone have an idea which libraries might be useful?
I do quite a bit of numerical work in Java and Clojure so hopefully my perspective can be useful.
For 1. and 2. here are some options to explore:
Parallel Colt - very comprehensive, high-performance mathematical library. Good for lots of vector processing which you get in finance apps.
Incanter - R-like statistical library for Clojure (very easy to embed in a Java app). Has pretty much all the statistical tools you could desire, plus is very easy to extend if you want to create your own DSLs
EJML - probably the fastest matrix library available in Java
Apache Commons Math - handy collection of maths and statistical tools
For 3. (financial models) these tend to be pretty bespoke to the problem you are trying to solve. I doubt you would want to use an off-the-shelf solution even if it existed. I'd suggest developing them yourself using the toolkits mentioned above.
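To give a sense of how small each of the three items can be when you build on a numerical toolkit, here is a language-neutral sketch. It uses Python's standard library rather than any of the Java libraries above, and the sample data and parameter values are made up. The pricing function is the textbook Black-Scholes formula for a European call without dividends, offered as an illustration and not as a production model.

```python
import math
import random
import statistics


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, vol, years):
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * years) / (
        vol * math.sqrt(years)
    )
    d2 = d1 - vol * math.sqrt(years)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)


# 1. Statistical measures on a small made-up sample.
returns = [0.01, -0.02, 0.01, 0.03, 0.01]
print(statistics.mean(returns), statistics.variance(returns), statistics.mode(returns))

# 2. Sampling from a probability distribution (seeded for repeatability).
rng = random.Random(42)
samples = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
print(statistics.mean(samples), statistics.stdev(samples))

# 3. A financial model: an at-the-money one-year call, 5% rate, 20% volatility.
call_price = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
print(round(call_price, 4))
```

In Java the same structure would lean on a library for the normal CDF and the sampling, since those are the parts you do not want to hand-roll.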
QuantLib apparently has Java bindings via SWIG.
IdylFin might be just what you are looking for -- inspired by quantmod, it contains some blazing fast, numerically stable statistical methods, an API to Yahoo Finance, as well as some portfolio optimization methods and it's being extended actively.
Disclaimer: I am the author of the library, and it would knock my socks off if you used it ;).
Strata from OpenGamma contains those items. It is a professional quantitative finance library written in Java and open source.
The quantitative finance part contains interest rate, forex and equity models. It is developed by industry quants for banks and hedge funds. I don't know which type of financial models you want to use; there is a wide range of models and asset classes, and you should be able to find a professional-grade implementation off the shelf.

Looking for strong/explicit-typed language without GIL [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
Are there any languages which feature static type checking like in C++, have a modern syntax like Python's, and do not have a GIL?
I believe Python 3 with the ability to explicitly declare the type of each variable would be 'almost there', but the GIL makes me sad.
Java is nice, but I need something more 'embeddable' without a bulky JRE.
Update: Anything .NET-related or non-open source is a no-go.
Update2: I need explicit+strong typing to write safer code in the expense of development speed. GIL is important as the code is going to be quite computing extensive and will run on multicore servers, so it has to effectively use multiple CPU.
Update3: Target platform is Linux (Debian) on x86.
Boo
Boo is an object oriented, statically typed programming language that seeks to make use of the Common Language Infrastructure's support for Unicode, internationalization and web applications, while using a Python-inspired syntax and a special focus on language and compiler extensibility. Some features of note include type inference, generators, multimethods, optional duck typing, macros, true closures, currying, and first-class functions. Boo has been actively developed since 2003.
Cython
Cython is a language that makes writing C extensions for the Python language as easy as Python itself. Cython is based on the well-known Pyrex, but supports more cutting edge functionality and optimizations.
The Cython language is very close to the Python language, but Cython additionally supports calling C functions and declaring C types on variables and class attributes. This allows the compiler to generate very efficient C code from Cython code.
Anything in the ML family might work for you. OCaml is a great place to start, but it does have a stop-the-world GC last I looked. Haskell is famous as a lab for innovative concurrency models. Python's comprehensions came from Haskell, where they're a convenient syntax for some very fundamental ideas. And Erlang is strongly, dynamically typed, fun to write in, and does concurrency better than anybody else.
Ada is a strongly-typed, compiled language with a modern, easy-to-read syntax and proven reliability for multicore computing. Ada was designed for use in large, critical, real-time systems where software MUST work at all costs.
"Ada supports run-time checks to protect against access to unallocated memory, buffer overflow errors, off-by-one errors, array access errors, and other detectable bugs. These checks can be disabled in the interest of runtime efficiency, but can often be compiled efficiently. It also includes facilities to help program verification. For these reasons, Ada is widely used in critical systems, where any anomaly might lead to very serious consequences, i.e., accidental death or injury. Examples of systems where Ada is used include avionics, weapon systems (including thermonuclear weapons), and spacecraft." (quote from Wikipedia article linked above).
Ada is freely available as part of GCC / GNAT and should be an easy "apt-get install" on Debian. You can also find up-to-date compilers and libraries (both community-supported GPL-licensed and commercially-supported packages) at http://libre.adacore.com/libre/
Ada can compile to Java bytecode for use in a JVM or compile to binary for bare-metal or embedded use.
I think Go would fit your requirements.
This is my personal feeling, but Go code looks very similar to Python code.
It still uses the classic compile approach, but Google will certainly develop an interpreter at some point.
From the Google site:
Go is a general-purpose language designed with systems programming in mind. It is strongly typed and garbage-collected and has explicit support for concurrent programming. Programs are constructed from packages, whose properties allow efficient management of dependencies. The existing implementations use a traditional compile/link model to generate executable binaries.
After reading your updated spec:
I need explicit+strong typing to write safer code in the expense of development speed. GIL is important as the code is going to be quite computing extensive and will run on multicore servers, so it has to effectively use multiple CPU
What exactly does "computing extensive" mean? What problem domain? What do others who work in this problem domain use? If you are serious about this specification, you can't do much other than use C++ together with well-tested libraries for multithreading and numerical computing.
my $0.02
rbo

Is there a way to transfer/translate the code written in Java to other languages? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 7 years ago.
Theoretically this seems possible to me. Can anyone confirm whether it's possible, and whether there is software that does this (like Java to C++ or C#)?
And in general, would it be possible to translate languages like Java to a server-side programming language like PHP?
Translating the syntactical elements of one language and producing another is not trivial but it's not impossible. A good parser can build syntax trees in one language and then emit another. The difficulty of porting code outside the context of simple "Hello World" type applications is twofold:
The libraries of one language will probably differ (e.g. WinForms vs Swing)
Some language features will have to be catered for (lambda expressions, anonymous methods, different inheritance implementations, etc.).
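The "parse to a syntax tree, then emit another language" idea can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses Python's standard `ast` module as the parser (not a Java front end), handles arithmetic expressions and simple calls only, and emits a hypothetical Lisp-like prefix notation. The function names `emit` and `translate` are made up.

```python
import ast

# Python operator nodes mapped to the symbols of a hypothetical Lisp-like target.
_OPERATORS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def emit(node):
    """Recursively turn a Python expression tree into prefix notation."""
    if isinstance(node, ast.BinOp):
        symbol = _OPERATORS.get(type(node.op))
        if symbol is None:
            raise NotImplementedError(f"no mapping for {type(node.op).__name__}")
        return f"({symbol} {emit(node.left)} {emit(node.right)})"
    if isinstance(node, ast.Constant):
        return repr(node.value)
    if isinstance(node, ast.Name):
        return node.id
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and not node.keywords
    ):
        parts = [node.func.id] + [emit(arg) for arg in node.args]
        return "(" + " ".join(parts) + ")"
    # Every construct without an explicit rule is a translation gap.
    raise NotImplementedError(f"no mapping for {type(node).__name__}")


def translate(source):
    """Parse one Python expression and emit it in the target syntax."""
    tree = ast.parse(source, mode="eval")
    return emit(tree.body)


print(translate("1 + 2 * x"))       # (+ 1 (* 2 x))
print(translate("len(items) - 1"))  # (- (len items) 1)
```

The second call shows both difficulties at once: the syntax converts cleanly, but nothing guarantees the target has a `len` function, and anything outside the small rule table raises `NotImplementedError`.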
It is possible, but the major problem is that Java has a very large runtime library which needs to be made available in the target language in order to be able to do a fully automatic conversion.
For the special case of Java -> .NET, you can use J# from Microsoft to compile it into a .NET assembly which can then be used. Also ikvm.net allows for running a JVM inside .NET.
For PHP I do not believe such a solution exists. You MAY be able to use gcj to create a native library which can be linked in, but I do not believe it is a feasible solution.
What functionality do you need in PHP?
Visual Studio ships with a Java to C# translator, and even though it does a pretty decent job, there's still a lot to clean up afterwards.
In my experience you really have to ask yourself if it makes sense to translate code from one language to another. What is the gain? Will the translated code be maintainable? If the answers to these questions point in the wrong direction, translating is probably not the right approach.
Google Web Toolkit does conversion from Java to JavaScript:
http://code.google.com/webtoolkit/overview.html
To answer your question: yup, theoretically this is indeed possible, and practically such technology is used every day :)
The interesting thing, in my opinion, is that the Java converters typically convert by taking the bytecode, not the source code. Then it's, say, bytecode-to-ObjectiveC source code. For some converters (at least one opensource one) it's bytecode-to-XML then XML-to-target-language.
For example, the Uniwar application for the iPhone, which has been acclaimed by all and made its way to the App Store's top ten, has been written in Java (JME) and automatically converted from the Java bytecode. Reaching the top ten, even for a few days, means that this is deployed on a lot of machines ;)
In the Real-World [TM], Cobol-Java and, weirdly, Java-Cobol are not unheard of.
That said, for all this to work you need a really good converter :)
Theoretically it is possible. But as others pointed out the main problem is to translate libraries.
Some time ago I made Java to Tcl (XOTcl) and Java to Python translators to evaluate the translation possibility. Search for java2tcl and yava2python.
They convert syntax but do not make relevant constructions translations (e.g. Java file operations to Python ones). That would require more development time.
In general, my opinion is that such a translation may be possible, but only if your translator covers the classes/libraries of the converted project.

Can Jython replace Java? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
So many server-side and mobile Java applications use the native Java language. Can we use Jython instead to build enterprise applications, e.g. websites, application servers, etc.?
Also, what do you feel about Java ME applications in Jython?
P.S. Comments on question also welcome.
No, Jython is not a suitable replacement for Java. Consider, for instance, that it provides no way to implement interfaces without writing the interface in Java and then writing a class leveraging it in Jython.
What's needed is a JVM-targeted equivalent to Boo. Boo is a language targeting the .NET CLR which is roughly inspired by Python but not compatible, and which fully exposes the CLR's functionality (thus being feature-equivalent with C#). There presently is no Pythonic language with feature parity with Java -- and such a language would necessarily be incompatible with Python, as Python simply doesn't provide a way to express some of the relevant concepts (such as interface typing information).
Since there have been some questions about this, let me clarify:
Jython is not a replacement for Java in the sense that you can't take an arbitrary Java project, decide to implement a random subset of that project in Jython instead, and not have anyone else on the development team know or care. Certainly, Jython is suitable for many of the same classes of projects that Java is, except when you're building an interface which will be called from Java, rather than the reverse. Given that "enterprise applications" tend to have a lot of components which operate closely with each other, being able to build a class with an arbitrary external interface is important... and is something which isn't readily done in pure Jython.
It depends largely what your requirements are. All languages have their strengths and weaknesses. There is no perfect language and making intelligent decisions about language choice is an important skill for a programmer.
I disagree with Charles that Jython can't replace Java because you can't implement Interfaces. In dynamic languages, it's seen as a feature that you don't need Interfaces (see duck typing).
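For readers unfamiliar with the term, duck typing can be sketched like this in plain Python (the same code would be expected to run under Jython, though that is an assumption and not something tested here). The class and function names are hypothetical.

```python
class ListLogger:
    """Collects messages in memory."""

    def __init__(self):
        self.messages = []

    def log(self, message):
        self.messages.append(message)


class UpperCaseLogger:
    """Unrelated class that happens to offer the same log() method."""

    def __init__(self):
        self.messages = []

    def log(self, message):
        self.messages.append(message.upper())


def process_order(order_id, logger):
    # No interface is declared anywhere: any object with log(message) will do.
    logger.log(f"processing order {order_id}")
    return order_id


plain = ListLogger()
loud = UpperCaseLogger()
process_order(7, plain)
process_order(7, loud)
print(plain.messages, loud.messages)

# The flip side: a wrong argument is only detected when the call is made.
try:
    process_order(7, object())
except AttributeError as error:
    print("fails only at call time:", error)
```

The two logger classes share no base class or declared interface, which is the feature being described; the `AttributeError` at the end is the compile-time check you give up in exchange.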
Jython takes the great language features of Python and combines them with easy access to a huge collection of Java libraries. However, it does have a price in terms of overhead, and being a fairly new language, you have fewer programmers and less support to back you up.
The nice thing about Jython is that you can rewrite individual modules in Java if performance is a problem. The speed issue is less notable on servers where you can throw hardware at the problem. On mobile phones, performance is still key and I can't see Jython making a big impact there in the near future.
So will Jython replace Java? No, I don't think it will - the momentum behind Java is too great. But we will see increasingly more software with parts written in Jython and other languages targeting the JVM.
No, Jython cannot replace Java, because Python is dynamically typed. Yes, Jython can be used for server applications.
Jython is not good for Java ME, because Java ME has very limited resources. Maybe several years later.
Edit: dynamically typed, not weakly
Yes. Dynamic languages are the future. I heard this from Sun representatives about one year ago. The amazing part about Jython is the "native" usage of Java libraries.
Services and other components may remain in Java but integration layer will move for sure in the dynamic languages direction.
I think Python is natively object-oriented like Java; that's why they match so well.
I don't expect something like that from other scripting languages like Perl but for sure Ruby and Groovy are also good alternatives.
You can use Jython for those domains.
Obviously, the world of web development is moving towards dynamically typed languages such as Python or Ruby. Jython is nice in this domain, since it allows you to use the complete "Java stack" (app server, Tomcat, deployment and testing infrastructure, ...) in enterprises. You can also implement your business logic in Java.
However, Jython is not a replacement for Java in the way that C# is. You (may) gain programmer productivity, but you lose performance and compile-time checks. The languages are really different and are "optimized" for different tasks. I would propose evaluating Jython for smaller, non-critical tasks, e.g. by starting to implement some test cases in it.
I doubt that Jython is useful for Java ME development. The reasons are given in the other posts: Resource constraints.

What are the best practices for the Middleware API? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 5 years ago.
We are developing a middleware SDK, both in C++ and Java to be used as a library/DLL by, for example, game developers, animation software developers, Avatar developers to enhance their products.
What I would like to know is this: Are there standard "Best Practices" for the development of these types of API?
I am thinking in terms of usability, readability, efficiency etc.
My two favourite resources on the subject: http://mollyrocket.com/873 and http://video.google.com/videoplay?docid=-3733345136856180693
From using third party libraries on Windows I've learned the following two things:
Try to distribute your library as a DLL rather than a static library. This gives much better compatibility between different C compilers and linkers. Another problem with static libraries in Visual C++ is that the choice of runtime library can make libraries incompatible with code using a different runtime library, and you may end up needing to distribute one version of the library for each runtime library.
Avoid C++ if possible. C++ name mangling differs a lot between compilers, and it's unlikely that a library built for Visual C++ can be linked from another build environment on Windows. When it comes to C, things are much better, in particular if you use DLLs.
If you really want to get the good parts of C++ (such as resource management through constructors and destructors), build a convenience layer in C++ that hides your C functions, and distribute it as source code. Since the user has the source and compiles it locally, it won't have any name mangling or ABI issues with the local environment.
Without knowing too much about calling C/C++ code from Java, I expect it to be much easier to work with C code than C++ code because of the name mangling issues.
The book "Imperfect C++" has some discussion on library compatibility that I found very helpful.
The video from Josh Bloch mentioned by yrp is a classic - I second that recommendation.
Some general guidelines:
DO define your API primarily in terms of interfaces, factories, and builders.
DO clearly specify exactly which packages and classes are part of the API.
DO provide a jar specifically used for compiling against the API.
DO NOT rely heavily on inheritance or the template method pattern - over time this becomes fragile and broken.
DO NOT use the singleton pattern or at least use it with extreme caution.
DO create package and class level javadoc explaining usage and concepts.
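The first guideline (interfaces, factories, and builders) can be sketched as follows. The guidelines above are Java-oriented, so treat this Python version, which uses the standard `abc` module, as an illustration of the shape only. The `Avatar` domain, all class and function names, and the default height are hypothetical.

```python
from abc import ABC, abstractmethod


class Avatar(ABC):
    """Public interface: the only type callers are meant to depend on."""

    @abstractmethod
    def describe(self):
        """Return a short human-readable description."""


class _BasicAvatar(Avatar):
    """Private implementation; the leading underscore marks it as non-API."""

    def __init__(self, name, height_cm):
        self._name = name
        self._height_cm = height_cm

    def describe(self):
        return f"{self._name} ({self._height_cm} cm)"


class AvatarBuilder:
    """Builder: collects optional settings, then returns the interface type."""

    def __init__(self, name):
        self._name = name
        self._height_cm = 170  # hypothetical default

    def height_cm(self, value):
        self._height_cm = value
        return self  # returning self allows call chaining

    def build(self):
        return _BasicAvatar(self._name, self._height_cm)


def create_default_avatar(name):
    """Factory function for the common case."""
    return AvatarBuilder(name).build()


print(create_default_avatar("Ada").describe())
print(AvatarBuilder("Bob").height_cm(180).build().describe())
```

Because callers only ever see `Avatar`, the builder, and the factory function, the implementation class can be replaced in a later release without breaking them, which is the point of keeping inheritance and concrete classes out of the public surface.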
Take a look at Framework Design Guidelines. I know it is .NET specific, but you can probably learn a lot of general information from it too.
There are lots of ways to design APIs, depending on what you are solving. I think a full answer to this question would be worthy of a whole book, such as the Gang of Four patterns book. For Java specifically, and also for OO programming in general, I would recommend Effective Java, 2nd Edition. The first is general and covers a lot of popular programming patterns, when they apply, and their benefits. Effective Java is Java-centered, but parts of it are general enough to apply to any programming language.
