Using Mahout in Java Application - java

I want to write a Java application (for university) that uses Latent Dirichlet Allocation (LDA). The only framework I found that offers LDA is Mahout.
I have quite some experience in Java programming, even though I would not consider myself a Java pro (I am coming from PHP).
The application will not be used in a distributed computing context, so the Mahout/Hadoop approach might be way over the top, but if I am right it should at least work.
My problem:
The Mahout wiki etc. does not really help me; in fact, I do not understand a single word. I don't want to use Mahout in that "terminal way". I just want to load the classes into my application and do something like this:
documents = obj.load(Documents);
mahout.doLDA(documents);
(I know it will not be that easy, but I am sure you know what I mean.)
Thanks

Mahout's libraries can be used in local mode, without a full Hadoop cluster. You can look at the examples from the book "Mahout in Action" to see how this can be done.

Related

Python web frameworks vs Java web frameworks (how is web development in Python done?)

I am thinking of starting a personal pet web project to experiment with different things and extend my knowledge.
I use Java a lot at work (for web applications :D) and was thinking of making my own in Python, since I kinda like this language but never got past the simple-scripts stage.
I want to step up a gear regarding Python (using 2.6.5) and don't know what to expect or which framework to choose: Django, Pylons, web2py etc.
I also don't know how much these frameworks will offer me and how much I will have to write from scratch.
I could use a comparison with Java if somebody can provide one. I'm thinking of filter functionality such as SiteMesh and custom tags like JSTL. In Python, can I write clean pages of HTML with tags in them, or do I have to write a lot of print statements (something like servlets did in Java)?
I don't know exactly how to phrase this question.
I actually need a presentation of how web development is performed in Python, at what level, and what the web frameworks bring to the table.
Can you share from your experience?
TIA!
Hi, try the Bottle Python framework (bottle.paws.de / bottlepy.org). It's really nice to use, blisteringly fast, and gets out of your way. The best thing about it is that it's one single file to import. I recently migrated from PHP and I have to tell you I am so ... loving it!
Python web frameworks run the full gamut of capabilities/facilities, all the way from shims around WSGI such as Bottle and Flask, all the way to full frameworks such as Django and TurboGears, and even "megaframeworks" such as Zope. Each does things slightly differently, but there will be some familiarity from one to the next.
It may sound strange, but there's no need to know "how web development is performed in Python" to start doing it.
In fact, working with a language/framework/etc. is the single most reliable way to gain an understanding of it. You won't gain a lot from one-page summaries.
Also, comparing it with Java isn't likely to help. There's no point in doing "Java-style development in Python". If you want to benefit, you'll need to clear your mind and do everything "Python-way".
As to which Python framework to choose, Django seems like a good starting point. It's very popular, which means you won't be left without tutorials/documentation/help.
PS Short version: just do it.
Python web frameworks do it in a similar way as some Java-based frameworks. I can speak for Django here.
A good comparison could be Play! vs. Django. Both of them foster using an MVC architecture (or MTV = models, templates, views) and already provide you with a lot of things like CRUD operations in admin pages, ORM, authentication, URL configurations, a template language and much more.
Other Java-based frameworks might differ a lot, and I can't give you a general answer. Depending on the choice, there are only a few differences. You can simply choose the language and framework you like the most. I'd recommend going through some tutorials (the Django tutorial and the Play! framework tutorial, for instance) and seeing which one works best for your needs.

tool to graph method calls over time

I'm looking for a tool that can graph method calls over time for a java app. Perhaps a profiler or other log parsing tool?
I know I can write something in python and I'll work towards doing this. I was just hoping not to reinvent the wheel.
edit:
What I ended up doing was writing some Python to parse my logs and take snapshots at 5-second intervals. Then I used a Google Docs spreadsheet to visualize my data with a chart that had 2 columns of data: time and frequency. Google Docs was super useful. Use "move chart to own sheet" for a nice full-size view. I'll post my Python when I clean it up a bit.
here is the output graph from the method I specify in my comment
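For anyone who would rather stay in Java than Python, the bucketing step described above could be sketched roughly as follows. This is not the script mentioned in the edit. The log format (`<epochMillis> <methodName>` per line) and the names `CallBuckets` and `countCalls` are assumptions made purely for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CallBuckets {
    // Width of each snapshot window, matching the 5-second interval described above.
    static final long BUCKET_MILLIS = 5_000L;

    /**
     * Counts calls to one method per 5-second window.
     * Assumes (hypothetically) that each log line looks like "<epochMillis> <methodName>".
     * Returns a sorted map from window start time (ms) to number of calls.
     */
    static Map<Long, Integer> countCalls(List<String> logLines, String methodName) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            String[] parts = line.trim().split("\\s+", 2);
            if (parts.length < 2 || !parts[1].equals(methodName)) {
                continue; // skip malformed lines and calls to other methods
            }
            long timestamp;
            try {
                timestamp = Long.parseLong(parts[0]);
            } catch (NumberFormatException e) {
                continue; // skip lines whose first field is not a number
            }
            // Round the timestamp down to the start of its 5-second window.
            long bucketStart = Math.floorDiv(timestamp, BUCKET_MILLIS) * BUCKET_MILLIS;
            counts.merge(bucketStart, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "1000 save", "2500 save", "4999 load", "5000 save", "12000 save");
        // Print "time,frequency" rows that can be pasted into a spreadsheet.
        countCalls(sample, "save")
            .forEach((start, n) -> System.out.println(start + "," + n));
    }
}
```

The two printed columns correspond to the time and frequency columns of the chart. Real logs will need their own timestamp parsing in place of `Long.parseLong`.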
Check out JProfiler. I wouldn't suggest writing your own tool; this is a space with lots of players already... unless you're really looking for something to do. :-)
You can also check the NetBeans profiler. It's quite straightforward if your application is standard Java code (I mean, it's a bit more complicated with projects deployed in Glassfish, for instance).
EDIT: sorry, after another look at your question, it's not exactly what you were looking for, but it might be interesting anyway
YourKit Java Profiler is probably the most powerful Java profiler out there. It is not free but not unreasonably expensive either. If it doesn't have the feature you are looking for, I kinda doubt any application would.
VisualVM is a visual tool integrating several commandline JDK tools and lightweight profiling capabilities. Designed for both production and development time use, it further enhances the capability of monitoring and performance analysis for the Java SE platform.

Embedding dendrogram in Java

I'm looking for a library capable of drawing dendrograms of data in Java (not calculating them; I can do that myself). Do you have any pointers? I already searched Google but haven't found anything that is not stand-alone (I need to embed the generation inside my program).
Thanks!
Check out the JUNG graph library. It won't perform the actual clustering for you but is a really good library for visualising your results.
Take a look at Archaeopteryx. It has quite a few features, it's open source, and it is available as a pre-packaged jar file.
BTW, I use JUNG and really like it. It can perform various clusterings, but AFAIK, it has no inherent dendrogram capabilities. Because it has graphing capabilities, you could roll your own dendrogram, but it would take some work.
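To give an idea of what "rolling your own" involves, here is a minimal layout sketch in plain Java. It does not use JUNG or any other library's API. The class and method names (`DendrogramLayout`, `Node`, `layout`, `segments`) are made up for illustration, and it assumes the clustering is already available as a binary tree with a merge height on each internal node.

```java
import java.util.ArrayList;
import java.util.List;

class DendrogramLayout {
    /** A node of an already-computed cluster tree; leaves have no children and height 0. */
    static final class Node {
        final String label;   // only meaningful for leaves
        final Node left;
        final Node right;
        final double height;  // merge distance (0 for leaves)
        double x;             // horizontal position, assigned by layout()

        Node(String label) { this(label, null, null, 0.0); }
        Node(Node left, Node right, double height) { this(null, left, right, height); }

        private Node(String label, Node left, Node right, double height) {
            this.label = label;
            this.left = left;
            this.right = right;
            this.height = height;
        }

        boolean isLeaf() { return left == null; }
    }

    /**
     * Assigns x positions: leaves get nextLeafIndex, nextLeafIndex + 1, ... from left
     * to right, and each merge sits midway between its two children.
     * Returns the next unused leaf index.
     */
    static int layout(Node node, int nextLeafIndex) {
        if (node.isLeaf()) {
            node.x = nextLeafIndex;
            return nextLeafIndex + 1;
        }
        int next = layout(node.left, nextLeafIndex);
        next = layout(node.right, next);
        node.x = (node.left.x + node.right.x) / 2.0;
        return next;
    }

    /** Collects line segments {x1, y1, x2, y2}: two vertical stems and one crossbar per merge. */
    static void segments(Node node, List<double[]> out) {
        if (node.isLeaf()) {
            return;
        }
        out.add(new double[] {node.left.x, node.left.height, node.left.x, node.height});
        out.add(new double[] {node.right.x, node.right.height, node.right.x, node.height});
        out.add(new double[] {node.left.x, node.height, node.right.x, node.height});
        segments(node.left, out);
        segments(node.right, out);
    }

    public static void main(String[] args) {
        Node root = new Node(new Node(new Node("A"), new Node("B"), 1.0), new Node("C"), 2.0);
        layout(root, 0);
        List<double[]> lines = new ArrayList<>();
        segments(root, lines);
        System.out.println(lines.size() + " segments to draw");
    }
}
```

After scaling to pixel coordinates, the segments could be drawn in a Swing component with `Graphics2D.drawLine`, or handed to whatever drawing layer the program already uses.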

What's an example of Java functionality that I could add to a JRuby/Rails project?

This is actually two questions rolled into one.
Is there a particular type of Java functionality that people are using JRuby for or is it mainly because of the performance advantage that JRuby gives versus the MRI?
The reason I ask is, I'd like to add some Java functionality to a Rails project (just to show that it's possible). Ideally this Java functionality would also be useful rather than redundant. Which leads to my next question . . .
What's an example of something that would make a good demonstration of Java functionality being added to a simple Rails CRUD app?
I guess anything you can do in a Ruby class, you could just as easily do in a Java class (with about twice as much code), so I understand the question may be hard to answer. I'm just wondering if there is a particular type of functionality that is more appropriate to do in Java.
There's nothing at stake here, by the way. I'm just playing around and testing things out.
I'd find a focused Java library that fills an existing need in Java and work on leveraging that library within your rails application. One possibility is to use JTS (Java Topology Suite) to manipulate geographic regions in a rails app and convert them for display on a map within your rails app.
I think the Ruby class libs (gems) can do pretty much anything the Java libs can do, so there's not much of a compelling reason to use JRuby on Rails in a "stand-alone" scenario.
I use it to integrate with a vendor Java app. Knocking up a quick controller and some views is much easier than extending using Java/Swing.
Also, in an "Enterprise" environment, a Rails developer may be obliged to deploy to Tomcat or Glassfish. The Warbler gem for JRuby enables this.

What's the best/easiest way to manipulate ActiveX objects in Java?

I want to open and manipulate Excel files with ActiveX. I've had success with Python's Win32 Extensions and Groovy's Scriptom libraries on other projects, but I need to do this in pure Java this time if possible.
I've tried the Jacob Java COM Bridge but that doesn't seem as straightforward or simple to use, and I couldn't get it to retrieve cell values (even though this is the library underlying Scriptom). Are there alternatives?
Jacob is really the tool for the job here. I recommend that you take the time to learn a bit about how COM and ActiveX work, and I think you'll find that it's easier to use. COM is quite an accomplishment, but it's hard. Wrappers like VB make it seem easy (for the limited uses they cover), but it is not at all easy. I have a great book on learning COM, but don't have the name handy right now...
You want to learn about the IDispatch interface (this is what most of Excel's COM interface is developed around). It's a nasty, nasty interface (one of those viral things that you can do so much with it that it becomes impossible to tell what is actually happening) - but learning it is key.
If you are having issues in just one area (i.e. getting a value from a cell), you could grab the source for Scriptom and see what they do (open source, after all!).
Another suggestion is to try to implement some test cases of your code in VBA and make sure that you are correctly thinking through all the return values. When we were doing Excel automation in one of our Java apps, we implemented the general algorithm from Word's VBA, worked through the problem cases, etc... After that, transferring over to Jacob was pretty straightforward.
How about http://www.nevaobject.com/_docs/_java2com/java2com.htm? It is commercial but works better.
Have you looked at JExcelAPI? Instead of using ActiveX this is a Java library which directly reads and writes Excel files.
Not an exact answer to your questions but it might solve the problem just as well, especially if you're looking for a pure Java solution.
There's also JIntegra, which does a similar thing. Also commercial.
And there's JNIWrapper, which does a similar thing. Again, also commercial.
