As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I need to run various machine learning techniques on a big dataset (10-100 billion records).
The problems are mostly around text mining/information extraction and include various kernel techniques but are not restricted to them (we use some Bayesian methods, bootstrapping, gradient boosting, regression trees -- many different problems and ways to solve them).
What would be the best implementation? I'm experienced in ML but have little experience applying it to huge datasets.
Are there any extensible and customizable machine learning libraries that use MapReduce infrastructure?
Strong preference for C++, but Java and Python are OK.
Amazon, Azure, or our own datacenter (we can afford it)?
Unless the classification state space you are attempting to learn is extremely large, I would expect that there is significant redundancy in a text-mining-focused dataset with 10-100 billion records or training samples. As a rough guess, I would doubt that one would need much more than a 1-2% random sample subset to learn reliable classifiers that would hold up well under cross-validation testing.
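One way to draw such a random subset without loading everything into memory is reservoir sampling. The sketch below is only an illustration of the sampling idea described above, assuming the records arrive as an iterable stream; the function name `reservoir_sample` and the fixed seed are hypothetical choices, not part of any library mentioned here.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return k records chosen uniformly at random from a stream of unknown length."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    sample: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = record
    return sample
```

For example, `reservoir_sample(stream, k=1_000_000)` makes a single pass and keeps only `k` records in memory, so the sample can then be fed to an ordinary single-machine learner.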
A quick literature search came up with the following relevant papers. The Tsang paper claims O(n) time complexity for n training samples, and there is software related to it available as the LibCVM toolkit. The Wolfe paper describes a distributed EM approach based on MapReduce.
Lastly, there was a Large-Scale Machine Learning workshop at the NIPS 2009 conference that looks to have had lots of interesting and relevant presentations.
References
Ivor W. Tsang, James T. Kwok, Pak-Ming Cheung (2005). "Core Vector Machines: Fast SVM Training on Very Large Data Sets", Journal of Machine Learning Research, vol 6, pp 363–392.
J Wolfe, A Haghighi, D Klein (2008). "Fully Distributed EM for Very Large Datasets", Proceedings of the 25th International Conference on Machine Learning, pp 1184-1191.
Olivier Camp, Joaquim B. L. Filipe, Slimane Hammoudi and Mario Piattini (2005). "Mining Very Large Datasets with Support Vector Machine Algorithms", Enterprise Information Systems V, Springer Netherlands, pp 177-184.
Apache Mahout is what you are looking for.
Late answer, but here is a good link for large scale data mining and machine learning:
The GraphLab project consists of a core C++ GraphLab API and a collection of high-performance machine learning and data mining toolkits built on top of the GraphLab API. In addition, we are actively developing new interfaces to allow users to leverage the GraphLab API from other languages and technologies.
I'm not aware of any ML library that uses map/reduce. Maybe you could use an ML library and a map/reduce library together? You might want to look into Hadoop's MapReduce:
http://hadoop.apache.org/mapreduce/
You would have to implement the map and reduce methods yourself. The fact that you use so many techniques might complicate this.
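To make the "map" and "reduce" methods concrete, here is a minimal single-process sketch of the programming model, using token counting as the job. It does not use Hadoop's API; the names `map_record`, `reduce_key`, and `run_job` are hypothetical, and a real framework would run the map and reduce calls on many machines and do the grouping (shuffle) step for you.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_record(record: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (token, 1) pair for every token in one record."""
    for token in record.lower().split():
        yield token, 1


def reduce_key(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: combine all values emitted for one key."""
    return key, sum(values)


def run_job(records: Iterable[str]) -> Dict[str, int]:
    """Run map, group by key (the 'shuffle'), then reduce, all in one process."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            groups[key].append(value)
    return dict(reduce_key(key, values) for key, values in groups.items())
```

An ML algorithm fits this model when its training step can be written as per-record statistics (the map) that are then summed or merged per key (the reduce), which is why each technique needs its own formulation.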
You can run it on your own cluster or, if you are doing research, maybe you could look into BOINC (http://boinc.berkeley.edu/).
On the other hand, maybe you can reduce your dataset. I have no idea what you are training on, but there must be some redundancy in 10 billion records...
I don't know of any ML libraries that can support 10 to 100 billion records; that's a bit extreme, so I wouldn't expect to find anything off the shelf. What I would recommend is that you take a look at the Netflix Prize winners: http://www.netflixprize.com//community/viewtopic.php?id=1537
The Netflix Prize had over 100 million entries, so while it's not quite as big as your data set you may still find their solutions to be applicable. What the BellKor team did was to combine multiple algorithms (something similar to ensemble learning) and weight the "prediction" or output of each algorithm.
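The weighting idea described above can be sketched as a simple weighted average of per-model predictions. This is only an illustration, not the BellKor team's actual blending procedure; the function name `blend` and the weights are hypothetical, and in practice the weights would normally be fitted on held-out data rather than chosen by hand.

```python
from typing import List, Sequence


def blend(predictions: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Combine the outputs of several models into one weighted-average prediction.

    predictions[m][i] is model m's prediction for example i.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    n_examples = len(predictions[0])
    return [
        sum(w * model[i] for w, model in zip(weights, predictions)) / total
        for i in range(n_examples)
    ]
```

For instance, `blend([model_a_preds, model_b_preds], [3.0, 1.0])` trusts the first model three times as much as the second.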
Take a look at http://hunch.net/?p=1068 for info on Vowpal Wabbit; it's a stochastic gradient descent library for large-scale applications.
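For readers unfamiliar with stochastic gradient descent, the following is a minimal sketch of the general technique for logistic regression over sparse features. It is not Vowpal Wabbit's implementation or API; the names `sgd_train` and `predict`, the learning rate, and the feature-dictionary format are all assumptions made for illustration.

```python
import math
from typing import Dict, Sequence, Tuple

Example = Tuple[Dict[str, float], int]  # (sparse features, label in {0, 1})


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights: Dict[str, float], features: Dict[str, float]) -> float:
    """Predicted probability that the label is 1."""
    z = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return sigmoid(z)


def sgd_train(examples: Sequence[Example], learning_rate: float = 0.1,
              epochs: int = 1) -> Dict[str, float]:
    """Update the weights after every single example instead of after a full pass."""
    weights: Dict[str, float] = {}
    for _ in range(epochs):
        for features, label in examples:
            error = predict(weights, features) - label  # gradient of log loss w.r.t. z
            for name, value in features.items():
                weights[name] = weights.get(name, 0.0) - learning_rate * error * value
    return weights
```

Because each update touches only the features present in one example, memory use depends on the number of distinct features rather than the number of records, which is what makes this family of methods attractive for very large datasets.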
A friend of mine has worked on a similar project. He used Perl for text mining and Matlab for techniques such as Bayesian methods, latent semantic analysis and Gaussian mixtures...
See this list of large-scale machine learning resources (courses, papers etc): http://www.quora.com/Machine-Learning/What-are-some-introductory-resources-for-learning-about-large-scale-machine-learning
Closed 10 years ago.
I'm developing a CBIR (content-based image retrieval) system as part of my BE project.
Which of the tools below would be better for image processing?
1-> Matlab
2-> Mathematica
I'm planning to use Java for the front end of the system. Which of the above would be better? Or should I go for one of the third-party image processing APIs available for Java?
I used Mathematica for years and still found it easier to learn Matlab from scratch in order to do an image processing project. The thing that makes Matlab better here is that many state-of-the-art image algorithms have code available. For instance, for content-based image retrieval you need to extract content features, and the vl_sift function from the VLFeat library does that. Also, you can bundle your Matlab library to run as a stand-alone executable, and I don't know if that's possible with Mathematica.
I previously suggested ImageJ, and others mentioned ImageMagick, since I had mentioned a Java environment. However, I would like to change my suggestion. I came across Intel's OpenCV (Open Source Computer Vision) libraries. This is a great set of libraries for use with C, C++ and Python. It is cross-platform too, so porting the code shouldn't be too difficult.
I think OpenCV is great because even novices (like me) in image processing can use it. For example, smoothing an image is as easy as calling one function, cvSmooth(), with a few parameters that select the type of smoothing (blur, Gaussian, etc.). It also supports much more advanced functions such as optical flow and blob tracking. And the great thing is, it's quick to test out or build simple image transforms.
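To show what the simplest kind of smoothing does, here is a small pure-Python sketch of a mean ("box") blur on a grayscale image stored as a list of rows. It is not OpenCV code and does not reproduce cvSmooth(); the function name `box_blur` and the edge handling (averaging only the pixels that exist) are assumptions made for illustration.

```python
from typing import List, Sequence


def box_blur(image: Sequence[Sequence[float]], radius: int = 1) -> List[List[float]]:
    """Replace each pixel by the mean of its (2*radius + 1)^2 neighbourhood.

    Pixels outside the image are ignored, so border pixels average fewer values.
    """
    height, width = len(image), len(image[0])
    result = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            count = 0
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += image[ny][nx]
                    count += 1
            result[y][x] = total / count
    return result
```

A library routine does the same job far faster and offers other kernels (such as Gaussian), but the per-pixel idea is the same.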
For more info, please go to http://opencv.willowgarage.com/wiki/ . There you'll find cheat sheets, reference manuals, examples and some tips. It's a great help and starting point.
Thanks
What are your criteria for measuring the relative superiority of programs for image processing? For example, if you are a Mathematica expert then you will find it easier to use Mathematica for the task. On the other hand, if you are a penniless student then you will find Java and some of its libraries more to your taste.
EDIT: in answer to OP's comments ...
'ease of image processing' is entirely subjective -- if you don't know Mathematica then it will be difficult to use it for image processing -- so this one is your call.
'processing time' is entirely objective -- but do you have the time to try out all 3 of your suggested options and compare them? For a BE project you'll be far better off using the tool you are most comfortable with and spending as little time as you can wrestling with an unfamiliar tool for the sake of a bit of extra speed.
'cellular automata' for image processing -- don't know how relevant it is, but Mathematica has inbuilt functionality for cellular automata.
I would look into the ImageMagick/GraphicsMagick family (SO discussion), which has several Java wrappers (e.g., JMagick).
You could use ImageMagick, or why not look into the JMF (Java Media Framework)?
Matlab is the better of the two. It has huge built-in libraries and implementations of thousands of algorithms. It's fast, easy and well documented.
Closed 11 years ago.
I want to know what the best open source Java-based framework for text mining is, one that supports both machine learning and dictionary methods.
I'm using Mallet, but there is not much documentation and I do not know if it will fit all my requirements.
I honestly think that the several answers presented here are very good. However, to fulfill my requirements I have chosen to use Apache UIMA with ClearTK. It supports several ML methods and I do not have any licensing problems. Plus, I can write wrappers for other ML methodologies, and I get the benefits of the UIMA framework, which is very well organized and fast.
Thank you all for your interesting answers.
Best Regards,
ukrania
Although not a specialized text mining framework, Weka has a number of classifiers usually employed in text mining tasks such as: SVM, kNN, multinomial NaiveBayes, among others.
It also has a few filters to work with textual data, like the StringToWordVector filter, which can perform a TF/IDF transformation.
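As a rough illustration of what a TF/IDF transformation computes, the sketch below turns a list of documents into sparse term-weight dictionaries using the textbook formula tf * log(N / df). This is not Weka's StringToWordVector code, and Weka's exact weighting options may differ; the function name `tf_idf` and the whitespace tokenizer are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def tf_idf(documents: Sequence[str]) -> List[Dict[str, float]]:
    """Weight each term by its count in the document times log(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq: Counter = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors
```

Terms that occur in every document get weight 0, so they stop dominating the feature vectors that are passed to a classifier.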
Check out the Weka wiki website for more information.
Maybe have a look at Java Open Source NLP and Text Mining tools.
I've used LingPipe -- a suite of Java libraries for the linguistic analysis of human language -- for text mining (and other related) tasks.
It is a very well documented software package, and the site contains several tutorials which thoroughly explain how to do a certain task with LingPipe, such as named entity recognition. There is also a newsgroup, wherein you can post any question you have about the software (or NLP related tasks), and have a prompt reply from the authors of the package themselves; and of course, a blog.
The source code is also very easy to follow and well documented which, for me, is always a big plus.
As for machine learning algorithms, there are plenty, from Naïve Bayes to Conditional Random Fields. For dictionary matching, on the other hand, they have an ExactDictionaryChunker, which is an implementation of the Aho-Corasick algorithm (a very, very fast algorithm for this task).
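For reference, the Aho-Corasick idea mentioned above can be sketched in a few dozen lines: build a trie of the dictionary entries, add failure links, and then scan the text once. This is a generic illustration and not LingPipe's code; the function names `build_automaton` and `find_all` are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple

Automaton = Tuple[List[Dict[str, int]], List[int], List[List[str]]]


def build_automaton(words: Iterable[str]) -> Automaton:
    """Build the trie (goto), failure links, and per-state output lists."""
    goto: List[Dict[str, int]] = [{}]
    fail: List[int] = [0]
    output: List[List[str]] = [[]]
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(word)

    # Breadth-first pass: depth-1 states keep fail = 0; deeper states
    # inherit the longest proper suffix that is also a trie prefix.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_all(text: str, automaton: Automaton) -> List[Tuple[int, str]]:
    """Return (start_index, word) for every dictionary word found in text."""
    goto, fail, output = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in output[state]:
            matches.append((i - len(word) + 1, word))
    return matches
```

The text is scanned once regardless of how many dictionary entries there are, which is what makes the approach suitable for large dictionaries.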
In sum, I think it is one of the best NLP software packages for Java (I haven't used every single package that is out there, so I can't say it's the best), and I definitely recommend it for the task that you have at hand.
You may already know about GATE: http://gate.ac.uk/
...but that's what we've used (at my day job) for lots of different text mining problems. It's pretty flexible and open.
I built a maximum entropy named entity recognizer for CoNLL data using OpenNLP MaxEnt http://sourceforge.net/projects/maxent/ for a course once.
It required a lot of data preprocessing with custom Perl scripts to get all the features extracted into nice neat numerical vectors, though.
We use Lucene to process live streams from the internet. It has a native Java API.
http://lucene.apache.org/java/docs/
You can then use Mahout, which is a collection of machine learning algorithms that operate on top of Lucene.
http://lucene.apache.org/mahout/
Closed 10 years ago.
As a Java newbie I'm wondering: of all the languages in the world, why is Java frequently used for enterprise applications? What makes it that way compared to the other languages? Will it continue to be this way in the upcoming years?
I'd appreciate your insights. Thanks in advance :)
One word: libraries. Java has a vast array of excellent libraries for solving most of the common problems one needs to solve when developing enterprise applications. In many cases, there is more than one good choice for addressing a particular need, and oftentimes those libraries are free and open source under a business-friendly license.
Some have argued that there are, in fact, too many choices in the Java ecosystem, and that developing enterprise software in Java requires developers to make a large number of decisions that can have far-reaching impact on the end product for better or worse. This has probably helped propel the popularity of alternatives like .NET, which has a reputation of offering fewer choices, but with the benefits of a more well-integrated application stack and tools set. What direction you choose depends, I guess, on whether you place more value on "freedom of choice" or "freedom from choice".
There are lots of reasons a large company (the type to go for enterprise solutions) would pick Java. Note I'm not saying all these reasons are correct or valid. But the relevant point is that they appear valid to a CTO at MegaCorp.
Learning Curve
Java is a simple language without much of the flexibility of other members of the C family. This cuts both ways, but it is seen as a straightforward language for use by an army of programmers. Enterprise projects tend to involve large numbers of developers (rightly or wrongly) and it is much easier to get a developer to a minimum level of competence in Java than in C++. You also have a whole generation of graduates who have probably been largely schooled in Java.
Choice
Java has a vast array of libraries, frameworks, tools, IDEs, and server providers. To an enterprise it's good to have choice, even if that's just for use as a bargaining chip when negotiating price. The language lends itself to code quality tools that allow enforcement of corporate standards (and as mentioned there are a lot of those tools).
Platform Independence
Java is write once, run (well, debug) everywhere. Sun has actively encouraged open standards that allow multiple vendors to implement their solutions. These standards give the customer the comfort that they can migrate from one vendor to another if a given vendor goes under or starts charging more. Of course the reality is that each vendor does their best to provide some "added value" features that tie the customer to them quite nicely.
Maturity
It's been around a long time, running a lot of servers. If your web application needs to be "6 sigma" or similar and you are the MegaCorp CTO, you are not going to look that kindly on Joe the developer wanting to do it in RoR.
Timing/Marketing
Java came out when programming was moving towards the web. It was positioned cleverly and got a strong position early in web development. Because of the open standards, there are some very big companies producing these platforms and they market Java pretty hard to sell those platforms.
Inertia
Large corporations move forward at a glacial pace (a lot are still using Java 1.4 five years after 5 was released), so once they've picked Java, it takes a massive investment to move to another platform. With each day that goes by they're cranking out more Java that would need to be migrated. Most of these companies are not primarily coding shops, so it is a very hard sell to convince the business to spend a few tens of millions rewriting their entire code base for no immediate business benefit.
Another reason might be the care Sun has taken to keep Java backwards compatible. The vast majority of Java code can be run on the latest version of the JVM without a problem. That is quite an achievement, given the age of Java. On the other hand you might argue Java has not changed all that much in all these years.
Enterprises like stability in a platform.
Sun targeted Java to speak to the needs of enterprises early on. It pushes standards that promote vendor independence at every level: platform independent, database independent, application server independent, etc.
In addition, they promoted enterprise-level tools for it, in terms of messaging, transaction management and other things that the enterprise worries about.
Before Java, enterprise level stuff tended to be done in C++ (there were plenty of exceptions (does anyone remember PowerBuilder?) but that was the rule) and Java fits well as a successor to C++ for business applications, where that kind of memory management isn't something worth paying for.
In addition to all of that, the language itself speaks to enterprises in terms of avoiding hard-to-get-right constructs that can really mess up a code base, such as operator overloading. Enterprise-level applications tend to get handled by many different hands, not all of which are top-of-the-line programmers, and having safety nets to prevent them from shooting themselves in the foot is a desirable thing.
It also came along at the right time. It was a new paradigm (this was well before .NET existed) that promised to unite multiple vendors so they could compete with Microsoft, which got the likes of IBM and Oracle on board, and it happened to fill a new hole: the emerging requirement to develop web applications, where C++ was no longer an obvious choice.
I shouldn't be saying this, but...
The real reason is because it's named after coffee!
Business is about time, money and opportunity.
Using Java means that your number of errors in the code goes down, simply because pointers are hard. You use a GC and you instantly remove an entire class of errors from your code.
Secondly, Java was one of the first languages to ship with a prewritten library of functions, which really did cover a lot of the development phase. This restricted the way things were done but it meant that people could learn faster, had more tools at their disposal and had a great set of libraries to do things like networking, GUI, web, encryption, etc. Java on its own as a language really wasn't that special, but Java plus the Java API was.
So if you've got a language that has fewer errors and more infrastructure for free, then you end up with more code in less time. Sure, the code doesn't cure cancer, and it's not as fast as C++ code that achieves the same task, but it will achieve the business's goal of getting an application.
If you make more code, for less money, you can pursue more opportunities. You then bring inertia to the table in terms of code that's already been implemented in Java and you start seeing why the business doesn't want to move away from their comfort zone.
Personally I believe one major reason is the cross-platform issue.
Java programs written "correctly" (without assumptions of the underlying operating system) can run on any JVM. This means that you are not tied to a particular platform, unlike .NET which marries you to Windows.
I have seen Java code run on mainframes, Linux routers, inside an Oracle database, and naturally on PCs.
It's cheap, RAD, cross-platform, and developers abound.
Developing in C++ is too slow and expensive and .NET has not been around for long enough. Business inertia is huge, remember.
Businesses want languages that are backed by a professional provider (i.e. a company like Sun) and often stay away from open source languages for the simple reason that they were not written by a company.
Also, for client-server applications, you have an abundance of choices for production-quality app servers that have the same J2EE interface (IBM WebSphere, BEA WebLogic, JBoss). Alternatively, you could use the Spring Framework on any server that complies with the Servlet API, like Apache Tomcat, if you're convinced you don't need EJBs. With .NET, by contrast, it's hard to find choices with respect to app servers.
There is an abundance of choices with regard to frameworks for a given task, be it an ORM tool, logging, collections, caching, web UIs, etc. There is hardly any need to reinvent the wheel.
Finally, while it's fashionable these days to lament the very real shortcomings of Java the language, it's a language where folks know how to get things done and how to avoid certain anti-patterns.
The other answers are all good. Two things need to be added: corporate standards and the bandwagon effect. If you want to build an enterprise system you need to have a pretty strong case for not using your company's entrenched standards, and this is mostly JavaEE. And if you need to resource a project it is a lot easier to recruit a Java programmer than, say, an Erlang programmer.
Closed 11 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
Like most people, I learnt C++ after C. I learnt C++ because it was one of those languages which fetched jobs. I am still studying (doing masters) though. One of my cousins has been working as a developer for around 12 years.
He advises me to learn Java so that I can land a good job. He says only a few sectors, like telecommunications, use C++ and almost all enterprise software is developed in Java. I am working as a student technician in my university. The job involves Windows programming in C++. I have learnt core Java. I do not know any of those frameworks like Struts, Hibernate or whatever else is out there.
To be on the safer side, I am planning to master both. But I think Java is an ocean, though it's easier to program in Java than C++ (this is my opinion. C++ lovers don't kill me).
What do you geeks and professionals out there advise? Do you think I should learn both of them? Which sectors of the software industry use C++?
Best advice I ever got as an undergraduate was from my languages professor, who told me (paraphrasing here): "Don't memorize languages; don't marry yourself to a language. They're just tools. They all do the same basic things. Instead of learning a specific language, learn the foundations of good software development. Then you'll be able to take any language and be successful with it."
I feared learning new languages before he told me that. Afterwards, I took his advice to heart and it's made all the difference. Languages are just tools in your toolbox. If you can program and write good software in one language, you can do it in another.
C++ is still used extensively for performance-intensive tasks. For example, I do a lot of work with video compression and decompression - it's all C++. Partly because we need the performance, and partly because all the third-party libraries which we need to use are C++, so it's easier to integrate with them if we're using the same language.
A brief off-the-top-of-my-head listing of projects that use C++:
Device driver development
Games
Advanced engines (audio, image processing, etc)
Telecom
Embedded software (devices and such)
C++ is not even close to being dead.
My experience:
Bloomberg, Reuters, and all their bank/trading clients who want low latency market data feeds use C++. (Fidelity, JPMorgan, BNP, Citibank, etc.)
The defense industry uses C++. (among other languages - Java is also popular)
These industries also use Java and other languages, but C++ is hard to beat. Note also that there is a HUGE existing C++ code base that will take a long time to go away (if ever).
It is not hard to know and use them both. Don't worry too much about a particular language. That is not the real crux of developing software.
Some site calling itself Google reckons it uses C++ extensively. Thank God they don't need performance to index the few websites out there.
If this Interweb thingy takes off, they'll be in real trouble though. (-:
Seriously though, apart from the current change of channel (reversion?) to C and Perl, the previous four projects were all C++.
a large scale departure control system for several of the world's major airlines (30,000 10MB log files per day)
a flight training simulator and ground based software proving platform for a new military aircraft
a large scale framework for an existing air-traffic control simulator that was used to incorporate the new Aeronautical Telecommunications Network (ATN)
an advanced radar processing and display system for an en-route air traffic control centre
All really interesting.
Video games still use C++. One example that comes to mind is the Torque engine, which can be used for iPhone and Xbox game development.
C++ is not anywhere close to being dead, and as suggested by others, don't marry yourself to a language.
If you're learning C++ in terms of job prospects, I'd rather suggest you pick up a representative of a programming paradigm.
C - Imperative programming
C++/Java/Obj-C - OOP
Lisp/Haskell/Scheme - Functional Programming
Getting adjusted to the paradigms is usually harder than picking up the language itself. Hope others would chime in for other suggestions.
Many defense contractors, especially when dealing with hardware, use C++
Many makers of advanced medical equipment (e.g., MRIs) use C++.
I say learn both if you can. It's good to have the C++ foundation even if you end up in a sandboxed language like Java/C#. Too many people don't understand the basics.
Note that different cities have a majority of jobs in a specific language. Go to the craigslist job boards for the city you think you'd end up in and see what they seek. For example, Pittsburgh is a C# town whereas New York is more balanced.
C++ is still used extensively in engineering applications, such as high-end CAD/CAM systems.
I think the death of native code is much exaggerated. Even Microsoft has backed away from making C++ all about .NET in Visual Studio 2008. It seems to be actively improving its native C++ development tools again as well as making interop with the managed world more easy.
My advice: learn C#. These days almost 50% of the job listings are for Java and almost all of the other 50% are for C#; only a few list C++.
That said, the people using C++ know why they are using it and probably have good use cases for it... meaning you would really learn how to use the language at the highest levels... meanwhile companies slogging out business logic applications in Java/C# will probably not teach you as much about programming as quickly.
If you do decide to take a C++ job, make sure the shop uses the STL effectively and that they are using Boost. Not looking at using Boost/STL, but actually using it.
Keywords: Banking (financial institutions), games
Oh, and also particle physicists and other scientists using the (in)famous ROOT framework.
And check out this list of C++ based applications.
Bjarne Stroustrup has a list of companies/applications written in C++.
Your university didn't have any Java based courses? That's odd!!
You should learn not just C++ and Java, but should be exposed to several languages and paradigms
I tend to agree with your cousin. Java definitely is used in more enterprise applications (think things that a bank or insurance company would need), while C++ is used in more "niche" markets. I also agree with you that Java is easier to learn and master than C++. No shame or offense in admitting that.
We use both where I work, writing embedded software in C++, but communicating with it from GUI apps written in Java. There's definitely plenty of room in the software industry for both (and many more) languages.
Most packaged software is still written in C++. That means games, office applications, graphics and video editors, and operating systems. In fact, if you think of the software you use every day that isn't online, chances are it is written in C++ (or C or Objective-C). It's not written in Java, Python, Ruby, Perl, etc. While packaged software is becoming less prevalent, it's still used a whole lot and isn't going away anytime soon.
Operating systems, whether embedded or for the desktop are almost exclusively written in some C language. Attempts have been made to write them in Java or C#, but none of those have worked out. All of your drivers are written in some C variant as well. I don't see this changing any time soon.
Java is great until you have to start caring about the details it abstracts away. Most people don't need to do that.
My advice is to master whichever language you enjoy more. I know C++ and Java well, and have used both professionally. However, I would never accept another C++ job because I find C++ tedious and mind-numbing. I'd rather invest my time mastering more interesting languages.
I've been writing in C++ primarily for internal and external desktop applications within manufacturing and telecom companies for a long time, and only worked full time with Java for about 1 year out of 16.
Java never seemed to take over the desktop the way people were predicting back in the mid-90s when it first came out, and now C++ seems to have more competition from .NET and AJAX/web applications.
I can't really speak for its use in back-end systems, but I have seen more of it there than I've seen C++, so it depends on what kind of programming you are talking about (e.g. user-facing vs. internal systems).
Well I suggest learning more than a couple languages. I work in the communications industry and yes we use C++, however a lot of projects we are working on are in Java as well.
I suggest you learn:
1) C++ / C
2) Java
3) Erlang or Haskell
4) Ruby
This will help you understand concepts that exist in one language but not in another. Also helps with marketability.
Take a look at the TIOBE programming index and you will get a feel for which languages are on the rise.
C++ is heavily used in embedded systems.
It is also used in UI development. For example, the well-known Qt framework is written in C++.
Ultimately it depends on what you want to do. If you want to work in embedded systems, better learn C/C++.
I use C and C++ regularly at my job where I work on embedded devices. We're not talking super low level stuff, here, either. I do mid-level and UI work. Think Qt, Gtk+, Windows Mobile SDK, etc.
C++ is heavily used by ISVs. Think Microsoft, Adobe, Oracle, Sun, Symantec...
The game industry looks like it will stay with C++ forever.
I had to scroll all the way to the bottom to finally see a post mentioning video games. C and C++ are huge in that industry.
As for the "What should I learn?" question, that's pretty open-ended; you want to learn as much as possible. So yes, you should learn both of them. As a few people have said, C# is a good one too. You should also learn Python, Ruby, or PHP and some web frameworks (Rails, Symfony, Django, ASP.NET). Web apps are pretty big these days.
If your bubble consists of only C++ you aren't going to be very marketable.
EDIT: Oops, I guess games were mentioned a lot more than I initially saw in here. My bad.
In my career I've developed C++ for:
Server applications (Securities industry)
Embedded software (Wireless telecoms)
Desktop application software (Wireless telecoms)
I've also seen a lot of adverts for C++ developers in the defense industry. There is work out there for C++ developers, not as much as there has been in the past, but still plenty of work.
That said, it wouldn't hurt to learn Java; it is a good skill to have under your belt.
Closed 10 years ago.
On a recent Java project, we needed a free Java based real-time data plotting utility. After much searching, we found this tool called the Scientific Graphics Toolkit or SGT from NOAA. It seemed pretty robust, but we found out that it wasn't terribly configurable. Or at least not configurable enough to meet our needs. We ended up digging very deeply into the Java code and reverse engineering the code and changing it all around to make the plot tool look and act the way we wanted it to look and act. Of course, this killed any chance for future upgrades from NOAA.
So what free or cheap Java based data plotting tools or libraries do you use?
Followup: Thanks for the JFreeChart suggestions. I checked out their website and it looks like a very nice data charting and plotting utility. I should have made it clear in my original question that I was looking specifically to plot real-time data. I corrected my question above to make that point clear. It appears that JFreeChart support for live data is marginal at best, though. Any other suggestions out there?
I've had success using JFreeChart on multiple projects. It is very configurable. JFreeChart is open source, but they charge for the developer guide. If you're doing something simple, the sample code is probably good enough. Otherwise, $50 for the developer guide is a pretty good bargain.
With respect to "real-time" data, I've also used JFreeChart for these sorts of applications. I did have to create some custom data models with appropriate synchronization mechanisms to avoid race conditions, but it wasn't terribly difficult, and JFreeChart would still be my first choice. As the FAQ suggests, though, JFreeChart might not give you the best performance if that is a big concern.
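To illustrate the kind of synchronized data model mentioned above, here is a minimal sketch in plain Java. It is not JFreeChart code and not the model I actually used: the class `SlidingWindowBuffer` and its methods are hypothetical names. The idea is that the acquisition thread appends samples while the plotting thread only ever reads a copied snapshot, so the renderer never sees a half-updated buffer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical thread-safe sliding window of samples (not part of JFreeChart). */
public class SlidingWindowBuffer {

    /** One immutable (time, value) sample. */
    public static final class Sample {
        public final long timeMillis;
        public final double value;

        public Sample(long timeMillis, double value) {
            this.timeMillis = timeMillis;
            this.value = value;
        }
    }

    private final Deque<Sample> samples = new ArrayDeque<>();
    private final int capacity;

    public SlidingWindowBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Called from the data-acquisition thread; drops the oldest sample when full. */
    public synchronized void add(long timeMillis, double value) {
        if (samples.size() == capacity) {
            samples.removeFirst();
        }
        samples.addLast(new Sample(timeMillis, value));
    }

    /** Called from the plotting thread; returns a copy taken under the same lock. */
    public synchronized List<Sample> snapshot() {
        return new ArrayList<>(samples);
    }

    public static void main(String[] args) throws InterruptedException {
        SlidingWindowBuffer buffer = new SlidingWindowBuffer(100);

        // Simulated acquisition thread producing 1000 samples.
        Thread producer = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                buffer.add(i, Math.sin(i / 10.0));
            }
        });
        producer.start();
        producer.join();

        // The plotting side works on its own copy of the last 100 samples.
        List<Sample> view = buffer.snapshot();
        System.out.println(view.size() + " samples in the window");
    }
}
```

In a real application, the snapshot would be copied into whatever dataset the chart uses, on the UI thread. Copying on every refresh costs some allocation, which is one reason a bounded window helps.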
I just ran into a similar issue (displaying fast-updating data for engineering purposes), and I'm using JChart2D. It's pretty minimalist and has a few quirks but it seems fairly fast: I'm running a benchmark speed test where it's adding 2331 points per second (333x7 traces) to a strip chart and uses 1% of the CPU on my 3GHz Pentium 4.
Live Graph supports real-time rendering.
I'm using GRAL for real-time plotting. It's an LGPL Java library. Although it's not as powerful as JFreeChart it has a nicer API. I got a plot up and running in very short time. They also ship a real-time plotting example.
I found this question when I was googling for open source plotting libraries for Java. I wasn't quite happy with the answers posted here, so I did some further research on the issue.
Although this question was posted back in 2008, this might still be interesting to someone.
Here is a list of Open Source Charting & Reporting Tools in Java
http://autoplot.org/ allows for real-time updates and can be used to create many types of scientific plots.
To update the plot, specify the URL to a data file and then append &filePollUpdates=1&tail=100. See the example at http://autoplot.org/cookbook#Loading_Data
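As a rough sketch of building such an address programmatically, the helper below appends the two parameters quoted above. The class name, the method name, and the example.com data URL are placeholders, and the choice of `?` when the address has no query part yet is an assumption based on ordinary URI syntax, so check the cookbook for the exact form Autoplot expects.

```java
public class AutoplotPollingUri {

    /**
     * Appends the filePollUpdates and tail parameters to a data file address.
     * Assumption: the first parameter is introduced with '?', later ones with '&'.
     */
    public static String withPolling(String dataUri, int tail) {
        String separator = dataUri.contains("?") ? "&" : "?";
        return dataUri + separator + "filePollUpdates=1&tail=" + tail;
    }

    public static void main(String[] args) {
        // Placeholder address, for illustration only.
        String uri = withPolling("http://example.com/live.dat?param=value", 100);
        System.out.println(uri);
    }
}
```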
Waterloo Scientific Graphics is a new LGPL project. Data objects are observable and could be updated in a real time plotting scenario.
For details see http://waterloo.sourceforge.net/
The library I wrote, Plot4j, also supports real-time plotting.
I used JFreeChart (http://www.jfree.org/jfreechart/) on a previous project. It has some very good built-in capabilities, and the design was WAY extensible so you could always roll your own extension later if you needed some custom chart annotation or wanted an axis to render differently, or whatever. It's definitely worth checking out.
Check ILOG's JViews - they have a lot of stuff and something might fit your needs. All of them are extremely configurable and quite fast. Not free though.
I've used JFreeChart in a rather complex application that needed to visualize data streams and calculations based on the data. We implemented the ability to visually edit the data plots by mouse and had a very large set of data points. JFreeChart handled it very well.
Unfortunately I was stuck with v0.7, but the newest releases are so much better when it comes to API clarity. The community is very helpful, and the developers respond to emails too.
If you're doing a web application and don't want to bother with libraries, you can check out the Google Chart API. I haven't used it in a real project myself, but I started some tests, which were very promising.
For real-time plotting you can use QN Plot, JOpenChart or its fork Openchart2.
JHandles is an alternative graphics package for Octave (a math package). It is probably worth looking into, but being Octave specific may not have what you need.
-Adam
PtPlot may be a good choice. Formerly called Ptolemy.
jcckit can handle real-time plotting. It's a bear to use though.
I forked it, and made a very simple wrapper around it for non-realtime plotting. The underlying complicated interface can be used directly too.
https://bitbucket.org/hughperkins/easyjcckit
You might want to check out JMathPlot