I've been doing some programming in Java and some in C but now I need to sort of use both together.
Here's the situation: I'm using Hadoop/HBase to process and store a lot of data, but I'm using C/CUDA to do number crunching on the data. Is there a stable/mature/common way to take data (it's basically a log file) in Java and pass it to a C program, which processes the data and stores the result as a linked list that is then accessible by the Java app?
I might not be searching for the right thing, but so far I found JavaCPP, which is good but seems to tie both programs together. Because Java handles the data flow and C handles the processing of the data, I thought it might be better to keep them as independent programs that communicate with each other, as opposed to a single program that may become confusing. But I'm totally flexible, so any suggestions/solutions are welcome.
You may find it easier to keep the programs testable and clear if you leave them separate and then use a client-server approach, or simply choose a common file format and have the latter steps poll the output directory for new files to process.
To make it easier to define file formats across different languages, consider a package like Apache Thrift or Google Protocol Buffers.
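The polling variant could look roughly like the sketch below. It assumes the producer drops finished files with a `.log` suffix into a shared directory; the suffix, the one-second interval, and the `DirectoryPoller` class are illustrative choices, not something the question specifies.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: remembers which files it has already handed out.
class DirectoryPoller {
    private final Path dir;
    private final Set<Path> seen = new HashSet<>();

    DirectoryPoller(Path dir) {
        this.dir = dir;
    }

    // Returns the *.log files that were not returned by an earlier call.
    List<Path> pollOnce() throws IOException {
        List<Path> fresh = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.log")) {
            for (Path p : stream) {
                if (seen.add(p)) { // add() is false for files we already saw
                    fresh.add(p);
                }
            }
        }
        return fresh;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: the shared output directory (assumed to exist)
        DirectoryPoller poller = new DirectoryPoller(Paths.get(args[0]));
        while (true) {
            for (Path p : poller.pollOnce()) {
                System.out.println("new file: " + p);
            }
            Thread.sleep(1000); // poll interval is arbitrary
        }
    }
}
```

One thing to watch with this design: the producer should write to a temporary name and rename the file once it is complete, otherwise the poller may pick up a half-written file.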
Here's what I have off the top of my head:
1. Run the C program from the Java app via the command line.
2. Use JNI/JNA
3. Implement your own "client-server" architecture. It sounds complicated but in some cases it may be the best and the simplest solution.
4. Communicate using a web service: SOAP, REST, whatever.
I hope this is helpful as a start.
You are welcome to ask more specific questions once you have them.
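As a rough sketch of option 1, the Java side can start the external program, feed it data on stdin and collect what it prints on stdout. `ExternalFilter` and the `./cruncher` executable name are placeholders made up for this example; the real C program would need to read stdin and write its results to stdout for this to apply.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

class ExternalFilter {
    // Runs the command, sends `input` to its stdin and returns everything it wrote to stdout.
    static String run(List<String> command, String input)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // child's stderr goes to ours
        Process process = pb.start();

        // Feed stdin from a separate thread so that a full pipe buffer
        // cannot block us while the child is waiting for its output to be read.
        Thread writer = new Thread(() -> {
            try (OutputStream stdin = process.getOutputStream()) {
                stdin.write(input.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                // The child closed its stdin early; nothing more to send.
            }
        });
        writer.start();

        String output;
        try (InputStream stdout = process.getInputStream()) {
            output = new String(stdout.readAllBytes(), StandardCharsets.UTF_8);
        }
        writer.join();
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("command exited with status " + exit);
        }
        return output;
    }

    public static void main(String[] args) throws Exception {
        // "./cruncher" is a placeholder for the compiled C program.
        System.out.print(run(List.of("./cruncher"), "line one\nline two\n"));
    }
}
```

This keeps the two programs fully independent: the C side can be tested from a shell with plain files, and the Java side can be tested against any stand-in command.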
Related
I'm curious to know if it's possible to fetch data from other applications into my Java program. I know Java runs in a virtual environment (the JVM) and therefore has problems communicating with other applications, unless you use the Robot class or something similar.
What I would like to do, for starters and for educational purposes, is to take, let's say, a music application like Spotify/iTunes, fetch the playlist (text data) and send/display it in a text file. I've tried a few things so far, and the closest I've come is using the Robot class: opening the application, doing Ctrl+A, pasting it into a text document and so forth, but that's more like a macro. I would like to make a Java application that would do this automatically. Is that possible in any sense with Java, or are you just better off changing languages? I wish to do it with Java though, because it's the language I've studied for the past year and what I'm trying to master. (Sorry for the long explanation.)
It's nothing to do with Java and the JVM. Any language has the same problems and solutions for this sort of situation.
The thing you need to talk to has to provide you a way to talk to it. You need to talk to it using that way.
Methods include pipes, custom network protocols, SOAP and RESTful web services, etc.
Just because an application runs in a virtual machine doesn't mean it cannot access external data through an API provided by an external program. For example, iTunes has a COM-based API for accessing playlists, and here is an example of using it from C#. You'll need something which allows accessing COM objects from Java.
(Please note I know nothing about this topic, this is just what I found with a little searching...)
You need to know which data interfaces the application you want to connect to provides.
As an example, maybe the application writes files to disk or stores some info in a database.
Then with Java you can read files, query the database, use an API (web services, REST, etc..) and so on...
I'm currently developing a GUI for a Java-application that I've created. I would like to keep the GUI in a separate process from the rest of the client. The rationale behind this is:
Reduced risk of crashing. E.g. an OutOfMemoryError in the GUI won't crash the rest of the client.
I get an API "for free". If I later on want to allow other people to programmatically access the client I can just let them use the same API that the GUI is using.
I'm writing the GUI in SWT and the client is created using IntelliJ. Since Eclipse has a lot better SWT-support it makes sense to keep them separate, so that I can use Eclipse for the GUI-code and IntelliJ for the rest.
My question is now: what technology should I use to expose the client's interface to the GUI? RMI is what came to mind first. However, that has the disadvantages of restricting the API to be Java only. I'm also not sure about how well suited RMI is for large scale deployment (e.g. how does it handle local firewalls?). However, I don't want to rule it out just yet.
I should also mention that I have some deployment-requirements as well:
Must be able to run as non-admin.
Must be able to handle restrictive local firewalls.
Deployment must be automatic (it's a large scale consumer-app) and work on Windows, Mac OS X and Linux.
Given these restrictions, what solution would you use?
I faced this same situation a while back, except that the back-end was in Python and the GUI was in Java.
Important points to consider:
how flexible and granular the interface between the GUI and the back-end needs to be. Do you want to be able to do one thing from the GUI? 5 different things? 10? 50? How tightly coupled is the GUI -- will it know about/be calling individual methods in the back-end?
how the output gets from the back-end to the GUI. Can it simply write to STDOUT or to temp files? Does it need something more elaborate?
the format of the output. Ideally, it should be easily parseable, which indicates XML or JSON may be your best bet.
You might find JSON-RPC useful: it's a standard for remote method calls to separate programs.
All in all, it's hard for me to say what would be best for you. I ended up avoiding RPC, and gave the back-end a simple command-line interface; output was written to temp files and STDERR, as JSON objects. I feel that this was a good decision because it kept the interface between the programs very simple and uncoupled.
Sorry in advance if this is a basic question. I'm reading a book on HBase and learning, but most of the examples in the book (as well as online) tend to use Java (I guess because HBase is native to Java). There are a few Python examples and I know I can access HBase with Python (using Thrift or other modules), but I'm wondering about additional functions.
For example, HBase has a 'coprocessors' feature that pushes the data to where you're doing your computing. Does this work with Python or other apps that are using streaming Hadoop jobs? It seems that with Java, it can know what you're doing and manage the data flow accordingly, but how does this work with streaming? If it doesn't work, is there a way to get this type of functionality (via streaming, without switching to another language)?
Maybe another way of asking this is..what can a non-java programmer do to get all the benefits of the features of hadoop when streaming?
Thanks in advance!
As far as I know, you are talking about two (or more) totally different concepts.
"Hadoop Streaming" is there to stream data through your executable (independent of your choice of programming language). When using streaming there can't be any loss of functionality, since the functionality is basically to map/reduce the data you are getting from the Hadoop stream.
For the Hadoop part you can even use the Pig or Hive big-data query languages to get things done efficiently. With the newest versions of Pig you can even write custom functions in Python and use them inside your Pig scripts.
Although there are tools that let you use the language you are comfortable with, never forget that the Hadoop framework is mostly written in Java. There could be times when you need to write a specialized InputFormat, or a UDF inside Pig, etc. Then a decent knowledge of Java would come in handy.
Your "HBase coprocessors" example is unrelated to the streaming functionality of Hadoop. HBase coprocessors consist of two parts: a server-side part and a client-side part. I am pretty sure some useful server-side coprocessors ship with HBase releases; other than that, you would need to write your own coprocessor (and bad news: it's Java). For the client side, I am sure you would be able to use them from your favorite programming language through Thrift without too much trouble.
So as an answer to your question: you can always dodge learning Java and still use Hadoop to its potential (using third-party libraries/applications). But when shit hits the fan, it's better to understand the underlying code and to be able to develop in Java. Knowing Java would give you full control over the Hadoop/HBase environment.
Hope you find this helpful.
Yes, you should get data-local code execution with streaming. You do not push the data to where the program is; you push the program to where the data is. Streaming simply takes the local input data and runs it through stdin to your Python program. Instead of each map running inside a Java task, it spins up an instance of your Python program and just pumps the input through that.
If you really want fast processing, you really should learn Java though. Having to pipe everything through stdin and stdout is a lot of overhead.
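To make the stdin/stdout contract concrete, here is a minimal sketch of a streaming-style mapper: it reads records line by line and emits tab-separated key/value lines, which is the default convention Hadoop Streaming expects. It is written in Java only to match the other examples in this thread; the same shape applies to a Python script. The class name and the word-count logic are illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class StreamingWordMapper {
    // Reads input records line by line and emits "word<TAB>1" for each token.
    static void map(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        PrintWriter writer = new PrintWriter(out);
        String line;
        while ((line = reader.readLine()) != null) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) { // blank lines split into one empty token
                    writer.print(word + "\t1\n");
                }
            }
        }
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        // When run under streaming, the framework supplies stdin and consumes stdout.
        map(new InputStreamReader(System.in, StandardCharsets.UTF_8),
            new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
    }
}
```

Because the mapper is just a filter over standard streams, it can be tried locally by piping a sample file into it before submitting a job.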
We have two code bases, one written in C++ (MS VS 6) and another in Java (JDK 6).
Looking for creative ways to make the two talk to each other.
More Details:
Both applications are GUI applications.
Major rewrites or translations are not an option.
Communications needs to be two-way.
Try to avoid anything involving writing files to disk.
So far the options considered are:
ZeroMQ
RPC
CORBA
JNI
Compiling Java to native code, and then linking
Essentially, apart from the last item, this boils down to a choice between various ways to achieve interprocess communication between a Java application and a C++ application. Still open to other creative suggestions!
If you have attempted this, or something similar before please chime in with your suggestions, lessons learnt, pitfalls to avoid, etc.
Someone will no doubt point out shortly, that there is no one correct answer to this question. I thought I would tap on the collective expertise of the SO community anyway, and hope to get many excellent answers.
Well, it depends on how tightly integrated you want these applications to be and how you see them evolving in the future. If you just want to communicate data between the two of them (e.g. you want one to be able to open a file written by the other, or read a stream directly from the other), then I would say that protocol buffers are your best bet. If you want the window rendered by one of these GUI apps to actually be embedded in a panel of the other GUI app, then you probably want to use the JNI approach. With the JNI approach, you can use SWIG to automate a great deal of it, though it is dangerously magical and comes with a number of caveats (e.g. it doesn't do so well with function overloading).
I strongly recommend against CORBA, RMI, and similar remote-procedure-call implementations, mostly because, in my experience, they tend to be very heavyweight and consume a lot of resources. If you do want something similar to RMI, I would recommend something lighter weight where you pass messages, but not actual objects (as is the case with RMI). For example, you could use protocol buffers as your message format, and then simply serialize these back and forth across normal sockets.
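A socket is a plain byte stream with no message boundaries, so passing messages over it needs some framing. One common choice, sketched below as an assumption rather than anything the answer prescribes, is a 4-byte length prefix in front of each payload. The payload is treated as opaque bytes here; in practice it would be the serialized protocol buffer message. The in-memory streams stand in for the socket's streams, and `LengthPrefixedFraming` is a made-up name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class LengthPrefixedFraming {
    // Writes one message: a 4-byte big-endian length followed by the payload bytes.
    static void writeMessage(OutputStream out, byte[] payload) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(payload.length);
        data.write(payload);
        data.flush();
    }

    // Reads exactly one message written by writeMessage.
    static byte[] readMessage(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt();
        if (length < 0) {
            throw new IOException("negative message length: " + length);
        }
        byte[] payload = new byte[length];
        data.readFully(payload); // blocks until the whole payload has arrived
        return payload;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream(); // stands in for a socket stream
        writeMessage(wire, "first".getBytes(StandardCharsets.UTF_8));
        writeMessage(wire, "second".getBytes(StandardCharsets.UTF_8));

        InputStream in = new ByteArrayInputStream(wire.toByteArray());
        System.out.println(new String(readMessage(in), StandardCharsets.UTF_8));
        System.out.println(new String(readMessage(in), StandardCharsets.UTF_8));
    }
}
```

The C++ side would have to read the same 4-byte big-endian length before reading the payload, so both ends need to agree on that layout.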
Kit Ho mentioned XML or JSON, but protocol buffers are significantly more efficient than either of those formats and also have notions of backwards-compatibility built directly into the definition language.
Use Jacob ( http://sourceforge.net/projects/jacob-project ), JCom ( http://sourceforge.net/projects/jcom ), or j-Interop ( http://j-interop.org ) and use COM for communication.
Since you're using Windows, I'd suggest using DDE (Dynamic Data Exchange). There's a Java library available from Java Parts.
I don't know how much data, or what type of data, you want to transfer.
But to keep things simple, I suggest using XML or JSON over HTTP.
There are lots of libraries for both languages, so you won't spend much effort implementing or understanding it.
Also, if you have additional applications to talk to later, that won't be hard, since both technologies are cross-language.
Correct me if I am wrong.
I know of at least one post with a similar title, but this is not exactly the same question. I'm trying to work out a way to "share" data between a .NET and a Java application. I'm not concerned about objects, just plain strings if you like.
I have a .NET application capturing real-time data and a Java application which has capability to analyze and work on this data. I'm looking for ways to re-use this same java app without coding it entirely in .NET.
My problem is that the data is "fairly" REAL-Time (.NET), and so has to be the analysis (Java). I can live with microsecond delays but I can't afford one second delay. WebServices, Queues (as in Messaging Queues), RDBMS are some of the options I can think of. Is there any better way?
Or has anybody got some real performance numbers for the solutions I mentioned above, to help select one of them? And just to get started: RDBMSs are not "THAT" good at concurrent (connections doing) inserts/updates/reads, at least with the crude way of doing DBMS stuff. (Deadlocks?)
What are "objects" if not a mechanism for describing "data"? But I digress - I suspect I would look at a TCP socket between the two. If the data is very basic, then fine - just write directly to the stream; if there is any complexity, perhaps use something like "protocol buffers" to provide an easy way of reading/writing dense data to a stream without having to write every last byte yourself.
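For the "just write directly to the stream" case with plain strings, a newline-delimited exchange over a loopback socket might look like the sketch below. Both ends are Java here so the example runs on its own; in the real setup the sender would be the .NET application. `LineSocketDemo`, the sample lines and the ephemeral port are illustrative assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LineSocketDemo {
    // Receiving side: accept one connection and collect lines until the peer closes it.
    static List<String> receiveLines(ServerSocket server) throws IOException {
        List<String> lines = new ArrayList<>();
        try (Socket socket = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Sending side: one UTF-8 line per message, flushed immediately.
    static void sendLines(int port, List<String> lines) throws IOException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8))) {
            socket.setTcpNoDelay(true); // do not hold back small writes
            for (String line : lines) {
                out.print(line + "\n");
                out.flush();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Port 0 asks the OS for any free port; a real deployment would fix one.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            int port = server.getLocalPort();
            Thread sender = new Thread(() -> {
                try {
                    sendLines(port, List.of("tick 1", "tick 2"));
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            sender.start();
            System.out.println(receiveLines(server));
            sender.join();
        }
    }
}
```

Newline framing only works while the strings themselves contain no line breaks; anything richer is where a format like protocol buffers starts to pay off.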
I think microsecond delays are going to be a challenge for any approach here... will millisecond delays do?
For completeness:
Another possibility is to use named pipes. It should be pretty quick, and I'd imagine (being a Java guy I can only imagine) that .NET has native support for them. The downside is that on Windows you'll have to either write a JNI extension or use a library like JNA to poke around at the Win32 API from Java.
Sounds like a local socket could do. The latency should be in low ms or less.
Depending on your program, you may get some mileage out of what @Cowan reports in his answer to 'Any concept of shared memory in Java'.
In summary: he says that you can use memory-mapped files between two processes on the same machine. This could in theory work between .NET and Java, assuming .NET has some memory-mapped file support.
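On the Java side, the memory-mapped file approach could be sketched as follows. Two mappings of the same file inside one JVM stand in for the two processes; the file name, region size and `SharedMappingDemo` class are assumptions for illustration. Java does not formally guarantee when changes become visible through another mapping, although mainstream operating systems share the pages.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class SharedMappingDemo {
    static final int SIZE = 64; // arbitrary size of the shared region, in bytes

    // Maps the first SIZE bytes of the file read/write, growing the file if needed.
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, SIZE);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("shared", ".bin"); // both sides must agree on this path
        MappedByteBuffer writer = map(file);
        MappedByteBuffer reader = map(file); // stands in for the other process

        writer.putLong(0, 42L);               // written big-endian by default
        System.out.println(reader.getLong(0));
    }
}
```

Two caveats with this design: both sides have to agree on the byte order and layout of the region, and a mapped file provides no signalling by itself, so the reader still needs some way to learn that new data has arrived.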
Different machines communicate with each other by sending messages into sockets. Please check the below link for example.
Socket programming in the real world
The answers provided here are great. One idea that might be of interest, but is probably more trouble than it's worth, is to load both VMs in a single process (both the JVM and the CLR can be loaded within a native Windows application) and give them access to native code: Java via JNI, and .NET via the native-code mapping functions it provides.
You could also leverage native queue semaphores to wake up a thread on one side or the other when data is updated.
While JNI transitions are expensive, they would probably still be faster than the native local socket implementation.
How is your Java application currently deployed? It sounds to me like you're willing to make some modification to it, so I'm assuming you have access to the source code.
I know this is a little out there, but could you compile the Java application in the J# compiler, so that your .NET app has native access to it?
You can convert your compiled Java application to .NET with IKVM. After that you can change the logic of your .NET application so that it no longer transfers data to the Java application, but just calls the data processing code written in Java as if it were written and compiled for .NET.
There are a number of JMS servers which support .NET and Java clients. These can deliver messages in under a millisecond.
However you might like to try an RPC solution like Hessian RPC or Protobuf RPC. These can achieve lower latencies and can give the appearance of direct calls between platforms. These support .NET and Java as well.