There is RMI, which I understand to be relatively fragile; direct socket connections, which are rather low-level; and plain Strings, which, while about as solid as it gets, seem to be the metaphorical PHP.
What less basic options do I have for internet-based client/server communication? What are the advantages/disadvantages? What concerns should I take into consideration? Third-party library suggestions are fine, as long as they stay platform independent (i.e. no restrictive native code).
Looking for options, not a definitive answer, so I'm leaving the details of my own requirements blank.
As you specified "internet based", there's a lot to be said for an HTTP-based, RESTful approach (I've highlighted some concerns you should consider):
Pros:
You can use/abuse one of the myriad web-tier frameworks for the server side (e.g. Spring MVC, Play!)
The low-level work has already been done on the client side (Apache HttpClient); see the client sketch below the cons list
Plain-text protocol is easy to debug on the wire
Tons of tools available to help you debug the interactions (e.g. SoapUI) - you can pretend to be client OR server and so develop in isolation until the other end is ready
Using a well-known port (80/443) makes punching through corporate firewalls a whole lot easier
Cons:
There's a fairly major assumption that the server will be doing the lion's share of the work - if your model is "inverted" then it might not make much sense to be RESTful
Raw performance will be lower than a bits-on-the-wire socket-based approach
Plain-text protocol is easy to sniff on the wire (SSL can remedy this)
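For a sense of how little client code the REST route takes, here is a minimal sketch using Apache HttpClient 4.x; the endpoint URL and resource path are invented:

```java
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class RestClientSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint - substitute your own service URL.
        HttpGet get = new HttpGet("https://example.com/api/orders/42");
        get.setHeader("Accept", "application/json");

        try (CloseableHttpClient client = HttpClients.createDefault()) {
            // The ResponseHandler consumes the entity and releases the connection.
            String body = client.execute(get, resp -> EntityUtils.toString(resp.getEntity()));
            System.out.println(body);
        }
    }
}
```

The same plain-text request is what you would replay in a tool like SoapUI or curl while developing each side in isolation.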
What does 'relatively fragile' mean? The issues with RMI have to do with it being a large superstructure on a narrow foundation, and specifically that it has a large reliance on DNS and Object Serialization. The closer you are to the silicon, the less 'fragile' any program becomes, but the more code you have to write. It's a tradeoff. I'm not a one-eyed RMI supporter, despite having written a book about it, but 'fragility' is too strong a word. It does what it does, and it does that reasonably well. RMI/IIOP does it even better in many respects if you need large-scale scalability. If your view of the world is an at-most-once remote method call without too much security, RMI/JRMP will give it to you. The further you get from that model the harder it becomes to apply.
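For reference, the at-most-once remote call model described above takes roughly this shape in plain RMI/JRMP (a minimal single-JVM sketch; the interface and names are invented):

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

public class RmiSketch {
    // The shared contract: every remote method must declare RemoteException.
    public interface Greeter extends Remote {
        String greet(String name) throws RemoteException;
    }

    public static void main(String[] args) throws Exception {
        // Server side: export an implementation and register the stub by name.
        Greeter impl = name -> "Hello, " + name;
        Greeter stub = (Greeter) UnicastRemoteObject.exportObject(impl, 0);
        Registry registry = LocateRegistry.createRegistry(1099);
        registry.rebind("greeter", stub);

        // Client side (normally a separate JVM): look up the stub and call it.
        Greeter remote = (Greeter) LocateRegistry.getRegistry("localhost", 1099)
                                                 .lookup("greeter");
        System.out.println(remote.greet("world"));
    }
}
```

Note how the registry lookup and the stub both lean on the DNS and Object Serialization foundations mentioned above; that is where the tradeoff lives.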
I'm currently considering using Java in one of my projects (for reasons unrelated to networking). At the moment I'm using C++ and a custom protocol built on top of UDP. My problem here is that while the added efficiency is nice for sending large amounts of real-time data, I'd rather have something along the lines of RPCs for pure "logic actions" such as login. RPCs are hard to do in C++, though, since standard C++ itself has no notion of serialization.
In another answer I found Java's RMI, which seems to be similar to RPC, but I couldn't find out how efficient/responsive it is, nor whether it could be plugged into my existing UDP socket, since I don't want to have two ports open on my server program.
Alternatively, since Java has serialization, I could implement the RPCs myself, depending on how straightforward deserializing an arbitrary stream of objects is in Java. Still, if this required me to spend days learning the intricacies of Java, it wouldn't be an option for me.
If you're interested in RPC, there is always XML-RPC and JSON-RPC, both of which have free/open-source C++ implementations. Unfortunately, most of my development has been in Java, so I can't speak to how usable or effective they are, but it might be something to look into since it sounds like you have already done some work in C++ and are comfortable with it. They also have Java implementations, so you might even be able to support both Java and C++ applications with XML-RPC or JSON-RPC, if you want to go down that route.
The only downside is that it looks like most of these use HTTP connections, and one of the things you wanted to do was to reuse the existing connection. Now, I haven't looked at all of the implementations, but the two that I looked at might not meet that requirement. Worst case is that perhaps you can get some ideas. Best case is that there might be another implementation out there somewhere that does what you need, and you now have a starting point to find it.
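By way of illustration from the Java side, a call through the Apache XML-RPC 3 client library looks roughly like this (a sketch; the endpoint URL and method name are hypothetical):

```java
import java.net.URL;
import org.apache.xmlrpc.client.XmlRpcClient;
import org.apache.xmlrpc.client.XmlRpcClientConfigImpl;

public class XmlRpcSketch {
    public static void main(String[] args) throws Exception {
        XmlRpcClientConfigImpl config = new XmlRpcClientConfigImpl();
        config.setServerURL(new URL("http://localhost:8080/xmlrpc")); // hypothetical endpoint
        XmlRpcClient client = new XmlRpcClient();
        client.setConfig(config);

        // Call a hypothetical remote method; parameters travel as an Object[].
        Object token = client.execute("auth.login", new Object[] { "user", "secret" });
        System.out.println("Session token: " + token);
    }
}
```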
Using RPC as an abstraction does not preclude using UDP as the transport layer: RMI is an RPC abstraction that generally uses TCP under the hood (last time I looked).
I'd suggest just coding up a Java layer to talk your UDP protocol: you can use any one of many libraries to do it, and you don't have to discard your existing work. If you want to wrap an RPC layer around your protocol, there's no reason why you can't: create a login method that sends the login UDP packet, receives the appropriate response, and returns it.
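A minimal sketch of what such a wrapper could look like over java.net.DatagramSocket; the "LOGIN" packet layout and the timeout are invented conventions, not part of any real protocol:

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class UdpLoginClient {
    private final DatagramSocket socket;
    private final InetAddress server;
    private final int port;

    public UdpLoginClient(String host, int port) throws Exception {
        this.socket = new DatagramSocket();   // one local port for everything
        this.socket.setSoTimeout(2000);       // don't block forever on a lost datagram
        this.server = InetAddress.getByName(host);
        this.port = port;
    }

    // RPC-style wrapper: send a login request, block for the response.
    public String login(String user, String pass) throws Exception {
        byte[] request = ("LOGIN " + user + " " + pass).getBytes(StandardCharsets.UTF_8);
        socket.send(new DatagramPacket(request, request.length, server, port));

        byte[] buf = new byte[512];
        DatagramPacket response = new DatagramPacket(buf, buf.length);
        socket.receive(response);             // throws SocketTimeoutException on loss
        return new String(response.getData(), 0, response.getLength(), StandardCharsets.UTF_8);
    }
}
```

Since UDP gives no delivery guarantees, a real version would need retries or sequence numbers; that is exactly the at-most-once bookkeeping RPC frameworks do for you.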
If it's a remotely serious project, you should probably take a look at Netty.
It's a great library for developing networked systems, has a lot of proven production usage and is well suited for things like TCP or UDP client-server communication. I wouldn't go reinventing this wheel unless you really have to :-)
As a bonus they have some good examples and documentation too.
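To give a flavor of the API, here is a minimal Netty 4 echo server (a sketch; the port is arbitrary):

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.*;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

public class NettyEchoServer {
    public static void main(String[] args) throws Exception {
        EventLoopGroup boss = new NioEventLoopGroup(1);   // accepts connections
        EventLoopGroup workers = new NioEventLoopGroup(); // handles traffic
        try {
            ServerBootstrap b = new ServerBootstrap()
                .group(boss, workers)
                .channel(NioServerSocketChannel.class)
                .childHandler(new ChannelInitializer<SocketChannel>() {
                    @Override
                    protected void initChannel(SocketChannel ch) {
                        ch.pipeline().addLast(new ChannelInboundHandlerAdapter() {
                            @Override
                            public void channelRead(ChannelHandlerContext ctx, Object msg) {
                                ctx.writeAndFlush(msg); // echo bytes straight back
                            }
                        });
                    }
                });
            b.bind(8080).sync().channel().closeFuture().sync();
        } finally {
            boss.shutdownGracefully();
            workers.shutdownGracefully();
        }
    }
}
```

Swapping the channel class and handlers gets you UDP, TLS, HTTP codecs, and so on, which is what makes it hard to justify hand-rolling the same machinery.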
Why did web services win out over CORBA?
I suspect everything started with firewall issues: CORBA requests are binary, and multiple random ports are required for normal operation, so CORBA requests and responses tended to be blocked by firewalls when these first appeared. HTTP and FTP also use dynamically assigned ports, but those protocols were in much wider use, so it was immediately obvious that firewalls had to be configured to allow them. As a result, developers could not rely on being able to establish a CORBA connection between the server and an end user's PC, and needed a more firewall-friendly approach.
Firewalls are much less of an issue for communication between specialized servers, which can use separate networks, IP/MAC filtering, dedicated firewalls, and the like. I think CORBA, like JDBC, is still used to move data between servers.
It may also be a factor that CORBA messages use aligned fields (to match the boundary alignments of C/C++ data structures). Derived protocols (like Google Protocol Buffers) do not send unnecessary bytes just for alignment, so their messages are compact, and such protocols may be preferred when binary messages and fast pre-generated message parsers are desired. Protocol Buffers, which look to me rather similar to CORBA in design (an IDL-like compiler, stubs and servants, binary messages, language interoperability), are really far from decline, being used internally in many Google services.
While the CORBA framework is complex, a "properly done" web service stack is also not exactly trivial, so I do not think the complexity of the standard has been the issue. Similarly, while the original OMG specification documents may appear horrible, the corresponding SOAP/WSDL specifications are equally complex; it is probably just difficult to document a standard in an easy-to-read way.
CORBA protocols are not proprietary; they have been implemented as free software many times, including JacORB and the GNU Classpath implementation (and now OpenJDK is also free).
While initially CORBA may have been thought to provide what web services provide today, I think I agree that, for this application, CORBA has "lost".
However, as an RPC technology that is supported on a wide array of platforms (including embedded ones), works and is tested well from native code (C++), and can be used to implement pretty performant data transfer scenarios, I do think that CORBA still has an important niche. It's just got nothing to do with "the web".
To directly address the question: I like to think that CORBA "lost" because, unlike web services, it is not targeted at the Web as it is used today -- as in: tunneling everything through port 80, running apps 60% in the browser and 40% in a web server, etc.
I don't think it's a stretch to say that web services have won out in the marketplace. CORBA is a niche at best, and a small one at that.
Web services:
Simpler, although WS-* can add weight and complexity
Use HTTP as the wire protocol instead of a proprietary one
Can tunnel through port 80 in firewall
Services aren't as complete
CORBA:
Requires an ORB to operate
Available from proprietary vendors or as open source
May use HTTP, but also uses proprietary protocols
Provide services like naming, directory, transaction, security, etc.
CORBA lost primarily because of two things:
1) lack of good development/testing tools or IDE plugins
2) See (1) again
I'm developing a Java application that consists of a server and a client (possibly multiple clients in future) which may run on different hosts.
For communication between these two I currently use a custom protocol consisting of JSON messages that are sent over network sockets and converted back to Java Bean objects on both sides. However, as the application gets more complex, I notice that this approach doesn't meet my standards and is too complex.
I'm looking for a well established, possibly standardized alternative.
I've looked at Remote Method Invocation (RMI) but read that the protocol is slow (big network overhead).
The technology I'm looking for should be lightweight (protocol- and library-wise), robust, maybe support compression (a big plus if it does!), maybe support encryption, be well documented, and be well established (e.g. an Apache project). It should be as easy as calling a method on a remote object with RMI, but without its disadvantages.
What can you recommend?
Avro is an Apache project designed for cross-language RPC (see Thrift for its spiritual predecessor). It is fairly new (less than two years old), so it isn't as well established as RMI, for example. You should still give it a chance, though; large projects like Cassandra are moving to Avro. It is also a sub-project of Hadoop and has been receiving healthy support from that community.
It is designed to be fast and to support multiple languages, so you will probably need to introduce another step during compilation in which you translate an Avro IDL file into Java, although it isn't strictly necessary. The rest is typical RPC.
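To illustrate that compilation step, an Avro IDL protocol file looks something like this (a sketch modeled on the examples in Avro's documentation; the names are illustrative). The avro-tools compiler turns it into Java interfaces and record classes:

```
@namespace("com.example.rpc")
protocol Mail {
  record Message {
    string to;
    string from;
    string body;
  }
  string send(Message message);
}
```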
One nice thing about Avro is that its transport layers are independent of how data is represented. For example, it comes with various "transceivers" (their base communication class) for raw sockets, HTTP, and even local intra-process calls. HTTPS and SASL transceivers can provide security.
For representing data, there are encoders and decoders of various types, although the default BinaryEncoder generally suffices, since Hadoop, Cassandra, etc. focus on efficiency. There is also a JsonEncoder in case you find that useful.
This really all depends on what kind of compatibility you require between client and server. CORBA is a well established and standardized way of communicating between different languages, but it requires a bit more effort to use than Java RMI. If the clients are running from some external, untrusted source, then an HTTP based protocol makes more sense. If you follow a REST approach, then it becomes easier to scale out later as you need to add more servers.
If both client and server are Java, and they are running within a trusted network, RMI meets your requirements for being "well established". Performance overhead of RMI is exaggerated, but very early versions did not pool connections.
If you're willing to toss away both "well established" and "standardized", you can use Dirmi as a substitute for RMI. It's faster, easier, has more features, and it doesn't have the firewall problems RMI has. Like RMI, it supports TLS (encryption), but neither supports built-in compression.
Whatever you choose, beware of lock-in. Try to design your server such that the remote access layer is a thin layer over the core code. This allows you to easily support multiple protocols, perhaps at the same time.
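One way to read that advice in Java terms (a sketch with invented names): keep the core behind a plain interface, and make each remote-access binding a thin adapter that delegates to it.

```java
// Core logic: a plain Java interface that knows nothing about transports.
interface AccountService {
    double balance(String accountId);
}

class AccountServiceImpl implements AccountService {
    public double balance(String accountId) {
        return 42.0; // placeholder for the real business logic
    }
}

// Thin protocol bindings, each delegating to the same core.
class RmiAccountFacade /* would extend UnicastRemoteObject, implement a Remote interface */ {
    private final AccountService core;
    RmiAccountFacade(AccountService core) { this.core = core; }
    public double balance(String accountId) { return core.balance(accountId); }
}

class HttpAccountFacade /* e.g. a servlet or Spring controller */ {
    private final AccountService core;
    HttpAccountFacade(AccountService core) { this.core = core; }
    public String handle(String accountId) { return String.valueOf(core.balance(accountId)); }
}
```

With this shape, adding or dropping a protocol means writing or deleting one facade, not touching the core.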
Maybe CORBA?
Would you consider HTTP/REST?
If so, you can leverage something like Tomcat/Spring and still support all the requirements you listed (robust, lightweight, well documented, well established).
The RPC-based protocols are simply antiquated.
Seriously, unless you're doing a web app that already requires the web baggage, you really do want RMI or, even better, CORBA. I recommend JacORB (www.jacorb.org).
Ignore general claims of slow/fast and perform your own performance tests.
Keep in mind that a software project is successful because it performs the useful function for which it was designed and intended, not because it uses the latest cool buzzword tech.
Good luck.
The Apache MINA library for client-server communication, together with EJB3, will suit best.
If you needed to build a highly scalable web application using Java, what framework would you use, and why?
I'm just reading Thinking in Java, Head First Servlets, and Manning's Spring framework book, but really I want to focus on highly scalable architectures etc.
Would you use Tomcat, Hibernate, Ehcache?
(Just assume you have to design for scale; I'm not looking for 'worry about it when you get traffic' type responses.)
The answer depends on what we mean by "scalable". A lot depends on your application, not on the framework you choose to implement it with.
No matter what framework you choose, the fact is that the hardware you deploy it on will have an upper limit on the number of simultaneous requests it'll be able to handle. If you want to handle more traffic, you'll have to throw more hardware at it and include load balancing, etc.
The part that's pertinent in that case has to do with shared state. If you have a lot of shared state, you'll have to make sure that it's thread safe, "sticky" when it needs to be, replicated throughout a cluster, etc. All that has to do with the app server you deploy it to and the way you design your app, not the framework.
Tomcat's not a "framework", it's a servlet/JSP engine. It's got clustering capabilities, but so do most other Java EE app servers. You can use Tomcat if you've already chosen Spring, because that implies you don't have EJBs. Jetty, Resin, WebLogic, JBoss, GlassFish - any of them will do.
Spring is a good choice if you already know it well. I think following the Spring idiom will make it more likely that your app is layered and architecturally sound, but that's not the deciding factor when it comes to scalability.
Hibernate will make your development life easier, but the scalability of your database depends a great deal on the schema, indexes, etc. Hibernate isn't a guarantee.
"Scalable" is one of those catch-all terms (like "lightweight") that is easy to toss off but encompasses many considerations. I'm not sure that a simple choice of framework will solve the issue once and for all.
I would check out Apache Mina. From the home page:
Apache MINA is a network application framework which helps users develop high performance and high scalability network applications easily. It provides an abstract, event-driven, asynchronous API over various transports such as TCP/IP and UDP/IP via Java NIO.
It has an HTTP engine AsyncWeb built on top of it.
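As an illustration of that claim, a minimal MINA 2 line-echo server needs little more than this (a sketch; the port and the text-line codec are arbitrary choices):

```java
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import org.apache.mina.core.service.IoAcceptor;
import org.apache.mina.core.service.IoHandlerAdapter;
import org.apache.mina.core.session.IoSession;
import org.apache.mina.filter.codec.ProtocolCodecFilter;
import org.apache.mina.filter.codec.textline.TextLineCodecFactory;
import org.apache.mina.transport.socket.nio.NioSocketAcceptor;

public class MinaEchoServer {
    public static void main(String[] args) throws Exception {
        IoAcceptor acceptor = new NioSocketAcceptor();
        // Decode the byte stream into text lines before our handler sees it.
        acceptor.getFilterChain().addLast("codec",
            new ProtocolCodecFilter(new TextLineCodecFactory(StandardCharsets.UTF_8)));
        acceptor.setHandler(new IoHandlerAdapter() {
            @Override
            public void messageReceived(IoSession session, Object message) {
                session.write(message); // echo the line back, asynchronously
            }
        });
        acceptor.bind(new InetSocketAddress(9123));
    }
}
```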
A less radical suggestion (!) is Jetty - a servlet container geared towards performance and a small footprint.
The two keywords I would mainly focus on are Asynchronous and Stateless. Or at least "as stateless as possible": of course you need state, but maybe, instead of going for a full-fledged RDBMS, have a look at document-centered datastores.
Have a look at Akka for async, and at CouchDB or MongoDB as datastores...
Frameworks are more geared towards speeding up development, not performance. There will be some overhead with any framework because of use cases it handles that you don't need. Granted, the overhead may be low, and most frameworks will point you towards patterns that have been proven to scale, but those patterns can be used without the framework as well.
So I would design your architecture assuming 'bare metal', i.e. pure servlets (yes, you could go even lower level, but I'm assuming you don't want to write your own http socket layer), straight JDBC, etc. Then go back and figure out which frameworks best fit your architecture, speed up your development, and don't add too much overhead. Tomcat versus other containers, Hibernate versus other ORMs, Struts versus other web frameworks - none of that matters if you make the wrong decisions about the key performance bottlenecks.
However, a better approach might be to choose a framework that optimizes for development time and then find the bottlenecks and address those as they occur. Otherwise, you could spin your wheels optimizing prematurely for cases that never occur. But that probably falls in the category of 'worry about it when you get traffic'.
All popular modern frameworks (and "stacks") are well-written and don't pose any threat to performance and scaling, if used correctly. So focus on what stack will be best for your requirements, rather than starting with the scalability upfront.
If you have a particular requirement, then you can ask a question about it and get recommendations about what's best for handling it.
There is no framework that is magically going to make your web service scalable.
The key to scalability is replicating the functionality that is (or would otherwise be) a bottleneck. If you are serious about making your service scale, you need to start with a good understanding of the characteristics of your application, and hence an idea of where the bottlenecks are likely to be:
Is it a read-only service or do user requests cause primary data to change?
Do you have / need sessions, or is the system RESTful?
Are the requests normal HTTP requests with HTML responses, or are you doing AJAX or callbacks or something else?
Are user requests computation-intensive, I/O-intensive, or rendering-intensive?
How big/complicated is your backend database?
What are the availability requirements?
Then you need to decide how scalable you want it to be. Do you need to support hundreds, thousands, millions of simultaneous users? (Different degrees of scalability require different architectures, and different implementation approaches.)
Once you have figured these things out, you can decide whether there is an existing framework that can cope with the level of traffic you need to support. If not, you will need to design your own system architecture to be scalable in the problem areas.
If you are able to work with a commercial system, then I'd suggest taking a look at Jazz Foundation at http://jazz.net. It's the base for IBM Rational's new products. The project is led by the guys who developed Eclipse within IBM before it was open-sourced. It has a pluggable DB layer and supports multiple app servers. It's designed to handle clustering and multi-site deployments, and it has nice capabilities like OAuth support and license management.
In addition to the above:
Take a good look at JMS (Java Message Service). This is a much underrated technology. There are vendor solutions such as TIBCO EMS, Oracle, etc., but there are also free stacks such as ActiveMQ.
JMS will allow you to build synchronous and asynchronous solutions using queues. You can choose between persistent and non-persistent queues.
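For instance, the sending side of a queue-based solution is only a few lines with the free ActiveMQ stack mentioned above (a sketch; the broker URL and queue name are made up):

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class JmsSendSketch {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        try {
            connection.start();
            // Non-transacted session; the client acknowledges automatically.
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("app.requests"); // hypothetical queue name
            MessageProducer producer = session.createProducer(queue);
            producer.send(session.createTextMessage("hello from JMS"));
        } finally {
            connection.close();
        }
    }
}
```

A consumer on the same queue can be online at the time or pick the message up later, which is where the synchronous/asynchronous flexibility comes from.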
As others have already replied, scalability isn't about what framework you use. Sure, it is nice to squeeze out as much performance as possible from each node, but what you ideally want is for adding another node to scale your app in a linear fashion.
The application should be architected in distinct layers so it is possible to add more power to different layers of the application without a rewrite, and also to add caching at different layers. Caching is key to achieving speed.
One example of layers for a big webapp:
Load balancers (TCP level)
Caching reverse proxies
CDN for static content
Front end webservers
Appservers (business logic of the app)
Persistent storage (RDBMS, key/value, document)
We have a C++ application on Windows that starts a Java process. These two apps need to communicate with each other (via snippets of XML).
What interprocess communication method would you choose, and why?
Methods on the table for us are: shared file(s), pipes, and sockets (although I think sockets have some security concerns). I'm open to other methods.
I'm not sure why you think socket-based communication would have security concerns (use SSL). It is often a very good approach as it is language agnostic, assuming that you have a well-defined communication protocol. Have a look at Google's protocol buffers, for example - they generate the required Java classes and streams.
In my experience, file systems (especially network file systems) are not well suited to such communication as they are not necessarily tuned for messaging (I've seen caching issues result in files being not picked up by the target process for example).
Another option is a messaging layer (AMQ or Tibco for example) although this will likely involve a greater administrative overhead (plus expertise) to set up.
Personally I would opt for a pure-socket approach because of its flexibility and simplicity. You will be in complete control.
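On the Java side, the listener for such a pure-socket link can be very small. Here is a sketch that assumes one XML snippet per line; the newline framing, port, and ack format are invented conventions you would replace with your own protocol:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class XmlIpcListener {
    public static void main(String[] args) throws Exception {
        // Bind to loopback only: the C++ parent runs on the same machine.
        try (ServerSocket server =
                 new ServerSocket(5555, 1, InetAddress.getByName("127.0.0.1"));
             Socket peer = server.accept();
             BufferedReader in = new BufferedReader(
                 new InputStreamReader(peer.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                 new OutputStreamWriter(peer.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Each line carries one XML snippet under this invented framing.
                out.println("<ack>" + line.length() + "</ack>");
            }
        }
    }
}
```

The C++ side needs nothing Java-specific: any socket library that writes UTF-8 lines to 127.0.0.1:5555 can talk to it, which is the language-agnostic flexibility being argued for here.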
I've used named pipes for communication between C# and a cross-platform C++ app and had nothing but good results. Barring that, sockets are definitely the way to go.
Sockets are nice. They give you the ability to very easily create a blackbox testing layer around each component, as well as run each component on its own machine.
Security is definitely a concern, but there are a good range of options depending on how important it is. You can use SSL, custom handshaking, password protected logins and firewalls to help secure it.
Edit:
Not something I'd recommend, but there's also shared memory using JNI. Just thought I'd mention it because it's not on your list.
Ice is pretty cool :)