Looking to write an SNMP and NetFlow tool for Linux/BSD and seeking advice on language selection: C or Java.
The tool will collect NetFlow data, send and receive SNMP queries, and connect to a PostgreSQL database; it will be fronted by a web interface (PHP) and, in the future, will interface with devices using web services.
Normally I would have reached for C to implement the above (plenty of robust libraries, low-level access to the network stack), but the database access and web services could be implemented more easily (better?) in Java.
The question is whether Java is up to the task of processing all this network information under load, or whether I should stick with the lower-level access provided by C.
Supplemental question: I've been considering making this a hybrid application, doing the heavy lifting in C and the higher-level stuff in Java. Experiences and thoughts on this are welcome.
Java's implementations today are robust and mature, so your worry about whether they're "up to processing ... under load" is misplaced. C has its advantages (tiniest memory footprint, fastest startup times), but you pay dearly for them in terms of the programming work needed to do your own memory management. It doesn't appear from what you say that minimizing memory or optimizing for frequent restarts is a big deal for this app anyway. Why not start with Java (or whatever other high-level language you're most comfortable with) and only consider recoding some parts in C if and when your profiling shows CPU or memory bottlenecks arising from the higher-level language? (I'd bet you'll most likely end up not needing such recoding, btw.)
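To give a sense of how little ceremony the receive path needs in Java, here's a minimal sketch of a NetFlow v5 collector loop using NIO. The port (2055 is just the conventional, unofficial choice), the fields pulled out, and the hand-off step are all assumptions for illustration, not a production design:

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;

// Minimal NetFlow v5 receive loop; real code would validate the header
// more carefully and push parsed records onto a worker queue / DB batch.
public class NetflowCollector {
    public static void main(String[] args) throws Exception {
        DatagramChannel channel = DatagramChannel.open();
        channel.bind(new InetSocketAddress(2055));
        ByteBuffer buf = ByteBuffer.allocateDirect(65535); // big-endian by default

        while (true) {
            buf.clear();
            channel.receive(buf);
            buf.flip();
            if (buf.remaining() < 24) continue;          // too short for a v5 header

            int version = buf.getShort() & 0xFFFF;
            int count   = buf.getShort() & 0xFFFF;       // number of 48-byte records
            if (version != 5) continue;
            buf.position(24);                            // skip rest of 24-byte header

            for (int i = 0; i < count && buf.remaining() >= 48; i++) {
                int recordStart = buf.position();
                long srcAddr = buf.getInt() & 0xFFFFFFFFL;
                long dstAddr = buf.getInt() & 0xFFFFFFFFL;
                buf.position(recordStart + 16);          // skip nexthop, input, output
                long packets = buf.getInt() & 0xFFFFFFFFL;
                long octets  = buf.getInt() & 0xFFFFFFFFL;
                buf.position(recordStart + 48);          // advance to next record
                // hand off (srcAddr, dstAddr, packets, octets) for processing
            }
        }
    }
}
```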
I would absolutely go with Java on this. It's entirely capable of handling the "load". I work on Java projects which are responsible for processing extremely large amounts of data in real-time and without issue.
Java won't struggle one bit with what you are talking about doing here, and it will be far easier and quicker to develop in.
Everyone's nailed this; the modern JVM implementations are in the same ballpark as C for speed, unless you're doing direct hardware access.
I'm curious why you'd consider a Java backend for a PHP frontend. C/PHP would make sense, but if you're going for Java on the backend, it might help to have the same language throughout for easier maintainability.
Related
I intend to swap out some hotspots in my Java code for something native (perhaps C++, but that's still to be decided). What are the modern choices for doing this? (I don't intend or want to port the whole application.)
I might for example use:
JNI
JNA
Spawn the native piece as a separate process, then communicate using:
TCP
UDP
shared memory
stdin/stdout
ZeroMQ..
I have complete freedom at this point of how I build this. Is there a recommended best practice approach?
This is a very broad question. The best practice is going to be different depending on each "hot spot" your system has, and what alternatives you have available to you for native processing.
For example, in a Java application that I work with, we have several sections of graphics processing, including transformation from one format to another. While third-party Java libraries do exist and I've tested several, I haven't found one that is as efficient and accurate at transforming PDFs to TIFFs as a C++ library I access via JNI. Why JNI over JNA? Performance, based on the versions we tested with: access via JNI had faster throughput. This might not hold for all applications, but it did for my implementation.
Your hot spots could simply be an issue with your coding architecture; some investigation and refactoring within Java might resolve them without requiring any native replacement.
One application I refactored conducted several tasks in a procedural fashion. Even multi-threading the entire procedure wasn't working. I broke the procedure down into 8 different stages, and identified the bottleneck, which was then handled by a multi-threaded worker queue. Now the system runs efficiently, and is still 100% Java.
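As a rough sketch of that worker-queue shape (the WorkItem type and process() body here are placeholders, not the actual system):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BottleneckStage {
    // WorkItem and process(...) stand in for whatever the slow stage really does.
    static class WorkItem { byte[] payload; }

    static void process(WorkItem item) { /* the expensive per-item work */ }

    static void runStage(List<WorkItem> input) throws InterruptedException {
        ExecutorService workers = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        for (WorkItem item : input) {
            workers.submit(() -> process(item));     // the pool's queue feeds the workers
        }
        workers.shutdown();                          // accept no new work
        workers.awaitTermination(1, TimeUnit.HOURS); // drain before the next stage runs
    }
}
```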
Communication to non-Java applications is also dependent on what facilities exist for the non-Java application to accept that communication. I've had scenarios where I've used Runtime.exec() to spawn a native process, dropped a data file in a folder for external processing, and made calls to Web Services. The "native" solution really depends on (1) what your available native resources are, (2) which native resource is the best for handling this "hot spot" use case, and (3) what accessibility to said native resources is available. And even then, refactoring your existing code might still be the best bet.
To reiterate: any given "hot spot" can be due to a number of factors, each requiring individual attention that a broad answer here cannot provide. Your best bet is to identify each "hot spot", and evaluate them individually as to how best to optimize them; if necessary, then ask a "hot spot"-specific question for handling it.
If the function is reasonably short (~milliseconds or less), in-process might be the only reasonable option. (When running a separate process, context switch latencies on each call may outweigh the execution time).
JNI is a portable API layer and a de-facto standard; raw JNI is, however, not easy to use. JNA uses JNI under the hood, but it is higher-level, easier to use, and quite popular. It might be your safest bet. You could also consider https://github.com/dejwk/janet (which I wrote); it also uses JNI under the hood, but I think it's even easier to use than JNA.
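For a feel of why JNA is considered easier than raw JNI, a minimal sketch: you declare a Java interface and JNA binds it to the shared library at runtime, with no generated headers or C glue. (Native.load is the JNA 5.x entry point; older versions use Native.loadLibrary.)

```java
import com.sun.jna.Library;
import com.sun.jna.Native;

// Binds libc's getpid() by declaring a matching Java interface method.
public class JnaDemo {
    public interface CLibrary extends Library {
        CLibrary INSTANCE = Native.load("c", CLibrary.class);
        int getpid();
    }

    public static void main(String[] args) {
        System.out.println("pid = " + CLibrary.INSTANCE.getpid());
    }
}
```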
This is a general "noob" question about software design, so I apologise if it seems vague, but I would really appreciate the advice. Note that the system described below is purely an example, not a specific product I have in mind.
I often have a need to combine the functionality of several libraries or utilities written in different languages. For example, if I want to code a high-performance audio processing application for the desktop, I will write it in C/C++. Then I want to add a nice GUI. But I don't want to learn Qt. I like the look and feel of Adobe Air and would like to use that. Later, I have a need to access a USB device, but the USB library I have only has an API in Java. How can I combine all these elements to take advantage of their relative strengths?
Clearly, I cannot compile these various elements into one single executable, so I need to build and run them separately and give them a means to communicate. The most common way to do this seems to be IPC (inter-process communication), e.g. shared memory or sockets. I prefer the idea of sockets, as the programs could potentially run on separate machines on a network.
So I decide to create a local client/server system, with a custom API, to allow these elements to communicate. For example, the Air application will receive a message from the C application, telling it to update its UI. The USB application running in Java will use the sockets to stream audio from the USB hardware into the C application.
My question: is using local sockets in this way a typical way to design such a system?
Will the performance be much worse than that of a truly native application (e.g. everything in Java or C, in a single executable)? It also seems likely that such an approach would be prone to bugs and difficult to maintain.
I frequently find myself coming up against the limits of existing software libraries (e.g. a graphics library with a pretty, flexible UI but no way to access low-level hardware, or a media library that can mix many audio streams, but has no support for video playback), and find it very frustrating. If anyone could advise the best way to combine arbitrary software libraries like this, I would really appreciate it.
Thanks in advance!
As you have correctly identified, combining libraries from different languages or platforms is hard. There are several ways to do it, but none are ideal. Examples:
Native call interfaces (e.g. JNI / JNA) - very fast but tricky to make work correctly, and you have the problem that the data types used typically don't map cleanly across different platforms. Adds native dependencies.
Socket-based IPC with a text protocol (XML, JSON, etc.) - works OK and common formats are likely to be supported at both ends, but adds a lot of overhead, and maintaining custom schema mappings etc. can be a pain (see the sketch after this list).
Socket-based IPC with a binary protocol (e.g. Google Protocol Buffers) - quite efficient, but needs a lot of work to get a custom protocol working correctly on both ends.
Communication via a 3rd system (e.g. database, message queue, filesystem) - lots of overhead, can get fragile, introduces a major dependency on a 3rd system.
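To make the text-protocol option concrete, here is a minimal sketch of newline-delimited JSON over a local socket in Java. The port and message shape are invented, and real code would use a JSON library (Jackson, Gson) rather than hand-built strings:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Accepts one JSON message per line from another process (C, Air, etc.)
// and sends back a one-line JSON acknowledgement.
public class JsonIpcServer {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(9999)) {
            while (true) {
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(new InputStreamReader(
                             client.getInputStream(), StandardCharsets.UTF_8));
                     PrintWriter out = new PrintWriter(new OutputStreamWriter(
                             client.getOutputStream(), StandardCharsets.UTF_8), true)) {
                    String line = in.readLine();        // e.g. {"cmd":"updateUI","level":0.8}
                    out.println("{\"status\":\"ok\"}"); // acknowledge
                }
            }
        }
    }
}
```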
In my experience, it usually isn't worth integrating a new language / platform just to get one specific library or feature. Take your user interface example - no matter how nice Adobe Air looks, I doubt it is worth trying to integrate it with an existing C/C++ application.
Even if you get it to work, it will significantly complicate the future maintenance and development of your application. Builds become more complex. You need to maintain additional communication / "glue" code. You need to manage more dependencies. Your users will get hit by many more configuration issues. Testing becomes more difficult. It becomes harder to teach someone new how the whole system works. You need to maintain your skills in more languages / frameworks, etc.
I'd recommend the following strategy:
Pick a primary platform
Whenever you need a new library or feature, look for something on your primary platform first. Hopefully (usually?) there is something good available, but even if not, it might be worth coding something yourself if the requirement is quite small.
Only if there is no reasonable option on the primary platform, then you can start to think about integrating a new language/platform
In terms of primary platform, I'd normally suggest a JVM language like Java, Scala or Clojure since the JVM is very well engineered, offers great performance, is highly portable and has the largest / most cohesive library ecosystem (most of which is open source). The JVM is therefore probably the best "general purpose" choice unless you have some very specific requirement which is unlikely to be possible on the JVM, e.g.:
If you are doing lots of embedded / realtime / systems programming that requires hardware access, you probably need to go for C/C++
If you are coding purely for web-based clients, you probably want to use JavaScript (if you are also writing code on the server side you can consider JavaScript code generation frameworks/libraries that can work on the JVM, e.g. Vaadin or ClojureScript)
The answer pretty much depends on the technologies you're using; there is no silver-bullet solution for this.
In general, these solutions will fall into one of the following categories:
Some interprocess communication techniques
Integrations provided by the language/platform itself
Database/some common storage (even files :) )
Examples of the first:
Sockets/pipes/whatever your operating system allows.
CORBA - allows you to write distributed code in different languages.
Google protobuf - allows serialization/deserialization of data objects and is language-agnostic.
For the second, it really depends on the language/ecosystem you're using.
Examples for Java:
JNI - Java Native Interface - allows you to execute native code (DLLs/SOs) outside the JVM.
JCA - if you're in an enterprise environment, you can use this to write integrations with legacy systems.
For languages that are compiled into native code it's less tricky - you can write and compile some code, say in Pascal, and then use the resulting DLL from C.
When we're talking about Java, there is also a plethora of languages that have their own syntax and compilers but compile into Java bytecode that can be run inside the JVM. If your solution is based on these languages, the integration will be easier. Languages like Scala, Groovy, Clojure, Jython and so on fall into this category.
The last but not least technology to mention is Web Services. This is a very popular tool for integrating different systems, although it's used more in enterprise environments.
Basically, it's an abstraction over the sockets layer that allows you to send data objects in XML/JSON format between processes/servers. Both XML and JSON are language-agnostic, so it's not an issue to create an XML document in a program written in C++ and then consume it in Java.
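As a small illustration of that language-agnosticism, parsing a JSON document produced by some other language is just a few lines on the Java side. This assumes the Jackson library is on the classpath, and the message content is made up:

```java
import java.util.Map;
import com.fasterxml.jackson.databind.ObjectMapper;

// The JSON string could have been produced by C++, Python, or anything else;
// by the time it reaches Java it is just text.
public class ConsumeJson {
    public static void main(String[] args) throws Exception {
        String fromCpp = "{\"sensor\":\"usb-0\",\"value\":42.5}";
        ObjectMapper mapper = new ObjectMapper();
        Map<?, ?> msg = mapper.readValue(fromCpp, Map.class);
        System.out.println(msg.get("sensor") + " = " + msg.get("value"));
    }
}
```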
Hope this helps
I need to choose a language/platform for the new development of a series of services in a SOA. I'm looking into Scala and Clojure but don't think the community and products are mature enough for a real-world enterprise product yet.
Update/Clarifications:
Of course we can use many languages/platforms for SOA, but some languages/platforms are easier and better suited to it. IMO the best ones for SOA should allow interface programming (to ease the definition of contracts), should have options for hosting the services (like Felix for Java or WCF in .NET), and should scale well (see Twitter's issues with RoR).
Java has always been the favourite in the enterprise market. However, many developers are looking into dynamic languages, as well as talking about the stagnation of Java after v6. As a result, many new post-Java languages have arrived: Scala, Clojure and Groovy, to name a few that still run on the JVM but are not Java.
I hope these clarify the question.
Depends what you mean by "mature enough for a real-world enterprise product", and your relative level of tolerance for living on the cutting edge.
For example, I'm currently building a "real-world enterprise product" in Clojure (I'd have been equally happy with Scala, it was only that Clojure fitted my needs slightly better from the concurrency and meta-programming perspective).
I'm very happy with my decision.
Some quick perspectives if you are considering this "post-Java" path:
The communities are great and supportive, but you'll still have to solve problems yourself, if only because nobody else has run into the same problem yet. None of these are likely to be insurmountable, but it does present a bit of extra risk to delivery schedules.
Both Scala and Clojure can be very productive (in terms of value delivered to customers per hour of coding), but you can equally well write bad and unmaintainable code in any language. Java pretty much forces you to write things in a standardised, somewhat verbose but syntactically simple and understandable way. With Scala and Clojure you get a whole new arsenal of crazy ways to hit your target or shoot yourself in the foot. Is your team going to be able to make the best use of Scala/Clojure's advantages?
It's harder (though by no means impossible) to bring skilled people on board with existing Clojure/Scala skills. On the flipside, the people who do have these skills (or are keen to acquire them) are likely to be among the more talented / motivated developers so the search may still be productive.
Be prepared to make tough decisions regarding whether to target language/library features that are "just round the corner". For example, do you wait for the enhanced primitive support coming in Clojure 1.3? Or make do with the perfectly adequate but slower boxed primitive functions in Clojure 1.2?
A great benefit of being on the JVM is that you can still take full advantage of the Java ecosystem without being tied to Java as a language. Don't underestimate how useful this is: for example, I use a number of extremely well tested, mature Java libraries (e.g. Netty) pretty much transparently in my Clojure application. This significantly reduces your risk and the amount of new development that you need to do.
At the moment (having just completed a services/integration project), Jersey on top of Spring is right up there on my favourites list for web services.
I can't offer any suggestions for an SOA framework; the last time I was involved in that type of thing we used Oracle BPEL Process Manager, and I have mixed feelings about it. We weren't using REST then either, and I'm not sure how well the Oracle software works with it.
For me, Python seems the easiest way to do some SOA and have interoperability with Windows computers. I don't have framework names to hand, but there are a lot of them for SOAP, REST, RPC...
Like the subject of this post suggests, I am looking at developing a suite like Nero which helps burn Blu-ray discs. I am kind of clueless as to where to start. Is there anything in the Java API that lets you do this? If I were to start from scratch, would I need to start with the Blu-ray disc spec? Are there any open-source tools which already do this? I tried searching at sourceforge.net and found nothing useful. Any help is much appreciated.
To start with the obvious: know your requirements and tools. I'll try to guess at them here.
Requirements:
Should burn Blu-ray discs
Graphical user interface
Preferred tool:
Java
Now, Java, being perhaps the prime example of a VM language from the '90s, achieves its relatively good platform-agnosticism by virtue of its VM. It's a language designed to run on virtual hardware to ease portability to real hardware.
Now, what comes with this fact is that you abstract away many things you would have to care about, like memory-management details and architecture or platform-specifics. Among those things you can't reliably get access to is hardware. After all, you abstracted most of that away.
Now, to burn a Blu-ray disc you have to access hardware, in particular the Blu-ray writer. Not that it's impossible, but Java is, in my humble opinion, not the right tool for this. You can go out of your way by implementing a library in C or C++ and using JNI/JNA to access it, but looking at that, what do you really gain?
Java is usually a choice when you need a fairly modern high-level language with a large standard library and you also need your programs to run on more than one platform. Those are the primary use cases. It's not impossible with other technologies, but perhaps harder to achieve, depending on what exactly you need.
If you implement a native library to talk to the Blu-ray writer and talk to that from Java, then you necessarily need to re-implement it for other platforms as well (assuming that's what you want; if not, then again: why Java?).
TL;DR version: My point is that it's not too surprising that you can't find much on exactly this topic. For one, Java wasn't really designed to do that sort of thing. Most of the Java/native interop lies in the JVM, and that's already an awful lot of code. Don't expect Java to natively support very rare usage scenarios such as CD/DVD/Blu-ray burning. Secondly, Blu-ray is a relatively new technology, with writers not yet as common in computers as CD/DVD writers, so the lack of libraries and tools may also mirror the current demands of the market.
Low-level hardware access is simply not possible in pure Java unless it's in the standard API, which Blu-ray burning isn't.
Therefore, you will have to use non-Java code to access the hardware; at that point you lose the platform independence of Java and necessarily have a multi-language system, which is always more painful to program than using just a single language.
However, if you can find (or, I guess, develop) a multi-platform Blu-ray writing API or command-line tool in (most likely) C, then it might still make sense to write the rest of the app in Java as a GUI wrapper with added functionality.
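As a sketch of that wrapper approach: growisofs (from dvd+rw-tools) can write BD media on Linux, and a Java GUI could drive it via ProcessBuilder. The flags and device path below are illustrative, not a tested recipe:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

// Java front end delegating the actual burn to a native CLI tool.
public class BurnWrapper {
    public static int burn(String isoPath) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(
                "growisofs", "-Z", "/dev/sr0=" + isoPath);
        pb.redirectErrorStream(true);              // merge stderr into stdout
        Process p = pb.start();
        try (BufferedReader out = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = out.readLine()) != null) {
                System.out.println(line);          // feed progress to the GUI instead
            }
        }
        return p.waitFor();                        // 0 on success
    }
}
```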
I heard from various sources that Java EE is highly scalable, but to me it seems that you could never scale a Java EE application to the level of the Google search engine or any other large website.
I would like to hear the technical reasons why it is so scalable.
Java EE is considered scalable because, if you use the EJB architecture and run on an appropriate application server, the server includes facilities to transparently cluster your application and allow multiple instances of an EJB to serve requests.
If you managed things manually in plain old Java, you would have to figure all of this out yourself, for example by opening ports, synchronizing state, and so on.
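To make that concrete, here is a minimal, hypothetical stateless session bean. Note that it contains no clustering code at all; instance pooling and failover are the application server's job, configured outside the source (in a real project the interface and bean would live in separate files):

```java
import javax.ejb.Remote;
import javax.ejb.Stateless;

// The business interface and method are invented for the example.
@Remote
interface QuoteService {
    double quote(String symbol);
}

@Stateless
class QuoteServiceBean implements QuoteService {
    public double quote(String symbol) {
        return 42.0; // business logic only; no ports, threads or state sync here
    }
}
```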
I am not sure you could define Google as a "large website". That would be like likening the internet to your office LAN. Java EE was not meant to scale to the global level, which is why sites like Amazon and Google use their own technologies (e.g. MapReduce).
There are many papers discussing the efficiency of Java EE scalability.
For example this
What makes Java EE scalable is what makes anything scalable: separation of concerns. As your processing or IO needs increase, you can add new hardware and redistribute the load semi-transparently (mostly transparent to the app, obviously less so to the configuration monkeys) because the separated, isolated concerns don't know or care if they're on the same physical hardware or on different processors in a cluster.
You can make scalable applications in any language or execution platform. (Yes, even COBOL on ancient System 370 mainframes.) What application frameworks like Java EE (and others, naturally -- Java EE is hardly unique in this regard!) give you is the ability to easily (relatively speaking) do this by doing much of the heavy lifting for you.
When my web app uses, say, an EJB to perform some business logic, that EJB may be on the same CPU core, on a different core in the same CPU, on a different CPU entirely or, in extreme cases, perhaps even across the planet. I don't know and, for the most part, provided the performance is there, I don't care. Similarly when I send a message out on the message bus to get handled, I don't know nor do I care where that message goes, which component does the processing and where that processing takes place, again as long as the performance falls within my needs. That's all for the configuration monkeys to work out. The technology permits this and the tools are in place to assess what pieces have to go where to get acceptable performance as the system scales up in size.
Now when I try and hand roll all of this, I start having problems right away. If I don't think about all the proxying and scheduling and distribution and such in advance, then when my app expands beyond what a single machine can handle, I'm facing major rewrites as I shift parts of the application to another box. And then each time my capacity grows I have to do this again and again.
If I do think about all of this in advance, I'm writing a whole lot of boilerplate code for each application that does minor variations of all the same things. I can code things in a scalable way, but do I want to do this every. damned. time. I write an app?
So what Java EE (and other frameworks) bring to the table is pre-written boilerplate for the common requirements of scalable applications. Writing my apps against these doesn't guarantee they'll be scalable, of course, but the frameworks make writing such scalable apps a whole lot easier.
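One small example of that pre-written boilerplate is a message-driven bean: the container owns the connections, the threading, and (on a clustered server) the distribution of messages across nodes. The queue name below is invented:

```java
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;

// The container delivers messages from the bus; the bean only supplies the
// business logic. Which node runs this is the container's concern.
@MessageDriven(activationConfig = @ActivationConfigProperty(
        propertyName = "destinationLookup", propertyValue = "jms/orders"))
public class OrderListener implements MessageListener {
    public void onMessage(Message msg) {
        // handle the message; no connection or thread management needed here
    }
}
```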
One could look at a scalable architecture from the point of view of what the base framework (like Java EE) provides. But that's just the beginning.
Designing for a scalable infrastructure is an architectural art. It's like the art of projection ... how will it behave when it's blown up real big. The base questions are:
Where do I keep commonly accessed stuff so that, when so many people are asking for it, I don't have to go fetch it so many times (cache)?
Where do I keep each individual's stuff so that, when there are so many individuals needing stuff kept, I won't have trouble managing it all?
How do I remember what a person did the last time they came here, since they may not come back to the same node they visited last time?
How long will I have to wait for (block on) a long-running procedure if so many people are requesting it?
...
That sort of thing is beyond what a framework can wrap. In other words, the framework may be scalable, but the product is wired too tight to scale.
Java EE, as a framework, is quite scalable, like most modern microprocessor-targeting enterprise frameworks. But I have seen amazing (not in a good way) stuff built out of even the best of them.
For a plethora of references, please search Google for "Designing for Scalability"
The "scalability" thing talks about "what will you do when your application doesn't fit in a single computer anymore?".
Scalable applications can grow over more computers than one.
Note that large servers can have VERY large applications with lots of memory and lots of cpu's - see http://www.sun.com/servers/highend/m9000/ or http://www-03.ibm.com/systems/i/hardware/595/index.html - but it is usually more expensive than having lots of small servers with the application spreading over them.