Large amount of data - what is the best way to send them?

Large amount of data - what is the best way to send them? - java

we have this scenario:
A server which contains needed data and client component which these data wants.
On the server are stored 2 types of data:
- some information - just a couple of strings basically
- binary data
We have a problem with getting binary data. Both sides are written in Java 5 so we have couple of ways....
Web Service is not the best solution because of speed, memory etc...
So, What would you prefer?
I would like to miss low level socket connection if possible...
thanks in advance
Vitek

I think the only way to do LARGE amounts of data is going to be with raw socket access.
You will hit the Out of Memory issues on large files with most other methods.
Socket handling is really pretty straight forward in Java, and it will let you stream the data without loading the entire file into memory (which is what happens behind the scenes without your own buffering).
Using this strategy I managed to build a system that allowed for the transfer of arbitrarily large files (I was using a 7+ GB DVD image to test the system) without hitting memory issues.

Take a look at the W3C standard MTOM to transfer binary data as part of a SOAP service. It is efficient in that it sends as a binary and can also send as buffered chunks. It will also interop with other clients or providers:
How to do MTOM Interop
Server Side - Sending Attachments with SOAP

You might want to have a look at protobuf, this is the library that google uses to exchange data. Its very efficient and extensible. On a sidenote, Never underestimate the bandwidth of a station wagon full of 1TB harddisks!

I've tried converting the binary data to Base64 and then sending it over via SOAP calls and it's worked for me. I don't know if that counts as a web service, but if it does, then you're pretty much stuck with sockets.

Some options:
You could use RMI which will hide the socket level stuff for you, and perhaps gzip the data...but if the connection fails it won't resume for you. Probably will encounter memory issues too.
just HTTP the data with a binary mime type (again perhaps configuring gzip on the webserver). similar problem on resume.
spawn something like wget (I think this can do resume)
if the client already has the data (a previous version of it), rsync would copy only the changes

What about the old, affordable and robust FTP? You can for example easily embed an FTP server in your server-side components and then code a FTP client. FTP was born exactly for that (File Transfer Protocol, isn't it?), while SOAP with attachments was not designed with that stuff in mind and can perform very badly.
For example you could have a look at:
http://mina.apache.org/ftpserver/
But there are other implementations out there, Apache Mina is just the first one I can recall.
Good luck & regards

Is sneakernet an option? :P
RMI is well known for its ease-of-use and its memory leaks. Be warned. Depending on just how much data we're talking about, sneakernet and sockets are both good options.

Consider GridFTP as your transport layer. See also this question.

Related

Download acceleration

I have Google'd my butt off, and I can't find anything on this topic.
I am trying to create a download client using Java, and I have figured out how to download files with Java, but I want to accelerate the download speed. I know how this works (opening several connections to the download server), but how can I achieve this?
I am looking for either some detailed explanation of such an algorithm or some code examples.

This is only possible if the server side supports range requests. You can determine that by checking using a HEAD request if the HTTP response header contains Accept-Ranges: bytes. If that is the case, then you can just spawn several threads which downloads the file in parts using the Range header. The URLConnection and ExecutorService are helpful in this.
Keep in mind that you also take the limitation in amount of threads and network bandwidth of your own machine into account.
Related questions:
Reading first part of file using HTTP
How to use URLConnection to fire and handle HTTP requests
Make simultaneous web requests in Java

BalusC described the trick and here is a reference to some source-code you can review and start with:
JDownLoader[Java]: http://svn.jdownloader.org/projects/show/jd
Free Download Manager[CPP]: http://freedownload.svn.sourceforge.net/viewvc/freedownload/
#BalusC Nice Work

I'm a bit unclear, are you writing a Java client that will talk to a server (perhaps a Java servlet?), so you control both sides of the data transfer? If so, you can do nearly anything you want. Java has java.util.zip, which has functions to do the compression.
If you want to download four (or N) files at once, just start up N threads and pass the HTTP requests to the server in parallel. This may not actually improve things, depending on link speed, network congestion, etc.
Writing your own client and making it properly multi-thread safe is a whole lot of work, which is why people just use the Apache HTTP client code. Its rock solid.

Tips about design/implementation of own protocol

Where I work we are in need of a protocol capable of:
User login/logout
Send/recive instructions
Send/recive files
Send/recive audio stream(could use RTP)
Send/recive small XML files Use
cryptography for all those.
It will be implemented in java. So I have some questions, since I´ve never implemeted a network protocol yet.
Is it possible to use existing protocols to build this one?
What tool can I use to help me design the protocol? for "Modeling"
Is it possible to acomplish all this, doing it alone? I have as much time as I need for this.
I have a pretty good background in Java and C++, but not yet with sockets/networking programming.
Thanks

Take a look a Google Protocol Buffers, which will generate a compact wire protocol as well as autogenerating Java message classes. I wish I'd heard of it before rolling my own message codec using Java NIO ByteBuffers.

I've got a feeling you're trying to reinvent either SIP (if your packet processing is mostly stateless and XML is small enough to go into <3k packets), or XMPP.
If you need a connection oriented login/logout, and stateful commands/instructions, then XMPP is probably closer to the requirements. Also, Jingle extension to XMPP already deals with RTP setup and teardown. XML messages are trivial to embed into custom XMPP packets (which themselves are XML) and there are known XMPP solutions for proxying a file transfer.
I'm pretty sure it meets your requirements quite well (at least the way they're presented here). If you don't have to design a completely new protocol, it's probably easier if you don't. Also reusing an existing XMPP server will allow you to solve the pain of creating your own message broker. There's OpenFire server, which is written in Java.

I dont know if this is bad advice or not, but what I usually do for my networking applications is to make a Message object that holds a TAG string and CONTENT string. The CONTENT part is usually a JSON string and the message itself is also sent to/from the server as a json string.
When the server or client receives a message, it parses the json into a Message object. You can then check the TAG part of the message to see what type of content is held in the CONTENT part of the message and decide what to do with it.
For example, if TAG=="LOGIN" then the CONTENT may be login details or similar. And when the TAG=="MESSAGE" then the CONTENT will perhaps be a json string representing your parameters, for example, who is the recipient/s and what is the content of the message, etc.
You can then do you encryption and decryption on the strings. If this is a stupid way of doing it, please tell me so in a comment so i can learn :)
I also usually implement a state design pattern on both sides, but at least on the server side. For example, the server starts out in a WaitingForLogin state. When the client logs in it switches to a different state that only listens for files and chat messages as an example. In this way I found it is a bit easier to manage.

You could use http or https. The java media framework contains an implementation of rtp.

Writing the protocol from scratch may require a lot of work. Take a look at XMPP.
If you want to write your own protocol, start with learning a form of RPC like JSON or similar, which will make your life a lot easier.

How to pass a lot of data between two computers

What are possible options to transfer a lot of data from one computer to another not in the same LAN. The amount of data is about 100Mb unzipped and 2Mb zipped? Another requirement is that when I create a server for this (with C#) Java clients should be able to consume it.
Does WCF support something like this? But if Java clients won't be able to consume it I'm not interested.
What could be other strategies here?

I'd just use something common like HTTP or FTP, since there will be plenty of existing libraries to do it and you're pretty much guaranteed not to have compatibility problems. 2MB is not an unreasonably large amount of data for those protocols.

This is an interesting kind of question. The question is fairly simple to answer. But the interesting thing is that this kind of questions are new, they didn't exists before. Let me explain, but first I will answer your question:
You should create a server and clients both using old fashion TCP streams. To not waist bandwidth you need to compress the stream somehow, here use one of the most common compression algorithms you can find (anyone said Zip?). Now you have a language independent protocol. Clients in any language will work, mission accomplished. Also to keep it cross-platform, do not pick the best compression out there, pick the most common one (It will be good enough).
Now to why this kind of questions are interesting, they show something about OOP on the large scale. People understanding and using huge frameworks and asking if this or that framework can perform this or that simple task for them. Here we have lose our roots, we have lost the inner workings of things, it's hitting the nail not with a hammer but with a nuclear missile. It's overshooting the target, and it will produce huge applications, with huge footprint and often poor performance.
I believe that this questions has increased in number since OOP was fully adopted. It's like new programmers only want to learn these new big frameworks and that the framework dim the view of the world. There is absolutely nothing wrong with big frameworks, they are great, but I believe it's wrong to start out using them before one have mastered the basics. It's like learning to fly using a NASA space shuttle instead of a school version of a Cessna private airplane.

In C# you can serialize your object as an XML and transmit, on the other end your can deserialize your XML back to an object.
In terms of files size, you can transmit as zipped or 7z..and on the client decompress it before parsing the xml.

WCF supports SOAP and includes optional JSON serialization for XHTTP. There are other mechanisms but they are MS orientated. You will easily be able to consume the service you create. However you will have to consider how to encode the data as it will hit the wire in a non binary data friendly manner (XML/JSON).
You may wish to instead create a simple http handler that can return the data directly as zip using appropriate mime headers etc. You should then be able to just hit that using your Java client.

XMPP is another option. You need another server, but this could be an advantage: the client wouldn't need to know the servers IP address, server and clients would simply connect to the XMPP server to exchange message and files.
Related links (for Java):
Openfire XMPP server
XMPP library for java (Smack)

You didn't mentioned what type of data do you want to send. So for keeping things simple I will suppose that you have data stream which can be converted to byte array. Content of the stream has to be in format which is understandable to both C# and Java!
The best choice is to compress your data stream with GZip stream. Gzip should be supported on Java. Than you can send that stream converted to byte array as response from your WCF service operation. You can use default text encoding which will convert byte array into Base64 encoded string. If your java client supports MTOM (it is standard which is supported by Java) than you can use MTOM encoding which uses smaller messages.
If you don't have a stream with well known content format you have some sort of custom data. For custom data you have to use interoperable transport format which is XML. Using XML will futher increase size of your data. In that case you should consider dividing your data transfer into several calls. You can also try to host your WCF service in IIS 7.x and take advantage of its build in feature - compression of dynamic content. If your Java client calls the service with HTTP Accept-Encoding header set to compress, gzip it will automatically compress the response. Be aware that only .NET 4.0 WCF clients can work with such service.

implementing a Intelligent File Transfer Software in java over TCP/IP

Can you please provide answer to some of these?
Thanks

Since you say your client does not want to use FTP, I am assuming that you will be writing your own protocol. It seems like a number of these questions are more related to functional specifications, and you should be posing these to your client to get better requirements for your project. With that in mind here are ideas/comments for some of your questions:
How can we guarantee the file is received at the destination? Have your protocol include some kind of ACK / NAK message after file transfer is completed.
If a file isn’t received the first time, we should try it again (even after a restart or power breakdown)? This sounds more like a functional requirement that should be specified by your client. Do they want reliable transfer, then yes I would think some kind of retransmission would be justified (perhaps quitting after some number of failures.)
How does the receiver knows the file that is received is complete? You could have the protocol send some kind of data about the file before transfer such as an MD5 hash that could be checked by the client against the received bytes (and back to question 1 send the ACK / NAK if the hash matches)
How can we transfer multiple files synchronously? Again if you are writing the protocol just make it part of the spec. For example "Server will send byte listing number of files to be transferred, followed by name of each file, followed by number of bytes for each file." So your server might send "2, foo.txt, bar.txt, 512, 1024, [1516 bytes of data]"
How does one interoperate between multiple OS platforms? I think you need to be more specific with this question, what do you mean by interoperate? Everything I can think of should be abstracted away by Java (i.e. file system access, raw socket communication, etc)
What about authentication? As with reliability this seems like it would be more of a functional requirement specified by your client.
Auditing / logging? As above what does the client want?
Archiving? As above what does the client want?

How does one interoperate between multiple OS platforms? You said you're using Java, so this shouldn't be an issue, at least.

Learn "rsync" by heart, and then see how it can solve most of your clients needs.

Binary vs non-binary socketing java

I hear'd that exist binary communication. I'm beginner in java, I use plain text based which I learn from java.sun.com tutorials for sockets. So I want to know what benefits have binary socketing? Why I should use that? And what resources you can suggest about binary communication?

I would advise you to look at Object stream which allow simple binary passing of Objects over a Socket connection from one Java VM to another (across a network if required).
The Java Sun Tutorial is a good resource to start. See the Basic I/O section.

Sending binary data across the socket would obviously be more efficient in terms of bandwidth usage. However, it also increases the complexity of having to marshal and unmarshal data across the socket. Sending textual data (XML, JSON, etc.) is more intuitive and easier to develop in many circumstances if performance is not critical.

You would only work directly with binary data over sockets if you had a specific reason to do so - such as extreme performance. If you had such a reason you would know it, and using binary data direct over sockets does not guarantee high performance.
There are many abstractions provided in Java and libraries to facilitate network communication whilst protecting the programmer from the labour intensive and error-prone work at that low level:
Remote Method Invocation
Web Services
UrlConnection
Object Serialization
are just some examples.

As mentioned in the other answers, you should avoid raw socket communication unless absolutely necessary. I would suggest XML as a communication data format instead of objects as it allows you easily communicate with clients running other languages as needed and is much easier to debug. XStream is a great library to facilitate object<-->xml conversion with minimal supporting code.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.