I set up a server with a ServerSocket, connect to it with a client machine. They're directly networked through a switch and the ping time is <1ms.
Now I try to push a "lot" of data from the client to the server through the socket's output stream. It takes 23 minutes to transfer 0.6 GB. I can push a much larger file in seconds via scp.
Any idea what I might be doing wrong? I'm basically just looping and calling writeInt on the socket. The speed doesn't depend on where the data is coming from: it is just as slow when I send a constant integer and don't read from disk at all.
I tried setting the send and receive buffers on both sides to 4 MB, no dice. I use a buffered stream for the reader and writer, no dice.
Am I missing something?
EDIT: code
Here's where I make the socket
System.out.println("Connecting to " + hostname);
serverAddr = InetAddress.getByName(hostname);
// connect and wait for port assignment
Socket initialSock = new Socket();
initialSock.connect(new InetSocketAddress(serverAddr, LDAMaster.LDA_MASTER_PORT));
int newPort = LDAHelper.readConnectionForwardPacket(new DataInputStream(initialSock.getInputStream()));
initialSock.close();
initialSock = null;
System.out.println("Forwarded to " + newPort);
// got my new port, connect to it
sock = new Socket();
sock.setReceiveBufferSize(RECEIVE_BUFFER_SIZE);
sock.setSendBufferSize(SEND_BUFFER_SIZE);
sock.connect(new InetSocketAddress(serverAddr, newPort));
System.out.println("Connected to " + hostname + ":" + newPort + " with buffers snd=" + sock.getSendBufferSize() + " rcv=" + sock.getReceiveBufferSize());
// get the MD5s
try {
byte[] dataMd5 = LDAHelper.md5File(dataFile),
indexMd5 = LDAHelper.md5File(indexFile);
long freeSpace = 90210; // ** TODO: actually set this **
output = new DataOutputStream(new BufferedOutputStream(sock.getOutputStream()));
input = new DataInputStream(new BufferedInputStream(sock.getInputStream()));
Here's where I do the server-side connection:
ServerSocket servSock = new ServerSocket();
servSock.setSoTimeout(SO_TIMEOUT);
servSock.setReuseAddress(true);
servSock.bind(new InetSocketAddress(LDA_MASTER_PORT));
int currPort = LDA_START_PORT;
while (true) {
try {
Socket conn = servSock.accept();
System.out.println("Got a connection. Sending them to port " + currPort);
clients.add(new MasterClientCommunicator(this, currPort));
clients.get(clients.size()-1).start();
Thread.sleep(500);
LDAHelper.sendConnectionForwardPacket(new DataOutputStream(conn.getOutputStream()), currPort);
currPort++;
} catch (SocketTimeoutException e) {
System.out.println("Done listening. Dispatching instructions.");
break;
}
catch (IOException e) {
e.printStackTrace();
}
catch (Exception e) {
e.printStackTrace();
}
}
Alright, here's where I'm shipping over ~0.6 GB of data.
public static void sendTermDeltaPacket(DataOutputStream out, TIntIntHashMap[] termDelta) throws IOException {
long bytesTransferred = 0, numZeros = 0;
long start = System.currentTimeMillis();
out.write(PACKET_TERM_DELTA); // header
out.flush();
for (int z=0; z < termDelta.length; z++) {
out.writeInt(termDelta[z].size()); // # of elements for each term
bytesTransferred += 4;
}
for (int z=0; z < termDelta.length; z++) {
for (int i=0; i < termDelta[z].size(); i++) {
out.writeInt(1);
out.writeInt(1);
}
}
It seems pretty straightforward so far...
You do not want to write single bytes when you are transferring large amounts of data.
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
public class Transfer {
public static void main(String[] args) {
final String largeFile = "/home/dr/test.dat"; // REPLACE
final int BUFFER_SIZE = 65536;
new Thread(new Runnable() {
public void run() {
try {
ServerSocket serverSocket = new ServerSocket(12345);
Socket clientSocket = serverSocket.accept();
long startTime = System.currentTimeMillis();
byte[] buffer = new byte[BUFFER_SIZE];
int read;
long totalRead = 0; // long: an int would overflow for transfers of 2 GiB or more
InputStream clientInputStream = clientSocket.getInputStream();
while ((read = clientInputStream.read(buffer)) != -1) {
totalRead += read;
}
long endTime = System.currentTimeMillis();
System.out.println(totalRead + " bytes read in " + (endTime - startTime) + " ms.");
} catch (IOException e) {
e.printStackTrace(); // don't swallow failures silently
}
}
}).start();
new Thread(new Runnable() {
public void run() {
try {
Thread.sleep(1000);
Socket socket = new Socket("localhost", 12345);
FileInputStream fileInputStream = new FileInputStream(largeFile);
OutputStream socketOutputStream = socket.getOutputStream();
long startTime = System.currentTimeMillis();
byte[] buffer = new byte[BUFFER_SIZE];
int read;
long readTotal = 0; // long: an int would overflow for transfers of 2 GiB or more
while ((read = fileInputStream.read(buffer)) != -1) {
socketOutputStream.write(buffer, 0, read);
readTotal += read;
}
socketOutputStream.close();
fileInputStream.close();
socket.close();
long endTime = System.currentTimeMillis();
System.out.println(readTotal + " bytes written in " + (endTime - startTime) + " ms.");
} catch (Exception e) {
e.printStackTrace(); // don't swallow failures silently
}
}
}).start();
}
}
This copies 1 GiB of data in just over 19 seconds on my machine. The key here is using the InputStream.read and OutputStream.write methods that accept a byte array as a parameter. The size of the buffer is not really important; it just should be a bit larger than, say, 5. Experiment with BUFFER_SIZE above to see how it affects the speed, but keep in mind that the best value is probably different for every machine you run this program on. 64 KiB seems to be a good compromise.
Hey, I figured I'd follow up for anyone that was interested.
Here's the bizarre moral of the story:
NEVER USE DataInputStream/DataOutputStream and sockets!!
If I wrap the socket in a BufferedOutputStream/BufferedInputStream, life is great. Writing to it raw is just fine.
But wrapping the socket in a DataInputStream/DataOutputStream, or even using DataOutputStream(BufferedOutputStream(sock.getOutputStream())), is EXTREMELY SLOW.
An explanation for that would be really interesting to me. But after swapping everything in and out, this is what's up. Try it yourself if you don't believe me.
Thanks for all the quick help, though.
Maybe you should try sending your data in chunks (frames) instead of writing each byte separately. And align your frames with the TCP packet size for best performance.
Can you try doing this over loopback? It should then transfer the data in seconds.
If it takes minutes, there is something wrong with your application. If it is only slow when sending data over the internet, it could be your network link that is slow.
My guess is that you have a 10 Mb/s network between your client and your server and this is why your transfer is going slowly. If this is the case, try using a DeflaterOutputStream and an InflaterInputStream for your connection.
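The compression suggestion can be sketched with the java.util.zip stream wrappers. This is only an illustration under assumptions: the class and method names are made up for the example, and a ByteArrayOutputStream stands in for the socket's output stream so the round trip can be checked without a network.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper class, not part of the original posts.
final class CompressedStreamSketch {

    /** Compresses data the way it would be written through a wrapped socket stream. */
    static byte[] compress(byte[] data) throws IOException {
        // Stand-in for socket.getOutputStream() (assumption for the example).
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new DeflaterOutputStream(sink)) {
            out.write(data);
        } // close() finishes the deflate stream
        return sink.toByteArray();
    }

    /** Reads the compressed bytes back through an InflaterInputStream. */
    static byte[] decompress(byte[] compressed) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[8192];
            int count;
            while ((count = in.read(buffer)) != -1) {
                result.write(buffer, 0, count);
            }
        }
        return result.toByteArray();
    }
}
```

Compression only pays off when the link, not the CPU, is the bottleneck and the data is compressible. On a long-lived connection where flush() must push data out mid-stream, the two-argument constructor new DeflaterOutputStream(out, true) enables sync flushing.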
How are you implementing the receiving end? Please post your receiving code as well.
Since TCP is a reliable protocol, it will take steps to make sure the client is able to receive all of the data sent by the sender. This means that if your client cannot get the data out of the data receive buffer in time, then the sending side will simply stop sending more data until the client has a chance to read all the bytes in the receiving buffer.
If your receiving side is reading data one byte at a time, then your sender will probably spend a lot of time waiting for the receive buffer to clear, hence the long transfer times. I suggest changing your receiving code to read as many bytes as possible in each read operation. See if that solves your problem.
Since I cannot yet comment on this site, I must write my answer to @Erik here.
The problem is that DataOutputStream doesn't buffer. The whole stream design in Java is based on the decorator design pattern. So you could write
DataOutputStream out = new DataOutputStream(new BufferedOutputStream(socket.getOutputStream()));
It will wrap the original stream in a BufferedOutputStream which is more efficient, which is then wrapped into a DataOutputStream which offers additional nice features like writeInt(), writeLong() and so on.
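To make the effect of the wrapping order visible, here is a small sketch that counts how often the layer underneath is asked to write. CountingSink and callsFor are names invented for this example; the sink is a stand-in for the socket stream, where each call would be a system call and possibly a tiny packet.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical demo class, not part of the original posts.
final class WriteCountSketch {

    /** Discards all data but counts how often it is asked to write. */
    static final class CountingSink extends OutputStream {
        long calls;

        @Override
        public void write(int b) {
            calls++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
        }
    }

    /** Writes the given number of ints and returns the write calls that reached the sink. */
    static long callsFor(boolean buffered, int ints) throws IOException {
        CountingSink sink = new CountingSink();
        OutputStream lower = buffered ? new BufferedOutputStream(sink, 65536) : sink;
        DataOutputStream out = new DataOutputStream(lower);
        for (int i = 0; i < ints; i++) {
            out.writeInt(1);
        }
        out.flush(); // pushes whatever the buffer still holds
        return sink.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unbuffered calls: " + callsFor(false, 1000));
        System.out.println("buffered calls:   " + callsFor(true, 1000));
    }
}
```

Without the buffer, every writeInt reaches the sink as one or several small writes, depending on the JDK version. With the buffer, the small writes are collected and passed on in large blocks.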
@Erik: using DataXxxputStream is not the problem here. The problem is that you were sending data in chunks that were too small. Using a buffer solved your problem because, even if you write bit by bit, the buffer collects the small writes and passes them on in large chunks.
Bombe's solution is much nicer, generic and faster.
You should download a good packet sniffer. I'm a huge fan of WireShark personally and I end up using it every time I do some socket programming. Just keep in mind you've got to have the client and server running on different systems in order to pick up any packets.
Things to try:
Is the CPU at 100% while the data is being sent? If so, use visualvm and do a CPU profiling to see where the time is spent
Use a SocketChannel from java.nio - these are generally faster since they can use native IO more easily - of course this only helps if your operation is CPU bound
If it's not CPU bound, there's something going wrong at the network level. Use a packet sniffer to analyze this.
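For the java.nio suggestion in the list above, a minimal sketch of the usual channel write loop follows. It is written against WritableByteChannel, which SocketChannel implements, so the same method would accept a connected SocketChannel; the class and method names are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper class, not part of the original posts.
final class ChannelWriteSketch {

    /** Packs all ints into one buffer and writes it until nothing remains. */
    static void writeInts(WritableByteChannel channel, int[] values) throws IOException {
        // ByteBuffer is big-endian by default, matching DataOutputStream.writeInt.
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Integer.BYTES);
        for (int value : values) {
            buffer.putInt(value);
        }
        buffer.flip(); // switch from filling to draining
        while (buffer.hasRemaining()) {
            channel.write(buffer); // a single write may be partial
        }
    }
}
```

For very large arrays the buffer would be filled and drained in fixed-size rounds instead of being allocated in one piece.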
I was using PrintWriter to send data. I removed that and sent data with BufferedOutputStream.write(String.getBytes()) and got about 10x faster sending.
How is your heap size set? I had a similar problem recently with the socket transfer of large amounts of data and just by looking at JConsole I realized that the application was spending most of its time doing full GCs.
Try -Xmx1g
Use a ByteBuffer for sending the data.
Related
In the Android application I am developing, I send bytes with the value 5 over TCP and UDP sockets. I would like to know whether it is possible to find out how much of the payload had been received by the time a SocketTimeoutException was thrown. I'm running tests with moving devices over Wi-Fi Direct, so not everything that is sent may be received, because the peers can disconnect first.
Also, for UDP, when packet loss occurs I would like to know how much data I actually received.
To read the data I use readFully and receive. The reception is done either in a single step or in a loop in which I receive large amounts of data. I could not receive the bytes one by one because it would be really slow.
TCP RX:
ServerSocket serverSocket = new ServerSocket(SERVERPORT);
Socket client = serverSocket.accept();
DataInputStream DIS = new DataInputStream(client.getInputStream());
int tamMensaje = (100 * 1000 * 1000);
byte[] payload = new byte[tamMensaje];
DIS.readFully(payload);
int failures = 0;
for (int i = 0; i < tamMensaje; i++) {
if (payload[i] != 5) {
failures = failures + 1;
}
}
int nPayLoad = payload.length- failures;
client.close();
serverSocket.close();
UDP RX:
DatagramSocket datagramSocket = new DatagramSocket(SERVERPORT);
byte[] buffer = new byte[1400];
DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
int tamMensaje = 1400 *71428;
int iteration = 71428;
for (int i = 0; i < iteration; i++) {
datagramSocket.receive(packet);
msg_received = msg_received + new String(buffer, 0, packet.getLength());
}
byte[] payload = msg_received.getBytes();
int failures = 0;
for (int i = 0; i < tamMensaje; i++) {
if (payload[i] != 5) {
failures = failures + 1;
}
}
int nPayload = payload.length- failures;
datagramSocket.close();
How can I find out how much data was received if the connection is cut off during sending, for both TCP and UDP, and also when UDP does not receive everything it should?
Thanks
You can't if you use any of the compound readXXX() methods, as their APIs don't provide any means of retrieving it.
However, since an int, for example, is normally sent in one go, e.g. by writeInt(), there is really little reason why half of it would arrive long before the other half, even if you got really unlucky and segmentation or packetization split it. It could happen, but you would have to be extraordinarily unlucky, and in any case half an int is no more use than none of it.
Similar considerations apply to readFully(): presumably you are reading something, all of which is supposed to be there, so half of it wouldn't be of much use; and conversely, if it would, don't use readFully().
If you use the basic read() method, the answer of course is zero: nothing arrived before the timeout.
I could not receive the bytes one by one because it would be really slow.
Of course, but that's not the only alternative. Use read(byte[]), with a buffer size of your choice, say 8192. Or a BufferedInputStream.
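A minimal sketch of the read(byte[]) approach that keeps a running total, so the number of bytes received before a timeout is known. It assumes a read timeout was set on the socket with setSoTimeout; the class and method names are illustrative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.SocketTimeoutException;

// Hypothetical helper class, not part of the original posts.
final class PartialReadSketch {

    /**
     * Fills dest from in and returns how many bytes arrived before end of
     * stream, a read timeout, or dest being full.
     */
    static int readAvailable(InputStream in, byte[] dest) throws IOException {
        int total = 0;
        try {
            while (total < dest.length) {
                int count = in.read(dest, total, Math.min(8192, dest.length - total));
                if (count == -1) {
                    break; // peer closed the connection
                }
                total += count;
            }
        } catch (SocketTimeoutException e) {
            // Timed out: everything counted in total had already been delivered.
        }
        return total;
    }
}
```

For UDP the equivalent bookkeeping would be to add up packet.getLength() after each successful receive and stop at the first timeout.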
As a hobby project, I'm writing an android voip client. When writing voice data to the socket (Vars.mediaSocket), many times, the data isn't immediately sent out over the wifi but just stalls and then all at once it will send 20 seconds worth of voice. Then it will stall again and wait for 30 seconds and then send 30 seconds of voice. The wait is not consistent but after a while it will continuously send voice data immediately. I've tried everything from using DataOutputStream to setting the socket output buffer size, setting the sendbuffer size huge, small, and lastly, buffering the voice data from its 32 byte chunks to anything from 128bytes to 32kb.
Utils.logcat(Const.LOGD, encTag, "MediaCodec encoder thread has started");
isEncoding = true;
byte[] amrbuffer = new byte[32];
short[] wavbuffer = new short[160];
int outputCounter = 0;
//setup the wave audio recorder. since it is released and restarted, it needs to be setup here and not onCreate
wavRecorder = null; //remove pointer to the old recorder for safety
wavRecorder = new AudioRecord(MediaRecorder.AudioSource.MIC, SAMPLESWAV, AudioFormat.CHANNEL_IN_MONO, FORMAT, 160);
wavRecorder.startRecording();
AmrEncoder.init(0);
while(!micMute)
{
int totalRead = 0, dataRead;
while(totalRead < 160)
{//although unlikely to be necessary, buffer the mic input
dataRead = wavRecorder.read(wavbuffer, totalRead, 160 - totalRead);
totalRead = totalRead + dataRead;
}
int encodeLength = AmrEncoder.encode(AmrEncoder.Mode.MR122.ordinal(), wavbuffer, amrbuffer);
try
{
Vars.mediaSocket.getOutputStream().write(amrbuffer);
Vars.mediaSocket.getOutputStream().flush();
}
catch (IOException i)
{
Utils.logcat(Const.LOGE, encTag, "Cannot send amr out the media socket");
Utils.dumpException(tag, i);
}
Is there something I'm missing? To simulate a second cell phone, I have another client which just simply reads the voice data, throws it away, and reads again in a loop. I can confirm in the simulated second cell phone when the real cell phone stops sending voice, the simulated one's socket.read hangs until the real one starts sending voice again.
I'm really hoping not to have to write a jni for the socket as I don't know anything about that and was hoping I could write the app as a standard java app.
CASE CLOSED: it turned out to be a server-side bug, but the suggestion to simplify and go back to basics is still a good idea.
You are adding most of the latency yourself by reading large amounts of data before writing any of it. You should just use the standard Java copy loop:
byte[] buffer = new byte[8192];
int count;
while ((count = in.read(buffer)) > 0)
{
out.write(buffer, 0, count);
}
You need to adapt this to incorporate your codec step. Note that you don't need a buffer the size of the entire input. You can tune its size to suit yourself but 8192 is a good starting point. You can increase it to say 32k but don't decrease it. If your codec needs the data in fixed-size chunks, use a buffer of that size and DataInputStream.readFully(). But the larger the buffer the more the latency.
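The fixed-size-chunk variant mentioned above could look roughly like this. The Codec interface is a hypothetical stand-in for the real encoder, and the class and method names are assumptions for the example.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper class, not part of the original posts.
final class ChunkedCodecSketch {

    /** Hypothetical codec step; a real one would return the encoded bytes. */
    interface Codec {
        byte[] encode(byte[] chunk);
    }

    /** Reads whole chunks, encodes each one and writes it at once. Returns the chunks sent. */
    static int pump(InputStream source, OutputStream sink, int chunkSize, Codec codec)
            throws IOException {
        DataInputStream in = new DataInputStream(source);
        byte[] chunk = new byte[chunkSize];
        int chunks = 0;
        while (true) {
            try {
                in.readFully(chunk); // blocks until the chunk is complete
            } catch (EOFException endOfInput) {
                break; // source ended; a trailing partial chunk is dropped
            }
            sink.write(codec.encode(chunk)); // one write per encoded chunk
            chunks++;
        }
        sink.flush();
        return chunks;
    }
}
```

Each encoded chunk goes out as soon as it is ready, so the added latency stays at one chunk.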
EDIT Specific issues with your code:
byte[] amrbuffer = new byte[AMRBUFFERSIZE];
byte[] outputbuffer = new byte [outputBufferSize];
Remove (see below).
short[] wavbuffer = new short[WAVBUFFERSIZE];
int outputCounter = 0;
Remove outputCounter.
//setup the wave audio recorder. since it is released and restarted, it needs to be setup here and not onCreate
wavRecorder = null; //remove pointer to the old recorder for safety
Pointless. Remove.
wavRecorder = new AudioRecord(MediaRecorder.AudioSource.MIC, SAMPLESWAV, AudioFormat.CHANNEL_IN_MONO, FORMAT, WAVBUFFERSIZE);
wavRecorder.startRecording();
AmrEncoder.init(0);
OK.
try
{
Vars.mediaSocket.setSendBufferSize(outputBufferSize);
}
catch (SocketException e)
{
e.printStackTrace();
}
Pointless. Remove. The socket send buffer should be as large as possible. Unless you know that its default size is < outputBufferSize there is no benefit to this. In any case we are getting rid of outputBuffer altogether.
while(!micMute)
{
int totalRead = 0, dataRead;
while(totalRead < WAVBUFFERSIZE)
{//although unlikely to be necessary, buffer the mic input
dataRead = wavRecorder.read(wavbuffer, totalRead, WAVBUFFERSIZE - totalRead);
totalRead = totalRead + dataRead;
}
int encodeLength = AmrEncoder.encode(AmrEncoder.Mode.MR122.ordinal(), wavbuffer, amrbuffer);
OK.
if(outputCounter == outputBufferSize)
{
Utils.logcat(Const.LOGD, encTag, "Sending output buffer");
try
{
Vars.mediaSocket.getOutputStream().write(outputbuffer);
Vars.mediaSocket.getOutputStream().flush();
}
catch (IOException i)
{
Utils.logcat(Const.LOGE, encTag, "Cannot send amr out the media socket");
Utils.dumpException(tag, i);
}
outputCounter = 0;
}
System.arraycopy(amrbuffer, 0, outputbuffer, outputCounter, encodeLength);
outputCounter = outputCounter + encodeLength;
Utils.logcat(Const.LOGD, encTag, "Output buffer fill: " + outputCounter);
Remove all the above and substitute
Vars.mediaSocket.getOutputStream().write(amrbuffer, 0, encodeLength);
This also means you can get rid of 'outputBuffer' as promised.
NB Don't flush inside loops. As a matter of fact flushing a socket output stream does nothing, but the general principle still holds.
OK, so I'm making a Java program that has a server and a client, and I'm sending a zip file from the server to the client. I have sending the file down, almost. But on the receiving side I've found some inconsistency. My code isn't always getting the full archive. I'm guessing it's terminating before the BufferedReader has the full thing. Here's the code for the client:
public void run(String[] args) {
try {
clientSocket = new Socket("jacob-custom-pc", 4444);
out = new PrintWriter(clientSocket.getOutputStream(), true);
in = new BufferedInputStream(clientSocket.getInputStream());
BufferedReader inRead = new BufferedReader(new InputStreamReader(in));
int size = 0;
while(true) {
if(in.available() > 0) {
byte[] array = new byte[in.available()];
in.read(array);
System.out.println(array.length);
System.out.println("recieved file!");
FileOutputStream fileOut = new FileOutputStream("out.zip");
fileOut.write(array);
fileOut.close();
break;
}
}
} catch(IOException e) {
e.printStackTrace();
System.exit(-1);
}
}
So how can I be sure the full archive is there before it writes the file?
On the sending side, write the file size before you start writing the file. On the reading side, read the file size so you know how many bytes to expect. Then call read until you have received everything you expect. With network sockets it may take more than one call to read to get everything that was sent. This is especially true as your data gets larger.
HTTP does this with a Content-Length: x header line, where x is the body size in bytes. This is elegant, and a read can still throw a timeout exception if the connection is broken.
You are using a TCP socket. The ZIP file is probably larger than the network MTU, so it will be split up into multiple packets and reassembled at the other side. Still, something like this might happen:
client connects
server starts sending. The ZIP file is bigger than the MTU and therefore split up into multiple packets.
client busy-waits in the while (true) until it gets the first packets.
client notices that data has arrived (in.available() > 0)
client reads all available data, writes it to the file and exits
the last packets arrive
So as you can see: Unless the client machine is crazily slow and the network is crazily fast and has a huge MTU, your code simply won't receive the entire file by design. That's how you built it.
A different approach: Prefix the data with the length.
Socket clientSocket = new Socket("jacob-custom-pc", 4444);
DataInputStream dataReader = new DataInputStream(clientSocket.getInputStream());
FileOutputStream out = new FileOutputStream("out.zip");
long size = dataReader.readLong();
long chunks = size / 1024;
int lastChunk = (int)(size - (chunks * 1024));
byte[] buf = new byte[1024];
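for (long i = 0; i < chunks; i++) {
dataReader.readFully(buf); // read(buf) may return fewer than 1024 bytes
out.write(buf);
}
dataReader.readFully(buf, 0, lastChunk); // the remainder after the full chunks
out.write(buf, 0, lastChunk);
out.close();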
And the server uses DataOutputStream to send the size of the file before the actual file. I didn't test this, but it should work.
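The answer describes the sending side but does not show it. A minimal sketch of what it might look like, assuming the same 8-byte length prefix that readLong() expects; the class and method names are invented for the example.

```java
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper class, not part of the original answer.
final class LengthPrefixedSender {

    /** Writes an 8-byte big-endian length, then exactly that many bytes from data. */
    static void send(OutputStream socketOut, InputStream data, long size) throws IOException {
        DataOutputStream out = new DataOutputStream(socketOut);
        out.writeLong(size); // matches dataReader.readLong() on the client
        byte[] buffer = new byte[8192];
        long remaining = size;
        while (remaining > 0) {
            int count = data.read(buffer, 0, (int) Math.min(buffer.length, remaining));
            if (count == -1) {
                throw new EOFException("data ended " + remaining + " bytes early");
            }
            out.write(buffer, 0, count);
            remaining -= count;
        }
        out.flush();
    }
}
```

For a file, the call would be along the lines of send(socket.getOutputStream(), new FileInputStream(file), file.length()), where file is whatever java.io.File the server is serving.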
How can I make sure I received whole file through socket stream?
By fixing your code. You are using InputStream.available() as a test for end of stream. That's not what it's for. Change your copy loop to this, which is also a whole lot simpler:
while ((count = in.read(buffer)) > 0)
{
out.write(buffer, 0, count);
}
Use with any buffer size greater than zero, typically 8192.
in.available() only tells you how many bytes can be read right now without blocking; a result of zero does not mean end of stream. More data may arrive at any time in a later TCP/IP packet. Normally you never need in.available(): in.read() is sufficient for reading the stream entirely. The pattern for reading an input stream is
byte[] buf = new byte[8192];
int size;
while ((size = in.read(buf)) != -1)
process(buf, size);
// end of stream has been reached
This way you will read the stream entirely, until its end.
update If you want to read multiple files, then chunk your stream into "packets" and prefix every one with an integer size. You then read until size bytes have been received instead of until in.read() returns -1.
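A sketch of such size-prefixed "packets", assuming a 4-byte int prefix per frame. FrameSketch, writeFrame and readFrame are names made up for this example.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical helper class, not part of the original answer.
final class FrameSketch {

    /** Writes one frame: a 4-byte length followed by the payload. */
    static void writeFrame(DataOutputStream out, byte[] payload) throws IOException {
        out.writeInt(payload.length);
        out.write(payload);
    }

    /** Returns the next payload, or null if the stream ended before another length prefix. */
    static byte[] readFrame(DataInputStream in) throws IOException {
        int length;
        try {
            length = in.readInt();
        } catch (EOFException noMoreFrames) {
            return null;
        }
        if (length < 0) {
            throw new IOException("negative frame length: " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload); // keeps reading until the whole frame has arrived
        return payload;
    }
}
```

The receiver never has to guess where one file ends and the next begins, and no timing assumptions are involved.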
update2 Anyway, never use in.available() to mark the boundaries between chunks of data. If you do, you are relying on a time delay between incoming pieces of data. You can do that only in real-time systems, and Windows, Java and TCP/IP are all layers that are incompatible with real-time.
These days I'm confused about TCP performance when using Java sockets. The Java code is actually very simple. Details below:
The server opens a port and begins to listen.
The client connects to the server and then begins to write to the socket.
After the server gets the request, it starts a new thread to handle the connection. (This is a long-lived connection that will not time out.)
The server keeps reading until it gets the end separator, then sends a response to the client and goes back to reading.
After the client gets the response, it sends another request.
I find that if the client writes the whole message (including the end separator) in one call, the communication speed is satisfactory: it can reach 50000 messages per minute. However, if the client writes the bytes to the socket in several separate calls, the speed drops sharply to about 1400 messages per minute, roughly 1/36 of the original speed. I'm quite confused about it. Could anyone give me a hand? Any comments are appreciated!
my simulated server side is as below:
public class ServerForHelp {
final static int BUFSIZE = 10240;
Socket socket;
String delimiter = "" + (char) 28 + (char) 13;
public static void main(String[] args) throws IOException {
ServerSocket ss = new ServerSocket(9200);
System.out.println("begin to accept...");
while (true) {
Socket s = ss.accept();
Thread t = new Thread(new SocketThread1(s));
t.start();
}
}
public String readUntilDelimiter() throws Exception {
StringBuffer stringBuf = new StringBuffer();
InputStream stream = socket.getInputStream();
InputStreamReader reader = null;
reader = new InputStreamReader(stream);
char[] buf = new char[BUFSIZE];
while (true) {
int n = -1;
n = reader.read(buf, 0, BUFSIZE);
if (n == -1) {
return null; // it means the client has closed the connection, so return null.
} else if (n == 0) {
continue; // continue to read the data until got the delimiter from the socket.
}
stringBuf.append(buf, 0, n);
String s = stringBuf.toString();
int delimPos = s.indexOf(delimiter);
if (delimPos >= 0) {
// found the delimiter; return prefix of s up to separator and
// To make the thing simple, I have discarded the content after the delimiter.
String result = s.substring(0, delimPos);
sendTheResponse(socket);
return result;
}
}
}
private void sendTheResponse(Socket socket) throws IOException {
Writer writer = new BufferedWriter(new OutputStreamWriter(socket.getOutputStream()));
writer.write("Hi, From server response");
writer.flush();
}
}
class SocketThread1 implements Runnable {
Socket socket;
public SocketThread1(Socket socket) {
this.socket = socket;
}
@Override
public void run() {
ServerForHelp server = new ServerForHelp();
server.socket = socket;
while (true) {
try {
if (server.readUntilDelimiter() == null) // it means that the client has closed the connection, exist
break;
} catch (Exception e) {
e.printStackTrace();
}
}
}
}
It is a normal socket programming.
and the following is my client side:
public void execute() throws Exception{
int msgCnt = 0;
Socket socket = null;
byte[] bufBytes = new byte[512];
long start = System.currentTimeMillis(); // with start = 0 the loop below would never run
final char START_MESSAGE = 0x0B;
final char END_MESSAGE = 0x1C;
final char END_OF_RECORD = 0x0D;//\r
String MESSAGE = "HELLO, TEST";
socket = new Socket("192.168.81.39", 9200);
OutputStream os = socket.getOutputStream();
InputStream is = socket.getInputStream();
while (System.currentTimeMillis() - start < 60000)
{
// If you send the total message at one time, the speed will be improved significantly
// FORMAT 1
StringBuffer buf = new StringBuffer();
buf.append(START_MESSAGE);
buf.append(MESSAGE);
buf.append(END_MESSAGE);
buf.append(END_OF_RECORD);
os.write(buf.toString().getBytes());
// FORMAT 1 END
//FORMAT 2
// os.write(START_MESSAGE);
// os.write(MESSAGE.getBytes());
// os.write(END_MESSAGE);
// os.write(END_OF_RECORD);
//FORMAT 2 END
os.flush();
is.read(bufBytes);
msgCnt++;
System.out.println(msgCnt);
}
System.out.println( msgCnt + " messages per minute");
}
If I use "FORMAT 1" to send the message, the speed can reach 50000 messages per minute, but if I use "FORMAT 2", the speed drops to 1400 messages per minute. Does anyone know the reason?
I have tried to describe this in as much detail as I can, and any help will be very much appreciated.
Multiple very short writes to a socket in rapid succession followed by a read can trigger a bad interaction between Nagle's algorithm and TCP delayed acknowledgment; even if you disable Nagle's algorithm, you'll cause an entire packet to be sent per individual write call (with 40+ bytes of overhead, whether the write is one byte or a thousand).
Wrapping a BufferedOutputStream around the socket's output stream should give you performance similar to "FORMAT 1" (precisely because it holds things in a byte array until it fills or is flushed).
As John Nagle explained on Slashdot:
The user-level solution is to avoid write-write-read sequences on sockets. write-read-write-read is fine. write-write-write is fine. But write-write-read is a killer. So, if you can, buffer up your little writes to TCP and send them all at once.
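Applied to the client in the question, "send them all at once" could look like the sketch below: the markers and the body are assembled in memory and handed to the socket in a single write. The marker values are the ones from the question; the class and method names are assumptions for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original posts.
final class SingleWriteSketch {
    static final int START_MESSAGE = 0x0B;
    static final int END_MESSAGE = 0x1C;
    static final int END_OF_RECORD = 0x0D;

    /** Assembles start marker, body and end markers into one array. */
    static byte[] frame(String body) {
        ByteArrayOutputStream assembled = new ByteArrayOutputStream();
        assembled.write(START_MESSAGE);
        byte[] bytes = body.getBytes(StandardCharsets.US_ASCII);
        assembled.write(bytes, 0, bytes.length);
        assembled.write(END_MESSAGE);
        assembled.write(END_OF_RECORD);
        return assembled.toByteArray();
    }

    /** Sends the whole request with one write call, avoiding write-write-read. */
    static void send(OutputStream socketOut, String body) throws IOException {
        socketOut.write(frame(body));
        socketOut.flush();
    }
}
```

This keeps the exchange in the write-read-write-read pattern regardless of how many pieces the message is built from.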
Local on Linux. It's about 10 seconds for a 20k message. My guess is my Java is bad and Python is fine.
py client:
def scan(self, msg):
try:
print 'begin scan'
HOST = 'localhost'
PORT = 33000
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, PORT));
s.sendall(msg)
data = s.recv(1024)
s.close()
print 'Received', repr(data)
except Exception, e:
print "error: " + str(e)
Java server:
ServerSocket service = new ServerSocket(33000);
while(true) {
debug("Begin waiting for connection");
//this spins
Socket connection = service.accept();
debug("Connection received from " + connection.getInetAddress().getHostName());
OutputStreamWriter out = new OutputStreamWriter(connection.getOutputStream());
BufferedInputStream in = new BufferedInputStream(connection.getInputStream());
ScanResultsHeader results = new ScanResultsHeader();
Scanner scanner = new Scanner();
results = scanner.scan("scannerfake@gmail.com", "123", in);
and
public ScanResultsHeader scan (String userEmail,
String imapRetrievalId,
BufferedInputStream mimeEmail)
throws IOException, FileNotFoundException, MimeException, ScannerException {
//how fast would it be to just slurp up stream?
debug("slurp!");
String slurp = IOUtils.toString(mimeEmail);
debug("slurped " + slurp.length() + " characters");
slurp = slurp.toLowerCase();
debug("lc'ed it");
//...
My guess is I'm juggling the input streams wrong. One catch is the "BufferedInputStream mimeEmail" signature is required by the library API scan is using, so I'll need to get to that form eventually. But I noticed the simple act of slurping up a string takes ludicrously long so I'm already doing something incorrect.
Revising my answer....
If you are reading efficiently, and it appears you are, it will only be taking a long time because either
You are creating a new connection every time you send a message, which can be very expensive.
You are not sending the data as fast as you think.
The message is very large (unlikely, but it could be).
There are plenty of examples on how to do this and a good library you can use is IOUtils which makes it simpler.
You should be able to send about 200K messages per second over a single socket in Java.
If you have a protocol that sends a big-endian length followed by that many bytes, you can do this.
DataInputStream dis = new DataInputStream( ...
int len = dis.readInt();
byte[] bytes = new byte[len];
dis.readFully(bytes);
String text = new String(bytes, "UTF-8");
The original problem was that the client wasn't sending an end-of-input, so the "slurp" operation kept waiting for more data to cross the connection.
The solution was to implement an application-layer protocol that sends the size of the message in advance, then stops reading after that many bytes. I would have preferred a standard library class, something like a FiniteInputStream that extends BufferedInputStream and takes a size as an argument, but I wrote my own.
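The poster's own class is not shown, so the following is only a guess at the idea: a wrapper that exposes a fixed number of bytes of the socket stream and then reports end of stream. The name BoundedInputStreamSketch and all details are assumptions.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch: exposes at most limit bytes of the wrapped stream, then reports end of stream. */
final class BoundedInputStreamSketch extends FilterInputStream {
    private long remaining;

    BoundedInputStreamSketch(InputStream in, long limit) {
        super(in);
        this.remaining = limit;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) return -1;
        int b = super.read();
        if (b != -1) remaining--;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (remaining <= 0) return -1;
        int count = super.read(buf, off, (int) Math.min(len, remaining));
        if (count > 0) remaining -= count;
        return count;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(Math.min(n, remaining));
        remaining -= skipped;
        return skipped;
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(super.available(), remaining);
    }

    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public void mark(int readLimit) {
        // mark/reset would bypass the byte count, so it is disabled
    }

    @Override
    public void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }

    @Override
    public void close() {
        // Deliberately leaves the socket stream open for the next message.
    }
}
```

Since the library API wants a BufferedInputStream, the wrapper can be passed as new BufferedInputStream(new BoundedInputStreamSketch(in, size)), with size read from the length prefix first.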