Java socket writeUTF() and readUTF()

I've been reading a Java socket code snippet and found out that in socket communication, to send messages in sequence, you don't have to separate them by hand; the writer/reader streams do it for you automatically. Here is an example:
writer.java
out.writeUTF("Hello");
out.writeUTF("World");
reader.java
String a = in.readUTF(); // a = "Hello"
String b = in.readUTF(); // b = "World"
I've tried this snippet and it works fine. However, I'm wondering whether this coding style is actually supposed to work. Are there any potential risks in using the socket stream in sequence without explicitly separating each segment?

The writeUTF() and readUTF() methods write and read the length of the String (in bytes, as encoded) followed by the data, and they use a modified UTF-8 encoding. So there are some potential problems:
The maximum length of a String that can be handled this way is 65535 bytes: 65535 characters for pure ASCII, fewer if you use non-ASCII characters - and you cannot easily predict the limit in that case, other than conservatively assuming 3 bytes per character. So if you're sure you'll never send Strings longer than about 20k characters, you'll be fine.
If the app ever needs to communicate with something else (that's not written in Java), the other side may have a hard time handling the modified UTF-8. For application-internal communication, you don't have to worry though.
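To see the framing concretely, here is a minimal sketch (the class name is mine, not from the original post) that captures the output of writeUTF() in a byte array and inspects the two-byte length prefix:

import java.io.*;

public class WriteUtfDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        // writeUTF throws UTFDataFormatException if the encoded form exceeds 65535 bytes
        out.writeUTF("Hello");

        byte[] bytes = buf.toByteArray();
        // The first two bytes are an unsigned 16-bit, big-endian length prefix: 0x00 0x05
        int length = ((bytes[0] & 0xff) << 8) | (bytes[1] & 0xff);
        System.out.println("length prefix: " + length);
        // The rest is the (modified) UTF-8 payload; for pure ASCII it matches standard UTF-8
        System.out.println("payload: " + new String(bytes, 2, bytes.length - 2, "UTF-8"));
    }
}

readUTF() on the other side reads that prefix first, then blocks until exactly that many payload bytes have arrived, which is why the two messages come out cleanly separated.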

According to the documentation, the readUTF and writeUTF methods work with a modified version of UTF-8 that also prepends the length of the data to be read at the beginning.
This means the read operation will block until enough bytes have been fetched before returning the string. It also means the messages actually are segmented, even if you don't see it, since you merely decorate the socket's streams with DataInputStream and DataOutputStream.
In conclusion, yes, it should be quite safe, since the API itself takes care of separating the individual messages.

With a plain java.net.Socket this works fine: the stream waits in readUTF().
But when using MINA's CumulativeProtocolDecoder it won't; it throws java.io.EOFException.

Related

Handle HTTP POST multipart response through ServerSocket

Good afternoon everyone,
First of all, I'll say that this is only for personal purposes, in a certain way: it's made for little projects to improve my Java knowledge. My idea is to build this kind of thing to better understand how developers work with sockets and bytes, as I really like to understand these things for my future ideas.
At the moment I'm making a lightweight HTTP server in Java to understand how it works, and I've been reading documentation but still have some difficulty actually understanding parts of the official documentation. The main problem I'm facing (and I'd like to know whether it's related or not) is that the Content-Length seems to be larger than the length of the data I get from the BufferedReader. I don't know if the issue is the way bytes are parsed into chars by the BufferedReader, so that it ends up holding less data; if so, what I probably have to do is treat this part as binary and read the bytes of the InputStream directly. But here comes the real problem I'm facing.
Since a Reader reads a certain number of bytes ahead and keeps them as its buffer, that data from the InputStream is consumed by the Reader and is no longer on the stream, so calling read() on the stream would return -1, as there are no more bytes to read. A multipart body is divided into multiple parts separated by a boundary, with a blank line that delimits the part headers from the content. I still have to get the headers as a String to process them, but the content should be parsed as binary data. Without modifying the buffer length (which would require knowing in advance the exact length of just the headers), the most probable result is that the content also ends up in the BufferedReader's buffer. Is it possible to recover it even after it has been consumed by the BufferedReader, or should I find a way to read that content as binary without it being processed?
As I said, I'm new to working with sockets and services, so I don't know exactly whether this is a different kind of issue entirely; any help would be appreciated. Thank you in advance.
Answer from Remy Lebeau, which can be found in the comments and which proved useful for me:
Since multipart data is both textual and binary, you are going to have to do your own buffering of the socket data so you have more control and know where the data switches back and forth. At the very least, since you can read binary data directly from a BufferedInputStream, and access its internal buffer, you can let it handle the actual buffering for you, and it is not difficult to write a custom readLine() method that can read a line of text from a BufferedInputStream without using BufferedReader.
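A sketch of such a custom readLine(), assuming the stream is a BufferedInputStream so that mark/reset is available (the class and method names are mine):

import java.io.*;

class LineReading {
    // Read one text line from the stream without a Reader, so the
    // bytes that follow the line remain untouched and binary-safe.
    static String readLine(BufferedInputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') break;                   // bare LF ends the line
            if (b == '\r') {                        // CR, possibly followed by LF
                in.mark(1);
                if (in.read() != '\n') in.reset();  // push back whatever wasn't LF
                break;
            }
            line.write(b);
        }
        if (b == -1 && line.size() == 0) return null;  // end of stream, no data
        return line.toString("ISO-8859-1");            // header lines are single-byte text
    }
}

This way the part headers can be read line by line, and once the blank line is reached, the very same stream can be handed over to binary processing.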

How does the SSLContext.getInstance() method work?

The entire code is quite complicated, so I am coming directly to the point.
The code is as follows:
SSLContext ctx = SSLContext.getInstance("TLS");
If you read the docs for the getInstance(String protocol) method, it says:
This method traverses the list of registered security Providers, starting
with the most preferred Provider. A new SSLContext object encapsulating
the SSLContextSpi implementation from the first Provider that supports the
specified protocol is returned.
Note that the list of registered providers may be retrieved via the
Security.getProviders() method.
For me, the Security.getProviders() method gives the following providers.
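For reference, that list can be printed with a short loop like this (a minimal sketch; the class name is mine):

import java.security.Provider;
import java.security.Security;

public class ListProviders {
    public static void main(String[] args) {
        // Providers are returned in preference order, most preferred first
        for (Provider p : Security.getProviders()) {
            System.out.println(p.getName() + " " + p.getVersion() + ": " + p.getInfo());
        }
    }
}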
Now I have verified that the "TLS" protocol is in com.sun.net.ssl.internal.ssl.Provider (index 2) and is always selected.
But the corresponding SSLContextSpi object is different in Java 6 and Java 7. In Java 6 I am getting com.sun.net.ssl.internal.ssl.SSLContextImpl@7bbf68a9 and in Java 7 I am getting sun.security.ssl.SSLContextImpl$TLS10Context@615ece16. This is having a very bad effect: when I later create SSL sockets, they are of different classes.
So why is this happening? Is there a workaround? I want the same com.sun.net.ssl.internal.ssl.SSLContextImpl@7bbf68a9 SSLContextSpi object encapsulated in the com.sun.net.ssl.internal.ssl.Provider context (which is the same in both cases).
This is having a very bad effect: when I later create SSL sockets, they are of different classes.
This is not a bad effect. Which actual class you get from the factories in the public API is at the discretion of the JRE implementation: these concrete classes are not part of the public API.
The fact that you get different classes between Java 6 and Java 7 doesn't really matter. Even if they had the same name, it wouldn't make sense to compare them to one another.
EDIT:
The public int read(byte[] b) function reads only 1 byte when I give it a byte array of length 4, even though I have confirmed that there are 4 bytes in the stream.
SSLSocket in Java 7 is behaving correctly when you get this. In fact, it's probably behaving better, since the initial 1-byte read is due to a BEAST-prevention measure (the 1/n-1 record split). I'll copy and paste my own answer to that question, since you're making exactly the same mistake.
The assumption you're making, that you read the byte[] exactly as it was written on the other end, is a classic TCP mistake. It's not actually specific to SSL/TLS; it could just as well happen with a plain TCP connection.
There is no guarantee in TCP (and in SSL/TLS) that the reader's buffer will be filled with the exact same packet length as the packets in the writer's buffer. All TCP guarantees is in-order delivery, so you'll eventually get all your data, but you have to treat it as a stream.
This is why protocols that use TCP rely on indicators and delimiters to tell the other end when to stop reading certain messages.
For example, HTTP 1.1 uses a blank line to indicate where the headers end, and the Content-Length header to tell the recipient what entity length to expect (or chunked transfer encoding). SMTP also uses line returns, with a lone . marking the end of a message.
If you're designing your own protocol, you need to define a way for the recipient to know where each meaningful unit of data ends. When you read the data, look for those indicators, and keep filling your read buffer until you have the number of bytes you expect or until you find the delimiter you've defined.
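As an illustration of the length-indicator approach, here is a minimal sketch of length-prefixed framing over a TCP stream (the class and method names are mine; a real protocol would also sanity-check the length before allocating):

import java.io.*;

class Framing {
    // Write a length indicator first, then the payload.
    static void writeMessage(DataOutputStream out, byte[] message) throws IOException {
        out.writeInt(message.length);   // 4-byte big-endian length header
        out.write(message);
        out.flush();
    }

    // Read the indicator, then keep reading until the whole payload has arrived.
    static byte[] readMessage(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] message = new byte[length];
        in.readFully(message);          // loops internally; never returns a partial message
        return message;
    }
}

No matter how TCP fragments the stream, readFully() keeps reading until the declared number of bytes has arrived, which is exactly the loop described above.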

Tokenize Java InputStream into streams, not Strings

I know the Java libraries pretty well, so I was surprised when I realized that, apparently, there's no easy way to do something seemingly simple with a stream. I'm trying to read an HTTP request containing multipart form data (large, multiline tokens separated by delimiters that look like, for example, ------WebKitFormBoundary5GlahTkFmhDfanAn--), and I want to read until I encounter a part of the request with a given name, and then return an InputStream of that part.
I'm fine with just reading the stream into memory and returning a ByteArrayInputStream, because the files submitted should never be larger than 1MB. However, I want to make sure that the reading method throws an exception if the file is larger than 1MB, so that excessively large files don't fill up the JVM's memory and crash the server. The file data may be binary, which rules out BufferedReader.readLine() (it drops newlines, which could be any of \r, \n, or \r\n, resulting in loss of data).
All of the obvious tokenizing solutions, such as Scanner, read the tokens as Strings, not streams, which could cause OutOfMemoryErrors for large files, exactly what I'm trying to avoid. As far as I can tell, there's no equivalent of Scanner that returns each token as an InputStream without reading it into memory. Is there something I'm missing, or is there any way to build something like that myself, using just the standard Java libraries (no Apache Commons, etc.), that doesn't require me to read the stream a character at a time and write all of the token-scanning code myself?
Addendum: Shortly before posting this, I realized that the obvious solution to my original problem was simply to read the full request body into memory, failing if it's too large, and then to tokenize the resulting ByteArrayInputStream with a Scanner. This is inefficient, but it works. However, I'm still interested to know if there's a way to tokenize an InputStream into sub-streams, without reading them into memory, without using extra libraries, and without resorting to character-by-character processing.
It's not possible without loading the parts into memory (the solution you don't want) or saving them to disk (which becomes I/O heavy). Tokenizing the stream into separate streams without loading it into memory implies that you can read the stream (to tokenize it) and then read it again later. In short, what you want is impossible unless your stream is seekable, and seekable streams are generally specialized streams for very specific applications and specialized I/O objects, like RandomAccessFile.
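For completeness, the addendum's fallback (read fully, but fail fast above a cap) can be kept memory-safe along these lines (a minimal sketch; the class and method names are mine, the 1MB cap comes from the question):

import java.io.*;

class CappedRead {
    // Read the whole stream into memory, throwing before the cap is exceeded
    // so an oversized upload cannot exhaust the JVM heap.
    static ByteArrayInputStream readCapped(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            if (buf.size() + n > maxBytes) {
                throw new IOException("request body exceeds " + maxBytes + " bytes");
            }
            buf.write(chunk, 0, n);
        }
        return new ByteArrayInputStream(buf.toByteArray());
    }
}

The resulting ByteArrayInputStream supports mark/reset, so it can then be tokenized as many times as needed.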

Send both characters and bytes via sockets (TCP)

I made a multi-client chat that is working pretty well. I am sending data through a PrintWriter and receiving it using a BufferedReader. As long as it's all characters, everything goes fine. But I thought about adding the possibility to send voice, too, and here I faced a problem. I have already used the socket's input and output streams for transmitting characters. How do I solve this and make sending bytes possible as well? Isn't it possible to create a second stream that would be responsible for transmitting bytes? That would make things much easier. If not, how do I solve it otherwise?
I would not use TCP for transmitting voice; see the differences between TCP and UDP.
However, you can mix the two by sending only bytes and converting all character messages to byte messages. I would not mix writers and streams on the same socket.
In your case I'd simply open another socket.
Since you're considering VoIP, you might want to consider a UDP socket instead of TCP (assuming you use TCP for your chat).
But remember that in the end you always send bytes over a socket; it doesn't matter if it's text or voice data; text strings are also converted to bytes.
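If you do keep a single TCP socket for both, one way is to tag each message with a type byte and a length, using DataOutputStream/DataInputStream (a minimal sketch; the class name and tag values are mine):

import java.io.*;

class MixedChat {
    static final byte TYPE_TEXT = 0, TYPE_AUDIO = 1;

    static void sendText(DataOutputStream out, String msg) throws IOException {
        out.writeByte(TYPE_TEXT);
        out.writeUTF(msg);                  // writeUTF is already length-prefixed
        out.flush();
    }

    static void sendAudio(DataOutputStream out, byte[] samples) throws IOException {
        out.writeByte(TYPE_AUDIO);
        out.writeInt(samples.length);       // explicit length prefix for raw bytes
        out.write(samples);
        out.flush();
    }

    static void receive(DataInputStream in) throws IOException {
        switch (in.readByte()) {
            case TYPE_TEXT:
                System.out.println("chat: " + in.readUTF());
                break;
            case TYPE_AUDIO:
                byte[] samples = new byte[in.readInt()];
                in.readFully(samples);      // hand the buffer to audio playback
                break;
        }
    }
}

The tag tells the receiver how to interpret what follows, so text and voice never get mixed up even though they share one stream.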

What is the proper way to handle strings: Java client & C++ server

I'm writing a C++ server/client application (TCP) that is working fine but I will soon have to write a Java client which obviously has to be compatible with the C++ server it connects to.
As of now, when the server or client receives a string (text), it loops through the bytes until a '\0' is found, which marks the end of the string.
Here's the question: is it still good practice to handle strings that way when communicating between Java and C++, rather than C++ and C++?
There's one thing you should read about: encodings. Basically, the same sequence of bytes can be interpreted in different ways. As long as you pass things around within C++ or within Java, everything agrees on their meaning, but when using the net (i.e. a byte stream) you must make up your mind. If in doubt, read about and use UTF-8.
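For example, if the C++ server keeps its existing '\0' convention, the Java client only needs to be explicit about the encoding when producing the bytes (a minimal sketch; the class and method names are mine):

import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class CStringWriter {
    // Encode explicitly as UTF-8, then append the terminator the C++ side scans for.
    // UTF-8 never produces a 0x00 byte except for U+0000 itself, so the
    // terminator stays unambiguous.
    static void sendCString(OutputStream out, String s) throws IOException {
        out.write(s.getBytes(StandardCharsets.UTF_8));
        out.write(0);   // the '\0' end-of-string marker
        out.flush();
    }
}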
Consider using Protocol Buffers or Thrift instead of rolling your own protocol.
