What is the proper way to handle strings : Java client & C++ server


I'm writing a C++ server/client application (TCP) that is working fine but I will soon have to write a Java client which obviously has to be compatible with the C++ server it connects to.
For now, when the server or client receives a string (text), it loops through the bytes until a '\0' is found, which marks the end of the string.
Here's the question: is it still good practice to handle strings that way when communicating between Java and C++ rather than between C++ and C++?

There's one thing you should read about: Encodings. Basically, the same sequence of bytes can be interpreted in different ways. As long as you pass things around in C++ or Java, things will agree on their meaning, but when using the net (i.e. a byte stream) you must make up your mind. If in doubt, read about and use UTF-8.
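As a rough sketch of what that could look like on the Java side, assuming the C++ server keeps its '\0' terminator and both ends agree on UTF-8: the class and method names below are made up for illustration. Standard UTF-8 never produces a zero byte except for the character U+0000 itself, so the terminator stays unambiguous as long as the strings don't contain that character.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: NUL-terminated UTF-8 strings, matching the framing described in the question.
class NulTerminatedStrings {

    // Writes the UTF-8 bytes of s followed by a single 0 byte.
    // Assumes s itself contains no U+0000 character.
    static void write(OutputStream out, String s) throws IOException {
        out.write(s.getBytes(StandardCharsets.UTF_8));
        out.write(0);
    }

    // Reads bytes up to (not including) the next 0 byte and decodes them as UTF-8.
    static String read(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) > 0) {   // 0 is the terminator, -1 is end of stream
            buffer.write(b);
        }
        if (b < 0) {
            throw new EOFException("stream ended before the terminator");
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        write(wire, "h\u00e9llo");
        write(wire, "world");
        InputStream in = new ByteArrayInputStream(wire.toByteArray());
        System.out.println(read(in));
        System.out.println(read(in));
    }
}
```

With a real connection you would pass socket.getOutputStream() and socket.getInputStream() (ideally buffered) instead of the in-memory streams used here.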

Consider using Protocol Buffers or Thrift instead of rolling your own protocol.

Related

How SSLContext.getInstance() method works?

The entire code is quite complicated, so I'll come directly to the point.
The code is as follows:
SSLContext ctx = SSLContext.getInstance("TLS");
If you read docs for getInstance(String protocol) method it says
This method traverses the list of registered security Providers, starting
with the most preferred Provider. A new SSLContext object encapsulating
the SSLContextSpi implementation from the first Provider that supports the
specified protocol is returned.
Note that the list of registered providers may be retrieved via the
Security.getProviders() method.
For me Security.getProviders() method gives following providers
Now I have verified that "TLS" protocol is in com.sun.net.ssl.internal.ssl.Provider (index 2 ) and is always selected.
But the corresponding SSLContextSpi object is different in Java 6 and Java 7. In Java 6 I am getting com.sun.net.ssl.internal.ssl.SSLContextImpl@7bbf68a9 and in Java 7 I am getting sun.security.ssl.SSLContextImpl$TLS10Context@615ece16. This is having very bad effect as when later I am creating SSL socket they are of different class.
So why is this happening? Is there a workaround? I want the same com.sun.net.ssl.internal.ssl.SSLContextImpl@7bbf68a9 SSLContextSpi object encapsulated in the com.sun.net.ssl.internal.ssl.Provider context (which is the same in both cases).

This is having very bad effect as when later I am creating SSL socket they are of different class.
This is not a bad effect. Which actual class you get from the factories in the public API is at the discretion of the JRE implementation: these concrete classes are not part of the public API.
The fact that you get different classes between Java 6 and Java 7 doesn't really matter. Even if they had the same name, it wouldn't make sense to compare them to one another.
EDIT:
public int read(byte[] b) function reads only 1 bytes when I give it a
byte array of length 4 and also i have confirmed that there are 4
bytes in the stream.
SSLSocket in Java 7 is behaving correctly when you get this. In fact, it's probably behaving better, since this initial 1-byte read is due to the BEAST-prevention measure. I'll copy and paste my own answer to that question, since you're making exactly the same mistake.
The assumption you're making about reading the byte[] exactly as you write them on the other end is a classic TCP mistake. It's not actually specific to SSL/TLS, but could also happen with a TCP connection.
There is no guarantee in TCP (and in SSL/TLS) that the reader's buffer will be filled with the exact same packet length as the packets in the writer's buffer. All TCP guarantees is in-order delivery, so you'll eventually get all your data, but you have to treat it as a stream.
This is why protocols that use TCP rely on indicators and delimiters to tell the other end when to stop reading certain messages.
For example, HTTP 1.1 uses a blank line to indicate when the headers end, and it uses the Content-Length header to tell the recipient what entity length to expect (or chunked transfer encoding). SMTP also uses line returns and . at the end of a message.
If you're designing your own protocol, you need to define a way for the recipient to know when what you define as meaningful units of data are delimited. When you read the data, read such indicators, and fill in your read buffer until you get the amount of bytes you expect or until you find the delimiter that you've defined.
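To illustrate the point about filling the read buffer, here is a minimal sketch. The oneByteAtATime helper is hypothetical and only simulates a stream that hands back less than was asked for, the way the Java 7 SSLSocket does for the first byte. DataInputStream.readFully does the same job as the loop if you'd rather not write it yourself.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadExactly {

    // Keeps calling read() until count bytes have arrived.
    // A single read() is allowed to return fewer bytes than requested.
    static byte[] readExactly(InputStream in, int count) throws IOException {
        byte[] buffer = new byte[count];
        int filled = 0;
        while (filled < count) {
            int n = in.read(buffer, filled, count - filled);
            if (n < 0) {
                throw new EOFException("stream ended after " + filled + " of " + count + " bytes");
            }
            filled += n;
        }
        return buffer;
    }

    // Hypothetical test helper: a stream that never returns more than one byte per read call.
    static InputStream oneByteAtATime(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, 1));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        InputStream in = oneByteAtATime(new byte[] {1, 2, 3, 4});
        int n = in.read(new byte[4]);        // short read: only one byte is delivered
        byte[] rest = readExactly(in, 3);    // the loop collects the remaining three
        System.out.println(n + " byte first, then " + rest.length + " more");
    }
}
```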

Send a Data Record through TCP in java

I'm a Delphi developer and recently I decided to port one of my programs to java and I'm doing the server side program in java to make it cross-platform.
In Delphi, I could easily send a record as an array of bytes through TCP but I don't have much experience in java and I have no idea how to do it in an easy but moderated way.
Here is a sample of my data record:
type
  Tlogin = record
    username : string[50];
    password : string[50];
    version : word;
  end;
And I would just simply send this type of record after making it an array of bytes.
Any ideas how to make such data records in java and how do I set size for strings, or any better suggestions to handle strings for sending them through TCP.

In Java, you simply send objects over the sockets between a client and server and there are a number of ways to do that. For a related reference please visit
Sending objects over Java sockets
For a more step by step example visit the following link:
JGuru - Sending objects over a socket
In your case your object would look as follows
import java.io.Serializable;

class TLogin implements Serializable
{
    private String userName;
    private String password;
    private int version;
    // implement your object's methods below
}
Fields within the object that you do not want to participate in serialization and de-serialization can be marked as transient.
For a detailed step by step example of serialization visit
Java Serialization Example
Edit based on the comment provided to my earlier response.
Serialization in simple words : It is a technique where-in a Java object is converted to a byte sequence (essentially, all fields of the object except those marked transient are a part of this byte sequence). This byte sequence can then be used to re-construct the object at a later point of time. The byte sequence obtained by serializing an object can be either persisted to a store or transmitted over a network channel, in order to have it re-construct the object at a later stage.
Serialization is at the core of a lot of communication protocols that happen within a client server environment within Java using either of RMI, Sockets or SOAP.
Having talked about serialization, we come to the client-server problem.
If the plan is to port only the server-side code to Java, then you have the following options to enable communication between the client and server:
1. Design the server to use SOAP/REST to communicate with the Delphi client.
2. Augment your record with a header data structure that contains information about the length and type of the data being stored, and use this header on the server side to reconstruct the object from the byte sequence the client transmits.
However, in my opinion the first method is better than the second, since:
- it is a standard interoperable technique. If at a later point you wish to port the client to some other language like C# or Python, you do not need to change the server.
- it lets the web service infrastructure handle the nitty-gritty of SOAP/REST serialization and lets you focus on the business logic.
I hope this lengthy answer points you in a direction towards the solution
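For the header-based option, a minimal sketch could look like the following. Everything about the layout is an assumption made up for illustration, not the layout Delphi produces for the record: a 1-byte message type, a 4-byte payload length, then each string as a 2-byte length plus UTF-8 bytes, and the version as an unsigned 16-bit value, all big-endian. The Delphi client would have to write exactly the same layout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical wire format: [type:1][payloadLength:4][user][password][version:2]
class LoginFrame {
    static final int TYPE_LOGIN = 1; // made-up type code

    final String userName;
    final String password;
    final int version;               // 0..65535, like Delphi's word

    LoginFrame(String userName, String password, int version) {
        this.userName = userName;
        this.password = password;
        this.version = version;
    }

    byte[] encode() throws IOException {
        ByteArrayOutputStream payloadBytes = new ByteArrayOutputStream();
        DataOutputStream payload = new DataOutputStream(payloadBytes);
        writeField(payload, userName);
        writeField(payload, password);
        payload.writeShort(version);
        payload.flush();

        ByteArrayOutputStream frameBytes = new ByteArrayOutputStream();
        DataOutputStream frame = new DataOutputStream(frameBytes);
        frame.writeByte(TYPE_LOGIN);
        frame.writeInt(payloadBytes.size());   // header tells the reader how much follows
        payloadBytes.writeTo(frame);
        frame.flush();
        return frameBytes.toByteArray();
    }

    static LoginFrame decode(DataInputStream in) throws IOException {
        int type = in.readUnsignedByte();
        int length = in.readInt();
        if (type != TYPE_LOGIN || length < 0) {
            throw new IOException("unexpected header: type=" + type + ", length=" + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload);                 // blocks until the whole payload has arrived
        DataInputStream p = new DataInputStream(new ByteArrayInputStream(payload));
        return new LoginFrame(readField(p), readField(p), p.readUnsignedShort());
    }

    // Each string is sent as a 2-byte length followed by its UTF-8 bytes (assumes < 65536 bytes).
    private static void writeField(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeShort(bytes.length);
        out.write(bytes);
    }

    private static String readField(DataInputStream in) throws IOException {
        byte[] bytes = new byte[in.readUnsignedShort()];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

On a socket, the server would call LoginFrame.decode(new DataInputStream(socket.getInputStream())). Unlike Java serialization, this format can be produced by any language that can write bytes.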

Protocol for platform independent socket communication

I'm trying to understand how platform independent socket communication works, because I would like to share socket data between a Java server and some native Unix and Windows clients. Sockets are platform independent by design, but the data representation is machine-related, hence it is advantageous if the TCP data abstracts the real data format, because a data format that is supported on one system doesn't have to be necessarily supported on another.
For example if I want to send an unsigned int value from a C++ client program to a Java server I must tell the server that this number should be interpreted as a negative integer. How does this kind of abstraction work? With my limited knowledge I would just send a number as text and then append some kind of unique character sequence that tells the receiver what kind of data he received, but I don't know if this is a viable approach.
To be a bit more concrete: I would like to send messages that contain the following content:
- At the beginning of the message some kind of short signal or command so that the receiver exactly knows what to do with the data that will follow.
- Then some textual content of arbitrary length.
- Followed by a number, which can be also text, but should be interpreted separately.
- At the end maybe a mark that tells the server that the message ends here.
TCP processes the data in byte chunks. Does this mean that when I write a UTF-8 encoded char in one byte, this char is interpreted in the same way on different machines if the client machines take Java's big-endian byte order into account? Thanks for any input and help.

Sockets are platform independent, but the data transmitted over them is not (type lengths, byte order, string encoding, ...).
Look at Thrift, Protobuf or Avro if you want to send binary data with cross-language and cross-platform functionality.
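If you do end up handling raw integers yourself, the unsigned int case from the question can be sketched like this, assuming both sides agree to send 32-bit values in network (big-endian) byte order. The class and method names are only illustrative. Java has no unsigned int type, so the 4 bytes are widened into a long on the receiving side. UTF-8 text does not have this problem, since it is defined as a sequence of single bytes and byte order does not apply to it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class UnsignedWire {

    // Decodes 4 bytes in big-endian order as an unsigned 32-bit value.
    static long readUnsignedInt(byte[] fourBytes) {
        int raw = ByteBuffer.wrap(fourBytes).order(ByteOrder.BIG_ENDIAN).getInt();
        return Integer.toUnsignedLong(raw);   // requires Java 8 or later
    }

    // Encodes the low 32 bits of value as 4 big-endian bytes.
    static byte[] writeUnsignedInt(long value) {
        return ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN).putInt((int) value).array();
    }

    public static void main(String[] args) {
        byte[] fromClient = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFE};
        System.out.println(readUnsignedInt(fromClient)); // prints 4294967294
    }
}
```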

How can I send data in binary form over a Java socket?

I've seen lots of examples of sending serialized data over sockets in Java, but all I want is to send some simple integers and a string. And, the problem is I'm trying to communicate these to a binary written in C.
So, bottom line: how can I just send some bytes over a socket in Java?

You can use the simple OutputStream given by the Socket.
From there you can write bytes.
If you want you can also encapsulate this stream in a BufferedOutputStream to have a buffer.

I would really recommend not using the Java Sockets library directly. I've found Netty (from JBoss) to be really easy to implement and really powerful. The Netty ChannelBuffer class comes with a whole host of options for writing different data types, and of course you can write your own encoders and decoders to write POJOs down the stream if you wish.
This page is a really good starter - I was able to make a fairly sophisticated client/server with custom encoders and decoders in under 30 minutes reading this: http://docs.jboss.org/netty/3.2/guide/html/start.html.
If you really want to use Java sockets, the socket output stream can be wrapped in a DataOutputStream, which allows you to write many different data types as well, for example:
new DataOutputStream(socket.getOutputStream()).writeInt(5);
I hope that's useful.
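A slightly fuller sketch of the DataOutputStream approach, for a message of two integers and a string. The message layout and the names are assumptions for illustration only. DataOutputStream always writes integers big-endian (network order), so the C side would convert each 4-byte value with ntohl and then read the announced number of text bytes.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class BinaryMessage {

    // Hypothetical layout: [id:4][count:4][textLength:4][text as UTF-8 bytes]
    static void send(OutputStream rawOut, int id, int count, String text) throws IOException {
        DataOutputStream out = new DataOutputStream(new BufferedOutputStream(rawOut));
        byte[] textBytes = text.getBytes(StandardCharsets.UTF_8);
        out.writeInt(id);                 // 4 bytes, big-endian
        out.writeInt(count);
        out.writeInt(textBytes.length);   // tells the receiver how many text bytes follow
        out.write(textBytes);
        out.flush();                      // push the buffered bytes out
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        send(wire, 5, 258, "hi");
        System.out.println(wire.size()); // prints 14
    }
}
```

With a real connection, pass socket.getOutputStream() as rawOut.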

I would recommend looking into Protocol Buffers for the serialization and ZeroMQ for the data transfer.

java socket writeUTF() and readUTF()

I've been reading some Java socket code snippets and found out that in socket communication, to send messages in sequence, you don't have to separate them by hand; the writer/reader streams do it automatically for you. Here is an example:
writer.java
writeUTF("Hello");
writeUTF("World");
reader.java
String a=readUTF(); // a=Hello
String b=readUTF(); // b=World
I've tried this code snippet and it works fine. However, I'm wondering whether this kind of coding style is supposed to work. Are there any potential risks in using the socket stream in sequence without explicitly separating each segment?

The writeUTF() and readUTF() write the length of the String (in bytes, when encoded as UTF-8) followed by the data, and use a modified UTF-8 encoding. So there are some potential problems:
The maximum length of Strings that can be handled this way is 65535 for pure ASCII, less if you use non-ASCII characters - and you cannot easily predict the limit in that case, other than conservatively assuming 3 bytes per character. So if you're sure you'll never send Strings longer than about 20k, you'll be fine.
If the app ever needs to communicate with something else (that's not written in Java), the other side may have a hard time handling the modified UTF-8. For application-internal communication, you don't have to worry though.
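Both points can be checked with a small sketch. The class name is made up, but the calls are the standard java.io ones. It dumps the bytes writeUTF produces: a 2-byte length followed by the data, with U+0000 encoded as the two bytes C0 80 rather than the single zero byte standard UTF-8 would use.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.util.Arrays;

class WriteUtfFormat {

    // Returns exactly what writeUTF puts on the wire for s.
    static byte[] encode(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new DataOutputStream(bytes).writeUTF(s);
        return bytes.toByteArray();
    }

    // True if the string is too long for writeUTF (more than 65535 encoded bytes).
    static boolean isTooLong(String s) throws IOException {
        try {
            encode(s);
            return false;
        } catch (UTFDataFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Arrays.toString(encode("Hi")));  // prints [0, 2, 72, 105]
        System.out.println(Arrays.toString(encode("\0")));  // prints [0, 2, -64, -128]

        char[] big = new char[65536];
        Arrays.fill(big, 'a');
        System.out.println(isTooLong(new String(big)));     // prints true
    }
}
```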

According to the documentation, the readUTF and writeUTF methods work with a modified version of UTF-8 and also prefix the data with the length of the string to be read.
This should mean that the read operation will wait until enough bytes have been fetched before returning the string. So the messages are in fact segmented, even if you don't see it, since you merely decorate the streams of the socket with DataInputStream and DataOutputStream.
In conclusion, yes, it should be quite safe, since the API itself will take care of separating the single messages.

With java.net.Socket this works fine: the stream blocks in readUTF() until the data arrives.
But when using MINA's CumulativeProtocolDecoder it does not; readUTF() throws java.io.EOFException.
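The exception itself can be reproduced without MINA. The sketch below is not MINA code; it only assumes that the decoder hands you a buffer containing just the part of the message that has arrived so far. A stream over such a fixed buffer cannot block for more data the way a socket stream does, so readUTF hits the end of it and throws.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;

class PartialUtfRead {

    // True if readUTF runs out of data when given only the bytes in 'available'.
    static boolean throwsEof(byte[] available) throws IOException {
        try {
            new DataInputStream(new ByteArrayInputStream(available)).readUTF();
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        new DataOutputStream(wire).writeUTF("Hello");
        byte[] full = wire.toByteArray();          // 2 length bytes + 5 data bytes
        byte[] partial = Arrays.copyOf(full, 4);   // only part of the message has arrived

        System.out.println(throwsEof(partial));    // prints true
        System.out.println(throwsEof(full));       // prints false
    }
}
```

In that situation the usual approach is to check that the length prefix and the full payload are present in the buffer before decoding, and otherwise wait for more data.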
