I am using Java for socket communication. The server is reading bytes from the client like this:
InputStream inputStream;
final int BUFFER_SIZE = 65536;
byte[] buffer = new byte[BUFFER_SIZE];
String msg="";
while (msg.indexOf(0)==-1 && (read = inputStream.read(buffer)) != -1)
{
msg += new String(buffer, 0, read);
}
handleMessage(msg);
There is a problem: when a client sends multiple messages at once, the server mixes the messages, e.g.
MSG1: <MyMessage><Hello/>nul
MSG2: </MyMessage><MyMessage><Hello again /></MyMessage>nul
So the tail of Message 1 is part of Message 2.
The "nul" above represents the Java NUL character (\0).
Why does the InputStream mix the messages?
Thanks in advance!
You are doing the wrong comparison. You check whether there is a \0 anywhere in the String and then assume it is a single message. That is not guaranteed; in your second example, the \0 even appears twice.
You should do it differently: read from the stream char by char (through a wrapping BufferedInputStream, otherwise the performance will be awful) and stop when the \0 is reached. At that point the message is complete and you can handle it.
InputStream bin = new BufferedInputStream(inputStream);
InputStreamReader reader = new InputStreamReader(bin);
StringBuilder msgBuilder = new StringBuilder();
int c; // read() returns an int so that -1 can signal end of stream
while ((c = reader.read()) != -1 && c != 0) // stop at end of stream or at the \0 terminator
{
    msgBuilder.append((char) c);
}
handleMessage(msgBuilder.toString());
Even better would be using the newline character for line separation. In that case you could just use the readLine() functionality of BufferedReader.
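A minimal sketch of that approach, assuming the sender terminates each message with \n instead of \0:
BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
String message;
while ((message = reader.readLine()) != null) { // readLine() strips the terminator
    handleMessage(message);
}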
Sockets and InputStreams carry only a stream of bytes, not messages.
If you want to break up the stream based on a char like \0, you need to do this yourself (see the sketch below).
However, in your case it appears you have a bug on the sending side, as the \0 isn't in the right places; most likely this is a bug on the client (sending) side.
btw: Using String += is very inefficient.
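A hedged sketch of doing that framing yourself, using a StringBuilder instead of String += (it assumes a single-byte encoding, since each byte is treated as one char):
BufferedInputStream in = new BufferedInputStream(inputStream);
StringBuilder current = new StringBuilder();
int b;
while ((b = in.read()) != -1) {
    if (b == 0) {                      // \0 marks the end of one message
        handleMessage(current.toString());
        current.setLength(0);          // start collecting the next message
    } else {
        current.append((char) b);      // OK only for single-byte encodings
    }
}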
The data you read from InputStream will come in as it's available from the OS and there's no guarantee on how it will be split up. If you're looking to split on new lines you might want to consider something like this:
BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
And then use reader.readLine() to get each line as a String; that's your message.
I think your problem is in your approach:
There is a problem when a client is sending multiple messages at once
Sockets in general just deliver chunks of bytes. The trick is in how you send them: how you mark the start/end of a message and how you check it for errors (a quick hash can do a lot of good). So the behaviour is fine in my eyes; you just need to work on your messaging if you really need to send multiple messages at once.
Sockets guarantee the physical integrity of the bytes they deliver, but what is IN the message is your concern.
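For example, a hedged length-prefix framing sketch using DataOutputStream/DataInputStream (the socket variable is an assumption; writeUTF/readUTF handle a 2-byte length prefix and are limited to roughly 64 KB of modified UTF-8 per message):
// Sender side: writeUTF writes a 2-byte length followed by the payload
DataOutputStream out = new DataOutputStream(socket.getOutputStream());
out.writeUTF("<MyMessage><Hello/></MyMessage>");
out.flush();

// Receiver side: readUTF reads back exactly one framed message
DataInputStream in = new DataInputStream(socket.getInputStream());
handleMessage(in.readUTF());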
Related
I'm developing a small terminal app that handles interactions with a POP3 server. However, I'm having a problem where both read() and readLine() from BufferedReader block. My initial attempts used readLine(), but after reading on SO and other sites, I figured that the server isn't returning the appropriate characters to mark the end of the line, so I attempted to use read(). But for some reason, that blocks as well.
Socket s = new Socket(InetAddress.getByName(this.HOST), 110);
BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
PrintWriter out = new PrintWriter(s.getOutputStream(), true);
String res = in.readLine(); // This works fine
System.out.println(res);
res = "";
char [] charRes = new char[1024];
out.println("USER " + this.username);
// res = in.readLine();
in.read(charRes); // Does not work
res = new String(charRes); // note: charRes.toString() would print the array reference, not its contents
System.out.println(res);
The problem is not with the server because I tested it using Telnet and it works fine. I'm not sure what I'm doing wrong and I would appreciate any help.
My client software is running on a Linux system and I am connecting to a Windows server.
According to the documentation, the read(char[]) method will block until at least one character has been read or the end of the stream is reached.
To check whether data is available before calling read(char[]), you can use the Reader's ready() method (or available() on the underlying InputStream).
Based on your original question and the subsequent comments, the main problem seems to be caused by the line endings. You have indicated that the client is running on Linux, where the standard line ending is LF (\n), but the POP3 RFC specifically requires each command to be terminated by CRLF (\r\n). Instead of using out.println() (which automatically appends your system line ending of \n), try using the PrintWriter.write(String) method.
You appear to be using the PrintWriter constructor with autoFlush set to true. This means your stream is flushed automatically when you use println(), HOWEVER, it will not be flushed automatically when using write(), so you will also need to add a call to flush(). Again, it is advisable to refer to the documentation. The updated code would look something like this:
out.write("USER " + this.username + "\r\n");
out.flush();
You may want to similarly check your reads from the input stream: BufferedReader.readLine() treats \n, \r, and \r\n as line terminators and discards them, but if you read raw characters with read() you will have to strip the \r characters yourself.
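For example, a sketch using the question's in and charRes that strips CRs from a raw read (it assumes a single read() returns the whole response, which is not guaranteed):
int n = in.read(charRes); // may read fewer chars than the array holds
if (n > 0) {
    res = new String(charRes, 0, n).replace("\r", "");
    System.out.println(res);
}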
I know this has been asked before, but since I haven't been able to find an answer with a definitive conclusion, or at least one that shows the pros and cons of the possible approaches, I have to ask:
When it comes to reading data from the Internet, from a webservice for instance, what is the correct or most efficient way to read this data?
From all the books I have glanced over, I've found at least 4 ways to read data:
1) Reading a specific amount of characters at a time.
In this case the data is read in chunks of 4096 characters
BufferedReader reader = new BufferedReader(
new InputStreamReader(in, encoding));
char[] buffer = new char[4096];
StringBuilder sb = new StringBuilder();
int len1 = 0;
while ((len1 = reader.read(buffer)) > 0) {
sb.append(buffer); // note: appends the whole buffer, not just the len1 chars actually read
}
return sb.toString();
2) Read the data knowing the content length
int length = ((HttpURLConnection) urlConnection).getContentLength();
InputStream inputStream = urlConnection.getInputStream();
BufferedReader bufferedReader =new BufferedReader(new InputStreamReader(inputStream));
StringBuilder stringBuilder = new StringBuilder(length);
char[] buffer = new char[length];
int charsRead;
while ((charsRead = bufferedReader.read(buffer)) != -1) {
stringBuilder.append(buffer, 0, charsRead);
}
return stringBuilder.toString();
3) Read the data line by line:
BufferedReader reader=new BufferedReader(new InputStreamReader(c.getInputStream()));
StringBuilder buf=new StringBuilder();
String line=null;
while ((line=reader.readLine()) != null) {
buf.append(line);
}
return(buf.toString());
4) Read the data character by character:
InputStream in = mConnection.getInputStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(in, encoding));
int ch;
StringBuilder sb = new StringBuilder();
while ((ch = reader.read()) != -1) {
sb.append((char)ch);
}
return sb.toString().trim();
I have tried three of these four techniques, all except number 3 (reading the data line by line), and out of those three only the fourth has given me good results.
The first method didn't work for me because when I read large amounts of data it often cut the data, giving me invalid JSON strings or strings with white space at the end.
The second approach I wasn't able to use because getContentLength() is not always reliable; if the value is not set there's nothing we can do about it, and that's my case.
I didn't try the third method because I wasn't sure about reading data "line" by "line". Does this apply to data that contains an array of JSON objects, or only to content that actually contains lines?
The fourth technique being the last choice I was left with, I tried it and it worked, BUT I don't think that reading a large amount of data character by character is efficient at all.
So now I would really appreciate your opinions and ideas. What approach do you use when it comes to reading data from webservices? and more importantly why?
Thanks.
P.S. I know I could've easily used DefaultHttpClient, but the docs clearly encourage not doing so:
For Android 2.3 (Gingerbread) and later, HttpURLConnection is the best
choice. Its simple API and small size makes it great fit for Android.
Transparent compression and response caching reduce network use,
improve speed and save battery.
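For reference, a minimal HttpURLConnection read sketch along those lines (the URL, charset, and buffer size are placeholders):
HttpURLConnection conn = (HttpURLConnection) new URL("http://example.com/api").openConnection();
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
    StringBuilder sb = new StringBuilder();
    char[] buf = new char[4096];
    int len;
    while ((len = reader.read(buf)) != -1) {
        sb.append(buf, 0, len); // append only what was actually read
    }
    String response = sb.toString();
} finally {
    conn.disconnect();
}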
I've tried all the methods that you have mentioned. One problem I faced was the reply not being read completely. After some research, the most efficient/fastest way I found was to go about it like this:
DefaultHttpClient client = new DefaultHttpClient();
HttpGet httpGet = new HttpGet(url);
httpGet.setHeader("Accept", "application/json");
httpGet.setHeader("Content-type", "application/json");
//ive put json header because im using json
try {
HttpResponse execute = client.execute(httpGet);
String responseStr = EntityUtils.toString(execute.getEntity());
} catch (IOException e) {
// handle or log the failure
}
responseStr will contain the webservice reply and it reads it in one go. Hope this helps
If the data volume is not too big, it doesn't really matter what approach you use. If it is, then it makes sense to use buffering and read data in chunks.
The 2nd approach is not too good, as you cannot always get the Content-Length.
Then, if your data is text/HTML/JSON you can use the 3rd approach, as you don't have to bother yourself with the chunk size. Also, you can print the incoming data line by line to aid debugging.
If your data is a binary/base64 stream such as an image, you should use the 1st approach and read the data in 4 KB (commonly used) blocks.
UPDATE:
BTW, instead of the dreaded DefaultHttpClient I'm using the AndroidHttpClient as a singleton, and it works smoothly :)
It matters. Best for performance is to read from the InputStream into a buffer of a reasonable size. This way you transfer a decent amount of data at one time, rather than repeating the same operation thousands of times. Do not rely blindly on the Content-Length header value: for gzipped content it may report an incorrect size.
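In other words, something like this (a sketch; the 8 KB buffer and UTF-8 are arbitrary choices):
Reader reader = new InputStreamReader(inputStream, "UTF-8");
StringBuilder sb = new StringBuilder();
char[] buf = new char[8192]; // a reasonable chunk size
int n;
while ((n = reader.read(buf)) != -1) {
    sb.append(buf, 0, n); // never assume the buffer was completely filled
}
String result = sb.toString();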
It seems that there are many, many ways to read text files in Java (BufferedReader, DataInputStream, etc.). My personal favorite is Scanner with a File in the constructor (it's just simpler, works better for mathy data processing, and has familiar syntax).
Boris the Spider also mentioned Channel and RandomAccessFile.
Can someone explain the pros and cons of each of these methods? To be specific, when would I want to use each?
(edit) I think I should be specific and add that I have a strong preference for the Scanner method. So the real question is, when wouldn't I want to use it?
Let's start at the beginning. The question is: what do you want to do?
It's important to understand what a file actually is. A file is a collection of bytes on a disc; these bytes are your data. Java provides various levels of abstraction above that:
File(Input|Output)Stream - read these bytes as a stream of byte.
File(Reader|Writer) - read from a stream of bytes as a stream of char.
Scanner - read from a stream of char and tokenise it.
RandomAccessFile - read these bytes as a searchable byte[].
FileChannel - read these bytes in a safe multithreaded way.
On top of each of those there are the Decorators, for example you can add buffering with BufferedXXX. You could add linebreak awareness to a FileWriter with PrintWriter. You could turn an InputStream into a Reader with an InputStreamReader (currently the only way to specify character encoding for a Reader).
So - when wouldn't I want to use it [a Scanner]?
You would not use a Scanner if you wanted to (these are some examples):
Read in data as bytes
Read in a serialized Java object
Copy bytes from one file to another, maybe with some filtering (see the sketch after this list).
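For instance, a copy-with-optional-filtering sketch over raw streams (the file names are placeholders):
try (InputStream in = new FileInputStream("source.bin");
     OutputStream out = new FileOutputStream("dest.bin")) {
    byte[] buf = new byte[8192];
    int n;
    while ((n = in.read(buf)) != -1) {
        // a filter over the bytes could go here
        out.write(buf, 0, n);
    }
}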
It is also worth noting that the Scanner(File file) constructor takes the File and opens a FileInputStream with the platform default encoding - this is almost always a bad idea. It is generally recognised that you should specify the encoding explicitly to avoid nasty encoding-based bugs. Further, the stream isn't buffered.
So you may be better off with
try (final Scanner scanner = new Scanner(new BufferedInputStream(new FileInputStream("myFile.txt")), "UTF-8")) {
//do stuff
}
Ugly, I know.
It's worth noting that Java 7 provides a further layer of abstraction that removes the need to loop over files yourself - these methods are in the Files class:
byte[] Files.readAllBytes(Path path)
List<String> Files.readAllLines(Path path, Charset cs)
Both these methods read the entire file into memory, which might not be appropriate. In Java 8 this is further improved by adding support for the new Stream API:
Stream<String> Files.lines(Path path, Charset cs)
Stream<Path> Files.list(Path dir)
For example to get a Stream of words from a Path you can do:
final Stream<String> words = Files.lines(Paths.get("myFile.txt"))
        .flatMap(line -> Arrays.stream(line.split("\\b")));
SCANNER:
Can parse primitive types and strings using regular expressions.
A Scanner breaks its input into tokens using a delimiter pattern, which by default matches whitespace. The resulting tokens may then be converted into values of different types. More can be read at http://docs.oracle.com/javase/7/docs/api/java/util/Scanner.html
DATA INPUT STREAM:
Lets an application read primitive Java data types from an underlying input stream in a machine-independent way. An application uses a data output stream to write data that can later be read by a data input stream. DataInputStream is not necessarily safe for multithreaded access; thread safety is optional and is the responsibility of users of this class. More can be read at http://docs.oracle.com/javase/7/docs/api/java/io/DataInputStream.html
BufferedReader:
Reads text from a character-input stream, buffering characters so as to provide for the efficient reading of characters, arrays, and lines. The buffer size may be specified, or the default size may be used; the default is large enough for most purposes. In general, each read request made of a Reader causes a corresponding read request to be made of the underlying character or byte stream. It is therefore advisable to wrap a BufferedReader around any Reader whose read() operations may be costly, such as FileReaders and InputStreamReaders. For example,
BufferedReader in = new BufferedReader(new FileReader("foo.in"));
will buffer the input from the specified file. Without buffering, each invocation of read() or readLine() could cause bytes to be read from the file, converted into characters, and then returned, which can be very inefficient. Programs that use DataInputStreams for textual input can be localized by replacing each DataInputStream with an appropriate BufferedReader. More details are at http://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html
NOTE: This approach is outdated. As Boris points out in his comment. I will leave it here for history, but you should use methods available in JDK.
It depends on what kind of operation you are doing and the size of the file you are reading.
In most of the cases, I recommend using commons-io for small files.
byte[] data = FileUtils.readFileToByteArray(new File("myfile"));
You can read it as string or character array...
Now, if you are handling big files, or changing parts of a file directly on the filesystem, then the best is to use a RandomAccessFile and potentially even a FileChannel to do it "nio" style.
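As a hedged example of the "changing parts of a file directly" case (the offset and bytes written are made up):
try (RandomAccessFile raf = new RandomAccessFile("myfile", "rw")) {
    raf.seek(1024);                          // jump straight to the region to change
    raf.write("patched".getBytes("UTF-8"));  // overwrite in place, no full rewrite
}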
Using BufferedReader
BufferedReader reader;
char[] buffer = new char[10];
reader = new BufferedReader(new FileReader("FILE_PATH"));
//or
reader = Files.newBufferedReader(Paths.get("FILE_PATH"));
int n;
while ((n = reader.read(buffer)) != -1) {
System.out.print(new String(buffer, 0, n)); // print only the chars actually read
}
//or
while (reader.ready()) {
System.out.println(reader.readLine());
}
reader.close();
Using FileInputStream – Read Binary Files to Bytes
InputStream fis;
byte[] buffer = new byte[10];
fis = new FileInputStream("FILE_PATH");
//or
fis = Files.newInputStream(Paths.get("FILE_PATH"));
int n;
while ((n = fis.read(buffer)) != -1) {
System.out.print(new String(buffer, 0, n)); // print only the bytes actually read
}
fis.close();
Using Files – Read Small File to List of Strings
List<String> allLines = Files.readAllLines(Paths.get("FILE_PATH"));
for (String line : allLines) {
System.out.println(line);
}
Using Scanner – Read Text File as Iterator
Scanner scanner = new Scanner(new File("FILE_PATH"));
while (scanner.hasNextLine()) {
System.out.println(scanner.nextLine());
}
scanner.close();
Using RandomAccessFile – Reading Files in Read-Only Mode
RandomAccessFile file = new RandomAccessFile("FILE_PATH", "r");
String str;
while ((str = file.readLine()) != null) {
System.out.println(str);
}
file.close();
Using Files.lines – Reading lines as a stream
Stream<String> lines = Files.lines(Paths.get("FILE_PATH"));
lines.forEach(s -> System.out.println(s));
Using FileChannel – can increase performance by using off-heap memory; a MappedByteBuffer can take this further
FileInputStream i = new FileInputStream("FILE_PATH");
ReadableByteChannel r = i.getChannel();
ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024);
while (r.read(buffer) != -1) {
buffer.flip();
while (buffer.hasRemaining()) {
System.out.print((char) buffer.get());
}
buffer.clear();
}
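Since the heading mentions MappedByteBuffer, here is a minimal mapping sketch (it assumes the file fits in a single mapping, and single-byte characters as above):
try (FileChannel ch = FileChannel.open(Paths.get("FILE_PATH"), StandardOpenOption.READ)) {
    MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
    while (map.hasRemaining()) {
        System.out.print((char) map.get()); // bytes are read straight from the mapped region
    }
}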
I've been Googling up some answers, and I can't seem to find the best one.
Here's what I have so far for reading internal files on Android:
fis = openFileInput("MY_FILE");
StringBuilder fileContent = new StringBuilder("");
byte[] buffer = new byte[fis.available()];
while (fis.read(buffer) != -1) {
fileContent.append(new String(buffer));
}
MYVARIABLE = fileContent.toString();
fis.close();
It used to leave a lot of whitespace, but I just used the .available() method so it only returns what I need.
Is there a faster or shorter way to write this? I can't seem to find any good ones in the API guide.
1). The API documentation for available() says it should not be used for the purpose you need:
Note that this method provides such a weak guarantee that it is not very useful in practice.
Meaning it may not give you the file size.
2). When you read something into RAM, take into account that the file can be lengthy, so try to avoid spending extra RAM. For this, a relatively small (1-8 KB) buffer is used to read from the source and then append to the result. On the other hand, using too-small buffers (say, several bytes) slows reading down significantly.
3). Reading bytes differs from reading characters, because a single character may be represented by more than one byte (depending on the encoding). To read chars, specific classes are used which are aware of the encoding and know how to convert bytes to chars properly. For instance, InputStreamReader is one such class.
4). The encoding used for reading should be the encoding that was used for persisting the data.
Taking all of the above into account, I would use something like this:
public static String getStringFromStream(InputStream in, String encoding)
throws IOException {
InputStreamReader reader;
if (encoding == null) {
// This constructor sets the character converter to the encoding
// specified in the "file.encoding" property and falls back
// to ISO 8859_1 (ISO-Latin-1) if the property doesn't exist.
reader = new InputStreamReader(in);
} else {
reader = new InputStreamReader(in, encoding);
}
StringBuilder sb = new StringBuilder();
final char[] buf = new char[1024];
int len;
while ((len = reader.read(buf)) > 0) {
sb.append(buf, 0, len);
}
return sb.toString();
}
5). Make sure to always close an InputStream when done working with it.
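For example, with try-with-resources the stream is closed even on errors (the file name is a placeholder):
try (InputStream in = openFileInput("MY_FILE")) { // closed automatically
    String content = getStringFromStream(in, "UTF-8");
    // use content here
}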
Sure, there is more than one way to read text from a file in Java/Android, mostly because the Java API contains several generations of IO APIs. For instance, classes from the java.nio package were created to be more efficient; however, there is usually no strong reason to use them (don't fall into the sin of premature optimization).
We are streaming data between a server (written in .Net, running on Windows) and a client (written in Java, running on Ubuntu) in batches. The data is in XML format. Occasionally the Java client throws an unexpected EOF while trying to decompress the stream. The message content always varies and is user-driven. The response from the client is also compressed using GZip; this never fails and seems to be rock solid. The response from the client is controlled by the system.
Is there a chance that some arrangement of characters or some special characters are creating false EOF markers? Could it be white-space related? Is GZip suitable for compressing XML?
I am assuming that the code to read and write from the input/output streams works, because we only occasionally get this exception, and when we inspect the user data at the time there seem to be special characters (which is why I asked the question), such as the '#' sign.
Any ideas?
UPDATE:
The actual code, as requested. I didn't think it was the code, because I had been to a couple of sites to get help on this issue and they all had more or less the same code. Some sites mentioned appended GZip, something to do with GZip creating multiple segments?
public String receive() throws IOException {
byte[] buffer = new byte[8192];
ByteArrayOutputStream baos = new ByteArrayOutputStream(8192);
do {
int nrBytes = in.read(buffer);
if (nrBytes > 0) {
baos.write(buffer, 0, nrBytes);
}
} while (in.available() > 0); // note: available() only reports bytes already buffered locally, not whether more data is coming
return compressor.decompress(baos.toByteArray());
}
public String decompress(byte[] data) throws IOException {
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
ByteArrayInputStream in = new ByteArrayInputStream(data);
try {
GZIPInputStream inflater = new GZIPInputStream(in);
byte[] byteBuffer = new byte[8192];
int r;
while((r = inflater.read(byteBuffer)) > 0 ) {
buffer.write(byteBuffer, 0, r);
}
} catch (IOException e) {
log.error("Could not decompress stream", e);
throw e;
}
return new String(buffer.toByteArray());
}
At first I thought there must be something wrong with the way that I am reading in the stream, and I thought perhaps I am not looping properly. I then generated a ton of data to be streamed and checked that it was looping. Also, the fact that it happens so seldom and so far has not been reproducible led me to believe that it was the content rather than the scenario. But at this point I am totally baffled, and for all I know it is the code.
Thanks again everyone.
Update 2:
As requested the .Net code:
Dim DataToCompress = Encoding.UTF8.GetBytes(Data)
Dim CompressedData = Compress(DataToCompress)
That gets the raw data into bytes. Then it gets compressed:
Private Function Compress(ByVal Data As Byte()) As Byte()
Try
Using MS = New MemoryStream()
Using Compression = New GZipStream(MS, CompressionMode.Compress)
Compression.Write(Data, 0, Data.Length)
Compression.Flush()
Compression.Close()
Return MS.ToArray()
End Using
End Using
Catch ex As Exception
Log.Error("Error trying to compress data", ex)
Throw
End Try
End Function
Update 3: Also added more Java code. The in variable is the InputStream returned from socket.getInputStream().
It certainly shouldn't be due to the data involved - the streams deal with binary data, so that shouldn't make any odds at all.
However, without seeing your code, it's hard to say for sure. My first port of call would be to check anywhere that you're using InputStream.read() - check that you're using the return value correctly, rather than assuming a single call to read() will fill the buffer.
If you could provide some code, that would help a lot...
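For what it's worth, here is a sketch of the receive() loop driven by read()'s return value instead of available() (it assumes the sender closes or shuts down its end of the socket after each message, which may not match your protocol):
public String receive() throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream(8192);
    byte[] buffer = new byte[8192];
    int nrBytes;
    while ((nrBytes = in.read(buffer)) != -1) { // -1 means the peer has finished sending
        baos.write(buffer, 0, nrBytes);
    }
    return compressor.decompress(baos.toByteArray());
}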
I would suspect that for some reason the data is altered in transit by being treated as text, not as binary, so it may be either \n conversion or a codepage alteration.
How is the gzipped stream transferred between the two systems?
It is not possible. EOF in TCP is delivered as an out-of-band FIN segment, not via the data.