I'm currently creating a little web server (for testing purposes) and I have a problem reading the HTTP request header (coming from the browser, Chromium in my case).
First, I simply tried something like this:
BufferedReader in = new BufferedReader(
        new InputStreamReader(client_socket.getInputStream(), "UTF-8"));
StringBuilder builder = new StringBuilder();
while (in.ready()) {
    builder.append(in.readLine());
}
return builder.toString();
This worked fine for the first request. However, after the first request was done, the ready() method only returned false (I closed the client_socket as well as all readers/writers).
After a little searching I stumbled across this older question: Read/convert an InputStream to a String
I tried the first four solutions (two with Apache Commons, the one with the Scanner, and the one with the do-while loop). All of them blocked forever, and the browser gave me a "Website not reachable" error.
I'd like to do this on my own (without using any libraries or embedded servers); that's why I even try.
I'm a little lost right now, how would you go about this?
You are reading from the socket until there is no more data to read. That is wrong. You need to keep reading until you encounter a 0-length line, then process the received headers to determine if there is more data to read (look for Content-Length: ... and Transfer-Encoding: chunked headers), e.g.:
StringBuilder builder = new StringBuilder();
String line;
do {
    line = in.readLine();
    if (line == "") break;
    builder.append(line);
} while (true);
// use builder as needed...
// read message body data if headers say there
// is a body to read, per RFC 2616 Section 4.4...
Read RFC 2616 Section 4 for more details. Not only does this facilitate proper request reading, but doing so also allows you to support HTTP keep-alives correctly (for sending multiple requests on a single connection).
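To make that two-phase read concrete, here is a minimal sketch. The class and method names are my own, it deliberately ignores Transfer-Encoding: chunked, and note that Content-Length counts bytes, so reading the body through a Reader like this is only safe for single-byte content:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class RequestReader {

    // Reads header lines until the blank line that ends the header block,
    // returning them as a map keyed by lower-cased field name.
    static Map<String, String> readHeaders(BufferedReader in) throws IOException {
        Map<String, String> headers = new HashMap<String, String>();
        String line;
        while ((line = in.readLine()) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                headers.put(line.substring(0, colon).trim().toLowerCase(),
                            line.substring(colon + 1).trim());
            }
        }
        return headers;
    }

    // Reads the body announced by Content-Length. Chunked bodies
    // (Transfer-Encoding: chunked) would need separate handling.
    static String readBody(BufferedReader in, Map<String, String> headers) throws IOException {
        String lengthHeader = headers.get("content-length");
        if (lengthHeader == null) {
            return ""; // no Content-Length: this sketch assumes no body
        }
        int length = Integer.parseInt(lengthHeader);
        char[] body = new char[length]; // byte count == char count only for single-byte content
        int read = 0;
        while (read < length) {
            int n = in.read(body, read, length - read);
            if (n < 0) break; // connection closed early
            read += n;
        }
        return new String(body, 0, read);
    }
}
```

Because the loop stops at the blank line instead of waiting for end-of-stream, the same connection can then be reused for the next request.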
The solution suggested above by Remy Lebeau is wrong, as a test of mine showed: line == "" compares object references rather than string contents, so the loop never breaks. This alternative is fail-safe:
StringBuilder builder = new StringBuilder();
String line;
do {
    line = in.readLine();
    if (line.equals("")) break;
    builder.append(line);
} while (true);
Refer to: How do I compare strings in Java?
I'm a beginner in Java, trying to decompress an HTTP response in gzip format. Roughly, I have a BufferedReader which allows me to read lines of the HTTP response from a socket. Thanks to that, I parse the HTTP header, and if it specifies that the body is in gzip format then I have to decompress it. Here is the code I use:
DataInputStream response = new DataInputStream(clientSideSocket.getInputStream());
BufferedReader buffer = new BufferedReader(new InputStreamReader(response))
header = parseHTTPHeader(buffer); // return a map<String,String> with header options
StringBuilder SBresponseBody = new StringBuilder();
String responseBody = new String();
String line;
while ((line = buffer.readLine()) != null) // extract the body as if it were a string...
SBresponseBody.append(line);
responseBody = SBresponseBody.toString();
if (header.get("Content-Encoding").contains("gzip"))
responseBody = unzip(responseBody); // function I try to construct
My attempt for the unzip function is as follows:
private String unzip(String body) throws IOException {
    String responseBody = "";
    byte[] readBuffer = new byte[5000];
    GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(body.getBytes()));
    int read = gzip.read(readBuffer, 0, readBuffer.length);
    gzip.close();
    byte[] result = Arrays.copyOf(readBuffer, read);
    responseBody = new String(result, "UTF-8");
    return responseBody;
}
I get an error in the GZIPInputStream: not GZIP format (because gzip header is not found in body).
Here are my thoughts:
• Is body.getBytes() wrong, since the data was read by a BufferedReader as a character string, and converting it back to byte[] makes no sense because it has already been interpreted the wrong way? Or am I reconverting the String body to byte[] the wrong way?
• Do I have to build a GZIP header myself using the information provided in the HTTP header and adding it to the String body ?
• Do I need to create another InputStream from my socket.getInputStream() to read the information byte by byte, or is it tricky since there is already a buffer "connected" to this socket?
Roughly, I have a BufferedReader which allows me to read lines of the HTTP response from a socket.
You've handrolled an HTTP client.
This is not a good thing; HTTP is considerably more complicated than you think it is. gzip is just one of about 10,000 things you need to think about. There's HTTP/2.0, Spdy, http3, chunked transfer encoding, TLS, redirects, mime packing, and so much more to think about.
So, if you want to write an actual HTTP client, you need about 100x this code and a ton of domain knowledge, because the actual specs of the HTTP protocol, while handy, don't really tell the story. The de-facto protocol you're implementing is 'whatever servers connected to the internet tend to send' and what they tend to send is tightly wound up with 'whatever commonly used browsers tend to get right', which is almost, but not quite, what that spec document says. This is one of those cases where pragmatics and implementations are the 'real spec', and the actual spec is merely attempting to document reality.
That's a long way around to say: your mistake is trying to handroll an HTTP client. Don't do that. Use OkHttp, or the HTTP client introduced in JDK 11 in the core libraries.
But, I know what I want!
Your code is loaded up with bugs, though.
DataInputStream response = new DataInputStream(clientSideSocket.getInputStream());
DataInputStream is useless here. Remove that wrapper.
BufferedReader buffer = new BufferedReader(new InputStreamReader(response))
Missing semi-colon. Also, this is broken - this will convert the bytes flowing over the wire to characters using 'platform default encoding' which is wrong, you need to look at the Content-Type header.
responseBody = unzip(responseBody)
You cannot do this. Your major misunderstanding is that you appear to think that there is no difference between a bunch of bytes, and a sequence of characters.
That's wrong. Once you have stored the bytes into chars, you cannot unzip them anymore.
The fix is to check for the gzip header FIRST, then wrap your InputStream in a GZIPInputStream before any character decoding happens.
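A minimal sketch of that order of operations, with hypothetical names: decompression works on the raw bytes, and character decoding happens only after the gunzip step:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

public class GzipBody {

    // Decodes a response body: gunzip first (bytes -> bytes),
    // then decode the characters once, at the very end.
    static String decode(InputStream bodyStream, boolean gzipped, String charset) throws IOException {
        InputStream in = gzipped ? new GZIPInputStream(bodyStream) : bodyStream;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n); // honour the byte count returned by read()
        }
        return out.toString(charset);
    }
}
```

The key design point is that no Reader ever sees the compressed bytes; a Reader would mangle them irreversibly, which is exactly the bug in the question.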
I'm using Tomcat 6.0.36 and JRE 1.5.0, and I'm doing development work on Windows 7.
As a proof of concept for some work I'm doing, from Java code I'm HTTP posting some XML over a socket to a servlet. The
servlet then echos back the xml. In my first implementation, I was handing the input stream at both ends to an XML
document factory to extract the xml that was sent over the wire. This worked without a hitch in the servlet but failed
on the client side. It turned out that it failed on the client side because the reading of the response was blocking
to the point that the document factory was timing out and throwing an exception prior to the entire response arriving.
(The behaviour of the document factory is now moot because, as I describe below, I am getting the same blocking issue
without the use of the document factory.)
To try to work through this blocking issue, I then came up with a simpler version of the client side code and the
servlet. In this simpler version, I eliminated the document builder from the equation. The code on both sides now
simply reads the text from their respective input streams.
Unfortunately, I still have this blocking issue with the response and, as I describe below, it has not been resolved by
simply calling response.flushBuffer(). Google searches retrieved only one relevant topic that I could find
(Tomcat does not flush the response buffer) but this was not the exact same
issue.
I have included my code and explained the exact issues below.
Here's my servlet code (remember, this is bare-bones proof-of-concept code, not production code),
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import javax.servlet.ServletConfig;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
public final class EchoXmlServlet extends HttpServlet {

    public void init(ServletConfig config) throws ServletException {
        System.out.println("EchoXmlServlet loaded.");
    }

    public void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException {
    }

    public void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException {
        try {
            processRequest(request, response);
        }
        catch(Exception e) {
            e.printStackTrace();
            throw new ServletException(e);
        }
        System.out.println("Response sent.");
        return;
    }

    private final void processRequest(HttpServletRequest request, final HttpServletResponse response) throws Exception {
        String line = null;
        StringBuilder sb = new StringBuilder();
        LineNumberReader lineReader = new LineNumberReader(new InputStreamReader(request.getInputStream(), "UTF-8"));
        while((line = lineReader.readLine()) != null) {
            System.out.println("line: " + line);
            sb.append(line);
            sb.append("\n");
        }
        sb.append("An additional line to see when it turns up on the client.");
        System.out.println(sb);
        response.setHeader("Content-Type", "text/xml;charset=UTF-8");
        response.getOutputStream().write(sb.toString().getBytes("UTF-8"));
        // Some things that were tried.
        //response.getOutputStream().print(sb.toString());
        //response.getOutputStream().print("\r\n");
        //response.getOutputStream().flush();
        //response.flushBuffer();
    }

    public void destroy() {
    }
}
Here's my client side code,
import java.io.BufferedOutputStream;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.io.OutputStream;
import java.net.Socket;
public final class SimpleSender {

    private String host;
    private String path;
    private int port;

    public SimpleSender(String host, String path, int port) {
        this.host = host;
        this.path = path;
        this.port = port;
    }

    public void execute() {
        Socket connection = null;
        String line;
        try {
            byte[] xmlBytes = getXmlBytes();
            byte[] headerBytes = getHeaderBytes(xmlBytes.length);
            connection = new Socket(this.host, this.port);
            OutputStream outputStream = new BufferedOutputStream(connection.getOutputStream());
            outputStream.write(headerBytes);
            outputStream.write(xmlBytes);
            outputStream.flush();
            LineNumberReader lineReader
                    = new LineNumberReader(new InputStreamReader(connection.getInputStream(), "UTF-8"));
            while((line = lineReader.readLine()) != null) {
                System.out.println("line: " + line);
            }
            System.out.println("The response is read.");
        }
        catch(Exception e) {
            e.printStackTrace();
        }
        finally {
            try {
                connection.close();
            }
            catch(Exception e) {}
        }
    }

    private byte[] getXmlBytes() throws Exception {
        StringBuffer sb = new StringBuffer()
                .append("<my-xml>\n")
                .append("Hello to myself.\n")
                .append("</my-xml>\n");
        return sb.toString().getBytes("UTF-8");
    }

    private byte[] getHeaderBytes(int contentLength) throws Exception {
        StringBuffer sb = new StringBuffer()
                .append("POST ")
                .append(this.path)
                .append(" HTTP/1.1\r\n")
                .append("Host: ")
                .append(this.host)
                .append("\r\n")
                .append("Content-Type: text/xml;charset=UTF-8\r\n")
                .append("Content-Length: ")
                .append(contentLength)
                .append("\r\n")
                .append("\r\n");
        return sb.toString().getBytes("UTF-8");
    }
}
When a request is sent to the servlet via a call to SimpleSender.execute(), the code in the servlet that receives the
request reads the xml without a hitch. My servlet code also exits from its processRequest() and doPost() without a
hitch. This is the immediate (i.e. there is no blocking between any output line) output on the server:
line: <my-xml>
line: Hello to myself.
line: </my-xml>
<my-xml>
Hello to myself.
</my-xml>
An additional line to see when it turns up on the client.
Response sent.
The output above is exactly as expected.
On the client side, however, the code outputs the following then blocks:
HELLO FROM MAIN
line: HTTP/1.1 200 OK
line: Server: Apache-Coyote/1.1
line: Content-Type: text/xml;charset=UTF-8
line: Content-Length: 74
line: Date: Sun, 18 Nov 2012 23:58:43 GMT
line:
line: <my-xml>
line: Hello to myself.
line: </my-xml>
After about 20 seconds of blocking (I timed it), the following lines are output on the client side,
line: An additional line to see when it turns up on the client.
The response is read.
GOODBYE FROM MAIN
Note that the entire output on the server side is fully visible while the blocking is occurring on the client side.
From there, I tried to flush on the server side to try to fix this issue. I independently tried two methods of flushing:
response.flushBuffer() and response.getOutputStream().flush(). With both methods of flushing, I still had blocking on
the client side (but in a different part of the response), but I had other issues as well. Here's where the client
blocked,
HELLO FROM MAIN
line: HTTP/1.1 200 OK
line: Server: Apache-Coyote/1.1
line: Content-Type: text/xml;charset=UTF-8
line: Transfer-Encoding: chunked
line: Date: Mon, 19 Nov 2012 00:21:53 GMT
line:
line: 4a
line: <my-xml>
line: Hello to myself.
line: </my-xml>
line: An additional line to see when it turns up on the client.
line: 0
line:
After blocking for about 20 seconds, the following is output on the client side,
The response is read.
GOODBYE FROM MAIN
There are three problems with this output on the client side. Firstly, the reading of the response is still blocking, it's
just blocking after a different part of the response. Secondly, I have unanticipated characters returned ("4a", "0").
Finally, the headers have changed. I've lost the Content-Length header, and I have gained the
"Transfer-encoding: chunked" header.
So, without a flush, my response is blocking prior to sending the final line and a termination to the response. However,
with a flush, the response is still blocking but now I'm getting characters I don't want and a change to the headers I
don't want.
In Tomcat, my connector has the default definition,
<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" />
The connectionTimeout is set for 20 seconds. When I changed this to 10 seconds, my client side code blocked for 10
seconds instead of 20. So it appears that it is the connection timeout, as managed by Tomcat, that is causing my
response to be fully flushed and terminated.
Is there something additional I should be doing in my servlet code to indicate that my response is finished?
Has anyone got suggestions as to why my response is blocking prior to sending the final line and termination indicator?
Has anyone got suggestions as to why flush is sending unwanted characters and why the response is still blocking after
a flush?
If someone has the time, could you tell me if you get the same issues if you try running the code included in this post?
EDIT - In response to Guido's first reply
Guido,
Thanks very much for your reply.
Your client is blocking because you are using readLine to read the
body of the message. readLine hangs because the body does not end with
a line feed
No, I don't think this is true. Firstly, in my original version of my code, I was not using line readers on either the client or server side. On both sides, I was handing the stream to the xml document factory and letting it read from the stream. On the server, this worked fine. On the client, it timed out. (On the client, I was reading to the end of the headers prior to passing the stream to the document factory.)
Secondly, when I change my client code to not use a line reader, the blocking still occurs. Here's a version of SimpleSender.execute() that does not use a line reader,
public void execute() {
    Socket connection = null;
    int byteCount = 0;
    try {
        byte[] xmlBytes = getXmlBytes();
        byte[] headerBytes = getHeaderBytes(xmlBytes.length);
        connection = new Socket(this.host, this.port);
        OutputStream outputStream = new BufferedOutputStream(connection.getOutputStream());
        outputStream.write(headerBytes);
        outputStream.write(xmlBytes);
        outputStream.flush();
        while(connection.getInputStream().read(new byte[1]) >= 0) {
            ++byteCount;
        }
        System.out.println("The response is read: " + byteCount);
    }
    catch(Exception e) {
        e.printStackTrace();
    }
    finally {
        try {
            connection.close();
        }
        catch(Exception e) {}
    }
    return;
}
The above code blocks at,
HELLO FROM MAIN
then 20 seconds later, finishes with,
The response is read: 235
GOODBYE FROM MAIN
I think the above shows conclusively the problem is not with the use of a line reader on the client side.
sb.append("An additional line to see when it turns up on the client.\n");
The addition of the return in the above line just defers the block to one line later. I had tested this prior to my OP and I just tested again.
If you want to do your own HTTP parser, you have to read through the headers until you get a blank line (i.e. two consecutive CRLFs).
Yes, I do know that, but in this contrived simple example, it is a moot point. On the client, I am simply outputting the returned HTTP message, headers and all.
Then you need to scan the headers to see if you had a Content-Length header. If there is no Content-Length then you are done. If there is a Content-Length you need to parse it for the length, then read exactly that number of additional bytes from the stream. This allows HTTP to transport both text data and also binary data which has no line feeds.
Yup, all true but not relevant in this contrived simple example.
I recommend you replace the guts of your client code HTTP writer/parse with a pre-written client library that handles these details for you.
I agree entirely. I was actually hoping to pass off the handling of the streams to the xml document factories. As a way of dealing with my blocking issues, I also looked into Apache commons-httpclient. The new version (httpcomponents) still leaves it to the developer to handle the stream of a post return (from what I can tell), so that was of no use. If you can suggest another library, I'd be interested for sure.
I've disagreed with your points but I thank you for your reply and I mean no offense or any negative intimations by my disagreement. I'm obviously doing something wrong or not doing something I should, but I don't think the line reader is the issue. Additionally, where are those funky characters coming from if I flush? Why does the blocking occur when a line reader is not in use on the client side?
Also, I have replicated the issue on Jetty. Hence, this is definitely not a Tomcat issue and very much a 'me' issue. I'm doing something wrong but I don't know what it is.
Your server code looks fine. The problem is with your client code. It does not obey the HTTP protocol and is treating the response like a bunch of lines.
Quick fix on the server. Change to:
sb.append("An additional line to see when it turns up on the client.\n");
Your client is blocking because you are using readLine to read the body of the message. readLine hangs because the body does not end with a line feed. Finally Tomcat times out, closes the connection, your buffered reader detects this and returns the remaining data.
If you make the change above (to the server), your client will appear to work as you expect, even though it is still wrong.
If you want to do your own HTTP parser, you have to read through the headers until you get a blank line (two consecutive CRLFs). Then you need to scan the headers to see if you had a Content-Length header. If there is no Content-Length then you are done. If there is a Content-Length you need to parse it for the length, then read exactly that number of additional bytes from the stream. This allows HTTP to transport both text data and binary data, which has no line feeds.
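That last step can be sketched as follows (a hypothetical helper, assuming the header block has already been consumed): read exactly Content-Length bytes, honouring the count returned by each read() call:

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class BodyReader {

    // Reads exactly contentLength bytes of the message body from the stream.
    // A single read() may return fewer bytes than requested, so we loop.
    static byte[] readBody(InputStream in, int contentLength) throws IOException {
        byte[] body = new byte[contentLength];
        int read = 0;
        while (read < contentLength) {
            int n = in.read(body, read, contentLength - read);
            if (n < 0) {
                throw new EOFException("connection closed mid-body");
            }
            read += n;
        }
        return body;
    }
}
```

Because it counts bytes rather than looking for line terminators, this works for binary bodies as well as text.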
I recommend you replace the guts of your client code HTTP writer/parse with a pre-written client library that handles these details for you.
LOL Ok, I was doing something wrong (by omission). The solution to my issue? Add the following header to my http request,
Connection: close
That simple. Without that, the connection was staying alive. My code was relying on the server signifying that it was finished, but the server was still listening on the open connection rather than closing it.
The header causes the server to close the connection when it finishes writing the response (which I guess is signified when its call to doPost(...) returns).
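In terms of the SimpleSender code above, the fix is one extra header line. This hypothetical standalone variant of getHeaderBytes() shows where it goes:

```java
public class HeaderBuilder {

    // Standalone variant of the post's getHeaderBytes(): the added
    // "Connection: close" line tells the server to close the socket once
    // the response is written, so the client's read loop sees end-of-stream.
    static byte[] getHeaderBytes(String path, String host, int contentLength) throws Exception {
        StringBuilder sb = new StringBuilder()
                .append("POST ").append(path).append(" HTTP/1.1\r\n")
                .append("Host: ").append(host).append("\r\n")
                .append("Content-Type: text/xml;charset=UTF-8\r\n")
                .append("Content-Length: ").append(contentLength).append("\r\n")
                .append("Connection: close\r\n") // the one-line fix
                .append("\r\n");                 // blank line ends the header block
        return sb.toString().getBytes("UTF-8");
    }
}
```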
Addendum
With regard to the funky characters when flush() is called...
My server code, now that I'm using Connection: close, does not call flush(). However, if the content to be sent back is large enough (larger than the Tomcat connector's buffer size I suspect), I still get the funky characters sent back and the header 'Transfer-Encoding: chunked' appears in the response.
To fix this, I explicitly call, on the server side, response.setContentLength(...) prior to writing my response. When I do this, the Content-Length header is in the response instead of Transfer-Encoding: chunked, and I don't get the funky characters.
I'm not willing to burn any more time on this as my code is now working, but the funky characters do appear to be chunk delimiters: "4a" is hexadecimal for 74, the byte length of the body, and "0" marks the final chunk. Once I explicitly set the content length, chunked encoding, and hence its delimiters, was no longer necessary.
I am trying to create a proxy server.
I want to read the websites byte by byte so that I can display images and all other stuff. I tried readLine but I can't display images. Do you have any suggestions how I can change my code and send all data with DataOutputStream object to browser ?
try {
    Socket s = new Socket(InetAddress.getByName(req.hostname), 80);
    String file = parcala(req.url);
    DataOutputStream out = new DataOutputStream(clientSocket.getOutputStream());
    BufferedReader dis = new BufferedReader(new InputStreamReader(s.getInputStream()));
    PrintWriter socketOut = new PrintWriter(s.getOutputStream());
    socketOut.print("GET " + req.url + "\n\n");
    //socketOut.print("Host: " + req.hostname);
    socketOut.flush();
    String line;
    while ((line = dis.readLine()) != null) {
        System.out.println(line);
    }
}
catch (Exception e) {}
}
Edited Part
This is what I should have to do. I can block banned web sites but can't allow other web sites in my program.
In the filter program, you will open a TCP socket at the specified port and wait for connections. If a
request comes (i.e. the client types a URL to access a web site), the application will process it to
decide whether access is allowed or not and then, using the same socket, it will send the reply back
to the client. After the client opened her connection to WebPolice (and her request has been checked
and is allowed), the real web page needs to be shown to the client. Therefore, since the user already gave her request, now it is WebPolice’s turn to forward the request so that the user can get the web page. Thus, WebPolice acts as a client and requests the web page. This means you need to open a connection to the web server (without closing the connection to the user), forward the request over this connection, get the reply and forward it back to the client. You will use threads to handle multiple connections (at the same time and/or at different times).
I don't know exactly what you're trying to do, but crafting an HTTP request and reading its response involves somewhat more than you have done here. readLine() won't work on binary data anyway.
You can take a look at the URLConnection class (stolen here):
URL oracle = new URL("http://www.oracle.com/");
URLConnection yc = oracle.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
Then you can read textual or binary data from the in object.
readLine() will treat the data read as a String, so unless you want to mess around with conversions back to bytes, I wouldn't recommend it.
I would just read bytes until you can't read anymore, then write them out to a file. This should allow you to grab the images, keeping file headers intact, which can be important when dealing with files other than text.
Hope this helps.
Instead of using BufferedReader you can try to use InputStream.
It has several methods for reading bytes.
http://docs.oracle.com/javase/6/docs/api/java/io/InputStream.html
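For a proxy, the usual pattern is a raw byte-copy loop between the server's InputStream and the browser's OutputStream. A minimal sketch (names are my own):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {

    // Copies raw bytes from the server's InputStream to the browser's
    // OutputStream; works for images and any other binary payload,
    // unlike readLine(), which assumes text.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n); // write only the bytes actually read
            total += n;
        }
        out.flush();
        return total;
    }
}
```

In the proxy, `in` would be the web server socket's input stream and `out` the client socket's output stream; the same loop also forwards the request in the other direction.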
I have a Java program that uses OAuth for communication with a server to retrieve XML data.
It makes use of the Signpost OAuth library to connect with the source, and uses a standard way of reading the InputStream to access the XML that is returned.
Of late, I've noticed the slow time it's taken to retrieve the information and tests have revealed that some requests can take anywhere from 2000 ms up to 10000 ms (if it matters, the source server is in Europe, I am in Australia).
I added a timestamp after the OAuth communication (request.connect()) and again after the reading of the InputStream and here's the output:
Request #1: Communication: [6351ms] Data process: [403ms] Total: [6754ms]
Request #2: Communication: [1ms] Data process: [3121ms] Total: [3122ms]
Request #3: Communication: [1ms] Data process: [1297ms] Total: [1298ms]
Request #4: Communication: [0ms] Data process: [539ms] Total: [539ms]
Request #4 is actually Request #2 being run a 2nd time. All requests are made in one run of the program (there's no stopping and starting).
My question: is the InputStream returned to the HttpURLConnection object as part of the connect() method, or is it streamed back as I read from it (as the name suggests) and part of the actual connection process?
Secondary question: With the timing above, is the slow time most likely to be a problem with the server or my method of reading the InputStream?
For reference, here is the code in question:
long startTime = System.currentTimeMillis();

URL url = new URL(urlString);
HttpURLConnection request = (HttpURLConnection) url.openConnection();
consumer.sign(request);
request.connect();

long connectionTime = System.currentTimeMillis();

InputStream is = request.getInputStream();
if (is != null) {
    final BufferedReader bufferedreader = new BufferedReader(
            new InputStreamReader(is, "UTF-8"));
    final StringBuffer s2 = new StringBuffer();
    String line;
    line = bufferedreader.readLine();
    if (line != null) {
        s2.append(line);
        while ((line = bufferedreader.readLine()) != null) {
            s2.append('\n');
            s2.append(line);
        }
    }
    bufferedreader.close();
    rv = s2.toString();
}

long finishTime = System.currentTimeMillis();
long timeTaken = finishTime - startTime;
long totalConnectionTime = connectionTime - startTime;
long processDataTime = finishTime - connectionTime;

String info = "Communication: [" + totalConnectionTime +
        "ms] Data process: [" + processDataTime +
        "ms] Total: [" + timeTaken + "ms]";
Thanks in advance.
Based on the information provided, here are few observations and suggestions.
To answer your question: the data is streamed back as you read from it, with buffered layers on top. The whole payload is not returned at once; it is streamed. I hope I read your question correctly.
The secondary question: the time could be spent in either place, the server or your code. Since you are not doing any processing in your code other than reading the data (apart from bufferedreader.close() and s2.toString()), the delay appears to be on the server. But just to be sure, if possible, hit the URL in a browser and see how long the request takes. (From the code I see that you are just fetching data from a URL, so it should be easy to access the same URL in a browser.)
You also mentioned that you are retrieving XML from the server. I would recommend using a standard XML parser (SAX, XStream, etc.), which is optimized (hence better performance) for reading XML data directly from an InputStream.
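For example, a SAX parser can consume the InputStream directly, without first buffering the whole document into a String. A hypothetical helper that just counts elements:

```java
import java.io.InputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxCount {

    // Parses XML straight from an InputStream and counts the elements,
    // as a stand-in for whatever real processing the response needs.
    static int countElements(InputStream in) throws Exception {
        final int[] count = {0};
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                count[0]++;
            }
        });
        return count[0];
    }
}
```

Handing the stream to the parser this way avoids the intermediate StringBuffer copy in the timing code above.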
openConnection() does create the TCP connection, but unless you use a non-default streaming mode no data is sent until you either get the input stream or the reader or the response code. So sending the request is seen as part of getInputStream() in your case.
I am currently trying to get HttpComponents to send HttpRequests and retrieve the Response.
On most URLs this works without a problem, but when I try to get the URL of a phpBB Forum namely http://www.forum.animenokami.com the client takes more time and the responseEntity contains passages more than once resulting in a broken html file.
For example the meta tags are contained six times. Since many other URLs work I can't figure out what I am doing wrong.
The page works correctly in known browsers, so it is not a problem on their side.
Here is the code I use to send and receive.
URI uri1 = new URI("http://www.forum.animenokami.com");
HttpGet get = new HttpGet(uri1);
get.setHeader(new BasicHeader("User-Agent", "Mozilla/5.0 (Windows NT 5.1; rv:6.0) Gecko/20100101 Firefox/6.0"));
HttpClient httpClient = new DefaultHttpClient();
HttpResponse response = httpClient.execute(get);
HttpEntity ent = response.getEntity();
InputStream is = ent.getContent();
BufferedInputStream bis = new BufferedInputStream(is);
byte[] tmp = new byte[2048];
int l;
String ret = "";
while ((l = bis.read(tmp)) != -1) {
    ret += new String(tmp);
}
I hope you can help me.
If you need anymore Information I will try to provide it as soon as possible.
This code is completely broken:
String ret = "";
while ((l = bis.read(tmp)) != -1) {
    ret += new String(tmp);
}
Three things:
This is converting the whole buffer into a string on each iteration, regardless of how much data has been read. (I suspect this is what's actually going wrong in your case.)
It's using the default platform encoding, which is almost never a good idea.
It's using string concatenation in a loop, which leads to poor performance.
Fortunately you can avoid all of this very easily using EntityUtils:
String text = EntityUtils.toString(ent);
That will use the appropriate character encoding specified in the response, if any, or ISO-8859-1 otherwise. (There's another overload which allows you to specify which character encoding to use if it's not specified.)
It's worth understanding what's wrong with your original code though rather than just replacing it with the better code, so that you don't make the same mistakes in other situations.
It works fine but what I don't understand is why I see the same text multiple times only on this URL.
It will be because your client is seeing more incomplete buffers when it reads the socket. That could be:
because there is a network bandwidth bottleneck on the route from the remote site to your client,
because the remote site is doing some unnecessary flushes, or
some other reason.
The point is that your client must pay close attention to the number of bytes read into the buffer by the read call, otherwise it will end up inserting junk. Network streams in particular are prone to not filling the buffer.
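A corrected version of the loop, as a sketch with hypothetical names: buffer the raw bytes, honouring the count from each read(), and decode the characters once at the end:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class SafeRead {

    // Fixes all three problems in the broken loop: it respects the byte
    // count returned by read(), decodes with an explicit charset, and
    // avoids String concatenation in a loop.
    static String readAll(InputStream in, String charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] tmp = new byte[2048];
        int l;
        while ((l = in.read(tmp)) != -1) {
            out.write(tmp, 0, l); // only l bytes are valid on this iteration
        }
        return out.toString(charset);
    }
}
```

With HttpComponents, `EntityUtils.toString(ent)` does essentially this for you, but the same read-count discipline applies to any hand-written stream loop.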