BufferedReader stuck in readLine() [closed]

I'm trying to read the HTTP request sent by Google Chrome to get its data. For that I use readLine() from BufferedReader, but I think it gets stuck after the last line because the connection stays open and readLine() keeps waiting for more input. Here is the code that I use in the while loop:
String line;
ArrayList<String> request = new ArrayList<String>();
while ((line = inFromClient.readLine()) != null) {
request.add(line);
}
If I forcibly break out of the loop it works. Basically, I'm trying to read all lines efficiently but without the inconsistencies of ready().

HTTP seems like a crazy simple protocol but it is not; you should use an HTTP client library such as the built-in java.net.http client.
The problem is that the concept of 'give me my data, then close it down' is HTTP/1.0, and that's a few decades out of date. HTTP/2.0 and HTTP/3.0 are binary protocols, and HTTP/1.1 tends to leave the connection open. In general, 'read lines', and even 'use Reader' (as in, read characters instead of bytes) is the wrong way to go about it, as HTTP is not a textual protocol. I know. It looks like one. It's not.
Here is a highly oversimplified overview of how e.g. a browser reads HTTP/1.1 responses:
Use raw byte processing because HTTP body content is raw (or can be), therefore wrapping the whole thing into e.g. an InputStreamReader or BufferedReader is a non-starter.
Keep reading until a 0x0A byte (in ASCII, the newline symbol), or until X bytes have been read and your buffer for this is full, where X is not extraordinarily large. You wouldn't want a badly behaving server, or a misunderstanding where you connect to a different (non-HTTP) service, to cause a memory issue! Parse this first line as an HTTP/1.1 status line.
Keep doing this loop to pick up all headers. Use the same 'my buffer has limits' trick to avoid memory issues.
Then check the response code in order to figure out if a body will be forthcoming. It's HTTP/1.1, so you can't just go: "Well, if the connection is closed, I guess no body is forthcoming". Whether one will be coming or not depends primarily on the response code.
Assuming a body exists, read the double-newline that separates headers from the body.
If the content is transferred with chunked encoding (common), start blitting data into a buffer, but check whether you have read the entire chunk. Reading chunked encoding is its own game, really.
Alternatively, HTTP/1.1 DEMANDS that Content-Length is present if chunked encoding isn't used. Use this header to know precisely how many bytes to read.
Neither 'a newline' nor 'close connection' can ever serve as a meaningful marker of 'end of data' in HTTP/1.1, so, don't.
Then either pass the content+headers+returncode verbatim to the requesting code, or dress it up a bit. For example, if the Content-Type header is present and has value text/html; charset=UTF-8 you can consider taking the body data and turning it into a string via UTF-8 (new String(byteArray, StandardCharsets.UTF_8);).
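The steps above can be sketched roughly as follows. This is a minimal illustration, not a complete HTTP implementation: the class name HttpMessageSketch, the two size limits, and the decision to reject chunked bodies are all assumptions made for the example, and it ignores the rule that some status codes never carry a body. It reads raw bytes until the blank line that ends the header section, parses the headers, then reads exactly Content-Length bytes. It works the same way for a request read on the server side, which is why the loop in the question never ends: the end of the headers is marked by an empty line, not by end of stream. It assumes Java 11+ for InputStream.readNBytes(int).

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only.
public class HttpMessageSketch {
    static final int MAX_HEAD_BYTES = 16 * 1024;   // arbitrary cap on start line + headers
    static final int MAX_BODY_BYTES = 1024 * 1024; // arbitrary cap on the body

    final String startLine;
    final Map<String, String> headers = new LinkedHashMap<>(); // names lower-cased
    byte[] body = new byte[0];

    private HttpMessageSketch(String startLine) {
        this.startLine = startLine;
    }

    static HttpMessageSketch read(InputStream in) throws IOException {
        // Collect bytes until the last four are CR LF CR LF (end of headers).
        ByteArrayOutputStream head = new ByteArrayOutputStream();
        int window = 0; // the last four bytes read, packed into an int
        while (window != 0x0D0A0D0A) {
            if (head.size() >= MAX_HEAD_BYTES) {
                throw new IOException("header section too large");
            }
            int b = in.read();
            if (b == -1) {
                throw new EOFException("stream ended inside the header section");
            }
            head.write(b);
            window = (window << 8) | b;
        }

        // ISO-8859-1 maps every byte to one char, so nothing is lost here.
        String[] lines = new String(head.toByteArray(), StandardCharsets.ISO_8859_1).split("\r\n");
        if (lines.length == 0) {
            throw new IOException("missing start line");
        }
        HttpMessageSketch message = new HttpMessageSketch(lines[0]);
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon <= 0) {
                continue; // skip malformed header lines in this sketch
            }
            String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
            message.headers.put(name, lines[i].substring(colon + 1).trim());
        }

        if (message.headers.containsKey("transfer-encoding")) {
            throw new IOException("chunked bodies are not handled in this sketch");
        }
        int length = Integer.parseInt(message.headers.getOrDefault("content-length", "0"));
        if (length < 0 || length > MAX_BODY_BYTES) {
            throw new IOException("unreasonable Content-Length: " + length);
        }
        message.body = in.readNBytes(length); // may return fewer bytes at end of stream
        if (message.body.length < length) {
            throw new EOFException("stream ended inside the body");
        }
        return message;
    }
}
```

When reading from a socket, wrapping socket.getInputStream() in a BufferedInputStream before calling read(...) avoids one system call per byte while still leaving any bytes after the body available to the caller.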
Note that I've passed right over some bizarre behaviour that servers do because in ye olden days some dumb browser did weird things and it's now the status quo (for example, range requests are quite bizarre) and there's of course HTTP2 and HTTP3 which are completely different protocols.
Also, of course, plain HTTP servers are rare these days; HTTPS is where it's at, and that's quite different too.

Related

Handle HTTP POST multipart response through ServerSocket

Good afternoon everyone,
First of all, this is only for personal use: small projects to improve my Java knowledge. My goal is to better understand how developers work with sockets and bytes, since I'd like that understanding for future ideas.
I'm currently writing a lightweight HTTP server in Java to understand how it works. I've been reading the documentation but still have difficulty understanding parts of it. The main problem I'm facing is that the Content-Length seems to be larger than the amount of data I get from the BufferedReader. I don't know whether this is related to the way the BufferedReader decodes bytes into chars, leaving it with less data. If so, I probably have to treat this part as binary and read the bytes from the InputStream directly, but that is where the real problem comes in.
A Reader reads a certain number of bytes ahead and keeps them as its buffer, so that data has been consumed from the InputStream and is no longer in the stream; calling read() on the stream then returns -1 because there are no more bytes to read. A multipart body is divided into multiple parts separated by a boundary, with a newline that delimits each part's information from its content. I still have to get the information as a String to process it, but the content should be handled as binary data. Without changing the buffer length, which would require knowing exactly how many bytes hold only the information, the most probable result is that the content ends up in the BufferedReader's buffer. Is it possible to recover it from the data the BufferedReader has already processed, or should I find a way to get that content as binary without it being processed?
As I said, I'm new to working with sockets and services, so I don't know exactly what the possibilities are, or whether this is a different kind of issue. Any help would be appreciated. Thank you in advance.

Answer from Remy Lebeau, found in the comments, which was useful to me:
since multipart data is both textual and binary, you are going to have to do your own buffering of the socket data so you have more control and know where the data switches back and forth. At the very least, since you can read binary data directly from a BufferedInputStream, and access its internal buffer, you can let it handle the actual buffering for you, and it is not difficult to write a custom readLine() method that can read a line of text from a BufferedInputStream without using BufferedReader
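A custom readLine() of the kind described in that comment might look like the sketch below. The class name ByteLineReader is hypothetical, and decoding lines as ISO-8859-1 is an assumption made so that every byte maps to exactly one char. Because it pulls bytes one at a time from the stream itself (ideally a BufferedInputStream) and never reads past the line terminator, whatever follows the line is still available for binary reads.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
public class ByteLineReader {

    /**
     * Reads bytes up to and including the next LF and returns them as text,
     * without the LF and without a trailing CR. Returns null if the stream
     * is already at its end.
     */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                break;
            }
            line.write(b);
        }
        if (b == -1 && line.size() == 0) {
            return null; // end of stream, nothing read
        }
        byte[] bytes = line.toByteArray();
        int length = bytes.length;
        if (length > 0 && bytes[length - 1] == '\r') {
            length--; // drop the CR of a CRLF terminator
        }
        return new String(bytes, 0, length, StandardCharsets.ISO_8859_1);
    }
}
```

For a multipart part, one would call readLine(in) until it returns an empty string (the blank line after the part headers) and then switch to in.read(byte[]) on the same stream for the binary content.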

Find byte offsets for e-mail attachments

I got a requirement to deliver emails to a legacy system that needs to read the attachments.
For each part in a multipart email I need to provide the byte offset for where the attachment starts in the email, so the legacy system doesn't need to know how to parse emails.
Performance and memory usage are an issue, so the solution can't load the entire email into memory. To my eyes, that rules out javax.mail.
How would you go about it in Java?
My first idea was to use mime4j, but the library does not keep track of byte offsets or even line numbers.
I investigated making a PR to mime4j to add tracking of line numbers and byte offsets. But it is not very easy, since it is a very mature project and it uses lots of buffering internally.
Now I am thinking that maybe I am going about this the wrong way. So I would very much appreciate any ideas on how to solve this in a simple manner.

You're going to run into issues just sending the byte offsets and the full email, as emails can still be base64 or quoted-printable encoded.
You'll want to use a MimeStreamParser, supply your own ContentHandler, and override the body method. You can then directly send the BodyDescriptor and InputStream to the legacy system. The InputStream is the "decoded" email (i.e. it handles any Content-Transfer-Encoding). The BodyDescriptor is useful for extracting things you may care about from the headers of the part (MimeType and Charset are the most useful ones).
This does not buffer the whole email, and allows you to stream out just the body parts. I'm not sure how you're communicating with the legacy system (via the network or if it's an inprocess subcomponent) but hopefully that works!

What makes a connection reusable

I saw this description in the Oracle website:
"Since TCP by its nature is a stream based protocol, in order to reuse an existing connection, the HTTP protocol has to have a way to indicate the end of the previous response and the beginning of the next one. Thus, it is required that all messages on the connection MUST have a self-defined message length (i.e., one not defined by closure of the connection). Self demarcation is achieved by either setting the Content-Length header, or in the case of chunked transfer encoded entity body, each chunk starts with a size, and the response body ends with a special last chunk."
See Oracle doc
I don't know how to implement this. Can someone give me an example of a Java implementation?

If you are trying to implement "self-demarcation" in the same way as HTTP does it:
the HTTP 1.1 specification defines how it works,
the source code of (say) the Apache HTTP libraries is an example of its implementation.
In fact, it is advisable NOT to try and implement this (HTTP) yourself from scratch. Use an existing implementation.
On the other hand, if you simply want to implement your own ad-hoc self-demarcation scheme, it is really easy to do.
The sender figures out the size of the message, in bytes or characters or some other unit that makes sense.
The sender sends the message size, followed by the message itself.
At the other end:
The receiver reads the message size, and then reads the requisite number of bytes or characters to form the message body.
An alternative is for the sender to send the message followed by a special end-of-message marker. To make this work, either you need to guarantee that no message will contain the end-of-message marker, or you need to use some sort of escaping mechanism.
Implementing these schemes is simple Java programming.
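As a rough sketch of the size-then-message scheme described above: the class name LengthPrefixedFraming, the 4-byte big-endian length prefix, the UTF-8 payload, and the MAX_MESSAGE_BYTES cap are all choices made for this example, not anything HTTP prescribes.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical ad-hoc framing scheme for illustration only.
public class LengthPrefixedFraming {
    static final int MAX_MESSAGE_BYTES = 1024 * 1024; // arbitrary safety cap

    /** Sender: write the size in bytes, then the message itself. */
    static void writeMessage(OutputStream out, String message) throws IOException {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(payload.length); // 4-byte big-endian length prefix
        data.write(payload);
        data.flush();
    }

    /** Receiver: read the size, then exactly that many bytes. */
    static String readMessage(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt(); // throws EOFException if the stream has ended
        if (length < 0 || length > MAX_MESSAGE_BYTES) {
            throw new IOException("unreasonable message length: " + length);
        }
        byte[] payload = new byte[length];
        data.readFully(payload); // blocks until all bytes arrive or the stream ends
        return new String(payload, StandardCharsets.UTF_8);
    }
}
```

Because each message announces its own length, several messages can be sent back to back on one connection, and the receiver knows where each one ends without the connection being closed.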
What makes a connection reusable
That is answered by the text that you quoted in your Question.

HttpURLConnection: What's the deal with having to read the whole response?

My current problem is very similar to this one.
I have a downloadFile(URL) function that creates a new HttpURLConnection, opens it, reads it, returns the results. When I call this function on the same URL multiple times, the second time around it almost always returns a response code of -1 (But throws no exception!!!).
The top answer in that question is very helpful, but there are a few things I'm trying to understand.
So, if setting http.keepAlive to false solves the problem, it indicates what exactly? That the server is responding in a way that violates the http protocol? Or more likely, my code is violating the protocol in some way? What will the trace tell me? What should I look for?
And what's the deal with this:
"You need to read everything from error stream. Otherwise, it's going to confuse next connection and that's the cause of -1."
Does this mean if the response is some type of error (which would be what response code(s)?), the stream HAS to be fully read? Also, every time I am attempting an http request I am basically creating a new connection, and then disconnect()ing it at the end.
However, in my case I'm not getting a 401 or whatever. It's always a 200. But my second connection almost always fails. Does this mean there's some other data I should be reading that I'm not (in a similar manner that the error stream must be fully read)?
Please help shed some light on this? I feel like there's some fundamental http protocol understanding I'm missing.
PS If I were just using the Apache HttpClient, would I not have to deal with all these protocol details? Does it take care of everything for me?

The support for keep-alive in the default HTTP URL handler is very buggy. We always turn it off.
Use Apache HttpClient with a pooled connection manager if you want keep-alive. If you don't want to change your code, you can get another handler like this one,
http://www.innovation.ch/java/HTTPClient/
If your second connection always fails, that means your server doesn't support keepalive. With Keepalive, the HTTP handler simply leaves connection open (even if you call disconnect). The server closes connection if keep-alive is not supported but the handler doesn't know till you make next request on the connection so the 2nd connection fails.
Regarding the read error stream, it only applies if you get non-200 responses.
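For illustration, reading a response to the end on both the success and the error path could look like the sketch below. ResponseDrainer and its two methods are hypothetical names, and treating status codes of 400 and above as the error-stream case is an assumption about how HttpURLConnection exposes such responses. It is not a guaranteed fix for the -1 response code described in the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;

// Hypothetical helper for illustration only.
public class ResponseDrainer {

    /** Reads the stream to its end, closes it, and returns the number of bytes discarded. */
    static long drain(InputStream in) throws IOException {
        if (in == null) {
            return 0; // getErrorStream() returns null when there is no error body
        }
        long total = 0;
        byte[] buffer = new byte[8192];
        try (InputStream stream = in) {
            int n;
            while ((n = stream.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }

    /** Fetches the status code, then consumes whichever stream carries the body. */
    static int statusAndDrain(HttpURLConnection connection) throws IOException {
        int status = connection.getResponseCode();
        InputStream body = status >= 400
                ? connection.getErrorStream()
                : connection.getInputStream();
        drain(body);
        return status;
    }
}
```

In real code the success branch would normally process the bytes instead of discarding them; the point is only that the stream is read until read() returns -1 before the connection is handed back for reuse.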

i think you're probably talking about this HttpURLConnection bug, fixed in froyo:
http://code.google.com/p/android/issues/detail?id=2939
see that bug for other workarounds. if this isn't the bug you've hit, please raise a bug with a repeatable test case at http://code.google.com/p/android/issues/entry.

How to get HTTP response through socket in java?

I have written code to send an HTTP request through a socket in Java. Now I want to get the HTTP response sent back by the server that I sent the request to.

It's not totally clear what you're asking for. Assuming you've written the request to the socket, the next thing you'll want to do is:
Call shutdownOutput() on the socket to tell the server that the request is done (not necessary if you've sent the content length)
Read the response from the socket's input stream, parsing according to the HTTP spec.
This is a bunch of work, so I'd suggest that rather than rolling your own HTTP request logic, use URLConnection which is built-in to Java and includes methods for retrieving the content of a response as well as any headers set by the server.

As Jon said, read the HTTP spec. However, Internet protocols are generally line oriented, so you can read the response a line at a time. The first line will be the status line. Following this will be the headers, one per line (unless there's a continuation). If there's a body, one of the headers will be Content-Type, which tells you what the content is. If you want to read the content you will need to understand the different ways the content can be sent. There may be a Content-Length header (or not), or the content may be chunked (signalled by the Transfer-Encoding: chunked header). And of course the content may be binary rather than text.
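To give an idea of what reading a chunked body involves, here is a minimal sketch. The class name ChunkedBodySketch and the MAX_CHUNK_BYTES cap are assumptions for the example; it ignores chunk extensions and discards any trailer fields. It assumes the stream is positioned just after the blank line that ends the headers.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration only.
public class ChunkedBodySketch {
    static final int MAX_CHUNK_BYTES = 1024 * 1024; // arbitrary safety cap

    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        DataInputStream data = new DataInputStream(in);
        while (true) {
            // Each chunk starts with its size in hexadecimal on a line of its own.
            String sizeLine = readAsciiLine(data);
            int semicolon = sizeLine.indexOf(';');
            if (semicolon >= 0) {
                sizeLine = sizeLine.substring(0, semicolon); // ignore chunk extensions
            }
            int size = Integer.parseInt(sizeLine.trim(), 16);
            if (size < 0 || size > MAX_CHUNK_BYTES) {
                throw new IOException("unreasonable chunk size: " + size);
            }
            if (size == 0) {
                break; // the special last chunk
            }
            byte[] chunk = new byte[size];
            data.readFully(chunk);
            body.write(chunk, 0, size);
            readAsciiLine(data); // consume the CRLF that follows the chunk data
        }
        // Skip optional trailer fields up to the final empty line.
        while (!readAsciiLine(data).isEmpty()) {
            // trailer fields are discarded in this sketch
        }
        return body.toByteArray();
    }

    /** Reads one line terminated by LF, dropping any CR characters. */
    private static String readAsciiLine(InputStream in) throws IOException {
        StringBuilder line = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') {
                line.append((char) b);
            }
        }
        if (b == -1) {
            throw new EOFException("stream ended inside a chunked body");
        }
        return line.toString();
    }
}
```

The zero-size chunk is what marks the end of the body, which is how a chunked response can end without the connection being closed.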

Yup, that's right!
The response should be read from the input stream as a few chunks of bytes, which we can then translate into a readable format.
But that also takes longer... :(