How to load the first x bytes from a URL with Java / Scala?

I want to read the first x bytes from a java.net.URLConnection (although I'm not forced to use this class - other suggestions welcome).
My code looks like this:
val head = new Array[Byte](2000)
new BufferedInputStream(connection.getInputStream).read(head)
IOUtils.toString(new ByteArrayInputStream(head), charset)
It works, but does this code load only the first 2000 bytes from the network?
Next trial
As 'JB Nizet' said, it is not useful to use a buffered input stream, so I tried it with an InputStreamReader:
val head = new Array[Char](2000)
new InputStreamReader(connection.getInputStream, charset).read(head)
new String(head)
This code may be better, but the load times are about the same. So does this procedure limit the transferred bytes?

No, it doesn't. It could load up to 8192 bytes from the network (the default buffer size of BufferedInputStream). It could also fill anywhere between 1 and 2000 bytes of your array, since you don't check the number of bytes that were actually read, which is returned by the read() method.
And finally, depending on the value of charset, and of the actual charset used by the HTTP response, this could return an incorrect string, or a String truncated in the middle of a multi-byte character. You should use a Reader to read text.
I suggest you read the Java IO tutorial.
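A minimal sketch of that advice in Java (connection and charset are the ones from the question; note that the count returned by read() is checked):
Reader reader = new InputStreamReader(connection.getInputStream(), charset);
char[] head = new char[2000];
int n = reader.read(head); // may be anywhere from -1 (end of stream) to 2000
String s = (n > 0) ? new String(head, 0, n) : "";
reader.close();
A single read() may still return fewer than 2000 characters; loop if you need exactly 2000 (see the IOUtils approach in the next answer).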

You can use read(Reader, char[]) from Apache Commons IO. Just pass a 2000-character buffer to it and it will fill it with as many characters as possible, up to 2000.
Be sure you understand the objections in the other answers/comments, in particular:
Don't use Buffered... wrappers; buffering works against your intention of limiting what is read.
If you read textual data, then use a Reader to read 2000 characters instead of an InputStream reading 2000 bytes. The proper procedure is to determine the character encoding from the response headers (Content-Type) and set that encoding on the InputStreamReader.
Calling plain read(char[]) on a Reader is not guaranteed to fill the array you give to it; it can read as little as one character, no matter how big the array is!
Don't forget to close the reader afterwards.
Other than that, I'd strongly recommend you to use Apache HttpClient in favor of java.net.URLConnection. It's much more flexible.
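For reference, a minimal sketch of this approach (url is a placeholder String; UTF-8 is assumed here, though in practice you would parse the charset from the Content-Type header as described above):
URLConnection connection = new URL(url).openConnection();
try (Reader reader = new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8)) {
    char[] head = new char[2000];
    int n = IOUtils.read(reader, head, 0, head.length); // loops until 2000 chars or EOF
    String result = new String(head, 0, n);
}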
Edit: To understand the difference between Reader.read and IOUtils.read, it's worth examining the source of the latter:
public static int read(Reader input, char[] buffer, int offset, int length)
        throws IOException {
    if (length < 0) {
        throw new IllegalArgumentException("Length must not be negative: " + length);
    }
    int remaining = length;
    while (remaining > 0) {
        int location = length - remaining;
        int count = input.read(buffer, offset + location, remaining);
        if (EOF == count) { // EOF
            break;
        }
        remaining -= count;
    }
    return length - remaining;
}
Since Reader.read can read fewer characters than the given length (we only know it reads at least 1 and at most the length, or returns -1 at end of stream), we need to keep calling it in a loop until we get the amount we want.

Related

Why is the receiving String's size smaller than the original ByteArrayOutputStream's size when I call toString()?

I'm facing a curious problem. Some code is better than a long story:
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
buffer.write(...); // I write byte[] data
// In debugger I can see that buffer's count = 449597
String szData = buffer.toString();
int iSizeData = buffer.size();
// But here, szData's count = 240368
// & iSizeData = 449597
So my question is: why doesn't szData contain all the buffer's data? (Only one thread runs this code.) After that kind of operation, I don't want szData.charAt(iSizeData - 1) to crash!
EDIT: szData.getBytes().length = 450566. There is an encoding problem, I think. Is it better to use a byte[] instead of a String after all?
In Java, char ≠ byte. A String is a sequence of UTF-16 chars, and depending on the character encoding used, a single character can occupy up to 4 bytes when encoded. You work either with bytes (binary data) or with characters (strings); you cannot (easily) switch between them.
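A minimal sketch of the mismatch (the byte count of the same text depends on the charset, while the char count does not; arbitrary binary data run through toString() is distorted in the same way):
String s = "héllo 世界"; // 8 chars
System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 13 bytes
System.out.println(s.getBytes(StandardCharsets.ISO_8859_1).length); // 8 bytes, and the CJK chars are lost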
For String operations like strncasecmp in C, use the methods of the String class, e.g. String.compareToIgnoreCase(String str). Also have a look at the StringUtils class from the Apache Commons Lang library.
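For instance (a small sketch; regionMatches is the closest equivalent of strncasecmp):
String a = "Hello", b = "HELLO world";
boolean same = a.regionMatches(true, 0, b, 0, 5); // true: case-insensitive compare of the first 5 chars
int order = a.compareToIgnoreCase(b); // negative: a sorts before b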

Java - download a file through network with a buffer

I want to read from a network stream and write the bytes to a file, directly.
But every time I run the program, very few bytes are actually written to the file.
Java:
InputStream in = uc.getInputStream();
int clength = uc.getContentLength();
byte[] barr = new byte[clength];
int offset = 0;
int totalwritten = 0;
int i;
int wrote = 0;
OutputStream out = new FileOutputStream("file.xlsx");
while (in.available() != 0) {
    wrote = in.read(barr, offset, clength - offset);
    out.write(barr, offset, wrote);
    offset += wrote;
    totalwritten += wrote;
}
System.out.println("Written: " + totalwritten + " of " + clength);
out.flush();
That's because available() doesn't do what you think it does. Read its API documentation. You should simply read until the number of bytes read, returned by read(), is -1. Or even simpler, use Files.copy():
Files.copy(in, new File("file.xlsx").toPath());
Using a buffer that has the size of the input stream also pretty much defeats the purpose of using a buffer, which is to only have a few bytes in memory.
If you want to reimplement copy(), the general pattern is the following:
byte[] buffer = new byte[4096]; // number of bytes held in memory at once
int numberOfBytesRead;
while ((numberOfBytesRead = in.read(buffer)) >= 0) {
    out.write(buffer, 0, numberOfBytesRead);
}
You're using .available() wrong. From the Java documentation:
available() returns an estimate of the number of bytes that can be read
(or skipped over) from this input stream without blocking by the next
invocation of a method for this input stream
That means that as soon as the stream delivers data more slowly than you can write it to the file (which will happen almost immediately, in all probability), available() returns 0 and the while loop ends.
You should either have a thread keep reading until it has consumed the expected content length (with a sizable timeout, of course), or simply let your program block on the read, if user interaction is not a big deal.
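Putting it together, a minimal sketch of the fix (reading until read() returns -1, with try-with-resources so both streams are closed; uc is the URLConnection from the question):
try (InputStream in = uc.getInputStream();
     OutputStream out = new FileOutputStream("file.xlsx")) {
    byte[] buffer = new byte[4096];
    int n;
    int totalwritten = 0;
    while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);
        totalwritten += n;
    }
    System.out.println("Written: " + totalwritten + " of " + uc.getContentLength());
}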

Why doesn't InputStream fill the array fully?

Dude, I'm using the following code to read a large file (2 MB or more) and do some processing on the data.
I have to read 128 bytes per read call.
At first I used this code (no problem, works fine):
InputStream is; // = something...
int read = -1;
byte[] buff = new byte[128];
while (true) {
    for (int idx = 0; idx < 128; idx++) {
        read = is.read();
        if (read == -1) { return; } // end of stream
        buff[idx] = (byte) read;
    }
    process_data(buff);
}
Then I tried this code, with which the problems appeared (errors! weird responses sometimes):
InputStream is; // = something...
int read = -1;
byte[] buff = new byte[128];
while (true) {
    // ERROR! Java doesn't read 128 bytes even though they're available
    if ((read = is.read(buff, 0, 128)) == 128) { process_data(buff); } else { return; }
}
The above code doesn't work all the time. I'm sure the data is available, but read sometimes returns 127, 125, or 123. What is the problem?
I also found that DataInputStream#readFully(byte[]) works, but I'm just wondering why the second solution doesn't fill the array while the data is available.
Thanks buddy.
Consulting the javadoc for FileInputStream (I'm assuming you're reading from a file):
Reads up to len bytes of data from this input stream into an array of bytes. If len is not zero, the method blocks until some input is available; otherwise, no bytes are read and 0 is returned.
The key here is that the method only blocks until some data is available. The returned value tells you how many bytes were actually read. The reason you may read fewer than 128 bytes could be a slow drive or implementation-defined behavior.
For a proper read sequence, you should check that read() does not return -1 (end of stream) and keep reading into the buffer until the desired amount of data has been read.
Example of a proper implementation of your code:
InputStream is; // = something...
int read;
int read_total;
byte[] buf = new byte[128];
// Infinite loop
while (true) {
    read_total = 0;
    // Repeatedly read until break or end of stream, offsetting each read
    // by the amount already in the array
    while ((read = is.read(buf, read_total, buf.length - read_total)) != -1) {
        // Add the amount read to the read_total variable
        read_total = read_total + read;
        // Break once read_total reaches the buffer length (128)
        if (read_total == buf.length) {
            break;
        }
    }
    if (read_total != buf.length) {
        // Incomplete read before 128 bytes: end of stream reached
        return;
    } else {
        process_data(buf);
    }
}
Edit:
Don't try to use available() as an indicator of data availability (sounds weird, I know); again, the javadoc:
Returns an estimate of the number of remaining bytes that can be read (or skipped over) from this input stream without blocking by the next invocation of a method for this input stream. Returns 0 when the file position is beyond EOF. The next invocation might be the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes.
In some cases, a non-blocking read (or skip) may appear to be blocked when it is merely slow, for example when reading large files over slow networks.
The key there is estimate, don't work with estimates.
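Note that DataInputStream#readFully, which you mentioned, wraps exactly this kind of loop for you (a minimal sketch, with is as above):
DataInputStream dis = new DataInputStream(is);
byte[] buff = new byte[128];
dis.readFully(buff); // guaranteed to fill all 128 bytes, or throw EOFException if the stream ends first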
Since the accepted answer was provided, a new option has become available. Java 9 added InputStream.readNBytes(byte[], int, int), and Java 11 added readNBytes(int len); both eliminate the need for the programmer to write a read loop. For example, your method could look like:
public static void some_method() throws IOException {
    InputStream is = new FileInputStream(args[1]);
    byte[] buff = new byte[128];
    while (true) {
        int numRead = is.readNBytes(buff, 0, buff.length);
        if (numRead == 0) {
            break;
        }
        // The last read before end-of-stream may read fewer than 128 bytes.
        process_data(buff, numRead);
    }
}
or the slightly simpler
public static void some_method() throws IOException {
    InputStream is = new FileInputStream(args[1]);
    while (true) {
        byte[] buff = is.readNBytes(128);
        if (buff.length == 0) {
            break;
        }
        // The last read before end-of-stream may read fewer than 128 bytes.
        process_data(buff);
    }
}

read(byte[] b, int off, int len) : Reading A File To Its End

PREMISE
I am new to working with streams in Java and find that my question at least appears different from those asked previously. Here is a fragment of my code at this juncture (the code is more of a proof of concept):
try {
    // file is initialized to the path contained in the command-line args.
    File file = new File(args[0]);
    inputStream = new FileInputStream(file);
    byte[] byteArray = new byte[(int) file.length()];
    int offset = 0;
    while (inputStream.read(byteArray, offset, byteArray.length - offset) != -1) {
        for (int i = 0; i < byteArray.length; i++) {
            if (byteArray[offset + i] >= 32 && byteArray[offset + i] <= 126) {
                System.out.print((char) byteArray[offset + i]); // printable ASCII
            } else {
                System.out.print("#"); // non-printable byte
            }
            // offset = byteArray.length - offset;
        }
    }
GOAL
Here is my goal: to create a program that reads in only 80 bytes of input at a time (the number is arbitrary; call it x), decides whether each byte within that segment represents a printable ASCII character, and prints accordingly. The last two portions of the code are "correct" for all intents and purposes, in that the code already makes the determination and prints appropriately; this is not the premise of my question.
Let's say the length() of the file is greater than 80 bytes and I want, while only reading in 80 bytes of input at a time, to reach the EOF, i.e. to consume the file's entire contents. Each line printed to the console can only contain 80 (or x) bytes' worth of content. I know to adjust the offset and have been tinkering with that; however, when I hit the EOF, I don't want the program to crash and burn, to "explode", so to speak.
QUESTION
When encountering EOF, how do I ensure the captured bytes are still read and that the code in the for loop is still executed?
For instance, changing the above inputStream.read() to:
inputStream.read(byteArray, offset, 80)
This would "bomb" were the end of file (EOF) encountered in reading the last bytes within the file. For instance, if I am trying to read 80 bytes and only 10 remain.
The return value from read tells you the number of bytes which were read. This will be <= the value of length. Just because the file is larger than length does not mean that a request for length number of bytes will actually result in that many bytes being read into your byte[].
When -1 is returned, that does indicate EOF. It also indicates that no data was read into your byte[].
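A minimal sketch of such a loop (with x = 80; read() does not "bomb" at EOF, it simply returns the smaller count, e.g. 10, and then -1 on the next call):
byte[] chunk = new byte[80];
int n;
while ((n = inputStream.read(chunk, 0, chunk.length)) != -1) {
    for (int i = 0; i < n; i++) { // only the n bytes actually read
        System.out.print(chunk[i] >= 32 && chunk[i] <= 126 ? (char) chunk[i] : '#');
    }
    System.out.println(); // one console line per chunk of up to 80 bytes
}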

Optimising writing StringBuilder's content to ServletResponse

I would like to obtain a few comments regarding optimising my method, created for writing the whole contents of a StringBuilder to a ServletResponse.
I did it to avoid the creation of a gigantic String in a single go before passing it to the out.write() method. In my situation the StringBuilder's content length, on some occasions, reaches a few million characters.
public static void writeResponse(ServletResponse response, StringBuilder sb) throws IOException {
    try (PrintWriter out = response.getWriter()) {
        int length = sb.length();
        // to avoid creation of gigantic strings we are writing substrings from the sb
        int bufferSize = (response.getBufferSize() != 0 ? response.getBufferSize() : 10000);
        log.log(Level.INFO, "READY TO SEND To CLIENT, length of responseSB={0}", length);
        if (length <= bufferSize) {
            out.write(sb.toString());
        } else {
            int noWrites = length / bufferSize;
            for (int i = 0; i < noWrites; i++) {
                out.write(sb.substring(i * bufferSize, (i + 1) * bufferSize));
                log.log(Level.INFO, "SENDING To CLIENT, write no={0} of {1}", new Object[]{(i + 1), noWrites});
            }
            int rest = length % bufferSize;
            if (rest != 0) {
                out.write(sb.substring(length - rest, length));
            }
        }
    }
}
I want it to be written as a single (not chunked) message. Thus, I would like to know how to accurately establish a response's buffer size in relation to the number of characters (or String length) it can fit.
At the moment I am treating the current buffer size as if it expressed a number of characters it can fit; how do I correctly evaluate the buffer size? Also, I am not accounting for the header size; how could I do that?
I would like to optimise its performance to the maximum (so it works as fast as possible); any suggestion is much appreciated. Or maybe there is altogether a better way of writing a gigantic StringBuilder's content to a ServletResponse?
The fastest way is:
out.write(sb.toString());
If you want to save on memory, replace StringBuilder with PrintWriter and pass response.getWriter() around.
Any optimization regarding buffer sizes will only make things slower. Without your optimization, the cost is roughly: StringBuilder.toString() + out.write(), which passes the long string to the container for chunking/sending.
With your optimization, it looks like this: StringBuilder.toString() + substring() + out.write() + copying the substring into the send buffer + many calls to the container to send the pieces.
If you get rid of the builder, the number of calls to the container will stay the same (out.write() uses an internal buffer), but you won't waste memory keeping the data around.
If you want to keep the StringBuilder, then find out how big the pages are and create the StringBuilder with a non-default size so it doesn't have to extend its internal buffer all the time.
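A minimal sketch of that suggestion (Row, rows, and toHtml() are hypothetical placeholders for your data and rendering code):
PrintWriter out = response.getWriter();
for (Row row : rows) {
    out.write(row.toHtml()); // the writer/container buffers and chunks internally
}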
