Ok, I know a buffer is actually an array of bytes, but I have never seen the following declaration (taken from here):
URLConnection con = new URL("http://maps...").openConnection();
InputStream is = con.getInputStream();
byte bytes[] = new byte[con.getContentLength()];
is.read(bytes);
Is this the right way to avoid using a BufferedInputStream object? Here we have an unbuffered stream reading into a byte[]. Shouldn't it be the other way around?
thanks in advance.
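No, it is not the right way. The read(byte[]) method reads up to N bytes, where N is the length of your array. It can read fewer bytes if fewer are available at that moment: it only blocks until at least one byte arrives (it returns 0 only when the array has length 0). The number of bytes actually read is the return value of read(). When the end of the stream is reached, the method returns -1. Also note that con.getContentLength() returns -1 when the length is not known, in which case new byte[-1] throws a NegativeArraySizeException.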
Therefore the right way is to read bytes in a loop:
byte[] buf = new byte[MAX]; // MAX: any buffer size you choose, e.g. 8192
int n = 0;
while ((n = stream.read(buf)) >= 0) {
    // deal with the first n bytes of buf
}
Or use Apache Commons IO (org.apache.commons.io.IOUtils):
InputStream is;
byte[] bytes = IOUtils.toByteArray(is);
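On Java 9 and later, InputStream.readNBytes(byte[], int, int) runs this read loop for you: it keeps calling read until the requested number of bytes has arrived or the stream ends. A minimal sketch for the question's case, where the content length is known up front (the class and method names are illustrative, not from the original):

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class ExactRead {
    // Fills an array of exactly `length` bytes. A single read() may return
    // fewer bytes than requested, so readNBytes (Java 9+) loops internally
    // until the array is full or the stream ends.
    static byte[] readExactly(InputStream in, int length) throws IOException {
        if (length < 0) {
            // URLConnection.getContentLength() returns -1 when the length is unknown
            throw new IOException("Length unknown; read in a loop until -1 instead");
        }
        byte[] bytes = new byte[length];
        int n = in.readNBytes(bytes, 0, length);
        if (n < length) {
            throw new EOFException("Expected " + length + " bytes but got " + n);
        }
        return bytes;
    }
}
```

With the question's connection this would be called as byte[] bytes = ExactRead.readExactly(con.getInputStream(), con.getContentLength());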
Related
In a Java program (Eclipse IDE on a PC) I want to read a huge amount of data (around 1640188 bytes) from a web site. With Wireshark I can see that this data arrives in many blocks of 1460 bytes.
When I use the following code, I read only the first block seen at the high level (around 18000 bytes). How can I get the other blocks?
URLConnection con = url.openConnection();
InputStream input = con.getInputStream();
while(input.available()>0)
{
System.out.println(input.available());
int n = input.available();
byte[] mydataTab = new byte[n];
input.read(mydataTab, 0, n);
String str = new String(mydataTab);
memoData += str;
}
First:
Do not
int n = input.available();
byte[] mydataTab = new byte[n];
because:
Note that while some implementations of InputStream will return the total number of bytes in the stream, many will not. It is never correct to use the return value of this method to allocate a buffer intended to hold all data in this stream.
(Java InputStream documentation for available())
Second:
Try to use a predefined chunk size for your reads, so you can do:
int chunkSize = 1024;
byte[] mydataTab = new byte[chunkSize];
int sizeRead = input.read(mydataTab, 0, chunkSize);
where sizeRead is the number of bytes you actually read (-1 at the end of the stream).
Keep reading chunks until the end of the stream, instead of stopping as soon as available() returns 0.
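Putting both points together, here is a sketch (class and method names are hypothetical) that reads until read() returns -1, collects the chunks in a ByteArrayOutputStream, and decodes the text once at the end. Decoding each chunk separately, as new String(mydataTab) does in the question, can split a multi-byte character across two chunks; the charset parameter is an assumption, since the question does not say how the data is encoded.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

final class ChunkedReader {
    // Reads the whole stream in fixed-size chunks; only -1 means "no more data".
    static byte[] readAll(InputStream input, int chunkSize) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] chunk = new byte[chunkSize];
        int sizeRead;
        while ((sizeRead = input.read(chunk, 0, chunk.length)) != -1) {
            // Only the first sizeRead bytes of chunk are valid on this pass
            collected.write(chunk, 0, sizeRead);
        }
        return collected.toByteArray();
    }

    // Decodes once, after all chunks are in, so a multi-byte character
    // split across two chunks is not corrupted.
    static String readAllAsString(InputStream input, int chunkSize, Charset charset) throws IOException {
        return new String(readAll(input, chunkSize), charset);
    }
}
```

With the question's connection this would be called as String memoData = ChunkedReader.readAllAsString(con.getInputStream(), 1024, StandardCharsets.UTF_8);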
So I've been trying to make a small program that reads a file into a byte array, then turns that byte array into hex, then binary. It will then play with the binary values (I haven't thought of what to do when I get to this stage) and then save the result as a custom file.
I studied a lot of internet code and I can turn a file into a byte array and into hex, but the problem is I can't turn huge files into byte arrays (out of memory).
This is the code that is not a complete failure
public void rundis(Path pp) {
byte bb[] = null;
try {
bb = Files.readAllBytes(pp); //Files.toByteArray(pathhold);
System.out.println("byte array made");
} catch (Exception e) {
e.printStackTrace();
}
if (bb != null && bb.length != 0) {
System.out.println("byte array filled");
//send to method to turn into hex
} else {
System.out.println("byte array NOT filled");
}
}
I know how the process should go, but I don't know how to code that properly.
The process if you are interested:
Input file using File
Read the file chunk by chunk into a byte array, e.g. each byte array record holds 600 bytes
Send that chunk to be turned into a hex value --> Integer.toHexString
Send that hex value chunk to be made into a binary value --> Integer.toBinaryString
Mess around with the Binary value
Save to custom file line by line
Problem: I don't know how to read a huge file into byte arrays chunk by chunk so they can be processed.
Any and all help will be appreciated, thank you for reading :)
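To chunk your input, use a FileInputStream:
import java.io.FileInputStream;
import java.nio.file.FileSystems;
import java.nio.file.Path;
// ...
Path pp = FileSystems.getDefault().getPath("logs", "access.log");
final int BUFFER_SIZE = 1024 * 1024; // 1 MiB; the size is in bytes
// try-with-resources closes the stream even if processing throws
try (FileInputStream fis = new FileInputStream(pp.toFile())) {
    byte[] buffer = new byte[BUFFER_SIZE];
    int read = 0;
    while ((read = fis.read(buffer)) > 0) {
        // call your other methods here, using only the first `read` bytes of buffer
    }
}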
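For the hex and binary step in the question, note that Integer.toHexString and Integer.toBinaryString take an int: a negative byte is sign-extended first (so (byte) 0xF0 becomes "fffffff0"), and leading zeros are dropped. A small sketch of per-byte helpers that mask with 0xFF and pad to a fixed width (the class and method names are hypothetical):

```java
final class ByteFormat {
    static String toHex(byte b) {
        // Mask to 0..255 so a negative byte is not sign-extended to 8 hex digits
        String s = Integer.toHexString(b & 0xFF);
        // toHexString drops leading zeros (0x0A gives "a"), so pad to two digits
        return s.length() == 1 ? "0" + s : s;
    }

    static String toBinary(byte b) {
        String s = Integer.toBinaryString(b & 0xFF);
        // Left-pad to 8 bits: 5 gives "101", padded to "00000101"
        return String.format("%8s", s).replace(' ', '0');
    }

    // Converts only the first `read` bytes, the part of the buffer the last read() filled
    static String chunkToBinary(byte[] buffer, int read) {
        StringBuilder sb = new StringBuilder(read * 8);
        for (int i = 0; i < read; i++) {
            sb.append(toBinary(buffer[i]));
        }
        return sb.toString();
    }
}
```

Inside the read loop, ByteFormat.chunkToBinary(buffer, read) would then give the binary text for one chunk.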
To stream a file, you need to step away from Files.readAllBytes(). It's a nice utility for small files, but as you noticed not so much for large files.
In pseudocode it would look something like this:
while there are more bytes available
read some bytes
process those bytes
(write the result back to a file, if needed)
In Java, you can use a FileInputStream to read a file byte by byte or chunk by chunk. Let's say we want to write back our processed bytes. First we open the files:
FileInputStream is = new FileInputStream(new File("input.txt"));
FileOutputStream os = new FileOutputStream(new File("output.txt"));
We need the FileOutputStream to write back our results - we don't want to just drop our precious processed data, right? Next we need a buffer which holds a chunk of bytes:
byte[] buf = new byte[4096];
The size is up to you; I kinda like chunks of 4096 bytes. Then we need to actually read some bytes:
int read = is.read(buf);
This will read up to buf.length bytes and store them in buf. It returns the number of bytes actually read, or -1 if the end of the file has been reached. Then we process the bytes:
//Assuming the processing function looks like this:
//byte[] process(byte[] data, int bytes);
byte[] ret = process(buf, read);
process() in above example is your processing method. It takes in a byte-array, the number of bytes it should process and returns the result as byte-array.
Last, we write the result back to a file:
os.write(ret);
We have to execute this in a loop until there are no bytes left in the file, so let's write a loop for it:
int read = 0;
while((read = is.read(buf)) > 0) {
byte[] ret = process(buf, read);
os.write(ret);
}
And finally, close the streams:
is.close();
os.close();
And that's it. We processed the file in 4096-byte chunks and wrote the result back to a file. It's up to you what to do with the result: you could also send it over TCP, or even drop it if it's not needed, or even read from TCP instead of a file; the basic logic is the same.
This still needs some proper error handling for missing files or wrong permissions, but that's up to you to implement.
An example implementation of the process method:
import java.nio.charset.StandardCharsets;
// returns the hex representation of the first `length` bytes, as ASCII bytes
public static byte[] process(byte[] bytes, int length) {
    final byte[] hexchars = "0123456789ABCDEF".getBytes(StandardCharsets.US_ASCII);
    byte[] ret = new byte[length * 2];
    for (int i = 0; i < length; ++i) {
        int b = bytes[i] & 0xFF;              // mask so negative bytes map to 0..255
        ret[i * 2] = hexchars[b >>> 4];       // high nibble
        ret[i * 2 + 1] = hexchars[b & 0x0F];  // low nibble
    }
    return ret;
}
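Assembled into one piece, a sketch of the whole walkthrough might look like this. It reuses the corrected process() method and uses try-with-resources for the closing step; the class name, the convert helper and the file paths are hypothetical, and error handling is still left to the caller:

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class HexFileConverter {
    // Same hex conversion as process() above: two ASCII hex digits per input byte
    static byte[] process(byte[] bytes, int length) {
        final byte[] hexchars = "0123456789ABCDEF".getBytes(StandardCharsets.US_ASCII);
        byte[] ret = new byte[length * 2];
        for (int i = 0; i < length; ++i) {
            int b = bytes[i] & 0xFF;
            ret[i * 2] = hexchars[b >>> 4];
            ret[i * 2 + 1] = hexchars[b & 0x0F];
        }
        return ret;
    }

    // Streams `in` to `out` in 4096-byte chunks, so memory use stays
    // constant regardless of the input size
    static void convert(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        int read;
        while ((read = in.read(buf)) != -1) {
            out.write(process(buf, read));
        }
    }

    static void convertFile(String inputPath, String outputPath) throws IOException {
        // try-with-resources closes both streams even if an exception is thrown
        try (InputStream is = new FileInputStream(inputPath);
             OutputStream os = new FileOutputStream(outputPath)) {
            convert(is, os);
        }
    }
}
```

Splitting convert() out from convertFile() means the same loop also works for the TCP case mentioned above: pass a socket's streams instead of file streams.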
I made an InputStream Object from a file and a InputStreamReader from that.
InputStream ips = new FileInputStream("c:\\data\\input.txt");
InputStreamReader isr = new InputStreamReader(ips);
I will basically read data as bytes into a buffer, but when it is time to read chars, I will 'switch mode' and read with the InputStreamReader:
byte[] bbuffer = new byte[20];
char[] cbuffer = new char[20];
while (ips.read(bbuffer, 0, 20) != -1) {
    doSomethingWithbBuffer(bbuffer);
    // check every 20th byte (the last element, index 19) and if it is 0 start reading as char
    if (bbuffer[19] == 0) {
        while (isr.read(cbuffer, 0, 20) != -1) {
            doSomethingWithcBuffer(cbuffer);
            // check every 20th char; if it is # return to reading as byte
            if (cbuffer[19] == '#') {
                break;
            }
        }
    }
}
is this a safe way to read files that have mixed char and byte data?
No, this is not safe. The InputStreamReader may read "too much" data from the underlying stream (it uses internal buffers) and corrupt your attempt to read from the underlying byte stream. You can use something like DataInputStream if you want to mix reading characters and bytes.
Alternately, just read the data as bytes and use the correct character encoding to convert those bytes to characters/Strings.
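A minimal sketch of that alternative for the question's format, assuming (as the question's code suggests) that a text segment ends with a single '#' byte and that the text is UTF-8 encoded. Because only the InputStream is ever read, nothing can be buffered away from the binary reads. The class and method names are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

final class MixedRecordReader {
    // Reads raw bytes up to (not including) the delimiter byte, then decodes
    // them with an explicit charset. In UTF-8, an ASCII delimiter such as '#'
    // never appears inside a multi-byte character, so the byte scan is safe.
    static String readTextUntil(InputStream in, int delimiter, Charset charset) throws IOException {
        ByteArrayOutputStream text = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != delimiter) {
            text.write(b);
        }
        return new String(text.toByteArray(), charset);
    }
}
```

Reading one byte at a time from a raw FileInputStream is slow; wrapping it once in a BufferedInputStream and using that single stream for both the binary and the text parts keeps it fast without reintroducing the problem above.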
I have the following statement:
DataInputStream is = new DataInputStream(process.getInputStream());
I would like to print the contents of this input stream, but I don't know the size of this stream. How should I read this stream and print it?
It is common to all streams that the length is not known in advance. Using a standard InputStream, the usual solution is to simply call read until -1 is returned.
But I assume that you have wrapped a standard InputStream with a DataInputStream for a good reason: to parse binary data. (Note: Scanner is for textual data only.)
The JavaDoc for DataInputStream shows that this class has two different ways to indicate EOF: each method either returns -1 or throws an EOFException. A rule of thumb is:
Every method which is inherited from InputStream uses the "return -1" convention,
Every method NOT inherited from InputStream throws the EOFException.
If you use readShort for example, read until an exception is thrown, if you use "read()", do so until -1 is returned.
Tip: Be very careful in the beginning and look up each method you use from DataInputStream; a rule of thumb can break.
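For example, a sketch that reads big-endian ints (the format DataOutputStream.writeInt produces) until the stream is exhausted. The assumption that the process writes ints is purely for illustration, and the class name is hypothetical:

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

final class IntStreamReader {
    // readInt() is not inherited from InputStream, so it signals the end of
    // the stream with EOFException rather than by returning -1.
    static List<Integer> readAllInts(InputStream source) throws IOException {
        // Not closed here: the caller owns `source`
        DataInputStream in = new DataInputStream(source);
        List<Integer> values = new ArrayList<>();
        while (true) {
            try {
                values.add(in.readInt());
            } catch (EOFException e) {
                // End of stream. readInt also throws this when fewer than 4
                // bytes remain, so a truncated final value is dropped.
                break;
            }
        }
        return values;
    }
}
```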
Call is.read(byte[]) repeatedly, passing a pre-allocated buffer (you can keep reusing the same buffer). The method returns the number of bytes actually read, or -1 at the end of the stream (in which case, stop):
byte[] buf = new byte[8192];
int nread;
while ((nread = is.read(buf)) >= 0) {
// process the first `nread` bytes of `buf`
}
byte[] buffer = new byte[100];
int numberRead = 0;
do {
    numberRead = is.read(buffer);
    if (numberRead != -1) {
        // do work here with the first numberRead bytes of buffer
    }
} while (numberRead != -1);
Keep reading into a fixed-size buffer in a loop. A return value smaller than the buffer size does not mean you have reached the end of the stream (a process pipe or a socket often delivers data in smaller pieces); only a return value of -1 does, and in that case there is no data in the buffer.
DataInputStream.read
DataInputStream is meant for binary data; if the process writes text, I recommend using Scanner instead:
Scanner sc = new Scanner(process.getInputStream());
while (sc.hasNextLine()) {   // or hasNextInt()/nextInt(), etc., depending on the data
    System.out.println(sc.nextLine());
}
The documentation says that one should not use the available() method to determine the size of an InputStream. How can I read the whole content of an InputStream into a byte array?
InputStream in; //assuming already present
byte[] data = new byte[in.available()];
in.read(data);//now data is filled with the whole content of the InputStream
I could read multiple times into a buffer of a fixed size, but then, I will have to combine the data I read into a single byte array, which is a problem for me.
The simplest approach IMO is to use Guava and its ByteStreams class:
byte[] bytes = ByteStreams.toByteArray(in);
Or for a file:
byte[] bytes = Files.toByteArray(file);
Alternatively (if you didn't want to use Guava), you could create a ByteArrayOutputStream, and repeatedly read into a byte array and write into the ByteArrayOutputStream (letting that handle resizing), then call ByteArrayOutputStream.toByteArray().
Note that this approach works whether you can tell the length of your input or not - assuming you have enough memory, of course.
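A sketch of that Guava-free variant (the class and method names are hypothetical). Only a fixed 8 KB buffer is reused; ByteArrayOutputStream grows its own internal array as data arrives:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class StreamBytes {
    // Reads until -1, letting ByteArrayOutputStream handle resizing, so the
    // total length never has to be known up front.
    static byte[] toByteArray(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // copy only the bytes this read() produced
        }
        return out.toByteArray();
    }
}
```

On Java 9 and later, in.readAllBytes() does the same job without a helper.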
Please keep in mind that the answers here assume that the length of the file is less than or equal to Integer.MAX_VALUE(2147483647).
If you are reading in from a file, you can do something like this:
File file = new File("myFile");
byte[] fileData = new byte[(int) file.length()];
DataInputStream dis = new DataInputStream(new FileInputStream(file));
dis.readFully(fileData);
dis.close();
UPDATE (May 31, 2014):
Java 7 adds some new features in the java.nio.file package that can be used to make this example a few lines shorter. See the readAllBytes() method in the java.nio.file.Files class. Here is a short example:
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
// ...
Path p = FileSystems.getDefault().getPath("", "myFile");
byte [] fileData = Files.readAllBytes(p);
Android supports this starting with API level 26 (8.0.0, Oreo).
You can use Apache Commons IO for this task.
Refer to this method in FileUtils:
public static byte[] readFileToByteArray(File file) throws IOException
Update:
Java 7 way:
byte[] bytes = Files.readAllBytes(Paths.get(filename));
and if it is a text file and you want to convert it to String (change encoding as needed):
StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes)).toString()
You can read it by chunks (byte buffer[] = new byte[2048]) and write the chunks to a ByteArrayOutputStream. From the ByteArrayOutputStream you can retrieve the contents as a byte[], without needing to determine its size beforehand.
I believe buffer length needs to be specified, as memory is finite and you may run out of it
Example:
File file = new File(strFileName);
long length = file.length();
if (length > Integer.MAX_VALUE) {
    throw new IOException("File is too large!");
}
byte[] bytes = new byte[(int) length];
try (InputStream in = new FileInputStream(file)) {
    int offset = 0;
    int numRead = 0;
    // read() may return fewer bytes than requested, so keep going until the array is full
    while (offset < bytes.length && (numRead = in.read(bytes, offset, bytes.length - offset)) >= 0) {
        offset += numRead;
    }
    if (offset < bytes.length) {
        throw new IOException("Could not completely read file " + file.getName());
    }
}
The maximum array length is Integer.MAX_VALUE, which is 2^31 - 1 = 2,147,483,647, i.e. about 2 GB of bytes.
Your input stream can be bigger than 2 GB, so you have to process the data in chunks, sorry.
InputStream is;
final byte[] buffer = new byte[512 * 1024 * 1024]; // 512 MB
while(true) {
final int read = is.read(buffer);
if ( read < 0 ) {
break;
}
// do processing
}
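A sketch of what "do processing" can look like for such a stream: here a CRC32 checksum (an arbitrary stand-in for your own processing) is updated chunk by chunk, and the total length is tracked in a long, so nothing depends on the 2 GB array limit. The 64 KB buffer size and the class name are assumptions; a buffer far smaller than 512 MB works for sequential reads and avoids needing 512 MB of free heap:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

final class ChunkedChecksum {
    static final class Result {
        final long crc;
        final long length;

        Result(long crc, long length) {
            this.crc = crc;
            this.length = length;
        }
    }

    // Only the buffer is held in memory, so the stream may be any size
    static Result process(InputStream is) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[64 * 1024];
        long total = 0; // long, because the total can exceed Integer.MAX_VALUE
        int read;
        while ((read = is.read(buffer)) != -1) {
            crc.update(buffer, 0, read); // process only the valid part of the buffer
            total += read;
        }
        return new Result(crc.getValue(), total);
    }
}
```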