Reading a binary file into ArrayList in Java


I want to read binary files in Java. I have 153 .bin files (1.bin, 2.bin ... 153.bin) and I must read all of them. I thought I had to use an ArrayList for buffering, but I could not get that to work. How can I do this?
After some research, I found an approach in the question "Reading a binary input stream into a single byte array in Java". It contains code like the following:
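import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

StringBuilder sb = new StringBuilder();
String fileName = "/path/10.bin";
byte[] buffer;
try {
    // Read the whole file into a newly allocated byte array
    buffer = Files.readAllBytes(Paths.get(fileName));
    for (byte b : buffer) {
        // Append each byte as two hex digits followed by a space
        sb.append(String.format("%02X ", b));
    }
    System.out.println(sb.toString());
} catch (IOException e) {
    e.printStackTrace();
}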
Is this the right approach for my problem, or must I use an ArrayList for buffering? If I use a single byte array as the buffer, do I have to clear it before reading the other binary files?
Edit: by "153 units" I mean 153 files (1.bin, 2.bin ... 153.bin).

Your question is unclear. For one, you don't say what those "units" are, how long they are, etc. Second, all your code does is dump the contents of the file in hexadecimal.
What I suggest you do here is map the file into memory, wrap it in a class of your own, and make that class implement Closeable.
See FileChannel.open() and FileChannel.map(). Note, however, that a single map() call can cover at most Integer.MAX_VALUE bytes (just under 2 GiB), and that the mapping is only released when the MappedByteBuffer is garbage-collected, so be careful when mapping very large files.
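The answer only names the APIs, so here is a minimal sketch of such a wrapper, assuming each file is smaller than Integer.MAX_VALUE bytes and only needs to be read. The class name MappedBinFile and its size()/get() methods are hypothetical, made up for illustration.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical read-only wrapper around a memory-mapped file. */
final class MappedBinFile implements Closeable {
    private final FileChannel channel;
    private final MappedByteBuffer buffer;

    MappedBinFile(Path path) throws IOException {
        channel = FileChannel.open(path, StandardOpenOption.READ);
        // Map the whole file read-only; map() rejects sizes above Integer.MAX_VALUE
        buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
    }

    /** Number of bytes in the mapped file. */
    int size() {
        return buffer.capacity();
    }

    /** Absolute read of the byte at the given offset; does not move the buffer's position. */
    byte get(int offset) {
        return buffer.get(offset);
    }

    @Override
    public void close() throws IOException {
        // Closing the channel does not unmap the buffer: the mapping stays valid
        // until the MappedByteBuffer itself is garbage-collected.
        channel.close();
    }
}
```

Because the class implements Closeable, each of the 153 files can be opened in a try-with-resources statement and released when the block ends.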

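I'm not sure what you mean by "units". Byte data can be read something like this:
import java.io.*;
File f = new File("File.txt");
byte[] bytes = new byte[(int) f.length()];
// read() may return fewer bytes than requested; readFully() keeps reading until the array is full
try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
    in.readFully(bytes);
}
Make sure your file is not too big: the length is cast to int, so this only works for files smaller than 2 GiB.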

I'm assuming a unit is one byte. Is this correct?
An ArrayList is not appropriate for byte buffering. It is a wrapper around an array that implements the List interface (which defines most of its power), and an ArrayList<Byte> would store every single byte as a Byte object reference.
http://docs.oracle.com/javase/7/docs/api/java/util/ArrayList.html
If you simply want to read bytes in from a file, you could use FileInputStream.
http://docs.oracle.com/javase/tutorial/essential/io/bytestreams.html
Here's a simple example that reads input from a file containing "hello world!":
import java.io.FileInputStream;
import java.io.IOException;
public class Input {
    public static void main(String[] args) throws IOException {
        // try-with-resources closes the stream automatically
        try (FileInputStream in = new FileInputStream("test.txt")) {
            int c;
            // read() returns the next byte (0-255), or -1 at end of file
            while ((c = in.read()) != -1) {
                System.out.print((byte) c + " ");
            }
        }
    }
}
Output in bytes (the trailing 13 10 is the file's CR LF line ending): 104 101 108 108 111 32 119 111 114 108 100 33 13 10
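Coming back to the original question: if the goal is to keep all 153 files in memory, one option is a List<byte[]> holding one array per file. This is a sketch that assumes all files sit in one directory and are named 1.bin ... N.bin as in the question; the class BinFiles and the method readNumbered are hypothetical names. Files.readAllBytes() allocates a new array on every call, so there is no shared buffer that needs clearing between files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class BinFiles {
    /** Reads dir/1.bin .. dir/count.bin and returns one byte[] per file, in order. */
    static List<byte[]> readNumbered(Path dir, int count) throws IOException {
        List<byte[]> contents = new ArrayList<>(count);
        for (int i = 1; i <= count; i++) {
            Path file = dir.resolve(i + ".bin");
            // Each call returns a fresh array, so earlier entries are never overwritten
            contents.add(Files.readAllBytes(file));
        }
        return contents;
    }
}
```

With the directory from the question this would be called as BinFiles.readNumbered(Paths.get("/path"), 153), provided the combined size of the files fits comfortably in the heap.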

Related

Why is my DataInputStream only reading 114 bytes?

I'm trying to extract a file from my jar and copy it into the temp directory.
To read the file within the jar I am using a DataInputStream, and to write the file into the temp directory I am using a DataOutputStream.
The file I am trying to extract is 310 kilobytes, but my copied file only contains 114 bytes after I've called my method (this is also the number of bytes my method prints to the console).
Here is my method:
private static void extractFile(String pathInJar, String fileToCopy) {
    File outputFile = new File(System.getProperty("java.io.tmpdir") + "/LDEngine/" + fileToCopy);
    boolean couldDirsBeCreated = outputFile.getParentFile().mkdirs();
    if (couldDirsBeCreated && !outputFile.exists()) {
        int x;
        int actualBytesRead = 0;
        byte[] tmpByteArray = new byte[4096];
        try (
            DataOutputStream output = new DataOutputStream(new FileOutputStream(outputFile));
            DataInputStream in = new DataInputStream(LibLoader.class.getResourceAsStream("/libs/natives/" + pathInJar))
        ) {
            while ((x = in.read(tmpByteArray)) != -1) {
                output.write(tmpByteArray);
                actualBytesRead += x;
            }
        } catch (Exception e) {
            System.err.println("Fatal error: Could not write file!");
            System.exit(1);
        }
        System.out.println(actualBytesRead);
    }
}
The file I am trying to copy is a .dll, so it's binary data I'm dealing with.
The question is why is this happening and what am I doing wrong?

This does not explain why your method stops so soon, but you need to fix the following as well, or you will have an even stranger problem: the resulting data will be garbled.
From the API doc of DataInputStream.read():
Reads some number of bytes from the contained input stream and stores them into the buffer array b. The number of bytes actually read is returned as an integer.
You need to use that return value and call the write() method that takes an offset and a length.
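A minimal sketch of a copy loop that uses the return value as described; the class StreamCopy and its copy() method are illustrative names, not part of the original code. Since Java 7, Files.copy(InputStream, Path, CopyOption...) does the same job for the jar-to-temp-file case.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamCopy {
    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            // Write only the n bytes actually read, not the whole 4096-byte buffer
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }
}
```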

incorrect data input in file from an OutputStream( Java )

I wrote data to a text file, but the data in the file is incorrect. I think the problem is with the OutputStream, because I display the data in the previous steps and it is correct there.
private void Output(File file2) {
    OutputStream os;
    try {
        os = new FileOutputStream(file2); // file2 is my output file; nothing is wrong with it
        Iterator<Integer> e = mass.iterator();
        int r = 0;
        while (e.hasNext()) {
            r = e.next();
            System.out.println(r); // display data: all correct
            os.write(r); // I think the problem is created in this step
        }
        os.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Input data file1.txt
10
56
2
33
45
21
15
68
54
85
Output data file2.txt
3 strange tokens plus !-68DU
Thanks for any answers, and excuse my English.

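The line
os.write(r);
writes only the low-order byte of the integer r (its binary value), not its decimal text.
OutputStream has no write(String) method, so convert the number to text and write the bytes of that text, for example:
os.write(String.valueOf(r).getBytes());
and you probably want new lines:
os.write((String.valueOf(r) + "\n").getBytes());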

FileOutputStream is used to write raw binary data. As the documentation says:
FileOutputStream is meant for writing streams of raw bytes such as
image data. For writing streams of characters, consider using
FileWriter.
Since you want to write the integers as text, what you need is a text output stream such as PrintWriter. It can be used in your code as follows:
PrintWriter pw = new PrintWriter(file2); // file2 is the output file
Iterator<Integer> e = mass.iterator();
int r = 0;
while (e.hasNext()) {
    r = e.next();
    pw.print(r);  // writes the decimal text of r
    pw.println(); // for new line
}
pw.close();

Use FileWriter instead of FileOutputStream: your data is text, so you probably want a character stream.
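A character-stream version in this spirit, sketched with Files.newBufferedWriter (Java 7+) instead of FileWriter so that the charset is explicit rather than the platform default. It assumes mass is a List<Integer>; the class IntTextWriter and the method writeLines are hypothetical names.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class IntTextWriter {
    /** Writes each integer as decimal text on its own line. */
    static void writeLines(Path file, List<Integer> numbers) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (int n : numbers) {
                w.write(Integer.toString(n)); // the text "56", not the single byte 56
                w.newLine();                  // platform line separator
            }
        }
    }
}
```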

You could also convert each number into a string and write that string's bytes:
System.out.println(r); // display data: all correct
String line = String.valueOf(r) + "\n";
os.write(line.getBytes());

File -> byte[] -> String -> byte[] -> File Conversion

This is basically what I am trying to do:
Take a file
Turn it into a byte array
Turn it into a string
Store it in a MySQL table
Retrieve the string
Turn it back into a byte array
Turn it back into a file
Now, here is my code, which I tried to comment as well as I could. My problem is that the file I get at the end of this code doesn't come out right: it's missing information. It's a text file, so I should be able to tell whether the file is complete or not.
As far as I can see, I only get the last part of the file, not the entire file. I am pretty sure I am messing something up badly somewhere in this conversion. If you have suggestions on how to do this conversion and retrieval more efficiently (still keeping the database and all that in mind), please let me know as well!
The code is listed below:
import java.io.*;
import java.util.StringTokenizer;
import java.util.logging.Level;
import java.util.logging.Logger;

public class main {
    public static void main(String[] args) {
        // The file we want to save.
        File f = new File("build.xml");
        try {
            // Make it into a byte array first
            FileInputStream fis = new FileInputStream(f);
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            try {
                for (int readNum; (readNum = fis.read(buf)) != -1;) {
                    bos.write(buf, 0, readNum);
                    System.out.println("read " + readNum + " bytes,");
                }
                StringBuilder s = new StringBuilder();
                // Now we simulate making it into a String, for easier storage
                // in a database.
                for (byte b : buf) {
                    // for debugging
                    s.append(b).append(",");
                    System.out.print(b + ",");
                }
                // Now we want to retrieve the file from the database as a string
                File someFile = new File("build2.xml");
                FileOutputStream fos = new FileOutputStream(someFile);
                // We count how many bytes there are in this string.
                // One byte per Token.
                StringTokenizer st = new StringTokenizer(s.toString(), ",");
                buf = new byte[st.countTokens()];
                int i = 0;
                StringBuilder t = new StringBuilder();
                // Now we parse out all Bytes from the string, and put them into
                // the prepared byte array.
                while (st.hasMoreTokens()) {
                    byte b = Byte.parseByte(st.nextToken());
                    System.out.print(b + ",");
                    buf[i] = b;
                    i++;
                    // for debugging
                    t.append(b).append(",");
                }
                // Here I print true if both strings are exactly the same
                // which they should be, which means that the bytes are intact
                // before and after conversion.
                System.out.println("\n" + (t.toString().equals(s.toString()) ? true : false));
                // Here we would make the physical file on the machine.
                fos.write(buf);
                fos.flush();
                fos.close();
            } catch (IOException ex) {
                Logger.getLogger(main.class.getName()).log(Level.SEVERE, null, ex);
            }
        } catch (FileNotFoundException ex) {
            Logger.getLogger(main.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}
http://pastebin.com/699yuE8f

Your approach is totally ignoring encodings, which is not a good thing. Characters are not equal to or equivalent to bytes.
If you have to do it in the sequence you describe, then create the string by something like this:
String intermediateString = new String(theByteArray,
theSameEncodingTheFileWasCreatedWith);
Likewise, when you convert the string back into bytes, get the bytes like this:
byte[] bytesToSave = intermediateString.getBytes(theSameEncodingTheFileWasCreatedWith);
But besides any of that, what's the point of using the string at all? Why not just store the bytes right into the database?
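If a string really is required (for example a text column), an alternative not mentioned in the answer is Base64, which round-trips arbitrary bytes losslessly without depending on the file's encoding. This is a sketch using java.util.Base64 (Java 8+); the class Base64RoundTrip and its method names are illustrative.

```java
import java.util.Base64;

final class Base64RoundTrip {
    /** Encodes arbitrary bytes as an ASCII-only string that is safe to store as text. */
    static String toText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    /** Recovers the exact original bytes from a string produced by toText(). */
    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }
}
```

Storing the bytes directly in a BLOB column, as the answer suggests, avoids the roughly one-third size overhead that Base64 adds.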

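You simply messed up the string creation: you iterate over buf instead of over the contents of bos. buf only holds the last chunk read (at most 1024 bytes, possibly followed by stale bytes from the previous chunk), which is why you only get the last part of the file.
for (byte b : buf) { // <-- should be bos.toByteArray(), not buf
    // for debugging
    s.append(b).append(",");
    System.out.print(b + ",");
}
Beyond that, I am not convinced it will work or that it is a good solution. Why can't you simply store it in the database?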
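A minimal sketch of the asker's comma-separated round trip with that fix applied, reading from bos.toByteArray() instead of the read buffer. The class CsvBytes and its encode()/decode() methods are hypothetical names that mirror the loops in the question.

```java
import java.io.ByteArrayOutputStream;
import java.util.StringTokenizer;

final class CsvBytes {
    /** Turns the bytes collected in bos into "b0,b1,...," (decimal values). */
    static String encode(ByteArrayOutputStream bos) {
        StringBuilder s = new StringBuilder();
        for (byte b : bos.toByteArray()) { // every byte read, not just the last 1024-byte chunk
            s.append(b).append(',');
        }
        return s.toString();
    }

    /** Parses the comma-separated decimal values back into a byte array. */
    static byte[] decode(String s) {
        StringTokenizer st = new StringTokenizer(s, ",");
        byte[] out = new byte[st.countTokens()];
        for (int i = 0; st.hasMoreTokens(); i++) {
            out[i] = Byte.parseByte(st.nextToken()); // accepts -128..127
        }
        return out;
    }
}
```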

The code you shared is, IMHO, more complicated than it needs to be.
Why do you read your text at the byte level if you are only interested in its String representation?
I would prefer to read the file using an InputStreamReader. That allows you to operate directly on characters.
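A sketch of reading a whole text file through an InputStreamReader, assuming the caller knows the charset the file was written with; the class TextFiles and the method readText are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

final class TextFiles {
    /** Reads a whole text file as characters, decoding with the given charset. */
    static String readText(Path file, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(Files.newInputStream(file), charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n); // append only the chars actually read
            }
        }
        return sb.toString();
    }
}
```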

Java: reading strings from a random access file with buffered input

I've never worked closely with the Java IO API before, and I'm really frustrated now. I find it hard to believe how strange and complex it is, and how hard it is to do a simple task.
My task: I have 2 positions (starting byte, ending byte), pos1 and pos2. I need to read the lines between these two bytes (including the starting one, not including the ending one) and use them as UTF-8 String objects.
For example, in most scripting languages it would be a very simple one-to-three-liner like this (in Ruby, but it would be essentially the same in Python, Perl, etc.):
f = File.open("file.txt")
f.seek(pos1)
while f.pos < pos2
  s = f.readline
  # do something with "s" here
end
It quickly becomes hell with the Java IO APIs ;) In fact, I see two ways to read lines (ending with \n) from regular local files:
RandomAccessFile has getFilePointer() and seek(long pos), but its readLine() does not decode UTF-8 (and does not return byte arrays either): it turns every byte into one char, which produces very strange strings with broken encoding. It also has no buffering, which probably means that every read*() call is translated into a single underlying OS read(), so it is fairly slow.
BufferedReader has a great readLine() method, and it can even do some seeking with skip(long n), but it has no way to tell how many bytes have already been read, let alone the current position in the file.
I've tried to use something like:
FileInputStream fis = new FileInputStream(fileName);
FileChannel fc = fis.getChannel();
BufferedReader br = new BufferedReader(
    new InputStreamReader(
        fis,
        CHARSET_UTF8
    )
);
... and then use fc.position() to get the current file reading position and fc.position(newPosition) to set it, but that doesn't seem to work in my case. It looks like it returns the position reached by the read-ahead BufferedReader does to fill its buffer, or something like that; these counters seem to be rounded up in 16K increments.
Do I really have to implement it all myself, i.e. a file-reading interface that would:
allow me to get/set position in a file
buffer file reading operations
allow reading UTF8 strings (or at least allow operations like "read everything till the next \n")
Is there a quicker way than implementing it all myself? Am I overlooking something?

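import java.io.*;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.input.BoundedInputStream;

FileInputStream file = new FileInputStream(filename);
file.skip(pos1); // skip() may skip fewer bytes than requested; check its return value
BufferedReader br = new BufferedReader(
    new InputStreamReader(new BoundedInputStream(file, pos2 - pos1), StandardCharsets.UTF_8)
);
If you didn't care about pos2, you wouldn't need Apache Commons IO.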
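Without Apache Commons IO, a small FilterInputStream subclass can play the role of BoundedInputStream. This is a JDK-only sketch: LimitedInputStream and RangeLines are hypothetical names, and the reader assumes the file is UTF-8. It positions the stream through getChannel().position(pos1), which also moves the FileInputStream's own position and avoids the short-skip problem of skip().

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for BoundedInputStream: reports end of stream after `remaining` bytes. */
final class LimitedInputStream extends FilterInputStream {
    private long remaining;

    LimitedInputStream(InputStream in, long limit) {
        super(in);
        this.remaining = limit;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) return -1;
        int b = super.read();
        if (b != -1) remaining--;
        return b;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (remaining <= 0) return -1;
        int n = super.read(b, off, (int) Math.min(len, remaining));
        if (n > 0) remaining -= n;
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(Math.min(n, remaining));
        remaining -= skipped;
        return skipped;
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(super.available(), remaining);
    }

    @Override
    public boolean markSupported() {
        return false; // mark/reset would desynchronize the byte counter
    }
}

final class RangeLines {
    /** Reads the UTF-8 lines stored between byte offsets pos1 (inclusive) and pos2 (exclusive). */
    static List<String> read(Path file, long pos1, long pos2) throws IOException {
        List<String> lines = new ArrayList<>();
        try (FileInputStream in = new FileInputStream(file.toFile())) {
            in.getChannel().position(pos1); // also moves the stream's file position
            BufferedReader br = new BufferedReader(new InputStreamReader(
                    new LimitedInputStream(in, pos2 - pos1), StandardCharsets.UTF_8));
            for (String line; (line = br.readLine()) != null; ) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```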

I wrote this code to read UTF-8 lines using a RandomAccessFile:
// File: CyclicBuffer.java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class CyclicBuffer {
    private static final int size = 3;
    private FileChannel channel;
    private ByteBuffer buffer = ByteBuffer.allocate(size);

    public CyclicBuffer(FileChannel channel) {
        this.channel = channel;
        buffer.limit(0); // start empty so the first get() reads from the channel
    }

    private int read() throws IOException {
        return channel.read(buffer);
    }

    /**
     * Returns the next byte.
     *
     * @return the byte read, or (byte) -1 when the end of the file is reached
     * @throws IOException if reading from the channel fails
     */
    public byte get() throws IOException {
        if (buffer.hasRemaining()) {
            return buffer.get();
        } else {
            buffer.clear();
            int eof = read();
            if (eof == -1) {
                return (byte) eof;
            }
            buffer.flip();
            return buffer.get();
        }
    }
}

// File: UTFRandomFileLineReader.java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;

public class UTFRandomFileLineReader {
    private final Charset charset = Charset.forName("utf-8");
    private CyclicBuffer buffer;
    private ByteBuffer temp = ByteBuffer.allocate(4096);
    private boolean eof = false;

    public UTFRandomFileLineReader(FileChannel channel) {
        this.buffer = new CyclicBuffer(channel);
    }

    public String readLine() throws IOException {
        if (eof) {
            return null;
        }
        byte x = 0;
        temp.clear();
        // 0xFF ((byte) -1) never occurs in valid UTF-8, so it can serve as the EOF marker
        while ((byte) -1 != (x = (buffer.get())) && x != '\n') {
            if (temp.position() == temp.capacity()) {
                temp = addCapacity(temp);
            }
            temp.put(x);
        }
        if (x == -1) {
            eof = true;
        }
        temp.flip();
        if (eof && !temp.hasRemaining()) {
            return null; // nothing left after the last line
        }
        return charset.decode(temp).toString(); // an empty line yields "" rather than null
    }

    private ByteBuffer addCapacity(ByteBuffer temp) {
        ByteBuffer t = ByteBuffer.allocate(temp.capacity() + 1024);
        temp.flip();
        t.put(temp);
        return t;
    }

    public static void main(String[] args) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile("/Users/sachins/utf8.txt", "r")) {
            UTFRandomFileLineReader reader = new UTFRandomFileLineReader(file.getChannel());
            int i = 1;
            while (true) {
                String s = reader.readLine();
                if (s == null)
                    break;
                System.out.println("\n line " + i++);
                s = s + "\n";
                for (byte b : s.getBytes(Charset.forName("utf-8"))) {
                    System.out.printf("%x", b);
                }
                System.out.printf("\n");
            }
        }
    }
}

For @Ken Bloom: a very quick go at a Java 7 version. Note: I don't think this is the most efficient way, as I'm still getting my head around NIO.2; Oracle has started their tutorial here.
Also note that this isn't using Java 7's new ARM (try-with-resources) syntax, which takes care of closing file-based resources, because it wasn't working in the latest OpenJDK build that I have. But if people want to see the syntax, let me know.
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/*
 * Paths uses the default file system; note that no exception is thrown at this
 * stage if the file is missing.
 */
Path file = Paths.get("C:/Projects/timesheet.txt");
// readBufferSize, startPosition and encoding are assumed to be defined elsewhere
ByteBuffer readBuffer = ByteBuffer.allocate(readBufferSize);
FileChannel fc = null;
try {
    /*
     * FileChannel implements SeekableByteChannel (new in Java 7), so it can be
     * positioned anywhere in the file before reading. For asynchronous I/O,
     * Java 7 also adds AsynchronousFileChannel, which is safe for concurrent use
     * by multiple threads.
     */
    fc = FileChannel.open(file, StandardOpenOption.READ);
    fc.position(startPosition);
    // Note: this reads to the end of the file, not to pos2, and a multi-byte
    // character split across two reads will not be decoded correctly.
    while (fc.read(readBuffer) != -1) {
        readBuffer.flip();  // prepare the bytes just read for decoding
        System.out.println(Charset.forName(encoding).decode(readBuffer));
        readBuffer.clear(); // make room for the next read
    }
} finally {
    if (fc != null) {
        fc.close();
    }
}

Start with a RandomAccessFile and use read or readFully to get a byte array between pos1 and pos2. Let's say that we've stored the data read in a variable named rawBytes.
Then create your BufferedReader using
new BufferedReader(new InputStreamReader(new ByteArrayInputStream(rawBytes), StandardCharsets.UTF_8))
Then you can call readLine on the BufferedReader.
Caveat: this probably uses more memory than if you could make the BufferedReader seek to the right location itself, because it preloads everything into memory.
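The approach above, put together as a sketch; the class PreloadedRange and the method readLines are hypothetical names, and the byte range is assumed to fit in an int-sized array.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class PreloadedRange {
    /** Loads bytes [pos1, pos2) into memory, then splits them into UTF-8 lines. */
    static List<String> readLines(String fileName, long pos1, long pos2) throws IOException {
        byte[] rawBytes = new byte[(int) (pos2 - pos1)]; // assumes the range fits in an int
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "r")) {
            raf.seek(pos1);
            raf.readFully(rawBytes); // throws EOFException if the file ends before pos2
        }
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(rawBytes), StandardCharsets.UTF_8))) {
            for (String line; (line = br.readLine()) != null; ) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

Because decoding happens only after the whole range is in memory, a multi-byte UTF-8 character can never be split between two reads, as long as pos1 and pos2 fall on character boundaries.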

I think the confusion is caused by the UTF-8 encoding and the possibility of double byte characters.
UTF-8 doesn't use a fixed number of bytes per character: a character takes one to four bytes. I'm assuming from your post that you are using single-byte characters. For example, 412 bytes would then mean 412 characters. But if the string used only double-byte characters, you would get 206 characters.
The original java.io package didn't deal well with this multi-byte confusion, so Reader and Writer classes were added to deal specifically with characters. The package mixes two different types of file handlers (which can be confusing until the nomenclature is sorted out). The stream classes provide direct data I/O without any conversion. The reader classes convert bytes to characters with full support for multi-byte characters. That might help clarify part of the problem.
Since you state you are using UTF-8 characters, you want the reader classes. In this case, I suggest FileReader (note that before Java 11, FileReader always uses the platform's default charset). Its skip() method allows you to skip X characters (not bytes) and then start reading text. Alternatively, I prefer the overloaded read() method, since it allows you to grab all the text at once.
If you assume your "bytes" are individual characters, try something like this:
FileReader fr = new FileReader(new File("x.txt"));
fr.skip(pos1);                     // skip() counts characters, not bytes
char[] buffer = new char[pos2 - pos1];
fr.read(buffer, 0, buffer.length); // the offset is into buffer, not into the file; may read fewer chars
...

I'm late to the party here, but I ran across this problem in my own project.
After much traversal of Javadocs and Stack Overflow, I think I found a simple solution.
After seeking to the appropriate place in your RandomAccessFile, which I am here calling raFile, do the following:
FileDescriptor fd = raFile.getFD();
FileReader fr = new FileReader(fd);
BufferedReader br = new BufferedReader(fr);
Then you should be able to call br.readLine() to your heart's content, which will be much faster than calling raFile.readLine().
The one thing I'm not sure about is whether UTF8 strings are handled correctly.
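On the UTF-8 doubt: FileReader decodes with the platform default charset (before Java 11 it offers no charset parameter), so UTF-8 is only guaranteed if that is the default. A sketch of the same FileDescriptor trick with the charset pinned through InputStreamReader; the class FdLineReader and the method lineAt are hypothetical names. Note that BufferedReader reads ahead, so afterwards raFile.getFilePointer() will usually be past the end of the returned line.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

final class FdLineReader {
    /** Reads the line that starts at byte offset pos, decoding it explicitly as UTF-8. */
    static String lineAt(String fileName, long pos) throws IOException {
        try (RandomAccessFile raFile = new RandomAccessFile(fileName, "r")) {
            raFile.seek(pos);
            // FileInputStream(FileDescriptor) shares raFile's file position;
            // InputStreamReader fixes the charset instead of using the platform default.
            BufferedReader br = new BufferedReader(new InputStreamReader(
                    new FileInputStream(raFile.getFD()), StandardCharsets.UTF_8));
            return br.readLine();
        }
    }
}
```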

The Java IO API is very flexible. Unfortunately, the flexibility sometimes makes it verbose. The main idea is that there are many streams, readers and writers that implement the wrapper (decorator) pattern. For example, BufferedInputStream wraps any other InputStream. The same applies to output streams.
The difference between streams and readers/writers is that streams work with bytes while readers/writers work with characters.
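A small demonstration of that difference, sketched with in-memory data: the same UTF-8 bytes for "café" yield five units through an InputStream and four through an InputStreamReader wrapped around it. The class BytesVsChars and its methods are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class BytesVsChars {
    /** Counts how many bytes a stream yields before end of stream. */
    static int countBytes(InputStream in) throws IOException {
        int count = 0;
        while (in.read() != -1) count++;
        return count;
    }

    /** Counts how many chars a reader yields before end of stream. */
    static int countChars(Reader r) throws IOException {
        int count = 0;
        while (r.read() != -1) count++;
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8); // 'é' takes two bytes in UTF-8
        System.out.println(countBytes(new ByteArrayInputStream(utf8))); // 5
        // The reader wraps (decorates) a byte stream and decodes it into chars
        System.out.println(countChars(new InputStreamReader(
                new ByteArrayInputStream(utf8), StandardCharsets.UTF_8))); // 4
    }
}
```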
Fortunately, some streams, readers and writers have convenient constructors that simplify coding. If you want to read a file, you just have to write:
import java.io.FileInputStream;
import java.io.InputStream;
InputStream in = new FileInputStream("/usr/home/me/myfile.txt");
in.skip(1024);     // skip the first 1024 bytes (may skip fewer; check the return value)
int b = in.read(); // next byte (0-255), or -1 at end of file
It is not as complicated as you fear.
Channels are something different. They are part of the so-called "new I/O", or NIO. NIO's main advantage is non-blocking I/O, although that applies to selectable channels such as sockets; FileChannel is always blocking. You can search the internet for any "java nio tutorial" and read about it. But NIO is more complicated than regular IO and is not needed for most applications.

println(char), characters turn into Chinese?

Please help me to troubleshoot this problem.
I have an input file 'Trial.txt' with the content "Thanh Le".
Here is the function I used in an attempt to read from the file:
public char[] importSeq() {
    File file = new File("G:\\trial.txt");
    char temp_seq[] = new char[100];
    try {
        FileInputStream fis = new FileInputStream(file);
        BufferedInputStream bis = new BufferedInputStream(fis);
        DataInputStream dis = new DataInputStream(bis);
        int i = 0;
        // Try to read all characters till the end of the file
        while (dis.available() != 0) {
            temp_seq[i] = dis.readChar();
            i++;
        }
        System.out.println(" imported");
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return temp_seq;
}
And the main function:
public static void main(String[] args) {
    Sequence s1 = new Sequence();
    char result[];
    result = s1.importSeq();
    int i = 0;
    while (result[i] != 0) {
        System.out.println(result[i]);
        i++;
    }
}
And this is the output.
run:
imported
瑨
慮
栠
汥
BUILD SUCCESSFUL (total time: 0 seconds)

Honestly, that's a pretty clumsy way to read a text file into a char[].
Here's a better example, assuming that the text file contains only ASCII characters.
File file = new File("G:/trial.txt");
char[] content = new char[(int) file.length()];
Reader reader = null;
try {
    reader = new FileReader(file);
    reader.read(content);
} finally {
    if (reader != null) try { reader.close(); } catch (IOException ignore) {}
}
return content;
And then to print the char[], just do:
System.out.println(content);
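Note that InputStream#available() doesn't necessarily do what you're expecting: it returns an estimate of how many bytes can be read without blocking, not how many bytes remain in the file.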
See also:
Java IO tutorial

Because a char in Java is two bytes, readChar() reads two bytes (here, a pair of letters) at a time and combines them into a single Unicode character.
You can avoid this by using readByte() instead.
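A sketch contrasting the two calls on in-memory data. It assumes the lowercase text "thanh le", which is what the posted output corresponds to (瑨 is 't' followed by 'h'); the class ReadCharDemo and its methods are illustrative names. Casting each byte to char is only correct for ASCII; other encodings need a Reader with the right charset.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

final class ReadCharDemo {
    /** What DataInputStream.readChar() makes of the first two bytes (big-endian). */
    static char firstReadChar(byte[] data) throws IOException {
        return new DataInputStream(new ByteArrayInputStream(data)).readChar();
    }

    /** Reads one byte per character, which matches a one-byte-per-letter ASCII file. */
    static String readAsciiBytes(byte[] data) throws IOException {
        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data));
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            sb.append((char) dis.readByte()); // only correct for bytes 0..127 (ASCII)
        }
        return sb.toString();
    }
}
```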

Here is some code to demonstrate what exactly is happening. A char in Java consists of two bytes and holds one UTF-16 code unit, which for most text is one character (the glyph you see on the screen). Java uses UTF-16, one particular way of using two bytes per character, for its chars and Strings. Your file uses one byte per character, probably ASCII. When you read one UTF-16 char, you read two bytes, and thus two ASCII characters, from your file.
The following code tries to explain how the single ASCII bytes 't' and 'h' become one Chinese UTF-16 character.
public class Main {
    public static void main(String[] args) {
        System.out.println((int) 't'); // 116 == x74 (116 is 74 in Hex)
        System.out.println((int) 'h'); // 104 == x68
        System.out.println((int) '瑨'); // 29800 == x7468
        // System.out.println('\u0074'); // t
        // System.out.println('\u0068'); // h
        // System.out.println('\u7468'); // 瑨
        char th = (('t' << 8) + 'h'); // x74 x68
        System.out.println(th); // 瑨 == 29800 == '\u7468'
    }
}
