Reading binary file byte by byte


I've been researching a Java problem without success. I've read many similar questions here on Stack Overflow, but the solutions don't seem to work as expected.
I'm trying to read a binary file byte by byte.
I've used both
while ((data = inputStream.read()) != -1) { ... }
loops and
for (int i = 0; i < bFile.length; i++) { ... }
loops, but I only get empty or blank output. The actual content of the file I'm trying to read is as follows:
¬í sr assignment6.PetI¿Z8kyQŸ I ageD weightL namet Ljava/lang/String;xp > #4 t andysq ~ #bÀ t simbasq ~ #I t wolletjiesq ~
#$ t rakker
I'm merely trying to read it byte by byte and feed it into a character array with the following line:
char[] charArray = Character.toChars(byteValue);
Here byteValue is an int holding the byte that was just read.
What is going wrong?

Since Java 7 there is no need to read byte by byte; the Files class has two utility methods:
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
Path path = Paths.get("C:/temp/test.txt");
// Load as binary:
byte[] bytes = Files.readAllBytes(path);
String asText = new String(bytes, StandardCharsets.ISO_8859_1);
// Load as text, with some Charset:
List<String> lines = Files.readAllLines(path, StandardCharsets.ISO_8859_1);
As you want to read binary data, you would use readAllBytes.
String and char are for text. Unlike in many other programming languages, Java text means Unicode, so all the scripts of the world can be combined. A char is 16 bits, as opposed to the 8-bit byte.
For pure ASCII, the 7-bit subset of Unicode / UTF-8, byte and char values are identical.
Then you might have done the following (low-quality code):
int fileLength = (int) Files.size(path);
char[] chars = new char[fileLength];
int i = 0;
int data;
while ((data = inputStream.read()) != -1) {
chars[i] = (char) data; // data actually being a byte
++i;
}
inputStream.close();
String text = new String(chars);
System.out.println(Arrays.toString(chars));
The problem you had probably concerned Java's unwieldy fixed-size arrays, and the fact that a char[] is still not a String.
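To make the byte/char relationship above concrete, here is a minimal sketch (the class name BytesToChars and the sample bytes are illustrative, not from the question). It assumes the goal is exactly one char per byte, which decoding with ISO-8859-1 gives, and it shows why casting a signed byte straight to char goes wrong for values of 0x80 and above. The sample bytes 0xAC 0xED are the magic number that starts every Java serialization stream, which is what appears as the first two characters of the question's file.

```java
import java.nio.charset.StandardCharsets;

public class BytesToChars {
    // ISO-8859-1 maps each byte value 0..255 to the char with the same
    // numeric value, so this yields exactly one char per byte.
    static char[] toChars(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1).toCharArray();
    }

    public static void main(String[] args) {
        // 0xAC 0xED: the magic number at the start of a Java serialization stream
        byte[] bytes = {(byte) 0xAC, (byte) 0xED, 0x41};
        char[] chars = toChars(bytes);
        char wrong = (char) bytes[0];          // sign-extended to U+FFAC
        char right = (char) (bytes[0] & 0xFF); // U+00AC, same as chars[0]
        System.out.println((int) chars[0] + " " + (int) wrong + " " + (int) right); // 172 65452 172
    }
}
```

The & 0xFF mask is what turns Java's signed byte (-128..127) back into the unsigned value 0..255 before it becomes a char.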
For binary usage, since you seem to be reading serialized data, you might like to dump the file:
int i = 0;
int data;
while ((data = inputStream.read()) != -1) {
char ch = 32 <= data && data < 127 ? (char) data : ' ';
System.out.printf("[%06d] %02x %c%n", i, data, ch);
++i;
}
This dumps the file position, the hex value and the character value of each byte.
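A self-contained version of that dump loop might look like the sketch below. It reads from any InputStream, formats each line with String.format, and prints '.' instead of a space for non-printable bytes; the class name HexDump and the sample bytes are illustrative choices.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class HexDump {
    // One line per byte: offset, hex value, and the character if it is
    // printable ASCII (32..126); anything else is shown as '.'.
    static String dump(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int offset = 0;
        int data;
        while ((data = in.read()) != -1) { // read() returns 0..255, or -1 at EOF
            char ch = (data >= 32 && data < 127) ? (char) data : '.';
            sb.append(String.format("[%06d] %02x %c%n", offset, data, ch));
            offset++;
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // First bytes of a typical serialization stream, as sample input
        byte[] sample = {(byte) 0xAC, (byte) 0xED, 0x00, 0x05, 0x73, 0x72};
        System.out.print(dump(new ByteArrayInputStream(sample)));
    }
}
```

For a real file, pass a new BufferedInputStream(new FileInputStream(...)) instead of the ByteArrayInputStream.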

Here is a simple example:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
public class CopyBytes {
public static void main(String[] args) throws IOException {
FileInputStream in = null;
FileOutputStream out = null;
try {
in = new FileInputStream("xanadu.txt");
out = new FileOutputStream("outagain.txt");
int c;
while ((c = in.read()) != -1) {
out.write(c);
}
} finally {
if (in != null) {
in.close();
}
if (out != null) {
out.close();
}
}
}
}
If you want to read text (characters), use Readers; if you want to read bytes, use Streams.
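On Java 7 and later the same copy can be written with try-with-resources, which closes both streams even when an exception is thrown. The sketch below also copies through an 8 KB buffer instead of making one read() call per byte; the class name CopyStreams is made up for the example, and on Java 9+ InputStream.transferTo(OutputStream) runs the same kind of loop for you.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyStreams {
    // Copies everything from in to out through a buffer; returns the byte count.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // write only the n bytes actually read
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // try-with-resources closes both streams, even if copy() throws
        try (InputStream in = new FileInputStream("xanadu.txt");
             OutputStream out = new FileOutputStream("outagain.txt")) {
            System.out.println(copy(in, out) + " bytes copied");
        }
    }
}
```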

Why not use Apache Commons?
byte[] bytes = IOUtils.toByteArray(inputStream);
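Then you can convert it to chars:
String str = new String(bytes);
char[] chars = str.toCharArray();
Or, like you did, one value at a time (Character.toChars takes a single int code point, not a byte array):
char[] charArray = Character.toChars(bytes[i] & 0xFF);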
To deserialize objects:
List<Object> results = new ArrayList<Object>();
FileInputStream fis = new FileInputStream("your_file.dat");
ObjectInputStream ois = new ObjectInputStream(fis);
try {
while (true) {
results.add(ois.readObject());
}
} catch (EOFException e) {
// end of file reached: readObject throws EOFException when no objects are left
} finally {
ois.close();
}
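A runnable round trip may help here. This sketch writes two objects with ObjectOutputStream into memory and reads them back until readObject throws EOFException. The Pet class is a hypothetical stand-in for the question's assignment6.Pet (only the names andy and simba come from the question's dump; the ages are invented), and a real file would use a FileInputStream in place of the ByteArrayInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class ReadAllObjects {
    // Hypothetical stand-in for the question's assignment6.Pet class.
    static class Pet implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;

        Pet(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Reads objects until readObject() signals the end of the stream with EOFException.
    static List<Object> readAll(InputStream in) throws IOException, ClassNotFoundException {
        List<Object> results = new ArrayList<>();
        try (ObjectInputStream ois = new ObjectInputStream(in)) {
            while (true) {
                try {
                    results.add(ois.readObject());
                } catch (EOFException e) {
                    break; // no objects left
                }
            }
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(new Pet("andy", 3)); // ages are made-up sample values
            oos.writeObject(new Pet("simba", 5));
        }
        List<Object> pets = readAll(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(pets.size() + " " + ((Pet) pets.get(0)).name); // 2 andy
    }
}
```

Reading the file back requires the same class (same fully qualified name and compatible serialVersionUID) on the classpath, otherwise readObject throws ClassNotFoundException or InvalidClassException.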

Edit:
Use file.length() for the array size and create a byte array, then call inputStream.read(b).
Edit again: if you want characters, use new InputStreamReader(new FileInputStream(file), charset); its constructor even takes the charset.
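Both suggestions from the edit are sketched below under a few assumptions: the file is smaller than 2 GB (so file.length() fits in an int), the text is UTF-8, and the class name ReadWholeFile is made up. DataInputStream.readFully is used because a single read(b) call may return before the array is full.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ReadWholeFile {
    // Sizes the array with file.length() and fills it completely;
    // readFully keeps reading until the array is full (or throws EOFException).
    static byte[] readBytes(File file) throws IOException {
        byte[] b = new byte[(int) file.length()];
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            in.readFully(b);
        }
        return b;
    }

    // Reads the file as characters, decoding with an explicit charset (UTF-8 assumed).
    static String readText(File file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        File file = new File(args[0]);
        System.out.println(readBytes(file).length + " bytes");
        System.out.println(readText(file));
    }
}
```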

Related

How to read a byte from a text file as an actual byte in hex instead of characters?

I'm really unsure how to phrase my question, but here is the situation.
I have data in a text file, for example: 0x7B 0x01 0x2C 0x00 0x00 0xEA these values are a hex representation of ASCII symbols. I need to read this data and be able to parse and translate accordingly.
So far I've tried using a Scanner with something like scan.nextByte() and was directed to the post: [using java.util.Scanner to read a file byte by byte]
After switching the file input to a FileInputStream, I found that something like fis.read() returns 48, the ASCII value of the character 0 in 0x7B.
I am looking for a way to interpret the data being read in as hex, so that 0x7B is equivalent to "{" in ASCII.
Hope this is clear enough to all,
Thanks,

Since your bytes are delimited by spaces, you can just use a Scanner to read them:
try (Scanner scanner = new Scanner(Paths.get(filename))) {
while (scanner.hasNext()) {
int byteValue = Integer.decode(scanner.next());
// Process byteValue ...
}
}
I encourage you to read about the Integer.decode method and the Scanner class.
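If the goal is a byte[] (and then characters such as "{"), the same Scanner approach can collect the decoded values. This minimal sketch parses from a String for easy testing; for a file, new Scanner(Paths.get(filename)) works as in the answer. The class name HexTokens is illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.util.Scanner;

public class HexTokens {
    // Parses whitespace-separated tokens such as "0x7B" into raw byte values.
    static byte[] parse(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                int value = Integer.decode(scanner.next()); // "0x7B" -> 123
                out.write(value); // stores the low 8 bits
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = parse("0x7B 0x01 0x2C 0x00 0x00 0xEA");
        System.out.println(bytes.length);    // 6
        System.out.println((char) bytes[0]); // {
        System.out.println(bytes[5] & 0xFF); // 234
    }
}
```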

If you need a scalable solution, try writing your own InputStream.
Basic example:
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
class ByteStringInputStream extends InputStream {
private final InputStream inputStream;
public ByteStringInputStream(InputStream inputStream) {
this.inputStream = inputStream;
}
// Reads one whitespace-separated token such as "0x7B" and returns its value (0..255),
// or -1 at the end of the underlying stream.
@Override
public int read() throws IOException {
int c = inputStream.read();
while (c != -1 && Character.isWhitespace(c)) { // skip separators
c = inputStream.read();
}
if (c == -1) {
return -1;
}
StringBuilder token = new StringBuilder();
while (c != -1 && !Character.isWhitespace(c)) {
token.append((char) c);
c = inputStream.read();
}
String s = token.toString();
if (!s.matches("0[xX][0-9A-Fa-f]{2}")) {
throw new IOException("Not a hex byte: " + s);
}
return Integer.decode(s);
}
@Override
public void close() throws IOException {
inputStream.close();
}
}
Usage:
ByteStringInputStream bsis = new ByteStringInputStream(new BufferedInputStream(System.in));
// any InputStream works here instead of System.in
int value;
while ((value = bsis.read()) != -1) {
System.out.println(value);
}
Demo:
>0x7B 0x01 0x2C 0x00 0x00 0xEA
123
1
44
0
0
234

If you're in a position to use external libraries, the Apache Commons Codec library has a Hex utility class that can turn a character-array representation of hex bytes into a byte array:
final String hexChars = "0x48 0x45 0x4C 0x4C 0x4F";
// to get "48454C4C4F"
final String plainHexChars = hexChars.replaceAll("(0x| )", "");
final byte[] hexBytes = Hex.decodeHex(plainHexChars.toCharArray());
final String decodedBytes = new String(hexBytes, Charset.forName("UTF-8"));
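Without an external library, Java 17 added java.util.HexFormat, which can do the same decoding. A minimal sketch, assuming Java 17 or newer and the same space-separated "0x.." input format (the class name JdkHex is made up):

```java
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

public class JdkHex {
    // Strips "0x" prefixes and spaces, then decodes pairs of hex digits.
    static byte[] decode(String hexChars) {
        String plain = hexChars.replaceAll("(0x| )", ""); // "48454C4C4F"
        return HexFormat.of().parseHex(plain);
    }

    public static void main(String[] args) {
        byte[] bytes = decode("0x48 0x45 0x4C 0x4C 0x4F");
        System.out.println(new String(bytes, StandardCharsets.UTF_8)); // HELLO
    }
}
```

parseHex throws IllegalArgumentException if the remaining string has an odd length or contains a non-hex character.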

How to parse "SecciÃ³n" to "Sección"? (string accent encoding issue)

I have a string with the value "SecciÃ³n".
I need to convert it to UTF-8 so that the string becomes "Sección".
I tried line = new String(line.getBytes("UTF-8"), "UTF-8"); but this does not work.
Edit
I'm reading the string with this method:
public static String loadLine(InputStream is) {
if (is == null)
return null;
final short TAM_LINE = 256;
String line;
char[] buffer = new char[TAM_LINE];
short i;
int ch;
try {
line = "";
i = 0;
do {
ch = is.read();
if ((ch != '\n') && (ch != -1)) {
buffer[i++] = (char)(ch & 0xFF);
if (i >= TAM_LINE) {
line += new String(buffer, 0, i);
i = 0;
}
}
} while ((ch != '\n') && (ch != -1));
// If we have not read any characters, return null
if (ch == -1 && i == 0)
return null;
// Append the last chunk of the line that was read
line += new String(buffer, 0, i);
} catch (IOException e) {
e.printStackTrace();
return null;
}
return line;
}

The character ó is encoded as 0xc3 0xb3 in UTF-8. It appears that whichever program read that UTF-8-encoded string in the first place read it assuming the wrong encoding, for example windows-1252, where 0xc3 encodes Ã and 0xb3 encodes ³.
In your case, your edit shows that (as far as I can tell, I don't know Java) you're reading the input byte by byte and building the string one character at a time, one character per byte. That cannot work for UTF-8, which uses multiple bytes to encode certain characters such as ó.
You should read the input into a byte array first, then build a String using the correct encoding:
line = new String(byteArray, "UTF-8");
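Applied to the question's loadLine, that advice could look like this sketch: collect the bytes of a line, then decode them once as UTF-8. It assumes the input really is UTF-8 and that lines end with '\n'. The repair method is an extra, hedged fallback for a string that has already been decoded one byte per char, as (char) (ch & 0xFF) does; it only works if no other conversion happened in between. The class name Utf8Lines is illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class Utf8Lines {
    // Collects the raw bytes of one line and decodes them in one go as UTF-8,
    // so multi-byte characters such as U+00F3 (0xC3 0xB3) stay together.
    // Returns null at end of stream when nothing was read.
    static String loadLine(InputStream is) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int ch;
        while ((ch = is.read()) != -1 && ch != '\n') {
            bytes.write(ch);
        }
        if (ch == -1 && bytes.size() == 0) {
            return null;
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    // Undoes the one-char-per-byte decoding done by (char) (ch & 0xFF):
    // get the original bytes back via ISO-8859-1, then decode them as UTF-8.
    static String repair(String mojibake) {
        return new String(mojibake.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "Secci\u00f3n\nsecond line".getBytes(StandardCharsets.UTF_8));
        System.out.println(loadLine(in));
        System.out.println(loadLine(in));
        System.out.println(repair("Secci\u00c3\u00b3n"));
    }
}
```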

FileInputStream and Huffman Tree

I am creating a Huffman tree to compress a text file, but I am having some issues. The method I am writing is supposed to take a FileInputStream that supplies the text data and return a Map of the characters and their counts. However, to do that, I need to define the size of the byte[] that stores the data. The problem is that the byte[] needs to be exactly the right length, or else the Map will also contain some unneeded data. Is there a way to make the byte[] just the right size?
Here is my code:
// provides a count of characters in an input file and place in map
public static Map<Character, Integer> getCounts(FileInputStream input)
throws IOException {
Map<Character, Integer> output = new TreeMap<Character, Integer>(); // treemap keeps keys in sorted order (chars alphabetized)
byte[] fileContent = new byte[100]; // creates a byte[]
//ArrayList<Byte> test = new ArrayList<Byte>();
input.read(fileContent); // reads the input into fileContent
String test = new String(fileContent); // contains entire file into this string to process
// goes through each character of the String, with chars as keys and occurrences as values
for (int i = 0; i < test.length(); i++) {
char temp = test.charAt(i);
if (output.containsKey(temp)) { // seen this character before; increase count
int count = output.get(temp);
System.out.println("repeat; char is: " + temp + "count is: " + count);
output.put(temp, count + 1);
} else { // Haven't seen this character before; create count of 1
System.out.println("new; char is: " + temp + "count is: 1");
output.put(temp, 1);
}
}
return output;
}

The return value of FileInputStream.read(byte[]) is the number of bytes actually read, or -1 at end of file. You can use this value instead of test.length() in the for loop.
Note that read(byte[]) is not guaranteed to fill the whole buffer, even if the end of the file has not been reached, so it is usually called in a loop:
byte[] buf = new byte[100];
int bytesRead;
// Read until there are no more bytes to read.
while ((bytesRead = input.read(buf)) != -1)
{
//You have next bytesRead bytes in a buffer here
}
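Finally, if your text contains non-ASCII characters (for example UTF-8 encoded Unicode), this approach will not work, since one character can span several bytes and read(byte[]) can stop in the middle of it. Consider wrapping the FileInputStream in an InputStreamReader: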
Reader fileReader = new InputStreamReader(input, "UTF-8");
int charsRead;
char[] buf = new char[256];
while ((charsRead = fileReader.read(buf)) > 0) {
//You have charsRead characters in a buffer here
}
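Putting the pieces together, a version of getCounts that needs no size guess might look like the sketch below. It assumes the file is UTF-8 and counts Java chars (so a character outside the Basic Multilingual Plane would be counted as two surrogate chars); Map.merge replaces the containsKey/get/put sequence. The class name CharCounts is illustrative.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

public class CharCounts {
    // Decodes the stream as UTF-8 (an assumption about the file) and counts
    // each char. Only the first charsRead slots of buf are examined per read,
    // so no exact buffer size is needed. The caller closes the stream.
    static Map<Character, Integer> getCounts(InputStream input) throws IOException {
        Map<Character, Integer> counts = new TreeMap<>();
        Reader reader = new InputStreamReader(input, StandardCharsets.UTF_8);
        char[] buf = new char[256];
        int charsRead;
        while ((charsRead = reader.read(buf)) != -1) {
            for (int i = 0; i < charsRead; i++) {
                counts.merge(buf[i], 1, Integer::sum); // start at 1 or add 1
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(getCounts(in));
        }
    }
}
```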

Getting MD5 Hash of File from URL

The result I'm getting is that files of the same type return the same MD5 hash value. For example, two different JPGs give me the same result, whereas a JPG and an APK give different results.
Here is my code...
public static String checkHashURL(String input) {
try {
MessageDigest md = MessageDigest.getInstance("MD5");
InputStream is = new URL(input).openStream();
try {
is = new DigestInputStream(is, md);
int b;
while ((b = is.read()) > 0) {
;
}
} finally {
is.close();
}
byte[] digest = md.digest();
StringBuffer sb = new StringBuffer();
for (int i = 0; i < digest.length; i++) {
sb.append(
Integer.toString((digest[i] & 0xff) + 0x100, 16).substring(
1));
}
return sb.toString();
} catch (Exception ex) {
throw new RuntimeException(ex);
}
}

This is broken:
while ((b = is.read()) > 0)
Your code will stop at the first byte of the stream whose value is 0. If two files have the same values before their first 0 byte (as files of the same type, which share the same header, often do), they get the same hash. If you really want to call the byte-at-a-time version of read, you want:
while (is.read() != -1) {}
The parameterless InputStream.read() method returns -1 when it reaches the end of the stream.
(There's no need to assign a value to b, as you're not using it.)
Better would be to read a buffer at a time:
byte[] ignoredBuffer = new byte[8 * 1024]; // Up to 8K per read
while (is.read(ignoredBuffer) > 0) {}
This time the condition is valid, because InputStream.read(byte[]) would only ever return 0 if you pass in an empty buffer. Otherwise, it will try to read at least one byte, returning the length of data read or -1 if the end of the stream has been reached.
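Combining both fixes, a corrected hashing routine could look like this sketch. It keeps the question's DigestInputStream approach, reads until read returns -1, and formats each digest byte with String.format; the class and method names are made up, and URI.create(...).toURL() is used to open the URL.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5OfStream {
    // Reads the stream to the end, a buffer at a time; DigestInputStream
    // feeds every byte it reads into md. The caller closes the stream.
    static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        InputStream dis = new DigestInputStream(in, md);
        byte[] ignoredBuffer = new byte[8 * 1024];
        while (dis.read(ignoredBuffer) != -1) {
            // nothing to do: reading updates the digest
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = URI.create(args[0]).toURL().openStream()) {
            System.out.println(md5Hex(in));
        }
    }
}
```

Because the loop only stops at -1, zero bytes inside the data no longer end the hashing early.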

How can I write a sequence of strings and then a byte array to a file?

I want to write first a sequence of strings and then a sequence of bytes into a file, using Java. I started by using FileOutputStream because of the array of bytes. After searching the API, I realised that FileOutputStream cannot write Strings, only ints and bytes, so I switched to DataOutputStream. When I run the program, I get an exception. Why?
Here's a portion of my code:
try {
// Create the file
FileOutputStream fos;
DataOutputStream dos; // = new DataOutputStream("compressedfile.ecs_h");
File file= new File("C:\\MyFile.txt");
fos = new FileOutputStream(file);
dos=new DataOutputStream(fos);
/* saves the characters as a dictionary into the file before the binary seq*/
for (int i = 0; i < al.size(); i++) {
String name= al.get(i).name; //gets the string from a global arraylist, don't pay attention to this!
dos.writeChars(name); //saving the name in the file
}
System.out.println("\nIS SUCCESFULLY WRITTEN INTO FILE! ");
dos.writeChars("><");
String strseq;
/*write all elements from the arraylist into a string variable*/
strseq= seq.toString();
System.out.println("sTringSeq: " + strseq);
/*transpose the sequence string into a byte array*/
byte[] data = new byte[strseq.length() / 8];
for (int i = 0; i < data.length; i++) {
data[i] = (byte) Integer.parseInt(strseq.substring(i * 8, (i + 1) * 8), 2);
dos.write(data[i]);
}
dos.flush();
//Close the output stream
dos.close();
} catch(Exception e){}

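The problem with your code is in the last for loop, which converts strseq to bytes: Integer.parseInt(..., 2) throws a NumberFormatException unless every 8-character chunk is a binary number, and any leftover characters beyond a multiple of 8 are silently ignored. The code below checks the input and writes your test data to a file. This works on my machine.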
public static void main(String[] args) {
ArrayList<String> al = new ArrayList<String>();
al.add("String1");
al.add("String2");
try {
// Create the file
FileOutputStream fos = new FileOutputStream("MyFile.txt");
DataOutputStream dos = new DataOutputStream(fos);
/* saves the characters as a dictionary into the file before the binary seq */
for (String str : al) {
dos.writeChars(str);
}
System.out.println("\nIS SUCCESSFULLY WRITTEN INTO FILE! ");
dos.writeChars("><");
String strseq = "001100111100101000101010111010100100111000000000";
// Ensure that you have a string of the correct size
if (strseq.length() % 8 != 0) {
throw new IllegalStateException(
"Input String cannot be converted to bytes - wrong size: "
+ strseq.length());
}
int numBytes = strseq.length() / 8;
for (int i = 0; i < numBytes; i++) {
int start = i * 8;
int end = (i + 1) * 8;
byte output = (byte) Integer.parseInt(strseq.substring(start, end), 2);
dos.write(output);
}
dos.writeChars("> End of File");
dos.flush();
// Close the output stream
dos.close();
} catch (Exception e) {
e.printStackTrace();
}
}
Writing bytes directly to a text file does have a few problems (I assume it's meant to be a text file, since your file name ends with .txt). The most obvious one is that some text editors don't handle or display null characters very well (your last test byte was 00000000, i.e. null). If you want the bytes to be readable, you could encode them using Base64.
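A minimal sketch of the Base64 idea, using java.util.Base64 (Java 8+). It reuses the answer's bit-string conversion and length check; the class name BitsToBase64 is made up. The resulting string contains only printable ASCII, so writing it through a Writer gives a file that text editors display cleanly.

```java
import java.util.Base64;

public class BitsToBase64 {
    // Turns a string of '0'/'1' characters into bytes, 8 characters per byte.
    static byte[] bitsToBytes(String bits) {
        if (bits.length() % 8 != 0) {
            throw new IllegalArgumentException("Length is not a multiple of 8: " + bits.length());
        }
        byte[] out = new byte[bits.length() / 8];
        for (int i = 0; i < out.length; i++) {
            // parseInt(..., 2) throws NumberFormatException if the chunk is not a binary number
            out[i] = (byte) Integer.parseInt(bits.substring(i * 8, (i + 1) * 8), 2);
        }
        return out;
    }

    // Base64 output uses only printable ASCII characters.
    static String toBase64(String bits) {
        return Base64.getEncoder().encodeToString(bitsToBytes(bits));
    }

    public static void main(String[] args) {
        String text = toBase64("001100111100101000101010111010100100111000000000");
        byte[] back = Base64.getDecoder().decode(text); // the original six bytes again
        System.out.println(text + " (" + back.length + " bytes)");
    }
}
```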

Line:
data[i] = (byte) Integer.parseInt(strseq.substring(i * 8, (i + 1) * 8), 2);
looks very suspicious.
Can you provide more details about strseq and its value?

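What about this code?
If you want to write the '0' and '1' characters of strseq themselves rather than the bits they spell out, this code: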
byte[] data = new byte[strseq.length() / 8];
for (int i = 0; i < data.length; i++) {
data[i] = (byte) Integer.parseInt(strseq.substring(i * 8, (i + 1) * 8), 2);
dos.write(data[i]);
}
becomes
byte[] data = strseq.getBytes();

With the FileWriter class you get a convenient abstraction of a file-writing operation.
Maybe this class can help you write your file.
You can use it instead of the other streams for the string part: it has methods to write a String or a char array to a file. Note that FileWriter is a Writer, so it has no method for writing a raw byte array; for the bytes you still need an OutputStream.
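FileWriter writes characters only, so one way to keep strings and raw bytes in a single binary file is to stay with DataOutputStream and write each part with a length prefix, so it can be read back with DataInputStream. The layout below (a string count, writeUTF strings, then the array length and the bytes) is a hypothetical format chosen for this sketch, and it writes to memory for easy testing; a FileOutputStream("MyFile.txt") could be passed to the DataOutputStream instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StringsThenBytes {
    // Hypothetical layout: string count, each string via writeUTF
    // (length-prefixed), then the byte-array length and the raw bytes.
    static byte[] write(List<String> names, byte[] data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(buffer)) {
            dos.writeInt(names.size());
            for (String name : names) {
                dos.writeUTF(name);
            }
            dos.writeInt(data.length);
            dos.write(data);
        }
        return buffer.toByteArray();
    }

    static final class Decoded {
        final List<String> names = new ArrayList<>();
        byte[] data;
    }

    // Reads the fields back in exactly the order write() produced them.
    static Decoded read(byte[] file) throws IOException {
        Decoded result = new Decoded();
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(file))) {
            int count = dis.readInt();
            for (int i = 0; i < count; i++) {
                result.names.add(dis.readUTF());
            }
            result.data = new byte[dis.readInt()];
            dis.readFully(result.data);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        byte[] file = write(List.of("String1", "String2"), new byte[] {0x33, (byte) 0xCA, 0x00});
        Decoded d = read(file);
        System.out.println(d.names + " " + Arrays.toString(d.data)); // [String1, String2] [51, -54, 0]
    }
}
```

The length prefixes are what make the file readable again: without them, a reader cannot tell where the strings end and the raw bytes begin.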
