Reading a partial InputStream in Java

I'm looking to read an InputStream in sections because I need the first n bytes of the file and last m bytes as well as the contents between.
byte[] beginning = inputStream.readNBytes(16);
This works just fine, but to get the last m bytes, I tried the following:
byte[] middle = inputStream.readNBytes(inputStream.available() - 32);
byte[] end = inputStream.readNBytes(inputStream.available());
The end variable looks how I expect it to but not the middle variable, which ends up cutting out part of the stream.
I'm also a bit confused why the buf parameter size in the input stream doesn't seem to be equal to the byte array size when converting one to the other.
Anyway, I assume this isn't working how I want it to because (inputStream.available() - 32) is not adding up to a value compatible with readNBytes, so part of the stream is lost.
Is there a way to go about doing this?
EDIT: What I ended up doing, which seemed to work (mostly), was to change how the file is created: I now prepend both pieces I will later be extracting, instead of prepending one and appending the other. That way I can just call inputStream.readAllBytes() on the last piece.
I also had to change where I'm writing to the file. I was writing to a CipherOutputStream when I should've been writing to the FileOutputStream and using that to create the Cipher OS.
Even after doing this I still have an extra 16 bytes at the end of the file, which confuses me, but I can easily ignore that last bit if I can't figure out why it's doing that.
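One thing worth noting about the original attempt: available() only estimates how many bytes can be read without blocking, so it is not a reliable measure of what is left in the stream. A minimal sketch of one alternative, assuming the whole stream fits in memory, is to read everything and slice the array afterwards. The class and method names (StreamSections, split) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class StreamSections {
    /** Returns {head, middle, tail}: the first n bytes, everything between, and the last m bytes. */
    static byte[][] split(InputStream in, int n, int m) throws IOException {
        byte[] all = in.readAllBytes(); // Java 9+; reads until end of stream
        if (all.length < n + m) {
            throw new IOException("stream is shorter than n + m bytes");
        }
        byte[] head = Arrays.copyOfRange(all, 0, n);
        byte[] middle = Arrays.copyOfRange(all, n, all.length - m);
        byte[] tail = Arrays.copyOfRange(all, all.length - m, all.length);
        return new byte[][] { head, middle, tail };
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical 100-byte input standing in for the real file.
        byte[] data = new byte[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        byte[][] parts = split(new ByteArrayInputStream(data), 16, 32);
        System.out.println(parts[0].length + " " + parts[1].length + " " + parts[2].length);
    }
}
```

For input too large to hold in memory, the same split needs the total length from another source (for example the file size) before reading.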

Related

Resource file format processing in Java

I am trying to implement a processor for a specific resource archive file format in Java. The format has a Header comprised of a three-char description, a dummy byte, plus a byte indicating the number of files.
Then each file has an entry consisting of a dummy byte, a twelve-char string describing the file name, a dummy byte, and an offset declared in a three-byte array.
What would be the proper class for reading this kind of structure? I have tried RandomAccessFile, but it does not seem to allow reading arrays of data, e.g. I can only read three chars by calling readChar() three times, etc.
Of course I can extend RandomAccessFile to do what I want, but there's got to be a proper out-of-the-box class for this kind of processing, isn't there?
This is my reader for the header in C#:
protected override void ReadHeader()
{
Header = new string(this.BinaryReader.ReadChars(3));
byte dummy = this.BinaryReader.ReadByte();
NFiles = this.BinaryReader.ReadByte();
}
I think you got lucky with your C# code, as it relies on the character encoding to be set somewhere else, and if it didn't match the number of bytes per character in the file, your code would probably have failed.
The safest way to do this in Java would be to strictly read bytes and do the conversion to characters yourself. If you need seek abilities, then indeed RandomAccessFile would be your easiest solution, but it should be pointed out that InputStream allows skipping, so if you don't need actual random access, just to skip some of the files, you could certainly use it.
In either case, you should read the bytes from the file per the file specification, and then convert them to characters based on a known encoding. You should never trust a file that was not written by a Java program to contain any Java data types other than byte, and even if it was written by Java, it may well have been converted to raw bytes while writing.
So your code should be something along the lines of:
String header = "";
int nFiles = 0;
RandomAccessFile raFile = new RandomAccessFile( "filename", "r" );
byte[] buffer = new byte[3];
int numRead = raFile.read( buffer );
header = new String( buffer, StandardCharsets.US_ASCII ); // the Charset overload needs no name lookup and throws no checked exception
int numSkipped = raFile.skipBytes(1);
nFiles = raFile.read(); // The byte is read as an integer between 0 and 255
Sanity checks (checking that actual 3 bytes were read, 1 byte was skipped and nFiles is not -1) and exception handling have been skipped for brevity.
It's more or less the same if you use InputStream.
I would go with MappedByteBuffer. This will allow you to seek arbitrarily, and it will also deal efficiently and transparently with files that are too large to fit comfortably in RAM.
This is, to my mind, the best way of reading structured binary data like this from a file.
You can then build your own data structure on top of that, to handle the specific file format.
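As a rough illustration of the MappedByteBuffer approach, here is a sketch that reads the header described in the question (three ASCII characters, a dummy byte, a file-count byte). The class name, the method name, and the "TAG:count" return format are invented for this example.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedHeaderReader {
    /** Reads the 3-byte tag, skips the dummy byte, and returns "TAG:count". */
    static String readHeader(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] tag = new byte[3];
            buf.get(tag);                  // bulk read, unlike three readChar() calls
            buf.get();                     // dummy byte, ignored
            int nFiles = buf.get() & 0xFF; // treat the count byte as unsigned
            return new String(tag, StandardCharsets.US_ASCII) + ":" + nFiles;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file: tag "RES", dummy byte 0, 200 files.
        Path sample = Files.createTempFile("archive", ".bin");
        sample.toFile().deleteOnExit();
        Files.write(sample, new byte[] { 'R', 'E', 'S', 0, (byte) 200 });
        System.out.println(readHeader(sample));
    }
}
```

Seeking to a file entry would then be a matter of calling buf.position(offset) before the next get.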

Writing Bits to a file using BitSet & FileOutputStream

I've run into a bit of a problem when it comes to writing specific bits to a file. I apologise if this is a duplicate of anything but I could not find a reasonable answer with the searches I ran.
I have a number of difficulties with the following:
Writing a header (Long) bit by bit (converted to a byte array so the FileOutputStream can utilise it) to the file.
Writing single bits to the file. For example, at one stage I am required to write a single bit set to 0 to the file so my initial thought would be to use a BitSet but Java seems to treat this as a null?
BitSet initialPadding = new BitSet();
initialPadding.set(0, false);
fileOutputStream.write(initialPadding.toByteArray());
1)
I create a FileOutputStream as shown below with the necessary file name:
FileOutputStream fileOutputStream = new FileOutputStream(file.getAbsolutePath());
I am attempting to create an ".amr" file so the first step before I perform any bit manipulation is to write a header to the beginning of the file. This has the following value:
Long defaultHeader = 0x2321414d520aL;
I've tried writing this to the file using the following method but I am pretty sure it does not write the correct result:
fileOutputStream.write(defaultHeader.byteValue());
Am I using the correct streams? Are my conversions completely wrong?
2)
I have a public BitSet fileBitSet; which has bits read in from a ".raw" file as the input. I need to be able to extract certain bits from the BitSet in order to write them to the file later. I do this using the following method:
public int getOctetPayloadHeader(int startPoint) {
int readLength = 0;
octetCMR = fileBitSet.get(0, 3);
octetRES = fileBitSet.get(4, 7);
if (octetRES.get(0, 3).isEmpty()) {
/* Keep constructing the payload header. */
octetFBit = fileBitSet.get(8, 8);
octetMode = fileBitSet.get(9, 12);
octetQuality = fileBitSet.get(13, 13);
octetPadding = fileBitSet.get(14, 15);
... }
What would be the best way to go for writing these bits to a file bearing in mind that I may be required to sometimes write a single bit or 81 bits at a particular offset in the fileBitSet ?
There is only one thing you can write to an OutputStream: bytes. You have to do the composing of your bits into bytes yourself; only you know the rules how the bits are to be put together into bytes.
As for stuff like:
Long defaultHeader = 0x2321414d520aL;
fileOutputStream.write(defaultHeader.byteValue());
You should take a close look at the javadocs for the methods you are using. byteValue() returns a single byte (the lowest 8 bits of the value), so of course it's not doing what you expect. Working with streams is well explained in Oracle's tutorials: http://docs.oracle.com/javase/tutorial/essential/io/streams.html
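To make the header case concrete, here is a small sketch that writes the six bytes of 0x2321414d520aL one at a time, most significant byte first. The helper name writeBigEndian is invented for this example, and the byte order is an assumption based on how the constant is written in the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class HeaderWriter {
    /** Writes the low byteCount bytes of value, most significant byte first. */
    static void writeBigEndian(OutputStream out, long value, int byteCount) throws IOException {
        for (int i = byteCount - 1; i >= 0; i--) {
            // Shift the wanted byte into the lowest position and mask it.
            out.write((int) (value >>> (8 * i)) & 0xFF);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeBigEndian(out, 0x2321414d520aL, 6);
        for (byte b : out.toByteArray()) {
            System.out.printf("%02x ", b); // prints "23 21 41 4d 52 0a "
        }
    }
}
```

The same method works unchanged with a FileOutputStream in place of the ByteArrayOutputStream.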
For writing single bits or groups of bits, you will need a custom OutputStream that handles grouping the bits into bytes to be written. That's commonly called a BitStream (there is no such class in the JDK); you have to either write it yourself (which I highly recommend, it's a very good exercise to teach you about bits and bytes) or find one on the web.
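A minimal sketch of such a bit-oriented writer might look like the following. It is not a JDK class. It assumes bits are packed most significant bit first and that a trailing partial byte is padded with zero bits on close; the real rules depend on the target file format.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BitOutputStream implements AutoCloseable {
    private final OutputStream out;
    private int current; // bits collected so far; the oldest bit ends up highest
    private int count;   // number of bits held in current (0..7)

    public BitOutputStream(OutputStream out) {
        this.out = out;
    }

    /** Appends one bit; emits a byte once eight bits have been collected. */
    public void writeBit(boolean bit) throws IOException {
        current = (current << 1) | (bit ? 1 : 0);
        if (++count == 8) {
            out.write(current);
            current = 0;
            count = 0;
        }
    }

    /** Writes the low n bits of value, most significant of those bits first. */
    public void writeBits(long value, int n) throws IOException {
        for (int i = n - 1; i >= 0; i--) {
            writeBit(((value >>> i) & 1L) == 1L);
        }
    }

    /** Pads the final partial byte with zero bits, then closes the underlying stream. */
    @Override
    public void close() throws IOException {
        while (count != 0) {
            writeBit(false);
        }
        out.close();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (BitOutputStream bits = new BitOutputStream(sink)) {
            bits.writeBit(false);     // a single 0 bit, as in the question
            bits.writeBits(0b101, 3); // then three more bits
        }
        System.out.println(Integer.toBinaryString(sink.toByteArray()[0] & 0xFF)); // prints 1010000
    }
}
```

Bits pulled out of a BitSet can be fed in one at a time with writeBit(fileBitSet.get(i)).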

How to read x number of characters from a text file progressively

I am trying to read x characters from a text file at a time, progressively. So if I had: aaaaabbbbbcccccabckcka and I'm reading 5 at a time, I would get aaaaa, bbbbb, ccccc, abckc and ka. The code I am using is:
status = is.read(bytes);
text = new String(bytes);
where bytes is: bytes = new byte[5]. I call these two lines of code until status becomes -1. The problem I am facing is that the output is not what I have mentioned above; instead I get this:
aaaaa, bbbbb, ccccc, abckc and kackc. Notice that the last segment 'kackc' is garbage. Why is this happening?
Note: that bytes is initialized once outside the reading loop.
Your current solution works for ASCII, but many characters in other encodings use more than one byte. You should use a Reader and a char[] instead of an InputStream and a byte[], respectively.
It turns out I needed to clear my byte buffer every time I read new input. I just used a for loop to zero it out and it worked.
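The garbage appears because the last read only fills the first two bytes of the array and leaves the previous chunk's data in the rest. Instead of zeroing the buffer, the count returned by the read call can be used to build the string from only the bytes that were actually read. A sketch, assuming ASCII text and Java 9+ for readNBytes; the class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ChunkReader {
    /** Splits the stream into strings of at most size bytes each (ASCII assumed). */
    static List<String> readChunks(InputStream in, int size) throws IOException {
        List<String> chunks = new ArrayList<>();
        byte[] buf = new byte[size];
        int n;
        // readNBytes keeps reading until size bytes arrive or the stream ends,
        // and returns 0 (not -1) once nothing is left.
        while ((n = in.readNBytes(buf, 0, size)) > 0) {
            // Use only the first n bytes; the rest of buf may hold stale data.
            chunks.add(new String(buf, 0, n, StandardCharsets.US_ASCII));
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        byte[] text = "aaaaabbbbbcccccabckcka".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readChunks(new ByteArrayInputStream(text), 5));
        // prints [aaaaa, bbbbb, ccccc, abckc, ka]
    }
}
```

For text that is not pure ASCII, the Reader and char[] approach from the other answer is the safer choice, with the same use of the returned count.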

How do you write any ASCII character to a file in Java?

Basically I'm trying to use a BufferedWriter to write to a file using Java. The problem is, I'm actually doing some compression so I generate ints between 0 and 255, and I want to write the character whose ASCII value is equal to that int. When I try writing to the file, it writes many ? characters, so when I read the file back in, it reads those as 63, which is clearly not what I want. Any ideas how I can fix this?
Example code:
int a = generateCode(character); //a now has an int between 0 and 255
bw.write((char) a);
a is always between 0 and 255, but it sometimes writes '?'
You are really trying to write / read bytes to / from a file.
When you are processing byte-oriented data (as distinct from character-oriented data), you should be using InputStream and OutputStream classes and not Reader and Writer classes.
In this case, you should use FileInputStream / FileOutputStream, and wrap with a BufferedInputStream / BufferedOutputStream if you are doing byte-at-a-time reads and writes.
Those pesky '?' characters are due to issues in the encoding/decoding process that happens when Java converts between characters and the default text encoding for your platform. The conversion from bytes to characters and back is often "lossy", depending on the encoding scheme used. You can avoid this by using the byte-oriented stream classes.
(And the answers that point out that ASCII is a 7-bit not 8-bit character set are 100% correct. You are really trying to read / write binary octets, not characters.)
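As a sketch of the byte-oriented approach, the following writes int codes in the range 0 to 255 as raw bytes and reads them back unchanged. The class and method names are illustrative, and the codes are assumed to already be within 0..255.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class ByteCodes {
    /** Writes each code as one raw byte; no character encoding is involved. */
    static void writeCodes(Path file, int[] codes) throws IOException {
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(file))) {
            for (int code : codes) {
                out.write(code); // OutputStream.write(int) writes the low 8 bits
            }
        }
    }

    /** Reads the file back as unsigned values in the range 0..255. */
    static int[] readCodes(Path file) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            byte[] raw = in.readAllBytes(); // Java 9+
            int[] codes = new int[raw.length];
            for (int i = 0; i < raw.length; i++) {
                codes[i] = raw[i] & 0xFF; // Java bytes are signed, so mask to 0..255
            }
            return codes;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("codes", ".bin");
        file.toFile().deleteOnExit();
        writeCodes(file, new int[] { 0, 63, 127, 128, 200, 255 });
        System.out.println(Arrays.toString(readCodes(file)));
        // prints [0, 63, 127, 128, 200, 255]
    }
}
```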
You need to make up your mind what are you really doing. Are you trying to write some bytes to a file, or are you trying to write encoded text? Because these are different concepts in Java; byte I/O is handled by subclasses of InputStream and OutputStream, while character I/O is handled by subclasses of Reader and Writer. If what you really want to write is bytes to a file (which I'm guessing from your mention of compression), use an OutputStream, not a Writer.
Then there's another confusion you have, which is evident from your mention of "ASCII characters from 0-255." There are no ASCII characters above 127. Please take 15 minutes to read this: "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" (by Joel Spolsky). Pay particular attention to the parts where he explains the difference between a character set and an encoding, because it's critical for understanding Java I/O. (To review whether you understood, here's what you need to learn: Java Writers are classes that translate character output to byte output by applying a client-specified encoding to the text, and sending the bytes to an OutputStream.)
Java strings are based on 16-bit-wide characters, and Java tries to perform conversions around that assumption when no encoding is clearly specified.
The following sample code writes and reads data directly as bytes, meaning 8-bit numbers which have an ASCII meaning associated with them.
import java.io.*;

public class RWBytes {
    public static void main(String[] args) throws IOException {
        String filename = "MiTestFile.txt";
        byte[] bArray1 = {65, 66, 67, 68, 69}; // A, B, C, D, E
        byte[] bArray2 = new byte[5];

        // Write the raw bytes; no character encoding is involved.
        try (FileOutputStream fos = new FileOutputStream(filename)) {
            fos.write(bArray1);
        }
        // Read them back as raw bytes.
        try (FileInputStream fis = new FileInputStream(filename)) {
            fis.read(bArray2);
        }
        for (int i = 0; i < bArray2.length; i++) {
            System.out.println("As the byte value: " + bArray2[i]); // numeric byte value
            System.out.println("Converted to char for printing to the screen: " + (char) bArray2[i]);
        }
    }
}
A fixed subset of the 7-bit ASCII code is printable (A = 65, for example). Code 10 corresponds to the "new line" character, which steps down one line on screen when it is found and "printed". Many other codes exist which control a character-oriented screen; these are invisible and affect the screen representation, like tabs, spaces, etc. There are also other control characters, one of which had the purpose of ringing a bell, for example.
The upper half of the 8-bit range, above 127, is defined as whatever the implementer wanted; only the lower half has standard meanings associated with it.
For general binary byte handling there are no such concerns: the bytes are just numbers which represent the data. Only when you try to print them to the screen do they become meaningful in all kinds of ways.

How do I convert a file's line number to a byte offset (or get the byte offset of the beginning of each line with a BufferedReader)?

I'm using a FileReader wrapped in a LineNumberReader to index a large text file for speedy access later on. Trouble is I can't seem to find a way to read a specific line number directly. BufferedReader supports the skip() function, but I need to convert the line number to a byte offset (or index the byte offset in the first place).
I took a crack at it using RandomAccessFile, and while it worked, it was horribly slow during the initial indexing. BufferedReader's speed is fantastic, but... well, you see the problem.
Some key info:
The file can be any size (currently 35,000 lines)
It's stored on Android's internal filesystem (via getFilesDir() to be exact)
The formatting is not fixed width, unfortunately (hence the need to read by line)
Any ideas?
Describes an extended RandomAccessFile with buffering semantics
Trouble is I can't seem to find a way to read a specific line number directly
Unless you know the length of each line you can't read it directly
There is no shortcut; you will need to read the entire file up front and calculate the offsets manually.
I would just use a BufferedReader and then get the length of each string and add 1 (or 2?) for the EOL string.
Consider saving a file index along with the large text file. If this file is something you are generating, either on your server or on the device, it should be trivial to generate an index once and distribute and/or save it along with the file.
I'd recommend an int[] where each value is the absolute offset in bytes for the n*(index+1) th line. So you could have an array of size 35,000 with the start of each line, or an array of size 350, with the start of every 100th line.
Here's an example assuming you have an index file containing a raw sequence of int values:
public String getLineByNumber(RandomAccessFile index,
                              RandomAccessFile data,
                              int lineNum) throws IOException {
    index.seek(lineNum * 4L);   // each index entry is a 4-byte int
    data.seek(index.readInt()); // jump to the start of the requested line
    return data.readLine();
}
I took a crack at it using RandomAccessFile, and while it worked, it was horribly slow during the initial indexing
You've started the hard part already. Now for the harder part.
BufferedReader's speed is fantastic, but...
Is there something in your use of RandomAccessFile that made it slower than it has to be? How many bytes did you read at a time? If you read one byte at a time it will be sloooooow. If you read in an array of bytes at a time, you can speed things up and use the byte array as a buffer.
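To illustrate the buffered approach, here is a sketch that scans a file in 8 KB blocks and records the byte offset at which each line starts. It assumes lines end in '\n' (so "\r\n" also works) and an encoding such as ASCII or UTF-8 in which the byte 0x0A only ever means a newline; it would not be correct for UTF-16. The class name, method name, and buffer size are arbitrary choices for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LineIndexer {
    /** Returns the byte offset of the first byte of every line in the file. */
    static long[] indexLines(Path file) throws IOException {
        List<Long> starts = new ArrayList<>();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            long pos = 0;               // offset of buf[0] within the file
            boolean atLineStart = true; // true when the next byte begins a new line
            int n;
            while ((n = in.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (atLineStart) {
                        starts.add(pos + i);
                        atLineStart = false;
                    }
                    if (buf[i] == '\n') {
                        atLineStart = true;
                    }
                }
                pos += n;
            }
        }
        long[] offsets = new long[starts.size()];
        for (int i = 0; i < offsets.length; i++) {
            offsets[i] = starts.get(i);
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("lines", ".txt");
        file.toFile().deleteOnExit();
        Files.write(file, "ab\ncde\n\nf".getBytes(StandardCharsets.US_ASCII));
        System.out.println(Arrays.toString(indexLines(file))); // prints [0, 3, 7, 8]
    }
}
```

The resulting offsets can be passed straight to RandomAccessFile.seek, or written out once as the index file suggested above.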
Just wrapping up the previous comments :
Either you use RandomAccessFile to first count bytes and then parse what you read to find the lines by hand, OR you use a LineNumberReader to read line by line and count the bytes of each line's characters by hand (2 bytes per char in UTF-16?).
