Most efficient way to write to a file? - java

I am writing my own image compression program in Java, I have entropy encoded data stored in multiple arrays which I need to write to file. I am aware of different ways to write to file but I would like to know what needs to be taken into account when trying to use the least possible amount of storage. For example, what character set should I use (I just need to write positive and negative numbers), would I be able to write less than 1 byte to a file, should I be using Scanners/BufferedWriters etc. Thanks in advance, I can provide more information if needed.

Read the Java tutorial about IO.
You should
not use Writers and character sets, since you want to write binary data
use a buffered stream to avoid too many native calls and make the write fast
not use Scanners, as they're used to read data, and not write data
And no, you won't be able to write less than a byte in a file. The byte is the smallest amount of information that can be stored in a file.

Compression is almost always more expensive than file IO. You shouldn't worry about the speed of your writes unless you know it's a bottle neck.
I am writing my own image compression program in Java, I have entropy encoded data stored in multiple arrays which I need to write to file. I am aware of different ways to write to file but I would like to know what needs to be taken into account when trying to use the least possible amount of storage.
Write the data in a binary format and it will be the smallest. This is why almost all image formats use binary.
For example, what character set should I use (I just need to write positive and negative numbers),
Character encoding is for encoding characters i.e. text. You don't use these in binary formats generally (unless they contain some text which you are unlikely to do initially).
would I be able to write less than 1 byte to a file,
Technically you can use less than the block size on disk e.g. 512 bytes or 4 KB. You can write any amount less than this but it doesn't use less space, nor would it really matter if it did because the amount of disk is too small to worry about.
should I be using Scanners/BufferedWriters etc.
No, These are for text,
Instead use DataOutputStream and DataInputStream as these are for binary.

what character set should I use
You would need to write your data as bytes, not chars, so forget about character set.
would I be able to write less than 1 byte to a file
No, this would not be possible. But to follow decoder expected bit stream you might need to construct a byte, from something like 5 and 3 bits before writing that byte to the file.

Related

How do I read a file without any buffering in Java?

I'm working through the problems in Programming Pearls, 2nd edition, Column 1. One of the problems involves writing a program that uses only around 1 megabyte of memory to store the contents of a file as a bit array with each bit representing whether or not a 7 digit number is present in the file. Since Java is the language I'm the most familiar with, I've decided to use it even though the author seems to have had C and C++ in mind.
Since I'm pretending memory is limited for the purpose of the problem I'm working on, I'd like to make sure the process of reading the file has no buffering at all.
I thought InputStreamReader would be a good solution, until I read this in the Java documentation:
To enable the efficient conversion of bytes to characters, more bytes may be read ahead from the underlying stream than are necessary to satisfy the current read operation.
Ideally, only the bytes that are necessary would be read from the stream -- in other words, I don't want any buffering.
One of the problems involves writing a program that uses only around 1 megabyte of memory to store the contents of a file as a bit array with each bit representing whether or not a 7 digit number is present in the file.
This implies that you need to read the file as bytes (not characters).
Assuming that you do have a genuine requirement to read from a file without buffering, then you should use the FileInputStream class. It does no buffering. It reads (or attempts to read) precisely the number of bytes that you asked for.
If you then need to convert those bytes to characters, you could do this by applying the appropriate String constructor to a byte or byte[]. Note that for multibyte character encodings such as UTF-8, you would need to read sufficient bytes to complete each character. Doing that without the possibility of read-ahead is a bit tricky ... and entails "knowledge* of the character encoding you are reading.
(You could avoid that knowledge by using a CharsetDecoder directly. But then you'd need to use the decode method that operates on Buffer objects, and that is a bit complicated too.)
For what it is worth, Java makes a clear distinction between stream-of-byte and stream-of-character I/O. The former is supported by InputStream and OutputStream, and the latter by Reader and Write. The InputStreamReader class is a Reader, that adapts an InputStream. You should not be considering using it for an application that wants to read stuff byte-wise.

Write to Binary files in Java

I am trying to write a lot of data into a binary file. Because it is a lot of data, it is important that this is done fast and I want to be able to write the data as ints one by one. I have tried RandomAccessFile, BufferedWriter, DataOutputStream etc. but all of those are either too slow or cannot write ints. Any ideas that might help me?
Every stream can 'write ints' if you write the correct code to convert ints to bytes.
The two 'fast' IO options in Java are BufferedOutputStream on top of FileOutputStream and the use of a FileChannel with NIO buffers.
If all you are writing is many, many, int values, you can use IntBuffer instances to pass the data to a fileChannel.
Further, 'one at a time' is generally incompatible with 'fast'. Sooner or later, data has to travel to the disk in blocks. If you force data to disk in small quantities, you will find that the process is very slow. You could, for example, add integer values to a buffer and write the buffer when it fills, and then repeat.
Take a look at the java.nio package. You will find classes that you can use for your needed purposes.
Well, writing to a file one int at a time isn't an inherently fast operation. Even with bufferedwriter you're potentially making a lot of function calls (and may still be doing a lot of file writes if you haven't set the buffer to be large enough).
Have you tried putting the integers into an array, using ByteBuffer to convert it to a byte array, and then writing the byte array to a file?

Create a wav with hidden binary data in it and read it (Java)

What I'm willing to do is to convert a text string into a wav file format in high frequencies (18500Hz +): this will be the encoder.
And create an engine to decode this text string from a wav formatted recording that will support error control as I will not use the same file obviously, to read, but a recording of this sound.
Thanks
An important consideration will be whether or not you want to hide the string into an existing audio file (so it sounds like a normal file, but has an encoded message -- that is called steganography), or whether you will just be creating a file that sounds like gibberish, for the purpose of encoding data only. I'm assuming the latter since you didn't ask to hide a message in an existing file.
So I assume you are not looking for low-level details on writing WAV files (I am sure you can find documentation on how to read and write individual samples to a WAV file). Obviously, the simplest approach would be to simply take each byte of the source string, and store it as a sample in the WAV file (assuming an 8-bit recording. If it's a 16-bit recording, you can store two bytes per sample. If it's a stereo 16-bit recording, you can store four bytes per sample). Then you can just read the WAV file back in and read the samples back as bytes. That's the simple approach but as you say, you want to be able to make a (presumably analog) recording of the sound, and then read it back into a WAV file, and still be able to read the data.
With the approach above, if the analog recording is not exactly perfect (and how could it be), you would lose bytes of the message. This means you need to store the message in such a way that missing bytes, or bytes that have a slight error, are not going to be a problem. How you do this will depend highly upon exactly what sort of "damage" will be happening to the sound file. I would expect two major forms of damage:
"Vertical" damage: A sample (byte) would have a slightly higher or lower value than it originally had.
"Horizontal" damage: Samples may be averaged, stretched or squashed horizontally. From a byte perspective, this means some samples may be repeated, while others may be missing.
To combat this, you need some redundancy in the message. More redundancy means the message will take up more space (be longer), but will be more reliable.
I would recommend thinking about how old (pre-mobile) telephone dial tones worked: each key generated a unique tone and sent it across the wire. The tones are long enough, and far enough apart pitch-wise that they can be distinguished even given the above forms of damage. So, choose two parameters: a) length and b) frequency-delta. For each byte of data, select a frequency, spacing the 256 byte values frequency-delta Hertz apart. Then, generate a sine wave for length milliseconds of that frequency. This encodes a lot more redundancy than the above one-byte-per-sample approach, since each byte takes up many samples, and if you lose some samples, it doesn't matter.
When you read them back in, read every length milliseconds of audio data and then estimate the frequency of the sine wave. Map this onto the byte value with the nearest frequency.
Obviously, longer values of length and further-apart frequency-delta will make the signal more reliable, but require the sound to be longer and higher-frequency, respectively. So you will have to play around with these values to see what works.
Some last thoughts, since your title says "hidden" binary data:
If you really want the data to be "hidden", consider encrypting it before encoding it to audio.
If you want to take the steganography approach, you will have to read up on audio steganography (I imagine you can use the above techniques, but you will have to insert them as extremely low-volume signals on top of the existing sound).

Huffman coding in Java

I want encode every file by Huffman code.
I have found the length of bits per symbol (its Huffman code).
Is it possible to encode a character into a file in Java: are there any existing classes that read and write to a file bit by bit and not with minimum dimension of char?
You could create a BitSet to store your encoding as you are creating it and simply write the String representation to a file when you are done.
You really don't want to write single bits to a file, believe me. Usually we define a byte buffer, build the "file" in memory and, after all work is done, write the complete buffer. Otherwise it would take forever (nearly).
If you need a fast bit vector, then have a look at the colt library. That's pretty convenient if you want to write single bits and don't do all this bit shifting operations on your own.
I'm sure there are Huffman classes out there, but I'm not immediately aware of where they are. If you want to roll your own, two ways to do this spring to mind immediately.
The first is to assemble the bit strings in memory my using mask and shift operators and accumulate the bits into larger data objects (i.e. ints or longs) and then write those out to file with standard streaming.
The second, more ambitious and self-contained idea would be to write an implementation of OutputStream that has a method for writing a single bit and then this OutputStream class would do the aforementioned buffering/shifting/accumulating itself and probably pass the results down to a second, wrapped OutputStream.
Try writing a bit vector in java to do the bit representation: it should allow you to set/reset the individual bits in a bit stream.
The bit stream can thus hold your Huffman encoding. This is the best approach, and lightning fast too.
Huffmann sample analysis here
You can find a working (and fast) implementation here: http://code.google.com/p/kanzi/source/browse/src/kanzi/entropy/HuffmanTree.java

.txt limit, need help

I have a problem here and no idea what to do. Basically I'm creating a .txt file which serves as an index for a random access file. In it I have the number of bytes required to seek to each entry in the file.
The file has 1484 records. This is where I have my problem: with the large amount of bytes the record has, I end up writing pretty long numbers into the file, and ultimately the .txt file ends up being too big. When I open it with an appropriate piece of software (such as notepad) the file is simply cut off at a certain point.
I tried to minimize it as much as possible, but it's just too big.
What can I do here? I'm clueless.
Thanks.
I am not really sure that the problem is that one... only 1484 records?
You can write a binary file instead, in which each four or eight bytes correspond to a record position. This way, all positions have the same length on disk, no matter how many digits they hold. If you need to browse/modify the file, you can easily write utility programs that decode the file so it lets you inspect it, and that encode your modifications, modifying it.
Another solution would be to compress the file. You can use the zip capabilities of Java, and unzip the file before using it, and zip it again after that.
It is probably because you are not feeding new lines to terminate each line. There is a limit set to the maximum line length that text editors can handle safely.
Storing your indices in a binary file, inside some kind of Collection (depending on your needs) would probably be much faster and lighter.

Categories

Resources