Reading and Writing Bits to Text Files in Java

I'm trying to implement some compression algorithms, and I need to deal with bits in Java.
What I need is this: when I write the value 1 and then the value 2, those numbers should be stored in the file as bits, so the file size is 1 byte instead of 2 (1 fits in 1 bit and 2 fits in 2 bits).
Is this possible? Thanks very much

All the I/O methods have a byte as the lowest granularity. You can write bits, but you have to pack them into bytes by yourself. Maybe a one-byte buffer that you write out to the file once it fills up would be appropriate.
Also note that there is no way to know the length of the file in bits (you do not know whether the last byte was "full"). So your application needs to take care of that somehow, for example by also storing the number of valid bits.
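A minimal sketch of the one-byte-buffer approach, writing to an in-memory ByteArrayOutputStream so it stays self-contained. BitWriter and BitReader are hypothetical names, not the API of any published BitOutputStream. Bits are packed most significant bit first, and the writer also counts bits so the reader can ignore the zero padding in the last byte.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical BitWriter: packs bits MSB-first into a one-byte buffer and
// flushes each full byte to an in-memory stream.
class BitWriter {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private int buffer = 0;        // bits collected for the current byte
    private int bitsInBuffer = 0;  // how many of those bits are valid (0..7 between calls)
    private long totalBits = 0;    // the file itself only knows its length in bytes

    /** Appends the lowest {@code width} bits of {@code value}, most significant bit first. */
    void write(int value, int width) {
        if (width < 0 || width > 32) {
            throw new IllegalArgumentException("width must be between 0 and 32");
        }
        for (int i = width - 1; i >= 0; i--) {
            buffer = (buffer << 1) | ((value >>> i) & 1);
            bitsInBuffer++;
            totalBits++;
            if (bitsInBuffer == 8) {   // buffer full: emit one byte
                out.write(buffer);
                buffer = 0;
                bitsInBuffer = 0;
            }
        }
    }

    long totalBits() {
        return totalBits;
    }

    /** Pads the last partial byte with zero bits and returns all bytes written. */
    byte[] finish() {
        if (bitsInBuffer > 0) {
            out.write(buffer << (8 - bitsInBuffer));
            buffer = 0;
            bitsInBuffer = 0;
        }
        return out.toByteArray();
    }
}

// Hypothetical BitReader: reads bits back and stops at totalBits, so the
// padding in the last byte is never mistaken for data.
class BitReader {
    private final byte[] data;
    private final long totalBits;
    private long position = 0;

    BitReader(byte[] data, long totalBits) {
        this.data = data;
        this.totalBits = totalBits;
    }

    boolean hasMore(int width) {
        return position + width <= totalBits;
    }

    /** Reads {@code width} bits, most significant bit first. */
    int read(int width) {
        if (!hasMore(width)) {
            throw new IllegalStateException("not enough bits left");
        }
        int value = 0;
        for (int i = 0; i < width; i++) {
            int currentByte = data[(int) (position / 8)] & 0xFF;
            int bit = (currentByte >>> (7 - (int) (position % 8))) & 1;
            value = (value << 1) | bit;
            position++;
        }
        return value;
    }
}
```

Writing 1 in 1 bit and 2 in 2 bits produces the bits 1 1 0, padded to the single byte 0xC0. The reader still has to know that the first value is 1 bit wide and the second is 2 bits wide; a compression format normally encodes that itself (for example with prefix-free codes such as Huffman codes). To put the result in a file you could, for instance, write totalBits() first with DataOutputStream.writeLong and then the bytes returned by finish().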
You can also google for "BitOutputStream"; there are a few implementations around, though not in very common libraries. Maybe just use one of those.
Finally, the file you will be creating will not be a "text" file; it will be very much binary (even more so than usual...).

Related

How is a variable actually stored in memory for Java?

I'm learning about text I/O and binary I/O in Java right now. I read that each value that you write to a file is initially stored in binary. For text I/O, each digit is converted to its corresponding Unicode value and then encoded with the file-specific encoding, such as ASCII. For binary I/O, the binary value is directly represented in the file. For example, 199 would be represented as 0xC7, which in binary is 11000111. Now I'm confused about one part. If a variable is initially stored in a binary format, does each digit represent a separate byte that is stored, or is the entire number stored as a single byte? For example, is 199 originally stored as 0xC7, which would be 11000111 in binary? Or would it be stored in 3 bytes, with each byte representing the binary value of one digit? If it was stored in 3 separate bytes, does binary I/O convert that 3-byte number to a single byte? If it's stored in a single byte, how does text I/O translate that single byte into 3 separate byte values? I'm not sure how best to word this; I hope you can understand what I'm getting at. Thanks
The only thing a computer is capable of dealing with is sets of 0/1 bits, stored in memory or, if you wish, on a storage device. Those bits can be streamed to monitors and converted to characters by graphics hardware. Same story with keyboards: you type a key and a few bits of data are sent to the computer.
Bits are stored in memory and are accessible by memory addresses. The addresses are also sets of bits.
For practical reasons the bits are grouped into bytes, words, long words, ... A byte is the smallest addressable unit on most hardware and historically ended up as a group of 8 bits, which is what nearly all current hardware uses. Modern memory can also store and transfer data in chunks of several bytes. The same goes for disks: you store data there using specific addressing mechanisms. But in any case those are just sets of bits.
What you are confused about is the interpretation of those bits. They can represent integer numbers, floating point numbers, characters, addresses, ... The way they are interpreted only depends on the program which uses them.
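To make the "interpretation" point concrete, here is a small sketch that reads the single byte 0xC7 (binary 11000111, the 199 from the question) in three ways: as Java's signed byte (-57), as an unsigned value (199) and as an ISO-8859-1 character (U+00C7, "Ç"). The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// The same 8 bits read in different ways; only the code decides which
// interpretation applies.
class ByteInterpretation {
    static int asSigned(byte b) {
        return b;              // Java's byte is signed: 0xC7 is -57
    }

    static int asUnsigned(byte b) {
        return b & 0xFF;       // mask off the sign extension: 0xC7 is 199
    }

    static String asLatin1Text(byte b) {
        // ISO-8859-1 maps byte 0xC7 to the character U+00C7
        return new String(new byte[] {b}, StandardCharsets.ISO_8859_1);
    }

    static String asBitString(byte b) {
        String bits = Integer.toBinaryString(b & 0xFF);
        return "00000000".substring(bits.length()) + bits; // left-pad to 8 digits
    }
}
```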
Characters do not exist in the computer. They are just an abstraction provided by programming languages. The programs interpret the bits stored on the computer. There are standards: for example, the ASCII encoding maps English characters plus a few special characters to the numbers 0 to 127. Those fit into a single byte (leaving the numbers 128 to 255 for other uses). A print command will read those bytes one by one and send them to the graphics system to form letters on the screen as specified by the encoding standard. A different encoding scheme will display the same bytes differently.
If you write a program with the "hello world" string in it, the symbols between the quotes end up as 11 bytes in an ASCII-based encoding. (In C, the compiler adds yet another byte, equal to 0, that is the '\0' character rather than the digit '0', to mark the end of the string.) Unicode is yet another way to represent characters; depending on the encoding (UTF-8, UTF-16, ...), a Unicode character takes one or more bytes. There are other schemes as well. One thing to pay attention to: if you write strings to disk using a certain encoding, you should read them back with the same encoding, or your prints will give you garbage. But you can always read and copy them as binary data without interpretation.
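A short sketch of the encoding point using the JDK's StandardCharsets: the same text needs a different number of bytes depending on the encoding, and bytes written as UTF-8 but read back as ISO-8859-1 turn into the kind of garbage described above ("é" becomes "Ã©"). EncodingSizes is a hypothetical helper name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingSizes {
    /** Number of bytes the text occupies when encoded with the given charset. */
    static int byteCount(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    /** Encodes as UTF-8, then (wrongly) decodes the same bytes as ISO-8859-1. */
    static String decodeWithWrongCharset(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }
}
```

With these helpers, "hello world" is 11 bytes in US-ASCII, "é" is 1 byte in ISO-8859-1 but 2 in UTF-8, and "€" is 3 bytes in UTF-8.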
So, any variable of any type is just an abstraction and always consists of bytes of data which your program knows how to interpret based on the data type and/or the operations it wants to perform. Variables of type int, double, or any Java object, including String, are just sets of bytes of different sizes. Only the program (and the Java interpreter is a program) knows what to do with them: use them in calculations or display them as characters.
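Applied to the question's example, a sketch of how 199 can end up in a file. In Java an int is a 32-bit two's-complement value; text output converts it to the decimal digit characters "1", "9", "9" (three ASCII bytes, 0x31 0x39 0x39), while binary output writes the bits themselves, either as the single byte 0xC7 or, with DataOutputStream.writeInt, as four bytes. The digit loop only handles non-negative values, and TextVsBinary is a made-up name.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class TextVsBinary {
    /** Text I/O: split the number into decimal digits, then encode each digit as a character (value >= 0 assumed). */
    static byte[] asText(int value) {
        StringBuilder digits = new StringBuilder();
        do {
            digits.append((char) ('0' + value % 10)); // lowest digit first
            value /= 10;
        } while (value > 0);
        return digits.reverse().toString().getBytes(StandardCharsets.US_ASCII);
    }

    /** Binary I/O, one byte: OutputStream.write(int) keeps only the low 8 bits. */
    static byte[] asSingleByte(int value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        bos.write(value);
        return bos.toByteArray();
    }

    /** Binary I/O, whole int: DataOutputStream.writeInt writes 4 bytes, high byte first. */
    static byte[] asInt(int value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }
}
```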

How to write ints with specific amount of bits in Java

So I want to write integers to a file, each, for example, 10 bits wide, in little-endian format. They also shouldn't be aligned to byte boundaries.
The following image may help you understand the structure.
I looked at ByteBuffer (I'm coding in Java) but it doesn't seem to do this.
This is not possible out of the box. Java doesn't have a bit type, so the closest you are going to get is byte or boolean. You can make a utility class with 10 booleans (as bits) in whatever order you like, but other than that, Java does not provide this functionality.
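That said, packing the values by hand with shifts and masks is usually simpler than a class of booleans. The sketch below assumes that "little-endian" here means LSB-first bit order: the first value occupies the lowest bits of the first byte, and each byte is emitted as soon as 8 bits are available. The image from the question is not available, so that layout is an assumption. LittleEndianBitPacker is a hypothetical name, and it writes to memory rather than to a file.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical packer: fixed-width values, LSB-first, not aligned to byte boundaries.
class LittleEndianBitPacker {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private long accumulator = 0; // pending bits; the oldest bits are the lowest
    private int pendingBits = 0;  // valid bits in the accumulator (0..7 between calls)

    /** Appends the lowest {@code width} bits of {@code value} (1 <= width <= 32). */
    void write(int value, int width) {
        if (width < 1 || width > 32) {
            throw new IllegalArgumentException("width must be between 1 and 32");
        }
        long mask = (1L << width) - 1;
        accumulator |= ((long) value & mask) << pendingBits; // place new bits above the pending ones
        pendingBits += width;
        while (pendingBits >= 8) {
            out.write((int) (accumulator & 0xFF)); // lowest byte goes out first
            accumulator >>>= 8;
            pendingBits -= 8;
        }
    }

    /** Flushes a final partial byte (upper bits zero-padded) and returns all bytes. */
    byte[] finish() {
        if (pendingBits > 0) {
            out.write((int) (accumulator & 0xFF));
            accumulator = 0;
            pendingBits = 0;
        }
        return out.toByteArray();
    }
}
```

Two 10-bit values 0x3FF and 0x001 give the bytes FF 07 00: the last byte holds the remaining 4 bits plus zero padding. The accumulator is a long so that up to 7 pending bits plus a 32-bit value never overflow it.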

Reading amplitude at the beginning and end of a 16 bit WAV file in Java

I'm reading in 16 bit WAV files in Java (apparently always little-endian). I need to check the amplitude at the beginning and end of a WAV file. I'm hoping for silence at the start and end of clips but need to report on a scale if not. The files are always accessible locally. I've read about converting the file to a byte array and that converting each byte to a signed integer representation of the hex gives the amplitude but (if this is the case) I'm confused about how to apply this to audio that would need to be split across 2 bytes per sample. I've also read about bit-shifting but I'm unsure if it's relevant if I use a byte array.
To clarify: I'd rather avoid unnecessary imports if possible (though I could use them), and I don't have to use bytes to divide up the WAV; I only need a reliable way to get the amplitude at particular points in the array (start and end).
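A sketch of the bit-shifting the question mentions, with no imports: a 16-bit little-endian sample is the low byte (masked with 0xFF so it is treated as unsigned) combined with the high byte shifted left by 8 (which keeps its sign). It assumes mono 16-bit PCM and that you pass the offset of the audio data. A canonical WAV header is 44 bytes, but real files can contain extra chunks, so properly locating the "data" chunk is left out. Pcm16, sampleAt and peak are hypothetical names.

```java
// Hypothetical helper for 16-bit signed little-endian PCM (mono assumed).
class Pcm16 {
    /** Combines the two bytes at byteOffset (low byte first) into one signed sample. */
    static int sampleAt(byte[] data, int byteOffset) {
        int low = data[byteOffset] & 0xFF; // treat the low byte as unsigned
        int high = data[byteOffset + 1];    // the high byte carries the sign
        return (high << 8) | low;           // range -32768..32767
    }

    /** Peak absolute amplitude of sampleCount samples from byteOffset, scaled to 0.0..1.0. */
    static double peak(byte[] data, int byteOffset, int sampleCount) {
        int max = 0;
        for (int i = 0; i < sampleCount; i++) {
            int sample = sampleAt(data, byteOffset + 2 * i);
            max = Math.max(max, Math.abs(sample));
        }
        return max / 32768.0;
    }
}
```

For example, Pcm16.peak(data, dataStart, n) gives the level of the first n samples, and, assuming the sample data runs to the end of the array, Pcm16.peak(data, data.length - 2 * n, n) gives the level of the last n. In stereo files the two channels alternate, so consecutive samples of one channel are 4 bytes apart.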

How to parse byte[] (including BCD coded values) to Object in Java

I'd like to know if there is a simple way to "cast" a byte array containing a data-structure of a known layout to an Object. The byte[] consists of BCD packed values, 1 or 2-byte integer values and character values. I'm obtaining the byte[] via reading a file with a FileInputStream.
People who've worked on IBM-Mainframe systems will know what I mean right away - the problem is I have to do the same in Java.
Any suggestions welcome.
No, because the object layout can vary depending on what VM you're using, what architecture the code is running on etc.
Relying on an in-memory representation has always felt brittle to me...
I suggest you look at DataInputStream - that will be the simplest way to parse your data, I suspect.
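A sketch of the DataInputStream approach for a made-up record layout: a 2-byte signed integer, a 1-byte unsigned integer and a 4-byte character field. DataInputStream reads multi-byte integers big-endian, which matches IBM mainframe byte order. Character fields from a mainframe are usually EBCDIC, so you would pass an EBCDIC charset such as IBM037 if your JRE provides it. The layout, RecordParser and its field names are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical fixed layout; replace with the real record layout.
class RecordParser {
    static final class Record {
        final short quantity;
        final int flags;
        final String code;

        Record(short quantity, int flags, String code) {
            this.quantity = quantity;
            this.flags = flags;
            this.code = code;
        }
    }

    static Record parse(byte[] bytes, Charset textCharset) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            short quantity = in.readShort();   // 2 bytes, big-endian (high byte first)
            int flags = in.readUnsignedByte(); // 1 byte, 0..255
            byte[] text = new byte[4];
            in.readFully(text);                // throws EOFException if the record is too short
            return new Record(quantity, flags, new String(text, textCharset));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```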
Not immediately, but you can write one pretty easily if you know exactly what the bytes represent.
To convert a packed BCD number you need to extract the two digits encoded in each byte. The four lower bits encode the low-order digit, which you get by &'ing with 15 (1111 binary). The four upper bits encode the high-order digit, which you get by shifting right 4 bits and &'ing with 15.
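The nibble extraction described above, as a sketch. decodeUnsigned treats every nibble as a digit; decodeSigned additionally follows the IBM packed-decimal convention where the low nibble of the last byte is the sign (0xD or 0xB negative; 0xA, 0xC, 0xE or 0xF positive). Whether your fields carry a sign nibble is an assumption to check against the record layout, and Bcd is a hypothetical class name.

```java
class Bcd {
    private static int digit(int nibble) {
        if (nibble > 9) {
            throw new IllegalArgumentException("not a decimal digit: " + nibble);
        }
        return nibble;
    }

    /** Plain packed BCD: every nibble is a digit, high nibble first. */
    static long decodeUnsigned(byte[] bytes) {
        long value = 0;
        for (byte b : bytes) {
            value = value * 10 + digit((b >> 4) & 0x0F); // upper four bits: shift right 4, & 15
            value = value * 10 + digit(b & 0x0F);        // lower four bits: & 15
        }
        return value;
    }

    /** IBM-style packed decimal: the low nibble of the last byte is the sign. */
    static long decodeSigned(byte[] bytes) {
        long value = 0;
        for (int i = 0; i < bytes.length; i++) {
            value = value * 10 + digit((bytes[i] >> 4) & 0x0F);
            int low = bytes[i] & 0x0F;
            if (i < bytes.length - 1) {
                value = value * 10 + digit(low);
            } else if (low == 0x0D || low == 0x0B) {
                value = -value;                          // negative sign nibble
            } else if (low < 0x0A) {
                throw new IllegalArgumentException("missing sign nibble");
            }
        }
        return value;
    }
}
```

(b >> 4) & 0x0F is safe even though Java's byte is signed: the shift sign-extends, but the mask keeps only the four bits that matter. A long holds up to 18 digits safely; larger fields would need BigInteger or BigDecimal.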
Also note that IBM most likely has tooling available if this is what you are actually doing. For IBM i, look at jt400, the IBM Toolbox for Java.

Are text files, pictures, sound files and video files all stored/transferred as stream of bytes?

What else is represented as a stream of bytes?
At a certain level of abstraction, just about everything is stored, represented or transferred as a sequence or stream of bytes.
Ok, what can be stored/transferred as a System.IO.Stream object in .NET or its counterpart in Java?
Any information that can be represented by a computer can (in theory) be turned into a sequence of bytes and stored / transferred via a byte-oriented I/O stream. You may need to write some software to transform the computer representation of the information into a sequence of bytes that is suitable for transfer via a byte stream. However, any finite representation can be transformed into bytes.
The only things that you cannot represent and transmit as a byte stream are those that only have an infinite representation (e.g. the complete value of Pi, or the set of all prime numbers), and those that have no digital representation (e.g. beauty or Barack Obama).
I don't know about the .NET case, but Java's ObjectOutputStream only works for classes that implement the Serializable or Externalizable interfaces. (And in the former case, all other classes in the non-transient closure of the original object must also implement Serializable.)
Some system classes are not Serializable; for example, Thread, Process, various IO classes and some AWT / Swing related classes such as Graphics. The common theme is that these classes all involve some kind of resource that is managed by the operating system.
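A sketch of both points, using an in-memory ByteArrayOutputStream rather than a file: a small Serializable class round-trips through ObjectOutputStream and ObjectInputStream, while writing a Thread fails with NotSerializableException. SerializationDemo, Point and canSerialize are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SerializationDemo {
    // Implementing Serializable is what allows ObjectOutputStream to write it.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    static byte[] toBytes(Object obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(obj);
        } catch (IOException e) {
            // NotSerializableException is an IOException, so it lands here too
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean canSerialize(Object obj) {
        try {
            toBytes(obj);
            return true;
        } catch (UncheckedIOException e) {
            if (e.getCause() instanceof NotSerializableException) {
                return false;
            }
            throw e;
        }
    }
}
```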
This may be more of a philosophical matter, but anything that you can think of objectively can be stored as a sequence of numbers. Bytes are just one example; you can store it as a sequence of numbers, text characters (because they also translate to numbers), peanuts on a table, anything.
For example, you can represent the same thing either as bytes or as hex digits, which are themselves represented by the decimal digits and the characters A, B, C, D, E and F, such as
#nav{color:#123ABC;}
You can also Base64-encode anything; Base64 means there are 64 possible values per character. You could make up a Base65 if you wanted to, and it would work too.
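The hex and Base64 points in Java, as a sketch: java.util.Base64 (available since Java 8) encodes any bytes as text, and a hex string can be built with String.format. The three bytes 0x12 0x3A 0xBC are the same as the CSS color #123ABC above. Encodings is a hypothetical helper name.

```java
import java.util.Base64;

class Encodings {
    /** Two uppercase hex digits per byte. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF)); // b & 0xFF is the unsigned value 0..255
        }
        return sb.toString();
    }

    static String toBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }
}
```

Base64 turns every 3 input bytes into 4 output characters, so these 3 bytes become a 4-character string, and decoding it gives the original bytes back.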
Then what can be represented? What can you think of? What can you define rationally? All that can be thought can be represented as a stream of numbers. Every file on our hard drives sits in one huge stream of bits, and the concepts of "folders", "files", etc. are just abstractions of offsets in that huge chain of ones and zeroes that we interpret as bytes, ints, chars, etc.
