Reading and Writing a PGM file - java

As a homework assignment we are supposed to read in a .pgm file, draw a square in it by changing the pixel values, and then output the new image.
After I go through and change the pixels, I print them all to a .txt file as a way to check that they actually got added. The part I'm having trouble with is writing the new file. I know it's supposed to be binary, so after googling I think I should be using DataOutputStream, but I could be wrong. After I write the file, its size is 1.9MB whereas the original is only 480KB, so right off the bat I suspect something must be wrong. Any advice or tips for writing to .pgm files would be great!
public static void writeImage(String fileName) throws IOException{
    DataOutputStream writeFile = new DataOutputStream(new FileOutputStream(fileName));
    // Write the .pgm header (P5, 800 600, 250)
    writeFile.writeUTF(type + "\n");
    writeFile.writeUTF(width + " " + height + "\n");
    writeFile.writeUTF(max + "\n");
    for(int i = 0; i < height; i++){
        for(int j = 0; j < width; j++){
            writeFile.writeByte(img[i][j]); //Write the number
            writeFile.writeUTF(" "); //Add white space
        }
        writeFile.writeUTF(" \n"); //finished one line so drop to next
    }
    writeFile.close();
}
When I try to open the new file I get an error message saying "illegal image format", while the original file opens properly.

It can't be right that you use the writeByte method to write a pixel. writeByte will write a single byte even if the argument is of type int. (the eight low-order bits of the argument are written).
You need to read the file format specification carefully and make sure that you are writing out the right number of bytes. A hex editor can help a lot.

I think you are somehow confusing the binary (P5) and ASCII (P2) modes of the PGM format.
The ASCII version has spaces between each pixel and (optional) line feeds, like you have in your code.
For the binary format, you should just write the pixel values (as bytes, as you have max value 250). No spaces or line feeds.
(I won't write the code for you, as this is an assignment, but you are almost there, so I'm sure you'll make it! :-)
PS: Also carefully read the documentation on DataOutputStream.writeUTF(...):
First, two bytes are written to the output stream as if by the writeShort method giving the number of bytes to follow.
Are you sure this is what you want? Keep in mind that the PGM format headers are all ASCII, so there's really no need to use UTF here.
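A minimal sketch of the binary (P5) layout described above, assuming a maximum gray value below 256 so that each pixel fits in one byte. The class name PgmWriter, the method writeP5 and its parameters are hypothetical names chosen for illustration, not part of the original code. The header is written as plain ASCII bytes and the pixels follow with no separators.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes a binary (P5) PGM file.
class PgmWriter {

    // img holds rows of gray values, each in 0..maxVal; assumes maxVal < 256.
    static void writeP5(Path file, int[][] img, int maxVal) throws IOException {
        int height = img.length;
        int width = img[0].length;
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(file))) {
            // The header is plain ASCII text; no length prefix as with writeUTF.
            String header = "P5\n" + width + " " + height + "\n" + maxVal + "\n";
            out.write(header.getBytes(StandardCharsets.US_ASCII));
            // Pixels follow immediately: one byte each, no spaces or line feeds.
            for (int[] row : img) {
                for (int pixel : row) {
                    out.write(pixel); // writes only the low-order 8 bits
                }
            }
        }
    }
}
```

For an 800 x 600 image this produces a 15-byte header plus 480,000 pixel bytes, which is in line with the size of the original file mentioned in the question.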

Related

How to increase I/O performance of this piece of code

How can I make this piece of code extremely quick?
It reads a raw image using RandomAccessFile (in) and writes it to a file using DataOutputStream (out).
final int WORD_SIZE = 4;
byte[] singleValue = new byte[WORD_SIZE];
long position;
for (int i=1; i<=100000; i++)
{
    out.writeBytes(i + " ");
    for(int j=1; j<=17; j++)
    {
        in.seek(position);
        in.read(singleValue);
        String str = Integer.toString(ByteBuffer.wrap(singleValue).order(ByteOrder.LITTLE_ENDIAN).getInt());
        out.writeBytes(str + " ");
        position+=WORD_SIZE;
    }
    out.writeBytes("\n");
}
The inner for creates a new line in the file every 17 elements
Thanks
I assume that the reason you are asking is because this code is running really slowly. If that is the case, then one reason is that each seek and read call is doing a system call. A RandomAccessFile has no buffering. (I'm guessing that singleValue is a byte[] of length 1.)
So the way to make this go faster is to step back and think about what it is actually doing. If I understand it correctly, it is reading each 4th byte in the file, converting them to decimal numbers and outputting them as text, 17 to a line. You could easily do that using a BufferedInputStream like this:
int b = bis.read(); // read a byte
bis.skip(3); // skip 3 bytes.
(with a bit of error checking ....). If you use a BufferedInputStream like this, most of the read and skip calls will operate on data that has already been buffered, and the number of syscalls will reduce to 1 for every N bytes, where N is the buffer size.
UPDATE - my guess was wrong. You are actually reading alternate words, so ...
bis.read(singleValue);
bis.skip(4);
Every 100000 offsets I have to jump 200000 and then do it again till the end of the file.
Use bis.skip(800000) to do that. It should do a big skip by moving the file position without actually reading any data. One syscall at most. (For a FileInputStream, at least.)
You can also speed up the output side by a roughly equivalent amount by wrapping the DataOutputStream around a BufferedOutputStream.
But System.out is already buffered.
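A rough sketch of the buffered approach, assuming the goal is the one in the question's loop: read consecutive 4-byte little-endian words and print them as text, 17 per line, each line prefixed with its number. The class WordDump and the method dump are hypothetical names, and the jumps mentioned in the comments are not implemented here.

```java
import java.io.BufferedInputStream;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.Writer;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: dumps little-endian 32-bit words as decimal text.
class WordDump {
    static final int WORD_SIZE = 4;
    static final int WORDS_PER_LINE = 17;

    static void dump(InputStream rawIn, Writer textOut) throws IOException {
        // Buffer both sides so that most reads and writes never reach the OS.
        DataInputStream in = new DataInputStream(new BufferedInputStream(rawIn, 1 << 16));
        BufferedWriter out = new BufferedWriter(textOut, 1 << 16);
        byte[] word = new byte[WORD_SIZE];
        // A little-endian view over the same array; getInt(0) sees each new word.
        ByteBuffer view = ByteBuffer.wrap(word).order(ByteOrder.LITTLE_ENDIAN);
        int lineNo = 1;
        int column = 0;
        while (true) {
            try {
                in.readFully(word); // fills all 4 bytes or throws at end of input
            } catch (EOFException e) {
                break; // a trailing partial word, if any, is dropped
            }
            if (column == 0) {
                out.write(lineNo + " ");
            }
            out.write(view.getInt(0) + " ");
            if (++column == WORDS_PER_LINE) {
                out.write('\n');
                column = 0;
                lineNo++;
            }
        }
        if (column != 0) {
            out.write('\n'); // terminate a final short line
        }
        out.flush();
    }
}
```

If parts of the input have to be skipped, note that InputStream.skip returns the number of bytes actually skipped, which can be less than requested, so it is worth checking the return value in a loop.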

How to read x number of characters from a text file progressively

I am trying to read x characters from a text file at a time, progressively. So if I had aaaaabbbbbcccccabckcka and I'm reading 5 at a time, I would get aaaaa, bbbbb, ccccc, abckc and ka. The code I am using is:
status = is.read(bytes);
text = new String(bytes);
where bytes is: bytes = new byte[5]. I call these two lines of code until status becomes -1. The problem I am facing is that the output is not what I mentioned above; instead I get this:
aaaaa, bbbbb, ccccc, abckc and kackc. Notice that the last segment 'kackc' is garbage. Why is this happening?
Note that bytes is initialized once, outside the reading loop.
Your current solution works for ASCII, but many characters in other encodings use more than one byte. You should use a Reader and a char[] instead of an InputStream and a byte[], respectively.
It turns out I need to clear my byte buffer every time I read new input. I just used a for loop to zero it out and it worked.
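An alternative to zeroing the buffer is to use the count that read returns: the last read fills only the first two positions, and the remaining three still hold characters from the previous chunk, which is where 'kackc' comes from. The sketch below follows the Reader and char[] suggestion; the names ChunkReader and readChunks are hypothetical, and size is assumed to be positive.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a character stream into fixed-size chunks.
class ChunkReader {

    static List<String> readChunks(Reader in, int size) throws IOException {
        List<String> chunks = new ArrayList<>();
        char[] buf = new char[size];
        int filled = 0;
        int n;
        // read may deliver fewer chars than requested, so keep filling the buffer.
        while ((n = in.read(buf, filled, size - filled)) != -1) {
            filled += n;
            if (filled == size) {
                chunks.add(new String(buf, 0, filled));
                filled = 0;
            }
        }
        if (filled > 0) {
            // Final short chunk: convert only the chars that were actually read.
            chunks.add(new String(buf, 0, filled));
        }
        return chunks;
    }
}
```

Called with a reader over aaaaabbbbbcccccabckcka and a size of 5, this returns aaaaa, bbbbb, ccccc, abckc and ka.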

How do you write any ASCII character to a file in Java?

Basically I'm trying to use a BufferedWriter to write to a file using Java. The problem is, I'm actually doing some compression, so I generate ints between 0 and 255, and I want to write the character whose ASCII value is equal to that int. When I try writing to the file, it writes many ? characters, so when I read the file back in, it reads those as 63, which is clearly not what I want. Any ideas how I can fix this?
Example code:
int a = generateCode(character); //a now has an int between 0 and 255
bw.write((char) a);
a is always between 0 and 255, but it sometimes writes '?'.
You are really trying to write / read bytes to / from a file.
When you are processing byte-oriented data (as distinct from character-oriented data), you should be using InputStream and OutputStream classes and not Reader and Writer classes.
In this case, you should use FileInputStream / FileOutputStream, and wrap with a BufferedInputStream / BufferedOutputStream if you are doing byte-at-a-time reads and writes.
Those pesky '?' characters are due to issues in the encoding/decoding process that happens when Java converts between characters and the default text encoding for your platform. The conversion from bytes to characters and back is often "lossy", depending on the encoding scheme used. You can avoid this by using the byte-oriented stream classes.
(And the answers that point out that ASCII is a 7-bit not 8-bit character set are 100% correct. You are really trying to read / write binary octets, not characters.)
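A small sketch of the byte-oriented approach, assuming the codes are ints in the range 0 to 255. The class ByteRoundTrip and its methods are hypothetical names for illustration. OutputStream.write(int) stores the low-order 8 bits of its argument, and InputStream.read() returns each byte as an int from 0 to 255, so no character encoding is involved.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: stores codes 0..255 as raw bytes and reads them back.
class ByteRoundTrip {

    static void writeCodes(Path file, int[] codes) throws IOException {
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(file))) {
            for (int code : codes) {
                out.write(code); // one byte per code, no encoding step
            }
        }
    }

    static int[] readCodes(Path file) throws IOException {
        int[] codes = new int[(int) Files.size(file)];
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            for (int i = 0; i < codes.length; i++) {
                codes[i] = in.read(); // 0..255, or -1 if the file ends early
            }
        }
        return codes;
    }
}
```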
You need to make up your mind what are you really doing. Are you trying to write some bytes to a file, or are you trying to write encoded text? Because these are different concepts in Java; byte I/O is handled by subclasses of InputStream and OutputStream, while character I/O is handled by subclasses of Reader and Writer. If what you really want to write is bytes to a file (which I'm guessing from your mention of compression), use an OutputStream, not a Writer.
Then there's another confusion you have, which is evident from your mention of "ASCII characters from 0-255." There are no ASCII characters above 127. Please take 15 minutes to read this: "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" (by Joel Spolsky). Pay particular attention to the parts where he explains the difference between a character set and an encoding, because it's critical for understanding Java I/O. (To review whether you understood, here's what you need to learn: Java Writers are classes that translate character output to byte output by applying a client-specified encoding to the text, and sending the bytes to an OutputStream.)
Java strings are based on 16-bit-wide characters, and Java tries to perform conversions around that assumption if there is no clear specification.
The following sample code writes and reads data directly as bytes, meaning 8-bit numbers which have an ASCII meaning associated with them.
import java.io.*;

public class RWBytes {
    public static void main(String[] args) throws IOException {
        String filename = "MiTestFile.txt";
        byte[] bArray1 = new byte[5];
        byte[] bArray2 = new byte[5];
        bArray1[0] = 65; // A
        bArray1[1] = 66; // B
        bArray1[2] = 67; // C
        bArray1[3] = 68; // D
        bArray1[4] = 69; // E
        // Write the five raw bytes to the file.
        FileOutputStream fos = new FileOutputStream(filename);
        fos.write(bArray1);
        fos.close();
        // Read them back as raw bytes; read() returns the number of bytes actually read.
        FileInputStream fis = new FileInputStream(filename);
        fis.read(bArray2);
        fis.close();
        for (int i = 0; i < bArray2.length; i++) {
            System.out.println("As the byte value: " + bArray2[i]); // the numeric byte value
            System.out.println("Converted to char for printing to the screen: " + String.valueOf((char) bArray2[i]));
        }
    }
}
A fixed subset of the 7-bit ASCII code is printable (A = 65, for example). Code 10 is the "new line" character, which moves down one line on screen when it is "printed". Many other codes manipulate a character-oriented screen; these are invisible and affect the screen representation, like tabs, spaces, etc. There are also other control characters, one of which had the purpose of ringing a bell, for example.
The upper half of the 8-bit range, above 127, is defined as whatever the implementer wanted; only the lower half has standard meanings associated with it.
For general binary byte handling there are no such qualms: the bytes are numbers that represent the data. Only when you try to print them to the screen do they become meaningful in all kinds of ways.

How do we determine the number of lines in a text file?

Hi all, I have a local file which looks like this:
AAA Anaa
AAC EL-ARISH
AAE Annaba
AAF APALACHICOLA MUNI AIRPORT
AAG ARAPOTI
AAL Aalborg Airport
AAM Mala Mala
AAN Al Ain
AAQ Anapa
AAR Aarhus Tirstrup Airport
AAT Altay
AAX Araxa
AAY Al Ghaydah
...
Java Tutorials suggests estimating the number of lines in a file by calling java.io.File.length and dividing the result by 50.
But isn't there a more "solid" way to get the number of lines in a text file (yet without having to pay for the overhead of reading the entire file)?
Can't you just read the file with a FileReader and count the number of lines read?
int lines = 0;
BufferedReader br = new BufferedReader(new FileReader("foo.in"));
while (br.readLine() != null) {
    lines++;
}
br.close();
The benefit to the estimation algorithm you've got is that it is very fast: one stat(2) call and then some division. It'll take the same length of time and memory no matter how large or small the file is. But it's also vastly wrong on a huge number of inputs.
Probably the best way to get the specific number is to actually read through the entire file looking for '\n' characters. If you read the file in in large binary blocks (think 16384 bytes or a larger power of two) and look for the specific byte you're interested in, it can go at something approaching the disk IO bandwidth.
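The block-scanning idea can be sketched as follows, with a 16384-byte block as suggested above. LineCounter and countNewlines are hypothetical names. The method counts '\n' bytes, so a final line without a trailing newline is not counted.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: counts newline bytes by scanning large binary blocks.
class LineCounter {

    static long countNewlines(InputStream in) throws IOException {
        byte[] block = new byte[16384];
        long count = 0;
        int n;
        // Each read fetches up to one full block; n is the number of valid bytes.
        while ((n = in.read(block)) != -1) {
            for (int i = 0; i < n; i++) {
                if (block[i] == '\n') {
                    count++;
                }
            }
        }
        return count;
    }
}
```

A typical call would pass a FileInputStream opened on the file. No extra buffering wrapper is needed, because the method already reads in large blocks.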
You need to use a BufferedReader and a counter that is incremented by 1 for each readLine() call.

How do I convert a file's line number to a byte offset (or get the byte offset of the beginning of each line with a BufferedReader)?

I'm using a FileReader wrapped in a LineNumberReader to index a large text file for speedy access later on. Trouble is I can't seem to find a way to read a specific line number directly. BufferedReader supports the skip() function, but I need to convert the line number to a byte offset (or index the byte offset in the first place).
I took a crack at it using RandomAccessFile, and while it worked, it was horribly slow during the initial indexing. BufferedReader's speed is fantastic, but... well, you see the problem.
Some key info:
The file can be any size (currently 35,000 lines)
It's stored on Android's internal filesystem (via getFilesDir() to be exact)
The formatting is not fixed width, unfortunately (hence the need to read by line)
Any ideas?
Describes an extended RandomAccessFile with buffering semantics
Trouble is I can't seem to find a way to read a specific line number directly
Unless you know the length of each line you can't read it directly
There is no shortcut; you will need to read the entire file up front and calculate the offsets manually.
I would just use a BufferedReader and then get the length of each string and add 1 (or 2?) for the EOL string.
Consider saving a file index along with the large text file. If this file is something you are generating, either on your server or on the device, it should be trivial to generate an index once and distribute and/or save it along with the file.
I'd recommend an int[] where each value is the absolute offset in bytes for the n*(index+1)th line. So you could have an array of size 35,000 with the start of each line, or an array of size 350 with the start of every 100th line.
Here's an example assuming you have an index file containing a raw sequence of int values:
public String getLineByNumber(RandomAccessFile index,
                              RandomAccessFile data,
                              int lineNum) throws IOException {
    index.seek(lineNum * 4L); // each index entry is a 4-byte int
    data.seek(index.readInt());
    return data.readLine();
}
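One way such an index file could be produced is sketched below, assuming lines end with '\n' and the data file is smaller than 2 GB, so that offsets fit in an int. LineIndexBuilder and buildIndex are hypothetical names. The scan goes through a BufferedInputStream, so reading one byte at a time does not cost a system call per byte, and offsets are written with DataOutputStream.writeInt, which uses the same big-endian layout that RandomAccessFile.readInt expects.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes the byte offset of every line start as a 4-byte int.
class LineIndexBuilder {

    static void buildIndex(Path data, Path index) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(data));
             DataOutputStream out = new DataOutputStream(
                     new BufferedOutputStream(Files.newOutputStream(index)))) {
            int offset = 0;             // byte position of the next byte to read
            boolean atLineStart = true; // true when the next byte begins a line
            int b;
            while ((b = in.read()) != -1) {
                if (atLineStart) {
                    out.writeInt(offset); // record where this line begins
                    atLineStart = false;
                }
                offset++;
                if (b == '\n') {
                    atLineStart = true;
                }
            }
        }
    }
}
```

Because the offsets are counted in bytes rather than chars, the index stays correct even when the text contains multi-byte characters.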
I took a crack at it using RandomAccessFile, and while it worked, it was horribly slow during the initial indexing
You've started the hard part already. Now for the harder part.
BufferedReader's speed is fantastic, but...
Is there something in your use of RandomAccessFile that made it slower than it has to be? How many bytes did you read at a time? If you read one byte at a time it will be sloooooow. If you read in an array of bytes at a time, you can speed things up and use the byte array as a buffer.
Just wrapping up the previous comments:
Either you use RandomAccessFile to count bytes first and then parse what you read to find lines by hand, or you use a LineNumberReader to read line by line and count the bytes of each line of chars (2 bytes per char in UTF-16?) by hand.
