I've looked at this every way I can think of... The problem is that I end up writing the PERFECT number of bytes, and the files are VERY similar - but some bytes are different. I opened the Java-generated file in SciTE as well as the original, and even though they are close, they are not the same. Is there any way to fix this? I've tried everything I can think of - different wrappers, readers, writers, and different methods of taking the byte array (or taking it as chars - I tried both) and turning it into a file.
The image in question, for the test, is at http://www.google.com/images/srpr/nav_logo13.png. Here's the code:
import java.awt.Image;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import javax.imageio.ImageIO;
public class ImgExample
{
    private String address = "http://www.google.com";

    /**
     * Returns a 3 dimensional array that holds the RGB values of each pixel at the position of the current
     * webcam picture. For example, getPicture()[1][2][3] is the pixel at (2,1) and the BLUE value.
     * [row][col][0] is alpha
     * [row][col][1] is red
     * [row][col][2] is green
     * [row][col][3] is blue
     */
    public int[][][] getPicture()
    {
        Image camera = null;
        try {
            int maxChars = 35000;
            //The image in question is 28,736 bytes, but I want to make sure it's bigger
            //for testing purposes as in my case, it's an image stream so it's unpredictable
            byte[] buffer = new byte[maxChars];
            //create the connection
            HttpURLConnection conn = (HttpURLConnection)(new URL(this.address+"/images/srpr/nav_logo13.png")).openConnection();
            conn.setUseCaches(false);
            //wrap a buffer around our input stream
            BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()));
            int bytesRead = 0;
            while ( bytesRead < maxChars && reader.ready() )
            {
                //reader.read returns an int - I'm assuming this is okay?
                buffer[bytesRead] = (byte)reader.read();
                bytesRead++;
                if ( !reader.ready() )
                {
                    //This is here to make sure the stream has time to download the next segment
                    Thread.sleep(10);
                }
            }
            reader.close();
            //Great, write out the file for viewing
            File writeOutFile = new File("testgoog.png");
            if ( writeOutFile.exists() )
            {
                writeOutFile.delete();
                writeOutFile.createNewFile();
            }
            FileOutputStream fout = new FileOutputStream(writeOutFile, false);
            //FileWriter fout = new FileWriter(writeOutFile, false);
            //needed to make sure I was actually reading 100% of the file in question
            System.out.println("Bytes read = "+bytesRead);
            //write out the byte buffer from the first byte to the end of all the chars read
            fout.write(buffer, 0, bytesRead);
            fout.flush();
            fout.close();
            //Finally use a byte stream to create an image
            ByteArrayInputStream byteImgStream = new ByteArrayInputStream(buffer);
            camera = ImageIO.read(byteImgStream);
            byteImgStream.close();
        } catch ( Exception e ) { e.printStackTrace(); }
        return ImgExample.imageToPixels(camera);
    }

    public static int[][][] imageToPixels (Image image)
    {
        //there's a bunch of code here that works in the real program, no worries
        //it creates a 3d arr that goes [x][y][alpha, r, g, b val]
        //e.g. imageToPixels(camera)[1][2][3] gives the pixel's blue value for row 1 col 2
        return new int[][][]{{{-1,-1,-1}}};
    }

    public static void main(String[] args)
    {
        ImgExample ex = new ImgExample();
        ex.getPicture();
    }
}
The problem as I see it is that you're using Readers. In Java, Readers are for processing character streams, not binary streams, and the character conversions that it does are most likely what's changing your bytes on you.
Instead, you should read() from the InputStream directly. InputStream's read() will block until data is available, but returns -1 when the end of the stream is reached.
Edit: You can also wrap the InputStream in a BufferedInputStream.
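A minimal sketch of that download loop, assuming the same conn as in the question (the 8 KB chunk size and the ByteArrayOutputStream are illustrative choices, not requirements):

InputStream in = new BufferedInputStream(conn.getInputStream());
ByteArrayOutputStream bytes = new ByteArrayOutputStream();
byte[] chunk = new byte[8192];
int count;
while ((count = in.read(chunk)) != -1) // blocks until data arrives; -1 means end of stream
{
    bytes.write(chunk, 0, count);
}
in.close();
byte[] buffer = bytes.toByteArray(); // exactly the bytes the server sent, no charset damage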
BufferedReader is intended for reading character streams, not byte/binary streams.
BufferedReader.read() returns the character read, as an integer in the range 0 to 65535. That value is the result of charset decoding: the InputStreamReader may combine several bytes into one character, or substitute a replacement character for byte sequences that are invalid in the charset, silently corrupting binary data.
I think you want to use InputStream.read() directly, not wrapped in a BufferedReader/InputStreamReader.
And finally, not related to the problem, but if you open a FileOutputStream with append=false, there isn't really any point in deleting a file that already exists - append=false truncates it anyway.
I think your problem is you are using an InputStreamReader. From the javadocs
An InputStreamReader is a bridge from byte streams to character streams: It reads bytes and decodes them into characters using a specified charset. The charset that it uses may be specified by name or may be given explicitly, or the platform's default charset may be accepted.
You don't want the conversion to character streams.
And you also shouldn't be using ready() like that. You are just wasting time with that and the sleeps. read() will block until data arrives anyway, and it will block the correct length of time, not an arbitrary guess. The canonical copy loop in Java goes like this:
int count;
byte[] buffer = new byte[8192]; // or whatever size you like
while ((count = in.read(buffer)) > 0)
{
    out.write(buffer, 0, count);
}
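Applied to the question, the whole fetch-and-save collapses to something like this (try-with-resources is just a tidy way to get the close() calls right; the file name comes from the question):

try (InputStream in = conn.getInputStream();
     FileOutputStream out = new FileOutputStream("testgoog.png"))
{
    byte[] buffer = new byte[8192];
    int count;
    while ((count = in.read(buffer)) > 0)
    {
        out.write(buffer, 0, count);
    }
}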
I'm working on a .opus music library software which converts audio/video files to .opus files and tags them with metadata automatically.
Previous versions of the program have apparently saved the album art as binary data, as revealed by exiftool.
The thing is that when I run the command to output the data as binary using the -b option, the entire output is seemingly binary. I'm not sure how to get the program to parse it. I was kind of expecting an entry like Picture : 11010010101101101011...., but instead the output is just the raw, unprintable bytes of the image.
How can I parse the picture data so I can reconstruct the image for newer versions of the program? (I'm using Java8_171 on Kubuntu 18.04)
It looks like you're trying to open the raw bytes in a text editor, which will of course give you gobbledygook, since those raw bytes do not represent characters that any text editor can display. I can see from your exiftool output that you know the length of the image in bytes. Provided you also know the byte position where the image starts in the file, this makes your task relatively easy with a little bit of Java code. If you can get the starting position of the image inside your file, you should be able to do something like:
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.*;
public class SaveImage {
    public static void main(String[] args) throws IOException {
        byte[] imageBytes;
        try (RandomAccessFile binaryReader =
                     new RandomAccessFile("your-file.xxx", "r")) {
            int dataLength = 0; // Assign this the byte length shown in your
                                // post instead of zero
            int startPos = 0;   // I assume you can find this somehow.
                                // If it's not at the beginning
                                // change it accordingly.
            imageBytes = new byte[dataLength];
            // Jump to the image's first byte in the file, then fill the whole array.
            // (read(byte[], off, len) would treat startPos as an offset into the
            // array, not into the file, and might not fill the buffer completely.)
            binaryReader.seek(startPos);
            binaryReader.readFully(imageBytes);
        }
        try (InputStream in = new ByteArrayInputStream(imageBytes)) {
            BufferedImage bImageFromConvert = ImageIO.read(in);
            ImageIO.write(bImageFromConvert,
                    "jpg", // or whatever file format is appropriate
                    new File("/path/to/your/file.jpg"));
        }
    }
}
I'm trying to learn about RandomAccessFile but after creating a test program I'm getting some bizarre output.
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
public class RandomAccessFileTest
{
    public static void main(String[] args) throws IOException
    {
        // Create a new blank file
        File file = new File("RandomAccessFileTest.txt");
        file.createNewFile();

        // Open the file in read/write mode
        RandomAccessFile randomfile = new RandomAccessFile(file, "rw");

        // Write stuff
        randomfile.write("Hello World".getBytes());

        // Go to a location
        randomfile.seek(0);

        // Get the pointer to that location
        long pointer = randomfile.getFilePointer();
        System.out.println("location: " + pointer);

        // Read a char (two bytes?)
        char letter = randomfile.readChar();
        System.out.println("character: " + letter);

        randomfile.close();
    }
}
This program prints out
location: 0
character: ?
Turns out that the value of letter was '䡥' when it should be 'H'.
I've found a question similar to this, and apparently this is caused by reading one byte instead of two, but it didn't explain how exactly to fix it.
You've written "Hello World" in the platform default encoding - which is likely to use a single byte per character.
You're then reading RandomAccessFile.readChar which always reads two bytes. Documentation:
Reads a character from this file. This method reads two bytes from the file, starting at the current file pointer. If the bytes read, in order, are b1 and b2, where 0 <= b1, b2 <= 255, then the result is equal to:
(char)((b1 << 8) | b2)
This method blocks until the two bytes are read, the end of the stream is detected, or an exception is thrown.
So H and e are being combined into a single character - H is U+0048, e is U+0065, so assuming they've been written as ASCII characters, you're reading bytes 0x48 and 0x65 and combining them into U+4865, which is a Han character for "a moving cart".
Basically, you shouldn't be using readChar to try to read this data.
Usually to read a text file, you want an InputStreamReader (with an appropriate encoding) wrapping an InputStream (e.g. a FileInputStream). It's not really ideal to try to do this with RandomAccessFile - you could read data into a byte[] and then convert that into a String but there are all kinds of subtleties you'd need to think about.
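A small sketch of both halves of that, assuming the file really does contain single-byte ASCII text as written above (StandardCharsets needs Java 7+, and the RandomAccessFile/charset imports are implied):

// How the two bytes fuse: 0x48 ('H') and 0x65 ('e') become one char
char fused = (char) ((0x48 << 8) | 0x65);
System.out.println(fused); // prints the Han character U+4865

// Reading the bytes back and decoding them explicitly avoids readChar() entirely
RandomAccessFile raf = new RandomAccessFile("RandomAccessFileTest.txt", "r");
byte[] bytes = new byte[(int) raf.length()];
raf.readFully(bytes);
raf.close();
String text = new String(bytes, StandardCharsets.US_ASCII); // assumed encoding
System.out.println(text.charAt(0)); // 'H'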
I do not have experience using Java channels. I would like to write a byte array to a file. Currently, I have the following code:
String outFileString = DEFAULT_DECODED_FILE; // Valid file pathname
FileSystem fs = FileSystems.getDefault();
Path fp = fs.getPath(outFileString);
FileChannel outChannel = FileChannel.open(fp, EnumSet.of(StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE));
// Please note: result.getRawBytes() returns a byte[]
ByteBuffer buffer = ByteBuffer.allocate(result.getRawBytes().length);
buffer.put(result.getRawBytes());
outChannel.write(buffer); // File successfully created/truncated, but no data
With this code, the output file is created, and truncated if it exists. Also, in the IntelliJ debugger, I can see that buffer contains data. Also, the line outChannel.write() is successfully called without throwing an exception. However, after the program exits, the data does not appear in the output file.
Can somebody (a) tell me if the FileChannel API is an acceptable choice for writing a byte array to a file, and (b) if so, how should the above code be modified to get it to work?
As gulyan points out, you need to flip() your byte buffer before writing it. Alternately, you could wrap your original byte array:
ByteBuffer buffer = ByteBuffer.wrap(result.getRawBytes());
To guarantee the write is on disk, you need to use force():
outChannel.force(false);
Or you could close the channel:
outChannel.close();
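Putting that together, a corrected sketch of the snippet from the question (result.getRawBytes() is the questioner's own method; everything else is standard NIO):

Path fp = FileSystems.getDefault().getPath(outFileString);
try (FileChannel outChannel = FileChannel.open(fp,
        StandardOpenOption.CREATE,
        StandardOpenOption.TRUNCATE_EXISTING,
        StandardOpenOption.WRITE)) {
    // wrap() leaves position = 0 and limit = array length, so the buffer is ready to drain
    ByteBuffer buffer = ByteBuffer.wrap(result.getRawBytes());
    outChannel.write(buffer);
} // close() releases the channel; call force(false) first if you need a flush-to-disk guarantee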
You should call:
buffer.flip();
before the write.
This prepares the buffer for reading.
Also, if you reuse the buffer, you should call
buffer.clear();
before putting the next batch of data into it.
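A compact illustration of that put/flip/clear cycle (the sizes are arbitrary):

ByteBuffer buffer = ByteBuffer.allocate(64);
buffer.put(new byte[]{1, 2, 3}); // position = 3, limit = 64
buffer.flip();                   // position = 0, limit = 3: ready to be written out
// channel.write(buffer) would now consume exactly the three bytes that were put
buffer.clear();                  // position = 0, limit = 64: ready to be filled again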
To answer your first question
tell me if the FileChannel API is an acceptable choice for writing a byte array to a file
It's OK, but there are simpler ways. Try using a FileOutputStream. Typically this would be wrapped in a BufferedOutputStream for performance, but the key is that both of these extend OutputStream, which has a simple write(byte[]) method. That is much easier to work with than the channel/buffer API.
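For comparison, a sketch of the same write with plain streams (again borrowing result.getRawBytes() and outFileString from the question):

try (OutputStream out = new BufferedOutputStream(
        new FileOutputStream(outFileString))) {
    out.write(result.getRawBytes()); // one call writes the whole array
}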
Here is a complete example of FileChannel.
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class FileChannelTest {
    // This is the directory where the write operation will be done.
    private static final String FILER_LOCATION = "C:\\documents\\test";
    // This is the text message to be written into the file.
    private static final String MESSAGE_WRITE_ON_FILER = "Operation has been committed.";

    public static void main(String[] args) throws FileNotFoundException {
        // Initialize the file and file channel
        RandomAccessFile randomAccessFileOutputFile = null;
        FileChannel outputFileChannel = null;
        try {
            // Create a random access file with 'rw' permission.
            randomAccessFileOutputFile = new RandomAccessFile(FILER_LOCATION + File.separator + "readme.txt", "rw");
            outputFileChannel = randomAccessFileOutputFile.getChannel();
            // Convert the message into a byte array to write into the FileChannel.
            final byte[] bytes = (MESSAGE_WRITE_ON_FILER + System.lineSeparator()).getBytes();
            // Define a new buffer with the required capacity.
            ByteBuffer buffer = ByteBuffer.allocate(bytes.length);
            // Put the byte array into the buffer.
            buffer.put(bytes);
            // Flip the buffer: this sets the position back to zero so the write starts at the beginning.
            buffer.flip();
            // Writes a sequence of bytes to this channel from the given buffer.
            outputFileChannel.write(buffer);
            System.out.println("File Write Operation is done!!");
        } catch (IOException ex) {
            System.out.println("Oops, unable to proceed with the file write operation due to -> " + ex.getMessage());
        } finally {
            try {
                outputFileChannel.close();
                randomAccessFileOutputFile.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
I have written code to split a .gz file into user-specified parts using a byte[] array. But the loop is not reading/writing the last part of the parent file, which is smaller than the array size. Can you please help me fix this?
package com.bitsighttech.collection.packaging;
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.log4j.Logger;
public class FileSplitterBytewise
{
    private static Logger logger = Logger.getLogger(FileSplitterBytewise.class);

    private static final long KB = 1024;
    private static final long MB = KB * KB;

    private FileInputStream fis;
    private FileOutputStream fos;
    private DataInputStream dis;
    private DataOutputStream dos;

    public boolean split(File inputFile, String splitSize)
    {
        int expectedNoOfFiles = 0;
        try
        {
            double parentFileSizeInB = inputFile.length();
            Pattern p = Pattern.compile("(\\d+)\\s([MmGgKk][Bb])");
            Matcher m = p.matcher(splitSize);
            m.matches();
            String FileSizeString = m.group(1);
            String unit = m.group(2);
            double FileSizeInMB = 0;
            try {
                if (unit.toLowerCase().equals("kb"))
                    FileSizeInMB = Double.parseDouble(FileSizeString) / KB;
                else if (unit.toLowerCase().equals("mb"))
                    FileSizeInMB = Double.parseDouble(FileSizeString);
                else if (unit.toLowerCase().equals("gb"))
                    FileSizeInMB = Double.parseDouble(FileSizeString) * KB;
            } catch (NumberFormatException e) {
                logger.error("invalid number [" + FileSizeInMB + "] for expected file size");
            }
            double fileSize = FileSizeInMB * MB;
            int fileSizeInByte = (int) Math.ceil(fileSize);
            double noOFFiles = parentFileSizeInB / fileSizeInByte;
            expectedNoOfFiles = (int) Math.ceil(noOFFiles);
            int splinterCount = 1;
            fis = new FileInputStream(inputFile);
            dis = new DataInputStream(new BufferedInputStream(fis));
            fos = new FileOutputStream("F:\\ff\\" + "_part_" + splinterCount + "_of_" + expectedNoOfFiles);
            dos = new DataOutputStream(new BufferedOutputStream(fos));
            byte[] data = new byte[(int) fileSizeInByte];
            while ( splinterCount <= expectedNoOfFiles ) {
                int i;
                for (i = 0; i < data.length - 1; i++)
                {
                    data[i] = dis.readByte();
                }
                dos.write(data);
                splinterCount++;
            }
        }
        catch (Exception e)
        {
            logger.error("Unable to split the file " + inputFile.getName() + " in to " + expectedNoOfFiles);
            return false;
        }
        logger.debug("Successfully split the file [" + inputFile.getName() + "] in to " + expectedNoOfFiles + " files");
        return true;
    }

    public static void main(String args[])
    {
        String FilePath1 = "F:\\az.gz";
        File file = new File(FilePath1);
        FileSplitterBytewise fileSplitter = new FileSplitterBytewise();
        String splitlen = "1 MB";
        fileSplitter.split(file, splitlen);
    }
}
I'd suggest to make more methods. You've got a complicated string-handling section of code in split(); it would be best to make a method that takes the human-friendly string as input and returns the number you're looking for. (It would also make it far easier for you to test this section of the routine; there's no way you can test it now.)
Once it is split off and you're writing test cases, you'll probably find that the error message you generate if the string doesn't contain kb, mb, or gb is extremely confusing -- it blames the number 0 for the mistake rather than pointing out the string does not have the expected units.
Using an int to store the file size means your program will never handle files larger than two gigabytes. You should stick with long or double. (double feels wrong for something that is actually confined to integer values but I can't quickly think why it would fail.)
byte[] data = new byte[(int) fileSizeInByte];
Allocating several gigabytes like this is going to destroy your performance -- that's a potentially huge memory allocation (and one that might be considered under control of an adversary; depending upon your security model, this might or might not be a big deal). Don't try to work with the entire file in one piece.
You appear to be reading and writing the files one byte at a time. That's a guarantee to very slow performance. Doing some performance testing for another question earlier today, I found that my machine could read (from a hot cache) 2000 times faster using 131kb blocks than two-byte blocks. One-byte blocks would be even worse. A cold cache would be significantly worse for such small sizes.
fos = new FileOutputStream("F:\\ff\\" + "_part_" + splinterCount + "_of_" + expectedNoOfFiles);
You only appear to ever open one file output stream. Your post probably should have said "only the first works", because it looks like you've not yet tried it on a file that creates three or more pieces.
catch(Exception e)
At this point, you've got the ability to discover errors in your program; you choose to ignore them completely. Sure, you log an error message, but you cannot actually debug your program with the data you log. You should log at a minimum the exception type, message, and maybe even full stack-trace. This combination of data is immensely useful when trying to solve problems, especially in a few months when you've forgotten the details of how it works.
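For instance, log4j's error(Object, Throwable) overload preserves the exception type, message, and stack trace for one extra argument:

catch (Exception e)
{
    logger.error("Unable to split the file " + inputFile.getName()
            + " in to " + expectedNoOfFiles, e); // the Throwable argument logs the full stack trace
    return false;
}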
Can you please help me in fixing this?
I would suggest the following:
drop the DataInput/OutputStreams, you don't need them.
use in.read(data) to read a whole block instead of one byte at a time. Reading one byte at a time is much slower!
or, if you keep the byte loop, read the whole of the data array - you are currently reading one byte less than its length.
stop when you reach the end of the file; it might not be a whole multiple of the block size.
only write as much as you have read: if your blocks are 1 MB and there are 100 KB left, you should only read/write 100 KB at the end.
close your files when you have finished, especially as you have a buffered stream.
your split() writes everything to the same output file (so it's not actually splitting). You need to create, write to, and close the output files in a loop.
don't use fields where you could/should be using local variables.
use the length as a long in bytes.
the result of m.matches() is ignored, so incorrect input such as 1 G or 1 k is never rejected cleanly; it just fails later with a confusing error instead of being reported as bad units.
A sketch pulling these points together follows.
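Here is a minimal sketch along those lines; the part-file naming echoes the question, the 64 KB block size is an arbitrary choice, size parsing and logging are left out, and the usual java.io imports are assumed:

public static void split(File inputFile, long splitSizeInBytes) throws IOException {
    byte[] data = new byte[64 * 1024];              // read in blocks, not one byte at a time
    int part = 1;
    long writtenToPart = 0;
    try (InputStream in = new BufferedInputStream(new FileInputStream(inputFile))) {
        OutputStream out = new BufferedOutputStream(
                new FileOutputStream(inputFile.getName() + "_part_" + part));
        int count;
        while ((count = in.read(data)) > 0) {       // read() returns -1 at end of file
            int offset = 0;
            while (offset < count) {
                if (writtenToPart == splitSizeInBytes) {
                    out.close();                    // finish the current part...
                    part++;
                    out = new BufferedOutputStream( // ...and start the next one
                            new FileOutputStream(inputFile.getName() + "_part_" + part));
                    writtenToPart = 0;
                }
                int n = (int) Math.min(count - offset, splitSizeInBytes - writtenToPart);
                out.write(data, offset, n);         // write only what was actually read
                offset += n;
                writtenToPart += n;
            }
        }
        out.close();                                // the last part may be smaller than the rest
    }
}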
So I have large (around 4 gigs each) txt files in pairs and I need to create a 3rd file which would consist of the 2 files in shuffle mode. The following equation presents it best:
3rdfile = (4 lines from file 1) + (4 lines from file 2), and this is repeated until I hit the end of file 1 (both input files will have the same length - this is by definition). Here is the code I'm using now, but this doesn't scale very well on large files. I was wondering if there is a more efficient way to do this - would working with memory-mapped files help? All ideas are welcome.
public static void mergeFastq(String forwardFile, String reverseFile, String outputFile) {
    try {
        BufferedReader inputReaderForward = new BufferedReader(new FileReader(forwardFile));
        BufferedReader inputReaderReverse = new BufferedReader(new FileReader(reverseFile));
        PrintWriter outputWriter = new PrintWriter(new FileWriter(outputFile, true));
        String forwardLine = null;
        System.out.println("Begin merging Fastq files");
        int readsMerge = 0;
        while ((forwardLine = inputReaderForward.readLine()) != null) {
            //append the forward file
            outputWriter.println(forwardLine);
            outputWriter.println(inputReaderForward.readLine());
            outputWriter.println(inputReaderForward.readLine());
            outputWriter.println(inputReaderForward.readLine());
            //append the reverse file
            outputWriter.println(inputReaderReverse.readLine());
            outputWriter.println(inputReaderReverse.readLine());
            outputWriter.println(inputReaderReverse.readLine());
            outputWriter.println(inputReaderReverse.readLine());
            readsMerge++;
            if (readsMerge % 10000 == 0) {
                System.out.println("[" + now() + "] Merged 10000");
                readsMerge = 0;
            }
        }
        inputReaderForward.close();
        inputReaderReverse.close();
        outputWriter.close();
    } catch (IOException ex) {
        Logger.getLogger(Utilities.class.getName()).log(Level.SEVERE, "Error while merging FastQ files", ex);
    }
}
Maybe you also want to try to use a BufferedWriter to cut down your file IO operations.
http://download.oracle.com/javase/6/docs/api/java/io/BufferedWriter.html
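For example, a one-line change to the writer from the question (the 64 KB buffer size is just a guess worth tuning):

PrintWriter outputWriter = new PrintWriter(
        new BufferedWriter(new FileWriter(outputFile, true), 64 * 1024));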
A simple answer is to use a bigger buffer, which helps to reduce the total number of I/O calls being made.
Usually, memory-mapped I/O with FileChannel (see Java NIO) would be used for handling large data file I/O. Here, however, it is not a good fit, as you need to inspect the file content in order to determine the boundary of every 4 lines.
If performance was the main requirement, then I would code this function in C or C++ instead of Java.
But regardless of the language used, what I would do is try to manage memory myself. I would create two large buffers, say 128 MB or more each, and fill them with data from the two text files. Then you need a third buffer that is twice as big as the previous two. The algorithm starts moving characters one by one from input buffer #1 to the destination buffer, counting EOLs as it goes. Once you reach the 4th line, you store the current position in that buffer and repeat the same process with the 2nd input buffer. You continue alternating between the two input buffers, replenishing each one when you consume all the data in it. Each time you refill the input buffers, you can also write out and empty the destination buffer.
Buffer your read and write operations. Buffer needs to be large enough to minimize the read/write operations and still be memory efficient. This is really simple and it works.
void write(InputStream is, OutputStream os) throws IOException {
    byte[] buf = new byte[102400]; //optimize the size of buffer to your needs
    int num;
    while ((num = is.read(buf)) != -1) {
        os.write(buf, 0, num);
    }
}
EDIT:
I just realized that you need to shuffle (interleave) the lines, so this code will not work for you as is, but the concept still remains the same.
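Still, the buffering idea carries over directly to the 4-line interleave. A sketch, keeping mergeFastq's signature from the question (the 1 MB buffer sizes are assumptions to benchmark, and truncated input files are not handled):

public static void mergeFastq(String forwardFile, String reverseFile, String outputFile) throws IOException {
    int bufSize = 1 << 20; // 1 MB per stream
    try (BufferedReader fwd = new BufferedReader(new FileReader(forwardFile), bufSize);
         BufferedReader rev = new BufferedReader(new FileReader(reverseFile), bufSize);
         BufferedWriter out = new BufferedWriter(new FileWriter(outputFile), 2 * bufSize)) {
        String line;
        while ((line = fwd.readLine()) != null) {
            out.write(line);                  // first line of the forward record
            out.newLine();
            for (int i = 0; i < 3; i++) {     // the remaining 3 lines of the forward record
                out.write(fwd.readLine());
                out.newLine();
            }
            for (int i = 0; i < 4; i++) {     // the 4 lines of the reverse record
                out.write(rev.readLine());
                out.newLine();
            }
        }
    }
}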