I'm trying to copy part of a file from one FileChannel to another (in effect, writing a new file equal to the first one).
So I read chunks of 256 KB and write them to another channel:
static void openfile(String str) throws FileNotFoundException, IOException {
int size=262144;
FileInputStream fis = new FileInputStream(str);
FileChannel fc = fis.getChannel();
byte[] barray = new byte[size];
ByteBuffer bb = ByteBuffer.wrap(barray);
FileOutputStream fos = new FileOutputStream(str+"2" /**/);
FileChannel fo = fos.getChannel();
StringBuilder sb;
while (fc.read(bb) != -1) {
fo.write(bb /**/);
bb.clear();
}
}
The problem is that fo.write (I think) writes again from the beginning of the channel, so the new file consists only of the last chunk read.
I tried fo.write(bb, bb.position()), but it didn't work as I expected (does the pointer return to the beginning of the channel?). I also tried FileOutputStream(str+"2", true), thinking it would append to the end of the new file, but it didn't.
I need to work with chunks of 256 KB, so I can't change the structure of the program much (unless I am doing something terribly wrong).
Resolved by calling bb.flip() before each write:
while (fc.read(bb) != -1) {
bb.flip();
fo.write(bb);
bb.clear();
}
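The reason flip() matters: after a read, the buffer's position sits just past the data that was read, so a write at that point sends whatever lies between position and limit rather than the new data. flip() sets the limit to the current position and the position back to zero. Below is a minimal sketch of the whole loop, assuming blocking file channels; the class name ChunkCopy and the method copy are made up for illustration, and the inner hasRemaining() loop is an extra safeguard that is not in the original answer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ChunkCopy {
    // 256 KB chunks, as in the question
    static final int CHUNK = 256 * 1024;

    /** Copies src to dst chunk by chunk and returns the number of bytes written. */
    static long copy(Path src, Path dst) throws IOException {
        long total = 0;
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer bb = ByteBuffer.allocate(CHUNK);
            while (in.read(bb) != -1) {
                bb.flip();                 // limit = bytes read, position = 0
                while (bb.hasRemaining()) {
                    total += out.write(bb); // write() may consume only part of the buffer
                }
                bb.clear();                // ready for the next read
            }
        }
        return total;
    }
}
```

Both channels are opened in a try-with-resources block so they are closed even if a read or write fails.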
This is a very old question, but I stumbled upon it and thought I might add another answer that potentially performs better, using FileChannel.transferTo or FileChannel.transferFrom. As per the javadoc:
This method is potentially much more efficient than a simple loop that reads from the source channel and writes to this channel. Many operating systems can transfer bytes directly from the source channel into the filesystem cache without actually copying them.
public static void copy(FileChannel src, FileChannel dst) throws IOException {
long size = src.size();
long transferred = 0;
do {
transferred += src.transferTo(transferred, size - transferred, dst);
} while (transferred < size);
}
In most cases a simple src.transferTo(0, src.size(), dst); will work, provided neither channel is non-blocking.
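Since the question asks about copying only part of a file, the same call can be pointed at a sub-range. The following is a rough sketch, assuming regular files on a blocking channel; RangeCopy, copyRange, offset and count are hypothetical names, not part of the original answer.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class RangeCopy {
    /**
     * Copies up to count bytes of src, starting at offset, into dst.
     * Returns the number of bytes actually transferred.
     */
    static long copyRange(Path src, Path dst, long offset, long count) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long end = Math.min(in.size(), offset + count);
            long pos = offset;
            while (pos < end) {
                // transferTo may move fewer bytes than requested, so advance and retry
                long n = in.transferTo(pos, end - pos, out);
                if (n <= 0) {
                    break; // nothing more could be transferred
                }
                pos += n;
            }
            return Math.max(0, pos - offset);
        }
    }
}
```

For example, copyRange(src, dst, 100, 300) would write bytes 100 through 399 of src to dst.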
The canonical way to copy between channels is as follows:
while (in.read(bb) > 0 || bb.position() > 0)
{
bb.flip();
out.write(bb);
bb.compact();
}
The simplified version in your edited answer doesn't work in all circumstances, e.g. when 'out' is non-blocking.
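To illustrate why compact() matters, here is a small sketch in which the destination accepts only a few bytes per call, so every write is partial. TrickleChannel is a made-up test sink, not a JDK class, and the tiny 8-byte buffer is chosen only to force many partial writes. compact() moves the unwritten bytes to the front of the buffer so the next read appends after them instead of overwriting them.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

final class CompactCopy {
    /** Hypothetical sink that accepts at most 3 bytes per write call. */
    static final class TrickleChannel implements WritableByteChannel {
        final ByteArrayOutputStream sink = new ByteArrayOutputStream();
        private boolean open = true;

        @Override
        public int write(ByteBuffer src) {
            int n = Math.min(3, src.remaining());
            for (int i = 0; i < n; i++) {
                sink.write(src.get());
            }
            return n; // possibly fewer bytes than the buffer held
        }

        @Override
        public boolean isOpen() {
            return open;
        }

        @Override
        public void close() {
            open = false;
        }
    }

    /** The copy loop from the answer above, with a deliberately small buffer. */
    static void copy(ReadableByteChannel in, WritableByteChannel out) throws IOException {
        ByteBuffer bb = ByteBuffer.allocate(8);
        while (in.read(bb) > 0 || bb.position() > 0) {
            bb.flip();
            out.write(bb);  // may leave bytes unwritten
            bb.compact();   // keep the leftovers, make room for the next read
        }
    }
}
```

With clear() in place of compact(), the bytes left over after each partial write would be discarded.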
In Java, how do you split a binary file into multiple parts while only loading a small portion of the File into memory at one time?
So I have a file FullFile that is large. I need to upload it to cloud storage but it's so large that it often times out.
I can make this problem less likely if I split the file and upload in chunks.
So I need to split FullFile into files of chunk size MaxChunkSize.
List<File> fileSplit(File fullFile, int maxChunkSize)
File fileJoin(List<File> splitFiles)
Most code snippets around require the file to be text. But in my case the files are compressed binary.
What would be the best way to implement these methods?
Below is the full answer:
The maxChunkSize represents the size in bytes of a file chunk.
In the example below I read a 5 MB zip file, split it into chunks of roughly 1 MB (1,024,000 bytes each), and later join them back using the fileJoin function.
The method stageLocally stages the files locally, but you can modify it to work with any cloud storage. (Better to abstract this out so you can switch between multiple storage implementations.)
You can tweak maxChunkSize based on the amount of data you want to hold in memory at a given time.
The IOUtils.copy() method is from the Apache Commons IO library. You can also use Files.copy() in lieu of it. Files.copy() comes from the java.nio.file package, so you don't need an external dependency to use it.
I have omitted the exception handling for brevity.
public static void main(String[] args) throws IOException {
File input = new File(_5_MB_FILE_PATH);
File outPut = fileJoin(split(input, 1_024_000));
System.out.println(IOUtils.contentEquals(Files.newInputStream(input.toPath()), Files.newInputStream(outPut.toPath())));
}
public static List<File> split(File largeFile, int maxChunkSize) throws IOException {
List<File> list = new ArrayList<>();
final byte[] buffer = new byte[maxChunkSize];
// try-with-resources closes the input stream even if staging a chunk fails
try (InputStream in = Files.newInputStream(largeFile.toPath())) {
int dataRead = in.read(buffer);
while (dataRead > -1) {
list.add(stageLocally(buffer, dataRead));
dataRead = in.read(buffer);
}
}
return list;
}
private static File stageLocally(byte[] buffer, int length) throws IOException {
File outPutFile = File.createTempFile("temp-", "split", new File(TEMP_DIRECTORY));
FileOutputStream fos = new FileOutputStream(outPutFile);
fos.write(buffer, 0, length);
fos.close();
return outPutFile;
}
public static File fileJoin(List<File> list) throws IOException {
File outPutFile = File.createTempFile("temp-", "unsplit", new File(TEMP_DIRECTORY));
FileOutputStream fileOutputStream = new FileOutputStream(outPutFile);
for (File file : list) {
InputStream in = Files.newInputStream(file.toPath());
IOUtils.copy(in, fileOutputStream);
in.close();
}
fileOutputStream.close();
return outPutFile;
}
Let me know if this helps.
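For comparison, here is a dependency-free sketch of the same split and join idea using only java.nio.file. It assumes Java 9 or later for InputStream.readNBytes, which keeps reading until the buffer is full or the stream ends, so every chunk except possibly the last has exactly maxChunkSize bytes. SplitJoin, split and join are illustrative names, and the dir and target parameters are assumptions standing in for TEMP_DIRECTORY.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class SplitJoin {
    /** Splits source into files of at most maxChunkSize bytes, created inside dir. */
    static List<Path> split(Path source, int maxChunkSize, Path dir) throws IOException {
        List<Path> parts = new ArrayList<>();
        byte[] buffer = new byte[maxChunkSize]; // only one chunk is held in memory
        try (InputStream in = Files.newInputStream(source)) {
            int n;
            // readNBytes returns 0 only at end of stream (Java 9+)
            while ((n = in.readNBytes(buffer, 0, buffer.length)) > 0) {
                Path part = Files.createTempFile(dir, "part-", ".bin");
                try (OutputStream out = Files.newOutputStream(part)) {
                    out.write(buffer, 0, n);
                }
                parts.add(part);
            }
        }
        return parts;
    }

    /** Concatenates the parts, in list order, into target. */
    static Path join(List<Path> parts, Path target) throws IOException {
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                Files.copy(part, out); // streams the part without loading it fully
            }
        }
        return target;
    }
}
```

The order of the returned list is what makes the join correct, so whatever storage the chunks go to needs to preserve it (for instance by numbering the parts).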
I have 2 disks in the Linux system, say /dev/dsk1 and /dev/dsk2, and I'm trying to read the raw data from dsk1 in bytes and write them into dsk2, in order to make dsk2 an exact copy of dsk1. I tried to do that in the following way (executed with sudo):
import...
public class Main {
public static void main(String[] args) throws NoSuchAlgorithmException, IOException {
Path src = new File("/dev/dsk1").toPath();
Path dst = new File("/dev/dsk2").toPath();
FileChannel r = FileChannel.open(src, StandardOpenOption.READ, StandardOpenOption.WRITE);
FileChannel w = FileChannel.open(dst, StandardOpenOption.READ, StandardOpenOption.WRITE);
long size = r.size();
ByteBuffer byteBuffer = ByteBuffer.allocate(1024);
for (int offset = 0; offset < size; offset+=1024) {
r.position(offset);
w.position(offset);
r.read(byteBuffer);
byteBuffer.flip();
w.write(byteBuffer);
byteBuffer.clear();
}
r.close();
w.close();
}
}
but after writing all the bytes in dsk1 to dsk2, dsk2's filesystem seems to be corrupted. No files can be found in it and if I try to mkdir it will say "structure needs cleaning".
I've tested the above code on regular files, like a text1.txt containing a few characters as src and an empty text2.txt as dst, and it worked fine.
Did I miss something there when reading & writing raw data on block device?
You never check whether read() actually read all 1024 bytes, or whether write() wrote them all. Most likely you're leaving gaps in the copy.
There's no magic involved in reading from and writing to devices. The first thing I would try is this (InputStream.transferTo requires Java 9 or later):
try (FileInputStream src = new FileInputStream("/dev/dsk1");
FileOutputStream dst = new FileOutputStream("/dev/dsk2")) {
src.transferTo(dst);
}
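If you want to keep the channel-based loop, the point about unchecked return values can be sketched as follows. This is only an illustration, tested against regular files rather than block devices; CheckedCopy and copy are hypothetical names. The offset advances by the number of bytes actually read, and the inner loop repeats until the whole buffer has been written.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

final class CheckedCopy {
    /** Copies src to dst using positional reads and writes; returns bytes copied. */
    static long copy(FileChannel src, FileChannel dst, int bufferSize) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(bufferSize);
        long pos = 0; // long, not int, so offsets past 2 GB stay valid
        while (true) {
            buf.clear();
            int n = src.read(buf, pos);   // may return fewer bytes than bufferSize
            if (n < 0) {
                break;                    // end of input
            }
            buf.flip();
            long writePos = pos;
            while (buf.hasRemaining()) {
                writePos += dst.write(buf, writePos); // repeat until fully written
            }
            pos += n;                     // advance by what was really read
        }
        return pos;
    }
}
```

Compared with the loop in the question, nothing here assumes that a read or a write always moves exactly 1024 bytes.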
I've written Java code for writing a String into a file. The string will be at most about 10 KB.
Below is the code I've written. It implements 3 ways of writing to a file.
void writeMethod(String string, int m)
{
if (m == 1)
{
FileChannel rwChannel = new RandomAccessFile(filePath, "rw").getChannel();
ByteBuffer wrBuf = rwChannel.map(FileChannel.MapMode.READ_WRITE, 0, string.length() * 1);
wrBuf.put(string.getBytes());
rwChannel.close();
}
if (m == 2)
{
FileOutputStream fileOutputStream = new FileOutputStream(filePath);
fileOutputStream.write(string.getBytes());
fileOutputStream.close();
}
if (m == 3)
{
FileWriter bw = new FileWriter(filePath);
bw.write(string);
bw.close( );
}
}
**Ignore errors
I call the above function from 3 threads, one method per thread. I'm not sure which one is the fastest. If none of these, which approach would be good? I have to write 17,000,000 files.
You might also want to try the java.nio.file package as one of your methods for testing purposes.
Something like:
Path path = Paths.get(filePath);
Files.write(path, string.getBytes());
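A slightly fuller sketch of that suggestion, with an explicit charset so the result does not depend on the platform default. WriteText and write are illustrative names, and UTF-8 is an assumption about the desired encoding. It also shows why sizing anything by string.length(), as method 1 in the question does, can be off: length() counts chars, not encoded bytes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class WriteText {
    /** Writes text to path as UTF-8, creating or truncating the file. */
    static Path write(Path path, String text) throws IOException {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        // With no options given, Files.write creates the file or truncates an existing one
        return Files.write(path, bytes);
    }

    public static void main(String[] args) throws IOException {
        String text = "h\u00e9llo"; // 5 chars, but 6 bytes in UTF-8
        Path path = Files.createTempFile("write-text", ".txt");
        write(path, text);
        System.out.println(text.length() + " chars, " + Files.size(path) + " bytes");
    }
}
```

Which of the approaches is fastest for millions of small files is something only a measurement on the target machine can answer.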
I have an uncompressed binary file in res/raw that I was reading this way:
public byte[] file2Bytes (int rid) {
byte[] buffer = null;
try {
AssetFileDescriptor afd = res.openRawResourceFd(rid);
FileInputStream in = new FileInputStream(afd.getFileDescriptor());
int len = (int)afd.getLength();
buffer = new byte[len];
in.read(buffer, 0, len);
in.close();
} catch (Exception ex) {
Log.w(ACTNAME, "file2Bytes() fail\n"+ex.toString());
return null;
}
return buffer;
}
However, buffer did not contain what it was supposed to. The source file is 1024 essentially random bytes (a binary key). But buffer, when written out and examined, was not the same. Amongst unprintable bytes at beginning appeared "res/layout/main.xml" (the literal path) and then further down, part of the text content of another file from res/raw. O_O?
Exasperated after a while, I tried:
AssetFileDescriptor afd = res.openRawResourceFd(rid);
//FileInputStream in = new FileInputStream(afd.getFileDescriptor());
FileInputStream in = afd.createInputStream();
Presto, I got the content correctly -- this is easily reproducible.
So the relevant API docs read:
public FileDescriptor getFileDescriptor ()
Returns the FileDescriptor that can be used to read the data in the
file.
public FileInputStream createInputStream ()
Create and return a new auto-close input stream for this asset. This
will either return a full asset
AssetFileDescriptor.AutoCloseInputStream, or an underlying
ParcelFileDescriptor.AutoCloseInputStream depending on whether the
object represents a complete file or sub-section of a file. You should
only call this once for a particular asset.
Why would a FileInputStream() constructed from getFileDescriptor() end up with garbage whereas createInputStream() gives proper access?
As per pskink's comment, the FileDescriptor returned by AssetFileDescriptor.getFileDescriptor() is apparently not an fd that refers just to the file. It perhaps refers to whatever bundle/parcel/conglomeration aapt has made of the resources.
AssetFileDescriptor afd = res.openRawResourceFd(rid);
FileInputStream in = new FileInputStream(afd.getFileDescriptor());
in.skip(afd.getStartOffset());
Turns out to be the equivalent of the FileInputStream in = afd.createInputStream() version.
I suppose there is a hint in the difference between "create" (something new) and "get" (something existing). :/
AssetFileDescriptor can be thought of as the entry point to the entire package's assets data.
I have run into the same issue and solved it finally.
If you want to manually create a stream from an AssetFileDescriptor, you have to skip n bytes to reach the requested resource. It is as if you were paging through all the available files in one big file.
Thanks to pskink! I had a look at the hex content of the jpg image I want to acquire, it starts with -1. The thing is, there are two jpg images. I did not know, so I arbitrarily skip 76L bytes. Got the first image!
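The same idea can be shown outside Android with a plain-Java analogy: one big file holding several resources, each identified by a start offset and a length (the values that getStartOffset() and getLength() supply in the code above). This sketch does not use any Android API; BundleSlice and readSlice are made-up names. Note that skip() and read() may both process fewer bytes than requested, so each is repeated until done.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class BundleSlice {
    /** Reads exactly length bytes starting at startOffset inside the bundle file. */
    static byte[] readSlice(Path bundle, long startOffset, int length) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(bundle)))) {
            long toSkip = startOffset;
            while (toSkip > 0) {
                long skipped = in.skip(toSkip); // may skip fewer bytes than asked
                if (skipped > 0) {
                    toSkip -= skipped;
                } else if (in.read() == -1) {   // fall back to reading one byte
                    throw new EOFException("offset lies beyond the end of the bundle");
                } else {
                    toSkip--;
                }
            }
            byte[] data = new byte[length];
            in.readFully(data);                 // loops until all bytes are read
            return data;
        }
    }
}
```

Reading without the offset returns whatever happens to sit at the start of the bundle, which matches the stray "res/layout/main.xml" text described in the question.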
I want to read images inside a .CBZ archive and store them inside an ArrayList. I have tried the following solution, but it has at least 2 problems:
1. I get an OutOfMemory error after adding 10-15 images to the ArrayList.
2. There must be a better way of getting the images inside the ArrayList instead of writing them to a temp file and reading that again.
public class CBZHandler {
final int BUFFER = 2048;
ArrayList<BufferedImage> images = new ArrayList<BufferedImage>();
public void extractCBZ(ZipInputStream tis) throws IOException{
ZipEntry entry;
BufferedOutputStream dest = null;
if(!images.isEmpty())
images.clear();
while((entry = tis.getNextEntry()) != null){
System.out.println("Extracting " + entry.getName());
int count;
FileOutputStream fos = new FileOutputStream("temp");
dest = new BufferedOutputStream(fos,BUFFER);
byte data[] = new byte[BUFFER];
while ((count = tis.read(data, 0, BUFFER)) != -1) {
dest.write(data, 0, count);
}
dest.flush();
dest.close();
BufferedImage img = ImageIO.read(new FileInputStream("temp"));
images.add(img);
}
tis.close();
}
}
The "OutOfMemoryError" may or may not be inherent in the amount of data you're trying to store in memory. You may need to change your maximum heap size. However, you can certainly avoid writing to disk - just write to a ByteArrayOutputStream instead, then you can get at the data as a byte array - potentially creating a ByteArrayInputStream round it if you need to. Do you definitely need to add them in your list as BufferedImage rather than (say) keeping each as a byte[]?
Note that if you're able to use Guava it makes the "extract data from an InputStream" bit very easy:
byte[] data = ByteStreams.toByteArray(tis);
Each BufferedImage will typically require significantly more memory than the byte[] from which it is constructed. Cache the byte[] and stamp each one out to an image as needed.
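Putting both suggestions together, a minimal sketch might read each entry into a ByteArrayOutputStream and keep only the encoded bytes. CbzReader and readEntries are illustrative names, and skipping directory entries is an assumption about what a .CBZ archive may contain.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class CbzReader {
    /** Returns the raw (still encoded) bytes of every file entry in the archive. */
    static List<byte[]> readEntries(InputStream source) throws IOException {
        List<byte[]> entries = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(source)) {
            byte[] chunk = new byte[2048];
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // nothing to read for directory entries
                }
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                int count;
                // read() returns -1 at the end of the current entry
                while ((count = zis.read(chunk, 0, chunk.length)) != -1) {
                    bytes.write(chunk, 0, count);
                }
                entries.add(bytes.toByteArray());
            }
        }
        return entries;
    }
}
```

An image can then be decoded only when it is about to be shown, for example with ImageIO.read(new ByteArrayInputStream(entries.get(i))), so at most a few decoded BufferedImage objects are alive at once.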