How can I split a file into parts larger than 2GB?
A byte array takes an int, not a long, as its size. Is there any solution?
public void splitFile(SplitFile file) throws IOException {
int partCounter = 1;
int sizeOfFiles = (int)value;
byte[] buffer = new byte[sizeOfFiles];
File f = file.getFile();
String fileName = f.getName();
try (FileInputStream fis = new FileInputStream(f);
BufferedInputStream bis = new BufferedInputStream(fis)) {
int bytesAmount = 0;
while ((bytesAmount = bis.read(buffer)) > 0) {
String filePartName = fileName + partCounter + file.config.options.getExtension();
partCounter++;
File newFile = new File(f.getParent(), filePartName);
try (FileOutputStream out = new FileOutputStream(newFile)) {
out.write(buffer, 0, bytesAmount);
}
}
}
}
Don't read the entire file into memory, obviously, or even an entire 'part file'.
Your code as pasted creates one part file per read() call, so the file is split however read() happens to partition it. That is fragile: read() is specified so that it may return as little as a single byte per call.
Don't make a new part-file for every call to read. Instead, separate this out: Your read call gets anywhere from 1 to <BUFFER_SIZE> bytes, and your part's size is <PART_SIZE> large; these two things do not have to be the same and you shouldn't write the code that way.
Once you have an open FileOutputStream you can call .write(buffer, 0, bytesAmount) on it any number of times; you can even call .write(buffer, 0, theSmallerOfBytesLeftToWriteInThisPartAndBytesAmount) followed by opening up the next part file FileOutputStream and calling .write(buffer, whereWeLeftOff, remainder) on that one.
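A minimal sketch of that approach, assuming a hypothetical FileSplitter class, an 8 KB buffer, and a ".N.part" naming scheme (none of these come from the question). The part size is a long, so parts can exceed 2 GB while only the small buffer is held in memory:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class FileSplitter {
    // Small fixed buffer; its size is independent of the part size (assumed value).
    private static final int BUFFER_SIZE = 8192;

    /** Splits source into parts of at most partSize bytes, written next to the source file. */
    static List<Path> split(Path source, long partSize) throws IOException {
        if (partSize <= 0) {
            throw new IllegalArgumentException("partSize must be positive");
        }
        List<Path> parts = new ArrayList<>();
        byte[] buffer = new byte[BUFFER_SIZE];
        OutputStream out = null;
        long leftInPart = 0; // bytes that still fit into the currently open part
        try (InputStream in = new BufferedInputStream(Files.newInputStream(source))) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                int offset = 0;
                // One read may straddle a part boundary, so drain it in pieces.
                while (offset < read) {
                    if (leftInPart == 0) {
                        if (out != null) {
                            out.close();
                        }
                        // Hypothetical naming scheme: data.bin.1.part, data.bin.2.part, ...
                        Path part = source.resolveSibling(
                                source.getFileName() + "." + (parts.size() + 1) + ".part");
                        parts.add(part);
                        out = new BufferedOutputStream(Files.newOutputStream(part));
                        leftInPart = partSize;
                    }
                    int n = (int) Math.min(read - offset, leftInPart);
                    out.write(buffer, offset, n);
                    offset += n;
                    leftInPart -= n;
                }
            }
        } finally {
            if (out != null) {
                out.close();
            }
        }
        return parts;
    }
}
```

Calling FileSplitter.split(path, 3L * 1024 * 1024 * 1024) would then produce 3 GB parts without ever allocating more than the 8 KB buffer.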
In Java, how do you split a binary file into multiple parts while only loading a small portion of the File into memory at one time?
So I have a file FullFile that is large. I need to upload it to cloud storage but it's so large that it often times out.
I can make this problem less likely if I split the file and upload in chunks.
So I need to split FullFile into files of chunk size MaxChunkSize.
List<File> fileSplit(File fullFile, int maxChunkSize)
File fileJoin(List<File> splitFiles)
Most code snippets around require the file to be text. But in my case the files are compressed binary.
What would be the best way to implement these methods?
Below is the full answer:
The maxChunkSize represents the size in bytes of a file chunk.
In the example below I read a 5 MB zip file, split it into five 1 MB chunks, and later join them back using the fileJoin function.
The method stageLocally stages the files locally, but you can modify it to work with any cloud storage. (Better to abstract this out so you can switch between multiple storage implementations.)
You can tweak maxChunkSize based on the amount of data you want to hold in memory at a given time.
The IOUtils.copy() method is from the Apache Commons IO library. You can also use Files.copy() in lieu of it. The Files.copy() method comes from the java.nio.file package, so you don't have to add an external dependency to use it.
I have omitted the exception handling for brevity.
public static void main(String[] args) throws IOException {
    File input = new File(_5_MB_FILE_PATH);
    File outPut = fileJoin(split(input, 1_024_000));
    // Compare the rejoined file with the original, closing both streams afterwards
    try (InputStream a = Files.newInputStream(input.toPath());
         InputStream b = Files.newInputStream(outPut.toPath())) {
        System.out.println(IOUtils.contentEquals(a, b));
    }
}
public static List<File> split(File largeFile, int maxChunkSize) throws IOException {
    List<File> list = new ArrayList<>();
    final byte[] buffer = new byte[maxChunkSize];
    // try-with-resources closes the stream even if a read or a write fails
    try (InputStream in = Files.newInputStream(largeFile.toPath())) {
        // read() may return fewer bytes than the buffer holds, so a chunk can be smaller than maxChunkSize
        int dataRead = in.read(buffer);
        while (dataRead > -1) {
            list.add(stageLocally(buffer, dataRead));
            dataRead = in.read(buffer);
        }
    }
    return list;
}
private static File stageLocally(byte[] buffer, int length) throws IOException {
    File outPutFile = File.createTempFile("temp-", "split", new File(TEMP_DIRECTORY));
    try (FileOutputStream fos = new FileOutputStream(outPutFile)) {
        fos.write(buffer, 0, length);
    }
    return outPutFile;
}
public static File fileJoin(List<File> list) throws IOException {
    File outPutFile = File.createTempFile("temp-", "unsplit", new File(TEMP_DIRECTORY));
    try (FileOutputStream fileOutputStream = new FileOutputStream(outPutFile)) {
        // Append each chunk, in list order, to the joined file
        for (File file : list) {
            try (InputStream in = Files.newInputStream(file.toPath())) {
                IOUtils.copy(in, fileOutputStream);
            }
        }
    }
    return outPutFile;
}
Let me know if this helps.
I'm trying to parse my file, which keeps all its data in binary form. How do I read N bytes from a file at offset M? I then need to convert them to a String using new String(myByteArray, "UTF-8");. Thanks!
Here's some code:
File file = new File("my_file.txt");
byte[] myByteArray = new byte[(int) file.length()];
UPD 1: The answers I see are not appropriate. My file keeps strings in byte form. For example, when I put the string "str" in my file, it actually prints something like [B#6e0b... in my file. So I need to get my string "str" back from these bytes.
UPD 2: As it turns out, the problem appears when I use toString():
PrintWriter writer = new PrintWriter(new BufferedWriter(new OutputStreamWriter(new FileOutputStream(new File(System.getProperty("db.file")), true), "UTF-8")));
Iterator it = storage.entrySet().iterator();//storage is a map<String, String>
while (it.hasNext()){
Map.Entry pairs = (Map.Entry)it.next();
String K = new String(pairs.getKey().toString());
String V = new String(pairs.getValue().toString());
writer.println(K.length() + " " + K.getBytes() + " " + V.length() + " " + V.getBytes());//this is just the format I need to have in file
it.remove();
}
Maybe there are other ways to do that?
As of Java 7, reading the whole of a file is really easy: just use Files.readAllBytes(path). For example:
Path path = Paths.get("my_file.txt");
byte[] data = Files.readAllBytes(path);
If you need to do this more manually, you should use a FileInputStream - your code so far allocates an array, but doesn't read anything from the file.
To read just a portion of a file, you should look at using RandomAccessFile, which allows you to seek to wherever you want. Be aware that the read(byte[]) method does not guarantee to read all the requested data in one go, however. You should loop until either you've read everything you need, or use readFully instead. For example:
public static byte[] readPortion(File file, int offset, int length)
throws IOException {
byte[] data = new byte[length];
try (RandomAccessFile raf = new RandomAccessFile(file, "r")) { // the mode argument is required; "r" opens read-only
raf.seek(offset);
raf.readFully(data);
}
return data;
}
EDIT: Your update talks about seeing text such as [B#6e0b... That suggests you're calling toString() on a byte[] at some point. Don't do that. Instead, you should use new String(data, StandardCharsets.UTF_8) or something similar - picking the appropriate encoding, of course.
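To make the difference concrete, here is a small sketch. The class name BytesToText and the sample content "key=str" are made up for illustration. It shows that concatenating a byte[] into a string yields the array's identity rather than its text, while reading the raw bytes at an offset and decoding them with an explicit charset recovers the original string:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class BytesToText {
    /** Reads length bytes starting at offset and decodes them as UTF-8. */
    static String readString(Path file, long offset, int length) throws IOException {
        byte[] data = new byte[length];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            raf.readFully(data); // loops internally until all requested bytes are read
        }
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("bytes-demo", ".bin");
        // Write the raw bytes themselves, not the result of toString() on the array.
        Files.write(file, "key=str".getBytes(StandardCharsets.UTF_8));

        byte[] raw = "str".getBytes(StandardCharsets.UTF_8);
        System.out.println("" + raw);               // array type and hash code, not the text
        System.out.println(readString(file, 4, 3)); // prints "str"
        Files.delete(file);
    }
}
```

The same applies on the writing side: write the bytes (or the String through a Writer with an explicit charset) instead of concatenating getBytes() into a line of text.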
I'm trying to copy part of a file from one FileChannel to another (in effect, writing a new file equal to the first one).
So I'm reading chunks of 256 KB, then writing them to another channel:
static void openfile(String str) throws FileNotFoundException, IOException {
int size=262144;
FileInputStream fis = new FileInputStream(str);
FileChannel fc = fis.getChannel();
byte[] barray = new byte[size];
ByteBuffer bb = ByteBuffer.wrap(barray);
FileOutputStream fos = new FileOutputStream(str+"2" /**/);
FileChannel fo = fos.getChannel();
StringBuilder sb;
while (fc.read(bb) != -1) {
fo.write(bb /**/);
bb.clear();
}
}
The problem is that fo.write (I think) writes again from the beginning of the channel, so the new file is made only of the last chunk read.
I tried with fo.write (bb, bb.position()) but it didn't work as I expected (does the pointer return to the beginning of the channel?) and with FileOutputStream(str+"2", true) thinking it would append to the end of the new file, but it didn't.
I need to work with chunks of 256 KB, so I can't change the structure of the program much (unless I am doing something terribly wrong).
Resolved with bb.flip();
while (fc.read(bb) != -1) {
bb.flip();
fo.write(bb);
bb.clear();
}
This is a very old question, but I stumbled upon it and thought I might add another answer that potentially performs better, using FileChannel.transferTo or FileChannel.transferFrom. As per the Javadoc:
This method is potentially much more efficient than a simple loop that reads from the source channel and writes to this channel. Many operating systems can transfer bytes directly from the source channel into the filesystem cache without actually copying them.
public static void copy(FileChannel src, FileChannel dst) throws IOException {
long size = src.size();
long transferred = 0;
do {
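// Resume from the bytes already transferred; restarting at position 0 would copy the start of the file again
transferred += src.transferTo(transferred, size - transferred, dst);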
} while (transferred < size);
}
In most cases a simple src.transferTo(0, src.size(), dst); will work, provided neither channel is non-blocking.
The canonical way to copy between channels is as follows:
while (in.read(bb) > 0 || bb.position() > 0)
{
bb.flip();
out.write(bb);
bb.compact();
}
The simplified version in your edited answer doesn't work in all circumstances, e.g. when 'out' is non-blocking.
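For completeness, a self-contained sketch of that loop applied to two files. The class name ChannelCopy, the copy method, and its chunkSize parameter are illustrative and not from the posts above:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChannelCopy {
    /** Copies from to to through a buffer of chunkSize bytes (e.g. 262144 for 256 KB chunks). */
    static void copy(Path from, Path to, int chunkSize) throws IOException {
        try (FileChannel in = FileChannel.open(from, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(to, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer bb = ByteBuffer.allocate(chunkSize);
            // Keep going while there is new input or unwritten data left in the buffer.
            while (in.read(bb) > 0 || bb.position() > 0) {
                bb.flip();    // switch the buffer from filling to draining
                out.write(bb);
                bb.compact(); // keep any unwritten bytes, then switch back to filling
            }
        }
    }
}
```

Using compact() rather than clear() means bytes that a write did not consume are carried over to the next iteration instead of being discarded.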
I want to read images inside a .CBZ archive and store them inside an ArrayList. I have tried the following solution but it has, at least, 2 problems.
I get an OutOfMemory error after adding 10-15 images to the ArrayList
There must be a better way of getting the images inside the ArrayList instead of writing them on a temp file and reading that again.
public class CBZHandler {
final int BUFFER = 2048;
ArrayList<BufferedImage> images = new ArrayList<BufferedImage>();
public void extractCBZ(ZipInputStream tis) throws IOException{
ZipEntry entry;
BufferedOutputStream dest = null;
if(!images.isEmpty())
images.clear();
while((entry = tis.getNextEntry()) != null){
System.out.println("Extracting " + entry.getName());
int count;
FileOutputStream fos = new FileOutputStream("temp");
dest = new BufferedOutputStream(fos,BUFFER);
byte data[] = new byte[BUFFER];
while ((count = tis.read(data, 0, BUFFER)) != -1) {
dest.write(data, 0, count);
}
dest.flush();
dest.close();
BufferedImage img = ImageIO.read(new FileInputStream("temp"));
images.add(img);
}
tis.close();
}
}
The "OutOfMemoryError" may or may not be inherent in the amount of data you're trying to store in memory. You may need to change your maximum heap size. However, you can certainly avoid writing to disk - just write to a ByteArrayOutputStream instead, then you can get at the data as a byte array - potentially creating a ByteArrayInputStream round it if you need to. Do you definitely need to add them in your list as BufferedImage rather than (say) keeping each as a byte[]?
Note that if you're able to use Guava it makes the "extract data from an InputStream" bit very easy:
byte[] data = ByteStreams.toByteArray(tis);
Each BufferedImage will typically require significantly more memory than the byte[] from which it is constructed. Cache the byte[] and stamp each one out to an image as needed.
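A rough sketch of that idea, assuming a hypothetical ArchiveReader class. Each entry is read straight into a byte[] through a ByteArrayOutputStream, with no temp file involved:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class ArchiveReader {
    /** Returns the raw (still encoded) bytes of every file entry in the archive, in order. */
    static List<byte[]> readEntries(InputStream archive) throws IOException {
        List<byte[]> entries = new ArrayList<>();
        byte[] buffer = new byte[2048];
        try (ZipInputStream zis = new ZipInputStream(archive)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no image data
                }
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                int count;
                // read() returns -1 at the end of the current entry, not of the whole archive
                while ((count = zis.read(buffer, 0, buffer.length)) != -1) {
                    bytes.write(buffer, 0, count);
                }
                entries.add(bytes.toByteArray());
            }
        }
        return entries;
    }
}
```

An individual page can then be decoded only when it is needed, for example with ImageIO.read(new ByteArrayInputStream(entries.get(i))), so that only a few decoded BufferedImage objects are alive at once.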
After extensive research into the subject I have reached a brick wall.
All I want to do is add a collection of .wav files into a byte array, one after another, and output them all into one complete newly created .wav file. I extract all the .wav data into a byte array, skipping the .wav header and going straight for the data, then when it comes to writing it to the newly created .wav file I get an error like:
Error1: javax.sound.sampled.UnsupportedAudioFileException: could not get audio input stream from input stream
Error2: could not get audio input stream from input stream
The code is:
try
{
String path = "*********";
String path2 = path + "newFile.wav";
File filePath = new File(path);
File NewfilePath = new File(path2);
String [] folderContent = filePath.list();
int FileSize = 0;
for(int i = 0; i < folderContent.length; i++)
{
RandomAccessFile raf = new RandomAccessFile(path + folderContent[i], "r");
FileSize = FileSize + (int)raf.length();
}
byte[] FileBytes = new byte[FileSize];
for(int i = 0; i < folderContent.length; i++)
{
RandomAccessFile raf = new RandomAccessFile(path + folderContent[i], "r");
raf.skipBytes(44);
raf.read(FileBytes);
raf.close();
}
boolean success = NewfilePath.createNewFile();
InputStream byteArray = new ByteArrayInputStream(FileBytes);
AudioInputStream ais = AudioSystem.getAudioInputStream(byteArray);
AudioSystem.write(ais, Type.WAVE, NewfilePath);
}
Your byte array doesn't contain any header information, which probably means that AudioSystem.write doesn't think it is really WAV data.
Can you create a suitable header for your combined data?
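One possible way to get such a header, sketched under the assumption that every input file shares the same known PCM format: describe the raw bytes with an AudioFormat and wrap them in an AudioInputStream yourself, so that AudioSystem.write generates the WAV header. The WavWriter class and the format passed in are illustrative; the real values must match the source files:

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

class WavWriter {
    /** Writes headerless PCM bytes as a WAV file; the header is derived from format. */
    static void writeWav(byte[] pcm, AudioFormat format, File target) throws IOException {
        // The stream length is given in sample frames, not bytes.
        long frames = pcm.length / format.getFrameSize();
        try (AudioInputStream ais =
                     new AudioInputStream(new ByteArrayInputStream(pcm), format, frames)) {
            AudioSystem.write(ais, AudioFileFormat.Type.WAVE, target);
        }
    }
}
```

For 44.1 kHz, 16-bit, stereo, signed, little-endian data (an assumed format) the call would be writeWav(bytes, new AudioFormat(44100f, 16, 2, true, false), file).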
Update: This question might hold the answer for you.