Copy raw data of block device using Java

I have two disks on a Linux system, say /dev/dsk1 and /dev/dsk2, and I'm trying to read the raw data from dsk1 in bytes and write it to dsk2, so that dsk2 becomes an exact copy of dsk1. I tried to do that in the following way (executed with sudo):
import ...

public class Main {
    public static void main(String[] args) throws NoSuchAlgorithmException, IOException {
        Path src = new File("/dev/dsk1").toPath();
        Path dst = new File("/dev/dsk2").toPath();
        FileChannel r = FileChannel.open(src, StandardOpenOption.READ, StandardOpenOption.WRITE);
        FileChannel w = FileChannel.open(dst, StandardOpenOption.READ, StandardOpenOption.WRITE);
        long size = r.size();
        ByteBuffer byteBuffer = ByteBuffer.allocate(1024);
        for (int offset = 0; offset < size; offset += 1024) {
            r.position(offset);
            w.position(offset);
            r.read(byteBuffer);
            byteBuffer.flip();
            w.write(byteBuffer);
            byteBuffer.clear();
        }
        r.close();
        w.close();
    }
}
but after writing all the bytes from dsk1 to dsk2, dsk2's filesystem seems to be corrupted: no files can be found on it, and if I try to mkdir it says "structure needs cleaning".
I've tested the above code on regular files, like a text1.txt containing a few characters as src and an empty text2.txt as dst, and it worked fine.
Did I miss something when reading and writing raw data on a block device?

You never check whether the read method actually read all 1024 bytes, or whether the write method wrote them all. Most likely you're leaving gaps in the copy.
There's no magic involved in reading from and writing to devices. The first thing I would try is this (InputStream.transferTo requires Java 9+):
try (FileInputStream src = new FileInputStream("/dev/dsk1");
     FileOutputStream dst = new FileOutputStream("/dev/dsk2")) {
    src.transferTo(dst);
}
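If you want to keep the FileChannel version instead, the key fix is to check what read and write actually return; here is a minimal sketch of that idea (assuming Java 11+ for Path.of, with the device paths from the question; the class name is arbitrary):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class RawCopy {
    public static void main(String[] args) throws IOException {
        try (FileChannel r = FileChannel.open(Path.of("/dev/dsk1"), StandardOpenOption.READ);
             FileChannel w = FileChannel.open(Path.of("/dev/dsk2"), StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.allocate(1024 * 1024);
            // read() may return fewer bytes than the buffer holds; -1 signals the end
            while (r.read(buf) != -1) {
                buf.flip();
                // write() may also be partial, so drain the buffer completely
                while (buf.hasRemaining()) {
                    w.write(buf);
                }
                buf.clear();
            }
        }
    }
}

Since both channels maintain their own positions, there is no need to set position() by hand, and the copy stays gap-free even when a read or write is short.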

Related

Split/join a binary file into multiple parts without loading file into memory?

In Java, how do you split a binary file into multiple parts while only loading a small portion of the File into memory at one time?
So I have a file FullFile that is large. I need to upload it to cloud storage but it's so large that it often times out.
I can make this problem less likely if I split the file and upload in chunks.
So I need to split FullFile into files of chunk size MaxChunkSize.
List<File> fileSplit(File fullFile, int maxChunkSize)
File fileJoin(List<File> splitFiles)
Most code snippets around require the file to be text. But in my case the files are compressed binary.
What would be the best way to implement these methods?
Below is the full answer:
The maxChunkSize represents the size in bytes of a file chunk.
In the example below I read a 5 MB zip file, split it into five 1 MB chunks, and later join them back using the fileJoin function.
The stageLocally method stages the files locally, but you can modify it to work with any cloud storage. (Better to abstract this out so you can switch between multiple storage implementations.)
You can tweak maxChunkSize based on the amount of data you want to hold in memory at a given time.
The IOUtils.copy() method is from the Apache Commons IO library (here is the Maven link). You can also use Files.copy() in lieu of it; Files.copy() comes from the java.nio package, so you don't have to add an external dependency to use it.
I have omitted the exception handling for brevity.
public static void main(String[] args) throws IOException {
    File input = new File(_5_MB_FILE_PATH);
    File outPut = fileJoin(split(input, 1_024_000));
    System.out.println(IOUtils.contentEquals(Files.newInputStream(input.toPath()), Files.newInputStream(outPut.toPath())));
}

public static List<File> split(File largeFile, int maxChunkSize) throws IOException {
    InputStream in = Files.newInputStream(largeFile.toPath());
    List<File> list = new ArrayList<>();
    final byte[] buffer = new byte[maxChunkSize];
    int dataRead = in.read(buffer);
    while (dataRead > -1) {
        list.add(stageLocally(buffer, dataRead));
        dataRead = in.read(buffer);
    }
    in.close(); // close the source stream once all chunks are staged
    return list;
}

private static File stageLocally(byte[] buffer, int length) throws IOException {
    File outPutFile = File.createTempFile("temp-", "split", new File(TEMP_DIRECTORY));
    FileOutputStream fos = new FileOutputStream(outPutFile);
    fos.write(buffer, 0, length);
    fos.close();
    return outPutFile;
}

public static File fileJoin(List<File> list) throws IOException {
    File outPutFile = File.createTempFile("temp-", "unsplit", new File(TEMP_DIRECTORY));
    FileOutputStream fileOutputStream = new FileOutputStream(outPutFile);
    for (File file : list) {
        InputStream in = Files.newInputStream(file.toPath());
        IOUtils.copy(in, fileOutputStream);
        in.close();
    }
    fileOutputStream.close();
    return outPutFile;
}
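If you'd rather avoid the Commons IO dependency, the IOUtils.copy(in, fileOutputStream) call in fileJoin could be replaced with the java.nio equivalent mentioned above; a minimal sketch of the loop body:

for (File file : list) {
    // Files.copy(Path, OutputStream) streams the whole file; no manual InputStream needed
    Files.copy(file.toPath(), fileOutputStream);
}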
Let me know if this helps.

Combining compressed Gzipped Text Files using Java

My question might not be entirely related to Java, but I'm currently seeking a method to combine several compressed (gzipped) text files without having to recompress them manually. Let's say I have 4 files, all text compressed with gzip, and I want to compress these into one single *.gz file without decompressing and recompressing them. My current method is to open an InputStream, parse the file line by line, and store the lines in a GZIPOutputStream, which works but isn't very fast. I could of course also call
zcat file1 file2 file3 | gzip -c > output_all_four.gz
This would work too, but isn't really fast either.
My idea is to copy the input stream and write it to the output stream directly without "parsing" the stream, as I don't actually need to manipulate anything. Is something like this possible?
Find below a simple solution in Java (it does the same as the zcat example above). This works because the gzip format allows several compressed members to be concatenated into one file, so the raw bytes can simply be appended without recompressing. Any kind of buffering of the input/output has been omitted to keep the code slim.
public class ConcatFiles {
    public static void main(String[] args) throws IOException {
        // concatenate the single gzip files to one gzip file
        try (InputStream isOne = new FileInputStream("file1.gz");
             InputStream isTwo = new FileInputStream("file2.gz");
             InputStream isThree = new FileInputStream("file3.gz");
             SequenceInputStream sis = new SequenceInputStream(new SequenceInputStream(isOne, isTwo), isThree);
             OutputStream bos = new FileOutputStream("output_all_three.gz")) {
            byte[] buffer = new byte[8192];
            int bytesRead;
            while ((bytesRead = sis.read(buffer)) != -1) {
                bos.write(buffer, 0, bytesRead);
            }
            bos.flush();
        }

        // un-gzip the single gzip file; the output contains the
        // concatenated content of the single uncompressed files
        try (GZIPInputStream gzipis = new GZIPInputStream(new FileInputStream("output_all_three.gz"));
             OutputStream bos = new FileOutputStream("output_all_three")) {
            byte[] buffer = new byte[8192];
            int bytesRead;
            while ((bytesRead = gzipis.read(buffer)) != -1) {
                bos.write(buffer, 0, bytesRead);
            }
            bos.flush();
        }
    }
}
The above method works if you just need to concatenate many gzipped files. In my case I had written a web servlet and my response was 20-30 KB, so I was sending the response gzipped.
I tried to gzip all my individual JS files once at server start and then append dynamically generated code at runtime using the above method. I could print the entire response in my log file, but Chrome was only able to unzip the first file; the rest of the output came through as raw bytes.
After some research I found out that this is not possible with Chrome, and they closed the bug without fixing it:
https://bugs.chromium.org/p/chromium/issues/detail?id=20884

How to download a file from the internet using Java

Hi, I am trying to write some code in my program so I can grab a file from the internet, but it seems it is not working. Can someone give me some advice please? Here is my code. In this case I try to download an mp3 file from the last.fm website; my code runs perfectly fine, but when I open my downloads directory the file is not there. Any idea?
public class download {
    public static void main(String[] args) throws IOException {
        String fileName = "Death Grips - Get Got.mp3";
        URL link = new URL("http://www.last.fm/music/+free-music-downloads");
        InputStream in = new BufferedInputStream(link.openStream());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int n = 0;
        while (-1 != (n = in.read(buf))) {
            out.write(buf, 0, n);
        }
        out.close();
        in.close();
        byte[] response = out.toByteArray();
        FileOutputStream fos = new FileOutputStream(fileName);
        fos.write(response);
        fos.close();
        System.out.println("Finished");
    }
}
Every executing program has a current working directory. Often it is the directory where the executable lives (if it was launched in a "normal" way).
Since you didn't specify a path (in fileName), the file will be saved with that name in the current working directory.
If you want the file to be saved in your downloads directory, specify the full path, e.g.
String fileName = "C:\\Users\\YOUR_USERNAME\\Downloads\\Death Grips - Get Got.mp3";
Note how I've escaped the backslashes. Also note that Java has methods for joining paths and for getting the current working directory, as sketched below.
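For example (a minimal sketch; user.home and the Downloads folder name are assumptions about the target machine):

import java.nio.file.Path;
import java.nio.file.Paths;

// the current working directory, i.e. where the unqualified fileName ends up
String cwd = System.getProperty("user.dir");

// joining paths without hand-writing separators
Path downloads = Paths.get(System.getProperty("user.home"), "Downloads");
String fileName = downloads.resolve("Death Grips - Get Got.mp3").toString();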

Copy from a filechannel to another

I'm trying to copy part of a file from one filechannel to another (writing a new file that is, in effect, equal to the first one).
So I'm reading chunks of 256 KB, then putting them back into another channel:
static void openfile(String str) throws FileNotFoundException, IOException {
    int size = 262144;
    FileInputStream fis = new FileInputStream(str);
    FileChannel fc = fis.getChannel();
    byte[] barray = new byte[size];
    ByteBuffer bb = ByteBuffer.wrap(barray);
    FileOutputStream fos = new FileOutputStream(str + "2" /**/);
    FileChannel fo = fos.getChannel();
    StringBuilder sb;
    while (fc.read(bb) != -1) {
        fo.write(bb /**/);
        bb.clear();
    }
}
The problem is that fo.write (I think) writes again from the beginning of the channel, so the new file is made up only of the last chunk read.
I tried fo.write(bb, bb.position()), but it didn't work as I expected (does the pointer return to the beginning of the channel?), and FileOutputStream(str+"2", true), thinking it would append to the end of the new file, but it didn't.
I need to work with chunks of 256 KB, so I can't change the structure of the program much (unless I'm doing something terribly wrong).
Resolved with bb.flip():
while (fi.read(bb) != -1) {
    bb.flip();
    fo.write(bb);
    bb.clear();
}
This is a very old question, but I stumbled upon it and thought I might add another answer that potentially has better performance, using FileChannel.transferTo or FileChannel.transferFrom. As per the javadoc:
This method is potentially much more efficient than a simple loop that reads from the source channel and writes to this channel. Many operating systems can transfer bytes directly from the source channel into the filesystem cache without actually copying them.
public static void copy(FileChannel src, FileChannel dst) throws IOException {
    long size = src.size();
    long transferred = 0;
    do {
        // resume from where the previous call stopped, in case of a partial transfer
        transferred += src.transferTo(transferred, size - transferred, dst);
    } while (transferred < size);
}
In most cases a simple src.transferTo(0, src.size(), dst); will work, as long as neither channel is non-blocking.
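For example, a hypothetical caller of the copy method above might look like this (file names are placeholders; imports from java.nio.file assumed):

try (FileChannel src = FileChannel.open(Paths.get("source.bin"), StandardOpenOption.READ);
     FileChannel dst = FileChannel.open(Paths.get("target.bin"),
             StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
    copy(src, dst);
}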
The canonical way to copy between channels is as follows:
while (in.read(bb) > 0 || bb.position() > 0) {
    bb.flip();     // switch the buffer to drain mode
    out.write(bb); // may write only part of the buffer
    bb.compact();  // keep any unwritten bytes and switch back to fill mode
}
The simplified version in your edited answer doesn't work in all circumstances, e.g. when 'out' is non-blocking.

Combining all text files in a folder into a single file

How can I combine all txt files in a folder into a single file? A folder usually contains hundreds to thousands of txt files.
If this program were only to be run on Windows machines I would just go with a batch file containing something like
copy /b *.txt merged.txt
But that is not the case, so I figured it might be easier to just write it in Java to complement everything else we have.
I have written something like this
// Retrieves a list of files from the specified folder with the filter applied
File[] files = Utils.filterFiles(downloadFolder + folder, ".*\\.txt");
try {
    // savePath is the path of the output file
    FileOutputStream outFile = new FileOutputStream(savePath);
    for (File file : files) {
        FileInputStream inFile = new FileInputStream(file);
        Integer b = null;
        while ((b = inFile.read()) != -1)
            outFile.write(b);
        inFile.close();
    }
    outFile.close();
} catch (Exception e) {
    e.printStackTrace();
}
But it takes several minutes to combine thousands of files so it is not feasible.
Use NIO; it is much easier than using input/output streams. Note: this uses Guava's Closer, which means all resources are safely closed; even better would be to use Java 7 and try-with-resources (see the sketch after the code below).
final Closer closer = Closer.create();
final RandomAccessFile outFile;
final FileChannel outChannel;

try {
    outFile = closer.register(new RandomAccessFile(dstFile, "rw"));
    outChannel = closer.register(outFile.getChannel());
    for (final File file : filesToCopy)
        doWrite(outChannel, file);
} finally {
    closer.close();
}

// doWrite method
private static void doWrite(final WritableByteChannel channel, final File file)
    throws IOException
{
    final Closer closer = Closer.create();
    final RandomAccessFile inFile;
    final FileChannel inChannel;

    try {
        inFile = closer.register(new RandomAccessFile(file, "r"));
        inChannel = closer.register(inFile.getChannel());
        inChannel.transferTo(0, inChannel.size(), channel);
    } finally {
        closer.close();
    }
}
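For comparison, the same merge with Java 7 try-with-resources instead of Guava's Closer might look like this (a sketch using the same dstFile and filesToCopy):

try (RandomAccessFile outFile = new RandomAccessFile(dstFile, "rw");
     FileChannel outChannel = outFile.getChannel()) {
    for (final File file : filesToCopy) {
        try (RandomAccessFile inFile = new RandomAccessFile(file, "r");
             FileChannel inChannel = inFile.getChannel()) {
            // transfer the whole input file onto the end of the output channel
            inChannel.transferTo(0, inChannel.size(), outChannel);
        }
    }
}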
Because of this:
Integer b = null;
while ((b = inFile.read()) != -1)
    outFile.write(b);
your OS is making a lot of IO calls. read() only reads one byte of data; use the other read methods that accept a byte[], and then use that byte[] to write to your OutputStream. Similarly, write(int) makes an IO call to write a single byte, so change that too. A sketch follows below.
Of course, you can also look into tools that do this for you, like Apache Commons IO or the Java 7 NIO package.
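A minimal sketch of the buffered version of the question's inner loop (the 8 KB buffer size is an arbitrary choice):

byte[] buffer = new byte[8192];
int bytesRead;
// read() fills up to buffer.length bytes per IO call instead of one
while ((bytesRead = inFile.read(buffer)) != -1) {
    outFile.write(buffer, 0, bytesRead);
}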
Try using BufferedReader and BufferedWriter instead of writing bytes one by one.
You can use IOUtils from Apache Commons IO to merge files; the IOUtils.copy() method will do the copying for you.
I would do it this way!
Check the OS:
System.getProperty("os.name")
and run the appropriate system-level command from Java.
If Windows:
copy /b *.txt merged.txt
If Unix:
cat *.txt > merged.txt
or whatever system-level command works best.
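A minimal sketch of that idea (assuming the files live in the working directory; this inherits the shell's quirks, e.g. on Unix merged.txt itself matches *.txt if it already exists):

static void mergeWithSystemCommand() throws IOException, InterruptedException {
    String os = System.getProperty("os.name").toLowerCase();
    // hand the wildcard to the platform shell, which knows how to expand it
    ProcessBuilder pb = os.contains("win")
            ? new ProcessBuilder("cmd", "/c", "copy /b *.txt merged.txt")
            : new ProcessBuilder("sh", "-c", "cat *.txt > merged.txt");
    pb.inheritIO();
    int exitCode = pb.start().waitFor();
    System.out.println("merge finished with exit code " + exitCode);
}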
