I've written Java code for writing a String to a file. The string will be at most about 10 KB. Below is the code; it tries three different ways to write the file.
void writeMethod(String string, int m) throws IOException
{
    if (m == 1)
    {
        FileChannel rwChannel = new RandomAccessFile(filePath, "rw").getChannel();
        // Map as many bytes as the encoded string actually occupies
        ByteBuffer wrBuf = rwChannel.map(FileChannel.MapMode.READ_WRITE, 0, string.getBytes().length);
        wrBuf.put(string.getBytes());
        rwChannel.close();
    }
    if (m == 2)
    {
        FileOutputStream fileOutputStream = new FileOutputStream(filePath);
        fileOutputStream.write(string.getBytes());
        fileOutputStream.close();
    }
    if (m == 3)
    {
        FileWriter bw = new FileWriter(filePath);
        bw.write(string);
        bw.close();
    }
}
(Please ignore any minor errors; the snippets just illustrate the three approaches.)
I call the above method from three threads, one mode per thread. I'm not sure which one is fastest, and if none of these is ideal, which approach would be better? I have to write 17,000,000 files.
You might also want to try the java.nio.file package as one of your methods for testing purposes.
Something like:
Path path = Paths.get(filePath);
Files.write(path, string.getBytes()); // no OpenOptions needed; passing null would throw a NullPointerException
I have two disks in a Linux system, say /dev/dsk1 and /dev/dsk2, and I'm trying to read the raw bytes from dsk1 and write them to dsk2, in order to make dsk2 an exact copy of dsk1. I tried the following (executed with sudo):
import java.io.File;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.NoSuchAlgorithmException;
public class Main {
    public static void main(String[] args) throws NoSuchAlgorithmException, IOException {
        Path src = new File("/dev/dsk1").toPath();
        Path dst = new File("/dev/dsk2").toPath();
        FileChannel r = FileChannel.open(src, StandardOpenOption.READ, StandardOpenOption.WRITE);
        FileChannel w = FileChannel.open(dst, StandardOpenOption.READ, StandardOpenOption.WRITE);
        long size = r.size();
        ByteBuffer byteBuffer = ByteBuffer.allocate(1024);
        for (int offset = 0; offset < size; offset += 1024) {
            r.position(offset);
            w.position(offset);
            r.read(byteBuffer);
            byteBuffer.flip();
            w.write(byteBuffer);
            byteBuffer.clear();
        }
        r.close();
        w.close();
    }
}
but after writing all the bytes from dsk1 to dsk2, dsk2's filesystem seems to be corrupted. No files can be found on it, and if I try to mkdir it says "structure needs cleaning".
I've tested the above code on regular files, such as a text1.txt containing a few characters as src and an empty text2.txt as dst, and it worked fine.
Did I miss something when reading and writing raw data on a block device?
You never check whether the read method actually read all 1024 bytes, or whether the write method wrote them all. Most likely you're leaving gaps in the copy.
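For illustration, a minimal sketch of a loop that tolerates short reads and short writes, reusing the r and w channels from the question. Channel positions advance automatically on read and write, so the manual position() calls (and the int offset, which would overflow on a disk over 2 GB) can go away entirely:
ByteBuffer buf = ByteBuffer.allocate(64 * 1024);
while (r.read(buf) != -1) {
    buf.flip();
    while (buf.hasRemaining()) {
        w.write(buf);   // write() may also be partial, so drain the buffer fully
    }
    buf.clear();
}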
There's no magic involved in reading from and writing to devices. The first thing I would try is this:
try (FileInputStream src = new FileInputStream("/dev/dsk1");
     FileOutputStream dst = new FileOutputStream("/dev/dsk2")) {
    src.transferTo(dst);
}
How can I split a file into parts larger than 2 GB? A byte array's size is an int, not a long. Any solution?
public void splitFile(SplitFile file) throws IOException {
    int partCounter = 1;
    int sizeOfFiles = (int) value; // value is the desired part size; this int cast is the 2 GB limit
    byte[] buffer = new byte[sizeOfFiles];
    File f = file.getFile();
    String fileName = f.getName();
    try (FileInputStream fis = new FileInputStream(f);
         BufferedInputStream bis = new BufferedInputStream(fis)) {
        int bytesAmount = 0;
        while ((bytesAmount = bis.read(buffer)) > 0) {
            String filePartName = fileName + partCounter + file.config.options.getExtension();
            partCounter++;
            File newFile = new File(f.getParent(), filePartName);
            try (FileOutputStream out = new FileOutputStream(newFile)) {
                out.write(buffer, 0, bytesAmount);
            }
        }
    }
}
Don't read the entire file into memory, obviously, or even an entire part file.
Your code as pasted creates a new part file for every call to read(), so the file is split into however many pieces read() happens to deliver. That's fragile: read() is specced to be allowed to return as little as a single byte at a time.
Don't make a new part file per read() call. Instead, separate the two quantities: a read() call returns anywhere from 1 to BUFFER_SIZE bytes, and a part is PART_SIZE bytes large; these two numbers do not have to be the same, and you shouldn't write the code as if they were.
Once you have an open FileOutputStream you can call .write(buffer, 0, bytesAmount) on it any number of times; you can even call .write(buffer, 0, theSmallerOfBytesLeftToWriteInThisPartAndBytesAmount) followed by opening the next part's FileOutputStream and calling .write(buffer, whereWeLeftOff, remainder) on that one, as in the sketch below.
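A sketch of that structure; the names (splitLargeFile, partSize) are invented for illustration, and partSize is a long so a part can exceed 2 GB:
static void splitLargeFile(File src, long partSize) throws IOException {
    byte[] buffer = new byte[64 * 1024];
    int partCounter = 1;
    try (InputStream in = new BufferedInputStream(new FileInputStream(src))) {
        int read = in.read(buffer);
        while (read > 0) {
            File part = new File(src.getParent(), src.getName() + "." + partCounter++);
            try (OutputStream out = new FileOutputStream(part)) {
                long written = 0;
                while (written < partSize && read > 0) {
                    int toWrite = (int) Math.min(read, partSize - written);
                    out.write(buffer, 0, toWrite);
                    written += toWrite;
                    if (toWrite < read) {
                        // this part is full; the leftover bytes belong to the next part
                        System.arraycopy(buffer, toWrite, buffer, 0, read - toWrite);
                        read -= toWrite;
                        break;
                    }
                    read = in.read(buffer);
                }
            }
        }
    }
}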
I'm trying to copy part of a file from one FileChannel to another (writing a new file that is, in effect, equal to the first one).
So I'm reading chunks of 256 KB, then putting them into another channel:
static void openfile(String str) throws FileNotFoundException, IOException {
    int size = 262144;
    FileInputStream fis = new FileInputStream(str);
    FileChannel fc = fis.getChannel();
    byte[] barray = new byte[size];
    ByteBuffer bb = ByteBuffer.wrap(barray);
    FileOutputStream fos = new FileOutputStream(str + "2" /**/);
    FileChannel fo = fos.getChannel();
    StringBuilder sb;
    while (fc.read(bb) != -1) {
        fo.write(bb /**/);
        bb.clear();
    }
}
The problem is that fo.write (I think) writes again from the beginning of the channel, so the new file consists only of the last chunk read.
I tried fo.write(bb, bb.position()), but it didn't work as I expected (does the pointer return to the beginning of the channel?), and FileOutputStream(str+"2", true), thinking it would append to the end of the new file, but it didn't.
I need to work with chunks of 256 KB, so I can't change the structure of the program much (unless I'm doing something terribly wrong).
Resolved with bb.flip():
while (fc.read(bb) != -1) {
    bb.flip();
    fo.write(bb);
    bb.clear();
}
This is a very old question, but I stumbled upon it and thought I might add another answer that potentially has better performance, using FileChannel.transferTo or FileChannel.transferFrom. Per the javadoc:
This method is potentially much more efficient than a simple loop that reads from the source channel and writes to this channel. Many operating systems can transfer bytes directly from the source channel into the filesystem cache without actually copying them.
public static void copy(FileChannel src, FileChannel dst) throws IOException {
    long size = src.size();
    long transferred = 0;
    do {
        // transferTo does not change src's position, so advance the offset manually
        transferred += src.transferTo(transferred, size - transferred, dst);
    } while (transferred < size);
}
In most cases a simple src.transferTo(0, src.size(), dst); will work, as long as neither channel is non-blocking.
The canonical way to copy between channels is as follows:
while (in.read(bb) > 0 || bb.position() > 0)
{
    bb.flip();
    out.write(bb);
    bb.compact();
}
The simplified version in your edited answer doesn't work in all circumstances, e.g. when 'out' is non-blocking.
How can I combine all txt files in a folder into a single file? A folder usually contains hundreds to thousands of txt files.
If this program were only to be run on Windows machines, I would just go with a batch file containing something like
copy /b *.txt merged.txt
But that is not the case, so I figured it might be easier to just write it in Java to complement everything else we have.
I have written something like this:
// Retrieves a list of files from the specified folder with the filter applied
File[] files = Utils.filterFiles(downloadFolder + folder, ".*\\.txt");
try
{
    // savePath is the path of the output file
    FileOutputStream outFile = new FileOutputStream(savePath);
    for (File file : files)
    {
        FileInputStream inFile = new FileInputStream(file);
        Integer b = null;
        while ((b = inFile.read()) != -1)
            outFile.write(b);
        inFile.close();
    }
    outFile.close();
}
catch (Exception e)
{
    e.printStackTrace();
}
But it takes several minutes to combine thousands of files, so it is not feasible.
Use NIO; it is much easier than using input/output streams. Note: this uses Guava's Closer, which means all resources are safely closed; even better would be to use Java 7 and try-with-resources.
final Closer closer = Closer.create();
final RandomAccessFile outFile;
final FileChannel outChannel;

try {
    outFile = closer.register(new RandomAccessFile(dstFile, "rw"));
    outChannel = closer.register(outFile.getChannel());
    for (final File file : filesToCopy)
        doWrite(outChannel, file);
} finally {
    closer.close();
}

// doWrite method
private static void doWrite(final WritableByteChannel channel, final File file)
    throws IOException
{
    final Closer closer = Closer.create();
    final RandomAccessFile inFile;
    final FileChannel inChannel;

    try {
        inFile = closer.register(new RandomAccessFile(file, "r"));
        inChannel = closer.register(inFile.getChannel());
        inChannel.transferTo(0, inChannel.size(), channel);
    } finally {
        closer.close();
    }
}
Because of this:
Integer b = null;
while ((b = inFile.read()) != -1)
    outFile.write(b);
your OS is making a lot of I/O calls. read() reads only one byte of data. Use the other read methods that accept a byte[]; you can then use that byte[] to write to your OutputStream. Similarly, write(int) makes an I/O call writing a single byte. Change that too.
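For example, the same loop reworked to move a buffer per call (8192 is an arbitrary buffer size; savePath and files are the variables from the question):
byte[] buf = new byte[8192];
try (FileOutputStream outFile = new FileOutputStream(savePath)) {
    for (File file : files) {
        try (FileInputStream inFile = new FileInputStream(file)) {
            int n;
            while ((n = inFile.read(buf)) != -1)
                outFile.write(buf, 0, n);  // write n bytes in one call
        }
    }
}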
Of course, you can look into tools that do this for you, like Apache Commons IO or even the Java 7 NIO package.
Try using BufferedReader and BufferedWriter instead of writing bytes one by one.
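For instance, a sketch with buffered readers and writers; note that Files.newBufferedReader and Files.newBufferedWriter default to UTF-8, which matters if the text files use another encoding:
try (BufferedWriter out = Files.newBufferedWriter(Paths.get(savePath))) {
    for (File file : files) {
        try (BufferedReader in = Files.newBufferedReader(file.toPath())) {
            int c;
            while ((c = in.read()) != -1)  // buffered, so not one syscall per char
                out.write(c);
        }
    }
}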
You can use IOUtils from Apache Commons IO to merge files; the IOUtils.copy() method will do the copying for you.
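A minimal sketch, assuming Apache Commons IO is on the classpath (IOUtils.copy is the real commons-io call; the surrounding stream setup is illustrative):
try (OutputStream out = new FileOutputStream(savePath)) {
    for (File file : files) {
        try (InputStream in = new FileInputStream(file)) {
            IOUtils.copy(in, out);  // buffered copy of one whole input file
        }
    }
}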
This link may be useful: merging files in Java.
I would do it this way:
Check the OS with System.getProperty("os.name"), then run the system-level command from Java (sketch below).
If Windows:
copy /b *.txt merged.txt
If Unix:
cat *.txt > merged.txt
Or whatever the best system-level command available is.
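A rough sketch of that idea using ProcessBuilder; a shell is needed so the wildcard and redirection get expanded, folderPath (the directory holding the txt files) is an assumed variable, and waitFor() also throws InterruptedException:
String os = System.getProperty("os.name").toLowerCase();
ProcessBuilder pb = os.contains("win")
        ? new ProcessBuilder("cmd", "/c", "copy /b *.txt merged.txt")
        : new ProcessBuilder("sh", "-c", "cat *.txt > merged.txt");
pb.directory(new File(folderPath));  // run in the folder containing the txt files
pb.inheritIO();
int exit = pb.start().waitFor();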
int BUFFER_SIZE = 4096;
byte[] buffer = new byte[BUFFER_SIZE];
try {
    InputStream input = new GZIPInputStream(new FileInputStream("a_gunzipped_file.gz"));
    OutputStream output = new FileOutputStream("current_output_name");
    int n = input.read(buffer, 0, BUFFER_SIZE);
    while (n >= 0) {
        output.write(buffer, 0, n);
        n = input.read(buffer, 0, BUFFER_SIZE);
    }
    input.close();
    output.close();
} catch (IOException e) {
    System.out.println("error: \n\t" + e.getMessage());
}
Using the above code I can successfully extract a gzip archive's contents, although the extracted file's name will, as expected, always be current_output_name (I know it's because I declared it that way in the code). My problem is that I don't know how to get the original filename stored inside the archive.
Though java.util.zip provides a ZipEntry, I couldn't use it on gzip files.
Any alternatives?
I kind of agree with Michael Borgwardt's reply, but it is not entirely true: the gzip file specification includes an optional file name stored in the header of the .gz file. Sadly, there is no way (as far as I know) of getting at that name in current Java (1.6). As seen in the getHeader method of the GZIPInputStream implementation in OpenJDK, they skip reading the file name:
// Skip optional file name
if ((flg & FNAME) == FNAME) {
    while (readUByte(in) != 0) ;
}
I have modified the GZIPInputStream class to get the optional filename out of the gzip header (I'm not sure if I am allowed to do that; download the original version from here). You only need to add a String filename field to the class and change the code above to:
// Read the optional file name instead of skipping it
if ((flg & FNAME) == FNAME) {
    filename = "";
    int _byte = 0;
    while ((_byte = readUByte(in)) != 0) {
        filename += (char) _byte;
    }
}
It worked for me.
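Alternatively, to leave the JDK class untouched, here is a sketch that reads the FNAME field straight from the gzip header, following the header layout in RFC 1952 (the spec linked further down); gzipFilename is a made-up helper name:
// Returns the stored original filename, or null if the header has none.
static String gzipFilename(File gz) throws IOException {
    try (DataInputStream in = new DataInputStream(new FileInputStream(gz))) {
        if (in.readUnsignedByte() != 0x1f || in.readUnsignedByte() != 0x8b)
            throw new IOException("not a gzip file");
        in.readUnsignedByte();            // compression method
        int flg = in.readUnsignedByte();  // header flags
        in.skipBytes(6);                  // MTIME (4 bytes), XFL, OS
        if ((flg & 0x04) != 0) {          // FEXTRA: skip the extra field
            int xlen = in.readUnsignedByte() | (in.readUnsignedByte() << 8);
            in.skipBytes(xlen);
        }
        if ((flg & 0x08) == 0)            // no FNAME field present
            return null;
        StringBuilder name = new StringBuilder();
        int b;
        while ((b = in.readUnsignedByte()) != 0)
            name.append((char) b);        // the spec says ISO-8859-1
        return name.toString();
    }
}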
Apache Commons Compress offers two options for obtaining the filename:
With metadata (Java 7+ sample code)
try (GzipCompressorInputStream gcis =
         new GzipCompressorInputStream(
             new FileInputStream("a_gunzipped_file.gz"))) {
    String filename = gcis.getMetaData().getFilename();
}
With "the convention"
String filename = GzipUtils.getUncompressedFilename("a_gunzipped_file.gz");
References
Apache Commons Compress
GzipCompressorInputStream
See also: GzipUtils#getUncompressedFilename
Actually, the GZIP file format, with its multiple members, allows the original filename to be specified: a member whose FNAME flag is set carries the name. I do not see a way to do this in the Java standard library, though.
http://www.gzip.org/zlib/rfc-gzip.html#specification
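For what it's worth, Apache Commons Compress (used in an earlier answer) can write the FNAME field: GzipParameters and GzipCompressorOutputStream are real commons-compress classes, while the file names here are illustrative:
GzipParameters params = new GzipParameters();
params.setFilename("report.csv");  // stored in the gzip header's FNAME field
try (OutputStream out = new GzipCompressorOutputStream(
        new FileOutputStream("report.csv.gz"), params)) {
    out.write(Files.readAllBytes(Paths.get("report.csv")));
}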
Following the answers above, here is an example that creates a file "myTest.csv.gz" containing a file "myTest.csv". Notice that you can't change the internal file name, and you can't add more files into the .gz file.
@Test
public void gzipFileName() throws Exception {
    File workingFile = new File("target", "myTest.csv.gz");
    GZIPOutputStream gzipOutputStream = new GZIPOutputStream(new FileOutputStream(workingFile));
    PrintWriter writer = new PrintWriter(gzipOutputStream);
    writer.println("hello,line,1");
    writer.println("hello,line,2");
    writer.close();
}
Gzip is purely compression. There is no archive; it's just the file's data, compressed.
The convention is for gzip to append .gz to the filename, and for gunzip to remove that extension. So, logfile.txt becomes logfile.txt.gz when compressed, and again logfile.txt when it's decompressed. If you rename the file, the name information is lost.
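So recovering the name is just string handling; a one-liner sketch of that convention:
String name = "logfile.txt.gz";
String original = name.endsWith(".gz")
        ? name.substring(0, name.length() - 3)  // strip the .gz suffix
        : name;                                 // no suffix: nothing to recover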