I'm trying to build a simplified HDFS (Hadoop Distributed File System) for a final project in a Distributed Systems course.
So, the first thing I'm trying to do is write a program which splits an arbitrary file into blocks (chunks) of an arbitrary size.
I found this useful example, whose code is:
package javabeat.net.io;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;

/**
 * Split File Example
 *
 * @author Krishna
 */
public class SplitFileExample {

    private static String FILE_NAME = "TextFile.txt";
    private static byte PART_SIZE = 5;

    public static void main(String[] args) {
        File inputFile = new File(FILE_NAME);
        FileInputStream inputStream;
        String newFileName;
        FileOutputStream filePart;
        int fileSize = (int) inputFile.length();
        int nChunks = 0, read = 0, readLength = PART_SIZE;
        byte[] byteChunkPart;
        try {
            inputStream = new FileInputStream(inputFile);
            while (fileSize > 0) {
                if (fileSize <= 5) {
                    readLength = fileSize;
                }
                byteChunkPart = new byte[readLength];
                read = inputStream.read(byteChunkPart, 0, readLength);
                fileSize -= read;
                assert (read == byteChunkPart.length);
                nChunks++;
                newFileName = FILE_NAME + ".part"
                        + Integer.toString(nChunks - 1);
                filePart = new FileOutputStream(new File(newFileName));
                filePart.write(byteChunkPart);
                filePart.flush();
                filePart.close();
                byteChunkPart = null;
                filePart = null;
            }
            inputStream.close();
        } catch (IOException exception) {
            exception.printStackTrace();
        }
    }
}
But I think there is a big issue: the value of PART_SIZE cannot be greater than 127, otherwise a "possible loss of precision" error will occur.
How can I solve this without totally changing the code?
The problem is that PART_SIZE is a byte; its maximum value is therefore indeed 127.
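If all you want is the minimal change, declare the size as an int instead of a byte, and compare against PART_SIZE rather than the hardcoded 5:

private static final int PART_SIZE = 5; // int instead of byte: no 127 ceiling

// ... and in the loop:
if (fileSize <= PART_SIZE) { // was: if (fileSize <= 5)
    readLength = fileSize;
}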
The code you have at the moment, however, is full of problems; for one, the resource handling is incorrect.
Here is a version using java.nio.file:
// imports needed for this snippet:
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

private static final String FILENAME = "TextFile.txt";
private static final int PART_SIZE = xxx; // HERE

public static void main(final String... args)
    throws IOException
{
    final Path file = Paths.get(FILENAME).toRealPath();
    final String filenameBase = file.getFileName().toString();
    final byte[] buf = new byte[PART_SIZE];

    int partNumber = 0;
    Path part;
    int bytesRead;
    byte[] toWrite;

    try (
        final InputStream in = Files.newInputStream(file);
    ) {
        while ((bytesRead = in.read(buf)) != -1) {
            part = file.resolveSibling(filenameBase + ".part" + partNumber);
            toWrite = bytesRead == PART_SIZE ? buf : Arrays.copyOf(buf, bytesRead);
            Files.write(part, toWrite, StandardOpenOption.CREATE_NEW);
            partNumber++;
        }
    }
}
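For your HDFS project you will presumably also want to reassemble the parts; here is a minimal sketch of the reverse operation, assuming the same .partN naming scheme with contiguous numbering from 0 (java.io.OutputStream is the one extra import):

try (final OutputStream out = Files.newOutputStream(Paths.get(FILENAME + ".joined"))) {
    for (int i = 0; ; i++) {
        final Path part = Paths.get(FILENAME + ".part" + i);
        if (!Files.exists(part)) {
            break; // no more parts
        }
        Files.copy(part, out); // append this part's bytes to the joined file
    }
}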
(This fragment uses PDFBox: its Splitter splits a loaded PDF into per-page documents.)

List<PDDocument> Pages = new ArrayList<PDDocument>();
PDDocument document = PDDocument.load(new File(filePath));
try {
    Splitter splitter = new Splitter();
    splitter.setSplitAtPage(NoOfPagesDocumentWillContain);
    Pages = splitter.split(document);
} catch (Exception e) {
    e.printStackTrace();
}
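To write the pieces out, something like this (a sketch assuming PDFBox 2.x; the output file names are made up):

int n = 0;
for (PDDocument page : Pages) {
    page.save(new File("split-" + (++n) + ".pdf")); // hypothetical naming scheme
    page.close(); // release resources for each piece
}
document.close();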
I am able to create a 7z file, but I want to create the file with a password. I tried with the set-compression method, but there is no option to set the key. Please help me: how can I create a password-protected 7z file in Java?
public static void main(String args[]) throws FileNotFoundException, IOException {
    SevenZOutputFile sevenZOutput = new SevenZOutputFile(new File("D:\\Test\\outFile.7z"));
    File entryFile = new File("D:\\Test\\Test_20200210200232.dat");
    SevenZArchiveEntry entry = sevenZOutput.createArchiveEntry(entryFile, entryFile.getName());
    sevenZOutput.putArchiveEntry(entry);

    FileInputStream in = new FileInputStream(entryFile);
    int len;
    byte buffer[] = new byte[8192];
    int transferedMegaBytes2 = 0;
    while ((len = in.read(buffer)) > 0) {
        sevenZOutput.write(buffer, 0, len);
        transferredBytes += len;
        int transferedMegaBytes = (int) (transferredBytes / 1048576);
        if (transferedMegaBytes > transferedMegaBytes2) {
            System.out.println("Transferred: " + transferedMegaBytes + " Megabytes.");
            transferedMegaBytes2 = transferedMegaBytes;
        }
    }
    sevenZOutput.closeArchiveEntry();
    sevenZOutput.setContentCompression(SevenZMethod.AES256SHA256);
    sevenZOutput.close();
}
Apache Commons Compress does not support creating 7Z with passwords.
https://commons.apache.org/proper/commons-compress/limitations.html
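If the hard requirement is the password rather than the .7z container (an assumption on my part), one workaround is an AES-encrypted ZIP via the Zip4j library; a minimal sketch, assuming Zip4j 2.x on the classpath:

import java.io.File;
import net.lingala.zip4j.ZipFile;
import net.lingala.zip4j.model.ZipParameters;
import net.lingala.zip4j.model.enums.EncryptionMethod;

// Produces an encrypted .zip, not a .7z.
ZipParameters params = new ZipParameters();
params.setEncryptFiles(true);
params.setEncryptionMethod(EncryptionMethod.AES);
ZipFile zip = new ZipFile("D:\\Test\\outFile.zip", "secret".toCharArray());
zip.addFile(new File("D:\\Test\\Test_20200210200232.dat"), params);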
I've tidied up your code, added try-with-resources & annotated a couple of problem areas:
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.commons.compress.archivers.sevenz.SevenZArchiveEntry;
import org.apache.commons.compress.archivers.sevenz.SevenZMethod;
import org.apache.commons.compress.archivers.sevenz.SevenZOutputFile;

public class Q66451111 {

    private static final int KB = 1024;
    private static final int MB = KB * KB;

    public static void main(final String[] args) throws IOException {
        final File entryFile = new File("D:\\Test\\Test_20200210200232.dat");
        final File new7Z = new File("D:\\Test\\outFile.7z");

        try (final InputStream fin = new FileInputStream(entryFile);
             final InputStream in = new BufferedInputStream(fin);
             final SevenZOutputFile szof = new SevenZOutputFile(new7Z))
        {
            final SevenZArchiveEntry entry = szof.createArchiveEntry(entryFile, entryFile.getName());
            szof.putArchiveEntry(entry);

            final byte[] buffer = new byte[8192];
            int transferredBytes = 0;
            int transferredBytesSincePrint = 0;
            int len;
            while ((len = in.read(buffer)) != -1 /* TODO Note: do NOT use '> 0' */) {
                szof.write(buffer, 0, len);
                transferredBytes += len;
                transferredBytesSincePrint += len;
                if (transferredBytesSincePrint > MB) {
                    transferredBytesSincePrint = 0;
                    System.out.println("Transferring.: " + ((double) transferredBytes / MB) + " Megabytes.");
                }
            }
            System.out.println("Transferred..: " + ((double) transferredBytes / MB) + " Megabytes.");

            szof.closeArchiveEntry();
            szof.setContentCompression(SevenZMethod.AES256SHA256 /* FIXME Unsupported 7z Method!! */);
        }
    }
}
I've been asked to measure current disk performance, as we are planning to replace local disk with network attached storage on our application servers. Since our applications which write data are written in Java, I thought I would measure the performance directly in Linux, and also using a simple Java test. However I'm getting significantly different results, particularly for reading data, using what appear to me to be similar tests. Directly in Linux I'm doing:
dd if=/dev/zero of=/data/cache/test bs=1048576 count=8192
dd if=/data/cache/test of=/dev/null bs=1048576 count=8192
My Java test looks like this:
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class TestDiskSpeed {

    private byte[] oneMB = new byte[1024 * 1024];

    public static void main(String[] args) throws IOException {
        new TestDiskSpeed().execute(args);
    }

    private void execute(String[] args) throws IOException {
        long size = Long.parseLong(args[1]);
        testWriteSpeed(args[0], size);
        testReadSpeed(args[0], size);
    }

    private void testWriteSpeed(String filePath, long size) throws IOException {
        File file = new File(filePath);
        BufferedOutputStream writer = null;
        long start = System.currentTimeMillis();
        try {
            writer = new BufferedOutputStream(new FileOutputStream(file), 1024 * 1024);
            for (int i = 0; i < size; i++) {
                writer.write(oneMB);
            }
            writer.flush();
        } finally {
            if (writer != null) {
                writer.close();
            }
        }
        long elapsed = System.currentTimeMillis() - start;
        String message = "Wrote " + size + "MB in " + elapsed + "ms at a speed of " + calculateSpeed(size, elapsed) + "MB/s";
        System.out.println(message);
    }

    private void testReadSpeed(String filePath, long size) throws IOException {
        File file = new File(filePath);
        BufferedInputStream reader = null;
        long start = System.currentTimeMillis();
        try {
            reader = new BufferedInputStream(new FileInputStream(file), 1024 * 1024);
            for (int i = 0; i < size; i++) {
                reader.read(oneMB);
            }
        } finally {
            if (reader != null) {
                reader.close();
            }
        }
        long elapsed = System.currentTimeMillis() - start;
        String message = "Read " + size + "MB in " + elapsed + "ms at a speed of " + calculateSpeed(size, elapsed) + "MB/s";
        System.out.println(message);
    }

    private double calculateSpeed(long size, long elapsed) {
        double seconds = ((double) elapsed) / 1000L;
        double speed = ((double) size) / seconds;
        return speed;
    }
}
This is being invoked with "java TestDiskSpeed /data/cache/test 8192"
Both of these should be creating 8GB files of zeros, 1MB at a time, measuring the speed, and then reading it back and measuring again. Yet the speeds I'm consistently getting are:
Linux: write - ~650MB/s
Linux: read - ~4.2GB/s
Java: write - ~500MB/s
Java: read - ~1.9GB/s
Can anyone explain the large discrepancy?
When I run the following, using NIO, on my system (Ubuntu 15.04 with an i7-3970X):
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class Main {
    static final int SIZE_GB = Integer.getInteger("sizeGB", 8);
    static final int BLOCK_SIZE = 64 * 1024;

    public static void main(String[] args) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocateDirect(BLOCK_SIZE);
        File tmp = File.createTempFile("delete", "me");
        tmp.deleteOnExit();
        int blocks = (int) (((long) SIZE_GB << 30) / BLOCK_SIZE);
        long start = System.nanoTime();
        try (FileChannel fc = new FileOutputStream(tmp).getChannel()) {
            for (int i = 0; i < blocks; i++) {
                buffer.clear();
                while (buffer.remaining() > 0)
                    fc.write(buffer);
            }
        }
        long mid = System.nanoTime();
        try (FileChannel fc = new FileInputStream(tmp).getChannel()) {
            for (int i = 0; i < blocks; i++) {
                buffer.clear();
                while (buffer.remaining() > 0)
                    fc.read(buffer);
            }
        }
        long end = System.nanoTime();
        long size = tmp.length();
        System.out.printf("Write speed %.1f GB/s, read Speed %.1f GB/s%n",
                (double) size / (mid - start), (double) size / (end - mid));
    }
}
it prints:
Write speed 3.8 GB/s, read Speed 6.8 GB/s
You may get better performance if you drop the BufferedXxxStream. It's not helping, since you're doing 1MB reads/writes, and it causes an extra memory copy of the data.
Better yet, you should be using the NIO classes instead of the regular IO classes.
try-finally
You should clean up your try-finally code.
// Original code
BufferedOutputStream writer = null;
try {
writer = new ...;
// use writer
} finally {
if (writer != null) {
writer.close();
}
}
// Cleaner code
BufferedOutputStream writer = new ...;
try {
// use writer
} finally {
writer.close();
}
// Even cleaner, using try-with-resources (since Java 7)
try (BufferedOutputStream writer = new ...) {
// use writer
}
To complement Peter's great answer, I am adding the code below. It compares head-to-head the performance of good old java.io with NIO. Unlike Peter, instead of just reading data into a direct buffer, I do a typical thing with it: transfer it into an on-heap byte array. This steals surprisingly little from the performance: where I was getting 7.5 GB/s with Peter's code, here I get 6.0 GB/s.
For the java.io approach I can't have a direct buffer, but instead I call the read method directly with my target on-heap byte array. Note that this array is smallish and has an awkward size of 555 bytes. Nevertheless I retrieve almost identical performance: 5.6 GB/s. The difference is so small that it would evaporate completely in normal usage, and even in this artificial scenario if I wasn't reading directly from the disk cache.
As a bonus I include at the bottom a method which can be used on Linux and Mac to purge the disk caches. You'll see a dramatic turn in performance if you decide to call it between the write and the read step.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public final class MeasureIOPerformance {
    static final int SIZE_GB = Integer.getInteger("sizeGB", 8);
    static final int BLOCK_SIZE = 64 * 1024;
    static final int blocks = (int) (((long) SIZE_GB << 30) / BLOCK_SIZE);
    static final byte[] acceptBuffer = new byte[555];

    public static void main(String[] args) throws IOException {
        for (int i = 0; i < 3; i++) {
            measure(new ChannelRw());
            measure(new StreamRw());
        }
    }

    private static void measure(RW rw) throws IOException {
        File file = File.createTempFile("delete", "me");
        file.deleteOnExit();
        System.out.println("Writing " + SIZE_GB + " GB " + " with " + rw);
        long start = System.nanoTime();
        rw.write(file);
        long mid = System.nanoTime();
        System.out.println("Reading " + SIZE_GB + " GB " + " with " + rw);
        long checksum = rw.read(file);
        long end = System.nanoTime();
        long size = file.length();
        System.out.printf("Write speed %.1f GB/s, read Speed %.1f GB/s%n",
                (double) size / (mid - start), (double) size / (end - mid));
        System.out.println(checksum);
        file.delete();
    }

    interface RW {
        void write(File f) throws IOException;
        long read(File f) throws IOException;
    }

    static class ChannelRw implements RW {
        final ByteBuffer directBuffer = ByteBuffer.allocateDirect(BLOCK_SIZE);

        @Override public String toString() {
            return "Channel";
        }

        @Override public void write(File f) throws IOException {
            FileChannel fc = new FileOutputStream(f).getChannel();
            try {
                for (int i = 0; i < blocks; i++) {
                    directBuffer.clear();
                    while (directBuffer.remaining() > 0) {
                        fc.write(directBuffer);
                    }
                }
            } finally {
                fc.close();
            }
        }

        @Override public long read(File f) throws IOException {
            ByteBuffer buffer = ByteBuffer.allocateDirect(BLOCK_SIZE);
            FileChannel fc = new FileInputStream(f).getChannel();
            long checksum = 0;
            try {
                for (int i = 0; i < blocks; i++) {
                    buffer.clear();
                    while (buffer.hasRemaining()) {
                        fc.read(buffer);
                    }
                    buffer.flip();
                    while (buffer.hasRemaining()) {
                        buffer.get(acceptBuffer, 0, Math.min(acceptBuffer.length, buffer.remaining()));
                        checksum += acceptBuffer[acceptBuffer[0]];
                    }
                }
            } finally {
                fc.close();
            }
            return checksum;
        }
    }

    static class StreamRw implements RW {
        final byte[] buffer = new byte[BLOCK_SIZE];

        @Override public String toString() {
            return "Stream";
        }

        @Override public void write(File f) throws IOException {
            FileOutputStream out = new FileOutputStream(f);
            try {
                for (int i = 0; i < blocks; i++) {
                    out.write(buffer);
                }
            } finally {
                out.close();
            }
        }

        @Override public long read(File f) throws IOException {
            FileInputStream in = new FileInputStream(f);
            long checksum = 0;
            try {
                for (int i = 0; i < blocks; i++) {
                    for (int remaining = acceptBuffer.length, read;
                            (read = in.read(buffer)) != -1 && (remaining -= read) > 0; )
                    {
                        in.read(acceptBuffer, acceptBuffer.length - remaining, remaining);
                    }
                    checksum += acceptBuffer[acceptBuffer[0]];
                }
            } finally {
                in.close();
            }
            return checksum;
        }
    }

    public static void purgeCache() throws IOException, InterruptedException {
        if (System.getProperty("os.name").startsWith("Mac")) {
            new ProcessBuilder("sudo", "purge")
                    // .inheritIO()
                    .start().waitFor();
        } else {
            new ProcessBuilder("sudo", "su", "-c", "echo 3 > /proc/sys/vm/drop_caches")
                    // .inheritIO()
                    .start().waitFor();
        }
    }
}
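Both this benchmark and Peter's read the size from the sizeGB system property, so a smaller run is, for example:

java -DsizeGB=2 MeasureIOPerformance

Note that purgeCache shells out to sudo, so it will prompt for a password unless sudo is configured passwordless.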
The split method takes two arguments: the name of the file to split, and the size of each split. Could you check if I'm on the right track? And could you give pseudocode for what to put in the for loop?
import java.io.*;

public class SplitFile {
    public static void main(String[] args) throws IOException {
        Split("testfile.pdf", 256);
    }

    public static void Split(String filename, int splitSize) throws IOException {
        int numberOfFiles = 0;
        File file = new File(filename);
        numberOfFiles = ((int) file.length() / splitSize) + 1;
        for (; numberOfFiles >= 0; numberOfFiles--) {
            DataInputStream in = new DataInputStream(new BufferedInputStream(
                    new FileInputStream(filename)));
            DataOutputStream out = new DataOutputStream(
                    new BufferedOutputStream(new FileOutputStream(file))); // What do I put here?
        }
    }
}
Required changes:
A File object per output part (see the code below).
Initialize the data input stream outside the loop, not inside.
Code
File original = new File(filename);
int numberOfFiles = ((int) original.length() / splitSize) + 1;
DataInputStream in =
new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
// <== just count through parts.
for (int i = 0; i < numberOfFiles; i++) {
File output = new File(String.format("%s-%d", filename, i));
// <== Part of file being output e.g. testfile.pdf-1, testfile.pdf-2
DataOutputStream out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(output)));
}
For the actual writing...
read bytes from input stream using read() call
write bytes to output stream using write() call
There are two approaches: either 1 byte at a time (easiest, but less efficient), or using a buffer (harder to code, but more efficient).
Buffered approach
long length = original.length();
DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
int pos = 0;
byte[] buffer = new byte[splitSize];

for (...) {
    ...
    // make sure you deal with the file not being exactly divisible;
    // the last chunk might be smaller
    long remaining = length - pos;
    int chunk = (int) Math.min(splitSize, remaining);
    in.read(buffer, 0, chunk); // the offset is into the buffer, not the file
    out.write(buffer, 0, chunk);
    pos += splitSize;
}
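One caveat the snippet above glosses over: a single read() call may return fewer bytes than requested, so a robust version loops until the chunk is full (a sketch using the chunk variable from above):

int filled = 0;
while (filled < chunk) {
    int n = in.read(buffer, filled, chunk - filled);
    if (n == -1) break; // unexpected end of file
    filled += n;
}
out.write(buffer, 0, filled);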
1 byte at a time.
for (...) {
...
for (int i = 0; i < splitSize && pos < length; i++) {
out.write(in.read());
pos++;
}
}
You can do it using the Java NIO API in the following way.
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public final class SplitFile {

    public static void main(String[] args) throws IOException {
        split("testfile.pdf", 256);
    }

    private static void split(String filename, int splitSize) throws IOException {
        int i = filename.lastIndexOf('.');
        String basename = filename.substring(0, i);
        String ext = filename.substring(i + 1);

        Path inputPath = Paths.get(filename);
        int numberOfFiles = (int) (Files.size(inputPath) / splitSize) + 1;

        try (FileChannel inputChannel = FileChannel.open(inputPath, StandardOpenOption.READ)) {
            for (int j = 0; j < numberOfFiles; j++) {
                String outputFilename = String.format("%s-%04d.%s", basename, j + 1, ext);
                Path outputPath = inputPath.resolveSibling(outputFilename); // works even without a parent directory
                try (FileChannel outputChannel = FileChannel.open(outputPath, StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                    inputChannel.transferTo((long) j * splitSize, splitSize, outputChannel); // long math avoids int overflow on big files
                }
            }
        }
    }
}
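Note that FileChannel.transferTo transfers at most the requested count and stops at end-of-file, so the final, smaller part needs no special-casing.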
I want to split my audio file (.wav format) into frames of 32 milliseconds each. Sampling frequency: 16 kHz; number of channels: 1 (mono); PCM signal; sample size: 93638.
After getting the data in byte format, I am converting the byte array storing the WAV file data to a double array, since I need to pass it to a method which accepts a double array. I am using the following code; can someone tell me how to proceed?
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.nio.ByteBuffer;
public class AudioFiles
{
public static void main(String[] args)
{
String file = "D:/p.wav";
AudioFiles afiles = new AudioFiles();
byte[] data1 = afiles.readAudioFileData(file);
byte[] data2 = afiles.readWAVAudioFileData(file);
System.out.format("data len1: %d\n", data1.length);
System.out.format("data len2: %d\n", data2.length);
/* for(int i=0;i<data2.length;i++)
{
System.out.format("\t"+data2[i]);
}*/
System.out.println();
/* for(int j=0;j<data1.length;j++)
{
System.out.format("\t"+data1[j]);
}*/
System.out.format("diff len: %d\n", data2.length - data1.length);
double[] d = new double[data1.length];
d = toDoubleArray(data1);
for (int j = 0; j < data1.length; j++)
{
System.out.format("\t" + d[j]);
}
daub a = new daub();
a.daubTrans(d);
}
public static double[] toDoubleArray(byte[] byteArray)
{
int times = Double.SIZE / Byte.SIZE;
double[] doubles = new double[byteArray.length / times];
for (int i = 0; i < doubles.length; i++)
{
doubles[i] = ByteBuffer.wrap(byteArray, i * times, times).getDouble();
}
return doubles;
}
public byte[] readAudioFileData(final String filePath)
{
byte[] data = null;
try
{
final ByteArrayOutputStream baout = new ByteArrayOutputStream();
final File file = new File(filePath);
final AudioInputStream audioInputStream = AudioSystem
.getAudioInputStream(file);
byte[] buffer = new byte[4096];
int c;
while ((c = audioInputStream.read(buffer, 0, buffer.length)) != -1)
{
baout.write(buffer, 0, c);
}
audioInputStream.close();
baout.close();
data = baout.toByteArray();
}
catch (Exception e)
{
e.printStackTrace();
}
return data;
}
public byte[] readWAVAudioFileData(final String filePath)
{
byte[] data = null;
try
{
final ByteArrayOutputStream baout = new ByteArrayOutputStream();
final AudioInputStream audioInputStream = AudioSystem.getAudioInputStream(new File(filePath));
AudioSystem.write(audioInputStream, AudioFileFormat.Type.WAVE, baout);
audioInputStream.close();
baout.close();
data = baout.toByteArray();
}
catch (Exception e)
{
e.printStackTrace();
}
return data;
}
}
I want to pass the double array d, in frames of 32 milliseconds, to the method performing the wavelet transform, since it accepts a double array.
In my previous question I was given a reply that:
At 16kHz sample rate you'll have 16 samples per millisecond. Therefore, each 32ms frame would be 32*16=512 mono samples. Multiply by the number of bytes-per-sample (typically 2 or 4) and that will be the number of bytes per frame.
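In code, the arithmetic from that reply looks like this (a sketch assuming 16-bit little-endian mono PCM; frameBytes is assumed to hold one frame's raw bytes):

int sampleRate = 16000;                                // 16 kHz
int samplesPerFrame = 32 * sampleRate / 1000;          // 32 ms -> 512 samples
int bytesPerSample = 2;                                // 16-bit PCM
int bytesPerFrame = samplesPerFrame * bytesPerSample;  // 1024 bytes per frame

// One 16-bit sample becomes one double, so a frame stays 512 entries long;
// only the element type (and hence the byte count) changes.
double[] frame = new double[samplesPerFrame];
for (int s = 0; s < samplesPerFrame; s++) {
    int lo = frameBytes[2 * s] & 0xFF;      // low byte, unsigned
    int hi = frameBytes[2 * s + 1];         // high byte, sign-extended
    frame[s] = ((hi << 8) | lo) / 32768.0;  // normalize to [-1, 1)
}

Note that ByteBuffer.getDouble, as used in toDoubleArray above, interprets 8 raw bytes as one IEEE-754 double, which is not how PCM samples are encoded.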
I want to know whether my frame size changes when I convert my array from byte format to double format, or whether it remains the same.
My Previous Question.
If I read a text file via
package net.example;

import java.io.FileInputStream;
import java.io.IOException;

public class Test {
    public static void main(String[] args) throws IOException {
        byte[] buffer = new byte[1024];
        FileInputStream in = new FileInputStream("test.txt");
        int rc = in.read(buffer);
        while (rc != -1) {
            System.out.print(new String(buffer));
            rc = in.read(buffer);
        }
    }
}
then it doesn't output the correct content; the output is bigger than the input.
Example: http://pastebin.com/r5uGfYgD
I know it is because of the buffer size, but how can I tell it to stop reading after the file ends?
Edit:
Now it works; here is the full source. Thanks a lot! If somebody has some improvements: tell me!
package net.example;

import java.io.FileInputStream;
import java.io.IOException;

import fr.cryptohash.Digest;
import fr.cryptohash.MD5;

public class Test {
    public static void main(String[] args) throws IOException {
        Digest dig = new MD5();
        byte[] srcBuffer = new byte[102400];
        byte[] buffer = null;
        FileInputStream in = new FileInputStream("text.txt");
        int rc = -1;
        while ((rc = in.read(srcBuffer)) != -1) {
            buffer = new byte[rc];
            System.arraycopy(srcBuffer, 0, buffer, 0, rc);
            dig.update(buffer);
        }
        System.out.println(toHex(dig.digest()));
    }

    private static String toHex(byte[] hash) {
        char[] HEX_CHARS = "0123456789abcdef".toCharArray();
        StringBuilder sb = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            sb.append(HEX_CHARS[(b & 0xF0) >> 4]);
            sb.append(HEX_CHARS[b & 0x0F]);
        }
        String hex = sb.toString();
        return hex;
    }
}
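Since you asked for improvements: if your Digest implementation has an update(byte[], int, int) overload (I believe sphlib's fr.cryptohash.Digest does, but check your version), you can hash in place and drop the per-iteration allocation and copy:

int rc;
while ((rc = in.read(srcBuffer)) != -1) {
    dig.update(srcBuffer, 0, rc); // hash only the bytes actually read
}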
How about using the String(byte[] bytes, int offset, int length) constructor?
byte[] buffer = new byte[1024];
FileInputStream in = new FileInputStream("input.txt");
int rc = -1;
while ((rc = in.read(buffer)) != -1) {
    System.out.print(new String(buffer, 0, rc));
}
If you need to read the file contents into a byte[], you can use a ByteArrayOutputStream, or Commons IO, which has a "read to byte[]" utility method.
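For example, with plain java.io and no extra dependency (a minimal sketch):

import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;

static byte[] readFully(String path) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[8192];
    try (FileInputStream in = new FileInputStream(path)) {
        int rc;
        while ((rc = in.read(buf)) != -1) {
            out.write(buf, 0, rc); // copy only the bytes actually read
        }
    }
    return out.toByteArray();
}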