Read different portions of a file with multiple threads in Java

I have a 10GB PDF file that I would like to break up into 10 files, each 1GB in size. I need to do this in parallel, which means spinning up 10 threads that each start from a different position, read up to 1GB of data, and write it to a file. The final result should be 10 files that each contain a portion of the original 10GB file.
I looked at FileChannel, but the position is shared, so once I modify the position in one thread, it affects the other threads. I also looked at AsynchronousFileChannel in Java 7, but I'm not sure if that's the way to go. I would appreciate any suggestions on this issue.
I wrote this simple program that reads a small text file to test the FileChannel idea, but it doesn't seem to work for what I'm trying to achieve.
package org.cas.filesplit;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
public class ConcurrentRead implements Runnable {
private int myPosition = 0;
public int getPosition() {
return myPosition;
}
public void setPosition(int position) {
this.myPosition = position;
}
static final String filePath = "C:\\Users\\temp.txt";
@Override
public void run() {
try {
readFile();
} catch (IOException e) {
e.printStackTrace();
}
}
private void readFile() throws IOException {
Path path = Paths.get(filePath);
FileChannel fileChannel = FileChannel.open(path);
fileChannel.position(myPosition);
ByteBuffer buffer = ByteBuffer.allocate(8);
int noOfBytesRead = fileChannel.read(buffer);
while (noOfBytesRead != -1) {
buffer.flip();
System.out.println("Thread - " + Thread.currentThread().getId());
while (buffer.hasRemaining()) {
System.out.print((char) buffer.get());
}
System.out.println(" ");
buffer.clear();
noOfBytesRead = fileChannel.read(buffer);
}
fileChannel.close();
}
}
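One possible approach, not taken from the question: FileChannel also has a positional read(ByteBuffer, long) overload that reads from an explicit offset and does not modify the channel's position, so several threads can share one channel without stepping on each other. The sketch below assumes that overload fits the use case. The class name ParallelSplit, the ".partN" output names, and the 64 KB buffer size are made up for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: splits one file into N part files using positional reads.
final class ParallelSplit {

    /** Copies bytes [start, start + length) of the shared source channel into target. */
    static void copySlice(FileChannel source, long start, long length, Path target) throws IOException {
        try (FileChannel out = FileChannel.open(target, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.allocate(64 * 1024); // assumed buffer size
            long offset = start;
            long end = start + length;
            while (offset < end) {
                buffer.clear();
                buffer.limit((int) Math.min(buffer.capacity(), end - offset));
                // Positional read: does not touch the channel's shared position.
                int n = source.read(buffer, offset);
                if (n < 0) {
                    break; // reached end of file early
                }
                buffer.flip();
                while (buffer.hasRemaining()) {
                    out.write(buffer);
                }
                offset += n;
            }
        }
    }

    /** Splits source into `parts` files named <source>.part0, .part1, ... and returns them. */
    static List<Path> split(Path source, int parts) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        List<Path> targets = new ArrayList<>();
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ)) {
            long size = in.size();
            long chunk = (size + parts - 1) / parts; // ceiling division
            List<Future<?>> jobs = new ArrayList<>();
            for (int i = 0; i < parts; i++) {
                long start = Math.min(size, i * chunk);
                long length = Math.min(chunk, size - start);
                Path target = Paths.get(source + ".part" + i);
                targets.add(target);
                jobs.add(pool.submit(() -> {
                    copySlice(in, start, length, target);
                    return null;
                }));
            }
            for (Future<?> job : jobs) {
                job.get(); // wait for each slice, rethrowing any failure
            }
        } finally {
            pool.shutdown();
        }
        return targets;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java ParallelSplit <file>
        System.out.println(split(Paths.get(args[0]), 10));
    }
}
```

Whether ten threads are actually faster than one depends on the storage device, so this is worth measuring before committing to it.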

Related

Java IO outperforms Java NIO when it comes to file reading

I believed that the new nio package would outperform the old io package when it comes to the time required to read the contents of a file. However, based on my results, io package seems to outperform nio package. Here's my test:
import java.io.*;
import java.lang.reflect.Array;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
public class FileTestingOne {
public static void main(String[] args) {
long startTime = System.nanoTime();
File file = new File("hey2.txt");
try {
byte[] a = direct(file);
String s = new String(a);
}
catch (IOException err) {
err.printStackTrace();
}
long endTime = System.nanoTime();
long totalTime = (endTime - startTime);
System.out.println(totalTime);
}
public static ByteBuffer readFile_NIO(File file) throws IOException {
RandomAccessFile rFile = new RandomAccessFile(file.getName(), "rw");
FileChannel inChannel = rFile.getChannel();
ByteBuffer _buffer = ByteBuffer.allocate(1024);
int bytesRead = inChannel.read(_buffer);
while (bytesRead != -1) {
_buffer.flip();
while (_buffer.hasRemaining()) {
byte b = _buffer.get();
}
_buffer.clear();
bytesRead = inChannel.read(_buffer);
}
inChannel.close();
rFile.close();
return _buffer;
}
public static byte[] direct(File file) throws IOException {
byte[] buffer = Files.readAllBytes(file.toPath());
return buffer;
}
public static byte[] readFile_IO(File file) throws IOException {
byte[] _buffer = new byte[(int) file.length()];
InputStream in = null;
try {
in = new FileInputStream(file);
if ( in.read(_buffer) == -1 ) {
throw new IOException(
"EOF reached while reading file. File is probably empty");
}
}
finally {
try {
if (in != null)
in.close();
}
catch (IOException err) {
// TODO Logging
err.printStackTrace();
}
}
return _buffer;
}
}
// Small file
//7566395 -> readFile_NIO
//10790558 -> direct
//707775 -> readFile_IO
// Large file
//9228099 -> readFile_NIO
//737674 -> readFile_IO
//10903324 -> direct
// Very large file
//13700005 -> readFile_NIO
//2837188 -> readFile_IO
//11020507 -> direct
Results are:
Small file:
nio implementation: 7,566,395ns
io implementation: 707,775ns
direct implementation: 10,790,558ns
Large file:
nio implementation: 9,228,099ns
io implementation: 737,674ns
direct implementation: 10,903,324ns
Very large file:
nio implementation: 13,700,005ns
io implementation: 2,837,188ns
direct implementation: 11,020,507ns
I wanted to ask this question because (I believe) nio package is non-blocking, thus it needs to be faster, right?
Thank you,
Edit:
Changed ms to ns
Memory-mapped files (MappedByteBuffer) are part of Java NIO and could help improve performance.
Non-blocking in Java NIO means that a thread does not have to wait for the next piece of data to become available. It does not necessarily make a complete operation (such as reading and processing a whole file) any faster.
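A minimal sketch of the memory-mapped approach, assuming the goal is simply to touch every byte of the file. MappedRead and checksum are hypothetical names. A single mapping cannot exceed Integer.MAX_VALUE bytes, so larger files are mapped in windows. Keep in mind that one System.nanoTime() measurement in a fresh JVM also includes class loading and JIT warm-up, so single-run timings are rough at best.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical example class: reads a file through read-only memory mappings.
final class MappedRead {

    /** Sums all bytes of the file (as unsigned values) via memory mapping. */
    static long checksum(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            long sum = 0;
            long pos = 0;
            while (pos < size) {
                // Map at most Integer.MAX_VALUE bytes per window.
                long len = Math.min(Integer.MAX_VALUE, size - pos);
                MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, pos, len);
                while (map.hasRemaining()) {
                    sum += map.get() & 0xFF;
                }
                pos += len;
            }
            return sum;
        }
    }

    public static void main(String[] args) throws IOException {
        // "hey2.txt" is the file name used in the question; pass another path to override it.
        Path file = Paths.get(args.length > 0 ? args[0] : "hey2.txt");
        long start = System.nanoTime();
        long sum = checksum(file);
        System.out.println(sum + " in " + (System.nanoTime() - start) + " ns");
    }
}
```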

Updating parameter in SwingWorker

I need some help. I'm making a program like a file manager, and in it I need to copy files simultaneously. For that I use a SwingWorker so I can show the progress of the copies in a JProgressBar, but I need to know how to add more files to copy to a running task with the same destination.
This is my class that extends SwingWorker. In my main program I select some files or folders to copy to one destination. What I need is to be able to add more files to the CopyItem ArrayList while the CopyTask is working.
Please help, and sorry about my English.
import java.awt.Dimension;
import java.awt.Toolkit;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import javax.swing.JDialog;
import javax.swing.JOptionPane;
import javax.swing.JProgressBar;
import javax.swing.SwingWorker;
import xray.XRAYView;
public class CopyTask extends SwingWorker<Void, Integer>
{
ArrayList<CopyItem>copia;
private long totalBytes = 0L;
private long copiedBytes = 0L;
JProgressBar progressAll;
JProgressBar progressCurrent;
boolean override=true;
boolean overrideall=false;
public CopyTask(ArrayList<CopyItem>copia,JProgressBar progressAll,JProgressBar progressCurrent)
{
this.copia=copia;
this.progressAll=progressAll;
this.progressCurrent=progressCurrent;
progressAll.setValue(0);
progressCurrent.setValue(0);
totalBytes=retrieveTotalBytes(copia);
}
public void AgregarCopia(ArrayList<CopyItem>addcopia)throws Exception{
copia.addAll(copia.size(), addcopia);
totalBytes=retrieveTotalBytes(addcopia)+totalBytes;
System.out.println("AL AGREGAR: "+copia.size()+" Tamaño"+totalBytes);
}
public File getDriveDest(){
// split() takes a regex, so a literal backslash must be escaped twice
File dest=new File(copia.get(0).getOrigen().getPath().split("\\\\")[0]);
return dest;
}
@Override
public Void doInBackground() throws Exception
{
for(CopyItem cop:copia){
File ori=cop.getOrigen();
File des=new File(cop.getDestino().getPath());
if(!des.exists()){
des.mkdirs();
}
if(!overrideall){
override =true;
}
File para=new File(cop.getDestino().getPath()+"\\"+ori.getName());
copyFiles(ori, para);
}
return null;
}
@Override
public void process(List<Integer> chunks)
{
for(int i : chunks)
{
progressCurrent.setValue(i);
}
}
@Override
public void done()
{
setProgress(100);
}
private long retrieveTotalBytes(ArrayList<CopyItem>fich)
{
long size=0;
for(CopyItem cop: fich)
{
size += cop.getOrigen().length();
}
return size;
}
private void copyFiles(File sourceFile, File targetFile) throws IOException
{
if(overrideall==false){
if(targetFile.exists() && !targetFile.isDirectory()){
String []options={"Si a Todos","Si","No a Ninguno","No"};
int seleccion=JOptionPane.showOptionDialog(null, "El fichero \n"+targetFile+" \n se encuentra en el equipo, \n¿Desea sobreescribirlo?", "Colisión de ficheros", JOptionPane.DEFAULT_OPTION, JOptionPane.WARNING_MESSAGE, null, options, null);
switch(seleccion){
case 0:
override=true;
overrideall=true;
break;
case 1:
override=true;
overrideall=false;
break;
case 2:
override =false;
overrideall=true;
break;
case 3:
override =false;
overrideall=false;
break;
}
}
}
if(override || !targetFile.exists()){
FileInputStream LeeOrigen= new FileInputStream(sourceFile);
OutputStream Salida = new FileOutputStream(targetFile);
byte[] buffer = new byte[1024];
int tamaño;
long fileBytes = sourceFile.length();
long totalBytesCopied = 0;
while ((tamaño = LeeOrigen.read(buffer)) > 0) {
Salida.write(buffer, 0, tamaño);
totalBytesCopied += tamaño;
copiedBytes+= tamaño;
// no post-increment here: copiedBytes was already advanced on the line above
setProgress((int)Math.round(((double)copiedBytes / (double)totalBytes) * 100));
int progress = (int)Math.round(((double)totalBytesCopied / (double)fileBytes) * 100);
publish(progress);
}
Salida.close();
LeeOrigen.close();
publish(100);
}
}
}
Here is CopyItem class
import java.io.File;
public class CopyItem {
File origen;
File destino;
String root;
public CopyItem(File origen, File destino) {
this.origen = origen;
this.destino = destino;
}
public CopyItem(File origen, File destino, String root) {
this.origen = origen;
this.destino = destino;
this.root = root;
}
public String getRoot() {
return root;
}
public void setRoot(String root) {
this.root = root;
}
public File getOrigen() {
return origen;
}
public void setOrigen(File origen) {
this.origen = origen;
}
public File getDestino() {
return destino;
}
public void setDestino(File destino) {
this.destino = destino;
}
@Override
public String toString() {
return super.toString(); //To change body of generated methods, choose Tools | Templates.
}
}
Yes, you can add the files directly to the source list (the list of files to be copied), but you need to synchronize your code, because the additions happen on a different thread (the UI thread). Another way is to implement a producer/consumer using a BlockingQueue.
The consumer runs in a separate thread or SwingWorker and copies the files.
The producer runs on the UI thread (where more files are selected).
Both should have access to the BlockingQueue that contains the files to be copied. According to the documentation, BlockingQueue implementations are thread-safe, and the queue has the advantage that it can block and wait for files to be added, which is very useful if you don't know when files will be added.
I prefer using a thread pool to manage thread execution (optional).
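The producer/consumer idea could be sketched roughly as follows. This is an illustration under assumptions, not the asker's code: CopyQueue and its methods are hypothetical names, a plain Thread stands in for the SwingWorker (in the real program the drain loop would sit in doInBackground()), and a sentinel Path is used to tell the consumer to stop.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical helper: a copy job that accepts more files while it is running.
final class CopyQueue {
    /** Sentinel that tells the consumer to stop; it is compared by identity and never copied. */
    private static final Path STOP = Paths.get("");

    private final BlockingQueue<Path> pending = new LinkedBlockingQueue<>();
    private final Path destinationDir;
    private final Thread worker = new Thread(this::drain, "copy-worker");

    CopyQueue(Path destinationDir) {
        this.destinationDir = destinationDir;
    }

    /** Starts the consumer thread. */
    void start() {
        worker.start();
    }

    /** Producer side: safe to call from the UI thread at any time. */
    void add(Path source) {
        pending.add(source);
    }

    /** Signals that no more files will come and waits for the copies to finish. */
    void finish() throws InterruptedException {
        pending.add(STOP);
        worker.join();
    }

    /** Consumer side: copies files one by one, blocking while the queue is empty. */
    private void drain() {
        try {
            while (true) {
                Path source = pending.take(); // blocks until a file is queued
                if (source == STOP) {
                    return;
                }
                try {
                    Files.copy(source, destinationDir.resolve(source.getFileName()),
                            StandardCopyOption.REPLACE_EXISTING);
                } catch (IOException e) {
                    e.printStackTrace(); // keep going with the next file
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java CopyQueue <destinationDir> <file>...
        CopyQueue queue = new CopyQueue(Paths.get(args[0]));
        queue.start();
        for (int i = 1; i < args.length; i++) {
            queue.add(Paths.get(args[i]));
        }
        queue.finish();
    }
}
```

Because the queue is the only shared state, the list of files no longer needs explicit synchronization.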

Java 8 - program not reading file but seems to be writing though

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.OpenOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.nio.file.attribute.FileAttribute;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.HashSet;
import java.util.Set;
public class RAFRead {
public static void main(String[] args) {
create();
read();
}
public static void create() {
// Create the set of options for appending to the file.
Set<OpenOption> options = new HashSet<OpenOption>();
options.add(StandardOpenOption.APPEND);
options.add(StandardOpenOption.CREATE);
// Create the custom permissions attribute.
Set<PosixFilePermission> perms = PosixFilePermissions
.fromString("rw-r-----");
FileAttribute<Set<PosixFilePermission>> attr = PosixFilePermissions
.asFileAttribute(perms);
Path file = Paths.get("./outfile.log");
ByteBuffer buffer = ByteBuffer.allocate(4);
try {
SeekableByteChannel sbc = Files.newByteChannel(file, options, attr);
for (int i = 9; i >= 0; --i) {
sbc = sbc.position(i * 4);
buffer.clear();
buffer.put(new Integer(i).byteValue());
buffer.flip();
sbc.write(buffer);
}
} catch (IOException e) {
System.out.println(e.getMessage());
}
}
public static void read() {
// Create the set of options for reading the file.
Set<OpenOption> options = new HashSet<OpenOption>();
options.add(StandardOpenOption.READ);
Path file = Paths.get("./outfile.log");
ByteBuffer buffer = ByteBuffer.allocate(4);
try {
SeekableByteChannel sbc = Files.newByteChannel(file, options);
int nread;
do {
nread = sbc.read(buffer);
if(nread!= -1) {
buffer.flip();
System.out.println(buffer.getInt());
}
} while(nread != -1 && buffer.hasRemaining());
} catch (IOException e) {
System.out.println(e.getMessage());
}
}
}
I first create the file.
I am trying to put 9, then 8, then 7 and so on in the file.
But I am trying to add to file in reverse order using random access.
The output of file actually will be numbers in ascending order.
I am just writing to file in reverse order to try out random access writing.
After that I try to read the file and print the data (numbers).
It prints only 0. I was expecting it to print 1-9.
I couldn't figure out the reason. Any help is appreciated.
I followed this link from Oracle site: https://docs.oracle.com/javase/tutorial/essential/io/file.html
The file has size after I run this program, so it seems program is writing.
Since it is a buffer read, I can't see the data with vi or cat.
You need to flip() the buffer before calling write() or get() (and friends), and compact() afterwards.
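A minimal sketch of the flip/compact pattern applied to this kind of program. It makes a few assumptions beyond the answer: FlipDemo and its methods are hypothetical names, each number is stored as a full 4-byte int with putInt(), and the file is opened with WRITE rather than APPEND, because an appending channel moves to the end of the file before every write regardless of position().

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class: random-access int writes followed by a sequential read.
final class FlipDemo {

    /** Writes the ints 9..0 in reverse order, each at offset i * 4. */
    static void writeReversed(Path file) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES);
        try (SeekableByteChannel channel = Files.newByteChannel(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (int i = 9; i >= 0; i--) {
                channel.position((long) i * Integer.BYTES);
                buffer.clear();
                buffer.putInt(i);
                buffer.flip(); // switch the buffer from filling to draining
                while (buffer.hasRemaining()) {
                    channel.write(buffer);
                }
            }
        }
    }

    /** Reads the whole file back as a list of ints. */
    static List<Integer> readAll(Path file) throws IOException {
        List<Integer> values = new ArrayList<>();
        ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES);
        try (SeekableByteChannel channel = Files.newByteChannel(file, StandardOpenOption.READ)) {
            while (channel.read(buffer) != -1) {
                buffer.flip(); // make the bytes just read available to getInt()
                while (buffer.remaining() >= Integer.BYTES) {
                    values.add(buffer.getInt());
                }
                buffer.compact(); // keep any partial int and make room for the next read
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get("./outfile.log");
        writeReversed(file);
        System.out.println(readAll(file)); // prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```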

Download Manager In java using multi threading

As you may have seen before, I'm working on a download manager in Java. I have asked This Question and I have read This Question, but these haven't solved my problem. Now I have written another version in Java, but there is a problem: when the download finishes, the file is larger than its actual size and the related software can't read it.
This is an image of my code execution:
As you can see, the file size is about 9.43 MB.
This is an image of my project directory:
As you can see, my downloaded file size is about 13 MB.
So what is my problem?
Here is my complete source code.
Main class:
package download.manager;
import java.util.Scanner;
/**
*
* @author Behzad
*/
public class DownloadManager {
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
Scanner input = new Scanner(System.in);
System.out.print("Enter url here : ");
String url = input.nextLine();
DownloadInfo information = new DownloadInfo(url);
}
}
DownloadInfo Class:
package download.manager;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.logging.Level;
import java.util.logging.Logger;
public class DownloadInfo {
private String downloadUrl;
private String fileName;
private String fileExtension;
private URL nonStringUrl;
private HttpURLConnection connection;
private int fileSize;
private int remainingByte;
private RandomAccessFile outputFile;
public DownloadInfo(String downloadUrl) {
this.downloadUrl = downloadUrl;
initiateInformation();
}
private void initiateInformation(){
fileName = downloadUrl.substring(downloadUrl.lastIndexOf('/') + 1, downloadUrl.length());
fileExtension = fileName.substring(fileName.lastIndexOf('.') + 1, fileName.length());
try {
nonStringUrl = new URL(downloadUrl);
connection = (HttpURLConnection) nonStringUrl.openConnection();
fileSize = ((connection.getContentLength()));
System.out.printf("File Size is : %d \n", fileSize);
System.out.printf("Remain File Size is : %d \n", fileSize % 8);
remainingByte = fileSize % 8;
fileSize /= 8;
outputFile = new RandomAccessFile(fileName, "rw");
} catch (MalformedURLException ex) {
Logger.getLogger(DownloadInfo.class.getName()).log(Level.SEVERE, null, ex);
} catch (IOException ex) {
Logger.getLogger(DownloadInfo.class.getName()).log(Level.SEVERE, null, ex);
}
System.out.printf("File Name is : %s\n", fileName);
System.out.printf("File Extension is : %s\n", fileExtension);
System.out.printf("Partition Size is : %d MB\n", fileSize);
int first = 0 , last = fileSize - 1;
ExecutorService thread_pool = Executors.newFixedThreadPool(8);
for(int i=0;i<8;i++){
if(i != 7){
thread_pool.submit(new Downloader(nonStringUrl, first, last, (i+1), outputFile));
}
else{
thread_pool.submit(new Downloader(nonStringUrl, first, last + remainingByte, (i+1), outputFile));
}
first = last + 1;
last += fileSize;
}
thread_pool.shutdown();
try {
thread_pool.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
} catch (InterruptedException ex) {
Logger.getLogger(DownloadInfo.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
and this is my downloader class:
package download.manager;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.logging.Level;
import java.util.logging.Logger;
/**
*
* @author Behzad
*/
public class Downloader implements Runnable{
private URL downloadURL;
private int startByte;
private int endByte;
private int threadNum;
private RandomAccessFile outputFile;
private InputStream stream;
public Downloader(URL downloadURL,int startByte, int endByte, int threadNum, RandomAccessFile outputFile) {
this.downloadURL = downloadURL;
this.startByte = startByte;
this.endByte = endByte;
this.threadNum = threadNum;
this.outputFile = outputFile;
}
@Override
public void run() {
download();
}
private void download(){
try {
System.out.printf("Thread %d is working...\n" , threadNum);
HttpURLConnection httpURLConnection = (HttpURLConnection) downloadURL.openConnection();
httpURLConnection.setRequestProperty("Range", "bytes="+startByte+"-"+endByte);
httpURLConnection.connect();
outputFile.seek(startByte);
stream = httpURLConnection.getInputStream();
while(true){
int nextByte = stream.read();
if(nextByte == -1){
break;
}
outputFile.write(endByte);
}
} catch (IOException ex) {
Logger.getLogger(Downloader.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
As you can see, this file is an MP4, but Gom can't play it.
Would you please help me?
Oops, I finally found the problem: it's all in the seek method. I have one file and 8 threads, so seek moves the cursor repeatedly, which produces a larger and unusable file. I'm sorry, but I can't show the whole code.
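Since the fixed code was not posted, here is one way the shared-cursor problem could be avoided; it is an assumption, not the poster's solution. FileChannel offers a positional write(ByteBuffer, long) that writes at an explicit offset without moving the channel's position, so the workers never interfere. In this sketch the network is replaced by an in-memory byte array, and RangeWriter, writeRange, and assemble are hypothetical names. Separately from the cursor issue, the loop in the question writes endByte instead of the byte it just read (nextByte), which would also corrupt the output.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class: several workers fill disjoint ranges of one file.
final class RangeWriter {

    /** Writes data[from, to) into the channel at the same file offsets. */
    static void writeRange(FileChannel out, byte[] data, int from, int to) throws IOException {
        ByteBuffer buffer = ByteBuffer.wrap(data, from, to - from);
        long offset = from;
        while (buffer.hasRemaining()) {
            // Positional write: leaves the channel's own position untouched.
            offset += out.write(buffer, offset);
        }
    }

    /** Splits data into `workers` ranges and writes them to target concurrently. */
    static void assemble(byte[] data, Path target, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try (FileChannel out = FileChannel.open(target, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            int chunk = (data.length + workers - 1) / workers; // ceiling division
            List<Future<?>> jobs = new ArrayList<>();
            for (int i = 0; i < workers; i++) {
                int from = Math.min(data.length, i * chunk);
                int to = Math.min(data.length, from + chunk);
                jobs.add(pool.submit(() -> {
                    writeRange(out, data, from, to);
                    return null;
                }));
            }
            for (Future<?> job : jobs) {
                job.get(); // wait for each range, rethrowing any failure
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] payload = new byte[1_000_000]; // stand-in for the downloaded bytes
        new Random(42).nextBytes(payload);
        Path target = Files.createTempFile("download", ".bin");
        assemble(payload, target, 8);
        System.out.println(Arrays.equals(payload, Files.readAllBytes(target)));
    }
}
```

In a real downloader each worker would read its HTTP range into a buffer and pass the running file offset to the positional write in the same way.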

I/O operations on file

I have a requirement: there will be a number of .txt files in one location, c:\onelocation. I want to write their content to another location in .txt format. This part is easy and straightforward, but there is a catch.
There is a time interval of, say, 120 seconds. Read the files from the above location and write them to another file in .txt format for 120 seconds, then save that file with a timestamp as its name.
After 120 seconds, create one more file with the new timestamp, but continue reading the source files from where the cursor was left in the previous interval.
Can you please suggest any ideas? Code would also be appreciated.
Thanks, Damu.
How about this? A writer that automatically changes where it is writing to every 120 seconds.
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
public class TimeBoxedWriter extends Writer {
private static DateFormat FORMAT = new SimpleDateFormat("yyyyDDDHHmm");
/** Milliseconds to each time box */
private static final int TIME_BOX = 120000;
/** For testing only */
public static void main(String[] args) throws IOException {
Writer w = new TimeBoxedWriter(new File("."), "test");
// write one line per second for 500 seconds.
for(int i = 0;i < 500;i++) {
w.write("testing " + i + "\n");
try {
Thread.sleep(1000);
} catch (InterruptedException ie) {}
}
w.close();
}
/** Output folder */
private File dir_;
/** Timestamp for current file */
private long stamp_ = 0;
/** Stem for output files */
private String stem_;
/** Current output writer */
private Writer writer_ = null;
/**
* Create new output writer
*
* @param dir
* the output folder
* @param stem
* the stem used to generate file names
*/
public TimeBoxedWriter(File dir, String stem) {
dir_ = dir;
stem_ = stem;
}
@Override
public void close() throws IOException {
synchronized (lock) {
if (writer_ != null) {
writer_.close();
writer_ = null;
}
}
}
@Override
public void flush() throws IOException {
synchronized (lock) {
if (writer_ != null) writer_.flush();
}
}
private void rollover() throws IOException {
synchronized (lock) {
long now = System.currentTimeMillis();
if ((stamp_ + TIME_BOX) < now) {
if (writer_ != null) {
writer_.flush();
writer_.close();
}
stamp_ = TIME_BOX * (System.currentTimeMillis() / TIME_BOX);
String time = FORMAT.format(new Date(stamp_));
writer_ = new FileWriter(new File(dir_, stem_ + "." + time
+ ".txt"));
}
}
}
@Override
public void write(char[] cbuf, int off, int len) throws IOException {
synchronized (lock) {
rollover();
writer_.write(cbuf, off, len);
}
}
}
Use RandomAccessFile in Java to move the cursor within the file.
Before you start copying, check the file's modification time (or creation time, for new files); copy only if it is less than 2 minutes old, otherwise skip the file.
Keep a counter of the number of bytes or lines read for each file, then move the cursor to that position and continue reading from there.
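The "remember the offset and seek back to it" step might look roughly like this. It is a sketch under assumptions: TailReader and readNew are hypothetical names, offsets are tracked in bytes (not lines) in an in-memory map, and the source files are assumed to only grow between calls.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: returns only the bytes added to a file since the last call.
final class TailReader {
    /** Byte offset already consumed, per file. */
    private final Map<Path, Long> offsets = new HashMap<>();

    /** Reads from the saved offset to the current end of file and remembers the new offset. */
    byte[] readNew(Path file) throws IOException {
        long start = offsets.getOrDefault(file, 0L);
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long available = Math.max(0, raf.length() - start);
            // Cap one read at roughly 2 GB; the rest is picked up by the next call.
            byte[] chunk = new byte[(int) Math.min(Integer.MAX_VALUE - 8, available)];
            raf.seek(start); // move the cursor to where the previous read stopped
            raf.readFully(chunk);
            offsets.put(file, raf.getFilePointer());
            return chunk;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TailReader <file>
        TailReader reader = new TailReader();
        byte[] fresh = reader.readNew(Paths.get(args[0]));
        System.out.println(new String(fresh, StandardCharsets.UTF_8));
    }
}
```

Calling readNew once per 120-second interval and writing the result to that interval's output file would give the "continue where the cursor left off" behavior described in the question.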
You can duplicate the file rather than using reading and writing operations.
sample code:
FileChannel ic = new FileInputStream("<source file location>").getChannel();
FileChannel oc = new FileOutputStream("<destination location>").getChannel();
ic.transferTo(0, ic.size(), oc);
ic.close();
oc.close();
HTH
File I/O is simple in Java; here is an example I found on the web that copies one file to another.
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
public class Copy {
public static void main(String[] args) throws IOException {
File inputFile = new File("farrago.txt");
File outputFile = new File("outagain.txt");
FileReader in = new FileReader(inputFile);
FileWriter out = new FileWriter(outputFile);
int c;
while ((c = in.read()) != -1)
out.write(c);
in.close();
out.close();
}
}
