In one requirement, I need to copy multiple files from one location to another network location.
Let's assume that the following files are present in the /src location:
a.pdf, b.pdf, a.doc, b.doc, a.txt and b.txt
I need to copy a.pdf, a.doc and a.txt atomically into the /dest location, all at once.
Currently I am using the java.nio.file.Files class, with code as follows:
Path srcFile1 = Paths.get("/src/a.pdf");
Path destFile1 = Paths.get("/dest/a.pdf");
Path srcFile2 = Paths.get("/src/a.doc");
Path destFile2 = Paths.get("/dest/a.doc");
Path srcFile3 = Paths.get("/src/a.txt");
Path destFile3 = Paths.get("/dest/a.txt");
Files.copy(srcFile1, destFile1);
Files.copy(srcFile2, destFile2);
Files.copy(srcFile3, destFile3);
But with this approach the files are copied one after another.
As an alternative, in order to make the whole process atomic,
I am thinking of zipping all the files, moving the ZIP to /dest and unzipping it at the destination.
Is this approach correct for making the whole copy process atomic? Has anyone dealt with a similar problem and resolved it?
You can copy the files to a new temporary directory and then rename the directory.
Before renaming your temporary directory, you need to delete the destination directory.
If the destination directory already contains files that you don't want to overwrite, you can instead move all files from the temporary directory to the destination directory.
This is not completely atomic, however.
With removing /dest:
String tmpPath = "/tmp/in/same/partition/as/destination";
File tmp = new File(tmpPath);
tmp.mkdirs();
Path srcFile1 = Paths.get("/src/a.pdf");
Path tmpFile1 = Paths.get(tmpPath + "/a.pdf");
Path srcFile2 = Paths.get("/src/a.doc");
Path tmpFile2 = Paths.get(tmpPath + "/a.doc");
Path srcFile3 = Paths.get("/src/a.txt");
Path tmpFile3 = Paths.get(tmpPath + "/a.txt");
Files.copy(srcFile1, tmpFile1);
Files.copy(srcFile2, tmpFile2);
Files.copy(srcFile3, tmpFile3);
delete(new File("/dest"));
tmp.renameTo(new File("/dest")); // renameTo takes a File and only works within one filesystem
// Recursively deletes a file or directory tree
void delete(File f) throws IOException {
    if (f.isDirectory()) {
        for (File c : f.listFiles())
            delete(c);
    }
    if (!f.delete())
        throw new FileNotFoundException("Failed to delete file: " + f);
}
With just overwriting the files:
String tmpPath = "/tmp/in/same/partition/as/destination";
File tmp = new File(tmpPath);
tmp.mkdirs();
Path srcFile1 = Paths.get("/src/a.pdf");
Path destFile1 = Paths.get("/dest/a.pdf");
Path tmp1 = Paths.get(tmpPath + "/a.pdf");
Path srcFile2 = Paths.get("/src/a.doc");
Path destFile2 = Paths.get("/dest/a.doc");
Path tmp2 = Paths.get(tmpPath + "/a.doc");
Path srcFile3 = Paths.get("/src/a.txt");
Path destFile3 = Paths.get("/dest/a.txt");
Path tmp3 = Paths.get(tmpPath + "/a.txt");
Files.copy(srcFile1, tmp1);
Files.copy(srcFile2, tmp2);
Files.copy(srcFile3, tmp3);
// Start of non-atomic section (it can be repeated if necessary)
Files.deleteIfExists(destFile1);
Files.deleteIfExists(destFile2);
Files.deleteIfExists(destFile3);
Files.move(tmp1, destFile1);
Files.move(tmp2, destFile2);
Files.move(tmp3, destFile3);
// End of non-atomic section
//end of non-atomic section
Even though the second method contains a non-atomic section, the copy process itself uses a temporary directory, so the destination files are not overwritten until the final moves.
If the process aborts while moving the files, it can easily be completed afterwards.
See https://stackoverflow.com/a/4645271/10871900 as reference for moving files and https://stackoverflow.com/a/779529/10871900 for recursively deleting directories.
First, there are several possibilities to copy a file or a directory. Baeldung gives a very nice insight into the different possibilities. Additionally, you can also use FileCopyUtils from Spring. Unfortunately, all these methods are not atomic.
I found an older post and adapted it a little bit. You can try using Spring's low-level transaction management support: you make a transaction out of the method and define what should be done on a rollback. There is also a nice article from Baeldung about it.
@Autowired
private PlatformTransactionManager transactionManager;

@Transactional(rollbackOn = IOException.class)
public void copy(List<File> files) throws IOException {
    TransactionDefinition transactionDefinition = new DefaultTransactionDefinition();
    TransactionStatus transactionStatus = transactionManager.getTransaction(transactionDefinition);
    TransactionSynchronizationManager.registerSynchronization(new TransactionSynchronization() {
        @Override
        public void afterCompletion(int status) {
            if (status == STATUS_ROLLED_BACK) {
                // try to delete created files
            }
        }
    });
    try {
        // copy files
        transactionManager.commit(transactionStatus);
    } catch (Exception e) {
        transactionManager.rollback(transactionStatus);
        throw e;
    }
}
Or you can use a simple try-catch block: if an exception is thrown, you delete the files created so far.
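For illustration, a minimal sketch of that try-catch variant (the class and method names are invented for the example):

import java.io.IOException;
import java.nio.file.*;
import java.util.*;

public class CleanupCopy {
    // Copy each source to its target; on any failure, delete whatever was
    // already copied so the destination is left unchanged.
    public static void copyAllOrNothing(Map<Path, Path> srcToDest) throws IOException {
        List<Path> copied = new ArrayList<>();
        try {
            for (Map.Entry<Path, Path> e : srcToDest.entrySet()) {
                Files.copy(e.getKey(), e.getValue());
                copied.add(e.getValue());
            }
        } catch (IOException ex) {
            for (Path p : copied) {
                try { Files.deleteIfExists(p); } catch (IOException ignored) { }
            }
            throw ex;
        }
    }
}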
Your question doesn't state what goal the atomicity serves. Even unzipping is never atomic: the VM might crash with an OutOfMemoryError right in the middle of inflating the blocks of the second file. Then one file is complete, a second is partial, and a third is entirely missing.
The only thing I can think of is a two phase commit, like all the suggestions with a temporary destination that suddenly becomes the real target. This way you can be sure, that the second operation either never occurs or creates the final state.
Another approach would be to write a sort of cheap checksum file in the target afterwards. This would make it easy for an external process to listen for creation of such files and verify their content with the files found.
The latter would be much the same as offering the container/ZIP/archive right away instead of piling files in a directory. Most archives have or support integrity checks.
(Operating systems and file systems also differ in behaviour if directories or folders disappear while being written. Some accept it and write all data to a recoverable buffer. Others still accept writes but don't change anything. Others fail immediately upon first write since the target block on the device is unknown.)
FOR ATOMIC WRITE:
Standard filesystems have no transaction concept, so the only way is to perform a single action that is itself atomic.
Therefore, to write several files atomically, create a folder with, let's say, a timestamp in its name, and copy the files into this folder.
Then you can either rename the folder to the final destination or create a symbolic link to it.
You can use anything similar to this, like file-based volumes on Linux, etc.
Remember that deleting the existing symbolic link and creating a new one will never be atomic, so you would need to handle the situation in your code and switch to the renamed/linked folder once it's available instead of removing/creating a link. However, under normal circumstances, removing and creating a new link is a really fast operation.
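A sketch of the rename/relink idea in Java (all paths and names here are placeholders, and symbolic links require OS and filesystem support):

import java.io.IOException;
import java.nio.file.*;

public class AtomicPublish {
    // Copy files into a timestamped directory, then swing a symlink to it.
    public static void publish(Path[] sources, Path baseDir, Path link) throws IOException {
        Path target = baseDir.resolve("dest-" + System.currentTimeMillis());
        Files.createDirectories(target);
        for (Path src : sources) {
            Files.copy(src, target.resolve(src.getFileName()));
        }
        // Create the new link under a temporary name, then rename it over the
        // old one; on POSIX filesystems rename() replaces the link atomically.
        Path tmpLink = baseDir.resolve("dest-link-tmp");
        Files.deleteIfExists(tmpLink);
        Files.createSymbolicLink(tmpLink, target);
        Files.move(tmpLink, link, StandardCopyOption.ATOMIC_MOVE);
    }
}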
FOR ATOMIC READ:
Well, the problem is not in the code, but at the operating system/filesystem level.
Some time ago, I got into a very similar situation. There was a database engine running and changing several files "at once". I needed to copy the current state, but the second file was already changed before the first one was copied.
There are two different options:
Use a filesystem with support for snapshots. At some moment, you create a snapshot and then copy files from it.
You can lock the filesystem (on Linux) using fsfreeze --freeze, and unlock it later with fsfreeze --unfreeze. When the filesystem is frozen, you can read the files as usual, but no process can change them.
Neither of these options worked for me, as I couldn't change the filesystem type, and locking the filesystem wasn't possible (it was the root filesystem).
So I created an empty file, mounted it as a loop filesystem, and formatted it. From that moment on, I could fsfreeze just my virtual volume without touching the root filesystem.
My script first called fsfreeze --freeze /my/volume, then performed the copy action, and then called fsfreeze --unfreeze /my/volume. For the duration of the copy action, the files couldn't be changed, and so the copied files were all from exactly the same moment in time; for my purpose, it was like an atomic operation.
By the way, be sure not to fsfreeze your root filesystem :-). I did, and a restart was the only solution.
DATABASE-LIKE APPROACH:
Even databases cannot rely on atomic operations, so they first write the change to a WAL (write-ahead log) and flush it to storage. Only once it's flushed do they apply the change to the data file.
If there is a problem or crash, the database engine first loads the data file and checks whether there are unapplied transactions in the WAL, applying them if necessary.
This is also called journaling, and it's used by some filesystems (ext3, ext4).
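To make the idea concrete, here is a toy sketch of journaling a copy operation (not a real WAL; the log format and paths are invented for the example):

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.*;

public class JournaledCopy {
    // Record the intended copy in a log and force it to disk before touching
    // the destination; a recovery step can then redo any incomplete copy.
    public static void copyWithJournal(Path src, Path dest, Path journal) throws IOException {
        String entry = "COPY " + src + " -> " + dest + System.lineSeparator();
        Files.write(journal, entry.getBytes(),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        try (FileChannel ch = FileChannel.open(journal, StandardOpenOption.WRITE)) {
            ch.force(true); // flush the journal entry to the storage
        }
        Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
        Files.write(journal, ("DONE " + dest + System.lineSeparator()).getBytes(),
                StandardOpenOption.APPEND);
    }
}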
I hope this solution is useful. As per my understanding, you need to copy files from one directory to another directory.
So my solution is as follows:
public class CopyFilesDirectoryProgram {

    public static void main(String[] args) throws IOException {
        String sourceDirectoryName = "//mention your source path";
        String targetDirectoryName = "//mention your destination path";
        File sdir = new File(sourceDirectoryName);
        File tdir = new File(targetDirectoryName);
        // Call the method for execution
        copy(sdir, tdir);
    }

    // Copies a single file, or recurses if the source is a directory
    private static void copy(File sdir, File tdir) throws IOException {
        if (sdir.isDirectory()) {
            copyFilesFromDirectory(sdir, tdir);
        } else {
            Files.copy(sdir.toPath(), tdir.toPath());
        }
    }

    private static void copyFilesFromDirectory(File source, File target) throws IOException {
        if (!target.exists()) {
            target.mkdir();
        }
        // Recurse into every entry of the source directory
        for (String item : source.list()) {
            copy(new File(source, item), new File(target, item));
        }
    }
}
I have one jar file, for example apache-cassandra-3.11.6.jar.
First I split/chunked it into multiple jars like below:
apache-cassandra1.jar
apache-cassandra2.jar
apache-cassandra3.jar
apache-cassandra4.jar
apache-cassandra5.jar
apache-cassandra6.jar
Then I reassembled them into a new jar file, apache-cassandra_Merged.jar.
Now the problem comes.
When I compare the original jar file (apache-cassandra-3.11.6.jar) with the new jar file (apache-cassandra_Merged.jar), they do not match.
The newly created apache-cassandra_Merged.jar is also smaller in size.
Please find below my code for your reference:
// Chunking/splitting into multiple jars
Path path = Paths.get("/Original_Jar/apache-cassandra-3.11.6.jar");
byte[] data = Files.readAllBytes(path); // reads all bytes at once

// Now divide the total bytes into equal parts and write each part to a small jar
int chunkSize = (data.length + 5) / 6; // six chunks of roughly equal size
int count = 0;
for (int start = 0; start < data.length; start += chunkSize) {
    int end = Math.min(data.length, start + chunkSize);
    byte[] rangeData = Arrays.copyOfRange(data, start, end);
    ++count;
    try (FileOutputStream fileOutputStream1 = new FileOutputStream(
            "/Cassandra_Image/Chunked_Jar/apache-cassandra" + count + ".jar")) {
        fileOutputStream1.write(rangeData);
    }
}
// Merging back to one jar
For merging I used the same approach: I read the bytes of each small/chunked jar and wrote them one by one into FileOutputStream("/Merged_Jar/apache-cassandra_Merged.jar").
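A sketch of what that merge step looks like (paths and the chunk count are assumed from the example above):

try (FileOutputStream out = new FileOutputStream("/Merged_Jar/apache-cassandra_Merged.jar")) {
    for (int i = 1; i <= 6; i++) { // same order as the chunks were written
        byte[] part = Files.readAllBytes(
                Paths.get("/Cassandra_Image/Chunked_Jar/apache-cassandra" + i + ".jar"));
        out.write(part);
    }
}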
Please let me know if I should use some other method/algorithm to split the jar and reassemble it so that the data stays identical after chunking and merging.
Note: I actually want to transfer jars to a server/directory that only accepts files of a limited size, so for big jars I need to split them into small jars, send them one by one, and then reassemble them in the target directory/place so that the result is identical to the original jar.
Thanks in advance.
This may not be the answer, but I provide it as information for you. Java also provides a pack format with which you can compress jar files and later uncompress them again.
The tool is called pack200.
How to compress
<java_location>...\jre\lib>pack200 -J-Xmx256m small.jar.gz big.jar
How to uncompress
<java_location>...\jre\lib>unpack200 small.jar.gz big.jar
You can refer to the following links.
https://docs.oracle.com/javase/1.5.0/docs/tooldocs/share/pack200.html
https://docs.oracle.com/javase/7/docs/technotes/tools/share/unpack200.html
I was able to solve the issue with shell scripting.
I put the code below in my shell script file and ran it from my Java code.
split -b 1000000 src.jar target.jar
cat src.jaraa src.jarab src.jarac src.jarad src.jarae > merged.jar
Comparing with a checksum algorithm such as SHA-256 then works fine: the checksums match and the sizes are equal.
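If you want to do the verification from Java instead of the shell, here is a small sketch using MessageDigest (the paths are placeholders):

import java.nio.file.*;
import java.security.MessageDigest;

public class ChecksumCompare {
    // Compare two files by their SHA-256 digests.
    public static boolean sameContent(Path a, Path b) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] hashA = md.digest(Files.readAllBytes(a));
        md.reset();
        byte[] hashB = md.digest(Files.readAllBytes(b));
        return MessageDigest.isEqual(hashA, hashB);
    }
}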
The Questions:
Main Question: What's the best strategy to parallel these jobs?
Ideas: How to speed up the process using other mechanisms like a second checksum (Adler32?)
The Scenario:
I'm writing a kind of synchronization tool in Java. Basically it downloads a repository from a webserver which represents the file/directory structure on the local machine, and defines sources for the needed files in compressed form, combined with hash values to verify the files. A basic thing, I guess.
Requirements:
Multi-platform java desktop application
Best possible speed and parallelization
Example structure: (best described using mods of a game)
Example Repository File
{"name":"subset1", "mods":[
  {
    "modfolder":"mod1",
    "modfiles":[
      {
        "url":"http://www.example.com/file2.7z",
        "localpath":"mod1/file2",
        "size":5,
        "sizecompressed":3,
        "checksum":"46aabad952db3e21e273ce"
      },
      {
        "url":"http://www.example.com/file1.7z",
        "localpath":"mod1/file1",
        "size":9,
        "sizecompressed":4,
        "checksum":"862f90bafda118c4d3c5ee6477"
      }
    ]
  },
  {
    "modfolder":"mod2",
    "modfiles":[
      {
        "url":"http://www.example.com/file3.7z",
        "localpath":"mod2/file3",
        "size":8,
        "sizecompressed":4,
        "checksum":"cb1e69de0f75a81bbeb465ee0cdd8232"
      },
      {
        "url":"http://www.example.com/file1.7z",
        "localpath":"mod2/file1",
        "size":9,
        "sizecompressed":4,
        "checksum":"862f90bafda118c4d3c5ee6477"
      }
    ]
  }
]}
Client file structure, as it should be after sync
mod1/
file2
file1
mod2/
file3
file1
// mod1/file1 == mod2/file1 (same checksum)
A special thing about the repository:
The repository received from the server represents only a subset of a bigger repository, because the user only needs one subtree, which changes over time (subsets also overlap).
Sometimes the repository consists of mod1 and mod2, sometimes of mod1 and mod3, and so on.
Work to be done:
Download Repository and parse it (Net I/O)
Mark files not in the repository for deletion at the end of the process (files may still be copied from, because of a matching checksum) (File I/O)
If the file exists: check the checksum of the existing file (checksum cache) (File I/O)
If the file does not exist: check the checksum cache for identical files in other subtrees, to copy the file instead of downloading it (Light file I/O)
Download single file in compressed form (Net I/O)
Extract compressed file (File I/O)
Checksum of uncompressed file (File I/O)
Cache checksum associated with file. (Light file I/O)
My solution: (many different producers/consumers)
The checksum cache uses MapDB's persistent maps.
At the moment only the MD5 checksum is used.
Queues: every worker type has a blocking queue (producer/consumer)
Thread pools: every worker type has a fixed thread pool, e.g. 3 downloaders, 2 checksum workers, ...
Workers distribute the current job to the other queues: Downloader -> Extract -> Checksum
Worker types:
Localfile Worker: Checks local file structure (using checksum cache),
redirects work to Download-Worker, Delete-Worker
Copy: Copies a file with same checksum to destination
Download: Downloads a file
Checksum: Checksum a file and inserts in checksumcache
Delete: Delete a file
Extract: Extracts a compressed file
What's the best strategy to parallel these jobs?
You have I/O-bound work. And, probably, if one job is already in progress in one directory, another job cannot run on the same directory at the same time.
So, you need locking here. Recommendation: use a locking directory on the filesystem, and use directories, not files, to lock. Why? Because directory creation is atomic (first reason), and because Java 6 does not support atomic file creation (second reason). In fact, you may even need two locking directories: one for content download, another for content processing.
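A sketch of such a directory-based lock (the lock path is an assumption for the example):

import java.io.File;

public class DirLock {
    // mkdir() is atomic: exactly one caller can create the directory, so
    // whoever succeeds holds the lock until the directory is deleted.
    public static boolean tryLock(File lockDir) {
        return lockDir.mkdir();
    }

    public static void unlock(File lockDir) {
        lockDir.delete();
    }
}

If tryLock(new File("/repo/mod1.lock")) returns false, another worker holds that directory; retry later or re-queue the job.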
The separation of download vs processing you have already done, so I have nothing more to say here ;)
I am not sure why you want to cache checksums however? It doesn't look that useful to me...
Also, I don't know how big the files you have to deal with are, but why bother with checking the existing directory contents etc. versus extracting into a new directory and renaming? I.e.:
extract new directory in newdir;
checksums;
move dstdir to dstdir.old;
move newdir to dstdir;
scrap dstdir.old.
This even means you could parallelize scrapping, but that is too much I/O parallelization... You'll have to limit the number of threads doing actual I/O.
EDIT Here is how I would separate processing:
first of all, no checksums anymore on the archive itself, but there is a file in the archive which contains the MD5 sums of each file (for instance, MD5SUMS);
two blocking queues: download -> replace, replace -> scrapping;
one processor takes care of downloading; when it is done, it fills the download -> replace queue;
another processor picks a task from the download -> replace queue; this task performs, in order, unarchiving and checksumming; if both succeed, as mentioned above, it renames the existing directory, renames the extracted directory to the expected directory, and puts a scrapping task on the replace -> scrapping queue;
the third, and last, processor, picks a task from the scrapping queue and performs deletion of the previous archive.
Note that the checksumming, if it is that heavy, could be parallelized.
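A minimal sketch of that three-processor pipeline with two hand-off queues (the task payloads are plain strings here, just to show the wiring):

import java.util.concurrent.*;

public class Pipeline {
    public static void main(String[] args) {
        BlockingQueue<String> downloadToReplace = new LinkedBlockingQueue<>();
        BlockingQueue<String> replaceToScrap = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(3);

        pool.submit(() -> { // downloader
            // ... download the archive ...
            downloadToReplace.put("mod1.7z"); // hand off to the replacer
            return null;
        });
        pool.submit(() -> { // replacer: unarchive, checksum, swap directories
            String archive = downloadToReplace.take();
            // ... unarchive and verify archive, rename dstdir to dstdir.old ...
            replaceToScrap.put("mod1.old"); // hand off to the scrapper
            return null;
        });
        pool.submit(() -> { // scrapper: delete the replaced directory
            String oldDir = replaceToScrap.take();
            // ... recursively delete oldDir ...
            return null;
        });
        pool.shutdown();
    }
}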
A program of ours erred when trying to move files from one directory to another. After much debugging I located the error by writing a small utility program that just moves a file from one directory to another (code below). It turns out that while moving files around on the local filesystem works fine, trying to move a file to another filesystem fails.
Why is this? The question might be platform specific - we are running Linux on ext3, if that matters.
And the second question; should I have been using something else than the renameTo() method of the File class? It seems as if this just works on local filesystems.
Tests (run as root):
touch /tmp/test/afile
java FileMover /tmp/test/afile /root/
The file move was successful
touch /tmp/test/afile
java FileMover /tmp/test/afile /some_other_disk/
The file move was erroneous
Code:
import java.io.File;
public class FileMover {
public static void main(String[] arguments) throws Exception {
boolean success;
File file = new File(arguments[0]);
File destinationDir = new File(arguments[1]);
File destinationFile = new File(destinationDir,file.getName() );
success = file.renameTo(destinationFile);
System.out.println("The file move was " + (success?"successful":"erroneous"));
}
}
Java 7 and above
Use Files.move(Path source, Path target, CopyOption... opts).
Note that you must not provide the ATOMIC_MOVE option when moving files between file systems.
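Applied to the utility from the question, a minimal NIO version might look like this (a sketch; argument handling kept as in the original):

import java.io.IOException;
import java.nio.file.*;

public class FileMoverNio {
    public static void main(String[] args) throws IOException {
        Path source = Paths.get(args[0]);
        Path destinationDir = Paths.get(args[1]);
        // Works across filesystems; just leave out ATOMIC_MOVE in that case.
        Files.move(source, destinationDir.resolve(source.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
        System.out.println("The file move was successful");
    }
}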
Java 6 and below
From the docs of File.renameTo:
[...] The rename operation might not be able to move a file from one filesystem to another [...]
The obvious workaround would be to copy the file "manually": open a new file, write the content to it, and delete the old file.
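A sketch of that manual move for Java 6-era code (plain streams, no try-with-resources):

import java.io.*;

public class CopyThenDelete {
    // Copy the bytes to the target, then delete the original.
    public static boolean move(File source, File target) throws IOException {
        InputStream in = new FileInputStream(source);
        try {
            OutputStream out = new FileOutputStream(target);
            try {
                byte[] buffer = new byte[8192];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
            } finally {
                out.close();
            }
        } finally {
            in.close();
        }
        return source.delete();
    }
}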
You could also try the FileUtils.moveFile method from Apache Commons.
Javadoc to the rescue:
Many aspects of the behavior of this method are inherently
platform-dependent: The rename operation might not be able to move a
file from one filesystem to another, it might not be atomic, and it
might not succeed if a file with the destination abstract pathname
already exists. The return value should always be checked to make sure
that the rename operation was successful.
Note that the Files class defines the move method to move or rename a
file in a platform independent manner.
From the docs:
Renames the file denoted by this abstract pathname.
Many aspects of the behavior of this method are inherently
platform-dependent: The rename operation might not be able to move a
file from one filesystem to another, it might not be atomic, and it
might not succeed if a file with the destination abstract pathname
already exists. The return value should always be checked to make sure
that the rename operation was successful.
If you want to move files between different filesystems you can use Apache Commons' FileUtils.moveFile method.
Your idea is in error, because /some_other_disk/ is a relative URL, not a complete URL, so the destination cannot be found.
I have an example:
java FileMover D:\Eclipse33_workspace_j2ee\test\src\a\a.txt D:\Eclipse33_workspace_j2ee\test\src
The file move was successful
java FileMover D:\Eclipse33_workspace_j2ee\test\src\a\a.txt \Eclipse33_workspace_j2ee\test\src
The file move was erroneous
The result is that the URL is wrong.
SOLVED:
This is what was wrong:
current.addFolder(folder); (in the final else clause of the if statement)
It added a new folder, but did not guarantee that the folder passed in is the folder actually added; it may simply do nothing if the folder already exists. To overcome this I changed addFolder to return the actual folder (for example the one that already existed), and I assigned folder to that return value. That did the trick, so now I've got:
folder = current.addFolder(folder);
current = folder;
Thanks a lot people, your help was much appreciated :)
This is going to be a very long post; hopefully you can understand what I'm talking about. I appreciate any help. Thanks
Basically, I've created a personal, non-commercial project (which I don't plan to release) that can read ZIP and RAR files. It can only read the contents of the archive: the folders inside, the files inside the folders, and their properties (such as last modified date, last modified time, CRC checksum, uncompressed size, compressed size and file name). It can't extract files either, so it's really a ZIP/RAR viewer, if you will.
Anyway that's slightly irrelevant to my problem but I thought I'd give you some background info.
Now for my problem:
I can successfully list all the folders and files inside a ZIP archive, so now I want to take that raw input and link it together in some useful way. I made 2 classes: ArchiveFile (represents a file inside a ZIP) and ArchiveFolder (represents a folder inside a ZIP). They both have some useful methods such as getLastModifiedDate, getName, getPath and so on. But the difference is that ArchiveFolder can hold an ArrayList of ArchiveFile's and additional ArchiveFolder's (think of this as files and folders inside a folder).
Now I want to populate my raw input into one root ArchiveFolder, which will hold all the files in the root dir of the ZIP in its ArchiveFile ArrayList, and any additional folders in the root dir of the ZIP in its ArchiveFolder ArrayList (and this process continues like a chain reaction: more files/folders in each of those ArchiveFolders, etc.).
So I came up with the following code:
while (archive.hasMore()) {
String path = "";
ArchiveFolder current = root;
String[] contents = archive.getName().split("/");
for (int x = 0; x < contents.length; ++x) {
if (x == (contents.length - 1) && !archive.getName().endsWith("/")) { // If on last item and item is a file
path += contents[x]; // Update final path
ArchiveFile file = new ArchiveFile(path, contents[x], archive.getUncompressedSize(), archive.getCompressedSize(), archive.getModifiedTime(), archive.getModifiedDate(), archive.getCRC());
current.addFile(file); // Create and add the file to the current ArchiveFolder
}
else if (x == (contents.length - 1)) { // Else if we are on last item and it is a folder
path += contents[x] + "/"; // Update final path
ArchiveFolder folder = new ArchiveFolder(path, contents[x], archive.getModifiedTime(), archive.getModifiedDate());
current.addFolder(folder); // Create and add this folder to the current ArchiveFile
}
else { // Else if we are still traversing through the path
path += contents[x] + "/"; // Update path
ArchiveFolder folder = new ArchiveFolder(path, contents[x]);
current.addFolder(folder); // Create and add folder (remember we do not know the modified date/time as all we know is the path, so we can deduce the name only)
current = folder; // Update current ArchiveFolder to the newly created one for the next iteration of the for loop
}
}
archive.getNext();
}
Assume that root is the root ArchiveFolder (initially empty).
And assume that archive.getName() returns the name of the current file OR folder in the following fashion: file.txt or folder1/file2.txt or folder4/folder2/ (this is an empty folder) etc. So basically the relative path from the root of the ZIP archive.
Please read through the comments in the above code to familiarize yourself with it. Also assume that the addFolder method of an ArchiveFolder only adds the folder if it doesn't already exist (so there are no duplicate folders), and that it updates the time and date of an existing folder if they are blank (i.e. it was an intermediate folder of which we only knew the name, but now we know its details). The code for addFolder is pretty self-explanatory:
public void addFolder(ArchiveFolder folder) {
int loc = folders.indexOf(folder); // folders is the ArrayList containing ArchiveFolder's
if (loc == -1) {
folders.add(folder);
}
else {
ArchiveFolder real = folders.get(loc);
if (real.getTime() == null) {
real.setTime(folder.getTime());
real.setDate(folder.getDate());
}
}
}
So I can't see anything wrong with the code. It works, and after finishing, the root ArchiveFolder contains all the files in the root of the ZIP, as I want it to, and it contains all the directories in the root folder, as I want it to. So you'd think it works as expected, but no: the ArchiveFolders in the root folder don't contain the data inside those 'child' folders; each is just a blank folder with no additional files and folders (while it really does contain more files/folders when viewed in WinZip).
After debugging using Eclipse, the for loop does iterate through all the files (even those not included above), so this led me to believe that there is a problem with this line of the code:
current = folder;
What it does is, it updates the current folder (used as an intermediate by the loop) to the newly added folder.
I thought Java passed by reference and thus all new operations and new additions in future ArchiveFile's and ArchiveFolder's are automatically updated, and parent ArchiveFolder's will be updated accordingly. But that does not appear to be the case?
I know this is a long ass post and I really hope anyone can help me out with this.
Thanks in advance.
Since you use Eclipse, set a breakpoint and step through the method. It may take time, but it helps with finding bugs (check the object IDs, for example, to see if a reference has changed).
Java does not actually pass references the way you'd understand it in C++, for example. It passes by value, but all variables of non-primitive types are actually pointers to objects. So whenever you pass a variable to a method, you are giving it a copy of the pointer, meaning both variables point to the same object (change the object through one and the other will "see" the change). But assigning a different value to the pointer on the caller or callee side will not change the other side's pointer.
Hope I'm clear?
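A tiny demonstration of the difference (class and method names invented for the example):

class Box { int value; }

public class PassByValueDemo {
    static void mutate(Box b) { b.value = 42; }     // visible to the caller
    static void reassign(Box b) { b = new Box(); }  // NOT visible to the caller

    public static void main(String[] args) {
        Box box = new Box();
        mutate(box);
        System.out.println(box.value); // 42: both pointers refer to the same object
        reassign(box);
        System.out.println(box.value); // still 42: only the callee's copy was repointed
    }
}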
I suspect you haven't overridden equals() and hashCode() correctly on your ArchiveFolder class, and thus
folders.indexOf(folder)
in addFolder() is always returning -1.
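For illustration, a sketch of what that might look like, assuming the path uniquely identifies a folder (the real class has more fields):

import java.util.Objects;

public class ArchiveFolder {
    private final String path;

    public ArchiveFolder(String path) { this.path = path; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ArchiveFolder)) return false;
        return path.equals(((ArchiveFolder) o).path);
    }

    @Override
    public int hashCode() {
        return Objects.hash(path);
    }
}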