Overwriting HDFS file/directory through Spark - java

Problem
I have a file saved in HDFS. All I want to do is run my Spark application, compute a result JavaRDD, and use saveAsTextFile() to store the new "file" in HDFS.
However, Spark's saveAsTextFile() does not work if the file already exists; it does not overwrite it.
What I tried
So I searched for a solution and found that one possible way to make it work is to delete the file through the HDFS API before saving the new one.
I added this code:
FileSystem hdfs = FileSystem.get(new Configuration());
Path newFolderPath = new Path("hdfs://node1:50050/hdfs/" +filename);
if(hdfs.exists(newFolderPath)){
System.out.println("EXISTS");
hdfs.delete(newFolderPath, true);
}
filerdd.saveAsTextFile("/hdfs/" + filename);
When I ran my Spark application, the file was deleted, but I got a FileNotFoundException.
This exception occurs when something tries to read a file from a path where no file exists, so it makes no sense to me: after the file is deleted, no code tries to read it.
Part of my code
JavaRDD<String> filerdd = sc.textFile("/hdfs/" + filename); // load the file here
...
...
// Transformations here
filerdd = filerdd.map(....);
...
...
// Delete old file here
FileSystem hdfs = FileSystem.get(new Configuration());
Path newFolderPath = new Path("hdfs://node1:50050/hdfs/" +filename);
if(hdfs.exists(newFolderPath)){
System.out.println("EXISTS");
hdfs.delete(newFolderPath, true);
}
// Write new file here
filerdd.saveAsTextFile("/hdfs/" + filename);
I am trying to do the simplest thing here but I have no idea why this does not work. Maybe the filerdd is somehow connected to the path??

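The problem is that you use the same path for input and output. Spark RDDs are evaluated lazily: sc.textFile() and map() only record what has to be computed, and nothing is actually read until an action such as saveAsTextFile() runs. By that point you have already deleted newFolderPath, so the read of the input fails and filerdd complains with a FileNotFoundException.
In any case, you should not use the same path for input and output.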
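The ordering problem can be reproduced without Spark. The following is only a plain-Java sketch under stated assumptions: a Supplier stands in for the lazily evaluated RDD, a local temp file stands in for the HDFS path, and the class and method names (LazyOverwriteSketch, lazyUpperCase, deleteThenSaveFails, saveThenReplace) are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;

class LazyOverwriteSketch {

    // Stand-in for sc.textFile(...).map(...): nothing is read until get() is called.
    static Supplier<List<String>> lazyUpperCase(Path input) {
        return () -> {
            try {
                return Files.readAllLines(input).stream()
                        .map(String::toUpperCase)
                        .collect(Collectors.toList());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }

    // Same order as in the question: delete the input, then trigger the read.
    // Returns true if the deferred read failed because the input was gone.
    static boolean deleteThenSaveFails(Path input) throws IOException {
        Supplier<List<String>> pipeline = lazyUpperCase(input);
        Files.delete(input);
        try {
            Files.write(input, pipeline.get()); // get() runs first and fails
            return false;
        } catch (UncheckedIOException e) {
            return e.getCause() instanceof NoSuchFileException;
        }
    }

    // Safer order: write the result to a separate path, then replace the input.
    static void saveThenReplace(Path input) throws IOException {
        Supplier<List<String>> pipeline = lazyUpperCase(input);
        Path tmp = input.resolveSibling(input.getFileName() + ".tmp");
        Files.write(tmp, pipeline.get()); // the read happens here; input still exists
        Files.move(tmp, input, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempDirectory("lazy-sketch").resolve("data.txt");
        Files.write(file, List.of("a", "b"));
        System.out.println("delete-then-save failed: " + deleteThenSaveFails(file));
        Files.write(file, List.of("a", "b"));
        saveThenReplace(file);
        System.out.println(Files.readAllLines(file));
    }
}
```

In the Spark program the equivalent change is to pass saveAsTextFile() a path that differs from the one given to textFile(), and to remove or replace the old data only after the save has finished.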

Related

File copying in java (cannot create file) by Hadoop

I want to copy a file from HDFS to my local computer. I have done most of the work with input and output streams, but then I ran into the following issue:
JAVA I/O exception. Mkdirs fail to create file
After some research I found that the cause is my use of Hadoop's
FileSystem.create()
https://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#create(org.apache.hadoop.fs.Path,%20org.apache.hadoop.util.Progressable)
The behavior I see is as follows:
If I set my path to a non-existing folder, the folder is created and the file I download is placed inside it.
If I set my path to an existing folder (say, the current directory), the I/O exception above occurs.
Assuming the path and the input stream are already correct, what should I use (preferably from the FileSystem library) to work around this problem?
My code:
//src and dst are the path input and output
Configuration conf = new Configuration();
FileSystem inFS = FileSystem.get(URI.create(src), conf);
FileSystem outFS = FileSystem.get(URI.create(dst), conf);
FSDataInputStream in = null;
FSDataOutputStream out = null;
in = inFS.open(new Path(src));
out = outFS.create(new Path(dst),
new Progressable() {
/*
* Print a dot whenever 64 KB of data has been written to
* the datanode pipeline.
*/
public void progress() {
System.out.print(".");
}
});
The File class has a method called
createNewFile() that creates a new file only if one doesn't already exist.

NoSuchFileException adding file to ZIP using Java NIO

I have a frustrating issue using the Java NIO to add files to existing ZIPs.
On a test of 2500 files, 2 or 3 will fail. I am adding files to the root of the ZIP and not in a subfolder (which appears to be the source of some issues in other posts).
The weird thing is that the file the exception message claims doesn't exist is neither the ZIP nor the file being added, but the temporary file that Java creates as it builds the new ZIP file. Here is the code (less the try/catch):
Map<String, String> zipProps = new HashMap<>();
zipProps.put("create", "false");
zipProps.put("encoding", "UTF-8");
FileSystem zipFs = null;
URI zipAsFileSys = new URI("jar", fileToArchive.toURI().toString(), null);
zipFs = FileSystems.newFileSystem(zipAsFileSys, zipProps);
Path pathToNewFileInZip = zipFs.getPath(fileIdFile.getName());
Path pathToNewFileOnDisk = Paths.get(fileIdFile.getAbsolutePath());
Files.createFile(pathToNewFileInZip ); //Added later. No difference.
Files.copy(pathToNewFileOnDisk, pathToNewFileInZip, StandardCopyOption.REPLACE_EXISTING);
if(zipFs!=null) zipFs.close();
And the exception:
Exception: java.nio.file.NoSuchFileException: \\Server\archives\zipfstmp7224673021628877485.tmp
This was ultimately traced to network latency and/or Windows caching when manipulating files on a remote drive.
Seems Java can't figure out that the file it just created actually exists. It would be nice if it was possible to get a handle on the tmp file to see if it exists.
Seeing as I don't have the ability to mess with the OS caching, I never found a good workaround other than introducing a delay before the Files.copy(...) call, and because our production environment doesn't use multiple servers for this exact task, it isn't stopping me from proceeding.
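The delay workaround mentioned above can be turned into a bounded retry. This is a minimal sketch, not a confirmed fix: it assumes the failure is transient and surfaces as a NoSuchFileException either during the copy or when the ZIP file system is closed. The class name ZipAddWithRetry and the parameters maxAttempts and delayMillis are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

class ZipAddWithRetry {

    // Adds fileOnDisk to the root of an existing ZIP. If a NoSuchFileException
    // is thrown (by the copy or by closing the ZIP file system), waits and
    // tries again, up to maxAttempts times in total.
    static void addToZip(Path zip, Path fileOnDisk, int maxAttempts, long delayMillis)
            throws IOException, InterruptedException {
        Map<String, String> props = new HashMap<>();
        props.put("create", "false"); // the ZIP must already exist
        props.put("encoding", "UTF-8");
        URI zipUri = URI.create("jar:" + zip.toUri());

        for (int attempt = 1; ; attempt++) {
            // try-with-resources closes the ZIP file system; an exception
            // thrown by close() is also handled by the catch clause below.
            try (FileSystem zipFs = FileSystems.newFileSystem(zipUri, props)) {
                Path inZip = zipFs.getPath(fileOnDisk.getFileName().toString());
                Files.copy(fileOnDisk, inZip, StandardCopyOption.REPLACE_EXISTING);
                return;
            } catch (NoSuchFileException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and report the last failure
                }
                Thread.sleep(delayMillis);
            }
        }
    }
}
```

A call such as ZipAddWithRetry.addToZip(zipPath, filePath, 3, 500) would make up to three attempts, half a second apart. Whether a retry actually helps on a remote Windows share is untested here.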

Cannot make file java.io.IOException: No such file or directory [duplicate]

I am trying to create a file on the filesystem, but I keep getting this exception:
java.io.IOException: No such file or directory
I have an existing directory, and I am trying to write a file to that directory.
// I have also tried this below, but get same error
// new File(System.getProperty("user.home") + "/.foo/bar/" + fileName);
File f = new File(System.getProperty("user.home") + "/.foo/bar/", fileName);
if (f.exists() && !f.canWrite())
throw new IOException("Kan ikke skrive til filsystemet " + f.getAbsolutePath());
if (!f.isFile()) {
f.createNewFile(); // Exception here
} else {
f.setLastModified(System.currentTimeMillis());
}
Getting exception:
java.io.IOException: No such file or directory
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:883)
I have write permission to the path, however the file isn't created.
If the directory ../.foo/bar/ doesn't exist, you can't create a file there, so make sure you create the directory first.
Try something like this:
File f = new File("somedirname1/somedirname2/somefilename");
if (!f.getParentFile().exists())
f.getParentFile().mkdirs();
if (!f.exists())
f.createNewFile();
Print the full file name out or step through in a debugger. When I get confused by errors like this, it means that my assumptions and expectations don't match reality. Make sure you can see what the path is; it'll help you figure out where you've gone wrong.
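Be careful with permissions; you are probably missing some of them. You can check under Settings -> Apps -> (name of the application) -> Permissions and enable any that are not active.
Try with
f.getParentFile().mkdirs() then createNewFile() (calling f.mkdirs() itself would create a directory at the file's own path)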
You may want to use Apache Commons IO's FileUtils.openOutputStream(File) method. It has good Exception messages when something went wrong and also creates necessary parent dirs. If everything was right then you directly get your OutputStream - very neat.
If you just want to touch the file then use FileUtils.touch(File) instead.
File.isFile() is false if the file / directory does not exist, so you can't use it to test whether you're trying to create a directory. But that's not the first issue here.
The issue is that the intermediate directories don't exist. You want to call f.getParentFile().mkdirs() first.
I got the same problem when using RESTEasy. After searching for a while, I figured out that this error occurs when there is no place to keep temporary files. In Tomcat you can just create the tomcat-root/temp folder.
I fixed my problem with this code on a Linux file system:
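The same "create the parent directories first" advice can be written with java.nio.file. This is a minimal sketch; the class and method names (CreateFileWithParents, createWithParents) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

class CreateFileWithParents {

    // Creates any missing parent directories, then the file itself.
    // Returns true if the file was created, false if it already existed.
    static boolean createWithParents(Path file) throws IOException {
        Path parent = file.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent); // does nothing if they already exist
        }
        try {
            Files.createFile(file);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempDirectory("create-sketch").resolve("foo/bar/test.txt");
        System.out.println(createWithParents(file)); // first call creates the file
        System.out.println(createWithParents(file)); // second call finds it already there
    }
}
```

Unlike File.createNewFile(), these calls report a failure as a specific exception type, which makes it easier to see whether the missing piece is the directory, the permissions, or something else.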
if (!file.exists())
Files.createFile(file.toPath());

file.exists() returning false when the file does exist

In an Android application I'm working on, the user should be able to create a new CSV file on the SD card, named using text they input in an EditText.
The problem is that after instantiating the File using the directory and filename, file.exists() returns false, even when the file does indeed exist at that location. I have browsed to SD card using an Android file browser and through Windows Explorer, and the file does exist.
Is this the correct way to check if the file already exists, and if so, what am I missing so that it returns true when it exists?
String csvname = edittext.getText().toString() + ".csv";
File sdCard = Environment.getExternalStorageDirectory(); //path returns "/mnt/sdcard"
File dir = new File (sdCard.getAbsolutePath() + "/Android/data/" + getPackageName() + "/files/"); // path returns "/mnt/sdcard/Android/data/com.phx.license/files"
dir.mkdirs();
File file = new File(dir, csvname); //path returns "/mnt/sdcard/Android/data/com.phx.license/files/Test.csv"
if(!file.exists()) //WHY DOES IT SAY IT DOESN'T EXIST WHEN IT DOES?
{
...
}
If you use createNewFile it will only create a file if it does not already exist.
Java Files Documentation
public boolean createNewFile()
throws IOException
Atomically creates a new, empty file named by this abstract pathname if and only if a file with this name does not yet exist. The check for the existence of the file and the creation of the file if it does not exist are a single operation that is atomic with respect to all other filesystem activities that might affect the file.
Note: this method should not be used for file-locking, as the resulting protocol cannot be made to work reliably. The FileLock facility should be used instead.
Returns:
true if the named file does not exist and was successfully created; false if the named file already exists
Throws:
IOException - If an I/O error occurred
SecurityException - If a security manager exists and its SecurityManager.checkWrite(java.lang.String) method denies write access to the file
Since:
1.2
Creating a new file object like so new File(dir, csvname); does not create a new file in the file system.
You need to write data to it first.
I had the exact same issue but with yarn on hadoop where a spark job was trying to execute a command.
It was a file permission issue. I troubleshot it with code like the Scala snippet below. exists and notExists both return false, which means the JVM is not able to tell whether the file exists or not.
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.Paths
val path: Path = Paths.get(fileLocation)
println(":" + Files.exists(path) + ":" + Files.notExists(path))
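The same check can be written in Java as a three-way result, because Files.notExists() is not simply the negation of Files.exists(): when both return false, the existence of the file cannot be determined (for example, because of missing permissions). A minimal sketch, with the class and enum names (ExistenceCheck, State) made up for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class ExistenceCheck {

    enum State { EXISTS, MISSING, UNKNOWN }

    // Distinguishes "definitely there", "definitely not there" and
    // "cannot tell" (both exists() and notExists() returned false).
    static State check(Path path) {
        if (Files.exists(path)) {
            return State.EXISTS;
        }
        if (Files.notExists(path)) {
            return State.MISSING;
        }
        return State.UNKNOWN;
    }

    public static void main(String[] args) throws IOException {
        Path present = Files.createTempFile("existence", ".txt");
        System.out.println(check(present));
        System.out.println(check(present.resolveSibling("no-such-file-here.txt")));
    }
}
```

An UNKNOWN result suggests looking at the permissions on the file and its parent directories before assuming the path is wrong.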

Netbeans: Try to load file but not found (Java)

I run into the same problem every time I try to load files with Java in NetBeans (6.9).
It seems that the files aren't found. I get the error:
java.lang.NullPointerException
In this context:
File file = new File(this.getClass().getClassLoader().getResource("file.xml").getFile());
// this doesn't work either
File file = new File("file.xml");
The file file.xml is in the same directory as the Main.java file.
How could I load this file?
This should work (it does for me):
String path = URLDecoder.decode(getClass().getResource("file.xml").getFile(), "UTF-8");
File f = new File(path);
If I understand the Javadocs correctly, this should be the same as using getClass().getClassLoader().getResource(), but in my experience it behaves differently.
I would suggest that you add a line so it says something along the lines (untested):
File f = new File(....);
System.out.println("f=" + f.getAbsolutePath());
// do stuff with f
This will tell you exactly where the file is expected to be and allow you to figure out what exactly is going on.
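Sometimes you might need to add an extra / in front. Note that a leading slash is only understood by Class.getResource(), where it means "relative to the classpath root"; ClassLoader.getResource() expects the name without it:
File file = new File(this.getClass().getResource("/file.xml").getFile());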
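The NullPointerException in the question is what happens when getResource() returns null (resource not found) and a method is then called on the result. A minimal sketch that reads the resource as a stream and checks for null first; the class and method names (ResourceLoader, readResource) are hypothetical, and it assumes Java 9 or later for InputStream.readAllBytes():

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

class ResourceLoader {

    // Reads a classpath resource as UTF-8 text. Returns Optional.empty()
    // when the resource is not found, instead of failing with a
    // NullPointerException.
    static Optional<String> readResource(Class<?> anchor, String name) throws IOException {
        // A name without a leading slash is resolved relative to the package
        // of 'anchor'; a name with a leading slash starts at the classpath root.
        try (InputStream in = anchor.getResourceAsStream(name)) {
            if (in == null) {
                return Optional.empty();
            }
            return Optional.of(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        Optional<String> xml = readResource(ResourceLoader.class, "file.xml");
        System.out.println(xml.isPresent() ? xml.get() : "file.xml is not on the classpath");
    }
}
```

Reading through a stream also keeps working when the application is packaged as a JAR, where the resource is no longer a regular file that new File(...) could open.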
