I am writing a Java program to count the records in a protobuf file stored in HDFS, and I run it with "hadoop jar countLine.jar".
However, I get this exception:
Exception in thread "main" java.lang.NoSuchMethodError: com.google.protobuf.CodedInputStream.shouldDiscardUnknownFields()Z at com.google.protobuf.GeneratedMessageV3.parseUnknownField(GeneratedMessageV3.java:290)
This only happens with some of the protobuf files; files with a different schema do not have this issue.
My protobuf files are gzipped (.pb.gz).
Here is the code:
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path path = new Path(<HDFS path to file>);
InputStream input = new GZIPInputStream(fs.open(path));
Message m;
while ((m = defaultMsg.getParserForType().parseDelimitedFrom(input)) != null) {
recordCount++;
}
If I copy the file to the local file system, everything works fine:
InputStream input = new GZIPInputStream(new FileInputStream(path_to_local_file));
Message m;
while ((m = defaultMsg.getParserForType().parseDelimitedFrom(input)) != null) {
recordCount++;
}
Does anyone have an idea? Could the file size cause this issue?
Thanks
David
Thank you @jwismar for the hints.
The issue happens when I run "hadoop jar countLine.jar" from the command line. The Hadoop class loader loads the protobuf library that ships with Hadoop, which is an older version than the protoc I used to generate the Java files. Once I downgraded protoc to that older version and regenerated the Java files, the issue was gone.
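If you hit a similar conflict, one way to confirm which protobuf jar actually wins at runtime is to ask the JVM where the class was loaded from. A minimal JDK-only sketch, assuming you package it into the job jar and run it through `hadoop jar` so it sees the same class loaders; ClassOrigin is a hypothetical name:

```java
import java.security.CodeSource;

public class ClassOrigin {
    /**
     * Returns the jar or directory the named class is loaded from, as seen by the
     * class loader that loaded ClassOrigin, or a marker string for JDK classes
     * loaded by the bootstrap loader (which report no code source).
     */
    public static String locate(String className) throws ClassNotFoundException {
        // initialize = false: we only want to know where the class lives, not run its static init
        Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "<bootstrap class loader>";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Defaults to the class named in the NoSuchMethodError above
        String name = args.length > 0 ? args[0] : "com.google.protobuf.CodedInputStream";
        System.out.println(name + " loaded from " + locate(name));
    }
}
```

If the printed location is a jar from Hadoop's own lib directory rather than the protobuf version your generated code expects, that is the same mismatch described above.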
Thanks
David
I want to copy a file from HDFS to my local computer. I have done most of the work with input and output streams, but then I ran into the following issue:
Java I/O exception: Mkdirs failed to create file
I have done some research and found that, since I am using
FileSystem.create() (a Hadoop function)
https://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#create(org.apache.hadoop.fs.Path,%20org.apache.hadoop.util.Progressable)
the reason is as follows:
If I set my path to a non-existent folder, the folder is created and the downloaded file is inside it.
If I set my path to an existing folder (say, the current directory), the above I/O exception occurs.
Assuming I already have the path and input stream right, what should I use (preferably from the FileSystem library) to get around this problem?
My code:
//src and dst are the path input and output
Configuration conf = new Configuration();
FileSystem inFS = FileSystem.get(URI.create(src), conf);
FileSystem outFS = FileSystem.get(URI.create(dst), conf);
FSDataInputStream in = null;
FSDataOutputStream out = null;
in = inFS.open(new Path(src));
out = outFS.create(new Path(dst),
new Progressable() {
/*
* Print a dot whenever 64 KB of data has been written to
* the datanode pipeline.
*/
public void progress() {
System.out.print(".");
}
});
The File class has a method called createNewFile() that creates a new file only if one does not already exist.
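A minimal local-file sketch combining createNewFile() with a directory check, assuming the intent is "if dst is an existing directory, put the file inside it; otherwise treat dst as the file name". LocalTarget and both method names are hypothetical, and srcName would come from something like Hadoop's Path.getName() on the source path:

```java
import java.io.File;
import java.io.IOException;

public class LocalTarget {
    /**
     * If dst is an existing directory, returns dst/srcName (the file goes inside it);
     * otherwise returns dst unchanged (dst is treated as the file itself).
     */
    public static File resolveTarget(String srcName, File dst) {
        return dst.isDirectory() ? new File(dst, srcName) : dst;
    }

    /**
     * Creates any missing parent directories, then creates the file only if it
     * does not exist yet. Returns true if a new file was created.
     */
    public static boolean createIfAbsent(File target) throws IOException {
        File parent = target.getParentFile();
        // mkdirs() returns false when the directory already exists, so re-check before failing
        if (parent != null && !parent.mkdirs() && !parent.isDirectory()) {
            throw new IOException("Could not create directory " + parent);
        }
        return target.createNewFile(); // false if the file already exists
    }
}
```

On the Hadoop side, the same check could be made on the destination FileSystem before calling create(), for example with getFileStatus(path).isDirectory() and new Path(dstDir, srcPath.getName()).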
Problem
I have a file saved in HDFS, and all I want to do is run my Spark application, compute a result JavaRDD, and use saveAsTextFile() to store the new "file" in HDFS.
However, Spark's saveAsTextFile() does not work if the file already exists; it does not overwrite it.
What I tried
So I searched for a solution and found that one possible way to make it work is to delete the file through the HDFS API before saving the new one.
I added this code:
FileSystem hdfs = FileSystem.get(new Configuration());
Path newFolderPath = new Path("hdfs://node1:50050/hdfs/" +filename);
if(hdfs.exists(newFolderPath)){
System.out.println("EXISTS");
hdfs.delete(newFolderPath, true);
}
filerdd.saveAsTextFile("/hdfs/" + filename);
When I ran my Spark application, the file was deleted, but I got a FileNotFoundException.
This exception occurs when something tries to read a file from a path that does not exist, so it makes no sense to me: after deleting the file, no code tries to read it.
Part of my code:
JavaRDD<String> filerdd = sc.textFile("/hdfs/" + filename); // load the file here
...
...
// Transformations here
filerdd = filerdd.map(....);
...
...
// Delete old file here
FileSystem hdfs = FileSystem.get(new Configuration());
Path newFolderPath = new Path("hdfs://node1:50050/hdfs/" +filename);
if(hdfs.exists(newFolderPath)){
System.out.println("EXISTS");
hdfs.delete(newFolderPath, true);
}
// Write new file here
filerdd.saveAsTextFile("/hdfs/" + filename);
I am trying to do the simplest thing here, but I have no idea why this does not work. Maybe filerdd is somehow connected to the path?
The problem is that you use the same path for input and output. Spark RDDs are evaluated lazily: sc.textFile() and map() only record the steps, and nothing is read until an action such as saveAsTextFile() runs. By that point you have already deleted newFolderPath, so when filerdd finally tries to read its input, the file is gone.
Anyway, you should not use the same path for input and output.
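If the result really must end up at the original path, one common pattern is to write it somewhere else first and only then replace the old output. The sketch below shows that ordering with java.nio.file on a local directory as a stand-in for HDFS; the layout (a part-00000 file inside an output directory) only imitates what saveAsTextFile() produces, and SwapOutput and its methods are hypothetical. In the Spark job itself, the equivalent steps would be saveAsTextFile() to a temporary path, then FileSystem.delete() on the old path and FileSystem.rename() of the temporary path.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

public class SwapOutput {
    /** Deletes a directory tree, children before parents; does nothing if it does not exist. */
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order puts every entry before the directory that contains it
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    /**
     * Writes newLines to a temporary sibling directory, then deletes the old
     * output and renames the temporary directory into its place.
     */
    public static void replaceOutput(Path target, List<String> newLines) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + "_tmp");
        deleteRecursively(tmp);
        Files.createDirectories(tmp);
        Files.write(tmp.resolve("part-00000"), newLines); // stand-in for saveAsTextFile(tmp)
        deleteRecursively(target);                        // safe now: the result is already written
        Files.move(tmp, target);                          // rename into place
    }
}
```

The important part is the order: the old data is deleted only after the new data has been completely written, which is exactly what the code in the question does not guarantee with a lazy RDD.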
I am getting an NPE when getting the path of a File (a .sh file in the assets folder).
I have read about NPEs in detail in the following thread, but that did not solve my problem:
What is a NullPointerException, and how do I fix it?
Following is my code snippet:
File absPathofBash;
url = ClassLoader.class.getResource("assets/forbackingup.sh");
absPathofBash = new File(url.getPath());
Later I use it in a ProcessBuilder:
ProcessBuilder pb = new ProcessBuilder(url.getPath(), param2, param3);
I've also tried getting the absolute path directly, like this:
absPathofBash = new File("assets/forbackingup.sh").getAbsoluteFile();
Using the latter approach I am able to run it, but once I build a jar, the file cannot be found (although the jar contains the file in the assets folder).
I would be thankful if anyone can help me with that.
Once your code is packaged as a jar, you cannot load files inside the jar using a file path. They are class resources, and you have to load them like this:
this.getClass().getClassLoader().getResource("assets/forbackingup.sh");
This loads assets/forbackingup.sh as an absolute path inside your jar (relative to the jar root). You can also use this.getClass().getResource(), but then a path without a leading "/" is resolved relative to this class's package inside the jar.
The getResource method gives you a URL; if you want an InputStream directly, use getResourceAsStream.
Hope it helps!
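To turn the NPE into a clear error, one option is to null-check the lookup and fail right there. A small sketch; ResourceLookup and openFromRoot are hypothetical names. It also illustrates the leading-slash rule: in the question's code, ClassLoader.class.getResource("assets/forbackingup.sh") has no leading "/", so Class.getResource resolves it relative to ClassLoader's own package (java/lang/assets/...), which is one likely reason the URL comes back null.

```java
import java.io.FileNotFoundException;
import java.io.InputStream;

public class ResourceLookup {
    /**
     * Opens a resource relative to the classpath root, the way
     * ClassLoader.getResourceAsStream("assets/forbackingup.sh") would.
     * Throws instead of returning null, so a missing resource is reported at
     * the lookup rather than as a NullPointerException later on.
     */
    public static InputStream openFromRoot(Class<?> anchor, String name) throws FileNotFoundException {
        // With a leading "/", Class.getResourceAsStream ignores the class's package
        String rooted = name.startsWith("/") ? name : "/" + name;
        InputStream in = anchor.getResourceAsStream(rooted);
        if (in == null) {
            throw new FileNotFoundException("Not found on the classpath: " + rooted);
        }
        return in;
    }
}
```

Typical use would be `try (InputStream in = ResourceLookup.openFromRoot(MyClass.class, "assets/forbackingup.sh")) { ... }`, where MyClass is one of your own classes, so the lookup goes through your application's class loader.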
Since the file itself is in the jar file, you could try using:
InputStream is = this.getClass().getClassLoader().getResourceAsStream(fileNameFromJar);
In the case of a jar file, the class loader returns a different URL (a jar: URL) than when the target file is not embedded inside a jar. Refer to the answer at this link, which should help you:
How to use ClassLoader.getResources() in jar file
I got it working by creating a temp file. It's not difficult, but I'm posting the code here:
InputStream stream = MyClass.class.getClassLoader().
getResourceAsStream("assets/forbackingup.sh");
File temp = File.createTempFile("forbackingup", ".sh");
// Copy the script out of the jar; both streams are closed only after the loop finishes
try (InputStream in = stream; OutputStream outputStream = new FileOutputStream(temp)) {
int read = 0;
byte[] bytes = new byte[1024];
while ((read = in.read(bytes)) != -1) {
outputStream.write(bytes, 0, read);
}
}
// A newly created temp file is usually not executable
temp.setExecutable(true);
Now we have a temp file whose path we can pass to the ProcessBuilder:
String _filePath = temp.getPath();
ProcessBuilder pb = new ProcessBuilder(_filePath, param2, param3);
Thank you everyone for your considerations.
You can use the Path class like this:
Path path = Paths.get("data/test-write.txt");
if(!Files.exists(path)){
// the file is missing: handle it here instead of running into a NullPointerException later
}
While trying to copy some files from my jar file to a temp directory in my Java app, the following exception is thrown:
java.nio.file.FileSystemNotFoundException
at com.sun.nio.zipfs.ZipFileSystemProvider.getFileSystem(ZipFileSystemProvider.java:171)
at com.sun.nio.zipfs.ZipFileSystemProvider.getPath(ZipFileSystemProvider.java:157)
at java.nio.file.Paths.get(Unknown Source)
at com.sora.util.walltoggle.pro.WebViewPresentation.setupTempFiles(WebViewPresentation.java:83)
....
And this is a small part of my setupTempFiles() method (with line numbers):
81. URI uri = getClass().getResource("/webViewPresentation").toURI();
//prints: URI->jar:file:/C:/Users/Tom/Dropbox/WallTogglePro.jar!/webViewPresentation
82. System.out.println("URI->" + uri );
83. Path source = Paths.get(uri);
The webViewPresentation directory resides in the root directory of my jar.
This problem only occurs when I package my app as a jar; debugging in Eclipse works fine. I suspect it has something to do with this bug, but I'm not sure how to fix it.
Any help appreciated.
If it matters:
I'm on Java 8 build 1.8.0-b132
Windows 7 Ult. x64
A FileSystemNotFoundException means the file system cannot be created automatically, and you have not created it here.
Given your URI, split it at the !, open the file system using the part before it, and then get the path from the part after the !:
final Map<String, String> env = new HashMap<>();
final String[] array = uri.toString().split("!");
final FileSystem fs = FileSystems.newFileSystem(URI.create(array[0]), env);
final Path path = fs.getPath(array[1]);
Note that you should .close() your FileSystem once you're done with it.
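The same split-and-open approach, made self-contained so it can run anywhere: a sketch that builds a throwaway jar with one entry (so it does not depend on the question's WallTogglePro.jar), opens the jar: file system, reads the entry, and closes the file system with try-with-resources. JarPathDemo, makeDemoJar and the entry name are hypothetical:

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class JarPathDemo {
    /** Reads a text entry addressed by a URI such as jar:file:/x.jar!/dir/file.txt. */
    public static String readFromJarUri(URI uri) throws IOException {
        // parts[0] = "jar:file:...jar", parts[1] = "/dir/file.txt"
        String[] parts = uri.toString().split("!", 2);
        try (FileSystem fs = FileSystems.newFileSystem(URI.create(parts[0]),
                Collections.<String, Object>emptyMap())) {
            Path entry = fs.getPath(parts[1]);
            return new String(Files.readAllBytes(entry), StandardCharsets.UTF_8);
        } // closing here lets the same jar be opened again later
    }

    /** Builds a small jar with one entry, so the demo needs no existing jar. */
    public static Path makeDemoJar() throws IOException {
        Path jar = Files.createTempFile("demo", ".jar");
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(jar))) {
            zip.putNextEntry(new ZipEntry("webViewPresentation/hello.txt"));
            zip.write("hello from the jar".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return jar;
    }

    public static void main(String[] args) throws IOException {
        Path jar = makeDemoJar();
        URI uri = URI.create("jar:" + jar.toUri() + "!/webViewPresentation/hello.txt");
        System.out.println(readFromJarUri(uri)); // prints: hello from the jar
        Files.delete(jar);
    }
}
```

Closing matters: while a file system is open for a jar, a second FileSystems.newFileSystem() call for the same jar throws FileSystemAlreadyExistsException.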
The accepted answer isn't the best, since it doesn't work when you start the application from an IDE or when the resource is a plain file in the classes directory!
A better solution was proposed at java.nio.file.FileSystemNotFoundException when getting file from resources folder
InputStream in = getClass().getResourceAsStream("/webViewPresentation");
byte[] data = IOUtils.toByteArray(in);
IOUtils is from Apache commons-io.
But if you are already using Spring and want text rather than bytes, you can change the second line to:
StreamUtils.copyToString(in, Charset.defaultCharset());
StreamUtils.copyToByteArray also exists.
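If neither Commons IO nor Spring is on the classpath, the same read can be done with the JDK alone. A minimal Java 8-compatible sketch (on Java 9+, InputStream.readAllBytes() does the same job); ResourceBytes and its methods are hypothetical names:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ResourceBytes {
    /** Reads the stream to the end and returns everything as a byte array. */
    public static byte[] toByteArray(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    /** Loads a classpath resource fully, failing clearly if it is missing. */
    public static byte[] readResource(Class<?> anchor, String name) throws IOException {
        // try-with-resources skips close() for a null resource, so the null check below is safe
        try (InputStream in = anchor.getResourceAsStream(name)) {
            if (in == null) {
                throw new IOException("Resource not found on classpath: " + name);
            }
            return toByteArray(in);
        }
    }
}
```

Because this only uses getResourceAsStream(), it behaves the same whether the resource sits in a jar or in an IDE's output directory, which is the point made in this answer.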
This is maybe a hack, but the following worked for me:
URI uri = getClass().getResource("myresourcefile.txt").toURI();
if("jar".equals(uri.getScheme())){
for (FileSystemProvider provider: FileSystemProvider.installedProviders()) {
if (provider.getScheme().equalsIgnoreCase("jar")) {
try {
provider.getFileSystem(uri);
} catch (FileSystemNotFoundException e) {
// in this case we need to initialize it first:
provider.newFileSystem(uri, Collections.emptyMap());
}
}
}
}
Path source = Paths.get(uri);
This relies on the fact that ZipFileSystemProvider internally keeps track of the FileSystems it has opened (one per jar file), so once one exists for the jar, Paths.get(uri) can find it.
If you're using the Spring Framework, there is an easy solution.
In this case we want to read webViewPresentation.
I solved the same problem with the code below:
URI uri = getClass().getResource("/webViewPresentation").toURI();
FileSystems.getDefault().getPath(new UrlResource(uri).toString());
I am developing a Linux-only Java application, and I need to execute a shell script in it. According to what I have read, the only way to execute that shell script is to extract it from the jar file and execute it. The question is: how can I extract that shell script at runtime?
Thank you in advance
Unix does not know how to run scripts inside jar files. You must create a file with the script's content (the Java runtime has routines for creating temporary files) and then run that file; see How to run Unix shell script from Java code? for instructions. When done, delete it from the file system.
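A minimal JDK-only sketch of those steps (extract, mark executable, run, delete), assuming the script starts with a shebang line such as #!/bin/sh. ScriptExtractor and its methods are hypothetical names; the script content would typically come from getResourceAsStream():

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ScriptExtractor {
    /** Copies script content to a new temp file and marks it executable for its owner. */
    public static Path extractToTempFile(InputStream script, String prefix) throws IOException {
        Path temp = Files.createTempFile(prefix, ".sh");
        try (InputStream in = script) {
            Files.copy(in, temp, StandardCopyOption.REPLACE_EXISTING);
        }
        if (!temp.toFile().setExecutable(true)) {
            throw new IOException("Could not mark " + temp + " executable");
        }
        return temp;
    }

    /** Runs the script with inherited stdin/stdout/stderr, waits for it, then deletes it. */
    public static int runAndDelete(Path script, String... args) throws IOException, InterruptedException {
        String[] command = new String[args.length + 1];
        command[0] = script.toString();
        System.arraycopy(args, 0, command, 1, args.length);
        try {
            Process p = new ProcessBuilder(command).inheritIO().start();
            return p.waitFor(); // wait before deleting, so the shell can still read the file
        } finally {
            Files.deleteIfExists(script);
        }
    }
}
```

If the temp directory is mounted noexec, running the file directly fails; invoking it as "/bin/sh <path>" (as the Guava-based answer does) avoids needing the execute bit.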
I found this question today, and I think there is a better answer:
unzip -p JARFILE SCRIPTFILE | bash
should do it,
where JARFILE is the path to the jar file
and SCRIPTFILE is the path WITHIN the jar of the script file to execute.
This extracts the file to stdout, which is then piped to the shell (bash).
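The same pipe-to-shell idea can also be done from inside the Java application, without unzip or a temp file: start the shell and write the script (for example from getResourceAsStream()) to its standard input. A minimal sketch; PipeToShell is a hypothetical name, and the shell is passed in (for example "bash" as above, or "sh"). One limitation compared with a file on disk: because the script arrives on stdin, the script itself cannot also read the caller's stdin.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class PipeToShell {
    /**
     * Starts the given shell, writes the script to its stdin (the Java analogue
     * of "... | bash") and returns the combined stdout/stderr.
     * The whole script is written before output is read, so this suits short
     * scripts; a long script that also prints a lot early on could block.
     */
    public static String run(String shell, InputStream script) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(shell).redirectErrorStream(true).start();
        byte[] buf = new byte[8192];
        int n;
        try (InputStream in = script; OutputStream stdin = p.getOutputStream()) {
            while ((n = in.read(buf)) != -1) {
                stdin.write(buf, 0, n);
            }
        } // closing stdin sends EOF, so the shell exits after the last command
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream stdout = p.getInputStream()) {
            while ((n = stdout.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        int exit = p.waitFor();
        String text = new String(out.toByteArray(), StandardCharsets.UTF_8);
        if (exit != 0) {
            throw new IOException(shell + " exited with " + exit + ": " + text);
        }
        return text;
    }
}
```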
As mentioned before, you can copy the content of the bundled resource to a temp location, execute the script, and then remove the script from the temp location.
Here is the code to do that. Note that I am using the Google Guava library.
// Read the bundled script as string
String bundledScript = CharStreams.toString(
new InputStreamReader(getClass().getResourceAsStream("/bundled_script_path.sh"), Charsets.UTF_8));
// Create a temp file with uuid appended to the name just to be safe
File tempFile = File.createTempFile("script_" + UUID.randomUUID().toString(), ".sh");
// Write the string to temp file
Files.write(bundledScript, tempFile, Charsets.UTF_8);
String execScript = "/bin/sh " + tempFile.getAbsolutePath();
// Execute the script
Process p = Runtime.getRuntime().exec(execScript);
// Output stream is the input to the subprocess
OutputStream outputStream = p.getOutputStream();
if (outputStream != null) {
outputStream.close();
}
// Input stream is the normal output of the subprocess
InputStream inputStream = p.getInputStream();
if (inputStream != null) {
// You can use the input stream to log your output here.
inputStream.close();
}
// Error stream is the error output of the subprocess
InputStream errorStream = p.getErrorStream();
if (errorStream != null) {
// You can use the input stream to log your error output here.
errorStream.close();
}
// Wait for the script to finish before removing it, so the shell can still read it
p.waitFor();
// Remove the temp file from disk
tempFile.delete();
Do not bundle the script in your jar in the first place. Deploy the scripts as independent files.