Writing file to HDFS using Java

I'm trying to write a file to HDFS. The file gets created, but it is empty on the cluster; when I run the code locally, it works like a charm.
Here's my code:
FileSystem fs = null;
FSDataOutputStream recOutputWriter = null;
try {
Configuration configuration = new Configuration();
fs = FileSystem.get(configuration);
Path testOutFile = new Path(outputFileName);
recOutputWriter = fs.create(testOutFile);
// Write inside the try block so that a failed create() cannot lead to a NullPointerException.
recOutputWriter.writeBytes("======================================\n");
recOutputWriter.writeBytes("OK\n");
recOutputWriter.writeBytes("======================================\n");
} catch (IOException e) {
e.printStackTrace();
} finally {
// Close the stream first (this flushes the data), then the file system.
try {
if (recOutputWriter != null) {
recOutputWriter.close();
}
if (fs != null) {
fs.close();
}
} catch (IOException e) {
e.printStackTrace();
}
}
Did I miss something?

In order to write data to the file after creating it on the cluster, I had to add:
System.setProperty("HADOOP_USER_NAME", "vagrant");
Reference: Writing files to Hadoop HDFS using Scala

Related

Java - a file is saved to the wrong directory in S3 when copied from HDFS

I wrote a method for saving files from an HDFS directory to S3, but the files are getting saved to the wrong directory in S3. I've inspected the logs and have confirmed that the value of s3TargetPath is s3://bucketName/data and that hdfsSource is also resolved correctly.
However, instead of being saved to s3TargetPath, they are saved to s3://bucketName//data/.
The s3://bucketName//data/ directory also contains a file named data with content type binary/octet-stream and fs-type Hadoop block.
What needs to be changed in my code to save the files to the right S3 path?
private String hdfsPath = "hdfs://localhost:9010/user/usr1/data";
private String s3Path = "s3://bucketName/data";
copyFromHdfstoS3(hdfsPath, s3Path);
//
void copyFromHdfstoS3(String hdfsDir, String s3sDir) throws IOException, URISyntaxException {
Configuration conf = new Configuration();
FileSystem hdfs = FileSystem.get(new URI(hdfsDir), conf);
FileSystem s3Fs = FileSystem.get(new URI(s3sDir), conf);
Path hdfsSource = new Path(hdfsDir);
Path s3TargetPath = new Path(s3sDir);
// List the files directly under the source directory (non-recursive).
RemoteIterator<LocatedFileStatus> sourceFiles = hdfs.listFiles(hdfsSource, false);
if (!s3Fs.exists(s3TargetPath)) {
s3Fs.mkdirs(s3TargetPath);
}
if (sourceFiles != null) {
while (sourceFiles.hasNext()) {
Path srcFilePath = sourceFiles.next().getPath();
if (FileUtil.copy(hdfs, srcFilePath, s3Fs, s3TargetPath, false, true, new Configuration())) {
LOG.info("Copied Successfully");
} else {
LOG.info("Copy Failed");
}
}
}
}

Create instance of file using value of a property in properties file

I'm trying to create a File instance from a property value in order to parse HTML records. The problem is with the file path that I have to put in the properties file. Here is my example:
The corresponding code for reading the file:
public void extraxtElementWithoutId() {
Map<String,List<List<Element>>> uniciteIds = new HashMap<String,List<List<Element>>>();
FileReader fileReader = null;
Document doc = null;
try {
fileReader = new FileReader(new ClassPathResource(FILEPROPERTYNAME).getFile());
Properties prop = new Properties();
prop.load(fileReader);
Enumeration<?> enumeration = prop.propertyNames();
List<List<Element>> fiinalReturn = null;
while (enumeration.hasMoreElements()) {
String path = (String) enumeration.nextElement();
System.out.println("Fichier en question : " + prop.getProperty(path));
URL url = getClass().getResource(prop.getProperty(path));
System.out.println(url.getPath());
File inputFile = new File(url.getPath());
doc = Jsoup.parse(inputFile, "UTF-8");
//fiinalReturn = getListofElements(doc);
//System.out.println(fiinalReturn);
fiinalReturn = uniciteIds.put("Duplicat Id", getUnicityIds(doc));
System.out.println(fiinalReturn);
}
} catch (IOException e) {
e.printStackTrace();
}finally {
try{
fileReader.close();
}catch(Exception e) {
e.printStackTrace();
}
}
}
Thank you in advance,
Best Regards.
You are making a very common mistake on this line:
URL url = getClass().getResource(prop.getProperty(path));
Don't change the code. Change the property value instead: remove the leading src so that it reads /testHtmlFile/test.html, and so on.
UrlEnterer1=/testHtmlFile/test.html instead of preceding it with src.
The value returned by prop.getProperty(path) should match the file's location in your build output. Check your build directory to see how these files are stored: they are not stored under src but directly under the build directory.
This answer explains a little about path values for reading files from the classpath.
Also, as a side note (not related to the question), instead of calling prop.getProperty(path), consider injecting the property value directly into your class with the org.springframework.beans.factory.annotation.Value annotation.
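To illustrate the point about the build directory, here is a minimal sketch that uses only the JDK. It creates a temporary directory that stands in for the build output (a made-up layout, not the asker's project), puts testHtmlFile/test.html in it, and looks the file up through a class loader rooted at that directory. The class name ClasspathLookupDemo is hypothetical. Note that ClassLoader.getResource takes names without a leading slash, whereas Class.getResource, as used in the question, treats a leading slash as "start from the classpath root".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class ClasspathLookupDemo {

    /**
     * Returns true if resourceName can be found on a classpath whose only
     * entry is a freshly created directory containing testHtmlFile/test.html
     * (a stand-in for the build output directory).
     */
    static boolean isOnClasspath(String resourceName) {
        try {
            // The temporary directory is left behind for brevity.
            Path buildDir = Files.createTempDirectory("build");
            Path html = buildDir.resolve("testHtmlFile").resolve("test.html");
            Files.createDirectories(html.getParent());
            Files.write(html, "<html></html>".getBytes(StandardCharsets.UTF_8));

            // parent = null, so only buildDir (plus the JDK itself) is searched.
            try (URLClassLoader loader =
                     new URLClassLoader(new URL[] {buildDir.toUri().toURL()}, null)) {
                return loader.getResource(resourceName) != null;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Path relative to the classpath root: found.
        System.out.println(isOnClasspath("testHtmlFile/test.html"));
        // Path that still carries the source-folder prefix: not found.
        System.out.println(isOnClasspath("src/testHtmlFile/test.html"));
    }
}
```

The second lookup fails for the same reason the property value with src fails: the src folder is part of the project layout, not of the classpath that exists at run time.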

How to read Hadoop sequence file using Java

I have a sequence file generated by Spark using the saveAsObjectFile function. The file content is just some int numbers, and I want to read it locally with Java. Here is my code:
FileSystem fileSystem = null;
SequenceFile.Reader in = null;
try {
fileSystem = FileSystem.get(conf);
Path path = new Path("D:\\spark_sequence_file");
in = new SequenceFile.Reader(conf, SequenceFile.Reader.file(path));
Writable key = (Writable)
ReflectionUtils.newInstance(in.getKeyClass(), conf);
BytesWritable value = new BytesWritable();
while (in.next(key, value)) {
byte[] val_byte = value.getBytes();
int val = ByteBuffer.wrap(val_byte, 0, 4).getInt();
}
} catch (IOException e) {
e.printStackTrace();
}
But I can't read it correctly; I get the same value for every record, and the values are obviously wrong. Here is a snapshot of my output:
The file header looks like this:
Can anybody help me?
In Hadoop, keys are usually of type WritableComparable and values are of type Writable. Keeping this basic concept in mind, I read the sequence file in the following way.
Configuration config = new Configuration();
Path path = new Path(PATH_TO_YOUR_FILE);
SequenceFile.Reader reader = new SequenceFile.Reader(FileSystem.get(config), path, config);
WritableComparable key = (WritableComparable) reader.getKeyClass().newInstance();
Writable value = (Writable) reader.getValueClass().newInstance();
while (reader.next(key, value)) {
// do something with key and value
}
reader.close();
The data issue in your case may be caused by using saveAsObjectFile() rather than saveAsSequenceFile(String path, scala.Option<Class<? extends org.apache.hadoop.io.compress.CompressionCodec>> codec).
Please try the method above and see if the issue persists.
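One possible explanation for the identical values, offered as an assumption rather than something this thread confirms: saveAsObjectFile saves the RDD as a SequenceFile of serialized objects, and as far as I know each BytesWritable value holds a Java-serialized array of elements rather than a raw int. If so, the first four bytes of every value are the Java serialization stream header, which is the same for every record. The sketch below uses only the JDK (no Hadoop or Spark); the byte array stands in for value.getBytes(), the int[] element type is a guess, and the class name ObjectFileValueDemo is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;

final class ObjectFileValueDemo {

    /** Java-serializes an int array, standing in for the bytes stored in one record. */
    static byte[] serialize(int[] batch) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(batch);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** What the loop in the question does: interpret the first four bytes as an int. */
    static int firstFourBytesAsInt(byte[] value) {
        return ByteBuffer.wrap(value, 0, 4).getInt();
    }

    /** Deserializes the whole value instead of slicing raw bytes. */
    static int[] deserialize(byte[] value) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(value))) {
            return (int[]) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] a = serialize(new int[] {1, 2, 3});
        byte[] b = serialize(new int[] {40, 50});
        // Both lines print the same number: the stream header (0xACED0005), not the data.
        System.out.println(firstFourBytesAsInt(a));
        System.out.println(firstFourBytesAsInt(b));
        // Deserializing recovers the actual values.
        System.out.println(Arrays.toString(deserialize(b)));
    }
}
```

If this assumption holds for your file, the reader loop could wrap new ByteArrayInputStream(value.getBytes(), 0, value.getLength()) in an ObjectInputStream and cast the result to the array type that was actually written.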

Java - Running JAR refuses to load file in the same directory

Quick one. I'm trying to deploy a program, which borks at the following code. I want to read a properties file named, adequately, properties.
Properties props = new Properties();
InputStream is;
// First try - loading from the current directory
try {
File f = new File("properties");
is = new FileInputStream(f);
} catch (FileNotFoundException fnfe) {
fnfe.printStackTrace(System.err);
is = null;
}
try {
if (is == null) {
// Try loading from classpath
is = getClass().getResourceAsStream("properties");
}
//Load properties from the file (if found), else crash and burn.
props.load(is);
} catch (IOException e) {
e.printStackTrace(System.err);
}
Everything goes well when I run the program through Netbeans.
When I run the JAR by itself, though, I get two exceptions.
java.io.FileNotFoundException: properties (The system cannot find the file specified)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(Unknown Source)
.
.
.
Exception in Application start method
Exception in Application stop method
java.lang.reflect.InvocationTargetException
.
.
.
(exception during props.load(is) because is == null)
I'm running the file from the "dist" folder. I've tried placing the properties file inside the folder with the jar, without result. Normally, the properties file is located in the root project folder.
Any ideas?
You are reading the file as a resource (getResourceAsStream("properties")), so it must be on the classpath: either inside the jar itself or in a directory that you add to the classpath. Note that a name without a leading slash is resolved relative to the package of the class.
A jar is a zip file, so you can open it with 7-Zip, for example, add your properties file at the jar's root level, and try again.
Thanks to the comments, I built an absolute path generator based on the current run directory of the jar. Props to you, guys.
private String relativizer(String file) {
URL url = RobotikosAnomologitos.class.getProtectionDomain().getCodeSource().getLocation();
String urlString = url.toString();
int firstSlash = urlString.indexOf("/");
int targetSlash = urlString.lastIndexOf("/", urlString.length() - 2) + 1;
return urlString.substring(firstSlash, targetSlash) + file;
}
So my new file-reading structure is:
Properties props = new Properties();
InputStream is;
// First try - loading from the current directory
try {
File f = new File("properties");
is = new FileInputStream(f);
} catch (FileNotFoundException fnfe) {
fnfe.printStackTrace(System.err);
is = null;
}
try {
if (is == null) {
// Fall back to the directory that contains the jar
String pathToProps = relativizer("properties");
is = new FileInputStream(new File(pathToProps));
//is = getClass().getResourceAsStream(pathToProps);
}
//Load properties from the file (if found), else crash and burn.
props.load(is);
} catch (IOException e) {
e.printStackTrace(System.err);
}
// Finally parse the properties.
//code here, bla bla
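The relativizer above slices the URL string by hand, which may break on paths that contain spaces or other URL-encoded characters. Here is a sketch of the same idea using URI-to-Path conversion. It assumes the code source is a file: URL (a jar or class directory on local disk); the class name JarSibling, its method names, and the "my app.jar" path in demo() are all made up for illustration.

```java
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;

final class JarSibling {

    /**
     * Resolves fileName next to the jar (or class directory) found at
     * codeSourceLocation, e.g. .../dist/app.jar -> .../dist/properties.
     */
    static Path resolveNextTo(URL codeSourceLocation, String fileName) {
        try {
            // toURI() + Paths.get() decodes escapes such as %20 correctly.
            Path location = Paths.get(codeSourceLocation.toURI()).toAbsolutePath();
            return location.resolveSibling(fileName);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + codeSourceLocation, e);
        }
    }

    /** Convenience overload: looks next to wherever the given class was loaded from. */
    static Path resolveNextTo(Class<?> anchor, String fileName) {
        // getCodeSource() can be null for some class loaders; this sketch does not handle that.
        URL location = anchor.getProtectionDomain().getCodeSource().getLocation();
        return resolveNextTo(location, fileName);
    }

    /** Demo with a made-up jar path that contains a space. */
    static Path demo() {
        try {
            URL jar = Paths.get("dist", "my app.jar").toAbsolutePath().toUri().toURL();
            return resolveNextTo(jar, "properties");
        } catch (MalformedURLException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In the fallback branch this could replace the string arithmetic, for example by opening Files.newInputStream(JarSibling.resolveNextTo(RobotikosAnomologitos.class, "properties")).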

How to copy a file on the FTP server to a directory on the same server in Java?

I'm using the Apache Commons Net FTP client to upload a file. Before uploading, I want to check whether the file already exists on the server and, if so, back it up to a backup directory on the same server.
Does anyone know how to copy a file on an FTP server to a backup directory on the same server?
public static void uploadWithCommonsFTP(File fileToBeUpload){
FTPClient f = new FTPClient();
FTPFile backupDirectory;
try {
f.connect(server.getServer());
f.login(server.getUsername(), server.getPassword());
FTPFile[] directories = f.listDirectories();
FTPFile[] files = f.listFiles();
for(FTPFile file:directories){
if (!file.getName().equalsIgnoreCase("backup")) {
backupDirectory=file;
} else {
f.makeDirectory("backup");
}
}
for(FTPFile file: files){
if(file.getName().equals(fileToBeUpload.getName())){
//copy file to backupDirectory
}
}
} catch (IOException e) {
e.printStackTrace();
}
}
Edited code: there is still a problem. When I back up a zip file, the backed-up file is corrupted.
Does anybody know the reason for this?
public static void backupUploadWithCommonsFTP(File fileToBeUpload) {
FTPClient f = new FTPClient();
boolean backupDirectoryExist = false;
boolean fileToBeUploadExist = false;
FTPFile backupDirectory = null;
try {
f.connect(server.getServer());
f.login(server.getUsername(), server.getPassword());
FTPFile[] directories = f.listDirectories();
// Check for existence of backup directory
for (FTPFile file : directories) {
String filename = file.getName();
if (file.isDirectory() && filename.equalsIgnoreCase("backup")) {
backupDirectory = file;
backupDirectoryExist = true;
break;
}
}
if (!backupDirectoryExist) {
f.makeDirectory("backup");
}
// Check if file already exist on the server
f.changeWorkingDirectory("files");
FTPFile[] files = f.listFiles();
f.changeWorkingDirectory("backup");
String filePathToBeBackup="/home/user/backup/";
String prefix;
String suffix;
String fileNameToBeBackup;
FTPFile fileReadyForBackup = null;
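f.setFileType(FTP.BINARY_FILE_TYPE);
// BINARY_FILE_TYPE is a file type, not a transfer mode; stream mode is the Commons Net default.
f.setFileTransferMode(FTP.STREAM_TRANSFER_MODE);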
for (FTPFile file : files) {
if (file.isFile() && file.getName().equals(fileToBeUpload.getName())) {
prefix = FilenameUtils.getBaseName(file.getName());
suffix = ".".concat(FilenameUtils.getExtension(file.getName()));
fileNameToBeBackup = prefix.concat(Calendar.getInstance().getTime().toString().concat(suffix));
filePathToBeBackup = filePathToBeBackup.concat(fileNameToBeBackup);
fileReadyForBackup = file;
fileToBeUploadExist = true;
break;
}
}
// If file already exist on the server create a backup from it otherwise just upload the file.
if(fileToBeUploadExist){
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
f.retrieveFile(fileReadyForBackup.getName(), outputStream);
InputStream is = new ByteArrayInputStream(outputStream.toByteArray());
if(f.storeUniqueFile(filePathToBeBackup, is)){
JOptionPane.showMessageDialog(null, "Backup succeeded.");
f.changeWorkingDirectory("files");
boolean reply = f.storeFile(fileToBeUpload.getName(), new FileInputStream(fileToBeUpload));
if(reply){
JOptionPane.showMessageDialog(null,"Upload succeeded.");
}else{
JOptionPane.showMessageDialog(null,"Upload failed after backup.");
}
}else{
JOptionPane.showMessageDialog(null,"Backup failed.");
}
}else{
f.changeWorkingDirectory("files");
f.setFileType(FTP.BINARY_FILE_TYPE);
f.enterLocalPassiveMode();
// Read the local file fully into memory and upload it from there.
ByteArrayInputStream in = new ByteArrayInputStream(FileUtils.readFileToByteArray(fileToBeUpload));
boolean reply = f.storeFile(fileToBeUpload.getName(), in);
System.out.println("Reply code for storing file to server: " + reply);
if(!f.completePendingCommand()) {
f.logout();
f.disconnect();
System.err.println("File transfer failed.");
System.exit(1);
}
if(reply){
JOptionPane.showMessageDialog(null,"File uploaded successfully without making backup." +
"\nReason: There wasn't any previous version of this file.");
}else{
JOptionPane.showMessageDialog(null,"Upload failed.");
}
}
//Logout and disconnect from server
f.logout();
f.disconnect();
} catch (IOException e) {
e.printStackTrace();
}
}
If you are using the Apache Commons Net FTPClient, there is a direct method to move a file from one location to another (provided the user has the proper permissions):
ftpClient.rename(from, to);
Or, if you are familiar with FTP commands, you can use something like:
ftpClient.sendCommand(FTPCommand.yourCommand, args);
if(FTPReply.isPositiveCompletion(ftpClient.getReplyCode())) {
//command successful;
} else {
//check for reply code, and take appropriate action.
}
If you are using any other client, go through its documentation; there won't be many differences between client implementations.
UPDATE:
The approach above moves the file to the "to" path; that is, the file will no longer exist at the "from" path. The FTP protocol is basically meant to transfer files between local and remote, or between one remote and another, but not within a single server.
The workaround here is simple: read the complete file into a local InputStream and write it back to the server as a new file in the backup directory.
To get the complete file:
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
ftpClient.retrieveFile(fileName, outputStream);
InputStream is = new ByteArrayInputStream(outputStream.toByteArray());
Now store this stream in the backup directory. First we need to change the working directory to the backup directory.
// assuming the backup directory is within the current working directory
ftpClient.setFileType(FTP.BINARY_FILE_TYPE);//binary files
ftpClient.changeWorkingDirectory("backup");
//this overwrites the existing file
ftpClient.storeFile(fileName, is);
//if you don't want to overwrite it use storeUniqueFile
Hope this helps.
Try it this way. I am using the Apache library.
ftpClient.rename(from, to) will make it easier; I have marked in the code below where to add ftpClient.rename(from, to).
public void goforIt(){
FTPClient con = null;
try
{
con = new FTPClient();
con.connect("www.ujudgeit.net");
if (con.login("ujud3", "Stevejobs27!!!!"))
{
con.enterLocalPassiveMode(); // important!
con.setFileType(FTP.BINARY_FILE_TYPE);
String data = "/sdcard/prerakm4a.m4a"; // local path of the file to upload
FileInputStream in = new FileInputStream(new File(data));
boolean result = con.storeFile("/Ads/prerakm4a.m4a", in);
in.close();
if (result)
{
Log.v("upload result", "succeeded");
//$$$$$$$$$$$$$$$$$$$$$$$$$$$$$Add the backup Here$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$//
// Now here you can store the file into a backup location
// Use ftpClient.rename(from, to) to place it in backup
//$$$$$$$$$$$$$$$$$$$$$$$$$$$$$Add the backup Here$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$//
}
con.logout();
con.disconnect();
}
}
catch (Exception e)
{
e.printStackTrace();
}
}
There's no standard way to duplicate a remote file over the FTP protocol. Some FTP servers support proprietary or non-standard extensions for this, though.
So if you are lucky and your server is ProFTPD with the mod_copy module, you can use FTP.sendCommand to issue these two commands:
f.sendCommand("CPFR sourcepath");
f.sendCommand("CPTO targetpath");
The second possibility is that your server allows you to execute arbitrary shell commands. This is even less common. If your server supports this, you can use the SITE EXEC command:
SITE EXEC cp -p sourcepath targetpath
Another workaround is to open a second connection to the FTP server and make the server upload the file to itself by piping a passive-mode data connection to an active-mode data connection. An implementation of this solution (in PHP, though) is shown in FTP copy a file to another place in same FTP.
If none of these work, all you can do is download the file to a local temporary location and re-upload it to the target location. This is what the answer by @RP- shows.
See also FTP copy a file to another place in same FTP.
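The mod_copy exchange described above is a two-step command sequence, and the second command only makes sense if the first one was accepted. A minimal sketch of that control flow follows. To keep it runnable without an FTP server, the connection is abstracted behind a small hypothetical interface, CommandChannel, which is not part of Commons Net; with a real client it could be backed by FTPClient.sendCommand(String), which returns the integer reply code. The sketch assumes the server answers CPFR with a 3xx (intermediate) reply and CPTO with a 2xx (completion) reply, which is how ProFTPD's mod_copy behaves to my knowledge. The paths are placeholders.

```java
import java.util.ArrayList;
import java.util.List;

final class ServerSideCopy {

    /** Hypothetical abstraction: sends one raw FTP command and returns the reply code. */
    interface CommandChannel {
        int send(String command);
    }

    /** Returns true only if both steps of the CPFR/CPTO exchange were accepted. */
    static boolean copy(CommandChannel channel, String sourcePath, String targetPath) {
        int cpfr = channel.send("CPFR " + sourcePath);
        if (cpfr < 300 || cpfr >= 400) {
            return false; // source rejected or command unsupported: do not send CPTO
        }
        int cpto = channel.send("CPTO " + targetPath);
        return cpto >= 200 && cpto < 300;
    }

    /**
     * Runs copy() against a fake channel that answers CPFR with cpfrReply
     * and CPTO with 250, and returns the commands the fake channel received.
     */
    static List<String> commandsSentTo(int cpfrReply) {
        List<String> sent = new ArrayList<>();
        CommandChannel fake = command -> {
            sent.add(command);
            return command.startsWith("CPFR ") ? cpfrReply : 250;
        };
        copy(fake, "files/report.zip", "backup/report.zip");
        return sent;
    }
}
```

An adapter for a real FTPClient would call f.sendCommand(command) inside send() and decide what to do with the IOException it can throw, since the interface in this sketch does not declare checked exceptions.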
To back up on the same server (by moving the files), you can use:
String source="/home/user/some";
String goal ="/home/user/someOther";
FTPFile[] filesFTP = clientFTP.listFiles(source);
clientFTP.changeWorkingDirectory(goal); // IMPORTANT change to final directory
for (FTPFile f : filesFTP)
{
if(f.isFile())
{
clientFTP.rename(source+"/"+f.getName(), f.getName());
}
}
