I'm having trouble appending data to an existing file in HDFS. If the file exists, I want to append a line; if not, I want to create a new file with the given name.
Here's my method to write into HDFS.
if (!file.exists(path)){
file.createNewFile(path);
}
FSDataOutputStream fileOutputStream = file.append(path);
BufferedWriter br = new BufferedWriter(new OutputStreamWriter(fileOutputStream));
br.append("Content: " + content + "\n");
br.close();
This method does write into HDFS and creates the file, but as I mentioned, it does not append.
This is how I test my method:
RunTimeCalculationHdfsWrite.hdfsWriteFile("RunTimeParserLoaderMapperTest2", "Error message test 2.2", context, null);
The first parameter is the name of the file, the second is the message, and the other two parameters are not important.
Does anyone have an idea what I'm missing or doing wrong?
Actually, you can append to an HDFS file:
From the client's perspective, an append operation first calls append on DistributedFileSystem, which returns a stream object, FSDataOutputStream out. If the client needs to append data to the file, it calls out.write to write and out.close to close.
I checked the HDFS sources; there is a DistributedFileSystem#append method:
FSDataOutputStream append(Path f, final int bufferSize, final Progressable progress) throws IOException
For details, see presentation.
You can also append from the command line:
hdfs dfs -appendToFile <localsrc> ... <dst>
Add lines directly from stdin:
echo "Line-to-add" | hdfs dfs -appendToFile - <dst>
Solved..!!
Append is supported in HDFS.
You just need some configuration and simple code, as shown below:
Step 1: Set dfs.support.append to true in hdfs-site.xml:
<property>
<name>dfs.support.append</name>
<value>true</value>
</property>
Stop all your daemon services using stop-all.sh and restart them using start-all.sh.
Step 2 (optional): Only if you have a single-node cluster, set the replication factor to 1 as below:
From the command line:
./hdfs dfs -setrep -R 1 filepath/directory
Or you can do the same at run time from Java code, using FileSystem#setReplication:
fileSystem.setReplication(new Path(filePath), (short) 1);
Step 3: Code for creating the file or appending data to it:
public void createAppendHDFS() throws IOException {
    Configuration hadoopConfig = new Configuration();
    hadoopConfig.set("fs.defaultFS", hdfsuri);
    String filePath = "/test/doc.txt";
    Path hdfsPath = new Path(filePath);
    // try-with-resources closes each stream first and the FileSystem last
    try (FileSystem fileSystem = FileSystem.get(hadoopConfig)) {
        if (fileSystem.exists(hdfsPath)) {
            // Single-node cluster only (Step 2): lower the replication factor before appending
            fileSystem.setReplication(hdfsPath, (short) 1);
            try (FSDataOutputStream fileOutputStream = fileSystem.append(hdfsPath)) {
                fileOutputStream.writeBytes("appending into file. \n");
            }
        } else {
            try (FSDataOutputStream fileOutputStream = fileSystem.create(hdfsPath)) {
                fileOutputStream.writeBytes("creating and writing into file\n");
            }
        }
    }
}
Kindly let me know for any other help.
Cheers.!!
Older HDFS versions (or clusters with append disabled) do not allow append operations. One way to implement the same functionality as appending is:
1. Check whether the file exists.
2. If the file doesn't exist, create a new file and write to it.
3. If the file exists, create a temporary file.
4. Read each line from the original file and write that same line to the temporary file (don't forget the newline).
5. Write the lines you want to append to the temporary file.
6. Finally, delete the original file and move (rename) the temporary file to the original file's name.
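The steps above can be sketched roughly as follows. This is only an illustration against the local filesystem using java.nio.file; on HDFS the same steps would go through the Hadoop FileSystem API (exists, open, create, delete, rename) instead. The class and method names (RewriteAppend, appendLines) are made up for the example.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

final class RewriteAppend {
    // Hypothetical helper: emulates "append" by rewriting the whole file.
    static void appendLines(Path target, List<String> newLines) throws IOException {
        if (!Files.exists(target)) {
            // File does not exist: create it and write the new lines.
            Files.write(target, newLines, StandardCharsets.UTF_8);
            return;
        }
        // File exists: copy the old lines plus the new lines into a temporary file.
        Path dir = target.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, "append", ".tmp");
        try (BufferedWriter writer = Files.newBufferedWriter(temp, StandardCharsets.UTF_8)) {
            for (String line : Files.readAllLines(target, StandardCharsets.UTF_8)) {
                writer.write(line);
                writer.newLine();
            }
            for (String line : newLines) {
                writer.write(line);
                writer.newLine();
            }
        }
        // REPLACE_EXISTING removes the original and renames the temp file in one call.
        Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Keep in mind that this rewrites the entire file on every call, so it gets slower as the file grows.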
I have the code below, which reads a file from a particular directory, processes it, and then moves it to an archive directory. This works fine. I receive a new file every day, and a Control-M scheduler job runs this process.
On the next run, I read the new file from that directory again and compare it with the file in the archive directory; the file should be processed only if the content is different, otherwise nothing should happen. A shell script currently does this job, and we don't see any log output for it.
Now I want my Java code to log a message such as 'files are identical' when the file in that directory and the file in the archive directory are identical, but I don't know exactly how to do this. I don't want to write logic to process or move anything; I just need to check whether the files are equal and, if they are, produce a log message. The files I receive are not very big; the maximum size is about 10 MB.
Below is my code:
for(Path inputFile : pathsToProcess) {
// read in the file:
readFile(inputFile.toAbsolutePath().toString());
// move the file away into the archive:
Path archiveDir = Paths.get(applicationContext.getEnvironment().getProperty(".archive.dir"));
Files.move(inputFile, archiveDir.resolve(inputFile.getFileName()),StandardCopyOption.REPLACE_EXISTING);
}
return true;
}
private void readFile(String inputFile) throws IOException, FileNotFoundException {
log.info("Import " + inputFile);
try (InputStream is = new FileInputStream(inputFile);
Reader underlyingReader = inputFile.endsWith("gz")
? new InputStreamReader(new GZIPInputStream(is), DEFAULT_CHARSET)
: new InputStreamReader(is, DEFAULT_CHARSET);
BufferedReader reader = new BufferedReader(underlyingReader)) {
if (isPxFile(inputFile)) {
Importer.processField(reader, tablenameFromFilename(inputFile));
} else {
Importer.processFile(reader, tablenameFromFilename(inputFile));
}
}
log.info("Import Complete");
}
}
Given the limited information about file size and performance needs, something like this can be done. It may not be fully optimized; it is just an example. You may also need some exception handling in the main method, since the new method can throw an IOException:
import org.apache.commons.io.FileUtils; // Add this import statement at the top
// Moved this statement outside the for loop, as there is no need to fetch the archive directory path multiple times.
Path archiveDir = Paths.get(applicationContext.getEnvironment().getProperty("betl..archive.dir"));
for (Path inputFile : pathsToProcess) {
    // Added this check
    if (checkIfFileMatches(inputFile, archiveDir)) {
        // Log that the file is unchanged; nothing is processed or moved.
        log.info("Files are identical: " + inputFile);
    } else {
        // Only if the files do not match: read, process in the DB, and move the file over to the archive.
        readFile(inputFile.toAbsolutePath().toString());
        Files.move(inputFile, archiveDir.resolve(inputFile.getFileName()), StandardCopyOption.REPLACE_EXISTING);
    }
}
//Added this method to check if the source file and the target file contents are same.
// This will need an import of the FileUtils class. You may change the approach to use any other utility file, or read the data byte by byte and compare. If the files are very large, probably better to use Buffered file reader.
private boolean checkIfFileMatches(Path sourceFilePath, Path targetDirectoryPath) throws IOException {
if (sourceFilePath != null) { // may not need this check
File sourceFile = sourceFilePath.toFile();
String fileName = sourceFile.getName();
File targetFile = new File(targetDirectoryPath + "/" + fileName);
if (targetFile.exists()) {
return FileUtils.contentEquals(sourceFile, targetFile);
}
}
return false;
}
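If you would rather not add Commons IO, the byte-by-byte comparison mentioned in the comment above could look roughly like this. It is a sketch using only the JDK; the class and method names (FileCompare, sameContent) are made up for the example.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class FileCompare {
    // Hypothetical helper: true only if both files exist and contain identical bytes.
    static boolean sameContent(Path a, Path b) throws IOException {
        if (!Files.isRegularFile(a) || !Files.isRegularFile(b)) {
            return false;
        }
        if (Files.size(a) != Files.size(b)) {
            return false; // different lengths can never match
        }
        // Buffered streams keep the single-byte reads cheap.
        try (InputStream inA = new BufferedInputStream(Files.newInputStream(a));
             InputStream inB = new BufferedInputStream(Files.newInputStream(b))) {
            int byteA;
            while ((byteA = inA.read()) != -1) {
                if (byteA != inB.read()) {
                    return false;
                }
            }
            // Both streams should be exhausted at the same time.
            return inB.read() == -1;
        }
    }
}
```

For files of up to about 10 MB this should be fast enough; the size check avoids reading anything when the lengths already differ.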
I want to store an uploaded file in a specific location in Java. If I upload a.pdf, I want it stored in "/home/rahul/doc/upload/". I went through some questions and answers on Stack Overflow, but I am not satisfied with the solutions.
I am working with Play Framework 2.1.2, not with servlets.
The upload works, but the file is stored in the temp directory. I want it stored in a folder under its real name (for example a.pdf), not as a temp file.
public static Result upload() {
MultipartFormData body = request().body().asMultipartFormData();
FilePart filePart1 = body.getFile("filePart1");
File newFile1 = new File("path in computer");
File file1 = filePart1.getFile();
InputStream isFile1 = new FileInputStream(file1);
byte[] byteFile1 = IOUtils.toByteArray(isFile1);
FileUtils.writeByteArrayToFile(newFile1, byteFile1);
isFile1.close();
}
But I am not satisfied with this solution, and I am uploading multiple doc files.
For example, if I upload one document, ab.docx, it is stored in the temp directory
at this location: /tmp/multipartBody5886394566842144137asTemporaryFile
But I want this: /upload/ab.docx
Please suggest a way to fix this.
Everything's correct. As a last step you need to renameTo the temporary file into your upload folder. You don't need to play around with streams; it's as simple as:
public static Result upload() {
Http.MultipartFormData body = request().body().asMultipartFormData();
FilePart upload = body.getFile("picture");
if (upload != null) {
String targetPath = "/your/target/upload-dir/" + upload.getFilename();
upload.getFile().renameTo(new File(targetPath));
return ok("File saved in " + targetPath);
} else {
return badRequest("Something Wrong");
}
}
BTW, you should check whether targetPath already exists to prevent errors and/or overwrites. A typical approach is to increment the file name if a file with the same name already exists; for example, sending a.pdf three times should save the files as a.pdf, a_01.pdf, a_02.pdf, etc.
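The incrementing approach could be sketched like this, using only java.io.File. The names (UniqueName, nextFreeFile) are made up for the example, and the two-digit suffix simply follows the a_01.pdf pattern above.

```java
import java.io.File;

final class UniqueName {
    // Hypothetical helper: returns a File in dir that does not exist yet,
    // trying name.ext, name_01.ext, name_02.ext, and so on.
    static File nextFreeFile(File dir, String fileName) {
        int dot = fileName.lastIndexOf('.');
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;
        String ext = dot > 0 ? fileName.substring(dot) : "";
        File candidate = new File(dir, fileName);
        for (int i = 1; candidate.exists(); i++) {
            candidate = new File(dir, String.format("%s_%02d%s", base, i, ext));
        }
        return candidate;
    }
}
```

In the controller you would then rename the temporary file to nextFreeFile(new File("/your/target/upload-dir/"), upload.getFilename()). Note that two simultaneous uploads could still pick the same name between the check and the rename, so treat this as a starting point only.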
I just completed it, and my solution is working fine.
My solution for uploading multiple files is:
public static Result up() throws IOException {
    MultipartFormData body = request().body().asMultipartFormData();
    List<FilePart> resourceFiles = body.getFiles();
    for (FilePart picture : resourceFiles) {
        String prefix = FilenameUtils.getBaseName(picture.getFilename());
        String suffix = FilenameUtils.getExtension(picture.getFilename());
        File target = new File("/home/rahul/Documents/upload", prefix + "." + suffix);
        // try-with-resources closes both streams even if the copy fails
        try (InputStream input = new FileInputStream(picture.getFile());
             OutputStream output = new FileOutputStream(target)) {
            IOUtils.copy(input, output);
        }
        Logger.info("Uploaded file successfully saved in " + target.getAbsolutePath());
    }
    return ok("Files uploaded"); // placeholder response text
}
I'm using a library which wants a File() as an argument.
The file I want to pass it is one I want to package with my app, as part of the .jar
Is there any way to convert the JarEntry that I get from within my .jar to a File object I can pass?
If not and I have to copy the resource to disk temporarily, where's the best place to put the temporary file?
Thanks.
You cannot get a path to a file within a JARFile, only a stream, so you should extract it to the temporary directory and then pass that extracted file.
Here's a function I wrote to do this when I provided a db with a jar previously.
/**
* This method is responsible for extracting resource files from within the .jar to the temporary directory.
* @param filePath The filepath relative to the 'Resources/' directory within the .jar from which to extract the file.
* @return A file object to the extracted file
**/
public File extract(String filePath)
{
try
{
File f = File.createTempFile(filePath, null);
FileOutputStream resourceOS = new FileOutputStream(f);
byte[] byteArray = new byte[1024];
int i;
InputStream classIS = getClass().getClassLoader().getResourceAsStream("Resources/"+filePath);
//While the input stream has bytes
while ((i = classIS.read(byteArray)) > 0)
{
//Write the bytes to the output stream
resourceOS.write(byteArray, 0, i);
}
//Close streams to prevent errors
classIS.close();
resourceOS.close();
return f;
}
catch (Exception e)
{
System.out.println("An error has occurred while extracting the database. This may mean the program is unable to have any database interaction, please contact the developer.\nError Description:\n"+e.getMessage());
return null;
}
}
A File represents a real entry in the filesystem; a JarEntry doesn't exist on the file system. The mapping won't be there unless you extract the JAR entry to an actual file.
You can create a temp file using File.createTempFile. More details are available at this SO answer.
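As a shorter variant of the same idea, java.nio.file.Files.copy (Java 7+) can write the resource stream straight into a temp file. This is a minimal sketch; the class and method names (ResourceExtractor, toTempFile) are made up for the example.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class ResourceExtractor {
    // Hypothetical helper: copies a stream (for example one obtained from
    // getResourceAsStream) into a temp file and returns it as a File.
    static File toTempFile(InputStream in, String suffix) throws IOException {
        // Created in the default temp directory (java.io.tmpdir).
        Path temp = Files.createTempFile("resource", suffix);
        temp.toFile().deleteOnExit(); // remove the copy when the JVM exits normally
        // The temp file already exists, so REPLACE_EXISTING is required here.
        Files.copy(in, temp, StandardCopyOption.REPLACE_EXISTING);
        return temp.toFile();
    }
}
```

You would call it with something like getClass().getResourceAsStream("/Resources/mydb.db") (a placeholder resource name), ideally inside try-with-resources so the stream is closed, and check for null first in case the resource is missing.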
How can I combine all txt files in a folder into a single file? A folder usually contains hundreds to thousands of txt files.
If this program were only to be run on windows machines I would just go with a batch file containing something like
copy /b *.txt merged.txt
But that is not the case, so I figured it might be easier to just write it in Java to complement everything else we have.
I have written something like this
// Retrieves a list of files from the specified folder with the filter applied
File[] files = Utils.filterFiles(downloadFolder + folder, ".*\\.txt");
try
{
// savePath is the path of the output file
FileOutputStream outFile = new FileOutputStream(savePath);
for (File file : files)
{
FileInputStream inFile = new FileInputStream(file);
Integer b = null;
while ((b = inFile.read()) != -1)
outFile.write(b);
inFile.close();
}
outFile.close();
}
catch (Exception e)
{
e.printStackTrace();
}
But it takes several minutes to combine thousands of files so it is not feasible.
Use NIO, it is much easier than using inputstreams/outputstreams. Note: uses Guava's Closer, which means all resources are safely closed; even better would be to use Java 7 and try-with-resources.
final Closer closer = Closer.create();
final RandomAccessFile outFile;
final FileChannel outChannel;
try {
outFile = closer.register(new RandomAccessFile(dstFile, "rw"));
outChannel = closer.register(outFile.getChannel());
for (final File file: filesToCopy)
doWrite(outChannel, file);
} finally {
closer.close();
}
// doWrite method
private static void doWrite(final WritableByteChannel channel, final File file)
throws IOException
{
final Closer closer = Closer.create();
final RandomAccessFile inFile;
final FileChannel inChannel;
try {
inFile = closer.register(new RandomAccessFile(file, "r"));
inChannel = closer.register(inFile.getChannel());
inChannel.transferTo(0, inChannel.size(), channel);
} finally {
closer.close();
}
}
Because of this
Integer b = null;
while ((b = inFile.read()) != -1)
outFile.write(b);
Your OS is making a lot of IO calls. read() only reads one byte of data. Use the other read methods that accept a byte[]. You can then use that byte[] to write to your OutputStream. Similarly write(int) does an IO call writing a single byte. Change that too.
Of course, you can look into tools that do this for you, like Apache Commons IO or even the Java 7 NIO package.
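A buffered version of the loop from the question might look like this. It is a sketch using plain JDK streams; the names (MergeFiles, merge) and the 8 KiB buffer size are arbitrary choices for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class MergeFiles {
    // Hypothetical helper: concatenates the inputs into output, one buffer at a time.
    static void merge(List<Path> inputs, Path output) throws IOException {
        byte[] buffer = new byte[8192];
        try (OutputStream out = Files.newOutputStream(output)) {
            for (Path input : inputs) {
                try (InputStream in = Files.newInputStream(input)) {
                    int n;
                    // read(byte[]) returns the number of bytes read, or -1 at end of file
                    while ((n = in.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                }
            }
        }
    }
}
```

Each read and write now moves up to 8192 bytes per call instead of one, which is where most of the time in the original loop goes.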
Try using BufferedReader and BufferedWriter instead of writing bytes one by one.
You can use IOUtils (Apache Commons IO) to merge files; the IOUtils.copy() method will help you with merging.
This link may be useful merging file in java
I would do it this way !
check for the OS
System.getProperty("os.name")
Run the System Level command from Java.
If windows
copy /b *.txt merged.txt
if Unix
cat *.txt > merged.txt
or whatever best System level command available.
The code below was written to read all files in, and send the data to another method (setOutput()), and then call a method to rename the last read file to another directory, and then delete the original. Everything seems to work up until the smdrCleanup() method is called. The renameTo() is failing.
From what I understand, if a FileReader is wrapped in a BufferedReader, I only need to call BufferedReader.close() to release the last read file... which I am doing here.
I have also seen where if the file were still "open", being scanned by anti-virus programs, or otherwise locked by a process, the renameTo() function would fail. I have used Process Explorer to review what may have it locked, and I don't see anything locking it.
I have my method setup to throw any kind of IOExceptions, but I am not getting any exceptions. Everything runs, but the console merely says that the file was not copied.
I am running Eclipse Helios Release 2, Windows 7 Ultimate, local administrator, UAC disabled.
Any help would be greatly appreciated.
public void smdrReader(String path, String oldPath) throws IOException
{
output = null; //nullify the value of output to avoid duplicate data
File folder = new File(path); //setting the directory for raw data files
File[] listOfFiles = folder.listFiles(); //array of files within the folder/directory
//For loop to iterate through the available files, open, & read contents to String Buffer.
for (int i = 0; i < listOfFiles.length; i++)
{
if (listOfFiles[i].isFile()) //verifying next entry in array is a file
{
File fileName = new File(listOfFiles[i].getName());//getting file name from array iteration
StringBuffer fileData = new StringBuffer(2048);//establishing StringBuffer for reading file contents into
BufferedReader reader = new BufferedReader(new FileReader(path + fileName));//reader to actually access/read file
String readData = String.valueOf(reader);//String variable being set to value of the reader
fileData.append(readData);//appending data from String variable into StringBuffer variable
setOutput(fileData);//setting the value of "output" to the value of StringBuffer
fileData.delete(0, i);
reader.close();//closing the reader (closing the file)
smdrCleanup(oldPath,fileName.toString());//calling method to move processed file and delete original
}
}
}
//method to rename input file into another directory and delete the original file to avoid duplicate processing
public void smdrCleanup(String oldPathPassed, String fileNamePassed) throws IOException
{
File oldFile = new File(oldPathPassed);//establishing File object for path to processed folder
File fileName = new File(fileNamePassed);//establishing File object for the fileName to rename/delete
String oldFilePath = oldFile.toString();
boolean success = fileName.renameTo(new File(oldFilePath + "\\" + fileName + ".old"));//attempting to rename file
if (!success) //checking the success of the file rename operation
{
System.out.println("The File Was NOT Successfully Moved!");//reporting error if the file cannot be renamed
}
else
{
fileName.delete();//deleting the file if it was successfully renamed
}
}
oldFile.toString(); returns the full path of the file including its file name, so if your old file path is c:\path\to\file.txt, oldFilePath + "\\" + fileName + ".old" will be c:\path\to\file.txt\file.txt.old.
Since there is no folder c:\path\to\file.txt, it fails. change it to
boolean success = fileName.renameTo(new File(oldFilePath + ".old"));
And you should be good to go.
File.renameTo can fail for any number of reasons:
Many aspects of the behavior of this method are inherently platform-dependent: The rename operation might not be able to move a file from one filesystem to another, it might not be atomic, and it might not succeed if a file with the destination abstract pathname already exists. The return value should always be checked to make sure that the rename operation was successful.
But there's not much feedback on why it fails. Before calling renameTo, verify that the file you're moving exists, and the parent directory you are moving it to also exists, and canWrite(). Are these on the same disk volume? If not, it might fail.
EDIT: Code sample added
Try something like the following. Modifications:
Accepts File objects instead of Strings
Uses 2-arg File constructor to create a child File object in a parent directory
Better error checking
This might give you some clues into what is failing.
public void smdrCleanup(File oldPathPassed, File fileNamePassed) throws IOException {
if (!oldPathPassed.exists() || !oldPathPassed.isDirectory() || !oldPathPassed.canWrite() ) throw new IOException("Dest is not a writable directory: " + oldPathPassed);
if (!fileNamePassed.exists()) throw new IOException("File does not exist: " + fileNamePassed);
final File dest = new File(oldPathPassed, fileNamePassed.getName() + ".old"); // use only the file name, not its full path
if (dest.exists()) throw new IOException("File already exists: " + dest);
boolean success = (fileNamePassed).renameTo(dest);//attempting to rename file
if (!success) //checking the success of the file rename operation
{
throw new IOException("The File Was NOT Successfully Moved!");
} else {
// file was successfully renamed, no need to delete
}
}
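If you are on Java 7 or later, another option is java.nio.file.Files.move, which throws an exception describing the failure instead of just returning false. A minimal sketch, with made-up names (MoveProcessed, moveToProcessed):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class MoveProcessed {
    // Hypothetical helper: moves source into destDir as "<name>.old".
    // Files.move throws a specific IOException subclass (for example
    // NoSuchFileException or FileAlreadyExistsException) when it fails.
    static Path moveToProcessed(Path source, Path destDir) throws IOException {
        Path dest = destDir.resolve(source.getFileName().toString() + ".old");
        return Files.move(source, dest);
    }
}
```

Printing the exception should then tell you which path was the problem and why, which is exactly the feedback renameTo does not give you.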