I have to archive HDFS files frequently, and the files have to be compressed in bzip2 format using Java code. Currently I do the following:
Move the input files to a local location with hdfs.moveToLocalFile.
Compress them with the bzip2 command.
Move the .bz2 files to another HDFS location with hdfs.moveFromLocalFile.
I'm using Hadoop version 1.1.2. Is there any API to bzip the files directly, without the local copy?
Also, I'm currently using the Linux shell command to bzip the files. Can somebody show me how to do the bzip2 compression in Java code?
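import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.BZip2Codec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.util.ReflectionUtils;
// Compresses an HDFS file with bzip2 into <destination>/<name>.bz2,
// streaming entirely inside HDFS (no local copy).
public void addFile(String source, String destination, Configuration conf) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    // File name part of the source, e.g. "data.txt" from "/in/data.txt"
    String fileName = source.substring(source.lastIndexOf('/') + 1);
    if (destination.charAt(destination.length() - 1) != '/') {
        destination = destination + "/" + fileName;
    } else {
        destination = destination + fileName;
    }
    // ReflectionUtils hands the Configuration to the codec if it needs one
    BZip2Codec codec = ReflectionUtils.newInstance(BZip2Codec.class, conf);
    Path target = new Path(destination + codec.getDefaultExtension()); // appends ".bz2"
    CompressionOutputStream out = codec.createOutputStream(fs.create(target));
    // 'true' closes both streams when the copy finishes
    IOUtils.copyBytes(fs.open(new Path(source)), out, 4096, true);
}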
I have this test:
@Test
void testHeader() {
String inputFile = ".\\src\\main\\resources\\binaryFile";
MDHeader addHeader = new MDHeader();
try (
InputStream inputStream = new FileInputStream(inputFile);
) {
long fileSize = new File(inputFile).length();
byte[] allBytes = new byte[(int) fileSize];
inputStream.read(allBytes);
ProducerRecord<String, byte[]> record = new ProducerRecord<String, byte[]>("foo", allBytes);
ProducerRecord<String, byte[]> hdr = addHeader.addMDHeader(record);
for (Header header : hdr.headers()) {
assertEquals("mdpHeader", header.key());
}
}
catch(Exception e) {
assert (false);
}
}
The test succeeds when run locally through Eclipse on my Windows desktop, but it fails at com.me.be.HeaderTests.testMDHeader(HeaderTests.java:81) when the jar is built on a Linux server. That's the line assert (false). I don't have more information on the issue yet, but could it be the backslashes in inputFile in a Linux environment?
Java on both Windows and Linux accepts / as a path separator, whereas on Linux \ is not a separator at all, so the whole string is treated as ONE file name rather than the separate directories you'd expect. Use forward slashes instead:
String inputFile = "./src/main/resources/binaryFile";
However, for file handling it is better to use java.nio.file.Path or java.io.File instead of String.
Windows:
jshell> Path.of("./src/main/resources/binaryFile")
$2 ==> .\src\main\resources\binaryFile
Linux:
jshell> Path.of("./src/main/resources/binaryFile")
$1 ==> ./src/main/resources/binaryFile
You can also pass each component to Path.of separately, which avoids writing any separator on any OS:
Path p = Path.of("src","main","resources","binaryFile");
The File.separator string is handy for concatenating an OS-independent file path:
String inputFile = "." + File.separator + "src" + File.separator + "main" + File.separator + "resources" + File.separator + "binaryFile";
This should give you a cross-platform file path.
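To tie this back to the test, here is a minimal sketch (the class and method names are hypothetical) that builds the resource path from separate components and reads it with Files.readAllBytes, which, unlike a single InputStream.read call, always reads the whole file. It uses Paths.get, the Java 7+ equivalent of Path.of, and writes into a temporary directory only so the demo runs anywhere:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ResourcePathDemo {
    // Builds base/src/main/resources/binaryFile from separate components;
    // the default file system inserts the platform's own separator.
    public static Path resourceFile(Path base) {
        return base.resolve(Paths.get("src", "main", "resources", "binaryFile"));
    }

    // Reads the whole file; a single InputStream.read(byte[]) call may
    // return before the array is full.
    public static byte[] readAll(Path file) throws IOException {
        return Files.readAllBytes(file);
    }

    public static void main(String[] args) throws IOException {
        // Throwaway directory so the demo does not depend on the working directory
        Path base = Files.createTempDirectory("pathdemo");
        Path file = resourceFile(base);
        Files.createDirectories(file.getParent());
        Files.write(file, new byte[] {1, 2, 3});

        System.out.println(readAll(file).length); // prints 3
        // Components are joined with File.separator on every OS
        System.out.println(Paths.get("src", "main").toString()
                .equals("src" + File.separator + "main")); // prints true
    }
}
```

In the test itself, resourceFile(Paths.get("")) would give the relative path src/main/resources/binaryFile, assuming the build runs from the project root.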
I'm trying to read a resource file from my application, but it doesn't work.
Code:
String filename = getClass().getClassLoader().getResource("test.xsd").getFile();
System.out.println(filename);
File file = new File(filename);
System.out.println(file.exists());
Output when I execute the jar-file:
file:/C:/Users/username/Repo/run/Application.jar!/test.xsd
false
It works when I run the application from IntelliJ, but not when I execute the jar file. If I open my jar file with 7-Zip, test.xsd is located in the root folder. Why isn't the code working when I execute the jar file?
Also, a File refers to an actual file in the OS file system. There, only the jar file exists, and a jar file is not a folder. You should either extract the contents of the URL to a temporary file, or work with its bytes in memory or as a stream (for example via getResourceAsStream).
Note that myURL.getFile() returns a String, not an actual File. In the same way, this will not work:
File f = new File(new URL("http://www.example.com/docs/resource1.html").getFile());
f.exists(); // false: the path is not found in the local file system
A nice wrapper could be the following:
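import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
// Copies a classpath resource to a new temp file and returns that file.
public static File openResourceAsTempFile(ClassLoader loader, String resourceName)
        throws IOException {
    Path tmpPath = Files.createTempFile(null, null);
    try (InputStream is = loader.getResourceAsStream(resourceName)) {
        if (is == null) {
            throw new FileNotFoundException("Resource not found: " + resourceName);
        }
        Files.copy(is, tmpPath, StandardCopyOption.REPLACE_EXISTING);
        return tmpPath.toFile();
    } catch (Exception e) {
        // Don't leave an empty temp file behind on failure
        Files.deleteIfExists(tmpPath);
        throw new IOException("Could not create temp file '" + tmpPath
                + "' for resource '" + resourceName + "': " + e, e);
    }
}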
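For the test.xsd case, the stream route mentioned above avoids java.io.File altogether. A minimal sketch (the class and method names are hypothetical) that reads a classpath resource fully into a String, assuming the resource is UTF-8 text:

```java
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ResourceReader {
    // Works both from the IDE and from inside a jar, because the resource
    // is read as a stream and never treated as a java.io.File.
    public static String readResource(ClassLoader loader, String name) throws IOException {
        try (InputStream in = loader.getResourceAsStream(name)) {
            if (in == null) {
                throw new FileNotFoundException("Resource not found: " + name);
            }
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            // Copy until end of stream
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

From the question's class this would be called as ResourceReader.readResource(getClass().getClassLoader(), "test.xsd"). If the goal is to load the schema, SchemaFactory.newSchema(URL) also accepts the URL returned by getResource directly, so no File is needed.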
I need to process a high volume of resumes and want to use this parser:
https://github.com/antonydeepak/ResumeParser
But it is run from PowerShell, passing the file to read and the output file as arguments.
I don't know how to automate this so that it reads a whole folder of resumes.
I know some Java, but can't open the code. Is scripting in PowerShell the way to go?
Thanks!
> java -cp '.\bin\*;..\GATEFiles\lib\*;..\GATEFILES\bin\gate.jar;.\lib\*'
code4goal.antony.resumeparser.ResumeParserProgram <input_file> [output_file]
Either make a batch file from an edited directory listing, or write a program.
As this is Stack Overflow, here is the program route:
Starting with the same classpath (-cp ...), you can run your own program:
import java.io.File;
import code4goal.antony.resumeparser.ResumeParserProgram;

public class ResumeBatch { // hypothetical class name
    // 'throws Exception' in case ResumeParserProgram.main declares checked exceptions
    public static void main(String[] args) throws Exception {
        File[] files = new File("C:/resumes").listFiles();
        File outputDir = new File("C:/results");
        outputDir.mkdirs();
        if (files != null) {
            for (File file : files) {
                String path = file.getPath();
                if (path.endsWith(".pdf")) {
                    // C:/results/<name without extension>.json
                    String output = new File(outputDir,
                            file.getName().replaceFirst("\\.\\w+$", "") + ".json").getPath();
                    String[] params = {path, output};
                    ResumeParserProgram.main(params);
                    // For creating a batch file instead, redirect this output: >x.bat
                    System.out.println("java -cp"
                            + " \".\\bin\\*;..\\GATEFiles\\lib\\*;"
                            + "..\\GATEFILES\\bin\\gate.jar;.\\lib\\*\""
                            + " code4goal.antony.resumeparser.ResumeParserProgram"
                            + " \"" + path + "\" \"" + output + "\"");
                }
            }
        }
    }
}
Check that this works, i.e. that ResumeParserProgram.main is re-entrant: it can be called repeatedly in the same JVM and does not call System.exit.
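An alternative sketch sidesteps the re-entrancy question by starting a separate JVM per resume with ProcessBuilder. It assumes java is on the PATH and that the program is started from the parser's directory, like the PowerShell command, so the relative classpath entries resolve; the class name and its two arguments (input folder, output folder) are hypothetical:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ResumeBatchRunner {
    // Classpath and main class copied from the question's PowerShell command (Windows separators)
    static final String CLASSPATH =
            ".\\bin\\*;..\\GATEFiles\\lib\\*;..\\GATEFILES\\bin\\gate.jar;.\\lib\\*";
    static final String MAIN_CLASS = "code4goal.antony.resumeparser.ResumeParserProgram";

    // Lists the *.pdf files directly inside dir (not recursive)
    public static List<Path> listPdfs(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.pdf")) {
            for (Path p : stream) {
                result.add(p);
            }
        }
        return result;
    }

    // One argument per list element; ProcessBuilder needs no shell quoting
    public static List<String> buildCommand(String classpath, String mainClass,
                                            Path input, Path output) {
        return Arrays.asList("java", "-cp", classpath, mainClass,
                input.toString(), output.toString());
    }

    // Hypothetical usage: java ResumeBatchRunner C:\resumes C:\results
    public static void main(String[] args) throws IOException, InterruptedException {
        Path inputDir = Paths.get(args[0]);
        Path outputDir = Paths.get(args[1]);
        Files.createDirectories(outputDir);
        for (Path pdf : listPdfs(inputDir)) {
            // resume.pdf -> resume.json, as in the program above
            String name = pdf.getFileName().toString().replaceFirst("\\.\\w+$", "") + ".json";
            Process process = new ProcessBuilder(
                    buildCommand(CLASSPATH, MAIN_CLASS, pdf, outputDir.resolve(name)))
                    .inheritIO() // show the parser's output in this console
                    .start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                System.err.println("Parser failed (exit " + exitCode + ") for " + pdf);
            }
        }
    }
}
```

A crash or System.exit inside the parser then only ends that one child process, at the cost of a JVM start-up per file.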
I'm having problems with my program on Windows, so I added logging to find the specific cause. My program is a JavaFX application, and to run it on Windows I build it as a .jar file.
I'm setting up a log4j FileAppender in code; the config file (.../MyProject/data/configuration.txt) contains the path of the log folder. On Mac OS X (debugged with Eclipse) everything works fine.
If I start the jar on Windows (.../MyProject/build/dist/MyProgram.jar) and look in the configured log folder, I don't see any log file. (I figured out that the config file then has to be under .../MyProject/build/dist/data/configuration.txt.) If I add new subfolders to the log directory's path, the program creates them, but there's no file!
My code:
String computername = "";
try {
computername = InetAddress.getLocalHost().getHostName();
} catch (UnknownHostException e) {
e.printStackTrace();
}
int tid = (int)Thread.currentThread().getId();
PatternLayout playout = new PatternLayout("%d{yyyy-MM-dd'T'HH:mm:ss.SSSZ}; %p; %F:%L; " + computername + "; " + tid + "; [%t]; %m;%n");
SimpleDateFormat dt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss");
Date datenow = new Date();
datenow.setTime(datenow.getTime());
FcManagerMain.formatteddate = dt.format(datenow);
try
{
File config = new File ("data/configuration.txt");
BufferedReader bufferedReader = new BufferedReader(new FileReader("data/configuration.txt"));
int count = 1;
String line = null;
while((line = bufferedReader.readLine()) != null)
{
if(count == 3)
{
FcManagerMain.logfolder = line;
}
count++;
}
if(isWindows())
{
File p = new File(FcManagerMain.logfolder + FcManagerMain.version);
p.mkdirs();
File pp = new File(FcManagerMain.logfolder + FcManagerMain.version + "\\" + FcManagerMain.formatteddate + ".log");
pp.createNewFile();
FileAppender fileAppender = new FileAppender(playout, pp.getAbsolutePath(), false);
loggerstatic.addAppender(fileAppender);
loggerstatic.setLevel(Level.ALL);
}
else
{
File p = new File(FcManagerMain.logfolder + FcManagerMain.version);
p.mkdirs();
File pp = new File(FcManagerMain.logfolder + FcManagerMain.version + "/" + FcManagerMain.formatteddate + ".log");
pp.createNewFile();
FileAppender fileAppender = new FileAppender(playout, pp.getAbsolutePath(), false);
loggerstatic.addAppender(fileAppender);
loggerstatic.setLevel(Level.ALL);
}
}
catch(Exception ee)
{
System.out.println(getStackTrace(ee));
}
loggerstatic is a static Logger instance (FcManagerController.loggerstatic), and every other class takes its logger from loggerstatic. I suspect that's not correct; please tell me how to do it properly!
EDIT: I have already tried different log locations to rule out missing write permissions on specific folders.
Thanks,
rapgru
I'd rather recommend using the built-in configuration, i.e. adapting log4j.properties.
Read the documentation, especially the usage of ${log} here:
# Set the name of the file
log4j.appender.FILE.File=${log}/log.out
You can declare your own variable with a default value and override it with a JVM system property (-D) when you start your app:
# log directory (default, 'log' directory relative to the execution path)
# you can override the default directory by setting the variable as VM option when you start the application
# example: -DMYAPP_LOG_DIR=/tmp/logs
MYAPP_LOG_DIR=./log
...
log4j.appender.debugfile.file=${MYAPP_LOG_DIR}/application.log
Then, when you start your application and don't want the log file in the execution folder, just specify a different path via the -D option.
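Separately from the configuration route, one detail in the question's code is worth checking: the log file name is built with the pattern yyyy-MM-dd'T'HH:mm:ss, and Windows does not allow ':' in file names, which may be why the folders appear but no file does on Windows. A minimal sketch of a file-name-safe variant (the class and method names are hypothetical); it also uses File(parent, child), which inserts the correct separator, so no isWindows() branch is needed:

```java
import java.io.File;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class LogFileName {
    // Same layout as the question's pattern, but with '-' instead of ':'
    // in the time part, since ':' is not allowed in Windows file names.
    public static String timestampForFileName(Date date) {
        return new SimpleDateFormat("yyyy-MM-dd'T'HH-mm-ss", Locale.ROOT).format(date);
    }

    // <logFolder>/<version>/<timestamp>.log; works whether or not
    // logFolder ends with a separator.
    public static File logFile(String logFolder, String version, Date date) {
        File dir = new File(logFolder, version);
        return new File(dir, timestampForFileName(date) + ".log");
    }
}
```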
Here is what I want to do:
Check if a folder exists
If it does not exist, create the folder
If it does exist, do nothing
Finally, create a file in that folder
Everything works fine on Windows 7, but when I run the application on Ubuntu it doesn't create the folder; it just creates a file with the folder name. For example, my file name is xxx.xml and the folder is d:\temp, so on Ubuntu the file is generated at d: with the name temp\xxx.xml. Here is my code:
File folder = new File("D:\\temp");
if (folder.exists() && folder.isDirectory()) {
} else {
folder.mkdir();
}
String filePath = folder + File.separator;
File file = new File(filePath + "xxx.xml");
StreamResult result = new StreamResult(file);
transformer.transform(source, result);
// more code here
Linux does not use drive letters (like D:) and uses forward slashes as the file separator.
You can do something like this:
File folder = new File("/path/name/of/the/folder");
folder.mkdirs(); // this will also create parent directories if necessary
File file = new File(folder, "filename");
StreamResult result = new StreamResult(file);
Your directory (D:\temp) is not appropriate on Linux.
Consider using a Linux file system path and the File.separator constant:
String os = System.getProperty("os.name").toLowerCase();
String root;
if (os.contains("win")) {
    root = "D:\\temp";
} else {
    root = "/tmp";
}
// File(parent, child) inserts the separator between root and the rest
File folder = new File(root, "dir1" + File.separator + "dir2");
if (!folder.isDirectory()) {
    folder.mkdirs(); // also creates dir1 if it is missing
}
I haven't tried it, but it should work.
D:\temp does not exist on Linux systems (more precisely, it is interpreted as just another file name).
On Linux systems the file separator is / instead of \ as on Windows,
so the solution is:
File folder = new File("/tmp");
instead of
File folder = new File("D:\\temp");
Before Java 7, the File API already offered ways to create a temporary file (File.createTempFile) that use the operating system's configuration (such as temp files on a RAM disk). Since Java 7, use the Files utility class.
Consider handling both systems with the static System.getProperty method:
String os = System.getProperty("os.name").toLowerCase(); // e.g. "linux", "windows 10"
File folder;
if (os.contains("nix") || os.contains("nux") || os.contains("aix")) { // Unix
    folder = new File("/home/tmp");
} else if (os.contains("win")) { // Windows
    folder = new File("D:\\temp");
} else {
    throw new IllegalStateException("your message");
}
Unix-like systems have no drive letters. You can try creating the folder under /tmp or /home.
The code below checks whether a directory can be created in your home directory:
String myPathCandidate = System.getProperty("os.name").equals("Linux")? System.getProperty("user.home"):"D:\\";
System.out.println(myPathCandidate);
//Check write permissions
File folder = new File(myPathCandidate);
if (folder.exists() && folder.isDirectory() && folder.canWrite()) {
System.out.println("Create directory here");
} else {
    System.out.println("Wrong path");
}
Or use /tmp, the system temp directory, where most users can write:
String myPathCandidate = System.getProperty("os.name").equals("Linux")? System.getProperty("java.io.tmpdir"):"D:\\";
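Building on the java.io.tmpdir idea, a minimal sketch that skips the OS check entirely: the JVM already knows the platform's temp directory, so resolving the folder against it works on both Windows and Ubuntu. The class and folder names are hypothetical; swap in "user.home" for a folder under the home directory:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PortableFolder {
    // Resolves a subfolder of java.io.tmpdir (typically /tmp on Linux,
    // the user's TEMP directory on Windows) and creates it if needed.
    public static Path appFolder(String name) throws IOException {
        Path folder = Paths.get(System.getProperty("java.io.tmpdir"), name);
        // Creates missing parents too; does nothing if the folder already exists
        return Files.createDirectories(folder);
    }
}
```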
Since Java 7, you can use the Files utility class, with the new Path class. Note that exception handling has been omitted in the examples below.
// uses the OS separator for path/to/file.
Path file = Paths.get("path","to","file");
// this creates directories in case they don't exist
Files.createDirectories(file.getParent());
if (!Files.exists(file)) {
Files.createFile(file);
}
StreamResult result = new StreamResult(file.toFile());
transformer.transform(source, result);
This covers the generic case: create a folder if it doesn't exist, and a file in that folder.
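Put together for the question's scenario, a minimal runnable sketch (hypothetical class and method names; a tiny generated DOM stands in for the question's source) that creates the folder if it is missing and then writes the XML file into it. It writes through a try-with-resources OutputStream so the file is flushed and closed when transform returns:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class XmlToFolder {
    // Creates folder (and any missing parents) if needed, then writes doc to
    // folder/fileName with an identity Transformer. Returns the written file.
    public static Path writeXml(Document doc, Path folder, String fileName)
            throws IOException, TransformerException {
        Files.createDirectories(folder); // does nothing if it already exists
        Path file = folder.resolve(fileName);
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        try (OutputStream out = Files.newOutputStream(file)) {
            transformer.transform(new DOMSource(doc), new StreamResult(out));
        }
        return file;
    }

    // Hypothetical stand-in for the question's 'source': one empty element
    public static Document sampleDocument() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("root"));
        return doc;
    }
}
```

For example, writeXml(doc, Paths.get(System.getProperty("java.io.tmpdir"), "temp"), "xxx.xml") writes the file without any drive letter or OS check.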
If you actually want to create a temporary file, as your example suggests, you just need to do the following:
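// this creates a temporary file in the system's default temp folder
// (needs java.io.OutputStream and java.nio.file.StandardOpenOption)
Path tempFile = Files.createTempFile("xxx", ".xml");
try (OutputStream out = Files.newOutputStream(tempFile, StandardOpenOption.CREATE,
        StandardOpenOption.APPEND, StandardOpenOption.DELETE_ON_CLOSE)) {
    transformer.transform(source, new StreamResult(out));
}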
Note that with this method the file name will not be exactly the prefix you used (xxx, in this case); a generated part is inserted between the prefix and the suffix.
Still, given that it's a temp file, that shouldn't matter. DELETE_ON_CLOSE makes a best-effort attempt to delete the file when the stream is closed.