How to use CompressionCodec in Hadoop

I am doing the following to compress the output files from my reducer:
OutputStream out = ipFs.create( new Path( opDir + "/" + fileName ) );
CompressionCodec codec = new GzipCodec();
OutputStream cs = codec.createOutputStream( out );
BufferedWriter cout = new BufferedWriter( new OutputStreamWriter( cs ) );
cout.write( ... )
But I get a NullPointerException at line 3 (the createOutputStream call):
java.lang.NullPointerException
at org.apache.hadoop.io.compress.zlib.ZlibFactory.isNativeZlibLoaded(ZlibFactory.java:63)
at org.apache.hadoop.io.compress.GzipCodec.createOutputStream(GzipCodec.java:92)
at myFile$myReduce.reduce(myFile.java:354)
I also found a JIRA issue describing the same problem.
Can you please tell me if I am doing something wrong?

You should use the CompressionCodecFactory if you want to use compression outside of the standard OutputFormat handling (as detailed in @linker's answer):
CompressionCodecFactory ccf = new CompressionCodecFactory(conf);
CompressionCodec codec = ccf.getCodecByClassName(GzipCodec.class.getName());
OutputStream compressedOutputStream = codec.createOutputStream(outputStream);

You're doing it wrong. The standard way to do this would be:
TextOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
GzipCodec is a Configurable, so you have to initialize it properly (setConf, ...) if you instantiate it directly.
Try this and let me know if that works.

Related

How to read from PDF using Selenium webdriver and Java

I am trying to read the contents of a PDF file using Java-Selenium. Below is my code. getWebDriver is a custom method in the framework. It returns the webdriver.
URL urlOfPdf = new URL(this.getWebDriver().getCurrentUrl());
BufferedInputStream fileToParse = new BufferedInputStream(urlOfPdf.openStream());
PDFParser parser = new PDFParser((RandomAccessRead) fileToParse);
parser.parse();
String output = new PDFTextStripper().getText(parser.getPDDocument());
The third line of the code gives a compile-time error if I don't cast the stream to the RandomAccessRead type.
And when I cast it, I get this runtime error:
java.lang.ClassCastException: java.io.BufferedInputStream cannot be cast to org.apache.pdfbox.io.RandomAccessRead
I need help with getting rid of these errors.

First off, unless you want to interfere in the PDF loading process, there is no need to explicitly use the PDFParser class. You can instead use the static PDDocument.load method:
URL urlOfPdf = new URL(this.getWebDriver().getCurrentUrl());
BufferedInputStream fileToParse = new BufferedInputStream(urlOfPdf.openStream());
PDDocument document = PDDocument.load(fileToParse);
String output = new PDFTextStripper().getText(document);
Otherwise, if you do want to interfere in the loading process, you have to create a RandomAccessRead instance for your BufferedInputStream. You cannot simply cast it, because the two classes are not related.
You can do that like this:
URL urlOfPdf = new URL(this.getWebDriver().getCurrentUrl());
BufferedInputStream fileToParse = new BufferedInputStream(urlOfPdf.openStream());
MemoryUsageSetting memUsageSetting = MemoryUsageSetting.setupMainMemoryOnly();
ScratchFile scratchFile = new ScratchFile(memUsageSetting);
PDFParser parser;
try
{
    RandomAccessRead source = scratchFile.createBuffer(fileToParse);
    parser = new PDFParser(source);
    parser.parse();
}
catch (IOException ioe)
{
    IOUtils.closeQuietly(scratchFile);
    throw ioe;
}
String output = new PDFTextStripper().getText(parser.getPDDocument());
(This essentially is copied and pasted from the source of PDDocument.load.)

Reading from property file containing utf 8 character

I am reading a property file which consists of a message in the UTF-8 character set.
Problem
The output is not in the appropriate format. I am using an InputStream.
The property file looks like
username=LBSUSER
password=Lbs#123
url=http://localhost:1010/soapfe/services/MessagingWS
timeout=20000
message=Spanish character are = {á é í, ó,ú ,ü, ñ, ç, å, Á, É, Í, Ó, Ú, Ü, Ñ, Ç, ¿, °, 4° año = cuarto año, €, ¢, £, ¥}
And I am reading the file like this,
Properties props = new Properties();
props.load(new FileInputStream("uinsoaptest.properties"));
String username = props.getProperty("username", "test");
String password = props.getProperty("password", "12345");
String url = props.getProperty("url", "12345");
int timeout = Integer.parseInt(props.getProperty("timeout", "8000"));
String messagetext = props.getProperty("message");
System.out.println("This is soap msg : " + messagetext);
The output of the above message is
You can see the message in the console after the line
{************************ SOAP MESSAGE TEST***********************}
I will be obliged if I can get any help reading this file properly. I can read this file with another approach but I am looking for less code modification.

Use an InputStreamReader with Properties.load(Reader reader):
FileInputStream input = new FileInputStream(new File("uinsoaptest.properties"));
props.load(new InputStreamReader(input, Charset.forName("UTF-8")));
As a method, this may resemble the following:
private Properties read( final Path file ) throws IOException {
    final var properties = new Properties();
    try( final var in = new InputStreamReader(
            new FileInputStream( file.toFile() ), StandardCharsets.UTF_8 ) ) {
        properties.load( in );
    }
    return properties;
}
Don't forget to close your streams. Java 7 introduced StandardCharsets.UTF_8.
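To see why the Reader overload matters: Properties.load(InputStream) always decodes the bytes as ISO-8859-1, while Properties.load(Reader) uses whatever charset the Reader was created with. The following is only a minimal sketch of that difference. It reads from an in-memory byte array rather than uinsoaptest.properties, and the class and method names (PropertiesEncodingDemo, loadAsLatin1, loadAsUtf8) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesEncodingDemo {

    // Uses Properties.load(InputStream), which decodes the bytes as ISO-8859-1.
    public static Properties loadAsLatin1(byte[] data) {
        Properties props = new Properties();
        try (ByteArrayInputStream in = new ByteArrayInputStream(data)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Uses Properties.load(Reader) with a Reader that decodes the bytes as UTF-8.
    public static Properties loadAsUtf8(byte[] data) {
        Properties props = new Properties();
        try (Reader in = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // \u00f1 is the letter n with tilde; the bytes below are its UTF-8 encoding.
        byte[] data = "message=cuarto a\u00f1o\n".getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8 reader:  " + loadAsUtf8(data).getProperty("message"));
        System.out.println("Plain stream:  " + loadAsLatin1(data).getProperty("message"));
    }
}
```

With the plain stream, each byte of a multi-byte UTF-8 character is turned into a separate character, which is the kind of garbled output described in the question.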

Use props.load(new FileReader("uinsoaptest.properties")) instead. By default FileReader uses the encoding Charset.forName(System.getProperty("file.encoding")), which can be set to UTF-8 with the command-line parameter -Dfile.encoding=UTF-8. Calling System.setProperty("file.encoding", "UTF-8") at runtime is not a reliable alternative, because the JVM determines its default charset at startup.

If somebody uses the @Value annotation, they could try StringUtils.
@Value("${title}")
private String pageTitle;
public String getPageTitle() {
    return StringUtils.toEncodedString(pageTitle.getBytes(Charset.forName("ISO-8859-1")), Charset.forName("UTF-8"));
}

You should specify the UTF-8 encoding when you read the file. FileInputStream has no constructor that takes a charset, so wrap it in an InputStreamReader and pass that to Properties.load:
new InputStreamReader(new FileInputStream("uinsoaptest.properties"), "UTF-8");
If you want the JVM to read UTF-8 files by default, you will have to add something like this to your JVM options (for example through the JAVA_TOOL_OPTIONS environment variable):
-Dfile.encoding=UTF-8

If anybody comes across this problem in Kotlin, like me:
The accepted solution of @Würgspaß works here as well. The corresponding Kotlin syntax:
Instead of the usual
val properties = Properties()
filePath.toFile().inputStream().use { stream -> properties.load(stream) }
I had to use
val properties = Properties()
InputStreamReader(FileInputStream(filePath.toFile()), StandardCharsets.UTF_8).use { stream -> properties.load(stream) }
With this, special UTF-8 characters are loaded correctly from the properties file given in filePath.

SVNKit to find diff between two files stored at separate locations with separate revision numbers

I am writing a Java program using the SVNKit API, and I need to use the correct class or call in the API that would allow me to find the diff between files stored in separate locations.
1st file:
https://abc.edc.xyz.corp/svn/di-edc/tags/ab-cde-fgh-axsym-1.0.0/src/site/apt/releaseNotes.apt
2nd file:
https://abc.edc.xyz.corp/svn/di-edc/tags/ab-cde-fgh-axsym-1.1.0/src/site/apt/releaseNotes.apt
I have used the listed API calls to generate the diff output, but I am unsuccessful so far.
DefaultSVNDiffGenerator diffGenerator = new DefaultSVNDiffGenerator();
diffGenerator.displayFileDiff("", file1, file2, "10983", "8971", "text", "text/plain", output);
diffClient.doDiff(svnUrl1, SVNRevision.create(10868), svnUrl2, SVNRevision.create(8971), SVNDepth.IMMEDIATES, false, System.out);
Can anyone provide guidance on the correct way to do this?

Your code looks correct. But prefer using the new API:
final SvnOperationFactory svnOperationFactory = new SvnOperationFactory();
try {
    final ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    final SvnDiffGenerator diffGenerator = new SvnDiffGenerator();
    diffGenerator.setBasePath(new File(""));
    final SvnDiff diff = svnOperationFactory.createDiff();
    diff.setSources(SvnTarget.fromURL(url1, svnRevision1), SvnTarget.fromURL(url2, svnRevision2));
    diff.setDiffGenerator(diffGenerator);
    diff.setOutput(byteArrayOutputStream);
    diff.run();
} finally {
    svnOperationFactory.dispose();
}

Saving a Hashset to a file in Java

I know this question has been asked a million times and I have seen a million solutions, but none that work for me. I have a HashSet that I want to write to a file, with each element of the HashSet on a separate line.
Here is my code:
Collection<String> similar4 = new HashSet<String>(file268List);
Collection <String> different4 = new HashSet<String>();
different4.addAll(file268List);
different4.addAll(sqlFileList);
similar4.retainAll(sqlFileList);
different4.removeAll(similar4);
Iterator hashSetIterator = different4.iterator();
while(hashSetIterator.hasNext()){
    System.out.println(hashSetIterator.next());
}
ObjectOutputStream writer = new ObjectOutputStream(new FileOutputStream("HashSet.txt"));
while(hashSetIterator.hasNext()){
    Object o = hashSetIterator.next();
    writer.writeObject(o);
}

Where you got it wrong is that you are trying to serialize the strings instead of just printing them to the file, exactly the same way you print them to the screen:
PrintStream out = new PrintStream(new FileOutputStream("HashSet.txt"));
Iterator hashSetIterator = different4.iterator();
while(hashSetIterator.hasNext()){
    out.println(hashSetIterator.next());
}
out.close();

ObjectOutputStream will try to serialize the String as an object (binary format). I think you want to use a PrintWriter instead. Example:
PrintWriter writer = new PrintWriter( new OutputStreamWriter( new FileOutputStream( "HashSet.txt" ), "UTF-8" ) );
Iterator<String> hashSetIterator = different4.iterator();
while(hashSetIterator.hasNext()) {
    String o = hashSetIterator.next();
    writer.println(o);
}
writer.close();
Note that per this answer and the answer from Marko, you can use PrintStream or PrintWriter to output strings (characters). There is little difference between the two, but be sure to specify a character encoding if you work with non-standard characters or need to read/write files across different platforms.
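As another option, on Java 7 or later the standard library can write a collection of strings one per line without an explicit loop, using Files.write. This is only a sketch of that alternative and not the approach from the answers above. The class and method names (SetToFileDemo, writeLines, roundTrip) and the sample file names are made up for illustration, and the demo writes to a temporary file rather than HashSet.txt.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SetToFileDemo {

    // Writes every element of the set on its own line, encoded as UTF-8.
    public static void writeLines(Set<String> set, Path target) {
        try {
            Files.write(target, set, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the set to a temporary file, reads the lines back, then deletes the file.
    public static List<String> roundTrip(Set<String> set) {
        try {
            Path tmp = Files.createTempFile("HashSet", ".txt");
            try {
                writeLines(set, tmp);
                return Files.readAllLines(tmp, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sample data standing in for the different4 set from the question.
        Set<String> different4 = new HashSet<>(Arrays.asList("a.sql", "b.sql", "c.sql"));
        for (String line : roundTrip(different4)) {
            System.out.println(line);
        }
    }
}
```

A HashSet has no defined iteration order, so the lines may come out in any order. Copy the elements into a TreeSet first if the file should be sorted.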

How to get output of groovy script in java

I am executing a Groovy script in Java:
final GroovyClassLoader classLoader = new GroovyClassLoader();
Class groovy = classLoader.parseClass(new File("script.groovy"));
GroovyObject groovyObj = (GroovyObject) groovy.newInstance();
groovyObj.invokeMethod("main", null);
This main method prints some information that I want to save in a variable. How can I do that?

You would have to redirect System.out into something else.
Of course, if this is multi-threaded, you're going to hit issues:
final GroovyClassLoader classLoader = new GroovyClassLoader();
Class groovy = classLoader.parseClass(new File("script.groovy"));
GroovyObject groovyObj = (GroovyObject) groovy.newInstance();
ByteArrayOutputStream buffer = new ByteArrayOutputStream() ;
PrintStream saveSystemOut = System.out ;
System.setOut( new PrintStream( buffer ) ) ;
groovyObj.invokeMethod("main", null);
System.setOut( saveSystemOut ) ;
String output = buffer.toString().trim() ;
It's probably better (if you can) to write your scripts so they return something rather than dump to System.out.
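The redirect can also be wrapped in a helper that restores System.out in a finally block, so the original stream comes back even if the script throws. This is a minimal sketch under that assumption. CaptureStdout and capture are made-up names, and the Runnable stands in for the groovyObj.invokeMethod("main", null) call.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class CaptureStdout {

    // Runs the action with System.out pointed at a buffer and returns what was printed.
    public static String capture(Runnable action) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream original = System.out;
        PrintStream replacement;
        try {
            replacement = new PrintStream(buffer, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
        System.setOut(replacement);
        try {
            action.run();
        } finally {
            // Restore the original stream even if the action throws.
            replacement.flush();
            System.setOut(original);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8).trim();
    }

    public static void main(String[] args) {
        // The lambda is a placeholder for invoking the script's main method.
        String output = capture(() -> System.out.println("hello from the script"));
        System.out.println("captured: " + output);
    }
}
```

System.out is still global state, so the multi-threading caveat applies here too: anything another thread prints while the action runs ends up in the buffer as well.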
