Hi all, I am trying to use Apache Commons Exec, and with it I am trying to create and write to a file.
The command-line invocation to write to a file is as follows:
Example: PDFAnnotator.exe "C:\My Documents\Test.pdf"
I have tried the following:
public PrintResultHandler print(final File file, final long printJobTimeout, final boolean printInBackground)
        throws IOException {
    int exitValue;
    ExecuteWatchdog watchdog = null;
    PrintResultHandler resultHandler;

    // build up the command line to using a 'java.io.File'
    CommandLine commandLine = new CommandLine("C:\\Program Files (x86)\\Adobe\\Reader 11.0\\Reader\\AcroRd32.exe");
    //CommandLine cmdLine = new CommandLine("C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe");
    Map map = new HashMap();
    map.put("file", new File("C:\\test\\invoice.pdf"));
    commandLine.addArgument("/p");
    commandLine.addArgument("/h");
    commandLine.addArgument("${file}");

    // create the executor and consider the exitValue '1' as success
    final Executor executor = new DefaultExecutor();
    executor.setExitValue(1);

    // create a watchdog if requested
    if (printJobTimeout > 0) {
        watchdog = new ExecuteWatchdog(printJobTimeout);
        executor.setWatchdog(watchdog);
    }

    // pass a "ExecuteResultHandler" when doing background printing
    if (printInBackground) {
        System.out.println("[print] Executing non-blocking print job ...");
        resultHandler = new PrintResultHandler(watchdog);
        executor.execute(commandLine, (Map<String, String>) resultHandler);
    } else {
        System.out.println("[print] Executing blocking print job ...");
        exitValue = executor.execute(commandLine);
        resultHandler = new PrintResultHandler(exitValue);
    }
    return resultHandler;
}
It does not create any PDF file as output. Can you please suggest what I am doing wrong?
It seems this code has been modified from the Apache Commons Exec tutorial code, and a couple of the modifications you have made have caused problems.
Firstly, you have deleted the line
commandLine.setSubstitutionMap(map);
Without this line, you are creating the variable map, putting a single value into it, and then doing nothing further with it. Clearly, having a map that you never read any values out of achieves nothing. Reinstate this line; it's important.
The other problem is the line
executor.execute(commandLine, (Map<String, String>) resultHandler);
The difference between this code and the tutorial code is that you have added the cast to Map<String, String>. resultHandler is a PrintResultHandler, but this class does not implement Map<String, String> so this cast will fail.
I don't see why you have the cast at all. Get rid of it to leave you with:
executor.execute(commandLine, resultHandler);
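For reference, with both fixes applied, the relevant part of the method would look roughly like this (a sketch using the same variable names as your code):

CommandLine commandLine = new CommandLine("C:\\Program Files (x86)\\Adobe\\Reader 11.0\\Reader\\AcroRd32.exe");
commandLine.addArgument("/p");
commandLine.addArgument("/h");
commandLine.addArgument("${file}");

Map<String, File> map = new HashMap<>();
map.put("file", new File("C:\\test\\invoice.pdf"));
commandLine.setSubstitutionMap(map); // reinstated: resolves the ${file} placeholder

// ... executor and watchdog setup unchanged ...

if (printInBackground) {
    System.out.println("[print] Executing non-blocking print job ...");
    resultHandler = new PrintResultHandler(watchdog);
    executor.execute(commandLine, resultHandler); // no cast
} else {
    System.out.println("[print] Executing blocking print job ...");
    exitValue = executor.execute(commandLine);
    resultHandler = new PrintResultHandler(exitValue);
}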
If your code continues not to work, then I can't say what the reasons would be. Maybe the Adobe Reader executable isn't where you think it is, or maybe the file doesn't exist or doesn't have read permissions. In any case, suitable details should be written to standard output or standard error to help you further diagnose the problem.
I am using Java and Apache Velocity 1.7 to evaluate templates.
The following is sample code:
public void internalEvaluate(Map<String, Object> customContext, String templateText) throws IOException {
    // add custom context to VelocityContext
    VelocityContext context = new VelocityContext();
    for (Map.Entry<String, Object> entry : customContext.entrySet()) {
        context.put(entry.getKey(), entry.getValue());
    }

    // define writer
    StringWriter output = new StringWriter();

    // define logTag
    String logTag = "TestVTL";

    // check input template text
    if (templateText == null)
        templateText = "$noDescription";

    Velocity.evaluate(context, output, logTag, templateText);

    // write output to file
    saveToFile(output);
}
However, a specific customContext or templateText may produce a very large output.
The output file can be created, but it cannot be opened by an editor.
Below are my questions:
Q1.
I would like to limit or check the size of the output at runtime (or before calling evaluate()) and throw a warning message about creating too large a file.
Does Velocity provide a configuration option or API to do something like this?
Q2.
The evaluation process may take a long time.
I would like to know the progress status of the Velocity evaluation process.
Is it possible to get progress information?
Best regards,
Since Velocity only sees a Writer, it has no means of counting the output size.
Your best option would be to implement a CustomStringWriter class that will throw an exception when its internal size has reached a certain limit.
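A minimal sketch of such a writer (the class name and the choice of exception here are illustrative, not part of Velocity's API):

import java.io.StringWriter;

// A StringWriter that refuses to grow beyond a fixed number of characters.
public class SizeLimitedStringWriter extends StringWriter {
    private final int maxChars;

    public SizeLimitedStringWriter(int maxChars) {
        this.maxChars = maxChars;
    }

    private void checkLimit(int charsToAdd) {
        if (getBuffer().length() + charsToAdd > maxChars) {
            throw new IllegalStateException("Template output exceeded " + maxChars + " characters");
        }
    }

    @Override
    public void write(int c) {
        checkLimit(1);
        super.write(c);
    }

    @Override
    public void write(char[] cbuf, int off, int len) {
        checkLimit(len);
        super.write(cbuf, off, len);
    }

    @Override
    public void write(String str) {
        checkLimit(str.length());
        super.write(str);
    }

    @Override
    public void write(String str, int off, int len) {
        checkLimit(len);
        super.write(str, off, len);
    }
}

Pass an instance of this class to Velocity.evaluate() instead of a plain StringWriter and catch the exception around the evaluate() call to report the oversized output.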
I'm working on creating a framework to allow customers to create their own plugins for my software built on Apache Flink. I've outlined in a snippet below what I'm trying to get working (just as a proof of concept); however, I'm getting an org.apache.flink.client.program.ProgramInvocationException: The main method caused an error when trying to upload it.
I want to be able to branch the input stream into x number of different pipelines and then have those combine into a single output. What I have below is just the simplified version I'm starting with.
public class ContentBase {
    public static void main(String[] args) throws Exception {
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "kf-service:9092");
        properties.setProperty("group.id", "varnost-content");

        // Set up execution environment and get stream from Kafka
        StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<ObjectNode> logs = see.addSource(new FlinkKafkaConsumer011<>("log-input",
                new JSONKeyValueDeserializationSchema(false), properties).setStartFromLatest())
                .map((MapFunction<ObjectNode, ObjectNode>) jsonNodes -> (ObjectNode) jsonNodes.get("value"));

        // Create a new List of Streams, one for each "rule" that is being executed
        // For now, I have a simple custom wrapper on flink's `.filter` function in `MyClass.filter`
        List<String> codes = Arrays.asList("404", "200", "500");
        List<DataStream<ObjectNode>> outputs = new ArrayList<>();
        for (String code : codes) {
            outputs.add(MyClass.filter(logs, "response", code));
        }

        // It seemed as though I needed a seed DataStream to union all others on
        ObjectMapper mapper = new ObjectMapper();
        ObjectNode seedObject = (ObjectNode) mapper.readTree("{\"start\":\"true\"}");
        DataStream<ObjectNode> alerts = see.fromElements(seedObject);

        // Union the output of each "rule" above with the seed object to then output
        for (DataStream<ObjectNode> output : outputs) {
            alerts.union(output);
        }

        // Convert to string and sink to Kafka
        alerts.map((MapFunction<ObjectNode, String>) ObjectNode::toString)
                .addSink(new FlinkKafkaProducer011<>("kf-service:9092", "log-output", new SimpleStringSchema()));
        see.execute();
    }
}
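(MyClass.filter isn't shown above; a minimal stand-in consistent with how it is called, included only so the example is self-contained, could look like this:)

public class MyClass {
    // Keep only events whose given JSON field equals the given value.
    public static DataStream<ObjectNode> filter(DataStream<ObjectNode> in, String field, String value) {
        return in.filter(node -> node.has(field) && value.equals(node.get(field).asText()));
    }
}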
I can't figure out how to get the actual error out of the Flink web interface to add that information here.
There were a few errors I found.
1) A StreamExecutionEnvironment can apparently only have one input (I could be wrong about that), so adding the .fromElements input was not good.
2) I forgot that all DataStreams are immutable, so the .union operation creates a new DataStream as its output.
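For example, reassigning the result inside the original loop would have kept the unioned stream:

for (DataStream<ObjectNode> output : outputs) {
    alerts = alerts.union(output); // union returns a new DataStream, so the result must be reassigned
}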
The final result ended up being much simpler
public class ContentBase {
    public static void main(String[] args) throws Exception {
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "kf-service:9092");
        properties.setProperty("group.id", "varnost-content");

        // Set up execution environment and get stream from Kafka
        StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<ObjectNode> logs = see.addSource(new FlinkKafkaConsumer011<>("log-input",
                new JSONKeyValueDeserializationSchema(false), properties).setStartFromLatest())
                .map((MapFunction<ObjectNode, ObjectNode>) jsonNodes -> (ObjectNode) jsonNodes.get("value"));

        // Create a new List of Streams, one for each "rule" that is being executed
        // For now, I have a simple custom wrapper on flink's `.filter` function in `MyClass.filter`
        List<String> codes = Arrays.asList("404", "200", "500");
        List<DataStream<ObjectNode>> outputs = new ArrayList<>();
        for (String code : codes) {
            outputs.add(MyClass.filter(logs, "response", code));
        }

        Optional<DataStream<ObjectNode>> alerts = outputs.stream().reduce(DataStream::union);

        // Convert to string and sink to Kafka
        alerts.map((MapFunction<ObjectNode, String>) ObjectNode::toString)
                .addSink(new FlinkKafkaProducer011<>("kf-service:9092", "log-output", new SimpleStringSchema()));
        see.execute();
    }
}
The code you posted cannot be compiled because of the last part (converting to a string). You mixed up the Java Optional map with the Flink one: outputs.stream().reduce(DataStream::union) returns an Optional<DataStream<ObjectNode>>, so calling .map on alerts invokes Optional.map rather than Flink's DataStream.map. Change it to
alerts.get().map(ObjectNode::toString);
to fix it.
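In context, the corrected tail of the pipeline would be:

alerts.get()
      .map((MapFunction<ObjectNode, String>) ObjectNode::toString)
      .addSink(new FlinkKafkaProducer011<>("kf-service:9092", "log-output", new SimpleStringSchema()));

Note that Optional.get() throws NoSuchElementException if outputs is empty, so a check (or orElseThrow) may be worth adding.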
Good luck.
I have HBase code that I use for Gets. (Although I don't have Kerberos on, I plan to enable it later, so I wanted to make sure that user credentials were handled correctly when connecting and doing a Put or Get.)
final ByteArrayOutputStream bos = new ByteArrayOutputStream();
MyHBaseService.getUserHBase().runAs(new PrivilegedExceptionAction<Object>() {
    @Override
    public Object run() throws Exception {
        Connection connection = null;
        Table StorageTable = null;
        List<hFile> HbaseDownload = new ArrayList<>();
        try {
            // Open an HBase Connection
            connection = ConnectionFactory.createConnection(MyHBaseService.getHBaseConfiguration());
            Get get = new Get(Bytes.toBytes("filenameCell"));
            Result result = table.get(get);
            byte[] data = result.getValue(Bytes.toBytes(MyHBaseService.getDataStoreFamily()), Bytes.toBytes(MyHBaseService.getDataStoreQualifier()));
            bos.write(data, 0, data.length);
            bos.flush();
            ...
        }
    }
});
// now get the output stream.
// I am assuming the ByteArrayOutputStream is synchronized and thread-safe.
return bos.toByteArray();
However, I wasn't sure whether this was running on an asynchronous or a synchronous thread.
The problem:
I use:
Get get = new Get(Bytes.toBytes("filenameCell"));
Result result = table.get(get);
inside this run() function. But to get information OUT of the run() thread, I use a ByteArrayOutputStream declared OUTSIDE the run(), call its write and flush methods inside the run(), and then call toByteArray() to get the binary bytes of the HBase content out of the function. This returns null bytes though, so maybe I'm not doing this right.
However, I am having difficulty finding good examples of the HBase Java API doing these things, and no one else seems to use runAs the way I do, which is strange.
I have HBase 1.2.5 client running inside a Web App (request-based function calls).
In this code, the work runs inside MyHBaseService.getUserHBase().runAs(...). If that runs asynchronously, the program can reach bos.toByteArray() (which is outside the runAs()) before the run() method has finished executing, so it returns the output before the work is complete.
I think that's the reason for the null values.
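A more robust pattern is to return the bytes from the action itself rather than writing to a stream declared outside it, so the result is only used after run() has completed. The sketch below assumes getUserHBase() returns an org.apache.hadoop.hbase.security.User (whose runAs returns the action's result); the table name and row key are placeholders.

byte[] data = MyHBaseService.getUserHBase().runAs(new PrivilegedExceptionAction<byte[]>() {
    @Override
    public byte[] run() throws Exception {
        try (Connection connection = ConnectionFactory.createConnection(MyHBaseService.getHBaseConfiguration());
             Table table = connection.getTable(TableName.valueOf("storageTable"))) { // placeholder table name
            Get get = new Get(Bytes.toBytes("rowKey")); // Get takes a row key, not a cell name
            Result result = table.get(get);
            return result.getValue(Bytes.toBytes(MyHBaseService.getDataStoreFamily()),
                    Bytes.toBytes(MyHBaseService.getDataStoreQualifier()));
        }
    }
});
return data;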
I am exploring Spark Streaming using Java.
I currently have the Cloudera QuickStart VM (CDH 5.5) downloaded, and I wrote Java code for Spark Streaming.
I have written a program which returns a JavaPairDStream. When I try to write the output to HDFS it works, but it creates multiple folders (based on the timestamp). The documentation says that this is how it is supposed to work, but is there a way to write the output to the same folder/file in HDFS? I tried to use repartition(1), but that did not work.
Please see the code below:
if (args.length < 3) {
    System.err.println("Invalid arguments");
    System.exit(1);
}

SparkConf sparkConf = new SparkConf().setMaster("local").setAppName("Product Reco Spark Streaming");
JavaStreamingContext javaStreamContext = new JavaStreamingContext(sparkConf, new Duration(10000));

String inputFile = args[0];
String outputPath = args[1];
String outputFile = args[2];

JavaDStream<String> dStream = javaStreamContext.textFileStream(inputFile);
JavaPairDStream<String, String> finalDStream = fetchProductRecommendation(dStream); // Does some logic to get the final DStream

finalDStream.print();
finalDStream.repartition(1).saveAsNewAPIHadoopFiles(outputPath, outputFile, String.class, String.class, TextOutputFormat.class);

javaStreamContext.start();
javaStreamContext.awaitTermination();
To run this program, here is the command that I am using
spark-submit --master local /home/cloudera/Spark/JarLib_ProductRecoSparkStream.jar /user/ProductRecomendations/SparkInput/ /user/ProductRecomendations/SparkOutput/ productRecoOutput
Please let me know if you need more information, since this is the first time I am writing Spark Streaming code.
I want to read data from an FTP server. I am providing the path of the file, which resides on the FTP server, in the format ftp://Username:Password@host/path.
When I use a MapReduce program to read data from the file, it works fine. I want to read data from the same file through the Cascading framework. I am using the Hfs tap of the Cascading framework to read the data. It throws the following exception:
java.io.IOException: Stream closed
at org.apache.hadoop.fs.ftp.FTPInputStream.close(FTPInputStream.java:98)
at java.io.FilterInputStream.close(Unknown Source)
at org.apache.hadoop.util.LineReader.close(LineReader.java:83)
at org.apache.hadoop.mapred.LineRecordReader.close(LineRecordReader.java:168)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.close(MapTask.java:254)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:440)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Below is the Cascading code from which I am reading the files:
public class FTPWithHadoopDemo {
    public static void main(String args[]) {
        Tap source = new Hfs(new TextLine(new Fields("line")), "ftp://user:pwd@xx.xx.xx.xx//input1");
        Tap sink = new Hfs(new TextLine(new Fields("line1")), "OP\\op", SinkMode.REPLACE);
        Pipe pipe = new Pipe("First");
        pipe = new Each(pipe, new RegexSplitGenerator("\\s+"));
        pipe = new GroupBy(pipe);
        Pipe tailpipe = new Every(pipe, new Count());
        FlowDef flowDef = FlowDef.flowDef().addSource(pipe, source).addTailSink(tailpipe, sink);
        new HadoopFlowConnector().connect(flowDef).complete();
    }
}
I tried to look in the Hadoop source code for the same exception. I found that the MapTask class has a method runOldMapper which deals with the stream, and in that method there is a finally block where the stream gets closed (in.close()). When I remove that line from the finally block, it works fine. Below is the code:
private <INKEY, INVALUE, OUTKEY, OUTVALUE> void runOldMapper(final JobConf job, final TaskSplitIndex splitIndex,
        final TaskUmbilicalProtocol umbilical, TaskReporter reporter)
        throws IOException, InterruptedException, ClassNotFoundException {
    InputSplit inputSplit = getSplitDetails(new Path(splitIndex.getSplitLocation()), splitIndex.getStartOffset());
    updateJobWithSplit(job, inputSplit);
    reporter.setInputSplit(inputSplit);
    RecordReader<INKEY, INVALUE> in = isSkipping()
            ? new SkippingRecordReader<INKEY, INVALUE>(inputSplit, umbilical, reporter)
            : new TrackedRecordReader<INKEY, INVALUE>(inputSplit, job, reporter);
    job.setBoolean("mapred.skip.on", isSkipping());
    int numReduceTasks = conf.getNumReduceTasks();
    LOG.info("numReduceTasks: " + numReduceTasks);
    MapOutputCollector collector = null;
    if (numReduceTasks > 0) {
        collector = new MapOutputBuffer(umbilical, job, reporter);
    } else {
        collector = new DirectMapOutputCollector(umbilical, job, reporter);
    }
    MapRunnable<INKEY, INVALUE, OUTKEY, OUTVALUE> runner = ReflectionUtils.newInstance(job.getMapRunnerClass(), job);
    try {
        runner.run(in, new OldOutputCollector(collector, conf), reporter);
        collector.flush();
    } finally {
        // close
        in.close(); // close input
        collector.close();
    }
}
Please assist me in solving this problem.
Thanks,
Arshadali
After some effort I found out that Hadoop uses the org.apache.hadoop.fs.ftp.FTPFileSystem class for FTP.
This class doesn't support seek, i.e. seeking to a given offset from the start of the file. Data is read in one block, and then the file system seeks to the next block to read. The default block size is 4 KB for FTPFileSystem. Since seek is not supported, it can only read data less than or equal to 4 KB.