Flume Custom HTTPSourceHandler GZipped File - java

I am trying to create a custom Flume HTTPSourceHandler that handles the contents of a file that is sent in the POST body of an HTTP request, and the payload of that post will be gzipped.
I am new to Flume, and struggling to understand how to return the contents of this GZip file (or any data for that matter) as Flume events.
Here is some incomplete code I am working on. The main goal right now is simply to print the contents of the file to the console.
Any tips, examples, etc. would be very helpful.
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.source.http.HTTPSourceHandler;
import org.apache.http.HttpHeaders;
import javax.servlet.http.HttpServletRequest;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
public class HttpGzipHandler implements HTTPSourceHandler {
public HttpGzipHandler(){
}
@Override
public List<Event> getEvents(HttpServletRequest request) throws Exception {
boolean isGzipped = request.getHeader(HttpHeaders.CONTENT_ENCODING) != null
&& request.getHeader(HttpHeaders.CONTENT_ENCODING).contains("gzip");
GZIPInputStream gzipInputStream = new GZIPInputStream(request.getInputStream());
List<Event> eventList = new ArrayList<Event>(0);
//TODO: Read the decompressed payload from gzipInputStream and turn it into Events
return eventList;
}
@Override
public void configure(Context context) {
}
}

You may have a look at a custom HTTP handler I've developed for a tool named Cygnus, as an inspiration. I think the important part for you will be the code where the event is created and emitted:
// create the appropriate headers
Map<String, String> eventHeaders = new HashMap<String, String>();
eventHeaders.put(..., ...);
// create the event list containing only one event
ArrayList<Event> eventList = new ArrayList<Event>();
Event event = EventBuilder.withBody(data.getBytes(), eventHeaders);
eventList.add(event);
return eventList;
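Putting the two pieces together, here is a minimal sketch of a complete gzip-aware handler (one Flume event per decompressed line; the UTF-8 charset and the "source" header are illustration-only assumptions, not requirements):
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.source.http.HTTPSourceHandler;
import org.apache.http.HttpHeaders;
import javax.servlet.http.HttpServletRequest;
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.GZIPInputStream;

public class GzipHttpSourceHandler implements HTTPSourceHandler {

    @Override
    public List<Event> getEvents(HttpServletRequest request) throws Exception {
        String contentEncoding = request.getHeader(HttpHeaders.CONTENT_ENCODING);
        boolean isGzipped = contentEncoding != null && contentEncoding.contains("gzip");

        // only wrap the body in a GZIPInputStream when the client says it is gzipped
        InputStream body = isGzipped
                ? new GZIPInputStream(request.getInputStream())
                : request.getInputStream();

        List<Event> eventList = new ArrayList<Event>();
        BufferedReader reader = new BufferedReader(new InputStreamReader(body, StandardCharsets.UTF_8));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // print the decompressed file contents to the console
                // one Flume event per line, with an arbitrary example header
                Map<String, String> headers = new HashMap<String, String>();
                headers.put("source", "http-gzip");
                eventList.add(EventBuilder.withBody(line.getBytes(StandardCharsets.UTF_8), headers));
            }
        } finally {
            reader.close();
        }
        return eventList;
    }

    @Override
    public void configure(Context context) {
        // no configuration needed for this sketch
    }
}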

Related

Not able to process kafka json message with Flink siddhi library

I am trying to create a simple application where the app will consume a Kafka message, do some CQL transforms, and publish to Kafka. Below is the code:
JAVA: 1.8
Flink: 1.13
Scala: 2.11
flink-siddhi: 2.11-0.2.2-SNAPSHOT
I am using library: https://github.com/haoch/flink-siddhi
input json to Kafka:
{
"awsS3":{
"ResourceType":"aws.S3",
"Details":{
"Name":"crossplane-test",
"CreationDate":"2020-08-17T11:28:05+00:00"
},
"AccessBlock":{
"PublicAccessBlockConfiguration":{
"BlockPublicAcls":true,
"IgnorePublicAcls":true,
"BlockPublicPolicy":true,
"RestrictPublicBuckets":true
}
},
"Location":{
"LocationConstraint":"us-west-2"
}
}
}
main class:
public class S3SidhiApp {
public static void main(String[] args) {
internalStreamSiddhiApp.start();
//kafkaStreamApp.start();
}
}
App class:
package flinksidhi.app;
import com.google.gson.JsonObject;
import flinksidhi.event.s3.source.S3EventSource;
import io.siddhi.core.SiddhiManager;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.siddhi.SiddhiCEP;
import org.json.JSONObject;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import static flinksidhi.app.connector.Consumers.createInputMessageConsumer;
import static flinksidhi.app.connector.Producer.*;
public class internalStreamSiddhiApp {
private static final String inputTopic = "EVENT_STREAM_INPUT";
private static final String outputTopic = "EVENT_STREAM_OUTPUT";
private static final String consumerGroup = "EVENT_STREAM1";
private static final String kafkaAddress = "localhost:9092";
private static final String zkAddress = "localhost:2181";
private static final String S3_CQL1 = "from inputStream select * insert into temp";
private static final String S3_CQL = "from inputStream select json:toObject(awsS3) as obj insert into temp;" +
"from temp select json:getString(obj,'$.awsS3.ResourceType') as affected_resource_type," +
"json:getString(obj,'$.awsS3.Details.Name') as affected_resource_name," +
"json:getString(obj,'$.awsS3.Encryption.ServerSideEncryptionConfiguration') as encryption," +
"json:getString(obj,'$.awsS3.Encryption.ServerSideEncryptionConfiguration.Rules[0].ApplyServerSideEncryptionByDefault.SSEAlgorithm') as algorithm insert into temp2; " +
"from temp2 select affected_resource_name,affected_resource_type, " +
"ifThenElse(encryption == ' ','Fail','Pass') as state," +
"ifThenElse(encryption != ' ' and algorithm == 'aws:kms','None','Critical') as severity insert into outputStream";
public static void start(){
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
//DataStream<String> inputS = env.addSource(new S3EventSource());
//Flink kafka stream consumer
FlinkKafkaConsumer<String> flinkKafkaConsumer =
createInputMessageConsumer(inputTopic, kafkaAddress,zkAddress, consumerGroup);
//Add Data stream source -- flink consumer
DataStream<String> inputS = env.addSource(flinkKafkaConsumer);
SiddhiCEP cep = SiddhiCEP.getSiddhiEnvironment(env);
cep.registerExtension("json:toObject", io.siddhi.extension.execution.json.function.ToJSONObjectFunctionExtension.class);
cep.registerExtension( "json:getString", io.siddhi.extension.execution.json.function.GetStringJSONFunctionExtension.class);
cep.registerStream("inputStream", inputS, "awsS3");
inputS.print();
System.out.println(cep.getDataStreamSchemas());
//json needs extension jars to present during runtime.
DataStream<Map<String,Object>> output = cep
.from("inputStream")
.cql(S3_CQL1)
.returnAsMap("temp");
//Flink kafka stream Producer
FlinkKafkaProducer<Map<String, Object>> flinkKafkaProducer =
createMapProducer(env,outputTopic, kafkaAddress);
//Add Data stream sink -- flink producer
output.addSink(flinkKafkaProducer);
output.print();
try {
env.execute();
} catch (Exception e) {
e.printStackTrace();
}
}
}
Consumer class:
package flinksidhi.app.connector;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.json.JSONObject;
import java.util.Properties;
public class Consumers {
public static FlinkKafkaConsumer<String> createInputMessageConsumer(String topic, String kafkaAddress, String zookeeprAddr, String kafkaGroup ) {
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", kafkaAddress);
properties.setProperty("zookeeper.connect", zookeeprAddr);
properties.setProperty("group.id",kafkaGroup);
FlinkKafkaConsumer<String> consumer = new FlinkKafkaConsumer<String>(
topic,new SimpleStringSchema(),properties);
return consumer;
}
}
Producer class:
package flinksidhi.app.connector;
import flinksidhi.app.util.ConvertJavaMapToJson;
import org.apache.flink.api.common.serialization.SerializationSchema;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchema;
import org.json.JSONObject;
import java.util.Map;
public class Producer {
public static FlinkKafkaProducer<Tuple2> createStringProducer(StreamExecutionEnvironment env, String topic, String kafkaAddress) {
return new FlinkKafkaProducer<Tuple2>(kafkaAddress, topic, new AverageSerializer());
}
public static FlinkKafkaProducer<Map<String,Object>> createMapProducer(StreamExecutionEnvironment env, String topic, String kafkaAddress) {
return new FlinkKafkaProducer<Map<String,Object>>(kafkaAddress, topic, new SerializationSchema<Map<String, Object>>() {
@Override
public void open(InitializationContext context) throws Exception {
}
@Override
public byte[] serialize(Map<String, Object> stringObjectMap) {
String json = ConvertJavaMapToJson.convert(stringObjectMap);
return json.getBytes();
}
});
}
}
I have tried many things, but the code where the CQL is invoked is never called and it doesn't even give any error, so I am not sure where it is going wrong.
If I instead create an internal stream source and use the same input JSON, returning it as a string, it works.
Initial guess: if you are using event time, are you sure you have defined watermarks correctly? As stated in the docs:
(...) an incoming element is initially put in a buffer where elements are sorted in ascending order based on their timestamp, and when a watermark arrives, all the elements in this buffer with timestamps smaller than that of the watermark are processed (...)
If this doesn't help, I would suggest decomposing/simplifying the job to a bare minimum, for example just a source operator and some naive sink printing/logging elements. If that works, start adding back operators one by one. You could also start by simplifying your CEP pattern as much as possible.
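For reference, a minimal sketch of what "defining watermarks" could look like for the Kafka consumer in the question, to drop into the start() method (assumptions: Flink 1.13, and the Kafka record timestamp is acceptable as the event time with up to 5 seconds of out-of-orderness):
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;

// attach a watermark strategy directly to the Kafka source so event-time operators can make progress
flinkKafkaConsumer.assignTimestampsAndWatermarks(
        WatermarkStrategy.<String>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                .withTimestampAssigner((record, kafkaTimestamp) -> kafkaTimestamp));
DataStream<String> inputS = env.addSource(flinkKafkaConsumer);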
First of all, thanks a lot @Piotr Nowojski. It was your small pointer about event time that made it click; no matter how many times I had pondered over it before, it had not come to my mind. So, while debugging the two cases:
With the internal data source, where processing succeeded, I saw while debugging the flow that a watermark was processed after the data was processed, but it did not strike me that the event time of the data was somehow being managed implicitly.
With Kafka as the data source, I could see very clearly while debugging that no watermark was being processed in the flow, but it did not occur to me that this was happening because event time and watermarks were not handled properly.
Adding a single line of code to the application fixed it, which I understood from the Flink Javadoc snippet below:
@deprecated In Flink 1.12 the default stream time characteristic has been changed to {@link
* TimeCharacteristic#EventTime}, thus you don't need to call this method for enabling
* event-time support anymore. Explicitly using processing-time windows and timers works in
* event-time mode. If you need to disable watermarks, please use {@link
* ExecutionConfig#setAutoWatermarkInterval(long)}. If you are using {@link
* TimeCharacteristic#IngestionTime}, please manually set an appropriate {@link
* WatermarkStrategy}. If you are using generic "time window" operations (for example {@link
* org.apache.flink.streaming.api.datastream.KeyedStream#timeWindow(org.apache.flink.streaming.api.windowing.time.Time)}
* that change behaviour based on the time characteristic, please use equivalent operations
* that explicitly specify processing time or event time.
*/
I got to know that by default Flink uses event time, and for that watermarks need to be handled properly, which I had not done. So I added the line below to set the time characteristic of the Flink execution environment:
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
and kaboom... it started working. This call is deprecated and needs some other configuration now, but thanks a lot, it was a great pointer; it helped me a lot and I solved the issue.
Thanks again @Piotr Nowojski
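For completeness, since setStreamTimeCharacteristic is deprecated in Flink 1.12+, the non-deprecated route suggested by the Javadoc quoted above would be to leave the default event-time characteristic in place and simply disable automatic watermark emission (a sketch, not tested against the original job):
// the 1.12+ replacement mentioned in the Javadoc: an interval of 0 disables periodic watermarks
env.getConfig().setAutoWatermarkInterval(0);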

Google Cloud Dataflow issue with writing the data (TextIO or DatastoreIO)

OK, everyone. Another Dataflow question from a Dataflow newbie. (Just started playing with it this week..)
I'm creating a data pipeline to take in a list of product names and generate autocomplete data. The data processing part is all working fine, it seems, but I'm missing something obvious, because when I add my last ".apply" to use either DatastoreIO or TextIO to write the data out, I'm getting a syntax error in my IDE that says the following:
"The method apply(DatastoreV1.Write) is undefined for the type ParDo.SingleOutput<KV<String,List<String>>,Entity>"
It gives me an option to add a cast to the method receiver, but that obviously isn't the answer. Do I need to do some other step before I try to write the data out? My last step before trying to write the data is a call to an Entity helper for Dataflow to change my pipeline structure from KV<String, List<String>> to Entity, which seems to me like what I'd need to write to Datastore.
I got so frustrated with this thing the last few days, I even decided to write the data to some AVRO files instead so I could just load it in Datastore by hand. Imagine how ticked I was when I got all that done and got the exact same error in the exact same place on my call to TextIO. That is why I think I must be missing something very obvious here.
Here is my code. I included it all for reference, but you probably just need to look at the main() at the bottom. Any input would be greatly appreciated! Thanks!
MrSimmonsSr
package com.client.autocomplete;
import com.client.autocomplete.AutocompleteOptions;
import com.google.datastore.v1.Entity;
import com.google.datastore.v1.Key;
import com.google.datastore.v1.Value;
import static com.google.datastore.v1.client.DatastoreHelper.makeKey;
import static com.google.datastore.v1.client.DatastoreHelper.makeValue;
import org.apache.beam.sdk.coders.DefaultCoder;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionList;
import com.google.api.services.bigquery.model.TableRow;
import com.google.common.base.MoreObjects;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.datastore.DatastoreIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.SimpleFunction;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.DoFn.ProcessContext;
import org.apache.beam.sdk.transforms.DoFn.ProcessElement;
import org.apache.beam.sdk.extensions.jackson.ParseJsons;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.StreamingOptions;
import org.apache.beam.sdk.options.Validation;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.List;
import java.util.ArrayList;
/*
* A simple Dataflow pipeline to create autocomplete data from a list of
* product names. It then loads that prefix data into Google Cloud Datastore for consumption by
* a Google Cloud Function. That function will take in a prefix and return a list of 10 product names
*
* Pseudo Code Steps
* 1. Load a list of product names from Cloud Storage
* 2. Generate prefixes for use with autocomplete, based on the product names
* 3. Merge the prefix data together with 10 products per prefix
* 4. Write that prefix data to the Cloud Datastore as a KV with a <String>, List<String> structure
*
*/
public class ClientAutocompletePipeline {
private static final Logger LOG = LoggerFactory.getLogger(ClientAutocompletePipeline.class);
/**
* A DoFn that keys each product name by all of its prefixes.
* This creates one row in the PCollection for each prefix<->product_name pair
*/
private static class AllPrefixes
extends DoFn<String, KV<String, String>> {
private final int minPrefix;
private final int maxPrefix;
public AllPrefixes(int minPrefix) {
this(minPrefix, 10);
}
public AllPrefixes(int minPrefix, int maxPrefix) {
this.minPrefix = minPrefix;
this.maxPrefix = maxPrefix;
}
@ProcessElement
public void processElement(ProcessContext c) {
String productName= c.element().toString();
for (int i = minPrefix; i <= Math.min(productName.length(), maxPrefix); i++) {
c.output(KV.of(productName.substring(0, i), c.element()));
}
}
}
/**
* Takes as input the top product names per prefix, and emits an entity
* suitable for writing to Cloud Datastore.
*
*/
static class FormatForDatastore extends DoFn<KV<String, List<String>>, Entity> {
private String kind;
private String ancestorKey;
public FormatForDatastore(String kind, String ancestorKey) {
this.kind = kind;
this.ancestorKey = ancestorKey;
}
@ProcessElement
public void processElement(ProcessContext c) {
// Initialize an EntityBuilder and get it a valid key
Entity.Builder entityBuilder = Entity.newBuilder();
Key key = makeKey(kind, ancestorKey).build();
entityBuilder.setKey(key);
// New HashMap to hold all the properties of the Entity
Map<String, Value> properties = new HashMap<>();
String prefix = c.element().getKey();
String productsString = "Products[";
// iterate through the product names and add each one to the productsString
for (String productName : c.element().getValue()) {
// products.add(productName);
productsString += productName + ", ";
}
productsString += "]";
properties.put("prefix", makeValue(prefix).build());
properties.put("products", makeValue(productsString).build());
entityBuilder.putAllProperties(properties);
c.output(entityBuilder.build());
}
}
/**
* Options supported by this class.
*
* <p>Inherits standard Beam example configuration options.
*/
public interface Options
extends AutocompleteOptions {
@Description("Input text file")
@Validation.Required
String getInputFile();
void setInputFile(String value);
@Description("Cloud Datastore entity kind")
@Default.String("prefix-product-map")
String getKind();
void setKind(String value);
@Description("Whether output to Cloud Datastore")
@Default.Boolean(true)
Boolean getOutputToDatastore();
void setOutputToDatastore(Boolean value);
@Description("Cloud Datastore ancestor key")
@Default.String("root")
String getDatastoreAncestorKey();
void setDatastoreAncestorKey(String value);
@Description("Cloud Datastore output project ID, defaults to project ID")
String getOutputProject();
void setOutputProject(String value);
}
public static void main(String[] args) throws IOException{
Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
// create the pipeline
Pipeline p = Pipeline.create(options);
PCollection<String> toWrite = p
// A step to read in the product names from a text file on GCS
.apply(TextIO.read().from("gs://sample-product-data/clean_product_names.txt"))
// Next expand the product names into KV pairs with prefix as key (<KV<String, String>>)
.apply("Explode Prefixes", ParDo.of(new AllPrefixes(2)))
// Apply a GroupByKey transform to the PCollection "flatCollection" to create "productsGroupedByPrefix".
.apply(GroupByKey.<String, String>create())
// Now format the PCollection for writing into the Google Datastore
.apply("FormatForDatastore", ParDo.of(new FormatForDatastore(options.getKind(),
options.getDatastoreAncestorKey()))
// Write the processed data to the Google Cloud Datastore
// NOTE: This is the line that I'm getting the error on!!
.apply(DatastoreIO.v1().write().withProjectId(MoreObjects.firstNonNull(
options.getOutputProject(), options.getOutputProject()))));
// Run the pipeline.
PipelineResult result = p.run();
}
}
I think you need another closing parenthesis. I've removed some of the extraneous bits and reindented according to the parentheses:
PCollection<String> toWrite = p
.apply(TextIO.read().from("..."))
.apply("Explode Prefixes", ...)
.apply(GroupByKey.<String, String>create())
.apply("FormatForDatastore", ParDo.of(new FormatForDatastore(
options.getKind(), options.getDatastoreAncestorKey()))
.apply(...);
Specifically, you need another parenthesis to close the apply("FormatForDatastore", ...). Right now, it is trying to call ParDo.of(...).apply(...) which doesn't work.
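Applied to the code from the question, the fixed chain would look roughly like this (a sketch: only the parenthesis placement changes, everything else is copied from the question):
p
    // read the product names from GCS
    .apply(TextIO.read().from("gs://sample-product-data/clean_product_names.txt"))
    // expand each product name into (prefix, name) pairs
    .apply("Explode Prefixes", ParDo.of(new AllPrefixes(2)))
    // group the names by prefix
    .apply(GroupByKey.<String, String>create())
    // note the extra ')' that now closes ParDo.of(...)
    .apply("FormatForDatastore", ParDo.of(new FormatForDatastore(options.getKind(),
        options.getDatastoreAncestorKey())))
    // the write is now applied to the output of FormatForDatastore
    .apply(DatastoreIO.v1().write().withProjectId(MoreObjects.firstNonNull(
        options.getOutputProject(), options.getOutputProject())));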

Spark Streaming: Using PairRDD.saveAsNewHadoopDataset function to save data to HBase

I want to save a Twitter stream in an HBase database. What I have now is the Spark application to receive and transform the data. But I don't know how to save my TwitterStream into HBase.
The only thing I found that could be useful is the PairRDD.saveAsNewAPIHadoopDataset(conf) method. But how shall I use it, and which configuration do I have to provide to be able to save the RDD data to my HBase table?
The only thing I have found yet is the HBase client library, which can insert data into a table via Put objects. But this isn't a solution for use inside a Spark program, is it (it would be necessary to iterate over all items inside the RDD!!)?
Can someone give an example in Java? My main problem seems to be the setup of the org.apache.hadoop.conf.Configuration instance that I have to submit to saveAsNewAPIHadoopDataset...
Here is a code snippet:
JavaReceiverInputDStream<Status> statusDStream = TwitterUtils.createStream(streamingCtx);
JavaPairDStream<Long, String> statusPairDStream = statusDStream.mapToPair(new PairFunction<Status, Long, String>() {
public Tuple2<Long, String> call(Status status) throws Exception {
return new Tuple2<Long, String> (status.getId(), status.getText());
}
});
statusPairDStream.foreachRDD(new Function<JavaPairRDD<Long,String>, Void>() {
public Void call(JavaPairRDD<Long, String> status) throws Exception {
org.apache.hadoop.conf.Configuration conf = new Configuration();
status.saveAsNewAPIHadoopDataset(conf);
// HBase PUT here can't be correct!?
return null;
}
});
First of all, anonymous function classes are discouraged if you are using Java 8; please use lambdas.
The code snippet below should address all your queries.
Sample snippet:
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
....
public static void processYourMessages(final JavaRDD<YourMessage> rdd, final HiveContext sqlContext,
final MyMessageUtil messageutil) throws Exception {
final JavaRDD<Row> yourrdd = rdd.filter(msg -> messageutil.filterType(.....)); // create a filtered JavaRDD
final JavaPairRDD<ImmutableBytesWritable, Put> yourrddPuts = yourrdd.mapToPair(row -> messageutil.getPuts(row));
yourrddPuts.saveAsNewAPIHadoopDataset(conf);
}
where conf is like below
private Configuration conf = HBaseConfiguration.create();
conf.set(ZOOKEEPER_QUORUM, "comma separated list of zookeeper quorum");
conf.set("hbase.mapred.outputtable", "your table name");
conf.set("mapreduce.outputformat.class", "org.apache.hadoop.hbase.mapreduce.TableOutputFormat");
MyMessageUtil has a getPuts method, which is like below:
public Tuple2<ImmutableBytesWritable, Put> getPuts(Row row) throws Exception {
Put put = ..// prepare your put with all the columns you have.
return new Tuple2<ImmutableBytesWritable, Put>(new ImmutableBytesWritable(), put);
}
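To tie this back to the Twitter stream from the question, a minimal sketch of the foreachRDD part might look like the following (assumptions: Spark 1.6+ with the Java 8 lambda API, the HBase 1.x client, a table named "tweets" with a column family "cf", and ZooKeeper on localhost):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import scala.Tuple2;
....
statusPairDStream.foreachRDD(rdd -> {
    // build the Hadoop/HBase configuration on the driver for this micro-batch
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.zookeeper.quorum", "localhost");      // assumption: local ZooKeeper quorum
    conf.set(TableOutputFormat.OUTPUT_TABLE, "tweets");   // assumption: target table name
    Job job = Job.getInstance(conf);
    job.setOutputFormatClass(TableOutputFormat.class);

    rdd.mapToPair(idAndText -> {
        // row key = tweet id; one column "cf:text" holding the tweet text
        Put put = new Put(Bytes.toBytes(idAndText._1()));
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("text"), Bytes.toBytes(idAndText._2()));
        return new Tuple2<>(new ImmutableBytesWritable(), put);
    }).saveAsNewAPIHadoopDataset(job.getConfiguration());
});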
Hope this helps!

How to create multiple files based on one Freemarker Template

I'm having a little bit of trouble with FreeMarker right now. What I basically want to do in my template is iterate over a list of elements and create a new file for each element.
<#assign x=3>
<#list 1..x as i>
${i}
...create a new file with the output of this loop iteration...
</#list>
I did not find anything about this in the FreeMarker manual or on Google. Is there a way to do this?
You can implement this with a custom directive. See freemarker.template.TemplateDirectiveModel, and particularly TemplateDirectiveBody. Custom directives can specify the Writer used in their nested content. So you can do something like <#output file="...">...</#output>, where the nested content will be written into the Writer you have provided in your TemplateDirectiveModel implementation, which in this case should write into the file specified. (FMPP does this too: http://fmpp.sourceforge.net/qtour.html#sect4)
You cannot do this using only FreeMarker. Its idea is to produce a single output stream from your template. It doesn't even care whether you will save the result to a file, pass it directly to a TCP socket, store it in memory as a string, or do anything else.
If you really want to achieve this, you have to handle file separation by yourself. For example, you can insert a special line like:
<#assign x=3>
<#list 1..x as i>
${i}
%%%%File=output${i}.html
...
</#list>
After that you should post-process the FreeMarker output yourself, looking for lines starting with %%%%File= and creating a new file at each such point, as in the sketch below.
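A minimal sketch of such a post-processor (the marker prefix and the UTF-8 encoding are just the assumptions from this example):
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;

public class OutputSplitter {
    private static final String MARKER = "%%%%File=";

    // reads the combined FreeMarker output and starts a new file every time a marker line appears
    public static void split(Reader combinedOutput) throws IOException {
        BufferedReader reader = new BufferedReader(combinedOutput);
        PrintWriter current = null;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.startsWith(MARKER)) {
                if (current != null) {
                    current.close();
                }
                current = new PrintWriter(line.substring(MARKER.length()), "UTF-8");
            } else if (current != null) {
                current.println(line);
            }
        }
        if (current != null) {
            current.close();
        }
    }
}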
As ddekany said, you can do that implementing a directive. I have coded a little example:
package spikes;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;
import freemarker.core.Environment;
import freemarker.template.Configuration;
import freemarker.template.SimpleScalar;
import freemarker.template.Template;
import freemarker.template.TemplateDirectiveBody;
import freemarker.template.TemplateDirectiveModel;
import freemarker.template.TemplateException;
import freemarker.template.TemplateModel;
import io.vertx.core.json.JsonArray;
import io.vertx.core.json.JsonObject;
class OutputDirective implements TemplateDirectiveModel {
@Override
public void execute(
Environment env,
@SuppressWarnings("rawtypes") Map params,
TemplateModel[] loopVars,
TemplateDirectiveBody body)
throws TemplateException, IOException {
SimpleScalar file = (SimpleScalar) params.get("file");
FileWriter fw = new FileWriter(new File(file.getAsString()));
body.render(fw);
fw.flush();
fw.close();
}
}
public class FreemarkerTest {
public static void main(String[] args) throws Exception {
Configuration cfg = new Configuration(Configuration.VERSION_2_3_0);
cfg.setDefaultEncoding("UTF-8");
JsonObject model = new JsonObject()
.put("entities", new JsonArray()
.add(new JsonObject()
.put("name", "Entity1"))
.add(new JsonObject()
.put("name", "Entity2")));
Template template = new Template("Test", "<#assign model = model?eval_json><#list model.entities as entity><#output file=entity.name + \".txt\">This is ${entity.name} entity\n</#output></#list>", cfg);
Map<String, Object> root = new HashMap<String, Object>();
root.put("output", new OutputDirective());
root.put("model", model.encode());
Writer out = new OutputStreamWriter(System.out);
template.process(root, out);
}
}
This will generate two files:
"Entity1.txt": This is Entity1 entity
"Entity2.txt": This is Entity2 entity
:-)

Amazon Product Advertising API through Java/SOAP

I have been playing with Amazon's Product Advertising API, and I cannot get a request to go through and give me data. I have been working off of this: http://docs.amazonwebservices.com/AWSECommerceService/2011-08-01/GSG/ and this: Amazon Product Advertising API signed request with Java
Here is my code. I generated the SOAP bindings using this: http://docs.amazonwebservices.com/AWSECommerceService/2011-08-01/GSG/YourDevelopmentEnvironment.html#Java
On the Classpath, I only have: commons-codec.1.5.jar
import com.ECS.client.jax.AWSECommerceService;
import com.ECS.client.jax.AWSECommerceServicePortType;
import com.ECS.client.jax.Item;
import com.ECS.client.jax.ItemLookup;
import com.ECS.client.jax.ItemLookupRequest;
import com.ECS.client.jax.ItemLookupResponse;
import com.ECS.client.jax.ItemSearchResponse;
import com.ECS.client.jax.Items;
public class Client {
public static void main(String[] args) {
String secretKey = <my-secret-key>;
String awsKey = <my-aws-key>;
System.out.println("API Test started");
AWSECommerceService service = new AWSECommerceService();
service.setHandlerResolver(new AwsHandlerResolver(
secretKey)); // important
AWSECommerceServicePortType port = service.getAWSECommerceServicePort();
// Get the operation object:
com.ECS.client.jax.ItemSearchRequest itemRequest = new com.ECS.client.jax.ItemSearchRequest();
// Fill in the request object:
itemRequest.setSearchIndex("Books");
itemRequest.setKeywords("Star Wars");
// itemRequest.setVersion("2011-08-01");
com.ECS.client.jax.ItemSearch ItemElement = new com.ECS.client.jax.ItemSearch();
ItemElement.setAWSAccessKeyId(awsKey);
ItemElement.getRequest().add(itemRequest);
// Call the Web service operation and store the response
// in the response object:
com.ECS.client.jax.ItemSearchResponse response = port
.itemSearch(ItemElement);
String r = response.toString();
System.out.println("response: " + r);
for (Items itemList : response.getItems()) {
System.out.println(itemList);
for (Item item : itemList.getItem()) {
System.out.println(item);
}
}
System.out.println("API Test stopped");
}
}
Here is what I get back (I was hoping to see some Star Wars books available on Amazon dumped out to my console :-/):
API Test started
response: com.ECS.client.jax.ItemSearchResponse#7a6769ea
com.ECS.client.jax.Items#1b5ac06e
API Test stopped
What am I doing wrong (note that no "item" in the second for loop is being printed out, because it's empty)? How can I troubleshoot this or get relevant error information?
I don't use the SOAP API, but your bounty requirements didn't state that it had to use SOAP, only that you wanted to call Amazon and get results. So I'll post this working example using the REST API, which will at least fulfill your stated requirements:
I would like some working example code that hits the amazon server and returns results
You'll need to download the following to fulfill the signature requirements:
http://associates-amazon.s3.amazonaws.com/signed-requests/samples/amazon-product-advt-api-sample-java-query.zip
Unzip it and grab the com.amazon.advertising.api.sample.SignedRequestsHelper.java file and put it directly into your project. This code is used to sign the request.
You'll also need to download Apache Commons Codec 1.3 from the link below and add it to your classpath, i.e. add it to your project's library. Note that this is the only version of Codec that will work with the above class (SignedRequestsHelper):
http://archive.apache.org/dist/commons/codec/binaries/commons-codec-1.3.zip
Now you can copy and paste the following making sure to replace your.pkg.here with the proper package name and replace the SECRET and the KEY properties:
package your.pkg.here;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
public class Main {
private static final String SECRET_KEY = "<YOUR_SECRET_KEY>";
private static final String AWS_KEY = "<YOUR_KEY>";
public static void main(String[] args) {
SignedRequestsHelper helper = SignedRequestsHelper.getInstance("ecs.amazonaws.com", AWS_KEY, SECRET_KEY);
Map<String, String> params = new HashMap<String, String>();
params.put("Service", "AWSECommerceService");
params.put("Version", "2009-03-31");
params.put("Operation", "ItemLookup");
params.put("ItemId", "1451648537");
params.put("ResponseGroup", "Large");
String url = helper.sign(params);
try {
Document response = getResponse(url);
printResponse(response);
} catch (Exception ex) {
Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
}
}
private static Document getResponse(String url) throws ParserConfigurationException, IOException, SAXException {
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(url);
return doc;
}
private static void printResponse(Document doc) throws TransformerException, FileNotFoundException {
Transformer trans = TransformerFactory.newInstance().newTransformer();
Properties props = new Properties();
props.put(OutputKeys.INDENT, "yes");
trans.setOutputProperties(props);
StreamResult res = new StreamResult(new StringWriter());
DOMSource src = new DOMSource(doc);
trans.transform(src, res);
String toString = res.getWriter().toString();
System.out.println(toString);
}
}
As you can see, this is much simpler to set up and use than the SOAP API. If you don't have a specific requirement for using the SOAP API, then I would highly recommend that you use the REST API instead.
One of the drawbacks of using the REST API is that the results aren't unmarshalled into objects for you. This could be remedied by creating the required classes based on the WSDL.
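In the meantime, individual fields can be pulled straight out of the returned DOM; for example, here is a sketch using the JDK's XPath API to print every item title from the response (the element names follow the Item/ItemAttributes/Title structure referenced elsewhere in this thread):
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
....
private static void printTitles(Document doc) throws XPathExpressionException {
    XPath xpath = XPathFactory.newInstance().newXPath();
    // match on local-name() so the response's default namespace can be ignored
    NodeList titles = (NodeList) xpath.evaluate(
            "//*[local-name()='Item']/*[local-name()='ItemAttributes']/*[local-name()='Title']",
            doc, XPathConstants.NODESET);
    for (int i = 0; i < titles.getLength(); i++) {
        System.out.println(titles.item(i).getTextContent());
    }
}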
This ended up working (I had to add my associateTag to the request):
public class Client {
public static void main(String[] args) {
String secretKey = "<MY_SECRET_KEY>";
String awsKey = "<MY AWS KEY>";
System.out.println("API Test started");
AWSECommerceService service = new AWSECommerceService();
service.setHandlerResolver(new AwsHandlerResolver(secretKey)); // important
AWSECommerceServicePortType port = service.getAWSECommerceServicePort();
// Get the operation object:
com.ECS.client.jax.ItemSearchRequest itemRequest = new com.ECS.client.jax.ItemSearchRequest();
// Fill in the request object:
itemRequest.setSearchIndex("Books");
itemRequest.setKeywords("Star Wars");
itemRequest.getResponseGroup().add("Large");
// itemRequest.getResponseGroup().add("Images");
// itemRequest.setVersion("2011-08-01");
com.ECS.client.jax.ItemSearch ItemElement = new com.ECS.client.jax.ItemSearch();
ItemElement.setAWSAccessKeyId(awsKey);
ItemElement.setAssociateTag("th0426-20");
ItemElement.getRequest().add(itemRequest);
// Call the Web service operation and store the response
// in the response object:
com.ECS.client.jax.ItemSearchResponse response = port
.itemSearch(ItemElement);
String r = response.toString();
System.out.println("response: " + r);
for (Items itemList : response.getItems()) {
System.out.println(itemList);
for (Item itemObj : itemList.getItem()) {
System.out.println(itemObj.getItemAttributes().getTitle()); // Title
System.out.println(itemObj.getDetailPageURL()); // Amazon URL
}
}
System.out.println("API Test stopped");
}
}
It looks like the response object does not override toString(), so if it contains some sort of error response, simply printing it will not tell you what the error is. You'll need to look at the API for what fields are returned in the response object and individually print those. Either you'll get an obvious error message, or you'll have to go back to their documentation to try to figure out what is wrong.
You need to call the get methods on the Item object to retrieve its details, e.g.:
for (Item item : itemList.getItem()) {
System.out.println(item.getItemAttributes().getTitle()); //Title of item
System.out.println(item.getDetailPageURL()); // Amazon URL
//etc
}
If there are any errors you can get them by calling getErrors()
if (response.getOperationRequest().getErrors() != null) {
System.out.println(response.getOperationRequest().getErrors().getError().get(0).getMessage());
}
