Why do I get duplicated data from flink?

Why do I get duplicated data from flink? - java

I'm new to flink, and I'm trying to read a stream from kafka, however I'm getting duplicate data processed, and I'm wondering why ?
I know that's the problem came from flink because when I wrote a simple consumer in java I got no duplicate data
flink-connector-kafka_2.11 version 1.10.0
flink version 1.11
is there any issue to check if flink is processing only once the data provided by kafka ?
public static void main(String[] args) throws Exception {
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
KafkaConsumer consumer = new KafkaConsumer("fashion","172.16.3.241:9092","fashion","org.apache.kafka.common.serialization.ByteBufferDeserializer");
FlinkKafkaConsumer<JsonNode> stream_consumer = new FlinkKafkaConsumer<>(consumer.getTopic(), new DeserializationSchema<JsonNode>() {
private final ObjectMapper objMapper = new ObjectMapper();
#Override
public JsonNode deserialize(byte[] bytes) throws IOException {
return objMapper.readValue(bytes,JsonNode.class);
}
#Override
public boolean isEndOfStream(JsonNode jsonNode) {
return false;
}
#Override
public TypeInformation<JsonNode> getProducedType() {
return TypeExtractor.getForClass(JsonNode.class);
}
}, consumer.getProperties());
DataStream<JsonNode> tweets = env.addSource(stream_consumer);
tweets.flatMap(new getTweetSchema());
env.execute("Flink Streaming Java API Skeleton");
}
private static class getTweetSchema implements FlatMapFunction<JsonNode, Tweet>{
private static final long serialVersionUID = -6867736771747690202L;
private JSONObject objTweet;
public void flatMap(JsonNode tweet, Collector<Tweet> out) throws JSONException, ParseException{
try{
if (objTweet == null){
objTweet = new JSONObject(tweet.asText());
}
HashSet<String> hashtag = new HashSet<>();
String text = objTweet.get("text").toString();
DateFormat dateFormat = new SimpleDateFormat("EEE MMM d HH:mm:ss Z yyyy", Locale.ENGLISH );
Date created_at = dateFormat.parse(objTweet.get("created_at").toString());
String source = objTweet.get("source").toString();
source = source.substring(source.length() - 11).replaceAll("</a>","");
String lang = objTweet.get("lang").toString();
Boolean isRT = text.matches("^RT.*");
Long id = Long.parseLong(objTweet.get("id").toString());
if (objTweet.has("extended_tweet")){
JSONArray arr = objTweet.getJSONObject("extended_tweet").getJSONObject("entities").getJSONArray("hashtags");
if(!(arr.isEmpty())){
for(int i = 0; i< arr.length();i++){
hashtag.add(arr.getJSONObject(i).get("text").toString());
}
System.out.println(arr);
}
}
out.collect(new Tweet(id, text,created_at,source,lang,isRT,hashtag));
}catch (JSONException | ParseException e){
System.out.println("e");
throw e;
}
}
}

Related

Why is deserialized object's size different from serialized object's size?

I'm saving some Java objects in files. I seralize them that way :
Class Timestamp.java:
public class Timestamp implements java.io.Serializable {
private static final long serialVersionUID = 1L;
private static SimpleDateFormat staticFormat = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz", ENGLISH);
private SimpleDateFormat instanceFormat = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz", ENGLISH);
private long time;
}
Class ObjId.java:
public class ObjId implements java.io.Serializable {
private final int x;
private final int y;
private final int z;
public ObjId(final int x, final int y, final int z) {
this.x = x;
this.y = y;
this.z = z;
}
}
CachedObj.java :
class CachedObj implements java.io.Serializable {
private static final long serialVersionUID = 1L;
private final ObjId id;
private final byte[] data;
private final String eTag;
private Timestamp modified;
private Timestamp expires;
private boolean mustRevalidate;
public CachedObj(final ObjId ID, final byte[] data, final String eTag, final Timestamp modified, final Timestamp expires, final boolean mustRevalidate) {
this.id= ID;
this.data = data;
this.eTag = eTag;
this.modified = modified;
this.expires = expires;
this.mustRevalidate = mustRevalidate;
}
}
The rest of my code :
public void saveObjToFlash(CachedObj obj, String fileName) {
ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutputStream out = null;
try {
out = new ObjectOutputStream(bos);
out.writeObject(obj);
out.flush();
bos.close();
out.close();
try (OutputStream outputStream = new FileOutputStream(fileName)) {
bos.writeTo(outputStream);
} catch (Exception e) {
e.getStackTrace();
}
} catch (IOException e) {
e.printStackTrace();
}
}
In another place in my code I deserialize that way :
public CachedObj getCachedObjFromFile(String filename) {
try {
File file = new File(filename);
FileInputStream fileInput = new FileInputStream(file);
ObjectInputStream objInput = new ObjectInputStream(fileInput);
CachedObj obj= (CachedObj ) objInput.readObject();
fileInput.close();
objInput.close();
return obj;
} catch (IOException ex) {
ex.printStackTrace();
} catch (java.lang.ClassNotFoundException exx) {
exx.printStackTrace();
}
return null;
}
And I have a function that calculates an CachedObj object size :
public int getObjectSize(CachedObj obj) {
ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutputStream out = null;
try {
out = new ObjectOutputStream(bos);
out.writeObject(obj);
out.flush();
out.close();
bos.close();
int sizeObj = bos.toByteArray().length;
return sizeObj;
} catch (IOException e) {
e.printStackTrace();
return -1;
}
}
My question is, when I run the following code :
public static void main(String[] args) {
byte[] bytesArr = new byte[4];
CachedObj obj1 = new CachedObj new ObjId(100, 100, 100), bytesArr , "etag100",
new Timestamp(1659523700),
new Timestamp(1659523700), false);
int size1 = getObjectSize(obj1);
saveObjToFlash(obj1, "file");
CachedObj obj2 = getCachedObjFromFile("file"); // objects obj1 and obj2 seems to have same exact fields values
int size2 = getObjectSize(obj2); // I always get : size2 = size1 - 2 !!
}
Why are size1 and size2 different ?
How can I get the size of obj2 such as I'll get the same value of size1 ?

This isn't a complete answer. Your issue with slight change in the serialised size is down to the SimpleDateFormat in Timestamp. Comment out the instanceFormat field, or make the field transient will make all the sizes the same again, or change your example to just serialise SimpleDateFormat instead of CachedObj:
private SimpleDateFormat instanceFormat = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz", Locale.ENGLISH);
SimpleDateFormat isn't a good type of field to add to the serialized format of your class as it has a relatively big memory footprint (~ 48KB!). It would be better to created one instance in the application code, never serialised, and re-use it in same thread for all your Timestamp <-> String conversions performed rather than allocate new instance per Timestamp.
If you are performing a significant number of conversions re-using the same SimpleDateFormat for date to string formatting will reduce overall memory churn on large application servers.
By the way, your calls can be simplified with try-with-resources any adapted for use by and Serializable:
public static int getObjectSize(Serializable obj) throws IOException {
ByteArrayOutputStream bos = new ByteArrayOutputStream();
try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
out.writeObject(obj);
}
return bos.size();
}
public static void saveObjToFlash(Serializable obj, String fileName) throws IOException {
try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(fileName))) {
out.writeObject(obj);
}
}
public static <T extends Serializable> T getCachedObjFromFile(String filename) throws IOException, ClassNotFoundException {
try (ObjectInputStream objInput = new ObjectInputStream(new FileInputStream(filename))) {
return (T)objInput.readObject();
}
}

Writing an array of STRUCT from Dataflow to big query

I am trying to write an Array of Structs field from my Dataflow pipeline to big query, the schema of the table generated is correct but no data gets populated in the fields.
My DoFn function:
public class ProcessIpBlocks {
public static class IpBlocksToIp extends DoFn<TableRow, TableRow> {
private static final long serialVersionUID = 1L;
#Override
public void processElement(ProcessContext c) throws JSONException {
TableRow row = c.element();
DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
Calendar cal = Calendar.getInstance();
long startIp = 0L, endIp = 0L;
if(row.get("start_ip") != null)
startIp = Long.parseLong((String)row.get("start_ip"));
if(row.get("end_ip") != null)
endIp = Long.parseLong((String)row.get("end_ip"));
for(long i= startIp; i<=endIp; i++)
{
TableRow outputRow = new TableRow();
outputRow.set("start_ip", startIp);
outputRow.set("ip", i);
if(row.get("postal_code") != null && !((String)row.get("postal_code")).isEmpty()){
System.out.println("This is getting written to logs");
endIp = Long.parseLong((String)row.get("end_ip"));
JSONArray atrArray = new JSONArray();
JSONObject atr = new JSONObject();
atr.put("id", "zippostal_code");
JSONArray atrValueArray = new JSONArray();
atr.put("value", atrValueArray.put((String)row.get("postal_code")));
atr.put("pr", 0.5);
atr.put("dt", cal.getTime());
atrArray.put(atr);
outputRow.set("atr", atrArray);
}
c.output(outputRow);
}
}
}
}
My pipeline write step:
iPBlocksToIPData.apply("Foo", ParDo.of(new ProcessIpBlocks.IpBlocksToIp()))
.apply(BigQueryIO.Write
.named("WriteIPs")
.to(String.format("%1$s:%2$s.%3$s",projectId, eventDataset, ipBlocksToIpTable))
.withSchema(schema)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE)
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));

Below solution worked, using TableRow instead of JSONArray
public class Foo {
public static class Foo extends DoFn<TableRow, TableRow> {
#Override
public void processElement(ProcessContext c) throws JSONException {
TableRow row = c.element();
DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
Calendar cal = Calendar.getInstance();
long startIp = 0L, endIp = 0L;
if(row.get("start_ip") != null)
startIp = Long.parseLong((String)row.get("start_ip"));
if(row.get("end_ip") != null)
endIp = Long.parseLong((String)row.get("end_ip"));
for(long i= startIp; i<=endIp; i++)
{
TableRow outputRow = new TableRow();
outputRow.set("start_ip", startIp);
outputRow.set("ip", i);
if(row.get("postal_code") != null && !((String)row.get("postal_code")).isEmpty()){
endIp = Long.parseLong((String)row.get("end_ip"));
TableRow atrRow = new TableRow();
atrRow.set("id", "zippostal_code");
atrRow.set("value", new String[] {(String)row.get("postal_code")});
outputRow.set("atr", atrRow);
}
System.out.println(outputRow);
c.output(outputRow);
}
}
}

Memory leak in Oracle's Jena adaptation?

(note; this is also asked on the OTN discussion fora - but I'm not too sure there's much activity there)
I've got a test using rdf_semantic_graph_support_for_apache_jena_2.11.1_with_12101_server_patch_build0529 and observe a OOM behavior when running the attached test-class's main() method.
I'm at a loss to see why, though.
In MAT it seems that theres a whole bunch of Oracle instances that hang around, hugging a lot of Strings - but I can verify via SQL*Developer that the connections are successfully closed.
Stripping the test down, I see the apparent leakage happen upon just about any interaction with the ModelOracleSem or GraphOracleSem.
public class OracleSemTxIntegrationWithSpringITCase {
private OracleDataSource oracleDataSource;
private OraclePool oraclePool;
#Before
public void before() throws SQLException {
java.util.Properties prop = new java.util.Properties();
prop.setProperty("MinLimit", "2"); // the cache size is 2 at least
prop.setProperty("MaxLimit", "10");
prop.setProperty("InitialLimit", "2"); // create 2 connections at startup
prop.setProperty("InactivityTimeout", "200"); // seconds
prop.setProperty("AbandonedConnectionTimeout", "100"); // seconds
prop.setProperty("MaxStatementsLimit", "10");
prop.setProperty("PropertyCheckInterval", "60"); // seconds
oracleDataSource = new OracleDataSource();
oracleDataSource.setURL("jdbc:oracle:thin:#**********");
oracleDataSource.setUser("rdfuser");
oracleDataSource.setPassword("****");
oracleDataSource.setConnectionProperties(prop);
oraclePool = new OraclePool(oracleDataSource);
}
#Test
public void testTransactionHandlingViaJdbcTransactions() throws Exception {
final Oracle oracle1 = oraclePool.getOracle();
final Oracle oracle2 = oraclePool.getOracle();
final Oracle oracle3 = oraclePool.getOracle();
final GraphOracleSem graph1 = new GraphOracleSem(oracle1, OracleMetadataDaoITCase.INTEGRATION_TEST_MODEL);
final Model model1 = new ModelOracleSem(graph1);
final GraphOracleSem graph2 = new GraphOracleSem(oracle2, OracleMetadataDaoITCase.INTEGRATION_TEST_MODEL);
final Model model2 = new ModelOracleSem(graph2);
GraphOracleSem graph3 = new GraphOracleSem(oracle3, OracleMetadataDaoITCase.INTEGRATION_TEST_MODEL);
Model model3 = new ModelOracleSem(graph3);
removePersons(model3);
model3.commit();
model3.close();
graph3 = new GraphOracleSem(oracle3, OracleMetadataDaoITCase.INTEGRATION_TEST_MODEL);
model3 = new ModelOracleSem(graph3);
model1.add(model1.createResource("http://www.tv2.no/people/person-1"), DC.description, "A dude");
model2.add(model1.createResource("http://www.tv2.no/people/person-2"), DC.description, "Another dude");
int countPersons = countPersons(model3);
assertEquals(0, countPersons);
model1.commit();
countPersons = countPersons(model3);
assertEquals(1, countPersons);
model2.commit();
countPersons = countPersons(model3);
assertEquals(2, countPersons);
oracle1.commitTransaction();
oracle2.commitTransaction();
oracle3.commitTransaction();
model1.close();
model2.close();
model3.close();
oracle1.dispose();
oracle2.dispose();
oracle3.dispose();
System.err.println("all disposed");
}
public static void main(String ...args) throws Exception {
OracleSemTxIntegrationWithSpringITCase me = new OracleSemTxIntegrationWithSpringITCase();
me.before();
Stopwatch sw = Stopwatch.createStarted();
for(int n = 0; n < 1000; n++) {
me.testTransactionHandlingViaJdbcTransactions();
}
System.err.println("DONE: " + sw.stop());
me.after();
}
#After
public void after() throws SQLException {
oracleDataSource.close();
}
private int countPersons(final Model model) {
return listPersons(model).size();
}
private void removePersons(final Model model) {
final List<Resource> persons = listPersons(model);
persons.stream().forEach(per -> model.removeAll(per, null, null));
}
private List<Resource> listPersons(final Model model) {
final List<Resource> persons = Lists.newArrayList();
ExtendedIterator<Resource> iter = model.listSubjects()
.filterKeep(new Filter<Resource>() {
#Override
public boolean accept(Resource o) {
return o.getURI().startsWith("http://www.tv2.no/people/person-");
}
})
;
iter.forEachRemaining(item -> persons.add(item));
iter.close();
return persons;
}
}

Oracle has provided a fix for this, which I would assume will be publicly available at some time.

java.lang.RuntimeException: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.String

While running a storm topology I am getting this error.The topology runs perfectly for 5mins without any error then it fails.I am using
Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS as 300 sec i.e 5mins.
This is my inputstream :
{"_id":{"$oid":"556809dbe4b0ef41436f7515"},"body":{"ProductCount":NumberInt(1),"category":null,"correctedWord":"bbtp","field":null,"filter":{},"fromAutocomplete":false,"loggedIn":false,"pageNo":"1","pageSize":"64","percentageMatch":NumberInt(100),"searchTerm":"bbtp","sortOrder":null,"suggestedWords":[]},"envelope":{"IP":"115.115.115.98","actionType":"search","sessionId":"10536088910863418864","timestamp":{"$date":"2015-05-29T06:40:00.000Z"}}}
This is the complete error:
java.lang.RuntimeException: java.lang.ClassCastException: java.lang.Long
cannot be cast to java.lang.String at
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) at
backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) at
backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) at
backtype.storm.daemon.executor$fn__4722$fn__4734$fn__4781.invoke(executor.clj:748) at
backtype.storm.util$async_loop$fn__458.invoke(util.clj:463) at
clojure.lang.AFn.run(AFn.java:24) at
java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast to
java.lang.String at
backtype.storm.tuple.TupleImpl.getString(TupleImpl.java:112) at
com.inferlytics.InferlyticsStormConsumer.bolt.QueryNormalizer.execute(QueryNor
malizer.java:40) at
backtype.storm.topology.BasicBoltExecutor.execute(BasicBoltExecutor.java:50) at
backtype.storm.daemon.executor$fn__4722$tuple_action_fn__4724.invoke(executor.clj:633) at
backtype.storm.daemon.executor$mk_task_receiver$fn__4645.invoke(executor.clj:4
04) at
backtype.storm.disruptor$clojure_handler$reify__1446.onEvent(disruptor.clj:58) at
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) ... 6 more
My topology :
public class TopologyMain {
private static final org.slf4j.Logger LOG = org.slf4j.LoggerFactory
.getLogger(TopologyMain.class);
private static final String SPOUT_ID = "Feed-Emitter";
/**
* #param args
*/
/**
* #param args
* #throws AlreadyAliveException
* #throws InvalidTopologyException
*/
/**
* #param args
* #throws AlreadyAliveException
* #throws InvalidTopologyException
*/
public static void main(String[] args) throws AlreadyAliveException, InvalidTopologyException {
int numSpoutExecutors = 1;
LOG.info("This is SpoutConfig");
KafkaSpout kspout = QueryCounter();
TopologyBuilder builder = new TopologyBuilder();
LOG.info("This is Set Spout");
builder.setSpout(SPOUT_ID, kspout, numSpoutExecutors);
LOG.info("This is Query-Normalizer bolt");
builder.setBolt("Query-normalizer", new QueryNormalizer())
.shuffleGrouping(SPOUT_ID);
LOG.info("This is Query-ProductCount bolt");
builder.setBolt("Query-ProductCount", new QueryProductCount(),1)
.shuffleGrouping("Query-normalizer", "stream1");
LOG.info("This is Query-SearchTerm bolt");
builder.setBolt("Query-SearchTerm", new QuerySearchTermCount(),1)
.shuffleGrouping("Query-normalizer", "stream2");
LOG.info("This is tick-tuple bolt");
builder.setBolt("Tick-Tuple", new TickTuple(),1)
.shuffleGrouping("Query-normalizer", "stream3");
/*
* Storm Constants
* */
String NIMBUS_HOST = FilePropertyManager.getProperty( ApplicationConstants.STORM_CONSTANTS_FILE,
ApplicationConstants.NIMBUS_HOST );
String NIMBUS_THRIFT_PORT = FilePropertyManager.getProperty( ApplicationConstants.STORM_CONSTANTS_FILE,
ApplicationConstants.NIMBUS_THRIFT_PORT );
String TOPOLOGY_TICK_TUPLE_FREQ_SECS = FilePropertyManager.getProperty( ApplicationConstants.STORM_CONSTANTS_FILE,
ApplicationConstants.TOPOLOGY_TICK_TUPLE_FREQ_SECS );
String STORM_JAR = FilePropertyManager.getProperty( ApplicationConstants.STORM_CONSTANTS_FILE,
ApplicationConstants.STORM_JAR );
String SET_NUM_WORKERS = FilePropertyManager.getProperty( ApplicationConstants.STORM_CONSTANTS_FILE,
ApplicationConstants.SET_NUM_WORKERS );
String SET_MAX_SPOUT_PENDING = FilePropertyManager.getProperty( ApplicationConstants.STORM_CONSTANTS_FILE,
ApplicationConstants.SET_MAX_SPOUT_PENDING );
final int setNumWorkers = Integer.parseInt(SET_NUM_WORKERS);
final int setMaxSpoutPending = Integer.parseInt(SET_MAX_SPOUT_PENDING);
final int nimbus_thirft_port = Integer.parseInt(NIMBUS_THRIFT_PORT);
final int topology_tick_tuple_freq_secs = Integer.parseInt(TOPOLOGY_TICK_TUPLE_FREQ_SECS);
/*
* Storm Configurations
*/
LOG.trace("Setting Configuration");
Config conf = new Config();
LocalCluster cluster = new LocalCluster();
conf.put(Config.NIMBUS_HOST, NIMBUS_HOST);
conf.put(Config.NIMBUS_THRIFT_PORT, nimbus_thirft_port);
conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, topology_tick_tuple_freq_secs);
System.setProperty("storm.jar",STORM_JAR );
conf.setNumWorkers(setNumWorkers);
conf.setMaxSpoutPending(setMaxSpoutPending);
if (args != null && args.length > 0) {
LOG.trace("Storm Topology Submitted On CLuster");
StormSubmitter. submitTopology(args[0], conf, builder.createTopology());
}
else
{
LOG.trace("Storm Topology Submitted On Local");
cluster.submitTopology("Query", conf, builder.createTopology());
Utils.sleep(10000);
cluster.killTopology("Query");
LOG.trace("This is ShutDown cluster");
cluster.shutdown();
}
LOG.trace("Method: main finished.");
}
private static KafkaSpout QueryCounter() {
//Build a kafka spout
/*
* Kafka Constants
*/
final String topic = FilePropertyManager.getProperty( ApplicationConstants.KAFKA_CONSTANTS_FILE,
ApplicationConstants.TOPIC );
String zkHostPort = FilePropertyManager.getProperty( ApplicationConstants.KAFKA_CONSTANTS_FILE,
ApplicationConstants.ZOOKEEPER_CONNECTION_STRING );
String zkRoot = "/Feed-Emitter";
String zkSpoutId = "Feed-Emitter-spout";
ZkHosts zkHosts = new ZkHosts(zkHostPort);
LOG.trace("This is Inside kafka spout ");
SpoutConfig spoutCfg = new SpoutConfig(zkHosts, topic, zkRoot, zkSpoutId);
spoutCfg.scheme = new SchemeAsMultiScheme(new StringScheme());
KafkaSpout kafkaSpout = new KafkaSpout(spoutCfg);
LOG.trace("Returning From kafka spout ");
return kafkaSpout;
}
}
My QueryNormalizer Bolt :
public class QueryNormalizer extends BaseBasicBolt {
/**
*
*/
private static final org.slf4j.Logger LOG = org.slf4j.LoggerFactory
.getLogger(QueryNormalizer.class);
public void cleanup() {}
/**
* The bolt will receive the line from the
* feed file and process it to Normalize this line
*
* The normalize will be put the terms in lower case
* and split the line to get all terms.
*/
public void execute(Tuple input, BasicOutputCollector collector) {
LOG.trace("Method in QueryNormalizer: execute called.");
String feed = input.getString(0);
String searchTerm = null;
String pageNo = null;
boolean sortOrder = true;
boolean category = true;
boolean field = true;
boolean filter = true;
String pc = null;
int ProductCount = 0;
String timestamp = null;
String year = null;
String month = null;
String day = null;
String hour = null;
Calendar calendar = Calendar.getInstance();
int dayOfYear = calendar.get(Calendar.DAY_OF_YEAR);
int weekOfYear = calendar.get(Calendar.WEEK_OF_YEAR);
JSONObject obj = null;
try {
obj = new JSONObject(feed);
} catch (JSONException e1) {
LOG.error( "Json Exception in Query Normalizer", e1 );
}
try {
searchTerm = obj.getJSONObject("body").getString("correctedWord");
pageNo = obj.getJSONObject("body").getString("pageNo");
sortOrder = obj.getJSONObject("body").isNull("sortOrder");
category = obj.getJSONObject("body").isNull("category");
field = obj.getJSONObject("body").isNull("field");
filter = obj.getJSONObject("body").getJSONObject("filter").isNull("filters");
pc = obj.getJSONObject("body").getString("ProductCount").replaceAll("[^\\d]", "");
ProductCount = Integer.parseInt(pc);
timestamp = (obj.getJSONObject("envelope").get("timestamp")).toString().substring(10,29);
year = (obj.getJSONObject("envelope").get("timestamp")).toString().substring(10, 14);
month = (obj.getJSONObject("envelope").get("timestamp")).toString().substring(15, 17);
day = (obj.getJSONObject("envelope").get("timestamp")).toString().substring(18, 20);
hour = (obj.getJSONObject("envelope").get("timestamp")).toString().substring(21, 23);
} catch (JSONException e) {
LOG.error( "Parsing Value Exception in Query Normalizer", e );
}
searchTerm = searchTerm.trim();
//Condition to eliminate pagination
if(!searchTerm.isEmpty()){
if ((pageNo.equals("1")) && (sortOrder == true) && (category == true) && (field == true) && (filter == true)){
searchTerm = searchTerm.toLowerCase();
System.out.println("In QueryProductCount execute: "+searchTerm+","+year+","+month+","+day+","+hour+","+dayOfYear+","+weekOfYear+","+ProductCount);
System.out.println("Entire Json : "+feed);
System.out.println("In QuerySearchCount execute : "+searchTerm+","+year+","+month+","+day+","+hour);
LOG.trace("In QueryNormalizer execute : "+searchTerm+","+year+","+month+","+day+","+hour+","+dayOfYear+","+weekOfYear+","+ProductCount);
LOG.trace("In QueryNormalizer execute : "+searchTerm+","+year+","+month+","+day+","+hour);
collector.emit("stream1", new Values(searchTerm , year , month , day , hour , dayOfYear , weekOfYear , ProductCount ));
collector.emit("stream2", new Values(searchTerm , year , month , day , hour ));
collector.emit("stream3", new Values());
}LOG.trace("Method in QueryNormalizer: execute finished.");
}
}
/**
* The bolt will only emit the specified streams in collector
*/
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declareStream("stream1", new Fields("searchTerm" ,"year" ,"month" ,"day" ,"hour" ,"dayOfYear" ,"weekOfYear" ,"ProductCount"));
declarer.declareStream("stream2", new Fields("searchTerm" ,"year" ,"month" ,"day" ,"hour"));
declarer.declareStream("stream3", new Fields());
}
}
In the QueryNormalizer class the error is shown at this line
String feed = input.getString(0);
public void execute(Tuple input, BasicOutputCollector collector) {
LOG.trace("Method in QueryNormalizer: execute called.");
String feed = input.getString(0);
String searchTerm = null;
Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast to
java.lang.String at
backtype.storm.tuple.TupleImpl.getString(TupleImpl.java:112) at
com.inferlytics.InferlyticsStormConsumer.bolt.QueryNormalizer.execute(QueryNor
malizer.java:40)
EDIT :
After removing Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS from the config the code works properly.But I have to implement Tick Tuple () . How to achieve it?
I guess there is some problem with my TickTuple class. Is this the right way to Implement it ?
TickTuple
public class TickTuple extends BaseBasicBolt {
private static final long serialVersionUID = 1L;
private static final org.slf4j.Logger LOG = org.slf4j.LoggerFactory
.getLogger(TickTuple.class);
private static final String KEYSPACE = FilePropertyManager.getProperty( ApplicationConstants.CASSANDRA_CONSTANTS_FILE,
ApplicationConstants.KEYSPACE );
private static final String MONGO_DB = FilePropertyManager.getProperty( ApplicationConstants.MONGO_CONSTANTS_FILE,
ApplicationConstants.MONGO_DBE );
private static final String TABLE_CASSANDRA_TOP_QUERY = FilePropertyManager.getProperty( ApplicationConstants.CASSANDRA_CONSTANTS_FILE,
ApplicationConstants.TABLE_CASSANDRA_TOP_QUERY );
private static final String MONGO_COLLECTION_E = FilePropertyManager.getProperty( ApplicationConstants.MONGO_CONSTANTS_FILE,
ApplicationConstants.MONGO_COLLECTION_E );
public void cleanup() {
}
protected static boolean isTickTuple(Tuple tuple) {
return tuple.getSourceComponent().equals(Constants.SYSTEM_COMPONENT_ID)
&& tuple.getSourceStreamId().equals(Constants.SYSTEM_TICK_STREAM_ID);
}
#Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {}
#Override
public void execute(Tuple input, BasicOutputCollector collector) {
try {
if (isTickTuple(input)) {
CassExport.cassExp(KEYSPACE, TABLE_CASSANDRA_TOP_QUERY, MONGO_DB, MONGO_COLLECTION_E);
TruncateCassandraTable.truncateData(TABLE_CASSANDRA_TOP_QUERY);
Log.trace("In Truncate");
return;
}
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
Can Anyone please suggest the required changes in the code ?

Now I understand: You have data tuples and tick tuples in the same input stream. Thus, for data tuples the first field is of type String, but for tick tuples it is of type Long. Thus, input.getString(0) runs in the ClassCastException for the first arriving tick tuple.
You need to update you bolt code like this:
Object field1 = input.getValue(0);
if (field1 instanceof Long) {
Long tick = (Long)field1;
// process tick tuple further
} else {
String feed = (String)field1;
// process data tuple as you did already
}

You need to differentiate between tick tuples and normal tuples within your execute method. Add this method to your bolt :
public boolean isTickTuple(Tuple tuple) {
return tuple.getSourceComponent().equals(Constants.SYSTEM_COMPONENT_ID)
&& tuple.getSourceStreamId().equals(Constants.SYSTEM_TICK_STREAM_ID);
}
Now in execute, you can do
if(isTickTuple(tuple)){
doSomethingPeriodic()
} else {
executeLikeBefore()
}

The problem was with my TickTuple Bolt Implementation.I have added the
conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, topology_tick_tuple_freq_secs)
In my main topology configuration.Instead it should be added in the bolt where TickTuple has be implemented.
I edited my TickTuple code , added this snippet and everything works fine.
#Override
public Map<String, Object> getComponentConfiguration() {
// configure how often a tick tuple will be sent to our bolt
Config conf = new Config();
conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, topology_tick_tuple_freq_secs);
return conf;
}
This has to be added in the corresponding bolt instead of the main topology

servlet not working in CouchBase server

IN order to test couchbase, I am trying to create 30K-specific documents in 15 minutes.
During the test, 6563 documents are created and then hangs. I have seen that it takes 2 minutes to create 0-3K thousand; 5 minutes to create between 3K-6K; and 5 minutes to create the final 6K -6.5K documents.
Example here.
I would appreciate help in understanding what I am doing wrong. The code is below:
public class ConnectionManager {
Logger logger = Logger.getLogger(getClass().getName());
private CouchbaseClient client;
public ConnectionManager() {
init();
}
public void init() {
try {
logger.info("Opening base connection.");
List<URI> hosts = Arrays.asList(new URI("http://127.0.0.1:8091/pools"));
String bucket = "default";
String password = "";
client = new CouchbaseClient(hosts, bucket, password);
} catch (Exception e) {
client = null;
throw new IllegalStateException(e);
}
}
#PreDestroy
public void destroy() {
logger.info("Closing base connection.");
if (client != null) {
client.shutdown();
client = null;
}
}
public CouchbaseClient getClient() {
return client;
}
}
public class DatabaseManager {
ConnectionManager cm;
public DatabaseManager() {
cm = new ConnectionManager();
}
public String addDocument(String result) {
CouchbaseClient c = cm.getClient();
JSONParameters j = new JSONParameters();
String id = UUID.randomUUID().toString();
Date today = new Date();
SimpleDateFormat DATE_FORMAT = new SimpleDateFormat("yyyy.MM.dd HH:mm:ss.SSSZ");
String date = DATE_FORMAT.format(today);
j.setTime(date);
j.setData(UUID.randomUUID().toString());
j.setSender_id(result);
j.setFlag(false);
Gson gson = new Gson();
String json = gson.toJson(j);
c.add(result, json);
return json;
}
public class DataBaseAddServlet extends HttpServlet {
#Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
try {
for (int k = 0; k < 30000; k++) {
String id = UUID.randomUUID().toString();
DatabaseManager dbManager = new DatabaseManager();
dbManager.addDocument(id);
}
} catch (Exception e) {
resp.getOutputStream().println(e.getMessage());
resp.flushBuffer();
}
}
}

I think you missed a key point from the example you linked to: in the Servlet, DatabaseManager is injected via dependency injection and thus there's only one instance created.
In you code, you actually create a new DatabaseManager inside the loop, so you end up creating 30K CouchbaseClients. You are probably hitting a limit, and definitely wasting a lot of time and resources with that much extra clients.
Just moving the DatabaseManager dbManager = new DatabaseManager(); before the for loop should make things far better.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Why do I get duplicated data from flink? - java

Related

Why is deserialized object's size different from serialized object's size?

Writing an array of STRUCT from Dataflow to big query

Memory leak in Oracle's Jena adaptation?

java.lang.RuntimeException: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.String

servlet not working in CouchBase server

Categories

Resources