Using Scala and Spark, I am trying to build a simple app. To avoid too much logging, I have set the log level to Level.ERROR, but all the logs still appear. Here is my code:
import org.apache.log4j.{BasicConfigurator, Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}
/**
 * Created by hduser on 16/08/16.
 */
object ALS_Test {
  def main(command: Array[String]): Unit = {
    Logger.getRootLogger
    Logger.getLogger(this.getClass).setLevel(Level.ERROR)
    Logger.getLogger("org.spark_project").setLevel(Level.ERROR)
    BasicConfigurator.configure()

    val sparkConf = new SparkConf().setAppName("AppName").setMaster("local[4]")
    val sc = new SparkContext(sparkConf)
    println("test 123")
  }
}
The output when running still contains many confusing logs:
0 [main] INFO org.apache.spark.SparkContext - Running Spark version 2.0.0
163 [main] DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation #org.apache.hadoop.metrics2.annotation.Metric(always=false, about=, sampleName=Ops, type=DEFAULT, value=[Rate of successful kerberos logins and latency (milliseconds)], valueName=Time)
175 [main] DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation #org.apache.hadoop.metrics2.annotation.Metric(always=false, about=, sampleName=Ops, type=DEFAULT, value=[Rate of failed kerberos logins and latency (milliseconds)], valueName=Time)
176 [main] DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation #org.apache.hadoop.metrics2.annotation.Metric(always=false, about=, sampleName=Ops, type=DEFAULT, value=[GetGroups], valueName=Time)
177 [main] DEBUG org.apache.hadoop.metrics2.impl.MetricsSystemImpl - UgiMetrics, User and group related metrics
555 [main] DEBUG org.apache.hadoop.util.Shell - Failed to detect a valid hadoop home directory
java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.
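A likely explanation, for reference: BasicConfigurator.configure() attaches a console appender to the root logger, whose level stays at the default DEBUG, and the noisy messages come from the org.apache.spark and org.apache.hadoop loggers, which the code above never touches. A minimal sketch of silencing them programmatically (log4j 1.x API; the same calls work from Scala, and the helper class name is just for illustration):
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public final class QuietLogs {
    // Raise the root logger threshold so the appender added by
    // BasicConfigurator.configure() only emits ERROR and above,
    // then quiet the chatty framework loggers explicitly.
    public static void quiet() {
        Logger.getRootLogger().setLevel(Level.ERROR);
        Logger.getLogger("org.apache.spark").setLevel(Level.ERROR);
        Logger.getLogger("org.apache.hadoop").setLevel(Level.ERROR);
    }
}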
I have a Camel route in a MyRouteBuilder.java file which consumes messages from ActiveMQ:
from("activemq:queue:myQueue" )
.process(consumeDroppedMessage)
.log(">>> I am here");
I wrote a test case for the above route like this:
@Override
public RouteBuilder createRouteBuilder() throws Exception {
    return new MyRouteBuilder();
}

@Test
void testMyTest() throws Exception {
    String queueInputMessage = "My Msg";
    template.sendBody("activemq:queue:myQueue", queueInputMessage);
    assertMockEndpointsSatisfied();
}
When I run the unit test case I get this strange error:
17:53:26.175 [main] DEBUG org.apache.camel.impl.engine.InternalRouteStartupManager - Route: route1 >>> Route[activemq://queue:null -> null]
17:53:26.175 [main] DEBUG org.apache.camel.impl.engine.InternalRouteStartupManager - Starting consumer (order: 1000) on route: route1
17:53:26.175 [main] DEBUG org.apache.camel.support.DefaultConsumer - Build consumer: Consumer[activemq://queue:null]
17:53:26.185 [main] DEBUG org.apache.camel.support.DefaultConsumer - Init consumer: Consumer[activemq://queue:null]
17:53:26.185 [main] DEBUG org.apache.camel.support.DefaultConsumer - Starting consumer: Consumer[activemq://queue:null]
17:53:26.213 [main] DEBUG org.apache.activemq.thread.TaskRunnerFactory - Initialized TaskRunnerFactory[ActiveMQ Task] using ExecutorService: java.util.concurrent.ThreadPoolExecutor#3fffff43[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
17:53:26.215 [main] DEBUG org.apache.activemq.transport.failover.FailoverTransport - Reconnect was triggered but transport is not started yet. Wait for start to connect the transport.
17:53:26.334 [main] DEBUG org.apache.activemq.transport.failover.FailoverTransport - Started unconnected
17:53:26.334 [main] DEBUG org.apache.activemq.transport.failover.FailoverTransport - Waking up reconnect task
17:53:26.335 [ActiveMQ Task-1] DEBUG org.apache.activemq.transport.failover.FailoverTransport - urlList connectionList:[tcp://localhost:61616], from: [tcp://localhost:61616]
17:53:26.339 [main] DEBUG org.apache.camel.component.jms.DefaultJmsMessageListenerContainer - Established shared JMS Connection
17:53:26.340 [main] DEBUG org.apache.camel.component.jms.DefaultJmsMessageListenerContainer - Resumed paused task: org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker#58c34bb3
17:53:26.372 [ActiveMQ Task-1] DEBUG org.apache.activemq.transport.failover.FailoverTransport - Attempting 0th connect to: tcp://localhost:61616
17:53:28.393 [ActiveMQ Task-1] DEBUG org.apache.activemq.transport.failover.FailoverTransport - Connect fail to: tcp://localhost:61616, reason: {}
I am especially stumped to see these messages:
Route: route1 >>> Route[activemq://queue:null -> null]
and
urlList connectionList:[tcp://localhost:61616], from: [tcp://localhost:61616]
Why is the queue coming up as null even though I have a proper queue name? And why is the broker URL tcp://localhost:61616?
I want this unit test to run properly in all environments (local, DIT, SIT, PROD, etc.), so I cannot afford the broker URL to be tcp://localhost:61616.
Any ideas as to what I am doing wrong here and what I should be doing?
EDIT 1:
One of the issues I am seeing is that even before the test class is called, the MyRouteBuilder() inside createRouteBuilder() is invoked, which leads to the issues I see in the log.
The "activemq:queue:.." is telling Camel to use the auto-configure magic behind the scenes (which uses default url) and your use case is beyond that.
You need to configure a connection factory (ActiveMQConnectionFactory) and configure a camel-jms component to use that connection factory.
The connection factory allows you to specify url, userName, password, default connection settings and setup SSL.
A best practice is to externalize the url, userName, password and queue to a properties file so you can change those across the environments-- local, DIT, SIT and prod, etc.
NOTE: Use org.apache.camel/camel-jms component, and not the org.apache.activemq/activemq-camel component. activemq-camel is deprecated and being removed in ActiveMQ 5.17.x.
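For illustration, a minimal sketch of that setup; the broker URL, user, and password here are placeholders that would normally come from the externalized properties file:
import javax.jms.ConnectionFactory;

import org.apache.activemq.ActiveMQConnectionFactory;
import org.apache.camel.CamelContext;
import org.apache.camel.component.jms.JmsComponent;

public final class JmsSetup {
    // Registers a camel-jms component named "activemq" backed by an explicit
    // connection factory, so routes using "activemq:queue:..." stop relying on
    // the auto-configured default broker URL (tcp://localhost:61616).
    public static void register(CamelContext context, String brokerUrl, String user, String password) {
        ConnectionFactory connectionFactory = new ActiveMQConnectionFactory(user, password, brokerUrl);
        context.addComponent("activemq", JmsComponent.jmsComponentAutoAcknowledge(connectionFactory));
    }
}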
Instead of setting up an explicit ActiveMQ broker, I started using an embedded VM broker.
@Override
protected RoutesBuilder createRouteBuilder() throws Exception {
    return new RouteBuilder() {
        @Override
        public void configure() {
            ConnectionFactory connectionFactory = new ActiveMQConnectionFactory("vm://localhost?broker.persistent=false");
            ActiveMQComponent activeMQComponent = new ActiveMQComponent();
            activeMQComponent.setConnectionFactory(connectionFactory);
            context.addComponent("activemq", activeMQComponent);
            from("activemq:queue:myQueue").to("mock:collector");
        }
    };
}
Also, I had mistaken Camel's JUnit support for traditional JUnit. We don't need to call the actual route builder class explicitly. Instead, after setting up the ActiveMQ component as above, I was able to write my test methods, mock my endpoints for the queue, send messages, and assert on them (see the sketch below). Camel is truly versatile, though it requires a lot of study.
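For example, a test method along these lines (assuming the JUnit 4 flavour of CamelTestSupport; the class and method names are just illustrative) drives the route above and asserts on the mock endpoint:
import org.apache.camel.component.mock.MockEndpoint;
import org.apache.camel.test.junit4.CamelTestSupport;
import org.junit.Test;

// Sketch of a test class; in practice this is the same class that overrides createRouteBuilder() above.
public class MyQueueRouteTest extends CamelTestSupport {

    @Test
    public void testMessageReachesCollector() throws Exception {
        // The route forwards everything from activemq:queue:myQueue to mock:collector.
        MockEndpoint collector = getMockEndpoint("mock:collector");
        collector.expectedBodiesReceived("My Msg");

        template.sendBody("activemq:queue:myQueue", "My Msg");

        collector.assertIsSatisfied();
    }
}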
I have changed the Cassandra configuration file:
cat /etc/cassandra/cassandra.yaml | grep -n 'seed'
416:seed_provider:
423: # seeds is actually a comma-delimited list of addresses.
425: - seeds:"84.208.89.132,192.168.0.23,192.168.0.25,192.168.0.28"
and also the cluster name:
10:cluster_name: 'Petter Cluster'
I am surprised by what system.log shows:
INFO [main] 2018-01-27 17:20:51,343 YamlConfigurationLoader.java:89 - Configuration location: file:/etc/cassandra/cassandra.yaml
ERROR [main] 2018-01-27 17:20:51,427 CassandraDaemon.java:706 - Exception encountered during startup: Invalid yaml: file:/etc/cassandra/cassandra.yaml
Error: while parsing a block mapping; expected <block end>, but found FlowEntry; in 'reader', line 425, column 34:
- seeds: "192.168.0.13","192.168.0.23","192.168.0.25"," ...
^
INFO [main] 2018-02-03 20:35:48,528 YamlConfigurationLoader.java:89 - Configuration location: file:/etc/cassandra/cassandra.yaml
ERROR [main] 2018-02-03 20:35:48,844 CassandraDaemon.java:706 - Exception encountered during startup: Invalid yaml: file:/etc/cassandra/cassandra.yaml
Error: null; Can't construct a java object for tag:yaml.org,2002:org.apache.cassandra.config.Config; exception=Cannot create property=seed_provider for JavaBean=org.apache.cassandra.config.Config#551bdc27; java.lang.reflect.InvocationTargetException; in 'reader', line 10, column 1:
cluster_name: 'Test Cluster'
^
INFO [main] 2018-02-03 20:39:08,311 YamlConfigurationLoader.java:89 - Configuration location: file:/etc/cassandra/cassandra.yaml
ERROR [main] 2018-02-03 20:39:08,647 CassandraDaemon.java:706 - Exception encountered during startup: Invalid yaml: file:/etc/cassandra/cassandra.yaml
Error: null; Can't construct a java object for tag:yaml.org,2002:org.apache.cassandra.config.Config; exception=Cannot create property=seed_provider for JavaBean=org.apache.cassandra.config.Config#551bdc27; java.lang.reflect.InvocationTargetException; in 'reader', line 10, column 1:
cluster_name: 'Test Cluster'
How do I fix this? How do I initialize the system after the changes?
It seems you have run into an issue with the cluster name; it is supposed to be changed on all the nodes if you want to change it.
Here are the instructions to change the cluster name:
1. Log into cqlsh.
2. cqlsh> UPDATE system.local SET cluster_name = 'Petter Cluster' WHERE key = 'local'; (You need to issue this command on each node where you want to change the cluster name; system.local is only changed locally.)
3. cqlsh> exit;
4. $ nodetool flush system
5. Edit cluster_name in cassandra.yaml to YOUR_CLUSTER_NAME.
6. Restart Cassandra.
Please check this link as well:
https://surbhinosqldba.wordpress.com/2015/07/23/how-to-rename-modify-cassandra-cluster-name/
Using the DirectPipelineRunner, I'd like to run my pipeline locally for debugging purposes. I'm using SDK 1.9.0 with Java 8.
My pipeline reads a table from BigQuery, transforms some fields, and writes back to BigQuery.
Running on GCP, i.e. using the DataflowPipelineRunner, works absolutely fine. However, when I use the DirectPipelineRunner, it just keeps spitting out the following log info and does nothing else:
19:45:05,470 21866 [main] INFO com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner - Executing pipeline using the DirectPipelineRunner.
19:45:18,594 34990 [main] INFO com.google.cloud.dataflow.sdk.util.BigQueryServicesImpl - No BigQuery job with job id beam_job_c88ee6741e434aabbf50e73d4e6733d1-extract found.
19:45:27,344 43740 [main] INFO com.google.cloud.dataflow.sdk.util.BigQueryServicesImpl - No BigQuery job with job id beam_job_012dca76d75e461480fe75897b5fa7ba-extract found.
19:45:38,150 54546 [main] INFO com.google.cloud.dataflow.sdk.util.BigQueryServicesImpl - No BigQuery job with job id beam_job_3548a0ee373a417e8e7570ae90aef78d-extract found.
19:45:47,912 64308 [main] INFO com.google.cloud.dataflow.sdk.util.BigQueryServicesImpl - No BigQuery job with job id beam_job_db0b957250ef41279a639bdc113c5493-extract found.
19:45:56,685 73081 [main] INFO com.google.cloud.dataflow.sdk.util.BigQueryServicesImpl - No BigQuery job with job id beam_job_3773e0643ec14475aaa140bcf46ea7af-extract found.
19:46:45,958 122354 [main] INFO com.google.cloud.dataflow.sdk.util.BigQueryServicesImpl - No BigQuery job with job id beam_job_27af9a1163944cb19e520242de98d899-extract found.
19:46:55,766 132162 [main] INFO com.google.cloud.dataflow.sdk.util.BigQueryServicesImpl - No BigQuery job with job id beam_job_5473e6702b3544118c7da8877c900f7a-extract found.
19:47:04,015 140411 [main] INFO com.google.cloud.dataflow.sdk.util.BigQueryServicesImpl - No BigQuery job with job id beam_job_40f47d35aa154708a6fc684c8ffb0ba4-extract found.
19:47:11,913 148309 [main] INFO com.google.cloud.dataflow.sdk.util.BigQueryServicesImpl - No BigQuery job with job id beam_job_6dce34301c97498884d7344b85a1b07e-extract found.
19:47:35,809 172205 [main] INFO com.google.cloud.dataflow.sdk.util.BigQueryServicesImpl - No BigQuery job with job id beam_job_4f7c26d372974095a24ac58b547c13d6-extract found.
19:47:45,136 181532 [main] INFO com.google.cloud.dataflow.sdk.util.BigQueryServicesImpl - No BigQuery job with job id beam_job_a7c33e75bfdb41a6990dd66810a0d44a-extract found.
19:47:55,802 192198 [main] INFO com.google.cloud.dataflow.sdk.util.BigQueryServicesImpl - No BigQuery job with job id beam_job_a1d7422ca42a4b1d96205bf8c6dada9d-extract found.
The log message is coming from here:
@VisibleForTesting
public Job getJob(JobReference jobRef, Sleeper sleeper, BackOff backoff)
    throws IOException, InterruptedException {
  String jobId = jobRef.getJobId();
  Exception lastException;
  do {
    try {
      return client.jobs().get(jobRef.getProjectId(), jobId).execute();
    } catch (GoogleJsonResponseException e) {
      if (errorExtractor.itemNotFound(e)) {
        LOG.info("No BigQuery job with job id {} found.", jobId);
        return null;
      }....
Eventually, the JVM runs out of memory:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.String.toLowerCase(String.java:2647)
at com.google.api.client.json.JsonParser.parseValue(JsonParser.java:847)
at com.google.api.client.json.JsonParser.parse(JsonParser.java:472)
at com.google.api.client.json.JsonParser.parseValue(JsonParser.java:781)
at com.google.api.client.json.JsonParser.parseArray(JsonParser.java:648)
at com.google.api.client.json.JsonParser.parseValue(JsonParser.java:740)
at com.google.api.client.json.JsonParser.parse(JsonParser.java:472)
at com.google.api.client.json.JsonParser.parseValue(JsonParser.java:781)
at com.google.api.client.json.JsonParser.parseArray(JsonParser.java:648)
at com.google.api.client.json.JsonParser.parseValue(JsonParser.java:740)
at com.google.api.client.json.JsonParser.parse(JsonParser.java:472)
at com.google.api.client.json.JsonParser.parseValue(JsonParser.java:781)
at com.google.api.client.json.JsonParser.parse(JsonParser.java:382)
at com.google.api.client.json.JsonParser.parse(JsonParser.java:355)
at com.google.api.client.json.JsonObjectParser.parseAndClose(JsonObjectParser.java:87)
at com.google.api.client.json.JsonObjectParser.parseAndClose(JsonObjectParser.java:81)
at com.google.api.client.http.HttpResponse.parseAs(HttpResponse.java:459)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.dataflow.sdk.util.BigQueryTableRowIterator.executeWithBackOff(BigQueryTableRowIterator.java:497)
at com.google.cloud.dataflow.sdk.util.BigQueryTableRowIterator.advance(BigQueryTableRowIterator.java:180)
at com.google.cloud.dataflow.sdk.util.BigQueryServicesImpl$BigQueryJsonReaderImpl.advance(BigQueryServicesImpl.java:555)
at com.google.cloud.dataflow.sdk.io.BigQueryIO$BigQuerySourceBase$BigQueryReader.advance(BigQueryIO.java:1331)
at com.google.cloud.dataflow.sdk.io.Read$Bounded$1.evaluateReadHelper(Read.java:180)
at com.google.cloud.dataflow.sdk.io.Read$Bounded$1.evaluate(Read.java:168)
at com.google.cloud.dataflow.sdk.io.Read$Bounded$1.evaluate(Read.java:164)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.visitTransform(DirectPipelineRunner.java:858)
at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:221)
at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:217)
at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:217)
at com.google.cloud.dataflow.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:103)
The table in BigQuery only has 100 rows (it's just for debugging).
What is the problem here?
I believe the BigQuery message is a red herring; the stack trace of the OOM indicates that data is being read directly from the table and not via an export job.
The DirectPipelineRunner is not at all optimized for memory utilization; try using the newer InProcessPipelineRunner. Additionally, it may be worth using standard Java heap profiling tools to see where the memory is being used.
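For what it's worth, switching runners is just a pipeline-options change. A minimal sketch for Dataflow SDK 1.9.0 (the package and class names below are as I recall them for that SDK, so treat them as an assumption):
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.options.PipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.runners.inprocess.InProcessPipelineRunner;

public class LocalRunnerExample {
    public static void main(String[] args) {
        // Select the in-process runner instead of DirectPipelineRunner for local debugging.
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        options.setRunner(InProcessPipelineRunner.class);

        Pipeline pipeline = Pipeline.create(options);
        // ... build the BigQuery read / transform / write steps here as before ...
        pipeline.run();
    }
}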
I have a few tables in Hive on HDFS; how do I read them into a DataFrame from Spark? How does HiveContext know where my Hive warehouse is?
Here is my current code; it throws an out-of-memory error for some reason, even though these tables are tiny (30k rows max, 3-5 columns).
SparkConf sparkConf = new SparkConf().setAppName("Hive Test").setMaster("local[*]");
JavaSparkContext ctx = new JavaSparkContext(sparkConf);
HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(ctx);
// Queries are expressed in HiveQL.
DataFrame results = sqlContext.sql("SELECT * FROM departments");
results.show();
$ hdfs dfs -ls /user/hive/warehouse
Found 6 items
drwxrwxrwt - [Omitted] hive 0 2016-10-30 14:09 /user/hive/warehouse/categories
drwxrwxrwt - [Omitted] hive 0 2016-10-30 14:13 /user/hive/warehouse/customers
drwxrwxrwt - [Omitted] hive 0 2016-10-30 14:09 /user/hive/warehouse/departments
drwxrwxrwt - [Omitted] hive 0 2016-10-30 14:11 /user/hive/warehouse/order_items
drwxrwxrwt - [Omitted] hive 0 2016-10-30 14:16 /user/hive/warehouse/orders
drwxrwxrwt - [Omitted] hive 0 2016-10-30 14:09 /user/hive/warehouse/products
16/10/30 14:38:29 INFO ClientWrapper: Inspected Hadoop version: 2.6.0
16/10/30 14:38:29 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0
16/10/30 14:38:30 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
16/10/30 14:38:30 INFO ObjectStore: ObjectStore, initialize called
16/10/30 14:38:30 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
16/10/30 14:38:30 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
16/10/30 14:38:30 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/10/30 14:38:30 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/10/30 14:38:31 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
Exception in thread "main"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main"
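For reference on the warehouse question: HiveContext picks up the metastore and warehouse location from the hive-site.xml on the driver's classpath; without one, Spark falls back to a local embedded metastore in the working directory. A hedged sketch of pointing the context at an existing metastore explicitly (the thrift URI is a placeholder, not taken from this setup):
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class HiveReadExample {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("Hive Test").setMaster("local[*]");
        JavaSparkContext ctx = new JavaSparkContext(sparkConf);

        HiveContext sqlContext = new HiveContext(ctx);
        // Point at the existing Hive metastore instead of an embedded local one.
        // (Placeholder URI; normally this comes from hive-site.xml on the classpath.)
        sqlContext.setConf("hive.metastore.uris", "thrift://localhost:9083");

        DataFrame results = sqlContext.sql("SELECT * FROM departments");
        results.show();
    }
}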
I'm using Storm Flux (0.10.0) DSL to deploy the following topology (simplified to keep only the relevant parts of it):
---
name: "my-topology"

components:
  - id: "esConfig"
    className: "java.util.HashMap"
    configMethods:
      - name: "put"
        args:
          - "es.nodes"
          - "${es.nodes}"

bolts:
  - id: "es-bolt"
    className: "org.elasticsearch.storm.EsBolt"
    constructorArgs:
      - "myindex/docs"
      - ref: "esConfig"
    parallelism: 1

# ... other bolts, spouts and streams here
As you can see, one of the bolts I use is org.elasticsearch.storm.EsBolt, which has the following constructors (see the code):
public EsBolt(String target) { ... }
public EsBolt(String target, boolean writeAck) { ... }
public EsBolt(String target, Map configuration) { ... }
The last one should be called because I pass a String and a Map in the constructorArgs. But when I deploy the topology I get the following exception, as if Flux wasn't able to infer the right constructor from the types (String, Map):
storm jar mytopology-1.0.0.jar org.apache.storm.flux.Flux --local mytopology.yml --filter mytopology.properties
...
Version: 0.10.0
Parsing file: mytopology.yml
958 [main] INFO o.a.s.f.p.FluxParser - loading YAML from input stream...
965 [main] INFO o.a.s.f.p.FluxParser - Performing property substitution.
969 [main] INFO o.a.s.f.p.FluxParser - Not performing environment variable substitution.
1252 [main] INFO o.a.s.f.FluxBuilder - Detected DSL topology...
1431 [main] WARN o.a.s.f.FluxBuilder - Found multiple invokable constructors for class class org.elasticsearch.storm.EsBolt, given arguments [myindex/docs, {es.nodes=localhost}]. Using the last one found.
Exception in thread "main" java.lang.IllegalArgumentException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.storm.flux.FluxBuilder.buildObject(FluxBuilder.java:291)
at org.apache.storm.flux.FluxBuilder.buildBolts(FluxBuilder.java:372)
at org.apache.storm.flux.FluxBuilder.buildTopology(FluxBuilder.java:88)
at org.apache.storm.flux.Flux.runCli(Flux.java:153)
at org.apache.storm.flux.Flux.main(Flux.java:98)
Any idea about what could be happening? Here is how Storm Flux finds a compatible constructor. The magic is in the canInvokeWithArgs method.
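Roughly, and only as a simplified paraphrase (not the actual Flux source), that check boils down to matching on argument count and then on per-parameter compatibility; the 0.10.0 version compares types loosely enough that both two-argument constructors get reported as invokable, as the debug logs below show:
import java.lang.reflect.Constructor;

// Simplified illustration of the kind of check FluxBuilder performs; this is a
// paraphrase for explanation, not the real canInvokeWithArgs implementation.
final class ConstructorMatching {
    static boolean canInvokeWithArgs(Constructor<?> ctor, Object... args) {
        Class<?>[] params = ctor.getParameterTypes();
        if (params.length != args.length) {
            return false; // "Skipping constructor with wrong number of arguments."
        }
        for (int i = 0; i < params.length; i++) {
            if (args[i] != null && !params[i].isAssignableFrom(args[i].getClass())) {
                return false; // a strict per-parameter check; Flux 0.10.0 is more permissive here
            }
        }
        return true;
    }
}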
These are Flux debug logs, where you see how FluxBuilder finds the most appropriate constructor:
Version: 0.10.0
Parsing file: mytopology.yml
559 [main] INFO o.a.s.f.p.FluxParser - loading YAML from input stream...
566 [main] INFO o.a.s.f.p.FluxParser - Performing property substitution.
569 [main] INFO o.a.s.f.p.FluxParser - Not performing environment variable substitution.
804 [main] INFO o.a.s.f.FluxBuilder - Detected DSL topology...
org.apache.logging.slf4j.Log4jLogger#3b69e7d1
1006 [main] DEBUG o.a.s.f.FluxBuilder - Found constructor arguments in definition: java.util.ArrayList
1006 [main] DEBUG o.a.s.f.FluxBuilder - Checking arguments for references.
1010 [main] DEBUG o.a.s.f.FluxBuilder - Target class: org.elasticsearch.storm.EsBolt
1011 [main] DEBUG o.a.s.f.FluxBuilder - found constructor with same number of args..
1011 [main] DEBUG o.a.s.f.FluxBuilder - Comparing parameter class class java.lang.String to object class class java.lang.String to see if assignment is possible.
1011 [main] DEBUG o.a.s.f.FluxBuilder - Yes, they are the same class.
1012 [main] DEBUG o.a.s.f.FluxBuilder - ** invokable --> true
1012 [main] DEBUG o.a.s.f.FluxBuilder - found constructor with same number of args..
1012 [main] DEBUG o.a.s.f.FluxBuilder - Comparing parameter class class java.lang.String to object class class java.lang.String to see if assignment is possible.
1012 [main] DEBUG o.a.s.f.FluxBuilder - Yes, they are the same class.
1012 [main] DEBUG o.a.s.f.FluxBuilder - ** invokable --> true
1012 [main] DEBUG o.a.s.f.FluxBuilder - Skipping constructor with wrong number of arguments.
1012 [main] WARN o.a.s.f.FluxBuilder - Found multiple invokable constructors for class class org.elasticsearch.storm.EsBolt, given arguments [myindex/docs, {es.nodes=localhost}]. Using the last one found.
1014 [main] DEBUG o.a.s.f.FluxBuilder - Found something seemingly compatible, attempting invocation...
1044 [main] DEBUG o.a.s.f.FluxBuilder - Comparing parameter class class java.lang.String to object class class java.lang.String to see if assignment is possible.
1044 [main] DEBUG o.a.s.f.FluxBuilder - They are the same class.
1044 [main] DEBUG o.a.s.f.FluxBuilder - Comparing parameter class boolean to object class class java.util.HashMap to see if assignment is possible.
Exception in thread "main" java.lang.IllegalArgumentException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.storm.flux.FluxBuilder.buildObject(FluxBuilder.java:291)
...
This issue was fixed on Nov 18, 2015; see this commit:
https://github.com/apache/storm/commit/69b9cf581fd977f6c28b3a78a116deddadc44014
The next version of Storm, containing this fix, should be released within a month.
As a crappy workaround, I've finally extended EsBolt to expose only the constructors I need and avoid collisions.
package com.mypackage;

import java.util.Map;

import org.elasticsearch.storm.EsBolt;

public class EsBoltWrapper extends EsBolt {

    public EsBoltWrapper(String target) {
        super(target);
    }

    public EsBoltWrapper(String target, Map configuration) {
        super(target, configuration);
    }
}
Now my topology looks like this:
bolts:
  - id: "es-bolt"
    className: "com.mypackage.EsBoltWrapper" # THE NEW CLASS
    constructorArgs:
      - "myindex/docs"
      - ref: "esConfig"
    parallelism: 1
It seems to be a bug in Storm Flux.