I have configured a .pull file to produce and send metrics to InfluxDB for the source, extractor, and converter jobs. I tried it with the example Wikipedia job:
metrics.enabled=true
metrics.report.interval=30000
metrics.reporting.influxdb.metrics.enabled=true
metrics.reporting.influxdb.events.enabled=true
metrics.reporting.influxdb.database=****
metrics.reporting.influxdb.url=http://**.**.**.**:8086
metrics.reporting.influxdb.user=********
metrics.reporting.influxdb.password=****
metrics.reporting.influxdb.sending.type=TCP
But the job is not sending any data, and I could not find any example of a metrics setup in Gobblin.
I found the problem. Gobblin reads the metrics configuration from the config file rather than the job file, so instead of adding the properties to the *.pull or *.job file, I had to add them to the *.conf file.
Once added, Gobblin sends metrics to whichever reporting platform is configured for the application.
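For example, a sketch of the metrics section of the application's .conf file, reusing the same placeholder values as above (the file name and location depend on how the Gobblin application is deployed):

metrics.enabled=true
metrics.report.interval=30000
metrics.reporting.influxdb.metrics.enabled=true
metrics.reporting.influxdb.events.enabled=true
metrics.reporting.influxdb.database=<database>
metrics.reporting.influxdb.url=http://<host>:8086
metrics.reporting.influxdb.user=<user>
metrics.reporting.influxdb.password=<password>
metrics.reporting.influxdb.sending.type=TCP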
I have configured metrics in Flink and exposed them to Prometheus, and it's working fine.
But I need to verify some metrics in my integration test, so I was trying to expose them to Prometheus via Java code.
I followed the approach described under the heading "Configuring Prometheus with Flink" in https://flink.apache.org/features/2019/03/11/prometheus-monitoring.html and converted it to inline Java configuration:
conf.setString("metrics.reporters", "prom");
conf.setString("metrics.reporter.prom.class", "org.apache.flink.metrics.prometheus.PrometheusReporter");
conf.setInteger("metrics.reporter.prom.port", 9999);
But how do I configure the prometheus.yml contents?
Can I set them on the same Flink conf object, as below?
conf.setString("global.scrape_interval","15s");
conf.setString("scrape_configs","[{job_name=name, static_configs=[{targets=[localhost:9999]}]}]}");
I need to design and configure a Kafka JDBC Connect project where both the source and the sink are Postgres databases, and I am using Apache Kafka 2.8.
I have prepared a POC for standalone mode, but I need to design it for distributed mode, and the data volume would be several million records.
Can you share any reference for setting up distributed mode, as well as parameter tuning and best practices?
I have gone through several documents but have not found a precise document only for Apache Kafka with the JDBC connector.
Also, please let me know: how can I make this solution dockerized?
Thanks,
Suvendu
reference for setting up distributed mode
This is in the Kafka documentation. Run connect-distributed.sh along with its config file.
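For example, from the Kafka installation directory (paths depend on your layout):

bin/connect-distributed.sh config/connect-distributed.properties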
parameter tuning and best practices?
The config has reasonable defaults, but you're welcome to inspect the file for any changes. The only other thing would be heap settings; 2G is the default Xmx, and it can be overridden with the KAFKA_HEAP_OPTS env var
Running connect-distributed.sh starts an HTTP server (the Connect REST API), and you POST JSON to it containing the same key/value pairs as the standalone JDBC connector properties file
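As a rough sketch, assuming the Confluent JDBC source connector is installed and the Connect REST API is on its default port 8083 (the connector name, connection details, and column names below are placeholders):

curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d '{
  "name": "postgres-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://postgres:5432/mydb",
    "connection.user": "postgres",
    "connection.password": "secret",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "topic.prefix": "pg-"
  }
}'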
precise document only for Apache Kafka with the JDBC connector
There's the official configuration page and a handful of blogs (by Confluent) about it
how can I make this solution dockerized?
The Confluent Docker images would be best for this, though you may have to confluent-hub install the JDBC connector into an image of your own
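A minimal sketch of such an image, assuming the confluentinc/cp-kafka-connect base image and the JDBC connector from Confluent Hub (both version tags are placeholders to be matched to your environment):

# Dockerfile
FROM confluentinc/cp-kafka-connect:7.3.0
RUN confluent-hub install --no-prompt confluentinc/kafka-connect-jdbc:10.7.0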
I'd recommend Debezium as the source, though
If you access this URL: https://apacheignite.readme.io/docs/jdbc-driver#section-streaming-mode
it mentions that we can use streaming mode with a cfg connection, whose configuration has to be provided via an ignite-jdbc.xml file.
But what are the contents of that file? How do we configure it?
As that same page notes, it's "a complete Spring XML configuration." Have a look in the "examples" directory of the Ignite download for some samples; the important part is telling the node how to find the rest of the cluster.
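As a rough sketch only (the discovery addresses are placeholders; complete samples ship in the Ignite distribution), the file is a Spring bean definition for an IgniteConfiguration, with the discovery section pointing at your cluster:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd">
    <bean class="org.apache.ignite.configuration.IgniteConfiguration">
        <!-- Tell the node how to find the rest of the cluster -->
        <property name="discoverySpi">
            <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
                <property name="ipFinder">
                    <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                        <property name="addresses">
                            <list>
                                <value>127.0.0.1:47500..47509</value>
                            </list>
                        </property>
                    </bean>
                </property>
            </bean>
        </property>
    </bean>
</beans>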
Please note that the preferred option for streaming with JDBC is using the thin client driver (which neither needs an XML file nor starts a client node) together with SET STREAMING ON.
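A minimal sketch of that approach, assuming an Ignite node is reachable on localhost at the default thin-client port (the table and data are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Thin JDBC driver: no XML config file, no client node started.
try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1/");
     Statement stmt = conn.createStatement()) {
    stmt.execute("SET STREAMING ON");
    // INSERTs issued here are batched and streamed to the cluster.
    stmt.execute("INSERT INTO City (id, name) VALUES (1, 'Oslo')");
    stmt.execute("SET STREAMING OFF"); // flushes any buffered data
}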
Is it possible to define which queues and topics should exist in Qpid using qpid-config.json? I am using Qpid 7.1.0.
How would I do this in a config file?
Qpid Broker-J has two levels of configuration: broker-wide configuration and virtual-host-specific configuration. Each virtual host has its own set of queues and topics (or, more properly, exchanges), so the queue and topic definitions live in the virtual host config.
If you are just using the default configuration you get with Broker-J, it will create a virtual host named "default" with its configuration stored as JSON on the file system (e.g. in work/default/config.json).
Probably the best way to see how the queue and exchange configuration is stored in that file is to first create queues/exchanges through the Management UI and then look at the resulting config. (Note that you shouldn't manually edit the config while Qpid is running, as it will likely be overwritten; you can, however, update the config while Qpid is stopped and it will pick up the changes.)
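Purely as an illustration of the shape (the exact attribute names are best taken from what the Management UI actually writes out, per the above), the queue and exchange entries in a virtual host config.json look roughly like this:

{
  "name" : "default",
  "queues" : [
    { "name" : "myQueue", "type" : "standard", "durable" : true }
  ],
  "exchanges" : [
    { "name" : "myTopic", "type" : "topic", "durable" : true }
  ]
}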
I'm currently submitting Storm topologies programmatically from my Java application using the following code:
Nimbus.Client client = NimbusClient.getConfiguredClient(stormConfigProvider.getStormConfig()).getClient();
client.submitTopology(
        this.topologyID.toString(),
        stormJarManager.getRemoteJarLocation(),
        JSONValue.toJSONString(stormConfigProvider.getStormConfig()),
        topology
);
In my scenario, I have two kinds of topologies: testing topologies and production topologies. For both kinds, I require different logging. While the testing topologies run at TRACE level, the production topologies run at INFO level. In addition, I require that the production topologies have a Splunk Log4j2 appender configured, to centralize the logging of my production application.
For that, I included a log4j.xml file in my topology JAR which configures the Splunk appender. However, the log4j.xml file is not honored by the server; instead, Storm seems to use its own configuration.
How can I change my log4j configuration for different topologies? (I don't want to modify the log4j.xml on each worker).
You can use https://storm.apache.org/releases/current/dynamic-log-level-settings.html to set log levels for each topology.
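For example (syntax as shown on that page; the topology name is a placeholder):

./bin/storm set_log_level my_topology -l ROOT=DEBUG:30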
I'm not sure how you'd add/remove the Splunk appender based on the loaded topology. You might be able to configure Log4j programmatically (https://logging.apache.org/log4j/2.x/manual/customconfig.html) and set the log4j2.configurationFactory system property on your workers to point to your configuration factory (you can do this by adding it to the topology.worker.childopts property in your topology config).
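A hedged sketch of wiring that up from the submitting code; the factory class name is hypothetical, and whether a worker actually honors it is subject to the caveat below:

import org.apache.storm.Config;

Config conf = new Config();
// Ask each worker JVM to load a custom Log4j2 ConfigurationFactory bundled in the topology JAR.
// com.example.SplunkConfigFactory is a placeholder for your own factory class.
conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS,
        "-Dlog4j2.configurationFactory=com.example.SplunkConfigFactory");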
Just for context, here's where Storm sets the system property that causes Log4j to load the worker log4j configuration https://github.com/apache/storm/blob/4137328b75c06771f84414c3c2113e2d1c757c08/storm-server/src/main/java/org/apache/storm/daemon/supervisor/BasicContainer.java#L560. If you wanted to load a log4j2.xml included in your topology jar, maybe it would be possible to conditionally exclude that setting from the system properties set for workers. I think it would require a code change though, so you'd need to raise an issue on https://issues.apache.org/jira