I am trying to run Flume to pull data from the Twitter stream, but I received the following error while executing it.
[ERROR - org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:253)] Unable to start EventDrivenSourceRunner: { source:com.cloudera.flume.source.TwitterSource{name:Twitter,state:IDLE} } - Exception follows.
java.lang.NoSuchMethodError: twitter4j.TwitterStream.addListener(Ltwitter4j/StreamListener;)V
at com.cloudera.flume.source.TwitterSource.start(TwitterSource.java:140)
at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
I am a beginner with Flume and am working in the Cloudera QuickStart VM. While searching for solutions, I read that I should install Maven and then build the Flume snapshot JAR from it, but I don't know how to install Maven in the Cloudera QuickStart VM. Any help on how to correct this error would be appreciated; I have been stuck here for a week.
Found the solution:
The conflict is caused by the twitter4j JARs clashing with the Flume snapshot JAR, so I renamed the twitter4j JARs by changing their file extension to .jarx. The other thing I did, based on an article I read, was to put the Flume snapshot JAR in the following hierarchy:
/usr/lib/flume-ng/lib/plugins.d/flumesnapshot (following the same pattern in the var directory).
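For anyone hitting the same conflict, here is a rough sketch of those steps as shell commands. The flume-sources JAR name and the plugins.d lib/ subdirectory are assumptions based on the standard Cloudera Twitter example and Flume's plugin layout; adjust them to your setup.
# Neutralize the bundled twitter4j JARs so Flume stops loading them
$ cd /usr/lib/flume-ng/lib
$ for f in twitter4j-*.jar; do sudo mv "$f" "${f}x"; done    # .jar -> .jarx
# Place the Flume snapshot JAR under plugins.d (repeat the same pattern under /var)
$ sudo mkdir -p /usr/lib/flume-ng/lib/plugins.d/flumesnapshot/lib
$ sudo cp flume-sources-1.0-SNAPSHOT.jar /usr/lib/flume-ng/lib/plugins.d/flumesnapshot/lib/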
We are setting up JanusGraph 0.5.2 in embedded Cassandra mode.
When we start Janus in this mode, it throws the exception below in its logs:
org.apache.cassandra.service.CassandraDaemon - Exception in thread Thread[OptionalTasks:1,5,main]
java.lang.NoSuchMethodError: com.codahale.metrics.Snapshot: method <init>()V not found
at org.apache.cassandra.metrics.DecayingEstimatedHistogramReservoir$EstimatedHistogramReservoirSnapshot.<init>(DecayingEstimatedHistogramReservoir.java:353)
at org.apache.cassandra.metrics.DecayingEstimatedHistogramReservoir.getSnapshot(DecayingEstimatedHistogramReservoir.java:224)
at com.codahale.metrics.Histogram.getSnapshot(Histogram.java:54)
at com.codahale.metrics.Timer.getSnapshot(Timer.java:142)
at org.apache.cassandra.db.ColumnFamilyStore$3.run(ColumnFamilyStore.java:446)
at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
From searching the net, it seems to be a Maven dependency issue with metrics-core.
Is there any way to resolve this in a JanusGraph setup? Can we suppress this exception in the logs, or disable Cassandra metrics?
This did not occur in Janus 0.3.1.
Please help.
So this problem ultimately comes down to conflicting versions of the Codahale metrics-core JAR. I believe Cassandra is currently dependent on 3.1.5, while JanusGraph (Gremlin, actually) ships with both 3.0.2 and 3.2.2.
One solution out there involves removing 3.0.2 from $JANUSGRAPH_HOME/lib.
But if you don't want to mess around with library dependencies of different projects, the best solution is probably to ensure that JanusGraph and Cassandra run in separate JVMs.
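If you do want to try pruning the library anyway, here is a minimal sketch, assuming a stock JanusGraph 0.5.2 distribution (the exact JAR file names may differ on your install):
# See which metrics-core versions ended up on the classpath
$ ls $JANUSGRAPH_HOME/lib | grep metrics-core
# Set the older copy aside so Cassandra resolves the newer one
$ mv $JANUSGRAPH_HOME/lib/metrics-core-3.0.2.jar /tmp/
Restart JanusGraph afterwards and watch the logs for the NoSuchMethodError.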
I am working on integrating Kafka with Google Pub/Sub using the CloudPubSubConnector provided here: https://github.com/GoogleCloudPlatform/pubsub/tree/master/kafka-connector#cloudpubsubconnector-configs
I am running it against a locally hosted Kafka server on my machine.
When I run the connector, I get the following stack trace:
[2020-05-29 15:20:01,678] ERROR WorkerSinkTask{id=CPSSinkConnector-9} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:186)
java.lang.NoClassDefFoundError: Could not initialize class com.google.pubsub.v1.ProjectTopicName
at com.google.pubsub.kafka.sink.CloudPubSubSinkTask.createPublisher(CloudPubSubSinkTask.java:353)
at com.google.pubsub.kafka.sink.CloudPubSubSinkTask.start(CloudPubSubSinkTask.java:143)
at org.apache.kafka.connect.runtime.WorkerSinkTask.initializeAndStart(WorkerSinkTask.java:305)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:193)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
[2020-05-29 15:20:01,678] INFO Created connector CPSSinkConnector (org.apache.kafka.connect.cli.ConnectStandalone:112)
[2020-05-29 15:20:01,678] ERROR WorkerSinkTask{id=CPSSinkConnector-9} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:187)
[2020-05-29 15:20:01,678] INFO Stopping CloudPubSubSinkTask (com.google.pubsub.kafka.sink.CloudPubSubSinkTask:386)
I am trying to understand what the issue is. The connector works fine if I host Kafka on a GCP VM, and my Maven dependencies seem to be set up correctly, but for some reason it can't find the ProjectTopicName class.
UPDATE: RESOLVED
This issue does not occur when using Kafka version 2.4.1 instead of 2.5.0.
It looks like ProjectTopicName was accidentally deprecated and reinstated in google-cloud-pubsub version 1.104.0. Are you able to use TopicName instead, or update the library?
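If you want to confirm which Pub/Sub client artifacts your build actually resolves before changing anything, one way is Maven's dependency tree, run from the connector project's root (output depends on your POM; the com.google.pubsub.v1.ProjectTopicName class lives in the proto-google-cloud-pubsub-v1 artifact):
$ mvn dependency:tree -Dincludes=com.google.cloud:google-cloud-pubsub
$ mvn dependency:tree -Dincludes=com.google.api.grpc:proto-google-cloud-pubsub-v1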
I have a Spark cluster consisting of 5 nodes, and I have a Spark job written in Java that reads a set of files from a directory and sends their content to Kafka.
When I tested the job locally, everything worked fine.
When I tried to submit the job to the cluster, it failed with a FileNotFoundException.
The files to be processed exist in a directory mounted on all 5 nodes, so I am sure the file path that appears in the exception exists.
Here is the exception that appears while submitting the job:
java.io.FileNotFoundException: File file:/home/me/shared/input_1.txt does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:140)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:239)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
The directory /home/me/shared/ is mounted on all 5 nodes.
EDIT:
Here is the command I am using to submit the job:
bin$ ./spark-submit --total-executor-cores 20 --executor-memory 5G --class org.company.java.FileMigration.FileSparkMigrator --master spark://spark-master:7077 /home/me/FileMigrator-0.1.1-jar-with-dependencies.jar /home/me/shared kafka01,kafka02,kafka03,kafka04,kafka05 kafka_topic
I noticed some weird behavior. When I submitted the job while the directory contained only one file, the exception was thrown on the driver but the file was processed successfully. When I added another file, the same thing happened. But once I added a third file, the exception was thrown and the job failed.
EDIT 2
After some attempts, we discovered that a problem with the mounted directory was causing this weird behavior.
Spark defaults to HDFS. This looks like an NFS file, so try to access it with: file:///home/me/shared/input_1.txt
Yes, three slashes!
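Since the job takes the input directory as its first argument (see the spark-submit command above), and assuming that argument is passed straight through to the file-reading call, a quick way to test this is to resubmit with the scheme added:
$ ./spark-submit --total-executor-cores 20 --executor-memory 5G --class org.company.java.FileMigration.FileSparkMigrator --master spark://spark-master:7077 /home/me/FileMigrator-0.1.1-jar-with-dependencies.jar file:///home/me/shared kafka01,kafka02,kafka03,kafka04,kafka05 kafka_topic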
Here is what solved the problem for me. It is weird, and I have no idea what the actual problem was.
I simply asked the sysadmin to mount a different directory instead of the one I was using. After that, everything worked fine.
It seems there was an issue with the old mounted directory, but I don't know what it was.
I wanted to play around with Microsoft Azure's Active Directory Library for Java. After pulling the code from GitHub, importing it into Eclipse as a Maven project, and building and executing the PublicClient.java sample file, I get the following exception:
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Exception in thread "main" java.util.concurrent.ExecutionException: java.net.MalformedURLException: no protocol:
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at src.main.java.PublicClient.getAccessTokenFromUserCredentials(PublicClient.java:36)
at src.main.java.PublicClient.main(PublicClient.java:23)
Caused by: java.net.MalformedURLException: no protocol:
at java.net.URL.<init>(URL.java:585)
at java.net.URL.<init>(URL.java:482)
at java.net.URL.<init>(URL.java:431)
at com.microsoft.aad.adal4j.HttpHelper.openConnection(HttpHelper.java:110)
at com.microsoft.aad.adal4j.HttpHelper.executeHttpGet(HttpHelper.java:43)
at com.microsoft.aad.adal4j.HttpHelper.executeHttpGet(HttpHelper.java:38)
at com.microsoft.aad.adal4j.MexParser.getWsTrustEndpointFromMexEndpoint(MexParser.java:87)
at com.microsoft.aad.adal4j.AuthenticationContext.processPasswordGrant(AuthenticationContext.java:852)
at com.microsoft.aad.adal4j.AuthenticationContext.access$0(AuthenticationContext.java:839)
at com.microsoft.aad.adal4j.AuthenticationContext$1.call(AuthenticationContext.java:129)
at com.microsoft.aad.adal4j.AuthenticationContext$1.call(AuthenticationContext.java:1)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
I have correctly specified my client ID, username and password. Can someone please help me resolve this issue?
I reproduced your issue. The problem is that the sample PublicClient.java is a separate Maven project from the Microsoft AAD Library for Java.
To resolve this, you need to import the sample project on its own into Eclipse, as follows.
Then run mvn install to install all dependencies for the sample project, and execute it. It will work without the exception.
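In concrete terms, something like the following should also work from a terminal (the sample directory name below is illustrative; use whichever folder contains the sample's own pom.xml):
# From the root of the cloned azure-activedirectory-library-for-java repo
$ cd samples/public-client-sample    # hypothetical path; locate the sample's pom.xml
$ mvn clean install                  # pulls in all dependencies for the sample
Once the build succeeds, run PublicClient from Eclipse as before.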
Thanks to Peter Pan - MSFT, I was able to get this issue resolved. The problem was that I was using a Microsoft account and not a service account. After I set up a service account, I was able to get the sample working.
To set up a service account, open the Azure portal, go to All services, open Subscriptions, select your subscription, scroll down and click on "Roles", click on "Reader" (or any other role you like, but make sure it has the proper access privileges), and click on "+" to add a new user, which should not be a Microsoft account.
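The portal UI changes frequently; if you prefer the command line, the same role assignment can be sketched with the Azure CLI (substitute your own user and subscription ID; this is an equivalent, not the exact steps from the answer above):
$ az role assignment create --assignee newuser@yourtenant.onmicrosoft.com --role Reader --scope /subscriptions/<subscription-id>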
I've been working on a Java Jersey RESTful web application, and everything was going fine until I compiled once again and got the following compiler error.
I've Googled this issue, and many suggested File -> Invalidate Caches/Restart..., but this did not work.
I also tried manually deleting the files in the Mac OS X directory /Library/Caches/IntellijIdea14. This also did not work.
Has anyone ever experienced this before? I'm very confused about why this error appeared and what exactly is causing it. It's certainly delaying my development!
Stack Trace:
Information:Internal caches are corrupted or have outdated format, forcing project rebuild: java.io.FileNotFoundException: /Users/grantmcgovern/Dropbox/Developer/Projects/1834Software/GymAPI/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/target/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/target/classes/target/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/.idea/libraries/Maven__org_glassfish_jersey_test_framework_providers_jersey_test_framework_provider_grizzly2_2_17.xml (File name too long)
Information:4/5/15, 10:39 PM - Compilation completed with 1 error and 0 warnings in 29 sec
Error:Internal error: (java.io.FileNotFoundException) /Users/grantmcgovern/Dropbox/Developer/Projects/1834Software/GymAPI/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/target/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/target/classes/target/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/.idea/libraries/Maven__org_glassfish_jersey_test_framework_providers_jersey_test_framework_provider_grizzly2_2_17.xml (File name too long)
java.io.FileNotFoundException: /Users/grantmcgovern/Dropbox/Developer/Projects/1834Software/GymAPI/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/target/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/target/classes/target/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/out/artifacts/GymAPI_war_exploded/WEB-INF/classes/.idea/libraries/Maven__org_glassfish_jersey_test_framework_providers_jersey_test_framework_provider_grizzly2_2_17.xml (File name too long)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
at java.io.FileOutputStream.<init>(FileOutputStream.java:171)
at com.intellij.openapi.util.io.FileUtil.openOutputStream(FileUtil.java:508)
at com.intellij.openapi.util.io.FileUtil.performCopy(FileUtil.java:460)
at com.intellij.openapi.util.io.FileUtil.copyContent(FileUtil.java:454)
at org.jetbrains.jps.incremental.artifacts.instructions.FilterCopyHandler.copyFile(FilterCopyHandler.java:40)
at org.jetbrains.jps.incremental.artifacts.instructions.FileBasedArtifactRootDescriptor.copyFromRoot(FileBasedArtifactRootDescriptor.java:100)
at org.jetbrains.jps.incremental.artifacts.IncArtifactBuilder.build(IncArtifactBuilder.java:159)
at org.jetbrains.jps.incremental.artifacts.IncArtifactBuilder.build(IncArtifactBuilder.java:50)
at org.jetbrains.jps.incremental.IncProjectBuilder.buildTarget(IncProjectBuilder.java:855)
at org.jetbrains.jps.incremental.IncProjectBuilder.runBuildersForChunk(IncProjectBuilder.java:836)
at org.jetbrains.jps.incremental.IncProjectBuilder.buildTargetsChunk(IncProjectBuilder.java:894)
at org.jetbrains.jps.incremental.IncProjectBuilder.buildChunkIfAffected(IncProjectBuilder.java:789)
at org.jetbrains.jps.incremental.IncProjectBuilder.buildChunks(IncProjectBuilder.java:612)
at org.jetbrains.jps.incremental.IncProjectBuilder.runBuild(IncProjectBuilder.java:352)
at org.jetbrains.jps.incremental.IncProjectBuilder.build(IncProjectBuilder.java:191)
at org.jetbrains.jps.cmdline.BuildRunner.runBuild(BuildRunner.java:137)
at org.jetbrains.jps.cmdline.BuildSession.runBuild(BuildSession.java:289)
at org.jetbrains.jps.cmdline.BuildSession.run(BuildSession.java:124)
at org.jetbrains.jps.cmdline.BuildMain$MyMessageHandler$1.run(BuildMain.java:238)
at org.jetbrains.jps.service.impl.SharedThreadPoolImpl$1.run(SharedThreadPoolImpl.java:41)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Oddly enough, the following worked (since it's a Maven project):
$ mvn clean
I suppose cleaning the modules must have done something, because it built just fine thereafter. I have to believe this is some sort of IntelliJ bug.
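For what it's worth, the recursively nested out/artifacts/... path in the error suggests the artifact output was being copied into itself on each build, so clearing both build outputs before rebuilding is a reasonable recovery. A sketch, assuming out/ is your IntelliJ project output directory:
$ mvn clean      # removes Maven's target/ directory
$ rm -rf out/    # removes IntelliJ's artifact output, including the nested copies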