Flink: TaskManager cannot connect to the JobManager - Could not resolve ResourceManager address - java

I'm using the Apache Flink Kubernetes operator to deploy a standalone job on an Application cluster setup.
I have set up the following files using the official Flink documentation - Link
jobmanager-application-non-ha.yaml
taskmanager-job-deployment.yaml
flink-configuration-configmap.yaml
jobmanager-service.yaml
I have not changed any of the configurations in these files and am trying to run a simple WordCount example from the Flink examples using the Apache Flink Operator.
After running the kubectl commands to set up the job manager and the task manager, the job manager goes into a NotReady state while the task manager goes into a CrashLoopBackOff loop.
NAME READY STATUS RESTARTS AGE
flink-jobmanager-28k4b 1/2 NotReady 2 (4m24s ago) 16m
flink-kubernetes-operator-6585dddd97-9hjp4 2/2 Running 0 10d
flink-taskmanager-6bb88468d7-ggx8t 1/2 CrashLoopBackOff 9 (2m21s ago) 15m
The job manager logs look like this
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Slot request bulk is not fulfillable! Could not allocate the required slot within slot request timeout
at org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:86) ~[flink-dist-1.16.0.jar:1.16.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRunAsync$4(AkkaRpcActor.java:453) ~[flink-rpc-akka_be40712e-8b2e-47cd-baaf-f0149cf2604d.jar:1.16.0]
at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68) ~[flink-rpc-akka_be40712e-8b2e-47cd-baaf-f0149cf2604d.jar:1.16.0]
The task manager, it seems, cannot connect to the job manager:
2023-01-28 19:21:47,647 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Connecting to ResourceManager akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000).
2023-01-28 19:21:57,766 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
2023-01-28 19:22:08,036 INFO akka.remote.transport.ProtocolStateActor [] - No response from remote for outbound association. Associate timed out after [20000 ms].
2023-01-28 19:22:08,057 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@flink-jobmanager:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@flink-jobmanager:6123]] Caused by: [No response from remote for outbound association. Associate timed out after [20000 ms].]
2023-01-28 19:22:08,069 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
2023-01-28 19:22:08,308 WARN akka.remote.transport.netty.NettyTransport [] - Remote connection to [null] failed with org.jboss.netty.channel.ConnectTimeoutException: connection timed out: flink-jobmanager/100.127.18.9:6123
The flink-configuration-configmap.yaml looks like this
flink-conf.yaml: |+
  jobmanager.rpc.address: flink-jobmanager
  taskmanager.numberOfTaskSlots: 2
  blob.server.port: 6124
  jobmanager.rpc.port: 6123
  taskmanager.rpc.port: 6122
  queryable-state.proxy.ports: 6125
  jobmanager.memory.process.size: 1600m
  taskmanager.memory.process.size: 1728m
  parallelism.default: 2
This is what the pom.xml looks like - Link

You deployed the Kubernetes Operator in the namespace, but you did not create the custom resources the Operator acts on. Instead, you tried to create a standalone Flink Kubernetes cluster by hand.
The Flink Operator makes it a lot easier to deploy your Flink jobs: you only deploy the operator itself and then a FlinkDeployment (or FlinkSessionJob) resource, and the operator manages your deployment from there.
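For reference, a minimal FlinkDeployment for the WordCount job might look like the sketch below; the image tag and jar path are assumptions based on the stock Flink 1.16 image layout, so adjust them to your build (the memory sizes are the ones from your ConfigMap):
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: wordcount-example
spec:
  image: flink:1.16
  flinkVersion: v1_16
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
  serviceAccount: flink
  jobManager:
    resource:
      memory: "1600m"
      cpu: 1
  taskManager:
    resource:
      memory: "1728m"
      cpu: 1
  job:
    # Path inside the container image; adjust to wherever your job jar lives
    jarURI: local:///opt/flink/examples/streaming/WordCount.jar
    parallelism: 2
    upgradeMode: stateless
Once this is applied with kubectl, the operator creates and supervises the JobManager and TaskManager pods itself, so the hand-written jobmanager/taskmanager deployments and ConfigMap are no longer needed.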
Please use this documentation for the Kubernetes Operator: Link

Related

Failed to connect Azure servicebus topic using JMS - Java

I followed the steps in the Azure ServiceBus JMS Sample with the properties below:
spring.jms.servicebus.connection-string=Endpoint=sb://test-dt.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=key
spring.jms.servicebus.topic-client-id=12345
spring.jms.servicebus.idle-timeout=18000
spring.jms.servicebus.pricing-tier=Standard
However, I get the error below:
ERROR 43904 --- [ntContainer#0-1] org.apache.qpid.jms.JmsConnection : Failed to connect to remote at: amqps://test-dt.servicebus.windows.net:-1
ERROR 43904 --- [ntContainer#0-1] o.s.j.l.DefaultMessageListenerContainer : Could not refresh JMS Connection for destination 'test-topic' - retrying using FixedBackOff{interval=5000, currentAttempts=6, maxAttempts=unlimited}. Cause: handshake timed out after 10000ms
On the other hand, when I followed the steps in ServiceBus without JMS and set the transportType to AmqpTransportType.AMQP_WEB_SOCKETS, I was able to connect.
We want to implement this using the Spring Boot starter and a listener method, instead of calling it from a (public static void main) method.
Please advise on what I am missing when following the first link.
ERROR 43904 --- [ntContainer#0-1] org.apache.qpid.jms.JmsConnection : Failed to connect to remote at: amqps://test-dt.servicebus.windows.net:-1
To resolve the above error, try what was suggested by Anand Sowmithiran:
Check if port 5671 is blocked:
telnet <yournamespacename>.servicebus.windows.net 5671
Note: Clients that use AMQP connections over TCP require ports 5671 and 5672 to be opened in the firewall. Along with these ports, it might be necessary to open additional ports if the EnableLinkRedirect feature is enabled.
You can refer to Troubleshooting guide for Azure Service Bus, AMQP outbound port requirements and Port 5671 Blocked :(. What are other options?
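If port 5671 does turn out to be blocked, the AMQP-over-WebSockets transport (which tunnels through port 443) that worked for you in the non-JMS client is the usual workaround. A minimal sketch with the azure-messaging-servicebus client, with a placeholder connection string and subscription name:
import com.azure.core.amqp.AmqpTransportType;
import com.azure.messaging.servicebus.ServiceBusClientBuilder;
import com.azure.messaging.servicebus.ServiceBusProcessorClient;
import java.util.concurrent.TimeUnit;

public class TopicListener {
    public static void main(String[] args) throws InterruptedException {
        ServiceBusProcessorClient processor = new ServiceBusClientBuilder()
                .connectionString("<your-connection-string>")
                // AMQP over WebSockets uses port 443 instead of 5671
                .transportType(AmqpTransportType.AMQP_WEB_SOCKETS)
                .processor()
                .topicName("test-topic")
                .subscriptionName("<your-subscription>")
                .processMessage(context -> System.out.println(context.getMessage().getBody()))
                .processError(context -> context.getException().printStackTrace())
                .buildProcessorClient();
        processor.start();
        TimeUnit.SECONDS.sleep(60); // keep the JVM alive while messages arrive
        processor.close();
    }
}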

how can we add a document using solr cloud server

While adding a document using the Solr Cloud server, I'm getting the following exception:
60 [main] INFO org.apache.solr.common.cloud.ConnectionManager - Waiting for client to connect to ZooKeeper
65 [main-SendThread(jmajeed.ibsorb.com:8982)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server jmajeed.ibsorb.com/192.168.70.91:8982. Will not attempt to authenticate using SASL (unknown error)
69 [main-SendThread(jmajeed.ibsorb.com:8982)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to jmajeed.ibsorb.com/192.168.70.91:8982, initiating session
Exception in thread "main" java.lang.RuntimeException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 192.168.70.91:8982/#/hotelcontent within 10000 ms
Does anybody have any idea why this is happening?
Thanks.
Have you disturbed the default configuration of the Solr nodes? By default, if you do not specify a port, the first node in the cluster starts on port 8983, so check this first. If that is not the problem, check whether the cluster is up by accessing the Solr Cloud admin UI, then see whether all the shards in the cluster are alive by clicking on the Cloud tab.
If everything is fine and you are still facing the above problem, then you may be trying to access a remote Solr Cloud server and it is a firewall issue.
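As for the original question of how to add a document, a minimal SolrJ 4.x sketch looks like the following; the ZooKeeper address is the one from your log, and the collection name is assumed from your connect string:
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AddDocument {
    public static void main(String[] args) throws Exception {
        // CloudSolrServer takes the ZooKeeper address, not a Solr node address
        CloudSolrServer server = new CloudSolrServer("jmajeed.ibsorb.com:8982");
        server.setDefaultCollection("hotelcontent");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("name", "example document");

        server.add(doc);
        server.commit();
        server.shutdown();
    }
}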

Unable to connect to remote cassandra from titan

I am using Cassandra 2.0.7 sitting on a remote server, listening on a non-default port:
cassandra.yaml:
rpc_address: 0.0.0.0
rpc_port: 6543
I am trying to connect to the server using titan-0.4.4 (java API, also tried with rexster) using the following config:
storage.hostname=172.182.183.215
storage.backend=cassandra
storage.port=6543
storage.keyspace=abccorp
It does not connect, and I see the exceptions below. However, if I use cqlsh on the same host from which I am trying to execute my code/Rexster, I am able to connect without any issues. Has anybody seen this?
0 [main] INFO com.netflix.astyanax.connectionpool.impl.ConnectionPoolMBeanManager - Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,name=ClusterTitanConnectionPool,ServiceType=connectionpool
49 [main] INFO com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor - AddHost: 172.182.183.215
554 [main] INFO com.netflix.astyanax.connectionpool.impl.ConnectionPoolMBeanManager - Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,name=KeyspaceTitanConnectionPool,ServiceType=connectionpool
555 [main] INFO com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor - AddHost: 172.182.183.215
999 [main] INFO com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor - AddHost: 127.0.0.1
1000 [main] INFO com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor - RemoveHost: 172.182.183.215
2366 [main] INFO com.thinkaurelius.titan.diskstorage.Backend - Initiated backend operations thread pool of size 16
41523 [RingDescribeAutoDiscovery] WARN com.netflix.astyanax.impl.RingDescribeHostSupplier - Failed to get hosts from abccorp via ring describe. Will use previously known ring instead
61522 [RingDescribeAutoDiscovery] WARN com.netflix.astyanax.impl.RingDescribeHostSupplier - Failed to get hosts from abccorp via ring describe. Will use previously known ring instead
63080 [main] INFO com.thinkaurelius.titan.diskstorage.util.BackendOperation - Temporary storage exception during backend operation. Attempting backoff retry
com.thinkaurelius.titan.diskstorage.TemporaryStorageException: Temporary failure in storage backend
at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxOrderedKeyColumnValueStore.getNamesSlice(AstyanaxOrderedKeyColumnValueStore.java:138)
at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxOrderedKeyColumnValueStore.getSlice(AstyanaxOrderedKeyColumnValueStore.java:88)
at com.thinkaurelius.titan.graphdb.configuration.KCVSConfiguration$1.call(KCVSConfiguration.java:70)
at com.thinkaurelius.titan.graphdb.configuration.KCVSConfiguration$1.call(KCVSConfiguration.java:64)
at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:30)
at com.thinkaurelius.titan.graphdb.configuration.KCVSConfiguration.getConfigurationProperty(KCVSConfiguration.java:64)
at com.thinkaurelius.titan.diskstorage.Backend.initialize(Backend.java:277)
at com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration.getBackend(GraphDatabaseConfiguration.java:1174)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.<init>(StandardTitanGraph.java:75)
at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:40)
at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:29)
at com.abccorp.grp.graphorm.GraphORM.<init>(GraphORM.java:23)
at com.abccorp.grp.graphorm.GraphORM.getInstance(GraphORM.java:47)
at com.abccorp.grp.utils.dataloader.MainLoader.main(MainLoader.java:150)
Caused by: com.netflix.astyanax.connectionpool.exceptions.NoAvailableHostsException: NoAvailableHostsException: [host=None(0.0.0.0):0, latency=0(0), attempts=0]No hosts to borrow from
at com.netflix.astyanax.connectionpool.impl.RoundRobinExecuteWithFailover.<init>(RoundRobinExecuteWithFailover.java:30)
at com.netflix.astyanax.connectionpool.impl.TokenAwareConnectionPoolImpl.newExecuteWithFailover(TokenAwareConnectionPoolImpl.java:83)
at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:256)
at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$4.execute(ThriftColumnFamilyQueryImpl.java:519)
at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxOrderedKeyColumnValueStore.getNamesSlice(AstyanaxOrderedKeyColumnValueStore.java:136)
... 13 more
91522 [RingDescribeAutoDiscovery] WARN com.netflix.astyanax.impl.RingDescribeHostSupplier - Failed to get hosts from abccorp via ring describe. Will use previously known ring instead
121522 [RingDescribeAutoDiscovery] WARN com.netflix.astyanax.impl.RingDescribeHostSupplier - Failed to get hosts from abccorp via ring describe. Will use previously known ring instead
Any help greatly appreciated. I am evaluating Titan on Cassandra and am a bit stuck on this, as previously I was using Cassandra (same version) on localhost and everything was fine.
Thanks
Changing the listen_address to 172.182.183.215 in the configuration did the trick. Initially it was not clear whether just setting the rpc_address was enough.
Thrift and the drivers that support Thrift are deprecated as of C* 1.2. You should switch to the DataStax Java Driver (currently at 2.0.2).
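As a hedged sketch of what connecting with the DataStax Java Driver 2.0 looks like (the contact point and keyspace are taken from the question; 9042 is the driver's default native-protocol port, which requires start_native_transport: true in cassandra.yaml):
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class NativeConnect {
    public static void main(String[] args) {
        // The native protocol port (9042 by default) is separate from the Thrift rpc_port
        Cluster cluster = Cluster.builder()
                .addContactPoint("172.182.183.215")
                .withPort(9042)
                .build();
        Session session = cluster.connect("abccorp");
        System.out.println("Connected to: " + cluster.getMetadata().getClusterName());
        session.close();
        cluster.close();
    }
}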
Alternately, ensure this is set properly in cassandra.yaml
start_rpc: true

Cassandra-Cli refusing connection

I am trying to connect to Cassandra. I installed the latest stable version, apache-cassandra-1.2.4, and extracted it on my desktop. As I run Cassandra, it starts up nicely, listening for Thrift clients and displaying the following:
sudo cassandra -f
log:
INFO 15:30:34,646 Cassandra version: 1.0.12
INFO 15:30:34,646 Thrift API version: 19.20.0
INFO 15:30:34,646 Loading persisted ring state
INFO 15:30:34,650 Starting up server gossip
INFO 15:30:34,661 Enqueuing flush of Memtable-LocationInfo@1117603949(29/36 serialized/live bytes, 1 ops)
INFO 15:30:34,661 Writing Memtable-LocationInfo@1117603949(29/36 serialized/live bytes, 1 ops)
INFO 15:30:34,877 Completed flushing /var/lib/cassandra/data/system/LocationInfo-hd-54-Data.db (80 bytes)
INFO 15:30:34,892 Starting Messaging Service on port 7000
INFO 15:30:34,901 Using saved token 143186062733850112297005303551620336860
INFO 15:30:34,903 Enqueuing flush of Memtable-LocationInfo@1282534304(53/66 serialized/live bytes, 2 ops)
INFO 15:30:34,904 Writing Memtable-LocationInfo@1282534304(53/66 serialized/live bytes, 2 ops)
INFO 15:30:35,102 Completed flushing /var/lib/cassandra/data/system/LocationInfo-hd-55-Data.db (163 bytes)
INFO 15:30:35,106 Node localhost/127.0.0.1 state jump to normal
INFO 15:30:35,107 Bootstrap/Replace/Move completed! Now serving reads.
INFO 15:30:35,108 Will not load MX4J, mx4j-tools.jar is not in the classpath
INFO 15:30:35,150 Binding thrift service to localhost/127.0.0.1:9160
INFO 15:30:35,155 Using TFastFramedTransport with a max frame size of 15728640 bytes.
INFO 15:30:35,160 Using synchronous/threadpool thrift server on localhost/127.0.0.1 : 9160
INFO 15:30:35,168 Listening for thrift clients...
Now, as I run cassandra-cli -h localhost -p 9160, it throws the error below. I have checked that the port is free and Cassandra is listening on it:
org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
at org.apache.thrift.transport.TSocket.open(TSocket.java:183)
at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
at org.apache.cassandra.cli.CliMain.connect(CliMain.java:80)
at org.apache.cassandra.cli.CliMain.main(CliMain.java:256)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
at java.net.Socket.connect(Socket.java:579)
at org.apache.thrift.transport.TSocket.open(TSocket.java:178)
... 3 more
Exception connecting to localhost/9160. Reason: Connection refused.
I had the same error. Now it is OK.
The main problem was that the configuration was wrong.
My configuration is as follows:
My virtual machine's IP is 192.168.11.11, and Cassandra is installed on that machine, so I configured:
listen_address: 192.168.11.11
rpc_address: 0.0.0.0
broadcast_rpc_address: 192.168.11.11
That works.
The documentation for cassandra-stress seems sketchy; maybe in due course that will be corrected. As of now, this command worked for me:
./cassandra-stress write -node <IP_OF_NODE1>
Once this works, we could try putting in the other optional parameters to tweak our command.
Option 1:
Run the jps command as root and kill CassandraDaemon if you see it. Then start Cassandra again.
Option 2:
Try to connect to Cassandra with CQL:
./cqlsh 10.234.31.232 9042
Final check:
An intermediate firewall may be blocking the JVM from making the connection, or an operating system firewall or antivirus may be causing the problem as well.
I think you installed on Windows, and it looks like a firewall is blocking your connection.

NameNode: java.net.BindException

Hi folks, I am stuck on a very strange problem. I am installing HBase and Hadoop on another VM, accessing it from my machine. I have properly installed Hadoop; I ran ./start-all.sh and saw that all processes are running perfectly. So I ran jps and saw:
jobtracker
tasktracker
namenode
secondarynamenode
datanode
Everything was running well. Now when I set up HBase and then started Hadoop and HBase, I saw that the namenode was not running, and in the logs (from the namenode log file) I got this exception:
java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hdfs.server.namenode.DecommissionManager$Monitor.run(DecommissionManager.java:65)
at java.lang.Thread.run(Thread.java:662)
2012-05-19 08:46:07,493 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
2012-05-19 08:46:07,516 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.net.BindException: Problem binding to localhost/23.21.195.24:54310 : Cannot assign requested address
at org.apache.hadoop.ipc.Server.bind(Server.java:227)
at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:301)
at org.apache.hadoop.ipc.Server.<init>(Server.java:1483)
at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:545)
at org.apache.hadoop.ipc.RPC.getServer(RPC.java:506)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:294)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:497)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1268)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1277)
Caused by: java.net.BindException: Cannot assign requested address
at sun.nio.ch.Net.bind(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at org.apache.hadoop.ipc.Server.bind(Server.java:225)
... 8 more
2012-05-19 08:46:07,516 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
I checked ports and revised all conf files again and again but didn't find the solution. Please guide me if anyone has an idea.
Thanks a lot
Based on your comment, your problem is most probably related to the hosts file: the log shows localhost resolving to 23.21.195.24, which is not a local address.
Firstly, you should uncomment the 127.0.0.1 localhost entry; this is a fundamental entry.
Secondly, have you set up Hadoop and HBase to run with externally accessible services? I'm not too up on HBase, but for Hadoop the services need to be bound to non-localhost addresses for external access, so your masters and slaves files in $HADOOP_HOME/conf need to name the actual machine names (or IP addresses if you don't have a DNS server). None of your configuration files should mention localhost; they should use either host names or IP addresses, as sketched below.
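As a hedged illustration (the host names and addresses here are hypothetical, not taken from your setup), /etc/hosts on each node might look like:
127.0.0.1    localhost
192.168.0.1  hadoop-master
192.168.0.2  hadoop-slave1
with $HADOOP_HOME/conf/masters containing hadoop-master, $HADOOP_HOME/conf/slaves listing hadoop-slave1 (one host per line), and fs.default.name in core-site.xml pointing at hdfs://hadoop-master:54310 rather than at localhost.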
