Is connectionPoolSetting for single mongos instance or the whole cluster? - java

I am a server-side developer, working on a project which uses a MongoDB cluster as its persistent database.
I have a question about https://mongodb.github.io/mongo-java-driver/3.8/javadoc/com/mongodb/connection/ConnectionPoolSettings.html
It says the settings relate to "a MongoDB server".
But what if I have a connection string like the following one:
mongodb://user:pwd@mongos1:port,mongos2:port,mongos3:port,mongos4:port,mongos5:port,mongos6:port/admin?readPreference=secondaryPreferred
That is, a MongoDB sharded cluster which has 6 mongos instances.
Question:
Is the ConnectionPoolSettings related to one mongos server, or to all mongos servers?
E.g. if we have maxSize = 10 in this setting, does it mean a single client has a max connection pool of 10 per mongos server (so a max pool of 60 for my 6-mongos cluster)? Or is the max connection pool 10 for the whole cluster, no matter how many mongos servers we have?

max connection pool = 10 means that the client pool will contain at most 10 connections, no matter how many servers are part of your cluster.

Mongo Client
com.mongodb.client.MongoClient interface:
A client-side representation of a MongoDB cluster. Instances can represent either a standalone MongoDB instance, a replica set, or a sharded cluster. Instances of this class are responsible for maintaining an up-to-date state of the cluster, and possibly cache resources related to this, including background threads for monitoring, and connection pools.
The MongoClient object is used to get access to a database, using the getDatabase() method, and to work with its collections and documents.
From the documentation:
The MongoClient instance represents a pool of connections to the
database; you will only need one instance of class MongoClient even
with multiple threads.
IMPORTANT
Typically you only create one MongoClient instance for a
given MongoDB deployment (e.g. standalone, replica set, or a sharded
cluster) and use it across your application. However, if you do create
multiple instances:
All resource usage limits (e.g. max connections, etc.) apply per
MongoClient instance.
To dispose of an instance, call MongoClient.close() to clean up resources.
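As a small, hedged illustration of that last point (assuming the com.mongodb.client API used below): MongoClient extends Closeable, so a try-with-resources block releases the pools and monitoring threads automatically.
// Minimal sketch: closing the client releases its connection pools and monitor threads.
try (MongoClient mongoClient = MongoClients.create("mongodb://localhost:27017")) {
    // ... use mongoClient here ...
}   // MongoClient.close() is called automatically on exit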
The following code creates a MongoDB client connection object with connection pooling to connect to a MongoDB instance.
MongoClient mongoClient = MongoClients.create();
MongoDatabase database = mongoClient.getDatabase("test");
The MongoClients.create() static method creates a client connected to the default host (localhost) and port (27017). You can explicitly specify other settings with MongoClientSettings, which controls various aspects of the behavior of a MongoClient.
MongoClient mongoClient = MongoClients.create(MongoClientSettings settings)
Connection Pool Settings:
The ConnectionPoolSettings object specifies all settings that relate to the pool of connections to a MongoDB server. The application creates this connection pool when the client object is created; how the pool is created is driver specific.
ConnectionPoolSettings.Builder, a builder for ConnectionPoolSettings, has methods to specify the connection pool properties, e.g. maxSize(int maxSize): the maximum number of connections allowed (default is 100). Other methods include minSize, maxConnectionIdleTime, etc.
Code to instantiate a MongoClient with connection pool settings:
MongoClientSettings settings = MongoClientSettings.builder()
        .applyToConnectionPoolSettings(builder ->
                builder.maxSize(20).minSize(10))
        .build();
MongoClient mongoClient = MongoClients.create(settings);
// ...
// Verify the connection pool settings max size:
settings.getConnectionPoolSettings().getMaxSize()
Question: Is the connectionPoolSetting related to one mongos server?
or related to all mongos servers?
A client or application connects to the sharded cluster (including all its shards) via a mongos router. The client program specifies the connection string URL and other options for the connection. In a sharded cluster, a client may connect through a set of mongos routers or a single mongos, or multiple clients can connect through a single mongos, etc.; it depends upon your application architecture.
If you are connecting via a single mongos, you specify that mongos's host, port, user/password, etc. in the connection string. If there are multiple mongos routers, you specify multiple host/port values. Irrespective of the number of mongos routers listed, the client program connects to the cluster via only one mongos.
The connection pool setting is for one mongos router only, as an application connects to one mongos irrespective of the number of mongos routers specified in the connection string.
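As an illustration, here is a hedged sketch (host names, ports, and credentials are placeholders) of creating a client for such a deployment with explicit pool settings; per the ConnectionPoolSettings Javadoc quoted above, the resulting maxSize governs the pool of connections to a MongoDB server:
import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;

// Placeholder hosts and credentials; readPreference and other options come from the URI.
ConnectionString uri = new ConnectionString(
        "mongodb://user:pwd@mongos1:27017,mongos2:27017,mongos3:27017/admin?readPreference=secondaryPreferred");

MongoClientSettings settings = MongoClientSettings.builder()
        .applyConnectionString(uri)
        .applyToConnectionPoolSettings(builder -> builder.maxSize(10))
        .build();

MongoClient mongoClient = MongoClients.create(settings);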

Related

Is there a way to create a set of X connections to mongodb using MongoClient on application startup?

I want to create "X" connections to MongoDB on application startup (i.e. before my application starts taking traffic).
MongoDB Version: 4.0.11
Mongo Java Driver Version (maven): 3.4.1
I have tried setting "minConnectionsPerHost" to the required number, but when I execute the code it barely opens 1 or 2 connections. When I put load on my application, the connection count slowly goes up to accommodate the traffic. I want to create those connections before my application starts taking traffic.
ServerAddress address = new ServerAddress("localhost", 27017);
List<ServerAddress> serverAddresses = Arrays.asList(address);

MongoCredential credential =
        MongoCredential.createCredential("XXXX", "XXXX", "XXXX".toCharArray());
List<MongoCredential> mongoCredentials = Arrays.asList(credential);

MongoClientOptions clientOptions = MongoClientOptions.builder()
        .connectionsPerHost(100)
        .minConnectionsPerHost(50)
        .build();

MongoClient mongoClient = new MongoClient(serverAddresses, mongoCredentials, clientOptions);
Is there a way to achieve this using the mongo java driver?
You can set minConnectionsPerHost() in the options builder, and then use a warmup script to create many connections. The connection pool will keep minConnectionsPerHost connections alive without closing them.
The warmup script can spawn 2 * minConnectionsPerHost threads, each of which connects and performs, say, a dummy read operation. This way connections will be opened, and the minimum number of connections will be kept alive.
This is a bit of a dirty solution :-) but it might work!
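A rough sketch of such a warmup (the thread count, database name, and ping command are illustrative assumptions, not a prescribed recipe):
import com.mongodb.client.MongoDatabase;
import org.bson.Document;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical warmup: issue a cheap command from 2 * minConnectionsPerHost threads
// so the pool opens that many connections before real traffic arrives.
int warmupThreads = 2 * 50;                       // 2 * minConnectionsPerHost
ExecutorService warmupPool = Executors.newFixedThreadPool(warmupThreads);
MongoDatabase adminDb = mongoClient.getDatabase("admin");
for (int i = 0; i < warmupThreads; i++) {
    warmupPool.submit(() -> adminDb.runCommand(new Document("ping", 1))); // dummy read
}
warmupPool.shutdown();
try {
    warmupPool.awaitTermination(30, TimeUnit.SECONDS);
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}
// Connections beyond minConnectionsPerHost may be closed after the idle timeout;
// the minimum is kept alive by the pool.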

Cassandra behavior on contact point based on data center

Cassandra setup in 3 data-center (dc1, dc2 & dc3) forming a cluster
Running a Java Application on dc1.
dc1 application has Cassandra connectors pointed to dc1 (ips of cassandra in dc1 alone given to the application)
Turning off the dc1 Cassandra nodes causes the application to throw exceptions like:
All host(s) tried for query failed (no host was tried)
More Info:
cassandra-driver-core-3.0.8.jar
netty-3.10.5.Final.jar
netty-buffer-4.0.37.Final.jar
netty-codec-4.0.37.Final.jar
netty-common-4.0.37.Final.jar
netty-handler-4.0.37.Final.jar
netty-transport-4.0.37.Final.jar
Keyspace : Network topology
Replication : dc1:2, dc2:2, dc3:2
Cassandra Version : 3.11.4
Here are some things I have found out with connections and Cassandra (and BTW, I believe Cassandra has one of the best HA configurations of any database I've worked with over the past 25 years).
1) Ensure you have all of the components specified in your connection configuration. Here is an example of some of the connection components, but there are others as well (maybe you've already done this):
cluster = Cluster.builder()
        .addContactPoints(nodes.split(","))
        .withCredentials(username, password)
        .withPoolingOptions(poolingOptions)
        .withLoadBalancingPolicy(
                new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder()
                        .withLocalDc("MYLOCALDC")
                        .withUsedHostsPerRemoteDc(1)
                        .allowRemoteDCsForLocalConsistencyLevel()
                        .build()
                )
        ).build();
2) Unless the entire DC you're "working in" is down, you could receive errors. Cassandra doesn't fail over to an alternate DC unless every node in the local DC is down. If only some nodes are down and your client can't satisfy its consistency level (CL) settings, you will receive errors. When I tested this a while back, I was actually hoping that if the client CL couldn't be achieved in the LOCAL DC (even with some local nodes up) but an alternate DC could satisfy it, the driver would automatically fail over, but this was not the case (as of my last test).
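As a side note, the client CL mentioned above is set per statement (or as a default via QueryOptions); a rough driver 3.x sketch, where the keyspace, table, id, and session variables are placeholders:
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

// LOCAL_QUORUM counts only replicas in the local DC, so it fails when too few
// local replicas are up, even if remote DCs are healthy.
Statement stmt = new SimpleStatement("SELECT * FROM my_ks.my_table WHERE id = ?", id)
        .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
session.execute(stmt);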
Maybe that helps?
-Jim

Client connection with remote Hbase server in case of network issue for some time

I am using the Table interface (https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html) and the Connection interface (https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Connection.html) to get the Table object. As mentioned in the Connection interface docs: "Connection creation is a heavy-weight operation. Connection implementations are thread-safe, so that the client can create a connection once, and share it with different threads."
So if I am creating a single Connection object for all threads (creating this object in a static block), what will happen if there is some network issue and the client loses its connection with the HBase cluster for some time? Will the Connection object still work after that?
If the connection is lost and comes back within a certain period (the TCP timeout), everything will work fine.
There is a TCP connection established between the client and HBase, and as mentioned in the documentation, "The individual connections to servers, meta cache, zookeeper connection, etc are all shared by the Table and Admin instances obtained from this connection". If data was sent while the network was unreachable, it will sit in the send buffer, the client will retransmit those segments, and HBase will receive them when the network comes back.
But if the network does not become reachable within that period (the TCP timeout), TCP finally gives up and closes the socket. For this situation you have to add a catch block to handle it, or restart the application.
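A rough sketch of that pattern (the table name and error handling are illustrative assumptions): one shared, thread-safe Connection built once in a static block, with the IOException from each operation caught so a timed-out socket can be logged, retried, or the connection rebuilt instead of crashing the application.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;

public class SharedHBaseClient {
    // One heavy-weight, thread-safe Connection shared by all threads.
    private static final Connection CONNECTION;

    static {
        try {
            Configuration conf = HBaseConfiguration.create();
            CONNECTION = ConnectionFactory.createConnection(conf);
        } catch (IOException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static Result readRow(byte[] rowKey) {
        // Table instances are lightweight; get one per operation and close it.
        try (Table table = CONNECTION.getTable(TableName.valueOf("my_table"))) {
            return table.get(new Get(rowKey));
        } catch (IOException e) {
            // A dropped/timed-out socket surfaces here; log, retry, or rebuild the connection.
            throw new RuntimeException("HBase read failed", e);
        }
    }
}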

Java EE show how many connections left in the connection pool

My system encounters connection leaks in the connection pool. I would like to list some statistics of the connection pool regularly; how can I do that? For example: Current Capacity, Active Connections High Count, Connections Total Count, Leaked Connection Count, etc.
I am using javax.sql.DataSource to retrieve connections from the connection pool, but I couldn't find any interface that can retrieve that connection pool information. Any ideas?
I am using Oracle DB, with Java EE on the server side.
The javax.sql.DataSource is an interface and it just abstracts a data source; it does not itself provide pooled connections.
A connection pool is responsible for providing pooled, reusable connections to a database (data source).
First you need to find out which connection pool you're using. Connection pool implementations usually provide a way to query things like the number of active connections.
For example, Apache DBCP has a BasicDataSource class which is a connection pool, and it has methods for this:
BasicDataSource.getMaxTotal();
BasicDataSource.getNumActive();
BasicDataSource.getNumIdle();
BasicDataSource.getMinIdle();
BasicDataSource.getMaxIdle();
Since you mentioned you're using Oracle DB, most likely your connection pool is OracleOCIConnectionPool (part of the Oracle JDBC driver), which provides:
OracleOCIConnectionPool.getMaxLimit();
OracleOCIConnectionPool.getPoolSize();
OracleOCIConnectionPool.getActiveSize();
OracleOCIConnectionPool.getMinLimit();
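As an illustration, a hedged sketch (assuming the pool really is an Apache DBCP2 BasicDataSource; the dataSource variable is whatever pool your server provides) that logs pool statistics on a schedule to help spot a leak:
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.commons.dbcp2.BasicDataSource;

// Assumption: the DataSource is a DBCP2 BasicDataSource, so it can be cast and queried.
BasicDataSource pool = (BasicDataSource) dataSource;
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
scheduler.scheduleAtFixedRate(
        () -> System.out.printf("pool stats: active=%d idle=%d maxTotal=%d%n",
                pool.getNumActive(), pool.getNumIdle(), pool.getMaxTotal()),
        0, 60, TimeUnit.SECONDS);   // log once a minute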

setMaxConns and setMaxConnsPerHost in Astyanax client

I am using the Astyanax client to read data from a Cassandra database. I have a single cluster with four nodes and a replication factor of 2. I am trying to understand what the difference is between
setMaxConns and setMaxConnsPerHost
methods in the Astyanax client. I cannot find proper documentation on this.
I have multithreaded code which spawns multiple threads, creates the connection to the Cassandra database only once (as it is a singleton), and then keeps reusing it for other requests.
Now I am trying to understand how the above two methods play a role in read performance, and how those values should be set.
And if I set those two methods as
setMaxConns(-1) and setMaxConnsPerHost(20)
then what does that mean? Any explanation will be of great help.
Updated Code:-
Below is the code I am using to make the connection:
private CassandraAstyanaxConnection() {
    context = new AstyanaxContext.Builder()
            .forCluster(ModelConstants.CLUSTER)
            .forKeyspace(ModelConstants.KEYSPACE)
            .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                    .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)
            )
            .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("MyConnectionPool")
                    .setPort(9160)
                    .setMaxConnsPerHost(20)
                    .setMaxConns(-1)
                    .setSeeds("host1:9160,host2:9160,host3:9160,host4:9160")
            )
            .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                    .setCqlVersion("3.0.0")
                    .setTargetCassandraVersion("1.2"))
            .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
            .buildKeyspace(ThriftFamilyFactory.getInstance());

    context.start();
    keyspace = context.getEntity();

    emp_cf = ColumnFamily.newColumnFamily(
            ModelConstants.COLUMN_FAMILY,
            StringSerializer.get(),
            StringSerializer.get());
}
When I debug this code, it never even hits the BagOfConnectionsConnectionPoolImpl class. I put a lot of breakpoints in that class to see how it uses the connections and other default parameters, but I don't know why it is not hitting that class.
The behavior regarding these configuration properties might be dependent on implementation.
BagOfConnectionsConnectionPoolImpl
BagOfConnectionsConnectionPoolImpl is the only implementation at the moment that honors both these properties. It behaves as follows:
A connection is borrowed from the pool on every Cassandra operation (query or mutation) and returned to the pool upon completion of the operation.
maxConnsPerHost - maximum number of connections per single cassandra host.
maxConns - maximum number of connections in the pool.
Both these numbers must be positive, so setMaxConns(-1) just won't work.
On an attempt to borrow a connection from the pool, the pool checks the active connection count against maxConns. If the limit is exceeded, it waits until some connection is released. If no connection becomes available within the specified timeout, the pool throws a PoolTimeoutException.
If the maxConns limit is not exceeded, the pool attempts to find a Cassandra host it is aware of (specified as a seed or found during discovery) whose number of active connections is below maxConnsPerHost, and connects to it. If all hosts have reached the connection limit, the pool throws a NoAvailableHostsException.
For example, let's take a client that connects to cluster of 4 nodes:
setMaxConns(100); setMaxConnsPerHost(10): Effective maximum number of connections is 40 (10 connections per node, no further connection attempts will be made). NoAvailableHostsException will be thrown.
setMaxConns(20); setMaxConnsPerHost(10): Effective maximum number of connections is 20. The connections to different hosts will be distributed uniformly, but not necessary equally. PoolTimeoutException will be thrown.
Things get more complicated if nodes join or leave cluster, but general idea is the same.
TokenAwareConnectionPoolImpl & RoundRobinConnectionPoolImpl
Both TokenAwareConnectionPoolImpl & RoundRobinConnectionPoolImpl ignore maxConns configuration property. They just select a host (depending on row token or randomly) and attempt to connect to it.
If the number of active connections to that host exceeds maxConnsPerHost, the pool waits until some connection is released. If no connection becomes available within the specified timeout, another connection attempt to (potentially) another host is made as part of failover.
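Regarding the update in the question (breakpoints in BagOfConnectionsConnectionPoolImpl never being hit): the pool implementation is selected via setConnectionPoolType on AstyanaxConfigurationImpl, and as far as I recall the default is not the bag pool, which would explain why that class is never reached. A hedged sketch of selecting it explicitly:
import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolType;
import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;

// Sketch: explicitly pick the bag pool so that both maxConns and maxConnsPerHost
// (both must be positive) are honored; pass this to .withAstyanaxConfiguration(...).
AstyanaxConfigurationImpl config = new AstyanaxConfigurationImpl()
        .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)
        .setCqlVersion("3.0.0")
        .setTargetCassandraVersion("1.2")
        .setConnectionPoolType(ConnectionPoolType.BAG);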
