Access FreePastry program that is behind NAT - java

I'm trying to connect to my program that uses FreePastry behind a NAT but getting no where. mIP is my public IP, mBootport and mBindport is 50001. I have forworded this ports in my router to my computer stil it does not work. I disabled the firewall yet nothing. I disconnected the router and connect directly to the internet and stil it does not work. The only time it does work in on my local network. So something most be wrong in either the code of the config file but i can not see what is wrong.
Environment env = new Environment();
InetSocketAddress bootaddress = new InetSocketAddress(mIP, mBootport);
NodeIdFactory nidFactory = new RandomNodeIdFactory(env);
PastryNodeFactory factory = new SocketPastryNodeFactory(nidFactory, mBindport, env);
for (int curNode = 0; curNode < mNumNodes; curNode++) {
PastryNode node = factory.newNode();
NetworkHandler app = new NetworkHandler(node, mLog);
apps.add(app);
node.boot(bootaddress);
synchronized(node) {
while(!node.isReady() && !node.joinFailed()) {
node.wait(500);
if (node.joinFailed()) {
throw new IOException("Could not join the FreePastry ring. Reason:"+node.joinFailedReason());
}
}
}
System.out.println("Finished creating new node: " + node);
mLog.append("Finished creating new node: " + node + "\n");
}
Iterator<NetworkHandler> i = apps.iterator();
NetworkHandler app = (NetworkHandler) i.next();
app.subscribe();
public class NetworkHandler implements ScribeClient, Application {
int seqNum = 0;
CancellableTask publishTask;
Scribe myScribe;
Topic myTopic;
JTextArea mLog;
protected Endpoint endpoint;
public NetworkHandler(Node node, JTextArea log) {
this.endpoint = node.buildEndpoint(this, "myinstance");
mLog = log;
myScribe = new ScribeImpl(node,"myScribeInstance");
myTopic = new Topic(new PastryIdFactory(node.getEnvironment()), "example topic");
System.out.println("myTopic = "+myTopic);
mLog.append("myTopic = "+myTopic + "\n");
endpoint.register();
}
public void subscribe() {
myScribe.subscribe(myTopic, this);
}
}
freepastry.params
# this file holds the default values for pastry and it's applications
# you do not need to modify the default.params file to override these values
# instead you can use your own params file to set values to override the
# defaults. You can specify this file by constructing your
# rice.environment.Environment() with the filename you wish to use
# typically, you will want to be able to pass this file name from the command
# line
# max number of handles stored per routing table entry
pastry_rtMax = 1
pastry_rtBaseBitLength = 4
# leafset size
pastry_lSetSize = 24
# maintenance frequencies
pastry_leafSetMaintFreq = 60
pastry_routeSetMaintFreq = 900
# drop the message if pastry is not ready
pastry_messageDispatch_bufferIfNotReady = false
# number of messages to buffer while an app hasn't yet been registered
pastry_messageDispatch_bufferSize = 32
# FP 2.1 uses the new transport layer
transport_wire_datagram_receive_buffer_size = 131072
transport_wire_datagram_send_buffer_size = 65536
transport_epoch_max_num_addresses = 2
transport_sr_max_num_hops = 5
# proximity neighbor selection
transport_use_pns = true
# number of rows in the routing table to consider during PNS
# valid values are ALL, or a number
pns_num_rows_to_use = 10
# commonapi testing parameters
# direct or socket
commonapi_testing_exit_on_failure = true
commonapi_testing_protocol = direct
commonapi_testing_startPort = 5009
commonapi_testing_num_nodes = 10
# set this to specify the bootstrap node
#commonapi_testing_bootstrap = localhost:5009
# random number generator's seed, "CLOCK" uses the current clock time
random_seed = CLOCK
# sphere, euclidean or gt-itm
direct_simulator_topology = sphere
# -1 starts the simulation with the current time
direct_simulator_start_time = -1
#pastry_direct_use_own_random = true
#pastry_periodic_leafset_protocol_use_own_random = true
pastry_direct_gtitm_matrix_file=GNPINPUT
# the number of stubs in your network
pastry_direct_gtitm_max_overlay_size=1000
# the number of virtual nodes at each stub: this allows you to simulate multiple "LANs" and allows cheeper scaling
pastry_direct_gtitm_nodes_per_stub=1
# the factor to multiply your file by to reach millis. Set this to 0.001 if your file is in microseconds. Set this to 1000 if your file is in seconds.
pastry_direct_gtitm_delay_factor=1.0
#millis of the maximum network delay for the generated network topologies
pastry_direct_max_diameter=200
pastry_direct_min_delay=2
#setting this to false will use the old protocols which are about 200 times as fast, but may cause routing inconsistency in a real network. Probably won't in a simulator because it will never be incorrect about liveness
pastry_direct_guarantee_consistency=true
# rice.pastry.socket parameters
# tells the factory you intend to use multiple nodes
# this causes the logger to prepend all entries with the nodeid
pastry_factory_multipleNodes = true
pastry_factory_selectorPerNode = false
pastry_factory_processorPerNode = false
# number of bootstap nodehandles to fetch in parallel
pastry_factory_bootsInParallel = 1
# the maximum size of a message
pastry_socket_reader_selector_deserialization_max_size = 1000000
# the maximum number of outgoing messages to queue when a socket is slower than the number of messages you are queuing
pastry_socket_writer_max_queue_length = 30
pastry_socket_writer_max_msg_size = 20480
pastry_socket_repeater_buffer_size = 65536
pastry_socket_pingmanager_smallPings=true
pastry_socket_pingmanager_datagram_receive_buffer_size = 131072
pastry_socket_pingmanager_datagram_send_buffer_size = 65536
# the time before it will retry a route that was already found dead
pastry_socket_srm_check_dead_throttle = 300000
pastry_socket_srm_proximity_timeout = 3600000
pastry_socket_srm_ping_throttle = 30000
pastry_socket_srm_default_rto = 3000
pastry_socket_srm_rto_ubound = 10000
pastry_socket_srm_rto_lbound = 50
pastry_socket_srm_gain_h = 0.25
pastry_socket_srm_gain_g = 0.125
pastry_socket_scm_max_open_sockets = 300
pastry_socket_scm_max_open_source_routes = 30
# the maximum number of source routes to attempt, setting this to 0 will
# effectively eliminate source route attempts
# setting higher than the leafset does no good, it will be bounded by the leafset
# a larger number tries more source routes, which could give you a more accurate
# determination, however, is more likely to lead to congestion collapse
pastry_socket_srm_num_source_route_attempts = 8
pastry_socket_scm_socket_buffer_size = 32768
# this parameter is multiplied by the exponential backoff when doing a liveness check so the first will be 800, then 1600, then 3200 etc...
pastry_socket_scm_ping_delay = 800
# adds some fuzziness to the pings to help prevent congestion collapse, so this will make the ping be advanced or delayed by this factor
pastry_socket_scm_ping_jitter = 0.1
# how many pings until we call the node faulty
pastry_socket_scm_num_ping_tries = 5
pastry_socket_scm_write_wait_time = 30000
pastry_socket_scm_backoff_initial = 250
pastry_socket_scm_backoff_limit = 5
pastry_socket_pingmanager_testSourceRouting = false
pastry_socket_increment_port_after_construction = true
# if you want to allow connection to 127.0.0.1, set this to true
pastry_socket_allow_loopback = false
# these params will be used if the computer attempts to bind to the loopback address, they will open a socket to this address/port to identify which network adapter to bind to
pastry_socket_known_network_address = yahoo.com
pastry_socket_known_network_address_port = 80
pastry_socket_use_own_random = true
pastry_socket_random_seed = clock
# force the node to be a seed node
rice_socket_seed = false
# the parameter simulates some nodes being firewalled, base on rendezvous_test_num_firewalled
rendezvous_test_firewall = false
# probabilistic fraction of firewalled nodes
rendezvous_test_num_firewalled = 0.3
# don't firewall the first node, useful for testing
rendezvous_test_makes_bootstrap = false
# FP 2.1 uses the new transport layer
transport_wire_datagram_receive_buffer_size = 131072
transport_wire_datagram_send_buffer_size = 65536
# NAT/UPnP settings
nat_network_prefixes = 127.0.0.1;10.;192.168.
# Enable and set this if you have already set up port forwarding and know the external address
#external_address = 123.45.67.89:1234
#enable this if you set up port forwarding (on the same port), but you don't
#know the external address and you don't have UPnP enabled
#this is useful for a firwall w/o UPnP support, and your IP address isn't static
probe_for_external_address = true
# values how to probe
pastry_proxy_connectivity_timeout = 15000
pastry_proxy_connectivity_tries = 3
# possible values: always, never, prefix (prefix is if the localAddress matches any of the nat_network_prefixes
# whether to search for a nat using UPnP (default: prefix)
nat_search_policy = prefix
# whether to verify connectivity (default: boot)
firewall_test_policy = never
# policy for setting port forwarding the state of the firewall if there is already a conflicting rule: overwrite, fail (throw exception), change (use different port)
# you may want to set this to overwrite or fail on the bootstrap nodes, but most freepastry applications can run on any available port, so the default is change
nat_state_policy = change
# the name of the application in the firewall, set this if you want your application to have a more specific name
nat_app_name = freepastry
# how long to wait for responses from the firewall, in millis
nat_discovery_timeout = 5000
# how many searches to try to find a free firewall port
nat_find_port_max_tries = 10
# uncomment this to use UPnP NAT port forwarding, you need to include in the classpath: commons-jxpath-1.1.jar:commons-logging.jar:sbbi-upnplib-xxx.jar
nat_handler_class = rice.pastry.socket.nat.sbbi.SBBINatHandler
# hairpinning:
# default "prefix" requires more bandwidth if you are behind a NAT. It enables multiple IP
# addresses in the NodeHandle if you are behind a NAT. These are usually the internet routable address,
# and the LAN address (usually 192.168.x.x)
# you can set this to never if any of the following conditions hold:
# a) you are the only FreePastry node behind this address
# b) you firewall supports hairpinning see
# http://scm.sipfoundry.org/rep/ietf-drafts/behave/draft-ietf-behave-nat-udp-03.html#rfc.section.6
nat_nodehandle_multiaddress = prefix
# if we are not scheduled for time on cpu in this time, we setReady(false)
# otherwise there could be message inconsistency, because
# neighbors may believe us to be dead. Note that it is critical
# to consider the amount of time it takes the transport layer to find a
# node faulty before setting this parameter, this parameter should be
# less than the minimum time required to find a node faulty
pastry_protocol_consistentJoin_max_time_to_be_scheduled = 15000
# in case messages are dropped or something, how often it will retry to
# send the consistent join message, to get verification from the entire
# leafset
pastry_protocol_consistentJoin_retry_interval = 30000
# parameter to control how long dead nodes are retained in the "failed set" in
# CJP (see ConsistentJoinProtocol ctor) (15 minutes)
pastry_protocol_consistentJoin_failedRetentionTime = 900000
# how often to cleanup the failed set (5 mins) (see ConsistentJoinProtocol ctor)
pastry_protocol_consistentJoin_cleanup_interval = 300000
# the maximum number of entries to send in the failed set, only sends the most
recent detected failures (see ConsistentJoinProtocol ctor)
pastry_protocol_consistentJoin_maxFailedToSend = 20
# how often we send/expect to be sent updates
pastry_protocol_periodicLeafSet_ping_neighbor_period = 20000
pastry_protocol_periodicLeafSet_lease_period = 30000
# what the grace period is to receive a periodic update, before checking
# liveness
pastry_protocol_periodicLeafSet_request_lease_throttle = 10000
# how many entries are kept in the partition handler's table
partition_handler_max_history_size=20
# how long entries in the partition handler's table are kept
# 90 minutes
partition_handler_max_history_age=5400000
# what fraction of the time a bootstrap host is checked
partition_handler_bootstrap_check_rate=0.05
# how often to run the partition handler
# 5 minutes
partition_handler_check_interval=300000
# the version number of the RouteMessage to transmit (it can receive anything that it knows how to)
# this is useful if you need to migrate an older ring
# you can change this value in realtime, so, you can start at 0 and issue a command to update it to 1
pastry_protocol_router_routeMsgVersion = 1
# should usually be equal to the pastry_rtBaseBitLength
p2p_splitStream_stripeBaseBitLength = 4
p2p_splitStream_policy_default_maximum_children = 24
p2p_splitStream_stripe_max_failed_subscription = 5
p2p_splitStream_stripe_max_failed_subscription_retry_delay = 1000
#multiring
p2p_multiring_base = 2
#past
p2p_past_messageTimeout = 30000
p2p_past_successfulInsertThreshold = 0.5
#replication
# fetch delay is the delay between fetching successive keys
p2p_replication_manager_fetch_delay = 500
# the timeout delay is how long we take before we time out fetching a key
p2p_replication_manager_timeout_delay = 20000
# this is the number of keys to delete when we detect a change in the replica set
p2p_replication_manager_num_delete_at_once = 100
# this is how often replication will wake up and do maintainence; 10 mins
p2p_replication_maintenance_interval = 600000
# the maximum number of keys replication will try to exchange in a maintainence message
p2p_replication_max_keys_in_message = 1000
#scribe
p2p_scribe_maintenance_interval = 180000
#time for a subscribe fail to be thrown (in millis)
p2p_scribe_message_timeout = 15000
#util
p2p_util_encryptedOutputStream_buffer = 32678
#aggregation
p2p_aggregation_logStatistics = true
p2p_aggregation_flushDelayAfterJoin = 30000
#5 MINS
p2p_aggregation_flushStressInterval = 300000
#5 MINS
p2p_aggregation_flushInterval = 300000
#1024*1024
p2p_aggregation_maxAggregateSize = 1048576
p2p_aggregation_maxObjectsInAggregate = 25
p2p_aggregation_maxAggregatesPerRun = 2
p2p_aggregation_addMissingAfterRefresh = true
p2p_aggregation_maxReaggregationPerRefresh = 100
p2p_aggregation_nominalReferenceCount = 2
p2p_aggregation_maxPointersPerAggregate = 100
#14 DAYS
p2p_aggregation_pointerArrayLifetime = 1209600000
#1 DAY
p2p_aggregation_aggregateGracePeriod = 86400000
#15 MINS
p2p_aggregation_aggrRefreshInterval = 900000
p2p_aggregation_aggrRefreshDelayAfterJoin = 70000
#3 DAYS
p2p_aggregation_expirationRenewThreshold = 259200000
p2p_aggregation_monitorEnabled = false
#15 MINS
p2p_aggregation_monitorRefreshInterval = 900000
#5 MINS
p2p_aggregation_consolidationDelayAfterJoin = 300000
#15 MINS
p2p_aggregation_consolidationInterval = 900000
#14 DAYS
p2p_aggregation_consolidationThreshold = 1209600000
p2p_aggregation_consolidationMinObjectsInAggregate = 20
p2p_aggregation_consolidationMinComponentsAlive = 0.8
p2p_aggregation_reconstructionMaxConcurrentLookups = 10
p2p_aggregation_aggregateLogEnabled = true
#1 HOUR
p2p_aggregation_statsGranularity = 3600000
#3 WEEKS
p2p_aggregation_statsRange = 1814400000
p2p_aggregation_statsInterval = 60000
p2p_aggregation_jitterRange = 0.1
# glacier
p2p_glacier_logStatistics = true
p2p_glacier_faultInjectionEnabled = false
p2p_glacier_insertTimeout = 30000
p2p_glacier_minFragmentsAfterInsert = 3.0
p2p_glacier_refreshTimeout = 30000
p2p_glacier_expireNeighborsDelayAfterJoin = 30000
#5 MINS
p2p_glacier_expireNeighborsInterval = 300000
#5 DAYS
p2p_glacier_neighborTimeout = 432000000
p2p_glacier_syncDelayAfterJoin = 30000
#5 MINS
p2p_glacier_syncMinRemainingLifetime = 300000
#insertTimeout
p2p_glacier_syncMinQuietTime = 30000
p2p_glacier_syncBloomFilterNumHashes = 3
p2p_glacier_syncBloomFilterBitsPerKey = 4
p2p_glacier_syncPartnersPerTrial = 1
#1 HOUR
p2p_glacier_syncInterval = 3600000
#3 MINUTES
p2p_glacier_syncRetryInterval = 180000
p2p_glacier_syncMaxFragments = 100
p2p_glacier_fragmentRequestMaxAttempts = 0
p2p_glacier_fragmentRequestTimeoutDefault = 10000
p2p_glacier_fragmentRequestTimeoutMin = 10000
p2p_glacier_fragmentRequestTimeoutMax = 60000
p2p_glacier_fragmentRequestTimeoutDecrement = 1000
p2p_glacier_manifestRequestTimeout = 10000
p2p_glacier_manifestRequestInitialBurst = 3
p2p_glacier_manifestRequestRetryBurst = 5
p2p_glacier_manifestAggregationFactor = 5
#3 MINUTES
p2p_glacier_overallRestoreTimeout = 180000
p2p_glacier_handoffDelayAfterJoin = 45000
#4 MINUTES
p2p_glacier_handoffInterval = 240000
p2p_glacier_handoffMaxFragments = 10
#10 MINUTES
p2p_glacier_garbageCollectionInterval = 600000
p2p_glacier_garbageCollectionMaxFragmentsPerRun = 100
#10 MINUTES
p2p_glacier_localScanInterval = 600000
p2p_glacier_localScanMaxFragmentsPerRun = 20
p2p_glacier_restoreMaxRequestFactor = 4.0
p2p_glacier_restoreMaxBoosts = 2
p2p_glacier_rateLimitedCheckInterval = 30000
p2p_glacier_rateLimitedRequestsPerSecond = 3
p2p_glacier_enableBulkRefresh = true
p2p_glacier_bulkRefreshProbeInterval = 3000
p2p_glacier_bulkRefreshMaxProbeFactor = 3.0
p2p_glacier_bulkRefreshManifestInterval = 30000
p2p_glacier_bulkRefreshManifestAggregationFactor = 20
p2p_glacier_bulkRefreshPatchAggregationFactor = 50
#3 MINUTES
p2p_glacier_bulkRefreshPatchInterval = 180000
p2p_glacier_bulkRefreshPatchRetries = 2
p2p_glacier_bucketTokensPerSecond = 100000
p2p_glacier_bucketMaxBurstSize = 200000
p2p_glacier_jitterRange = 0.1
#1 MINUTE
p2p_glacier_statisticsReportInterval = 60000
p2p_glacier_maxActiveRestores = 3
#transport layer testing params
org.mpisws.p2p.testing.transportlayer.replay.Recorder_printlog = true
# logging
#default log level
loglevel = WARNING
#example of enabling logging on the endpoint:
#rice.p2p.scribe#ScribeRegrTest-endpoint_loglevel = INFO
logging_packageOnly = true
logging_date_format = yyyyMMdd.HHmmss.SSS
logging_enable=true
# 24 hours
log_rotate_interval = 86400000
# the name of the active log file, and the filename prefix of rotated log
log_rotate_filename = freepastry.log
# the format of the date for the rotating log
log_rotating_date_format = yyyyMMdd.HHmmss.SSS
# true will tell the environment to ues the FileLogManager
environment_logToFile = false
# the prefix for the log files (otherwise will be named after the nodeId)
fileLogManager_filePrefix =
# the suffix for the log files
fileLogManager_fileSuffix = .log
# wether to keep the line prefix (declaring the node id) for each line of the log
fileLogManager_keepLinePrefix = false
fileLogManager_multipleFiles = true
fileLogManager_defaultFileName = main
# false = append true = overwrite
fileLogManager_overwrite_existing_log_file = false
# the amount of time the LookupService tutorial app will wait before timing out
# in milliseconds, default is 30 seconds
lookup_service.timeout = 30000
# how long to wait before the first retry
lookup_service.firstTimeout = 500
Edit: Comfirmed with wireshark that the message indeed reach the computer freepastry just don't accept the connection.

Not sure what you mean by "not work". To test the connectivity between your client and your server (sit behind NAT), you just need do something like "telnet mIP mBindport" on your client side, assuming you have a telnet utility (default on Linux and Mac, you can install one, like nc ("netcat") on your windows).
If the port forwarding is set up correctly, you should see something like the following when the TCP connection is set up with your server.
Connected to localhost.
Escape character is '^]'.
Once the TCP session sets up correctly, you can stop the "telnet" program and use your real client (in java) to talk to your server, it should work fine.
If the TCP session didn't set up, you may want to check on the server side. Use either a wireshark or tcpdump to capture packets with filter "tcp port 50001", and run the telnet command above to check if there is a TCP packet come in.
If nothing show up in wireshark or tcpdump, then your firewall (like portforwarding) is not set up correctly.
If the TCP packet does show up in wireshark or tcpdump, then your server program may be at fault. Check the IP address it binds to using the command (linux):
netstat -antp | grep 50001
(on windows, the command is slightly different).
Typically it should bind to IP address 0.0.0.0 (all ip), if it doesn't, you should check whether the IP it binds to has connectivity/route to the outside world (outside the NAT).
Good luck.

I would try to set your IP as your local for the computer Free Pastry is running on. It sounds like the computer is getting the information but Free Pastry is looking for it on a different address. If you set your mIP to be local, I think it would work. This would be if it is behind the router/NAT.
Port forwarding forwards packets from your public IP on port 50001 to your internal computer IP on whatever port you set, normally the same 50001. If you set your program to listen on the public IP, it doesn't have access to it so it will not accept any packets/messages. Set to listen on the computers IP, or 0.0.0.0/localhost, it should accept any packets/messages on that port.

Related

OR-TOOLS EXCEPTION_ACCESS_VIOLATION when running TSP

When I run the TSP algorithm I get a fatal error on the native or-tools library.
There is a small chance to execute the TSP algo with success when running it only one time, but for consecutive executions without a big interval between them, it always happens.
I'm currently running it on Windows 10, but it tested it on Debian and Alpine and the problem still happens.
Here is a preview, but you can see the full log here (each time I get this error the problematic frame is different).
#
# A fatal error has been detected by the Java Runtime Environment:
#
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x000002a9e261c007, pid=12012, tid=9116
#
# JRE version: OpenJDK Runtime Environment (14.0.1+7) (build 14.0.1+7)
# Java VM: OpenJDK 64-Bit Server VM (14.0.1+7, mixed mode, sharing, tiered, compressed oops, g1 gc, windows-amd64)
# Problematic frame:
# J 10298 c2 org.neo4j.kernel.impl.store.LongerShortString.decode([JII)Lorg/neo4j/values/storable/TextValue; (120 bytes) # 0x000002a9e261c007 [0x000002a9e261bb80+0x0000000000000487]
#
# No core dump will be written. Minidumps are not enabled by default on client versions of Windows
#
# An error report file with more information is saved as:
# C:\Users\hugo_\Workspace\Itini\Backend\service-itinerary-builder\hs_err_pid12012.log
Compiled method (c2) 124037 10298 4 org.neo4j.kernel.impl.store.LongerShortString::decode (120 bytes)
total in heap [0x000002a9e261b910,0x000002a9e261cc48] = 4920
relocation [0x000002a9e261ba70,0x000002a9e261bb70] = 256
main code [0x000002a9e261bb80,0x000002a9e261c6e0] = 2912
stub code [0x000002a9e261c6e0,0x000002a9e261c6f8] = 24
oops [0x000002a9e261c6f8,0x000002a9e261c708] = 16
metadata [0x000002a9e261c708,0x000002a9e261c770] = 104
scopes data [0x000002a9e261c770,0x000002a9e261ca40] = 720
scopes pcs [0x000002a9e261ca40,0x000002a9e261cb90] = 336
dependencies [0x000002a9e261cb90,0x000002a9e261cb98] = 8
handler table [0x000002a9e261cb98,0x000002a9e261cc28] = 144
nul chk table [0x000002a9e261cc28,0x000002a9e261cc48] = 32
Compiled method (c2) 124056 12326 4 org.neo4j.kernel.impl.newapi.DefaultPropertyCursor::propertyValue (38 bytes)
total in heap [0x000002a9e2cb7410,0x000002a9e2cb88f8] = 5352
relocation [0x000002a9e2cb7570,0x000002a9e2cb7750] = 480
main code [0x000002a9e2cb7760,0x000002a9e2cb8280] = 2848
stub code [0x000002a9e2cb8280,0x000002a9e2cb82c8] = 72
oops [0x000002a9e2cb82c8,0x000002a9e2cb82d8] = 16
metadata [0x000002a9e2cb82d8,0x000002a9e2cb8398] = 192
scopes data [0x000002a9e2cb8398,0x000002a9e2cb8618] = 640
scopes pcs [0x000002a9e2cb8618,0x000002a9e2cb8828] = 528
dependencies [0x000002a9e2cb8828,0x000002a9e2cb8838] = 16
handler table [0x000002a9e2cb8838,0x000002a9e2cb88c8] = 144
nul chk table [0x000002a9e2cb88c8,0x000002a9e2cb88f8] = 48
Java code:
public List<AlgoNode> solve(final Collection<AlgoNode> nodes, final AlgoNode start) {
if (nodes == null || nodes.isEmpty())
return new ArrayList<>();
if (nodes.size() == 1)
return new ArrayList<>(nodes);
// Setup Variables
var list = new ArrayList<>(nodes);
var depot = start == null ? 0 : list.indexOf(start);
var manager = new RoutingIndexManager(list.size(), 1, depot);
var routing = new RoutingModel(manager);
// Define dummy weight function
var transitCallbackIndex = routing.registerTransitCallback((fromIndex, toIndex) -> 1L);
routing.setArcCostEvaluatorOfAllVehicles(transitCallbackIndex);
// Solve
var parameters = main.defaultRoutingSearchParameters().toBuilder();
parameters.setFirstSolutionStrategy(FirstSolutionStrategy.Value.PATH_CHEAPEST_ARC);
var solution = routing.solveWithParameters(parameters.build()); // Problematic line
return new ArrayList<>(); // Dummy return
}
I also tryied making the method syncronized, using a lock and calling closeModel() after running the TSP, but no lucky.
seems related to https://github.com/google/or-tools/issues/2091
Please, don't hesitate to open a github issue with all this information...

Can produce to Kafka but cannot consume

I'm using the Kafka JDK client ver 0.10.2.1 . I am able to produce simple messages to Kafka for a "heartbeat" test, but I cannot consume a message from that same topic using the sdk. I am able to consume that message when I go into the Kafka CLI, so I have confirmed the message is there. Here's the function I'm using to consume from my Kafka server, with the props - I pass the message I produced to the topic only after I have indeed confirmed the produce() was succesful, I can post that function later if requested:
private def consumeFromKafka(topic: String, expectedMessage: String): Boolean = {
val props: Properties = initProps("consumer")
val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(List(topic).asJava)
var readExpectedRecord = false
try {
val records = {
val firstPollRecs = consumer.poll(MAX_POLLTIME_MS)
// increase timeout and try again if nothing comes back the first time in case system is busy
if (firstPollRecs.count() == 0) firstPollRecs else {
logger.info("KafkaHeartBeat: First poll had 0 records- trying again - doubling timeout to "
+ (MAX_POLLTIME_MS * 2)/1000 + " sec.")
consumer.poll(MAX_POLLTIME_MS * 2)
}
}
records.forEach(rec => {
if (rec.value() == expectedMessage) readExpectedRecord = true
})
} catch {
case e: Throwable => //log error
} finally {
consumer.close()
}
readExpectedRecord
}
private def initProps(propsType: String): Properties = {
val prop = new Properties()
prop.put("bootstrap.servers", kafkaServer + ":" + kafkaPort)
propsType match {
case "producer" => {
prop.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
prop.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
prop.put("acks", "1")
prop.put("producer.type", "sync")
prop.put("retries", "3")
prop.put("linger.ms", "5")
}
case "consumer" => {
prop.put("group.id", groupId)
prop.put("enable.auto.commit", "false")
prop.put("auto.commit.interval.ms", "1000")
prop.put("session.timeout.ms", "30000")
prop.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
prop.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
prop.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
// poll just once, should only be one record for the heartbeat
prop.put("max.poll.records", "1")
}
}
prop
}
Now when I run the code, here's what it outputs in the console:
13:04:21 - Discovered coordinator serverName:9092 (id: 2147483647
rack: null) for group 0b8947e1-eb68-4af3-ac7b-be3f7c02e76e. 13:04:23
INFO o.a.k.c.c.i.ConsumerCoordinator - Revoking previously assigned
partitions [] for group 0b8947e1-eb68-4af3-ac7b-be3f7c02e76e 13:04:24
INFO o.a.k.c.c.i.AbstractCoordinator - (Re-)joining group
0b8947e1-eb68-4af3-ac7b-be3f7c02e76e 13:04:25 INFO
o.a.k.c.c.i.AbstractCoordinator - Successfully joined group
0b8947e1-eb68-4af3-ac7b-be3f7c02e76e with generation 1 13:04:26 INFO
o.a.k.c.c.i.ConsumerCoordinator - Setting newly assigned partitions
[HeartBeat_Topic.Service_5.2018-08-03.13_04_10.377-0] for group
0b8947e1-eb68-4af3-ac7b-be3f7c02e76e 13:04:27 INFO
c.p.p.l.util.KafkaHeartBeatUtil - KafkaHeartBeat: First poll had 0
records- trying again - doubling timeout to 60 sec.
And then nothing else, no errors thrown -so no records are polled. Does anyone have any idea what's preventing the 'consume' from happening? The subscriber seems to be successful, as I'm able to successfully call the listTopics and list partions no problem.
Your code has a bug. It seems your line:
if (firstPollRecs.count() == 0)
Should say this instead
if (firstPollRecs.count() > 0)
Otherwise, you're passing in an empty firstPollRecs, and then iterating over that, which obviously returns nothing.

How to increase Dataflow read parallelism from Cassandra

I am trying to export a lot of data (2 TB, 30kkk rows) from Cassandra to BigQuery. All my infrastructure is on GCP. My Cassandra cluster have 4 nodes (4 vCPUs, 26 GB memory, 2000 GB PD (HDD) each). There is one seed node in the cluster. I need to transform my data before writing to BQ, so I am using Dataflow. Worker type is n1-highmem-2. Workers and Cassandra instances are at the same zone europe-west1-c. My limits for Cassandra:
Part of my pipeline code responsible for reading transform is located here.
Autoscaling
The problem is that when I don't set --numWorkers, the autoscaling set number of workers in such manner (2 workers average):
Load balancing
When I set --numWorkers=15 the rate of reading doesn't increase and only 2 workers communicate with Cassandra (I can tell it from iftop and only these workers have CPU load ~60%).
At the same time Cassandra nodes don't have a lot of load (CPU usage 20-30%). Network and disk usage of the seed node is about 2 times higher than others, but not too high, I think:
And for the not seed node here:
Pipeline launch warnings
I have some warnings when pipeline is launching:
WARNING: Size estimation of the source failed:
org.apache.beam.sdk.io.cassandra.CassandraIO$CassandraSource#7569ea63
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /10.132.9.101:9042 (com.datastax.driver.core.exceptions.TransportException: [/10.132.9.101:9042] Cannot connect), /10.132.9.102:9042 (com.datastax.driver.core.exceptions.TransportException: [/10.132.9.102:9042] Cannot connect), /10.132.9.103:9042 (com.datastax.driver.core.exceptions.TransportException: [/10.132.9.103:9042] Cannot connect), /10.132.9.104:9042 [only showing errors of first 3 hosts, use getErrors() for more details])
My Cassandra cluster is in GCE local network and it seams that some queries are made from my local machine and cannot reach the cluster (I am launching pipeline with Dataflow Eclipse plugin as described here). These queries are about size estimation of tables. Can I specify size estimation by hand or launch pipline from GCE instance? Or can I ignore these warnings? Does it have effect on rate of read?
I'v tried to launch pipeline from GCE VM. There is no more problem with connectivity. I don't have varchar columns in my tables but I get such warnings (no codec in datastax driver [varchar <-> java.lang.Long]). :
WARNING: Can't estimate the size
com.datastax.driver.core.exceptions.CodecNotFoundException: Codec not found for requested operation: [varchar <-> java.lang.Long]
at com.datastax.driver.core.CodecRegistry.notFound(CodecRegistry.java:741)
at com.datastax.driver.core.CodecRegistry.createCodec(CodecRegistry.java:588)
at com.datastax.driver.core.CodecRegistry.access$500(CodecRegistry.java:137)
at com.datastax.driver.core.CodecRegistry$TypeCodecCacheLoader.load(CodecRegistry.java:246)
at com.datastax.driver.core.CodecRegistry$TypeCodecCacheLoader.load(CodecRegistry.java:232)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3628)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2336)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2295)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2208)
at com.google.common.cache.LocalCache.get(LocalCache.java:4053)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4057)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4986)
at com.datastax.driver.core.CodecRegistry.lookupCodec(CodecRegistry.java:522)
at com.datastax.driver.core.CodecRegistry.codecFor(CodecRegistry.java:485)
at com.datastax.driver.core.CodecRegistry.codecFor(CodecRegistry.java:467)
at com.datastax.driver.core.AbstractGettableByIndexData.codecFor(AbstractGettableByIndexData.java:69)
at com.datastax.driver.core.AbstractGettableByIndexData.getLong(AbstractGettableByIndexData.java:152)
at com.datastax.driver.core.AbstractGettableData.getLong(AbstractGettableData.java:26)
at com.datastax.driver.core.AbstractGettableData.getLong(AbstractGettableData.java:95)
at org.apache.beam.sdk.io.cassandra.CassandraServiceImpl.getTokenRanges(CassandraServiceImpl.java:279)
at org.apache.beam.sdk.io.cassandra.CassandraServiceImpl.getEstimatedSizeBytes(CassandraServiceImpl.java:135)
at org.apache.beam.sdk.io.cassandra.CassandraIO$CassandraSource.getEstimatedSizeBytes(CassandraIO.java:308)
at org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$BoundedReadEvaluator.startDynamicSplitThread(BoundedReadEvaluatorFactory.java:166)
at org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$BoundedReadEvaluator.processElement(BoundedReadEvaluatorFactory.java:142)
at org.apache.beam.runners.direct.TransformExecutor.processElements(TransformExecutor.java:146)
at org.apache.beam.runners.direct.TransformExecutor.run(TransformExecutor.java:110)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Pipeline read code
// Read data from Cassandra table
PCollection<Model> pcollection = p.apply(CassandraIO.<Model>read()
.withHosts(Arrays.asList("10.10.10.101", "10.10.10.102", "10.10.10.103", "10.10.10.104")).withPort(9042)
.withKeyspace(keyspaceName).withTable(tableName)
.withEntity(Model.class).withCoder(SerializableCoder.of(Model.class))
.withConsistencyLevel(CASSA_CONSISTENCY_LEVEL));
// Transform pcollection to KV PCollection by rowName
PCollection<KV<Long, Model>> pcollection_by_rowName = pcollection
.apply(ParDo.of(new DoFn<Model, KV<Long, Model>>() {
#ProcessElement
public void processElement(ProcessContext c) {
c.output(KV.of(c.element().rowName, c.element()));
}
}));
Number of splits (Stackdriver log)
W Number of splits is less than 0 (0), fallback to 1
I Number of splits is 1
W Number of splits is less than 0 (0), fallback to 1
I Number of splits is 1
W Number of splits is less than 0 (0), fallback to 1
I Number of splits is 1
What I'v tried
No effect:
set read consistency level to ONE
nodetool setstreamthroughput 1000, nodetool setinterdcstreamthroughput 1000
increase Cassandra read concurrency (in cassandra.yaml): concurrent_reads: 32
setting different number of workers 1-40.
Some effect:
1. I'v set numSplits = 10 as #jkff proposed. Now I can see in logs:
I Murmur3Partitioner detected, splitting
W Can't estimate the size
W Can't estimate the size
W Number of splits is less than 0 (0), fallback to 10
I Number of splits is 10
W Number of splits is less than 0 (0), fallback to 10
I Number of splits is 10
I Splitting source org.apache.beam.sdk.io.cassandra.CassandraIO$CassandraSource#6d83ee93 produced 10 bundles with total serialized response size 20799
I Splitting source org.apache.beam.sdk.io.cassandra.CassandraIO$CassandraSource#25d02f5c produced 10 bundles with total serialized response size 19359
I Splitting source [0, 1) produced 1 bundles with total serialized response size 1091
I Murmur3Partitioner detected, splitting
W Can't estimate the size
I Splitting source [0, 0) produced 0 bundles with total serialized response size 76
W Number of splits is less than 0 (0), fallback to 10
I Number of splits is 10
I Splitting source org.apache.beam.sdk.io.cassandra.CassandraIO$CassandraSource#2661dcf3 produced 10 bundles with total serialized response size 18527
But I'v got another exception:
java.io.IOException: Failed to start reading from source: org.apache.beam.sdk.io.cassandra.Cassandra...
(5d6339652002918d): java.io.IOException: Failed to start reading from source: org.apache.beam.sdk.io.cassandra.CassandraIO$CassandraSource#5f18c296
at com.google.cloud.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:582)
at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:347)
at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:183)
at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:148)
at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:68)
at com.google.cloud.dataflow.worker.DataflowWorker.executeWork(DataflowWorker.java:336)
at com.google.cloud.dataflow.worker.DataflowWorker.doWork(DataflowWorker.java:294)
at com.google.cloud.dataflow.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:244)
at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:135)
at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:115)
at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:102)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.datastax.driver.core.exceptions.SyntaxError: line 1:53 mismatched character 'p' expecting '$'
at com.datastax.driver.core.exceptions.SyntaxError.copy(SyntaxError.java:58)
at com.datastax.driver.core.exceptions.SyntaxError.copy(SyntaxError.java:24)
at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:68)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:43)
at org.apache.beam.sdk.io.cassandra.CassandraServiceImpl$CassandraReaderImpl.start(CassandraServiceImpl.java:80)
at com.google.cloud.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:579)
... 14 more
Caused by: com.datastax.driver.core.exceptions.SyntaxError: line 1:53 mismatched character 'p' expecting '$'
at com.datastax.driver.core.Responses$Error.asException(Responses.java:144)
at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:179)
at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:186)
at com.datastax.driver.core.RequestHandler.access$2500(RequestHandler.java:50)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:817)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:651)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1077)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1000)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:129)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:642)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:565)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:479)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:441)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
... 1 more
Maybe there is a mistake: CassandraServiceImpl.java#L220
And this statement looks like mistype: CassandraServiceImpl.java#L207
Changes I'v done to CassandraIO code
As #jkff proposed, I've change CassandraIO in the way I needed:
#VisibleForTesting
protected List<BoundedSource<T>> split(CassandraIO.Read<T> spec,
long desiredBundleSizeBytes,
long estimatedSizeBytes) {
long numSplits = 1;
List<BoundedSource<T>> sourceList = new ArrayList<>();
if (desiredBundleSizeBytes > 0) {
numSplits = estimatedSizeBytes / desiredBundleSizeBytes;
}
if (numSplits <= 0) {
LOG.warn("Number of splits is less than 0 ({}), fallback to 10", numSplits);
numSplits = 10;
}
LOG.info("Number of splits is {}", numSplits);
Long startRange = MIN_TOKEN;
Long endRange = MAX_TOKEN;
Long startToken, endToken;
String pk = "$pk";
switch (spec.table()) {
case "table1":
pk = "table1_pk";
break;
case "table2":
case "table3":
pk = "table23_pk";
break;
}
endToken = startRange;
Long incrementValue = endRange / numSplits - startRange / numSplits;
String splitQuery;
if (numSplits == 1) {
// we have an unique split
splitQuery = QueryBuilder.select().from(spec.keyspace(), spec.table()).toString();
sourceList.add(new CassandraIO.CassandraSource<T>(spec, splitQuery));
} else {
// we have more than one split
for (int i = 0; i < numSplits; i++) {
startToken = endToken;
endToken = startToken + incrementValue;
Select.Where builder = QueryBuilder.select().from(spec.keyspace(), spec.table()).where();
if (i > 0) {
builder = builder.and(QueryBuilder.gte("token(" + pk + ")", startToken));
}
if (i < (numSplits - 1)) {
builder = builder.and(QueryBuilder.lt("token(" + pk + ")", endToken));
}
sourceList.add(new CassandraIO.CassandraSource(spec, builder.toString()));
}
}
return sourceList;
}
I think this should be classified as a bug in CassandraIO. I filed BEAM-3424. You can try building your own version of Beam with that default of 1 changed to 100 or something like that, while this issue is being fixed.
I also filed BEAM-3425 for the bug during size estimation.

Why zabbix do not show a value received from java code?

Consider a java code:
String host = "zabbixHost";
int port = 10051;
ZabbixSender zabbixSender = new ZabbixSender(host, port);
DataObject dataObject = new DataObject();
dataObject.setHost("testHost");
dataObject.setKey("test.ping.count");
dataObject.setValue("10");
// TimeUnit is SECONDS.
dataObject.setClock(System.currentTimeMillis()/1000);
SenderResult result = zabbixSender.send(dataObject);
System.out.println("result:" + result);
if (result.success()) {
System.out.println("send success.");
} else {
System.err.println("sned fail!");
}
The result is {"failed":0,"processed":1,"spentSeconds":0.001715,"total":1}
Then I send a request by zabbix_sender tool from command line:
zabbix_sender -z zabbixHost -p 10051 -s testHost -k test.ping.count -o 8 -v
The output is:
info from server: "processed: 1; failed: 0; total: 1; seconds spent: 0.002052"
sent: 1; skipped: 0; total: 1
For now 2 values were sent into Zabbix. But when I got to the monitoring graphic for test.ping.count and only 8 value is shown. E.g. value from java code was not received even when response was successful.
What is going on? How to fix such situation?
Note
The library is - io.github.hengyunabc:zabbix-sender:0.0.3
Zabbix version is 3.0
The problem was with timestamps, zabbix-sender with version 0.0.1 set request (not dataobject) clock in milliseconds while version 0.0.3 in seconds. So using right version fix issues.
maven sample (source):
<dependency>
<groupId>io.github.hengyunabc</groupId>
<artifactId>zabbix-sender</artifactId>
<version>0.0.3</version>
</dependency>

MySQL: Intermittant com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

I am getting this exception intermittently and am having a really hard time tracking down how to fix it.
The load on this server is pretty low, and the application is on the same host as the MySQL server connecting via localhost (MySQL 5.6). I have reasonable values for 'connect_timeout' (10), "interactive_timeout' (28800) and 'wait_timeout' (28800) and there is no firewall to cause issues.
Yet every so often this error pops up:
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
Last packet sent to the server was 1 ms ago.
at sun.reflect.GeneratedConstructorAccessor206.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:406)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1074)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2985)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2871)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3414)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1936)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2060)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2536)
at com.mysql.jdbc.ConnectionImpl.setAutoCommit(ConnectionImpl.java:4874)
at com.mchange.v2.c3p0.impl.NewProxyConnection.setAutoCommit(NewProxyConnection.java:881)
at org.hibernate.c3p0.internal.C3P0ConnectionProvider.getConnection(C3P0ConnectionProvider.java:94)
at org.hibernate.internal.AbstractSessionImpl$NonContextualJdbcConnectionAccess.obtainConnection(AbstractSessionImpl.java:380)
at org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.obtainConnection(LogicalConnectionImpl.java:228)
... 76 more
Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2431)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2882)
This is what I have for my Hibernate configuration:
.setProperty("hibernate.hbm2ddl.auto", "validate")
.setProperty("hibernate.dialect", "org.hibernate.dialect.MySQL5InnoDBDialect")
.setProperty("hibernate.connection.provider_class", "org.hibernate.connection.C3P0ConnectionProvider")
.setProperty("hibernate.c3p0.idle_test_period", "1000")
.setProperty("hibernate.c3p0.min_size", "1")
.setProperty("hibernate.c3p0.max_size", "20")
.setProperty("hibernate.c3p0.timeout", "1800")
.setProperty("hibernate.c3p0.max_statements", "50")
.setProperty("hibernate.c3p0.debugUnreturnedConnectionStackTraces", "true")
.setProperty("hibernate.c3p0.unreturnedConnectionTimeout", "20")
And this is my MySQL config:
[mysqld]
## General
ignore-db-dir = lost+found
datadir = /var/lib/mysql
socket = /var/lib/mysql/mysql.sock
tmpdir = /var/lib/mysqltmp
## Cache
table-definition-cache = 4096
table-open-cache = 4096
#table-open-cache-instances = 1
#thread-cache-size = 16
#query-cache-size = 32M
#query-cache-type = 1
## Per-thread Buffers
#join-buffer-size = 512K
#read-buffer-size = 512K
#read-rnd-buffer-size = 512K
#sort-buffer-size = 512K
## Temp Tables
#max-heap-table-size = 64M
#tmp-table-size = 32M
## Networking
#interactive-timeout = 3600
max-connections = 400
max-connect-errors = 1000000
max-allowed-packet = 16M
skip-name-resolve
wait-timeout = 600
## MyISAM
key-buffer-size = 32M
#myisam-recover = FORCE,BACKUP
myisam-sort-buffer-size = 128M
## InnoDB
#innodb-buffer-pool-size = 256M
innodb-file-format = Barracuda
#innodb-file-per-table = 1
#innodb-flush-method = O_DIRECT
#innodb-log-file-size = 128M
## Replication and PITR
#binlog-format = ROW
expire-logs-days = 7
#log-bin = /var/log/mysql/bin-log
#log-slave-updates = 1
#max-binlog-size = 128M
#read-only = 1
#relay-log = /var/log/mysql/relay-log
relay-log-space-limit = 16G
server-id = 1
## Logging
#log-output = FILE
#log-slow-admin-statements
#log-slow-slave-statements
#log-warnings = 0
#long-query-time = 2
#slow-query-log = 1
#slow-query-log-file = /var/log/mysql/slow-log
[mysqld_safe]
log-error = /var/log/mysqld.log
#malloc-lib = /usr/lib64/libjemalloc.so.1
open-files-limit = 65535
[mysql]
no-auto-rehash
If load is low you can try to use testConnectionOnCheckout to true
testConnectionOnCheckout Must be set in c3p0.properties, C3P0 default: false
If set to true, an operation will be performed at every connection checkout to verify that the connection is valid.
Actually it's expensive.
A better choice is to verify connections periodically using c3p0.idleConnectionTestPeriod. Try to reduce the value and check how it works

Categories

Resources