OutOfOrderScannerNextException when filtering results in HBase - java

I am trying to filter results in HBase this way:
List<Filter> andFilterList = new ArrayList<>();
SingleColumnValueFilter sourceLowerFilter = new SingleColumnValueFilter(Bytes.toBytes("cf"), Bytes.toBytes("source"), CompareFilter.CompareOp.GREATER, Bytes.toBytes(lowerLimit)); // source > lowerLimit
sourceLowerFilter.setFilterIfMissing(true);
SingleColumnValueFilter sourceUpperFilter = new SingleColumnValueFilter(Bytes.toBytes("cf"), Bytes.toBytes("source"), CompareFilter.CompareOp.LESS_OR_EQUAL, Bytes.toBytes(upperLimit)); // source <= upperLimit
sourceUpperFilter.setFilterIfMissing(true);
SingleColumnValueFilter targetLowerFilter = new SingleColumnValueFilter(Bytes.toBytes("cf"), Bytes.toBytes("target"), CompareFilter.CompareOp.GREATER, Bytes.toBytes(lowerLimit)); // target > lowerLimit
targetLowerFilter.setFilterIfMissing(true);
SingleColumnValueFilter targetUpperFilter = new SingleColumnValueFilter(Bytes.toBytes("cf"), Bytes.toBytes("target"), CompareFilter.CompareOp.LESS_OR_EQUAL, Bytes.toBytes(upperLimit)); // target <= upperLimit
targetUpperFilter.setFilterIfMissing(true);
// sourceUpperFilter AND targetUpperFilter
andFilterList.add(sourceUpperFilter);
andFilterList.add(targetUpperFilter);
FilterList andFilter = new FilterList(FilterList.Operator.MUST_PASS_ALL, andFilterList);
// sourceLowerFilter OR targetLowerFilter
List<Filter> orFilterList = new ArrayList<>();
orFilterList.add(sourceLowerFilter);
orFilterList.add(targetLowerFilter);
FilterList orFilter = new FilterList(FilterList.Operator.MUST_PASS_ONE, orFilterList);
// overall: (sourceUpper && targetUpper) && (sourceLower || targetLower)
FilterList fl = new FilterList(FilterList.Operator.MUST_PASS_ALL);
fl.addFilter(andFilter);
fl.addFilter(orFilter);
Scan edgeScan = new Scan();
edgeScan.setFilter(fl);
ResultScanner edgeScanner = table.getScanner(edgeScan);
Result edgeResult;
logger.info("Writing edges...");
while ((edgeResult = edgeScanner.next()) != null) {
    // Some code
}
This code throws this error:
org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:402)
at org.deustotech.internet.phd.framework.rdf2subdue.RDF2Subdue.writeFile(RDF2Subdue.java:150)
at org.deustotech.internet.phd.framework.rdf2subdue.RDF2Subdue.run(RDF2Subdue.java:39)
at org.deustotech.internet.phd.Main.main(Main.java:32)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:297)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 178 number_of_rows: 100 close_scanner: false next_call_seq: 0
at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3098)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
at java.lang.Thread.run(Thread.java:745)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:285)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:204)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:59)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:354)
... 9 more
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException): org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 178 number_of_rows: 100 close_scanner: false next_call_seq: 0
at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3098)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
at java.lang.Thread.run(Thread.java:745)
at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1453)
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1657)
at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1715)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:29900)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:174)
... 13 more
The RPC timeout is set to 600000. I have tried removing some filters, with these results:
sourceUpperFilter && (sourceLowerFilter || targetLowerFilter) --> Success
targetUpperFilter && (sourceLowerFilter || targetLowerFilter) --> Success
(sourceUpperFilter && targetUpperFilter) && (sourceLowerFilter) --> Fail
(sourceUpperFilter && targetUpperFilter) && (targetLowerFilter) --> Fail
Any help would be appreciated. Thank you.

I solved this problem by setting hbase.client.scanner.caching.
The client and the region server maintain a nextCallSeq number during the scan. Every next() call from client to server increments this number on both sides. The client passes this number along with the request, and on the RS side the incoming nextCallSeq is matched against its own. In case of a timeout, this increment does not happen on the client side. If the server side has already finished fetching the next batch of data, the nextCallSeq numbers will mismatch. The server throws OutOfOrderScannerNextException, and the client then reopens the scanner with the start row set to the last successfully retrieved row.
Since the problem is caused by a client-side timeout, you can either reduce the client scanner caching size (hbase.client.scanner.caching) or increase the RPC timeout (hbase.rpc.timeout).
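A minimal sketch of that fix (the values are illustrative; setCaching on the Scan object overrides the configuration value for that scan only):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;

Configuration conf = HBaseConfiguration.create();
// Fewer rows per scanner RPC means each next() round trip finishes sooner,
// making it far less likely to exceed the RPC timeout.
conf.setInt("hbase.client.scanner.caching", 50);

Scan edgeScan = new Scan();
edgeScan.setCaching(50); // per-scan override of the same setting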
Hope this answer helps.

Reason: the scan is looking for a few rows in a big region. It takes time to fill the number of rows requested by the client side, and by then the client hits an RPC timeout. So the client side retries the call on the same scanner. Remember that with this next call the client says "give me the next N rows from where you are". The old failed call was in progress and would have advanced some rows, so this retry call would miss those rows. To avoid this, and to distinguish this case, we have this scan sequence number and this exception. On seeing it, the client closes the scanner and creates a new one with the proper start row. But this retry happens only one more time, and that call might also time out. So we have to adjust the timeout and/or the scan caching value.
A heartbeat mechanism avoids such timeouts for long-running scans.
In our case, where the data in HBase is huge, we used an RPC timeout of 1800000 and a lease period of 1800000, used fuzzy row filters, and also scan.setCaching(xxxx) // value needs to be adjusted;
Note: value filters are slower than row filters, since a full table scan takes long to execute.
With all above precautions we are successful to query huge data from hbase with mapreduce.
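For illustration, a sketch of those client-side settings (property names vary by HBase version; hbase.regionserver.lease.period is the older lease key, and the values below simply mirror the ones quoted above):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;

Configuration conf = HBaseConfiguration.create();
conf.setLong("hbase.rpc.timeout", 1800000);               // 30-minute RPC timeout
conf.setLong("hbase.regionserver.lease.period", 1800000); // keep the scanner lease alive just as long

Scan scan = new Scan();
scan.setCaching(500); // illustrative; tune so one batch returns well within the timeout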
Hope this explanation helps.

Related

How to track committed offset with Spark job for kafka batch

I have a use case where I am writing to a Kafka topic in batches using a Spark job (no streaming). Initially I push, say, 10 records to the Kafka topic and run the Spark job, which does some processing and finally writes to another Kafka topic.
The next time, when I push another 5 records and run the Spark job, my requirement is to start processing only these 5 records, not from the starting offset. I need to maintain the committed offset so that the Spark job runs from the next offset position and does the processing.
Here is the code on the Kafka side to fetch the offsets:
private static List<TopicPartition> getPartitions(KafkaConsumer consumer, String topic) {
    List<PartitionInfo> partitionInfoList = consumer.partitionsFor(topic);
    return partitionInfoList.stream().map(x -> new TopicPartition(topic, x.partition())).collect(Collectors.toList());
}

public static void getOffSet(KafkaConsumer consumer) {
    List<TopicPartition> topicPartitions = getPartitions(consumer, topic);
    consumer.assign(topicPartitions);
    consumer.seekToBeginning(topicPartitions);
    topicPartitions.forEach(x -> {
        System.out.println("Partition-> " + x + " startingOffSet-> " + consumer.position(x));
    });
    consumer.assign(topicPartitions);
    consumer.seekToEnd(topicPartitions);
    topicPartitions.forEach(x -> {
        System.out.println("Partition-> " + x + " endingOffSet-> " + consumer.position(x));
    });
    topicPartitions.forEach(x -> {
        consumer.poll(1000);
        OffsetAndMetadata offsetAndMetadata = consumer.committed(x);
        long position = consumer.position(x);
        System.out.printf("Committed: %s, current position %s%n",
                offsetAndMetadata == null ? null : offsetAndMetadata.offset(), position);
    });
}
Below is the Spark code to load the messages from the topic, which is not working:
Dataset<Row> kafkaDataset = session.read().format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", topic)
        .option("group.id", "test-consumer-group")
        .option("startingOffsets", "{\"Topic1\":{\"0\":2}}")
        .option("endingOffsets", "{\"Topic1\":{\"0\":3}}")
        .option("enable.auto.commit", "true")
        .load();
After the above code executes, I am again trying to get the offsets by calling
getOffSet(consumer)
on the topic, which always reads from offset 0, while the committed offset fetched initially keeps increasing. I am new to Kafka and still figuring out how to handle such a scenario. Please help here.
Initially I had 10 records in my topic; I published another 2 records, and here is the output:
Output after the getOffSet method executes:
Partition-> Topic00-0 startingOffSet-> 0
Partition-> Topic00-0 endingOffSet-> 12
Committed: 12, current position 12
Output after the Spark code executes to load the messages:
Partition-> Topic00-0 startingOffSet-> 0
Partition-> Topic00-0 endingOffSet-> 12
Committed: 12, current position 12
I see no difference. Please take a look and suggest a resolution for this scenario.
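For what it's worth, a sketch of one way to persist progress between runs (not the asker's code; the class and parameter names are illustrative): Spark's batch Kafka source does not commit offsets to the consumer group, so after a successful batch you can commit the next start position yourself with a plain KafkaConsumer and feed it back into startingOffsets on the next run.
import java.util.Collections;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class BatchOffsetTracker {
    // Commit the "next offset to read" for one partition after the batch succeeds.
    public static void commitProcessed(KafkaConsumer<?, ?> consumer, String topic,
                                       int partition, long lastProcessedOffset) {
        TopicPartition tp = new TopicPartition(topic, partition);
        // Committed offsets point at the NEXT record to consume, hence +1.
        consumer.commitSync(Collections.singletonMap(tp,
                new OffsetAndMetadata(lastProcessedOffset + 1)));
    }
}
On the next run, consumer.committed(tp).offset() gives the value to splice into the startingOffsets JSON shown above.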

How to increase Dataflow read parallelism from Cassandra

I am trying to export a lot of data (2 TB, about 30 billion rows) from Cassandra to BigQuery. All my infrastructure is on GCP. My Cassandra cluster has 4 nodes (4 vCPUs, 26 GB memory, 2000 GB HDD persistent disk each). There is one seed node in the cluster. I need to transform my data before writing to BQ, so I am using Dataflow. The worker type is n1-highmem-2. Workers and Cassandra instances are in the same zone, europe-west1-c. My limits for Cassandra:
The part of my pipeline code responsible for the read transform is located here.
Autoscaling
The problem is that when I don't set --numWorkers, autoscaling sets the number of workers in this manner (2 workers on average):
Load balancing
When I set --numWorkers=15, the rate of reading doesn't increase and only 2 workers communicate with Cassandra (I can tell from iftop, and only these workers have ~60% CPU load).
At the same time, the Cassandra nodes don't have much load (CPU usage 20-30%). Network and disk usage of the seed node is about 2 times higher than the others, but not too high, I think:
And for a non-seed node:
Pipeline launch warnings
I get some warnings when the pipeline is launching:
WARNING: Size estimation of the source failed:
org.apache.beam.sdk.io.cassandra.CassandraIO$CassandraSource@7569ea63
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /10.132.9.101:9042 (com.datastax.driver.core.exceptions.TransportException: [/10.132.9.101:9042] Cannot connect), /10.132.9.102:9042 (com.datastax.driver.core.exceptions.TransportException: [/10.132.9.102:9042] Cannot connect), /10.132.9.103:9042 (com.datastax.driver.core.exceptions.TransportException: [/10.132.9.103:9042] Cannot connect), /10.132.9.104:9042 [only showing errors of first 3 hosts, use getErrors() for more details])
My Cassandra cluster is in a GCE local network and it seems that some queries are made from my local machine and cannot reach the cluster (I am launching the pipeline with the Dataflow Eclipse plugin as described here). These queries are about size estimation of the tables. Can I specify the size estimation by hand, or launch the pipeline from a GCE instance? Or can I ignore these warnings? Does it have an effect on the read rate?
I've tried launching the pipeline from a GCE VM. There is no more problem with connectivity. I don't have varchar columns in my tables, but I get warnings like this (no codec in the DataStax driver for [varchar <-> java.lang.Long]):
WARNING: Can't estimate the size
com.datastax.driver.core.exceptions.CodecNotFoundException: Codec not found for requested operation: [varchar <-> java.lang.Long]
at com.datastax.driver.core.CodecRegistry.notFound(CodecRegistry.java:741)
at com.datastax.driver.core.CodecRegistry.createCodec(CodecRegistry.java:588)
at com.datastax.driver.core.CodecRegistry.access$500(CodecRegistry.java:137)
at com.datastax.driver.core.CodecRegistry$TypeCodecCacheLoader.load(CodecRegistry.java:246)
at com.datastax.driver.core.CodecRegistry$TypeCodecCacheLoader.load(CodecRegistry.java:232)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3628)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2336)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2295)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2208)
at com.google.common.cache.LocalCache.get(LocalCache.java:4053)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4057)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4986)
at com.datastax.driver.core.CodecRegistry.lookupCodec(CodecRegistry.java:522)
at com.datastax.driver.core.CodecRegistry.codecFor(CodecRegistry.java:485)
at com.datastax.driver.core.CodecRegistry.codecFor(CodecRegistry.java:467)
at com.datastax.driver.core.AbstractGettableByIndexData.codecFor(AbstractGettableByIndexData.java:69)
at com.datastax.driver.core.AbstractGettableByIndexData.getLong(AbstractGettableByIndexData.java:152)
at com.datastax.driver.core.AbstractGettableData.getLong(AbstractGettableData.java:26)
at com.datastax.driver.core.AbstractGettableData.getLong(AbstractGettableData.java:95)
at org.apache.beam.sdk.io.cassandra.CassandraServiceImpl.getTokenRanges(CassandraServiceImpl.java:279)
at org.apache.beam.sdk.io.cassandra.CassandraServiceImpl.getEstimatedSizeBytes(CassandraServiceImpl.java:135)
at org.apache.beam.sdk.io.cassandra.CassandraIO$CassandraSource.getEstimatedSizeBytes(CassandraIO.java:308)
at org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$BoundedReadEvaluator.startDynamicSplitThread(BoundedReadEvaluatorFactory.java:166)
at org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$BoundedReadEvaluator.processElement(BoundedReadEvaluatorFactory.java:142)
at org.apache.beam.runners.direct.TransformExecutor.processElements(TransformExecutor.java:146)
at org.apache.beam.runners.direct.TransformExecutor.run(TransformExecutor.java:110)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Pipeline read code
// Read data from Cassandra table
PCollection<Model> pcollection = p.apply(CassandraIO.<Model>read()
        .withHosts(Arrays.asList("10.10.10.101", "10.10.10.102", "10.10.10.103", "10.10.10.104")).withPort(9042)
        .withKeyspace(keyspaceName).withTable(tableName)
        .withEntity(Model.class).withCoder(SerializableCoder.of(Model.class))
        .withConsistencyLevel(CASSA_CONSISTENCY_LEVEL));
// Transform pcollection to KV PCollection by rowName
PCollection<KV<Long, Model>> pcollection_by_rowName = pcollection
        .apply(ParDo.of(new DoFn<Model, KV<Long, Model>>() {
            @ProcessElement
            public void processElement(ProcessContext c) {
                c.output(KV.of(c.element().rowName, c.element()));
            }
        }));
Number of splits (Stackdriver log)
W Number of splits is less than 0 (0), fallback to 1
I Number of splits is 1
W Number of splits is less than 0 (0), fallback to 1
I Number of splits is 1
W Number of splits is less than 0 (0), fallback to 1
I Number of splits is 1
What I've tried
No effect:
set read consistency level to ONE
nodetool setstreamthroughput 1000, nodetool setinterdcstreamthroughput 1000
increase Cassandra read concurrency (in cassandra.yaml): concurrent_reads: 32
setting different numbers of workers, from 1 to 40
Some effect:
1. I've set numSplits = 10 as @jkff proposed. Now I can see in the logs:
I Murmur3Partitioner detected, splitting
W Can't estimate the size
W Can't estimate the size
W Number of splits is less than 0 (0), fallback to 10
I Number of splits is 10
W Number of splits is less than 0 (0), fallback to 10
I Number of splits is 10
I Splitting source org.apache.beam.sdk.io.cassandra.CassandraIO$CassandraSource@6d83ee93 produced 10 bundles with total serialized response size 20799
I Splitting source org.apache.beam.sdk.io.cassandra.CassandraIO$CassandraSource@25d02f5c produced 10 bundles with total serialized response size 19359
I Splitting source [0, 1) produced 1 bundles with total serialized response size 1091
I Murmur3Partitioner detected, splitting
W Can't estimate the size
I Splitting source [0, 0) produced 0 bundles with total serialized response size 76
W Number of splits is less than 0 (0), fallback to 10
I Number of splits is 10
I Splitting source org.apache.beam.sdk.io.cassandra.CassandraIO$CassandraSource@2661dcf3 produced 10 bundles with total serialized response size 18527
But I've got another exception:
java.io.IOException: Failed to start reading from source: org.apache.beam.sdk.io.cassandra.Cassandra...
(5d6339652002918d): java.io.IOException: Failed to start reading from source: org.apache.beam.sdk.io.cassandra.CassandraIO$CassandraSource@5f18c296
at com.google.cloud.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:582)
at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:347)
at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:183)
at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:148)
at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:68)
at com.google.cloud.dataflow.worker.DataflowWorker.executeWork(DataflowWorker.java:336)
at com.google.cloud.dataflow.worker.DataflowWorker.doWork(DataflowWorker.java:294)
at com.google.cloud.dataflow.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:244)
at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:135)
at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:115)
at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:102)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.datastax.driver.core.exceptions.SyntaxError: line 1:53 mismatched character 'p' expecting '$'
at com.datastax.driver.core.exceptions.SyntaxError.copy(SyntaxError.java:58)
at com.datastax.driver.core.exceptions.SyntaxError.copy(SyntaxError.java:24)
at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:68)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:43)
at org.apache.beam.sdk.io.cassandra.CassandraServiceImpl$CassandraReaderImpl.start(CassandraServiceImpl.java:80)
at com.google.cloud.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:579)
... 14 more
Caused by: com.datastax.driver.core.exceptions.SyntaxError: line 1:53 mismatched character 'p' expecting '$'
at com.datastax.driver.core.Responses$Error.asException(Responses.java:144)
at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:179)
at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:186)
at com.datastax.driver.core.RequestHandler.access$2500(RequestHandler.java:50)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:817)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:651)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1077)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1000)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:129)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:642)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:565)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:479)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:441)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
... 1 more
Maybe there is a mistake here: CassandraServiceImpl.java#L220
And this statement looks like a typo: CassandraServiceImpl.java#L207
Changes I've made to the CassandraIO code
As @jkff proposed, I've changed CassandraIO in the way I needed:
@VisibleForTesting
protected List<BoundedSource<T>> split(CassandraIO.Read<T> spec,
                                       long desiredBundleSizeBytes,
                                       long estimatedSizeBytes) {
    long numSplits = 1;
    List<BoundedSource<T>> sourceList = new ArrayList<>();
    if (desiredBundleSizeBytes > 0) {
        numSplits = estimatedSizeBytes / desiredBundleSizeBytes;
    }
    if (numSplits <= 0) {
        LOG.warn("Number of splits is less than 0 ({}), fallback to 10", numSplits);
        numSplits = 10;
    }
    LOG.info("Number of splits is {}", numSplits);
    Long startRange = MIN_TOKEN;
    Long endRange = MAX_TOKEN;
    Long startToken, endToken;
    String pk = "$pk";
    switch (spec.table()) {
        case "table1":
            pk = "table1_pk";
            break;
        case "table2":
        case "table3":
            pk = "table23_pk";
            break;
    }
    endToken = startRange;
    Long incrementValue = endRange / numSplits - startRange / numSplits;
    String splitQuery;
    if (numSplits == 1) {
        // we have a unique split
        splitQuery = QueryBuilder.select().from(spec.keyspace(), spec.table()).toString();
        sourceList.add(new CassandraIO.CassandraSource<T>(spec, splitQuery));
    } else {
        // we have more than one split
        for (int i = 0; i < numSplits; i++) {
            startToken = endToken;
            endToken = startToken + incrementValue;
            Select.Where builder = QueryBuilder.select().from(spec.keyspace(), spec.table()).where();
            if (i > 0) {
                builder = builder.and(QueryBuilder.gte("token(" + pk + ")", startToken));
            }
            if (i < (numSplits - 1)) {
                builder = builder.and(QueryBuilder.lt("token(" + pk + ")", endToken));
            }
            sourceList.add(new CassandraIO.CassandraSource(spec, builder.toString()));
        }
    }
    return sourceList;
}
I think this should be classified as a bug in CassandraIO. I filed BEAM-3424. You can try building your own version of Beam with that default of 1 changed to 100 or something like that, while this issue is being fixed.
I also filed BEAM-3425 for the bug during size estimation.

Selenium testing with lists for load performance

I'm attempting to use one of my unit tests for "load" testing of our browser. For various reasons, we have seen performance degradation on the browser side, because we rely heavily on the print dialog.
I have the following unit test working via ScalaTest:
class LoadPrePaidSpec extends FlatSpec with Matchers with Chrome with Eventually {
  implicit override val patienceConfig =
    PatienceConfig(timeout = scaled(Span(40, Seconds)), interval = scaled(Span(100, Millis)))

  def build(csvLine: String): TestCSVHolder = {
    val split = csvLine.split(",")
    TestCSVHolder(memberId = split(0), preSaleCode = split(1),
      prePaidCode = split(2), lastName = split(3), firstName = split(4), badgeName = split(5))
  }

  def memberHelper(member: TestCSVHolder): Unit = {
    //insert member id via prepaid code
    textField("member_id").value = member.prePaidCode
    //fire keyup event
    executeScript("var eventToFire=jQuery.Event(\"keyup\");eventToFire.keyCode=221;eventToFire.which=221;" +
      "$(\"#member_id\").trigger(eventToFire)")
    eventually {
      val eles = webDriver.findElements(By.xpath(s"//*[contains(@id, '${member.memberId}')]"))
      eles.get(0).getTagName
      //We remove the head element because it just says Prep For Print
      val tdEles = (eles.get(0).findElements(By.tagName("td")).toList.tail)
      tdEles(0).getText() should be(member.lastName)
      tdEles(1).getText() should be(member.firstName)
      tdEles(2).getText() should be(member.badgeName)
    }
  }

  "Scanning an ID" should "look up the member" in {
    val member = new TestCSVHolder("100001", "ABCD", "[-100001-ABCD]", "John", "Doe", "JohnDoe")
    go to (url)
    //login
    textField("user_name").value = "mrkaiser"
    webDriver.findElementById("credentials").sendKeys("somepassword")
    click on ("btnLogin")
    //click to pre-paid
    click on linkText("Pre-Paid")
    memberHelper(member)
    webDriver.quit()
  }
}
However, when I try to iterate through a list of elements using a foreach and passing in memberHelper, after a list of about 5 elements I get the following stack trace:
The code passed to eventually never returned normally. Attempted 369 times over 40.110734904 seconds. Last failure message: Index: 0, Size: 0.
ScalaTestFailureLocation: LoadPrePaidSpec at (LoadPrePaidSpec.scala:43)
org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 369 times over 40.110734904 seconds. Last failure message: Index: 0, Size: 0.
at org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:420)
at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438)
at LoadPrePaidSpec.eventually(LoadPrePaidSpec.scala:17)
at LoadPrePaidSpec.memberHelper(LoadPrePaidSpec.scala:43)
at LoadPrePaidSpec$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(LoadPrePaidSpec.scala:70)
at LoadPrePaidSpec$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(LoadPrePaidSpec.scala:70)
at scala.collection.immutable.List.foreach(List.scala:383)
at LoadPrePaidSpec$$anonfun$1.apply$mcV$sp(LoadPrePaidSpec.scala:70)
at LoadPrePaidSpec$$anonfun$1.apply(LoadPrePaidSpec.scala:54)
at LoadPrePaidSpec$$anonfun$1.apply(LoadPrePaidSpec.scala:54)
at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.FlatSpecLike$$anon$1.apply(FlatSpecLike.scala:1647)
at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
at org.scalatest.FlatSpec.withFixture(FlatSpec.scala:1683)
at org.scalatest.FlatSpecLike$class.invokeWithFixture$1(FlatSpecLike.scala:1644)
at org.scalatest.FlatSpecLike$$anonfun$runTest$1.apply(FlatSpecLike.scala:1656)
at org.scalatest.FlatSpecLike$$anonfun$runTest$1.apply(FlatSpecLike.scala:1656)
at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
at org.scalatest.FlatSpecLike$class.runTest(FlatSpecLike.scala:1656)
at org.scalatest.FlatSpec.runTest(FlatSpec.scala:1683)
at org.scalatest.FlatSpecLike$$anonfun$runTests$1.apply(FlatSpecLike.scala:1714)
at org.scalatest.FlatSpecLike$$anonfun$runTests$1.apply(FlatSpecLike.scala:1714)
at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
at scala.collection.immutable.List.foreach(List.scala:383)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:390)
at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:427)
at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
at scala.collection.immutable.List.foreach(List.scala:383)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
at org.scalatest.FlatSpecLike$class.runTests(FlatSpecLike.scala:1714)
at org.scalatest.FlatSpec.runTests(FlatSpec.scala:1683)
at org.scalatest.Suite$class.run(Suite.scala:1424)
at org.scalatest.FlatSpec.org$scalatest$FlatSpecLike$$super$run(FlatSpec.scala:1683)
at org.scalatest.FlatSpecLike$$anonfun$run$1.apply(FlatSpecLike.scala:1760)
at org.scalatest.FlatSpecLike$$anonfun$run$1.apply(FlatSpecLike.scala:1760)
at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
at org.scalatest.FlatSpecLike$class.run(FlatSpecLike.scala:1760)
at org.scalatest.FlatSpec.run(FlatSpec.scala:1683)
at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563)
at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557)
at scala.collection.immutable.List.foreach(List.scala:383)
at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557)
at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044)
at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043)
at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:2722)
at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1043)
at org.scalatest.tools.Runner$.run(Runner.scala:883)
at org.scalatest.tools.Runner.run(Runner.scala)
at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:138)
at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:28)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at LoadPrePaidSpec$$anonfun$memberHelper$1.apply$mcV$sp(LoadPrePaidSpec.scala:45)
at LoadPrePaidSpec$$anonfun$memberHelper$1.apply(LoadPrePaidSpec.scala:43)
at LoadPrePaidSpec$$anonfun$memberHelper$1.apply(LoadPrePaidSpec.scala:43)
at org.scalatest.concurrent.Eventually$class.makeAValiantAttempt$1(Eventually.scala:394)
at org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:408)
... 63 more
My end goal is to actually test something in the 20K range of elements from a file, but until I can get a small list like this working, I'm up a creek.
I'm using chromedriver and am on Scala 2.11.6, ScalaTest 2.2.0, Selenium 2.35.0.

Access FreePastry program that is behind NAT

I'm trying to connect to my program that uses FreePastry behind a NAT, but getting nowhere. mIP is my public IP; mBootport and mBindport are both 50001. I have forwarded these ports in my router to my computer, but it still does not work. I disabled the firewall: nothing. I disconnected the router and connected directly to the internet, and it still does not work. The only time it does work is on my local network. So something must be wrong in either the code or the config file, but I cannot see what.
Environment env = new Environment();
InetSocketAddress bootaddress = new InetSocketAddress(mIP, mBootport);
NodeIdFactory nidFactory = new RandomNodeIdFactory(env);
PastryNodeFactory factory = new SocketPastryNodeFactory(nidFactory, mBindport, env);
for (int curNode = 0; curNode < mNumNodes; curNode++) {
    PastryNode node = factory.newNode();
    NetworkHandler app = new NetworkHandler(node, mLog);
    apps.add(app);
    node.boot(bootaddress);
    synchronized (node) {
        while (!node.isReady() && !node.joinFailed()) {
            node.wait(500);
            if (node.joinFailed()) {
                throw new IOException("Could not join the FreePastry ring. Reason:" + node.joinFailedReason());
            }
        }
    }
    System.out.println("Finished creating new node: " + node);
    mLog.append("Finished creating new node: " + node + "\n");
}
Iterator<NetworkHandler> i = apps.iterator();
NetworkHandler app = (NetworkHandler) i.next();
app.subscribe();
public class NetworkHandler implements ScribeClient, Application {
    int seqNum = 0;
    CancellableTask publishTask;
    Scribe myScribe;
    Topic myTopic;
    JTextArea mLog;
    protected Endpoint endpoint;

    public NetworkHandler(Node node, JTextArea log) {
        this.endpoint = node.buildEndpoint(this, "myinstance");
        mLog = log;
        myScribe = new ScribeImpl(node, "myScribeInstance");
        myTopic = new Topic(new PastryIdFactory(node.getEnvironment()), "example topic");
        System.out.println("myTopic = " + myTopic);
        mLog.append("myTopic = " + myTopic + "\n");
        endpoint.register();
    }

    public void subscribe() {
        myScribe.subscribe(myTopic, this);
    }
}
freepastry.params
# this file holds the default values for pastry and its applications
# you do not need to modify the default.params file to override these values
# instead you can use your own params file to set values to override the
# defaults. You can specify this file by constructing your
# rice.environment.Environment() with the filename you wish to use
# typically, you will want to be able to pass this file name from the command
# line
# max number of handles stored per routing table entry
pastry_rtMax = 1
pastry_rtBaseBitLength = 4
# leafset size
pastry_lSetSize = 24
# maintenance frequencies
pastry_leafSetMaintFreq = 60
pastry_routeSetMaintFreq = 900
# drop the message if pastry is not ready
pastry_messageDispatch_bufferIfNotReady = false
# number of messages to buffer while an app hasn't yet been registered
pastry_messageDispatch_bufferSize = 32
# FP 2.1 uses the new transport layer
transport_wire_datagram_receive_buffer_size = 131072
transport_wire_datagram_send_buffer_size = 65536
transport_epoch_max_num_addresses = 2
transport_sr_max_num_hops = 5
# proximity neighbor selection
transport_use_pns = true
# number of rows in the routing table to consider during PNS
# valid values are ALL, or a number
pns_num_rows_to_use = 10
# commonapi testing parameters
# direct or socket
commonapi_testing_exit_on_failure = true
commonapi_testing_protocol = direct
commonapi_testing_startPort = 5009
commonapi_testing_num_nodes = 10
# set this to specify the bootstrap node
#commonapi_testing_bootstrap = localhost:5009
# random number generator's seed, "CLOCK" uses the current clock time
random_seed = CLOCK
# sphere, euclidean or gt-itm
direct_simulator_topology = sphere
# -1 starts the simulation with the current time
direct_simulator_start_time = -1
#pastry_direct_use_own_random = true
#pastry_periodic_leafset_protocol_use_own_random = true
pastry_direct_gtitm_matrix_file=GNPINPUT
# the number of stubs in your network
pastry_direct_gtitm_max_overlay_size=1000
# the number of virtual nodes at each stub: this allows you to simulate multiple "LANs" and allows cheaper scaling
pastry_direct_gtitm_nodes_per_stub=1
# the factor to multiply your file by to reach millis. Set this to 0.001 if your file is in microseconds. Set this to 1000 if your file is in seconds.
pastry_direct_gtitm_delay_factor=1.0
#millis of the maximum network delay for the generated network topologies
pastry_direct_max_diameter=200
pastry_direct_min_delay=2
#setting this to false will use the old protocols which are about 200 times as fast, but may cause routing inconsistency in a real network. Probably won't in a simulator because it will never be incorrect about liveness
pastry_direct_guarantee_consistency=true
# rice.pastry.socket parameters
# tells the factory you intend to use multiple nodes
# this causes the logger to prepend all entries with the nodeid
pastry_factory_multipleNodes = true
pastry_factory_selectorPerNode = false
pastry_factory_processorPerNode = false
# number of bootstrap nodehandles to fetch in parallel
pastry_factory_bootsInParallel = 1
# the maximum size of a message
pastry_socket_reader_selector_deserialization_max_size = 1000000
# the maximum number of outgoing messages to queue when a socket is slower than the number of messages you are queuing
pastry_socket_writer_max_queue_length = 30
pastry_socket_writer_max_msg_size = 20480
pastry_socket_repeater_buffer_size = 65536
pastry_socket_pingmanager_smallPings=true
pastry_socket_pingmanager_datagram_receive_buffer_size = 131072
pastry_socket_pingmanager_datagram_send_buffer_size = 65536
# the time before it will retry a route that was already found dead
pastry_socket_srm_check_dead_throttle = 300000
pastry_socket_srm_proximity_timeout = 3600000
pastry_socket_srm_ping_throttle = 30000
pastry_socket_srm_default_rto = 3000
pastry_socket_srm_rto_ubound = 10000
pastry_socket_srm_rto_lbound = 50
pastry_socket_srm_gain_h = 0.25
pastry_socket_srm_gain_g = 0.125
pastry_socket_scm_max_open_sockets = 300
pastry_socket_scm_max_open_source_routes = 30
# the maximum number of source routes to attempt, setting this to 0 will
# effectively eliminate source route attempts
# setting higher than the leafset does no good, it will be bounded by the leafset
# a larger number tries more source routes, which could give you a more accurate
# determination, however, is more likely to lead to congestion collapse
pastry_socket_srm_num_source_route_attempts = 8
pastry_socket_scm_socket_buffer_size = 32768
# this parameter is multiplied by the exponential backoff when doing a liveness check so the first will be 800, then 1600, then 3200 etc...
pastry_socket_scm_ping_delay = 800
# adds some fuzziness to the pings to help prevent congestion collapse, so this will make the ping be advanced or delayed by this factor
pastry_socket_scm_ping_jitter = 0.1
# how many pings until we call the node faulty
pastry_socket_scm_num_ping_tries = 5
pastry_socket_scm_write_wait_time = 30000
pastry_socket_scm_backoff_initial = 250
pastry_socket_scm_backoff_limit = 5
pastry_socket_pingmanager_testSourceRouting = false
pastry_socket_increment_port_after_construction = true
# if you want to allow connection to 127.0.0.1, set this to true
pastry_socket_allow_loopback = false
# these params will be used if the computer attempts to bind to the loopback address, they will open a socket to this address/port to identify which network adapter to bind to
pastry_socket_known_network_address = yahoo.com
pastry_socket_known_network_address_port = 80
pastry_socket_use_own_random = true
pastry_socket_random_seed = clock
# force the node to be a seed node
rice_socket_seed = false
# the parameter simulates some nodes being firewalled, base on rendezvous_test_num_firewalled
rendezvous_test_firewall = false
# probabilistic fraction of firewalled nodes
rendezvous_test_num_firewalled = 0.3
# don't firewall the first node, useful for testing
rendezvous_test_makes_bootstrap = false
# FP 2.1 uses the new transport layer
transport_wire_datagram_receive_buffer_size = 131072
transport_wire_datagram_send_buffer_size = 65536
# NAT/UPnP settings
nat_network_prefixes = 127.0.0.1;10.;192.168.
# Enable and set this if you have already set up port forwarding and know the external address
#external_address = 123.45.67.89:1234
#enable this if you set up port forwarding (on the same port), but you don't
#know the external address and you don't have UPnP enabled
#this is useful for a firewall w/o UPnP support, and your IP address isn't static
probe_for_external_address = true
# values how to probe
pastry_proxy_connectivity_timeout = 15000
pastry_proxy_connectivity_tries = 3
# possible values: always, never, prefix (prefix is if the localAddress matches any of the nat_network_prefixes
# whether to search for a nat using UPnP (default: prefix)
nat_search_policy = prefix
# whether to verify connectivity (default: boot)
firewall_test_policy = never
# policy for setting port forwarding the state of the firewall if there is already a conflicting rule: overwrite, fail (throw exception), change (use different port)
# you may want to set this to overwrite or fail on the bootstrap nodes, but most freepastry applications can run on any available port, so the default is change
nat_state_policy = change
# the name of the application in the firewall, set this if you want your application to have a more specific name
nat_app_name = freepastry
# how long to wait for responses from the firewall, in millis
nat_discovery_timeout = 5000
# how many searches to try to find a free firewall port
nat_find_port_max_tries = 10
# uncomment this to use UPnP NAT port forwarding, you need to include in the classpath: commons-jxpath-1.1.jar:commons-logging.jar:sbbi-upnplib-xxx.jar
nat_handler_class = rice.pastry.socket.nat.sbbi.SBBINatHandler
# hairpinning:
# default "prefix" requires more bandwidth if you are behind a NAT. It enables multiple IP
# addresses in the NodeHandle if you are behind a NAT. These are usually the internet routable address,
# and the LAN address (usually 192.168.x.x)
# you can set this to never if any of the following conditions hold:
# a) you are the only FreePastry node behind this address
# b) you firewall supports hairpinning see
# http://scm.sipfoundry.org/rep/ietf-drafts/behave/draft-ietf-behave-nat-udp-03.html#rfc.section.6
nat_nodehandle_multiaddress = prefix
# if we are not scheduled for time on cpu in this time, we setReady(false)
# otherwise there could be message inconsistency, because
# neighbors may believe us to be dead. Note that it is critical
# to consider the amount of time it takes the transport layer to find a
# node faulty before setting this parameter, this parameter should be
# less than the minimum time required to find a node faulty
pastry_protocol_consistentJoin_max_time_to_be_scheduled = 15000
# in case messages are dropped or something, how often it will retry to
# send the consistent join message, to get verification from the entire
# leafset
pastry_protocol_consistentJoin_retry_interval = 30000
# parameter to control how long dead nodes are retained in the "failed set" in
# CJP (see ConsistentJoinProtocol ctor) (15 minutes)
pastry_protocol_consistentJoin_failedRetentionTime = 900000
# how often to cleanup the failed set (5 mins) (see ConsistentJoinProtocol ctor)
pastry_protocol_consistentJoin_cleanup_interval = 300000
# the maximum number of entries to send in the failed set, only sends the most
# recent detected failures (see ConsistentJoinProtocol ctor)
pastry_protocol_consistentJoin_maxFailedToSend = 20
# how often we send/expect to be sent updates
pastry_protocol_periodicLeafSet_ping_neighbor_period = 20000
pastry_protocol_periodicLeafSet_lease_period = 30000
# what the grace period is to receive a periodic update, before checking
# liveness
pastry_protocol_periodicLeafSet_request_lease_throttle = 10000
# how many entries are kept in the partition handler's table
partition_handler_max_history_size=20
# how long entries in the partition handler's table are kept
# 90 minutes
partition_handler_max_history_age=5400000
# what fraction of the time a bootstrap host is checked
partition_handler_bootstrap_check_rate=0.05
# how often to run the partition handler
# 5 minutes
partition_handler_check_interval=300000
# the version number of the RouteMessage to transmit (it can receive anything that it knows how to)
# this is useful if you need to migrate an older ring
# you can change this value in realtime, so, you can start at 0 and issue a command to update it to 1
pastry_protocol_router_routeMsgVersion = 1
# should usually be equal to the pastry_rtBaseBitLength
p2p_splitStream_stripeBaseBitLength = 4
p2p_splitStream_policy_default_maximum_children = 24
p2p_splitStream_stripe_max_failed_subscription = 5
p2p_splitStream_stripe_max_failed_subscription_retry_delay = 1000
#multiring
p2p_multiring_base = 2
#past
p2p_past_messageTimeout = 30000
p2p_past_successfulInsertThreshold = 0.5
#replication
# fetch delay is the delay between fetching successive keys
p2p_replication_manager_fetch_delay = 500
# the timeout delay is how long we take before we time out fetching a key
p2p_replication_manager_timeout_delay = 20000
# this is the number of keys to delete when we detect a change in the replica set
p2p_replication_manager_num_delete_at_once = 100
# this is how often replication will wake up and do maintenance; 10 mins
p2p_replication_maintenance_interval = 600000
# the maximum number of keys replication will try to exchange in a maintenance message
p2p_replication_max_keys_in_message = 1000
#scribe
p2p_scribe_maintenance_interval = 180000
#time for a subscribe fail to be thrown (in millis)
p2p_scribe_message_timeout = 15000
#util
p2p_util_encryptedOutputStream_buffer = 32678
#aggregation
p2p_aggregation_logStatistics = true
p2p_aggregation_flushDelayAfterJoin = 30000
#5 MINS
p2p_aggregation_flushStressInterval = 300000
#5 MINS
p2p_aggregation_flushInterval = 300000
#1024*1024
p2p_aggregation_maxAggregateSize = 1048576
p2p_aggregation_maxObjectsInAggregate = 25
p2p_aggregation_maxAggregatesPerRun = 2
p2p_aggregation_addMissingAfterRefresh = true
p2p_aggregation_maxReaggregationPerRefresh = 100
p2p_aggregation_nominalReferenceCount = 2
p2p_aggregation_maxPointersPerAggregate = 100
#14 DAYS
p2p_aggregation_pointerArrayLifetime = 1209600000
#1 DAY
p2p_aggregation_aggregateGracePeriod = 86400000
#15 MINS
p2p_aggregation_aggrRefreshInterval = 900000
p2p_aggregation_aggrRefreshDelayAfterJoin = 70000
#3 DAYS
p2p_aggregation_expirationRenewThreshold = 259200000
p2p_aggregation_monitorEnabled = false
#15 MINS
p2p_aggregation_monitorRefreshInterval = 900000
#5 MINS
p2p_aggregation_consolidationDelayAfterJoin = 300000
#15 MINS
p2p_aggregation_consolidationInterval = 900000
#14 DAYS
p2p_aggregation_consolidationThreshold = 1209600000
p2p_aggregation_consolidationMinObjectsInAggregate = 20
p2p_aggregation_consolidationMinComponentsAlive = 0.8
p2p_aggregation_reconstructionMaxConcurrentLookups = 10
p2p_aggregation_aggregateLogEnabled = true
#1 HOUR
p2p_aggregation_statsGranularity = 3600000
#3 WEEKS
p2p_aggregation_statsRange = 1814400000
p2p_aggregation_statsInterval = 60000
p2p_aggregation_jitterRange = 0.1
# glacier
p2p_glacier_logStatistics = true
p2p_glacier_faultInjectionEnabled = false
p2p_glacier_insertTimeout = 30000
p2p_glacier_minFragmentsAfterInsert = 3.0
p2p_glacier_refreshTimeout = 30000
p2p_glacier_expireNeighborsDelayAfterJoin = 30000
#5 MINS
p2p_glacier_expireNeighborsInterval = 300000
#5 DAYS
p2p_glacier_neighborTimeout = 432000000
p2p_glacier_syncDelayAfterJoin = 30000
#5 MINS
p2p_glacier_syncMinRemainingLifetime = 300000
#insertTimeout
p2p_glacier_syncMinQuietTime = 30000
p2p_glacier_syncBloomFilterNumHashes = 3
p2p_glacier_syncBloomFilterBitsPerKey = 4
p2p_glacier_syncPartnersPerTrial = 1
#1 HOUR
p2p_glacier_syncInterval = 3600000
#3 MINUTES
p2p_glacier_syncRetryInterval = 180000
p2p_glacier_syncMaxFragments = 100
p2p_glacier_fragmentRequestMaxAttempts = 0
p2p_glacier_fragmentRequestTimeoutDefault = 10000
p2p_glacier_fragmentRequestTimeoutMin = 10000
p2p_glacier_fragmentRequestTimeoutMax = 60000
p2p_glacier_fragmentRequestTimeoutDecrement = 1000
p2p_glacier_manifestRequestTimeout = 10000
p2p_glacier_manifestRequestInitialBurst = 3
p2p_glacier_manifestRequestRetryBurst = 5
p2p_glacier_manifestAggregationFactor = 5
#3 MINUTES
p2p_glacier_overallRestoreTimeout = 180000
p2p_glacier_handoffDelayAfterJoin = 45000
#4 MINUTES
p2p_glacier_handoffInterval = 240000
p2p_glacier_handoffMaxFragments = 10
#10 MINUTES
p2p_glacier_garbageCollectionInterval = 600000
p2p_glacier_garbageCollectionMaxFragmentsPerRun = 100
#10 MINUTES
p2p_glacier_localScanInterval = 600000
p2p_glacier_localScanMaxFragmentsPerRun = 20
p2p_glacier_restoreMaxRequestFactor = 4.0
p2p_glacier_restoreMaxBoosts = 2
p2p_glacier_rateLimitedCheckInterval = 30000
p2p_glacier_rateLimitedRequestsPerSecond = 3
p2p_glacier_enableBulkRefresh = true
p2p_glacier_bulkRefreshProbeInterval = 3000
p2p_glacier_bulkRefreshMaxProbeFactor = 3.0
p2p_glacier_bulkRefreshManifestInterval = 30000
p2p_glacier_bulkRefreshManifestAggregationFactor = 20
p2p_glacier_bulkRefreshPatchAggregationFactor = 50
#3 MINUTES
p2p_glacier_bulkRefreshPatchInterval = 180000
p2p_glacier_bulkRefreshPatchRetries = 2
p2p_glacier_bucketTokensPerSecond = 100000
p2p_glacier_bucketMaxBurstSize = 200000
p2p_glacier_jitterRange = 0.1
#1 MINUTE
p2p_glacier_statisticsReportInterval = 60000
p2p_glacier_maxActiveRestores = 3
#transport layer testing params
org.mpisws.p2p.testing.transportlayer.replay.Recorder_printlog = true
# logging
#default log level
loglevel = WARNING
#example of enabling logging on the endpoint:
#rice.p2p.scribe#ScribeRegrTest-endpoint_loglevel = INFO
logging_packageOnly = true
logging_date_format = yyyyMMdd.HHmmss.SSS
logging_enable=true
# 24 hours
log_rotate_interval = 86400000
# the name of the active log file, and the filename prefix of rotated log
log_rotate_filename = freepastry.log
# the format of the date for the rotating log
log_rotating_date_format = yyyyMMdd.HHmmss.SSS
# true will tell the environment to use the FileLogManager
environment_logToFile = false
# the prefix for the log files (otherwise will be named after the nodeId)
fileLogManager_filePrefix =
# the suffix for the log files
fileLogManager_fileSuffix = .log
# whether to keep the line prefix (declaring the node id) for each line of the log
fileLogManager_keepLinePrefix = false
fileLogManager_multipleFiles = true
fileLogManager_defaultFileName = main
# false = append true = overwrite
fileLogManager_overwrite_existing_log_file = false
# the amount of time the LookupService tutorial app will wait before timing out
# in milliseconds, default is 30 seconds
lookup_service.timeout = 30000
# how long to wait before the first retry
lookup_service.firstTimeout = 500
Edit: Confirmed with Wireshark that the messages do reach the computer; FreePastry just doesn't accept the connection.
Not sure what you mean by "not work". To test the connectivity between your client and your server (sitting behind NAT), you just need to do something like "telnet mIP mBindport" on your client side, assuming you have a telnet utility (present by default on Linux and Mac; on Windows you can install one, like nc ("netcat")).
If the port forwarding is set up correctly, you should see something like the following when the TCP connection is established with your server:
Connected to localhost.
Escape character is '^]'.
Once the TCP session sets up correctly, you can stop the "telnet" program and use your real client (in Java) to talk to your server; it should work fine.
If the TCP session didn't set up, you may want to check on the server side. Use either Wireshark or tcpdump to capture packets with the filter "tcp port 50001", and run the telnet command above to check whether a TCP packet comes in.
If nothing shows up in Wireshark or tcpdump, then your firewall (i.e. the port forwarding) is not set up correctly.
If the TCP packet does show up in Wireshark or tcpdump, then your server program may be at fault. Check the IP address it binds to using this command (Linux):
netstat -antp | grep 50001
(On Windows, the command is slightly different.)
Typically it should bind to the IP address 0.0.0.0 (all IPs); if it doesn't, you should check whether the IP it binds to has connectivity/a route to the outside world (outside the NAT).
Good luck.
I would try setting your IP to the local address of the computer FreePastry is running on. It sounds like the computer is getting the information, but FreePastry is looking for it on a different address. If you set your mIP to be local, I think it would work. This applies when it is behind the router/NAT.
Port forwarding forwards packets from your public IP on port 50001 to your internal computer IP on whatever port you set, normally the same 50001. If you set your program to listen on the public IP, it doesn't have access to that address, so it will not accept any packets/messages. Set it to listen on the computer's IP, or 0.0.0.0/localhost, and it should accept packets/messages on that port.
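A sketch of that suggestion in code, assuming the Parameters API on rice.environment.Environment (the keys external_address and probe_for_external_address come from the freepastry.params posted above; the public IP:port value is illustrative, taken from the commented example in that file):
Environment env = new Environment();
// Advertise the router's public address to the ring while binding on the local interface.
env.getParameters().setString("external_address", "123.45.67.89:50001"); // illustrative public IP:forwarded port
env.getParameters().setString("probe_for_external_address", "false");    // address is known, so skip probing

NodeIdFactory nidFactory = new RandomNodeIdFactory(env);
// The factory binds on the local side (port 50001); the node handle carries the external address.
PastryNodeFactory factory = new SocketPastryNodeFactory(nidFactory, 50001, env);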

Operation timed out using CouchbaseClient

I am getting Timeout exceptions even though there is not much load on the Couchbase server.
net.spy.memcached.OperationTimeoutException: Timeout waiting for value
at net.spy.memcached.MemcachedClient.get(MemcachedClient.java:1003)
at net.spy.memcached.MemcachedClient.get(MemcachedClient.java:1018)
at com.eos.cache.CacheClient.get(CacheClient.java:280)
at com.eos.cache.GenericCacheAccessObject.get(GenericCacheAccessObject.java:55)
...
...
Caused by: net.spy.memcached.internal.CheckedOperationTimeoutException: Timed out waiting for operation - failing node: /192.168.4.12:11210
at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:157)
at net.spy.memcached.internal.GetFuture.get(GetFuture.java:62)
at net.spy.memcached.MemcachedClient.get(MemcachedClient.java:997)
...30 more
This is how I am creating the client.
List<URI> uris = new ArrayList<URI>();
String[] serverTokens = getServers().split(" ");
for (int index = 0; index < serverTokens.length; index++) {
    uris.add(new URI(serverTokens[index]));
}
CouchbaseConnectionFactoryBuilder ccfb = new CouchbaseConnectionFactoryBuilder();
ccfb.setProtocol(Protocol.BINARY);
ccfb.setOpTimeout(10000);          // wait up to 10 seconds for an operation to succeed
ccfb.setOpQueueMaxBlockTime(5000); // wait up to 5 seconds when trying to enqueue an operation
ccfb.setMaxReconnectDelay(1500);
CouchbaseConnectionFactory cf = ccfb.buildCouchbaseConnection(uris, bucket, "");
CouchbaseClient client = new CouchbaseClient(cf);
I am maintaining a pool of persistent clients in our web server, and we are not even hitting the max connection limit, which is set to only 15.
Please help me solve this.
