I have a bug in a production application, but I can't find its cause. I am trying to add logging to find the method which calls my method(). But because I use a thread pool, I can't just call Thread.currentThread().getStackTrace() and iterate through its StackTraceElements: it shows only the few frames down to the thread pool's worker loop.
If I use the following code, I get every method I need, but it is very expensive. A single call of method() costs 400+ KB of text in my test environment; in production it would be about 1 MB per second, I think.
private final ExecutorService completableFutureExecutor =
        new ThreadPoolExecutor(10, 2000, 60L, TimeUnit.SECONDS, new SynchronousQueue<>());

public void firstMethod() {
    secondMethod();
}

private CompletableFuture<Void> secondMethod() {
    // runAsync, not supplyAsync: method() returns void, so it is not a Supplier
    return CompletableFuture.runAsync(() -> method(), completableFutureExecutor);
}

void method() {
    Map<Thread, StackTraceElement[]> map = Thread.getAllStackTraces();
    for (Thread thread : map.keySet()) {
        printLog(thread);
    }
}
private void printLog(Thread thread) {
    StringBuilder builder = new StringBuilder();
    for (StackTraceElement s : thread.getStackTrace()) {
        builder.append("\n getClass = " + s.getClass());
        builder.append("\n getClassName = " + s.getClassName());
        builder.append("\n getFileName = " + s.getFileName());
        builder.append("\n getLineNumber = " + s.getLineNumber());
        builder.append("\n getMethodName = " + s.getMethodName());
        builder.append("\n ---------------------------- \n ");
    }
    ownLogger.info("SomeThread = {} ", builder);
}
How can I find the firstMethod() that calls secondMethod()?
As I haven't found any good solution, my own approach is to put loggers before the CompletableFuture call and after it (in the exception handler).
It looks like this:
Logger beforeAsync = LoggerFactory.getLogger("beforeAsync");
Logger afterAsync = LoggerFactory.getLogger("afterAsync");

private CompletableFuture<Void> secondMethod() {
    printLongerTrace(Thread.currentThread(), beforeAsync);
    return CompletableFuture.runAsync(() -> method(), completableFutureExecutor);
}

private void methodWithException() {
    try {
        // do something
    } catch (Exception e) {
        printLongerTrace(e, "methodWithException", afterAsync);
    }
}

public void printLongerTrace(Throwable t, String methodName, Logger ownlogger) {
    if (t.getCause() != null) {
        printLongerTrace(t.getCause(), methodName, ownlogger);
    }
    StringBuilder builder = new StringBuilder();
    builder.append("\n Thread = " + Thread.currentThread().getName());
    builder.append("\n ERROR CAUSE = " + t.getCause());
    builder.append("\n ERROR MESSAGE = " + t.getMessage());
    printLog(t.getStackTrace(), builder);
    ownlogger.info(methodName + " Trace ----- {}", builder);
}
public void printLongerTrace(Thread t, Logger ownlogger) {
    StringBuilder builder = new StringBuilder();
    builder.append("\n Thread = " + Thread.currentThread().getName());
    printLog(t.getStackTrace(), builder);
    ownlogger.info("Trace ----- {}", builder);
}

private StringBuilder printLog(StackTraceElement[] elements, StringBuilder builder) {
    int size = Math.min(elements.length, 15); // log at most 15 frames
    for (int i = 0; i < size; i++) {
        builder.append("Line " + i + " = " + elements[i] + " with method = " + elements[i].getMethodName() + "\n");
    }
    return builder;
}
printLongerTrace(Throwable t, String methodName, Logger ownlogger) prints the exception, with every cause, recursively.
printLongerTrace(Thread t, Logger ownlogger) prints which method was called before the CompletableFuture.
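A cheaper variant (a sketch, reusing the same executor) is to capture the submitting thread's stack once per call and attach it only to failures, so the expensive formatting happens only when something actually goes wrong:

private CompletableFuture<Void> secondMethod() {
    // Constructing a Throwable records the current stack; nothing is formatted or logged yet.
    Throwable callSite = new Throwable("call site of secondMethod()");
    return CompletableFuture.runAsync(() -> {
        try {
            method();
        } catch (RuntimeException e) {
            e.addSuppressed(callSite); // the caller's stack now travels with the failure
            throw e;
        }
    }, completableFutureExecutor);
}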
You could just dump the stack by calling Thread.dumpStack(), but this is only for debugging and has a big overhead, since dumping the stack is CPU-intensive.
Related
I am creating a Spring Boot application which connects to multiple REST services and writes the responses to an output stream.
I am also using multiple threads to call the REST services.
public ResponseEntity<StreamingResponseBody> startBombing(Request request) {
int numberOfThreads = request.getConfig().getNumberOfThreads() ==0?5:request.getConfig().getNumberOfThreads();
long requestPerThread = request.getConfig().getRequestPerThread() ==0 ? 100: request.getConfig().getRequestPerThread();
StreamingResponseBody responseBody = response -> {
for (int i = 1; i <= numberOfThreads; i++) {
int finalI = i;
Runnable r1 = () -> {
try {
for (int j = 1; j <= requestPerThread; j++) {
HttpRequest req = createRequest(request.getHttpRequest());
Object res = doRequest(req);
System.out.println("Thread number: " + finalI + ": " + "call number: " + j + "TimeStamp: " + System.currentTimeMillis() + ":::: RESPONSE: " + res);
response.write(("Thread number: " + finalI + ": " + "call number: " + j + "TimeStamp: " + System.currentTimeMillis() + ":::: RESPONSE: " + res).getBytes());
}
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
};
Thread t1 = new Thread(r1);
t1.start();
}
};
return ResponseEntity.ok()
.contentType(MediaType.TEXT_PLAIN)
.body(responseBody);
}
No data is printed on the output stream.
Any clue how to reuse the same output stream in multiple threads?
Wait for all threads to finish before exiting the lambda (as returning from it will close the output stream for you):
StreamingResponseBody responseBody = response -> {
    CountDownLatch latch = new CountDownLatch(numberOfThreads);
    for (int i = 1; i <= numberOfThreads; i++) {
        int finalI = i;
        Runnable r1 = () -> {
            try {
                // irrelevant code
            } finally {
                latch.countDown(); // decrease latch counter
            }
        };
        Thread t1 = new Thread(r1);
        t1.start();
    }
    try {
        latch.await(); // wait for the latch to count down to 0; add error handling and a return value as needed
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
};
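An alternative sketch (my variant, not part of the original answer): let an ExecutorService do the waiting via invokeAll(), which blocks until every task has completed. It needs java.util.List, java.util.ArrayList, and java.util.concurrent.*:

StreamingResponseBody responseBody = response -> {
    List<Callable<Void>> tasks = new ArrayList<>();
    for (int i = 1; i <= numberOfThreads; i++) {
        tasks.add(() -> {
            // call the REST services and write to response, as above
            return null;
        });
    }
    ExecutorService pool = Executors.newFixedThreadPool(numberOfThreads);
    try {
        pool.invokeAll(tasks); // returns only when all tasks have finished
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    } finally {
        pool.shutdown();
    }
};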
I am confused about the ReentrantLock tryLock(timeout, timeUnit) method: when running the code below, tryLock does not seem to time out until the previous thread ends. Could anyone explain this?
public class MyService2 {
public ReentrantLock lock = new ReentrantLock();
public void waitMethod() {
try {
System.out.println(System.currentTimeMillis() + " " + Thread.currentThread().getName() + " enter ");
boolean b = lock.tryLock(2, TimeUnit.SECONDS);
if (b) {
System.out.println(System.currentTimeMillis() + " lock begin:" + Thread.currentThread().getName());
for (int i = 0; i < Integer.MAX_VALUE / 10; i++) {
Math.random();
}
System.out.println(System.currentTimeMillis() + " lock end " + Thread.currentThread().getName());
return;
}
System.out.println(System.currentTimeMillis() + " " + Thread.currentThread().getName() + " got no lock end ");
} catch (InterruptedException e) {
e.printStackTrace();
} finally {
if (lock.isHeldByCurrentThread()) {
lock.unlock();
}
}
}
public static void main(String[] args) throws InterruptedException {
MyService2 myService2 = new MyService2();
Runnable runnable = myService2::waitMethod;
Thread thread1 = new Thread(runnable);
thread1.setName("T1");
thread1.start();
TimeUnit.MILLISECONDS.sleep(10);
Thread thread2 = new Thread(runnable);
thread2.setName("T2");
thread2.start();
}
After running this code, the result looks like this:
1555343172612 T1 enter
1555343172613 lock begin:T1
1555343172627 T2 enter
1555343179665 lock end T1
1555343179665 T2 got no lock end
My question is: why doesn't thread T2 time out after 2s, instead of waiting until thread T1 ends?
BUT I just found:
If I replace Math.random() with TimeUnit.SECONDS.sleep(1), for example, it works fine.
If I run it in debug mode, it works fine too.
Here is an alternative which has a number of modifications:
First, cleanups: clearer names, less intrusive logging, relative time values.
Second, the 0.1s sleep between the launches of the two compute threads is moved into each of the threads. That more clearly gives precedence to the thread which launches the compute threads.
Third, the launch thread joins with the compute threads. That ties the conclusion of the computation to the launch thread. In the original code, there is no management of the compute threads after they have been launched. If the compute threads are intended to be unmanaged, that needs to be documented.
Fourth, the entire launch-thread-plus-two-compute-threads structure is replicated. That gives the structure a more realistic runtime environment, and presents the different behaviors of the structure together in a single view.
A theme of these modifications is to provide clarity, both about the intended behavior of the program and about the actual behavior (as viewed through the logging output).
An additional modification is recommended: put the log statements into a cache, then display the collected log lines after all of the computation cells have completed. That removes behavior changes caused by the log statements, which are often considerable.
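A minimal sketch of such a log cache (illustrative; not wired into the test below):

import java.util.concurrent.ConcurrentLinkedQueue;

public class LogCache {
    private static final ConcurrentLinkedQueue<String> LINES = new ConcurrentLinkedQueue<>();

    public static void log(String line) {
        LINES.add(line); // non-blocking; no console I/O while the computations run
    }

    public static void flush() {
        // print the collected lines once all computation cells have completed
        for (String line : LINES) {
            System.out.println(line);
        }
    }
}

The revised test: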
package my.tests;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
public class LockTest {
private static long initialTime;
protected static void setInitialTime() {
initialTime = System.currentTimeMillis();
}
public static long getInitialTime() {
return initialTime;
}
public static final int CELL_COUNT = 10;
public static void main(String[] args) {
setInitialTime();
System.out.println("Beginning [ " + Integer.toString(CELL_COUNT) + " ] computation cells");
Thread[] cellThreads = new Thread[CELL_COUNT];
for ( int cellNo = 0; cellNo < CELL_COUNT; cellNo++ ) {
final String cellNoText = Integer.toString(cellNo);
Runnable computeCell = () -> {
(new LockTest(cellNoText) ).compute();
};
Thread cellThread = new Thread(computeCell);
cellThreads[cellNo] = cellThread;
}
// Start them all up ...
for ( Thread cellThread : cellThreads ) {
cellThread.start();
}
// Then wait for them all to finish ...
for ( Thread cellThread : cellThreads ) {
try {
cellThread.join();
} catch ( InterruptedException e ) {
System.out.println("Unexpected interruption: " + e.getMessage());
e.printStackTrace();
}
}
System.out.println("Completed [ " + Integer.toString(CELL_COUNT) + " ] computation cells");
}
//
public LockTest(String cellName) {
this.cellName = cellName;
}
private final String cellName;
public String getCellName() {
return cellName;
}
// Logging ...
public String formatTime(long timeMs) {
return String.format("%12d (ms)", new Long(timeMs));
}
public long getRelativeTime(long currentTime) {
return currentTime - getInitialTime();
}
public String formatRelativeTime(long timeMs) {
return String.format(
"%12d %8d (ms)",
timeMs,
timeMs - getInitialTime());
}
public void log(String methodName, String message) {
long timeMs = System.currentTimeMillis();
String threadName = Thread.currentThread().getName();
System.out.println(
formatRelativeTime(timeMs) + ": " +
methodName + ": " +
threadName + ": " + message);
}
//
public void compute() {
log("compute", "ENTER: " + getCellName());
Runnable computation = () -> {
guardedComputation(
100L, 0, // Pause 0.1s before attempting the computation
1, TimeUnit.SECONDS, // Try to obtain the computation lock for up to 1.0s.
Integer.MAX_VALUE / 60 ); // Run this many computations; takes about 2s; adjust as needed
};
Thread computer1 = new Thread(computation);
computer1.setName( getCellName() + "." + "T1");
Thread computer2 = new Thread(computation);
computer2.setName( getCellName() + "." + "T2");
// Run two sets of computations:
//
// Each will pause for 0.1s before performing the computations.
//
// Performing computations requires a computation lock; wait up to 1.0s
// to acquire the lock.
computer1.start();
computer2.start();
try {
computer1.join();
} catch ( InterruptedException e ) {
System.out.println("Unexpected interruption: " + e.getMessage());
e.printStackTrace();
return;
}
try {
computer2.join();
} catch ( InterruptedException e ) {
System.out.println("Unexpected interruption: " + e.getMessage());
e.printStackTrace();
return;
}
log("compute", "RETURN: " + getCellName());
}
// Computation locking ...
private final ReentrantLock computationLock = new ReentrantLock();
public boolean acquireComputationLock(long maxWait, TimeUnit maxWaitUnit) throws InterruptedException {
return computationLock.tryLock(maxWait, maxWaitUnit);
}
public void releaseComputationLock() {
if ( computationLock.isHeldByCurrentThread() ) {
computationLock.unlock();
}
}
//
public void guardedComputation(
long pauseMs, int pauseNs,
long maxWait, TimeUnit maxWaitUnit, int computations) {
String methodName = "guardedComputation";
log(methodName, "ENTER");
try {
Thread.sleep(pauseMs, pauseNs);
} catch ( InterruptedException e ) {
System.out.println("Unexpected interruption: " + e.getMessage());
e.printStackTrace();
return;
}
try {
boolean didLock;
try {
didLock = acquireComputationLock(maxWait, maxWaitUnit);
} catch ( InterruptedException e ) {
System.out.println("Unexpected interruption: " + e.getMessage());
e.printStackTrace();
return;
}
String computationsText = Integer.toString(computations);
if ( didLock ) {
log(methodName, "Starting computations: " + computationsText);
for ( int computationNo = 0; computationNo < computations; computationNo++ ) {
Math.random();
}
log(methodName, "Completed computations: " + computationsText);
} else {
log(methodName, "Skipping computations: " + computationsText);
}
} finally {
releaseComputationLock();
}
log(methodName, "RETURN");
}
}
Recently, I was reviewing the Kafka code and tests, and I found a strange case:
I print the ByteBuffer at the entry of SocketServer.processCompletedReceives, and also print the value at the point of the log store, as follows:
The entry of SocketServer:
private def processCompletedReceives() {
selector.completedReceives.asScala.foreach { receive =>
try {
openOrClosingChannel(receive.source) match {
case Some(channel) =>
val header = RequestHeader.parse(receive.payload)
val connectionId = receive.source
val context = new RequestContext(header, connectionId, channel.socketAddress,
channel.principal, listenerName, securityProtocol)
val req = new RequestChannel.Request(processor = id, context = context,
startTimeNanos = time.nanoseconds, memoryPool, receive.payload, requestChannel.metrics)
if(header.apiKey() == ApiKeys.PRODUCE){
LogHelper.log("produce request: %v" + java.util.Arrays.toString(receive.payload.array()))
}
...
The point of the log store:
validRecords.records().asScala.foreach { record =>
LogHelper.log("buffer info: value " + java.util.Arrays.toString(record.value().array()))
}
But the printed results differ, and record.value() is not the value I passed in from the client, which looks like this:
public void run() {
int messageNo = 1;
while (true) {
String messageStr = "Message_" + messageNo;
long startTime = System.currentTimeMillis();
if (isAsync) { // Send asynchronously
producer.send(new ProducerRecord<>(topic,
messageNo,
messageStr), new DemoCallBack(startTime, messageNo, messageStr));
} else { // Send synchronously
try {
producer.send(new ProducerRecord<>(topic,
messageNo,
messageStr)).get();
System.out.println("Sent message: (" + messageNo + ", " + messageStr + ")");
} catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
}
}
++messageNo;
}
}
The printed result is not the String messageStr = "Message_" + messageNo that I sent.
So what happened in this case?
Done. The cause: record.value() returns a ByteBuffer that is only a slice of the batch's larger backing buffer, so calling array() on it returns the entire backing array instead of just this record's bytes. Copying the bytes out with get() yields the expected value. I wrote the code as follows:
import java.util.AbstractMap;
import java.util.Map;

import org.apache.kafka.common.record.Record;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class KVExtractor {
private static final Logger logger = LoggerFactory.getLogger(KVExtractor.class);
public static Map.Entry<byte[], byte[]> extract(Record record) {
if (record.hasKey() && record.hasValue()) {
byte[] key = new byte[record.key().limit()];
record.key().get(key);
byte[] value = new byte[record.value().limit()];
record.value().get(value);
System.out.println("key : " + new String(key) + " value: " + new String(value));
return new AbstractMap.SimpleEntry<byte[], byte[]>(key, value);
}else if(record.hasValue()){
// illegal impl
byte[] data = new byte[record.value().limit()];
record.value().get(data);
System.out.println("no key but with value : " + new String(data));
}
return null;
}
}
I want a dispatcher thread that executes and retrieves results from a pool of worker threads. The dispatcher needs to continuously feed work to the worker threads. When ANY of the worker thread completes, the dispatcher needs to gather its results and re-dispatch (or create a new) worker thread. It seems to me like this should be obvious but I have been unable to find an example of a suitable pattern. A Thread.join() loop would be inadequate because that is really "AND" logic and I am looking for "OR" logic.
The best I could come up with is to have the dispatcher thread wait() and have the worker threads notify() when they are done. However, it seems I would have to guard against two worker threads ending at the same time and causing the dispatcher thread to miss a notify(). Plus, this seems a little inelegant to me.
Even less elegant is the idea of the dispatcher thread periodically waking up and polling the worker thread pool and checking each thread to see if it has completed via isAlive().
I took a look at java.util.concurrent and didn't see anything that looked like it fit this pattern.
I feel that to implement what I mention above would involve a lot of defensive programming and reinventing the wheel. There's got to be something that I am missing. What can I leverage to implement this pattern?
This is the single-threaded version. putMissingToS3() would become the dispatcher thread, and the capability represented by uploadFileToBucket() would become the worker thread.
private void putMissingToS3()
{
int reqFilesToUpload = 0;
long reqSizeToUpload = 0L;
int totFilesUploaded = 0;
long totSizeUploaded = 0L;
int totFilesSkipped = 0;
long totSizeSkipped = 0L;
int rptLastFilesUploaded = 0;
long rptSizeInterval = 1000000000L;
long rptLastSize = 0L;
StopWatch rptTimer = new StopWatch();
long rptLastMs = 0L;
StopWatch globalTimer = new StopWatch();
StopWatch indvTimer = new StopWatch();
for (FileSystemRecord fsRec : fileSystemState.toList())
{
String reqKey = PathConverter.pathToKey(PathConverter.makeRelativePath(fileSystemState.getRootPath(), fsRec.getFullpath()));
LocalS3MetadataRecord s3Rec = s3Metadata.getRecord(reqKey);
// Just get a rough estimate of what the size of this upload will be
if (s3Rec == null)
{
++reqFilesToUpload;
reqSizeToUpload += fsRec.getSize();
}
}
long uploadTimeGuessMs = (long)((double)reqSizeToUpload/estUploadRateBPS*1000.0);
printAndLog("Estimated upload: " + natFmt.format(reqFilesToUpload) + " files, " + Utils.readableFileSize(reqSizeToUpload) +
", Estimated time " + Utils.readableElapsedTime(uploadTimeGuessMs));
globalTimer.start();
rptTimer.start();
for (FileSystemRecord fsRec : fileSystemState.toList())
{
String reqKey = PathConverter.pathToKey(PathConverter.makeRelativePath(fileSystemState.getRootPath(), fsRec.getFullpath()));
if (PathConverter.validate(reqKey))
{
LocalS3MetadataRecord s3Rec = s3Metadata.getRecord(reqKey);
//TODO compare and deal with size mismatches. Maybe go and look at last-mod dates.
if (s3Rec == null)
{
indvTimer.start();
uploadFileToBucket(s3, syncParms.getS3Bucket(), fsRec.getFullpath(), reqKey);
indvTimer.stop();
++totFilesUploaded;
totSizeUploaded += fsRec.getSize();
logOnly("Uploaded: Size=" + fsRec.getSize() + ", " + indvTimer.stopDeltaMs() + " ms, File=" + fsRec.getFullpath() + ", toKey=" + reqKey);
if (totSizeUploaded > rptLastSize + rptSizeInterval)
{
long invSizeUploaded = totSizeUploaded - rptLastSize;
long nowMs = rptTimer.intervalMs();
long invElapMs = nowMs - rptLastMs;
long remSize = reqSizeToUpload - totSizeUploaded;
double progressPct = (double)totSizeUploaded/reqSizeToUpload*100.0;
double mbps = (invElapMs > 0) ? invSizeUploaded/1e6/(invElapMs/1000.0) : 0.0;
long remMs = (long)((double)remSize/((double)invSizeUploaded/invElapMs));
printOnly("Progress: " + d2Fmt.format(progessPct) + "%, " + Utils.readableFileSize(totSizeUploaded) + " of " +
Utils.readableFileSize(reqSizeToUpload) + ", Rate " + d3Fmt.format(mbps) + " MB/s, " +
"Time rem " + Utils.readableElapsedTime(remMs));
rptLastMs = nowMs;
rptLastFilesUploaded = totFilesUploaded;
rptLastSize = totSizeUploaded;
}
}
}
else
{
++totFilesSkipped;
totSizeSkipped += fsRec.getSize();
logOnly("Skipped (Invalid chars): Size=" + fsRec.getSize() + ", " + fsRec.getFullpath() + ", toKey=" + reqKey);
}
}
globalTimer.stop();
double mbps = 0.0;
if (globalTimer.stopDeltaMs() > 0)
mbps = totSizeUploaded/1e6/(globalTimer.stopDeltaMs()/1000.0);
printAndLog("Actual upload: " + natFmt.format(totFilesUploaded) + " files, " + Utils.readableFileSize(totSizeUploaded) +
", Time " + Utils.readableElapsedTime(globalTimer.stopDeltaMs()) + ", Rate " + d3Fmt.format(mbps) + " MB/s");
if (totFilesSkipped > 0)
printAndLog("Skipped Files: " + natFmt.format(totFilesSkipped) + " files, " + Utils.readableFileSize(totSizeSkipped));
}
private void uploadFileToBucket(AmazonS3 amazonS3, String bucketName, String filePath, String fileKey)
{
File inFile = new File(filePath);
ObjectMetadata objectMetadata = new ObjectMetadata();
objectMetadata.addUserMetadata(Const.LAST_MOD_KEY, Long.toString(inFile.lastModified()));
objectMetadata.setLastModified(new Date(inFile.lastModified()));
PutObjectRequest por = new PutObjectRequest(bucketName, fileKey, inFile).withMetadata(objectMetadata);
// Amazon S3 never stores partial objects; if during this call an exception wasn't thrown, the entire object was stored.
amazonS3.putObject(por);
}
I think you are looking in the right package: you should use the ExecutorService API.
It removes the burden of waiting for and watching threads' notifications.
Example:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ExecutorEx {
    static class ThreadA implements Runnable {
        int id;

        public ThreadA(int id) {
            this.id = id;
        }

        public void run() {
            // simulate some work
            try {
                Thread.sleep(Math.round(Math.random() * 100));
            } catch (Exception e) {
            }
            // show a message
            System.out.println(this.id + "--Test Message" + System.currentTimeMillis());
        }
    }

    public static void main(String[] args) throws Exception {
        int poolSize = 10;
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        int i = 0;
        while (i < 100) {
            pool.submit(new ThreadA(i));
            i++;
        }
        pool.shutdown();
        while (!pool.isTerminated()) {
            pool.awaitTermination(60, TimeUnit.SECONDS);
        }
    }
}
And if you want to return something from your thread, you will need to implement Callable instead of Runnable (call() instead of run()) and collect the returned values in Future objects, which you can iterate over later.
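A minimal sketch of the Callable/Future variant (illustrative task that just returns its id):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CallableEx {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(10);
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            final int id = i;
            futures.add(pool.submit(() -> id)); // call() returns a value; submit() hands back a Future
        }
        pool.shutdown();
        for (Future<Integer> f : futures) {
            System.out.println(f.get()); // get() blocks until that task has completed
        }
    }
}

If you need results in completion order (the "OR" logic from the question), java.util.concurrent also offers ExecutorCompletionService, whose take() blocks until whichever task finishes next.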
I'm trying to use the DynamoDB parallel scan example:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/LowLevelJavaScanning.html
I have 200,000 items, and I've taken the sequential scan code and modified it slightly for my usage:
Map<String, AttributeValue> lastKeyEvaluated = null;
do
{
ScanRequest scanRequest = new ScanRequest()
.withTableName(tableName)
.withExclusiveStartKey(lastKeyEvaluated);
ScanResult result = client.scan(scanRequest);
double counter = 0;
for(Map<String, AttributeValue> item : result.getItems())
{
itemSerialize.add("Set:"+counter);
for (Map.Entry<String, AttributeValue> getItem : item.entrySet())
{
String attributeName = getItem.getKey();
AttributeValue value = getItem.getValue();
itemSerialize.add(attributeName
+ (value.getS() == null ? "" : ":" + value.getS())
+ (value.getN() == null ? "" : ":" + value.getN())
+ (value.getB() == null ? "" : ":" + value.getB())
+ (value.getSS() == null ? "" : ":" + value.getSS())
+ (value.getNS() == null ? "" : ":" + value.getNS())
+ (value.getBS() == null ? "" : ":" + value.getBS()));
}
counter += 1;
}
lastKeyEvaluated = result.getLastEvaluatedKey();
}
while(lastKeyEvaluated != null);
The counter gives exactly 200,000 when this code has finished; however, I also wanted to try the parallel scan.
Function Call:
ScanSegmentTask task = null;
ArrayList<String> list = new ArrayList<String>();
try
{
ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);
int totalSegments = numberOfThreads;
for (int segment = 0; segment < totalSegments; segment++)
{
// Runnable task that will only scan one segment
task = new ScanSegmentTask(tableName, itemLimit, totalSegments, segment, list);
// Execute the task
executor.execute(task);
}
shutDownExecutorService(executor);
}
.......Catches something if error
return list;
Class:
I have a static list that is shared by all the threads. I was able to retrieve the lists and output the amount of data.
// Runnable task for scanning a single segment of a DynamoDB table
private static class ScanSegmentTask implements Runnable
{
// DynamoDB table to scan
private String tableName;
// number of items each scan request should return
private int itemLimit;
// Total number of segments
// Equals to total number of threads scanning the table in parallel
private int totalSegments;
// Segment that will be scanned with by this task
private int segment;
static ArrayList<String> list_2;
Object lock = new Object();
public ScanSegmentTask(String tableName, int itemLimit, int totalSegments, int segment, ArrayList<String> list)
{
this.tableName = tableName;
this.itemLimit = itemLimit;
this.totalSegments = totalSegments;
this.segment = segment;
list_2 = list;
}
public void run()
{
System.out.println("Scanning " + tableName + " segment " + segment + " out of " + totalSegments + " segments " + itemLimit + " items at a time...");
Map<String, AttributeValue> exclusiveStartKey = null;
int totalScannedItemCount = 0;
int totalScanRequestCount = 0;
int counter = 0;
try
{
while(true)
{
ScanRequest scanRequest = new ScanRequest()
.withTableName(tableName)
.withLimit(itemLimit)
.withExclusiveStartKey(exclusiveStartKey)
.withTotalSegments(totalSegments)
.withSegment(segment);
ScanResult result = client.scan(scanRequest);
totalScanRequestCount++;
totalScannedItemCount += result.getScannedCount();
synchronized(lock)
{
for(Map<String, AttributeValue> item : result.getItems())
{
list_2.add("Set:"+counter);
for (Map.Entry<String, AttributeValue> getItem : item.entrySet())
{
String attributeName = getItem.getKey();
AttributeValue value = getItem.getValue();
list_2.add(attributeName
+ (value.getS() == null ? "" : ":" + value.getS())
+ (value.getN() == null ? "" : ":" + value.getN())
+ (value.getB() == null ? "" : ":" + value.getB())
+ (value.getSS() == null ? "" : ":" + value.getSS())
+ (value.getNS() == null ? "" : ":" + value.getNS())
+ (value.getBS() == null ? "" : ":" + value.getBS()));
}
counter += 1;
}
}
exclusiveStartKey = result.getLastEvaluatedKey();
if (exclusiveStartKey == null)
{
break;
}
}
}
catch (AmazonServiceException ase)
{
System.err.println(ase.getMessage());
}
finally
{
System.out.println("Scanned " + totalScannedItemCount + " items from segment " + segment + " out of " + totalSegments + " of " + tableName + " with " + totalScanRequestCount + " scan requests");
}
}
}
Executor Service Shut Down:
public static void shutDownExecutorService(ExecutorService executor)
{
executor.shutdown();
try
{
if (!executor.awaitTermination(10, TimeUnit.SECONDS))
{
executor.shutdownNow();
}
}
catch (InterruptedException e)
{
executor.shutdownNow();
Thread.currentThread().interrupt();
}
}
However, the number of items changes every time I run this piece of code (it varies around 60,000 in total, 6,000 per thread, with 10 threads created). Removing the synchronization does not change the result either.
Is there a bug in the synchronization, or in the Amazon AWS API?
Thanks, all
EDIT:
The new function call:
ScanSegmentTask task = null;
ArrayList<String> list = new ArrayList<String>();
try
{
ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);
int totalSegments = numberOfThreads;
for (int segment = 0; segment < totalSegments; segment++)
{
// Runnable task that will only scan one segment
task = new ScanSegmentTask(tableName, itemLimit, totalSegments, segment);
// Execute the task
Future<ArrayList<String>> future = executor.submit(task);
list.addAll(future.get());
}
shutDownExecutorService(executor);
}
The new class:
// Runnable task for scanning a single segment of a DynamoDB table
private static class ScanSegmentTask implements Callable<ArrayList<String>>
{
// DynamoDB table to scan
private String tableName;
// number of items each scan request should return
private int itemLimit;
// Total number of segments
// Equals to total number of threads scanning the table in parallel
private int totalSegments;
// Segment that will be scanned with by this task
private int segment;
ArrayList<String> list_2 = new ArrayList<String>();
static int counter = 0;
public ScanSegmentTask(String tableName, int itemLimit, int totalSegments, int segment)
{
this.tableName = tableName;
this.itemLimit = itemLimit;
this.totalSegments = totalSegments;
this.segment = segment;
}
#SuppressWarnings("finally")
public ArrayList<String> call()
{
System.out.println("Scanning " + tableName + " segment " + segment + " out of " + totalSegments + " segments " + itemLimit + " items at a time...");
Map<String, AttributeValue> exclusiveStartKey = null;
try
{
while(true)
{
ScanRequest scanRequest = new ScanRequest()
.withTableName(tableName)
.withLimit(itemLimit)
.withExclusiveStartKey(exclusiveStartKey)
.withTotalSegments(totalSegments)
.withSegment(segment);
ScanResult result = client.scan(scanRequest);
for(Map<String, AttributeValue> item : result.getItems())
{
list_2.add("Set:"+counter);
for (Map.Entry<String, AttributeValue> getItem : item.entrySet())
{
String attributeName = getItem.getKey();
AttributeValue value = getItem.getValue();
list_2.add(attributeName
+ (value.getS() == null ? "" : ":" + value.getS())
+ (value.getN() == null ? "" : ":" + value.getN())
+ (value.getB() == null ? "" : ":" + value.getB())
+ (value.getSS() == null ? "" : ":" + value.getSS())
+ (value.getNS() == null ? "" : ":" + value.getNS())
+ (value.getBS() == null ? "" : ":" + value.getBS()));
}
counter += 1;
}
exclusiveStartKey = result.getLastEvaluatedKey();
if (exclusiveStartKey == null)
{
break;
}
}
}
catch (AmazonServiceException ase)
{
System.err.println(ase.getMessage());
}
finally
{
return list_2;
}
}
}
Final EDIT:
Function Call:
ScanSegmentTask task = null;
ArrayList<String> list = new ArrayList<String>();
ArrayList<Future<ArrayList<String>>> holdFuture = new ArrayList<Future<ArrayList<String>>>();
try
{
ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);
int totalSegments = numberOfThreads;
for (int segment = 0; segment < totalSegments; segment++)
{
// Runnable task that will only scan one segment
task = new ScanSegmentTask(tableName, itemLimit, totalSegments, segment);
// Execute the task
Future<ArrayList<String>> future = executor.submit(task);
holdFuture.add(future);
}
for (int i = 0 ; i < holdFuture.size(); i++)
{
boolean flag = false;
while(flag == false)
{
Thread.sleep(1000);
if(holdFuture.get(i).isDone())
{
list.addAll(holdFuture.get(i).get());
flag = true;
}
}
}
shutDownExecutorService(executor);
}
Class:
private static class ScanSegmentTask implements Callable<ArrayList<String>>
{
// DynamoDB table to scan
private String tableName;
// number of items each scan request should return
private int itemLimit;
// Total number of segments
// Equals to total number of threads scanning the table in parallel
private int totalSegments;
// Segment that will be scanned with by this task
private int segment;
ArrayList<String> list_2 = new ArrayList<String>();
static AtomicInteger counter = new AtomicInteger(0);
public ScanSegmentTask(String tableName, int itemLimit, int totalSegments, int segment)
{
this.tableName = tableName;
this.itemLimit = itemLimit;
this.totalSegments = totalSegments;
this.segment = segment;
}
#SuppressWarnings("finally")
public ArrayList<String> call()
{
System.out.println("Scanning " + tableName + " segment " + segment + " out of " + totalSegments + " segments " + itemLimit + " items at a time...");
Map<String, AttributeValue> exclusiveStartKey = null;
try
{
while(true)
{
ScanRequest scanRequest = new ScanRequest()
.withTableName(tableName)
.withLimit(itemLimit)
.withExclusiveStartKey(exclusiveStartKey)
.withTotalSegments(totalSegments)
.withSegment(segment);
ScanResult result = client.scan(scanRequest);
for(Map<String, AttributeValue> item : result.getItems())
{
list_2.add("Set:"+counter);
for (Map.Entry<String, AttributeValue> getItem : item.entrySet())
{
String attributeName = getItem.getKey();
AttributeValue value = getItem.getValue();
list_2.add(attributeName
+ (value.getS() == null ? "" : ":" + value.getS())
+ (value.getN() == null ? "" : ":" + value.getN())
+ (value.getB() == null ? "" : ":" + value.getB())
+ (value.getSS() == null ? "" : ":" + value.getSS())
+ (value.getNS() == null ? "" : ":" + value.getNS())
+ (value.getBS() == null ? "" : ":" + value.getBS()));
}
counter.addAndGet(1);
}
exclusiveStartKey = result.getLastEvaluatedKey();
if (exclusiveStartKey == null)
{
break;
}
}
}
catch (AmazonServiceException ase)
{
System.err.println(ase.getMessage());
}
finally
{
return list_2;
}
}
}
OK, I believe the issue is in the way you synchronized.
In your case, your lock is pretty much pointless, as each thread has its own lock, and so synchronizing never actually blocks one thread from running the same piece of code. I believe that this is the reason that removing synchronization does not change the result -- because it never would have had an effect in the first place.
I believe your issue is in fact due to the static ArrayList<String> that's shared by your threads. This is because ArrayList is actually not thread-safe, and so operations on it are not guaranteed to succeed; as a result, you have to synchronize operations to/from it. Without proper synchronization, it could be possible to have two threads add something to an empty ArrayList, yet have the resulting ArrayList have a size of 1! (or at least if my memory hasn't failed me. I believe this is the case for non-thread-safe objects, though)
As I said before, while you do have a synchronized block, it really isn't doing anything. You could synchronize on list_2, but all that would do is effectively make all your threads run in sequence, as the lock on the ArrayList wouldn't be released until one of your threads was done.
There are a few solutions to this. You can use Collections.synchronizedList(list_2) to create a synchronized wrapper for your ArrayList. This way, adding to the list is guaranteed to succeed. However, this incurs a synchronization cost per operation, and so isn't ideal.
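A one-line sketch of that wrapper (note that iterating over it still requires manual synchronization):

List<String> safeList = Collections.synchronizedList(list_2); // share safeList, not list_2, across threads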
What I would do is actually have ScanSegmentTask implement Callable (technically Callable<ArrayList<String>>). The Callable interface is almost exactly like the Runnable interface, except its method is call(), which returns a value.
Why is this important? I think that what would produce the best results for you is this:
Make list_2 an instance variable, initialized to a blank list
Have each thread add to this list exactly as you have done
Return list_2 when you are done
Concatenate each resulting ArrayList<String> to the original ArrayList using addAll()
This way, you have no synchronization overhead to deal with!
This will require a few changes to your executor code. Instead of calling execute(), you'll need to call submit(). This returns a Future object (Future<ArrayList<String>> in your case) that holds the results of the call() method. You'll need to store this into some collection -- an array, ArrayList, doesn't matter.
To retrieve the results, simply loop through the collection of Future objects and call get() (I think). This call will block until the thread that the Future object corresponds to is complete.
I think that's it. While this is more complicated, I think this is the best performance you're going to get, as with enough threads either CPU contention or your network link will become the bottleneck. Please ask if you have any questions, and I'll update as needed.
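For reference, a minimal sketch of the submit-and-get loop described above (it assumes the Callable version of ScanSegmentTask from the edits, and leaves error handling to the caller):

static ArrayList<String> collectSegments(ExecutorService executor, String tableName,
        int itemLimit, int totalSegments) throws Exception {
    List<Future<ArrayList<String>>> futures = new ArrayList<>();
    for (int segment = 0; segment < totalSegments; segment++) {
        futures.add(executor.submit(new ScanSegmentTask(tableName, itemLimit, totalSegments, segment)));
    }
    ArrayList<String> all = new ArrayList<>();
    for (Future<ArrayList<String>> f : futures) {
        all.addAll(f.get()); // blocks until that segment's task completes; no polling or sleeping needed
    }
    return all;
}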