I am creating a crawler-like app that finds images in web pages. The producer produces links and the consumer connects to each link to find the images, but the consumer was taking a lot of time because of the huge number of links produced by the producer. So I submitted the consumer to an executor service, but I did not see any decrease in the time taken by the consumer. Kindly help me out. Below is my code.
@Service
@Qualifier("crawlerService")
public class CrawlerService {
@Autowired
@Qualifier("loggerService")
LoggerService loggerService;
@Autowired
@Qualifier("imageTypeExtensionCombo")
ImageTypeExtensionCombo imageTypeExtensionCombo;
public List<String> startCrawler(List<String> links, List<String> images, URL url, String protocol, String protocolHost) throws Exception{
LinkQueue queue = new LinkQueue(links);
LinkProducer producer = new LinkProducer(links, url, protocol, protocolHost, queue, loggerService);
LinkConsumer consumer = new LinkConsumer(links, images, url, protocol, protocolHost, loggerService, queue);
ExecutorService executorService = Executors.newFixedThreadPool(4);
executorService.submit(consumer);
producer.start();
//consumer.start();
Thread.currentThread().join();
executorService.shutdown();
return images;
}
}
LinkProducer class
public class LinkProducer extends Thread {
private List<String> anchorList;
private URL url;
private String protocol;
private String protocolHost;
private UrlValidator urlValidator = new UrlValidator();
private LinkQueue queue;
private LoggerService loggerService;
private int MAX_QUEUE_SIZE = 2;
private int counter = 0;
private boolean stopThread = false;
private String HTML_TYPE = "HTML";
private String HTML_CONTENT_TYPE = "text/html";
private String IMAGE_TYPE = "IMAGE";
private String NON_HTML_NON_IMAGE_TYPE = "OTHERS";
public LinkProducer(List<String> anchorList, URL url, String protocol,String protocolHost, LinkQueue queue, LoggerService loggerService) {
super(protocolHost.replace(protocol, "").replaceAll("/", ""));
this.anchorList = anchorList;
this.url = url;
this.protocol = protocol;
this.protocolHost = protocolHost;
this.queue = queue;
this.loggerService = loggerService;
}
public void run() {
int i = 0;
while(true) {
List<String> anchors = null;
loggerService.log("Producer Thread : " + (++i));
try {
anchors = produce();
} catch (Exception ex) {
loggerService.log("Exception occured in producer thread : "+ ex.getMessage());
ex.printStackTrace();
if(stopThread){
break;
}
}
if(stopThread){
break;
}
if(anchors != null && anchors.size() > 0){
Iterator<String> iter = anchors.iterator();
while(iter.hasNext()){
synchronized (queue) {
queue.enQueue(iter.next());
}
}
}
}
}
}
LinkConsumer class
public class LinkConsumer extends Thread {
private List<String> anchorList;
private List<String> imageList;
private URL url;
private String protocol;
private String protocolHost;
private LinkQueue queue;
private LoggerService loggerService;
private UrlValidator urlValidator = new UrlValidator();
private String HTML_TYPE = "HTML";
private String HTML_CONTENT_TYPE = "text/html";
private String IMAGE_TYPE = "IMAGE";
private String NON_HTML_NON_IMAGE_TYPE = "OTHERS";
public LinkConsumer(List<String> anchorList, List<String> imageList, URL url, String protocol,String protocolHost, LoggerService loggerService, LinkQueue queue) {
super(protocolHost.replace(protocol, "").replaceAll("/", ""));
this.anchorList = anchorList;
this.imageList = imageList;
this.url = url;
this.protocol = protocol;
this.protocolHost = protocolHost;
this.queue = queue;
this.loggerService = loggerService;
}
public void run() {
int i = 0;
while (!queue.isEmpty()) {
List<String> images = null;
loggerService.log("Consumer Thread : " + (++i));
try {
images = consume();
} catch (Exception ex) {
loggerService.log("Exception occured in consumer thread : "+ ex.getMessage());
ex.printStackTrace();
}
if (images != null && images.size() > 0) {
Iterator<String> iter = images.iterator();
while (iter.hasNext()) {
imageList.add(iter.next());
}
}
}
}
}
Thanks
You create and submit only a single LinkConsumer and therefore you have only one worker.
To achieve real parallel performance you will need to create and submit more LinkConsumer.
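As a rough sketch of what that could look like: the example below submits one draining task per pool thread, all reading from the same thread-safe queue. The class name MultiConsumerSketch, the findImage helper and the pre-filled queue are assumptions made for illustration; they stand in for your LinkConsumer, your image lookup and your LinkQueue.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class MultiConsumerSketch {

    // Hypothetical stand-in for fetching a page and extracting its image URLs.
    static String findImage(String link) {
        return link + "/image.png";
    }

    static List<String> crawl(List<String> links, int workers) {
        // Assumption: the links are already known; a live producer would keep adding to this queue.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(links);
        List<String> images = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        // One task per pool thread; submitting a single task keeps only one thread busy.
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                String link;
                // poll() returns null once the queue is drained, which ends this worker.
                while ((link = queue.poll()) != null) {
                    images.add(findImage(link));
                }
            });
        }

        pool.shutdown();
        try {
            // Wait for the workers instead of joining the current thread.
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return images;
    }

    public static void main(String[] args) {
        List<String> images = crawl(List.of("http://a.example", "http://b.example", "http://c.example"), 4);
        System.out.println(images.size() + " images found");
    }
}
```

Note that the shared queue and the result list both have to be thread-safe once more than one consumer touches them.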
Multi-threading does not automatically give you an advantage. In fact, it adds complexity, and creating more threads than your hardware can actually run only makes things worse.
Multi-threading gives significant gains only if you use it efficiently. If you keep creating threads this way, you will not see any performance improvement.
Your hardware, especially the processor specs, and the volume of data you write to disk are the primary constraints that decide the performance you will get.
I would suggest the following.
Have multiple machines. One machine, acting as the producer, writes all the URLs or images or whatever you want to a database. The client systems take the URLs from the DB and fetch the data from the source.
Technically, you then have multiple systems working, and each machine could have ~10 active threads at a time. You need to write the code only once and run the same code on multiple machines. You could use the producer machine as a consumer too.
You can try something like this to create new threads. But I'm not sure that creating new threads will improve the time much. You will need better hardware as well.
public boolean secondThread(){
Thread t = new Thread(){
public void run(){
// do something
}
};
t.start();
return true;
}
I am using a ForkJoinPool in Java for multitasking. I came across a situation where, for every task, I need to hit a URL, wait for 10 minutes, and then hit another URL to read the data. The problem is that for those 10 minutes my CPU is idle and does not start any other tasks (beyond the number defined in the fork join pool).
static ForkJoinPool pool = new ForkJoinPool(10);
public static void main(String[] args){
List<String> list = new ArrayList<>();
for(int i=1; i<=100; i++){
list.add("Str"+i);
}
final Tasker task = new Tasker(list);
pool.invoke(task);
public class Tasker extends RecursiveAction{
private static final long serialVersionUID = 1L;
List<String> myList;
public Tasker(List<String> checkersList) {
super();
this.myList = checkersList;
}
@Override
protected void compute() {
if(myList.size()==1){
System.out.println(myList.get(0) + "start");
//Date start = new Date();
try {
Thread.sleep(10*60*1000);
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
System.out.println(myList.get(0) + "Finished");
}
else{
List<String> temp = new ArrayList<>();
temp.add( myList.get( myList.size()-1 ) );
myList.remove( myList.size()-1 );
Tasker left = new Tasker(myList);
Tasker right = new Tasker(temp);
left.fork();
right.compute();
left.join();
}
}
Now what should I do so that the CPU picks up all the tasks and then waits on them in parallel?
Unfortunately, ForkJoinPool does not work well in the face of Thread.sleep(), because it is designed for many short tasks that finish quickly, rather than tasks that block for a long time.
Instead, for what you are trying to accomplish, I would recommend using ScheduledThreadPoolExecutor and dividing your task into two parts.
import java.util.*;
import java.util.concurrent.*;
public class Main {
static ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(10);
public static void main(String[] args){
for(int i=1; i<=100; i++){
pool.schedule(new FirstHalf("Str"+i), 0, TimeUnit.NANOSECONDS);
}
}
static class FirstHalf implements Runnable {
String name;
public FirstHalf(String name) {
this.name = name;
}
public void run() {
System.out.println(name + "start");
pool.schedule(new SecondHalf(name), 10, TimeUnit.MINUTES);
}
}
static class SecondHalf implements Runnable {
String name;
public SecondHalf(String name) {
this.name = name;
}
public void run() {
System.out.println(name + "Finished");
}
}
}
If Java provides a thread pool which allows releasing the underlying resources (that is, the kernel thread participating in the thread pool) during a Thread.sleep(), you should use that instead, but I currently do not know of one.
According to the docs, the basic use section for fork/join says:
if (my portion of the work is small enough)
do the work directly
else
split my work into two pieces
invoke the two pieces and wait for the results
Hopefully this meets your need if you are using forkjoin
public class Tasker extends RecursiveAction {
static ForkJoinPool pool = new ForkJoinPool(10);
static int threshold = 10;
public static void main(String[] args){
List<String> list = new ArrayList<>();
for(int i=1; i<=100; i++){
list.add("Str"+i);
}
final Tasker task = new Tasker(list);
pool.invoke(task);
}
private static final long serialVersionUID = 1L;
List<String> myList;
public Tasker(List<String> checkersList) {
super();
this.myList = checkersList;
}
void computeDirectly() {
for(String url : myList){
System.out.println(url + " start");
}
//Date start = new Date();
try {
//keep hitting url
while (true) {
for(String url : myList) {
//url hitting code here
System.out.println(url + " hitting");
}
Thread.sleep(10 * 60 * 1000);
}
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
for(String url : myList){
System.out.println(url + " Finished");
}
}
@Override
protected void compute() {
if (myList.size() <= threshold) {
computeDirectly();
return;
}
//temp list have only one url
//List<String> temp = new ArrayList<>();
//temp.add( myList.get( myList.size()-1 ) );
//myList.remove( myList.size()-1 );
//Tasker left = new Tasker(myList);
//Tasker right = new Tasker(temp);
//left.fork();
//right.compute();
//left.join();
//divide the list into two halves; subList's end index is exclusive,
//so the second half must start at smHalf or one element is dropped
int len = myList.size();
int smHalf = len / 2;//smaller half
List<String> first = myList.subList(0, smHalf);
List<String> second = myList.subList(smHalf, len);
invokeAll(new Tasker(first), new Tasker(second));
}
}
I have a Singleton class which connects to Cassandra. I want to initialize processMetadata, procMetadata and topicMetadata all at once, not one by one. If all three are initialized together, I will see consistent values across them rather than a mix of old and new values.
In the code below, processMetadata, procMetadata and topicMetadata are initialized for the first time inside the initializeMetadata method and then updated every 15 minutes.
public class CassUtil {
private static final Logger LOGGER = Logger.getInstance(CassUtil.class);
private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
// below are my three metadata lists, which I need to update all at once, not one by one
private List<ProcessMetadata> processMetadata = new ArrayList<>();
private List<ProcMetadata> procMetadata = new ArrayList<>();
private List<String> topicMetadata = new ArrayList<>();
private Session session;
private Cluster cluster;
private static class Holder {
private static final CassUtil INSTANCE = new CassUtil();
}
public static CassUtil getInstance() {
return Holder.INSTANCE;
}
private CassUtil() {
List<String> servers = TestUtils.HOSTNAMES;
String username = TestUtils.USERNAME;
String password = TestUtils.PASSWORD;
PoolingOptions opts = new PoolingOptions();
opts.setCoreConnectionsPerHost(HostDistance.LOCAL,
opts.getCoreConnectionsPerHost(HostDistance.LOCAL));
Builder builder = Cluster.builder();
cluster =
builder
.addContactPoints(servers.toArray(new String[servers.size()]))
.withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
.withPoolingOptions(opts)
.withReconnectionPolicy(new ConstantReconnectionPolicy(100L))
.withLoadBalancingPolicy(
DCAwareRoundRobinPolicy
.builder()
.withLocalDc(
!TestUtils.isProduction() ? "DC2" : TestUtils.getCurrentLocation()
.get().name().toLowerCase()).build())
.withCredentials(username, password).build();
try {
session = cluster.connect("testkeyspace");
} catch (NoHostAvailableException ex) {
LOGGER.logError("error= ", ExceptionUtils.getStackTrace(ex));
} catch (Exception ex) {
LOGGER.logError("error= " + ExceptionUtils.getStackTrace(ex));
}
}
// start a background thread which runs every 15 minutes
public void startScheduleTask() {
scheduler.scheduleAtFixedRate(new Runnable() {
public void run() {
try {
processMetadata = processMetadata(true);
topicMetadata = listOfTopic(TestUtils.GROUP_ID);
procMetadata = procMetadata();
} catch (Exception ex) {
LOGGER.logError("error= ", ExceptionUtils.getStackTrace(ex));
}
}
}, 0, 15, TimeUnit.MINUTES);
}
// called from main thread to initialize the metadata
// and start the background thread where it gets updated
// every 15 minutes
public void initializeMetadata() {
processMetadata = processMetadata(true);
topicMetadata = listOfTopic(TestUtils.GROUP_ID);
procMetadata = procMetadata();
startScheduleTask();
}
private List<String> listOfTopic(final String consumerName) {
List<String> listOfTopics = new ArrayList<>();
String sql = "select topics from topic_metadata where id=1 and consumerName=?";
try {
// get data from cassandra
} catch (Exception ex) {
LOGGER.logError("error= ", ExceptionUtils.getStackTrace(ex), ", Consumer Name= ",
consumerName);
}
return listOfTopics;
}
private List<ProcessMetadata> processMetadata(final boolean flag) {
List<ProcessMetadata> metadatas = new ArrayList<>();
String sql = "select * from process_metadata where id=1 and is_active=?";
try {
// get data from cassandra
} catch (Exception ex) {
LOGGER.logError("error= ", ExceptionUtils.getStackTrace(ex), ", active= ", flag);
}
return metadatas;
}
private List<ProcMetadata> procMetadata() {
List<ProcMetadata> metadatas = new ArrayList<>();
String sql = "select * from schema where id=1";
try {
// get data from cassandra
} catch (SchemaParseException ex) {
LOGGER.logError("schema parsing error= ", ExceptionUtils.getStackTrace(ex));
} catch (Exception ex) {
LOGGER.logError("error= ", ExceptionUtils.getStackTrace(ex));
}
return metadatas;
}
public List<ProcessMetadata> getProcessMetadata() {
return processMetadata;
}
public List<String> getTopicMetadata() {
return topicMetadata;
}
public List<ProcMetadata> getProcMetadata() {
return procMetadata;
}
}
So from my main thread, I call the initializeMetadata method only once, which initializes those three metadata lists and then starts a background thread that updates them every 15 minutes. After that I use them from multiple threads like below:
CassUtil.getInstance().getProcessMetadata();
CassUtil.getInstance().getTopicMetadata();
CassUtil.getInstance().getProcMetadata();
Problem Statement:
Now I want to see the same state of processMetadata, topicMetadata and procMetadata. That is, these three metadata lists should be updated at the same time, not one after the other, because I don't want to see a mixed state when I read them.
How can I avoid this issue? Do I need to create another class which will hold these three metadata lists as constructor parameters?
The most efficient way to keep a consistent state of your lists is to use an immutable class that holds all 3 lists. Your class then has a single field of this type, declared volatile so that all threads see the latest update of this field.
Here is, for example, the immutable class that holds the state of the lists (it could be an ordinary class, but as it is implementation-specific it could be a static inner class):
private static class State {
private final List<ProcessMetadata> processMetadata;
private final List<ProcMetadata> procMetadata;
private final List<String> topicMetadata;
public State(final List<ProcessMetadata> processMetadata,
final List<ProcMetadata> procMetadata, final List<String> topicMetadata) {
this.processMetadata = new ArrayList<>(processMetadata);
this.procMetadata = new ArrayList<>(procMetadata);
this.topicMetadata = new ArrayList<>(topicMetadata);
}
// Getters
}
Then your class would be something like that:
public class CassUtil {
...
private volatile State state = new State(
new ArrayList<>(), new ArrayList<>(), new ArrayList<>()
);
...
public void startScheduleTask() {
...
this.state = new State(
processMetadata(true), procMetadata(),
listOfTopic(TestUtils.GROUP_ID)
);
...
}
...
public void initializeMetadata() {
this.state = new State(
processMetadata(true), procMetadata(), listOfTopic(TestUtils.GROUP_ID)
);
startScheduleTask();
}
...
public List<ProcessMetadata> getProcessMetadata() {
return this.state.getProcessMetadata();
}
public List<String> getTopicMetadata() {
return this.state.getTopicMetadata();
}
public List<ProcMetadata> getProcMetadata() {
return this.state.getProcMetadata();
}
}
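One caveat: if a caller invokes the three getters one after another, a refresh can still land between two of the calls, because each getter reads the volatile field separately. A possible way around this, sketched below under the assumption that callers are allowed to hold the holder object itself, is to expose the snapshot and read all three lists from it. The names SnapshotSketch, refresh and snapshot are made up for the example, and plain List<String> stands in for the real metadata types.

```java
import java.util.List;

class SnapshotSketch {

    // Immutable holder: all three lists are fixed at construction time.
    static final class State {
        final List<String> processMetadata;
        final List<String> procMetadata;
        final List<String> topicMetadata;

        State(List<String> processMetadata, List<String> procMetadata, List<String> topicMetadata) {
            // List.copyOf returns unmodifiable copies, so later changes to the inputs are not visible.
            this.processMetadata = List.copyOf(processMetadata);
            this.procMetadata = List.copyOf(procMetadata);
            this.topicMetadata = List.copyOf(topicMetadata);
        }
    }

    private volatile State state = new State(List.of(), List.of(), List.of());

    // Build the complete new state first, then publish it with a single volatile write.
    void refresh(List<String> process, List<String> proc, List<String> topics) {
        state = new State(process, proc, topics);
    }

    // A single volatile read; everything reachable from the result belongs to the same refresh.
    State snapshot() {
        return state;
    }

    public static void main(String[] args) {
        SnapshotSketch util = new SnapshotSketch();
        util.refresh(List.of("p1"), List.of("m1"), List.of("t1"));
        State s = util.snapshot();
        System.out.println(s.processMetadata + " " + s.procMetadata + " " + s.topicMetadata);
    }
}
```

A caller that only ever needs one of the lists can keep using the individual getters; the snapshot matters when two or more lists must agree with each other.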
A little bit of context: the client is sending to the server a SOSPFPacket object (via TCP) that has various attributes, such as a Vector<LSA> lsaArray. The LSA itself has a LinkedList<LinkDescription> links attribute. In my test case, there are two messages being sent. In both messages, there is only one LSA in the vector. In the first message, the LSA has one LinkDescription, in the second, it has two. When I send a message, I increment the messageId.
The server receives both messages with proper ids, but in the second message, the links only contain one link instead of two. I'm clueless...
Here are the object implementations:
import java.io.*;
import java.util.Vector;
public class SOSPFPacket implements Serializable {
public final static short HELLO = 0;
public final static short LSU = 1;
public final static short OVER_BURDENED = 2;
public static int id = Integer.MIN_VALUE;
public String srcProcessIP;
public short srcProcessPort;
public String srcIP;
public String dstIP;
public short sospfType; //0 - HELLO, 1 - LinkState Update, 2 - Over Burdened
public String routerID;
public int messageId = id++;
public String neighborID; //neighbor's simulated IP address
public Vector<LSA> lsaArray = new Vector<>();
public String lsaInitiator = null;
}
import java.io.Serializable;
import java.util.LinkedList;
public class LSA implements Serializable {
public String linkStateID;
public int lsaSeqNumber = Integer.MIN_VALUE;
public LinkedList<LinkDescription> links = new LinkedList<LinkDescription>();
@Override
public String toString() {
StringBuffer sb = new StringBuffer();
sb.append(linkStateID + ":").append(lsaSeqNumber + "\n");
for (LinkDescription ld : links) {
sb.append(ld);
}
sb.append("\n");
return sb.toString();
}
}
import java.io.Serializable;
public class LinkDescription implements Serializable {
public String linkID;
public int portNum;
public int tosMetrics;
public LinkDescription() {}
public LinkDescription(String linkID, int portNum, int tosMetrics) {
this.linkID = linkID;
this.portNum = portNum;
this.tosMetrics = tosMetrics;
}
public String toString() {
return linkID + "," + portNum + "," + tosMetrics;
}
}
To send the message, I do it via a Client.java thread implementing Runnable. Here are the relevant methods:
public void run() {
try {
_outputStream = new ObjectOutputStream(_clientSocket.getOutputStream());
sendMessage(SOSPFPacket.HELLO);
_inputStream = new ObjectInputStream(_clientSocket.getInputStream());
SOSPFPacket message = Util.receiveMessage(_inputStream);
if (message.sospfType == SOSPFPacket.OVER_BURDENED) {
System.out.println("Removing link with router " + message.srcIP + "...");
_router.removeLink(_remoteRouterIP);
return;
}
_remoteRouterDescription.setStatus(RouterStatus.TWO_WAY);
_router.addLinkDescriptionToDatabase(_remoteRouterDescription, _link.getWeight());
sendMessage(SOSPFPacket.HELLO);
message = Util.receiveMessage(_inputStream);
if (message.sospfType == SOSPFPacket.LSU) {
_router.synchronize(message.lsaArray);
}
_router.propagateSynchronization(message.lsaInitiator, message.srcIP);
} catch (IOException e) {
e.printStackTrace();
}
}
private void sendMessage(short messageType) {
try {
SOSPFPacket message = Util.makeMessage(_rd, _remoteRouterDescription, messageType, _router);
_outputStream.writeObject(message);
_outputStream.flush();
} catch (IOException e) {
e.printStackTrace();
}
}
public class Util {
public static SOSPFPacket makeMessage(RouterDescription local, RouterDescription external, short messageType, Router rd) {
SOSPFPacket message = new SOSPFPacket();
message.srcProcessIP = local.getProcessIPAddress();
message.srcProcessPort = local.getProcessPortNumber();
message.srcIP = local.getSimulatedIPAddress();
message.dstIP = external.getSimulatedIPAddress();
message.sospfType = messageType;
message.routerID = local.getSimulatedIPAddress();
message.neighborID = external.getSimulatedIPAddress();
rd.getLsd().getStore().forEach((k, v) -> message.lsaArray.addElement(v));
message.lsaInitiator = messageType == SOSPFPacket.LSU ? message.srcIP : null;
return message;
}
public static SOSPFPacket receiveMessage(ObjectInputStream inputStream) {
SOSPFPacket receivedMessage = null;
try {
receivedMessage = (SOSPFPacket) inputStream.readObject();
String messageType;
switch (receivedMessage.sospfType) {
case SOSPFPacket.HELLO:
messageType = "HELLO";
break;
case SOSPFPacket.LSU:
messageType = "LINKSTATEUPDATE";
break;
case SOSPFPacket.OVER_BURDENED:
messageType = "OVER_BURDENED";
break;
default:
messageType = "UNKNOWN_STATE";
break;
}
System.out.println("received " + messageType + " from " + receivedMessage.srcIP + ";");
} catch (ClassNotFoundException e) {
System.out.println("No message received.");
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
return receivedMessage;
}
}
And the server instantiates a private ClientServiceThread when it receives a new connection, which is in charge of receiving the message.
private class ClientServiceThread implements Runnable {
Socket _clientSocket;
Thread _runner;
ClientServiceThread(Socket s) {
_clientSocket = s;
_runner = new Thread(this);
}
public Thread getRunner() { return _runner; }
public void run() {
ObjectInputStream inputStream = null;
ObjectOutputStream outputStream = null;
try {
inputStream = new ObjectInputStream(_clientSocket.getInputStream());
outputStream = new ObjectOutputStream(_clientSocket.getOutputStream());
while (true) {
try {
SOSPFPacket receivedMessage = Util.receiveMessage(inputStream);
//some logic not relevant since the receivedMessage is already not correct
}
}
}
}
}
Again, all SOSPFPacket fields are correctly received, except for the Vector<LSA> lsaArray...
Edit: I also tried sending a third sendMessage(SOSPFPacket.HELLO) after _router.propagateSynchronization(message.lsaInitiator, message.srcIP);. This time, the message being sent contains two LSA, the first one having two LinkDescription, the second one having one. Both LSA are received by the server, but still, only the first LinkDescription is received in the first LSA. The message id is correct in all three messages.
If I run everything a second time (i.e. I create a new Client and a new ClientService Thread for the already running routers), only then does the server finally receive two LinkDescription in the first LSA.
Java sends references to objects that have already been serialized, to preserve the integrity of object graphs.
You should call ObjectOutputStream.reset() after each writeObject().
Or use ObjectOutputStream.writeUnshared(), but note that it still shares referenced objects, i.e. if you try to send a list with both added and changed element objects, it will send the new list and new element objects, but not the element objects which have been changed.
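To see the effect in isolation, here is a minimal sketch that writes the same list twice to one ObjectOutputStream, adding an element in between, and reads both copies back. The class name ResetSketch and the in-memory byte stream are only there to keep the example self-contained; over a socket the behaviour is the same.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ResetSketch {

    // Writes the same list twice, mutating it in between, and returns the two sizes read back.
    static int[] roundTrip(boolean resetBetweenWrites) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ArrayList<String> links = new ArrayList<>();
            links.add("link-1");

            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(links);
                if (resetBetweenWrites) {
                    out.reset(); // the stream forgets which objects it has already written
                }
                links.add("link-2");
                out.writeObject(links); // without reset() this is only a back-reference
            }

            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                List<?> first = (List<?>) in.readObject();
                List<?> second = (List<?>) in.readObject();
                return new int[] { first.size(), second.size() };
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(roundTrip(false))); // prints [1, 1]
        System.out.println(Arrays.toString(roundTrip(true)));  // prints [1, 2]
    }
}
```

Without reset(), the second read returns the list as it was first serialized, which matches the symptom of the second link never arriving.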
Finally figured it out. Somehow it seems like the problem was the following line of code in Util.makeMessage: rd.getLsd().getStore().forEach((k, v) -> message.lsaArray.addElement(v));. I replaced it with rd.getLsd().getStore().forEach((k, v) -> message.lsaArray.add(new LSA(v))); with the following LSA constructor:
public LSA(LSA lsa) {
linkStateID = lsa.linkStateID;
lsaSeqNumber = lsa.lsaSeqNumber;
links = new LinkedList<>();
for (LinkDescription ld : lsa.links) {
LinkDescription linkD = new LinkDescription();
linkD.linkID = ld.linkID;
linkD.portNum = ld.portNum;
linkD.tosMetrics = ld.tosMetrics;
links.add(linkD);
}
}
In other words, I needed to deep copy the object contained in my message.
I have a fairly typical producer-consumer scenario where I have 1 producer thread that executes queries and puts the results to a BlockingQueue and approx 7-8 consumers that pick these objects from the BlockingQueue and run a rather long lasting analysis on them. Once these analyses are completed, the resultant objects are placed in a HashMap with the original objects as keys, i.e. HashMap<AnalyzedObject, AnalysisResult>
Due to the nature of relationships in the underlying data model, I get a lot of duplicate tasks, which obviously do not need to be reprocessed. My current solution is essentially as follows:
public class AnalysisAction implements Runnable{
private Dataset data;
private DbManager dbManager;
private Path path;
private Set<Integer> identifiedElements;
private AnalysisResult res;
private Map<Path, AnalysisResult> analyzedPaths;
public static final AtomicInteger duplicates = new AtomicInteger(0);
public AnalysisAction(Path p, Dataset ds, DbManager dbm, Map<Path, AnalysisResult> paths){
this.data = ds;
this.path = p;
this.dbManager = dbm;
this.analyzedPaths = paths;
this.res = new AnalysisResult(path);
}
@Override
public void run() {
if(!analyzedPaths.containsKey(path)){
t0 = System.currentTimeMillis();
// 1. Check the coverage of the path
this.identifiedElements = getIdentifiedElements();
if(identifiedElements.size() != 0)
{
try{
// TIME CONSUMING STUFF...
analyzedPaths.put(path, res);
}
catch(Exception e){
// Exception handling...
}
}
t_end = System.currentTimeMillis();
DebugToolbox.submitProcTime(t_end - t0);
}
else {
duplicates.incrementAndGet();
logger.finer("Duplicate path encountered..." + System.lineSeparator());
}
}
// PRIVATE METHODS THAT CARRY OUT THE TIME CONSUMING STUFF...
}
Then within the class that controls the multithreading I have the following solution:
public class ConcurrencyService {
private final ThreadPoolExecutor pool;
private final int poolSize;
private final int qCapacity = 1 << 7;
private final long timeout = 3;
private final Path tainedPath =
new Path(Long.MIN_VALUE, "LAST_PATH_IN_QUEUE", "N/A", "N/A");
private BlockingQueue<PathwayImpl> bq;
private DbManager dbMan;
private Dataset ds;
private Map<Path,AnalysisResult> analyzedPaths;
private volatile boolean started;
public ConcurrencyService(Dataset data, DbManager db){
this.ds = data;
this.bq = new LinkedBlockingQueue<Path>(qCapacity);
this.dbMan = db;
this.analyzedPaths = new ConcurrentHashMap<Path,AnalysisResult>(1<<15);
this.started = false;
poolSize = Runtime.getRuntime().availableProcessors();
pool = (ThreadPoolExecutor) Executors.newFixedThreadPool(poolSize, new FThreadFactory(-1));
}
public void serve() throws InterruptedException {
try {
ds.finalize();
started = true;
Thread producerThread = new Thread(new QueryingAction(), "f-query-thread");
producerThread.start();
Thread loggerThread = new Thread(new PeriodicLogAction(null), "f-logger-thread");
loggerThread.start();
while((producerThread.getState() != Thread.State.TERMINATED) || !bq.isEmpty()){
Path p = bq.poll(timeout, TimeUnit.MINUTES);
if(p != null){
if (p.equals(tainedPath)) break;
pool.submit(new AnalysisAction(p, ds, dbMan, analyzedPaths));
}else
logger.warning("Timed out while waiting for a path...");
}
} catch (Exception ex) {
// Exception handling...
} finally{
pool.shutdown();
long totalTasks = pool.getTaskCount(),
compTasks = pool.getCompletedTaskCount(),
tasksRemaining = totalTasks - compTasks,
timeout = 10 * tasksRemaining / poolSize;
pool.awaitTermination(timeout, TimeUnit.SECONDS);
logger.info(
"A total of " + DebugToolbox.getNbrProcTimes()
+ " tasks analyzed. Mean process time is: "
+ DebugToolbox.getMeanProcTimeAsString()
+ " milliseconds." + System.lineSeparator());
}
}
public boolean isDone(){
if(this.started)
return pool.isTerminated();
else
return false;
}
protected class QueryingAction implements Runnable {
// Use this to limit the number of paths to be analyzed
// private final int debugLimiter = 1500;
private final int debugLimiter = Integer.MAX_VALUE;
public void run() {
try {
int i = 0;
outer: for(String el : ds.getElements()){
inner: for(Path path : dbMan.getAllPathsWithElement(el)){
if(i++ > debugLimiter)
break outer;
else
bq.put(path);
}
}
logger.info("Total number of queried paths: " + i);
} catch (SQLException e) {
// Exception handling...
} catch (InterruptedException e) {
// Exception handling...
}
bq.offer(tainedPath);
}
}
protected class PeriodicLogAction implements Runnable {
private final PrintStream ps;
private final long period;
private final static long DEF_PERIOD = 30000;
private final String nL = System.getProperty("line.separator");
private volatile boolean loop;
private int counter = 0;
private ConcurrencyService cs;
private int inQueryQueue, inPoolQueue,
completedTasks, inProccessedSet,duplicates;
boolean sanityCheck;
StringBuffer sb;
PeriodicLogAction(PrintStream ps, long timePeriod) {
this.ps = ps;
this.period = timePeriod;
this.loop = true;
this.cs = ConcurrencyService.this;
}
// Alternative constructors
@SuppressWarnings("rawtypes")
public void run() {
logger.config("PeriodicLogAction started on thread: " +
Thread.currentThread().getName() +
System.lineSeparator());
while(loop){
// log # of paths created, analyzed and are in queue
outputLogInfo();
// wait designated time period
try {
Thread.sleep(period);
} catch (InterruptedException e) {}
if(cs.isDone()){
this.loop = false;
outputLogInfo();
}
}
}
private void outputLogInfo(){
synchronized (pool) {
Queue queryQueue = cs.bq,
poolQueue = cs.pool.getQueue();
Map<PathwayImpl,AnalysisResult> processedSet = cs.analyzedPaths;
inQueryQueue = queryQueue.size();
inPoolQueue = poolQueue.size();
completedTasks = (int) pool.getCompletedTaskCount();
inProccessedSet = processedSet.size();
duplicates = AnalysisAction.duplicates.get();
sanityCheck = (completedTasks == inProccessedSet + duplicates);
}
sb = new StringBuffer();
sb.append("Checkpoint ").append(++counter).append(": ")
.append("QQ: ").append(inQueryQueue).append("\t")
.append("PQ: ").append(inPoolQueue).append("\t")
.append("CT: ").append(completedTasks).append("\t")
.append("AP: ").append(inProccessedSet).append("\t")
.append("DP: ").append(duplicates).append("\t")
.append("Sanity: ").append(sanityCheck);
if(ps == null)
logger.info(sb.toString() + nL);
else
ps.println(sb.toString());
}
}
}
Here's what I see on the logs:
Sep 17, 2014 5:30:00 PM main.ConcurrencyService$QueryingAction run
INFO: Total number of queried paths: 81128
Sep 17, 2014 5:30:00 PM main.ConcurrencyService serve
INFO: All paths are queried and queued...
Initiating a timely shutdown of the pool..
...
Sep 17, 2014 5:49:49 PM main.ConcurrencyService serve
INFO: A total of 8620 tasks analyzed. Mean process time is: 1108.208 milliseconds.
...
Sep 17, 2014 5:49:54 PM main.ConcurrencyService$PeriodicLogAction outputLogInfo
INFO: Checkpoint 41: QQ: 0 PQ: 0 CT: 81128 AP: 8565 DP: 72508 Sanity: false
... which indicate:
The number of completed tasks is in agreement with the number of objects that were queried and queued. So nothing is missed.
The number of analyzed paths (thus results) and the number of duplicates do not add up to the number of completed tasks: 81128 - (8565 + 72508) = 55
The number of results accumulated does not match the process times reported from the AnalysisAction class: 8565 vs 8620 (i.e. there are 55 results missing)
I'm not sure what could be the cause of this discrepancy, or where to begin debugging. I obviously can't step through 81128 tasks to investigate which 55 are missing, and why.
Any suggestions?
EDIT: here are some clarifications addressing the question in the comments
DebugToolbox.submitProcTimes(long t) is a synchronized static method, it simply adds t to an ArrayList.
isDone() is a method in ConcurrencyService, I had accidentally removed it as I was trying to shorten the code I post here. I've edited the code to reflect how the method is implemented.
You check the map for presence of the key, then spend time generating the value, then put the value into the map.
While you're generating the value, another thread can process the same key. Since it's not been added yet, you now have two threads generating the same value. So the count of values generated is larger than the final size of the map.
The solution is to check for the key's presence and add the result (possibly a placeholder) in one atomic step using putIfAbsent().
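A minimal sketch of that idea, assuming the map is a ConcurrentMap as in your ConcurrencyService: each task first tries to claim the key with a placeholder, and only the thread whose putIfAbsent() returns null goes on to do the work. The names ClaimSketch, PENDING and analyzeOnce are hypothetical, and String keys and values stand in for Path and AnalysisResult.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

class ClaimSketch {

    // Hypothetical placeholder meaning "analysis in progress".
    static final String PENDING = "PENDING";

    static final AtomicInteger duplicates = new AtomicInteger();

    // Returns true if this call performed the analysis, false if the path was a duplicate.
    static boolean analyzeOnce(ConcurrentMap<String, String> analyzedPaths, String path) {
        // putIfAbsent is atomic: for each key exactly one caller sees null and wins the claim.
        if (analyzedPaths.putIfAbsent(path, PENDING) != null) {
            duplicates.incrementAndGet();
            return false;
        }
        String result = "result-for-" + path; // stand-in for the time-consuming analysis
        analyzedPaths.put(path, result);      // replace the placeholder with the real result
        return true;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> analyzedPaths = new ConcurrentHashMap<>();
        System.out.println(analyzeOnce(analyzedPaths, "path-1")); // prints true
        System.out.println(analyzeOnce(analyzedPaths, "path-1")); // prints false
    }
}
```

If the analysis can fail or be skipped, the placeholder stays in the map, so you would have to decide whether to remove it or to count such entries separately when doing the sanity check.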
I have a pool of objects in a BlockingQueue. Now I want to assign objects from the queue to a thread and use them inside the run method.
What is the best way of doing it?
Here is the sample code I am using to build the pool:
public class ConsumerPool {
private static Logger log;
//building consumer Pool
@SuppressWarnings("finally")
public BlockingQueue<OAuthConsumer> buildConsumerPool() {
BlockingQueue<OAuthConsumer> consumerObjectsQueue = null;
try {
//setting the config path
PropertyHandler.setConfigPath(propertiesMain);
String twitterPath = PropertyHandler.getProperty("twitterPath");
//setting config for twitter
PropertyHandler.setConfigPath(twitterPath);
//Blocking Linked Queue
consumerObjectsQueue = new LinkedBlockingQueue<OAuthConsumer>();
//fetching required tokens for all apps
String consumerKeySet = PropertyHandler.getProperty("consumerKey");
String consumerSecretSet = PropertyHandler.getProperty("consumerSecret");
String accessTokenSet = PropertyHandler.getProperty("accessToken");
String tokenSecretSet = PropertyHandler.getProperty("tokenSecret");
String[] splitconsumerKeys = consumerKeySet.split(",");
String[] splitconsumerSecret = consumerSecretSet.split(".");
String[] splitaccessToken = accessTokenSet.split(",");
String[] splittokenSecret = tokenSecretSet.split(".");
//creating consumer objects for each app
for (int i= 0; i< splitconsumerKeys.length; i++) {
log.info("constructing consumer object for twitter api " +i);
String consumerKey = splitconsumerKeys[i];
String consumerSecret = splitconsumerSecret[i];
String accessToken = splitaccessToken[i];
String tokenSecret = splittokenSecret[i];
OAuthConsumer consumer = new CommonsHttpOAuthConsumer(consumerKey, consumerSecret);
consumer.setTokenWithSecret(accessToken, tokenSecret);
consumerObjectsQueue.put(consumer);
log.info("added the consumer object to que pool");
}
} catch (Exception e) {
e.printStackTrace();
} finally {
return consumerObjectsQueue;
}
}
That is used to build the object pool.
Here is the way I want to create threads.
public class MrRunnable implements Runnable {
private String toFireUrl;
MrRunnable(String url){
}
@Override
public void run() {
// do some function here
}
}
public class Main {
public static void main(String[] args) {
// We will create 500 threads
for (int i = 0; i < 500; i++) {
Runnable task = new MrRunnable("some new url");
Thread worker = new Thread(task);
//start the thread
worker.start();
}
}
}
Now I want to access the objects in the pool from a thread. In the main program, should I pass the object from the consumer pool to the runnable class when creating the MrRunnable object, or is there another way to do it?
The constructor of MrRunnable should take a reference to the queue.
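Roughly like this, as a sketch rather than a drop-in replacement: the runnable keeps the queue, takes an object from it inside run(), and hands it back in a finally block so the next thread can reuse it. PooledRunnableSketch is a made-up name, and StringBuilder stands in for OAuthConsumer so that the example compiles on its own.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class PooledRunnableSketch implements Runnable {
    private final String toFireUrl;
    // StringBuilder is only a stand-in for OAuthConsumer in this sketch.
    private final BlockingQueue<StringBuilder> pool;

    PooledRunnableSketch(String url, BlockingQueue<StringBuilder> pool) {
        this.toFireUrl = url;
        this.pool = pool;
    }

    @Override
    public void run() {
        StringBuilder consumer = null;
        try {
            consumer = pool.take();                 // blocks until a pooled object is free
            consumer.append(toFireUrl).append(';'); // stand-in for signing and firing the request
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            if (consumer != null) {
                pool.offer(consumer);               // hand the object back for the next thread
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<StringBuilder> pool = new LinkedBlockingQueue<>();
        pool.add(new StringBuilder());
        pool.add(new StringBuilder());

        Thread[] workers = new Thread[10];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(new PooledRunnableSketch("url-" + i, pool));
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        System.out.println(pool.size() + " objects back in the pool");
    }
}
```

Taking the object inside run() rather than in the constructor means a thread only holds a pooled object while it is actually working, so 500 threads can share a handful of consumers.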