How to force remote-only reads in Cassandra3? - java

We are trying to modify the Cassandra code to perform ONLY remote reads (never read locally) for performance testing purposes of the Speculative Retry and Request Duplication latency reduction techniques.
So far we have modified
src/java/org/apache/cassandra/service/AbstractReadExecutor.java
to do something like this:
public abstract class AbstractReadExecutor {
protected int getNonLocalEndpointIndex (Iterable<InetAddress> endpoints) {
int endpoint_index = 0;
// iterate thru endpoints and pick non-local one
boolean found = false;
for (InetAddress e : endpoints) {
if (! StorageProxy.canDoLocalRequest(e) ) {
found = true;
break;
}
endpoint_index++;
}
if (!found) {
endpoint_index = 0;
}
return endpoint_index;
}
public static class NeverSpeculatingReadExecutor extends AbstractReadExecutor {
public void executeAsync() {
int endpoint_index = getNonLocalEndpointIndex(targetReplicas);
makeDataRequests(targetReplicas.subList(endpoint_index, endpoint_index+1));
if (targetReplicas.size() > 1)
makeDigestRequests(targetReplicas.subList(1, targetReplicas.size()));
}
}
}
However, this does not work: with small workloads, 5 Cassandra nodes, and a replication factor of 3, targetReplicas almost always contains just 1 endpoint (the local one).

If this is just for testing, you can put 1 node in the wrong DC and use LOCAL queries for data that node does not own (use a whitelist load-balancing policy on the driver to ensure requests go only to that node). You just need to make sure you only test against data that node doesn't own a copy of.
Or are you interested in doing things like testing the proxy mutations in the read repairs?

I was able to do only remote reads by adding a function getRemoteReplicas() that filters out the local nodes before/when the ReadExecutor object is created. consistencyLevel.filterForQuery() then usually returns just 1 node (a non-local one).
public static AbstractReadExecutor getReadExecutor(...) {
...
List<InetAddress> remoteReplicas = getRemoteReplicas( allReplicas );
List<InetAddress> targetReplicas = consistencyLevel.filterForQuery(keyspace, remoteReplicas, repairDecision);
...
}
private static List<InetAddress> getRemoteReplicas(List<InetAddress> replicas) {
logger.debug("ALL REPLICAS: " + replicas.toString());
List<InetAddress> remote_replicas = new ArrayList<>();
// iterate thru replicas and pick non-local one
boolean found = false;
for (InetAddress r : replicas) {
if (! StorageProxy.canDoLocalRequest(r) ) {
remote_replicas.add(r);
found = true;
}
}
if (!found) {
return replicas;
}
logger.debug("REMOTE REPLICAS: " + remote_replicas.toString());
return remote_replicas;
}
Both methods are in src/java/org/apache/cassandra/service/AbstractReadExecutor.java.

Related

How to use the same hashmap in multiple threads

I have a Hashmap that is created for each "mailer" class and each "agent" class creates a mailer.
My problem is that each of my "agents" creates a "mailer" that in turn creates a new hashmap.
What I'm trying to do is to create one Hashmap that will be used by all the agents(every agent is a thread).
This is the Agent class:
public class Agent implements Runnable {
private int id;
private int n;
private Mailer mailer;
private static int counter;
private List<Integer> received = new ArrayList<Integer>();
@Override
public void run() {
System.out.println("Thread has started");
n = 10;
if (counter < n - 1) {
this.id = ThreadLocalRandom.current().nextInt(0, n + 1);
counter++;
}
Message m = new Message(this.id, this.id);
this.mailer.getMap().put(this.id, new ArrayList<Message>());
System.out.println(this.mailer.getMap());
for (int i = 0; i < n; i++) {
if (i == this.id) {
continue;
}
this.mailer.send(i, m);
}
for (int i = 0; i < n; i++) {
if (i == this.id) {
continue;
}
if (this.mailer.getMap().get(i) == null) {
continue;
} else {
this.received.add(this.mailer.readOne(this.id).getContent());
}
}
System.out.println(this.id + "" + this.received);
}
}
This is the Mailer class :
public class Mailer {
private HashMap<Integer, List<Message>> map = new HashMap<>();
public void send(int receiver, Message m) {
synchronized (map) {
while (this.map.get(receiver) == null) {
this.map.get(receiver);
}
if (this.map.get(receiver) == null) {
} else {
map.get(receiver).add(m);
}
}
}
public Message readOne(int receiver) {
synchronized (map) {
if (this.map.get(receiver) == null) {
return null;
} else if (this.map.get(receiver).size() == 0) {
return null;
} else {
Message m = this.map.get(receiver).get(0);
this.map.get(receiver).remove(0);
return m;
}
}
}
public HashMap<Integer, List<Message>> getMap() {
synchronized (map) {
return map;
}
}
}
I have tried so far :
Creating the mailer object inside the run method in agent.
Going by the idea (based on your own answer to this question) that you made the map static, you've made 2 mistakes.
do not use static
static means there is one map for the entire JVM you run this on. This is not actually a good thing: Now you can't create separate mailers on one JVM in the future, and you've made it hard to test stuff.
You want something else: A way to group a bunch of mailer threads together (these are all mailers for the agent), but a bit more discerning than a simple: "ALL mailers in the ENTIRE system are all the one mailer for the one agent that will ever run".
A trivial way to do this is to pass the map in as argument. Alternatively, have the map be part of the agent, and pass the agent to the mailer constructor, and have the mailer ask the agent for the map every time.
this is not thread safe
Thread safety is a crucial concept to get right, because the failure mode if you get it wrong is extremely annoying: It may or may not work, and the JVM is free to base whether it'll work right this moment or won't work on the phase of the moon or the flip of a coin: The JVM is given room to do whatever it feels like it needs to, in order to have a JVM that can make full use of the CPU's powers regardless of which CPU and operating system your app is running on.
Your code is not thread safe.
At any given moment, if 2 threads are both referring to the same field, you've got a problem: you need to ensure that this is done 'safely'. Neither the compiler nor the runtime will throw errors if you fail to do this, but you will get bizarre behaviour because the JVM is free to give you caches, refuse to synchronize things, make ghosts of data appear, and more.
In this case the fix is near-trivial: Use java.util.concurrent.ConcurrentHashMap instead, that's all you'd have to do to make this safe.
Whenever you're interacting with a field that doesn't have a convenient thread-safe type, or you're messing with the field itself (one thread assigns a new value to the field, another reads it; you don't do that here, since there is just the one field that always points at the same map, but you are messing with the map), you need to use synchronized and/or volatile and/or locks from the java.util.concurrent package, and in general it gets very complicated. Concurrent programming is hard.
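To make the two suggestions above concrete, here is a minimal sketch, not the asker's actual classes: the map is created once and passed into each Mailer through its constructor, and it is a ConcurrentHashMap. The Message class, the SharedMapDemo class and the deliverAll method are hypothetical stand-ins. Two assumptions differ from the question's code: send() creates the receiver's inbox on demand instead of waiting for it, and each inbox is a ConcurrentLinkedQueue, because a ConcurrentHashMap only protects the map itself, not the lists stored in it.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the question's Message class.
final class Message {
    private final int sender;
    private final int content;
    Message(int sender, int content) { this.sender = sender; this.content = content; }
    int getSender() { return sender; }
    int getContent() { return content; }
}

// The mailer does not own the map; whoever creates the mailer hands it in.
final class Mailer {
    private final ConcurrentMap<Integer, Queue<Message>> inboxes;

    Mailer(ConcurrentMap<Integer, Queue<Message>> inboxes) {
        this.inboxes = inboxes;
    }

    void send(int receiver, Message m) {
        // computeIfAbsent atomically creates the inbox the first time it is needed.
        inboxes.computeIfAbsent(receiver, id -> new ConcurrentLinkedQueue<>()).add(m);
    }

    Message readOne(int receiver) {
        Queue<Message> inbox = inboxes.get(receiver);
        return inbox == null ? null : inbox.poll(); // null when nothing is waiting
    }
}

final class SharedMapDemo {
    // Starts agentCount threads that all share one map and returns the number of delivered messages.
    static int deliverAll(int agentCount) {
        ConcurrentMap<Integer, Queue<Message>> shared = new ConcurrentHashMap<>();
        Thread[] agents = new Thread[agentCount];
        for (int id = 0; id < agentCount; id++) {
            final int self = id;
            final Mailer mailer = new Mailer(shared); // one mailer per agent, same map for all
            agents[id] = new Thread(() -> {
                for (int other = 0; other < agentCount; other++) {
                    if (other != self) {
                        mailer.send(other, new Message(self, self));
                    }
                }
            });
            agents[id].start();
        }
        for (Thread t : agents) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        int total = 0;
        for (Queue<Message> inbox : shared.values()) {
            total += inbox.size();
        }
        return total;
    }
}
```

Because the map is an ordinary constructor argument, a test can create a separate map for each group of agents, which a static field would not allow.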
I was able to solve this by changing the mailer to static in the Agent class

java - fastest way to iterate through List<Map> and compare contents of Map to known data

I have a small java class that will be run on a cloud application server, so it needs to execute as fast as possible.
What I need to do is iterate over a List<Map>, get the contents of the current Map and then do some logic on it. The Map contains strings and doubles.
This is my current implementation:
for (int i = 0; i < count; i++) {
Map data = result.get(i);
double priceToCheck = Double.parseDouble(data.get("value").toString());
String criteria = data.get("criteria").toString();
String coin = data.get("coin").toString();
if (coin.equals("BTC")) {
if (criteria.equals("above")) {
if (BTC > priceToCheck) {
// create notficaition
sendNotification = true;
}
} else {
if (BTC < priceToCheck) {
// create notification
sendNotification = true;
}
}
} else if (coin.equals("BCH")) {
if (criteria.equals("above")) {
if (BCH > priceToCheck) {
// create notficaition
sendNotification = true;
}
} else {
if (BCH < priceToCheck) {
// create notification
sendNotification = true;
}
}
} else if (coin.equals("ETH")) {
if (criteria.equals("above")) {
if (ETH > priceToCheck) {
// create notficaition
sendNotification = true;
}
} else {
if (ETH < priceToCheck) {
// create notification
sendNotification = true;
}
}
} else if (coin.equals("ETC")) {
if (criteria.equals("above")) {
if (ETC > priceToCheck) {
// create notficaition
sendNotification = true;
}
} else {
if (ETC < priceToCheck) {
// create notification
sendNotification = true;
}
}
} else if (coin.equals("LTC")) {
if (criteria.equals("above")) {
if (LTC > priceToCheck) {
// create notficaition
sendNotification = true;
}
} else {
if (LTC < priceToCheck) {
// create notification
sendNotification = true;
}
}
} else if (coin.equals("XRP")) {
if (criteria.equals("above")) {
if (XRP > priceToCheck) {
// create notficaition
sendNotification = true;
}
} else {
if (XRP < priceToCheck) {
// create notification
sendNotification = true;
}
}
}
Where result is a List<Map>, "BTC" is a string and BTC is a double
As you can see the highest level if statements checks the string coin, there are six possible values. Once the coin is found I determine the value of criteria, and then do a comparison of doubles depending on the value of criteria
I feel as though this is a very cumbersome way of accomplishing this task. It works, but it's relatively slow. I can't think of a way to speed it up without directly accessing every Map element and manually checking the contents.
Does anyone else have any ideas?
"I have a small java class that will be run on a cloud application server, so it needs to execute as fast as possible."
First of all, there are some assumptions in there that are doubtful.
Yes, it is nice for your code to run as fast as possible, but in most cases it is not necessary. And in particular, the fact that you are running on a cloud server does not necessarily make it essential.
You assume your application is not running fast enough. (Have you benchmarked it?)
You assume that this part of the code is (or will be) responsible for the code being too slow. (Have you implemented it? Have you profiled it?)
And this:
"I feel as though this is a very cumbersome way of accomplishing this task, it works but it's relatively slow."
Cumbersome and slow are not the same. Often cumbersome / verbose / clunky code is faster than concise / elegant code.
So now to the potential performance issues with your code. (Bearing in mind that this could all be irrelevant if your assumptions are incorrect!)
If the fields are / can be known at compile time, it is better to use a custom class than a Map. The Map::get method will be orders of magnitude slower than a getter on a custom class, and a Map will typically use an order of magnitude more memory.
A custom class will also allow you to use primitive types and so on, instead of shoehorning the values into String. Avoiding that will have performance benefits too.
This is slow:
double priceToCheck = Double.parseDouble(data.get("value").toString());
You appear to be taking a double (or a Double), converting it to a string and then converting it back to a double. Conversions between numbers and decimal strings are relatively expensive.
If the values of coin and criteria are known at compile time, consider using an enum or boolean rather than a String. Comparison will be faster, and you will be able to use a switch statement ... or a simple if in the boolean case.
In fact, the iteration through the list is the one aspect where there is not a lot of opportunity to optimize.
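The suggestions above (a custom class, primitive fields, enums instead of strings) could look roughly like the following sketch. All names here (Coin, Criteria, Alert, AlertChecker) are hypothetical, and the sketch assumes the current prices are available in a map keyed by coin rather than in six separate variables. It has not been benchmarked against the original loop.

```java
import java.util.List;
import java.util.Map;

// The six coin symbols from the question, as an enum instead of strings.
enum Coin { BTC, BCH, ETH, ETC, LTC, XRP }

// The question treats anything that is not "above" as "below".
enum Criteria { ABOVE, BELOW }

// Hypothetical replacement for one Map entry of the original List<Map>.
final class Alert {
    final Coin coin;
    final Criteria criteria;
    final double threshold; // stored as a primitive, so no string parsing per check

    Alert(Coin coin, Criteria criteria, double threshold) {
        this.coin = coin;
        this.criteria = criteria;
        this.threshold = threshold;
    }

    boolean isTriggered(double currentPrice) {
        return criteria == Criteria.ABOVE
                ? currentPrice > threshold
                : currentPrice < threshold;
    }
}

final class AlertChecker {
    // Returns true as soon as one alert fires; the rest of the list is not visited.
    static boolean anyTriggered(List<Alert> alerts, Map<Coin, Double> prices) {
        for (Alert alert : alerts) {
            Double price = prices.get(alert.coin);
            if (price != null && alert.isTriggered(price)) {
                return true;
            }
        }
        return false;
    }
}
```

With this shape the six near-identical if/else branches collapse into one comparison, and an EnumMap can be passed as the prices argument since its keys are enum constants.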
There is one optimization that really stands out here. Your original code snippet iterates through the entire list just to set your sendNotification boolean.
If setting sendNotification is all you need to do, you can place a break in each of your conditionals. This exits the loop as soon as the flag has been set. Only in the worst case would you end up iterating through the entire list.
for (int i = 0; i < count; i++) {
if (coin.equals("BTC")) {
if (criteria.equals("above")) {
if (BTC > priceToCheck) {
// create notficaition
sendNotification = true;
break; // exits the loop
}
} else {
if (BTC < priceToCheck) {
// create notification
sendNotification = true;
break;
}
}
} else if (coin.equals("BCH")) {
if (criteria.equals("above")) {
if (BCH > priceToCheck) {
// create notficaition
sendNotification = true;
break;
}

Compose variable number of ListenableFuture

I'm quite new to Futures and am stuck on chaining calls and creating a list of objects. I'm using Android, API min is 19.
I want to code the method getAllFoo() below:
ListenableFuture<List<Foo>> getAllFoo() {
// ...
}
I have these 2 methods available:
ListenableFuture<Foo> getFoo(int index) {
// gets a Foo by its index
}
ListenableFuture<Integer> getNbFoo() {
// gets the total number of Foo objects
}
Method Futures.allAsList() would work nicely here, but my main constraint is that each call to getFoo(int index) cannot occur until the previous one is completed.
As far as I understand it (and tested it), Futures.allAsList() "fans-out" the calls (all the calls start at the same time), so I can't use something like that:
ListenableFuture<List<Foo>> getAllFoo() {
// ...
List<ListenableFuture<Foo>> allFutureFoos = new ArrayList<>();
for (int i = 0; i < size; i++) {
allFutureFoos.add(getFoo(i));
}
ListenableFuture<List<Foo>> allFoos = Futures.allAsList(allFutureFoos);
return allFoos;
}
I have this kind of (ugly) solution (that works):
// ...
final SettableFuture<List<Foo>> future = SettableFuture.create();
List<Foo> listFoos = new ArrayList<>();
addFooToList(future, 0, nbFoo, listFoos);
// ...
private ListenableFuture<List<Foo>> addFooToList(final SettableFuture<List<Foo>> future, final int idx, final int size, final List<Foo> allFoos) {
    Futures.addCallback(getFoo(idx), new FutureCallback<Foo>() {
        @Override
        public void onSuccess(Foo foo) {
            allFoos.add(foo);
            if ((idx + 1) < size) {
                // Request the next Foo only after this one has completed.
                addFooToList(future, idx + 1, size, allFoos);
            } else {
                future.set(allFoos);
            }
        }
        @Override
        public void onFailure(Throwable throwable) {
            future.setException(throwable);
        }
    });
    return future;
}
How can I implement that elegantly using ListenableFuture ?
I found multiple related topics (like this or that), but these are using "coded" transform, and are not based on a variable number of transformations.
How can I compose ListenableFutures and get the same return value as Futures.allAsList(), but by chaining calls (fan-in)?
Thanks !
As a general rule, it's better to chain derived futures together with transform/catching/whenAllSucceed/whenAllComplete than with manual addListener/addCallback calls. The transformation methods can do some more for you:
present fewer opportunities to forget to set an output, thus hanging the program
propagate cancellation
avoid retaining memory longer than needed
do tricks to reduce the chance of stack overflows
Anyway, I'm not sure there's a particularly elegant way to do this, but I suggest something along these lines (untested!):
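// ListenableFuture itself has no transformAsync method, so this uses the static
// Futures.transformAsync (FluentFuture.from(...) would be the fluent alternative).
ListenableFuture<Integer> countFuture = getNbFoo();
return Futures.transformAsync(
    countFuture,
    count -> {
        List<ListenableFuture<Foo>> results = new ArrayList<>();
        ListenableFuture<?> previous = countFuture;
        for (int i = 0; i < count; i++) {
            final int index = i;
            // getFoo(index) is only called once the previous future has completed.
            ListenableFuture<Foo> current = Futures.transformAsync(
                previous,
                unused -> getFoo(index),
                directExecutor());
            results.add(current);
            previous = current;
        }
        return Futures.allAsList(results);
    },
    directExecutor());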

How to remove elements from a queue in Java with a loop

I have a data structure like this:
BlockingQueue mailbox = new LinkedBlockingQueue();
I'm trying to do this:
for(Mail mail: mailbox)
{
if(badNews(mail))
{
mailbox.remove(mail);
}
}
Obviously, modifying the queue inside the loop interferes with the iteration and an error is triggered, so I would normally do this:
for(int i = 0; i < mailbox.size(); i++)
{
if(badNews(mailbox.get(i)))
{
mailbox.remove(i);
i--;
}
}
But sadly BlockingQueue's don't have a function to get or remove an element by index, so I'm stuck. Any ideas?
Edit - A few clarifications:
One of my goals is to maintain the same ordering, so popping from the head and putting it back into the tail is no good. Also, although no other threads will remove mail from a mailbox, they will add to it, so I don't want to be in the middle of a removal algorithm, have someone send me mail, and then have an exception occur.
Thanks in advance!
You may poll and offer all the elements in your queue until you make a complete loop over your queue. Here's an example:
Mail firstMail = mailbox.peek();
Mail currentMail = mailbox.poll();
while (true) {
//a base condition to stop the loop
Mail tempMail = mailbox.peek();
if (tempMail == null || tempMail.equals(firstMail)) {
mailbox.offer(currentMail);
break;
}
//if there's nothing wrong with the current mail, then re add to mailbox
if (!badNews(currentMail)) {
mailbox.offer(currentMail);
}
currentMail = mailbox.poll();
}
Note that this approach will work only if this code is executed in a single thread and there's no other thread that removes items from this queue.
Maybe you need to check if you really want to poll or take the elements from the BlockingQueue. Similar for offer and put.
More info:
Java BlockingQueue take() vs poll()
LinkedBlockingQueue put vs offer
Another less buggy approach is using a temporary collection, not necessarily concurrent, and store the elements you still need in the queue. Here's a kickoff example:
List<Mail> mailListTemp = new ArrayList<>();
while (mailbox.peek() != null) {
Mail mail = mailbox.take();
if (!badNews(mail)) {
mailListTemp.add(mail);
}
}
for (Mail mail : mailListTemp) {
mailbox.offer(mail);
}
I looked over the solutions posted and I think I found a version that serves my purposes. What do you think about this one?
int size = mailbox.size();
for(int i = 0; i < size; i++)
{
Mail currentMail = mailbox.poll();
if (!badNews(currentMail))
mailbox.offer(currentMail);
}
Edit: A new solution that may be problem-free. What do you guys think?
while(true)
{
boolean badNewRemains = false;
for(Mail mail: mailbox)
{
if(badNews(mail))
{
badNewRemains = true;
mailbox.remove(mail);
break;
}
}
if(!badNewRemains)
break;
}
You can easily implement a queue that fits your needs, and you will have to if the provided API doesn't offer such a feature.
Something like:
import java.util.Iterator;
import java.util.LinkedList;
class Mail {
boolean badMail;
}
class MailQueue {
private LinkedList<Mail> backingQueue = new LinkedList<>();
private final Object lock = new Object();
public void push(Mail mail){
synchronized (lock) {
backingQueue.addLast(mail);
if(backingQueue.size() == 1){
// this is only element in queue, i.e. queue was empty before, so invoke if any thread waiting for mails in queue.
lock.notify();
}
}
}
public Mail pop() throws InterruptedException{
synchronized (lock) {
while(backingQueue.isEmpty()){
// no elements in queue, wait.
lock.wait();
}
return backingQueue.removeFirst();
}
}
public boolean removeBadMailsInstantly() {
synchronized (lock) {
boolean removed = false;
Iterator<Mail> iterator = backingQueue.iterator();
while(iterator.hasNext()){
Mail mail = iterator.next();
if(mail.badMail){
iterator.remove();
removed = true;
}
}
return removed;
}
}
}
This queue is thread-safe for both push and pop, and removeBadMailsInstantly can also be called safely from multiple threads, because every method synchronizes on the same lock. You can extend the queue with more operations in the same way, and writing it yourself is a good way to learn the concepts of multithreading.
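For comparison with the hand-written queue, here is a minimal sketch that stays with LinkedBlockingQueue and removes elements in place with Collection.removeIf (Java 8 and later). It relies on the documented behaviour that LinkedBlockingQueue iterators are weakly consistent: they do not throw ConcurrentModificationException and tolerate other threads adding mail during the pass. The Mail fields, the MailboxCleaner class and the body of badNews are hypothetical stand-ins for the asker's types.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the asker's Mail class.
final class Mail {
    final String subject;
    final boolean bad;

    Mail(String subject, boolean bad) {
        this.subject = subject;
        this.bad = bad;
    }
}

final class MailboxCleaner {
    // Hypothetical stand-in for the asker's badNews(mail) check.
    static boolean badNews(Mail mail) {
        return mail.bad;
    }

    // Removes every bad mail in place; the remaining mail keeps its original order.
    // Returns true if at least one element was removed.
    static boolean removeBadNews(BlockingQueue<Mail> mailbox) {
        return mailbox.removeIf(MailboxCleaner::badNews);
    }

    public static void main(String[] args) {
        BlockingQueue<Mail> mailbox = new LinkedBlockingQueue<>();
        mailbox.add(new Mail("invoice", false));
        mailbox.add(new Mail("bad news", true));
        mailbox.add(new Mail("newsletter", false));

        removeBadNews(mailbox);
        for (Mail mail : mailbox) {
            System.out.println(mail.subject);
        }
    }
}
```

Nothing is polled and re-offered, so the ordering requirement from the question holds. Mail added by another thread while the pass is running may or may not be inspected by that pass.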

Implementing Threads Into Java Web Crawler

Here is the original web crawler that I wrote (just for reference):
https://github.com/domshahbazi/java-webcrawler/tree/master
This is a simple web crawler which visits a given initial web page, scrapes all the links from the page and adds them to a Queue (LinkedList). The links are then popped off one by one and visited, and the cycle starts again. To speed up my program, and for learning, I tried to implement it using threads so I could have many threads operating at once, indexing more pages in less time. Below is each class:
Main class
public class controller {
public static void main(String args[]) throws InterruptedException {
DataStruc data = new DataStruc("http://www.imdb.com/title/tt1045772/?ref_=nm_flmg_act_12");
Thread crawl1 = new Crawler(data);
Thread crawl2 = new Crawler(data);
crawl1.start();
crawl2.start();
}
}
Crawler Class (Thread)
public class Crawler extends Thread {
/** Instance of Data Structure **/
DataStruc data;
/** Number of page connections allowed before program terminates **/
private final int INDEX_LIMIT = 10;
/** Initial URL to visit **/
public Crawler(DataStruc d) {
data = d;
}
public void run() {
// Counter to keep track of number of indexed URLS
int counter = 0;
// While URL's left to visit
while((data.url_to_visit_size() > 0) && counter<INDEX_LIMIT) {
// Pop next URL to visit from stack
String currentUrl = data.getURL();
try {
// Fetch and parse HTML document
Document doc = Jsoup.connect(currentUrl)
.userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36")
.referrer("http://www.google.com")
.timeout(12000)
.followRedirects(true)
.get();
// Increment counter if connection to web page succeeds
counter++;
/** .select returns a list of elements (links in this case) **/
Elements links = doc.select("a[href]"); // Relative URL
// Add newly found links to stack
addLinksToQueue(links);
} catch (IOException e) {
//e.printStackTrace();
System.out.println("Error: "+currentUrl);
}
}
}
public void addLinksToQueue(Elements el) {
// For each element in links
for(Element e : el) {
String theLink = e.attr("abs:href"); // 'abs' prefix ensures absolute url is returned rather then relative url ('www.reddit.com/hello' rather then '/hello')
if(theLink.startsWith("http") && !data.oldLink(theLink)) {
data.addURL(theLink);
data.addVisitedURL(theLink); // Register each unique URL to ensure it isnt stored in 'url_to_visit' again
System.out.println(theLink);
}
}
}
}
DataStruc Class
public class DataStruc {
/** Queue to store URL's, can be accessed by multiple threads **/
private ConcurrentLinkedQueue<String> url_to_visit = new ConcurrentLinkedQueue<String>();
/** ArrayList of visited URL's **/
private ArrayList<String> visited_url = new ArrayList<String>();
public DataStruc(String initial_url) {
url_to_visit.offer(initial_url);
}
// Method to add seed URL to queue
public void addURL(String url) {
url_to_visit.offer(url);
}
// Get URL at front of queue
public String getURL() {
return url_to_visit.poll();
}
// URL to visit size
public int url_to_visit_size() {
return url_to_visit.size();
}
// Add visited URL
public void addVisitedURL(String url) {
visited_url.add(url);
}
// Checks if link has already been visited
public boolean oldLink(String link) {
for(String s : visited_url) {
if(s.equals(link)) {
return true;
}
}
return false;
}
}
DataStruc is the shared data structure class, which will be concurrently accessed by each instance of a Crawler.java thread. DataStruc has a Queue to store links to be visited, and an ArrayList to store visited URLs, to prevent entering a loop. I used a ConcurrentLinkedQueue to store the URLs to be visited, as I see it takes care of concurrent access. I didn't think I needed concurrent access for my ArrayList of visited URLs, as all I need to be able to do is add to it and iterate over it to check for matches.
My problem is that when I compare the running time of a single thread vs. 2 threads (on the same URL), my single-threaded version seems to be faster. I feel I have implemented the threading incorrectly, and would like some tips if anybody can pinpoint the issues.
Thanks!
Added: see my comment, I think the check in Crawler
// While URL's left to visit
while((data.url_to_visit_size() > 0) && counter<INDEX_LIMIT) {
is wrong. The 2nd Thread will stop immediately since the 1st Thread polled the only URL.
You can ignore the remaining, but left for history ...
My general approach to such types of "big blocks that can run in parallel" is:
Make each crawler a Callable. Probably Callable<List<String>>
Submit them to an ExecutorService
When they complete, take the results one at a time and add them to a List.
Using this strategy there is no need to use any concurrent lists at all. The disadvantage is that you don't get much live feedback as they are running. And, if what they return is huge, you may need to worry about memory.
Would this suit your needs? You would have to worry about the addVisitedURL so you still need that as a concurrent data structure.
Added: Since you are starting with a single URL this strategy doesn't apply. You could apply it after the visit to the first URL.
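The Callable / ExecutorService strategy described above could be sketched as follows. This is an illustration under stated assumptions, not the asker's crawler: PageTask, ParallelCrawl and crawlOneLevel are hypothetical names, and the fetchLinks function is a hypothetical stand-in for the Jsoup fetch-and-parse step, so the sketch uses only the standard library. Only the submitting thread touches the set of seen URLs, which is why a plain LinkedHashSet is enough here.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// One unit of work: fetch a single page and return the links found on it.
final class PageTask implements Callable<List<String>> {
    private final String url;
    private final Function<String, List<String>> fetchLinks; // hypothetical: replaces the Jsoup call

    PageTask(String url, Function<String, List<String>> fetchLinks) {
        this.url = url;
        this.fetchLinks = fetchLinks;
    }

    @Override
    public List<String> call() {
        return fetchLinks.apply(url);
    }
}

final class ParallelCrawl {
    // Fetches all seed pages in parallel and returns the seeds plus every link found on them.
    static Set<String> crawlOneLevel(List<String> seeds,
                                     Function<String, List<String>> fetchLinks,
                                     int threads) {
        Set<String> seen = new LinkedHashSet<>(seeds); // touched by this thread only
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> pending = new ArrayList<>();
            for (String seed : seen) {
                pending.add(pool.submit(new PageTask(seed, fetchLinks)));
            }
            // Take the results one at a time; the set drops duplicate links.
            for (Future<List<String>> result : pending) {
                seen.addAll(result.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for crawl results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a crawl task failed", e.getCause());
        } finally {
            pool.shutdown(); // lets the JVM exit once the tasks are done
        }
        return seen;
    }
}
```

To go deeper than one level, the same method could be called again with the newly found links as seeds. As the answer notes, there is no live feedback while the tasks run, and the results stay in memory until they are collected.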
class controller {
public static void main(String args[]) throws InterruptedException {
final int LIMIT = 4;
List<String> seedList = new ArrayList<>(); //1
seedList.add("https://www.youtube.com/");
seedList.add("https://www.digg.com/");
seedList.add("https://www.reddit.com/");
seedList.add("https://www.nytimes.com/");
DataStruc[] data = new DataStruc[LIMIT];
for(int i = 0; i < LIMIT; i++){
data[i] = new DataStruc(seedList.get(i)); //2
}
ExecutorService es = Executors.newFixedThreadPool(LIMIT);
Crawler[] crawl = new Crawler[LIMIT];
for(int i = 0; i < LIMIT; i++){
crawl[i] = new Crawler(data[i]); //3
}
for(int i = 0; i < LIMIT; i++){
es.submit(crawl[i]); // 4
}
es.shutdown(); // stop accepting new tasks so the JVM can exit once the crawlers finish
}
}
You can try this out. The numbered comments in the code correspond to these steps:
1. Create a seed list.
2. Create the DataStruc objects and give each one a URL from the seed list.
3. Create the Crawler array and pass one DataStruc object to each crawler.
4. Pass each Crawler object to the executor.
