Round Robin for choosing a server - java

I have access to a few servers, say A, B, C, D and E. I'd like to fetch data from these servers one by one in a round-robin way. I am new to Java and threads, so it would be a great help if you could help me with this.
What I am trying to do is load a map in my application. I send HTTP requests to the servers; these servers return responses in Bitmap format, and I arrange these images (tiles) and show them in my application. But I am doing it sequentially, e.g. I request Server A first to get the tiles, then Server B, and so on. I would like to get the tiles in such a way that Server A downloads one image while Server B downloads another. If I did it all using one server without multithreading, it would take a long time to display the whole map.

Create a URL builder which holds the base URLs of each server in an array and also keeps track of which server was hit last time. The next time you need data, it just returns the base URL of the next server.
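A minimal sketch of such a builder (the hostnames and the tile path are placeholders; AtomicInteger keeps it safe when several download threads ask for the next server at once):

import java.util.concurrent.atomic.AtomicInteger;

/** Hands out base URLs in round-robin order; safe to share between download threads. */
public class RoundRobinUrlBuilder {
    private final String[] baseUrls;
    private final AtomicInteger counter = new AtomicInteger(0);

    public RoundRobinUrlBuilder(String... baseUrls) {
        this.baseUrls = baseUrls.clone();
    }

    /** Returns the next server's base URL, cycling A, B, C, D, E, A, ... */
    public String next() {
        // floorMod keeps the index non-negative even if the counter ever overflows
        return baseUrls[Math.floorMod(counter.getAndIncrement(), baseUrls.length)];
    }
}

Each download thread would then build something like builder.next() + "/tile/" + x + "/" + y + ".png" before issuing its HTTP request.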

Use modulo; see the example below (a String stands in for the URL):
public static final int MAX_SERVER = 4;

public static void main(String[] args)
{
    String urlarr[] = new String[MAX_SERVER];
    init(urlarr);
    int idx = 0;
    while (idx < 1000) {
        String next = urlarr[idx++ % urlarr.length];
        System.out.println(next);
    }
}

private static void init(String[] urlarr)
{
    for (int i = 0; i < urlarr.length; i++) {
        urlarr[i] = "url(" + i + ")";
    }
}
Using modulo with the array size on idx makes it iterate over all available indexes: 0, 1, 2, 3 in this case.
Part of the output:
url(0)
url(1)
url(2)
url(3)
url(0)
url(1)
url(2)
url(3)

Related

How to do the same work multiple times using Dataflow?

I have some work that needs to be done repeatedly. For example, let's say I want to roll 2000 dice and collect the results. The caveat is that each dice throw depends on a PCollection. How can this be done with Dataflow?
I've tried using a PCollectionList, but the result is that my Dataflow job is too large to start (> 10 MB). Here is an example of what I'd like to do (using PCollectionList):
// I'd like to operate on things 2000 times.
PCollection<Thing> things = ...;
List<PCollection<ModifiedThing>> modifiedThingsList = new ArrayList<>();
for (int i = 0; i < 2000; ++i) {
    modifiedThingsList.add(things.apply(ParDo.of(thing -> modify(thing))));
}
PCollection<ModifiedThing> modifiedThings =
    PCollectionList.of(modifiedThingsList).apply(Flatten.pCollections());
Because the JSON representation of the above graph is too large for Dataflow, I need a different way of representing this logic. Any ideas?
ParDo or FlatMapElements can return an arbitrarily large number of outputs per input. For example:
PCollection<ModifiedThing> modifiedThings = things.apply(
    ParDo.of(new DoFn<Thing, ModifiedThing>() {
        public void processElement(ProcessContext c) {
            for (int i = 0; i < 2000; ++i) {
                c.output(modify(c.element()));
            }
        }
    }));
Caveat: If you're going to immediately apply other ParDos to modifiedThings, be careful with fusion, since 2000 is a pretty high fan-out factor. A good example code snippet for preventing fusion is here.
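One common way to break fusion (a hedged sketch only; the key range of 100 is arbitrary, and KV, GroupByKey and ThreadLocalRandom are assumed to be imported) is to assign random keys, group, and un-key again, which forces a shuffle between the high-fan-out ParDo and its consumers:

// Sketch: force a shuffle after the fan-out step so downstream ParDos
// are not fused onto the same workers. The key range (100) is arbitrary.
PCollection<ModifiedThing> unfused = modifiedThings
    .apply(ParDo.of(new DoFn<ModifiedThing, KV<Integer, ModifiedThing>>() {
        public void processElement(ProcessContext c) {
            c.output(KV.of(ThreadLocalRandom.current().nextInt(100), c.element()));
        }
    }))
    .apply(GroupByKey.<Integer, ModifiedThing>create())
    .apply(ParDo.of(new DoFn<KV<Integer, Iterable<ModifiedThing>>, ModifiedThing>() {
        public void processElement(ProcessContext c) {
            for (ModifiedThing t : c.element().getValue()) {
                c.output(t);    // un-key: emit the grouped values individually
            }
        }
    }));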

Need some help for deeplearning4j single RBM usage

I have a bunch of sensors and I really just want to reconstruct the input.
So what I want is this:
after I have trained my model I will pass in my feature matrix
get the reconstructed feature matrix back
I want to investigate which sensor values are completely different from the reconstructed value
Therefore I thought an RBM would be the right choice, and since I am used to Java, I tried deeplearning4j. But I got stuck very early. If you run the following code, I face two problems:
The result is far from a correct prediction; most of the values are simply [1.00, 1.00, 1.00].
I would expect to get back 4 values (which is the number of inputs expected to be reconstructed).
So what do I have to tune to get a) a better result and b) get the reconstructed inputs back?
public static void main(String[] args) {
    // Customizing params
    Nd4j.MAX_SLICES_TO_PRINT = -1;
    Nd4j.MAX_ELEMENTS_PER_SLICE = -1;
    Nd4j.ENFORCE_NUMERICAL_STABILITY = true;
    final int numRows = 4;
    final int numColumns = 1;
    int outputNum = 3;
    int numSamples = 150;
    int batchSize = 150;
    int iterations = 100;
    int seed = 123;
    int listenerFreq = iterations / 5;

    DataSetIterator iter = new IrisDataSetIterator(batchSize, numSamples);
    // Loads data into generator and format consumable for NN
    DataSet iris = iter.next();
    iris.normalize();
    //iris.scale();
    System.out.println(iris.getFeatureMatrix());

    NeuralNetConfiguration conf = new NeuralNetConfiguration.Builder()
        // Gaussian for visible; Rectified for hidden
        // Set contrastive divergence to 1
        .layer(new RBM.Builder()
            .nIn(numRows * numColumns) // Input nodes
            .nOut(outputNum) // Output nodes
            .activation("tanh") // Activation function type
            .weightInit(WeightInit.XAVIER) // Weight initialization
            .lossFunction(LossFunctions.LossFunction.XENT)
            .updater(Updater.NESTEROVS)
            .build())
        .seed(seed) // Locks in weight initialization for tuning
        .iterations(iterations)
        .learningRate(1e-1f) // Backprop step size
        .momentum(0.5) // Speed of modifying learning rate
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT) // ^^ Calculates gradients
        .build();

    Layer model = LayerFactories.getFactory(conf.getLayer()).create(conf);
    model.setListeners(Arrays.asList((IterationListener) new ScoreIterationListener(listenerFreq)));
    model.fit(iris.getFeatureMatrix());

    System.out.println(model.activate(iris.getFeatureMatrix(), false));
}
For b), when you call activate(), you get a list of "nlayers" arrays. Every array in the list is the activation for one layer. The array itself is composed of rows: one row per input vector; each column contains the activation of one neuron in that layer for that observation (input).
Once all layers have been activated with some input, you can get the reconstruction with the RBM.propDown() method.
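A minimal sketch of that reconstruction step, built on the question's code (it assumes the trained Layer can be cast to RBM so its propDown() is available; not verified against a particular deeplearning4j version):

// Sketch (assumption: `model` from the question is actually an RBM instance).
// Forward-activate the visible data, then propagate the hidden activations
// back down to obtain the reconstructed 4-column inputs.
INDArray hidden = model.activate(iris.getFeatureMatrix(), false);
INDArray reconstructed = ((RBM) model).propDown(hidden);
System.out.println(reconstructed);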
As for a), I'm afraid it's very tricky to train an RBM correctly.
So you really want to play with every parameter and, more importantly,
monitor various metrics during training that will give you some hint about whether it's training correctly or not. Personally, I like to plot:
The score() on the training corpus, which is the reconstruction error after every gradient update; check that it decreases.
The score() on another development corpus: useful to be warned when overfitting occurs.
The norm of the parameter vector: it has a large impact on the score.
Both activation maps (= an XY rectangular plot of the activated neurons of one layer over the corpus), just after initialization and after N steps: this helps detect unreliable training (e.g. when everything is black/white, or when a large part of the neurons are never activated, etc.).

ZeroMQ: Disappearing messages

We have a Java application which is acting as a server. Client applications (written in C#) are communicating with it using ZeroMQ. We are (mostly) following the Lazy Pirate pattern.
The server has a Router socket, implemented as follows (using JeroMQ):
ZContext context = new ZContext();
Socket socket = context.createSocket(ZMQ.ROUTER);
socket.bind("tcp://*:5555");
The clients connect and send messages like this:
ZContext context = ZContext.Create();
ZSocket socket = ZSocket.Create(context, ZSocketType.REQ);
socket.Identity = Encoding.UTF8.GetBytes("Some identity");
socket.Connect("tcp://my_host:5555");
socket.Send(new ZFrame("request data"));
We have experienced lost messages when multiple clients are sending messages at the same time. With a single client, there doesn't appear to be any problem.
Are we implementing this the right way for a multiple-client-single-server setup?
Update: Example client and server exhibiting this behaviour:
Server:
import org.zeromq.ZContext;
import org.zeromq.ZMQ;
import org.zeromq.ZMQ.PollItem;
import org.zeromq.ZMQ.Poller;
import org.zeromq.ZMQ.Socket;
import org.zeromq.ZMsg;

public class SimpleServer
{
    public static void main(String[] args) throws InterruptedException
    {
        ZContext context = new ZContext();
        Socket socket = context.createSocket(ZMQ.ROUTER);
        socket.setRouterMandatory(true);
        socket.bind("tcp://*:5559");
        PollItem pollItem = new PollItem(socket, Poller.POLLIN);
        int messagesReceived = 0;
        int pollCount = 0;
        while ((pollCount = ZMQ.poll(new PollItem[]{pollItem}, 3000)) > -1)
        {
            messagesReceived += pollCount;
            for (int i = 0; i < pollCount; i++)
            {
                ZMsg msg = ZMsg.recvMsg(socket);
                System.out.println(String.format("Received message: %s. Total messages received: %d", msg, messagesReceived));
            }
            if (pollCount == 0)
            {
                System.out.println(String.format("No messages on socket. Total messages received: %d", messagesReceived));
            }
        }
    }
}
Client:
using NetMQ;
using System;
using System.Text;

namespace SimpleClient
{
    class Program
    {
        static byte[] identity = Encoding.UTF8.GetBytes("id" + DateTime.UtcNow.Ticks);

        static void Main(string[] args)
        {
            for (int i = 0; i < 100; i++)
            {
                SendMessage();
            }
        }

        private static void SendMessage()
        {
            using (NetMQContext context = NetMQContext.Create())
            {
                using (NetMQSocket socket = context.CreateRequestSocket())
                {
                    socket.Options.Identity = identity;
                    socket.Connect("tcp://localhost:5559");
                    socket.Send(Encoding.UTF8.GetBytes("hello!"));
                }
            }
        }
    }
}
If I run the server and a single client, I see all 100 messages arrive. If I run, say, 5 clients simultaneously, only around 200-300 messages arrive instead of the full 500. As an aside, it appears that closing the socket in the client briefly stops the router socket on the server from receiving messages, although this is just a theory.
Part 1 - poll may return more than one event
ZMQ.poll() returns the number of events that were found:
int rc = ZMQ.poll(new PollItem[]{pollItem}, 3000);
You currently assume that one return from poll is one event. Instead, you should call ZMsg msg = ZMsg.recvMsg(socket); once for each event indicated by the return value of ZMQ.poll().
From the source of JeroMQ:
/**
 * Polling on items. This has very poor performance.
 * Try to use zmq_poll with selector
 * CAUTION: This could be affected by jdk epoll bug
 *
 * @param items
 * @param timeout
 * @return number of events
 */
public static int zmq_poll(PollItem[] items, long timeout)
{
    return zmq_poll(items, items.length, timeout);
}
Part 2 - ZMsg.receive() may return multiple frames
When you receive a ZMsg from ZMsg msg = ZMsg.recvMsg(socket);, the ZMsg may contain multiple ZFrames, each containing client data.
From the comments of the ZMsg class in JeroMQ's source:
* // Receive message from ZMQSocket "input" socket object and iterate over frames
* ZMsg receivedMessage = ZMsg.recvMsg(input);
* for (ZFrame f : receivedMessage) {
* // Do something with frame f (of type ZFrame)
* }
Part 3 - messages can be split across multiple ZFrames
From ZFrame's source in JeroMQ:
* The ZFrame class provides methods to send and receive single message
* frames across 0MQ sockets. A 'frame' corresponds to one underlying zmq_msg_t in the libzmq code.
* When you read a frame from a socket, the more() method indicates if the frame is part of an
* unfinished multipart message.
If I'm understanding this correctly, then for each event you may get multiple frames, and one client message may map to 1..N frames (if the message is big?).
So to summarize:
One return from poll may indicate multiple events.
One event and thus one ZMsg.receive() may contain multiple frames
One frame could contain one complete client message or only part of a client message; one client message maps to 1..N frames.
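Putting the three parts together, a hedged sketch of a receive loop on the ROUTER side (socket and pollItem are as in the server code above; process(...) is a placeholder for your own handling):

// Sketch: drain every event reported by poll, and every frame of every message.
int events = ZMQ.poll(new PollItem[]{pollItem}, 3000);
for (int i = 0; i < events; i++) {
    ZMsg msg = ZMsg.recvMsg(socket);      // one event may carry a multi-frame message
    for (ZFrame frame : msg) {
        // frame.hasMore() indicates the frame is part of an unfinished multipart message (Part 3)
        process(frame.getData());         // placeholder for application logic
    }
}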
Unfortunately we couldn't solve this particular issue, and have moved away from using ZeroMQ for this interface. In case it helps anyone else, the only thing we worked out for definite is that rapidly opening/closing the request sockets caused undesirable behaviour (lost messages) on the router socket end. The problem was exacerbated by a poorly performing server CPU, and didn't appear at all when the server was on a fast multi-core machine.
Unfortunately I was nowhere near working with ZMQ at the time this question was active. But I had the same problem today and found this page, and your answer (moving away from ZMQ) was not satisfying for me. So I searched a bit more and finally found out what to do.
Just as a reminder: this works with the ZMQ Poller [1].
If you use a PAIR connection you will definitely NOT lose any messages, BUT send/receive takes approximately the same time either way, so you cannot speed things up; that was not a solution for me.
Solution:
With zmq_setsockopt (Python: zmq.setsockopt) you can set ZMQ_HWM (zmq.SNDHWM, zmq.RCVHWM) to 0 [2].
In Python: sock.setsockopt(zmq.SNDHWM, 0) for the sender and sock.setsockopt(zmq.RCVHWM, 0) for the receiver.
Note: I think the notation changed from a single HWM option to SNDHWM/RCVHWM.
HWM = 0 means there is NO limit on the number of queued messages (so be careful; maybe set a very high limit instead).
There is also ZMQ_SNDBUF/ZMQ_RCVBUF (Python: zmq.SNDBUF/zmq.RCVBUF) which you can set as well, e.g. sock.setsockopt(zmq.RCVBUF, 0), etc. [2].
Setting this to 0 leaves the operating system's SO_SNDBUF/SO_RCVBUF at its default (here my knowledge ends).
Setting this parameter or not did NOT influence my case, but I think it might matter in others.
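For the Java/JeroMQ side of this question, a rough equivalent sketch (assuming JeroMQ's ZMQ.Socket exposes the high-water-mark options as setSndHWM/setRcvHWM; check the API of the version you use):

// Sketch: disable the high-water marks so messages are queued instead of dropped
// (at the cost of unbounded memory). Method names assume JeroMQ's ZMQ.Socket API.
ZContext context = new ZContext();
Socket socket = context.createSocket(ZMQ.ROUTER);
socket.setSndHWM(0);   // ZMQ_SNDHWM = 0 -> "no limit"
socket.setRcvHWM(0);   // ZMQ_RCVHWM = 0 -> "no limit"
socket.bind("tcp://*:5559");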
Performance:
With this I could "send" 100,000 files of 98 kB each in ~8 s (~10 GB); this will fill your RAM (once it is full I think your program will slow down), see also the picture at the end of this answer.
In the meantime I "received" and saved the files in about ~118 s, freeing the RAM again.
Also, with that I have NEVER lost a file up to now. (You might if you hit the limits of your PC.)
Whether data loss is acceptable:
If you really NEED all the data, you should use this method.
If some losses are fine (e.g. live plotting: as long as your FPS is above ~50 you will see the plots smoothly and not care if you lose something),
then you can save RAM and avoid blocking your whole PC!
Hope this post helps the next person coming by...
[1]: https://learning-0mq-with-pyzmq.readthedocs.io/en/latest/pyzmq/multisocket/zmqpoller.htm
[2]: http://api.zeromq.org/2-1:zmq-setsockopt
A picture of the RAM usage showed: the RAM fills in about 8 s; afterwards the disk saves the files from the buffer.

native errors on DatagramChannel send

Basic
I have an app that is sending packets using DatagramChannel.send in multiple threads, each to its own IP address/port and each keeping a constant bit rate/bandwidth. Every now and then I get this error:
java.net.SocketException: Invalid argument: no further information
at sun.nio.ch.DatagramChannelImpl.send0(Native Method)
at sun.nio.ch.DatagramChannelImpl.sendFromNativeBuffer(Unknown Source)
at sun.nio.ch.DatagramChannelImpl.send(Unknown Source)
at sun.nio.ch.DatagramChannelImpl.send(Unknown Source)
...
It happens at random - sometimes 5 minutes after start, sometimes after a day - so I really have problems reproducing it for testing. And on my home machine I can't reproduce it at all.
Environments
Windows 7, 8 and Server 2012 (all 64bit)
64bit Java 7 update 45
More information
The app is sending SI/EIT data to a DVB-C network. I'm creating a list of 188-byte arrays for each of the 80-120 threads and giving it to the thread to use. The thread takes the list and loops over it until a new list is provided.
The error usually happens on multiple channels at once. But it can happen on just one also.
The error never happened until we had 40+ threads.
The error happens while looping over the list, not when I'm binding a new list to a thread.
The app is not running out of memory. It's usually using up to 70% of the memory given to the JVM.
Strange part: if I run multiple instances of the app, each handling ~10 threads, the problems are the same.
Simplified code sample
for (int i = 0; i < 100; ++i) {
    final int id = i;
    new Thread(new Runnable() {
        @Override
        public void run() {
            final Random r = new Random();
            final List<byte[]> buffer = Lists.newArrayList();
            for (int i = 0; i < 200; ++i) {
                final byte[] temp = new byte[188];
                r.nextBytes(temp);
                buffer.add(temp);
            }
            final SocketAddress target = new InetSocketAddress("230.0.0.18", 1000 + id);
            try (final DatagramChannel channel = DatagramChannel.open(StandardProtocolFamily.INET)) {
                channel.configureBlocking(false);
                channel.setOption(StandardSocketOptions.IP_MULTICAST_IF, NetworkInterface.getByName("eth0"));
                channel.setOption(StandardSocketOptions.IP_MULTICAST_TTL, 8);
                channel.setOption(StandardSocketOptions.SO_REUSEADDR, true);
                channel.setOption(StandardSocketOptions.SO_SNDBUF, 1024 * 64);
                int counter = 0;
                int index = 0;
                while (true) {
                    final byte[] item = buffer.get(index);
                    channel.send(ByteBuffer.wrap(item), target);
                    index = (index + 1) % buffer.size();
                    counter++;
                    Thread.sleep(1);
                }
            }
            catch (Exception e) {
                LOG.error("Fail at " + id, e);
            }
        }
    }).start();
}
Edits:
1) @EJP: I'm setting the multicast properties because the actual app I use was doing joins (and reading some data). But the problem persisted even after I removed them.
2) Should I be using some other API if I just need to send UDP packets? All the samples I could find use DatagramChannel (or its older alternative).
3) I'm still stuck on this. If anyone has an idea what I could even try, please let me know.
I had exactly the same problem, and it was caused by a zero port in the target InetSocketAddress when calling the send method.
In your code, the target port is defined as 1000 + id, so that doesn't seem to be the problem. Anyway, I'd log the target parameters that are used when the exception is thrown, just in case.
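A small sketch of that suggestion, wrapping the send call from the sample above so the destination gets logged the moment send0 rejects it:

// Sketch: log the exact target (and payload size) when send() fails,
// so a bad port/address can be ruled in or out.
try {
    channel.send(ByteBuffer.wrap(item), target);
} catch (IOException e) {
    LOG.error("send failed, target=" + target + ", payload=" + item.length + " bytes", e);
    throw e;
}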

Calculating the number of primes in a range RMI version vs concurrent version

I have two versions of a program with the same purpose: to calculate how many prime numbers there are between 0 and n.
The first version uses concurrency: a Callable class "does the math" and the results are retrieved through a Future array. There are as many created threads as processors in my computer (4).
The second version is implemented via RMI. All four servers are registered on the local host. The servers work in parallel as well, obviously.
I would expect the second version to be slower than the first, because I guess the network would introduce latency, whereas the other version just runs the program concurrently.
However, the RMI version is around twice as fast as the parallel version... Why is this happening?!
I didn't paste any code because it'd be huge, but ask for it if you need it and I'll see what I can do...
EDIT: adding the code. I replaced the sections of unnecessary code with comments.
Parallel version
public class taskPrimes implements Callable<Long>
{
    private final long x;
    private final long y;
    private Long total = new Long(0);

    public taskPrimes(long x, long y)
    {
        this.x = x;
        this.y = y;
    }

    public static boolean isPrime(long n)
    {
        if (n <= 1) return false;
        for (long i = 2; i <= Math.sqrt(n); i++)
            if (n % i == 0) return false;
        return true;
    }

    public Long call()
    {
        for (long i = x; i <= y; i++)
            if (isPrime(i)) total++;
        return total;
    }
}
public class paralelPrimes
{
    public static void main(String[] args) throws Exception
    {
        // here some variables...
        int nTasks = Runtime.getRuntime().availableProcessors();
        ArrayList<Future<Long>> partial = new ArrayList<Future<Long>>();
        ExecutorService ept = Executors.newFixedThreadPool(nTasks);
        for (int i = 0; i < nTasks; i++)
        {
            partial.add(ept.submit(new taskPrimes(x, y))); // x and y are the limits of the range
            // sliding window here
        }
        for (Future<Long> iterator : partial)
            try { total += iterator.get(); } catch (Exception e) {}
    }
}
RMI version
Server
public class serverPrimes
        extends UnicastRemoteObject
        implements interfacePrimes
{
    public serverPrimes() throws RemoteException {}

    @Override
    public int primes(int x, int y) throws RemoteException
    {
        int total = 0;
        for (int i = x; i <= y; i++)
            if (isPrime(i)) total++;
        return total;
    }

    @Override
    public boolean isPrime(int n) throws RemoteException
    {
        if (n <= 1) return false;
        for (int i = 2; i <= Math.sqrt(n); i++)
            if (n % i == 0) return false;
        return true;
    }

    public static void main(String[] args) throws Exception
    {
        interfacePrimes RemoteObject1 = new serverPrimes();
        interfacePrimes RemoteObject2 = new serverPrimes();
        interfacePrimes RemoteObject3 = new serverPrimes();
        interfacePrimes RemoteObject4 = new serverPrimes();
        Naming.bind("Server1", RemoteObject1);
        Naming.bind("Server2", RemoteObject2);
        Naming.bind("Server3", RemoteObject3);
        Naming.bind("Server4", RemoteObject4);
    }
}
Client
public class clientPrimes implements Runnable
{
    private int x;
    private int y;
    private interfacePrimes RemoteObjectReference;
    private static AtomicInteger total = new AtomicInteger();

    public clientPrimes(int x, int y, interfacePrimes RemoteObjectReference)
    {
        this.x = x;
        this.y = y;
        this.RemoteObjectReference = RemoteObjectReference;
    }

    @Override
    public void run()
    {
        try
        {
            total.addAndGet(RemoteObjectReference.primes(x, y));
        }
        catch (RemoteException e) {}
    }

    public static void main(String[] args) throws Exception
    {
        // some variables here...
        int nServers = 4;
        ExecutorService e = Executors.newFixedThreadPool(nServers);
        double t = System.nanoTime();
        for (int i = 1; i <= nServers; i++)
        {
            e.submit(new clientPrimes(xVentana, yVentana, (interfacePrimes) Naming.lookup("//localhost/Server" + i)));
            // sliding window here
        }
        e.shutdown();
        while (!e.isTerminated());
        t = System.nanoTime() - t;
    }
}
One interesting thing to consider is that, by default, the JVM runs in client mode. This means that threads won't be spread over the cores in the most aggressive way. Trying to run the program with the -server option can influence the result although, as mentioned, the algorithm design is crucial and the concurrent version may have bottlenecks. There is little chance that, given the problem, there is a bottleneck in your algorithm, but it surely needs to be considered.
The RMI version truly runs in parallel because each object runs on a different machine; since this tends to be a processing problem more than a communication problem, latency plays an unimportant part.
[UPDATE]
Now that I have seen your code, let's get into some more details.
You are relying on the ThreadPoolExecutor and Future to perform the thread control and synchronization for you. This means (per the documentation) that your running objects will be allocated on an existing thread, and once an object finishes its computation the thread will be returned to the pool; on the other hand, the Future will check periodically whether the computation has finished so it can collect the value.
This scenario is best suited to computations that are performed periodically, where the ThreadPool can increase performance by having the threads pre-allocated (paying the overhead of thread creation only the first time, when the threads aren't there yet).
Your implementation is correct, but it is more centered on programmer convenience (there is nothing wrong with this; I am always defending this point of view) than on system performance.
The RMI version performs differently due (mainly) to two things:
1 - You said you are running on the same machine; most OSes will recognize localhost, 127.0.0.1 or even the real self IP address as their own address and perform some optimizations on the communication, so there is little overhead from the network here.
2 - The RMI system will create a separate thread for each server object you created (as I mentioned before), and these servers start computing as soon as they get called.
Things you should try to experiment with:
1 - Try to run your RMI version truly on a network; if you can configure it for 10 Mbps, that would be better for seeing the communication overhead (although, since it is a one-shot communication, it may have too little impact to notice; you could change your client application to call the calculation multiple times, and then you would see the latency along the way).
2 - Try to change your parallel implementation to use Threads directly with no Future (you could use Thread.join to detect the end of execution) and then use the -server option on the machine (although, sometimes, the JVM performs a check to see whether the machine configuration can truly be said to be a server and will decline to move to that profile). The main problem is that if your threads don't get to use all the computer's cores you won't see any performance improvement. Also try to perform the calculations many times to overcome the overhead of thread creation. A sketch of this experiment follows below.
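A minimal sketch of experiment 2 (it reuses the question's taskPrimes.isPrime; the upper limit and the even range split are placeholders):

// Sketch: raw Threads plus Thread.join, no ExecutorService and no Future.
public class rawThreadPrimes
{
    public static void main(String[] args) throws Exception
    {
        final long n = 10_000_000L;                                   // placeholder upper limit
        final int nThreads = Runtime.getRuntime().availableProcessors();
        final long[] partial = new long[nThreads];
        final Thread[] threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++)
        {
            final int id = t;
            final long from = id * (n / nThreads);
            final long to = (id == nThreads - 1) ? n : from + n / nThreads - 1;
            threads[t] = new Thread(() -> {
                long count = 0;
                for (long i = from; i <= to; i++)
                    if (taskPrimes.isPrime(i)) count++;
                partial[id] = count;
            });
            threads[t].start();
        }
        long total = 0;
        for (int t = 0; t < nThreads; t++)
        {
            threads[t].join();                                        // wait for each worker to finish
            total += partial[t];
        }
        System.out.println("primes up to " + n + ": " + total);
    }
}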
Hope that helps to elucidate the situation :)
Cheers
It depends on how your algorithms are designed for the parallel and the concurrent solution. There is no criterion by which parallel must be better than concurrent or vice versa. For example, if your concurrent solution has many synchronized blocks it can drop your performance; in the other case, maybe the communication in your parallel algorithm is minimal and then there is no overhead on the network.
If you can get a copy of the book by Peter Pacheco, it can clear up some ideas: http://www.cs.usfca.edu/~peter/ipp/
Given the details you provided, it will mostly depend on how large a range you're using, and how efficiently you distribute the work to the servers.
For instance, I'll bet that for a small range N you will probably have no speedup from distributing via RMI. In this case, the RMI (network) overhead will likely outweigh the benefit of distributing over multiple servers. When N becomes large, and with an efficient distribution algorithm, this overhead will become more and more negligible with regards to the actual computation time.
For example, assuming homogeneous servers, a relatively efficient distribution could be to tell each server to compute the primes for all numbers n such that n % P = i, where n <= N, P is the number of servers, i is an index in the range [0, P-1] assigned to each server, and % is the modulo operation. A sketch of this distribution follows below.
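A hedged sketch of that distribution on the server side (primesByStride is a hypothetical extra remote method, not part of the question's interface):

// Hypothetical method: server i counts primes among the numbers n <= N with n % P == i,
// so each of the P servers gets roughly N/P candidates.
public int primesByStride(int N, int P, int i) throws RemoteException
{
    int total = 0;
    for (int n = i; n <= N; n += P)    // exactly the numbers with n % P == i
        if (isPrime(n)) total++;
    return total;
}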
