Strange issue regarding for-comprehension - java

I'm a newbie to the whole Scala scene but so far have been loving the ride! However, I got stuck with an issue and haven't yet been able to grasp the reason...
I'm currently working with Kafka and was trying to read data from a topic and pass it around to somewhere else.
The problem is: the println in the inner for-comprehension prints the lines shown at the bottom, as expected, but all the other println's outside that inner for are skipped and the function ends up returning nothing at all (I can't even call getClass in the test case!). What might be causing this? I've really run out of ideas...
The related code:
def tryBatchRead(maxMessages: Int = 100, skipMessageOnError: Boolean = true): List[String] = {
  var numMessages = 0L
  var list = List[String]()
  val iter = if (maxMessages >= 0) stream.slice(0, maxMessages) else stream
  for (messageAndTopic <- iter) {
    for (m <- messageAndTopic) {
      println(m.offset.toString + " --- " + new String(m.message))
      list = list ++ List(new String(m.message))
      println("DEBUG " + list)
      numMessages += 1
    }
    println("test1")
  }
  println("test2")
  println("FINISH" + list)
  connector.shutdown()
  println("test3")
  list
}
The output:
6 --- {"user":{"id":"4d9e3582-2d35-4600-b070-e4d92e42c534","age":25,"sex":"M","location":"PT"}}
DEBUG List({"user":{"id":"4d9e3582-2d35-4600-b070-e4d92e42c534","age":25,"sex":"M","location":"PT"}})
7 --- test 2
DEBUG List({"user":{"id":"4d9e3582-2d35-4600-b070-e4d92e42c534","age":25,"sex":"M","location":"PT"}}, test 2)
8 --- {"StartSurvey":{"user":{"id":"6a736fdd-79a0-466a-9030-61b5ac3a3a0e","age":25,"sex":"M","location":"PT"}}}
DEBUG List({"user":{"id":"4d9e3582-2d35-4600-b070-e4d92e42c534","age":25,"sex":"M","location":"PT"}}, test 2, {"StartSurvey":{"user":{"id":"6a736fdd-79a0-466a-9030-61b5ac3a3a0e","age":25,"sex":"M","location":"PT"}}})
Thanks for the help!

I'm not totally sure, but it's very likely that you block after reading the last message while waiting for the next one to arrive (Kafka streams are effectively infinite). Configure a timeout for the Kafka consumer, so it gives up if no message arrives for some time. There is a consumer.timeout.ms property for that (set it to 3000 ms, for example), which will result in a ConsumerTimeoutException once the waiting limit is reached.
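For reference, the property goes into the consumer configuration used when creating the connector; a minimal excerpt might look like this (consumer.timeout.ms is the stock setting name, the 3000 ms value is just an example):

```
# consumer configuration excerpt
consumer.timeout.ms=3000
```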
By the way, I would rewrite your code as:
def tryBatchRead(maxMessages: Int = 100): List[String] = {
  // `.take` works fine if the collection has fewer elements than the max
  val batchStream = stream.take(maxMessages)
  // TODO: add a try/catch section, according to the above comments
  // in Scala we usually write a single joined for instead of multiple nested ones
  val batch = for {
    messageAndTopic <- batchStream
    msg <- messageAndTopic // are you sure you can iterate message and topic? 0_o
  } yield {
    println(msg.offset.toString + " --- " + new String(msg.message))
    msg
  }
  println("Number of messages: " + batch.length)
  // shutdown has to be done outside; it's a bad idea to implicitly tear down streams in a reading function
  batch
}

I think this is normal behavior, since you are iterating over a stream which can in theory be infinite (so the loop may never end, or may hang while waiting for results over I/O...).
IMHO I would rather write for (m <- messageAndTopic.take(maxMessages).toList) instead of for (m <- messageAndTopic).

Setting Jenkins multijob build result with Groovy script based on % pass/fail child jobs

I have a Jenkins Multijob project with a very simple structure:
Multijob
childjob 1
childjob 2
childjob 3
childjob 4 etc...
I want to set the Multijob status as follows:
I want a green ball if all child jobs pass
I want a yellow ball if any are skipped OR < 25% fail
I want a red ball if >= 25% fail
I know I can use a Groovy post build action with a script such as that below, but I don't know how to set the required threshold levels:
void log(msg) {
    manager.listener.logger.println(msg)
}

threshold = Result.SUCCESS

void aggregate_results() {
    failed = false
    mainJob = manager.build.getProject().getName()
    job = hudson.model.Hudson.instance.getItem(mainJob)
    log '-------------------------------------------------------------------------------------'
    log 'Aggregated status report'
    log '-------------------------------------------------------------------------------------'
    log('${mainJob} #${manager.build.getNumber()} - ${manager.build.getResult()}')
    job.getLastBuild().getSubBuilds().each { subBuild ->
        subJob = subBuild.getJobName()
        subJobNumber = subBuild.getBuildNumber()
        job = hudson.model.Hudson.instance.getItem(subBuild.getJobName())
        log '${subJob} #${subJobNumber} - ${job.getLastCompletedBuild().getResult()}'
        log job.getLastCompletedBuild().getLog()
        //println subBuild
        dePhaseJob = hudson.model.Hudson.instance.getItem(subBuild.getJobName())
        dePhaseJobBuild = dePhaseJob.getBuildByNumber(subBuild.getBuildNumber())
        dePhaseJobBuild.getSubBuilds().each { childSubBuild ->
            try {
                log ' ${childSubBuild.jobName}'
                job = hudson.model.Hudson.instance.getItem(childSubBuild.getJobName())
                build = job.getBuildByNumber(childSubBuild.getBuildNumber())
                indent = ' '
                log '${indent} #${build.getNumber()} - ${build.getResult()}'
                log build.getLog()
                if(!failed && build.getResult().isWorseThan(threshold)) {
                    failed = true
                }
            } catch (Exception e) {
                log('ERROR: ${e.getMessage()}')
                failed = true
            }
        }
    }
    if(failed) { manager.build.setResult(hudson.model.Result.FAILURE) }
}

try {
    aggregate_results()
} catch(Exception e) {
    log('ERROR: ${e.message}')
    log('ERROR: Failed Status report aggregation')
    manager.build.setResult(hudson.model.Result.FAILURE)
}
Can anyone help tweak the script to achieve what I need?
Not sure if this really qualifies as an answer. Might be more of a comment but comments do not really lend themselves to long code snippets so here goes.
To make your code a tad more readable and easier to grok I did the following:
replaced all instances of getXX and setYY with groovy property access, eg build.getResult() becomes build.result
removed unnecessary use of parens in function calls, e.g. log('ERROR: ${e.getMessage()}') becomes log 'ERROR: ${e.getMessage()}'
replaced use of string interpolation in single quotes with double quotes since string interpolation does not work in single quotes. E.g. log 'ERROR: ${e.message}' becomes log "ERROR: ${e.message}"
switched from declaring all variables in the script global binding scope to local, e.g. subJob = ... becomes def subJob = .... Declaring everything in the global scope leads to hard-to-find issues, especially if you are re-using variable names like job.
I also cleaned out some redundancies, an example:
job.getLastBuild().getSubBuilds().each { subBuild ->
    subJob = subBuild.getJobName()
    subJobNumber = subBuild.getBuildNumber()
    job = hudson.model.Hudson.instance.getItem(subBuild.getJobName()) // <---
    ...
    //println subBuild
    dePhaseJob = hudson.model.Hudson.instance.getItem(subBuild.getJobName()) // <---
    dePhaseJobBuild = dePhaseJob.getBuildByNumber(subBuild.getBuildNumber())
so here we set both job and dePhaseJob to the same value. Assigning the same value to two separate variables like this is redundant and only makes the code harder to read.
Furthermore (and I'm not intimately familiar with the jenkins internal apis so I might be wrong here) the following flow in the above code seems off:
we have the subBuild instance
we then retrieve the corresponding job instance into both job and dePhaseJob
we then retrieve the build into dePhaseJobBuild using dePhaseJob.getBuildByNumber(subBuild.buildNumber)
but doesn't that leave us with subBuild == dePhaseJobBuild? I.e. we spent all this code just to retrieve a value we already had. We go from build to job and back to build. Unless I'm missing something esoteric in the Jenkins APIs, this seems redundant as well.
With all those changes and a few other minor ones we end up with the following code:
def job(name) {
    hudson.model.Hudson.instance.getItem(name)
}

def aggregateResults() {
    def mainJobName = manager.build.project.name
    log '-------------------------------------------------------------------------------------'
    log 'Aggregated status report'
    log '-------------------------------------------------------------------------------------'
    log "${mainJobName} #${manager.build.number} - ${manager.build.result}"
    def failed = false
    job(mainJobName).lastBuild.subBuilds.each { subBuild ->
        log "${subBuild.jobName} #${subBuild.buildNumber} - ${subBuild.result}"
        log subBuild.log
        subBuild.subBuilds.each { subSubBuild ->
            try {
                log "  ${subSubBuild.jobName} #${subSubBuild.buildNumber} - ${subSubBuild.result}"
                log "  " + subSubBuild.getLog(Integer.MAX_VALUE).join("\n  ") // indent the log lines
                if(!failed && subSubBuild.result.isWorseThan(threshold)) {
                    failed = true
                }
            } catch (Exception e) {
                log "ERROR: ${e.message}"
                failed = true
            }
        }
    }
    if(failed) {
        manager.build.result = hudson.model.Result.FAILURE
    }
}
and again, I don't have a jenkins instance to test this on so I'm flying in the dark here and apologize in advance for misspellings, syntax fumbles or other abuse of the code and the jenkins apis.
The issues in your code (like the string interpolation one, which I can't see ever having worked) make me think that the original code was not working code, but rather an example pattern.
This makes me further wonder if you really need to do two levels of nesting here, i.e. is the following:
job(mainJobName).lastBuild.subBuilds.each { subBuild ->
    subBuild.subBuilds.each { subSubBuild ->
        ...
    }
}
really necessary or would one level be enough? From the quick graph in your question it would seem that we only need to care about the main job and its sub jobs, not sub-sub jobs.
If this is the case, you could get away with logic along the lines of:
def aggregateResults() {
    def mainJob = job(manager.build.project.name)
    def subs = mainJob.lastBuild.subBuilds
    def total = subs.size()
    def failed = subs.findAll { sub -> sub.result.isWorseThan(threshold) }.size()
    if(failed > 0) {
        manager.build.result = hudson.model.Result.FAILURE
    }
    failed == 0 ? "green" : (failed/total < 0.25 ? "yellow" : "red")
}
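The color thresholds from the question can be isolated into a tiny pure function, independent of any Jenkins API. Here is a sketch in Python (the function name and signature are my own invention) of just the decision logic:

```python
def classify(total, failed, skipped=0):
    """Classify a multijob build from its child-job counts."""
    # red if at least 25% of child jobs failed
    if total > 0 and failed / float(total) >= 0.25:
        return "red"
    # yellow if any were skipped, or some (but fewer than 25%) failed
    if skipped > 0 or failed > 0:
        return "yellow"
    # green only when everything passed
    return "green"

print(classify(4, 0))  # -> green
print(classify(8, 1))  # -> yellow (12.5% failed)
print(classify(4, 1))  # -> red (25% failed)
```

Keeping the thresholds in one place like this makes them easy to test outside of Jenkins before wiring the result into manager.build.setResult.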

How to get 2 counts based on two different values in list and check the two counts are equal or not?

I have a list, "List issues", which holds all the issues from all the projects.
From this "issues" object I can get issues.Project and issues.Status inside the loop.
I wanted to do the below mentioned operations.
List<Issue> issues = issueCollector.get().getAppropriateIssues();
for (int i = 0; i < issues.size(); i++)
{
    Issue iss = issues.get(i);
}
eg:
**Project    IssueKey    Status**
PRJ 1        issKey 1    Closed
PRJ 1        issKey 2    Resolved
PRJ 2        issKey 1    Open
PRJ 3        issKey 1    Closed
PRJ 3        issKey 2    Resolved
PRJ 3        issKey 3    Closed
I want to get the count of issues per PROJECT and store it in a variable. How do I get values like the ones below and store them in a collection variable?
eg:
PROJECT | Count(Issues)
PRJ 1   | 2
PRJ 2   | 1
PRJ 3   | 3
I also want to get the count of issues in a project whose status is Closed or Resolved and store it in a variable. How do I get values like the ones below and store them in a collection variable?
eg:
PROJECT | Count(issues whose status is CLOSED or RESOLVED)
PRJ 1   | 2
PRJ 3   | 3
Then from these two variables, I want to check a condition like:
if (PRJ1(2 issues) == PRJ1(2 issues(with status)))
{
    // Add this PROJECT to a LIST of STRING
    List<String> val = new ArrayList<>();
    val.add(PROJECT);
}
For flexibility (it may be that you later have to check open issues, or the sum of this or that), I advise introducing a small class IssueStatus which keeps all the issue counts for a project. Java 8 allows you to construct it within another class, by the way.
class IssueStatus {
    int numOfClosed = 0;
    int numOfResolved = 0;
    int numOfOpen = 0;

    // not sure if status is a String or an enum
    void addStatusCount(String status) {
        // logic to increment the right counter,
        // e.g. if "closed", then numOfClosed++
    }

    int getNumOfClosed() { return numOfClosed; }
    int getNumOfResolved() { return numOfResolved; }
    int getNumOfOpen() { return numOfOpen; }
    int getTotalIssues() { return numOfClosed + numOfResolved + numOfOpen; }
}
You could consider adding the project name to the object, but here I've used a map to associate a status object with each project.
Map<String, IssueStatus> issueStatusMap = new ...
To populate the map, just use your loop
for (int i = 0; i < issues.size(); i++) {
    Issue iss = issues.get(i);
    // check if the given project is already in the map -> if not, add an IssueStatus instance
    if (!issueStatusMap.containsKey(iss.Project)) {
        issueStatusMap.put(iss.Project, new IssueStatus());
    }
    // add the issue's status count
    issueStatusMap.get(iss.Project).addStatusCount(iss.Status);
}
You could use Java 8's stream().forEach( ... ) to fill the map instead. Now it's easy to get statistics out of your map.
// now you only have to read off the data
// 1) sum of issues
for (Map.Entry<String, IssueStatus> entry : issueStatusMap.entrySet()) {
    System.out.println("project name: " + entry.getKey() + " has " + entry.getValue().getTotalIssues());
}
// or use the sum of the three getNum... methods

// 2) count only closed + resolved
for (Map.Entry<String, IssueStatus> entry : issueStatusMap.entrySet()) {
    IssueStatus is = entry.getValue();
    System.out.println("project name: " + entry.getKey() + " status count: closed + resolved = " + (is.getNumOfClosed() + is.getNumOfResolved()));
}
Of course you could do it all with Java 8 streams and group-by, but I don't advise it, because you would have to perform another pass over the issues for each statistic. That can be an expensive operation if the list of issues is very large.
For example, if you compute the total counts and the sum of "closed" and "resolved" issues using Collectors.groupingBy, you go through the issue list twice. My solution requires one pass, at the cost of some extra heap space to store the objects. And when gathering the data, another small loop goes over the per-project status objects instead of all the issues. (If there are 100 projects with 5000 issues, that's a big win.)
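For comparison, the two-pass counting described above can be sketched like this in Python (the issue tuples are made up to mirror the example table, and collections.Counter plays the role of Collectors.groupingBy with counting):

```python
from collections import Counter

# hypothetical (project, status) pairs mirroring the question's table
issues = [("PRJ 1", "Closed"), ("PRJ 1", "Resolved"), ("PRJ 2", "Open"),
          ("PRJ 3", "Closed"), ("PRJ 3", "Resolved"), ("PRJ 3", "Closed")]

# pass 1: total issues per project
totals = Counter(project for project, _ in issues)
# pass 2: only Closed/Resolved issues per project
done = Counter(project for project, status in issues
               if status in ("Closed", "Resolved"))

# projects where the two counts match, i.e. every issue is Closed or Resolved
matching = [p for p in totals if totals[p] == done.get(p, 0)]
print(matching)  # -> ['PRJ 1', 'PRJ 3']
```

This is the two-traversal shape the answer warns about; the single IssueStatus pass above collects both counts in one loop instead.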
Finally, to answer your last question (I admit this one isn't clear to me):
if(PRJ1(2 issues) == PRJ1(2 issues(with status)))
which is simply
IssueStatus status = issueStatusMap.get("<your projectName>");
if (status.getNum... == status.getNum...) {
    // do something
}
Use the Java 8 collectors API to perform the grouping. See this link: https://www.mkyong.com/java8/java-8-collectors-groupingby-and-mapping-example/
A simple approach, assuming that you actually have a Project class: you can use a Map<Project, List<Issue>> and work from there:
Map<Project, List<Issue>> issuesByProject = new HashMap<>();
for (Issue issue : issues) {
    if (issue status ... can be ignored) {
        continue;
    }
    Project proj = issue.getProject();
    if (issuesByProject.containsKey(proj)) {
        issuesByProject.get(proj).add(issue);
    } else {
        List<Issue> newListForProject = new ArrayList<>();
        newListForProject.add(issue);
        issuesByProject.put(proj, newListForProject);
    }
}
This code iterates your list (using the simpler and to-be-preferred for-each looping style). Then we first check if that issue needs to be processed (by checking its status for example). If not, we stop that loop iteration and hop to the next one (using continue). If processing is required, we check if that map contains a list for the current project; if so, we simply add that issue. If not, we create a new list, add the issue, and then put the list into the map.
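The build-or-append idiom in that loop is language-agnostic; for comparison, here is the same pattern in Python (with hypothetical (project, issueKey) tuples standing in for Issue objects), where setdefault collapses the if/else branch:

```python
# hypothetical issue records: (project, issue key)
issues = [("PRJ 1", "issKey 1"), ("PRJ 2", "issKey 1"), ("PRJ 1", "issKey 2")]

issues_by_project = {}
for project, key in issues:
    # create the per-project list on first sight, then append
    issues_by_project.setdefault(project, []).append(key)

print(issues_by_project["PRJ 1"])  # -> ['issKey 1', 'issKey 2']
```

Java offers the same shortcut via Map.computeIfAbsent, which removes the containsKey branch in the snippet above.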

How to implement early exit / return in Haskell?

I am porting a Java application to Haskell. The main method of the Java application follows the pattern:
public static void main(String[] args)
{
    if (args.length == 0)
    {
        System.out.println("Invalid number of arguments.");
        System.exit(1);
    }
    SomeDataType d = getData(args[0]);
    if (!dataOk(d))
    {
        System.out.println("Could not read input data.");
        System.exit(1);
    }
    SomeDataType r = processData(d);
    if (!resultOk(r))
    {
        System.out.println("Processing failed.");
        System.exit(1);
    }
    ...
}
So I have different steps and after each step I can either exit with an error code, or continue to the following step.
My attempt at porting this to Haskell goes as follows:
main :: IO ()
main = do
  args <- getArgs
  if null args
    then do
      putStrLn "Invalid number of arguments."
      exitWith (ExitFailure 1)
    else do
      -- The rest of the main function goes here.
With this solution, I will have lots of nested if-then-else (one for each exit point of the original Java code).
Is there a more elegant / idiomatic way of implementing this pattern in Haskell? In general, what is a Haskell idiomatic way to implement an early exit / return as used in an imperative language like Java?
A slightly more sensible approach in Haskell that uses the same sort of conditional logic you tried might look like this:
fallOverAndDie :: String -> IO a
fallOverAndDie err = do putStrLn err
                        exitWith (ExitFailure 1)

main :: IO ()
main = do a <- getArgs
          case a of
            [d] | dataOk d  -> doStuff $ processData d
                | otherwise -> fallOverAndDie "Could not read input data."
            _ -> fallOverAndDie "Invalid number of arguments."

processData r
  | not (resultOk r) = fallOverAndDie "Processing failed."
  | otherwise        = do -- and so on...
In this particular case, given that exitWith terminates the program anyway, we could also dispense with the nested conditionals entirely:
main :: IO ()
main = do a <- getArgs
          d <- case a of
                 [x] -> return x
                 _   -> fallOverAndDie "Invalid number of arguments."
          when (not $ dataOk d) $ fallOverAndDie "Could not read input data."
          let r = processData d
          when (not $ resultOk r) $ fallOverAndDie "Processing failed."
Using the same fallOverAndDie as before. This is a much more direct translation of the original Java.
In the general case, the Monad instance for Either lets you write something very similar to the latter example above in pure code. Starting from this instead:
fallOverAndDie :: String -> Either String a
fallOverAndDie = Left

notMain x = do a <- getArgsSomehow x
               d <- case a of
                      -- etc. etc.
...the rest of the code is unchanged from my second example. You can of course use something other than just String as well; to more faithfully recreate the IO version, you could use Either (String, ExitCode) instead.
Additionally, this use of Either is not limited to error handling--if you have some complicated calculation returning a Double, using Either Double Double and the same monadic style as above, you can use Left to bail out early with a return value, then wrap the function using something like either id id to collapse the two outcomes and get a single Double.
One way is to use the ErrorT monad transformer. With it, you can treat the computation like a regular monad (return, bind, all that good stuff), but you also get the function throwError. Calling it skips the following computations, either until you reach the end of the monadic computation or until catchError is called. This is meant for error handling, though; it's not intended for arbitrarily exiting a function in Haskell. I suggest it because it seems like error handling is what you're doing.
A quick example:
import Control.Monad.Error
import System.Environment

data IOErr = InvalidArgs String | GenErr String deriving (Show)

instance Error IOErr where
  strMsg = GenErr          -- called when fail is called
  noMsg  = GenErr "Error!"

type IOThrowsError = ErrorT IOErr IO

process :: IOThrowsError [String]
process = do
  a <- liftIO getArgs
  if length a == 0
    then throwError $ InvalidArgs "Expected Arguments, received none"
    else return a

main = do
  result <- runErrorT errableCode
  case result of
    Right a -> putStrLn $ show a
    Left e  -> putStrLn $ show e
  where errableCode = do
          a <- process
          useArgs a
Now if process throws an error, useArgs won't be executed.
This is what I have come up with:
data ExtendedMaybe a = Value a | GenErr String

isWrongArgs :: [String] -> ExtendedMaybe [String]
isWrongArgs p = if length p == 0
                then GenErr "Invalid number of arguments"
                else Value p

getData :: ExtendedMaybe [String] -> ExtendedMaybe SomeType
getData (GenErr e) = GenErr e
getData (Value s)  = ... -- if anything is wrong, return GenErr "could not read input data"

processData :: ExtendedMaybe SomeType -> ExtendedMaybe SomeType
processData (GenErr e) = GenErr e

main = do
  a <- getArgs
  let d = isWrongArgs a
      r = getData d
      f = processData r
  ...
Roughly, the idea is that you have a datatype like Maybe a, only instead of Nothing you have GenErr String, and every function that processes data handles it: if the input is a GenErr, simply return it unchanged; otherwise check the data and return GenErr with an appropriate string if something is wrong. This may not be the perfect way, but it is one way. It does not exit at the exact point of error, but it guarantees that not much happens after an error has occurred.

Introduce a counter into a loop within scala

I'm writing a small program which will convert a very large file into multiple smaller files, each file will contain 100 lines.
I'm iterating over a lines iteration :
while (lines.hasNext) {
  val line = lines.next()
}
I want to introduce a counter and when it reaches a certain value, reset the counter and proceed. In java I would do something like :
int counter = 0;
while (lines.hasNext()) {
    String line = lines.next();
    if (counter == 100) {
        counter = 0;
    }
    ++counter;
}
Is there something similar in scala or an alternative method ?
Traditionally in Scala you use .zipWithIndex:
scala> List("foo","bar")
res0: List[java.lang.String] = List(foo, bar)
scala> for((x,i) <- res0.zipWithIndex) println(i + " : " +x)
0 : foo
1 : bar
(this will work with your lines too, as long as they are an Iterator, i.e. have hasNext and next() methods, or any other Scala collection)
But if you need more complicated logic, like resetting the counter, you can write it the same way as in Java:
var counter = 0
while (lines.hasNext) {
  val line = lines.next()
  if (counter % 100 == 0) {
    // now write to another file
  }
  counter += 1
}
Maybe you can tell us why you want to reset the counter, so we can suggest a better way to do it?
EDIT
According to your update, this is better done with the grouped method, as @pr1001 proposed:
lines.grouped(100).foreach(l => l.foreach(/* write line to file*/))
If your resetting counter represents the fact that there are repeated groups of data in the original list, you might want to use the grouped method:
scala> val l = List("one", "two", "three", "four")
l: List[java.lang.String] = List(one, two, three, four)
scala> l.grouped(2).toList
res0: List[List[java.lang.String]] = List(List(one, two), List(three, four))
Update: Since you're reading from a file, you should be able to pretty efficiently iterate over the file:
val bigFile = io.Source.fromFile("/tmp/verybigfile")
val groupedLines = bigFile.getLines.grouped(2).zipWithIndex
groupedLines.foreach(group => {
  val (lines, index) = group
  val p = new java.io.PrintWriter("/tmp/" + index)
  lines.foreach(p.println)
  p.close()
})
Of course this could also be written as a for comprehension...
You might even be able to get better performance by converting groupedLines to a parallel collection with .par before writing out each group of lines to its own file.
This would work:
lines grouped 100 flatMap (_.zipWithIndex) foreach {
  case (line, count) => // whatever
}
You may use zipWithIndex along with some transformation.
scala> List(10, 20, 30, 40, 50).zipWithIndex.map(p => (p._1, p._2 % 3))
res0: List[(Int, Int)] = List((10,0), (20,1), (30,2), (40,0), (50,1))

node.js performance with zeromq vs. Python vs. Java

I've written a simple echo request/reply test for zeromq using node.js, Python, and Java. The code runs a loop of 100K requests. The platform is a 5yo MacBook Pro with 2 cores and 3G of RAM running Snow Leopard.
node.js is consistently an order of magnitude slower than the other two platforms.
Java:
real 0m18.823s
user 0m2.735s
sys 0m6.042s
Python:
real 0m18.600s
user 0m2.656s
sys 0m5.857s
node.js:
real 3m19.034s
user 2m43.460s
sys 0m24.668s
Interestingly, with Python and Java the client and server processes both use about half of a CPU. The client for node.js uses just about a full CPU and the server uses about 30% of a CPU. The client process also has an enormous number of page faults leading me to believe this is a memory issue. Also, at 10K requests node is only 3 times slower; it definitely slows down more the longer it runs.
Here's the client code (note that the process.exit() line doesn't work, either, which is why I included an internal timer in addition to using the time command):
var zeromq = require("zeromq");
var counter = 0;
var startTime = new Date();
var maxnum = 10000;

var socket = zeromq.createSocket('req');
socket.connect("tcp://127.0.0.1:5502");
console.log("Connected to port 5502.");

function moo()
{
    process.nextTick(function(){
        socket.send('Hello');
        if (counter < maxnum)
        {
            moo();
        }
    });
}
moo();

socket.on('message',
    function(data)
    {
        if (counter % 1000 == 0)
        {
            console.log(data.toString('utf8'), counter);
        }
        if (counter >= maxnum)
        {
            var endTime = new Date();
            console.log("Time: ", startTime, endTime);
            console.log("ms : ", endTime - startTime);
            process.exit(0);
        }
        //console.log("Received: " + data);
        counter += 1;
    }
);

socket.on('error', function(error) {
    console.log("Error: " + error);
});
Server code:
var zeromq = require("zeromq");

var socket = zeromq.createSocket('rep');
socket.bind("tcp://127.0.0.1:5502",
    function(err)
    {
        if (err) throw err;
        console.log("Bound to port 5502.");
        socket.on('message', function(envelope, blank, data)
        {
            socket.send(envelope.toString('utf8') + " Blancmange!");
        });
        socket.on('error', function(err) {
            console.log("Error: " + err);
        });
    }
);
For comparison, the Python client and server code:
import zmq

context = zmq.Context()
socket = context.socket(zmq.REQ)
socket.connect("tcp://127.0.0.1:5502")
for counter in range(0, 100001):
    socket.send("Hello")
    message = socket.recv()
    if counter % 1000 == 0:
        print message, counter
import zmq

context = zmq.Context()
socket = context.socket(zmq.REP)
socket.bind("tcp://127.0.0.1:5502")
print "Bound to port 5502."
while True:
    message = socket.recv()
    socket.send(message + " Blancmange!")
And the Java client and server code:
package com.moo.test;

import org.zeromq.ZMQ;
import org.zeromq.ZMQ.Context;
import org.zeromq.ZMQ.Socket;

public class TestClient
{
    public static void main(String[] args)
    {
        Context context = ZMQ.context(1);
        Socket requester = context.socket(ZMQ.REQ);
        requester.connect("tcp://127.0.0.1:5502");
        System.out.println("Connected to port 5502.");
        for (int counter = 0; counter < 100001; counter++)
        {
            if (!requester.send("Hello".getBytes(), 0))
            {
                throw new RuntimeException("Error on send.");
            }
            byte[] reply = requester.recv(0);
            if (reply == null)
            {
                throw new RuntimeException("Error on receive.");
            }
            if (counter % 1000 == 0)
            {
                String replyValue = new String(reply);
                System.out.println(replyValue + " " + counter);
            }
        }
        requester.close();
        context.term();
    }
}
package com.moo.test;

import org.zeromq.ZMQ;
import org.zeromq.ZMQ.Context;
import org.zeromq.ZMQ.Socket;

public class TestServer
{
    public static void main(String[] args)
    {
        Context context = ZMQ.context(1);
        Socket socket = context.socket(ZMQ.REP);
        socket.bind("tcp://127.0.0.1:5502");
        System.out.println("Bound to port 5502.");
        while (!Thread.currentThread().isInterrupted())
        {
            byte[] request = socket.recv(0);
            if (request == null)
            {
                throw new RuntimeException("Error on receive.");
            }
            if (!socket.send(" Blancmange!".getBytes(), 0))
            {
                throw new RuntimeException("Error on send.");
            }
        }
        socket.close();
        context.term();
    }
}
I would like to like node, but with the vast difference in code size, simplicity, and performance, I'd have a hard time convincing myself at this point.
So, has anyone seen behavior like this before, or did I do something asinine in the code?
You're using a third-party C++ binding. As far as I understand it, the crossover between V8's "JS land" and bindings to V8 written in "C++ land" is very expensive. If you notice, some popular database bindings for node are implemented entirely in JS (partly, I'm sure, because people don't want to compile things, but also because that has the potential to be very fast).
If I remember correctly, when Ryan Dahl was writing the Buffer objects for node, he noticed that they were actually a lot faster if he implemented them mostly in JS as opposed to C++. He ended up writing what he had to in C++, and did everything else in pure javascript.
So, I'm guessing part of the performance issue here has to do with that particular module being a c++ binding.
Judging node's performance based on a third party module is not a good medium for determining its speed or quality. You would do a lot better to benchmark node's native TCP interface.
"can you try to simulate logic from your Python example (e.i send next message only after receiving previous)?" – Andrey Sidorov Jul 11 at 6:24
I think that's part of it:
var zeromq = require("zeromq");
var counter = 0;
var startTime = new Date();
var maxnum = 100000;

var socket = zeromq.createSocket('req');
socket.connect("tcp://127.0.0.1:5502");
console.log("Connected to port 5502.");

socket.send('Hello');

socket.on('message',
    function(data)
    {
        if (counter % 1000 == 0)
        {
            console.log(data.toString('utf8'), counter);
        }
        if (counter >= maxnum)
        {
            var endTime = new Date();
            console.log("Time: ", startTime, endTime);
            console.log("ms : ", endTime - startTime);
            socket.close(); // or the process.exit(0) won't work.
            process.exit(0);
        }
        //console.log("Received: " + data);
        counter += 1;
        socket.send('Hello');
    }
);

socket.on('error', function(error) {
    console.log("Error: " + error);
});
This version doesn't exhibit the same increasing slowness as the previous, probably because it's not throwing as many requests as possible at the server and only counting responses like the previous version. It's about 1.5 times as slow as Python/Java as opposed to 5-10 times slower in the previous version.
Still not a stunning commendation of node for this purpose, but certainly a lot better than "abysmal".
This was a problem with the zeroMQ bindings of node.
I don't know since when, but it has been fixed, and you now get the same results as with the other languages.
I'm not all that familiar with node.js, but the way you're executing it recursively creates new functions over and over again, so it's no wonder it's blowing up. To be on par with Python or Java, the code needs to be more along the lines of:
if (counter < maxnum)
{
    socket.send('Hello');
    processmessages(); // or something similar in node.js, if available
}
Any performance testing using REQ/REP sockets is going to be skewed due to round-tripping and thread latencies. You're basically waking up the whole stack, all the way down and up, for each message. It's not very useful as a metric because REQ/REP cases are never high performance (they can't be). There are two better performance tests:
Sending many messages of various sizes from 1 byte to 1K, see how many you can send in e.g. 10 seconds. This gives you basic throughput. This tells you how efficient the stack is.
Measure the end-to-end latency of a stream of messages; i.e. insert a time stamp in each message and see what the deviation is on the receiver. This tells you whether the stack has jitter, e.g. due to garbage collection.
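The second test boils down to stamping a send time into each message and looking at the spread of per-message latencies on the receiver; the arithmetic is the same in any language (sketched here in Python, with made-up timestamps rather than a live socket):

```python
import statistics

# hypothetical send/receive timestamps (in seconds) for five messages
send_times = [0.000, 0.010, 0.020, 0.030, 0.040]
recv_times = [0.001, 0.011, 0.022, 0.031, 0.060]

# per-message latency: receive time minus the time stamped by the sender
latencies = [r - s for s, r in zip(send_times, recv_times)]
mean_latency = statistics.mean(latencies)
# the spread is the jitter; a large value hints at GC pauses and the like
jitter = statistics.pstdev(latencies)
```

In a real run you would stamp time.time() (or a monotonic clock) into each message body on the sender and compute these statistics on the receiver.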
Your Python client code blocks in the loop. In the node example, you receive messages asynchronously in the 'message' event handler. If all you want from your client is to receive data from zmq, the Python code will be more efficient, because it is coded as a specialized one-trick pony. If you want to add features, like listening for other events that aren't using zmq, you'll find it complicated to rewrite the Python code to do so; with node, all you need is to add another event handler. Node will never be a performance beast for simple examples. However, as your project gets more complicated, with more moving pieces, it's a lot easier to add features correctly to node than to the vanilla Python you've written. I'd much rather spend a bit more money on hardware, increase readability, and decrease my development time/cost.
