I am currently translating a legacy Groovy class and its methods to Java, and for most methods this has been easy, requiring only slight modifications.
Now I am stuck on a method that takes a closure as a parameter:
transformer.renameNumbers([:], { number ->
    return "${number.name}#somecompany.com"
})
}
The renameNumbers implementation is:
renameNumbers(Map<String,String> renameMap, someclosure = {it}) {
    numbers.each { it ->
        // Fall back to the closure when no new name has been chosen yet
        if (newNumbername == null) {
            newNumbername = someclosure.call(it)
        }
        if (newNumbername != null && newNumbername != it.number) {
            def oldNumber = it.number
            it.number = newNumbername
            log.info("Changed numbername key of from '$oldNumber' to '$newNumbername'")
        }
    }
}
The problem is that if I try to simply pass a plain object, as in transformer.renameNumbers(Map, Object),
it complains:
groovy.lang.MissingMethodException: No signature of method: org.eclipse.emf.ecore.util.EObjectContainmen.call() is applicable for argument types:
I guess it's because my normal Java Object doesn't have a call() method.
Is there a way to circumvent this? For example, could I create a custom Java class with its own call method?
Thanks
You could try using Java 8's functional interfaces, such as Function<T,R>, together with lambdas:
//Function<Number, String> f = (n) -> n.name + "#somecompany.com";
transformer.renameNumbers(new HashMap<>(), (n) -> n.name + "#somecompany.com");
The method itself then takes a Function instead of a closure:
void renameNumbers(Map<String, String> renameMap, Function<Number, String> somefunction) {
numbers.forEach(it -> {
String newNumbername = somefunction.apply(it); // <-----
if (newNumbername != null && !newNumbername.equals(it.number)) {
String oldNumber = it.number;
it.number = newNumbername;
log.info("Changed numbername key of from '" + oldNumber + "' to '" + newNumbername + "'");
}
});
}
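For completeness, here is a self-contained sketch of the same idea that can be compiled and run on its own. The Entry and Transformer classes are hypothetical stand-ins (the real classes are not shown in the question), and the rule that an entry in renameMap takes precedence over the function is an assumption about the intended behaviour, not something the original code confirms.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class RenameSketch {

    // Hypothetical stand-in for the question's elements; only the two fields used here.
    static class Entry {
        final String name;
        String number;

        Entry(String name, String number) {
            this.name = name;
            this.number = number;
        }
    }

    // Hypothetical transformer holding the entries to rename.
    static class Transformer {
        final List<Entry> numbers = new ArrayList<>();

        // The Function plays the role of the Groovy closure parameter.
        void renameNumbers(Map<String, String> renameMap, Function<Entry, String> fallback) {
            for (Entry entry : numbers) {
                // Assumption: an explicit mapping wins, otherwise the function decides.
                String newNumber = renameMap.get(entry.number);
                if (newNumber == null) {
                    newNumber = fallback.apply(entry);
                }
                // Strings are compared with equals(); != would compare references.
                if (newNumber != null && !newNumber.equals(entry.number)) {
                    entry.number = newNumber;
                }
            }
        }
    }

    public static void main(String[] args) {
        Transformer transformer = new Transformer();
        transformer.numbers.add(new Entry("alice", "100"));
        transformer.numbers.add(new Entry("bob", "200"));

        Map<String, String> renameMap = new HashMap<>();
        renameMap.put("200", "two-hundred");

        // The lambda replaces the Groovy closure from the question.
        transformer.renameNumbers(renameMap, entry -> entry.name + "#somecompany.com");

        for (Entry entry : transformer.numbers) {
            System.out.println(entry.name + " -> " + entry.number);
        }
    }
}
```

Because Function is a plain interface, callers can pass a lambda, a method reference, or an ordinary class that implements apply(), so nothing needs a Groovy-style call() method.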
I have a Map that has been sorted alphabetically by converting it to a TreeMap.
The Map uses a String (the installer file name) as the key and a Path (the installer's path on the file system) as the value, for instance:
Map<String, Path> installers;
I need to obtain the most recent installer file name. However, a regex seems like it would be too complicated.
The code I currently have to display the installers and their paths is this:
Map<String, Path> installers = findInstallers();
// Iterate over the typed entry set instead of a raw Iterator with casts
for (Map.Entry<String, Path> entry : installers.entrySet()) {
    String installerFile = entry.getKey();
    Path installerPath = entry.getValue();
    System.out.println(installerFile + " ==> " + installerPath);
}
System.out.println("================================");
private Map<String, Path> findInstallers() {
HashMap<String, Path> installerPathMap = new HashMap<>();
try {
Path productReleasePath = Paths.get("C:", "test");
List<Path> allPaths = Files.walk(productReleasePath)
.filter(Files::isRegularFile)
.collect(Collectors.toList());
allPaths.forEach(path -> {
if (!path.toFile().getName().toLowerCase().endsWith(".log")) {
String installerFileName = path.toFile().getName();
installerPathMap.put(installerFileName, path);
}
});
} catch (IOException e) {
e.printStackTrace();
}
return new TreeMap<>(installerPathMap);
}
This is a sample output:
Client_1.exe ==> C:\test\build_1\Win32\Client_1.exe
Client_5.exe ==> C:\test\build_5\Win32\Client_5.exe
Client_6.exe ==> C:\test\build_6\Win32\Client_6.exe
Server_1.exe ==> C:\test\build_1\Win64\Server_1.exe
Server_2.exe ==> C:\test\build_2\Win64\Server_2.exe
Server_Linux_1.tar.gz ==> C:\test\build_1\Linux32\Server_Linux_1.tar.gz
Server_Linux_2.tar.gz ==> C:\test\build_2\Linux32\Server_Linux_2.tar.gz
================================
I need to shorten my Map so that it contains only the highest-versioned key and its value for each installer, so the output is similar to this:
Client_6.exe ==> C:\test\build_6\Win32\Client_6.exe
Server_2.exe ==> C:\test\build_2\Win64\Server_2.exe
Server_Linux_2.tar.gz ==> C:\test\build_2\Linux32\Server_Linux_2.tar.gz
================================
Any help would be greatly appreciated.
If you add the paths to a map using the root of the installer name as the key (i.e. the part before the last underscore) and discard the lower version whenever there is a key collision, you'll get what you want.
Note that sorting the names alphabetically won't work, because version 9 would sort after version 10; you'll have to extract the version and compare it numerically.
I'm not certain of your naming convention, but the helper functions in the following example should be easy enough to modify if my assumptions aren't correct.
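import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collection;
import java.util.stream.Collectors;

public class InstallerList {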
public static void main(String[] args) throws IOException {
Path productReleasePath = Paths.get("C:", "test");
Collection<Path> installers = Files.walk(productReleasePath)
.filter(Files::isRegularFile)
.filter(p -> !p.getFileName().toString().endsWith(".log"))
// Collect files with the highest version
.collect(Collectors.toMap(
// Key is installer name *without* version
InstallerList::extractName,
// Value mapper; identity mapping to the path
p -> p,
// Select newest version when there is a collision
InstallerList::newer
))
.values();
for (Path path : installers) {
System.out.println(path.getFileName() + " ==> " + path);
}
}
// Extract the root name of an installer from a path (up to but not including the last '_')
public static String extractName(Path path) {
String fileName = path.getFileName().toString();
int i = fileName.lastIndexOf('_');
if (i < 0) {
throw new IllegalArgumentException(fileName);
}
return fileName.substring(0, i);
}
// Return the path with the highest version number
public static Path newer(Path p1, Path p2) {
return extractVersion(p1) > extractVersion(p2) ? p1 : p2;
}
// Extract a version number from a path (number between the last '_' and the following '.')
private static int extractVersion(Path path) {
String fileName = path.getFileName().toString();
int i = fileName.lastIndexOf('_');
if (i < 0) {
throw new IllegalArgumentException(fileName);
}
int j = fileName.indexOf('.', i);
if (j < 0) {
throw new IllegalArgumentException(fileName);
}
return Integer.parseInt(fileName.substring(i + 1, j));
}
}
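To see the collision handling in isolation, here is a minimal sketch that works on plain file names instead of walking the file system. The class and method names (LatestByName, baseName, version, latest) are made up for illustration, and it assumes the same naming convention as above: the version is the number between the last '_' and the following '.'.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class LatestByName {

    // Part of the file name before the last '_', e.g. "Server_Linux" for "Server_Linux_2.tar.gz".
    static String baseName(String fileName) {
        return fileName.substring(0, fileName.lastIndexOf('_'));
    }

    // Number between the last '_' and the following '.', parsed as an int.
    static int version(String fileName) {
        int i = fileName.lastIndexOf('_');
        int j = fileName.indexOf('.', i);
        return Integer.parseInt(fileName.substring(i + 1, j));
    }

    // Keep one file name per base name: the one with the numerically highest version.
    static Map<String, String> latest(List<String> fileNames) {
        return fileNames.stream().collect(Collectors.toMap(
                LatestByName::baseName,
                name -> name,
                (a, b) -> version(a) >= version(b) ? a : b,
                TreeMap::new)); // TreeMap keeps the result sorted by base name
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "Client_9.exe", "Client_10.exe", "Server_2.exe", "Server_1.exe");
        // Prints {Client=Client_10.exe, Server=Server_2.exe}
        System.out.println(latest(names));
    }
}
```

A plain alphabetical sort would put Client_9.exe after Client_10.exe, which is why the merge function compares the parsed integers rather than the strings.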
I am learning the Akka framework for parallel processing in Scala, and I am migrating a Java project to Scala so that I can learn both Akka and Scala at the same time. I am getting a NullPointerException in the master actor when it receives a mutable object from the worker actor after the worker has done some computation. All the code is below.
import akka.actor._
import java.math.BigInteger
import akka.routing.ActorRefRoutee
import akka.routing.Router
import akka.routing.RoundRobinRoutingLogic
object Main extends App {
val system = ActorSystem("CalcSystem")
val masterActor = system.actorOf(Props[Master], "master")
masterActor.tell(new Calculate, ActorRef.noSender)
}
class Master extends Actor {
private val messages: Int = 10;
var resultList: Seq[String] = _
//val workerRouter = this.context.actorOf(Props[Worker].withRouter(new RoundRobinRouter(2)), "worker")
var router = {
val routees = Vector.fill(5) {
val r = context.actorOf(Props[Worker])
context watch r
ActorRefRoutee(r)
}
Router(RoundRobinRoutingLogic(), routees)
}
def receive() = {
case msg: Calculate =>
processMessages()
case msg: Result =>
resultList :+ msg.getFactorial().toString
println(msg.getFactorial())
if (resultList.length == messages) {
end
}
}
private def processMessages() {
var i: Int = 0
for (i <- 1 to messages) {
// workerRouter.tell(new Work, self)
router.route(new Work, self)
}
}
private def end() {
println("List = " + resultList)
this.context.system.shutdown()
}
}
import akka.actor._
import java.math.BigInteger
class Worker extends Actor {
private val calculator = new Calculator
def receive() = {
case msg: Work =>
println("Called calculator.calculateFactorial: " + context.self.toString())
val result = new Result(calculator.calculateFactorial)
sender.tell(result, this.context.parent)
case _ =>
println("I don't know what to do with this...")
}
}
import java.math.BigInteger
class Result(bigInt: BigInteger) {
def getFactorial(): BigInteger = bigInt
}
import java.math.BigInteger
class Calculator {
def calculateFactorial(): BigInteger = {
var result: BigInteger = BigInteger.valueOf(1)
var i = 0
for(i <- 1 to 4) {
result = result.multiply(BigInteger.valueOf(i))
}
println("result: " + result)
result
}
}
You initialize resultList with null and then try to append to it, which is what throws the NullPointerException.
Does your calculation ever stop? In the line
resultList :+ msg.getFactorial().toString
you're creating a copy of the sequence with an element appended, but the result is never assigned back to var resultList.
This line does what you want, provided resultList is initialized to an empty sequence (for example Seq.empty) rather than null:
resultList = resultList :+ msg.getFactorial().toString
I recommend avoiding mutable variables in actors and using context.become instead:
https://github.com/alexandru/scala-best-practices/blob/master/sections/5-actors.md#52-should-mutate-state-in-actors-only-with-contextbecome
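Since the project is being ported from Java, the same pitfall may be easier to see in Java terms. This is only an illustrative sketch: the appended helper is hypothetical and merely mimics how :+ behaves on an immutable sequence (it returns a new list and leaves the original alone).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class AppendSketch {

    // Returns a new list with the item appended; the input list is left untouched.
    static List<String> appended(List<String> list, String item) {
        List<String> copy = new ArrayList<>(list);
        copy.add(item);
        return Collections.unmodifiableList(copy);
    }

    // The new list is created and thrown away, so the variable still refers to the empty list.
    static int sizeWhenResultDiscarded() {
        List<String> results = Collections.emptyList();
        appended(results, "24");
        return results.size();
    }

    // Assigning the result back is what makes the variable "grow".
    static int sizeWhenResultAssigned() {
        List<String> results = Collections.emptyList();
        results = appended(results, "24");
        return results.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeWhenResultDiscarded()); // prints 0
        System.out.println(sizeWhenResultAssigned());  // prints 1
    }
}
```

In the discarded case the length never reaches the expected number of messages, which matches the "does your calculation ever stop?" symptom in the actor.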
Let me start out by saying that I'm new to Scala; however, I find the Actor based concurrency model interesting, and I tried to give it a shot for a relatively simple application. The issue that I'm running into is that, although I'm able to get the application to work, the result is far less efficient (in terms of real time, CPU time, and memory usage) than an equivalent Java based solution that uses threads that pull messages off an ArrayBlockingQueue. I'd like to understand why. I suspect that it's likely my lack of Scala knowledge, and that I'm causing all the inefficiency, but after several attempts to rework the application without success, I decided to reach out to the community for help.
My problem is this:
I have a gzipped file with many lines in the format of:
SomeID comma_separated_list_of_values
For example:
1234 12,45,82
I'd like to parse each line and get an overall count of the number of occurrences of each value in the comma separated list.
This file may be pretty large (several GB compressed), but the number of unique values per file is pretty small (at most 500). I figured this would be a pretty good opportunity to try to write an Actor-based concurrent Scala application. My solution involves a main driver that creates a pool of parser Actors. The main driver then reads lines from stdin and passes each line off to an Actor, which parses the line and keeps a local count of the values. When the main driver has read the last line, it passes a message to each Actor indicating that all lines have been read. When the Actors receive the 'done' message, they pass their counts to an aggregator that sums the counts from all Actors. Once the counts from all parsers have been aggregated, the main driver prints out the statistics.
The problem:
The main issue that I'm encountering is the incredible amount of inefficiency of this application. It uses far more CPU and far more memory than an "equivalent" Java application that uses threads and an ArrayBlockingQueue. To put this in perspective, here are some stats that I gathered for a 10 million line test input file:
Scala 1 Actor (parser):
real 9m22.297s
user 235m31.070s
sys 21m51.420s
Java 1 Thread (parser):
real 1m48.275s
user 1m58.630s
sys 0m33.540s
Scala 5 Actors:
real 2m25.267s
user 63m0.730s
sys 3m17.950s
Java 5 Threads:
real 0m24.961s
user 1m52.650s
sys 0m20.920s
In addition, top reports that the Scala application has about 10x the resident memory size. So we're talking about orders of magnitude more CPU and memory here for orders of magnitude worse performance, and I just can't figure out what is causing this. Is it a GC issue, or am I somehow creating far more copies of objects than I realize?
Additional details that may or may not be of importance:
The Scala application is wrapped by a Java class so that I could deliver a self-contained executable JAR file (I don't have the Scala jars on every machine that I might want to run this app).
The application is being invoked as follows: gunzip -c gzFilename | java -jar StatParser.jar
Here is the code:
Main Driver:
import scala.actors.Actor._
import scala.collection.{ immutable, mutable }
import scala.io.Source
class StatCollector (numParsers : Int ) {
private val parsers = new mutable.ArrayBuffer[StatParser]()
private val aggregator = new StatAggregator()
def generateParsers {
for ( i <- 1 to numParsers ) {
val parser = new StatParser( i, aggregator )
parser.start
parsers += parser
}
}
def readStdin {
var nextParserIdx = 0
var lineNo = 1
for ( line <- Source.stdin.getLines() ) {
parsers( nextParserIdx ) ! line
nextParserIdx += 1
if ( nextParserIdx >= numParsers ) {
nextParserIdx = 0
}
lineNo += 1
}
}
def informParsers {
for ( parser <- parsers ) {
parser ! true
}
}
def printCounts {
val countMap = aggregator.getCounts()
println( "ID,Count" )
/*
for ( key <- countMap.keySet ) {
println( key + "," + countMap.getOrElse( key, 0 ) )
//println( "Campaign '" + key + "': " + countMap.getOrElse( key, 0 ) )
}
*/
countMap.toList.sorted foreach {
case (key, value) =>
println( key + "," + value )
}
}
def processFromStdIn {
aggregator.start
generateParsers
readStdin
process
}
def process {
informParsers
var completedParserCount = aggregator.getNumParsersAggregated
while ( completedParserCount < numParsers ) {
Thread.sleep( 250 )
completedParserCount = aggregator.getNumParsersAggregated
}
printCounts
}
}
The Parser Actor:
import scala.actors.Actor
import collection.mutable.HashMap
import scala.util.matching
class StatParser( val id: Int, val aggregator: StatAggregator ) extends Actor {
private var countMap = new HashMap[String, Int]()
private val sep1 = "\t"
private val sep2 = ","
def getCounts(): HashMap[String, Int] = {
return countMap
}
def act() {
loop {
react {
case line: String =>
{
val idx = line.indexOf( sep1 )
var currentCount = 0
if ( idx > 0 ) {
val tokens = line.substring( idx + 1 ).split( sep2 )
for ( token <- tokens ) {
if ( !token.equals( "" ) ) {
currentCount = countMap.getOrElse( token, 0 )
countMap( token ) = ( 1 + currentCount )
}
}
}
}
case doneProcessing: Boolean =>
{
if ( doneProcessing ) {
// Send my stats to Aggregator
aggregator ! this
}
}
}
}
}
}
The Aggregator Actor:
import scala.actors.Actor
import collection.mutable.HashMap
class StatAggregator extends Actor {
private var countMap = new HashMap[String, Int]()
private var parsersAggregated = 0
def act() {
loop {
react {
case parser: StatParser =>
{
val cm = parser.getCounts()
for ( key <- cm.keySet ) {
val currentCount = countMap.getOrElse( key, 0 )
val incAmt = cm.getOrElse( key, 0 )
countMap( key ) = ( currentCount + incAmt )
}
parsersAggregated += 1
}
}
}
}
def getNumParsersAggregated: Int = {
return parsersAggregated
}
def getCounts(): HashMap[String, Int] = {
return countMap
}
}
Any help that could be offered in understanding what is going on here would be greatly appreciated.
Thanks in advance!
---- Edit ---
Since many people responded and asked for the Java code, here is the simple Java app that I created for comparison purposes. I realize that this is not great Java code, but when I saw the performance of the Scala application, I just whipped up something quick to see how a Java Thread-based implementation would perform as a base-line:
Parsing Thread:
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;
public class JStatParser extends Thread
{
private ArrayBlockingQueue<String> queue;
private Map<String, Integer> countMap;
private boolean done;
public JStatParser( ArrayBlockingQueue<String> q )
{
super( );
queue = q;
countMap = new Hashtable<String, Integer>( );
done = false;
}
public Map<String, Integer> getCountMap( )
{
return countMap;
}
public void alldone( )
{
done = true;
}
@Override
public void run( )
{
String line = null;
while( !done || queue.size( ) > 0 )
{
try
{
// line = queue.take( );
line = queue.poll( 100, TimeUnit.MILLISECONDS );
if( line != null )
{
int idx = line.indexOf( "\t" ) + 1;
for( String token : line.substring( idx ).split( "," ) )
{
if( !token.equals( "" ) )
{
if( countMap.containsKey( token ) )
{
Integer currentCount = countMap.get( token );
currentCount++;
countMap.put( token, currentCount );
}
else
{
countMap.put( token, new Integer( 1 ) );
}
}
}
}
}
catch( InterruptedException e )
{
// TODO Auto-generated catch block
System.err.println( "Failed to get something off the queue: "
+ e.getMessage( ) );
e.printStackTrace( );
}
}
}
}
Driver:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ArrayBlockingQueue;
public class JPS
{
public static void main( String[] args )
{
if( args.length <= 0 || args.length > 2 || args[0].equals( "-?" ) )
{
System.err.println( "Usage: JPS numParsers [filename]" );
System.exit( -1 );
}
int numParsers = Integer.parseInt( args[0] );
ArrayBlockingQueue<String> q = new ArrayBlockingQueue<String>( 1000 );
List<JStatParser> parsers = new ArrayList<JStatParser>( );
BufferedReader reader = null;
try
{
if( args.length == 2 )
{
reader = new BufferedReader( new FileReader( args[1] ) );
}
else
{
reader = new BufferedReader( new InputStreamReader( System.in ) );
}
for( int i = 0; i < numParsers; i++ )
{
JStatParser parser = new JStatParser( q );
parser.start( );
parsers.add( parser );
}
String line = null;
while( (line = reader.readLine( )) != null )
{
try
{
q.put( line );
}
catch( InterruptedException e )
{
// TODO Auto-generated catch block
System.err.println( "Failed to add line to q: "
+ e.getMessage( ) );
e.printStackTrace( );
}
}
// At this point, we've put everything on the queue, now we just
// need to wait for it to be processed.
while( q.size( ) > 0 )
{
try
{
Thread.sleep( 250 );
}
catch( InterruptedException e )
{
}
}
Map<String,Integer> countMap = new Hashtable<String,Integer>( );
for( JStatParser jsp : parsers )
{
jsp.alldone( );
Map<String,Integer> cm = jsp.getCountMap( );
for( String key : cm.keySet( ) )
{
if( countMap.containsKey( key ))
{
Integer currentCount = countMap.get( key );
currentCount += cm.get( key );
countMap.put( key, currentCount );
}
else
{
countMap.put( key, cm.get( key ) );
}
}
}
System.out.println( "ID,Count" );
for( String key : new TreeSet<String>(countMap.keySet( )) )
{
System.out.println( key + "," + countMap.get( key ) );
}
for( JStatParser parser : parsers )
{
try
{
parser.join( 100 );
}
catch( InterruptedException e )
{
// TODO Auto-generated catch block
e.printStackTrace();
}
}
System.exit( 0 );
}
catch( IOException e )
{
System.err.println( "Caught exception: " + e.getMessage( ) );
e.printStackTrace( );
}
}
}
I'm not sure this is a good test case for actors. For one thing, there's almost no interaction between actors. This is a simple map/reduce, which calls for parallelism, not concurrency.
The overhead on the actors is also pretty heavy, and I don't know how many actual threads are being allocated. Depending on how many processors you have, you might have fewer threads than in the Java program, which seems to be the case, given that the speed-up is 4x instead of 5x.
And the way you wrote the actors is optimized for idle actors, the kind of situation where you have hundreds or thousands of actors, but only a few of them doing actual work at any time. If you wrote the actors with while/receive instead of loop/react, they'd perform better.
Now, actors would make it easy to distribute the application over many computers, except that you violated one of the tenets of actors: you are calling methods on the actor object. You should never do that with actors and, in fact, Akka prevents you from doing so. A more actor-ish way of doing this would be for the aggregator to ask each actor for their key sets, compute their union, and then, for each key, ask all actors to send their count for that key.
I'm not sure, however, that the actor overhead is what you are seeing. You provided no information about the Java implementation, but I daresay you use mutable maps, and maybe even a single concurrent mutable map -- a very different implementation than what you are doing in Scala.
There's also no information on how the file is read (such a big file might have buffering issues), or how it is parsed in Java. Since most of the work is reading and parsing the file, not counting the tokens, differences in implementation there can easily overcome any other issue.
Finally, about resident memory size, Scala has a 9 MB library (in addition to what JVM brings), which might be what you are seeing. Of course, if you are using a single concurrent map in Java vs 6 immutable maps in Scala, that will certainly make a big difference in memory usage patterns.
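To illustrate the "simple map/reduce" framing, here is a rough sketch of the counting step written with Java parallel streams instead of actors or hand-rolled threads. It is not a drop-in replacement for either program in the question: the class and method names are invented, it assumes the lines are already in memory, and it follows the Java version's parsing (everything after the first tab is the comma-separated list).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class TokenCountSketch {

    // Map step: split each line into tokens. Reduce step: count occurrences per token.
    static Map<String, Long> countTokens(List<String> lines) {
        return lines.parallelStream()
                // Drop the ID column: keep what follows the first tab
                .map(line -> line.substring(line.indexOf('\t') + 1))
                .flatMap(values -> Arrays.stream(values.split(",")))
                .filter(token -> !token.isEmpty())
                // TreeMap so the result comes out sorted by token
                .collect(Collectors.groupingBy(
                        token -> token, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1234\t12,45,82",
                "5678\t45,82",
                "9\t82");
        // Prints {12=1, 45=2, 82=3}
        System.out.println(countTokens(lines));
    }
}
```

Each worker builds its own partial map and the collector merges them at the end, which is the same shape as the parser/aggregator split in the question, only without any message passing.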
Scala actors have been giving way to Akka actors lately, and more is coming. Viktor is hAkking further to make the latest version the best: https://twitter.com/viktorklang/status/229694698397257728
BTW: open source is a great power! This day should be a holiday for the whole JVM-based community:
http://www.marketwire.com/press-release/azul-systems-announces-new-initiative-support-open-source-community-with-free-zing-jvm-1684899.htm
Consider the following snippet of code, which calculates the sizes of all the given paths.
def pathSizes = []
paths.each { rootPath ->
pathSizes.addAll(
withPool { pool ->
runForkJoin(rootPath) { path ->
def headSizes = [:]
println path
def lines = ["ls", "-al", path].execute().text.readLines()
(0..<3).each { lines.remove(0) }
lines.each { line ->
def fields = line.split(/\s+/)
if (fields[0] =~ /^d/)
forkOffChild("$path/${fields.last()}")
else {
def userName = fields[2]
def fileSize = fields[4] as long
if (headSizes[userName] == null)
headSizes[userName] = fileSize
else
headSizes[userName] += fileSize
}
}
quietlyJoin()
System.gc()
def shallowSizes =
headSizes.collectEntries { userName, fileSize ->
def childResult =
childrenResults.sum {
it.shallowSizes[userName] ? it.shallowSizes[userName] : 0
} ?: 0
return [userName, fileSize + childResult]
}
def deepSizes =
childrenResults.sum { it.deepSizes ?: [] } +
shallowSizes.collect { userName, fileSize ->
[userName: userName, path: path, fileSize: fileSize]
}
return [shallowSizes: shallowSizes, deepSizes: deepSizes]
}.deepSizes
})
}
Why does this snippet of code deadlock? There are no interactions between threads except possibly with the system call and other parts of the Java framework. If the system calls are the problem, then how can I fix it, without removing the system calls (they are slow, hence the need to parallelize)?