Is there anything in Java close to the parallel collections in Scala?


What is the simplest way to implement a parallel computation (e.g. on a multi-core processor) in Java?
That is, what is the Java equivalent of this Scala code?
val list = aLargeList
list.par.map(_*2)
There is this library, but it seems overwhelming.

http://gee.cs.oswego.edu/dl/jsr166/dist/extra166ydocs/
Don't give up so fast, snappy! ))
From the javadocs (with changes to map to your f) the essential matter is really just this:
ParallelLongArray a = ... // you provide
a.replaceWithMapping(new LongOp() { public long op(long a) { return a * 2L; } });
That is pretty much this, right?
val list = aLargeList
list.par.map(_*2)
And if you are willing to live with a bit less terseness, the above can be a reasonably clean and clear three-liner (and of course, if you reuse functions, then it's the same exact thing as Scala - inline functions):
ParallelLongArray a = ... // you provide
LongOp f = new LongOp() { public long op(long a){return a*2L;}};
a.replaceWithMapping (f);
[edited above to show concise complete form ala OP's Scala variant]
and here it is in maximal verbose form where we start from scratch for demo:
import java.util.Random;
import jsr166y.ForkJoinPool;
import extra166y.Ops.LongGenerator;
import extra166y.Ops.LongOp;
import extra166y.ParallelLongArray;
public class ListParUnaryFunc {
public static void main(String[] args) {
int n = Integer.parseInt(args[0]);
// create a parallel long array
// with random long values
ParallelLongArray a = ParallelLongArray.create(n-1, new ForkJoinPool());
a.replaceWithGeneratedValue(generator);
// use it: apply unaryLongFuncOp in parallel
// to all values in array
a.replaceWithMapping(unaryLongFuncOp);
// examine it
for(Long v : a.asList()){
System.out.format("%d\n", v);
}
}
static final Random rand = new Random(System.nanoTime());
static LongGenerator generator = new LongGenerator() {
@Override
public final long op() { return rand.nextLong(); }
};
static LongOp unaryLongFuncOp = new LongOp() {
@Override
public final long op(long a) { return a * 2L; }
};
}
Final edit and notes:
Also note that a simple class such as the following (which you can reuse across your projects):
/**
* The very basic form w/ TODOs on checks, concurrency issues, init, etc.
*/
final public static class ParArray {
private ParallelLongArray parr;
private final long[] arr;
public ParArray (long[] arr){
this.arr = arr;
}
public final ParArray par() {
if(parr == null)
parr = ParallelLongArray.createFromCopy(arr, new ForkJoinPool()) ;
return this;
}
public final ParallelLongArray map(LongOp op) {
return parr.replaceWithMapping(op);
}
public final long[] values() { return parr.getArray(); }
}
and something like that will allow you to write more fluid Java code (if terseness matters to you):
long[] arr = ... // you provide
LongOp f = ... // you provide
ParArray list = new ParArray(arr);
list.par().map(f);
And the above approach can certainly be pushed to make it even cleaner.

Doing that on one machine is pretty easy, but not as easy as Scala makes it. The core of that library (java.util.concurrent, from JSR 166) has been part of Java since Java 5, although the extra166y parallel arrays themselves are not in the JDK. Probably the simplest thing to use is an ExecutorService. It represents a pool of threads that can run on any processor. You submit tasks to it and the tasks return results.
http://download.oracle.com/javase/1,5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.html
http://www.fromdev.com/2009/06/how-can-i-leverage-javautilconcurrent.html
I'd suggest using ExecutorService.invokeAll(), which returns a list of Futures. It blocks until all the tasks have completed, so afterwards you can read each result with Future.get().
If you're using Java 7, you could use the fork/join framework, which might save you some work. With any of these you can build something very similar to Scala's parallel collections, so that using it is fairly concise.
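A rough sketch of the invokeAll() approach, assuming the data is a long[] and the work is split into fixed-size chunks. The class name, method name and chunk size are made up for illustration, and the lambdas need Java 8 (on older versions you would write anonymous Callables instead):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class InvokeAllSketch {
    // Doubles every element of input, one task per chunk of chunkSize elements.
    static long[] doubleAll(final long[] input, int chunkSize) {
        final long[] output = new long[input.length];
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int start = 0; start < input.length; start += chunkSize) {
                final int from = start;
                final int to = Math.min(start + chunkSize, input.length);
                tasks.add(() -> {
                    // Each task writes only to its own slice of the output array.
                    for (int i = from; i < to; i++) {
                        output[i] = input[i] * 2L;
                    }
                    return null;
                });
            }
            // invokeAll() blocks until every task is done; get() rethrows task failures.
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return output;
    }

    public static void main(String[] args) {
        long[] result = doubleAll(new long[] {1, 2, 3, 4, 5}, 2);
        System.out.println(Arrays.toString(result)); // prints [2, 4, 6, 8, 10]
    }
}
```

For small arrays the thread overhead will dominate, so something like this only pays off on large inputs.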

Use threads. Java doesn't have this sort of thing built in.

There will be an equivalent in Java 8: http://www.infoq.com/articles/java-8-vs-scala
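For reference, a minimal sketch of what the Java 8 version looks like with parallel streams, assuming the data is in a List<Integer> (the class and method names are made up for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ParallelStreamSketch {
    // Rough counterpart of Scala's list.par.map(_ * 2).
    static List<Integer> doubled(List<Integer> list) {
        return list.parallelStream()
                .map(x -> x * 2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(doubled(Arrays.asList(1, 2, 3, 4))); // prints [2, 4, 6, 8]
    }
}
```

The result keeps the order of the source list even though the mapping may run on several threads.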

Related

Update a list reference inside a method

In Java we cannot reassign the caller's reference from inside a method, because references are passed by value.
So the following does not work:
class SomeClass {
List<PaidOrders> paidOrders;
List<PendingOrders> pendingOrders;
List<CancelledOrders> cancelledOrders;
private void process(List<OrderStatus> data, List<Order> currentOrderlist) {
List<Order> newOrders = fromOrderStatus(data);
currentOrderlist = newOrders;
}
}
But the following does work:
class SomeClass {
private void process(List<OrderStatus> data, List<Order> currentOrderlist) {
List<Order> newOrders = fromOrderStatus(data);
currentOrderlist.clear();
currentOrderlist.addAll(newOrders); // <- extra linear loop
}
}
The problem is that the second example does an extra linear loop to copy from one list to the other.
Question:
I was wondering, is there some design approach so that I could neatly just replace the references instead? I.e. somehow make the first snippet work with some change in the parameters or something?
Update
After the comments, I would like to clarify that currentOrderlist can be any of paidOrders, pendingOrders, or cancelledOrders.
The code for process is the same for all types.

Hm. I see two possibilities here. Either you pass a wrapper object such as AtomicReference as the argument and set the new list on it (which might be overkill, since it is designed for multi-threaded use), or you use a Consumer.
In the second case your method would look like this:
public void process(List<OrderStatus> data, Consumer<List<Order>> target) {
List<Order> newOrders = fromOrderStatus(data);
target.accept(newOrders);
}
Then on the calling side you would implement it like this:
process(data, e-> <<targetList>> = e);
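A small self-contained sketch of the Consumer variant. It assumes the target lists are fields (a lambda can assign to a field, but not to a local variable of the enclosing method); OrderBook and the use of String elements are placeholders, not the asker's real types:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class OrderBook {
    List<String> paidOrders = new ArrayList<>();
    List<String> pendingOrders = new ArrayList<>();

    // Builds the new list and hands it to whichever "setter" the caller supplied.
    void process(List<String> data, Consumer<List<String>> target) {
        List<String> newOrders = new ArrayList<>(data); // stand-in for fromOrderStatus(data)
        target.accept(newOrders);
    }

    public static void main(String[] args) {
        OrderBook book = new OrderBook();
        // The lambda decides which field gets replaced; nothing is copied element by element.
        book.process(Arrays.asList("a", "b"), list -> book.paidOrders = list);
        System.out.println(book.paidOrders); // prints [a, b]
    }
}
```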

If your list is wrapped by another object (for example, an AtomicReference), then you will be able to replace it.
public static void doSomething(AtomicReference<List<Integer>> listAtomicReference){
List<Integer> newIntegers = new ArrayList<>();
listAtomicReference.set(newIntegers);
}
public static void main(String[] args) {
AtomicReference<List<Integer>> listAtomicReference = new AtomicReference<>(Arrays.asList(4));
doSomething(listAtomicReference);
System.out.println(listAtomicReference.get());
}
Output:
[]
Making a public member variable in a class.
With that being said, I wouldn't recommend walking this path.
Is premature optimization really the root of all evil?

How to shard a Set?

Could you help me with one thing? Imagine I have a simple RESTful microservice with one GET method that simply responds with a random String.
I collect all the strings in a concurrent hash set (a Set<String>) that holds all the answers.
A sloppy implementation is below; the main point is that the Set<String> is thread-safe and can be modified concurrently.
@RestController
public class Controller {
private final StringService stringService;
private final CacheService cacheService;
public Controller(final StringService stringService, final CacheService cacheService) {
this.stringService = stringService;
this.cacheService = cacheService;
}
@GetMapping
public String get() {
final String str = stringService.random();
cacheService.add(str);
return str;
}
}
public class CacheService {
private final Set<String> set = ConcurrentHashMap.newKeySet();
public void add(final String str) {
set.add(str);
}
}
While you are reading this line, my endpoint is being used by 1 billion people.
I want to shard the cache. Since my system is heavily loaded, I can't hold all the strings on one server. I want to have 256 servers/instances and distribute my cache uniformly, using str.hashCode() % 256 to determine on which server/instance a string should be kept.
Could you tell me what I should do next?
Assume that, currently, I only have a Spring Boot application running locally.

You should check out Hazelcast. It is open source and proved useful for me in a case where I wanted to share data among multiple instances of my application. The in-memory data grid provided by Hazelcast might be just the thing you are looking for.

I agree with Vicky, this is what Hazelcast is made for. It's a single jar and a couple of lines of code, and instead of a HashMap you have an IMap, which extends ConcurrentMap, and you're good to go. All the distribution, sharding, concurrency, etc. is done for you. Check out:
https://docs.hazelcast.org/docs/3.11.1/manual/html-single/index.html#map

Try the following code. However, it is a bad approach; it is best to use a Map to cache your data in a single instance. If you need to build a distributed application, try a distributed cache service such as Redis.
class CacheService {
/**
* assume read operations are more frequent than write operations
*/
private final static List<Set<String>> sets = new CopyOnWriteArrayList<>();
static {
for (int i = 0; i < 256; i++) {
sets.add(ConcurrentHashMap.newKeySet());
}
}
public void add(final String str) {
int insertIndex = Math.floorMod(str.hashCode(), 256); // hashCode() may be negative, so plain % could yield a negative index
sets.get(insertIndex).add(str);
}
}
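One detail worth spelling out: String.hashCode() can be negative, so a plain % can give a negative shard number. A minimal sketch of a shard function that always lands in [0, shardCount), with ShardSketch and shardFor being made-up names:

```java
class ShardSketch {
    // Maps a string to a shard number in [0, shardCount), even when hashCode() is negative.
    static int shardFor(String str, int shardCount) {
        return Math.floorMod(str.hashCode(), shardCount);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"alpha", "beta", "gamma"}) {
            System.out.println(s + " -> shard " + shardFor(s, 256));
        }
    }
}
```

Since String.hashCode() is defined by a fixed formula, every JVM computes the same shard for the same string, which is what lets separate instances agree on where a string belongs.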

Mapping 2 Objects in a Stream to a single 3rd one?

If I have a list of timestamps and the file path of an object that I want to convert, can I make a collection of converters, given a constructor with the signature Converter(filePath, start, end)?
More Detail (Pseudo-Code):
Some list that has timestamps (imagine they're in seconds): path = somewhere, list = {0, 15, 15, 30},
How can I do something like this:
list.stream.magic.map(start, end -> new Converter (path, start, end))?
Result: new Converter (path, 0, 15), new Converter(path, 15, 30)
Note: I'm aware of BiFunction, but to my knowledge, streams do not implement it.

There are many ways to get the required result using streams.
But first of all, you're not obliged to use the Stream API, and for lists of tens or hundreds of elements I would suggest plain old list iteration.
Just as an illustration, try the code sample below.
It shows two surface problems that arise from the nature of streams and their poor fit with the very idea of pairing elements:
- a stateful function has to be applied, which is really tricky to use in map() and should be considered dirty coding; the mapping also produces nulls at the even positions, which have to be filtered out properly;
- there are problems when the stream contains an odd number of elements, and you can never predict whether it does.
If you decide to use streams, then to do it cleanly you need a custom implementation of Iterator, Spliterator or Collector, depending on your requirements.
Anyway, there are a couple of non-obvious corner cases you won't be happy to implement yourself, so you can try one of the many third-party stream libraries.
Two of the most popular are StreamEx and RxJava.
They definitely have tools for pairing stream elements... but don't forget to check the performance for your case!
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Stream;
public class Sample
{
public static void main(String... arg)
{
String path = "somewhere";
Stream<Converter> stream = Stream.of(0, 15, 25, 30).map(
new Function<Integer, Converter>()
{
int previous;
boolean even = true;
@Override
public Converter apply(Integer current)
{
Converter converter = even ? null : new Converter(path, previous, current);
even = !even;
previous = current;
return converter;
}
}).filter(Objects::nonNull);
stream.forEach(System.out::println);
}
static class Converter
{
private final String path;
private final int start;
private final int end;
Converter(String path, int start, int end)
{
this.path = path;
this.start = start;
this.end = end;
}
public String toString()
{
return String.format("Converter[%s,%s,%s]", path, start, end);
}
}
}
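If the timestamps are already in a random-access List, a stateless alternative is to stream over pair indices instead of over the elements. This is only a sketch: the Converter here is a stand-in that just records its arguments, and a trailing unpaired element is silently ignored.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class PairingSketch {
    // Hypothetical stand-in for the question's Converter.
    static final class Converter {
        final String path;
        final int start;
        final int end;

        Converter(String path, int start, int end) {
            this.path = path;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "Converter[" + path + "," + start + "," + end + "]";
        }
    }

    // Pairs elements (0,1), (2,3), ... by index, so no mutable state is needed in the lambda.
    static List<Converter> pair(String path, List<Integer> times) {
        return IntStream.range(0, times.size() / 2)
                .mapToObj(i -> new Converter(path, times.get(2 * i), times.get(2 * i + 1)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(pair("somewhere", Arrays.asList(0, 15, 15, 30)));
        // prints [Converter[somewhere,0,15], Converter[somewhere,15,30]]
    }
}
```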

Java 8 Stream Collectors - Collector to create a Map with objects in multiple buckets

The following code works and is readable but it seems to me I have intermediate operations that feel like they shouldn't be necessary. I've written this simplified version as the actual code is part of a much larger process.
I've got a Collection of Widget, each with a name and multiple types (indicated by constants of the WidgetType enum). These multiple types are gettable as a Stream<WidgetType> though, if necessary, I could return those as some other type. (For various reasons, it is strongly desirable that these be returned as a Stream<WidgetType> because of how these widgets are used later in the actual code.)
These widgets are added to an EnumMap<WidgetType, List<Widget>> which is, later, translated into an EnumMap<WidgetType, Widget[]>.
If each Widget only had a single WidgetType, this would be a trivial solve but, since any Widget could have 1 or more types, I am tripping all over myself with the syntax of the Collectors.groupingBy() method (and its overloads).
Here's the code example, again, fully functional and gives me the exact result I need.
class StackOverFlowExample {
private final Map<WidgetType, Widget[]> widgetMap = new EnumMap<>(WidgetType.class);
public static void main(String[] args) { new StackOverFlowExample(); }
StackOverFlowExample() {
Collection<Widget> widgetList = getWidgetsFromWhereverWidgetsComeFrom();
{
final Map<WidgetType, List<Widget>> intermediateMap = new EnumMap<>(WidgetType.class);
widgetList.forEach(w ->
w.getWidgetTypes().forEach(wt -> {
intermediateMap.putIfAbsent(wt, new ArrayList<>());
intermediateMap.get(wt).add(w);
})
);
intermediateMap.entrySet().forEach(e -> widgetMap.put(e.getKey(), e.getValue().toArray(new Widget[0])));
}
Arrays.stream(WidgetType.values()).forEach(wt -> System.out.println(wt + ": " + Arrays.toString(widgetMap.get(wt))));
}
private Collection<Widget> getWidgetsFromWhereverWidgetsComeFrom() {
return Arrays.asList(
new Widget("1st", WidgetType.TYPE_A, WidgetType.TYPE_B),
new Widget("2nd", WidgetType.TYPE_A, WidgetType.TYPE_C),
new Widget("3rd", WidgetType.TYPE_A, WidgetType.TYPE_D),
new Widget("4th", WidgetType.TYPE_C, WidgetType.TYPE_D)
);
}
}
This outputs:
TYPE_A: [1st, 2nd, 3rd]
TYPE_B: [1st]
TYPE_C: [2nd, 4th]
TYPE_D: [3rd, 4th]
For completeness sake, here's the Widget class and the WidgetType enum:
class Widget {
private final String name;
private final WidgetType[] widgetTypes;
Widget(String n, WidgetType ... wt) { name = n; widgetTypes = wt; }
public String getName() { return name; }
public Stream<WidgetType> getWidgetTypes() { return Arrays.stream(widgetTypes).distinct(); }
@Override public String toString() { return name; }
}
enum WidgetType { TYPE_A, TYPE_B, TYPE_C, TYPE_D }
Any ideas on a better way to execute this logic are welcome. Thanks!

IMHO, the key is to convert a Widget instance to a Stream<Pair<WidgetType, Widget>> instance. Once we have that, we can flatMap a stream of widgets and collect on the resulting stream. Of course we don't have Pair in Java, so we have to use AbstractMap.SimpleEntry instead.
widgets.stream()
// Convert a stream of widgets to a stream of (type, widget)
.flatMap(w -> w.getWidgetTypes().map(t -> new AbstractMap.SimpleEntry<>(t, w)))
// Group by the key, with an additional mapping to get the widget
.collect(groupingBy(e -> e.getKey(),
mapping(e -> e.getValue(),
collectingAndThen(toList(), l -> l.toArray(new Widget[0])))));
P.S. this is an occasion where IntelliJ's suggestion doesn't shorten a lambda with method reference.

This is a bit convoluted, but it produces the same output, not necessarily in the same order. It uses a static import of java.util.stream.Collectors.*.
widgetMap = widgetList.stream()
.flatMap(w -> w.getWidgetTypes().map(t -> new AbstractMap.SimpleEntry<>(t, w)))
.collect(groupingBy(Map.Entry::getKey, collectingAndThen(mapping(Map.Entry::getValue, toSet()), s -> s.stream().toArray(Widget[]::new))));
Output on my machine:
TYPE_A: [1st, 3rd, 2nd]
TYPE_B: [1st]
TYPE_C: [2nd, 4th]
TYPE_D: [3rd, 4th]
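Putting the pieces together, here is a self-contained sketch that collects straight into an EnumMap and keeps the widgets in encounter order by using toList() rather than toSet(). Widget and WidgetType are trimmed copies of the question's classes, and GroupingSketch is a made-up wrapper name:

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GroupingSketch {
    enum WidgetType { TYPE_A, TYPE_B, TYPE_C, TYPE_D }

    static final class Widget {
        final String name;
        final WidgetType[] types;

        Widget(String name, WidgetType... types) {
            this.name = name;
            this.types = types;
        }

        Stream<WidgetType> getWidgetTypes() {
            return Arrays.stream(types).distinct();
        }

        @Override
        public String toString() {
            return name;
        }
    }

    static Map<WidgetType, Widget[]> group(List<Widget> widgets) {
        return widgets.stream()
                // One (type, widget) entry per type that a widget has.
                .flatMap(w -> w.getWidgetTypes().map(t -> new AbstractMap.SimpleEntry<>(t, w)))
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        // Map factory: collect directly into an EnumMap.
                        () -> new EnumMap<WidgetType, Widget[]>(WidgetType.class),
                        Collectors.collectingAndThen(
                                Collectors.mapping(Map.Entry::getValue, Collectors.toList()),
                                list -> list.toArray(new Widget[0]))));
    }

    public static void main(String[] args) {
        List<Widget> widgets = Arrays.asList(
                new Widget("1st", WidgetType.TYPE_A, WidgetType.TYPE_B),
                new Widget("2nd", WidgetType.TYPE_A, WidgetType.TYPE_C),
                new Widget("3rd", WidgetType.TYPE_A, WidgetType.TYPE_D),
                new Widget("4th", WidgetType.TYPE_C, WidgetType.TYPE_D));
        group(widgets).forEach((type, ws) -> System.out.println(type + ": " + Arrays.toString(ws)));
    }
}
```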

Creating Performance Counters in Java

Does anyone know how I can create a new Performance Counter (perfmon tool) in Java?
For example: a new performance counter for monitoring the number / duration of user actions.
I created such performance counters in C# and it was quite easy, however I couldn’t find anything helpful for creating it in Java…

If you want to develop your performance counter independently from the main code, you should look at aspect-oriented programming (AspectJ, Javassist).
You can then plug your performance counter into the method(s) you want without modifying the main code.

Java does not immediately work with perfmon (but you should see DTrace under Solaris).
Please see this question for suggestions: Java app performance counters viewed in Perfmon

Not sure what you are expecting this tool to do, but I would create some data structures to record these times and counts, like
class UserActionStats {
int count;
long durationMS;
long start = 0;
public void startAction() {
start = System.currentTimeMillis();
}
public void endAction() {
durationMS += System.currentTimeMillis() - start;
count++;
}
}
A collection for these could look like
private static final Map<String, UserActionStats> map =
new HashMap<String, UserActionStats>();
public static UserActionStats forUser(String userName) {
synchronized(map) {
UserActionStats uas = map.get(userName);
if (uas == null)
map.put(userName, uas = new UserActionStats());
return uas;
}
}
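If several threads record actions at once, the same idea can be sketched without explicit locking, assuming Java 8 is available. ActionCounters, Stats and timed are made-up names, and this only keeps the numbers in memory; it does not publish anything to perfmon.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

class ActionCounters {
    // Per-user totals; LongAdder tolerates concurrent updates without locks.
    static final class Stats {
        final LongAdder count = new LongAdder();
        final LongAdder totalNanos = new LongAdder();
    }

    private static final ConcurrentMap<String, Stats> STATS = new ConcurrentHashMap<>();

    static Stats forUser(String userName) {
        // Creates the entry atomically on first use.
        return STATS.computeIfAbsent(userName, k -> new Stats());
    }

    // Runs the action and records its duration, even if it throws.
    static void timed(String userName, Runnable action) {
        long start = System.nanoTime();
        try {
            action.run();
        } finally {
            Stats s = forUser(userName);
            s.count.increment();
            s.totalNanos.add(System.nanoTime() - start);
        }
    }

    public static void main(String[] args) {
        timed("alice", () -> { });
        timed("alice", () -> { });
        System.out.println("alice actions: " + forUser("alice").count.sum()); // prints alice actions: 2
    }
}
```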

