I am using an accumulator within a fold function. I would like to change the value of the accumulator.
My function looks something like this:
public Tuple2<String, Long> fold(Tuple2<String, Long> acc, eventClass event)
{
acc._1 = event.getUser();
acc._2 += event.getOtherThing();
return acc;
}
To me this should work, because all I am doing is changing the values of the accumulator. However, what I get is "Cannot assign value to final variable _1", and the same for _2. Why are these properties of acc final? How can I assign values to them?
Quick edit:
What I could do is just return a new Tuple instead, but this is not a nice solution in my opinion: return new Tuple2<String, Long>(event.getUser(), acc._2 + event.getOtherThing());
Solution for the Flink framework:
Use the Tuple2 defined in Flink. Import it using
import org.apache.flink.api.java.tuple.Tuple2;
and then use it with
acc.f0 = event.getUser();
acc.f1 += event.getByteDiff();
return acc;
I don't know which Tuple2 you are using, but I assume it is the Scala Tuple2.
The Scala Tuple2 is immutable. You can't change the value of an immutable object; you must recreate it.
Why? The Scala Tuple2 is a functional programming data structure, and like most functional programming concepts it tries to reduce side effects.
You can use the .copy function to recreate it with the values you want.
The following is an example:
@Test
public void test() {
Tuple2<String,Long> tuple = new Tuple2<>("a", 1L);
// copy returns a new tuple; the original is left untouched
Tuple2<String,Long> actual = tuple.copy(tuple._1, tuple._2 + 1);
Tuple2<String,Long> expected = new Tuple2<>("a", 2L);
assertEquals(expected, actual);
}
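The same idea can be sketched without any Scala dependency. The block below is only an illustration under stated assumptions: `Pair`, `withSecond`, and this `fold` signature are hypothetical names invented for the example and are not part of Scala or Flink. It shows a fold step that returns a fresh immutable value instead of assigning to final fields.

```java
public class ImmutablePairSketch {
    // Hypothetical immutable pair: both fields are final, so they cannot be reassigned.
    static final class Pair<A, B> {
        final A first;
        final B second;

        Pair(A first, B second) {
            this.first = first;
            this.second = second;
        }

        // "Changing" a field means building a new instance, much like copy on a Scala tuple.
        Pair<A, B> withSecond(B newSecond) {
            return new Pair<>(first, newSecond);
        }
    }

    // A fold step that produces a new accumulator instead of mutating the old one.
    static Pair<String, Long> fold(Pair<String, Long> acc, String user, long delta) {
        return new Pair<>(user, acc.second + delta);
    }
}
```

The old accumulator stays valid after the call, which is the property an immutable tuple is meant to give you.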
I don't know which Tuple2 you are working with. How about returning a new object:
// Pass the values to the constructor; the fields of an immutable tuple cannot be assigned afterwards
Tuple2<String, Long> tuple = new Tuple2<String, Long>(event.getUser(), event.getOtherThing() + acc._2);
return tuple;
I was reading the docs here, which give an example of reusing an object, shown below:
stream
.apply(new WindowFunction<WikipediaEditEvent, Tuple2<String, Long>, String, TimeWindow>() {
// Create an instance that we will reuse on every call
private Tuple2<String, Long> result = new Tuple2<>();
@Override
public void apply(String userName, TimeWindow timeWindow, Iterable<WikipediaEditEvent> iterable, Collector<Tuple2<String, Long>> collector) throws Exception {
long changesCount = ...
// Set fields on an existing object instead of creating a new one
result.f0 = userName;
// Auto-boxing!! A new Long value may be created
result.f1 = changesCount;
// Reuse the same Tuple2 object
collector.collect(result);
}
});
So instead of creating a new Tuple every time, it reuses the same Tuple, relying on its mutability to reduce pressure on the GC. Is this applicable in all operators, so that we can mutate the same object and pass it down the pipeline via collector.collect(...)?
Following that idea, I wonder where I can make such an optimization without breaking the code or introducing sneaky bugs. As another example, here is a KeySelector that returns a Tuple, taken from this answer:
KeyedStream<Employee, Tuple2<String, String>> employeesKeyedByCountryndEmployer =
streamEmployee.keyBy(
new KeySelector<Employee, Tuple2<String, String>>() {
@Override
public Tuple2<String, String> getKey(Employee value) throws Exception {
return Tuple2.of(value.getCountry(), value.getEmployer());
}
}
);
I wonder whether, in that case, I can reuse the same Tuple by mutating it with different inputs, as below. In all cases I assume the parallelism is greater than 1, and probably much higher in a real use case.
KeyedStream<Employee, Tuple2<String, String>> employeesKeyedByCountryndEmployer =
streamEmployee.keyBy(
new KeySelector<Employee, Tuple2<String, String>>() {
Tuple2<String, String> tuple = new Tuple2<>();
@Override
public Tuple2<String, String> getKey(Employee value) throws Exception {
tuple.f0 = value.getCountry();
tuple.f1 = value.getEmployer();
return tuple;
}
}
);
I do not know whether Flink copies objects between stages in the pipeline, so I wonder if such an optimization is safe. I read about the enableObjectReuse() configuration in the docs, though I am not sure I really understood it. This may touch on Flink internals, but I could not work out when Flink does what to manage data, objects, and records in the pipeline. Maybe I should get that clear first?
Thanks,
This sort of reuse in a KeySelector is not safe. keyBy is not an operator, and the usual rules about object reuse in operator chains (which I covered here) do not apply.
See Dave Anderson's answer to Flink, rule of using 'object reuse mode'
Basically, you can't remember input object references across function calls or modify input objects. So in your situation above with the KeySelector, you're modifying an object that you created, not an input object.
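To see why handing out one shared mutable object can go wrong in general, here is a minimal plain-Java sketch of the aliasing hazard. It is not Flink code, and `MutablePair`, `ReusingSelector`, and `collectKeys` are hypothetical names made up for the illustration. It says nothing about whether or when Flink itself keeps references to keys; it only shows what happens if any caller does.

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseHazard {
    // Hypothetical mutable pair, standing in for a mutable tuple type.
    static final class MutablePair {
        String f0;
        String f1;
    }

    // Returns the same instance on every call, overwriting its fields each time.
    static final class ReusingSelector {
        private final MutablePair shared = new MutablePair();

        MutablePair getKey(String country, String employer) {
            shared.f0 = country;
            shared.f1 = employer;
            return shared;
        }
    }

    // A caller that keeps the returned references around between calls.
    static List<MutablePair> collectKeys() {
        ReusingSelector selector = new ReusingSelector();
        List<MutablePair> keys = new ArrayList<>();
        keys.add(selector.getKey("DE", "Acme"));
        keys.add(selector.getKey("FR", "Globex"));
        return keys;
    }
}
```

Both list entries refer to the same object, so the first key silently takes on the values written by the second call.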
I want to convert a Scala function that replaces the age of a certain person in a Scala Map<String,String> (which is name -> age): map.map(e => if (e._1 == "Tom") (e._1, "52") else e)
Now I need to write the same function in Java. I also have a Scala map (scala.collection.Map) as input. I checked the docs, and the method map.map(..) has this signature:
def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That])
So the function f I write like this :
AbstractFunction1<Tuple2<String, String>, Tuple2<String, String>> f = new AbstractFunction1<Tuple2<String, String>, Tuple2<String, String>>() {
@Override
public Tuple2<String, String> apply(Tuple2<String, String> e) {
if (e._1.equals("Tom")) {
return new Tuple2<>(e._1, "52");
}
return e;
}
};
But I have no idea what I should pass for CanBuildFrom. I searched some posts but never found anything that works for me.
Does someone know how to do this properly with .map(), or is there some other workaround for this kind of usage in Java? Note: I could convert the Scala map into a Java map first, but I would rather not, because the return value of the function is also a Scala map.
I want to write a utility for general memoization in Java. I want the calling code to look like this:
Util.memoize(() -> longCalculation(1));
where
private Integer longCalculation(Integer x) {
try {
Thread.sleep(1000);
} catch (InterruptedException ignored) {}
return x * 2;
}
To do this, I was thinking I could do something like this:
public class Util{
private static final Map<Object, Object> cache = new ConcurrentHashMap<>();
public interface Operator<T> {
T op();
}
public static<T> T memoize(Operator<T> o) {
ConcurrentHashMap<Object, T> memo = cache.containsKey(o.getClass()) ? (ConcurrentHashMap<Object, T>) cache.get(o.getClass()) : new ConcurrentHashMap<>();
if (memo.containsKey(o)) {
return memo.get(o);
} else {
T val = o.op();
memo.put(o, val);
return val;
}
}
}
I was expecting this to work, but I see no memoization being done. I have tracked it down to the o.getClass() being different for each invocation.
I was thinking that I could try to get the run-time type of T but I cannot figure out a way of doing that.
The answer by Lino points out a couple of flaws in the code, but doesn't work if not reusing the same lambda.
This is because o.getClass() does not return the class of what is returned by the lambda, but the class of the lambda itself. As such, the below code returns two different classes:
Util.memoize(() -> longCalculation(1));
Util.memoize(() -> longCalculation(1));
I don't think there is a good way to find out the class of the returned type without actually executing the potentially long running code, which of course is what you want to avoid.
With this in mind I would suggest passing the class as a second parameter to memoize(). This would give you:
@SuppressWarnings("unchecked")
public static <T> T memoize(Operator<T> o, Class<T> clazz) {
return (T) cache.computeIfAbsent(clazz, k -> o.op());
}
This assumes that you change the type of cache to:
private static final Map<Class<?>, Object> cache = new ConcurrentHashMap<>();
Unfortunately, you have to downcast the Object to a T, but you can suppress the resulting warning with the @SuppressWarnings("unchecked") annotation. After all, you are in control of the code and know that the class of the value will be the same as the key in the map.
An alternative would be to use Guava's ClassToInstanceMap:
private static final ClassToInstanceMap<Object> cache = MutableClassToInstanceMap.create(new ConcurrentHashMap<>());
This, however, doesn't allow you to use computeIfAbsent() without casting, since it returns an Object, so the code would become a bit more verbose:
public static <T> T memoize(Operator<T> o, Class<T> clazz) {
T cachedCalculation = cache.getInstance(clazz);
if (cachedCalculation != null) {
return cachedCalculation;
}
T calculation = o.op();
cache.put(clazz, calculation);
return calculation;
}
As a final side note, you don't need to specify your own functional interface, but you can use the Supplier interface:
@SuppressWarnings("unchecked")
public static <T> T memoize(Supplier<T> o, Class<T> clazz) {
return (T) cache.computeIfAbsent(clazz, k -> o.get());
}
The problem you have is in the line:
ConcurrentHashMap<Object, T> memo = cache.containsKey(o.getClass()) ? (ConcurrentHashMap<Object, T>) cache.get(o.getClass()) : new ConcurrentHashMap<>();
You check whether an entry with the key o.getClass() exists. If it does, you get() it; otherwise you use a newly initialized ConcurrentHashMap. The problem is that you never store this newly created map back in the cache.
So either:
Place cache.put(o.getClass(), memo); after the line above
Or even better use the computeIfAbsent() method:
ConcurrentHashMap<Object, T> memo = cache.computeIfAbsent(o.getClass(),
k -> new ConcurrentHashMap<>());
Also because you know the structure of your cache you can make it more typesafe, so that you don't have to cast everywhere:
private static final Map<Object, Map<Operator<?>, Object>> cache = new ConcurrentHashMap<>();
Also you can shorten your method even more by using the earlier mentioned computeIfAbsent():
public static <T> T memoize(Operator<T> o) {
return (T) cache
.computeIfAbsent(o.getClass(), k -> new ConcurrentHashMap<>())
.computeIfAbsent(o, k -> o.op());
}
1. (T): simply casts the unknown return type of Object to the required output type T.
2. .computeIfAbsent(o.getClass(), k -> new ConcurrentHashMap<>()): invokes the provided lambda k -> new ConcurrentHashMap<>() when there is no mapping for the key o.getClass() in cache.
3. .computeIfAbsent(o, k -> o.op()): this is invoked on the value returned by the computeIfAbsent call in step 2. If o doesn't exist in the nested map, the lambda k -> o.op() is executed, and its return value is stored in the map and returned.
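For reference, here is a self-contained sketch of this nested computeIfAbsent approach. It uses java.util.function.Supplier in place of the custom Operator interface. The class name `MemoSketch`, the `CALLS` counter, and `callsWhenReusingSameInstance` are assumptions added only to make the caching observable. Because lambdas compare by identity, this only produces cache hits when the very same instance is passed again.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class MemoSketch {
    // Outer key: the class of the supplier; inner key: the supplier instance itself.
    private static final Map<Class<?>, Map<Supplier<?>, Object>> CACHE = new ConcurrentHashMap<>();

    // Counts how often the expensive computation really runs (demonstration only).
    static final AtomicInteger CALLS = new AtomicInteger();

    @SuppressWarnings("unchecked")
    public static <T> T memoize(Supplier<T> o) {
        return (T) CACHE
                .computeIfAbsent(o.getClass(), k -> new ConcurrentHashMap<>())
                .computeIfAbsent(o, k -> o.get());
    }

    static Integer longCalculation(Integer x) {
        CALLS.incrementAndGet();
        return x * 2;
    }

    // Passes the same Supplier instance twice and reports how often it was evaluated.
    static int callsWhenReusingSameInstance() {
        CALLS.set(0);
        Supplier<Integer> calc = () -> longCalculation(1);
        memoize(calc);
        memoize(calc);
        return CALLS.get();
    }
}
```

Writing the lambda out twice at two call sites would generally not hit the cache, since each lambda expression yields its own object.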
I have a function like this:
private static Map<String, ResponseTimeStats> perOperationStats(List<PassedMetricData> scopedMetrics, Function<PassedMetricData, String> classifier)
{
Map<String, List<PassedMetricData>> operationToDataMap = scopedMetrics.stream()
.collect(groupingBy(classifier));
return operationToDataMap.entrySet().stream()
.collect(toMap(Map.Entry::getKey, e -> StatUtils.mergeStats(e.getValue())));
}
Is there any way to have the groupingBy call do the transformation that I do explicitly in the second statement, so I don't have to stream over the map separately?
Update
Here is what mergeStats() looks like:
public static ResponseTimeStats mergeStats(Collection<PassedMetricData> metricDataList)
{
ResponseTimeStats stats = new ResponseTimeStats();
metricDataList.forEach(data -> stats.merge(data.stats));
return stats;
}
If you can rewrite StatUtils.mergeStats into a Collector, you could just write
return scopedMetrics.stream().collect(groupingBy(classifier, mergeStatsCollector));
And even if you can't do this, you could write
return scopedMetrics.stream().collect(groupingBy(classifier,
collectingAndThen(toList(), StatUtils::mergeStats)));
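As a rough idea of what such a collector might look like, here is a self-contained sketch built with Collector.of. `Stats` and `Metric` are simplified, hypothetical stand-ins for ResponseTimeStats and PassedMetricData (whose real fields are not shown in the question), and `mergeStatsCollector` is just one possible way to write it.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class GroupingSketch {
    // Hypothetical stand-in for ResponseTimeStats: it only keeps a running total.
    static final class Stats {
        long total;

        void add(long value) {
            total += value;
        }

        Stats merge(Stats other) {
            total += other.total;
            return this;
        }
    }

    // Hypothetical stand-in for PassedMetricData.
    static final class Metric {
        final String operation;
        final long millis;

        Metric(String operation, long millis) {
            this.operation = operation;
            this.millis = millis;
        }
    }

    // Supplier creates the container, accumulator folds in one element,
    // combiner merges two partial containers (used by parallel streams).
    static Collector<Metric, Stats, Stats> mergeStatsCollector() {
        return Collector.of(Stats::new, (stats, m) -> stats.add(m.millis), Stats::merge);
    }

    // Groups and merges in one pass, without streaming over an intermediate map.
    static Map<String, Stats> perOperation(List<Metric> metrics) {
        return metrics.stream()
                .collect(Collectors.groupingBy((Metric m) -> m.operation, mergeStatsCollector()));
    }
}
```

With a downstream collector like this, each group is reduced as elements arrive, so no intermediate list per key is needed.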
In order to group the PassedMetricData instances, you must consume the entire Stream since, for example, the first and last PassedMetricData might be grouped into the same group.
That's why the grouping must be a terminal operation on the original Stream and you must create a new Stream in order to do the transformation on the results of this grouping.
You could chain these two statements, but it won't make much of a difference:
private static Map<String, ResponseTimeStats> perOperationStats(List<PassedMetricData> scopedMetrics, Function<PassedMetricData, String> classifier)
{
return scopedMetrics.stream()
.collect(groupingBy(classifier)).entrySet().stream()
.collect(toMap(Map.Entry::getKey, e -> StatUtils.mergeStats(e.getValue())));
}
I would like to find a way to take the object-specific routine below and abstract it into a method to which you can pass a class, a list, and a field name to get back a Map.
A general pointer to the pattern used would be enough to get me started in the right direction.
Map<String,Role> mapped_roles = new HashMap<String,Role>();
List<Role> p_roles = (List<Role>) c.list();
for (Role el : p_roles) {
mapped_roles.put(el.getName(), el);
}
to this? (Pseudo code)
Map<String,?> MapMe(Class clz, Collection list, String methodName)
Map<String,?> map = new HashMap<String,?>();
for (clz el : list) {
map.put(el.methodName(), el);
}
Is it possible?
Using Guava (formerly Google Collections):
Map<String,Role> mappedRoles = Maps.uniqueIndex(yourList, Functions.toStringFunction());
Or, if you want to supply your own method that makes a String out of the object:
Map<String,Role> mappedRoles = Maps.uniqueIndex(yourList, new Function<Role,String>() {
public String apply(Role from) {
return from.getName(); // or something else
}});
Here's what I would do. I am not entirely sure if I am handling generics right, but oh well:
public <T> Map<String, T> mapMe(Collection<T> list) {
Map<String, T> map = new HashMap<String, T>();
for (T el : list) {
map.put(el.toString(), el);
}
return map;
}
Just pass a Collection to it, and have your classes implement toString() to return the name. Polymorphism will take care of it.
Java 8 streams and method references make this so easy you don't need a helper method for it.
Map<String, Foo> map = listOfFoos.stream()
.collect(Collectors.toMap(Foo::getName, Function.identity()));
If there may be duplicate keys, you can aggregate the values with the toMap overload that takes a value merge function, or you can use groupingBy to collect into a list:
//taken right from the Collectors javadoc
Map<Department, List<Employee>> byDept = employees.stream()
.collect(Collectors.groupingBy(Employee::getDepartment));
As shown above, none of this is specific to String -- you can create an index on any type.
If you have a lot of objects to process and/or your indexing function is expensive, you can go parallel by using Collection.parallelStream() or stream().parallel() (they do the same thing). In that case you might use toConcurrentMap or groupingByConcurrent, as they allow the stream implementation to just blast elements into a ConcurrentMap instead of making separate maps for each thread and then merging them.
If you don't want to commit to Foo::getName (or any specific method) at the call site, you can use a Function passed in by a caller, stored in a field, etc.. Whoever actually creates the Function can still take advantage of method reference or lambda syntax.
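If you do want a reusable helper that takes the key function from the caller, a minimal sketch could look like the following. The names `IndexSketch` and `indexBy` are made up for this example. Note that toMap without a merge function throws an IllegalStateException on duplicate keys.

```java
import java.util.Collection;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class IndexSketch {
    // Builds a map from key to element; the caller decides how the key is derived.
    public static <K, T> Map<K, T> indexBy(Collection<T> items,
                                           Function<? super T, ? extends K> keyFn) {
        return items.stream()
                .collect(Collectors.toMap(keyFn, Function.identity()));
    }
}
```

A call site might then read IndexSketch.indexBy(roles, Role::getName), with the method reference supplied by whoever knows the element type.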
Avoid reflection like the plague.
Unfortunately, Java's syntax for this is verbose. (A recent JDK7 proposal would make it much more concise.)
interface ToString<T> {
String toString(T obj);
}
public static <T> Map<String,T> stringIndexOf(
Iterable<T> things,
ToString<T> toString
) {
Map<String,T> map = new HashMap<String,T>();
for (T thing : things) {
map.put(toString.toString(thing), thing);
}
return map;
}
Currently call as:
Map<String,Thing> map = stringIndexOf(
things,
new ToString<Thing>() { public String toString(Thing thing) {
return thing.getSomething();
}}
);
In JDK7, it may be something like:
Map<String,Thing> map = stringIndexOf(
things,
{ thing -> thing.getSomething(); }
);
(Might need a yield in there.)
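Lambda syntax eventually arrived in Java 8 rather than JDK7, and it looks slightly different from the proposal above. Here is a self-contained sketch of the same helper called with a Java 8 lambda. The wrapping class `StringIndex` and the `demo` method are assumptions added for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class StringIndex {
    // Single abstract method, so a lambda can implement it.
    interface ToString<T> {
        String toString(T obj);
    }

    public static <T> Map<String, T> stringIndexOf(Iterable<T> things, ToString<T> toString) {
        Map<String, T> map = new HashMap<String, T>();
        for (T thing : things) {
            map.put(toString.toString(thing), thing);
        }
        return map;
    }

    // Example call: index integers by a string derived from each value.
    static Map<String, Integer> demo() {
        return stringIndexOf(Arrays.asList(1, 22), n -> "n" + n);
    }
}
```

No yield keyword is needed; an expression lambda returns its value directly.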
Using reflection and generics:
public static <T> Map<String, T> MapMe(Class<T> clz, Collection<T> list, String methodName)
throws Exception{
Map<String, T> map = new HashMap<String, T>();
Method method = clz.getMethod(methodName);
for (T el : list){
map.put((String)method.invoke(el), el);
}
return map;
}
In your documentation, make sure you mention that the return type of the method must be a String. Otherwise, it will throw a ClassCastException when it tries to cast the return value.
If you're sure that each object in the List will have a unique index, use Guava with Jorn's suggestion of Maps.uniqueIndex.
If, on the other hand, more than one object may have the same value for the index field (which, while perhaps not true for your specific example, is true in many use cases for this sort of thing), the more general way to do this indexing is to use Multimaps.index(Iterable<V> values, Function<? super V,K> keyFunction) to create an ImmutableListMultimap<K,V> that maps each key to one or more matching values.
Here's an example that uses a custom Function that creates an index on a specific property of an object:
List<Foo> foos = ...
ImmutableListMultimap<String, Foo> index = Multimaps.index(foos,
new Function<Foo, String>() {
public String apply(Foo input) {
return input.getBar();
}
});
// iterate over all Foos that have "baz" as their Bar property
for (Foo foo : index.get("baz")) { ... }