Background
I am writing an OpenRewrite recipe to add some comments to the Java code. To avoid inserting the comment at unnecessary points, I have written the following code (it detects the already existing comment) and it worked properly:
public class JtestSuppressDelombokVisitor extends JavaIsoVisitor<ExecutionContext> {
@Override
public MethodDeclaration visitMethodDeclaration(MethodDeclaration methodDecl, ExecutionContext context) {
// (snip)
Iterator<Comment> it = methodDecl.getPrefix().getComments().iterator();
boolean alreadyHasSuppressComment = false;
while (it.hasNext()) {
Comment comment = it.next();
PrintOutputCapture<String> p = new PrintOutputCapture<String>("");
comment.printComment(this.getCursor(), p);
if (p.out.toString().matches(".*parasoft-begin-suppress\\sALL.*")) {
alreadyHasSuppressComment = true;
break;
}
}
// (snip)
return methodDecl;
}
}
Problem
I have tried to refactor the code above with the Stream API. The code needs the result of this.getCursor() in the process, but I couldn't find a way to pass it to the Predicate instance:
boolean alreadyHasSuppressComment = methodDecl.getPrefix().getComments().stream()
.anyMatch(new Predicate<Comment>() {
@Override
public boolean test(Comment comment) {
PrintOutputCapture<String> p = new PrintOutputCapture<String>("");
comment.printComment(this.getCursor(), p); // <- Can't call `this.getCursor()` on the `JtestSuppressDelombokVisitor` class in the `Predicate`
return p.out.toString().matches(".*parasoft-begin-suppress\\sALL.*");
}
});
Question
Is there any way to pass an object other than the stream's own elements from outside into the Predicate?
Or is it impossible to write such code with the Stream API?
You need to qualify this with the outer class name, because a bare this keyword inside the anonymous class refers to the anonymous Predicate instance.
comment.printComment(JtestSuppressDelombokVisitor.this.getCursor(), p);
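As a rough illustration of the same point without the OpenRewrite dependency, here is a minimal sketch. The class OuterThisDemo and its String-returning getCursor() are hypothetical stand-ins for the visitor, not OpenRewrite API. It shows the qualified this inside an anonymous Predicate next to a lambda, where this keeps referring to the enclosing instance and effectively final locals are captured as well.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for the visitor: getCursor() just returns a String here.
class OuterThisDemo {
    private final String cursor = "cursor-of-visitor";

    String getCursor() {
        return cursor;
    }

    // Anonymous class: a bare `this` is the Predicate, so the outer instance
    // has to be named explicitly as OuterThisDemo.this.
    boolean matchWithAnonymousClass(List<String> comments, final String marker) {
        return comments.stream().anyMatch(new Predicate<String>() {
            @Override
            public boolean test(String comment) {
                String c = OuterThisDemo.this.getCursor();
                return c != null && comment.contains(marker);
            }
        });
    }

    // Lambda: `this` still means the enclosing OuterThisDemo instance, and
    // effectively final locals such as `marker` are captured automatically.
    boolean matchWithLambda(List<String> comments, String marker) {
        return comments.stream().anyMatch(comment -> {
            String c = this.getCursor();
            return c != null && comment.contains(marker);
        });
    }

    public static void main(String[] args) {
        List<String> comments = Arrays.asList("// TODO", "// parasoft-begin-suppress ALL");
        OuterThisDemo demo = new OuterThisDemo();
        System.out.println(demo.matchWithAnonymousClass(comments, "parasoft-begin-suppress"));
        System.out.println(demo.matchWithLambda(comments, "parasoft-begin-suppress"));
    }
}
```

With a lambda there is no new this scope, so the qualification is only needed for the anonymous-class form.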
Related
The usage of method references as listeners in an observer pattern does not work.
Example:
public class ObserverWithMethodReferenceAsListenerTest {
class ListenerCurator {
private final Set<Consumer<String>> listeners = new HashSet<>();
public boolean register(final Consumer<String> consumer) {
return this.listeners.add(consumer);
}
public boolean unregister(final Consumer<String> consumer) {
return this.listeners.remove(consumer);
}
public int getListenersCount() {
return this.listeners.size();
}
}
class MyListenerLeaks {
public void theListener(final String someString) {
// the listener
}
}
class MyListenerWorks {
public Consumer<String> consumer = str -> {
theListener(str);
};
public void theListener(final String someString) {
// the listener
}
}
@Test
public void testListenerLeak() {
ListenerCurator lc = new ListenerCurator();
MyListenerLeaks ml = new MyListenerLeaks();
lc.register(ml::theListener);
Assert.assertEquals(1, lc.getListenersCount());
lc.register(ml::theListener);
// expected 1 but there are 2 listeners
lc.unregister(ml::theListener);
// there are 2 listeners registered here
}
@Test
public void testListenerWorks() {
ListenerCurator lc = new ListenerCurator();
MyListenerWorks ml = new MyListenerWorks();
lc.register(ml.consumer);
Assert.assertEquals(1, lc.getListenersCount());
lc.register(ml.consumer);
Assert.assertEquals(1, lc.getListenersCount());
lc.unregister(ml.consumer);
Assert.assertEquals(0, lc.getListenersCount());
}
}
Conclusion: does each evaluation of ml::theListener generate a new object id for the reference? Is that why multiple listeners are registered and cannot be removed individually?
The MyListenerWorks class uses a member with a "constant" object id and works. Is there another workaround for this? Are my assumptions correct?
After adding some breakpoints in HashSet#add and HashSet#remove, I got the following results for your questions:
1. each referencing of the listener method with ml::theListener generates a new object id for the reference? Right?
Ans: Not an "object id" as such. In my debugging session, each evaluation of ml::theListener produced a new object (a different memory address), and that new object is what was added to the HashSet. That is why you cannot remove the listener in testListenerLeak: the object passed to unregister is not the one that was registered, because you did not keep the reference you registered.
2. The MyListenerWorks class uses a member with a "constant" object id and works. Is there another workaround for this? Are my assumptions correct?
You could take a look at the Observer pattern in Spring, Vue, or some other well-known project; they have something similar to what you want. Most of what I have read about this pattern is in the context of the event-driven model, where instanceof is used to check subclasses against their superclass.
From the Oracle documentation on Method References:
Method references enable you to do this; they are compact, easy-to-read lambda expressions for methods that already have a name.
A method reference is not a constant.
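One way to work with this, sketched below under the assumption that the registry is a plain HashSet as in the question, is to evaluate the method reference once, keep the resulting Consumer in a variable, and use that same variable for both register and unregister. The class names MethodRefIdentityDemo, ListenerRegistry and Receiver are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

class MethodRefIdentityDemo {
    // Returns the listener count after two registrations and after unregistering.
    static int[] run() {
        ListenerRegistry registry = new ListenerRegistry();
        Receiver receiver = new Receiver();

        // Evaluate the method reference once and keep the resulting object.
        Consumer<String> handle = receiver::onEvent;

        registry.register(handle);
        registry.register(handle);            // same object, so the set ignores it
        int afterRegister = registry.count();

        registry.unregister(handle);          // same object, so removal succeeds
        int afterUnregister = registry.count();

        return new int[] {afterRegister, afterUnregister};
    }

    public static void main(String[] args) {
        int[] counts = run();
        System.out.println(counts[0] + " then " + counts[1]);
    }
}

class ListenerRegistry {
    private final Set<Consumer<String>> listeners = new HashSet<>();

    boolean register(Consumer<String> listener) {
        return listeners.add(listener);
    }

    boolean unregister(Consumer<String> listener) {
        return listeners.remove(listener);
    }

    int count() {
        return listeners.size();
    }
}

class Receiver {
    void onEvent(String message) {
        // the listener
    }
}
```

This is the same idea as the consumer field in MyListenerWorks, except that the caller holds the handle instead of the listener class.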
I would like to create an array_agg UDF for Apache Drill to be able to aggregate all values of a group to a list of values.
This should work with any major types (required, optional) and minor types (varchar, dict, map, int, etc.)
However, I get the impression that Apache Drill's UDF API does not really make use of inheritance and generics. Each type has its own writer and handler, and they cannot be abstracted to handle any type. E.g., the ValueHolder interface seems to be purely cosmetic and cannot be used to have type-agnostic hooking of UDFs to any type.
My current implementation
I tried to solve this by using Java's reflection so I could use the ListHolder's write function independent of the holder of the original value.
However, I then ran into the limitations of the @FunctionTemplate annotation.
I cannot create a general UDF annotation for any value (I tried it with the interface ValueHolder: @Param ValueHolder input).
So to me it seems like the only way to support different types is to have separate classes for each type. But I can't even abstract much and work on any @Param input, because input is only visible in the class where it's defined (i.e. type specific).
I based my implementation on https://issues.apache.org/jira/browse/DRILL-6963
and created the following two classes for required and optional varchars (how can this be unified in the first place?)
@FunctionTemplate(
name = "array_agg",
scope = FunctionScope.POINT_AGGREGATE,
nulls = NullHandling.INTERNAL
)
public static class VarChar_Agg implements DrillAggFunc {
@Param org.apache.drill.exec.expr.holders.VarCharHolder input;
@Workspace ObjectHolder agg;
@Output org.apache.drill.exec.vector.complex.writer.BaseWriter.ComplexWriter out;
@Override
public void setup() {
agg = new ObjectHolder();
}
@Override
public void reset() {
agg = new ObjectHolder();
}
@Override public void add() {
if (agg.obj == null) {
// Initialise list object for output
agg.obj = out.rootAsList();
}
org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter listWriter =
(org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter)agg.obj;
listWriter.varChar().write(input);
}
@Override
public void output() {
((org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter)agg.obj).endList();
}
}
@FunctionTemplate(
name = "array_agg",
scope = FunctionScope.POINT_AGGREGATE,
nulls = NullHandling.INTERNAL
)
public static class NullableVarChar_Agg implements DrillAggFunc {
@Param NullableVarCharHolder input;
@Workspace ObjectHolder agg;
@Output org.apache.drill.exec.vector.complex.writer.BaseWriter.ComplexWriter out;
@Override
public void setup() {
agg = new ObjectHolder();
}
@Override
public void reset() {
agg = new ObjectHolder();
}
@Override public void add() {
if (agg.obj == null) {
// Initialise list object for output
agg.obj = out.rootAsList();
}
if (input.isSet != 1) {
return;
}
org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter listWriter =
(org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter)agg.obj;
org.apache.drill.exec.expr.holders.VarCharHolder outHolder = new org.apache.drill.exec.expr.holders.VarCharHolder();
outHolder.start = input.start;
outHolder.end = input.end;
outHolder.buffer = input.buffer;
listWriter.varChar().write(outHolder);
}
@Override
public void output() {
((org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter)agg.obj).endList();
}
}
Interestingly, I can't import org.apache.drill.exec.vector.complex.writer.BaseWriter to make the whole thing easier because then Apache Drill would not find it.
So I have to put the entire package path for everything in org.apache.drill.exec.vector.complex.writer in the code.
Furthermore, I'm using the deprecated ObjectHolder. Is there a better solution?
Anyway: These work so far, e.g. with this query:
SELECT
MIN(tbl.`timestamp`) AS start_view,
MAX(tbl.`timestamp`) AS end_view,
array_agg(tbl.eventLabel) AS label_agg
FROM `dfs.root`.`/path/to/avro/folder` AS tbl
WHERE tbl.data.slug IS NOT NULL
GROUP BY tbl.data.slug
However, when I use ORDER BY, I get this:
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: UnsupportedOperationException: NULL
Fragment 0:0
Additionally, I tried more complex types, namely maps/dicts.
Interestingly, when I call SELECT sqlTypeOf(tbl.data) FROM tbl, I get MAP.
But when I write UDFs, the query planner complains about having no UDF array_agg for type dict.
Anyway, I wrote a version for dicts:
@FunctionTemplate(
name = "array_agg",
scope = FunctionScope.POINT_AGGREGATE,
nulls = NullHandling.INTERNAL
)
public static class Map_Agg implements DrillAggFunc {
@Param MapHolder input;
@Workspace ObjectHolder agg;
@Output org.apache.drill.exec.vector.complex.writer.BaseWriter.ComplexWriter out;
@Override
public void setup() {
agg = new ObjectHolder();
}
@Override
public void reset() {
agg = new ObjectHolder();
}
@Override public void add() {
if (agg.obj == null) {
// Initialise list object for output
agg.obj = out.rootAsList();
}
org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter listWriter =
(org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter) agg.obj;
//listWriter.copyReader(input.reader);
input.reader.copyAsValue(listWriter);
}
@Override
public void output() {
((org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter)agg.obj).endList();
}
}
@FunctionTemplate(
name = "array_agg",
scope = FunctionScope.POINT_AGGREGATE,
nulls = NullHandling.INTERNAL
)
public static class Dict_agg implements DrillAggFunc {
@Param DictHolder input;
@Workspace ObjectHolder agg;
@Output org.apache.drill.exec.vector.complex.writer.BaseWriter.ComplexWriter out;
@Override
public void setup() {
agg = new ObjectHolder();
}
@Override
public void reset() {
agg = new ObjectHolder();
}
@Override public void add() {
if (agg.obj == null) {
// Initialise list object for output
agg.obj = out.rootAsList();
}
org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter listWriter =
(org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter) agg.obj;
//listWriter.copyReader(input.reader);
input.reader.copyAsValue(listWriter);
}
@Override
public void output() {
((org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter)agg.obj).endList();
}
}
But here, I get an empty list in the field data_agg for my query:
SELECT
MIN(tbl.`timestamp`) AS start_view,
MAX(tbl.`timestamp`) AS end_view,
array_agg(tbl.data) AS data_agg
FROM `dfs.root`.`/path/to/avro/folder` AS tbl
GROUP BY tbl.data.viewSlag
Summary of questions
Most importantly: How do I create an array_agg UDF for Apache Drill?
How to make UDFs type-agnostic/general purpose? Do I really have to implement an entire class for each Nullable, Required and Repeated version of all types? That's a lot to do and quite tedious. Isn't there a way to handle values in an UDF agnostic to the underlying types?
I wish Apache Drill would just use what Java offers here with generic function types, specialised function overloading and inheritance within their own type system. Am I missing something on how to do that?
How can I fix the NULL problem when I use ORDER BY on my varchar version of the aggregate?
How can I fix the problem where my aggregate of maps/dicts is an empty list?
Is there an alternative to using the deprecated ObjectHolder?
To answer your question, unfortunately you've run into one of the limits of the Drill Aggregate UDF API, which is that it can only return simple data types. It would be a great improvement to Drill to fix this, but that is the current status. If you're interested in discussing that further, please start a thread on the Drill user group and/or Slack channel. I don't think it is impossible, but it would require some modification to the Drill internals. IMHO it would be well worth it because there are a few other UDFs that I'd like to implement that need this feature.
The second part of your question is how to make UDFs type agnostic and once again... you've found yet another bit of ugliness in the UDF API. :-) If you do some digging in the codebase, you'll see that most of the Math functions have separate versions that accept FLOAT, INT, etc.
Regarding the aggregate of null or empty lists. I actually have some good news here... The current way of doing that is to provide two versions of the function, one which accepts regular holders and the second which accepts nullable holders and returns an empty list or map if the inputs are null. Yes, this sucks, but the additional good news is that I'm working on cleaning this up and hopefully will have a PR submitted soon that will eliminate the need to do this.
Regarding the ObjectHolder, I wrote a median function that uses a few Stacks to compute a streaming median and I used the ObjectHolder for that. I think it will be with us for some time as there is no alternative at the moment.
I hope this answers your questions.
Is there a way to check if all objects in a list have the same attribute with Google Guava API?
Moreover, is there a way to send more parameters to Predicate?
Let's say I want to filter all my objects with a string that I am getting from the user,
and I want the Predicate to use this parameter when applying the filter.
You can create your own predicate as follows:
class MyPredicate implements Predicate<MyObject> {
private final String parameter;
public MyPredicate(String parameter) {this.parameter = parameter;}
@Override
public boolean apply(MyObject input) {
// apply predicate using parameter, e.g. (getName() is a hypothetical accessor):
return parameter.equals(input.getName());
}
}
You can then filter by doing:
Iterables.filter(myIterable, new MyPredicate(myParameter));
You should be wary though that this performs a lazy filter.
Is there a way to check if all objects in a list have the same attribute with Google Guava API?
Yes:
Foo first = list.get(0).getFoo();
boolean allSameFoo = Iterables.all(list, element -> element.getFoo().equals(first));
Or, if you're not using Java 8 yet:
final Foo first = list.get(0).getFoo();
boolean allSameFoo = Iterables.all(list, new Predicate<Bar>() {
@Override
public boolean apply(Bar element) {
return element.getFoo().equals(first);
}
});
is there a way to send more parameters to Predicate
Yes:
String s = getFromUser();
Iterables.filter(list, element -> element.getFoo().equals(s));
Or, if you're not using Java 8 yet:
final String s = getFromUser();
Iterables.filter(list, new Predicate<Bar>() {
@Override
public boolean apply(Bar element) {
return element.getFoo().equals(s);
}
});
It seems you don't really know how inner classes work, so you should read the tutorial about them: https://docs.oracle.com/javase/tutorial/java/javaOO/nested.html.
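The same two ideas can be sketched with the JDK's own java.util.function.Predicate and streams instead of Guava (a swapped-in API, chosen only so the example runs without extra dependencies). The class PredicateParameterDemo and its helpers are hypothetical. Note that the list.get(0) approach above throws on an empty list, so this sketch treats an empty list as "all the same".

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class PredicateParameterDemo {
    // Factory method: the returned predicate captures the `prefix` argument.
    static Predicate<String> startsWith(String prefix) {
        return s -> s.startsWith(prefix);
    }

    // Keeps only the elements accepted by the parameterised predicate.
    static List<String> filter(List<String> list, String prefix) {
        return list.stream().filter(startsWith(prefix)).collect(Collectors.toList());
    }

    // True if every element yields the same attribute value as the first one.
    static <T> boolean allSame(List<T> list, Function<? super T, ?> attribute) {
        if (list.isEmpty()) {
            return true; // nothing to compare; list.get(0) would throw here
        }
        Object first = attribute.apply(list.get(0));
        return list.stream().allMatch(e -> Objects.equals(attribute.apply(e), first));
    }

    public static void main(String[] args) {
        System.out.println(filter(Arrays.asList("apple", "avocado", "banana"), "a"));
        System.out.println(allSame(Arrays.asList("ab", "cd"), String::length));
        System.out.println(allSame(Arrays.asList("ab", "c"), String::length));
    }
}
```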
With the introduction of generics, I am reluctant to perform instanceof or casting as much as possible. But I don't see a way around it in this scenario:
for (CacheableObject<ICacheable> cacheableObject : cacheableObjects) {
ICacheable iCacheable = cacheableObject.getObject();
if (iCacheable instanceof MyObject) {
MyObject myObject = (MyObject) iCacheable;
myObjects.put(myObject.getKey(), myObject);
} else if (iCacheable instanceof OtherObject) {
OtherObject otherObject = (OtherObject) iCacheable;
otherObjects.put(otherObject.getKey(), otherObject);
}
}
In the above code, I know that my ICacheables should only ever be instances of MyObject, or OtherObject, and depending on this I want to put them into 2 separate maps and then perform some processing further down.
I'd be interested if there is another way to do this without my instanceof check.
Thanks
You could use double dispatch. No promises it's a better solution, but it's an alternative.
Code Example
import java.util.HashMap;
public class Example {
public static void main(String[] argv) {
Example ex = new Example();
ICacheable[] cacheableObjects = new ICacheable[]{new MyObject(), new OtherObject()};
for (ICacheable iCacheable : cacheableObjects) {
// depending on whether the object is a MyObject or an OtherObject,
// the .put(Example) method will double dispatch to either
// the put(MyObject) or put(OtherObject) method, below
iCacheable.put(ex);
}
System.out.println("myObjects: "+ex.myObjects.size());
System.out.println("otherObjects: "+ex.otherObjects.size());
}
private HashMap<String, MyObject> myObjects = new HashMap<String, MyObject>();
private HashMap<String, OtherObject> otherObjects = new HashMap<String, OtherObject>();
public Example() {
}
public void put(MyObject myObject) {
myObjects.put(myObject.getKey(), myObject);
}
public void put(OtherObject otherObject) {
otherObjects.put(otherObject.getKey(), otherObject);
}
}
interface ICacheable {
public String getKey();
public void put(Example ex);
}
class MyObject implements ICacheable {
public String getKey() {
return "MyObject"+this.hashCode();
}
public void put(Example ex) {
ex.put(this);
}
}
class OtherObject implements ICacheable {
public String getKey() {
return "OtherObject"+this.hashCode();
}
public void put(Example ex) {
ex.put(this);
}
}
The idea here is that - instead of casting or using instanceof - you call the iCacheable object's .put(...) method which passes itself back to the Example object's overloaded methods. Which method is called depends on the type of that object.
See also the Visitor pattern. My code example smells because the ICacheable.put(...) method is incohesive - but using the interfaces defined in the Visitor pattern can clean up that smell.
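For comparison, a minimal sketch of what the Visitor variant might look like. All names here (VisitorSketch, CacheableVisitor, MyItem, OtherItem, SortingVisitor) are hypothetical and not taken from the question. The cacheable types only know about a visitor interface, so they no longer depend on the concrete Example class.

```java
import java.util.HashMap;
import java.util.Map;

class VisitorSketch {
    // One visit overload per concrete cacheable type.
    interface CacheableVisitor {
        void visit(MyItem item);
        void visit(OtherItem item);
    }

    interface Cacheable {
        String getKey();
        void accept(CacheableVisitor visitor);
    }

    static class MyItem implements Cacheable {
        private final String key;
        MyItem(String key) { this.key = key; }
        public String getKey() { return key; }
        // `this` has the static type MyItem here, so visit(MyItem) is chosen.
        public void accept(CacheableVisitor visitor) { visitor.visit(this); }
    }

    static class OtherItem implements Cacheable {
        private final String key;
        OtherItem(String key) { this.key = key; }
        public String getKey() { return key; }
        // `this` has the static type OtherItem here, so visit(OtherItem) is chosen.
        public void accept(CacheableVisitor visitor) { visitor.visit(this); }
    }

    // The visitor owns the two maps; adding new behaviour means adding a visitor.
    static class SortingVisitor implements CacheableVisitor {
        final Map<String, MyItem> myItems = new HashMap<>();
        final Map<String, OtherItem> otherItems = new HashMap<>();
        public void visit(MyItem item) { myItems.put(item.getKey(), item); }
        public void visit(OtherItem item) { otherItems.put(item.getKey(), item); }
    }

    static SortingVisitor sort(Cacheable... items) {
        SortingVisitor visitor = new SortingVisitor();
        for (Cacheable item : items) {
            item.accept(visitor);
        }
        return visitor;
    }

    public static void main(String[] args) {
        SortingVisitor v = sort(new MyItem("a"), new MyItem("b"), new OtherItem("c"));
        System.out.println(v.myItems.size() + " / " + v.otherItems.size());
    }
}
```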
Why can't I just call this.put(iCacheable) from the Example class?
In Java, overriding is always bound at runtime, but overloading is a little more complicated: dynamic dispatch means that the implementation of a method is chosen at runtime, but the method's signature is nonetheless determined at compile time. (See the Java Language Specification, section 8.4.9, for more info, and also the puzzler "Making a Hash of It" on page 137 of the book Java Puzzlers.)
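A small sketch of that compile-time choice, using a hypothetical OverloadDemo class: the overload is picked from the declared type of the argument expression, not from the runtime type of the object it refers to.

```java
class OverloadDemo {
    static String describe(Object o) {
        return "Object";
    }

    static String describe(String s) {
        return "String";
    }

    // The variable is declared as Object, so describe(Object) is selected
    // at compile time even though it holds a String at runtime.
    static String viaDeclaredType() {
        Object value = "hello";
        return describe(value);
    }

    // The cast changes the compile-time type, so describe(String) is selected.
    static String viaCast() {
        Object value = "hello";
        return describe((String) value);
    }

    public static void main(String[] args) {
        System.out.println(viaDeclaredType()); // prints "Object"
        System.out.println(viaCast());         // prints "String"
    }
}
```

This is why this.put(iCacheable) cannot pick between put(MyObject) and put(OtherObject): the argument's declared type is ICacheable, and no overload accepts that type.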
Is there no way to combine the cached objects in each map into one map? Their keys could keep them separated so you could store them in one map. If you can't do that then you could have a
Map<Class,Map<Key,ICacheable>>
then do this:
Map<Class,Map<Key,ICacheable>> cache = ...;
public void cache( ICacheable cacheable ) {
if( !cache.containsKey( cacheable.getClass() ) ) {
cache.put( cacheable.getClass(), new HashMap<Key,ICacheable>() );
}
cache.get(cacheable.getClass()).put( cacheable.getKey(), cacheable );
}
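On Java 8 and later, the check-then-put step can be folded into Map.computeIfAbsent. The sketch below is only an illustration: ClassKeyedCache is a hypothetical class, and it uses String keys and Object values in place of the question's Key and ICacheable types.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class ClassKeyedCache {
    // Outer map: runtime class -> inner map of key -> cached object.
    private final Map<Class<?>, Map<String, Object>> cache = new HashMap<>();

    void put(String key, Object value) {
        // Creates the inner map the first time a class is seen, then stores the value.
        cache.computeIfAbsent(value.getClass(), c -> new HashMap<>()).put(key, value);
    }

    // Returns the inner map for a class, or an empty map if none was cached.
    Map<String, Object> bucket(Class<?> type) {
        return cache.getOrDefault(type, Collections.emptyMap());
    }

    public static void main(String[] args) {
        ClassKeyedCache c = new ClassKeyedCache();
        c.put("a", "first");
        c.put("b", 42);
        c.put("c", "second");
        System.out.println(c.bucket(String.class).size());
        System.out.println(c.bucket(Integer.class).size());
    }
}
```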
You can do the following:
1. Add a method to your ICacheable interface that will handle placing the object into one of two Maps, given as arguments to the method.
2. Implement this method in each of your two implementing classes, having each class decide which Map to put itself in.
3. Remove the instanceof checks in your for loop, and replace the put call with a call to the new method defined in step 1.
This is not a good design, however, because if you ever have another class that implements this interface, and a third map, then you'll need to pass another Map to your new method.
I have read this question and I'm still not sure whether it is possible to keep pointers to methods in an array in Java. If anyone knows if this is possible (or not), it would be a real help. I'm trying to find an elegant solution of keeping a list of Strings and associated functions without writing a mess of hundreds of if statements.
Cheers
Java doesn't have a function pointer per se (or "delegate" in C# parlance). This sort of thing tends to be done with anonymous subclasses.
public interface Worker {
void work();
}
class A {
void foo() { System.out.println("A"); }
}
class B {
void bar() { System.out.println("B"); }
}
A a = new A();
B b = new B();
Worker[] workers = new Worker[] {
new Worker() { public void work() { a.foo(); } },
new Worker() { public void work() { b.bar(); } }
};
for (Worker worker : workers) {
worker.work();
}
You can achieve the same result with the functor pattern. For instance, having an abstract class:
abstract class Functor
{
public abstract void execute();
}
Your "functions" would in fact be the execute method in the derived classes. Then you create an array of functors and populate it with the appropriate derived classes:
class DoSomething extends Functor
{
public void execute()
{
System.out.println("blah blah blah");
}
}
Functor [] myArray = new Functor[10];
myArray[5] = new DoSomething();
And then you can invoke:
myArray[5].execute();
It is possible, you can use an array of Method. Grab them using the Reflection API (edit: they're not functions since they're not standalone and have to be associated with a class instance, but they'd do the job -- just don't expect something like closures)
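A minimal sketch of the array-of-Method idea, using only java.lang.reflect and two no-argument String methods as stand-ins. The class MethodArrayDemo is hypothetical. Each Method still needs a target object at invocation time, which is the caveat mentioned above.

```java
import java.lang.reflect.Method;

class MethodArrayDemo {
    // Looks up two no-argument String methods and applies them in order.
    static String applyAll(String input) {
        try {
            Method[] steps = {
                String.class.getMethod("trim"),
                String.class.getMethod("toUpperCase")
            };
            Object current = input;
            for (Method step : steps) {
                // The first argument of invoke is the instance the method runs on.
                current = step.invoke(current);
            }
            return (String) current;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(applyAll("  abc  ")); // prints "ABC"
    }
}
```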
Java does not have pointers (only references), nor does it have functions (only methods), so it's doubly impossible for it to have pointers to functions. What you can do is define an interface with a single method in it, have your classes that offer such a method declare they implement said interface, and make a vector with references to such an interface, to be populated with references to the specific objects on which you want to call that method. The only constraint, of course, is that all the methods must have the same signature (number and type of arguments and returned values).
Otherwise, you can use reflection/introspection (e.g. the Method class), but that's not normally the simplest, most natural approach.
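Since Java 8, the single-method interface described above can be one of the JDK's functional interfaces, and the entries can be lambdas or method references. The sketch below assumes Java 8 or later and uses a hypothetical DispatchTableDemo class. It maps Strings to Runnable instances, which is one way to avoid a long chain of if statements.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DispatchTableDemo {
    private final StringBuilder log = new StringBuilder();
    private final Map<String, Runnable> commands = new LinkedHashMap<>();

    DispatchTableDemo() {
        // Each String is associated with a piece of behaviour.
        commands.put("start", this::start);
        commands.put("stop", this::stop);
        commands.put("ping", () -> log.append("ping;"));
    }

    void start() {
        log.append("start;");
    }

    void stop() {
        log.append("stop;");
    }

    // Runs the command registered under `name`; returns false if there is none.
    boolean run(String name) {
        Runnable command = commands.get(name);
        if (command == null) {
            return false;
        }
        command.run();
        return true;
    }

    String log() {
        return log.toString();
    }

    public static void main(String[] args) {
        DispatchTableDemo demo = new DispatchTableDemo();
        demo.run("start");
        demo.run("ping");
        demo.run("stop");
        System.out.println(demo.log()); // prints "start;ping;stop;"
    }
}
```

All entries share the Runnable signature (no arguments, no return value), which matches the constraint that every stored method must have the same signature.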
I found the reflection approach the cleanest -- I added a twist to this solution since most production classes have nested classes and I didn't see any examples that demonstrates this (but I didn't look for very long either). My reason for using reflection is that my "updateUser()" method below had a bunch of redundant code and just one line that changed (for every field in the user object) in the middle that updated the user object:
NameDTO.java
public class NameDTO {
String first, last;
public String getFirst() {
return first;
}
public void setFirst(String first) {
this.first = first;
}
public String getLast() {
return last;
}
public void setLast(String last) {
this.last = last;
}
}
UserDTO.java
public class UserDTO {
private NameDTO name;
private Boolean honest;
public UserDTO() {
name = new NameDTO();
honest = Boolean.FALSE;
}
public NameDTO getName() {
return name;
}
public void setName(NameDTO name) {
this.name = name;
}
public Boolean getHonest() {
return honest;
}
public void setHonest(Boolean honest) {
this.honest = honest;
}
}
Example.java
import java.lang.reflect.Method;
public class Example {
public Example () {
UserDTO dto = new UserDTO();
try {
Method m1 = dto.getClass().getMethod("getName");
NameDTO nameDTO = (NameDTO) m1.invoke(dto);
Method m2 = nameDTO.getClass().getMethod("setFirst", String.class);
updateUser(m2, nameDTO, "Abe");
m2 = nameDTO.getClass().getMethod("setLast", String.class);
updateUser(m2, nameDTO, "Lincoln");
m1 = dto.getClass().getMethod("setHonest", Boolean.class);
updateUser(m1, dto, Boolean.TRUE);
System.out.println (dto.getName().getFirst() + " " + dto.getName().getLast() + ": honest=" + dto.getHonest().toString());
} catch (Exception e) {
e.printStackTrace();
}
}
public void updateUser(Method m, Object o, Object v) {
// lots of code here
try {
m.invoke(o, v);
} catch (Exception e) {
e.printStackTrace();
}
// lots of code here -- including a retry loop to make sure the
// record hadn't been written since my last read
}
public static void main(String[] args) {
Example mp = new Example();
}
}
You are right that there are no pointers in Java: a reference variable holds a reference to an object, much like the & syntax in C/C++, but there is no * because the JVM can relocate objects on the heap when necessary, which would leave a raw pointer at a stale address and cause a crash. But a method is just a function inside a class and no more than that, so you are wrong to say there are no functions: a method is simply a function encapsulated inside an object.
As for function pointers, the Java team endorses the use of interfaces and nested classes, which is all fine and dandy, but as a C++/C# programmer who uses Java from time to time, I use a Delegate class I wrote for Java, because I find it more convenient when I need to pass a function: I only have to declare the return type of the method delegate.
It all depends on the programmer.
I read the white paper on why delegates are not supported, but I disagree and prefer to think outside the box on that topic.