Implementing a non-parallel Spliterator for unknown size? - java

I'm a little confused by all my research. I have a custom interface called TabularResultSet (which I've watered down for the sake of example) which traverses any data set that is tabular in nature. It has a next() method like an iterator, and it can loop through a QueryResultSet, a tabbed table from a clipboard, a CSV, etc.
However, I'm trying to create a Spliterator that wraps around my TabularResultSet and easily turns it into a stream. I cannot imagine a safe way to parallelize, because the TabularResultSet could be traversing a QueryResultSet, and calling next() concurrently could wreak havoc. The only way I can imagine parallelizing safely is to have next() called by a single worker thread that hands the data off to other threads for processing.
So I think parallelization is not an easy option. How do I just get this thing to stream without parallelizing? Here is my work so far...
import java.util.List;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public final class SpliteratorTest {

    public static void main(String[] args) {
        TabularResultSet rs = null; /* instantiate an implementation */
        Stream<TabularResultSet> rsStream = StreamSupport.stream(new TabularSpliterator(rs), false);
    }

    public static interface TabularResultSet {
        public boolean next();
        public List<Object> getData();
    }

    private static final class TabularSpliterator implements Spliterator<TabularResultSet> {

        private final TabularResultSet rs;

        public TabularSpliterator(TabularResultSet rs) {
            this.rs = rs;
        }

        @Override
        public boolean tryAdvance(Consumer<? super TabularResultSet> action) {
            action.accept(rs);
            return rs.next();
        }

        @Override
        public Spliterator<TabularResultSet> trySplit() {
            return null;
        }

        @Override
        public long estimateSize() {
            return Long.MAX_VALUE;
        }

        @Override
        public int characteristics() {
            return 0;
        }
    }
}

It's probably easiest to extend Spliterators.AbstractSpliterator. If you do this, you need only implement tryAdvance. This can be turned into a parallel stream; the parallelism comes from the streams implementation calling tryAdvance multiple times, batching up the data it receives, and processing it in different threads.
If TabularResultSet is anything like a JDBC ResultSet, I don't think you want a Spliterator<TabularResultSet> or a Stream<TabularResultSet>. Instead it looks like a TabularResultSet represents an entire tabular data set, so you probably want each spliterator or stream element to represent one row in that table -- the List<Object> that is returned by getData()? If so, you'd want something like the following.
class TabularSpliterator extends Spliterators.AbstractSpliterator<List<Object>> {

    private final TabularResultSet rs;

    public TabularSpliterator(TabularResultSet rs) {
        super(...); // estimated size and characteristics, e.g. Long.MAX_VALUE and 0
        this.rs = rs;
    }

    @Override
    public boolean tryAdvance(Consumer<? super List<Object>> action) {
        if (rs.next()) {
            action.accept(rs.getData());
            return true;
        } else {
            return false;
        }
    }
}
Then you can turn an instance of this spliterator into a stream by calling StreamSupport.stream().
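For example, a minimal sketch using the class above (rs being an instance of your TabularResultSet):

Stream<List<Object>> rows = StreamSupport.stream(new TabularSpliterator(rs), false);
rows.forEach(row -> System.out.println(row)); // each element is one row of the table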
Note: in general, a Spliterator instance is not called from multiple threads and need not even be thread-safe. See the Spliterator class documentation at the paragraph beginning "Despite..." for details.

You're mostly there. All you have to do now is convert your Spliterator into a Stream. You can do that using the StreamSupport.stream(Spliterator, boolean) method. The boolean parameter is a flag for whether you want parallel streaming or not (you would want false, for a sequential stream).
If your TabularResultSet implemented Iterator, you could use the Spliterators.spliteratorUnknownSize() method to convert the Iterator into a Spliterator which basically does what the code you have above does.
I'm not sure if it's worth adding characteristics, but you might want to consider:
Spliterator.IMMUTABLE | Spliterator.ORDERED | Spliterator.NONNULL
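For instance, if you had an Iterator<List<Object>> view of the result set (rowIterator below is a hypothetical adapter), the two suggestions combine like this:

Spliterator<List<Object>> split = Spliterators.spliteratorUnknownSize(
        rowIterator,
        Spliterator.IMMUTABLE | Spliterator.ORDERED | Spliterator.NONNULL);
Stream<List<Object>> stream = StreamSupport.stream(split, false);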
good luck

Related

Apache Drill: Write general-purpose array_agg UDF

I would like to create an array_agg UDF for Apache Drill to be able to aggregate all values of a group to a list of values.
This should work with any major types (required, optional) and minor types (varchar, dict, map, int, etc.)
However, I get the impression that Apache Drill's UDF API does not really make use of inheritance and generics. Each type has its own writer and handler, and they cannot be abstracted to handle any type. E.g., the ValueHolder interface seems to be purely cosmetic and cannot be used to have type-agnostic hooking of UDFs to any type.
My current implementation
I tried to solve this by using Java reflection so I could use the ListHolder's write function independently of the holder of the original value.
However, I then ran into the limitations of the @FunctionTemplate annotation.
I cannot create a general UDF annotation for any value (I tried it with the interface ValueHolder: @Param ValueHolder input).
So it seems like the only way to support different types is to have a separate class for each type. But I can't even abstract much and work on any @Param input, because input is only visible in the class where it's defined (i.e. it is type specific).
I based my implementation on https://issues.apache.org/jira/browse/DRILL-6963
and created the following two classes for required and optional varchars (how can this be unified in the first place?)
@FunctionTemplate(
        name = "array_agg",
        scope = FunctionScope.POINT_AGGREGATE,
        nulls = NullHandling.INTERNAL
)
public static class VarChar_Agg implements DrillAggFunc {

    @Param org.apache.drill.exec.expr.holders.VarCharHolder input;
    @Workspace ObjectHolder agg;
    @Output org.apache.drill.exec.vector.complex.writer.BaseWriter.ComplexWriter out;

    @Override
    public void setup() {
        agg = new ObjectHolder();
    }

    @Override
    public void reset() {
        agg = new ObjectHolder();
    }

    @Override
    public void add() {
        if (agg.obj == null) {
            // Initialise list object for output
            agg.obj = out.rootAsList();
        }
        org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter listWriter =
                (org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter) agg.obj;
        listWriter.varChar().write(input);
    }

    @Override
    public void output() {
        ((org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter) agg.obj).endList();
    }
}
@FunctionTemplate(
        name = "array_agg",
        scope = FunctionScope.POINT_AGGREGATE,
        nulls = NullHandling.INTERNAL
)
public static class NullableVarChar_Agg implements DrillAggFunc {

    @Param NullableVarCharHolder input;
    @Workspace ObjectHolder agg;
    @Output org.apache.drill.exec.vector.complex.writer.BaseWriter.ComplexWriter out;

    @Override
    public void setup() {
        agg = new ObjectHolder();
    }

    @Override
    public void reset() {
        agg = new ObjectHolder();
    }

    @Override
    public void add() {
        if (agg.obj == null) {
            // Initialise list object for output
            agg.obj = out.rootAsList();
        }
        if (input.isSet != 1) {
            return;
        }
        org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter listWriter =
                (org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter) agg.obj;
        org.apache.drill.exec.expr.holders.VarCharHolder outHolder =
                new org.apache.drill.exec.expr.holders.VarCharHolder();
        outHolder.start = input.start;
        outHolder.end = input.end;
        outHolder.buffer = input.buffer;
        listWriter.varChar().write(outHolder);
    }

    @Override
    public void output() {
        ((org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter) agg.obj).endList();
    }
}
Interestingly, I can't import org.apache.drill.exec.vector.complex.writer.BaseWriter to make the whole thing easier because then Apache Drill would not find it.
So I have to put the entire package path for everything in org.apache.drill.exec.vector.complex.writer in the code.
Furthermore, I'm using the deprecated ObjectHolder. Is there a better solution?
Anyway: These work so far, e.g. with this query:
SELECT
    MIN(tbl.`timestamp`) AS start_view,
    MAX(tbl.`timestamp`) AS end_view,
    array_agg(tbl.eventLabel) AS label_agg
FROM `dfs.root`.`/path/to/avro/folder` AS tbl
WHERE tbl.data.slug IS NOT NULL
GROUP BY tbl.data.slug
however, when I use ORDER BY, I get this:
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: UnsupportedOperationException: NULL
Fragment 0:0
Additionally, I tried more complex types, namely maps/dicts.
Interestingly, when I call SELECT sqlTypeOf(tbl.data) FROM tbl, I get MAP.
But when I write UDFs, the query planner complains about having no UDF array_agg for type dict.
Anyway, I wrote a version for dicts:
@FunctionTemplate(
        name = "array_agg",
        scope = FunctionScope.POINT_AGGREGATE,
        nulls = NullHandling.INTERNAL
)
public static class Map_Agg implements DrillAggFunc {

    @Param MapHolder input;
    @Workspace ObjectHolder agg;
    @Output org.apache.drill.exec.vector.complex.writer.BaseWriter.ComplexWriter out;

    @Override
    public void setup() {
        agg = new ObjectHolder();
    }

    @Override
    public void reset() {
        agg = new ObjectHolder();
    }

    @Override
    public void add() {
        if (agg.obj == null) {
            // Initialise list object for output
            agg.obj = out.rootAsList();
        }
        org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter listWriter =
                (org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter) agg.obj;
        //listWriter.copyReader(input.reader);
        input.reader.copyAsValue(listWriter);
    }

    @Override
    public void output() {
        ((org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter) agg.obj).endList();
    }
}
@FunctionTemplate(
        name = "array_agg",
        scope = FunctionScope.POINT_AGGREGATE,
        nulls = NullHandling.INTERNAL
)
public static class Dict_agg implements DrillAggFunc {

    @Param DictHolder input;
    @Workspace ObjectHolder agg;
    @Output org.apache.drill.exec.vector.complex.writer.BaseWriter.ComplexWriter out;

    @Override
    public void setup() {
        agg = new ObjectHolder();
    }

    @Override
    public void reset() {
        agg = new ObjectHolder();
    }

    @Override
    public void add() {
        if (agg.obj == null) {
            // Initialise list object for output
            agg.obj = out.rootAsList();
        }
        org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter listWriter =
                (org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter) agg.obj;
        //listWriter.copyReader(input.reader);
        input.reader.copyAsValue(listWriter);
    }

    @Override
    public void output() {
        ((org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter) agg.obj).endList();
    }
}
But here, I get an empty list in the field data_agg for my query:
SELECT
    MIN(tbl.`timestamp`) AS start_view,
    MAX(tbl.`timestamp`) AS end_view,
    array_agg(tbl.data) AS data_agg
FROM `dfs.root`.`/path/to/avro/folder` AS tbl
GROUP BY tbl.data.viewSlag
Summary of questions
Most importantly: How do I create an array_agg UDF for Apache Drill?
How do I make UDFs type-agnostic/general-purpose? Do I really have to implement an entire class for each Nullable, Required and Repeated version of every type? That's a lot to do and quite tedious. Isn't there a way to handle values in a UDF agnostic of the underlying types?
I wish Apache Drill would just use what Java offers here: generic types, specialised function overloading, and inheritance within its own type system. Am I missing something on how to do that?
How can I fix the NULL problem when I use ORDER BY on my varchar version of the aggregate?
How can I fix the problem where my aggregate of maps/dicts is an empty list?
Is there an alternative to using the deprecated ObjectHolder?
To answer your question, unfortunately you've run into one of the limits of the Drill aggregate UDF API, which is that it can only return simple data types. It would be a great improvement to Drill to fix this, but that is the current status. If you're interested in discussing it further, please start a thread on the Drill user group and/or Slack channel. I don't think it is impossible, but it would require some modification to the Drill internals. IMHO it would be well worth it, because there are a few other UDFs I'd like to implement that need this feature.
The second part of your question is how to make UDFs type agnostic, and once again... you've found yet another bit of ugliness in the UDF API. :-) If you do some digging in the codebase, you'll see that most of the Math functions have versions that accept FLOAT, INT, etc.
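For illustration, here is a minimal sketch of that overload-per-type pattern for a simple (non-aggregate) function. my_abs is a hypothetical name; both classes register under it, and the planner picks the one whose holder type matches the argument:

@FunctionTemplate(name = "my_abs", scope = FunctionScope.SIMPLE, nulls = NullHandling.NULL_IF_NULL)
public static class IntAbs implements DrillSimpleFunc {

    @Param org.apache.drill.exec.expr.holders.IntHolder in;
    @Output org.apache.drill.exec.expr.holders.IntHolder out;

    public void setup() { }

    public void eval() {
        out.value = Math.abs(in.value); // same body, duplicated per holder type
    }
}

@FunctionTemplate(name = "my_abs", scope = FunctionScope.SIMPLE, nulls = NullHandling.NULL_IF_NULL)
public static class Float8Abs implements DrillSimpleFunc {

    @Param org.apache.drill.exec.expr.holders.Float8Holder in;
    @Output org.apache.drill.exec.expr.holders.Float8Holder out;

    public void setup() { }

    public void eval() {
        out.value = Math.abs(in.value);
    }
}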
Regarding the aggregate of null or empty lists, I actually have some good news here... The current way of doing that is to provide two versions of the function: one which accepts regular holders, and a second which accepts nullable holders and returns an empty list or map if the inputs are null. Yes, this sucks, but the additional good news is that I'm working on cleaning this up and hopefully will have a PR submitted soon that will eliminate the need to do this.
Regarding the ObjectHolder, I wrote a median function that uses a few Stacks to compute a streaming median and I used the ObjectHolder for that. I think it will be with us for some time as there is no alternative at the moment.
I hope this answers your questions.

Spring Batch : Write a List to a database table using a custom batch size

Background
I have a Spring Batch job where :
FlatFileItemReader - Reads one row at a time from the file
ItemProcessor - Transforms the row from the file into a List<MyObject> and returns the List. That is, each row in the file is broken down into a List<MyObject> (1 row in the file transformed to many output rows).
ItemWriter - Writes the List<MyObject> to a database table. (I used this implementation to unpack the list received from the processor and delegate to a JdbcBatchItemWriter.)
Question
At point 2) The processor can return a List of 100000 MyObject instances.
At point 3), The delegate JdbcBatchItemWriter will end up writing the entire List with 100000 objects to the database.
My question is: the JdbcBatchItemWriter does not allow a custom batch size. For all practical purposes, batch size = commit interval for the step. With this in mind, is there another ItemWriter implementation available in Spring Batch that writes to the database and allows a configurable batch size? If not, how do I go about writing a custom writer myself to achieve this?
I see no obvious way to set the batch size on the JdbcBatchItemWriter. However, you can extend the writer and use a custom BatchPreparedStatementSetter to specify the batch size. Here is a quick example:
public class MyCustomWriter<T> extends JdbcBatchItemWriter<T> {

    @Override
    public void write(List<? extends T> items) throws Exception {
        namedParameterJdbcTemplate.getJdbcOperations().batchUpdate("your sql", new BatchPreparedStatementSetter() {
            @Override
            public void setValues(PreparedStatement ps, int i) throws SQLException {
                // set values on your sql
            }

            @Override
            public int getBatchSize() {
                return items.size(); // or any other value you want
            }
        });
    }
}
The StagingItemWriter in the samples is an example of how to use a custom BatchPreparedStatementSetter as well.
The answer from Mahmoud Ben Hassine and the comments pretty much cover all aspects of the solution, and his is the accepted answer.
Here is the implementation I used if anyone is interested :
public class JdbcCustomBatchSizeItemWriter<W> extends JdbcDaoSupport implements ItemWriter<W> {

    private int batchSize;
    private ParameterizedPreparedStatementSetter<W> preparedStatementSetter;
    private String sqlFileLocation;
    private String sql;

    public void initReader() {
        this.setSql(FileUtilities.getFileContent(sqlFileLocation));
    }

    public void write(List<? extends W> arg0) throws Exception {
        getJdbcTemplate().batchUpdate(sql, Collections.unmodifiableList(arg0), batchSize, preparedStatementSetter);
    }

    public void setBatchSize(int batchSize) {
        this.batchSize = batchSize;
    }

    public void setPreparedStatementSetter(ParameterizedPreparedStatementSetter<W> preparedStatementSetter) {
        this.preparedStatementSetter = preparedStatementSetter;
    }

    public void setSqlFileLocation(String sqlFileLocation) {
        this.sqlFileLocation = sqlFileLocation;
    }

    public void setSql(String sql) {
        this.sql = sql;
    }
}
Note:
The use of Collections.unmodifiableList prevents the need for any explicit casting.
I use sqlFileLocation to specify an external file that contains the SQL, and FileUtilities.getFileContent simply returns the contents of this SQL file. This can be skipped, and one can directly pass the SQL to the class while creating the bean; a possible wiring is sketched below.
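For completeness, here is a sketch of how such a writer could be wired up as a bean; MyObject and its getters are placeholders, and the SQL is set directly instead of via sqlFileLocation:

@Bean
public JdbcCustomBatchSizeItemWriter<MyObject> itemWriter(DataSource dataSource) {
    JdbcCustomBatchSizeItemWriter<MyObject> writer = new JdbcCustomBatchSizeItemWriter<>();
    writer.setDataSource(dataSource); // inherited from JdbcDaoSupport
    writer.setBatchSize(1000);
    writer.setSql("INSERT INTO my_table (col_a, col_b) VALUES (?, ?)");
    writer.setPreparedStatementSetter((ps, item) -> {
        ps.setString(1, item.getColA());
        ps.setString(2, item.getColB());
    });
    return writer;
}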
I wouldn't do this. It presents issues for restartability. Instead, modify your reader to produce individual items rather than having your processor take in an object and return a list.

Check if java stream has been consumed

How can I check if a stream instance has been consumed or not (meaning a terminal operation has been called, such that any further call to a terminal operation would fail with IllegalStateException: stream has already been operated upon or closed)?
Ideally I want a method that does not consume the stream if it has not yet been consumed, and that returns false if the stream has been consumed, without catching an IllegalStateException from a stream method (because using exceptions for control flow is expensive and error-prone, in particular when using standard exceptions).
A method similar to Iterator's hasNext() in its exception-throwing and boolean-returning behavior (though without the contract with next()).
Example:
public void consume(java.util.function.Consumer<Stream<?>> consumer, Stream<?> stream) {
    consumer.accept(stream);
    // defensive programming, check state
    if (...) {
        throw new IllegalStateException("consumer must call terminal operation on stream");
    }
}
The goal is to fail early if client code calls this method without consuming the stream.
It seems there is no method to do that and I'd have to add a try-catch block calling any terminal operation like iterator(), catch an exception and throw a new one.
An acceptable answer can also be "No solution exists" with a good justification of why the specification could not add such a method (if a good justification exists). The JDK streams usually have this snippet at the start of their terminal methods:
// in AbstractPipeline.java
if (linkedOrConsumed)
throw new IllegalStateException(MSG_STREAM_LINKED);
So for those streams, an implementation of such a method would not seem that difficult.
Taking into consideration that spliterator (for example) is a terminal operation, you can simply create a method like:
private static <T> Optional<Stream<T>> isConsumed(Stream<T> stream) {
    Spliterator<T> spliterator;
    try {
        spliterator = stream.spliterator();
    } catch (IllegalStateException ise) {
        return Optional.empty();
    }
    return Optional.of(StreamSupport.stream(
            () -> spliterator,
            spliterator.characteristics(),
            stream.isParallel()));
}
I don't know of a better way to do it... And usage would be:
Stream<Integer> ints = Stream.of(1, 2, 3, 4)
        .filter(x -> x < 3);

YourClass.isConsumed(ints)
        .ifPresent(x -> x.forEachOrdered(System.out::println));
Since I don't think there is a practical reason to return an already consumed Stream, I am returning Optional.empty() instead.
One solution could be to add an intermediate operation (e.g. filter()) to the stream before passing it to the consumer. In that operation you do nothing but save the state that the operation was called (e.g. with an AtomicBoolean):
public <T> void consume(Consumer<Stream<T>> consumer, Stream<T> stream) {
    AtomicBoolean consumed = new AtomicBoolean(false);
    consumer.accept(stream.filter(i -> {
        consumed.set(true);
        return true;
    }));
    if (!consumed.get()) {
        throw new IllegalStateException("consumer must call terminal operation on stream");
    }
}
Side Note: Do not use peek() for this, because it is not called with short-circuiting terminal operations (like findAny()).
Here is a standalone compilable solution that uses a delegating custom Spliterator<T> implementation + an AtomicBoolean to accomplish what you seek without losing thread-safety or affecting the parallelism of a Stream<T>.
The main entry is the Stream<T> track(Stream<T> input, Consumer<Stream<T>> callback) function - you can do whatever you want in the callback function. I first tinkered with a delegating Stream<T> implementation but it's just too big an interface to delegate without any issues (see my code comment, even Spliterator<T> has its caveats when delegating):
import java.util.Spliterator;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;
import java.util.stream.IntStream;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class StackOverflowQuestion56927548Scratch {

    private static class TrackingSpliterator<T> implements Spliterator<T> {

        private final AtomicBoolean tracker;
        private final Spliterator<T> delegate;
        private final Runnable callback;

        public TrackingSpliterator(Stream<T> forStream, Runnable callback) {
            this(new AtomicBoolean(true), forStream.spliterator(), callback);
        }

        private TrackingSpliterator(
                AtomicBoolean tracker,
                Spliterator<T> delegate,
                Runnable callback
        ) {
            this.tracker = tracker;
            this.delegate = delegate;
            this.callback = callback;
        }

        @Override
        public boolean tryAdvance(Consumer<? super T> action) {
            boolean advanced = delegate.tryAdvance(action);
            if (tracker.compareAndSet(true, false)) {
                callback.run();
            }
            return advanced;
        }

        @Override
        public Spliterator<T> trySplit() {
            Spliterator<T> split = this.delegate.trySplit();
            // may return null according to JavaDoc
            if (split == null) {
                return null;
            }
            return new TrackingSpliterator<>(tracker, split, callback);
        }

        @Override
        public long estimateSize() {
            return delegate.estimateSize();
        }

        @Override
        public int characteristics() {
            return delegate.characteristics();
        }
    }

    public static <T> Stream<T> track(Stream<T> input, Consumer<Stream<T>> callback) {
        return StreamSupport.stream(
                new TrackingSpliterator<>(input, () -> callback.accept(input)),
                input.isParallel()
        );
    }

    public static void main(String[] args) {
        // some big stream to show it works correctly when parallelized
        Stream<Integer> stream = IntStream.range(0, 100000000)
                .mapToObj(Integer::valueOf)
                .parallel();
        Stream<Integer> trackedStream = track(stream, s -> System.out.println("consume"));
        // dummy consume
        System.out.println(trackedStream.anyMatch(i -> i.equals(-1)));
    }
}
Just return the stream from the track function, maybe adapt the callback parameter's type (you probably don't need to pass the stream), and you are good to go.
Please note that this implementation only tracks whether the stream is actually consumed: calling .count() on a Stream that was produced by e.g. IntStream.range(0, 1000) (without any filter steps etc.) will not consume the stream but return the underlying known length via Spliterator<T>.estimateSize()!
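A quick way to see that caveat (on Java 9 and later, where count() can use the known size instead of traversing; this assumes the track method above):

Stream<Integer> sized = IntStream.range(0, 1000).boxed();
Stream<Integer> tracked = track(sized, s -> System.out.println("consume"));
System.out.println(tracked.count()); // may print 1000 without ever printing "consume"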

How to log List interface method for existing code

I have an existing codebase that sometimes uses ArrayList or LinkedList, and I need to find a way to log whenever add or remove is called, to track what has been either added or removed.
What is the best way to make sure I have logging in place?
So, for example:
ArrayList<Integer> list = new ArrayList<Integer>();
list.add(123);
and
LinkedList<Integer> anotherNewList = new LinkedList<Integer>();
anotherNewList.add(333);
I'm not sure if I can intercept the add method to achieve this, or whether I should create an overriding class that implements the java.util.List interface and use it instead. Either way, I'm looking for a good solution that requires minimum intervention, and preferably without using any third-party packages...
I would use the so-called Decorator pattern to wrap your lists.
This would be a simple example code just to give you an idea:
private static class LogDecorator<T> implements Collection<T> {

    private final Collection<T> delegate;

    private LogDecorator(Collection<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public int size() {
        return delegate.size();
    }

    @Override
    public boolean isEmpty() {
        return delegate.isEmpty();
    }

    @Override
    public boolean contains(Object o) {
        return delegate.contains(o);
    }

    @Override
    public Iterator<T> iterator() {
        return delegate.iterator();
    }

    @Override
    public Object[] toArray() {
        return delegate.toArray();
    }

    @Override
    public <T1> T1[] toArray(T1[] a) {
        return delegate.toArray(a);
    }

    @Override
    public boolean add(T t) {
        // ADD YOUR INTERCEPTING CODE HERE
        return delegate.add(t);
    }

    @Override
    public boolean remove(Object o) {
        return delegate.remove(o);
    }

    @Override
    public boolean containsAll(Collection<?> c) {
        return delegate.containsAll(c);
    }

    @Override
    public boolean addAll(Collection<? extends T> c) {
        return delegate.addAll(c);
    }

    @Override
    public boolean removeAll(Collection<?> c) {
        return delegate.removeAll(c);
    }

    @Override
    public boolean retainAll(Collection<?> c) {
        return delegate.retainAll(c);
    }

    @Override
    public void clear() {
        delegate.clear();
    }
}
There is not really a simple way to get there.
Those classes are part of the "standard libraries", so you can't change their behavior. You could create your own versions of them and use class-path ordering to get them used, but that is a really dirty hack.
The only other option: extend those classes, @Override the methods you want to be logged, and make sure all your sources use your own versions of those classes. Or, if you prefer composition over inheritance, you go for the decorator pattern, as suggested by JDC's answer.
The "third" option is really different - you turn to aspect-oriented programming (for example using AspectJ) and use such tools to manipulate things on a bytecode level. But that adds a whole new layer of "complexity" to your product, thus I am not counting it as a real option.
EDIT on your answer: it seems that you don't understand the difference between interface and implementation?! An interface simply describes a set of method signatures; but in order to have real code behind those methods, there needs to be an implementing class. You see, when you do
List<X> things = new ArrayList<>();
the real type of things is ArrayList; but you rarely care about that real type; it is good enough to know that you can call all those List methods on things. So, when you create some new implementation of the List interface ... that doesn't affect any existing
... = new ArrayList ...
declarations at all. You would have to change all assignments to
List<X> things = new YourNewListImplementation<>();
JDC has given a good way to follow. I would like to add some important precisions.
The decorator pattern lets you create a class that decorates another class by dynamically adding a responsibility to (or removing one from) an instance. In your case, you want to add a responsibility.
The decorator is not an intrusive pattern, but the decorator class has to conform to the class it decorates. So in your case, having a decorator that derives from the Collection interface does not conform to the decorated object, since List has methods that Collection has not.
Your need is decorating List instances, so the decorator should derive from the List type.
Besides, the decorator class can, according to its needs, perform some processing before and/or after the operation of the class it decorates, but it is also responsible for calling the original operation of the decorated class.
In your case, you want to know if an element was added to or removed from the List. To achieve this, since the method's result determines whether you log the information, it is preferable to delegate to the decorated object first; then your decorator can perform its own processing.
When you don't need to decorate a method, don't do it, but don't forget to delegate suitably to the decorated object.
import java.util.Iterator;
import java.util.List;

public class DecoratorList<T> implements List<T> {

    private static final Tracer tracer = ....;
    private final List<T> decorated;

    public DecoratorList(List<T> decorated) {
        this.decorated = decorated;
    }

    // non-decorated methods
    ....

    @Override
    public int size() {
        return this.decorated.size();
    }

    @Override
    public boolean isEmpty() {
        return this.decorated.isEmpty();
    }

    @Override
    public boolean contains(Object o) {
        return this.decorated.contains(o);
    }

    @Override
    public Iterator<T> iterator() {
        return this.decorated.iterator();
    }

    ....
    // end non-decorated methods

    // example of decorated methods
    @Override
    public void add(int index, T element) {
        tracer.info("element " + element + " added to index " + index);
        this.decorated.add(index, element);
    }

    @Override
    public boolean remove(Object o) {
        final boolean isRemoved = this.decorated.remove(o);
        if (isRemoved) {
            tracer.info("element " + o + " removed");
        }
        return isRemoved;
    }
}
As explained, a decorator is not intrusive for the decorated objects.
So the idea is not to change your working code, but to add the decorating operation just after the list is instantiated.
If you don't program to the interface when you declare your list variables, that is, you declare ArrayList list = new ArrayList() instead of List list = new ArrayList(), you should of course change the declared type to List, but this doesn't break the code; on the contrary.
Here is your example code:
ArrayList<Integer> list = new ArrayList<Integer>();
list.add(123);
LinkedList<Integer> anotherNewList = new LinkedList<Integer>();
anotherNewList.add(333);
Now, you could do this:
List<Integer> list = new ArrayList<Integer>();
list = new DecoratorList<Integer>(list); // line added
list.add(123);
List<Integer> anotherNewList = new LinkedList<Integer>();
anotherNewList = new DecoratorList<Integer>(anotherNewList); // line added
anotherNewList.add(333);
To ease the task and make it safer, you could even create a util method to apply the decoration to the list:
private static <T> List<T> decorateList(List<T> list) {
    list = new DecoratorList<T>(list);
    return list;
}
and call it like this:
List<Integer> list = new ArrayList<Integer>();
list = decorateList(list); // line added
list.add(123);
You can use Aspects - but it will log every add and remove call:
@Aspect
public class ListLoggerAspect {

    @Around("execution(* java.util.List.add(..))")
    public boolean aroundAdd(ProceedingJoinPoint joinPoint) throws Throwable {
        boolean result = (boolean) joinPoint.proceed(joinPoint.getArgs());
        // do the logging
        return result;
    }
}
You'll need to configure the aspect in META-INF/aop.xml:
<aspectj>
    <aspects>
        <aspect name="com.example.ListLoggerAspect"/>
    </aspects>
</aspectj>
An easy way to accomplish this is to wrap your source list in an ObservableList and use that as the base list. You can simply add a listener to this list to catch every modification (and log it if you wish).
Example:
ObservableList<Item> obs = FXCollections.observableList(myOriginalList);
obs.addListener((ListChangeListener<Item>) c -> {
    while (c.next()) {
        for (Item it : c.getRemoved())
            System.out.println(it);
        for (Item it : c.getAddedSubList())
            System.out.println(it);
    }
});
See the JavaFX documentation on how to add a good listener.
Your List is the source here. You need to keep track of the changes to the source. This is a good and natural example of the Observer pattern. You can create an Observable which is your list. Then create some Observers and register them with the Observable. When the Observable is changed, notify all the registered Observers. Inside the Observer you can log the changes using the input event. You would essentially implement an ObservableCollection here. You can use RxJava to get this work done. Please find the sample code given below.
package com.test;

import java.util.ArrayList;
import java.util.List;

import rx.Observable;
import rx.subjects.PublishSubject;

public class ObservableListDemo {

    public static class ObservableList<T> {

        protected final List<T> list;
        protected final PublishSubject<T> onAdd;

        public ObservableList() {
            this.list = new ArrayList<T>();
            this.onAdd = PublishSubject.create();
        }

        public void add(T value) {
            list.add(value);
            onAdd.onNext(value);
        }

        public Observable<T> getObservable() {
            return onAdd;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ObservableList<Integer> observableList = new ObservableList<>();
        observableList.getObservable().subscribe(System.out::println);

        observableList.add(1);
        Thread.sleep(1000);
        observableList.add(2);
        Thread.sleep(1000);
        observableList.add(3);
    }
}
Hope this helps. Happy coding!
We need a little more information to find the right solution. But I see a number of options.
You can track changes, using a decorator.
You can copy the collection and calculate the changes
You can use aspects to 'decorate' every List in the JVM
Change the existing codebase (a little bit)
1) works if you know exactly how the list is used, and once it is returned to your new code, you are the only user. The existing code can't have any methods that add to the original list (because that would invoke add/remove on the delegate instead of the decorated collection).
2) This approach is used when multiple classes can modify the list. You need to be able to get a copy of the list before any modifications begin, and then calculate what happened afterwards. If you have access to the Apache Commons Collections library, you can use CollectionUtils to calculate the intersection and disjunction, as in the sketch below.
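A rough sketch of that copy-and-diff approach, assuming commons-collections is on the classpath:

// Snapshot the list before handing it to the existing code.
List<Integer> before = new ArrayList<>(list);
// ... existing code mutates list ...
Collection<Integer> added = CollectionUtils.subtract(list, before);
Collection<Integer> removed = CollectionUtils.subtract(before, list);
added.forEach(it -> System.out.println("added: " + it));
removed.forEach(it -> System.out.println("removed: " + it));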
3) This solution requires some form of weaving (compile or load time), as it will create a proxy for every List so it can add callback code around the method calls. I would not recommend this option unless you have a good understanding of how aspects work, as this solution has a rather steep learning curve, and if something goes wrong and you need to debug your code, it can be a bit tricky.
4) You say existing codebase, which leads me to believe that you could actually change the code if you really wanted to. If this is at all possible, that is the approach I would choose. If the user of the List needs to be able to track changes, then the best possible solution is for the library to return a ChangeTrackingList (an interface defining the change-tracking methods), which you could build using decoration; a hypothetical shape is sketched below.
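A hypothetical shape for such an interface (the names are illustrative only):

public interface ChangeTrackingList<T> extends List<T> {
    List<T> addedElements();   // elements added since tracking started
    List<T> removedElements(); // elements removed since tracking started
    void resetTracking();      // clear the recorded changes
}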
One thing you have to be aware of when decorating: List has a removeAll() and an addAll(), and these methods may or may not call add() and remove(), depending on the list implementation. If you are not aware of how these methods are invoked internally, you could end up seeing an object as removed twice (unless you can use a set).

Threadsafe way of exposing keySet()

This must be a fairly common occurrence where I have a map and wish to thread-safely expose its key set:
public class MyClass {

    Map<String, String> map = // ...

    public final Set<String> keys() {
        // returns key set
    }
}
Now, if my "map" is not thread-safe, this is not safe:
public final Set<String> keys() {
    return map.keySet();
}
And neither is:
public final Set<String> keys() {
    return Collections.unmodifiableSet(map.keySet());
}
So I need to create a copy, such as:
public final Set<String> keys() {
    return new HashSet<>(map.keySet());
}
However, this doesn't seem safe either because that constructor traverses the elements of the parameter and add()s them. So while this copying is going on, a ConcurrentModificationException can happen.
So then:
public final Set<String> keys() {
    synchronized (map) {
        return new HashSet<>(map.keySet());
    }
}
seems like the solution. Does this look right?
That solution isn't particularly helpful unless you plan to also synchronize on the map everywhere it is used. Synchronizing on it doesn't stop someone else from invoking methods on it at the same time. It only stops them from also being able to synchronize on it.
The best solution really seems to be just use ConcurrentHashMap in the first place if you know you need concurrent puts and removes while someone may be iterating. If the concurrency behavior that class offers isn't what you need, you'll probably just need to use a fully synchronized Map.
Good question. I would use the Google Guava library; more specifically, the com.google.common.collect.ImmutableSet.copyOf(Collection<? extends E>) method. Its documentation says that this method is thread safe.
Another option would be to use ConcurrentHashMap. Its keySet() is thread safe so there might be no need to synchronize or take a copy.
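A minimal sketch of that approach; the key set is a live, weakly consistent view, so it can be iterated while other threads put or remove entries without ever throwing ConcurrentModificationException:

public class MyClass {

    private final ConcurrentMap<String, String> map = new ConcurrentHashMap<>();

    public final Set<String> keys() {
        // Live view; the unmodifiable wrapper stops callers from removing keys through it.
        return Collections.unmodifiableSet(map.keySet());
    }
}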
If you are interested in a thread-safe iterator with an exact snapshot of elements throughout the iteration process, then go for the below.
public class ThreadSafeIteratorConcurrentMap {

    private ConcurrentMap<String, psConference> itrSafeMap = null;

    public ThreadSafeIteratorConcurrentMap() {
        itrSafeMap = new ConcurrentHashMap<String, psConference>();
    }

    public synchronized void put(psConference conference, String p_key) {
        itrSafeMap.putIfAbsent(p_key, conference);
    }

    public psConference getConference(String p_key) {
        return itrSafeMap.get(p_key);
    }

    public synchronized void remove(String p_key) {
        itrSafeMap.remove(p_key);
    }

    public boolean containsKey(String p_key) {
        return itrSafeMap.containsKey(p_key);
    }

    // Get the size of the itrSafeMap.
    public int size() {
        return itrSafeMap.size();
    }

    public Iterator<psConference> valueIterator() {
        return itrSafeMap.values().iterator();
    }

    public Iterator<String> keyIterator() {
        return itrSafeMap.keySet().iterator();
    }
}
Then, wherever you want a thread-safe iterator with an exact snapshot of elements, use it in a synchronized block like below.
synchronized (threadSafeIteratorConcurrentMapObject) {
    Iterator<String> keyItr = threadSafeIteratorConcurrentMapObject.keyIterator();
    while (keyItr.hasNext()) {
        // Do whatever
    }
}
If you don't mind modification of the collection while iterating, and you only care about the snapshot of elements at the time the iterator was created, then you can use keyItr without the synchronized block. It is already thread safe; it won't throw ConcurrentModificationException.
You can create a temporary Map using Collections.unmodifiableMap, then iterate the key set.
