Understanding the main loop in Streams API's ForEachTask

It seems that the centerpiece of Java Streams' parallelization is the ForEachTask. Understanding its logic appears to be essential to acquiring the mental model necessary to anticipate the concurrent behavior of client code written against the Streams API. Yet I find my expectations contradicted by the actual behavior.
For reference, here is the key compute() method (java/util/stream/ForEachOps.java:253):
public void compute() {
    Spliterator<S> rightSplit = spliterator, leftSplit;
    long sizeEstimate = rightSplit.estimateSize(), sizeThreshold;
    if ((sizeThreshold = targetSize) == 0L)
        targetSize = sizeThreshold = AbstractTask.suggestTargetSize(sizeEstimate);
    boolean isShortCircuit = StreamOpFlag.SHORT_CIRCUIT.isKnown(helper.getStreamAndOpFlags());
    boolean forkRight = false;
    Sink<S> taskSink = sink;
    ForEachTask<S, T> task = this;
    while (!isShortCircuit || !taskSink.cancellationRequested()) {
        if (sizeEstimate <= sizeThreshold ||
            (leftSplit = rightSplit.trySplit()) == null) {
            task.helper.copyInto(taskSink, rightSplit);
            break;
        }
        ForEachTask<S, T> leftTask = new ForEachTask<>(task, leftSplit);
        task.addToPendingCount(1);
        ForEachTask<S, T> taskToFork;
        if (forkRight) {
            forkRight = false;
            rightSplit = leftSplit;
            taskToFork = task;
            task = leftTask;
        }
        else {
            forkRight = true;
            taskToFork = leftTask;
        }
        taskToFork.fork();
        sizeEstimate = rightSplit.estimateSize();
    }
    task.spliterator = null;
    task.propagateCompletion();
}
At a high level, the main loop keeps breaking down the spliterator, alternately forking off one chunk and processing the next inline, until the spliterator refuses to split further or the remaining size estimate falls below the computed threshold.
Now consider the above algorithm in the case of unsized streams, where the whole is not split into roughly equal halves; instead, chunks of predetermined size are repeatedly taken from the head of the stream. In this case the "suggested target size" is abnormally large, which basically means that the chunks are never re-split into smaller ones.
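For reference, the threshold comes from AbstractTask; the following is a paraphrase of the JDK sources (the exact shape may vary between JDK versions). For an unsized stream, estimateSize() returns Long.MAX_VALUE, so the suggested target size works out to roughly Long.MAX_VALUE / (4 × parallelism), which no fixed-size chunk ever exceeds:
// paraphrased from java.util.stream.AbstractTask
static final int LEAF_TARGET = ForkJoinPool.getCommonPoolParallelism() << 2;

public static long suggestTargetSize(long sizeEstimate) {
    long est = sizeEstimate / LEAF_TARGET; // aim for ~4 leaf tasks per worker
    return est > 0L ? est : 1L;
}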
The algorithm would therefore appear to alternately fork off one chunk, then process one inline. If each chunk takes the same time to process, this should result in no more than two cores being used. However, the actual behavior is that all four cores on my machine are occupied. Obviously, I am missing an important piece of the puzzle with that algorithm.
What is it that I'm missing?
Appendix: test code
Here is a piece of self-contained code which may be used to test the behavior which is the subject of this question:
package test;
import static java.util.concurrent.TimeUnit.NANOSECONDS;
import static java.util.concurrent.TimeUnit.SECONDS;
import static test.FixedBatchSpliteratorWrapper.withFixedSplits;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;
public class Parallelization {
    static final AtomicLong totalTime = new AtomicLong();
    static final ExecutorService pool = Executors.newFixedThreadPool(4);

    public static void main(String[] args) throws IOException {
        final long start = System.nanoTime();
        final Path inputPath = createInput();
        System.out.println("Start processing");
        try (PrintWriter w = new PrintWriter(Files.newBufferedWriter(Paths.get("output.txt")))) {
            withFixedSplits(Files.newBufferedReader(inputPath).lines(), 200)
                .map(Parallelization::processLine)
                .forEach(w::println);
        }
        final double cpuTime = totalTime.get(), realTime = System.nanoTime() - start;
        final int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("          Cores: " + cores);
        System.out.format("       CPU time: %.2f s\n", cpuTime / SECONDS.toNanos(1));
        System.out.format("      Real time: %.2f s\n", realTime / SECONDS.toNanos(1));
        System.out.format("CPU utilization: %.2f%%", 100.0 * cpuTime / realTime / cores);
    }

    private static String processLine(String line) {
        final long localStart = System.nanoTime();
        double ret = 0;
        for (int i = 0; i < line.length(); i++)
            for (int j = 0; j < line.length(); j++)
                ret += Math.pow(line.charAt(i), line.charAt(j) / 32.0);
        final long took = System.nanoTime() - localStart;
        totalTime.getAndAdd(took);
        return NANOSECONDS.toMillis(took) + " " + ret;
    }

    private static Path createInput() throws IOException {
        final Path inputPath = Paths.get("input.txt");
        try (PrintWriter w = new PrintWriter(Files.newBufferedWriter(inputPath))) {
            for (int i = 0; i < 6_000; i++) {
                final String text = String.valueOf(System.nanoTime());
                for (int j = 0; j < 20; j++)
                    w.print(text);
                w.println();
            }
        }
        return inputPath;
    }
}
package test;
import static java.util.Spliterators.spliterator;
import static java.util.stream.StreamSupport.stream;
import java.util.Comparator;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Stream;
public class FixedBatchSpliteratorWrapper<T> implements Spliterator<T> {
    private final Spliterator<T> spliterator;
    private final int batchSize;
    private final int characteristics;
    private long est;

    public FixedBatchSpliteratorWrapper(Spliterator<T> toWrap, long est, int batchSize) {
        final int c = toWrap.characteristics();
        this.characteristics = (c & SIZED) != 0 ? c | SUBSIZED : c;
        this.spliterator = toWrap;
        this.batchSize = batchSize;
        this.est = est;
    }
    public FixedBatchSpliteratorWrapper(Spliterator<T> toWrap, int batchSize) {
        this(toWrap, toWrap.estimateSize(), batchSize);
    }

    public static <T> Stream<T> withFixedSplits(Stream<T> in, int batchSize) {
        return stream(new FixedBatchSpliteratorWrapper<>(in.spliterator(), batchSize), true);
    }

    @Override public Spliterator<T> trySplit() {
        final HoldingConsumer<T> holder = new HoldingConsumer<>();
        if (!spliterator.tryAdvance(holder)) return null;
        final Object[] a = new Object[batchSize];
        int j = 0;
        do a[j] = holder.value; while (++j < batchSize && tryAdvance(holder));
        if (est != Long.MAX_VALUE) est -= j;
        return spliterator(a, 0, j, characteristics());
    }
    @Override public boolean tryAdvance(Consumer<? super T> action) {
        return spliterator.tryAdvance(action);
    }
    @Override public void forEachRemaining(Consumer<? super T> action) {
        spliterator.forEachRemaining(action);
    }
    @Override public Comparator<? super T> getComparator() {
        if (hasCharacteristics(SORTED)) return null;
        throw new IllegalStateException();
    }
    @Override public long estimateSize() { return est; }
    @Override public int characteristics() { return characteristics; }

    static final class HoldingConsumer<T> implements Consumer<T> {
        Object value;
        @Override public void accept(T value) { this.value = value; }
    }
}

Ironically, the answer is almost stated in the question: as the "left" and "right" tasks take turns at being forked vs. processed inline, half of the time the right task, represented by this, i.e. the complete rest of the stream, is forked off. That means the forking of chunks is merely slowed down a bit (it happens every other iteration), but it clearly happens.
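To make the alternation concrete, here is a conceptual trace of one worker's loop, written as comments (illustrative only, not real output):
// worker A, iteration 1 (forkRight == false):
//     splits off chunk C1, forks C1, keeps the rest of the stream inline
// worker A, iteration 2 (forkRight == true):
//     splits off chunk C2, forks THE REST (the current task), processes C2 inline
// worker B steals the forked rest and runs the same loop on it:
//     forks chunk C3, then forks its own rest, and so on
//
// Every other iteration hands "everything that remains" back to the pool, so
// idle workers keep acquiring splittable work and all four cores fill up.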

Related

Java Async Profiler Flame Graph

In the scenario below, is Java async-profiler the right tool to see where the time is spent when comparing the performance of ArrayBlockingQueue and LinkedBlockingQueue?
On my machine, the total execution time of ABQ is consistently about 25% faster than LBQ when passing 50M entries between a producer and a consumer. The flame graphs of both are "pretty much" the same, except that the LBQ one shows a handful of samples from JVM object allocation code, which wouldn't justify a 25% difference. As expected, TLAB allocation in LBQ is much higher.
I was wondering: how can I see which activity (be it code or hardware) is taking the time?
Runner:
import java.util.*;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
public class Runner {
public static void main(String[] args) throws InterruptedException {
int size = 50_000_000;
BlockingQueue<Long> queue = new LinkedBlockingQueue<>(size);
Producer producer = new Producer(queue, size);
Thread t = new Thread(producer);
t.setName("ProducerItIs");
Consumer consumer = new Consumer(queue, size);
Thread t2 = new Thread(consumer);
t2.setName("ConsumerItIs");
t.start();
t2.start();
Thread.sleep(8000);
System.out.println("done");
queue.forEach(System.out::println);
System.out.println(queue.size());
}
}
Producer:
import java.util.Queue;
import java.util.Random;
import java.util.concurrent.BlockingQueue;
public class Producer implements Runnable {
public Producer(BlockingQueue<Long> blockingQueue, int size) {
this.queue = blockingQueue;
this.size = size;
}
private final BlockingQueue<Long> queue;
private final int size;
public void run() {
System.out.println("Started to produce...");
long nanos = System.nanoTime();
Long ii = (long) new Random().nextInt();
for (int j = 0; j < size; j++) {
queue.add(ii);
}
System.out.println("producer Time taken :" + ((System.nanoTime() - nanos) / 1e6));
}
}
Consumer:
import java.util.concurrent.BlockingQueue;
public class Consumer implements Runnable {
private final BlockingQueue<Long> blockingQueue;
private final int size;
private Long value;
public Consumer(BlockingQueue<Long> blockingQueue, int size) {
this.blockingQueue = blockingQueue;
this.size = size;
}
public void run() {
long nanos = System.nanoTime();
System.out.println("Starting to consume...");
int i = 1;
try {
while (true) {
value = blockingQueue.take();
i++;
if (i >= size) {
break;
}
}
System.out.println("Consumer Time taken :" + ((System.nanoTime() - nanos)/1e6));
} catch (Exception exp) {
System.out.println(exp);
}
}
public long getValue() {
return value;
}
}
With ArrayBlockingQueue: [flame graph screenshot]
With LinkedBlockingQueue: [flame graph screenshot; the black arrow shows samples captured for allocations]

Problems with Static Internal Threads When Accessing Static Variables in External Classes

This problem has puzzled me for a long time; please help me, thanks.
This is my Java code.
package com.concurrent.example;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
/**
* P683
*/
class CircularSet {
private int[] array;
private int len;
private int index = 0;
public CircularSet (int size) {
array = new int[size];
len = size;
for (int i = 0; i < size; i++) {
array[i] = -1;
}
}
public synchronized void add(int i ) {
array[index] = i;
index = ++index % len;
}
public synchronized boolean contains(int val) {
for (int i = 0; i < len; i++) {
if(array[i] == val) {
return true;
}
}
return false;
}
}
public class SerialNumberChecker {
private static final int SIZE = 10;
private static CircularSet serials = new CircularSet(1000);
private static ExecutorService exec = Executors.newCachedThreadPool();
private static int serial;
static class SerialChecker implements Runnable {
@Override
public void run() {
while(true) {
//int serial;
synchronized (serials) {
serial = SerialNumberGenerator.nextSerialNumber();
}
if (serials.contains(serial)) {
System.out.println("Duplicate: " + serial);
System.exit(0);
}
System.out.println(serial);
serials.add(serial);
}
}
}
public static void main(String[] args) throws InterruptedException {
for (int i = 0; i < SIZE; i++) {
exec.execute(new SerialChecker());
if (args.length > 0) {
TimeUnit.SECONDS.sleep(new Integer(args[0]));
System.out.println("No duplicates detected");
System.exit(0);
}
}
}
}
It can stop, but when I uncomment //int serial; the result is different: it can't stop. Why does this local variable behave differently from the static variable of the external class? Is this because of the use of threads?
The code of SerialNumberGenerator:
public class SerialNumberGenerator {
private static volatile int serialNumber = 0;
public static int nextSerialNumber() {
return serialNumber ++; //Not thread-safe
}
}
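(Not part of the original question, but note that the ++ is not atomic even though the field is volatile, as the comment says; a minimal thread-safe variant would use AtomicInteger:)
import java.util.concurrent.atomic.AtomicInteger;

public class SerialNumberGenerator {
    private static final AtomicInteger serialNumber = new AtomicInteger();

    public static int nextSerialNumber() {
        return serialNumber.getAndIncrement(); // atomic read-modify-write
    }
}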
With private static int serial, all SerialChecker threads share the same serial. For example:
Thread1 sets serial = 1
Thread2 sets serial = 2
Thread1 puts 2 into the CircularSet.
Thread2 finds it is a duplicate and exits.
However, if you declare another int serial in the run method, it will shadow the private static int serial, which means each thread has its own serial to assign and check. Since the generation of the serial is inside the synchronized block, there will be no duplicates.
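A stripped-down illustration of the shadowing effect (a hypothetical ShadowDemo class, not from the original post):
public class ShadowDemo implements Runnable {
    static int value; // one copy shared by all threads

    public void run() {
        // int value;  // uncomment: shadows the static field, one copy per thread
        value = (int) (Math.random() * 1000);
        // With the static field, another thread may overwrite 'value' between
        // the assignment above and the read below; with the local, it cannot.
        System.out.println(Thread.currentThread().getName() + " -> " + value);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 4; i++) {
            new Thread(new ShadowDemo()).start();
        }
    }
}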

Creating an endless Iterator with a given distribution

Given a java.util.Collection what is the easiest way to create an endless java.util.Iterator which returns those elements such that they show up according to a given distribution (org.apache.commons.math.distribution)?
final List<Object> l = new ArrayList<Object>(coll);
Iterator<Object> i = new Iterator<Object>() {
    public boolean hasNext() { return true; }
    public Object next() {
        return l.get(distribution.nextInt(0, l.size()));
    }
    public void remove() { throw new UnsupportedOperationException(); }
};
Your problem is then how to adapt the Distribution classes in the Apache library to implement the nextInt method. I have to say that it is far from obvious to me how you can actually do this from the Distribution interface.
One (slightly rubbish) way I can think of is to generate an EmpiricalDistribution (in the random package) dataset using the probabilities defined by your actual distribution, and then use that empirical distribution as the distribution above.
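For what it's worth, here is a hedged sketch of deriving such an iterator via inverse-transform sampling; it assumes a commons-math distribution exposing inverseCumulativeProbability (names per commons-math 3.x; older versions declare a checked MathException), and the clamping of the sample onto valid indices is a crude placeholder:
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.Random;
import org.apache.commons.math3.distribution.RealDistribution;

class DistributionIterator<T> implements Iterator<T> {
    private final List<T> elements;
    private final RealDistribution dist;
    private final Random rnd = new Random();

    DistributionIterator(Collection<T> coll, RealDistribution dist) {
        this.elements = new ArrayList<T>(coll);
        this.dist = dist;
    }

    public boolean hasNext() { return true; }

    public T next() {
        // The inverse CDF turns a uniform sample into one from the distribution.
        double x = dist.inverseCumulativeProbability(rnd.nextDouble());
        // Crude clamp onto the index range; a real implementation would scale
        // the distribution's support onto [0, size) instead.
        int idx = (int) Math.min(elements.size() - 1, Math.max(0, Math.round(x)));
        return elements.get(idx);
    }

    public void remove() { throw new UnsupportedOperationException(); }
}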
Solution for Gaussian distribution
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Random;
import java.util.SortedMap;
import java.util.Map.Entry;
import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.ImmutableSortedMap;
import com.google.common.collect.Lists;
import com.google.common.collect.Multimap;
import com.google.common.collect.ImmutableSortedMap.Builder;
/**
* Endless sequence with gaussian distribution.
*
* @param <T> the type of the elements
* @author Michael Locher
*/
public class GaussianSequence<T> implements Iterable<T>, Iterator<T> {
private static final int HISTOGRAMM_SAMPLES = 50000;
private static final int HISTOGRAMM_ELEMENTS = 100;
private static final int HISTOGRAMM_LENGTH = 80;
private static final double DEFAULT_CUTOFF = 4.0;
private final List<T> elements;
private final int maxIndex;
private final Random rnd;
private final double scaling;
private final double halfCount;
/**
* Creates this.
* #param rnd the source of randomness to use
* #param elements the elements to deliver
*/
public GaussianSequence(final Random rnd, final Collection<T> elements) {
this(rnd, DEFAULT_CUTOFF, elements);
}
private GaussianSequence(final Random rnd, final double tailCutOff, final Collection<T> elements) {
super();
this.rnd = rnd;
this.elements = new ArrayList<T>(elements);
if (this.elements.isEmpty()) {
throw new IllegalArgumentException("no elements provided");
}
this.maxIndex = this.elements.size() - 1;
this.halfCount = this.elements.size() / 2.0;
this.scaling = this.halfCount / tailCutOff;
}
/**
* {@inheritDoc}
*/
@Override
public Iterator<T> iterator() {
return this;
}
/**
* {@inheritDoc}
*/
@Override
public boolean hasNext() {
return true;
}
/**
* {@inheritDoc}
*/
@Override
public void remove() {
throw new UnsupportedOperationException();
}
/**
* {@inheritDoc}
*/
@Override
public T next() {
return this.elements.get(sanitizeIndex(determineNextIndex()));
}
private int determineNextIndex() {
final double z = this.rnd.nextGaussian();
return (int) (this.halfCount + (this.scaling * z));
}
private int sanitizeIndex(final int index) {
if (index < 0) {
return 0;
}
if (index > this.maxIndex) {
return this.maxIndex;
}
return index;
}
/**
* Prints a histogramm to stdout.
* @param args not used
*/
public static void main(final String[] args) {
final PrintWriter out = new PrintWriter(new OutputStreamWriter(System.out, Charset.forName("UTF-8")), true);
plotHistogramm(new Random(), out);
}
private static void plotHistogramm(final Random rnd, final PrintWriter out) {
// build elements
final Multimap<Integer, Integer> results = ArrayListMultimap.create();
final List<Integer> elements = Lists.newArrayListWithCapacity(HISTOGRAMM_ELEMENTS);
for (int i = 1; i < HISTOGRAMM_ELEMENTS; i++) {
elements.add(i);
}
// sample sequence
final Iterator<Integer> randomSeq = new GaussianSequence<Integer>(rnd, elements);
for (int j = 0; j < HISTOGRAMM_SAMPLES; j++) {
final Integer sampled = randomSeq.next();
results.put(sampled, sampled);
}
// count and sort results
final Builder<Integer, Integer> r = ImmutableSortedMap.naturalOrder();
for (final Entry<Integer, Collection<Integer>> e : results.asMap().entrySet()) {
final int count = e.getValue().size();
r.put(e.getKey(), count);
}
// plot results
final SortedMap<Integer, Integer> sortedAndCounted = r.build();
final double histogramScale = (double) HISTOGRAMM_LENGTH / Collections.max(sortedAndCounted.values());
for (final Entry<Integer, Integer> e : sortedAndCounted.entrySet()) {
out.format("%3d [%4d]", e.getKey(), e.getValue());
final StringBuilder c = new StringBuilder();
final int lineLength = (int) (histogramScale * e.getValue());
for (int i = 0; i < lineLength; i++) {
c.append('*');
}
out.println(c);
}
}
}

writing a Comparator for a compound object for binary searching

I have a class, and list of instances, that looks something like this (field names changed to protect the innocent/proprietary):
public class Bloat
{
public long timeInMilliseconds;
public long spaceInBytes;
public long costInPennies;
}
public class BloatProducer
{
final private List<Bloat> bloatList = new ArrayList<Bloat>();
final private Random random = new Random();
public void produceMoreBloat()
{
int n = bloatList.size();
Bloat previousBloat = (n == 0) ? new Bloat() : bloatList.get(n-1);
Bloat newBloat = new Bloat();
newBloat.timeInMilliseconds =
previousBloat.timeInMilliseconds + random.nextInt(10) + 1;
newBloat.spaceInBytes =
previousBloat.spaceInBytes + random.nextInt(10) + 1;
newBloat.costInPennies =
previousBloat.costInPennies + random.nextInt(10) + 1;
bloatList.add(newBloat);
}
/* other fields/methods */
public boolean testMonotonicity()
{
Bloat previousBloat = null;
for (Bloat thisBloat : bloatList)
{
if (previousBloat != null)
{
if ((previousBloat.timeInMilliseconds
>= thisBloat.timeInMilliseconds)
|| (previousBloat.spaceInBytes
>= thisBloat.spaceInBytes)
|| (previousBloat.costInPennies
>= thisBloat.costInPennies))
return false;
}
previousBloat = thisBloat;
}
return true;
}
BloatProducer bloatProducer;
The list bloatList is kept internally by BloatProducer and is maintained in such a way that it only appends new Bloat records, does not modify any of the old ones, and each of the fields is monotonically increasing, e.g. bloatProducer.testMonotonicity() would always return true.
I would like to use Collections.binarySearch(list,key,comparator) to search for the Bloat record by either the timeInMilliseconds, spaceInBytes, or costInPennies fields. (and if the number is between two records, I want to find the previous record)
What's the easiest way to write a series of 3 Comparator classes to get this to work? Do I have to use a key that is a Bloat object with dummy fields for the ones I'm not searching for?
You'll need to write a separate comparator for each field you want to compare on:
public class BloatTimeComparator implements Comparator<Bloat> {
public int compare(Bloat bloat1, Bloat bloat2) {
if (bloat1.timeInMilliseconds > bloat2.timeInMilliseconds) {
return 1;
} else if (bloat1.timeInMilliseconds < bloat2.timeInMilliseconds) {
return -1;
} else {
return 0;
}
}
}
And so on for each property in Bloat you want to compare on (you'll need to create a comparator class for each). Then use the Collections helper method:
Collections.binarySearch(bloatList, bloatObjectToFind,
new BloatTimeComparator());
From the Java documentation for the binarySearch method, the return value will be:
the index of the search key, if it is contained in the list; otherwise, (-(insertion point) - 1). The insertion point is defined as the point at which the key would be inserted into the list: the index of the first element greater than the key, or list.size() if all elements in the list are less than the specified key. Note that this guarantees that the return value will be >= 0 if and only if the key is found.
This gives you the index you said you wanted.
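Since you want the previous record when the key falls between two entries, the encoded insertion point can be decoded like this (a small sketch following the contract quoted above):
int idx = Collections.binarySearch(bloatList, key, new BloatTimeComparator());
if (idx < 0) {
    int insertionPoint = -idx - 1; // index of first element greater than the key
    idx = insertionPoint - 1;      // previous record; -1 if key precedes all elements
}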
You will need to have 3 separate Comparators if you want to search by each of the 3 properties.
A cleaner option would be to have a generic Comparator which receives a parameter which tells it by which field to compare.
A basic generic comparator should look something like this:
enum CompareByEnum { TIME, SPACE, COST }

public class BloatComparator implements Comparator<Bloat>
{
    final CompareByEnum field;

    public BloatComparator(CompareByEnum field) {
        this.field = field;
    }

    @Override
    public int compare(Bloat arg0, Bloat arg1) {
        if (this.field == CompareByEnum.TIME) {
            // compare by the time field
            return Long.compare(arg0.timeInMilliseconds, arg1.timeInMilliseconds);
        } else if (this.field == CompareByEnum.SPACE) {
            // compare by the space field
            return Long.compare(arg0.spaceInBytes, arg1.spaceInBytes);
        } else {
            // compare by the cost field
            return Long.compare(arg0.costInPennies, arg1.costInPennies);
        }
    }
}
Here's a test-driven approach to writing the first comparator:
public class BloatTest extends TestCase{
public class Bloat {
public long timeInMilliseconds;
public long spaceInBytes;
public long costInPennies;
public Bloat(long timeInMilliseconds, long spaceInBytes, long costInPennies) {
this.timeInMilliseconds = timeInMilliseconds;
this.spaceInBytes = spaceInBytes;
this.costInPennies = costInPennies;
}
}
public void testMillisecondComparator() throws Exception {
Bloat a = new Bloat(5, 10, 10);
Bloat b = new Bloat(3, 12, 12);
Bloat c = new Bloat(5, 12, 12);
Comparator<Bloat> comparator = new MillisecondComparator();
assertTrue(comparator.compare(a, b) > 0);
assertTrue(comparator.compare(b, a) < 0);
assertEquals(0, comparator.compare(a, c));
}
private static class MillisecondComparator implements Comparator<Bloat> {
public int compare(Bloat a, Bloat b) {
Long aTime = a.timeInMilliseconds;
return aTime.compareTo(b.timeInMilliseconds);
}
}
}
If you want to leverage the binary search for all three properties, you have to create comparators for them and have additional Lists or TreeSets sorted by the comparators.
Here is a test program (MultiBinarySearch.java) to check whether these ideas work properly (they appear to):
package com.example.test;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
class Bloat
{
final public long timeInMilliseconds;
final public long spaceInBytes;
final public long costInPennies;
static final private int N = 100;
public Bloat(long l1, long l2, long l3) {
timeInMilliseconds = l1;
spaceInBytes = l2;
costInPennies = l3;
}
public Bloat() { this(0,0,0); }
public Bloat moreBloat(Random r)
{
return new Bloat(
timeInMilliseconds + r.nextInt(N) + 1,
spaceInBytes + r.nextInt(N) + 1,
costInPennies + r.nextInt(N) + 1
);
}
public String toString() {
return "[bloat: time="+timeInMilliseconds
+", space="+spaceInBytes
+", cost="+costInPennies
+"]";
}
static int compareLong(long l1, long l2)
{
if (l2 > l1)
return -1;
else if (l1 > l2)
return 1;
else
return 0;
}
public static class TimeComparator implements Comparator<Bloat> {
public int compare(Bloat bloat1, Bloat bloat2) {
return compareLong(bloat1.timeInMilliseconds, bloat2.timeInMilliseconds);
}
}
public static class SpaceComparator implements Comparator<Bloat> {
public int compare(Bloat bloat1, Bloat bloat2) {
return compareLong(bloat1.spaceInBytes, bloat2.spaceInBytes);
}
}
public static class CostComparator implements Comparator<Bloat> {
public int compare(Bloat bloat1, Bloat bloat2) {
return compareLong(bloat1.costInPennies, bloat2.costInPennies);
}
}
enum Type {
TIME(new TimeComparator()),
SPACE(new SpaceComparator()),
COST(new CostComparator());
public Comparator<Bloat> comparator;
Type(Comparator<Bloat> c) { this.comparator = c; }
}
}
class BloatProducer
{
final private List<Bloat> bloatList = new ArrayList<Bloat>();
final private Random random = new Random();
public void produceMoreBloat()
{
int n = bloatList.size();
Bloat newBloat =
(n == 0) ? new Bloat() : bloatList.get(n-1).moreBloat(random);
bloatList.add(newBloat);
}
/* other fields/methods */
public boolean testMonotonicity()
{
Bloat previousBloat = null;
for (Bloat thisBloat : bloatList)
{
if (previousBloat != null)
{
if ((previousBloat.timeInMilliseconds
>= thisBloat.timeInMilliseconds)
|| (previousBloat.spaceInBytes
>= thisBloat.spaceInBytes)
|| (previousBloat.costInPennies
>= thisBloat.costInPennies))
return false;
}
previousBloat = thisBloat;
}
return true;
}
public int searchBy(Bloat.Type t, Bloat key)
{
return Collections.binarySearch(bloatList, key, t.comparator);
}
public void showSearch(Bloat.Type t, Bloat key)
{
System.out.println("Search by "+t+": ");
System.out.println(key);
int i = searchBy(t,key);
if (i >= 0)
{
System.out.println("matches");
System.out.println(bloatList.get(i));
}
else
{
System.out.println("is between");
i = -i-1;
Bloat b1 = (i == 0) ? null : bloatList.get(i-1);
System.out.println(b1);
Bloat b2 = (i >= bloatList.size()) ? null : bloatList.get(i);
System.out.println("and");
System.out.println(b2);
}
}
}
public class MultiBinarySearch {
private static int N = 1000;
public static void main(String[] args)
{
BloatProducer bloatProducer = new BloatProducer();
for (int i = 0; i < N; ++i)
{
bloatProducer.produceMoreBloat();
}
System.out.println("testMonotonicity() returns "+
bloatProducer.testMonotonicity());
Bloat key;
key = new Bloat(10*N, 20*N, 30*N);
bloatProducer.showSearch(Bloat.Type.COST, key);
bloatProducer.showSearch(Bloat.Type.SPACE, key);
bloatProducer.showSearch(Bloat.Type.TIME, key);
key = new Bloat(-10000, 0, 1000*N);
bloatProducer.showSearch(Bloat.Type.COST, key);
bloatProducer.showSearch(Bloat.Type.SPACE, key);
bloatProducer.showSearch(Bloat.Type.TIME, key);
}
}

How can I get an int[] out of an Iterator?

I have what amounts to an Iterator<Integer>... actually it's a class Thing that accepts a Visitor<SomeObject> and calls visit() for a subset of the SomeObjects it contains, and I have to implement Visitor<SomeObject> so it does something like this:
// somehow get all the Id's from each of the SomeObject that Thing lets me visit
public int[] myIdExtractor(Thing thing)
{
SomeCollection c = new SomeCollection();
thing.visitObjects(new Visitor<SomeObject>()
{
public void visit(SomeObject obj) { c.add(obj.getId()); }
}
);
return convertToPrimitiveArray(c);
}
I need to extract an int[] containing the results, and I'm not sure what to use for SomeCollection and convertToPrimitiveArray. The number of results is unknown ahead of time and will be large (10K-500K). Is there anything that would be a better choice than using ArrayList<Integer> for SomeCollection, and this:
public int[] convertToPrimitiveArray(List<Integer> ints)
{
int N = ints.size();
int[] array = new int[N];
int j = 0;
for (Integer i : ints)
{
array[j++] = i;
}
return array;
}
Efficiency and memory usage are of some concern.
It's not too difficult to come up with a class that collects ints in an array (even if you are not using some library which does it for you).
public class IntBuffer {
    private int[] values = new int[10];
    private int size = 0;

    public void add(int value) {
        if (!(size < values.length)) {
            values = java.util.Arrays.copyOf(values, values.length * 2);
        }
        values[size++] = value;
    }

    public int[] toArray() {
        return java.util.Arrays.copyOf(values, size);
    }
}
(Disclaimer: This is stackoverflow, I have not even attempted to compile this code.)
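Usage would then be along these lines (assuming the IntBuffer class above):
IntBuffer buffer = new IntBuffer();
for (int id : new int[] { 3, 1, 4, 1, 5 }) {
    buffer.add(id); // e.g. called from the visitor
}
int[] result = buffer.toArray(); // trimmed copy of exactly 'size' elements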
As an alternative you could use DataOutputStream to store the ints in a ByteArrayOutputStream.
final ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
final DataOutputStream out = new DataOutputStream(byteOut);
...
out.writeInt(value);
...
out.flush();
final byte[] bytes = byteOut.toByteArray();
final int[] ints = new int[bytes.length / 4];
final ByteArrayInputStream byteIn = new ByteArrayInputStream(bytes);
final DataInputStream in = new DataInputStream(byteIn);
for (int ct = 0; ct < ints.length; ++ct) {
    ints[ct] = in.readInt();
}
(Disclaimer: This is stackoverflow, I have not even attempted to compile this code.)
You could look at something like pjc to handle this. That is a collections framework made for primitives.
For benchmarking's sake I put together a test program using an LFSR generator to prevent the compiler from optimizing out the test arrays. I couldn't download pjc, but I assume its timing would be similar to Tom's IntBuffer class, which is by far the winner. The ByteArrayOutputStream approach is about the same speed as my original ArrayList<Integer> approach. I'm running J2SE 6u13 on a 3GHz Pentium 4, and with approximately 2^20 values, after JIT has run its course, the IntBuffer approach takes roughly 40 msec (only 40 nsec per item!) above and beyond a reference implementation using a "forgetful" collection that just stores the last argument to visit() (so the compiler doesn't optimize it out). The other two approaches take on the order of 300 msec, about 8x as slow.
Edit: I suspect the problem with the stream approach is the potential for exceptions, which I had to catch; I'm not sure.
(Run with the arguments: PrimitiveArrayTest 1 2)
package com.example.test.collections;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class PrimitiveArrayTest {
interface SomeObject {
public int getX();
}
interface Visitor {
public void visit(SomeObject obj);
}
public static class PlainObject implements SomeObject
{
private int x;
public int getX() { return this.x; }
public void setX(int x) { this.x = x; }
}
public static class Thing
{
/* here's a LFSR
* see http://en.wikipedia.org/wiki/Linear_feedback_shift_register
* and http://www.ece.cmu.edu/~koopman/lfsr/index.html
*/
private int state;
final static private int MASK = 0x80004;
private void _next()
{
this.state = (this.state >>> 1)
^ (-(this.state & 1) & MASK);
}
public Thing(int state) { this.state = state; }
public void setState(int state) { this.state = state; }
public void inviteVisitor(Visitor v, int terminationPoint)
{
PlainObject obj = new PlainObject();
while (this.state != terminationPoint)
{
obj.setX(this.state);
v.visit(obj);
_next();
}
}
}
static public abstract class Collector implements Visitor
{
abstract public void initCollection();
abstract public int[] getCollection();
public int[] extractX(Thing thing, int startState, int endState)
{
initCollection();
thing.setState(startState);
thing.inviteVisitor(this, endState);
return getCollection();
}
public void doit(Thing thing, int startState, int endState)
{
System.out.printf("%s.doit(thing,%d,%d):\n",
getClass().getName(),
startState,
endState);
long l1 = System.nanoTime();
int[] result = extractX(thing,startState,endState);
long l2 = System.nanoTime();
StringBuilder sb = new StringBuilder();
sb.append(String.format("%d values calculated in %.4f msec ",
result.length, (l2-l1)*1e-6));
int N = 3;
if (result.length <= 2*N)
{
sb.append("[");
for (int i = 0; i < result.length; ++i)
{
if (i > 0)
sb.append(", ");
sb.append(result[i]);
}
sb.append("]");
}
else
{
int sz = result.length;
sb.append(String.format("[%d, %d, %d... %d, %d, %d]",
result[0], result[1], result[2],
result[sz-3], result[sz-2], result[sz-1]));
}
System.out.println(sb.toString());
}
}
static public class Collector0 extends Collector
{
int lastint = 0;
@Override public int[] getCollection() { return new int[]{lastint}; }
@Override public void initCollection() {}
@Override public void visit(SomeObject obj) { lastint = obj.getX(); }
}
static public class Collector1 extends Collector
{
final private List<Integer> ints = new ArrayList<Integer>();
@Override public int[] getCollection() {
int N = this.ints.size();
int[] array = new int[N];
int j = 0;
for (Integer i : this.ints)
{
array[j++] = i;
}
return array;
}
@Override public void initCollection() { }
@Override public void visit(SomeObject obj) { ints.add(obj.getX()); }
}
static public class Collector2 extends Collector
{
/*
* adapted from http://stackoverflow.com/questions/1167060
* by Tom Hawtin
*/
private int[] values;
private int size = 0;
@Override public void visit(SomeObject obj) { add(obj.getX()); }
@Override public void initCollection() { values = new int[32]; }
private void add(int value) {
if (!(this.size < this.values.length)) {
this.values = java.util.Arrays.copyOf(
this.values, this.values.length*2);
}
this.values[this.size++] = value;
}
@Override public int[] getCollection() {
return java.util.Arrays.copyOf(this.values, this.size);
}
}
static public class Collector3 extends Collector
{
/*
* adapted from http://stackoverflow.com/questions/1167060
* by Tom Hawtin
*/
final ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
final DataOutputStream out = new DataOutputStream(this.byteOut);
int size = 0;
@Override public int[] getCollection() {
try
{
this.out.flush();
final int[] ints = new int[this.size];
final ByteArrayInputStream byteIn
= new ByteArrayInputStream(this.byteOut.toByteArray());
final DataInputStream in = new DataInputStream(byteIn);
for (int ct=0; ct<ints.length; ++ct) {
ints[ct] = in.readInt();
}
return ints;
}
catch (IOException e) { /* gulp */ }
return new int[0]; // failure!?!??!
}
@Override public void initCollection() { }
@Override public void visit(SomeObject obj) {
try {
this.out.writeInt(obj.getX());
++this.size;
}
catch (IOException e) { /* gulp */ }
}
}
public static void main(String args[])
{
int startState = Integer.parseInt(args[0]);
int endState = Integer.parseInt(args[1]);
Thing thing = new Thing(0);
// let JIT do its thing
for (int i = 0; i < 20; ++i)
{
Collector[] collectors = {new Collector0(), new Collector1(), new Collector2(), new Collector3()};
for (Collector c : collectors)
{
c.doit(thing, startState, endState);
}
System.out.println();
}
}
}
Instead of convertToPrimitiveArray, you can use List.toArray(T[] a), though note that it yields a boxed Integer[] rather than a primitive int[], since generics don't work with primitive types:
ArrayList<Integer> al = new ArrayList<Integer>();
// populate al
Integer[] values = new Integer[al.size()];
al.toArray(values);
For your other concerns, LinkedList might be slightly better than ArrayList, given that you don't know the size of your result set in advance.
If performance is really a problem, you may be better off hand-managing an int[] yourself, and using System.arraycopy() each time it grows; the boxing/unboxing from int to Integer that you need for any Collection could hurt.
As with any performance-related question, of course, test and make sure it really matters before spending too much time optimizing.
