I have two collections of type Long, both of size 20-30 million. What is the quickest way to remove from one the elements that are also present in the second? The less heap space taken, the better, as there are other things going on in parallel.
I know LinkedList is better than ArrayList for removals via an Iterator, but I'm just not sure whether I need to iterate over each element at all. I want to poll for better approaches; both collections are sorted.
Edit: I previously stated my collection sizes as 2-3 million, I realized it is 20-30 million.
There will be lots of overlaps. The exact type of Collections is open to debate as well.
With counts in the range of millions, solutions with O(n^2) complexity should be out. You have two basic solutions here:
Sort the second collection, and use binary search for an O((N+M)*logM) solution, or
Put elements from the second collection into a hash container, for an O(N+M) solution
Above, N is the number of elements in the first collection, and M is the number of elements in the second collection.
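To put the sizes in perspective: with N and M both around 25 million, an O(N*M) nested scan is roughly 6 x 10^14 element comparisons, whereas N+M is only a few tens of millions. The hash-container approach (option 2) looks like this: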
Set<Long> toRemove = new HashSet<Long>(collection2);
Iterator<Long> iter = collection1.iterator();
while (iter.hasNext()) {
    if (toRemove.contains(iter.next())) {
        iter.remove();
    }
}
Note that if collection1 is an ArrayList, this will be very slow, because each Iterator.remove() shifts all the remaining elements. If you must keep it an ArrayList, you can do it like this:
int rd = 0, wr = 0;
// Copy the elements you are keeping into a contiguous range at the front
while (rd != arrayList1.size()) {
    Long last = arrayList1.get(rd++);
    if (!toRemove.contains(last)) {
        arrayList1.set(wr++, last);
    }
}
// Remove the stale "tail" elements (removing from the end is cheap)
while (rd > wr) {
    arrayList1.remove(--rd);
}
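For completeness, here is a minimal sketch of option 1, assuming both collections are sorted ArrayLists in natural order (it relies on java.util.Collections.binarySearch; the names are illustrative, not from the question):
// Option 1 sketch: keep only the elements of list1 that binary search
// does NOT find in the sorted list2, compacting list1 in place.
static void removeCommon(ArrayList<Long> list1, ArrayList<Long> list2) {
    int wr = 0;
    for (int rd = 0; rd < list1.size(); rd++) {
        Long value = list1.get(rd);
        if (Collections.binarySearch(list2, value) < 0) {
            list1.set(wr++, value);   // not present in list2, so keep it
        }
    }
    // Drop the stale tail left over after compaction
    while (list1.size() > wr) {
        list1.remove(list1.size() - 1);
    }
}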
Without growing the heap:
Collection<Long> a = new HashSet<Long>();
//fill a
Collection<Long> b = new ArrayList<Long>();
//fill b
for(int i = 0; i < b.size(); i++){
a.remove(b.get(i));
}
b.size() and b.get(int i) run in constant time according to Oracle's Javadoc.
a.remove(Object o) also runs in (expected) constant time for a HashSet.
First port of call would be the Collection.removeAll method. This uses no extra heap space, and its time complexity depends on the performance of the contains method on your second collection. If your second collection is a TreeSet then a.removeAll(b) takes O(n log(m)) time (where n is the size of a and m is the size of b); if b is a HashSet then it takes O(n) time; if b is a sorted ArrayList then it's O(nm), but you can create a new wrapper Collection that uses a binary search to reduce it to O(n log(m)) for a negligible constant memory cost:
private static class SortedList<T extends Comparable<? super T>> extends com.google.common.collect.ForwardingList<T>
{
private final List<T> delegate;
public SortedList(ArrayList<T> delegate)
{
this.delegate = delegate;
}
@Override
protected List<T> delegate()
{
return delegate;
}
@Override
public boolean contains(Object object)
{
return Collections.binarySearch(delegate, (T) object) >= 0;
}
}
static <E extends Comparable<? super E>> void removeAll(Collection<E> a, ArrayList<E> b)
{
//assumes that b is sorted
a.removeAll(new SortedList<E>(b));
}
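A hypothetical call site for that helper, with collection1 and collection2 standing in for the two original collections:
// Hypothetical usage sketch; assumes the SortedList wrapper and removeAll helper above.
Set<Long> a = new HashSet<Long>(collection1);
ArrayList<Long> b = new ArrayList<Long>(collection2);
Collections.sort(b);   // skip if collection2 is already sorted
removeAll(a, b);       // afterwards, a contains no element that also appears in b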
You should take a look at Apache Commons Collections.
I tested it with LinkedLists containing ~3M Longs, and it gives pretty good results:
Random r = new Random();
List<Long> list1 = new LinkedList<Long>();
for (int i = 0; i < 3000000; i++) {
list1.add(r.nextLong());
}
List<Long> list2 = new LinkedList<Long>();
for (int i = 0; i < 2000000; i++) {
list2.add(r.nextLong());
}
Collections.sort(list1);
Collections.sort(list2);
long time = System.currentTimeMillis();
List<Long> list3 = ListUtils.subtract(list2, list1);
System.out.println("ListUtils.subtract = " + (System.currentTimeMillis() - time));
I can't promise this is the best solution, but it is an easy one.
I get an execution time of:
1247 ms
Drawback: it creates a new List.
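For reference, a minimal sketch of the import this relies on; the package name below assumes Commons Collections 4 (in the 3.x line it is org.apache.commons.collections.ListUtils):
import org.apache.commons.collections4.ListUtils;
// ...
List<Long> difference = ListUtils.subtract(list2, list1);  // a new list: list2 with list1's elements removed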
Related
We found out that the ArrayList.addAll(...) method doesn't check whether the given collection is empty.
It means we will re-allocate memory (System.arraycopy(...) will be called) even if we don't actually need to.
Should we add an if check as an optimization?
For example, the first snippet below runs more than 8 times faster than the second one.
It seems like our optimization works properly, so why wasn't it implemented already?
List<Integer> existed = new ArrayList<>();
List<Integer> empty = new ArrayList<>();
for (int i = 0; i < 100_000_000; i++) {
if(!empty.isEmpty())
existed.addAll(empty);
}
VS
List<Integer> existed = new ArrayList<>();
List<Integer> empty = new ArrayList<>();
for (int i = 0; i < 100_000_000; i++) {
existed.addAll(empty);
}
In JDK 14, ArrayList.addAll() copies the collection to add into an array and increments the ArrayList modification count, even if there are no elements to add.
public boolean addAll(Collection<? extends E> c) {
Object[] a = c.toArray();
modCount++;
int numNew = a.length;
if (numNew == 0)
return false;
...
So your test case that does not use isEmpty() beforehand causes the unnecessary allocation of 100,000,000 new Object[0] instances and increments the modification count the same number of times, which would explain why it appears slower.
Note that System.arraycopy is not called unless the array is growing beyond the capacity of the existing array.
I've scanned my own code just now, and I doubt adding the isEmpty() test would speed it up, as there is generally something to add each time, so putting in these extra checks would be unnecessary. You'd need to decide for your own situation.
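If you do measure a hot path where the source collection is usually empty, one option is a tiny helper (hypothetical, not part of any library) so the guard lives in one place:
// Hypothetical helper: skips the toArray()/modCount work described above when there is nothing to add.
static <E> boolean addAllIfNotEmpty(Collection<E> target, Collection<? extends E> source) {
    return !source.isEmpty() && target.addAll(source);
}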
Suppose I had the following code:
public Set<String> csvToSet(String src) {
String[] splitted = src.split(",");
Set<String> result = new HashSet<>(splitted.length);
for (String s : splitted) {
result.add(s);
}
return result;
}
so I need to transform an array into a Set.
IntelliJ IDEA suggests replacing my for-each loop with a Collection.addAll one-liner, so I get:
...
Set<String> result = new HashSet<>(splitted.length);
result.addAll(Arrays.asList(splitted));
return result;
The complete inspection message is:
This inspection warns when calling some method in a loop (e.g. collection.add(x)) could be replaced with calling a bulk method (e.g. collection.addAll(listOfX)).
If checkbox "Use Arrays.asList() to wrap arrays" is checked, the inspection will warn even if the original code iterates over an array while bulk method requires a Collection. In this case the quick-fix action will automatically wrap an array with Arrays.asList() call.
From inspection description it sounds like it works as expected.
If we refer to the top answer to a question about converting an array into a Set (How to convert an Array to a Set in Java), the same one-liner is suggested:
Set<T> mySet = new HashSet<T>(Arrays.asList(someArray));
Even though creating an ArrayList from an array is O(1), I do not like the idea of creating an additional List object.
Usually I trust Intellij inspections and assume it does not offer anything less efficient.
But today I am curious why both the top SO answer and IntelliJ IDEA (with default settings) recommend the same one-liner that creates a useless intermediate List object, while Collections.addAll(destCollection, yourArray) has existed since JDK 6.
The only reason I see is that both the inspection and the answers are simply old. If so, here is a reason to improve IntelliJ IDEA and to give more votes to an answer proposing Collections.addAll() :)
A hint as to why IntelliJ doesn't suggest the new HashSet<>(Arrays.asList(splitted)) one-liner as a replacement for
Set<String> result = new HashSet<>(splitted.length);
result.addAll(Arrays.asList(splitted));
return result;
is in the source code for HashSet(Collection):
public HashSet(Collection<? extends E> c) {
map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
addAll(c);
}
Note that the capacity of the set isn't the size of c.
As such, the change would not be semantically equivalent.
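As a rough worked example (assuming the default 0.75 load factor and the power-of-two table sizing current HashMap implementations use): for 100 distinct elements, new HashSet<>(100) allocates a 128-bucket table with a resize threshold of 96, so adding all 100 elements triggers a resize, while new HashSet<>(aListOf100) requests max((int)(100 / .75f) + 1, 16) = 134, which rounds up to a 256-bucket table and never resizes.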
Don't worry about creating the List. It is really cheap. It's not free; but you would have to be using it in a really performance-critical loop to notice.
I wrote a small function to measure the performance of the three ways of adding the array to a HashSet and here are the results.
First, the base code used by all of them, which generates an array of maxSize string values between 0 and 99:
int maxSize = 10000000; // 10M values
String[] s = new String[maxSize];
Random r = new Random();
for (int i = 0; i < maxSize; i++) {
s[i] = "" + r.nextInt(100);
}
Then the benchmark function:
public static void benchmark(String name, Runnable f) {
Long startTime = System.nanoTime();
f.run();
Long endTime = System.nanoTime();
System.out.println("Total execution time for: " + name + ": " + (endTime-startTime) / 1000000 + "ms");
}
The first way is using your code with a loop; for 10M values it took between 150ms and 190ms (I ran the benchmark several times for each method):
Main.benchmark("Normal loop ", () -> {
Set<String> result = new HashSet<>(s.length);
for (String a : s) {
result.add(a);
}
});
The second way is using result.addAll(Arrays.asList(s)), which took between 180ms and 220ms:
Main.benchmark("result.addAll(Arrays.asList(s)): ", () -> {
Set<String> result = new HashSet<>(s.length);
result.addAll(Arrays.asList(s));
});
The third way is using Collections.addAll(result, s), which took between 170ms and 200ms:
Main.benchmark("Collections.addAll(result, s); ", () -> {
Set<String> result = new HashSet<>(s.length);
Collections.addAll(result, s);
});
Now the explanation. In terms of runtime complexity they all run in O(N), meaning that for N values, N add operations are performed.
In terms of memory complexity it is again O(N) for all of them; only the new HashSet is created.
Arrays.asList(someArray) does not create a new array; it just creates a new object holding a reference to the existing array. You can see it in the JDK code:
private final E[] a;
ArrayList(E[] array) {
a = Objects.requireNonNull(array);
}
Besides that, all the addAll methods are going to do exactly what you did, a for-loop:
// addAll method for Collections.addAll(result, s);
public static <T> boolean addAll(Collection<? super T> c, T... elements) {
boolean result = false;
for (T element : elements)
result |= c.add(element);
return result;
}
// addAll method for result.addAll(Arrays.asList(s));
public boolean addAll(Collection<? extends E> c) {
boolean modified = false;
for (E e : c)
if (add(e))
modified = true;
return modified;
}
To conclude, since the runtime difference is so small, IntelliJ suggests a clearer, shorter way to write the code.
I want to merge sorted lists into a single list. How is this solution? I believe it runs in O(n) time. Any glaring flaws, inefficiencies, or stylistic issues?
I don't really like the idiom of setting a flag for "this is the first iteration" and using it to make sure "lowest" has a default value. Is there a better way around that?
public static <T extends Comparable<? super T>> List<T> merge(Set<List<T>> lists) {
List<T> result = new ArrayList<T>();
int totalSize = 0; // every element in the set
for (List<T> l : lists) {
totalSize += l.size();
}
boolean first; //awkward
List<T> lowest = lists.iterator().next(); // the list with the lowest item to add
while (result.size() < totalSize) { // while we still have something to add
first = true;
for (List<T> l : lists) {
if (! l.isEmpty()) {
if (first) {
lowest = l;
first = false;
}
else if (l.get(0).compareTo(lowest.get(0)) <= 0) {
lowest = l;
}
}
}
result.add(lowest.get(0));
lowest.remove(0);
}
return result;
}
Note: this isn't homework, but it isn't for production code, either.
Efficiency will suck if lists contains an ArrayList, since lowest.remove(0) will take linear time in the length of the list, making your algorithm O(n^2).
I'd do:
List<T> result = new ArrayList<T>();
for (List<T> list : lists) {
result.addAll(list);
}
Collections.sort(result);
which is in O(n log n), and leaves far less tedious code to test, debug and maintain.
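On Java 8+ the same concatenate-and-sort idea can be written with streams (a sketch, not part of the original answer; needs java.util.stream.Collectors):
List<T> result = lists.stream()
        .flatMap(List::stream)
        .sorted()                          // natural ordering; T is Comparable
        .collect(Collectors.toList());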
Your solution is probably the fastest one. Sorted lists have an insert cost of log(n), so you'll end up with M log(M) (where M is the total size of the lists).
Adding them to one list and sorting, while easier to read, is still M log(M).
Your solution is just M.
You can clean up your code a bit by sizing the result list, and by using a reference to the lowest list instead of a boolean.
public static <T extends Comparable<? super T>> List<T> merge(Set<List<T>> lists) {
int totalSize = 0; // every element in the set
for (List<T> l : lists) {
totalSize += l.size();
}
List<T> result = new ArrayList<T>(totalSize);
List<T> lowest;
while (result.size() < totalSize) { // while we still have something to add
lowest = null;
for (List<T> l : lists) {
if (! l.isEmpty()) {
if (lowest == null) {
lowest = l;
} else if (l.get(0).compareTo(lowest.get(0)) <= 0) {
lowest = l;
}
}
}
result.add(lowest.get(0));
lowest.remove(0);
}
return result;
}
If you're really particular, use a List object as input, and lowest can be initialized to be lists.get(0) and you can skip the null check.
To expand on Anton's comment:
Place the smallest remaining element of each List, along with an indicator of which list it came from, into a heap; then repeatedly take the top element off the heap and push onto the heap the next item from the list that element came from.
Java's PriorityQueue can provide the heap implementation.
This is a really old question, but I don't like any of the submitted answers, so this is what I ended up doing.
The solution of just adding them all into one list and sorting is bad because of the log linear complexity (O(m n log(m n))). If that's not important to you, then it's definitely the simplest and most straightforward answer. Your initial solution isn't bad, but it's a little messy, and @Dathan pointed out that the complexity is O(m n) for m lists and n total elements. You can reduce this to O(n log(m)) by using a heap to reduce the number of comparisons for each element. I use a helper class that allows me to compare iterables. This way I don't destroy the initial lists, and it should operate with reasonable complexity no matter what type of lists are input. The only flaw I see with the implementation below is that it doesn't support lists with null elements, however this could be fixed with sentinels if desired.
public static <E extends Comparable<? super E>> List<E> merge(Collection<? extends List<? extends E>> lists) {
PriorityQueue<CompIterator<E>> queue = new PriorityQueue<CompIterator<E>>();
for (List<? extends E> list : lists)
if (!list.isEmpty())
queue.add(new CompIterator<E>(list.iterator()));
List<E> merged = new ArrayList<E>();
while (!queue.isEmpty()) {
CompIterator<E> next = queue.remove();
merged.add(next.next());
if (next.hasNext())
queue.add(next);
}
return merged;
}
private static class CompIterator<E extends Comparable<? super E>> implements Iterator<E>, Comparable<CompIterator<E>> {
E peekElem;
Iterator<? extends E> it;
public CompIterator(Iterator<? extends E> it) {
this.it = it;
if (it.hasNext()) peekElem = it.next();
else peekElem = null;
}
@Override
public boolean hasNext() {
return peekElem != null;
}
@Override
public E next() {
E ret = peekElem;
if (it.hasNext()) peekElem = it.next();
else peekElem = null;
return ret;
}
@Override
public void remove() {
throw new UnsupportedOperationException();
}
@Override
public int compareTo(CompIterator<E> o) {
if (peekElem == null) return 1;
else return peekElem.compareTo(o.peekElem);
}
}
Every element of the returned list involves two O(log(m)) heap operations, and there is also an initial iteration over all of the lists. Therefore the overall complexity is O(n log(m) + m) for n total elements and m lists, making this asymptotically faster than concatenating and sorting.
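A minimal usage sketch (hypothetical inputs; assumes the merge method and CompIterator class above are in scope, plus java.util.Arrays):
List<Integer> a = Arrays.asList(1, 4, 9);
List<Integer> b = Arrays.asList(2, 3, 8);
List<Integer> c = Arrays.asList(5, 6, 7);
List<Integer> merged = merge(Arrays.asList(a, b, c));  // [1, 2, 3, 4, 5, 6, 7, 8, 9]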
Since Balus and meriton have together given an excellent response to your question about the algorithm, I'll speak to your aside about the "first" idiom.
There are definitely other approaches (like setting lowest to a 'magic' value), but I happen to feel that "first" (to which I'd probably give a longer name, but that's being pedantic) is the best, because it's very clear. Presence of a boolean like "first" is a clear signal that your loop will do something special the first time through. It helps the reader.
Of course you don't need it if you take the Balus/meriton approach, but it's a situation which crops up.
I am writing a simple 3D software rendering engine. I have a plain ArrayList<Object3D> containing the whole scene. Now I want to be able to add, remove and select objects by name, like 3D editors do (because it's MUCH simpler than mouse selection, but still looks good in a homework project :) ).
So the first thing I thought of was to keep a Hashtable mapping each name to an index into the scene ArrayList. But then I thought I could simply store the scene in a Hashtable directly and iterate over it when rendering.
So I want to ask: in a 3D engine, which is preferable for speed? I will loop over the scene many times per second, but select objects only occasionally. Is an ArrayList any faster to iterate than a Hashtable? Thanks.
First, I suggest you use a HashMap instead of a Hashtable, for the same reason that ArrayList is a better choice than a Vector: less overhead due to useless synchronization.
My guess is that iterating through an ArrayList will be faster than iterating through the Set returned by a Hashtable's (or HashMap's) entrySet() method. But the only way to know is to profile.
Obviously, changes to the display list (other than appending or chopping off the last element) are going to be faster for a HashMap than for an ArrayList.
EDIT
So I followed my own advice and benchmarked. Here's the code I used:
import java.util.*;
public class IterTest {
static class Thing {
Thing(String name) { this.name = name; }
String name;
}
static class ArrayIterTest implements Runnable {
private final ArrayList<Thing> list;
ArrayIterTest(ArrayList<Thing> list) {
this.list = list;
}
public void run() {
int i = 0;
for (Thing thing : list) {
++i;
}
}
}
static class ArraySubscriptTest implements Runnable {
private final ArrayList<Thing> list;
ArraySubscriptTest(ArrayList<Thing> list) {
this.list = list;
}
public void run() {
int i = 0;
int n = list.size();
for (int j = 0; j < n; ++j) {
Thing thing = list.get(j);
++i;
}
}
}
static class MapIterTest implements Runnable {
private final Map<String, Thing> map;
MapIterTest(Map<String, Thing> map) {
this.map = map;
}
public void run() {
int i = 0;
Set<Map.Entry<String, Thing>> set = map.entrySet();
for (Map.Entry<String, Thing> entry : set) {
++i;
}
}
}
public static void main(String[] args) {
final int ITERS = 10000;
final Thing[] things = new Thing[1000];
for (int i = 0; i < things.length; ++i) {
things[i] = new Thing("thing " + i);
}
final ArrayList<Thing> arrayList = new ArrayList<Thing>();
Collections.addAll(arrayList, things);
final HashMap<String, Thing> hashMap = new HashMap<String, Thing>();
for (Thing thing : things) {
hashMap.put(thing.name, thing);
}
final ArrayIterTest t1 = new ArrayIterTest(arrayList);
final ArraySubscriptTest t2 = new ArraySubscriptTest(arrayList);
final MapIterTest t3 = new MapIterTest(hashMap);
System.out.println("t1 time: " + time(t1, ITERS));
System.out.println("t2 time: " + time(t2, ITERS));
System.out.println("t3 time: " + time(t3, ITERS));
}
private static long time(Runnable runnable, int iters) {
System.gc();
long start = System.nanoTime();
while (iters-- > 0) {
runnable.run();
}
return System.nanoTime() - start;
}
}
And here are the results for a typical run:
t1 time: 41412897
t2 time: 30580187
t3 time: 146536728
Clearly using an ArrayList is a big win (by a factor of 3-4) over a HashMap, at least for my style of iterating through a HashMap. I suspect the reason the array iterator is slower than array subscripting is all the iterator objects that need to be created and then garbage-collected.
For reference, this was done with Java 1.6.0_26 (64-bit JVM) on an Intel 1.6GHz quad-core Windows machine with plenty of free memory.
I'm fairly sure that iterating through the ArrayList will be faster than iterating over the Hashtable. Not sure how significant the difference is, though -- maybe (thumb suck) 2x in the actual internal logic, but that's not much.
But note that, unless you need multithread synchronization, you should use a HashMap rather than a Hashtable. There's some performance to be gained there.
Actually, I looked at the current HashMap implementation (preferred over Hashtable as everyone points out). Iterating over the values looks like it's simply iterating through an underlying array.
So speed will probably be comparable to iterating an ArrayList, though there may be some time spent skipping over gaps in the HashMap's underlying array.
All said, profiling is king.
A) Don't use Hashtable, use HashMap. Hashtable is informally deprecated.
B) That depends on the application. Lookup will be faster in the HashMap; iteration will likely be about the same, as both use arrays internally (though the array backing a HashMap has gaps, which might give a slight advantage to the ArrayList). Oh, and if you want to maintain a fixed iteration order, use LinkedHashMap (insertion order) or TreeMap (natural ordering).
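For the scene use case specifically, here is a minimal sketch of that suggestion (Object3D is the question's own type; the no-arg constructor and names are purely illustrative). A LinkedHashMap gives O(1) expected add/remove/lookup by name plus a stable, insertion-ordered render loop:
Map<String, Object3D> scene = new LinkedHashMap<String, Object3D>();

// add / select / remove by name, all O(1) expected
scene.put("cube01", new Object3D());    // assumes a no-arg constructor, for illustration only
Object3D selected = scene.get("cube01");
scene.remove("cube01");

// render loop: iterates in insertion order
for (Object3D obj : scene.values()) {
    // render(obj);
}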
As already said, it's better to use HashMap. Regarding iteration, in theory an ArrayList should be faster, for two reasons. First, the data structure is simpler, which gives lower access time. Second, an ArrayList can be iterated by index without creating an Iterator object, which, under intense use, produces less garbage and therefore less GC.
In practice you may not notice the difference; it depends on how heavily you use it.
Use java.util.HashMap instead of java.util.Hashtable if you don't need retrieval synchronization.