I was wondering: is the size() method that you can call on an existing ArrayList<T> cached?
Or is it preferable in performance-critical code to store the result of size() in a local int?
I would expect that it is indeed cached when you don't add or remove items between calls to size().
Am I right?
Update
I am not talking about inlining or such things. I just want to know whether the method size() itself caches the value internally, or whether it computes it dynamically every time it is called.
I don't think I'd say it's "cached" as such - but it's just stored in a field, so it's fast enough to call frequently.
The Sun JDK implementation of size() is just:
public int size() {
    return size;
}
Yes.
A quick look at the Java source would tell you the answer.
This is the implementation in the OpenJDK version:
/**
 * Returns the number of elements in this list.
 *
 * @return the number of elements in this list
 */
public int size() {
    return size;
}
So it's as good as a method call is going to get. It's not very likely that HotSpot caches the value returned by this method, so if you're really THAT concerned, you can cache it yourself. Unless your profiling has shown that this is a bottleneck, though (not very likely), you should just concern yourself with readability rather than whether a simple method call that returns the value of a field is cached or not.
I don't know the answer for sure, but my guess would be: no. There is no way for a Java compiler, short of special-casing ArrayList, to know that the methods you invoke are non-mutating and that, as a result, repeated invocations of size() will return the same value. I therefore find it highly unlikely that a Java compiler will factor out repeated calls to size() and store the result in a temporary value. If you need that level of optimization, you should store the value in a local variable yourself. Otherwise, yes, you will pay the invocation overhead associated with calling the size() method. Note, though, that size() is O(1) for an ArrayList (even if the call overhead itself is not free). Personally, I would factor calls to size() out of loops and manually store them in a local where applicable.
Edit
Even though such an optimization cannot be performed by a Java compiler, it has been aptly pointed out that the JIT can inline the implementation of ArrayList.size() so that it costs no more than a field access, with no additional method-call overhead. In effect the cost is negligible, although you might still save slightly by caching the value in a local yourself, which could replace a memory lookup with a read from a CPU register.
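If you do decide to hoist the call manually, the idiom is simple; here is a minimal sketch (the printing is just a stand-in for real work):

import java.util.List;

class SizeHoisting {
    static void printAll(List<String> items) {
        // Hoist size() into a local so the loop condition reads a local int
        // rather than invoking the method on every iteration. This is only
        // safe if the list isn't structurally modified inside the loop.
        for (int i = 0, n = items.size(); i < n; i++) {
            System.out.println(items.get(i));
        }
    }
}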
The obvious implementation of ArrayList would be to store the size internally in a field. I would be very surprised if it ever had to be computed, even after resizing.
Why would it need to be? ArrayList implements a List interface that is backed by an array, after all.
I would assume it simply has a size field that is incremented when you insert items and decremented when you remove them, and size() just returns that.
I haven't looked at more than the API docs now, though.
If caching the result of the size() method would noticeably improve performance (which it sometimes does - I regularly see ArrayList.size() as the top compiled method in my -Xprof output) then consider converting the whole list to an array for an even greater speedup.
Here's one trick that can work if you iterate over the list regularly but only rarely update it:
class FooProcessor {
    private Foo[] fooArray = null;
    private List<Foo> fooList = new ArrayList<Foo>();

    public void addFoo(Foo foo) {
        fooList.add(foo);
        fooArray = null; // invalidate the cached array
    }

    public void processAllFoos() {
        Foo[] foos = getFooArray();
        for (int i = 0; i < foos.length; ++i) {
            process(foos[i]);
        }
    }

    private Foo[] getFooArray() {
        // Lazily rebuild and cache the array form of the list.
        if (fooArray == null) {
            Foo[] tmpArray = new Foo[fooList.size()];
            fooArray = fooList.toArray(tmpArray);
        }
        return fooArray;
    }
}
Related
Let's suppose I have a method:
public int dummy() {
    return 1;
}
and if I call this method by
dummy();
not
int a = dummy();
will it make any difference? why?
No, it wouldn't make any difference. We call plenty of JDK methods while ignoring their return values - List's remove(int index), for example, removes the element at the given index and returns the element that was removed. It is perfectly normal to ignore that return value and move on.
It will still compile, but (assuming that the method is just called in isolation) it'll be pointless, since you can never use the value returned. (At a deeper level, and depending on implementation, it's possible the JVM may optimise away the method call entirely.)
If however you do int a = dummy();, then you can later reference that variable, eg. System.out.println(a); will print out its value.
If your method had another side effect other than returning a value, such as:
public int dummy() {
    System.out.println("hello");
    return 1;
}
...then calling it without assigning its result to a variable wouldn't be pointless, since the side effect would still occur (printing "hello" in this case). While some argue this is poor design, it does occur in practice in the Java libraries - you could call createNewFile() on a File object, for instance, and ignore its returned boolean (which tells you whether the file was created or not).
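For example, a minimal sketch (the file name is illustrative):

import java.io.File;
import java.io.IOException;

class CreateFileDemo {
    public static void main(String[] args) throws IOException {
        File f = new File("example.txt");
        f.createNewFile();                   // boolean result ignored: fine if we don't care
        boolean created = f.createNewFile(); // or capture it: false here, the file already exists
        System.out.println(created);
    }
}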
Both are valid statements, but there is a difference. When you use
dummy()
this calls the function, but since you don't keep the returned value, the call is pointless (assuming the function has no side effects).
When you use
int a = dummy();
you create an int named a that stores the return value of the function, which you can reuse at any time.
The below method is part of a weighted random selection algorithm for picking songs.
I would like to convert the method below to use streams, to decide whether that would be clearer or otherwise preferable. I am not certain it is possible at all, since the calculation is stateful: it depends on the position in the list.
public Song songForTicketNumber(long ticket)
{
    if (ticket < 0) return null;
    long remaining = ticket;
    for (Song s : allSongs) // allSongs is an ordered list
    {
        remaining -= s.numTickets; // numTickets is a long and never negative
        if (remaining < 0)
            return s;
    }
    return null;
}
More formally: if n is the sum of Song::numTickets over all Song objects in allSongs, then for any ticket value from 0 through n-1 the method should return a song from the list. The number of ticket values that select a specific Song x is x.numTickets: the selection criterion for a song is a range of consecutive integers determined by its own numTickets and the numTickets of every song to its left in the list. For example, if the numTickets values are 3, 2, and 5 (so n = 10), tickets 0-2 select the first song, 3-4 the second, and 5-9 the third. As currently written, anything outside the range returns null.
Note: the out-of-range behavior can be modified to accommodate streams (other than returning null).
The efficiency of a Stream compared to a basic for or for-each loop is a matter of circumstance. In yours, it's highly likely that a Stream would be less efficient than your current code for, among others, these major reasons:
1. Your function is stateful, as you mentioned. Maintaining that state with streams probably means finagling some kind of anonymous implementation of a BinaryOperator to use with Stream.reduce, and it's going to turn out bulkier and more confusing to read than your current code.
2. You're short-circuiting in your current loop, and no Stream operation will reflect that kind of efficiency, especially in combination with #1.
3. Your collection is ordered, which means the stream will iterate over elements in a manner very similar to your existing loop anyway. Depending on the size of your collection, you might get some efficiency out of parallelStream, but having to maintain the order in this case will mean a less efficient stream.
The only real benefit you could get from switching to a Stream is in memory consumption (you could keep allSongs out of memory and let the Stream supply elements in a more memory-efficient way), which doesn't seem applicable here.
In conclusion, since the Stream operations would be even more complex to write and would probably be harmful, if anything, to your efficiency, I would recommend that you do not pursue this change.
That being said, I personally can't come up with a Stream based solution to actually answer your question of how to convert this work to a Stream. Again, it would be something complex and strange involving a reducer or similar... (I'll delete this answer if this is insufficient.)
Java streams do have a facility for short-circuit evaluation; see for example the documentation for findFirst(). Having said that, decrementing and checking remaining requires state mutation, which is not great. Not great, but doable:
public Optional<Song> songForTicketNumber(long ticket, Stream<Song> songs) {
    if (ticket < 0) return Optional.empty();
    AtomicLong remaining = new AtomicLong(ticket);
    return songs.filter(song -> decrementAndCheck(song, remaining)).findFirst();
}

private boolean decrementAndCheck(Song song, AtomicLong total) {
    // Subtract and test in a single atomic step.
    return total.addAndGet(-song.numTickets) < 0;
}
As far as I can tell, this buys you little over the plain loop; note that because the filter predicate mutates shared state, the stream must be processed sequentially, so you could not safely switch to a parallel stream.
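Usage would then look something like this (assuming allSongs is a List<Song> as in the question; play() is a hypothetical consumer of the chosen song):

Optional<Song> winner = songForTicketNumber(ticket, allSongs.stream());
winner.ifPresent(song -> play(song));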
I am new to chronicle-map. I am trying to model an off heap map using chronicle-map where the key is a primitive short and the value is a primitive long array. The max size of the long array value is known for a given map. However I will have multiple maps of this kind each of which may have a different max size for the long array value. My question relates to the serialisation/deserialisation of the key and value.
From reading the documentation I understand that for the key I can use the value type ShortValue and reuse the instance of the implementation of that interface. Regarding the value I have found the page talking about DataAccess and SizedReader which gives an example for byte[] but I'm unsure how to adapt this to a long[]. One additional requirement I have is that I need to get and set values at arbitrary indices in the long array without paying the cost of a full serialisation/deserialisation of the entire value each time.
So my question is: how can I model the value type when constructing the map, and what serialisation/deserialisation code do I need for a long[] whose max size is known per map, given that I need to read and write random indices without serialising/deserialising the entire value payload each time? Ideally the long[] would be encoded/decoded directly to/from off-heap, without an intermediate on-heap conversion to a byte[], and the chronicle-map code would not allocate at runtime. Thank you.
First, I recommend using some kind of LongList interface abstraction instead of long[]; it will make it easier to deal with size variability, provide alternative flyweight implementations, etc.
If you want to read/write just single elements in large lists, you should use the advanced contexts API:
/** This method is entirely garbage-free, deserialization-free, and thread-safe. */
void putOneValue(ChronicleMap<ShortValue, LongList> map, ShortValue key, int index,
        long element) {
    if (index < 0) throw new IndexOutOfBoundsException(...);
    try (ExternalMapQueryContext<ShortValue, LongList, ?> c = map.getContext(key)) {
        c.writeLock().lock(); // (1)
        MapEntry<ShortValue, LongList> entry = c.entry();
        if (entry != null) {
            Data<LongList> value = entry.value();
            BytesStore valueBytes = (BytesStore) value.bytes(); // (2)
            long valueBytesOffset = value.offset();
            long valueBytesSize = value.size();
            int valueListSize = (int) (valueBytesSize / Long.BYTES); // (3)
            if (index >= valueListSize) throw new IndexOutOfBoundsException(...);
            valueBytes.writeLong(valueBytesOffset + ((long) index) * Long.BYTES,
                    element);
            ((ChecksumEntry) entry).updateChecksum(); // (4)
        } else {
            // there is no entry for the given key
            throw ...
        }
    }
}
Notes:
1. You must acquire writeLock() from the beginning, because otherwise readLock() is going to be acquired automatically when you call context.entry(), and you won't be able to upgrade the read lock to a write lock later. Please read the HashQueryContext javadoc carefully.
2. Data.bytes() formally returns RandomDataInput, but you can be sure (it's specified in the Data.bytes() javadoc) that it's actually an instance of BytesStore (which is the combination of RandomDataInput and RandomDataOutput).
3. This assumes proper SizedReader and SizedWriter (or DataAccess) are provided. Note that the "bytes/element joint size" technique is used, the same as in the example given in the SizedReader and SizedWriter doc section, PointListSizeMarshaller. You could base your LongListMarshaller on that example class.
4. This cast is specified; see the ChecksumEntry javadoc and the section about checksums in the doc. If you have a purely in-memory (not persisted) Chronicle Map, or have turned checksums off, this call can be omitted.
Implementation of single element read is similar.
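For instance, a sketch under the same assumptions (note that readLock() is acquired automatically by context.entry(), so no explicit locking is needed for a read):

long getOneValue(ChronicleMap<ShortValue, LongList> map, ShortValue key, int index) {
    if (index < 0) throw new IndexOutOfBoundsException("index: " + index);
    try (ExternalMapQueryContext<ShortValue, LongList, ?> c = map.getContext(key)) {
        MapEntry<ShortValue, LongList> entry = c.entry(); // acquires readLock() implicitly
        if (entry == null) throw new IllegalStateException("no entry for the given key");
        Data<LongList> value = entry.value();
        long valueBytesOffset = value.offset();
        int valueListSize = (int) (value.size() / Long.BYTES);
        if (index >= valueListSize) throw new IndexOutOfBoundsException("index: " + index);
        return value.bytes().readLong(valueBytesOffset + ((long) index) * Long.BYTES);
    }
}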
Answering extra questions:
I've implemented a SizedReader+Writer. Do I need DataAccess or is SizedWriter fast enough for primitive arrays? I looked at the ByteArrayDataAccess but it's not clear how to port it for long arrays given that the internal HeapBytesStore is so specific to byte[]/ByteBuffers?
Using DataAccess instead of SizedWriter allows one less copy of the value data on Map.put(key, value). However, if in your use case putOneValue() (as in the example above) is the dominant type of query, it won't make much difference. If Map.put(key, value) (and replace(), etc., i.e. any "full value write" operations) are important, it is still possible to implement DataAccess for LongList. It would look like this:
class LongListDataAccess implements DataAccess<LongList>, Data<LongList>,
        StatefulCopyable<LongListDataAccess> {

    transient BytesStore cachedBytes;
    transient boolean cachedBytesInitialized;
    transient LongList list;

    @Override public Data<LongList> getData(LongList list) {
        this.list = list;
        this.cachedBytesInitialized = false;
        return this;
    }

    @Override public long size() {
        return ((long) list.size()) * Long.BYTES;
    }

    @Override public void writeTo(RandomDataOutput target, long targetOffset) {
        for (int i = 0; i < list.size(); i++) {
            target.writeLong(targetOffset + ((long) i) * Long.BYTES, list.get(i));
        }
    }

    ...
}
For efficiency, the methods size() and writeTo() are key, but it's important to implement all the other methods (not shown here) correctly too. Read the DataAccess, Data and StatefulCopyable javadocs very carefully, and also study "Understanding StatefulCopyable, DataAccess and SizedReader" and "Custom serialization checklist" in the tutorial with great attention.
Does the read/write locking mediate across multiple processes reading and writing on the same machine, or just within a single process?
It's safe across processes; note that the interface is called InterProcessReadWriteUpdateLock.
When storing objects with a variable size not known in advance as values, will that cause fragmentation off-heap and in the persisted file?
Storing a value for a key once, and not changing the size of the value (nor removing keys) after that, won't cause external fragmentation. Changing the size of a value or removing keys could cause external fragmentation. The ChronicleMapBuilder.actualChunkSize() configuration lets you trade off external against internal fragmentation: the bigger the chunk, the less external fragmentation but the more internal fragmentation. If your values are significantly bigger than the page size (4 KB), you could set an absurdly big chunk size and still have internal fragmentation bounded by the page size, because Chronicle Map is able to exploit the lazy page allocation feature in Linux.
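For illustration, a sketch of where this configuration sits in the builder chain (assuming a map keyed by ShortValue with LongList values as above; the numbers are made up, and the custom value marshaller setup is omitted):

ChronicleMap<ShortValue, LongList> map = ChronicleMap
        .of(ShortValue.class, LongList.class)
        .entries(1_000)                     // illustrative entry count
        .averageValueSize(100 * Long.BYTES) // illustrative average value size, in bytes
        .actualChunkSize(4 << 10)           // chunk size comparable to a 4 KB page
        .create();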
I'm trying to create a "Limited List" in Java. It should remove old entries if I add new entries.
e.g. if the list size is 3 and I add the 4th item, it should remove the 1st item. Currently I solve this using remove(0) on an ArrayList, but I heard ArrayLists are very slow.
Is there a faster way to solve this? My current code is:
public class LimitedList<T> extends ArrayList<T> {
    private int maximum;

    public LimitedList(int maximum) {
        this.maximum = maximum;
    }

    @Override
    public boolean add(T t) {
        boolean r = super.add(t);
        while (size() > maximum) {
            remove(0);
        }
        return r;
    }
}
but I heard ArrayLists are very slow.
Some operations are slow for ArrayLists, others for other collections. This is because an ArrayList uses an array behind the curtains, and a remove operation at the head has to shift all the remaining elements one position to the left. Therefore, in big-O terms, removing from the head is O(n) for an ArrayList, where it is O(1) for a LinkedList.
If you only want to add items in the tail of the collection and remove elements in the head, I propose you use a LinkedList:
public class LimitedList<T> extends LinkedList<T> {
    private int maximum;

    public LimitedList(int maximum) {
        this.maximum = maximum;
    }

    @Override
    public boolean add(T t) {
        boolean r = super.add(t);
        int n = this.size();
        while (n > maximum) {
            this.removeFirst();
            n--;
        }
        return r;
    }
}
An important note from @JBNizet is that you should not inherit from ArrayList or LinkedList directly, but rather implement Collection<T>, something like:
public class LimitedList<T> implements Collection<T> {
    private final LinkedList<T> list;
    private int maximum;

    public LimitedList(int maximum) {
        this.list = new LinkedList<T>();
        this.maximum = maximum;
    }

    @Override
    public boolean add(T t) {
        boolean r = this.list.add(t);
        int n = this.list.size();
        while (n > maximum) {
            this.list.removeFirst();
            n--;
        }
        return r;
    }

    // implement other Collection methods...
}
Please: don't put much stock in "what you hear". Programming is about hard facts, not hearsay. You can be very sure that all the collection implementations were written by experts in the field; so the first thing to do is check the documentation and assess whether the operations you need really carry a significant performance cost for you.
Yes, collection operations have different cost, but all of that is documented.
Then: if you are really only talking about 3 or 4 elements ... do you really, really think that performance matters? That would only be the case if you were using these lists (hundreds of) thousands of times per minute or so. Keep in mind what a modern CPU can do in a few microseconds nowadays, and how many method calls you would need before "call B is 5 nanoseconds slower" becomes noticeable.
In other words: strive for good, clean (SOLID-based) designs instead of worrying about potential performance problems. That will pay off ten times more than spending hours on topics like this.
You see: one should be really careful about "performance issues". Because you only realize that you have one ... when there are complaints from your users. And if that happens, you don't start blindly trying to improve this or that; no, you first do profiling to measure where your problems are coming from. And then you fix those things that need improvement.
EDIT: your last comment indicates that you did some sort of measurement (hint: if you had said so directly, instead of writing "I heard that lists are slow", you would probably have gotten different answers already). OK, let's assume that you did good profiling (where, even when one collection type performs slower for you, the question is still: will it be called often enough to cause trouble). Anyway: what you really want to do is understand the access patterns in your code. Do you need random access? Do you iterate over lists? You see, typically a container that forgets elements sounds more like a cache than a list. In that sense: do you rely on quick in-sequence retrieval of objects, or how is the data accessed? Those are the questions you have to ask yourself; and only then can you decide whether you should implement your own special limited list, or whether there are existing components (maybe from Apache Commons or Guava) that give you what you need.
I think you should use a Queue, because this kind of collection uses a FIFO (First In, First Out) policy. The first element you inserted will be the first one out, and you can manage the queue simply using enqueue and dequeue operations (offer and poll in Java's Queue interface), both of which run in O(1). A sketch of that idea is shown below.
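For instance, a minimal sketch using ArrayDeque, an array-backed Deque with O(1) operations at both ends (the class name is illustrative):

import java.util.ArrayDeque;
import java.util.Deque;

class LimitedQueue<T> {
    private final Deque<T> deque = new ArrayDeque<>();
    private final int maximum;

    LimitedQueue(int maximum) {
        this.maximum = maximum;
    }

    void add(T t) {
        deque.addLast(t);        // enqueue at the tail, O(1)
        while (deque.size() > maximum) {
            deque.removeFirst(); // dequeue from the head, O(1)
        }
    }
}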
If you are fine with writing code, the best option would be not to extend any existing collection but to make your own. For a list, you can implement the List interface (the recommended approach is to extend the AbstractList abstract class).
This will give you more control and you can design for speed.
The fastest option would be to drop the collection and simply use an array (wrapping the operations in a class). However, that may not suit you, as it will not be part of the collection hierarchy.
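A minimal sketch of that array-wrapping idea as a ring buffer, where the oldest element is overwritten once capacity is reached (names are illustrative, and it deliberately doesn't implement Collection):

class RingBuffer<T> {
    private final Object[] buffer;
    private int head;  // index of the oldest element
    private int count; // number of stored elements

    RingBuffer(int capacity) {
        buffer = new Object[capacity];
    }

    void add(T t) {
        int tail = (head + count) % buffer.length;
        buffer[tail] = t;
        if (count < buffer.length) {
            count++;
        } else {
            head = (head + 1) % buffer.length; // overwrite the oldest element
        }
    }

    @SuppressWarnings("unchecked")
    T get(int i) { // i = 0 returns the oldest element
        return (T) buffer[(head + i) % buffer.length];
    }
}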
For example given the following methods:
public double[] getCoord() {
    return coord;
}

public double getCoord(int variable) {
    return coord[variable];
}
Would it be better to call
object.getCoord()[1]
or
object.getCoord(1)
and why?
Although there is no performance difference, the second method presents a far superior API, because Java arrays are always mutable. The first API lets your users write
object.getCoord()[1] = 12.345;
and modify the internals of your object behind your back. This is never a good thing: even a non-malicious user could do things you never intended, simply by mistake.
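If you do need to expose the whole array, a common defensive pattern is to return a copy, e.g.:

public double[] getCoord() {
    return coord.clone(); // callers receive a copy; writes to it can't touch the internals
}

Since double[] holds primitives, clone() here yields a fully independent copy.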
In terms of performance, it doesn't matter. The first method returns a reference to the array, not a copy.
That said, the second method protects the array from being modified outside the class.
No, Java doesn't read the whole array when you use the subscript operator ([]). As to whether it's better to grab the array via the accessor first and then index into it, versus calling a method that does the indexing for you: the difference is probably negligible. You're still incurring the (minimal) overhead of invoking a method and returning a result either way.
I am going to guess that #2 is marginally slower because a parameter has to be pushed onto the stack prior to the call to getCoord(int). Not much in it though.
Neither has to read the whole array.
Both are slower than direct array access, for example coord[1].