How does the RandomAccess interface work internally?

As per my understanding, the RandomAccess interface provides the ability to retrieve data in constant time. How is this functionality achieved? Is there any hashing technique involved?
Thanks in advance

Your understanding is wrong. RandomAccess may provide a promise but it doesn’t provide any ability. Providing the ability is up to the implementing class. Actually RandomAccess doesn’t “work internally” in any sense of the expression, it has no functionality whatsoever.
In the documentation it is called an indication, not a promise:
Marker interface used by List implementations to indicate that they
support fast (generally constant time) random access.
The classes implementing RandomAccess generally provide constant time access by storing their contents in an array. Array lookup takes constant time. (According to the documentation, classes implementing the interface include ArrayList, AttributeList, CopyOnWriteArrayList, RoleList, RoleUnresolvedList, Stack and Vector. Surely some third-party classes implement it too.)
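As a minimal sketch of what "marker" means here, the check below only asks whether a list's class carries the tag. RandomAccessCheck and isRandomAccess are hypothetical names chosen for this illustration; the JDK types are real.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

final class RandomAccessCheck {
    // Returns true when the list's class declares the RandomAccess marker.
    static boolean isRandomAccess(List<?> list) {
        return list instanceof RandomAccess;
    }

    public static void main(String[] args) {
        System.out.println(isRandomAccess(new ArrayList<String>()));  // true
        System.out.println(isRandomAccess(new LinkedList<String>())); // false
        // The interface itself declares no methods at all.
        System.out.println(RandomAccess.class.getDeclaredMethods().length); // 0
    }
}
```

Nothing in this check makes a list fast; it only reports what the implementing class chose to declare.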
Edit: Fast access in arrays relies on the hardware. The computer can access any memory cell (storage cell) in constant time. Array elements are laid out in consecutive memory cells. So if you know that an array is laid out from address 54321 and you need to access the array element at index 35, you add those two numbers and know the element is at memory address 54356 (I am ignoring the slight complication that the word size is often 8 bytes while addresses have a granularity of 1 byte, so in practice the index is scaled by the element size before it is added). The principle is the same whether the JVM is interpreting bytecode or the method has been compiled to native machine code and thus bypasses the interpreter. Your search engine should lead to a better explanation than the one I am giving here.
It’s quite common for interfaces not to have any internal workings and to leave those to the classes implementing the interface (as has been said, since Java 8 and 9 the situation has become more blurred, but in this respect RandomAccess still keeps the classical design). It’s the job of an interface to specify a behaviour and that of the implementing class or classes to provide it.
Edit:
From the documentation, "The primary purpose of this interface is to
allow generic algorithms to alter their behavior to provide good
performance when applied to either random or sequential access lists".
Why it is mentioned as algorithms here?
“Generic algorithms” refers to algorithms that you and I write that work with lists that may or may not have fast element access (may or may not implement RandomAccess). So you may write for example:
public void yourMethod(List<String> param) {
    if (param instanceof RandomAccess) {
        // do your work relying on fast element access
    } else {
        // do your work in some other way that doesn’t depend on fast element access
        // for example by iterating sequentially through the list
        // or copying the elements to a faster list implementation first.
    }
}
So no, it has nothing to do with different JVM implementations.
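To make that pattern concrete, here is a small sketch of a generic algorithm that switches strategy on the marker. AdaptiveSum and sum are hypothetical names invented for this example, not part of any library.

```java
import java.util.List;
import java.util.RandomAccess;

final class AdaptiveSum {
    // Hypothetical helper: sums a list, choosing the traversal by the marker.
    static long sum(List<Integer> values) {
        long total = 0;
        if (values instanceof RandomAccess) {
            // Indexed loop: cheap when get(i) takes constant time.
            for (int i = 0; i < values.size(); i++) {
                total += values.get(i);
            }
        } else {
            // Iterator: avoids walking a linked list from the start for every get(i).
            for (int value : values) {
                total += value;
            }
        }
        return total;
    }
}
```

Both branches return the same result; only the cost differs, which is the whole point of the marker.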

RandomAccess is a marker interface. It is an interface with no methods. Apart from that, an interface generally does not have any functionality in it (see note 1). It is up to the implementing classes to provide random access functionality.
Have a look at
Why does ArrayList implement RandomAccess Interface?
What is the use of marker interfaces in Java?
Note 1: Yes, an interface can have default methods that contain some logic. But this was only added in Java 8, to allow adding new methods to an interface without breaking the implementing classes. An interface is only for providing or declaring the contract that the implementing class has to fulfil.
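The difference between a marker interface and an interface with a default method can be sketched as follows. All names here (MarkerVsDefault, Fast, Greeter, English) are made up for the illustration.

```java
final class MarkerVsDefault {
    // A marker interface: no methods, it only tags the implementing class.
    interface Fast { }

    // An interface with a Java 8 default method that carries some logic.
    interface Greeter {
        String name();

        default String greet() {
            return "Hello, " + name();
        }
    }

    // The class supplies name(); greet() is inherited from the interface.
    static final class English implements Greeter, Fast {
        @Override
        public String name() {
            return "world";
        }
    }
}
```

Implementing Fast adds no behaviour to English at all; it merely lets other code ask `instanceof Fast`.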

Related

Why is there no "List.reverse()" method in Java?

In Java, to reverse elements in a List, I need to use:
Collections.reverse(list)
I was just wondering why Java doesn't implement the reverse method within the List interface so that I could do the in-place reverse like this:
list.reverse()
Does anyone have any ideas about this?

Why is there no List.reverse() method in Java?
Because there is a Collections.reverse(List) method instead.
Because the API designers figured it was a bad idea to force every List implementation (1) to implement a method that wasn't used 99.9% of the time (2). This could be addressed by making the method "optional", but that has downsides too; e.g. runtime exceptions.
Because for some kinds of list (stream wrappers / adapters for example) implementing in-place reverse would be problematic. It changes the memory usage characteristics of the list by requiring it to be reified.
Also note that the generic implementation (source code) of reverse() that is provided by Collections uses set to swap elements. It is close to optimal for the standard list types.
@shmosel comments:
I assume OP is asking why it wasn't added as a default method, as List.sort() was.
Good point. Possibly the 99.9% argument applies. Bear in mind that this would only help people with a codebase that is built using a Java 8 or later compiler, etc.
1 - This includes implementations in your codebase and 3rd-party libraries.
2 - 86% of statistics are made up for theatrical effect :-)

For the same reason that fill and rotate and shuffle and swap and infinitely more possible list functions aren't declared in the List interface. They're not part of the "list" abstraction; rather, they can be implemented on top of that abstraction.
Once a List implements the methods already in the List interface, a reverse function can be written on top of the List abstraction without any knowledge of a particular List implementation. Therefore, it would be pointless to force every class implementing List to provide a custom implementation of reverse (and fill, rotate, shuffle, swap, etc.).
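As a rough sketch of that argument, a reverse can be written against the List interface alone, using only size, get and set. ListOps.reverse below is a hypothetical stand-in for illustration, not the JDK's Collections.reverse.

```java
import java.util.List;

final class ListOps {
    // Works for any mutable List: it relies only on size(), get() and set().
    static <T> void reverse(List<T> list) {
        for (int i = 0, j = list.size() - 1; i < j; i++, j--) {
            T tmp = list.get(i);
            list.set(i, list.get(j));
            list.set(j, tmp);
        }
    }
}
```

The same method works on an ArrayList or a LinkedList, although on a LinkedList each get and set has to walk the nodes, so it would be slower there.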

Note: This question is a very specific case of "Why does the Collections class contain standalone (static) methods, instead of them being added to the List interface?" - one could even consider it a duplicate. Beyond that, arguing about the reasoning behind the decision for each individual method is reading tea leaves, and nobody can tell "the reason" for the design decision for the particular case of the reverse method (until, maybe, Josh Bloch posts an answer here). Interestingly, this is a point that is not covered in the Java Collections API Design FAQ...
Some of the other answers seem convincing at the first glance, but raise other questions. Particularly, some of them don't give a reason for the design decision at all. Even if there are other ways to emulate the behavior of a certain method, or when a method is not used "99.9% of all time", it can still make sense to include it in the interface.
Looking at the List interface, you will notice that you can basically implement all methods based on two others:
T get(int index)
int size()
(For a mutable list, you also need set). These are exactly the ones that are still abstract in AbstractList. So all other methods are rather "convenience" methods that can be implemented canonically, based on these two methods. In this regard, I think that the answer by Sam Estep contains an important point: one could argue for adding dozens of other methods, and there would certainly be good reasons to do so. Having a look at the actual implementation of Collections#reverse(List):
public static void reverse(List<?> list) {
    int size = list.size();
    if (size < REVERSE_THRESHOLD || list instanceof RandomAccess) {
        for (int i=0, mid=size>>1, j=size-1; i<mid; i++, j--)
            swap(list, i, j);
    } else {
        ListIterator fwd = list.listIterator();
        ListIterator rev = list.listIterator(size);
        for (int i=0, mid=list.size()>>1; i<mid; i++) {
            Object tmp = fwd.next();
            fwd.set(rev.previous());
            rev.set(tmp);
        }
    }
}
What is this REVERSE_THRESHOLD and RandomAccess thing there? Seriously, if I felt the necessity to introduce a tagging interface like RandomAccess, I would strongly question my design. Whenever you have a method like
void doSomethingWith(Type x) {
    if (x instanceof Special) doSomethingSpecial((Special)x);
    else doSomethingNormal(x);
}
then this is a strong sign that this should actually be a polymorphic method, which should be implemented accordingly for the Special type.
So yes, it would have been justified to pull the reverse method into the interface, to allow a polymorphic implementation. The same applies to fill, rotate, shuffle, swap, sort and others. Similarly, one could have introduced a static method like
Collections.containsAll(containing, others);
that offers what is now done with the Collection#containsAll method. But in general: The designers chose a particular set of methods that they found suitable. One of the reasonings behind leaving out certain methods may be given by one of the bottom lines of the talk about "How to Design a Good API & Why it Matters" by Joshua Bloch, one of the core designers of the Java Collections API:
When in doubt, leave it out
Interestingly, of all the methods for which a polymorphic implementation (via a method in the List interface) could have been reasonable, one actually found its way into the interface, using a Java 8 default method: List#sort(). Maybe others, like reverse, will be added later...
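The earlier observation that get and size are enough for a read-only list can be illustrated with a small sketch. RangeList is a hypothetical class written for this example; AbstractList is the real JDK base class.

```java
import java.util.AbstractList;

// Hypothetical read-only list holding the integers from, from + 1, ..., from + count - 1.
final class RangeList extends AbstractList<Integer> {
    private final int from;
    private final int count;

    RangeList(int from, int count) {
        this.from = from;
        this.count = count;
    }

    // One of the two abstract methods: element lookup by index.
    @Override
    public Integer get(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return from + index;
    }

    // The other abstract method: the number of elements.
    @Override
    public int size() {
        return count;
    }
}
```

With only these two methods, contains, indexOf, iterator, equals and toString are all inherited. Because set is not overridden, passing a RangeList with two or more elements to Collections.reverse would fail with an UnsupportedOperationException.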

Because Collections is a utility class, which follows one of the SOLID principles: S, the Single Responsibility Principle.
This principle states that if a class has two reasons to change, we have to split its functionality into two classes.
You have a class that plays one role, and if you need to manipulate its inner data you create a helper class that plays another role.

If you need list.reverse(), you can use Eclipse Collections, where you can just call list.reverseThis(). In the JDK List, a lot of methods (like max and min; sort was only added as a default method in Java 8) have not been added.
These are two different approaches to API design:
A lot of methods in the collection itself -> rich collection -> Eclipse Collections. Drawback: a lot of rarely used methods in List.
Only the most used methods plus a utility class -> JDK collections. Drawback: you need to use a utility class like Collections.

Reverse is defined in Collections (with an extra "s"). That class is not part of the collection hierarchy; rather, it is a utility class that can be used with different Lists.
Reversing a list is not a key part of defining a list, so it is kept out of the interface and provided separately. If it were defined in the interface, every implementation would have to provide it, which may not be suitable for all of them.
The makers of the collections framework could have built this into the List hierarchy as well (since most list implementations have an abstract class in between, they could have put it in one of those abstract classes). However, to simplify everyone's life it makes sense to keep it in a single utility class, so that we don't have to figure out which class to look in for collection-related utility functions.

Whether or not to code to an interface when only certain implementations provide correct behavior

So, I know that coding to an interface (using an interface as a variable's declared type instead of its concrete type) is a good practice in OO code, for a bunch of reasons. This is seen a lot, for example, with Java collections. Well, is referring to an interface in your program still a good thing to do when only certain implementations of that interface provide correct behavior?
For example, I have a Java program. In that program, I have multiple sets of objects. I chose to use a Set, because I didn't want duplicate elements. However, I wanted a list's ordering property (i.e. maintain insertion order). Therefore, I am using a LinkedHashSet as the concrete Set type. One thing these sets are used for is computing a dot product involving the primitive fields of the objects contained in the sets, such as in (simplifying a bit):
double dot(LinkedHashSet<E> set, double[] array) {
    double sum = 0.0;
    int i = 0;
    for (E element : set) {
        // Advance i so each element is paired with its own array entry.
        sum += element.getValue() * array[i++];
    }
    return sum;
}
This method's result is dependent on the set's iteration order, and so certain Set implementations, mainly HashSet, will give incorrect/unexpected results. Currently, I am using LinkedHashSet throughout my program as the declared type, instead of Set, to ensure correct behavior. However, that feels bad stylistically. What's the right thing to do here? Is it okay to use the concrete type in this case? Or maybe should I use Set as the type, but then state in the documentation which implementations will/won't produce correct behavior? I'm looking more for general input than anything specific to the scenario above. In particular, this should apply to really any scenario where you're using the ordering properties of a LinkedHashSet or TreeSet. How do you prevent unintended implementations from being used? Do you force it in the code (by ditching the interface), or do you specify it in the documentation? Or perhaps some other approach?

It is true that you should code to interfaces, but only if the assurances they make fit your needs. In your case, if you would only use Set then you are saying: I don't want duplicates, but I don't care about the order. You could also use a List and mean: I care about insertion order, but not about duplicates. There even is a SortedSet but it does not have the ordering you want. So in your case you can't replace LinkedHashSet by one of its interfaces without violating the Liskov substitution principle.
So I would argue that in your case you should stick to the implementation until you really need to switch to another one. With modern IDEs refactoring is not that hard anymore, so I would refrain from any premature generalization -- YAGNI and KISS.
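A small sketch of what sticking to the implementation looks like for the dot product. It assumes, for brevity, that the set holds plain Double values rather than the question's own element type; DotProduct is a hypothetical name.

```java
import java.util.LinkedHashSet;

final class DotProduct {
    // Declaring LinkedHashSet makes the insertion-order requirement part of the signature,
    // so a plain HashSet cannot be passed in by accident.
    static double dot(LinkedHashSet<Double> values, double[] weights) {
        double sum = 0.0;
        int i = 0;
        for (double value : values) {
            sum += value * weights[i++];
        }
        return sum;
    }

    public static void main(String[] args) {
        LinkedHashSet<Double> values = new LinkedHashSet<>();
        values.add(3.0);
        values.add(1.0);
        values.add(2.0);
        // Iteration follows insertion order: 3*1 + 1*10 + 2*100
        System.out.println(dot(values, new double[] {1.0, 10.0, 100.0})); // 213.0
    }
}
```

A call such as `dot(new HashSet<>(values), weights)` would not compile, which is the compile-time guard the answer argues for.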

Very very great question. One solution is: Make another interface! Say one that extends SortedMap but has a getInsertionOrderIterator() method or an interface that extends Map & has getOrderIterator() & getInsertionOrderIterator() methods.
You can write a quick adapter class that contains a LinkedHashMap & TreeMap as the backend data structures.

You can make arguments for either way. As long as you and others maintaining this code know that particular implementations of Set might break the rest of the app or library, then coding to the interface is fine. However, if that is not true, then you should use the specific implementation.
The purpose of coding to an interface is to give you flexibility that will not break your app. Take JDBC for instance. If you use the wrong driver it will break your program similar to how you are describing here. However, if let's say Oracle decided to put behavior in their JDBC driver that subtly broke code written to the JDBC spec instead of the specific Oracle driver code then you'd have to choose.
There is no cut and dry, "this is always right" type of answer.

Is a Collection better than a LinkedList?

Collection list = new LinkedList(); // Good?
LinkedList list = new LinkedList(); // Bad?
First variant gives more flexibility, but is that all? Are there any other reasons to prefer it? What about performance?

These are design decisions, and one size usually doesn't fit all. Also the choice of what is used internally for the member variable can (and usually should be) different from what is exposed to the outside world.
At its heart, Java's collections framework does not provide a complete set of interfaces that describe the performance characteristics without exposing the implementation details. The one interface that describes performance, RandomAccess is a marker interface, and doesn't even extend Collection or re-expose the get(index) API. So I don't think there is a good answer.
As a rule of thumb, I keep the type as unspecific as possible until I recognize (and document) some characteristic that is important. For example, as soon as I want methods to know that insertion order is retained, I would change from Collection to List, and document why that restriction is important. Similarly, move from List to LinkedList if say efficient removal from front becomes important.
When it comes to exposing the collection in public APIs, I always try to start exposing just the few APIs that are expected to get used; for example add(...) and iterator().

Collection list = new LinkedList(); //bad
This is bad because you don't want this reference to refer to, say, a HashSet (HashSet also implements Collection, and so do many other classes in the collections framework).
LinkedList list = new LinkedList(); //bad?
This is bad because good practice is to always code to the interface.
List list = new LinkedList(); //good
This is good because point 2 says so (always program to an interface).

Use the most specific type information on non-public objects. They are implementation details, and we want our implementation details as specific and precise as possible.

Sure. If, for example, Java adds a more efficient List implementation, but your API accepts only LinkedList, you won't be able to switch implementations once clients depend on this API. If you use the interface, you can easily replace the implementation without breaking the API.

They're absolutely equivalent. The only reason to use one over the other is that if you later want to use a function of list that only exists in the class LinkedList, you need to use the second.

My general rule is to only be as specific as you need to be at the time (or will need to be in the near future, within reason). Granted, this is somewhat subjective.
In your example I would usually declare it as a List, just because the methods available on Collection aren't very powerful, and the distinction between a List and another collection type (Map, Set, etc.) is often logically significant.
Also, in Java 1.5+ don't use raw types -- if you don't know the type that your list will contain, at least use List<?>.
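To show what the wildcard buys over a raw type, here is a tiny sketch; WildcardDemo and countNonNull are hypothetical names used only for this illustration.

```java
import java.util.List;

final class WildcardDemo {
    // List<?> accepts a list of any element type, but elements can only be read as Object.
    static int countNonNull(List<?> items) {
        int count = 0;
        for (Object item : items) {
            if (item != null) {
                count++;
            }
        }
        return count;
    }
}
```

Unlike a raw List, a List<?> parameter keeps the compiler's checks: an attempt to call `items.add("x")` inside the method is rejected at compile time.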

java.util package - classes vs interfaces

Why is Queue an interface, but others like Stack and ArrayList are classes?
I understand that interfaces are made so that clients can implement them and add on their own methods, whereas with classes if every client needs their methods in there it will become huge and bloated..
...or am I missing something here?

A Queue can be implemented in a number of fashions, as can a List or a Set. They all merely specify a contract for different kinds of collections.
An ArrayList, however, is a particular implementation of a List, made to internally use an array for storing elements. LinkedList is also an implementation of a List, which uses a series of interconnected nodes, i.e. a doubly linked list. Similarly, TreeSet and HashMap are particular implementations of sets and maps, respectively.
Now, Stack is an odd case here, particularly because it is a legacy class from older versions of Java. You really shouldn't use a Stack anymore; instead, you should use its modern equivalent, the ArrayDeque. ArrayDeque is an implementation of a Deque (a double-ended queue) that internally uses an array for storage (which is just what Stack does). A Deque supports all of the operations of a Stack, like pop, push, etc. Other implementations of Deque include LinkedList, as mentioned by someone else, although this deviates from Stack in that the underlying structure is not an array, but a doubly-linked list :-p
Now, there are plenty of implementations of Queue, and many different types of Queues. You not only have BlockingQueues (often used for producer-consumer), whose common implementations include LinkedBlockingQueue and ArrayBlockingQueue, but also TransferQueues, and so on. I digress... you can read more on the collections API in the relevant Java Tutorial.
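For the Stack replacement mentioned above, a minimal sketch of a Deque used as a LIFO stack might look like this. StackDemo and its reverse method are hypothetical; push, pop and isEmpty are the real Deque methods.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class StackDemo {
    // Reverses a string by pushing every character and popping them back off.
    static String reverse(String text) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : text.toCharArray()) {
            stack.push(c);
        }
        StringBuilder out = new StringBuilder();
        while (!stack.isEmpty()) {
            out.append(stack.pop());
        }
        return out.toString();
    }
}
```

Because the variable is typed as Deque, swapping in `new LinkedList<>()` would require changing only the constructor call.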

You get the idea of interfaces correctly. In this case Java standard library already provides both implementations and interfaces.
You are better off using an interface so you can switch the implementation at any time.
Hope it makes sense.

I think Stack is well-renowned for being a class that should be an interface. The Java libraries are a bit hit-and-miss when it comes to correctly choosing to provide an interface.
ArrayList is just an implementation of the List interface, so Sun got it correct there! Another classic miss (in my opinion) is the Observable class, which very much needs to be the default implementation of an interface, rather than just a class.

Interesting question. My thought is that Queue is the basis for a lot of data structures like BlockingQueue, PriorityQueue, Deque, etc. These types need specific implementations for various operations, so it's much simpler to make Queue an interface.

The reason interfaces are used for List and Queue is NOT to reduce excessive code.
The main advantage of interfaces is they allow you to write flexible, loosely coupled code.
(Here's an awesome answer that describes this concept perfectly)
An interface simply defines a list of methods that will be implemented by a class.
This allows us to do a wonderfully powerful thing:
We can treat all classes that implement an interface the same.
This is a HUGE advantage.
Here's a very simple example:
We want to write a debug method that prints every element in a Collection.
Collection is an interface. It defines a list of operations and does not implement them.
You cannot instantiate a Collection. You can instantiate a class that implements Collection.
There are many classes that implement Collection: ArrayList, Vector, TreeSet, LinkedList, etc... They all have different snazzy features, but they also have certain things in common: Because each class implements Collection, they all implement each method found here.
This allows us to do a very powerful thing:
We can write a method that operates on ANY class that implements Collection.
It would look just like this:
public void printCollection(Collection someCollection) {
    for (Object o : someCollection) {
        String s = (o == null) ? "null" : o.toString();
        System.out.println(s);
    }
}
Because of the magic of interfaces, we can now do the following:
public void testStuff() {
    Collection s = new TreeSet();
    Collection a = new ArrayList();
    Collection v = new Vector();
    s.add("I am a set");
    a.add("I am an array list");
    v.add("I am a vector");
    printCollection(s);
    printCollection(a);
    printCollection(v);
}

Java: ArrayList for List, HashMap for Map, and HashSet for Set?

I usually always find it sufficient to use the concrete classes for the interfaces listed in the title. Usually when I use other types (such as LinkedList or TreeSet), the reason is for functionality and not performance - for example, a LinkedList for a queue.
I do sometimes construct an ArrayList with an initial capacity larger than the default of 10 and a HashMap with more than the default 16 buckets, but I usually (especially for business CRUD) never find myself thinking "hmmm...should I use a LinkedList instead of an ArrayList if I am just going to insert and iterate through the whole List?"
I am just wondering what everyone else here uses (and why) and what type of applications they develop.

Those are definitely my default, although often a LinkedList would in fact be the better choice for lists, as the vast majority of lists seem to just be iterated in order, or get converted to an array (via toArray) anyway.
But in terms of keeping consistent maintainable code, it makes sense to standardize on those and use alternatives for a reason, that way when someone reads the code and sees an alternative, they immediately start thinking that the code is doing something special.
I always type the parameters and variables as Collection, Map and List unless I have a special reason to refer to the sub type, that way switching is one line of code when you need it.
I could see explicitly requiring an ArrayList sometimes if you need the random access, but in practice that really doesn't happen.

For some kind of lists (e.g. listeners) it makes sense to use a CopyOnWriteArrayList instead of a normal ArrayList. For almost everything else the basic implementations you mentioned are sufficient.

Yep, I use those as defaults. I generally have a rule that on public class methods, I always return the interface type (ie. Map, Set, List, etc.), since other classes (usually) don't need to know what the specific concrete class is. Inside class methods, I'll use the concrete type only if I need access to any extra methods it may have (or if it makes understanding the code easier), otherwise the interface is used.
It's good to be pretty flexible with any rules you do use, though, as a dependency on concrete class visibility is something that can change over time (especially as your code gets more complex).

Indeed, always use the base interfaces Collection, List and Map instead of their implementations. To make things even more flexible you could hide your implementations behind static factory methods, which allow you to switch to a different implementation in case you find something better (I doubt there will be big changes in this field, but you never know). Another benefit is that the syntax is shorter thanks to generics.
Map<String, LongObjectClasName> map = CollectionUtils.newMap();
instead of
Map<String, LongObjectClasName> map = new HashMap<String, LongObjectClasName>();
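import java.util.ArrayList;
import java.util.List;
import java.util.Vector;
import java.util.concurrent.CopyOnWriteArrayList;

public class CollectionUtils {
    .....
    // The factories are static so that callers can write CollectionUtils.newList() without an instance.
    public static <T> List<T> newList() {
        return new ArrayList<T>();
    }
    public static <T> List<T> newList(int initialCapacity) {
        return new ArrayList<T>(initialCapacity);
    }
    public static <T> List<T> newSynchronizedList() {
        return new Vector<T>();
    }
    public static <T> List<T> newConcurrentList() {
        return new CopyOnWriteArrayList<T>();
    }
    public static <T> List<T> newSynchronizedList(int initialCapacity) {
        return new Vector<T>(initialCapacity);
    }
    ...
}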

Having just come out of a class about data structure performance, I'll usually look at the kind of algorithm I'm developing or the purpose of the structure before I choose an implementation.
For example, if I'm building a list that has a lot of random accesses into it, I'll use an ArrayList because its random access performance is good, but if I'm inserting things into the list a lot, I might choose a LinkedList instead. (I know modern implementations remove a lot of performance barriers, but this was the first example that came to mind.)
You might want to look at some of the Wikipedia pages for data structures (especially those dealing with sorting algorithms, where performance is especially important) for more information about performance, and the article about Big O notation for a general discussion of measuring the performance of various functions on data structures.

I don't really have a "default", though I suppose I use the implementations listed in the question more often than not. I think about what would be appropriate for whatever particular problem I'm working on, and use it. I don't just blindly default to using ArrayList, I put in 30 seconds of thought along the lines of "well, I'm going to be doing a lot of iterating and removing elements in the middle of this list so I should use a LinkedList".
And I almost always use the interface type for my reference, rather than the implementation. Remember that List is not the only interface that LinkedList implements. I see this a lot:
LinkedList<Item> queue = new LinkedList<Item>();
when what the programmer meant was:
Queue<Item> queue = new LinkedList<Item>();
I also use the Iterable interface a fair amount.

If you are using LinkedList for a queue, you might consider using the Deque interface and ArrayDeque implementing class (introduced in Java 6) instead. To quote the Javadoc for ArrayDeque:
This class is likely to be faster than
Stack when used as a stack, and faster
than LinkedList when used as a queue.
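As a sketch of the queue case, the helper below is written against the Queue interface, so it accepts an ArrayDeque or a LinkedList alike. QueueDemo and drain are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

final class QueueDemo {
    // Removes every element from the queue in FIFO order and returns them as a list.
    static List<String> drain(Queue<String> queue) {
        List<String> drained = new ArrayList<>();
        String next;
        // poll() returns null once the queue is empty.
        while ((next = queue.poll()) != null) {
            drained.add(next);
        }
        return drained;
    }
}
```

A caller could start with `Queue<String> queue = new ArrayDeque<>();` and later switch to another implementation without touching drain.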

I tend to use one of the *Queue classes for queues. However, LinkedList is a good choice if you don't need thread safety.

Using the interface type (List, Map) instead of the implementation type (ArrayList, HashMap) is irrelevant within methods - it's mainly important in public APIs, i.e. method signatures (and "public" doesn't necessarily mean "intended to be published outside your team").
When a method takes an ArrayList as a parameter, and you have something else, you're screwed and have to copy your data pointlessly. If the parameter type is List, callers are much more flexible and can, e.g. use Collections.EMPTY_LIST or Collections.singletonList().
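A short sketch of that flexibility: because the hypothetical joinAll below takes a List, callers can pass whatever list they already have, including the immutable helpers from Collections.

```java
import java.util.List;

final class Joiner {
    // Accepting the interface means callers are not forced to build an ArrayList first.
    static String joinAll(List<String> parts) {
        return String.join(",", parts);
    }
}
```

Calls such as `Joiner.joinAll(Collections.singletonList("x"))`, `Joiner.joinAll(Collections.emptyList())` and `Joiner.joinAll(Arrays.asList("a", "b"))` all compile without any copying; with an ArrayList parameter none of them would.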

I too typically use ArrayList, but I will use TreeSet or HashSet depending on the circumstances. When writing tests, however, Arrays.asList and Collections.singletonList are also frequently used. I've mostly been writing thread-local code, but I could also see using the various concurrent classes as well.
Also, there were times I used ArrayList when what I really wanted was a LinkedHashSet (before it was available).
