In Java, should I use ArrayList<Long> or long[]?

I am writing a program which accepts 400 numbers of type long and will modify some of them depending on conditions at runtime and I want to know whether to use ArrayList<Long> or long[].
Which will be faster to use? I am thinking of using long[] because size is fixed.

When the size is fixed, long[] is faster, but it leads to a less maintainable API, because it does not implement the List interface.
Note that a long[] is faster for two reasons:
It uses primitive longs rather than boxed Long objects (which also gives better cache performance, since the longs are stored contiguously and the Long objects aren't guaranteed to be).
An array is a much simpler and more efficient data structure.
Nevertheless, for simpler maintainability, I would use a List<Long> unless performance is critical in this part of the program.
If you use this collection very often in a tight loop, and your profiler says it is indeed a bottleneck, I would then switch to the more efficient long[].
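As a rough illustration of the two options discussed above, here is a minimal sketch that applies the same conditional update to a long[] and to a List<Long>. The class name, the method names and the "double every even value" rule are assumptions made up for this example; they are not from the question.

```java
import java.util.ArrayList;
import java.util.List;

class LongStorageSketch {
    static final int SIZE = 400;

    // Doubles every even value in place and returns the sum; no boxing involved.
    static long sumAfterUpdate(long[] values) {
        long sum = 0;
        for (int i = 0; i < values.length; i++) {
            if (values[i] % 2 == 0) {
                values[i] *= 2;
            }
            sum += values[i];
        }
        return sum;
    }

    // Same logic on a List<Long>: get() unboxes the element, set() boxes the new value.
    static long sumAfterUpdate(List<Long> values) {
        long sum = 0;
        for (int i = 0; i < values.size(); i++) {
            long v = values.get(i);
            if (v % 2 == 0) {
                v *= 2;
                values.set(i, v);
            }
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] array = new long[SIZE];
        List<Long> list = new ArrayList<>(SIZE);
        for (int i = 0; i < SIZE; i++) {
            array[i] = i;
            list.add((long) i);
        }
        // Both containers hold the same values, so both sums agree.
        System.out.println(sumAfterUpdate(array) == sumAfterUpdate(list)); // true
    }
}
```

The code is nearly identical either way; the difference is only in the hidden boxing, which is why measuring before switching is a sensible default.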

As far as the speed goes, it almost does not matter for a list of 400 items. If you need to grow your list dynamically, ArrayList<Long> is better; if the size is fixed, long[] may be better (and a bit faster, although again, you would probably not notice the difference in most situations).

There are a few things which haven't been mentioned in other answers:
A generic collection is in actuality a collection of Objects, or better said, this is what the Java compiler will make of it. A long[], by contrast, will always remain what it is.
A consequence of the first point is that if you do something that eventually puts something other than a Long into your collection, in certain situations the compiler will let it through (because the Java type system is unsound; for example, it will allow you to upcast and then re-cast to a completely disparate type).
A more general consequence of these two is that Java generics are half-baked and in some less trivial cases such as reflection, serialization etc. may "surprise" you. It is in fact safer to use plain arrays than generics.
package tld.example;
import java.util.ArrayList;
import java.util.List;
class Example {
    static void testArray(long[] longs) {
        System.out.println("testArray");
    }
    static void testGeneric(List<Long> longs) {
        System.out.println("testGeneric");
    }
    @SuppressWarnings("unchecked")
    public static void main(String... arguments) {
        List<Long> fakeLongs = new ArrayList<Long>();
        List<Object> mischiefManaged = (List<Object>) (Object) fakeLongs;
        mischiefManaged.add(new Object());
        // This call succeeds and prints "testGeneric":
        // we sneaked a wrong type into this function
        // and it went unnoticed.
        testGeneric(fakeLongs);
        long[] realLongs = new long[1];
        // This cast compiles, but it fails at runtime with a
        // ClassCastException because a long[] is not an Object[].
        Object[] forgedLongs = (Object[]) (Object) realLongs;
        forgedLongs[0] = new Object();
        testArray(realLongs);
    }
}
This example is a bit contrived because it is difficult to come up with a short convincing example, but trust me, in less trivial cases, when you have to use reflection and unsafe casts this is quite a possibility.
Now, you have to consider that besides what is reasonable, there is tradition. Every community has its own set of customs and traditions. There are a lot of superficial beliefs, such as those voiced here, e.g. when someone claims that implementing the List API is an unconditional good, and that if this does not happen, it must be bad... This is not just a dominant opinion, this is what the overwhelming majority of Java programmers believe. After all, it doesn't matter that much, and Java as a language has plenty of other shortcomings... so, if you want to secure your job interview or simply avoid conflicts with other Java programmers, then use Java generics, no matter the reason. But if you don't like it, well, perhaps just use some other language ;)

long[] is both much faster and takes much less memory. Read "Effective Java" Item 49: "Prefer primitive types to boxed primitives"

I think it is better to use ArrayList because it is much easier to maintain in the long run. If the size of your array grows beyond 400 in the future, maintaining a long[] carries a lot of overhead, whereas an ArrayList grows dynamically, so you don't need to worry about the increased size.
Also, deleting an element is handled much better by ArrayList than by fixed-size arrays (long[]), because the list automatically shifts the remaining elements so that they still appear in order.
Fixed-size arrays are worse at this.
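To make the deletion point concrete, here is a small sketch comparing ArrayList.remove(int) with the manual copying a plain array needs. The helper removeAt and the sample values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RemovalSketch {
    // Removes the element at index from a plain array by copying into a smaller array.
    static long[] removeAt(long[] source, int index) {
        long[] result = new long[source.length - 1];
        System.arraycopy(source, 0, result, 0, index);
        System.arraycopy(source, index + 1, result, index, source.length - index - 1);
        return result;
    }

    public static void main(String[] args) {
        List<Long> list = new ArrayList<>(Arrays.asList(10L, 20L, 30L, 40L));
        list.remove(1); // remove(int index): drops 20 and shifts the rest left
        System.out.println(list); // [10, 30, 40]

        long[] array = {10L, 20L, 30L, 40L};
        System.out.println(Arrays.toString(removeAt(array, 1))); // [10, 30, 40]
    }
}
```

Note that on a List<Long>, remove(1) removes by index, while remove(Long.valueOf(1)) would remove by value; that overload pair is an easy place to slip.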

Why not use one of the many list implementations that work on primitive types? TLongArrayList, for instance. It can be used much like Java's List but is backed by a long[] array, so you get the advantages of both.
HPPC has a short overview of some libraries: https://github.com/carrotsearch/hppc/blob/master/ALTERNATIVES.txt
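To show the idea behind such libraries without depending on one, here is a minimal sketch of a growable list backed by a long[]. LongArrayListSketch is a hypothetical class written for this example; it is not the API of Trove's TLongArrayList or of HPPC, which offer far more.

```java
import java.util.Arrays;

// Hypothetical minimal class for illustration; not the API of Trove or HPPC.
class LongArrayListSketch {
    private long[] data = new long[8];
    private int size;

    void add(long value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2); // grow the backing array when full
        }
        data[size++] = value;
    }

    long get(int index) {
        checkIndex(index);
        return data[index];
    }

    void set(int index, long value) {
        checkIndex(index);
        data[index] = value;
    }

    int size() {
        return size;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
    }

    public static void main(String[] args) {
        LongArrayListSketch list = new LongArrayListSketch();
        for (long i = 0; i < 400; i++) {
            list.add(i); // stores the primitive directly, no Long objects
        }
        list.set(3, 42L);
        System.out.println(list.size() + " " + list.get(3)); // 400 42
    }
}
```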


Why does Java CharSequence.chars() return an IntStream? [duplicate]

In Java 8, there is a new method String.chars() which returns a stream of ints (IntStream) that represent the character codes. I guess many people would expect a stream of chars here instead. What was the motivation to design the API this way?
As others have already mentioned, the design decision behind this was to prevent the explosion of methods and classes.
Still, personally I think this was a very bad decision. Given that they did not want to add a CharStream, which is reasonable, there should have been different methods instead of chars(). I would think of:
Stream<Character> chars(), which gives a stream of boxed characters and carries a light performance penalty.
IntStream unboxedChars(), which would be used for performance-sensitive code.
However, instead of focusing on why it is done this way currently, I think this answer should focus on showing a way to do it with the API that we have gotten with Java 8.
In Java 7 I would have done it like this:
for (int i = 0; i < hello.length(); i++) {
System.out.println(hello.charAt(i));
}
And I think a reasonable method to do it in Java 8 is the following:
hello.chars()
.mapToObj(i -> (char)i)
.forEach(System.out::println);
Here I obtain an IntStream and map it to an object via the lambda i -> (char)i. This automatically boxes it into a Stream<Character>, and then we can do what we want, and still use method references as a plus.
Be aware though that you must use mapToObj. If you forget and use map, nothing will complain, but you will still end up with an IntStream, and you might be left wondering why it prints the integer values instead of the strings representing the characters.
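The difference between map and mapToObj described above can be seen in a small sketch. The class and method names are made up for this example, and the results are joined into strings so they are easy to compare.

```java
import java.util.stream.Collectors;

class CharsMappingSketch {
    // mapToObj boxes each value, so the stream becomes a Stream<Character>.
    static String withMapToObj(String s) {
        return s.chars()
                .mapToObj(i -> (char) i)
                .map(c -> c.toString())
                .collect(Collectors.joining(" "));
    }

    // map keeps the stream an IntStream: the char is widened straight back to int.
    static String withMap(String s) {
        return s.chars()
                .map(i -> (char) i)
                .mapToObj(i -> Integer.toString(i))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(withMapToObj("hi")); // h i
        System.out.println(withMap("hi"));      // 104 105
    }
}
```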
Other ugly alternatives for Java 8:
By remaining in an IntStream and wanting to print them ultimately, you cannot use method references anymore for printing:
hello.chars()
.forEach(i -> System.out.println((char)i));
Moreover, method references to your own methods do not work anymore! Consider the following:
private void print(char c) {
System.out.println(c);
}
and then
hello.chars()
.forEach(this::print);
This will give a compile error, as there possibly is a lossy conversion.
Conclusion:
The API was designed this way to avoid adding a CharStream. I personally think that the method should return a Stream<Character>, and the current workaround is to use mapToObj(i -> (char)i) on the IntStream to be able to work with the characters properly.
The answer from skiwi covered many of the major points already. I'll fill in a bit more background.
The design of any API is a series of tradeoffs. In Java, one of the difficult issues is dealing with design decisions that were made long ago.
Primitives have been in Java since 1.0. They make Java an "impure" object-oriented language, since the primitives are not objects. The addition of primitives was, I believe, a pragmatic decision to improve performance at the expense of object-oriented purity.
This is a tradeoff we're still living with today, nearly 20 years later. The autoboxing feature added in Java 5 mostly eliminated the need to clutter source code with boxing and unboxing method calls, but the overhead is still there. In many cases it's not noticeable. However, if you were to perform boxing or unboxing within an inner loop, you'd see that it can impose significant CPU and garbage collection overhead.
When designing the Streams API, it was clear that we had to support primitives. The boxing/unboxing overhead would kill any performance benefit from parallelism. We didn't want to support all of the primitives, though, since that would have added a huge amount of clutter to the API. (Can you really see a use for a ShortStream?) "All" or "none" are comfortable places for a design to be, yet neither was acceptable. So we had to find a reasonable value of "some". We ended up with primitive specializations for int, long, and double. (Personally I would have left out int but that's just me.)
For CharSequence.chars() we considered returning Stream<Character> (an early prototype might have implemented this) but it was rejected because of boxing overhead. Considering that a String has char values as primitives, it would seem to be a mistake to impose boxing unconditionally when the caller would probably just do a bit of processing on the value and unbox it right back into a string.
We also considered a CharStream primitive specialization, but its use would seem to be quite narrow compared to the amount of bulk it would add to the API. It didn't seem worthwhile to add it.
The penalty this imposes on callers is that they have to know that the IntStream contains char values represented as ints and that casting must be done at the proper place. This is doubly confusing because there are overloaded API calls like PrintStream.print(char) and PrintStream.print(int) that differ markedly in their behavior. An additional point of confusion possibly arises because the codePoints() call also returns an IntStream but the values it contains are quite different.
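The two points of confusion mentioned above can be reproduced with a short sketch. The class name and the sample strings are made up for illustration; the second string contains one character outside the Basic Multilingual Plane, written as a surrogate pair.

```java
class CharsVsCodePointsSketch {
    static long codeUnitCount(String s) {
        return s.chars().count();      // UTF-16 code units
    }

    static long codePointCount(String s) {
        return s.codePoints().count(); // Unicode code points
    }

    public static void main(String[] args) {
        // Overloads: the same value prints differently as int and as char.
        int codeUnit = "A".chars().findFirst().getAsInt();
        System.out.println(codeUnit);        // println(int) prints 65
        System.out.println((char) codeUnit); // println(char) prints A

        // "a" followed by U+1F600, which needs two code units in UTF-16.
        String s = "a\uD83D\uDE00";
        System.out.println(codeUnitCount(s));  // 3
        System.out.println(codePointCount(s)); // 2
    }
}
```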
So, this boils down to choosing pragmatically among several alternatives:
We could provide no primitive specializations, resulting in a simple, elegant, consistent API, but which imposes a high performance and GC overhead;
we could provide a complete set of primitive specializations, at the cost of cluttering up the API and imposing a maintenance burden on JDK developers; or
we could provide a subset of primitive specializations, giving a moderately sized, high performing API that imposes a relatively small burden on callers in a fairly narrow range of use cases (char processing).
We chose the last one.

Passing dynamic list of primitives to a Java method

I need to pass a dynamic list of primitives to a Java method. That could be (int, int, float) or (double, char) or whatever. I know that's not possible, so I was thinking of valid solutions to this problem.
Since I am developing a game on Android, where I want to avoid garbage collection as much as possible, I do not want to use any objects (e.g. because of auto boxing), but solely primitive data types. Thus a collection or array of primitive class objects (e.g. Integer) is not an option in my case.
So I was wondering whether I could pass a class object to my method, which would contain all the primitive values I need. However, that's not a solution to my problem either, because, as said, the list of primitives is variable. If I went that way, I would not know how to access this dynamic list of primitives inside my method (at least not without some conversion to objects, which is what I want to avoid).
Now I feel a bit lost here. I do not know of any other possible way in Java how to solve my problem. I hope that's simply a lack of knowledge on my side. Does anyone of you know a solution without a conversion to and from objects?
It would perhaps be useful to provide some more context and explain on exactly what you want to use this technique for, since this will probably be necessary to decide on the best approach.
Conceptually, you are trying to do something that is always difficult in any language that passes parameters on a managed stack. What do you expect the poor compiler to do? Either it lets you push an arbitrary number of arguments on the stack and access them with some stack pointer arithmetic (fine in C which lets you play with pointers as much as you like, not so fine in a managed language like Java) or it will need to pass a reference to storage elsewhere (which implies allocation or some form of buffer).
Luckily, there are several ways to do efficient primitive parameter passing in Java. Here is my list of the most promising approaches, roughly the order you should consider them:
Overloading - have multiple methods with different primitive arguments to handle all the possible combinations. Likely to be the best / simplest / most lightweight option if there is a relatively small number of combinations. Also great performance, since the compiler will statically work out which overloaded method to call.
Primitive arrays - Good way of passing an arbitrary number of primitive arguments. Note that you will probably need to keep a primitive array around as a buffer (otherwise you will have to allocate it when needed, which defeats your objective of avoiding allocations!). If you use partially-filled primitive arrays you will also need to pass offset and/or count arguments into the array.
Pass objects with primitive fields - works well if the set of primitive fields is relatively well known in advance. Note that you will also have to keep an instance of the class around to act as a buffer (otherwise you will have to allocate it when needed, which defeats your objective of avoiding allocations!).
Use a specialised primitive collection library - e.g. the Trove library. Great performance and saves you having to write a lot of code as these are generally well designed and maintained libraries. Pretty good option if these collections of primitives are going to be long lived, i.e. you're not creating the collection purely for the purpose of passing some parameters.
NIO Buffers - roughly equivalent to using arrays or primitive collections in terms of performance. They have a bit of overhead, but could be a better option if you need NIO buffers for another reason (e.g. if the primitives are being passed around in networking code or 3D library code that uses the same buffer types, or if the data needs to be passed to/from native code). They also handle offsets and counts for you, which can be helpful.
Code generation - write code that generates the appropriate bytecode for the specialised primitive methods (either ahead of time or dynamically). This is not for the faint-hearted, but is one way to get absolutely optimal performance. You'll probably want to use a library like ASM, or alternatively pick a JVM language that can easily do the code generation for you (Clojure springs to mind).
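As an illustration of the primitive-array option from the list above, here is a minimal sketch of a preallocated buffer that is reused between calls and passed together with a count. The class name, the buffer size of 16 and the sum method are assumptions made up for this example.

```java
class PrimitiveBufferSketch {
    // Allocated once and reused, so passing arguments creates no garbage per call.
    private static final float[] ARGS = new float[16];

    // Reads only the first count entries; the rest of the buffer is ignored.
    static float sum(float[] buffer, int count) {
        float total = 0f;
        for (int i = 0; i < count; i++) {
            total += buffer[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // Fill the shared buffer, then pass it with the number of valid entries.
        ARGS[0] = 1.5f;
        ARGS[1] = 2.5f;
        ARGS[2] = 4f;
        System.out.println(sum(ARGS, 3)); // 8.0
    }
}
```

A shared static buffer like this is not thread-safe; in a multi-threaded game loop each thread would need its own buffer.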
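There simply isn't. The only way to have a variable number of parameters in a method is to use the ... (varargs) syntax, and a varargs parameter has a single element type, so a mixed list such as (int, int, float) can only be passed as Object..., which boxes every primitive. Generics do not help either, since they only support reference types.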
The only thing I can possibly think of would be a class like this:
class ReallyBadPrimitives {
char[] chars;
int[] ints;
float[] floats;
}
And resize the arrays as you add to them. But that's REALLY, REALLY bad as you lose basically ALL referential integrity in your system.
I wouldn't worry about garbage collection - I would solve your problems using objects and autoboxing if you have to (or better yet, avoiding this "unknown set of input parameters" and get a solid protocol down). Once you have a working prototype, see if you run into performance problems, and then make necessary adjustments. You might find the JVM can handle those objects better than you originally thought.
Try to use the ... operator:
static int sum(int... numbers) {
    int total = 0;
    for (int i = 0; i < numbers.length; i++) {
        total += numbers[i];
    }
    return total;
}
You can use BitSet similar to C++ Bit field.
http://docs.oracle.com/javase/1.3/docs/api/java/util/BitSet.html
You could also cast all your primitives to double then just pass in an array of double. The only trick there is that you can't use the boolean type.
Fwiw, something like sum(int... numbers) would not autobox the ints. It would create a single int[] to hold them, so there would be an object allocation; but it wouldn't be per int.
public class VarArgs {
    public static void main(String[] args) {
        System.out.println(variableInts(1, 2));
        System.out.println(variableIntegers(1, 2, 3));
    }
    private static String variableInts(int... args) {
        // args is an int[], and ints can't have getClass(), so this doesn't compile
        // args[0].getClass();
        return args.getClass().toString() + " ";
    }
    private static String variableIntegers(Integer... args) {
        // args is an Integer[], and Integers can have getClass()
        args[0].getClass();
        return args.getClass().toString();
    }
}
output:
class [I
class [Ljava.lang.Integer;

Casting and Generics, Any performance difference?

I have been coding for Android a lot lately. Though I am comfortable with Java, I am missing some ideas about the core concepts being used there.
I am interested to know whether there is any performance difference between these two pieces of code.
First Method:
//Specified as member variable.
ArrayList <String> myList = new ArrayList <String>();
and using as String temp = myList.get(1);
2nd Method:
ArrayList myList = new ArrayList(); //Specified as member variable.
and using
String temp1 = myList.get(1).toString();
I know it's about casting. Does the first method have a great advantage over the second?
Most of the time in real code I have to use the second method, because the ArrayList can hold different data types. I end up specifying
ArrayList<Object> myList = new ArrayList<Object>();
or something more generic.
In short, there's no performance difference worth worrying about, if it exists at all. Generic information isn't stored at runtime anyway, so there's not really anything else happening to slow things down - and as pointed out by other answers it may even be faster (though even if it hypothetically were slightly slower, I'd still advocate using generics.) It's probably good to get into the habit of not thinking about performance so much on this level. Readability and code quality are generally much more important than micro-optimisations!
In short, generics would be the preferred option since they guarantee type safety and make your code cleaner to read.
In terms of the fact you're storing completely different object types (i.e. not related from some inheritance hierarchy you're using) in an arraylist, that's almost definitely a flaw with your design! I can count the times I've done this on one hand, and it was always a temporary bodge.
Generics aren't reified, which means they go away at runtime. Using generics is preferred for several reasons:
It makes your code clearer, as to which classes are interacting
It keeps it type safe: you can't accidentally add a List to a List
It's faster: casting requires the JVM to test type castability at runtime, in case it needs to throw a ClassCastException. With Generics, the compiler knows what types things must be, and so it doesn't need to check them.
There is a performance difference in that code:
The second method is actually slower.
The reason why:
Generics don't require casting/conversion (your code uses a conversion method, not a cast), the type is already correct. So when you call the toString() method, it is an extra call with extra operations that are unnecessary when using the method with generics.
There wouldn't be a problem with casting, as you are using the toString() method. But you could accidentally add an incorrect object (such as an array of Strings). The toString() method would work properly and not throw an exception, but you would get odd results.
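The "odd results" mentioned above are easy to reproduce. The sketch below uses a made-up class and made-up helper names: toString() on a wrongly typed element quietly returns something, while a cast on the same element fails immediately.

```java
import java.util.ArrayList;
import java.util.List;

class ToStringVersusCastSketch {
    // Returns true if casting the element to String throws ClassCastException.
    static boolean castFails(Object element) {
        try {
            String s = (String) element;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Object> mixed = new ArrayList<>();
        mixed.add("hello");
        mixed.add(new String[] {"a", "b"}); // wrong type added by accident

        // toString() never throws here, but for the array it is the type tag plus a hash.
        System.out.println(mixed.get(0).toString()); // hello
        System.out.println(mixed.get(1).toString().startsWith("[Ljava.lang.String;")); // true

        // A cast exposes the mistake right away.
        System.out.println(castFails(mixed.get(0))); // false
        System.out.println(castFails(mixed.get(1))); // true
    }
}
```

With ArrayList<String> the second add would not compile at all, which is the earlier and cheaper place to catch the mistake.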
As Android runs on mobile and handheld devices where resources are limited, you have to be careful while coding.
Casting can add overhead if you are storing Strings in the ArrayList.
So in my opinion you should use the first method and be specific about the type.
There is no runtime performance difference because of "type erasure".
But if you are using Java 1.5 or above, you SHOULD use generics and not the weakly typed counterparts.
Advantages of generics --
* The flexibility of dynamic binding, with the advantage of static type-checking. Compiler-detected errors are less expensive to repair than those detected at runtime.
* There is less ambiguity between containers, so code reviews are simpler.
* Using fewer casts makes code cleaner.
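Type erasure, mentioned above, can be observed directly: two lists with different type arguments share one runtime class. This is a minimal sketch with made-up class and method names.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureSketch {
    // Both lists are plain ArrayList objects at runtime; the type arguments are erased.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        return strings.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // true

        List<String> strings = new ArrayList<>();
        strings.add("typed");
        // No cast in the source; the compiler supplies it when compiling get().
        String value = strings.get(0);
        System.out.println(value); // typed
    }
}
```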

API Design for Idiot-Proof Iteration Without Generics

When you're designing the API for a code library, you want it to be easy to use well, and hard to use badly. Ideally you want it to be idiot proof.
You might also want to make it compatible with older systems that can't handle generics, like .Net 1.1 and Java 1.4. But you don't want it to be a pain to use from newer code.
I'm wondering about the best way to make things easily iterable in a type-safe way... Remembering that you can't use generics so Java's Iterable<T> is out, as is .Net's IEnumerable<T>.
You want people to be able to use the enhanced for loop in Java (for Item i : items), and the foreach / For Each loop in .Net, and you don't want them to have to do any casting. Basically you want your API to be now-friendly as well as backwards compatible.
The best type-safe option that I can think of is arrays. They're fully backwards compatible and they're easy to iterate in a typesafe way. But arrays aren't ideal because you can't make them immutable. So, when you have an immutable object containing an array that you want people to be able to iterate over, to maintain immutability you have to provide a defensive copy each and every time they access it.
In Java, doing (MyObject[]) myInternalArray.clone(); is super-fast. I'm sure that the equivalent in .Net is super-fast too. If you have something like:
class Schedule {
    private Appointment[] internalArray;
    public Appointment[] appointments() {
        return (Appointment[]) internalArray.clone();
    }
}
people can write:
for (Appointment a : schedule.appointments()) {
    a.doSomething();
}
and it will be simple, clear, type-safe, and fast.
But they could do something like:
for (int i = 0; i < schedule.appointments().length; i++) {
    Appointment a = schedule.appointments()[i];
}
And then it would be horribly inefficient because the entire array of appointments would get cloned twice for every iteration (once for the length test, and once to get the object at the index). Not such a problem if the array is small, but pretty horrible if the array has thousands of items in it. Yuk.
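To put a number on the repeated cloning, here is a small sketch that counts defensive copies. It uses String as a stand-in for Appointment and adds a cloneCount field purely as instrumentation; both are assumptions made for this example, and the code targets Java 5 or later rather than Java 1.4.

```java
class CloneCountSketch {
    static class Schedule {
        private final String[] internalArray;
        int cloneCount; // instrumentation for this sketch only

        Schedule(String... appointments) {
            this.internalArray = appointments.clone();
        }

        // Returns a defensive copy and records that a copy was made.
        String[] appointments() {
            cloneCount++;
            return internalArray.clone();
        }
    }

    // Calls appointments() in the loop condition and in the loop body.
    static int clonesWithNaiveLoop(Schedule s) {
        s.cloneCount = 0;
        for (int i = 0; i < s.appointments().length; i++) {
            String a = s.appointments()[i];
        }
        return s.cloneCount;
    }

    // Calls appointments() once and iterates over that single copy.
    static int clonesWithHoistedCopy(Schedule s) {
        s.cloneCount = 0;
        String[] copy = s.appointments();
        for (int i = 0; i < copy.length; i++) {
            String a = copy[i];
        }
        return s.cloneCount;
    }

    public static void main(String[] args) {
        Schedule schedule = new Schedule("dentist", "review", "lunch");
        System.out.println(clonesWithNaiveLoop(schedule));   // 7 (2n + 1 for n = 3)
        System.out.println(clonesWithHoistedCopy(schedule)); // 1
    }
}
```

The extra clone in the naive loop comes from the final length check that ends the loop.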
Would anyone actually do that? I'm not sure... I guess that's largely my question here.
You could call the method toAppointmentArray() instead of appointments(), and that would probably make it less likely that anyone would use it the wrong way. But it would also make it harder for people to find when they just want to iterate over the appointments.
You would, of course, document appointments() clearly, to say that it returns a defensive copy. But a lot of people won't read that particular bit of documentation.
Although I'd welcome suggestions, it seems to me that there's no perfect way to make it simple, clear, type-safe, and idiot proof. Have I failed if a minority of people are unwittingly cloning arrays thousands of times, or is that an acceptable price to pay for simple, type-safe iteration for the majority?
NB I happen to be designing this library for both Java and .Net, which is why I've tried to make this question applicable to both. And I tagged it language-agnostic because it's an issue that could arise for other languages too. The code samples are in Java, but C# would be similar (albeit with the option of making the Appointments accessor a property).
UPDATE: I did a few quick performance tests to see how much difference this made in Java. I tested:
1. cloning the array once, and iterating over it using the enhanced for loop
2. iterating over an ArrayList using the enhanced for loop
3. iterating over an unmodifiable ArrayList (from Collections.unmodifiableList) using the enhanced for loop
4. iterating over the array the bad way (cloning it repeatedly in the length check and when getting each indexed item).
For 10 objects, the relative speeds (doing multiple repeats and taking the median) were like:
1. 1,000
2. 1,300
3. 1,300
4. 5,000
For 100 objects:
1. 1,300
2. 4,900
3. 6,300
4. 85,500
For 1000 objects:
1. 6,400
2. 51,700
3. 56,200
4. 7,000,300
For 10000 objects:
1. 68,000
2. 445,000
3. 651,000
4. 655,180,000
Rough figures for sure, but enough to convince me of two things:
1. Cloning, then iterating is definitely not a performance issue. In fact it's consistently faster than using a List. (This is why Java's enum.values() method returns a defensive copy of an array instead of an immutable list.)
2. If you repeatedly call the method, repeatedly cloning the array unnecessarily, performance becomes more and more of an issue the larger the arrays in question. It's pretty horrible. No surprises there.
clone() is fast, but not what I would describe as super fast.
If you don't trust people to write loops efficiently, I would not let them write a loop (which also avoids the need for a clone())
interface AppointmentHandler {
    void onAppointment(Appointment appointment);
}
class Schedule {
    private Appointment[] internalArray;
    public void forEachAppointment(AppointmentHandler ah) {
        // The loop lives inside the library, so no defensive copy is needed.
        for (Appointment a : internalArray) {
            ah.onAppointment(a);
        }
    }
}
Since you can't really have it both ways, I would suggest that you create a pre-generics and a generics version of your API. Ideally, the underlying implementation can be mostly the same, but the fact is, if you want it to be easy to use for anyone on Java 1.5 or later, they will expect generics, Iterable and all the newer language features.
I think the usage of arrays should be non-existent. It does not make for an easy to use API in either case.
NOTE: I have never used C#, but I would expect the same holds true.
As far as failing a minority of the users, those that would call the same method to get the same object on each iteration of the loop would be asking for inefficiency regardless of API design. I think as long as that's well documented, it's not too much to ask that the users obey some semblance of common sense.
