Is using a list of unions a good practice? - java

Unions in Thrift, by definition, provide a means to transport exactly one field of a possible set of fields, just like union {} in C++. Consequently, union members are implicitly considered optional (see requiredness).
My reasoning is that a union type does not exist anywhere except C++. Every time I use this contract in Java, I need to split the collection in two, do some separate processing, and merge the two lists back again.
The only use for this type is a single object: you try to access one field and, if it wasn't set, you take the other one.
Is there any situation where you should use a collection of union objects instead of two separate collections?

Don't try to translate C++ to Java. C++ is much more powerful, and translation is impossible in most cases. Current Java-like languages are designed as if memory didn't exist. The reason is simplicity: such a language is much easier to learn. That's why they don't have unions. Pascal has unions too. Older languages don't pretend memory doesn't exist.
There are many use cases for lists of unions. For example, if you need to pass arguments to an external function, you can create a Variant type based on a union and wrap the calls in functions that take const Variant & arguments.
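To make the trade-off concrete on the Java side, here is a minimal sketch of a hand-written union-like type and a list of them. IdOrName, UnionListDemo and their methods are hypothetical names for illustration and are not the classes Thrift generates. The sketch shows one case where a single list can be preferable to two lists: when the relative order of the mixed elements matters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written union: exactly one of the two fields is set.
// This is NOT the class Thrift generates; all names are illustrative only.
final class IdOrName {
    private final Integer id;   // set when the value is an id
    private final String name;  // set when the value is a name

    private IdOrName(Integer id, String name) {
        this.id = id;
        this.name = name;
    }

    static IdOrName ofId(int id) { return new IdOrName(id, null); }
    static IdOrName ofName(String name) { return new IdOrName(null, name); }

    boolean isId() { return id != null; }
    int id() { return id; }
    String name() { return name; }
}

final class UnionListDemo {
    // One pass over the mixed list; the interleaved order of ids and names is
    // preserved, which two separate lists could not express on their own.
    static List<String> describe(List<IdOrName> items) {
        List<String> out = new ArrayList<>();
        for (IdOrName item : items) {
            out.add(item.isId() ? "id:" + item.id() : "name:" + item.name());
        }
        return out;
    }

    public static void main(String[] args) {
        List<IdOrName> items = new ArrayList<>();
        items.add(IdOrName.ofId(7));
        items.add(IdOrName.ofName("alice"));
        items.add(IdOrName.ofId(9));
        System.out.println(describe(items)); // prints [id:7, name:alice, id:9]
    }
}
```

If the order never matters and the two kinds are always processed separately, two plain lists are probably simpler.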

Related

Improve memory usage: IntegerHashMap

We use a HashMap<Integer, SomeType>() with more than a million entries. I consider that large.
But integers are their own hash code. Couldn't we save memory with a, say, IntegerHashMap<Integer, SomeType>() that uses a special Map.Entry, using int directly instead of a pointer to an Integer object? In our case, that would save 1000000x the memory required for an Integer object.
Any faults in my line of thought? Too special to be of general interest? (at least, there is an EnumHashMap)
add1. The first generic parameter of IntegerHashMap is used to make it closely similar to the other Map implementations. It could be dropped, of course.
add2. The same should be possible with other maps and collections. For example ToIntegerHashMap<KeyType, Integer>, IntegerHashSet<Integer>, etc.
What you're looking for is a "primitive collections" library. These usually do much better on memory usage and performance. One of the oldest and most popular libraries was Trove; however, it is a bit outdated now. The main active libraries in use now are:
Goldman Sachs Collections
fastutil
Koloboke
See Benchmarks Here
Some words of caution:
"integers are their own hash code" I'd be very careful with this statement. Depending on the integers you have, the distribution of keys may be anything from optimal to terrible. Ideally, I'd design the map so that you can pass in a custom IntFunction as hashing strategy. You can still default this to (i) -> i if you want, but you probably want to introduce a modulo factor, or your internal array will be enormous. You may even want to use an IntBinaryOperator, where one param is the int and the other is the number of buckets.
I would drop the first generic param. You probably don't want to implement Map<Integer, SomeType>, because then you will have to box / unbox in all your methods, and you will lose all your optimizations (except space). Trying to make a primitive collection compatible with an object collection will make the whole exercise pointless.
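The two cautions above can be sketched roughly as follows. This is an illustrative, hypothetical IntObjectMap, not the API of any of the libraries listed. It assumes open addressing with linear probing, omits remove() for brevity, and assumes any custom strategy returns an index in the range [0, bucketCount).

```java
import java.util.function.IntBinaryOperator;

// Hypothetical int-keyed map sketch: keys live in an int[] (no Integer boxing).
// Open addressing with linear probing; no remove(), for brevity.
final class IntObjectMap<V> {
    private int[] keys;
    private Object[] values;
    private boolean[] used;
    private int size;
    // Hashing strategy: (key, bucketCount) -> bucket index in [0, bucketCount).
    private final IntBinaryOperator bucketOf;

    IntObjectMap(int capacity, IntBinaryOperator bucketOf) {
        if (capacity < 1) throw new IllegalArgumentException("capacity must be >= 1");
        this.keys = new int[capacity];
        this.values = new Object[capacity];
        this.used = new boolean[capacity];
        this.bucketOf = bucketOf;
    }

    // Default strategy: the int is its own hash, reduced modulo the bucket count.
    IntObjectMap() {
        this(16, (key, buckets) -> Math.floorMod(key, buckets));
    }

    void put(int key, V value) {
        if ((size + 1) * 2 > keys.length) resize(); // keep the load factor <= 0.5
        int i = slotFor(key);
        if (!used[i]) { used[i] = true; keys[i] = key; size++; }
        values[i] = value;
    }

    @SuppressWarnings("unchecked")
    V get(int key) {
        int i = slotFor(key);
        return used[i] ? (V) values[i] : null;
    }

    int size() { return size; }

    // Returns the slot holding the key, or the first free slot on its probe path.
    private int slotFor(int key) {
        int i = bucketOf.applyAsInt(key, keys.length);
        while (used[i] && keys[i] != key) i = (i + 1) % keys.length;
        return i;
    }

    // Doubles the table and re-inserts every stored key under the new bucket count.
    private void resize() {
        int[] oldKeys = keys;
        Object[] oldValues = values;
        boolean[] oldUsed = used;
        keys = new int[oldKeys.length * 2];
        values = new Object[keys.length];
        used = new boolean[keys.length];
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldUsed[j]) {
                int i = slotFor(oldKeys[j]);
                used[i] = true;
                keys[i] = oldKeys[j];
                values[i] = oldValues[j];
            }
        }
    }
}

final class IntObjectMapDemo {
    public static void main(String[] args) {
        IntObjectMap<String> map = new IntObjectMap<>();
        map.put(42, "answer");
        map.put(-1, "negative keys work via floorMod");
        System.out.println(map.get(42) + " / " + map.size()); // prints "answer / 2"
    }
}
```

Note that the class deliberately does not implement Map<Integer, V>, so get and put never box the key.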

Why Java doesn't support structures ? (Just out of curiosity)

I know you can use public fields, or some other workarounds. Or maybe you don't need them at all. But just out of curiosity, why did Sun leave structures out?
Here's a link that explains Sun's decision:
2.2.2 No More Structures or Unions
Java has no structures or unions as complex data types. You don't need structures and unions when you have classes; you can achieve the same effect simply by declaring a class with the appropriate instance variables.
Although Java can support arbitrarily many kinds of classes, the Runtime only supports a few variable types: int, long, float, double, and reference; additionally, the Runtime only recognizes a few object types: byte[], char[], short[], int[], long[], float[], double[], reference[], and non-array object. The system will record a class type for each reference variable or array instance, and the Runtime will perform certain checks like ensuring that a reference stored into an array is compatible with the array type, but such behaviors merely regard the types of objects as "data".
I disagree with the claim that the existence of classes eliminates the need for structures, since structures have semantics which are fundamentally different from class objects. On the other hand, from a Runtime-system design perspective, adding structures greatly complicates the type system. In the absence of structures, the type system only needs eight array types. Adding structures to the type system would require the type system to recognize an arbitrary number of distinct variable types and array types. Such recognition is useful, but Sun felt that it wasn't worth the complexity.
Given the constraints under which Java's Runtime and type system operate, I personally think it should have included a limited form of aggregate type. Much of this would be handled by the language compiler, but it would need a couple of features in the Runtime to really work well. Given a declaration
aggregate TimedNamedPoint
{ int x,y; long startTime; String name; }
a field declaration like TimedNamedPoint tpt; would create four variables: tpt.x, tpt.y of type int, tpt.startTime of type long, and tpt.name of type String. Declaring a parameter of that type would behave similarly.
For such types to be useful, the Runtime would need a couple of slight additions: it would be necessary to allow functions to leave multiple values on the stack when they return, rather than simply having a single return value of one of the five main types. Additionally, it would be necessary to have a means of storing multiple kinds of things in an array. While that could be accomplished by having the creation of something declared as TimedNamedPoint[12] actually be an Object[4] which would be initialized to identify two instances of int[12], a long[12], and a String[12], it would be better to have a means by which code could construct a single array instance that could hold 24 values of type int, 12 of type long, and 12 of type String.
Personally, I think that for things like Point, the semantics of a simple aggregate would be much cleaner than for a class. Further, the lack of aggregates often makes it impractical to have a method that can return more than one kind of information simultaneously. There are many situations where it would be possible to have a method simultaneously compute and report the sine and cosine of a passed-in angle with much less work than would be required to compute both separately, but having to construct a SineAndCosineResult object instance would negate any speed advantage that could have been gained by doing so. The execution model wouldn't need to change much to allow a method to leave two floating-point values on the evaluation stack when it returns, but at present no such thing is supported.
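For illustration, this is roughly what the object-based workaround looks like in Java today. SineAndCosineResult is the name used in the text; the Trig class and the sineAndCosine method are hypothetical. The sketch simply calls Math.sin and Math.cos separately, so it shows only the shape of the API and the per-call allocation, not a faster joint computation.

```java
// Illustrative only: SineAndCosineResult is the hypothetical name used in the text.
final class SineAndCosineResult {
    final double sine;
    final double cosine;

    SineAndCosineResult(double sine, double cosine) {
        this.sine = sine;
        this.cosine = cosine;
    }
}

final class Trig {
    // Returns both values at once, at the cost of one object allocation per call.
    static SineAndCosineResult sineAndCosine(double angleRadians) {
        return new SineAndCosineResult(Math.sin(angleRadians), Math.cos(angleRadians));
    }

    public static void main(String[] args) {
        SineAndCosineResult r = sineAndCosine(0.0);
        System.out.println(r.sine + " " + r.cosine); // prints "0.0 1.0"
    }
}
```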

What are the ways to implement a map of heterogeneous functions in Java? And their pros and cons?

I want to implement some kind of Command Pattern in Java. I want to have a structure like Map<String commandkey, Function()>. So I have an object (Map, HashMap, LinkedHashMap or whatever associative...) where keys are string commands and values are functions which I want to call by the key. These functions have to be heterogeneous in the sense that they can have different return values, numbers of parameters, and names (different signatures). In C++, for example, I can create a map of function pointers or functors via boost::function.
So can someone name all the ways of implementing such an idiom (or a pattern if we look at it in broad sense) in Java. I know two ways:
Reflection (minus: slow and very ugly)
Using an interface and anonymous classes (minus: functions must have the same signature)
Detailed explanations, links to articles and so on would be very helpful.
there are no function pointers in java, only interfaces
imo reflection is not as slow and ugly as many people think
you still need to know how to call the function (you need to know that in c++ too)
so having the same signature is not that bad, just take a very flexible signature like
void command(Object... args)
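A minimal sketch of that idea, assuming a hypothetical Command interface and CommandRegistry class. The sketch returns Object instead of void, a small variation on the signature above, so that commands can hand back a result. Each command is responsible for casting its own arguments.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-signature command interface. It returns Object rather than
// void (a variation on the signature above) so commands can hand back a result.
interface Command {
    Object run(Object... args);
}

final class CommandRegistry {
    private final Map<String, Command> commands = new HashMap<>();

    void register(String key, Command command) { commands.put(key, command); }

    Object call(String key, Object... args) {
        Command command = commands.get(key);
        if (command == null) throw new IllegalArgumentException("unknown command: " + key);
        return command.run(args);
    }

    public static void main(String[] args) {
        CommandRegistry registry = new CommandRegistry();
        // Each lambda adapts a differently-shaped function to the shared signature.
        registry.register("add", a -> (Integer) a[0] + (Integer) a[1]);
        registry.register("upper", a -> ((String) a[0]).toUpperCase());
        System.out.println(registry.call("add", 2, 3));    // prints 5
        System.out.println(registry.call("upper", "abc")); // prints ABC
    }
}
```

The cost of this flexibility is that argument counts and types are only checked at run time, through the casts inside each command.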
Edit:
about Reflection performance:
look at this thread's answer: Java Reflection Performance
you can see that just calling a reflection object is not that slow, it's the lookup by name that costs a lot of time, and i think in your case you don't need that more than once per function
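The "look up once, call many times" point can be sketched like this. ReflectiveCommands and its nested Target class are hypothetical names for illustration; the reflection calls themselves (Class.getMethod and Method.invoke) are the standard java.lang.reflect API. No timing claims are made here, the sketch only shows where the name lookup happens.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

final class ReflectiveCommands {
    // Hypothetical target whose public methods act as commands.
    public static final class Target {
        public String greet(String name) { return "hello " + name; }
        public Integer twice(Integer n) { return n * 2; }
    }

    private final Target target = new Target();
    private final Map<String, Method> cache = new HashMap<>();

    // The lookup by name happens once per command, at registration time.
    void register(String key, String methodName, Class<?>... parameterTypes)
            throws NoSuchMethodException {
        cache.put(key, Target.class.getMethod(methodName, parameterTypes));
    }

    // Later calls reuse the cached Method and only pay for invoke().
    Object call(String key, Object... args) throws ReflectiveOperationException {
        Method method = cache.get(key);
        if (method == null) throw new IllegalArgumentException("unknown command: " + key);
        return method.invoke(target, args);
    }

    public static void main(String[] args) throws Exception {
        ReflectiveCommands commands = new ReflectiveCommands();
        commands.register("greet", "greet", String.class);
        commands.register("twice", "twice", Integer.class);
        System.out.println(commands.call("greet", "world")); // prints hello world
        System.out.println(commands.call("twice", 21));      // prints 42
    }
}
```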

Java: What's the proper term for multidimensional types, one-dimensional types?

In a Java context, what's the proper way to refer to a variable type that can contain multiple objects or primitives, and the proper way to refer to a type that can contain just one?
In general terms, I refer to lists, arrays, vectors, hashtables, trees, etc, as collections; and I refer to primitive types and one-dimensional objects as scalars.
In the wild, I've heard all sorts of combinations of phrases, including a few that are outright misleading:
"I'm storing my key/value pairs in a hashtable vector."
"Why would you need more than one hashtable?"
"What do you mean? I'm only using one hashtable."
Is there a widely-accepted way to refer to these two groupings of types, at a high level?
The Java Language Spec uses the terms "primitive" and "reference" to make that distinction for both variables and values. This might be confusing in the context of a different programming language where "reference" means something else.
However, I can't tell if that is exactly the distinction you're trying to make. If you want to lump strings and object wrappers like Integer in with the Java primitive types like int, you might be distinguishing scalar from non-scalar. Not all non-scalars are collections, of course.
I'm not sure that kind of terminology really applies to an OO language like Java. There's a distinction made between primitives, which can only contain a single value, and Objects, however an Object might contain any number of other objects.
Objects whose purpose is to contain zero-or-more instances of some other object are (in my experience) referred to as collections, maybe because of the Collections API or Arrays.
I guess languages where you're dealing explicitly with pointers and the like depend on the distinction more.
I think your question assumes that there are only 2 classes and one of them is what you call Collections. Let me say that I also call lists, maps, sets, etc. Collections because they're part of the Collections API. However, Collection is not on the same abstraction level as a primitive data type like integer. Really, you have primitives and references. References are pointers to objects which are instances of classes. Classes can be classified many ways. One of these classifications is "Collections".
Your friend who says "hashtable vector" is pretty much wrong if there's only one table. A hash table is a hash table, and a vector is a vector. A hashtable vector is a Vector<Hashtable> as far as I'm concerned.
Just use the most specific Java API class name (Collection, List, Object, etc.), without over-specifying, and 99% of all Java developers will understand what you mean.

What is the point of heterogeneous arrays?

I know that more-dynamic-than-Java languages, like Python and Ruby, often allow you to place objects of mixed types in arrays, like so:
["hello", 120, ["world"]]
What I don't understand is why you would ever use a feature like this. If I want to store heterogenous data in Java, I'll usually create an object for it.
For example, say a User has int ID and String name. While I see that in Python/Ruby/PHP you could do something like this:
[["John Smith", 000], ["Smith John", 001], ...]
this seems a bit less safe/OO than creating a class User with attributes ID and name and then having your array:
[<User: name="John Smith", id=000>, <User: name="Smith John", id=001>, ...]
where those <User ...> things represent User objects.
Is there reason to use the former over the latter in languages that support it? Or is there some bigger reason to use heterogenous arrays?
N.B. I am not talking about arrays that include different objects that all implement the same interface or inherit from the same parent, e.g.:
class Square extends Shape
class Triangle extends Shape
[new Square(), new Triangle()]
because that is, to the programmer at least, still a homogenous array as you'll be doing the same thing with each shape (e.g., calling the draw() method), only the methods commonly defined between the two.
As katrielalex wrote: There is no reason not to support heterogeneous lists. In fact, disallowing it would require static typing, and we're back to that old debate. But let's refrain from doing so and instead answer the "why would you use that" part...
To be honest, it is not used that much -- if we make use of the exception in your last paragraph and choose a more liberal definition of "implement the same interface" than e.g. Java or C#. Nearly all of my iterable-crunching code expects all items to implement some interface. Of course it does, otherwise it could do very little with them!
Don't get me wrong, there are absolutely valid use cases - there's rarely a good reason to write a whole class for containing some data (and even if you add some callables, functional programming sometimes comes to the rescue). A dict would be a more common choice though, and namedtuple is very neat as well. But they are less common than you seem to think, and they are used with thought and discipline, not for cowboy coding.
(Also, your "User as nested list" example is not a good one: since the inner lists are fixed-size, you'd better use tuples, and that makes it valid even in Haskell (the type would be [(String, Integer)]).)
Applying a multimethod to the array might make some sense. You switch the strategy to a more functional style in which you focus on a discrete piece of logic (i.e. the multimethod) instead of a discrete piece of data (i.e. the array objects).
In your shapes example, this prevents you from having to define and implement the Shape interface. (Yes, it's not a big deal here, but what if shape was one of several superclasses you wanted to extend? In Java, you're SOL at this point.) Instead, you implement a smart draw() multimethod that first examines the argument and then dispatches to the proper drawing functionality or error handling if the object isn't drawable.
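Java has no built-in multimethods, so the following is only a rough approximation of the idea using runtime type checks. Drawing.draw and the returned strings are hypothetical; the Square and Triangle classes here intentionally share no Shape supertype.

```java
final class Square {}    // note: no shared Shape supertype or interface
final class Triangle {}

final class Drawing {
    // Multimethod-style entry point: inspects the runtime type of its argument
    // and dispatches, with an explicit fallback for things it cannot draw.
    static String draw(Object item) {
        if (item instanceof Square) {
            return "drawing a square";
        } else if (item instanceof Triangle) {
            return "drawing a triangle";
        }
        return "cannot draw " + item;
    }

    public static void main(String[] args) {
        // A genuinely heterogeneous array: the last element is not drawable.
        Object[] mixed = { new Square(), new Triangle(), "hello" };
        for (Object item : mixed) {
            System.out.println(draw(item));
        }
    }
}
```

The dispatch logic lives in one place rather than in each class, which is the shift in focus described above.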
Comparisons between functional and object-oriented styles are all over the place; here are a couple relevant questions that should provide a good start: Functional programming vs Object Oriented programming and Explaining functional programming to object-oriented programmers and less technical people.
Is there reason to use the former over the latter in languages that support it?
Yes, there is a very simple reason why you can do this in Python (and I assume the same reason applies in Ruby):
How would you check that a list is heterogenous?
It can't just compare the types directly because Python has duck typing.
If all the objects share some common typeclass, Python has no way to guess that either. Everything supports being represented anyway, so you should be able to put them in a list together too.
It wouldn't make any sense to turn lists into the only type that needs a type declaration either.
There is simply no way to prevent you from creating a heterogenous list!
Or is there some bigger reason to use heterogenous arrays?
No, I can't think of any. As you already mentioned in your question, if you use heterogeneous arrays you're just making things harder than they have to be.
There is no reason not to support heterogeneous lists. It's a limitation for technical reasons, and we don't like those.
Not everything needs to be a class!
In Python, a class is basically a souped up dictionary with some extra stuff anyway. So making a class User is not necessarily any clearer than a dictionary {"name": ..., "id": ...}.
There is nothing to stop you having a heterogeneous array in Java. It is considered poor programming style and using proper POJOs will be faster/more efficient than heterogeneous arrays in Java or any other language as the types of the "fields" are statically known and primitives can be used.
In Java you can
Object[][] array = {{"John Smith", 000}, {"Smith John", 001}, ...};
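As a hedged illustration of the difference, the sketch below puts the same data in an Object[][] and in a small typed class. User mirrors the class described in the question; RowsVersusObjects and idFromRow are hypothetical names. It shows where the cast and the boxing sit, and makes no measured performance claim.

```java
final class User {
    final String name;
    final int id; // a primitive field: no boxing, and the type is checked at compile time

    User(String name, int id) {
        this.name = name;
        this.id = id;
    }
}

final class RowsVersusObjects {
    static int idFromRow(Object[] row) {
        // The compiler cannot verify this cast; a wrong element type fails at run time.
        return (Integer) row[1];
    }

    public static void main(String[] args) {
        // The int literals are boxed to Integer when stored in an Object[].
        Object[][] rows = { { "John Smith", 0 }, { "Smith John", 1 } };
        User[] users = { new User("John Smith", 0), new User("Smith John", 1) };
        System.out.println(idFromRow(rows[1])); // prints 1
        System.out.println(users[1].id);        // prints 1
    }
}
```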
Heterogeneous lists are very useful. For instance, to make the game of snake, I can have a list of blocks like this:
[[x, y, 'down'], [x1, y1, 'down']]
instead of a class for the blocks, and I can access every element faster.
In Lua an object and an array are the same thing, so the reason is clearer. Let's say that Lua takes weak typing to the extreme.
Apart from that, I had a Google Map object and I needed to delete all markers created so far in that map. So I ended up creating an array for markers, an array for circles and an array for places. Then I made a function to iterate over those three arrays and call .remove() on each of them. I then realized that I could just have a single non-homogeneous array, insert all the objects into it, and iterate once over that array.
Here is a simple answer:
N.B. I am not talking about arrays that include different objects that all implement the same interface or inherit from the same parent, e.g.:
Everything extends java.lang.Object... and that's plenty. There is no reason not to have an Object[] and put anything you like in it. Object[] arrays are exceptionally useful in any middleware, such as a persistence layer.
