How do I assign an identifier to a Java object


I have the following situation. In my Java back end I collect some data from a service, create data objects out of it, and pass them to my Angular front end. A user might then choose to do something with one of the objects, and the front end sends its changed state to the back end. The back end collects the same data again from the service, and now I have to identify which object the user chose. I want to do this by creating a common identifier. And here comes my question: how can I best generate the identifier?
First I did the following:
Get data from service
Assign an increasing integer in for-loop
However, I ran into the following problem. The second time the service is called, the same data might come back in a different order. So my first thought was to sort the collected data first and then assign the integer. But sorting this data is quite a pain, so I thought of something else:
Take the two fields of the data object that make it unique
Make a string out of those fields and concatenate
Do a .hashCode() on the result
This creates an int that can't be traced back to the original data. That's exactly what I want, but now for the funny part. The hashCodes can collide. My fear is that in very, very rare cases the hashCode will be the same, even though the original string was different!
Is my fear grounded? And how can I compose a better id?
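The fear can be checked directly: String.hashCode() returns only 32 bits, and "Aa" and "BB" are a well-known pair of different strings with the same hash code. One possible way to get an identifier with a negligible chance of accidental collision is to hash the two fields with SHA-256 instead. The following is a minimal sketch, not a definitive solution; the class name ObjectIds, the method idFor, and the parameter names are hypothetical. The length prefix is an assumption added so that ("ab", "c") and ("a", "bc") do not end up as the same input.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ObjectIds {
    /** Derives a 64-character hex identifier from the two fields that make an object unique. */
    static String idFor(String fieldA, String fieldB) {
        // Length-prefix the first field so ("ab", "c") and ("a", "bc") give different inputs.
        String input = fieldA.length() + ":" + fieldA + fieldB;
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Two different strings, same 32-bit hash code (both are 2112).
        System.out.println("Aa".hashCode() == "BB".hashCode());          // prints true
        // The SHA-256 based identifiers differ.
        System.out.println(idFor("Aa", "x").equals(idFor("BB", "x")));   // prints false
    }
}
```

If the two fields have few possible values, a plain digest can still be guessed by hashing every candidate, so whether this hides the original data well enough depends on the data.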

Related

Java and MySQL, why create class objects with data from the database?

So I have a homework assignment to build a console application in Java that uses MySQL databases.
Our teacher wants us to use objects and ArrayLists, as Java is an "object oriented language".
I am not sure why I must parse the MySQL ResultSets into objects instead of printing them directly in my methods in a nice printf format.
Similarly, why should I create class objects when inserting new data instead of altering the database directly through my connection?
Can someone please explain and point me in the right direction?
I don't need any code. Just trying to understand.

Printing a ResultSet with a single printf method is not a good approach: different result sets contain different numbers and types of columns.
A better way is to parse each row into an object. For each table you can have its own class with a constructor. For example:
You have a table 'Person' with the attributes name, age, and address. Then you create a constructor
public Person(String name, int age, String address){}
or you could create a static method that does the parsing, like this:
public static Person parseFromResultSet(ResultSet r) throws ParseException {}
and then you can even write your own 'PersonParseException'.
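The row-to-object idea can be sketched roughly as follows. This is an illustration under assumptions, not the answerer's actual code: the column labels "name", "age", and "address" are assumed to match the table, the helper readAll is hypothetical, and the method declares SQLException (which the ResultSet getters throw) rather than a custom parse exception.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class Person {
    final String name;
    final int age;
    final String address;

    Person(String name, int age, String address) {
        this.name = name;
        this.age = age;
        this.address = address;
    }

    /** Builds one Person from the row the ResultSet cursor currently points at. */
    static Person fromResultSet(ResultSet row) throws SQLException {
        // Column labels are assumptions; adjust them to the real table definition.
        return new Person(row.getString("name"), row.getInt("age"), row.getString("address"));
    }

    /** Reads every remaining row into a list of Person objects. */
    static List<Person> readAll(ResultSet rows) throws SQLException {
        List<Person> people = new ArrayList<>();
        while (rows.next()) {
            people.add(fromResultSet(rows));
        }
        return people;
    }
}
```

Once the rows are in a List<Person>, printing, sorting, or filtering them no longer depends on the database connection being open.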

"I am not sure why I must parse the mySQL ResultSets into objects, and not print them directly in a nice printf format. Similarly, why should I make class objects when inserting new data and not altering the database directly through my connection."
...mostly because in a real-world application you're likely to do a lot more with the records than just print them directly to the console, and at that point it's useful to put them into a solid data structure you can move around and manipulate easily.
It can also make the code easier to maintain - real-world apps get updated and altered lots of times usually, and may contain a lot of different data types and structures, some of which may be complex representations consisting of several other structures put together - and it's easier to build a complex object if you already have a lot of smaller objects to make it from (imagine building furniture from ready-made sections rather than starting by sawing all the planks and making all the screws yourself).
You can think of it as an intermediate layer between what's on screen and what's in storage - which, in more complicated applications, will not always be structured or displayed in the exact same way as each other, hence the need for an intermediary. Sometimes the display and the storage will not even be on the same computer (like in websites such as this one).
So I imagine the purpose of this assignment might be to get you used to using those structures, but without making the overall goal too complicated. The true purpose of the exercise is not always the obvious one.

Use the same object as both key and value in a HashMap

I've re-worded this search about as many times as I think I can and I've come up with nothing, so either this hasn't been asked before, or I don't know HOW to ask.
I am working on a personal project that boils down to trying to find the shortest path from a starting state to a finishing state.
There are too many (more than 2^64) states, so I can't generate the entire graph. However, each state contains enough information to determine all the states adjacent to it. There are (infinitely) many paths from one state to another, and I am only interested in the shortest. This requires me to know whether I've been to a state before, and also how I got there the first time.
My state object contains all the state information, as well as the depth of the path that led me there, and the move I made to get there from the previous state in that path. If I get to the same state following a different path, the state information will be the same, but the depth and previous move fields will be different.
I want a data structure that will tell me if I have visited that state before, and if I have, retrieve the depth and previous state information from it.
The best solution I have come up with so far is to use a HashMap which maps a state to a state, and use the same state object as both the key and value, as in: myHashMap.put(myState, myState)
I've implemented hashCode() and equals() such that two states will be considered "equal" if their state information is the same (i.e. we've been in this room before) regardless of how I got there (i.e. which door I used to enter the room).
This seems rather silly but I can't think of another way (with fast storage/access) to store the information about whether I've been to a state and how I got there.
Does my plan make sense, or is there a better way?
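The plan described in the question can be sketched as below. This is a minimal illustration, assuming a hypothetical SearchState class whose state information is a single String; equals() and hashCode() look only at that information, so a second arrival at the same state finds the entry stored on the first visit. Map.putIfAbsent does the check and the insert in one call.

```java
import java.util.HashMap;
import java.util.Map;

class SearchState {
    final String position;   // the state information (hypothetical representation)
    final int depth;         // length of the path that led here
    final String lastMove;   // move taken from the previous state on that path

    SearchState(String position, int depth, String lastMove) {
        this.position = position;
        this.depth = depth;
        this.lastMove = lastMove;
    }

    // Equality deliberately ignores depth and lastMove.
    @Override
    public boolean equals(Object other) {
        return other instanceof SearchState
                && position.equals(((SearchState) other).position);
    }

    @Override
    public int hashCode() {
        return position.hashCode();
    }
}

class VisitedStates {
    private final Map<SearchState, SearchState> firstVisits = new HashMap<>();

    /** Records the state if it is new; otherwise returns the state stored on the first visit. */
    SearchState recordOrGetFirstVisit(SearchState state) {
        SearchState earlier = firstVisits.putIfAbsent(state, state);
        return earlier == null ? state : earlier;
    }
}
```

Calling recordOrGetFirstVisit with a state reached by a longer path returns the original object, whose depth and lastMove describe how the state was first reached.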

You say that "I want a data structure that will tell me if I have visited that state before, and if I have, retrieve the depth and previous state information from it."
A Set (HashSet is likely better than a TreeSet for what you've described) will do the first part. Now, I see that you are trying to use a Map in order to do the second half which is retrieving the information. However, if you are already able to check if you've visited a state, that means you have a reference to the state. So, you don't need a map at all.
/* Marking a state as visited */
Set<State> visited = new HashSet<>();
visited.add(currentState);
/* Checking if visited/retrieving */
if (visited.contains(currentState)) {
    // already visited
} else {
    // do something with 'currentState'
}

If you find yourself storing the same object as both key and value, then a Set is what you really need.
Set<State> set = new HashSet<>();
set.add(state1);
And by the way, your solution is not very far from that: HashSet is backed by a HashMap behind the scenes, which means a HashSet has the same performance characteristics as a HashMap and relies on the equals and hashCode methods.
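A small sketch of the Set-based check, using hypothetical Room and VisitedRooms classes. Note that the insertion method on a Set is add, and that it returns a boolean saying whether the element was new, so a separate contains call is not needed.

```java
import java.util.HashSet;
import java.util.Set;

class Room {
    final String name;   // hypothetical stand-in for the state information

    Room(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof Room && name.equals(((Room) other).name);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }
}

class VisitedRooms {
    private final Set<Room> visited = new HashSet<>();

    /** Returns true the first time a room is seen and false on every later visit. */
    boolean markVisited(Room room) {
        // Set.add reports whether the element was newly inserted.
        return visited.add(room);
    }
}
```

One thing to keep in mind: HashSet has no method that returns the stored element equal to a given one, so reading the depth recorded on the first visit still requires a Map or a reference kept elsewhere.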

Difference between serialization and normal object storage?

Serialization is the process of converting an object stored in memory into a stream of bytes to be transferred over a network, stored in a DB, etc.
But isn't the object already stored in memory as bits and bytes? Why do we need another process to convert the object stored as bytes into another byte representation? Can't we just transmit the object directly over the network?
I think I may be missing something in the way the objects are stored in memory, or the way the object fields are accessed.
Can someone please help me in clearing up this confusion?

Different systems don't store things in memory in the same way. The obvious example is endianness.
Serialization defines a way by which systems using different in-memory representations can communicate.
Another important fact is that the requirements on in-memory and serialized data may be different: when in-memory, fast read (and maybe write) access is desirable; when serialized, small size is desirable. It is easier to create two different formats to fit these two use cases than it is to create one format which is good for both.
An example which springs to mind is LinkedHashMap: this basically stores two versions of the mapping when in memory (one to capture insertion order; one as a traditional hash map). However, you don't need both of these representations to reconstruct the same map from a serialized form: you only need the insertion order of key/value pairs. As such, the serialized form does not store the same data as the in-memory form.
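The endianness point can be illustrated with java.nio.ByteBuffer, which lets the byte order be chosen explicitly. In this sketch (the class name EndiannessDemo and the helper encode are hypothetical), the same int value produces two different byte sequences, which is why both sides have to agree on a format before bytes are exchanged.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

class EndiannessDemo {
    /** Encodes an int as 4 bytes using the given byte order. */
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode(1, ByteOrder.BIG_ENDIAN)));    // prints [0, 0, 0, 1]
        System.out.println(Arrays.toString(encode(1, ByteOrder.LITTLE_ENDIAN))); // prints [1, 0, 0, 0]
    }
}
```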

Serialization turns the pre-existing bytes from the memory into a universal form.
This is done because different systems allocate memory in different ways. Thus, we cannot ensure that the object can be saved directly from the memory on one machine and then be loaded back in properly into another, different machine.
Maybe you can find more information on this page of the Oracle docs.

Explanation of object serialization from book Thinking In Java.
When you create an object, it exists for as long as you need it, but under no circumstances does it exist when the program terminates. While this makes sense at first, there are situations in which it would be incredibly useful if an object could exist and hold its information even while the program wasn’t running. Then, the next time you started the program, the object would be there and it would have the same information it had the previous time the program was running. Of course, you can get a similar effect by writing the information to a file or to a database, but in the spirit of making everything an object, it would be quite convenient to declare an object to be "persistent," and have all the details taken care of for you.
Java’s object serialization allows you to take any object that implements the Serializable interface and turn it into a sequence of bytes that can later be fully restored to regenerate the original object. This is even true across a network, which means that the serialization mechanism automatically compensates for differences in operating systems. That is, you can create an object on a Windows machine, serialize it, and send it across the network to a Unix machine, where it will be correctly reconstructed. You don’t have to worry about the data representations on the different machines, the byte ordering, or any other details.
Hope this helps you.
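As a rough illustration of the mechanism described in the quoted passage, the sketch below writes an object to a byte array with ObjectOutputStream and reads it back with ObjectInputStream. The Note class and the SerializationRoundTrip helper are hypothetical names; in a real program the bytes would go to a file or socket instead of staying in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class Note implements Serializable {
    private static final long serialVersionUID = 1L;

    final String title;
    final List<String> tags;   // a referenced object, written along with the Note

    Note(String title, List<String> tags) {
        this.title = title;
        this.tags = new ArrayList<>(tags);
    }
}

class SerializationRoundTrip {
    /** Turns the object, and everything it references, into a byte sequence. */
    static byte[] toBytes(Serializable object) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(object);
        }
        return buffer.toByteArray();
    }

    /** Rebuilds an object from bytes produced by toBytes. */
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }
}
```

The object returned by fromBytes is a new instance with the same field values, not the original object.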

Let's go with that mindset: we take the object as is and send it as a byte array over the network. Another socket or HTTP handler receives that byte array.
Now, two things come to mind:
How many bytes should be sent?
What are these bytes? What class do these bytes represent?
You will have to provide this data as well, so for this action alone we need two extra steps.
Now, in C# and Java, as opposed to C++, objects are scattered throughout the heap, and each object holds references to the objects it contains, so now we have another requirement:
Recursively "catch" all the inner objects and pack them into the byte array
Now we have a packed byte array which represents some object hierarchy, and we need to tell the other side how to unpack this byte array back into the object plus the objects it holds, so:
Send information on how to unpack that byte array into an object hierarchy
Some things an object has cannot be sent over the network, such as functions, so now we have yet another step:
Strip away things that cannot be serialized, like functions
This process goes on and on; for every new solution you will find many new problems. Serialization is the process of taking that byte array you are talking about and turning it into something that can be handled in other environments, like networks or files.

avoiding calculation every time a class method is called

I don't know if the title is appropriate but this is a design question.
I am designing a Java class which has a method that does heavy calculation, and I am wondering whether there is a clean way to avoid this calculation every time the method is called. I know that the calling code can handle this, but should it always be the responsibility of the calling code?
To elaborate: I was writing a class for thousand-dimensional vectors with a method to calculate the magnitude. So every time this method is called, it calculates the magnitude over all the dimensions.

The concept you are looking for is called Memoization

Just cache the results in some structure internal to your class. Once the method is called, it looks if it has the previously calculated result in cache and returns it. In the other case it does the calculation and stores the result in cache. Be careful with the memory though.

Use a flag to indicate whether your vector has changed or not. If there is a change, the method should do a full calculation, or apply the calculation only to the changes, but you will need to be careful with the implementation of the rest of your class and make sure that the flag is properly set every time a value is modified.
The second method is to use a cache. This is done by storing previously calculated results and looking them up before doing the calculation. However, this method only works well if the key values of your objects don't vary much; otherwise you will end up using a lot of memory. In particular, if your key values are of type double, a lookup may never find a match, because two values that are nearly the same are still not exactly equal.
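The flag approach might look like the sketch below. CachedVector and its members are hypothetical names, the recomputations counter exists only to make the caching visible, and the class is not thread-safe as written.

```java
class CachedVector {
    private final double[] components;
    private double cachedMagnitude;
    private boolean magnitudeStale = true;   // true whenever the cache must be rebuilt
    int recomputations = 0;                  // demo-only counter of full calculations

    CachedVector(double[] components) {
        this.components = components.clone();
    }

    /** Changes one component and marks the cached magnitude as out of date. */
    void set(int index, double value) {
        components[index] = value;
        magnitudeStale = true;
    }

    /** Returns the Euclidean magnitude, recalculating only after a change. */
    double magnitude() {
        if (magnitudeStale) {
            double sumOfSquares = 0.0;
            for (double component : components) {
                sumOfSquares += component * component;
            }
            cachedMagnitude = Math.sqrt(sumOfSquares);
            magnitudeStale = false;
            recomputations++;
        }
        return cachedMagnitude;
    }
}
```

Because every mutation goes through set, there is exactly one place where the flag has to be raised; exposing the array directly would break that guarantee.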

If the "thousand dimensional vectors" are passed in the c'tor, you can calculate the magnitude in the c'tor and store it in a private member variable.
A few things to take care of:
If there are methods to add / delete vectors or contents of vectors then you need to update the magnitude in those methods.
If your class is supposed to be thread-safe then ensure appropriate write functions are atomic.

How often are the magnitudes changed? Is this immutable? How much of the interface for the vector do you control? Specifically, do you have any way to identify rotations or other magnitude-preserving transformations in your 1000 dimensional space? You could just store state for the magnitude, flag when the value changes, and recalculate only when necessary. If your transformations have nice internals, you might be able to skip the calculation based on that knowledge.

How to give each object in a document a unique ID?

I'm making a bitmap editor where a document consists of several layers where each layer represents a bitmap. Each layer must have a unique ID compared to all other layers that currently exist in the document. I also need to take into account that I need to save and load documents along with the layer IDs.
I'm using the command pattern to store actions that are performed on the document and the unique IDs are used to keep track of which layer an action should be performed on.
At the moment, I just keep a counter called X, when a new layer is created its ID is set to X then X is incremented. When loading, I need to make sure X is set to an appropriate number so that new layers are given unique IDs i.e. I could save the value of X and restore that, or set X based on the biggest layer ID loaded.
Given X is a 32-bit number, the user will need to create 4,294,967,296 layers working on the same file before IDs start to be reused which will cause weird behaviour. Should I implement a better unique ID system or is this generally good enough?
I'm in Java so I could use the UUID library which creates 128 bit unique IDs according to a standard algorithm. This seems overkill though.
Is there some general approach to this kind of problem?

This is perfectly good enough. Even at a rate of ten new layers per second, around the clock and every day of the year, which is silly, it would take roughly 13.6 years to use up all 2^32 IDs (2^32 / 10 is about 4.3 × 10^8 seconds).

If you think you might manipulate layers programmatically, and thus have some possibility of having 2^32 layers in the lifetime of the image, then throw all the layer IDs into a HashSet when you read the file, and update that set when you add/remove layers. (You don't need to explicitly store the set; the ID associated with each mask is enough.) Then instead of taking "the next number", take "the next number not in the set". Having a counter is still useful, but whether you set it to 0 or (max+1) when you read a file is immaterial; it won't take a significant amount of time to find an empty space unless you envision bafflingly large numbers of layers present simultaneously.
But since you're using Java, you could just use a long instead of an int, and then you wouldn't ever (practically speaking) be able to overflow the number even if all you did was create a one-pixel mask and destroy it over and over again.
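A counter based on long, restored from the largest loaded ID as the question suggests, could be sketched like this. LayerIdAllocator and its methods are hypothetical names, and the sketch assumes IDs are non-negative.

```java
import java.util.Collection;

class LayerIdAllocator {
    private long nextId = 0;

    /** Creates an allocator for a new, empty document. */
    LayerIdAllocator() {
    }

    /** Creates an allocator whose next ID is larger than every ID found in a loaded document. */
    static LayerIdAllocator afterLoading(Collection<Long> loadedIds) {
        LayerIdAllocator allocator = new LayerIdAllocator();
        for (long id : loadedIds) {
            allocator.nextId = Math.max(allocator.nextId, id + 1);
        }
        return allocator;
    }

    /** Hands out the next unused ID. */
    long allocate() {
        return nextId++;
    }
}
```

With 2^63 non-negative values available, creating a million layers per second would still take hundreds of thousands of years to exhaust the counter.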
