Using Strings vs POJOs ...which one consumes more memory - java

I have a database column which contains varchar data type. I would like to store this data in an arraylist for comparison with another list. I can think of two ways to do this. One is get the data and assign it to a String and store it in the arraylist. The second method would be to have a POJO , which would have a getter and setter method for this variable and store it in the POJO which in turn is stored in an arraylist. When I have to compare it against another variable, I have to either do a String comparison OR get it out of a POJOs getter method and then compare. While I feel that using the String reduces a lot of code, I would like to know if using one(String) over the other(POJO) has any memory implication. I usually need to compare around 1000 objects. So which one would consume less memory....in other words which one would be much faster to run....better performance. I need to use JAVA 1.4.

In Java, every object you create takes up space on the heap. In the POJO case you pay for the POJO itself (its object header plus a reference field) in addition to the String it holds, and every comparison goes through an extra getter call. Since the POJO adds nothing here, I would prefer plain Strings.
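A minimal sketch of the plain-String approach, assuming the column values have already been read into a list. The class and method names are made up for illustration, and the generics shown here would have to be replaced by raw types and casts on Java 1.4:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ColumnCompare {
    // Returns the values from dbValues that also occur in other,
    // in the order they appear in dbValues.
    static List<String> common(List<String> dbValues, List<String> other) {
        // A HashSet gives constant-time (on average) membership checks,
        // so ~1000 values need ~1000 lookups rather than 1000 x 1000 comparisons.
        Set<String> lookup = new HashSet<>(other);
        List<String> result = new ArrayList<>();
        for (String value : dbValues) {
            if (lookup.contains(value)) {
                result.add(value);
            }
        }
        return result;
    }
}
```

No wrapper object is needed because String already implements equals() and hashCode().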

Related

Java json object size: JsonNode vs. String

I work on a java application that needs to hold ~50k json objects in memory.
Each json string is ~5000 characters long.
Extra memory consumption is my concern.
I want to compare the json objects later, but processing is not my concern, only extra memory consumption.
What is more efficient:
Keep json as java String
Keep json as Jackson JsonNode object
I tried serializing the JsonNode objects and the resulting files are smaller, than the string size - but I am not sure if the same is true in memory.
My use-case:
I need to detect changes to some objects, which are encoded as json. This change detection runs every minute and compares the current state with the last state (which we hold in memory).
There are no hooks or events or similar to get changes.
We already hold a list of these objects in memory - with only a limited subset of the json fields.
I cannot change that architecture.
Now instead of mapping json data to some Pojo and comparing each property manually, the idea is to hold the json string/objects and then calculate the diff/patch with some library.
This simplifies the logic a lot and is more generic - but we are worried about the extra memory consumption.
You can use the getObjectSize() method of the Instrumentation interface in the java.lang.instrument package to get an approximate size for the objects in both representations.
long getObjectSize(Object objectToSize)
From the javadoc:
Returns an implementation-specific approximation of the amount of storage consumed by the specified object. The result may include some or all of the object's overhead, and thus is useful for comparison within an implementation but not between implementations. The estimate may change during a single invocation of the JVM.
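One way this could be wired up is sketched below. An Instrumentation instance is only handed to a Java agent, so the sketch assumes the class is packaged in a jar whose manifest names it as Premain-Class and that the JVM is started with -javaagent. The class name SizeAgent and the helper shallowSizeOf are hypothetical. Note that on HotSpot the reported value is the shallow size of the one object, so a String's backing array or a JsonNode's children would have to be measured separately and summed.

```java
import java.lang.instrument.Instrumentation;

// Hypothetical agent class; when packaged as a real agent, declare it public
// in its own file and list it as Premain-Class in the jar manifest.
class SizeAgent {
    private static volatile Instrumentation instrumentation;

    // Invoked by the JVM before main() when started with -javaagent:<agent jar>.
    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }

    // Approximate size of this single object; referenced objects are not followed.
    static long shallowSizeOf(Object o) {
        Instrumentation inst = instrumentation;
        if (inst == null) {
            throw new IllegalStateException("Agent not loaded; start the JVM with -javaagent");
        }
        return inst.getObjectSize(o);
    }
}
```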

How to efficiently make Wrapper for MT20x Message

I planned to make a wrapper to Swift MT203, MT204 messages.
Message Structure as follows,
MT203 -
2 mandatory sequences, where the first one occurs once and the second one can occur two to ten times. Each sequence can contain mandatory fields and optional fields.
MT204 -
2 mandatory sequences, where the first one occurs once and the second one can occur more than once. Each sequence can contain mandatory fields and optional fields.
[References for the MT203 and MT204]
https://www2.swift.com/knowledgecentre/publications/usgf_20180720/1.0?topic=finmt203.htm
https://www2.swift.com/knowledgecentre/publications/usgf_20180720/1.0?topic=finmt204.htm
Which data structure is better for storing the second sequence in each case?
I would prefer an array for MT203, since I know the maximum size of the second sequence, but for MT204 I am not sure whether an array or an ArrayList is the better choice.
During unpacking we have to read the fields one after another, but not all fields of the second sequence are mandatory.
[Please also comment if my first choice of an array is not valid.]
I think you'd do quite fine with either data structure.
Having said that, there are some things you might want to consider: you can make an ArrayList (like any other list) immutable. That will prevent unwanted modification of the contents. This might be very interesting when you pass these message objects around and want to prevent someone else from modifying the message accidentally. There are many ways to make a list immutable, such as Collections.unmodifiableList(myArrayList) (a read-only view of the list) or Guava's ImmutableList.copyOf(myArrayList).
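As a rough illustration of the read-only idea using only the JDK, the sketch below copies the parsed sequences and exposes them through Collections.unmodifiableList. The class name SequenceHolder and the use of String for a sequence are assumptions made for the example:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SequenceHolder {
    private final List<String> sequences;

    SequenceHolder(List<String> parsed) {
        // Copy first so later changes to the caller's list cannot leak in,
        // then wrap the copy so nobody can modify it through the getter.
        this.sequences = Collections.unmodifiableList(new ArrayList<>(parsed));
    }

    List<String> getSequences() {
        return sequences; // add/remove/set on this list throw UnsupportedOperationException
    }
}
```

The defensive copy matters: unmodifiableList alone is only a view, so changes made to the original list would still show through.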
Having said that, I believe that there are more important considerations than features of lists over features of array:
First of all, I would consider having them both use the same data structure - especially if both messages are used in the same part of the codebase, it's going to be very confusing if one message type is an array, while the other one is a list. This might ultimately become a pain in the back as both messages will have to be handled differently. e.g. if you want to log messages - you'll have to do that differently for lists vs arrays.
Secondly, I would recommend modelling each of these messages as a class. That class would (obviously) use an array or a list internally to store the message data, but it would also give higher-level semantic access to the contents of the message.
Say you wanted the ValueDate of MT203 (field index 1): you'd always need to call dateFormat.parse(message[1]) for that, and everyone would need to remember what index 1 was and how to parse the date string into an actual date object. If you had a class like this:
class MultipleGeneralFinancialInstitutionTransfer {
    private List<String> messageData;

    /** constructor... */

    public Date getValueDate() {
        return parseDate(messageData.get(1)); // imagine parseDate being a method to parse the actual format
    }
}
it would be much more convenient to work with that message - and nobody would need to remember the actual format of that message.
I. Size: Array in Java is fixed in size. We can not change the size of array after creating it. ArrayList is dynamic in size. When we add elements to an ArrayList, its capacity increases automatically.
II. Performance: In Java Array and ArrayList give different performance for different operations.
add() or get(): Adding an element to or retrieving an element from an array or ArrayList object has similar performance. These are constant time operations.
Resizing: Automatic resizing of an ArrayList costs time. An ArrayList is internally backed by an array; when that array is full, a larger one is allocated and the elements are copied from the old array to the new one. There is no public resize() method; this happens inside add().
III. Primitives: Array can contain both primitive data types as well as objects. But ArrayList can not contain primitive data types. It contains only objects.
IV. Iterator: An ArrayList can be traversed with an Iterator object, an enhanced for loop, or an index-based for loop. An array has no Iterator; we use a for loop (indexed or enhanced) to iterate over its elements.
V. Type Safety: Java ensures the type safety of elements in an ArrayList at compile time by using generics. An array can contain only objects of its declared component type (or a subtype). If we try to store an object of a different type in an array, an ArrayStoreException is thrown at runtime.
VI. Length: Size of ArrayList can be obtained by using size() method. Every array object has length variable that is same as the length/size of the array.
VII. Adding elements: In an ArrayList we can use add() method to add objects. In an Array assignment operator is used for adding elements.
VIII. Multi-dimension: An array can be multi-dimensional. An ArrayList is always single-dimensional, although it can hold other lists as its elements.
Now you can choose whichever better fits your needs.
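Two of the points in the list above, the runtime type check on arrays and the cost of automatic resizing, can be sketched as follows. The class and method names are illustrative only, and treating a null slot as an absent optional entry is an assumption:

```java
import java.util.ArrayList;
import java.util.List;

class ArrayVsList {
    // Arrays are covariant, so this compiles but is rejected at runtime.
    static boolean storeWrongType() {
        Object[] slots = new String[2];
        try {
            slots[0] = Integer.valueOf(1); // throws ArrayStoreException
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    // Passing an initial capacity avoids repeated internal re-allocation
    // when an upper bound on the number of elements is known in advance.
    static List<String> collect(String[] source) {
        List<String> out = new ArrayList<>(source.length);
        for (String s : source) {
            if (s != null) { // skip slots that were never filled
                out.add(s);
            }
        }
        return out;
    }
}
```

With generics, the equivalent mistake on a List<String> (adding an Integer) would not compile at all.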

Multiple keys pointing/refering to same Object in values in HashMap

I have a HashMap whose key is an Object (with 2 String objects as member variables) and whose value is an Object containing 3 different Strings.
Say:
Map<ReqDTO , RespDTO> map = new HashMap<ReqDTO ,RespDTO> ();
suppose I have following values :
KEY VALUE
1 ("str1","1") - ("1","2","3")
2 ("str2","2") - ("a","b","c")
3 ("str3","3") - ("1","2","3")
4 ("str4","4") - ("v","b","g")
5 ("str5","5") - ("1","2","3")
When I have thousands of such records (this is a cache in my application), the VALUE part of records 1, 3 and 5 occupies the memory of 3 separate objects. I want the KEYS of records 1, 3 and 5 to point to the same instance of the VALUE (1,2,3 in this case) rather than to separate copies.
Is there a variant of HashMap for this, or would another data structure do?
NOTE: It is loaded only once and all the operations performed on it are READ only.
Which data structure should I prefer to make reads fast? In other words, insertion is allowed to be costly.
You could use a technique called interning, which essentially maps all objects that are equals() to each other to a single authoritative instance.
That's used in Java for Strings using String.intern().
But there are some drawbacks to using this method ('though they have been reduced quite a lot with modern JVMs). As an alternative you can use the Guava interface Interner.
Just create a single Interner using the Interners helper class:
Interner<String> strInterner = Interners.newStrongInterner();
and pass each String value through the interner before using it in a key or value:
String v1 = strInterner.intern(param1);
This way for any given value, you'll only ever use 1 String instance. The same can be done for any other class (as long as it correctly implements equals() and is immutable).
You can even discard the Interner after you've constructed the map.
Well, if you put the same object into the map for both keys, then they'll both be the same object. If you have different instances of the object that are .equals() to each other, it gets more interesting. You could try using Flyweight for your value objects, or you could walk through the values() of the map - if you find an equals() value object, put your key with that object instead of the one passed in.
Someone, somewhere has probably already written a Map implementation that does what you want, but my best recommendation there is to use Google and hope they're good at SEO.
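The interning and flyweight suggestions above boil down to keeping one canonical instance per distinct value. A hand-rolled sketch using only the JDK might look like this; Canonicalizer is a made-up name, not a Guava or JDK class, and it assumes the value type implements equals() and hashCode() correctly and is never mutated afterwards:

```java
import java.util.HashMap;
import java.util.Map;

class Canonicalizer<T> {
    private final Map<T, T> pool = new HashMap<>();

    // Returns the first instance seen that equals() the candidate,
    // registering the candidate itself if no equal instance exists yet.
    T canonical(T candidate) {
        T existing = pool.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }
}
```

While loading the cache, each value would be passed through canonical() before being put into the map, for example map.put(key, canonicalizer.canonical(value)). Since the cache is loaded once and then only read, the Canonicalizer can be dropped after loading.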

String and Character Array in Java

I am a student who has just shifted from C++ to Java.
In Java what could be the main reason for defining separate data types for Strings and Char arrays? What is the difference between the two?
Since I have only studied C++, up till now I was under the impression that they are the same thing. Please clarify if possible.
String is immutable. A char array is not. A String is implemented with a char array underneath, but every time you try to modify it (with concatenation, replace, etc.) you get a new String object.
So a String behaves like a constant char array, but comes with some syntactic sugar that makes it much easier to use. For example, the + operator also serves as the string concatenation operator.
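A small sketch of that difference; the class and method names are illustrative only:

```java
class StringVsChars {
    static String demo() {
        String s = "java";
        String upper = s.toUpperCase(); // returns a new String; s itself is unchanged

        char[] chars = s.toCharArray(); // an independent copy of the characters
        chars[0] = 'l';                 // arrays can be modified in place

        return s + " " + upper + " " + new String(chars);
    }
}
```

Calling demo() returns "java JAVA lava": the original String is untouched by toUpperCase(), while the char array was changed in place.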
In Java, String is a basic system class that essentially wraps a char[]. There are several reasons why, for most uses, having a full class is preferable to directly handling arrays:
Strings are immutable; once you have a reference to some String, you know it's never going to change.
Strings provide useful methods that a bare array couldn't, such as length(), and have clearly-defined comparison semantics.
You never have to deal with string termination yourself.
Java has a special exception for the rule of "no operator overloading" to support string concatenation (with +).
Essentially, it's good OO practice to use a class to collect the desired behavior and the data structures in the same place, and String wraps up an array of characters with the useful operations that you want to perform on a string.
String is a class in Java and offers you methods and is also an Object.
A String-object is also immutable.
Internally, the value is held in a char array (a byte array since Java 9).
There is a semantic difference. Just because data is stored the same way, this doesn't mean it's the same thing. Dates and Amounts may also have the same internal representation (long for a timestamp or fixed point amount of cash), but they're not the same. The char array could as well mean a 16-bit image.
In object orientation, it's good practice to model objects based on what they are and what they can do, not on how they internally store their data. This allows you to encapsulate the data (and restrict or control access with getters/setters, for example for observer support, or even make the internal representation immutable or poolable), and provide appropriate methods for your objects.
String is immutable in Java, and string literals and interned strings are kept in the String pool. Once a String is created, it stays in memory until it is garbage collected. Because a String is immutable, a password held in one cannot be erased and remains readable as plain text, so there is a greater risk that someone who obtains a memory dump can find the password.
A char array, on the other hand, is an ordinary mutable object on the heap, so you can overwrite it with dummy values once you are done with it.
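A minimal sketch of the overwrite idea. The check itself is hypothetical (a real application would compare against a stored hash rather than a plain-text expected value); the point is only the Arrays.fill call in the finally block:

```java
import java.util.Arrays;

class PasswordHandling {
    // Hypothetical check, for illustration only.
    static boolean check(char[] password, char[] expected) {
        try {
            return Arrays.equals(password, expected);
        } finally {
            // Overwrite the caller's array so the characters do not linger on the heap.
            Arrays.fill(password, '\0');
        }
    }
}
```

The same cannot be done with a String, because there is no supported way to change its contents after creation.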
The advantage to using the string object is all the methods available to it.
For example:
stringExample1.equals(stringExample2);
String stringExample3 = stringExample1.replace(substring1, substring2);

Is it better to use a custom Java object or to use a HashMaps to store data

Lets say we have a bunch of data (temp,wind,pressure) that ultimately comes in as a number of float arrays.
For example:
float[] temp = //get after performing some processing (takes time)
float[] wind =
Say we want to store these values in memory for different hours of the day. Is it better to put these on a HashMap like:
HashMap maphr1 = new HashMap();
maphr1.put("temp",temp);
maphr1.put("wind",wind);
...
Or is it better to create a Java object like:
public class HourData {
    private float[] temp, wind, pressure;
    // getters and setters for above!
}
...
// use it like this
HourData hr1 = new HourData();
hr1.setTemp(temp);
hr1.setWind(wind);
Out of these two approaches which is better in terms of performance, readability, good OOP practice etc
You're best off having an HourData class that stores a single set of temperature, wind, and pressure values, like this:
public class HourData {
private float temp, wind, pressure;
// Getters and setters for the above fields
}
If you need to store more than one set of values, you can use an array, or a collection of HourData objects. For example:
HourData[] hourDataArray = new HourData[10000];
This is ultimately much more flexible, performant, and intuitive to use than storing the arrays of data in your HourData class.
Flexibility
I say that this approach is more flexible because it leaves the choice of what kind of collection implementation to use (e.g. ArrayList, LinkedList, etc.) to users of the HourData class. Moreover, if he/she wishes to deal just with a single set of values, this approach doesn't force them to deal with an array or collection.
Performance
Suppose you have a list of HourData instances. If you used three float arrays in the way that you described, then accessing the i'th temp, wind, and pressure values may cause three separate pages to be accessed in memory. This happens because all of the temp values will be stored contiguously, followed by all of the wind values, followed by all of the pressure values. If you use a class to group these values together, then accessing the i'th temp, wind, and pressure values will be faster because they will all be stored adjacent to each other in memory.
Intuitive
If you use a HashMap, anyone who needs to access any of the fields will have to know the field names in advance. HashMap objects are better suited to key/value pairs where the keys are not known at compile time. Using an HourData class that contains clearly defined fields, one only needs to look at the class API to know that HourData contains values for temp, wind, and pressure.
Also, getter and setter methods for array fields can be confusing. What if I just want to add a single set of temp, wind, and pressure values to the list? Do I have to get each of the arrays, and add the new values to the end of them? This kind of confusion is easily avoided by using a "wrapper" collection around an HourData that deals only with single values.
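The single-value HourData plus a surrounding collection could be sketched like this. Making the class immutable (constructor and getters only) is a choice made for this example rather than something the answer prescribes, and HourDataDemo.fromArrays is a hypothetical helper that converts the three parallel arrays from the question:

```java
import java.util.ArrayList;
import java.util.List;

final class HourData {
    private final float temp;
    private final float wind;
    private final float pressure;

    HourData(float temp, float wind, float pressure) {
        this.temp = temp;
        this.wind = wind;
        this.pressure = pressure;
    }

    float getTemp() { return temp; }
    float getWind() { return wind; }
    float getPressure() { return pressure; }
}

class HourDataDemo {
    // Turns three parallel arrays into one list of grouped readings.
    static List<HourData> fromArrays(float[] temp, float[] wind, float[] pressure) {
        if (temp.length != wind.length || temp.length != pressure.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        List<HourData> readings = new ArrayList<>(temp.length);
        for (int i = 0; i < temp.length; i++) {
            readings.add(new HourData(temp[i], wind[i], pressure[i]));
        }
        return readings;
    }
}
```

Adding one more set of values is then a single readings.add(new HourData(...)) call, with no need to touch three separate arrays.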
For readability I would definitely go for an object, since it makes more sense, especially since you store different data collections: the wind floats have a different meaning from the temp floats.
Besides this, you can also store other information, like the location and time of your measurement.
If you don't have any key to differentiate instances of the same object, I would create HourData objects and store them in an ArrayList.
Putting data in a containing object always increases readability.
You have mentioned a bunch of data, so I read it as a collection of data.
So the answer is: if something is already available out of the box in the Java collections framework, why write your own?
You should look at the Java collection classes and see which one fits your requirements best, whether that is concurrent access, fast retrieval, fast insertion, etc.
Hope this helps
EDIT----
Adding one more dimension to this.
The type of application you are building also affects your approach.
The above discussion rightly mentions readability, flexibility , performance as driving criteria for your design.
But the type of application you are building is also one of the influencing factors.
For example, Lets say you are building a web application.
An object that is kept in memory for a long time would live in either application or session scope, so you will have to make it immutable by design or use it in a thread-safe manner.
The business data, which stays the same across different implementations, should be designed according to OOP best practices, but the infrastructure or application logic should be driven more by your framework.
I feel that what you are describing, keeping an object in memory for a long time, is more of a framework-driven concern, hence I suggested using the Java collections and putting your business objects inside them. The important points are
Concurrent Access Control
Immutable by design
If you have a limited and already defined list of parameters then it's better to use the second approach.
In terms of performance: you don't need to search for key in hashmap
In terms of readability: data.setTemp(temp) is better than map.put("temp", temp). One of the benefits of the second approach is that typing errors are caught at compile time.
In terms of good OOP practices: first approach has nothing to do with OOP practices. Using the second approach you can easily change the implementation, add new methods, provide several alternative data object implementations, etc.
But you might want to use collections if you don't know the parameters and if you want to work with uncategorized(extensible) set of parameters.
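The compile-time point can be seen in a small sketch. KeyTypoDemo is a made-up name, and the misspelled key "tmep" is deliberate:

```java
import java.util.HashMap;
import java.util.Map;

class KeyTypoDemo {
    static boolean typoGoesUnnoticed() {
        Map<String, float[]> hour = new HashMap<>();
        hour.put("temp", new float[] {21.5f});
        // The misspelled key compiles fine and silently yields null at runtime.
        return hour.get("tmep") == null;
    }
}
```

Here typoGoesUnnoticed() returns true. With a data class, the equivalent mistake (calling a getter that does not exist, such as getTmep()) is rejected by the compiler.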
