Java performance insights, refactoring existing code

I have code like the snippet below. The objective is to convert the output of the data layer to a generic data format so it can be used by other layers.
During my load-test runs I am observing this method consuming a considerable percentage of CPU time, although compared to the overall time it still looks manageable.
But I am worried it might blow up under stress testing, so I am thinking of refactoring to remove the ObjectMapper usage and convert by iteration instead.
I am not really a fan of converting an object to a String and then back to another object structure. My belief is that avoiding the conversions (Object --> String --> Object) will save time and memory. Is this a correct assumption?
ObjectMapper om = new com.fasterxml.jackson.databind.ObjectMapper();
// grid is a com.fasterxml.jackson.databind.node.ArrayNode
String listGridData = om.writeValueAsString(grid);
List<Map<String, Object>> responseList = om.readValue(listGridData,
        new TypeReference<List<Map<String, Object>>>() {});

The ObjectMapper is something you want to re-use; there should be one instance per application. The ObjectMapper is thread-safe and there is no problem reusing it.
The second line is not an issue from a performance point of view; with or without it, the cost is almost the same.
The performance of the third line depends on the complexity of the object into which you are parsing the String. The more complex it is, the more references will be initialised and the more memory it will occupy, which affects performance.
In short, what matters is how lightweight the target object in the third step is, together with memory management and the re-use of the ObjectMapper.
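That said, if you want to avoid the Object --> String --> Object round trip from the question entirely, Jackson can convert between representations in memory via ObjectMapper.convertValue, skipping the intermediate JSON string. A minimal sketch, assuming grid is the ArrayNode from the question:

import java.util.List;
import java.util.Map;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ArrayNode;

public class GridConverter {
    // One shared, thread-safe instance for the whole application.
    private static final ObjectMapper OM = new ObjectMapper();

    static List<Map<String, Object>> toGenericFormat(ArrayNode grid) {
        // Walks the tree and builds the target structure directly,
        // without serializing to a String first.
        return OM.convertValue(grid, new TypeReference<List<Map<String, Object>>>() {});
    }
}

convertValue still traverses the whole structure, so measure it, but it removes the String allocation and the parse step.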

Related

Does it lead to data corruption when we access an outside object in a Java 8 map function?

I have the object customerSummary on line #2 and I access it on lines #11 & #12 (marked in the code below). Does this lead to data corruption in production?
private CustomerSummary enrichCustomerIdentifiers(CustomerSummaryDTO customerSummaryDTO) {
    CustomerSummary customerSummary = customerSummaryDTO.getCustomerSummary(); // line #2
    List<CustomerIdentifier> customerIdentifiers = customerSummary
            .getCustomerIdentifiers().stream()
            .peek(customerIdentifier -> {
                if (getCustomerReferenceTypes().contains(customerIdentifier.getIdentifierType())) {
                    customerIdentifier.setRefType(RefType.REF.toString());
                } else {
                    customerIdentifier.setRefType(RefType.TAX.toString());
                    Country country = new Country();
                    country.setIsoCountryCode(customerSummary.getCustomerAddresses().get(0).getIsoCountryCode()); // line #11
                    country.setCountryName(customerSummary.getCustomerAddresses().get(0).getCountryName());       // line #12
                    customerIdentifier.setCountry(country);
                }
            }).collect(Collectors.toList());
    customerSummary.setCustomerIdentifiers(customerIdentifiers);
    return customerSummary;
}
The literal answer to your question is No ... assuming that the access is thread-safe.
But your code probably doesn't do what you think it does.
The peek() method returns the precise stream of objects that it is called on. So your code is effectively doing this:
summary.setCustomerIdentifiers(
        new SomeListClass<>(summary.getCustomerIdentifiers()));
... while doing some operations on the identifier objects.
You are (AFAIK unnecessarily) copying the list and reassigning it to the field of the summary object.
It would be simpler AND more efficient to write it as:
for (CustomerIdentifier id : summary.getCustomerIdentifiers()) {
    if (getCustomerReferenceTypes().contains(id.getIdentifierType())) {
        id.setRefType(RefType.REF.toString());
    } else {
        id.setRefType(RefType.TAX.toString());
        Country country = new Country();
        Address address = summary.getCustomerAddresses().get(0);
        country.setIsoCountryCode(address.getIsoCountryCode());
        country.setCountryName(address.getCountryName());
        id.setCountry(country);
    }
}
You could do the above using a list.stream().forEach(), or a list.forEach(), but the code is (IMO) neither simpler nor substantially more concise than a plain loop:
summary.getCustomerIdentifiers().forEach(id -> {
    if (getCustomerReferenceTypes().contains(id.getIdentifierType())) {
        id.setRefType(RefType.REF.toString());
    } else {
        id.setRefType(RefType.TAX.toString());
        Country country = new Country();
        Address address = summary.getCustomerAddresses().get(0);
        country.setIsoCountryCode(address.getIsoCountryCode());
        country.setCountryName(address.getCountryName());
        id.setCountry(country);
    }
});
(A final micro-optimization would be to declare and initialize address outside of the loop.)
Java 8 streams are not the solution to all problems.
The direct answer to your question is a resounding 'no', but you're misusing streams, which is presumably part of why you are even asking this question. You're operating on mutables in stream code, which you shouldn't be doing; that's why I'm saying 'misusing'. This code compiles and works, but it leads to hard-to-read, hard-to-maintain code that will fail in weird ways as you use more and more of the stream API. The solution is not to go against the grain so much.
You're also engaging in stringly typing, which is another style mistake.
Finally, your collect call is misleading.
So, to answer the question:
Does it lead to data corruption in production?
No. How would you imagine it would?
Style mistake #1: mutables
Streams don't work nearly as well when you're working with mutables. The general idea is that you have immutable classes (classes without any setters; instances of these classes cannot change after construction). String is immutable, so is Integer, and so is BigDecimal. There is no .setValue() on an Integer instance, there is no setChar() on a String, or even a clear() or an append(); all operations on immutables that appear to modify things actually return a new instance that contains the result of the operation. someBigDecimal.add() doesn't change what someBigDecimal is pointing at; it constructs a new BigDecimal instance and returns that.
With immutables, if you want to change things, Stream's map method is the right one to use: For example, if you have a stream of BigDecimal objects and you want to, say, print them all, but with 2.5 added to them, you'd be calling map: You want to map each input BigDecimal into an output BD by asking the BD instance to make a new BD instance by adding 2.5 to itself.
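As a tiny sketch of that idea (the variable names here are made up for the example):

import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;

List<BigDecimal> values = Arrays.asList(new BigDecimal("1.0"), new BigDecimal("7.25"));
values.stream()
        .map(bd -> bd.add(new BigDecimal("2.5"))) // returns a NEW BigDecimal; bd itself is untouched
        .forEach(System.out::println);            // prints 3.5 and 9.75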
With mutables, both map and peek are more relevant, and style debates are rife on what to do. peek just lets you witness what's going through a stream pipeline. It can be misleading because stream pipelines don't process anything until you stick a terminal operation on the end (something like collect, or max()). When talking about mutables, peek in theory works just as well as map does, and some (evidently including IntelliJ's auto-suggest authors) believe that a map operation which really just mutates the underlying object in the stream and returns the same reference is a style violation and should be replaced with a peek operation instead.
But the far more relevant observation is that stream operations should not be mutating anything at all. Do not call setters.
You have 2 options:
Massively refactor this code: make CustomerIdentifier immutable (get rid of the setters, make all fields final, consider adding with-ers and builders and the like), and change your peek code to something like:
.map(identifier -> {
    if (....) return identifier.with(RefType.REF);
    return identifier.withCountry(new Country(summary.get..., summary.get...));
})
Note that Country also needs this treatment.
Do not use streams.
This is much simpler. This code is vastly less confusing and better style if you just write a for-each loop. I have no idea why you thought streams were appropriate here. Streams are not 'better'. A problem is that adherents of functional style are so incredibly convinced they are correct that they spread copious FUD (Fear, Uncertainty, Doubt) about non-functional approaches and strongly insinuate that functional style is 'just better'. This is not true - it's merely a different style that is more suitable to some domains and less to others. This style goes a lot further than just 'turn for loops into streams', and unawareness of what 'functional style' really means just leads to hard-to-maintain, hard-to-read, weird code like what you pasted.
I really, really want to use streams here
This is just a bad idea here (unless you do the full rewrite to immutables), but if you MUST, the actual right answer is not what IntelliJ said; it's to use forEach, which is peek and the terminal operation in one package. It gets rid of the pointless collect call (which just recreates a list that is 100% identical to what customerSummary.getCustomerIdentifiers() returns) and properly represents what is actually happening: you are NOT writing code that witnesses what flows through the stream pipe, you are writing code that you intend to execute on each element in the stream.
But that's still much worse than this:
CustomerSummary customerSummary = customerSummaryDTO.getCustomerSummary();
for (CustomerIdentifier customerIdentifier : customerSummary.getCustomerIdentifiers()) {
    if (getCustomerReferenceTypes().contains(customerIdentifier.getIdentifierType())) {
        customerIdentifier.setRefType(RefType.REF.toString());
    } else {
        customerIdentifier.setRefType(RefType.TAX.toString());
        Country country = new Country();
        country.setIsoCountryCode(customerSummary.getCustomerAddresses().get(0).getIsoCountryCode());
        country.setCountryName(customerSummary.getCustomerAddresses().get(0).getCountryName());
        customerIdentifier.setCountry(country);
    }
}
return customerSummary;
Style mistake #2: stringly typing
Why isn't the refType field in CustomerIdentifier just RefType? Why are you converting RefType instances to strings and back?
DB engines support enums, and if they don't, the in-between layer (your DTO) should support marshalling enums into strings and back.
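For illustration, a hypothetical mapping using standard JPA annotations (this assumes the DTO is persisted with JPA; the class shape is made up):

import javax.persistence.EnumType;
import javax.persistence.Enumerated;

public class CustomerIdentifier {
    // Stored as the string "REF" or "TAX" in the database,
    // but remains a type-safe RefType everywhere in Java code.
    @Enumerated(EnumType.STRING)
    private RefType refType;

    public RefType getRefType() { return refType; }
}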

Hazelcast Java Serialization/Deserialization ArrayList Pitfall

I've switched from Memcached to Hazelcast. After a while I noticed that the size of the cache was bigger than usual; I saw this in the Management Center.
So I did the following:
1. Before calling IMap.set(key, value(ArrayList)), I serialize the value to a file; it has a size of 128 KB.
2. After IMap.set() is called, I IMap.get() the same entry, which suddenly has a size of 6 MB.
The object in question contains many objects which are referenced multiple times within the same structure.
I've opened the two binary files and I've seen that the 6 MB file contains a lot of duplicated data; the serialization used by Hazelcast somehow makes copies of the shared references.
All the classes instantiated for the cache are Serializable, except the enums.
Using Memcached, the value size is 128 KB in both cases.
I've tried Kryo with Hazelcast and there was not really a difference; still over 6 MB.
Has anyone had a similar problem with Hazelcast? If yes, how did you solve it without changing the cache provider?
I can provide the object structure and try to reproduce it with non-sensitive data if anyone needs it.
I am not claiming this is the definitive answer, but after a lost day I finally came up with a workaround. I cannot say whether this behaviour is a feature or a problem that should be reported.
Anyway, in Hazelcast, if you put a value into an IMap as an ArrayList, it will be serialized entry by entry. This means that if we have 100 entries referencing the same 6 KB instance A, we end up with 600 KB in Hazelcast. The short raw test code below demonstrates this.
To work around or avoid this with Java serialization, wrap the ArrayList in an object; this does the trick.
(Only with Serializable; no other implementations.)
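For illustration, a minimal wrapper along those lines (the class name is hypothetical; the point is that plain Java serialization then writes the wrapped list as a single object graph, so repeated references to the same instance are written only once):

import java.io.Serializable;
import java.util.ArrayList;

public class DataListWrapper implements Serializable {
    private static final long serialVersionUID = 1L;
    private final ArrayList<Data> datas;

    public DataListWrapper(ArrayList<Data> datas) {
        this.datas = datas;
    }

    public ArrayList<Data> getDatas() {
        return datas;
    }
}

With that, the put becomes client.getMap("data").put("LEO", new DataListWrapper(datas)).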
@Test
public void start() throws Exception {
    HazelcastInstance client = produceHazelcastClient();
    Data data = new Data();
    ArrayList<Data> datas = new ArrayList<>();
    // 1000 references to the SAME instance
    IntStream.range(0, 1000).forEach(i -> datas.add(data));
    writeFile(datas, "DataLeoBefore", "1");
    client.getMap("data").put("LEO", datas);
    Object redeserialized = client.getMap("data").get("LEO");
    writeFile(redeserialized, "DataLeoAfter", "1");
}

public void writeFile(Object value, String key, String fileName) {
    try {
        Files.write(Paths.get("./" + fileName + "_" + key), SerializationUtils.serialize((ArrayList) value));
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Hazelcast can be configured to use several different serialization schemes; Java serialization (the default) is the least efficient in terms of both time and space. Typically choosing the right serialization strategy gives a bigger payoff than almost any other optimization you could do.
The reference manual gives a good overview of the different serialization schemes and the tradeoffs involved.
IMDG Reference Manual v3.11 - Serialization
I typically would go with IdentifiedDataSerializable if my application is all Java, or Portable if I needed to support cross-language clients or object versioning.
If you need to use Java serialization for some reason, you might check and verify that the shared-object property is set to true to avoid creating multiple copies of the same object. (That property can be set via the <enable-shared-object> element under <serialization> in hazelcast.xml, or programmatically through the SerializationConfig object.)
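A minimal sketch of the programmatic route (assuming the Hazelcast 3.x API from the manual linked above):

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

Config config = new Config();
// Ask Java serialization to honour shared references,
// so the same instance is not written out repeatedly.
config.getSerializationConfig().setEnableSharedObject(true);
HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);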

Groovy overhead of def keyword

Not sure if this is the correct place for this question, but I was wondering about the Groovy keyword def (and the equivalent in other dynamic or optionally typed languages).
One useful, or at least nice, usage of something like this is that you can have one type of value assigned to a variable and then change it to another type.
For instance, let's say you get a map of two timestamps that represent a date range from your front end:
def filters = [
    from: from,
    to  : to
]
Then, when you do some validations, you want to pass a date range as date objects to your DAO for SQL queries, so you do something like the following:
if (filters.from && filters.to) {
    def normalizedDateRange = DateUtil.buildDateRange(filters.from, filters.to, maxRangeDays)
    filters.from = normalizedDateRange.from
    filters.to = normalizedDateRange.to
}
This is acceptable and you get away without needing to create a second map with a very similar name. My question is whether this causes too much overhead in languages like this, and whether it is one of the reasons they are slower than, say, Java.
Some people say you could consider def as equivalent to using Object in Java. Does it allocate enough space to store anything, or does it store a reference, and when you store something of a different type it just frees the old space and updates the reference?
Would I gain anything by creating a new object/map here and storing the new values there? Or is the gain so small that it's better to take advantage of the syntactic sugar and "cheating" of Groovy?
def will be lighter, since it is simply an untyped reference, which might easily be garbage collected later on.
By storing values in a map, you are storing them in a specific structure which calculates hash codes and so on. It will be heavier.
Of course a map has wonderful features, and one shouldn't overlook them based on performance alone without checking whether this is a true bottleneck. You could try writing a microbenchmark.
Remember, Groovy is optionally typed, not dynamically typed. So if you are writing a constant that holds a filter, you can do this to give the compiler and JVM hints on what to do:
static final Map filters = [ to: 'X', from: 'Y' ]

Java String to JSON & vice versa

I want to convert a string-based protocol to JSON, and performance is key.
The String based protocol is something like
<START>A12B13C14D15<END>
and the JSON is
{'A':12,'B':13,'C':14,'D':15}
I can regex-parse the string, create a map and serialize it to JSON, but that seems like a lot of work as I need to convert a stream in real time.
Would it be more efficient if I just did string manipulation to produce the JSON output? How can I do the conversion efficiently?
JSON serialization performance is likely not a problem. Don't optimize it prematurely. If you roll your own JSON serializer, you need to put some effort into e.g. getting the escapes right. If the performance does become a problem, take a look at Jackson, which is fairly fast.
Java seems to do regex quite fast, so you might be fine with it, but beware that it is quite possible to accidentally build a regex that, for some inputs, starts backtracking heavily and takes several minutes to evaluate. You could instead use native String methods to parse the string.
If performance is really a concern, do timing tests on different approaches, select right tools, see what takes time and optimize accordingly.
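As a rough illustration of the 'native String methods' route (a sketch only; it assumes single-letter keys followed by unsigned integer values, as in the example frame):

import java.util.LinkedHashMap;
import java.util.Map;

// Parses frames like <START>A12B13C14D15<END> into {A=12, B=13, C=14, D=15}.
static Map<String, Integer> parseFrame(String frame) {
    int i = frame.indexOf("<START>") + "<START>".length();
    int end = frame.indexOf("<END>");
    Map<String, Integer> result = new LinkedHashMap<>();
    while (i < end) {
        String key = String.valueOf(frame.charAt(i++)); // single-letter field name
        int value = 0;
        while (i < end && Character.isDigit(frame.charAt(i))) {
            value = value * 10 + (frame.charAt(i++) - '0');
        }
        result.put(key, value);
    }
    return result;
}

No regex, no backtracking risk, and the resulting Map plugs straight into the Jackson approaches below.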
There are lots of ways to go about it on the JSON side. Instead of a Map, which is not needed, a POJO is often most convenient. The following uses the Jackson (https://github.com/FasterXML/jackson-databind) library:
final static ObjectMapper MAPPER = new ObjectMapper(); // remember to reuse for good perf

public class ABCD {
    public int A, B, C, D;
}

// if you have an output stream handy:
ABCD value = new ABCD(...);
OutputStream out = ...;
MAPPER.writeValue(out, value);

// or if not:
byte[] raw = MAPPER.writeValueAsBytes(value);
Or, if you want to eliminate even more of the overhead (which, really, is unlikely to matter here):
JsonGenerator jgen = MAPPER.getFactory().createGenerator(out);
jgen.writeStartObject();
jgen.writeNumberField("A", valueA);
jgen.writeNumberField("B", valueB);
jgen.writeNumberField("C", valueC);
jgen.writeNumberField("D", valueD);
jgen.writeEndObject();
jgen.close();
and that gets quite close to the optimal performance you'd get with hand-written code.
In my case I used this library to handle JSON in a web application. I don't remember exactly where I found it; maybe this helps:
http://www.findjar.com/class/org/json/JSONArray.html

Giving a class member a reference to another class's members

On a scale of one to ten, how bad is the following from a perspective of safe programming practices? And if you find it worse than a five, what would you do instead?
My goal below is to get the data in the List of Maps in B into A. In this case, to me, it is ok if it is either a copy of the data or a reference to the original data. I found the approach below fastest, but I have a queasy feeling about it.
public class A {
    private List<Map<String, String>> _list = null;

    public A(B b) {
        _list = b.getList();
    }
}

public class B {
    private List<Map<String, String>> _list = new ArrayList<Map<String, String>>();

    public List<Map<String, String>> getList() {
        // Put some data in _list just for the sake of this example...
        _list.add(new HashMap<String, String>());
        return _list;
    }
}
The underlying problem is a bit more complex:
From a security perspective, this is very, very bad.
From a performance perspective, this is very, very good.
From a testing perspective, it's good because there is nothing in the class that you can't easily reach from a test.
From an encapsulation perspective, it's bad since you expose the inner state of your class.
From a coding safety perspective, it's bad because someone will eventually abuse this for some "neat" trick that will cause odd errors elsewhere and you will waste a lot of time to debug this.
From an API perspective, it can be either: It's hard to imagine an API to be more simple but at the same time, it doesn't communicate your intent and things will break badly if you ever need to change the underlying data structure.
When designing software, you need to keep all of these points in the back of your mind. With time, you will get a feeling for which kinds of errors you make and how to avoid them. Computers being as dumb and slow as they are, there is never a perfect solution. You can just strive to make the code as good as you can at the time when you write it.
If you want to code defensively, you should always copy any data that you get or expose. Of course, if "data" is your whole data model, then you simply can't copy everything each time you call a method.
Solutions to this deadlock:
Use immutables as often as you can. Immutables and value objects are created once and never change after that. These are always safe, and the performance is OK unless creation is very expensive. Lazy creation would help here, but that is usually its own can of worms. Guava offers a comprehensive set of collections which can't be changed after creation (see the sketch after this list).
Don't rely too much on Collections.unmodifiable* because the backing collection can still change.
Use copy-on-write data structures. The problem above would go away if the underlying list cloned itself as soon as A or B started to change it. That would give each its own copy, effectively isolating them from each other. Java's built-in support here is thin: java.util.concurrent.CopyOnWriteArrayList does copy on each write, but it is aimed at concurrent iteration rather than at this kind of isolation.
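A minimal sketch of the Guava option from the first bullet (assuming com.google.common.collect is on the classpath, reusing class A's constructor from the question):

import com.google.common.collect.ImmutableList;

public A(B b) {
    // Defensive copy at the boundary: the caller keeps its list,
    // A gets a snapshot that can never change afterwards.
    _list = ImmutableList.copyOf(b.getList());
}

Note that this copy is shallow: the contained Maps are still shared, so the same reasoning applies to the elements.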
In this case, to me, it is ok if it is either a copy of the data or a reference to the original data.
That is the sticking point.
Passing the object instance around is the fastest, but allows the caller to change it, and also makes later changes visible (there is no snapshot).
Usually, that is not a problem, since the caller is not malicious (but you may want to protect against coding errors).
If you do not want the caller to make changes, you could wrap it into an immutable wrapper.
If you need a snapshot, you can clone the list.
Either way, this will only snapshot/protect the list itself, not its individual elements. If those are mutable, the same reasoning applies again.
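For concreteness, those two options in standard library terms (a sketch, using the classes from the question):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Read-only wrapper: the caller cannot modify it, but still sees later changes to b's list.
List<Map<String, String>> readOnlyView = Collections.unmodifiableList(b.getList());

// Snapshot: a fixed set of elements, though the Map elements themselves are still shared.
List<Map<String, String>> snapshot = new ArrayList<>(b.getList());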
I would say that you will have to choose between efficiency and encapsulation. By handing out direct access to a member of the class, you allow its state to be changed from the outside. That might be unexpected and lead to nasty surprises. I would also say that it increases the coupling between the two classes.
An alternative is to let the information expert principle decide and leave the job to the class that has the information. You will have to judge whether the work that is supposed to be done with class A really is the responsibility of class B.
But really, speed and clean code can be conflicting interests. Sometimes you just have to play dirty to make it quick enough.
All you're creating is a reference to B._list. So a 10, if you actually wanted to copy the items.
You could iterate over all b._list items and add them to A._list manually:
public A(B b) {
    _list = new ArrayList<Map<String, String>>();
    for (Map<String, String> map : b.getList()) {
        Map<String, String> newMap = new HashMap<String, String>();
        // copy each entry into the new map
        for (String key : map.keySet()) {
            newMap.put(key, map.get(key));
        }
        _list.add(newMap);
    }
}
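As an aside, the inner loop can be collapsed with the HashMap copy constructor; this sketch is equivalent:

public A(B b) {
    _list = new ArrayList<Map<String, String>>();
    for (Map<String, String> map : b.getList()) {
        _list.add(new HashMap<String, String>(map)); // shallow copy of each map
    }
}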
