I want to convert a string-based protocol to JSON, and performance is key.
The string-based protocol is something like
<START>A12B13C14D15<END>
and json is
{"A":12,"B":13,"C":14,"D":15}
I can regex-parse the string, create a map, and serialize it to JSON, but that seems like a lot of work since I need to convert a stream in real time.
Would it be more efficient if I just did string manipulation to get the JSON output? How can I do the conversion efficiently?
JSON serialization performance is likely not a problem. Don't optimize it prematurely. If you roll your own JSON serializer, you need to put some effort into e.g. getting the escapes right. If the performance does become a problem, take a look at Jackson, which is fairly fast.
Java's regex engine is quite fast, so you might be fine with it, but beware that it is quite possible to accidentally build a regex that starts backtracking heavily on some inputs and takes several minutes to evaluate. Alternatively, you could parse the string with plain String methods.
If performance is really a concern, run timing tests on the different approaches, pick the right tools, see what actually takes time, and optimize accordingly.
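For illustration, here is a minimal sketch of the plain-String-methods approach, assuming the message always looks like the example (single-letter keys followed by unsigned integer values, so no JSON escaping is needed):
// Sketch: convert "<START>A12B13C14D15<END>" to JSON with plain String methods.
// Assumes single-letter keys and unsigned integer values, so no escaping is needed.
static String toJson(String message) {
    int start = message.indexOf("<START>") + "<START>".length();
    int end = message.indexOf("<END>");
    StringBuilder json = new StringBuilder("{");
    int i = start;
    while (i < end) {
        char key = message.charAt(i++);               // single-letter key
        int valueStart = i;
        while (i < end && Character.isDigit(message.charAt(i))) {
            i++;                                       // consume the digits of the value
        }
        if (json.length() > 1) {
            json.append(',');
        }
        json.append('"').append(key).append("\":").append(message, valueStart, i);
    }
    return json.append('}').toString();
}
For the example input this yields {"A":12,"B":13,"C":14,"D":15}; measure it against the regex version before deciding either is the bottleneck.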
There are lots of ways to go about it on the JSON side. Instead of a Map, which is not needed, a POJO is often most convenient. The following uses the Jackson (https://github.com/FasterXML/jackson-databind) library:
final static ObjectMapper MAPPER = new ObjectMapper(); // remember to reuse for good perf
public class ABCD {
public int A, B, C, D;
}
// if you have output stream handy:
ABCD value = new ABCD(...);
OutputStream out = ...;
MAPPER.writeValue(out, value);
// or if not
byte[] raw = MAPPER.writeValueAsBytes(value);
Or, if you want to eliminate even more overhead (which, really, is unlikely to matter here):
JsonGenerator jgen = MAPPER.getFactory().createGenerator(out);
jgen.writeStartObject();
jgen.writeNumberField("A", valueA);
jgen.writeNumberField("B", valueB);
jgen.writeNumberField("C", valueC);
jgen.writeNumberField("D", valueD);
jgen.writeEndObject();
jgen.close();
and that gets quite close to the optimal performance you'd get with hand-written code.
In my case I used this library to handle JSON in a web application.
I don't remember exactly where I found it. Maybe this helps:
http://www.findjar.com/class/org/json/JSONArray.html
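If you do go with that library (org.json), a rough sketch of producing the example output could look like this (note that JSONObject does not guarantee key order):
import org.json.JSONObject;

JSONObject obj = new JSONObject();
obj.put("A", 12);
obj.put("B", 13);
obj.put("C", 14);
obj.put("D", 15);
String json = obj.toString(); // e.g. {"A":12,"B":13,"C":14,"D":15}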
I have the object customerSummary created at line #2, and I am accessing it at lines #11 & #12. Could this lead to data corruption in production?
private CustomerSummary enrichCustomerIdentifiers(CustomerSummaryDTO customerSummaryDTO) {
CustomerSummary customerSummary = customerSummaryDTO.getCustomerSummary();
List<CustomerIdentifier> customerIdentifiers = customerSummary
.getCustomerIdentifiers().stream()
.peek(customerIdentifier -> {
if (getCustomerReferenceTypes().contains(customerIdentifier.getIdentifierType())) {
customerIdentifier.setRefType(RefType.REF.toString());
} else {
customerIdentifier.setRefType(RefType.TAX.toString());
Country country = new Country();
country.setIsoCountryCode(customerSummary.getCustomerAddresses().get(0).getIsoCountryCode());
country.setCountryName(customerSummary.getCustomerAddresses().get(0).getCountryName());
customerIdentifier.setCountry(country);
}
}).collect(Collectors.toList());
customerSummary.setCustomerIdentifiers(customerIdentifiers);
return customerSummary;
}
The literal answer to your question is No ... assuming that the access is thread-safe.
But your code probably doesn't do what you think it does.
The peek() method returns the precise stream of objects that it is called on. So your code is effectively doing this:
summary.setCustomerIdentifiers(
new SomeListClass<>(summary.getCustomerIdentifiers()));
... while doing some operations on the identifier objects.
You are (AFAIK unnecessarily) copying the list and reassigning it to the field of the summary object.
It would be simpler AND more efficient to write it as:
for (CustomerIdentifier id: summary.getCustomerIdentifiers()) {
if (getCustomerReferenceTypes().contains(id.getIdentifierType())) {
id.setRefType(RefType.REF.toString());
} else {
id.setRefType(RefType.TAX.toString());
Country country = new Country();
Address address = summary.getCustomerAddresses().get(0);
country.setIsoCountryCode(address.getIsoCountryCode());
country.setCountryName(address.getCountryName());
id.setCountry(country);
}
}
You could do the above using list.stream().forEach() or list.forEach(), but the code is (IMO) neither simpler nor substantially more concise than a plain loop.
summary.getCustomerIdentifiers().forEach(
id -> {
if (getCustomerReferenceTypes().contains(id.getIdentifierType())) {
id.setRefType(RefType.REF.toString());
} else {
id.setRefType(RefType.TAX.toString());
Country country = new Country();
Address address = summary.getCustomerAddresses().get(0);
country.setIsoCountryCode(address.getIsoCountryCode());
country.setCountryName(address.getCountryName());
id.setCountry(country);
}
}
);
(A final micro-optimization would be to declare and initialize address outside of the loop.)
Java 8 streams are not the solution to all problems.
The direct answer to your question is a resounding 'no', but you're misusing streams, which presumably is part of why you are even asking this question. You're operating on mutables in stream code, which you shouldn't be doing; that's why I'm saying 'misusing'. This code compiles and works, but it leads to hard-to-read, hard-to-maintain code that will fail in weird ways as you use more and more of the stream API. The solution is not to go against the grain so much.
You're also engaging in stringly typing, which is another style mistake.
Finally, your collect call is misleading.
So, to answer the question:
Does it lead to data corruption in production?
No. How would you imagine it would?
Style mistake #1: mutables
Streams don't work nearly as well when you're working with mutables. The general idea is that you have immutable classes (classes without any setters; instances of these classes cannot change after construction). String is immutable, and so are Integer and BigDecimal. There is no .setValue() on an Integer instance, and there is no setChar() on a String, or even a clear() or an append() - all operations on immutables that appear to modify things actually return a new instance that contains the result of the operation. someBigDecimal.add() doesn't change what someBigDecimal is pointing at; it constructs a new BigDecimal instance and returns that.
With immutables, if you want to change things, Stream's map method is the right one to use. For example, if you have a stream of BigDecimal objects and you want to, say, print them all, but with 2.5 added to them, you'd be calling map: you want to map each input BigDecimal into an output BigDecimal by asking the instance to make a new BigDecimal by adding 2.5 to itself.
With mutables, both map and peek come into play, and style debates are rife on what to do. peek just lets you witness what's going through a stream pipeline. It can be misleading because stream pipelines don't process anything until you stick a terminal operation on the end (something like collect, or max() - those are the 'terminators'). When talking about mutables, peek in theory works just as well as map does, and some (evidently, including IntelliJ's auto-suggest authors) are of the belief that a map operation that really just mutates the underlying object in the stream and returns the same reference is a style violation and should be replaced with a peek operation instead.
But the far more relevant observation is that stream operations should not be mutating anything at all. Do not call setters.
You have 2 options:
Massively refactor this code: make CustomerIdentifier immutable (get rid of the setters, make all fields final, consider adding with-ers and builders and the like), and change your peek code to something like:
.map(identifier -> {
    if (....) return identifier.with(RefType.REF);
    return identifier.withCountry(new Country(summary.get..., summary.get...));
})
Note that Country also needs this treatment.
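For illustration, a rough sketch of what such an immutable CustomerIdentifier with 'with-ers' could look like (the field set here is an assumption based on the question's code):
public final class CustomerIdentifier {
    private final String identifierType;
    private final RefType refType;
    private final Country country;

    public CustomerIdentifier(String identifierType, RefType refType, Country country) {
        this.identifierType = identifierType;
        this.refType = refType;
        this.country = country;
    }

    public String getIdentifierType() { return identifierType; }
    public RefType getRefType() { return refType; }
    public Country getCountry() { return country; }

    // "with-ers": return a modified copy instead of mutating this instance
    public CustomerIdentifier with(RefType newRefType) {
        return new CustomerIdentifier(identifierType, newRefType, country);
    }

    public CustomerIdentifier withCountry(Country newCountry) {
        return new CustomerIdentifier(identifierType, refType, newCountry);
    }
}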
Do not use streams.
This is much simpler. This code is vastly less confusing and better style if you just write a foreach loop. I have no idea why you thought streams were appropriate here. Streams are not 'better'. A problem is that adherents of functional style are so incredibly convinced they are correct they spread copious FUD (Fear, Uncertainty, Doubt) about non-functional approaches and strongly insinuate that functional style is 'just better'. This is not true - it's merely a different style that is more suitable to some domains and less to others. This style goes a lot further than just 'turn for loops into streams', and unawareness of what 'functional style' really means just leads to hard to maintain, hard to read, weird code like what you pasted.
I really, really want to use streams here
This is just a bad idea here (unless you do the full rewrite to immutables), but if you MUST, the actual right answer is not what IntelliJ said; it's to use forEach. This is peek and the terminal operation in one package. It gets rid of the pointless collect call (which just recreates a list that is 100% identical to what customerSummary.getCustomerIdentifiers() returns) and properly represents what is actually happening (which is NOT that you're writing code that witnesses what flows through the stream pipe; you're writing code that you intend to execute on each element in the stream).
But that's still much worse than this:
CustomerSummary summary = customerSummaryDTO.getCustomerSummary();
for (CustomerIdentifier identifier : summary.getCustomerIdentifiers()) {
    if (getCustomerReferenceTypes().contains(identifier.getIdentifierType())) {
        identifier.setRefType(RefType.REF.toString());
    } else {
        identifier.setRefType(RefType.TAX.toString());
        Country country = new Country();
        country.setIsoCountryCode(summary.getCustomerAddresses().get(0).getIsoCountryCode());
        country.setCountryName(summary.getCustomerAddresses().get(0).getCountryName());
        identifier.setCountry(country);
    }
}
return summary;
Style mistake #2: stringly typing
Why isn't the refType field in CustomerIdentifier just RefType? Why are you converting RefType instances to strings and back?
DB engines support enums and if they don't, the in-between layer (your DTO) should support marshalling enums into strings and back.
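In other words, the String form only needs to exist at that boundary; inside your own code the field can stay a RefType. A tiny sketch, using the question's RefType enum:
// Convert at the boundary only; everywhere else, pass the enum around.
String stored = RefType.REF.name();       // enum -> "REF" for the DB/DTO layer
RefType back = RefType.valueOf(stored);   // "REF" -> enum when reading it back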
I have code like the below; the objective is to convert the output of the data layer into a generic data format so it can be used by other layers.
During my LT runs, I am observing this method consuming a considerable percentage of CPU time, but compared to the overall time it still looks manageable.
However, I am worried it might blow up under stress testing, so I am thinking of refactoring to remove the ObjectMapper usage and do the conversion by iteration.
I am not really a fan of converting an object to a String and then back into another object structure. My belief is that avoiding the conversions (Object --> String --> Object) will save time and memory; is this a correct assumption?
ObjectMapper om = new com.fasterxml.jackson.databind.ObjectMapper();
String listGridData = om.writeValueAsString(grid); // grid is an existing com.fasterxml.jackson.databind.node.ArrayNode
List<Map<String, Object>> responseList = om.readValue(listGridData,
        new TypeReference<List<Map<String, Object>>>() {
        });
The ObjectMapper is something you want to re-use. There should be one instance per application; ObjectMapper is thread-safe and there is no problem reusing it.
From a performance point of view, the second line (writeValueAsString) is not an issue - with or without it, it is almost the same.
The performance of the third line (readValue) depends on the complexity of the object into which you are parsing the String. The more complex it is, the more references will be initialised and the more memory it will occupy, which will affect performance.
Keeping the object you parse into in the third step lightweight is vital, together with memory management and the reuse of the ObjectMapper.
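If the goal is to avoid the intermediate String entirely, note that Jackson can also convert the tree in a single step; a minimal sketch, assuming grid is the same ArrayNode as in the question:
// Sketch: convert the ArrayNode directly into the target type, no String round trip.
List<Map<String, Object>> responseList =
        om.convertValue(grid, new TypeReference<List<Map<String, Object>>>() {});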
I have a server that makes frequent calls to microservices (actually AWS Lambda functions written in python) with raw JSON payloads and responses on the order of 5-10 MB. These payloads are gzipped to bring their total size under lambda's 6MB limit.
Currently payloads are serialized to JSON, gzipped, and sent to Lambda. The responses are then gunzipped, and deserialized from JSON back into Java POJOs.
Via profiling we have found that this process of serializing, gzipping, gunzipping, and deserializing is the majority of our server's CPU usage by a large margin. Looking into ways to make serialization more efficient led me to protobufs.
Switching our serialization from JSON to protobufs would certainly make our (de)serialization more efficient, and might also have the added benefit of eliminating the need to gzip to get payloads under 6MB (network latency is not a concern here).
The POJOs in question look something like this (Java):
public class InputObject {
... 5-10 metadata fields containing primitives or other simple objects ...
List<Slot> slots; // usually around 2000
}
public class Slot {
public double field1; //20ish fields with a single double
public double[] field2; //10ish double arrays of length 5
public double[][] field3; //1 2x2 matrix of doubles
}
This is super easy with JSON: gson.toJson(inputObj) and you're good to go. Protobufs seem like a whole different beast, requiring you to use the generated classes and litter your code with stuff like:
Blah blah = Blah.newBuilder()
.setFoo(f)
.setBar(b)
.build()
Additionally, this results in an immutable object, which requires more hoop-jumping to update. It just seems like a bad, bad thing to put all that transport-layer-dependent code into the business logic.
I have seen some people recommend writing wrappers around the generated classes so that all the protobuffy-ness doesn't leak into the rest of the codebase and that seemed like a good idea. But then I am not sure how I could serialize the top level InputObject in one go.
Maybe protobufs aren't the right tool for the job here, but it seems like the go-to solution for inter-service communication when you start looking into improving efficiency.
Am I missing something?
With your proto you can always serialize in one go. There is an example in the Java tutorial online:
https://developers.google.com/protocol-buffers/docs/javatutorial
AddressBook.Builder addressBook = AddressBook.newBuilder();
...
FileOutputStream output = new FileOutputStream(args[0]);
addressBook.build().writeTo(output);
Also, what you might want to do is serialize your proto into a byte array, and then encode it in Base64 (e.g. with Guava's BaseEncoding) to carry it through your wrapper:
String yourPayload = BaseEncoding.base64().encode(blah.toByteArray());
There are also libraries that can help you transform existing JSON into a proto, such as JsonFormat:
https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/util/JsonFormat
And the usage is straightforward as well:
To serialize a proto as JSON:
String json = JsonFormat.printer().print(yourProto);
To build a proto from JSON:
JsonFormat.parser().merge(yourJsonString, yourProtoBuilder);
No need to iterate through each element of the object.
Let me know if this answers your question!
I would like to validate whether a string is a valid JSON object, regardless of its data correctness. In other words: is this JSON string well-formed?
For instance, I am given:
"abc":"123",
"cba":"233"
}
the process should return a format exception.
{
"abc":"123"
"cba":"233"
}
should give the same.
You might think this is easy, but how can we do it in a timely fashion and avoid duplicated processing (unmarshalling should not depend on the result of the validation)? Processing strings can be resource-consuming in our case.
However, if this cost is absolutely unavoidable, what is the quickest way/tool to validate a JSON string in Java?
Just a note: Jersey-json is used in our case. Unfortunately, it doesn't have a good validator for JSON object format. So basically a string gets passed in and, before it gets unmarshalled, I need to apply a validator to it.
Use a library like Jackson that's already figured out how to do this for you.
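For example, a minimal sketch with Jackson databind, where any parse failure is treated as 'not well-formed':
import java.io.IOException;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonSyntaxCheck {
    private static final ObjectMapper MAPPER = new ObjectMapper(); // reuse it; it is thread-safe

    // Returns true if the string parses as JSON, regardless of its data correctness.
    public static boolean isWellFormed(String candidate) {
        try {
            MAPPER.readTree(candidate);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
Note that this still parses the string once, so validating and then unmarshalling separately means parsing twice - the duplicated work you want to avoid.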
This is a good library for handling JSON. You can use the JSONValidatingReader class to validate whether the JSON format is valid or not.
Talking about tools, if you're looking for something outside of Java, you can always try JSONLint. It is quick and simple, with clear error messages, but it is web-based.
We are using Apache Velocity for dynamic templates. At the moment Velocity has the following methods for evaluation/replacement:
public static boolean evaluate(Context context, Writer writer, String logTag, Reader reader)
public static boolean evaluate(Context context, Writer out, String logTag, String instring)
We use these methods by providing a StringWriter to receive the evaluation results. Our incoming data arrives as a StringBuilder, so we call StringBuilder.toString() and feed it in as instring.
The problem is that our templates are fairly large (megabytes, tens of MB in rare cases), replacements occur very frequently, and each replacement operation triples the amount of required memory (incoming data + StringBuilder.toString() which creates a new copy + outgoing data).
I was wondering if there is a way to improve this. E.g., if I could find a way to provide a Reader and a Writer on top of the same StringBuilder instance that only uses extra memory for the in/out differences, would that be a good approach? Has anybody done anything similar and could share the source for such a class? Or are there any better solutions to the given problem?
Velocity needs to parse the whole template before it can be evaluated. You won't be able to provide a Reader and Writer to gain anything in a single evaluation. You could however break up your templates into smaller parts to evaluate them individually. That's going to depend on what's in them and if the parts would depend on each other. And the overhead might not be worth it, depending on your situation.
If you're only dealing with variable substitution in your templates you could simply evaluate each line of your input. Ideally you can intercept that before it goes into the StringBuilder. Otherwise you're still going to have to incur the cost of that memory plus its toString() that you'd feed into a BufferedReader to make readLine() calls against.
If there are #set directives you'll need to keep passing the same context for evaluation. If there are any #if or #foreach blocks it's going to get tricky. I have actually done this before and read in enough lines to capture the block of input for Velocity to parse and evaluate. At that point however you're starting to do Velocity's job and it's probably not worth it.
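A rough sketch of the line-by-line variant, assuming plain variable substitution only and accepting the toString() copy mentioned above (Velocity is assumed to be initialized elsewhere):
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import org.apache.velocity.VelocityContext;
import org.apache.velocity.app.Velocity;

public class LineByLineEvaluation {
    static String evaluateLineByLine(VelocityContext context, StringBuilder input) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader(input.toString()));
        Writer out = new StringWriter();
        String line;
        while ((line = reader.readLine()) != null) {
            // Each line is evaluated as a tiny template; #set state carries over via the shared context
            Velocity.evaluate(context, out, "line-eval", line);
            out.write('\n');
        }
        return out.toString();
    }
}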
You can save one copy of the string by reading the value field from the StringBuilder through reflection and creating a CharArrayReader on top of it (note that this relies on the internal char[] representation used up to Java 8; newer JDKs store the buffer as a byte[] and restrict reflective access to it):
StringBuilder sb = new StringBuilder("bla");
// AbstractStringBuilder.value is the internal buffer (a char[] on Java 8)
Field valueField = StringBuilder.class.getSuperclass().getDeclaredField("value");
valueField.setAccessible(true);
char[] value = (char[]) valueField.get(sb);
// only sb.length() characters are valid; the rest of the array is spare capacity
Reader r = new CharArrayReader(value, 0, sb.length());
Yikes. That's a pretty heavyweight use for evaluate(). I assume you have good reasons for not using the standard resource loader stuff, so I won't pontificate. :)
I haven't heard of any solution that would fit this, but since Reader is not a particularly complicated class, my instinct would be to just create your own StringBufferReader class and pass that in.
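A minimal sketch of such a class (the name is hypothetical; it is not part of the JDK), assuming the StringBuilder is not modified while it is being read:
import java.io.Reader;

public class StringBuilderReader extends Reader {
    private final StringBuilder source;
    private int pos;

    public StringBuilderReader(StringBuilder source) {
        this.source = source;
    }

    @Override
    public int read(char[] cbuf, int off, int len) {
        if (pos >= source.length()) {
            return -1;                                   // nothing left to read
        }
        int n = Math.min(len, source.length() - pos);
        source.getChars(pos, pos + n, cbuf, off);        // copy directly out of the builder
        pos += n;
        return n;
    }

    @Override
    public void close() {
        // nothing to release
    }
}
You could then pass new StringBuilderReader(builder) straight into evaluate() without the toString() copy.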