We are using Apache Velocity for dynamic templates. At the moment Velocity has the following methods for evaluation/replacement:
public static boolean evaluate(Context context, Writer writer, String logTag, Reader reader)
public static boolean evaluate(Context context, Writer out, String logTag, String instring)
We use these methods with a StringWriter that receives the evaluation results. Our incoming data arrives in a StringBuilder, so we call StringBuilder.toString() and pass the result as instring.
The problem is that our templates are fairly large (megabytes, in rare cases tens of megabytes), replacements occur very frequently, and each replacement operation triples the amount of required memory (incoming data + StringBuilder.toString(), which creates a new copy, + outgoing data).
I was wondering if there is a way to improve this. E.g. if I could find a way to provide a Reader and a Writer on top of the same StringBuilder instance that only use extra memory for the in/out differences, would that be a good approach? Has anybody done anything similar and could share the source for such a class? Or are there any better solutions to this problem?
Velocity needs to parse the whole template before it can be evaluated. You won't be able to provide a Reader and Writer to gain anything in a single evaluation. You could however break up your templates into smaller parts to evaluate them individually. That's going to depend on what's in them and if the parts would depend on each other. And the overhead might not be worth it, depending on your situation.
If you're only dealing with variable substitution in your templates you could simply evaluate each line of your input. Ideally you can intercept that before it goes into the StringBuilder. Otherwise you're still going to have to incur the cost of that memory plus its toString() that you'd feed into a BufferedReader to make readLine() calls against.
If there are #set directives you'll need to keep passing the same context for evaluation. If there are any #if or #foreach blocks it's going to get tricky. I have actually done this before and read in enough lines to capture the block of input for Velocity to parse and evaluate. At that point however you're starting to do Velocity's job and it's probably not worth it.
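You can save one copy of the string by reading the value field from the StringBuilder through reflection and creating a CharArrayReader on that. Note that this relies on JDK internals: it only works where the backing field is a char[] (Java 8 and earlier; since Java 9 it is a byte[]), and recent JDKs restrict reflective access to java.lang by default: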
StringBuilder sb = new StringBuilder("bla");
Field valueField = StringBuilder.class.getSuperclass().getDeclaredField("value");
valueField.setAccessible(true);
char[] value = (char[]) valueField.get(sb);
Reader r = new CharArrayReader(value, 0, sb.length());
Yikes. That's a pretty heavyweight use for evaluate(). I assume you have good reasons for not using the standard resource loader stuff, so I won't pontificate. :)
I haven't heard of any solution that would fit this, but since Reader is not a particularly complicated class, my instinct would be to just create your own StringBufferReader class and pass that in.
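A minimal sketch of such a class, assuming the input lives in a StringBuilder and is not modified while it is being read. The name StringBuilderReader is hypothetical; only java.io.Reader and StringBuilder.getChars are standard API.

```java
import java.io.Reader;

/** Hypothetical Reader that serves characters straight from a StringBuilder, without toString(). */
final class StringBuilderReader extends Reader {
    private final StringBuilder source;
    private int position = 0; // index of the next character to hand out

    StringBuilderReader(StringBuilder source) {
        this.source = source;
    }

    @Override
    public int read(char[] buffer, int offset, int length) {
        if (length == 0) {
            return 0;
        }
        int remaining = source.length() - position;
        if (remaining <= 0) {
            return -1; // end of input
        }
        int count = Math.min(length, remaining);
        // Bulk copy from the builder into the caller's buffer.
        source.getChars(position, position + count, buffer, offset);
        position += count;
        return count;
    }

    @Override
    public void close() {
        // Nothing to release: the caller owns the StringBuilder.
    }
}
```

An instance could then be passed as the Reader argument of the evaluate(Context, Writer, String, Reader) overload quoted in the question. This only removes the toString() copy of the input; the output still needs its own buffer.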
I want to convert a string-based protocol to JSON. Performance is key.
The String based protocol is something like
<START>A12B13C14D15<END>
and json is
{'A':12,'B':13,'C':14,'D':15}
I can parse the string with a regex, create a map and serialize it to JSON, but that seems like a lot of work since I need to convert a stream in real time.
Would it be more efficient if I just do string manipulation to get the Json output? How can I do the conversion efficiently?
JSON serialization performance is likely not a problem. Don't optimize it prematurely. If you roll your own JSON serializer, you need to put some effort into e.g. getting the escapes right. If the performance does become a problem, take a look at Jackson, which is fairly fast.
Java seems to do regex quite fast, so you might be fine with it but beware that it is quite possible to accidentally build a regex that with some inputs starts backtracking heavily and takes several minutes to evaluate. You could use native String methods to parse the string.
If performance is really a concern, run timing tests on the different approaches, select the right tools, see what takes time and optimize accordingly.
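As a rough illustration of the "native String methods" route, here is a sketch that scans the message once and emits JSON directly. It assumes keys are runs of uppercase ASCII letters and values are unsigned decimal integers without leading zeros, so no JSON escaping is needed; the class and method names are made up for the example.

```java
final class ProtocolToJson {
    private static final String START = "<START>";
    private static final String END = "<END>";

    /** Converts e.g. <START>A12B13<END> into {"A":12,"B":13}. */
    static String toJson(String message) {
        if (!message.startsWith(START) || !message.endsWith(END)) {
            throw new IllegalArgumentException("Not a framed message: " + message);
        }
        int i = START.length();
        int end = message.length() - END.length();
        StringBuilder json = new StringBuilder(message.length() + 16);
        json.append('{');
        while (i < end) {
            int keyStart = i;
            while (i < end && message.charAt(i) >= 'A' && message.charAt(i) <= 'Z') {
                i++;
            }
            int valueStart = i;
            while (i < end && message.charAt(i) >= '0' && message.charAt(i) <= '9') {
                i++;
            }
            if (keyStart == valueStart || valueStart == i) {
                throw new IllegalArgumentException("Malformed pair at index " + keyStart);
            }
            if (json.length() > 1) {
                json.append(','); // separator before every pair except the first
            }
            json.append('"').append(message, keyStart, valueStart).append("\":")
                .append(message, valueStart, i);
        }
        return json.append('}').toString();
    }
}
```

If the real protocol allows other key or value shapes, the escaping concern mentioned above comes back and a library such as Jackson is the safer choice.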
There are lots of ways to go about it on the JSON side. Instead of a Map, which is not needed, a POJO is often most convenient. The following uses the Jackson (https://github.com/FasterXML/jackson-databind) library:
final static ObjectMapper MAPPER = new ObjectMapper(); // remember to reuse for good perf
public class ABCD {
public int A, B, C, D;
}
// if you have output stream handy:
ABCD value = new ABCD(...);
OutputStream out = ...;
MAPPER.writeValue(out, value);
// or if not
byte[] raw = MAPPER.writeValueAsBytes(value);
or, if you want to eliminate even more overhead (which, really, is unlikely to matter here):
JsonGenerator jgen = MAPPER.getFactory().createGenerator(out);
jgen.writeStartObject();
jgen.writeNumberField("A", valueA);
jgen.writeNumberField("B", valueB);
jgen.writeNumberField("C", valueC);
jgen.writeNumberField("D", valueD);
jgen.writeEndObject();
jgen.close();
and that gets quite close to the optimal performance you'd get with hand-written code.
In my case I used this library to handle JSON in a web application.
I don't remember where to download it, but this may help:
http://www.findjar.com/class/org/json/JSONArray.html
We are constructing a large block of text and are using a single instance of StringBuilder. We have broken up the block of text into subsections (5) and assigned each a corresponding method. Each method takes input variables and spits out text.
Is it better to pass in the StringBuilder object to each method, append the data in the method and return void or have each method return a string that we append to the object outside of smaller functions?
What would be some benefits/drawbacks of each approach?
I'd pass in the StringBuilder, and append directly to it - assuming you don't actually need that intermediate string for any other reason.
The whole point of using StringBuilder is to avoid creating more strings than you need.
The main advantage of just returning strings is that if you ever want to use the same code in a situation where you're not appending to a StringBuilder, it's more convenient and idiomatic. But I'm assuming these are actually private methods just called from the "build the text" method, which makes it less of an issue.
Zim-Zam's point about parallelism is an interesting one, but I wouldn't worry about that unless you're actually planning to parallelize this.
Passing in a StringBuilder is more memory efficient because there's no need to garbage-collect the strings that are being appended. Returning strings is more amenable to parallelization because the strings can be generated concurrently and then appended in sequence.
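To make the two styles concrete, here is a small sketch with hypothetical section methods (the report content is invented for illustration). Both variants produce the same text; the difference is whether intermediate String objects are created per section.

```java
final class ReportBuilder {
    // Style 1: each section appends directly to the shared builder and returns void.
    static void appendHeader(StringBuilder out, String title) {
        out.append("== ").append(title).append(" ==\n");
    }

    static void appendTotal(StringBuilder out, int total) {
        out.append("Total: ").append(total).append('\n');
    }

    // Style 2: each section returns a String that the caller appends.
    static String header(String title) {
        return "== " + title + " ==\n";
    }

    static String total(int total) {
        return "Total: " + total + "\n";
    }

    static String buildByAppending(String title, int total) {
        StringBuilder sb = new StringBuilder();
        appendHeader(sb, title);
        appendTotal(sb, total);
        return sb.toString();
    }

    static String buildByReturning(String title, int total) {
        StringBuilder sb = new StringBuilder();
        sb.append(header(title)); // one temporary String per section
        sb.append(total(total));
        return sb.toString();
    }
}
```

The returning style keeps header() and total() usable on their own, and their results could be computed concurrently before being appended in order; the appending style avoids the temporaries.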
Concatenation of fixed literals is resolved by the compiler, so a plain String is fine here:
String s1="str1"+"str2"+"str3";
When you need to append strings dynamically, prefer StringBuilder. Appending with += like this
String s1="";
s1+="str1";
s1+="str2";
s1+="str3";
creates more objects, because the compiler optimization is not applied to appends made later. Instead, use the append methods of StringBuilder and finally convert the result with
toString();
So use StringBuilder.
I am really confused about the purpose of the various I/O classes. For example, if we have BufferedWriter, why do we need a PrintWriter?
BufferedReader reader = new BufferedReader(new FileReader(file));
String line = null;
while ((line = reader.readLine()) != null) {
    PrintWriter fs = new PrintWriter(new FileWriter(file));
    fs.println(line);
}
Couldn't a BufferedWriter do the job here? I just do not understand the difference between these I/O classes. Can someone explain it to me?
They have nothing to do with each other. In all truth, I rarely use PrintWriter except to convert System.out temporarily. But anyway.
BufferedWriter, like BufferedReader/BufferedInputStream/BufferedOutputStream, merely decorates the enclosed Writer with a memory buffer (you can specify the size or accept the default). This is very useful when writing to slow Writers such as network- or file-based ones: data is collected in memory and only occasionally committed to disk, for example. Buffering in memory greatly increases the speed. Try writing code that writes, say, a 10 MB file with just a FileWriter and then compare it to the same code with a BufferedWriter wrapped around it.
So that's BufferedWriter. It throws in a few convenience methods, but mostly it just provides this memory buffer.
PrintWriter is mostly a simple decorator that adds specific print methods for various types such as String, float, etc., so you don't have to convert everything to text yourself.
The PrintWriter is essentially a convenience class. If you want to quickly and easily blast out a line of text to e.g. a log file, PrintWriter makes it very easy.
Three features:
The print and println methods will take any data type and do the conversion for you. Not just String.
The relatively new format method is worth its weight in gold. Now it's as simple in Java as in C to output a line of text with C-style format control.
The methods never throw an I/O exception! Errors are swallowed and can only be detected by calling checkError(). Some programmers are horrified at the possibility of never hearing about things going wrong. But if it's a throwaway program or doing something really simple, the convenience can be nice. Especially if output is to System.out or System.err, which have few ways of going wrong.
The main reason to use the PrintWriter is to get access to the printXXX methods (like println(int)). You can essentially use a PrintWriter to write to a file just like you would use System.out to write to the console.
A BufferedWriter is an efficient way to write to a file (or anything else) as it will buffer the characters in Java memory before writing to the file.
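The two classes are often stacked rather than chosen between. The sketch below wraps a BufferedWriter in a PrintWriter; a StringWriter stands in for a FileWriter so the example needs no file, and the class name is made up for illustration.

```java
import java.io.BufferedWriter;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Locale;

final class WriterLayersDemo {
    static String render() {
        StringWriter sink = new StringWriter(); // stand-in for a FileWriter
        // BufferedWriter supplies the buffering, PrintWriter the print/println/printf methods.
        try (PrintWriter out = new PrintWriter(new BufferedWriter(sink))) {
            out.println(42);   // int, converted to text for us
            out.println(true); // boolean
            out.printf(Locale.ROOT, "%s has %d items", "cart", 3);
        } // closing the PrintWriter flushes the buffer into the sink
        return sink.toString();
    }
}
```

Nothing reaches the underlying writer until the buffer is flushed or the stack is closed, which is why the result is read only after the try block.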
All,
I was wondering if clearing a StringBuffer contents using the setLength(0) would make sense. i.e. Is it better to do :
while (<some condition>)
{
stringBufferVariable = new StringBuffer(128);
stringBufferVariable.append(<something>)
.append(<more>)
... ;
Append stringBufferVariable.toString() to a file;
stringBufferVariable.setLength(0);
}
My questions:
1 > Will this still have better performance than having a String object to append the contents?
I am not really sure how reinitializing the StringBuffer variable would affect the performance and hence the question.
Please pour in your comments
[edit]: Removed the 2nd question about comparing to StringBuilder, since based on the responses I understand there is nothing more to look into.
Better than concatenating strings?
If you're asking whether
stringBufferVariable.append("something")
.append("more");
...
will perform better than concatenating with +, then yes, usually. That's the whole reason these classes exist. Object creation is expensive compared to updating the values in a char array.
It appears most if not all compilers now convert string concatenation into StringBuilder calls in simple cases such as str = a + b + c; (a concatenation of pure literals such as "something" + "more" + "..." is folded into a single constant at compile time). The only performance difference I can then see is that the compiler won't have the advantage of setting the initial size. Benchmarks would tell you whether the difference is enough to matter. Using + would make for more readable code though.
From what I've read, the compiler apparently can't optimize concatenation done in a loop when it's something like
String str = "";
for (int i = 0; i < 10000; i++) {
str = str + i + ",";
}
so in those cases you would still want to explicitly use StringBuilder.
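For comparison, here is the same loop written both ways as a small sketch (the class and method names are made up). The results are identical; the builder version simply avoids re-copying the accumulated text on each pass.

```java
final class LoopConcatDemo {
    // Concatenation in a loop: each iteration builds a new String from the old one.
    static String withPlus(int count) {
        String str = "";
        for (int i = 0; i < count; i++) {
            str = str + i + ",";
        }
        return str;
    }

    // Explicit StringBuilder: one growing buffer, one final toString().
    static String withBuilder(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append(i).append(',');
        }
        return sb.toString();
    }
}
```

How much this matters depends on the iteration count and string sizes, so it is worth measuring before rewriting existing code.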
StringBuilder vs StringBuffer
StringBuilder is not thread-safe while StringBuffer is, but they are otherwise the same. The synchronization performed in StringBuffer makes it slower, so StringBuilder is faster and should be used unless you need the synchronization.
Should you use setLength?
The way your example is currently written I don't think the call to setLength gets you anything, since you're creating a new StringBuffer on each pass through the loop. What you should really do is
StringBuilder sb = new StringBuilder(128);
while (<some condition>) {
sb.append(<something>)
.append(<more>)
... ;
// Append sb.toString() to a file;
sb.setLength(0);
}
This avoids unnecessary object creation and setLength will only be updating an internal int variable in this case.
I'm just focusing on this part of the question. (The other parts have been asked and answered many times before on SO.)
I was wondering if clearing a StringBuffer contents using the setLength(0) would make sense.
It depends on the Java class libraries you are using. In some older Sun releases of Java, StringBuffer.toString() was implemented on the assumption that a call to sb.toString() is the last thing that is done with the buffer. The StringBuffer's original backing array becomes part of the String returned by toString(). A subsequent attempt to use the StringBuffer resulted in a new backing array being created and initialized by copying the String contents. Thus, reusing a StringBuffer as your code tries to do would actually make the application slower.
With Java 1.5 and later, a better way to code this is as follows:
bufferedWriter.append(stringBufferVariable);
stringBufferVariable.setLength(0);
This should copy the characters directly from the StringBuffer into the file buffer without any need to create a temporary String. Provided that the StringBuffer declaration is outside the loop, the setLength(0) then allows you to reuse the buffer.
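One caveat worth checking: the Javadoc for Writer.append(CharSequence) describes it as behaving like write(csq.toString()), so a temporary String may still be created depending on the Writer. A sketch that sidesteps this copies through a reusable char[] instead; the class and method names and the "item=" record format are invented for the example.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;

final class ReusedBufferWriter {
    static void writeLines(List<String> items, Writer out) throws IOException {
        StringBuilder sb = new StringBuilder(128); // created once, outside the loop
        char[] chunk = new char[128];              // reusable transfer buffer
        for (String item : items) {
            sb.append("item=").append(item).append('\n');
            int length = sb.length();
            if (chunk.length < length) {
                chunk = new char[Math.max(length, chunk.length * 2)]; // grow when a record is larger
            }
            sb.getChars(0, length, chunk, 0); // copy out without creating a String
            out.write(chunk, 0, length);
            sb.setLength(0);                  // reset for the next record, keeping the capacity
        }
    }
}
```

This trades the temporary String for one extra array copy per record, so it is only worth it if profiling shows the allocation matters.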
Finally, you should only be worrying about all of this if you have evidence that this part of the code is (or is likely to be) a bottleneck. "Premature optimization is the root of all evil" blah, blah.
For question 2, StringBuilder will perform better than StringBuffer. StringBuffer is thread-safe, meaning its methods are synchronized. StringBuilder is not synchronized. So if your code is going to be run by ONE thread, StringBuilder will perform better since it does not have the synchronization overhead.
As camickr suggests, please check out the API for StringBuffer and StringBuilder for more information.
Also you may be interested in this article: The Sad Tragedy of Micro-Optimization Theater
1 > Will this still have better performance than having a String object to append the contents?
Yes, concatenating Strings is slow since you keep creating new String Objects.
2 > If using StringBuilder would perform better than StringBuffer here, then why?
Have you read the API description for StringBuilder and/or StringBuffer? This issue is addressed there.
I am not really sure how reinitializing the StringBuffer variable would affect the performance and hence the question.
Well, create a test program. Write a test that creates a new StringBuffer/Builder every time. Then rerun the test with a single instance whose length you reset to 0, and compare the times.
Perhaps I am misunderstanding something in your question... why are you setting the length to 0 at the bottom if you are just creating a new one at the start of each iteration?
Assuming the variable is local to the method, or that it will not be used by multiple threads if it is declared outside of a method (if it is outside of a method your code probably has issues though), then make it a StringBuilder.
If you declare the StringBuilder outside of the loop then you don't need to make a new one each time you enter the loop but you would want to set the length to 0 at the end of the loop.
If you declare the StringBuilder inside of the loop then you don't need to set the length to 0 at the end of the loop.
It is likely that declaring it outside of the loop and setting the length to 0 will be faster, but I would measure both and if there isn't a large difference declare the variable inside the loop. It is good practice to limit the scope of variables.
Yup! setLength(0) is a great idea! That's what it's for. The alternative would be to discard the StringBuffer and make a new one. setLength(0) is faster, though I can't say anything about it being memory efficient :)
I'm working on a spaghetti monster (unfortunately not of the flying variety), and I've got a question about proper design.
I'm in the process of taking a gigantic static Java method that returns an object and splitting it into reusable (and readable) components. Right now, the method reads an XML document, and then appends summary and detailed information from the document to a "dataModule", and the dataModule is then returned from the method.
In breaking my code up into a getSummaryData and getDetailedData method, I noticed I'd done the following:
dataModule = getSummaryData(xmlDocument);
setDetailedData(xmlDocument, dataModule);
(Pass by reference, append detailed data to dataModule within method)
This mostly has to do with the fact that the detailed data requires business logic based on the summary data in order to be parsed properly, and the fact that changing the structure of the dataModule involves lots of changing the front end of the application.
Is this approach any better than:
dataModule = getSummaryData(xmlDocument);
dataModule = setDetailedData(xmlDocument, dataModule);
(Pass by reference, append detailed data to dataModule within method, return dataModule)
I can't share much more of the code without revealing "teh secretz", but is there a strong reason to go with one approach over the other? Or, am I just getting caught up in which shade of lipstick I'm putting on my pig, here?
Thanks,
IVR Avenger
I find your second approach, where you return the same object, more confusing - because it implies to the calling function that a different object might be returned. If you're modifying the object, your first solution looks fine to me.
One principle I'd use to answer your question is that you want as many things as possible to be final, so that you have less trouble reasoning about state. By that principle, you'd want to avoid a meaningless reassignment.
final DataModule dataModule = getSummaryData(xmlDocument);
setDetailedData(xmlDocument, dataModule);
But that's wrong too. Why should the summary and detailed data be separate steps? Will you ever do one without the other? If not, those steps should be private to the DataModule. Really, the data module should probably know how to construct itself from the xml data.
final DataModule dataModule = new DataModule(xmlDocument);
The (arguable) advantage of the second approach is that it permits method chaining.
Say you had, in addition to setDetailedData(), setMoreData(), and that both functions were written to return the object. You could then write:
dataModule = getSummaryData(xmlDocument);
dataModule = dataModule.setDetailedData(xmlDocument).setMoreData();
I don't think the example you've provided benefits much from a method chaining syntax, but there are examples where it can lead to truly beautiful, expressive code. It permits what Martin Fowler calls a Fluent Interface.
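A minimal sketch of what such chaining could look like, assuming a hypothetical DataModule whose setters return this. The fields and string format are invented for illustration, since the real class isn't shown.

```java
final class DataModule {
    private final StringBuilder contents = new StringBuilder();

    DataModule setSummaryData(String summary) {
        contents.append("summary=").append(summary);
        return this; // returning this is what enables chaining
    }

    DataModule setDetailedData(String details) {
        contents.append(";details=").append(details);
        return this;
    }

    DataModule setMoreData(String more) {
        contents.append(";more=").append(more);
        return this;
    }

    String describe() {
        return contents.toString();
    }
}
```

A caller can then write new DataModule().setSummaryData("s").setDetailedData("d").setMoreData("m") as one expression. Every call returns the same instance, so no reassignment is needed.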