I am using play framework for web development. I was wondering what is the effect of the longer variable on user's query processing.In other words how the name of the variable affects the query size? Does a longer name means longer query to be sent and thus longer time for a user request to process? Does a shorter JSON means shorter response time to user? Shorter variable names will significantly reduce the readability of the code.
What I have found
If I rename my variable to be send in a json from autoEngineId to aeId there is not much performance gain but thats maybe because I dont have significant amount of user requests to process. The site is in dev mode.
Can somebody please tell me what is the advantage/disadvantage of smaller varibale names in JSON?
Unless your variable names are the length of college essays, there's not going to be a noticeable difference in response times over the network. Spend your time optimizing where it counts. That will be where profiling tells you it will count.
The primary advantage of shorter names is less typing for the programmer. Make the names long enough so that when you come back to your code after a month, the name will provide at least a clue as to what it's being used for. Because you will have forgotten.
I suggest you profile your application to determine where you can improve performance. Even after 20 years experience of tuning performance systems I still find that measuring is the only way to be sure you are a) working on the most significant performance improvements b) you don't make matters worse.
In your case, I assume you have particular reason to believe the length of attribute names is a performance issue, it just something you change. This approach to performance tuning is more likely to make matter worse than better.
Smaller variable names may save you a little bandwidth, but the performance impact (unless you have hundreds of thousands of entries an object array), is going to be so small, you are unlikely to notice anything.
The impact of not having meaning variable names will cost you far more in the long run. Performance and Scalability are cheap, man-power is not, and well designed code will always pay for itself many times over in the long term.
Related
I'm investigating a memory leak in my Grails 3.3.10 server that is running on a Java 8 JVM. I took a heap dump from a production server that was running low on memory and analysed it with JXRay. The html report says that some memory is wasted on duplicate strings with 19.6% overhead. Most of it is wasted on duplicates of the empty string "" and it is mostly coming from database reads. I have two questions about this.
Should I start interning strings or is it too costly of an operation to be worth it?
Quite a bit of my code deals with deeply nested JSON structures from elasticsearch and I didn't like the fragility of the code so I made a small helper class to avoid typos when accessing data from the json.
public static final class S {
public static final String author = "author";
public static final String aspectRatio = "aspectRatio";
public static final String userId = "userId";
... etc etc
That helps me avoid typos like so:
Integer userId = json.get("userid"); // Notice the lower case i. This returns null and fails silently
Integer userId = json.get(S.userId); // If I make a typo here the compiler will tell me.
I was reasonably happy about this, but now I'm second guessing myself. Is this a bad idea for some reason? I haven't seen anyone else do this. That shouldn't cause any duplicate strings to be created because they are created once and then referenced in my parsing code, right?
The problem with a String holding class is that you are using a language against its language design.
Classes are supposed to introduce types. A type that provides no utility, because it's an "Everything that can be said with a string" type is rarely useful. While there are some patterns of this occurring in many programs, typically they introduce more behavior than "all the stuff is here." For example, locale databases provide replacement strings for different languages.
I'd start by carving out the sensible enumerations. Error messages might easily be converted into enums, which have easy auto-convert string representations. That way you get your "typo detection" and a classification built-in.
DiskErrors.DISK_NOT_FOUND
Prompts.ASK_USER_NAME
Prompts.ASK_USER_PASSWORD
The side-effect of changes like this can hit your desired goal; but beware, these kinds of changes often signal the loss of readability.
Readability isn't what you think is easy to read, it's what a person who has never used the code would think is easy to read.
If I were to see a problem with "Your selected hard drive was not found", then I'd look through the code base for a string "Your selected hard drive was not found". That could land me in two places:
In the block of code were the error message was raised.
In a table mapping that string to a name.
In many blocks of code where the same error message is raised.
With the table mapping, I can then do a second search, searching for where the name is used. That can land me with a few scenarios:
It is used in one place.
It is used in many places.
With one place, a kind of code maintenance problem arises. You now have a constant that is not used by any other part of the code maintained in a place that is not near where it is used. This means that to do any change that requires full understanding of the impact, someone has to keep the remote constant's value in mind to know if the logical change should be combined with an updated error message. It's not the updating of the error message that causes the extra chance for error, it's the fact that it is removed from the code being worked on.
With multiple places, I have to cycle through all of matches, which basically is the same effort as the multiple string matches in the first step. So, the table doesn't help me find the source of the error, it just adds extra steps that are not relevant to fixing the issue.
Now the table does have a distinct benefit in one scenario: When all the messages for a specific kind of issue should be updated at the same time. The problem is, that such a scenario is rare, and unlikely to happen. What is more likely to happen is that an error message is not specific enough for a certain scenario; but, after another "scan of all the places it is used" is correct for other scenarios. So the error message is split, instead of updated in place, because the coupling enforced by the lookup table means one cannot modify some of the error messages without creating a new error message.
Problems like this come from developers slipping in features that appeal to developers.
In your case, you're building in an anti-typo system. Let me offer a better solution; because typos are real, and a real problem too.
Write a unit test to capture the expected output. It is rare that you will write the same typo twice, exactly the same way. Yes, it is possible, but coordinated typos will impact both systems the same. If you introduce a spelling error in your lookup table, and introduce it in the usage, the benefit would be a working program, but it would be hard to call it a quality solution (because the typos weren't protected against and are there in duplicate).
Have your code reviewed before submitting it to a build system. Reviews can get out of hand, especially with inflexible reviewers, but a good review should comment on "you spelled this wrong." If possible review the code as a team, so you can point out your ideas as they make their comments. If you have difficultly working with people (or they have difficulty working with people) you will find peer-review hard. I'm sorry if that happens, but if you can get a good peer review, it's the second "best" defense against these issues.
Sorry for the length of this reply, but I hope this gives you a chance to remember to "step back" from a solution and see how it impacts your future actions with the code.
And as for the "" String, focusing on why it is being set would probably be more effective in building a better product than patching the issue with interning (but I don't have access to your code base, so I might be wrong!)
Good luck
Q1: Should I start interning strings or is it too costly of an operation to be worth it?
It is hard to say without more information about how the strings are being created and their typical lifetime, but the general answer is No. It is generally not worth it.
(And interning won't fix your memory leak.)
Here are some of the reasons (a bit hand-wavey I'm afraid):
Interning a String doesn't prevent the string you are interning from being created. Your code still needs to create it and the GC still needs to collect it.
There is a hidden data structure that organizes the interned strings. That uses memory. It also costs CPU to check to see if a string is in the interning data structure and add it if needed.
The GC needs to do special (weak reference like) things with the interning data structure to prevent it from leaking. That is an overhead.
An interned string tends to live longer than a non-interned string. It is more likely to be tenured to the "old" heap, which leads to its lifetime extended even longer ... because the "old" heap is GC'ed less often.
If you are using the G1 collector AND the duplicate strings are typically long lived, you might want to try enabling G1GC string deduplication (see here). Otherwise, you are probably better off just letting the GC deal with the strings. The Java GC's are designed to efficiently deal with with lots of objects (such as strings) being created and thrown away soon after.
If it is your code that is creating the Java strings, then it might be worth tweaking it to avoid creating new zero length strings. Manually interning the zero length strings as per #ControlAltDel's comment is probably not worth the effort.
Finally, if you are going to try to reduce the duplication one way or another, I would strongly advise that you set things up so that you can measure the effects of the optimization:
Do you actually save memory?
Does this affect the rate of GC runs?
Does this affect GC pauses?
Does it affect request times / throughput?
If the measurements say that the optimization hasn't helped, you need to back it out.
Q2: Is this a bad idea for some reason? That shouldn't cause any duplicate strings to be created because they are created once and then referenced in my parsing code, right?
I can't think of any reason not to do that. It certainly doesn't lead directly to creating of duplicate strings.
On the other hand, you won't reduce string duplication simply by doing that. String objects that represent literals get interned automatically.
I have an ArrayList of a custom object A. I need to retrieve 2 variables from A based on certain conditions. Should I simply use for loop to retrieve data from the list each time or create 2 LinkedHashMap and store the required variable in it as key/value pair for faster access later? Which is more efficient? Does creating 2 additional map objects justify the efficiency during search?
List will contain about 100-150 objects so does the two maps.
It will be used by concurrent users on daily basis.
Asking about "efficiency" is like asking about "beauty". What is "efficiency"? I argue that efficiency is what gets the code out soonest without bugs or other misbehavior. What's most efficient in terms of software costs is what saves programmer time, both for initial development and maintenance. In the time it took you to find "answers" on SO, you could have had a correct implementation coded and correct, and still had time to test your alternatives rigorously under controlled conditions to see which made any difference in the program's operation.
If you save 10 ms of program run time at the cost of horridly complex, over-engineered code that is rife with bugs and stupidly difficult to refactor or fix, is that "efficient"?
Furthermore, as phrased, the question is useless on SO. You provided no definition of "efficient" from your context. You provided no information on how the structures in question fit into your project architecture, or the extent of their use, or the size of the problem, or anything else relevant to any definition of "efficiency".
Even if you had, we'd have no more ability to answer such a question than if you asked a roomful of lawyers, "Should I sue so-and-so for what they did?" It all depends. You need advice, if you need advice at all, that is very specific to your situation and the exact circumstances of your development environment and process, your runtime environment, your team, the project goals, budget, and other relevant data.
If you are interested in runtime "efficiency", do the following. Precisely define what exactly you mean by "efficient", including an answer to "how 'efficient' is 'efficient' enough?", and including criteria to measure such "efficiency". Once you have such a precise and (dis)provable definition, then set up a rigorous test protocol to compare the alternatives in your context, and actually measure "efficiency".
When defining "efficiency", make sure that what you define matters. It makes no difference to be "efficient" in an area that has very low project cost or impact, and ignore an area that has huge cost or impact.
Don't expect any meaningful answer for your situation here on SO.
Use LinkedHashMap because it made for key value pair (according to your requirement).because data will increase in production environment.
I'm programming something in Java, for context see this question: Markov Model descision process in Java
I have two options:
byte[MAX][4] mypatterns;
or
ArrayList mypatterns
I can use a Java ArrayList and append a new arrays whenever I create them, or use a static array by calculating all possible data combinations, then looping through to see which indexes are 'on or off'.
Essentially, I'm wondering if I should allocate a large block that may contain uninitialized values, or use the dynamic array.
I'm running in fps, so looping through 200 elements every frame could be very slow, especially because I will have multiple instances of this loop.
Based on theory and what I have heard, dynamic arrays are very inefficient
My question is: Would looping through an array of say, 200 elements be faster than appending an object to a dynamic array?
Edit>>>
More information:
I will know the maxlength of the array, if it is static.
The items in the array will frequently change, but their sizes are constant, therefore I can easily change them.
Allocating it statically will be the likeness of a memory pool
Other instances may have more or less of the data initialized than others
You right really, I should use a profiler first, but I'm also just curious about the question 'in theory'.
The "theory" is too complicated. There are too many alternatives (different ways to implement this) to analyse. On top of that, the actual performance for each alternative will depend on the the hardware, JIT compiler, the dimensions of the data structure, and the access and update patterns in your (real) application on (real) inputs.
And the chances are that it really doesn't matter.
In short, nobody can give you an answer that is well founded in theory. The best we can give is recommendations that are based on intuition about performance, and / or based on software engineering common sense:
simpler code is easier to write and to maintain,
a compiler is a more consistent1 optimizer than a human being,
time spent on optimizing code that doesn't need to be optimized is wasted time.
1 - Certainly over a large code-base. Given enough time and patience, human can do a better job for some problems, but that is not sustainable over a large code-base and it doesn't take account of the facts that 1) compilers are always being improved, 2) optimal code can depend on things that a human cannot take into account, and 3) a compiler doesn't get tired and make mistakes.
The fastest way to iterate over bytes is as a single arrays. A faster way to process these are as int or long types as process 4-8 bytes at a time is faster than process one byte at a time, however it rather depends on what you are doing. Note: a byte[4] is actually 24 bytes on a 64-bit JVM which means you are not making efficient use of your CPU cache. If you don't know the exact size you need you might be better off creating a buffer larger than you need even if you are not using all the buffer. i.e. in the case of the byte[][] you are using 6x time the memory you really need already.
Any performance difference will not be visible, when you set initialCapacity on ArrayList. You say that your collection's size can never change, but what if this logic changes?
Using ArrayList you get access to a lot of methods such as contains.
As other people have said already, use ArrayList unless performance benchmarks say it is a bottle neck.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
I understand the JVM optimizes some things for you (not clear on which things yet), but lets say I were to do this:
while(true) {
int var = 0;
}
would doing:
int var;
while(true) {
var = 0;
}
take less space? Since you aren't declaring a new reference every time, you don't have to specify the type every time.
I understand you really would only need to put var outside of while if I wanted to use it outside of that loop (instead of only being able to use it locally like in the first example). Also, what about objects, would it be different that primitive types in that situation? I understand it's a small situation, but build-up of this kind of stuff can cause my application to take a lot of memory/cpu. I'm trying to use the least amount of operations possible, but I don't completely understand whats going on behind the scenes.
If someone could help me out, even maybe link me to somewhere I can learn about saving cpu by decreasing amount of operations, it would be highly appreciated. Please no books (unless they're free! :D), no way of getting one right now /:
Don't. Premature optimization is the root of all evil.
Instead, write your code as it makes most sense conceptually. Write it thoughtfully, yes. But don't think you can be a 'human compiler' and optimize and still write good code.
Once you have written your code (more or less naively, depending on your level of experience) you write performance tests for it. Try to think of different ways in which the code may be used (many times in a row, from front to back or reversed, many concurrent invocations etc) and try to cover these in test cases. Then benchmark your code.
If you find that some test cases are not performing well, investigate why. Measure parts of the test case to see where the time is going. Zoom into the parts where most time is spent.
Mostly, you will find weird loops where, upon reading the code again, you will think 'that was silly to write it that way. Of course this is slow' and easily fix it. In my experience most performance problems can be solved this way and 'hardcore optimization' is hardly ever needed.
In the end you will find that 99* percent of all performance problems can be solved by touching only 1 percent of the code. The other code never comes into play. This is why you should not 'prematurely' optimize. You will be spending valuable time optimizing code that had no performance issues in the first place. And making it less readable in the process.
Numbers made up of course but you know what I mean :)
Hot Licks points out the fact that this isn't much of an answer, so let me expand on this with some good ol' perfomance tips:
Keep an eye out for I/O
Most performance problems are not in pure Java. Instead they are in interfacing with other systems. In particular disk access is notoriously slow. So is the network. So minimize it's use.
Optimize SQL queries
SQL queries will add seconds, even minutes, to your program's execution time if you don't watch out. So think about those very carefully. Again, benchmark them. You can write very optimized Java code, but if it first spends ten seconds waiting for the database to run some monster SQL query than it will never be fast.
Use the right kind of collections
Most performance problems are related to doing things lots of times. Usually when working with big sets of data. Putting your data in a Map instead of in a List can make a huge difference. Also there are specialized collection types for all sorts of performance requirements. Study them and pick wisely.
Don't write code
When performance really matters, squeezing the last 'drops' out of some piece of code becomes a science all in itself. Unless you are writing some very exotic code, chances are great there will be some library or toolkit to solve your kind of problems. It will be used by many in the real world. Tried and tested. Don't try to beat that code. Use it.
We humble Java developers are end-users of code. We take the building blocks that the language and it's ecosystem provides and tie it together to form an application. For the most part, performance problems are caused by us not using the provided tools correctly, or not using any tools at all for that matter. But we really need specifics to be able to discuss those. Benchmarking gives you that specifity. And when the slow code is identified it is usually just a matter of changing a collection from list to map, or sorting it beforehand, or dropping a join from some query etc.
Attempting to optimise code which doesn't need to be optimised increases complexity and decreases readability.
However, there are cases were improving readability also comes with improved performance.
For example,
if a numeric value cannot be null, use a primitive instead of a wrapper. This makes it clearer that the value cannot be null but also uses less memory and reduces pressure on the GC.
use a Set when you have a collection which cannot have duplicates. Often a List is used when in fact a Set would be more appropriate, depending on the operations you perform, this can also be faster by reducing time complexity.
consider using an enum with one instance for a singleton (if you have to use singletons at all) This is much simpler as well as faster than double check locking. Hint: try to only have stateless singletons.
writing simpler, well structured code is also easier for the JIT to optimise. This is where trying to out smart the JIT with more complex solutions will back fire because you end up confusing the JIT and what you think should be faster is actually slower. (And it's more complicated as well)
try to reduce how much you write to the console (and IO in general) in critical sections. Writing to the console is so expensive, both for the program and the poor human having to read it that is it worth spending more time producing concise console output.
try to use a StringBuilder when you have a loop of elements to add. Note: Avoid using StringBuilder for one liners, just series of append() as this can actually be slower and harder to read.
Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away. --
Antoine de Saint-Exupery,
French writer (1900 - 1944)
Developers like to solve hard problems and there is a very strong temptation to solve problems which don't need to be solved. This is a very common behaviour for developers of up to 10 years experience (it was for me anyway ;), after about this point you have already solved most common problem before and you start selecting the best/minimum set of solutions which will solve a problem. This is the point you want to get to in your career and you will be able to develop quality software in far less time than you could before.
If you dream up an interesting problem to solve, go ahead and solve it in your own time, see what difference it makes, but don't include it in your working code unless you know (because you measured) that it really makes a difference.
However, if you find a simpler, elegant solution to a problem, this is worth including not because it might be faster (thought it might be), but because it should make the code easier to understand and maintain and this is usually far more valuable use of your time. Successfully used software usually costs three times as much to maintain as it cost to develop. Do what will make the life of the poor person who has to understand why you did something easier (which is harder if you didn't do it for any good reason in the first place) as this might be you one day ;)
A good example on when you might make an application slower to improve reasoning, is in the use of immutable values and concurrency. Immutable values are usually slower than mutable ones, sometimes much slower, however when used with concurrency, mutable state is very hard to get provably right, and you need this because testing it is good but not reliable. Using concurrency you have much more CPU to burn so a bit more cost in using immutable objects is a very sensible trade off. In some cases using immutable objects can allow you to avoid using locks and actually improve throughput. e.g. CopyOnWriteArrayList, if you have a high read to write ration.
What are some generic methods for optimizing a program in Java, in terms of speed. I am using a DOM Parser to parse an XML file and then store certain words in an ArrayList, remove any duplicates then spell check those words by creating Google search URL's for each word, get the html document, locate the corrected word and save it to another ArrayList.
Any help would be appreciated! Thanks.
Why do you need to improve performance? From your explanation, it is pretty obvious that the big bottleneck here (or performance hit) is going to be the IO resulting from the fact that you are accessing a URL.
This will surely dwarf by orders of magnitude any minor improvements you make in data structures or XML frameworks.
It is a good general rule of thumb that your big performance problems will involve IO. Humorously enough, I am at this very moment waiting for a database query to return in a batch process. It has been running for almost an hour. But I welcome any suggested improvements to my XML parsing library nevertheless!
Here are my general methods:
Does your program perform any obviously expensive task from the perspective of latency (IO)? Do you have enough logging to see that this is where the delay is (if significant)?
Is your program prone to lock-contention (i.e. can it wait around, doing nothing, waiting for some resource to be "free")? Perhaps you are locking an entire Map whilst you make an expensive calculation for a value to store, blocking other threads from accessing the map
Is there some obvious algorithm (perhaps for data-matching, or sorting) that might have poor characteristics?
Run up a profiler (e.g. jvisualvm, which ships with the JDK itself) and look at your code hotspots. Where is the JVM spending its time?
SAX is faster than DOM. If you don't want to go through the ArrayList searching for duplicates, put everything in a LinkedHashMap -- no duplicates, and you still get the order-of-insertion that ArrayList gives you.
But the real bottleneck is going to be sending the HTTP request to Google, waiting for the response, then parsing the response. Use a spellcheck library, instead.
Edit: But take my educated guesses with a grain of salt. Use a code profiler to see what's really slowing down your program.
Generally the best method is to figure out where your bottleneck is, and fix it. You'll usually find that you spend 90% of your time in a small portion of your code, and that's where you want to focus your efforts.
Once you've figured out what's taking a lot of time, focus on improving your algorithms. For example, removing duplicates from an ArrayList can be O(n²) complexity if you're using the most obvious algorithm, but that can be reduced to O(n) if you leverage the correct data structures.
Once you've figured out which portions of your code are taking the most time, and you can't figure out how best to fix it, I'd suggest narrowing down your question and posting another question here on StackOverflow.
Edit
As #oxbow_lakes so snidely put it, not all performance bottlenecks are to be found in the code's big-O characteristics. I certainly had no intention to imply that they were. Since the question was about "general methods" for optimizing, I tried to stick to general ideas rather than talking about this specific program. But here's how you can apply my advice to this specific program:
See where your bottleneck is. There are a number of ways to profile your code, ranging from high-end, expensive profiling software to really hacky. Chances are, any of these methods will indicate that your program spends the 99% of its time waiting for a response from Google.
Focus on algorithms. Right now your algorithm is (roughly):
Parse the XML
Create a list of words
For each word
Ping Google for a spell check.
Return results
Since most of your time is spent in the "ping Google" phase, an obvious way to fix this would be to avoid doing that step more times than necessary. For example:
Parse the XML
Create a list of words
Send list of words to spelling service.
Parse results from spelling service.
Return results
Of course, in this case, the biggest speed boost would probably be by using spell checker that runs on the same machine, but that isn't always an option. For example, TinyMCE runs as a javascript program within the browser, and it can't afford to download the entire dictionary as part of the web page. So it packages up all the words into a distinct list and performs a single AJAX request to get a list of those words that aren't in the dictionary.
These folks are probably right, but a few random pauses will turn *probably" into "definitely, and here's why".