I am working on some inherited code and I am not use to the entity frame work. I'm trying to figure out why a previous programmer coded things the way they did, sometimes mixing and matching different ways of querying data.
Deal d = _em.find(Deal.class, dealid);
List<DealOptions> dos = d.getDealOptions();
for(DealOptions o : dos) {
if(o.price == "100") {
//found the 1 item i wanted
}
}
And then sometimes i see this:
Query q = _em.createQuery("select count(o.id) from DealOptions o where o.price = 100 and o.deal.dealid = :dealid");
//set parameters, get results then check result and do whatver
I understand what both pieces of code do, and I understand that given a large dataset, the second way is more efficient. However, given only a few records, is there any reason not to do a query vs just letting the entity do the join and looping over your recordset?
Some reasons never to use the first approach regardless of the number of records:
It is more verbose
The intention is less clear, since there is more clutter
The performance is worse, probably starting to degrade with the very first entities
The performance of the first approach will degrade much more with each added entity than with the second approach
It is unexpected - most experienced developers would not do it - so it needs more cognitive effort for other developers to understand. They would assume you were doing it for a compelling reason and would look for that reason without finding one.
Related
I have a project where I make a request to get some objects. This is the fastest implementation I have after making some tests but I feel I'm missing something. I have 2000 objects in database for my tests and the code before the computation takes 3.25 seconds to execute.
val allSessions = realm.where(Session::class.java).isNotNull("endDate").findAll()
// added for better performances
val sessionsList = realm.copyFromRealm(allSessions)
val sessionGroup1 = mutableListOf<Session>()
val sessionGroup2 = mutableListOf<Session>()
// otherwise the bottleneck is here, the foreach is slow
sessionsList.forEach { session ->
if (session.isGroup1()) {
sessionGroup1.add(session)
} else {
sessionGroup2.add(session)
}
}
// doComputations(), like sums, averages...
I have to access values for all objects to performs sum, averages and so on.
What would be the fastest way to do that?
try making endDate field as Index field using annotation.
from documentation:
Like primary keys, this makes writes slightly slower , but makes reads faster. (It also makes your Realm file slightly larger, to store the index.) It’s best to only add indexes when you’re optimizing the read performance for specific situations.
import io.realm.annotations.Index;
#Index
private String endDate;
I'm glad to have achieved much better results using the following:
#Index does help, not by a lot in my case but still
I made a dedicated computable object as small as possible
I used RealmResults.sum() which is so fast you should absolutely consider it
I reversed the direction of a relationship to speed up queries
Currently at 0.38s while keeping a for loop in my code, otherwise basic calculations takes 0.16s, so I'm in between a 10x to 20x performance gain!
Note: I no longer use realm.copyFromRealm(allSessions)
I have an array list with some names inside it (first and last names). What I have to do is go through each "first name" and see how many times a character (which the user specifies) shows up at the end of every first name in the array list, and then print out the number of times that character showed up.
public int countFirstName(char c) {
int i = 0;
for (Name n : list) {
if (n.getFirstName().length() - 1 == c) {
i++;
}
}
return i;
}
That is the code I have. The problem is that the counter (i) doesn't add 1 even if there is a character that matches the end of the first name.
You're comparing the index of last character in the string to the required character, instead of the last character itself, which you can access with charAt:
String firstName = n.getFirstName()
if (firstName.charAt(firstName.length() - 1) == c) {
i++;
}
When you're setting out learning to code, there is a great value in using pencil and paper, or describing your algorithm ahead of time, in the language you think in. Most people that learn a foreign language start out by assembling a sentence in their native language, translating it to foreign, then speaking the foreign. Few, if any, learners of a foreign language are able to think in it natively
Coding is no different; all your life you've been speaking English and thinking in it. Now you're aiming to learn a different pattern of thinking, syntax, key words. This task will go a lot easier if you:
work out in high level natural language what you want to do first
write down the steps in clear and simple language, like a recipe
don't try to do too much at once
Had I been a tutor marking your program, id have been looking for something like this:
//method to count the number of list entries ending with a particular character
public int countFirstNamesEndingWith(char lookFor) {
//declare a variable to hold the count
int cnt = 0;
//iterate the list
for (Name n : list) {
//get the first name
String fn = n.getFirstName();
//get the last char of it
char lc = fn.charAt(fn.length() - 1);
//compare
if (lc == lookFor) {
cnt++;
}
}
return cnt;
}
Taking the bullet points in turn:
The comments serve as a high level description of what must be done. We write them aLL first, before even writing a single line of code. My course penalised uncommented code, and writing them first was a handy way of getting the requirement out of the way (they're a chore, right? Not always, but..) but also it is really easy to write a logic algorithm in high level language, then translate the steps into the language learning. I definitely think if you'd taken this approach you wouldn't have made the error you did, as it would have been clear that the code you wrote didn't implement the algorithm you'd have described earlier
Don't try to do too much in one line. Yes, I'm sure plenty of coders think it looks cool, or trick, or shows off what impressive coding smarts they have to pack a good 10 line algorithm into a single line of code that uses some obscure language features but one day it's highly likely that someone else is going to have to come along to maintain that code, improve it or change part of what it does - at that moment it's no longer cool, and it was never really a smart thing to do
Aominee, in their comment, actually gives us something like an example of this:
return (int)list.stream().filter(e -> e.charAt.length()-1)==c).count();
It's a one line implementation of a solution to your problem. Cool huh? Well, it has a bug* (for a start) but it's not the main thrust of my argument. At a more basic level: have you got any idea what it's doing? can you look at it and in 2 seconds tell me how it works?
It's quite an advanced language feature, it's trick for sure, but it might be a very poor solution because it's hard to understand, hard to maintain as a result, and does a lot while looking like a little- it only really makes sense if you're well versed in the language. This one line bundles up a facility that loops over your list, a feature that effectively has a tiny sub method that is called for every item in the list, and whose job is to calculate if the name ends with the sought char
It p's a brilliant feature, a cute example and it surely has its place in production java, but it's place is probably not here, in your learning exercise
Similarly, I'd go as far to say that this line of yours:
if (n.getFirstName().length() - 1 == c) {
Is approaching "doing too much" - I say this because it's where your logic broke down; you didn't write enough code to effectively implement the algorithm. You'd actually have to write even more code to implement this way:
if (n.getFirstName().charAt(n.getFirstName().length() - 1) == c) {
This is a right eyeful to load into your brain and understand. The accepted answer broke it down a bit by first getting the name into a temporary variable. That's a sensible optimisation. I broke it out another step by getting the last char into a temp variable. In a production system I probably wouldn't go that far, but this is your learning phase - try to minimise the number of operations each of your lines does. It will aid your understanding of your own code a great deal
If you do ever get a penchant for writing as much code as possible in as few chars, look at some code golf games here on the stack exchange network; the game is to abuse as many language features as possible to make really short, trick code.. pretty much every winner stands as a testament to condense that should never, ever be put into a production system maintained by normal coders who value their sanity
*the bug is it doesn't get the first name out of the Name object
To allow users to search across multiple fields with Lucene 3.5 I currently create and add a QueryParser for each field to be searched to a DisjunctionMaxQuery. This works great when using OR as the default operator but I now want to change the default operator to AND to get more accurate (and fewer) results.
Problem is, queryParser.setDefaultOperator(QueryParser.AND_OPERATOR) misses many documents since all terms must be in atleast 1 field.
For example, consider the following data for a document: title field = "Programming Languages", body field = "Java, C++, PHP". If a user were to search for Java Programming this particular document would not be included in the results since the title nor the body field contains all terms in the query although combined they do. I would want this document returned for the above query but not for the query HTML Programming.
I've considered a catchall field but I have a few problems with it. First, users frequently include per field terms in their queries (author:bill) which is not possible with a catchall field. Also, I highlight certain fields with FastVectorHighlighter which requires them to be indexed and stored. So by adding a catchall field I would have to index most of the same data twice which is time and space consuming.
Any ideas?
Guess I should have done a little more research. Turns out MultiFieldQueryParser provides the exact functionality I was looking for. For whatever reason I was creating a QueryParser for each field I wanted to search like this:
String[] fields = {"title", "body", "subject", "author"};
QueryParser[] parsers = new QueryParser[fields.length];
for(int i = 0; i < parsers.length; i++)
{
parsers[i] = new QueryParser(Version.LUCENE_35, fields[i], analyzer);
parsers[i].setDefaultOperator(QueryParser.AND_OPERATOR);
}
This would result in a query like this:
(+title:java +title:programming) | (+body:java +body:programming)
...which is not what I was looking. Now I create a single MultiFieldQueryParser like this:
MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_35, new String[]{"title", "body", "subject"}, analyzer);
parser.setDefaultOperator(QueryParser.AND_OPERATOR);
This gives me the query I was looking for:
+(title:java body:java) +(title:programming body:programming)
Thanks to #seeta and #femtoRgon for the help!
Perhaps what you need is a combination of Boolean queries that capture the different combinations of fields and terms. In your given example, the query could be -
(title:Java AND body:programming) OR (title:programming AND body:Java).
I don't know if there's an existing Query class that generates this automatically for you, but I think that's what should be the ultimate query that's run on the index.
You want to be able to search multiple fields with the same set of terms, then the question from your comment:
((title:java title:programming) | (body:java body:programming))~0.2
May not be the best implementation.
You're effectively getting either the score from the title, or the score from the body for the combined set of terms. The case where you hit java in the title and programming in the body would be given approx. equal weight to a hit on java in the body and no hit on programming.
I think a better structured query would be:
(title:java body:java)~0.2 (title:programming body:programming)~0.2
This makes more sense to me, since you want the dismax queries to limit score growing on multiple queries of the same term (in different fields), but you do want scoring to grow for hits on different terms, I believe.
If that sort of query structure gets you better score results, limiting results to a certain minimum score (a percentage of the max score returned, rather than a simple hard-coded value) may be adequate to prevent too-weak results from being seen.
I also still wouldn't count out indexing an all field. It's an implementation I've used before, while indexing BOTH the specific field and the catchall field, thus allowing both general querying and specific single-field queries. Index storage tends to be pretty lean for unstored terms, and it will generally help performance, if you find yourself having to create big, complicated queries to make up for not having it.
If you really want to be sure that it takes minimal storage, you can even turn off TermVectors for that field:
new Field(name, value, Field.Store.NO, Field.Index.ANALYZED, Field.TermVector.NO);
Although I don't know how much of a difference that would really make.
I have a variable that gets read and updated thousands of times a second. It needs to be reset regularly. But "half" the time, the value is already the reset value. Is it a good idea to check the value first (to see if it needs resetting) before resetting (a write operaion), or I should just reset it regardless? The main goal is to optimize the code for performance.
To illustrate:
Random r = new Random();
int val = Integer.MAX_VALUE;
for (int i=0; i<100000000; i++) {
if (i % 2 == 0)
val = Integer.MAX_VALUE;
else
val = r.nextInt();
if (val != Integer.MAX_VALUE) //skip check?
val = Integer.MAX_VALUE;
}
I tried to use the above program to test the 2 scenarios (by un/commenting the 2nd "if" line), but any difference is masked by the natural variance of the run duration time.
Thanks.
Don't check it.
It's more execution steps = more cycles = more time.
As an aside, you are breaking one of the basic software golden rules: "Don't optimise early". Unless you have hard evidence that this piece if code is a performance problem, you shouldn't be looking at it. (Note that doesn't mean you code without performance in mind, you still follow normal best practice, but you don't add any special code whose only purpose is "performance related")
The check has no actual performance impact. We'd be talking about a single clock cycle or something, which is usually not relevant in a Java program (as hard-core number crunching usually isn't done in Java).
Instead, base the decision on readability. Think of the maintainer who's going to change this piece of code five years on.
In the case of your example, using my rationale, I would skip the check.
Most likely the JIT will optimise the code away because it doesn't do anything.
Rather than worrying about performance, it is usually better to worry about what it
simpler to understand
cleaner to implement
In both cases, you might remove the code as it doesn't do anything useful and it could make the code faster as well.
Even if it did make the code a little slower it would be very small compared to the cost of calling r.nextInt() which is not cheap.
I have a class that is doing a lot of text processing. For each string, which is anywhere from 100->2000 characters long, I am performing 30 different string replacements.
Example:
string modified;
for(int i = 0; i < num_strings; i++){
modified = runReplacements(strs[i]);
//do stuff
}
public runReplacements(String str){
str = str.replace("foo","bar");
str = str.replace("baz","beef");
....
return str;
}
'foo', 'baz', and all other "targets" are only expected to appear once and are string literals (no need for an actual regex).
As you can imagine, I am concerned about performance :)
Given this,
replaceFirst() seems a bad choice because it won't use Pattern.LITERAL and will do extra processing that isn't required.
replace() seems a bad choice because it will traverse the entire string looking for multiple instances to be replaced.
Additionally, since my replacement texts are the same everytime, it seems to make sense for me to write my own code otherwise String.replaceFirst() or String.replace() will be doing a Pattern.compile every single time in the background. Thinking that I should write my own code, this is my thought:
Perform a Pattern.compile() only once for each literal replacement desired (no need to recompile every single time) (i.e. p1 - p30)
Then do the following for each pX: p1.matcher(str).replaceFirst(Matcher.quoteReplacement("desiredReplacement"));
This way I abandon ship on the first replacement (instead of traversing the entire string), and I am using literal vs. regex, and I am not doing a re-compile every single iteration.
So, which is the best for performance?
So, which is the best for performance?
Measure it! ;-)
ETA: Since a two word answer sounds irretrievably snarky, I'll elaborate slightly. "Measure it and tell us..." since there may be some general rule of thumb about the performance of the various approaches you cite (good ones, all) but I'm not aware of it. And as a couple of the comments on this answer have mentioned, even so, the different approaches have a high likelihood of being swamped by the application environment. So, measure it in vivo and focus on this if it's a real issue. (And let us know how it goes...)
First, run and profile your entire application with a simple match/replace. This may show you that:
your application already runs fast enough, or
your application is spending most of its time doing something else, so optimizing the match/replace code is not worthwhile.
Assuming that you've determined that match/replace is a bottleneck, write yourself a little benchmarking application that allows you to test the performance and correctness of your candidate algorithms on representative input data. It's also a good idea to include "edge case" input data that is likely to cause problems; e.g. for the substitutions in your example, input data containing the sequence "bazoo" could be an edge case. On the performance side, make sure that you avoid the traps of Java micro-benchmarking; e.g. JVM warmup effects.
Next implement some simple alternatives and try them out. Is one of them good enough? Done!
In addition to your ideas, you could try concatenating the search terms into a single regex (e.g. "(foo|baz)" ), use Matcher.find(int) to find each occurrence, use a HashMap to lookup the replacement strings and a StringBuilder to build the output String from input string substrings and replacements. (OK, this is not entirely trivial, and it depends on Pattern/Matcher handling alternates efficiently ... which I'm not sure is the case. But that's why you should compare the candidates carefully.)
In the (IMO unlikely) event that a simple alternative doesn't cut it, this wikipedia page has some leads which may help you to implement your own efficient match/replacer.
Isn't if frustrating when you ask a question and get a bunch of advice telling you to do a whole lot of work and figure it out for yourself?!
I say use replaceAll();
(I have no idea if it is, indeed, the most efficient, I just don't want you to feel like you wasted your money on this question and got nothing.)
[edit]
PS. After that, you might want to measure it.
[edit 2]
PPS. (and tell us what you found)