What is "string bashing" and why is it bad? - java

My boss keeps using the term "string bashing" (we're a Java shop) and usually makes an example out of me whenever I ask him anything (as if, I'm supposed to know it already). I Googled the term only to find results pertaining to theoretical physics and string theory.
I am guessing it has something to do with using String/StringBuilders incorrectly or not in keeping with best practices, but for the life of me, I can't figure out what it is.

"String bashing" is a slang term for cutting up strings and manipulating them: splitting, joining, inserting, tokenizing, parsing, etc..
It's not inherently bad (despite the connotation of "bashing"), but as you point out, in Java, one needs to be careful not to use String when StringBuilder would be more efficient.

Why don't you ask your boss for an example of string bashing.
Don't forget to ask him for the correct way of refactoring the examples he gives you.

Out of context, "string bashing" doesn't really have any meaning in itself. It's not a buzz word for any good or bad behaviour. It would just mean "bashing strings", as in using string operations.
Whether that is good or bad depends on what you are doing, and the role of the strings would not really be important. There are good and bad ways of handling any kind of data.
Sometimes "bashing strings" is actually the best solution. Consider for example that you want to pick out the first three characters of a string. You could create a regular expression that isolates the characters, but that would certainly be overkill as there is a simple string operation that can do the same, which is a lot faster and easier to maintain.

Effective Java has an item about using strings: "Item 50: Avoid strings where other types are more appropriate". Also on stackoverflow: "Stringly typed".

A guess: It might imply something related to creation of unnecessary temporary objects, and in this particular case Strings. For example, if you're constructing a String token by token then it's usually a good idea to use a StringBuilder. If the String is not built using a builder, each concatenation will cause another temporary object to be created (and later garbage collected).
In modern VMs (I'm thinking HotSpot 1.5 or 1.6) this is rarely a problem unless you're in performance critical code or you're building long strings, e.g. in for loops.
Only a guess; might be better to ask what he or she means? I've never heard the term before.

There are a few results on google which refer to string bashing in this context. They don't appear to refer to the concern about the inefficent temporaries and using StringBuilder.
Instead, it appears to refer to simplistic string parsing. I.e. doing stuff like checking for substrings, slicing the string, etc. In particular, it appears to have the implication of it being a hacky solution to the problem.
It might be seen badly because you should either use real parsing or obtain the data in a non-string format.

Related

Why Wrapper Classes do not have pool Similar to Stringpool?

Integer, Character, Double, etc. -- all these are immutable classes like String. String has Stringpool to save memory but why don't these wrappers have similar pools?
I have checked: Integer has a similar pool only up to 127, but not more than that.
Unless someone can find a design document from Gosling, et. al., circa 1994 or so that specifically addresses this, it's impossible to say for certain.
One likely reason is that the complexity and overhead weren't deemed worth the benefit. Strings are A) a lot bigger and B) a lot more common than Integer, Long, and such, as mostly people use primitives whenever they can, only using the wrappers where they can't avoid it.
IMO, String is the most commonly used type in java. As an argument to load a class, a param to connect to DB/network connections, to store (almost) each and every thing - the list is long. Usage scenario for rest other primitives/wrapper types combined together would also be negligible compared to String - in any application.
If used in an un-optimized manner (e.g. implemented without Stringpool), performance would be up for a toss - hence it does make sense to have a pool of (only) String.

Is a new string created every time replaceAll() is used on a String?

So I do know that java Strings are immutable.
There are a bunch of methods that replace characters in a string in java.
So every time these methods are called, would it involve creation of a brand new String, therefore increasing the space complexity, or would replacement be done in the original String itself. I'm a little confused on this concept as to whether each of these replace statements in my code would be generating new Strings each time, and thus consuming more memory?
You noted correctly that String objects in Java are immutable. The only case when replacement, substring, etc. methods do not create a new object is when the replacement is a no-op. For example, if you ask to replace all 'x' characters in a "Hello, world!" string, there would be no new String object created. Similarly, there would be no new object when you call str.substring(0), because the entire string is returned. In all other cases, when the return value differs from the original, a new object is created.
Yes. You have noted it correctly. That immutability of String type has some consequences.
That is why the designers of Java have bring another type that should be used when you perform operations with char sequences.
A class called StringBuilder, should be used when you perform lot of operations that involve characters manipulations like replace. Off course it it more robust and requires more attention to details but that is all when you care about the performance.
So the immutability of String type do not increase memory usage. What increase it is wrong usage of String type.
They generate new ones each time; that is the corollary to being immutable.
It's true, in some sense, that it increases the 'space complexity', in that it uses more memory than the most efficient possible algorithms for replacement, but it's not as bad as it sounds; the transient objects created during the replaceAll operation and others like it are garbage collected very quickly; java is very efficient at garbage collecting transient objects. See http://www.infoq.com/articles/Java_Garbage_Collection_Distilled for an interesting writeup on some garbage collection basics.
It is true that it will return a new String , but unless the call is part of some giant loop or recursive function, one need not worry too much.
But if you purposefully wanted to crash your system, I'm sure you can think up some way.
JDK misses mutating operations for character sequences, i.e. StringBuilder for some reason does not implement replacement functionality.
A possible option would be to use third party libraries, i.e. a MutableString. It is available in Maven Central.

Java String Comparison: style choice or optimization?

I've been taking a look at some GWT code written by various people and there are different ways of comparing strings. I'm curious if this is just a style choice, or if one is more optimized than another:
"".equals(myString);
myString.equals("");
myString.isEmpty();
Is there a difference?
"".equals(myString);
will not throw a NullPointerException if myString is null. That is why a lot of developers use this form.
myString.isEmpty();
is the best way if myString is never null, because it explains what is going on. The compiler may optimize this or myString.equals(""), so it is more of a style choice. isEmpty() shows your intent better than equals(""), so it is generally preferred.
Beware that isEmpty() was added in Java 6, and, unfortunately, there are still people who complain pretty loudly if you don't support Java 1.4.
apache StringUtils provides some convenience methods for, well, String manipulation.
http://commons.apache.org/lang/api/org/apache/commons/lang/StringUtils.html#isBlank(java.lang.CharSequence)
check out that method and associated ones.
myString.isEmpty() is probably best if you are working on a recent version of Java (1.6). It is likely to perform better than myString.equals("") as it only needs to examine one string.
"".equals(myString) has the property of not throwing a null pointer exception if myString is null. However for that reason alone I'd avoid it as it is usually better to fail fast if you hit an unexpected condition. Otherwise some little bug in the future will be very difficult to track down.....
myString.equals("") is the most natural / idiomatic approach for people wanting to keep compatibility with older Java versions, or who just want to be very explicit about what they are comparing to.
Both of the options using "" may require the creation of a temporary String object but the .isEmpty() function shouldn't.
If they bothered to put the .isEmpty() function in I say it is probably best to use it!

Standard API way to check if one array is contained in another

I have two byte[] arrays in a method like this:
private static boolean containsBytes(byte[] body, byte[] checker){
//Code you do not want to ever see here.
}
I want to, using the standard API as much as possible, determine if the series contained in the checker array exists anywhere in the body array.
Right now I'm looking at some nasty code that did a hand-crafted algorithm. The performance of the algorithm is OK, which is about all you can say for it. I'm wondering if there is a more standard api way to accomplish it. Otherwise, I know how to write a readable hand-crafted one.
To get a sense of scale here, the checker array would not be larger than 48 (probably less) and the body might be a few kb large at most.
Not in the standard library (like Jon Skeet said, probably nothing there that does this) but Guava could help you here with its method Bytes.indexOf(byte[] array, byte[] target).
boolean contained = Bytes.indexOf(body, checker) != -1;
Plus, the same method exists in the classes for the other primitive types as well.
I don't know of anything in the standard API to help you here. There may be something in a third party library, although it would potentially need to be implemented repeatedly, once for each primitive type :(
EDIT: I was going to look for Boyer-Moore, but this answer was added on my phone, and I ran out of time :)
Depending on the data and your requirements, you may find that a brute force approach is absolutely fine - and a lot simpler to implement than any of the fancier algorithms available. The simple brute force approach is generally my first port of call - it often turns out to be perfectly adequate :)
You probably already know this, but what you're trying to (re-)implement is basically a string search:
http://en.wikipedia.org/wiki/String_searching_algorithm
The old code might in fact be an implementation of one of the string search algorithms; for better performance, it might be good to implement one of the other algorithms. You didn't mention how often this method is going to be called, which would help to decide whether it's worth doing that.
The collections framework can both cheaply wrap an array in the List interface and search for a sublist. I think this would work reasonably well:
import java.util.Arrays;
import java.util.Collections;
boolean found = Collections.indexOfSubList(Arrays.asList(body), Arrays.asList(checker) >= 0;

When is a Java method name too long? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
Improve this question
In the last weeks I've seen some guys using really long names for a Method or Class (50 characters), this is usually under the premise that it improves readability, my opinion is that a long name like this is an indicator that we are trying to do a lot or too much in a method class if we need such a long name, however I wanted to know what do you guys think about it.
An Example is:
getNumberOfSkinCareEligibleItemsWithinTransaction
A name in Java, or any other language, is too long when a shorter name exists that equally conveys the behavior of the method.
Some techniques for reducing the length of method names:
If your whole program, or class, or module is about 'skin care items' you can drop skin care. For example, if your class is called SkinCareUtils,
that brings you to getNumberOfEligibleItemsWithinTransaction
You can change within to in, getNumberOfEligibleItemsInTransaction
You can change Transaction to Tx, which gets you to getNumberOfEligibleItemsInTx.
Or if the method accepts a param of type Transaction you can drop the InTx altogether: getNumberOfEligibleItems
You change numberOf by count: getEligibleItemsCount
Now that is very reasonable. And it is 60% shorter.
Just for a change, a non-subjective answer: 65536 characters.
A.java:1: UTF8 representation for string "xxxxxxxxxxxxxxxxxxxx..." is too long
for the constant pool
;-)
I agree with everyone: method names should not be too long. I do want to add one exception though:
The names of JUnit test methods, however, can be long and should resemble sentences.
Why?
Because they are not called in other code.
Because they are used as test names.
Because they then can be written as sentences describing requirements. (For example, using AgileDox)
Example:
#Test
public void testDialogClosesDownWhenTheRedButtonIsPressedTwice() {
...
}
See "Behavior Driven Design" for more info on this idea.
Context "...WithinTransaction" should be obvious. That's what object-orientation is all about.
The method is part of a class. If the class doesn't mean "Transaction" -- and if it doesn't save you from having to say "WithinTransaction" all the time, then you've got problems.
Java has a culture of encouraging long names, perhaps because the IDEs come with good autocompletion.
This site says that the longest class name in the JRE is InternalFrameInternalFrameTitlePaneInternalFrameTitlePaneMaximizeButtonWindowNotFocusedState which is 92 chars long.
As for longest method name I have found this one supportsDataDefinitionAndDataManipulationTransactions, which is 52 characters.
Never use a long word when a diminutive one will do.
I don't think your thesis of "length of method name is proportional to length of method" really holds water.
Take the example you give: "getNumberOfSkinCareEligibleItemsWithinTransaction". That sounds to me like it does just one thing: it counts the number of items in a transaction that fall into a certain category. Of course I can't judge without seeing the actual code for the method, but that sounds like a good method to me.
On the other hand, I've seen lots of methods with very short and concise names that do way to much work, like "processSale" or the ever popular "doStuff".
I think it would be tough to give a hard-and-fast rule about method name length, but the goal should be: long enough to convey what the function does, short enough to be readable. In this example, I'd think "getSkinCareCount" would probably have been sufficient. The question is what you need to distinguish. If you have one function that counts skin-care-eligible items in transactions and another that counts skin-care-eligible items in something else, then "withinTransactions" adds value. But if it doesn't mean anything to talk about such items outside of a transaction, then there's no point cluttering up the name with such superfluous information.
Two, I think it's wildly unrealistic to suppose that a name of any manageable length will tell you exactly what the function does in all but the most trivial cases. A realistic goal is to make a name that gives a reader a clue, and that can be remembered later. Like, if I'm trying to find the code that calculates how much antimatter we need to consume to reach warp speed, if I look at function names and see "calibrateTransporter", "firePhasers", and "calcAntimatterBurn", it's pretty clear that the first two aren't it but the third one might be. If I check and find that that is indeed the one I'm looking for, it will be easy to remember that when I come back tomorrow to work on this problem some more. That's good enough.
Three, long names that are similar are more confusing than short names. If I have two functions called "calcSalesmanPay" and "calcGeekPay", I can make a good guess which is which at a quick glance. But if they are called "calculateMonthlyCheckAmountForSalesmanForExportToAccountingSystemAndReconciliation" and "calculateMonthlyCheckAmountForProgrammersForExportToAccountingSystemAndReconciliation", I have to study the names to see which is which. The extra information in the name is probably counter-productive in such cases. It turns a half-second think into a 30-second think.
I tend use the haiku rule for names:
Seven syllable class names
five for variables
seven for method and other names
These are rules of thumb for max names. I violate this only when it improves readability. Something like recalculateMortgageInterest(currentRate, quoteSet...) is better than recalculateMortgageInterestRate or recalculateMortgageInterestRateFromSet since the fact that it involves rates and a set of quotes should be pretty clear from the embedded docs like javadoc or the .NET equivalent.
NOTE: Not a real haiku, as it is 7-5-7 rather than 5-7-5. But I still prefer calling it haiku.
Design your interface the way you want it to be, and make the implementation match.
For example, maybe i'd write that as
getTransaction().getItems(SKIN_CARE).getEligible().size()
or with Java 8 streams:
getTransaction().getItems().stream()
.filter(item -> item.getType() == SKIN_CARE)
.filter(item -> item.isEligible())
.count();
My rule is as follows: if a name is so long that it has to appear on a line of its own, then it is too long. (In practice, this means I'm rarely above 20 characters.)
This is based upon research showing that the number of visible vertical lines of code positively correlates with coding speed/effectiveness. If class/method names start significantly hurting that, they're too long.
Add a comment where the method/class is declared and let the IDE take you there if you want a long description of what it's for.
The length of the method itself is probably a better indicator of whether it's doing too much, and even that only gives you a rough idea. You should strive for conciseness, but descriptiveness is more important. If you can't convey the same meaning in a shorter name, then the name itself is probably okay.
When you are going to write a method name next time , just think the bellow quote
"The man who is going to maintain your code is a phyco who knows where you stay"
That method name is definitely too long. My mind tends to wander when I am reading such sized method names. It's like reading a sentence without spaces.
Personally, I prefer as few words in methods as possible. You are helped if the package and class name can convey meaning. If the responsibility of the class is very concise, there is no need for a giant method name. I'm curious why "WithinTransaction" on there.
"getNumberOfSkinCareEligibleItemsWithinTransaction" could become:
com.mycompany.app.product.SkinCareQuery.getNumEligibleItems();
Then when in use, the method could look like "query.getNumEligibleItems()"
A variable name is too long when a shorter name will allow for better code readability over the entire program, or the important parts of the program.
If a longer name allows you to convey more information about a value. However, if a name is too long, it will clutter the code and reduce the ability to comprehend the rest of the code. This typically happens by causing line wraps and pushing other lines of code off the page.
The trick is determining which will offer better readability. If the variable is used often or several times in a short amount of space, it may be better to give it a short name and use a comment clarify. The reader can refer back to the comment easily. If the variable is used often throughout the program, often as a parameter or in other complicated operations, it may be best to trim down the name, or use acronyms as a reminder to the reader. They can always reference a comment by the variable declaration if they forget the meaning.
This is not an easy trade off to make, since you have to consider what the code reader is likely to be trying to comprehend, and also take into account how the code will change and grow over time. That's why naming things is hard.
Readability is why it's acceptable to use i as a loop counter instead of DescriptiveLoopCounterName. Because this is the most common use for a variable, you can spend the least amount of screen space explaining why it exists. The longer name is just going to waste time by making it harder to understand how you are testing the loop condition or indexing into an array.
On the other end of the spectrum, if a function or variable is used rarely as in a complex operation, such as being passed to a multi-parameter function call, you can afford to give it an overly descriptive name.
As with any other language: when it no longer describes the single action the function performs.
I'd say use a combination of the good answers and be reasonable.
Completely, clearly and readably describe what the method does.
If the method name seems too long--refactor the method to do less.
It's too long when the name of the method wraps onto another line and the call to the method is the only thing on the line and starts pretty close to the margin. You have to take into account the average size of the screen of the people who will be using it.
But! If the name seems too long then it probably is too long. The way to get around it is to write your code in such a way that you are within a context and the name is short but duplicated in other contexts. This is like when you can say "she" or "he" in English instead of someone's full name.
It's too long when it too verbosively explains what the thing is about.
For example, these names are functionally equivalent.
in Java: java.sql.SQLIntegrityConstraintViolationException
in Python/Django: django.db.IntegrityError
Ask yourself, in a SQL/db package, how many more types of integrity errors can you come up with? ;)
Hence db.IntegrityError is sufficient.
An identifier name is too long when it exceeds the length your Java compiler can handle.
There are two ways or points of view here: One is that it really doesn't matter how long the method name is, as long as it's as descriptive as possible to describe what the method is doing (Java best practices basic rule). On the other hand, I agree with the flybywire post. We should use our intelligence to try to reduce as much as possible the method name, but without reducing it's descriptiveness. Descriptiveness is more important :)
A name is too long if it:
Takes more than 1 second to read
Takes up more RAM than you allocate for your JVM
Is something absurdly named
If a shorter name makes perfect sense
If it wraps around in your IDE
Honestly the name only needs to convey its purpose to the the Developers that will utilize it as a public API method or have to maintain the code when you leave. Just remember KISS (keep it simple stupid)

Categories

Resources