I need to build a pattern string according to an argument list. If the arguments are "foo", "bar", "data", then pattern should be: "?, ?, ?"
My code is:
List<String> args;
...
for(String s : args) {
pattern += "?,";
}
pattern = pattern.substring(0, pattern.length()-1);
It works fine. My only concern is that s is never used, which makes the code feel a little dirty.
Any improvements for this?
I hope something like:
for(args.size()) {
...
}
But apparently there isn't..
You could use the classic for loop with a condition:
for (int i = 0; i < args.size(); i++)
In this case, i is being used as a counting variable.
Other than that, there aren't any improvements to be made, and none are really needed.
I usually do that in Haskell / Python style - naming it with "_". That way it's sort of obvious that variable is intentionally unused:
int n = 0;
for (final Object _ : iterable) { ++n; }
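IntelliJ still complains, though :) Note that this only compiles on Java 8 and earlier or on Java 22 and later: from Java 9 on, _ is a reserved keyword, and it only became legal again, as an unnamed variable, in Java 22.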
Another option is to use the Java Stream API. It's pretty neat.
String output = args
.stream()
.map( string -> "?" ) // transform each string into a ?
.collect( Collectors.joining( "," ) ); // collect and join on ,
Why not use
for (int i = 0; i < args.size(); i++) {
...
}
You use the for each block if you want to make use of the contents of whatever you iterate on. For example, you use for (String s : args) if you know you're going to use each String value present in args. And it looks like here, you don't need the actual Strings.
If you have Guava around, you could try combining Joiner with Collections.nCopies:
Joiner.on(", ").join(Collections.nCopies(args.size(), "?"));
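If Guava is not around, the same idea should also work with the JDK alone from Java 8 on, since String.join accepts any Iterable of CharSequence. A minimal sketch; the class and method names are made up for illustration:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PlaceholderPattern {
    // Hypothetical helper: one "?" per argument, separated by ", ".
    public static String placeholders(List<String> args) {
        return String.join(", ", Collections.nCopies(args.size(), "?"));
    }

    public static void main(String[] unused) {
        List<String> args = Arrays.asList("foo", "bar", "data");
        System.out.println(placeholders(args)); // prints "?, ?, ?"
    }
}
```

An empty list yields an empty string, so no trailing separator has to be trimmed afterwards.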
What you're looking for is a concept known as a "join". In stronger languages, like Groovy, it's available in the standard library, and you could write, for instance args.join(',') to get what you want. With Java, you can get a similar effect with StringUtils.join(args, ",") from Commons Lang (a library that should be included in every Java project everywhere).
Update: I obviously missed an important part in my original answer. The list of strings needs to be turned into question marks first. Commons Collections, another library that should always be included in Java apps, lets you do that with CollectionUtils.collect(args, new ConstantTransformer<String, String>("?")). (CollectionUtils.transform would not work here: it returns void and modifies the list in place.) Then pass the result to the join that I originally mentioned. Of course, this is getting a bit unwieldy in Java, and a more imperative approach might be more appropriate.
For the sake of comparison, the entire thing can be solved in Groovy and many other languages with something like args.collect{'?'}.join(','). In Java, with the utilities I mentioned, that goes more like:
StringUtils.join(
CollectionUtils.collect(args,
new ConstantTransformer<String, String>("?")),
",");
Quite a bit less readable...
I would suggest using StringBuilder along with the classic for loop here.
String pattern = "";
if (args.size() > 0) {
StringBuilder sb = new StringBuilder("?");
for(int i = 1; i < args.size(); i++) {
sb.append(", ?");
}
pattern = sb.toString();
}
If you don't want to use a for loop (you said it was not concise enough), use a while loop instead:
int count;
String pattern = "";
if ((count = args.size()) > 0) {
StringBuilder sb = new StringBuilder("?");
while (count-- > 1) {
sb.append(", ?");
}
pattern = sb.toString();
}
Also, see When to use StringBuilder?
At the point where you're concatenating in a loop - that's usually when the compiler can't substitute StringBuilder by itself.
Related
I've got a set of path strings:
/content/example-site/global/library/about/contact/thank-you.html
/content/example-site/global/corporate/about/contact/thank-you.html
/content/example-site/countries/uk/about/contact/thank-you.html
/content/example-site/countries/de/about/contact/thank-you.html
/content/example-site/others/about/contact/thank-you.html
...
(Often the paths are much longer than this)
As you can see it is difficult to notice the differences immediately. That's why I would like to highlight the relevant parts in the strings.
To find the differences I currently calculate the common prefix and suffix of all strings:
String prefix = getCommonPrefix(paths);
String suffix = getCommonSuffix(paths);
for (String path : paths) {
String relevantPath = path.substring(prefix.length(), path.length() - suffix.length());
// OUTPUT: prefix + "<b>" + relevantPath + "</b>" + suffix
}
For the prefix I'm using StringUtils.getCommonPrefix from Commons Lang.
For the suffix I couldn't find a utility (neither in Commons nor in Guava; the latter only has one for exactly two strings). So I had to write my own, similar to the one from Commons Lang.
I'm now wondering whether I missed some function in one of the libraries, or whether there is an easy way with Java 8 streams.
Here is a little hack. I'm not saying it is optimal, but it could be worth following this path if no other option is available:
String[] reversedPaths = new String[paths.length];
for (int i = 0; i < paths.length; i++) {
reversedPaths[i] = StringUtils.reverse(paths[i]);
}
String suffix = StringUtils.reverse(StringUtils.getCommonPrefix(reversedPaths));
You could reverse each path, find the common prefix of these reversed strings, and reverse that prefix to get the common suffix.
Like this:
String commonSuffix = new StringBuffer(
    getCommonPrefix(paths.stream()
        .map(path -> new StringBuffer(path).reverse().toString())
        .collect(Collectors.toList())))
    .reverse().toString();
I personally do not like this solution much, because it creates a new StringBuffer for every path in the list. That is how Java works sometimes, but it is at least ugly, if not bad for performance. You could write your own function
public static String invert(String s) { /* invert s using char[] */ }
if you want.
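For comparison, the suffix can also be found without reversing anything, by walking all strings backwards at the same time. This is only a sketch: commonSuffix is a made-up helper, not something from Commons Lang or Guava, and it compares plain chars, so it does not treat surrogate pairs specially.

```java
public class CommonSuffix {
    // Hypothetical helper: longest suffix shared by all strings, found by
    // comparing characters from the end without reversing anything.
    public static String commonSuffix(String... strs) {
        if (strs == null || strs.length == 0) {
            return "";
        }
        int shortest = Integer.MAX_VALUE;
        for (String s : strs) {
            shortest = Math.min(shortest, s.length());
        }
        int len = 0; // length of the suffix confirmed so far
        outer:
        while (len < shortest) {
            char c = strs[0].charAt(strs[0].length() - 1 - len);
            for (String s : strs) {
                if (s.charAt(s.length() - 1 - len) != c) {
                    break outer; // first mismatch ends the suffix
                }
            }
            len++;
        }
        return strs[0].substring(strs[0].length() - len);
    }

    public static void main(String[] args) {
        System.out.println(commonSuffix(
            "/content/example-site/countries/uk/about/contact/thank-you.html",
            "/content/example-site/countries/de/about/contact/thank-you.html",
            "/content/example-site/others/about/contact/thank-you.html"));
        // prints "/about/contact/thank-you.html"
    }
}
```

No intermediate strings are created apart from the final substring, which sidesteps the concern about allocating a buffer per path.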
I have code as follows :
String s = "";
for (My my : myList) {
s += my.getX();
}
FindBugs always reports an error when I do this.
I would use + if you are manually concatenating,
String word = "Hello";
word += " World!";
However, if you are iterating and concatenating I would suggest StringBuilder,
StringBuilder sb = new StringBuilder();
for (My my : myList) {
sb.append(my.getX());
}
The String object is immutable in Java. Each + means another object. You could use StringBuffer to minimize the amount of created objects.
Each time you do string += string, it calls a method like this:
private String(String s1, String s2) {
if (s1 == null) {
s1 = "null";
}
if (s2 == null) {
s2 = "null";
}
count = s1.count + s2.count;
value = new char[count];
offset = 0;
System.arraycopy(s1.value, s1.offset, value, 0, s1.count);
System.arraycopy(s2.value, s2.offset, value, s1.count, s2.count);
}
In case of StringBuilder, it comes to:
final void append0(String string) {
if (string == null) {
appendNull();
return;
}
int adding = string.length();
int newSize = count + adding;
if (newSize > value.length) {
enlargeBuffer(newSize);
}
string.getChars(0, adding, value, count);
count = newSize;
}
As you can see, string + string creates a lot of overhead, and in my opinion it should be avoided where possible. If you think using StringBuilder is bulky or too long, you can wrap it in a method and use it indirectly, like:
public static String scat(String... vargs) {
StringBuilder sb = new StringBuilder();
for (String str : vargs)
sb.append(str);
return sb.toString();
}
And use it like:
String abcd = scat("a","b","c","d");
In C#, I'm told, this is about the same as string.Concat(). In your case it would be wise to write an overload of scat, like:
public static String scat(Collection<?> vargs) {
StringBuilder sb = new StringBuilder();
for (Object str : vargs)
sb.append(str);
return sb.toString();
}
Then you can call it with:
String result = scat(myList);
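The compiler can rewrite a concatenation of non-constant strings such as
a + b
to
StringBuilder s1 = new StringBuilder();
s1.append(a).append(b);
(Two string literals, as in "foo" + "bar", are folded into a single constant at compile time instead.)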
However, this is still suboptimal since the builder starts with a default capacity of 16. As with many things, though, you should find your biggest bottlenecks and work your way down the list. It doesn't hurt to be in the habit of using a StringBuilder pattern from the get-go, especially if you're able to calculate an optimal initial size.
Premature optimization can be bad as well as it often reduces readability and is usually completely unnecessary. Use + if it is more readable unless you actually have an overriding concern.
It is not 'always bad' to use "+". Using StringBuffer everywhere can make code really bulky.
If someone put a lot of "+" in the middle of an intensive, time-critical loop, I'd be annoyed. If someone put a lot of "+" in a rarely-used piece of code I would not care.
I would say use plus in the following:
String c = "a" + "b"
And use StringBuilder class everywhere else.
As already mentioned in the first case it will be optimized by the compiler and it's more readable.
One reason FindBugs may complain about the concatenation operator (be it "+" or "+=") is localizability. In the example you gave it is not so apparent, but in the following code it is:
String result = "Scanning found " + Integer.toString(numberOfViruses) + " viruses";
If this looks somewhat familiar, you need to change your coding style. The problem is that it sounds great in English but could be a nightmare for translators, because you cannot guarantee that the word order of the sentence stays the same after translation: some languages need "1 blah blah", others "blah blah 3". In such cases you should always use MessageFormat.format() to build compound sentences; using the concatenation operator is clearly an internationalization bug.
BTW, I put another i18n defect in there. Can you spot it?
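To make the MessageFormat suggestion concrete, here is a minimal sketch. The class and method names are made up, and the pattern is inlined for brevity; in a real application it would presumably come from a resource bundle so that translators can move {0} wherever their language needs it.

```java
import java.text.MessageFormat;
import java.util.Locale;

public class ScanMessage {
    // Hypothetical helper: formats a translatable pattern with one argument.
    // The locale is passed explicitly so number formatting is predictable.
    public static String scanResult(String pattern, Locale locale, int numberOfViruses) {
        return new MessageFormat(pattern, locale).format(new Object[] { numberOfViruses });
    }

    public static void main(String[] args) {
        System.out.println(scanResult("Scanning found {0} viruses", Locale.ENGLISH, 3));
        // prints "Scanning found 3 viruses"
    }
}
```

A translation that puts the number elsewhere only needs a different pattern string; the calling code stays the same.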
The running time of concatenating two strings is proportional to the length of the strings. If it is done in a loop, each iteration takes longer than the last as the string grows. So if concatenation is needed in a loop, it's better to use StringBuilder, like Anthony suggested.
I read a lot about using StringBuffer and String especially where concatenation is concerned in Java and whether one is thread safe or not.
So, in various Java methods, which should be used?
For example, in a PreparedStatement, should query be a StringBuffer:
String query = ("SELECT * " +
"FROM User " +
"WHERE userName = ?;");
try {
ps = connection.prepareStatement(query);
And then again, in String utility methods like:
public static String prefixApostrophesWithBackslash(String stringIn) {
String stringOut = stringIn.replaceAll("'", "\\\\'");
return stringOut;
}
And:
// Removes a char from a String.
public static String removeChar(String stringIn, char c) {
String stringOut = ("");
for (int i = 0; i < stringIn.length(); i++) {
if (stringIn.charAt(i) != c) {
stringOut += stringIn.charAt(i);
}
}
return stringOut;
}
Should I be using StringBuffers? Especially where replaceAll is not available for such objects anyway.
Thanks
Mr Morgan.
Thanks for all the advice. StringBuffers have been replaced with StringBuilders and Strings replaced with StringBuilders where I've thought it best.
You almost never need to use StringBuffer.
Instead of StringBuffer you probably mean StringBuilder. A StringBuffer is like a StringBuilder except that it also offers thread safety. This thread safety is rarely needed in practice and will just cause your code to run more slowly.
Your question doesn't seem to be about String vs StringBuffer, but about using built-in methods or implementing the code yourself. If there is a built-in method that does exactly what you want, you should probably use it. The chances are it is much better optimized than the code you would write.
There is no simple answer (apart from repeating the mantra of StringBuilder versus StringBuffer ... ). You really have to understand a fair bit about what goes on "under the hood" in order to pick the most efficient solution.
In your first example, String is the way to go. The Java compiler can generate pretty much optimal code (using a StringBuilder if necessary) for any expression consisting of a sequence of String concatenations. And, if the strings that are concatenated are all constants or literals, the compiler can actually do the concatenation at compile time.
In your second example, it is not entirely clear whether String or StringBuilder would be better ... or whether they would be roughly equivalent. One would need to look at the code of the java.util.regex.Matcher class to figure this out.
EDIT - I looked at the code, and actually it makes little difference whether you use a String or StringBuilder as the source. Internally the Matcher.replaceAll method creates a new StringBuilder and fills it by appending chunks from the source String and the replacement String.
In your third example, a StringBuilder would clearly be best. A current generation Java compiler is not able to optimize the code (as written) to avoid creating a new String as each character is added.
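As a rough illustration of that third case, the loop from the question could be rewritten along these lines. This is a sketch that keeps the original method name and behaviour; only the accumulation changes.

```java
public class RemoveChar {
    // Same behaviour as removeChar in the question, but appending to one
    // StringBuilder instead of creating a new String per kept character.
    public static String removeChar(String stringIn, char c) {
        StringBuilder out = new StringBuilder(stringIn.length());
        for (int i = 0; i < stringIn.length(); i++) {
            char current = stringIn.charAt(i);
            if (current != c) {
                out.append(current);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeChar("banana", 'a')); // prints "bnn"
    }
}
```

Sizing the builder to the input length up front means it never has to grow, since the result cannot be longer than the input.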
For the below segment of code
// Removes a char from a String.
public static String removeChar(String stringIn, char c) {
String stringOut = ("");
for (int i = 0; i < stringIn.length(); i++) {
if (stringIn.charAt(i) != c) {
stringOut += stringIn.charAt(i);
}
}
return stringOut;
}
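You could just do stringIn.replace(String.valueOf(c), ""). (stringIn.replaceAll(c + "", "") looks equivalent, but replaceAll treats its first argument as a regex, so it misbehaves when c is a metacharacter such as '.'.)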
Even in MT code, it's unusual to have multiple threads append stuff to a string. StringBuilder is almost always preferred to StringBuffer.
Modern compilers optimize the code already. Some String additions will be optimized to use StringBuilder, so we can keep the String additions if we think they increase readability.
Example 1:
String query = ("SELECT * " +
"FROM User " +
"WHERE userName = ?;");
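will be optimized. Because every operand here is a literal, the compiler folds them into a single constant; if any operand were a variable, it would generate something like:
StringBuilder sb = new StringBuilder();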
sb.append("SELECT * ");
sb.append("FROM User ");
sb.append("WHERE userName = ?;");
String query = sb.toString();
Example 2:
String numbers = "";
for (int i = 0;i < 20; i++)
numbers = numbers + i;
This can't be optimized and we should use a StringBuilder in code.
I made this observation for Sun JDK 1.5+. For older Java versions or different JDKs it can be different. There it could be safer to always code with StringBuilder (or StringBuffer for JDK 1.4.2 and older).
For cases which can be considered single threaded, the best would be StringBuilder. It does not add any synchronization overhead, while StringBuffer does.
String concatenation with the '+' operator is "good" only when you're too lazy to use StringBuilder, or when you just want to keep the code easily readable and the performance cost is acceptable, as in a startup log message like LOG.info("Starting instance " + inst_id + " of " + app_name);
(java 1.5)
I have a need to build up a String, in pieces. I'm given a set of (sub)strings, each with a start and end point of where they belong in the final string. Was wondering if there were some canonical way of doing this. This isn't homework, and I can use any licensable OSS, such as jakarta commons-lang StringUtils etc.
My company has a solution using a CharBuffer, and I'm content to leave it as is (and add some unit tests, of which there are none (?!)) but the code is fairly hideous and I would like something easier to read.
As I said this isn't homework, and I don't need a complete solution, just some pointers to libraries or java classes that might give me some insight. The String.Format didn't seem QUITE right...
I would have to honor inputs too long and too short, etc. Substrings would be overlaid in the order they appear (in case of overlap).
As an example of input, I might have something like:
String:start:end
FO:0:3 (string shorter than field)
BAR:4:5 (String larger than field)
BLEH:5:9 (String overlays previous field)
I'd want to end up with
FO  BBLEH
01234567890
(Edit: To all - StringBuilder (and specifically, the "pre-allocate to a known length, then use .replace()" theme) seems to be what I'm thinking of. Thanks to all who suggested it!)
StringBuilder output = new StringBuilder();
// for each input element
{
while (output.length() < start)
{
output.append(' ');
}
output.replace(start, end, string);
}
You could also establish the final size of output before inserting any string into it. You could make a first pass through the input elements to find the largest end. This will be the final size of output.
char[] spaces = new char[size];
Arrays.fill(spaces, ' ');
output.append(spaces);
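Putting the pre-sized, space-filled buffer together with the overlay rules from the question, a self-contained version might look like the sketch below. It works on a plain char[] rather than a StringBuilder, and it assumes (my reading of the example, not something the question states outright) that end is exclusive and that a string shorter than its field is padded with spaces. The overlay helper is hypothetical.

```java
import java.util.Arrays;

public class FieldOverlay {
    // Hypothetical helper: writes text into buffer[start, end), truncating
    // text that is too long and padding with spaces when it is too short.
    public static void overlay(char[] buffer, String text, int start, int end) {
        int width = Math.min(end, buffer.length) - start;
        int n = Math.min(text.length(), width);
        for (int i = 0; i < width; i++) {
            buffer[start + i] = i < n ? text.charAt(i) : ' ';
        }
    }

    public static void main(String[] args) {
        char[] buffer = new char[9];   // size = largest end value
        Arrays.fill(buffer, ' ');
        overlay(buffer, "FO", 0, 3);   // string shorter than field
        overlay(buffer, "BAR", 4, 5);  // string larger than field
        overlay(buffer, "BLEH", 5, 9); // starts right after the previous field
        System.out.println("[" + new String(buffer) + "]"); // prints "[FO  BBLEH]"
    }
}
```

Because the buffer has a fixed size, a later piece can never shift earlier ones, which is the main difference from StringBuilder.replace with a replacement of a different length.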
Will StringBuilder do?
StringBuilder sb = new StringBuilder();
sb.setLength(20);
sb.replace(0, 3, "FO");
sb.replace(4, 5, "BAR");
sb.replace(5, 9, "BLEH");
System.out.println("[" + sb.toString().replace('\0', ' ') + "]");
// prints "[FO  BBLEH            ]"
If I understand your requirements correctly, you should be able to do this with the standard java.lang.StringBuilder:
public class StringAssembler
{
private final StringBuilder builder = new StringBuilder();
public void addPiece(String input, int start, int end)
{
final String actualInput = input.substring(0, end-start+1);
builder.insert(start, actualInput);
}
public String getFullString()
{
return builder.toString();
}
}
In particular, I don't think that the end parameter is strictly necessary, in that all it can do is change the length of the input string, hence the two steps in my addPiece method.
Note that this is not tested, and probably doesn't do the right thing in edge cases, but it should give you something to start from.
You can use StringUtils.rightPad(str, size) to add the necessary number of spaces. And you can use the following to strip the unneeded characters:
if (str.length() > size) {
str = str.substring(0, size);
}
I have a string which contains digits and letters. I wish to split the string into contiguous chunks of digits and contiguous chunks of letters.
Consider the String "34A312O5M444123A".
I would like to output:
["34", "A", "312", "O", "5", "M", "444123", "A"]
I have code which works and looks like:
List<String> digitsAsElements(String str){
StringBuilder digitCollector = new StringBuilder();
List<String> output = new ArrayList<String>();
for (int i = 0; i < str.length(); i++){
char cChar = str.charAt(i);
if (Character.isDigit(cChar))
digitCollector.append(cChar);
else{
output.add(digitCollector.toString());
output.add(""+cChar);
digitCollector = new StringBuilder();
}
}
return output;
}
I considered splitting str twice to get one array containing all the number chunks and one array containing all the letter chunks, then merging the results. I shied away from this as it would harm readability.
I have intentionally avoided solving this with a regex pattern as I find regex patterns to be a major impediment to readability.
Debuggers don't handle them well.
They interrupt the flow of someone reading source code.
Over time, regexes grow organically and become monsters.
They are deeply non intuitive.
My questions are:
How could I improve the readability of the above code?
Is there a better way to do this? A Util class that solves this problem elegantly.
Where do you draw the line between using a regex and coding something similar to what I've written above?
How do you increase the readability/maintainability of regExes?
For this particular task I'd always use a regex instead of hand-writing something similar. The code you have given above is, at least to me, less readable than a simple regular expression (which would be (\d+|[^\d]+) in this case, as far as I can see).
You may want to avoid writing regular expressions that exceed a few lines. Those can be and usually are unreadable and hard to understand, but so is the code they can be replaced with! Parsers are almost never pretty and you're usually better off reading the original grammar than trying to make sense of the generated (or handwritten) parser. Same goes (imho) for regexes which are just a concise description of a regular grammar.
So, in general I'd say banning regexes in favor of code like you've given in your question sounds like a terribly stupid idea. Regular expressions are just a tool, nothing less, nothing more. If something else does a better job of text parsing (say, a real parser, some substring magic, etc.) then use it. But don't throw away possibilities just because you feel uncomfortable with them: others may have fewer problems coping with them, and all people are able to learn.
EDIT: Updated regex after comment by mmyers.
For a utility class, check out java.util.Scanner. There are a number of options in there as to how you might go about solving your problem. I have a few comments on your questions.
Debuggers don't handle them (regular expressions) well
Whether a regex works or not depends on what's in your data. There are some nice plugins you can use to help you build a regex, like QuickREx for Eclipse. Does a debugger actually help you write the right parser for your data?
They interrupt the flow of someone reading source code.
I guess it depends on how comfortable you are with them. Personally, I'd rather read a reasonable regex than 50 more lines of string parsing code, but maybe that's a personal thing.
Over time, regexes grow organically and become monsters.
I guess they might, but that's probably a problem with the code they live in becoming unfocussed. If the complexity of the source data is increasing, you probably need to keep an eye on whether you need a more expressive solution (maybe a parser generator like ANTLR)
They are deeply non intuitive.
They're a pattern matching language. I would say they're pretty intuitive in that context.
How could I improve the readability of the above code?
Not sure, apart from use a regex.
Is there a better way to do this? A Util class that solves this problem elegantly.
Mentioned above, java.util.Scanner.
Where do you draw the line between using a regex and coding something similar to what I've written above?
Personally I use regex for anything reasonably simple.
How do you increase the readability/maintainability of regExes?
Think carefully before extending, and take extra care to comment the code and the regex in detail so that it's clear what you're doing.
Would you be willing to use regexes if it meant solving the problem in one line of code?
// Split at any position that's either:
// preceded by a digit and followed by a non-digit, or
// preceded by a non-digit and followed by a digit.
String[] parts = str.split("(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)");
With the comment to explain the regex, I think that's more readable than any of the non-regex solutions (or any of the other regex solutions, for that matter).
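For anyone who wants to try it, here is the same split wrapped in a small runnable class. Only the wrapper is new; the class and method names are made up.

```java
import java.util.Arrays;

public class SplitDemo {
    // Split at every boundary between a digit and a non-digit,
    // using the lookaround regex from the answer above.
    public static String[] chunks(String str) {
        return str.split("(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(chunks("34A312O5M444123A")));
        // prints "[34, A, 312, O, 5, M, 444123, A]"
    }
}
```

Since the split points are zero-width, no characters are consumed, and consecutive letters stay together in one chunk just like consecutive digits do.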
I would use something like this (warning, untested code). For me this is a lot more readable than trying to avoid regexps. Regexps are a great tool when used in right place.
Commenting methods and providing examples of input and output values in comments also helps.
List<String> digitsAsElements(String str){
Pattern p = Pattern.compile("(\\d+|\\w+)*");
Matcher m = p.matcher(str);
List<String> output = new ArrayList<String>();
for(int i = 1; i <= m.groupCount(); i++) {
output.add(m.group(i));
}
return output;
}
I'm not overly crazy about regex myself, but this seems like a case where they will really simplify things. What you might want to do is put them into the smallest method you can devise, name it aptly, and then put all the control code in another method.
For instance, if you coded a "Grab block of numbers or letters" method, the caller would be a very simple, straight-forward loop just printing the results of each call, and the method you were calling would be well-defined so the intention of the regex would be clear even if you didn't know anything about the syntax, and the method would be bounded so people wouldn't be likely to muck it up over time.
The problem with this is that the regex tools are so simple and well-adapted to this use that it's hard to justify a method call for this.
Since no one seems to have posted correct code yet, I'll give it a shot.
First the non-regex version. Note that I use the StringBuilder for accumulating whichever type of character was seen last (digit or non-digit). If the state changes, I dump its contents into the list and start a new StringBuilder. This way consecutive non-digits are grouped just like consecutive digits are.
static List<String> digitsAsElements(String str) {
StringBuilder collector = new StringBuilder();
List<String> output = new ArrayList<String>();
boolean lastWasDigit = false;
for (int i = 0; i < str.length(); i++) {
char cChar = str.charAt(i);
boolean isDigit = Character.isDigit(cChar);
if (isDigit != lastWasDigit) {
if (collector.length() > 0) {
output.add(collector.toString());
collector = new StringBuilder();
}
lastWasDigit = isDigit;
}
collector.append(cChar);
}
if (collector.length() > 0)
output.add(collector.toString());
return output;
}
Now the regex version. This is basically the same code that was posted by Juha S., but the regex actually works.
private static final Pattern DIGIT_OR_NONDIGIT_STRING =
Pattern.compile("(\\d+|[^\\d]+)");
static List<String> digitsAsElementsR(String str) {
// Match a consecutive series of digits or non-digits
final Matcher matcher = DIGIT_OR_NONDIGIT_STRING.matcher(str);
final List<String> output = new ArrayList<String>();
while (matcher.find()) {
output.add(matcher.group());
}
return output;
}
One way I try to keep my regexes readable is their names. I think DIGIT_OR_NONDIGIT_STRING conveys pretty well what I (the programmer) think it does, and testing should make sure that it really does what it's meant to do.
public static void main(String[] args) {
System.out.println(digitsAsElements( "34A312O5MNI444123A"));
System.out.println(digitsAsElementsR("34A312O5MNI444123A"));
}
prints:
[34, A, 312, O, 5, MNI, 444123, A]
[34, A, 312, O, 5, MNI, 444123, A]
Awww, someone beat me to code. I think the regex version is easier to read/maintain. Also, note the difference in output between the 2 implementations vs the expected output ...
Output:
digitsAsElements1("34A312O5MNI444123A") = [34, A, 312, O, 5, M, , N, , I, 444123, A]
digitsAsElements2("34A312O5MNI444123A") = [34, A, 312, O, 5, MNI, 444123, A]
Expected: [34, A, 312, O, 5, MN, 444123, A]
Compare:
DigitsAsElements.java:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class DigitsAsElements {
static List<String> digitsAsElements1(String str){
StringBuilder digitCollector = new StringBuilder();
List<String> output = new ArrayList<String>();
for (int i = 0; i < str.length(); i++){
char cChar = str.charAt(i);
if (Character.isDigit(cChar))
digitCollector.append(cChar);
else{
output.add(digitCollector.toString());
output.add(""+cChar);
digitCollector = new StringBuilder();
}
}
return output;
}
static List<String> digitsAsElements2(String str){
// Match a consecutive series of digits or non-digits
final Pattern pattern = Pattern.compile("(\\d+|\\D+)");
final Matcher matcher = pattern.matcher(str);
final List<String> output = new ArrayList<String>();
while (matcher.find()) {
output.add(matcher.group());
}
return output;
}
/**
* @param args
*/
public static void main(String[] args) {
System.out.println("digitsAsElements1(\"34A312O5MNI444123A\") = " +
digitsAsElements1("34A312O5MNI444123A"));
System.out.println("digitsAsElements2(\"34A312O5MNI444123A\") = " +
digitsAsElements2("34A312O5MNI444123A"));
System.out.println("Expected: [" +
"34, A, 312, O, 5, MN, 444123, A"+"]");
}
}
You could use this class in order to simplify your loop:
public class StringIterator implements Iterator<Character> {
private final char[] chars;
private int i;
private StringIterator(char[] chars) {
this.chars = chars;
}
public boolean hasNext() {
return i < chars.length;
}
public Character next() {
return chars[i++];
}
public void remove() {
throw new UnsupportedOperationException("Not supported.");
}
public static Iterable<Character> of(String string) {
final char[] chars = string.toCharArray();
return new Iterable<Character>() {
@Override
public Iterator<Character> iterator() {
return new StringIterator(chars);
}
};
}
}
Now you can rewrite this:
for (int i = 0; i < str.length(); i++){
char cChar = str.charAt(i);
...
}
with:
for (Character cChar : StringIterator.of(str)) {
...
}
My 2 cents.
BTW, this class is also reusable in other contexts.