Connect multiple strings efficiently


I've got an interface IO, that offers two methods, void in(String in) and String out(). I've implemented that in a first, naive, version:
private String tmp="";
public void in(String in){
tmp=tmp+in;
}
public String out(){
return tmp;
}
I know this is a horrible implementation if you have multiple, very long Strings: each call has to create a new String of length tmp.length() + in.length(), copy tmp, then copy in, and this is repeated for every inserted String. But what would be a better implementation?
private List<String> tmp = new ArrayList<>(); // maybe use a different list?
public void in(String in){
tmp.add(in);
}
public String out(){
return connect(tmp);
}
private String connect(List<String> l){
if(l.isEmpty()) return "";
if(l.size()==1) return l.get(0);
List<String> half = new ArrayList<>();
for(int i=0;i<l.size();i+=2){
// pair up neighbours; an odd last element is carried over unchanged
half.add(i+1<l.size() ? l.get(i)+l.get(i+1) : l.get(i));
}
return connect(half);
}
This is a bit better: it makes the same number of String concatenations, but the Strings are smaller on average. But it has a huge overhead, and I'm not sure it's worth it. There should be an easier option than this IMHO, too...

You may be looking for a StringBuilder.
private StringBuilder tmp = new StringBuilder();
public void in(String in) {
tmp.append(in);
}
public String out() {
return tmp.toString();
}

The standard library provides a class specifically for efficient string concatenation, the StringBuilder:
https://docs.oracle.com/javase/7/docs/api/java/lang/StringBuilder.html
Note that the compiler will actually desugar your string "additions" into expressions involving StringBuilders, and in a lot of simple/naïve cases it will also optimize the code to make use of the append() method instead of constantly creating new StringBuilders. In your case, however, it is definitely a good idea to explicitly use a StringBuilder.
As for your adventurous attempt at optimizing the concatenation, I honestly don't think you will notice any improvements over the naïve solution, and clean code is always preferable to "slightly faster code", unless clock cycles are extremely expensive.
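If you would rather keep the parts in a list, as in the question's second draft, a minimal sketch could look like the following. It assumes Java 8 or later for String.join, and the class name JoiningIO is made up for illustration. The join builds the result in one linear pass, so the pairwise recursion is not needed:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: collects the parts and joins them once in out().
class JoiningIO {
    private final List<String> parts = new ArrayList<>();

    public void in(String in) {
        parts.add(in);
    }

    public String out() {
        // An empty delimiter simply concatenates all parts in order.
        return String.join("", parts);
    }
}
```

Calling in("foo"), in("bar") and then out() on such an object yields "foobar". For most cases the plain StringBuilder version is still the simpler choice.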

From the Javadoc for StringBuffer:
The StringBuilder class should generally be used in preference to this one, as it supports all of the same operations but it is faster, as it performs no synchronization.
http://download.oracle.com/javase/6/docs/api/java/lang/StringBuffer.html
In short: if your text can change and will only be accessed from a single thread, use a StringBuilder, because StringBuilder is unsynchronized. If your text can change and will be accessed from multiple threads, use a StringBuffer, because StringBuffer is synchronized.
In your case StringBuilder will work just fine.

Related

In Java Is it efficient to use new String() instead of double quotes when string is really large and not using again?

I have so many use cases where I have to initialize a large string and not use the same string anywhere else.
//Code-1
public class Engine{
public void run(){
String q = "fjfjljkljflajlfjalkjdfkljaflkjdllllllllllllllsjfkjdaljdfkdjfnvnvnrrukvnfknv";
//do something
}
}
I call this run method very few times.
In Code-1 the string fjfjljkljflaj..... will get added to the string pool and will never get collected by GC. So I am thinking to initialize with the new operator.
//Code-2
public class Engine{
public void run(){
String q = new String("fjfjljkljflajlfjalkjdfkljaflkjdllllllllllllllsjfkjdaljdfkdjfnvnvnrru");
//do something
}
}
Will the second version save some memory, or are there other factors to consider when deciding which one is more efficient?

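First thing: if we create the object with new String(...), that new object is a separate copy on the heap and is not itself in the pool unless we call the intern() method. Note, however, that the literal passed to the constructor is still a compile-time constant and still ends up in the string pool, so Code-2 holds the pooled literal plus an extra copy and does not save memory.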
In terms of optimization, we should use the String literal notation when possible. It is easier to read and it gives the compiler a chance to optimize our code.
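To see the difference between a literal and a copy made with new String(...), here is a small sketch. The class and member names are invented for illustration:

```java
// Minimal sketch: a pooled literal versus an explicit copy made with new String(...).
class StringPoolDemo {
    static final String LITERAL = "some large literal";

    // Returns a fresh heap object with the same characters as the literal.
    static String copyOfLiteral() {
        return new String("some large literal");
    }

    public static void main(String[] args) {
        String copy = copyOfLiteral();
        System.out.println(copy == LITERAL);          // different objects
        System.out.println(copy.equals(LITERAL));     // same contents
        System.out.println(copy.intern() == LITERAL); // intern() returns the pooled instance
    }
}
```

The copy is a second object with the same contents, while the literal itself still lives in the pool.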

Surround string with another string

Is there any utility method in Java that would let me surround a string with another string? Something like:
surround("hello","%");
which would return "%hello%"
I need just one method so the code would be nicer than adding a prefix and a suffix. Also, I don't want to have a custom utils class if it's not necessary.

String.format can be used for this purpose:
String s = String.format("%%%s%%", "hello");

No but you can always create a method to do this:
public String surround(String str, String surroundingStr) {
StringBuffer buffer = new StringBuffer();
buffer.append(surroundingStr).append(str).append(surroundingStr);
return buffer.toString();
}
There is another way of doing it, but do not use it if you want the best performance:
public String surround(String str, String surroundingStr){
return surroundingStr + str + surroundingStr;
}
Why not use the second method?
As we all know, Strings in Java are immutable. When you concatenate three strings naively, two new String objects are created in addition to your original strings str and surroundingStr. So a total of 4 String objects are involved:
1. str
2. surroundingStr
3. surroundingStr + str
4. (surroundingStr + str) + surroundingStr
And creating objects takes time. So in the long run, the second method can degrade your performance in terms of space and time. It's your choice which method to use.
However, this is not the case from Java 1.4 onwards, as the compiler translates concatenation with the + operator into StringBuffer calls (StringBuilder since Java 5) in the background. So using the second method is not a problem if your Java version is 1.4 or above. Still, if you want to concatenate strings in a loop, be careful.
My suggestion:
Use either StringBuffer or StringBuilder.

Not that I know of, but as already commented, it's a single line of code that you could write yourself.
private String SurroundWord(String word, String surround){
return surround + word + surround;
}
Do note that this will return a new String object and not modify the original string.

Create a new method:
public String surround(String s, String surr){
return surr+s+surr;
}

Tested the following and returns %hello%
public static void main (String[] args) throws java.lang.Exception
{
System.out.println(surround("hello", "%"));
}
public static String surround(String s, String sign) {
return sign + s + sign;
}

StringUtils.wrap(str,wrapWith) is what you are looking for.
If Apache Commons Lang is already one of your dependencies, you can use it. Otherwise, as others have already mentioned, it's better to add a small method to your own code base. Not a big deal.
https://github.com/apache/commons-lang/blob/master/src/main/java/org/apache/commons/lang3/StringUtils.java

Is it a bad practice to use arrays as parameter(s) to return multiple values

I sometimes (actually, often) find myself using a one-element array to return multiple values from a method. Something like this:
public static int foo(int param1, int param2[], String param3[])
{
// method body
....
// set return values
param2[0] = <some value>;
param3[0] = <some value>;
return <some value>;
}
Is this a bad practice? (It seems like it is because some of my friends said they didn't know what it was doing for 2 seconds!)
But the reason I used this in the first place was that it looked closest to what is known as pass-by-reference in C++. And the practice wasn't discouraged in C++, so ...
But if this is really a wrong way of doing things, any idea how to rewrite this in the clean way?
Thanks

Create an object that contains the data you want to return.
Then you can return an instance of that object.
class FooData {
private int someInt;
private int anotherInt;
private String someString;
public FooData(int a, int b, String c) {
someInt = a;
anotherInt = b;
someString = c;
}
}
public FooData foo() {
// do stuff
FooData fd = new FooData(blah, blahh, blahhh);
return fd;
}

While I agree with the general opinion here that using arrays for such a purpose is bad practice, I'd like to add a few things.
Are you sure that "pass by reference" really is what you need in the first place?
Many have said that your code is bad style, but now let me tell you why that is IMHO.
"Pass by reference" is mostly a synonym for "programming by side effect" which is a thing you always want to avoid. It makes code much harder to debug and understand, and in a multi-threaded environment, the bad effects of this attitude really can hit you hard.
To write scalable and thread-safe code in Java, you should make objects "read-only" as much as possible, i.e. ideally, you create an object and initialize it at the same time, then use it with this unmodifiable state throughout your application. Logical changes to the state can almost always be considered a "creation" of new state, i.e. creation of a new instance initialized to a state then needed. Many modern scripting languages only let you work in this way, and it makes things much easier to understand.
As opposed to C++, Java is much more efficient in allocating and releasing short-lived objects, so there is actually nothing wrong with what others here have suggested: To create an instance of a special class to hold the function result, just for the purpose of returning the result. Even if you do that in a loop, the JVM will be smart enough to deal with that efficiently. Java will only allocate memory from the OS in very large chunks when needed, and will deal with object creation and release internally without the overhead involved in languages like C/C++. "Pass by reference" really doesn't help you very much in Java.
EDIT: I suggest you search this forum or the net for the terms "side-effect", "functional programming" or "immutability". This will most likely open a new perspective to your question.
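The "create a new instance instead of mutating" idea can be sketched roughly like this. Account and withBalance are illustrative names, not anything from the question:

```java
// Hypothetical immutable value class: all fields are final and there are no setters.
final class Account {
    private final String owner;
    private final int balance;

    Account(String owner, int balance) {
        this.owner = owner;
        this.balance = balance;
    }

    String getOwner() { return owner; }
    int getBalance() { return balance; }

    // A "change" returns a new instance; the original stays untouched.
    Account withBalance(int newBalance) {
        return new Account(owner, newBalance);
    }
}
```

Because an instance never changes after construction, it can be handed to other methods or threads without worrying about side effects.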

I believe that it is bad practice to "return" values using one-element arrays that are parameters to your method.
Here's another SO question about this topic. In short, it's very bad for readability.
There is an easy workaround: Wrap all values that you wish to return in a class you define specifically for this purpose, and return an instance of that class.
return new ValueHolder(someValue1, someValue2, someValue3);

That's not very idiomatic java. There are usually better approaches to software design.
What you're really doing with the "one-element array" is creating a mutable object (since String is immutable, as are primitives like int) and passing it by reference. Modifying this mutable object is called a "side effect" of the method. In general, you should minimize mutability (Effective Java Item 15) and your methods should be side-effect free. There are a couple approaches here.
1. Split the method into two (or three) methods that all take the same params:
public static int foo1(int param1)
{
// method body
....
return <some value>;
}
Similarly, you might have
public static int foo2(int param1) { ... }
and
public static String foo3(int param1) { ... }.
2. Return a composite object.
public class Container {
private final int originalReturn;
private final int param2;
private final String param3;
public Container(int originalReturn, int param2, String param3) {
this.originalReturn = originalReturn;
this.param2 = param2;
this.param3 = param3;
}
// getters
}
public static Container foo(int param1)
{
// method body
....
// set return values
return new Container(<some value>, <some value>, <some value>);
}

This is indeed bad practice if the values are unrelated. This is usually an indicator that you can split that function into two, with each returning one of the values.
EDIT:
I am assuming that you are returning two values calculated in the method in an array. Is this not the case?
e.g.
public int[] getStatistics(int[] nums)
{
//code
int[] returns = new int[2];
returns[0] = mean;
returns[1] = mode;
return returns;
}
The above function could be split into getMean() and getMode().
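A rough sketch of that split, assuming a non-empty input array and Java 8 or later for Map.merge. The class name Statistics is made up:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: one method per value instead of one array holding both.
class Statistics {
    // Arithmetic mean; assumes nums is non-empty.
    static double getMean(int[] nums) {
        long sum = 0;
        for (int n : nums) {
            sum += n;
        }
        return (double) sum / nums.length;
    }

    // Most frequent value; assumes nums is non-empty.
    // On ties, the value that reaches the highest count first wins.
    static int getMode(int[] nums) {
        Map<Integer, Integer> counts = new HashMap<>();
        int mode = nums[0];
        int best = 0;
        for (int n : nums) {
            int c = counts.merge(n, 1, Integer::sum); // updated count for n
            if (c > best) {
                best = c;
                mode = n;
            }
        }
        return mode;
    }
}
```

Each caller then asks only for the value it needs, and the return types document themselves.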

Passing variables by reference allows the function to "legally" change their value. See this article to clear up the confusion of when this is possible in Java, and when it's not...

This is bad practice if the values are of different types and represent different entities, e.g. name and address. It is fine to create an array of values of the same data type, e.g. a list of addresses.

Why StringBuilder when there is String?

I just encountered StringBuilder for the first time and was surprised since Java already has a very powerful String class that allows appending.
Why a second String class?
Where can I learn more about StringBuilder?

String does not allow appending. Each method you invoke on a String creates a new object and returns it. This is because String is immutable - it cannot change its internal state.
On the other hand StringBuilder is mutable. When you call append(..) it alters the internal char array, rather than creating a new string object.
Thus it is more efficient to have:
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 500; i ++) {
sb.append(i);
}
rather than str += i, which would create 500 new string objects.
Note that in the example I use a loop. As helios notes in the comments, the compiler automatically translates expressions like String d = a + b + c to something like
String d = new StringBuilder(a).append(b).append(c).toString();
Note also that there is StringBuffer in addition to StringBuilder. The difference is that the former has synchronized methods. If you use it as a local variable, use StringBuilder. If it happens that it's possible for it to be accessed by multiple threads, use StringBuffer (that's rarer)

Here is a concrete example on why -
int total = 50000;
String s = "";
for (int i = 0; i < total; i++) { s += String.valueOf(i); }
// 4828ms
StringBuilder sb = new StringBuilder();
for (int i = 0; i < total; i++) { sb.append(String.valueOf(i)); }
// 4ms
As you can see the difference in performance is significant.
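If you want to reproduce this kind of measurement yourself, a crude sketch might look like this. It is not a proper benchmark (no warm-up, single run), the class and method names are invented, and the timings will vary by machine and JVM:

```java
// Rough timing sketch: repeated += versus a single StringBuilder.
class ConcatTiming {
    static String withPlus(int total) {
        String s = "";
        for (int i = 0; i < total; i++) {
            s += i; // copies everything accumulated so far on each pass
        }
        return s;
    }

    static String withBuilder(int total) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < total; i++) {
            sb.append(i); // appends into the builder's internal buffer
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int total = 50000;
        long t0 = System.nanoTime();
        withPlus(total);
        long t1 = System.nanoTime();
        withBuilder(total);
        long t2 = System.nanoTime();
        System.out.println("+=: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("StringBuilder: " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Both methods produce the same string; only the amount of copying differs.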

String class is immutable whereas StringBuilder is mutable.
String s = "Hello";
s = s + "World";
The above code creates two objects, because String is immutable.
StringBuilder sb = new StringBuilder("Hello");
sb.append("World");
The above code creates only one object, because StringBuilder is mutable.
Lesson: whenever you need to manipulate, update, or append to a String many times, go for StringBuilder, as it's more efficient than String.

StringBuilder is for, well, building strings. Specifically, building them in a very performant way. The String class is good for a lot of things, but it actually has really terrible performance when assembling a new string out of smaller string parts because each new string is a totally new, reallocated string. (It's immutable) StringBuilder keeps the same sequence in-place and modifies it (mutable).

The StringBuilder class is mutable and unlike String, it allows you to modify the contents of the string without needing to create more String objects, which can be a performance gain when you are heavily modifying a string. There is also a counterpart for StringBuilder called StringBuffer which is also synchronized so it is ideal for multithreaded environments.
The biggest problem with String is that any operation you do with it, will always return a new object, say:
String s1 = "something";
String s2 = "else";
String s3 = s1 + s2; // this is creating a new object.

To be precise, appending all strings with a StringBuilder is O(N), while repeatedly concatenating Strings is O(N^2). Looking at the source code, this is achieved internally by keeping a mutable array of chars. StringBuilder grows that array by roughly doubling its length, which gives amortized O(N) performance at the cost of potentially doubling the required memory. You can call trimToSize at the end to reclaim this, but usually StringBuilder objects are only used temporarily. You can further improve performance by providing a good initial guess of the final string size.
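As a sketch of the "good starting guess" idea: when all parts are known up front, the exact final length can be passed to the StringBuilder(int capacity) constructor so the internal array never has to grow. PresizedBuilder and concat are made-up names:

```java
// Hypothetical helper: sizes the builder once, then appends without resizing.
class PresizedBuilder {
    static String concat(String... parts) {
        int length = 0;
        for (String p : parts) {
            length += p.length();
        }
        // The argument is the initial capacity, not the length of the content.
        StringBuilder sb = new StringBuilder(length);
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }
}
```

When the final size is not known in advance, a rough estimate still saves some of the intermediate resizes.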

Efficiency.
Each time you concatenate strings, a new string will be created. For example:
String out = "a" + "b" + "c";
This creates a new, temporary string, copies "a" and "b" into it to result in "ab". Then it creates another new, temporary string, copies "ab" and "c" into it, to result in "abc". This result is then assigned to out.
The result is a Schlemiel the Painter's algorithm of O(n²) (quadratic) time complexity.
StringBuilder, on the other hand, lets you append strings in-place, resizing the output string as necessary.

StringBuilder is good when you are dealing with larger strings. It helps you to improve performance.
Here is an article that I found helpful.
A quick Google search could have helped you. Now you've hired 7 different people to do a Google search for you. :)

Java has String, StringBuffer and StringBuilder:
String: immutable
StringBuffer: mutable and thread-safe
StringBuilder: mutable but not thread-safe, introduced in Java 1.5
String eg:
public class T1 {
public static void main(String[] args){
String s = "Hello";
for (int i=0;i<10;i++) {
s = s+"a";
System.out.println(s);
}
}
}
Output (10 different Strings are created instead of just 1):
Helloa
Helloaa
Helloaaa
Helloaaaa
Helloaaaaa
Helloaaaaaa
Helloaaaaaaa
Helloaaaaaaaa
Helloaaaaaaaaa
Helloaaaaaaaaaa
StringBuilder eg : Only 1 StringBuilder object will be created.
public class T1 {
public static void main(String[] args){
StringBuilder s = new StringBuilder("Hello");
for (int i=0;i<10;i++) {
s.append("a");
System.out.println(s);
}
}
}

Java: Out with the Old, In with the New

Java is nearing version 7. It occurs to me that there must be plenty of textbooks and training manuals kicking around that teach methods based on older versions of Java, where the methods taught, would have far better solutions now.
What are some boilerplate code situations, especially ones that you see people implement through force of habit, that you find yourself refactoring to utilize the latest versions of Java?

Enums. Replacing
public static final int CLUBS = 0;
public static final int DIAMONDS = 1;
public static final int HEARTS = 2;
public static final int SPADES = 3;
with
public enum Suit {
CLUBS,
DIAMONDS,
HEARTS,
SPADES
}

Generics and no longer needing to create an iterator to go through all elements in a collection. The new version is much better, easier to use, and easier to understand.
EDIT:
Before:
List l = someList;
Iterator i = l.iterator();
while (i.hasNext()) {
MyObject o = (MyObject)i.next();
}
After
List<MyObject> l = someList;
for (MyObject o : l) {
//do something
}

Using local variables of type StringBuffer to perform string concatenation. Unless synchronization is required, it is now recommended to use StringBuilder instead, because this class offers better performance (presumably because it is unsynchronized).

reading a string from standard input:
Java pre-5:
try {
BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
String str = reader.readLine();
reader.close();
}
catch (IOException e) {
System.err.println("error when closing input stream.");
}
Java 5:
Scanner reader = new Scanner(System.in);
String str = reader.nextLine();
reader.close();
Java 6:
Console reader = System.console();
String str = reader.readLine();

Older code using Thread instead of the many other alternatives to Thread... these days, very little of the code I run across still needs to use a raw thread. It would be better served by a level of abstraction, in particular Callable/Futures/Executors.
See:
java.util.Timer
javax.swing.Timer
java.util.concurrent.*
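A minimal sketch of the Callable/Future/Executor style, using only java.util.concurrent classes. The wrapper class and method name are invented, and the pool size of 2 is an arbitrary assumption:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: runs a task on a pool thread instead of a raw Thread.
class ExecutorSketch {
    static int runAndWait(Callable<Integer> task) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Integer> future = pool.submit(task);
            return future.get(); // blocks until the task has finished
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("task did not complete", e);
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
    }
}
```

In real code the pool would normally be created once and shared rather than built per call; it is created inside the method here only to keep the sketch self-contained.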

Here is one that I see:
String.split() versus StringTokenizer.
StringTokenizer is not recommended for new code, but I still see people use it.
As for compatibility, Sun makes a huge effort to have Java be backwards and forwards compatible. That partially accounts for why generics are so complex. Deprecation is also supposed to help ease transitions from old to new code.
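One behavioural difference worth knowing when migrating: StringTokenizer skips empty tokens between consecutive delimiters, while String.split keeps them (and treats its argument as a regular expression). A small sketch with made-up helper names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helpers that tokenize the same input both ways.
class SplitVersusTokenizer {
    static List<String> withTokenizer(String input, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, delimiters);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    static List<String> withSplit(String input, String regex) {
        List<String> tokens = new ArrayList<>();
        for (String t : input.split(regex)) {
            tokens.add(t);
        }
        return tokens;
    }
}
```

For the input "a,b,,c" with "," as the delimiter, the tokenizer version yields a, b, c, while the split version yields a, b, an empty string, and c.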

VARARGS can be useful too.
For example, you can use:
public int add(int... numbers){
int sum = 0 ;
for (int i : numbers){
sum+=i;
}
return sum ;
}
instead of:
public int add(int n1, int n2, int n3, int n4) ;
or
public int add(List<Integer> numbers) ;

Using local variables of type Vector to hold a list of objects. Unless synchronization is required, it is now recommended to use a List implementation such as ArrayList instead, because this class offers better performance (because it is unsynchronized).

Formatted printing was introduced as late as in JDK 1.5. So instead of using:
String str = "test " + intValue + " test " + doubleValue;
or the equivalent using a StringBuilder,
one can use
String str = String.format("test %d test %g", intValue, doubleValue);
The latter is much more readable than both the string concatenation and the StringBuilder versions. Still, I find that people adopt this style very slowly. The Log4j framework, for example, doesn't use this, although I believe it would greatly benefit from doing so.
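For reference, a small sketch of the conversions involved. Note that Java's format strings have no C-style length modifiers: %d takes any integer type and %f, %e, or %g take floating-point values. The class name is invented, and Locale.ROOT is passed only to keep the output independent of the default locale:

```java
import java.util.Locale;

// Hypothetical helper showing %d and %.2f in String.format.
class FormatSketch {
    static String describe(int intValue, double doubleValue) {
        // %d formats an integer, %.2f a floating-point value with two decimals.
        // Locale.ROOT keeps the decimal separator a '.' regardless of the default locale.
        return String.format(Locale.ROOT, "test %d test %.2f", intValue, doubleValue);
    }
}
```

With the arguments 3 and 2.5 this returns "test 3 test 2.50".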

Explicit conversion between primitive and wrapper types (e.g. Integer to int or vice versa) which is taken care of automatically by autoboxing/unboxing since Java 1.5.
An example is
Integer myInteger = 6;
int myInt = myInteger.intValue();
Can simply be written as
Integer myInteger = 6;
int myInt = myInteger;
But watch out for NullPointerExceptions :)
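The NullPointerException caveat in a nutshell: unboxing calls intValue() behind the scenes, so a null Integer blows up at the assignment. A tiny sketch with an invented helper:

```java
// Hypothetical helper: reports whether unboxing the given Integer fails.
class UnboxingSketch {
    static boolean unboxingFails(Integer boxed) {
        try {
            int value = boxed; // compiles to boxed.intValue()
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```

Here unboxingFails(null) returns true, while unboxingFails(6) returns false.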

Q1: Well, the most obvious situations are in the generics / type specific collections. The other one that immediately springs to mind is the improved for loop, which I feel is a lot cleaner looking and easier to understand.
Q2: In general, I have been bundling the JVM along side of my application for customer-facing apps. This allows us to use new language features without having to worry about JVM incompatibility.
If I were not bundling the JRE, I would probably stick to 1.4 for compatibility reasons.

A simple change since 1.5 that makes a small difference: in the Swing API, accessing the contentPane of a JFrame:
myframe.getContentPane().add(mycomponent);
becomes
myframe.add(mycomponent);
And of course the introduction of Enums has changed the way many applications that used constants in the past behave.
String.format() has greatly improved String manipulation and the ternary if statement is quite helpful in making code easier to read.

Generic collections make coding much more bug-resistant.
OLD:
Vector stringVector = new Vector();
stringVector.add("hi");
stringVector.add(528); // oops!
stringVector.add(new Whatzit()); // Oh my, could spell trouble later on!
NEW:
ArrayList<String> stringList = new ArrayList<String>();
stringList.add("hello again");
stringList.add(new Whatzit()); // Won't compile!

Using Iterator:
List list = getTheList();
Iterator iter = list.iterator();
while (iter.hasNext()) {
String s = (String) iter.next();
// .. do something
}
Or an alternate form sometimes seen:
List list = getTheList();
for (Iterator iter = list.iterator(); iter.hasNext();) {
String s = (String) iter.next();
// .. do something
}
Is now all replaced with:
List<String> list = getTheList();
for (String s : list) {
// .. do something
}

Although I admit that static imports can easily be overused, I like to use
import static java.lang.Math.*;
in classes that use a lot of Math functions. It can really decrease the verbosity of your code. I wouldn't recommend it for lesser-known libraries, though, since that can lead to confusion.

copying an existing array to a new array:
pre-Java 5:
int[] src = new int[] {1, 2, 3, 4, 5};
int[] dest = new int[src.length];
System.arraycopy(src, 0, dest, 0, src.length);
Java 6:
int[] src = new int[] {1, 2, 3, 4, 5};
int[] dest = Arrays.copyOf(src, src.length);
formerly, I had to explicitly create a new array and then copy the source elements to the new array (calling a method with a lot of parameters). now, the syntax is cleaner and the new array is returned from a method, I don't have to create it. by the way, the method Arrays.copyOf has a variation called Arrays.copyOfRange, which copies a specific region of the source array (pretty much like System.arraycopy).
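A short sketch of both methods. The wrapper class and method names are made up; the Arrays calls themselves are the standard java.util.Arrays methods:

```java
import java.util.Arrays;

// Hypothetical wrappers around Arrays.copyOfRange and Arrays.copyOf.
class CopySketch {
    // Copies src[from] up to, but not including, src[to].
    static int[] middle(int[] src, int from, int to) {
        return Arrays.copyOfRange(src, from, to);
    }

    // A new length larger than the source pads the copy with zeros.
    static int[] grown(int[] src, int extra) {
        return Arrays.copyOf(src, src.length + extra);
    }
}
```

For {1, 2, 3, 4, 5}, middle(src, 1, 4) gives {2, 3, 4} and grown(src, 2) gives {1, 2, 3, 4, 5, 0, 0}.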

Converting a number to a String:
String s = n + "";
In this case I think there has always been a better way of doing this:
String s = String.valueOf(n);

The new for-each construct for iterating over arrays and collections is the biggest one for me.
These days, whenever I see the boilerplate for loop that iterates over an array one by one using an index variable, it makes me want to scream:
// AGGHHH!!!
int[] array = new int[] {0, 1, 2, 3, 4};
for (int i = 0; i < array.length; i++)
{
// Do something...
}
Replacing the above with the for construct introduced in Java 5:
// Nice and clean.
int[] array = new int[] {0, 1, 2, 3, 4};
for (int n : array)
{
// Do something...
}
Clean, concise, and best of all, it gives meaning to the code rather than showing how to do something.
Clearly, the code has meaning to iterate over the collection, rather than the old for loop saying how to iterate over an array.
Furthermore, as each element is processed independent of other elements, it may allow for future optimizations for parallel processing without having to make changes to the code. (Just speculation, of course.)

Related to varargs; the utility method Arrays.asList() which, starting from Java 5, takes varargs parameters is immensely useful.
I often find myself simplifying something like
List<String> items = new ArrayList<String>();
items.add("one");
items.add("two");
items.add("three");
handleItems(items);
by using
handleItems(Arrays.asList("one", "two", "three"));
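One caveat with this shortcut: the list returned by Arrays.asList is a fixed-size view of the array, so add() and remove() throw UnsupportedOperationException. If the receiving code needs to grow the list, wrap it in an ArrayList. A sketch with made-up helper names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helpers contrasting the fixed-size view with a growable copy.
class AsListSketch {
    // Fixed-size view backed by the varargs array: set() works, add()/remove() do not.
    static List<String> fixedSize() {
        return Arrays.asList("one", "two", "three");
    }

    // Copying into an ArrayList gives a list that can grow.
    static List<String> growable() {
        return new ArrayList<>(Arrays.asList("one", "two", "three"));
    }
}
```

For read-only use, as in the handleItems call, the fixed-size list is all you need.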

Annotations
I'm surprised no one has mentioned them so far, but many frameworks rely on annotations, for example Spring and Hibernate. It is common today to deprecate XML configuration files in favor of annotations in code (though this means losing flexibility by moving from configuration to meta-code, it is often the right choice). The best example is EJB 2 (and older) compared to EJB 3.0, and how programming EJBs has been simplified thanks to annotations.
I find annotations also very useful in combination with some AOP tools like AspectJ or Spring AOP. Such combination can be very powerful.

Changing JUnit 3-style tests:
class Test extends TestCase {
public void testYadaYada() { ... }
}
to JUnit 4-style tests:
class Test {
@Test public void yadaYada() { ... }
}

Improved singleton patterns. Technically these are covered under the popular answer enums, but it's a significant subcategory.
public enum Singleton {
INSTANCE;
public void someMethod() {
...
}
}
is cleaner and safer than
public class Singleton {
public static final Singleton INSTANCE = new Singleton();
private Singleton() {
...
}
public void someMethod() {
...
}
}

Converting classes to use generics, thereby avoiding situations with unnecessary casts.

Okay, now it's my turn to get yelled at.
I don't recommend 90% of these changes.
It's not that it's not a good idea to use them with new code, but breaking into existing code to change a for loop to a for(:) loop is simply a waste of time and a chance to break something. (IIWDFWI) If it works, don't fix it!
If you are at a real development company, that change now becomes something to code-review, test and possibly debug.
If someone doing this kind of a refactor for no reason caused a problem of ANY sort, I'd give them no end of shit.
On the other hand, if you're in the code and changing stuff on that line anyway, feel free to clean it up.
Also, everyone making suggestions in the name of "performance" really needs to learn about the rules of optimization. In two words: Don't! Ever! (Google the "rules of optimization" if you don't believe me.)

I'm a little wary to refactor along these lines if that is all you are doing to your source tree. The examples so far do not seem like reasons alone to change any working code base, but maybe if you are adding new functionality you should take advantage of all the new stuff.
At the end of the day, these examples are not really removing boilerplate code; they are just using the more manageable constructs of newer JDKs to make nicer-looking boilerplate code.
Most ways to make your code elegant are not in the JDK.

Using Swing's new DefaultRowSorter to sort tables versus rolling your own from scratch.

New versions of Java rarely break existing code, so just leave old code alone and focus on how the new features make your life easier.
If you just leave old code alone, then writing new code using new features isn't as scary.

String comparisons, really old school Java programmers I've met would do:
String s1 = "...", s2 = "...";
if (s1.intern() == s2.intern()) {
....
}
(Supposedly for performance reasons)
Whereas these days most people just do:
String s1 = "...", s2 = "...";
if (s1.equals(s2)) {
....
}

Using Vector instead of the new Collections.
Using classes instead of enums
public class Enum
{
public static final Enum FOO = new Enum();
public static final Enum BAR = new Enum();
}
Using Thread instead of the new java.util.concurrent package.
Using marker interfaces instead of annotations

It is worth noting that Java 5.0 has been out for five years now and there have only been minor changes since then. You would have to be working on very old code to be still refactoring it.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
