What ways can you create a string with 2000 "spaces" - java

For various reasons I am trying to set a string to 2000 spaces. Currently I am using:
String s = String.format("%1$-2000s"," ");
This is great for Java 5, however, some of the developers in our department are using 1.4 and this does not work.
I was wondering, are there any other ways of achieving the same result? I know I can do things like a for loop adding a space at a time, but I am looking for something simple like the format option.
For those that may be interested in why I need this, it is because we have an XML type on a dataobject that on insert into the DB is null. It then gets updated with the XML string, usually around 2000 characters in size. In Oracle pre-reserving this space can prevent row migration, therefore, increasing performance.
Thanks!

char[] spacesArray = new char[2000];
Arrays.fill(spacesArray, ' ');
String spaces = new String(spacesArray);

The simplest answer (scroll to see all the code):
String s = " "; // 2000 spaces

You can use lpad(' ',2000,' ') in the insert statement directly to tell Oracle to create the value you want.
In fact, you can set the field in question to have this as the default, which could prevent you from needing to change it in multiple places (if your code is explicitly sending null as the value for the field, that will override the default).

Create a StringBuffer, append a space 2000 times in a loop, and call toString() afterwards. I don't think there is any "simpler" way that doesn't end up doing this under the covers anyway.
If you do this a lot, it would make a good library function.
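As a minimal sketch of such a library function: the class name Padding and the method name spaces are made up for illustration, and it uses the Arrays.fill approach from the other answer rather than the StringBuffer loop. It should compile on Java 1.4 because it avoids generics, String.format and StringBuilder.

```java
import java.util.Arrays;

// Hypothetical utility class; the names are assumptions for illustration.
final class Padding {
    private Padding() {}

    // Returns a string consisting of `count` space characters.
    // Uses only APIs that already exist in Java 1.4.
    static String spaces(int count) {
        if (count < 0) {
            throw new IllegalArgumentException("count must be >= 0: " + count);
        }
        char[] buffer = new char[count];
        Arrays.fill(buffer, ' ');
        return new String(buffer);
    }
}
```

With this in place the original line would become String s = Padding.spaces(2000);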

A random function I found in my personal library:
public static String whiteSpace2(int l) {
    if (l==0) return "";
    String half=whiteSpace2(l/2);
    if ((l&1)!=0) {
        return half+" "+half;
    } else {
        return half+half;
    }
}
Not claiming it is the fastest possible way to generate whitespace, but it works :-)

StringUtils.repeat(" ", 2000) (from commons-lang)
However, I'm not sure whether such micro-optimizations are worth making at the cost of code that needs a five-line comment to explain why it is there. If you do it, be sure to add an extensive comment; otherwise, imagine the reaction of those reading your code.

If nothing else works:
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 2000; ++i) {
    sb.append(" ");
}
String str = sb.toString();

See this other question.
Can I multiply strings in Java to repeat sequences?
Both Apache Commons StringUtils and Google Guava libraries have commands to multiply (repeat) strings.


String .contains VS Set<String> .contains VS Regex String.matches()

I have two sets of strings which are not very long (200~500 words) in two files which looks like this:
File1 File2
this window
that good
word work
java fine
book home
All unique words.
First I read the strings from each file (line by line) and store them in one of two ways:
Set<String> set1 and Set<String> set2, which might look like this: [this, that, word, java, book] and [window, good, work, fine, home]
Or
String str1 and String str2, which might look like this: str1: thisthatwordjava and str2: windowgoodworkfinehome, or str1: this,that,word,java (separated by commas).
Now there are three ways to check whether the word home is present in a Set or String:
To use set1/2.contains("home")
To use str1/2.contains("home")
To use str1/2.matches("home")
All of the above will work fine, but which one is the best?
Note: I am asking because these lookups happen very frequently.
Don't Make Performance Assumptions
What makes you think that String.contains will have "better performance"?
It won't, except for very simple cases, that is if:
your list of strings is short,
the strings to compare are short,
you want to do a one-time lookup.
For all other cases, the Set approach will scale and work better. Sure you'll have a memory overhead for the Set as opposed to a single string, but the O(1) lookups will remain constant even if you want to store millions of strings and compare long strings.
The Right Data-Structure and Algorithm for the Right Job
Use the safer and more robust design, especially as here it's not a difficult solution to implement. And as you mention that you will check frequently, then a set approach is definitely better for you.
Also, String.contains is unsafe here: it matches substrings, so a lookup can succeed for a word that is not in your list. As kennytm said in a comment, if we use your example and you have the "java" string in your list, looking up "ava" will match it, which you apparently don't want.
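To make the false positive concrete, here is a small sketch using the File1 words from the question. The class name ContainsDemo is made up for illustration. Note that the joined string also matches "two", which only exists across the boundary between "that" and "word".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; only the word list comes from the question.
final class ContainsDemo {
    // The File1 words joined without separators, as in the question's str1.
    static final String JOINED = "thisthatwordjavabook";

    static final Set<String> WORDS =
            new HashSet<String>(Arrays.asList("this", "that", "word", "java", "book"));

    public static void main(String[] args) {
        // Substring search finds "ava" inside "java".
        System.out.println("String contains ava: " + JOINED.contains("ava"));
        // It also finds "two", which spans the end of "that" and the start of "word".
        System.out.println("String contains two: " + JOINED.contains("two"));
        // Set membership compares whole elements only.
        System.out.println("Set contains ava: " + WORDS.contains("ava"));
        System.out.println("Set contains java: " + WORDS.contains("java"));
    }
}
```

Joining with commas removes the cross-word matches but not the "ava" inside "java" kind.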
Pick the Right Set
You may want something other than a plain HashSet, though, or you may want to tune its settings (initial capacity and load factor). For instance, you could consider a Guava ImmutableSet, if your set will be created only once but checked very often.
Examples
Here's what I'd do, assuming you want an immutable set (as you say you read the list of strings from a file). This is off-hand and without verification so forgive the lack of ceremonies.
Using Java 8 + Guava
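import java.io.File;
import java.util.Set;
import com.google.common.base.Charsets;
import com.google.common.base.Splitter;
import com.google.common.collect.ImmutableSet;
import com.google.common.io.Files;

// read() throws IOException, so the enclosing method must declare or handle it
final Set<String> lookupTable = ImmutableSet.copyOf(
    Splitter.on(',')
        .trimResults()
        .omitEmptyStrings()
        .split(Files.asCharSource(new File("YOUR_FILE_PATH"), Charsets.UTF_8).read())
);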
Season to taste with correct path, correct charset, and with or without trimming if you want to allow spaces and an empty string.
Using Only Java 8
If you don't want to use Guava and only vanilla Java, then simply do something like this in Java 8 (again, apologies, untested):
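// needs java.nio.file.Files, java.nio.file.Paths, java.util.Arrays, java.util.stream.Collectors
final Set<String> lookupTable =
    Files.lines(Paths.get("YOUR_FILE_PATH"))
        .flatMap(line -> Arrays.stream(line.split(",+"))) // flatten each line's words into one stream
        .collect(Collectors.toSet());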
Using Java < 8
If you have Java < 8, then use the usual FileInputStream to read the file, then String.split() or StringTokenizer to extract an array, and finally add the array entries into a Set.
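A rough sketch of that pre-Java 8 approach, equally untested against a real file. The class and method names are made up for illustration, and the loader takes a Reader so that the source can be a file or anything else.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; the names are assumptions, not part of any library.
final class WordSetLoader {
    private WordSetLoader() {}

    // Reads comma-separated words line by line and collects them in a set.
    // Surrounding whitespace is trimmed and empty entries are skipped.
    static Set<String> load(Reader source) throws IOException {
        Set<String> words = new HashSet<String>();
        BufferedReader reader = new BufferedReader(source);
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] tokens = line.split(",");
                for (int i = 0; i < tokens.length; i++) {
                    String word = tokens[i].trim();
                    if (word.length() > 0) {
                        words.add(word);
                    }
                }
            }
        } finally {
            reader.close();
        }
        return words;
    }
}
```

For a file you would pass something like new InputStreamReader(new FileInputStream("YOUR_FILE_PATH"), "UTF-8"), with the charset adjusted to your data.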
I guess you read the line(s) of the file into a String anyway, so splitting it and storing the substrings in a set isn't worth it if you plan only one query.
A Set should take more memory but less execution time for a lookup, provided the words are stored without the commas (which can be done with a simple split).
But what I really think is the best way is experimental proof, for example by timing both with System.currentTimeMillis().
If you want to know something about performance differences, simply measure them. Here is a test setup for you.
final int WORDS = 10000;
final int SEARCHES = 1000000;
Set<String> strSet = new TreeSet<String>();
String strStr = "";
int[] searches = new int[SEARCHES];
Random randomGenerator = new Random();
// filling set and string
for(int i = 0; i < WORDS; i++){
    strSet.add(String.valueOf(i));
    strStr += "," + String.valueOf(i);
}
// creating searches
for(int i = 0; i < SEARCHES; i++)
    searches[i] = randomGenerator.nextInt(WORDS);
// measure set
long startTime = System.currentTimeMillis();
for(int i = 0; i < SEARCHES; i++)
    strSet.contains(String.valueOf(searches[i]));
System.out.println("set result " + (System.currentTimeMillis() - startTime));
// measure string
startTime = System.currentTimeMillis();
for(int i = 0; i < SEARCHES; i++)
    strStr.contains(String.valueOf(searches[i]));
System.out.println("string result " + (System.currentTimeMillis() - startTime));
For me the output is a meaningful proof that you should stay with a Set
set result 350
string result 14197

Build code by using Concatenation?

Is there a way I can build code by using concatenation in Android Studio/Eclipse?
In other words, I have two sets of strings, one for each country I am dealing with: ZA and KE. They have two different EULAs.
So I would like to pull the string related to the respective country.
String message = mContext.getString(R.string.eula_string_za);
Above is an example of the output code. Is there some way I can go about "creating" it based on if statements?
String str = "mContext.getString(R.string.eula_string_";
if (something = "ZA") {
    str += "za);";
} else {
    str += "ke);";
}
So if the country selected is ZA, then the output code should be
mContext.getString(R.string.eula_string_za);
and if it's KE, it should be
mContext.getString(R.string.eula_string_ke);
and the result would then pull the correct string from strings.xml?
Java is a compiled language, not an interpreted one, so you can't build code as a string and have it executed the way you could in an interpreted language.
The best way to manage different languages in Android is to use a strings.xml file for each language.
Take a look at this tutorial, it will help you a lot :
Supporting different languages in android
If you want to go this route, you could try to use reflection. Have a look at Class.getField(…).
Instead of first building a code string using an if statement, you can also use the same if statement to find the correct string:
String str;
if (something.equals("ZA")) {
    str = mContext.getString(R.string.eula_string_za);
} else {
    str = mContext.getString(R.string.eula_string_ke);
}
Note that your condition something = "ZA" does not do what you think it does: it assigns the string "ZA" to something, and the expression then evaluates to that string rather than to a boolean, so it would not even compile. The comparison operator is ==, but for strings even something == "ZA" does not work in the general case, because it compares object references rather than contents. You need to use String.equals(…). Some even argue you should write it the other way around (i.e. "ZA".equals(something)) to avoid a NullPointerException when something is null.
Another possibility would be to first build a Map from country to the corresponding string ID for all the EULAs you have, and then ask the Map to return the correct one.
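A minimal sketch of the Map idea follows. The class name EulaLookup and the two int constants are stand-ins so the example compiles outside Android; in a real project the values would be R.string.eula_string_za and R.string.eula_string_ke, whose numbers are generated by the build.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup class; names and ID values are assumptions for illustration.
final class EulaLookup {
    // Stand-ins for R.string.eula_string_za and R.string.eula_string_ke.
    static final int EULA_STRING_ZA = 1;
    static final int EULA_STRING_KE = 2;

    private static final Map<String, Integer> EULA_BY_COUNTRY = new HashMap<String, Integer>();
    static {
        EULA_BY_COUNTRY.put("ZA", EULA_STRING_ZA);
        EULA_BY_COUNTRY.put("KE", EULA_STRING_KE);
    }

    private EulaLookup() {}

    // Returns the string resource ID for the country code, or null if the country is unknown.
    static Integer eulaResourceFor(String countryCode) {
        return EULA_BY_COUNTRY.get(countryCode);
    }
}
```

In the app you would then check the result for null and pass it to mContext.getString(…). Adding a third country becomes one more put call instead of another else branch.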
But probably the cleanest solution would be to use Android's built-in mechanism, as hkN suggests.

Is chain of StringBuilder.append more efficient than string concatenation?

NetBeans has a hint named "Use chain of .append methods instead of string concatenation", which is described as follows:
Looks for string concatenation in the parameter of an invocation of the append method of StringBuilder or StringBuffer.
Is StringBuilder.append() really more efficient than string concatenation?
Code sample
StringBuilder sb = new StringBuilder();
sb.append(filename + "/");
vs.
StringBuilder sb = new StringBuilder();
sb.append(filename).append("/");
You have to balance readability with functionality.
Let's say you have the following:
String str = "foo";
str += "bar";
if(baz) str += "baz";
This will create 2 string builders (where you only need 1, really) plus an additional string object for the interim. You would be more efficient if you went:
StringBuilder strBuilder = new StringBuilder("foo");
strBuilder.append("bar");
if(baz) strBuilder.append("baz");
String str = strBuilder.toString();
But as a matter of style, I think the first one looks just fine. The performance benefit of a single object creation seems very minimal to me. Now, if instead of 3 strings, you had 10, or 20, or 100, I would say the performance outweighs the style. If it was in a loop, for sure I'd use the string builder, but I think just a couple strings is fine to do the 'sloppy' way to make the code look cleaner. But... this has a very dangerous trap lurking in it! Read on below (pause to build suspense... dun dun dunnnn)
There are those who say to always use the explicit string builder. One rationale is that your code will continue to grow, and it will usually do so in the same manner as it is already (i.e. they won't take the time to refactor.) So you end up with those 10 or 20 statements each creating their own builder when you don't need to. So to prevent this from the start, they say always use an explicit builder.
So while in your example, it's not going to be particularly faster, when someone in the future decides they want a file extension on the end, or something like that, if they continue to use string concatenation instead of a StringBuilder, they're going to run into performance problems eventually.
We also need to think about the future. Let's say you were making Java code back in JDK 1.1 and you had the following method:
public String concat(String s1, String s2, String s3) {
return s1 + s2 + s3;
}
At that time, it would have been slow because StringBuilder didn't exist.
Then in JDK 1.3 you decided to make it faster by using StringBuffer (StringBuilder still doesn't exist yet). You do this:
public String concat(String s1, String s2, String s3) {
StringBuffer sb = new StringBuffer();
sb.append(s1);
sb.append(s2);
sb.append(s3);
return sb.toString();
}
It gets a lot faster. Awesome!
Now JDK 1.5 comes out, and with it comes StringBuilder (which is faster than StringBuffer) and the automatic translation of
return s1 + s2 + s3;
to
return new StringBuilder().append(s1).append(s2).append(s3).toString();
But you don't get this performance benefit because you're using StringBuffer explicitly. So by being smart, you have caused a performance hit when Java got smarter than you. So you have to keep in mind that there are things out there you won't think of.
Well, your first example is essentially translated by the compiler into something along the lines of:
StringBuilder sb = new StringBuilder();
sb.append(new StringBuilder().append(filename).append("/").toString());
so yes, there is a certain inefficiency here. However, whether it really matters in your program is a different question. Aside from being questionable style (hint: subjective), it usually only matters, if you are doing this in a tight loop.
None of the answers so far explicitly address the specific case that hint is for. It's not saying to always use StringBuilder#append instead of concatenation. But, if you're already using a StringBuilder, it doesn't make sense to mix in concatenation, because it creates a redundant StringBuilder (See Dirk's answer) and an unnecessary temporary String instance.
Several answers already discuss why the suggested way is more efficient, but the main point is, if you already have a StringBuilder instance, just call append on it. It's just as readable (in my opinion, and apparently whoever wrote the NetBeans hint) since you're calling append anyway, and it's a little more efficient.
Theoretically, yes. Because String objects are immutable: once constructed they cannot be changed anymore. So using "+" (concatenation) basically creates a new object each time.
Practically no. The compiler is clever enough to replace all your "+" with StringBuilder appendings.
For a more detailed explanation:
http://kaioa.com/node/59
PS: Netbeans??? Come on!
A concat of two strings is faster using this function.
However, if you have multiple strings or different data type, you should use a StringBuilder either explicitly or implicitly. Using a + with Strings is using a StringBuilder implicitly.
It's only more efficient if you are doing lots of concatenation with really long strings. For general use, such as creating a filename in your example, any string concatenation is just fine and more readable.
At any rate, this part of your application is unlikely to be the performance bottleneck.

Printing an ArrayList of Strings to a PrintWriter with word wrap

Some classmates and I are working on a homework assignment for Java that requires we print an ArrayList of Strings to a PrintWriter using word wrap, so that none of the output passes 80 characters. We've extensively Googled this and can't find any Java API based way to do this.
I know it's generally "wrong" to ask a homework question on SO, but we're just looking for recommendations of the best way to do this, or if we missed something in the API. This isn't the major part of the homework, just a small output requirement.
Ideally, I'd like to be able to wordwrap the ArrayList's toString since it's nicely formatted already.
Well, this is a first for me, it's the first time one of my students has posted a question about one of the projects I've assigned them. The way it was phrased, that he was looking for an algorithm, and the answers you've all shared are just fine with me. However, this is a typical case of trying to make things too complicated. A part of the spec that was not mentioned was that the 80 characters limit was not a hard limit. I said that each line of the output file had to be roughly 80 characters long. It was OK to go over 80 a little. In my version of the solution, I just had a running count and did a modulus of the count to add the line end. I varied the value of the modulus until the output file looked right. This resulted in lines with small numbers being really short so I used a different modulus when the numbers were small. This wasn't a big part of the project and it's interesting that this got so much attention.
Our solution was to create a temporary string and append elements one by one, followed by a comma. Before adding an element, check if adding it will make the string longer than 80 characters and choose whether to print it and reset or just append.
This still has the issue with the extra trailing comma, but that's been dealt with so many times we'll be fine. I was looking to avoid this because it was originally more complicated in my head than it really is.
I think a better solution is to create your own WrapTextWriter that wraps any other writer and overrides the method public void write(String str, int off, int len) throws IOException. That method should loop over the text and perform the wrapping logic.
This logic is not as simple as str.substring(80). If you are dealing with real text and wish to wrap it correctly (i.e. do not cut words, do not move commas or periods to the next line, etc.), you have to implement some logic. It is probably not too complicated, but it is probably language-dependent. For example, in English there is no space between a word and a colon, while in French a space is put between them.
So, I performed 5 second googling and found the following discussion that can help you.
private static final int MAX_CHARACTERS = 80;

public static void main(String[] args)
        throws FileNotFoundException
{
    List<String> strings = new ArrayList<String>();
    int size = 0; // characters already printed on the current line
    PrintWriter writer = new PrintWriter(System.out, true); // Just as example
    for (String str : strings)
    {
        size += str.length();
        if (size > MAX_CHARACTERS)
        {
            // str does not fit: start a new line and count str toward it
            writer.print(System.getProperty("line.separator") + str);
            size = str.length();
        }
        else
            writer.print(str);
    }
    writer.flush(); // auto-flush applies to println(), not to print()
}
You can simply write a function, like "void printWordWrap(List<String> strings)", with that algorithm inside. I think it's a good way to solve your problem. :)
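One way such a function could look, as a sketch under a few assumptions: the class name WordWrap is made up, the writer is passed in as a second parameter, elements are separated by ", " as in the asker's own solution, and every element except the last carries its comma with it so there is no trailing comma at the end. A single element longer than the limit still ends up on a line of its own that exceeds it.

```java
import java.io.PrintWriter;
import java.util.List;

// Hypothetical helper class; names and separator choice are assumptions.
final class WordWrap {
    static final int MAX_CHARACTERS = 80;

    private WordWrap() {}

    // Prints the elements separated by ", " and starts a new line whenever
    // the next element would push the current line past MAX_CHARACTERS.
    static void printWordWrap(List<String> strings, PrintWriter writer) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < strings.size(); i++) {
            // Every element but the last carries its trailing comma.
            String token = strings.get(i) + (i < strings.size() - 1 ? "," : "");
            // +1 accounts for the space that would precede the token.
            if (line.length() > 0 && line.length() + 1 + token.length() > MAX_CHARACTERS) {
                writer.println(line);
                line.setLength(0);
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(token);
        }
        if (line.length() > 0) {
            writer.println(line);
        }
        writer.flush();
    }
}
```

Called as WordWrap.printWordWrap(list, new PrintWriter(System.out)), it produces the same text as joining the list with ", ", just broken across lines.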

The best alternative for String flyweight implementation in Java

My application is multithreaded with intensive String processing. We are experiencing excessive memory consumption and profiling has demonstrated that this is due to String data. I think that memory consumption would benefit greatly from using some kind of flyweight pattern implementation or even cache (I know for sure that Strings are often duplicated, although I don't have any hard data in that regard).
I have looked at Java Constant Pool and String.intern, but it seems that it can provoke some PermGen problems.
What would be the best alternative for implementing application-wide, multithreaded pool of Strings in java?
EDIT: Also see my previous, related question: How does java implement flyweight pattern for string under the hood?
Note: This answer uses examples that might not be relevant in modern runtime JVM libraries. In particular, the substring example is no longer an issue in OpenJDK/Oracle 7+.
I know it goes against what people often tell you, but sometimes explicitly creating new String instances can be a significant way to reduce your memory.
Because Strings are immutable, several methods leverage that fact and share the backing character array to save memory. However, occasionally this can actually increase the memory by preventing garbage collection of unused parts of those arrays.
For example, assume you were parsing the message IDs of a log file to extract warning IDs. Your code would look something like this:
//Format:
//ID: [WARNING|ERROR|DEBUG] Message...
String testLine = "5AB729: WARNING Some really really really long message";
Matcher matcher = Pattern.compile("([A-Z0-9]*): WARNING.*").matcher(testLine);
if ( matcher.matches() ) {
    String id = matcher.group(1);
    //...do something with id...
}
But look at the data actually being stored:
//...
String id = matcher.group(1);
Field valueField = String.class.getDeclaredField("value");
valueField.setAccessible(true);
char[] data = ((char[])valueField.get(id));
System.out.println("Actual data stored for string \"" + id + "\": " + Arrays.toString(data) );
It's the whole test line, because the matcher just wraps a new String instance around the same character data. Compare the results when you replace String id = matcher.group(1); with String id = new String(matcher.group(1));.
This is already done at the JVM level. You only need to ensure that you aren't creating new Strings every time, either explicitly or implicitly.
I.e. don't do:
String s1 = new String("foo");
String s2 = new String("foo");
This would create two instances in the heap. Rather do so:
String s1 = "foo";
String s2 = "foo";
This will create one instance in the heap and both will refer to the same object (as evidence, s1 == s2 will return true here).
Also don't use += to concatenate strings (in a loop):
String s = "";
for (/* some loop condition */) {
    s += "new";
}
The += implicitly creates a new String in the heap every time. Instead, do this:
StringBuilder sb = new StringBuilder();
for (/* some loop condition */) {
    sb.append("new");
}
String s = sb.toString();
If you can, rather use StringBuilder or its synchronized brother StringBuffer instead of String for "intensive String processing". It offers useful methods for exactly those purposes, such as append(), insert(), delete(), etc. Also see its javadoc.
Java 7/8
If you are doing what the accepted answer says and using Java 7 or newer you are not doing what it says you are.
The implementation of substring() has changed.
Never write code that relies on an implementation that can change drastically and might make things worse if you are relying on the old behavior.
public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > count) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    if (beginIndex > endIndex) {
        throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
    }
    return ((beginIndex == 0) && (endIndex == count)) ? this :
        new String(offset + beginIndex, endIndex - beginIndex, value);
}
So if you use the accepted answer with Java 7 or newer, you end up using twice as much memory and creating extra garbage that needs to be collected.
Efficiently pack Strings in memory! I once wrote a hyper-memory-efficient Set class in which Strings were stored as a tree. If a leaf was reached by traversing the letters, the entry was contained in the set. It was fast to work with, too, and ideal for storing a large dictionary.
And don't forget that Strings are often the largest part of memory in nearly every app I have profiled, so do take care of them if you need them.
Illustration:
You have 3 Strings: Beer, Beans and Blood. You can create a tree structure like this:
B
+-e
| +-er
| +-ans
+-lood
This is very efficient for e.g. a list of street names. It is obviously most reasonable with a fixed dictionary, because inserts cannot be done efficiently. In fact, the structure should be created once, then serialized, and afterwards just loaded.
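To show the lookup idea only, here is a small trie-backed set. It is not the class described above: the name TrieSet is made up, and this naive version keeps one HashMap per node and one node per character, so it will probably use more memory than the plain strings. A compact version would merge single-child chains (like "er", "ans" and "lood" in the illustration) and store children in arrays.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal trie; a sketch of the structure, not a tuned implementation.
final class TrieSet {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<Character, Node>();
        boolean terminal; // true if a stored word ends at this node
    }

    private final Node root = new Node();

    // Walks the word letter by letter, creating nodes that are missing.
    void add(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            Node next = node.children.get(c);
            if (next == null) {
                next = new Node();
                node.children.put(c, next);
            }
            node = next;
        }
        node.terminal = true;
    }

    // A word is contained only if the walk succeeds and ends on a terminal node.
    boolean contains(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.get(word.charAt(i));
            if (node == null) {
                return false;
            }
        }
        return node.terminal;
    }
}
```

After adding Beer, Beans and Blood, the shared prefix "Be" is stored once, and contains("Be") is still false because no word ends there.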
First, decide how much your application and developers would suffer if you eliminated some of that parsing. A faster application does you no good if you double your employee turnover rate in the process! I think based on your question we can assume you passed this test already.
Second, if you can't eliminate creating an object, then your next goal should be to ensure it doesn't survive Eden collection. And parse-lookup can solve that problem. However, a cache "implemented properly" (I disagree with that basic premise, but I won't bore you with the attendant rant) usually brings thread contention. You'd be replacing one kind of memory pressure for another.
There's a variation of the parse-lookup idiom that suffers less from the sort of collateral damage you usually get from full-on caching, and that's a simple precalculated lookup table (see also "memoization"). The Pattern you usually see for this is the Type Safe Enumeration (TSE). With the TSE, you parse the String, pass it to the TSE to retrieve the associated enumerated type, and then you throw the String away.
Is the text you're processing free-form, or does the input have to follow a rigid specification? If a lot of your text renders down to a fixed set of possible values, then a TSE could help you here, and serves a greater master: Adding context/semantics to your information at the point of creation, instead of at the point of use.
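As a sketch of that parse-lookup idiom, assuming the input renders down to the three levels from the log format shown earlier in this thread: the enum name LogLevel and the method parse are made up for illustration. The lookup table is filled once during class initialization and only read afterwards, so no locking is involved on lookups.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical type-safe enumeration; the constants mirror the
// "WARNING|ERROR|DEBUG" log format used as an example earlier.
enum LogLevel {
    WARNING, ERROR, DEBUG;

    // Precalculated lookup table, populated once and never modified afterwards.
    private static final Map<String, LogLevel> BY_NAME = new HashMap<String, LogLevel>();
    static {
        for (LogLevel level : values()) {
            BY_NAME.put(level.name(), level);
        }
    }

    // Returns the shared constant for the parsed text, or null if it is unknown.
    static LogLevel parse(String text) {
        return BY_NAME.get(text);
    }
}
```

The parsed String can be dropped right after the call, so it never needs to outlive a young-generation collection, and the rest of the code works with LogLevel constants instead of duplicated text.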
