Alternative to successive String.replace - java

I want to replace some strings in a String input :
string=string.replace("<h1>","<big><big><big><b>");
string=string.replace("</h1>","</b></big></big></big>");
string=string.replace("<h2>","<big><big>");
string=string.replace("</h2>","</big></big>");
string=string.replace("<h3>","<big>");
string=string.replace("</h3>","</big>");
string=string.replace("<h4>","<b>");
string=string.replace("</h4>","</b>");
string=string.replace("<h5>","<small><b>");
string=string.replace("</h5>","</b></small>");
string=string.replace("<h6>","<small>");
string=string.replace("</h6>","</small>");
As you can see, this approach is not ideal: each call has to search the whole input again for the portion to replace, and because Strings are immutable, each call also creates a new String. The input is large, so performance matters.
Is there a better approach that reduces the cost and complexity of this code?

Although StringBuilder.replace() is a huge improvement compared to String.replace(), it is still very far from being optimal.
The problem with StringBuilder.replace() is that if the replacement has different length than the replaceable part (applies to our case), a bigger internal char array might have to be allocated, and the content has to be copied, and then the replace will occur (which also involves copying).
Imagine this: you have a text of 10,000 characters. To replace the "XY" substring found at position 1 (the 2nd character) with "ABC", the implementation may have to allocate a char buffer that is larger by at least 1 and copy the old content into it, then shift 9,997 characters (starting at position 3) to the right by 1 to make room for "ABC", and finally copy the characters of "ABC" to position 1. This has to be done for every replace! This is slow.
Faster Solution: Building Output On-The-Fly
We can build the output on-the-fly: parts that don't contain replaceable texts can simply be appended to the output, and if we find a replaceable fragment, we append the replacement instead of it. Theoretically it's enough to loop over the input only once to generate the output. Sounds simple, and it's not that hard to implement it.
Implementation:
We will use a Map preloaded with mappings of the replaceable-replacement strings:
Map<String, String> map = new HashMap<>();
map.put("<h1>", "<big><big><big><b>");
map.put("</h1>", "</b></big></big></big>");
map.put("<h2>", "<big><big>");
map.put("</h2>", "</big></big>");
map.put("<h3>", "<big>");
map.put("</h3>", "</big>");
map.put("<h4>", "<b>");
map.put("</h4>", "</b>");
map.put("<h5>", "<small><b>");
map.put("</h5>", "</b></small>");
map.put("<h6>", "<small>");
map.put("</h6>", "</small>");
And using this, here is the replacer code: (more explanation after the code)
public static String replaceTags(String src, Map<String, String> map) {
StringBuilder sb = new StringBuilder(src.length() + src.length() / 2);
for (int pos = 0;;) {
int ltIdx = src.indexOf('<', pos);
if (ltIdx < 0) {
// No more '<', we're done:
sb.append(src, pos, src.length());
return sb.toString();
}
sb.append(src, pos, ltIdx); // Copy chars before '<'
// Check if our hit is replaceable:
boolean mismatch = true;
for (Entry<String, String> e : map.entrySet()) {
String key = e.getKey();
if (src.regionMatches(ltIdx, key, 0, key.length())) {
// Match, append the replacement:
sb.append(e.getValue());
pos = ltIdx + key.length();
mismatch = false;
break;
}
}
if (mismatch) {
sb.append('<');
pos = ltIdx + 1;
}
}
}
Testing it:
String in = "Yo<h1>TITLE</h1><h3>Hi!</h3>Nice day.<h6>Hi back!</h6>End";
System.out.println(in);
System.out.println(replaceTags(in, map));
Output: (wrapped to avoid scroll bar)
Yo<h1>TITLE</h1><h3>Hi!</h3>Nice day.<h6>Hi back!</h6>End
Yo<big><big><big><b>TITLE</b></big></big></big><big>Hi!</big>Nice day.
<small>Hi back!</small>End
This solution is faster than using regular expressions, which involve considerable overhead such as compiling a Pattern and creating a Matcher, and which are also much more general than needed here. They also create many temporary objects under the hood that are thrown away after the replace. Here I only use a StringBuilder (plus the char array under its hood), and the code iterates over the input String only once. This solution is also much faster than using StringBuilder.replace(), as detailed at the top of this answer.
Notes and Explanation
I initialized the StringBuilder in the replaceTags() method like this:
StringBuilder sb = new StringBuilder(src.length() + src.length() / 2);
So basically I created it with an initial capacity of 150% of the length of the original String. This is because our replacements are longer than the replaceable texts, so if replacing occurs, the output will be longer than the input. Giving a larger initial capacity to StringBuilder will usually avoid internal char[] reallocation altogether (of course the required capacity depends on the replaceable-replacement pairs and their frequency in the input, but +50% is a reasonable estimate for typical documents; if it turns out too small, StringBuilder simply grows as needed).
I also utilized the fact that all replaceable strings start with a '<' character, so finding the next potential replaceable position becomes blazing-fast:
int ltIdx = src.indexOf('<', pos);
It's just a simple loop and char comparisons inside String, and since it always starts searching from pos (and not from the start of the input), overall the code iterates over the input String only once.
And finally, to tell if a replaceable String does occur at the potential position, we use the String.regionMatches() method to check the replaceable strings, which is also blazing-fast: all it does is compare char values in a loop and return at the very first mismatching character.
And a PLUS:
The question doesn't mention it, but our input is an HTML document. HTML tags are case-insensitive which means the input might contain <H1> instead of <h1>.
To this algorithm this is not a problem. The regionMatches() in the String class has an overload which supports case-insensitive comparison:
boolean regionMatches(boolean ignoreCase, int toffset, String other,
int ooffset, int len);
So if we want to modify our algorithm to also find and replace input tags which are the same but are written using different letter case, all we have to modify is this one line:
if (src.regionMatches(true, ltIdx, key, 0, key.length())) {
Using this modified code, replaceable tags become case-insensitive:
Yo<H1>TITLE</H1><h3>Hi!</h3>Nice day.<H6>Hi back!</H6>End
Yo<big><big><big><b>TITLE</b></big></big></big><big>Hi!</big>Nice day.
<small>Hi back!</small>End

For performance - use StringBuilder.
For convenience you can use Map to store values and replacements.
Map<String, String> map = new HashMap<>();
map.put("<h1>","<big><big><big><b>");
map.put("</h1>","</b></big></big></big>");
map.put("<h2>","<big><big>");
...
StringBuilder builder = new StringBuilder(yourString);
for (String key : map.keySet()) {
replaceAll(builder, key, map.get(key));
}
To replace all occurrences in a StringBuilder, see this question:
Replace all occurrences of a String using StringBuilder?
public static void replaceAll(StringBuilder builder, String from, String to)
{
int index = builder.indexOf(from);
while (index != -1)
{
builder.replace(index, index + from.length(), to);
index += to.length(); // Move to the end of the replacement
index = builder.indexOf(from, index);
}
}

Unfortunately StringBuilder doesn't provide a replace(string,string) method, so you might want to consider using Pattern and Matcher in conjunction with StringBuffer:
String input = ...;
StringBuffer sb = new StringBuffer();
Pattern p = Pattern.compile("</?(h1|h2|...)>");
Matcher m = p.matcher( input );
while( m.find() )
{
String match = m.group();
String replacement = ...; //get replacement for match, e.g. by lookup in a map
m.appendReplacement( sb, replacement );
}
m.appendTail( sb );
You could do something similar with StringBuilder, but before Java 9 you would have to implement appendReplacement etc. yourself. Since Java 9, Matcher.appendReplacement and Matcher.appendTail also accept a StringBuilder.
As for the expression you could also just try and match any html tag (although that might cause problems since regex and arbitrary html don't fit very well) and when the lookup doesn't have any result you just replace the match with itself.
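For reference, here is a minimal, self-contained sketch of the Pattern/Matcher approach with the replacement looked up in a map. The class name RegexTagReplacer, the REPLACEMENTS map (only a few of the question's pairs are filled in) and the pattern </?h[1-6]> are assumptions made for illustration; tags with attributes are not handled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexTagReplacer {
    // Illustrative subset of the question's mappings (hypothetical holder name).
    private static final Map<String, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put("<h1>", "<big><big><big><b>");
        REPLACEMENTS.put("</h1>", "</b></big></big></big>");
        REPLACEMENTS.put("<h3>", "<big>");
        REPLACEMENTS.put("</h3>", "</big>");
    }

    // Matches an opening or closing h1..h6 tag without attributes; compiled once.
    private static final Pattern TAG = Pattern.compile("</?h[1-6]>");

    static String replaceTags(String input) {
        StringBuffer sb = new StringBuffer(input.length());
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            String match = m.group();
            String replacement = REPLACEMENTS.get(match);
            if (replacement == null) {
                replacement = match; // no mapping: keep the tag unchanged
            }
            // quoteReplacement guards against '$' and '\' in the replacement text.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceTags("Yo<h1>TITLE</h1><h2>kept</h2>"));
    }
}
```

Tags that match the pattern but have no entry in the map (here h2) are copied through unchanged, which is the "replace the match with itself" idea mentioned above.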

The particular example you provide seems to be HTML or XHTML. Trying to edit HTML or XML using regular expressions is fraught with problems. For the kind of editing you seem to be interested in doing you should look at using XSLT. Another possibility is to use SAX, the streaming XML parser, and have your back-end write the edited output on the fly. If the text is actually HTML, you might be better off using a tolerant HTML parser, such as JSoup, to build a parsed representation of the document (like the DOM), and manipulate that before outputting it.

StringBuilder is backed by a char array. So, unlike String instances, it is mutable. Thus, you can call indexOf() and replace() on the StringBuilder.

I would do something like this
StringBuilder sb = new StringBuilder();
for (int i = 0; i < str.length(); i++) {
if (tagEquals(str, i, "h1")) {
sb.append("<big><big><big><b>");
i += 3; // "<h1>" is 4 characters long; the loop's i++ skips the last one
} else if (tagEquals(str, i, "/h1")) {
...
} else {
sb.append(str.charAt(i));
}
}
tagEquals is a helper function that checks whether a tag with the given name starts at position i.

Use Apache Commons StringUtils.replaceEach.
String[] searches = new String[]{"<h1>", "</h1>", "<h2>", ...};
String[] replacements = new String[]{"<big><big><big><b>", "</b></big></big></big>", "<big><big>", ...};
string = StringUtils.replaceEach(string, searches, replacements);

Related

Efficient way to search for a set of strings in a string in Java

I have a set of elements of size about 100-200. Let a sample element be X.
Each of the elements is a set of strings (number of strings in such a set is between 1 and 4). X = {s1, s2, s3}
For a given input string (about 100 characters), say P, I want to test whether any of the X is present in the string.
X is present in P iff for all s belong to X, s is a substring of P.
The set of elements is available for pre-processing.
I want this to be as fast as possible within Java. Possible approaches which do not fit my requirements:
Checking whether all the strings s are substring of P seems like a costly operation
Because s can be any substring of P (not necessarily a word), I cannot use a hash of words
I cannot directly use regex as s1, s2, s3 can be present in any order and all of the strings need to be present as substring
Right now my approach is to construct a huge regex out of each X with all possible permutations of the order of strings. Because number of elements in X <= 4, this is still feasible. It would be great if somebody can point me to a better (faster/more elegant) approach for the same.
Please note that the set of elements is available for pre-processing and I want the solution in java.
You can use regex directly:
Pattern regex = Pattern.compile(
"^ # Anchor search to start of string\n" +
"(?=.*s1) # Check if string contains s1\n" +
"(?=.*s2) # Check if string contains s2\n" +
"(?=.*s3) # Check if string contains s3",
Pattern.DOTALL | Pattern.COMMENTS);
Matcher regexMatcher = regex.matcher(subjectString);
boolean foundMatch = regexMatcher.find();
foundMatch is true if all three substrings are present in the string.
Note that you might need to escape your "needle strings" if they could contain regex metacharacters.
It sounds like you're prematurely optimising your code before you've actually discovered a particular approach is actually too slow.
The nice property about your set of strings is that the string must contain all elements of X as a substring -- meaning we can fail fast if we find one element of X that is not contained within P. This might turn out a better time saving approach than others, especially if the elements of X are typically longer than a few characters and contain no or only a few repeating characters. For instance, a regex engine need only check 20 characters in 100 length string when checking for the presence of a 5 length string with non-repeating characters (eg. coast). And since X has 100-200 elements you really, really want to fail fast if you can.
My suggestion would be to sort the strings in order of length and check for each string in turn, stopping early if one string is not found.
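A minimal sketch of that fail-fast check, assuming each element X is held as a List of strings and that "in order of length" means longest first (the longest string is assumed to be the least likely to occur, so it fails soonest). The names FailFastMatcher, preprocess, isPresent and anyPresent are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class FailFastMatcher {
    // Pre-processing step: order one element's strings longest first (assumed ordering).
    static List<String> preprocess(List<String> element) {
        List<String> sorted = new ArrayList<>(element);
        sorted.sort((a, b) -> Integer.compare(b.length(), a.length()));
        return sorted;
    }

    // X is present in p iff every string of X is a substring of p.
    static boolean isPresent(List<String> element, String p) {
        for (String s : element) {
            if (!p.contains(s)) {
                return false; // fail fast on the first missing string
            }
        }
        return true;
    }

    // True if at least one of the pre-processed elements is present in p.
    static boolean anyPresent(List<List<String>> elements, String p) {
        for (List<String> element : elements) {
            if (isPresent(element, p)) {
                return true;
            }
        }
        return false;
    }
}
```

With 100-200 elements of at most 4 strings each and an input of about 100 characters, this is at most a few hundred short String.contains calls per input, so it is worth measuring before reaching for anything more elaborate.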
Looks like a perfect case for the Rabin–Karp algorithm:
Rabin–Karp is inferior for single pattern searching to Knuth–Morris–Pratt algorithm, Boyer–Moore string search algorithm and other faster single pattern string searching algorithms because of its slow worst case behavior. However, Rabin–Karp is an algorithm of choice for multiple pattern search.
When the preprocessing time doesn't matter, you could create a hash table which maps every one-letter, two-letter, three-letter etc. combination which occurs in at least one string to a list of strings in which it occurs.
The algorithm to index a string would look like this (untested):
HashMap<String, Set<String>> indexes = new HashMap<String, Set<String>>();
for (int pos = 0; pos < string.length(); pos++) {
for (int end = pos + 1; end <= string.length(); end++) {
String substring = string.substring(pos, end);
Set<String> stringsForThisKey = indexes.get(substring);
if (stringsForThisKey == null) {
stringsForThisKey = new HashSet<String>();
indexes.put(substring, stringsForThisKey);
}
stringsForThisKey.add(string);
}
}
Indexing each string that way would be quadratic to the length of the string, but it only needs to be done once for each string.
But the result would be constant-speed access to the list of strings in which a specific string occurs.
You are probably looking for the Aho-Corasick algorithm, which constructs an automaton (trie-like) from the set of strings (the dictionary) and then matches the input string against the dictionary using this automaton.
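A compact sketch of Aho-Corasick, assuming the dictionary is the union of all strings from all elements X and that none of them is empty. The class and method names (AhoCorasick, findAll) are illustrative, and the HashMap-based transitions favour brevity over raw speed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

final class AhoCorasick {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        final List<String> outputs = new ArrayList<>(); // words ending at this node
        Node fail;                                      // longest proper suffix that is also in the trie
    }

    private final Node root = new Node();

    AhoCorasick(Iterable<String> dictionary) {
        // Phase 1: build a trie of all dictionary words.
        for (String word : dictionary) {
            Node node = root;
            for (char c : word.toCharArray()) {
                node = node.next.computeIfAbsent(c, k -> new Node());
            }
            node.outputs.add(word);
        }
        // Phase 2: breadth-first traversal to set the failure links.
        Queue<Node> queue = new ArrayDeque<>();
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node current = queue.remove();
            for (Map.Entry<Character, Node> e : current.next.entrySet()) {
                char c = e.getKey();
                Node child = e.getValue();
                Node f = current.fail;
                while (f != root && !f.next.containsKey(c)) {
                    f = f.fail;
                }
                Node target = f.next.get(c);
                child.fail = (target != null) ? target : root;
                // Inherit the words that end at the failure target.
                child.outputs.addAll(child.fail.outputs);
                queue.add(child);
            }
        }
    }

    // Returns every dictionary word that occurs in text, scanning text once.
    Set<String> findAll(String text) {
        Set<String> found = new HashSet<>();
        Node node = root;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            while (node != root && !node.next.containsKey(c)) {
                node = node.fail;
            }
            Node next = node.next.get(c);
            node = (next != null) ? next : root;
            found.addAll(node.outputs);
        }
        return found;
    }
}
```

Build the automaton once during pre-processing; for each input P, call findAll(P) and report an element X as present when the returned set contains all of its strings (found.containsAll(x)).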
You might want to consider using a "Suffix Tree" as well. I haven't used this code, but there is one described here
I have used proprietary implementations (that I no longer even have access to) and they are very fast.
One way is to generate every possible substring and add this to a set. This is pretty inefficient.
Instead you can create all the strings from any point to the end into a NavigableSet and search for the closest match. If the closest match starts with the string you are looking for, you have a substring match.
static class SubstringMatcher {
final NavigableSet<String> set = new TreeSet<String>();
SubstringMatcher(Set<String> strings) {
for (String string : strings) {
for (int i = 0; i < string.length(); i++)
set.add(string.substring(i));
}
// remove duplicates.
String last = "";
for (String string : set.toArray(new String[set.size()])) {
if (string.startsWith(last))
set.remove(last);
last = string;
}
}
public boolean findIn(String s) {
String s1 = set.ceiling(s);
return s1 != null && s1.startsWith(s);
}
}
public static void main(String... args) {
Set<String> strings = new HashSet<String>();
strings.add("hello");
strings.add("there");
strings.add("old");
strings.add("world");
SubstringMatcher sm = new SubstringMatcher(strings);
System.out.println(sm.set);
for (String s : "ell,he,ow,lol".split(","))
System.out.println(s + ": " + sm.findIn(s));
}
prints
[d, ello, ere, hello, here, ld, llo, lo, old, orld, re, rld, there, world]
ell: true
he: true
ow: false
lol: false

Java - Generating strings of length x

I have some 'heavy' string manipulation in my Java program, which often involves iterating through a String and replacing certain segments with filler characters, usually "#". These characters are later removed, but are used so that the length of the String and the current index are kept intact during the iteration.
This process usually involves replacing more than 1 character at a time.
e.g.
I might need to replace "cat" with "###" in the string "I love cats", giving "I love ###s",
So often I need to create strings of "#" with x length.
In python, this is easy.
NewString = "#" *x
In Java, I find my current method revolting.
String NewString = "";
for (int i=0; i< x; i++) {
NewString = NewString.concat("#"); }
Is there a proper, pre-established method for doing this?
Does anybody have a shorter, more 'golfed' method?
Thanks!
Specs:
Java SE (Jre7)
Windows 7 (32)
It's not clear to me what kind of regex the comments are suggesting, but creating a string filled with a particular character to the given length is pretty easy:
public static String createString(char character, int length) {
char[] chars = new char[length];
Arrays.fill(chars, character);
return new String(chars);
}
Guava has a nice little method Strings.repeat(String, int). Looking at the source of that method, it basically amounts to this:
StringBuilder builder = new StringBuilder(string.length() * count);
for (int i = 0; i < count; i++) {
builder.append(string);
}
return builder.toString();
Your way of building a string of length N is very inefficient. You should either use StringBuffer with its convenient append method, or build an array of N characters, and use the corresponding constructor of the String.
Can you always use the same characters in the "filler" String and do you know the maximum value of x? Then you can create a constant upfront which can be cut to arbitrary length:
private static final String FILLER = "##############################################";
// inside your method
String newString = FILLER.substring(0, x);
java.lang.String is immutable. So, concatenating strings results in the creation of temporary String objects and is therefore slow. You should consider using a mutable buffer like StringBuffer or StringBuilder. Another best practice when working with strings in Java is to prefer the CharSequence type wherever possible. This avoids unnecessary calls to toString() and lets you easily change the underlying implementation type.
If you are looking for a one liner to repeat strings and this justifies using an external library, have a look at StringUtils.repeat from Apache Commons library. But, I feel you can just write your own code than using another library for a trivial task of repeating strings.
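As a rough sketch of the char-array idea applied to the question's actual use case (masking a segment while keeping the length and indices intact), the following works on Java 7. The names Filler, repeat and mask are hypothetical. On Java 11 and later, "#".repeat(x) does the repetition directly.

```java
import java.util.Arrays;

final class Filler {
    // Builds a string consisting of count copies of c.
    static String repeat(char c, int count) {
        char[] chars = new char[count];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    // Overwrites every occurrence of target in text with the filler character,
    // so the result has the same length as the input.
    static String mask(String text, String target, char filler) {
        if (target.isEmpty()) {
            return text; // nothing to mask
        }
        char[] chars = text.toCharArray();
        int from = 0;
        int idx;
        while ((idx = text.indexOf(target, from)) >= 0) {
            Arrays.fill(chars, idx, idx + target.length(), filler);
            from = idx + target.length();
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(mask("I love cats", "cat", '#')); // I love ###s
    }
}
```

Masking in place on the char array avoids building the intermediate "###" strings altogether.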

Best way to retrieve a value from a string java

If I am being passed a string that contains comma delimited key-value pairs like this
seller=1000,country="canada",address="123 1st st", etc.
There seems like there must be a better way than parsing then iterating through.
What is the best way to retrieve a value from this string based on the key name in Java?
Since release 10, Google Guava provides a class Splitter.MapSplitter which does exactly that kind of thing:
Map<String, String> params = Splitter
.on(",")
.withKeyValueSeparator("=")
.split("k1=v1,k2=v2");
You can create your own CSV parser. It's not very complicated, but there are a few corner cases to be careful with, assuming of course you are using the standard CSV format.
But why reinvent the wheel?
You can try looking up a CSV parser like
OpenCSV
SuperCSV
Apache Commons
There are others, look around I'm sure you will find one that suits your needs.
Usually you will want to parse the string into a map because you will be pulling various values perhaps multiple times, so it often makes sense to pay the parsing cost up-front.
If not, then here is how I would solve the problem (assuming you want to differentiate between int values and String values).:
public Object pullValue(String pairs, String key) {
boolean returnString = false;
int keyStart = pairs.indexOf(key + "=");
if (keyStart < 0) {
logger.error("Key " + key + " not found in key-value pairs string");
return null;
}
int valueStart = keyStart + key.length() + 1;
if (pairs.charAt(valueStart) == '"') {
returnString = true;
valueStart++; // Skip past the quote mark
}
int valueEnd;
if (returnString) {
valueEnd = pairs.indexOf('"', valueStart);
if (valueEnd < 0) {
logger.error("Unmatched double quote mark extracting value for key " + key);
return null;
}
return pairs.substring(valueStart, valueEnd);
} else {
valueEnd = pairs.indexOf(',', valueStart);
if (valueEnd < 0) { // If this is the last key value pair in string
valueEnd = pairs.length();
}
return Integer.decode(pairs.substring(valueStart, valueEnd));
}
}
Note that this solution assumes no spaces between the key, the equals sign, and the value. If these are possible you will have to create some code to travel the string between them.
Another solution is to use a regular expression parser. You could do something like (this is untested):
Pattern lookingForString = Pattern.compile(key + "[ \t]*=[ \t]*[\"]([^\"]+)[\"]");
Pattern lookingForInt = Pattern.compile(key + "[ \t]*=[ \t]*([^,]+)");
Matcher stringFinder = lookingForString.matcher(pairs);
Matcher intFinder = lookingForInt.matcher(pairs);
if (stringFinder.find()) {
return stringFinder.group(1);
} else if (intFinder.find()) {
return Integer.decode(intFinder.group(1));
} else {
logger.error("Could not extract value for key " + key);
return null;
}
HTH
To separate the string by commas, the other posters are correct. It is best to use a CSV parser (your own or OTS). Considering things like commas inside quotes etc can lead to a lot of un-considered problems.
Once you have each separate token in the form:
key = "value"
I think it is easy enough to look for the first index of '='. Then the part before that will be the key, and the part after that will be the value. Then you can store them in a Map<String, String>.
This is assuming that your keys will be simple enough, and not contain = in them etc. Sometimes it's enough to take the simple route when you can restrict the problem scope.
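Here is a small sketch of that two-step idea without a library: split on commas that are outside double quotes, then split each token at its first '='. It assumes quotes are never escaped inside values and that keys contain no '='; the names PairParser, parse and addPair are hypothetical. A real CSV parser handles more corner cases.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PairParser {
    // Parses key=value pairs separated by commas; commas inside double quotes are kept.
    static Map<String, String> parse(String pairs) {
        Map<String, String> result = new LinkedHashMap<>();
        StringBuilder token = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < pairs.length(); i++) {
            char c = pairs.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            }
            if (c == ',' && !inQuotes) {
                addPair(token.toString(), result);
                token.setLength(0);
            } else {
                token.append(c);
            }
        }
        addPair(token.toString(), result); // last token has no trailing comma
        return result;
    }

    // Splits one token at the first '=' and strips surrounding quotes from the value.
    private static void addPair(String token, Map<String, String> result) {
        int eq = token.indexOf('=');
        if (eq < 0) {
            return; // not a key=value token, ignore it
        }
        String key = token.substring(0, eq).trim();
        String value = token.substring(eq + 1).trim();
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            value = value.substring(1, value.length() - 1);
        }
        result.put(key, value);
    }
}
```

After parsing once, each lookup is a plain map.get("country"), which pays off when several values are pulled from the same string.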
If you just want one value out of such a string, you can use String's indexOf() and substring() methods:
String getValue(String str, String key)
{
int keyIndex = str.indexOf(key + "=");
if(keyIndex == -1) return null;
int startIndex = str.indexOf("\"", keyIndex);
int endIndex = str.indexOf("\"", startIndex + 1);
String value = str.substring(startIndex + 1, endIndex);
return value;
}
First thing you should use a CSV parsing library to parse the comma separated values. Correctly parsing CSV data isn't as trivial as it first seems. There are lots of good arguments to not reinvent that wheel.
This will also future proof your code and be code you don't have to test or maintain.
I know the temptation to do something like data.split(","); is strong, but it is a fragile and brittle solution. For just one example, what if any of the values contains a ','?
Second thing you should do is then parse the pairs. Again the temptation to use String.split("="); will be strong, but it can be brittle and fragile if the right hand side of the = has an = in it.
I am not a blind proponent of regular expressions, but used with restraint they can be just the right tool for the job. Here is the regular expression to parse the name value pairs.
The regular expression is ^(.*)\s?=\s?("?([^"]*)"?|"(.*)")$. This works even for multiple double quotes on the right-hand side of the name-value pair.
This will match only what is on the left side of the first = and everything else on the right hand side, and strip the optional " off the string values, while still matching the non-quoted number values.
Given a List<String> list of the encoded name value pairs.
final Pattern p = Pattern.compile("^(.*)\\s?=\\s?(\"?([^\"]*)\"?|\"(.*)\")$");
final Map<String, String> map = new HashMap<String, String>(list.size());
for (final String nvp : list)
{
final Matcher m = p.matcher(nvp);
if (!m.matches()) continue; // skip entries that are not name=value pairs
final String name = m.group(1);
final String value = m.group(2);
map.put(name, value);
System.out.format("name = %s | value = %s\n", name, value);
}
Use yourdata.split(",") and you will get a String[]. Then call split("=") on each entry to separate the property from its value.
Ideally, you should move this data in a Properties object instance. You can then save/load it from XML easily. It has useful methods.
REM: I am assuming that you are savvy enough to understand that this solution won't work if values contain the separator (i.e., the comma) in them...

how to replace parts of string using regular expressions

I am not a beginner to regular expressions, but their use in perl seems a bit different than in Java.
Anyways, I basically have a dictionary of shorthand words and their definitions. I want to iterate over words in the dictionary and replace them with their meanings. What is the best way to do this in Java?
I have seen String.replaceAll(), String.replace(), as well as the Pattern/Matcher classes. I wish to do a case insensitive replacement along the lines of:
word =~ s/\s?\Q$short_word\E\s?/ \Q$short_def\E /sig
While I am at it, do you think that it is best to extract all the words from the string and then apply my dictionary or just apply the dictionary to the string? I know that I need to be careful, because the shorthand words could match parts of other shorthand meanings.
Hopefully this all makes sense.
Thanks.
Clarification:
Dictionary is something like:
lol:laugh out loud, rofl:rolling on the floor laughing, ll:like lemons
string is:
lol, i am rofl
replaced text:
laugh out loud, i am rolling on the floor laughing
notice how the ll wasnt added anywhere
The danger is false positives inside of normal words. "fell" != "felikes lemons"
One way is to split the words on whitespace (do multiple spaces need to be conserved?) and then loop over the List, applying the 'if contains() { replace } else { output original }' idea.
My output class would be a StringBuffer
StringBuffer outputBuffer = new StringBuffer();
for(String s: split(inputText)) {
outputBuffer.append( dictionary.containsKey(s) ? dictionary.get(s) : s);
}
Make your split method smart enough to return word delimiters also:
split("now is the time") -> now,<space>,is,<space>,the,<space><space>,time
Then you don't have to worry about conserving white space - the loop above will just append anything that isn't a dictionary word to the StringBuffer.
Here's a recent SO thread on retaining delimiters when regexing.
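One possible shape for such a delimiter-preserving split is sketched below. It assumes that a "word" is a run of \w characters, that everything else counts as a delimiter, and that the dictionary keys are lower-case; the names WordExpander, split and expand are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WordExpander {
    // A token is either a run of word characters or a run of non-word characters.
    private static final Pattern TOKEN = Pattern.compile("\\w+|\\W+");

    // Splits text into words and delimiters, keeping both, in order.
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Replaces whole words found in the dictionary; everything else is copied as is.
    static String expand(String text, Map<String, String> dictionary) {
        StringBuilder out = new StringBuilder(text.length());
        for (String token : split(text)) {
            String replacement = dictionary.get(token.toLowerCase(Locale.ROOT));
            out.append(replacement != null ? replacement : token);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new HashMap<>();
        dictionary.put("lol", "laugh out loud");
        dictionary.put("rofl", "rolling on the floor laughing");
        dictionary.put("ll", "like lemons");
        System.out.println(expand("lol, i am rofl", dictionary));
    }
}
```

Because only whole tokens are looked up, "fell" is never touched by the "ll" entry, and runs of spaces or punctuation pass through unchanged.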
If you insist on using regex, this would work (taking Zoltan Balazs' dictionary map approach):
Map<String, String> substitutions = loadDictionaryFromSomewhere();
int lengthOfShortestKeyInMap = 3; //Calculate
int lengthOfLongestKeyInMap = 3; //Calculate
StringBuffer output = new StringBuffer(input.length());
Pattern pattern = Pattern.compile("\\b(\\w{" + lengthOfShortestKeyInMap + "," + lengthOfLongestKeyInMap + "})\\b");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
String candidate = matcher.group(1);
String substitute = substitutions.get(candidate);
if (substitute == null)
substitute = candidate; // no match, use original
matcher.appendReplacement(output, Matcher.quoteReplacement(substitute));
}
matcher.appendTail(output);
// output now contains the text with substituted words
If you plan to process many inputs, pre-compiling the pattern is more efficient than using String.split(), which compiles a new Pattern each call.
(edit) Compiling all of the keys into a single pattern yields a more efficient approach, like so:
Pattern pattern = Pattern.compile("\\b(lol|rtfm|rofl|wtf)\\b");
// rest of the method unchanged, don't need the shortest/longest key stuff
This allows the regex engine to skip over any words that happen to be short enough but aren't in the list, saving you a lot of map accesses.
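Rather than hard-coding the alternation, the pattern can be built from the map's keys. This is only a sketch: it assumes the map is non-empty, that the keys are lower-case and consist of word characters (so that \b behaves as expected), and it adds CASE_INSENSITIVE because the question asks for case-insensitive replacement. DictionaryPattern, compile and expand are made-up names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DictionaryPattern {
    // Builds \b(key1|key2|...)\b from the keys, longest key first.
    static Pattern compile(Collection<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort((a, b) -> Integer.compare(b.length(), a.length()));
        StringBuilder sb = new StringBuilder("\\b(");
        for (int i = 0; i < sorted.size(); i++) {
            if (i > 0) {
                sb.append('|');
            }
            sb.append(Pattern.quote(sorted.get(i))); // keys are literals, not regex
        }
        sb.append(")\\b");
        return Pattern.compile(sb.toString(), Pattern.CASE_INSENSITIVE);
    }

    // Replaces every whole-word key in input with its definition.
    static String expand(String input, Map<String, String> substitutions) {
        Matcher matcher = compile(substitutions.keySet()).matcher(input);
        StringBuffer output = new StringBuffer(input.length());
        while (matcher.find()) {
            String candidate = matcher.group(1);
            String substitute = substitutions.get(candidate.toLowerCase(Locale.ROOT));
            if (substitute == null) {
                substitute = candidate; // key was not stored in lower case: keep original
            }
            matcher.appendReplacement(output, Matcher.quoteReplacement(substitute));
        }
        matcher.appendTail(output);
        return output.toString();
    }
}
```

If many inputs are processed, compile the pattern once and reuse it instead of calling compile inside expand on every call.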
The first thing, that comes into my mind is this:
...
// eg: lol -> laugh out loud
Map<String, String> dictionary;
ArrayList<String> originalText;
ArrayList<String> replacedText;
for(String string : originalText) {
if(dictionary.containsKey(string)) {
replacedText.add(dictionary.get(string));
} else {
replacedText.add(string);
}
}
...
Or you could use a StringBuffer instead of the replacedText.

What's a good way of building up a String given specific start and end locations?

(java 1.5)
I have a need to build up a String, in pieces. I'm given a set of (sub)strings, each with a start and end point of where they belong in the final string. Was wondering if there were some canonical way of doing this. This isn't homework, and I can use any licensable OSS, such as jakarta commons-lang StringUtils etc.
My company has a solution using a CharBuffer, and I'm content to leave it as is (and add some unit tests, of which there are none (?!)) but the code is fairly hideous and I would like something easier to read.
As I said, this isn't homework, and I don't need a complete solution, just some pointers to libraries or Java classes that might give me some insight. String.format didn't seem QUITE right...
I would have to honor inputs too long and too short, etc. Substrings would be overlaid in the order they appear (in case of overlap).
As an example of input, I might have something like:
String:start:end
FO:0:3 (string shorter than field)
BAR:4:5 (String larger than field)
BLEH:5:9 (String overlays previous field)
I'd want to end up with
FO BBLEH
01234567890
(Edit: To all - StringBuilder (and specifically, the "pre-allocate to a known length, then use .replace()" theme) seems to be what I'm thinking of. Thanks to all who suggested it!)
StringBuilder output = new StringBuilder();
// for each input element
{
while (output.length() < start)
{
output.append(' ');
}
output.replace(start, end, string);
}
You could also establish the final size of output before inserting any string into it. You could make a first pass through the input elements to find the largest end. This will be the final size of output.
char[] spaces = new char[size];
Arrays.fill(spaces, ' ');
output.append(spaces);
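A fixed-size char array avoids the length changes that StringBuilder.replace() causes when the string and the field differ in size. The sketch below assumes that end is exclusive (so FO:0:3 is a field of width 3), that short strings are padded with spaces and long ones truncated, and that the final size is known up front. FieldOverlay and put are hypothetical names.

```java
import java.util.Arrays;

final class FieldOverlay {
    private final char[] buffer;

    // size is the largest end value among the inputs (found in a first pass).
    FieldOverlay(int size) {
        buffer = new char[size];
        Arrays.fill(buffer, ' ');
    }

    // Overlays s onto the field [start, end): pads with spaces if s is too
    // short, truncates it if it is too long.
    void put(String s, int start, int end) {
        int from = Math.max(start, 0);
        int to = Math.min(end, buffer.length);
        if (from >= to) {
            return; // empty or out-of-range field
        }
        Arrays.fill(buffer, from, to, ' ');
        int n = Math.min(s.length(), to - from);
        s.getChars(0, n, buffer, from);
    }

    @Override
    public String toString() {
        return new String(buffer);
    }

    public static void main(String[] args) {
        FieldOverlay overlay = new FieldOverlay(9);
        overlay.put("FO", 0, 3);
        overlay.put("BAR", 4, 5);
        overlay.put("BLEH", 5, 9);
        System.out.println("[" + overlay + "]");
    }
}
```

Later calls simply overwrite earlier ones, which gives the "overlaid in the order they appear" behaviour for overlapping fields.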
Will StringBuilder do?
StringBuilder sb = new StringBuilder();
sb.setLength(20);
sb.replace(0, 3, "FO");
sb.replace(4, 5, "BAR");
sb.replace(5, 9, "BLEH");
System.out.println("[" + sb.toString().replace('\0', ' ') + "]");
// prints "[FO BBLEH ]"
If I understand your requirements correctly, you should be able to do this with the standard java.lang.StringBuilder:
public class StringAssembler
{
private final StringBuilder builder = new StringBuilder();
public void addPiece(String input, int start, int end)
{
final String actualInput = input.substring(0, end-start+1);
builder.insert(start, actualInput);
}
public String getFullString()
{
return builder.toString();
}
}
In particular, I don't think that the end parameter is strictly necessary, in that all it can do is change the length of the input string, hence the two steps in my addPiece method.
Note that this is not tested, and probably doesn't do the right thing in edge cases, but it should give you something to start from.
You can use StringUtils.rightPad(str, size) to add the necessary number of spaces. And you can use the following to strip the unneeded characters:
if (str.length() > size) {
str = str.substring(0, size); // keep only the first size characters
}
