Java inline method/strategy for truncating String? - java

If I have a String:
String neatish = getTheString(); // returns "...neat..."
I know that I can get rid of the first ellipsis using:
getTheString().substring(3);
But I'm wondering if there's a similar one-line method that takes the end off, based on length? Like:
getTheString().substring(3).truncate(3);
I don't want a character-specific method, ie. one that works only on ellipses.
While there's a substring() that accepts two parameters, it requires the altering String to be saved off to a variable first to determine the length. (Or, you can call getTheString() twice.)
I can certainly write one, but I'm wondering if there is a one-line method either in the standard java.lang package, or in Apache Commons-Lang, that will accomplish this.
I'm 50% curious, and 50% avoiding re-inventing the wheel.
Update: To avoid confusion, I do not want:
String neatish = getTheString();
neatish = neatish.substring(3)...;
I'm instead looking for the back-end version of substring(), and wondering why there isn't one.

Fun exercise
String theString = new StringBuilder(getTheString()).delete(0, 3).reverse().delete(0, 3).reverse().toString();
Get the String into a StringBuilder, remove the first 3 chars, reverse it, remove the last 3 chars (which are now at the start), reverse it again.

You can use substringAfter and substringBeforeLast from Commons Lang:
StringUtils.substringBeforeLast(StringUtils.substringAfter(getTheString(), "..."), "...");
EDIT based on length:
You need StringUtils.substring(String str, int start, int end), which accepts negative indices counted from the end of the string:
StringUtils.substring(getTheString(), 3, -3);
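If Commons Lang is not on the classpath, a small helper gives the same one-line call site using only the JDK. This is a minimal sketch; the names TrimEnds and dropEnds are hypothetical, and returning an empty string when the counts exceed the length is an assumption about the desired behavior.

```java
final class TrimEnds {
    /** Drops `front` chars from the start and `back` chars from the end of s. */
    static String dropEnds(String s, int front, int back) {
        if (front < 0 || back < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        // Assumption: asking to drop more than the string holds yields "".
        if (front + back >= s.length()) {
            return "";
        }
        return s.substring(front, s.length() - back);
    }

    public static void main(String[] args) {
        System.out.println(dropEnds("...neat...", 3, 3)); // prints neat
    }
}
```

The call site stays a single expression, for example TrimEnds.dropEnds(getTheString(), 3, 3), so getTheString() is evaluated only once.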

getTheString().substring(3).replaceFirst("(?s)...$", "");
getTheString().replaceFirst("(?s)^...(.*)...$", "$1");
Both remove the last 3 chars with a regular expression (the second also removes the first 3), where (?s) makes . also match newlines. Use \\. to match a literal period. UGLY.

You could write a method to do it.
public static String trimChar(String s, char ch) {
int start = 0;
for(; start < s.length(); start++)
if (s.charAt(start) != ch)
break;
int end = s.length();
for(; end > start; end--)
if (s.charAt(end-1) != ch)
break;
return s.substring(start, end);
}
String s = trimChar(getTheString(), '.');

Related

Check for double letters with indexOf?

I need to check if there are more than one of the same letters in a word.
For example, in the name 'bob' the letter 'b' is at indexes 0 and 2, but indexOf only sees the first index, 0.
What I need is for it to check, then skip over 0 and go further down the word to check for more of the same letter. Here is what I have so far.
String wordNow = "bob";
letterGuess = console.next().toUpperCase();
letterIndex = wordNow.indexOf(letterGuess);
System.out.println(letterIndex);
OUTPUT: 0
If anyone has a good, efficient way of doing this, I'm all ears.
You can use String.lastIndexOf for this. Since both functions will return -1 if not found, then to check if there is more than one instance, you can just compare the values
return wordNow.indexOf(letterGuess) != wordNow.lastIndexOf(letterGuess);
There are multiple versions of the method indexOf. One of them takes an index itself! Just read the javadoc for the string class carefully. You see there is even one called "lastIndexOf" which would come in really handy.
You can use that for example to see if there are other occurrences of that char "behind" the first index you found.
In any case: the real answer here is that you should study the documentation of classes extensively.
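As an illustration of the indexOf overload that takes a starting index, here is a minimal sketch. The class and method names are hypothetical, and it treats the guess as a single char.

```java
import java.util.ArrayList;
import java.util.List;

final class RepeatCheck {
    /** Returns true if ch occurs at least twice in word. */
    static boolean occursMoreThanOnce(String word, char ch) {
        int first = word.indexOf(ch);
        // Resume the search one position after the first hit.
        return first >= 0 && word.indexOf(ch, first + 1) >= 0;
    }

    /** Collects every index at which ch occurs, in ascending order. */
    static List<Integer> allIndexes(String word, char ch) {
        List<Integer> hits = new ArrayList<>();
        for (int i = word.indexOf(ch); i >= 0; i = word.indexOf(ch, i + 1)) {
            hits.add(i);
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(occursMoreThanOnce("bob", 'b')); // prints true
        System.out.println(allIndexes("bob", 'b'));         // prints [0, 2]
    }
}
```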
You can use a substring by excluding the matching character, as below:
String wordNow = "bob";
letterGuess = console.next().toUpperCase();
letterIndex = wordNow.indexOf(letterGuess);
System.out.println(letterIndex);
if(letterIndex >= 0) {
int secondIndex = wordNow.substring(letterIndex+1).indexOf(letterGuess); // index is relative to the substring
System.out.println(secondIndex);
}
The most efficient way is to simply just search for the element you are looking for (assuming no order or distribution over the input string).
public boolean isCharacterRepeatedIgnoreCase(String inputString, Character c) {
int numFound = 0;
final Character chUpper = Character.toUpperCase(c);
final String upperCaseString = inputString.toUpperCase();
for (int i=0;i<upperCaseString.length();++i) {
if (upperCaseString.charAt(i) == chUpper) {
numFound++;
}
if (numFound > 1) {
return true;
}
}
return false;
}
Note: I have not run the above code, so please write proper unit tests if you plan on using it. Also, I have assumed that your character fits into 16 bits. To handle Unicode beyond that, you probably want to work with code points, for example via Character.toUpperCase(int); see the Oracle documentation.
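For characters outside the 16-bit range, one possible variant iterates over code points instead of chars. This is only a sketch: the class and method names are hypothetical, and it uses simple per-code-point upper-casing, which ignores locale-specific rules.

```java
final class RepeatCheckUnicode {
    /** Case-insensitive repeat check that compares code points, not 16-bit chars. */
    static boolean isRepeatedIgnoreCase(String input, int codePoint) {
        int target = Character.toUpperCase(codePoint);
        return input.codePoints()
                .map(cp -> Character.toUpperCase(cp))
                .filter(cp -> cp == target)
                .limit(2)          // stop as soon as a second hit is seen
                .count() > 1;
    }

    public static void main(String[] args) {
        System.out.println(isRepeatedIgnoreCase("Bob", 'b')); // prints true
    }
}
```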

Formatting String Array efficiently in Java

I was working on some string formatting, and I was curious if I was doing it the most efficient way.
Assume I have a String Array:
String ArrayOne[] = {"/test/" , "/this/is/test" , "/that/is/" , "/random/words" };
I want the result Array to be
String resultArray[] = {"test", "this_is_test" , "that_is" , "random_words" };
It's quite messy and brute-force-like.
for(char c : ArrayOne[i].toCharArray()) {
if(c == '/'){
occurances[i]++;
}
}
First I count the number of "/" in each String like above and then using these counts, I find the indexOf("/") for each string and add "_" accordingly.
As you can see though, it gets very messy.
Is there a more efficient way to do this besides the brute-force way I'm doing?
Thanks!
You could use replaceAll and replace, as follows:
String resultArray[] = new String[ArrayOne.length];
for (int i = 0; i < ArrayOne.length; ++i) {
resultArray[i] = ArrayOne[i].replaceAll("^/|/$", "").replace('/', '_');
}
The replaceAll method searches the string for a match to the regex given in the first argument, and replaces each match with the text in the second argument.
Here, we use it first to remove leading and trailing slashes. We search for slashes at the start of the string (^/) or the end of the string (/$), and replace them with nothing.
Then, we replace all remaining slashes with underscores using replace.
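Since the question asks about efficiency, one optional refinement is to compile the regex once instead of letting replaceAll recompile it for every element. A minimal sketch under that assumption; SlashFormatter and its members are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class SlashFormatter {
    // A slash at the very start or the very end of the input.
    private static final Pattern EDGE_SLASHES = Pattern.compile("^/|/$");

    static String[] format(String[] paths) {
        String[] result = new String[paths.length];
        for (int i = 0; i < paths.length; i++) {
            String stripped = EDGE_SLASHES.matcher(paths[i]).replaceAll("");
            result[i] = stripped.replace('/', '_'); // plain char replace, no regex
        }
        return result;
    }

    public static void main(String[] args) {
        String[] in = {"/test/", "/this/is/test", "/that/is/", "/random/words"};
        // prints [test, this_is_test, that_is, random_words]
        System.out.println(Arrays.toString(format(in)));
    }
}
```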

why does this for loop wordcount method not work in java

Can anyone let me know why this wordcount method doesn't work? The returned value of count is 0 every time I run it.
public int wordcount(){
String spaceString = " ";
int count = 0;
for(int i = 0; i < this.getString().length(); i++){
if (this.getString().substring(i).equals(spaceString)){
count++;
}
}
return count;
}
The value of getString = my search string.
Much appreciated if anyone can help - I'm sure I'm prob doing something dumb.
Dylan
Read the docs:
The substring begins with the character at the specified index and extends to the end of this string.
Your if condition is only true once, if the last character of the string is a space. Perhaps you wanted charAt? (And even this won't properly handle double spaces; splitting on whitespace might be a better option.)
Because substring with only one argument returns the sub string starting from that index till the end of the string. So you're not comparing just one character.
Instead of substring define spaceString as a char, and use charAt(i)
this.getString().substring(i) -> this returns a sub string from the index i to the end of the String
So for example if your string was Test the above would return Test, est, st and finally t
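For what you're trying to do there are alternative methods, but you could simply replace
this.getString().substring(i).equals(spaceString)
with
this.getString().charAt(i) == ' '
(charAt returns a char, so compare it with == against a char literal; calling equals on a String with a char argument is always false.)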
An alternative way of doing what you're trying to do is:
this.getString().split(spaceString)
This would return an array of Strings - the original string broken up by spaces.
Read the documentation of the method you are using:
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#substring(int)
I.e. the count will be non zero only if you have a space on the end of your string
Using substring as you are will not work. If the value of getString() is "my search string", every iteration through the loop will have substring(i) return:
"my search string"
"y search string"
" search string"
"search string"
"earch string"
"arch string"
"rch string"
"ch string"
"h string"
" string"
"string"
"tring"
"ring"
"ing"
"ng"
"g"
Notice none of those equals " ".
Try using split.
public int countWords(String s){
return s.split("\\s+").length;
}
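One caveat with a bare split("\\s+"): an empty string yields a one-element array, and leading whitespace produces an empty first element, so both cases over-count by one. A minimal sketch that guards against this; WordCounter is a hypothetical name, and treating a blank string as zero words is an assumption.

```java
final class WordCounter {
    /** Counts whitespace-separated words; blank input counts as zero words. */
    static int countWords(String s) {
        String trimmed = s.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        return trimmed.split("\\s+").length;
    }

    /** Counts single space characters, which is what the original loop intended. */
    static int countSpaces(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ' ') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords(" the cow "));         // prints 2
        System.out.println(countSpaces("my search string")); // prints 2
    }
}
```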
Change
if (this.getString().substring(i).equals(spaceString))
to
if (this.getString().charAt(i) == ' ')
this.getString().substring(i) returns a string from the index of (i) to the end of the string.
Example: for i=5, it will return "rown cow" from the string "the brown cow". This functionality isn't what you need.
If you pepper System.out.println() throughout your code (or use the debugger), you will see this.
I think it would be better to use something like String.split() or charAt(i).
By the way, even if you fix your code by counting spaces, it will not return the correct value for these conditions: "my dog" (word count=2) and "cow" (word count=1). There is also a problem if there is more than one space between words. Also, this will produce a word count of three:
" the cow ".

Finding multiple substrings using boundaries in Java

Alright, so here is my problem. Basically I have a string with 4 words in it, with each word separated by a #. What I need to do is use the substring method to extract each word and print it out. I am having trouble figuring out the parameters for it though. I can always get the first one right, but the following ones generally have problems.
Here is the first piece of the code:
word = format.substring( 0 , format.indexOf('#') );
Now from what I understand this basically means start at the beginning of the string, and end right before the #. So using the same logic, I tried to extract the second word like so:
wordTwo = format.substring ( wordlength + 1 , format.indexOf('#') );
//The plus one so I don't start at the #.
But with this I continually get errors saying it doesn't exist. I figured that the compiler was trying to read the first # before the second word, so I rewrote it like so:
wordTwo = format.substring (wordlength + 1, 1 + wordlength + format.indexOf('#') );
And with this it just completely screws it up, either not printing the second word or not stopping in the right place. If I could get any help on the formatting of this, it would be greatly appreciated. Since this is for a class, I am limited to using very basic methods such as indexOf, length, substring etc., so if you could refrain from using anything too complex that would be amazing!
If you have to use substring then you need to use the variant of indexOf that takes a start index. This means you can look for the second # by starting the search after the first one, i.e.
wordTwo = format.substring ( wordlength + 1 , format.indexOf('#', wordlength + 1 ) );
There are however much better ways of splitting a string on a delimiter like this. You can use a StringTokenizer. This is designed for splitting strings like this. Basically:
StringTokenizer tok = new StringTokenizer(format, "#");
String word = tok.nextToken();
String word2 = tok.nextToken();
String word3 = tok.nextToken();
Or you can use the String.split method which is designed for splitting strings. e.g.
String[] parts = format.split("#");
String word = parts[0];
String word2 = parts[1];
String word3 = parts[2];
You can go with split() for this kind of formatting strings.
For instance if you have string like,
String text = "Word1#Word2#Word3#Word4";
You can use delimiter as,
String delimiter = "#";
Then create an string array like,
String[] temp;
For splitting string,
temp = text.split(delimiter);
You can get words like this,
temp[0] = "Word1";
temp[1] = "Word2";
temp[2] = "Word3";
temp[3] = "Word4";
Use split() method to do this with "#" as the delimiter
String s = "hi#vivek#is#good";
String temp = new String();
String[] arr = s.split("#");
for(String x : arr){
temp = temp + x;
}
Or if you want to extract each word... you have it already in arr
arr[0] ---> First Word
arr[1] ---> Second Word
arr[2] ---> Third Word
I suggest that you have a look at the Javadoc for String before you proceed further.
Since this is your homework, I'll give you a couple of hints and maybe you can solve it yourself:
The signature of substring is public String substring(int beginIndex, int endIndex). As per the javadoc for this method:
Returns a new string that is a substring of this string. The substring
begins at the specified beginIndex and extends to the character at
index endIndex - 1. Thus the length of the substring is
endIndex-beginIndex.
Note that if you have to use this method, you'll have to shift your beginIndex and endIndex each time, because in your situation you have multiple words separated by #.
However if you look closely, there's another method in String class that might be helpful to you. That's the public String[] split(String regex) method. The javadoc for this one states:
Splits this string around matches of the given regular expression.
This method works as if by invoking the two-argument split method with
the given expression and a limit argument of zero. Trailing empty
strings are therefore not included in the resulting array.
The split() method looks pretty interesting for your case. You can split your String with the delimiter that you have as the parameter to this method, get the String array and work with that.
Hope this helps you to understand your problem and get started towards a solution :)
Since this is homework, it may be better to try to write it yourself. But I will give a clue.
Clue:
The indexOf method has another overload: int indexOf(int ch,
int fromIndex), which finds the first occurrence of the character ch in the string,
starting the search at fromIndex.
http://docs.oracle.com/javase/1.4.2/docs/api/java/lang/String.html
From this clue, the program will look something like this:
Find the index of the first '#' from the start of the string.
Extract the word from 0th character to that index.
Find the index of the first '#' from the character AFTER the first '#'.
Extract the word from just after the first '#' to that index.
... Just do it until you get 4 words or the string ends.
Hope this helps.
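The steps above can be sketched with nothing but indexOf and substring. This is one possible shape, not the assignment's expected answer; HashSplitter and words are hypothetical names, and collecting into a List is only for convenience (printing each word inside the loop would work just as well).

```java
import java.util.ArrayList;
import java.util.List;

final class HashSplitter {
    /** Extracts the '#'-separated words using only indexOf and substring. */
    static List<String> words(String format) {
        List<String> result = new ArrayList<>();
        int start = 0;                         // index where the current word begins
        int hash = format.indexOf('#', start); // next '#' at or after start
        while (hash >= 0) {
            result.add(format.substring(start, hash));
            start = hash + 1;                  // skip past the '#'
            hash = format.indexOf('#', start);
        }
        result.add(format.substring(start));   // the last word has no trailing '#'
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("one#two#three#four")); // prints [one, two, three, four]
    }
}
```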
I don't know why you're forced to use String#substring, but as others have mentioned, it seems like the wrong method for the kind of functionality you need.
String#split(String regex) is what you would use for such a problem, or, if your input sequence is something you don't control, I would suggest you look at the overloaded method String#split(String regex, int limit); this way you can impose a limit on the amount of matches you make, controlling your resulting array.
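To make the effect of the limit argument concrete, here is a small sketch. The class and method names are hypothetical; the behavior shown is that of the standard String.split overloads.

```java
import java.util.Arrays;

final class SplitLimitDemo {
    /** Splits off only the first word; the remainder stays in one piece. */
    static String[] firstAndRest(String text) {
        return text.split("#", 2);
    }

    /** A negative limit keeps trailing empty strings, which split(regex) drops. */
    static String[] keepTrailingEmpties(String text) {
        return text.split("#", -1);
    }

    public static void main(String[] args) {
        // prints [Word1, Word2#Word3#Word4]
        System.out.println(Arrays.toString(firstAndRest("Word1#Word2#Word3#Word4")));
        // prints [a, b]
        System.out.println(Arrays.toString("a#b##".split("#")));
        // prints [a, b, , ]
        System.out.println(Arrays.toString(keepTrailingEmpties("a#b##")));
    }
}
```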

codingbat wordEnds using regex

I'm trying to solve wordEnds from codingbat.com using regex.
Given a string and a non-empty word string, return a string made of each char just before and just after every appearance of the word in the string. Ignore cases where there is no char before or after the word, and a char may be included twice if it is between two words.
wordEnds("abcXY123XYijk", "XY") → "c13i"
wordEnds("XY123XY", "XY") → "13"
wordEnds("XY1XY", "XY") → "11"
wordEnds("XYXY", "XY") → "XY"
This is the simplest as I can make it with my current knowledge of regex:
public String wordEnds(String str, String word) {
return str.replaceAll(
".*?(?=word)(?<=(.|^))word(?=(.|$))|.+"
.replace("word", java.util.regex.Pattern.quote(word)),
"$1$2"
);
}
replace is used to place in the actual word string into the pattern for readability. Pattern.quote isn't necessary to pass their tests, but I think it's required for a proper regex-based solution.
The regex has two major parts:
If after matching as few characters as possible ".*?", word can still be found "(?=word)", then lookbehind to capture any character immediately preceding it "(?<=(.|^))", match "word", and lookforward to capture any character following it "(?=(.|$))".
The initial "if" test ensures that the atomic lookbehind captures only if there's a word
Using lookahead to capture the following character doesn't consume it, so it can be used as part of further matching
Otherwise match what's left "|.+"
Groups 1 and 2 would capture empty strings
I think this works in all cases, but it's obviously quite complex. I'm just wondering if others can suggest a simpler regex to do this.
Note: I'm not looking for a solution using indexOf and a loop. I want a regex-based replaceAll solution. I also need a working regex that passes all codingbat tests.
I managed to reduce the occurrence of word within the pattern to just one.
".+?(?<=(^|.)word)(?=(.?))|.+"
I'm still looking if it's possible to simplify this further, but I also have another question:
With this latest pattern, I simplified .|$ to just .? successfully, but if I similarly tried to simplify ^|. to .? it doesn't work. Why is that?
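For anyone who wants to experiment with the first pattern from the question, a small harness like the following may help. It simply wraps the question's own replaceAll call and runs it against the four sample cases; WordEndsDemo is a hypothetical name.

```java
import java.util.regex.Pattern;

final class WordEndsDemo {
    static String wordEnds(String str, String word) {
        // Substitute the quoted word into the pattern, as in the question.
        String pattern = ".*?(?=word)(?<=(.|^))word(?=(.|$))|.+"
                .replace("word", Pattern.quote(word));
        return str.replaceAll(pattern, "$1$2");
    }

    public static void main(String[] args) {
        String[] inputs = {"abcXY123XYijk", "XY123XY", "XY1XY", "XYXY"};
        for (String input : inputs) {
            System.out.println(input + " -> " + wordEnds(input, "XY"));
        }
    }
}
```

Because the word is passed through Pattern.quote, the same harness can be used to try words containing regex metacharacters.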
Based on your solution I managed to simplify the code a little bit:
public String wordEnds(String str, String word) {
return str.replaceAll(".*?(?="+word+")(?<=(.|^))"+word+"(?=(.|$))|.+","$1$2");
}
Another way of writing it would be:
public String wordEnds(String str, String word) {
return str.replaceAll(
String.format(".*?(?=%1$s)(?<=(.|^))%1$s(?=(.|$))|.+", word),
"$1$2");
}
With this latest pattern, I simplified .|$ to just .? successfully, but if I similarly tried to simplify ^|. to .? it doesn't work. Why is that?
In Oracle's implementation, the behavior of look-behind is as follows:
By "studying" the regex (with study() method in each node), it knows the maximum length and minimum length of the pattern in look-behind group. (The study() method is what allows for obvious look-behind length)
It verifies the look-behind by starting a match at every position from index (current - min_length) to position (current - max_length) and exits early if the condition is satisfied.
Effectively, it will try to verify the look-behind on the shortest string first.
The implementation multiplies the matching complexity by O(k) factor.
This explains why changing ^|. to .? doesn't work: due to the starting position, it effectively checks for word before .word. The quantifier doesn't have a say here, since the ordering is imposed by the match range.
You can check the code of match method in Pattern.Behind and Pattern.NotBehind inner classes to verify what I said above.
In .NET's flavor, look-behind is likely implemented by the reverse matching feature, which means that no extra factor is incurred on the matching complexity.
My suspicion comes from the fact that the capturing group in (?<=(a+))b matches all a's in aaaaaaaaaaaaaab. The quantifier is shown to have free rein in the look-behind group.
I have tested that ^|. can be simplified to .? in .NET and the regex works correctly.
I am working in .NET's regex but I was able to change your pattern to:
.+?(?<=(\w?)word)(?=(\w?))|.+
with positive results. You know it's a word (alphanumeric) type character, so why not give the parser a valid hint of that fact: instead of any character, it's an optional alphanumeric character.
It may answer why you don't need to specify the anchors of ^ and $, for what exactly is $ - is it \r or \n or other? (.NET has issues with $, and maybe you are not exactly capturing a Null of $, but the null of \r or \n which allowed you to change to .? for $)
Another solution to look at...
public String wordEnds(String str, String word) {
if(str.equals(word)) return "";
int i = 0;
String result = "";
int stringLen = str.length();
int wordLen = word.length();
int diffLen = stringLen - wordLen;
while(i<=diffLen){
if(i==0 && str.substring(i,i+wordLen).equals(word)){
result = result + str.charAt(i+wordLen);
}else if(i==diffLen && str.substring(i,i+wordLen).equals(word)){
result = result + str.charAt(i-1);
}else if(str.substring(i,i+wordLen).equals(word)){
result = result + str.charAt(i-1) + str.charAt(i+wordLen) ;
}
i++;
}
if(result.length()==1) result = result + result;
return result;
}
Another possible solution:
public String wordEnds(String str, String word) {
String result = "";
if (str.contains(word)) {
for (int i = 0; i < str.length(); i++) {
if (str.startsWith(word, i)) {
if (i > 0) {
result += str.charAt(i - 1);
}
if ((i + word.length()) < str.length()) {
result += str.charAt(i + word.length());
}
}
}
}
return result;
}
