Finding multiple substrings using boundaries in Java

Finding multiple substrings using boundaries in Java - java

Alright so here is my problem. Basically I have a string with 4 words in it, with each word seperated by a #. What I need to do is use the substring method to extract each word and print it out. I am having trouble figuring out the parameters for it though. I can always get the first one right, but the following ones generally have problems.
Here is the first piece of the code:
word = format.substring( 0 , format.indexOf('#') );
Now from what I understand this basically means start at the beginning of the string, and end right before the #. So using the same logic, I tried to extract the second word like so:
wordTwo = format.substring ( wordlength + 1 , format.indexOf('#') );
//The plus one so I don't start at the #.
But with this I continually get errors saying it doesn't exist. I figured that the compiler was trying to read the first # before the second word, so I rewrote it like so:
wordTwo = format.substring (wordlength + 1, 1 + wordLength + format.indexOf('#') );
And with this it just completely screws it up, either not printing the second word or not stopping in the right place. If I could get any help on the formatting of this, it would be greatly appreciated. Since this is for a class, I am limited to using very basic methods such as indexOf, length, substring etc. so if you could refrain from using anything to complex that would be amazing!

If you have to use substring then you need to use the variant of indexOf that takes a start. This means you can start look for the second # by starting the search after the first one. I.e.
wordTwo = format.substring ( wordlength + 1 , format.indexOf('#', wordlength + 1 ) );
There are however much better ways of splitting a string on a delimiter like this. You can use a StringTokenizer. This is designed for splitting strings like this. Basically:
StringTokenizer tok = new StringTokenizer(format, "#");
String word = tok.nextToken();
String word2 = tok.nextToken();
String word3 = tok.nextToken();
Or you can use the String.split method which is designed for splitting strings. e.g.
String[] parts = String.split("#");
String word = parts[0];
String word2 = parts[1];
String word3 = parts[2];

You can go with split() for this kind of formatting strings.
For instance if you have string like,
String text = "Word1#Word2#Word3#Word4";
You can use delimiter as,
String delimiter = "#";
Then create an string array like,
String[] temp;
For splitting string,
temp = text.split(delimiter);
You can get words like this,
temp[0] = "Word1";
temp[1] = "Word2";
temp[2] = "Word3";
temp[3] = "Word4";

Use split() method to do this with "#" as the delimiter
String s = "hi#vivek#is#good";
String temp = new String();
String[] arr = s.split("#");
for(String x : arr){
temp = temp + x;
}
Or if you want to exact each word... you have it already in arr
arr[0] ---> First Word
arr[1] ---> Second Word
arr[2] ---> Third Word

I suggest that you've a look at the Javadoc for String before you proceed further.
Since this is your homework, I'll give you a couple of hints and maybe you can solve it yourself:
The format for subString is public void subString(int beginIndex, int endIndex). As per the javadoc for this method:
Returns a new string that is a substring of this string. The substring
begins at the specified beginIndex and extends to the character at
index endIndex - 1. Thus the length of the substring is
endIndex-beginIndex.
Note that if you've to use this method, understand that you'll have to shift your beginIndex and endIndex each time because in your situation, you'll have multiple words that are separated by #.
However if you look closely, there's another method in String class that might be helpful to you. That's the public String[] split(String regex) method. The javadoc for this one states:
Splits this string around matches of the given regular expression.
This method works as if by invoking the two-argument split method with
the given expression and a limit argument of zero. Trailing empty
strings are therefore not included in the resulting array.
The split() method looks pretty interesting for your case. You can split your String with the delimiter that you have as the parameter to this method, get the String array and work with that.
Hope this helps you to understand your problem and get started towards a solution :)

Since this is a home work, it may be better to have try to write it your self. But I will give a clue.
Clue:
The indexOf method has another overload: int indexOf(int chr,
int fromIndex) which find the first character chr in the string
from the fromIndex.
http://docs.oracle.com/javase/1.4.2/docs/api/java/lang/String.html
From this clue, the program will look something like this:
Find the index of the first '#' from the start of the string.
Extract the word from 0th character to that index.
Find the index of the first '#' from the character AFTER the first '#'.
Extract the word from the first '#' that index.
... Just do it until you get 4 words or the string ends.
Hope this helps.

I don't know why you're forced to use String#substring, but as others have mentioned, it seems like the wrong method for the kind of functionality you need.
String#split(String regex) is what you would use for such a problem, or, if your input sequence is something you don't control, I would suggest you look at the overloaded method String#split(String regex, int limit); this way you can impose a limit on the amount of matches you make, controlling your resulting array.

Related

Deleting content of every string after first empty space

How can I delete everything after first empty space in a string which user selects? I was reading this how to remove some words from a string in java. Can this help me in my case?

You can use replaceAll with a regex \s.* which match every thing after space:
String str = "Hello java word!";
str = str.replaceAll("\\s.*", "");
output
Hello
regex demo
Like #Coffeehouse Coder mention in comment, This solution will replace every thing if the input start with space, so if you want to avoid this case, you can trim your input using string.trim() so it can remove the spaces in start and in end.

Assuming that there is no space in the beginning of the string.
Follow these steps-
Split the string at space. It will create an array.
Get the first element of that array.
Hope this helps.
str = "Example string"
String[] _arr = str.split("\\s");
String word = _arr[0];
You need to consider multiple white spaces and space in the beginning before considering the above code.
I am not native to JAVA Programming but have an idea that it has split function for string.
And the reference you cited in the question is bit complex, while you can achieve the desired thing very easily.
P.S. In future if you make a mind to get two words or three, splitting method is better (assuming you have already dealt with multiple white-spaces) else substring is better.

A simple way to do it can be:
System.out.println("Hello world!".split(" ")[0]);

// Taking 'str' as your string
// To remove the first space(s) of the string,
str = str.trim();
int index = str.indexOf(" ");
String word = str.substring(0, index);
This is just one method of many.
str = str.replaceAll("\\s+", " "); // This replaces one or more spaces with one space
String[] words = str.split("\\s");
String first = words[0];

The simplest solution in my opinion would be to just locate the index which the user wants it to be cut off at and then call the substring() method from 0 to the index they wanted. Set that = to a new string and you have the string they want.
If you want to replace the string then just set the original string = to the result of the substring() method.
Link to substring() method: https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#substring(int,%20int)

There are already 5 perfectly good answers, so let me add a sixth one. Variety is the spice of life!
private static final Pattern FIRST_WORD = Pattern.compile("\\S+");
public static String firstWord(CharSequence text) {
Matcher m = FIRST_WORD.matcher(text);
return m.find() ? m.group() : "";
}
Advantages over the .split(...)[0]-type answers:
It directly does exactly what is being asked, i.e. "Find the first sequence of non-space characters." So the self-documentation is more explicit.
It is more efficient when called on multiple strings (e.g. for batch processing a large list of strings) because the regular expression is compiled only once.
It is more space-efficient because it avoids unnecessarily creating a whole array with references to each word when we only need the first.
It works without having to trim the string.
(I know this is probably too late to be of any use to the OP but I'm leaving it here as an alternative solution for future readers.)

This would be more efficient
String str = "Hello world!";
int spaceInd = str.indexOf(' ');
if(spaceInd != -1) {
str = str.substring(0, spaceInd);
}
System.out.println(String.format("[%s]", str));

How to get the desired character from the variable sized strings?

I need to extract the desired string which attached to the word.
For example
pot-1_Sam
pot-22_Daniel
pot_444_Jack
pot_5434_Bill
I need to get the names from the above strings. i.e Sam, Daniel, Jack and Bill.
Thing is if I use substring the position keeps on changing due to the length of the number. How to achieve them using REGEX.
Update:
Some strings has 2 underscore options like
pot_US-1_Sam
pot_RUS_444_Jack

Assuming you have a standard set of above formats, It seems you need not to have any regex, you can try using lastIndexOf and substring methods.
String result = yourString.substring(yourString.lastIndexOf("_")+1, yourString.length());

Your answer is:
String[] s = new String[4];
s[0] = "pot-1_Sam";
s[1] = "pot-22_Daniel";
s[2] = "pot_444_Jack";
s[3] = "pot_5434_Bill";
ArrayList<String> result = new ArrayList<String>();
for (String value : s) {
String[] splitedArray = value.split("_");
result.add(splitedArray[splitedArray.length-1]);
}
for(String resultingValue : result){
System.out.println(resultingValue);
}

You have 2 options:
Keep using the indexOf method to get the index of the last _ (This assumes that there is no _ in the names you are after). Once that you have the last index of the _ character, you can use the substring method to get the bit you are after.
Use a regular expression. The strings you have shown essentially have the pattern where in you have numbers, followed by an underscore which is in turn followed by the word you are after. You can use a regular expression such as \\d+_ (which will match one or more digits followed by an underscore) in combination with the split method. The string you are after will be in the last array position.

Use a string tokenizer based on '_' and get the last element. No need for REGEX.
Or use the split method on the string object like so :
String[] strArray = strValue.split("_");
String lastToken = strArray[strArray.length -1];

String[] s = {
"pot-1_Sam",
"pot-22_Daniel",
"pot_444_Jack",
"pot_5434_Bill"
};
for (String e : s)
System.out.println(e.replaceAll(".*_", ""));

why split() produces extra , after sets limit -1

I want to split Area Code and preceding number from Telephone number without brackets so i did this.
String pattern = "[\\(?=\\)]";
String b = "(079)25894029".trim();
String c[] = b.split(pattern,-1);
for (int a = 0; a < c.length; a++)
System.out.println("c[" + a + "]::->" + c[a] + "\nLength::->"+ c[a].length());
Output:
c[0]::-> Length::->0
c[1]::->079 Length::->3
c[2]::->25894029 Length::->8
Expected Output:
c[0]::->079 Length::->3
c[1]::->25894029 Length::->8
So my question is why split() produces and extra blank at the start, e.g
[, 079, 25894029]. Is this its behavior, or I did something go wrong here?
How can I get my expected outcome?

First you have unnecessary escaping inside your character class. Your regex is same as:
String pattern = "[(?=)]";
Now, you are getting an empty result because ( is the very first character in the string and split at 0th position will indeed cause an empty string.
To avoid that result use this code:
String str = "(079)25894029";
toks = (Character.isDigit(str.charAt(0))? str:str.substring(1)).split( "[(?=)]" );
for (String tok: toks)
System.out.printf("<<%s>>%n", tok);
Output:
<<079>>
<<25894029>>

From the Java8 Oracle docs:
When there is a positive-width match at the beginning of this string
then an empty leading substring is included at the beginning of the
resulting array. A zero-width match at the beginning however never
produces such empty leading substring.
You can check that the first character is an empty string, if yes then trim that empty string character.

Your regex has problems, as does your approach - you can't solve it using your approach with any regex. The magic one-liner you seek is:
String[] c = b.replaceAll("^\\D+|\\D+$", "").split("\\D+");
This removes all leading/trailing non-digits, then splits on non-digits. This will handle many different formats and separators (try a few yourself).
See live demo of this:
String b = "(079)25894029".trim();
String[] c = b.replaceAll("^\\D+|\\D+$", "").split("\\D+");
System.out.println(Arrays.toString(c));
Producing this:
[079, 25894029]

Making only the first letter of a word uppercase

I have a method that converts all the first letters of the words in a sentence into uppercase.
public static String toTitleCase(String s)
{
String result = "";
String[] words = s.split(" ");
for (int i = 0; i < words.length; i++)
{
result += words[i].replace(words[i].charAt(0)+"", Character.toUpperCase(words[i].charAt(0))+"") + " ";
}
return result;
}
The problem is that the method converts each other letter in a word that is the same letter as the first to uppercase. For example, the string title comes out as TiTle
For the input this is a title this becomes the output This Is A TiTle
I've tried lots of things. A nested loop that checks every letter in each word, and if there is a recurrence, the second is ignored. I used counters, booleans, etc. Nothing works and I keep getting the same result.
What can I do? I only want the first letter in upper case.

Instead of using the replace() method, try replaceFirst().
result += words[i].replaceFirst(words[i].charAt(0)+"", Character.toUpperCase(words[i].charAt(0))+"") + " ";
Will output:
This Is A Title

The problem is that you are using replace method which replaces all occurrences of described character. To solve this problem you can either
use replaceFirst instead
take first letter,
create its uppercase version
concatenate it with rest of string which can be created with a little help of substring method.
since you are using replace(String, String) which uses regex you can add ^ before character you want to replace like replace("^a","A"). ^ means start of input so it will only replace a that is placed after start of input.
I would probably use second approach.
Also currently in each loop your code creates new StringBuilder with data stored in result, append new word to it, and reassigns result of output from toString().
This is infective approach. Instead you should create StringBuilder before loop that will represent your result and append new words created inside loop to it and after loop ends you can get its String version with toString() method.

Doing some Regex-Magic can simplify your task:
public static void main(String[] args) {
final String test = "this is a Test";
final StringBuffer buffer = new StringBuffer(test);
final Pattern patter = Pattern.compile("\\b(\\p{javaLowerCase})");
final Matcher matcher = patter.matcher(buffer);
while (matcher.find()) {
buffer.replace(matcher.start(), matcher.end(), matcher.group().toUpperCase());
}
System.out.println(buffer);
}
The expression \\b(\\p{javaLowerCase}) matches "The beginning of a word followed by a lower-case letter", while matcher.group() is equal to whats inside the () in the part that matches. Example: Applying on "test" matches on "t", so start is 0, end is 1 and group is "t". This can easily run through even a huge amount of text and replace all those letters that need replacement.
In addition: it is always a good idea to use a StringBuffer (or similar) for String manipulation, because each String in Java is unique. That is if you do something like result += stringPart you actually create a new String (equal to result + stringPart) each time this is called. So if you do this with like 10 parts, you will in the end have at least 10 different Strings in memory, while you only need one, which is the final one.
StringBuffer instead uses something like char[] to ensure that if you change only a single character no extra memory needs to be allocated.
Note that a patter only need to be compiled once, so you can keep that as a class variable somewhere.

why does this for loop wordcount method not work in java

Can anyone let me know why this wordsearch method doesn't work - the returned value of count is 0 everytime I run it.
public int wordcount(){
String spaceString = " ";
int count = 0;
for(int i = 0; i < this.getString().length(); i++){
if (this.getString().substring(i).equals(spaceString)){
count++;
}
}
return count;
}
The value of getString = my search string.
Much appreciated if anyone can help - I'm sure I'm prob doing something dumb.
Dylan

Read the docs:
The substring begins with the character at the specified index and extends to the end of this string.
Your if condition is only true once, if the last character of the string is a space. Perhaps you wanted charAt? (And even this won't properly handle double spaces; splitting on whitespace might be a better option.)

Because substring with only one argument returns the sub string starting from that index till the end of the string. So you're not comparing just one character.
Instead of substring define spaceString as a char, and use charAt(i)

this.getString().substring(i) -> this returns a sub string from the index i to the end of the String
So for example if your string was Test the above would return Test, est, st and finally t
For what you're trying to do there are alternative methods, but you could simple replace
this.getString().substring(i)
with
spaceString.equals(this.getString().charAt(i))
An alternative way of doing what you're trying to do is:
this.getString().split(spaceString)
This would return an array of Strings - the original string broken up by spaces.

Read the documentation of the method you are using:
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#substring(int)
I.e. the count will be non zero only if you have a space on the end of your string

Using substring as you are will not work. If the value of getString() is "my search string" every iteration through the loop with have substring(i) return:
my search string
y search string
search string
search string
earch string
arch string
rch string
ch string
h string
string
string
tring
ring
ing
ng
g
Notice none of those equals " ".
Try using split.
public int countWords(String s){
return s.split("\\s+").length;
}

Change
if (this.getString().substring(i).equals(spaceString))
to
if (this.getString().charAt(i) == ' ')

this.getString().substring(i) returns a string from the index of (i) to the end of the string.
Example: for i=5, it will return "rown cow" from the string "the brown cow". This functionality isn't what you need.
If you pepper System.out.println() throughout your code (or use the debugger), you will see this.
I think it would be better to use something like String.split() or charAt(i).
By the way, even if you fix your code by counting spaces, it will not return the correct value for these conditions: "my dog" (word count=2) and "cow" (word count=1). There is also a problem if there are more than one space between words. ALso, this will produce a word cound of three:
" the cow ".

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Finding multiple substrings using boundaries in Java - java

Related

Deleting content of every string after first empty space

How to get the desired character from the variable sized strings?

why split() produces extra , after sets limit -1

Making only the first letter of a word uppercase

why does this for loop wordcount method not work in java

Categories

Resources