How to extract words from a string in Java

I've imported a file and turned it into a String called readFile. The file contains two lines:
qwertyuiop00%
qwertyuiop
I have already extracted the "00" from the string using:
String number = readFile.substring(11, 13);
I now want to extract the "ert" and the "uio" in "qwertyuiop"
When I try to use the same method as the first, like so:
String e = readFile.substring(16, 19);
String u = readFile.substring(20, 23);
and try to use:
System.out.println(e + "and" + u);
It says string index out of range.
How do I go about this?
Is it because the next two words I want to extract from the string are on the second line?
If so, how do I extract only the second line?
I want to keep it basic, thanks.
UPDATE:
it turns out only the first line of the file is being read, does anyone know how to make it so it reads both lines?

If you count the characters in the string, you'll find that the indexes you're entering are past the end of it.
qwertyuiop00% is 13 characters. Call the .length() method on the string to verify that the length is what you expect.
I would debug by adding the following before the substring calls:
System.out.println(readFile);
System.out.println(readFile.length());
Note:
qwertyuiop00% qwertyuiop is 24 characters, since the space counts as a character. Unless, of course, you don't have the space, in which case it's 23 characters and your indexes run from 0 to 22.
Note 2:
I asked for the parser code since I suspect you're using the usual code, which is something like:
while ((line = reader.readLine()) != null)
You need to concatenate those lines into one String (though it's not the best approach).
See: How do I create a Java string from the contents of a file?
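A minimal sketch of the concatenation idea, assuming the lines are joined with a single '\n'. The class name ReadAllLinesSketch and the method readAll are made up for illustration, and a StringReader stands in for the real file so the example is self-contained:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReadAllLinesSketch {
    // Reads every line from the reader and joins them with a single '\n'.
    static String readAll(BufferedReader reader) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!first) {
                    sb.append('\n'); // readLine() strips the line terminator, so put one back
                }
                sb.append(line);
                first = false;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: the file holds exactly the two lines from the question.
        BufferedReader reader =
                new BufferedReader(new StringReader("qwertyuiop00%\nqwertyuiop"));
        String readFile = readAll(reader);
        System.out.println(readFile.length());          // 24
        System.out.println(readFile.substring(16, 19)); // ert
        System.out.println(readFile.substring(20, 23)); // uio
    }
}
```

With a one-character separator between the lines, the indexes from the question (16 to 19 and 20 to 23) do land on "ert" and "uio", which suggests the exception came from only the first line being read.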

First, split your string into lines. You could do this using
String[] lines = readFile.split("[\r\n]+");
You may want to read the content directly into a List<String> using Files#readAllLines instead.
Second, do not use hard-coded indexes; use String#indexOf to find them. If a substring does not occur in your original string, the method returns -1, so always check for that value and call substring only when the return value is not -1 (0 or greater).
if(lines.length > 1) {
int startIndex = lines[1].indexOf("ert");
if(startIndex != -1) {
// do what you want
}
}
By the way, there is no point in extracting an already known substring from a string.
System.out.println(e + "and" + u);
is equivalent to
System.out.println("ertanduio");
Knowing the start and end position of a fixed substring only makes sense if you want to do something with the rest of the original string, for example removing the substrings.
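As a rough sketch of that last point, here is one way to remove a known substring from the second line using indexOf rather than hard-coded indexes. The class and method names are hypothetical, and it assumes only the first occurrence should be removed:

```java
class SecondLineSketch {
    // Returns the second line with the first occurrence of target removed,
    // or null when there is no second line or the target does not occur in it.
    static String removeFromSecondLine(String readFile, String target) {
        String[] lines = readFile.split("[\r\n]+");
        if (lines.length < 2) {
            return null;
        }
        String line = lines[1];
        int start = line.indexOf(target);
        if (start == -1) {
            return null; // never call substring with -1
        }
        // Keep everything before the match and everything after it.
        return line.substring(0, start) + line.substring(start + target.length());
    }

    public static void main(String[] args) {
        String readFile = "qwertyuiop00%\nqwertyuiop";
        System.out.println(removeFromSecondLine(readFile, "ert")); // qwyuiop
        System.out.println(removeFromSecondLine(readFile, "uio")); // qwertyp
    }
}
```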

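You may give this a try:
Scanner sc = new Scanner(new FileReader(new File("path/to/readFile.txt"))); // placeholder: use the actual path to your file
String st = "";
while (sc.hasNext()) {
st = sc.next(); // after the loop, st holds the last token in the file, i.e. the second line
}
System.out.println(st.substring(2, 5) + " and " + st.substring(6, 9));
Check whether it works.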

Related

Removing letters from a string using a while loop in Java

while (sentence.indexOf(lookFor) > lookFor)
{
sentence += sentence.substring(sentence.indexOf(lookFor));
}
String cleaned = sentence;
return cleaned;
This is what I have tried to do in order to remove letters. lookFor is a char that was put in already, and sentence is the original sentence string that was put in already. Currently, my code outputs the sentence without doing anything to it.
EX Correct Output: inputting "abababa" sentence; char as "a" --->outputting "bbb"
inputting "xyxyxy" sentence; char "a" ---> outputting "xyxyxy"

You don't need a while loop for a single string, only if you read a text line by line.
In your case, something like
String a = "abababa";
a = a.replace("a","");
would give you the output "bbb"

It probably isn't entering the loop at all.
sentence.indexOf(lookFor) returns the position of the character in the string.
lookFor is a char value. 'a' has a numeric value of 97, so the loop condition is only true when the character is first found past index 97.
If your code ever entered the loop, it would never return.
The substring call you are making takes everything from the found character to the end of the string.
+= then appends that to the sentence itself, so it would take 'ababab' and make it 'abababababab', and keep growing forever.
What you want is:
String something = "abababab";
something = something.replaceAll("a", "");

If you just need to get rid of letters, use the replace method that others have written. But if you want to use a while loop, based on what I've seen of your logic, this is how you'd do it.
while (sentence.indexOf(lookFor) == 0)
sentence = sentence.substring(1);
while (sentence.indexOf(lookFor) > 0)
{
// keep everything before the match and everything after it
sentence = sentence.substring(0, sentence.indexOf(lookFor)) +
sentence.substring(sentence.indexOf(lookFor) + 1);
}
return sentence;
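The same idea can also be written as a single loop. This is only a sketch (RemoveCharSketch and removeAll are made-up names), and it compares the index against 0 rather than against the char itself:

```java
class RemoveCharSketch {
    // Removes every occurrence of lookFor from sentence using a while loop.
    static String removeAll(String sentence, char lookFor) {
        int index = sentence.indexOf(lookFor);
        while (index >= 0) {
            // Drop the character at index by gluing the two sides together.
            sentence = sentence.substring(0, index) + sentence.substring(index + 1);
            // Continue searching from the same position; earlier characters are already clean.
            index = sentence.indexOf(lookFor, index);
        }
        return sentence;
    }

    public static void main(String[] args) {
        System.out.println(removeAll("abababa", 'a')); // bbb
        System.out.println(removeAll("xyxyxy", 'a'));  // xyxyxy
    }
}
```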

Java - Changing multiple words in a string at once?

I'm trying to create a program that can abbreviate certain words in a string given by the user.
This is how I've laid it out so far:
Create a hashmap from a .txt file such as the following:
thanks,thx
your,yr
probably,prob
people,ppl
Take a string from the user
Split the string into words
Check the hashmap to see if that word exists as a key
Use hashmap.get() to return the key value
Replace the word with the key value returned
Return an updated string
It all works perfectly fine until I try to update the string:
public String shortenMessage( String inMessage ) {
String updatedstring = "";
String rawstring = inMessage;
String[] words = rawstring.replaceAll("[^a-zA-Z ]", "").toLowerCase().split("\\s+");
for (String word : words) {
System.out.println(word);
if (map.containsKey(word) == true) {
String x = map.get(word);
updatedstring = rawstring.replace(word, x);
}
}
System.out.println(updatedstring);
return updatedstring;
}
Input:
thanks, your, probably, people
Output:
thanks, your, probably, ppl
Does anyone know how I can update all the words in the string?
Thanks in advance

updatedstring = rawstring.replace(word, x);
This keeps overwriting updatedstring with rawstring plus a single replacement.
You need to do something like
updatedstring = rawstring;
...
updatedstring = updatedstring.replace(word, x);
Edit:
That is the solution to the problem you are seeing, but there are a few other problems with your code:
Your replacement won't work for words that you had to lowercase or strip characters from. You build the words array that you iterate over from an altered version of rawstring. Then you go back and try to replace those altered words in the original rawstring, where they may not exist. This will not find the words you think you are replacing.
If you are doing global replacements, you could just create a set of words instead of an array, since once a word is replaced it shouldn't come up again.
You might want to replace the words one at a time, because your global replacement could cause weird bugs where a word in the replacement map is a sub-word of another word. Instead of using String.replace, make an array/list of words, iterate over the words, replace the element in the list if needed, and join them. In Java 8:
String.join(" ", elements);
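A sketch of that word-by-word approach, under a few assumptions of mine: the message is split on whitespace, only ASCII letters count as part of a word, and punctuation attached to a word is kept. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AbbreviateSketch {
    // Replaces each whole word that appears as a key in map with its abbreviation.
    static String shorten(String message, Map<String, String> map) {
        List<String> elements = new ArrayList<>();
        for (String token : message.split("\\s+")) {
            // Strip punctuation to get the lookup key, e.g. "Thanks," -> "thanks".
            String letters = token.replaceAll("[^a-zA-Z]", "");
            String replacement = map.get(letters.toLowerCase());
            if (!letters.isEmpty() && replacement != null) {
                // Swap only the letters so trailing punctuation survives.
                token = token.replace(letters, replacement);
            }
            elements.add(token);
        }
        return String.join(" ", elements);
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("thanks", "thx");
        map.put("your", "yr");
        map.put("probably", "prob");
        map.put("people", "ppl");
        System.out.println(shorten("thanks, your, probably, people", map)); // thx, yr, prob, ppl
        System.out.println(shorten("your yourself", map));                  // yr yourself
    }
}
```

Because each token is looked up as a whole, "yourself" is left alone even though it contains "your".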

Replace with empty space

I have done this before, but now I've run into a different problem. I want to extract just the digits at the end of "Homework 1: 89", which is in a .txt file. I have usually used .replaceAll("[\\D]", ""). But if I do that this time, the number before the colon (1 in the example) stays... I cannot see any solution.
It should look like this:
while (dataSc.hasNextLine()) {
String data = dataSc.nextLine();
ArrayData.add(i, data);
if (data.contains("Homework ")) {
idData.add(a, data);
idData.set(a, (idData.get(a).replaceAll("[\\D]", "")));
The output should be a new string with just "89"...

Thanks for editing your question.
If you are simply trying to get the end whenever there is the word Homework, and you can count on the format being consistent, you can do the following:
String[] tokens = data.split(": ");
System.out.println(tokens[1]);
In your code, you would place this inside the if statement where you are trying to get only the numbers after the colon from data.
The code breaks your string into multiple components, splitting it wherever it sees ": ".
In your example of "Homework 1: 89" it will break your data into two "tokens":
1:"Homework 1"
2:"89"
So to get the second token we access tokens[1], because array indexes start at 0.

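Use the code below. Note that both examples remove every non-digit character, so for "Homework 1: 89" the result is "189".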
1) String str="Homework 1: 89";
str = str.replaceAll("\\D+","");
2) String str="sdfvsdf68fsdfsf8999fsdf09";
String numberOnly= str.replaceAll("[^0-9]", "");
System.out.println(numberOnly);
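If only the number at the end of the line is wanted, one option is to match the trailing digits instead of deleting non-digits. A minimal sketch, assuming the number is the last thing on the line apart from optional whitespace (TrailingNumberSketch and trailingNumber are made-up names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingNumberSketch {
    // One or more digits, optionally followed by whitespace, at the end of the input.
    private static final Pattern TRAILING_DIGITS = Pattern.compile("(\\d+)\\s*$");

    // Returns the digits at the end of data, or an empty string if there are none.
    static String trailingNumber(String data) {
        Matcher m = TRAILING_DIGITS.matcher(data);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        System.out.println(trailingNumber("Homework 1: 89")); // 89
    }
}
```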

How to delete duplicated characters in a string?

Okay, I'm a huge newbie in the world of Java and I can't seem to get this program right. I am supposed to delete the duplicated characters in a two-word string and print the non-duplicated characters.
For example: I input the words "computer program" and the output should be "cute", because these are the only chars that are not repeated.
I made it until here:
public static void main(String[] args) {
System.out.print("Input two words: ");
String str1 = Keyboard.readString();
String words[] = str1.split(" ");
String str2 = words[0] + " ";
String str3 = words[words.length - 1] ;
}
but I don't know how to output the characters. Could someone help me?
I don't know if I should use if, switch, for, do, or do-while...... I'm confused.

What you need is to build up the logic for your problem. First break down the problem statement, then start finding a solution for each part. Here are the steps:
Read every character from the string.
Add it to a collection, but before adding it, check whether it already exists.
If it exists, remove it and continue reading characters.
Once you are done reading the characters, print the contents of the collection to the console using System.out.println.
I recommend you refer to books like "Think Like a Programmer". This will help you get started with building logic.

Just a hint: use a hash map (http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html).
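One way to read that hint, as a sketch only: count every character in a map and keep the ones that occur exactly once. I'm assuming spaces are ignored and using a LinkedHashMap so the input order is kept; the names are made up. Note that for "computer program" this prints "cutega" rather than "cute", because g and a also occur only once, so it only fits if "not repeated" means "occurs once in the whole input".

```java
import java.util.LinkedHashMap;
import java.util.Map;

class UniqueCharsSketch {
    // Returns the characters that occur exactly once in input, in input order.
    static String charsOccurringOnce(String input) {
        Map<Character, Integer> counts = new LinkedHashMap<>();
        for (char c : input.toCharArray()) {
            if (c == ' ') {
                continue; // assumption: the space between the two words is not counted
            }
            counts.merge(c, 1, Integer::sum);
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            if (e.getValue() == 1) {
                sb.append(e.getKey());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(charsOccurringOnce("computer program")); // cutega
    }
}
```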

Adding the following code after the last line of your main method will resolve your issue.
char[] strChars = str2.toCharArray();
String newStr="";
for (char c : strChars) {
String charStr = ""+c;
if(!str3.contains(charStr.toLowerCase()) && !str3.contains(charStr.toUpperCase())){
newStr+=c;
}
}
System.out.println(newStr);
This code loops through all the characters of the first word and checks whether the second word contains that character (in either lower or upper case). If it does not, the character is added to the output string, which is printed at the end.
Hope this works in your case.

How about doing it in just 1 line?
str = str.replaceAll("(.)(?=.*\\1)", "");

Finding multiple substrings using boundaries in Java

Alright, so here is my problem. Basically I have a string with 4 words in it, with each word separated by a #. What I need to do is use the substring method to extract each word and print it out. I am having trouble figuring out the parameters for it though. I can always get the first one right, but the following ones generally have problems.
Here is the first piece of the code:
word = format.substring( 0 , format.indexOf('#') );
Now from what I understand this basically means start at the beginning of the string, and end right before the #. So using the same logic, I tried to extract the second word like so:
wordTwo = format.substring ( wordlength + 1 , format.indexOf('#') );
//The plus one so I don't start at the #.
But with this I continually get errors saying it doesn't exist. I figured that the compiler was trying to read the first # before the second word, so I rewrote it like so:
wordTwo = format.substring (wordlength + 1, 1 + wordLength + format.indexOf('#') );
And with this it just completely screws it up, either not printing the second word or not stopping in the right place. If I could get any help on the formatting of this, it would be greatly appreciated. Since this is for a class, I am limited to using very basic methods such as indexOf, length, substring etc., so if you could refrain from using anything too complex that would be amazing!

If you have to use substring, then you need to use the variant of indexOf that takes a start index. This means you can look for the second # by starting the search after the first one, i.e.
wordTwo = format.substring ( wordlength + 1 , format.indexOf('#', wordlength + 1 ) );
There are however much better ways of splitting a string on a delimiter like this. You can use a StringTokenizer. This is designed for splitting strings like this. Basically:
StringTokenizer tok = new StringTokenizer(format, "#");
String word = tok.nextToken();
String word2 = tok.nextToken();
String word3 = tok.nextToken();
Or you can use the String.split method, which is designed for splitting strings, e.g.
String[] parts = format.split("#");
String word = parts[0];
String word2 = parts[1];
String word3 = parts[2];

You can use split() for strings formatted like this.
For instance, if you have a string like
String text = "Word1#Word2#Word3#Word4";
you can use a delimiter such as
String delimiter = "#";
Then create a string array like
String[] temp;
To split the string:
temp = text.split(delimiter);
The array then holds the words:
temp[0] // "Word1"
temp[1] // "Word2"
temp[2] // "Word3"
temp[3] // "Word4"

Use the split() method to do this, with "#" as the delimiter:
String s = "hi#vivek#is#good";
String temp = new String();
String[] arr = s.split("#");
for(String x : arr){
temp = temp + x;
}
Or if you want to extract each word... you have it already in arr
arr[0] ---> First Word
arr[1] ---> Second Word
arr[2] ---> Third Word

I suggest that you have a look at the Javadoc for String before you proceed further.
Since this is your homework, I'll give you a couple of hints and maybe you can solve it yourself:
The signature for substring is public String substring(int beginIndex, int endIndex). As per the Javadoc for this method:
Returns a new string that is a substring of this string. The substring
begins at the specified beginIndex and extends to the character at
index endIndex - 1. Thus the length of the substring is
endIndex-beginIndex.
Note that if you have to use this method, you'll have to shift your beginIndex and endIndex each time, because in your situation you have multiple words separated by #.
However if you look closely, there's another method in String class that might be helpful to you. That's the public String[] split(String regex) method. The javadoc for this one states:
Splits this string around matches of the given regular expression.
This method works as if by invoking the two-argument split method with
the given expression and a limit argument of zero. Trailing empty
strings are therefore not included in the resulting array.
The split() method looks pretty interesting for your case. You can split your String with the delimiter that you have as the parameter to this method, get the String array and work with that.
Hope this helps you to understand your problem and get started towards a solution :)

Since this is homework, it may be better to try to write it yourself. But I will give a clue.
Clue:
The indexOf method has another overload: int indexOf(int ch,
int fromIndex), which finds the first occurrence of the character ch in the string,
starting the search at fromIndex.
http://docs.oracle.com/javase/1.4.2/docs/api/java/lang/String.html
From this clue, the program will look something like this:
Find the index of the first '#' from the start of the string.
Extract the word from the 0th character to that index.
Find the index of the first '#' from the character AFTER the first '#'.
Extract the word from just after the first '#' to that index.
... Just do it until you get 4 words or the string ends.
Hope this helps.
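For reference once you have tried it yourself, the steps above can be sketched with only indexOf and substring. The class and method names are made up, and the result is collected in a list purely to keep the example short:

```java
import java.util.ArrayList;
import java.util.List;

class HashWordsSketch {
    // Splits format on '#' using only indexOf(ch, fromIndex) and substring.
    static List<String> words(String format) {
        List<String> result = new ArrayList<>();
        int start = 0;                         // where the current word begins
        int hash = format.indexOf('#', start); // next '#' at or after start
        while (hash != -1) {
            result.add(format.substring(start, hash));
            start = hash + 1;                  // skip past the '#'
            hash = format.indexOf('#', start);
        }
        result.add(format.substring(start));   // last word has no trailing '#'
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("one#two#three#four")); // [one, two, three, four]
    }
}
```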

I don't know why you're forced to use String#substring, but as others have mentioned, it seems like the wrong method for the kind of functionality you need.
String#split(String regex) is what you would use for such a problem, or, if your input sequence is something you don't control, I would suggest you look at the overloaded method String#split(String regex, int limit); this way you can impose a limit on the number of pieces the string is split into, controlling your resulting array.
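A small sketch of how the limit argument changes the result (the class and method names are hypothetical). A positive limit caps the number of pieces, and a negative limit keeps trailing empty strings that the one-argument version drops:

```java
import java.util.Arrays;

class SplitLimitSketch {
    // Splits off the first word and leaves the rest of the string untouched.
    static String[] firstAndRest(String text) {
        return text.split("#", 2);
    }

    public static void main(String[] args) {
        String text = "Word1#Word2#Word3#Word4";
        System.out.println(Arrays.toString(firstAndRest(text)));     // [Word1, Word2#Word3#Word4]
        System.out.println(Arrays.toString(text.split("#")));        // [Word1, Word2, Word3, Word4]
        System.out.println("a#b##".split("#").length);               // 2, trailing empty strings dropped
        System.out.println("a#b##".split("#", -1).length);           // 4, trailing empty strings kept
    }
}
```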
