How can I get the number of occurrences of every substring in a string?
Example:
str="This is this my book is This"
The output should look like this:
This=3
is=2
my=1
book=1
If I understood you correctly this is a solution for your problem:
String str="This is this my book is This";
Map<String, Integer> counts = new HashMap<String, Integer>();
String[] words = str.toLowerCase().split("[\\s.,;!?]+"); // the + avoids empty tokens when delimiters are adjacent, e.g. ", "
for (String word: words) {
int count = counts.containsKey(word) ? counts.get(word).intValue() : 0;
counts.put(word, Integer.valueOf(count + 1));
}
You just split the string by the delimiters you want to consider and collect the occurrences in a map.
If I'm right, you want to count the occurrences of all words, not of all possible substrings. A very small, easy-to-understand piece of code would be the following:
// Split at space
String[] words = input.split(" ");
HashMap<String, Integer> countingMap = new HashMap<>();
for (String word : words) {
Integer counter = countingMap.get(word);
if (counter == null) {
counter = 0;
}
countingMap.put(word, counter + 1);
}
However, this approach is limited, as it assumes that words are separated by single spaces.
Regex is a more powerful tool. It provides a special token for a word boundary, \b, which also matches next to characters such as , . ! ? and so on. Consider the following pattern:
\b(.+?)\b
You can see an example here: regex101.com/r/hO8kA0/1
How to do this in Java?
Pattern pattern = Pattern.compile("\\b(.+?)\\b");
Matcher matcher = pattern.matcher(input);
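while (matcher.find()) {
String word = matcher.group(1);
// This pattern also matches the gaps between words (spaces, punctuation), so skip those
if (!word.matches("\\w+")) continue;
// Here is your word, count the occurrences like above
}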
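For completeness, here is a minimal sketch that combines the regex scan with the counting map. It is only one way to do it: the class and method names (WordCounter, countWords) are made up for this example, and it uses the simpler pattern \w+ in place of \b(.+?)\b so that only runs of word characters are matched. It also lower-cases the input, assuming "This" and "this" should be counted together.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class WordCounter {
    // \w+ matches runs of word characters, so spaces and punctuation are never returned.
    private static final Pattern WORD = Pattern.compile("\\w+");

    static Map<String, Integer> countWords(String input) {
        // LinkedHashMap keeps the words in order of first appearance.
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher matcher = WORD.matcher(input.toLowerCase());
        while (matcher.find()) {
            // merge() stores 1 for a new word, otherwise adds 1 to the existing count.
            counts.merge(matcher.group(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("This is this my book is This"));
        // prints {this=3, is=2, my=1, book=1}
    }
}
```

Map.merge requires Java 8 or later; on older versions the get/put pattern shown above does the same job.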
String str = "This is this my book is This";
String[] words = str.split(" ");
Map<String, Integer> unitwords = new HashMap<String, Integer>();
for (String word : words) {
    if (unitwords.containsKey(word)) {
        unitwords.put(word, unitwords.get(word) + 1);
    } else {
        unitwords.put(word, 1);
    }
}
Then print the map unitwords.
I want to count the occurrence of multiple substrings in a string.
I am able to do so by using the following code:
int score = 0;
String text = "This is some random text. This is some random text.";
List<String> words = Arrays.asList("this", "is", "for", "stackoverflow");
for (String word : words) {
if(StringUtils.containsIgnoreCase(text, word)){
score += 1;
}
}
My algorithm increases the score by +1 for each word from my "words" list that occurs in the text.
In my example, the score would be 2 because "this" and "is" occur in the text.
However, my code has to loop through the text for each string in my list.
Is there a faster way to do this?
How about the following:
String text = "This is some random text. This is some random text.";
text = text.toLowerCase();
String[] tokens = text.split("\\PL+");
java.util.Set<String> source = new java.util.HashSet<>();
for (String token : tokens) {
source.add(token);
}
java.util.List<String> words = java.util.Arrays.asList("this", "is", "for", "stackoverflow");
source.retainAll(words);
int score = source.size();
Split text into words.
Add the words to a Set so that each word appears only once. Hence Set will contain the word this only once despite the fact that the word this appears twice in text.
After calling method retainAll, the Set only contains words that are in the words list. Hence your score is the number of elements in the Set.
The fastest way is to put every word of the text into a hash map (or hash set) once.
Then, for each word you are searching for, you only have to look up its key in the map.
If your text has n words and your list has m words,
this takes O(n + m) instead of O(n * m).
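A rough sketch of that idea follows. The names WordScore and score are invented for the example, and it assumes whole-word, case-insensitive matching, which differs from StringUtils.containsIgnoreCase: that call also finds "is" inside "this".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class illustrating the O(n + m) lookup.
class WordScore {
    static int score(String text, List<String> words) {
        // Build the lookup set once: O(n) for the n words of the text.
        // \PL+ splits on runs of non-letter characters.
        Set<String> textWords = new HashSet<>(Arrays.asList(text.toLowerCase().split("\\PL+")));
        int score = 0;
        // Each contains() call is O(1) on average, so this loop is O(m).
        for (String word : words) {
            if (textWords.contains(word.toLowerCase())) {
                score++;
            }
        }
        return score;
    }

    public static void main(String[] args) {
        String text = "This is some random text. This is some random text.";
        System.out.println(score(text, Arrays.asList("this", "is", "for", "stackoverflow"))); // prints 2
    }
}
```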
This is a case where regex is your friend:
public static Map<String, Integer> countTokens(String src, List<String> tokens) {
Map<String, Integer> countMap = new HashMap<>();
String end = "(\\s|$)"; // match whitespace or the end of the string
String start = "(^|\\s)"; // match whitespace or the start of the string
// iterate through your tokens
for (String token : tokens) {
// create a pattern; note the case-insensitive flag, and Pattern.quote so the token is matched literally
Pattern pattern = Pattern.compile(start + Pattern.quote(token) + end, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(src);
int cnt = 0;
//count your matches.
while(matcher.find()) {
cnt++;
}
countMap.put(token, cnt);
}
return countMap;
}
public static void main(String[] args) throws IOException {
String text = "This is some random text. This is some random text.";
List<String> words = Arrays.asList("this", "is", "for", "stackoverflow");
for (Entry<String, Integer> entry : countTokens(text, words).entrySet()) {
System.out.println(entry);
}
}
If you want to find tokens within token, like "is" within "this", simply remove the start and end regex.
You can use the split method to convert the string into an array of Strings, sort it, and then binary search the array for each element of the list. This is implemented in the code below.
String[] wordsArr = "This is some random text. This is some random text.".toLowerCase().split(" ");
List<String> words = Arrays.asList("this", "is", "for", "stackoverflow");
int count = 0;
Arrays.sort(wordsArr);
for(String word: words)
if(Arrays.binarySearch( wordsArr, word )>-1)
count++;
Another good approach is to use a TreeSet. This one was inspired by Abra's answer.
String[] wordsArr = "This is some random text. This is some random text.".toLowerCase().split(" ");
List<String> words = Arrays.asList("this", "is", "for", "stackoverflow");
TreeSet<String> setOfWords = new TreeSet<String>(Arrays.asList(wordsArr));
int count = 0;
for(String word: words)
if(setOfWords.contains(word))
count++;
Both of these methods have a lookup time complexity of O(N log M), where N is the size of the words list and M is the size of wordsArr or setOfWords (sorting the array or building the TreeSet costs an additional O(M log M)).
Be careful, though: both have one obvious flaw, which is that they do not account for periods. If you search for "text", it won't be found, because the set/array contains "text." instead. You can get around that by removing all punctuation from the initial text and from the search words. If you want it to be accurate, set the regex in split() to "[^a-zA-Z]", which splits the String around non-alphabetical characters.
I have a question about replacing words. I have some strings, each of which looks like this:
String string = "today is a (happy) day, I would like to (explore) more about Java.";
I need to replace the words that have parentheses. I want to replace "(happy)" with "good", and "(explore)" with "learn".
I have some ideas, but I don't know how.
for (int i = 0; i < string.length(); i++) {
    for (int j = 0; j < string.length(); j++) {
        if ((string.charAt(i) == '(') && (string.charAt(j) == ')')) {
            String w1 = string.substring(i + 1, j);
            string = string.replace(w1, w2); // w2 is the new word
        }
    }
}
My problem is that I can only replace one word with one new word...
I am thinking of using a scanner to prompt me to give a new word and then replace it, how can I do this?
The appendReplacement and appendTail methods of Matcher are designed for this purpose. You can use a regex to scan for your pattern--a pair of parentheses with a word in the middle--then do whatever you need to do to determine the string to replace it with. See the javadoc.
An example, based on the example in the javadoc. I'm assuming you have two methods, replacement(word) that tells what you want to replace the word with (so that replacement("happy") will equal "good" in your example), and hasReplacement(word) that tells whether the word has a replacement or not.
Pattern p = Pattern.compile("\\((.*?)\\)");
Matcher m = p.matcher(source);
StringBuffer sb = new StringBuffer();
while (m.find()) {
String word = m.group(1);
String newWord = hasReplacement(word) ? replacement(word) : m.group(0);
m.appendReplacement(sb, newWord); // appends the replacement, plus any not-yet-used text that comes before the match
}
m.appendTail(sb); // appends any text left over after the last match
String result = sb.toString();
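To make that runnable end to end, here is a sketch in which a Map plays the role of both replacement(word) and hasReplacement(word). The class name ParenReplacer, the method name, and the use of a Map are assumptions made for the example. Matcher.quoteReplacement is added so that a replacement containing $ or \ is inserted literally.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class built around appendReplacement/appendTail.
class ParenReplacer {
    static String replaceParenthesized(String source, Map<String, String> replacements) {
        Matcher m = Pattern.compile("\\((.*?)\\)").matcher(source);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Fall back to the whole match, parentheses included, when no replacement is known.
            String newWord = replacements.getOrDefault(m.group(1), m.group(0));
            // quoteReplacement stops '$' and '\' in the replacement from being treated specially.
            m.appendReplacement(sb, Matcher.quoteReplacement(newWord));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> replacements = new HashMap<>();
        replacements.put("happy", "good");
        replacements.put("explore", "learn");
        String s = "today is a (happy) day, I would like to (explore) more about Java.";
        System.out.println(replaceParenthesized(s, replacements));
        // prints: today is a good day, I would like to learn more about Java.
    }
}
```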
Use the code below to replace the strings.
String string = "today is a (happy) day, I would like to (explore) more about Java.";
string = string.replaceAll("\\(happy\\)", "good");
string = string.replaceAll("\\(explore\\)", "learn");
System.out.println(string);
You can run a loop from 0 to length-1. When the loop encounters a (, store its index in a variable temp1. Continue until you encounter a ), and store its index in temp2. You can then replace that substring, parentheses included, using string.replace(string.substring(temp1, temp2+1), "Your desired string").
There is no need for nested loops. Use a single loop: store the index when you find an opening parenthesis and again when you find the closing one, replace that span with the new word, then continue the same loop to find the next pair. Because replacing words changes the length of the string, keep a copy of the string: loop over one and perform the replacements on the other.
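The single-loop idea could look roughly like the sketch below. It sidesteps the changing-length problem by copying into a StringBuilder instead of modifying the string being scanned. The names SinglePassReplacer and replaceInOrder are invented here, and passing the new words as a list, in the order the parenthesized words appear, is an assumption.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper class: one pass over the input, no nested loops.
class SinglePassReplacer {
    // Replaces each "(...)" group, in order, with the next entry of newWords.
    static String replaceInOrder(String s, List<String> newWords) {
        StringBuilder result = new StringBuilder();
        Iterator<String> next = newWords.iterator();
        int pos = 0; // start of the text not yet copied to result
        while (true) {
            int open = s.indexOf('(', pos);
            int close = (open < 0) ? -1 : s.indexOf(')', open);
            if (open < 0 || close < 0 || !next.hasNext()) {
                break; // no complete group left, or no replacement left
            }
            // Copy the text before the group, then the new word instead of the group.
            result.append(s, pos, open).append(next.next());
            pos = close + 1; // continue after the closing parenthesis
        }
        // Copy whatever follows the last replaced group.
        return result.append(s.substring(pos)).toString();
    }

    public static void main(String[] args) {
        String s = "today is a (happy) day, I would like to (explore) more about Java.";
        System.out.println(replaceInOrder(s, Arrays.asList("good", "learn")));
        // prints: today is a good day, I would like to learn more about Java.
    }
}
```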
Do not use a nested for loop. Search for occurrences of ( and ). Get the substring between these two characters and replace it with the value the user enters. Repeat until there are no more ( and ) pairs left.
import java.util.Scanner;
public class ReplaceWords {
public static String replaceWords(String s){
Scanner keyboard = new Scanner(System.in); // create the Scanner once, not on every iteration
while (s.contains("(") && s.contains(")")) {
String toBeReplaced = s.substring(s.indexOf("("), s.indexOf(")")+1);
System.out.println("Enter the word with which you want to replace "+toBeReplaced+" : ");
String replaceWith = keyboard.nextLine();
s = s.replace(toBeReplaced, replaceWith);
}
return s;
}
public static void main(String[] args) {
String myString ="today is a (happy) day, I would like to (explore) more about Java.";
myString = replaceWords(myString);
System.out.println(myString);
}
}
This snippet works for me, just load the HashMap up with replacements and iterate through:
import java.util.*;
public class Test
{
public static void main(String[] args) {
String string = "today is a (happy) day, I would like to (explore) more about Java.";
HashMap<String, String> hm = new HashMap<String, String>();
hm.put("\\(happy\\)", "good");
hm.put("\\(explore\\)", "learn");
for (Map.Entry<String, String> entry : hm.entrySet()) {
String key = entry.getKey();
String value = entry.getValue();
string = string.replaceAll(key, value);
}
System.out.println(string);
}
}
Remember, replaceAll takes a regex, so the pattern it needs to see is \(word\). In a Java string literal the backslashes themselves must be escaped, which gives "\\(word\\)".
I have an ArrayList of Strings and it contains records such as:
this is a first sentence
hello my name is Chris
what's up man what's up man
today is tuesday
I need to clear this list, so that the output does not contain repeated content. In the case above, the output should be:
this is a first sentence
hello my name is Chris
what's up man
today is tuesday
As you can see, the 3rd String has been modified and now contains only one statement, what's up man, instead of two.
In my list, a String is sometimes correct and sometimes doubled as shown above.
I want to get rid of it, so I thought about iterating through this list:
for (String s: myList) {
but I cannot find a way of eliminating the duplicates, especially since the length of each string is not fixed. By that I mean there might be a long record:
this is a very long sentence this is a very long sentence
or sometimes a short one:
single word single word
Is there some native Java function for that, maybe?
Assuming the String is repeated just twice, with a space in between as in your examples, the following code removes the repetition:
for (int i=0; i<myList.size(); i++) {
String s = myList.get(i);
String fs = s.substring(0, s.length()/2);
String ls = s.substring(s.length()/2+1, s.length());
if (fs.equals(ls)) {
myList.set(i, fs);
}
}
The code just splits each entry of the list into two substrings at the halfway point. If both halves are equal, it replaces the original element with one half, thus removing the repetition.
I was testing the code and did not see Brendan Robert's answer. This code follows the same logic as his answer.
I would suggest using regular expressions. I was able to remove duplicates using this pattern: \b([\w\s']+) \1\b
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
static String [] phrases = {
"this is a first sentence",
"hello my name is Chris",
"what's up man what's up man",
"today is tuesday",
"this is a very long sentence this is a very long sentence",
"single word single word",
"hey hey"
};
public static void main(String[] args) throws Exception {
String duplicatePattern = "\\b([\\w\\s']+) \\1\\b";
Pattern p = Pattern.compile(duplicatePattern);
for (String phrase : phrases) {
Matcher m = p.matcher(phrase);
if (m.matches()) {
System.out.println(m.group(1));
} else {
System.out.println(phrase);
}
}
}
}
Results:
this is a first sentence
hello my name is Chris
what's up man
today is tuesday
this is a very long sentence
single word
hey
Assumptions:
Uppercase words are equal to lowercase counterparts.
String fullString = "lol lol";
String[] words = fullString.split("\\W+");
StringBuilder stringBuilder = new StringBuilder();
Set<String> wordsHashSet = new HashSet<>();
for (String word : words) {
// Check for duplicates
if (wordsHashSet.contains(word.toLowerCase())) continue;
wordsHashSet.add(word.toLowerCase());
stringBuilder.append(word).append(" ");
}
String nonDuplicateString = stringBuilder.toString().trim();
Simple logic: split the string on the space character " ", add the words to a LinkedHashSet (which drops duplicates but keeps insertion order), and join them back together.
String s = "I want to walk my dog I want to walk my dog";
Set<String> temp = new LinkedHashSet<>();
String[] arr = s.split(" ");
for (String ss : arr)
    temp.add(ss);
// String.join avoids stripping "[", "]" and "," from the set's toString(), which would also delete commas inside words
String newl = String.join(" ", temp);
System.out.println(newl);
Output: I want to walk my dog
It depends on your situation. Assuming the string is repeated at most twice, and never three or more times, you can take the length of the entire string, find the halfway point, and compare each character after the halfway point with the character at the matching index in the first half. If the string can be repeated more than twice, you need a more complicated algorithm: first determine how many times the string is repeated, then find the starting index of each repeat, and truncate everything from the start of the first repeat onward. If you can provide more context about the scenarios you expect to handle, we can start putting some ideas together.
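One possible shape for the more general case, offered as a sketch rather than a definitive algorithm: work on words instead of characters and look for the shortest leading group of words that rebuilds the whole entry when repeated. The names RepeatCollapser and collapse are made up for this example. It assumes repeats are separated by whitespace, compares words case-sensitively, and will also collapse entries that are legitimately doubled, such as "bye bye".

```java
import java.util.Arrays;

// Hypothetical helper class: handles two, three, or more repeats of a phrase.
class RepeatCollapser {
    // Returns the shortest leading phrase that, repeated, rebuilds the input;
    // returns the input unchanged when it is not a repetition.
    static String collapse(String s) {
        String[] words = s.trim().split("\\s+");
        int n = words.length;
        // Try candidate phrase lengths from shortest to longest.
        for (int unit = 1; unit <= n / 2; unit++) {
            if (n % unit != 0) {
                continue; // the phrase must fit a whole number of times
            }
            boolean repeats = true;
            // Every word must equal the word at the same offset in the first phrase.
            for (int i = unit; i < n && repeats; i++) {
                repeats = words[i].equals(words[i % unit]);
            }
            if (repeats) {
                return String.join(" ", Arrays.copyOfRange(words, 0, unit));
            }
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(collapse("what's up man what's up man")); // prints: what's up man
        System.out.println(collapse("today is tuesday"));            // prints: today is tuesday
    }
}
```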
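// Doing it in Java 8: drops a word when it repeats the word immediately before it (ignoring case)
String str1 = "I am am am a good Good coder";
String[] arrStr = str1.split(" ");
String[] element = new String[1]; // one-element array so the lambda can update it
String result = Arrays.stream(arrStr).filter(word -> { // the parameter must not reuse the name str1
    if (!word.equalsIgnoreCase(element[0])) {
        element[0] = word;
        return true;
    }
    return false;
}).collect(Collectors.joining(" "));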
How can I find the last word of a string? I am not trying to find a fixed word, in other words, I would not know what the last word is, however I want to retrieve it.
Here is my code:
myString = myString.trim();
String[] wordList = myString.split("\\s+");
System.out.println(wordList[wordList.length-1]);
Provided you consider words to be delimited by whitespace and punctuation (commas, spaces, new lines, brackets, and so on), so that punctuation can appear at the end of the sentence, and you want to allow non-ASCII letters in words, the following finds the last word in a string without the trailing punctuation:
static String lastWord(String sentence) {
Pattern p = Pattern.compile("([\\p{Alpha}]+)(?=\\p{Punct}*$)", Pattern.UNICODE_CHARACTER_CLASS);
Matcher m = p.matcher(sentence);
if (m.find()) {
return m.group();
}
return ""; // or null
}
The regular expression uses look-ahead to find zero-or-more punctuations at the end of the string and matches the alphabetical word before it.
If you want to also allow numbers in the word, change {Alpha} to {Alnum}.
Read the String API for various methods you might use.
For example you could:
Use the lastIndexOf(...) method to find where the start of the word is
Then use the substring(...) method to get the word
Use a StringTokenizer for this:
StringTokenizer st = new StringTokenizer("this is a test"); // take any String
int count = st.countTokens(); // number of tokens in the String
String[] myStringArray = new String[count];
for (int i = 0; i < count; i++) {
    myStringArray[i] = st.nextToken(); // insert each word/token into the array
}
System.out.println("Last Word is--" + myStringArray[myStringArray.length - 1]); // the last word of the given String
System.out.println("enter the string");
Scanner input = new Scanner(System.in);
String instr = input.nextLine();
instr = instr.trim();
int index = instr.lastIndexOf(" ");
int l = instr.length();
System.out.println(l);
String lastStr = instr.substring(index+1,l);
System.out.println("last string .."+lastStr);
I have a requirement to find the number of times a particular word appears in a file.
For example:
String str = "Hi hello how are you. hell and heaven. hell, gjh, hello,sdnc ";
In this string I want to count the number of times the word "hell" appears. The count should include "hell" and "hell," but not "hello".
So for the given string I want the count to be 2.
I used the following approaches.
1st:
int match = StringUtils.countMatches(str, "hell");
StringUtils is from the org.apache.commons.lang3 library.
2nd:
int count = 0;
Pattern p = Pattern.compile("hell");
Matcher m = p.matcher(str);
while (m.find()) {
count++;
}
3rd:
int count = 0;
String[] s = str.split(" ");
for (String word : s)
    if (word.equals("hell"))
        count++;
The first two approaches gave 4 as the answer and the third gave 1.
Please suggest a way to get 2 as the answer and fulfill my requirement.
You should use word boundary matchers in regex:
Pattern.compile("\\bhell\\b");
You can use a regular expression with the "\\b" word boundaries as follows:
int matches = 0;
Matcher matcher = Pattern.compile("\\bhell\\b").matcher(str); // matching is case-sensitive by default; Pattern has no CASE_SENSITIVE flag
while (matcher.find()) matches++;
Give this a try
String str = "put the string to be searched here";
Scanner sc = new Scanner(str);
String search = "put the string you are searching here";
int counter = 0; //this will count the number of occurences
while (sc.hasNext())
{
if (sc.next().equals(search)) // compare contents with equals(); == compares references
counter++;
}
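Since sc.next() reads a complete token, "hello" will never be counted as "hell". Note that tokens are split on whitespace by default, so "hell," and "hell." are not counted either unless you change the delimiter, for example with sc.useDelimiter("\\W+").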
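Here is a small sketch of the Scanner approach with a custom delimiter, so that punctuation does not stick to the tokens. The class and method names (TokenCounter, countWord) are hypothetical, and treating every non-word character as a separator is an assumption about what should count as a word.

```java
import java.util.Scanner;

// Hypothetical helper class: counts whole-word occurrences using Scanner tokens.
class TokenCounter {
    static int countWord(String text, String search) {
        int counter = 0;
        try (Scanner sc = new Scanner(text)) {
            // Treat every run of non-word characters as a separator,
            // so "hell," and "hell." both yield the token "hell".
            sc.useDelimiter("\\W+");
            while (sc.hasNext()) {
                if (sc.next().equals(search)) {
                    counter++;
                }
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        String str = "Hi hello how are you. hell and heaven. hell, gjh, hello,sdnc ";
        System.out.println(countWord(str, "hell")); // prints 2
    }
}
```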