Fastest way to search several strings in a string - java

Below is my code to find the occurrences of several substrings in a single given string:
public static void main(String... args) {
String fullString = "one is a good one. two is ok. three is three. four is four. five is not four";
String[] severalStringArray = { "one", "two", "three", "four" };
Map<String, Integer> countMap = countWords(fullString, severalStringArray);
}
public static Map<String, Integer> countWords(String fullString, String[] severalStringArray) {
Map<String, Integer> countMap = new HashMap<>();
for (String searchString : severalStringArray) {
if (countMap.containsKey(searchString)) {
int searchCount = countMatchesInString(fullString, searchString);
countMap.put(searchString, countMap.get(searchString) + searchCount);
} else
countMap.put(searchString, countMatchesInString(fullString, searchString));
}
return countMap;
}
private static int countMatchesInString(String fullString, String subString) {
int count = 0;
int pos = fullString.indexOf(subString);
while (pos > -1) {
count++;
pos = fullString.indexOf(subString, pos + 1);
}
return count;
}
Assume the full string might be a whole file read as a string. Is the above an efficient way to search, or is there a better or faster way to do it?
Thanks

You could just form a regex alternation of words to search, and then do a single search against that regex:
public static int matchesInString(String fullString, String regex) {
int count = 0;
Pattern r = Pattern.compile(regex);
Matcher m = r.matcher(fullString);
while (m.find())
++count;
return count;
}
String fullString = "one is a good one. two is ok. three is three. four is four. five is not four";
String[] severalStringArray = { "one", "two", "three", "four" };
String regex = "\\b(?:" + String.join("|", severalStringArray) + ")\\b";
int count = matchesInString(fullString, regex);
System.out.println("There were " + count + " matches in the input");
This prints:
There were 8 matches in the input
Note that the regex pattern used in the above example was:
\b(?:one|two|three|four)\b
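The alternation above returns a single total. If you need one count per word, as countWords in the question returns, a possible sketch is to read back which alternative matched with m.group() and tally it in a map during the same pass. The class and method names are illustrative. Wrapping each word in Pattern.quote is an added assumption so that words containing characters like '.' or '+' are matched literally; the \b anchors still assume the search words start and end with word characters.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RegexWordCounter {

    // Counts every search word with one pass of a single alternation regex.
    static Map<String, Integer> countEach(String text, String[] words) {
        if (words.length == 0) {
            return new HashMap<>(); // an empty alternation would match the empty string
        }
        // Quote each word so regex metacharacters are matched literally.
        String alternation = Arrays.stream(words)
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Pattern pattern = Pattern.compile("\\b(?:" + alternation + ")\\b");

        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.put(w, 0); // words that never occur are reported as 0
        }
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            // m.group() is the word that matched; add 1 to its count
            counts.merge(m.group(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String fullString = "one is a good one. two is ok. three is three. four is four. five is not four";
        String[] severalStringArray = { "one", "two", "three", "four" };
        System.out.println(countEach(fullString, severalStringArray));
    }
}
```

For the example string this yields 2 for one, 1 for two, 2 for three and 3 for four (the printed order of a HashMap is unspecified).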

Regular expressions
Your problem can be solved using regex (regular expressions). Regular expressions are a tool that helps you match patterns in strings. A pattern can be a literal word or a set of characters.
Regular expressions in Java
In Java there are two classes that help you with regular expressions: Pattern and Matcher.
Below you can see an example of searching for the word stackoverflow in the string stackoverflowXstackoverflowXXXstackoverflowXX:
String pattern = "stackoverflow";
String stringToExamine = "stackoverflowXstackoverflowXXXstackoverflowXX";
Pattern patternObj = Pattern.compile(pattern);
Matcher matcherObj = patternObj.matcher(stringToExamine);
Counting how many occurrences of a word are in a given string
As written here, there are different solutions depending on your Java version:
Java 9+
long matches = matcherObj.results().count();
Older Java versions
int count = 0;
while (matcherObj.find())
count++;
Regular expressions in your problem
You use a method to count how many times a word occurs in a text (a string); you can modify it like this:
Java 9+
public static int matchesInString(String fullString, String pattern)
{
Pattern patternObj = Pattern.compile(pattern);
Matcher matcherObj = patternObj.matcher(fullString);
return (int) matcherObj.results().count(); // results().count() returns a long
}
Older Java versions
public static int matchesInString(String fullString, String pattern)
{
int count = 0;
Pattern patternObj = Pattern.compile(pattern);
Matcher matcherObj = patternObj.matcher(fullString);
while (matcherObj.find())
count++;
return count;
}

Actually, the fastest way is to scan the string once, word by word, and count each word into a Map if it is one of the required words.
Keep it simple! A regular expression is too complicated and not efficient for this simple task. Let's solve it with a hammer!
public static void main(String... args) {
String str = "one is a good one. two is ok. three is three. four is four. five is not four";
Set<String> words = Set.of("one", "two", "three", "four");
Map<String, Integer> map = countWords(str, words);
}
public static Map<String, Integer> countWords(String str, Set<String> words) {
Map<String, Integer> map = new HashMap<>();
for (int i = 0, j = 0; j <= str.length(); j++) {
char ch = j == str.length() ? '\0' : str.charAt(j);
if (j == str.length() || !isWordSymbol(ch)) {
String word = str.substring(i, j);
if (!word.isEmpty() && words.contains(word))
map.put(word, map.getOrDefault(word, 0) + 1);
i = j + 1;
}
}
return map;
}
private static boolean isWordSymbol(char ch) {
return Character.isLetter(ch) || ch == '-' || ch == '_';
}

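An implementation of the trie that someone suggested in a comment. I don't know whether it's fast or not. Note that it counts a word at every position where it occurs, including inside longer words (for example "one" in "someone").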
static class Trie {
static final long INC_NODE_NO = 1L << Integer.SIZE;
private long nextNodeNo = 0;
private Node root = new Node();
private final Map<Long, Node> nodes = new HashMap<>();
public void put(String word) {
Node node = root;
for (int i = 0, len = word.length(); i < len; ++i)
node = node.put(word.charAt(i));
node.data = word;
}
public List<String> findPrefix(String text, int start) {
List<String> result = new ArrayList<>();
Node node = root;
for (int i = start, length = text.length(); i < length; ++i) {
if ((node = node.get(text.charAt(i))) == null)
break;
String v = node.data;
if (v != null)
result.add(v);
}
return result;
}
public Map<String, Integer> find(String text) {
Map<String, Integer> result = new HashMap<>();
for (int i = 0, length = text.length(); i < length; ++i)
for (String w : findPrefix(text, i))
result.compute(w, (k, v) -> v == null ? 1 : v + 1);
return result;
}
class Node {
final long no;
String data;
Node() {
this.no = nextNodeNo;
nextNodeNo += INC_NODE_NO;
}
Node get(int key) {
return nodes.get(no | key);
}
Node put(int key) {
return nodes.computeIfAbsent(no | key, k -> new Node());
}
}
}
public static void main(String args[]) throws IOException {
String fullString = "one is a good one. two is ok. three is three. four is four. five is not four";
String[] severalStringArray = { "one", "two", "three", "four" };
Trie trie = new Trie();
for (String word : severalStringArray)
trie.put(word);
Map<String, Integer> count = trie.find(fullString);
System.out.println(count);
}
output:
{four=3, one=2, three=2, two=1}

Related

How can I get the String result from the stringPattern value birdantantcatbirdcat?

I have dataDic, which is an array {"ant","bird","cat"}.
dataDic is the array of words that I want to search for in stringPattern.
I want to use dataDic to get the words from stringPattern = birdantantcatbirdcat.
Ex1.
dataDic = {"ant","bird","cat"}
answer is {bird,ant,ant,cat,bird,cat}
Ex2.
dataDic = {"ant","cat"}
answer is {ant,ant,cat,cat}
This is my code:
private static String stringTest="birdantantcatbirdcat";
private static List<String> dicListWord;
private static List<String> resultString = new ArrayList<>();
public static void main(String[] args) {
dicListWord = new ArrayList<>();
dicListWord.add("ant");
dicListWord.add("bird");
dicListWord.add("cat");
String[] data = stringTest.split("");
for (String dataDic:dicListWord) {
String [] wordList = dataDic.split("");
String foundWord = "";
for (String charTec:data) {
for (String dicWord:wordList) {
if(charTec.equals(dicWord)){
foundWord = foundWord.concat(charTec);
if(dataDic.equals(foundWord)){
resultString.add(foundWord);
foundWord = "";
}
}
}
}
}
for (String w1:data) {
for (String result:resultString) {
System.out.println(result);
}
}
}
And the result I get when I run it is:
{ant,ant,bird,bird,ant,ant,bird,bird,ant,ant,bird,bird,ant,ant,bird,bird,ant,antbird,bird,ant,ant,bird,bird,ant,ant,bird,bird,ant,ant,bird,bird,ant,ant,bird,bird,ant,ant,bird,bird,ant,ant,bird,bird,ant,ant,bird,bird,ant,ant,bird,bird,ant,ant,bird,bird,ant,ant,bird,bird,ant,ant,bird,bird,ant,ant,bird,bird,ant,ant,bird,bird,ant,ant,bird,bird,ant,ant,bird,bird}
Use a TreeMap to store the position of a word as the key and the word itself as the value as you navigate the string to find matches for the word. A TreeMap is the right choice because it is sorted according to the natural ordering of its keys, which is important for your requirement.
Your requirement states that the words in the resulting list should be in the order of their occurrences in the string.
Demo:
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
public class Main {
public static void main(String[] args) {
List<String> words = List.of("ant", "bird", "cat");
String str = "birdantantcatbirdcat";
System.out.println(getMatchingWords(words, str));
}
static List<String> getMatchingWords(List<String> words, String str) {
Map<Integer, String> map = new TreeMap<Integer, String>();
for (String word : words) {
Pattern pattern = Pattern.compile(word);
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
map.put(matcher.start(), matcher.group());
}
}
return map.values().stream().collect(Collectors.toList());
}
}
Output:
[bird, ant, ant, cat, bird, cat]
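Alternatively, a single alternation regex scans the string from left to right, so matches already come out in order of occurrence and no TreeMap is needed. This is a sketch assuming a longer word should win when one dictionary word is a prefix of another (the words are sorted by descending length before being joined, and quoted with Pattern.quote). It is a greedy left-to-right scan, not a full word-break search, and the class name is illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class OrderedMatches {

    // Returns the dictionary words in the order they occur in str (one regex pass).
    static List<String> getMatchingWords(List<String> words, String str) {
        if (words.isEmpty()) {
            return new ArrayList<>(); // an empty alternation would match the empty string
        }
        // Longest words first, so a longer word wins over a shorter prefix at the same position.
        String alternation = words.stream()
                .sorted((a, b) -> Integer.compare(b.length(), a.length()))
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Matcher m = Pattern.compile(alternation).matcher(str);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            // find() moves from left to right, so results are already in order of occurrence
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "birdantantcatbirdcat";
        System.out.println(getMatchingWords(List.of("ant", "bird", "cat"), str));
        System.out.println(getMatchingWords(List.of("ant", "cat"), str));
    }
}
```

With the two dictionaries from the question this gives [bird, ant, ant, cat, bird, cat] and [ant, ant, cat, cat].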
This is a word break problem and can be solved using a depth-first search. But it is wise to first check whether the given string pattern can be broken into dictionary words at all; this gives a better run time when a long string pattern doesn't match any words in the dictionary.
public class P00140_Word_Break_II {
public static void main(String[] args) {
String input = "catsanddog";
List<String> wordDict = Arrays.asList("cat", "cats", "and", "sand", "dog");
P00140_Word_Break_II solution = new P00140_Word_Break_II();
List<String> results = solution.wordBreak(input, wordDict);
System.out.println(results);
String input1 = "birdantantcatbirdcat";
List<String> wordDict1 = Arrays.asList("ant","bird","cat");
List<String> results1 = solution.wordBreak(input1, wordDict1);
System.out.println(results1);
}
public List<String> wordBreak(String s, List<String> wordDict) {
Set<String> dict = new HashSet<>(wordDict);
List<String> result = new ArrayList<>();
if (s == null || s.length() == 0 || !isbreakable(s, dict)) {
return result;
}
helper(s, 0, new StringBuilder(), dict, result);
return result;
}
public void helper(String s, int start, StringBuilder item, Set<String> dict, List<String> results) {
if (start >= s.length()) {
results.add(item.toString());
return;
}
if (start != 0) {
item.append(" ");
}
for (int i = start; i < s.length(); i++) {
String temp = s.substring(start, i + 1);
if (dict.contains(temp)) {
item.append(temp);
helper(s , i+1 , item , dict , results);
item.delete(item.length() + start - i - 1 , item.length());
}
}
if(start!=0) item.deleteCharAt(item.length()-1);
}
private boolean isbreakable(String s, Set<String> dict) {
boolean[] dp = new boolean[s.length() + 1];
dp[0] = true;
for (int i = 1; i <= s.length(); i++) {
for (int j = 0; j < i; j++) {
String subString = s.substring(j, i);
if (dp[j] && dict.contains(subString)) {
dp[i] = true;
break;
}
}
}
return dp[s.length()];
}
}

Efficient and non-interfering way of replacing multiple substrings in a String

I'm trying to apply the same replacement instructions several thousand times to different input strings with as little overhead as possible. I need to consider two things for this:
The search Strings aren't necessarily all the same length: one may be just "a", another might be "ch", yet another might be "sch"
What was already replaced shall not be replaced again: If the replacement patterns are [a->e; e->a], "beat" should become "baet", not "baat" or "beet".
With that in mind, this is the code I came up with:
public class Replacements {
private String[] search;
private String[] replace;
Replacements(String[] s, String[] r)
{
if (s.length!=r.length) throw new IllegalArgumentException();
Map<String,String> map = new HashMap<String,String>();
for (int i=0;i<s.length;i++)
{
map.put(s[i], r[i]);
}
List<String> sortedKeys = new ArrayList(map.keySet());
Collections.sort(sortedKeys, new StringLengthComparator());
this.search = sortedKeys.toArray(new String[0]);
Stack<String> r2 = new Stack<>();
sortedKeys.stream().forEach((i) -> {
r2.push(map.get(i));
});
this.replace = r2.toArray(new String[0]);
}
public String replace(String input)
{
return replace(input,0);
}
private String replace(String input,int i)
{
String out = "";
List<String> parts = Arrays.asList(input.split(this.search[i],-1));
for (Iterator it = parts.iterator(); it.hasNext();)
{
String part = it.next().toString();
if (part.length()>0 && i<this.search.length-1) out += replace(part,i+1);
if (it.hasNext()) out += this.replace[i];
}
return out;
}
}
And then
String[] words;
//fill variable words
String[] s_input = "ou|u|c|ch|ce|ci".split("\\|",-1);
String[] r_input = "u|a|k|c|se|si".split("\\|",-1);
Replacements reps = new Replacements(s_input,r_input);
for (String word : words) {
System.out.println(reps.replace(word));
}
(s_input and r_input would be up to the user, so they're just examples, just like the program wouldn't actually use println())
This code makes sure longer search strings get looked for first and also covers the second condition above.
It is, however, quite costly. What would be the most efficient way to accomplish what I'm doing here (especially if the number of Strings in words is significantly large)?
With my current code, "couch" should be converted into "kuc". (At first it wasn't; it now is, thanks to the -1 in split(p,-1).)
This is not a full solution but it shows how to scan the input and find all target substrings in one pass. You would use a StringBuilder to assemble the result, looking up the replacements in a Map as you are currently doing. Use the start and end indexes to handle copying of non-matching segments.
public static void main(String[] args) throws Exception
{
Pattern p = Pattern.compile("(ou|ch|ce|ci|u|c)");
Matcher m = p.matcher("auouuchcceaecxici");
while (m.find())
{
MatchResult r = m.toMatchResult();
System.out.printf("s=%d e=%d '%s'\n", r.start(), r.end(), r.group());
}
}
Output:
s=1 e=2 'u'
s=2 e=4 'ou'
s=4 e=5 'u'
s=5 e=7 'ch'
s=7 e=8 'c'
s=8 e=10 'ce'
s=12 e=13 'c'
s=15 e=17 'ci'
Note the strings in the regex have to be sorted in order of descending length to work correctly.
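Filling in the rest of that approach, a sketch could look like the following. The class name OnePassReplacer is hypothetical. The keys are sorted by descending length and wrapped in Pattern.quote before being joined into one alternation, the pattern is compiled once and reused for every word, and the unmatched segments between matches are copied with StringBuilder.append(CharSequence, int, int) using m.start() and m.end().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class OnePassReplacer {
    private final Map<String, String> map = new HashMap<>();
    private final Pattern pattern;

    OnePassReplacer(String[] search, String[] replace) {
        if (search.length == 0 || search.length != replace.length) {
            throw new IllegalArgumentException("need the same, non-zero number of search and replace strings");
        }
        for (int i = 0; i < search.length; i++) {
            map.put(search[i], replace[i]);
        }
        // Longest keys first so "ch" wins over "c" at the same position.
        String alternation = map.keySet().stream()
                .sorted((a, b) -> Integer.compare(b.length(), a.length()))
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // Compiled once, reused for every input string.
        this.pattern = Pattern.compile(alternation);
    }

    String replace(String input) {
        Matcher m = pattern.matcher(input);
        StringBuilder sb = new StringBuilder(input.length());
        int last = 0; // end of the previous match
        while (m.find()) {
            sb.append(input, last, m.start()); // copy the unmatched segment
            sb.append(map.get(m.group()));     // replacement text is never scanned again
            last = m.end();
        }
        sb.append(input, last, input.length()); // copy the tail
        return sb.toString();
    }

    public static void main(String[] args) {
        OnePassReplacer reps = new OnePassReplacer(
                "ou|u|c|ch|ce|ci".split("\\|", -1),
                "u|a|k|c|se|si".split("\\|", -1));
        System.out.println(reps.replace("couch"));
        OnePassReplacer swap = new OnePassReplacer(new String[] {"a", "e"}, new String[] {"e", "a"});
        System.out.println(swap.replace("beat"));
    }
}
```

With the question's patterns "couch" becomes "kuc", and with [a->e; e->a] "beat" becomes "baet", because each replacement is appended to the output rather than written back into the input.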
One could make a regex pattern from the keys and leave it to that module for optimization.
Obviously "(ou|u|ch|ce|ci|c)" needs to take care of ce/ci/c, either by reverse sorting or directly as a tree: "(c(e|h|i)?|ou|u)".
Then
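String soughtKeys = "ou|u|ch|ce|ci|c"; // c last
String replacements = "u|a|c|se|si|k";
Map<String, String> map = new HashMap<>();
// fill map: the i-th key maps to the i-th replacement
String[] keys = soughtKeys.split("\\|");
String[] values = replacements.split("\\|");
for (int i = 0; i < keys.length; i++) {
map.put(keys[i], values[i]);
}
Pattern pattern = Pattern.compile("(" + soughtKeys + ")");
for (String word : words) {
StringBuffer sb = new StringBuffer();
Matcher m = pattern.matcher(word);
while (m.find()) {
// quoteReplacement stops '$' or '\' in a replacement from being interpreted
m.appendReplacement(sb, Matcher.quoteReplacement(map.get(m.group())));
}
m.appendTail(sb);
System.out.printf("%s -> %s%n", word, sb.toString());
}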
The advantages are that the regex engine is quite smart (though slow), and that replacements are not applied to already-replaced text.
public class Replacements
{
private String[] search; // reverse lexicographic order, so every key comes before its own prefixes, eg: sch, ch, c
private String[] replace; // corresponding replacement
Replacements(String[] s, String[] r)
{
if (s.length != r.length)
throw new IllegalArgumentException();
final TreeMap<String, String> map = new TreeMap<String, String>(Collections.reverseOrder());
for (int i = 0; i < s.length; i++)
map.put(s[i], r[i]);
this.search = map.keySet().toArray(new String[map.size()]);
this.replace = map.values().toArray(new String[map.size()]);
}
public String replace(String input)
{
final StringBuilder result = new StringBuilder();
// start of yet-to-be-copied substring
int s = 0;
SEARCH:
for (int i = s; i < input.length(); i++)
{
for (int p = 0; p < this.search.length; p++)
{
if (input.regionMatches(i, this.search[p], 0, this.search[p].length()))
{
// append buffer and replacement
result.append(input, s, i).append(this.replace[p]);
// skip beyond current match and reset buffer
i += this.search[p].length();
s = i--;
continue SEARCH;
}
}
}
if (s == 0) // no matches? no changes!
return input;
// append remaining buffer
return result.append(input, s, input.length()).toString();
}
}

How to split string at every nth occurrence of character in Java

I would like to split a string at every 4th occurrence of a comma ,.
How to do this? Below is an example:
String str = "1,,,,,2,3,,1,,3,,";
Expected output:
array[0]: 1,,,,
array[1]: ,2,3,,
array[2]: 1,,3,,
I tried using Google Guava like this:
Iterable<String> splitdata = Splitter.fixedLength(4).split(str);
output: [1,,,, ,,2,, 3,,1, ,,3,, ,]
I also tried this:
String [] splitdata = str.split("(?<=\\G.{" + 4 + "})");
output: [1,,,, ,,2,, 3,,1, ,,3,, ,]
Yet this is not the output I want. I just want to split the string at every 4th occurrence of a comma.
Thanks.
Take two int variables. One counts the commas: each time a ',' occurs it is incremented, and when it reaches 4 it is reset to 0. The other marks where the current piece starts: it begins at 0, and each time a piece is cut off, the position just after the 4th comma (i + 1, since i points at the comma) becomes the start of the next piece. Use this start point and the current end point to take the substring, then add it to the ArrayList. Here is some sample code; I hope it helps.
String str = "1,,,,,2,3,,1,,3,,";
int k = 0;
int startPoint = 0;
ArrayList<String> arrayList = new ArrayList<>();
for (int i = 0; i < str.length(); i++)
{
if (str.charAt(i) == ',')
{
k++;
if (k == 4)
{
String ab = str.substring(startPoint, i+1);
System.out.println(ab);
arrayList.add(ab);
startPoint = i+1;
k = 0;
}
}
}
Here's a more flexible function, using an idea from this answer:
static List<String> splitAtNthOccurrence(String input, int n, String delimiter) {
List<String> pieces = new ArrayList<>();
// *? is the reluctant quantifier; Pattern.quote makes the delimiter literal
String regex = Strings.repeat(".*?" + Pattern.quote(delimiter), n);
Matcher matcher = Pattern.compile(regex).matcher(input);
int lastEndOfMatch = 0;
while (matcher.find()) {
pieces.add(matcher.group());
lastEndOfMatch = matcher.end();
}
// add the leftover after the last full group, if any (also covers "no match at all")
if (lastEndOfMatch < input.length()) {
pieces.add(input.substring(lastEndOfMatch));
}
return pieces;
}
This is how you call it using your example:
String input = "1,,,,,2,3,,1,,3,,";
List<String> pieces = splitAtNthOccurrence(input, 4, ",");
pieces.forEach(System.out::println);
// Output:
// 1,,,,
// ,2,3,,
// 1,,3,,
I use Strings.repeat from Guava.
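On Java 11 or later, String.repeat can stand in for Guava's Strings.repeat, so the same idea works without the extra dependency. This is a sketch under that assumption: Pattern.quote makes a delimiter such as "." or "|" literal, DOTALL lets the pieces span line breaks, and the remainder is appended only when it is non-empty. The class name is illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitAtNth {

    // Splits input after every n-th occurrence of delimiter (Java 11+ for String.repeat).
    static List<String> splitAtNthOccurrence(String input, int n, String delimiter) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        // ".*?" is reluctant, so each match stops right after the n-th delimiter.
        String regex = (".*?" + Pattern.quote(delimiter)).repeat(n);
        Matcher matcher = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        List<String> pieces = new ArrayList<>();
        int lastEnd = 0;
        while (matcher.find()) {
            pieces.add(matcher.group());
            lastEnd = matcher.end();
        }
        if (lastEnd < input.length()) {
            pieces.add(input.substring(lastEnd)); // leftover with fewer than n delimiters
        }
        return pieces;
    }

    public static void main(String[] args) {
        splitAtNthOccurrence("1,,,,,2,3,,1,,3,,", 4, ",").forEach(System.out::println);
    }
}
```

For the question's input this prints 1,,,, then ,2,3,, then 1,,3,, on separate lines.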
You can also try this if you want the result in an array:
String str = "1,,,,,2,3,,1,,3,,";
System.out.println(str);
char c[] = str.toCharArray();
int ptnCnt = 0;
for (char d : c) {
if(d==',')
ptnCnt++;
}
String result[] = new String[ptnCnt/4];
int i=-1;
int beginIndex = 0;
int cnt=0,loopcount=0;
for (char ele : c) {
loopcount++;
if(ele==',')
cnt++;
if(cnt==4){
cnt=0;
result[++i]=str.substring(beginIndex,loopcount);
beginIndex=loopcount;
}
}
for (String string : result) {
System.out.println(string);
}
This works perfectly and is tested in Java 8. It splits the input into two parts at the (at + 1)-th '/':
public String[] split(String input,int at){
String[] out = new String[2];
String p = String.format("((?:[^/]*/){%s}[^/]*)/(.*)",at);
Pattern pat = Pattern.compile(p);
Matcher matcher = pat.matcher(input);
if (matcher.matches()) {
out[0] = matcher.group(1);// left
out[1] = matcher.group(2);// right
}
return out;
}
//Ex: D:/folder1/folder2/folder3/file1.txt
//if at = 2, group(1) = D:/folder1/folder2 and group(2) = folder3/file1.txt
The accepted solution above by Saqib Rezwan does not add the leftover string to the list: any characters after the last complete group of 4 commas are dropped. For example, if a 9-character string has its 4th comma at the 8th character, the 9th character is left out and the returned list is wrong.
A complete solution would be:
private static ArrayList<String> splitStringAtNthOccurrence(String str, int n) {
int k = 0;
int startPoint = 0;
ArrayList<String> list = new ArrayList<>();
for (int i = 0; i < str.length(); i++) {
if (str.charAt(i) == ',') {
k++;
if (k == n) {
String ab = str.substring(startPoint, i + 1);
list.add(ab);
startPoint = i + 1;
k = 0;
}
}
}
// if there are characters left after the last complete group
// (with fewer than n commas, or none at all), add them to the list
if (startPoint < str.length()) {
list.add(str.substring(startPoint));
}
return list;
}

Arranging words to form a palindrome

Hello, I've been trying to form palindromes from this input:
String[] text ={"ivcci", "oyotta", "cecarar","bbb","babbbb"};
getPalindrome(text);
and I need to rearrange all the words in the array to produce this output:
civic
-1
racecar
bbb
bbabb
The method is expected to receive an array of Strings, like:
public static String getPalindrome(String[] text){}
Returning -1 means that, e.g., "oyotta" in the array can't form a palindrome.
I've been testing this code and it works, but e.g. "cecarar" is not producing "racecar". As I'm a bit new to Java, I used a String instead of an array of Strings. Can anybody help me write this code properly, please?
Thanks a lot!
public static String getPalindrome(String s) {
if (s == null)
return null;
Map<Character, Integer> letters = new HashMap<Character, Integer>();
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if (!letters.containsKey(c))
letters.put(c, 1);
else
letters.put(c, letters.get(c) + 1);
}
char[] result = new char[s.length()];
int i = 0, j = result.length - 1;
Character middle = null;
for (Entry<Character, Integer> e : letters.entrySet()) {
int val = e.getValue();
char c = e.getKey();
if (val % 2 != 0) {
if (middle == null && s.length() % 2 != 0) {
middle = c;
val--;
} else
return "-1";
}
for (int k = 0; k < val / 2; k++) {
result[i++] = c;
result[j--] = c;
}
}
if (middle != null)
result[result.length / 2] = middle;
return new String(result);
}
In order for a set of characters to be able to produce a palindrome, at most one of the letters can occur an odd number of times, so you can weed out the other cases first.
Without writing actual code for you, here is the algorithm I would use:
Create a map of characters to a counter. It's possible to use int[] counts = new int[26];
Go through each character in the input string and increment its count: ++counts[Character.toLowerCase(c)-'a'];
Then, with oddIndex starting at -1, go through each count and check whether it is odd: if ((counts[i] & 1) != 0) { if (oddIndex != -1) { return -1; } oddIndex=i; } This returns -1 if there are two or more odd counts.
Then you can create a StringBuilder and start with the character at oddIndex in the middle, if it exists.
Then go through the counts and add counts[i]/2 copies of each character to the front and back of your string builder.
That'll give you a symmetric string from the original inputs.
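The steps above could be sketched in Java as follows. This assumes the input contains only ASCII letters (anything else would fall outside the int[26] table) and returns "-1" as a String to match the question's expected output; the class name is illustrative. Because it builds outward from the middle in alphabetical order, it returns some valid palindrome, not necessarily the dictionary word: for "cecarar" it returns "rcaeacr" rather than "racecar". Note also that "babbbb" has two letters with odd counts (five b's and one a), so this sketch returns -1 for it.

```java
public class PalindromeBuilder {

    // Returns a palindrome that uses all letters of s, or "-1" if none exists.
    // Assumes s contains only ASCII letters; they are lowercased first.
    static String getPalindrome(String s) {
        int[] counts = new int[26];
        for (char c : s.toCharArray()) {
            ++counts[Character.toLowerCase(c) - 'a'];
        }
        int oddIndex = -1;
        for (int i = 0; i < counts.length; i++) {
            if ((counts[i] & 1) != 0) {      // parentheses needed: != binds tighter than &
                if (oddIndex != -1) {
                    return "-1";             // two or more odd counts: impossible
                }
                oddIndex = i;
            }
        }
        StringBuilder sb = new StringBuilder();
        if (oddIndex != -1) {
            sb.append((char) ('a' + oddIndex)); // the single odd letter goes in the middle
        }
        for (int i = 0; i < counts.length; i++) {
            char c = (char) ('a' + i);
            for (int k = 0; k < counts[i] / 2; k++) {
                sb.insert(0, c).append(c);   // one copy at the front, one at the back
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] text = {"ivcci", "oyotta", "cecarar", "bbb", "babbbb"};
        for (String word : text) {
            System.out.println(getPalindrome(word));
        }
    }
}
```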
Now, if you actually need words, then you'll have to have a dictionary of palindromes. You can actually preprocess all the palindromes to have a map of "sorted character string"=>"palindrome"
class PalindromeChecker
{
final Map<String, String> palindromes = new HashMap<String, String>();
public PalindromeChecker(Iterable<String> allPalindromes) {
for (String palindrome: allPalindromes) {
char[] chars = palindrome.toCharArray();
Arrays.sort(chars);
palindromes.put(String.valueOf(chars), palindrome);
}
}
public String getPalindrome(String input) {
char[] chars = input.toCharArray();
Arrays.sort(chars);
return palindromes.get(String.valueOf(chars));
}
}
As other users pointed out, a string can be rearranged as a palindrome only if there is at most one character that appears an odd number of times.
Once you have confirmed that a string can be converted to a palindrome, you can construct the palindrome as follows (this is just one of many methods of course):
place at the sides of the string all the pairs of characters that you can get
place at the middle of the string the single character that is left out, in case there is such a character.
Example:
public class Palindromes {
public static void main(String[] args) {
String[] text = {"ivcci", "oyotta", "cecarar","bbb","babbbb"};
for(String str : text){
evaluatePalindrome(str);
}
}
private static void evaluatePalindrome(String str){
PalindromeCandidate pc = new PalindromeCandidate(str);
if(pc.isPalindrome()){
System.out.println(pc.getPalindrome());
} else {
System.out.println("-1");
}
}
}
public class PalindromeCandidate {
private final CharacterCount characterCount;
public PalindromeCandidate(String originalString) {
this.characterCount = new CharacterCount(originalString);
}
public boolean isPalindrome(){
Collection<Integer> counts = characterCount.asMap().values();
int oddCountOccurrences = 0;
for(Integer count : counts){
oddCountOccurrences += (count%2);
}
return (oddCountOccurrences <= 1);
}
public String getPalindrome(){
if(!isPalindrome()){
throw new RuntimeException("Cannot be rearranged as a palindrome.");
}
Map<Character, Integer> counts = characterCount.asMap();
StringBuilder leftSide = new StringBuilder();
StringBuilder middle = new StringBuilder();
for(Character ch : counts.keySet()){
int occurrences = counts.get(ch);
while(occurrences > 1){
leftSide.append(ch);
occurrences -= 2;
}
if(occurrences > 0){
middle.append(ch);
}
}
StringBuilder rightSide = new StringBuilder(leftSide).reverse();
return leftSide.append(middle).append(rightSide).toString();
}
}
/**
 * Thin wrapper around a Map<Character, Integer>. Used for counting occurrences
* of characters.
*/
public class CharacterCount {
private final Map<Character, Integer> map;
public CharacterCount(String str) {
this.map = new HashMap<>();
for(Character ch : str.toCharArray()){
increment(ch);
}
}
private void increment(Character ch){
this.map.put(ch, getCount(ch) + 1);
}
private Integer getCount(Character ch){
if(map.containsKey(ch)){
return map.get(ch);
} else {
return 0;
}
}
public Map<Character, Integer> asMap(){
return new HashMap<>(map);
}
}

Finding repeated words on a string and counting the repetitions

I need to find repeated words in a string, and then count how many times they were repeated. So basically, if the input string is this:
String s = "House, House, House, Dog, Dog, Dog, Dog";
I need to create a new string list without repetitions and save somewhere else the amount of repetitions for each word, like such:
New String: "House, Dog"
New Int Array: [3, 4]
Is there a way to do this easily with Java? I've managed to separate the string using s.split() but then how do I count repetitions and eliminate them on the new string? Thanks!
You've got the hard work done. Now you can just use a Map to count the occurrences:
Map<String, Integer> occurrences = new HashMap<String, Integer>();
for ( String word : splitWords ) {
Integer oldCount = occurrences.get(word);
if ( oldCount == null ) {
oldCount = 0;
}
occurrences.put(word, oldCount + 1);
}
Using map.get(word) will tell you how many times a word occurred. You can construct a new list by iterating through map.keySet():
for ( String word : occurrences.keySet() ) {
//do something with word
}
Note that the order of what you get out of keySet is arbitrary. If you need the words to be sorted by when they first appear in your input String, you should use a LinkedHashMap instead.
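Putting this together for the input in the question, a minimal sketch might look like the following, assuming the words are separated by ", " as in the example. The LinkedHashMap keeps first-appearance order, String.join builds the new string, and the counts are copied into an int array; Map.merge replaces the null check for brevity, and the class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class RepeatedWords {

    // Counts words while remembering the order in which each word first appeared.
    static Map<String, Integer> countInOrder(String[] splitWords) {
        Map<String, Integer> occurrences = new LinkedHashMap<>();
        for (String word : splitWords) {
            occurrences.merge(word, 1, Integer::sum); // start at 1, or add 1 to the old count
        }
        return occurrences;
    }

    public static void main(String[] args) {
        String s = "House, House, House, Dog, Dog, Dog, Dog";
        Map<String, Integer> occurrences = countInOrder(s.split(", "));

        String newString = String.join(", ", occurrences.keySet());
        int[] counts = occurrences.values().stream().mapToInt(Integer::intValue).toArray();

        System.out.println(newString);               // House, Dog
        System.out.println(Arrays.toString(counts)); // [3, 4]
    }
}
```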
Try this:
public class DuplicateWordSearcher {
@SuppressWarnings("unchecked")
public static void main(String[] args) {
String text = "a r b k c d se f g a d f s s f d s ft gh f ws w f v x s g h d h j j k f sd j e wed a d f";
List<String> list = Arrays.asList(text.split(" "));
Set<String> uniqueWords = new HashSet<String>(list);
for (String word : uniqueWords) {
System.out.println(word + ": " + Collections.frequency(list, word));
}
}
}
public class StringsCount{
public static void main(String args[]) {
String value = "This is testing Program testing Program";
String item[] = value.split(" ");
HashMap<String, Integer> map = new HashMap<>();
for (String t : item) {
if (map.containsKey(t)) {
map.put(t, map.get(t) + 1);
} else {
map.put(t, 1);
}
}
Set<String> keys = map.keySet();
for (String key : keys) {
System.out.println(key);
System.out.println(map.get(key));
}
}
}
As mentioned by others, use String::split(), followed by a map (HashMap or LinkedHashMap), and then merge your result. For completeness' sake, here is the code.
import java.util.*;
public class Genric<E>
{
public static void main(String[] args)
{
Map<String, Integer> unique = new LinkedHashMap<String, Integer>();
for (String string : "House, House, House, Dog, Dog, Dog, Dog".split(", ")) {
if(unique.get(string) == null)
unique.put(string, 1);
else
unique.put(string, unique.get(string) + 1);
}
String uniqueString = join(unique.keySet(), ", ");
List<Integer> value = new ArrayList<Integer>(unique.values());
System.out.println("Output = " + uniqueString);
System.out.println("Values = " + value);
}
public static String join(Collection<String> s, String delimiter) {
StringBuffer buffer = new StringBuffer();
Iterator<String> iter = s.iterator();
while (iter.hasNext()) {
buffer.append(iter.next());
if (iter.hasNext()) {
buffer.append(delimiter);
}
}
return buffer.toString();
}
}
The new String is Output = House, Dog
The int array (or rather list) is Values = [3, 4] (you can use List::toArray to get an array).
Using Java 8:
private static void findWords(String s, List<String> output, List<Integer> count){
String[] words = s.split(", ");
Map<String, Integer> map = new LinkedHashMap<>();
Arrays.stream(words).forEach(e->map.put(e, map.getOrDefault(e, 0) + 1));
map.forEach((k,v)->{
output.add(k);
count.add(v);
});
}
Also, use a LinkedHashMap if you want to preserve the order of insertion
private static void findWords(){
String s = "House, House, House, Dog, Dog, Dog, Dog";
List<String> output = new ArrayList<>();
List<Integer> count = new ArrayList<>();
findWords(s, output, count);
System.out.println(output);
System.out.println(count);
}
Output
[House, Dog]
[3, 4]
If this is a homework, then all I can say is: use String.split() and HashMap<String,Integer>.
(I see you've found split() already. You're along the right lines then.)
It may help you somehow.
String st="I am am not the one who is thinking I one thing at time";
String []ar = st.split("\\s");
Map<String, Integer> mp= new HashMap<String, Integer>();
int count=0;
for(int i=0;i<ar.length;i++){
count=0;
for(int j=0;j<ar.length;j++){
if(ar[i].equals(ar[j])){
count++;
}
}
mp.put(ar[i], count);
}
System.out.println(mp);
Once you have got the words from the string it is easy.
From Java 10 onwards you can try the following code:
import java.util.Arrays;
import java.util.stream.Collectors;
public class StringFrequencyMap {
public static void main(String... args) {
String[] wordArray = {"House", "House", "House", "Dog", "Dog", "Dog", "Dog"};
var freq = Arrays.stream(wordArray)
.collect(Collectors.groupingBy(x -> x, Collectors.counting()));
System.out.println(freq);
}
}
Output:
{House=3, Dog=4}
You can use a prefix tree (trie) data structure to store the words and keep track of each word's count in its prefix tree node.
#include <stdexcept>
#include <string>
using namespace std;
#define ALPHABET_SIZE 26
// Structure of each node of prefix tree
struct prefix_tree_node {
// child() value-initializes the array, so every child pointer starts out as nullptr
prefix_tree_node() : count(0), child() {}
int count;
prefix_tree_node *child[ALPHABET_SIZE];
};
// Root of the prefix tree (used below but not declared in the original snippet)
prefix_tree_node *root = new prefix_tree_node();
void insert_string_in_prefix_tree(const string &word)
{
prefix_tree_node *current = root;
for(unsigned int i=0;i<word.size();++i){
// Assuming it has only alphabetic lowercase characters
// Note ::::: Change this check or convert into lower case
const unsigned int letter = static_cast<unsigned int>(word[i] - 'a');
// Invalid alphabetic character (anything below 'a' wraps around to a large value): throw
// Note :::: Change this condition depending on the scenario
if(letter >= ALPHABET_SIZE)
throw runtime_error("Invalid alphabetic character");
if(current->child[letter] == nullptr)
current->child[letter] = new prefix_tree_node();
current = current->child[letter];
}
current->count++;
// Insert this string into Max Heap and sort them by counts
}
// Data structure for storing in Heap will be something like this
struct MaxHeapNode {
int count;
string word;
};
After inserting all the words, print each word and its count by iterating over the max-heap.
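Since the question is about Java, the same idea might be translated roughly as follows, keeping the assumption of lowercase a-z words. WordTrie, Entry and byCountDescending are illustrative names; java.util.PriorityQueue with a count-descending comparator plays the role of the max-heap, and it is filled by a depth-first walk over the trie after all words are inserted rather than on every insert.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class WordTrie {
    private static final int ALPHABET_SIZE = 26;

    private static final class Node {
        int count;                                    // times the word ending here was inserted
        final Node[] child = new Node[ALPHABET_SIZE]; // all null initially
    }

    private static final class Entry {
        final String word;
        final int count;
        Entry(String word, int count) { this.word = word; this.count = count; }
    }

    private final Node root = new Node();

    void insert(String word) {
        Node current = root;
        for (int i = 0; i < word.length(); i++) {
            int letter = word.charAt(i) - 'a';
            if (letter < 0 || letter >= ALPHABET_SIZE) {
                throw new IllegalArgumentException("Invalid alphabetic character in " + word);
            }
            if (current.child[letter] == null) {
                current.child[letter] = new Node();
            }
            current = current.child[letter];
        }
        current.count++;
    }

    // Returns "word=count" strings, highest count first.
    List<String> byCountDescending() {
        PriorityQueue<Entry> heap = new PriorityQueue<>((a, b) -> Integer.compare(b.count, a.count));
        collect(root, new StringBuilder(), heap);
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            Entry e = heap.poll();
            result.add(e.word + "=" + e.count);
        }
        return result;
    }

    // Depth-first walk that rebuilds each word from the path of letters.
    private void collect(Node node, StringBuilder prefix, PriorityQueue<Entry> heap) {
        if (node.count > 0) {
            heap.add(new Entry(prefix.toString(), node.count));
        }
        for (int i = 0; i < ALPHABET_SIZE; i++) {
            if (node.child[i] != null) {
                prefix.append((char) ('a' + i));
                collect(node.child[i], prefix, heap);
                prefix.deleteCharAt(prefix.length() - 1);
            }
        }
    }

    public static void main(String[] args) {
        WordTrie trie = new WordTrie();
        for (String w : "house house house dog dog dog dog".split(" ")) {
            trie.insert(w);
        }
        System.out.println(trie.byCountDescending()); // [dog=4, house=3]
    }
}
```

Words with equal counts come out of the heap in no particular order.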
//program to find number of repeating characters in a string
//Developed by Subash<subash_senapati#ymail.com>
import java.util.Scanner;
public class NoOfRepeatedChar
{
public static void main(String []args)
{
//input through key board
Scanner sc = new Scanner(System.in);
System.out.println("Enter a string :");
String s1= sc.nextLine();
//remove spaces, then convert the String to a char array
String s2=s1.replace(" ","");
char [] ch=s2.toCharArray();
int counter=0;
//for-loop to compare each character with the whole character array
for(int i=0;i<ch.length;i++)
{
int count=0;
for(int j=0;j<ch.length;j++)
{
if(ch[i]==ch[j])
count++; //if character is matching with others
}
if(count>1)
{
boolean flag=false;
//for-loop to check whether the character has already been referenced
for (int k=i-1;k>=0 ;k-- )
{
if(ch[i] == ch[k]) //if the character has already been referenced
flag=true;
}
if( !flag ) //if(flag==false)
counter=counter+1;
}
}
if(counter > 0) //if there is/are any repeating characters
System.out.println("Number of repeating characters in the given string: " + counter);
else
System.out.println("Sorry, there are no repeating characters in the given string");
}
}
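The program above compares every character with every other character. It then scans backwards to avoid counting a character twice, so the whole approach is O(n²). Below is a one-pass sketch that computes the same number: how many distinct characters occur more than once, with spaces ignored. It counts each character in a `Map<Character, Integer>`, then counts the entries whose value is greater than 1. `RepeatedChars` is an illustrative name.

```java
import java.util.HashMap;
import java.util.Map;

class RepeatedChars {
    // Returns how many distinct non-space characters appear more than once.
    static int countRepeatedChars(String s) {
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : s.toCharArray()) {
            if (c != ' ') {                        // spaces are ignored, as in the program above
                freq.merge(c, 1, Integer::sum);
            }
        }
        int repeated = 0;
        for (int n : freq.values()) {
            if (n > 1) {
                repeated++;
            }
        }
        return repeated;
    }

    public static void main(String[] args) {
        System.out.println(countRepeatedChars("hello world")); // 2 ('l' and 'o')
    }
}
```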
public static void main(String[] args) {
String s="sdf sdfsdfsd sdfsdfsd sdfsdfsd sdf sdf sdf ";
String st[]=s.split(" ");
System.out.println(st.length);
Map<String, Integer> mp= new TreeMap<String, Integer>();
for(int i=0;i<st.length;i++){
Integer count=mp.get(st[i]);
if(count == null){
count=0;
}
mp.put(st[i],++count);
}
System.out.println(mp.size());
System.out.println(mp.get("sdfsdfsd"));
}
If you pass in a String, this method counts the repetitions of each word:
/**
* @param str the text to analyse
* @return a map containing each word as the key and its number of repetitions as the value
*/
public Map<String, Integer> findDuplicateString(String str) {
String[] stringArrays = str.split(" ");
Map<String, Integer> map = new HashMap<String, Integer>();
Set<String> words = new HashSet<String>(Arrays.asList(stringArrays));
int count = 0;
for (String word : words) {
for (String temp : stringArrays) {
if (word.equals(temp)) {
++count;
}
}
map.put(word, count);
count = 0;
}
return map;
}
Output (illustrative):
word1=2, word2=4, word3=1, ...
import java.util.HashMap;
import java.util.LinkedHashMap;
public class CountRepeatedWords {
public static void main(String[] args) {
countRepeatedWords("Note that the order of what you get out of keySet is arbitrary. If you need the words to be sorted by when they first appear in your input String, you should use a LinkedHashMap instead.");
}
public static void countRepeatedWords(String wordToFind) {
String[] words = wordToFind.split(" ");
HashMap<String, Integer> wordMap = new LinkedHashMap<String, Integer>();
for (String word : words) {
wordMap.put(word,
(wordMap.get(word) == null ? 1 : (wordMap.get(word) + 1)));
}
System.out.println(wordMap);
}
}
I hope this helps:
public void countInPara(String str) {
Map<Integer,String> strMap = new HashMap<Integer,String>();
List<String> paraWords = Arrays.asList(str.split(" "));
Set<String> strSet = new LinkedHashSet<>(paraWords);
int count;
for(String word : strSet) {
count = Collections.frequency(paraWords, word);
strMap.put(count, strMap.get(count)==null ? word : strMap.get(count).concat(","+word));
}
for(Map.Entry<Integer,String> entry : strMap.entrySet())
System.out.println(entry.getKey() +" :: "+ entry.getValue());
}
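The method above builds a map from each count to a comma-separated string of words. Below is a sketch of the same grouping that returns a list per count instead of a joined string. It assumes space-separated input. It first counts each word in a `LinkedHashMap`, which keeps first-appearance order, and then inverts that map into a `TreeMap` with `computeIfAbsent`, so the counts come out in ascending order. `WordsByCount` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordsByCount {
    // Maps each occurrence count to the words that occur exactly that many times.
    static Map<Integer, List<String>> group(String str) {
        // 1. count each word, keeping the order in which words first appear
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : str.split(" ")) {
            counts.merge(word, 1, Integer::sum);
        }
        // 2. invert the map: count -> words; TreeMap keeps the counts sorted ascending
        Map<Integer, List<String>> byCount = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            byCount.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return byCount;
    }

    public static void main(String[] args) {
        group("this is what it is this is what it can be")
                .forEach((count, words) -> System.out.println(count + " :: " + words));
    }
}
```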
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
public class DuplicateWord {
public static void main(String[] args) {
String para = "this is what it is this is what it can be";
List<String> paraList = Arrays.asList(para.split(" "));
System.out.println(paraList);
int size = paraList.size();
int i = 0;
Map<String, Integer> duplicatCountMap = new HashMap<String, Integer>();
for (int j = 0; size > j; j++) {
int count = 0;
for (i = 0; size > i; i++) {
if (paraList.get(j).equals(paraList.get(i))) {
count++;
duplicatCountMap.put(paraList.get(j), count);
}
}
}
System.out.println(duplicatCountMap);
List<Integer> myCountList = new ArrayList<>();
Set<String> myValueSet = new HashSet<>();
for (Map.Entry<String, Integer> entry: duplicatCountMap.entrySet()) {
myCountList.add(entry.getValue());
myValueSet.add(entry.getKey());
}
System.out.println(myCountList);
System.out.println(myValueSet);
}
}
Input: this is what it is this is what it can be
Output:
[this, is, what, it, is, this, is, what, it, can, be]
{can=1, what=2, be=1, this=2, is=3, it=2}
[1, 2, 1, 2, 3, 2]
[can, what, be, this, is, it]
import java.util.HashMap;
import java.util.Scanner;
public class class1 {
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
String inpStr = in.nextLine();
int key;
HashMap<String,Integer> hm = new HashMap<String,Integer>();
String[] strArr = inpStr.split(" ");
for(int i=0;i<strArr.length;i++){
if(hm.containsKey(strArr[i])){
key = hm.get(strArr[i]);
hm.put(strArr[i],key+1);
}
else{
hm.put(strArr[i],1);
}
}
System.out.println(hm);
}
}
Please use the code below, which finds the most repeated word in a sentence. In my analysis it is the simplest approach; I hope you like it:
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;
public class MostRepeatingWord {
String mostRepeatedWord(String s){
String[] splitted = s.split(" ");
List<String> listString = Arrays.asList(splitted);
Set<String> setString = new HashSet<String>(listString);
int count = 0;
int maxCount = 1;
String maxRepeated = null;
for(String inp: setString){
count = Collections.frequency(listString, inp);
if(count > maxCount){
maxCount = count;
maxRepeated = inp;
}
}
return maxRepeated;
}
public static void main(String[] args)
{
System.out.println("Enter The Sentence: ");
Scanner s = new Scanner(System.in);
String input = s.nextLine();
MostRepeatingWord mrw = new MostRepeatingWord();
System.out.println("Most repeated word is: " + mrw.mostRepeatedWord(input));
}
}
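`Collections.frequency` rescans the whole list for every distinct word. Below is a sketch that counts every word once with a map and then picks the entry with the largest count. Like the code above, it returns `null` when no word occurs more than once. Because it iterates a `LinkedHashMap`, a tie goes to the word that appears first, whereas the `HashSet` version above breaks ties in an unspecified order. `MostRepeated` is an illustrative name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MostRepeated {
    // Returns the word with the highest count, or null if no word repeats.
    // On a tie, the word that appears first in the sentence wins.
    static String mostRepeatedWord(String s) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : s.split(" ")) {
            counts.merge(word, 1, Integer::sum);
        }
        String maxRepeated = null;
        int maxCount = 1; // same threshold as above: a word must occur at least twice
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > maxCount) {
                maxCount = e.getValue();
                maxRepeated = e.getKey();
            }
        }
        return maxRepeated;
    }

    public static void main(String[] args) {
        System.out.println(mostRepeatedWord("to be or not to be or to")); // to
    }
}
```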
package day2;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
public class DuplicateWords {
public static void main(String[] args) {
String S1 = "House, House, House, Dog, Dog, Dog, Dog";
String S2 = S1.toLowerCase();
String[] S3 = S2.split("[,\\s]+"); // split on commas and whitespace so "house," and "house" are the same word
List<String> a1 = new ArrayList<String>();
HashMap<String, Integer> hm = new HashMap<>();
// visit every word, including the last one
for (int i = 0; i < S3.length; i++) {
if(!a1.contains(S3[i]))
{
a1.add(S3[i]);
}
else
{
continue;
}
int Count = 0;
for (int j = 0; j < S3.length; j++)
{
if(S3[j].equals(S3[i]))
{
Count++;
}
}
hm.put(S3[i], Count);
}
System.out.println("Duplicate Words and their number of occurrences in String S1 : " + hm);
}
}
import java.util.ArrayList;
import java.util.HashMap;
public class Counter {
private static final int COMMA_AND_SPACE_PLACE = 2;
private String mTextToCount;
private ArrayList<String> mSeparateWordsList;
public Counter(String mTextToCount) {
this.mTextToCount = mTextToCount;
mSeparateWordsList = cutStringIntoSeparateWords(mTextToCount);
}
private ArrayList<String> cutStringIntoSeparateWords(String text)
{
ArrayList<String> returnedArrayList = new ArrayList<>();
if(text.indexOf(',') == -1)
{
returnedArrayList.add(text);
return returnedArrayList;
}
int position1 = 0;
int position2 = 0;
while(position2 < text.length())
{
char c = ',';
if(text.charAt(position2) == c) // charAt avoids copying the whole string on every iteration
{
String tmp = text.substring(position1, position2);
position1 += tmp.length() + COMMA_AND_SPACE_PLACE;
returnedArrayList.add(tmp);
}
position2++;
}
if(position1 < position2)
{
returnedArrayList.add(text.substring(position1, position2));
}
return returnedArrayList;
}
public int[] countWords()
{
if(mSeparateWordsList == null) return null;
HashMap<String, Integer> wordsMap = new HashMap<>();
for(String s: mSeparateWordsList)
{
int cnt;
if(wordsMap.containsKey(s))
{
cnt = wordsMap.get(s);
cnt++;
} else {
cnt = 1;
}
wordsMap.put(s, cnt);
}
return printCounterResults(wordsMap);
}
private int[] printCounterResults(HashMap<String, Integer> m)
{
int index = 0;
int[] returnedIntArray = new int[m.size()];
for(int i: m.values())
{
returnedIntArray[index] = i;
index++;
}
return returnedIntArray;
}
}
/* Count the occurrences of each word in a String using a TreeMap. A HashMap would also work, but the words would not be displayed in sorted order. */
import java.util.*;
public class Genric3
{
public static void main(String[] args)
{
Map<String, Integer> unique = new TreeMap<String, Integer>();
String string1="Ram:Ram: Dog: Dog: Dog: Dog:leela:leela:house:house:shayam";
String string2[]=string1.split("\\s*:\\s*"); // also strips spaces around each colon, so " Dog" becomes "Dog"
for (int i=0; i<string2.length; i++)
{
String string=string2[i];
unique.put(string,(unique.get(string) == null?1:(unique.get(string)+1)));
}
System.out.println(unique);
}
}
//program to count the occurrences of each word in a string
//Developed by Rahul Lakhmara
import java.util.*;
public class CountWordsInString {
public static void main(String[] args) {
String original = "I am rahul am i sunil so i can say am i";
// split the sentence into an array of words
String[] originalSplit = original.split(" ");
// every word occurs at least once
int count = 1;
// LinkedHashMap stores each word as the key and its number of occurrences
// as the value, in order of first appearance
Map<String, Integer> wordMap = new LinkedHashMap<String, Integer>();
// visit every word, including the last one (it may occur only once)
for (int i = 0; i < originalSplit.length; i++) {
for (int j = i + 1; j < originalSplit.length; j++) {
if (originalSplit[i].equals(originalSplit[j])) {
// count how many more times the word occurs later in the sentence
count++;
}
}
// if the word is already in the map, it was counted at its first occurrence, so skip it
if (wordMap.containsKey(originalSplit[i])) {
count = 1;
} else {
wordMap.put(originalSplit[i], count);
count = 1;
}
}
Set<Map.Entry<String, Integer>> word = wordMap.entrySet();
Iterator<Map.Entry<String, Integer>> itr = word.iterator();
while (itr.hasNext()) {
Map.Entry<String, Integer> map = itr.next();
// Printing
System.out.println(map.getKey() + " " + map.getValue());
}
}
}
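public static void main(String[] args){
String string = "elamparuthi, elam, elamparuthi";
String[] s = string.replace(" ", "").split(",");
// track whole words already printed; a substring check such as ops.contains(s[i])
// would wrongly skip "elam" because it is part of "elamparuthi"
java.util.Set<String> seen = new java.util.HashSet<>();
String ops = "";
for(int i=0; i<=s.length-1; i++){
if(seen.add(s[i])){
// check for emptiness with isEmpty(), not a reference comparison with !=
if(!ops.isEmpty())ops+=", ";
ops+=s[i];
}
}
System.out.println(ops);
}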
For strings without spaces, you can use the code below:
private static void findRecurrence(String input) {
final Map<String, Integer> map = new LinkedHashMap<>();
for(int i=0; i<input.length(); ) {
int pointer = i;
int startPointer = i;
boolean pointerHasIncreased = false;
for(int j=0; j<startPointer; j++){
if(pointer<input.length() && input.charAt(j)==input.charAt(pointer) && input.charAt(j)!=32){
pointer++;
pointerHasIncreased = true;
}else{
if(pointerHasIncreased){
break;
}
}
}
if(pointer - startPointer >= 2) {
String word = input.substring(startPointer, pointer);
if(map.containsKey(word)){
map.put(word, map.get(word)+1);
}else{
map.put(word, 1);
}
i=pointer;
}else{
i++;
}
}
for(Map.Entry<String, Integer> entry : map.entrySet()){
System.out.println(entry.getKey() + " = " + (entry.getValue()+1));
}
}
Passing inputs such as "hahaha", "ba na na" or "xxxyyyzzzxxxzzz" gives the desired output.
Hope this helps:
public static int countOfStringInAText(String stringToBeSearched, String masterString){
int count = 0;
while (masterString.indexOf(stringToBeSearched)>=0){
count = count + 1;
masterString = masterString.substring(masterString.indexOf(stringToBeSearched)+1);
}
return count;
}
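The loop above restarts the search one character after the start of each match, so it counts overlapping occurrences: "aa" is found 3 times in "aaaa". If you want non-overlapping matches instead, resume the search after the whole match. Below is a sketch of both variants that uses `String.indexOf(String, int)` instead of creating a new substring on every iteration. The method names are illustrative. Both reject an empty search string, which has no meaningful count.

```java
class OccurrenceCount {
    private static void requireNonEmpty(String needle) {
        if (needle.isEmpty()) {
            throw new IllegalArgumentException("search string must not be empty");
        }
    }

    // Counts matches that may overlap: "aa" occurs 3 times in "aaaa".
    static int countOverlapping(String needle, String haystack) {
        requireNonEmpty(needle);
        int count = 0;
        int pos = haystack.indexOf(needle);
        while (pos >= 0) {
            count++;
            pos = haystack.indexOf(needle, pos + 1);               // resume one character later
        }
        return count;
    }

    // Counts non-overlapping matches: "aa" occurs 2 times in "aaaa".
    static int countNonOverlapping(String needle, String haystack) {
        requireNonEmpty(needle);
        int count = 0;
        int pos = haystack.indexOf(needle);
        while (pos >= 0) {
            count++;
            pos = haystack.indexOf(needle, pos + needle.length()); // resume after the match
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOverlapping("aa", "aaaa"));    // 3
        System.out.println(countNonOverlapping("aa", "aaaa")); // 2
    }
}
```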
package string;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
public class DublicatewordinanArray {
public static void main(String[] args) {
String str = "This is Dileep Dileep Kumar Verma Verma";
DuplicateString(str);
}
public static void DuplicateString(String str) {
String word[] = str.split(" ");
Map<String, Integer> map = new HashMap<String, Integer>();
for (String w: word)
if (!map.containsKey(w)) {
map.put(w, 1);
}
else {
map.put(w, map.get(w) + 1);
}
Set<Map.Entry<String, Integer>> entrySet = map.entrySet();
for (Map.Entry<String, Integer> entry: entrySet)
if (entry.getValue() > 1) {
System.out.printf("%s : %d %n", entry.getKey(), entry.getValue());
}
}
}
Using Java 8 stream collectors:
public static Map<String, Integer> countRepetitions(String str) {
return Arrays.stream(str.split(", "))
.collect(Collectors.toMap(s -> s, s -> 1, (a, b) -> a + 1));
}
Input: "House, House, House, Dog, Dog, Dog, Dog, Cat"
Output: {Cat=1, House=3, Dog=4}
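All of these approaches count every word. If, as in the question, you only need counts for a fixed set of search words, you can still do it in a single pass over the text. Split the text into words and increment a counter only when the word is one you are looking for. This sketch assumes that words are separated by runs of non-word characters (`\\W+`) and that matching is exact and case-sensitive. `SearchWordCounter` is an illustrative name. Unlike the `indexOf` approach, it matches whole words only, so "one" is not counted inside "someone".

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SearchWordCounter {
    // Counts whole-word occurrences of each search word in a single pass over the text.
    static Map<String, Integer> countWords(String fullString, String[] searchWords) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String w : searchWords) {
            counts.put(w, 0);                              // every search word appears in the result
        }
        for (String token : fullString.split("\\W+")) {
            // computeIfPresent only updates keys that are search words
            counts.computeIfPresent(token, (k, v) -> v + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        String fullString = "one is a good one. two is ok. three is three. four is four. five is not four";
        String[] severalStringArray = {"one", "two", "three", "four"};
        System.out.println(countWords(fullString, severalStringArray));
        // {one=2, two=1, three=2, four=3}
    }
}
```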
Please try this; it may help you.
public static void main(String[] args) {
String str1="House, House, House, Dog, Dog, Dog, Dog";
String str2=str1.replace(",", "");
Map<String,Integer> map=findFrquenciesInString(str2);
Set<String> keys=map.keySet();
Collection<Integer> vals=map.values();
System.out.println(keys);
System.out.println(vals);
}
private static Map<String,Integer> findFrquenciesInString(String str1) {
String[] strArr=str1.split(" ");
Map<String,Integer> map=new HashMap<>();
for(int i=0;i<strArr.length;i++) {
int count=1;
for(int j=i+1;j<strArr.length;j++) {
if(strArr[i].equals(strArr[j]) && !"-1".equals(strArr[i])) { // compare contents with equals(), not references with !=
strArr[j]="-1";
count++;
}
}
if(count>1 && !"-1".equals(strArr[i])) {
map.put(strArr[i], count);
strArr[i]="-1";
}
}
return map;
}
As the introduction of streams has changed the way we code, I would like to add a few ways of doing this with them:
String[] strArray = str.split(" ");
//1. All string value with their occurrences
Map<String, Long> counterMap =
Arrays.stream(strArray).collect(Collectors.groupingBy(e->e, Collectors.counting()));
//2. only duplicating Strings
Map<String, Long> temp = counterMap.entrySet().stream()
.filter(entry -> entry.getValue() > 1)
.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
System.out.println("test : "+temp);
//3. List of Duplicating Strings
List<String> masterStrings = Arrays.asList(strArray);
Set<String> duplicatingStrings =
masterStrings.stream().filter(i -> Collections.frequency(masterStrings, i) > 1).collect(Collectors.toSet());
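Option 3 calls `Collections.frequency` once for every element, which makes it quadratic. A one-pass sketch relies on the fact that `Set.add` returns `false` when the element is already present. Any word for which `add` fails is therefore a duplicate. Collecting those words into a `LinkedHashSet` keeps each one once, in the order of its second occurrence. `DuplicateFinder` is an illustrative name.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

class DuplicateFinder {
    // Returns every word that occurs more than once, ordered by its second occurrence.
    static Set<String> duplicates(String[] words) {
        Set<String> seen = new HashSet<>();
        Set<String> dups = new LinkedHashSet<>();
        for (String w : words) {
            if (!seen.add(w)) {   // add() returns false when w is already in the set
                dups.add(w);
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        String str = "this is what it is this is what it can be";
        System.out.println(duplicates(str.split(" "))); // [is, this, what, it]
    }
}
```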
Use Function.identity() inside Collectors.groupingBy and store everything in a Map.
String a = "Gini Gina Gina Gina Gina Protijayi Protijayi ";
Map<String, Long> map11 = Arrays.stream(a.split(" ")).collect(Collectors
.groupingBy(Function.identity(),Collectors.counting()));
System.out.println(map11);
// output => {Gina=4, Gini=1, Protijayi=2}
In Python we can use collections.Counter():
import collections
a = "Roopa Roopi loves green color Roopa Roopi"
words = a.split()
wordsCount = collections.Counter(words)
for word,count in sorted(wordsCount.items()):
print('"%s" is repeated %d time%s.' % (word,count,"s" if count > 1 else "" ))
Output :
"Roopa" is repeated 2 times.
"Roopi" is repeated 2 times.
"color" is repeated 1 time.
"green" is repeated 1 time.
"loves" is repeated 1 time.

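For comparison, below is a Java sketch that produces the same report as the Python code above. A `TreeMap` keeps the words in natural `String` order, where uppercase letters sort before lowercase ones, just as Python's `sorted` does here. The count decides between "time" and "times". `CounterReport` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CounterReport {
    // Builds one report line per distinct word, sorted by word.
    static List<String> report(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.trim().split("\\s+")) {
            counts.merge(word, 1, Integer::sum);
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int count = e.getValue();
            lines.add(String.format("\"%s\" is repeated %d time%s.",
                    e.getKey(), count, count > 1 ? "s" : ""));
        }
        return lines;
    }

    public static void main(String[] args) {
        report("Roopa Roopi loves green color Roopa Roopi").forEach(System.out::println);
    }
}
```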