How to find greatest frequency of substring in string with no spaces - java

Let's suppose the string is like "aababcabcdabcdeabcdefabcdefg". How can I find the frequency of the largest (of all possible) substrings?
PS: there are no spaces in the string.

String stArray[] = s.split("\\s+");
int large =0;
String largeSt="";
for(int i=0;i<stArray.length;i++)
{
if(stArray[i].length() >large)
{
largeSt=stArray[i];
large = stArray[i].length();
}
}
System.out.println("very large :"+largeSt);
If you want to find the frequency, count how many entries in the array match it.
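Because the input has no spaces, split("\\s+") returns the whole string as a single element, so the loop above only ever sees one candidate. Below is a minimal sketch of one other reading of the question, assuming the goal is to count how often every substring occurs and then pick the longest one that repeats. The class and method names are hypothetical, and the approach generates O(n^2) substrings, so it only suits short inputs.

```java
import java.util.HashMap;
import java.util.Map;

class SubstringFrequency {
    // Counts how many times each non-empty substring occurs (overlaps included).
    static Map<String, Integer> countSubstrings(String s) {
        Map<String, Integer> counts = new HashMap<>();
        for (int start = 0; start < s.length(); start++) {
            for (int end = start + 1; end <= s.length(); end++) {
                counts.merge(s.substring(start, end), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns the longest substring that occurs at least twice, or "" if none does.
    // Ties between equally long substrings are broken arbitrarily.
    static String longestRepeated(Map<String, Integer> counts) {
        String best = "";
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= 2 && e.getKey().length() > best.length()) {
                best = e.getKey();
            }
        }
        return best;
    }
}
```

For example, countSubstrings("aababcabcdabcdeabcdefabcdefg").get("abcdef") gives the frequency of "abcdef" in the sample string.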

Related

Java Get first character values for a string

I have inputs like
AS23456SDE
MFD324FR
I need to get First Character values like
AS, MFD
The prefix is not always the first two or the first three characters, because the input can change. I need to get the characters that come before the first number.
Thank you.
Edit : This is what I have tried.
public static String getPrefix(String serial) {
StringBuilder prefix = new StringBuilder();
for(char c : serial.toCharArray()){
if(Character.isDigit(c)){
break;
}
else{
prefix.append(c);
}
}
return prefix.toString();
}
Here is a nice one line solution. It uses a regex to match the first non numeric characters in the string, and then replaces the input string with this match.
public String getFirstLetters(String input) {
return ("A" + input).replaceAll("^([^\\d]+)(.*)$", "$1")
.substring(1);
}
System.out.println(getFirstLetters("AS23456SDE"));
System.out.println(getFirstLetters("1AS123"));
Output:
AS
(empty)
A simple solution could be like this:
public static void main (String[]args) {
String str = "MFD324FR";
char[] characters = str.toCharArray();
for(char c : characters){
if(Character.isDigit(c))
break;
else
System.out.print(c);
}
}
Use the following function to get required output
public String getFirstChars(String str){
int zeroAscii = '0'; int nineAscii = '9';
String result = "";
for (int i=0; i< str.length(); i++){
int ascii = str.charAt(i);
if(ascii >= zeroAscii && ascii <= nineAscii){
// first digit reached: everything collected so far is the prefix
return result;
}else{
result = result + str.charAt(i);
}
}
return str;
}
pass your string as argument
I think this can be done with a simple regex that matches digits, together with Java's String.split function. This regex-based approach should be more efficient than the methods using more complicated regexes.
Something as below will work
String inp = "ABC345.";
String beginningChars = inp.split("[\\d]+",2)[0];
System.out.println(beginningChars); // only if you want to print.
The regex I used "[\\d]+" is escaped for java already.
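What does it do?
It matches one or more digits. By default, \d in Java matches only the ASCII digits 0-9; it matches digits from other scripts (such as Arabic-Indic numerals) only when the pattern is compiled with the Pattern.UNICODE_CHARACTER_CLASS flag.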
What does String beginningChars = inp.split("[\\d]+",2)[0] do?
It applies this regex and splits the string into an array of strings wherever a match is found. The [0] at the end selects the first element of that array, since you wanted the starting characters.
What is the second parameter to .split(regex,int) which I supplied as 2?
This is the Limit parameter. This means that the regex will be applied on the string till 1 match is found. Once 1 match is found the string is not processed anymore.
From the Strings javadoc page:
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
This will be efficient if your string is huge.
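As a minimal sketch of how the limit argument behaves, the split call above can be wrapped in a helper. The class and method names are hypothetical; the regex is the same "[\\d]+" used in this answer.

```java
class PrefixBySplit {
    // Returns the characters before the first run of digits.
    static String prefixBeforeDigits(String input) {
        // limit 2: the pattern is applied at most once, so everything after
        // the first digit run stays unsplit in element [1] (if there is one)
        return input.split("[\\d]+", 2)[0];
    }
}
```

With a limit of 2, "a1b2c3".split("[\\d]+", 2) yields the two elements "a" and "b2c3", whereas the same call without a limit yields "a", "b" and "c". An input that starts with a digit gives an empty prefix.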
Another possible regex, if you want to state explicitly that only the digits 0-9 are split on:
"[0-9]+"
public static void main(String[] args) {
String testString = "MFD324FR";
int index = 0;
for (Character i : testString.toCharArray()) {
if (Character.isDigit(i))
break;
index++;
}
System.out.println(testString.substring(0, index));
}
This prints the characters that come before the first digit.

Find all words in dictionary given a string of words

I am attempting to write a program that will find all the words that can be constructed from a given string, using a dictionary which has been loaded into an ArrayList from a file. sowpodsList is the dictionary stored as an ArrayList. I want to iterate through each word in the dictionary and then compare it to the string. Given that the string is just a random collection of letters, how do I go about achieving this?
Input: asdm
Output: a, mad, sad .... (any word which matches in the dictionary.)
for (int i = 0; i < sowpodsList.size(); i++) {
for (int j = 0; j < sowpodsList.get(i).length(); j++) {
if (sowpodsList.get(i).charAt(j) == )
;
}
}
You can check that the count of each character of each word in the dictionary does not exceed that character's count in the input.
ArrayList <String> matches = new ArrayList <String> ();
// for each word in dict
for(String word : sowpodsList) {
// match flag
boolean isMatch = true;
// for each character of dict word
for( char chW : word.toCharArray() ) {
String w = Character.toString(chW);
// if chW occurs more often in word than in input,
// the word cannot be built from the input letters
if ( word.length() - word.replace(w, "").length() >
input.length() - input.replace(w, "").length() ) {
isMatch = false;
break;
}
}
if (isMatch) {
matches.add( word );
}
}
System.out.println(matches);
Sample output: (dict file I used is here: https://docs.oracle.com/javase/tutorial/collections/interfaces/examples/dictionary.txt)
Input: asdm
Matches: [ad, ads, am, as, dam, dams, ma, mad, mads, mas, sad]
If I were you I'd change the way you store your dictionary.
Given that the string input has random letters in it, what I'd do here is store all words of your dictionary in a SortedMap<String, char[]> (a TreeMap, to be precise) where the keys are the words in your dictionary and the values are characters in this word sorted.
Then I'd sort the characters in the input string as well and go for that (pseudo code, not tested):
public Set<String> getMatchingWords(final String input)
{
final char[] contents = input.toCharArray();
Arrays.sort(contents);
final int inputLength = contents.length;
final Set<String> matchedWords = new HashSet<>();
char[] candidate;
int len;
int matched;
for (final Map.Entry<String, char[]> entry: dictionary.entrySet()) {
candidate = entry.getValue();
// If the word has a greater length than the input,
// go for the next word
len = candidate.length;
if (len > inputLength)
continue;
// Both arrays are sorted, so walk the input once and consume
// a candidate character whenever the two current characters match
matched = 0;
for (int i = 0; i < inputLength && matched < len; i++)
if (candidate[matched] == contents[i])
matched++;
// We only add a match if the number of matched characters
// is exactly that of the candidate
if (matched == len)
matchedWords.add(entry.getKey());
}
return matchedWords;
}
private static int commonChars(final char[] input, final char[] candidate)
{
final int len = Math.min(input.length, candidate.length);
int ret = 0;
for (int i = 0; i < len; i++) {
if (input[i] != candidate[i])
break;
ret++;
}
return ret;
}
With a trie: that would also be possible; whether it is practical or not however is another question, it depends on the size of the dictionary.
But the basic principle would be the same: you'd need a sorted character array of words in your dictionary and add to the trie little by little (use a builder).
A trie node would have three elements:
a map where the keys are the set of characters which can be matched next, and the values are the matching trie nodes;
a set of words which can match at that node exactly.
You can base your trie implementation off this one if you want.
Go for a trie implementation.
A trie provides a fast way to search a large collection of words: a lookup takes time proportional to the length of the word, not to the size of the dictionary.
https://en.wikipedia.org/wiki/Trie
What you need to do is insert all the words into the trie data structure.
Then you just need to call the trie's search function to get a boolean match result.
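A minimal sketch of such a trie, with insert and exact-word lookup only. The class, its methods and the Node layout are hypothetical and not taken from any particular library. Note that this only answers "is this exact word in the dictionary?"; finding every word that can be built from a set of letters would need an extra traversal on top of it.

```java
import java.util.HashMap;
import java.util.Map;

class SimpleTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord; // true if a dictionary word ends at this node
    }

    private final Node root = new Node();

    // Adds a word by walking (and creating) one node per character.
    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    // Returns true only if this exact word was inserted earlier.
    boolean contains(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return false;
            }
        }
        return node.isWord;
    }
}
```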
There are two ways to do it. The best way depends on the relative size of the data structures.
If the dictionary is long and the list of letters is short, it may be best to sort the dictionary (if it is not already), then construct all possible words by permuting the letters (removing duplicates). Then do a binary search using string comparison for each combination of letters to see if it is a word in the dictionary. The tricky part is ensuring that duplicate letters are used only when appropriate.
If the list of letters is long and the dictionary is short, another way would be simply to count the number of letters in the input string: two a's, one s, one m, etc. Then for each dictionary word, if the number of each individual letter in the dictionary word does not exceed those in the input string, the word is valid.
Either way, add all words found to the output array.
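The second approach (counting letters) could be sketched as follows. This assumes all words and the input use only lowercase a-z; other characters are simply ignored. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class LetterCountMatcher {
    // Counts occurrences of each lowercase letter a-z; other characters are ignored.
    static int[] letterCounts(String s) {
        int[] counts = new int[26];
        for (char c : s.toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                counts[c - 'a']++;
            }
        }
        return counts;
    }

    // Returns the dictionary words that can be built from the input letters.
    static List<String> buildableWords(String input, List<String> dictionary) {
        int[] available = letterCounts(input);
        List<String> matches = new ArrayList<>();
        for (String word : dictionary) {
            int[] needed = letterCounts(word);
            boolean fits = true;
            for (int i = 0; i < 26; i++) {
                // the word needs more of this letter than the input offers
                if (needed[i] > available[i]) {
                    fits = false;
                    break;
                }
            }
            if (fits) {
                matches.add(word);
            }
        }
        return matches;
    }
}
```

The input is counted once, and each dictionary word costs one pass over its letters plus 26 comparisons, so duplicate letters are handled without any special casing.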

Finding the longest substring between a "start" string and one of 3 possible "end" strings

So my question is substring-related.
How do you find the longest possible substring between a starting string and one of three ending strings? I also need to find the index of the original string that the largest substring starts at.
So:
Start string:
"ATG"
3 possible end strings:
"TAG"
"TAA"
"TGA"
An example original string might be:
"SDAFKJDAFKATGDFSDFAKJDNKSJFNSDTGASDFKJSDNKFJSNDJFATGDSDFKJNSDFTAGSDFSDATGFF"
So the result of that should give me:
- Longest substring length: 23 (from the substring ATGDFSDFAKJDNKSJFNSDTGA)
- Index of longest substring: 10
I cannot use Regex.
Thanks for any help!
This is arguably the easiest way, and it's just one line:
String target = str.replaceAll(".*ATG(.*)(TAG|TAA|TGA).*", "$1");
To find the index:
int index = str.indexOf("ATG") + 3;
Note: I have interpreted your remark "I cannot use regex" to mean "I am unskilled at regex", because if it's a java question, regex is available.
Well, this looks like a fun one.
It seems the most straightforward way to do this would be to build your own mini finite state machine. You would have to parse each character in the string and keep track of all possible character sequences that would terminate the sequence.
If you hit a 'T', you need to jump ahead and look at the next character. If it's an 'A' or a 'G' you need to jump ahead again, otherwise, add those tokens to your string. Continue the pattern until you get to the end of the original string, or match one of your terminal patterns.
So, maybe something that looks like this (simplified example):
String longestSequence(String original) {
StringBuilder sb = new StringBuilder();
char[] tokens = original.toCharArray();
for (int i = 0; i < tokens.length; ++i) {
// read each token, and compare / look ahead to see if you should keep going or terminate.
}
return sb.toString();
}
Match your string against this regex:
ATG[A-Z]+(TAG|TAA|TGA)
If multiple matches occur, iterate over them and keep the longest one.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// compile the pattern (no flags are needed here)
Pattern pattern = Pattern.compile("ATG[A-Z]+(TAG|TAA|TGA)");
Matcher matcher = pattern.matcher( yourInputStringHere );
while (matcher.find()) {
System.out.println("Found the text \"" + matcher.group()
+ "\" starting at " + matcher.start()
+ " and ending at index " + matcher.end());
}
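The loop above only prints each match. A minimal sketch of keeping the longest one is shown below. It assumes a reluctant quantifier (*?) so that each match stops at the nearest end string, which differs from the greedy [A-Z]+ above and is a guess at the intent. The class and method names are hypothetical. Because find() resumes after the end of the previous match, overlapping candidates are not considered.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LongestRegexMatch {
    // Reluctant *? stops at the nearest end string (an assumption about intent).
    private static final Pattern GENE = Pattern.compile("ATG[A-Z]*?(TAG|TAA|TGA)");

    // Returns {startIndex, length} of the longest match, or {-1, 0} if none.
    static int[] longest(String input) {
        int bestStart = -1;
        int bestLength = 0;
        Matcher matcher = GENE.matcher(input);
        while (matcher.find()) {
            int length = matcher.end() - matcher.start();
            if (length > bestLength) {
                bestLength = length;
                bestStart = matcher.start();
            }
        }
        return new int[] {bestStart, bestLength};
    }
}
```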
There are already some beautiful and elegant solutions to your problem (Bohemian and inquisitive). If you still, as originally stated, can't use regex, here's an alternative. This code is not especially elegant, and as pointed out, there are better ways to do it, but it should at least clearly show you the logic behind the solution to your problem.
How do you find the longest possible substring between a starting string
and one of three ending strings?
First, find the index of starting string, then find the index of each ending string, and get substrings for each ending, then their length. Remember that if string is not found, its index will be -1.
String originalString = "SDAFKJDAFKATGDFSDFAKJDNKSJFNSDTGASDFKJSDNKFJSNDJFATGDSDFKJNSDFTAGSDFSDATGFF";
String STARTING_STRING = "ATG";
String END1 = "TAG";
String END2 = "TAA";
String END3 = "TGA";
//let's find the index of STARTING_STRING
int posOfStartingString = originalString.indexOf(STARTING_STRING);
//if found
if (posOfStartingString != -1) {
int tagPos[] = new int[3];
//let's find the index of each ending strings in the original string
tagPos[0] = originalString.indexOf(END1, posOfStartingString+3);
tagPos[1] = originalString.indexOf(END2, posOfStartingString+3);
tagPos[2] = originalString.indexOf(END3, posOfStartingString+3);
int lengths[] = new int[3];
//we can now use the following methods:
//public String substring(int beginIndex, int endIndex)
//where beginIndex is our posOfStartingString
//and endIndex is position of each ending string (if found)
//
//and finally, String.length() to get the length of each substring
if (tagPos[0] != -1) {
lengths[0] = originalString.substring(posOfStartingString, tagPos[0]).length();
}
if (tagPos[1] != -1) {
lengths[1] = originalString.substring(posOfStartingString, tagPos[1]).length();
}
if (tagPos[2] != -1) {
lengths[2] = originalString.substring(posOfStartingString, tagPos[2]).length();
}
} else {
//no starting string in original string
}
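The lengths[] array now contains the length of each substring that starts with STARTING_STRING and runs up to (but not including) each of the 3 respective endings; add the ending's length (3) if the ending itself should be counted. Then just find which one is the longest and you will have your answer.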
I also need to find the index of the original string that the largest substring starts at.
This will be the index of where starting string starts, in this case 10.
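Putting the steps together without regex, here is a minimal sketch that checks every occurrence of the start string rather than only the first. It assumes each start is paired with the nearest end string that follows it and that the reported length includes both the start and the end, which matches the expected result in the question (length 23, index 10). The class and method names are hypothetical.

```java
class LongestBetween {
    private static final String START = "ATG";
    private static final String[] ENDS = {"TAG", "TAA", "TGA"};

    // Returns {startIndex, length} of the longest START...END substring,
    // pairing each START with the nearest END after it; {-1, 0} if none.
    static int[] longest(String s) {
        int bestStart = -1;
        int bestLength = 0;
        int start = s.indexOf(START);
        while (start != -1) {
            // find the nearest end string that begins after this START
            int nearestEnd = -1;
            for (String end : ENDS) {
                int pos = s.indexOf(end, start + START.length());
                if (pos != -1 && (nearestEnd == -1 || pos < nearestEnd)) {
                    nearestEnd = pos;
                }
            }
            if (nearestEnd != -1) {
                // the length counts both the START and the 3-character END
                int length = nearestEnd + 3 - start;
                if (length > bestLength) {
                    bestLength = length;
                    bestStart = start;
                }
            }
            // move on to the next occurrence of START
            start = s.indexOf(START, start + 1);
        }
        return new int[] {bestStart, bestLength};
    }
}
```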

String Manipulation in java

I have an array of strings. I want to take each string, divide it into 3 parts (number-string-number), and put each part in another array. In the end I want to have 3 arrays, two of which store numbers and one of which stores strings. The number of spaces between the numbers and the strings is not fixed.
the format of the strings in the first array is:
-2.2052 dalam -2.7300
-3.0511 dan akan -0.1116
It will be great if you help me with a sample code.
Here's the algorithm you could implement :
Create your 3 output arrays. They should all have the same length as the original string array.
Iterate through your original array.
For each string, find the index of the first space character and the index of the last space character (look into the javadoc of the String class for methods doing that).
Extract the substring before the first space, the substring between the first and last space, and the substring after the last space. The javadoc should help you.
Convert the first and third substrings into a double (see the javadoc for Double for how to do it).
Store the doubles and the string into the output arrays.
You can use indexOf and lastIndexOf to achieve this. Try following:
String arrayWithStringAndNumber[] = new String[2];
arrayWithStringAndNumber[0] = "-2.2052 dalam -2.7300";
arrayWithStringAndNumber[1] = "-3.0511 dan akan -0.1116";
String numArray1[] = new String[2];
String numArray2[] = new String[2];
String strArray[] = new String[2];
String temp;
for (int i = 0; i < arrayWithStringAndNumber.length; i++) {
temp = arrayWithStringAndNumber[i];
numArray1[i]=temp.substring(0,temp.indexOf(" "));
numArray2[i]=temp.substring(temp.lastIndexOf(" ")+1);
strArray[i]=temp.substring(temp.indexOf(" ")+1,temp.lastIndexOf(" "));
}
Make sure all arrays are of same length.
For num arrays use type whatever you want. I think you may need double and then you can easily parse the value to fit in it.
Hope this helps.
You can use indexOf(int ch) and lastIndexOf(int ch) of String object to find the first and last whitespace character and divide the string using these two indexes. You can also trim the middle string part if needed.
So:
String[] input; // given
Double[] firstNumbers = new Double[input.length];
String[] middleParts = new String[input.length];
Double[] secondNumbers = new Double[input.length];
for(int i = 0; i < input.length; i++) {
String line = input[i];
int firstWhitespace = line.indexOf(" ");
int lastWhitespace = line.lastIndexOf(" ");
String firstNumber = line.substring(0, firstWhitespace);
String middlePart = line.substring(firstWhitespace, lastWhitespace).trim();
String secondNumber = line.substring(lastWhitespace+1, line.length());
// parse numbers to double, add to an array
firstNumbers[i] = Double.parseDouble(firstNumber);
middleParts[i] = middlePart;
secondNumbers[i] = Double.parseDouble(secondNumber);
}
Usually every programming language has functions for operating on string data. A common set of functions is:
length (or len) - to get the length of a string
find (or indexOf or something like this) - to find the position of a character or substring
substring (or substr) - to get a substring of N characters from position P
and often:
left/right - to get a substring of N characters from the left or right side of a string
trim/leftTrim/rightTrim - to trim all space characters, or a character given as a function parameter, from the left and/or right side of a string.
Whenever you need to operate on string data, try reading the documentation or searching online first. You will usually find the information on the Internet. Good luck!

Count no. of words using Regular expressions in java

How do I count the number of times each word appears in a String in Java using a regular expression?
I don't think a regex can solve your problem completely.
You want to
Split a string into words. A regular expression can do this for a very simple definition of word, "parts of a string separated by whitespace or punctuation", which is not a very good definition even if you just stick to English text.
Count the number of occurrences of each word derived from step 1. To do that you must store some kind of mapping, and regexes neither store nor count.
A workable approach could be to
split the input string (by either regex or other means) into an array of word strings
iterate over the array, building a Map to keep count of each word
iterate over the map to output a list of words and the number of occurrences.
If your input is limited to English you still have to consider how you want your algorithm to behave in case of things like they're <->they are etc and compound words. Add other languages to the mix for additional kinds of headaches (different ways of writing the same word, words split into parts, difference in writing depending on where in a sentence the word occurs, etc)
I would split your task into a) identify words and b) count number of each unique word in text.
a) could be solved with splitting the text with a regex.
b) could be solved by building a map with the result from a).
String text = "I like good mules. Mules are good :)";
String[] words = text.split("([\\W\\s]+)");
Map<String, Integer> counts = new HashMap<String, Integer>();
for (String word: words) {
if (counts.containsKey(word)) {
counts.put(word, counts.get(word) + 1);
} else {
counts.put(word, 1);
}
}
result: {Mules=1, are=1, good=2, mules=1, like=1, I=1}
Pattern p = Pattern.compile("\\babba\\b");
Matcher m = p.matcher("abba is abba with abbabba and abba doing abba");
int count = 0;
while(m.find()){
count++;
}
System.out.println(count); //4
Using Guava, this is a one-liner:
Multiset<String> countOfEachWord =
HashMultiset.create(Splitter.on(" ").omitEmptyStrings().split(myString));
then to get the count of "dog" for example you would say:
countOfEachWord.count("dog")
Must you use a regex? If not this might help:
public static int count(final String string, final String substring)
{
int count = 0;
int idx = 0;
while ((idx = string.indexOf(substring, idx)) != -1)
{
idx++;
count++;
}
return count;
}
int CountWords(String t){
return t.split("([[a-z][A-Z][0-9][\\Q-\\E]]+)",-1).length+(t.replaceAll("([[a-z][A-Z][0-9][\\W]]*)", "")).length()-1;
}
Works for English words (including chemical names) plus Chinese words.
