Parsing and replacing doubles in a Java String

I am writing Java code for translating signals. If a given string (INPUT) is:
C*12.387a0d14assc7*18.65d142a
Its translation (OUTPUT) should be:
C*12387:1000a0d14assc7*1865:100d142a
ALGORITHM:
Wherever in the string a star (*) is followed by a number containing a decimal point (in this string there are two: the first is '*12.387' and the second is '*18.65'), that number is to be converted into a fraction. In the example above, 12.387 is converted into 12387:1000 and 18.65 into 1865:100.
If the decimal number is isolated, I can convert it into a fraction with the following code:
double d = 12.387;
String str = Double.toString(d);
String[] fraction = str.split("\\.");
int denominator = (int)Math.pow(10, fraction[1].length());
int numerator = Integer.parseInt(fraction[0] + "" + fraction[1]);
System.out.println(numerator + ":" + denominator);
But I don't know how to extract the decimal-number substring from the string that contains it. Being new to Java and programming, I need help. Thanks in advance.
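A side note on the isolated-number step: Double.toString switches to scientific notation for very small or very large values (for example, 0.0001 becomes "1.0E-4"), which breaks the split("\\.") approach. A sketch that works on the decimal text itself with BigDecimal, assuming a plain literal without an exponent (toFraction is a hypothetical name, not part of the original code):

```java
import java.math.BigDecimal;
import java.math.BigInteger;

class DecimalToFraction {
    // Turns a decimal literal such as "12.387" into "12387:1000".
    // Parsing the original text (instead of going through a double) keeps every digit
    // and avoids Double.toString's scientific notation.
    // Assumes no exponent, so scale() is never negative.
    static String toFraction(String decimal) {
        BigDecimal d = new BigDecimal(decimal);
        BigInteger numerator = d.unscaledValue();               // 12.387 -> 12387
        BigInteger denominator = BigInteger.TEN.pow(d.scale()); // scale 3 -> 1000
        return numerator + ":" + denominator;
    }

    public static void main(String[] args) {
        System.out.println(toFraction("12.387")); // 12387:1000
        System.out.println(toFraction("18.65"));  // 1865:100
    }
}
```

Trailing zeros are kept as written, so "18.650" gives 18650:1000, the same as the string-splitting approach.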

Using regex and capturing groups is a good way to implement your parsing:
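import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "C*12.387a0d14assc7*18.65d142a";
StringBuffer result = new StringBuffer();
// group 1 = digits before the '.', group 2 = digits after it
Matcher m = Pattern.compile("\\*(\\d+)\\.(\\d+)").matcher(s);
while (m.find()) {
String num = m.group(1);
String denom = m.group(2);
// "1" followed by one "0" per fractional digit, e.g. 3 digits -> "1000"
String divisor = "1" + new String(new char[denom.length()]).replace("\0", "0");
String replacement = "*" + num + denom + ":" + divisor;
// copies the text since the previous match, then the replacement
m.appendReplacement(result, replacement);
}
// copies whatever follows the last match
m.appendTail(result);
System.out.println(result.toString());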
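On Java 11 or later (an assumption about your JDK), the same regex can be used with Matcher.replaceAll(Function), available since Java 9, together with String.repeat. This removes the appendReplacement loop and the char-array trick; a sketch (SignalTranslator and translate are hypothetical names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SignalTranslator {
    // "*", the integer digits (group 1), ".", the fractional digits (group 2)
    private static final Pattern STAR_DECIMAL = Pattern.compile("\\*(\\d+)\\.(\\d+)");

    static String translate(String s) {
        // replaceAll(Function<MatchResult, String>) needs Java 9+, String.repeat needs Java 11+
        return STAR_DECIMAL.matcher(s).replaceAll(r -> {
            String whole = r.group(1);
            String frac = r.group(2);
            String denominator = "1" + "0".repeat(frac.length()); // 3 fractional digits -> "1000"
            // quoteReplacement stops '$' or '\' from being treated as special characters
            return Matcher.quoteReplacement("*" + whole + frac + ":" + denominator);
        });
    }

    public static void main(String[] args) {
        System.out.println(translate("C*12.387a0d14assc7*18.65d142a"));
        // C*12387:1000a0d14assc7*1865:100d142a
    }
}
```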

I started writing a solution that uses regexes but then tried without them. This approach is general (it would work in any programming language). It would be interesting to see whether it is faster than the regex-based solution, although I suspect the regex-based one might be faster. Please have a look at this solution too (it is not perfect, though :) ).
import java.util.*;
class DoubleConvert
{
public static void main (String[] args)
{
StringBuilder buffer = new StringBuilder("C*12.387a0d14assc7*18.65d142a");
int j, m, k;
int i = 0;
while (i < buffer.length())
{
if (buffer.charAt(i) == '*')
{
m = -1; k = -1;
j = i; //remember where * found
while ( i + 1 < buffer.length() )
{
i++;
if (Character.isDigit(buffer.charAt(i)))
{
continue;
}
else if (buffer.charAt(i) == '.')
{
m = i; // remember where . found
while (i + 1 < buffer.length())
{
i++;
if (Character.isDigit(buffer.charAt(i)))
{
continue;
}
else
{
k = i; //remember the last position
break;
}
}
}
else //let's see what we got
{
if (m > 0 && j > 0 && m - j > 0 && k - m > 0) //there must exist strings
{
System.out.println("Found " + buffer.substring(j, m)
+ " second part " + buffer.substring(m, k));
buffer.replace(j+1, k,
buffer.substring(j+1, m) +
buffer.substring(m+1, k) +
":1" +
new String(new char[k-1-m]).replace("\0","0"));
}
break;
}
}
}
else
{
i++;
}
}
System.out.println("Result " + buffer);
}
}
The output
Found *12 second part .387
Found *18 second part .65
Result C*12387:1000a0d14assc7*1865:100d142a
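A single-pass alternative to the scanner above, still without regex. It assumes a number only counts when digits, a '.', and more digits immediately follow the '*'. Unlike the code above, it also handles a star at index 0 and a number at the very end of the string; it needs Java 11+ for String.repeat (ManualScan is a hypothetical name):

```java
class ManualScan {
    // ASCII digits only, like \d in the regex answer
    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Rewrites every "*<digits>.<digits>" as "*<all digits>:1<zeros>" in one left-to-right pass.
    static String translate(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i++);
            out.append(c);
            if (c != '*') continue;
            int dot = i;                                   // scan the integer part
            while (dot < s.length() && isDigit(s.charAt(dot))) dot++;
            if (dot == i || dot >= s.length() || s.charAt(dot) != '.') continue;
            int end = dot + 1;                             // scan the fractional part
            while (end < s.length() && isDigit(s.charAt(end))) end++;
            if (end == dot + 1) continue;                  // "*12." has no fractional digits
            String fraction = s.substring(dot + 1, end);
            out.append(s, i, dot)                          // integer digits
               .append(fraction)                           // fractional digits
               .append(":1")
               .append("0".repeat(fraction.length()));     // one zero per fractional digit
            i = end;                                       // continue after the number
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("C*12.387a0d14assc7*18.65d142a"));
        // C*12387:1000a0d14assc7*1865:100d142a
    }
}
```

When no number follows a star, the loop simply carries on from the character after the '*', so the input is copied unchanged.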

Related

Hello, I am trying to write a method that duplicates all vowels, but only if they stand on their own. For example, "beautiful" would return "beautiifuul".

Here's what I have; if someone could give me some idea of what to do, that would be great. I think taking the index and counting how many vowels are together would be helpful, but I'm not sure how to implement that. isVowel is a helper method that determines whether the char is a vowel.
public static String doubleVowelsMaybe(String s)
{
int run =0;
String n = "";
for(int i = 0; i< s.length(); ++i)
{
char k = s.charAt(i);
if(isVowel(k))
{
}
if(run == 1)
{
n = n + s.substring(i, i+1) + s.substring(i, i+1);
run=0;
}
else
{
n = n + s.substring(i, i+1);
run= 0;
}
}
return n;
}
Most simple string manipulation tasks like this can be fairly easily done with a regex. This one's a one-liner:
public static String doubleVowelsMaybe(String s) {
return s.replaceAll("(?<![aeiou])([aeiou])(?![aeiou])", "$1$1");
}
The regex works as follows:
(?<![aeiou]) is a negative lookbehind, so it matches only if the character is not preceded by a vowel.
([aeiou]) matches a single vowel, and captures it to group number 1.
(?![aeiou]) is a negative lookahead, so it matches only if the character is not followed by a vowel.
The replacement of $1$1 means two copies of whatever was matched by group number 1, which is the single vowel character.
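If uppercase vowels should count as well (an assumption; the question only shows lowercase input), the same regex can be made case-insensitive with the inline (?i) flag, which also applies inside the lookarounds. A sketch:

```java
class DoubleVowels {
    // (?i) turns on case-insensitive matching, so 'A'..'U' count as vowels too.
    // Same lookarounds as above: a vowel that is neither preceded nor followed by a vowel.
    static String doubleVowelsMaybe(String s) {
        return s.replaceAll("(?i)(?<![aeiou])([aeiou])(?![aeiou])", "$1$1");
    }

    public static void main(String[] args) {
        System.out.println(doubleVowelsMaybe("beautiful")); // beautiifuul
        System.out.println(doubleVowelsMaybe("BEAUTIFUL")); // BEAUTIIFUUL
    }
}
```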
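import java.util.*;
class Hello {
public static void main(String[] args) {
String abc = "beautiful";
String n = "";
int i = 0;
char[] abcchar = abc.toCharArray();
HashSet<Character> hs = new HashSet<>();
hs.add('a');
hs.add('e');
hs.add('i');
hs.add('o');
hs.add('u');
while (i < abcchar.length) {
boolean nextIsVowel = i + 1 < abcchar.length && hs.contains(abcchar[i + 1]);
if (hs.contains(abcchar[i]) && !nextIsVowel) {
// lone vowel (including one at the very end of the string): write it twice
n = n + abc.substring(i, i + 1) + abc.substring(i, i + 1);
} else {
// copy a whole run of vowels unchanged, stopping at the end of the string
while (i < abcchar.length && hs.contains(abcchar[i])) {
n = n + abc.substring(i, i + 1);
i++;
}
// then copy the consonant that ended the run, if there is one
if (i < abcchar.length) {
n = n + abc.substring(i, i + 1);
}
}
i++;
}
System.out.print(n);
}
}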

Replace nested string with some rules

There are 3 rules for the string:
It contains words or groups (enclosed in parentheses), and groups can be nested.
If words or groups are separated by a space, each of them should be prefixed with "+".
For example:
"a b" needs to be "+a +b"
"a (b c)" needs to be "+a +(+b +c)"
If words or groups are separated by a |, they should be surrounded with parentheses (and the | replaced by a space).
For example:
"a|b" needs to be "(a b)"
"a|b|c" needs to be "(a b c)"
Combining all the rules, here is another example:
"aa|bb|(cc|(ff gg)) hh" needs to be "+(aa bb (cc (+ff +gg))) +hh"
I have tried to use regex, stack and recursive descent parser logic, but still cannot fully solve the problem.
Could anyone please share the logic or pseudo code on this problem?
Edit:
One more important rule: the vertical bar has higher precedence than the space.
For example:
aa|bb hh cc|dd (a|b) needs to be +(aa bb) +hh +(cc dd) +((a b))
(aa dd)|bb|cc (ee ff)|(gg hh) needs to be +((+aa +dd) bb cc) +((+ee +ff) (+gg +hh))
Edit 2:
To solve the precedence problem, I found a way to add the parentheses before calling Sunil Dabburi's methods.
For example:
aa|bb hh cc|dd (a|b) will be (aa|bb) hh (cc|dd) (a|b)
(aa dd)|bb|cc (ee ff)|(gg hh) will be ((aa dd)|bb|cc) ((ee ff)|(gg hh))
Since performance is not a big concern for my application, this approach at least makes it work for me. I suspect a parser generator such as JavaCC could solve this problem elegantly. I hope others can continue to discuss and contribute to this problem.
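Since the question mentions trying a recursive descent parser, here is a sketch of one, based on an assumed grammar in which '|' binds tighter than the space: sequence := alternation (' ' alternation)*, alternation := atom ('|' atom)*, atom := word | '(' sequence ')'. It assumes well-formed input, with items separated by spaces or '|' and no whitespace directly inside parentheses. Every parenthesised group keeps its own parentheses, which matches the edited examples ((a|b) becomes ((a b))); the question's first example disagrees on this point, and this sketch turns "aa|bb|(cc|(ff gg)) hh" into "+(aa bb ((cc (+ff +gg)))) +hh". QueryRewriter and transform are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class QueryRewriter {
    private final String s;
    private int pos = 0;

    private QueryRewriter(String s) { this.s = s; }

    static String transform(String input) {
        QueryRewriter p = new QueryRewriter(input.trim());
        String out = p.sequence();
        if (p.pos != p.s.length()) {
            throw new IllegalArgumentException("Unexpected character at " + p.pos);
        }
        return out;
    }

    // Space-separated items: prefix each with '+' when there is more than one.
    private String sequence() {
        List<String> items = new ArrayList<>();
        items.add(alternation());
        while (pos < s.length() && s.charAt(pos) == ' ') {
            while (pos < s.length() && s.charAt(pos) == ' ') pos++;
            items.add(alternation());
        }
        if (items.size() == 1) return items.get(0);
        List<String> prefixed = new ArrayList<>();
        for (String item : items) prefixed.add("+" + item);
        return String.join(" ", prefixed);
    }

    // '|'-separated items: join them with spaces and wrap them in parentheses.
    private String alternation() {
        List<String> items = new ArrayList<>();
        items.add(atom());
        while (pos < s.length() && s.charAt(pos) == '|') {
            pos++;
            items.add(atom());
        }
        return items.size() == 1 ? items.get(0) : "(" + String.join(" ", items) + ")";
    }

    // A word, or a parenthesised group that keeps its own parentheses.
    private String atom() {
        if (pos < s.length() && s.charAt(pos) == '(') {
            pos++;
            String inner = sequence();
            if (pos >= s.length() || s.charAt(pos) != ')') {
                throw new IllegalArgumentException("Missing ')' at " + pos);
            }
            pos++;
            return "(" + inner + ")";
        }
        int start = pos;
        while (pos < s.length() && " |()".indexOf(s.charAt(pos)) < 0) pos++;
        if (start == pos) throw new IllegalArgumentException("Expected a word at " + pos);
        return s.substring(start, pos);
    }
}
```

Because precedence is encoded in the grammar (alternation sits below sequence), no pre-processing step that adds parentheses is needed.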
Here is my attempt. Based on your examples and a few that I came up with, I believe it is correct under the rules. I solved this by breaking the problem into 2 parts:
1) Solve the case where the string contains only words, or is a group containing only words.
2) Solve words and groups by substituting child groups out with placeholders, applying part 1) and recursively repeating 2) on the child groups.
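// Needs java.util.* (Stack, Map, AbstractMap) and java.util.regex.* (Pattern, Matcher)
private String transformString(String input) {
Stack<Map.Entry<Integer, String>> childParams = new Stack<>();
String parsedInput = input;
int nextInt = Integer.MAX_VALUE;
// innermost group: "(" followed only by word characters, pipes or spaces, then ")"
Pattern pattern = Pattern.compile("\\((\\w|\\|| )+\\)");
Matcher matcher = pattern.matcher(parsedInput);
while (matcher.find()) {
nextInt--;
// read the matched group before replacing it with a numeric placeholder
String childParam = matcher.group();
parsedInput = matcher.replaceFirst(String.valueOf(nextInt));
childParams.add(new AbstractMap.SimpleEntry<>(nextInt, childParam));
matcher = pattern.matcher(parsedInput);
}
parsedInput = transformBasic(parsedInput);
// substitute the placeholders back, outermost first, so inner placeholders are exposed before they are replaced
while (!childParams.empty()) {
Map.Entry<Integer, String> childGroup = childParams.pop();
parsedInput = parsedInput.replace(childGroup.getKey().toString(), transformBasic(childGroup.getValue()));
}
return parsedInput;
}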
// Transform basic only handles strings that contain words. This allows us to simplify the problem
// and not have to worry about child groups or nested groups.
private String transformBasic(String input) {
String transformedBasic = input;
if (input.startsWith("(")) {
transformedBasic = input.substring(1, input.length() - 1);
}
// Append + in front of each word if there are multiple words.
if (transformedBasic.contains(" ")) {
transformedBasic = transformedBasic.replaceAll("( )|^", "$1+");
}
// Surround all words containing | with parentheses.
transformedBasic = transformedBasic.replaceAll("([\\w]+\\|[\\w|]*[\\w]+)", "($1)");
// Replace pipes with spaces.
transformedBasic = transformedBasic.replace("|", " ");
if (input.startsWith("(") && !transformedBasic.startsWith("(")) {
transformedBasic = "(" + transformedBasic + ")";
}
return transformedBasic;
}
Verified with the following test cases:
@ParameterizedTest
@CsvSource({
"a b,+a +b",
"a (b c),+a +(+b +c)",
"a|b,(a b)",
"a|b|c,(a b c)",
"aa|bb|(cc|(ff gg)) hh,+(aa bb (cc (+ff +gg))) +hh",
"(aa(bb(cc|ee)|ff) gg),(+aa(bb(cc ee) ff) +gg)",
"(a b),(+a +b)",
"(a(c|d) b),(+a(c d) +b)",
"bb(cc|ee),bb(cc ee)",
"((a|b) (a b)|b (c|d)|e),(+(a b) +((+a +b) b) +((c d) e))"
})
void testTransformString(String input, String output) {
Assertions.assertEquals(output, transformString(input));
}
@ParameterizedTest
@CsvSource({
"a b,+a +b",
"a b c,+a +b +c",
"a|b,(a b)",
"(a b),(+a +b)",
"(a|b),(a b)",
"a|b|c,(a b c)",
"(aa|bb cc|dd),(+(aa bb) +(cc dd))",
"(aa|bb|ee cc|dd),(+(aa bb ee) +(cc dd))",
"aa|bb|cc|ff gg hh,+(aa bb cc ff) +gg +hh"
})
void testTransformBasic(String input, String output) {
Assertions.assertEquals(output, transformBasic(input));
}
I tried to solve the problem. I'm not sure it works in all cases, but I verified it with the inputs given in the question and it worked fine.
We need to format the pipes first. That adds the necessary parentheses and spacing.
The spaces generated during pipe processing can interfere with the actual spaces in our expression, so I used a $ symbol to mask them (they are turned back into spaces at the end).
Processing spaces is tricky, as each pair of parentheses needs to be processed individually. So my approach is to find a pair of parentheses, starting from the outside and working inward.
So typically we have <left_part><parentheses_code><right_part>. Either left_part or right_part can be empty, and we need to handle those cases.
Also, if right_part starts with a space, we need to add '+' to left_part, as per the space rule.
NOTE: I am not sure what's expected for (a|b): whether the result should be ((a b)) or (a b). I am going with ((a b)), purely by the definition of the rules.
Now here is the working code:
public class Test {
public static void main(String[] args) {
String input = "aa|bb hh cc|dd (a|b)";
String result = formatSpaces(formatPipes(input)).replaceAll("\\$", " ");
System.out.println(result);
}
private static String formatPipes(String input) {
while (true) {
char[] chars = input.toCharArray();
int pIndex = input.indexOf("|");
if (pIndex == -1) {
return input;
}
input = input.substring(0, pIndex) + '$' + input.substring(pIndex + 1);
int first = pIndex - 1;
int closeParenthesesCount = 0;
while (first >= 0) {
if (chars[first] == ')') {
closeParenthesesCount++;
}
if (chars[first] == '(') {
if (closeParenthesesCount > 0) {
closeParenthesesCount--;
}
}
if (chars[first] == ' ') {
if (closeParenthesesCount == 0) {
break;
}
}
first--;
}
String result;
if (first > 0) {
result = input.substring(0, first + 1) + "(";
} else {
result = "(";
}
int last = pIndex + 1;
int openParenthesesCount = 0;
while (last <= input.length() - 1) {
if (chars[last] == '(') {
openParenthesesCount++;
}
if (chars[last] == ')') {
if (openParenthesesCount > 0) {
openParenthesesCount--;
}
}
if (chars[last] == ' ') {
if (openParenthesesCount == 0) {
break;
}
}
last++;
}
if (last >= input.length() - 1) {
result = result + input.substring(first + 1) + ")";
} else {
result = result + input.substring(first + 1, last) + ")" + input.substring(last);
}
input = result;
}
}
private static String formatSpaces(String input) {
if (input.isEmpty()) {
return "";
}
int startIndex = input.indexOf("(");
if (startIndex == -1) {
if (input.contains(" ")) {
String result = input.replaceAll(" ", " +");
if (!result.trim().startsWith("+")) {
result = '+' + result;
}
return result;
} else {
return input;
}
}
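// check the offset before adding startIndex, otherwise a missing ')' is only detected when startIndex == 0
int closeOffset = matchingCloseParenthesesIndex(input.substring(startIndex));
if (closeOffset == -1) {
System.out.println("Invalid input!!!");
return "";
}
int endIndex = startIndex + closeOffset;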
String first = "";
String last = "";
if (startIndex > 0) {
first = input.substring(0, startIndex);
}
if (endIndex < input.length() - 1) {
last = input.substring(endIndex + 1);
}
String result = formatSpaces(first);
String parenthesesStr = input.substring(startIndex + 1, endIndex);
if (last.startsWith(" ") && first.isEmpty()) {
result = result + "+";
}
result = result + "("
+ formatSpaces(parenthesesStr)
+ ")"
+ formatSpaces(last);
return result;
}
private static int matchingCloseParenthesesIndex(String input) {
int counter = 1;
char[] chars = input.toCharArray();
for (int i = 1; i < chars.length; i++) {
char ch = chars[i];
if (ch == '(') {
counter++;
} else if (ch == ')') {
counter--;
}
if (counter == 0) {
return i;
}
}
return -1;
}
}

Find the nth word in a string

I'm trying to find the nth word in a string. I am not allowed to use StringTokenizer or the split method from the String class.
I now realize that I can use white space as a separator. The only problem is I don't know how to find the location of the first white space.
public static String pick(String message, int number){
String lastWord;
int word = 1;
String result = "haha";
for(int i=0; i<message.length();i++){
if(message.charAt(i)==' '){
word++;
}
}
if(number<=word && number > 0 && number != 1){//Confused..
int space = message.indexOf(" ");//improve
int nextSpace = message.indexOf(" ", space + 1);//also check dat
result = message.substring(space,message.indexOf(' ', space + 1));
}
if(number == 1){
result = message.substring(0,message.indexOf(" "));
}
if(number>word){
lastWord = message.substring(message.lastIndexOf(" ")+1);
return lastWord;
}
else return result;
}
The current implementation is overcomplicated and hard to understand.
Consider this alternative algorithm:
1. Initialize index = 0, to track your position in the input string.
2. Repeat n - 1 times:
   a. Skip over non-space characters.
   b. Skip over space characters.
3. At this point you are at the start of the n-th word; save this position as start.
4. Skip over non-space characters.
5. At this point you are just after the end of the n-th word.
6. Return the substring between start and end.
Like this:
public static String pick(String message, int n) {
int index = 0;
for (int i = 1; i < n; i++) {
while (index < message.length() && message.charAt(index) != ' ') index++;
while (index < message.length() && message.charAt(index) == ' ') index++;
}
int start = index;
while (index < message.length() && message.charAt(index) != ' ') index++;
return message.substring(start, index);
}
Note that if n is greater than the number of words in the input, this will return an empty string.
(If that's not what you want, it should be easy to tweak.)
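If words may be separated by tabs or several spaces, or the message may start with whitespace, a variant of the same skip-based idea could look like this. It is a sketch under those assumptions and returns null, rather than an empty string, when there are fewer than n words (NthWord is a hypothetical name):

```java
class NthWord {
    // Any whitespace (Character.isWhitespace) separates words; leading/trailing whitespace is ignored.
    static String pick(String message, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        int index = 0;
        int len = message.length();
        for (int word = 1; ; word++) {
            // skip the whitespace before the word
            while (index < len && Character.isWhitespace(message.charAt(index))) index++;
            if (index == len) return null; // ran out of words
            int start = index;
            // skip the word itself
            while (index < len && !Character.isWhitespace(message.charAt(index))) index++;
            if (word == n) return message.substring(start, index);
        }
    }

    public static void main(String[] args) {
        System.out.println(pick("  This is\ta test ", 3)); // a
    }
}
```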
CHEAT (using regex, see note 1)
public static String pick(String message, int number){
Matcher m = Pattern.compile("^\\W*" + (number > 1 ? "(?:\\w+\\W+){" + (number - 1) + "}" : "") + "(\\w+)").matcher(message);
return (m.find() ? m.group(1) : null);
}
Test
System.out.println(pick("This is a test", 1));
System.out.println(pick("! This # is # a $ test % ", 3));
System.out.println(pick("This is a test", 5));
Output
This
a
null
1) Only StringTokenizer and split are disallowed ;-)
This needs some edge case handling (e.g. there are fewer than n words), but here's the idea I was getting at. This is similar to your solution, but IMO less elegant than janos'.
public static String pick(String message, int n) {
int wordCount = 0;
String word = "";
int wordBegin = 0;
int wordEnd = message.indexOf(' ');
while (wordEnd >= 0 && wordCount < n) {
word = message.substring(wordBegin, wordEnd).trim();
message = message.substring(wordEnd).trim();
wordEnd = message.indexOf(' ');
wordCount++;
}
if (wordEnd == -1 && wordCount + 1 == n) {
return message;
}
if (wordCount + 1 < n) {
return "Not enough words to satisfy";
}
return word;
}
Most iteration in Java can now be replaced by streams. Whether this is an improvement is a matter of (strong) opinion.
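// requires java.util.stream.IntStream; index of the first character of the third word,
// assuming message starts with a word
int thirdWordIndex = IntStream.range(0, message.length() - 1)
.filter(i -> Character.isWhitespace(message.charAt(i)))
.filter(i -> Character.isLetter(message.charAt(i + 1)))
.skip(1) // the first word has no space before it, so the 2nd space-to-letter boundary starts the 3rd word
.findFirst()
.orElseThrow(IllegalArgumentException::new) + 1;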

I want the string pattern aabbcc to be displayed as 2a2b2c

I have somehow got the output with the help of some browsing. But I couldn't understand the logic behind the code. Is there any simple way to achieve this?
public class LetterCount {
public static void main(String[] args)
{
String str = "aabbcccddd";
int[] counts = new int[Character.MAX_VALUE + 1]; // one slot per possible char value, 0..65535
// If you are certain you will only have ASCII characters, I would use `new int[256]` instead
for (int i = 0; i < str.length(); i++) {
char charAt = str.charAt(i);
counts[(int) charAt]++;
}
for (int i = 0; i < counts.length; i++) {
if (counts[i] > 0)
//System.out.println("Number of " + (char) i + ": " + counts[i]);
System.out.print(""+ counts[i] + (char) i + "");
}
}
}
There are 3 conditions which need to be taken care of:
if (s.charAt(x) != s.charAt(x + 1) && count == 1) ⇒ print the counter and character;
if (s.charAt(x) == s.charAt(x + 1)) ⇒ increase the counter;
if (s.charAt(x) != s.charAt(x + 1) && count >= 2) ⇒ print the counter and character, then reset the counter to 1.
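static void printCounts(String s) {
if (s.isEmpty()) {
return; // nothing to print (s.charAt(x) below would throw)
}
int count = 1;
int x;
for (x = 0; x < s.length() - 1; x++) {
if (s.charAt(x) != s.charAt(x + 1) && count == 1) {
// a run of length 1 ends here
System.out.print(count);
System.out.print(s.charAt(x));
}
else if (s.charAt(x) == s.charAt(x + 1)) {
count++;
}
else if (s.charAt(x) != s.charAt(x + 1) && count >= 2) {
// a longer run ends here: print it, then start counting the next run
System.out.print(count);
System.out.print(s.charAt(x));
count = 1;
}
}
// the last run is never followed by a different character, so print it here
System.out.print(count);
System.out.println(s.charAt(x));
}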
The code is really simple. It uses the numeric (ASCII/Unicode) value of each character as an index into the array that stores the frequency of each character.
The output is produced by iterating over that array and, for every character whose frequency is greater than 0, printing the frequency followed by the character, as in your desired output.
If identical characters always appear consecutively, the solution can use O(1) extra space.
For example, in your string aabbcc the identical characters are consecutive, so we can take advantage of this and count each character's frequency and print it at the same time.
for (int i = 0; i < str.length(); i++)
{
int freq = 1;
while ((i + 1) < str.length() && str.charAt(i) == str.charAt(i + 1))
{ ++freq; ++i; }
// print separately: freq + str.charAt(i) would add the char's numeric code to freq
System.out.print(freq);
System.out.print(str.charAt(i));
}
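The same loop wrapped in a method that returns the encoded string instead of printing it (a sketch; RunLength and encode are hypothetical names). Appending to a StringBuilder also sidesteps the int + char pitfall:

```java
class RunLength {
    // Counts each run of identical adjacent characters and emits count then char,
    // e.g. "aabbcc" -> "2a2b2c". Only the output buffer grows; the counting itself is O(1) space.
    static String encode(String str) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            int freq = 1;
            while (i + 1 < str.length() && str.charAt(i) == str.charAt(i + 1)) {
                freq++;
                i++;
            }
            out.append(freq).append(str.charAt(i)); // int and char appended separately
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("aabbcc")); // 2a2b2c
    }
}
```

Because runs are counted, a character that reappears later ("abca") is reported again rather than merged; sort the string first if total counts are wanted.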
You are trying to keep count of the number of times each character is found. An array is referenced by an index. For example, the ASCII code for the lowercase letter a is the integer 97. Thus the count of the number of times the letter a is seen is in counts[97]. After every element in the counts array has been set, you print out how many have been found.
This should help you understand the basic approach to the string compression problem:
import java.util.*;
public class LetterCount {
public static void main(String[] args) {
//your input string
String str = "aabbcccddd";
//split your input into characters
String chars[] = str.split("");
//maintain a map to store unique character and its frequency
Map<String, Integer> compressMap = new LinkedHashMap<String, Integer>();
//read every letter in input string
for(String s: chars) {
//before Java 8, String.split("") put an empty string at the start of the
//array, so skip empty entries to be safe
if("".equals(s))
continue;
//obtain the previous occurrences of the character
Integer count = compressMap.get(s);
//if the character was previously encountered, increment its count
if(count != null)
compressMap.put(s, ++count);
else//otherwise store it as the first occurrence
compressMap.put(s, 1);
}
//Create a StringBuffer object, to append your input
//StringBuffer is thread safe, so I prefer using it
//you could use StringBuilder if you don't expect your code to run
//in a multithreaded environment
StringBuffer output = new StringBuffer("");
//iterate over every entry in map
for (Map.Entry<String, Integer> entry : compressMap.entrySet()) {
//append the results to output
output.append(entry.getValue()).append(entry.getKey());
}
//print the output on console
System.out.println(output);
}
}
class Solution {
public String toFormat(String input) {
char inChar[] = input.toCharArray();
String output = "";
int i;
for(i=0;i<input.length();i++) {
int count = 1;
while(i+1<input.length() && inChar[i] == inChar[i+1]) {
count+=1;
i+=1;
}
output+=inChar[i]+String.valueOf(count);
}
return output;
}
public static void main(String[] args) {
Solution sol = new Solution();
String input = "aaabbbbcc";
System.out.println("Formatted String is: " + sol.toFormat(input));
}
}
def encode(Test_string):
count = 0
Result = ""
for i in range(len(Test_string)):
if (i+1) < len(Test_string) and (Test_string[i] == Test_string[i+1]):
count += 1
else:
Result += str((count+1))+Test_string[i]
count = 0
return Result
print(encode("ABBBBCCCCCCCCAB"))
If the identical characters are not next to each other and you want the total count of each character, sort the string first:
import java.util.Arrays;

public class SquareStrings {
public static void main(String[] args) {
SquareStrings squareStrings = new SquareStrings();
String str = "abbccddddbd";
System.out.println(squareStrings.manipulate(str));
}
private String manipulate(String str1) {
//convert to charArray and sort it, so equal characters become adjacent
char[] charArray = str1.toCharArray();
Arrays.sort(charArray);
String str = new String(charArray);
StringBuilder stbuBuilder = new StringBuilder("");
int length = str.length();
for (int i = 0; i < length; i++) {
int freq = 1;
while (((i + 1) < length) && (str.charAt(i) == str.charAt(i + 1))) {
++freq;
++i;
}
//append after the run ends, so characters that occur once are included too
stbuBuilder.append(str.charAt(i)).append(freq);
}
return stbuBuilder.toString();
}
}
Kotlin:
fun compressString(input: String): String {
if (input.isEmpty()){
return ""
}
var result = ""
var count = 1
var char1 = input[0]
for (i in 1 until input.length) {
val char2 = input[i]
if (char1 == char2) {
count++
} else {
if (count != 1) {
result += "$count$char1"
count = 1
} else {
result += "$char1"
}
char1 = char2
}
}
result += if (count != 1) {
"$count$char1"
} else {
"$char1"
}
return result
}

Detect incomplete patterns in strings

I have a string containing nested repeating patterns, for example:
String pattern1 = "1234";
String pattern2 = "5678";
String patternscombined = "1234|1234|5678|9"; //added | for reading pleasure
String pattern = (pattern1 + pattern1 + pattern2 + "9")
+ (pattern1 + pattern1 + pattern2 + "9")
+ (pattern1 + pattern1 + pattern2 + "9");
String result = "1234|1234|5678|9|1234|1234|56";
As you can see in the example above, the result got cut off. But if you know the repeating patterns, you can predict what comes next.
Now to my question:
How can I predict the next repetitions of this pattern, to get a resulting string like:
String predictedresult = "1234|1234|5678|9|1234|1234|5678|9|1234|1234|5678|9";
Patterns will be shorter than 10 characters, and the predicted result will be shorter than 1000 characters.
I am only receiving the cut-off result string, and a pattern recognition program is already implemented and working. In the above example, I would have result, pattern1, pattern2 and patternscombined.
EDIT:
I have found a solution working for me:
import java.util.Arrays;
public class LRS {
// return the longest common prefix of s and t
public static String lcp(String s, String t) {
int n = Math.min(s.length(), t.length());
for (int i = 0; i < n; i++) {
if (s.charAt(i) != t.charAt(i))
return s.substring(0, i);
}
return s.substring(0, n);
}
// return the longest repeated string in s
public static String lrs(String s) {
// form the N suffixes
int N = s.length();
String[] suffixes = new String[N];
for (int i = 0; i < N; i++) {
suffixes[i] = s.substring(i, N);
}
// sort them
Arrays.sort(suffixes);
// find longest repeated substring by comparing adjacent sorted suffixes
String lrs = "";
for (int i = 0; i < N - 1; i++) {
String x = lcp(suffixes[i], suffixes[i + 1]);
if (x.length() > lrs.length())
lrs = x;
}
return lrs;
}
public static int startingRepeats(final String haystack, final String needle)
{
String s = haystack;
final int len = needle.length();
if(len == 0){
return 0;
}
int count = 0;
while (s.startsWith(needle)) {
count++;
s = s.substring(len);
}
return count;
}
public static String lrscutoff(String s){
String lrs = s;
int length = s.length();
for (int i = length; i > 0; i--) {
String x = lrs(s.substring(0, i));
if (startingRepeats(s, x) < 10 &&
startingRepeats(s, x) > startingRepeats(s, lrs)){
lrs = x;
}
}
return lrs;
}
// demo: find the repeating unit at the start of the cut-off string, print how often it repeats
// and what is left over, then repeat the analysis on that unit
public static void main(String[] args) {
long time = System.nanoTime();
long timemilis = System.currentTimeMillis();
String s = "12341234567891234123456789123412345";
String repeat = s;
while(repeat.length() > 0){
System.out.println("-------------------------");
String repeat2 = lrscutoff(repeat);
System.out.println("'" + repeat + "'");
int count = startingRepeats(repeat, repeat2);
String rest = repeat.substring(count*repeat2.length());
System.out.println("predicted: (rest ='" + rest + "')" );
while(count > 0){
System.out.print("'" + repeat2 + "' + ");
count--;
}
if(repeat.equals(repeat2)){
System.out.println("''");
break;
}
if(!rest.isEmpty() && repeat2.contains(rest)){
System.out.println("'" + repeat2 + "'");
}else{
System.out.println("'" + rest + "'");
}
repeat = repeat2;
}
System.out.println("Time: (nano+millis):");
System.out.println(System.nanoTime()-time);
System.out.println(System.currentTimeMillis()-timemilis);
}
}
If your input string always separates the numbers with pipes (|), you can easily split it on the pipe and keep track of the counts in a HashMap. For example:
1234 = 2
1344 = 1
4411 = 5
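A minimal sketch of that counting idea, assuming pipes are the only separators (PipeCounts and countSegments are hypothetical names). Note that a truncated last segment such as "56" is counted as its own entry:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PipeCounts {
    // Splits on '|' (escaped, because split takes a regex) and counts each segment;
    // LinkedHashMap keeps the segments in first-seen order.
    static Map<String, Integer> countSegments(String piped) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String segment : piped.split("\\|")) {
            if (!segment.isEmpty()) {
                counts.merge(segment, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countSegments("1234|1234|5678|9|1234|1234|56"));
        // {1234=4, 5678=1, 9=1, 56=1}
    }
}
```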
If not, you will have to modify the Longest Repeated Substring algorithm. Since you need all repeated substrings, keep track of all of them instead of only the longest one. You also need to check for a minimum substring length and for overlapping substrings. Searching Google will turn up plenty of references for this algorithm.
You seem to need something like an n-gram language model, which is a statistical model that is based on counts of co-occurring events. If you are given some training data, you can derive the probabilities from counts of seen patterns. If not, you can try to specify them manually, but this can get tricky. Once you have such a language model (where the digit patterns correspond to words), you can always predict the next word by picking one with the highest probability given some previous words ("history").
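A different technique from the LRS approach used in the question: if the cut-off string (written without the pipes) consists of whole repetitions of one unit followed by a partial one, its shortest period can be read off the KMP prefix function, and the string can then be continued one character at a time. This is a sketch under that assumption; a cut-off that is too short (less than one full unit plus some overlap) can yield a wrong, shorter period. It recovers the combined unit, not pattern1 and pattern2 individually. PeriodPredictor, predict and targetLength are hypothetical names.

```java
class PeriodPredictor {
    // Shortest period p of s: the smallest p with s[i] == s[i - p] for every i >= p.
    // It equals s.length() minus the longest proper border, taken from the KMP prefix function.
    static int shortestPeriod(String s) {
        int n = s.length();
        if (n == 0) return 0;
        int[] pi = new int[n]; // pi[i] = length of the longest proper border of s[0..i]
        for (int i = 1; i < n; i++) {
            int k = pi[i - 1];
            while (k > 0 && s.charAt(i) != s.charAt(k)) k = pi[k - 1];
            if (s.charAt(i) == s.charAt(k)) k++;
            pi[i] = k;
        }
        return n - pi[n - 1];
    }

    // Extends the cut-off string to targetLength by continuing its shortest period.
    static String predict(String cutoff, int targetLength) {
        int p = shortestPeriod(cutoff);
        StringBuilder sb = new StringBuilder(cutoff);
        if (p == 0) return sb.toString(); // empty input: nothing to continue
        while (sb.length() < targetLength) {
            sb.append(sb.charAt(sb.length() - p)); // copy the character one period back
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // the question's cut-off result without the "|" separators
        System.out.println(predict("12341234567891234123456", 39));
        // 123412345678912341234567891234123456789
    }
}
```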
