I am trying to count all occurrences in the string .PBAAP.B of the letters P, A, B in that order, where consecutive letters can have 1 to 3 symbols (inclusive) between them.
The output should be 2, for these two matches:
.P.A...B
.P..A..B
What I've tried so far is
return (int) Pattern
.compile("P.{0,2}A.{0,2}B")
.matcher(C)
.results()
.count();
But I only get output 1. My guess is that in both cases, the group is PBAAP.B. So instead of 2, I get 1.
I could write an elaborate function to achieve what I am trying to do, but I was wondering if there was a way to do it with regex.
int count = 0;
// X and Y are the minimum and maximum offsets between consecutive letters (defined elsewhere)
for (int i = 0; i < C.length(); i++) {
    String p = Character.toString(C.charAt(i));
    if (p.equalsIgnoreCase("P")) {
        for (int j = X; j <= Y; j++) {
            if (i + j < C.length()) {
                String a = Character.toString(C.charAt(i + j));
                if (a.equals("A")) {
                    for (int k = X; k <= Y; k++) {
                        if (i + j + k < C.length()) {
                            String b = Character.toString(C.charAt(i + j + k));
                            if (b.equalsIgnoreCase("B")) {
                                count++;
                            }
                        }
                    }
                }
            }
        }
    }
}
return count;
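For reference, the nested-loop idea above can be written as a small self-contained method. This is only a sketch: the class name GapCounter and the parameters minGap and maxGap (the number of symbols allowed between consecutive letters) are names made up for illustration, and the comparison is case-sensitive.

```java
final class GapCounter {
    /**
     * Counts index triples (p, a, b) with s[p] == 'P', s[a] == 'A', s[b] == 'B',
     * where minGap..maxGap other symbols lie between consecutive letters.
     */
    static int count(String s, int minGap, int maxGap) {
        int count = 0;
        for (int p = 0; p < s.length(); p++) {
            if (s.charAt(p) != 'P') continue;
            // An 'A' with g symbols in between sits at index p + g + 1.
            for (int a = p + minGap + 1; a <= p + maxGap + 1 && a < s.length(); a++) {
                if (s.charAt(a) != 'A') continue;
                for (int b = a + minGap + 1; b <= a + maxGap + 1 && b < s.length(); b++) {
                    if (s.charAt(b) == 'B') count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(count(".PBAAP.B", 1, 3)); // prints 2
    }
}
```

Because every (P, A, B) combination is visited separately, two matches that share the same P are both counted, which a single pass over non-overlapping regex matches does not do.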
To my knowledge, a regex match only gives you a boolean answer: either it is a match or it isn't. Thus, I can't think of a way to solve your problem using regex.
count() is useful if you want to check whether there are multiple matches at different indices, which is not what you want. For instance, the following snippet returns 2, because th is found at indices 11-12 and 20-21:
Pattern
.compile("(th)")
.matcher("let's test this together")
.results()
.count();
To keep the non-regex solution readable and extensible, you may want to use recursion.
Class to keep track of the latest match of each letter:
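public class Letter {
    private final char letter;
    private int latestMatch;
    public Letter(char letter, int latestMatch) { this.letter = letter; this.latestMatch = latestMatch; }
    // accessors used by MatchFinder
    public Character getLowercaseLetter() { return Character.toLowerCase(letter); }
    public int getLatestMatch() { return latestMatch; }
    public void setLatestMatch(int latestMatch) { this.latestMatch = latestMatch; }
}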
Class to detect different matches:
public class MatchFinder {
final static String C = "......PAA.BBBadsfjksPeAkBB";
final static Letter[] LETTERS = new Letter[3];
final static int SYMBOLS_THRESHOLD = 4;
public static void main(String[] args) {
LETTERS[0] = new Letter('P', -1);
LETTERS[1] = new Letter('A', -1);
LETTERS[2] = new Letter('B', -1);
int count = countMatches(0, C.length(), 0, 0, -1);
System.out.println(count);
}
public static int countMatches(int start, int end, int letterIndex, int currentCount, int latestMatch) {
for (int i = start; i < end; i++) {
if (i < C.length()) {
Character c = Character.toLowerCase(C.charAt(i));
if (c.equals(LETTERS[letterIndex].getLowercaseLetter()) && i > latestMatch) {
LETTERS[letterIndex].setLatestMatch(i);
if (letterIndex + 1 < LETTERS.length) {
int childStart = LETTERS[letterIndex].getLatestMatch() + 1;
return countMatches(childStart, childStart + SYMBOLS_THRESHOLD, letterIndex + 1, currentCount, -1);
}
currentCount++;
}
}
}
if (letterIndex > 0) {
int parentLetterIndex = letterIndex - 1;
int latestParentMatch = LETTERS[parentLetterIndex].getLatestMatch();
if (letterIndex > 1) {
int parentStart = LETTERS[letterIndex - 2].getLatestMatch() + 1;
return countMatches(parentStart, parentStart + SYMBOLS_THRESHOLD, parentLetterIndex, currentCount, latestParentMatch);
} else {
return countMatches(LETTERS[parentLetterIndex].getLatestMatch() + 1, C.length(), 0, currentCount, latestParentMatch);
}
}
return currentCount;
}
}
I have to create an N * M matrix and fill it with values between 0 and 9. One of the cells should hold "A", which is the starting point of the graph, and I should find the shortest path to the cell holding "B" (both are placed at random positions in the matrix). A value of 0 counts as an obstacle, and 2 < N, M < 100.
I have to print out the exact route of the shortest path and its total cost. Also, the problem has to be solved with Dijkstra's algorithm.
I haven't gotten past filling up the matrix. I store the values in a 2D String array, but I think I should use different arrays or maybe Maps for storing the positions of key values such as the start and end point. I've been thinking about this for 2 days now because I'm a total noob in Java and not much better at programming in general. I'm mainly looking for guidance on how to store the data and what I should actually store in order to get to the end, because I think I am overcomplicating the problem.
This is the matrix generating part of the code.
int N = ThreadLocalRandom.current().nextInt(3,7);
int M = ThreadLocalRandom.current().nextInt(3,7);
int J = ThreadLocalRandom.current().nextInt(0,(Math.min(N, M))/2);
int K = 0;
int aPosX = ThreadLocalRandom.current().nextInt(0,N);
int aPosY = ThreadLocalRandom.current().nextInt(0,M);
int bPosX = ThreadLocalRandom.current().nextInt(0,N);
int bPosY = ThreadLocalRandom.current().nextInt(0,M);
String[][] matrix = new String[N][M];
int[][] map = new int[N][M];
int shortestPath = 10;
int currentPosX,currentPosY;
int shortestPosX, shortestPosY;
public void generateMatrix(){
for (int i = 0; i < N; i++) {
for (int j = 0; j < M; j++) {
K = ThreadLocalRandom.current().nextInt(0,10);
matrix[i][j] = String.valueOf(K);
}
}
}
public void createStartAndFinish(){
matrix[aPosX][aPosY] = "A";
matrix[bPosX][bPosY] = "B";
}
}
This part finds the lowest-cost adjacent tile and steps onto it, but it throws an out-of-bounds exception. I'm also aware that it has nothing to do with Dijkstra's algorithm, but this is my starting point.
public void solveMatrix(){
visited[aPosX][aPosY] = true;
currentPosX = aPosX;
currentPosY = aPosY;
while (!matrix[currentPosX - 1][currentPosY].equals("B") ||
!matrix[currentPosX + 1][currentPosY].equals("B") ||
!matrix[currentPosX][currentPosY - 1].equals("B") ||
!matrix[currentPosX][currentPosY + 1].equals("B")) {
if(currentPosX > 0) {
if(!visited[currentPosX - 1][currentPosY] && Integer.parseInt(matrix[currentPosX - 1][currentPosY]) < shortestPath) {
shortestPath = Integer.parseInt(matrix[currentPosX - 1][currentPosY]);
shortestPosX = currentPosX - 1;
shortestPosY = currentPosY;
}
}
if(currentPosX + 1 < N){
if(!visited[currentPosX + 1][currentPosY] && Integer.parseInt(matrix[currentPosX + 1][currentPosY]) < shortestPath) {
shortestPath = Integer.parseInt(matrix[currentPosX + 1][currentPosY]);
shortestPosX = currentPosX + 1;
shortestPosY = currentPosY;
}
}
if(currentPosY > 0){
if(!visited[currentPosX][currentPosY - 1] && Integer.parseInt(matrix[currentPosX][currentPosY - 1]) < shortestPath) {
shortestPath = Integer.parseInt(matrix[currentPosX][currentPosY - 1]);
shortestPosX = currentPosX;
shortestPosY = currentPosY - 1;
}
}
if(currentPosY - 1 < M){
if(!visited[currentPosX][currentPosY + 1] && Integer.parseInt(matrix[currentPosX][currentPosY + 1]) < shortestPath) {
shortestPath = Integer.parseInt(matrix[currentPosX][currentPosY + 1]);
shortestPosX = currentPosX;
shortestPosY = currentPosY + 1;
}
}
visited[shortestPosX][shortestPosY] = true;
currentPosX = shortestPosX;
currentPosY = shortestPosY;
System.out.println(shortestPosX + " " + shortestPosY + " " + shortestPath);
shortestPath = 10;
}
}
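On the storage question, one possible layout is to keep the costs in an int[][] and remember A and B only as coordinates, instead of writing them into a String matrix. The following is a rough sketch of Dijkstra's algorithm on such a grid, not a definitive solution. It assumes four-directional movement, that the cost of a step is the value of the cell being entered, and that entering B costs nothing because B replaces its digit. The names GridDijkstra, Result and solve are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

final class GridDijkstra {
    /** Total cost plus the cells ({row, col}) visited from start to target. */
    static final class Result {
        final int cost;
        final List<int[]> path;
        Result(int cost, List<int[]> path) { this.cost = cost; this.path = path; }
    }

    private static final int[] DR = {-1, 1, 0, 0};
    private static final int[] DC = {0, 0, -1, 1};

    /**
     * grid[r][c] is the cost of stepping onto a cell; 0 marks an obstacle.
     * Assumption: entering the target is free. Returns null if it is unreachable.
     */
    static Result solve(int[][] grid, int sr, int sc, int tr, int tc) {
        int n = grid.length, m = grid[0].length;
        int[][] dist = new int[n][m];
        int[][] prev = new int[n][m];             // predecessor encoded as r * m + c
        for (int[] row : dist) Arrays.fill(row, Integer.MAX_VALUE);
        for (int[] row : prev) Arrays.fill(row, -1);

        // Queue entries are {distance, row, col}, smallest distance first.
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        dist[sr][sc] = 0;
        pq.add(new int[] {0, sr, sc});

        while (!pq.isEmpty()) {
            int[] cur = pq.poll();
            int d = cur[0], r = cur[1], c = cur[2];
            if (d > dist[r][c]) continue;         // stale queue entry
            if (r == tr && c == tc) break;        // target is settled
            for (int k = 0; k < 4; k++) {
                int nr = r + DR[k], nc = c + DC[k];
                if (nr < 0 || nr >= n || nc < 0 || nc >= m) continue; // check bounds first
                boolean isTarget = nr == tr && nc == tc;
                if (!isTarget && grid[nr][nc] == 0) continue;         // obstacle
                int nd = d + (isTarget ? 0 : grid[nr][nc]);
                if (nd < dist[nr][nc]) {
                    dist[nr][nc] = nd;
                    prev[nr][nc] = r * m + c;
                    pq.add(new int[] {nd, nr, nc});
                }
            }
        }
        if (dist[tr][tc] == Integer.MAX_VALUE) return null;

        // Follow the predecessor links back from the target, then reverse.
        List<int[]> path = new ArrayList<>();
        for (int at = tr * m + tc; at != -1; at = prev[at / m][at % m]) {
            path.add(new int[] {at / m, at % m});
        }
        Collections.reverse(path);
        return new Result(dist[tr][tc], path);
    }

    public static void main(String[] args) {
        int[][] grid = {{1, 9, 1}, {1, 0, 1}, {1, 1, 1}};
        Result r = solve(grid, 0, 0, 0, 2);
        System.out.println("cost = " + r.cost);
        for (int[] cell : r.path) System.out.println(cell[0] + " " + cell[1]);
    }
}
```

Checking the array bounds before touching a neighbour is what avoids the out-of-bounds exception, and the prev array is what makes it possible to print the exact route afterwards.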
I am trying to find patterns that:
occur more than once
are more than 1 character long
are not substrings of any other known pattern
without knowing any of the patterns that might occur.
For example:
The string "the boy fell by the bell" would return 'ell', 'the b', 'y '.
The string "the boy fell by the bell, the boy fell by the bell" would return 'the boy fell by the bell'.
Using double for-loops, it can be brute forced very inefficiently:
ArrayList<String> patternsList = new ArrayList<>();
int length = string.length();
for (int i = 0; i < length; i++) {
int limit = (length - i) / 2;
for (int j = limit; j >= 1; j--) {
int candidateEndIndex = i + j;
String candidate = string.substring(i, candidateEndIndex);
if(candidate.length() <= 1) {
continue;
}
if (string.substring(candidateEndIndex).contains(candidate)) {
boolean notASubpattern = true;
for (String pattern : patternsList) {
if (pattern.contains(candidate)) {
notASubpattern = false;
break;
}
}
if (notASubpattern) {
patternsList.add(candidate);
}
}
}
}
However, this is incredibly slow when searching large strings with tons of patterns.
You can build a suffix tree for your string in linear time:
https://en.wikipedia.org/wiki/Suffix_tree
The patterns you are looking for are the strings corresponding to internal nodes that have only leaf children.
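To get a feel for the idea without implementing a linear-time suffix tree, here is a deliberately naive Java sketch that sorts all suffixes (a simple suffix array) and compares neighbours. It is a swapped-in, slower technique, and it only reports the single longest repeated substring rather than every pattern the question asks for. The class and method names are made up for this example.

```java
import java.util.Arrays;

final class RepeatSketch {
    /** Length of the common prefix of the suffixes starting at i and j. */
    static int commonPrefix(String s, int i, int j) {
        int len = 0;
        while (i + len < s.length() && j + len < s.length()
                && s.charAt(i + len) == s.charAt(j + len)) {
            len++;
        }
        return len;
    }

    /** Longest substring that occurs at least twice (occurrences may overlap). */
    static String longestRepeat(String s) {
        Integer[] suffixes = new Integer[s.length()];
        for (int i = 0; i < suffixes.length; i++) suffixes[i] = i;
        // Naive construction: compares whole suffixes, far from linear time.
        Arrays.sort(suffixes, (a, b) -> s.substring(a).compareTo(s.substring(b)));

        String best = "";
        for (int k = 1; k < suffixes.length; k++) {
            // Repeated substrings show up as shared prefixes of adjacent suffixes.
            int len = commonPrefix(s, suffixes[k - 1], suffixes[k]);
            if (len > best.length()) {
                best = s.substring(suffixes[k], suffixes[k] + len);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestRepeat("the boy fell by the bell")); // prints "the b"
    }
}
```

A real suffix tree or suffix array library would replace the sorting step for large inputs.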
You could use n-grams to find patterns in a string. It would take O(n) time to scan the string for n-grams. When you find a substring by using an n-gram, put it into a hash table with a count of how many times that substring was found in the string. When you're done searching for n-grams in the string, search the hash table for counts greater than 1 to find recurring patterns in the string.
For example, in the string "the boy fell by the bell, the boy fell by the bell" using a 6-gram will find the substring "the boy fell by the bell". A hash table entry with that substring will have a count of 2 because it occurred twice in the string. Varying the number of words in the n-gram will help you discover different patterns in the string.
// requires: using System.Collections.Generic; using System.Text;
Dictionary<string, int> dict = new Dictionary<string, int>();
int count = 0;
int ngramcount = 6; // number of words per n-gram
// Add entries to the hash table
while (count < str.Length) {
    // copy the next ngramcount words into the substring
    StringBuilder sb = new StringBuilder();
    int wordsLeft = ngramcount;
    while (wordsLeft > 0 && count < str.Length) {
        sb.Append(str[count]);
        if (str[count] == ' ')
            wordsLeft--;
        count++;
    }
    // strings are immutable, so keep the trimmed result (drops the last blank)
    string substring = sb.ToString().Trim();
    // Update the dictionary (hash table) with the substring
    if (dict.ContainsKey(substring)) // substring is already in hash table so increment the count
        dict[substring]++;
    else
        dict[substring] = 1;
}
// Find the most commonly occurring pattern in the string
// by searching the hash table for the greatest count.
int maxCount = 0;
string mostCommonPattern = "";
foreach (KeyValuePair<string, int> pair in dict) {
if (pair.Value > maxCount) {
maxCount = pair.Value;
mostCommonPattern = pair.Key;
}
}
I've written this just for fun. I hope I have understood the problem correctly, this is valid and fast enough; if not, please be easy on me :) I might optimize it a little more I guess, if someone finds it useful.
private static IEnumerable<string> getPatterns(string txt)
{
char[] arr = txt.ToArray();
BitArray ba = new BitArray(arr.Length);
for (int shingle = getMaxShingleSize(arr); shingle >= 2; shingle--)
{
char[] arr1 = new char[shingle];
int[] indexes = new int[shingle];
HashSet<int> hs = new HashSet<int>();
Dictionary<int, int[]> dic = new Dictionary<int, int[]>();
for (int i = 0, count = arr.Length - shingle; i <= count; i++)
{
for (int j = 0; j < shingle; j++)
{
int index = i + j;
arr1[j] = arr[index];
indexes[j] = index;
}
int h = getHashCode(arr1);
if (hs.Add(h))
{
int[] indexes1 = new int[indexes.Length];
Buffer.BlockCopy(indexes, 0, indexes1, 0, indexes.Length * sizeof(int));
dic.Add(h, indexes1);
}
else
{
bool exists = false;
foreach (int index in indexes)
if (ba.Get(index))
{
exists = true;
break;
}
if (!exists)
{
int[] indexes1 = dic[h];
if (indexes1 != null)
foreach (int index in indexes1)
if (ba.Get(index))
{
exists = true;
break;
}
}
if (!exists)
{
foreach (int index in indexes)
ba.Set(index, true);
int[] indexes1 = dic[h];
if (indexes1 != null)
foreach (int index in indexes1)
ba.Set(index, true);
dic[h] = null;
yield return new string(arr1);
}
}
}
}
}
private static int getMaxShingleSize(char[] arr)
{
for (int shingle = 2; shingle <= arr.Length / 2 + 1; shingle++)
{
char[] arr1 = new char[shingle];
HashSet<int> hs = new HashSet<int>();
bool noPattern = true;
for (int i = 0, count = arr.Length - shingle; i <= count; i++)
{
for (int j = 0; j < shingle; j++)
arr1[j] = arr[i + j];
int h = getHashCode(arr1);
if (!hs.Add(h))
{
noPattern = false;
break;
}
}
if (noPattern)
return shingle - 1;
}
return -1;
}
private static int getHashCode(char[] arr)
{
unchecked
{
int hash = (int)2166136261;
foreach (char c in arr)
hash = (hash * 16777619) ^ c.GetHashCode();
return hash;
}
}
Edit
My previous code has serious problems. This one is better:
private static IEnumerable<string> getPatterns(string txt)
{
Dictionary<int, int> dicIndexSize = new Dictionary<int, int>();
for (int shingle = 2, count0 = txt.Length / 2 + 1; shingle <= count0; shingle++)
{
Dictionary<string, int> dic = new Dictionary<string, int>();
bool patternExists = false;
for (int i = 0, count = txt.Length - shingle; i <= count; i++)
{
string sub = txt.Substring(i, shingle);
if (!dic.ContainsKey(sub))
dic.Add(sub, i);
else
{
patternExists = true;
int index0 = dic[sub];
if (index0 >= 0)
{
dicIndexSize[index0] = shingle;
dic[sub] = -1;
}
}
}
if (!patternExists)
break;
}
List<int> lst = dicIndexSize.Keys.ToList();
lst.Sort((a, b) => dicIndexSize[b].CompareTo(dicIndexSize[a]));
BitArray ba = new BitArray(txt.Length);
foreach (int i in lst)
{
bool ok = true;
int len = dicIndexSize[i];
for (int j = i, max = i + len; j < max; j++)
{
if (ok) ok = !ba.Get(j);
ba.Set(j, true);
}
if (ok)
yield return txt.Substring(i, len);
}
}
The text of this book took 3.4 s on my computer.
Suffix arrays are the right idea, but there's a non-trivial piece missing, namely, identifying what are known in the literature as "supermaximal repeats". Here's a GitHub repo with working code: https://github.com/eisenstatdavid/commonsub . Suffix array construction uses the SAIS library, vendored in as a submodule. The supermaximal repeats are found using a corrected version of the pseudocode from findsmaxr in "Efficient repeat finding via suffix arrays" (Becher–Deymonnaz–Heiber).
static void FindRepeatedStrings(void) {
// findsmaxr from https://arxiv.org/pdf/1304.0528.pdf
printf("[");
bool needComma = false;
int up = -1;
for (int i = 1; i < Len; i++) {
if (LongCommPre[i - 1] < LongCommPre[i]) {
up = i;
continue;
}
if (LongCommPre[i - 1] == LongCommPre[i] || up < 0) continue;
for (int k = up - 1; k < i; k++) {
if (SufArr[k] == 0) continue;
unsigned char c = Buf[SufArr[k] - 1];
if (Set[c] == i) goto skip;
Set[c] = i;
}
if (needComma) {
printf("\n,");
}
printf("\"");
for (int j = 0; j < LongCommPre[up]; j++) {
unsigned char c = Buf[SufArr[up] + j];
if (iscntrl(c)) {
printf("\\u%.4x", c);
} else if (c == '\"' || c == '\\') {
printf("\\%c", c);
} else {
printf("%c", c);
}
}
printf("\"");
needComma = true;
skip:
up = -1;
}
printf("\n]\n");
}
Here's a sample output on the text of the first paragraph:
Davids-MBP:commonsub eisen$ ./repsub input
["\u000a"
," S"
," as "
," co"
," ide"
," in "
," li"
," n"
," p"
," the "
," us"
," ve"
," w"
,"\""
,"–"
,"("
,")"
,". "
,"0"
,"He"
,"Suffix array"
,"`"
,"a su"
,"at "
,"code"
,"com"
,"ct"
,"do"
,"e f"
,"ec"
,"ed "
,"ei"
,"ent"
,"ere's a "
,"find"
,"her"
,"https://"
,"ib"
,"ie"
,"ing "
,"ion "
,"is"
,"ith"
,"iv"
,"k"
,"mon"
,"na"
,"no"
,"nst"
,"ons"
,"or"
,"pdf"
,"ri"
,"s are "
,"se"
,"sing"
,"sub"
,"supermaximal repeats"
,"te"
,"ti"
,"tr"
,"ub "
,"uffix arrays"
,"via"
,"y, "
]
I would use the Knuth–Morris–Pratt algorithm (linear time complexity, O(n)) to find substrings. I would try to find the largest substring pattern, remove it from the input string, then try to find the second largest, and so on. I would do something like this:
string pattern = input.substring(0, input.length / 2);
string toMatchString = input.substring(pattern.length, input.length - 1);
List<string> matches = new List<string>();
while (pattern.length > 0)
{
    // assuming KMP returns the index of the first match, or -1 if there is none
    int index = KMP(pattern, toMatchString);
    if (index >= 0)
    {
        matches.Add(pattern);
        // remove the matched pattern occurrences from the input string
        // I would do something like this:
        // 0 to pattern.length gets removed
        // check for all occurrences of pattern in toMatchString and remove them
        // get the remaining shrunk input, reassign values for pattern & toMatchString
        // keep looking for the next largest substring
    }
    else
    {
        pattern = input.substring(0, pattern.length - 1);
        toMatchString = input.substring(pattern.length, input.length - 1);
    }
}
Here KMP implements the Knuth–Morris–Pratt algorithm. You can find Java implementations of it on GitHub or at Princeton, or write it yourself.
PS: I don't code in Java, and this is a quick attempt at my first bounty, which is about to close. So please don't give me the stick if I missed something trivial or made an off-by-one error.
I have to complete an exercise on creating a class. Below is the professor's solution. In the sum and product methods (somma and prodotto), I cannot quite figure out what "A" is for and why it is used.
class Vettore {
private int[] V = new int[6];
public Vettore(int[] X) {
if (X.length != 6)
throw new BadDataException();
for (int i = 0; i < 6; i++)
if (X[i] < 0)
throw new BadDataException();
else
V[i] = X[i];
}
public Vettore() {}
public Vettore somma(Vettore X) {
int[] A = new int[6];
for (int i = 0; i < 6; i++)
A[i] = V[i] + X.V[i];
return new Vettore(A);
}
public Vettore prodotto(Vettore X) {
int k = 0;
for (int i = 0; i < 6; i++)
k += V[i] * X.V[i];
return k;
}
public int get(int i) {
if (i < 0 || i > 5)
throw new BadDataException();
return V[i];
}
public String toString() {
String t = "( ";
for (int i = 0; i < 6; i++)
t += V[i] + (i == 5 ? " " : ", ");
return t + ")";
}
public boolean equals(Vettore X) {
for (int i = 0; i < 6; i++)
if (V[i] != X.V[i])
return false;
return true;
}
}
As far as I can see, and assuming somma means sum and prodotto means product, A is needed because you have to store the sum of the V and X.V arrays for every index. Without a separate array for this, you could not add the corresponding elements in somma and return them as a new Vettore. As I see it, this method adds the corresponding elements of the two arrays.
EDIT: another thing. Are you sure that the return types match the variables being returned? I explained the use of somma but overlooked that prodotto has the wrong return type, as was pointed out in the comments.
You might want to correct the prodotto method definition as follows:
public Vettore prodotto(Vettore X) {
    int[] K = new int[6]; // default values are 0
    for (int i = 0; i < 6; i++)
        K[i] += V[i] * X.V[i];
    return new Vettore(K);
}
This evaluates the element-wise product of the array field V for two instances of class Vettore, namely X (the input parameter) and the current instance on which the method is called.
return new Vettore(K); creates a new instance of the Vettore class with K as its array field, executing the constructor logic as follows:
public Vettore(int[] X) {
if (X.length != 6) { // length of the array is 6 or not
throw new BadDataException();
}
for (int i = 0; i < 6; i++) {
if (X[i] < 0) { // all the elements of array are >=0 or not
throw new BadDataException();
} else {
V[i] = X[i]; // the field on the new instance
}
}
}
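To make the role of the temporary array concrete, here is a small self-contained sketch. It is an illustration under assumptions, not the professor's class: the name VettoreSketch is invented, IllegalArgumentException stands in for BadDataException (whose definition is not shown), and the product is written as a scalar product returning int, which is what the original loop with k computes.

```java
final class VettoreSketch {
    private final int[] v = new int[6];

    VettoreSketch(int[] x) {
        if (x.length != 6) throw new IllegalArgumentException("expected 6 elements");
        for (int i = 0; i < 6; i++) {
            if (x[i] < 0) throw new IllegalArgumentException("negative element");
            v[i] = x[i];
        }
    }

    /** Element-wise sum: the scratch array a collects one result per index. */
    VettoreSketch somma(VettoreSketch other) {
        int[] a = new int[6];
        for (int i = 0; i < 6; i++) a[i] = v[i] + other.v[i];
        return new VettoreSketch(a); // neither operand is modified
    }

    /** Scalar product: a single number, so the return type is int. */
    int prodottoScalare(VettoreSketch other) {
        int k = 0;
        for (int i = 0; i < 6; i++) k += v[i] * other.v[i];
        return k;
    }

    int get(int i) { return v[i]; }

    public static void main(String[] args) {
        VettoreSketch a = new VettoreSketch(new int[] {1, 2, 3, 4, 5, 6});
        VettoreSketch b = new VettoreSketch(new int[] {1, 1, 1, 1, 1, 1});
        System.out.println(a.somma(b).get(0));    // prints 2
        System.out.println(a.prodottoScalare(b)); // prints 21
    }
}
```

Whether the exercise wants the scalar product (int) or the element-wise product (a new Vettore) decides which return type is the right one.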
I have two ArrayLists of String type and want to mix them as follows:
SPK = [A,A,A,B,A,A,A,A,A,B] and
DA= [ofm,sd,sd,sd,sd,sd,sd,sd,sd,sv]
I need to build Strings in another ArrayList, as below:
SPK_DA = [ofmAsdAsdAB, sdBA, sdAsdAsdAsdAsdAB]
That is, I need to concatenate the consecutive elements that share the same speaker until a turn change (for example, from A to B) occurs in the SPK array.
I wrote a program, but it adds one extra sdA (I don't know why I can't do such a simple thing).
for (int i=0; i <SPK.size()-1; i++){
if (SPK.get(i)==SPK.get(i+1) && (i+1)<= SPK.size()){
speakerChain = DA.get(i)+SPK.get(i);
speakerChain1=DA.get(i+1)+SPK.get(i+1);
SPKTrace.add(speakerChain);
SPKTrace.add(speakerChain1);
}else if (SPK.get(i)!=SPK.get(i+1)){
if (SPKTrace.size()!=0){
SPKTrace.add(SPK.get(i+1));
//SPKString = removeDuplicates (SPKTrace);
String S1 = arrayTostring(SPKTrace);
SPKResource.add(S1);
SPKTrace.clear();
}else {
SPKTrace.add(DA.get(i)+SPK.get(i)+SPK.get(i+1));
//SPKString = removeDuplicates (SPKTrace);
String S1 = arrayTostring(SPKTrace);
SPKResource.add(S1);
SPKTrace.clear();
}
}
}
}
}
System.out.println(SPKResource.toString());
My Output: [ofmAsdAsdAsdAB, sdBA, sdAsdAsdAsdAsdAsdAsdAsdAB]
When I use a for loop like the following, it avoids the extra sdAs:
Indicies.add(0);
for (int i = 0; i < SPK.size() - 1; i++) {
if (SPK.get(i) != SPK.get(i + 1)) {
Indicies.add(i + 1);
}
}
for (int i = 0; i < Indicies.size() - 1; i++) {
Count.add(Indicies.get(i + 1) - Indicies.get(i));
}
Count.add((SPK.size() - Indicies.get(Indicies.size() - 1)));
System.out.println("count:" + Count);
int counter = 0;
int newIndex =0;
for (int j = 1; j <= Count.size(); j++) {
String element = "";
for (int kk = 0; kk < (Count.get(j-1)); kk++) {
element = element + (DA.get(kk+newIndex) + SPK.get(kk+newIndex));
}
newIndex = newIndex+Count.get(j-1);
if (element.endsWith("A")){
SPKResource.add(element+"B");
} else if (element.endsWith("B")){
SPKResource.add(element+"A");
}
}
}
}
for (String S:SPKResource){
System.out.println(SPKResource);
}
The above code gives me the answer, but I think it is quite inefficient. Is there a way to make it more efficient?
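One way this could be done in a single pass is sketched below. It assumes that SPK and DA have the same length, and that a final run with no following turn change (here svB) is dropped, since the expected output in the question omits it. The names TurnMixer and mix are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TurnMixer {
    /**
     * Concatenates DA + SPK pairs while the speaker stays the same; at each
     * speaker change the next speaker is appended and the chunk is emitted.
     */
    static List<String> mix(List<String> spk, List<String> da) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < spk.size(); i++) {
            current.append(da.get(i)).append(spk.get(i));
            // Compare strings with equals(), not == or !=.
            boolean turnChange = i + 1 < spk.size() && !spk.get(i).equals(spk.get(i + 1));
            if (turnChange) {
                current.append(spk.get(i + 1));
                result.add(current.toString());
                current.setLength(0); // start the next chunk
            }
        }
        return result; // a trailing run without a turn change is not emitted
    }

    public static void main(String[] args) {
        List<String> spk = Arrays.asList("A", "A", "A", "B", "A", "A", "A", "A", "A", "B");
        List<String> da = Arrays.asList("ofm", "sd", "sd", "sd", "sd", "sd", "sd", "sd", "sd", "sv");
        System.out.println(mix(spk, da)); // prints [ofmAsdAsdAB, sdBA, sdAsdAsdAsdAsdAB]
    }
}
```

Each element is visited once and no index or count lists are needed, so the work is linear in the size of the input.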