Getting common string pairs in two string list - java

Hi, I am counting the common elements of two lists.
Here is my code:
public static int getMatchCount(List<String> listOne, List<String> listTwo) {
String valueOne = "";
String valueTwo = "";
int matchCount = 0;
boolean isMatchedOnce=false;
for (int i = 0; i < listOne.size(); i++) {
valueOne = listOne.get(i);
isMatchedOnce=false;
if (StringUtils.isBlank(valueOne))
continue;
for (int j = 0; j < listTwo.size(); j++) {
valueTwo = listTwo.get(j);
if (StringUtils.isBlank(valueTwo))
continue;
if (valueTwo.equals(valueOne) && (!isMatchedOnce)) {
matchCount++;
listOne.set(i, "");
listTwo.set(j, "");
isMatchedOnce=true;
}
}
}
return matchCount;
}
For example:
listOne listTwo
A A
A B
B
The result is 2, not 3,
because only two common pairs can be taken out.
But the method is very slow. Is there any improvement that would make it faster?

This should be an easier workaround:
List<String> listOne = new ArrayList<String>();
//add elements
List<String> listTwo= new ArrayList<String>();
//add elements
List<String> commonList = new ArrayList<String>(listTwo);
commonList.retainAll(listOne);
int commonListSize = commonList.size();

Use an interim Collection and addAll(), retainAll():
Set<String> set = new HashSet<String>();
set.addAll(list1);
set.retainAll(list2);
int count = set.size();
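Note that the two retainAll variants above do not pair elements off one-to-one: the List version keeps every element of listTwo that occurs anywhere in listOne, and the Set version counts distinct values only. A minimal sketch of a pairwise count in linear time follows. It assumes that blank values should be skipped as in the question (approximated here with trim().isEmpty() instead of StringUtils.isBlank), and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PairCount {

    // Counts how many one-to-one pairs of equal strings can be formed
    // between the two lists. Neither list is modified.
    static int countCommonPairs(List<String> listOne, List<String> listTwo) {
        // Tally the non-blank values of the first list.
        Map<String, Integer> remaining = new HashMap<>();
        for (String value : listOne) {
            if (value != null && !value.trim().isEmpty()) {
                remaining.merge(value, 1, Integer::sum);
            }
        }
        // Each value of the second list consumes one tallied occurrence, if any is left.
        int matchCount = 0;
        for (String value : listTwo) {
            Integer left = remaining.get(value);
            if (left != null && left > 0) {
                remaining.put(value, left - 1);
                matchCount++;
            }
        }
        return matchCount;
    }

    public static void main(String[] args) {
        // The example from the question: prints 2
        System.out.println(countCommonPairs(
                Arrays.asList("A", "A", "B"), Arrays.asList("A", "B")));
    }
}
```

Each list is traversed once, so the running time grows with listOne.size() + listTwo.size() rather than with their product.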

Maybe you can try this:
public static int getMatchCount(List<String> listOne, List<String> listTwo) {
String valueOne;
String valueTwo;
int matchCount = 0;
boolean isMatchedOnce;
//for (int i = 0; i < listOne.size(); i++) {
for(String i : listOne){
valueOne = i;
isMatchedOnce = false;
if (StringUtils.isBlank(valueOne)) {
continue;
}
for (String j : listTwo) {
valueTwo = j;
if (StringUtils.isBlank(valueTwo)) {
continue;
}
if (valueTwo.equals(valueOne) && (!isMatchedOnce)) {
matchCount++;
listOne.set(listOne.indexOf(i), "");
listTwo.set(listTwo.indexOf(j), ""); // look the value up in listTwo, not listOne
isMatchedOnce = true;
}
}
}
return matchCount;
}

Related

Sorted String with input characters pattern

I have a method sortString that has to sort the characters of a string according to a pattern that I supply.
For example:
if the pattern is List<Character> pattern = Arrays.asList('a', 'b', 'c', 'd', 'z') and the inputString is "bbacrrt",
sortString has to return "abbcrrt". Symbols that are not included in the pattern have to be added at the end of the returned string; the order of these symbols doesn't matter.
Ideally the complexity should be O(n).
private static String sortString(String inputString, List<Character> pattern) {
List<Character> inputlist = convert(inputString); // method convert create List<Character> from String
List<Character> returnedList = new ArrayList<>(Collections.nCopies(inputlist.size(), ' '));
Map<Character, Integer> map = new HashMap<>();
boolean isAdded = false;
for (int i = 0; i < pattern.size(); i++) {
map.put(pattern.get(i), i);
}
for (int i = 0; i < inputlist.size(); i++) {
for (int j = 0; j < pattern.size(); j++) {
if (inputlist.get(i) == pattern.get(j)) {
if (returnedList.get(map.get(pattern.get(j))) == pattern.get(j)) {
returnedList.add(map.get(pattern.get(j)), inputlist.get(i));
} else {
if (map.get(pattern.get(j)) - 1 < 0) {
returnedList.add(map.get(pattern.get(j)), inputlist.get(i));
} else {
returnedList.add(map.get(pattern.get(j)) + 1, inputlist.get(i));
}
}
isAdded = true;
}
}
if (!isAdded) {
returnedList.add(inputlist.get(i));
}
isAdded = false;
}
return returnedList.toString();
}
Could you help me?
My initial thought was to use a HashMap<String, ArrayList> to store the returnList. Use the String part of the HashMap to store each character of the pattern and ArrayList to store each character of the inputList as you walked through that string.
When you want the final output, loop through the HashMap using the pattern as the index.
When sorting characters, it's usually a counting sort. Use a boolean array to mark the characters that are in the pattern. When you build the string, use the boolean array to check whether a character should be appended to the sorted section or the unsorted section, and use the count array to append each character as many times as it occurs.
This is an O(n) solution.
import java.util.*;
public class Main {
public static void main(String[] args) {
List<Character> pat = Arrays.asList('a', 'c', 'z');
String s = "atttbbacrrt";
System.out.println(sort(pat, s));
}
static String sort(List<Character> pat, String s) {
boolean[] mustHave = new boolean[26];
int[] count = new int[26];
for(char c: pat) mustHave[c-'a'] = true;
for(char c: s.toCharArray()) count[c-'a']++;
StringBuilder sorted = new StringBuilder();
StringBuilder unsorted = new StringBuilder();
for(int i = 0; i < 26; i++) {
StringBuilder sb = mustHave[i] ? sorted : unsorted;
for(int j = 0; j < count[i]; j++) sb.append((char)('a'+i));
}
return sorted.toString() + unsorted.toString();
}
}
Output:
aacbbrrtttt
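The program above emits the pattern characters in alphabetical order and assumes lowercase a-z input, so it follows the pattern's own order only when the pattern is itself alphabetical. Below is a sketch of a variant that follows the order in which the pattern lists its characters, still in O(n + p) for an input of length n and a pattern of length p. The class and method names are hypothetical, and characters outside the pattern are appended in their original input order (the question allows any order).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PatternSort {

    static String sortByPattern(String input, List<Character> pattern) {
        // One counter per pattern character.
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : pattern) {
            counts.put(c, 0);
        }
        // Count pattern characters; collect everything else for the tail.
        StringBuilder rest = new StringBuilder();
        for (char c : input.toCharArray()) {
            Integer seen = counts.get(c);
            if (seen != null) {
                counts.put(c, seen + 1);
            } else {
                rest.append(c);
            }
        }
        // Emit the counted characters in pattern order, then the tail.
        StringBuilder result = new StringBuilder(input.length());
        for (char c : pattern) {
            int n = counts.get(c);
            for (int i = 0; i < n; i++) {
                result.append(c);
            }
            counts.put(c, 0); // a character repeated in the pattern is emitted only once
        }
        return result.append(rest).toString();
    }

    public static void main(String[] args) {
        // The example from the question: prints abbcrrt
        System.out.println(sortByPattern("bbacrrt", Arrays.asList('a', 'b', 'c', 'd', 'z')));
    }
}
```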
I hope this will help someone.
This is my solution:
private static String sortString(String input, List<Character> alphabet) {
List<Character> inputlist = convert(input); // method convert return List<Character> from String
Map<String, ArrayList<String>> returnedMap = new HashMap<>();
Map<Integer, String> orderedMap = new HashMap<>();
List<List<String>> listWithSortedValues = new ArrayList<>();
boolean isAdded = false;
for (int i = 0; i < alphabet.size(); i++) {
returnedMap.put(alphabet.get(i).toString(), new ArrayList<>());
orderedMap.put(i ,alphabet.get(i).toString());
}
returnedMap.put("undefined", new ArrayList<>()); //better create CONSTANT private final static String UNDEFINED_PART = "undefined";
orderedMap.put(alphabet.size(), "undefined");
for (int i = 0; i < inputlist.size(); i++) {
for (int j = 0; j < alphabet.size(); j++) {
if (inputlist.get(i).equals(alphabet.get(j))) { // compare Character values, not object references
ArrayList<String> strings = returnedMap.get(inputlist.get(i).toString());
strings.add(alphabet.get(j).toString());
returnedMap.put(alphabet.get(j).toString(), strings);
isAdded = true;
}
}
if (!isAdded) {
ArrayList<String> unsortedValues = returnedMap.get("undefined");
unsortedValues.add(inputlist.get(i).toString());
returnedMap.put("undefined", unsortedValues);
}
isAdded = false;
}
for (int i = 0; i < orderedMap.size(); i++) {
String keyWithValueFromOrderedMap = orderedMap.get(i);
listWithSortedValues.add(returnedMap.get(keyWithValueFromOrderedMap));
}
List<String> returnedList = listWithSortedValues.stream().flatMap(List::stream).collect(Collectors.toList());
return converterListToString(returnedList); // method converterListToString returns a String built from a List<String>
}

Get largest Group of anagrams in an array

For an assignment I have been asked to find the largest group of anagrams in a list. I believe I would have to have an accumulation loop inside of another loop that keeps track of the largest number of items. The problem is that I don't know how to count how many of each anagram I have. I have been able to sort the array into groups based on their anagrams. So from the index 1-3 is one anagram, 4-10 is another, etc. How do I search through and count how many of each anagram I have? Then compare each one to the previous count.
Sample of the code:
public static String[] getLargestAnagramGroup(String[] inputArray) {
ArrayList<String> largestGroupArrayList = new ArrayList<String>();
if (inputArray == null || inputArray.length == 0) { // null check must come first
return new String[0];
}
insertionSort(inputArray, new AnagramComparator());
String[] largestGroupArray = new String[largestGroupArrayList.size()];
largestGroupArrayList.toArray(inputArray);
System.out.println(largestGroupArray);
return largestGroupArray;
}
UPDATE: This is how we solved it. Is there a more efficient way?
public static String[] getLargestAnagramGroup(String[] inputArray) {
int numberOfAnagrams = 0;
int temporary = 1;
int position = -1;
int index = 0;
if (inputArray == null || inputArray.length == 0) { // an empty array has no groups
return new String[0];
}
insertionSort(inputArray, new AnagramComparator());
for (index = 0; index < inputArray.length - 1; index++) {
if (areAnagrams(inputArray[index], inputArray[index + 1])) {
temporary++;
} else {
if (temporary > numberOfAnagrams) {
numberOfAnagrams = temporary;
position = index;
}
// A group just ended, so always restart the run counter (also on ties).
temporary = 1;
}
}
if (temporary > numberOfAnagrams) {
position = index;
numberOfAnagrams = temporary;
}
String[] largestArray = new String[numberOfAnagrams];
for (int startIndex = position - numberOfAnagrams + 1, i = 0; startIndex <= position; startIndex++, i++) {
largestArray[i] = inputArray[startIndex];
}
return largestArray;
}
Here is a piece of code to help you out.
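import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;
public class AnagramTest {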
public static void main(String[] args) {
String[] input = {"test", "ttes", "abcd", "dcba", "dbac"};
for (String string : getLargestAnagramGroup(input)) {
System.out.println(string);
}
}
/**
* Gives an array of Strings which are anagrams and has the highest occurrence.
*
* @param inputArray
* @return
*/
public static String[] getLargestAnagramGroup(String[] inputArray) {
// Creating a linked hash map to maintain the order
Map<String, List<String>> map = new LinkedHashMap<String, List<String>>();
for (String string : inputArray) {
char[] charArray = string.toCharArray();
Arrays.sort(charArray);
String sortedStr = new String(charArray);
List<String> anagrams = map.get(sortedStr);
if (anagrams == null) {
anagrams = new ArrayList<String>();
}
anagrams.add(string);
map.put(sortedStr, anagrams);
}
Set<Entry<String, List<String>>> entrySet = map.entrySet();
List<String> l = new ArrayList<String>();
int highestAnagrams = -1;
for (Entry<String, List<String>> entry : entrySet) {
List<String> value = entry.getValue();
if (value.size() > highestAnagrams) {
highestAnagrams = value.size();
l = value;
}
}
return l.toArray(new String[l.size()]);
}
}
The idea is to first find the anagrams. I do that by sorting each string's character array and using the sorted result as the key of a LinkedHashMap.
I then store the original string in the list for that key, so the list can be printed or reused as the result.
You have to keep track of how many times each anagram occurs; that count can be used to solve your problem.
This is my solution in C#.
public static string[] LargestAnagramsSet(string[] words)
{
var maxSize = 0;
var maxKey = string.Empty;
Dictionary<string, List<string>> set = new Dictionary<string, List<string>>();
for (int i = 0; i < words.Length; i++)
{
char[] temp = words[i].ToCharArray();
Array.Sort(temp);
var key = new string(temp);
if (set.ContainsKey(key))
{
set[key].Add(words[i]);
}
else
{
var anagrams = new List<string>
{
words[i]
};
set.Add(key, anagrams);
}
if (set[key].Count() > maxSize)
{
maxSize = set[key].Count();
maxKey = key;
}
}
return string.IsNullOrEmpty(maxKey) ? words : set[maxKey].ToArray();
}

How to check if a pair already exists?

I have a string, say "abab", and I'm splitting it into pairs (i.e. ab, ab). If a pair already exists, I don't want it to be generated. How do I do that?
Here's the code for what I've tried:
String r="abab";
String pair[] = new String[r.length()/2];
for( int i = 0; i <pair.length; i++ )
{
pair[i] = r.substring(i*2,(i*2)+2);
}
Before adding a pair to the array, you can check whether it already exists with Arrays.asList(pair).contains(...). If the pair already exists, don't add it. For example, here the repeated ab and fe pairs will not be added again:
String r="ababtefedefe";
String pair[] = new String[r.length()/2];
String currentPair = "";
for( int i = 0; i <pair.length; i++ )
{
currentPair = r.substring(i*2,(i*2)+2);
if(!java.util.Arrays.asList(pair).contains(currentPair))
{
pair[i] = currentPair; // slots of skipped duplicates stay null
System.out.println(pair[i]);
}
}
I would use a Set to help me out.
private String[] retrieveUniquePair(String input) {
int dim = input.length() / 2;
Set<String> pairs = new LinkedHashSet<>(dim);
for (int i = 0; i + 1 < input.length(); i += 2) { // step over the characters, not over the pair count
String currentPair = input.substring(i, i + 2);
pairs.add(currentPair);
}
return pairs.toArray(new String[] {});
}
Edit:
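Here is the solution I propose, together with its test (TestNG):
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import org.testng.Assert;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;
public class PairTest {
@DataProvider(name = "input")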
public static Object[][] input() {
return new Object[][] {
{"abcd", Arrays.asList("ab", "cd")},
{"abcde", Arrays.asList("ab", "cd")},
{"abcdab", Arrays.asList("ab", "cd")},
{"ababcdcd", Arrays.asList("ab", "cd")},
{"ababtefedefe", Arrays.asList("ab", "te", "fe", "de")},
};
}
@Test(dataProvider = "input")
public void test(String input, List<String> expectedOutput) {
String[] output = retrieveUniquePair(input);
Assert.assertNotNull(output);
Assert.assertEquals(output.length, expectedOutput.size());
for (String pair : output) {
Assert.assertTrue(expectedOutput.contains(pair));
}
}
private String[] retrieveUniquePair(String input) {
int pairNumber = input.length() / 2;
Set<String> pairs = new LinkedHashSet<>(pairNumber);
int endIteration = input.length();
if (input.length() % 2 != 0) { // odd number
endIteration--; // ignore last character
}
for (int i = 0; i < endIteration; i += 2) {
String currentPair = input.substring(i, i + 2);
pairs.add(currentPair);
}
return pairs.toArray(new String[0]); // size() - 1 would be negative for an empty set
}
}

Sort strings in an array based on length

I have the program below for sorting Strings based on length. I want to print the shortest element first. I don't want to use a Comparator or any API to do this. Where am I going wrong?
public class SortArrayElements {
public static void main(String[] args) {
String[] arr = new String[]{"Fan","dexter","abc","fruit","apple","banana"};
String[] sortedArr = new String[arr.length];
for(int i=0;i<sortedArr.length;i++)
{
sortedArr[i] = compareArrayElements(arr);
}
System.out.println("The strings in the sorted order of length are: ");
for(String sortedArray:sortedArr)
{
System.out.println(sortedArray);
}
}
public static String compareArrayElements(String[] arr) {
String temp = null;
for(int i=0;i<arr.length-1;i++)
{
temp = new String();
if(arr[i].length() > arr[i+1].length())
temp = arr[i+1];
else
temp = arr[i];
}
return temp;
}
}
If you really want to learn Java: use a Comparator. Any other way is bad Java code.
You can, however, rewrite the Comparator mechanism yourself if you want; it will teach you about proper code structuring.
For your actual code, here are some hints:
Using the proper algorithm is much more important than the language you use to code. Good algorithms are always the same, no matter the language.
Never use new inside loops unless you actually need to create new objects. The GC says "thanks".
Change the compareArrayElements function to accept a minimum size and have it return the smallest String with at least that minimum size.
You could cut out the Strings that you have already considered to be the smallest (set them to null); this will, however, modify the original array.
Use bubble sort, but instead of comparing ints, just compare String lengths.
I won't write the code for you. You will have to do a little bit of research on this algorithm. Google is your best friend as a programmer.
Good luck.
References:
Bubble sort in Java
Sorting an array of strings
Implement bubbleSort() and swap(). My implementations mutate the original array, but you can modify them to make a copy if you want.
public class SortArrayElements {
public static void main(String[] args) {
String[] arr = new String[]{"Fan", "dexter", "abc", "fruit", "apple", "banana"};
bubbleSort(arr);
System.out.println("The strings in the sorted order of length are: ");
for (String item : arr) {
System.out.println(item);
}
}
// Mutates the original array
public static void bubbleSort(String[] arr) {
boolean swapped = false;
do {
swapped = false;
for (int i = 0; i < arr.length - 1; i += 1) {
if (arr[i].length() > arr[i + 1].length()) {
swap(arr, i, i + 1);
swapped = true;
}
}
} while (swapped);
}
// Mutates the original array
public static void swap(String[] arr, int index0, int index1) {
String temp = arr[index0];
arr[index0] = arr[index1];
arr[index1] = temp;
}
}
Okay, here is code based entirely on loops and bubble sort. No sets are used, as you wanted. This is a pure loop program, so you can follow the nested loops, and it doesn't change the positions of the strings in the original list.
import java.util.*;
class strings {
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
ArrayList<String> a = new ArrayList<String>(2);
System.out.println("Start entering your words or sentences.");
System.out.println("Type stop to stop.");
String b;
int c = 0, d;
do {
b = in.nextLine();
b = b.trim();
a.add(b);
c++;
}
while (!b.equalsIgnoreCase("stop"));
if (c > 1)
a.remove(a.size() - 1);
System.out.println("Choose the sort you want. Type the corresponding number");
System.out.println("1. Ascending");
System.out.println("2. Descending");
int sc=in.nextInt();
switch(sc) {
case 1: {
int sag[] = new int[a.size()];
for (int jk = 0; jk < a.size(); jk++) {
b = a.get(jk);
c = b.length();
sag[jk] = c;
}
int temp;
for (int i = 0; i < a.size() - 1; i++) {
for (int j = 0; j < a.size() - 1; j++) {
if (sag[j] > sag[j + 1]) {
temp = sag[j + 1];
sag[j + 1] = sag[j];
sag[j] = temp;
}
}
}
ArrayList saga = new ArrayList();
for (int i = 0; i < sag.length; i++) {
saga.add(sag[i]);
}
for (int i = 0; i < saga.size(); i++) {
for (int j = i + 1; j < saga.size(); j++) {
if (saga.get(i).equals(saga.get(j))) {
saga.remove(j);
j--;
}
}
}
for (int i = 0; i < saga.size(); i++) {
for (int j = 0; j < a.size(); j++) {
String jl = a.get(j);
if (saga.get(i).equals(jl.length()))
System.out.println(jl);
}
}
break;
}
case 2: {
int sag[] = new int[a.size()];
for (int jk = 0; jk < a.size(); jk++) {
b = a.get(jk);
c = b.length();
sag[jk] = c;
}
int temp;
for (int i = 0; i < a.size() - 1; i++) {
for (int j = 0; j < a.size() - 1; j++) {
if (sag[j] < sag[j + 1]) {
temp = sag[j + 1];
sag[j + 1] = sag[j];
sag[j] = temp;
}
}
}
ArrayList saga = new ArrayList();
for (int i = 0; i < sag.length; i++) {
saga.add(sag[i]);
}
for (int i = 0; i < saga.size(); i++) {
for (int j = i + 1; j < saga.size(); j++) {
if (saga.get(i).equals(saga.get(j))) {
saga.remove(j);
j--;
}
}
}
for (int i = 0; i < saga.size(); i++) {
for (int j = 0; j < a.size(); j++) {
String jl = a.get(j);
if (saga.get(i).equals(jl.length()))
System.out.println(jl);
}
}
break;
}
}
}
}
For instance, the following:
ArrayList<String> str = new ArrayList<>(
Arrays.asList(
"Long", "Short", "VeryLong", "S")
);
By lambda:
str.sort((String s1, String s2) -> s1.length() - s2.length());
By static Collections.sort:
import static java.util.Collections.sort;
sort(str, new Comparator<String>() {
@Override
public int compare(String s1, String s2) {
return s1.length() - s2.length();
}
});
Both options go through the sort method of the List interface (since Java 8, Collections.sort delegates to List.sort).
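The same length comparison can also be written without spelling out the subtraction, using Comparator.comparingInt (available since Java 8). This is a small sketch with hypothetical class and method names; it sorts a copy, so the caller's list is left untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class LengthSort {

    // Returns a new list with the words ordered from shortest to longest.
    // List.sort is stable, so words of equal length keep their relative order.
    static List<String> sortedByLength(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Comparator.comparingInt(String::length));
        return copy;
    }

    public static void main(String[] args) {
        // prints [S, Long, Short, VeryLong]
        System.out.println(sortedByLength(Arrays.asList("Long", "Short", "VeryLong", "S")));
    }
}
```

Appending .reversed() to the comparator would give the longest-first order.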
Let's take the following String array: inputArray = ["abc", "", "aaa", "a", "zz"].
We can sort it by length using a Comparator with the following code:
String[] sortByLength(String[] inputArray) {
Arrays.sort(inputArray, new Comparator<String>(){
public int compare(String s1, String s2){
return s1.length() - s2.length();
}
});
return inputArray;
}
// Sort a String array based on length
import java.util.Scanner;
public class FirstNonRepeatedString {
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
System.out.println("Please Enter your String");
String str = in.nextLine();
String arrString[] = str.split("\\s");
arrString = sortArray(arrString);
System.out.println("Sort String ");
for(String s:arrString){
System.out.println(s);
}
}
private static String[] sortArray(String[] arrString) {
int length = arrString.length;
String s;
for (int i = 0; i < length ; i++) {
s= new String();
for(int j = 0; j < length; j++ ){
if(arrString[i].length()< arrString[j].length()){
s = arrString[i];
arrString[i] = arrString[j];
arrString[j] = s;
}
}
}
return arrString;
}
}
import java.util.*;
public class SortStringBasedOnTheirLength {
public static void main(String[] args) {
Scanner sc=new Scanner(System.in);
System.out.println("Enter String:");
String str=sc.nextLine();
String[] str1=str.split("\\s");
for(int i=0;i<str1.length;i++)
{
for(int j=i+1;j<str1.length;j++)
{
if(str1[i].length()>str1[j].length())
{
String temp= str1[i];
str1[i]=str1[j];
str1[j]=temp;
}
}
}
for(int i=0;i<str1.length;i++)
{
System.out.print(str1[i]+" ");
}
}
}

Performance enhancing for a string searching program in eclipse

I have written a program that searches for a given phrase in a paragraph and encloses each occurrence in curly braces. I used the Boyer-Moore algorithm for the search. I also need to improve the performance of the program: although I get the required output, the performance is disastrous.
Here is the code:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.List;
import java.util.ArrayList;
import java.util.Map;
public class BoyerMoore {
static class Pair {
public int start, end;
Pair(int start, int end) {
this.start = start;
this.end = end;
}
public int weight() {
return end - start;
}
public boolean contains(int point) {
return start <= point && point <= end;
}
public int returnStart() {
return start;
}
}
static class Group {
public List<Pair> pairs = new ArrayList<Pair>();
public Pair maxWeight;
Group(Pair start) {
add(start);
}
Group(List<Pair> pairs) {
for (Pair pair : pairs) {
add(pair);
}
}
public boolean contains(Pair pair) {
for (Pair my : pairs) {
if (my.contains(pair.start) || my.contains(pair.end))
return true;
}
return false;
}
public void add(Pair pair) {
pairs.add(pair);
if (maxWeight == null || maxWeight.weight() < pair.weight())
maxWeight = pair;
}
}
public static List<Integer> match(String pattern, String text) {
List<Integer> matches = new ArrayList<Integer>();
int m = text.length();
int n = pattern.length();
Map<Character, Integer> rightMostIndexes = preprocessForBadCharacterShift(pattern);
int alignedAt = 0;
while (alignedAt + (n - 1) < m) {
for (int indexInPattern = n - 1; indexInPattern >= 0; indexInPattern--) {
int indexInText = alignedAt + indexInPattern;
char x = text.charAt(indexInText);
char y = pattern.charAt(indexInPattern);
if (indexInText >= m)
break;
if (x != y) {
Integer r = rightMostIndexes.get(x);
if (r == null) {
alignedAt = indexInText + 1;
} else {
int shift = indexInText - (alignedAt + r);
alignedAt += shift > 0 ? shift : 1;
}
break;
} else if (indexInPattern == 0) {
matches.add(alignedAt);
alignedAt++;
}
}
}
return matches;
}
private static Map<Character, Integer> preprocessForBadCharacterShift(
String pattern) {
Map<Character, Integer> map = new HashMap<Character, Integer>();
for (int i = pattern.length() - 1; i >= 0; i--) {
char c = pattern.charAt(i);
if (!map.containsKey(c))
map.put(c, i);
}
return map;
}
public static void main(String[] args) throws IOException {
BufferedReader input = new BufferedReader(new InputStreamReader(
System.in));
ArrayList<String> ListOfAllPhrase = new ArrayList<String>();
List<Pair> pairs = new ArrayList<Pair>();
List<Group> groups = new ArrayList<Group>();
ListOfAllPhrase.add("protein");
ListOfAllPhrase.add("protein kinase");
ListOfAllPhrase.add("protein kinase A anchor protein");
ListOfAllPhrase.add("protein kinase A anchor proteins");
ListOfAllPhrase.add("protein kinase A anchor protein activity");
ListOfAllPhrase.add("IL-6");
ListOfAllPhrase.add("SOX5");
ListOfAllPhrase.add("NOX5");
System.out.println("Input a sentence: ");
String line = input.readLine();
char[] lineInChar = line.toCharArray();
long startTime = System.currentTimeMillis();
for (int i = 0; i < ListOfAllPhrase.size(); i++) {
// offset.add((ListOfAllPhrase.get(i)).length());
List<Integer> matches = match(ListOfAllPhrase.get(i).toLowerCase(),
line.toLowerCase());
for (Integer integer : matches) {
pairs.add(new Pair(integer, (ListOfAllPhrase.get(i)).length()
+ integer));
}
}
System.out.println("Total time taken: "
+ (System.currentTimeMillis() - startTime));
for (Pair pair : pairs) {
List<Group> intersects = new ArrayList<Group>();
for (Group group : groups) {
if (group.contains(pair)) {
intersects.add(group);
}
}
if (intersects.isEmpty()) {
groups.add(new Group(pair));
} else {
List<Pair> intervals = new ArrayList<Pair>();
intervals.add(pair);
for (Group intersect : intersects) {
intervals.addAll(intersect.pairs);
}
groups.removeAll(intersects);
groups.add(new Group(intervals));
}
}
StringBuilder newBuilder = new StringBuilder();
int flag = 1;
System.out.println(lineInChar.length);
for (int a = 0; a <= lineInChar.length; a++) {
for (Group group : groups) {
if (a == group.maxWeight.start) {
newBuilder.append("{");
flag = 1;
break;
}
if (a == group.maxWeight.end && a == lineInChar.length) {
newBuilder.append("}");
flag = 0;
break;
}
if (a == lineInChar.length && a == group.maxWeight.end + 1) {
newBuilder.append("}");
flag = 0;
break;
}
if (a == group.maxWeight.end) {
newBuilder.append("}");
flag = 1;
break;
}
}
if (flag == 0)
continue;
newBuilder.append(lineInChar[a]);
flag = 1;
}
System.out.println("Final output: " + newBuilder);
}
}
What can I implement or do to increase the performance of my program?
Should I switch to another string search algorithm?
Could anyone help me with this?
I think you implemented the Boyer-Moore algorithm as it is described. Still, I would suggest the following:
Avoid 'expensive' operations inside a for loop, for instance the toLowerCase() calls in your main method. Rewrite this loop (33% speed gain in my test):
for (int i = 0; i < ListOfAllPhrase.size(); i++) {
// offset.add((ListOfAllPhrase.get(i)).length());
List<Integer> matches = match(ListOfAllPhrase.get(i).toLowerCase(),
line.toLowerCase());
for (Integer integer : matches) {
pairs.add(new Pair(integer, (ListOfAllPhrase.get(i)).length()
+ integer));
}
}
to:
ArrayList<String> lowerCaseListOfPhrases = new ArrayList<String>(ListOfAllPhrase.size());
for (String phrase : ListOfAllPhrase) {
lowerCaseListOfPhrases.add(phrase.toLowerCase());
}
String lowerCaseLine = line.toLowerCase();
for (String phrase : lowerCaseListOfPhrases) {
List<Integer> matches = match(phrase, lowerCaseLine);
for (Integer integer : matches) {
pairs.add(new Pair(integer, phrase.length() + integer));
}
}
Take a look at this fast implementation (See http://algs4.cs.princeton.edu/53substring/BoyerMoore.java.html):
public static List<Integer> match2(String pattern, String text) {
List<Integer> result = new ArrayList<Integer>();
int[] right = new int[256]; // Assuming a 256 character encoding
for (int c = 0; c < 256; c++)
right[c] = -1;
for (int j = 0; j < pattern.length(); j++)
right[pattern.charAt(j)] = j;
int M = pattern.length();
int N = text.length();
int skip;
for (int i = 0; i <= N - M; i += skip) {
skip = 0;
for (int j = M-1; j >= 0; j--) {
if (pattern.charAt(j) != text.charAt(i+j)) {
skip = Math.max(1, j - right[text.charAt(i+j)]);
break;
}
}
if (skip == 0) { // found
result.add(i);
skip += pattern.length();
}
}
return result;
}
I get a performance increase of roughly 50% when executing this test:
public static void main(String[] args) throws IOException {
String phrase = "protein kinase A anchor protein activity";
String txt = "This is a test protein kinase A anchor protein activityThis is a test protein kinase A anchor protein activityThis is ";
List<Integer> result1 = null;
List<Integer> result2 = null;
long currentTime = System.currentTimeMillis();
for (int i=0; i<1000000; i++) {
result1 = match(phrase, txt);
}
System.out.println("ExecutionTime match: " + (System.currentTimeMillis() - currentTime));
currentTime = System.currentTimeMillis();
for (int i=0; i<1000000; i++) {
result2 = match2(phrase, txt);
}
System.out.println("ExecutionTime match2: " + (System.currentTimeMillis() - currentTime));
Assert.assertTrue(result1.equals(result2));
}
Output:
ExecutionTime match: 5590
ExecutionTime match2: 2663
If you are not tied to the Boyer-Moore algorithm, you can use Java's built-in functionality:
public static List<Integer> match3(String pattern, String text) {
List<Integer> result = new ArrayList<Integer>();
int index = text.indexOf(pattern);
while (index >= 0) {
result.add(index);
index = text.indexOf(pattern, index + 1);
}
return result;
}
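Building on the indexOf approach, the whole task from the question (wrapping matched phrases in curly braces) can be sketched without the Pair and Group bookkeeping by marking covered characters in a boolean array. This is only an illustration under stated assumptions: the class and method names are hypothetical, overlapping or directly adjacent matches are merged into one braced region (the question's code instead braces the longest match of each overlapping group), and lowercasing is assumed not to change the length of the text, which holds for ASCII input.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class PhraseMarker {

    // Wraps every region of text covered by at least one phrase in curly braces.
    static String mark(String text, List<String> phrases) {
        String lowerText = text.toLowerCase(Locale.ROOT); // lowercase once, outside the loops
        if (lowerText.length() != text.length()) {
            throw new IllegalArgumentException("lowercasing changed the text length");
        }
        boolean[] covered = new boolean[text.length()];
        for (String phrase : phrases) {
            if (phrase.isEmpty()) {
                continue;
            }
            String lowerPhrase = phrase.toLowerCase(Locale.ROOT);
            int index = lowerText.indexOf(lowerPhrase);
            while (index >= 0) {
                // Mark the characters of this occurrence as covered.
                Arrays.fill(covered, index, index + lowerPhrase.length(), true);
                index = lowerText.indexOf(lowerPhrase, index + 1);
            }
        }
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            if (covered[i] && (i == 0 || !covered[i - 1])) {
                out.append('{'); // a covered region starts here
            }
            out.append(text.charAt(i));
            if (covered[i] && (i == text.length() - 1 || !covered[i + 1])) {
                out.append('}'); // the covered region ends here
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> phrases = Arrays.asList(
                "protein", "protein kinase", "protein kinase A anchor protein", "IL-6");
        // prints The {protein kinase A anchor protein} binds {IL-6}.
        System.out.println(mark("The protein kinase A anchor protein binds IL-6.", phrases));
    }
}
```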
