determine if string has unique characters - java

The problem asks to "implement an algorithm to determine if a string has all unique characters."
I saw the solution, but I don't quite understand it.
public boolean isUniqueChars(String str) {
if (str.length() > 256) return false;
boolean[] char_set = new boolean[256];
for (int i = 0; i < str.length(); i++) {
int val = str.charAt(i);
if (char_set[val])
return false;
char_set[val] = true;
}
return true;
}
Do we not need parseInt or an (int) cast in front of str.charAt(i)? (Will str.charAt(i) be converted to int automatically?)
What does boolean[] char_set = new boolean[256] mean?
Why do we need to set char_set[val]=true?
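The three questions above can be illustrated with a small sketch. The class name CharIndexDemo is made up for illustration, and the code assumes, like the posted solution, that every character has a code below 256. It shows that a char widens to int without a cast or parseInt, that new boolean[256] creates 256 flags that all start as false, and that setting a flag to true is how the loop remembers that a character has already been seen.

```java
class CharIndexDemo {
    // Same algorithm as the posted solution; assumes all chars have a code below 256.
    static boolean isUniqueChars(String str) {
        if (str.length() > 256) return false;   // more chars than possible values: one must repeat
        boolean[] seen = new boolean[256];      // every element starts as false
        for (int i = 0; i < str.length(); i++) {
            int val = str.charAt(i);            // implicit widening from char to int, no cast needed
            if (seen[val]) return false;        // flag already set: this is a second occurrence
            seen[val] = true;                   // remember this character for later iterations
        }
        return true;
    }

    public static void main(String[] args) {
        int code = 'a';                         // a char is a number; 'a' has code 97
        System.out.println(code);
        System.out.println(isUniqueChars("abc"));
        System.out.println(isUniqueChars("abca"));
    }
}
```

Without the line that sets the flag to true, the array would stay all false and the method would return true for every input.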

We can also use a HashSet to determine whether a string has all unique characters in Java.
Set<Character> testSet = new HashSet<>();
for (int i = 0; i < str.length(); i++) {
testSet.add(str.charAt(i)); // the char is autoboxed to Character
}
if (testSet.size() == str.length()) {
System.out.println("All characters are unique");
} else {
System.out.println("Not all characters are unique");
}

See my explanation in the comments. Since you only tagged this as algorithm, I'm assuming no particular language and just addressing the algorithm itself:
public boolean isUniqueChars(String str){
//more than 256 chars means at least one is not unique
//please see first comment by @Domagoj as to why 256 length
if(str.length()>256) return false;
//keeping an array to see which chars have been used
boolean[] char_set = new boolean[256];
//iterating over the string
for(int i=0; i<str.length(); i++){
//not sure what language this is, but let's say it returns an
//int representation of the char
int val=str.charAt(i);
//meaning this has been set to true before, so this char is not unique
if(char_set[val])
//nope, not unique
return false;
//remember this char for next time
char_set[val]=true;
}
//if we reached here, then string is unique
return true;
}

A simple solution would be to convert the String to a Set and compare the sizes of the two. Using Java 8:
private static boolean checkUniqueChars(String copy) {
if(copy.length() <= 1) return true;
Set<Character> set = copy.chars()
.mapToObj(e->(char)e).collect(Collectors.toSet());
if (set.size() < copy.length()){
return false;
}
return true;
}

One way to do this is with bits.
Time complexity: O(N). (You could also argue the time complexity is O(1), since the for loop will never iterate through more than 128 characters.)
Space complexity: O(1)
Treat each char as a bit that records whether it has been seen. For example, to check whether all the chars in "abcada" are unique, we test whether the bit for the current char is already turned on; if it is, we return false, otherwise we set that bit.
How do we do it? Every char can be represented as a number, and that number can be used as a bit position. We are going to use the "set bit" and "get bit" approach.
The code below assumes that the string only uses the lowercase letters a through z.
public static boolean isUniqueChars(String str) {
int mask = 0;
for (int i = 0; i < str.length(); ++i) {
int val = str.charAt(i) - 'a'; // you will get value as 0, 1, 2.. consider these as the positions inside your mask which you need to check if the bit is set or not
if ((mask & (1 << val)) > 0) return false; // Check if the bit is already set
mask |= (1 << val); // Set bit
}
return true;
}
To understand bit manipulation, you can refer to:
https://www.hackerearth.com/practice/notes/bit-manipulation/
https://snook.ca/archives/javascript/creative-use-bitwise-operators
It took me a while to understand bits, but once I did, it was eye-opening. You can solve many problems using bits, and they often give very efficient solutions.

Think about how you would do this with a paper and pencil.
Write out the alphabet once.
Then go through your string character by character.
When you reach a character cross it out of your alphabet.
If you go to cross out a character and find that it has already been crossed out, then you know the character appeared previously in your string and you can then stop.
That's essentially what the code you posted does, using an array. The operation completes in O(N) time with O(K) extra space (where K is the number of keys you have).
If your input had a large number of elements or you could not know what they were ahead of time, you could use a hash table to keep track of which elements have already been seen. This again takes O(N) time with O(cK) extra space, where K is the number of keys and c is some value greater than 1.
But hash tables can take up quite a bit of space. There's another way to do this. Sort your array, which will take O(N log N) time but which requires no extra space. Then walk through the array checking to see if any two neighbouring characters are the same. If so, you have a duplicate.
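The sort-then-scan idea can be sketched in Java as follows. The class and method names are made up for illustration. One caveat is specific to Java: Strings are immutable, so toCharArray has to make a copy, and the "no extra space" property only holds if you are handed a mutable array that you are allowed to sort in place.

```java
import java.util.Arrays;

class SortedUniqueCheck {
    // Sorts a copy of the characters, then looks for equal neighbours.
    static boolean hasUniqueChars(String s) {
        char[] chars = s.toCharArray();   // copy of the characters (O(N) extra space in Java)
        Arrays.sort(chars);               // O(N log N)
        for (int i = 1; i < chars.length; i++) {
            if (chars[i] == chars[i - 1]) {
                return false;             // duplicates end up next to each other after sorting
            }
        }
        return true;
    }
}
```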

You could see the detailed explanation in my blogpost here:
Check if string has all unique characters
The simplest solution is to loop through all characters and put each one into a HashMap, checking first whether the character is already there. If it is, the characters are not all unique.

public class UniqueString {
public static void main(String[] args) {
String input = "tes";
Map<String, Integer> map = new HashMap<String, Integer>();
for (int i = 0; i < input.length(); i++) {
if (!map.containsKey(Character.toString(input.charAt(i)))) {
map.put(Character.toString(input.charAt(i)), 1);
} else {
System.out.println("String has duplicate char");
break;
}
}
}
}

Java SE 9
You can simply compare the length of the string with the count of distinct characters. To get an IntStream of all characters, use String#chars, and apply IntStream#distinct to it to get a stream of unique elements. Make sure to convert the string to a single case (upper or lower); otherwise distinct will not count the same character in different cases (e.g. I and i) as one.
Demo:
import java.util.stream.Stream;
public class Main {
public static void main(String[] args) {
// Test
Stream.of(
"Hello",
"Hi",
"Bye",
"India"
).forEach(s -> System.out.println(s + " => " + hasUniqueChars(s)));
}
static boolean hasUniqueChars(String str) {
return str.toLowerCase().chars().distinct().count() == str.length();
}
}
Output:
Hello => false
Hi => true
Bye => true
India => false
Java SE 8
static boolean hasUniqueChars(String str) {
return Arrays.stream(str.toLowerCase().split("")).distinct().count() == str.length();
}

For best performance, you should use a Set, and add the characters of the string to the set. If the set.add(...) method returns false, it means that the given character has been seen before, so you return false, otherwise you return true after adding all the characters.
For the simple solution, use Set<Character>:
public static boolean allUniqueCharacters(String input) {
Set<Character> unique = new HashSet<>();
for (int i = 0; i < input.length(); i++)
if (! unique.add(input.charAt(i)))
return false;
return true;
}
That will however not handle Unicode characters outside the BMP, like Emojis, so we might want to change the set to use Unicode Code Points:
public static boolean allUniqueCodePoints(String input) {
Set<Integer> unique = new HashSet<>();
return input.codePoints().noneMatch(cp -> ! unique.add(cp));
}
However, even Code Points do not represent "characters" as we humans think of them. For that we need to process Grapheme Clusters:
public static boolean allUniqueClusters(String input) {
BreakIterator breakIterator = BreakIterator.getCharacterInstance(Locale.US);
breakIterator.setText(input);
Set<String> unique = new HashSet<>();
for (int start = 0, end; (end = breakIterator.next()) != BreakIterator.DONE; start = end)
if (! unique.add(input.substring(start, end)))
return false;
return true;
}
Or with Java 9+:
public static boolean allUniqueClusters(String input) {
Set<String> unique = new HashSet<>();
return Pattern.compile("\\X").matcher(input).results()
.noneMatch(m -> ! unique.add(m.group()));
}

In JS,
const isStringUnique = str => {
if(str){
let obj = {}
for(let char of str){
obj[char] ? obj[char]++ : obj[char]=1;
}
for(let char of str){
if(obj[char] > 1)
return false;
}
return true;
}
return true;
}

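public class CheckStringUniqueChars {
// Compares every pair of positions: O(n^2) time, O(1) extra space.
// (Comparing only mirrored positions i and length-1-i would miss duplicates such as "aabc".)
public static boolean checkUnique(String str) {
for (int i = 0; i < str.length(); i++) {
for (int j = i + 1; j < str.length(); j++) {
if(str.charAt(i) == str.charAt(j)) {
return false;
}
}
}
return true;
}
}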

Related

Efficient way for finding Shortest Palindrome from Arraylist of strings

I am trying to solve this problem in java. I have an arraylist of palindromic strings. I have to find the shortest palindrome string out of the given array list. I have solved the question but was looking at getting feedback on my code and also how I can try to make the code more efficient/better.
Here is the code what I have tried.
In this case, size would be 3, since that is the length of the smallest palindromic string.
import java.util.ArrayList;
class ShortestPalindrome {
public static int isShortestPalindrome(ArrayList<String> list) {
int smallest = list.get(0).length();
boolean ret = false;
for (String element : list) {
ret = isPalindrome(element);
if (ret) {
if (element.length() < smallest)
smallest = element.length();
}
}
return smallest;
}
private static boolean isPalindrome(String input) {
String str = "";
boolean result = false;
if (input.length() == 1 || input.length() == 0)
return true;
if (input.charAt(0) != input.charAt(input.length() - 1))
return false;
StringBuilder sb = new StringBuilder(input.toLowerCase());
str = sb.reverse().toString();
if (input.equals(str)) {
result = true;
}
return result;
}
public static void main(String[] args) {
ArrayList<String> array = new ArrayList<String>();
array.add("malayam");
array.add("aba");
array.add("abcdeyugugi");
array.add("nitin");
int size = isShortestPalindrome(array);
System.out.println("Shortest length of string in list:" + size);
}
}
I have a couple of comments regarding your code:
In general, if you break your problem into smaller parts, there are efficient solutions all around.
As @AKSW mentioned in his comment, since we have to check each string's length anyway, it's better to do it at the beginning, so that we don't run the relatively expensive isPalindrome() method on irrelevant strings. (Note that I overwrite the given list with the sorted one, even though creating a new sorted list would be trivial.)
The main improvement that I made is in the isPalindrome() method:
Reversing a string of length n takes n time and n additional space. Comparing the two also takes n time. Overall: 2n time, n space.
Comparing each pair of matching characters (from the beginning and from the end) takes 2 additional variables (the two integer indices) and approximately n/2 time. Overall: n/2 time, 2 space.
Obviously, in asymptotic terms both time complexities are the same, O(n), but the second solution is still about 4 times cheaper and costs a negligible amount of space.
Therefore I believe this is the most efficient way to achieve your test:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
class ShortestPalindrome {
public static int isShortestPalindrome(ArrayList<String> list) {
// Sorts the given ArrayList by length
Collections.sort(list, Comparator.comparingInt(String::length));
for (String element : list) {
if(isPalindrome(element)) {
return element.length();
}
}
return -1; // If there is no palindrome in the given array
}
private static boolean isPalindrome(String input) {
String lowerCased = input.toLowerCase();
int pre = 0;
int end = lowerCased.length() - 1;
while (end > pre) {
if (lowerCased.charAt(pre) != lowerCased.charAt(end))
return false;
pre ++;
end --;
}
return true;
}
public static void main(String[] args) {
ArrayList<String> array = new ArrayList<>(Arrays.asList("malayam", "aba", "abcdeyugugi", "nitin"));
int size = isShortestPalindrome(array);
System.out.println("Shortest length of string in list: " + size);
}
}
Edit: I've tested this algorithm with the following list. Sorting the list before checking for palindromes reduced the run time by 50%.
"malayam", "aba", "abcdeyugugi", "nitin", "sadjsaudifjksdfjds", "sadjsaudifjksdfjdssadjsaudifjksdfjds", "sadjsaudifjksdfjdssadjsaudifjksdfjdssadjsaudifjksdfjds", "a"
The simplest improvement to your code is to check whether a string is a palindrome only if its length is smaller than smallest.
Btw the initialization int smallest = list.get(0).length(); is not correct, imagine the first element not being a palindrome and being of smallest size of all strings. You should do int smallest = Integer.MAX_VALUE;
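The two suggestions above (check the length first, and start from Integer.MAX_VALUE) can be combined in a short sketch. The class and method names are hypothetical, and returning -1 when the list contains no palindrome is an assumption made for this example, not something the question specifies.

```java
import java.util.List;

class ShortestPalindromeSketch {
    // Case-insensitive palindrome test that compares characters from both ends.
    static boolean isPalindrome(String s) {
        for (int i = 0, j = s.length() - 1; i < j; i++, j--) {
            if (Character.toLowerCase(s.charAt(i)) != Character.toLowerCase(s.charAt(j))) {
                return false;
            }
        }
        return true;
    }

    // Returns the length of the shortest palindrome, or -1 if there is none (assumed convention).
    static int shortestPalindromeLength(List<String> list) {
        int smallest = Integer.MAX_VALUE;
        for (String s : list) {
            // Cheap length comparison first; isPalindrome only runs for shorter candidates.
            if (s.length() < smallest && isPalindrome(s)) {
                smallest = s.length();
            }
        }
        return smallest == Integer.MAX_VALUE ? -1 : smallest;
    }
}
```

With this initialization, a short non-palindrome at the start of the list no longer influences the result.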
Also the check
if (input.charAt(0) != input.charAt(input.length() - 1))
return false;
is incorrect, as you don't convert the characters to lower case (as you do later), so "ajA" would not be recognized as a palindrome.
There are further improvements of your code possible:
You could replace the palindrome checking by copying and reversing with this:
for (int i = 0; i < input.length() / 2; ++i)
if (Character.toLowerCase(input.charAt(i)) != Character.toLowerCase(input.charAt(input.length() - 1 - i)))
return false;
Here there is no copy necessary and in the average case it might be faster (as it can terminate early).
Also, like AKSW mentioned, it might be faster to sort the strings by length and then you can terminate early, once you found a palindrome.
The simple code below should work for you.
It first checks the length and only then checks whether the string is actually a palindrome.
If it is, the length is stored in smallest.
public static int isShortestPalindrome(ArrayList<String> list) {
Integer smallest = null;
for(String s:list){
if ( (smallest == null || s.length()< smallest) && new StringBuilder(s).reverse().toString().equalsIgnoreCase(s) ){
smallest = s.length();
}
}
return smallest == null ? 0 :smallest;
}
Here is a stream version:
OptionalInt minimalLenOfPalindrome
= list.parallelStream()
.filter(st -> {
StringBuilder sb = new StringBuilder(st);
String reversedSt = sb.reverse().toString();
return st.equalsIgnoreCase(reversedSt);
})
.mapToInt(String::length)
.min();
Thanks to @yassin's answer, I am changing the above code to:
public class SOFlow {
private static boolean isPalindrome(String input) {
for (int i = 0; i < input.length() / 2; ++i) {
if (Character.toLowerCase(input.charAt(i)) != Character.toLowerCase(input.charAt(input.length() - 1 - i))) {
return false;
}
}
return true;
}
public static void main(String args[]) {
List<String> list = new ArrayList<>();
list.add("cAcc");
list.add("a;;;;a");
list.add("aJA");
list.add("vrrtrrr");
list.add("cAccccccccc");
OptionalInt minimalLenOfPalindrome
= list.parallelStream()
.filter(SOFlow::isPalindrome)
.mapToInt(String::length)
.min();
System.out.println(minimalLenOfPalindrome);
}
}
Using Java8 streams, and considering sorting first, for the same reasons as others:
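static boolean isPalindrome (String input) {
String lower = input.toLowerCase();
// reverse() modifies and returns the same StringBuilder, so compare the resulting text, not references
return new StringBuilder(lower).reverse().toString().equals(lower);
}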
public static int isShortestPalindrome(ArrayList<String> list) {
return (list.stream().sorted ((s1, s2) -> {
return s1.length () - s2.length ();
})
.filter (s-> isPalindrome (s))
.findFirst ()
.map (s -> s.length ())
.orElse (-1));
}
If you have many Strings of equal, minimal length that are very long and fail to be palindromes only in the very middle, you might spend a lot of time in isPalindrome and prefer something like isPalindrome1.
If we assume a million Strings with lengths uniformly distributed from 1000 to 2000 characters, we would end up with about 1000 Strings per length on average. If most of them are equal except for a few characters close to the middle, then fine-tuning that comparison might be relevant. But finding a palindrome early terminates our search, so the percentage of palindromes heavily influences the performance too.
private static boolean isPalindrome1 (String s) {
String input = s.toLowerCase ();
int len = input.length ();
for (int i = 0, j = len - 1; i < j; ++i, --j) // i < j also compares the middle pair of even-length strings
if (input.charAt(i) != input.charAt (j))
return false;
return true;
}
The result of the stream sorting and filtering is an Optional, which is a great opportunity to signal that nothing was found. I stuck to your interface of returning int and return -1 if nothing is found, which of course has to be handled properly by the caller.

Moving from one base to another java?

I'm trying to create a program that reads in two bases from stdin and finds the smallest number that has repeating digits in both bases. It seems to work fine for small bases, but with larger bases I seem to get the wrong answer. E.g. given 3 and 50, it finds 22 as the smallest number where both have repeated digits, but I'm pretty sure 22 in base 50 is a single digit.
What's the logic here that I'm missing? I'm stumped. Anything to point me in the right direction would be appreciated :)
My conversion method, this works for smaller bases but not larger it seems.
public static String converties(int number, int base)
{
int remainder;
ArrayList<Integer>remainders = new ArrayList<Integer>();
while (number != 0)
{
remainder = number%base;
remainders.add(remainder);
number = number/base;
}
String result = "";
for (int i = 0; i < remainders.size(); i++)
{
result+=Integer.toString(remainders.get(i));
}
result = reverseString(result);
return result;
}
public static String reverseString(String result)
{
String newResult = "";
for (int i = result.length()-1; i >= 0; i--)
{
newResult+=result.charAt(i);
}
return newResult;
}
public static boolean areThereRepeats(String value)
{
ArrayList<Character> splitString = new ArrayList<Character>();
for (char c : value.toCharArray())
{
//if it already contains value then theres repeated digit
if (splitString.contains(c))
{
return true;
}
splitString.add(c);
}
return false;
}
The problem is in this function:
public static boolean areThereRepeats(String value){
ArrayList<Character> splitString = new ArrayList<Character>();
for (char c : value.toCharArray()){
//if it already contains value then theres repeated digit
if (splitString.contains(c)){
return true;//Note that returning here only checks the first value that matches
}
splitString.add(c);
}
return false;
}
When you check to see if splitString.contains(c) it will return true if the array is length one. You aren't doing anything to check that the char c you're checking isn't comparing against itself.
Also note that Maraca has a point: the data structure you're choosing to utilize to record your remainders is flawed. areThereRepeats will work fine for checking if you assume that each new character represents a new remainder (or more specifically, the index into the base you're checking of the remainder you found). But why marshal all of that into a string in the first place? Why not pass the ArrayList to areThereRepeats?
public static boolean converties(int number, int base){
int remainder;
ArrayList<Integer>remainders = new ArrayList<Integer>();
while (number != 0){
remainder = number%base;
remainders.add(remainder);//Saves the index of the remainder in the current base, using an integer base-10 representation
number = number/base;
}
return areThereRepeats(remainders);
}
//Recursion ain't efficient, but...
//Takes a List rather than an ArrayList because subList returns a List view (needs import java.util.List)
public static boolean areThereRepeats(List<Integer> remainders){
if (remainders.size() <= 1) {
return false;
}
List<Integer> rSublist = remainders.subList(1, remainders.size());
if (rSublist.contains(remainders.get(0))) {
return true;
}
return areThereRepeats(rSublist);
}
result+=Integer.toString(remainders.get(i));
In this line you add the remainder in base 10, so it will only work correctly if you find a match with base <= 10. By the way, it could be done very easily with BigInteger (if you don't want to do it yourself).
Otherwise:
result += (char)(remainders.get(i) < 10 ? ('0' + remainders.get(i)) : ('A' + remainders.get(i) - 10));
This will work up to base 36.
Or just use result += (char) remainders.get(i).intValue(); (an Integer cannot be cast to char directly). It will work up to base 256, but it won't be readable.
And I agree with Nathaniel Ford, it would be better to pass the ArrayLists. If you still want to get the String in the standard way you can make another function to which you pass the ArrayList and transform it with the 1st method shown here.
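As a rough sketch of the "pass the list instead of a String" idea, the digits can be kept as integers all the way through, which works for any base of 2 or more. The names BaseDigits, toDigits and hasRepeatedDigit are made up for this example, and using a HashSet for the repeat check is my own choice rather than something from the answers above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BaseDigits {
    // Digits of a non-negative number in the given base (base >= 2), most significant first.
    static List<Integer> toDigits(int number, int base) {
        List<Integer> digits = new ArrayList<>();
        do {
            digits.add(0, number % base);   // each remainder is one whole digit, whatever its size
            number /= base;
        } while (number != 0);
        return digits;
    }

    // True if any digit value occurs more than once.
    static boolean hasRepeatedDigit(int number, int base) {
        Set<Integer> seen = new HashSet<>();
        for (int digit : toDigits(number, base)) {
            if (!seen.add(digit)) {
                return true;                // add returns false when the digit was already present
            }
        }
        return false;
    }
}
```

With this representation, 22 in base 3 has the digits 2, 1, 1 (a repeat), while 22 in base 50 is the single digit 22 and therefore has no repeat.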

Java: Determining if a word contains letters that can be found in another word?

For example:
If you have a String "magikarp", and you tested it against "karma", this would be true, because all of the letters that make up "karma" can be found in "magikarp".
"kipp" would return false, because there is only one "p" in "magikarp."
This is the attempt that I have right now, but I don't think it's very efficient, and it doesn't return correctly for cases when there are multiple occurrences of one letter.
private boolean containsHelper(String word, String word2){
for (int i = 0; i < word2.length(); i ++){
if (!word.contains(String.valueOf(word2.charAt(i)))){
return false;
}
}
return true;
}
I won't write the program here, but I'll explain how to do it. There are two ways to do this, considering complexity:
1) If you are sure that the strings contain only a-z/A-Z characters, take an array of size 26. Loop through the first string and store the count of each character at its respective index. For example, for the String "aabcc" the array would look like [2,1,2,0,...0]. Now loop through the second String and, at each character, subtract 1 from the array at that character's position and check the resulting value. If the value is less than 0, return false. For example, with "aacd", when you reach d you compute (0-1), giving -1, which is less than 0, so you return false.
2) Sort the characters in the each String, and then compare.
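A minimal sketch of the first approach (the counting array) might look like this. It assumes both strings contain only the lowercase letters a to z; the class and method names are hypothetical.

```java
class LetterCounts {
    // True if every letter of word2 can be taken from word, respecting multiplicity.
    // Assumes both strings contain only 'a'..'z'.
    static boolean containsLetters(String word, String word2) {
        int[] counts = new int[26];
        for (int i = 0; i < word.length(); i++) {
            counts[word.charAt(i) - 'a']++;            // count the available letters
        }
        for (int i = 0; i < word2.length(); i++) {
            if (--counts[word2.charAt(i) - 'a'] < 0) {
                return false;                          // word2 needs more of this letter than word has
            }
        }
        return true;
    }
}
```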
Since you are only checking for characters, it would be more efficient to use indexOf and check whether it returns -1, as contains itself calls indexOf after some extra work.
However, I think it would be simpler to convert the String to an array of char and remove characters as they are found, which would also handle the case of multiple occurrences.
So your algorithm would look something like this:
private static boolean containsHelper(final String word, final String word2)
{
char[] firstWordAsCharArray = word.toCharArray();
char[] secondWordAsCharArray = word2.toCharArray();
Arrays.sort(firstWordAsCharArray);//Sort both arrays so equal characters line up in the same order.
Arrays.sort(secondWordAsCharArray);
int index = 0;//Current position in the sorted first word.
for(char c : secondWordAsCharArray)
{
while(index < firstWordAsCharArray.length && firstWordAsCharArray[index] < c)
index++;//Skip characters of the first word that the second word does not need.
if(index == firstWordAsCharArray.length || firstWordAsCharArray[index] != c)
return false;//c is missing, or all of its occurrences are already used up.
index++;//Consume this occurrence so it cannot be matched twice.
}
return true;
}
Basically, what I do is:
Convert both words to char arrays to make them easier to work with.
Sort both char arrays so that equal characters appear in the same order (a binary search with a sentinel would not work here, because overwriting matched characters breaks the sort order).
Loop over all characters of the second word.
Skip the characters of the first word that are smaller than the current character.
If the first word is exhausted or its current character differs, the character was not found, so we can return false.
Otherwise, move past the matched character so it cannot be used again.
You need to ensure that for any character c that appears in the second string, the number of times that c appears in the second string is no greater than the number of times that c appears in the first string.
One efficient way to tackle this is to use a hashmap to store the count of the characters of the first string, and then loop through the characters in the second string to check whether their total count is no greater than that in the first string. The time and space complexity are O(n) in the worst case, where n is the length of the input strings.
Here is the sample code for your reference:
import java.util.HashMap;
import java.util.Map;
public class HashExample {
public static void main(String[] args) {
System.out.println(containsHelper("magikarp", "karma")); // true
System.out.println(containsHelper("magikarp", "kipp")); // false
}
private static boolean containsHelper(String word, String word2) {
Map<Character, Integer> hm = new HashMap<>();
for (int i = 0; i < word.length(); i++) {
Character key = word.charAt(i);
int count = 0;
if (hm.containsKey(key)) {
count = hm.get(key);
}
hm.put(key, ++count);
}
for (int i = 0; i < word2.length(); i++) {
Character key = word2.charAt(i);
if (hm.containsKey(key)) {
int count = hm.get(key);
if (count > 0) {
hm.put(key, --count);
} else {
return false;
}
} else {
return false;
}
}
return true;
}
}
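One possible algorithm is to remove all letters from your second word that don't occur in the first, sort both and then compare them.
Here is a reasonable way to achieve that in Java 8:
// boxed() is needed because an IntStream cannot be collected with Collectors.toList() directly
List<Integer> wordChars = word.chars().sorted().boxed().collect(Collectors.toList());
List<Integer> searchChars = search.chars()
.filter(wordChars::contains).sorted().boxed().collect(Collectors.toList());
// Note: equals is only true when both lists hold exactly the same letters, each the same number of times
return wordChars.equals(searchChars);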

Find if a string has all unique chars using recursion

Just brushing up on some simple Java stuff. I am trying to check if a string has all unique characters, and I figured the best way would be to do so through recursion. Here is what I have so far, but I am getting an out of bounds error. Obviously I'm overlooking something pretty simple:
public class uniqueCharString {
public static void main(String [] args){
String a = "abcdefghijk";
System.out.println(unique(a));
}
public static boolean unique(String s){
if(s.substring(1).contains(String.valueOf(s.charAt(0)))){
return false;
}
else return unique(s.substring(1));
}
}
okay so I finished my way of thinking. I got some good advice from you guys, but I wanted to finish my thought process. How does this solution compare to some of the ones where you guys said use a set?
public static boolean unique(String s){
for(int x = 0; x < s.length(); x++){
if(s.substring(x+1).contains(String.valueOf(s.charAt(x)))){
return false;
}
}
return true;
}
The most efficient way to do this is with a Set. Iterate over each character and add it to a set. If the add operation returns false, the character is already in the set and the String does not have all unique chars; otherwise all are unique.
String s = "some string blah blah blah";
Set<Character> set = new HashSet<>();
boolean allUnique = true;
for (char c : s.toCharArray())
{
boolean elementFirstAdded = set.add(c);
if (!elementFirstAdded)
{
allUnique = false; //Duplicate
break;
}
}
//allUnique is still true here if there was no duplicate
Like @Kon said, a Set is (probably) more efficient.
However, to use recursion you need to add a termination condition: your function never returns true!
A zero or one length string must be unique (well, unique zero-length is a bit ambiguous...): add this at the top:
if (s.length() <= 1) {
return true;
}
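Putting the termination condition together with the recursive step from the question gives something like the sketch below. The class name is made up, and indexOf is used in place of contains only to avoid building a one-character String. Note that this version creates a new substring at every level, so it takes O(n^2) time and recurses n levels deep.

```java
class RecursiveUnique {
    static boolean unique(String s) {
        if (s.length() <= 1) {
            return true;                       // base case: nothing left to compare against
        }
        String rest = s.substring(1);
        if (rest.indexOf(s.charAt(0)) >= 0) {
            return false;                      // the first char appears again later in the string
        }
        return unique(rest);                   // check the remaining characters
    }
}
```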
Another way of doing this, in case you're crazy about performance-tuning homework problems :)
public static boolean unique(String s) {
BitSet chars = new BitSet(Character.SIZE);
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if (chars.get(c)) {
return false;
}
chars.set(c);
}
return true;
}
If you don't need to know what was duplicated, I would just throw everything into a Set, because a Set cannot contain duplicates; they are simply never added. If you just want to know whether there were duplicates, compare the length of the String with the size of the HashSet afterwards.
Like this:
String a = "abcdefghijk";
String[] ary = a.split("");
Set<String> mySet = new HashSet<>(Arrays.asList(ary));
if (mySet.size() != ary.length){
//do something
}

Count Occurence of Needle String in Haystack String, most optimally?

The problem is simple: find "ABC" in "ABCDSGDABCSAGAABCCCCAAABAABC" without using String.split("ABC").
Here is the solution I propose, I'm looking for any solutions that might be better than this one.
public static void main(String[] args) {
String haystack = "ABCDSGDABCSAGAABCCCCAAABAABC";
String needle = "ABC";
char [] needl = needle.toCharArray();
int needleLen = needle.length();
int found=0;
char hay[] = haystack.toCharArray();
int index =0;
int chMatched =0;
for (int i=0; i<hay.length; i++){
if (index >= needleLen || chMatched==0)
index=0;
System.out.print("\nchar-->"+hay[i] + ", with->"+needl[index]);
if(hay[i] == needl[index]){
chMatched++;
System.out.println(", matched");
}else {
chMatched=0;
index=0;
if(hay[i] == needl[index]){
chMatched++;
System.out.print("\nchar->"+hay[i] + ", with->"+needl[index]);
System.out.print(", matched");
}else
continue;
}
if(chMatched == needleLen){
found++;
System.out.println("found. Total ->"+found);
}
index++;
}
System.out.println("Result Found-->"+found);
}
It took me a while creating this one. Can someone suggest a better solution (if any)
P.S. Drop the sysouts if they look messy to you.
How about:
boolean found = haystack.indexOf("ABC") >= 0;
Edit: the question asks for the number of occurrences, so here's a modified version of the above:
public static void main(String[] args)
{
String needle = "ABC";
String haystack = "ABCDSGDABCSAGAABCCCCAAABAABC";
int numberOfOccurences = 0;
int index = haystack.indexOf(needle);
while (index != -1)
{
numberOfOccurences++;
haystack = haystack.substring(index+needle.length());
index = haystack.indexOf(needle);
}
System.out.println("" + numberOfOccurences);
}
If you're looking for an algorithm, google for "Boyer-Moore". You can do this in sub-linear time.
edit to clarify and hopefully make all the purists happy: the time bound on Boyer-Moore is, formally speaking, linear. However the effective performance is often such that you do many fewer comparisons than you would with a simpler approach, and in particular you can often skip through the "haystack" string without having to check each character.
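To give a feel for the skipping behaviour, here is a sketch of the Horspool simplification of Boyer-Moore, which keeps only the bad-character shift table. This is not the full Boyer-Moore algorithm, and the class and method names are made up. It counts every occurrence, including overlapping ones.

```java
import java.util.HashMap;
import java.util.Map;

class HorspoolCount {
    // Counts occurrences of needle in haystack using Boyer-Moore-Horspool.
    static int count(String haystack, String needle) {
        int m = needle.length();
        int n = haystack.length();
        if (m == 0 || m > n) {
            return 0;
        }
        // Shift table: for each char of the needle except the last one,
        // the distance from its last occurrence to the end of the needle.
        Map<Character, Integer> shift = new HashMap<>();
        for (int i = 0; i < m - 1; i++) {
            shift.put(needle.charAt(i), m - 1 - i);
        }
        int count = 0;
        int pos = 0;                                   // current alignment of the needle
        while (pos <= n - m) {
            int j = m - 1;
            while (j >= 0 && haystack.charAt(pos + j) == needle.charAt(j)) {
                j--;                                   // compare from right to left
            }
            if (j < 0) {
                count++;                               // the whole needle matched at pos
            }
            // Skip ahead based on the haystack char under the needle's last position;
            // chars that do not occur in the needle allow a full jump of m.
            pos += shift.getOrDefault(haystack.charAt(pos + m - 1), m);
        }
        return count;
    }
}
```

For the haystack and needle from the question, many alignments are skipped without ever being compared, which is where the practical speed-up comes from.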
You say your challenge is to find ABC within a string. If all you need is to know if ABC exists within the string, a simple indexOf() test will suffice.
If you need to know the number of occurrences, as your posted code tries to find, a simple approach would be to use a regex:
public static int countOccurrences(String haystack, String regexToFind) {
Pattern p = Pattern.compile(regexToFind);
Matcher m = p.matcher(haystack); // get a matcher object
int count = 0;
while(m.find()) {
count++;
}
return count;
}
Have a look at http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm
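For reference, a compact sketch of Knuth-Morris-Pratt used for counting could look like the following. The class and method names are made up for this example. The failure table lets the search continue after a mismatch without moving backwards in the haystack, and overlapping occurrences are counted.

```java
class KmpCount {
    // failure[i] = length of the longest proper prefix of needle[0..i] that is also a suffix of it.
    static int[] buildFailure(String needle) {
        int[] failure = new int[needle.length()];
        int k = 0;
        for (int i = 1; i < needle.length(); i++) {
            while (k > 0 && needle.charAt(i) != needle.charAt(k)) {
                k = failure[k - 1];            // fall back to the next shorter border
            }
            if (needle.charAt(i) == needle.charAt(k)) {
                k++;
            }
            failure[i] = k;
        }
        return failure;
    }

    // Counts occurrences of needle in haystack in O(n + m) time.
    static int count(String haystack, String needle) {
        if (needle.isEmpty()) {
            return 0;
        }
        int[] failure = buildFailure(needle);
        int count = 0;
        int k = 0;                             // number of needle chars currently matched
        for (int i = 0; i < haystack.length(); i++) {
            while (k > 0 && haystack.charAt(i) != needle.charAt(k)) {
                k = failure[k - 1];
            }
            if (haystack.charAt(i) == needle.charAt(k)) {
                k++;
            }
            if (k == needle.length()) {
                count++;
                k = failure[k - 1];            // keep going so overlapping matches are found
            }
        }
        return count;
    }
}
```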
public class NeedleCount
{
public static void main(String[] args)
{
String s="AVBVDABCHJHDFABCJKHKHF",ned="ABC";
int nedIndex=-1,count=0,totalNed=0;
for(int i=0;i<s.length();i++)
{
if(i>ned.length()-1)
nedIndex++;
else
nedIndex=i;
if(s.charAt(i)==ned.charAt(nedIndex))
count++;
else
{
nedIndex=0;
count=0;
if(s.charAt(i)==ned.charAt(nedIndex))
count++;
else
nedIndex=-1;
}
if(count==ned.length())
{
nedIndex=-1;
count=0;
totalNed++;
System.out.println(totalNed+" needle found at index="+(i-(ned.length()-1)));
}
}
System.out.print("Total Ned="+totalNed);
}
}
As others have asked: better in what sense? A regexp-based solution will be the most concise and readable (:-) ). Boyer-Moore (http://en.wikipedia.org/wiki/Boyer–Moore_string_search_algorithm) will be the most efficient in terms of time (O(N)).
If you don't mind implementing a new datastructure as replacement for strings, have a look at Tries: http://c2.com/cgi/wiki?StringTrie or http://en.wikipedia.org/wiki/Trie
If you don't look for a regular expression but an exact match they should provide the fastest solution (proportional to length of search string).
public class FindNeedleInHaystack {
String hayStack="ASDVKDBGKBCDGFLBJADLBCNFVKVBCDXKBXCVJXBCVKFALDKBJAFFXBCD";
String needle="BCD";
boolean flag=false;
public void findNeedle() {
//Below for loop iterates the string by each character till reaches max length
for(int i=0;i<hayStack.length();i++) {
//When i=n (0,1,2...) we are at the nth character of hayStack. Compare it with the first char of needle,
//but only if the whole needle still fits (otherwise charAt(i+j) below could run past the end of hayStack)
if(i + needle.length() <= hayStack.length() && hayStack.charAt(i)==needle.charAt(0)) {
//If the condition is true, we reach the for loop that iterates over the needle's length.
//Now the needle's (BCD) first char is 'B' and the nth char of hayStack is 'B', so compare the remaining characters of needle with hayStack using the loop below.
for(int j=0;j<needle.length();j++) {
//for example at i=9 is 'B', i+j is i+0,i+1,i+2...
//if condition return true, loop continues or else it will break and goes to i+1
if(hayStack.charAt(i+j)==needle.charAt(j)) {
flag=true;
} else {
flag=false;
break;
}
}
if(flag) {
System.out.print(i+" ");
}
}
}
}
}
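The code below runs in O(n) time because tail only ever moves forward through the n chars of the haystack. Note that after a partial match it resumes after the matched characters, so it can miss an occurrence that starts inside a partial match (for example, needle "AAB" in "AAAB"). If you want to capture the start and end indexes of each needle, uncomment the commented-out code below. The solution works directly on characters and uses no Java String functions (pattern matching, indexOf, substring, etc.), as they may bring extra space/time complexity.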
char[] needleArray = needle.toCharArray();
char[] hayStackArray = hayStack.toCharArray();
//java.util.LinkedList<Pair<Integer,Integer>> indexList = new LinkedList<>();
int head;
int tail = 0;
int needleCount = 0;
while(tail<hayStackArray.length){
head = tail;
boolean proceed = false;
for(int j=0;j<needleArray.length;j++){
if(head+j<hayStackArray.length && hayStackArray[head+j]==needleArray[j]){
tail = head+j;
proceed = true;
}else{
proceed = false;
break;
}
}
if(proceed){
// indexList.add(new Pair<>(head,tail));
needleCount++;
}
++tail;
}
System.out.println(needleCount);
//System.out.println(indexList);
