Find duplicates in unsorted string array - O(nlogn)

I have an array of String hash values, for example: "123-51s-12as-dasd1-das-41c-sadasdgt-31". I need to find out whether it contains any duplicates. The catch is that I need to find them all in O(n log n).
1) My idea:
To do this I could use the binary search algorithm. But as far as I know, binary search works only on a sorted numeric array. So I ask: is there any way to sort a String array?
2) I am open to any other answers. My question is:
How do I find all duplicates in an array of unknown strings in O(n log n)?

Since the time bound is O(n log n), you can safely sort the array first and then scan it from left to right, comparing each string with its neighbour to find the duplicates.
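A minimal sketch of the sort-then-scan idea, assuming the array contains no null entries. The class and method names (SortAndScan, findDuplicates) are made up for illustration and do not come from the question.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class SortAndScan {
    // Returns every string that occurs more than once in the input.
    // Assumption: the array holds no null entries.
    static Set<String> findDuplicates(String[] hashes) {
        String[] sorted = hashes.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);                // natural String ordering, O(n log n) comparisons
        Set<String> duplicates = new LinkedHashSet<>();
        for (int i = 1; i < sorted.length; i++) {
            // After sorting, equal strings sit next to each other.
            if (sorted[i].equals(sorted[i - 1])) {
                duplicates.add(sorted[i]);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        String[] hashes = {"123", "51s", "12as", "51s", "das", "123"};
        System.out.println(findDuplicates(hashes));
    }
}
```

String implements Comparable, so Arrays.sort works on a String[] directly; no numeric conversion is needed.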

You can use a Set<String> and insert your strings into it while walking the array: walking the array is O(n), and each insertion is O(1) on average for a HashSet (O(log(n)) for a TreeSet), so the whole pass stays within the O(n log n) bound. If .add() returns false, the string is a duplicate:
public Set<String> getDups(String[] hashes)
{
    Set<String> all = new HashSet<String>();
    Set<String> ret = new HashSet<String>();
    for (final String hash: hashes)
        if (!all.add(hash)) // already seen
            ret.add(hash);
    return ret;
}

Related

How to check String contains one of Strings in collection

I want to check whether the target string contains any of the strings in a collection, and find the longest match. E.g.
Target string: str = "eignelaiwgn"
Collection strings: eig, a, eb, eigne, eignep
The result needs to be eigne
First I thought of a HashMap, but it is not sorted. So I tried putting the collection strings into an ArrayList, sorting the list by string length, and then using a for-each loop to check
if ( str.contains("eigne") )
This needs to loop over the list each time. Is there a better (faster) way to achieve this?

Seems pretty straightforward with streams:
String targetString = "eignelaiwgn";
Collection<String> collection = Arrays.asList("eig", "a", "eb", "eigne", "eignep");
Optional<String> longestMatch = collection.stream()
    .filter(targetString::contains)
    .max(Comparator.comparingInt(String::length));
longestMatch.ifPresent(System.out::println); // eigne
This reads as: for every string in the collection, check whether the target string contains it; of the strings that match, return the one with the maximum length. (As the collection might be empty, or no string in the collection might pass the filter, max returns an Optional<String>.)

You could use a TreeSet for the same.
String str = "eignelaiwgn";
// Assuming that the 'sub-strings' are stored in a list
List<String> myList = Arrays.asList("eig", "a", "eb", "eigne", "eignep");
// Create a TreeSet sorted by descending length. Ties are broken alphabetically,
// because a TreeSet drops any element its comparator reports as equal to an existing one.
Set<String> treeSet = new TreeSet<>((a, b) -> a.length() != b.length()
    ? Integer.compare(b.length(), a.length())
    : a.compareTo(b));
treeSet.addAll(myList);
String containsSub = treeSet.stream().filter(e -> str.contains(e))
    .findFirst()
    .orElse("Not found");
We then iterate over the TreeSet and take the first element that occurs in the original string. Since the TreeSet is sorted in descending order of length, iteration goes from the longest string to the shortest, so the first match is the longest one.

You can use the getLevenshteinDistance() method of the StringUtils class (Apache Commons Lang), which tells you the number of changes needed to turn one String into another. You can print the string with the minimum number of changes needed, which is your answer. See this document -> LevenshteinDistance
Also look at the difference method of the same class, which tells you the difference between two strings.

You could use a suffix tree. Please follow this link:
https://www.geeksforgeeks.org/pattern-searching-using-suffix-tree/

Java sorting alphabetical order of array

I am trying to sort in alphabetical order.
I pasted my code snippet below, along with the issue I'm having.
String[] arr = new String[3];
arr[0] = config.getfoldersdata() + "." + config.getCars();
arr[1] = config.getType();
arr[2] = entry.getVals() ? "Data" : "Entry";
result.add(arr);
I want to sort alphabetically by .getCars.
The above code returns the three arr elements in a single line/row. There can be multiple records/rows/lines, so I want them to be sorted alphabetically by what is returned by .getCars.
The .getCars will return a string.
I have tried Arrays.sort(), but there is no change in the result (no sorting).
Where am I going wrong?

Your design is awful: the arrays should in fact be objects.
But anyway, you just need a comparator that sorts the arrays by the natural ordering of their first element:
result.sort(Comparator.comparing(array -> array[0]));
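The "arrays should be objects" remark could look roughly like the sketch below. The Row class, its field names and the sample values are hypothetical stand-ins for the three array slots in the question; the sort key is kept in its own field so it is not mixed with the folder prefix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class RowSortSketch {
    // Hypothetical replacement for the String[3] rows from the question.
    static final class Row {
        final String cars;   // stands in for what config.getCars() returned
        final String type;   // stands in for what config.getType() returned
        final String kind;   // "Data" or "Entry"

        Row(String cars, String type, String kind) {
            this.cars = cars;
            this.type = type;
            this.kind = kind;
        }

        String getCars() {
            return cars;
        }
    }

    // Returns a copy of the rows sorted by the cars value (natural String order).
    static List<Row> sortedByCars(List<Row> rows) {
        List<Row> copy = new ArrayList<>(rows);
        copy.sort(Comparator.comparing(Row::getCars));
        return copy;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("volvo", "suv", "Data"),
                new Row("audi", "sedan", "Entry"),
                new Row("bmw", "coupe", "Data"));
        for (Row row : sortedByCars(rows)) {
            System.out.println(row.getCars() + " " + row.type + " " + row.kind);
        }
    }
}
```

Natural String ordering is case-sensitive; if that matters, Comparator.comparing(Row::getCars, String.CASE_INSENSITIVE_ORDER) is one alternative.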

As far as I know, you can't sort in alphabetical order using normal arrays. Try using a TreeSet, like this: TreeSet tree = new TreeSet(); that way it will be sorted in

How to retrieve elements from sorted TreeSet using Binary Search?

I am trying to merge multiple sorted lists into one TreeSet, and then I am thinking of applying the binary search algorithm to that TreeSet to retrieve an element in O(log n) time.
Below is my code, in which I pass a List of Lists to one of my methods and combine them into a TreeSet to avoid duplicates. All the lists inside inputs are sorted:
private TreeSet<Integer> tree = new TreeSet<Integer>();
public void mergeMultipleLists(final List<List<Integer>> inputs) {
    tree = new TreeSet<Integer>();
    for (List<Integer> input : inputs) {
        for(Integer ii : input) {
            tree.add(ii);
        }
    }
}
public List<Integer> getItem(final Integer x) {
    // extract elements from TreeSet in O(log n)
}
First of all, is this the right way to merge multiple sorted lists into a TreeSet? Is there any direct way to merge multiple sorted lists into a TreeSet efficiently?
Secondly, how would I extract an element from that TreeSet in O(log n) time? I would like to find an element x in that TreeSet; if it is there, return it, and if it is not there, return the next largest value from the TreeSet.
Or maybe I am better off with another data structure than the one I am using currently?
UPDATED CODE:
private TreeSet tree = new TreeSet();
public SearchItem(final List<List<Integer>> inputs) {
    tree = new TreeSet<Integer>();
    for (List<Integer> input : inputs) {
        tree.addAll(input);
    }
}
public Integer getItem(final Integer x) {
    if(tree.contains(x)) {
        return x;
    } else {
        // now how do I extract next largest
        // element from it if x is not present
    }
}

TreeSet is backed by a NavigableMap, a TreeMap specifically. Calling contains() on a TreeSet delegates to TreeMap.containsKey(), which performs a binary search down the tree.
You can check whether an object is contained in the set by using TreeSet.contains(), but you have to have the object first. If you want to be able to look up and retrieve an object, then a Map implementation will be better.

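You could use TreeSet.floor(), which according to the docs:
"Returns the greatest element in this set less than or equal to the given element, or null if there is no such element."
If by "next largest" you mean the smallest element that is greater than or equal to x, TreeSet.ceiling() is the counterpart in the other direction.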
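A short sketch of the navigation methods on TreeSet, assuming "next largest" in the question means the smallest element greater than or equal to x. The class and method names (NavigableLookup, findOrNext) are invented for this example.

```java
import java.util.Arrays;
import java.util.TreeSet;

class NavigableLookup {
    // Returns x if it is in the set, otherwise the smallest element greater
    // than x, or null if no such element exists. Runs in O(log n).
    static Integer findOrNext(TreeSet<Integer> tree, Integer x) {
        return tree.ceiling(x);
    }

    public static void main(String[] args) {
        TreeSet<Integer> tree = new TreeSet<>(Arrays.asList(1, 3, 5, 9));
        System.out.println(findOrNext(tree, 3));   // element is present
        System.out.println(findOrNext(tree, 4));   // falls through to the next larger one
        System.out.println(tree.floor(4));         // greatest element <= 4
        System.out.println(findOrNext(tree, 10));  // nothing larger, so null
    }
}
```

Because ceiling() already covers the "element is present" case, no separate contains() check is needed.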

TreeSet is by its nature a sorted set, and it uses a red-black tree (via TreeMap) as its backing structure.
Basically: TreeSet.add(E) -> TreeMap.put(E, dummyValue);
As it is already a sorted binary tree structure, any 'get' or 'contains' is an O(log n) operation.
Your code and your question don't line up, though.
You're flattening a List<List<Integer>> and putting all the values into the set to get the unique elements (or, at least, that's what this code will do).
But then your following method says "given this integer, give me a List<Integer>", which isn't achievable with the above code.
So, let me answer your questions in order:
1) Sure/Yes.
2) No. You misunderstand Sets (you can't extract by design). If you can do Set.contains(e), then you HAVE the element and need not extract anything.
If you need to do something like a "Set extraction", then use a TreeMap, or turn your set back into a list and do myList.get(Collections.binarySearch(myList, myElement));
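A hedged sketch of the list-plus-binary-search route. Collections.binarySearch(list, key) returns the index of the key if it is found, and otherwise -(insertion point) - 1, which can be used to fall back to the next larger element. The names SortedListLookup and findOrNext are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

class SortedListLookup {
    // Assumes the list is sorted in ascending order and supports fast random access.
    // Returns key if present, otherwise the next larger element, or null if none exists.
    static Integer findOrNext(List<Integer> sorted, Integer key) {
        int idx = Collections.binarySearch(sorted, key);
        if (idx < 0) {
            idx = -idx - 1;   // convert the "not found" result into the insertion point
        }
        return idx < sorted.size() ? sorted.get(idx) : null;
    }

    public static void main(String[] args) {
        TreeSet<Integer> tree = new TreeSet<>(Arrays.asList(9, 1, 5, 3));
        List<Integer> sorted = new ArrayList<>(tree);   // TreeSet iterates in ascending order
        System.out.println(findOrNext(sorted, 5));
        System.out.println(findOrNext(sorted, 4));
        System.out.println(findOrNext(sorted, 10));
    }
}
```

Copying the set into a list costs O(n), so this only pays off when the list is built once and queried many times.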

Compare Set & List in my case

I have a List<Integer>, and this list contains duplicated elements:
//myList content is something like e.g. 1,2,1,3,3,6,7,...
List<Integer> myList = getNumbers();
I also have a Set<String>. As you all know, a Set only contains unique elements, no duplicates. My Set<String> contains integers stored as Strings:
//The Set content is String type integer , e.g. "1", "3", "5" …
Set<String> mySet = getNumSet();
I would like to compare mySet with myList to figure out which elements mySet has but myList doesn't have, and remove those elements from mySet.
The way I do it now is to use nested iteration, like the following:
for(Integer i : myList){
    for(String s : mySet){
        if(!myList.contains(Integer.valueOf(s))){
            mySet.remove(s);
        }
    }
}
Is there a more efficient way to do it than mine?

The easiest way may be to use Collection#retainAll(Collection<?> c), which an implementation can optimize depending on the collection's type.
mySet.retainAll(myList)
However, mySet and myList must have the same element type (Set<X> and List<X>) for any elements to match. I advise you to change the declaration of mySet, fill it with something like Integer#valueOf(String s), and then use the retainAll method.
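One way this could look, as a sketch: convert the strings to Integer first, then call retainAll. The names RetainAllSketch and keepCommon are hypothetical, and the code assumes every string in the set parses as an int.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RetainAllSketch {
    // Converts the String set to a Set<Integer> and keeps only the values
    // that also occur in the list.
    // Assumption: every string is a valid int, otherwise Integer.valueOf throws.
    static Set<Integer> keepCommon(Set<String> stringSet, List<Integer> list) {
        Set<Integer> numbers = new HashSet<>();
        for (String s : stringSet) {
            numbers.add(Integer.valueOf(s));
        }
        // Wrapping the list in a HashSet makes each lookup O(1) on average
        // instead of a linear scan of the list.
        numbers.retainAll(new HashSet<>(list));
        return numbers;
    }

    public static void main(String[] args) {
        Set<String> mySet = new HashSet<>(Arrays.asList("1", "3", "5"));
        List<Integer> myList = Arrays.asList(1, 2, 1, 3, 3, 6, 7);
        System.out.println(keepCommon(mySet, myList));
    }
}
```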

// Remove from mySet every element that does NOT occur in myList
for (Iterator<String> it = mySet.iterator(); it.hasNext();) {
    if(!myList.contains(Integer.valueOf(it.next()))){
        it.remove();
    }
}

Copy the set.
Iterate through the list, removing found items from the copy.
Subtract the copy from the original set.
Suppose you have m elements in the set and n elements in the list.
Then removing an item from the set is O(log m) and iterating through the list is O(n). The overall complexity of that step is O(n × log m). The subtraction is at most O(m × log m), so the overall complexity is O(s × log s) where s = max(m, n).
The complexity of your algorithm is at most O(n² × m × log m), because you
iterate over the list (O(n)) and for each item
iterate through the set (O(m)),
finding the corresponding item in the list (O(n)), and finally
removing it from the set if appropriate (O(log m)).
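The three steps above can be sketched as follows. The names CopyAndSubtract and removeMissing are made up, and the code assumes the strings in the set use the same canonical decimal form that String.valueOf produces (no leading zeros or plus signs).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CopyAndSubtract {
    // Removes from mySet every element that has no counterpart in myList.
    // Assumption: strings in mySet are canonical decimal integers such as "1" or "-7".
    static void removeMissing(Set<String> mySet, List<Integer> myList) {
        Set<String> missing = new HashSet<>(mySet);   // step 1: copy the set
        for (Integer i : myList) {
            missing.remove(String.valueOf(i));        // step 2: drop items found in the list
        }
        mySet.removeAll(missing);                     // step 3: subtract what was never found
    }

    public static void main(String[] args) {
        Set<String> mySet = new HashSet<>(Arrays.asList("1", "3", "5"));
        List<Integer> myList = Arrays.asList(1, 2, 1, 3, 3, 6, 7);
        removeMissing(mySet, myList);
        System.out.println(mySet);
    }
}
```

With a HashSet the remove and lookup steps are O(1) on average rather than O(log m), so this sketch runs in roughly O(n + m).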

Actually, more information is needed to find out the best approach.
If myList.size() >> mySet.size(), follow Oswald's solution, or:
Create a new Set of String.
For each item in myList, if its String form is in mySet, save it to the newSet.
Call mySet.retainAll(newSet).
If mySet.size() >> myList.size(), the best solution would be to convert myList to a new Set of String and then call the mySet.retainAll method.

How to efficiently calculate how many elements in a list are the same?

I need to do the following task:
I have a list of items.
Each of the items also has a List of strings like "gkejgueieriug".
Now I need to run through the list and check, for each item, how many of the strings in its list are also in the current element.
Here is some small pseudocode:
OneItem;
List AllItems;
for Item in AllItems:
int count = number strings in Item.Values which are also in OneItem.Values
Because the data is very big, I need some help to make an efficient implementation.
How do I do this? Should I use a HashMap? How do I count the overlap?

Your question doesn't provide detailed information about the types you want to compare, so I assume you have a List<Item>, where each item has a String and its own List<Item>.
So first I would create a HashSet of the Strings of the Items in your AllItems list: iterate over the list and add the String of each Item to the HashSet.
Then, in the second step, iterate over the list again, iterate over the List inside each Item, and check whether each String there is in the HashSet that was created before.
If you have to check this several times, you can keep the HashSet as a cache, which you refresh when the list gets changed.
// Step 1: Create Set of Strings
Set<String> allStrings = new HashSet<String>();
for (Item item : allList) {
    allStrings.add(item.getString());
}
// Step 2: Calculate occurrences
for (Item item : allList) {
    for (Item internalItem : item.getItems()) {
        if (allStrings.contains(internalItem.getString())) {
            // Count one up for this String
            // This might be done by replacing the HashSet by a HashMap and use its values for counting
        }
    }
}

Make Item.Values a Set rather than a List. A decent Set implementation - like a HashSet - will run the contains() operation in constant time. Then iterate over one set and increment a count each time the other set contains the element.
An optimization is to always iterate over the smaller set. That way the counting operation is O(n) where n is the size of the smaller set.
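A minimal sketch of that counting step, assuming the values of each item are already held in sets. OverlapCount and countOverlap are invented names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class OverlapCount {
    // Counts the strings that occur in both sets by iterating over the
    // smaller set and probing the larger one.
    static int countOverlap(Set<String> a, Set<String> b) {
        Set<String> smaller = a.size() <= b.size() ? a : b;
        Set<String> larger = (smaller == a) ? b : a;
        int count = 0;
        for (String s : smaller) {
            if (larger.contains(s)) {   // O(1) on average for a HashSet
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Set<String> oneItemValues = new HashSet<>(Arrays.asList("abc", "def", "ghi"));
        Set<String> otherValues = new HashSet<>(Arrays.asList("def", "ghi", "jkl", "mno"));
        System.out.println(countOverlap(oneItemValues, otherValues));
    }
}
```

Note that turning a List into a Set discards repeated strings, so this counts distinct shared values only.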

If the comparison is only one way (i.e. only counting strings in one list that are also in another, but NOT the other way around), then the best way of doing it would probably be to put both lists in a Set instead:
Set<String> firstSet = ...
Set<String> secondSet = ...
for (String value : firstSet)
{
    if (secondSet.contains(value))
    {
        // Do what you want with the value.
        // Suggestion: add the value to a separate set
        // so you can track duplicates etc.
    }
}

With this code you create an ArrayList of Maps, one per item, mapping each of the item's string values to its number of matches in OneItem.Values:
ArrayList<Map<String,Integer>> matches = new ArrayList<>();
for (Item i : AllItems) {
    Map<String,Integer> map = new HashMap<>();
    for (String s : i.Values) { // the values of the current item
        map.put(s, Collections.frequency(OneItem.Values, s));
    }
    matches.add(map);
}
