Fastest way to find substring in JAVA

Let's say I have a list of names.
ArrayList<String> nameslist = new ArrayList<String>();
nameslist.add("jon");
nameslist.add("david");
nameslist.add("davis");
nameslist.add("jonson");
and this list contains a few thousand names. What is the fastest way to count how many names in the list start with a given name?
String name = "jon";
The result should be 2.
I have tried comparing every element of the list using the substring function. It works, but it is very slow, especially when the list is huge.
Thanks in advance.

You could use a TreeSet for O(log n) access and write something like:
TreeSet<String> set = new TreeSet<String>();
set.add("jon");
set.add("david");
set.add("davis");
set.add("jonson");
set.add("henry");
Set<String> subset = set.tailSet("jon");
int count = 0;
for (String s : subset) {
if (s.startsWith("jon")) count++;
else break;
}
System.out.println("count = " + count);
which prints 2 as you expect.
Alternatively, you could use Set<String> subset = set.subSet("jon", "joo"); to return the full set of names that start with "jon", but you need to supply the first invalid entry that follows the jons (in this case: "joo").
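The upper bound for subSet does not have to be worked out by hand. Below is a minimal sketch, assuming a non-empty prefix whose last character is not Character.MAX_VALUE. The class and method names (PrefixRange, upperBound, countWithPrefix) are made up for illustration and are not part of the answer above.

```java
import java.util.Arrays;
import java.util.TreeSet;

class PrefixRange {
    // Smallest string that sorts after every string starting with prefix:
    // the prefix with its last character incremented ("jon" -> "joo").
    // Assumption: prefix is non-empty and does not end in Character.MAX_VALUE.
    static String upperBound(String prefix) {
        int last = prefix.length() - 1;
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }

    static int countWithPrefix(TreeSet<String> names, String prefix) {
        // subSet returns a view: lower bound inclusive, upper bound exclusive.
        return names.subSet(prefix, upperBound(prefix)).size();
    }

    public static void main(String[] args) {
        TreeSet<String> names = new TreeSet<>(
                Arrays.asList("jon", "david", "davis", "jonson", "henry"));
        System.out.println(countWithPrefix(names, "jon")); // 2
        System.out.println(countWithPrefix(names, "dav")); // 2
    }
}
```

Keep in mind that a TreeSet silently drops duplicate names, so if the original list can contain the same name twice, the count will differ from a plain loop over the list.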

Have a look at the trie. It's a data structure designed for fast searches by word prefix. You may need to adapt it a bit in order to get the number of leaves in the subtree, but in any case you do not traverse the entire list.
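As a rough illustration of the trie idea, here is a minimal sketch that stores a counter in every node instead of counting leaves afterwards. The class PrefixTrie and its methods are hypothetical names chosen for this example; a production trie would likely use a more compact child representation.

```java
import java.util.HashMap;
import java.util.Map;

class PrefixTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int count; // number of inserted words whose path passes through this node
    }

    private final Node root = new Node();

    void insert(String word) {
        Node node = root;
        node.count++;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
            node.count++;
        }
    }

    // Walks at most prefix.length() nodes, independent of how many words are stored.
    int countWithPrefix(String prefix) {
        Node node = root;
        for (int i = 0; i < prefix.length(); i++) {
            node = node.children.get(prefix.charAt(i));
            if (node == null) {
                return 0;
            }
        }
        return node.count;
    }

    public static void main(String[] args) {
        PrefixTrie trie = new PrefixTrie();
        for (String name : new String[] {"jon", "david", "davis", "jonson"}) {
            trie.insert(name);
        }
        System.out.println(trie.countWithPrefix("jon")); // 2
    }
}
```

Storing the counter per node means a query costs O(length of the prefix), at the price of one extra int per node.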

The complexity of searching an ArrayList (or any linear array) is O(n), where n is the number of elements in the array.
For better performance, have a look at the trie.

Iterate over the ArrayList and, for each element, check whether it begins with jon. The time complexity is O(n).

What exactly does "very slow" mean?
Really the only way to do this is to loop through the list and check every element:
int count = 0;
for (String name : nameslist) {
if (name.startsWith("jon")) {
count++;
}
}
System.out.println("Found: " + count);

If the strings in your list are not too long, you can use this trick: store every prefix in a HashSet, and each lookup becomes ~O(1) on average, at the cost of extra memory:
// Preprocessing
List<String> list = Arrays.asList("hello", "world"); // Your list
Set<String> set = new HashSet<>();
for (String s : list) {
    for (int i = 1; i <= s.length(); i++) {
        set.add(s.substring(0, i));
    }
}
// Now you can test for a prefix
assert set.contains("wor");
If the strings are too long for that, you can use a full-text search engine such as Apache Lucene.
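A set of prefixes only answers whether some name starts with a given prefix. To get the count the question asks for, the same preprocessing can fill a map from prefix to number of names. This is a sketch under the same "strings are short" assumption; PrefixCounts and build are illustrative names, not from the answer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PrefixCounts {
    // Maps every prefix of every name to the number of names that start with it.
    static Map<String, Integer> build(List<String> names) {
        Map<String, Integer> counts = new HashMap<>();
        for (String s : names) {
            for (int i = 1; i <= s.length(); i++) {
                counts.merge(s.substring(0, i), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                build(Arrays.asList("jon", "david", "davis", "jonson"));
        System.out.println(counts.getOrDefault("jon", 0)); // 2
        System.out.println(counts.getOrDefault("xyz", 0)); // 0
    }
}
```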

I'd suggest creating a Runnable for processing the list elements. Then create an ExecutorService with a fixed pool size, which processes the elements concurrently.
Rough example:
ExecutorService executor = Executors.newFixedThreadPool(5);
for (String str : coll){
Runnable r = new StringProcessor(str);
executor.execute(r);
}

I suggest a TreeSet.
Access every element in a similar way and increment the count. Because the set is sorted, you can stop early, which improves performance algorithmically.
int count = 0;
Iterator<String> iter = set.iterator(); // set is a TreeSet<String>, so iteration is in sorted order
while (iter.hasNext()) {
    String name = iter.next();
    if (name.startsWith("jon")) {
        count++;
    } else if (name.compareTo("jon") > 0) {
        break; // sorted past every possible "jon..." entry
    }
}
This break skips the remaining string comparisons.

You could consider the Boyer–Moore string search algorithm.
Its worst-case complexity is O(n + m) when the Galil rule is applied.

You need to iterate over each name and check whether the given name occurs within it.
String name = "jon";
int count = 0;
for (String n : nameslist) {
    if (n.contains(name)) { // use startsWith(name) to match prefixes only
        count++;
    }
}

Related

Removing duplicate elements & count repetitions in ArrayList

This is more difficult than I expected. I have a sorted ArrayList of Strings (words), and my task is to remove the repetitions and print out a list of each word, followed by the number of the word's repetitions. Suffice it to say that it's more complex than I expected. After trying different things, I decided to use a HashMap to store the words (key), value(repetitions).
This is the code. Dictionary is the sorted ArrayList and Repetitions that HashMap.
public void countElements ()
{
String word=dictionary.get(0);
int wordCount=1;
int count=dictionary.size();
for (int i=0;i<count;i++)
{
word=dictionary.get(i);
for (int j=i+1; j<count;j++)
{
if(word.equals(dictionary.get(j)))
{
wordCount=wordCount+1;
repetitions.put(word, wordCount);
dictionary.remove(j--);
count--;
}
}
}
For some reason that I do not understand (I'm a beginner), after I call the dictionary.remove(j--) method, variable j decrements by 1, even though it should be i+1. What am I missing? Any ideas on how to do this properly would be appreciated. I know that it would be best to use an iterator, but that can become even more confusing.
Many thanks.

A version which uses streams:
final Map<String, Long> countMap = dictionary.stream().collect(
Collectors.groupingBy(word -> word, LinkedHashMap::new, Collectors.counting()));
System.out.println("Counts follow");
System.out.println(countMap);
System.out.println("Duplicate-free list follows");
System.out.println(countMap.keySet());
Here we group the elements of the list (using Collectors.groupingBy), with each element (i.e. each word) as a key in the resulting map, and count the occurrences of each word (using Collectors.counting()).
The outer collector (groupingBy) uses the counting collector as a downstream collector that collects (here, counts) all the occurrences of a single word.
We use a LinkedHashMap to build the map because it maintains the order in which key-value pairs were added, and we want to keep the order the words had in your initial list.
One more thing: countMap.keySet() is not a List. If you want a List in the end, use new ArrayList<>(countMap.keySet()).

This code will serve your purpose. Now dictionary would contain the unique words and hashmap would contain the frequency count of each word.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class newq {
    public static void main(String[] args) {
        List<String> dictionary = new ArrayList<>();
        dictionary.add("hello");
        dictionary.add("hello");
        dictionary.add("asd");
        dictionary.add("qwet");
        dictionary.add("qwet");
        Map<String, Integer> hs = new HashMap<>();
        int i = 0;
        while (i < dictionary.size()) {
            String word = dictionary.get(i);
            if (hs.containsKey(word)) { // word seen before
                hs.put(word, hs.get(word) + 1); // increase its count
                dictionary.remove(i); // drop the duplicate; the next element shifts into index i
            } else {
                hs.put(word, 1); // first occurrence
                i++;
            }
        }
        // Print the counts without modifying the map
        for (Map.Entry<String, Integer> pair : hs.entrySet()) {
            System.out.println(pair.getKey() + " = " + pair.getValue());
        }
        // Print the duplicate-free list
        for (String word : dictionary) {
            System.out.println(word);
        }
    }
}

If you don't want 'j' to be decremented, you should use j-1.
Using j--, --j, j++, or ++j changes the value of the variable.
This link has a good explanation and simple examples of post- and pre-incrementing.
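To make the post-decrement behaviour concrete, here is a small sketch. The helper removeAndStepBack is a hypothetical name used only for this demonstration. It shows that remove(j--) passes the old value of j to remove() and only then decrements the variable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PostDecrementDemo {
    // Removes the element at index j, then returns j after the post-decrement.
    static int removeAndStepBack(List<String> words, int j) {
        words.remove(j--); // remove() receives the old j; afterwards j is one smaller
        return j;
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList("a", "b", "b", "c"));
        int j = removeAndStepBack(words, 2);
        System.out.println(words + ", j = " + j); // [a, b, c], j = 1
    }
}
```

Note that remove(j - 1) would delete a different element (the one before index j). In the question's loop the decrement appears to be what keeps the scan correct, since the elements after the removed one shift left by one position; the miscount more likely comes from wordCount never being reset for a new word.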

How to check multiple contains operations faster?

I have a String list as below. I want to do some calculations based on if this list has multiple elements with same value.
I got nearly 120k elements and when I run this code it runs too slow. Is there any faster approach than contains method?
List<String> words= getWordsFromDB(); //words list has nearly 120k elements
List<String> tempWordsList = new LinkedList<String>(); //empty list
String[] keys = getKeysFromDB();
List<String> tempKeysList = new LinkedList<String>();
for (int x = 0; x < words.size(); x++) {
if (!tempWordsList.contains(words.get(x))) {
tempWordsList.add(words.get(x));
String key= keys[x];
tempKeysList.add(key);
} else {
int index = tempWordsList.indexOf(words.get(x));
String m = tempKeysList.get(index);
String n = keys[x];
if (!m.contains(n)) {
String newWord = m + ", " + n;
tempKeysList.set(index, newWord);
}
}
}
EDIT: The words list comes from a database, and the problem is that a service is continuously updating and inserting data into this table. I don't have any access to this service, and there are other applications that use the same table.
EDIT2: I have updated the question with the full code.

You are searching the list twice per word: once for contains() and once for indexOf(). You could replace contains() with indexOf() and test the result for -1; otherwise, reuse the result instead of calling indexOf() again. But you are certainly using the wrong data structure. What exactly do you need the index for? Do you need it at all? I would use a HashSet, or a HashMap if you need to associate other data with each word.

//1) if you can avoid using linked list use below solution
List<String> words= getWordsFromDB(); //words list has nearly 120k elements
//if you can avoid using linked list, use set instead
Set<String> set=new HashSet<>();
for (String s:words) {
if (!set.add(s)) {
//do some calculations
}
}
//2) if you can't avoid using linked list use below code
List<String> words= getWordsFromDB(); //words list has nearly 120k elements
List<String> tempList = new LinkedList<String>(); //empty list
//if you can't avoid the LinkedList (tempList), track the words already seen in a set as well
Set<String> set=new HashSet<>();
for (String s:words) {
if (set.add(s)) {
tempList.add(s);
} else {
int a = tempList.indexOf(s);
//do some calculations
}
}

LinkedList.get() runs in O(N) time. Either use ArrayList with O(1) lookup time, or avoid indexed lookups altogether by using an iterator:
for (String word : words) {
if (!tempList.contains(word)) {
tempList.add(word);
} else {
int firstIndex = tempList.indexOf(word);
//do some calculations
}
}
Disclaimer: The above was written under the questionable assumption that words is a LinkedList. I would still recommend the enhanced-for loop, since it's more conventional and its time complexity is not implementation-dependent. Either way, the suggestion below still stands.
You can further improve by replacing tempList with a HashMap. This will avoid the O(N) cost of contains() and indexOf():
Map<String, Integer> indexes = new HashMap<>();
int index = 0;
for (String word : words) {
Integer firstIndex = indexes.putIfAbsent(word, index++);
if (firstIndex != null) {
//do some calculations
}
}
Based on your latest update, it looks like you're trying to group "keys" by their corresponding "word". If so, you might give streams a spin:
List<String> words = getWordsFromDB();
String[] keys = getKeysFromDB();
Collection<String> groupedKeys = IntStream.range(0, words.size())
.boxed()
.collect(Collectors.groupingBy(
words::get,
LinkedHashMap::new, // if word order is significant
Collectors.mapping(
i -> keys[i],
Collectors.joining(", "))))
.values();
However, as mentioned in the comments, it would probably be best to move this logic into your database query.

Actually, tempList uses methods with linear time complexity:
if (!tempList.contains(words.get(x))) {
and
int a = tempList.indexOf(words.get(x));
This means that each call iterates over about half the list on average.
Besides, the two calls are redundant.
A single indexOf() call is enough:
for (int x = 0; x < words.size(); x++) {
    int indexWord = tempList.indexOf(words.get(x));
    if (indexWord == -1) {
        tempList.add(words.get(x));
    } else {
        //do some calculations by using indexWord
    }
}
But to speed up all accesses, you should change your data structure: wrap or replace the LinkedList with a LinkedHashSet.
A LinkedHashSet keeps the current behavior because, like a List, it has a defined iteration order (the order in which elements were inserted into the set), but it also uses hashing to speed up access to its elements.
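For the grouping the question seems to be after (collecting the keys that belong to each word), an insertion-ordered map of insertion-ordered sets avoids both contains() and indexOf(). The sketch below makes two assumptions: words and keys have the same length, and a key is treated as a duplicate only on an exact match, whereas the question's code uses a substring test (m.contains(n)). GroupKeys and group are illustrative names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class GroupKeys {
    // Groups keys[i] under words.get(i), preserving first-seen order of words and keys.
    static Map<String, Set<String>> group(List<String> words, String[] keys) {
        Map<String, Set<String>> grouped = new LinkedHashMap<>();
        int i = 0;
        for (String word : words) { // for-each avoids O(n) get(i) on a LinkedList
            grouped.computeIfAbsent(word, w -> new LinkedHashSet<>()).add(keys[i++]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("x", "y", "x", "x");
        String[] keys = {"k1", "k2", "k3", "k1"};
        Map<String, Set<String>> grouped = group(words, keys);
        System.out.println(grouped); // {x=[k1, k3], y=[k2]}
        System.out.println(String.join(", ", grouped.get("x"))); // k1, k3
    }
}
```

Each word is hashed once per element, so the whole pass is O(n) on average instead of O(n^2).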

Binary search over a list of pairs

I need to find elem that would match element.
My program works but it is not efficient. I have a very large ArrayList<Obj> pairs (more than 4000 elements) and I use a binary search to find matching indexes.
public int search(String element) {
ArrayList<String> list = new ArrayList<String>();
for (int i = 0; i < pairs.size(); i++) {
list.add(pairs.get(i).getElem());
}
return index = Collections.binarySearch(list, element);
}
I wonder if there is a more efficient way than using a loop to copy half of the ArrayList pairs into a new ArrayList list.
Constructor for Obj: Obj x = new Obj(String elem, String word);

If your master list (pairs) does not change then I'd recommend creating a TreeMap to maintain reverse index structure, e.g.:
List<String> pairs = new ArrayList<String>(); //list containing 4000 entries
Map<String, Integer> indexMap = new TreeMap<>();
int index = 0;
for(String element : pairs){
indexMap.put(element, index++);
}
Now, while searching for an element, all you need to do is :
indexMap.get(element);
That will give you the required index or null if element doesn't exist. Also, if an element can be present in the list multiple times then, you can change the indexMap to be Map<String, List<Integer>>.
Your current algorithm copies the list on every call and then runs the binary search, so each lookup costs O(n) for the copy plus O(log n) for the search, whereas a TreeMap lookup is guaranteed to cost O(log n), so it will be much quicker.
Here's the documentation of TreeMap.
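If an extra map is not wanted, the copy loop from the question can also be avoided by running the binary search directly over pairs and comparing on the elem field. This is only a sketch: the Obj class below is a hypothetical stand-in for the question's class, and the search assumes pairs is already sorted by elem in natural String order.

```java
import java.util.Arrays;
import java.util.List;

class PairSearch {
    // Hypothetical stand-in for the question's Obj class.
    static final class Obj {
        private final String elem;
        private final String word;

        Obj(String elem, String word) {
            this.elem = elem;
            this.word = word;
        }

        String getElem() { return elem; }
        String getWord() { return word; }
    }

    // Returns the index of an entry whose elem equals element, or -1 if none exists.
    // Assumption: pairs is sorted by elem.
    static int search(List<Obj> pairs, String element) {
        int low = 0;
        int high = pairs.size() - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1; // unsigned shift avoids int overflow
            int cmp = pairs.get(mid).getElem().compareTo(element);
            if (cmp < 0) {
                low = mid + 1;
            } else if (cmp > 0) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Obj> pairs = Arrays.asList(
                new Obj("apple", "fruit"), new Obj("kale", "leaf"), new Obj("rice", "grain"));
        System.out.println(search(pairs, "kale")); // 1
        System.out.println(search(pairs, "plum")); // -1
    }
}
```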

It looks like the problem is solved.
My issue was that the ArrayList pairs holds Obj while element is a String, so I couldn't pass element to Collections.binarySearch directly. I decided to create a new variable,
Obj x = new Obj(element, "");. The empty string doesn't seem to cause any issues (it passed my JUnit tests), because my compareTo method compares the two elems and ignores the second field of Obj x.
My updated method:
public int search(String element) {
    Obj x = new Obj(element, "");
    int index = Collections.binarySearch(pairs, x);
    return index;
}

detect last foreach loop iteration

Supposing that I have some foreach loop like this:
Set<String> names = new HashSet<>();
//some code
for (String name: names) {
//some code
}
Is there a way to check inside the foreach loop that the current name is the last one in the Set, without a counter? I didn't find a question like this here.

For simplicity and readability, IMO, I would do:
Set<String> names = new HashSet<>();
Iterator<String> iterator = names.iterator();
while (iterator.hasNext()) {
String name = iterator.next();
//Do stuff
if (!iterator.hasNext()) {
//last name
}
}
Also, it depends on what you're trying to achieve. Let's say you are implementing the common use case of separating the names with commas without adding a trailing comma at the end:
Set<String> names = new HashSet<>();
names.add("Joao");
names.add("Pereira");
//if the result should be Joao, Pereira then something like this may work
String result = names.stream().collect(Collectors.joining(", "));

The other answers are completely adequate; I'm just adding this solution for the given question.
Set<String> names = new HashSet<>();
//some code
int i = 0;
for (String name: names) {
if(i++ == names.size() - 1){
// Last iteration
}
//some code
}

There isn't, take a look at How does the Java 'for each' loop work?
You must change your loop to use an iterator explicitly or an int counter.

If you are working with a complex object and not just a plain list/set the below code might help. Just adding a map function to actually get the desired string before you collect.
String result = violations.stream().map(e->e.getMessage()).collect(Collectors.joining(", "));

Yes, there is a way to check it inside of foreach, by use of a counter:
Set<String> names = new HashSet<>();
int i = names.size() - 1;
for (String name: names) {
if (i-- == 0) {
// some code for last name
}
//some code
}
Note that names.size() is called only once, outside of the loop, which avoids calling it again on every iteration.

There is no built-in method to check whether the current element is also the last element. Besides, you are using a HashSet, which does not guarantee the iteration order. Even if you check it, e.g. with an index i, the last element could be a different one each time.

A Set does not guarantee the order of the items within it. You may loop through the Set once and retrieve "abc" as the "last item", and the next time you may find that "hij" is the "last item" in the Set.
That being said, if you are not concerned about order and you just want to know what the arbitrary "last item" is when observing the Set at that moment, there is no built-in way to do this. You would have to make use of a counter.

.map(String::toString) from the answer above is redundant, because HashSet already contains String values. Do not use Set to concatenate strings because the order is not assured.
List<String> nameList = Arrays.asList("Michael", "Kate", "Tom");
String result = nameList.stream().collect(Collectors.joining(", "));

There is an easy way to do this with a single condition.
Consider this array:
int odd[] = {1,3,5,7,9,11};
and you want to print it all on one line with a " - " hyphen between the elements, except after the last one.
for(int aa:odd) {
System.out.print(aa);
if(odd[odd.length - 1] != aa)
System.out.print(" - ");
}
This condition
if( odd[odd.length - 1] != aa )
checks that you are not at the last element, so you can still add " - "; otherwise you do not add it. Note that it compares values, not positions, so it only works when the last value does not also appear earlier in the array.
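When the array may contain repeated values, a separator-aware helper avoids the value comparison altogether. The following is a minimal sketch using java.util.StringJoiner, which only inserts the separator between elements; JoinInts and join are illustrative names.

```java
import java.util.StringJoiner;

class JoinInts {
    // Joins the values with the separator between elements and none after the last.
    static String join(int[] values, String separator) {
        StringJoiner joiner = new StringJoiner(separator);
        for (int v : values) {
            joiner.add(Integer.toString(v));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        int[] values = {1, 3, 5, 1}; // the last value also appears earlier
        System.out.println(join(values, " - ")); // 1 - 3 - 5 - 1
    }
}
```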

List<String> list = Arrays.asList("1", "2", "3");
for (String each : list) {
if (list.indexOf(each) == (list.size() - 1)) {
System.out.println("last loop");
}
}
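Note: Set is NOT an ordered collection. Also, indexOf() returns the first occurrence, so this check assumes the list contains no duplicate elements.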

Java ArrayList not working on characters in my case

I have a big doubt. I want to find the first char of a string that isn't repeated. For example, the input below should return 'c'. This is how I was planning to do it, but I noticed that the remove method tries to remove at index 98 instead of removing the object "a". How do I force it to remove the object "a" instead of removing by index?
Why doesn't this work ?
And what can I do to change this ?
Is ArrayList always guaranteed to store things in order ?
public void findStartingLetter()
{
String[] array={"a","b","c","d","b","a","d","d","d"};
List<Character> list = new ArrayList<Character>();
for(String i:array)
{
if(list.contains(i.charAt(0)))
list.remove(i.charAt(0));
else
list.add(i.charAt(0));
}
}
EDIT:
Performance wise is this an O(n) function ?

You have to cast manually to a Character, since otherwise the char is widened to an int, which selects remove(int index) and removes by index rather than by value.
list.remove((Character) i.charAt(0));
Will ensure that it is done properly.
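The difference between the two overloads can be seen in a few lines. This is just a demonstration sketch; removeValue is a made-up helper name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RemoveOverloads {
    // Removes by value: the cast boxes the char, so remove(Object) is selected.
    static List<Character> removeValue(List<Character> list, char c) {
        list.remove((Character) c);
        return list;
    }

    public static void main(String[] args) {
        List<Character> list = new ArrayList<>(Arrays.asList('a', 'b', 'c'));
        System.out.println(removeValue(list, 'b')); // [a, c]
        try {
            list.remove('c'); // 'c' widens to int 99, so remove(int index) is selected
        } catch (IndexOutOfBoundsException e) {
            System.out.println("remove(int) was called with index " + (int) 'c');
        }
    }
}
```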

Is ArrayList always guaranteed to store things in order ?
Depends on your definition of order:
If you mean the order you add them, Yes.
If you mean numerical/alphabetical order, then No, but you can sort it by using
Collections.sort(list)
This will sort by the natural ascending order of the objects in the list.

I'm not entirely sure why you want to use a List for this, but I would instead recommend a Set - it's guaranteed to not contain duplicates.
Here's the first approach, with a set:
public Set<Character> addToSet(String[] elements) {
Set<Character> res = new HashSet<>();
for(String c : elements) {
res.add(c.charAt(0));
}
return res;
}
Now, if you really want to do this with a List, then it's similar code - you just need to check to see if the element exists before you add it in.
public List<Character> addUnique(String[] elements) {
List<Character> res = new ArrayList<>();
for(String item : elements) {
Character c = item.charAt(0);
if(!res.contains(c)) {
res.add(c);
}
}
return res;
}

Your approach to this problem is quite confusing and you ask many questions which do not seem to relate to your problem.
Why not just use:
String testString = "abcdbaddd";
Character retVal = null;
for (int i = 0; i < testString.length() -1; i++) {
if (testString.charAt(i) == testString.charAt(i + 1)) {
retVal = testString.charAt(i);
break;
}
}
return retVal;
As written, this returns the first character that is immediately followed by an identical one (I'm assuming that by repeated you mean repeated and adjacent), or null if no such character exists. For "abcdbaddd" that is 'd'.
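If "not repeated" means that the character occurs exactly once anywhere in the input, which is what the expected result 'c' in the question suggests, a counting pass over an insertion-ordered map is one way to do it. This is a sketch rather than the asker's method; FirstUnique and firstNonRepeated are illustrative names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FirstUnique {
    // Returns the first character that occurs exactly once in s, or null if there is none.
    static Character firstNonRepeated(String s) {
        // LinkedHashMap iterates in insertion order, i.e. order of first appearance.
        Map<Character, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < s.length(); i++) {
            counts.merge(s.charAt(i), 1, Integer::sum);
        }
        for (Map.Entry<Character, Integer> entry : counts.entrySet()) {
            if (entry.getValue() == 1) {
                return entry.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstNonRepeated("abcdbaddd")); // c
    }
}
```

Both passes are linear, so this runs in O(n) on average, whereas calling contains() on an ArrayList inside the loop costs up to the current list size per element.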
