Need to use concurrency in this Java method - java

So I have the following code that takes two arrays as input, applies some queries to match elements from Array1 with elements from Array2, and then returns the number of elements that are common to the two ArrayLists.
Here is the code I use:
public static void get_ND_Matches() throws IOException {
    @SuppressWarnings("rawtypes")
    List<String> array1 = new ArrayList<String>();
    List<String> array2 = new ArrayList<String>();
    array1 = new ArrayList<String>(ClassesRetrieval.getDBpediaClasses());
    array2 = new ArrayList<String>(ClassesRetrieval.fileToArrayListYago());
    String maxLabel = "";
    HashMap<String, Integer> map = new HashMap<String, Integer>();
    int number;
    HashMap<String, ArrayList<String>> theMap = new HashMap<>();
    for (String yagoClass : array2) {
        theMap.put(yagoClass, getListTwo(yagoClass));
        System.out.println("Done for : " + yagoClass);
    }
    for (String dbClass : array1) {
        ArrayList<String> result = get_2D_Matches(dbClass);
        for (Map.Entry<String, ArrayList<String>> entry : theMap.entrySet()) {
            String yagoClass = entry.getKey();
            Set<String> intersectionSet = Sets.intersection(Sets.newHashSet(result), Sets.newHashSet(entry.getValue()));
            System.out.println(dbClass + " and " + yagoClass + " = " + intersectionSet.size());
            number = intersectionSet.size();
            map.put(yagoClass, number);
        }
        int maxValue = Collections.max(map.values());
        for (Entry<String, Integer> entry : map.entrySet()) {
            if (entry.getValue() == maxValue && maxValue != 0) {
                maxLabel = entry.getKey();
            }
            if (maxValue == 0) {
                maxLabel = "Nothing in yago";
            }
        }
        System.out.println("-------------------------------");
        System.out.println(dbClass + " from DBPEDIA Corresponds to " + maxLabel);
        System.out.println("-------------------------------");
    }
}
This code returns for example:
Actor from DBPEDIA Corresponds to Yago_Actor
Album from DBPEDIA Corresponds to Yago_Album
SomeClass from DBPEDIA Corresponds to nothing in Yago
Etc..
Behind the scenes, this code uses getDBpediaClasses and then applies the get_2D_Matches() method to get an ArrayList of results for each class. Each resulting ArrayList is then compared to another ArrayList generated by getListTwo() for each class of fileToArrayListYago().
Now, because of all the calculations made in the background (there are millions of elements in each array), this process takes hours to execute.
I would really like to use concurrency/multithreading to solve this issue. Could anyone show me how to do that?

It makes little sense to parallelize code which is not perfectly clean and optimized. You may get a factor of 4 on a typical 4-core CPU or nothing at all, depending on whether you choose the part to be parallelized properly. A better algorithm may gain you much more.
It's possible that the bottleneck is get_2D_Matches, which you haven't published.
Computing the maximum directly instead of filling a throw-away HashMap<String,Integer> map could save quite some time, and so could moving Sets.newHashSet(result) out of the inner loop.
You should really reconsider variable naming. With names like map, theMap, and result (for something which is not the method's result), it's hard to find out what's going on.
If you really want to parallelize it, you need to split your overlong method first. Then it's rather simple, as the processing of each dbClass can be done independently: just encapsulate it as a Callable and submit it to an ExecutorService, as sketched below.
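For illustration, a minimal sketch of that idea, not a drop-in implementation: it assumes get_2D_Matches and the prebuilt theMap from the question are visible to the task, and that get_2D_Matches is safe to call from several threads; it also folds in the two optimizations above (hoisting Sets.newHashSet(result) out of the inner loop and tracking the maximum directly):

import java.util.*;
import java.util.concurrent.*;
import com.google.common.collect.Sets;

// One task per dbClass; each task is independent of the others.
ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
List<Future<String>> results = new ArrayList<>();
for (String dbClass : array1) {
    results.add(pool.submit(() -> {
        // Build the hash set once per dbClass instead of once per yagoClass.
        Set<String> dbMatches = Sets.newHashSet(get_2D_Matches(dbClass));
        String maxLabel = "Nothing in yago";
        int maxValue = 0;
        for (Map.Entry<String, ArrayList<String>> entry : theMap.entrySet()) {
            // Track the maximum directly instead of filling a throw-away map.
            int size = Sets.intersection(dbMatches, Sets.newHashSet(entry.getValue())).size();
            if (size > maxValue) {
                maxValue = size;
                maxLabel = entry.getKey();
            }
        }
        return dbClass + " from DBPEDIA Corresponds to " + maxLabel;
    }));
}
for (Future<String> future : results) {
    // get() blocks until the task is done; the enclosing method must handle
    // InterruptedException and ExecutionException.
    System.out.println(future.get());
}
pool.shutdown();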
However, I'd suggest cleaning the code first, then submitting it to Code Review, and only then considering parallelization.

Related

Iterating and comparing big data set

Basically I receive two big data lists from two different databases; the lists look like this:
List 1:
=============
A000001
A000002
A000003
.
.
A999999
List 2:
=============
121111
000111
000003
000001
.
.
I need to compare the two lists and find out whether each entry in List 1 is available in List 2 (after appending some standard key to it); if it is available, put it in a third list for further manipulation. As an example, A000001 is available in List 1 as well as in List 2 (after appending some standard key to it), so I need to put it in the third list.
Basically I have this code: for each row in List 1, I iterate through all data in List 2 and do the comparison. (Both are ArrayLists.)
List<String> list1 = // Data of list 1 from db
List<String> list2 = // Data of list 2 from db
for (String list1Item : list1) {
    for (String list2Item : list2) {
        String list2ItemAfterAppend = "A" + list2Item;
        if (list1Item.equalsIgnoreCase(list2ItemAfterAppend)) {
            // Add it to 3rd list
        }
    }
}
Yes, this logic works fine, but I feel this is not an efficient way to iterate the lists. After adding timers, it takes 13444 milliseconds on average for a 2000x5000 list of data. My question is: is there any other logic you can think of or suggest to improve the performance of this code?
I hope I'm clear; if not, please let me know how I can improve the question.
You can sort both lists, then iterate over both with a single loop, advancing whichever index points at the smaller value. Something like:
boolean isWorking = true;
Collections.sort(list1);
Collections.sort(list2);
int index1 = 0;
int index2 = 0;
while (isWorking) {
    String val1 = list1.get(index1);
    String val2 = "A" + list2.get(index2);
    int compare = val1.compareTo(val2);
    if (compare == 0) {
        list3.add(val1);
        index1++;
        index2++;
    } else if (compare > 0) { // val1 is bigger, so advance list2
        index2++;
    } else { // val1 is smaller, so advance list1
        index1++;
    }
    isWorking = !(index1 == list1.size() || index2 == list2.size());
}
Be careful about what kind of List you're using: get(int i) on a LinkedList is expensive, whereas it is not on an ArrayList. Also, you might want to cache list1.size() and list2.size(); I don't think they are recalculated every time, but check it. I'm not sure if it's really useful/efficient, but you can initialise list3 with the size of the smaller of the two lists (taking the load factor into account, look it up) so list3 doesn't have to resize every time.
The code above is not tested, but you get the idea. I believe it's faster than yours (because it's O(n+m) rather than O(n*m)), but I'll let you see; both sort() and compareTo() add some time compared to your method, but normally it shouldn't be too much. If you can, use your RDBMS to sort both lists when you fetch them, so you don't have to do it in the Java code.
I think the problem is how big the lists are and how much memory you have.
For under 1 million records, I would use a HashSet to make it faster.
The code may look like:
Set<String> set1 = // Data of list 1 from db; when you get the data, make it a Set instead of a List. A HashSet is enough here.
List<String> list2 = // Data of list 2 from db
Then you just need to:
for (String list2Item : list2) {
    if (set1.contains("A" + list2Item)) {
        // Add it to 3rd list
    }
}
Hope this can help you.
You can use the intersection method from Apache Commons. Example:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import org.apache.commons.collections4.CollectionUtils;

public class NewClass {
    public static void main(String[] args) {
        List<String> list1 = Arrays.asList("A000001", "A000002", "A000003");
        List<String> list2 = Arrays.asList("121111", "000111", "000001");
        List<String> list3 = new ArrayList<>();
        list2.stream().forEach((s) -> list3.add("A" + s));
        Collection<String> common = CollectionUtils.intersection(list1, list3);
    }
}
You could try to use the Stream API for this; the code to create the new list with streams is very concise and straightforward, and probably very similar in performance:
List<String> list3 = list2.stream()
                          .map(s -> "A" + s)
                          .filter(list1::contains)
                          .collect(Collectors.toList());
If the lists are big, you could try to process them in parallel, using multiple threads. This may or may not improve the performance; it is important to take some measurements to check whether processing the list in parallel actually helps.
To process the stream in parallel, you only need to call the method parallel on the stream:
List<String> list3 = list2.stream()
                          .parallel()
                          .map(s -> "A" + s)
                          .filter(list1::contains)
                          .collect(Collectors.toList());
Your code is doing a lot of String manipulation: equalsIgnoreCase converts the characters to upper/lower case. This is executed in your inner loop, and the size of your lists is 5000x2000, so the String manipulation is being done millions of times.
Ideally, get your Strings in either upper or lower case from the database and avoid the conversion inside the inner loop. If this is not possible, converting the case of the Strings once at the beginning will probably improve the performance.
Then you could create a new list with the elements of one of the lists and retain all the elements present in the other list; the code with the uppercase conversion could be:
list1.replaceAll(String::toUpperCase);
List<String> list3 = new ArrayList<>(list2);
list3.replaceAll(s->"A"+s.toUpperCase());
list3.retainAll(list1);

How to count the number of occurrences of each word?

If I have an article or a novel in English and I want to count how many times each word appears, what is the fastest algorithm written in Java?
Some people said you can use a Map<String, Integer> to accomplish this, but I was wondering: how do I know what the keys are? Every article has different words, so how do you know the keys and then add one to each one's count?
Here is another way to do it with the things that appeared in Java 8:
private void countWords(final Path file) throws IOException {
    // counting() is statically imported from java.util.stream.Collectors
    Arrays.stream(new String(Files.readAllBytes(file), StandardCharsets.UTF_8).split("\\W+"))
          .collect(Collectors.groupingBy(Function.<String>identity(), TreeMap::new, counting()))
          .entrySet()
          .forEach(System.out::println);
}
So what is it doing?
1. It reads the text file completely into memory, into a byte array to be more precise: Files.readAllBytes(file). This method turned up in Java 7 and allows loading files very fast, at the price that the file will be completely in memory, costing a lot of memory. For speed, however, this is a good approach.
2. The byte[] is converted to a String: new String(Files.readAllBytes(file), StandardCharsets.UTF_8), assuming that the file is UTF-8 encoded. Change as needed. The price is a full memory copy of the already huge piece of data in memory. It may be faster to work with a memory-mapped file instead.
3. The string is split at non-word characters: ...split("\\W+"), which creates an array of strings with all your words.
4. We create a stream from that array: Arrays.stream(...). This by itself does not do very much, but we can do a lot of fun things with the stream.
5. We group all the words together: Collectors.groupingBy(Function.<String>identity(), TreeMap::new, counting()). This means:
- We want to group the words by the words themselves (identity()). We could also, e.g., lowercase the strings here first if you want the grouping to be case-insensitive. The word will end up being the key in a map.
- As the result for storing the grouped values we want a TreeMap (TreeMap::new). TreeMaps are sorted by their keys, so we can easily output them in alphabetical order in the end. If you do not need sorting, you could also use a HashMap here.
- As the value of each group we want the number of occurrences of each word (counting()). In the background that means that for each word we add to a group, we increase the counter by one.
6. From step 5 we are left with a Map that maps words to their count. Now we just want to print them. So we access a collection with all the key/value pairs in this map (.entrySet()).
7. Finally the actual printing: we say that each element should be passed to the println method: .forEach(System.out::println). And now you are left with a nice list.
So how good is this answer? The upside is that it is very short and thus highly expressive. It also gets along with only a single system call hidden behind Files.readAllBytes (or at least a fixed number; I am not sure if this really works with a single system call), and system calls can be a bottleneck. E.g., if you are reading a file from a stream, each call to read may trigger a system call. This is significantly reduced by using a BufferedReader which, as the name suggests, buffers; but still, readAllBytes should be fastest. The price for this is that it consumes huge amounts of memory. However, Wikipedia claims that a typical English book has 500 pages with 2,000 characters per page, which means roughly 1 megabyte, which should not be a problem in terms of memory consumption even if you are on a smartphone, a Raspberry Pi, or a really, really old computer.
This solution does involve some optimizations that were not possible prior to Java 8. For example, the idiom map.put(word, map.get(word) + 1) requires the word to be looked up twice in the map, which is an unnecessary waste.
But a simple loop might also be easier for the compiler to optimize and might save a number of method calls. So I wanted to know, and put this to a test. I generated a file using:
[ -f /tmp/random.txt ] && rm /tmp/random.txt; for i in {1..15}; do head -n 10000 /usr/share/dict/american-english >> /tmp/random.txt; done; perl -MList::Util -e 'print List::Util::shuffle <>' /tmp/random.txt > /tmp/random.tmp; mv /tmp/random.tmp /tmp/random.txt
This gives me a file of about 1.3 MB, so not that untypical for a book, with most words being repeated 15 times but in random order, to avoid this ending up as a branch-prediction test. Then I ran the following tests:
public class WordCountTest {

    @Test(dataProvider = "provide_description_testMethod")
    public void test(String description, TestMethod testMethod) throws Exception {
        long start = System.currentTimeMillis();
        for (int i = 0; i < 100_000; i++) {
            testMethod.run();
        }
        System.out.println(description + " took " + (System.currentTimeMillis() - start) / 1000d + "s");
    }

    @DataProvider
    public Object[][] provide_description_testMethod() {
        Path path = Paths.get("/tmp/random.txt");
        return new Object[][]{
            {"classic", (TestMethod) () -> countWordsClassic(path)},
            {"mixed", (TestMethod) () -> countWordsMixed(path)},
            {"mixed2", (TestMethod) () -> countWordsMixed2(path)},
            {"stream", (TestMethod) () -> countWordsStream(path)},
            {"stream2", (TestMethod) () -> countWordsStream2(path)},
        };
    }

    private void countWordsClassic(final Path path) throws IOException {
        final Map<String, Integer> wordCounts = new HashMap<>();
        for (String word : new String(readAllBytes(path), StandardCharsets.UTF_8).split("\\W+")) {
            Integer oldCount = wordCounts.get(word);
            if (oldCount == null) {
                wordCounts.put(word, 1);
            } else {
                wordCounts.put(word, oldCount + 1);
            }
        }
    }

    private void countWordsMixed(final Path path) throws IOException {
        final Map<String, Integer> wordCounts = new HashMap<>();
        for (String word : new String(readAllBytes(path), StandardCharsets.UTF_8).split("\\W+")) {
            // merge passes (oldValue, value) to the function, not (key, oldValue)
            wordCounts.merge(word, 1, (oldCount, one) -> oldCount + one);
        }
    }

    private void countWordsMixed2(final Path path) throws IOException {
        final Map<String, Integer> wordCounts = new HashMap<>();
        Pattern.compile("\\W+")
               .splitAsStream(new String(readAllBytes(path), StandardCharsets.UTF_8))
               .forEach(word -> wordCounts.merge(word, 1, (oldCount, one) -> oldCount + one));
    }

    private void countWordsStream2(final Path tmpFile) throws IOException {
        Pattern.compile("\\W+").splitAsStream(new String(readAllBytes(tmpFile), StandardCharsets.UTF_8))
               .collect(Collectors.groupingBy(Function.<String>identity(), HashMap::new, counting()));
    }

    private void countWordsStream(final Path tmpFile) throws IOException {
        Arrays.stream(new String(readAllBytes(tmpFile), StandardCharsets.UTF_8).split("\\W+"))
              .collect(Collectors.groupingBy(Function.<String>identity(), HashMap::new, counting()));
    }

    interface TestMethod {
        void run() throws Exception;
    }
}
The results were:
type     length  diff
classic  4665s   +9%
mixed    4273s   +0%
mixed2   4833s   +13%
stream   4868s   +14%
stream2  5070s   +19%
Note that I previously also tested with TreeMaps, but found that the HashMaps were much faster, even if I sorted the output afterwards. Also, I changed the tests above after Tagir Valeev told me in the comments below about the Pattern.splitAsStream() method. Since I got strongly varying results, I let the tests run for quite a while, as you can see by the length in seconds above, to get meaningful results.
How I judge the results:
The "mixed" approach which does not use streams at all, but uses the "merge" method with callback introduced in Java 8 does improve the performance. This is something I expected because the classic get/put appraoch requires the key to be looked up twice in the HashMap and this is not required anymore with the "merge"-approach.
To my suprise the Pattern.splitAsStream() appraoch is actually slower compared to Arrays.asStream(....split()). I did have a look at the source code of both implementations and I noticed that the split() call saves the results in an ArrayList which starts with a size of zero and is enlarged as needed. This requires many copy operations and in the end another copy operation to copy the ArrayList to an array. But "splitAsStream" actually creates an iterator which I thought can be queried as needed avoiding these copy operations completely. I did not quite look through all the source that converts the iterator to a stream object, but it seems to be slow and I don't know why. In the end it theoretically could have to do with CPU memory caches: If exactly the same code is executed over and over again the code will more likely be in the cache then actually running on large function chains, but this is a very wild speculation on my side. It may also be something completely different. However splitAsStream MIGHT have a better memory footprint, maybe it does not, I did not profile that.
The stream approach in general is pretty slow. This is not totally unexpected because quite a number of method invocations take place, including for example something as pointless as Function.identity. However I did not expect the difference at this magnitude.
As an interesting side note I find the mixed approach which was fastest quite well to read and understand. The call to "merge" does not have the most ovbious effect to me, but if you know what this method is doing it seems most readable to me while at the same time the groupingBy command is more difficult to understand for me. I guess one might be tempted to say that this groupingBy is so special and highly optimised that it makes sense to use it for performance but as demonstrated here, this is not the case.
Map<String, Integer> countByWords = new HashMap<String, Integer>();
Scanner s = new Scanner(new File("your_file_path"));
while (s.hasNext()) {
    String next = s.next();
    Integer count = countByWords.get(next);
    if (count != null) {
        countByWords.put(next, count + 1);
    } else {
        countByWords.put(next, 1);
    }
}
s.close();
this count "I'm" as only one word
General overview of steps:
Create a HashMap<String, Integer>.
Read the file one word at a time. If the word doesn't exist in your HashMap, add it with a count of 1; if it does exist, increment its count by 1. Read till the end of the file.
This will result in a map of all your words and the count for each one.
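Those steps collapse to very little code with Map.merge from Java 8; a minimal sketch (the file path is a placeholder):

Map<String, Integer> countByWords = new HashMap<>();
// new Scanner(new File(...)) may throw FileNotFoundException
try (Scanner s = new Scanner(new File("your_file_path"))) {
    while (s.hasNext()) {
        countByWords.merge(s.next(), 1, Integer::sum); // insert 1, or add 1 to the existing count
    }
}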
If I were you, I would use one of the implementations of Map<String, Integer> (generics require the wrapper type Integer rather than int), like a HashMap. Then, as you loop through each word, if it already exists, just increment its count by one; otherwise, add it to the map. At the end you can pull out all of the words and their counts, or query for a specific word to get its count.
If order is important to you, you could try a SortedMap<String, Integer> to be able to print them out in alphabetical order.
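A small sketch of that suggestion, assuming the words arrive in a String[] named words:

SortedMap<String, Integer> counts = new TreeMap<>();
for (String word : words) {
    counts.put(word, counts.getOrDefault(word, 0) + 1); // getOrDefault needs Java 8
}
// A TreeMap iterates in alphabetical key order.
for (Map.Entry<String, Integer> e : counts.entrySet()) {
    System.out.println(e.getKey() + ": " + e.getValue());
}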
Hope that helps!
It is actually the classic word-count algorithm.
Here is the solution:
public Map<String, Integer> wordCount(String[] strings) {
    Map<String, Integer> map = new HashMap<String, Integer>();
    int count = 0;
    for (String s : strings) {
        if (map.containsKey(s)) {
            count = map.get(s);
            map.put(s, count + 1);
        } else {
            map.put(s, 1);
        }
    }
    return map;
}
Here is my solution:
Map<String, Integer> map = new HashMap<>();
int count = 0;
for (int i = 0; i < strings.length; i++) {
    for (int j = 0; j < strings.length; j++) {
        if (strings[i].equals(strings[j])) { // compare contents with equals(), not ==
            count++;
        }
    }
    map.put(strings[i], count);
    count = 0;
}
return map;

Can the new for loop in Java be used with two variables?

We can use the old for loop (for (i = 0, j = 0; i < 30; i++, j++)) with two variables.
Can we use the for-each loop (or the enhanced for loop) in Java (for (Item item : items)) with two variables? What's the syntax for that?
Unfortunately, Java supports only a rudimentary foreach loop, called the enhanced for loop. Other languages, especially FP ones like Scala, support a construct known as list comprehension (Scala calls it for comprehension) which allows nested iterations, as well as filtering of elements along the way.
No, you can't. It is syntactic sugar for using an Iterator. Refer here for a good answer on this issue.
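To see the sugar, the enhanced for loop over a collection compiles to roughly this (a sketch, not the exact generated code), which is why there is no room for a second loop variable:

// for (Item item : items) { ... } becomes approximately:
for (Iterator<Item> it = items.iterator(); it.hasNext(); ) {
    Item item = it.next();
    // loop body
}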
You need to have an object that contains both variables.
It can be shown on a Map object for example.
for (Map.Entry<String,String> e: map.entrySet()) {
// you can use e.getKey() and e.getValue() here
}
The following should have the same (performance) effect that you are trying to achieve:
List<Item> aItems = new ArrayList<Item>();
List<Item> bItems = new ArrayList<Item>();
...
Iterator<Item> aIterator = aItems.iterator();
Iterator<Item> bIterator = bItems.iterator();
while (aIterator.hasNext() && bIterator.hasNext()) {
    Item aItem = aIterator.next();
    Item bItem = bIterator.next();
}
The for-each loop assumes that there is only one collection of things; you can do something with each element per iteration. How would you want it to behave if you could iterate over two collections at once? What if they have different lengths?
Assuming that you have
Collection<T1> collection1;
Collection<T2> collection2;
You could write an iterable wrapper that iterates over both and returns some sort of merged result.
for (TwoThings<T1, T2> thing : new TwoCollectionWrapper(collection1, collection2)) {
    // one of them could be null if the collections have different lengths
    T1 t1 = thing.getFirst();
    T2 t2 = thing.getSecond();
}
That's the closest I can think of, but I don't see much use for it. If both collections are meant to be iterated together, it would be simpler to create a Collection<TwoThings> in the first place.
Besides iterating in parallel, you could also want to iterate sequentially. There are implementations for that, e.g. Guava's Iterables.concat().
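For the sequential case, a quick sketch with Guava, assuming both collections hold the same element type (String here):

// Iterates over all of collection1, then all of collection2.
for (String s : Iterables.concat(collection1, collection2)) {
    System.out.println(s);
}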
The simple answer "No" is already given. But you could implement taking two iterators as argument, and returning Pairs of the elements coming from the two iterators. Pair being a class with two fields. You'd either have to implement that yourself, or it is probably existent in some apache commons or similar lib.
This new Iterator could then be used in the foreach loop.
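A minimal sketch of that idea; the Pair class and the zip helper are hand-rolled here, not from any particular library:

class Pair<A, B> {
    final A first;
    final B second;
    Pair(A first, B second) { this.first = first; this.second = second; }
}

// Pairs up two iterables; iteration stops at the end of the shorter one.
static <A, B> Iterable<Pair<A, B>> zip(Iterable<A> as, Iterable<B> bs) {
    return () -> new Iterator<Pair<A, B>>() {
        final Iterator<A> a = as.iterator();
        final Iterator<B> b = bs.iterator();
        public boolean hasNext() { return a.hasNext() && b.hasNext(); }
        public Pair<A, B> next() { return new Pair<>(a.next(), b.next()); }
    };
}

// Usage in a for-each loop (names and counts are hypothetical lists):
for (Pair<String, Integer> p : zip(names, counts)) {
    System.out.println(p.first + " -> " + p.second);
}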
I had to do a task where I needed to collect various data from XML, store it in a Set, and then output it to a CSV file.
I read the data and stored it in Set objects as x, y, z.
For the CSV file header, I used a StringBuffer to hold the column names:
StringBuffer buffer = new StringBuffer("");
buffer.append("FIRST_NAME,LAST_NAME,ADDRESS\r\n");
Set<String> x = new HashSet<String>();
Set<String> y = new HashSet<String>();
Set<String> z = new HashSet<String>();
....
Iterator<String> iterator1 = x.iterator();
Iterator<String> iterator2 = y.iterator();
Iterator<String> iterator3 = z.iterator();
while (iterator1.hasNext() && iterator2.hasNext() && iterator3.hasNext()) {
    String fN = iterator1.next();
    String lN = iterator2.next();
    String aDS = iterator3.next();
    buffer.append(fN + "," + lN + "," + aDS + "\r\n");
}

Counting occurrences of words in an array

I've been working on something which takes a stream of characters, forms words, makes an array of the words, then creates a vector which contains each unique word and the number of times it occurs (basically a word counter).
Anyway, I've not used Java in a long time, or done much programming to be honest, and I'm not happy with how this currently looks. The part which builds the vector looks ugly to me, and I wanted to know if I could make it less messy.
int counter = 1;
Vector<Pair<String, Integer>> finalList = new Vector<Pair<String, Integer>>();
// wordList contains " " as its first word; starting at wordList.get(1) skips it.
Pair<String, Integer> wordAndCount = new Pair<String, Integer>(wordList.get(1), counter);
for (int i = 1; i < wordList.size(); i++) {
    if (wordAndCount.getLeft().equals(wordList.get(i))) {
        wordAndCount = new Pair<String, Integer>(wordList.get(i), counter++);
    } else {
        finalList.add(wordAndCount);
        wordAndCount = new Pair<String, Integer>(wordList.get(i), counter = 1);
    }
}
finalList.add(wordAndCount); // UGLY!!
As a secondary question, this gives me a vector with all the words in alphabetical order (as in the array). I want to have it sorted by occurrence, then alphabetically within that.
Would the best option be:
Iterate down the vector, testing each occurrence count against the one above, using Collections.swap() if it is higher, then checking the next one above (as it has now moved up one) and so on until it's no longer larger than anything above it. Any occurrence of 1 could be skipped.
Iterate down the vector again, testing each element against the first element of the vector and then iterating downwards until the number of occurrences is lower, and inserting it above that element. All occurrences of 1 would once again be skipped.
The first method would do more in terms of iterating over the elements, but the second one requires you to add and remove components of the vector (I think?), so I don't know which is more efficient, or whether it's worth considering.
Why not use a Map to solve your problem?
String[] words; // your incoming array of words
Map<String, Integer> wordMap = new HashMap<String, Integer>();
for (String word : words) {
    if (!wordMap.containsKey(word))
        wordMap.put(word, 1);
    else
        wordMap.put(word, wordMap.get(word) + 1);
}
Sorting can be done using Java's sorted collections:
SortedMap<Integer, SortedSet<String>> sortedMap = new TreeMap<Integer, SortedSet<String>>();
for (Entry<String, Integer> entry : wordMap.entrySet()) {
    if (!sortedMap.containsKey(entry.getValue()))
        sortedMap.put(entry.getValue(), new TreeSet<String>());
    sortedMap.get(entry.getValue()).add(entry.getKey());
}
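To print the result in ascending order of count, a usage sketch for the structure built above:

for (Entry<Integer, SortedSet<String>> entry : sortedMap.entrySet()) {
    for (String word : entry.getValue()) { // words with the same count come out alphabetically
        System.out.println(word + ": " + entry.getKey());
    }
}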
Nowadays you should leave the sorting to the language's libraries; they have been proven correct over the years.
Note that the code may use a lot of memory because of all the data structures involved, but that is what we pay for higher-level programming (and memory is getting cheaper every second).
I didn't run the code to verify that it works, but it does compile (I copied it directly from Eclipse).
Re: sorting, one option is to write a custom Comparator which first examines the number of times each word appears, then (if equal) compares the words alphabetically.
private final class PairComparator implements Comparator<Pair<String, Integer>> {
    public int compare(Pair<String, Integer> p1, Pair<String, Integer> p2) {
        // compare by Integer first (higher counts sort first), assuming Pair
        // exposes getLeft()/getRight() as in the question's code
        int byCount = p2.getRight().compareTo(p1.getRight());
        if (byCount != 0) {
            return byCount;
        }
        // compare by String if the counts are equal
        return p1.getLeft().compareTo(p2.getLeft());
    }
}
You'd then sort finalList by calling Collections.sort(finalList, new PairComparator());
How about using the Google Guava library?
Multiset<String> multiset = HashMultiset.create();
for (String word : words) {
    multiset.add(word);
}
int countFoo = multiset.count("foo");
From their javadocs:
A collection that supports order-independent equality, like Set, but may have duplicate elements. A multiset is also sometimes called a bag.
Simple enough?

How to lowercase every element of a collection efficiently?

What's the most efficient way to lower case every element of a List or Set?
My idea for a List:
final List<String> strings = new ArrayList<String>();
strings.add("HELLO");
strings.add("WORLD");
for (int i = 0, l = strings.size(); i < l; ++i) {
    strings.add(strings.remove(0).toLowerCase());
}
Is there a better, faster way? How would this example look for a Set? As there is currently no method for applying an operation to each element of a Set (or List), can it be done without creating an additional temporary Set?
Something like this would be nice:
Set<String> strings = new HashSet<String>();
strings.apply(
    function (element) { this.replace(element, element.toLowerCase()); }
);
Thanks,
Yet another solution, but with Java 8 and above:
List<String> result = strings.stream()
                             .map(String::toLowerCase)
                             .collect(Collectors.toList());
This seems like a fairly clean solution for lists. It should allow the particular List implementation being used to provide an implementation that is optimal for both traversing the list (in linear time) and replacing the string (in constant time).
public static void replace(List<String> strings)
{
ListIterator<String> iterator = strings.listIterator();
while (iterator.hasNext())
{
iterator.set(iterator.next().toLowerCase());
}
}
This is the best that I can come up with for sets. As others have said, the operation cannot be performed in-place in the set for a number of reasons. The lower-case string may need to be placed in a different location in the set than the string it is replacing. Moreover, the lower-case string may not be added to the set at all if it is identical to another lower-case string that has already been added (e.g., "HELLO" and "Hello" will both yield "hello", which will only be added to the set once).
public static void replace(Set<String> strings)
{
String[] stringsArray = strings.toArray(new String[0]);
for (int i=0; i<stringsArray.length; ++i)
{
stringsArray[i] = stringsArray[i].toLowerCase();
}
strings.clear();
strings.addAll(Arrays.asList(stringsArray));
}
You can do this with Google Collections:
Collection<String> lowerCaseStrings = Collections2.transform(strings,
new Function<String, String>() {
public String apply(String str) {
return str.toLowerCase();
}
}
);
If you are fine with changing the input list, here is one more way to achieve it:
strings.replaceAll(String::toLowerCase)
Well, there is no real elegant solution due to two facts:
Strings in Java are immutable
Java gives you no real nice map(f, list) function as you have in functional languages.
Asymptotically speaking, you can't get a better run time than your current method. You will have to create a new string using toLowerCase() and you will need to iterate by yourself over the list and generate each new lower-case string, replacing it with the existing one.
Try CollectionUtils#transform in Commons Collections for an in-place solution, or Collections2#transform in Guava if you need a live view.
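For instance, the in-place Commons Collections call could look like this (a sketch; in collections4 with Java 8, a method reference satisfies the Transformer interface):

import org.apache.commons.collections4.CollectionUtils;

CollectionUtils.transform(strings, String::toLowerCase); // mutates the collection in place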
This is probably faster:
for (int i = 0, l = strings.size(); i < l; ++i) {
    strings.set(i, strings.get(i).toLowerCase());
}
I don't believe it is possible to do the manipulation in place (without creating another Collection) if you change strings to be a Set. This is because you can only iterate over the Set using an iterator or a for-each loop, and you cannot insert new objects while doing so (it throws a ConcurrentModificationException).
Referring to the ListIterator method in the accepted (Matthew T. Staebler's) solution: how is using the ListIterator better than the method here?
public static Set<String> replace(List<String> strings) {
    Set<String> set = new HashSet<>();
    for (String s : strings)
        set.add(s.toLowerCase());
    return set;
}
I was looking for something similar but was stuck because my ArrayList object was not declared as generic; it was available as a raw List object from somewhere. I was just getting an ArrayList object "_products". So what I did is mentioned below, and it worked for me perfectly:
List<String> dbProducts = _products;
for (int i = 0; i < dbProducts.size(); i++) {
    // Replace each element in place; calling add() here would grow the list
    // on every pass and the loop would never terminate.
    dbProducts.set(i, dbProducts.get(i).toLowerCase());
}
That is, I first took my available _products and made a generic List object (as I was getting only strings in it), then I applied the toLowerCase() method to the list elements, which was not working previously because of the non-generic ArrayList object.
The toLowerCase() method we are using here is from the String class,
String java.lang.String.toLowerCase()
not from ArrayList or Object.
Please correct me if I'm wrong; a newbie in Java seeks guidance. :)
Using a Java 8 parallel stream it can become faster:
List<String> input = new ArrayList<>();
input.add("A");
input.add("B");
input.add("C");
input.add("D");
// Collect into a new list; funnelling a parallel stream into one shared,
// pre-created list is not thread-safe.
List<String> output = input.parallelStream()
                           .map(item -> item.toLowerCase())
                           .collect(Collectors.toList());
