parallelize a for loop and populate multiple data structures - java

I have a for loop that I want to parallelize. In the code below, the outermost for loop iterates over the tasks and puts entries into several data structures, and it works fine. Each of those data structures has a getter in the same class, which another class uses to read the results once the loop has finished. The structures I populate are info, itemToNumberMapping, catToValueHolder, tasksByCategory, catHolder, and itemIds.
// want to parallelize this for loop
for (Task task : tasks) {
if (task.getCategories().isEmpty() || task.getEventList() == null
|| task.getMetaInfo() == null) {
continue;
}
String itemId = task.getEventList().getId();
String categoryId = task.getCategories().get(0).getId();
Processor fp = new Processor(siteId, itemId, categoryId, poolType);
Map<String, Integer> holder = fp.getDataHolder();
if (!holder.isEmpty()) {
for (Map.Entry<String, Integer> entry : holder.entrySet()) {
info.putIfAbsent(entry.getKey(), entry.getValue());
}
List<Integer> values = new ArrayList<>();
for (String key : holder.keySet()) {
values.add(info.get(key));
}
itemToNumberMapping.put(itemId, StringUtils.join(values, ","));
catToValueHolder.put(categoryId, StringUtils.join(values, ","));
}
Category cat = getCategory(task, holder.isEmpty());
tasksByCategory.add(cat);
LinkedList<String> ids = getCategoryIds(task);
catHolder.put(categoryId, ids.getLast());
itemIds.add(itemId);
}
I know how to parallelize a for loop as in the example below, but in my case I don't have a single result object like output in that example. I populate multiple data structures while iterating, so I am confused about how to parallelize my outermost for loop and still populate all of them.
private final ExecutorService service = Executors.newFixedThreadPool(10);
List<Future<Output>> futures = new ArrayList<Future<Output>>();
for (final Input input : inputs) {
Callable<Output> callable = new Callable<Output>() {
public Output call() throws Exception {
Output output = new Output();
// process your input here and compute the output
return output;
}
};
futures.add(service.submit(callable));
}
service.shutdown();
List<Output> outputs = new ArrayList<Output>();
for (Future<Output> future : futures) {
outputs.add(future.get());
}
Update:
I am parallelizing a for loop that sits inside a do-while loop, and the do-while loop runs while number is less than or equal to pages. So for each page I have a for loop that I am trying to parallelize, and the way I have set it up, I get a RejectedExecutionException. Maybe I am not doing it correctly.
private void check() {
String endpoint = "some_url";
int number = 1;
int pages = 0;
do {
ExecutorService executorService = Executors.newFixedThreadPool(10);
for (int i = 1; i <= retryCount; i++) {
try {
HttpEntity<String> requestEntity =
new HttpEntity<String>(getBody(number), getHeader());
ResponseEntity<String> responseEntity =
HttpClient.getInstance().getClient()
.exchange(URI.create(endpoint), HttpMethod.POST, requestEntity, String.class);
String jsonInput = responseEntity.getBody();
Process response = objectMapper.readValue(jsonInput, Process.class);
pages = (int) response.getPaginationResponse().getTotalPages();
List<Task> tasks = response.getTasks();
if (pages <= 0 || tasks.isEmpty()) {
continue;
}
// want to parallelize this for loop
for (Task task : tasks) {
Callable<Void> c = new Callable<>() {
public void call() {
if (!task.getCategories().isEmpty() && task.getEventList() != null
&& task.getMetaInfo() != null) {
// my code here
}
}
};
executorService.submit(c);
}
// is this at right place? because I am getting rejectedexecutionexception
executorService.shutdown();
number++;
break;
} catch (Exception ex) {
// log exception
}
}
} while (number <= pages);
}

You do not have to return anything from your parallel code. Take the body of the outer loop and create a task for each item, like this:
for (Task task : tasks) {
    Callable<Void> c = new Callable<Void>() {
        @Override
        public Void call() {
            // skip incomplete tasks, the same role as `continue` in the sequential loop
            if (task.getCategories().isEmpty() || task.getEventList() == null || task.getMetaInfo() == null) {
                return null;
            }
            // ... rest of code here
            return null;
        }
    };
    executorService.submit(c);
}
// wait for executor service, check for exceptions or whatever else you want to do here
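A minimal sketch of how the pieces could fit together, assuming the shared structures are switched to thread-safe collections (a plain HashMap or ArrayList is not safe to update from several threads). The `Item` class, the two fields, and the `processPage` and `run` methods are hypothetical stand-ins for the question's `Task` and data structures. The pool is created once, and `invokeAll` blocks until every task of a page has finished, so `shutdown()` is only called after the last page. Submitting to an executor after `shutdown()` is what raises `RejectedExecutionException`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelPopulate {
    // Hypothetical stand-in for the question's Task class.
    static final class Item {
        final String itemId;
        final String categoryId;

        Item(String itemId, String categoryId) {
            this.itemId = itemId;
            this.categoryId = categoryId;
        }
    }

    // Thread-safe replacements for two of the shared structures (illustrative names).
    final Map<String, String> catHolder = new ConcurrentHashMap<>();
    final Set<String> itemIds = ConcurrentHashMap.newKeySet();

    // Runs one task per item and returns only after all of them have completed.
    void processPage(ExecutorService pool, List<Item> items) throws Exception {
        List<Callable<Void>> jobs = new ArrayList<>();
        for (Item item : items) {
            jobs.add(() -> {
                if (item.itemId == null || item.categoryId == null) {
                    return null; // same role as `continue` in the sequential loop
                }
                catHolder.put(item.categoryId, item.itemId);
                itemIds.add(item.itemId);
                return null;
            });
        }
        // invokeAll blocks until every job is done; get() rethrows any task failure.
        for (Future<Void> future : pool.invokeAll(jobs)) {
            future.get();
        }
    }

    // Processes all pages with a single pool that is shut down at the very end.
    static ParallelPopulate run(List<List<Item>> pages) throws Exception {
        ParallelPopulate result = new ParallelPopulate();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            for (List<Item> page : pages) {
                result.processPage(pool, page);
            }
        } finally {
            pool.shutdown(); // no further submissions happen after this point
        }
        return result;
    }
}
```

Calling `ParallelPopulate.run(pages)` returns an object whose collections can be read through ordinary getters once the call has returned, because every task has finished by then.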

Related

Observe livedata inside a for loop

I want to update a list in my activity that depends on the data of another list. Both lists are observed from the activity through my ViewModel. After I get the data for the first list, I need to loop over it to collect the required ids and fetch the data for the second list.
But keeping the LiveData observer inside the for loop causes a lot of problems. The for loop runs as expected, but the observer is called almost twice as many times as the loop iterates. This happens only the first time, when the list is fetched from the API. When I repeat the operation and the list is cached and read from the database, the problem does not occur. Below is the source code for the problem:
for (int i = 0; i < firstList.size(); i++) {
final String uId = firstList.get(i).item.uid;
final long id = firstList.get(i).item.id;
viewModel.initAnotherItemRepository(uId, id);
viewModel.getSecondItem().observe(this, new Observer<Resource<List<SecondItem>>>() {
@Override
public void onChanged(Resource<List<SecondItem>> listResource) {
if (listResource.data != null) {
secondItemList.addAll(listResource.data);
if (count == firstList.size() - 1) {
//Do something
}
count = count + 1;
}
if (listResource.state == Resource.STATE_FAILURE) {
showLoadingSpinner(false);
}
}
}
);
}
Try observing SecondItem outside the for loop, so that only one observer is registered. It is then notified whenever the data updates:
viewModel.getSecondItem().observe(this, new Observer<Resource<List<SecondItem>>>() {
@Override
public void onChanged(Resource<List<SecondItem>> listResource) {
if (listResource.data != null) {
secondItemList.addAll(listResource.data);
if (count == firstList.size() - 1) {
//Do something
}
count = count + 1;
}
if (listResource.state == Resource.STATE_FAILURE) {
showLoadingSpinner(false);
}
}
}
);
for (int i = 0; i < firstList.size(); i++) {
final String uId = firstList.get(i).item.uid;
final long id = firstList.get(i).item.id;
viewModel.initAnotherItemRepository(uId, id);
}

Right way to combine group of collections

I have written some code to combine, in parallel, a group of collections that contain pairs [String, Integer]. Example:
Thread 1
[Car,1][Bear,1][Car,1]
Thread 2
[River,1][Car,1][River,1]
The result should be one collection per unique pair key (sorted alphabetically):
[Bear,1]
[Car,1][Car,1][Car,1]
[River,1][River,1]
My solution is shown below, but sometimes I don't get the expected result, or a ConcurrentModificationException is thrown from the list that holds the result collections.
List<Collection<Pair<String, Integer>>> combiningResult = new ArrayList<>();
private void startMappingPhase() throws Exception {
SimpleDateFormat formatter = new SimpleDateFormat("HH:mm:ss.SSS");
Invoker invoker = new Invoker(mappingClsPath, "Mapping", "mapper");
List<Callable<Integer>> tasks = new ArrayList<>();
for (String line : fileLines) {
tasks.add(() -> {
try {
combine((Collection<Pair<String, Integer>>) invoker.invoke(line));
} catch (Exception e) {
e.printStackTrace();
executor.shutdownNow();
errorOccurred = true;
return 0;
}
return 1;
});
if (errorOccurred)
Utils.showFatalError("Some error occurred, See log for more detalis");
}
long start = System.nanoTime();
System.out.println(tasks.size() + " Tasks");
System.out.println("Started at " + formatter.format(new Date()) + "\n");
executor.invokeAll(tasks);
long elapsedTime = System.nanoTime() - start;
partitioningResult.forEach(c -> {
System.out.println(c.size() + "\n" + c);
});
System.out.print("\nFinished in " + (elapsedTime / 1_000_000_000.0) + " milliseconds\n");
}
private void partition(Collection<Pair<String, Integer>> pairs) {
Set<Pair<String, Integer>> uniquePairs = new LinkedHashSet<>(pairs);
for (Pair<String, Integer> uniquePair : uniquePairs) {
int pFrequencyCount = Collections.frequency(pairs, uniquePair);
Optional<Collection<Pair<String, Integer>>> collResult = combiningResult.stream().filter(c -> c.contains(uniquePair)).findAny();
if (collResult.isPresent()) {
collResult.ifPresent(c -> {
for (int i = 0; i < pFrequencyCount; i++)
c.add(uniquePair);
});
} else {
Collection<Pair<String, Integer>> newColl = new ArrayList<>();
for (int i = 0; i < pFrequencyCount; i++)
newColl.add(uniquePair);
combiningResult.add(newColl);
}
}
}
I tried CopyOnWriteArrayList instead of ArrayList, but sometimes I get an incomplete result, such as [Car,1][Car,1] instead of three entries.
My question: is there a way to achieve what I'm trying to do without getting a ConcurrentModificationException or an incomplete result?
If you are trying to modify a single collection from multiple threads, you will need to add a synchronized block or use one of the JDK classes that support concurrency. These will typically perform better than a synchronized block.
https://docs.oracle.com/javase/tutorial/essential/concurrency/collections.html
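As one possible illustration of the "JDK classes" route, the sketch below groups pairs by key in a `ConcurrentSkipListMap`, which keeps keys sorted and makes the "find the group or create it" step safe under concurrency. It is not the asker's code: the `PairCombiner` class, the `pair` helper, and the `demo` method are hypothetical, and `Map.Entry` stands in for the question's `Pair` type, which is not shown.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PairCombiner {
    // Keys stay sorted; each value is a thread-safe collection of the pairs seen for that key.
    private final ConcurrentSkipListMap<String, Collection<Map.Entry<String, Integer>>> groups =
            new ConcurrentSkipListMap<>();

    // Safe to call from many threads at once.
    void combine(Collection<Map.Entry<String, Integer>> pairs) {
        for (Map.Entry<String, Integer> pair : pairs) {
            // computeIfAbsent returns the collection that is actually stored in the map,
            // so concurrent callers for the same key all add to the same collection.
            groups.computeIfAbsent(pair.getKey(), k -> new ConcurrentLinkedQueue<>()).add(pair);
        }
    }

    // Hypothetical helper standing in for the question's Pair constructor.
    static Map.Entry<String, Integer> pair(String key) {
        return new SimpleEntry<>(key, 1);
    }

    // Combines the two example batches from the question on a small thread pool.
    static Map<String, Collection<Map.Entry<String, Integer>>> demo() throws Exception {
        PairCombiner combiner = new PairCombiner();
        List<List<Map.Entry<String, Integer>>> batches = Arrays.asList(
                Arrays.asList(pair("Car"), pair("Bear"), pair("Car")),
                Arrays.asList(pair("River"), pair("Car"), pair("River")));

        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (List<Map.Entry<String, Integer>> batch : batches) {
                tasks.add(() -> {
                    combiner.combine(batch);
                    return null;
                });
            }
            // invokeAll waits for all tasks; get() surfaces any exception they threw.
            for (Future<Void> future : executor.invokeAll(tasks)) {
                future.get();
            }
        } finally {
            executor.shutdown();
        }
        return combiner.groups;
    }
}
```

Iterating the returned map visits the keys in alphabetical order. The order of pairs inside one group depends on thread scheduling, which does not matter here because all pairs in a group are equal.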

Thread.sleep blocks other Thread

I have an Output class which just prints everything it is given to print.
public class Output {
private static List<String> textList = new ArrayList<>();
private static Output output = null;
private Output() {
Runnable task = () -> {
int lastIndex = 0;
while (true) {
while (lastIndex < textList.size()) {
System.out.println(lastIndex + " - " + textList.size() + ": " + textList.get(lastIndex));
outputText(textList.get(lastIndex));
lastIndex ++;
}
}
};
new Thread(task).start();
}
private static void outputText(String text) {
synchronized (System.out) {
System.out.println(text);
}
}
public static void say(String text) {
if (output == null) {
output = new Output();
}
textList.add(text);
}
}
When I add something to print, everything works fine:
for (int i = 0; i < 10; i++) {
Output.say("" + i);
}
But when I add a Thread.sleep to the loop it stops on the first output:
for (int i = 0; i < 10; i++) {
Output.say("" + i);
Thread.sleep(100);
}
How can I prevent it? I mean, I'm stopping with sleep just the main thread and not the separate thread.
When you don’t synchronize threads correctly, there is no guarantee that threads see updates made by other threads. They may either completely miss updates or see only parts of them, creating an entirely inconsistent result. Sometimes they may even appear to do the right thing. Without proper synchronization (in the sense of any valid construct specified to be thread safe), this is entirely unpredictable.
Sometimes, the chances of seeing a particular behavior are higher, like in your example. In most runs, the loop without sleep will complete before the other thread even starts its work, whereas inserting sleep raises the chance of lost updates after the second thread has seen values. Once the second thread has seen a value for textList.size(), it might reuse the value forever, evaluating lastIndex < textList.size() to false and executing the equivalent of while(true) { }.
It’s funny that the only place where you inserted a construct for thread safety is the method outputText, which is called by a single thread only (and printing to System.out is synchronized internally in most environments anyway).
Besides, it’s not clear why you are creating an object of type Output that has no relevance here, as all fields and methods are static.
Your code can be corrected and simplified to
public static void main(String[] args) throws InterruptedException {
List<String> textList = new ArrayList<>();
new Thread( () -> {
int index=0;
while(true) synchronized(textList) {
for(; index<textList.size(); index++)
System.out.println(textList.get(index));
}
}).start();
for (int i = 0; i < 10; i++) {
synchronized(textList) {
textList.add(""+i);
}
Thread.sleep(100);
}
}
though it still contains the issues of your original code: it never terminates because of the infinite second thread, and it burns CPU with a polling loop. You should let the second thread wait for new items and add a termination condition:
public static void main(String[] args) throws InterruptedException {
List<String> textList = new ArrayList<>();
new Thread( () -> {
synchronized(textList) {
for(int index=0; ; index++) {
while(index>=textList.size()) try {
textList.wait();
} catch(InterruptedException ex) { return; }
final String item = textList.get(index);
if(item==null) break;
System.out.println(item);
}
}
}).start();
for (int i = 0; i < 10; i++) {
synchronized(textList) {
textList.add(""+i);
textList.notify();
}
Thread.sleep(100);
}
synchronized(textList) {
textList.add(null);
textList.notify();
}
}
This is still only an academic example that you shouldn’t use in real life code. There are classes for thread safe data exchange provided by the Java API removing the burden of implementing such things yourself.
public static void main(String[] args) throws InterruptedException {
ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(10);
String endMarker = "END-OF-QUEUE"; // the queue does not allow null
new Thread( () -> {
for(;;) try {
String item = queue.take();
if(item == endMarker) break;// don't use == for ordinary strings
System.out.println(item);
} catch(InterruptedException ex) { return; }
}).start();
for (int i = 0; i < 10; i++) {
queue.put(""+i);
Thread.sleep(100);
}
queue.put(endMarker);
}

How to set two Java Arraylist values into one Java object?

I have an issue while working with Java ArrayList. Here is a brief description:
By making a web service call, I get all the videos (900+) as Java objects. These objects lack some of the required information, so I make a second web service call, passing the video id. This also returns Java objects.
I am storing the values from the first and second web service calls in two different Java ArrayLists, as below:
List mediaList = new ArrayList();
List mediaVOs = new ArrayList();
Finally, I wrote a method that takes the two lists and sets those values into one Java object. This should return around 942 objects in total, but it returns an odd number, 887364, instead.
Please help me resolve the issue. Here is the code:
client = getClient();
if (client != null) {
List<MediaEntry> mediaList = getAllMedia();
if (mediaList.size() >= 1) {
System.out.println("Total Media ------>" + mediaList.size());
MetadataListResponse metadataListResponse = null;
Media mediaVO = null;
List<List<String>> metadataValues = new ArrayList<List<String>>();
List<String> categoriesList = new ArrayList<String>();
List<String> accountNamesList = new ArrayList<String>();
List<String> ownerNamesList = new ArrayList<String>();
List<String> countryList = new ArrayList<String>();
List<String> languageList = new ArrayList<String>();
for(MediaEntry entry:mediaList) {
if(entry != null) {
metadataListResponse = getMetadata(entry.id);
if (metadataListResponse.totalCount >= 1) {
mediaVO = new Media();
List<Metadata> metadataObjs = metadataListResponse.objects;
if (metadataObjs != null
&& metadataObjs.size() > 0) {
for (int i = 0; i < metadataObjs.size(); i++) {
Metadata metadata = metadataObjs
.get(i);
if (metadata != null) {
if (metadata.xml != null) {
metadataValues = parseXml(metadata.xml);
if (metadataValues.size() != 0) {
categoriesList = metadataValues
.get(0);
accountNamesList = metadataValues.get(1);
ownerNamesList = metadataValues.get(2);
countryList = metadataValues.get(3);
languageList = metadataValues.get(4);
if (categoriesList.size() == 1) {
for (String categoryName : categoriesList) {
//System.out
//.println("categoryName"+categoryName);
mediaVO.setCategories(categoryName);
}
}
if (accountNamesList.size() == 1) {
for (String accountName : accountNamesList) {
//System.out
//.println("accountName"+accountName);
mediaVO.setAccountName(accountName);
}
}
if (ownerNamesList.size() == 1) {
for (String ownerName : ownerNamesList) {
//System.out
//.println("ownerName"+ownerName);
mediaVO.setOwnerName(ownerName);
}
}
if (countryList.size() == 1) {
for (String country : countryList) {
//System.out
//.println("country"+country);
mediaVO.setCountry(country);
}
}
if (languageList.size() == 1) {
for (String language : languageList) {
//System.out
//.println("language"+language);
mediaVO.setLanguage(language);
}
}
}
}
}
}
}
}
mediaVOs.add(mediaVO);
}
}
System.out.println("mediaVOs.size()------>"+mediaVOs.size());
List<Media> medias = setMediaVO(mediaList, mediaVOs);
if(medias.size() >= 1) {
System.out.println("Final medias size ------>"+medias.size());
mediaXml = convertToXml(medias);
System.out.println("Final Media XML converted ------->"+mediaXml);
Document doc = convertStrToDoc(mediaXml);
}
}
}
private List<Media> setMediaVO(List<MediaEntry> mediaList,List<Media> mediaList1) {
if(mediaList.size() >= 1) {
if(mediaList1.size() >= 1) {
for(MediaEntry media:mediaList) {
for(Media media1:mediaList1) {
Media mediaVO = new Media();
MediaType mediaType = media.mediaType;
mediaVO.setMediaId(media.id);
mediaVO.setMediaName(media.name);
mediaVO.setMediaDesc(media.description);
mediaVO.setCreatedDate(media.createdAt);
mediaVO.setCreditUserName(media.creditUserName);
mediaVO.setDataUrl(media.dataUrl);
mediaVO.setDownloadUrl(media.dataUrl);
mediaVO.setDuration(media.duration);
mediaVO.setEndDate(media.endDate);
mediaVO.setEntitledUsersEdit(media.entitledUsersEdit);
mediaVO.setEntitledUsersPublish(media.entitledUsersPublish);
mediaVO.setLastPlayedAt(media.lastPlayedAt);
mediaVO.setMediaType(mediaType.toString());
mediaVO.setUpdatedDate(media.updatedAt);
mediaVO.setPlays(media.plays);
mediaVO.setViews(media.views);
mediaVO.setCategories(media1.getCategories());
mediaVO.setAccountName(media1.getAccountName());
mediaVO.setOwnerName(media1.getOwnerName());
mediaVO.setCountry(media1.getCountry());
mediaVO.setLanguage(media1.getLanguage());
medias.add(mediaVO);
}
}
}
}
return medias;
}
Thanks,
Raji
Your problem is here:
for(MediaEntry media:mediaList) {
for(Media media1:mediaList1) {
For each MediaEntry, you're looping over every Media, which means the inner code runs 942 * 942 times, while what you want is to run it 942 times. You have to match each MediaEntry with its Media and execute the code once per matched pair.
Let me try to explain this in a way that everybody can follow.
The problem is indeed the fact that you multiply 942 by itself.
This happens because of the following code:
private List<Media> setMediaVO(List<MediaEntry> mediaList,List<Media> mediaList1) {
if(mediaList.size() >= 1) {
if(mediaList1.size() >= 1) {
for(MediaEntry media:mediaList) {
for(Media media1:mediaList1) {
//Do stuff
}
}
}
}
return medias;
}
Here you loop through mediaList1 for each item in mediaList and do stuff with it.
At the end of the inner loop body you add a new entry to another list, but this happens 942 times per item in the first list.
And since that list has 942 items, you get the "odd" number of 887,364 (942 * 942).
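The difference between the two loop shapes can be sketched as follows. This is only an illustration: the `PairwiseMerge` class and its methods are hypothetical, plain strings stand in for `MediaEntry` and `Media`, and the index-based pairing assumes that element i of both lists describes the same video. That assumption does not hold automatically in the question's code, because an entry is only added to mediaVOs when metadata was found, so matching by media id may be the safer choice there.

```java
import java.util.ArrayList;
import java.util.List;

class PairwiseMerge {
    // Merges element i of the first list with element i of the second: n results, not n * n.
    static List<String> merge(List<String> entries, List<String> details) {
        if (entries.size() != details.size()) {
            throw new IllegalArgumentException(
                    "lists must be aligned: " + entries.size() + " vs " + details.size());
        }
        List<String> merged = new ArrayList<>(entries.size());
        for (int i = 0; i < entries.size(); i++) {
            merged.add(entries.get(i) + ":" + details.get(i));
        }
        return merged;
    }

    // The nested-loop shape from the question, kept only to show how the count grows.
    static int nestedCount(List<String> entries, List<String> details) {
        int count = 0;
        for (String entry : entries) {
            for (String detail : details) {
                count++; // one result per (entry, detail) combination
            }
        }
        return count;
    }
}
```

With two lists of three elements each, `merge` produces three results while `nestedCount` reports nine, which is the same effect that turns 942 into 887,364.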

Converting Java code into Groovy

I am trying to convert a Java function into equivalent Groovy code, but I cannot find anything that does an && operation in a loop condition. Can anyone guide me through it?
So far this is what I have:
public List getAlert(def searchParameters, def numOfResult) throws UnsupportedEncodingException
{
List respList=null
respList = new ArrayList()
String[] searchStrings = searchParameters.split(",")
try
{
for(strIndex in searchStrings)
{
IQueryResult result = search(searchStrings[strIndex])
if(result!=null)
{
def count = 0
/*The below line gives me error*/
for(it in result.document && count < numOfResult)
{
}
}
}
}
catch(Exception e)
{
e.printStackTrace()
}
}
My Java code
public List getAlert(String searchParameters, int numOfResult) throws UnsupportedEncodingException
{
List respList = null
respList = new ArrayList()
String[] searchStrings = searchParameters.split(",")
try {
for (int strIndex = 0; strIndex < searchStrings.length; strIndex++) {
IQueryResult result = search(searchStrings[strIndex])
if (result != null) {
ListIterator it = result.documents()
int count = 0
while ((it.hasNext()) && (count < numOfResult)) {
IDocumentSummary summary = (IDocumentSummary)it.next()
if (summary != null) {
String docid = summary.getSummaryField("infadocid").getStringValue()
int index = docid.indexOf("#")
docid = docid.substring(index + 1)
String url = summary.getSummaryField("url").getStringValue()
int i = url.indexOf("/", 8)
String endURL = url.substring(i + 1, url.length())
String body = summary.getSummaryField("infadocumenttitle").getStringValue()
String frontURL = produrl + endURL
String strURL
strURL = frontURL
strURL = body
String strDocId
strDocId = frontURL
strDocId = docid
count++
}
}
}
result = null
}
} catch (Exception e) {
e.printStackTrace()
return respList
}
return respList
}
It seems to me like
def summary = result.documents.first()
if (summary) {
String docid = summary.getSummaryField("infadocid").getStringValue()
...
strDocId = docid
}
is all you really need, because the for loop actually doesn't make much sense when all you want is to process the first record.
If there is a possibility that result.documents contains nulls, then replace first() with find()
Edit: To process more than one result:
def summaries = result.documents.take(numOfResult)
// above code assumes result.documents contains no nulls; otherwise:
// def count=0
// def summaries = result.documents.findAll { it && count++<numOfResult }
summaries.each { summary ->
String docid = summary.getSummaryField("infadocid").getStringValue()
...
strDocId = docid
}
In idiomatic Groovy code, many loops are replaced by iterating methods like each().
Note that the while statement also exists in Groovy.
As a consequence, there is no reason to transform it into a for loop.
/*The below line gives me error*/
for(it in result.document && count < 1)
{
}
This line is giving you an error, because result.document will try to call result.getDocument() which doesn't exist.
Also, you should avoid using it as a variable name in Groovy, because within the scope of a closure it is the default name of the first closure parameter.
I haven't looked at the code thoroughly (or as the kids say, "tl;dr"), but I suspect if you just rename the file from .java to .groovy, it will probably work.
