I want to process the parent and child entities in parallel, in a process that must return the child entities quickly, but I can't decide which approach is suitable. Each parallel task makes one HTTP call and one call to a Spring Data repository's save method (I will manage the thread count to match the JDBC connection pool size).
By the way, I have only tried the RxJava 2 library so far.
What I expected: if one parallel flow throws an exception, onErrorResumeNext (or something similar) should carry on and complete all the remaining work after the exception. Instead, it suspends the flow completely.
So what I need is completely non-blocking parallel flows: if one of them throws an exception, just catch it and continue the rest of the parallel work.
Any ideas? Any other solution is acceptable (such as manual thread management).
This is what I tried, but it is not working as expected.
package com.mypackage;
import io.reactivex.Flowable;
import io.reactivex.schedulers.Schedulers;
import lombok.extern.slf4j.Slf4j;
import java.util.ArrayList;
import java.util.List;
@Slf4j
public class TestApp {
public static void main(String[] args) {
long start = System.currentTimeMillis();
List<String> createdParentEntities = new ArrayList<>();
List<String> erroredResponses = new ArrayList<>();
List<String> childEntities = new ArrayList<>();
Flowable.range(1, 100) // 100 is not fixed normally
.parallel(100) // It will be changed according to size
.runOn(Schedulers.io())
.map(integer -> createParentEntity(String.valueOf(integer)))
.sequential()
.onErrorResumeNext(t -> {
System.out.println(t.getMessage());
if (t instanceof Exception) {
erroredResponses.add(t.getMessage());
return Flowable.empty();
} else {
return Flowable.error(t);
}
})
.blockingSubscribe(createdParentEntities::add);
if (!createdParentEntities.isEmpty()) {
Flowable.fromIterable(createdParentEntities)
.parallel(createdParentEntities.size())
.runOn(Schedulers.io())
.doOnNext(TestApp::createChildEntity)
.sequential()
.blockingSubscribe(childEntities::add);
}
System.out.println("====================");
long time = System.currentTimeMillis() - start;
log.info("Total Time : " + time);
log.info("TOTAL CREATED ENTITIES : " + createdParentEntities.size());
log.info("CREATED ENTITIES " + createdParentEntities.toString());
log.info("ERRORED RESPONSES " + erroredResponses.toString());
log.info("TOTAL ENTITIES : " + childEntities.size());
}
public static String createParentEntity(String id) throws Exception {
Thread.sleep(1000); // Simulated for creation call
if (id.equals("35") || id.equals("75")) {
throw new Exception("ENTITIY SAVE ERROR " + id);
}
log.info("Parent entity saved : " + id);
return id;
}
public static String createChildEntity(String parentId) throws Exception {
Thread.sleep(1000);// Simulated for creation call
log.info("Incoming entity: " + parentId);
return "Child Entity: " + parentId + " parentId";
}
}
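For reference, the failure mode described above (one error escaping a rail and cancelling the whole sequential stream) is usually avoided by handling the error per item, inside each rail, so only that inner Flowable is replaced and the remaining rails keep running. A minimal sketch reusing the names from the code above (note that erroredResponses and createdParentEntities would need to be thread-safe collections, since the rails run on io() threads):
Flowable.range(1, 100)
    .parallel(100)
    .runOn(Schedulers.io())
    .flatMap(integer -> Flowable.fromCallable(() -> createParentEntity(String.valueOf(integer)))
        .doOnError(t -> erroredResponses.add(t.getMessage())) // record the failure
        .onErrorResumeNext(Flowable.empty()))                 // drop only this item; other rails continue
    .sequential()
    .blockingSubscribe(createdParentEntities::add);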
I am evaluating Ignite as a caching layer for our architecture. When trying out the Ignite Java thin client for the use case mentioned below, I cannot find any pointers in the Ignite docs or forums as to how this is being tackled by the Ignite community. Any pointers will be helpful before I go ahead and use my custom solution.
Use case: all nodes in an Ignite cluster go down and come back up; basically, the thin client loses its connection to all cluster nodes for some time.
What I was expecting
I am using continuous query and register for disconnect events. Hence, I was expecting some disconnect event which I never got. Reference code below.
public static QueryCursor<Cache.Entry<String, String>> subscribeForDataUpdates(ClientCache<String, String> entityCache,
AtomicLong totalUpdatesTracker) {
ClientDisconnectListener disconnectListener = reason ->
System.out.printf("Client: %s received disconnect event with reason:%s %n",
getClientIpAddr(),
reason.getMessage());
ContinuousQuery<String, String> continuousQuery = new ContinuousQuery<>();
continuousQuery.setLocalListener(new CacheUpdateListener(entityCache.getName(), totalUpdatesTracker));
QueryCursor<Cache.Entry<String, String>> queryCursor = entityCache.query(continuousQuery, disconnectListener);
System.out.printf("Client: %s - subscribed for change notification(s) for entity cache: %s %n",
getClientIpAddr(),
entityCache.getName());
return queryCursor;
}
What I ended up doing
Writing my own checker to re-initialize the thin client connection to ignite cluster and re-subscribing for continuous query updates.
import io.vavr.control.Try;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.client.IgniteClient;
import javax.cache.Cache;
import javax.inject.Inject;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import static com.cisco.ignite.consumer.CacheChangeSubscriber.subscribeForDataUpdates;
import static com.cisco.ignite.consumer.Utils.addShutDownHookToCloseCacheUpdates;
import static com.cisco.ignite.consumer.Utils.getClientIpAddr;
public class ClusterConnectionChecker implements Runnable {
private static final List<QueryCursor<Cache.Entry<String, String>>> querySubscriptions = new ArrayList<>();
@Inject
private CacheChangeSubscriber cacheChangeSubscriber;
private IgniteClient thinClientInstance;
private final long secondsDelayBetweenChecks;
private final List<String> cacheNames;
private final AtomicLong totalUpdatesTracker;
private boolean needsReSubscription = false;
public ClusterConnectionChecker(IgniteClient client, long delayBetweenChecks,
List<String> cacheNames, AtomicLong totalUpdatesTracker) {
this.thinClientInstance = client;
this.secondsDelayBetweenChecks = delayBetweenChecks;
this.cacheNames = cacheNames;
this.totalUpdatesTracker = totalUpdatesTracker;
}
@Override
public void run() {
while(!Thread.interrupted()) {
try {
Thread.sleep(TimeUnit.SECONDS.toMillis(secondsDelayBetweenChecks));
boolean isClusterConnectionActive = isConnectionToClusterActive();
if (!isClusterConnectionActive) {
needsReSubscription = true;
System.out.printf("Time: %s | Connection to ignite cluster is not active !!! %n",
LocalDateTime.now());
reInitializeThinClient();
reSubscribeForUpdates();
} else {
// we only need to conditionally re-subscribe
if (needsReSubscription) {
reSubscribeForUpdates();
}
}
} catch (InterruptedException ie) {
// do nothing - just reset the interrupt flag.
Thread.currentThread().interrupt();
}
}
}
private boolean isConnectionToClusterActive() {
return Try.of(() -> {
return thinClientInstance.cluster().state().active();
}).recover(ex -> {
return false;
}).getOrElse(false);
}
private void reInitializeThinClient() {
Try.of(() -> {
thinClientInstance = cacheChangeSubscriber.createThinClientInstance();
if (thinClientInstance.cluster().state().active()) {
System.out.printf("Client: %s | Thin client instance was re-initialized since it was not active %n",
getClientIpAddr());
}
return thinClientInstance;
}).onFailure(th -> System.out.printf("Client: %s | Failed to re-initialize ignite cluster connection. " +
"Will re-try after:%d seconds %n", getClientIpAddr(),secondsDelayBetweenChecks));
}
private void reSubscribeForUpdates() {
if (isConnectionToClusterActive()) {
System.out.printf("Client: %s | Re-subscribing for cache updates after cluster connection re-init... %n",
getClientIpAddr());
// re-set the counter to 0 since we are re-subscribing fresh
totalUpdatesTracker.set(0);
cacheNames.forEach(name -> querySubscriptions.add(subscribeForDataUpdates(
thinClientInstance.getOrCreateCache(name),
totalUpdatesTracker)));
addShutDownHookToCloseCacheUpdates(querySubscriptions, thinClientInstance);
needsReSubscription = false;
}
}
}
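For completeness, the checker is started once at application startup on its own daemon thread, roughly like this (a hedged sketch; the address and cache name are placeholders, and in my code the client actually comes from CacheChangeSubscriber.createThinClientInstance()):
IgniteClient thinClient = Ignition.startClient(
    new ClientConfiguration().setAddresses("127.0.0.1:10800"));
ClusterConnectionChecker checker = new ClusterConnectionChecker(
    thinClient, 30, Arrays.asList("entityCache"), new AtomicLong());
Thread checkerThread = new Thread(checker, "ignite-connection-checker");
checkerThread.setDaemon(true); // do not keep the JVM alive just for the checker loop
checkerThread.start();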
I am searching for a way to correctly employ Publishers from Project Reactor without producing useless GC pressure by instantiating the whole pipeline on each call.
In our code, a typical handle function answering inter-service HTTP requests looks like this:
final List<Function<ChangeEvent, Mono<Void>>> triggerOtherMicroservices;
@PostMapping("/handle")
public Mono<Void> handle(@RequestBody ChangeEvent changeEvent) {
return Mono
.defer(() -> someService.callToAnotherMicroServiceToFetchData(changeEvent))
.subscribeOn(Schedulers.parallel())
.map(this::mapping)
.flatMap(data -> databaseService.save(data))
.thenMany(Flux.fromIterable(triggerOtherMicroservices).flatMap(t -> t.apply(changeEvent)))
.then();
}
If I understand correctly, this means that on each invocation of handle the whole pipeline (which normally produces huge stack traces) needs to be instantiated (and thus garbage collected later).
My question is: Can I somehow "prepare" the whole flow once and reuse it later?
I was thinking about something like Mono.create( ... ) ..... Or, am I completely wrong and there is no need to think about optimization here?
EDIT:
Thinking further I could do:
final List<Function<ChangeEvent, Mono<Void>>> triggerOtherMicroservices;
final Mono<Void> mono = Mono
.defer(() -> Mono
.subscriberContext()
.map(context -> context.get("event"))
.flatMap(event -> someService.callToAnotherMicroServiceToFetchData(event))
)
.subscribeOn(Schedulers.parallel())
.flatMap(data -> databaseService.save(data))
.thenMany(Mono
.subscriberContext()
.map(context -> context.get("event"))
.flatMap(event -> Flux
.fromIterable(triggerOtherMicroservices)
.flatMap(t -> t.apply(event)))
)
.then();
public Mono<Void> handle(@Validated ChangeEvent changeEvent) throws NoSuchElementException {
return mono.subscriberContext(context -> context.put("event", changeEvent));
}
Anyway, I doubt this is what subscriberContext is meant for.
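As a side note, passing per-subscription input this way is exactly what newer Reactor versions formalized: subscriberContext was deprecated in favor of contextWrite/deferContextual (Reactor 3.4+). A hedged sketch of the same idea with that API, shortened to the fetch-and-save part and assuming the same services as above:
final Mono<Void> pipeline = Mono
    .deferContextual(ctx -> someService.callToAnotherMicroServiceToFetchData(ctx.get("event")))
    .subscribeOn(Schedulers.parallel())
    .map(this::mapping)
    .flatMap(data -> databaseService.save(data))
    .then();

public Mono<Void> handle(@RequestBody ChangeEvent changeEvent) {
    // contextWrite attaches the request-scoped value at subscription time
    return pipeline.contextWrite(ctx -> ctx.put("event", changeEvent));
}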
Note: there are many JVM implementations, and this answer doesn't claim to have tested all of them, nor to be a general statement for all possible situations.
According to https://www.bettercodebytes.com/the-cost-of-object-creation-in-java-including-garbage-collection/, it is possible that there is no overhead from object creation when objects only live within a method: thanks to escape analysis, the JIT may not actually allocate the object but rather execute the contained methods directly.
Hence, no garbage collection is required for such objects later on.
A test of this combined with the question can be implemented like so:
Controller:
final List<Function<Event, Mono<Void>>> triggerOtherMicroservices = Arrays.asList(
event -> Mono.empty(),
event -> Mono.empty(),
event -> Mono.empty()
);
final Mono<Void> mono = Mono
.defer(() -> Mono
.subscriberContext()
.<Event>map(context -> context.get("event"))
.flatMap(this::fetch)
)
.subscribeOn(Schedulers.parallel())
.flatMap(this::duplicate)
.flatMap(this::duplicate)
.flatMap(this::duplicate)
.flatMap(this::duplicate)
.thenMany(Mono
.subscriberContext()
.<Event>map(context -> context.get("event"))
.flatMapMany(event -> Flux
.fromIterable(triggerOtherMicroservices)
.flatMap(t -> t.apply(event))
)
)
.then();
@PostMapping("/event-prepared")
public Mono<Void> handle(@RequestBody @Validated Event event) throws NoSuchElementException {
return mono.subscriberContext(context -> context.put("event", event));
}
@PostMapping("/event-on-the-fly")
public Mono<Void> handleOld(@RequestBody @Validated Event event) throws NoSuchElementException {
return Mono
.defer(() -> fetch(event))
.subscribeOn(Schedulers.parallel())
.flatMap(this::duplicate)
.flatMap(this::duplicate)
.flatMap(this::duplicate)
.flatMap(this::duplicate)
.thenMany(Flux.fromIterable(triggerOtherMicroservices).flatMap(t -> t.apply(event)))
.then();
}
private Mono<Data> fetch(Event event) {
return Mono.just(new Data(event.timestamp));
}
private Mono<Data> duplicate(Data data) {
return Mono.just(new Data(data.a * 2));
}
Data:
long a;
public Data(long a) {
this.a = a;
}
@Override
public String toString() {
return "Data{" +
"a=" + a +
'}';
}
Event:
@JsonSerialize(using = EventSerializer.class)
public class Event {
UUID source;
long timestamp;
@JsonCreator
public Event(@JsonProperty("source") UUID source, @JsonProperty("timestamp") long timestamp) {
this.source = source;
this.timestamp = timestamp;
}
@Override
public String toString() {
return "Event{" +
"source=" + source +
", timestamp=" + timestamp +
'}';
}
}
EventSerializer:
public class EventSerializer extends StdSerializer<Event> {
public EventSerializer() {
this(null);
}
public EventSerializer(Class<Event> t) {
super(t);
}
@Override
public void serialize(Event value, JsonGenerator jsonGenerator, SerializerProvider provider) throws IOException {
jsonGenerator.writeStartObject();
jsonGenerator.writeStringField("source", value.source.toString());
jsonGenerator.writeNumberField("timestamp", value.timestamp);
jsonGenerator.writeEndObject();
}
}
and finally the test itself:
@SpringBootTest
@AutoConfigureWebTestClient
class MonoAssemblyTimeTest {
@Autowired
private WebTestClient webTestClient;
final int number_of_requests = 500000;
@Test
void measureExecutionTime() throws IOException {
measureExecutionTime("on-the-fly");
measureExecutionTime("prepared");
}
private void measureExecutionTime(String testCase) throws IOException {
warmUp("/event-" + testCase);
final GCStatisticsDifferential gcStatistics = new GCStatisticsDifferential();
long[] duration = benchmark("/event-" + testCase);
StringBuilder output = new StringBuilder();
int plotPointsInterval = (int) Math.ceil((float) number_of_requests / 1000);
for (int i = 0; i < number_of_requests; i++) {
if (i % plotPointsInterval == 0) {
output.append(String.format("%d , %d %n", i, duration[i]));
}
}
Files.writeString(Paths.get(testCase + ".txt"), output.toString());
long totalDuration = LongStream.of(duration).sum();
System.out.println(testCase + " duration: " + totalDuration / 1000000 + " ms.");
System.out.println(testCase + " average: " + totalDuration / number_of_requests + " ns.");
System.out.println(testCase + ": " + gcStatistics.get());
}
private void warmUp(String path) {
UUID source = UUID.randomUUID();
IntStream.range(0, number_of_requests).forEach(i -> call(new Event(source, i), path));
System.out.println("done with warm-up for path: " + path);
}
private long[] benchmark(String path) {
long[] duration = new long[number_of_requests];
UUID source = UUID.randomUUID();
IntStream.range(0, number_of_requests).forEach(i -> {
long start = System.nanoTime();
call(new Event(source, i), path).returnResult().getResponseBody();
duration[i] = System.nanoTime() - start;
});
System.out.println("done with benchmark for path: " + path);
return duration;
}
private WebTestClient.BodySpec<Void, ?> call(Event event, String path) {
return webTestClient
.post()
.uri(path)
.contentType(MediaType.APPLICATION_JSON)
.bodyValue(event)
.exchange()
.expectBody(Void.class);
}
private static class GCStatisticsDifferential extends GCStatistics {
GCStatistics old = new GCStatistics(0, 0);
public GCStatisticsDifferential() {
super(0, 0);
calculateIncrementalGCStats();
}
public GCStatistics get() {
calculateIncrementalGCStats();
return this;
}
private void calculateIncrementalGCStats() {
long timeNew = 0;
long countNew = 0;
for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
long count = gc.getCollectionCount();
if (count >= 0) {
countNew += count;
}
long time = gc.getCollectionTime();
if (time >= 0) {
timeNew += time;
}
}
time = timeNew - old.time;
count = countNew - old.count;
old = new GCStatistics(countNew, timeNew); // constructor takes (count, time)
}
}
private static class GCStatistics {
long count, time;
public GCStatistics(long count, long time) {
this.count = count;
this.time = time;
}
@Override
public String toString() {
return "GCStatistics{" +
"count=" + count +
", time=" + time +
'}';
}
}
}
The results are not always the same, but the "on-the-fly" method consistently outperforms the "prepared" method. Plus, the "on-the-fly" method triggers far fewer garbage collections.
A typical result looks like:
done with warm-up for path: /event-on-the-fly
done with benchmark for path: /event-on-the-fly
on-the-fly duration: 42679 ms.
on-the-fly average: 85358 ns.
on-the-fly: GCStatistics{count=29, time=128}
done with warm-up for path: /event-prepared
done with benchmark for path: /event-prepared
prepared duration: 44678 ms.
prepared average: 89357 ns.
prepared: GCStatistics{count=86, time=67}
These results were obtained on a MacBook Pro (16-inch, 2019), 2.4 GHz 8-core Intel Core i9, 64 GB 2667 MHz DDR4.
Note: Comments, better answers, or ... are still very welcome.
First, take some measurements to decide whether GC pressure is really high and worth bothering about.
Then, use an object-oriented library which allows you to explicitly create pipeline objects and reuse them for multiple requests. Look at Vert.x, for example (I have never used it). My library Df4j allows you to create and reuse a computational graph of any topology, not only linear pipelines, but it does not contain modules to perform HTTP requests. However, Df4j implements the Reactive Streams protocol and so can be connected to any compatible network library.
I'm getting into RxJava and am looking for a good way to share a number of BehaviorSubjects with multiple subscribers. Each BehaviorSubject is identified by a unique subject name, and only one subscription should be made to the back end for each subject.
If there are no current subscribers for a BehaviorSubject, it should unsubscribe from the back end.
The following code does what I want, but the MyFakeService class lacks the elegance that RxJava promises.
package au.play;
import io.reactivex.Observable;
import io.reactivex.disposables.Disposable;
import io.reactivex.functions.Consumer;
import io.reactivex.observers.DisposableObserver;
import io.reactivex.subjects.BehaviorSubject;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
public class Demo {
public static class MyFakeBackEnd {
private final Observable<Long> FAKE_SOURCE = Observable.interval(30, 10, TimeUnit.MILLISECONDS);
public Observable<Long> getObservable(String subject) {
return FAKE_SOURCE;
}
}
public static class MyFakeService {
private final MyFakeBackEnd myFakeBackEnd = new MyFakeBackEnd();
private final Map<String, Observable<Long>> subjectMap = new ConcurrentHashMap<>();
public Observable<Long> getObservable(String subject) {
return subjectMap.computeIfAbsent(subject, (String key) -> {
BehaviorSubject<Long> behaviourSubject = BehaviorSubject.createDefault(-1L);
AtomicReference<Disposable> atomicDisposable = new AtomicReference<>();
return behaviourSubject
.doOnSubscribe(disposable -> {
System.out.println("First subscriber for <" + key + ">");
final DisposableObserver<Long> disposableObserver = new DisposableObserver<Long>() {
@Override
public void onNext(Long value) {
behaviourSubject.onNext(value);
}
@Override
public void onError(Throwable e) {
e.printStackTrace();
}
@Override
public void onComplete() {
System.out.println("Why complete?");
}
};
myFakeBackEnd.getObservable(subject).subscribeWith(disposableObserver);
atomicDisposable.set(disposableObserver);
})
.doOnDispose(() -> {
System.out.println("Last observer unsubscribed : <" + key + ">");
atomicDisposable.get().dispose();
behaviourSubject.onNext(-2L);
}).share();
});
}
}
public static void main(String[] args) throws InterruptedException {
MyFakeService service = new MyFakeService();
System.out.println("C-1 subscription, should trigger 'First subscriber for <firstSubject>' and then start receiving updates. Initial value should be -1");
Disposable firstDisposable = service.getObservable("firstSubject").subscribe(createConsumer("C-1"));
Thread.sleep(45);
System.out.println("C-2 subscription, should not trigger 'First subscriber for <firstSubject>'. Should receive same updates as C-1.");
Disposable secondDisposable = service.getObservable("firstSubject").subscribe(createConsumer("C-2"));
System.out.println("C-3 subscription, should trigger 'First subscriber for <secondSubject>' and then start receiving updates. Initial value should be -1");
Disposable thirdDisposable = service.getObservable("secondSubject").subscribe(createConsumer("C-3"));
Thread.sleep(45);
System.out.println("Dispose of C-1 subscription. C-2 should continue getting updates.");
firstDisposable.dispose();
Thread.sleep(45);
System.out.println("Dispose of C-2 subscription. Should trigger 'Last observer unsubscribed : <firstSubject>'.");
secondDisposable.dispose();
Thread.sleep(45);
System.out.println("Dispose of C-3 subscription. Should trigger 'Last observer unsubscribed : <secondSubject>'.");
thirdDisposable.dispose();
Thread.sleep(45);
System.out.println("C-4 subscription, should trigger 'First subscriber for <secondSubject>' and then start receiving updates. Initial value should be -2 as this subject has been subscribed to before.");
Disposable fourthDisposable = service.getObservable("secondSubject").subscribe(createConsumer("C-4"));
Thread.sleep(45);
fourthDisposable.dispose();
}
private static Consumer<Long> createConsumer(final String id) {
return (data) -> System.out.println(id + " : <" + data + ">");
}
}
It seems very likely that there is a better solution to this that I can't spot because I'm new to the framework. Any ideas?
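For reference, a common idiom for this one-backend-subscription-per-key pattern is replay(1).refCount(), which does the first-subscriber/last-subscriber bookkeeping that MyFakeService implements by hand. A minimal sketch against the same fake back end (note it drops the custom -1/-2 sentinel defaults, which may or may not be acceptable for your case):
public static class MySharedService {
    private final MyFakeBackEnd myFakeBackEnd = new MyFakeBackEnd();
    private final Map<String, Observable<Long>> subjectMap = new ConcurrentHashMap<>();

    public Observable<Long> getObservable(String subject) {
        return subjectMap.computeIfAbsent(subject, key ->
            myFakeBackEnd.getObservable(key)
                .doOnSubscribe(d -> System.out.println("First subscriber for <" + key + ">"))
                .doOnDispose(() -> System.out.println("Last observer unsubscribed : <" + key + ">"))
                .replay(1)     // cache and replay the latest value, like a BehaviorSubject
                .refCount());  // connect on the first subscriber, disconnect after the last
    }
}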
So, a little background:
I am working on a project in which a servlet is going to release crawlers upon a lot of text files within a file system. I was thinking of dividing the load across multiple threads, for example:
A crawler enters a directory, finds 3 files and 6 directories. It will start processing the files and start a thread with a new crawler for the other directories. So from my creator class I would create a single crawler upon a base directory. The crawler would assess the workload and, if deemed necessary, spawn another crawler under another thread.
My crawler class looks like this
package com.fujitsu.spider;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Serializable;
import java.util.ArrayList;
public class DocumentSpider implements Runnable, Serializable {
private static final long serialVersionUID = 8401649393078703808L;
private Spidermode currentMode = null;
private String URL = null;
private String[] terms = null;
private float score = 0;
private ArrayList<SpiderDataPair> resultList = null;
public enum Spidermode {
FILE, DIRECTORY
}
public DocumentSpider(String resourceURL, Spidermode mode, ArrayList<SpiderDataPair> resultList) {
currentMode = mode;
setURL(resourceURL);
this.setResultList(resultList);
}
@Override
public void run() {
try {
if (currentMode == Spidermode.FILE) {
doCrawlFile();
} else {
doCrawlDirectory();
}
} catch (Exception e) {
e.printStackTrace();
}
System.out.println("SPIDER # " + URL + " HAS FINISHED.");
}
public Spidermode getCurrentMode() {
return currentMode;
}
public void setCurrentMode(Spidermode currentMode) {
this.currentMode = currentMode;
}
public String getURL() {
return URL;
}
public void setURL(String uRL) {
URL = uRL;
}
public void doCrawlFile() throws Exception {
File target = new File(URL);
if (target.isDirectory()) {
throw new Exception(
"This URL points to a directory while the spider is in FILE mode. Please change this spider to FILE mode.");
}
procesFile(target);
}
public void doCrawlDirectory() throws Exception {
File baseDir = new File(URL);
if (!baseDir.isDirectory()) {
throw new Exception(
"This URL points to a FILE while the spider is in DIRECTORY mode. Please change this spider to DIRECTORY mode.");
}
File[] directoryContent = baseDir.listFiles();
for (File f : directoryContent) {
if (f.isDirectory()) {
DocumentSpider spider = new DocumentSpider(f.getPath(), Spidermode.DIRECTORY, this.resultList);
spider.terms = this.terms;
(new Thread(spider)).start();
} else {
DocumentSpider spider = new DocumentSpider(f.getPath(), Spidermode.FILE, this.resultList);
spider.terms = this.terms;
(new Thread(spider)).start();
}
}
}
public void procesDirectory(String target) throws IOException {
File base = new File(target);
File[] directoryContent = base.listFiles();
for (File f : directoryContent) {
if (f.isDirectory()) {
procesDirectory(f.getPath());
} else {
procesFile(f);
}
}
}
public void procesFile(File target) throws IOException {
BufferedReader br = new BufferedReader(new FileReader(target));
String line;
while ((line = br.readLine()) != null) {
String[] words = line.split(" ");
for (String currentWord : words) {
for (String a : terms) {
if (a.toLowerCase().equalsIgnoreCase(currentWord)) {
score += 1f;
}
if (currentWord.toLowerCase().contains(a)) {
score += 1f;
}
}
}
}
br.close();
resultList.add(new SpiderDataPair(this, URL));
}
public String[] getTerms() {
return terms;
}
public void setTerms(String[] terms) {
this.terms = terms;
}
public float getScore() {
return score;
}
public void setScore(float score) {
this.score = score;
}
public ArrayList<SpiderDataPair> getResultList() {
return resultList;
}
public void setResultList(ArrayList<SpiderDataPair> resultList) {
this.resultList = resultList;
}
}
The problem I am facing is that in my root crawler I have this list of results from every crawler that I want to process further. The operation to process the data from this list is called from the servlet (or the main method in this example). However, the operation is always called before all of the crawlers have completed their processing, thus launching the result processing too soon, which leads to incomplete data.
I tried solving this using the join method, but unfortunately I can't seem to figure this one out.
package com.fujitsu.spider;
import java.util.ArrayList;
import com.fujitsu.spider.DocumentSpider.Spidermode;
public class Main {
public static void main(String[] args) throws InterruptedException {
ArrayList<SpiderDataPair> results = new ArrayList<SpiderDataPair>();
String [] terms = {"SERVER","CHANGE","MO"};
DocumentSpider spider1 = new DocumentSpider("C:\\Users\\Mark\\workspace\\Spider\\Files", Spidermode.DIRECTORY, results);
spider1.setTerms(terms);
DocumentSpider spider2 = new DocumentSpider("C:\\Users\\Mark\\workspace\\Spider\\File2", Spidermode.DIRECTORY, results);
spider2.setTerms(terms);
Thread t1 = new Thread(spider1);
Thread t2 = new Thread(spider2);
t1.start();
t1.join();
t2.start();
t2.join();
for(SpiderDataPair d : spider1.getResultList()){
System.out.println("PATH -> " + d.getFile() + " SCORE -> " + d.getSpider().getScore());
}
for(SpiderDataPair d : spider2.getResultList()){
System.out.println("PATH -> " + d.getFile() + " SCORE -> " + d.getSpider().getScore());
}
}
}
TL;DR
I really wish to understand this subject, so any help would be immensely appreciated!
You need a couple of changes in your code:
In the spider:
List<Thread> threads = new LinkedList<Thread>();
for (File f : directoryContent) {
if (f.isDirectory()) {
DocumentSpider spider = new DocumentSpider(f.getPath(), Spidermode.DIRECTORY, this.resultList);
spider.terms = this.terms;
Thread thread = new Thread(spider);
threads.add(thread);
thread.start();
} else {
DocumentSpider spider = new DocumentSpider(f.getPath(), Spidermode.FILE, this.resultList);
spider.terms = this.terms;
Thread thread = new Thread(spider);
threads.add(thread);
thread.start();
}
}
for (Thread thread : threads) thread.join();
The idea is to create a new thread for each spider and start it. Once they are all running, you wait until each one is done before the spider itself finishes. This way each spider thread keeps running until all of its work is done (thus the top thread runs until all children and their children are finished).
You also need to change your runner so that it runs the two spiders in parallel instead of one after another, like this:
Thread t1 = new Thread(spider1);
Thread t2 = new Thread(spider2);
t1.start();
t2.start();
t1.join();
t2.join();
You should use a higher-level library than bare Thread for this task. I would suggest looking into ExecutorService in particular, and all of java.util.concurrent generally. There are abstractions there that can manage all of the threading issues while providing well-formed tasks a properly protected environment in which to run.
For your specific problem, I would recommend some sort of blocking queue of tasks and a standard producer-consumer architecture. Each task knows how to determine whether its path is a file or a directory. If it is a file, process the file; if it is a directory, crawl the directory's immediate contents and enqueue new tasks for each sub-path. You could also use some properly synchronized shared state to cap the number of files processed, the depth, etc. Also, the service provides the ability to await termination of its tasks, making the "join" simpler.
With this architecture, you decouple threads and thread management (handled by the ExecutorService) from your business logic of tasks (typically a Runnable or Callable). The service itself can be tuned in how it instantiates threads, such as a fixed maximum number or a scalable number depending on how many concurrent tasks exist (see the factory methods on java.util.concurrent.Executors). Threads, which are more expensive than the Runnables they execute, are re-used to conserve resources.
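To make that concrete, here is a minimal sketch of the producer-consumer idea with an ExecutorService (a hypothetical reworking, not a drop-in replacement for DocumentSpider; scoring is omitted and the result sink is simplified to file paths):
import java.io.File;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class CrawlerPool {
    private final ExecutorService pool = Executors.newFixedThreadPool(8); // hard cap on threads
    private final List<String> results = new CopyOnWriteArrayList<>();    // thread-safe result sink
    private final AtomicInteger pending = new AtomicInteger();            // outstanding task count

    public List<String> crawl(File baseDir) throws InterruptedException {
        submit(baseDir);
        synchronized (pending) {
            while (pending.get() > 0) pending.wait(); // the "join": wait for all tasks to finish
        }
        pool.shutdown();
        return results;
    }

    private void submit(File path) {
        pending.incrementAndGet();
        pool.execute(() -> {
            try {
                if (path.isDirectory()) {
                    File[] children = path.listFiles();
                    if (children != null) for (File f : children) submit(f); // enqueue sub-paths
                } else {
                    results.add(path.getPath()); // "process" the file; term scoring would go here
                }
            } finally {
                if (pending.decrementAndGet() == 0) {
                    synchronized (pending) { pending.notifyAll(); } // wake the waiting crawl() call
                }
            }
        });
    }
}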
If your objective is primarily something functional that works in production quality, then the library is the way to go. However, if your objective is to understand the lower-level details of thread management, then you may want to investigate the use of latches and perhaps thread groups to manage them at a lower level, exposing the details of the implementation so you can work with the details.
I am new to the Java / Hibernate / Seam way of development, but I appear to have a strange issue with Hibernate and concurrent threads.
I have an application-scoped Seam component (Orchestrator.java) which is executed via EJB timers at a set interval, calling the method startProcessingWorkloads.
This method has an injected EntityManager, which it uses to check the database for a collection of data; if it finds a work collection, it creates a new asynchronous Seam component (LoadController.java) and executes the start() method on that controller.
LoadController has an EntityManager injected and uses it to perform a very large transaction (about one hour).
Once the LoadController is running as a separate thread, the Orchestrator is still executed as a thread at its set interval, so, for example:
1min
Orchestrator - Looks for work collection (None found) (thread 1)
2min
Orchestrator - Looks for work collection (finds one, Starts LoadController) (thread 1)
LoadController - Starts updating database records (thread 2)
3min
Orchestrator - Looks for work collection (None found) (thread 1)
LoadController - Still updating database records (thread 2)
4min
Orchestrator - Looks for work collection (None found) (thread 1)
LoadController - Still updating database records (thread 2)
5min
Orchestrator - Looks for work collection (None found) (thread 1)
LoadController - Done updating database records (thread 2)
6min
Orchestrator - Looks for work collection (None found) (thread 1)
7min
Orchestrator - Looks for work collection (None found) (thread 1)
However, I am receiving an intermittent error (see below) when the Orchestrator runs concurrently with the LoadController.
15:10:40,852 WARN [AbstractBatcher] exception clearing maxRows/queryTimeout
java.sql.SQLException: Connection is not associated with a managed connection. org.jboss.resource.adapter.jdbc.jdk6.WrappedConnectionJDK6@1fcdb21
This error is thrown after the Orchestrator has completed its SQL query, as the LoadController attempts to execute its next SQL query.
I did some research and came to the conclusion that the EntityManager was being closed, hence the LoadController was unable to use it.
Now, confused as to what exactly closed the connection, I did some basic object dumps of the EntityManager objects used by the Orchestrator and the LoadController when each of the components is called, and I found that just before I receive the above error, this happens:
2010-07-30 15:06:40,804 INFO [processManagement.LoadController] (pool-15-thread-2) org.jboss.seam.persistence.EntityManagerProxy@7e3da1
2010-07-30 15:10:40,758 INFO [processManagement.Orchestrator] (pool-15-thread-1) org.jboss.seam.persistence.EntityManagerProxy@7e3da1
It appears that during one of the Orchestrator's execution intervals, it obtains a reference to the same EntityManager that the LoadController is currently using. When the Orchestrator completes its SQL execution it closes the connection, and then the LoadController can no longer execute its updates.
So my question is: does anyone know of this happening, or have I got my threading all mucked up in this code?
From my understanding, when injecting an EntityManager, a new instance is injected from the EntityManagerFactory, which remains with that particular object until the object leaves scope (in this case they are stateless, so when the start() method ends). How could the same instance of an EntityManager be injected into two separate threads?
Orchestrator.java
#Name("processOrchestrator")
#Scope(ScopeType.APPLICATION)
#AutoCreate
public class Orchestrator {
//___________________________________________________________
@Logger Log log;
@In EntityManager entityManager;
@In LoadController loadController;
@In WorkloadManager workloadManager;
//___________________________________________________________
private int fProcessInstanceCount = 0;
//___________________________________________________________
public Orchestrator() {}
//___________________________________________________________
synchronized private void incrementProcessInstanceCount() {
fProcessInstanceCount++;
}
//___________________________________________________________
synchronized private void decreaseProcessInstanceCount() {
fProcessInstanceCount--;
}
//___________________________________________________________
#Observer("controllerExceptionEvent")
synchronized public void controllerExceptionListiner(Process aProcess, Exception aException) {
decreaseProcessInstanceCount();
log.info(
"Controller " + String.valueOf(aProcess) +
" failed with the error [" + aException.getMessage() + "]"
);
Events.instance().raiseEvent(
Application.ApplicationEvent.applicationExceptionEvent.name(),
aException,
Orchestrator.class
);
}
//___________________________________________________________
#Observer("controllerCompleteEvent")
synchronized public void successfulControllerCompleteListiner(Process aProcess, long aWorkloadId) {
try {
MisWorkload completedWorkload = entityManager.find(MisWorkload.class, aWorkloadId);
workloadManager.completeWorkload(completedWorkload);
} catch (Exception ex) {
log.error(ex.getMessage(), ex);
}
decreaseProcessInstanceCount();
log.info("Controller " + String.valueOf(aProcess) + " completed successfuly");
}
//___________________________________________________________
@Asynchronous
public void startProcessingWorkloads(@IntervalDuration long interval) {
log.info("Polling for workloads.");
log.info(entityManager.toString());
try {
MisWorkload pendingWorkload = workloadManager.getNextPendingWorkload();
if (pendingWorkload != null) {
log.info(
"Pending Workload found (Workload_Id = " +
String.valueOf(pendingWorkload.getWorkloadId()) +
"), starting process controller."
);
Process aProcess = pendingWorkload.retriveProcessIdAsProcess();
ControllerIntf controller = createWorkloadController(aProcess);
if (controller != null) {
controller.start(aProcess, pendingWorkload.getWorkloadId());
workloadManager.setWorkloadProcessing(pendingWorkload);
}
}
} catch (Exception ex) {
Events.instance().raiseEvent(
Application.ApplicationEvent.applicationExceptionEvent.name(),
ex,
Orchestrator.class
);
}
log.info("Polling complete.");
}
//___________________________________________________________
private ControllerIntf createWorkloadController(Process aProcess) {
ControllerIntf newController = null;
switch(aProcess) {
case LOAD:
newController = loadController;
break;
default:
log.info(
"createWorkloadController() does not know the value (" +
aProcess.name() +
") no controller will be started."
);
}
// If a new controller is created than increase the
// count of started controllers so that we know how
// many are running.
if (newController != null) {
incrementProcessInstanceCount();
}
return newController;
}
//___________________________________________________________
}
LoadController.java
#Name("loadController")
#Scope(ScopeType.STATELESS)
#AutoCreate
public class LoadController implements ControllerIntf {
//__________________________________________________
@Logger private Log log;
@In private EntityManager entityManager;
//__________________________________________________
private String fFileName = "";
private String fNMDSFileName = "";
private String fAddtFileName = "";
//__________________________________________________
public LoadController(){ }
//__________________________________________________
@Asynchronous
synchronized public void start(Process aProcess, long aWorkloadId) {
log.info(
LoadController.class.getName() +
" process thread was started for WorkloadId [" +
String.valueOf(aWorkloadId) + "]."
);
log.info(entityManager.toString());
try {
Query aQuery = entityManager.createQuery(
"from MisLoad MIS_Load where Workload_Id = " + String.valueOf(aWorkloadId)
);
MisLoad misLoadRecord = (MisLoad)aQuery.getSingleResult();
fFileName =
misLoadRecord.getInitiatedBy().toUpperCase() + "_" +
misLoadRecord.getMdSourceSystem().getMdState().getShortName() + "_" +
DateUtils.now(DateUtils.FORMAT_FILE) + ".csv"
;
fNMDSFileName = "NMDS_" + fFileName;
fAddtFileName = "Addt_" + fFileName;
createDataFile(misLoadRecord.getFileContents());
ArrayList<String>sasCode = generateSASCode(
misLoadRecord.getLoadId(),
misLoadRecord.getMdSourceSystem().getPreloadFile()
);
//TODO: As the sas password will be encrypted in the database, we will
// need to decrypt it before passing to the below function
executeLoadSASCode(
sasCode,
misLoadRecord.getInitiatedBy(),
misLoadRecord.getSasPassword()
);
createWorkloadContentRecords(aWorkloadId, misLoadRecord.getLoadId());
//TODO: Needs to remove password from DB when complete
removeTempCSVFiles();
Events.instance().raiseEvent(
Application.ApplicationEvent.controllerCompleteEvent.name(),
aProcess,
aWorkloadId
);
log.info(LoadController.class.getName() + " process thread completed.");
} catch (Exception ex) {
Events.instance().raiseEvent(
Application.ApplicationEvent.controllerExceptionEvent.name(),
aProcess,
ex
);
}
}
//__________________________________________________
private void createDataFile(byte[] aFileContent) throws Exception {
File dataFile =
new File(ECEConfig.getConfiguration().sas_tempFileDir() + "\\" + fFileName);
FileUtils.writeBytesToFile(dataFile, aFileContent, true);
}
//__________________________________________________
private ArrayList<String> generateSASCode(long aLoadId, String aSourceSystemPreloadSasFile) {
String sasTempDir = ECEConfig.getConfiguration().sas_tempFileDir();
ArrayList<String> sasCode = new ArrayList<String>();
sasCode.add("%let sOracleUserId = " + ECEConfig.getConfiguration().oracle_username() + ";");
sasCode.add("%let sOraclePassword = " + ECEConfig.getConfiguration().oracle_password() + ";");
sasCode.add("%let sOracleSID = " + ECEConfig.getConfiguration().oracle_sid() + ";");
sasCode.add("%let sSchema = " + ECEConfig.getConfiguration().oracle_username() + ";");
sasCode.add("%let sECESASSourceDir = " + ECEConfig.getConfiguration().sas_sourceDir() + ";");
sasCode.add("libname lOracle ORACLE user=&sOracleUserId pw=&sOraclePassword path=&sOracleSID schema=&sSchema;");
sasCode.add("%let sCommaDelimiter = %str(" + ECEConfig.getConfiguration().dataload_csvRawDataFileDelimiter() + ");");
sasCode.add("%let sPipeDelimiter = %nrquote(" + ECEConfig.getConfiguration().dataload_csvNMDSDataFileDelimiter() + ");");
sasCode.add("%let sDataFileLocation = " + sasTempDir + "\\" + fFileName + ";");
sasCode.add("%let sNMDSOutputDataFileLoc = " + sasTempDir + "\\" + fNMDSFileName + ";");
sasCode.add("%let sAddtOutputDataFileLoc = " + sasTempDir + "\\" + fAddtFileName + ";");
sasCode.add("%let iLoadId = " + String.valueOf(aLoadId) + ";");
sasCode.add("%include \"&sECESASSourceDir\\ECE_UtilMacros.sas\";");
sasCode.add("%include \"&sECESASSourceDir\\" + aSourceSystemPreloadSasFile + "\";");
sasCode.add("%include \"&sECESASSourceDir\\ECE_NMDSLoad.sas\";");
sasCode.add("%preload(&sDataFileLocation, &sCommaDelimiter, &sNMDSOutputDataFileLoc, &sAddtOutputDataFileLoc, &sPipeDelimiter);");
sasCode.add("%loadNMDS(lOracle, &sNMDSOutputDataFileLoc, &sAddtOutputDataFileLoc, &sPipeDelimiter, &iLoadId);");
return sasCode;
}
//__________________________________________________
private void executeLoadSASCode(
ArrayList<String> aSasCode, String aUserName, String aPassword) throws Exception
{
SASExecutor aSASExecutor = new SASExecutor(
ECEConfig.getConfiguration().sas_server(),
ECEConfig.getConfiguration().sas_port(),
aUserName,
aPassword
);
aSASExecutor.execute(aSasCode);
log.info(aSASExecutor.getCompleteSasLog());
}
//__________________________________________________
/**
* Creates the MIS_UR_Workload_Contents records for
* the ECE Unit Record data that was just loaded
*
* @param aWorkloadId
* @param aMisLoadId
* @throws Exception
*/
private void createWorkloadContentRecords(long aWorkloadId, long aMisLoadId) throws Exception {
String selectionRule =
" from EceUnitRecord ECE_Unit_Record where ECE_Unit_Record.loadId = " +
String.valueOf(aMisLoadId)
;
MisWorkload misWorkload = entityManager.find(MisWorkload.class, aWorkloadId);
SeamManualTransaction manualTx = new SeamManualTransaction(
entityManager,
ECEConfig.getConfiguration().manualSeamTxTimeLimit()
);
manualTx.begin();
RecordPager oPager = new RecordPager(
entityManager,
selectionRule,
ECEConfig.getConfiguration().recordPagerDefaultPageSize()
);
Object nextRecord = null;
while ((nextRecord = oPager.getNextRecord()) != null) {
EceUnitRecord aEceUnitRecord = (EceUnitRecord)nextRecord;
MisUrWorkloadContents aContentsRecord = new MisUrWorkloadContents();
aContentsRecord.setEceUnitRecordId(aEceUnitRecord.getEceUnitRecordId());
aContentsRecord.setMisWorkload(misWorkload);
aContentsRecord.setProcessOutcome('C');
entityManager.persist(aContentsRecord);
}
manualTx.commit();
}
/**
* Removes the CSV temp files that are created for input
* into the SAS server and that are created as output.
*/
private void removeTempCSVFiles() {
String sasTempDir = ECEConfig.getConfiguration().sas_tempFileDir();
File dataInputCSV = new File(sasTempDir + "\\" + fFileName);
File nmdsOutputCSV = new File(sasTempDir + "\\" + fNMDSFileName);
File addtOutputCSV = new File(sasTempDir + "\\" + fAddtFileName);
if (dataInputCSV.exists()) {
dataInputCSV.delete();
}
if (nmdsOutputCSV.exists()) {
nmdsOutputCSV.delete();
}
if (addtOutputCSV.exists()) {
addtOutputCSV.delete();
}
}
}
SeamManualTransaction.java
public class SeamManualTransaction {
//___________________________________________________________
private boolean fObjectUsed = false;
private boolean fJoinExistingTransaction = true;
private int fTransactionTimeout = 60; // Default: 60 seconds
private UserTransaction fUserTx;
private EntityManager fEntityManager;
//___________________________________________________________
/**
* Set the transaction timeout in seconds (converted from minutes)
*
* @param aTimeoutInSecs the number of minutes to keep the transaction active (multiplied by 60 below to give seconds)
*/
private void setTransactionTimeout(int aTimeoutInSecs) {
// 60 * aTimeoutInSecs = Timeout in Seconds
fTransactionTimeout = 60 * aTimeoutInSecs;
}
//___________________________________________________________
/**
* Constructor
*
* @param aEntityManager
*/
public SeamManualTransaction(EntityManager aEntityManager) {
fEntityManager = aEntityManager;
}
//___________________________________________________________
/**
* Constructor
*
* @param aEntityManager
* @param aTimeoutInSecs
*/
public SeamManualTransaction(EntityManager aEntityManager, int aTimeoutInSecs) {
setTransactionTimeout(aTimeoutInSecs);
fEntityManager = aEntityManager;
}
//___________________________________________________________
/**
* Constructor
*
* @param aEntityManager
* @param aTimeoutInSecs
* @param aJoinExistingTransaction
*/
public SeamManualTransaction(EntityManager aEntityManager, int aTimeoutInSecs, boolean aJoinExistingTransaction) {
setTransactionTimeout(aTimeoutInSecs);
fJoinExistingTransaction = aJoinExistingTransaction;
fEntityManager = aEntityManager;
}
//___________________________________________________________
/**
* Starts the new transaction
*
* @throws Exception
*/
public void begin() throws Exception {
if (fObjectUsed) {
throw new Exception(
SeamManualTransaction.class.getCanonicalName() +
" has been used. Create new instance."
);
}
fUserTx =
(UserTransaction) org.jboss.seam.Component.getInstance("org.jboss.seam.transaction.transaction");
fUserTx.setTransactionTimeout(fTransactionTimeout);
fUserTx.begin();
/* If entity manager is created before the transaction
* is started (ie. via Injection) then it must join the
* transaction
*/
if (fJoinExistingTransaction) {
fEntityManager.joinTransaction();
}
}
//___________________________________________________________
/**
* Commit the transaction to the database
*
* @throws Exception
*/
public void commit() throws Exception {
fObjectUsed = true;
fUserTx.commit();
}
//___________________________________________________________
/**
* Rolls the transaction back
*
* @throws Exception
*/
public void rollback() throws Exception {
fObjectUsed = true;
fUserTx.rollback();
}
//___________________________________________________________
}
In general, injecting an entityManager into a Seam component of APPLICATION scope is not right. An entity manager is something you create, use, and close again, in a scope typically much shorter than APPLICATION scope.
Improve this by choosing smaller scopes with a standard entityManager injection, or, if you really need APPLICATION scope, inject an EntityManagerFactory instead and create, use, and close the entityManager yourself.
Look in your Seam components.xml to find the name of your EntityManagerFactory component.
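A hedged sketch of the factory option, reusing the LoadController shape from the question (the exact component wiring is an assumption; use whatever your components.xml declares):
@Name("loadController")
@Scope(ScopeType.STATELESS)
@AutoCreate
public class LoadController implements ControllerIntf {
    @In private EntityManagerFactory entityManagerFactory; // safe to share across threads

    @Asynchronous
    public void start(Process aProcess, long aWorkloadId) {
        // Create a fresh EntityManager per unit of work; never share one across threads.
        EntityManager entityManager = entityManagerFactory.createEntityManager();
        try {
            // ... the long-running work, using this thread-confined entityManager ...
        } finally {
            entityManager.close(); // always close what you create
        }
    }
}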
Well, my first advice is:
If you are using an EJB application, prefer to use a bean-managed transaction instead of your custom SeamManualTransaction. When you use a bean-managed transaction, you, as the developer, take care of calling begin and commit. You get this feature by using a UserTransaction component. You can create a facade layer which begins and commits your transaction. Something like:
/**
* The default scope when using a @Stateless session bean is ScopeType.STATELESS,
*
* so you do not need to declare @Scope(ScopeType.STATELESS) anymore.
*
* A session bean can not use both BEAN and CONTAINER transaction management at the same time.
*/
@Stateless
@Name("businessFacade")
@TransactionManagement(TransactionManagerType.BEAN)
public class BusinessFacade implements BusinessFacadeLocal {
private @Resource TimerService timerService;
private @Resource UserTransaction userTransaction;
/**
* You can use @In if you are using Seam capabilities
*/
private @PersistenceContext EntityManager entityManager;
public void doSomething() {
try {
userTransaction.setTransactionTimeout(60 * 60); // example: one hour, in seconds; applies to the transaction started by the next begin()
userTransaction.begin();
// business logic goes here
/**
* To enable your Timer service, just call
*
* timerService.createTimer(15*60*1000, 15*60*1000, <ANY_SERIALIZABLE_INFO_GOES_HERE>);
*/
userTransaction.commit();
} catch (Exception e) {
userTransaction.rollback();
}
}
@Timeout
public void doTimer(Timer timer) {
try {
userTransaction.begin();
timer.getInfo();
// logic goes here
userTransaction.commit();
} catch (Exception e) {
userTransaction.rollback();
}
}
}
Let's see the UserTransaction.begin method API:
Create a new transaction and associate it with the current thread.
There is more:
The lifetime of a container-managed persistence context (injected through the @PersistenceContext annotation) corresponds to the scope of a transaction (between the begin and commit method calls) when using a transaction-scoped persistence context.
Now let's see TimerService:
It is a container-provided service that allows enterprise beans to be registered for timer callback methods to occur at a specified time, after a specified elapsed time, or after specified intervals. The bean class of an enterprise bean that uses the timer service must provide a timeout callback method. Timers can be created for stateless session beans and message-driven beans.
I hope it can be useful to you.