I'm building a web crawler that has two main features, both executed as threads:
-The fetcher (crawls a website, separates links from files, and stores both in the database).
-The downloader (downloads files based on the URLs returned by the fetcher).
I have a WebSite object which includes everything I want to know about a website. Now I want to manipulate my database to change the status of a link from waiting to fetching, then to fetched. The same goes for files: from waiting to downloading, then to downloaded.
To prevent a fetcher from fetching a link that has already been chosen by another fetcher, I've written this method inside my WebSite object:
public synchronized String[] getNextLink() {
    // Return the next link from the database that has visited set to 0,
    // then change it to -1 to mark it as in use.
}
And I've done the same for my downloaders with this method:
public synchronized String getNextFile() {
    // Return the next file from the database that has downloaded set to 0,
    // then change it to -1 to mark it as downloading.
}
Both methods are inside my WebSite object, since if two fetchers are working on different websites they cannot select the same row in my database (the same goes for downloaders). But both methods can safely be called at the same time, because fetchers never select a file and downloaders never select a link.
Now, synchronized uses a single lock per object, so my two methods cannot be called at the same time. Is there another keyword to get one lock per method per object, or do I need to code it myself?
Instead of applying the synchronized keyword to whole methods, which implicitly uses this as the lock object, you can use two independent lock objects within the methods (any object can be used as a lock object in Java). Each lock object will be independent of the other:
private final Object fetcherMutex = new Object();
private final Object downloaderMutex = new Object();
public String[] getNextLink() {
    synchronized (fetcherMutex) { /* ... */ }
}

public String getNextFile() {
    synchronized (downloaderMutex) { /* ... */ }
}
I want to update a document with a User object that I have, but I do not want the document to be created if it does not exist, and therefore I cannot use "DocumentReference.set" with "SetOptions.Merge()" (to my understanding).
However, according to this post (Difference between set with {merge: true} and update), "update" is actually the command I need. My problem is, it doesn't seem like update accepts a Java object.
I do not want to check whether or not the document exists myself, as this will result in an unnecessary read.
Is there any way around this?
Here is my code (I have removed success and failure listeners for simplicity):
public void saveUser(User user)
{
    CollectionReference collection = db.collection("users");
    String id = user.getId();
    if (id.equals(""))
    {
        collection.add(user);
    }
    else
    {
        // I need to ensure that the id variable for my user corresponds
        // with an existing ID, as I do not want a new ID to be generated by
        // my Java code (all IDs should be generated by Firestore auto-ID)
        collection.document(id).set(user);
    }
}
It sounds like you:
Want to update an existing document
Are unsure if it already exists
Are unwilling to read the document to see if it exists
If this is the case, simply call update() and let it fail if the document doesn't exist. It won't crash your app. Attach an error listener to the task it returns, and decide what you want to do if it fails.
However, you will need to construct a Map of fields and values to update from the source object you have. There is no workaround for that.
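For illustration, one way to build that Map by hand is sketched below. The User class and its getters here are assumptions standing in for the asker's own class, and the actual Firestore call is shown only as a comment since it needs a live project:

```java
import java.util.HashMap;
import java.util.Map;

public class UserUpdate {
    // Plain POJO standing in for the asker's User class (hypothetical fields)
    public static class User {
        private final String id;
        private final String name;
        public User(String id, String name) { this.id = id; this.name = name; }
        public String getId() { return id; }
        public String getName() { return name; }
    }

    // Build the field map that update() expects instead of a POJO;
    // the "name" key must match the field name stored in the document.
    public static Map<String, Object> toUpdateMap(User user) {
        Map<String, Object> fields = new HashMap<>();
        fields.put("name", user.getName());
        return fields;
    }
}
```

With such a map, the update then looks roughly like:

```java
// collection.document(user.getId())
//           .update(UserUpdate.toUpdateMap(user))
//           .addOnFailureListener(e -> { /* document did not exist */ });
```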
While working in IntelliJ, I am unable to check whether a thread is holding a lock or not.
In the Eclipse GUI there is a lock-like icon next to the thread, telling us that it is holding that lock.
In the code snapshot below, my thread is at notifyElementAdded() and holding the lock; however, in the thread stack there is no such icon or indication from IntelliJ.
So my question is: how do I check the same thing in the IntelliJ GUI?
The Thread class in Java actually has a static boolean method for this - Thread.holdsLock(Object).
To get the ID of the current thread when it holds the monitor, you can use the code example below (note that Thread.holdsLock() can only test the calling thread, not arbitrary threads):
public static long getMonitorOwner(Object obj)
{
    if (Thread.holdsLock(obj))
    {
        return Thread.currentThread().getId();
    }
    return -1; // the calling thread does not hold the monitor
}
I don't think there is similar functionality, but you can still check by getting a thread dump.
Click Get Thread Dump in the Debug window, then look for "locked" in the log to see that the thread is actually holding the lock.
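If you want the same information programmatically rather than from a dump, the JDK's ThreadMXBean can report every monitor each thread currently holds. A minimal sketch (class and method names here are my own, not from the question):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MonitorInfo;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;

public class LockDump {
    // Lists one "<thread> holds <monitor class>" entry per held monitor
    public static List<String> heldMonitors() {
        List<String> result = new ArrayList<>();
        // lockedMonitors=true asks the JVM to include held monitors per thread
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(true, false)) {
            for (MonitorInfo m : info.getLockedMonitors()) {
                result.add(info.getThreadName() + " holds " + m.getClassName());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Object observers = new Object(); // stand-in for the question's lock object
        synchronized (observers) {
            heldMonitors().forEach(System.out::println);
        }
    }
}
```

This is essentially what the debugger's thread dump shows, so it can double as a logging aid when the IDE gives no visual hint.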
Create a custom watch variable in the IntelliJ debugging console using the plus button as shown in the image below.
Now every time you run the code in debug mode, this variable will be re-evaluated at all your breakpoints.
I created the variable Thread.holdsLock(AwsS3ClientHelper.class), since I was acquiring a lock on the class itself. You can write any expression of your choice there; in your particular case, it will be Thread.holdsLock(observers).
This could be a potential feature request for IntelliJ to include in their GUI product.
Programmatically, you can verify this with the java.lang.Thread.holdsLock() method, which returns true if and only if the current thread holds the monitor lock on the specified object:
public static boolean holdsLock(Object obj)
Below is a snippet of a run method for reference:
public void run() {
    // Thread.holdsLock returns true only while the current thread
    // holds the monitor lock on the given object; here it returns false
    System.out.println("Holds Lock = " + Thread.holdsLock(this));
    synchronized (this) {
        // returns true
        System.out.println("Holds Lock = " + Thread.holdsLock(this));
    }
}
This is an SAP PI requirement.
Source system: XY_Client
Middleware: PI system
Target system: SAP
XML files are received by the PI system; for each XML file, an internal file is generated to keep track of the store number and the count of XML files.
How it works: suppose XML_FILE_1 reaches PI; an internal file called sequence_gen is created. The file contains the store number present in the XML file, and the count is initialized to 1.
so first time,
sequence_gen file contains Store: 1001 Count:1
If XML_FILE_2 reaches PI after some time interval, then the second time,
sequence_gen file contains Store: 1001 Count:2
and so on..
My question is: if 'n' files reach the PI system at the same time, the first file will lock the sequence_gen file; how will the second file then update its value in the sequence_gen file? How do I tackle this problem?
I thought of creating a thread instance for every call and storing it in a database, then retrieving each instance, performing the function, returning the result to the XML call, and deleting that instance. Is this possible? How do I go forward with this?
Rather than keeping track of all of the threads that lock and unlock the file, you could have a single thread in charge of changing it. Have each worker thread place a change request into a concurrent queue, which then notifies the sequence_gen thread to write to its own file. In essence:
Sequence_gen thread:
@Override
public synchronized void run() {
    while (true) { // or some stop condition
        while (queue.isEmpty()) {
            try {
                wait();
            } catch (InterruptedException e) {
                return;
            }
        }
        Object obj = queue.remove();
        // open file
        file.write(obj);
        // close file
    }
}
Then, in any other thread, just enqueue the item and notify that there is something to write:
public synchronized void addItem(Object item) {
    queue.add(item);
    this.notifyAll();
}
In my application I have a set of files which contain some information. I have to process the files, and among them the duplicates must be skipped. To do that, I use each file's CRC to check which files have been processed and which have not. For duplicate checking I have to store the file CRCs somewhere, because when today's processing is over I will have to process files again tomorrow, and files that are duplicates of today's should be skipped. So here is what I have done in my code:
filesSize += fileInf.srcFileSize;
// if the file is already polled and under processing or in the execution queue
if (filesInQueue.get(fileInf.srcFileCRC) != null) {
    System.out.println("Skipping queued file: " + fileInf.srcFileName);
    ifCRCExist.add(fileInf.srcFileCRC);
}
fileInf.srcFileCRC is the file CRC that I add to the list. Now I need to store this list non-persistently somewhere so that I can check against it later. I hope I am clear now. Please, can anyone help me?
Each instance in Java is transient by nature. Are you in search of a Singleton? You may expose the ArrayList as a static field of some class. For example:
public class InMemoryStore {
    public static final List<String> MY_SINGLETON_LIST;
    static {
        MY_SINGLETON_LIST = ...
    }
}
Or you may do it by creating a bean with Spring, for example.
If you want non-persistent storage, meaning not in a DB or XML file, then make a class and create a static list member. Now you can use it as many times as you like while initializing it only once, just like the Singleton pattern.
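A minimal sketch of that idea, using a hypothetical CrcStore holder class (the synchronizedList wrapper is there on the assumption that several polling threads may touch the list concurrently):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CrcStore {
    // One list for the whole JVM, created once when the class is loaded;
    // it lives only in memory, so it is gone when the process exits.
    private static final List<String> PROCESSED_CRCS =
            Collections.synchronizedList(new ArrayList<>());

    private CrcStore() {} // no instances, mirroring the Singleton idea

    public static void add(String crc) {
        PROCESSED_CRCS.add(crc);
    }

    public static boolean contains(String crc) {
        return PROCESSED_CRCS.contains(crc);
    }
}
```

In the question's code this would replace the ifCRCExist list: call CrcStore.add(fileInf.srcFileCRC) when skipping, and CrcStore.contains(...) when checking later runs. Note that an in-memory store will not survive a restart, so surviving until "tomorrow" assumes the JVM keeps running.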
I have some counters I created in my Mapper class:
(example written using the appengine-mapreduce Java library v.0.5)
@Override
public void map(Entity entity) {
    getContext().incrementCounter("analyzed");
    if (isSpecial(entity)) {
        getContext().incrementCounter("special");
    }
}
(The method isSpecial just returns true or false depending on the state of the entity; it's not relevant to the question.)
I want to access those counters when I finish processing everything, in the finish method of the Output class:
@Override
public Summary finish(Collection<? extends OutputWriter<Entity>> writers) {
    // get the counters and save/return the summary
    int analyzed = 0; // getCounter("analyzed");
    int special = 0;  // getCounter("special");
    Summary summary = new Summary(analyzed, special);
    save(summary);
    return summary;
}
... but the method getCounter is only available from the MapperContext class, which is accessible only from Mappers/Reducers getContext() method.
How can I access my counters at the Output stage?
Side note: I can't pass the counter values through my output, because the whole MapReduce is about transforming one set of Entities into another (in other words, the counters are not the main purpose of the MapReduce). The counters are just for control; it makes sense to compute them here instead of creating another process just to do the counts.
Thanks.
There is no way to do this inside of Output today, but feel free to request it here:
https://code.google.com/p/appengine-mapreduce/issues/list
What you can do, however, is chain a job to run after your MapReduce that will receive its output and counters. There is an example of this here:
https://code.google.com/p/appengine-mapreduce/source/browse/trunk/java/example/src/com/google/appengine/demos/mapreduce/entitycount/ChainedMapReduceJob.java
In the above example, three MapReduce jobs run in a row. Note that these don't have to be MapReduce jobs; you can create your own class that extends Job and has a run method which creates your Summary object.