Implementing busy wait within an async task in Java

I'm building a program in Java 7 and I need to download multiple files from a backend server, depending on a value contained in another downloaded file.
I'll explain:
First, my program downloads a file via an AsyncTask; this file contains the number of files to download. In onPostExecute it calls another method that downloads those files and inserts them into an ArrayList after converting them into my app's data.
Now, to handle the completion of these AsyncTasks, I've created another AsyncTask that busy-waits: since I know how many files are to be downloaded, I check in a while loop whether the size of the list equals the expected number of files.
My question is: does this busy wait in the AsyncTask prevent the OS from releasing the processor it is running on, or is it nothing to worry about?
I don't want the busy wait to hog a processor that could be used to download the files faster, or does it even matter?
Since I assume the AsyncTask runs on its own thread and goes to sleep if needed, I assume this busy wait won't abuse a processor?
Does the number of files to be downloaded affect processing time? And if so, is downloading a single file containing all the data better than splitting the download across a few hundred smaller files in multiple threads?
And finally, is it a good practice to write my own busy wait in an AsyncTask?
I'll add some code snippets soon...

Is it good practice to write my own XYZ? No, almost never - unless it's a free-vs-proprietary situation, it's for a new language or platform, you are doing university research, or the standard implementation is so poor that you have a realistic chance of taking over. In this case, none of the above seems true.
I would suggest using the standard java.util.concurrent package. It has many classes that would be very useful for your project, including Future, which implements the waiting functionality you need. Futures are returned by the ExecutorService implementations the package provides.
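For illustration, here is a minimal sketch of that approach (Java 7 style, no lambdas). The URL list and the downloadFile(...) helper are placeholders, not something from the original question:

import java.util.*;
import java.util.concurrent.*;

// Inside some method; 'urls' would be the list read from the first downloaded file.
List<String> urls = Arrays.asList("http://backend/file1", "http://backend/file2");
ExecutorService executor = Executors.newFixedThreadPool(4);
List<Future<byte[]>> futures = new ArrayList<Future<byte[]>>();

for (final String url : urls) {
    futures.add(executor.submit(new Callable<byte[]>() {
        @Override
        public byte[] call() throws Exception {
            return downloadFile(url);   // hypothetical download helper
        }
    }));
}

// get() blocks until each download finishes, so no busy wait is needed.
// (call() and get() throw checked exceptions; handle or declare them.)
for (Future<byte[]> f : futures) {
    byte[] data = f.get();
    // ... convert the data and add it to your list here ...
}
executor.shutdown();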
If you need more control over the process, it is also possible to use a CyclicBarrier. It allows one thread to wait until another completes something, and you can use a timeout if you think your download may stall.
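A rough sketch of that variant, assuming a single downloader thread and a 30-second timeout (both the thread structure and the timeout are made-up):

import java.util.concurrent.*;

final CyclicBarrier barrier = new CyclicBarrier(2);   // one downloader + one waiter

new Thread(new Runnable() {
    @Override
    public void run() {
        // ... download and store the files ...
        try {
            barrier.await();                          // signal "done"
        } catch (Exception e) {
            // interrupted or barrier broken
        }
    }
}).start();

try {
    barrier.await(30, TimeUnit.SECONDS);              // wait, but give up if the download stalls
} catch (TimeoutException e) {
    // the download took too long
} catch (InterruptedException | BrokenBarrierException e) {
    // handle interruption / broken barrier
}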
To answer clearly and directly: no, it is not good practice to implement a homegrown framework for functionality that is available for free as part of the standard runtime. Whether it busy-waits or not hardly matters.

Related

Will threading help improve efficiency in Java?

My application is supposed to have a "realtime with pause" functionality. The user can pause execution, do some things that modify what's going to happen, then unpause and let stuff happen. Stuff happens at regular intervals as specified by the user, can be slow, can be fast.
My goal in using threading here is to improve performance on multicore systems. The amount of data the application has to crunch at each interval can be arbitrarily large (I expect lots and lots of loops over collections, modifying object properties and generating random numbers, but precious little disk access). I don't want the application to be constrained by the capacity of a single core if it can use more to run faster.
Will this actually work this way?
I've run some tests (made a program crunch numbers a lot, and looked at CPU usage during its activity), but it's not really conclusive - usage is certainly in the proximity of 100% on my dual core machine, but hardly ever 100%. Does a single-threaded (main only) Java application use all available cores for computation?
Does a single-threaded (main only) Java application use all available cores for computation?
No, it will normally use a single core.
Making a program do computations in parallel with multiple threads may make it faster, but it's not a magical solution for any kind of problem. Whether this is a suitable solution for your program depends on what your program is doing exactly, and if the algorithm can be parallelized. If, for example, you are doing lots of computations where the next computation depends on the result of the previous computation, then making it multi-threaded will not help a lot, because you can't do the computations at the same time - the next one first has to wait for the answer of the previous one. So, you first have to think about what computations in your program could be run in parallel.
Java has a lot of support for multi-threading. You can program with threads directly, or use an executor service, or use the fork/join framework. Whatever is appropriate depends on what exactly you want to do.
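As a sketch of the fork/join option, here is a hypothetical task that sums a large array in parallel; the array, the threshold and the class name are made up for illustration:

import java.util.concurrent.*;

class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10000;
    private final long[] data;
    private final int from, to;

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {                 // small enough: just loop
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) / 2;                    // otherwise split the range in two
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                                  // left half runs on another worker thread
        return right.compute() + left.join();         // compute right half here, then combine
    }
}

// usage: long total = new ForkJoinPool().invoke(new SumTask(data, 0, data.length));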
Does a single-threaded (main only) Java application use all available cores for computation?
Not usually, but you can make use of some higher-level APIs in Java that use threads for you, so you aren't even using threads directly: most obviously fork/join and executors, and less obviously the new Streams API on collections (i.e. parallelStream; see the sketch below).
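For example, a tiny parallelStream sketch (Java 8+); the numbers are made up, and the library decides how to spread the work across cores:

import java.util.Arrays;
import java.util.List;

List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
long sumOfSquares = numbers.parallelStream()          // runs on the common fork/join pool
                           .mapToLong(n -> (long) n * n)
                           .sum();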
In general, though, to make use of all cores you need some kind of concurrency. Further, it's really hard to tell what is going on just by watching your OS monitor (especially with only 2 cores): your OS has other things going on (managing itself, running your IDE, running crontab, running a browser to post to Stack Overflow ;).
Finally, just adding concurrency by itself may not help; you have to do it "right" for your code/algorithm.
A Java thread will run on a single CPU at a time; to use multiple CPUs, you need multiple threads.
Imagine that you have to do various tasks using your hands. You will do them slowly using one hand and more efficiently using both. Similarly, in Java or any other language, multithreading gives the system many hands, and the good news is that you can have many threads doing different tasks. Running everything in a single thread makes the program sluggish and sometimes unresponsive. A good practice is to do long-running tasks in a separate thread (see the sketch below): for example, loading large chunks of data from a database should be done in a separate thread, and downloading data from the internet should also be done in a separate thread. What happens if you do long-running operations in the main thread? The program hangs and becomes unresponsive until the task completes, and the user will think something is wrong. I hope you get it.
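A minimal sketch of that advice; loadFromServer() is a placeholder for whatever slow operation you have:

// Keep the slow work off the main thread so the UI stays responsive.
Thread worker = new Thread(new Runnable() {
    @Override
    public void run() {
        byte[] data = loadFromServer();   // hypothetical long-running call
        // ... hand the result back to the main/UI thread as appropriate ...
    }
});
worker.start();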

How do I execute multiple processes simultaneously in Java?

I am working on an application in which I want multiple tasks to be executed simultaneously.
I also want to be able to keep track of the number of such tasks being run in parallel, and sometimes add yet another task to be processed in parallel, in addition to the current set of tasks already being processed.
One more thing: I want to do the above not only in a desktop app, but also in a cloud app, in which I initialise another virtual machine running Tomcat and then repeat all of the above in that instance.
What is the best way to do this? If you can point me to the correct theory/guides on this subject, that would be great, although code samples are also welcome.
Concurrency is a huge topic in Java, so please take your time with it:
Lesson: Concurrency
Concurrency in a Java program is accomplished by starting your own Threads. Multiple processes can only be realized with multiple JVMs. When you are done with the basics, take a look at Executors. They will help you implement your application in a structured way, since they abstract from Threads to Tasks (see the sketch below).
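As a small sketch of that idea, tied to your question about tracking how many tasks are running: a fixed pool to which you can keep submitting tasks, with ThreadPoolExecutor exposing the current counts. The pool size of 4 is an arbitrary assumption:

import java.util.concurrent.*;

ThreadPoolExecutor pool = (ThreadPoolExecutor) Executors.newFixedThreadPool(4);

// Submit a new task whenever you like; the pool runs up to 4 in parallel.
pool.submit(new Runnable() {
    @Override
    public void run() {
        // ... one unit of work ...
    }
});

int running = pool.getActiveCount();     // approximate number of tasks executing right now
int waiting = pool.getQueue().size();    // tasks queued behind them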
I don't know how much time you have planned for this, but if you are really at the start, get Java Concurrency in Practice, read it and write a kick-ass concurrent Java application.
Raising the whole thing to a distributed level is a whole other story. You cannot tackle that all at once.
Wow... what a series of steps. Start by implementing Runnable, then use Thread to run and manage your jobs. After that, you can get into Tomcat.

Concurrency while reading files from file system

We have an application that reads files from a particular folder, processes them and copies them (some business logic) to another folder.
The problem is that when there is a very large number of files to be processed, running a single instance of the application or a single thread is no longer enough to process these files.
One approach we have for this is to start multiple instances of the application (I feel something is wrong with this approach; please suggest an alternative if there is one).
Whether we spawn threads or start multiple instances of the application, care should be taken that if one thread reads a file and starts processing it, another thread does not pick it up.
We are trying to achieve this by having a database table with the list of file names in the folder, so that when a thread first reads the table for a file name, we change the status to in-process or completed and pessimistically lock the table so that other threads cannot read it.
Is there any better solution to the problem ?
You can use most of your existing implementation as the front-end processor to feed file streams to worker threads that you can start/stop as demand dictates. Only the front-end thread opens files, so there is no possibility of one worker interfering with another.
EDIT: Added the word 'no' as it changes the meaning quite a bit...
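A minimal sketch of that split, assuming a hypothetical process(File) method and a made-up folder path: only the front-end loop ever picks files, and the workers just process what they are handed.

import java.io.File;
import java.util.concurrent.*;

ExecutorService workers = Executors.newFixedThreadPool(4);
File inbox = new File("/path/to/inbox");           // placeholder input folder

File[] files = inbox.listFiles();
if (files != null) {
    for (final File f : files) {                   // only this thread assigns files
        workers.submit(new Runnable() {
            @Override
            public void run() {
                process(f);                        // hypothetical business logic + copy
            }
        });
    }
}
workers.shutdown();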
Also have a look at JDK 7. It has a new file I/O API and a fork/join framework, which might help.
Take a look at Apache Camel (http://camel.apache.org), and its File component (http://camel.apache.org/file2.html). Using Camel allows you to very easily define a set of processing instructions to consume files in a directory atomically, and also to configure a thread pool to deal with multiple files at the same time. Camel in Action's a great book to get you started.
What you describe reminds me of the classical style of development on UNIX.
In this classical style, you would move a file to a work-in-progress directory so that other processes do not pick it up. In general, you could use one directory per processing state and then move files from state to state.
This works essentially because file moves are atomic (at least on Unix filesystems and NTFS).
What is nice about this approach is that it is pretty easy to handle problematic situations like crashes, and it automatically has a nice management interface everyone is familiar with (the filesystem GUI, ls, Windows Explorer, ...).
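A short sketch of the per-state-directory move using Java 7 NIO (the paths are placeholders); ATOMIC_MOVE makes the atomicity explicit and fails fast if the filesystem cannot guarantee it:

import java.nio.file.*;

Path source = Paths.get("incoming/report.csv");        // placeholder paths
Path target = Paths.get("in-progress/report.csv");
// Whoever wins this move owns the file; anyone else gets an exception and skips it.
Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);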

How can I process multiple files concurrently?

I have a scenario where web archive files (WARC) are periodically dropped by a crawler into different directories. Each WARC file internally consists of thousands of HTML files.
Now, I need to build a framework to process these files efficiently. I know Java doesn't scale well in terms of parallel I/O. What I'm thinking is to have a monitor thread that scans the directory, picks up the file names and drops them into an ExecutorService or some Java blocking queue. A bunch of worker threads (maybe a small number, because of the I/O issue) listening on the executor service will read the files, read the HTML files within, and do the respective processing. This makes sure that threads do not fight over the same file.
Is this the right approach in terms of performance and scalability? Also, how do I handle the files once they are processed? Ideally, the files should be moved or tagged so that they are not picked up by the thread again. Can this be handled through Future objects?
Recent versions of Java already have a built-in file change notification service as part of the standard I/O library (the NIO.2 WatchService in Java 7). You might want to check this out first instead of rolling your own. See here
My key recommendation is to avoid re-inventing the wheel unless you have some specific requirement.
If you're using Java 7, you can take advantage of the WatchService (as suggested by Simeon G).
If you're restricted to Java 6 or earlier, these services aren't available in the JRE. However, Apache Commons IO provides file monitoring; see here.
As an advantage over Java 7, Commons-IO monitors will create a thread for you that raises events against the registered callback. With Java 7, you will need to poll the event list yourself.
Once you have the events, your suggestion of using an ExecutorService to process files off-thread is a good one. Moving files is supported by Java IO and you can just ignore any delete events that are raised.
I've used this model in the past with success.
Here are some things to watch out for:
The "new file" event will likely be raised as soon as the file exists in the directory; HOWEVER, data may still be being written to it. Consider reasonable expectations for file size and how long you need to wait until a file is considered 'whole'.
What is the maximum amount of time you must spend on a file?
Make your executor service parameters tweakable via config - this will simplify your performance testing
Hope this helps. Good luck.
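A minimal Java 7 sketch of the WatchService + ExecutorService combination described above; the directory path and handleFile(...) are placeholder assumptions, and exception handling is trimmed:

import java.nio.file.*;
import java.util.concurrent.*;

Path dir = Paths.get("/data/incoming");                      // placeholder directory
WatchService watcher = dir.getFileSystem().newWatchService();
dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);

ExecutorService workers = Executors.newFixedThreadPool(4);

while (true) {
    WatchKey key = watcher.take();                           // blocks until something happens
    for (WatchEvent<?> event : key.pollEvents()) {
        final Path file = dir.resolve((Path) event.context());
        workers.submit(new Runnable() {
            @Override
            public void run() {
                handleFile(file);                            // hypothetical processing
            }
        });
    }
    if (!key.reset()) {                                      // directory no longer watchable
        break;
    }
}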

What's the most efficient method of continually deleting files older than X hours on Windows?

I have a directory that continually fills up with "artefact" files. Many different programs dump their temporary files in this directory and it's unlikely that these programs will become self-cleaning any time soon.
Meanwhile, I would like to write a program that continually deletes files in this directory as they become stale, which I'll define as "older than 30 minutes".
A typical approach would be to have a timed mechanism that lists the files in the directory, filters out the old stuff, and deletes it. However, this approach is not very performant in my case because the directory could conceivably contain tens or hundreds of thousands of files that do not yet qualify as stale. Consequently, this approach would continually loop over the same thousands of files to find the old ones.
What I'd really like to do is implement some kind of directory listener that was notified of any new files added to the directory. This listener would then add those files to a queue to be deleted down the road. However, there doesn't appear to be a way to implement such a solution in the languages I program in (JVM languages like Java and Scala).
So: I'm looking for the most efficient way to keep a directory "as clean as it can be" on Windows, preferably with a JVM language. Also, though I've never programmed with Powershell, I'd consider it if it offered this kind of functionality. Finally, if there are 3rd party tools out there to do such things, I'd like to hear about them.
Thanks.
Why can't you issue a directory system command sorted by oldest first:
c:>dir /OD
Take the results and delete all files older than your threshold or sleep if no files are old enough.
Combine that with a Timer or Executor set to a granularity of 1 second to 1 minute, which ensures that the files don't keep piling up faster than you can delete them.
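For a JVM flavour of that suggestion, here is a minimal sketch that sweeps the directory once a minute and deletes anything older than 30 minutes; the path and the intervals are placeholders:

import java.io.File;
import java.util.concurrent.*;

final File dir = new File("C:/temp/artefacts");              // placeholder directory
ScheduledExecutorService sweeper = Executors.newSingleThreadScheduledExecutor();

sweeper.scheduleAtFixedRate(new Runnable() {
    @Override
    public void run() {
        long cutoff = System.currentTimeMillis() - 30L * 60 * 1000;
        File[] files = dir.listFiles();
        if (files == null) return;                           // directory missing or unreadable
        for (File f : files) {
            if (f.isFile() && f.lastModified() < cutoff) {
                f.delete();                                  // ignore failures; retry next sweep
            }
        }
    }
}, 0, 1, TimeUnit.MINUTES);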
If you don't want to write C++, you can use Python. Install pywin32 and you can then use the win32 API as such:
import win32api, win32con

path_to_watch = "."   # directory to monitor (placeholder)

# Ask Windows for a change-notification handle for file-name changes in that directory.
change_handle = win32api.FindFirstChangeNotification(
    path_to_watch,
    0,                                       # do not watch subdirectories
    win32con.FILE_NOTIFY_CHANGE_FILE_NAME
)
Full explanation of what to do with that handle by Tim Golden here: http://timgolden.me.uk/python/win32_how_do_i/watch_directory_for_changes.html.
In Java, you can also use Apache Commons JCI FAM. It's an open-source Java library that you can use for free.
JDK 7 (currently in beta) includes support for file notifications as well. Check out the Java NIO.2 tutorial.
Both options should work both on Windows and Linux.
http://www.cyberpro.com.au/Tips_n_Tricks/Windows_Related_Tips/Purge_a_Directory_in_Windows_automatically/
I'd go with C++ for a utility like this - it lets you interface with the Win32 API, which does indeed have directory-listening facilities (FindFirstChangeNotification or ReadDirectoryChangesW). Use one thread that listens for change notifications and updates your list of files (IIRC, FFCN requires you to rescan the folder, whereas RDCW gives you the actual changes).
If you keep this list sorted by modification time, it becomes easy to Sleep() just long enough for a file to go stale, instead of polling at some fixed interval. You might want to do a WaitForSingleObject with a timeout instead of Sleep, in order to react to outside changes (i.e. the file you're waiting on to become stale has been deleted externally, so you'll want to wake up and work out when the next file will become stale).
Sounds like a fun little tool to write :)
You might want to bite the bullet and code it up in C# (or VB). What you're asking for is pretty well handled by the FileSystemWatcher class. It would work basically the way you are describing. Register files as they are added into the directory. Have a periodic timer that scans the list of files for ones that are stale and deletes them if they are still there. I'd probably code it up as a Windows service running under a service id that has enough rights to read/delete files in the directory.
EDIT: A quick google turned up this FileSystemWatcher for Java. Commercial software. Never used it, so can't comment on how well it works.
