Read JSON files into collections, best practice


I'm working on a JavaFX application. I have several JSON files which I would like to read and insert into Collections in domain objects. I am using Gson to read these files at present. My application currently is working, however, there is a long delay before the application launches. I assume that this is because I'm reading these files sequentially and in the same Thread. Therefore, I am looking to enhance the launch time by introducing some concurrency. I'm thinking If I can figure out how to read the files in parallel it should speed up the launch time. I'm new to the idea of concurrency so I'm trying to learn as I go. Needless to say, I've hit a few roadblocks and can't seem to find much information or examples online.
Here are my issues:
Not sure if the JSON file reads can be done in a background thread.
Domain classes use these Collections to compute and eventually display values in the GUI. From my understanding, if you modify the GUI it has to be done in the JavaFX Application thread and not in the background. I'm not sure if loading data to be used in the GUI counts as modifying the GUI. I'm not directly updating any GUI Nodes like textField.setText("something") by reading Json, so I would assume no, I'm not. Am I wrong?
What is the difference between a Task<V> and a Thread, or an ExecutorService and a Callable<V>? Is one method preferred over the other? I've tried both and failed. When I tried using a Task and a background thread, I would get a NullPointerException because the app tried to access the collection before the files were read and initialized with data. It went from being too slow to being too fast. SMH.
To solve this problem, I heard about Preloaders. The idea here was to launch some sort of splash screen to delay until the loading of resources (reading of JSON files) was complete, then proceed to the main application. However, the examples or information here is VERY scarce. I'm using Java 10 and IntelliJ, so I may have cornered myself into a one in a million niche.
I'm not asking for anyone to solve my problem for me. I'm just a little lost and don't know where or how to proceed. I'll be happy to share specifics if needed but I think my issues are still conceptual at this point.
Help me StackOverflow you're my only hope.
edit: code example:
public class Employee {
private List<Employee> employeeList;
public Employee() {
employeeList = new ArrayList<>();
populateEmployees();
}
private final void populateEmployees() {
Task<Void> readEmployees = new Task<>() {
@Override
protected Void call() throws Exception {
System.out.println("Starting to read employee.json"); // #1
InputStream in = getClass().getResourceAsStream("/json/employee.json");
Reader reader = new InputStreamReader(in);
Type type = new TypeToken<List<Employee>>(){}.getType();
Gson gson = new Gson();
employeeList.addAll(gson.fromJson(reader, type));
System.out.println("employeeList has " + employeeList.size() + " elements"); // #2
return null;
}
};
readEmployees.run();
System.out.println(readEmployees.getMessage()); // #3
}
}
I see #1 printed to the console, never #2 or 3. How do I know that it processed all through the Task?

How much your app speeds up depends on how large those files are and how many of them there are. Keep in mind that creating threads also consumes resources. With a large number of files and a new thread for each one, your app could even initialize more slowly.
If there are many files, or the number of files can change over time, you can set up a thread pool of a fixed size, e.g. 5 threads, that read files concurrently.
Back to the question of whether it is worth using separate threads for reading files: I'd say yes, but only if your app has initialization work that can be done without knowing the content of those files. Be aware that at some point you will probably need to wait for the parsing results.
As part of solving the problem, you can benchmark how long parsing each file takes; then you'll know which configuration and number of worker threads works best. E.g. you wouldn't create a thread for each file when parsing takes 1 second, but if you have 100 files that each take 1 second to process, you can create a thread pool and divide the work evenly among its threads.
yes
I don't know JavaFX, but in general the concepts of Thread and Task are the same everywhere. A Thread gives you certainty that you are starting a new thread; it is a lower level of abstraction. A Task is a higher-level abstraction: you want to run part of your code separately and asynchronously, but you don't need to know which thread it will run on. In some programming languages a Task is actually backed by a thread pool.
Preloaders are fine, because they show the user that work is being done in the background, so they won't worry that the application has frozen. On the other hand, if you can speed up the initialization process, that would be great. You can combine the two ideas, but remember, no one wants to wait long :)
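To make the thread pool idea concrete, here is a minimal sketch in plain Java. It assumes a hypothetical parse method standing in for the Gson call, and a pool size of 5 taken from the answer above. The point it illustrates is that the caller blocks until every file has been parsed, which avoids reading a collection before it is filled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelLoader {

    // Hypothetical stand-in for parsing one JSON resource (e.g. with Gson).
    static List<String> parse(String resourceName) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            records.add(resourceName + "#" + i);
        }
        return records;
    }

    // Parses all resources on a small fixed pool and blocks until every result is ready.
    static List<List<String>> loadAll(List<String> resourceNames) {
        ExecutorService pool = Executors.newFixedThreadPool(5);
        try {
            List<Callable<List<String>>> jobs = new ArrayList<>();
            for (String name : resourceNames) {
                jobs.add(() -> parse(name));
            }
            List<List<String>> results = new ArrayList<>();
            // invokeAll returns only after every job has completed, in submission order.
            for (Future<List<String>> future : pool.invokeAll(jobs)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while loading", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A parse job failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<String>> data = loadAll(Arrays.asList("employee.json", "department.json"));
        System.out.println(data.get(0).size() + " employees, " + data.get(1).size() + " departments");
    }
}
```

In a JavaFX application, a call like loadAll would itself have to run off the JavaFX Application thread (for example inside a Task), with the results handed to the UI in the task's setOnSucceeded handler; otherwise the blocking wait would freeze the window.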

Related

Is a "Chain of Threads" a bad solution for this Java application?

I'm running a program where I download large files, parse them and then write the data I have extracted from the file into another file.
The files take a long time to download and parse but the write task only takes a minute or so on average. The solution I threw together was to have three fixed thread pools of three threads each.
ExecutorService downloadExecutor = Executors.newFixedThreadPool(3);
ExecutorService parseExecutor = Executors.newFixedThreadPool(3);
ExecutorService writeExecutor = Executors.newFixedThreadPool(3);
A thread in the download pool downloads the file, then submits a new thread to the parser threadpool, with the filename as a parameter. This is done within the thread itself. The download thread then gets to work downloading another file from a list of file URLs.
Once the parser thread has finished parsing the data I want from the file, it then submits a new thread containing the data to the write thread pool, where it is then written to a .csv file.
My question is whether there is a more elegant solution to this. I have not really done much complex threading. Since I have a lot of files to download and parse, I do not want any of the threads to be idle at any time. The idea, again, is that since parsing a file can take a while, I might as well have separate threads devoted to downloading those files first.

Why not use only one thread pool? Download, parse and save must wait for each other anyway, so the best separation of tasks would be to use one thread per file.
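A minimal sketch of the single-pool idea, assuming hypothetical download, parse and write methods that stand in for the real network and file operations. Each submitted task carries one file through all three stages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerFilePipeline {

    // Hypothetical stage implementations; real ones would do network and file I/O.
    static String download(String url) { return "raw:" + url; }
    static String parse(String raw)    { return raw.toUpperCase(); }
    static String write(String parsed) { return "wrote " + parsed; }

    // Runs the whole download -> parse -> write chain for each URL as one pooled task.
    static List<String> processAll(List<String> urls, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String url : urls) {
                Callable<String> job = () -> write(parse(download(url)));
                pending.add(pool.submit(job));
            }
            List<String> done = new ArrayList<>();
            for (Future<String> future : pending) {
                done.add(future.get()); // waits for that file's chain to finish
            }
            return done;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while processing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A file failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(processAll(Arrays.asList("a.dat", "b.dat", "c.dat"), 3));
    }
}
```

With this layout a worker is only idle when there are no files left, which is the property the question asks for, at the cost of not being able to size the download and parse stages separately.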

This is not bad practice; many developers write similar code. But there are some things you need to keep in mind.
Number one: you can't expect performance to increase just because you have more threads. There is an optimal number of threads that depends on the number of CPUs.
Number two: you must decide how exceptions are handled.
Number three: you must make sure you can shut down all the thread pools when you need to stop the application.
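Points two and three can be sketched as follows. The helper names runAndReport and shutdownGracefully are hypothetical; the calls they wrap (Future.get, shutdown, awaitTermination, shutdownNow) are the standard ExecutorService API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class PoolLifecycle {

    // An exception thrown inside a pooled task only surfaces when Future.get() is called.
    static String runAndReport(ExecutorService pool, Callable<String> job) {
        Future<String> future = pool.submit(job);
        try {
            return future.get();
        } catch (ExecutionException e) {
            return "failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    // Stops accepting work, waits for running tasks, and forces a stop on timeout.
    static boolean shutdownGracefully(ExecutorService pool, long timeoutSeconds) {
        pool.shutdown();
        try {
            if (!pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                pool.shutdownNow();
                return false;
            }
            return true;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        System.out.println(runAndReport(pool, () -> "ok"));
        System.out.println(runAndReport(pool, () -> { throw new IllegalStateException("bad file"); }));
        System.out.println("clean shutdown: " + shutdownGracefully(pool, 5));
    }
}
```

With three pools, as in the question, the shutdown helper would be called once per pool, in pipeline order, so that downstream pools can still accept the last submissions.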

So your problem has two aspects:
Compute bound
IO bound
Reading and writing to the file is IO bound. Async IO is the best for IO bound tasks. Java has AsynchronousFileChannel that allows you to read and write files without worrying about thread pools where continuation is achieved through completion handlers.
Complete Example.
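// Assumes `path` is a java.nio.file.Path; open() can throw IOException.
AsynchronousFileChannel ch = AsynchronousFileChannel.open(path);
final ByteBuffer buf = ByteBuffer.allocate(1024);
// Arguments: destination buffer, file position, attachment, completion handler.
ch.read(buf, 0, 0,
    new CompletionHandler<Integer, Integer>() {
        @Override
        public void completed(Integer result, Integer length) {
            // result is the number of bytes read, or -1 at end of file
            ..
        }
        @Override
        public void failed(Throwable exc, Integer length) {
            // exc is the reason the read failed
            ..
        }
    }
);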
And you do the same for writes, you just write to the channel
ch.write(...
Now for parsing the file: that's a compute-bound task, and you should get your CPU cores hot for that. You can assign a thread pool with as many threads as you have cores.
ExecutorService executorService = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
What you need to remember is: you need to test your code, and testing concurrent code is hard. If you can't prove its correctness, don't do it.
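For readers who want something they can run, here is a self-contained sketch of the completion-handler read described above. The class and method names (AsyncRead, readStart, demo) are made up for illustration, and bridging the handler to a CompletableFuture is just one convenient way to hand the result back, not part of the original answer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

class AsyncRead {

    // Reads up to 1024 bytes from the start of the file without blocking the caller.
    static CompletableFuture<String> readStart(Path path) {
        CompletableFuture<String> result = new CompletableFuture<>();
        try {
            AsynchronousFileChannel ch = AsynchronousFileChannel.open(path, StandardOpenOption.READ);
            ByteBuffer buf = ByteBuffer.allocate(1024);
            // The channel itself is passed as the attachment so the handler can close it.
            ch.read(buf, 0L, ch, new CompletionHandler<Integer, AsynchronousFileChannel>() {
                @Override
                public void completed(Integer bytesRead, AsynchronousFileChannel channel) {
                    closeQuietly(channel);
                    buf.flip(); // switch the buffer from writing to reading
                    result.complete(StandardCharsets.UTF_8.decode(buf).toString());
                }

                @Override
                public void failed(Throwable exc, AsynchronousFileChannel channel) {
                    closeQuietly(channel);
                    result.completeExceptionally(exc);
                }
            });
        } catch (IOException e) {
            result.completeExceptionally(e);
        }
        return result;
    }

    static void closeQuietly(AsynchronousFileChannel channel) {
        try {
            channel.close();
        } catch (IOException ignored) {
            // nothing useful to do if close fails in this sketch
        }
    }

    // Writes a small temporary file, reads it back asynchronously, and returns the text.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("async-read", ".txt");
            Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
            String text = readStart(tmp).join();
            Files.delete(tmp);
            return text;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A real file larger than the buffer would need repeated reads at increasing positions; this sketch deliberately reads only the first 1024 bytes.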

java application multi-threading design and optimization

I designed a java application. A friend suggested using multi-threading, he claims that running my application as several threads will decrease the run time significantly.
In my main class, I carry several operations that are out of our scope to fill global static variables and hash maps to be used across the whole life time of the process. Then I run the core of the application on the entries of an array list.
for(int customerID : customers){
ConsumerPrinter consumerPrinter = new ConsumerPrinter();
consumerPrinter.runPE(docsPath,outputPath,customerID);
System.out.println("Customer with CustomerID:"+customerID+" Done");
}
For each iteration of this loop, the XMLs of the given customer are fetched from the machine and parsed, and calculations are performed on the parsed data. Later, the processed results are written to a text file (fetched and written data can reach several gigabytes at most and 50 MB on average). More than one iteration can write to the same file.
Should I make this piece of code multi-threaded so each group of customers are taken in an independent thread?
How can I know the most optimal number of threads to run?
What are the best practices to take into consideration when implementing multi-threading?

Should I make this piece of code multi-threaded so each group of customers are taken in an independent thread?
Yes, multi-threading can save you processing time. While iterating over your list you can spawn a new thread on each iteration and process the customer in it. But you need proper synchronization: if processing two customers requires an operation on the same resource, you must synchronize that operation to avoid race conditions or memory inconsistency issues.
How can I know the most optimal number of threads to run?
You cannot really know without measuring the processing time for n customers with different numbers of threads. It will depend on the number of cores your processor has and on what processing actually takes place for each customer.
What are the best practices to take into consideration when implementing multi-threading?
The first and foremost criterion is that you must have multiple cores and your OS must support multi-threading. Almost every system does nowadays, but it is still worth checking. Secondly, you must analyze all the scenarios that may lead to a race condition. Every resource that you know will be shared among multiple threads must be thread-safe. You must also look out for memory inconsistency issues (for example, declare a shared variable volatile when visibility is the concern; compound updates still need synchronization). Finally, there are some things that you cannot predict or analyze until you actually run test cases, like deadlocks (analyze a thread dump) or memory leaks (analyze a heap dump).
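The advice above could look roughly like this. It is a sketch under assumptions: process is a hypothetical stand-in for runPE, an in-memory list stands in for the shared output file, and a fixed pool is used rather than one thread per customer. Only the write to the shared resource is synchronized.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class CustomerBatch {

    // Hypothetical stand-in for the per-customer work (fetch, parse, calculate).
    static String process(int customerId) {
        return "customer " + customerId + " done";
    }

    // Processes customers in parallel; the shared output is only touched under a lock.
    static List<String> runAll(List<Integer> customers, int threads) {
        List<String> sharedOutput = new ArrayList<>(); // stands in for a shared output file
        Object outputLock = new Object();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int id : customers) {
            pool.execute(() -> {
                String line = process(id);      // independent work, no lock needed
                synchronized (outputLock) {     // serialize only the shared write
                    sharedOutput.add(line);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("Timed out waiting for customers");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        }
        return sharedOutput;
    }

    public static void main(String[] args) {
        // The order of lines can differ between runs because tasks finish independently.
        System.out.println(runAll(Arrays.asList(1, 2, 3, 4), 2));
    }
}
```

Timing runAll with different values of threads on real data is one way to answer the "optimal number of threads" question empirically.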

The idea of multi-threading is to move some heavy process into another, let's say, "block of memory".
Any UI updates have to be done on the main/default thread, like printing messages or inflating a view, for example. You can ask the app to draw a bitmap, download images from the internet, or run a heavy validation/loop block on a separate thread; imagine that you are creating a second, short-lived app to handle those tasks for you.
Remember, you can ask the app to download/draw an image on another thread, but you have to show this image on the screen on the main thread.
This is commonly used to load a large bitmap on a separate thread, do the math to resize this large image and then, on the main thread, inflate/print/paint/show the smaller version of that image to the user.
In your case, I don't know how heavy the runPE() method is or what it does. You could try to create another thread for it, but the rest should stay on the main thread, which is the main process of your UI.
You could optimize your loop by placing the "ConsumerPrinter consumerPrinter = new ConsumerPrinter();" before the "for(...)". Since it does not change dynamically, you can move it out of the loop to avoid creating the same object each time the loop restarts : )

While straight java multi-threading can be used (java.util.concurrent) as other answers have discussed, consider also alternate programming approaches to multi-threading, such as the actor model. The actor model still uses threads underneath, but much complexity is handled by the actor framework rather than directly by you the programmer. In addition, there is less (or no) need to reason about synchronizing on shared state between threads because of the way programs using the actor model are created.
See Which Actor model library/framework for Java? for a discussion of popular actor model libraries.

Test inter-device communication timings in Java

Scenario:
I want to test a communication between 2 devices. They communicate by frames.
I start up the application (on device 1) and I send a number of frames (each frame contains a unique (int) ID). Device 2 receives each frame and sends an acknowledgement (which just echoes the ID), or it doesn't (when the frame got lost).
When device 1 receives the ACK I want to compare the time it took to send and receive the ACK back.
From looking around SO
How do I measure time elapsed in Java?
System.nanoTime() is probably the best way to monitor the elapsed time. However this is all happening in different threads according to the classic producer-consumer pattern where a thread (on device 1) is always reading and another is managing the process (and also writing the frames). Now thank you for bearing with me my question is:
Question: Now for the problem: I need to convey the unique ID from the ACK frame from the reading thread to the managing thread. I've done some research and this seems to be a good candidate for a wait/notify system, or not? Or perhaps I just need a shared array that contains data on each frame sent? But then how does the managing thread know it happened?
Context I want to compare these times because I want to research what factors can hamper communication.

Why don't you just populate a shared map with <unique id, timestamp> pairs? You can expire old entries by periodically removing entries older than a certain amount.
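A minimal sketch of that shared map, assuming a hypothetical FrameTimer class. The managing thread records the send time, the reading thread looks it up when the ACK arrives, and a periodic call expires frames that were never acknowledged. Timestamps are passed in as parameters so that callers can supply System.nanoTime().

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class FrameTimer {

    // frame ID -> send timestamp in nanoseconds; safe to use from both threads
    private final ConcurrentMap<Integer, Long> sentAt = new ConcurrentHashMap<>();

    // Called by the managing/writing thread just before a frame is sent.
    void recordSend(int frameId, long nowNanos) {
        sentAt.put(frameId, nowNanos);
    }

    // Called by the reading thread when an ACK arrives.
    // Returns the round-trip time in nanoseconds, or -1 if the ID is unknown or expired.
    long recordAck(int frameId, long nowNanos) {
        Long start = sentAt.remove(frameId);
        return start == null ? -1L : nowNanos - start;
    }

    // Drops entries older than maxAgeNanos (frames treated as lost); returns how many were removed.
    int expireOlderThan(long maxAgeNanos, long nowNanos) {
        int removed = 0;
        for (Map.Entry<Integer, Long> entry : sentAt.entrySet()) {
            boolean tooOld = nowNanos - entry.getValue() > maxAgeNanos;
            if (tooOld && sentAt.remove(entry.getKey(), entry.getValue())) {
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        FrameTimer timer = new FrameTimer();
        timer.recordSend(42, System.nanoTime());
        long rtt = timer.recordAck(42, System.nanoTime());
        System.out.println("Frame 42 round trip: " + rtt + " ns");
    }
}
```

Because remove is atomic on a ConcurrentHashMap, a late ACK for an already expired frame simply yields -1 instead of a bogus time.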

I suggest you reformulate your problem with tasks (Callable). Create a task for the writer and one for the reader role. Submit these in pairs in an ExecutorService and let the Java concurrency framework handle the concurrency for you. You only have to think about what will be the result of a task and how would you want to use it.
// Pseudo code
ExecutorService EXC = Executors.newCachedThreadPool();
Future<List<Timestamp>> readerFuture = EXC.submit(new ReaderRole(sentFrameNum));
Future<List<Timestamp>> writerFuture = EXC.submit(new WriterRole(sentFrameNum));
List<Timestamp> writeResult = writerFuture.get(); // wait for the completion of writing
List<Timestamp> readResult = readerFuture.get(); // wait for the completion of reading
This is pretty complex stuff, but much cleaner and more stable than a custom-developed synchronization solution.
Here is a pretty good tutorial for the Java concurrency framework: http://www.vogella.com/articles/JavaConcurrency/article.html#threadpools

Trouble understanding Java threads

I learned about multiprocessing from Python and I'm having a bit of trouble understanding Java's approach. In Python, I can say I want a pool of 4 processes and then send a bunch of work to my program and it'll work on 4 items at a time. I realized, with Java, I need to use threads to achieve this same task and it seems to be working really really well so far.
But.. unlike in Python, my cpu(s) aren't getting 100% utilization (they are about 70-80%) and I suspect it's the way I'm creating threads (code is the same between Python/Java and processes are independent). In Java, I'm not sure how to create one thread so I create a thread for every item in a list I want to process, like this:
for (int i = 0; i < 500; i++) {
Runnable task = new MyRunnable(10000000L + i);
Thread worker = new Thread(task);
// We can set the name of the thread
worker.setName(String.valueOf(i));
// Start the thread; never call run() directly
worker.start();
// Remember the thread for later usage
threads.add(worker);
}
I took it from here. My question is: is this the correct way to launch threads, or is there a way to have Java itself manage the number of threads so it's optimal? I want my code to run as fast as possible and I'm trying to understand how to detect and resolve any issues that may be arising from too many threads being created.
This is not a major issue, just curious to how it works under the Java hood.

You use an Executor, the implementation of which handles a pool of threads, decides how many, and so forth. See the Java tutorial for lots of examples.
In general, bare threads aren’t used in Java except for very simple things. Instead, there will be some higher-level API that receives your Runnable or Task and knows what to do.

Take a look at the Java Executor API. See this article, for example.
Although creating Threads is much 'cheaper' than it used to be, creating large numbers of threads (one per runnable as in your example) isn't the way to go - there's still an overhead in creating them, and you'll end up with too much context switching.
The Executor API allows you to create various types of thread pool for executing Runnable tasks, so you can reuse threads, flexibly manage the number that are created, and avoid the overhead of thread-per-runnable.
The Java threading model and the Python threading model (not multiprocessing) are really quite similar, incidentally. Java has no Global Interpreter Lock as Python does, so there's usually less need to fork off multiple processes.
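As a rough sketch of what the Executor approach looks like for the loop in the question: the 500 jobs are submitted to a pool sized to the number of cores instead of getting a thread each. The sumTo method is a hypothetical CPU-bound job standing in for MyRunnable, and the pool size is an assumption worth benchmarking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PooledWork {

    // Hypothetical CPU-bound job: sums the numbers 0 .. n-1.
    static long sumTo(long n) {
        long sum = 0;
        for (long i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // Submits `jobs` tasks to a pool with one thread per core and adds up their results.
    static long runAll(int jobs) {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < jobs; i++) {
                final long n = 1000L + i;
                Callable<Long> job = () -> sumTo(n);
                futures.add(pool.submit(job)); // queued; the pool's threads are reused
            }
            long total = 0;
            for (Future<Long> future : futures) {
                total += future.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A job failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("Total: " + runAll(500));
    }
}
```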

Thread is a "low level" API.
Depending on what you want to do and the version of Java you use, there are better solutions.
If you use Java 7 or later, and if your task allows it, you can use the fork/join framework: http://docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html
However, take a look at the java concurrency tutorial : http://docs.oracle.com/javase/tutorial/essential/concurrency/executors.html
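For a feel of what fork/join code looks like, here is a small sketch that sums an array by splitting it recursively. The SumTask class and the threshold of 1,000 elements are illustrative choices, not recommendations; fork/join only pays off when the work can be divided like this.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumTask extends RecursiveTask<Long> {

    private static final int THRESHOLD = 1_000; // below this size, sum directly

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                        // run the left half asynchronously
        long rightSum = right.compute();    // compute the right half in this thread
        return left.join() + rightSum;      // wait for the left half and combine
    }

    // Sums the array on a fork/join pool sized to the available processors.
    static long sum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println("Sum: " + sum(data));
    }
}
```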

Exchange data in real time over AJAX with multiple threads

I am developing an application in JSF 2.0 and I would like to have a multiline textbox which displays output data which is being read (line by line) from a file in real time.
So the goal is to have a page with a button on it that triggers the backend to start reading from the file and then displaying the results as it's reading in the textbox.
I had thought about doing this in the following way:
Have the local page keep track of what lines it has retrieved/displayed in the textbox so far.
Periodically the local page will poll the backend using AJAX and request any new data that has been read (tell it what lines the page has so far and only retrieve the new lines since then).
This will continue until the entire file has been completely retrieved.
The issue is that the bean method that reads from the file is running a while loop that blocks. So to read from the data structure it is writing to at the same time will require using additional Threads, correct? I hear that spawning new Threads in a web application is a potentially dangerous move and that Thread pools should be used, etc.
Can anyone shed some insight on this?
Update: I tried a couple of different things with no luck. But I did manage to get it working by spawning a separate Thread to run my blocking loop while the main thread could be used to read from it whenever an AJAX request is processed. Is there a good library I could use to do something similar to this that still gives JSF some lifecycle control over this Thread?

Have you considered using the Future interface (part of the concurrency API since Java 5)? Basically, as you read in the file, you could split it into sections and simply create a new Future object for each section. Then you can have the object return once the computation has completed.
This way you prevent having to access the structure while it is still being manipulated by the loop and you also split the operations into smaller computations reducing the amount of time locking occurs (total lock time might be greater but you get faster response to other areas). If you maintain the order in which your Future objects were created then you don't need to track line #'s. Note that calling Future.get() does block until the object is 'ready'.
The rest of your approach would be similar: make the Ajax call to get the content of all 'ready' Future objects from a FIFO queue.
I think I understand what you're trying to accomplish.. maybe a bit more info would help.
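One way to read this suggestion is sketched below, assuming a hypothetical ChunkFeed class that the backing bean would hold. Each section of the file becomes a Future in a FIFO queue, and each AJAX poll drains only the futures at the head of the queue that are already done, so line order is preserved and the poll never blocks.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class ChunkFeed {

    // Chunks in the order they were started; guarded by this object's monitor.
    private final Queue<Future<List<String>>> pending = new ArrayDeque<>();

    // Backend side: register one section of the file that is being read.
    synchronized void enqueue(Future<List<String>> chunk) {
        pending.add(chunk);
    }

    // Poll side: returns the lines of every finished chunk at the head of the queue.
    // Stops at the first chunk that is still running so lines stay in order.
    synchronized List<String> pollReady() {
        List<String> lines = new ArrayList<>();
        while (!pending.isEmpty() && pending.peek().isDone()) {
            try {
                lines.addAll(pending.remove().get()); // does not block: the chunk is done
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            } catch (ExecutionException e) {
                lines.add("[chunk failed: " + e.getCause().getMessage() + "]");
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        ChunkFeed feed = new ChunkFeed();
        Callable<List<String>> chunk = () -> Arrays.asList("line 1", "line 2");
        feed.enqueue(pool.submit(chunk));
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(feed.pollReady());
    }
}
```

In a web application the executor would normally come from the container (a managed thread pool) rather than being created by hand as in this demo.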
