Comparing Fork And Join with Single threaded program - java

I am trying to get started with the Fork-Join framework on a small task. As a starter example, I tried copying MP3 files:
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
public class DeepFileCopier extends RecursiveTask<String>{
/**
*
*/
private static final long serialVersionUID = 1L;
private static Path startingDir = Paths.get("D:\\larsen\\Music\\");
private static List<Path> listOfPaths = new ArrayList<>();
private int start, end;
public static void main(String[] args) throws IOException
{
long startMillis = System.currentTimeMillis();
Files.walkFileTree(startingDir, new CustomFileVisitor());
final DeepFileCopier deepFileCopier = new DeepFileCopier(0,listOfPaths.size());
final ForkJoinPool pool = new ForkJoinPool(Runtime.getRuntime().availableProcessors());
pool.invoke(deepFileCopier);
System.out.println("With Fork-Join " + (System.currentTimeMillis() - startMillis));
long secondStartMillis = System.currentTimeMillis();
deepFileCopier.start = 0;
deepFileCopier.end = listOfPaths.size();
deepFileCopier.computeDirectly();
System.out.println("Without Fork-Join " + (System.currentTimeMillis() - secondStartMillis));
}
private static class CustomFileVisitor extends SimpleFileVisitor<Path> {
@Override
public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException
{
if (file.toString().endsWith(".mp3")) {
listOfPaths.add(file);
}
return FileVisitResult.CONTINUE;
}
}
@Override
protected String compute() {
int length = end-start;
if(length < 4) {
return computeDirectly();
}
int split = length / 2;
final DeepFileCopier firstHalfCopier = new DeepFileCopier(start, start + split);
firstHalfCopier.fork();
final DeepFileCopier secondHalfCopier = new DeepFileCopier(start + split, end);
secondHalfCopier.compute();
firstHalfCopier.join();
return null;
}
private String computeDirectly() {
for(int index = start; index< end; index++) {
Path currentFile = listOfPaths.get(index);
System.out.println("Copying :: " + currentFile.getFileName());
Path targetDir = Paths.get("D:\\Fork-Join Test\\" + currentFile.getFileName());
try {
Files.copy(currentFile, targetDir, StandardCopyOption.REPLACE_EXISTING);
} catch (IOException e) {
e.printStackTrace();
}
}
return null;
}
private DeepFileCopier(int start, int end ) {
this.start = start;
this.end = end;
}
}
Comparing the performance, I noticed (times in milliseconds):
With Fork-Join 149714
Without Fork-Join 146590
I am working on a dual-core machine. I was expecting roughly a 50% reduction in run time, but the Fork-Join version takes about 3 seconds longer than the single-threaded approach. Please let me know if something is incorrect.

Your problem is not well suited to benefit from multithreading on typical systems. Almost all of the execution time is spent copying the files, and that is limited by your hard drive, which processes the files one after another.
If you run a more CPU-intensive task, you should see a difference. For test purposes, you could try the following:
private String computeDirectly() {
Integer nonsense = 0; // a local variable must be initialized before += can use it
for(int index = start; index< end; index++) {
for( int j = 0; j < 1000000; j++ )
nonsense += index*j;
}
return nonsense.toString();
}
On my system (i5-2410M) this will print:
With Fork-Join 2628
Without Fork-Join 6421
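The copier's compute() forks and joins, but it returns null and works through a shared static list. For comparison, here is a minimal sketch of a CPU-bound RecursiveTask that returns and combines the results of its subtasks, in the spirit of the busy-loop test above. `BusySum` and its `work` function are hypothetical stand-ins for real computation, not part of the original code.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class BusySum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 4; // same cut-off as the copier's "length < 4"
    private final int start, end;

    public BusySum(int start, int end) {
        this.start = start;
        this.end = end;
    }

    /** Deliberately CPU-heavy work for one index (hypothetical placeholder workload). */
    public static long work(int index) {
        long acc = 0;
        for (int j = 0; j < 1_000_000; j++) {
            acc += (long) index * j % 7;
        }
        return acc;
    }

    /** Single-threaded reference: processes [start, end) in order. */
    public static long sequential(int start, int end) {
        long total = 0;
        for (int i = start; i < end; i++) {
            total += work(i);
        }
        return total;
    }

    @Override
    protected Long compute() {
        if (end - start < THRESHOLD) {
            return sequential(start, end);
        }
        int mid = start + (end - start) / 2;
        BusySum left = new BusySum(start, mid);
        left.fork();                                  // queue the left half for another worker
        long right = new BusySum(mid, end).compute(); // do the right half in this thread
        return right + left.join();                   // wait for the left half and combine
    }

    public static void main(String[] args) {
        int n = 200;
        long t0 = System.nanoTime();
        long parallel = ForkJoinPool.commonPool().invoke(new BusySum(0, n));
        long t1 = System.nanoTime();
        long seq = sequential(0, n);
        long t2 = System.nanoTime();
        System.out.println("With Fork-Join    " + (t1 - t0) / 1_000_000 + " ms, result " + parallel);
        System.out.println("Without Fork-Join " + (t2 - t1) / 1_000_000 + " ms, result " + seq);
    }
}
```

The first timed call also pays for JIT compilation, so single readings like these are only indicative; a benchmark harness such as JMH gives more trustworthy numbers.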

Related

Parallel Processing using Multi Threading in Java

I am creating a web app in which I have to upload files by splitting them using parallel processing and multithreading, and, when downloading, combine them back into a single file using multithreading and parallel processing.
Combining the split parts into a single file is not working as I expected.
The number of threads created equals the number of parts the file has been split into.
The threads should run in parallel and each should run only once, but they are being called several times. Please help me fix the code.
UploadServlet.java
import java.util.Arrays;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import org.apache.commons.io.IOUtils;
import jakarta.servlet.*;
import jakarta.servlet.annotation.MultipartConfig;
import jakarta.servlet.http.HttpServlet;
import jakarta.servlet.http.HttpServletResponse;
import jakarta.servlet.http.Part;
import jakarta.servlet.http.HttpServletRequest;
import java.io.*;
public class UploadServlet extends HttpServlet
{
private static final long serialVersionUID = 100L;
public static String fileName;
public static long size;
public static int noOfParts;
public static String type;
public static byte[] b;
private static final String INSERT_USERS_SQL = "INSERT INTO uploadlist" +
" (filename, filesize, noofparts) VALUES " +
"(?, ?, ?);";
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
Part file = request.getPart("file");
fileName=file.getSubmittedFileName();
type=file.getContentType();
PrintWriter writer=response.getWriter();
file.write(fileName);
String n = request.getParameter("parts");
size = file.getSize();
Integer temp1 = Integer.parseInt(n);
noOfParts = temp1.intValue();
set();
writer.println("File Uploaded Successfully");
file.delete();
}
public static void set()
{
Split.split(fileName,size,noOfParts);
try {
Connection c = DataBaseConnection.getConnection();
PreparedStatement preparedStatement = c.prepareStatement(INSERT_USERS_SQL);
preparedStatement.setString(1, fileName);
preparedStatement.setLong(2, size);
preparedStatement.setInt(3, noOfParts);
System.out.println(preparedStatement);
preparedStatement.executeUpdate();
} catch (Exception e)
{
e.printStackTrace();
}
}
}
From UploadServlet, Split.split() is called to split the file into the given number of parts.
Split.java
import java.io.*;
import java.util.Arrays;
public class Split implements Runnable
{
int i;
long size;
int noOfParts;
String fileName;
Split()
{
fileName="";
}
Split(String fileName, int i, long size, int noOfParts)
{
this.fileName=fileName;
this.i=i;
this.size=size;
this.noOfParts=noOfParts;
}
public void run()
{
try
{
System.out.println(i);
RandomAccessFile in = new RandomAccessFile("D:\\temp\\"+fileName,"r");
int bytesPerSplit = (int)(size/noOfParts);
int remainingBytes = (int)(size % noOfParts);
byte[] b;
if(i!=noOfParts-1)
{
b = new byte[bytesPerSplit];
}
else
{
b = new byte[bytesPerSplit+remainingBytes];
}
in.seek((long)i*bytesPerSplit);
in.readFully(b); // read() may return before filling the whole array
in.close();
BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream("D:\\Upload\\"+fileName+i+".bin"));
out.write(b);
out.close();
}
catch(IOException e)
{
e.printStackTrace();
}
}
public static void split(String fileName, long size, int noOfParts)
{
for(int i=0; i<noOfParts; i++)
{
Split obj = new Split(fileName,i,size,noOfParts);
Thread t = new Thread(obj);
t.start();
}
}
}
In this program, I split the file according to the number of parts, and I want to combine the parts back using parallel processing and multithreading.
DownloadServlet.java
import jakarta.servlet.http.HttpServlet;
import org.postgresql.Driver;
import java.sql.Statement;
import java.io.*;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.DriverManager;
import java.util.Arrays;
import jakarta.servlet.ServletContext;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
public class DownloadServlet extends HttpServlet
{
private static final long serialVersionUID = 1L;
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException
{
String name = new String(request.getParameter("fileName"));
int noOfParts = Integer.parseInt(request.getParameter("parts"));
int size = Integer.parseInt(request.getParameter("size"));
File downloadFile = new File("D:\\Download\\"+name);
Combine.combine(name,noOfParts,size);
int length = (int)downloadFile.length();
String completeFile=name;
ServletContext context=getServletContext();
String mimeType = context.getMimeType(completeFile);
if (mimeType == null)
{
mimeType = "application/octet-stream";
}
response.setContentType(mimeType);
response.setContentLength((int)length);
String headerKey = "Content-Disposition";
String headerValue = String.format("attachment; filename=\"%s\"", completeFile);
response.setHeader(headerKey, headerValue);
OutputStream outStream = response.getOutputStream();
DataInputStream in = new DataInputStream(new FileInputStream(downloadFile));
byte[] buffer = new byte[8192]; // fixed-size buffer: with a zero-length buffer, read() returns 0 and the loop never ends
int bytesRead;
while ((bytesRead = in.read(buffer)) != -1)
{
outStream.write(buffer, 0, bytesRead);
}
in.close();
outStream.flush();
outStream.close();
}
}
From DownloadServlet, Combine.combine() is called to combine the split parts into a single file.
Combine.java
import java.io.*;
import java.util.concurrent.TimeUnit;
import java.util.*;
import org.apache.commons.lang3.StringUtils;
import java.util.regex.*;
import java.util.Scanner;
import java.util.Arrays;
public class Combine implements Runnable
{
String name;
int size;
int noOfParts;
int i;
public static String root = "D:\\Upload\\";
Combine(String name,int noOfParts,int size, int i)
{
this.name = name;
this.noOfParts=noOfParts;
this.size=size;
this.i=i;
}
public void run()
{
try
{
System.out.println(i);
RandomAccessFile out = new RandomAccessFile("D:\\Download\\"+name,"rw");
int bytesPerSplit = size/noOfParts;
int remainingBytes = size%noOfParts;
String temp=name+i+".bin";
RandomAccessFile file = new RandomAccessFile(root+temp,"r");
long l=file.length();
byte[] b = new byte[(int)l];
file.readFully(b); // read() may return before filling the whole array
out.seek(i*bytesPerSplit);
out.write(b);
file.close();
out.close();
}
catch(IOException e)
{
e.printStackTrace();
}
}
public static void combine(String name, int noOfParts, int size)
{
for(int i=0; i<noOfParts; i++)
{
Combine obj = new Combine(name,noOfParts,size,i);
Thread t = new Thread(obj,"Thread"+i);
t.start();
}
}
}
I have attached an image in which the numbers represent the part of the file being read and combined by each thread.
The output shows that the threads keep executing again and again.
I don't know where the error or logical mistake in my program is.
Please help me solve this problem.
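Whatever causes the repeated output, one definite issue is that Split.split() and Combine.combine() call Thread.start() and return immediately without waiting. doPost therefore reaches file.delete() while split threads may still be reading, and doGet reads downloadFile.length() and streams the file before the combine threads have finished writing it. A minimal sketch that waits for every part, assuming an ExecutorService with invokeAll() in place of raw threads; `ParallelParts`, `runAll`, `partPath` and the Path-based signatures are hypothetical, not the question's classes:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelParts {

    /** Runs every task on a pool and returns only after all have finished; rethrows the first failure. */
    static void runAll(List<Callable<Void>> tasks) throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        try {
            for (Future<Void> f : pool.invokeAll(tasks)) { // invokeAll blocks until every task is done
                f.get();                                    // surfaces an exception thrown by a task
            }
        } catch (ExecutionException e) {
            throw new IOException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    static long partStart(long size, int parts, int i) {
        return i * (size / parts);
    }

    static long partLength(long size, int parts, int i) {
        long base = size / parts;
        return i == parts - 1 ? base + size % parts : base; // the last part takes the remainder
    }

    static Path partPath(Path dir, String name, int i) {
        return dir.resolve(name + i + ".bin");
    }

    public static void split(Path source, Path dir, int parts) throws IOException, InterruptedException {
        long size = Files.size(source);
        String name = source.getFileName().toString();
        List<Callable<Void>> tasks = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            final int part = i;
            tasks.add(() -> {
                byte[] b = new byte[(int) partLength(size, parts, part)];
                try (RandomAccessFile in = new RandomAccessFile(source.toFile(), "r")) {
                    in.seek(partStart(size, parts, part));
                    in.readFully(b); // plain read() may return fewer bytes than requested
                }
                Files.write(partPath(dir, name, part), b);
                return null;
            });
        }
        runAll(tasks); // returns only when every part file has been written
    }

    public static void combine(Path dir, String name, int parts, long size, Path target)
            throws IOException, InterruptedException {
        try (RandomAccessFile out = new RandomAccessFile(target.toFile(), "rw")) {
            out.setLength(size); // create and size the target once, before the parallel writes
        }
        List<Callable<Void>> tasks = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            final int part = i;
            tasks.add(() -> {
                byte[] b = Files.readAllBytes(partPath(dir, name, part));
                try (RandomAccessFile out = new RandomAccessFile(target.toFile(), "rw")) {
                    out.seek(partStart(size, parts, part)); // each task writes only its own region
                    out.write(b);
                }
                return null;
            });
        }
        runAll(tasks); // returns only when the whole target file is written
    }
}
```

Because both methods block until all parts are done, the servlet can safely delete the upload after split() returns, and read the combined file's length after combine() returns.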

How to make multithreaded code with 2 loops

I am trying to make my code faster with multithreading.
I have 2 big arrays (in the code below I use small arrays as an example).
This is the code I tried, but it gets stuck and does nothing:
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
private ArrayList<String> blackList = new ArrayList<String>();
String[] cities = {
"London",
"Paris",
"Barcelona"
};
String[] cats = {
"Animals",
"Jobs",
"Env",
"B"
};
public static void main(String[] args) {
Main m = new Main();
m.run();
}
public void run() {
ExecutorService svc = Executors.newCachedThreadPool();
int chunks = Runtime.getRuntime().availableProcessors();
long iterationsCities = cities.length;
long iterationsCats = cats.length;
for (int i = 0; i < chunks; ++i) {
int startCities = (int) (iterationsCities / chunks * i);
int endCities = (int) (iterationsCities / chunks * (i + 1));
int startCats = (int) (iterationsCats / chunks * i);
int endCats = (int) (iterationsCats / chunks * (i + 1));
svc.execute(new Task(startCities, endCities, startCats, endCats));
}
}
public class Task implements Runnable {
int startCities;
int endCities;
int startCats;
int endCats;
public Task(int startCities, int endCities, int startCats, int endCats) {
this.startCities = startCities;
this.endCities = endCities;
this.startCats = startCats;
this.endCats = endCats;
}
public void launch() throws IOException, InterruptedException {
for(int i=startCities; i<endCities; i++)
{
for(int j=startCats; j<endCats; j++)
{
String link = "https://.../pro/search/" + cats[j] + "/" + cities[i];
//System.out.println(link);
HttpClient client = HttpClient.newHttpClient();
HttpRequest request = HttpRequest.newBuilder()
.uri(URI.create(link))
.GET()
.build();
HttpResponse<String> response = client.send(request,
HttpResponse.BodyHandlers.ofString());
//System.out.println(link);
String html = response.body();
Pattern pattern = Pattern.compile("data-count=\"([A-Za-z0-9_]*)\"");
Matcher matcher = pattern.matcher(html);
boolean matchFound = matcher.find();
if(matchFound)
{
int dataCount = Integer.parseInt(matcher.group(1));
if(dataCount == 0)
{
String l = cats[j] + "/" + cities[i];
if(!blackList.contains(l)) {
blackList.add(l);
}
}
}
}
}
}
@Override
public void run() {
try {
launch();
} catch (IOException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
}
My code makes an HTTP request, reads the data-count value from the HTML, and checks whether it equals 0; if so, it saves the category/city path in the blacklist.
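In run(), `iterationsCities / chunks * i` is integer division, so with 3 cities and more than 3 processors every range is [0, 0) and no request is ever made. When the ranges are non-empty, pairing chunk i of the cities with chunk i of the categories still visits only some combinations. The cached thread pool is also never shut down (its idle threads keep the JVM alive for about a minute), and `blackList` is a plain ArrayList written from several threads. A minimal sketch that submits one task per (category, city) pair instead; `BlackListScan`, `scan` and the `hasZeroCount` predicate are hypothetical names, and the predicate stands in for the HttpClient request plus data-count regex from the question:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

public class BlackListScan {

    /** Checks every (category, city) pair on a fixed pool and returns the pairs for which hasZeroCount is true. */
    public static Set<String> scan(String[] cities, String[] cats,
                                   Predicate<String> hasZeroCount, int threads) throws InterruptedException {
        Set<String> blackList = ConcurrentHashMap.newKeySet(); // thread-safe set, no duplicates
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String city : cities) {
            for (String cat : cats) {
                String path = cat + "/" + city;
                pool.execute(() -> {
                    if (hasZeroCount.test(path)) {
                        blackList.add(path);
                    }
                });
            }
        }
        pool.shutdown(); // accept no new tasks; already submitted ones still run
        if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
            throw new IllegalStateException("scan did not finish in time");
        }
        return blackList;
    }

    public static void main(String[] args) throws InterruptedException {
        String[] cities = {"London", "Paris", "Barcelona"};
        String[] cats = {"Animals", "Jobs", "Env", "B"};
        // Hypothetical stand-in for the real HTTP check
        Set<String> result = scan(cities, cats, path -> path.startsWith("B/"),
                Runtime.getRuntime().availableProcessors());
        System.out.println(result);
    }
}
```

Since a Predicate cannot throw checked exceptions, the real request code would need to catch IOException and InterruptedException inside the lambda.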

Pagination when getting files

I have a location where 3000 files are stored, but I want to get a list of 1000 files at a time: the next call should return the next 1000 files, and so on.
Please find my code below:
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
public class FileSystem {
public static void main(String args[]) throws Exception {
FileSystem.createListFile();
FileSystem.getFileInBatch();
}
private static void getFileInBatch() {
int MAX_INDEX= 1000;
try (Stream<Path> walk = Files.walk(Paths.get("C://FileTest"))) {
List<String> result = walk.filter(p -> Files.isRegularFile(p) && p.getFileName().toString().endsWith(".txt"))
.sorted(Comparator.comparingInt(FileSystem::pathToInt))
.map(x -> x.toString()).limit(MAX_INDEX).collect(Collectors.toList());
result.forEach(System.out::println);
System.out.println(result.size());
} catch (IOException e) {
e.printStackTrace();
}
}
private static int pathToInt(final Path path) {
return Integer.parseInt(path.getFileName()
.toString()
.replaceAll("Aamir(\\d+).txt", "$1")
);
}
private static void createListFile() throws IOException {
for (int i = 0; i < 3000; i++) {
File file = new File("C://FileTest/Aamir" + i + ".txt");
if (file.createNewFile()) {
System.out.println(file.getName() + " is created!");
}
}
}
}
I am able to get the first 1000 files (Aamir0.txt to Aamir999.txt) using limit() on the stream.
Now, how can I get the next 1000 files (Aamir1000.txt to Aamir1999.txt)?
You can use skip() on your stream, before limit(). For example:
int toSkip = 1000; // define as method param/etc.
List<String> result = walk.filter(p -> Files.isRegularFile(p) && p.getFileName().toString().endsWith(".txt"))
.sorted(Comparator.comparingInt(FileSystem::pathToInt))
.map(x -> x.toString()).skip(toSkip).limit(MAX_INDEX).collect(Collectors.toList());
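A minimal sketch wrapping skip() and limit() in a page-based method: with pageSize 1000, page 0 returns Aamir0.txt to Aamir999.txt, page 1 returns Aamir1000.txt to Aamir1999.txt, and so on. `FilePager` and `getPage` are hypothetical names; unlike the question's code it returns file names rather than full paths, and it re-walks and re-sorts the directory on every call, which is fine for a few thousand files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FilePager {

    /** Extracts the number from names like "Aamir123.txt", as in the question. */
    private static int pathToInt(Path path) {
        return Integer.parseInt(path.getFileName().toString().replaceAll("Aamir(\\d+)\\.txt", "$1"));
    }

    /** Returns page number `page` (0-based) of at most pageSize .txt files, ordered by numeric suffix. */
    public static List<String> getPage(Path dir, int page, int pageSize) throws IOException {
        try (Stream<Path> walk = Files.walk(dir)) {
            return walk.filter(p -> Files.isRegularFile(p) && p.getFileName().toString().endsWith(".txt"))
                    .sorted(Comparator.comparingInt(FilePager::pathToInt))
                    .skip((long) page * pageSize) // skip all earlier pages first...
                    .limit(pageSize)              // ...then take one page
                    .map(p -> p.getFileName().toString())
                    .collect(Collectors.toList());
        }
    }
}
```

For the question's folder, `FilePager.getPage(Paths.get("C://FileTest"), 1, 1000)` would return the second batch; a page past the end returns an empty list.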

Streams performance difference

Performance difference between two executions of the same stream
I am running the same operation once as a default parallel stream and once inside a custom ForkJoinPool.
I see a huge performance difference for the same operation:
94 ms vs ~5341 ms (Time1 and Time2 are almost the same, so I don't think awaitQuiescence is to blame here).
What could be the reason? Some tricky Java intrinsic?
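package com.stream;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinWorkerThread;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public final class SharedForkJoinExecutor {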
private static final Logger LOGGER = LoggerFactory.getLogger(SharedForkJoinExecutor.class);
private static final ForkJoinPool EXEC = new ForkJoinPool(ForkJoinPool.commonPool().getParallelism(),
pool -> {
final ForkJoinWorkerThread aThread = ForkJoinPool.defaultForkJoinWorkerThreadFactory.newThread(pool);
aThread.setName("ForkJoin-Executor-" + aThread.getPoolIndex());
return aThread;
},
(t, e) -> LOGGER.info(e.getMessage(), e),
true);
/**
* Shuts down this executor
*/
public static void shutdown() {
EXEC.shutdown();
}
public static ForkJoinPool get() {
return EXEC;
}
}
package com.stream;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;
import com.stream.SharedForkJoinExecutor;
import org.junit.Test;
import static junit.framework.TestCase.assertEquals;
public class ForkJoinTest {
private static final int INT_NUMBERS = 1_000_000;
@Test
public void forEachIntTest() {
final AtomicInteger aEvenCounter = new AtomicInteger(0);
final AtomicInteger aAllCounter = new AtomicInteger(0);
long t = System.currentTimeMillis();
IntStream.range(0, INT_NUMBERS).parallel().forEach(theIndex -> {
if (theIndex % 2 == 0) {
aEvenCounter.incrementAndGet();
}
aAllCounter.incrementAndGet();
});
System.out.println("Time=" + (System.currentTimeMillis() - t));
assertEquals(INT_NUMBERS / 2, aEvenCounter.get());
assertEquals(INT_NUMBERS, aAllCounter.get());
aEvenCounter.set(0);
aAllCounter.set(0);
t = System.currentTimeMillis();
SharedForkJoinExecutor.get().execute(() -> IntStream.range(0, INT_NUMBERS).parallel().forEach(theIndex -> {
if (theIndex % 2 == 0) {
aEvenCounter.incrementAndGet();
}
aAllCounter.incrementAndGet();
}));
System.out.println("Time1=" + (System.currentTimeMillis() - t));
SharedForkJoinExecutor.get().awaitQuiescence(10, TimeUnit.HOURS);
System.out.println("Time2=" + (System.currentTimeMillis() - t));
assertEquals(INT_NUMBERS / 2, aEvenCounter.get());
assertEquals(INT_NUMBERS, aAllCounter.get());
}
}
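ForkJoinPool.execute() returns as soon as the task is queued, so the test's second variant depends on awaitQuiescence() to know when the work is done. A minimal sketch that times both variants the same way instead: it blocks on submit(...).join(), uses System.nanoTime(), and repeats a few rounds because the first round also pays for class loading and JIT compilation. `PoolComparison` is a hypothetical name, and the custom pool uses the plain ForkJoinPool(int) constructor rather than the question's thread factory and asyncMode = true, so adding those back is one way to check whether they contribute to the gap. In current OpenJDK versions a parallel stream started from inside a ForkJoinPool task runs on that pool's workers; this is implementation behavior, not a documented guarantee.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class PoolComparison {

    /** Counts the even indices and all indices in [0, n) with a parallel stream, like the test above. */
    public static int[] count(int n) {
        AtomicInteger even = new AtomicInteger();
        AtomicInteger all = new AtomicInteger();
        IntStream.range(0, n).parallel().forEach(i -> {
            if (i % 2 == 0) {
                even.incrementAndGet();
            }
            all.incrementAndGet();
        });
        return new int[] {even.get(), all.get()};
    }

    /** Runs count(n) as a task inside the given pool and blocks until it has completed. */
    public static int[] countInPool(ForkJoinPool pool, int n) {
        return pool.submit(() -> count(n)).join();
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        ForkJoinPool custom = new ForkJoinPool(ForkJoinPool.commonPool().getParallelism());
        try {
            for (int round = 0; round < 3; round++) { // later rounds run JIT-compiled code
                long t0 = System.nanoTime();
                count(n);
                long t1 = System.nanoTime();
                countInPool(custom, n);
                long t2 = System.nanoTime();
                System.out.printf("round %d: common pool %d ms, custom pool %d ms%n",
                        round, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
            }
        } finally {
            custom.shutdown();
        }
    }
}
```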

Multithreaded test to test response time of sites/web services

The code below tests the response time of reading www.google.com into a BufferedReader. I plan to use this code to test the response times of other sites and web services within the intranet. The test runs for 20 seconds and opens 4 requests per second:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.Map.Entry;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.junit.Test;
public class ResponseTimeTest {
private static final int NUMBER_REQUESTS_PER_SECOND = 4;
private static final int TEST_EXECUTION_TIME = 20000;
private static final ConcurrentHashMap<Long, Long> timingMap = new ConcurrentHashMap<Long, Long>();
@Test
public void testResponseTime() throws InterruptedException {
ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(10);
scheduler.scheduleAtFixedRate(new RequestThreadCreator(), 0, 1, TimeUnit.SECONDS);
Thread.sleep(TEST_EXECUTION_TIME);
System.out.println("Start Time, End Time, Total Time");
for (Entry<Long, Long> entry : timingMap.entrySet())
{
System.out.println(entry.getKey() + "," + entry.getValue() +","+(entry.getValue() - entry.getKey()));
}
}
private final class RequestThreadCreator implements Runnable {
public void run() {
ExecutorService es = Executors.newCachedThreadPool();
for (int i = 1; i <= NUMBER_REQUESTS_PER_SECOND; i++) {
es.execute(new RequestThread());
}
es.shutdown();
}
}
private final class RequestThread implements Runnable {
public void run() {
long startTime = System.currentTimeMillis();
try {
URL oracle = new URL("http://www.google.com/");
URLConnection yc = oracle.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
while ((in.readLine()) != null) {
}
in.close();
} catch (Exception e) {
e.printStackTrace();
}
long endTime = System.currentTimeMillis();
timingMap.put(startTime, endTime);
}
}
}
The output is :
Start Time, End Time, Total Time
1417692221531,1417692221956,425
1417692213530,1417692213869,339
1417692224530,1417692224983,453
1417692210534,1417692210899,365
1417692214530,1417692214957,427
1417692220530,1417692221041,511
1417692209530,1417692209949,419
1417692215532,1417692215950,418
1417692214533,1417692215075,542
1417692213531,1417692213897,366
1417692212530,1417692212924,394
1417692219530,1417692219897,367
1417692226532,1417692226876,344
1417692211530,1417692211955,425
1417692209529,1417692209987,458
1417692222531,1417692222967,436
1417692215533,1417692215904,371
1417692219531,1417692219954,423
1417692215530,1417692215870,340
1417692217531,1417692218035,504
1417692207547,1417692207882,335
1417692208535,1417692208898,363
1417692207544,1417692208095,551
1417692208537,1417692208958,421
1417692226533,1417692226899,366
1417692224531,1417692224951,420
1417692225529,1417692225957,428
1417692216530,1417692216963,433
1417692223541,1417692223884,343
1417692223546,1417692223959,413
1417692222530,1417692222954,424
1417692208532,1417692208871,339
1417692207536,1417692207988,452
1417692226538,1417692226955,417
1417692220531,1417692220992,461
1417692209531,1417692209953,422
1417692226531,1417692226959,428
1417692217532,1417692217944,412
1417692210533,1417692210964,431
1417692221530,1417692221870,340
1417692216531,1417692216959,428
1417692207535,1417692208021,486
1417692223548,1417692223957,409
1417692216532,1417692216904,372
1417692214535,1417692215071,536
1417692217530,1417692217835,305
1417692213529,1417692213954,425
1417692210531,1417692210964,433
1417692212529,1417692212993,464
1417692213532,1417692213954,422
1417692215531,1417692215957,426
1417692210529,1417692210868,339
1417692218531,1417692219102,571
1417692225530,1417692225907,377
1417692208536,1417692208966,430
1417692218533,1417692219168,635
Because System.out.println is synchronized, I add the timings to a ConcurrentHashMap instead of printing them inside RequestThread, so as not to skew the results. Are there other gotchas in the code above that could skew the results? Or are there areas I should concentrate on to improve accuracy, or is it accurate enough? By "enough" I mean accurate to roughly 100 milliseconds.
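A few details in the test above can skew or drop measurements: timingMap is keyed by start time, so two requests that start in the same millisecond overwrite each other; System.currentTimeMillis() follows the wall clock, whereas System.nanoTime() is intended for measuring elapsed time; and the results are read after a fixed Thread.sleep, so requests still in flight at that moment are missed. A minimal sketch addressing these, assuming a plain Runnable can stand in for the request; `LatencyRecorder`, `run` and `Sample` are hypothetical names, and the sleeping request in main is a placeholder for the URLConnection read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class LatencyRecorder {

    /** One timing sample, kept in a queue so equal start times cannot overwrite each other. */
    public static final class Sample {
        public final long startMillis;  // wall-clock start, for reporting only
        public final long elapsedNanos; // measured with System.nanoTime()

        Sample(long startMillis, long elapsedNanos) {
            this.startMillis = startMillis;
            this.elapsedNanos = elapsedNanos;
        }
    }

    /** Starts requestsPerSecond requests at the start of each second, waits for all, returns one Sample each. */
    public static List<Sample> run(Runnable request, int requestsPerSecond, int seconds)
            throws InterruptedException {
        Queue<Sample> samples = new ConcurrentLinkedQueue<>();
        ExecutorService workers = Executors.newCachedThreadPool();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        for (int s = 0; s < seconds; s++) {
            scheduler.schedule(() -> {
                for (int i = 0; i < requestsPerSecond; i++) {
                    workers.execute(() -> {
                        long wallStart = System.currentTimeMillis();
                        long start = System.nanoTime();
                        try {
                            request.run();
                        } finally {
                            samples.add(new Sample(wallStart, System.nanoTime() - start));
                        }
                    });
                }
            }, s, TimeUnit.SECONDS);
        }
        scheduler.shutdown(); // already-scheduled ticks still run under the default policy
        scheduler.awaitTermination(seconds + 60L, TimeUnit.SECONDS);
        workers.shutdown();   // every request has been submitted at this point
        workers.awaitTermination(5, TimeUnit.MINUTES);
        return new ArrayList<>(samples);
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical stand-in request; replace with the URLConnection read from the question
        Runnable fakeRequest = () -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.println("Start Time, Total Time (ms)");
        for (Sample s : run(fakeRequest, 4, 3)) {
            System.out.println(s.startMillis + "," + s.elapsedNanos / 1_000_000);
        }
    }
}
```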
