I have a large file. Each line in that file maps to a database record, so I need to read the file line by line and persist each record to the database. Suppose I use multiple threads to read that file.
Is there a way in Java whereby one thread can read lines 1..50 while another thread reads lines 51..100? In the same way I would have multiple threads, all reading from that single file.
I have one file which contains 100 messages, one message per line. I have 10 threads, and each thread should pick a message from the file and send it to a given address. No message should be sent by more than one thread. Since I have 10 threads, each thread should be responsible for sending 10 messages.
Normally people use CSV Data Set Config for this form of parameterization.
Add CSV Data Set Config to your Test Plan
Configure it as follows (the settings that matter for this scenario, inferred from the behaviour described below):
Filename: path to your file with the messages
Variable Names: message
Recycle on EOF?: False
Stop thread on EOF?: True
Sharing mode: All threads
That's it. Now you can refer to the line from the CSV as ${message} wherever required; each user will read its own line, there will be no duplicates, and when all lines have been read the test will end.
Another option is the __StringFromFile() function; however, in this case the test will not stop on its own, so you will have to set the number of iterations yourself. Also, the __StringFromFile() function keeps the whole file in memory, so it is not suitable for large data sets.
How can I read a file in Java using multithreading?
It doesn't matter if it's slower than using one thread; I have to do it.
So, for example, if there are 2 threads, the first reads the first line and, at the same time, the second reads the second line; then the first reads the third line and the second reads the fourth line, and they continue reading this way until the end of the file. How can I implement this in Java?
Just use a single BufferedReader that is shared between the threads, and synchronize on it when calling readLine().
It is completely pointless, though: the file can only be read sequentially from disk, so adding threads will not make the reading any faster.
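The shared-reader idea above can be sketched as follows. This is a minimal, self-contained demo: a StringReader stands in for the actual file, and uppercasing stands in for whatever per-line processing is needed; both are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.concurrent.ConcurrentLinkedQueue;

public class SharedReaderDemo {
    public static void main(String[] args) throws Exception {
        // Stand-in for a real file; a FileReader would work the same way.
        BufferedReader reader = new BufferedReader(
                new StringReader("line1\nline2\nline3\nline4\nline5\n"));

        ConcurrentLinkedQueue<String> results = new ConcurrentLinkedQueue<>();

        Runnable worker = () -> {
            while (true) {
                String line;
                // readLine() is not thread-safe, so serialize access to the reader.
                synchronized (reader) {
                    try {
                        line = reader.readLine();
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                }
                if (line == null) {
                    return; // end of input reached
                }
                results.add(line.toUpperCase()); // "process" the line
            }
        };

        Thread t1 = new Thread(worker);
        Thread t2 = new Thread(worker);
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        System.out.println(results.size()); // each line was consumed exactly once
    }
}
```

Because readLine() is called under the lock, each line is handed to exactly one thread, with no duplicates and no skips.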
Suppose I have 5 threads, each with pre-assigned line numbers in a file: for example, thread 1 writes to lines 1 to 5, thread 2 writes to lines 6 to 10, ..., thread 5 writes to lines 21 to 25. I want to write a Java program that can do this in parallel (i.e., without a single thread locking the whole file).
Your best bet here is to use the NIO APIs available in Java. Specifically, you will be interested in FileChannel.
To quote the documentation on the points that matter here:
A file channel is a SeekableByteChannel that is connected to a file. It has a current position within its file which can be both queried and modified. The file itself contains a variable-length sequence of bytes that can be read and written and whose current size can be queried.

A region of a file may be locked against access by other programs.

File channels are safe for use by multiple concurrent threads. The close method may be invoked at any time, as specified by the Channel interface.
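A sketch of how this could look. One caveat: positional writes need computable byte offsets, so this demo assumes fixed-length records (9 payload characters plus a newline, i.e. 10 bytes per "line"), which is an assumption beyond the question. Each thread writes its own region through a shared FileChannel via write(buffer, position), which does not touch the channel's shared position, so the threads never block each other on a file lock.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PositionalWriteDemo {
    // Fixed-length records: "line-NNNN" (9 chars) + '\n' = 10 bytes each.
    static final int RECORD_LEN = 10;

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("records", ".txt");
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            Thread[] threads = new Thread[5];
            for (int t = 0; t < 5; t++) {
                final int threadNo = t;
                threads[t] = new Thread(() -> {
                    // Thread 0 owns lines 1..5, thread 1 owns lines 6..10, etc.
                    for (int i = 0; i < 5; i++) {
                        int lineNo = threadNo * 5 + i + 1;
                        String record = String.format("line-%04d\n", lineNo);
                        ByteBuffer buf = ByteBuffer.wrap(
                                record.getBytes(StandardCharsets.US_ASCII));
                        try {
                            // Positional write: independent of the channel's
                            // current position, safe from multiple threads.
                            channel.write(buf, (long) (lineNo - 1) * RECORD_LEN);
                        } catch (IOException e) {
                            throw new RuntimeException(e);
                        }
                    }
                });
                threads[t].start();
            }
            for (Thread thread : threads) thread.join();
        }
        System.out.println(Files.readAllLines(file).get(20)); // line 21
        Files.deleteIfExists(file);
    }
}
```

With variable-length lines this scheme breaks down, because a thread cannot know the byte offset of "line 21" without reading everything before it.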
I want to load data into a PostgreSQL table using a Java program.
In the Java program I want to use multithreading, and I use the COPY command API:
CopyManager copyManager = ((PGConnection) conn).getCopyAPI();
FileReader fileReader = new FileReader(filename);
copyManager.copyIn("COPY " + tblname + " FROM STDIN DELIMITER ',' ", fileReader);
I have divided the file into n parts, and each thread loads its part into the PostgreSQL table.
Example:
file: test.csv
threads = 3
then test1.csv, test2.csv and test3.csv are created, and each thread loads one of the files.
This works correctly.
Now I want to check the data loading performance when multiple threads work on a single file (without dividing it into n files).
Example:
file: test.csv (contains 30000 records)
threads = 3 (the 1st thread loads rows 1-10000, the 2nd thread rows 10001-20000, and the 3rd thread rows 20001-30000);
the three threads divide the content of the file between them and load the data into Postgres using COPY.
Is it possible to divide the records of a single file among threads and load them into Postgres using Java multithreading?
Writing to the same storage from one thread will block all other threads from writing to it; you cannot simply make everything multithreaded.
In this case each part needs to wait for the previous one to finish.
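That said, the single file can at least be divided logically instead of physically: compute byte offsets that fall on line boundaries and give each thread its own slice (and its own database connection, one copyIn per thread). Below is a sketch of the slicing part only; the copyIn call is shown as a comment because it needs a live database, and the demo file contents are made up for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class CsvSlicer {
    /** Returns n+1 byte offsets that split the file at line boundaries. */
    static long[] sliceOffsets(Path file, int n) throws IOException {
        long size = Files.size(file);
        long[] offsets = new long[n + 1];
        offsets[0] = 0;
        offsets[n] = size;
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            for (int i = 1; i < n; i++) {
                raf.seek(size * i / n); // jump near the ideal split point
                raf.readLine();         // advance to the next line boundary
                offsets[i] = raf.getFilePointer();
            }
        }
        return offsets;
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("test", ".csv");
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= 9; i++) sb.append(i).append(",row\n");
        Files.write(file, sb.toString().getBytes());

        long[] offsets = sliceOffsets(file, 3);
        // Each thread would then open its own connection and stream its slice, e.g.:
        //   copyManager.copyIn("COPY " + tblname + " FROM STDIN DELIMITER ','",
        //                      readerForSlice(file, offsets[i], offsets[i + 1]));
        // (readerForSlice is a hypothetical helper that reads one byte range.)
        for (int i = 0; i < 3; i++) {
            System.out.println(offsets[i + 1] - offsets[i]); // bytes in each slice
        }
        Files.deleteIfExists(file);
    }
}
```

Because each slice starts and ends at a newline, no CSV record is ever split between two threads.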
I am currently working on a data format transformation job.
There is a large file, around 10 GB. The current solution I implemented reads the file line by line, transforms the format of each line, then writes the result to an output file. I found that the transform step is the bottleneck, so I am trying to do it concurrently.
Each line is a complete unit and has nothing to do with other lines. Some lines may be discarded because a specific value in them does not meet the requirements.
Now I have two plans:
One thread reads the input file line by line and puts each line into a queue; several worker threads take lines from the queue, transform the format, and put the results into an output queue; finally, an output thread reads lines from the output queue and writes them to the output file.
Several threads concurrently read from different parts of the input file, then process their lines and write to an output file through an output queue or a file lock.
Could you please give me some advice? I'd really appreciate it.
Thanks in advance!
I would go for the first option: reading data from a file in small pieces is normally slower than reading the whole file at once (depending on file caches, buffering, read-ahead, etc.).
You also need to think about how to assemble the output file (collecting the lines from the different workers, possibly in the correct order if that matters).
Solution 1 makes sense.
This would also map nicely and simply to Java's Executor framework. Your main thread reads lines and submits each line to an Executor or ExecutorService.
It gets more complicated if you must keep order intact, though.
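Keeping the order intact is actually straightforward with the Executor approach: submit each line as a Callable and collect the Futures in submission order. A minimal sketch, in which the input list stands in for reading the 10 GB file and the transform (uppercase lines starting with "keep", discard the rest) is an illustrative assumption:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TransformPipeline {
    public static void main(String[] args) throws Exception {
        // Stand-in for reading the large input file line by line.
        List<String> input = List.of("keep a", "drop", "keep b");

        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<String>> futures = new ArrayList<>();

        for (String line : input) {
            // Submit the CPU-heavy transform; lines that fail the check
            // are "discarded" by returning null.
            futures.add(pool.submit(() ->
                    line.startsWith("keep") ? line.toUpperCase() : null));
        }

        // Iterating the futures in submission order preserves the input order,
        // even though the transforms themselves ran in parallel.
        for (Future<String> f : futures) {
            String out = f.get();
            if (out != null) {
                System.out.println(out); // in the real job: write to the output file
            }
        }
        pool.shutdown();
    }
}
```

For a genuinely 10 GB input one would bound the number of in-flight lines (for example with a bounded BlockingQueue between the reader and the pool) instead of accumulating one Future per line, but the ordering idea stays the same.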