I am writing a lot of files like below.
public void call(Iterator<Tuple2<Text, BytesWritable>> arg0) throws Exception {
    // TODO Auto-generated method stub
    while (arg0.hasNext()) {
        Tuple2<Text, BytesWritable> tuple2 = arg0.next();
        System.out.println(tuple2._1().toString());
        PrintWriter writer = new PrintWriter("/home/suv/junk/sparkOutPut/" + tuple2._1().toString(), "UTF-8");
        writer.println(new String(tuple2._2().getBytes()));
        writer.close();
    }
}
Is there any better way to write the files, without closing or creating a PrintWriter every time?
There is no significantly better way to write lots of files. What you are doing is inherently I/O intensive.
UPDATE - @Michael Anderson is right, I think. Using multiple threads to write the files will (probably) speed things up considerably. However, the I/O is still going to be the ultimate bottleneck, for a couple of reasons:
Creating, opening and closing files involves file and directory metadata access and updates. This entails non-trivial CPU time.
The file data and metadata changes need to be written to disc. That is possibly multiple disc writes.
There are at least 3 syscalls for each file written.
Then there are thread switching overheads.
Unless the quantity of data written to each file is significant (multiple kilobytes per file), I doubt that techniques like NIO, direct buffers, JNI and so on will be worthwhile. The real bottlenecks will be in the kernel: file system operations and low-level disk I/O.
... without closing or creating printwriter every time.
No. You need to create a new PrintWriter (or Writer, or OutputStream) for each file.
However, this ...
writer.println(new String(tuple2._2().getBytes()));
... looks rather peculiar. You appear to be:
calling getBytes() on a String (?),
converting the byte array to a String
calling the println() method on the String, which will copy it, and then convert it back into bytes before finally outputting them.
What gives? What is the point of the String -> bytes -> String conversion?
I'd just do this:
writer.println(tuple2._2());
This should be faster, though I wouldn't expect the percentage speed-up to be that large.
I'm assuming you're after the fastest way. Because everyone knows fastest is best ;)
One simple way is to use a bunch of threads to do your writing for you.
However you're not going to get much benefit by doing this unless your filesystem scales well. (I use this technique on Lustre-based cluster systems, and in cases where "lots of files" could mean 10k - in this case many of the writes will be going to different servers / disks.)
The code would look something like this (note: I think this version is not quite right, as it submits every job up front and fills the work queue - see the next version for the better approach anyway):
public void call(Iterator<Tuple2<Text, BytesWritable>> arg0) throws Exception {
    int nThreads = 5;
    ExecutorService threadPool = Executors.newFixedThreadPool(nThreads);
    ExecutorCompletionService<Void> ecs = new ExecutorCompletionService<>(threadPool);
    int nJobs = 0;
    while (arg0.hasNext()) {
        ++nJobs;
        final Tuple2<Text, BytesWritable> tuple2 = arg0.next();
        ecs.submit(new Callable<Void>() {
            @Override
            public Void call() throws Exception {
                System.out.println(tuple2._1().toString());
                String path = "/home/suv/junk/sparkOutPut/" + tuple2._1().toString();
                try (PrintWriter writer = new PrintWriter(path, "UTF-8")) {
                    writer.println(new String(tuple2._2().getBytes()));
                }
                return null;
            }
        });
    }
    for (int i = 0; i < nJobs; ++i) {
        ecs.take().get();
    }
    threadPool.shutdown();
}
Better yet is to start writing your files as soon as you have data for the first one, not when you've got data for all of them - and to do this writing so that it does not block the calculation thread(s).
To do this you split your application into several pieces communicating over a (thread safe) queue.
Code then ends up looking more like this:
public void main() {
    SomeMultithreadedQueue<Data> queue = ...;
    int nGeneratorThreads = 1;
    int nWriterThreads = 5;
    int nThreads = nGeneratorThreads + nWriterThreads;
    ExecutorService threadPool = Executors.newFixedThreadPool(nThreads);
    ExecutorCompletionService<Void> ecs = new ExecutorCompletionService<>(threadPool);
    AtomicInteger completedGenerators = new AtomicInteger(0);

    // Start some generator threads.
    for (int i = 0; i < nGeneratorThreads; ++i) {
        ecs.submit(() -> {
            while (...) {
                Data d = ...;
                queue.push(d);
            }
            // The last generator to finish pushes the end-of-data sentinel.
            if (completedGenerators.incrementAndGet() == nGeneratorThreads) {
                queue.push(null);
            }
            return null;
        });
    }

    // Start some writer threads.
    for (int i = 0; i < nWriterThreads; ++i) {
        ecs.submit(() -> {
            Data d;
            while ((d = queue.take()) != null) {
                String path = d.path();
                try (PrintWriter writer = new PrintWriter(path, "UTF-8")) {
                    writer.println(new String(d.getBytes()));
                }
            }
            // Re-push the sentinel so the other writer threads also terminate.
            queue.push(null);
            return null;
        });
    }

    for (int i = 0; i < nThreads; ++i) {
        ecs.take().get();
    }
    threadPool.shutdown();
}
Note I've not provided an implementation of the queue class; you can easily wrap the standard Java thread-safe ones to get what you need.
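For example, a minimal sketch of such a wrapper (the push/take interface and the SomeMultithreadedQueue name are just the placeholders used above, not a real library class) could delegate to a LinkedBlockingQueue:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical wrapper matching the push/take calls used in the sketch above.
// LinkedBlockingQueue rejects nulls, so the null "end of data" sentinel is
// mapped to a private poison-pill object internally.
class SomeMultithreadedQueue<T> {
    private static final Object POISON = new Object();
    private final BlockingQueue<Object> delegate = new LinkedBlockingQueue<>();

    void push(T item) {
        try {
            delegate.put(item != null ? item : POISON);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    @SuppressWarnings("unchecked")
    T take() {
        try {
            Object item = delegate.take();
            return item == POISON ? null : (T) item;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null; // also treated as "no more data" by the caller
        }
    }
}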
There's still lots more that can be done to reduce latency, etc. - here are some of the further things I've used to get the times down ...
don't even wait for all the data to be generated for a given file. Pass another queue containing packets of bytes to write.
Watch out for allocations - you can reuse some of your buffers.
There's some latency in the nio stuff - you can get some performance improvements by using C writes and JNI and direct buffers.
Thread switching can hurt, and the latency in the queues can hurt, so you might want to batch up your data slightly (see the sketch below). Balancing this with point 1 can be tricky.
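A rough, hypothetical illustration of that batching idea (the BatchingWriter name and the drain limit are made up): take one packet blocking, then drain whatever else is already queued so several packets go out per queue hand-off.

import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

// Hypothetical helper: one blocking take, then a non-blocking drain,
// so each queue hand-off carries a small batch of packets.
class BatchingWriter {
    static void writeBatch(BlockingQueue<byte[]> queue, OutputStream out)
            throws InterruptedException, IOException {
        List<byte[]> batch = new ArrayList<>();
        batch.add(queue.take());       // wait for at least one packet
        queue.drainTo(batch, 63);      // grab up to 63 more without blocking
        for (byte[] packet : batch) {
            out.write(packet);         // fewer queue round-trips per write
        }
    }
}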
Related
I am creating a RandomAccessFile object to write to a file (on an SSD) from multiple threads. Each thread tries to write a direct byte buffer at a specific position within the file, and I ensure that the position at which a thread writes won't overlap with another thread:
file_.getChannel().write(buffer, position);
where file_ is an instance of RandomAccessFile and buffer is a direct byte buffer.
For the RandomAccessFile object, since I'm not using fallocate to allocate the file, and the file's length is changing, will this utilize the concurrency of the underlying media?
If it is not, is there any point in using the above function without calling fallocate while creating the file?
I did some testing with the following code:
public class App {

    public static CountDownLatch latch;

    public static void main(String[] args) throws InterruptedException, IOException {
        File f = new File("test.txt");
        RandomAccessFile file = new RandomAccessFile(f, "rw");
        latch = new CountDownLatch(5);
        for (int i = 0; i < 5; i++) {
            Thread t = new Thread(new WritingThread(i, (long) i * 10, file.getChannel()));
            t.start();
        }
        latch.await();
        file.close();
        InputStream fileR = new FileInputStream("test.txt");
        byte[] bytes = IOUtils.toByteArray(fileR);
        for (int i = 0; i < bytes.length; i++) {
            System.out.println(bytes[i]);
        }
    }

    public static class WritingThread implements Runnable {

        private long startPosition = 0;
        private FileChannel channel;
        private int id;

        public WritingThread(int id, long startPosition, FileChannel channel) {
            super();
            this.startPosition = startPosition;
            this.channel = channel;
            this.id = id;
        }

        private ByteBuffer generateStaticBytes() {
            ByteBuffer buf = ByteBuffer.allocate(10);
            byte[] b = new byte[10];
            for (int i = 0; i < 10; i++) {
                b[i] = (byte) (this.id * 10 + i);
            }
            buf.put(b);
            buf.flip();
            return buf;
        }

        @Override
        public void run() {
            Random r = new Random();
            while (r.nextInt(100) != 50) {
                try {
                    System.out.println("Thread " + id + " is Writing");
                    this.channel.write(this.generateStaticBytes(), this.startPosition);
                    this.startPosition += 10;
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            latch.countDown();
        }
    }
}
So far what I've seen:
Windows 7 (NTFS partition): Run linearly (aka one thread writes and when it is over, another one gets to run)
Linux Parrot 4.8.15 (ext4 partition) (Debian based distro), with Linux Kernel 4.8.0: Threads intermingle during the execution
Again as the documentation says:
File channels are safe for use by multiple concurrent threads. The close method may be invoked at any time, as specified by the Channel interface. Only one operation that involves the channel's position or can change its file's size may be in progress at any given time; attempts to initiate a second such operation while the first is still in progress will block until the first operation completes. Other operations, in particular those that take an explicit position, may proceed concurrently; whether they in fact do so is dependent upon the underlying implementation and is therefore unspecified.
So I'd suggest first giving it a try and seeing whether the OS(es) you are going to deploy your code on (and possibly the filesystem type) support parallel execution of FileChannel.write calls.
Edit: As pointed out, the above does not mean that threads can write concurrently to the file; it is actually the opposite, as the write call behaves according to the contract of a WritableByteChannel, which clearly specifies that only one thread at a time can write to a given file:
If one thread initiates a write operation upon a channel then any other thread that attempts to initiate another write operation will block until the first operation is complete.
As the documentation states and Adonis already mentions, a write can only be performed by one thread at a time. You won't achieve performance gains through concurrency; moreover, you should only worry about performance if it's an actual issue, because writing concurrently to a disk may actually degrade your performance (probably less so for SSDs than HDDs).
The underlying media is in most cases (SSD, HDD, network) single-threaded - actually, there is no such thing as a thread at the hardware level; threads are nothing but an abstraction.
In your case the media is an SSD.
While the SSD internally may write data to multiple modules concurrently (it may reach a level of parallelism where writes are as fast as, and even outperform, reads), the internal mapping data structures are a shared resource and therefore contended, especially by frequent updates such as concurrent writes. Nevertheless, updates of this data structure are quite fast and therefore nothing to worry about unless it becomes a problem.
But apart from this, those are just internals of the SSD. On the outside you communicate over a Serial ATA interface, thus one byte at a time (actually packets in a Frame Information Structure, FIS). On top of this is an OS/filesystem that again has a probably contended data structure and/or applies its own means of optimization, such as write-behind caching.
Further, since you know what your media is, you may optimize specifically for it, and SSDs are really fast when a single thread writes a large piece of data.
Thus, instead of using multiple threads for writing, you may create a large in-memory buffer (or consider a memory-mapped file) and write concurrently into this buffer. The memory itself is not contended, as long as you ensure each thread accesses only its own region of the buffer. Once all threads are done, you write this one buffer to the SSD (not needed if using a memory-mapped file).
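A minimal sketch of that idea, assuming a memory-mapped file where each thread fills its own disjoint region (the file name and sizes are made up for illustration):

import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical demo: one mapped file, four threads, each writing only
// into its own slice, then a single force() to flush to the SSD.
public class MappedSliceWriteDemo {
    public static void main(String[] args) throws Exception {
        final int nThreads = 4;
        final int sliceSize = 1024;   // made-up per-thread region size
        try (RandomAccessFile file = new RandomAccessFile("mapped.bin", "rw");
             FileChannel channel = file.getChannel()) {
            MappedByteBuffer map =
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, (long) nThreads * sliceSize);
            Thread[] workers = new Thread[nThreads];
            for (int i = 0; i < nThreads; i++) {
                final int offset = i * sliceSize;
                workers[i] = new Thread(() -> {
                    // duplicate() gives this thread its own position/limit
                    // over the same mapped memory.
                    ByteBuffer slice = map.duplicate();
                    slice.position(offset);
                    slice.limit(offset + sliceSize);
                    while (slice.hasRemaining()) {
                        slice.put((byte) 42);   // no other thread touches this region
                    }
                });
                workers[i].start();
            }
            for (Thread t : workers) {
                t.join();
            }
            map.force();   // flush the mapped region to disk
        }
    }
}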
See also this good summary about developing for SSDs:
A Summary – What every programmer should know about solid-state drives
The point of doing pre-allocation (or, to be more precise, file_.setLength(), which actually maps to ftruncate) is that resizing the file may use extra cycles, and you may want to avoid that. But again, this may depend on the OS/filesystem.
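For example, a hedged sketch of pre-sizing the file before starting the positional writes (the size figure is arbitrary; pick whatever upper bound fits your workload):

import java.io.RandomAccessFile;

// Pre-size the file once so concurrent positional writes don't have to
// grow it; the figure below is a made-up upper bound for illustration.
public class PreallocateDemo {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile file = new RandomAccessFile("test.txt", "rw")) {
            file.setLength(5L * 1024 * 1024);   // reserve ~5 MB up front
            // ... start the writer threads against file.getChannel() as before
        }
    }
}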
In Java (or Clojure) I would like to spin up an external process and consume its stdout as a stream. Ideally, I would like to consume the process's output stream every time the external process flushes it, but I am not sure how that can be accomplished, and how it can be accomplished without blocking.
When consuming a Java ProcessPipeInputStream for a shelled-out process (for example a Unix ProcessPipeInputStream), I find the inherited InputStream methods a bit low-level to work with, and I am not sure if there's a non-blocking way to consume from the stream every time the producer side flushes, or otherwise in a non-blocking fashion.
Many code examples block on the output stream in an infinite loop, thereby hogging a thread for the listening. My hope is that this blocking behavior can be avoided altogether.
Bottom line:
Is there a non-blocking way to be notified on an input stream, every time that the producing side of it flushes?
You need to create a separate thread that consumes from such a stream, allowing the rest of your program to do whatever it is meant to be doing in parallel.
class ProcessOutputReader implements Runnable {

    private final InputStream processOutput;

    public ProcessOutputReader(final InputStream processOutput) {
        this.processOutput = processOutput;
    }

    @Override
    public void run() {
        int nextByte;
        try {
            while ((nextByte = processOutput.read()) != -1) {
                // do whatever you need to do byte-by-byte.
                processByte(nextByte);
            }
        } catch (final IOException ex) {
            // decide how to handle a broken pipe / read failure.
            ex.printStackTrace();
        }
    }
}
class Main {
    public static void main(final String[] args) {
        final Process proc = ...;
        final ProcessOutputReader reader = new ProcessOutputReader(proc.getInputStream());
        final Thread processOutputReaderThread = new Thread(reader);
        processOutputReaderThread.setDaemon(true); // allow the VM to terminate if this is the only thread still active.
        processOutputReaderThread.start();
        ...
        // if you want to wait for the whole process output to be processed at some point you can do this:
        try {
            processOutputReaderThread.join();
        } catch (final InterruptedException ex) {
            // you need to decide how to recover if your wait was interrupted.
        }
    }
}
If instead of processing byte-by-byte you want to deal with each flush as a single piece... I'm not sure it is 100% guaranteed that you can capture each process flush. After all, the process's own I/O framework (Java, C, Python, etc.) may handle the "flush" operation differently, and perhaps what you end up receiving is multiple blocks of bytes for any given flush in that external process.
In any case you can attempt to do that by using the InputStream's available method like so:
@Override
public void run() {
    int nextByte;
    try {
        while ((nextByte = processOutput.read()) != -1) {
            final int available = processOutput.available();
            byte[] block = new byte[available + 1];
            block[0] = (byte) nextByte;
            final int actuallyAvailable = processOutput.read(block, 1, available);
            if (actuallyAvailable < available) {
                if (actuallyAvailable == -1) {
                    block = new byte[] { (byte) nextByte };
                } else {
                    block = Arrays.copyOf(block, actuallyAvailable + 1);
                }
            }
            // do whatever you need to do with that block now.
            processBlock(block);
        }
    } catch (final IOException ex) {
        // handle a broken pipe / read failure.
        ex.printStackTrace();
    }
}
I'm not 100% sure of this, but I think you cannot trust that available() will return a guaranteed lower bound on the number of bytes you can retrieve without blocking, nor that the next read operation will return that number of bytes if requested; that is why the code above checks the actual number of bytes read (actuallyAvailable).
I have a Spring Batch job consisting of a partitioned step, and the partitioned step does its processing in chunks.
Can I launch new threads (implementing Runnable) from the writer method, public void write(List<? extends VO> itemsToWrite)?
Basically, the writer here writes indices using Lucene, and since the writer receives a List of chunk-size items, I thought of dividing that List into segments and passing each segment to a new Runnable.
Is that a good approach?
I coded a sample and it works most of the time but gets stuck a few times.
Is there anything that I need to worry about? Or is there something built into Spring Batch to achieve this?
I don't want the write for a whole chunk to happen on a single thread; I wish to divide the chunk up further.
Lucene's IndexWriter is thread-safe, and an approach is listed here.
Sample code - the writer gets a List of items for which I open threads from a thread pool. Will there be any concern even if I wait for the pool to terminate for a chunk?
@Override
public void write(List<? extends IndexerInputVO> inputItems) throws Exception {
    int docsPerThread = Constants.NUMBER_OF_DOCS_PER_INDEX_WRITER_THREADS;
    int docSize = inputItems.size();
    int remainder = docSize % docsPerThread;
    int poolSize = docSize / docsPerThread;
    ExecutorService executor = Executors.newFixedThreadPool(poolSize + 1);
    int fromIndex = 0;
    int toIndex = docsPerThread;

    if (docSize < docsPerThread) {
        executor.submit(new IndexWriterRunnable(this.luceneObjects, service, inputItems));
    } else {
        for (int i = 1; i <= poolSize; i++) {
            executor.submit(new IndexWriterRunnable(this.luceneObjects, service, inputItems.subList(fromIndex, toIndex)));
            fromIndex += docsPerThread;
            toIndex += docsPerThread;
        }
        if (remainder != 0) {
            toIndex = docSize;
            executor.submit(new IndexWriterRunnable(this.luceneObjects, service, inputItems.subList(fromIndex, toIndex)));
        }
    }

    executor.shutdown();
    // Block until all index-writing tasks for this chunk have finished.
    executor.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
}
I'm not sure that launching new threads in the writer is a good idea.
These threads are outside the scope of the Spring Batch framework, so you will need to implement a shutdown and cancellation policy for them. If the processing of one segment fails, it can fail the entire queue.
As an alternative approach, I can suggest promoting your custom segments of the list from the writer to the next step, as described in the official docs: passingDataToFutureSteps
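A rough sketch of what that promotion might look like, assuming the List-based write signature from the question and Spring Batch's ExecutionContextPromotionListener; the "segments" key and the class names below are made up for illustration, not taken from the original code:

import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.annotation.BeforeStep;
import org.springframework.batch.core.listener.ExecutionContextPromotionListener;
import org.springframework.batch.item.ItemWriter;

// Hypothetical writer that stashes the segments in the step ExecutionContext;
// a promotion listener configured with the same key copies them to the job
// ExecutionContext so the next step can pick them up.
public class SegmentStashingWriter implements ItemWriter<String> {

    private StepExecution stepExecution;

    @BeforeStep
    public void saveStepExecution(StepExecution stepExecution) {
        this.stepExecution = stepExecution;
    }

    @Override
    public void write(List<? extends String> items) {
        stepExecution.getExecutionContext().put("segments", new ArrayList<>(items));
    }

    // Register this listener on the step so "segments" survives into the next step.
    public static ExecutionContextPromotionListener promotionListener() {
        ExecutionContextPromotionListener listener = new ExecutionContextPromotionListener();
        listener.setKeys(new String[] { "segments" });
        return listener;
    }
}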
I have a Java method which writes content into a text file with values delimited by the | symbol. Content needs to be picked up from 10 tables depending on conditions and written into the file. Currently I am doing it with the following method. Could anyone please suggest a better alternative approach for this requirement? Does this method have a performance bottleneck?
public static void createFile()
{
queryFromTable1
whileLoopForqueryFromTable1
{
writer.write(value1+"|"+value2+"|".....)
}
queryFromTable2
whileLoopForqueryFromTable2
{
writer.write("||||"+value4+"|".....)
}
queryFromTable2
whileLoopForqueryFromTable2
{
writer.write("||"+value5+"|".....)
}
}
There is no performance bottleneck in this pseudo code.
If you are using a BufferedWriter, I think it's OK to call write many times. You just have to remember to close the writer at the end of the processing.
Now, we don't know what is behind your DB queries. Maybe they can be optimized. Do you use PreparedStatements?
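For instance, a minimal sketch of one query feeding the writer through a PreparedStatement (the table, column and condition names are placeholders, not taken from the original code):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Hypothetical example: a parameterised query whose results are streamed
// straight into a BufferedWriter that is closed once at the end.
public class TableExporter {
    public static void export(Connection conn, String outFile) throws Exception {
        String sql = "SELECT col1, col2 FROM table1 WHERE status = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql);
             BufferedWriter writer = new BufferedWriter(new FileWriter(outFile))) {
            ps.setString(1, "ACTIVE");          // bind the condition once
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    writer.write(rs.getString("col1") + "|" + rs.getString("col2"));
                    writer.newLine();
                }
            }
        }
    }
}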
You can create a separate method for extracting data from each table:
private List<String> getDataFromTable(String query) { ... }
The obvious bottleneck is the string concatenation value1+"|"+value2+"|". You'd be better off using a single write for each element:
for (int i = 0; i < tableData.size(); i++) {
    String str = tableData.get(i);
    if (checkPassed(str)) {
        writer.write(str);
        // don't print last |
        if (i < tableData.size() - 1) {
            writer.write(DELIMITER); // private static final String DELIMITER = "|";
        }
    }
}
More information will allow us to give a better answer.
Try breaking it down into several methods; below is pseudocode:
void createFile() {
    writeTo(out, query1);
    writeTo(out, query2);
    writeTo(out, query3);
    ....
}

void writeTo(out, query) {
    execute query
    loop() {
        out.write(...)
    }
}
A possible bottleneck is if any of the queries are slow; in that case all of the remaining queries have to wait for the earlier ones to complete. Another possible bottleneck is that the algorithm is single-threaded.
To solve that, a solution would be to do the reads in parallel and process the writing in multiple writers. When all of that has been completed, simply merge the writers in the correct order (single-threaded). The Java 8 class CompletableFuture provides some nice features that can be used here. Simply create some futures that complete, and merge the output.
Check out the CompletableFuture JavaDocs for more info.
An example of the algorithm could be something like the code below. Please note that this is simply an example and not a full-fledged solution. The use of the StringWriter is just for convenience and is just one way of handling the data.
public class AlgorithmTest {

    public static void main(String[] args)
            throws IOException, ExecutionException, InterruptedException {
        // Setup async processing of task 1
        final CompletableFuture<String> q1 = CompletableFuture.supplyAsync(() -> {
            // Setup result data
            StringWriter result = new StringWriter();
            // execute query 1
            // Process result
            result.write("result from 1");
            // Return the result (can of course be handled in other ways).
            return result.toString();
        });

        // Setup async processing of task 2
        final CompletableFuture<String> q2 = CompletableFuture.supplyAsync(() -> {
            // Setup result data
            StringWriter result = new StringWriter();
            // execute query 2
            // Process result
            result.write("result from 2");
            // Return the result (can of course be handled in other ways).
            return result.toString();
        });

        // Write the whole thing to file (i.e. merge the streams)
        final Path path = new File("result.txt").toPath();
        Files.write(path, Arrays.asList(q1.get(), q2.get()));
    }
}
I am reading a 77MB file inside a servlet; in the future this will be 150GB. This file is not written using any kind of NIO package, it is just written using BufferedWriter.
Now this is what I need to do.
Read the file line by line. Each line is a "hash code" of a text. Separate it into pieces of 3 chars (3 chars represent 1 word). It could be long, it could be short, I don't know.
After reading the line, convert it into real words. We have a Map of words and hashes so we can find the words.
Up to now, I have used BufferedReader to read the file. It is slow and not good for huge files like 150GB. It took hours to complete the entire process even for this 77MB file. Because we can't keep the user waiting for hours, it should be within seconds. So, we decided to load the file into memory. First we thought about loading every single line into a LinkedList, so the memory could hold it. But you know, memory cannot hold such a big amount. After a big search, I decided mapping the file to memory would be the answer. Memory is far faster than the disk, so we could read the file super fast too.
Code:
public class MapRead {

    public MapRead() {
        try {
            File file = new File("E:/Amazon HashFile/Hash.txt");
            FileChannel c = new RandomAccessFile(file, "r").getChannel();
            MappedByteBuffer buffer = c.map(FileChannel.MapMode.READ_ONLY, 0, c.size()).load();
            for (int i = 0; i < buffer.limit(); i++) {
                System.out.println((char) buffer.get());
            }
            System.out.println(buffer.isLoaded());
            System.out.println(buffer.capacity());
        } catch (IOException ex) {
            Logger.getLogger(MapRead.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}
But I could not see any "super fast" thing. And I need it line by line. I have a few questions to ask.
You read my description and you know what I need to do. I have done the first step for that, so is that correct?
Is the way I map correct? I mean, is this any different from reading it the normal way? Does this hold the "entire" file in memory first (let's say using a technique called mapping)? Then do we have to write other code to access that memory?
How to read line by line, super "fast"? (If I have to load/map the entire file to memory first for hours and then access it at super speed in seconds, I am totally fine with that too.)
Is reading files in Servlets good? (Because it will be accessed by a number of people, and only one IO stream will be opened at once. In this case this servlet will be accessed by thousands at once.)
Update
This is how my code looks after I updated it with SO user Luiggi Mendoza's answer.
public class BigFileProcessor implements Runnable {

    private final BlockingQueue<String> linesToProcess;

    public BigFileProcessor(BlockingQueue<String> linesToProcess) {
        this.linesToProcess = linesToProcess;
    }

    @Override
    public void run() {
        String line = "";
        try {
            while ((line = linesToProcess.take()) != null) {
                System.out.println(line); // This is not happening
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
public class BigFileReader implements Runnable {

    private final String fileName;
    int a = 0;
    private final BlockingQueue<String> linesRead;

    public BigFileReader(String fileName, BlockingQueue<String> linesRead) {
        this.fileName = fileName;
        this.linesRead = linesRead;
    }

    @Override
    public void run() {
        try {
            // Scanner did not work. I had to use BufferedReader.
            BufferedReader br = new BufferedReader(new FileReader(new File("E:/Amazon HashFile/Hash.txt")));
            String str = "";
            while ((str = br.readLine()) != null) {
                // System.out.println(a);
                a++;
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
public class BigFileWholeProcessor {

    private static final int NUMBER_OF_THREADS = 2;

    public void processFile(String fileName) {
        BlockingQueue<String> fileContent = new LinkedBlockingQueue<String>();
        BigFileReader bigFileReader = new BigFileReader(fileName, fileContent);
        BigFileProcessor bigFileProcessor = new BigFileProcessor(fileContent);
        ExecutorService es = Executors.newFixedThreadPool(NUMBER_OF_THREADS);
        es.execute(bigFileReader);
        es.execute(bigFileProcessor);
        es.shutdown();
    }
}
public class Main {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        // TODO code application logic here
        BigFileWholeProcessor b = new BigFileWholeProcessor();
        b.processFile("E:/Amazon HashFile/Hash.txt");
    }
}
I am trying to print the file in BigFileProcessor. What I understood is this:
User enters the file name.
That file gets read by BigFileReader, line by line.
After each line, BigFileProcessor gets called. That is, assume BigFileReader reads the first line; now BigFileProcessor is called. Once BigFileProcessor completes the processing for that line, BigFileReader reads line 2. Then BigFileProcessor gets called again for that line, and so on.
Maybe my understanding of this code is incorrect. How should I process the lines anyway?
I would suggest using multiple threads here:
One thread will take care of reading every line of the file and inserting it into a BlockingQueue in order to be processed.
Another thread (or threads) will take the elements from this queue and process them.
To implement this multi-threaded work, it would be better to use the ExecutorService interface and pass it Runnable instances, each of which should implement a task. Remember to have only a single task that reads the file.
You could also manage a way to stop reading if the queue reaches a specific size, e.g. if the queue has 10000 elements then wait until its size is down to 8000, then continue reading and filling the queue; a bounded queue gives you much of this back-pressure for free, as shown below.
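For example, in BigFileWholeProcessor below that would just mean constructing the queue with a capacity (the 10,000 figure is the one from the paragraph above, not a recommendation):

// Bounded variant of the queue used in BigFileWholeProcessor.processFile:
// put() blocks once 10,000 lines are queued, so the reading thread pauses
// until the processing thread drains the queue. (The sample reader below
// uses put(), which is what makes this back-pressure work.)
BlockingQueue<String> fileContent = new LinkedBlockingQueue<>(10_000);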
Reading files in Servlets is good ?
I would recommend never doing heavy work in a servlet. Instead, fire an asynchronous task, e.g. via a JMS call, and then process your file in that external agent.
A brief sample of the above explanation to solve the problem:
public class BigFileReader implements Runnable {

    private final String fileName;
    private final BlockingQueue<String> linesRead;

    public BigFileReader(String fileName, BlockingQueue<String> linesRead) {
        this.fileName = fileName;
        this.linesRead = linesRead;
    }

    @Override
    public void run() {
        // since it is a sample, I avoid managing how many lines you have read
        // and that stuff, but it should not be complicated to accomplish
        try (Scanner scanner = new Scanner(new File(fileName))) {
            while (scanner.hasNext()) {
                try {
                    linesRead.put(scanner.nextLine());
                } catch (InterruptedException ie) {
                    // handle the exception...
                    ie.printStackTrace();
                }
            }
        } catch (FileNotFoundException e) {
            // handle the missing file...
            e.printStackTrace();
        }
    }
}
public class BigFileProcessor implements Runnable {

    private final BlockingQueue<String> linesToProcess;

    public BigFileProcessor(BlockingQueue<String> linesToProcess) {
        this.linesToProcess = linesToProcess;
    }

    @Override
    public void run() {
        String line = "";
        try {
            while ((line = linesToProcess.take()) != null) {
                // do what you want/need to process this line...
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
public class BigFileWholeProcessor {

    private static final int NUMBER_OF_THREADS = 2;

    public void processFile(String fileName) {
        BlockingQueue<String> fileContent = new LinkedBlockingQueue<String>();
        BigFileReader bigFileReader = new BigFileReader(fileName, fileContent);
        BigFileProcessor bigFileProcessor = new BigFileProcessor(fileContent);
        ExecutorService es = Executors.newFixedThreadPool(NUMBER_OF_THREADS);
        es.execute(bigFileReader);
        es.execute(bigFileProcessor);
        es.shutdown();
    }
}
NIO won't help you here. BufferedReader is not slow. If you're I/O bound, you're I/O bound -- get faster I/O.
Mapping the file into memory can help, but only if you're actually using the memory in place, rather than just copying all of the data out of the big byte array that you get back. The primary advantage of mapping the file is that it keeps the data out of the Java heap, and away from the garbage collector.
Your best performance will come from working on the data in place, and not copying it into the heap if you can.
Some of your performance may be impacted by object creation. For example, if you were trying to load your data into the LinkedList, you'd be creating (likely) millions of nodes for the List itself, plus the objects wrapping your data (even if they're just Strings).
Creating Strings based on your memory-mapped array can be quite efficient, as the String will simply wrap the data, not copy it. But you'll have to be UTF-aware if you're working with something other than ASCII (as bytes are not characters in Java).
Also, if you're loading large things with lots of objects, ensure that you have free space in your heap for them. And by free space, I mean actual room. You can have a 500MB heap, as specified by -Xmx, but the ACTUAL heap will not be that large initially; it will grow to that limit.
Assuming you have sufficient memory in the first place, you can do this via -Xms, which will pre-allocate the heap to the desired size, or you can simply do a quick byte[] buf = new byte[400 * 1024 * 1024], to make a huge allocation, force the GC, and stretch the heap.
What you don't want to be doing is allocating a million objects and having the VM GC every 10000 or so as the heap grows. Pre-allocating other data structures is also helpful (notably ArrayLists; LinkedLists not so much).
Divide the file into smaller parts. For this you'll need access to a seekable reader so you can fast-forward to other parts of the file.
For each part, spawn multiple worker threads, each with its own copy of the hash lookup table. Let completed threads join a collector thread, which will write the completed chunks in order and signal that processing is complete.
It will be better to stream file chunks rather than loading all of them in memory; a rough sketch of the chunk-reading part is below.
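A minimal, hypothetical sketch of that chunk reading (the class name, offsets and the word-lookup step are illustrative; a real version would align chunk boundaries to line breaks before handing them to workers):

import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical worker: each one opens its own handle, seeks to its start
// offset and reads only its slice of the file.
class ChunkReader implements Runnable {
    private final String path;
    private final long start;
    private final long length;

    ChunkReader(String path, long start, long length) {
        this.path = path;
        this.start = start;
        this.length = length;
    }

    @Override
    public void run() {
        byte[] chunk = new byte[(int) length];
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(start);          // fast-forward to this worker's part
            raf.readFully(chunk);     // read only this slice
            // ... translate hash codes to words using this thread's lookup table
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}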