For content type "text/plain", which of the following is more efficient if I have to send huge data?
ServletOutputStream sos = response.getOutputStream();
sos.write(byte[])
//or
sos.println("")
Thanks
That depends on the format you have your source data in.
If it's a String, you're likely going to get better performance using response.getWriter().print() - and it's most certainly going to be safer as far as encoding is concerned.
If it's a byte array then ServletOutputStream.write(byte[]) is likely the fastest as it won't do any additional conversions.
The real answer, however, to this and all other "which is faster" questions is - measure it :-)
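For illustration, a minimal sketch of the two paths (assuming an HttpServletResponse named response; text and bytes are placeholders, and note that a response lets you use only one of getWriter()/getOutputStream()):

// character data: the container applies the response's character encoding
PrintWriter writer = response.getWriter();
writer.print(text);

// raw bytes: written as-is, no conversion
ServletOutputStream out = response.getOutputStream();
out.write(bytes);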
After quickly looking at Sun's implementation of both OutputStream.write(byte[]) and ServletOutputStream.println(String), I'd say there's no real difference. But as ChssPly76 put it, that can only be verified by measuring it.
The most efficient approach is to stream from an InputStream (which should NOT be a ByteArrayInputStream).
Simply because every byte of a byte[] occupies one byte of the JVM's heap, and every character of a String occupies at least as much. So imagine you have 128MB of available heap memory, the "huge" text/plain file is 1.28MB in size, and 100 users concurrently request the file: your application will crash with an OutOfMemoryError. Not really professional.
Have the "huge" data somewhere in a database or on the disk file system, obtain it as an InputStream the "default way" (i.e. from a DB by ResultSet#getBinaryStream() or from disk by FileInputStream), and write it to the OutputStream through a byte buffer and/or BufferedInputStream/BufferedOutputStream.
An example of such a servlet can be found here.
Good luck.
Related
I want to read a file of 1.5 GB into an array. Since it takes a long time, I want to switch to some other option.
If I preprocess the byte file into a database (or in some other way), can I make it faster?
Is there any other way to make it faster?
Actually, I have to process more than 50 such 1.5 GB files, so the operation is quite expensive for me.
It depends on what you want to do.
If you only wanted to access a few random bytes, then reading into an array isn't good - a MappedByteBuffer would be better.
If you want to read all the data and sequentially process it a small portion at a time then you could stream it.
If you need to do computations that do random access of the whole dataset, particularly if you need to repeatedly read elements, then loading into an array might be sensible (but a ByteBuffer is still a candidate).
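For the random-access case, a rough sketch of mapping the file instead of copying it into an array (path and someOffset are illustrative; a 1.5 GB file still fits within the 2 GB limit of a single mapping):

try (FileChannel channel = FileChannel.open(Paths.get(path), StandardOpenOption.READ)) {
    // the mapping lives outside the Java heap and is paged in on demand
    MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
    byte b = map.get(someOffset); // random access without loading the whole file
}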
Can you show some example code or explain further?
How fast is your disk subsystem?
If you can read 40 MB per second, reading 1500 MB should take about 40 seconds. If you want to go faster than this, you need a faster disk subsystem. If you are reading from a local drive and it's taking minutes, you have a tuning problem, and there is not much you can do in Java to fix it because Java is not the problem.
You can use a memory mapped file instead, but this will only speed up the access if you don't need all the data. If you need it all, you are limited by the speed of your hardware.
Using BufferedInputStream or a plain InputStream is probably as fast as you can get (faster than RandomAccessFile). The largest int value is 2,147,483,647, which is also roughly the maximum size of an array, so your 1,610,612,736-byte array is getting somewhat close to that limit.
I'd recommend you just access the file using a BufferedInputStream for best speed, using skip() and read() to get the data you want, as in the sketch below. Maybe have a class that implements those, is aware of its position, and takes care of the seeking for you when you send it an offset to read from. I believe you close and reopen the input stream to put it back at the beginning.
And... you may not want to store the bytes in an array at all, and just read them from the file as needed. That might help if loading time is your killer.
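A rough sketch of the skip-and-read idea (fileName and offset are illustrative; skip() may skip fewer bytes than requested, hence the loop):

try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(fileName))) {
    long remaining = offset;
    while (remaining > 0) {
        long skipped = in.skip(remaining); // may skip less than asked for
        if (skipped <= 0) break;           // end of stream reached
        remaining -= skipped;
    }
    int value = in.read(); // the byte at the requested offset, or -1 past EOF
}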
I have to read a big text file of, say, 25 GB and need to process this file within 15-20 minutes. This file will have multiple header and footer sections.
I tried CSplit to split this file based on header, but it is taking around 24 to 25 min to split it to a number of files based on header, which is not acceptable at all.
I tried sequential reading and writing by using BufferedReader and BufferedWriter along with FileReader and FileWriter. It is taking more than 27 min. Again, it is not acceptable.
I tried another approach like get the start index of each header and then run multiple threads to read file from specific location by using RandomAccessFile. But no luck on this.
How can I achieve my requirement?
Possible duplicate of:
Read large files in Java
Try using a large buffer read size (for example, 20MB instead of 2MB) to process your data quicker. Also don't use a BufferedReader because of slow speeds and character conversions.
This question has been asked before: Read large files in Java
You need to ensure that the IO is fast enough without your processing, because I suspect the processing, not the IO, is slowing you down. You should be able to get 80 MB/s from a hard drive and up to 400 MB/s from an SSD. At those rates you could read the entire 25 GB in roughly one to five minutes.
Try the following, which is not the fastest, but the simplest.
long start = System.nanoTime();
byte[] bytes = new byte[32 * 1024];
try (FileInputStream fis = new FileInputStream(fileName)) {
    int len;
    // read and discard: this measures raw IO throughput only
    while ((len = fis.read(bytes)) > 0);
}
long time = System.nanoTime() - start;
System.out.printf("Took %.3f seconds%n", time / 1e9);
Unless you find you are getting at least 50 MB/s you have a hardware problem.
Try using java.nio to make better use of the operating system's functionality. Avoid copying the data (e.g. into a String); try to work with offsets instead. I believe the java.nio classes even have methods to transfer data from one channel to another without pulling the data into the Java layer at all (at least on Linux), though that essentially translates into operating system calls.
For many modern web servers this technique has been key to the performance with which they can serve static data: essentially they delegate as much as possible to the operating system to avoid copying it through main memory.
Let me emphasize this: just seeking through a 25 GB byte buffer is a lot faster than converting it into Java Strings (which may require charset encoding/decoding - and copying). Anything that saves you copies and memory management will help.
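For example, FileChannel.transferTo() hands the copying over to the operating system (a sketch only; inFile and outFile are illustrative, and whether it truly avoids copies into user space depends on the OS):

try (FileChannel source = new FileInputStream(inFile).getChannel();
     FileChannel target = new FileOutputStream(outFile).getChannel()) {
    long position = 0;
    long size = source.size();
    while (position < size) {
        // transferTo may move fewer bytes than requested, so loop until done
        position += source.transferTo(position, size - position, target);
    }
}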
If the platform is right, you might want to shell out and call a combination of cat and sed. If it is not, you might still want to shell out and use perl via the command line. For the case that it absolutely has to be Java doing the actual processing, the others have provided sufficient answers.
Be on your guard though: shelling out is not without problems. But perl or sed might be the only widely available tools to crawl through and alter 25 GB of text in your timeframe.
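If you do shell out from Java, a minimal sketch with ProcessBuilder (the sed command and file names are purely illustrative):

Process process = new ProcessBuilder("sed", "-e", "s/old/new/g", "huge.txt")
        .redirectOutput(new File("huge.out"))           // stream sed's output straight to a file
        .redirectError(ProcessBuilder.Redirect.INHERIT)
        .start();
int exitCode = process.waitFor();                       // check this before trusting the result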
I am writing to a disk some text as bytes. I need to maximize my performance and write as complete pages.
Does anybody know what is the optimal size of a page in bytes when writing to disk?
If you use a BufferedWriter or buffered streams, you should be good. Java uses an 8K buffer by default, which is sufficient for most usage patterns. Is there anything specific about your use case (for example, fixed-length records that must be written and fetched from disk in a single shot) that makes you want to optimize beyond what Java already provides?
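For instance, the buffer size can be set explicitly if the default ever turns out to matter (a sketch; the file name and the 64 KB figure are arbitrary):

try (BufferedWriter writer = new BufferedWriter(new FileWriter("data.txt"), 64 * 1024)) {
    writer.write(text); // flushed to disk in buffer-sized chunks
}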
I'm creating a compression algorithm in Java;
to use my algorithm I require a lot of information about the structure of the target file.
After collecting the data, I need to reread the file. <- But I don't want to.
While rereading the file, I make it a good target for compression by 'converting' the data of the file to a rather peculiar format. Then I compress it.
The problems now are:
I don't want to open a new FileInputStream for rereading the file.
I don't want to save the converted file, which is usually 150% of the size of the target file, to the disk.
Are there any ways to 'reset' a FileInputStream to move back to the start of the file, and how would I store the huge amount of 'converted' data efficiently without writing it to the disk?
You can use one or more RandomAccessFiles. You can memory-map them as ByteBuffers, which don't consume heap (the buffer objects themselves take only about 128 bytes each) or direct memory, yet can be accessed randomly.
Your temporary data can be stored in direct ByteBuffer(s) or in more memory-mapped files. Since you have random access to the original data, you may not need to duplicate as much data in memory as you think.
This way you can access the whole data with just a few KB of heap.
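A rough sketch of that setup (sourceFile is illustrative; a single mapping is limited to 2 GB, so larger files need several):

try (RandomAccessFile raf = new RandomAccessFile(sourceFile, "r");
     FileChannel channel = raf.getChannel()) {
    // the original data, mapped outside the heap and readable at any offset
    MappedByteBuffer source = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
    // off-heap scratch space for the 'converted' data
    ByteBuffer converted = ByteBuffer.allocateDirect(64 * 1024 * 1024);
    // ... run the conversion/compression against source and converted here
}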
There's the reset method, but you need to wrap the FileInputStream in a BufferedInputStream.
You could use RandomAccessFile, or perhaps a java.nio ByteBuffer is what you are looking for (I do not know).
Resources might be saved by pipes/streams: immediately writing to a compressed stream.
To answer your question on reset: it is not possible. The base class InputStream has provisions for mark and reset-to-mark, but FileInputStream was made optimal for several operating systems and does purely sequential input. Closing and reopening is best.
What is the best way to change a single byte in a file using Java? I've implemented this in several ways. One uses pure byte array manipulation, but this is highly sensitive to the amount of memory available and doesn't scale past 50 MB or so (i.e. I can't allocate 100 MB worth of byte[] without getting OutOfMemory errors). I also implemented it another way which works and scales, but it feels quite hacky.
If you're a Java IO guru and you had to contend with very large files (200-500 MB), how might you approach this?
Thanks!
I'd use RandomAccessFile, seek to the position I wanted to change and write the change.
If all I wanted to do was change a single byte, I wouldn't bother reading the entire file into memory. I'd use a RandomAccessFile, seek to the byte in question, write it, and close the file.
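A minimal sketch of that approach (path, offset and newValue are illustrative):

try (RandomAccessFile file = new RandomAccessFile(path, "rw")) {
    file.seek(offset);    // jump straight to the byte to change
    file.write(newValue); // writes the low-order byte of the int
}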