Is it possible to replace part of a file's content without rewriting the entire file to disk?
Say I have a very large file of several gigabytes: how do I replace the bytes from, let's say, position 100 to 200 without rewriting the entire file?
As an added bonus, I need a solution that does not use any features newer than Java 1.4.
If you're positive that you're going to be writing exactly the same number of bytes, you can use a RandomAccessFile to accomplish this (available since Java 1.0). Just open the file, seek to wherever you need to be, and overwrite those bytes with whatever your new data is.
RandomAccessFile f = new RandomAccessFile(new File("C:\\test\\huge.txt"), "rw"); // "rw" does not truncate
f.seek(100); // Seek ahead to byte 100
f.write("here is some new stuff".getBytes()); // overwrites the next 22 bytes in place
f.close();
You can also read from the file at arbitrary points in the same fashion, in case you don't know exactly how much data you need to replace (e.g. so you can pad/truncate whatever you're writing to avoid doing something awful by accident).
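A slightly fuller sketch of the same idea, sticking to Java 1.4-era APIs (no generics, no try-with-resources) to match the question. The class `InPlaceOverwrite`, the method `replaceRange`, and its pad/truncate policy are hypothetical illustrations, not part of any library: the replacement is padded or truncated to exactly `end - start` bytes so the file length never changes.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class InPlaceOverwrite {

    // Overwrites bytes [start, end) of 'file' in place. The replacement is
    // padded with 'pad' (or truncated) to exactly end - start bytes, so the
    // rest of the file is never touched or rewritten.
    static void replaceRange(File file, long start, long end,
                             byte[] replacement, byte pad) throws IOException {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("bad range");
        }
        byte[] buf = new byte[(int) (end - start)];
        for (int i = 0; i < buf.length; i++) {
            buf[i] = (i < replacement.length) ? replacement[i] : pad;
        }
        RandomAccessFile raf = new RandomAccessFile(file, "rw"); // "rw" does not truncate
        try {
            if (end > raf.length()) {
                throw new IOException("range extends past end of file");
            }
            raf.seek(start);  // jump straight to the first byte to replace
            raf.write(buf);   // overwrite exactly buf.length bytes
        } finally {
            raf.close();
        }
    }
}
```

For the question's example, `replaceRange(new File("C:\\test\\huge.txt"), 100, 200, newBytes, (byte) ' ')` touches only those 100 bytes, however large the file is.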
Related
I have an ArrayList of gzipped XML files. Is it possible to view and manipulate the contents of these XML files without unzipping them and taking up disk space? If so, what would be the correct class(es) to use for this task?
I know I can create a GZIPInputStream from a FileInputStream of the gzipped file, but from there I'm not sure what to do. I have only this written:
GZIPInputStream in = new GZIPInputStream(new FileInputStream(zippedFiles.get(i)));
I need some way to parse text within the XML files and modify the XML itself, but again, extracting all of them would take up too much disk space.
What exactly are you trying to achieve? You can extract the file into memory using a ByteArrayOutputStream and convert it into a byte array that you pass to your XML parser library (converting it to a String and passing that is not recommended, because the encoding is specified inside the XML file itself and the conversion must therefore be done by the XML parser internally). Most XML parsers also support reading directly from any InputStream, so you could pass yours straight to the parser, which will probably reduce your memory consumption further. Disk space is only used when writing the data back, which you do by reversing the procedure described above. And because you replace the source file by overwriting it directly, no extra disk space is wasted.
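A minimal sketch of the stream-to-parser approach, using the JDK's built-in DOM parser and Transformer (any parser that accepts an InputStream would do). The class `GzipXmlEditor` and its methods are hypothetical. DOM keeps the whole document on the heap; for very large documents a streaming parser (SAX/StAX) would use less memory.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

class GzipXmlEditor {
    // Parses a gzipped XML file straight from the decompressing stream; the
    // uncompressed XML exists only in memory, never on disk.
    static Document read(File gz) throws Exception {
        InputStream in = new GZIPInputStream(new FileInputStream(gz));
        try {
            // Passing the InputStream (not a String) lets the parser honor the
            // encoding declared in the XML prolog.
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        } finally {
            in.close();
        }
    }

    // Serializes the (modified) document back through a GZIPOutputStream,
    // overwriting the original file.
    static void write(Document doc, File gz) throws Exception {
        OutputStream out = new GZIPOutputStream(new FileOutputStream(gz));
        try {
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
        } finally {
            out.close(); // also writes the gzip trailer
        }
    }
}
```

Assuming the question's list holds File objects, the loop body would be `read(zippedFiles.get(i))`, edit the DOM, then `write(doc, zippedFiles.get(i))`. Overwriting in place means a failure mid-write loses that file; writing to a temporary file and renaming it afterwards is safer, at the cost of briefly needing room for one extra compressed copy.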
The fact that they're in a list doesn't change much, but no.
Ignoring compression, files are stored linearly on disks. You can append to them cheaply, you can replace bytes cheaply, but you can't replace sequences of different lengths (like replace("Testing Procedure Specification", "TPS")) without rewriting the file after the modified substring.
Gzipping the file complicates things, but the same rule applies: in general, making arbitrary modifications to a file requires rewriting the file.
Your code for reading the files is on the right track, though. You can easily read through gzipped files as streams, decompressing them on the fly without ever writing the decompressed file to disk.
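To make this concrete: a length-changing edit such as `replace("Testing Procedure Specification", "TPS")` still rewrites the file, but it can be done without ever storing the decompressed data, by streaming through the old archive and writing a new one. A sketch, assuming UTF-8 content and `\n` line endings (the output always uses `\n`); the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipStreamRewrite {
    // Streams 'src' line by line, replacing every occurrence of 'from' with 'to',
    // and writes the result to a new gzip file 'dst'. Only one line is held in
    // memory at a time; nothing is decompressed to disk.
    static void rewrite(File src, File dst, String from, String to) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new FileInputStream(src)), "UTF-8"));
        try {
            Writer out = new BufferedWriter(new OutputStreamWriter(
                    new GZIPOutputStream(new FileOutputStream(dst)), "UTF-8"));
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    out.write(line.replace(from, to));
                    out.write('\n');
                }
            } finally {
                out.close(); // flushes and finishes the gzip stream
            }
        } finally {
            in.close();
        }
    }
}
```

Once `dst` is complete it can replace `src`; the extra disk space needed is one compressed copy, never the uncompressed data.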
I'm on mobile (Android) and have a large text file, about 50 MB. I want to be able to open the file, seek to a particular position, and then start reading data into a buffer from that point. Is using FileReader + BufferedReader the best way to do this if I want to use as little memory as possible?
BufferedReader in
= new BufferedReader(new FileReader("foo.txt"));
in.skip(byteCount); // in some cases I have to read from an offset
// start reading a line at a time here
I'll also need to write to the file, only ever appending data, so:
FileWriter w = new FileWriter("foo.txt", true);
w.write(someCharacters);
I'm primarily interested in knowing whether, by using the wrong file reader/writer classes, I might accidentally be loading the entire file contents into memory before the reads or writes.
Thanks
Basically you don't want to read the whole file, but just a certain portion of it. In this case use java.io.RandomAccessFile instead:
its seek() method is guaranteed to actually seek instead of reading and discarding data (which is what some implementations of InputStream.skip() do)
the seek() method can move the file pointer backwards, something you can't do with an InputStream
a getFilePointer() method is provided to get the current position in the file
it only reads what you tell it to read, so there's no fear of accidentally loading more than you want
My dictionary app uses RandomAccessFile to access about 45 MB of data, back when each Android app could use only 16 MB of RAM. A service running my dictionary engine on the same 45 MB of data uses only about 2 MB of RAM (and most of that was probably used by the Dalvik VM, not by my search engine). So this class definitely works as intended.
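A sketch of the RandomAccessFile approach for the question's read-from-an-offset case; the class `SeekAndRead` and its method are hypothetical. It assumes the file is UTF-8 and that `byteOffset` falls on a character (ideally a line) boundary, since seeking by bytes into the middle of a multi-byte character would garble the first line. Note that the question's `Reader.skip()` skips characters, not bytes, whereas `seek()` moves the file pointer to a byte position directly.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.nio.channels.Channels;
import java.util.ArrayList;
import java.util.List;

class SeekAndRead {
    // Reads up to maxLines lines starting at byteOffset. Only the
    // BufferedReader's buffer and the returned lines are held in memory,
    // never the whole file.
    static List<String> readLinesFrom(File file, long byteOffset, int maxLines)
            throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        try {
            raf.seek(byteOffset);
            // The channel shares raf's position, so this stream starts at byteOffset.
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    Channels.newInputStream(raf.getChannel()), "UTF-8"));
            List<String> lines = new ArrayList<String>();
            String line;
            while (lines.size() < maxLines && (line = in.readLine()) != null) {
                lines.add(line);
            }
            return lines;
        } finally {
            raf.close(); // also closes the channel
        }
    }
}
```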
You could try using a memory-mapped file (java.nio.channels.FileChannel.map()). I'm not sure how much heap space would be allocated for this, though.
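A sketch of the memory-mapped variant; the class `MappedRead` and its method are hypothetical. On typical JVMs a MappedByteBuffer is backed by the OS rather than by the Java heap, so the mapping mostly costs address space; only the bytes copied out with `get()` land on the heap.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class MappedRead {
    // Maps up to 'length' bytes starting at 'offset' read-only and copies them
    // into a heap array.
    static byte[] readRegion(File file, long offset, int length) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        try {
            FileChannel ch = raf.getChannel();
            // Clamp so a read-only mapping never extends past the end of the file.
            int n = (int) Math.max(0, Math.min(length, ch.size() - offset));
            if (n == 0) {
                return new byte[0];
            }
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, offset, n);
            byte[] out = new byte[n];
            map.get(out);
            return out;
        } finally {
            raf.close(); // closing the channel does not invalidate the mapping
        }
    }
}
```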
I searched and looked at multiple questions like this, but my question is really different from anything I found. I've looked at the Javadocs.
How do I get the equivalent of this C file open:
stream1 = fopen (out_file, "r+b");
Once I've done a partial read from the file, the first write makes the next read return EOF no matter how many bytes were in the file.
Essentially, I want a file I/O stream that doesn't do that. The whole purpose of what I'm trying to do is to replace bytes in an existing file, in place. I don't want to work on a copy or make a copy before I do the read->write.
You can use a RandomAccessFile.
As Perception mentions, you can use a RandomAccessFile. Also, in some situations, a FileChannel may work better. I've used these to handle binary file data with great success.
EDIT: you can get a FileChannel from the RandomAccessFile object using getChannel().
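A sketch of the `"r+b"` pattern with RandomAccessFile. Mode `"rw"` opens for reading and writing without truncating (unlike `"r+b"`, it also creates the file if it is missing), and a single file pointer serves both operations, so reading, seeking back, overwriting, and reading on works as with `fopen(..., "r+b")` plus `fseek`. The class `ReadModifyWrite` is hypothetical, and upper-casing is only a stand-in for whatever in-place edit you need.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class ReadModifyWrite {
    // Upper-cases ASCII letters in place, chunk by chunk, alternating reads
    // and writes on the same RandomAccessFile.
    static void upperCaseAsciiInPlace(File file) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, "rw");
        try {
            byte[] buf = new byte[4096];
            long pos = 0;
            int n;
            while ((n = raf.read(buf)) > 0) {
                for (int i = 0; i < n; i++) {
                    if (buf[i] >= 'a' && buf[i] <= 'z') {
                        buf[i] -= 'a' - 'A';
                    }
                }
                raf.seek(pos);        // back to where this chunk started
                raf.write(buf, 0, n); // overwrite it; the pointer is now pos + n
                pos += n;             // the next read continues after the chunk
            }
        } finally {
            raf.close();
        }
    }
}
```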
EDIT
This is my file reader. Can I make it read the file from the bottom up, given how difficult it is to make it write from the bottom up?
BufferedReader mainChat = new BufferedReader(new FileReader("./messages/messages.txt"));
String str;
while ((str = mainChat.readLine()) != null)
{
System.out.println(str);
}
mainChat.close();
OR (old question)
How can I make it put the next String at the beginning of the file and then insert a new line (to shift the other lines down)?
FileWriter chatBuffer = new FileWriter("./messages/messages.txt",true);
BufferedWriter mainChat = new BufferedWriter(chatBuffer);
mainChat.write(message);
mainChat.newLine();
mainChat.flush();
mainChat.close();
Someone may correct me, but I'm pretty sure that in most operating systems there is no option but to read the whole file in, then write it back again.
I suppose the main reason is that, in most modern OSs, every file on the disk starts at a block boundary, and you cannot tell the file allocation table that your file starts earlier than that point.
Therefore, all the later bytes in the file have to be rewritten. I don't know of any OS routines that do this in one step.
So I would use a BufferedReader to read the whole file into a Vector or StringBuffer, then write it all back with the new string first.
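A sketch of this read-everything-then-rewrite approach using only the JDK. The class `PrependByRewrite` is hypothetical; it assumes UTF-8 text, uses StringBuilder (the unsynchronized counterpart of StringBuffer), and reads raw characters rather than using readLine() so existing line endings are preserved. Because the whole file sits in memory, it suits small files such as a chat log.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

class PrependByRewrite {
    // Reads the whole file into memory, then rewrites it with 'line' first.
    static void prependLine(File file, String line) throws IOException {
        StringBuilder content = new StringBuilder();
        Reader in = new InputStreamReader(new FileInputStream(file), "UTF-8");
        try {
            char[] buf = new char[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                content.append(buf, 0, n);
            }
        } finally {
            in.close();
        }
        // Opening without the append flag truncates the file before rewriting it.
        Writer out = new OutputStreamWriter(new FileOutputStream(file), "UTF-8");
        try {
            out.write(line);
            out.write(System.getProperty("line.separator")); // same as BufferedWriter.newLine()
            out.write(content.toString());
        } finally {
            out.close();
        }
    }
}
```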
--
Edit
A way that would save memory for larger files, following @Saury's RandomAccessFile suggestion, would be:
file has N bytes to start with
we want to add "hello world" (11 bytes) at the front
open the file for reading and writing (RandomAccessFile mode "rw")
append 11 spaces (the file is now N+11 bytes long)
i=N-1
loop {
go back to byte i
read a byte
move to byte i+11
write that byte back
i--
} until i<0
then move to byte 0
write "hello world"
voila
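A Java sketch of the shifting procedure above. The class `InPlacePrepend` is hypothetical, and it deviates from the steps in two deliberate ways: it grows the file with `setLength()` instead of appending spaces, and it moves 64 KB chunks instead of single bytes, which is far faster. Chunks are still processed from the end backwards, so no byte is overwritten before it has been read. If the process dies mid-shift, the file is left corrupted.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class InPlacePrepend {
    // Prepends 'prefix' to 'file' without holding the file in memory.
    static void prepend(File file, byte[] prefix) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, "rw");
        try {
            long oldLength = raf.length();
            int shift = prefix.length;
            raf.setLength(oldLength + shift); // make room at the end
            byte[] buf = new byte[64 * 1024];
            long end = oldLength; // bytes [0, end) still need to be moved
            while (end > 0) {
                int n = (int) Math.min(buf.length, end);
                long start = end - n;
                raf.seek(start);
                raf.readFully(buf, 0, n);      // read chunk [start, end)
                raf.seek(start + shift);
                raf.write(buf, 0, n);          // write it 'shift' bytes later
                end = start;
            }
            raf.seek(0);
            raf.write(prefix);                 // fill the gap at the front
        } finally {
            raf.close();
        }
    }
}
```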
Use FileUtils from Apache Commons IO to simplify this if you can. However, it still needs to read the whole file in, so it will be slow for large files.
List<String> newList = new ArrayList<String>(Arrays.asList("3")); // copy: Arrays.asList alone is fixed-size, so addAll would throw
File file = new File("./messages/messages.txt");
newList.addAll(FileUtils.readLines(file));
FileUtils.writeLines(file, newList);
FileUtils also has read/write methods that take an explicit encoding.
Use RandomAccessFile to read/write the file in reverse order. See the following links for more details.
http://www.java2s.com/Code/Java/File-Input-Output/UseRandomAccessFiletoreverseafile.htm
http://download.oracle.com/javase/1.5.0/docs/api/java/io/RandomAccessFile.html
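A sketch of reading a text file from the bottom up with RandomAccessFile, which fits the chat-log case (newest messages first) without loading the whole file. The class `ReverseLines` is my own illustration, not the code behind the links above. It assumes UTF-8, handles both `\n` and `\r\n` endings, and treats a trailing newline as ending the last line, as BufferedReader.readLine() does.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

class ReverseLines {
    // Returns up to 'max' lines of a UTF-8 text file, last line first, by
    // reading the file backwards in 4 KB chunks.
    static List<String> lastLines(File file, int max) throws IOException {
        List<String> result = new ArrayList<String>();
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        try {
            long length = raf.length();
            long pos = length;
            byte[] buf = new byte[4096];
            ByteArrayOutputStream line = new ByteArrayOutputStream(); // bytes in reverse order
            boolean lastByteOfFile = true;
            while (pos > 0 && result.size() < max) {
                int n = (int) Math.min(buf.length, pos);
                pos -= n;
                raf.seek(pos);
                raf.readFully(buf, 0, n);
                for (int i = n - 1; i >= 0 && result.size() < max; i--) {
                    if (buf[i] == '\n') {
                        if (!lastByteOfFile) { // a trailing newline only ends the last line
                            result.add(decodeReversed(line));
                            line.reset();
                        }
                    } else {
                        line.write(buf[i]);
                    }
                    lastByteOfFile = false;
                }
            }
            // The first line of the file has no newline in front of it.
            if (length > 0 && result.size() < max) {
                result.add(decodeReversed(line));
            }
        } finally {
            raf.close();
        }
        return result;
    }

    // Restores byte order, decodes UTF-8 and drops a '\r' left over from "\r\n".
    private static String decodeReversed(ByteArrayOutputStream reversed) throws IOException {
        byte[] b = reversed.toByteArray();
        for (int i = 0, j = b.length - 1; i < j; i++, j--) {
            byte t = b[i];
            b[i] = b[j];
            b[j] = t;
        }
        String s = new String(b, "UTF-8");
        return s.endsWith("\r") ? s.substring(0, s.length() - 1) : s;
    }
}
```

Reversing the bytes back before decoding keeps multi-byte UTF-8 characters intact, and the byte 0x0A never occurs inside a multi-byte sequence, so splitting on it is safe.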
As was suggested here, prepending to a file is rather difficult, and this is indeed linked to how files are stored on the hard drive. The operation is not natively provided by the OS, so you will have to implement it yourself, and the most obvious approaches involve reading the whole file and writing it again. This may be fine for you, but it incurs significant costs and could become a bottleneck for your application's performance.
Appending would be the natural choice, but, as far as I understand, it would make reading the file unnatural.
There are many ways you could tackle this, depending on the specifics of your situation.
If writing this file is not time-critical in your application and the file does not grow too big, you could bite the bullet: read the whole file, prepend the information, and write it again. Apache Commons IO's FileUtils will help here by simplifying the operation: you can read the file as a list of strings, prepend the new lines to the list, and write the list again.
If writing is time-critical but you have control over how the file is read (that is, if the file is read by another of your own programs), you could keep appending, then load the file as a list of lines and reverse the list when reading. Again, FileUtils from Commons IO and the helper functions in the JDK's Collections class should do the trick nicely.
If writing is time-critical but the file is intended to be read in a normal text editor, you could create a small class or program that reads the file and writes it to another file in the preferred order.
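For the case where you control the reader, a sketch using only the JDK: new messages are appended, which never rewrites existing data, and the reader reverses the list of lines with `Collections.reverse`. The class `AppendThenReverse` and its methods are hypothetical; UTF-8 is assumed.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class AppendThenReverse {
    // Appending is cheap: existing data is never rewritten.
    static void appendLine(File file, String line) throws IOException {
        Writer out = new OutputStreamWriter(new FileOutputStream(file, true), "UTF-8");
        try {
            out.write(line);
            out.write('\n');
        } finally {
            out.close();
        }
    }

    // The reader pays the cost instead: load all lines, then reverse them.
    static List<String> readNewestFirst(File file) throws IOException {
        List<String> lines = new ArrayList<String>();
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), "UTF-8"));
        try {
            String s;
            while ((s = in.readLine()) != null) {
                lines.add(s);
            }
        } finally {
            in.close();
        }
        Collections.reverse(lines);
        return lines;
    }
}
```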
So, here is the situation:
I have to read big .gz archives (GBs) and kind of "index" them to later on be able to retrieve specific pieces using random access.
In other words, I wish to read the archive line by line, and be able to get the specific location in the file for any such line. (so that I can jump directly to these specific locations upon request). (PS: ...and it's UTF-8 so we cannot assume 1 byte == 1 char.)
So, basically, what I just need is a BufferedReader which keeps track of its location in the file. However, this doesn't seem to exist.
Is there anything available or do I have to roll my own?
A few additional comments:
I cannot use BufferedReader directly, since the file position corresponds to what has been buffered so far, i.e. a multiple of the internal buffer size rather than the position of the line.
I cannot use InputStreamReader directly, for performance reasons: unbuffered would be way too slow, and it also lacks convenience methods for reading lines.
I cannot use RandomAccessFile, since 1. the file is gzipped, and 2. RandomAccessFile uses "modified" UTF-8.
I guess the best option would be a kind of buffered reader that keeps track of the file position and buffer offset, but this sounds quite cumbersome. Maybe I missed something, and there is already something that reads files line by line and keeps track of the position (even if gzipped).
Thanks for any tips,
Arnaud
I think jzran could be pretty much what you're looking for:
"It's a Java library based on the zran.c sample from zlib. You can preprocess a large gzip archive, producing an "index" that can be used for random read access. You can balance between index size and access speed."
What you are looking for is called mark(), markSupported() and skip().
These methods are declared in both InputStream and Reader, so you are welcome to use them.
GZIP compression does not support seeking: decompressing any point in the stream requires the preceding data (the DEFLATE sliding window), so you cannot jump directly to an arbitrary offset.
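A plain-JDK sketch of the indexing idea (not jzran; the class `GzipLineIndex` is hypothetical). The index records the uncompressed byte offset at which each line starts; because offsets are counted in bytes rather than characters, UTF-8 does not skew them, and decoding only happens when a line is retrieved. Since gzip cannot seek, retrieval re-opens the archive and uses `InputStream.skip()`, which on a GZIPInputStream decompresses and discards data, so each lookup costs time proportional to the offset. Avoiding that cost is what jzran's index is for.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

class GzipLineIndex {
    // Returns the uncompressed byte offset at which each line starts.
    static List<Long> buildIndex(File gz) throws IOException {
        List<Long> offsets = new ArrayList<Long>();
        InputStream in = new BufferedInputStream(new GZIPInputStream(new FileInputStream(gz)));
        try {
            long pos = 0;
            boolean atLineStart = true;
            int b;
            while ((b = in.read()) != -1) {
                if (atLineStart) {
                    offsets.add(pos);
                    atLineStart = false;
                }
                pos++;
                if (b == '\n') {
                    atLineStart = true; // 0x0A never occurs inside a UTF-8 multi-byte sequence
                }
            }
        } finally {
            in.close();
        }
        return offsets;
    }

    // Re-opens the archive, skips (by decompressing) to 'offset', and reads one line.
    static String readLineAt(File gz, long offset) throws IOException {
        InputStream in = new BufferedInputStream(new GZIPInputStream(new FileInputStream(gz)));
        try {
            long remaining = offset;
            while (remaining > 0) {
                long skipped = in.skip(remaining);
                if (skipped <= 0) {
                    throw new EOFException("offset beyond end of data");
                }
                remaining -= skipped;
            }
            ByteArrayOutputStream line = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1 && b != '\n') {
                line.write(b);
            }
            String s = new String(line.toByteArray(), "UTF-8");
            return s.endsWith("\r") ? s.substring(0, s.length() - 1) : s;
        } finally {
            in.close();
        }
    }
}
```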