Getting a wrong FileChannel size - java

I'm trying to get the size of a file contained in assets.
I'm using a FileChannel because I need a FileChannel later.
The file myfile.txt contains 7 bytes.
Here is my code:
AssetManager amgr;
AssetFileDescriptor afd;
FileChannel fchIn;
FileInputStream fis;
amgr = context.getAssets();
afd = amgr.openFd("myfile.txt");
fis = afd.createInputStream();
fchIn = fis.getChannel();
Log.d("mytag", String.valueOf(fchIn.size()));  // Log.d takes a String, not a long
Log.d("mytag", String.valueOf(fis.available()));
And the output is:
7237492
7
Why is the size returned by the FileChannel.size() method wrong?
Thanks for your help

FileInputStream.getChannel() documentation says it "Returns a read-only FileChannel that shares its position with this stream." You are assuming that the channel begins and ends exactly at the boundaries of your file, which is the only way its total size() would match what you expect, but the documentation does not make that guarantee.
Also note that FileInputStream.available() is not documented to mean the same thing as total size -- technically, it is the amount of data available to read without needing to load/buffer more from the source. Unless the entire source has already been read into memory, it may have almost nothing to do with the actual file size.
I expect that the underlying FileChannel has access to a large range of bytes spanning multiple files, like all of your assets combined (hence the large size), and you'd need to reference the AssetFileDescriptor's getStartOffset() and getLength() methods to know the actual position of the associated file's bytes within the channel.
My guess is that AssetFileDescriptor.createInputStream() is giving you an object that already knows the position and range and takes care of that for you. If you're going to use the raw channel, then honor the information in the file descriptor.
Also note that, per the documentation, if you move the position of the FileInputStream (e.g. by reading bytes) then you also move the position of the channel at the same time, so be careful if you're trying to use both.
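For illustration, a minimal sketch of honoring the descriptor (the getter names are from the Android API; the surrounding setup is the question's own):
AssetFileDescriptor afd = context.getAssets().openFd("myfile.txt");
FileInputStream fis = afd.createInputStream();
FileChannel fchIn = fis.getChannel();
// The channel spans the whole backing file (e.g. the packed assets),
// so use the descriptor to locate this asset's slice of it:
long start = afd.getStartOffset(); // where myfile.txt begins within the channel
long length = afd.getLength();     // the asset's real size
Log.d("mytag", "channel size: " + fchIn.size()); // size of the backing file
Log.d("mytag", "asset length: " + length);       // 7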

Related

Access the same file for read and write random position

Let me start by saying that I don't have much experience with Java, though I've done a lot of research. I'd like to ask you a specific question.
Thank you
I need to open a file for reading and writing, from which I read and write 512-byte blocks.
The file is fixed-length, and the information to be written will overwrite existing data.
For example, I read the first 512 bytes of the file, and if they contain certain values, I write a 512-byte block at position 2048.
I tried using FileInputStream and FileOutputStream, but every time I open the file with FileOutputStream its contents are deleted.
Can this be done in Java?
Roberto
Use a FileChannel; it allows random access to any part of a file, in read, write or any combination of both.
Example:
final Path path = Paths.get("path/to/the/file");
// open with whatever combination of options you need, e.g. both read and write
final FileChannel channel = FileChannel.open(path,
    StandardOpenOption.READ, StandardOpenOption.WRITE);
Optionally, after that, you can use the .map() method.
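For instance, a minimal sketch of the 512-byte read-modify-write described in the question, reusing the channel opened above (uses java.nio.ByteBuffer; the block positions and the test condition are placeholders):
ByteBuffer block = ByteBuffer.allocate(512);
channel.read(block, 0);          // read the first 512-byte block
block.flip();
if (block.get(0) == 0x42) {      // placeholder condition on the block's contents
    block.rewind();
    channel.write(block, 2048);  // overwrite the 512 bytes at position 2048
}
channel.close();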

Java: Replace part of file without writing the entire file again

Is it possible to replace part of a file's content without rewriting the entire file to disk?
Say that I have a very large file of several gigabytes; how do I replace the bytes from, let's say, position 100 to 200 without rewriting the entire file?
As an added bonus, I need a solution that does not use any features newer than Java 1.4.
If you're positive that you're going to be writing exactly the same number of bytes, you can use a RandomAccessFile to accomplish this (available since Java 1.0). Just open the file, seek to wherever you need to be, and overwrite those bytes with whatever your new data is.
RandomAccessFile f = new RandomAccessFile(new File("C:\\test\\huge.txt"), "rw");
f.seek(100); // seek ahead to position 100
f.write("here is some new stuff".getBytes());
f.close();
You can also read from the file at arbitrary points in the same fashion, in case you don't know exactly how much data you need to replace (e.g. so you can pad/truncate whatever you're writing to avoid doing something awful by accident).
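For instance, a sketch of inspecting bytes 100..199 before replacing them, still using nothing newer than Java 1.4 (the padding scheme here is just an example):
RandomAccessFile f = new RandomAccessFile(new File("C:\\test\\huge.txt"), "rw");
byte[] current = new byte[100];
f.seek(100);            // jump to position 100
f.readFully(current);   // inspect the bytes about to be replaced
byte[] replacement = "here is some new stuff".getBytes();
byte[] padded = new byte[100];  // pad to exactly the original length
System.arraycopy(replacement, 0, padded, 0, replacement.length);
f.seek(100);            // seek back before overwriting
f.write(padded);        // replaces bytes 100..199; nothing else moves
f.close();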

Avoid obtaining same InputStream more than once

I can see there are a number of posts regarding reusing an InputStream. I understand an InputStream is a one-time thing and cannot be reused.
However, I have a use case like this:
I have downloaded a file from DropBox by obtaining a DropBoxInputStream through DropBox's Java SDK. I then need to upload the file to another system by passing along the InputStream. However, as part of the upload, I have to provide the MD5 of the file, so I have to read the file from the stream before uploading it. Because the DropBoxInputStream I received can only be used once, I have to get another DropBoxInputStream after I have calculated the MD5 and before uploading the file. The procedure is:
Get first DropBoxInputStream
Read from the DropBoxInputStream and calculate MD5
Get the second DropBoxInputStream
Upload the file using the MD5 and the second DropBoxInputStream.
I am wondering: is there a way for me to "cache" or "back up" the InputStream before I calculate the MD5, so that I can skip step 3 and avoid obtaining the same DropBoxInputStream again?
Many thanks
EDIT:
Sorry I missed some information.
What I am currently doing is using an MD5DigestOutputStream to calculate the MD5. I stream the data through the MD5DigestOutputStream and save it locally as a temp file. Once the data has passed through the MD5DigestOutputStream, the MD5 is available.
I then call a third-party library to upload the file, using the calculated MD5 and a FileInputStream which reads from the temp file.
However, this sometimes requires huge disk space, and I want to remove the need for a temp file. The library I use only accepts an MD5 and an InputStream, which means I have to calculate the MD5 on my end. My plan is to use my MD5DigestOutputStream to write the data to /dev/null (not keeping the file) so that I can calculate the MD5, then get the InputStream from DropBox again and pass that to the library. I assume the library will be able to get the file directly from DropBox without me needing to cache it either in memory or on disk. Will it work?
Input streams aren't really designed for copying or re-use; they're specifically for situations where you don't want to read everything into a byte array and use array operations on it (this is especially useful when the whole array isn't available yet, as in, e.g., socket communication). You could buffer up into a byte array, which is the process of reading sections from the stream into a byte array until you have enough information.
But that's unnecessary for calculating an MD5. Notice that InputStream is abstract, so it needs to be implemented in an extended class. It has many implementations: GZIPInputStream, FileInputStream, etc. These are, in design-pattern speak, decorators of the IO stream: they add extra functionality to the abstract base IO classes. For example, GZIPInputStream decompresses a gzipped stream as you read it.
So, what you need is a stream that does this for MD5. There is, joyfully, a well-documented similar thing: see this answer. You should just be able to pass your DropBox input stream (as it is itself an InputStream) to the constructor of a new DigestInputStream, and then you can both take the MD5 and continue to read as before.
Worried about type casting? The idea with decorators in Java is that, since the InputStream base class exposes all the methods and 'beef' you need to do your IO, there's no harm in passing instances of objects inheriting from InputStream to the constructor of each stream implementation, and you can still do the same core IO.
Finally, I should probably answer your actual question: say you still want to "cache" or "back up" the stream anyway? Well, you could just write it out to a byte array. This is well documented, but can become a faff when your streams get more complicated. Alternatively, try a PushbackInputStream: you can easily write a function to read off n bytes, perform an operation on them, and then restore them to the stream. It's generally good to avoid such stream implementations, as they're bad for memory use, but no worse than buffering everything up, which you'd otherwise have to do.
Or, of course, I would have a go with DigestInputStream, sketched below.
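A minimal sketch of that approach ('dropboxIn' stands for whatever InputStream the DropBox SDK handed you):
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;

MessageDigest md = MessageDigest.getInstance("MD5");
InputStream in = new DigestInputStream(dropboxIn, md);
byte[] buf = new byte[8192];
while (in.read(buf) != -1) {
    // every byte read through 'in' also updates the digest;
    // hand the bytes to whatever consumes them as you go
}
byte[] md5 = md.digest(); // the MD5 of everything that passed through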
Hope this helps,
Best.
You don't need to open a new InputStream from DropBox.
Once you have read the file from DropBox, you have it locally. So it is either in memory (in a byte array) or you stored it in a local file. Now you can create an InputStream that reads the data from memory (ByteArrayInputStream) or disk (FileInputStream) in order to upload the file.
So instead of caching the InputStream (which you can't) you cache the contents (which you can).
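A minimal sketch of caching the contents, assuming the file fits in memory ('dropboxIn' again stands for the SDK's stream):
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

ByteArrayOutputStream cache = new ByteArrayOutputStream();
byte[] buf = new byte[8192];
int n;
while ((n = dropboxIn.read(buf)) != -1) {
    cache.write(buf, 0, n);    // keep a copy of everything read
}
byte[] contents = cache.toByteArray();
// compute the MD5 from 'contents', then upload from a fresh stream:
InputStream uploadIn = new ByteArrayInputStream(contents);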

Reading a gz file and keeping track of position in file

So, here is the situation:
I have to read big .gz archives (GBs) and "index" them so that, later on, I can retrieve specific pieces using random access.
In other words, I wish to read the archive line by line, and be able to get the location in the file of any such line (so that I can jump directly to these specific locations upon request). (PS: ...and it's UTF-8, so we cannot assume 1 byte == 1 char.)
So, basically, what I just need is a BufferedReader which keeps track of its location in the file. However, this doesn't seem to exist.
Is there anything available or do I have to roll my own?
A few additional comments:
I cannot use BufferedReader directly, since the file location corresponds to what has been buffered so far; in other words, a multiple of the internal buffer size rather than the location of the current line.
I cannot use InputStreamReader directly for performance reasons. Unbuffered it would be way too slow, and, by the way, it lacks convenience methods for reading lines.
I cannot use RandomAccessFile since 1. it's zipped, and 2. RandomAccessFile uses "modified" UTF-8.
I guess the best option would be a kind of buffered reader that keeps track of file location and buffer offset... but this sounds quite cumbersome. Maybe I missed something; perhaps there is already something out there to read files line by line while keeping track of the location (even when zipped).
Thanks for tips,
Arnaud
I think jzran could be pretty much what you're looking for:
It's a Java library based on the zran.c sample from zlib. You can preprocess a large gzip archive, producing an "index" that can be used for random read access. You can balance between index size and access speed.
What you are looking for is called mark(), markSupported() and skip().
These methods are declared both in InputStream and Reader, so you are welcome to use them.
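For example (the file name is a placeholder; note the BufferedInputStream wrapper, since GZIPInputStream itself does not support mark()):
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

InputStream in = new BufferedInputStream(
        new GZIPInputStream(new FileInputStream("archive.gz")));
if (in.markSupported()) {
    in.mark(8192);  // remember this (uncompressed) position, valid for up to 8192 bytes
    in.skip(512);   // read ahead in the decompressed data...
    in.reset();     // ...and jump back to the mark
}
in.close();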
GZIP compression does not support seeking. Previous data blocks are needed to build compression tables...

IOException while reading from InputStream

I'm running into a strange problem while reading from an InputStream on the Android platform. I'm not sure if this is an Android specific issue, or something I'm doing wrong in general.
The only thing that is Android specific is this call:
InputStream is = getResources().openRawResource(R.raw.myfile);
This returns an InputStream for a file from the Android assets. Anyways, here's where I run into the issue:
byte[] buffer = new byte[2];
is.read(buffer);
When the read() executes it throws an IOException. The weird thing is that if I do two sequential single-byte reads (or any number of single-byte reads), there is no exception. For example, this works:
byte buffer;
buffer = (byte) is.read();
buffer = (byte) is.read();
Any idea why two sequential single byte reads work but one call to read both at once throws an exception? The InputStream seems fine... is.available() returns over a million bytes (as it should).
Stack trace shows these lines just before the InputStream.read():
java.io.IOException
at android.content.res.AssetManager.readAsset(Native Method)
at android.content.res.AssetManager.access$800(AssetManager.java:36)
at android.content.res.AssetManager$AssetInputStream.read(AssetManager.java:542)
Changing the buffer size to a single byte still throws the error. It looks like the exception is only raised when reading into a byte array.
If I truncate the file to 100,000 bytes (file is: 1,917,408 bytes originally) it works fine. Is there a problem with files over a certain size?
Any help is appreciated!
Thanks!
(my post to android-developers isn't showing up, so I'll try reposting it here)
IIRC, this problem comes from trying to access files that were compressed as part of building the APK.
Hence, to work around the issue, give it a file extension that won't be compressed. I forget the list of what extensions are skipped, but file types known to already be compressed (e.g., mp3, jpg) may work.
Changing the file extension to .mp3 to avoid the file compression does work, but the APK of the app is much bigger (in my case 2.3 MB instead of 0.99 MB).
Is there any other way to avoid this issue?
Here is my answer:
Load files bigger than 1M from assets folder
You can compress the file yourself with GZIP and unpack it at runtime with the GZIPInputStream class, as sketched below.
http://developer.android.com/reference/java/util/zip/GZIPInputStream.html
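A minimal sketch of that approach (R.raw.myfile_gz is a placeholder for your own gzipped copy of the data, stored so that the build tools don't compress it again):
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

InputStream is = new GZIPInputStream(getResources().openRawResource(R.raw.myfile_gz));
byte[] buffer = new byte[4096];
int n;
while ((n = is.read(buffer)) != -1) {
    // read decompressed bytes as usual
}
is.close();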
You are correct in that there is a certain size limit for extracting files. You may wish to split larger files into 1MB pieces, and have a method by which you know which files are made of which pieces, stitching them back together again when your app runs.
