Best way to read an internal file on Android? - java

I've been Googling up some answers, and I can't seem to find the best one.
Here's what I have so far for reading internal files on Android:
fis = openFileInput("MY_FILE");
StringBuilder fileContent = new StringBuilder("");
byte[] buffer = new byte[fis.available()];
while (fis.read(buffer) != -1) {
fileContent.append(new String(buffer));
}
MYVARIABLE = fileContent.toString();
fis.close();
It used to leave a lot of whitespace, but I now use the available() method to size the buffer so that it only returns what I need.
Is there a faster or shorter way to write this? I can't seem to find any good ones in the API guide.

1). The API documentation for available() says it should not be used for this purpose:
Note that this method provides such a weak guarantee that it is not very useful in practice.
In other words, it may not return the file size.
2). When you read something into RAM, take into account that the file can be large, so try to avoid using extra memory. For this, a relatively small buffer (1~8 KB) is used to read from the source and append to the result. On the other hand, a buffer that is too small (say, a few bytes) slows reading down significantly.
3). Reading bytes differs from reading characters, because a single character may be represented by more than one byte (depending on the encoding). To read chars, use the specific classes that are aware of the encoding and know how to convert bytes to chars properly. InputStreamReader is one such class.
4). The encoding used for reading should be the encoding that was used to persist the data.
Taking all of the above into account, I would use something like this:
public static String getStringFromStream(InputStream in, String encoding)
        throws IOException {
    InputStreamReader reader;
    if (encoding == null) {
        // This constructor sets the character converter to the encoding
        // specified in the "file.encoding" property and falls back
        // to ISO 8859_1 (ISO-Latin-1) if the property doesn't exist.
        reader = new InputStreamReader(in);
    } else {
        reader = new InputStreamReader(in, encoding);
    }
    StringBuilder sb = new StringBuilder();
    final char[] buf = new char[1024];
    int len;
    while ((len = reader.read(buf)) > 0) {
        sb.append(buf, 0, len);
    }
    return sb.toString();
}
5). Make sure to always close an InputStream when done working with it.
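As a rough sketch of points 3 to 5 together, the following reads a stream with an explicit charset and lets try-with-resources do the closing. The class name StreamText and the method readAll are made up for illustration, and StandardCharsets assumes Java 7 or a reasonably recent Android API level. On Android the stream would typically come from openFileInput; here any InputStream works.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the JDK or Android APIs.
class StreamText {

    // Reads the whole stream as text in the given charset, then closes it.
    static String readAll(InputStream in, Charset charset) throws IOException {
        // try-with-resources closes the reader (and the wrapped stream)
        // even if read() throws.
        try (Reader reader = new InputStreamReader(in, charset)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int len;
            while ((len = reader.read(buf)) != -1) {
                sb.append(buf, 0, len); // append only the chars actually read
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Text with multi-byte UTF-8 characters, to show the decoding step matters.
        byte[] data = "h\u00e9llo w\u00f6rld".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(data), StandardCharsets.UTF_8));
    }
}
```

Because the decoding is done by the reader rather than by new String(buffer) per chunk, a multi-byte character that straddles two reads is still decoded correctly.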
Sure, there is more than one way to read text from a file in Java/Android. This is mostly because the Java API contains several generations of IO APIs. For instance, the classes in the java.nio package were created to be more efficient, but there is usually no strong reason to use them (don't fall into the sin of premature optimization).

Related

Java reading a file different methods

It seems that there are many, many ways to read text files in Java (BufferedReader, DataInputStream etc.) My personal favorite is Scanner with a File in the constructor (it's just simpler, works with mathy data processing better, and has familiar syntax).
Boris the Spider also mentioned Channel and RandomAccessFile.
Can someone explain the pros and cons of each of these methods? To be specific, when would I want to use each?
(edit) I think I should be specific and add that I have a strong preference for the Scanner method. So the real question is, when wouldn't I want to use it?
Let's start at the beginning. The question is: what do you want to do?
It's important to understand what a file actually is. A file is a collection of bytes on a disc; these bytes are your data. Java provides various levels of abstraction above that:
File(Input|Output)Stream - read these bytes as a stream of byte.
File(Reader|Writer) - read from a stream of bytes as a stream of char.
Scanner - read from a stream of char and tokenise it.
RandomAccessFile - read these bytes as a seekable byte[].
FileChannel - read these bytes in a safe multithreaded way.
On top of each of those there are the Decorators, for example you can add buffering with BufferedXXX. You could add linebreak awareness to a FileWriter with PrintWriter. You could turn an InputStream into a Reader with an InputStreamReader (currently the only way to specify character encoding for a Reader).
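To make the layering concrete, here is a small sketch that stacks the decorators mentioned above: a FileInputStream for the bytes, an InputStreamReader to decode them with an explicit charset, and a BufferedReader for buffering and line awareness. The class and method names (DecoratorDemo, readLines) are invented for this example.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; the names are not from any library.
class DecoratorDemo {

    static List<String> readLines(Path path) throws IOException {
        List<String> lines = new ArrayList<>();
        // bytes -> chars (explicit encoding) -> buffered, line-aware reader
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(
                        new FileInputStream(path.toFile()), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("decorator-demo", ".txt");
        try {
            Files.write(tmp, "first\nsecond\n".getBytes(StandardCharsets.UTF_8));
            System.out.println(readLines(tmp));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Closing the outermost decorator closes the whole chain, which is why a single try-with-resources variable is enough here.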
So - when wouldn't I want to use it [a Scanner]?
You would not use a Scanner if you wanted to do any of the following (these are just some examples):
Read in data as bytes
Read in a serialized Java object
Copy bytes from one file to another, maybe with some filtering.
It is also worth noting that the Scanner(File file) constructor takes the File and opens a FileInputStream with the platform default encoding - this is almost always a bad idea. It is generally recognised that you should specify the encoding explicitly to avoid nasty encoding-based bugs. Furthermore, the stream isn't buffered.
So you may be better off with
// file: the File (or path String) to read
try (final Scanner scanner = new Scanner(new BufferedInputStream(new FileInputStream(file)), "UTF-8")) {
    //do stuff
}
Ugly, I know.
It's worth noting that Java 7 provides a further layer of abstraction to remove the need to loop over files - these are in the Files class:
byte[] Files.readAllBytes(Path path)
List<String> Files.readAllLines(Path path, Charset cs)
Both these methods read the entire file into memory, which might not be appropriate. In Java 8 this is further improved by adding support for the new Stream API:
Stream<String> Files.lines(Path path, Charset cs)
Stream<Path> Files.list(Path dir)
For example to get a Stream of words from a Path you can do:
final Stream<String> words = Files.lines(Paths.get("myFile.txt")).
flatMap((in) -> Arrays.stream(in.split("\\b")));
SCANNER:
can parse primitive types and strings using regular expressions.
A Scanner breaks its input into tokens using a delimiter pattern, which by default matches whitespace. The resulting tokens may then be converted into values of different types. More can be read at http://docs.oracle.com/javase/7/docs/api/java/util/Scanner.html
DATA INPUT STREAM:
Lets an application read primitive Java data types from an underlying input stream in a machine-independent way. An application uses a data output stream to write data that can later be read by a data input stream. DataInputStream is not necessarily safe for multithreaded access. Thread safety is optional and is the responsibility of users of methods in this class. More can be read at http://docs.oracle.com/javase/7/docs/api/java/io/DataInputStream.html
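A minimal round trip may make this clearer: values written with a DataOutputStream are read back with a DataInputStream, in the same order and with the matching read methods. The class name and the sample record (an int, a double and a string) are assumptions chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical example class; the record layout is invented for this sketch.
class DataStreamDemo {

    // Writes one record in the fixed, machine-independent binary format.
    static byte[] write(int id, double price, String name) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(id);       // 4 bytes, big-endian
            out.writeDouble(price); // 8 bytes
            out.writeUTF(name);     // 2-byte length prefix + modified UTF-8
        }
        return bytes.toByteArray();
    }

    // Reads the fields back in exactly the order they were written.
    static String read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int id = in.readInt();
            double price = in.readDouble();
            String name = in.readUTF();
            return id + " " + price + " " + name;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] record = write(7, 2.5, "widget");
        System.out.println(record.length + " bytes: " + read(record));
    }
}
```

The format carries no type information, so reading fields in a different order than they were written silently yields garbage rather than an error.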
BufferedReader:
Reads text from a character-input stream, buffering characters so as to provide for the efficient reading of characters, arrays, and lines. The buffer size may be specified, or the default size may be used. The default is large enough for most purposes. In general, each read request made of a Reader causes a corresponding read request to be made of the underlying character or byte stream. It is therefore advisable to wrap a BufferedReader around any Reader whose read() operations may be costly, such as FileReaders and InputStreamReaders. For example,
BufferedReader in = new BufferedReader(new FileReader("foo.in"));
will buffer the input from the specified file. Without buffering, each invocation of read() or readLine() could cause bytes to be read from the file, converted into characters, and then returned, which can be very inefficient. Programs that use DataInputStreams for textual input can be localized by replacing each DataInputStream with an appropriate BufferedReader. More details are at http://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html
NOTE: This approach is outdated, as Boris points out in his comment. I will leave it here for history, but you should use the methods available in the JDK.
It depends on what kind of operation you are doing and the size of the file you are reading.
In most of the cases, I recommend using commons-io for small files.
byte[] data = FileUtils.readFileToByteArray(new File("myfile"));
You can read it as string or character array...
Now, if you are handling big files, or changing parts of a file directly on the filesystem, then the best option is to use a RandomAccessFile and potentially even a FileChannel to do it "nio" style.
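As an illustration of changing part of a file in place, the sketch below seeks to an offset with RandomAccessFile and overwrites a few bytes without reading or rewriting the rest. The class name PatchInPlace, the method patch and the sample content are all invented for this example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example class; not from commons-io or the JDK.
class PatchInPlace {

    // Overwrites bytes starting at offset; the rest of the file is untouched.
    static void patch(Path path, long offset, byte[] replacement) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "rw")) {
            file.seek(offset);       // jump straight to the position to change
            file.write(replacement); // overwrite in place (extends the file if needed)
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("patch-demo", ".txt");
        try {
            Files.write(tmp, "hello world".getBytes(StandardCharsets.US_ASCII));
            patch(tmp, 6, "there".getBytes(StandardCharsets.US_ASCII));
            System.out.println(new String(Files.readAllBytes(tmp), StandardCharsets.US_ASCII));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

This only works cleanly when the replacement has the same length as the bytes it replaces; inserting or removing bytes in the middle still means rewriting everything after that point.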
Using BufferedReader
BufferedReader reader;
char[] buffer = new char[10];
reader = new BufferedReader(new FileReader("FILE_PATH"));
//or
reader = Files.newBufferedReader(Paths.get("FILE_PATH"));
int count;
while ((count = reader.read(buffer)) != -1) {
    // only the first count chars of the buffer are valid after each read
    System.out.print(new String(buffer, 0, count));
}
//or
String line;
while ((line = reader.readLine()) != null) {
    System.out.println(line);
}
reader.close();
Using FileInputStream – Read Binary Files to Bytes
InputStream fis;
byte[] buffer = new byte[10];
fis = new FileInputStream("FILE_PATH");
//or
fis = Files.newInputStream(Paths.get("FILE_PATH"));
int count;
while ((count = fis.read(buffer)) != -1) {
    // only the first count bytes of the buffer are valid after each read
    System.out.print(new String(buffer, 0, count));
}
fis.close();
Using Files – Read Small File to List of Strings
List<String> allLines = Files.readAllLines(Paths.get("FILE_PATH"));
for (String line : allLines) {
System.out.println(line);
}
Using Scanner – Read Text File as Iterator
Scanner scanner = new Scanner(new File("FILE_PATH"));
while (scanner.hasNextLine()) {
System.out.println(scanner.nextLine());
}
scanner.close();
Using RandomAccessFile – Reading Files in Read-Only Mode
RandomAccessFile file = new RandomAccessFile("FILE_PATH", "r");
String str;
while ((str = file.readLine()) != null) {
System.out.println(str);
}
file.close();
Using Files.lines – Reading lines as stream
Stream<String> lines = Files.lines(Paths.get("FILE_PATH"));
lines.forEach(s -> System.out.println(s));
Using FileChannel – for better performance by using off-heap memory (and, beyond that, MappedByteBuffer)
FileInputStream i = new FileInputStream("FILE_PATH");
ReadableByteChannel r = i.getChannel();
ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024);
while (r.read(buffer) != -1) {
    buffer.flip();
    while (buffer.hasRemaining()) {
        System.out.print((char) buffer.get());
    }
    buffer.clear();
}
i.close();
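The heading mentions MappedByteBuffer, which the snippet above does not actually use. A possible sketch of that variant follows: the file is mapped into memory through FileChannel.map and decoded as UTF-8 in one step. The class and method names are made up, UTF-8 content is assumed, and FileChannel.open needs Java 7 (on Android, a fairly recent API level).

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example class showing a memory-mapped read.
class MappedRead {

    // Maps the whole file read-only and decodes it as UTF-8 text.
    static String readMapped(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // The mapping lives outside the Java heap; a single mapping is
            // limited to Integer.MAX_VALUE bytes.
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            return StandardCharsets.UTF_8.decode(mapped).toString();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mapped-demo", ".txt");
        tmp.toFile().deleteOnExit(); // mapped files cannot always be deleted right away
        Files.write(tmp, "mapped content".getBytes(StandardCharsets.UTF_8));
        System.out.println(readMapped(tmp));
    }
}
```

Mapping tends to pay off for large files that are read repeatedly or at random positions; for a single sequential pass over a small file, a plain buffered stream is usually just as good.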

Java - Resetting InputStream

I'm dealing with some Java code in which there's an InputStream that I read one time and then I need to read it once again in the same method.
The problem is that I need to reset its position to the start in order to read it twice.
I've found a hack-ish solution to the problem:
is.mark(Integer.MAX_VALUE);
//Read the InputStream fully
// { ... }
try
{
is.reset();
}
catch (IOException e)
{
e.printStackTrace();
}
Does this solution lead to any unexpected behaviour? Or will it work, dumb as it is?
As written, you have no guarantees, because mark() is not required to report whether it was successful. To get a guarantee, you must first call markSupported(), and it must return true.
Also as written, the specified read limit is very dangerous. If you happen to be using a stream that buffers in-memory, it will potentially allocate a 2GB buffer. On the other hand, if you happen to be using a FileInputStream, you're fine.
A better approach is to use a BufferedInputStream with an explicit buffer.
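A sketch of that approach, under the assumption that the data is known to fit within a fixed bound (READ_LIMIT below is an arbitrary value picked for the example, as are the class and method names):

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; READ_LIMIT is an assumed upper bound on the data size.
class MarkResetDemo {

    static final int READ_LIMIT = 64 * 1024;

    // Reads everything that is left in the stream.
    static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Reads the source twice by buffering up to READ_LIMIT bytes.
    static byte[][] readTwice(InputStream source) throws IOException {
        // BufferedInputStream always supports mark/reset, whatever it wraps.
        BufferedInputStream in = new BufferedInputStream(source, READ_LIMIT);
        in.mark(READ_LIMIT);
        byte[] first = drain(in);
        // reset() may throw IOException if more than READ_LIMIT bytes were read.
        in.reset();
        byte[] second = drain(in);
        return new byte[][] { first, second };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "read me twice".getBytes(StandardCharsets.UTF_8);
        byte[][] passes = readTwice(new ByteArrayInputStream(data));
        System.out.println(new String(passes[0], StandardCharsets.UTF_8));
        System.out.println(new String(passes[1], StandardCharsets.UTF_8));
    }
}
```

The memory cost is bounded by the explicit limit instead of Integer.MAX_VALUE, and the failure mode for oversized input is an exception from reset() rather than a huge allocation.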
It depends on the InputStream implementation. You could also consider whether it would be better to use a byte[]. The easiest way is to use Apache commons-io:
byte[] bytes = IOUtils.toByteArray(inputStream);
You can't do this reliably; some InputStreams (such as ones connected to terminals or sockets) don't support mark and reset (see markSupported). If you really have to traverse the data twice, you need to read it into your own buffer.
Instead of trying to reset the InputStream, load it into a buffer such as a StringBuilder or, if it's a binary data stream, a ByteArrayOutputStream. You can then process the buffer within the method as many times as you want.
ByteArrayOutputStream bos = new ByteArrayOutputStream();
int read = 0;
byte[] buff = new byte[1024];
while ((read = inStream.read(buff)) != -1) {
bos.write(buff, 0, read);
}
byte[] streamData = bos.toByteArray();
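Once the bytes are captured, each pass can work on the array directly or on its own fresh ByteArrayInputStream. The following sketch (class and method names invented for illustration) runs two independent passes over the same data:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; the two "passes" are arbitrary sample processing.
class ReplayDemo {

    // Copies the stream into memory once, as in the loop above.
    static byte[] toBytes(InputStream in) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buff = new byte[1024];
        int read;
        while ((read = in.read(buff)) != -1) {
            bos.write(buff, 0, read);
        }
        return bos.toByteArray();
    }

    // Sample pass: counts newline bytes in a stream.
    static int countLines(InputStream in) throws IOException {
        int lines = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        InputStream source =
                new ByteArrayInputStream("a\nb\nc\n".getBytes(StandardCharsets.UTF_8));
        byte[] streamData = toBytes(source);
        // Each pass gets its own stream over the same bytes.
        int lines = countLines(new ByteArrayInputStream(streamData));
        String text = new String(streamData, StandardCharsets.UTF_8);
        System.out.println(lines + " lines, " + text.length() + " chars");
    }
}
```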
For me, the easiest solution was to pass the object from which the InputStream could be obtained, and just obtain it again. In my case, it was from a ContentResolver.

android java not enough sequential memory for stringbuilder

My app is parsing a large HTTP response. The response is over 6 megabytes of JSON, but not in a standard schema.
final char[] buffer = new char[0x10000];
StringBuilder out = new StringBuilder();
Reader in = new InputStreamReader(is, "UTF-8");
int read;
System.gc();
do
{
read = in.read(buffer, 0, buffer.length);
if (read > 0)
{
out.append(buffer, 0, read);
}
} while (read >= 0);
in.close();
is.close();
in = null;
is = null;
System.gc();
return out.toString();
It doesn't matter whether the source is a BufferedReader from a file or an InputStream: the StringBuilder simply cannot hold the entire object, and it fails at out.append(buffer, 0, read); or at out.toString(), as another copy may be made.
IOUtils.copy from the Apache library does the same thing under the hood, and it also fails.
How can I read this large object in for further manipulation? Right now this method fails on Android 2.2 and 2.3 devices, and uses more memory than I want on newer devices.
Similar questions all have answers that involve appending to a StringBuilder or reading in lines, or have incomplete solutions that are only hints, and none of those work.
You need to do one of two things:
Get multiple smaller JSON responses from the server and parse those. This might be preferable on a mobile device, as large chunks of data might not be transmitted reliably, which will cause the device to request the entire thing repeatedly.
Use a streaming JSON parser, such as Jackson, to process the data as it comes in.

how to write a file without allocating the whole byte array into memory?

This is a newbie question, I know. Can you guys help?
I'm talking about big files, of course, above 100MB. I'm imagining some kind of loop, but I don't know what to use. Chunked stream?
One thing is for certain: I don't want something like this (pseudocode):
File file = new File(existing_file_path);
byte[] theWholeFile = new byte[file.length()]; //this allocates the whole thing into memory
File out = new File(new_file_path);
out.write(theWholeFile);
To be more specific, I have to rewrite an applet that downloads a base64 encoded file and decodes it to the "normal" file. Because it's made with byte arrays, it holds twice the file size in memory: one copy base64 encoded and the other decoded. My question is not about base64. It's about saving memory.
Can you point me in the right direction?
Thanks!
From the question, it appears that you are reading the base64 encoded contents of a file into an array, decoding it into another array before finally saving it.
This is a bit of an overhead in terms of memory, especially given that Base64 encoding is in use. It can be made a bit more efficient by:
Reading the contents of the file using a FileInputStream, preferably decorated with a BufferedInputStream.
Decoding on the fly. Base64 encoded characters can be read in groups of 4 characters, to be decoded on the fly.
Writing the output to the file, using a FileOutputStream, again preferably decorated with a BufferedOutputStream. This write operation can also be done after every single decode operation.
The buffering of read and write operations is done to prevent frequent IO access. You could use a buffer size that is appropriate to your application's load; usually the buffer size is chosen to be some power of two, because such a number does not have an "impedance mismatch" with the physical disk buffer.
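The three steps above can be sketched with the JDK's own java.util.Base64, which is an assumption on my part: it only exists since Java 8 (and recent Android API levels), so it was not available when this was asked. The class and method names are invented. Only one small buffer is held in memory, regardless of the file size.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical example class; uses java.util.Base64 (Java 8+), not the applet's own decoder.
class Base64StreamDecode {

    // Decodes Base64 from one stream to another; returns the number of decoded bytes.
    static long decode(InputStream encoded, OutputStream decoded) throws IOException {
        long total = 0;
        // wrap() decodes on the fly as bytes are pulled through the stream.
        // For line-wrapped input, Base64.getMimeDecoder() would be needed instead.
        try (InputStream in = Base64.getDecoder().wrap(new BufferedInputStream(encoded));
             OutputStream out = new BufferedOutputStream(decoded)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello world!".getBytes(StandardCharsets.UTF_8);
        byte[] encoded = Base64.getEncoder().encode(original);
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        long count = decode(new ByteArrayInputStream(encoded), result);
        System.out.println(count + " bytes: " + new String(result.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

For real files, the two in-memory streams in main would be replaced by a FileInputStream and a FileOutputStream; the decode method itself does not change.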
Perhaps a FileInputStream on the file, reading off fixed length chunks, doing your transformation and writing them to a FileOutputStream?
Perhaps a BufferedReader? Javadoc: http://download-llnw.oracle.com/javase/1.4.2/docs/api/java/io/BufferedReader.html
Use this base64 encoder/decoder, which will wrap your file input stream and handle the decoding on the fly:
InputStream input = new Base64.InputStream(new FileInputStream("in.txt"));
OutputStream output = new FileOutputStream("out.txt");
try {
    byte[] buffer = new byte[1024];
    int bytesRead;
    // read() returns -1 at end of stream; available() is not a reliable end-of-data test
    while ((bytesRead = input.read(buffer)) != -1) {
        output.write(buffer, 0, bytesRead);
    }
} finally {
    input.close();
    output.close();
}
You can use org.apache.commons.io.FileUtils. This util class provides other options too beside what you are looking for. For example:
FileUtils.copyFile(final File srcFile, final File destFile)
FileUtils.copyFile(final File input, final OutputStream output)
FileUtils.copyFileToDirectory(final File srcFile, final File destDir)
And so on. You can also follow this tutorial.

What could lead to the creation of false EOF in a GZip compressed data stream

We are streaming data in batches from a server (written in .NET, running on Windows) to a client (written in Java, running on Ubuntu). The data is in XML format. Occasionally the Java client throws an unexpected EOF while trying to decompress the stream. The message content always varies and is user driven. The response from the client is also compressed using GZip. This never fails and seems to be rock solid. The response from the client is controlled by the system.
Is there a chance that some arrangement of characters or some special characters are creating false EOF markers? Could it be white-space related? Is GZip suitable for compressing XML?
I am assuming that the code to read and write from the input/output streams works, because we only occasionally get this exception, and when we inspect the user data at the time there seem to be special characters (which is why I asked the question) such as the '#' sign.
Any ideas?
UPDATE:
The actual code as requested. I thought it wasn't this due to the fact that I had been to a couple of sites to get help on this issue and they all more or less had the same code. Some sites mentioned appended GZip. Something to do with GZip creating multiple segments?
public String receive() throws IOException {
byte[] buffer = new byte[8192];
ByteArrayOutputStream baos = new ByteArrayOutputStream(8192);
do {
int nrBytes = in.read(buffer);
if (nrBytes > 0) {
baos.write(buffer, 0, nrBytes);
}
} while (in.available() > 0);
return compressor.decompress(baos.toByteArray());
}
public String decompress(byte[] data) throws IOException {
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
ByteArrayInputStream in = new ByteArrayInputStream(data);
try {
GZIPInputStream inflater = new GZIPInputStream(in);
byte[] byteBuffer = new byte[8192];
int r;
while((r = inflater.read(byteBuffer)) > 0 ) {
buffer.write(byteBuffer, 0, r);
}
} catch (IOException e) {
log.error("Could not decompress stream", e);
throw e;
}
return new String(buffer.toByteArray());
}
At first I thought there must be something wrong with the way I am reading in the stream, and that perhaps I was not looping properly. I then generated a ton of data to be streamed and checked that it was looping. Also, the fact that it happens so seldom and so far has not been reproducible led me to believe that it was the content rather than the scenario. But at this point I am totally baffled, and for all I know it is the code.
Thanks again everyone.
Update 2:
As requested the .Net code:
Dim DataToCompress = Encoding.UTF8.GetBytes(Data)
Dim CompressedData = Compress(DataToCompress)
This gets the raw data into bytes, which then get compressed:
Private Function Compress(ByVal Data As Byte()) As Byte()
Try
Using MS = New MemoryStream()
Using Compression = New GZipStream(MS, CompressionMode.Compress)
Compression.Write(Data, 0, Data.Length)
Compression.Flush()
Compression.Close()
Return MS.ToArray()
End Using
End Using
Catch ex As Exception
Log.Error("Error trying to compress data", ex)
Throw
End Try
End Function
Update 3: Also added more Java code. The in variable is the InputStream returned from socket.getInputStream().
It certainly shouldn't be due to the data involved - the streams deal with binary data, so that shouldn't make any odds at all.
However, without seeing your code, it's hard to say for sure. My first port of call would be to check anywhere that you're using InputStream.read() - check that you're using the return value correctly, rather than assuming a single call to read() will fill the buffer.
If you could provide some code, that would help a lot...
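To illustrate the point about read(), here is a sketch in which a stream hands out at most 3 bytes per call, much as a socket may deliver a message in pieces. A single read() does not fill the buffer, while a loop on the return value does. It assumes the receiver knows how many bytes to expect (for instance from a length prefix, which is my assumption and not something the question describes); all names are invented.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; the expected length is assumed to be known up front.
class ReadFullyDemo {

    // Keeps calling read() until exactly length bytes have arrived.
    static byte[] readExactly(InputStream in, int length) throws IOException {
        byte[] result = new byte[length];
        int offset = 0;
        while (offset < length) {
            int n = in.read(result, offset, length - offset);
            if (n == -1) {
                throw new EOFException(
                        "stream ended after " + offset + " of " + length + " bytes");
            }
            offset += n; // a read may deliver fewer bytes than requested
        }
        return result;
    }

    // Test stream that returns at most 3 bytes per read(), like a slow socket.
    static InputStream trickle(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 3));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
        int single = trickle(data).read(new byte[10], 0, 10);
        byte[] all = readExactly(trickle(data), 10);
        System.out.println("single read: " + single + " bytes, loop: " + all.length + " bytes");
    }
}
```

Note that available() returning 0 only means no bytes can be read without blocking right now; it says nothing about whether the sender has finished.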
I would suspect that for some reason the data is altered in transit, by treating it as text rather than binary, so it may be either \n conversions or a codepage alteration.
How is the gzipped stream transferred between the two systems?
It is not possible. EOF in TCP is delivered as an out of band FIN segment, not via the data.
