I have a large JSON string, around 20 MB (I create this data using JSON.stringify from JavaScript code). I'm writing this JSON data to a file on the Android device's internal storage and reading it back later. When I read the file it takes too much time, and I can't even tell whether it is still reading or not. One more constraint: I need to read it on the main thread only.
The code below works fine if I pass the value "Hello World" to the WriteFile method, but it fails with the large JSON:
public String ReadFile()
{
    StringBuffer text = new StringBuffer();
    String FILE_NAME = "file.txt";
    try {
        BufferedReader bReader = new BufferedReader(new InputStreamReader(openFileInput(FILE_NAME)));
        String line;
        int count = 0;
        while ((line = bReader.readLine()) != null) {
            text.append(line + "\n");
            alert("Reading File: " + ++count);
        }
    }
    catch (Exception e) {
        alert(e.toString());
    }
    return text.toString();
}
public String WriteFile(String data)
{
    String FILE_NAME = "file.txt";
    String result = "";
    try {
        FileOutputStream fos = openFileOutput(FILE_NAME, Context.MODE_PRIVATE);
        fos.write(data.toString().getBytes());
        result = "Success";
        fos.close();
    }
    catch (Exception e) {
        e.printStackTrace();
        result = "Error";
    }
    return result;
}
I have added an alert inside the while loop as well, but I never see any alert message; I don't see the exception message either.
So there could be two problems:
Something is wrong in writing to the file (but I don't know how to verify this, because I don't think there is any way to view an internal-storage file).
Something is wrong in my reading code.
Update 1:
If I cannot read such a large file in native Java code, is there any way to read an Android internal-storage file from the WebView's JavaScript code instead?
============================================================================
Application requirement
I have an Android application with a WebView. I have copied the full JavaScript code (JS and HTML files) to the assets folder of the app. I write the file from Java native code and read it from Java native code. I get all the data from the server at app launch. My client has a very slow internet connection and it is disconnected many times, so they want this app to run in offline mode: the app fetches all the data at launch, we store it somewhere, and then read it throughout the app. If the user launches the app again, it reuses the old existing data. This data is very big, which is why I am storing it in an internal-storage file.
First of all, the only way to be really sure why your code is taking a long time is to profile it. We can't do that for you.
But here are some performance tips relevant to your code:
Don't read the entire 20MB JSON file into the Java heap / RAM unless you really need to. (I find it difficult to understand why you are doing this. For example, a typical JSON parser will happily¹ read its input directly from a file. Or, if you are reading it so that you can send it to a client on the other end of an HTTP connection, you should be able to stream the data.)
Reading the file a line at a time and then stitching the lines back together is unnecessary work. It generates unnecessary garbage, and more garbage means more work for the GC, which slows you down. If the lines are long, you also take the performance "hit" of the internal StringBuilder used to build each line.
Reading into a recycled char[] and then appending the char[] contents to the StringBuilder will be faster than appending lines.
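As an illustration, a sketch of what that read loop could look like, reusing the question's FILE_NAME and the Android openFileInput(); treat it as an untested outline:
import java.io.InputStreamReader;
import java.io.Reader;

// Inside the same Activity/Context as the question's ReadFile():
StringBuilder text = new StringBuilder();
try (Reader reader = new InputStreamReader(openFileInput(FILE_NAME), "UTF-8")) {
    char[] buf = new char[8192];        // recycled on every iteration
    int n;
    while ((n = reader.read(buf)) != -1) {
        text.append(buf, 0, n);         // no per-line String objects or concatenation garbage
    }
}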
Your StringBuilder will repeatedly "grow" its backing character array to accommodate the characters as you append them. This generates garbage and leads to unnecessary copying. (Implementations typically "grow" the array exponentially to avoid O(N^2) behavior. However, the expansions still affect performance and can result in a peak memory usage of up to 3 times what is actually required.)
One way to avoid this is to get an initial estimate of the number of characters you are going to add and set the StringBuilder "capacity" accordingly. You may be able to estimate the number of characters from the file size. (It depends on the encoding.)
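For example (a guess at the sizing step, assuming the file lives in the app's internal files directory, which is what openFileInput() uses):
import java.io.File;

// FILE_NAME as in the question; for mostly-ASCII UTF-8 JSON, bytes ≈ chars,
// so the file length is a reasonable initial capacity.
File f = new File(getFilesDir(), FILE_NAME);
StringBuilder text = new StringBuilder((int) f.length());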
Look for a way to do it using existing standard Java libraries; e.g. Files.copy and ByteArrayOutputStream, or Files.readAllBytes
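A sketch of the one-call version (java.nio.file is available on Android only from API 26; this is an outline, not tested on your setup):
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

File f = new File(getFilesDir(), FILE_NAME);   // the same internal-storage file
String json = new String(Files.readAllBytes(f.toPath()), StandardCharsets.UTF_8);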
Look for an existing 3rd-party library method; e.g. Apache Commons IO has an IOUtils.toString(Reader) method. The chances are that they will have spent a lot of time figuring out how to do this efficiently. Reusing a well engineered, well maintained library is likely to save you time.
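For example, with Commons IO on the classpath, the whole ReadFile() body shrinks to something like this (again reusing the question's openFileInput(); a sketch, not a drop-in replacement):
import java.io.InputStreamReader;
import java.io.Reader;
import org.apache.commons.io.IOUtils;

try (Reader reader = new InputStreamReader(openFileInput(FILE_NAME), "UTF-8")) {
    return IOUtils.toString(reader);   // Commons IO does the buffered copy internally
}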
Don't put a trace print (I assume that is what alert is ...) in the middle of a loop that could be called millions of times. (Duh!)
¹ Parsers are cheerful once you get to know them :-)
Related
I have an object which I want to load into memory when the program starts.
My question is:
Is it better to put the object inside the (JAR) package, or in a folder shipped with the program?
Which is the faster way to read the object?
EDIT:
public MapStandard loadFromFileMS(String nameOfFile) {
    MapStandard hm = null;
    /*
    InputStream inputStreaminputStream
            = getClass().getClassLoader()
                    .getResourceAsStream("data/" + nameOfFile + ".data");
    */
    try {
        FileInputStream inputStreaminputStream = new FileInputStream("C:\\" + nameOfFile + ".data");
        try (ObjectInputStream is = new ObjectInputStream(inputStreaminputStream)) {
            hm = (MapStandard) is.readObject();
        }
    } catch (IOException | ClassNotFoundException e) {
        System.out.println("Error: " + e);
    }
    return hm;
}
In theory it is faster to read a file from a directory than from a JAR file. A JAR file is basically a zip file with some metadata (MANIFEST.MF), so reading from a JAR involves unzipping the content.
I don't think that there is a clear answer. Of course, reading a compressed archive requires time to un-compress. But: CPU cycles are VERY cheap. The time it takes to read a smaller archive and extract its content might still be quicker than reading "much more" content directly from the file system. You can do A LOT of computations while waiting for your IO to come in.
On the other hand: do you really think that the loading of this file is a performance bottleneck?
There is an old saying that the root of all evil is premature optimization.
If you or your users complain about bad performance - only then you start analyzing your application; for example using a profiler. And then you can start to fix those performance problems that REALLY cause problems; not those that you "assume" to be problematic.
And finally: if we are talking about such huge dimensions, then you SHOULD not ask for Stack Overflow opinions but start measuring exact times yourself! We can only assume; you have all the data in front of you, you just have to collect it!
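For instance, a crude measurement sketch (loadFromJar() is a hypothetical variant of your method that uses the commented-out getResourceAsStream() path; run it several times so caching effects average out):
long t0 = System.nanoTime();
MapStandard fromJar = loadFromJar("example");       // hypothetical: reads via getResourceAsStream()
long t1 = System.nanoTime();
MapStandard fromFile = loadFromFileMS("example");   // the method from the question
long t2 = System.nanoTime();
System.out.printf("jar: %d ms, file: %d ms%n",
        (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);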
A qualified guess would be that when the program starts, the jar file entry will load a lot faster than the external file, but repeated usages will be much more similar.
The reason is that the limiting factor on modern computers is "how fast can the bytes be retrieved from disk", and for jar files the zip file is already being read by the JVM, so many of the bytes needed are already loaded and do not have to be read again. An external file needs a completely separate "open-read" dance with the operating system. Later, both will be in the disk read cache maintained by the operating system, so the difference is negligible.
Worrying about CPU usage is not really necessary: a modern CPU can do a lot of decompression in the time needed to read extra data from disk.
Note that content read through the jar file is automatically write-protected. If you need to update the contents, you need an external file.
I'm on mobile (Android) and have a large text file, about 50 MB. I want to be able to open the file and seek to a particular position, then start reading data into a buffer from that point. Is using FileReader + BufferedReader the best way to do this if I want to use as little memory as possible?
BufferedReader in = new BufferedReader(new FileReader("foo.txt"));
in.skip(byteCount); // in some cases I have to read from an offset
// start reading a line at a time here
I'll also need to write to the file, only ever appending data, so:
FileWriter w = new FileWriter("foo.txt", true);
w.write(someCharacters);
I'm primarily interested to know whether, by using the wrong reader/writer classes, I might accidentally load the entire file contents into memory before the reads or writes.
Thanks
Basically you don't want to read the whole file, but just a certain portion of it. In this case use java.io.RandomAccessFile instead:
its seek() method is guaranteed to do an actual seek instead of reading and discarding (which is what some implementations of InputStream.skip() actually do)
the seek() method can move the file pointer backwards - something you can't do with an InputStream
a getFilePointer() method is provided to get the current position in the file
it only reads what you tell it to read, so there's no fear you'll accidentally load more than you want
My dictionary app uses RandomAccessFile to access about 45 MB of data, from back when each Android app could only use 16 MB of RAM. A service running my dictionary engine on the same 45 MB of data uses only about 2 MB of RAM (and most of that was probably used by the Dalvik VM rather than my search engine). So this class definitely works as intended.
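A minimal sketch of that usage (the file name, offset and appended text are just placeholders):
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

long byteCount = 4096;                    // example offset
String someCharacters = "appended text";  // placeholder for the data to append
try (RandomAccessFile raf = new RandomAccessFile("foo.txt", "rw")) {
    raf.seek(byteCount);                  // a real seek, no read-and-discard
    byte[] buf = new byte[8192];
    int n = raf.read(buf);                // only reads what you ask for
    // ... process buf[0..n) ...

    raf.seek(raf.length());               // jump to the end to append
    raf.write(someCharacters.getBytes(StandardCharsets.UTF_8));
}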
You could try using a memory mapped file (java.nio.channels.FileChannel.map()). I'm not sure how much heap space would be allocated for this though.
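A rough sketch of the mapping approach (assumes the offset lies within the file; only the mapped window is touched, not the whole 50 MB):
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

try (FileChannel ch = FileChannel.open(Paths.get("foo.txt"), StandardOpenOption.READ)) {
    long offset = 4096;                                       // example offset
    long window = Math.min(64 * 1024, ch.size() - offset);    // map only a small window
    MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, offset, window);
    byte[] chunk = new byte[map.remaining()];
    map.get(chunk);
    String text = new String(chunk, StandardCharsets.UTF_8);
}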
So I'm not really sure what actually happens deep down.
Let's say we are in a web application and a user requests to download a dynamically generated file which can be several MB in size, possibly even 100 MB or more. The application does this:
String disposition = "attachment; fileName=\"myFile.txt\"";
response.setHeader("Content-Disposition", disposition);
ServletOutputStream output = response.getOutputStream();
OutputStreamWriter writer = new OutputStreamWriter(output);
service.exportFile(ids, writer, properties);
Am I right that the whole file is never completely in memory? I.e. whatever data was generated is sent to the user and then discarded on the server (assuming everything went well, no packet loss)?
I ask this because I need to change the library that generates the files (3rd party), and the new one doesn't use the standard Java IO classes, probably because it is just an API and the actual library is written in C. Anyway, to get a buffer's data the documentation says to call
String data = buffer.toString();
(the files are ASCII)
So is my assumption correct that memory consumption will be affected especially when multiple users download large files at the same time?
Yes, in your first code snippet the data is streamed directly to the client, assuming that the implementation of service.exportFile(ids, writer, properties) itself never holds the generated data in memory but really streams it directly to the writer.
With String data = buffer.toString(); you will definitely place the entire data in heap space, at the latest when buffer.toString() is called, maybe earlier depending on the exact implementation.
In conclusion, you have in my opinion to be aware of two things:
- never assign the whole data to a variable in your code, but write it directly to the output stream
- ensure that the implementation also never holds the whole data in memory while generating it
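A hypothetical sketch of what "write it directly to the output stream" looks like; formatRecord() stands in for whatever your export library produces per record and is not a real API:
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import javax.servlet.http.HttpServletResponse;

void export(HttpServletResponse response, List<Long> ids) throws IOException {
    response.setHeader("Content-Disposition", "attachment; fileName=\"myFile.txt\"");
    Writer writer = new BufferedWriter(
            new OutputStreamWriter(response.getOutputStream(), StandardCharsets.US_ASCII));
    for (Long id : ids) {
        writer.write(formatRecord(id));   // hypothetical per-record formatter
        // nothing accumulates on the heap; the container flushes to the client as buffers fill
    }
    writer.flush();
}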
The 3rd-party lib offers a second solution, and that is writing to a file. However, that requires the extra hassle of creating unique file names, then reading them in and deleting them afterwards. But I implemented it anyway.
In
service.exportFile(ids, writer, properties)
ids is a collection of database identifiers of the records to be exported, so ids.size() gives a rough estimate of the resulting file's size. As long as it is small I use the buffer version, and if it is large, the file one.
EDIT
This is my file reader; can I make it read the file from bottom to top, given how difficult it seems to be to make it write from bottom to top?
BufferedReader mainChat = new BufferedReader(new FileReader("./messages/messages.txt"));
String str;
while ((str = mainChat.readLine()) != null)
{
    System.out.println(str);
}
mainChat.close();
OR (old question)
How can I make it put the next String at the beginning of the file and then insert a new line (to shift the other lines down)?
FileWriter chatBuffer = new FileWriter("./messages/messages.txt",true);
BufferedWriter mainChat = new BufferedWriter(chatBuffer);
mainChat.write(message);
mainChat.newLine();
mainChat.flush();
mainChat.close();
Someone could correct me, but I'm pretty sure that in most operating systems there is no option but to read the whole file in and then write it back again.
I suppose the main reason is that, in most modern OSs, all files on the disc start at a block boundary. The problem is, you cannot tell the file allocation table that your file starts earlier than that point.
Therefore, all the later bytes in the file have to be rewritten. I don't know of any OS routines that do this in one step.
So I would use a BufferedReader to store the whole file in a Vector or StringBuffer, then write it all back with the prepended string first.
--
Edit
A way that would save memory for larger files, following #Saury's RandomAccessFile suggestion, would be:
file has N bytes to start with
we want to prepend "hello world" (11 bytes)
open the file for append
append 11 spaces as padding
i = N - 1
loop {
    go to byte i
    read that byte
    move to byte i + 11
    write that byte back
    i--
} until i < 0
then move to byte 0
write "hello world"
voila
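If it helps, here is a minimal Java sketch of the same shifting idea with RandomAccessFile (untested, byte at a time, so it will be slow for big files; path and prefix are placeholders):
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Shift the existing bytes up by prefix.length, then write the prefix at the front.
static void prepend(String path, String prefix) throws IOException {
    byte[] pre = prefix.getBytes(StandardCharsets.UTF_8);
    try (RandomAccessFile raf = new RandomAccessFile(path, "rw")) {
        long n = raf.length();
        raf.setLength(n + pre.length);        // grow the file (the "append padding" step)
        for (long i = n - 1; i >= 0; i--) {   // move every original byte, last byte first
            raf.seek(i);
            int b = raf.read();
            raf.seek(i + pre.length);
            raf.write(b);
        }
        raf.seek(0);
        raf.write(pre);                       // finally put "hello world" (or whatever) at byte 0
    }
}
Reading and writing in larger chunks instead of single bytes would make this much faster, but the single-byte version mirrors the pseudocode above.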
Use FileUtils from Apache Commons IO to simplify this if you can. However, it still needs to read the whole file in, so it will be slow for large files.
List<String> newList = new ArrayList<>();
newList.add("3"); // the new first line
File file = new File("./messages/messages.txt");
newList.addAll(FileUtils.readLines(file));
FileUtils.writeLines(file, newList);
FileUtils also has read/write methods that take care of encoding.
Use RandomAccessFile to read/write the file in reverse order. See following links for more details.
http://www.java2s.com/Code/Java/File-Input-Output/UseRandomAccessFiletoreverseafile.htm
http://download.oracle.com/javase/1.5.0/docs/api/java/io/RandomAccessFile.html
As was suggested here, prepending to a file is rather difficult, and that is indeed linked to how files are stored on the hard drive. The operation is not natively available from the OS, so you will have to do it yourself, and the most obvious answers involve reading the whole file and writing it again. This may be fine for you, but it incurs significant cost and could become a bottleneck for your application's performance.
Appending would be the natural choice, but this would, as far as I understand, make reading the file unnatural.
There are many ways you could tackle this depending on the specificities of your situation.
If writing this file is not time-critical in your application and the file does not grow too big, you could bite the bullet and read the whole file, prepend the information and write it again. Apache Commons IO's FileUtils will help here, simplifying the operation: you can read the file as a list of strings, prepend the new lines to the list and write the list again.
If writing is time-critical but you have control over the reading or the file (that is, if the file is to be read by another of your programs), you could append on write and reverse the list of lines on read. Again, FileUtils from the Commons IO library and helper functions in the Collections class of the standard JDK should do the trick nicely, as in the sketch below.
If writing is time-critical but the file is intended to be read with a normal text editor, you could create a small class or program that reads the file and writes it to another file in the preferred order.
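For the second option, a small sketch of the "append on write, reverse on read" idea with Commons IO (reader side only; writing stays a plain append as in the question):
import java.io.File;
import java.io.IOException;
import java.util.Collections;
import java.util.List;
import org.apache.commons.io.FileUtils;

static List<String> readNewestFirst(File file) throws IOException {
    List<String> lines = FileUtils.readLines(file, "UTF-8");
    Collections.reverse(lines);   // the line appended last comes out first
    return lines;
}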
I have a largish file (4-5 GB compressed) of small messages that I wish to parse into approximately 6,000 files by message type. Messages are small; anywhere from 5 to 50 bytes depending on the type.
Each message starts with a fixed-size type field (a 6-byte key). If I read a message of type '000001', I want to append its payload to 000001.dat, etc. The input file contains a mixture of messages; I want N homogeneous output files, where each output file contains only the messages of a given type.
What's an efficient and fast way of writing these messages to so many individual files? I'd like to use as much memory and processing power as needed to get it done as fast as possible. I can write compressed or uncompressed files to the disk.
I'm thinking of using a hashmap with a message type key and an outputstream value, but I'm sure there's a better way to do it.
Thanks!
A Unix-like system will typically have a limit on the number of file handles open at any given time; on my Linux, for example, it's currently at 1024, though I could change it within reason. But there are good reasons for these limits, as open files are a burden to the system.
You haven't yet responded to my question on whether there are multiple occurrences of the same key in your input, meaning that several separate batches of data may need to be concatenated into each file. If this isn't the case, Pace's answer would easily be the best you can do, as all of that work needs to be done anyway and there's no sense in setting up a huge administration around such a simple sequence of events.
But if there are multiple messages in your input for the same key, it would be efficient to keep a large number of files open. I'd advise against trying to keep all 6,000 open at once, though. Instead, I'd go for something like 500, opened on a first-come-first-served basis; i.e. you open files for the first 500 (or so) distinct message keys, then chew through your entire input file looking for data to add to those 500, and close them all upon hitting EOF on the input. You will also need to keep a HashSet of keys already processed, because you then proceed to re-read your input file, processing the next batch of 500 keys you didn't catch on the first round.
Rationale: Opening and closing a file is (usually) a costly operation; you do NOT want to open and close thousands of files more than once each if you can help it. So you keep as many handles open as possible, all of which end up filled on a single pass through your input. On the other hand, streaming sequentially through a single input file is quite efficient, and even if you have to make 12 passes through your input file, the time to do so will be almost negligible compared to the time needed to open/close 6000 other files.
Pseudocode:
processedSet = [ ]
keysWaiting = true
MAXFILE = 500
handlesMap = [ ]
while (keysWaiting) {
    keysWaiting = false
    open/rewind input file
    while (not EOF(input file)) {
        read message
        if (handlesMap.containsKey(messageKey)) {
            write data to handlesMap.get(messageKey)
        } else if (processedSet.contains(messageKey)) {
            continue // already handled in an earlier pass
        } else if (handlesMap.size < MAXFILE) {
            handlesMap.put(messageKey, new FileOutputStream(messageKey + ".dat"))
            processedSet.add(messageKey)
            write data to handlesMap.get(messageKey)
        } else {
            keysWaiting = true
        }
    }
    for all handlesMap.values() {
        close file handle
    }
    handlesMap.clear()
}
You might not need a hash map. You could just...
Read a message
Open the new file in append mode
Write the message to the new file
Close the new file
Not sure if this would be faster, though, because you'd be doing a lot of opens and closes.
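Something like this, as a sketch (the message type and payload are placeholders for whatever your parser hands you):
import java.io.FileOutputStream;
import java.io.IOException;

static void writeMessage(String messageType, byte[] payload) throws IOException {
    // append mode, so earlier messages of the same type stay in place
    try (FileOutputStream out = new FileOutputStream(messageType + ".dat", true)) {
        out.write(payload);
    }
}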
I'd recommend some kind of intelligent pooling: keep the largest/most frequently used files open to improve performance and close the rest to conserve resources.
If the main file is made up mostly of record types 1-5, keep those files open as long as they're needed. The others can be opened and closed as required so that you don't starve the system of resources.
I'm going to make some assumptions about your question:
Each message starts with the message type, as a fixed-size field
You have a heterogeneous input file containing a mixture of messages; you want N homogeneous output files, where each output file contains only the messages of a given type.
The approach that jumps to mind is functor based: you create a mapping of message types to objects that handle that particular message. Your main() is a dispatch loop that reads the fixed message header, finds the appropriate functor from the map, then calls it.
You probably won't be able to hold 6,000 files (one per message type) open at once; most operating systems have a limit of around 1,024 simultaneous open files (although with Linux you can change the kernel parameters that control this). So this implies that you'll be opening and closing files repeatedly.
Probably the best approach is to set a fixed-count buffer on every functor, so that it opens, writes, and closes after, say, 10 messages. If your messages are at most 50 bytes, then that's 500 bytes (10 x 50) x 6,000 files, about 3 MB, that will remain in memory at any given time.
I'd probably write my functors to hold fixed-size byte arrays, and create a generic functor class that reads N bytes at a time into that array:
public class MessageProcessor
{
    int _msgSize;                   // the number of bytes to read per message
    byte[] _buf = new byte[1024];   // bigger than I said, but it's only 6 Mb total
    int _curSize;                   // when this approaches _buf.length, write

    // ... methods to append a message and flush to <type>.dat would go here ...
}
There are usually limits on open files in the system, and in any case accessing thousands of little files in a more or less random order is going to bog your system down very badly.
Consider breaking the large file up into a file (or some sort of in-memory table, if you've got the memory) of individual messages, and sorting that by message type. Once that is done, write the message out to their appropriate files.
Since you're doing many small writes to many files you want to minimize the number of writes, especially given that the simplest design would pretty much guarantee that each new write would involve a new file open/close.
Instead, why not map each key to a buffer? At the end, write each buffer to disk. Or, if you're concerned that you'll be holding too much memory, you could structure your buffers to write every 1K, or 5K, or however many lines. E.g.
public class HashLogger {
    private HashMap<String, MessageBuffer> logs = new HashMap<>();

    public void write(String messageKey, String message)
    {
        if (!logs.containsKey(messageKey)) { logs.put(messageKey, new MessageBuffer(messageKey)); }
        logs.get(messageKey).write(message);
    }

    public void flush()
    {
        for (MessageBuffer buffer : logs.values())
        {
            buffer.flush();
        }
        // ...flush all the buffers when you're done...
    }

    private class MessageBuffer {
        private MessageBuffer(String name) { ... }
        void flush() { ... something here to write to a file specified by name ... }
        void write(String message) {
            //... something here to add to an internal buffer, or StringBuilder, or whatever...
            //... you could also have something here that flushes if the internal builder gets larger than N lines ...
        }
    }
}
You could even create separate Log4j loggers, which can be configured to use buffered logging; I'd be surprised if more modern logging frameworks like slf4j didn't support this as well.