Caching large Arrays to SQLite - Java / Android

I'm currently developing a system (on Android) where the user will end up with large arrays in memory. The JVM heap is at risk of running out, so to prevent this I was thinking of creating a temporary database and storing the data there. One concern is that the SD card is limited in the number of reads and writes it can handle; another is the overhead of such an operation. Can anyone clear up these concerns, and also suggest a good alternative for handling large arrays? (In the end these arrays will be written to a CSV file and uploaded to a website.)
Thanks,
Faisal

A couple of thoughts:
You could store them using a DBMS like Derby, which is built into many versions of Java
You could write them through a compressed output stream to a byte array or file - this works especially well if the data compresses easily, i.e. regularly repeating numbers, text, etc. (a sketch follows this list)
You could upload portions of the arrays at a time, i.e. as you generate them, begin uploading the data to the server in chunks
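For the second option, here is a minimal sketch, assuming the data arrives as double[] chunks and a gzip-compressed temporary file is acceptable; the class and method names are illustrative, not a specific library:

    import java.io.*;
    import java.util.zip.GZIPOutputStream;

    // Illustrative sketch: stream array chunks into a gzip-compressed temp file
    // instead of holding everything on the heap. Names and sizes are assumptions.
    public class CompressedArrayCache {
        private final File tempFile;
        private final DataOutputStream out;

        public CompressedArrayCache(File cacheDir) throws IOException {
            tempFile = File.createTempFile("arrays", ".gz", cacheDir);
            out = new DataOutputStream(
                    new GZIPOutputStream(
                            new BufferedOutputStream(new FileOutputStream(tempFile), 8192)));
        }

        // Append one chunk of the large array; the heap only ever holds one chunk.
        public void appendChunk(double[] chunk) throws IOException {
            out.writeInt(chunk.length);
            for (double v : chunk) {
                out.writeDouble(v);
            }
        }

        // Finish writing; the compressed file can later be read back or turned into CSV.
        public File finish() throws IOException {
            out.close();
            return tempFile;
        }
    }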

Related

Impact of writing to multiple opened files

I am trying to optimize the logging system of an Android application which causes some unwanted latency. There are multiple files opened which log different parts and should be kept separate.
I am not very familiar with low-level filesystem design, and even less with the current flash and/or SSD memory used in mobile phones (as opposed to traditional HDDs). I assume that memory is organized in disk blocks (512 B, or 4096 B more recently) and that some form of contiguous, linked, or indexed allocation is used.
I am using BufferedOutputStreams with a buffer size of 256 B, but this value was chosen arbitrarily (this provides a good answer on buffer size).
Does writing in append mode to multiple open files create additional overhead that can significantly decrease performance (from the allocation strategy, for example)? Is it greatly influenced by the output buffer size (in this particular case of multiple files)?
I am using Android, which tends to run a variety of filesystems, and that makes it hard to understand how each one influences appending to multiple open files. The I/O functions of Java (or any other language) are probably very similar.
My search for this particular issue turned up empty; maybe I need some domain-specific search terms that I am not familiar with.
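For reference, a minimal sketch of the setup being described, assuming log lines arrive as strings and the buffer size is a parameter to experiment with; class and file names here are illustrative:

    import java.io.*;

    // Sketch only: several log channels, each backed by a FileOutputStream in
    // append mode wrapped in a BufferedOutputStream whose buffer size can be
    // varied for benchmarking.
    public class MultiFileLogger {
        private final BufferedOutputStream[] channels;

        public MultiFileLogger(File dir, String[] names, int bufferSize) throws IOException {
            channels = new BufferedOutputStream[names.length];
            for (int i = 0; i < names.length; i++) {
                // true = append mode
                FileOutputStream fos = new FileOutputStream(new File(dir, names[i]), true);
                channels[i] = new BufferedOutputStream(fos, bufferSize);
            }
        }

        public void log(int channel, String line) throws IOException {
            channels[channel].write((line + "\n").getBytes("UTF-8"));
        }

        public void close() throws IOException {
            for (BufferedOutputStream ch : channels) {
                ch.flush();
                ch.close();
            }
        }
    }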

Writing large data amounts to file quickly in Android

I have an issue in Java that I've been working on for about a week. I'm trying to find an efficient way to write around 90-120 MB/s to external storage on Android without dropping data. In my tests I have been getting 18-20 MB/s, which is the normal speed range for the external storage. However, since the per-process heap in Android is 140-256 MB (I get 138 MB max using the largeHeap flag), it's impossible to make a large enough buffer to keep the data intact using a Java linked list or arrays (the data is generated at 90+ MB/s). Is JNI a good option to create a large enough circular buffer or linked list to hold the data while the file write catches up? I saw another post where someone suggested running a service in a separate process to get "extra" memory. I'm just worried about serialization slowing down the transfer of the generated data. I've deleted all of my code so far in frustration. Thanks in advance.
To write 90-120 MB/s you need a device which actually supports that data rate. Note: a typical HDD can't sustain this much.
You can add some extra buffering, but that is not sustainable. I suggest you instead find a way to represent the same information in less data, e.g. by compressing it.
BTW, JNI won't make your device run faster or give you more memory.
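As an illustration of "some extra buffering" combined with compression, here is a hedged sketch assuming the data arrives as byte[] chunks; the queue depth, chunk size, and class names are assumptions to tune against the real data rate:

    import java.io.*;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.zip.Deflater;
    import java.util.zip.DeflaterOutputStream;

    // Sketch only: bounded hand-off between the data producer and a single writer
    // thread that compresses before hitting storage.
    public class CompressingWriter implements Runnable {
        private static final byte[] POISON = new byte[0];            // end-of-stream marker
        private final BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(64);
        private final DeflaterOutputStream out;

        public CompressingWriter(File target) throws IOException {
            out = new DeflaterOutputStream(
                    new BufferedOutputStream(new FileOutputStream(target), 1 << 20),
                    new Deflater(Deflater.BEST_SPEED));
        }

        // Called from the producer; blocks when the writer falls behind.
        public void submit(byte[] chunk) throws InterruptedException {
            queue.put(chunk);
        }

        public void finish() throws InterruptedException {
            queue.put(POISON);
        }

        @Override
        public void run() {
            try {
                while (true) {
                    byte[] chunk = queue.take();
                    if (chunk == POISON) break;
                    out.write(chunk);
                }
                out.close();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }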

How to return a lot of data from the database to a web client?

I have the following problem:
I have a web application that stores data in the database. I would like the clients to be able to extract the data of, e.g., 2 tables to a file local to the client.
The database could be arbitrarily big (meaning I have no idea how much data could potentially be in it; it could be huge).
What is the best approach for this?
Should all the data be SELECTed out of the tables and returned to the client as a single structure to be stored in a file?
Or should the data be retrieved in parts, e.g. the first 100 entries, then the next 100, etc., and the single structure assembled on the client?
Are there any pros and cons to consider here?
I've built something similar - there are some really awkward problems here, especially as the filesize can grow beyond what you can comfortably handle in a browser. As the amount of data grows, the time to generate the file increases; this in turn is not what a web application is good at, so you run the risk of your web server getting unhappy with even a smallish number of visitors all requesting a large file.
What we did is split the application into 3 parts.
The "file request" was a simple web page, in which authenticated users can request their file. This kicks off the second part outside the context of the web page request:
File generator.
In our case, this was a Windows service which looked at a database table of file requests, picked the latest one, ran the appropriate SQL query, wrote the output to a CSV file, and ZIPped that file before moving it to the output directory and mailing the user a link. It set the state of the record in the database to make sure only one process ran at any one point in time.
FTP/WebDAV site:
The ZIP files were written to a folder which was accessible via FTP and WebDAV - these protocols tend to do better with huge files than a standard HTTP download.
This worked pretty well - users didn't like to wait for their files, but the delay was rarely more than a few minutes.
We have a similar use case with an Oracle cluster containing approx. 40 GB of data. The solution that works best for us is to fetch the maximum amount of data per select statement, as this reduces DB overhead significantly.
That being said, there are three optimizations which worked very well for us:
1.) We partition the data into 10 roughly same-sized sets and select them from the database in parallel. For our cluster we found that 8 connections in parallel work approx. 8 times faster than a single connection. There is some additional speedup up to 12 connections, but that depends on your database and your DBA.
2.) Keep away from Hibernate or other ORMs and use hand-written JDBC once you are dealing with large amounts of data. Use all the optimizations you can get there (e.g. ResultSet.setFetchSize()).
3.) Our data compresses very well, and putting it through a gzip stream saves a lot of I/O time. In our case it removed I/O from the critical path. By the way, this is also true for storing the data in a file. A sketch combining 2.) and 3.) follows below.
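A rough sketch of points 2.) and 3.) together, using plain JDBC with a large fetch size and streaming the rows into a gzip-compressed CSV; the query, table, and column names are placeholders:

    import java.io.*;
    import java.sql.*;
    import java.util.zip.GZIPOutputStream;

    // Sketch only: the full result set never sits in memory; rows are streamed
    // from the ResultSet straight into a compressed CSV file.
    public class CsvExporter {
        public static void export(Connection conn, File target) throws Exception {
            try (PreparedStatement ps = conn.prepareStatement(
                         "SELECT id, name, price FROM products");     // placeholder query
                 Writer out = new OutputStreamWriter(
                         new GZIPOutputStream(new FileOutputStream(target)), "UTF-8")) {

                ps.setFetchSize(10_000);                               // stream in large batches
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        out.write(rs.getLong("id") + ","
                                + rs.getString("name") + ","
                                + rs.getBigDecimal("price") + "\n");
                    }
                }
            }
        }
    }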

Memory-Mapped MappedByteBuffer or Direct ByteBuffer for DB Implementation?

This looks like a long question because of all the context. There are 2 questions inside the novel below. Thank you for taking the time to read this and provide assistance.
Situation
I am working on a scalable datastore implementation that can support working with data files from a few KB to a TB or more in size on a 32-bit or 64-bit system.
The datastore utilizes a Copy-on-Write design; always appending new or modified data to the end of the data file and never doing in-place edits to existing data.
The system can host 1 or more databases, each represented by a file on disk.
The details of the implementation are not important; the only important detail being that I need to constantly append to the file and grow it from KB, to MB, to GB to TB while at the same time randomly skipping around the file for read operations to answer client requests.
First-Thoughts
At first glance I knew I wanted to use memory-mapped files so I could push the burden of efficiently managing the in-memory state of the data onto the host OS and out of my code.
Then all my code needs to worry about is serializing the append-to-file operations on-write, and allowing any number of simultaneous readers to seek in the file to answer requests.
Design
Because the individual data-files can grow beyond the 2GB limit of a MappedByteBuffer, I expect that my design will have to include an abstraction layer that takes a write offset and converts it into an offset inside of a specific 2GB segment.
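For illustration, a minimal sketch of such an abstraction layer, keeping a list of fixed-size mapped segments and translating a global offset into (segment index, local offset); the 1 GB segment size is an assumption, not a recommendation:

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.util.ArrayList;
    import java.util.List;

    // Sketch only: a list of fixed-size mapped segments over one file.
    // The RandomAccessFile must be opened in "rw" mode for READ_WRITE mappings.
    public class SegmentedFile {
        private static final long SEGMENT_SIZE = 1L << 30;            // 1 GB per mapping
        private final FileChannel channel;
        private final List<MappedByteBuffer> segments = new ArrayList<>();

        public SegmentedFile(RandomAccessFile file) {
            this.channel = file.getChannel();
        }

        // Map any segments needed to cover the given global offset.
        private MappedByteBuffer segmentFor(long offset) throws IOException {
            int index = (int) (offset / SEGMENT_SIZE);
            while (segments.size() <= index) {
                long start = (long) segments.size() * SEGMENT_SIZE;
                segments.add(channel.map(FileChannel.MapMode.READ_WRITE, start, SEGMENT_SIZE));
            }
            return segments.get(index);
        }

        // Read one byte at a global offset; real code would handle reads
        // that straddle a segment boundary.
        public byte get(long offset) throws IOException {
            return segmentFor(offset).get((int) (offset % SEGMENT_SIZE));
        }
    }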
So far so good...
Problems
This is where I started to get hung up and think that going with a different design (proposed below) might be the better way to do this.
From reading through 20 or so "memory mapped" related questions here on SO, it seems mmap calls want contiguous runs of address space when allocated. So, for example, on a 32-bit host OS, if I try to mmap a 2GB file, my chances of the mapping succeeding are slim due to memory fragmentation, and I should instead use something like a series of 128MB mappings to pull an entire file in.
When I think of that design, even with, say, 1024MB mmap sizes, a DBMS hosting a few huge databases each represented by 1TB files would leave me with thousands of memory-mapped regions. In my own testing on Windows 7, trying to create a few hundred mmaps across a multi-GB file, I didn't just run into exceptions: I actually got the JVM to segfault every time I tried to allocate too much, and in one case the video on my Windows 7 machine cut out and re-initialized with an OS error popup I'd never seen before.
Regardless of the argument of "you'll never likely handle files that large" or "this is a contrived example", the fact that I could code something up with those kinds of side effects put my internal alarm on high alert and made me consider an alternative implementation (below).
BESIDES that issue, my understanding of memory-mapped files is that I have to re-create the mapping every time the file is grown, and in the case of this append-only file, it is literally constantly growing.
I can combat this to some extent by growing the file in chunks (say 8MB at a time) and only re-create the mapping every 8MB, but the need to constantly be re-creating these mappings has me nervous especially with no explicit unmap feature supported in Java.
Question #1 of 2
Given all of my findings up to this point, I would consider memory-mapped files a good fit primarily for read-heavy or read-only solutions, but not for write-heavy ones, given the need to re-create the mapping constantly.
But then I look around at the landscape, with solutions like MongoDB embracing memory-mapped files all over the place, and I feel like I am missing some core component here (I do know it allocates in something like 2GB extents at a time, so I imagine they are working around the re-map cost with this logic AND helping to maintain sequential runs on-disk).
At this point I don't know if the problem is Java's lack of an unmap operation that makes this so much more dangerous and unsuitable for my uses or if my understanding is incorrect and someone can point me North.
Alternative Design
An alternative design to the memory-mapped one proposed above that I will go with if my understanding of mmap is correct is as follows:
Define a direct ByteBuffer of a reasonable, configurable size (roughly 2, 4, 8, 16, 32, 64, or 128KB), making it easily compatible with any host platform (no need to worry about the DBMS itself causing thrashing scenarios), and, using the original FileChannel, perform specific-offset reads of the file one buffer-capacity chunk at a time, forgoing memory-mapped files entirely.
The downside being that now my code has to worry about things like "did I read enough from the file to load the complete record?"
Another downside is that I don't get to make use of the OS's virtual memory logic, letting it keep more "hot" data in memory for me automatically; instead I just have to hope the file cache logic employed by the OS is big enough to do something helpful for me here.
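For comparison, a minimal sketch of this alternative, using one reusable direct buffer and FileChannel positional reads; the 64 KB buffer size is an assumption:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    // Sketch only: one reusable direct buffer per reader and positional reads,
    // leaving caching entirely to the OS page cache.
    public class DirectReader {
        private final FileChannel channel;
        private final ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024);

        public DirectReader(FileChannel channel) {
            this.channel = channel;
        }

        // Read up to buffer.capacity() bytes starting at the given file offset.
        // The caller still has to check whether the whole record was read.
        public ByteBuffer readAt(long offset) throws IOException {
            buffer.clear();
            channel.read(buffer, offset);   // positional read; does not move the channel's position
            buffer.flip();
            return buffer;
        }
    }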
Question #2 of 2
I was hoping to get a confirmation of my understanding of all of this.
For example, maybe the file cache is fantastic enough that in both cases (memory-mapped or direct reads) the host OS will keep as much of my hot data available as possible, and the performance difference for large files is negligible.
Or maybe my understanding of the sensitive requirements for memory-mapped files (contiguous memory) is incorrect and I can ignore all that.
You might be interested in https://github.com/peter-lawrey/Java-Chronicle
In this I create multiple memory mappings to the same file (each mapping's size is a power of 2, up to 1 GB). The file can be any size (up to the size of your hard drive).
It also creates an index so you can find any record at random and each record can be any size.
It can be shared between processes and used for low latency events between processes.
I make the assumption you are using a 64-bit OS if you want to use large amounts of data. In this case a List of MappedByteBuffer will be all you ever need. It makes sense to use the right tools for the job. ;)
I have found it performs well even with data sizes around 10x your main memory size (I was using a fast SSD drive, so YMMV).
I think you shouldn't worry about mmap'ping files up to 2GB in size.
Looking at the source of MongoDB as an example of a DB making use of memory-mapped files, you'll find it always maps the full data file in MemoryMappedFile::mapWithOptions() (which calls MemoryMappedFile::map()). DB data spans multiple files, each up to 2GB in size. It also preallocates data files, so there's no need to remap as the data grows, and this prevents file fragmentation. In general, you can take inspiration from the source code of this DB.
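A small sketch of the preallocation idea (not MongoDB's actual code): grow the file to a fixed extent size up front and map it once, so appends within the extent never force a remap; the extent size is whatever you choose:

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    // Sketch only: preallocate the whole extent, map it once, and only add a new
    // extent when the current one is full.
    public class Extent {
        public static MappedByteBuffer preallocate(RandomAccessFile file, long extentSize)
                throws IOException {
            file.setLength(extentSize);                                // preallocate on disk
            return file.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, extentSize);
        }
    }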

Best practice for storing large amounts of data with J2ME

I am developing a J2ME application that has a large amount of data to store on the device (in the region of 1MB, but variable). I can't rely on the file system, so I'm stuck with the Record Management System (RMS), which allows multiple record stores, but each has a limited size. My initial target platform, Blackberry, limits each to 64KB.
I'm wondering if anyone else has had to tackle the problem of storing a large amount of data in the RMS, and how they managed it? I'm thinking of having to calculate record sizes and split one data set across multiple stores if it's too large, but that adds a lot of complexity to keep it intact.
There are lots of different types of data being stored, but only one set in particular will exceed the 64KB limit.
For anything past a few kilobytes you need to use either JSR 75 or a remote server. RMS records are extremely limited in size and speed, even in some higher end handsets. If you need to juggle 1MB of data in J2ME the only reliable, portable way is to store it on the network. The HttpConnection class and the GET and POST methods are always supported.
On handsets that support JSR 75 FileConnection it may be a valid alternative, but without code signing it is a user-experience nightmare. Almost every single API call will invoke a security prompt with no blanket permission choice. Companies that deploy apps with JSR 75 usually need half a dozen binaries for every port just to cover a small part of the possible certificates. And this is just for the manufacturer certificates; some handsets only have carrier-locked certificates.
RMS performance and implementation varies wildly between devices, so if platform portability is a problem, you may find that your code works well on some devices and not others. RMS is designed to store small amounts of data (High score tables, or whatever) not large amounts.
You might find that some platforms are faster with files stored in multiple record stores. Some are faster with multiple records within one store. Many are ok for storage, but become unusably slow when deleting large amounts of data from the store.
Your best bet is to use JSR-75 instead where available, and create your own file store interface that falls back to RMS if nothing better is supported.
Unfortunately when it comes to JavaME, you are often drawn into writing device-specific variants of your code.
I think the most flexible approach would be to implement your own file system on top of the RMS. You can handle the RMS records in a similar way to blocks on a hard drive and use an inode structure or similar to spread logical files over multiple blocks. I would recommend implementing a byte- or stream-oriented interface on top of the blocks, and then possibly adding another API layer on top of that for writing special data structures (or simply making your objects serializable to the data stream).
Tanenbaum's classical book on operating systems covers how to implement a simple file system, but I am sure you can find other resources online if you don't like paper.
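A minimal sketch of the block idea on top of RMS, splitting a byte array into fixed-size blocks spread over one record store and keeping the record IDs as a crude "inode"; the 4 KB block size is an assumption, and real code would also persist the ID table:

    import javax.microedition.rms.RecordStore;
    import javax.microedition.rms.RecordStoreException;

    // Sketch only: each block becomes one RMS record; the returned record IDs
    // act as the "inode" needed to reassemble the logical file later.
    public class BlockStore {
        private static final int BLOCK_SIZE = 4096;

        public static int[] writeBlocks(String storeName, byte[] data)
                throws RecordStoreException {
            RecordStore store = RecordStore.openRecordStore(storeName, true);
            try {
                int blockCount = (data.length + BLOCK_SIZE - 1) / BLOCK_SIZE;
                int[] recordIds = new int[blockCount];
                for (int i = 0; i < blockCount; i++) {
                    int offset = i * BLOCK_SIZE;
                    int length = Math.min(BLOCK_SIZE, data.length - offset);
                    recordIds[i] = store.addRecord(data, offset, length);
                }
                return recordIds;                     // block index -> record ID
            } finally {
                store.closeRecordStore();
            }
        }
    }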
Under Blackberry OS 4.6 the RMS store size limit has been increased to 512KB, but this isn't much help as many devices will likely not have support for 4.6. The other option on Blackberry is the Persistent Store, which has a record size limit of 64KB but no limit on the size of the store (other than the physical limits of the device).
I think Carlos and izb are right.
It is quite simple: use JSR 75 (FileConnection) and remember to sign your MIDlet with a valid (trusted) certificate.
For read-only access I'm arriving at acceptable times (within 10 s) by indexing a resource file. I've got two ~800KB CSV price list exports. The program classes and both those files compress into a 300KB JAR.
On searching I display a List and run two Threads in the background to fill it, so the first results come pretty quickly and are viewable immediately. I first implemented a simple linear search, but that was too slow (~2 min).
Then I indexed the file (which is alphabetically sorted) to find the beginning of each letter. Now, before parsing line by line, I first InputStreamReader.skip() to the desired position, based on the first letter. I suspect the delay comes mostly from decompressing the resource, so splitting the resources would speed it up further. I don't want to do that, so as not to lose the advantage of easy upgrades. The CSVs are exported without any preprocessing.
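A rough sketch of that indexing approach, assuming a precomputed table of character offsets per starting letter; the resource name and the index itself are placeholders:

    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.io.Reader;

    // Sketch only: skip to the section of the alphabetically sorted CSV that
    // starts with the query's first letter, then parse line by line from there.
    public class IndexedCsvSearch {
        // letterOffsets[c - 'A'] = character offset where entries starting with c begin
        private final long[] letterOffsets;

        public IndexedCsvSearch(long[] letterOffsets) {
            this.letterOffsets = letterOffsets;
        }

        public Reader openAt(char firstLetter) throws Exception {
            InputStream in = getClass().getResourceAsStream("/prices.csv"); // bundled resource
            Reader reader = new InputStreamReader(in, "UTF-8");
            reader.skip(letterOffsets[Character.toUpperCase(firstLetter) - 'A']);
            return reader;                         // caller parses line by line from here
        }
    }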
I'm just starting to code for JavaME, but have experience with old versions of PalmOS, where all data chunks are limited in size, requiring the design of data structures using record indexes and offsets.
Thanks everyone for the useful comments. In the end the simplest solution was to limit the amount of data being stored, implementing code that adjusts the data according to how large the store is and fetches data from the server on demand if it's not stored locally. It's interesting that the limit is increased in OS 4.6; with any luck my code will simply adjust on its own and store more data :)
Developing a J2ME application for Blackberry without using the .cod compiler limits the use of JSR 75 somewhat, since we can't sign the archive. As pointed out by Carlos, this is a problem on any platform, and I've had similar issues using the PIM part of it. The RMS seems to be incredibly slow on the Blackberry platform, so I'm not sure how useful an inode/B-tree file system on top would be, unless data was cached in memory and written to RMS in a low-priority background thread.
