A good way to store/read a large amount of strings?

I'm developing a "funny quotes" app for Android. I have over 1000 quotes that I want to use in the app, but I don't know whether I should use a database or a text file. Please note that the app should not show the same quote twice, and it has previous/next buttons, so I need to keep track of the quotes already shown. Please tell me which option is better and more efficient. Also, if you can, please link me to a good tutorial about storing and reading the data.
Thanks.

Use a database. It's faster and more flexible than a text file. One day you will extend the app, and then you will be glad you used a database. When the app starts, I recommend selecting all the rows in random order using your database's built-in random function. Loading 1000 rows should not take long. Then iterate through the resulting ArrayList (or whatever you choose to use) of strings: the first quote you show is element 0 of that list, the second is element 1, and so on. With this approach you won't need any other structure to keep track of used quotes; the index variable you use to walk the list is enough.
fetchAllRows on this page seems to be what you want for getting the data.
If you choose not to keep too much in memory, you could keep just a list of quote IDs that have been used so far. The last element of that list would be the current quote, and the previous elements would be what the user should see when they press the back button.
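The index-based approach can be sketched roughly as follows. This is only a minimal illustration: the class name QuoteNavigator is made up, and Collections.shuffle stands in for the random ordering that the database query would normally provide.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: walks a shuffled copy of the quotes with a single index. */
final class QuoteNavigator {
    private final List<String> quotes;
    private int index = 0; // position of the quote currently shown

    QuoteNavigator(List<String> allQuotes, Random random) {
        if (allQuotes.isEmpty()) {
            throw new IllegalArgumentException("no quotes");
        }
        this.quotes = new ArrayList<>(allQuotes);
        // Assumption: shuffling in memory replaces the database's random ordering.
        Collections.shuffle(this.quotes, random);
    }

    String current() {
        return quotes.get(index);
    }

    boolean hasNext() {
        return index < quotes.size() - 1;
    }

    boolean hasPrevious() {
        return index > 0;
    }

    /** Moves forward if possible; stays on the last quote otherwise. */
    String next() {
        if (hasNext()) {
            index++;
        }
        return current();
    }

    /** Moves back if possible; stays on the first quote otherwise. */
    String previous() {
        if (hasPrevious()) {
            index--;
        }
        return current();
    }
}
```

Because every quote appears exactly once in the shuffled list, no quote repeats until the list is exhausted, and the previous button simply decrements the index.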

If you will never read the same string twice, then I recommend not using the String class: its objects are immutable and will stick in the string pool waiting to be reassigned to a reference, which will never happen because you never read the same string twice.
Using a DB will overcomplicate things.
I suggest reading a flat file as bytes and translating them into StringBuilder objects, which keeps things simple while still avoiding intensive garbage collection.
I hope it helps.

Using a DB should be fine, as I think you would not want all the data in memory. You can keep all the quotes in the DB along with a flag that tracks whether each quote has been read (simply update it to true once the quote is read).
This way you can pick any quote whose flag is false.

Have you considered CsvJdbc? You get the benefit of simple CSV files with an easy upgrade path to a real database later, when you have a significant number of records.
1k records is quite small and, in my opinion, not enough to merit a database.

Related

Checking if a String array contains a specific string

I have a string array (String[] words) which gets populated at runtime and contains almost 400k elements. I tried to check whether the array contains a specific string:
boolean check = Arrays.asList(words).contains("code");
I know that "code" is in the array, but check is never true, as if the check isn't even taking place. I also tried a HashSet, with no success. Can anyone tell me where the problem is?

For the given problem the answer isn't really clear, as the question is a bit unclear. Most likely, your 400K elements don't contain "code" exactly (for example, you forgot to trim the input, and there are spaces or newlines around it).
But beyond that, a distinct non-answer: when you have 400K elements to search, doing Arrays.asList(words).contains("code"); is simply a bad idea.
Walking through 400K elements to figure out whether one entry matches a search word is terribly inefficient. If you wanted to look up only one word, why store all the data in memory? So you probably intend to search for different words over time. And each time, you want to iterate over 400K elements to find out whether a word is present?
Instead, you should invest in a more appropriate data structure. That could be a (Hash)Set, or a (Hash)Map, or even something beyond that (like a service with full-text search, using Solr, ElasticSearch, ...).
Seriously: repeatedly iterating over 400K elements on a mobile device is not something that your users will appreciate. Not at all.
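As a rough sketch of the HashSet suggestion combined with the trimming guess above: the class and method names here (WordLookup, buildIndex) are made up for illustration. The set is built once, and each lookup afterwards is a hash lookup instead of a scan over the whole array.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: builds a lookup set from the raw word array. */
final class WordLookup {
    /**
     * Copies the words into a HashSet, trimming each entry so that stray
     * spaces or newlines around a word cannot hide a match.
     */
    static Set<String> buildIndex(String[] words) {
        Set<String> index = new HashSet<>(words.length * 2);
        for (String word : words) {
            if (word != null) {
                index.add(word.trim());
            }
        }
        return index;
    }
}
```

Usage would be something like Set<String> index = WordLookup.buildIndex(words); followed by index.contains("code") for every search. If the lookup still fails after trimming, the problem is more likely in how the array is populated.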

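Try the code below, which checks whether any element contains "code" (so surrounding spaces or newlines don't matter). I hope it helps.
boolean check = Arrays.stream(words).anyMatch(w -> w.matches("(?s).*code.*"));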

I want to preserve my data during service restart, but my data is not in simple variable name-value or table format. How should I go about this?

I want to preserve data across service restarts. The service uses an ArrayList of ArrayLists of integers and some other variables.
Since the data is about 40-60 MB, I don't want to regenerate it each time the service restarts (that takes a lot of time); I want to generate the data once and reuse a copy on the next restart.
How can it be done?
Before suggesting that I write the data to a file, please consider how I would put a data structure similar to a multidimensional array (3D or above) into a file; once written, it will likely take significant time to read back, too.

You can try writing your data after generation to a file. Then on next service restart, you can simply read that from the file.

If you need persistent data, then put it into a database:
https://developer.android.com/guide/topics/data/data-storage
or try some object database like http://objectbox.io/

So you're afraid that reading from the file would take a long time due to its size and the number and size of the rows (the inner arrays).
I think it might be worth stopping for a minute to ask yourself whether you need all this data at once. Maybe you only need a portion of it at any given time, and there are scenarios in which you don't use some (or maybe most) of the data? If this is likely, I would suggest computing the data on demand, when required, and keeping only an in-memory cache for future requests in the current session.
Otherwise, if you do need all the data at a given time, you have a trade-off between size on disk and processing time. You can shrink the data using some compression algorithm, but at the expense of processing time. On the other hand, you can just serialize your data object and save it to disk as is: less time, more disk space.
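For the "serialize as is" option, a minimal sketch for a list of integer lists could look like this. The class name NestedListStore and the length-prefixed layout are assumptions made for illustration, not something from the question; the idea is that each level of nesting is written as a count followed by its elements, which extends naturally to deeper nesting.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: length-prefixed binary format for a list of integer lists. */
final class NestedListStore {
    /** Writes the row count, then each row as its length followed by its values. */
    static void write(List<List<Integer>> data, OutputStream out) throws IOException {
        DataOutputStream dos = new DataOutputStream(new BufferedOutputStream(out));
        dos.writeInt(data.size());
        for (List<Integer> row : data) {
            dos.writeInt(row.size());
            for (int value : row) {
                dos.writeInt(value);
            }
        }
        dos.flush(); // the caller owns and closes the underlying stream
    }

    /** Reads back exactly what write() produced. */
    static List<List<Integer>> read(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(new BufferedInputStream(in));
        int rows = dis.readInt();
        List<List<Integer>> data = new ArrayList<>(rows);
        for (int r = 0; r < rows; r++) {
            int length = dis.readInt();
            List<Integer> row = new ArrayList<>(length);
            for (int i = 0; i < length; i++) {
                row.add(dis.readInt());
            }
            data.add(row);
        }
        return data;
    }
}
```

In the service this would be called with a FileOutputStream after the data is generated and a FileInputStream on restart. Each integer takes 4 bytes on disk, and the buffered streams keep reading close to a sequential scan of the file.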
Another solution for your scenario could be to just use a DB and a cursor (Room on top of SQLite). I don't know exactly what you're trying to do, but your arrays can easily be modeled in a DB. Model a single row as you'd like and add the outer index of the array to that model. Then save the models into the DB, potentially making the outer index field the primary key of the table.
Regardless of the above, consider whether you really need this data persisted on the client; maybe you can store it on the server side? If so, there are other storage and access solutions that are not part of the Android client.

Thank you all for answering this question.
This is what I have finally settled for:
Instead of using the structure as part of the app, I made this into a
tool, which will prepare data to be used with the main app. In doing
so, it also stopped the concern regarding service restart.
This tool will first read all the strings from input file(s).
Then put all of them into the structure one at a time.(This will be
the part which I was having doubts, and asked the question about.
Since all the data is into the structure here, as soon as program
terminates, this structured data is unusable.)
Next, I prepared another structure for writing this data to a file,
and wrote all the data out so that I do not need to read all the
input files again and again, but only a few lines.
Then I thought, why spend time "read"ing files while I can hard code
it into my app. So, as final step of this preprocessing tool, I made
it into a class which has switch(input){case X: return Y}.
Now I will just have to put this class into the app I wanted to make.
I know this all sounds very abstract, even stretching the concept of abstract. If you want to know the details, please let me know. I am also including a link to my "tool". Please take a look and let me know whether there would have been a better way.
P.S. There may still be errors in this tool; if you find any, let me know so I can fix them.
P.P.S.
link: Kompressor Tool

Speed a search cache without using too much memory

I have to access a database with 380,000 entries. I don't have write access to the DB, I can just read it. I've made a search function using a map to search for users by firstname. Here is my process:
1 - Load everything from the DB
2 - Store everything into a Map<Character, ArrayList<User>>, using the letters of the alphabet as keys so that users are grouped by the first letter of their first name.
<A> {Alba, jessica, Alliah jane, etc ...}
<B> {Birsmben bani, etc ...}
When someone searches for a user, I take the first letter of the first name typed and use map.get(firstLetter), then iterate over the ArrayList to find all matching users.
The Map takes up a lot of memory, I guess (380,000 User objects). I had to increase the heap size.
I want to make it faster, for example by using the first name as the key of the Map (there are many people with the same first name).
I have two solutions in mind:
1 - Still use a map with firstname as key (increasing the heap size again?)
2 - Use files on disk instead of a Map (Alba.dat would contain every Alba, for example) and open the right file for each search. No need to increase the heap size, but are there any side effects?
Which one is better? (pros and cons)
Update with more info
It's a database of customers who call our customer service line. The person who takes the call has to search by the customer's name (usually first name, then last name). Searching in the DB is too slow. The solution I've implemented is already much faster (1/2 seconds vs 26 seconds using the DB), but I want to improve it.

IMHO, you don't have to cache all the entries in memory, only a part of them:
Maybe just use a ring buffer, or
More complicated, but more sensible: implement an LFU cache that keeps only the N most frequently accessed items. See this question for a hint on how to implement such a cache.

There are several issues with your approach:
It assumes that the number of users doesn't change. A good application design works with any number of users without software changes.
It assumes that the current problem is the only one. What happens if the next requirement is "search by caller ID" or "search by zip code"?
It reinvents the wheel: you are starting to write a database, index, or information retrieval solution (whatever you want to call it) from scratch.
The right thing to do is to export the user data into a database engine that provides proper search capabilities. The export/extraction can hopefully be sped up if you have modification timestamps, or if you can intercept updates and reapply them to your search index.
What you use for your search does not matter too much; a simple database on a modern system is fast enough. Most also provide indexing capabilities to speed up your search. If you want something that can be embedded in your application, is specialized for search, and solves the problems above, I'd recommend Lucene.
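To make concrete what an index on the first name buys you, here is a small in-memory sketch. It is not Lucene and not a database, just a sorted map standing in for one; the class name FirstNameIndex and the use of plain user IDs instead of User objects are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

/** Hypothetical stand-in for a database index on the first-name column. */
final class FirstNameIndex {
    // Sorted by normalized first name, so exact and prefix lookups avoid a full scan.
    private final TreeMap<String, List<String>> idsByFirstName = new TreeMap<>();

    private static String normalize(String name) {
        return name.trim().toLowerCase(Locale.ROOT);
    }

    void add(String firstName, String userId) {
        idsByFirstName.computeIfAbsent(normalize(firstName), k -> new ArrayList<>()).add(userId);
    }

    /** Returns the IDs of all users with exactly this first name (case-insensitive). */
    List<String> findExact(String firstName) {
        return idsByFirstName.getOrDefault(normalize(firstName), Collections.emptyList());
    }

    /** Returns the IDs of all users whose first name starts with the prefix. */
    List<String> findByPrefix(String prefix) {
        String from = normalize(prefix);
        String to = from + Character.MAX_VALUE; // sorts after every name starting with the prefix
        List<String> hits = new ArrayList<>();
        for (List<String> ids : idsByFirstName.subMap(from, true, to, false).values()) {
            hits.addAll(ids);
        }
        return hits;
    }
}
```

Storing only IDs in the index and loading the full record on demand keeps the heap much smaller than holding 380,000 complete User objects. A database index or Lucene does the same job while also covering the other issues listed above.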

Fast way to alphabetically sort the contents of a file in java

Can anyone recommend a fast way to sort the contents of a text file based on the first X characters of each line?
For example, suppose the text file contains the following:
Adrian Graham some more text here
John Adams some more text here
Then another record needs to be inserted, e.g.
Bob Something some more text here
I need to keep the file sorted, but this is a rather big file and I'd rather not load it entirely into memory at once.
By big I mean about 500,000 lines, so perhaps not terribly huge.
I've had a search around and found http://www.codeodor.com/index.cfm/2007/5/14/Re-Sorting-really-BIG-files---the-Java-source-code/1208
and I wanted to know if anyone could suggest any other ways, for the sake of a second opinion.
My initial idea, before I read the article linked above, was:
Read the file
Split it into several files, e.g. A to Z
If a line begins with "a", it is written to the file called A.txt
Each of the files then has its contents sorted (no clear idea how just yet, apart from alphabetical order)
Then, when it comes to reading data, I know that if I want to find a line which starts with A, I open A.txt
When inserting a new line the same thing applies, and I just append to the end of the file. Later, when there is time after the insert, I can invoke my sorting program to reorder the files that have had lines appended to them.
I realise that there are a few flaws in this, e.g. there won't be an even number of lines starting with each letter, so some files may be bigger than others.
That again is why I need a second opinion on how to approach this.
The current program is in Java, but any programming language could be used for an example that achieves this; I'll port what I need.
(If anyone's wondering, I'm not deliberately trying to give myself a headache by storing info this way; I inherited a painful little program which stores data in files instead of using some kind of database.)
Thanks in advance

You may also want to simply call the DOS "sort" command to sort the file. It is quick and will require next to no programming on your part.
In a DOS box, type help sort|more for the sort syntax and options.

500,000 lines shouldn't really be that much to sort. Read the whole thing into memory, and then sort it using the standard built-in functions. If you really find that these are too slow, then move on to something more complicated. 500,000 lines x about 60 bytes per line still only comes to 30 MB.
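A minimal sketch of the in-memory approach, sorting on the first X characters of each line. The class name PrefixLineSorter and the keyLength parameter are made up for illustration, and UTF-8 is an assumed file encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: sorts lines by their first keyLength characters. */
final class PrefixLineSorter {
    /** The sort key is the first keyLength characters, or the whole line if it is shorter. */
    static String keyOf(String line, int keyLength) {
        return line.length() <= keyLength ? line : line.substring(0, keyLength);
    }

    /** Returns a sorted copy; List.sort is stable, so lines with equal keys keep their order. */
    static List<String> sortByPrefix(List<String> lines, int keyLength) {
        List<String> sorted = new ArrayList<>(lines);
        sorted.sort(Comparator.comparing((String line) -> keyOf(line, keyLength)));
        return sorted;
    }

    /** Reads the whole file into memory, sorts it, and writes the result to another file. */
    static void sortFile(Path input, Path output, int keyLength) throws IOException {
        List<String> lines = Files.readAllLines(input, StandardCharsets.UTF_8); // encoding is an assumption
        Files.write(output, sortByPrefix(lines, keyLength), StandardCharsets.UTF_8);
    }
}
```

Inserting a new record then amounts to adding the line to the list and sorting again, or using Collections.binarySearch with the same comparator to find the insert position.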

Another option might be to read the file and put it in a lightweight DB (for example HSQLDB in file mode).
Then get the data back sorted and write it to a file. (Or simply migrate the program so that it uses a DB.)

The right way to manage a big matrix in Java

I'm working with a big matrix (not sparse) that contains about 10^10 doubles.
Of course I cannot keep it in memory, and I need just one row at a time.
I thought of splitting it into files, one row per file (which requires a lot of files), and reading a file every time I need a row. Do you know a more efficient way?

Why do you want to store it in different files? Can't you use a single file?
You could use the methods of the RandomAccessFile class to read from that file.
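A rough sketch of the single-file idea with RandomAccessFile, assuming the matrix is stored row by row as raw 8-byte doubles with no header. The class name MatrixFile and that file layout are assumptions for illustration. Since every row has the same length, the offset of a row is just the row number times the row size in bytes.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;

/** Hypothetical helper: a dense matrix stored row by row as fixed-length records. */
final class MatrixFile {
    /** Seeks to the start of the row and reads all of its columns in one call. */
    static double[] readRow(RandomAccessFile file, long row, int columns) throws IOException {
        byte[] raw = new byte[columns * Double.BYTES];
        file.seek(row * (long) raw.length); // fixed-length records: offset = row * rowBytes
        file.readFully(raw);
        double[] values = new double[columns];
        ByteBuffer.wrap(raw).asDoubleBuffer().get(values);
        return values;
    }

    /** Writes one row at its fixed offset, using the same big-endian layout as readRow. */
    static void writeRow(RandomAccessFile file, long row, double[] values) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Double.BYTES);
        buffer.asDoubleBuffer().put(values);
        file.seek(row * (long) buffer.capacity());
        file.write(buffer.array());
    }
}
```

Reading the row into a byte array with one readFully call and decoding it through a ByteBuffer avoids calling readDouble once per element, which matters when a row holds many thousands of values.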

So, 800KB per file, sounds like a good division. Nothing really stops you from using one giant file, of course. A matrix, at least one like yours that isn't sparse, can be considered a file of fixed length records, making random access a trivial matter.
If you do store it one file per row, I might suggest making a directory tree corresponding to decimal digits, so 0/0/0/0 through 9/9/9/9.
Considerations one way or the other...
is it being backed up? Do you have high-capacity backup media or something ordinary?
does this file ever change?
if it does change and it is backed up, does it change all at once or are changes localized?

It depends on the algorithms you want to execute, but I guess that in most cases a representation where each file contains some square or rectangular region would be better.
For example, matrix multiplication can be done recursively by breaking a matrix into submatrices.

If you are going to save it in a file, I believe serializing it will save space and time compared with storing it as text.
Serializing the doubles stores each one as 8 bytes (plus serialization overhead) and means that you do not have to convert the doubles to and from Strings when saving or loading the file.

I'd suggest using a disk-persistent cache like Ehcache. Just configure it to keep as many fragments of your matrix in memory as you like and it will take care of the serialization. All you have to do is decide how to fragment the matrix.
Another approach that comes to mind is Terracotta (which recently bought Ehcache, by the way). It gives you a large network-attached heap that can easily manage your 10^10 double values without your code having to care about it at all.
