How can I put >1000 strings in an array & make them searchable? - java

I'm teaching myself to code (Java and Android) and I'm working on creating an application for Android.
It's a language (grammar) application to analyze verbs.
What I currently have is a list of 1000 verbs, each with its own set of forms, and I'm stuck on finding a way to store them as strings without using a lot of string variables or an overly long array.
I was thinking of an array of strings, but I'm not sure an array holding 1000 strings is really practical.
I also thought of creating what I need in an Excel file and using it as storage that the app can search, showing the results found there in a TextView, but again I'm not quite sure this will work with Android.
Let's say I have the below 3 verbs in infinitive
Akl
ktv
hlk
Now the first verb can come in another 2 forms (Nakl - Hikel) and the other verbs too have their own forms.
What I want to do is: when a user types a verb, whether in past or present tense, for example "Akled" (past) or "hikeling" (present), the system strips the ending from the verb and then uses what is left (for example, akled ---> akl) to show the other forms. In this case, if the input is "akled", the system will use "akl" and show nakl - hikel.
Example:
User types "akled" in the text box and presses Analyse
System will do the following:
extract the stem of the verb (akl)
then, based on this stem, show the other forms, which are (nakl - hikel).
Is this doable with a huge number of verbs? Let's say each verb has only 2 other forms; then the 1000 verbs have 2000 other forms.

Don't worry about loading that many strings into memory. Strings are internally represented as arrays of characters, and the char type in Java takes 2 bytes. So, if you were to keep 100,000 strings (each 20 characters long), the total memory occupied by a String[] of 100,000 elements would be 100,000 * 20 * 2 = 4,000,000 bytes = 4 MB. The JVM heap size is usually in gigabytes, so you shouldn't worry about whether you can load this many strings into memory. Even if you load 10 times as many, i.e. 1,000,000 strings, you'll occupy only 40 MB of memory.
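For the lookup itself, a plain HashMap keyed by the verb stem is enough at this scale. A minimal sketch, using the made-up verbs and endings from the question (the second verb's forms are hypothetical placeholders):

```java
import java.util.HashMap;
import java.util.Map;

public class VerbAnalyzer {
    // Stem -> other forms; in practice this map would hold all 1000 verbs.
    public static final Map<String, String[]> FORMS = new HashMap<>();
    static {
        FORMS.put("akl", new String[] {"nakl", "hikel"});
        FORMS.put("ktv", new String[] {"niktav", "hiktiv"}); // hypothetical forms
    }

    // Strip a known ending ("ing", "ed", ...) to recover the stem, then look it up.
    public static String[] analyze(String input) {
        String word = input.toLowerCase();
        for (String ending : new String[] {"ing", "ed"}) {
            if (word.endsWith(ending)) {
                word = word.substring(0, word.length() - ending.length());
                break;
            }
        }
        return FORMS.get(word); // null if the stem is unknown
    }
}
```

With this, analyzing "Akled" strips "ed", finds the stem "akl", and returns its other forms.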

You can create your own string resource files in Android, where you can store all the strings used in the app: https://developer.android.com/guide/topics/resources/string-resource
The link above shows the format to use, where to put that file of strings in your Android project, and how to use it.
This link has an example of how to get a string array from resources:
https://www.android-examples.com/get-string-array-from-strings-xml-in-android/
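As a sketch, a string-array resource for the verbs might look like this (the file name and array name are just example choices); in code you would then read it with getResources().getStringArray(R.array.verb_stems):

```xml
<!-- res/values/arrays.xml (hypothetical example) -->
<resources>
    <string-array name="verb_stems">
        <item>akl</item>
        <item>ktv</item>
        <item>hlk</item>
    </string-array>
</resources>
```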

Related

extraction of multiple occurrences of variable data from large string

I have a very long string in a text file. It is basically the below string repeated around 1000 times (as one long string, not 1000 strings). The string has variables which change with each repetition (those in bold). I'd like to extract the variables in an automated way and return the output as either a CSV or a formatted txt file (Random Bank, Random Rate, Random Product). I can do this successfully using https://regex101.com, however it involves a lot of manual copy & paste. I'd like to write a bash script to automate extracting the information, but have had no luck attempting various grep commands. How can this be done? (I'd also consider doing it in Java.)
[{"AccountName":"Random Product","AccountType":"Variable","AccountTypeId":1,"AER":Random Rate,"CanManageByMobileApp":false,"CanManageByPost":true,"CanManageByTelephone":true,"CanManageInBranch":false,"CanManageOnline":true,"CanOpenByMobileApp":false,"CanOpenByPost":false,"CanOpenByTelephone":false,"CanOpenInBranch":false,"CanOpenOnline":true,"Company":"Random Bank","Id":"S9701Monthly","InterestPaidFrequency":"Monthly"
This is JSON-formatted data, which you can't parse with regular expression engines. Get a JSON parser. If this file is larger than, say, 1 GB, find one that lets you 'stream' (the term for parsing the input and dealing with the data as it parses, versus the more usual route of turning the entire input into an object; if the file is huge, that object would be huge and might run out of memory - hence the streaming aspect).
Here is one tutorial for Jackson-streaming.

Java Compress Multiple strings with the same rule

I'm creating an Android application that needs a massive database (70 MB, but the application has to work offline...). The largest table has two columns, a keyword and a definition. The definitions themselves are relatively short, usually under 2000 characters, so compressing each one individually wouldn't save me very much, since compression libraries store the rules for decompressing a string as part of the compressed data.
However if I could compress all of these strings with the same set of rules and then store just the compressed data in the DB and the rules elsewhere, I could save a lot of space. Does anyone know of a library that will let me do something like this?
Desired behavior:
public String getDefinition(String keyword) {
    DecompressionObject decompresser = new DecompressionObject(RULES_FILE);
    byte[] data = queryDatabase(keyword);
    return decompresser.decompress(data); // decompress the queried bytes, not the keyword
}
The "rules", as you call them, are not why you are getting limited compression efficacy. The Huffman code table that precedes the data in a deflate stream is around 80 bytes, which is not significant compared to your 2000-byte string.
What is limiting the compression efficacy is simply a lack of history from which to draw matching strings. The only place to look for matching strings is in the 2000 characters, and then only in the preceding characters at any point in the compression.
What you could do to improve compression would be to create a dictionary of common strings that would be used as history to precede each string you are compressing. Then that same dictionary is provided to the decompressor ahead of time for it to use to decompress each string. This assumes that there is some commonality of content in your ensemble of strings.
zlib provides these functions as deflateSetDictionary() and inflateSetDictionary(); in Java they are exposed as Deflater.setDictionary() and Inflater.setDictionary().
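A minimal sketch of the preset-dictionary approach using java.util.zip (the dictionary contents here are just an illustration; a real one would be built from strings common across your definitions):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DictCompress {
    // Shared dictionary: common substrings across all definitions (illustrative only).
    public static final byte[] DICT =
            "the a of to and in is definition word meaning".getBytes(StandardCharsets.UTF_8);

    public static byte[] compress(String text) {
        Deflater deflater = new Deflater();
        deflater.setDictionary(DICT);           // preset dictionary, set before the input
        byte[] input = text.getBytes(StandardCharsets.UTF_8);
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[input.length + 64];
        int len = deflater.deflate(buf);
        deflater.end();
        return Arrays.copyOf(buf, len);
    }

    public static String decompress(byte[] data) {
        try {
            Inflater inflater = new Inflater();
            inflater.setInput(data);
            byte[] buf = new byte[4096];
            int n = inflater.inflate(buf);
            // The stream header signals that a preset dictionary is required.
            if (inflater.needsDictionary()) {
                inflater.setDictionary(DICT);
                n = inflater.inflate(buf);
            }
            inflater.end();
            return new String(buf, 0, n, StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The dictionary lives outside the database, so each stored definition carries only the deflate stream itself.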

best way of loading a large text file in java

I have a text file, with a sequence of integer per line:
47202 1457 51821 59788
49330 98706 36031 16399 1465
...
The file has 3 million lines of this format. I have to load the file into memory, extract 5-grams out of it, and do some statistics on it. I do have a memory limitation (8 GB of RAM). I've tried to minimize the number of objects I create (I only have 1 class, with 6 float variables and some methods). Each line of the file generates a number of objects of this class (proportional to the number of words in the line). I'm starting to feel that Java is not a good way to do these things when C++ is around.
Edit:
Assume that each line produces (n-1) objects of that class, where n is the number of space-separated tokens in the line. So, considering an average of 10 words per line, each line maps to 9 objects on average, and there will be 9*3*10^6 objects. The memory needed is therefore: 9*3*10^6 * (8-byte object header + 6 * 4-byte floats) + (a Map<String, Object> and another Map<Integer, ArrayList<Object>>). I need to keep everything in memory, because there will be some mathematical optimization happening afterwards.
Reading/Parsing the file:
The best way to handle large files, in any language, is to try and NOT load them into memory.
In Java, have a look at MappedByteBuffer. It allows you to map a file into process memory and access its contents without loading the whole thing into your heap.
You might also try reading the file line-by-line and discarding each line after you read it - again to avoid holding the entire file in memory at once.
Handling the resulting objects
For dealing with the objects you produce while parsing, there are several options:
Same as with the file itself - if you can perform whatever you want to perform without keeping all of the objects in memory (while "streaming" the file), that is the best solution. You didn't describe the problem you're trying to solve, so I don't know if that's possible.
Compression of some sort - switch from wrapper objects (Float) to primitives (float), use something like the flyweight pattern to store your data in giant float[] arrays and only construct short-lived objects to access it, or find some pattern in your data that allows you to store it more compactly.
Caching/offload - if your data still doesn't fit in memory, "page it out" to disk. This can be as simple as extending Guava to page out to disk, or bringing in a library like Ehcache or the like.
A note on Java collections, and maps in particular
For small objects, Java collections, and maps in particular, incur a large memory penalty (due mostly to everything being wrapped as Objects and to the existence of the Map.Entry inner-class instances). At the cost of a slightly less elegant API, you should probably look at GNU Trove collections if memory consumption is an issue.
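The flyweight option above can be sketched as follows: one flat float[] holds all records (six fields per record, matching the question's class), and accessors compute offsets instead of allocating per-record objects. FloatTable is a made-up name for illustration:

```java
// Flyweight-style storage: n records of 6 floats each in one flat array,
// avoiding 8+ bytes of object header per record.
public class FloatTable {
    private static final int FIELDS = 6;
    private final float[] data;

    public FloatTable(int records) {
        data = new float[records * FIELDS];
    }

    public void set(int record, int field, float value) {
        data[record * FIELDS + field] = value;
    }

    public float get(int record, int field) {
        return data[record * FIELDS + field];
    }
}
```

For 2.7*10^7 records this is a single ~648 MB array instead of tens of millions of small objects, which also removes per-object GC pressure.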
Optimal would be to hold only integers and line ends.
To that end, one way would be: convert the file to two files:
one binary file of integers (4 bytes)
one binary file with indexes where the next line would start.
For this one can use a Scanner to read, and a DataOutputStream+BufferedOutputStream to write.
Then you can load those two files in arrays of primitive type:
int[] integers = new int[(int)integersFile.length() / 4];
int[] lineEnds = new int[(int)lineEndsFile.length() / 4];
Reading can be done with MappedByteBuffer.asIntBuffer(). (You then would not even need the arrays, but the code becomes a bit COBOL-like in its verbosity.)
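A minimal sketch of that two-file idea, here shown for the integers file only (class and method names are made up; assumes Java 11+ for convenience methods):

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Scanner;

public class IntFileConverter {
    // Convert a whitespace-separated text file of integers into packed 4-byte big-endian ints.
    public static void convert(Path textFile, Path binFile) throws IOException {
        try (Scanner in = new Scanner(textFile);
             DataOutputStream out = new DataOutputStream(
                     new BufferedOutputStream(Files.newOutputStream(binFile)))) {
            while (in.hasNextInt()) {
                out.writeInt(in.nextInt());
            }
        }
    }

    // Map the binary file and view it as ints, copying them into a primitive array.
    public static int[] load(Path binFile) throws IOException {
        try (FileChannel ch = FileChannel.open(binFile, StandardOpenOption.READ)) {
            IntBuffer ints = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size()).asIntBuffer();
            int[] result = new int[ints.remaining()];
            ints.get(result);
            return result;
        }
    }
}
```

The line-end index file would be produced the same way, writing the running token count at each newline.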

Taking random defined Strings

I want to make an app that, depending on the parameters checked, gives me a random String that I previously set. The problem isn't the design of the app; the problem is the methods or classes that do what I want.
I'll explain my objective with an example:
We have 100 plans to do this weekend. They are in a box and I want to pick one randomly. So far, I suppose that would be more or less easy. But this weekend it's raining, so I want to pick a plan that can be done even if it rains. And the month is ending and I don't want to spend money, so a free plan would be great. Then I check the boxes "Free plan" and "Raining", and the app gives me a random plan that meets the conditions.
How would you do that?
I was thinking about big arrays like "Raining plans" and "Free plans", with strings inside, and when you check the two boxes the arrays would be combined and a string randomly picked, but I don't know how to do that, and I think it could be done in an easier way.
Thanks in advance.
Do it with a two-dimensional array as follows:
arr[filter][tasks];
Let the filter values be binary strings converted to int.
Say you have the two aforementioned filters, "rainy" and "free"; the possible values are:
00 (decimal 0) - not rainy, not free
01 (decimal 1) - not rainy, free
10 (decimal 2) - rainy, not free
11 (decimal 3) - rainy, free
When the user checks both boxes, you have 1 and 1. Now convert the binary 11 to an integer, which is your first index. Get a random second index for the task.
You can have any number of filters.
Hint: Integer.parseInt(binary_str, 2) for binary-to-int conversion.
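Putting that together, a small sketch (the plan strings and class name are made-up sample data):

```java
import java.util.Random;

public class PlanPicker {
    // Rows indexed by the filter bits: first bit = rainy, second bit = free.
    public static final String[][] PLANS = {
        {"Cinema", "Restaurant"},           // 00: not rainy, not free
        {"Walk downtown", "Free museum"},   // 01: not rainy, free
        {"Bowling", "Spa"},                 // 10: rainy, not free
        {"Board games at home", "Library"}  // 11: rainy, free
    };

    public static String pick(boolean rainy, boolean free, Random rnd) {
        // Build the binary string from the checkboxes and convert it to the row index.
        String bits = (rainy ? "1" : "0") + (free ? "1" : "0");
        String[] row = PLANS[Integer.parseInt(bits, 2)];
        return row[rnd.nextInt(row.length)]; // random second index
    }
}
```

Adding a third filter just means three-character bit strings and eight rows.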

Huge String Table in Java

I've got a question about storing a huge number of Strings in application memory. I need to load from a file and store about 5 million lines, each of them at most 255 chars (URLs), but mostly ~50. From time to time I'll need to search for one of them. Is it possible to make this app runnable on ~1 GB of RAM?
Will
ArrayList<String> list = new ArrayList<String>();
work?
As far as I know, String in Java is stored as UTF-16, which gives me huge memory use. Is it possible to make such an array with Strings coded in ANSI?
This is console application run with parameters:
java -Xmx1024M -Xms1024M -jar "PServer.jar" nogui
Some JVMs support -XX:+UseCompressedStrings, which stores strings that only use ASCII as a byte[] internally (Java 9 and later do this by default through compact strings).
Having several GB of text in a List isn't a problem, but it can take a while to load from disk (many seconds)
If the average URL is 50 chars which are ASCII, with 32 bytes of overhead per String, 5 M entries could use about 400 MB which isn't much for a modern PC or server.
A Java String is a full-blown object. This means that, apart from the characters of the string themselves, there is other information to store in it (a pointer to the class of the object, header fields used for locking and garbage collection, and a reference to the underlying character array). So an empty String already takes around 45 bytes in memory (as you can see here).
Now you just have to add the maximum length of your strings and do some easy calculations to get the maximum memory used by that list.
Anyway, I would suggest you load the strings as byte[] if you have memory issues. That way you can control the encoding and you can still do searches.
Is there some reason you need to restrict it to 1 GB? If you want to search through them, you definitely don't want to swap to disk, but if the machine has more memory it makes sense to go higher than 1 GB.
If you have to search, use a SortedSet, not an ArrayList.
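Combining the byte[] suggestion with sorted search, here is a sketch (UrlTable is a made-up name; it assumes the URLs are ASCII and Java 9+ for Arrays.compare):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class UrlTable {
    private final byte[][] urls; // each URL as ASCII bytes: 1 byte per char instead of 2

    public UrlTable(String[] input) {
        urls = new byte[input.length][];
        for (int i = 0; i < input.length; i++) {
            urls[i] = input[i].getBytes(StandardCharsets.US_ASCII);
        }
        // Sort once so lookups can use binary search: O(log n) per query.
        Arrays.sort(urls, UrlTable::compare);
    }

    public boolean contains(String url) {
        byte[] key = url.getBytes(StandardCharsets.US_ASCII);
        return Arrays.binarySearch(urls, key, UrlTable::compare) >= 0;
    }

    private static int compare(byte[] a, byte[] b) {
        return Arrays.compare(a, b); // lexicographic byte comparison (Java 9+)
    }
}
```

At ~50 bytes per URL plus array overhead, 5 million entries stay comfortably in the hundreds of megabytes.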
