I have a string array (String[] words) which gets populated at runtime and contains almost 400k members. I tried to check whether the array contains a specific string:
boolean check = Arrays.asList(words).contains("code");
I know "code" is already in the array, but check never comes back true, as if the checking process isn't even taking place. I also tried with hash sets, but with no success. Can anyone tell me where the problem is?
For the given problem, the answer isn't really clear, because the question is a bit unclear. Most likely, your 400K elements simply don't contain "code" (for example, you forgot to trim the input and there are spaces or newlines around the words).
But beyond that, a deliberate non-answer: when you have 400K elements to search, Arrays.asList(words).contains("code") is simply a bad idea.
Walking through 400K elements to figure out whether one entry matches a search word is terribly inefficient. If you only ever wanted to look up one word, why store all the data in memory at all? So you probably intend to search for different words over time, and each time you want to iterate over 400K elements to figure out whether a word is present?
Instead, you should invest in a more appropriate data structure. That could be a (Hash)Set, a (Hash)Map, or something beyond that (like a full-text search service such as Solr or ElasticSearch).
Seriously: repeatedly iterating over 400K elements on a mobile device is not something your users will appreciate. Not at all.
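A minimal sketch of that idea, assuming the original lookup failed because of stray whitespace or case differences (that assumption is mine, not confirmed by the question): build a HashSet once, normalising each entry, and then every contains() call is a constant-time hash lookup instead of a 400K-element scan.

    import java.util.HashSet;
    import java.util.Set;

    public class WordLookup {
        public static void main(String[] args) {
            String[] words = {" Code\n", "hello", "world"};   // stand-in for the real 400K entries

            // Build the set once; trimming/lower-casing is an assumption about why contains() failed.
            Set<String> index = new HashSet<>();
            for (String w : words) {
                index.add(w.trim().toLowerCase());
            }

            boolean check = index.contains("code");   // O(1) on average, instead of scanning the array
            System.out.println(check);                // true
        }
    }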
Try the code below; I hope it helps. It checks whether any word in the array contains "code":
boolean check = Arrays.stream(words).anyMatch(w -> w.matches(".*code.*"));
I'm working on a program for my debate team, and one of its features will be searching through text files for certain keywords. Since there is always limited time to prepare speeches in debate, speed is my absolute top priority, but the methods of searching I've tried so far aren't fast enough. The fastest way I've tried was using grep to search each of the files, and it technically works, but there are about 2500 files to search through, so even at around 5 milliseconds per file the time adds up quickly when searching for multiple keywords, or searching for different things as the user needs them.
What I really need is a way to ensure that my program won't be searching through every document, or something that would essentially cut down the number of documents it has to look through. Does anyone know if something like that is possible? Or, if not, could anyone point me toward something to research that would cut down the search time in other ways?
I think you are looking for a text search engine. I believe Apache Lucene will help you. What you can do is create an index of all your files, based on their content. Then you can quickly search that index for interesting words and sentences, and Lucene will tell you which file best matches that word or sentence.
The index should be stored in a file so you don't have to re-create it every time you start searching, only extend it when a new document comes in.
Lucene will do even more for you, because it can also search for similar words (like Google does).
Describing how to use the Lucene engine is, I think, out of scope for this short answer, but I believe you will find a nice intro at this link:
http://www.lucenetutorial.com/sample-apps/textfileindexer-java.html
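Here is a minimal sketch of that index-then-search flow (my own illustration, not from the tutorial; exact class names vary by Lucene version, and the file path and contents below are placeholders). The index is held in memory for brevity; FSDirectory.open(path) would persist it to disk as suggested above.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.store.ByteBuffersDirectory;
    import org.apache.lucene.store.Directory;

    public class LuceneSketch {
        public static void main(String[] args) throws Exception {
            StandardAnalyzer analyzer = new StandardAnalyzer();
            Directory index = new ByteBuffersDirectory();   // in-memory; use FSDirectory.open(path) to persist

            // Index phase: one Document per file, storing the path and indexing the contents.
            try (IndexWriter writer = new IndexWriter(index, new IndexWriterConfig(analyzer))) {
                Document doc = new Document();
                doc.add(new StringField("path", "speeches/econ1.txt", Field.Store.YES));   // placeholder path
                doc.add(new TextField("contents", "tariffs harm consumers ...", Field.Store.NO));
                writer.addDocument(doc);
            }

            // Search phase: parse the keyword and ask for the best-matching files.
            try (DirectoryReader reader = DirectoryReader.open(index)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                ScoreDoc[] hits = searcher.search(
                        new QueryParser("contents", analyzer).parse("tariffs"), 10).scoreDocs;
                for (ScoreDoc hit : hits) {
                    System.out.println(searcher.doc(hit.doc).get("path"));
                }
            }
        }
    }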
Either use Lucene or some kind of index, as stated by Vicctor.
Or, see other grep-like solutions:
ignore some files if possible
Fastest possible grep <- Interesting
https://beyondgrep.com/feature-comparison/
Or, if you want to learn how to code, try doing it yourself!
Not sure if the question is self-explanatory, but I was asked this in an interview.
Let's say we have one input box in our UI (say, First Name) and one table in the DB (say, Users). Someone comes and writes "S" there, and he is given a list of all people's names starting with "S" as a hint. Then he adds "A" and is given a list of people's names starting with "SA", and so on. To achieve this, we have our JSP->Controller->Service->DAO->DB.
So instead of hitting the DB again and again, we keep the whole data set in the server's cache, or in the session, in one Collection.
Then which collection should we use for this scenario, where after every onChange() of the input box we fetch the proper data from the Collection and show it to the user? Assume we have millions of entries per character.
I think the answer the interviewer was probably angling for is ConcurrentSkipListSet. You can read the javadocs here, review the methods, and figure out whether it meets the requirements.
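A minimal sketch of why a sorted set fits the type-ahead case (my own illustration, with placeholder names): because the set keeps its elements ordered, every prefix query becomes a cheap range view rather than a scan.

    import java.util.NavigableSet;
    import java.util.concurrent.ConcurrentSkipListSet;

    public class PrefixLookup {
        public static void main(String[] args) {
            NavigableSet<String> names = new ConcurrentSkipListSet<>();
            names.add("Sam");
            names.add("Sally");
            names.add("Sarah");
            names.add("Tom");

            String prefix = "Sa";
            // Every name in [prefix, prefix + '\uffff') shares the prefix, because the set is sorted.
            NavigableSet<String> hints = names.subSet(prefix, true, prefix + '\uffff', false);
            System.out.println(hints);   // [Sally, Sam, Sarah]
        }
    }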
I have a quick question. I'm working on a school project and I need to parse an extremely large text file. It's for a database class, so I need to get unique actor names from the file, because actors will be a primary key in the MySQL database. I've already written the parser and it works great, but at the time I forgot to remove the duplicates. So I decided the easiest way would be to create an actors ArrayList, then use the contains() method to check whether the actor name is in the ArrayList before I print it to a new text file. If it is, I do nothing; if it isn't, I add it to the ArrayList and print it. Now the program is running extremely slowly. Before the ArrayList, it took about 5 minutes. The old actor file was 180k without duplicates removed. Now it's been running for 30 minutes and is at 12k so far. (I'm expecting 100k-150k total this time.)
I left the size of the ArrayList blank because I don't know how many actors are in the file, but there are at least 1-2 million. I was thinking of just putting 5 million in for its size and checking afterwards to see whether it got them all (simply check the last ArrayList index: if it's empty, it didn't run out of space). Would this reduce the time, because the ArrayList isn't constantly redoubling and recopying everything? Is there another method that would be faster? I'm also concerned my computer might run out of memory before it completes. Any advice would be great.
(Also, I did try running the 'unique' command on the text file without success. The actor names print out one per line, in one column. I was thinking maybe the command was wrong. How would you remove duplicates from a text file column at a Windows or Linux command prompt?) Thank you, and sorry for the long post. I have a midterm tomorrow and I'm starting to get stressed.
Use a Set instead of a List, so you don't have to check whether the collection already contains the element: a Set doesn't allow duplicates.
Lookup with ArrayList.contains() gives you roughly O(n) performance, and doing that a million times is, I think, what is killing your program.
Use a HashSet implementation of Set. It will give you theoretically constant-time lookup and will automatically remove duplicates for you.
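A minimal sketch of that approach (the file names here are placeholders, not from the question): stream the file line by line and let HashSet.add() do the duplicate check in a single constant-time step.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.PrintWriter;
    import java.util.HashSet;
    import java.util.Set;

    public class UniqueActors {
        public static void main(String[] args) throws Exception {
            Set<String> seen = new HashSet<>(4_000_000);   // pre-sized guess to avoid repeated rehashing
            try (BufferedReader in = new BufferedReader(new FileReader("actors_raw.txt"));
                 PrintWriter out = new PrintWriter("actors_unique.txt")) {
                String line;
                while ((line = in.readLine()) != null) {
                    String name = line.trim();
                    // add() returns false when the name is already present, so no separate contains() call
                    if (!name.isEmpty() && seen.add(name)) {
                        out.println(name);
                    }
                }
            }
        }
    }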
- Try using a memory-mapped file in Java for faster access to the large file.
- And instead of an ArrayList, use a HashMap collection where the key is the actor's name (or its hash code). This will improve the speed a lot, since looking up a key in a HashMap is very fast.
I'm developing a "funny quotes" app for android. I have over 1000 quotes which I want to use inside my app but I don't know whether I should use a database or text file. Please note that the app should not read the same sentence twice. and it has a previous/next button so i need to keep track of the previous quotes. please tell me which one is better and more optimized. Also, if you can please link me to a good tutorial about storing/reading the data.
thanks
Use a database. It's faster and more flexible than a text file, and one day you will extend the app and be glad you used one. I recommend that, when you boot up the app, you simply select all the rows using your database's built-in random ordering; 1000 rows should not take too long. Then just iterate through the resulting ArrayList (or whatever you choose to use) of strings: the first quote you show is element 0 from that list, the second is element 1, and so on. With this approach you won't need any other structure to keep track of used quotes; just use the variable that indexes the quote list.
fetchAllRows on this page seems to be what you want for getting the data.
If you choose not to keep too much in memory, you could keep just a list of quote IDs that have been used so far. The last element of that list would be the current quote, and the previous elements would be what the user should see when they press the back button.
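A minimal sketch of that "select everything in random order" idea on Android's SQLite API (the quotes table and text column are placeholder names of my own, not from the answer):

    import android.database.Cursor;
    import android.database.sqlite.SQLiteDatabase;
    import java.util.ArrayList;
    import java.util.List;

    public class QuoteLoader {
        // Loads every quote once, already shuffled, so iterating the list never repeats a quote.
        public static List<String> loadShuffledQuotes(SQLiteDatabase db) {
            List<String> quotes = new ArrayList<>();
            // Table/column names are placeholders; RANDOM() shuffles the result set in SQLite.
            Cursor c = db.rawQuery("SELECT text FROM quotes ORDER BY RANDOM()", null);
            try {
                while (c.moveToNext()) {
                    quotes.add(c.getString(0));
                }
            } finally {
                c.close();
            }
            return quotes;
        }
    }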
If you will never read the same string twice, then I would recommend not using the String class, as its objects are immutable and will sit in the string pool waiting to be reassigned to a reference, which will never happen since you never read the same string twice.
Using a DB will over-complicate things.
I suggest you read a flat file as bytes and then translate them into StringBuilder objects, keeping it simple while still avoiding intensive GC.
I hope it helps.
Using a DB should be fine, as I think you would not want all the data in memory. You can keep all the quotes in the DB and keep a flag to track whether a quote has been read (simply update it to true once read).
This way you can choose from any of the quotes whose flag is still false.
Have you considered CsvJdbc? You have the benefit of simple csv files with an easy upgrade path to a real database later when you have a significant number of records.
1k records is quite small and in my opinion not sufficient to merit a database.
I need some sort of solution in Java for the following requirements:
Search in a text for certain terms (each term can be 1-3 words), for example: {"hello world", "hello"}. The match needs to be exact.
There are about 500 term groups, each containing about 30 terms.
Each text might contain up to 4000 words.
Performance is an important issue.
Thanks,
Rod
I have done something similar for a bespoke spam filter.
A technique I found to be both simple and fast is:
Split the input file into words first.
Call intern() on each word, to simplify the comparisons in step 3.
Create a Term class, encapsulating an array of up to three strings. Its equals() method can do pointer comparison on the strings, rather than calling String.equals(). Create a Term instance for each group of 2 or 3 consecutive words in the input.
Use a Multimap (from Google Collections) to map each term to the set of files in which it appears.
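A minimal sketch of steps 2-3 (my own illustration of the idea, not the original filter's code): after intern(), equal strings are the same object, so equals() can use reference comparison.

    import java.util.Arrays;

    // A Term wraps up to three interned words; equals() can then use == on the strings.
    final class Term {
        private final String[] words;

        Term(String... words) {
            this.words = new String[words.length];
            for (int i = 0; i < words.length; i++) {
                this.words[i] = words[i].intern();   // canonicalise so reference comparison is enough
            }
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Term)) return false;
            Term other = (Term) o;
            if (words.length != other.words.length) return false;
            for (int i = 0; i < words.length; i++) {
                if (words[i] != other.words[i]) return false;   // pointer comparison, not String.equals()
            }
            return true;
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(words);   // interned equal strings hash consistently with equals()
        }
    }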
Use regular expressions. See: http://java.sun.com/docs/books/tutorial/essential/regex/
There seem to be two parts to this: figuring out a decent algorithm, and implementing it in Java. (For the moment, let's put aside the idea that surely someone "out there" has already implemented this and you can probably find some ideas.)
It seems like we want to avoid repeating expensive work, but it's not clear where the costs would be, so you'll need to be prepared to benchmark a few candidate approaches. Also keep in mind what counts as "good enough".
Start with the simplest thing you can think of that works, and measure it. You might get the surprising result that it's good enough. Stop right there! For example, this is really dumb:
read text into String (4k, that's not too big)
for each term
use regexp to find matches in text
but it might well give a sub-second response time. Would your users really care if you took a 200ms response down to 100ms? How much would they pay for that?
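A minimal sketch of that dumb-but-maybe-good-enough approach (the sample text and terms are placeholders of my own): one regex scan of the whole text per term, with word boundaries to keep the match exact.

    import java.util.Arrays;
    import java.util.List;
    import java.util.regex.Pattern;

    public class NaiveTermSearch {
        public static void main(String[] args) {
            String text = "hello world, this text mentions hello twice";
            List<String> terms = Arrays.asList("hello world", "hello");

            for (String term : terms) {
                // Quote the term so it is matched literally; \b anchors it to word boundaries.
                Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b");
                if (p.matcher(text).find()) {
                    System.out.println("found: " + term);
                }
            }
        }
    }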
Another approach. I wonder if this is faster? (There's a sketch after the list below.)
prepare a collection of terms keyed by first word
tokenize the text
for each token
find terms that match
check for match (using look ahead for multi-word terms)
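A minimal sketch of that second idea (my own illustration; the sample text and terms are placeholders, and punctuation handling is deliberately ignored): index the terms by their first word, then most tokens cost only one map lookup, with a look-ahead only when the first word matches.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class TokenTermSearch {
        public static void main(String[] args) {
            List<String[]> terms = Arrays.asList(
                    "hello world".split(" "),
                    "hello".split(" "));

            // Key the terms by their first word so most tokens are rejected with one map lookup.
            Map<String, List<String[]>> byFirstWord = new HashMap<>();
            for (String[] term : terms) {
                byFirstWord.computeIfAbsent(term[0], k -> new ArrayList<>()).add(term);
            }

            String[] tokens = "this text says hello world somewhere".split("\\s+");
            for (int i = 0; i < tokens.length; i++) {
                for (String[] term : byFirstWord.getOrDefault(tokens[i], Collections.emptyList())) {
                    // Look ahead for multi-word terms.
                    if (i + term.length <= tokens.length
                            && Arrays.equals(term, Arrays.copyOfRange(tokens, i, i + term.length))) {
                        System.out.println("found: " + String.join(" ", term));
                    }
                }
            }
        }
    }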
As for implementing it in Java: that's a separate problem; ask specific questions if you need to.