Fastest way to lookup a String value - java

I have a simple application that reads data in small strings from large text files and saves them to a database. To actually save each such String, the application calls the following method many (maybe thousands, or more) times:
void setValue(String value)
{
if (!ignore(value))
{
// Save the value in the database
}
}
Currently, I implement the ignore() method by just successively comparing a set of Strings, e.g.
public boolean ignore(String value)
{
if (value.equalsIgnoreCase("Value 1") || value.equalsIgnoreCase("Value 2"))
{
return true;
}
return false;
}
However, because I need to check against many such "ignorable" values, which will be defined in another part of the code, I need to use a data structure for this check, instead of multiple consecutive if statements.
So, my question is: what would be the fastest data structure from standard Java to implement this? A HashMap? A Set? Something else?
Initialization time is not an issue, since it will happen statically and once per application invocation.
EDIT: The solutions suggested thus far (including HashSet) appear slower than just using a String[] with all the ignored words and just running "equalsIgnoreCase" against each of these.

Use a HashSet, storing the values in lowercase, and its contains() method, which has better lookup performance than TreeSet (constant-time versus log-time for contains).
Set<String> ignored = new HashSet<String>();
ignored.add("value 1"); // store in lowercase
ignored.add("value 2"); // store in lowercase
public boolean ignore(String value) {
return ignored.contains(value.toLowerCase());
}
Storing the values in lowercase and searching for the lowercased input avoids the hassle of dealing with case during comparison, so you get the full speed of the HashSet implementation and zero collection-related code to write (eg Collator, Comparator etc).
EDITED
Thanks to Jon Skeet for pointing out that certain Turkish characters behave oddly when calling toLowerCase(), but if you're not intending on supporting Turkish input (or perhaps other languages with non-standard case issues) then this approach will work well for you.
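A minimal sketch of this approach, assuming the ignorable values are known when the class is loaded. The class name IgnoreLookup and the sample values are placeholders. Passing Locale.ROOT to toLowerCase() is an addition here: it keeps the lowercasing independent of the JVM's default locale, which is where the Turkish-locale surprise comes from.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class IgnoreLookup {
    // Hypothetical sample values; lowercased once, at class-load time.
    private static final Set<String> IGNORED = new HashSet<>();
    static {
        for (String v : Arrays.asList("Value 1", "Value 2")) {
            IGNORED.add(v.toLowerCase(Locale.ROOT));
        }
    }

    public static boolean ignore(String value) {
        // Lowercase the input the same way the stored values were lowercased.
        return IGNORED.contains(value.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(ignore("VALUE 1"));  // true
        System.out.println(ignore("Value 3"));  // false
    }
}
```

Each lookup still allocates a lowercased copy of the input, which may be part of why a short String[] scan can measure faster for a handful of values.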

In most cases I'd normally start with a HashSet<String> - but as you want case-insensitivity, that makes it slightly harder.
You can try using a TreeSet<Object> using an appropriate Collator for case-insensitivity. For example:
Collator collator = Collator.getInstance(Locale.US);
collator.setStrength(Collator.SECONDARY);
TreeSet<Object> set = new TreeSet<Object>(collator);
Note that you can't create a TreeSet<String> as Collator only implements Comparator<Object>.
EDIT: While the above version works with just strings, it may be faster to create a TreeSet<CollationKey>:
Collator collator = Collator.getInstance(Locale.US);
collator.setStrength(Collator.SECONDARY);
TreeSet<CollationKey> set = new TreeSet<CollationKey>();
for (String value : valuesToIgnore) {
set.add(collator.getCollationKey(value));
}
Then:
public boolean ignore(String value)
{
return set.contains(collator.getCollationKey(value));
}
It would be nice to have a way of storing the collation keys for all ignored values but then avoid creating new collation keys when testing, but I don't know of a way of doing that.
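If locale-aware collation is not needed, a simpler variant of the sorted-set idea is a TreeSet built with String.CASE_INSENSITIVE_ORDER. This is a substitute for the Collator, not the approach above: it ignores locale rules entirely, lookups are O(log n), and no key objects are created per lookup. The class name and sample values are placeholders.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class CaseInsensitiveIgnore {
    // Hypothetical sample values; the comparator makes contains() ignore case.
    private static final Set<String> IGNORED =
            new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
    static {
        IGNORED.addAll(Arrays.asList("Value 1", "Value 2"));
    }

    public static boolean ignore(String value) {
        return IGNORED.contains(value);
    }

    public static void main(String[] args) {
        System.out.println(ignore("vALUE 2"));  // true
        System.out.println(ignore("Value 3"));  // false
    }
}
```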

Add the words to ignore to a list and just check if the word is in that list.
That makes it dynamic.

If using Java 7 this is a fast way to do it:
public boolean ignore(String value) {
switch(value.toLowerCase()) { // see comment Jon Skeet
case "lowercased_ignore_value1":
case "lowercased_ignore_value2":
// etc
return true;
default:
return false;
}
}

It seems that String[] is slightly better (performance-wise) than the other methods proposed, so I will use that.
It is simply something like this:
public boolean ignore(String value)
{
for (String ignore : IGNORED_VALUES)
{
if (ignore.equalsIgnoreCase(value))
{
return true;
}
}
return false;
}
The IGNORED_VALUES object is just a String[] with all ignored values in there.

Related

How to set a value to variable based on multiple conditions using Java Streams API?

I couldn't wrap my head around writing the below condition using Java Streams. Let's assume that I have a list of elements from the periodic table. I've to write a method that returns a String by checking whether the list has Silicon or Radium or Both. If it has only Silicon, method has to return Silicon. If it has only Radium, method has to return Radium. If it has both, method has to return Both. If none of them are available, method returns "" (default value).
Currently, the code that I've written is below.
String resolve(List<Element> elements) {
AtomicReference<String> value = new AtomicReference<>("");
elements.stream()
.map(Element::getName)
.forEach(name -> {
if (name.equalsIgnoreCase("RADIUM")) {
if (value.get().equals("")) {
value.set("RADIUM");
} else {
value.set("BOTH");
}
} else if (name.equalsIgnoreCase("SILICON")) {
if (value.get().equals("")) {
value.set("SILICON");
} else {
value.set("BOTH");
}
}
});
return value.get();
}
I understand the code looks messier and looks more imperative than functional. But I don't know how to write it in a better manner using streams. I've also considered the possibility of going through the list couple of times to filter elements Silicon and Radium and finalizing based on that. But it doesn't seem efficient going through a list twice.
NOTE : I also understand that this could be written in an imperative manner rather than complicating with streams and atomic variables. I just want to know how to write the same logic using streams.
Please share your suggestions on better ways to achieve the same goal using Java Streams.
It can be done with the Stream API in a single statement, without multiline lambdas, nested conditions, or an impure function that changes state outside the lambda.
My approach is to introduce an enum whose constants correspond to all possible outcomes: EMPTY, SILICON, RADIUM, BOTH.
All the return values apart from the empty string can be obtained by invoking the method name() inherited from java.lang.Enum. Only to cover the empty-string case, I've added a getName() method.
Note that since Java 16 enums can be declared locally inside a method.
The logic of the stream pipeline is the following:
the stream of elements is turned into a stream of strings;
the strings are filtered and transformed into a stream of enum constants;
a reduction is performed on the enum constants;
the optional of enum is turned into an optional of string.
Implementation can look like this:
public static String resolve(List<Element> elements) {
return elements.stream()
.map(Element::getName)
.map(String::toUpperCase)
.filter(str -> str.equals("SILICON") || str.equals("RADIUM"))
.map(Elements::valueOf)
.reduce((result, next) -> result == Elements.BOTH || result != next ? Elements.BOTH : next)
.map(Elements::getName)
.orElse("");
}
enum
enum Elements {EMPTY, SILICON, RADIUM, BOTH;
String getName() {
return this == EMPTY ? "" : name(); // note name() declared in the java.lang.Enum as final and can't be overridden
}
}
main
public static void main(String[] args) {
System.out.println(resolve(List.of(new Element("Silicon"), new Element("Lithium"))));
System.out.println(resolve(List.of(new Element("Silicon"), new Element("Radium"))));
System.out.println(resolve(List.of(new Element("Ferrum"), new Element("Oxygen"), new Element("Aurum")))
.isEmpty() + " - no target elements"); // output is an empty string
}
output
SILICON
BOTH
true - no target elements
Note:
Although with streams you can produce the result in O(n) time, an iterative approach might be better for this task. Think about it this way: if you have a list of 10,000 elements and it starts with "SILICON" and "RADIUM", you could break out of the loop right away and return "BOTH".
Stateful operations in streams should be avoided according to the documentation; to understand why the javadoc warns against stateful streams, you might take a look at this question. If you want to play around with AtomicReference, that's totally fine, just keep in mind that this approach is not considered good practice.
I guess if I had implemented such a method with streams, the overall logic would be the same as above, but without utilizing an enum. Since only a single object is needed it's a reduction, so I'll apply reduce() on a stream of strings, extract the reduction logic with all the conditions to a separate method. Normally, lambdas have to be well-readable one-liners.
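The enum-free variant described in the last paragraph could look roughly like the sketch below. The Element class here is a hypothetical stand-in for the one in the question, and the names ResolveWithReduce and merge are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ResolveWithReduce {
    // Hypothetical stand-in for the question's Element class.
    static class Element {
        private final String name;
        Element(String name) { this.name = name; }
        String getName() { return name; }
    }

    // Reduction step: two different values (or an earlier "BOTH") yield "BOTH".
    static String merge(String result, String next) {
        return result.equals(next) ? result : "BOTH";
    }

    static String resolve(List<Element> elements) {
        return elements.stream()
                .map(Element::getName)
                .map(name -> name.toUpperCase(Locale.ROOT))
                .filter(name -> name.equals("SILICON") || name.equals("RADIUM"))
                .reduce(ResolveWithReduce::merge)
                .orElse("");
    }

    public static void main(String[] args) {
        System.out.println(resolve(Arrays.asList(
                new Element("Silicon"), new Element("Lithium"))));  // SILICON
        System.out.println(resolve(Arrays.asList(
                new Element("Radium"), new Element("Silicon"))));   // BOTH
    }
}
```

Keeping the conditions in merge() leaves the pipeline itself a chain of one-liners.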
Collect the strings to a unique set. Then check containment in constant time.
Set<String> names = elements.stream().map(Element::getName).map(String::toLowerCase).collect(toSet());
boolean hasSilicon = names.contains("silicon");
boolean hasRadium = names.contains("radium");
String result = "";
if (hasSilicon && hasRadium) {
result = "BOTH";
} else if (hasSilicon) {
result = "SILICON";
} else if (hasRadium) {
result = "RADIUM";
}
return result;
I have used a predicate in filter() to keep only Radium and Silicon, and I print the result based on the resulting set.
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
public class Test {
public static void main(String[] args) {
List<Element> elementss = new ArrayList<>();
Set<String> stringSet = elementss.stream().map(e -> e.getName())
.filter(string -> (string.equals("Radium") || string.equals("Silicon")))
.collect(Collectors.toSet());
if(stringSet.size()==2){
System.out.println("both");
}else if(stringSet.size()==1){
System.out.println(stringSet);
}else{
System.out.println(" ");
}
}
}
You could save a few lines if you use regex, but I doubt if it is better than the other answers:
String resolve(List<Element> elements) {
String result = elements.stream()
.map(Element::getName)
.map(String::toUpperCase)
.filter(str -> str.matches("RADIUM|SILICON"))
.sorted()
.collect(Collectors.joining());
return result.matches("RADIUMSILICON") ? "BOTH" : result;
}

Extract all True properties and add to a list

I have a Java class with 3 boolean properties like this:
boolean isActive;
boolean isEnable;
boolean isNew;
Every property is related to an enum constant (e.g. ACTIVE, ENABLE, NEW).
I want to have 2 lists of enum constants: one which has only the constants whose property is true, and one for those whose property is false.
Just to be clear, using if-else statements I could have
Set<FlagEnum> flagSet = new HashSet<>();
Set<FlagEnum> falseFlagSet = new HashSet<>();
if (object.isActive()) {
flagSet.add(ACTIVE);
} else {
falseFlagSet.add(ACTIVE);
}
if (object.isEnable()) {
flagSet.add(ENABLE);
} else {
falseFlagSet.add(ENABLE);
}
if (object.isNew()) {
flagSet.add(NEW);
} else {
falseFlagSet.add(NEW);
}
is there a way to avoid all these if-else?
I tried with something like
Map<Boolean, List<Pair<Boolean, FlagEnum>>> res = Stream.of(
new Pair<>(object.isActive(), ACTIVE),
new Pair<>(object.isNew(), NEW),
new Pair<>(object.isEnable(), ENABLE))
.collect(Collectors.partitioningBy(Pair::getKey));
but the resulted structure is an additional complexity which I would like to avoid.
In my real case, I have more than 15 boolean properties...
You can simplify this in various ways. Which of them make sense, depends on your exact requirements.
You can derive the falseFlagSet trivially from the flagSet using EnumSet.complementOf after populating the flagSet:
EnumSet<FlagEnum> falseFlagSet = EnumSet.complementOf(flagSet);
This assumes that all FlagEnum values have corresponding flags. If that's not the case then you need to construct a EnumSet with all enums that have flags and subtract flagSet from that using removeAll.
The first point already removes the need for the else in your cascade, simplifying the code to
if (object.isActive()) {
flagSet.add(ACTIVE);
}
if (object.isEnable()) {
flagSet.add(ENABLE);
}
if (object.isNew()) {
flagSet.add(NEW);
}
If you have enough different flags, then you can create a mapping from getter method to FlagEnum value like this:
Map<Function<YourClass,Boolean>,FlagEnum> GETTERS = Map.of(
YourClass::isActive, FlagEnum.ACTIVE,
YourClass::isNew, FlagEnum.NEW,
YourClass::isEnable, FlagEnum.ENABLE);
Then you can use this to make the whole process data-driven:
EnumSet<FlagEnum> getFlagSet(YourClass yourObject) {
EnumSet<FlagEnum> result = EnumSet.noneOf(FlagEnum.class);
for (Map.Entry<Function<YourClass,Boolean>, FlagEnum> getter : GETTERS.entrySet()) {
if (getter.getKey().apply(yourObject)) {
result.add(getter.getValue());
}
}
return result;
}
If the number of flags is very big, then you could switch entirely to reflection and detect the flags and matching getters dynamically using string comparison, but I would not suggest that approach. If you need something like that then you probably should switch to a framework that supports that kind of feature and not implement it yourself.
The last two obviously only make sense when the number of flags is big. If it's actually just 3 flags, then I wouldn't bother and would just have 3 simple if statements.
As a slight tangent: GETTERS above should definitely be an immutable map (wrap it in Collections.unmodifiableMap or use something like Guava ImmutableMap) and it could be argued that the same applies to the return value of the getFlagSet method. I've left those out for succinctness.
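For the case mentioned earlier, where not every FlagEnum value has a matching boolean property, the removeAll variant might look like this sketch. The extra constant ARCHIVED and the names FlagComplement, FLAGGED and falseFlags are invented for illustration.

```java
import java.util.EnumSet;

public class FlagComplement {
    // Hypothetical enum: ARCHIVED has no matching boolean property.
    enum FlagEnum { ACTIVE, ENABLE, NEW, ARCHIVED }

    // Only the constants that are backed by a boolean property.
    static final EnumSet<FlagEnum> FLAGGED =
            EnumSet.of(FlagEnum.ACTIVE, FlagEnum.ENABLE, FlagEnum.NEW);

    static EnumSet<FlagEnum> falseFlags(EnumSet<FlagEnum> trueFlags) {
        EnumSet<FlagEnum> result = EnumSet.copyOf(FLAGGED); // copy, FLAGGED stays intact
        result.removeAll(trueFlags);
        return result;
    }

    public static void main(String[] args) {
        EnumSet<FlagEnum> flagSet = EnumSet.of(FlagEnum.ACTIVE);
        System.out.println(falseFlags(flagSet));            // [ENABLE, NEW]
        System.out.println(EnumSet.complementOf(flagSet));  // [ENABLE, NEW, ARCHIVED]
    }
}
```

The second line of output shows why complementOf alone is not enough in that situation: it also reports the constant that has no flag at all.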
You can use a private helper method for this.
private void addFlagSet(boolean condition, FlagEnum flagEnum,
Set<FlagEnum> flagSet, Set<FlagEnum> falseFlagSet) {
Set<FlagEnum> chosenFlagSet = condition ? flagSet: falseFlagSet;
chosenFlagSet.add(flagEnum);
}
Call it as:
addFlagSet(object.isActive(), FlagEnum.ACTIVE, flagSet, falseFlagSet);
addFlagSet(object.isNew(), FlagEnum.NEW, flagSet, falseFlagSet);
addFlagSet(object.isEnable(), FlagEnum.ENABLE, flagSet, falseFlagSet);
You could probably use Reflection to get all methods, then check if a getReturnType() == boolean.class. Problem is the connection between the method's name and the enum. If every single one is named like the method without the 'is', you could use FlagEnum.valueOf() to retrieve the enum value from the method name and use it.
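A rough sketch of that reflection idea, assuming every getter is public, takes no arguments, and is named "is" plus the enum constant's name. The Item class and the names ReflectiveFlags and trueFlags are hypothetical.

```java
import java.lang.reflect.Method;
import java.util.EnumSet;
import java.util.Locale;

public class ReflectiveFlags {
    enum FlagEnum { ACTIVE, ENABLE, NEW }

    // Hypothetical bean; each getter is named "is" + the enum constant's name.
    public static class Item {
        public boolean isActive() { return true; }
        public boolean isEnable() { return false; }
        public boolean isNew() { return true; }
    }

    static EnumSet<FlagEnum> trueFlags(Object bean) throws ReflectiveOperationException {
        EnumSet<FlagEnum> result = EnumSet.noneOf(FlagEnum.class);
        for (Method m : bean.getClass().getMethods()) {
            boolean isFlagGetter = m.getName().startsWith("is")
                    && m.getReturnType() == boolean.class
                    && m.getParameterCount() == 0;
            if (isFlagGetter && (Boolean) m.invoke(bean)) {
                // valueOf throws IllegalArgumentException if no constant matches.
                String constant = m.getName().substring(2).toUpperCase(Locale.ROOT);
                result.add(FlagEnum.valueOf(constant));
            }
        }
        return result;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(trueFlags(new Item()));  // [ACTIVE, NEW]
    }
}
```

The set of false flags can then be derived from the result, for example with EnumSet.complementOf.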
I think this could be the easiest and clearest way to do what I need
Map<Boolean, Set<FlagEnum>> flagMap = new HashMap<>();
flagMap.computeIfAbsent(object.isActive(), h -> new HashSet<>()).add(ACTIVE);
flagMap.computeIfAbsent(object.isEnable(), h -> new HashSet<>()).add(ENABLE);
flagMap.computeIfAbsent(object.isNew(), h -> new HashSet<>()).add(NEW);
//to get TRUE set simply :
flagMap.get(true);
what do you think?

How to check if Set B has at least one element that is not in Set A

So I have two sets: A and B. I need to check if set B contains anything that is not in set A. The sets may intersect, so I cannot just check if set A contains set B.
I can obviously do this:
for (String string : setB) {
if (!setA.contains(string)) {
break;
}
}
or using the Guava library:
Sets.intersection(setA, setB).containsAll(setB); // returns false if there are elements outside.
But is there any way that would perform better, or maybe just be cleaner or more elegant?
Thanks.
“B contains an element not in A” is the exact opposite of “A contains all elements of B”, therefore, the already existing method containsAll is sufficient to answer that question.
if(!setA.containsAll(setB)) {
System.out.println("setB contains an element not in setA");
}
You may shortcut using setB.size()>setA.size() || !setA.containsAll(setB), but this requires that the sets agree on the definition of equality, e.g. if one set is a SortedSet using String.CASE_INSENSITIVE_ORDER as comparator and the other is a HashSet, this won’t work (but the definition of the correct outcome is tricky with such combinations anyway).
If setB is really large, you might get a benefit from using a parallel stream like
if(!setB.parallelStream().allMatch(setA::contains)) {
System.out.println("setB contains an element not in setA");
}
but this is rather rare.
Merge all elements into another set and compare the total elements:
Set<String> ab = new HashSet<>(a);
ab.addAll(b);
if (ab.size() != a.size()) break; // that means `b` had some element that was not in a
Another way to use streams (parallel) and functional mix
setB.parallelStream().filter(((Predicate<String>)setA::contains).negate()).findFirst();
same as
setB.parallelStream().filter(bi -> { return !setA.contains(bi);}).findFirst();
Straight Java
Duplicate the "target" set.
duplicateSet.removeAll(otherSet)
If duplicateSet is not empty, then the target contains one or more elements that are not in the "otherSet"
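The three steps above can be sketched as a small helper. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SetDifferenceCheck {
    // True if target holds at least one element that other does not.
    static <T> boolean hasElementOutside(Set<T> target, Set<T> other) {
        Set<T> duplicate = new HashSet<>(target); // copy, so target stays untouched
        duplicate.removeAll(other);
        return !duplicate.isEmpty();
    }

    public static void main(String[] args) {
        Set<String> setA = new HashSet<>(Arrays.asList("a", "b", "c"));
        Set<String> setB = new HashSet<>(Arrays.asList("b", "d"));
        System.out.println(hasElementOutside(setB, setA));  // true, because of "d"
    }
}
```

Unlike containsAll, this always copies the whole target set, so it is mainly useful when the leftover elements themselves are needed afterwards.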
Apache SetUtils
xyz = SetUtils.difference(seta, setb);
If xyz.size() > 0, then seta contains one or more elements that are not in setb.
You can try an algorithm that removes elements from a copy of setB:
if (setB.size() > setA.size()) {
return true;
}
Set<String> remaining = new HashSet<>(setB); // copy, so setB itself is not modified
for (String s : setA) {
remaining.remove(s);
}
return !remaining.isEmpty(); // anything left over is in setB but not in setA

Creating a dictionary: Method to prevent the same word from being added more than once

I need to create a method to determine whether or not the word I'm trying to add to my String[] dictionary has already been added. We were not allowed to use ArrayList for this project, only arrays.
I started out with this
public static boolean dictHasWord(String str){
for(int i = 0; i < dictionary.length; i++){
if(str.equals(dictionary[i])){
return true;
}
}
return false;
}
However, my professor told me not to use this, because it is a linear function, O(n), and is not efficient. What other way could I go about implementing this method?
This is an example of how to search through an array with good readability. I would suggest using this method to search your array. Note that contains() on the list returned by Arrays.asList is still a linear scan.
import java.util.*;
public class test {
public static void main(String[] args) {
String[] list = {"name", "ryan"
};
//returns boolean here
System.out.println(Arrays.asList(list).contains("ryan"));
}
}
If you are allowed to use the Arrays class as part of your assignment, you can sort your array and use a binary search instead, which is O(log n) rather than O(n).
public static boolean dictHasWord(String str){
// binarySearch returns a negative value (not necessarily -1) when the key is absent
return Arrays.binarySearch(dictionary, str) >= 0;
}
Just keep in mind you must sort first.
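A small self-contained sketch of the sort-then-search order. The sample words and the class name SortedDictionary are placeholders, and the array is sorted once in a static initializer.

```java
import java.util.Arrays;

public class SortedDictionary {
    // Hypothetical sample dictionary, deliberately declared out of order.
    static String[] dictionary = {"pear", "apple", "fig"};
    static {
        Arrays.sort(dictionary);  // one-time O(n log n) sort before any lookup
    }

    static boolean dictHasWord(String str) {
        // binarySearch returns a negative value when the key is absent.
        return Arrays.binarySearch(dictionary, str) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(dictHasWord("fig"));   // true
        System.out.println(dictHasWord("kiwi"));  // false
    }
}
```

If words are added later, the array has to be kept sorted, for example by inserting each new word at its sorted position.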
EDIT:
Regarding writing your own implementation, here's a sample to get you going. Here are the javadocs for compareTo() as well. Here's another sample (int based example) showing the difference between recursive and non-recursive, specifically in Java.
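As a hedged illustration of what a hand-written version could look like (this is not one of the samples referred to above), here is an iterative binary search built on compareTo(). It assumes the array is already sorted in natural String order; the class and method names are invented.

```java
public class BinarySearchSketch {
    // Iterative binary search over an array sorted in natural String order.
    static boolean contains(String[] sorted, String key) {
        int low = 0;
        int high = sorted.length - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;  // unsigned shift avoids int overflow
            int cmp = key.compareTo(sorted[mid]);
            if (cmp == 0) {
                return true;
            }
            if (cmp < 0) {
                high = mid - 1;  // key sorts before the middle element
            } else {
                low = mid + 1;   // key sorts after the middle element
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] words = {"apple", "fig", "pear"};
        System.out.println(contains(words, "fig"));   // true
        System.out.println(contains(words, "kiwi"));  // false
    }
}
```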
Although it may be overkill in this case, a hash table would not be O(n).
This uses the fact that every String can be turned into an int via hashCode(), and equal strings will produce the same hash.
Our dictionary can be declared as:
LinkedList<String>[] dictionary;
In other words in each place several strings may reside, this is due to possible collisions (different strings producing the same result).
The simplest solution for addition would be:
public void add(String str)
{
dictionary[str.hashCode()].add(str);
}
But in order to do this, you would need an array with one slot for every possible hashCode() value (and hashCode() can also be negative), which is probably too much memory for you. So we can do it a little differently:
public void add(String str)
{
dictionary[Math.floorMod(str.hashCode(), dictionary.length)].add(str);
}
This way we always mod the hash; Math.floorMod is used instead of % because hashCode() can be negative and % would then produce a negative index. For best results you should make your dictionary size some prime number, or at least a power of a single prime.
Then when you want to test the existence of the string you do exactly what you had in the original, but you use the specific LinkedList that you get from the hash:
public static boolean dictHasWord(String str)
{
for(String existing : dictionary[Math.floorMod(str.hashCode(), dictionary.length)])
{
if(str.equals(existing)){
return true;
}
}
return false;
}
At which point you may ask "Isn't it O(n)?" The answer is that it is not, since the hash function does not depend on the number of elements in the array. The more memory you give to your array, the fewer collisions you will have, and the closer this approach moves towards O(1).
If somebody finds this answer while searching for a real solution (not a homework assignment), then just use HashMap.
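Putting the pieces of the hand-rolled hash table together, a compact sketch might look as follows. The bucket count of 101 is an arbitrary prime picked for the example, buckets are created lazily, and the class name ChainedDictionary is made up.

```java
import java.util.LinkedList;

public class ChainedDictionary {
    // 101 buckets: an arbitrary prime chosen for this sketch.
    @SuppressWarnings("unchecked")
    private final LinkedList<String>[] buckets = new LinkedList[101];

    private int indexOf(String str) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        return Math.floorMod(str.hashCode(), buckets.length);
    }

    // Adds the word unless it is already present; returns true if it was added.
    public boolean add(String str) {
        if (contains(str)) {
            return false;
        }
        int i = indexOf(str);
        if (buckets[i] == null) {
            buckets[i] = new LinkedList<>();  // create the bucket on first use
        }
        buckets[i].add(str);
        return true;
    }

    // Only the one bucket selected by the hash is scanned.
    public boolean contains(String str) {
        LinkedList<String> bucket = buckets[indexOf(str)];
        return bucket != null && bucket.contains(str);
    }

    public static void main(String[] args) {
        ChainedDictionary dict = new ChainedDictionary();
        System.out.println(dict.add("ryan"));       // true
        System.out.println(dict.add("ryan"));       // false
        System.out.println(dict.contains("name"));  // false
    }
}
```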

How can I use functional programming to do string manipulation?

I'm writing a function where I'm essentially doing the same thing over and over. I have the function listed below
public String buildGarmentsString(List<Garment> garments)
{
StringBuilder garmentString = new StringBuilder(10000);
for(int i=0;i<4;i++)
{
garmentString.append(this.garmentProductId(i,garments.get(i).getProductId()));
garmentString.append(this.garmentColor(i,garments.get(i).getColor()));
for(int j=0;j<garments.get(i).getSizes().size();j++)
{
//check xxsml
if(garments.get(i).getSizes().get(j).getXxsml() >0)
{
garmentString.append(this.garmentSizes(i, Size.xxsml(),garments.get(i).getSizes().get(j).getXxsml()));
}
//check xsml
if(garments.get(i).getSizes().get(j).getXsml() > 0)
{
garmentString.append(this.garmentSizes(i,Size.xsml(),garments.get(i).getSizes().get(j).getXsml()));
}
//check sml
if(garments.get(i).getSizes().get(j).getSml() > 0)
{
garmentString.append(this.garmentSizes(i,Size.sml(),garments.get(i).getSizes().get(j).getSml()));
}
//check med
if(garments.get(i).getSizes().get(j).getMed() > 0)
{
garmentString.append(this.garmentSizes(i,Size.med(),garments.get(i).getSizes().get(j).getMed()));
}
//check lrg
if(garments.get(i).getSizes().get(j).getLrg() > 0)
{
garmentString.append(this.garmentSizes(i,Size.lrg(),garments.get(i).getSizes().get(j).getLrg()));
}
//check xlrg
if(garments.get(i).getSizes().get(j).getXlg() > 0)
{
garmentString.append(this.garmentSizes(i,Size.xlg(),garments.get(i).getSizes().get(j).getXlg()));
}
//check xxlrg
if(garments.get(i).getSizes().get(j).getXxl() >0)
{
garmentString.append(this.garmentSizes(i,Size.xxlg(),garments.get(i).getSizes().get(j).getXxl()));
}
//check xxxlrg
if(garments.get(i).getSizes().get(j).getXxxl() >0)
{
garmentString.append(this.garmentSizes(i,Size.xxxlg(),garments.get(i).getSizes().get(j).getXxxl()));
}
}
}
return garmentString.toString();
}
This is my garmentSizes function:
public String garmentSizes(int garmentNumber, String size,int numberToSend)
{
String garmentSizes = "&garment["+garmentNumber+"][sizes]["+size+"]="+numberToSend;
return garmentSizes;
}
I'm trying to figure out how I can get this done with a lot less code. I've read that with functional programming you can do things like pass in functions to parameters to other functions. After doing some reading online, I think I want to do something like this but I'm not sure how or what the best approach would be.
I have done some reading here on stack overflow and I've seen people mention using either the Command pattern or FunctionalJava or LambdaJ for trying to approximate this feature in Java. I've read over the documentation for the two libraries and read the Wikipedia Article on the Command Pattern, but I'm still not sure how I would use any of those to solve my particular problem. Can somebody explain this to me? As somebody that has never done any functional programming this is a bit confusing.
You could use local variables to decrease the amount of repetition, say bySize = garments.get(i).getSizes().get(j) for example.
Instead of size.getXxsml(), size.getXsml() etc., you could use an enum for sizes and loop over its values.
The whole thing would then look like:
for(int j=0;j<garments.get(i).getSizes().size();j++) {
bySize = garments.get(i).getSizes().get(j);
for (Size s : Size.values()) {
if (bySize.get(s) > 0) {
garmentString.append(garmentSizes(i, s, bySize.get(s)));
}
}
}
The bySize.get(s) method could be implemented either with a switch that directs to the right method or directly in the enum and you could get rid of the getXsml etc. methods.
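One way to put the lookup "directly in the enum" is to let each constant carry a function that reads its own count, which is also a small taste of passing functions around. In this sketch SizeCounts is a hypothetical stand-in for the question's per-size object, only three sizes are modelled, and the query-string keys are assumed to match what Size.sml() etc. returned.

```java
import java.util.function.ToIntFunction;

public class SizeLoop {
    // Hypothetical stand-in for the question's per-size quantities (three sizes only).
    static class SizeCounts {
        final int sml, med, lrg;
        SizeCounts(int sml, int med, int lrg) { this.sml = sml; this.med = med; this.lrg = lrg; }
        int get(Size s) { return s.getter.applyAsInt(this); }
    }

    // Each constant carries its query-string key and a function that reads its count.
    enum Size {
        SML("sml", c -> c.sml), MED("med", c -> c.med), LRG("lrg", c -> c.lrg);

        final String key;
        final ToIntFunction<SizeCounts> getter;
        Size(String key, ToIntFunction<SizeCounts> getter) { this.key = key; this.getter = getter; }
    }

    static String garmentSizes(int garmentNumber, Size size, int numberToSend) {
        return "&garment[" + garmentNumber + "][sizes][" + size.key + "]=" + numberToSend;
    }

    static String sizesString(int garmentNumber, SizeCounts bySize) {
        StringBuilder sb = new StringBuilder();
        for (Size s : Size.values()) {
            if (bySize.get(s) > 0) {
                sb.append(garmentSizes(garmentNumber, s, bySize.get(s)));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(sizesString(0, new SizeCounts(2, 0, 5)));
        // &garment[0][sizes][sml]=2&garment[0][sizes][lrg]=5
    }
}
```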
The only thing which differs between all your checks is this:
getXxsml/xxsml, getXsml/xsml, getSml/sml, etc.
If you could pass these values (as strings) to some upper-level method, and if
that upper-level method could eval i.e. execute these strings, then you can just
have an array of these values and pass that array to that upper-level method.
In Java, you can do something similar with reflection.
All these checks could indeed be simplified to much less
code through the use of reflection.
Look at:
java.lang.Class
java.lang.reflect.Method
java.lang.reflect.Field
java.lang.reflect.Constructor
and you will see what I mean.
From your code it appears that some Class has the following methods:
xxsml(), xsml(), sml(), med(), ..., xxxlg()
to get the amounts (?) available for each size.
You can design your data better, like this:
Have a "Size" type, that enumerates all sizes (could be Enum or some class with attribute String key)
Have a method that returns a List of all known sizes.
replace the above methods with amountFor(Size) This could be backed by a Map<Size, Integer>
For backward compatibility, you could rewrite the old methods along the lines:
int xxsml() {
return amountFor(Size.XXSML); // assuming you have a singleton instance
// for each well known size
}
Of course, in getGarmentString, you would then loop through the List of all known sizes:
for (Size sz : Size.getAllKnownSizes()) {
if (garments.get(i).getSizes().get(j).amountFor(sz) > 0) {
... do whatever must be done here
}
}
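A minimal sketch of the Map-backed amountFor(Size) design, here using an EnumMap. The class name GarmentAmounts, the setAmount method and the constant names are assumptions made for the example.

```java
import java.util.EnumMap;
import java.util.Map;

public class GarmentAmounts {
    enum Size { XXSML, XSML, SML, MED, LRG, XLG, XXLG, XXXLG }

    private final Map<Size, Integer> amounts = new EnumMap<>(Size.class);

    void setAmount(Size size, int amount) {
        amounts.put(size, amount);
    }

    // Sizes that were never set count as zero.
    int amountFor(Size size) {
        return amounts.getOrDefault(size, 0);
    }

    // Old-style accessor kept for backward compatibility.
    int xxsml() {
        return amountFor(Size.XXSML);
    }

    public static void main(String[] args) {
        GarmentAmounts g = new GarmentAmounts();
        g.setAmount(Size.MED, 3);
        for (Size sz : Size.values()) {
            if (g.amountFor(sz) > 0) {
                System.out.println(sz + "=" + g.amountFor(sz));  // MED=3
            }
        }
    }
}
```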
