Assert that one of the strings in an array contains a substring - Java

List<String> expectedStrings = Arrays.asList("link1", "link2");
List<String> strings = Arrays.asList("lalala link1 lalalla", "lalalal link2 lalalla");
For each expectedString, I need to assert that at least one of the strings in 'strings' contains expectedString.
How can I assert this with Hamcrest?
Thanks for your attention.

Update
After checking this old answer, I found that you can use a better combination of built-in matchers making both the assertion and the error messages more readable:
expectedStrings.forEach(expectedString ->
assertThat(strings, hasItem(containsString(expectedString))));
Original answer for reference
You can do it quite easily with streams:
assertThat(expectedStrings.stream().allMatch(
expectedString -> strings.stream()
.anyMatch(string -> string.contains(expectedString))),
is(true));
allMatch makes sure every one of the expectedStrings is checked, and anyMatch on strings efficiently checks whether any of the strings contains the current expected string (it stops at the first match).
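One drawback of asserting on a boolean is that a failure only reports that true was expected. The following is a minimal, stdlib-only sketch (the class and method names are hypothetical, not part of Hamcrest or JUnit) that uses the same noneMatch/anyMatch idea but collects the expected strings that are missing, so a failing assertion can name them:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SubstringAssertions {
    // Returns every expected string that is not a substring of any element of 'strings'.
    public static List<String> missingSubstrings(List<String> expectedStrings, List<String> strings) {
        return expectedStrings.stream()
                .filter(expected -> strings.stream().noneMatch(s -> s.contains(expected)))
                .collect(Collectors.toList());
    }

    // Hypothetical helper: throws an AssertionError that names the missing substrings.
    public static void assertAllContained(List<String> expectedStrings, List<String> strings) {
        List<String> missing = missingSubstrings(expectedStrings, strings);
        if (!missing.isEmpty()) {
            throw new AssertionError("No element of " + strings + " contains " + missing);
        }
    }

    public static void main(String[] args) {
        List<String> expectedStrings = Arrays.asList("link1", "link2");
        List<String> strings = Arrays.asList("lalala link1 lalalla", "lalalal link2 lalalla");
        assertAllContained(expectedStrings, strings); // passes
        System.out.println(missingSubstrings(Arrays.asList("link1", "link3"), strings)); // [link3]
    }
}
```

The hasItem(containsString(...)) combination in the update gives similarly descriptive messages when Hamcrest is available; this version only avoids the dependency.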

At the moment there isn't a single Hamcrest matcher for this requirement; you can combine several matchers, but there is no one built-in matcher that does it directly.
So, in cases like yours, the best solution in my opinion is to create your own matcher. Why?
It can be reused
It is maintainable
It is more readable
In your case you need to check that every string of the expected list is contained in at least one string of the other list, so you can create a Matcher like this:
import java.util.List;
import org.hamcrest.BaseMatcher;
import org.hamcrest.Description;
import org.hamcrest.Matcher;

public class ContainsStringsOf extends BaseMatcher<List<String>> {
    private final List<String> valuesToCompare;

    public ContainsStringsOf(List<String> valuesToCompare) {
        this.valuesToCompare = valuesToCompare;
    }

    @Override
    public void describeTo(Description description) {
        // Describes what was expected; Hamcrest prints it after "Expected:"
        description.appendText("a list with elements containing all of " + valuesToCompare);
    }

    @Override
    public boolean matches(Object o) {
        if (!(o instanceof List)) {
            return false;
        }
        List<?> values = (List<?>) o;
        // note: you can replace this loop with the Java 8 comparison from @florian's answer:
        // return valuesToCompare.stream().allMatch(exp -> values.stream().anyMatch(v -> ((String) v).contains(exp)));
        for (String valueToCompare : valuesToCompare) {
            boolean containsText = false;
            for (Object value : values) {
                if (value instanceof String && ((String) value).contains(valueToCompare)) {
                    containsText = true;
                    break; // one match is enough for this expected value
                }
            }
            if (!containsText) {
                return false;
            }
        }
        return true;
    }

    // Static factory method, so it reads like the built-in matchers
    public static Matcher<List<String>> containsStringsOf(List<String> collection) {
        return new ContainsStringsOf(collection);
    }
}
Then you can use it just like any other Hamcrest matcher:
List<String> expectedStrings = Arrays.asList("link1", "link2");
List<String> strings = Arrays.asList("lalala link1 lalalla", "lalalal link2 lalalla");
Assert.assertThat(strings , containsStringsOf(expectedStrings));

Related

Java - check if under_score string is in a list of lowerCamel strings

Consider the following keys (under_score) and fields (lowerCamel):
keys = ["opened_by","ticket_owner","close_reason"]
fields = ["openedBy","ticketOwner","closeReason"]
I'm looking for an efficient way in Java to check whether a key is in fields, where I expect the following to return true:
fields = ["openedBy","ticketOwner"]
fields.contains("opened_by") // should be true
My code:
Set<String> incidentFields = Arrays
.stream(TicketIncidentDTO.class.getDeclaredFields())
.map(Field::getName)
.collect(Collectors.toSet());
responseJson.keySet().forEach(key ->{
if (incidentFields.contains(key))
{
//Do something
}
});
I could just convert each lowerCamel field name to under_score (replacing each uppercase letter with an underscore and its lowercase form), but I'm looking for a more efficient way of doing this.
Try CaseUtils from Apache Commons Text:
// opened_by -> openedBy
private String toCamel(String str) {
return CaseUtils.toCamelCase(str, false, new char[] { '_' });
}
List<String> keys = Arrays.asList("opened_by", "ticket_owner", "close_reason", "full_name");
List<String> fields = Arrays.asList("openedBy", "ticketOwner", "closeReason");
keys.forEach(t -> {
// check
if (fields.contains(toCamel(t))) {
System.out.println(t);
}
});
If you do not have fields like abcXyz (abc_xyz) and abCxyz (ab_cxyz), i.e. fields with the same spelling but a different split into words, then one solution is to remove the "_" from the key and compare it to each field name using equalsIgnoreCase. A similar solution is to convert each field name to lower case once and then compare it to the under_score key after removing the "_". Because the lower-cased names can be kept in a Set, this avoids the extra loop over all field names that the first approach needs for every key.
Set<String> fields = Arrays.stream(TicketIncidentDTO.class.getDeclaredFields())
.map(Field::getName)
.map(String::toLowerCase)
.collect(Collectors.toSet());
responseJson.keySet().stream()
.filter(key -> fields.contains(key.replace("_", "")))
.forEach(key -> {
// do something..
});
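To see the normalization approach in isolation, here is a small runnable sketch. It assumes a plain list of field names standing in for TicketIncidentDTO's declared fields and a list of keys standing in for responseJson.keySet(); the class and method names are hypothetical. Both naming styles are reduced to the same form (no underscores, lower case) before the Set lookup:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class NormalizedKeyMatch {
    // Reduces both styles to one form: "opened_by" -> "openedby", "openedBy" -> "openedby".
    public static String normalize(String name) {
        return name.replace("_", "").toLowerCase(Locale.ROOT);
    }

    // Returns the keys whose normalized form equals some normalized field name.
    public static List<String> matchingKeys(List<String> keys, List<String> fieldNames) {
        Set<String> normalizedFields = fieldNames.stream()
                .map(NormalizedKeyMatch::normalize)
                .collect(Collectors.toSet());
        return keys.stream()
                .filter(key -> normalizedFields.contains(normalize(key)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("openedBy", "ticketOwner"); // stands in for getDeclaredFields()
        List<String> keys = Arrays.asList("opened_by", "ticket_owner", "close_reason");
        System.out.println(matchingKeys(keys, fields)); // [opened_by, ticket_owner]
    }
}
```

The caveat from the answer shows up directly here: normalize("ab_cxyz") and normalize("abcXyz") are both "abcxyz", so such fields would be treated as the same name.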
A simple toCamel method:
private String toCamel(String str) {
String[] parts = str.split("_");
StringBuilder sb = new StringBuilder(parts[0]);
for (int i=1; i < parts.length ; i++) {
String part = parts[i];
if (part.length() > 0) {
sb.append(part.substring(0, 1).toUpperCase()).append(part.substring(1));
}
}
return sb.toString();
}
Now use the very same approach:
keys.forEach(t -> {
if (fields.contains(toCamel(t))) {
System.out.println("Fields contain " + t);
} else {
System.out.println("Fields don't contain " + t);
}
});
"I could just replace all lowerCase with underscore, but I'm looking for a more efficient way of doing this."
Use a Set as the data structure for keys and fields, since it offers very efficient (expected constant-time) lookups. Moreover, it suits this use case, because a JSON object shouldn't have duplicate keys anyway.
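A minimal sketch of that Set-based idea, going the other direction from toCamel: the field names are converted to under_score once, after which every JSON key is a single hash lookup with no per-key conversion. The toUnderscore helper is hypothetical and assumes plain lowerCamel names (each uppercase letter starts a new word; acronyms such as "userID" are not handled):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UnderscoreLookup {
    // Hypothetical helper: "openedBy" -> "opened_by".
    public static String toUnderscore(String camel) {
        StringBuilder sb = new StringBuilder(camel.length() + 4);
        for (char c : camel.toCharArray()) {
            if (Character.isUpperCase(c)) {
                sb.append('_').append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Converts the field names once; afterwards each key check is one Set lookup.
    public static Set<String> underscoreNames(List<String> fieldNames) {
        Set<String> result = new HashSet<>();
        for (String field : fieldNames) {
            result.add(toUnderscore(field));
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> fields = underscoreNames(Arrays.asList("openedBy", "ticketOwner", "closeReason"));
        for (String key : Arrays.asList("opened_by", "full_name")) {
            System.out.println(key + " -> " + fields.contains(key)); // opened_by -> true, full_name -> false
        }
    }
}
```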

Create a test case for a sorted list of strings

I have a list of strings, and in my code I sort this list. I want to write a unit test to ensure that the list has been ordered properly. My code:
@Test
public void test() {
List<String> orderedList = new ArrayList<String>();
orderedList.add("a");
orderedList.add("b");
orderedList.add("a");
assertThat(orderedList, isInDescendingOrdering());
}
private Matcher<? super List<String>> isInDescendingOrdering()
{
return new TypeSafeMatcher<List<String>>()
{
@Override
public void describeTo (Description description)
{
description.appendText("ignored");
}
@Override
protected boolean matchesSafely (List<String> item)
{
for(int i = 0 ; i < item.size() -1; i++) {
if(item.get(i).equals(item.get(i+1))) return false;
}
return true;
}
};
}
Somehow it succeeds every time.
You are absolutely overcomplicating things here. Writing a custom matcher is a nice exercise, but it does not add any real value to your tests.
Instead, I would suggest that you simply create an expected value such as
String[] expectedArray =....
and pass that to your call to assertThat. That is less sexy, but much easier to read and understand, and that is what counts for unit tests!
You can do it simply by copying the list, sorting the copy, and finally comparing it with the original list.
The code is given below:
@Test
public void test() {
List<String> orderedList = new ArrayList<String>();
orderedList.add("a");
orderedList.add("b");
orderedList.add("a");
//Copy the array to sort and then to compare with original
List<String> orderedList2 = new ArrayList<String>(orderedList);
orderedList2.sort((String s1, String s2) -> s1.compareTo(s2));
Assert.assertEquals(orderedList, orderedList2);
}
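The matcher in the question always passes because matchesSafely compares neighbours with equals, which says nothing about order ("a", "b", "a" has no equal neighbours). A minimal sketch of an adjacent-pair check that uses a Comparator instead (isSorted is a hypothetical helper name); for descending order pass Comparator.reverseOrder(), and the same loop could serve as the body of matchesSafely:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortedCheck {
    // Returns true if no element is greater than its successor according to 'order'.
    public static <T> boolean isSorted(List<T> list, Comparator<? super T> order) {
        for (int i = 0; i < list.size() - 1; i++) {
            if (order.compare(list.get(i), list.get(i + 1)) > 0) {
                return false; // found a pair that is out of order
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", "b", "a");
        System.out.println(isSorted(list, Comparator.naturalOrder()));                          // false
        System.out.println(isSorted(Arrays.asList("c", "b", "a"), Comparator.reverseOrder())); // true
    }
}
```

Unlike copying and sorting, this needs no extra list and stops at the first out-of-order pair.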

How can I check if a string has a substring from a List?

I am looking for the best way to check if a string contains a substring from a list of keywords.
For example, I create a list like this:
List<String> keywords = new ArrayList<>();
keywords.add("mary");
keywords.add("lamb");
String s1 = "mary is a good girl";
String s2 = "she likes travelling";
String s1 has "mary" from the keywords, but string s2 does not have it. So, I would like to define a method:
boolean containsAKeyword(String str, List<String> keywords)
Where containsAKeyword(s1, keywords) would return true but containsAKeyword(s2, keywords) would return false. I can return true even if there is a single substring match.
I know I can just iterate over the keywords list and call str.contains() on each item in the list, but I was wondering if there is a better way than iterating over the complete list (to avoid O(n) complexity), or if Java provides any built-in methods for this.
I would recommend iterating over the entire list. Thankfully, you can use an enhanced for loop:
for(String listItem : myArrayList){
if(myString.contains(listItem)){
// do something.
}
}
EDIT To the best of my knowledge, you have to iterate the list somehow. Think about it, how will you know which elements are contained in the list without going through it?
EDIT 2
The only way I can see the iteration running quickly is to return as soon as you find a match: the loop then stops early without searching any further. Put your return false statement after the loop, because if you have checked the entire list without finding a match, clearly there is none. Here is some more detailed code:
public boolean containsAKeyword(String myString, List<String> keywords){
for(String keyword : keywords){
if(myString.contains(keyword)){
return true;
}
}
return false; // Never found match.
}
EDIT 3
If you're using Kotlin, you can do this with the any function:
val containsKeyword = keywords.any { myString.contains(it) }
In JDK 8 you can do it like this:
public static boolean hasKey(String key) {
return keywords.stream().filter(k -> key.contains(k)).collect(Collectors.toList()).size() > 0;
}
hasKey(s1); // returns true
hasKey(s2); // returns false
Now you can use Java 8 stream for this purpose:
keywords.stream().anyMatch(keyword -> str.contains(keyword));
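Wrapped in the method signature from the question, the anyMatch one-liner looks like the sketch below. anyMatch short-circuits at the first keyword that occurs in the string, but in the worst case (no match) it still examines every keyword, so it does not get around the O(n) scan:

```java
import java.util.Arrays;
import java.util.List;

public class KeywordCheck {
    // True if str contains at least one of the keywords as a substring.
    public static boolean containsAKeyword(String str, List<String> keywords) {
        return keywords.stream().anyMatch(str::contains);
    }

    public static void main(String[] args) {
        List<String> keywords = Arrays.asList("mary", "lamb");
        System.out.println(containsAKeyword("mary is a good girl", keywords));  // true
        System.out.println(containsAKeyword("she likes travelling", keywords)); // false
    }
}
```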
Here is the solution
List<String> keywords = new ArrayList<>();
keywords.add("mary");
keywords.add("lamb");
String s1 = "mary is a good girl";
String s2 = "she likes travelling";
// The function
boolean check(String str, List<String> keywords) {
Iterator<String> it = keywords.iterator();
while(it.hasNext()){
if(str.contains(it.next()))
return true;
}
return false;
}
Iterate over the keyword list and return true if the string contains your keyword. Return false otherwise.
public boolean containsAKeyword(String str, List<String> keywords){
for(String k : keywords){
if(str.contains(k))
return true;
}
return false;
}
You can add all the keywords to a HashSet. Then, for string 1 and string 2, check each word of the string against the set (set.contains) to see whether any keyword is present. Note that this only finds keywords that appear as whole words.
Depending on the size of the list, I would suggest using the matches() method of String. String.matches takes a regex argument, so with smaller lists you could simply build a regular expression and evaluate it:
String str = "This is a test string";
System.out.println(str.matches("(.*)test(.*)"));
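A minimal sketch of that set-based idea, assuming keywords are whole words separated by whitespace in the input (containsKeywordWord is a hypothetical name). The cost then depends on the number of words in the string rather than on the number of keywords, but a keyword embedded inside a longer word is not found:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class WordKeywordCheck {
    // Looks up each whitespace-separated word of str in a HashSet of keywords.
    // Only whole words match: "maryland" does NOT match the keyword "mary".
    public static boolean containsKeywordWord(String str, Set<String> keywordSet) {
        for (String word : str.split("\\s+")) {
            if (keywordSet.contains(word)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> keywordSet = new HashSet<>(Arrays.asList("mary", "lamb"));
        System.out.println(containsKeywordWord("mary is a good girl", keywordSet));  // true
        System.out.println(containsKeywordWord("she likes travelling", keywordSet)); // false
        System.out.println(containsKeywordWord("maryland", keywordSet));             // false
    }
}
```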
This should print out "true."
Or you could use java.util.regex.Pattern.
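With java.util.regex.Pattern you can build one alternation from the keyword list, compile it once, and reuse it for every string. This sketch (keywordPattern is a hypothetical name) quotes each keyword with Pattern.quote so characters such as "." are matched literally, and uses find(), which looks for the pattern anywhere in the input, so the (.*) wrappers that String.matches needs are unnecessary. Note that an empty keyword list would produce an empty pattern, which matches every string:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class KeywordPattern {
    // Builds one pattern such as \Qmary\E|\Qlamb\E from the keywords.
    public static Pattern keywordPattern(List<String> keywords) {
        String alternation = keywords.stream()
                .map(Pattern::quote) // escape regex metacharacters in each keyword
                .collect(Collectors.joining("|"));
        return Pattern.compile(alternation);
    }

    public static void main(String[] args) {
        Pattern p = keywordPattern(Arrays.asList("mary", "lamb"));
        // find() succeeds if the pattern occurs anywhere in the input,
        // unlike String.matches(), which must match the whole string.
        System.out.println(p.matcher("mary is a good girl").find());  // true
        System.out.println(p.matcher("she likes travelling").find()); // false
    }
}
```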

remove duplicate strings in a List in Java

Update:
I guess HashSet.add(Object obj) does not call contains. Is there a way to implement what I want (remove duplicate strings, ignoring case, using a Set)?
Original question:
I'm trying to remove duplicates from a list of Strings in Java; however, in the following code CaseInsensitiveSet.contains(Object obj) is not getting called. Why?
public static List<String> removeDupList(List<String>list, boolean ignoreCase){
Set<String> set = (ignoreCase?new CaseInsensitiveSet():new LinkedHashSet<String>());
set.addAll(list);
List<String> res = new Vector<String>(set);
return res;
}
public class CaseInsensitiveSet extends LinkedHashSet<String>{
@Override
public boolean contains(Object obj){
//this not getting called.
if(obj instanceof String){
return super.contains(((String)obj).toLowerCase());
}
return super.contains(obj);
}
}
Try
Set<String> set = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
set.addAll(list);
return new ArrayList<String>(set);
UPDATE: as Tom Anderson mentioned, this does not preserve the initial order. If that is really an issue, try:
Set<String> set = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
Iterator<String> i = list.iterator();
while (i.hasNext()) {
String s = i.next();
if (set.contains(s)) {
i.remove();
}
else {
set.add(s);
}
}
prints
[2, 1]
contains is not called as LinkedHashSet is not implemented that way.
If you want add() to call contains() you will need to override it as well.
The reason it is not implemented this way is that calling contains first would mean you are performing two lookups instead of one which would be slower.
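Since add() never consults an overridden contains(), another option is to normalize the key yourself instead of relying on the Set to do it. A minimal sketch, assuming lower-casing with Locale.ROOT is an acceptable notion of "ignore case" (dedupIgnoreCase is a hypothetical name): a LinkedHashMap keyed by the lower-cased string keeps the first spelling of each string, in the order it first appeared:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class CaseInsensitiveDedup {
    // Keeps the first spelling of each string (compared case-insensitively),
    // in the order the strings first appear.
    public static List<String> dedupIgnoreCase(List<String> list) {
        Map<String, String> firstSeen = new LinkedHashMap<>();
        for (String s : list) {
            // The lower-cased form is the key, so "A" and "a" collide;
            // putIfAbsent keeps the original string that arrived first.
            firstSeen.putIfAbsent(s.toLowerCase(Locale.ROOT), s);
        }
        return new ArrayList<>(firstSeen.values());
    }

    public static void main(String[] args) {
        System.out.println(dedupIgnoreCase(Arrays.asList("b", "A", "B", "a", "c"))); // [b, A, c]
    }
}
```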
The add() method of LinkedHashSet does not call contains() internally; otherwise your method would have been called as well.
Instead of a LinkedHashSet, why don't you use a SortedSet with a case-insensitive comparator, such as String.CASE_INSENSITIVE_ORDER?
Your code is reduced to:
public static List<String> removeDupList(List<String>list, boolean ignoreCase){
Set<String> set = (ignoreCase?new TreeSet<String>(String.CASE_INSENSITIVE_ORDER):new LinkedHashSet<String>());
set.addAll(list);
List<String> res = new ArrayList<String>(set);
return res;
}
If you wish to preserve the order, as @Tom Anderson pointed out in his comment, you can keep an auxiliary list for the order.
Try adding each element to the TreeSet; if add returns true, also add it to the ordered list, otherwise skip it.
public static List<String> removeDupList(List<String>list){
Set<String> sortedSet = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
List<String> orderedList = new ArrayList<String>();
for(String str : list){
if(sortedSet.add(str)){ // add returns true, if it is not present already else false
orderedList.add(str);
}
}
return orderedList;
}
Try
@Override
public boolean addAll(Collection<? extends String> c) {
    boolean changed = false;
    for (String s : c) {
        // add each element only if our own contains() says it is new
        if (!this.contains(s)) {
            changed |= this.add(s);
        }
    }
    return changed;
}
@Override
public boolean contains(Object o) {
    //Do your checking here
    return super.contains(o);
}
This makes sure your contains method is called when elements are added through addAll.
Here's another approach, using a HashSet of the strings for deduplication, but building the result list directly:
public static List<String> removeDupList(List<String> list, boolean ignoreCase) {
HashSet<String> seen = new HashSet<String>();
ArrayList<String> deduplicatedList = new ArrayList<String>();
for (String string : list) {
if (seen.add(ignoreCase ? string.toLowerCase() : string)) {
deduplicatedList.add(string);
}
}
return deduplicatedList;
}
This is fairly simple, makes only one pass over the elements, and does only a lowercase, a hash lookup, and then a list append for each element.

Partially match strings in case of List.contains(String)

I have a List<String>
List<String> list = new ArrayList<String>();
list.add("ABCD");
list.add("EFGH");
list.add("IJ KL");
list.add("M NOP");
list.add("UVW X");
if I do list.contains("EFGH"), it returns true.
Can I get a true in case of list.contains("IJ")? I mean, can I partially match strings to find if they exist in the list?
I have a list of 15000 strings, and I have to check whether about 10000 strings exist in the list. What could be some other (faster) way to do this?
Thanks.
If the suggestion from Roadrunner-EX does not suffice, then I believe you are looking for the Knuth–Morris–Pratt algorithm.
Time complexity:
Time complexity of the table (preprocessing) algorithm is O(k)
Time complexity of the search algorithm is O(n)
So, the complexity of the overall algorithm is O(n + k).
n = length of the text being searched (for a whole list, the total number of characters in it)
k = length of the pattern you are searching for
Normal brute force has a worst-case time complexity of O(nk).
Moreover, the KMP table depends only on the search string, so it can be built once and reused for every string in the list, whereas brute force can cost O(nk) on every search.
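For reference, a textbook KMP formulation in Java (a sketch, not code from the answer; the class and method names are hypothetical). The failure table is built once per search string in O(k) and then reused for each string in the list, each search costing time linear in that string's length. With 10000 different search strings you still pay one pass over the list per search string:

```java
import java.util.Arrays;
import java.util.List;

public class Kmp {
    // fail[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of it. Built in O(k).
    public static int[] failureTable(String pattern) {
        int[] fail = new int[pattern.length()];
        int len = 0;
        for (int i = 1; i < pattern.length(); i++) {
            while (len > 0 && pattern.charAt(i) != pattern.charAt(len)) {
                len = fail[len - 1]; // fall back to the next shorter border
            }
            if (pattern.charAt(i) == pattern.charAt(len)) {
                len++;
            }
            fail[i] = len;
        }
        return fail;
    }

    // True if pattern occurs in text; linear in text.length().
    public static boolean contains(String text, String pattern, int[] fail) {
        if (pattern.isEmpty()) {
            return true;
        }
        int matched = 0;
        for (int i = 0; i < text.length(); i++) {
            while (matched > 0 && text.charAt(i) != pattern.charAt(matched)) {
                matched = fail[matched - 1]; // reuse the partial match instead of restarting
            }
            if (text.charAt(i) == pattern.charAt(matched)) {
                matched++;
            }
            if (matched == pattern.length()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String pattern = "IJ";
        int[] fail = failureTable(pattern); // built once, reused for every element
        List<String> list = Arrays.asList("ABCD", "EFGH", "IJ KL", "M NOP", "UVW X");
        for (String s : list) {
            if (contains(s, pattern, fail)) {
                System.out.println(s + " contains " + pattern); // IJ KL contains IJ
            }
        }
    }
}
```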
Perhaps you want to put each String fragment into a HashSet; by fragment I mean: don't add "IJ KL", but rather add "IJ" and "KL" separately. If you need both the list and this search capability, you may need to maintain two collections.
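A minimal sketch of that fragment index, assuming fragments are separated by whitespace (the class and method names are hypothetical). The index is built once from the 15000 strings, after which each of the 10000 queries is a single hash lookup; the trade-off is that only whole fragments match, so "I" would not be found inside "IJ":

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FragmentIndex {
    // Collects the whitespace-separated fragments of every list element,
    // e.g. "IJ KL" contributes "IJ" and "KL".
    public static Set<String> buildIndex(List<String> list) {
        Set<String> fragments = new HashSet<>();
        for (String s : list) {
            fragments.addAll(Arrays.asList(s.split("\\s+")));
        }
        return fragments;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("ABCD", "EFGH", "IJ KL", "M NOP", "UVW X");
        Set<String> index = buildIndex(list); // built once
        System.out.println(index.contains("IJ")); // true
        System.out.println(index.contains("I"));  // false: only whole fragments are indexed
    }
}
```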
As a second answer, upon rereading your question: you could also extend ArrayList, specialize it for Strings only, and override the contains() method.
public class PartialStringList extends ArrayList<String>
{
@Override
public boolean contains(Object o)
{
if(!(o instanceof String))
{
return false;
}
String s = (String)o;
Iterator<String> iter = iterator();
while(iter.hasNext())
{
String iStr = iter.next();
if (iStr.contains(s))
{
return true;
}
}
return false;
}
}
Judging by your earlier comments, this is maybe not the speed you're looking for, but is this more similar to what you were asking for?
You could use IterableUtils from Apache Commons Collections.
List<String> list = new ArrayList<String>();
list.add("ABCD");
list.add("EFGH");
list.add("IJ KL");
list.add("M NOP");
list.add("UVW X");
boolean hasString = IterableUtils.contains(list, "IJ", new Equator<String>() {
@Override
public boolean equate(String o1, String o2) {
return o2.contains(o1);
}
@Override
public int hash(String o) {
return o.hashCode();
}
});
System.out.println(hasString); // true
You can iterate over the list, and then call contains() on each String.
public boolean listContainsString(List<String> list, String checkStr)
{
Iterator<String> iter = list.iterator();
while(iter.hasNext())
{
String s = iter.next();
if (s.contains(checkStr))
{
return true;
}
}
return false;
}
Something like that should work, I think.
How about:
java.util.List<String> list = new java.util.ArrayList<String>();
list.add("ABCD");
list.add("EFGH");
list.add("IJ KL");
list.add("M NOP");
list.add("UVW X");
java.util.regex.Pattern p = java.util.regex.Pattern.compile("IJ");
java.util.regex.Matcher m = p.matcher("");
for(String s : list)
{
m.reset(s);
if(m.find()) System.out.println("Partially Matched");
}
Here's some code that uses a regex to shortcut the inner loop if none of the test Strings are found in the target String.
public static void main(String[] args) throws Exception {
List<String> haystack = Arrays.asList(new String[] { "ABCD", "EFGH", "IJ KL", "M NOP", "UVW X" });
List<String> needles = Arrays.asList(new String[] { "IJ", "NOP" });
// To cut down on iterations, create one big regex to check the whole haystack
StringBuilder sb = new StringBuilder();
sb.append(".*(");
for (String needle : needles) {
sb.append(needle).append('|');
}
sb.replace(sb.length() - 1, sb.length(), ").*");
String regex = sb.toString();
for (String target : haystack) {
if (!target.matches(regex)) {
System.out.println("Skipping " + target);
continue;
}
for (String needle : needles) {
if (target.contains(needle)) {
System.out.println(target + " contains " + needle);
}
}
}
}
Output:
Skipping ABCD
Skipping EFGH
IJ KL contains IJ
M NOP contains NOP
Skipping UVW X
If you really want to get clever, you could use a binary search (bisection) to identify which segments of the target list match, but it might not be worth it.
It depends on how likely it is that you'll find a hit. Low hit rates will give a good result; high hit rates will perform not much better than the simple nested-loop version. Consider inverting the loops if some needles hit many targets and others hit none.
It's all about aborting a search path ASAP.
Yes, you can! Sort of.
What you are looking for, is often called fuzzy searching or approximate string matching and there are several solutions to this problem.
With the FuzzyWuzzy lib, for example, you can have all your strings assigned a score based on how similar they are to a particular search term. The actual values seem to be integer percentages of the number of characters matching with regards to the search string length.
After invoking FuzzySearch.extractAll, it is up to you to decide what the minimum score would be for a string to be considered a match.
There are also other, similar libraries worth checking out, like google-diff-match-patch or the Apache Commons Text Similarity API, and so on.
If you need something really heavy-duty, your best bet would probably be Lucene (as also mentioned by Ryan Shillington)
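To illustrate what such a similarity score can look like, here is a stdlib-only sketch based on Levenshtein (edit) distance. This is NOT the scoring formula used by FuzzyWuzzy or the other libraries mentioned; the 0-100 score function is a hypothetical illustration. Scoring the whole string penalizes extra characters ("IJ" against "IJ KL" only scores 40), which is why fuzzy-matching libraries typically also offer partial-match scoring:

```java
public class EditDistanceScore {
    // Classic dynamic-programming Levenshtein distance, keeping two rows.
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Hypothetical 0-100 score: 100 means identical strings.
    public static int score(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 100;
        }
        return (int) Math.round(100.0 * (maxLen - levenshtein(a, b)) / maxLen);
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // 3
        System.out.println(score("IJ KL", "IJ"));             // 40
    }
}
```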
This is not a direct answer to the given problem, but I think it will help anyone who wants to partially compare a given string with the elements of a list using Apache Commons Collections.
final Equator<String> equator = new Equator<String>() {
    @Override
    public boolean equate(String o1, String o2) {
        // compare only the part before the last ':' (assumes both strings contain ':')
        final int i1 = o1.lastIndexOf(":");
        final int i2 = o2.lastIndexOf(":");
        return o1.substring(0, i1).equals(o2.substring(0, i2));
    }
    @Override
    public int hash(String o) {
        final int i1 = o.lastIndexOf(":");
        return o.substring(0, i1).hashCode();
    }
};
// Lists.newArrayList comes from Guava
final List<String> list = Lists.newArrayList("a1:v1", "a2:v2");
System.out.println(IteratorUtils.matchesAny(list.iterator(), new EqualPredicate<String>("a2:v1", equator)));
