AWS DynamoDB multiple "NE" string filters? - java

I'm trying to scan/query an AWS DynamoDB table for a list of items where id (a single string) is not equal to any of strings A, B, C, D, etc.
I've tried something like this:
for (String itemString : items) {
scanExpression.addFilterCondition("id",
new Condition().withComparisonOperator(ComparisonOperator.NE)
.withAttributeValueList(new AttributeValue().withS(itemString)));
}
PaginatedScanList<Items> result = mapper.scan(Item.class, scanExpression);
What appears to happen is that each time I add a filter, it overwrites the previous one, so the scan only checks against one itemString, not several. Am I missing something, or is this a limitation of DynamoDB?
Note: if it matters, in my example, I listed four values (A, B, C, D) but in production, this may be a list of hundreds of values.

You are correct in saying that the way you are doing it "overwrites the previous values". Here is the relevant code from the 1.9.23 SDK on DynamoDBScanExpression:
public void addFilterCondition(String attributeName, Condition condition) {
if ( scanFilter == null )
scanFilter = new HashMap<String, Condition>();
scanFilter.put(attributeName, condition);
}
Since your attributeName will be id for every put, the last value wins in this case.
In my opinion, this is poor behavior from DynamoDBScanExpression, and I would even lean more towards saying it is a bug that should be reported. The documentation does not state anything about when a duplicate attributeName is added and the method name makes it seem like this is unexpected behavior.
I don't see a good way to work around this other than building out the entire filter expression.
Another note: in the documentation I don't see a length constraint for a filter expression, nor a limit on how many ExpressionAttributeNames and ExpressionAttributeValues are allowed on a request. That may matter if you are trying to filter out a large number of attribute values, but I haven't found any documentation of what that limit might be or what behavior you should expect. Building out the expression looks like this:
StringJoiner filterExpression = new StringJoiner(" AND ");
Map<String, AttributeValue> expressionAttributeValues = new HashMap<>();
int itemAttributeSuffix = 0;
for (String itemString : items) {
StringBuilder expression = new StringBuilder("#idname")
.append(" ")
.append("<>")
.append(" ")
.append(":idval")
.append(itemAttributeSuffix);
filterExpression.add(expression);
expressionAttributeValues.put(":idval" + itemAttributeSuffix++,
new AttributeValue().withS(itemString));
}
Map<String, String> expressionAttributeNames = Collections.singletonMap("#idname", "id");
scanExpression.setFilterExpression(filterExpression.toString());
scanExpression.setExpressionAttributeNames(expressionAttributeNames);
scanExpression.setExpressionAttributeValues(expressionAttributeValues);
PaginatedScanList<Item> result = mapper.scan(Item.class, scanExpression);
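As a rough, SDK-independent sketch of the string-building step above, the following shows what the generated filter expression and placeholder map look like for a small list. The class name NotEqualFilterSketch and its Result holder are hypothetical names introduced only for illustration, and the values are kept as plain strings here, whereas the real code wraps each one in an AttributeValue.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of the AWS SDK): builds only the expression
// text and the placeholder-to-value map.
class NotEqualFilterSketch {

    // Simple holder for the two pieces a scan expression would need.
    static final class Result {
        final String filterExpression;
        final Map<String, String> placeholderValues;

        Result(String filterExpression, Map<String, String> placeholderValues) {
            this.filterExpression = filterExpression;
            this.placeholderValues = placeholderValues;
        }
    }

    static Result build(String namePlaceholder, List<String> excluded) {
        StringJoiner expression = new StringJoiner(" AND ");
        Map<String, String> values = new LinkedHashMap<>();
        int suffix = 0;
        for (String item : excluded) {
            // One unique value placeholder per excluded item.
            String valuePlaceholder = ":idval" + suffix++;
            expression.add(namePlaceholder + " <> " + valuePlaceholder);
            values.put(valuePlaceholder, item);
        }
        return new Result(expression.toString(), values);
    }

    public static void main(String[] args) {
        Result result = build("#idname", List.of("A", "B", "C"));
        System.out.println(result.filterExpression);
        System.out.println(result.placeholderValues);
    }
}
```

For the three values in main, the expression comes out as "#idname <> :idval0 AND #idname <> :idval1 AND #idname <> :idval2". An empty list yields an empty expression, so a caller would presumably skip setting the filter in that case.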

Related

How to collect data from a stream in different lists based on a condition?

I have a stream of data as shown below and I wish to collect the data based on a condition.
Stream of data:
452857;0;L100;csO;20220411;20220411;EUR;000101435;+; ;F;1;EUR;000100000;+;
452857;0;L120;csO;20220411;20220411;EUR;000101435;+; ;F;1;EUR;000100000;+;
452857;0;L121;csO;20220411;20220411;EUR;000101435;+; ;F;1;EUR;000100000;+;
452857;0;L126;csO;20220411;20220411;EUR;000101435;+; ;F;1;EUR;000100000;+;
452857;0;L100;csO;20220411;20220411;EUR;000101435;+; ;F;1;EUR;000100000;+;
452857;0;L122;csO;20220411;20220411;EUR;000101435;+; ;F;1;EUR;000100000;+;
I wish to collect the data based on the index = 2 (L100,L121 ...) and store it in different lists of L120,L121,L122 etc using Java 8 streams. Any suggestions?
Note: splittedLine array below is my stream of data.
For instance: I have tried the following but I think there's a shorter way:
List<String> L100_ENTITY_NAMES = Arrays.asList("L100", "L120", "L121", "L122", "L126");
List<List<String>> list= L100_ENTITY_NAMES.stream()
.map(entity -> Arrays.stream(splittedLine)
.filter(line -> {
String[] values = line.split(String.valueOf(DELIMITER));
if(values.length > 0){
return entity.equals(values[2]);
}
else{
return false;
}
}).collect(Collectors.toList())).collect(Collectors.toList());
I'd rather change the order and also collect the data into a Map<String, List<String>> where the key would be the entity name.
Assuming splittedLine is the array of lines, I'd probably do something like this:
Set<String> L100_ENTITY_NAMES = Set.of("L100", ...);
String delimiter = String.valueOf(DELIMITER);
Map<String, List<String>> result =
    Arrays.stream(splittedLine)
        .map(line -> {
            String[] values = line.split(delimiter);
            if (values.length < 3) {
                return null; // not enough fields to read index 2
            }
            // key = entity name (index 2), value = the whole line
            return new AbstractMap.SimpleEntry<>(values[2], line);
        })
        .filter(Objects::nonNull)
        .filter(entry -> L100_ENTITY_NAMES.contains(entry.getKey()))
        .collect(Collectors.groupingBy(Map.Entry::getKey,
            Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
Note that this isn't necessarily shorter but has a couple of other advantages:
It's not O(n*m) but roughly O(n) with a hash-based Set (or O(n * log(m)) with a TreeSet), so it should be faster for non-trivial stream sizes
You get an entity name for each list rather than having to rely on the indices in both lists
It's easier to understand because you use distinct steps:
split and map the line
filter null values, i.e. lines that aren't valid in the first place
filter lines that don't have any of the L100 entity names
collect the filtered lines by entity name so you can easily access the sub lists
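The steps above can be sketched as a small self-contained program. This is only an illustration under assumptions: the class name GroupLinesSketch, the method groupByEntity, and the shortened sample lines are made up here, and the delimiter is passed in as a plain ";" string.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

class GroupLinesSketch {

    // Groups the lines by their third field, keeping only the wanted entity names.
    static Map<String, List<String>> groupByEntity(String[] lines, Set<String> wanted, String delimiter) {
        return Arrays.stream(lines)
                .map(line -> {
                    String[] values = line.split(delimiter);
                    if (values.length < 3) {
                        return null; // invalid line: index 2 does not exist
                    }
                    return new AbstractMap.SimpleEntry<String, String>(values[2], line);
                })
                .filter(Objects::nonNull)
                .filter(entry -> wanted.contains(entry.getKey()))
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    public static void main(String[] args) {
        // Shortened sample lines; only the first few columns matter here.
        String[] lines = {
            "452857;0;L100;csO",
            "452857;0;L120;csO",
            "452857;0;L100;csO"
        };
        System.out.println(groupByEntity(lines, Set.of("L100", "L120"), ";"));
    }
}
```

Lines with fewer than three fields and lines whose entity name is not in the set are dropped silently, which matches the two filter steps described above.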
I would convert the semicolon-delimited lines to objects as soon as possible, instead of keeping them around as a serialized bunch of data.
First, I would create a record that models the data:
public record LBasedEntity(long id, int zero, String lcode, …) { }
Then, create a method to parse a line. An external parsing library would work as well, since this looks like CSV with a semicolon as the delimiter.
private static LBasedEntity parse(String line) {
String[] parts = line.split(";");
if (parts.length < 3) {
return null;
}
long id = Long.parseLong(parts[0]);
int zero = Integer.parseInt(parts[1]);
String lcode = parts[2];
…
return new LBasedEntity(id, zero, lcode, …);
}
Then the mapping is trivial:
Map<String, List<LBasedEntity>> result = Arrays.stream(lines)
.map(line -> parse(line))
.filter(Objects::nonNull)
.filter(lBasedEntity -> L100_ENTITY_NAMES.contains(lBasedEntity.lcode()))
.collect(Collectors.groupingBy(LBasedEntity::lcode));
map(line -> parse(line)) parses the line into an LBasedEntity object (or whatever you call it);
filter(Objects::nonNull) filters out all null values produced by the parse method;
The next filter selects all entities whose lcode property is contained in L100_ENTITY_NAMES (I would turn this collection into a Set, to speed up the lookups);
Then a Map is built with key-value pairs of lcode → List<LBasedEntity>.
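Put together, the parse-then-group approach might look like the sketch below. It assumes Java 16 or later for records, and it uses a hypothetical three-field record named ParsedLine as a stand-in, since the remaining columns are not spelled out above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

class RecordGroupingSketch {

    // Hypothetical three-field model; a real record would carry every column.
    record ParsedLine(long id, int zero, String lcode) { }

    // Returns null for lines that do not have at least three fields.
    static ParsedLine parse(String line) {
        String[] parts = line.split(";");
        if (parts.length < 3) {
            return null;
        }
        return new ParsedLine(Long.parseLong(parts[0]), Integer.parseInt(parts[1]), parts[2]);
    }

    static Map<String, List<ParsedLine>> group(String[] lines, Set<String> wanted) {
        return Arrays.stream(lines)
                .map(RecordGroupingSketch::parse)
                .filter(Objects::nonNull)
                .filter(parsed -> wanted.contains(parsed.lcode()))
                .collect(Collectors.groupingBy(ParsedLine::lcode));
    }

    public static void main(String[] args) {
        String[] lines = {"452857;0;L100;csO", "452857;0;L121;csO"};
        System.out.println(group(lines, Set.of("L100", "L121")));
    }
}
```

Note that this sketch lets a NumberFormatException propagate when the first two columns are not numeric; whether to skip or reject such lines is a design decision left open here.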
You're effectively asking for what languages like Scala provide on collections: groupBy. In Scala you could write:
splitLines.groupBy(_(2)) // Map[String, List[String]]
Of course, you want this in Java, and in my opinion a plain loop makes sense here, because Java has no equally concise groupBy on its collections (the closest equivalent is Collectors.groupingBy on a stream).
HashMap<String, ArrayList<String>> map = new HashMap<>();
for (String[] line : splitLines) {
    if (line.length < 3) continue; // line[2] is read below, so three fields are required
    ArrayList<String> xs = map.getOrDefault(line[2], new ArrayList<>());
    xs.addAll(Arrays.asList(line)); // appends every field of this line to the group's list
    map.put(line[2], xs);
}
As you can see, it's very easy to understand, and actually shorter than the stream based solution.
I'm leveraging two key methods on a HashMap.
The first is getOrDefault; basically, if no value is associated with our key, we can provide a default. In our case, an empty ArrayList.
The second is put, which actually acts like a putOrReplace because it lets us override the previous value associated with the key.
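A slightly more compact variant of the same loop replaces the getOrDefault/put pair with Map.computeIfAbsent. This is a sketch under two assumptions that go beyond the answer above: each line has already been split into a String[], and whole lines (rather than their individual fields) are collected per key. The class name LoopGroupingSketch is made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LoopGroupingSketch {

    static Map<String, List<String[]>> groupByThirdField(List<String[]> splitLines) {
        Map<String, List<String[]>> map = new HashMap<>();
        for (String[] line : splitLines) {
            if (line.length < 3) {
                continue; // index 2 does not exist on this line
            }
            // Creates the list on first use of a key, then appends the line to it.
            map.computeIfAbsent(line[2], key -> new ArrayList<>()).add(line);
        }
        return map;
    }

    public static void main(String[] args) {
        List<String[]> splitLines = List.of(
                "452857;0;L100;csO".split(";"),
                "452857;0;L120;csO".split(";"));
        System.out.println(groupByThirdField(splitLines).keySet());
    }
}
```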
I hope that was helpful. :)
You're asking for a shorter way to achieve the same thing, but your code is actually fine. I guess the only part that makes it look lengthy is the if/else check in the stream:
if (values.length > 0) {
return entity.equals(values[2]);
} else {
return false;
}
I would suggest introducing two tiny private methods to improve the readability, like this:
List<List<String>> list = L100_ENTITY_NAMES.stream()
    .map(entity -> getLinesByEntity(splittedLine, entity))
    .collect(Collectors.toList());
private List<String> getLinesByEntity(String[] splittedLine, String entity) {
    return Arrays.stream(splittedLine)
        .filter(line -> isLineMatched(entity, line))
        .collect(Collectors.toList());
}
private boolean isLineMatched(String entity, String line) {
    String[] values = line.split(String.valueOf(DELIMITER));
    // values[2] is read, so the line needs at least three fields
    return values.length > 2 && entity.equals(values[2]);
}

read the value from nested foreach loop

I have two String arrays. I have to enter the value from the second array, while the corresponding element of the first array is used to find the WebElement.
Here is the sample code:
public void isAllTheFieldsDisplayed(String values, String fields) {
String[] questions = fields.split(",");
String[] answers = values.split(",");
for(String q : questions) {
// HERE IS THE PROBLEM - I want only the first answer from the String[] answers. similarly for the second question, i want the second element from the String[] answers.
// THIS WONT WORK - for(string ans : answers)
find(By.cssSelector("input[id='"+q+"']")).sendKeys(ans);
}
}
You probably need to check that the two arrays contain the same number of elements.
Use a simple indexed for loop and take the element at the same index from each array:
for (int i = 0; i < questions.length; i++) {
    driver.findElement(By.id(questions[i])).sendKeys(answers[i]);
}
I assume the find method is some sort of wrapper for Selenium's findElement.
As the element is being located by its id, I suggest using By.id.
Ideally, check whether a WebElement is found before calling sendKeys.
Here's a slightly different approach. Which could be overkill depending on your environment.
Because of the coupled relationship of your questions and answers, we want to make sure they get paired correctly, and once they're paired there's no reason to distribute them separately anymore.
This could be a re-usable utility function like so:
public Map<String, String> csvsToMap(String keyCsv, String valueCsv) {
String[] questions = keyCsv.split(",");
String[] answers = valueCsv.split(",");
// This check could also be relaxed to "questions.length > answers.length" so that
// extra answers are ignored rather than causing a failure.
if (questions.length != answers.length) { // fail fast and explicit
throw new RuntimeException("Not the same number of questions and answers");
}
Map<String, String> map = new HashMap<>();
for (int i = 0; i < questions.length; i++) {
map.put(questions[i], answers[i]);
}
return map;
}
After the data has been sanitized and prepared for ingesting, handling it becomes a bit easier:
// fields holds the questions (keys), values holds the answers
Map<String, String> preparedQuestions = csvsToMap(fields, values);
for (String aQuestion : preparedQuestions.keySet()) {
    String selector = "input[id='" + aQuestion + "']";
    String answer = preparedQuestions.get(aQuestion);
    // selector is a CSS selector, so it must be located with By.cssSelector
    driver.findElement(By.cssSelector(selector)).sendKeys(answer);
}
Or, if you are on Java 8, streams could be used:
csvsToMap(fields, values).entrySet().stream()
    .forEach(pair -> {
        String selector = "input[id='" + pair.getKey() + "']";
        driver.findElement(By.cssSelector(selector)).sendKeys(pair.getValue());
    });
Preparing your data in a function like this ahead of time lets you avoid using indexes altogether elsewhere. If this is a pattern you repeat, a helper function like this becomes a single point of failure, which lets you test it, gain confidence in it, and trust that there aren't other near-identical snippets elsewhere that might have subtle differences or bugs.
Note how this helper function doesn't have any side effects: as long as the same inputs are provided, the same output should always result. This makes it easier to test than it would be with webdriver operations baked into the task, as webdriver has built-in side effects (talking to the browser) which can fail at any time through no fault of your code.
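To illustrate why a side-effect-free helper is easy to check, here is a minimal standalone sketch of the same pairing idea with no webdriver involved. The class and method names (CsvPairingSketch, pair) are hypothetical, and two details differ from the helper above by assumption: a LinkedHashMap is used so the questions keep their original order, and entries are trimmed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CsvPairingSketch {

    // Pairs the i-th key with the i-th value; fails fast if the counts differ.
    static Map<String, String> pair(String keyCsv, String valueCsv) {
        String[] keys = keyCsv.split(",");
        String[] values = valueCsv.split(",");
        if (keys.length != values.length) {
            throw new IllegalArgumentException(
                    "Expected " + keys.length + " values but got " + values.length);
        }
        // LinkedHashMap keeps insertion order, unlike HashMap.
        Map<String, String> pairs = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            pairs.put(keys[i].trim(), values[i].trim());
        }
        return pairs;
    }

    public static void main(String[] args) {
        Map<String, String> pairs = pair("firstName,lastName", "Ada,Lovelace");
        pairs.forEach((question, answer) -> System.out.println(question + " -> " + answer));
    }
}
```

Because the function only maps strings to strings, it can be exercised with ordinary assertions before any browser is started.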
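An Iterator may resolve this, but I haven't tried it.
Iterator<String> itr = Arrays.asList(questions).iterator();
Iterator<String> itrans = Arrays.asList(answers).iterator();
while (itr.hasNext() && itrans.hasNext()) {
    find(By.cssSelector("input[id='" + itr.next() + "']")).sendKeys(itrans.next());
}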

Query in DynamoDB using Java API

I need to implement a Query operation on DynamoDB. Right now I'm doing it by giving the HashKey and then filtering out the results according to my conditions on non-key attributes.
This is what I'm doing :
MusicData hashKey = new MusicData();
hashKey.setID(singer);
DynamoDBQueryExpression<MusicData> queryExpression = new DynamoDBQueryExpression<MusicData>().withHashKeyValues(hashKey);
List<MusicData> queryResult = mapper.query(MusicData.class, queryExpression);
for (MusicData musicData : queryResult) {
if( my condtions ) {
do something;
}
}
What I'm trying to do is to be able to do something like this :
MusicData hashKey = new MusicData();
hashKey.setID(singer);
hashKey.setAlbum(sampleAlbum);
hashKey.setSinger(duration);
DynamoDBQueryExpression<MusicData> queryExpression = new DynamoDBQueryExpression<MusicData>().withHashKeyValues(hashKey);
List<MusicData> queryResult = mapper.query(MusicData.class, queryExpression);
for (MusicData musicData : queryResult) {
if( my condtions ) {
do something;
}
}
And get results already filtered out. Is there a way to do this in DynamoDB?
Yes, you can ask DynamoDB to perform filtering of queries before it returns results. However, you will still incur the 'cost' of reading these items even though they are not returned to your client. This is still a good practice as it will eliminate unnecessary transfer of items over the network.
To do this you will call additional methods on your DynamoDBQueryExpression object, specifically withFilterExpression and addExpressionAttributeNamesEntry / addExpressionAttributeValuesEntry to complete the expression.
Without a specific example of the conditions you want to apply it is hard to be concrete, but a simple condition needs little more than withFilterExpression. Note that literal values cannot be written inline in the expression; they are passed through a value placeholder:
DynamoDBQueryExpression<MusicData> queryExpression = new DynamoDBQueryExpression<MusicData>()
    .withHashKeyValues(hashKey)
    .withFilterExpression("foo > :minFoo")
    .addExpressionAttributeValuesEntry(":minFoo", new AttributeValue().withN("10"));

Aggregating value from java ArrayList in an elegant way

I have a list of objects which I want to aggregate one of the values of this object grouped by other values of the objects in this list.
I'm currently using the properties I want to group by as a Hash key and I'm traversing the object so:
ArrayList<MyObject> raw = Some Data;
Map<String, MyObject> map = new HashMap<String, MyObject>();
for (MyObject ungrouped : raw) {
String key = ungrouped.getStringOne().getName() + ungrouped.getStringTwo() + ungrouped.getStringThree();
if (map.containsKey(key)){
MyObject holder = map.get(key);
holder.setNumericProp(holder.getNumericProp() + ungrouped.getNumericProp());
// map.put(key, holder); //Edited after comments
}
else{
map.put(key, ungrouped);
}
}
return map.values().toArray(new MyObject[map.values().size()]);
Is there a more elegant way to do this without using the concatenated strings as a key?
If this was SQL (from which I'm several application layers away) it would be:
SELECT SUM(numericvalue) FROM sometable GROUP BY stringone, stringtwo , stringthree
Apart from some problems I see with the code, one solution would be to use (if you can afford it) Guava's Equivalence (or replicate it in your code). You'd implement an Equivalence<MyObject> and use a Map<Equivalence.Wrapper<MyObject>, MyObject> as a container; you'd make the equivalence on your three string members.
That would allow it not to break in this situation:
// Oops! Same key...
s1 = "foo", s2 = "bar", s3 = "baz"
s1 = "fooba", s2 = "rb", s3 = "az"
Also, you could use the return value of the map's .put() method (the old value):
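MyObject holder = map.put(key, ungrouped); // returns the previous value, or null
if (holder != null)
    // ungrouped is now the stored object, so fold the old total into it
    ungrouped.setNumericProp(holder.getNumericProp() + ungrouped.getNumericProp());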
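The key collision shown above can also be avoided without Guava by using a composite key instead of a concatenated string. The sketch below swaps in a different technique from the Equivalence suggestion: an immutable List of the three strings serves as the map key, and Map.merge does the summing. The Row class is a hypothetical stand-in for MyObject.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CompositeKeySumSketch {

    // Hypothetical stand-in for MyObject: three grouping strings and one number.
    static final class Row {
        final String one;
        final String two;
        final String three;
        final double amount;

        Row(String one, String two, String three, double amount) {
            this.one = one;
            this.two = two;
            this.three = three;
            this.amount = amount;
        }
    }

    static Map<List<String>, Double> sumByKey(List<Row> rows) {
        Map<List<String>, Double> sums = new LinkedHashMap<>();
        for (Row row : rows) {
            // List.of compares element by element, so ("foo","bar","baz") and
            // ("fooba","rb","az") stay distinct. It rejects null elements.
            sums.merge(List.of(row.one, row.two, row.three), row.amount, Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("foo", "bar", "baz", 1.0),
                new Row("fooba", "rb", "az", 2.0),
                new Row("foo", "bar", "baz", 3.0));
        System.out.println(sumByKey(rows));
    }
}
```

This returns sums per key rather than mutated MyObject instances, which is closer to the SQL GROUP BY in the question; whether that fits depends on what the caller needs back.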
If you're looking to solve this elegantly, you can use the lambdaj library. For example, you can sum a column with the following code:
double sum = sumFrom(select(sales,
having(on(Sale.class).getBuyer().isMale())
.and( having(on(Sale.class).getSeller().isMale())))).getCost();
You can also group with it:
Group<Person> group = group(meAndMyFriends, by(on(Person.class).getAge()));
With this library you can solve your problem in a few lines.

how to manipulate list in java

Edit: My list is sorted as it is coming from a DB
I have an ArrayList that has objects of class People. People has two properties: ssn and terminationReason. So my list looks like this
ArrayList:
ssn TerminationReason
123456789 Reason1
123456789 Reason2
123456789 Reason3
568956899 Reason2
000000001 Reason3
000000001 Reason2
I want to change this list so that there are no duplicates and termination reasons are separated by commas.
so above list would become
New ArrayList:
ssn TerminationReason
123456789 Reason1, Reason2, Reason3
568956899 Reason2
000000001 Reason3, Reason2
I have something going where I am looping through the original list and matching ssn's but it does not seem to work.
Can someone help?
Code I was using was:
String ssn = "";
Iterator it = results.iterator();
ArrayList newList = new ArrayList();
People ob;
while (it.hasNext())
{
ob = (People) it.next();
if (ssn.equalsIgnoreCase(""))
{
newList.add(ob);
ssn = ob.getSSN();
}
else if (ssn.equalsIgnoreCase(ob.getSSN()))
{
//should I get last object from new list and append this termination reason?
ob.getTerminationReason()
}
}
To me, this seems like a good case to use a Multimap, which would allow storing multiple values for a single key.
Google Collections (now part of Guava) has a Multimap implementation.
This may mean that the Person object's ssn and terminationReason fields may have to be taken out to be a key and value, respectively. (And those fields will be assumed to be String.)
Basically, it can be used as follows:
Multimap<String, String> m = HashMultimap.create();
// In reality, the following would probably be iterating over the
// Person objects returned from the database, and calling the
// getSSN and getTerminationReasons methods.
m.put("0000001", "Reason1");
m.put("0000001", "Reason2");
m.put("0000001", "Reason3");
m.put("0000002", "Reason1");
m.put("0000002", "Reason2");
m.put("0000002", "Reason3");
for (String ssn : m.keySet())
{
// For each SSN, the termination reasons can be retrieved.
Collection<String> termReasonsList = m.get(ssn);
// Do something with the list of reasons.
}
If necessary, a comma-separated list of a Collection can be produced:
StringBuilder sb = new StringBuilder();
for (String reason : termReasonsList)
{
sb.append(reason);
sb.append(", ");
}
sb.delete(sb.length() - 2, sb.length());
String commaSepList = sb.toString();
This could once again be set to the terminationReason field.
An alternative, as Jonik mentioned in the comments, is to use the StringUtils.join method from Apache Commons Lang to create a comma-separated list.
It should also be noted that the Multimap doesn't specify whether an implementation should or should not allow duplicate key/value pairs, so one should look at which type of Multimap to use.
In this example, the HashMultimap is a good choice, as it does not allow duplicate key/value pairs. This would automatically eliminate any duplicate reasons given for one specific person.
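If adding a library is not an option, roughly the same idea can be sketched with the JDK alone. This is a swapped-in technique rather than the Multimap itself: a Map from SSN to a Set of reasons plays the role of the multimap, and String.join produces the comma-separated text. The class name and the String[][] input shape are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class ReasonsBySsnSketch {

    // Each row is {ssn, terminationReason}; the result maps ssn to "r1, r2, ...".
    static Map<String, String> joinReasons(String[][] rows) {
        // LinkedHashMap/LinkedHashSet keep first-seen order and drop duplicate reasons.
        Map<String, Set<String>> reasonsBySsn = new LinkedHashMap<>();
        for (String[] row : rows) {
            reasonsBySsn.computeIfAbsent(row[0], ssn -> new LinkedHashSet<>()).add(row[1]);
        }
        Map<String, String> joined = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> entry : reasonsBySsn.entrySet()) {
            joined.put(entry.getKey(), String.join(", ", entry.getValue()));
        }
        return joined;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"123456789", "Reason1"},
            {"123456789", "Reason2"},
            {"568956899", "Reason2"}
        };
        joinReasons(rows).forEach((ssn, reasons) -> System.out.println(ssn + " " + reasons));
    }
}
```

Unlike HashMultimap, the insertion-ordered collections used here keep the reasons in the order they arrived from the database.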
What you might need is a hash-based collection; a HashMap may be usable.
Override equals() and hashCode() inside your People class.
Make hashCode() return a value derived from the person's SSN. This way all People objects with the same SSN end up in the same "bucket".
Keep in mind that Map implementations hold key/value pairs, so you will have something like myHashMap.put(ssn, peopleObject);
List<People> newlst = new ArrayList<People>();
People last = null;
for (People p : listFromDB) {
if (last == null || !last.ssn.equals(p.ssn)) {
last = new People();
last.ssn = p.ssn;
last.terminationReason = "";
newlst.add(last);
}
if (last.terminationReason.length() > 0) {
last.terminationReason += ", ";
}
last.terminationReason += p.terminationReason;
}
And you get the aggregated list in newlst.
Update: If you are using MySQL, you can use the GROUP_CONCAT function to extract data in your required format. I don't know whether other DB engines have similar function or not.
Update 2: Removed the unnecessary sorting.
Two possible problems:
This won't work if your list isn't sorted
You aren't doing anything with ob.getTerminationReason(). I think you mean to add it to the previous object.
EDIT: Now that I see you've edited your question.
As your list is sorted, (by ssn I presume)
String currentSSN = null;
List<People> peopleList = getSortedList(); // gets the sorted list from the DB
// Uses the for-each construct instead of iterators
for (People person : peopleList) {
    if (!person.getSSN().equals(currentSSN)) {
        // Person has changed: start a new row and write the row header.
        currentSSN = person.getSSN();
        System.out.println();
        System.out.print(person.getSSN() + " ");
    }
    // Write the termination reason (including the first one of each row).
    System.out.print(person.getTerminationReason() + " ");
}
If you don't want to display the contents of your list, you could use it to create a Map and then use that as shown below.
If your list is not sorted
Maybe you should try a different approach, using a Map. Here, ssn would be the key of the map, and values could be a list of People
Map<String, List<People>> mymap = getMap(); // loads a Map from input data
for (String ssn : mymap.keySet()) {
    dorow(ssn, mymap.get(ssn));
}
public void dorow(String ssn, List<People> reasons) {
    System.out.print(ssn + " ");
    for (People people : reasons) {
        System.out.print(people.getTerminationReason() + " ");
    }
    System.out.println("-----"); // row separator
}
Last but not least, you should override the hashCode() and equals() methods on the People class.
For example:
@Override
public int hashCode() {
    return 3 * this.terminationReason.hashCode();
}
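A minimal sketch of a consistent equals()/hashCode() pair follows. It rests on an assumption not stated in the question, namely that two entries count as equal when both ssn and terminationReason match. The nested Person class is a hypothetical stand-in for People.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class PeopleEqualitySketch {

    // Hypothetical stand-in for the People class from the question.
    static final class Person {
        final String ssn;
        final String terminationReason;

        Person(String ssn, String terminationReason) {
            this.ssn = ssn;
            this.terminationReason = terminationReason;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            if (!(other instanceof Person)) {
                return false;
            }
            Person that = (Person) other;
            return Objects.equals(ssn, that.ssn)
                    && Objects.equals(terminationReason, that.terminationReason);
        }

        @Override
        public int hashCode() {
            // Must use the same fields as equals so equal objects share a hash code.
            return Objects.hash(ssn, terminationReason);
        }
    }

    public static void main(String[] args) {
        Set<Person> unique = new HashSet<>();
        unique.add(new Person("123456789", "Reason1"));
        unique.add(new Person("123456789", "Reason1")); // duplicate, ignored by the set
        unique.add(new Person("123456789", "Reason2"));
        System.out.println(unique.size());
    }
}
```

If only the SSN should identify a person, both methods would compare and hash ssn alone; the important part is that they always use the same fields.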
