I need to implement a Query operation on DynamoDB. Right now I'm doing it by giving the HashKey and then filtering out the results according to my conditions on non-key attributes.
This is what I'm doing:
MusicData hashKey = new MusicData();
hashKey.setID(singer);
DynamoDBQueryExpression<MusicData> queryExpression = new DynamoDBQueryExpression<MusicData>().withHashKeyValues(hashKey);
List<MusicData> queryResult = mapper.query(MusicData.class, queryExpression);
for (MusicData musicData : queryResult) {
    if (/* my conditions */) {
        // do something
    }
}
What I'm trying to do is something like this:
MusicData hashKey = new MusicData();
hashKey.setID(singer);
hashKey.setAlbum(sampleAlbum);
hashKey.setSinger(duration);
DynamoDBQueryExpression<MusicData> queryExpression = new DynamoDBQueryExpression<MusicData>().withHashKeyValues(hashKey);
List<MusicData> queryResult = mapper.query(MusicData.class, queryExpression);
for (MusicData musicData : queryResult) {
    if (/* my conditions */) {
        // do something
    }
}
And get results already filtered out. Is there a way to do this in DynamoDB?
Yes, you can ask DynamoDB to filter query results before it returns them. The filter is applied after the items matching the key condition have been read, so you still incur the read 'cost' of items that are filtered out and never reach your client. It is still good practice, because it eliminates unnecessary transfer of items over the network.
To do this you will call additional methods on your DynamoDBQueryExpression object, specifically withFilterExpression and addExpressionAttributeNamesEntry / addExpressionAttributeValuesEntry to complete the expression.
Without a concrete example of the conditions you want to apply it is hard to be specific, but a simple condition needs little more than the withFilterExpression call, with its value bound through addExpressionAttributeValuesEntry:
DynamoDBQueryExpression<MusicData> queryExpression = new DynamoDBQueryExpression<MusicData>()
    .withHashKeyValues(hashKey)
    // Values cannot be written as literals inside the expression; bind them to a placeholder.
    .withFilterExpression("foo > :minFoo")
    .addExpressionAttributeValuesEntry(":minFoo", new AttributeValue().withN("10"));
I'm developing an application in Quarkus that integrates with the DynamoDB database. I have a query method that returns a list and I'd like this list to be paginated, but it would have to be done manually by passing the parameters.
I chose to use DynamoDBMapper because it gives more possibilities to work with lists of objects and the level of complexity is lower.
Does anyone have any idea how to do this pagination manually in the function?
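// paginationToken is a Map<String, AttributeValue>, or null for the first page.
DynamoDBScanExpression scanExpression = new DynamoDBScanExpression()
    .withLimit(pageSize)
    .withExclusiveStartKey(paginationToken);
// scanPage returns a single page; the lazily loaded list from scan() does not expose the last evaluated key.
ScanResultPage<YourModel> result = mapper.scanPage(YourModel.class, scanExpression);
Map<String, AttributeValue> nextPaginationToken = result.getLastEvaluatedKey();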
You can pass the pageSize and paginationToken as parameters to your query method. The nextPaginationToken can be returned along with the results, to be used for the next page.
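The token-passing contract described above can be sketched without any AWS dependency. The following is only an in-memory stand-in, not DynamoDB code: the class ManualPagination, the Page type and the fetchPage method are hypothetical names, and the token here is a plain list index, whereas with DynamoDBMapper it would be the Map<String, AttributeValue> returned by getLastEvaluatedKey().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ManualPagination {

    // One page of results plus the token needed to request the next page (null when done).
    public static class Page<T> {
        public final List<T> items;
        public final Integer nextToken;

        public Page(List<T> items, Integer nextToken) {
            this.items = items;
            this.nextToken = nextToken;
        }
    }

    // Hypothetical stand-in for a paged read: the token is the index of the first unread item.
    public static <T> Page<T> fetchPage(List<T> source, int pageSize, Integer startToken) {
        int from = (startToken == null) ? 0 : startToken;
        int to = Math.min(from + pageSize, source.size());
        List<T> items = new ArrayList<>(source.subList(from, to));
        Integer next = (to < source.size()) ? Integer.valueOf(to) : null;
        return new Page<>(items, next);
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "c", "d", "e");
        Integer token = null; // null means "start from the beginning"
        do {
            Page<String> page = fetchPage(data, 2, token);
            System.out.println(page.items + " next=" + page.nextToken);
            token = page.nextToken; // the caller hands this back to get the following page
        } while (token != null);
    }
}
```

The caller never needs to know how the token is built; it only stores it and sends it back with the next request.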
DynamoDBMapper paginates as you iterate over the results, lazily loading the dataset:
By default, the scan method returns a "lazy-loaded" collection. It initially returns only one page of results, and then makes a service call for the next page if needed. To obtain all the matching items, iterate over the result collection.
For example:
List<Customer> result = mapper.scan(Customer.class, scanExpression);
for (Customer cust : result) {
    System.out.println(cust.getId());
}
To scan manually, page by page, you can use scanPage:
final DynamoDBScanExpression scanPageExpression = new DynamoDBScanExpression()
    .withLimit(limit);
do {
    ScanResultPage<MyClass> scanPage = mapper.scanPage(MyClass.class, scanPageExpression);
    scanPage.getResults().forEach(System.out::println);
    System.out.println("LastEvaluatedKey=" + scanPage.getLastEvaluatedKey());
    // Feed the last evaluated key back in; it is null once the final page has been read.
    scanPageExpression.setExclusiveStartKey(scanPage.getLastEvaluatedKey());
} while (scanPageExpression.getExclusiveStartKey() != null);
I need to create a MongoTemplate database query that returns a specific number of elements as a list.
At the moment I just get all the elements with findAll() and then modify the obtained data using code that I have written in the service class.
Initially, I have a Laptop class with the fields price::BigDecimal and name::String, and I use findAll() to get a list of them.
Then I put those in a HashMap, where the key is the name field and each value is the list of laptops with that name, sorted from most expensive to cheapest.
Map<String, List<Laptop>> laptopsMap = laptopsFrom.stream()
    .collect(Collectors.groupingBy(Laptop::getName,
        Collectors.collectingAndThen(Collectors.toList(),
            l -> l.stream()
                .sorted(Comparator.comparing(Laptop::getPrice).reversed())
                .collect(Collectors.toList())
        ))
    );
So the results are like below:
[{"MSI", [2200, 1100, 900]},
{"HP", [3200, 900, 800]},
{"Dell", [2500, 2000, 700]}]
Then I use the code at the bottom of the question to create a Laptop list with the following contents:
[{"HP", 3200}, {"Dell", 2500}, {"MSI", 2200},
{"Dell", 2000}, {"MSI", 1100}, {"HP", 900},
{"MSI", 900}, {"HP", 800}, {"Dell", 700}]
So basically, I iterate over the map and, from each key, I extract the next-in-line element of its list.
do {
    // Use an explicit iterator so that exhausted entries can be removed while iterating;
    // removing through laptopsMap.entrySet() inside a for-each loop risks a ConcurrentModificationException.
    Iterator<Map.Entry<String, List<Laptop>>> it = laptopsMap.entrySet().iterator();
    while (it.hasNext()) {
        List<Laptop> value = it.next().getValue();
        finalResultsList.add(value.remove(0));
        if (value.isEmpty()) {
            it.remove();
        }
    }
} while (!laptopsMap.isEmpty());
Instead of all this in-class code, I need to use a mongoTemplate database query, but I can't seem to figure out how to create such a complex query. I have read material about Aggregation but have not found anything helpful enough. At the moment, I have started putting a query together as shown below:
Query query = new Query();
query.limit(numOfLaptops);
query.addCriteria(Criteria.where(Laptop.PRICE).gte(minPrice));
I am trying to call BatchGetItem to retrieve items from DynamoDB. As input we can get a list of up to 1000 keys (or as few as 1 key). These keys correspond to the hashKey of our DynamoDB table.
Since the BatchGetItem API only takes up to 100 items per call, I am trying to split the request into batches of at most 100 items each, make the calls in parallel, and then merge the results into a single Set again.
For those unfamiliar with DynamoDB who could still give advice on an extremely stripped-down version (1st example), I'd appreciate it! Otherwise, please see the second, more accurate example below.
1st Example - extremely stripped down
public Set<SomeResultType> retrieveSomething(Set<String> someSet) {
    ImmutableSet.Builder<SomeResultType> resultBuilder = ImmutableSet.builder();
    // FIXME - how to parallelize?
    // Partitioning a Set<String> yields batches of type List<String>.
    for (List<String> batch : Iterables.partition(someSet, 100)) {
        result = callSomeLongRunningAPI(batch);
        resultBuilder.addAll(result.getItems());
    }
    return resultBuilder.build();
}
2nd Example - closer to my actual problem -
Below is a stripped down, dummy version of what I'm currently doing (as such, please forgive formatting / style issues). It currently works and gets all the items, but I can't figure out how to get the batches (see FIXME) to get executed in parallel and end up in a single set. Since performance is pretty important in the system I'm trying to build, any tips would be appreciated in helping this code be more efficient!
public Set<SomeResultType> retrieveSomething(Set<String> someIds) {
    if (someIds.isEmpty()) {
        // handle this here
    }
    Collection<Map<String, AttributeValue>> keyAttributes = someIds.stream()
        .map(id -> ImmutableMap.<String, AttributeValue>builder()
            .put(tableName, new AttributeValue().withS(id)).build())
        .collect(ImmutableList.toImmutableList());
    ImmutableSet.Builder<SomeResultType> resultBuilder = ImmutableSet.builder();
    Map<String, KeysAndAttributes> itemsToProcess;
    BatchGetItemResult result;
    // FIXME - make parallel?
    for (List<Map<String, AttributeValue>> batch : Iterables.partition(keyAttributes, 100)) {
        KeysAndAttributes keysAndAttributes = new KeysAndAttributes()
            .withKeys(batch)
            .withAttributesToGet(/* some attribute names */);
        itemsToProcess = ImmutableMap.of(tableName, keysAndAttributes);
        result = this.dynamoDB.batchGetItem(itemsToProcess);
        resultBuilder.addAll(extractItemsFromResults(tableName, result));
    }
    return resultBuilder.build();
}
Help with either the super stripped down case or the 2nd example would be greatly appreciated! Thanks!
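One possible way to run the batches in parallel, sketched here for the stripped-down case only, is to submit each batch to an executor through CompletableFuture and merge the results once every future has completed. This is a sketch under stated assumptions rather than a drop-in answer: ParallelBatches, partition and fetchAll are hypothetical names, the fetchBatch function stands in for callSomeLongRunningAPI (or a wrapper around batchGetItem), and plain JDK collections are used instead of Guava so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

public class ParallelBatches {

    // Splits the input into consecutive chunks of at most batchSize elements.
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(new ArrayList<>(items.subList(i, Math.min(i + batchSize, items.size()))));
        }
        return batches;
    }

    // Submits one task per batch to the pool, then merges all batch results into a single set.
    public static <K, R> Set<R> fetchAll(List<K> keys, int batchSize,
                                         Function<List<K>, List<R>> fetchBatch,
                                         ExecutorService pool) {
        List<CompletableFuture<List<R>>> futures = new ArrayList<>();
        for (List<K> batch : partition(keys, batchSize)) {
            futures.add(CompletableFuture.supplyAsync(() -> fetchBatch.apply(batch), pool));
        }
        Set<R> merged = new HashSet<>();
        for (CompletableFuture<List<R>> future : futures) {
            // join() blocks until the batch is done and rethrows a failure as CompletionException.
            merged.addAll(future.join());
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            ids.add("id-" + i);
        }
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Hypothetical stand-in for the real batch call: it simply echoes the keys back.
            Set<String> result = fetchAll(ids, 100, batch -> batch, pool);
            System.out.println(result.size()); // prints 250
        } finally {
            pool.shutdown();
        }
    }
}
```

Merging happens on the calling thread, so the result set needs no synchronization. For the real BatchGetItem call, the function passed as fetchBatch would also have to deal with any unprocessed keys the service returns, which this sketch does not attempt.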
I have a typed Dataset of a custom class and call the groupByKey method on it, which returns a KeyValueGroupedDataset. I want to filter this new dataset, but there is no filter method for this type of dataset. So my question is: how can I filter a KeyValueGroupedDataset? (A Java solution is needed. Spark version: 2.3.1.)
Sample data:
"id":1,"fname":"Gale","lname":"Willmett","email":"gwillmett0@nhs.uk","gender":"Female"
"id":2,"fname":"Chantalle","lname":"Wilcher","email":"cwilcher1@blinklist.com","gender":"Female"
"id":3,"fname":"Polly","lname":"Grandisson","email":"pgrandisson2@linkedin.com","gender":"Female"
"id":3,"fname":"Moshe","lname":"Pink","email":"mpink3@twitter.com","gender":"Male"
"id":2,"fname":"Yorke","lname":"Ginnelly","email":"yginnelly4@apple.com","gender":"Male"
And what I did:
Dataset<Person> peopleDS = spark.read().format("parquet").load("/path").as(Encoders.bean(Person.class));
KeyValueGroupedDataset<String, Person> KVDS = peopleDS.groupByKey((MapFunction<Person, String>) f -> f.getGender(), Encoders.STRING());
// How can I filter on KVDS's id field?
Update1 (use of flatMapGroups):
Dataset<Person> persons = KVDS.flatMapGroups((FlatMapGroupsFunction<String, Person, Person>) (f, k) -> (Iterator<Person>) k, Encoders.bean(Person.class));
Update2 (use of mapGroups):
Dataset<Person> peopleMap = KVDS.mapGroups((MapGroupsFunction<String, Person, Person>) (f, g) -> {
    while (g.hasNext()) {
        // What can I do here?
    }
}, Encoders.bean(Person.class));
Update3: I want to keep only those groups whose number of distinct ids is greater than 1. For example, in the picture below I want just the Female group, because its number of distinct ids is greater than 1 (the first field is id; the others are fname, lname, email and gender).
Update4: I did what I want with an RDD, but I want to do exactly this part of the code with a Dataset:
List<Tuple2<String, Iterable<Person>>> f = PersonRDD
    .mapToPair(s -> new Tuple2<>(s.getGender(), s)).groupByKey()
    .filter(t -> ((Collection<Person>) t._2()).stream().mapToInt(e -> e.getId()).distinct().count() > 1)
    .collect();
Why don't you filter on id before grouping? groupByKey is an expensive operation, so it should be faster to filter first.
If you really want to group first, you may then have to use .flatMapGroups to get the rows back out of each group.
I'm not sure about the Java code, but the Scala version would be something like this:
peopleDS
  .groupByKey(_.gender)
  // yourCondition is a placeholder for a Person => Boolean predicate
  .flatMapGroups { case (gender, persons) => persons.filter(yourCondition) }
But again, you should filter first :). Especially since your id field is already available before grouping.
Grouping is meant for aggregation functions; you can find functions such as "agg" in the "KeyValueGroupedDataset" class. If you apply an aggregation function, for example "count", you get a "Dataset" back, and the "filter" function becomes available.
A "groupBy" without an aggregation function looks strange; another function, for example "distinct", may be what you need.
Filtering example with "FlatMapGroupsFunction":
.flatMapGroups(
    (FlatMapGroupsFunction<String, Person, Person>) (f, k) -> {
        List<Person> result = new ArrayList<>();
        while (k.hasNext()) {
            Person value = k.next();
            // filter condition here
            if (value != null) {
                result.add(value);
            }
        }
        return result.iterator();
    },
    Encoders.bean(Person.class))
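For the condition from Update3 (keep a group only when it contains more than one distinct id), the per-group logic could look roughly like the sketch below. It is plain Java with no Spark dependency so that it can be run on its own; DistinctIdGroupFilter, keepIfManyIds and the trimmed-down Person class are hypothetical stand-ins, and the sample rows in main are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class DistinctIdGroupFilter {

    // Minimal stand-in for the Person bean from the question (only the fields used here).
    public static class Person {
        private final int id;
        private final String gender;

        public Person(int id, String gender) {
            this.id = id;
            this.gender = gender;
        }

        public int getId() { return id; }
        public String getGender() { return gender; }
    }

    // Returns every member of the group if it holds more than `threshold` distinct ids,
    // otherwise an empty iterator, which drops the whole group.
    public static Iterator<Person> keepIfManyIds(Iterator<Person> group, int threshold) {
        List<Person> members = new ArrayList<>();
        Set<Integer> ids = new HashSet<>();
        while (group.hasNext()) {
            Person p = group.next();
            members.add(p);
            ids.add(p.getId());
        }
        if (ids.size() > threshold) {
            return members.iterator();
        }
        return Collections.<Person>emptyIterator();
    }

    public static void main(String[] args) {
        // Made-up groups: the first has two distinct ids, the second only one.
        List<Person> mixedIds = Arrays.asList(new Person(1, "Female"), new Person(2, "Female"));
        List<Person> sameId = Arrays.asList(new Person(2, "Male"), new Person(2, "Male"));
        System.out.println(keepIfManyIds(mixedIds.iterator(), 1).hasNext());
        System.out.println(keepIfManyIds(sameId.iterator(), 1).hasNext());
    }
}
```

With the real Person bean, a method like this could presumably be passed to flatMapGroups as (FlatMapGroupsFunction<String, Person, Person>) (gender, people) -> keepIfManyIds(people, 1), together with Encoders.bean(Person.class); that wiring is not tested here. Note that the method buffers each group in memory, which may matter for very large groups.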
I want to execute a query in Java where path and _id are two fields of the Mongo document.
I want to get the list of documents in which these two fields are equal.
I have tried the following query, but it does not retrieve the results properly: I receive an empty list even though matching documents exist.
List<Metadata> MetadataList= ops.find(new Query(Criteria.where("path").is("_id")), Metadata.class);
How can I get the results where two field values are equal in Mongo?
What you are looking for is the $where operator in MongoDB. Standard query operations do not compare the values of one field against another. In order to do this, you need to employ the JavaScript evaluation server side which can actually compare the two field values:
BasicQuery query = new BasicQuery(
    new BasicDBObject("$where", "return this._id == this.path")
);
List<Metadata> metadataList = ops.find(query, Metadata.class);
Or you can do the same thing with native operators through the $redact pipeline stage available to the aggregation framework.
Pretty sure there is no $redact support in spring mongo as yet, but you can wrap the aggregation operation with a class to do so:
public class CustomAggregationOperation implements AggregationOperation {
    private DBObject operation;

    public CustomAggregationOperation(DBObject operation) {
        this.operation = operation;
    }

    @Override
    public DBObject toDBObject(AggregationOperationContext context) {
        return context.getMappedObject(operation);
    }
}
And use it like this:
Aggregation aggregation = newAggregation(
    new CustomAggregationOperation(
        new BasicDBObject(
            "$redact",
            new BasicDBObject("$cond",
                new BasicDBObject()
                    .append("if", new BasicDBObject(
                        "$eq", Arrays.asList("$_id", "$path")
                    ))
                    .append("then", "$$KEEP")
                    .append("else", "$$PRUNE")
            )
        )
    )
);
// Pass the input type explicitly instead of casting to TypedAggregation;
// newAggregation(AggregationOperation...) returns a plain Aggregation.
AggregationResults<Metadata> results = ops.aggregate(
    aggregation, Metadata.class, Metadata.class);
So basic MongoDB query operations do not compare field values against each other. To do this you need to follow one of the methods here.
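You can use BasicDBObject to add a condition.
Try something like this:
BasicDBObject query = new BasicDBObject("path", new BasicDBObject("$eq", "_id"));
collection.find(query);
Note that, as written, this compares path with the literal string "_id", not with the value of the _id field.
Please refer to the link below for more information: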
http://mongodb.github.io/mongo-java-driver/2.13/getting-started/quick-tour/