DynamoDB - DynamoDBMapper query and get entire result set - java

I'm querying a DynamoDB table using the hash key. Each record in the table is uniquely identified by a hash key and a range key:
DynamoDBMapper mapper;
....
MyClass myClass = new MyClass();
myClass.setHashKey(hashKey);
DynamoDBQueryExpression<MyClass> queryExpression = new DynamoDBQueryExpression<MyClass>()
        .withHashKeyValues(myClass);
PaginatedQueryList<MyClass> entries = mapper.query(MyClass.class, queryExpression);
//Work with the elements of entries
When the result set is larger than 1 MB, how can I retrieve the rest?
I cannot find any method to get the LastEvaluatedKey as mentioned in the docs.

The AWS SDK's DynamoDBMapper handles pagination for you. It queries the database internally, and when you need more than the first 1 MB of data it queries again and fetches the rest for you.
If you want the complete list in one go, you can use operations such as size() or copy the paginated result into a list, both of which force the mapper to fetch the complete result.
In short, you need not worry about LastEvaluatedKey and its handling; it is handled for you.
Examples:
PaginatedQueryList<T> resultPaginatedList = dynamoDBMapper.query(getModelClass(), queryExpression);
List<T> queryList = new LinkedList<>(resultPaginatedList); //--- Line1
logger.info("Total elements found: " + queryList.size()); //--- Line2
Both Line1 and Line2 are operations for which the mapper will fetch the complete result (by querying multiple times under the hood, handled by the SDK) and not just the first 1 MB.
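If you want to make that behaviour explicit, the mapper's pagination loading strategy can be configured. A minimal sketch, assuming the AWS SDK v1 mapper classes already used in the question:
// Sketch: force the mapper to load every page up front instead of lazily.
DynamoDBMapperConfig config = DynamoDBMapperConfig.builder()
        .withPaginationLoadingStrategy(
                DynamoDBMapperConfig.PaginationLoadingStrategy.EAGER_LOADING)
        .build();

PaginatedQueryList<MyClass> entries = mapper.query(MyClass.class, queryExpression, config);
// With the default LAZY_LOADING strategy, pages are instead fetched
// transparently as the list is traversed, so iterating past 1 MB works either way.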

Related

How to get Datastore entity id from com.google.datastore.v1.Entity

I have written code to fetch data from Google Datastore in my Google Cloud Dataflow program. I am able to fetch all fields of the entity except the Id field, which is auto-generated. I have tried to use entity.getKey(), but I am getting null.
Below is my code snippet:
Datastore datastore = DataflowDatastoreService.getDatastoreObject(null, null, null);
Query.Builder queryBuilder = Query.newBuilder();
Filter filter1 = Filter.newBuilder()
        .setPropertyFilter(PropertyFilter.newBuilder()
                .setProperty(PropertyReference.newBuilder().setName("cId"))
                .setOp(PropertyFilter.Operator.EQUAL)
                .setValue(Value.newBuilder().setIntegerValue(1059438885900008L).build())
                .build())
        .build();
Filter filter2 = Filter.newBuilder()
        .setPropertyFilter(PropertyFilter.newBuilder()
                .setProperty(PropertyReference.newBuilder().setName("active"))
                .setOp(PropertyFilter.Operator.EQUAL)
                .setValue(Value.newBuilder().setBooleanValue(Boolean.TRUE).build())
                .build())
        .build();
Filter composeFilter = Filter.newBuilder()
        .setCompositeFilter(CompositeFilter.newBuilder()
                .addFilters(filter1)
                .setOp(Operator.AND)
                .addFilters(filter2)
                .build())
        .build();
queryBuilder.addKind(KindExpression.newBuilder().setName("MyMaster").build());
queryBuilder.setFilter(composeFilter).build();
RunQueryRequest request = DataflowDatastoreService.makeRequest(queryBuilder.build(), null);
RunQueryResponse response = datastore.runQuery(request);
QueryResultBatch batch = response.getBatch();
List<EntityResult> entityResutls = batch.getEntityResultsList();
// collect the entities from the results before reading their properties
List<Entity> myEntities = new ArrayList<>();
for (EntityResult entityResult : entityResutls) {
    myEntities.add(entityResult.getEntity());
}
Map<String, Value> entityMap = myEntities.get(0).getPropertiesMap();
In my code I am able to get all the fields in entityMap, but I am not getting the key. Is there any other way through which I can fetch all the fields together with the Id?
Note: I'm not a Java user; this answer is based on Python experience.
Indeed, entities returned in a regular query result do not contain the entity key/ID. Attempting to obtain that from the entity is rather inefficient - you need to reach out to the datastore for each individual entity (not even looking at why that doesn't appear to be working for you).
If I need the entity keys/IDs I'd instead use keys-only queries (see the sketch after this list) - obtaining the keys, from which I can easily get:
the key IDs, locally, without making actual datastore calls (in python via key.id(), I don't know the java equivalent)
the entities via direct key lookup, which can be batched for efficiency.
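In the Java com.google.datastore.v1 classes used in the question, a keys-only query can apparently be expressed as a projection on the special __key__ property; an untested sketch:
// Sketch: make the query keys-only by projecting on __key__.
queryBuilder.addProjection(
        Projection.newBuilder()
                .setProperty(PropertyReference.newBuilder().setName("__key__")));
// Each EntityResult then carries an Entity whose only populated field is its
// Key, which can be inspected locally without further datastore calls.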
entity.getKey().getPathList().get(0).getId()
This helped me achieve the result: getting the entity Id through the getKey method.

MongoDB result set getting modified after execution of a query

In my application there are 2 threads:
crawl the web-sites and insert the data into MongoDB
retrieve the crawled sites and perform business logic
In order to retrieve the crawled sites I use the following query:
Document query = new Document("fetchStatus", new Document("$lte", fetchStatusParam));
FindIterable<Document> unfetchedEpisodes = dbC_Episodes.find(query);
As a result I get all episodes whose fetchStatus is less than or equal to the given fetchStatusParam value.
In the next step, I store the items of the result set in a HashMap<String, TrackedEpisode>, which is an object property, in order to track them:
for (Document document : unfetchedEpisodes) {
    this.trackedEpisodes.put(document.get("_id").toString(), new TrackedEpisode(document));
}
Then I do some business logic, which:
doesn't modify the unfetchedEpisodes result set.
doesn't remove any object from trackedEpisodes.
Up till now everything is OK.
In the last step, I go over all retrieved documents and mark them as fetched in order to prevent duplicate fetching in the future.
for (Document document : unfetchedEpisodes) {
    if (this.trackedEpisodes.containsKey(document.get("_id").toString())) {
        // prevent repeated fetching
        document.put("fetchStatus", FetchStatus.IN_PROCESS.getID());
        if (this.trackedEpisodes.get(document.get("_id").toString()).isExpired()) {
            document.put("isExpired", true);
            document.put("fetchStatus", FetchStatus.FETCHED.getID());
        }
    } else {
        System.out.println("BOO! Strange new object detected");
    }
    dbC_Episodes.updateOne(new Document("_id", document.get("_id")), new Document("$set", document));
}
I ran this code for a couple of days and noticed that sometimes it reaches the else branch of the if (this.trackedEpisodes.containsKey()) statement. This is weird to me: how can it be that unfetchedEpisodes and trackedEpisodes are out of sync and don't contain the same items?
I began to investigate and noticed that whenever I reach "BOO! Strange new object detected", the document iterator contains an item which is in the database but should not yet be in unfetchedEpisodes, since I didn't execute a new query against the database.
I checked a couple of times that the retrieved items are stored into trackedEpisodes, and every element from unfetchedEpisodes had indeed been added to trackedEpisodes, but after that I still sometimes reach "BOO! Strange new object detected".
My questions:
Why does unfetchedEpisodes get new items after the query has been executed?
Is it possible that unfetchedEpisodes is modified by the MongoDB driver after Collection#find() has been executed?
Should I perhaps call some kind of .close() after executing a query against MongoDB?
Versions used:
MongoDB: 3.2.3, x64
MongoDB Java Driver: mongodb-driver-3.2.2, mongodb-driver-core-3.2.2, bson-3.2.2
When you call find here:
FindIterable<Document> unfetchedEpisodes = dbC_Episodes.find(query);
you are not actually getting all the episodes back. You are getting a database cursor pointing to the matched documents.
Then when you call:
for (Document document : unfetchedEpisodes){}
an iterator is created over all of the documents that match the query.
When you call it a second time, a new cursor is returned, for the same query, and all of the documents that match now are iterated over.
If the collection has changed in between, the results will be different.
If you want to ensure that the contents of unfetchedEpisodes are unchanged, one option is to pull the entire result set into memory and iterate over it there rather than on the DB, e.g.
ArrayList<Document> unfetchedEpisodes = dbC_Episodes.find(query).into(new ArrayList<Document>());
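With that change both passes in the question iterate over the same in-memory snapshot, so a document inserted between them can no longer show up unexpectedly. Roughly, reusing the question's variables:
// First pass: record every document we saw.
for (Document document : unfetchedEpisodes) {
    this.trackedEpisodes.put(document.get("_id").toString(), new TrackedEpisode(document));
}

// ... business logic ...

// Second pass: every document here was already seen by the first pass, so the
// "BOO! Strange new object detected" branch cannot be triggered by documents
// inserted after the initial query.
for (Document document : unfetchedEpisodes) {
    // mark as fetched / expired and update, as in the question
}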

Query all items in DynamoDB from a given hash key with a hash-range schema using java sdk

EDIT:
I was actually incorrect. I was querying the table when I meant to query an index, which explains my error. Vikdor's solution is a valid one though.
ORIGINAL:
I have a table with a hash-range key schema in DynamoDB. I need to be able to get all items associated with a specific hash key, but it seems to require a range key condition. My issue is that I want EVERY range key, but there is no wildcard option. Right now my range key is a string, and the only way I could think of to do this is to query for all range keys greater than or equal to the smallest ASCII character I can use, since the documentation says sorting is based on ASCII character values.
I looked into scanning, but it appears that it simply reads the entire table, which is NOT an option.
Is there any better way to query for all values of a hash key, or can anyone confirm that the approach with the smallest ASCII character will work?
but it seems to require a range key condition.
That doesn't appear to be true.
I use DynamoDBMapper with a DynamoDBQueryExpression to query all the records with a given hash key, as follows:
DynamoDBQueryExpression<DomainObject> query =
        new DynamoDBQueryExpression<DomainObject>();
DomainObject hashKeyValues = new DomainObject();
hashKeyValues.setHashKey(hashKeyValue);
query.setHashKeyValues(hashKeyValues);
// getMapper() returns a DynamoDBMapper object configured with the appropriate
// AmazonDynamoDBClient object.
List<DomainObject> results = getMapper().query(DomainObject.class, query);
HTH.
You can use DynamoDB's query API, which allows you to query the database with conditional expressions on the hash/range keys. You can see examples of the API here. Here is a relevant example:
ItemCollection<QueryOutcome> items = table.query("theHashFieldName", "theHashFieldToQuery");
You can also query using more complex expressions. E.g.:
DynamoDB dynamoDB = new DynamoDB(
        new AmazonDynamoDBClient(new ProfileCredentialsProvider()));
Table table = dynamoDB.getTable("TableName");
QuerySpec spec = new QuerySpec()
        .withKeyConditionExpression("Id = :v_id")
        .withValueMap(new ValueMap()
                .withString(":v_id", "TheId"));
ItemCollection<QueryOutcome> items = table.query(spec);
Iterator<Item> iterator = items.iterator();
Item item = null;
while (iterator.hasNext()) {
    item = iterator.next();
    System.out.println(item.toJSONPretty());
}

How to get features record for plan estimate change using lookback API

I am using the Rally Lookback API with Java. I am trying to fetch historical data for features; the sample code I am using is shown below.
LookbackApi lookbackApi = new LookbackApi();
lookbackApi.setCredentials("username", "password");
lookbackApi.setWorkspace("47903209423");
lookbackApi.setServer("https://rally1.rallydev.com");
//lookbackApi.setWorkspace("90432948");
LookbackQuery query = lookbackApi.newSnapshotQuery();
query.addFindClause("_TypeHierarchy", "PortfolioItem/Feature");
query.setPagesize(200) // set pagesize to 200 instead of the default 20k
        .setStart(200) // ask for the second page of data
        .requireFields("ScheduleState", // A useful set of fields for defects, add any others you may want
                "ObjectID",
                "State",
                "Project",
                "PlanEstimate",
                "_ValidFrom",
                "_ValidTo")
        .sortBy("_UnformattedID")
        .hydrateFields("ScheduleState", "State", "PlanEstimate", "Project"); // ScheduleState will come back as an OID if it doesn't get hydrated
LookbackResult resultSet = query.execute();
int resultCount = resultSet.Results.size();
Map<String,Object> firstSnapshot = resultSet.Results.get(0);
Iterator<Map<String,Object>> iterator = resultSet.getResultsIterator();
while (iterator.hasNext()) {
    Map<String, Object> snapshot = iterator.next();
}
I need a way to add a condition so that it fetches all the records from history where the plan estimate changed, but ignores other history for any feature and underlying user story. I need it this way so that we can track plan estimate changes while avoiding fetching unnecessary data and keeping the query time down.
I'm not familiar with the java toolkit, but using the raw Lookback API, you would accomplish this with a filter clause like {"_PreviousValues.PlanEstimate": {"$exists": true}}.
Map<String, Object> ifExist = new HashMap<>();
ifExist.put("$exists", true);
// Note: true is a Java boolean; be careful with this, as the string "true" will not work.
query.addFindClause("_PreviousValues.PlanEstimate", ifExist);
Additionally, one needs to consider adding "_PreviousValues.PlanEstimate" to .requireFields() in case the previous PlanEstimate value itself is required in the returned snapshots.
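In the question's query that would look roughly like the line below (assuming requireFields() can be called again additively; otherwise fold the field into the original requireFields(...) list):
// Assumption: also request the previous value so it is returned in each snapshot.
query.requireFields("_PreviousValues.PlanEstimate");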

How to retrieve only the PKs of a table

I'm working with Java Apache Cayenne, under a MySQL DB.
I have quite a large table with a single bigint PK and some fields.
I'd like to retrieve only the PK values, and not the whole objects that map this entity, as that would be too resource-consuming.
Is there a snippet that I can use, instead of this one that retrieves all the objects?
ObjectContext context = ...
SelectQuery select = new SelectQuery(MyClass.class);
List<MyClass> result = context.performQuery(select);
You should try using SQLTemplate instead of SelectQuery.
Here's a quick example:
ObjectContext context = ...
SQLTemplate select = new SQLTemplate(MyClass.class, "SELECT #result('PK_COLUMN' 'long') FROM MY_TABLE");
List result = context.performQuery(select);
You can find more information here
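If you want the PK values back as plain numbers rather than partially built MyClass objects, one option is to fetch data rows instead; an untested sketch (the 'pk' alias is made up here, while setFetchingDataRows and DataRow are standard Cayenne APIs):
SQLTemplate select = new SQLTemplate(MyClass.class,
        "SELECT #result('PK_COLUMN' 'long' 'pk') FROM MY_TABLE");
select.setFetchingDataRows(true); // return DataRow maps instead of MyClass objects

List<?> rows = context.performQuery(select);
List<Long> pkValues = new ArrayList<>();
for (Object row : rows) {
    pkValues.add((Long) ((DataRow) row).get("pk")); // key matches the #result alias
}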
+1 for Josemando's answer. And here is another way that may work in case you are planning to only work with a subset of fetched objects:
ObjectContext context = ...
SelectQuery select = new SelectQuery(MyClass.class);
select.setPageSize(100);
List<MyClass> result = context.performQuery(select);
'setPageSize' ensures that 'result' only contains ids, until you attempt to read an object from the list. And when you do, it will resolve it page-by-page (100 objects at a time in the example above). This may fit a number of scenarios. Of course if you iterate through the entire list, eventually all objects will be fully resolved, and there will be no memory benefit.
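For illustration, a rough sketch of how the paged 'result' list from the snippet above behaves in use (the comments reflect the lazy, page-by-page resolution described here):
int total = result.size();         // known immediately; no objects are resolved yet
MyClass first = result.get(0);     // faults in the first page of 100 objects
MyClass later = result.get(150);   // touching an element on another page resolves that page too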
