DynamoDB parallel scan using Table.scan API in Java

I would appreciate help from anyone familiar with how DynamoDB works.
I need to scan a large DynamoDB table. I know that the low-level DynamoDBClient scan operation is limited to 1 MB of returned data per call. Does the same restriction apply to Table.scan? The thing is that Table.scan returns an ItemCollection<ScanOutcome>, while DynamoDBClient's scan returns a ScanResult, and it is not clear to me whether these operations behave the same way.
I have checked this example: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ScanJavaDocumentAPI.html, but it doesn't contain any hints about handling the last evaluated key.
My questions are:
Do I still need to make scan calls in a loop until LastEvaluatedKey is null if I use Table.scan? If yes, how do I get the last key? If not, how do I control pagination?
Any links to code examples would be appreciated. I have spent some time searching for examples, but most of them use DynamoDBClient or DynamoDBMapper, while I need to use the Table and Index objects instead.
Thanks!

If you iterate over the output of Table.scan(), the SDK will do pagination for you.
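
For illustration, here is a minimal sketch of that pattern using the SDK v1 Document API (the table name "MyTable" and the default-credential client are assumptions):

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.ItemCollection;
import com.amazonaws.services.dynamodbv2.document.ScanOutcome;
import com.amazonaws.services.dynamodbv2.document.Table;

public class TableScanExample {
    public static void main(String[] args) {
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();
        DynamoDB dynamoDB = new DynamoDB(client);
        Table table = dynamoDB.getTable("MyTable"); // hypothetical table name

        ItemCollection<ScanOutcome> items = table.scan();

        // Each 1 MB page is fetched transparently as the iterator advances;
        // there is no need to track LastEvaluatedKey yourself.
        for (Item item : items) {
            System.out.println(item.toJSONPretty());
        }
    }
}

For an actual parallel scan, ScanSpec exposes withTotalSegments(...) and withSegment(...), so each worker thread can iterate over its own segment in exactly the same way.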

Related

In Spring Data Redis, how to pass multiple keys into leftpop() (rightpop()) method with timeout?

As far as I understand, this method is similar to the BLPOP command in redis-cli. The latter, however, can take multiple lists in its signature.
Is this possible with the leftPop() method too? From the docs:
Removes and returns first element from lists stored at key.
It seems that it should be possible, but I can't figure out how to do it properly.
Thanks in advance.
As far as I know, ListOperations does not support any operation over multiple keys.
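
If you need to pop from several lists, the closest workaround seems to be polling each key in turn. A minimal sketch of that idea follows (the template wiring and key handling are assumptions, and unlike a multi-key BLPOP this is not atomic across keys):

import java.util.concurrent.TimeUnit;
import org.springframework.data.redis.core.ListOperations;
import org.springframework.data.redis.core.StringRedisTemplate;

public class MultiKeyPop {
    // Tries each key in order with a short blocking pop; returns the first
    // value found, or null if every list stayed empty.
    public static String popFirstAvailable(StringRedisTemplate template, String... keys) {
        ListOperations<String, String> ops = template.opsForList();
        for (String key : keys) {
            // Single-key blocking pop; maps to BLPOP key 1 in redis-cli.
            String value = ops.leftPop(key, 1, TimeUnit.SECONDS);
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}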

Writing to GCS from dataflow based on windowing and element count

I am attempting to implement a solution where I need to write data (JSON) messages from Pub/Sub into GCS using Dataflow. My question is essentially the same as this one.
I need to write either based on windowing or element count.
Here is the code sample for the writes from the above question:
windowedValues.apply(FileIO.<String, String>writeDynamic()
        .by(Event::getKey)                       // derive a destination key from each element
        .via(TextIO.sink())                      // write each group as plain text lines
        .to("gs://data_pipeline_events_test/events/")
        .withDestinationCoder(StringUtf8Coder.of())
        .withNumShards(1)                        // one shard (file) per key and window
        .withNaming(key -> FileIO.Write.defaultNaming(key, ".json")));
The solution suggests using FileIO.writeDynamic, but I am not able to understand what .by(Event::getKey) does and where it comes from.
Any help on this is greatly appreciated.
It partitions elements into groups according to the events' keys.
From my understanding, the events come from a PCollection of KV elements, since KV has a getKey method.
Note that :: is an operator introduced in Java 8 that is used to refer to a method of a class; the sketch below illustrates this.
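
For illustration, here is a minimal sketch of what such an Event class could look like and what the method reference expands to (the class itself is hypothetical; the pipeline in the original question presumably defines its own):

import java.io.Serializable;

// Hypothetical element type; in the original question's pipeline this class
// would carry the parsed Pub/Sub message.
public class Event implements Serializable {
    private final String key;
    private final String payload;

    public Event(String key, String payload) {
        this.key = key;
        this.payload = payload;
    }

    public String getKey() {
        return key;
    }
}

// Inside the pipeline, these two forms are equivalent:
//   .by(Event::getKey)            // method reference (Java 8+)
//   .by((Event e) -> e.getKey())  // the same function written as a lambda
// Either way, FileIO groups elements by the returned key and writes each
// group to its own set of files.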

Ignite Questions about "qryexe"

I figured out that "qryfldexe" is able to query across caches with multiple "join" clauses, but I couldn't figure out whether "qryexe" has a way to achieve that as well.
The reason for question 1 is that "qryexe" returns items in key-value fashion, while "qryfldexe" returns each item as an array that holds only the values. Is there a Java library to load the kind of JSON returned by "qryfldexe" into a JSON object, based on the fields metadata at the end (or beginning) of the JSON payload?
Many thanks
No, qryexe only works on a single cache (same as its Java counterpart, SqlQuery).
There is no Java client for the Ignite REST API as far as I know, but there is also a more efficient Thin Client Protocol and a work-in-progress Java client for it (see this mailing list thread for a description and some links).
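
If you only need to re-key the "qryfldexe" rows on the client, a plain JSON library such as Jackson is enough. A minimal sketch follows; the response field names ("response", "fieldsMetadata", "fieldName", "items") are assumptions based on the REST response layout and should be verified against your Ignite version:

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class QryFldExeMapper {
    // Zips each value array in "items" with the column names taken from
    // "fieldsMetadata", producing one map per row.
    public static List<Map<String, JsonNode>> toRows(String json) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        JsonNode response = mapper.readTree(json).path("response");

        // Collect the column names from the metadata block (assumed layout).
        List<String> columns = new ArrayList<>();
        for (JsonNode meta : response.path("fieldsMetadata")) {
            columns.add(meta.path("fieldName").asText());
        }

        List<Map<String, JsonNode>> rows = new ArrayList<>();
        for (JsonNode item : response.path("items")) {
            Map<String, JsonNode> row = new LinkedHashMap<>();
            for (int i = 0; i < columns.size(); i++) {
                row.put(columns.get(i), item.get(i));
            }
            rows.add(row);
        }
        return rows;
    }
}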

pub.document.sortDocuments not sorting

I am stuck. I had this working last week; now I have changed something and it will not work!
I have a simple flow service as follows:
pub.file.getFile
pub.flatFile.convertToValues
pub.document.sortDocuments
But the sortDocuments step is not doing anything.
The recordWithNoID document list is perfect and all the fields are correct (so the schema and dictionary are working as intended), but when I try to sort it on the key "Field1" the sort does nothing: the documents do not change order at all.
See two attached screenshots:
Screenshot 1 shows the pipeline during the pub.document.sortDocuments step:
key variable is: Field1
order variable is: ascending
Screenshot 2 shows recordWithNoID after running the flow service. As you can see, the Field1 column has not been ordered correctly (it is still in the original document order). I have also tried mapping the results to other document types, with the same result.
As I said above, I had this working last week and now cannot seem to get it to work. I have even started the whole process from scratch and it still will not work. Any help would be very much appreciated!
Screenshot 1
Screenshot 2
EDIT:
I resolved this issue by mapping to the Document Type created from the Schema.
It appears that you map the ffValues document (IData) and not the recordWithNoID document list (IData array) inside it, which would be the wrong level.
Please map recordWithNoID instead and let us know if that solves the issue.
While not related to the question, there seems to be some clutter on the pipeline. I always recommend dropping variables as early as possible, mostly to improve readability but also for performance.
I am not sure, but maybe this is the problem: in Screenshot 1 we can see that you sort ffValues but map the result to document (because you are using Invoke, this mapping is done automatically).
Is Screenshot 2 showing the ffValues or the document variable? Maybe you are checking the wrong, unsorted variable.
I also want to suggest using a Map step with a transformer rather than Invoke, because a Map step gives you control over the pipeline.
With Invoke, every output variable is saved to the pipeline (a variable with the same name both on the pipeline and on the output of the service will overwrite the pipeline variable).

How to get the Google's search result using Java

According to the answer here, using Gson we can programmatically retrieve the results that Google returns for a query. Nonetheless, two questions remain in my mind:
How can we do a similar thing for Bing?
How can we get more than 4 results based on the referred answer? results.getResponseData().getResults().get(n).getUrl() throws an exception for n > 4.
As @Niklas noted, the Google Web Search API is deprecated, so you should not use it for your project. Currently the only solution is to issue an HTTP request for the HTML search results page and parse it yourself.
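
A minimal sketch of that scraping approach using jsoup (the library choice and the "h3 > a" selector are assumptions, since Google's result markup changes frequently; note that scraping may be against Google's terms of service):

import java.net.URLEncoder;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SearchScraper {
    public static void main(String[] args) throws Exception {
        String query = "stack overflow";
        Document doc = Jsoup
                .connect("https://www.google.com/search?q=" + URLEncoder.encode(query, "UTF-8"))
                .userAgent("Mozilla/5.0") // the default Java user agent tends to get blocked
                .get();

        // Illustrative selector only; inspect the current result markup
        // and adjust before relying on it.
        for (Element link : doc.select("h3 > a")) {
            System.out.println(link.text() + " -> " + link.attr("href"));
        }
    }
}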
In the case of Bing, there is a search API, but it has a limited number of calls for free users. If you need to make a lot of requests, then you will have to pay for it. https://datamarket.azure.com/dataset/5BA839F1-12CE-4CCE-BF57-A49D98D29A44
