Put multiple items into DynamoDB from Java code

I would like to use the batchWriteItem method of the Amazon SDK to put a lot of items into a table.
I retrieve the items from Kinesis, and it has a lot of shards.
I used this method for one item:
public static void addSingleRecord(Item thingRecord) {
    // Add an item
    try {
        DynamoDB dynamo = new DynamoDB(dynamoDB);
        Table table = dynamo.getTable(dataTable);
        table.putItem(thingRecord);
    } catch (AmazonServiceException ase) {
        System.out.println("addThingsData request "
                + "to AWS was rejected with an error response for some reason.");
        System.out.println("Error Message: " + ase.getMessage());
        System.out.println("HTTP Status Code: " + ase.getStatusCode());
        System.out.println("AWS Error Code: " + ase.getErrorCode());
        System.out.println("Error Type: " + ase.getErrorType());
        System.out.println("Request ID: " + ase.getRequestId());
    } catch (AmazonClientException ace) {
        System.out.println("addThingsData - Caught an AmazonClientException, which means the client encountered "
                + "a serious internal problem while trying to communicate with AWS, "
                + "such as not being able to access the network.");
        System.out.println("Error Message: " + ace.getMessage());
    }
}
public static void addThings(String thingDatum) {
    Item itemJ2 = Item.fromJSON(thingDatum);
    addSingleRecord(itemJ2);
}
The item is passed from:
private void processSingleRecord(Record record) {
    // TODO Add your own record processing logic here
    String data = null;
    try {
        // For this app, we interpret the payload as UTF-8 chars.
        data = decoder.decode(record.getData()).toString();
        System.out.println("**processSingleRecord - data " + data);
        AmazonDynamoDBSample.addThings(data);
    } catch (NumberFormatException e) {
        LOG.info("Record does not match sample record format. Ignoring record with data; " + data);
    } catch (CharacterCodingException e) {
        LOG.error("Malformed data: " + data, e);
    }
}
Now, if I want to put a lot of records, I will use:
public static void writeMultipleItemsBatchWrite(Item thingRecord) {
    try {
        dataTableWriteItems.addItemToPut(thingRecord);
        System.out.println("Making the request.");
        BatchWriteItemOutcome outcome = dynamo.batchWriteItem(dataTableWriteItems);
        do {
            // Check for unprocessed keys which could happen if you exceed provisioned throughput
            Map<String, List<WriteRequest>> unprocessedItems = outcome.getUnprocessedItems();
            if (outcome.getUnprocessedItems().size() == 0) {
                System.out.println("No unprocessed items found");
            } else {
                System.out.println("Retrieving the unprocessed items");
                outcome = dynamo.batchWriteItemUnprocessed(unprocessedItems);
            }
        } while (outcome.getUnprocessedItems().size() > 0);
    } catch (Exception e) {
        System.err.println("Failed to retrieve items: ");
        e.printStackTrace(System.err);
    }
}
But how can I send the last group? I only send a batch once I have 25 items, and at the end of the stream the remaining number of items is lower than 25.
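A minimal sketch of one way to flush the remaining items (BatchWriteItem accepts anywhere from 1 to 25 items per call; the dynamo client and dataTable table name are the same assumed fields used above, and the end-of-stream flag is a hypothetical signal from the caller):
// Hedged sketch: buffer items, flush a full batch of 25, and also flush the
// smaller remainder once the caller signals that no more records are coming.
private static final int MAX_BATCH_SIZE = 25;
private static TableWriteItems batch = new TableWriteItems(dataTable);
private static int bufferedCount = 0;

public static void addToBatch(Item thingRecord, boolean endOfStream) {
    batch.addItemToPut(thingRecord);
    bufferedCount++;
    if (bufferedCount == MAX_BATCH_SIZE || endOfStream) {
        flushBatch();
    }
}

private static void flushBatch() {
    if (bufferedCount == 0) {
        return; // nothing buffered
    }
    BatchWriteItemOutcome outcome = dynamo.batchWriteItem(batch);
    // Retry unprocessed items, e.g. when provisioned throughput is exceeded.
    while (outcome.getUnprocessedItems().size() > 0) {
        outcome = dynamo.batchWriteItemUnprocessed(outcome.getUnprocessedItems());
    }
    batch = new TableWriteItems(dataTable); // start a fresh batch
    bufferedCount = 0;
}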

You can write items to your DynamoDB table one at a time using the Document SDK in a Lambda function attached to your Kinesis Stream, using PutItem or UpdateItem. This way, you can react to Stream Records as they appear in the Stream without worrying about whether there are any more records to process. Behind the scenes, BatchWriteItem consumes the same amount of write capacity units as the corresponding PutItem calls, and a BatchWriteItem call is only as fast as the slowest PutItem in the batch. Therefore, with BatchWriteItem you may experience higher average latency than with parallel PutItem/UpdateItem calls.
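A minimal sketch of that approach, assuming the AWS SDK for Java v1 Document API and the standard Lambda Kinesis event model; the handler class name and the table name are placeholders:
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.KinesisEvent;
import java.nio.charset.StandardCharsets;

// Hedged sketch: a Lambda handler attached to the Kinesis stream that writes
// each record to DynamoDB with PutItem as it arrives, so there is no partial
// batch left over. "dataTable" is a placeholder for your table name.
public class KinesisToDynamoHandler implements RequestHandler<KinesisEvent, Void> {

    private final Table table =
            new DynamoDB(AmazonDynamoDBClientBuilder.defaultClient()).getTable("dataTable");

    @Override
    public Void handleRequest(KinesisEvent event, Context context) {
        for (KinesisEvent.KinesisEventRecord rec : event.getRecords()) {
            // The payload is the same UTF-8 JSON that the KCL consumer decodes above.
            String json = StandardCharsets.UTF_8.decode(rec.getKinesis().getData()).toString();
            table.putItem(Item.fromJSON(json));
        }
        return null;
    }
}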

Related

JackMidi.eventWrite - time parameter

I'm using this library: https://github.com/jaudiolibs/jnajack
I created a simple project to reproduce my issue: https://github.com/sc3sc3/MidiJnaJackTest
I have a JackPort outputPort running, and it appears in QjackCtl under 'Output Ports'.
In QjackCtl this outputPort is connected to GMIDImonitor, to observe MIDI traffic.
I send MidiMessages to GMIDImonitor via the method below.
I can't figure out the value of the time parameter.
When I set time = jackClient.getFrameTime(), the message does not arrive in GMIDImonitor.
When I set it to, for example, 300, then one message is sent eternally in a loop.
Any help? Thanks.
public void processMidiMessage(ShortMessage shortMessage) {
    System.out.println("processMidiMessage: " + shortMessage + ", on port: " + this.outputPort.getName());
    try {
        JackMidi.clearBuffer(this.outputPort);
    } catch (JackException e) {
        e.printStackTrace();
    }
    try {
        int time = 300;
        JackMidi.eventWrite(this.outputPort, time, shortMessage.getMessage(), shortMessage.getLength());
    } catch (JackException e) {
        e.printStackTrace();
    }
}

No response in SQSMessageSuccess while detecting faces inside a video uploaded on Amazon S3

I have been trying to detect faces in a video stored on Amazon S3; the faces have to be matched against a collection that holds the faces to search for in the video.
I have used Amazon VideoDetect.
My piece of code goes like this:
CreateCollection createCollection = new CreateCollection(collection);
createCollection.makeCollection();
AddFacesToCollection addFacesToCollection = new AddFacesToCollection(collection, bucketName, image);
addFacesToCollection.addFaces();
VideoDetect videoDetect = new VideoDetect(video, bucketName, collection);
videoDetect.CreateTopicandQueue();
try {
    videoDetect.StartFaceSearchCollection(bucketName, video, collection);
    if (videoDetect.GetSQSMessageSuccess())
        videoDetect.GetFaceSearchCollectionResults();
} catch (Exception e) {
    e.printStackTrace();
    return false;
}
videoDetect.DeleteTopicandQueue();
return true;
Things seem to work fine up to StartFaceSearchCollection: I am getting a jobId and a queue is created as well. But when it goes on to GetSQSMessageSuccess, it never returns any message.
The code which is trying to fetch the message is:
ReceiveMessageRequest.Builder receiveMessageRequest = ReceiveMessageRequest.builder().queueUrl(sqsQueueUrl);
messages = sqs.receiveMessage(receiveMessageRequest.build()).messages();
It has the correct sqsQueueUrl, which exists, but I am not getting anything in the messages.
On timeout it gives me this exception:
software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: sqs.region.amazonaws.com
at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
Caused by: java.net.UnknownHostException: sqs.region.amazonaws.com
So is there any alternative to this? Instead of the SQS message, can we track/poll the jobId any other way? Or am I missing something?
A simple working code snippet to receive SQS messages with a valid sqsQueueUrl:
ReceiveMessageRequest receiveMessageRequest = new ReceiveMessageRequest(sqsQueueUrl);
final List<Message> messages = sqs.receiveMessage(receiveMessageRequest).getMessages();
for (final Message message : messages) {
    System.out.println("Message");
    System.out.println(" MessageId: " + message.getMessageId());
    System.out.println(" ReceiptHandle: " + message.getReceiptHandle());
    System.out.println(" MD5OfBody: " + message.getMD5OfBody());
    System.out.println(" Body: " + message.getBody());
    for (final Entry<String, String> entry : message.getAttributes().entrySet()) {
        System.out.println("Attribute");
        System.out.println(" Name: " + entry.getKey());
        System.out.println(" Value: " + entry.getValue());
    }
}
System.out.println();
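The snippet above uses the AWS SDK for Java v1, while the stack trace in the question comes from the v2 SDK (software.amazon.awssdk). A rough v2 equivalent, assuming the client is built with an explicit region (Region.US_EAST_1 below is only a placeholder) and long polling:
import java.util.List;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;

// Hedged sketch of the same polling loop with the v2 SDK. The region must be
// a concrete value; US_EAST_1 here is just an example.
SqsClient sqs = SqsClient.builder().region(Region.US_EAST_1).build();
ReceiveMessageRequest request = ReceiveMessageRequest.builder()
        .queueUrl(sqsQueueUrl)
        .waitTimeSeconds(20)      // long polling
        .maxNumberOfMessages(10)
        .build();
List<Message> messages = sqs.receiveMessage(request).messages();
for (Message message : messages) {
    System.out.println("MessageId: " + message.messageId());
    System.out.println("Body: " + message.body());
}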

Why are threads using the same variable value? RxJava Mqtt

I'm using rxmqtt (which uses RxJava and Paho) to communicate with an MQTT broker. I'm using javax (JAX-RS) to accept REST requests, publish some content to the broker, and wait for a response. The code below works fine if I make one request at a time, but if I have more than one concurrent request it only returns a response for the last one and the others fall into the timeout exception.
mqttConn.getMqttMessages() returns a Flowable which is already subscribed to all the topics I need:
public Flowable<MqttMessage> getMqttMessages() {
    return this.obsClient.subscribe("pahoRx/fa/#", 1);
}
MqttConnection is a singleton because I only want a single connection to the broker, and all publishes are done over this connection.
I've noticed that my queryParam id is different in each thread execution of the web service request (expected behavior), but when it enters the subscription part of the code it only considers the last id value and does not pass my validation in the takeUntil method:
mqttConn.getMqttMessages().timeout(20, TimeUnit.SECONDS).takeUntil(msgRcv -> {
    System.out.println("received: " + new String(msgRcv.getPayload()) + " ID: " + id);
    return id.equals(new String(msgRcv.getPayload()));
}).blockingSubscribe(msgRcv -> {
    final byte[] body = msgRcv.getPayload();
    System.out.println(new String(body)); // printing... but not sending the response
    response.set("Message Received: " + new String(msgRcv.getPayload()));
    return;
}, e -> {
    if (e instanceof TimeoutException) {
        response.set("Timeout Occurred");
    } else {
        response.set("Some kind of error occurred " + e.getLocalizedMessage());
    }
});
The thing is, why is it only considering the last id received, when each request should have its own independent thread? I've tried getting mqttConn.getMqttConnection() as a ThreadLocal object, but that doesn't fix it.
Full WS code:
@Path("/test")
@GET
public String test(@QueryParam("id") String id) throws InterruptedException, MqttException {
    String funcExec = "pahoRx/fe/";
    String content = "unlock with single connection to broker";
    int qos = 1;
    AtomicReference<String> response = new AtomicReference<String>();
    response.set("Initial Value");
    MqttConnection mqttConn = MqttConnection.getMqttConnection();
    ObservableMqttClient obsClient = mqttConn.getBrokerClient();
    MqttMessage msg = MqttMessage.create(78, content.getBytes(), qos, false);
    String topicPub = funcExec + id;
    obsClient.publish(topicPub, msg).subscribe(t -> {
        System.out.println("Message Published");
    }, e -> {
        System.out.println("Failed to publish message: " + e.getLocalizedMessage());
    });
    mqttConn.getMqttMessages().timeout(20, TimeUnit.SECONDS).takeUntil(msgRcv -> {
        System.out.println("received: " + new String(msgRcv.getPayload()) + " ID: " + id);
        return id.equals(new String(msgRcv.getPayload()));
    }).blockingSubscribe(msgRcv -> {
        final byte[] body = msgRcv.getPayload();
        System.out.println(new String(body)); // printing... but not sending the response
        response.set("Message Received: " + new String(msgRcv.getPayload()));
        return;
    }, e -> {
        if (e instanceof TimeoutException) {
            response.set("Timeout Occurred");
        } else {
            response.set("Some kind of error occurred " + e.getLocalizedMessage());
        }
    });
    return response.get();
}
I hope the explanation is clear enough!
Thanks in advance.

Spark - Restore nested saved RDD

I am using AWS S3 as backup storage for data coming in to our Spark cluster. Data comes in every second and is processed when 10 seconds of data have been read. The RDD containing the 10 seconds of data is stored to S3 using
rdd.saveAsObjectFile(s3URL + dateFormat.format(new Date()));
This means that we get a lot of files added to S3 each day in the format of
S3URL/2017/07/23/12/00/10, S3URL/2017/07/23/12/00/20 etc
From here it is easy to restore the RDD, which is a JavaRDD<byte[]>, using either sc.objectFile or the AmazonS3 API.
The problem is that, to reduce the number of files to iterate through, we run a daily cron job that goes through each file for a day, bunches the data together and stores the new RDD to S3. This is done as follows:
List<byte[]> dataList = new ArrayList<>(); // A list of all read messages
/* Get all messages from S3 and store them in the above list */
try {
    final ListObjectsV2Request req = new ListObjectsV2Request().withBucketName("bucketname").withPrefix("logs/" + dateString);
    ListObjectsV2Result result;
    do {
        result = s3Client.listObjectsV2(req);
        for (S3ObjectSummary objectSummary : result.getObjectSummaries()) {
            System.out.println(" - " + objectSummary.getKey() + " " +
                    "(size = " + objectSummary.getSize() + ")");
            if (objectSummary.getKey().contains("part-00000")) { // The messages are stored in files named "part-00000"
                S3Object object = s3Client.getObject(
                        new GetObjectRequest(objectSummary.getBucketName(), objectSummary.getKey()));
                InputStream objectData = object.getObjectContent();
                byte[] byteData = new byte[(int) objectSummary.getSize()]; // The size of the messages differ
                objectData.read(byteData);
                dataList.add(byteData); // Add the message to the list
                objectData.close();
            }
        }
        /* When iterating, messages are split into chunks called continuation tokens.
         * All tokens have to be iterated through to get all messages. */
        System.out.println("Next Continuation Token : " + result.getNextContinuationToken());
        req.setContinuationToken(result.getNextContinuationToken());
    } while (result.isTruncated());
} catch (AmazonServiceException ase) {
    System.out.println("Caught an AmazonServiceException, " +
            "which means your request made it " +
            "to Amazon S3, but was rejected with an error response " +
            "for some reason.");
    System.out.println("Error Message: " + ase.getMessage());
    System.out.println("HTTP Status Code: " + ase.getStatusCode());
    System.out.println("AWS Error Code: " + ase.getErrorCode());
    System.out.println("Error Type: " + ase.getErrorType());
    System.out.println("Request ID: " + ase.getRequestId());
} catch (AmazonClientException ace) {
    System.out.println("Caught an AmazonClientException, " +
            "which means the client encountered " +
            "an internal error while trying to communicate with S3, " +
            "such as not being able to access the network.");
    System.out.println("Error Message: " + ace.getMessage());
} catch (IOException e) {
    e.printStackTrace();
}
JavaRDD<byte[]> messages = sc.parallelize(dataList); // Loads the messages into an RDD
messages.saveAsObjectFile("S3URL/daily_logs/" + dateString);
This all works fine, but now I am not sure how to actually restore the data to a manageable state again. If I use sc.objectFile to restore the RDD, I end up with a JavaRDD<byte[]> where each byte[] is actually a JavaRDD<byte[]> in itself. How can I restore the nested JavaRDD from the byte[] located in the JavaRDD<byte[]>?
I hope this somehow makes sense and I am grateful for any help. In the worst case I have to come up with another way to back up the data.
Best regards
Mathias
I solved it by not storing a nested RDD: instead, I flatMapped all the byte[] into a single JavaRDD and stored that one.
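A minimal sketch of that idea, assuming each 10-second chunk is restored with sc.objectFile and the chunks are combined into one flat RDD before saving (tenSecondPrefixes is a placeholder for the list of S3 prefixes written during the day; union plays the flattening role here):
// Hedged sketch: restore each 10-second chunk as its own JavaRDD<byte[]>
// (one element per original message), combine them into a single flat RDD
// for the whole day, and save that instead of a nested structure.
JavaRDD<byte[]> daily = sc.emptyRDD();
for (String prefix : tenSecondPrefixes) { // e.g. "S3URL/2017/07/23/12/00/10", ...
    JavaRDD<byte[]> chunk = sc.objectFile(prefix);
    daily = daily.union(chunk);
}
daily.saveAsObjectFile("S3URL/daily_logs/" + dateString);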

Solr Trigger Optimize And Check Progress From Java Code

From this topic there are two ways to trigger a Solr optimize from Java code: either sending an HTTP request, or using the SolrJ API.
But how do I check the progress of it?
Say, an API which returns the progress of the optimize as a percentage, or strings like RUNNING/COMPLETED/FAILED.
Is there such an API?
Yes, optimize() in the SolrJ API is a synchronous method. Here is what I used to monitor the optimization progress.
CloudSolrClient client = null;
try {
    client = new CloudSolrClient(zkClientUrl);
    client.setDefaultCollection(collectionName);
    m_logger.info("Explicit optimize of collection " + collectionName);
    long optimizeStart = System.currentTimeMillis();
    UpdateResponse optimizeResponse = client.optimize();
    for (Object object : optimizeResponse.getResponse()) {
        m_logger.info("Solr optimizeResponse" + object.toString());
    }
    if (optimizeResponse != null) {
        m_logger.info(String.format(
                " Elapsed Time(in ms) - %d, QTime (in ms) - %d",
                optimizeResponse.getElapsedTime(),
                optimizeResponse.getQTime()));
    }
    m_logger.info(String.format(
            "Time spent on Optimizing a collection %s : "
                    + (System.currentTimeMillis() - optimizeStart) / 1000
                    + " seconds", collectionName));
} catch (Exception e) {
    m_logger.error("Failed during explicit optimize on collection "
            + collectionName, e);
} finally {
    if (client != null) {
        try {
            client.close();
        } catch (IOException e) {
            throw new RuntimeException(
                    "Failed to close CloudSolrClient connection.", e);
        }
        client = null;
    }
}
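For the HTTP route mentioned in the question, a rough sketch is to send an update request with optimize=true; with waitSearcher=true the response should not come back until the new searcher is opened after the merge, so it behaves roughly as synchronously as optimize() above (host, port and collection name are placeholders, and this assumes your Solr version accepts a plain GET on the update handler):
import java.net.HttpURLConnection;
import java.net.URL;

// Hedged sketch: trigger an optimize over HTTP and wait for the response.
URL url = new URL("http://localhost:8983/solr/mycollection/update?optimize=true&waitSearcher=true");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("GET");
int status = conn.getResponseCode(); // blocks until Solr answers the request
System.out.println("Optimize request returned HTTP " + status);
conn.disconnect();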
