How to scan a table in DynamoDB with a time period filter? - java

I have a table in DynamoDB with an attribute 'createDate', and I want to do a scan using a filter on a specific period of that attribute (for example: 2022-01-01 to 2022-01-31), but I don't know if that's possible or how to do it. If anyone has done this and can help me, it would be very helpful.
Just one more question: is it possible to put the result in a CSV file?
Here is my code, where I can scan with a single date:
public class QueryTableResearchAnswers {
static AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().build();
static DynamoDB dynamoDB = new DynamoDB(client);
static String tableName = "research-answers";
public static void main(String[] args) throws Exception {
String researchAnswers = "Amazon DynamoDB";
findAnswersWithinTimePeriod(researchAnswers);
//findRepliesPostedWithinTimePeriod(researchAnswers);
}
private static void findAnswersWithinTimePeriod(String researchAnswers) {
Table table = dynamoDB.getTable(tableName);
Map<String, Object> expressionAttributeValues = new HashMap<String, Object>();
expressionAttributeValues.put(":startDate", "2022-01-01T00:00:00.0Z" );
ItemCollection<ScanOutcome> items = table.scan("createDate >= :startDate", // FilterExpression
"bizId, accountingsessionid, accounttype, acctsessionid, choicecode, contextname, createDate, document, framedipaddress," +
"macaddress, macaddressnetworkdata, machash, mail, nasgrelocalip, nasidentifier, nasipaddress, nasportid, network, networktype, networkuuid, phone," +
"question, questionanswer, questioncode, realm, relayingmacaddress, remoteipaddress, useragent, username", // ProjectionExpression
null, // ExpressionAttributeNames - not used in this example
expressionAttributeValues);
System.out.println("Scan of " + tableName + " for january answers");
Iterator<Item> iterator = items.iterator();
while (iterator.hasNext()) {
System.out.println(iterator.next().toJSONPretty());
}
}
}

In general, for an arbitrary date range, the filter expression is:
createDate BETWEEN :date1 AND :date2
But in your specific case of 2022-01-01 to 2022-01-31 (the entire month of January), you can simplify this to:
begins_with(createDate, :prefix)
with :prefix bound to "2022-01" (filter expressions don't allow literals, so the prefix has to be passed as an expression attribute value).
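To cover the CSV part of the question as well, here is a minimal sketch of a BETWEEN scan whose results are written out as CSV. It uses the same Document API overload as the code above, plus java.io.PrintWriter/FileWriter for the output; the trimmed projection list, the answers.csv path, and the naive comma-joining (no quoting or escaping) are placeholder choices for the example:

Table table = dynamoDB.getTable(tableName);

Map<String, Object> values = new HashMap<>();
values.put(":date1", "2022-01-01T00:00:00.0Z");
values.put(":date2", "2022-01-31T23:59:59.999Z");

ItemCollection<ScanOutcome> items = table.scan(
        "createDate BETWEEN :date1 AND :date2", // FilterExpression
        "bizId, createDate, mail",              // ProjectionExpression, trimmed for the example
        null,                                   // ExpressionAttributeNames - not used
        values);

// Write one CSV line per item (no escaping, so this assumes values contain no commas).
try (PrintWriter out = new PrintWriter(new FileWriter("answers.csv"))) {
    out.println("bizId,createDate,mail"); // header row
    for (Item item : items) {
        out.println(item.getString("bizId") + "," + item.getString("createDate") + "," + item.getString("mail"));
    }
}

Note that a Scan with a FilterExpression still reads the entire table (the filter is applied after the read), so for large tables a Query against an index keyed on the date may be worth considering.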

Related

Spring batch - JdbcPagingItemReader, SQLServerException: The column name XYZ is not valid

I have a job that reads a list of documents from a SQL Server database. The documents need to be in certain statuses and sorted by the column status_updated_time.
I want to read a document id and then process it in the job processor, following the Driving Query Based ItemReaders pattern.
The status column is changed in the writer, so I can't use JpaPagingItemReader because of this problem.
I used JdbcPagingItemReader but got an error when sorting by status_updated_time.
Then I tried adding id to the sort keys as well, but that didn't help.
Query that I want to get is:
SELECT id
FROM document
WHERE status IN (0, 1, 2)
ORDER BY status_updated_time ASC, id ASC
My reader:
@StepScope
@Bean
private ItemReader<Long> statusReader() {
JdbcPagingItemReader<Long> reader = new JdbcPagingItemReader<>();
...
reader.setRowMapper(SingleColumnRowMapper.newInstance(Long.class));
...
Map<String, Order> sortKeys = new HashMap<>();
sortKeys.put("status_updated_time", Order.ASCENDING);
sortKeys.put("id", Order.ASCENDING);
SqlServerPagingQueryProvider queryProvider = new SqlServerPagingQueryProvider();
queryProvider.setSelectClause(SELECT_CLAUSE);
queryProvider.setFromClause(FROM_CLAUSE);
queryProvider.setWhereClause(WHERE_CLAUSE);
queryProvider.setSortKeys(sortKeys);
reader.setQueryProvider(queryProvider);
...
return reader;
}
Where constants are:
private static final String SELECT_CLAUSE = "id";
private static final String FROM_CLAUSE = "document";
private static final String WHERE_CLAUSE = "status IN (0, 1, 2) ";
When the job is executed I get this error:
org.springframework.dao.TransientDataAccessResourceException: StatementCallback; SQL [SELECT TOP 10 id FROM document WHERE status IN (0, 1, 2) ORDER BY id ASC, status_updated_time ASC]; The column name status_updated_time is not valid.; nested exception is com.microsoft.sqlserver.jdbc.SQLServerException: The column name status_updated_time is not valid.
at org.springframework.jdbc.support.SQLStateSQLExceptionTranslator.doTranslate(SQLStateSQLExceptionTranslator.java:110)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:72)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81)
at org.springframework.jdbc.core.JdbcTemplate.translateException(JdbcTemplate.java:1443)
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:388)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:452)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:462)
at org.springframework.batch.item.database.JdbcPagingItemReader.doReadPage(JdbcPagingItemReader.java:210)
at org.springframework.batch.item.database.AbstractPagingItemReader.doRead(AbstractPagingItemReader.java:108)
at org.springframework.batch.item.support.AbstractItemCountingItemStreamItemReader.read(AbstractItemCountingItemStreamItemReader.java:92)
at org.springframework.batch.core.step.item.SimpleChunkProvider.doRead(SimpleChunkProvider.java:94)
at org.springframework.batch.core.step.item.FaultTolerantChunkProvider.read(FaultTolerantChunkProvider.java:87)
at org.springframework.batch.core.step.item.SimpleChunkProvider$1.doInIteration(SimpleChunkProvider.java:119)
I saw some questions regarding "The column name XYZ is not valid" on Stack Overflow (this...) but haven't seen anything that works in my case, where I need to sort by another column.
Another problem is the order of the sort columns.
No matter whether I add status_updated_time or id to the map first, the sorting in the generated script is always ORDER BY id ASC, status_updated_time ASC.
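(Side note: that symptom is what you would expect if the sort keys are kept in a plain HashMap, which does not preserve insertion order; the paging query provider builds the ORDER BY clause by iterating the map. A LinkedHashMap keeps the keys in insertion order — a minimal sketch:)

// LinkedHashMap preserves insertion order, so status_updated_time stays first in ORDER BY
Map<String, Order> sortKeys = new LinkedHashMap<>();
sortKeys.put("status_updated_time", Order.ASCENDING);
sortKeys.put("id", Order.ASCENDING);
queryProvider.setSortKeys(sortKeys);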
EDIT:
Reading this question, especially this line:
JdbcPagingItemReader assumes here that the sort key and the column in the select clause are called exactly the same
I realized that I need the column status_updated_time in the result set, so I refactored:
private static final String SELECT_CLAUSE = "id, status_updated_time";
...
queryProvider.setSelectClause(SELECT_CLAUSE);
...
reader.setRowMapper(
(rs, i) -> {
Document document = new Document();
document.setId(rs.getLong(1));
document.setStatusUpdatedTime(rs.getObject(2, Timestamp.class));
return document;
}
);
Now the application compiles and the job runs.
But the problem with the sorting stays the same: I can't order first by status_updated_time and then by id; id always comes first.
I tried removing id from the sort keys and ran into another problem.
On the test environment I had 1600 rows to process. My job processes a row and updates its status_updated_time to now(). When the job started processing, it didn't stop at 1600 rows but kept going: each processed row got a new status_updated_time, so the reader considered it a new row and kept processing endlessly.
When sorting only by id, the job processed 1600 rows and then stopped.
So it seems I can't use JdbcPagingItemReader because of this sorting problem.
I also wanted a reader that can run in parallel to speed up the job (it runs for about 20 minutes every hour of the day).
Any suggestions?
I want to thank Mahmoud for monitoring Spring Batch questions and trying to help. His proposal didn't help me, though, so I used a different approach.
I used a temporary (auxiliary) table to prepare the data for the main step's execution; in the main step, the reader reads from that table.
The first step drops the helper table:
@Bean
private Step dropHelpTable() {
return stepBuilderFactory
.get(STEP_DROP_HELP_TABLE)
.transactionManager(cronTransactionManager)
.tasklet(dropHelpTableTasklet())
.build();
}
private Tasklet dropHelpTableTasklet() {
return (contribution, chunkContext) -> {
jdbcTemplate.execute(DROP_SCRIPT);
return RepeatStatus.FINISHED;
};
}
private static final String STEP_DROP_HELP_TABLE = "dropHelpTable";
private static final String DROP_SCRIPT = "IF EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.TABLES "
+ "WHERE TABLE_NAME = 'query_document_helper') "
+ "BEGIN "
+ " DROP TABLE query_document_helper "
+ "END";
The second step prepares the data, inserting the ids of the documents that will be processed later:
@Bean
private Step insertDataToHelpTable() {
return stepBuilderFactory
.get(STEP_INSERT_HELP_TABLE)
.transactionManager(cronTransactionManager)
.tasklet(insertDataToHelpTableTasklet())
.build();
}
private Tasklet insertDataToHelpTableTasklet() {
return (contribution, chunkContext) -> {
jdbcTemplate.execute("SELECT TOP " + limit + " id " + INSERT_SCRIPT);
return RepeatStatus.FINISHED;
};
}
private static final String STEP_INSERT_HELP_TABLE = "insertHelpTable";
private static final String INSERT_SCRIPT = "INTO query_document_helper "
+ "FROM dbo.document "
+ "WHERE status IN (0, 1, 2) "
+ "ORDER BY status_updated_time ASC";
@Value("${cron.batchjob.queryDocument.limit}")
private Integer limit;
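(For reference, with limit resolved to e.g. 100, the tasklet above executes: SELECT TOP 100 id INTO query_document_helper FROM dbo.document WHERE status IN (0, 1, 2) ORDER BY status_updated_time ASC.)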
After this I have all the data that will be used in one job execution, so ordering by status_updated_time is no longer needed (the condition was that the youngest documents should not be processed in this job execution, but in some later execution, once they have become the oldest).
And then in the next step I use a regular reader:
@Bean
private Step queryDocumentStep() {
return stepBuilderFactory
.get(STEP_QUERY_NEW_DOCUMENT_STATUS)
.transactionManager(cronTransactionManager)
.<Long, Document>chunk(chunk)
.reader(documentReader())
...
.taskExecutor(multiThreadingTaskExecutor.threadPoolTaskExecutor())
.build();
}
@StepScope
@Bean
private ItemReader<Long> documentReader() {
JdbcPagingItemReader<Long> reader = new JdbcPagingItemReader<>();
reader.setDataSource(coreBatchDataSource);
reader.setMaxItemCount(limit);
reader.setPageSize(chunk);
...
Map<String, Order> sortKeys = new HashMap<>();
sortKeys.put("id", Order.ASCENDING);
SqlServerPagingQueryProvider queryProvider = new SqlServerPagingQueryProvider();
queryProvider.setSelectClause(SELECT_CLAUSE);
queryProvider.setFromClause(FROM_CLAUSE);
queryProvider.setSortKeys(sortKeys);
reader.setQueryProvider(queryProvider);
...
return reader;
}
private static final String STEP_QUERY_NEW_DOCUMENT_STATUS = "queryNewDocumentStatus";
private static final String SELECT_CLAUSE = "id";
private static final String FROM_CLAUSE = "query_document_helper";
And job looks like this:
@Bean
public Job queryDocumentJob() {
return jobBuilderFactory
.get(JOB_QUERY_DOCUMENT)
.incrementer(new RunIdIncrementer())
.start(dropHelpTable())
.next(insertDataToHelpTable())
.next(queryDocumentStep())
.build();
}
private static final String JOB_QUERY_DOCUMENT = "queryDocument";

How to use BeanMapHandler with field-to-column mapping, with the ID as key, in a nested SELECT query?

I have a nested SQL query to fetch employee details using their ID.
Right now I am using BeanListHandler to fetch the data as a List<Details>, but I want to store it as a Map<String, Details>, where the ID I originally pass is the key, for easy retrieval instead of searching the List with streams every time.
I have tried converting to Maps, but I am not sure how to map the ID as a String, nor how to get the original ID passed to the inner query as a column in the final result.
MainTest.java:
String candidateId = "('1111', '2222', '3333', '4444')";
String detailsQuery =
"select PARTNER, BIRTHDT, XSEXM, XSEXF from \"schema\".\"platform.view/table2\" where partner IN \r\n"
+ "(select SID from \"schema\".\"platform.view/table1\" where TYPE='BB' and CLASS='yy' and ID IN \r\n"
+ "(select SID from \"schema\".\"platform.view/table1\" where TYPE='AA' and CLASS='zz' and ID IN"
+ candidateId + "\r\n" + "))";
Map<String, Details> detailsView = queryRunner.query(conn, detailsQuery, new DetailsViewHandler());
Details.java:
public class Details {
private String candidateId;
private String birthDate;
private String maleSex;
private String femaleSex;
// getter and setter
}
DetailsViewHandler.java:
public class DetailsViewHandler extends BeanMapHandler<String, Details> {
public DetailsViewHandler() {
super(Details.class, new BasicRowProcessor(new BeanProcessor(getColumnsToFieldsMap())));
}
public static Map<String, String> getColumnsToFieldsMap() {
Map<String, String> columnsToFieldsMap = new HashMap<>();
columnsToFieldsMap.put("PARTNER", "candidateId");
columnsToFieldsMap.put("BIRTHDT", "birthDate");
columnsToFieldsMap.put("XSEXM", "maleSex");
columnsToFieldsMap.put("XSEXF", "femaleSex");
return columnsToFieldsMap;
}
}
Is there a way to get the ID (candidateId) into the result, and what am I missing in terms of creating the key-value pairing?
From the doc https://commons.apache.org/proper/commons-dbutils/apidocs/org/apache/commons/dbutils/handlers/BeanMapHandler.html
of the constructor you are using:
public BeanMapHandler(Class<V> type,
RowProcessor convert)
// Creates a new instance of BeanMapHandler. The value of the first column of each row will be a key in the Map.
Since PARTNER is the first column in your SELECT, the above should work.
You can also try overriding createKey, like so:
@Override
protected String createKey(ResultSet rs) throws SQLException {
return rs.getString("PARTNER"); // or getInt, whatever suits
}
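Putting the two together, the handler with the key override could look like this (a sketch, reusing the column-to-field map from the question):

public class DetailsViewHandler extends BeanMapHandler<String, Details> {

    public DetailsViewHandler() {
        super(Details.class, new BasicRowProcessor(new BeanProcessor(getColumnsToFieldsMap())));
    }

    public static Map<String, String> getColumnsToFieldsMap() {
        Map<String, String> columnsToFieldsMap = new HashMap<>();
        columnsToFieldsMap.put("PARTNER", "candidateId");
        columnsToFieldsMap.put("BIRTHDT", "birthDate");
        columnsToFieldsMap.put("XSEXM", "maleSex");
        columnsToFieldsMap.put("XSEXF", "femaleSex");
        return columnsToFieldsMap;
    }

    // Key each map entry explicitly by the PARTNER column rather than relying on column position.
    @Override
    protected String createKey(ResultSet rs) throws SQLException {
        return rs.getString("PARTNER");
    }
}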

Java BigQuery API to list table data

I am trying to list table data from BigQuery using Java; however, I can't find how to configure the API to set the maximum number of rows per call.
public class QuickstartSample {
public static void main(String... args) throws Exception {
GoogleCredentials credentials;
File credentialsPath = new File("/Users/gaurang.shah/Downloads/fb3735b731b9.json"); // TODO: update to your key path.
FileInputStream serviceAccountStream = new FileInputStream(credentialsPath);
credentials = ServiceAccountCredentials.fromStream(serviceAccountStream);
BigQuery bigquery = BigQueryOptions.newBuilder().
setCredentials(credentials).
setProjectId("bigquery-public-data").
build().
getService();
Dataset hacker_news = bigquery.getDataset("hacker_news");
Table comments = hacker_news.get("comments");
TableResult result = comments.list();
for (FieldValueList row : result.iterateAll()) {
// do something with the row
System.out.println(row);
}
}
}
To limit the number of rows per call, you can use the listTableData method with the TableDataListOption.pageSize(n) parameter.
The following example fetches the results in pages of 100 rows (note that iterateAll() still transparently fetches every page):
String datasetName = "my_dataset_name";
String tableName = "my_table_name";
TableId tableIdObject = TableId.of(datasetName, tableName);
TableResult tableData =
bigquery.listTableData(tableIdObject, TableDataListOption.pageSize(100));
for (FieldValueList row : tableData.iterateAll()) {
// do something with the row
}
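If you want to control the paging yourself instead of letting iterateAll() fetch every page, TableResult implements the Page interface, so something like this sketch works:

TableResult page = bigquery.listTableData(tableIdObject, TableDataListOption.pageSize(100));
while (page != null) {
    for (FieldValueList row : page.getValues()) { // rows of the current page only
        // do something with the row
    }
    page = page.hasNextPage() ? page.getNextPage() : null;
}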

Is it possible to change the value of a Range Key in a DynamoDB table?

I know it may be a very silly question, but I am new to DynamoDB.
My question is: is it possible to update the value of a Range Key in DynamoDB?
Suppose my table is "TEST":
{
ID : PK/HK
Date : RK
Name : GSI
Add : LSI
}
I want to modify the Date attribute.
The initial values in the table were:
{
ID = "344"
Date = "5656"
Name = "ABC"
}
Running the code below, I am able to change the Name attribute, which is the GSI:
Map<String,AttributeValue> item = new HashMap<String,AttributeValue>();
item.put("ID", new AttributeValue("344"));
item.put("Date", new AttributeValue("5656"));
Map<String,AttributeValueUpdate> item1 = new HashMap<String,AttributeValueUpdate>();
AttributeValueUpdate update = new AttributeValueUpdate().withValue(new AttributeValue("AMIT")).withAction("PUT");
item1.put("Name", update);
UpdateItemRequest updateItemreq = new UpdateItemRequest("Test",item,item1);
UpdateItemResult updateItemres = dynamoDBUSEast.updateItem(updateItemreq);
But when I change this line
item1.put("Name", update);
to
item1.put("Date", update);
I get this error:
Exception in thread "main" com.amazonaws.AmazonServiceException: One or more parameter values were invalid: Cannot update attribute Date. This attribute is part of the key (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ValidationException; Request ID: HRRP24Q7C48AMD8ASAI992L6MBVV4KQNSO5AEMVJF66Q9ASUAAJG)
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:820)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:439)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:245)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:2908)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.updateItem(AmazonDynamoDBClient.java:1256)
So is it possible to change the Range Key value?
No, like the exception message states, you Cannot update attribute Date. This attribute is part of the key.
You can also see this under the AttributeUpdates documentation:
The names of attributes to be modified, the action to perform on each,
and the new value for each. If you are updating an attribute that is
an index key attribute for any indexes on that table, the attribute
type must match the index key type defined in the AttributesDefinition
of the table description. You can use UpdateItem to update any nonkey
attributes.
The documentation quoted above also allows updating "an attribute that is an index key attribute for any indexes on that table": when you update an attribute that is part of an index key (but not part of the table's primary key), that index is updated as well to reflect the change.
From the docs of AttributeValueUpdate:
You cannot use UpdateItem to update any primary key attributes.
Instead, you will need to delete the item, and then use PutItem to
create a new item with new attributes.
It's a little buried, but the docs for UpdateItem state:
"You can use UpdateItem to update any nonkey attributes."
So currently the only way to update the primary key of an item is to delete the old item and write a new one.
Here is my implementation of updating an id in .NET by deleting the item and then recreating it with the new id. I assume Java is very similar:
// Based on https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/LowLevelDotNetItemsExample.html
public class UpdateId
{
private static string tableName = "MyTableName";
private static AmazonDynamoDBClient client = new AmazonDynamoDBClient();
private static bool isVerbose = false;
public static void ChangeId(string currentId, string newId)
{
try
{
var deletedItem = DeleteItem(currentId);
if (deletedItem.Count == 0)
{
Console.WriteLine($"ERROR: Item to delete not found: {currentId}");
return;
}
deletedItem["Id"] = new AttributeValue
{
S = newId
};
CreateItem(deletedItem);
var updatedItem = RetrieveItem(newId);
if (updatedItem.Count > 0 && updatedItem["Id"].S == newId)
{
Console.WriteLine($"Item id successfully changed from ({currentId}) to ({newId})");
}
else
{
Console.WriteLine($"ERROR: Item id didn't change from ({currentId}) to ({newId})");
}
}
catch (Exception e)
{
Console.WriteLine(e.Message);
Console.WriteLine("To continue, press Enter");
Console.ReadLine();
}
}
private static void CreateItem(Dictionary<string, AttributeValue> item)
{
var request = new PutItemRequest
{
TableName = tableName,
Item = item
};
client.PutItem(request);
}
private static Dictionary<string, AttributeValue> RetrieveItem(string id)
{
var request = new GetItemRequest
{
TableName = tableName,
Key = new Dictionary<string, AttributeValue>()
{
{ "Id", new AttributeValue {
S = id
} }
},
ConsistentRead = true
};
var response = client.GetItem(request);
// Check the response.
var attributeList = response.Item; // attribute list in the response.
if (isVerbose)
{
Console.WriteLine("\nPrinting item after retrieving it ............");
PrintItem(attributeList);
}
return attributeList;
}
private static Dictionary<string, AttributeValue> DeleteItem(string id)
{
var request = new DeleteItemRequest
{
TableName = tableName,
Key = new Dictionary<string, AttributeValue>()
{
{ "Id", new AttributeValue {
S = id
} }
},
// Return the entire item as it appeared before the update.
ReturnValues = "ALL_OLD",
// ExpressionAttributeNames = new Dictionary<string, string>()
// {
// {"#IP", "InPublication"}
// },
// ExpressionAttributeValues = new Dictionary<string, AttributeValue>()
// {
// {":inpub",new AttributeValue {
// BOOL = false
// }}
// },
// ConditionExpression = "#IP = :inpub"
};
var response = client.DeleteItem(request);
// Check the response.
var attributeList = response.Attributes; // Attribute list in the response.
// Print item.
if (isVerbose)
{
Console.WriteLine("\nPrinting item that was just deleted ............");
PrintItem(attributeList);
}
return attributeList;
}
private static void PrintItem(Dictionary<string, AttributeValue> attributeList)
{
foreach (KeyValuePair<string, AttributeValue> kvp in attributeList)
{
string attributeName = kvp.Key;
AttributeValue value = kvp.Value;
Console.WriteLine(
attributeName + " " +
(value.S == null ? "" : "S=[" + value.S + "]") +
(value.N == null ? "" : "N=[" + value.N + "]") +
(value.SS == null ? "" : "SS=[" + string.Join(",", value.SS.ToArray()) + "]") +
(value.NS == null ? "" : "NS=[" + string.Join(",", value.NS.ToArray()) + "]")
);
}
Console.WriteLine("************************************************");
}
}
To call it just do this:
UpdateId.ChangeId("OriginalId", "NewId");
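For completeness, here is a rough Java sketch of the same delete-then-recreate idea, using the v1 AWS SDK for Java. The table name and the assumption that the only key attribute is a string Id are placeholders; if your table also has a range key, put both attributes in the key map:

AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().build();
String tableName = "MyTableName";

Map<String, AttributeValue> key = new HashMap<>();
key.put("Id", new AttributeValue("OriginalId"));

// Delete the old item and ask DynamoDB to return all of its attributes.
DeleteItemResult deleted = client.deleteItem(new DeleteItemRequest()
        .withTableName(tableName)
        .withKey(key)
        .withReturnValues("ALL_OLD"));

Map<String, AttributeValue> item = deleted.getAttributes();
if (item != null && !item.isEmpty()) {
    item.put("Id", new AttributeValue("NewId")); // swap in the new key value
    client.putItem(new PutItemRequest().withTableName(tableName).withItem(item));
}

Keep in mind the delete and the put are two separate calls, so this is not atomic; if that matters, the two writes can be bundled in a TransactWriteItems request.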

Iterator - How to traverse for specific records

I have executed a query using JDBC, and while traversing the ResultSet I have stored all the fields in a List in Java.
List<String> dataList=new ArrayList<String>();
while(res.next())
{
dataList.add(res.getString(1));
dataList.add(res.getString(2));
dataList.add(res.getString(3));
dataList.add(res.getString(4));
dataList.add(res.getString(5));
dataList.add(res.getString(6));
dataList.add(res.getString(7));
}
Iterator<String> it= dataList.iterator();
Since I added the fields directly into the list, how can I get these 7 fields back while traversing the iterator?
Meaning:
while(it.hasNext())
{
String f1=it.next();
}
Likewise, every time I want 7 fields at a time, then the next 7, the next 7... and so on.
Using this while loop, how can I get those 7 fields (one row in the table has 7 fields) at a time?
I'm a little confused here. Please help me.
Thanks
What you actually want to do is create another object that stores all seven values, then build a list of those objects so that you can access one row at a time, which is what I think you are asking.
First, create a class for the row:
private static class Entry {
String[] row;
public Entry(ResultSet r) throws SQLException {
row = new String[7];
for (int i = 1; i <= 7; i++) {
row[i - 1] = r.getString(i); // ResultSet columns are 1-based, the array is 0-based
}
}
}
Using that, you can then create a list of Entry objects.
List<Entry> entryList = new ArrayList <Entry> ();
while(res.next())
{
entryList.add ( new Entry ( res ) );
}
Then, you can go ahead and loop through entryList and get any specific entry you would want.
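For example, with the array-based Entry above, reading the first column of each row looks like this (a small sketch):

for (Entry entry : entryList) {
    String firstField = entry.row[0]; // columns 1..7 of the row sit at indices 0..6
    // ... use the remaining fields as needed
}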
Of course, if you have specific values, it might be wise to give Entry named String instance variables rather than an array of Strings.
By that I mean you could do this:
private static class Entry {
String column1; // rather than column1, name it after what the column semantically represents
String column2;
// ...
public Entry(ResultSet r) throws SQLException {
column1 = r.getString(1);
// ...
}
}
This way, you can also make calls like r.getInt(i) for columns that have a type other than String.
Good luck!
I think your List declaration should be
List<some DAO object> instead of List<String>.
While fetching from the ResultSet, create a DAO object, add all the fetched data into that object, and then add that object to the list.
Then you can iterate and get one DAO object per iteration.
You can use the DatabaseMetaData class:
private static final String DRIVER = "com.mysql.jdbc.Driver";
private static final String URL = "jdbc:mysql://localhost/testdb";
private static final String USERNAME = "root";
private static final String PASSWORD = "";
public static void main(String[] args) throws Exception {
Class.forName(DRIVER);
Connection connection = DriverManager.getConnection(URL, USERNAME, PASSWORD);
DatabaseMetaData metadata = connection.getMetaData();
ResultSet resultSet = metadata.getColumns(null, null, "users", null);
while (resultSet.next()) {
String name = resultSet.getString("COLUMN_NAME");
String type = resultSet.getString("TYPE_NAME");
int size = resultSet.getInt("COLUMN_SIZE");
System.out.println("Column name: [" + name + "]; type: [" + type + "]; size: [" + size + "]");
}
connection.close();
}
