How do I make sure my N1QL query considers recent changes? - java

My situation is that I have the 3 following methods (I used couchbase-java-client 2.2 from Scala, and the Couchbase Server version is 4.1):
def findAll() = {
  bucket.query(N1qlQuery.simple(select("*").from(i(DatabaseBucket.USER))))
    .allRows().toList
}

def findById(id: UUID) = {
  Option(bucket.get(id.toString, classOf[RawJsonDocument])).map(i => read[User](i.content()))
}

def upsert(i: User) = {
  bucket.async().upsert(RawJsonDocument.create(i.id.toString, write(i)))
}
Basically, they are insert, find one by id, and findAll. I did an experiment where:
I insert a User and then call findById right after that: I get the user I just inserted, as expected.
I insert and then call findAll right after that: it returns nothing.
I insert, wait 3 seconds, and then call findAll: now I can find the one I inserted.
Based on that, I suspect the N1QL query only searches over a cached layer rather than the "persisted" layer. So how can I force it to search the "persisted" layer?

In Couchbase 4.0 with N1QL, there are different scan consistency levels you can specify when querying, which correspond to different costs for updates/changes to propagate through index recalculation. These aren't tied to whether or not data is persisted; rather, they're an option you choose when you issue the query. The default is "not bounded", and to make sure that your upsert request is taken into consideration you'll want to issue this query as "request plus".
To get the effect you're looking for, you'll want to add N1qlParams to your creation of the N1qlQuery by using another form of the simple() method. Build the N1qlParams with ScanConsistency.REQUEST_PLUS. You can read more about this in Couchbase's Developer Guide; there's a Java API example of it. With that change, you won't need the sleep() in there: the system will automatically service the query request once index recalculation has reached your specified level.
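For illustration, a minimal Java sketch of that change (the question's code is Scala, but the 2.2 client API is the same; the "USER" bucket/keyspace name here is a placeholder):

import com.couchbase.client.java.query.N1qlParams;
import com.couchbase.client.java.query.N1qlQuery;
import com.couchbase.client.java.query.consistency.ScanConsistency;
import static com.couchbase.client.java.query.Select.select;
import static com.couchbase.client.java.query.dsl.Expression.i;

// Ask the query service to wait until the index has caught up with all
// mutations made before this request (REQUEST_PLUS scan consistency).
N1qlParams params = N1qlParams.build().consistency(ScanConsistency.REQUEST_PLUS);
N1qlQuery query = N1qlQuery.simple(select("*").from(i("USER")), params);
// bucket.query(query).allRows() will now see the document you just upserted.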
Depending on how you're using this elsewhere in your application, there are times you may want either consistency level.

You need stronger scan consistency. Add an N1qlParams to the query, built with consistency(ScanConsistency.REQUEST_PLUS).

Related

Retrieve order by status using Salesforce REST API

I have a Salesforce app that allows me to execute REST API calls, and I need to retrieve orders (/services/data/v47.0/sobjects/Order) by status.
I've found a manual that describes similar filtering on another entity (https://developer.salesforce.com/docs/atlas.en-us.api_placeorder.meta/api_placeorder/sforce_placeorder_rest_api_standalone.htm).
However, when trying to execute the following request, it seems that all statuses are returned:
GET /services/data/v47.0/sobjects/Order?order.status='ddd'
I also tried some variations of query params. Is this functionality supported?
The /sobjects service lets you learn dynamically which fields (standard and custom) exist in the Order table (or any other, really), what types they are, the picklist values, and so on.
To retrieve actual data you can use the query resource. (Salesforce uses a dialect of SQL called SOQL. If you've never used it before, it'll look a bit weird the moment you want to do any JOINs; it would be nice if an SF developer could fill you in.)
This might be a good start
/services/data/v47.0/query/?q=SELECT Id, Name, OrderNumber FROM Order WHERE Status = 'Draft' LIMIT 10
I've never seen the API you've linked to; interesting stuff. But I don't see anything obvious there that would let you filter by status, so the more generic "query anything you wish" approach might work better for you. Play with it a bit; perhaps https://developer.salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/dome_query.htm will suit your needs better.
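If it helps, here's a rough Java sketch of calling that query endpoint over REST; the instance URL and OAuth access token are placeholders you'd obtain from your own auth flow:

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class SoqlQueryExample {
    public static void main(String[] args) throws Exception {
        String instanceUrl = "https://yourInstance.my.salesforce.com"; // placeholder
        String accessToken = "<OAuth access token>";                   // placeholder
        String soql = "SELECT Id, Name, OrderNumber FROM Order WHERE Status = 'Draft' LIMIT 10";

        // The SOQL string goes URL-encoded into the q parameter of the query resource.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(instanceUrl + "/services/data/v47.0/query/?q="
                        + URLEncoder.encode(soql, StandardCharsets.UTF_8)))
                .header("Authorization", "Bearer " + accessToken)
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // JSON with totalSize, done, records[]
    }
}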

Indexes update in DynamoDB

I've been working with LSI and GSI in DynamoDB, but I guess I'm missing something.
I created an index so that I could always query for the latest results without using the partition key, only other attributes, and without reading entire items, only those that really matter. But with the GSI, at some point my query returns data that is not up to date; I understand this is due to the eventual consistency described in the docs (you may correct me if I'm wrong).
And what about the LSI? Even using ConsistentRead, at some point my data is not being queried correctly and the results are not up to date. From the docs I thought that an LSI was updated synchronously with its table, and that with the ConsistentRead property set I'd always get the latest results, but this is not happening.
I'm using a REST endpoint (API Gateway) to perform inserts into my DynamoDB table (I do some processing before the insertion), so I've been wondering whether that has something to do with it: maybe the code (currently Java) or DynamoDB is slow to update and, since everything seems to work fine in my endpoint, I fire the next request too fast; or maybe I have to wait a little longer before interacting with the table because the index is still being updated. However, I have already tried waiting longer and I still get the same wrong results. I'm a bit lost here.
This is the code I'm using to query the index:
import com.amazonaws.services.dynamodbv2.document.spec.QuerySpec;
import com.amazonaws.services.dynamodbv2.document.utils.NameMap;
import com.amazonaws.services.dynamodbv2.document.utils.ValueMap;

QuerySpec spec = new QuerySpec()
        .withKeyConditionExpression("#c = :v_attrib1 and #e = :v_attrib2")
        .withNameMap(new NameMap()
                .with("#c", "attrib1")
                .with("#e", "attrib2"))
        .withValueMap(new ValueMap()
                .withString(":v_attrib1", attrib1Value)
                .withString(":v_attrib2", attrib2Value))
        .withMaxResultSize(1)         // to only bring back the latest one
        .withConsistentRead(true)     // is this wrong?
        .withScanIndexForward(false); // what about this one?
I don't know whether the Maven library version would interfere, but in any case the version I'm using is 1.11.76 (I know there are many newer versions, but if this is the problem we'll update it then).
Thank you all in advance.
After searching for quite some time and running some more tests, I finally figured out that the problem was not in the DynamoDB indexes (they are working as expected) but in the Lambda functions.
The fact that I was sending a lot of requests one after another was not giving the indexes time to stay updated: Lambda functions execute asynchronously (I should have known), so the requests received by the database were not ordered and my data was not being updated properly. So we changed our implementation to use Atomic Counters: that way we can keep our data correct no matter the number or the order of the requests.
See: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/WorkingWithItems.html#WorkingWithItems.AtomicCounters
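For reference, a sketch of such an atomic counter update using the same 1.11.x document API; the table name, key, and counter attribute are made up for the example:

import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.dynamodbv2.document.spec.UpdateItemSpec;
import com.amazonaws.services.dynamodbv2.document.utils.ValueMap;
import com.amazonaws.services.dynamodbv2.model.ReturnValue;

DynamoDB dynamoDB = new DynamoDB(AmazonDynamoDBClientBuilder.defaultClient());
Table table = dynamoDB.getTable("myTable"); // hypothetical table name

// The increment happens server-side, so concurrent requests cannot overwrite
// each other's updates regardless of the order in which they arrive.
UpdateItemSpec counterUpdate = new UpdateItemSpec()
        .withPrimaryKey("id", "some-item-id") // hypothetical key
        .withUpdateExpression("SET itemCount = if_not_exists(itemCount, :zero) + :inc")
        .withValueMap(new ValueMap().withInt(":inc", 1).withInt(":zero", 0))
        .withReturnValues(ReturnValue.UPDATED_NEW);
table.updateItem(counterUpdate);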

How to validate if a record exists when issuing a REST update request using spring jdbcTemplate?

I have a simple database table users with 3 columns:
| id | username | nationality |
| 1 | John | American |
| 2 | Doe | English |
I want to issue an update via a POST request to http://mysite/users/2/nationality
Now my initial approach was to issue the update query
UPDATE users SET nationality='French' WHERE id=2; followed by a query for the updated object, SELECT * FROM users WHERE id=2; and then return the updated object in the response.
The problem is the id passed in the request may not exist in my database. How should I validate if a user exists in the database?
1. Should I just check if the query returns an object?
2. Should I validate the update first by checking the affected rows (affected rows will be zero if no change was made to the data being updated, so I can't throw a UserNotFoundException in that case)?
3. Is it better to issue a query before the update just to check if the row exists, then update, then query the updated row?
import java.sql.Types;
import org.springframework.jdbc.core.JdbcTemplate;

public void updateRecord(Long id, String username) {
    String updateSql = "UPDATE users SET username = ? WHERE id = ?";
    JdbcTemplate template = new JdbcTemplate(dataSource);
    Object[] params = { username, id };
    int[] types = { Types.VARCHAR, Types.BIGINT };
    int rows = template.update(updateSql, params, types);
    System.out.println(rows + " row(s) updated.");
}
If you always need the update to return the updated object in the response, then option 1 seems like a reasonable way to check if the update matched an existing user. Although if you aren't using transactions, you should be aware that the user may not exist at the time of the update, but a separate connection could insert the user before your select.
That said, without transactions there is always a chance that the select will return the object in a different state from the update you just performed. It is slightly worse in this case, though, because technically the update should have failed.
If you don't need the update to return the updated object in the response, then option 2 seems like a much better solution. For this to work, though, you need the update to return the number of matched rows rather than the number of changed rows (so if the update matches an existing user but the field you are updating doesn't change, you'll still get a non-zero result).
Usually you would have to set a connection attribute to make this work for MySQL (for example, in PHP's PDO driver there is the MYSQL_ATTR_FOUND_ROWS attribute). However, my understanding is that this option is already enabled in JDBC so executeUpdate should return the number of matched rows. I can't confirm that at the moment, but it should be easy enough for you to test.
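As a sketch of what option 2 looks like in code (reusing the users table from the question; UserNotFoundException is the exception type mentioned there and would be mapped to a 404 by your REST layer):

import javax.sql.DataSource;
import org.springframework.jdbc.core.JdbcTemplate;

public void updateNationality(DataSource dataSource, Long id, String nationality) {
    JdbcTemplate template = new JdbcTemplate(dataSource);
    // With MySQL Connector/J's default settings this is the number of *matched* rows,
    // so it stays non-zero even when the new value equals the old one.
    int rows = template.update(
            "UPDATE users SET nationality = ? WHERE id = ?", nationality, id);
    if (rows == 0) {
        throw new UserNotFoundException(id); // no such user -> let the REST layer return 404
    }
}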
The best approach in your case is to run a select query against the given id to verify that a corresponding record exists in your database. If the record exists, you can proceed with the success flow and run the update and select queries you mentioned above. Otherwise, if the record does not exist, you can proceed with the failure flow (throw exceptions, etc.).
I'd go with option 3 for the reusability and sanity aspect, unless you're worried about a couple of extra queries for (very strict and not obvious) performance reasons.
Since you'll likely reuse the code to retrieve the user in other places, I'd first retrieve the user, and return a 404 if he's not found. Then I'd call update and sanity check the number of rows changed. Finally, I'd call the retrieval method to get the user and marshall it into the response body. It's simple, it works, it's readable, it's predictable, it's testable, and it's most likely fast enough. And you've just reused your retrieval method.
I had a similar issue. Here is how I tackled it.
Whenever it's a new user, I mark the id with a sentinel number (e.g. 51002122), where 51002122 is never a real id in the db. So the page shows "/51002122/user". Whenever the id of the user is 51002122, I do an insert into the db. After the insert, I render the page with the id from the db; e.g. after insertion, the page would be "/27/user".
For all ids other than 51002122 (e.g. /12/user or /129/user) I do an update in the db, because I know that the user exists in the db.
Not sure if this is the right approach, but it works. Can someone suggest a better or more correct approach?
I think the safest way is:
SELECT EXISTS(
    SELECT *
    FROM users
    WHERE id = 3
) AS columnCount;
This will return 1 if a row with id = 3 exists and 0 otherwise. You can check whether columnCount is 1 and, if so, execute the update statement; otherwise do something else.
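With JdbcTemplate, a minimal sketch of that existence check might look like this (assuming MySQL/PostgreSQL-style EXISTS support):

import org.springframework.jdbc.core.JdbcTemplate;

public boolean userExists(JdbcTemplate template, Long id) {
    // EXISTS yields 1 or 0, which we read back as an Integer.
    Integer found = template.queryForObject(
            "SELECT EXISTS(SELECT 1 FROM users WHERE id = ?)", Integer.class, id);
    return found != null && found == 1;
}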
Before arriving at a solution, a few things need to be considered.
This is a REST API call, which calls for simplicity from the usage perspective.
The server-side code should also consider the performance implications of the chosen implementation.
The API should be robust, meaning that, come what may, the request should always follow the flows (happy/exception) conceived in the design.
Based on these considerations, I would suggest the following approach.
In the DAO, define two different methods, namely updateRecord(Long id, String username) and getRecord(Long id).
Mark the transaction attributes (@Transactional) of these methods as follows:
Mark the transaction attribute for updateRecord as REQUIRED.
Mark the transaction attribute for getRecord as NOT_REQUIRED, since this is purely a read call.
Note that in all cases, at least one DB call is required.
From the controller, call the updateRecord method first. This method will return an integer.
If the returned value is non-zero, call getRecord to retrieve the updated record from the database.
If the returned value is zero, that indicates the user does not exist and there is no need to call getRecord. An appropriate error response (404 Not Found) is then returned to the calling client.
In this approach, you will save on one database call when the user does not exist.
Overall this approach is neat, uncluttered and, most importantly, simple and efficient (we limit the transaction boundary to the update call only). Moreover, getRecord can be used independently as another API to retrieve a record (without a transaction).
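Put together, the controller flow described above could look roughly like this; UserDao and User are stand-ins for whatever your actual DAO and model classes are:

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class UserController {

    private final UserDao userDao; // hypothetical DAO exposing updateRecord/getRecord

    public UserController(UserDao userDao) {
        this.userDao = userDao;
    }

    @PostMapping("/users/{id}/nationality")
    public ResponseEntity<User> updateNationality(@PathVariable Long id,
                                                  @RequestBody String nationality) {
        int rows = userDao.updateRecord(id, nationality); // transactional (REQUIRED)
        if (rows == 0) {
            return ResponseEntity.notFound().build();      // user does not exist -> 404
        }
        return ResponseEntity.ok(userDao.getRecord(id));   // plain read (NOT_REQUIRED)
    }
}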
I had a similar issue where I had to update a table but first needed to check whether the id existed. I used OpenJPA and wrote a method verifyUser(id), where id was the one I needed to check. OpenJPA's findById returns the record. On update it returns the complete record, whereas on add it returns the new primary key under which the record was added. I am not sure how it works for Hibernate, but there are many similarities between JPA and Hibernate.

Handling very large amount of data in MyBatis

My goal is actually to dump all the data of a database to an XML file. The database is not terribly big; it's about 300MB. The problem is that I have a memory limitation of only 256MB (in the JVM), so obviously I cannot just read everything into memory.
I managed to solve this problem using iBatis (yes, I mean iBatis, not myBatis) by calling its getList(... int skip, int max) multiple times with an incremented skip. That does solve my memory problem, but I'm not impressed with the speed. The variable names suggest that what the method does under the hood is read the entire result set and then skip to the specified record. That sounds quite redundant to me (I'm not saying that's what the method is doing; I'm just guessing based on the variable names).
Now I have switched to myBatis 3 for the next version of my application. My question is: is there any better way to handle a large amount of data chunk by chunk in myBatis? Is there any way to make myBatis process the first N records, return them to the caller, and keep the result set connection open, so that the next time the user calls getList(...) it starts reading from record N+1 without doing any "skipping"?
myBatis CAN stream results. What you need is a custom result handler. With this you can take each row separately and write it to your XML file. The overall scheme looks like this:
session.select(
    "mappedStatementThatFindsYourObjects",
    parametersForStatement,
    resultHandler);
Where resultHandler is an instance of a class implementing the ResultHandler interface. This interface has just one method handleResult. This method provides you with a ResultContext object. From this context you can retrieve the row currently being read and do something with it.
public void handleResult(ResultContext context) {
    Object result = context.getResultObject();
    doSomething(result);
}
No, mybatis does not have full capability to stream results yet.
EDIT 1:
If you don't need nested result mappings then you can implement a custom result handler to stream results on the currently released version of MyBatis (3.1.1). The current limitation is when you need to do complex result mapping: the NestedResultSetHandler does not allow custom result handlers. A fix is available, and it looks like it is currently targeted for 3.2. See Issue 577.
In summary, to stream large result sets using MyBatis you'll need to:
Implement your own ResultSetHandler.
Increase the fetch size (as noted below by Guillaume Perrot).
For nested result maps, use the fix discussed in Issue 577. This fix also resolves some memory issues with large result sets.
I have successfully used MyBatis streaming with the Cursor. The Cursor has been implemented on MyBatis at this PR.
From the documentation, it is described as:
"A Cursor offers the same results as a List, except it fetches data lazily using an Iterator."
Besides, the code documentation says:
"Cursors are a perfect fit to handle millions of items queries that would not normally fit in memory."
Here is an example of an implementation I have done and was able to use successfully:
import org.apache.ibatis.session.SqlSessionFactory;
import org.mybatis.spring.SqlSessionFactoryBean;
// You have your SqlSessionFactory somehow; if using Spring, configure a SqlSessionFactoryBean
// (with your DataSource) and obtain the factory from it:
SqlSessionFactory sqlSessionFactory = sqlSessionFactoryBean.getObject();
Then you define your mapper, e.g., UserMapper with the SQL query that returns a Cursor of your target object, not a List. The whole idea is to not store all the elements in memory:
import org.apache.ibatis.annotations.Select;
import org.apache.ibatis.cursor.Cursor;

public interface UserMapper {
    @Select("SELECT * FROM users")
    Cursor<User> getAll();
}
Then you write the code that uses an open SQL session from the factory and queries using your mapper:
try (SqlSession sqlSession = sqlSessionFactory.openSession()) {
    Iterator<User> iterator = sqlSession.getMapper(UserMapper.class)
            .getAll()
            .iterator();
    while (iterator.hasNext()) {
        doSomethingWithUser(iterator.next());
    }
}
handleResult receives as many records as the query fetches, with no pause.
When there are too many records to process, I used sqlSessionFactory.openSession().getConnection().
Then, as with normal JDBC, get a Statement, get the ResultSet, and process the records one by one. Don't forget to close the session.
If you're just dumping all the data with no ordering requirement across tables, why not do the pagination directly in SQL? Set a limit on the query statement and pass a different record id as the continuation point each time, so the whole table is split into chunks, each of which can be read directly into memory if the row limit is a reasonable number.
The SQL could be something like:
SELECT * FROM resource
WHERE "ID" >= continuation_id
ORDER BY "ID"
LIMIT 300;
I think this can be viewed as an alternative way to dump all the data in chunks, independent of the feature limitations of MyBatis or any other persistence layer.
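A rough sketch of that chunked loop over plain JDBC, assuming "ID" is a unique numeric column; writeRow is a hypothetical per-row handler (e.g. writing the XML):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public void dumpInChunks(Connection conn) throws Exception {
    // Strictly greater than the last id seen, so chunks never overlap.
    String sql = "SELECT * FROM resource WHERE \"ID\" > ? ORDER BY \"ID\" LIMIT 300";
    long lastId = Long.MIN_VALUE;
    boolean more = true;
    while (more) {
        more = false;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, lastId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    lastId = rs.getLong("ID");
                    writeRow(rs); // placeholder: map the row to XML here
                    more = true;
                }
            }
        }
    }
}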

When to 'IN' and when not to?

Let's presume that you are writing an application for a retail store chain. So, you would design your object model such that you define 'Store' as the core business object, with lots of supporting objects. Let's say 'Store' looks as follows:
class Store implements Validatable {
    int storeNo;
    String storeName;
    // ... etc. ...
}
So, your client tells you that you have to import the store schedule from an Excel sheet into the application and run a series of validations on them; for instance, 'StoreIsInSameCountry', 'StoreIsValid', etc. So, you would design a Rule interface for checking all the business conditions, something like this:
interface Rule<T extends Validatable> {
    public Error check(T value) throws Exception;
}
Now, here comes the question. I am uploading 2000 stores from this Excel sheet, so I would end up running each rule defined for a store that many times. With 4 rules, that is 8000 queries to the database, i.e., 16000 hits to the connection pool. For a simple check where I would just have to check whether the store exists or not, the query would be:
SELECT STORE_ATTRIB1, STORE_ATTRIB2... from STORE where STORE_ID = ?
That way I would get my 'Store' object. When I don't get anything back from the database, that store doesn't exist. So, for such a simple check, I would have to hit the database 2000 times for 2000 stores.
Alternatively, I could just do:
SELECT STORE_ATTRIB1, STORE_ATTRIB2... from STORE where STORE_ID in (1,2,3..... )
This query would actually return much faster than doing the one above it 2000 times.
However, it doesn't sit well with the design, in which a Rule can be run for a single store only.
I know using IN is not a suggested methodology. So, what do you think I should be doing? Should I go ahead and use IN here, since it gives better performance in this scenario? Or should I change my design?
What would you do if you were in my shoes, and what is the best practice?
That way I would get my 'Store' object from the database. When I don't get anything back, that store doesn't exist. So, for such a simple check, I would have to hit the database 2000 times for 2000 stores.
This is what you should not do.
Create a temporary table, fill the table with your values and JOIN this table, like this:
SELECT STORE_ATTRIB1, STORE_ATTRIB2...
FROM temptable tt
JOIN STORE s
ON s.STORE_ID = tt.id
or this:
SELECT STORE_ATTRIB1, STORE_ATTRIB2...
FROM STORE s
WHERE s.STORE_ID IN
(
SELECT id
FROM temptable tt
)
I know using IN is not a suggested methodology. So, what do you think I should be doing? Should I go ahead and use IN here, coz it gives better performance in this scenario? Or should I change my design?
IN filters duplicates out.
If you want each eligible row to be selected for each duplicate value in the list, use JOIN.
IN is in no way a "not suggested methodology".
In fact, there was a time when some databases did not support IN queries efficiently; that's why folk wisdom still advises against using it.
But if your store_id is indexed properly (and it most probably is, if it's a PRIMARY KEY which it looks like), then all modern versions of major databases (that is Oracle, SQL Server, MySQL and PostgreSQL) will use an efficient plan to perform this query.
See this article in my blog for performance details in SQL Server:
IN vs. JOIN vs. EXISTS
Note that in a properly designed database, validation rules are also set-based.
I.e., you implement your validation rules as queries against the temptable.
However, to support legacy rules, you can select values from temptable row-by-agonizing-row, apply the rules, and delete values which did not pass validation.
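For illustration, here is a sketch in plain JDBC of the single-round-trip existence check (using IN with bind parameters; a temp-table JOIN would follow the same shape). Table and column names follow the question:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public Set<Integer> findExistingStoreIds(Connection conn, List<Integer> storeIds) throws SQLException {
    // One placeholder per id keeps the statement parameterised instead of string-built.
    String placeholders = storeIds.stream().map(id -> "?").collect(Collectors.joining(","));
    String sql = "SELECT STORE_ID FROM STORE WHERE STORE_ID IN (" + placeholders + ")";
    Set<Integer> existing = new HashSet<>();
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        for (int i = 0; i < storeIds.size(); i++) {
            ps.setInt(i + 1, storeIds.get(i));
        }
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                existing.add(rs.getInt(1));
            }
        }
    }
    return existing; // any requested id missing from this set does not exist in STORE
}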
SELECT store_id FROM store WHERE store_active = 1
or even
SELECT store_id FROM store
will tell you all the active stores in a single query. You can now conduct the other tests on stores you know to exist, and you've saved yourself 1,999 hits to the database.
If you've got relatively uncontested database access, and no time constraint on how long the whole thing is going to take then you've no real need to worry about hitting the connection pool over and over again. That's what it's designed for, after all!
I think it's more of a business question, with parameters such as how often the client runs the import, how long it would take you to implement either solution, and how expensive your time is per hour.
If it's something that runs once in a while, a bit of bad performance is acceptable in my opinion, especially if you can get the job done quick using clean code.
...a Rule can be run for a single store only.
Managing business rules along with performance is a tricky task, so there is a library ("Persistence Layer") that does exactly that. You define rules, then execute a bulk of commands, and the library fetches from the DB whatever the rules require in a single query (using temp tables rather than 'IN') and then passes it to the rules.
There is an example of a validator in here.
