I'm currently facing very slow or no response on a collection when looking up by ID. I have ~2 million documents in a partitioned collection. If I look up a document using both the partitionKey and the id, the response is immediate:
SELECT * FROM c WHERE c.partitionKey=123 AND c.id="20566-2"
If I try using only the id:
SELECT * FROM c WHERE c.id="20566-2"
the response never returns; the Java client seems frozen, and I have the same situation using the Data Explorer in the Azure Portal. I also tried looking up by another field that is neither the id nor the partitionKey, and the response always returns. When I run the select from the Java client I always set the flag to enable cross-partition queries.
The next thing to try is to avoid the character "-" in the ID, to test whether this character blocks the query (although I didn't find anything about it in the documentation).
The issue is related to your Java code. The Azure DocumentDB Java SDK wraps the DocumentDB REST APIs, and according to the REST API reference for Query Documents, as @DanCiborowski-MSFT said, the header x-ms-documentdb-query-enablecrosspartition explains the reason for your issue, as below.
Header: x-ms-documentdb-query-enablecrosspartition
Required/Type: Optional/Boolean
Description: If the collection is partitioned, this must be set to True to allow execution across multiple partitions. Queries that filter against a single partition key, or against single-partitioned collections do not need to set the header.
So you need to set it to True to enable querying across multiple partitions without a partitionKey in the WHERE clause, by passing an instance of the FeedOptions class to the queryDocuments method, as below.
FeedOptions queryOptions = new FeedOptions();
queryOptions.setEnableCrossPartitionQuery(true); // Enable query across multiple partitions
String collectionLink = collection.getSelfLink();
FeedResponse<Document> queryResults = documentClient.queryDocuments(
        collectionLink,
        "SELECT * FROM c WHERE c.id='20566-2'", queryOptions);
Related
I have an Apache Camel route between two JPA endpoints:
from("jpa://Data").to("jpa://DataConverted");
I basically want to do two things: fetch and copy data from my Data entity table to a similar DataConverted entity table in another database, and mark my Data entities with data.setHasBeenCopied(true), but only after the copy has succeeded.
My route looks as follows:
from("jpa://Data").process(ex -> {
Data data = ex.getIn().getBody(Data.class);
DataConverted dataConverted = convertData(data);
ex.getMessage().setBody(dataConverted);
})
.recipientList(constant("direct:DataConverted","direct:updateFlag")).end();
from("direct:DataConverted").to("jpa://DataConverted").end;
from("direct:updateFlag").process(ex -> {
DataConverted dataConverted = ex.getIn().getBody(DataConverted.class);
var originalData = myDao.getData(dataConverted.getId());
originalData.setHasBeenCopied(true);
}).to("jpa://Data).end();
This runs without error, however it isn't setting the flag in my original database!
What did work was to call data.setHasBeenCopied(true) in the first process, directly after from("jpa://Data"). However, this means the flag is set before the copy is safe: if something happens during the copy process (e.g. the target database isn't available), the route will crash but the flag will stay set for that one Data entity.
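For clarity, that working-but-unsafe variant looked roughly like this (a sketch; convertData is my own helper from above):

from("jpa://Data").process(ex -> {
    Data data = ex.getIn().getBody(Data.class);
    data.setHasBeenCopied(true); // flag is set before the copy has succeeded
    ex.getMessage().setBody(convertData(data));
})
.to("jpa://DataConverted");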
Note that I haven't called transacted() on my route as that didn't work out for me (multiple interfering transactions were opened).
Any idea how to proceed? Is Camel unable to update existing data via .to()? I can add my Camel configurations of the endpoints and such if needed, but it would probably get a bit long.
I'm using the v2 AWS DynamoDB Java SDK, and I want to limit the number of results returned when querying by the partition key (code snippet below), but the code below returns the full set of items.
The Java docs say: "Note: The limit does not refer to the number of items to return, but how many items the database should evaluate while executing the query. Use limit together with Page.lastEvaluatedKey() and exclusiveStartKey in subsequent query calls to evaluate limit items per call." This seems to support the behavior I'm seeing.
However, How to set limit of matching items returned by DynamoDB using Java? has a solution using the .withMaxResultSize method from an earlier version of the SDK.
Does the DynamoDB v2 Java SDK have something similar, or will I have to limit the result set manually?
Code looks like:
QueryConditional conditional = QueryConditional.keyEqualTo(
        Key.builder()
                .partitionValue(jobId)
                .build()
);

QueryEnhancedRequest request = QueryEnhancedRequest.builder()
        .queryConditional(conditional)
        .limit(1)
        .scanIndexForward(false)
        .build();
Please read https://github.com/aws/aws-sdk-java-v2/issues/1951
You need to limit the number of items returned from the iterable. Example:
PageIterable<MyMovie> myMovie = moviesTable.query(queryEnhancedRequest);
myMovie.items()
        .stream()
        .limit(2)
        .forEach(content -> System.out.printf("Movie: %s (%s)%n", content.title, content.year));
This will fetch 2 pages, and each page will have 1 item, if you set limit(1) in the QueryEnhancedRequest.
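If you want to page manually instead, the Javadoc's suggestion translates to roughly the following sketch (reusing conditional, moviesTable, and MyMovie from the snippets above; error handling omitted):

Map<String, AttributeValue> lastKey = null;
do {
    QueryEnhancedRequest.Builder builder = QueryEnhancedRequest.builder()
            .queryConditional(conditional)
            .limit(1);                        // evaluate at most 1 item per request
    if (lastKey != null) {
        builder.exclusiveStartKey(lastKey);   // resume after the previous page
    }
    Page<MyMovie> page = moviesTable.query(builder.build()).iterator().next();
    page.items().forEach(System.out::println);
    lastKey = page.lastEvaluatedKey();        // null once all matching items are consumed
} while (lastKey != null);

Each loop iteration issues one request and yields at most one item, so you can stop whenever you have enough results.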
As of NiFi 1.7.1, the new DBCPConnectionPoolLookup enables dynamic selection of database connections: set an attribute database.name on a FlowFile, and when a consuming processor accesses a configured DBCPConnectionPoolLookup controller service, the content of that attribute is used to look up a connection via the service's configured properties, which map potential attribute values to DBCPConnectionPool controller services.
I'd like to list the tables in each database that I've configured in the lookup, but the ListDatabaseTables processor does not accept incoming FlowFiles. This seems to mean that it's not usable for listing tables in a dynamic set of databases.
What is the best way to accomplish this?
ListDatabaseTables uses the JDBC API for getting table info from the metadata of an established JDBC connection. This hides the underlying method of how to actually get tables from a particular database.
If all your databases are of the same ilk and you have a list of them, you could generate one flow file per database, filling in the database.name attribute, then use ExecuteSQL with the DBCPConnectionPoolLookup to execute the corresponding SQL statement (such as SHOW TABLES) to get the tables for that database. You can parse the records using any of the record-aware processors such as QueryRecord, UpdateRecord, ConvertRecord, etc., and if you need one table per flow file you can use SplitRecord. If the output is JSON, CSV, or XML, you could use EvaluateJsonPath, ExtractText, or EvaluateXPath respectively to get the table name into an attribute, and continue on from there.
I wrote up NIFI-5519 to cover the proposal for ListDatabaseTables to optionally accept incoming connections; in the meantime, you'd need one ListDatabaseTables instance for each of your DBCPConnectionPool instances.
I have the following data structure in Firebase Firestore to represent a many to many relationship between clients and users:
Clients
  clientId1 {
    users (object): {
      userId1: true
      userId2: true
    }
  }
  clientId2 {
    users (object): {
      userId1: true
    }
  }
I query it on Android using the following query:
db.collection("clients").whereEqualTo("users."+uid, true);
For userId2, the query should only return clientId1.
If I set the rule to (allow read: if true;) and execute the query above, I get the correct clients returned.
I would also like to set up a database rule to prevent userId2 from seeing clientId2.
I tried this rule but I get no results returned:
match /clients/{clientId} {
  // Allow read if the user exists in the user collection for this client
  allow read: if users[request.auth.uid] == true;
}
I also tried:
match /clients/{clientId} {
  // Allow read if the user exists in the user collection for this client
  allow read: if resource.data.users[request.auth.uid] == true;
}
But neither of the above rules returns any clients.
How do I write the rule?
I am going to answer my own question as I was just doing something silly.
My data structure is fine and the correct syntax for my rule is this one:
match /clients/{clientId} {
  // Allow read if the user exists in the user collection for this client
  allow read: if resource.data.users[request.auth.uid] == true;
}
Given this:
Cloud Firestore evaluates a query against its potential result set instead of the actual field values for all of your documents. If a query could potentially return documents that the client does not have permission to read, the entire request fails.
This Android query does implement the correct filter for the rule:
db.collection("clients").whereEqualTo("users."+uid, true);
I have yet to implement my adapter properly; I wanted to see if I could get the correct data structure / rules / query working first. I was calling the query from another listener that was listening on the entire clients collection (which fails the rule), and therefore this query was never being executed. Earlier, when I set the rule to (allow read: if true;), the initial listener was executing my query and returning the correct results. This led me to believe my rule was incorrect, when it wasn't.
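In other words, the difference boils down to this (a sketch; listener bodies omitted):

// Fails under the rule: listens on the whole collection, which could
// include clients this user is not allowed to read.
db.collection("clients")
        .addSnapshotListener((snap, e) -> { /* ... */ });

// Passes: the filter matches the rule's condition, so the potential
// result set only contains documents the user may read.
db.collection("clients")
        .whereEqualTo("users." + uid, true)
        .addSnapshotListener((snap, e) -> { /* ... */ });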
As per the official documentation regarding Firestore Security Rules:
When writing queries to retrieve documents, keep in mind that security rules are not filters—queries are all or nothing. To save you time and resources, Cloud Firestore evaluates a query against its potential result set instead of the actual field values for all of your documents. If a query could potentially return documents that the client does not have permission to read, the entire request fails.
So you cannot filter the documents that exist in your database using security rules.
I paginate through a large collection of data (circa 500,000,000 rows) using PagingState, and do some business intelligence during this process. To be able to resume the process, I created this table...
/**
 * This table stores temporary paging state
 */
CREATE TABLE IF NOT EXISTS lp_operations.paging_state (
    id text,          // ID of process
    pos bigint,       // current position
    page text,        // paging state
    info text,        // info JSON
    finished tinyint, // finished
    PRIMARY KEY (id)
) WITH default_time_to_live = 28800; // 8 hours
...in which I store the current page (a string representation of PagingState) and the JSON metadata associated with the calculation.
Questions
Can the paging state stored in 'page' expire in Cassandra?
How long does it exist (by default)?
No, the Cassandra driver's paging state will not expire.
Every time you query with a paging state, Cassandra actually executes your query again; it doesn't store your result. The paging state just tells Cassandra the position from which the driver wants the data.
Due to internal implementation details, PagingState instances are not portable across native protocol versions. This could become a problem in the following scenario:
you’re using the driver 2.0.x and Cassandra 2.0.x, and therefore native protocol v2;
a user bookmarks a link to your web service that contains a serialized paging state;
you upgrade your server stack to use the driver 2.1.x and Cassandra 2.1.x, so you’re now using protocol v3;
the user tries to reload their bookmark, but the paging state was serialized with protocol v2, so trying to reuse it will fail.
Source: http://docs.datastax.com/en/developer/java-driver/3.2/manual/paging/
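For reference, saving and restoring the paging state with the 3.x Java driver looks roughly like this (a sketch; loadPage/savePage are hypothetical helpers that read/write the 'page' column of the table above, and the queried table name is illustrative):

Statement stmt = new SimpleStatement("SELECT * FROM my_keyspace.big_table");
stmt.setFetchSize(1000);                         // rows per page, not a total limit
String saved = loadPage();                       // previously stored 'page' value, or null
if (saved != null) {
    stmt.setPagingState(PagingState.fromString(saved));
}
ResultSet rs = session.execute(stmt);
// ... process the current page of rows ...
PagingState next = rs.getExecutionInfo().getPagingState();
savePage(next == null ? null : next.toString()); // null means this was the last page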