I have two repositories in Sesame, where one holds the whole data set and the other holds data with a few fields that link to the primary data.
Example:
Primary Data Fields:
uri, skos:prefLabel, skos:altLabel, etc.
Secondary Data Fields:
uri, customField
So basically I want to query the secondary data on customField, which returns a uri that can then be mapped to the primary data to get the other details.
So they are linked data sets.
So is it possible to query linked repositories that are both in Sesame in a single query?
Using SPARQL 1.1 SERVICE queries
SPARQL 1.1 supports the SERVICE clause, which allows you to combine results from multiple SPARQL endpoints in a single query. Because Sesame Server exposes every repository as a SPARQL endpoint, you can use this to run queries over multiple repositories.
For example, say you have a Sesame Server running at http://localhost:8080/openrdf-sesame with two repositories, Primary and Secondary. The SPARQL query endpoints for both repositories are http://localhost:8080/openrdf-sesame/repositories/Primary and http://localhost:8080/openrdf-sesame/repositories/Secondary, respectively.
You can execute a SPARQL query on one repository (say, Primary) that refers to the other one within the query, like this:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX : <http://example.org/>  # replace with the actual namespace of your customField property

SELECT *
WHERE {
   # data from the Primary dataset
   ?uri a skos:Concept ;
        skos:prefLabel ?prefLabel ;
        skos:altLabel ?altLabel .
   # data from the Secondary dataset
   SERVICE <http://localhost:8080/openrdf-sesame/repositories/Secondary> {
        ?uri :customField ?customFieldValue .
   }
}
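If you want to run this from Java rather than the Workbench, a minimal sketch using Sesame's HTTPRepository could look like the following; the repository URLs and the example.org namespace for :customField are assumptions carried over from the query above.

import org.openrdf.query.QueryLanguage;
import org.openrdf.query.TupleQueryResult;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.http.HTTPRepository;

// Connect to the Primary repository over HTTP (URL assumed from the example above)
Repository primary = new HTTPRepository("http://localhost:8080/openrdf-sesame/repositories/Primary");
primary.initialize();

String query =
      "PREFIX skos: <http://www.w3.org/2004/02/skos/core#> "
    + "PREFIX : <http://example.org/> "   // assumed namespace for :customField
    + "SELECT * WHERE { "
    + "  ?uri a skos:Concept ; skos:prefLabel ?prefLabel . "
    + "  SERVICE <http://localhost:8080/openrdf-sesame/repositories/Secondary> { "
    + "    ?uri :customField ?customFieldValue . "
    + "  } "
    + "}";

RepositoryConnection conn = primary.getConnection();
try {
    TupleQueryResult result = conn.prepareTupleQuery(QueryLanguage.SPARQL, query).evaluate();
    while (result.hasNext()) {
        System.out.println(result.next());  // each result is a BindingSet
    }
    result.close();
} finally {
    conn.close();
}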
Using Sesame's FederationSail
An alternative is to set up a federated repository in Sesame, using the FederationSail. This is a way to group several Sesame databases together to form a "virtual" repository, a Federation. You can execute queries on the Federation and the result will include data from all member databases of the Federation (without the need to specify which endpoints you want to query, as you must do when using a SERVICE clause).
A Federation can be set up programmatically, or (if you're using Sesame Server and Workbench) via the Workbench. Just choose 'New repository', and pick the 'Federation store' option in the store type drop-down. Give it an id and a description, then on the next screen you get to pick which databases should be part of the federation.
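If you prefer the programmatic route, a minimal sketch of setting up a Federation (assuming the two HTTP repositories from the example above as members) might look like this:

import org.openrdf.repository.Repository;
import org.openrdf.repository.http.HTTPRepository;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.federation.Federation;

// Member repositories (URLs assumed from the example above)
Repository primary = new HTTPRepository("http://localhost:8080/openrdf-sesame/repositories/Primary");
Repository secondary = new HTTPRepository("http://localhost:8080/openrdf-sesame/repositories/Secondary");
primary.initialize();
secondary.initialize();

// Group the members into one virtual repository
Federation federation = new Federation();
federation.addMember(primary);
federation.addMember(secondary);

Repository federated = new SailRepository(federation);
federated.initialize();
// Queries on 'federated' now see data from both members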
As of NiFi 1.7.1, the new DBCPConnectionPoolLookup enables dynamic selection of database connections: set an attribute database.name on a FlowFile, and when a consuming processor accesses a configured DBCPConnectionPoolLookup controller service, the content of that attribute is used to pick a connection from the lookup's configured properties, which map potential values to DBCPConnectionPool controller services.
I'd like to list the tables in each database that I've configured in the lookup, but the ListDatabaseTables processor does not accept incoming FlowFiles. This seems to mean that it's not usable for listing tables in a dynamic set of databases.
What is the best way to accomplish this?
ListDatabaseTables uses the JDBC API for getting table info from the metadata of an established JDBC connection. This hides the underlying method of how to actually get tables from a particular database.
If all your databases are of the same ilk and you have a list of them, you could generate one flow file per database, filling in the database.name attribute, then use ExecuteSQL with the DBCPConnectionPoolLookup to run the SQL statement that returns the tables for that database, such as SHOW TABLES. You can parse the records using any of the record-aware processors such as QueryRecord, UpdateRecord, ConvertRecord, etc., and if you need one table per flow file you can use SplitRecord. If the output is JSON, CSV, or XML, you could use EvaluateJsonPath, ExtractText, or EvaluateXPath respectively to get the table name into an attribute, and continue on from there.
I wrote up NIFI-5519 to cover the proposal for ListDatabaseTables to optionally accept incoming connections; in the meantime, you'd need one ListDatabaseTables instance per DBCPConnectionPool instance.
I'm currently facing very slow or no response on a collection when looking up by ID. I have ~2 million documents in a partitioned collection. If I look up the document using the partitionKey and id, the response is immediate:
SELECT * FROM c WHERE c.partitionKey=123 AND c.id="20566-2"
If I try using only the id:
SELECT * FROM c WHERE c.id="20566-2"
the response never returns; the Java client seems frozen, and I see the same behavior using the Data Explorer in the Azure Portal. I also tried looking up by another field that is neither the id nor the partitionKey, and the response always returns. When I run the select from the Java client, I always set the flag to enable cross-partition queries.
The next thing to try is avoiding the character "-" in the ID, to test whether that character blocks the query (though I didn't find anything about it in the documentation).
The issue is related to your Java code. The Azure DocumentDB Java SDK wraps the DocumentDB REST APIs, and according to the reference for the REST API Query Documents operation, as @DanCiborowski-MSFT said, the header x-ms-documentdb-query-enablecrosspartition explains the reason for your issue, as below.
Header: x-ms-documentdb-query-enablecrosspartition
Required/Type: Optional/Boolean
Description: If the collection is partitioned, this must be set to True to allow execution across multiple partitions. Queries that filter against a single partition key, or against single-partitioned collections do not need to set the header.
So you need to enable cross-partition queries (for querying across multiple partitions without a partitionKey in the WHERE clause) by passing an instance of the FeedOptions class to the queryDocuments method, as below.
FeedOptions queryOptions = new FeedOptions();
queryOptions.setEnableCrossPartitionQuery(true); // Enable query across multiple partitions
String collectionLink = collection.getSelfLink();
FeedResponse<Document> queryResults = documentClient.queryDocuments(
        collectionLink,
        "SELECT * FROM c WHERE c.id='20566-2'",
        queryOptions);
I paginate through a large collection of data (circa 500 000 000 rows) using PagingState, and do some business intelligence during this process. To be able to resume the process I created this table...
/**
* This table stores temporary paging state
*/
CREATE TABLE IF NOT EXISTS lp_operations.paging_state (
    id text,          // ID of process
    pos bigint,       // current position
    page text,        // paging state
    info text,        // info json
    finished tinyint, // finished
    PRIMARY KEY (id)
) WITH default_time_to_live = 28800; // 8 hours
...in which I store the current page (a string representation of PagingState) and the JSON metadata associated with the calculation.
Questions
Can the 'page' (paging state) expire in Cassandra?
How long does it exist (by default)?
No, the Cassandra driver's paging state does not expire.
Every time you query with a paging state, Cassandra actually executes your query again; it does not store your results. The paging state just tells Cassandra from which position the driver wants the data.
Due to internal implementation details, PagingState instances are not portable across native protocol versions. This could become a problem in the following scenario:
you’re using the driver 2.0.x and Cassandra 2.0.x, and therefore native protocol v2;
a user bookmarks a link to your web service that contains a serialized paging state;
you upgrade your server stack to use the driver 2.1.x and Cassandra 2.1.x, so you’re now using protocol v3;
the user tries to reload their bookmark, but the paging state was serialized with protocol v2, so trying to reuse it will fail.
Source: http://docs.datastax.com/en/developer/java-driver/3.2/manual/paging/
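For reference, a minimal sketch of saving and restoring the paging state with the DataStax Java driver 3.x might look like the following; the session and statement setup is assumed, and the serialized string is what you would store in the 'page' column of the table above.

import com.datastax.driver.core.PagingState;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Statement;

// After fetching a page, serialize the paging state so it can be stored
// in the 'page' column of the paging_state table above
ResultSet rs = session.execute(statement);
PagingState pagingState = rs.getExecutionInfo().getPagingState();
String serialized = (pagingState != null) ? pagingState.toString() : null;

// Later, to resume from the stored position:
Statement resumed = statement.setPagingState(PagingState.fromString(serialized));
ResultSet next = session.execute(resumed);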
I am writing a Java-based REST API using RESTEasy. I have the following structure:
Category has many groups; group has many preferences
I have 3 resources: category, group, and preference.
I want to support these API endpoints
/categories/{cat_id}/groups/ (returns all groups for the category)
/groups/{group_id}/preferences/ (returns all prefs for the group)
/preferences/{preference_id} (returns a pref identified by the id passed)
/preferences (returns all prefs)
I have 3 resource classes, one for each resource mentioned above.
I am confused about how to structure the methods and where they should go. Following are my specific questions:
/groups/{group_id}/preferences/: should the implementation go under the GroupsResourceImpl class or the PreferenceResourceImpl class?
The PreferenceResourceImpl class has the implementation for the /preferences endpoint, which returns all the global preferences. So should the /groups/{group_id}/preferences endpoint reside under GroupResource and call a method on the PreferenceResource (a method that takes the group id as an extra param)?
Basically, the implementation makes no difference; you can do it either way. You should group the methods in whatever way organizes the code most logically for you.
I personally like to group resource implementations so that similar code sits together. This can be achieved by grouping together the paths that require similar ways of extracting data from the infrastructure. For example, if all your data is in a relational database, then all your /*/preferences queries would basically be SELECT * FROM preferences WHERE ...different stuff.... Therefore, it would be most logical to group them all by the table you are querying from. For example:
GroupResource: /categories/{cat_id}/groups/
Because it would always query from table "groups": SELECT * FROM groups WHERE parent_cat_id = {cat_id}
PreferenceResource: /groups/{group_id}/preferences/, /preferences/{preference_id}, /preferences.
Because it would always query from table "preferences": SELECT * FROM preferences WHERE parent_group = {group_id}, ... WHERE id = {preference_id}, SELECT * FROM preferences
However, if your categories/groups/preferences reside in some kind of graph database, then to query /groups/{group_id}/preferences you would need to first find the group and then get all the preferences from inside it, and therefore it would be more logical to group resources like this:
CategoryResource: /categories/{cat_id}/groups/
Because it would start querying from "categories" graph: categories.get(cat_id).getGroups()
GroupResource: /groups/{group_id}/preferences/
Because it would start querying from "groups" graph: groups.get(group_id).getPreferences()
PreferenceResource: /preferences/{preference_id}, /preferences.
Because it would start querying from "preferences" graph: preferences.get(preference_id) and preferences.getAll()
I have 4 tables involved in this query.
Campaign - many to one business
Business - one to many client
Client - one to one contact
Contact
In contact there is the field contact_name, which is unique. I need to retrieve all campaigns related to a contact (via client and business) where the campaign field type equals 2.
What is the best way to do it with hibernate?
In SQL it would look like this:
select *
from campaign,contact, business, client
where campaign.type=2
and client.contact_id = contact.contact_id
and contact.name = 'Josh'
and client.business_id = business.business_id
and campaign.business_id = business.business_id
I think that the following should work.
from Campaign c where c.type = 2 and c.business.client.contact.contact_name = :name
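Note that the path expression above assumes the associations can be navigated directly. Depending on how they are mapped (the property names business, clients, contact, and contact_name below are assumptions, as is the open session), an explicit-join version executed through a Hibernate Session might look roughly like this:

// Assumed entity/property names: Campaign.business, Business.clients,
// Client.contact, Contact.contact_name. Adjust to your actual mappings.
String hql = "select c from Campaign c "
           + "join c.business b "
           + "join b.clients cl "
           + "join cl.contact ct "
           + "where c.type = 2 and ct.contact_name = :name";

List<Campaign> campaigns = session.createQuery(hql)
                                  .setParameter("name", "Josh")
                                  .list();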
You can also execute native SQL queries using the createSQLQuery() method of Session.
You can additionally use scalar properties (addScalar()) to avoid the overhead of using ResultSetMetaData.
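A hedged sketch of both points, reusing the columns from the SQL above (the column types passed to addScalar() and the open session are assumptions):

import java.util.List;
import org.hibernate.type.StandardBasicTypes;

// Native SQL through Hibernate; addScalar() declares the returned columns and
// their types explicitly, so Hibernate does not have to inspect ResultSetMetaData.
List<Object[]> rows = session.createSQLQuery(
        "select campaign.campaign_id, campaign.type " +
        "from campaign, contact, business, client " +
        "where campaign.type = 2 " +
        "and client.contact_id = contact.contact_id " +
        "and contact.name = :name " +
        "and client.business_id = business.business_id " +
        "and campaign.business_id = business.business_id")
    .addScalar("campaign_id", StandardBasicTypes.LONG)
    .addScalar("type", StandardBasicTypes.INTEGER)
    .setParameter("name", "Josh")
    .list();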
You can find more information on this from here