I have a Spring application that runs a cron on it. The cron every few minutes gets new data from external API. The data should be stored in a database (MySQL), in place of old data (Old data should be overwritten by new data). The data requires to be overwritten instead of updated. The application itself provides REST API so the client is able to get the data from the database. So there should not be situation that client sees an empty or just a part of data from database because there is an data update.
Currently I've tried deleting whole old data and insert new data but there is a place that a client gets just a part of the data. I've tried it via Spring Data deleteAll and saveAll methods.
#Override
#Transactional
public List<Country> overrideAll(#NonNull Iterable<Country> countries) {
removeAllAndFlush();
List<CountryEntity> countriesToCreate = stream(countries.spliterator(), false)
.map(CountryEntity::from)
.collect(toList());
List<CountryEntity> createdCountries = repository.saveAll(countriesToCreate);
return createdCountries.stream()
.map(CountryEntity::toCountry)
.collect(toList());
}
private void removeAllAndFlush() {
repository.deleteAll();
repository.flush();
}
I also thought about having a temporary table that gets new data and when the data is complete just replace main table with temporary table. Is it a good idea? Any other ideas?
It's a good idea. You can minimize the downtime by working on another table until it's ready and then switch tables quickly by renaming. This will also improve perceived performance by the users because no record needs to be locked like what happens when using UPDATE/DELETE.
In MySQL, you can use RENAME TABLE if you don't have triggers on the table. It allows multiple table renaming at once and it works atomically (i.e. transaction - if any error happens, no change is made). You can use the following for example
RENAME TABLE countries TO countries_old, countries_new TO countries;
DROP TABLE countries_old;
Refer here for more details
https://dev.mysql.com/doc/refman/5.7/en/rename-table.html
I am required to execute a stored procedure in a SQL server to fetch some data, and since I will later save the data into a Mongo and this one is with ReactiveMongoTemplate and so on, I introduced Spring R2DBC.
implementation("org.springframework.data:spring-data-r2dbc:1.0.0.RELEASE")
implementation("io.r2dbc:r2dbc-mssql:0.8.1.RELEASE")
I see that I can do SELECT and INSERT and so on with R2DBC, but is it possible to EXEC prod_name? I tried it and it hangs forever and then the test terminates, without success but neither failure. The last line of log is:
io.r2dbc.mssql.QUERY - Executing query: EXEC "SCHEMA"."MY_PROCEDURE"
The code is like:
public Flux<Coupon> selectWithProcedure() {
return databaseClient
.execute("EXEC \"SCHEMA\".\"MY_PROCEDURE\" ")
.as(Coupon.class)
.fetch().all()
.doOnNext(coupon -> {
coupon.setCouponStatusRefFromId(coupon.getCouponStatusRefId());
});
}
And it seems that no data is retrieved.
If I test some other methods with simple queries like SELECT... it works. But the problem is, DBAs do not allow my app to read table data, instead, they create a procedure for me. If this query is not possible, I must go with traditional JPA way and going reactive at Mongo side has lost its sense.
Well. I just saw this:
https://github.com/r2dbc/r2dbc-mssql, version 0.8.1:
Next steps:
Execution of stored procedures
Add support for TVP and UDTs
And:
https://r2dbc.io/2019/05/13/r2dbc-0-8-milestone-8-released
We already have a few tickets lined up for the next milestone, and we know that they will require further SPI modifications:
Support for Auto-Commit
Connection Validation
Support for Stored Procedures
As of NiFi 1.7.1, the new DBCPConnectionPoolLookup enables dynamic selection of database connections: set an attribute database.name on a FlowFile and when a consuming processor accesses a configured DBCPConnectionPoolLookup controller service, the content of that attribute will be used to get a connection through this lookup's configured properties, which contain a mapping of potential values to DBCPConnectionPool controller service.
I'd like to list the tables in each database that I've configured in the lookup, but the ListDatabaseTables processor does not accept incoming FlowFiles. This seems to mean that it's not usable for listing tables in a dynamic set of databases.
What is the best way to accomplish this?
ListDatabaseTables uses the JDBC API for getting table info from the metadata of an established JDBC connection. This hides the underlying method of how to actually get tables from a particular database.
If all your databases are of the same ilk, then if you have a list of databases, you could generate flow files with one per database, filling in the database.name attribute, then using ExecuteSQL with the DBCPConnectionPoolLookup to execute the corresponding SQL statement to get the tables for that database, such as SHOW TABLES. You can parse the records using any of the record-aware processors such as QueryRecord, UpdateRecord, ConvertRecord, etc. and if you need one table per flow file you can use SplitRecord. If the output is JSON or CSV or XML, you could use EvaluateJsonPath, ExtractText, or EvaluateXPath respectively to get the table name into an attribute, and continue on from there.
I wrote up NIFI-5519 to cover the proposal for ListDatabaseTables to optionally accept incoming connections, in the meantime you'd need 1 ListDatabaseTables instance to correspond to each of your DBCPConnectionPool instances.
I'm currently facing very slow/ no response on a collection looking by ID. I have ~ 2 milion of documents in a partitioned collection. If lookup the document using the partitionKey and id the response is immediate
SELECT * FROM c WHERE c.partitionKey=123 AND c.id="20566-2"
if I try using only the id
SELECT * FROM c WHERE c.id="20566-2"
the response never returns, java client seems freezed and I have the same situation using the Data Explorer from Azure Portal. I tried also looking up by another field that isn't the id or the partitionKey and the response always returns. When I try the select from Java client I always set the flag to enable cross partition query.
The next thing to try is to avoid the character "-" in the ID to test if this character blocks the query (anyway I didn't find anything on the documentation)
The issue is related to your Java code. Due to Azure DocumentDB Java SDK wrapped the DocumentDB REST APIs, according to the reference of REST API Query Documents, as #DanCiborowski-MSFT said, the header x-ms-documentdb-query-enablecrosspartition explains your issue reason as below.
Header: x-ms-documentdb-query-enablecrosspartition
Required/Type: Optional/Boolean
Description: If the collection is partitioned, this must be set to True to allow execution across multiple partitions. Queries that filter against a single partition key, or against single-partitioned collections do not need to set the header.
So you need to set True to enable cross partition for querying across multiple partitions without a partitionKey in where clause via pass a instance of class FeedOption to the method queryDocuments, as below.
FeedOptions queryOptions = new FeedOptions();
queryOptions.setEnableCrossPartitionQuery(true); // Enable query across multiple partitions
String collectionLink = collection.getSelfLink();
FeedResponse<Document> queryResults = documentClient.queryDocuments(
collectionLink,
"SELECT * FROM c WHERE c.id='20566-2'", queryOptions);
I have a web application build in Django + Python that interact with web services (written in JAVA).
Now all the database management part is done by web-services i.e. all CRUD operations to actual database is done by web-services.
Now i have to track all User Activities done on my website in some log table.
Like If User posted a new article, then a new row is created into Articles table by web-services and side by side, i need to add a new row into log table , something like "User : Raman has posted a new article (with ID, title etc)"
I have to do this for all Objects in my database like "Article", "Media", "Comments" etc
Note : I am using PostgreSQL
So what is the best way to achieve this..?? (Should I do it in PostgreSQL OR JAVA ..??..And How..??)
So, you have UI <-> Web Services <-> DB
Since the web services talk to the DB, and the web services contain the business logic (i.e. I guess you validate stuff there, create your queries and execute them), then the best place to 'log' activities is in the services themselves.
IMO, logging PostgreSQL transactions is a different thing. It's not the same as logging 'user activities' anymore.
EDIT: This still means you create DB schema for 'logs' and write them to DB.
Second EDIT: Catching log worthy events in the UI and then logging them from there might not be the best idea either. You will have to rewrite logging if you ever decide to replace the UI, or for example, write an alternate UI for, say mobile devices, or something else.
For an audit table within the DB itself, have a look at the PL/pgSQL Trigger Audit Example
This logs every INSERT, UPDATE, DELETE into another table.
In your log table you can have various columns, including:
user_id (the user that did the action)
activity_type (the type of activity, such as view or commented_on)
object_id (the actual object that it concerns, such as the Article or Media)
object_type (the type of object; this can be used later, in combination with object_id to lookup the object in the database)
This way, you can keep track of all actions the users do. You'd need to update this table whenever something happens that you wish to track.
Whenever we had to do this, we overrode signals for every model and possible action.
https://docs.djangoproject.com/en/dev/topics/signals/
You can have the signal do whatever you want, from injecting some HTML into the page, to making an entry in the database. They're an excellent tool to learn to use.
I used django-audit-log and I am very satisfied.
Django-audit-log can track multiple models each in it's own additional table. All of these tables are pretty unified, so it should be fairly straightforward to create a SQL view that shows data for all models.
Here is what I've done to track a single model ("Pauza"):
class Pauza(models.Model):
started = models.TimeField(null=True, blank=False)
ended = models.TimeField(null=True, blank=True)
#... more fields ...
audit_log = AuditLog()
If you want changes to show in Django Admin, you can create an unmanaged model (but this is by no means required):
class PauzaAction(models.Model):
started = models.TimeField(null=True, blank=True)
ended = models.TimeField(null=True, blank=True)
#... more fields ...
# fields added by Audit Trail:
action_id = models.PositiveIntegerField(primary_key=True, default=1, blank=True)
action_user = models.ForeignKey(User, null=True, blank=True)
action_date = models.DateTimeField(null=True, blank=True)
action_type = models.CharField(max_length=31, choices=(('I', 'create'), ('U', 'update'), ('D', 'delete'),), null=True, blank=True)
pauza = models.ForeignKey(Pauza, db_column='id', on_delete=models.DO_NOTHING, default=0, null=True, blank=True)
class Meta:
db_table = 'testapp_pauzaauditlogentry'
managed = False
app_label = 'testapp'
Table testapp_pauzaauditlogentry is automatically created by django-audit-log, this merely creates a model for displaying data from it.
It may be a good idea to throw in some rude tamper protection:
class PauzaAction(models.Model):
# ... all like above, plus:
def save(self, *args, **kwargs):
raise Exception('Permission Denied')
def delete(self, *args, **kwargs):
raise Exception('Permission Denied')
As I said, I imagine you could create a SQL view with the four action_ fields and an additional 'action_model' field that could contain varchar references to model itself (maybe just the original table name).