Entity & Entity Properties. Database design for effective searching - java

For the last two days I have been searching for a suitable solution to the problem described below.
In my standalone notification-service module I have an abstract Message entity. A Message has 'to', 'from', 'sentAt', 'receivedAt', and other attributes. The responsibilities of the notification service are to:
send new messages using the different registered message providers (SMS, email, Skype, etc.)
receive new messages from the registered message providers
update the status of already sent messages.
The notification-service module is developed as a standalone module exposed over SOAP. Many clients use this module to send messages or to search through messages that were already sent or received.
Clients want to attach custom properties (something like tags) when sending messages, so that they can later search for messages by these properties. These properties are meaningful only in the client's own environment.
For example, client A might want to send a message and save the following custom properties:
1. The internal system ID of the user the message is sent to
2. A distinguishing flag (whether the ID refers to a user, an admin, or a client)
3. A notification type flag (notification/alert/ ...)
Client B might want to send a message and save another set of custom properties:
1. The internal system operator ID (who sent the SMS)
2. The template ID that was used to send the message
Custom properties can be used by the clients to search already sent messages.
For example:
Client A could find SMS messages sent to administrator users in the period between [Date 1; Date 2] that have 'alert' status.
Client B could find all notifications sent with a specified template.
Of course, data should be fetched page by page.
At first I created the following database model:
[Database schema diagram]
To find all messages with the specified properties I tried this query:
SELECT * FROM (
    SELECT message_id FROM custom_message_properties
    WHERE CONCAT(key, ':', value) IN ('property1:value1', 'property2:value2')
    GROUP BY message_id HAVING count(*) = 2
) AS cmp
JOIN message m ON cmp.message_id = m.id
ORDER BY m.id LIMIT 100 OFFSET 0
The query worked fine (although it did not look great to me) on a database with little data, so I decided to check the results against realistically sized data.
I generated 10,000,000 messages with 40,000,000 custom properties and checked the result. Execution time was ~2 minutes. The most time-consuming operation was the following sub-select:
SELECT message_id FROM custom_message_properties
WHERE CONCAT(key, ':', value) IN ('property1:value1', 'property2:value2')
I understand that the string comparison is very slow because no database index is used for the computed CONCAT expression. I decided to change the database structure and merge the 'key' and 'value' columns into a single one, so I updated my database schema:
[Updated database schema diagram]
I checked the result again: execution time was now ~20 seconds. That is much better, but still not suitable for production use.
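For reference, a minimal sketch of how that merged-column search can be issued from the Spring stack listed at the end of the question. The single property column holding the merged 'key:value' string is my reading of the updated schema, and the sketch assumes a plain B-tree index on that column so the IN list can use it:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;

public class PropertySearchDao {

    private final JdbcTemplate jdbc;

    public PropertySearchDao(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    // Finds ids of messages that carry ALL of the given merged 'key:value' properties.
    public List<Long> findMessageIds(List<String> properties, int limit, int offset) {
        String placeholders = String.join(", ",
                Collections.nCopies(properties.size(), "?"));
        String sql =
                "SELECT message_id FROM custom_message_properties "
              + "WHERE property IN (" + placeholders + ") "
              + "GROUP BY message_id HAVING count(*) = ? "
              + "ORDER BY message_id LIMIT ? OFFSET ?";
        List<Object> args = new ArrayList<>(properties);
        args.add((long) properties.size()); // row must match ALL requested properties
        args.add(limit);
        args.add(offset);
        return jdbc.queryForList(sql, Long.class, args.toArray());
    }
}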
So now I have no idea how to improve performance without significant changes to the application architecture.
The only idea I have left is to create a separate table for each client, holding that client's required properties:
CREATE TABLE client_i_custom_properties (
    mid bigint,   -- foreign key, references message (id)
    p1  type1,
    p2  type2,
    ......
    pn  type(n)
);
I have spent a lot of time trying to find any useful information. I have also analyzed the Stack Overflow database, because its tagging problem seemed to be quite the same as mine. But Stack Overflow has only ~50,000 distinct tags, far fewer than my database could have.
Any help is appreciated. Thanks in advance!
The project environment I use:
Postgres database (9.6)
Java 1.8
Spring modules (spring-boot, spring-data-jpa + hibernate, spring-ws, etc).

I have not found any suitable solution other than creating an additional table with the client's properties for each client.
I know that solution is not very flexible,
but now the search query time is under 1 second.
In the future, I will try to solve the same problem using NoSQL storage.
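To make that concrete, here is a minimal sketch of how such a per-client search could look with the Spring stack listed above. The client_a_custom_properties table and its columns (mid, admin_flag, notification_type), as well as message.sent_at, are hypothetical names following the layout sketched in the question; each typed column can carry an ordinary B-tree index, which is what makes this query fast:

import java.sql.Timestamp;
import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;

public class ClientAMessageSearch {

    private final JdbcTemplate jdbc;

    public ClientAMessageSearch(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    // Example search for client A: alert messages sent to admins in a date range, paged.
    public List<Long> findMessageIds(boolean adminFlag, String notificationType,
                                     Timestamp from, Timestamp to,
                                     int limit, int offset) {
        String sql =
                "SELECT m.id FROM message m "
              + "JOIN client_a_custom_properties p ON p.mid = m.id "
              + "WHERE p.admin_flag = ? AND p.notification_type = ? "
              + "AND m.sent_at BETWEEN ? AND ? "
              + "ORDER BY m.id LIMIT ? OFFSET ?";
        return jdbc.queryForList(sql, Long.class,
                adminFlag, notificationType, from, to, limit, offset);
    }
}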

Related

How do I send categorized and grouped logs into Stackdriver using the Java Client Library?

I'd like to aggregate multiple log.info, log.warning, and log.error calls, and possibly stack traces, into a single Stackdriver log line generated by the server interacting with my application code. The goal is to summarize a request handled by my Scala server, and then group as many logging statements as occurred during its execution, together with any errors.
This is the default behavior in GAE logging, but because I'm new to reading Java APIs, I'm having trouble figuring out how to:
1/ Create a custom MonitoredResource (?) representing, e.g., "API server", then specify a category within it (e.g., "production"). Specifically, do I have to create these via the REST API, even though I'm only doing it once for my deployment? Can I use something like Troposphere to define these in code and commit them in a repo?
2/ Understand how the nouns MonitoredResource, MonitoredResourceDescriptor, LogEntry, LogEntryOperation, and logName fit together; where the categories "API Server" and "production" get defined; and how logging statement groups like GET /foobar -> 200 response + 1834 bytes can be added (are those logNames?).
No need to write code for me, of course, but pointers and a high level overview to save me trial and error would be appreciated greatly.
You can group together multiple log entries for the same operation by using the LogEntryOperation field in the LogEntry (https://cloud.google.com/logging/docs/reference/v2/rest/v2/LogEntry#LogEntryOperation).
In the Logs Viewer, you can group the log entries by filtering on the operation.id field using the advanced filters.
In the Java client library, you can set the Operation Id using https://googlecloudplatform.github.io/google-cloud-java/0.33.0/apidocs/com/google/cloud/logging/LogEntry.Builder.html#setOperation-com.google.cloud.logging.Operation-
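For illustration, a minimal sketch of grouping two entries under one operation with the google-cloud-logging Java client; the log name, operation id, and producer string here are made-up example values:

import com.google.cloud.MonitoredResource;
import com.google.cloud.logging.LogEntry;
import com.google.cloud.logging.Logging;
import com.google.cloud.logging.LoggingOptions;
import com.google.cloud.logging.Operation;
import com.google.cloud.logging.Payload.StringPayload;
import com.google.cloud.logging.Severity;
import java.util.Arrays;

public class OperationGroupingExample {
    public static void main(String[] args) {
        Logging logging = LoggingOptions.getDefaultInstance().getService();
        MonitoredResource resource = MonitoredResource.newBuilder("global").build();

        // Entries sharing the same operation id ("req-42") can be grouped in the
        // Logs Viewer with the advanced filter: operation.id = "req-42"
        LogEntry first = LogEntry.newBuilder(StringPayload.of("GET /foobar started"))
                .setLogName("api-requests")
                .setSeverity(Severity.INFO)
                .setOperation(Operation.newBuilder("req-42", "my-api-server")
                        .setFirst(true).build())
                .setResource(resource)
                .build();

        LogEntry last = LogEntry.newBuilder(StringPayload.of("GET /foobar -> 200, 1834 bytes"))
                .setLogName("api-requests")
                .setSeverity(Severity.INFO)
                .setOperation(Operation.newBuilder("req-42", "my-api-server")
                        .setLast(true).build())
                .setResource(resource)
                .build();

        logging.write(Arrays.asList(first, last));
    }
}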
1) The monitored resources you can use are a curated set defined by Google. You cannot define your own type. The supported resources are listed in https://cloud.google.com/logging/docs/api/v2/resource-list.
2) Basic concepts are described in https://cloud.google.com/logging/docs/basic-concepts.
1) The MonitoredResource monitors what you configure in the MonitoredResourceDescriptor.
I assume you can create it any way you want (via the REST API or with the client libraries).
2) I am unsure where you want to describe "API Server" or "production"; the MonitoredResourceDescriptor is how you set up what to monitor. The LogEntry is the actual log record, and the LogName is just a label you give this particular log. What you described should be something that a LogEntry returns (the 200 code plus the other details).
I might be a bit confused about what you are asking. The best thing to do is to create a sample MonitoredResource and see how it works.

How to ensure no duplicates are created in the DB on connection loss

Евгений Кравцов:
I am developing a tiny service with an HTTP backend and an Android app, and recently I have felt a lack of knowledge about such systems.
Case:
- The client makes an order in the app and sends a request to the server.
- The server successfully receives the data and creates a database row for the order.
- After the database work completes, the backend tries to respond to the app with a 200 success code.
- The app user has internet connection problems and cannot receive the server response. The app gets a timeout exception and notifies the user that the order was not successful.
- After some time, the internet connection on the user's device is restored and they send another request with the same order.
- The backend receives this again and creates a duplicate of the previous order.
So I get two orders in the database, but the user wanted to create only one.
Question: how can I prevent such behavior?
You should add a unique key to your table in the DB, for example a compound key of (user_id, type), so a specific user cannot make two orders with the same order type.
This will cause the second DB insert to fail, as in the sketch below.
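A minimal sketch of how the server could absorb the retried insert with Spring's JdbcTemplate; the orders table, its columns, and the unique constraint on (user_id, order_type) are hypothetical:

import org.springframework.dao.DuplicateKeyException;
import org.springframework.jdbc.core.JdbcTemplate;

public class OrderService {

    private final JdbcTemplate jdbc;

    public OrderService(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    // Inserts the order; a retry after a lost response violates the unique
    // constraint on (user_id, order_type), so we report success again instead
    // of creating a second row (the call is effectively idempotent).
    public void placeOrder(long userId, String orderType) {
        try {
            jdbc.update("INSERT INTO orders (user_id, order_type) VALUES (?, ?)",
                    userId, orderType);
        } catch (DuplicateKeyException e) {
            // Duplicate of an order we already stored: nothing to do.
        }
    }
}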

Mongo URI for Spring Boot GET request

I have configured the Mongo URI in the property file as below:
spring.data.mongodb.uri=mongodb://db1.dev.com,db2.dev.com,db3.dev.com
spring.data.mongodb.database=mydb
I use mongoowl as a monitoring tool.
When I do a GET request, it shows hits on every MongoDB node, which ideally should show up on only one DB, right?
No. You are actually opening a replica set connection; with this connection type, Spring connects to all three nodes to handle failover and to fulfill the "read from secondary" option (hence you see hits on all three databases). However, the read and write operations happen only on the primary unless you have specified that reads may go to a secondary.
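For example, the replica set name and a read preference can be added as standard MongoDB connection string options (the rs0 name here is a placeholder for your replica set):
spring.data.mongodb.uri=mongodb://db1.dev.com,db2.dev.com,db3.dev.com/mydb?replicaSet=rs0&readPreference=secondaryPreferred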

How to check delayed/scheduled messages in RabbitMQ's Mnesia

I was looking for alternatives to the Quartz scheduler.
Though it is not a complete replacement, I was trying out the RabbitMQ Delayed Message Plugin (it suits my use case).
I was able to get the scheduling to work, but I was not able to view the delayed messages (which are stored in Mnesia).
Is there a way to check the messages and/or the number of messages in Mnesia?
Edit: I inferred that the messages are stored in Mnesia from the comment here.
There is no way to check the messages that RabbitMQ is persisting in its Mnesia database.
RabbitMQ is not a generalized datastore. It is a purpose-built message broker and queueing system. The datastore inside it is there to facilitate the persistence of messages, not to be queried and used as if it were a database on its own.
To view the data inside Mnesia you could:
Write a simple Erlang program like this one; as a result you get:
(rabbit@gabrieles-MBP)5>
load:traverse_table_and_show('rabbit_delayed_messagerabbit@gabrieles-MBP').
{delay_entry,
{delay_key,1442258857832,
{exchange,
{resource,<<"/">>,exchange,<<"my-exchange">>},
'x-delayed-message',true,false,false,
[{<<"x-delayed-type">>,longstr,<<"direct">>}],
undefined,undefined, {[],[]}}},
{delivery,false,false,<0.2008.0>,
{basic_message,
{resource,<<"/">>,exchange,<<"my-exchange">>},
[<<>>],
{content,60,
{'P_basic',undefined,undefined,
[{<<"x-delay">>,signedint,100000}],
undefined,undefined,undefined,undefined,undefined,
undefined,undefined,undefined,undefined,undefined,
undefined},
..
Or in this way:
Execute an Erlang shell session using:
erl -setcookie ABCDEFGHI -sname monitorNode@gabrielesMBP
You have to use the same cookie that RabbitMQ uses, typically $HOME/.erlang.cookie.
Then execute observer:start().
Once you are connected to the RabbitMQ node, open the Table Viewer and select the Mnesia tables from its menu.
There you can see your data.

Sending mail asynchronously

I have four DB servers with the same DB structure but different data.
Currently, when new data is inserted into the database, my application takes the data, creates a template, and sends an email.
I would like to separate email sending from my application.
For example, some thread that runs once every 10 minutes: it selects data from my four DB servers, connects to the mail server, and sends the emails to users.
Is this possible using JMS or something similar?
Thanks for the replies!
I did the same by creating a mail table (likely one per DB) and saving the template and data (or subject/body) in it. A separate process, either Quartz or your own polling thread, reads that table, connects to the mail server, sends the email, and updates the email status.
In this way you can check the status of any email at any given time, and you can even resend any email. The table needs to be purged/archived after some time, maybe after 1 day or 1 week, depending on table size. A minimal sketch of such a poller follows.
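A sketch of that polling process using Spring's @Scheduled and JavaMailSender (assuming @EnableScheduling is active in the application); the mail_queue table, its columns, and the status values are assumptions, not a fixed design:

import java.util.List;
import java.util.Map;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.mail.SimpleMailMessage;
import org.springframework.mail.javamail.JavaMailSender;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class MailQueuePoller {

    private final JdbcTemplate jdbc;
    private final JavaMailSender mailSender;

    public MailQueuePoller(JdbcTemplate jdbc, JavaMailSender mailSender) {
        this.jdbc = jdbc;
        this.mailSender = mailSender;
    }

    // Runs every 10 minutes: reads pending rows, sends them, and records the status.
    @Scheduled(fixedRate = 10 * 60 * 1000)
    public void sendPendingMails() {
        List<Map<String, Object>> rows = jdbc.queryForList(
                "SELECT id, recipient, subject, body FROM mail_queue WHERE status = 'PENDING'");
        for (Map<String, Object> row : rows) {
            Number id = (Number) row.get("id");
            try {
                SimpleMailMessage msg = new SimpleMailMessage();
                msg.setTo((String) row.get("recipient"));
                msg.setSubject((String) row.get("subject"));
                msg.setText((String) row.get("body"));
                mailSender.send(msg);
                jdbc.update("UPDATE mail_queue SET status = 'SENT' WHERE id = ?", id);
            } catch (RuntimeException e) {
                jdbc.update("UPDATE mail_queue SET status = 'FAILED' WHERE id = ?", id);
            }
        }
    }
}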
