I am performing a query + reindex on the fly in Java 8 using Elasticsearch 6.2 on AWS. I interface with the ES cluster through a Jest client, using the Java APIs provided by the client's library. I return the results of the query to the user and use those results to start a reindex operation in the background for later use. The reindex operation can be semi-long running and take more than a few seconds. I obviously know what my new index name is, but since I'm working in a stateless application on a server, I cannot save the task ID returned from the reindex API to query later; I need to look it up by other means. Doing a little research, I came across this API call in Kibana:
GET /_tasks?actions=*reindex
which will return all tasks that are currently reindexing, or an empty list. From there, I can get the parent task ID and query it for status. This may be a problem as I might have more than one reindex operation happening on the ES cluster at once.
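For reference, a rough sketch of how I could do that lookup from Java. This goes straight at the cluster's HTTP endpoint rather than through the Jest client, and the host and the idea of filtering on each task's description (which contains the source and destination index names) are assumptions on my part:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ReindexTaskLookup {

    // Hypothetical cluster endpoint; in my setup this would be the AWS ES domain URL.
    private static final String ES_URL = "http://localhost:9200";

    /**
     * Returns the raw JSON of all currently running reindex tasks.
     * The response can then be filtered on each task's "description",
     * which names the source and destination indices, to find the
     * reindex targeting a specific new index.
     */
    public static String findReindexTasks() throws Exception {
        URL url = new URL(ES_URL + "/_tasks?actions=*reindex&detailed=true");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line);
            }
        }
        return body.toString();
    }
}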
Is there a more intelligent or straightforward approach to my problem?
I'm integrating BMC Remedy and JIRA to solve a problem.
Task: I run a REST service that reads BMC Remedy and automatically raises JIRA issues for any records of type "hotFix". Basically, a few fields from BMC are mapped into JIRA when the issues are created.
Problem: Because the Remedy API accepts only one search criterion ("hotFix" in my case), every time my service runs it fetches all records of type "hotFix", including the ones I've already created JIRAs for, which is expected. I now need to resolve this because I don't want to raise duplicate JIRAs for them.
I don't want to store all of this in a database, mainly because of the infrastructure cost.
Is there any way I can import this data without creating duplicates?
In your service, before creating a JIRA ticket (I assume it's an API call), check if one already exists by using JIRA's GET/search API.
Given your constraints on querying BMC Remedy, this extra call to JIRA to check for a duplicate seems like an option.
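A rough sketch of such a pre-check against JIRA's REST search endpoint; the base URL, credentials, project key, and the JQL that looks for the Remedy incident number in the summary are all assumptions for illustration:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.Base64;

public class JiraDuplicateCheck {

    private static final String JIRA_URL = "https://jira.example.com"; // hypothetical
    private static final String AUTH = Base64.getEncoder()
            .encodeToString("user:apiToken".getBytes());               // hypothetical credentials

    /**
     * Returns true if a JIRA issue already references the given Remedy incident.
     * Assumes the incident number appears in the issue summary; adjust the JQL
     * to match wherever you actually keep it.
     */
    public static boolean issueExistsFor(String remedyIncidentId) throws Exception {
        String jql = URLEncoder.encode(
                "project = OPS AND summary ~ \"" + remedyIncidentId + "\"", "UTF-8");
        URL url = new URL(JIRA_URL + "/rest/api/2/search?maxResults=1&jql=" + jql);

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Authorization", "Basic " + AUTH);
        conn.setRequestProperty("Accept", "application/json");

        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line);
            }
        }
        // Crude check: the search response contains a "total" count of matches.
        return !body.toString().contains("\"total\":0");
    }
}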
Okay! I'm using a flat file.
As an alternative solution, I've used a flat file to store the "date created" of the last Remedy incident with the "HotFix" label (only one record, which gets updated every time my service runs and finds new Remedy incidents). When fetching data from Remedy, I order it by "date created" and store the most recent date in this file; that date then serves as the comparison point the next time my service runs, telling me that JIRAs up to that date/time have already been created.
This has resolved my issue.
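A rough sketch of the checkpoint file handling, with a made-up file location:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;

public class HotfixCheckpoint {

    // Hypothetical location for the single-record checkpoint file.
    private static final Path CHECKPOINT = Paths.get("/var/app/hotfix-last-created.txt");

    /** Reads the "date created" of the last processed Remedy incident, or EPOCH if none. */
    public static Instant readLastProcessed() throws IOException {
        if (!Files.exists(CHECKPOINT)) {
            return Instant.EPOCH;
        }
        String text = new String(Files.readAllBytes(CHECKPOINT), StandardCharsets.UTF_8).trim();
        return text.isEmpty() ? Instant.EPOCH : Instant.parse(text);
    }

    /** Overwrites the checkpoint with the newest "date created" seen in this run. */
    public static void writeLastProcessed(Instant newest) throws IOException {
        Files.write(CHECKPOINT, newest.toString().getBytes(StandardCharsets.UTF_8));
    }
}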
I'm currently indexing webpages into Elasticsearch. The indexing is done through Java (Spring) and also through Apache Nutch.
I've run into a situation where I have to call an external API just after indexing or updating a document in Elasticsearch. The API processes a field value from the document and stores the processed result in another field of the same index. I tried making the API call just before indexing, and it hurts indexing performance (it takes too much time). I need to call the external API without affecting the indexing or updating of Elasticsearch documents.
Looking for some ideas.
I'm using elasticsearch version 5.6.3.
At the moment ES doesn't support a "notification system" like the one you need (https://discuss.elastic.co/t/notifications-from-elasticsearch-when-documents-are-added/5106/31); this is impractical in most cases due to the distributed nature of ES.
I think the easier approach would be to push into a queue (Kafka/RabbitMQ) and have your ES indexer be a worker on that queue. That worker is then the ideal place to send a message to a different queue indicating that document X is ready for enrichment (adding more metadata). In this case you don't have to worry about slowing down the indexing speed of your system (you can add more ES indexers). You also don't need to query ES constantly to enrich your documents, because you can send the field (or fields) that are needed, along with the ES id, to the enrichment workers, and they can update the document directly after the call to the external API. Keep in mind that perhaps part of this could be wrapped in a custom ES plugin.
The advantage of this is that you can scale both parts (ES indexer / metadata enricher) separately.
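A rough sketch of such an enrichment worker, assuming a Kafka topic that carries the ES document id as the key and the field value as the message body, a recent kafka-clients dependency, and a direct partial update back to ES 5.6 over HTTP (topic, index/type names, and the external API call are placeholders):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class EnrichmentWorker {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");          // placeholder
        props.put("group.id", "enrichment-workers");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("docs-to-enrich")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    String docId = record.key();          // ES document id
                    String rawField = record.value();     // field value to be processed
                    String enriched = callExternalApi(rawField);
                    partialUpdate(docId, enriched);
                }
            }
        }
    }

    // Placeholder for the external processing API.
    private static String callExternalApi(String value) {
        return value.toUpperCase();
    }

    // Partial update on ES 5.6: POST /{index}/{type}/{id}/_update with a "doc" body.
    private static void partialUpdate(String docId, String enriched) throws Exception {
        URL url = new URL("http://localhost:9200/webpages/page/" + docId + "/_update");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        String body = "{\"doc\":{\"enriched_field\":\"" + enriched + "\"}}";
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        conn.getResponseCode(); // force the request; real code should check for errors
    }
}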
Another option would be to have some external module that queries ES for a chunk of documents that haven't been enriched yet, calls the external API, and then updates the documents back into ES.
In my case, we used logstash -> Kafka -> logstash to write to ES. At the consumer end of Kafka, we invoked the external API to compute the new field, updated that in a POJO, and wrote it to ES. It has been running pretty well.
Note: you may also want to check whether the data computation done via the external API can itself be improved.
I have a use case in which my data is in MySQL.
For each new row inserted into MySQL, I have to perform analytics on the new data.
How I am currently solving this problem is:
My application is a Spring Boot application, in which I have a scheduler that checks for new rows in the database every 2 seconds.
The problem with the current approach is:
Even if there is no new data in the MySQL table, the scheduler still fires a MySQL query to check whether new data is available.
One way to solve this type of problem in any SQL database is triggers.
But so far I have not been successful in creating MySQL triggers that can call a Java-based Spring application, or even a simple Java application.
My question is:
Is there any better way to solve my use case above? I am even open to switching to another storage (database) system if one is built for this type of use case.
This fundamentally sounds like an architecture issue. You're essentially using a database as an API, which, as you can see, causes all kinds of issues. Ideally, this DB would be wrapped in a service that can manage notifying the systems that need to be notified. Let's look at a few different options going forward.
Continue to poll
You didn't outline what the actual issue is with your current polling approach. Is running the job when it's not needed causing a problem of some kind? I'd be a proponent of just leaving it unless you're interested in making a larger change.
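For reference, a minimal sketch of such a polling job in Spring (assumes scheduling is enabled via @EnableScheduling; the table and column names are placeholders):

import java.util.List;
import java.util.Map;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class NewRowPoller {

    private final JdbcTemplate jdbcTemplate;
    private long lastSeenId = 0; // in a real app, load this from somewhere durable

    public NewRowPoller(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // Runs every 2 seconds, mirroring the scheduler described in the question.
    @Scheduled(fixedDelay = 2000)
    public void pollForNewRows() {
        List<Map<String, Object>> rows = jdbcTemplate.queryForList(
                "SELECT id, payload FROM events WHERE id > ? ORDER BY id", lastSeenId);
        for (Map<String, Object> row : rows) {
            lastSeenId = ((Number) row.get("id")).longValue();
            runAnalytics(row);
        }
    }

    private void runAnalytics(Map<String, Object> row) {
        // placeholder for the actual analytics on the new data
    }
}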
Database Trigger
While I'm unaware of a way to launch a Java process directly from a DB trigger, you can do an HTTP POST from one. With that in mind, you could stage your batch job in a web app and have the trigger's POST launch the job when it fires.
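A sketch of the receiving side of that POST in Spring, with a hypothetical endpoint path and job interface:

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class TriggerWebhookController {

    private final AnalyticsJob analyticsJob;

    public TriggerWebhookController(AnalyticsJob analyticsJob) {
        this.analyticsJob = analyticsJob;
    }

    // The database trigger would POST here whenever a new row is inserted.
    @PostMapping("/hooks/new-row")
    public ResponseEntity<Void> onNewRow(@RequestBody String newRowId) {
        analyticsJob.runFor(newRowId); // ideally hand off to an async executor
        return ResponseEntity.accepted().build();
    }
}

// Hypothetical job interface; the implementation holds the actual analytics logic.
interface AnalyticsJob {
    void runFor(String rowId);
}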
Wrap existing datastore in a service
This is, IMHO, the best option. It allows there to be a system of record that provides an API that can be versioned, etc. Any logic around who to notify could also be encapsulated in this service.
Replace data store with something that allows for better notifications
Without any real information on what the data being stored is, it's hard to say how practical this is. But something such as Apache Kafka or Apache Geode would provide the ability to be notified when new data is persisted (Kafka by listening to the topic, Geode via a continuous query).
For the record, I'd advocate for the wrapping of the existing database in a service. That service would be the only way into the db and take on responsibility for any notifications required.
I have a Java backend system. We need to integrate with a third party, and I need to return results to a client. Currently we are using a view (SQL Server) that the third party writes to, and we keep a tracker of the unique ID somewhere.
I have a Spring-wired poller that runs every 10 minutes, returns everything from the last sent ID to the end, and updates the tracker table with the new ID. Nothing complicated.
I would like to know if there is a simple way to almost "listen" on the table. If any new rows are added, grab them and return them via my service. And if there is, is it advisable?
Disclaimer: I have not exactly done this, yet, and SQL Server is not my playground. However, combining triggers with non-SQL commands should be possible nowadays, and searching the internet for 'sql server notification' yields a section on 'Query Notifications in SQL Server':
https://msdn.microsoft.com/en-us/library/t9x04ed2(v=vs.110).aspx
In general, as long as it is possible to somehow send a command to a socket from inside a trigger then you could use a PUB-SUB message queue (RabbitMQ, NSQ etc.) to send a notification that you can retrieve in your Java program. Of course, you would have to install triggers on any columns that you want to monitor. Whether it is possible to monitor a schema (or database) for changes in general - this might only be possible if there is some logging inside the database that you have access to. This might not be there, out of the box. The trigger is probably the cleaner way because it resides in the schema/database itself and does not need to access system tables.
EDIT: also found this SO question/answer about socket connection inside a trigger: Creating socket inside a SQL-CLR trigger or stored procedure
You could use triggers on the view, but that won't help in Java land, as most DBs aren't going to let you run Java code there. Your choices are: 1) polling, as you are doing, or 2) creating a web service (or JMS queue or something) that the third party calls to push the insert/update of data. Then you are in Java land and Hibernate/Spring can handle the insert and do whatever processing you need.
You could use triggers on the tables, if performance is not an issue.
I want to have my PostgreSQL server send out notifications when a piece of data changes, preferably over JMS, but also considering any other Pub-Sub mechanism or Callback.
Any ideas if this is possible?
Are there any available Java Add-on Packages that replicate this sort of functionality?
EDIT: I've been informed that PostgreSQL does support stored procedures in Java. That means the following approach becomes feasible:
Essentially, the way I would go is to put a trigger on whatever it is you want to watch, and then call a stored procedure from that. The stored procedure then needs to communicate with the world outside the DB server; I once did an SP like this in Java that opened up a socket connection to a process on the same server listening on a port. If worst came to worst, you could maybe write a file and have something like imon monitoring that file, or you could start up a program in an exec() shell of its own... something like that.
The simplest approach is to use the LISTEN/NOTIFY interface: write your own program that connects to the database, issues some LISTENs, and does whatever you want when it gets a notification - for example, sends the information over JMS, or simply does what should be done directly, without adding an additional transport layer.
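A minimal sketch of such a listener using the PostgreSQL JDBC driver (channel name and connection details are made up); note that the driver only picks up notifications while the connection is being used, hence the periodic no-op query:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import org.postgresql.PGConnection;
import org.postgresql.PGNotification;

public class PgListener {

    public static void main(String[] args) throws Exception {
        // Placeholder connection details.
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "secret");
        PGConnection pgConn = conn.unwrap(PGConnection.class);

        try (Statement stmt = conn.createStatement()) {
            stmt.execute("LISTEN data_changed"); // channel used by the trigger's NOTIFY
        }

        while (true) {
            // Cheap no-op query so the driver reads pending notifications off the socket.
            try (Statement stmt = conn.createStatement()) {
                stmt.execute("SELECT 1");
            }
            PGNotification[] notifications = pgConn.getNotifications();
            if (notifications != null) {
                for (PGNotification n : notifications) {
                    // Forward n.getParameter() (the NOTIFY payload) over JMS,
                    // or handle the change directly here.
                    System.out.println("change notification: " + n.getParameter());
                }
            }
            Thread.sleep(500);
        }
    }
}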
You can certainly create a Java-language stored procedure and put it into PostgreSQL. But why not keep it simple and debuggable until you know you have your messaging scheme working perfectly? If I were doing this (I am actually doing something similar) here's what I'd do.
(1) create an "outbound message" table with columns for the payload and other info for your JMS messages. I'd put a timestamp column in each row.
(2) write a database trigger for each item that you want to generate a message. Have the trigger INSERT a row into your "outbound message" table.
(3) unit test (1) and (2) by looking at the contents of your outbound message table as you change stuff in your database that should generate messages.
(4) write yourself a simple but high-performance Java JDBC client program that will query this outbound message table, send a JMS message for each row, and then DELETE it (see the sketch after this list). Order the rows in your query by timestamp to preserve your message order. To get it to be high performance you'll need to do a good job with PreparedStatement objects and other aspects of heap management.
(5) Unit test (4) by running it a few times while message-generating changes are happening to your data base.
(6) set up this program to run several times a minute, while using a single persistent JDBC connection. Queries to a small or empty table aren't very expensive, so this won't smack down your table server.
(7) system test this whole setup.
(8) figure out how to start your Java program from your crontab or your startup script.
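A rough sketch of the relay program from step (4), assuming an ActiveMQ broker and made-up table/queue names:

import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.activemq.ActiveMQConnectionFactory;

public class OutboundMessageRelay {

    public static void main(String[] args) throws Exception {
        // Placeholder connection details for the database and the broker.
        java.sql.Connection db = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "secret");
        javax.jms.Connection jms =
                new ActiveMQConnectionFactory("tcp://localhost:61616").createConnection();
        jms.start();
        Session session = jms.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageProducer producer =
                session.createProducer(session.createQueue("db.notifications"));

        // Reused statements, per the advice about PreparedStatement hygiene.
        PreparedStatement select = db.prepareStatement(
                "SELECT id, payload FROM outbound_message ORDER BY created_at");
        PreparedStatement delete = db.prepareStatement(
                "DELETE FROM outbound_message WHERE id = ?");

        try (ResultSet rs = select.executeQuery()) {
            while (rs.next()) {
                long id = rs.getLong("id");
                TextMessage msg = session.createTextMessage(rs.getString("payload"));
                producer.send(msg);          // send the JMS message for this row...
                delete.setLong(1, id);
                delete.executeUpdate();      // ...then remove it from the outbound table
            }
        }

        jms.close();
        db.close();
    }
}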
When you get all this working you'll have a functioning messaging / notification system ready for systems integration. More importantly, you'll know exactly what you want your Java message-originating software to do. Once you're up and running, if the latency of your messages or the database overhead proves to be a big problem, then you can migrate your Java program into a stored procedure.
Note also that there are message-origination bindings for PERL and other languages built into the Apache ActiveMQ package, so you have some choices about how you implement your message-originating agent.
This approach happens to have two advantages: you aren't critically dependent on postgreSQL's distinctive stored-procedure scheme, and you aren't putting code with external communications dependencies into your table server.
Good luck.
If LISTEN/NOTIFY isn't accessible via JDBC, perhaps you could implement a long-polling HTTP comet-like mechanism via the LOCK statement, or plain "SELECT ... FOR UPDATE" and "SELECT ... FOR SHARE" or other similar queries from within a transaction that'd cause other transactions to block.
The message-writing party could, e.g., start a transaction, perform "SELECT ... FOR UPDATE", wait (in Java code) until either something changes or a timer expires (say, after 30 seconds or so), update the locked row to indicate whether data (elsewhere?) is available, and commit the transaction to unblock the others. Then repeat with a new transaction and "SELECT ... FOR UPDATE" immediately.
The message-reading party would perform a "SELECT ... FOR SHARE" which would block while a "SELECT ... FOR UPDATE" initiated elsewhere is active. It'd return an indication of message availability or the message data itself when the message-writing party's transaction ends.
Hopefully PostgreSQL queues the parties fairly, so that there's no risk of live-lock continuously blocking the message-reading party.
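A rough sketch of that lock-based handshake over JDBC, using a single shared "slot" row; the table and column names are made up for illustration:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class LockBasedLongPoll {

    // Writer side: hold the slot row locked until there is something to announce.
    public static void writerCycle(Connection conn, boolean dataAvailable) throws Exception {
        conn.setAutoCommit(false);
        try (Statement stmt = conn.createStatement()) {
            // Take the lock; readers doing FOR SHARE will now block.
            stmt.execute("SELECT id FROM notify_slot WHERE id = 1 FOR UPDATE");

            // ... here the Java code waits until data arrives or ~30 s elapse ...

            stmt.execute("UPDATE notify_slot SET has_data = " + dataAvailable + " WHERE id = 1");
        }
        conn.commit(); // releases the lock and wakes up blocked readers
    }

    // Reader side: blocks while the writer holds the FOR UPDATE lock.
    public static boolean readerPoll(Connection conn) throws Exception {
        conn.setAutoCommit(false);
        boolean hasData = false;
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT has_data FROM notify_slot WHERE id = 1 FOR SHARE")) {
            if (rs.next()) {
                hasData = rs.getBoolean("has_data");
            }
        }
        conn.commit();
        return hasData;
    }

    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "secret");
        System.out.println("message available: " + readerPoll(conn));
    }
}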
I would install PL/Java into Postgres and write a stored-procedure-based trigger for the data you are interested in, which then calls JMS when invoked. The PL/Java documentation covers the trigger + stored procedure part pretty nicely, btw.
I haven't used the JMS from the trigger code, but I'm pretty certain that there are no reasons why it wouldn't be doable, as this is standard Java code and my quick recheck on the documentation also didn't indicate anything suspicious.
Another possibility would be to call JMS through a proxy service using Perl, Python, or any other language that is available for Postgres stored procedure development. Since JMS doesn't have a standard wire protocol, you have to write a proxy service that does the translation.
Since the original question mentions JMS, consider using Apache ActiveMQ. ActiveMQ can use an SQL database for message persistence and supports Postgres in this way, see:
https://activemq.apache.org/jdbc-support
If you don't want to run the ActiveMQ broker as a separate service, it can be run in embedded mode as described here:
https://activemq.apache.org/how-do-i-embed-a-broker-inside-a-connection
Though I'm not sure if there are any limitations when running it embedded.
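For what it's worth, a rough sketch of starting the broker embedded with JDBC persistence pointed at Postgres (datasource details are placeholders, and the exact adapter wiring may vary by ActiveMQ version):

import org.apache.activemq.broker.BrokerService;
import org.apache.activemq.store.jdbc.JDBCPersistenceAdapter;
import org.postgresql.ds.PGSimpleDataSource;

public class EmbeddedBrokerExample {

    public static void main(String[] args) throws Exception {
        // Placeholder Postgres datasource used for message persistence.
        PGSimpleDataSource dataSource = new PGSimpleDataSource();
        dataSource.setUrl("jdbc:postgresql://localhost:5432/activemq");
        dataSource.setUser("user");
        dataSource.setPassword("secret");

        JDBCPersistenceAdapter persistence = new JDBCPersistenceAdapter();
        persistence.setDataSource(dataSource);

        BrokerService broker = new BrokerService();
        broker.setPersistenceAdapter(persistence);
        broker.addConnector("tcp://localhost:61616"); // external clients connect here
        broker.start();

        // Clients in the same JVM can also connect over the vm:// transport,
        // e.g. new ActiveMQConnectionFactory("vm://localhost").
    }
}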