So, I have an application that extracts keywords from an Elasticsearch document. I need to somehow run this application when my Elasticsearch receives a new document to index, so that the generated keywords get registered and stored with the document. Is there any way to create a plugin that extracts the keywords as soon as the document arrives?
"I need to somehow run this application when my elastic search receives a new document to index"
What you are describing requires a combination of an ingest pipeline and an ingest plugin with a listener for indexing events.
https://www.elastic.co/guide/en/elasticsearch/reference/current/ingest.html
The ingest pipeline lets you manipulate the incoming JSON, and the listener provides a hook for you to trigger any code you want.
Elastic.co doesn't always provide detailed documentation for these kinds of things, but check out the examples they have on their GitHub:
https://github.com/elastic/elasticsearch/tree/master/plugins
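For illustration, the pipeline definition that ties such a plugin into indexing might look like the sketch below, registered via PUT _ingest/pipeline/keywords. Here keyword_extractor is the hypothetical processor type your plugin would register, and the field names are placeholders:

```json
{
  "description": "Run keyword extraction on each incoming document",
  "processors": [
    {
      "keyword_extractor": {
        "source_field": "body",
        "target_field": "keywords"
      }
    }
  ]
}
```

Index requests then reference the pipeline with ?pipeline=keywords (or you set it as the index's default_pipeline), so the keywords are written into the document before it is indexed.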
I am not aware of an Elasticsearch plugin that does this, but I'd rather recommend using Logstash: configure the Elasticsearch input plugin to listen for updates in your cluster and use other plugins to react accordingly. To enable Logstash to communicate with your application, one idea is to add a REST endpoint to your app and have Logstash send requests to that endpoint using the Http output plugin.
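A sketch of the Logstash pipeline just described; the index name, query, schedule, and endpoint URL are placeholders you would adapt:

```
input {
  elasticsearch {
    hosts    => ["localhost:9200"]
    index    => "documents"
    query    => '{ "query": { "match_all": {} } }'
    schedule => "* * * * *"   # poll the cluster every minute
  }
}
output {
  http {
    url         => "http://your-app:8080/keywords"
    http_method => "post"
    format      => "json"
  }
}
```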
Using Spring-boot-actuator API I need to count the number of API hits per clientID. How can I achieve this? Another challenge is my application is deployed on AWS and Azure. At any time I want to know the total API hit count across all environments.
There are multiple ways to do it. You can use a tool like New Relic to capture that; it uses a Java agent bound to each API call.
Another option is to use your logging system to push logs, then aggregate and visualize them with Splunk or Kibana; there you can create a dashboard based on the logs to track API hits.
You can also implement your own approach, such as an API interceptor or ControllerAdvice that sends each request hit to a separate async thread. But then you have to implement real-time aggregation of those hits yourself.
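As a sketch of the counting core such an interceptor or ControllerAdvice could delegate to (the class and method names here are made up for illustration; cross-environment totals would still need an external shared store):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Thread-safe in-memory hit counter keyed by clientID. An interceptor's
// preHandle (or a ControllerAdvice) would call record() on every request.
public class ApiHitCounter {
    private final Map<String, LongAdder> hits = new ConcurrentHashMap<>();

    // Record one hit for the given client.
    public void record(String clientId) {
        hits.computeIfAbsent(clientId, k -> new LongAdder()).increment();
    }

    // Current count for one client (0 if never seen).
    public long count(String clientId) {
        LongAdder adder = hits.get(clientId);
        return adder == null ? 0 : adder.sum();
    }

    // Total hits in this instance; the AWS and Azure deployments would
    // each report this to a shared store for a global total.
    public long total() {
        return hits.values().stream().mapToLong(LongAdder::sum).sum();
    }
}
```

This counts only within one JVM; the real-time aggregation across environments mentioned above remains a separate concern.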
I am new to Solr and I am using a Java application to list the collections using the CollectionAdminRequest API. But it fails when the server is in standalone mode.
I want to know if there is any API that works for both modes, or any API to find out which mode the Solr server is running in, so that we can use the corresponding API based on that.
The Collections Admin API has far more relevant parameters than the regular CoreAdmin API has, so as far as I know there is no common API you can use.
You can however determine the current mode by making a request to /solr/admin/info/system?wt=json. The JSON response will include a mode parameter:
"mode":"solrcloud",
This will tell you if Solr is running in cloud mode or standalone. Using your browser's developer tools' network pane is usually a good way to see all the information Solr exposes about itself through its API.
Once on your Solr Administration User Interface, just make the following request in the browser:
http://enteryourservername:enteryourportno/solr/admin/info/system?wt=json
The response returned will look something like this (look out for the value of mode, which will be std for standalone and solrcloud for cloud-based):
{"responseHeader":{"status":0,"QTime":0},"mode":"std",....
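From Java, the same check can be sketched with only the JDK. The host and port are placeholders, and the mode lookup is a naive string scan; a real application would use a JSON library:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SolrModeCheck {

    // Naive extraction of the "mode" value from the system-info JSON.
    static String extractMode(String json) {
        int i = json.indexOf("\"mode\":\"");
        if (i < 0) return null;
        int start = i + "\"mode\":\"".length();
        return json.substring(start, json.indexOf('"', start));
    }

    // true for "solrcloud", false for "std" (standalone).
    static boolean isCloud(String json) {
        return "solrcloud".equals(extractMode(json));
    }

    // Calls /solr/admin/info/system on the given server, e.g.
    // fetchSystemInfo("http://localhost:8983").
    static String fetchSystemInfo(String baseUrl) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create(baseUrl + "/solr/admin/info/system?wt=json")).build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

Your application can then branch to CollectionAdminRequest or CoreAdminRequest depending on what isCloud returns.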
I have a web application running in EC2 instance. It has different API endpoints. I want to count the number of times each API is called. The web application is in Java.
Can anyone suggest some articles where I can find a proper Java implementation for integrating statsd with CloudWatch?
Refer to their docs page, https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-custom-metrics-statsd.html, which covers publishing the metrics. For the client side, see https://github.com/etsy/statsd/wiki#client-implementations.
Usually I follow a simpler approach without statsd: log the events to a file and sync the file to CloudWatch. In CloudWatch you can configure metric filters and, based on those filters, increment the custom metrics.
1. Install the CloudWatch Agent on your EC2 instance.
2. Locate and open the CW Agent config file.
3. Add a statsd section to the config file (JSON format):

{
  ....,
  "statsd": {
    "metrics_aggregation_interval": 60,
    "metrics_collection_interval": 10,
    "service_address": ":8125"
  }
}
The AWS CloudWatch agent is smart enough to understand custom tags, which lets you correctly split statistics gathered from different API methods ("correctly" here means splitting API method stats by dimension name, not by metric name). So you need a Java client library that supports tags, for example the DataDog client.
Configure the client instance as explained in the package documentation, and that's it. Now you can do things like this at the beginning of each of your REST API operations:
statsd.count("InvocationCount", 1, "host:YOUR-EC2-INSTANCE-NAME", "operation:YOUR-REST-OPERATION-NAME");
CloudWatch will handle everything else automatically. You will be able to see your metrics data flowing in the AWS CloudWatch Console under the "CWAgent" namespace. Please be aware that the average delay between a statsd client call and the data becoming visible in the CW Console is about 10-15 minutes.
Manually writing statsd calls in each REST API operation may not be a good idea. Decorators will help you instrument it automatically with just a few lines of code.
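As an illustration of that idea in plain Java, a JDK dynamic proxy can count every call before delegating. In a Spring app you would more likely use an @Aspect, but the principle is the same; the OrderApi interface is hypothetical:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicLong;

public class CountingProxy {

    // Hypothetical REST operation interface.
    public interface OrderApi {
        String listOrders();
    }

    // Stand-in for the metric; the handler below is where the
    // statsd call from the snippet above would go.
    public static final AtomicLong invocationCount = new AtomicLong();

    // Wraps any OrderApi so each call is counted before delegating.
    public static OrderApi instrumented(OrderApi target) {
        InvocationHandler handler = (proxy, method, args) -> {
            invocationCount.incrementAndGet(); // would be a statsd call
            return method.invoke(target, args);
        };
        return (OrderApi) Proxy.newProxyInstance(
                OrderApi.class.getClassLoader(),
                new Class<?>[] { OrderApi.class },
                handler);
    }
}
```

The instrumentation lives in one place instead of being repeated in every operation.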
We have an API project written in Java using Spring and Hibernate, reading data from MySQL. Recently we added BigQuery as another data source, so we want to allow users to call certain APIs to query data from BigQuery (the order count from the orders table in BigQuery) using the BigQuery Java client. I looked at the GitHub sample, but I am not clear on how to set it up and access it.
I also found this link, but I am not sure about registering a web application, etc. Please advise.
Update: Imagine this as a web application which shows the count of orders in the last 5 days when I select a merchant, but the orders table is in BigQuery. My web application calls the Java API layer, which calls BigQuery via the client library and populates the response as JSON; the web application then receives the count of orders.
I feel the hiccup is in authenticating using GoogleCredential. I generated a new OAuth client ID, which provided a client_id and client_secret, but it is still not able to authenticate to the project and return results.
Thanks.
Since your question is fairly general, I believe what you need is to understand how Google's BigQuery works, how to set up the data, etc.
Once you have set up the data in BigQuery, you can access it by using the web UI or a command-line tool, or by making calls to the BigQuery REST API using a variety of client libraries such as Java, .NET or Python.
You also haven't mentioned whether you have gone through the basics.
I hope this link will be helpful for understanding how to import data into BigQuery, setting up the data, querying, etc.
Use Service Accounts to connect to your BQ.
And please be aware that the response time will be 2-3 seconds, as this is a big data tool, not a real-time database for web use. Not sure if that's how you want your web application to work; you may need to cache the number in your local database.
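To sketch the service-account route for the GoogleCredential hiccup from the question: with the google-cloud-bigquery client library on the classpath, the wiring looks roughly like this. The project ID, key path, and table name are placeholders, and this is an assumption-laden sketch, not the only way to authenticate:

```java
import com.google.auth.oauth2.GoogleCredentials;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.QueryJobConfiguration;

import java.io.FileInputStream;

public class OrderCountClient {

    public static long orderCountLastFiveDays() throws Exception {
        // Service-account key JSON downloaded from the Google Cloud
        // Console; this replaces the OAuth client_id/client_secret flow.
        GoogleCredentials credentials;
        try (FileInputStream key =
                new FileInputStream("/path/to/service-account-key.json")) {
            credentials = GoogleCredentials.fromStream(key);
        }

        BigQuery bigquery = BigQueryOptions.newBuilder()
                .setProjectId("your-project-id")
                .setCredentials(credentials)
                .build()
                .getService();

        QueryJobConfiguration query = QueryJobConfiguration.newBuilder(
                "SELECT COUNT(*) AS order_count FROM `your_dataset.orders` "
                        + "WHERE order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 5 DAY)")
                .build();

        for (FieldValueList row : bigquery.query(query).iterateAll()) {
            return row.get("order_count").getLongValue();
        }
        return 0;
    }
}
```

The Java API layer would call this and serialize the count as JSON for the web application; given the 2-3 second latency noted above, caching the result locally is advisable.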
I must create a small IoT platform based on Spring Boot/Java 8.
Context: devices send various pieces of information to the platform. I must save them and later consume them in an analysis algorithm.
Constraint: I want it all to be async, and the platform must be based on Java 8/Spring technologies or be easy to integrate into a Spring Boot app.
What do I imagine? I thought of sending the devices' information to an async Spring REST controller and saving it asynchronously in MongoDB.
I already have the analysis algorithm, based on the Google Guava EventBus.
To sum up, I have data from devices in a MongoDB database and an algorithm based on Java POJOs; the missing part is transforming the data from the devices into Java POJOs.
Which technologies can I use to do that? Spring Reactor? RxJava? Something else? And how can I put this in place?
I am looking for something simple to set up that can scale easily, for example by duplicating instances. For the moment, I think the Spring Cloud technologies are a bit too big for my purpose.
You should have a look at the Spring XD engine.
Spring XD enables different sources (HTTP, FTP, MQTT, file, etc.), transformers, filters, and sinks (HTTP, FTP, MQTT, file, etc.).
Please check this post on a small IoT project based on Spring XD and the Twitter API.
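For example, a stream wiring an MQTT source to a MongoDB sink could be created from the Spring XD shell roughly like this (the stream name and option values are placeholders, and the exact option names should be checked against the XD module docs):

```
xd:> stream create --name deviceIngest --definition "mqtt --topics=devices/+ | mongodb --collectionName=deviceData" --deploy
```

Analysis could then read from that collection, or be inserted as a processor module between the source and the sink.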