Background
We have a web server written in Java that communicates with thousands of mobile apps via HTTPS REST APIs.
For investigation purposes we have to log all API calls - currently this is implemented as an @Aspect, and for each API call we save an api_call_log object into a MySQL table with the following attributes:
tenant_id
username
device_uuid
api_method
api_version
api_start_time
api_processing_duration
request_parameters
full_request (JSON)
full_response (JSON)
response_code
Problem
As you can imagine, after reaching a certain throughput this solution doesn't scale well, and querying the table is very slow even with the right MySQL indices in place.
Approach
That's why we want to re-implement this solution with the Elastic Stack; however, I am a bit stuck at the moment.
Question
I couldn't find any Logstash plugin yet that would suit my needs - should I output this api_call_log object into a log file instead and use Logstash to parse, filter and transform that file?
That is exactly what I would do in this case. Write your log to a file using a framework like logback and rotate it. If you want easy parsing, use JSON as the logging format (also available in logback). Then use Filebeat to ingest the logfile as it gets written. If you need to transform/parse the messages, do that in Elasticsearch ingest nodes using pipelines.
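For illustration, here is a minimal sketch of what the application side could look like, assuming SLF4J/logback and Jackson are on the classpath; the field names mirror the attributes listed in the question, and the appender/rotation setup would live in logback.xml (alternatively, logstash-logback-encoder can emit JSON for you):

```java
// Hedged sketch: write one JSON document per line for each API call, so Filebeat
// (and an ingest pipeline, if needed) can parse it trivially. The logger name,
// DTO and field set are placeholders mirroring the attributes above.
import com.fasterxml.jackson.databind.ObjectMapper;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.LinkedHashMap;
import java.util.Map;

public class ApiCallLogger {
    private static final Logger LOG = LoggerFactory.getLogger("api-call-log");
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public void log(ApiCallLog call) throws Exception {
        Map<String, Object> line = new LinkedHashMap<>();
        line.put("tenant_id", call.tenantId);
        line.put("username", call.username);
        line.put("device_uuid", call.deviceUuid);
        line.put("api_method", call.apiMethod);
        line.put("api_version", call.apiVersion);
        line.put("api_start_time", call.apiStartTime);
        line.put("api_processing_duration", call.apiProcessingDuration);
        line.put("response_code", call.responseCode);
        // request_parameters, full_request and full_response omitted here for brevity.
        LOG.info(MAPPER.writeValueAsString(line));
    }

    // Placeholder DTO mirroring the attributes listed in the question.
    public static class ApiCallLog {
        public String tenantId, username, deviceUuid, apiMethod, apiVersion;
        public long apiStartTime, apiProcessingDuration;
        public int responseCode;
    }
}
```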
Consider tagging/enriching the log lines read by Filebeat with machine- or environment-specific information so you can filter on it in your visualisations, reports, etc.
The Filebeat-to-Elasticsearch approach is the simplest one. Try this first. If you can't get your parsing done in Elasticsearch pipelines, put a Logstash in between.
Using Filebeat you get a lot of things for free, like backpressure handling and daily indices, which come in very handy in the logging scenario we are discussing here.
When you need a visualisation or search UI, have a look at Kibana or Grafana.
And if you have more questions, raise a new question here.
Have Fun!
https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-installation.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/ingest.html
Related
Currently we have several Java microservice apps that use Elasticsearch, and for debugging purposes we have the logging level set to tracer. This outputs all ES requests and responses to the logs. We really only need the requests, and only on non-production. For all environments we want to keep the search response times along with a custom header that we set for tracking purposes across multiple microservice apps.
I see that in .NET there is a custom solution that would work perfectly for us: https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/logging-with-on-request-completed.html#logging-with-on-request-completed but sadly I can't seem to find a matching Java feature.
Is there a way to do this using Java?
If I got your question correctly, then you want the following:
Log only every Elasticsearch query (and not the response) from the different microservices.
You just want it on your test clusters.
There is a workaround in Elasticsearch for this. Elasticsearch itself logs the queries made to it; you just need to set a threshold for it. Any query that takes more time than that threshold is logged to a separate slow-log file in your logs folder. Simply set the threshold to 0 to log every query (and only the query), and again, this can be done only in your testing environments for your particular use case.
There are a lot of configuration options, and I would recommend you check this: https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-slowlog.html
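As a hedged sketch of how that threshold could be set from Java (assuming the Java High Level REST Client and an index named my-index, both placeholders; newer client versions expose this slightly differently):

```java
// Hedged sketch: set the search slow-log thresholds to 0 so every query is logged.
// Only do this on test clusters, as discussed above.
import org.apache.http.HttpHost;
import org.elasticsearch.action.admin.indices.settings.put.UpdateSettingsRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.settings.Settings;

public class EnableSlowLog {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            UpdateSettingsRequest request = new UpdateSettingsRequest("my-index")
                    .settings(Settings.builder()
                            // A 0s threshold means every query/fetch phase gets logged.
                            .put("index.search.slowlog.threshold.query.debug", "0s")
                            .put("index.search.slowlog.threshold.fetch.debug", "0s")
                            .build());
            client.indices().putSettings(request, RequestOptions.DEFAULT);
        }
    }
}
```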
I saw already-answered questions, but they are old enough that I couldn't use them. I tried the example given at https://www.elastic.co/blog/found-java-clients-for-elasticsearch, which has the code written, but not in an organized manner that would help me. The libraries are old and the code gives me errors.
I saw the Spring Data project, but it only allows a specific type of document/class to be indexed and needs the model to be predefined, which is not my use case. My goal is to build a Java web application which could ingest any data documents fed to Elasticsearch so we could analyze them with Kibana. I need to know how I can fire a REST call or curl for bulk data. Can anyone show an example with complete parts, please?
Use the REST client.
The Java High Level REST Client works on top of the Java Low Level REST client. Its main goal is to expose API specific methods, that accept request objects as an argument and return response objects, so that request marshalling and response un-marshalling is handled by the client itself.
To upload data from a Java application to ES, you can use the Bulk API.
For the full list of APIs, check the link.
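To make the Bulk API mention above concrete, here is a hedged sketch assuming the Java High Level REST Client (deprecated in newer Elasticsearch versions, so adjust for your client); the index name "documents" and the sample JSON strings are placeholders:

```java
// Hedged sketch: bulk-index arbitrary JSON documents without a predefined model class.
import org.apache.http.HttpHost;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;

import java.util.List;

public class BulkIndexer {
    public static void main(String[] args) throws Exception {
        List<String> jsonDocs = List.of(
                "{\"title\":\"doc one\",\"value\":1}",
                "{\"title\":\"doc two\",\"value\":2}");

        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            BulkRequest bulk = new BulkRequest();
            for (String json : jsonDocs) {
                // The source is plain JSON, so any document shape can be ingested.
                bulk.add(new IndexRequest("documents").source(json, XContentType.JSON));
            }
            BulkResponse response = client.bulk(bulk, RequestOptions.DEFAULT);
            System.out.println("Bulk had failures: " + response.hasFailures());
        }
    }
}
```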
I have a Java class that sends HTTP POST requests to a Solr instance to index JSON files. It is implemented in a multithreaded manner. However, I have realized that sending so many HTTP requests (close to 20,000) is causing the network to become a bottleneck. I read online that I can do batch indexing, but I can't find any clear examples. Is there any advice on how to batch-index in Solr?
Thank you.
For generic JSON, you must have a configuration somewhere in solrconfig.xml that defines how it is treated.
One of the parameters is split. You might be able to use it to combine your JSON documents into one bigger document that Solr would split and process separately. Notice that the specific format may differ slightly between Solr versions; get the reference guide PDF for the correct version if something is not working.
Or, if you can generate it, use a JSON format Solr understands directly, which has full support for multiple documents.
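As a hedged sketch of that second option, the snippet below posts several documents in Solr's own JSON document format in a single request, using Java 11's built-in HttpClient; the collection name, field names and commit parameter are placeholders to adapt:

```java
// Hedged sketch: send many documents in one batched request instead of one HTTP call per file.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SolrBatchIndexer {
    public static void main(String[] args) throws Exception {
        // A JSON array in Solr's document format carries multiple documents per request.
        String batch = "["
                + "{\"id\":\"1\",\"title\":\"first doc\"},"
                + "{\"id\":\"2\",\"title\":\"second doc\"}"
                + "]";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8983/solr/mycollection/update?commit=true"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(batch))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + ": " + response.body());
    }
}
```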
I'm processing info in Google Cloud Dataflow. We tried to use JPA to insert or update the data in our MySQL database, but these queries shut down our server. So we've decided to change our path...
I want to generate a MySQL .sql file so we can write out the new info processed through Dataflow. I want to know if there is an implemented way to do so, or whether I have to do this by myself.
Let me explain a little more: we have an input from an XML file, and we process the info into Java classes. We have a JSON dump of the DB, so we can see what we have online without making so many calls. With this in mind, we compare the new info with the info we already have and decide whether it's new or just an update.
How can I do this via Java/Maven? I need code to generate this file...
Yes, Cloud Dataflow processes data in parallel on many machines. As such, it is not very surprising that other services may not be able to keep up or that some quotas are hit.
Depending on your specific use case, you may be able to slow/throttle Dataflow down without changing your approach. One might limit the number of workers, limit parallelism, use the IntraBundleParallelization API, etc. This might be a better path overall. We are also working on more explicit ways to throttle Dataflow.
Now, it is not really feasible for any system to automatically generate a .sql file for your database. However, it should be pretty straightforward to use primitives like ParDo and TextIO.Write to generate such a file via a Dataflow pipeline.
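A minimal sketch of that ParDo/TextIO.Write idea, written against the Apache Beam Java SDK (the successor of the Dataflow SDK); the record format, table and column names are made up for illustration:

```java
// Hedged sketch: turn processed records into INSERT statements and write them to a .sql file.
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

import java.util.Arrays;

public class GenerateSqlFile {
    public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        pipeline
                // In a real job this would come from your XML-parsing step.
                .apply(Create.of(Arrays.asList("alice|42", "bob|7")))
                // Map each record to a single SQL statement (no escaping shown here).
                .apply(MapElements.into(TypeDescriptors.strings()).via(record -> {
                    String[] parts = record.split("\\|");
                    return String.format(
                            "INSERT INTO users (name, score) VALUES ('%s', %s);",
                            parts[0], parts[1]);
                }))
                // withoutSharding() produces one output file instead of many shards.
                .apply(TextIO.write().to("output/statements.sql").withoutSharding());

        pipeline.run().waitUntilFinish();
    }
}
```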
I have a Java client that allows indexing documents on a local ElasticSearch server.
I now want to build a simple Web UI that allows users to query the ES index by typing in some text in a form.
My problem is that, before calling ES APIs to issue the query, I want to preprocess the user input by calling some Java code.
What is the easiest and "cleanest" way to achieve this?
Should I create my own APIs so that the UI can access my Java code?
Should I build the UI with JSP so that I can directly call my Java code?
Can I somehow make ElasticSearch execute my Java code before the query is executed? (Perhaps by creating my own ElasticSearch plugin?)
In the end, I opted for the simple solution of using Json-based RESTful APIs. Time proved this to be quite flexible and effective for my case, so I thought I should share it:
My Java code exposes its ability to query an ElasticSearch index by running an HTTP server and responding to client requests with JSON-formatted ES results. I created the HTTP server with a few lines of code, using the JDK's built-in com.sun.net.httpserver.HttpServer. There are more serious/complex HTTP servers out there (such as Tomcat), but this was very quick to adopt and caused zero configuration headaches.
My Web UI makes HTTP GET requests to the Java server, receives JSON-formatted data and consumes it happily. My UI is implemented in PHP, but any web language does the job, as long as you can issue HTTP requests.
This solution works really well in my case, because it lets me avoid any dependency on ES plugins. I can do any sort of pre-processing before calling ES, and even post-process the ES output before sending the results back to the UI.
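For completeness, here is a hedged sketch of that setup, assuming the JDK's built-in com.sun.net.httpserver.HttpServer and a plain HTTP call to a local Elasticsearch; the endpoint path, index name and the trivial pre-processing step are placeholders:

```java
// Hedged sketch: a tiny HTTP endpoint that pre-processes the user's input,
// forwards it to Elasticsearch, and returns the JSON result to the Web UI.
import com.sun.net.httpserver.HttpServer;

import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class SearchProxy {
    public static void main(String[] args) throws Exception {
        HttpClient es = HttpClient.newHttpClient();
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);

        server.createContext("/search", exchange -> {
            // Very naive query-string handling for illustration: /search?q=some+text
            String raw = exchange.getRequestURI().getQuery();
            String userInput = raw == null ? "" : raw.replaceFirst("^q=", "");

            // Placeholder pre-processing: trim and lower-case before querying ES.
            String processed = userInput.trim().toLowerCase();

            HttpRequest query = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:9200/my-index/_search?q="
                            + URLEncoder.encode(processed, StandardCharsets.UTF_8)))
                    .GET()
                    .build();

            String body;
            try {
                body = es.send(query, HttpResponse.BodyHandlers.ofString()).body();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }

            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start();
    }
}
```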
Depending on the type of pre-processing, you can create an Elasticsearch plugin as a custom analyser or custom filter: you essentially extend the appropriate Lucene class(es) and wrap everything into an Elasticsearch plugin. Once the plugin is loaded, you can configure the custom analyser and apply it to the relevant fields. There are a lot of analysers and filters already available in Elasticsearch, so you might want to have a look at those before writing your own.
Elasticsearch plugins: https://www.elastic.co/guide/en/elasticsearch/reference/1.6/modules-plugins.html (a list of known plugins at the end)
Defining custom analysers: https://www.elastic.co/guide/en/elasticsearch/guide/current/custom-analyzers.html