Elasticsearch custom logger for Java

Currently we have several Java microservice apps that use Elasticsearch, and for debugging purposes we have the tracer logging enabled. This outputs all ES requests and responses to the logs. We really only need the requests, and only in non-production. For all environments we want to keep the search response times along with a custom header that we set for tracking purposes across multiple microservice apps.
I see that in .NET there is a custom solution that would work perfectly for us: https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/logging-with-on-request-completed.html#logging-with-on-request-completed but sadly I can't seem to find a matching Java feature.
Is there a way to do this using Java?

If I understood your question correctly, you want the following:
Log only the Elasticsearch queries (not the responses) from your different microservices.
You only want this on your test clusters.
There is a workaround in Elasticsearch for this. Elasticsearch itself can log the queries made to it; you just need to set a threshold, and any query that takes longer than that threshold is written to a separate slow log file (*_index_search_slowlog.log by default) in your logs folder. Simply set the threshold to 0 to log every query, and again, this can be enabled only in your test environments for your particular use case.
There are a lot of configuration options here, and I'd recommend checking this: https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-slowlog.html
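For example, with the Java low-level REST client you could drop the search slow log thresholds to 0 on your test indices. This is only a sketch; the host, index name and chosen log level are placeholders you would adapt:

import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class EnableSearchSlowlog {

    public static void main(String[] args) throws Exception {
        // Host and index name are placeholders; point this at your test cluster only.
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            Request request = new Request("PUT", "/my-index/_settings");
            // A 0ms threshold makes every search query show up in the search slow log.
            request.setJsonEntity(
                "{"
                + "\"index.search.slowlog.threshold.query.info\": \"0ms\","
                + "\"index.search.slowlog.threshold.fetch.info\": \"0ms\""
                + "}");
            Response response = client.performRequest(request);
            System.out.println(response.getStatusLine());
        }
    }
}

The same settings can of course be applied with a plain PUT /<index>/_settings call from curl or Kibana Dev Tools.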

Related

Is it possible to add a sequential number to each log event using Logback?

I have an ELK stack set up and my app sends logs using Logback. The problem is that I lose the event order when there are many logs with the same timestamp (many fast events in the same millisecond).
I would like to add a sequential number to keep the log events in order when there are many logs at the same time. Is that possible? How?
Thank you and sorry for my English.
EDIT:
Sorry, let me give more information about the scenario: I'm using a Spring Boot application which sends the logs to an ELK (Elasticsearch-Logstash-Kibana) stack.
I need to add a field with the sequence number so that I'll be able to order the logs in Kibana using that field. Currently Kibana orders the logs by the timestamp field, but sometimes there are too many logs at the same time, and those logs end up unordered.
You can use a custom log pattern. Please follow the link below.
https://reflectoring.io/logging-format-logback/
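If you go the custom pattern route, a small Logback converter backed by an AtomicLong is one way to get a sequence number into every event. This is only a sketch; the class name and the "seq" conversion word are placeholders:

import java.util.concurrent.atomic.AtomicLong;
import ch.qos.logback.classic.pattern.ClassicConverter;
import ch.qos.logback.classic.spi.ILoggingEvent;

// Emits a monotonically increasing number for every log event in this JVM.
public class SequenceNumberConverter extends ClassicConverter {

    private static final AtomicLong COUNTER = new AtomicLong();

    @Override
    public String convert(ILoggingEvent event) {
        return Long.toString(COUNTER.incrementAndGet());
    }
}

Register it in logback.xml with a conversionRule element (conversionWord="seq", converterClass pointing at the class above), add %seq to your encoder pattern, and the field will reach Logstash/Kibana along with the rest of the line. Note the counter is per JVM, which is enough to order events coming from a single Spring Boot instance.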

Elastic Stack - REST API logging with full JSON request and response

Background
We have a web server written in Java that communicates with thousands of mobile apps via HTTPS REST APIs.
For investigation purposes we have to log all API calls. Currently this is implemented as an @Aspect, and for each API call we save an api_call_log object into a MySQL table with the following attributes:
tenant_id
username
device_uuid
api_method
api_version
api_start_time
api_processing_duration
request_parameters
full_request (JSON)
full_response (JSON)
response_code
Problem
As you can imagine after reaching a certain throughput this solution doesn't scale well, and also querying this table is very slow even with the use of the right MySQL indices.
Approach
That's why we want to use the Elastic Stack to re-implement this solution, however I am a bit stuck at the moment.
Question
I couldn't find any Logstash plugins yet that would suit my needs. Should I output this api_call_log object into a log file instead and use Logstash to parse, filter and transform that file?
That is exactly what I would do in this case. Write your log to a file using a framework like Logback, and rotate it. If you want easy parsing, use JSON as the logging format (also available in Logback). Then use Filebeat to ingest the log file as it gets written. If you need to transform/parse the messages, do it in Elasticsearch ingest nodes using pipelines.
Consider tagging/enriching the log files read by Filebeat with machine- or environment-specific information so you can filter on it in your visualizations, reports, etc.
The Filebeat-to-Elasticsearch approach is the simplest one. Try this first. If you can't get your parsing done in Elasticsearch pipelines, put a Logstash in between.
Using Filebeat you'll get a lot of things for free, like backpressure handling and daily indices, which come in very handy in the logging scenario we are discussing here.
When you need a visualization or search UI, have a look at Kibana or Grafana.
And if you have more questions, raise a new question here.
Have Fun!
https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-installation.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/ingest.html
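As a rough illustration of the "one JSON document per log line" idea, your existing @Aspect could hand the api_call_log object to something like the sketch below. Jackson, the logger name and the generic POJO parameter are assumptions here; a library such as logstash-logback-encoder could do the JSON encoding for you instead:

import com.fasterxml.jackson.databind.ObjectMapper;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ApiCallLogger {

    // A dedicated logger name so the JSON lines can be routed to their own rolling file appender.
    private static final Logger LOG = LoggerFactory.getLogger("api-call-log");
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // apiCallLog would be the same POJO you currently persist to MySQL (tenant_id, username, ...).
    public void log(Object apiCallLog) {
        try {
            // One JSON document per line: easy for Filebeat to ship and for an ingest pipeline to decode.
            LOG.info(MAPPER.writeValueAsString(apiCallLog));
        } catch (Exception e) {
            LOG.warn("Could not serialize api_call_log entry", e);
        }
    }
}

Point that logger at its own rolling file appender in logback.xml and let Filebeat tail the resulting file.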

How to find unused endpoints on Spring REST application running on Tomcat?

We are running our Spring applications on Tomcat, and over a period of time we have added multiple REST endpoints to our application. We now want to trim it down and remove all the unused endpoints that our GUIs do not use any more.
We do use Splunk, but it will only give the number of hits on active endpoints, aggregated from Tomcat's localhost_access file. We want to find the endpoints that have 0 hits.
The most straightforward way would be to write some kind of Python script that collects all the endpoints from the Tomcat startup output (or is fed them manually) and puts them in a hash map, then goes over the localhost_access files in the Tomcat server logs for the last few months, incrementing a counter whenever the corresponding endpoint is seen, and finally prints out all the keys in the hash map with value 0.
Is the above a feasible way to do this, or is there an easier method?
Splunk is essentially a search engine and, like any other search engine, cannot find something that is not there. Endpoints with no hits will not have data in Splunk and so will not appear in search results.
The usual approach to a problem like this is to start with a list of known objects and subtract those that are found by Splunk. The result is a list of unused objects. You touched on this concept yourself with your hash map idea.
Create a CSV file containing a list of all of your endpoints. I'll call it endpoints.csv. Then use it in a search like this one:
index=foo endpoint=* NOT [ inputlookup endpoints.csv | format ]
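To build endpoints.csv for a Spring MVC app, you could dump the registered mappings from RequestMappingHandlerMapping. A sketch, assuming classic ant-style patterns (on newer Spring versions that use PathPatterns, getPatternsCondition() can be null and you would read the path patterns instead):

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import org.springframework.stereotype.Component;
import org.springframework.web.servlet.mvc.method.RequestMappingInfo;
import org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerMapping;

@Component
public class EndpointCsvDumper {

    private final RequestMappingHandlerMapping handlerMapping;

    public EndpointCsvDumper(RequestMappingHandlerMapping handlerMapping) {
        this.handlerMapping = handlerMapping;
    }

    // Writes one URL pattern per line, plus a header row matching the lookup field name.
    public void writeEndpointsCsv(Path target) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target)) {
            out.write("endpoint");
            out.newLine();
            for (RequestMappingInfo info : handlerMapping.getHandlerMethods().keySet()) {
                // May be null when the application is configured with PathPatterns.
                if (info.getPatternsCondition() != null) {
                    for (String pattern : info.getPatternsCondition().getPatterns()) {
                        out.write(pattern);
                        out.newLine();
                    }
                }
            }
        }
    }
}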
Another way to find unused endpoints: look at access.log over a few days and check which endpoints are being accessed. That will show you which endpoints go unused over a period of time.

Advanced logging for Java Web Applications

I want to build a more advanced logging mechanism for my java web applications, similar to App engine logs.
My needs are:
Stream logs to a database (for ex. sql, bigquery or something else)
Automatically log important data (like app context, request url, request id, browser user agent, user id, etc.)
For point 1, I can use a "buffering" implementation, where logs are put into different lists and a periodic cron (thread) gathers all the logs in memory and writes them to the database (which can also be on another server).
For point 2, the only way I found of doing this is to inject the needed objects into my classes (subsystems), like ServletContext, HttpServletRequest, the current user, etc., all modeled into a custom class (let's say AppLogContext), which can then be used by the logging mechanism.
The problem here is that I don't know if this is a good practice. For example, that means that many classes will have to contain this object which has access to servlet context and http request objects and I'm thinking this may create architectural problems (when building modules, layers etc) or even security issues.
App Engine will automatically log this kind of information (and much more, like latencies, CPU usage, etc., but that's more complicated), and it can be found in the project's Console logs (it can also duplicate logs to BigQuery tables). I need something similar for Jetty or other Java web app servers.
So, is there another way of doing this, other patterns, different approaches? (couldn't find 3rd party libraries for any of these points)
Thank you.
You don't really need to reinvent the wheel.
There is a common practice that you can follow:
Just log to a file using a standard logger.
If you need to see logs in request context: Logback, Log4j and SLF4J support the Mapped Diagnostic Context (MDC), which you can use to put the current request into every log line. Just initialize the context in a servlet filter, putting a request id (or a randomly generated UUID) into the MDC, as in the sketch below this list; you can aggregate log entries by this id later.
Then use the ELK stack:
Logstash to gather the logs
Elasticsearch to store them
Kibana to analyze them
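A minimal sketch of such a filter, assuming SLF4J/Logback and the Servlet API; the "requestId" key name is just a placeholder:

import java.io.IOException;
import java.util.UUID;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import org.slf4j.MDC;

public class RequestIdFilter implements Filter {

    @Override
    public void init(FilterConfig filterConfig) throws ServletException {
        // nothing to initialize
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // Attach a per-request id to the MDC so every log line written
        // while handling this request carries it.
        MDC.put("requestId", UUID.randomUUID().toString());
        try {
            chain.doFilter(request, response);
        } finally {
            // Clean up so the id does not leak to the next request on this pooled thread.
            MDC.remove("requestId");
        }
    }

    @Override
    public void destroy() {
        // nothing to clean up
    }
}

Then add %X{requestId} to your Logback pattern so every line carries the id.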

How do I generate a mysqldump script in Java Maven?

I'm processing info in Google Cloud Dataflow. We tried to use JPA to insert or update the data in our MySQL database, but these queries shut down our server, so we've decided to change our approach...
I want to generate a MySQL dump / .sql file so we can write out the new info processed through Dataflow. I want to know if there is an already-implemented way to do so, or whether I have to do it myself.
Let me explain a little more: we have an input from an XML, we process the info into Java classes, and we have a JSON dump of the DB so we can see what we have online without making so many calls. With this in mind, we compare the new info with the info we already have and decide whether it's new or just an update.
How can I do this via Java/Maven? I need code to generate this file...
Yes, Cloud Dataflow processes data in parallel on many machines. As such, it is not very surprising that other services may not be able to keep up or that some quotas are hit.
Depending on your specific use case, you may be able to slow/throttle Dataflow down without changing your approach. One might limit the number of workers, limit parallelism, use IntraBundleParallelization API, etc. This might be a better path, overall. We are also working on more explicit ways to throttle Dataflow.
Now, it is not really feasible for any system to automatically generate a .sql file for your database. However, it should be pretty straightforward to use primitives like ParDo and TextIO.Write to generate such a file via a Dataflow pipeline.
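A minimal sketch of that idea using the Apache Beam SDK (the successor of the Dataflow 1.x SDK mentioned above); the sample records, table and bucket names are placeholders, and real code would need proper SQL escaping:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.coders.VarIntCoder;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;

public class SqlDumpPipeline {

    public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        p.apply("SampleRecords", Create.of(KV.of(1, "alice"), KV.of(2, "bob"))
                .withCoder(KvCoder.of(VarIntCoder.of(), StringUtf8Coder.of())))
         .apply("ToInsertStatement", ParDo.of(new DoFn<KV<Integer, String>, String>() {
             @ProcessElement
             public void processElement(ProcessContext c) {
                 KV<Integer, String> row = c.element();
                 // Hypothetical table/columns; real code must also escape the values.
                 c.output(String.format("INSERT INTO users (id, name) VALUES (%d, '%s');",
                         row.getKey(), row.getValue()));
             }
         }))
         .apply("WriteSqlFile", TextIO.write().to("gs://my-bucket/dump").withSuffix(".sql"));

        p.run();
    }
}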
