Reducing metrics dimensions with a MeterFilter - java

We're currently implementing a Spring Boot service with Kafka Streams and are currently looking into the automatically generated metrics so we can establish proper monitoring.
We've found that the data (as exposed by the actuator endpoint) is split into too many dimensions for our needs; we'd be fine with metrics "by topic" for now, but there are also dimensions like "kafka.version", "partition", etc.
What we did was try to add a MeterFilter bean like this:
@Bean
public MeterFilter meterFilter() {
    return MeterFilter.ignoreTags("spring.id", "kafka.version", "client.id", "partition");
}
This question states "this will ignore all meters with this tag."
That doesn't seem to be entirely the case -- we see output like
(endpoint .../actuator/metrics/kafka.consumer.fetch.manager.records.lag)
{
    "name": "kafka.consumer.fetch.manager.records.lag",
    "description": "The latest lag of the partition",
    "measurements": [
        {
            "statistic": "VALUE",
            "value": 0.0
        }
    ],
    "availableTags": [
        {
            "tag": "topic",
            "values": [
                "our_internal_app-KSTREAM-TOTABLE-STATE-STORE-0000000002-changelog"
            ]
        }
    ]
}
This remains the only metric shown at this endpoint.
Does this metric really not originally contain the tags "kafka.version" and "partition"? If we remove the filter, that topic appears along with several more (including internal ones) and with the dimensions we tried to remove.
We were expecting either to see no metrics at all ("ignore all meters with this tag") or to see all topics, which is how we read the MeterFilter.ignoreTags() javadoc: "Suppress tags with given tag keys." (suppress the tags, not the metrics containing them).
What are we misunderstanding?
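To make the two readings concrete, here is a minimal sketch assuming Micrometer's standard MeterFilter API; the deny-based bean is purely illustrative and not part of our actual setup:

import io.micrometer.core.instrument.config.MeterFilter;
import org.springframework.context.annotation.Bean;

// Reading 1: ignoreTags drops the tag *keys* from each meter id; series that
// only differed by those tags are merged into one, not removed.
@Bean
public MeterFilter dropTagKeys() {
    return MeterFilter.ignoreTags("spring.id", "kafka.version", "client.id", "partition");
}

// Reading 2: a deny filter rejects any meter that *carries* a given tag,
// so the whole metric would disappear from the endpoint.
@Bean
public MeterFilter denyPartitionMeters() {
    return MeterFilter.deny(id -> id.getTag("partition") != null);
}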

Related

Multiple matches for request body in wiremock

I recently learned about request matching in WireMock (http://wiremock.org/docs/request-matching/). I'm curious what happens when a request's body matches more than one mapping (defined for the same URL path with different conditions, returning different JSON responses)?
Technically, WireMock won't ever match twice -- once it finds a single match, it will return that match. Based on my own testing using separate mapping files, this is usually the most recently added mapping (I don't know how this works when creating the stubs programmatically, but my guess is that the most recently added stub would be matched and returned).
To avoid this sort of ambiguity, there are a few strategies you can employ, but my personal favorite is to use the priority field along with specific and general mapping.
{
    "priority": 1,
    "request": {
        "url": "/test",
        "queryParameters": {
            "search_term": {
                "equalTo": "WireMock"
            }
        }
    },
    "response": {
        "status": 201
    }
}

{
    "priority": 10,
    "request": {
        "url": "/test",
        "queryParameters": {
            "search_term": {
                "matches": ".*"
            }
        }
    },
    "response": {
        "status": 204
    }
}
More information about priority can be found here.
I'd also argue that you shouldn't have two specific mappings that would both be matched -- the matchers should differ enough to separate the two. If you do need two identical matches, in order to simulate data changing or some other workflow, you can use scenarios to achieve that.
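Since the question also mentions creating stubs programmatically, here is a hedged sketch of the same two mappings using WireMock's Java DSL (GET is assumed here; the JSON mappings above match any method):

import static com.github.tomakehurst.wiremock.client.WireMock.*;

// Specific stub: the lower priority number wins when both match.
stubFor(get(urlPathEqualTo("/test"))
    .atPriority(1)
    .withQueryParam("search_term", equalTo("WireMock"))
    .willReturn(aResponse().withStatus(201)));

// General catch-all stub.
stubFor(get(urlPathEqualTo("/test"))
    .atPriority(10)
    .withQueryParam("search_term", matching(".*"))
    .willReturn(aResponse().withStatus(204)));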

Is it possible to reduce two separate queries to only one in Elasticsearch?

I have Elasticsearch "offer" documents with the following structure:
[
    {
        "id": "123",
        "tariffId": "15477",
        "tariffFamilyId": "555",
        "characteristics": "xxx"
    },
    {
        "id": "124",
        "tariffId": "15478",
        "tariffFamilyId": "777",
        "characteristics": "yyy"
    },
    {
        "id": "351",
        "tariffId": "25271",
        "tariffFamilyId": "555",
        "characteristics": "zzz"
    }
]
I need to find all offers with the tariffFamilyId of a certain tariffId. As an initial argument I know only the tariffId, not the tariffFamilyId (I need to detect it first). Normally this means two separate requests to Elasticsearch:
first request - find the tariffFamilyId by tariffId;
second request - find the offers with that tariffFamilyId.
For example, for tariffId=15477 we get tariffFamilyId=555, so for this family there will be two offers, with ids 123 and 351.
The question: is it possible to somehow make only one request to Elasticsearch, not two?
P.S. This is for a Java implementation.
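For concreteness, the two-request flow described above might look like this in Java; a hedged sketch assuming the high-level REST client, with the index name "offers" as a placeholder:

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// First request: resolve the tariffFamilyId from the known tariffId.
SearchResponse first = client.search(
    new SearchRequest("offers").source(new SearchSourceBuilder()
        .query(QueryBuilders.termQuery("tariffId", "15477"))
        .size(1)),
    RequestOptions.DEFAULT);
String familyId = (String) first.getHits().getAt(0)
    .getSourceAsMap().get("tariffFamilyId");

// Second request: fetch every offer in that family.
SearchResponse offers = client.search(
    new SearchRequest("offers").source(new SearchSourceBuilder()
        .query(QueryBuilders.termQuery("tariffFamilyId", familyId))),
    RequestOptions.DEFAULT);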

ES JAVA API not able to updateByQuery when adding sort by an existing field

I want to avoid reindexing all documents after a field mapping is changed from (excerpt from the template file):
"subject": {
"type": "string"
},
to
"subject": {
"type": "string",
"fields": {
"sort": {
"type": "string",
"analyzer": "ducet_sort_subject"
},
"sortConversation": {
"type": "string",
"analyzer": "ducet_sort_subject_conversation"
}
}
},
The latest version of the Java code is the following:
PutMappingResponse putMappingResponse = client.admin().indices().preparePutMapping(indexName)
    .setSource("{\"properties\": {}}").setType("email").execute().actionGet();
// INSTANCE is presumably UpdateByQueryAction.INSTANCE from the reindex module
BulkIndexByScrollResponse bulkIndexByScrollResponse = INSTANCE.newRequestBuilder(client).source(indexName).get();
Please note that there is no unmappedType default, so if the mapping is not found I'm expecting to see some error (as happens without those lines).
Instead I get results back for unindexed documents too, but the order is random.
I have also tried to add the mapping manually in setSource, but the result didn't change.
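For illustration, the manual attempt might look like the following; a hedged sketch that simply mirrors the template excerpt above in the setSource call:

// Explicit multi-field mapping, copied from the template excerpt
String mapping = "{\"properties\": {\"subject\": {\"type\": \"string\","
    + "\"fields\": {"
    + "\"sort\": {\"type\": \"string\", \"analyzer\": \"ducet_sort_subject\"},"
    + "\"sortConversation\": {\"type\": \"string\", \"analyzer\": \"ducet_sort_subject_conversation\"}"
    + "}}}}";
client.admin().indices().preparePutMapping(indexName)
    .setType("email").setSource(mapping).execute().actionGet();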
In the documentation I read two statements that seem to contradict each other:
"The simplest usage of _update_by_query just performs an update on every document in the index without changing the source. This is useful to pick up a new property or some other online mapping change."
and
"Templates are only applied at index creation time. Changing a template will have no impact on existing indices. When using the create index API, the settings/mappings defined as part of the create index call will take precedence over any matching settings/mappings defined in the template."
Is it even possible to do what I want without reindexing?

Load js script using java - MongoDB [duplicate]

Is there a way to use a user-defined function saved via db.system.js.save(...) in the aggregation pipeline or mapReduce?
Any function you save to system.js is available for use by "JavaScript" processing statements such as the $where operator and mapReduce, and can be referenced by the _id value it was assigned.
db.system.js.save({
    "_id": "squareThis",
    "value": function(a) { return a*a }
})
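Since the question asks about doing this from Java, the equivalent save could look like this; a hedged sketch assuming the modern MongoDB Java driver, with the database name "test" as a placeholder:

import org.bson.Document;
import org.bson.types.Code;
import com.mongodb.client.MongoDatabase;

MongoDatabase db = mongoClient.getDatabase("test");
// Note: db.system.js.save() upserts; insertOne() will fail if the _id already exists.
db.getCollection("system.js").insertOne(
    new Document("_id", "squareThis")
        .append("value", new Code("function(a) { return a*a }")));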
And some data inserted into the "sample" collection:
{ "_id" : ObjectId("55aafd2bacbed38e06f9eccf"), "a" : 1 }
{ "_id" : ObjectId("55aafea6acbed38e06f9ecd0"), "a" : 2 }
{ "_id" : ObjectId("55aafeabacbed38e06f9ecd1"), "a" : 3 }
Then:
db.sample.mapReduce(
    function() {
        emit(null, squareThis(this.a));
    },
    function(key, values) {
        return Array.sum(values);
    },
    { "out": { "inline": 1 } }
);
Gives:
"results" : [
{
"_id" : null,
"value" : 14
}
],
Or with $where:
db.sample.find(function() { return squareThis(this.a) == 9 })
{ "_id" : ObjectId("55aafeabacbed38e06f9ecd1"), "a" : 3 }
But in "neither" case can you use globals such as the database db reference or other functions. Both $where and mapReduce documentation contain information of the limits of what you can do here. So if you thought you were going to do something like "look up data in another collection", then you can forget it because it is "Not Allowed".
Every MongoDB command action is actually a call to a "runCommand" action "under the hood" anyway. But unless what that command is actually doing is "calling a JavaScript processing engine" then the usage becomes irrelevant. There are only a few commands anyway that do this, being mapReduce, group or eval, and of course the find operations with $where.
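Tying this back to Java: the mapReduce example above can be issued as an explicit runCommand from the Java driver. A hedged sketch, assuming the same "sample" collection and the squareThis function saved earlier:

import org.bson.Document;
import org.bson.types.Code;

// The shell's db.sample.mapReduce(...) expressed as a raw command document
Document result = db.runCommand(new Document("mapReduce", "sample")
    .append("map", new Code("function() { emit(null, squareThis(this.a)); }"))
    .append("reduce", new Code("function(key, values) { return Array.sum(values); }"))
    .append("out", new Document("inline", 1)));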
The aggregation framework does not use JavaScript in any way at all. You might be mistaking just as others have done a statement like this, which does not do what you think it does:
db.sample.aggregate([
    { "$match": {
        "a": { "$in": db.sample.distinct("a") }
    }}
])
So that is "not running inside" the aggregation pipeline; rather, the "result" of that .distinct() call is "evaluated" before the pipeline is sent to the server. Much as is done with an external variable anyway:
var items = [1,2,3];
db.sample.aggregate([
    { "$match": {
        "a": { "$in": items }
    }}
])
Both essentially send the same thing to the server:
db.sample.aggregate([
    { "$match": {
        "a": { "$in": [1,2,3] }
    }}
])
So it is "not possible" to "call" any JavaScript function in the aggregation pipeline, nor is there really any point in "passing in" results in general from something saved in system.js. The "code" needs to be "loaded to the client", and only a JavaScript engine can actually do anything with it.
With the aggregation framework, all of the "operators" available are actually natively coded functions as opposed to the "free form" JavaScript interpretation provided for mapReduce. So instead of writing "JavaScript", you use the operators themselves:
db.sample.aggregate([
    { "$group": {
        "_id": null,
        "squared": { "$sum": {
            "$multiply": [ "$a", "$a" ]
        }}
    }}
])
{ "_id" : null, "squared" : 14 }
So there are limitations on what you can do with functions saved in system.js, and the chances are that what you want to do is either:
Not allowed, such as accessing data from another collection
Not really required as the logic is generally self contained anyway
Or probably better implemented in client logic or some other form anyway
Just about the only practical use I can really think of is that you have a number of "mapReduce" operations that cannot be done any other way and you have various "shared" functions that you would rather just store on the server than maintain within every mapReduce function call.
But then again, the 90% reason for mapReduce over the aggregation framework is usually that the "document structure" of the collections has been poorly chosen and the JavaScript functionality is "required" to traverse the document for search and analysis.
So you can use it under the allowed constraints, but in most cases you probably should not be using this at all, but fixing the other issues that caused you to believe you needed this feature in the first place.

How do I capture STDOUT in Cucumber JVM like Cucumber Ruby's puts?

In vanilla Cucumber, anything output by a call to puts in a step definition is captured as "test output" and formatted accordingly, as in the following output example:
Feature: A simple thing
  Scenario: A simple scenario # features/simple.feature:3
    Given I do a step # features/steps/step.rb:1
      This text is output with puts
As you see above, it's helpfully indented in the "pretty" output format. In the JSON format, it's even captured in a structured way:
"keyword": "Scenario",
"name": "A simple scenario",
"line": 3,
"description": "",
"id": "a-simple-thing;a-simple-scenario",
"type": "scenario",
"steps": [
{
"keyword": "Given ",
"name": "I do a step",
"line": 4,
"output": [
"This text is output with puts"
],
}
],
The above is generated with a trivial feature file and a step definition like the following:
Given(/^I do a step$/) do
  puts 'This text is output with puts'
end
Is there an equivalent function when implementing Cucumber steps in Java that I can use to have this output captured the same way? Printing to System.out results in bypassing the capturing mechanism, much like using STDOUT.puts in Ruby.
I haven't seen any Cucumber-JVM examples that make use of this feature, unlike many of Ruby Cucumber's examples, but there is clearly an entry in the JSON output by Cucumber-JVM for "output," and therefore I imagine there must be a way to write to that field.
It looks like this can be accomplished by writing to the object representing the currently running scenario.
import cucumber.api.Scenario;
import cucumber.api.java.Before;
import cucumber.api.java.en.Given;

public class StepDefs {
    private Scenario scenario;

    /* Need to capture the scenario object in the instance to access it
     * in the step definition methods. */
    @Before
    public void before(Scenario scenario) {
        this.scenario = scenario;
    }

    @Given("^I do a step$")
    public void iDoAStep() {
        scenario.write("This text is output with scenario.write");
    }
}
As with the puts call in Ruby, this gets captured in the JSON output file. It's also colored in the "pretty" output, though it's not indented like the Ruby implementation. Nevertheless, this seems like the closest equivalent I can find.
