Publishing metrics to Ganglia using gmetric4j - java

I'm considering using gmetric4j to publish metrics to Ganglia. So far, the only documented way I have found to do this is to use its GSampler class to create a metric-polling Runnable that runs at scheduled times.
In my application, though, it would be easier to have its components publish the metric data themselves when they see fit (i.e. not at regular scheduled intervals). From inspecting the gmetric4j source code I can see that this can be done with GMetric objects, but I am not sure whether this would produce meaningful results in the end.
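For context, what I had in mind is roughly the following; I'm not certain the announce overload is exactly right, so treat the signature below as my assumption from skimming the source:
import info.ganglia.gmetric4j.gmetric.GMetric;
import info.ganglia.gmetric4j.gmetric.GMetric.UDPAddressingMode;
import info.ganglia.gmetric4j.gmetric.GMetricSlope;
import info.ganglia.gmetric4j.gmetric.GMetricType;

public class AdHocPublish {
    public static void main(String[] args) throws Exception {
        // Address and port are the Ganglia multicast defaults; tmax=60 and dmax=0 are guesses
        GMetric gm = new GMetric("239.2.11.71", 8649, UDPAddressingMode.MULTICAST, 1);
        // Publish a single value whenever a component decides to, not on a schedule
        gm.announce("queue_depth", "42", GMetricType.INT32, "items",
                GMetricSlope.BOTH, 60, 0, "myapp");
    }
}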
So what I would like to know is:
Can you publish data to Ganglia at irregular intervals, and if so, how are data aggregations and time series formed in that case?
Also, I don't fully understand the meaning of the "tmax" (-x on the command line) and "dmax" (-d on the command line) parameters of gmetric calls, or whether they have anything to do with the problem above. Does anyone know more about these?

Have you tried the Metrics Library? It has a Ganglia reporter that takes care of when and how to send your measurements to gmond/gmetad. You can also check out the source if you want a code example.
For dmax, tmax, and how often to report, I found this to be a good source.
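For reference, a minimal sketch of wiring the Metrics library's GangliaReporter (from the metrics-ganglia module) to a gmetric4j GMetric; host, port, and the reporting interval are placeholder values:
import java.util.concurrent.TimeUnit;

import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.ganglia.GangliaReporter;

import info.ganglia.gmetric4j.gmetric.GMetric;
import info.ganglia.gmetric4j.gmetric.GMetric.UDPAddressingMode;

public class GangliaReporting {
    public static void main(String[] args) throws Exception {
        // gmetric4j handle pointing at gmond; address and port are the Ganglia defaults
        GMetric ganglia = new GMetric("239.2.11.71", 8649, UDPAddressingMode.MULTICAST, 1);

        // The Metrics library ships everything registered here on a fixed schedule
        MetricRegistry registry = new MetricRegistry();
        registry.counter("requests").inc();

        GangliaReporter reporter = GangliaReporter.forRegistry(registry)
                .convertRatesTo(TimeUnit.SECONDS)
                .convertDurationsTo(TimeUnit.MILLISECONDS)
                .build(ganglia);
        reporter.start(1, TimeUnit.MINUTES); // regular interval keeps the RRD time series sane
    }
}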


Multiple reader/processor/writer in Spring Batch

I am new to Spring Batch and I have a peculiar problem. I want to get results from 3 different JPA queries with a JpaPagingItemReader, process them individually, and write them into one consolidated XML file using a StaxEventItemWriter.
For example, the resultant XML would look like:
<root>
<query1>
...
</query1>
<query2>
...
</query2>
<query3>
...
</query3>
</root>
How can I achieve this?
Also, I have currently implemented my configuration with one query, but the reader/writer is quite slow: it took around 59 minutes to generate a 20 MB file, since for now I am running it in a single-threaded environment rather than a multithreaded one. If there are any other suggestions around this, please let me know. Thanks.
EDIT:
I tried following this approach:
I created 3 different steps and added one reader, processor, and writer to each of them, but the problem I am facing now is that the writer is not able to write to the same file or append to it.
This is in the StaxEventItemWriter class:
FileUtils.setUpOutputFile(file, restarted, false, overwriteOutput);
Here the third argument, append, is false by default.
The second approach in your question seems like the right direction: you could create 3 different readers/processors/writers and a custom writer that extends AbstractFileItemWriter, where appending is allowed. Also, I have seen that writing the XML with a plain writer is faster than StaxEventItemWriter, but there is a trade-off in the boilerplate code you have to write.
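A rough sketch of such a custom writer, assuming Spring Batch 4.x, where AbstractFileItemWriter exposes setAppendAllowed() and a doWrite(List) hook; the fragment-rendering function is a placeholder you would supply:
import java.util.List;
import java.util.function.Function;

import org.springframework.batch.item.support.AbstractFileItemWriter;

// Sketch only: appends pre-rendered XML fragments to a shared output file.
public class AppendingXmlFragmentWriter<T> extends AbstractFileItemWriter<T> {

    private final Function<T, String> fragmentRenderer; // placeholder: item -> XML fragment

    public AppendingXmlFragmentWriter(Function<T, String> fragmentRenderer) {
        this.fragmentRenderer = fragmentRenderer;
        setAppendAllowed(true); // unlike StaxEventItemWriter, appending can be switched on here
        setName("appendingXmlFragmentWriter");
    }

    @Override
    public String doWrite(List<? extends T> items) {
        StringBuilder fragments = new StringBuilder();
        for (T item : items) {
            fragments.append(fragmentRenderer.apply(item)).append(System.lineSeparator());
        }
        return fragments.toString();
    }

    public void afterPropertiesSet() throws Exception {
        // nothing extra to validate in this sketch; set the Resource when configuring the step
    }
}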
One option off the top of my head is to
create a StaxEventItemWriter
create 3 instances of a step that has a JpaPagingItemReader and writes the corresponding <queryX>...</queryX> section to the shared writer
write the <root> and </root> tags in a JobExecutionListener, so the steps don't care about the envelope
There are other considerations here, like whether it is always exactly 3 queries, etc., but the general idea is to separate concerns between processors, steps, jobs, tasks, and listeners so that each performs a clear piece of work.
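A minimal sketch of the envelope part of that idea; the output path is hard-coded here purely for illustration:
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;

// Writes the <root> envelope before the steps run and closes it afterwards,
// so each step only has to append its own <queryX> section.
public class RootEnvelopeListener implements JobExecutionListener {

    private final Path output = Paths.get("consolidated.xml"); // illustrative path

    @Override
    public void beforeJob(JobExecution jobExecution) {
        write("<root>\n", StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        write("</root>\n", StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    private void write(String text, StandardOpenOption... options) {
        try {
            Files.write(output, text.getBytes(StandardCharsets.UTF_8), options);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}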
Use VisualVM to monitor the bottlenecks inside your application.
Since you said it is taking 59 minutes to create a 20 MB file, this will give you better insight into where you are taking performance hits.
VisualVM tutorial
Open VisualVM, connect to your application => Sampler => CPU => CPU Samples.
Take snapshots at various times and analyse where it is spending the most time. Just by checking this you will get enough data for optimisation.
Note: VisualVM comes with the Oracle JDK 8 distribution; you can simply type jvisualvm at the command prompt/terminal. If not, download it from here.

How to Monitor/inspect data/attribute flow in Java code

I have a use case where I need to capture the data flow from one API to another. For example, my code reads data from a database using Hibernate, and during processing I convert one POJO to another, perform some more processing, and then finally convert it into the final Hibernate result object. In a nutshell, something like POJO1 to POJO2 to POJO3.
In Java, is there a way to deduce that an attribute of POJO3 was made/transformed from a given attribute of POJO1? I am looking for something that lets me capture the data flow from one model to another. The tool can work at either compile time or runtime; I am OK with both.
I am looking for a tool that can run alongside the code and provide data lineage details on each run.
Now, instead of POJOs I will call them states. You have a start position, and you iterate and transform your model through different states. At the end, you have a final, terminal state that you would like to persist to the database:
stream(A).map(P1).map(P2).map(P3)....-> set of B
If you use a technique known as event sourcing, then yes, you can deduce it. What would this look like? Instead of mapping A directly to state P1 and state P1 to state P2, you queue all the operations that are necessary and sufficient to map A to P1, P1 to P2, and so on... If you want to recover P1 or P2 at any time, it is simply a product of the queued operations. You can rewind forward or backward at any point, as long as you have not yet changed your DB state. P1, P2, and P3 can act as snapshots.
This way you will be able to rebuild the exact mapping flow for an attribute. How finely you queue your operations, whether at attribute level or more coarse-grained, is up to you.
Here is a good article that depicts event sourcing and how it works: https://kickstarter.engineering/event-sourcing-made-simple-4a2625113224
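A minimal, generic sketch of the idea, with illustrative names that are not tied to any framework:
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Records every transformation as an event so any intermediate state (P1, P2, ...)
// can be rebuilt later by replaying the first n events from the initial state A.
class TransformationLog<T> {

    private final List<UnaryOperator<T>> events = new ArrayList<>();

    T apply(T state, UnaryOperator<T> event) {
        events.add(event);           // queue the operation itself, not just its result
        return event.apply(state);
    }

    T replay(T initial, int upToEvent) {
        T state = initial;
        for (int i = 0; i < upToEvent && i < events.size(); i++) {
            state = events.get(i).apply(state);   // replay forward to the requested snapshot
        }
        return state;
    }
}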
UPDATE:
I can think of one more technique to capture the attribute changes. You can instrument your POJOs; it is pretty much the same technique Hibernate uses to enhance POJOs, and the same technique profilers use for tracing. Then you can capture and react to each setter invocation on Pojo1, Pojo2, and Pojo3. I'm not sure I would go that way, though...
Here is some detailed reading about bytecode instrumentation: https://www.cs.helsinki.fi/u/pohjalai/k05/okk/seminar/Aarniala-instrumenting.pdf
I can imagine two reasons: either the code was not developed by you, and therefore you want to understand the flow of data and the combinations that convert input to output, or your code is behaving in a way you are not expecting.
I think you need to log the values of all the POJOs, inputs, and outputs to some place that you can inspect later, for each run.
For example, a database table if you might need the data after hundreds of runs, or, if it is a one-time exercise, maybe a log in an appropriate form. Then you need to manually trace those data values layer by layer to map them to the next layer. With the code available, I think that would be easy. If you have a different need, please explain.
There are "time travelling debuggers". For Java, a quick search did only spill this out:
Chronon Time Travelling Debugger, see this screencast how it might help you .
Since your transformations probably use setters and getters this tool might also be interesting: Flow
Writing your own java agent for tracking this is probably not what you want. You might be able to use AspectJ to add some stack trace logging to getters and setters. See here for a quick introduction.
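A rough sketch of such an aspect in annotation-style AspectJ; the com.example.model package is a placeholder for wherever your POJOs live:
import java.util.Arrays;

import org.aspectj.lang.JoinPoint;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Before;

// Logs every setter invocation on the POJOs so the attribute flow can be traced.
@Aspect
public class SetterTraceAspect {

    @Before("execution(void com.example.model..set*(..))")
    public void logSetter(JoinPoint joinPoint) {
        System.out.printf("%s called with %s%n",
                joinPoint.getSignature().toShortString(),
                Arrays.toString(joinPoint.getArgs()));
    }
}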

Java Cucumber: creating scenario outlines with dynamic examples

We have a test where, basically, we need to input a specific value into a web site and make sure another value comes out. The input/output data for this is stored in an XML file.
We can create a single Scenario that runs once and loops through, submitting each value. However, we run into some reporting problems: if 2 out of 100 pairs fail, we want to know which ones, not just get an assertion error for the whole scenario.
We would get much clearer reporting using a Scenario Outline where all the values are in the Examples table. Then the scenario itself runs repeatedly, and we can fail an individual set with an assertion error and have that show up clearly in a report.
Problem: we do not want to hard-code all the values from the XML into the .feature file. It's noisy, but also, if the values change, it's slow to update. We would rather just provide the XML, parse it, and go; if things change, we just drop in an updated XML.
Is there a way to create dynamic examples where we can run the scenario repeatedly, once for each data case, without explicitly defining it in the Examples table?
Using Cucumber for this is a bad idea. You should test this functionality lower down your stack with a unit test.
At some point in your code, after the user has input their value, the value will be passed to a method/function that will return your answer. This is the place to do this sort of testing.
A Cucumber test going through the whole stack will be upwards of 3 orders of magnitude slower than a well-written unit test. So you could test thousands of pairs of values in your unit test in the time it takes to run one single cuke.
If you do this sort of testing in Cucumber you will quickly end up with a test suite that takes far too long to run, or that can only be run quickly at great expense. This is very damaging to a project.
Cuking should be about one happy path (The user can enter a value and see the result) and maybe a sad path (the user enters a bad value and sees an error/explanation). Anything else needs to be pushed down to unit tests.
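For illustration, a minimal JUnit 5 parameterized test along those lines; the conversion method and the literal pairs are placeholders, and in practice the pairs would be loaded from your existing XML via a @MethodSource provider:
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;

class ConversionTest {

    // Placeholder for the real call into the system under test.
    private String convert(String input) {
        return input.toUpperCase();   // illustrative only
    }

    // Each pair becomes its own test invocation, so a failing pair is reported on its own
    // instead of failing the whole set.
    @ParameterizedTest
    @CsvSource({
            "abc, ABC",
            "def, DEF"
    })
    void convertsInputToExpectedOutput(String input, String expected) {
        assertEquals(expected, convert(input));
    }
}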
The NoraUi framework does exactly what you want to do in your project. The NoraUi code is open source. If you have questions about this framework, you can post an issue with the tag "Question".

WSO2 EI - Disable collecting statistics for specific component types

I'm looking to disable collecting statistics for all sequences and mediators in WSO2 EI. I still want to collect statistics about service calls and the like, but discard the unwanted statistics about sequences and mediators contained in those services (which is a lot of unnecessary data).
I'm aware that apart from enabling/disabling statistics for specific services, you can also disable statistics for specific sequences, which would also mean not collecting stats about mediators contained in those sequences. However, in our project some services only contain mediators and not sequences.
So far we've tried adding these booleans to the synapse.properties file:
mediation.flow.statistics.collect.proxy=true
mediation.flow.statistics.collect.api=true
mediation.flow.statistics.collect.mediator=false
mediation.flow.statistics.collect.sequence=false
mediation.flow.statistics.collect.resource=true
mediation.flow.statistics.collect.endpoint=true
and editing the reportEntryEvent() and reportChildEntryEvent() methods in the org.apache.synapse.aspects.flow.statistics.collectors.OpenEventCollector class. For example, if the incoming componentType is a mediator, I return early from reportChildEntryEvent(), assuming this would stop the statistics collection process. However, this logic doesn't seem to be correct, as I still receive mediator statistics in my Stream Processor.
This statistics handling is probably also managed somewhere else, but I struggle to see where, and what exactly in the wso2-synapse code I should edit to achieve this behavior.
Thanks for any reply.

How do I log from a mapper? (hadoop with commoncrawl)

I'm using the Common Crawl example code from their "MapReduce for the Masses" tutorial. I'm trying to make modifications to the mapper and I'd like to be able to log strings to some output. I'm considering setting up a NoSQL DB and just pushing my output to it, but that doesn't feel like a good solution. What's the standard way to do this kind of logging from Java?
While there is no special solution for the logs apart from the usual loggers (at least none I am aware of), I can suggest a few options.
a) If the logs are for debugging purposes, just write the usual debug logs. In case of failed tasks you can find them via the UI and analyze them.
b) If these logs are a kind of output you want alongside the other output of your job, assign them a special key and write them to the context. Then in the reducer you will need some special logic to put them into the output.
c) You can create a directory on HDFS and make the mapper write there. It is not the classic way for MR because it is a side effect, but in some cases it can be fine. Especially since each mapper will create its own file, you can use the command hadoop fs -getmerge ... to get all the logs as one file.
d) If you want to be able to monitor the progress of your job, the number of errors, etc., you can use counters.
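A small sketch of options (a) and (d) inside a mapper; the key/value types, logger, and counter names are illustrative:
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LoggingMapper extends Mapper<LongWritable, Text, Text, Text> {

    private static final Logger LOG = LoggerFactory.getLogger(LoggingMapper.class);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        LOG.debug("processing record at offset {}", key.get()); // ends up in the task logs (option a)

        if (value.getLength() == 0) {
            // counters are aggregated per job and visible in the job UI (option d)
            context.getCounter("logging-mapper", "empty-records").increment(1);
            return;
        }
        context.write(new Text("record"), value);
    }
}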
