Interpreting Prometheus query result - java

My first question has been answered. Now I am trying to interpret the results based on the given query.
METRIC ACQUISITION:
// done once, globally; keep the reference so observations can be recorded later
final Summary responseTime = Summary.build()
.name("http_response_time")
.labelNames("method", "handler", "status")
.help("Request completed")
.register();
// done BEFORE every request
final long start = System.nanoTime();
// "start" is saved as a request attribute and later read back from the request
// done AFTER every request
final double latencyInSeconds =
SimpleTimer.elapsedSecondsFromNanos(start, System.nanoTime());
responseTime.labels(
request.getMethod(),
handlerLabel,
String.valueOf(response.getStatus())
)
.observe(latencyInSeconds);
QUERY:
rate(http_response_time_sum{application="myapp",handler="myHandler", status="200"}[1m])
/
rate(http_response_time_count{application="myapp",handler="myHandler", status="200"}[1m])
RESULT:
0.0020312920780360694
So what does this value represent? The latency is measured in nanoseconds and passed to the Summary object in seconds.
As far as I can tell, this means that the successful requests of the last minute had an average latency of about 0.0020 seconds (2 ms).
Is that correct?

I will post my results here: the measured/calculated/interpreted value seems to be correct.
Anyway, I would prefer a more detailed and mathematical documentation of the Prometheus methods.
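On the request for the underlying math: over a window, rate(x[1m]) is roughly the increase of counter x divided by the window length. In the quotient the window length cancels, so the query reduces to (increase of _sum) / (increase of _count), which is the mean observed latency in that window. Below is a minimal Java sketch of that arithmetic. It ignores the extrapolation and counter-reset handling that Prometheus performs, and the class and method names are made up for illustration.

```java
/** Toy model of rate(sum[w]) / rate(count[w]); not Prometheus code. */
final class AverageLatency {

    /** Per-second increase of a counter over a window, without extrapolation or reset handling. */
    static double simpleRate(double startValue, double endValue, double windowSeconds) {
        return (endValue - startValue) / windowSeconds;
    }

    /** Mean latency in seconds of the requests observed inside the window. */
    static double averageSeconds(double sumStart, double sumEnd,
                                 double countStart, double countEnd,
                                 double windowSeconds) {
        double sumRate = simpleRate(sumStart, sumEnd, windowSeconds);
        double countRate = simpleRate(countStart, countEnd, windowSeconds);
        return sumRate / countRate; // NaN if no request was observed in the window
    }
}
```

For example, if _sum grows by 0.5 seconds and _count by 250 requests within one minute, the result is 0.5 / 250 = 0.002 seconds, i.e. 2 ms per request.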

Related

How to ensure the expiry of a stream data structure in redis is set once only?

I have a function that use lettuce to talk to a redis cluster.
In this function, I insert data into a stream data structure.
import io.lettuce.core.cluster.SlotHash;
...
public void addData(Map<String, String> dataMap) {
var sync = SlotHash.getSlot(key).sync();
sync.xadd(key, dataMap);
}
I also want to set the TTL when I insert a record for the first time, because part of the user requirement is to expire the structure after a fixed length of time. In this case it is 10 hours.
Unfortunately the XADD function does not accept an extra parameter to set the TTL like the SET function.
So now I am setting the ttl this way:
public void addData(Map<String, String> dataMap) {
var sync = SlotHash.getSlot(key).sync();
sync.xadd(key, dataMap);
sync.expire(key, 36000 /* 10 hours, in seconds */);
}
What is the best way to ensure that I set the expiry time only once (i.e. when the stream structure is first created)? I should not set the TTL on every call, because each xadd would then be followed by an expire, which effectively postpones the expiry indefinitely.
I could check the number of items in the stream each time, but that is extra overhead. I don't want to keep flags on the Java application side, because the app could be restarted and that information would be lost from memory.
You may want to try a Lua script. The sample script below sets the expiry only if none is set for the key, and it works with any type of key in Redis.
eval "local ttl = redis.call('ttl', KEYS[1]); if ttl == -1 then redis.call('expire', KEYS[1], ARGV[1]); end; return ttl;" 1 mykey 12
The script also returns the TTL as it was read before the call: -1 on the call that sets the expiry, and the remaining time in seconds on later calls.
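To see why this avoids postponing the expiry, here is a toy in-memory Java model of the script's logic. No Redis is involved. The class and its methods are hypothetical and only mimic the TTL convention of -2 for a missing key and -1 for a key without expiry; actual key eviction is not modelled.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for "XADD, then EXPIRE only if no TTL is set"; not a Redis client. */
final class ExpireOnceModel {
    private final Map<String, Integer> entries = new HashMap<>(); // key -> number of stream entries
    private final Map<String, Long> deadlines = new HashMap<>();  // key -> absolute expiry time (s)

    /** Mimics TTL: -2 if the key is missing, -1 if it has no expiry, else seconds left. */
    long ttl(String key, long nowSeconds) {
        if (!entries.containsKey(key)) {
            return -2;
        }
        Long deadline = deadlines.get(key);
        return deadline == null ? -1 : deadline - nowSeconds;
    }

    /** Appends one entry and applies the script's rule; returns the TTL read before any change. */
    long addAndExpireOnce(String key, long nowSeconds, long ttlSeconds) {
        entries.merge(key, 1, Integer::sum);             // stands in for XADD
        long before = ttl(key, nowSeconds);
        if (before == -1) {
            deadlines.put(key, nowSeconds + ttlSeconds); // stands in for EXPIRE
        }
        return before;
    }
}
```

Only the first append finds a TTL of -1 and sets the deadline; later appends leave it untouched. From Java, the real script would be sent through the client's eval command, passing the key and the TTL in seconds (36000 for 10 hours).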

How can I get JPA/Entity Manager to make parallel queries instead of lumping them into one batch?

Inside the doGet method of my servlet I use a JPA TypedQuery to retrieve my data, and I can fetch the data I want through an HTTP GET request. The method that gets the data takes roughly 10 seconds, and a single request works fine. The problem occurs when multiple requests arrive at the same time: if I make 4 requests at once, all 4 queries seem to be lumped together and each takes about 40 seconds to return. How can I get JPA to run 4 separate queries in parallel? Is this something that needs to be set in persistence.xml, or is it a code-related issue? Note: I've also tried executing this code in a thread. A link and some appropriate terminology to increase my understanding would be appreciated.
Thanks!
// Declared outside the try block so that the finally block can see it.
EntityManager em = null;
try {
    String sequenceNo = request.getParameter("sequenceNo");
    EntityManagerFactory emf = Persistence.createEntityManagerFactory("mydbcon");
    em = emf.createEntityManager();
    long startTime = System.currentTimeMillis();
    List<Myeo> returnData = methodToGetData(em);
    // Prints the request label and the elapsed time in milliseconds.
    System.out.println(sequenceNo + " " + (System.currentTimeMillis() - startTime));
    String myJson = new Gson().toJson(returnData);
    resp.getOutputStream().print(myJson);
    resp.getOutputStream().flush();
} finally {
    resp.getOutputStream().close();
    if (em != null && em.isOpen())
        em.close();
}
4 simultaneous sample requests
localhost/myservlet/mycodeblock?sequenceNo=A
localhost/myservlet/mycodeblock?sequenceNo=B
localhost/myservlet/mycodeblock?sequenceNo=C
localhost/myservlet/mycodeblock?sequenceNo=D
resulting print statements
A 38002
B 38344
C 38785
D 39065
What I want
A 9002
B 9344
C 9785
D 10065
If you make 4 separate GET requests, they should be handled in parallel. They should not be lumped together, since they run in different transactions.
If that is not what you observe, check whether a database connection pool size or a servlet thread pool size is configured that serializes the calls to the DBMS.
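To illustrate how a pool size can serialize otherwise independent requests, here is a small self-contained Java sketch. It uses a plain thread pool and Thread.sleep as a stand-in for the slow query; it does not touch JPA or a servlet container, and the class and method names are invented for this example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Measures how many simulated queries overlap for a given pool size. */
final class PoolSizeDemo {

    /** Runs the simulated requests and returns the peak number executing at the same time. */
    static int peakConcurrency(int poolSize, int requests, long queryMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < requests; i++) {
            pool.submit(() -> {
                // Record the highest number of tasks seen running together.
                peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                try {
                    Thread.sleep(queryMillis); // stands in for the slow query
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("simulated queries did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return peak.get();
    }
}
```

With a pool of 1 the peak is always 1, so requests queue behind each other; with a pool of 4 the four simulated requests can overlap. A connection pool or servlet thread pool limited to a single slot would behave like the first case.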

Influx db java client batch does not write to DB

I am trying to write points to influxDB using their Java client.
Batch is important to me.
If I use the influxDB.enableBatch with influxDB.write(Point) no data is inserted.
If I use the BatchPoints and influxDB.write(batchPoints) - data is inserted successfully.
Both code samples are taken from: https://github.com/influxdata/influxdb-java/tree/influxdb-java-2.7
InfluxDB influxDB = InfluxDBFactory.connect(influxUrl, influxUser, influxPassword);
influxDB.setDatabase(dbName);
influxDB.setRetentionPolicy("autogen");
// Flush every 2000 Points, at least every 100ms
influxDB.enableBatch(2000, 100, TimeUnit.MILLISECONDS);
influxDB.write(Point.measurement("cpu")
.time(System.currentTimeMillis(), TimeUnit.MILLISECONDS)
.addField("idle", 90L)
.addField("user", 9L)
.addField("system", 1L)
.build());
Query query = new Query("SELECT idle FROM cpu", dbName);
QueryResult result = influxDB.query(query);
Returns nothing.
BatchPoints batchPoints = BatchPoints.database(dbName).tag("async", "true").build();
Point point1 = Point
.measurement("cpu")
.tag("atag", "test")
.addField("idle", 90L)
.addField("usertime", 9L)
.addField("system", 1L)
.build();
batchPoints.point(point1);
influxDB.write(batchPoints);
Query query = new Query("SELECT * FROM cpu ", dbName);
QueryResult result = influxDB.query(query);
This returns data successfully.
As mentioned, I need the first way to function.
How can I achieve that?
versions:
influxdb-1.3.6
influxdb-java:2.7
Regards, Ido
Maybe it's too late, or you have already resolved your issue, but I will answer anyway since it may be useful for others.
I think your first example appears not to work because you enabled batching, which will "Flush every 2000 Points, at least every 100ms". So the write does work, but you run the SELECT before the actual save has been performed.
When you use influxDB.enableBatch(...), the influxdb client creates an internal thread pool that stores your data once enough points are collected or the timeout elapses, so the write does not happen immediately.
In the second example, influxDB.write(batchPoints) writes your data to InfluxDB synchronously. That's why your SELECT statement is able to return data immediately.
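The buffering behaviour described above can be sketched with a toy writer in plain Java. This is not influxdb-java's implementation: the class is hypothetical, points are plain strings, and the background timer is replaced by an explicit flush() call so the example stays deterministic.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy buffer-then-flush writer; a model of batching, not the real client. */
final class BatchingWriter {
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private final List<String> stored = new ArrayList<>(); // stands in for the database

    BatchingWriter(int batchSize) {
        this.batchSize = batchSize;
    }

    /** Buffers the point; it reaches the "database" only when the batch is full or flush() runs. */
    synchronized void write(String point) {
        buffer.add(point);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** In a real client a background timer would trigger this; here it is called by hand. */
    synchronized void flush() {
        stored.addAll(buffer);
        buffer.clear();
    }

    /** Number of points a query issued right now would see. */
    synchronized int storedCount() {
        return stored.size();
    }
}
```

Writing a single point to a BatchingWriter with a batch size of 2000 leaves storedCount() at 0 until a flush happens. If the explanation above holds, the query in the first example would need to run after the flush interval (100 ms there) has passed.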

Elasticsearch Java API - How to get the number of documents without retrieving the documents

I need to get the number of documents in an index: not the documents themselves, just the count.
What's the best way to do that?
There is https://www.elastic.co/guide/en/elasticsearch/reference/current/search-count.html, but I'm looking to do this in Java.
There also is https://www.elastic.co/guide/en/elasticsearch/client/java-api/2.4/count.html, but it seems way old.
I can get all the documents in the given index and come up with "how many". But there must be a better way.
Use the search API, but set it to return no documents and retrieve the count of hits from the SearchResponse object it returns.
For example:
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHits;
SearchResponse response = client.prepareSearch("your_index_goes_here")
.setTypes("YourTypeGoesHere")
.setQuery(QueryBuilders.termQuery("some_field", "some_value"))
.setSize(0) // Don't return any documents, we don't need them.
.get();
SearchHits hits = response.getHits();
long hitsCount = hits.getTotalHits();
Just an addition to @evanjd's answer:
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHits;
SearchResponse response = client.prepareSearch("your_index_goes_here")
.setTypes("YourTypeGoesHere")
.setQuery(QueryBuilders.termQuery("some_field", "some_value"))
.setSize(0) // Don't return any documents, we don't need them.
.get();
SearchHits hits = response.getHits();
long hitsCount = hits.getTotalHits().value;
In Elasticsearch 7.x, getTotalHits() returns a TotalHits object rather than a long (printing it gives something like "6 hits"), so read its .value field to get the count as a long.
Elastic - Indices Stats
Indices level stats provide statistics on different operations
happening on an index. The API provides statistics on the index level
scope (though most stats can also be retrieved using node level
scope).
prepareStats(indexName)
client.admin().indices().prepareStats(indexName).get().getTotal().getDocs().getCount();
Note the breaking change in 7.0: by default, total hits are tracked accurately only up to 10,000. To get an exact count from a search request, you need to set track_total_hits to true explicitly.
https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking-changes-7.0.html#track-total-hits-10000-default
We can also get lowLevelClient from highLevelClient and invoke the "_count" rest API like "GET /twitter/_doc/_count?q=user:kimchy".
2021 Solution
I went through the solutions posted and none of them is convincing. You can get the job done by setting the size of the search request to 0, but that's not the intended way. For counting, use the Count API: it consumes fewer resources and less bandwidth because it does not need to fetch or score documents, among other internal optimisations.
Use the Count API for Java (link below) to get the number of documents. The following steps and code should get the job done.
1. Build the query using QueryBuilder
2. Pass the query and the list of indexes to the CountRequest() constructor
3. Get the CountResponse object by calling client.count(countReq, RequestOptions.DEFAULT)
4. Extract/return the value by calling countResp.getCount()
CountRequest countReq = new CountRequest(indexes, query);
CountResponse countResp = client.count(countReq, RequestOptions.DEFAULT);
return countResp.getCount();
Read the second link for more information.
Important Links
Count API vs Search API : Counting number of documents using Elasticsearch
Count API for Java : https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-count.html

API twitter rate limit

Here is what I'm trying to do:
I have a list of Twitter user IDs. For each one I need to retrieve the complete list of their follower IDs and friend IDs. I don't need anything else (no screen names, etc.).
I'm using twitter4j, by the way.
Here is how I'm doing it:
For each user I execute the following code to get the complete list of follower IDs:
long lCursor = -1;
do{
IDs response = t.getFollowersIDs(id, lCursor);
long tab[] = response.getIDs();
for(long val : tab){
myIdList.add(val);
}
lCursor = response.getNextCursor();
}while(lCursor != 0);
My problem :
according to this page : https://dev.twitter.com/docs/api/1.1/get/followers/ids
the rate limit for getFollowersIDs() is 15 requests per window. Since this method returns at most 5000 IDs per call, it is only possible to get 15*5000 IDs per window (or the followers of 15 users, if each has fewer than 5000 followers).
This is really not enough for what I'm trying to do.
Am I doing something wrong? Are there any ways to improve on that (even slightly)?
Thanks for your help :)
The rate limit for that endpoint in v1.1 is 15 calls per 15 minutes per access token. See https://dev.twitter.com/docs/rate-limiting/1.1 for more information about the limits.
With that in mind, if you have an access token for each of your users, you should be able to fetch up to 75,000 (15*5000) follower IDs every 15 minutes for each access token.
If you only have one access token you'll, unfortunately, be limited in the manner you described and will just have to handle when your application hits the rate limit and continue processing once the 15 minutes is up.
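To get a feel for what these limits mean with a single access token, here is a rough planning sketch in Java. It only does the arithmetic from the figures quoted in this thread (5000 IDs per call, 15 calls per 15-minute window). It does not call twitter4j, the class and method names are invented, and the results are lower bounds rather than guarantees.

```java
/** Back-of-the-envelope planner for paging follower IDs under a fixed rate limit. */
final class FollowerFetchPlan {
    static final int IDS_PER_CALL = 5000;   // maximum IDs returned per call, per the question
    static final int CALLS_PER_WINDOW = 15; // calls allowed per window and access token
    static final int WINDOW_MINUTES = 15;   // window length, per the answer

    /** Calls needed to page through one user's followers; even an empty list costs one call. */
    static long callsNeeded(long followers) {
        return Math.max(1, (followers + IDS_PER_CALL - 1) / IDS_PER_CALL);
    }

    /** Minimum minutes spent waiting for new windows; the first window starts immediately. */
    static long minimumWaitMinutes(long totalCalls) {
        long windows = (totalCalls + CALLS_PER_WINDOW - 1) / CALLS_PER_WINDOW;
        return Math.max(0, windows - 1) * WINDOW_MINUTES;
    }
}
```

For example, 100 users with fewer than 5000 followers each need 100 calls, which span 7 windows, so the application would wait at least 90 minutes in total between batches of calls.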
