First some background:
I'm currently using j-interop to query WMI calls to a Windows box from a Linux box, I'm running this query against WMI:
SELECT * FROM __InstanceCreationEvent WHERE TargetInstance ISA 'Win32_NTLogEvent'
And executing it as a notification query so I can get the data back as soon as it's created. However this proves an issue at (rare) times.
Say, when a user changes permissions on a root folder, I can be flooded with thousands of logs, the system can handle this fine, java and the interop code is happy, however the WMI cycle seems to be this:
Hook into event
while(forever)
{
Query server for next event.
Do work with event.
}
Obviously this doesn't work for me, being as I'll jump back and forth from the server thousands of times, program doesn't choke but it sure takes forever, I can't find a way to get the event to return all pending events (I think).
The next choice is to keep track of the last record ID returned by WMI, and do a straight up query for all events where their record ID is greater than the last, I'm assuming this will work better, however I'm not familiar with DCOM.
So my question:
If I run a ExecQuery instead of a notification query, will I have to dance back and forth between client/server to iterate through each record returned by the query due to the nature of Distributed COM?
The main solution here is to not use DCOM for this, it's terribly inefficient and I've put servers under a decent amount of CPU strain executing large sets of DCOM instructions.
I'm looking into Windows RPC Java implementations if I wanted to do this remotely, or WinAPI locally.
Related
If I have Microservice, which should create User but since user creation is complex it uses queue, and user is actually created by the consumer the endpoint only takes request and returns ok or fail.
How do I create acceptance test for this acceptance criteria:
Given: User who wants to register
When: api is requested for user creation
Then: create user AND set hosting environment_id on new user
For this I have to wait while the environment is actually set up, which takes up to 30 seconds. And if I implement sleep inside my test, then I hit anti pattern wait and see how to properly test it without failing best practices?
most proper might be, to return a response instantly, let's say "setup process started" (with a setup process id) and then have another API method, which will "obtain setup status" (for that setup process id) - and then proceed, when "setup has completed".
because, alike this nothing will be stuck for 30s, neither in tests nor production - and one could display a progress bar to the user, which indicates the current status, so that they will have an estimate how long it will take - whilst not getting the impression, that something is stuck or would not work.
one barely can test asynchronously, while the setup process by itself won't be asynchronous; and long-running tasks without any kind of status indicator are barely acceptable for delivery - because this only appears valid, while knowing what is going on in the background, but not whilst not knowing that.
whenever testing hits an anti-pattern, this is an indicator, that the solution might be sub-optimal.
I don't presume to tell you exactly how to code your acceptance tests without more detail regarding language or testing stack, but the simplest solution is to implement a dynamic wait that continuously polls the state of the system for a desired result before moving forward, breaking the loop (presuming you would use some form of loop, but that’s up to you) when the expected/desired response has been received.
This "polling" can take many forms such as:
a) querying for an expected update to a database (perhaps a value within a table is updated when the user is created)
b) pinging the dependent service until you receive the proper "signal" you are expecting to indicate user creation. For example, perhaps a GET request to another service (or another endpoint of the same service) returns a status of “created” for the given user, signifying that the user has been created.
Without further technical information I can’t give you exact instructions, but dynamic polling is the solution I use every day to test our asynchronous microservice architecture.
Keep in mind, this dynamic polling solution operates on the assumption that you have access to the service(s) and/or database(s) that contain the indicator for which you are "polling" when it is time to move forward with your test. Again, I'm the signal to move forward is something transparent such as a status change for the newly created user, the user's existence in a database/table either external or internal to the microservice, etc.
Some other assumptions in this scenario are:
a) sufficient non-functional performance of the System Under Test, where poor non-functional performance of the System Under Test would be a constraint.
b) a lack of resource constraints as resources are consumed somewhat heavily during the "polling", as resources are consumed somewhat heavily during the period of “polling”. (think Azure dynamic resource flexing, which can be costly over time).
Note: Be careful for infinite loops. You should insert some sort of constraint that exits the polling loop (and likely results in a failed test) after a reasonable period of time or number of attempts at your discretion.
Create a query service that given the user attributes (id, or name etc), will return the status of the user.
For the acceptance criteria, will be 2 part
create-user service returns 200
get-status service returns 200 (you can call it in a loop in your test).
This service will be helpful in the long run for various reasons
Check how long is it taking to the async process to complete.
At any time you can get status of any user, including to validate if a user is truly deleted / inactivated etc
You can mock this service results in your end-to-end integrated testing.
I have written a managment application which has a function to put a bunch of events in multiple Google calendars.
On my computer everything works fine. But the main user of this application has a verry bad network connection. More percicely the ping to different server varies between 23ms and like 2000 ms and packets get lost.
My approach was, besides increasing the timout, to use an own thread for each API call and recall in case of an connection error.
And at this point I got stuck. Now every event is created. Unfortunately not once but at least once. So some events were uploaded mutiple times.
I have already tried to group them as batch requests, but google doesn't want events on multiple calendars in a single batch request.
I hope my situtaion is clear and someone has a solution for me.
I would first try to persuade the "main user" to get a better network connection.
If that is impossible, I would change the code to have the following logic:
// Current version
createEvent(parameters)
// New version
while (queryEvent(parameters) -> no event) {
createEvent(parameters)
}
with appropriate timeouts and retry counters. The idea is to implement some extra logic to make the creation of an event in the calendar idempotent. (This may entail generating a unique identifier on the client side for each event so that you can query the events reliably.)
My application takes a lot of measurements of it's internal processes. For example I time certain methods, I time external webservice calls and I also have variables which have a changing value, and processes which have a 'state' (e.g. PAUSED, WAITING etc).
The application uses 100 to 200 threads, and each bit of data would be associated with a particular thread.
I am looking for some software that I can channel all this information into that would produce useful metrics and graphs of the data (ideally in real time or close to real time), let me set thresholds to trigger warnings, would allow me to filter the data by thread or thread group, etc etc.
The application is performing time critical tasks so the software/api would need to be very fast and never block.
The application is written in java, and ideally the software/api would be in java as well. I think what I'm looking for is called Event Stream Processing, but I'm really not sure what language to use to describe it.
All I've found so far are Esper and ERMA. Can anyone give me a recommendation? I'm the only one working on this project so I'm hoping for something that is pretty easy to set up and use, and has a workable front end.
In the end I found Graphite which was pretty close to being exactly what I wanted. Not the simplest to set up and configure however, but I got it working in the end.
http://graphite.wikidot.com/
In my case I send data directly from my application to Statsd (via UDP), which collects the data and does some pre processing before it ends up in the whisper back end, there is a simple example of a java interface here https://github.com/etsy/statsd/commit/2253223f3c19d2149d65ec5bc802198ff93da4cb
Alternatively you could send your data directly to graphite, example here http://neopatel.blogspot.co.uk/2011/04/logging-to-graphite-monitoring-tool.html
Given the following facts, is there a existing open-source Java API (possibly as part of some greater product) that implements an algorithm enabling the reproducible ordering of events in a cluster environment:
1) There are N sources of events, each with a unique ID.
2) Each event produced has an ID/timestamp, which, together with
its source ID, makes it uniquely identifiable.
3) The ids can be used to sort the events.
4) There are M application servers receiving those events.
M is normally 3.
5) The events can arrive at any one or more of the application
servers, in no specific order.
6) The events are processed in batches.
7) The servers have to agree for each batch on the list of events
to process.
8) The event each have earliest and latest batch ID in which they
must be processed.
9) They must not be processed earlier, and are "failed" if they
cannot be processed before the deadline.
10) The batches are based on the real clock time. For example,
one batch per second.
11) The events of a batch are processed when 2 of the 3 servers
agree on the list of events to process for that batch (quorum).
12) The "third" server then has to wait until it possesses all the
required events before it can process that batch too.
13) Once an event was processed or failed, the source has to be
informed.
14) [EDIT] Events from one source must be processed (or failed) in
the order of their ID/timestamp, but there is no causality
between different sources.
Less formally, I have those servers that receive events. They start with the same initial state, and should keep in sync by agreeing on which event to process in which order. Luckily for me, the events are not to be processed ASAP, but "in a bit", so that I have some time to get the servers to agree before the deadline. But I'm not sure if that actually make any real difference to the algorithms. And if all servers agree on all batches, then they will always be in sync, therefore presenting a consistent view when queried.
While I would be most happy with a Java API, I would accept something else if I can call it from Java. And if there is no open-source API, but a clear algorithm, I would also take that as an answer and try to implement it myself.
Looking at the question and your follow-up there probably "wasn't" an API to satisfy your requirements. To day you could take a look at the Kafka (from LinkedIn)
Apache Kafka
And the general concept of "a log" entity, in what folks like to call 'big data':
The Log: What every software engineer should know about real-time data's unifying abstraction
Actually for your question, I'd begin with the blog about "the log". In my terms the way it works -- And Kafka isn't the only package out doing log handling -- Works as follows:
Instead of a queue based message-passing / publish-subscribe
Kafka uses a "log" of messages
Subscribers (or end-points) can consume the log
The log guarantees to be "in-order"; it handles giga-data, is fast
Double check on the guarantee, there's usually a trade-off for reliability
You just read the log, I think reads are destructive as the default.
If there's a subscriber group, everyone can 'read' before the log-entry dies.
The basic handling (compute) process for the log, is a Map-Reduce-Filter model so you read-everything really fast; keep only stuff you want; process it (reduce) produce outcome(s).
The downside seems to be you need clusters and stuff to make it really shine. Since different servers or sites was mentioned I think we are still on track. I found it a finicky to get up-and-running with the Apache downloads because the tend to assume non-Windows environments (ho hum).
The other 'fast' option would be
Apache Apollo
Which would need you to do the plumbing for connecting different servers. Since the requirements include ...
servers that receive events. They start with the same initial state, and should keep in sync by agreeing on which event to process in which order. Luckily for me, the events are not to be processed ASAP, but "in a bit", so that I have some time to get the servers to agree before the deadline
I suggest looking at a "Getting Started" example or tutorial with Kafka and then looking at similar ZooKeeper organised message/log software(s). Good luck and Enjoy!
So far I haven't got a clear answer, but I think it would be useful anyone interested to see what I found.
Here are some theoretical discussions related to the subject:
Dynamic Vector Clocks for Consistent Ordering of Events
Conflict-free Replicated Data Types
One way of making multiple concurent process wait for each other, which I could use to synchronize the "batches" is a distributed barrier. One Java implementation seems to be available on top of Hazelcast and another uses ZooKeeper
One simpler alternative I found is to use a DB. Every process inserts all events it receives into the DB. Depending on the DB design, this can be fully concurrent and lock-free, like in VoltDB, for example. Then at regular interval of one second, some "cron job" runs that selects and marks the events to be processed in the next batch. The job can run on every server. The first to run the job for one batches fixes the set of events, so that the others just get to use the list that the first one defined. Like that we have a guarantee that all batches contain the same set of event on all servers. And if we can use a complete order over the whole batch, which the cron job could specify itself, then the state of the servers will be kept in sync.
I'm building an application with distributed parts.
Meaning, while one part (writer) maybe inserting, updating information to a database, the other part (reader) is reading off and acting on that information.
Now, i wish to trigger an action event in the reader and reload information from the DB whenever i insert something from the writer.
Is there a simple way about this?
Would this be a good idea? :
// READER
while(true) {
connect();
// reload info from DB
executeQuery("select * from foo");
disconnect();
}
EDIT : More info
This is a restaurant Point-of-sale system.
Whenever the cashier punches an order into the db - the application in the kitchen get's the notification. This is the kind of system you would see at McDonald's.
The applications needn't be connected in any way. But each part will connect to a single mySQL server.
And i believe i'd expect immediate notifications.
You might consider setting up an embedded JMS server in your application, I would recommend ActiveMQ as it is super easy to embed.
For what you want to do a JMS Topic is a perfect fit. When the cashier punches in an order the order is not written to the database but in a message on the Topic, let's name it newOrders.
On the topic there are 2 subscribers : NewOrderPersister and KitchenNotifier. These will each have an onMessage(Message msg) method which contains the details of the order. One saves it to the database, the other adds it to a screen or yells it through te kitchen with text-to-speech, whatever.
The nice part of this is that the poster does not need to know which and how many subscribers are there waiting for the messages. So if you want a NewOrderCOunter in the backoffice to keep an online count of how much money the owner has made today, or add a "FreanchFiresOrderListener" to have a special display near the deep frying pan, nothing has to change in the rest of the application. They just subscribe to the topic.
The idea you are talking about is called "polling". As Graphain pointed out you must add a delay in the loop. The amount of delay should be decided based on factors like how quickly you want your reader to detect any changes in database and how fast the writer is expected to insert/update data.
Next improvement to your solution could be to have an change-indicator within the database. Your algo will look something like:
// READER
while(true) {
connect();
// reload info from DB
change_count=executeQuery("select change_count from change_counters where counter=foo");
if(change_count> last_change_count){
last_change_count=change_count;
reload();
}
disconnect();
}
The above change will ensure that you do not reload data unnecessarily.
You can further tune the solution to keep a row level change count so that you can reload only the updated rows.
I don't think it's a good idea to use a database to synchronize processes. The parties using the database should synchronize directly, i.e., the writer should write its orders and then notify the kitchen that there is a new order. Then again the notification could be the order itself (or some ID for the database). The notifications can be sent via a message broker.
It's more or less like in a real restaurant. The kitchen rings a bell when meals are finished and the waiters fetch them. They don't poll unnecessarily.
If you really want to use the database for synchronization, you should look into triggers and stored procedures. I'm fairly sure that most RDBMS allow the creation of stored procedures in Java or C that can do arbitrary things like opening a Socket and communicating with another Computer. While this is possible and not as bad as polling I still don't think of it as a very good idea.
Well to start with you'd want some kind of wait timer in there or it is literally going to poll every instance of time it can which would be a pretty bad idea unless you want to simulate what it would be like if Google was hosted on one database.
What kind of environment do the apps run in? Are we talking same machine notification, cross-network, over the net?
How frequently do updates occur and how soon does the reader need to know about them?
I have done something similar before using jGroups I don't remember the exact details as it was quite a few years ago but I had a listener on the "writer" end which would then use JGroups to send out notification of change which would cause the receivers to respond accordingly.