Is there any open source implementation of a "refillable" queue in Java?
Essentially, such a queue would be implemented as a class that reads data from a source into an in-memory buffer, which is replenished whenever the number of buffered items falls below a predefined threshold. Therefore, it requires:
An in-memory buffer to hold the data.
An input source to refill the buffer whenever it falls below the threshold.
JMS queues, or any other messaging system that uses network serialization, are not suitable for performance reasons.
The scenario is trivial and easy to implement, but if there is a library that offers this functionality already, there is no need to reinvent it.
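If nothing ready-made fits, the idea is small enough to sketch around a standard BlockingQueue. A minimal sketch, assuming a hypothetical DataSource interface for the input and single-consumer use:

    import java.util.List;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Sketch of the idea: a bounded in-memory buffer that refills itself
    // from a source whenever its size drops below a threshold.
    public class RefillableQueue<T> {

        // Hypothetical source abstraction; replace with your socket/file reader.
        public interface DataSource<E> {
            List<E> fetch(int maxItems);
        }

        private final BlockingQueue<T> buffer;
        private final DataSource<T> source;
        private final int threshold;

        public RefillableQueue(int capacity, int threshold, DataSource<T> source) {
            this.buffer = new ArrayBlockingQueue<>(capacity);
            this.threshold = threshold;
            this.source = source;
            refill();
        }

        // Single-consumer use assumed: take one element, refilling when low.
        public T take() throws InterruptedException {
            if (buffer.size() < threshold) {
                refill();
            }
            return buffer.take();
        }

        private void refill() {
            for (T item : source.fetch(buffer.remainingCapacity())) {
                if (!buffer.offer(item)) {
                    break; // buffer is full again
                }
            }
        }
    }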
RabbitMQ is a message broker. In essence, it accepts messages from producers, and delivers them to consumers. In-between, it can route, buffer, and persist the messages and data according to rules you give it.
You can also use Google Guava.
Spark has a useful API for accumulating data in a thread-safe way https://spark.apache.org/docs/2.3.0/api/scala/index.html#org.apache.spark.util.AccumulatorV2 and comes with some useful out-of-the-box accumulators, e.g. for Longs https://spark.apache.org/docs/2.3.0/api/scala/index.html#org.apache.spark.util.LongAccumulator
I usually use accumulators to wire debugging, profiling, monitoring and diagnostics into Spark jobs. Before running a Spark job I fire off a Future that periodically prints the stats (e.g. TPS, histograms, counts, timings, etc.).
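For reference, the Spark side of that pattern in Java can look roughly like this (the accumulator name, reporting interval, and dummy job are illustrative):

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.util.LongAccumulator;

    public class AccumulatorDemo {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("accumulator-demo")
                    .master("local[*]") // local run for illustration
                    .getOrCreate();

            // Driver-side accumulator that tasks add to; the name is arbitrary.
            LongAccumulator processed = spark.sparkContext().longAccumulator("processed");

            // Background reporter started before the job, as described above.
            ScheduledExecutorService reporter = Executors.newSingleThreadScheduledExecutor();
            reporter.scheduleAtFixedRate(
                    () -> System.out.println("records so far: " + processed.value()),
                    0, 10, TimeUnit.SECONDS);

            // Each task bumps the counter as it processes records.
            spark.range(0, 1_000_000).javaRDD().foreach(x -> processed.add(1));

            reporter.shutdown();
            spark.stop();
        }
    }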
So far I cannot find anything that is similar for Kafka Streams. Does anything exist? I imagine this is possible at least for each instance of a Kafka app, but to make this work across several instances would require creating an intermediate topic.
Kafka Streams avoids concurrency by design -- if the accumulated data does not need to be fault-tolerant, you can accumulate it in memory and flush it out via a wall-clock-time punctuation.
If it needs to be fault-tolerant, you can use a state store and scan the whole store in a punctuation to flush it out.
This will give you task-level accumulation. I'm not sure how Spark's accumulator works in detail, but if it gives you a "global" view, I assume it needs to send data over the network, and only a single instance has access to the data (or maybe it uses a broadcast instead -- not sure how consistency would be guaranteed in the broadcast case). Similarly, you could send the data to a topic (with 1 partition) to collect all data globally in a single place.
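A minimal sketch of the in-memory variant, assuming a simple per-task counter and a 10-second wall-clock flush (the fault-tolerant version would replace the field with a state store and scan it in the punctuation):

    import java.time.Duration;
    import org.apache.kafka.streams.processor.PunctuationType;
    import org.apache.kafka.streams.processor.api.Processor;
    import org.apache.kafka.streams.processor.api.ProcessorContext;
    import org.apache.kafka.streams.processor.api.Record;

    // Hypothetical per-task counter, flushed on wall-clock time; not fault-tolerant.
    public class CountingProcessor implements Processor<String, String, Void, Void> {
        private long count = 0;

        @Override
        public void init(ProcessorContext<Void, Void> context) {
            context.schedule(Duration.ofSeconds(10), PunctuationType.WALL_CLOCK_TIME,
                    timestamp -> {
                        System.out.println("records in last interval: " + count);
                        count = 0;
                    });
        }

        @Override
        public void process(Record<String, String> record) {
            count++; // accumulate in memory; task-level only
        }
    }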
I recently came across BufferedMutator class of HBase which can be used for batch inserts and deletes.
I was previously using a List of Puts and calling hTable.put(putList) to do the same.
Benchmarking my code didn't seem to show much difference either, when I instead did mutator.mutate(putList);.
Is there a significant performance improvement of using BufferedMutator over PutList?
Short Answer
BufferedMutator generally provides better throughput than just using Table#put(List<Put>) but needs proper tuning of hbase.client.write.buffer, hbase.client.max.total.tasks, hbase.client.max.perserver.tasks and hbase.client.max.perregion.tasks for good performance.
Explanation
When you pass a list of Puts to the HBase client, it groups the Puts by destination region and batches these groups by destination region server. A single RPC request is sent for each batch. This cuts down the RPC overhead, especially when the Puts are very small and the per-request RPC overhead would otherwise be significant.
The Table client sends all the Puts to the region servers immediately and waits for the responses. This means that any batching is limited to the Puts in that single API call, and the API calls are synchronous from the caller's perspective.
BufferedMutator, however, keeps the Puts in a client-side buffer and flushes them, based on the current buffered size, from background threads managed by a class called AsyncProcess. From the caller's perspective each API call is still synchronous, but the buffering strategy gives much better batching. The background flush model also allows a continuous flow of requests, which, combined with better batching, means the client can support more threads. The trade-off of this buffering strategy is that the larger the buffer, the worse the per-operation latency seen by the caller, but higher throughput can be sustained by running many more client threads.
Some of the configs that control BufferedMutator throughput are:
hbase.client.write.buffer: Size (bytes) of the buffer (Higher gives better peak throughput, consumes more memory)
hbase.client.max.total.tasks: Number of pending requests across the cluster before AsyncProcess starts blocking requests (Higher is better, but can starve CPU on client, or cause overload on servers)
hbase.client.max.perserver.tasks: Number of pending requests for one region server before AsyncProcess starts blocking requests.
hbase.client.max.perregion.tasks: Number of pending requests per region.
Also, for the sake of completeness, it should go without saying that if the bottleneck is on the server side rather than the client side, you won't see much of a performance gain from using BufferedMutator over Table on the client.
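For comparison with the PutList approach, a minimal BufferedMutator sketch (the table name, column family, and the 8 MB buffer value are placeholders to tune):

    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.BufferedMutator;
    import org.apache.hadoop.hbase.client.BufferedMutatorParams;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BufferedMutatorDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // Client-side write buffer size; example value, measure for your workload.
            conf.setLong("hbase.client.write.buffer", 8L * 1024 * 1024);

            try (Connection connection = ConnectionFactory.createConnection(conf);
                 BufferedMutator mutator = connection.getBufferedMutator(
                         new BufferedMutatorParams(TableName.valueOf("my_table")))) {

                Put put = new Put(Bytes.toBytes("row-1"));
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));

                // Buffered on the client; AsyncProcess flushes in the background
                // once the buffer fills up.
                mutator.mutate(Arrays.asList(put));

                // Push out anything still buffered before exiting.
                mutator.flush();
            }
        }
    }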
In event processing, one function puts values into a collection and another removes them from the same collection. The items should be placed in the collection in the order they are received from the source (sockets) and read in the same order, or else the results will change.
Queue is the collection most people recommend, but is the queue blocked while an item is being added, so that the other function has to wait until the add completes? That would make it inefficient, and the operational latency would increase over time.
For example, one thread reads from a queue and another writes to the same queue. Only one operation can run on the queue at a time, until it releases the lock. Is there any data structure that avoids this?
ConcurrentLinkedQueue is one example; see the other classes in java.util.concurrent.
There are even more performant third-party libraries for specific cases, e.g. the LMAX Disruptor.
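A minimal hand-off sketch with ConcurrentLinkedQueue (the busy-polling consumer is only for illustration):

    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;

    public class HandOff {
        public static void main(String[] args) throws InterruptedException {
            Queue<String> queue = new ConcurrentLinkedQueue<>();

            Thread producer = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    queue.offer("event-" + i); // non-blocking, lock-free
                }
            });

            Thread consumer = new Thread(() -> {
                int consumed = 0;
                while (consumed < 1000) {
                    String item = queue.poll(); // returns null if the queue is empty
                    if (item != null) {
                        consumed++;
                    } // else: busy-spin; fine for a demo, wasteful in production
                }
            });

            producer.start();
            consumer.start();
            producer.join();
            consumer.join();
        }
    }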
In fact, LinkedBlockingQueue is the easiest to use in many cases because of its blocking put and take methods, which wait until there's an item to take, or space to insert another item if an upper size limit (the capacity) has been set. Setting a capacity is optional; without one, the queue can grow indefinitely.
ArrayBlockingQueue, on the other hand, is the most efficient and elegant of them: it internally uses a ring buffer and therefore must have a fixed capacity. It is much faster than LinkedBlockingQueue, yet far from the maximum throughput you can achieve with a disruptor :)
In both cases, blocking is purely optional on either side. All concurrent queues also support a non-blocking API, and the blocking and non-blocking APIs can be mixed.
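For example, a bounded pipe sketch with ArrayBlockingQueue using the blocking put/take API (the capacity and the empty-array end-of-stream marker are arbitrary choices):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class BoundedPipe {
        public static void main(String[] args) throws InterruptedException {
            // Fixed capacity backed by a ring buffer; put() blocks when full.
            BlockingQueue<byte[]> pipe = new ArrayBlockingQueue<>(256);

            Thread writer = new Thread(() -> {
                try {
                    for (int i = 0; i < 10_000; i++) {
                        pipe.put(new byte[64]); // blocks if all 256 slots are taken
                    }
                    pipe.put(new byte[0]);      // empty array as end-of-stream marker
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            Thread reader = new Thread(() -> {
                try {
                    while (true) {
                        byte[] chunk = pipe.take(); // blocks until data is available
                        if (chunk.length == 0) break;
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            writer.start();
            reader.start();
            writer.join();
            reader.join();
        }
    }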
In many cases the queue is not the bottleneck, and when it really is, using a disruptor is often the sensible thing to do. It is not a queue but a ring buffer shared between participating threads with different roles, i.e. typically one producer, n workers, and one consumer. It is a bit more cumbersome to set up, but speeds of around 100 million transactions per second are possible on modern hardware, because it does not require expensive volatile variables and instead relies on more subtle, machine-dependent ways of serialising reads and writes (you basically need to write parts of such a thing in assembler) :)
I want to stream data over network continuously. The source gives me a byte array that I'd want to store in a data structure which serves as buffer to compensate for any network lags.
What is the most efficient data structure to store the bytes in queue fashion? Think of it as a pipe where one thread pumps in data and another reads it and sends it over the network, while the pipe itself is long enough to contain multiple frames of the input data.
Is Queue efficient enough?
A Queue would not be efficient if you put bytes in one at a time. It would eat lots of memory, create GC pressure, and slow things down.
You could make the overhead of Queues reasonable if you put reasonably-sized (say 64kB) byte[]s or ByteBuffers in them. That buffer size could be tunable and changed based on performance experiments or perhaps even be adaptive at runtime.
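A sketch of that chunking idea, assuming a bounded queue of ByteBuffers (the 64 kB chunk size and queue depth are just starting points to benchmark):

    import java.nio.ByteBuffer;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Sketch: pass fixed-size chunks between threads instead of individual bytes.
    public class ChunkPipe {
        private static final int CHUNK_SIZE = 64 * 1024; // tune via benchmarks

        private final BlockingQueue<ByteBuffer> queue = new ArrayBlockingQueue<>(128);

        // Producer side: split incoming bytes into chunks and enqueue them.
        public void write(byte[] data) throws InterruptedException {
            int offset = 0;
            while (offset < data.length) {
                int len = Math.min(CHUNK_SIZE, data.length - offset);
                ByteBuffer chunk = ByteBuffer.allocate(len);
                chunk.put(data, offset, len);
                chunk.flip();
                queue.put(chunk); // blocks if the consumer falls behind (back-pressure)
                offset += len;
            }
        }

        // Consumer side: take the next chunk, blocking until one is available.
        public ByteBuffer read() throws InterruptedException {
            return queue.take();
        }
    }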
TCP already compensates for network lags. If you are using UDP then you will need to handle congestion properly or things will go badly. In practice using TCP or UDP directly creates a lot of extra work and reinvention of wheels.
ZeroMQ (or the pure-Java JeroMQ) is a good library option with an efficient wire protocol (good enough for real-time stock trading platforms). It handles the queueing transparently and gives you a lot of options for different client models, including things like PUB/SUB that would help if you have lots of clients on a broadcast. Within a process, ZeroMQ can manage the queueing of data between producers and consumers. You could even use it to efficiently broadcast the same bytes to workers that do independent things with the same stream (e.g. one doing usage metering and another doing transcoding).
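If you go that route, a minimal PUSH/PULL pipe with JeroMQ might look roughly like this (the socket types, endpoint, and a recent JeroMQ version are assumptions):

    import org.zeromq.SocketType;
    import org.zeromq.ZContext;
    import org.zeromq.ZMQ;

    public class ZmqPipe {
        public static void main(String[] args) {
            try (ZContext context = new ZContext()) {
                // A PUSH/PULL pair forms a simple one-way pipe; ZeroMQ queues internally.
                ZMQ.Socket sender = context.createSocket(SocketType.PUSH);
                sender.bind("tcp://*:5555");

                ZMQ.Socket receiver = context.createSocket(SocketType.PULL);
                receiver.connect("tcp://localhost:5555");

                sender.send("frame-1".getBytes(ZMQ.CHARSET));
                byte[] frame = receiver.recv(); // blocks until a frame arrives
                System.out.println(new String(frame, ZMQ.CHARSET));
            }
        }
    }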
There are other libraries that may also work. I think Netty handles things like this efficiently for example.
You should look into the Okio library.
I have a binary protocol with some unknown amount of initial header data (unknown length until header is fully decoded) followed by a stream of data that should be written to disk for later processing.
I have an implementation that decodes the header and then writes the file data to disk as it comes in from one or more frames in ChannelBuffers (so my handler subclasses FrameDecoder, and builds the message step by step, not waiting for one ChannelBuffer to contain the entire message, and writing the file data to disk with each frame). My concern is whether this is enough, or if using something like ChunkedWriteHandler does more than this and is necessary to handle large uploads.
Is there a more optimal way of handling file data than just writing it directly to disk from the ChannelBuffer with each frame?
It should be enough as long as the throughput is good enough. Otherwise, you might want to buffer the received data (say, in a 32 KiB bounded buffer) so that you don't make system calls too often when the amount of data received per frame is small.
Netty could be even faster if it exposed the transferTo/From operation to a user, but such a feature is not available yet.
You should also think about adding an ExecutionHandler in front of your handler. This will help you avoid getting blocked by disk I/O; otherwise you may see slowdowns under heavy disk access.
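A sketch of the 32 KiB buffering suggested above, in plain NIO and independent of the Netty pipeline details (the sink class and threshold are illustrative):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    // Sketch: accumulate small frames and write to disk in larger blocks.
    public class BufferedFileSink {
        private static final int FLUSH_THRESHOLD = 32 * 1024; // 32 KiB, as suggested above

        private final FileChannel channel;
        private final ByteBuffer buffer = ByteBuffer.allocate(FLUSH_THRESHOLD);

        public BufferedFileSink(FileChannel channel) {
            this.channel = channel;
        }

        // Called with each received frame's payload.
        public void write(byte[] frameData) throws IOException {
            int offset = 0;
            while (offset < frameData.length) {
                int len = Math.min(buffer.remaining(), frameData.length - offset);
                buffer.put(frameData, offset, len);
                offset += len;
                if (!buffer.hasRemaining()) {
                    flush(); // one larger write instead of many small system calls
                }
            }
        }

        // Flush whatever is buffered, e.g. at the end of the upload.
        public void flush() throws IOException {
            buffer.flip();
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            buffer.clear();
        }
    }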