How to set a timeout for 0MQ (ZeroMQ) in Java?

I need to add a timeout for the reply/request transaction using 0MQ. How is this typically accomplished? I tried using the methods:
socket.setReceiveTimeOut();
and
socket.setSendTimeout();
but they seem to cause a null pointer exception.
In essence, I want the application to timeout after 10 seconds if the application receiving the request is not available.
Any help is appreciated.
Thanks!

I think jzmq should throw a ZMQException when recv times out, but no ZMQException is thrown when err == EAGAIN.
https://github.com/zeromq/jzmq/blob/master/jzmq-jni/src/main/c%2B%2B/Socket.cpp
static
zmq_msg_t *do_read(JNIEnv *env, jobject obj, zmq_msg_t *message, int flags)
{
    void *socket = get_socket (env, obj);

    int rc = zmq_msg_init (message);
    if (rc != 0) {
        raise_exception (env, zmq_errno());
        return NULL;
    }

#if ZMQ_VERSION >= ZMQ_MAKE_VERSION(3,0,0)
    rc = zmq_recvmsg (socket, message, flags);
#else
    rc = zmq_recv (socket, message, flags);
#endif
    int err = zmq_errno();
    if (rc < 0 && err == EAGAIN) {
        rc = zmq_msg_close (message);
        err = zmq_errno();
        if (rc != 0) {
            raise_exception (env, err);
            return NULL;
        }
        return NULL;
    }
    if (rc < 0) {
        raise_exception (env, err);
        rc = zmq_msg_close (message);
        err = zmq_errno();
        if (rc != 0) {
            raise_exception (env, err);
            return NULL;
        }
        return NULL;
    }
    return message;
}

I wonder if your null pointer is related to how your socket was created. I have set a socket timeout successfully in the past.
The following has worked for me when I used the JeroMQ library (native Java implementation of ZMQ). I used this to help do REQ-REP commands via ZMQ.
ZMQ.Context context = ZMQ.context(1);
ZMQ.Socket sock = context.socket(ZMQ.REQ);
sock.setSendTimeOut(10000);    // 10 second send timeout
sock.setReceiveTimeOut(10000); // 10 second receive timeout

if (sock.connect("tcp://127.0.0.1:1234")) {
    if (sock.send(/* insert data here */)) {
        /* Send was successful and did not time out. */
        byte[] replyBytes = null;
        replyBytes = sock.recv();
        if (null == replyBytes) {
            /* Receive timed out. */
        } else {
            /* Receive was successful. Do something with replyBytes. */
        }
    }
}

How is this [a timeout for the reply/request transaction] typically accomplished?
I am sad to confirm there is nothing like this in the ZeroMQ native API. The principle of asynchronous delivery means there is no limit on how long delivery may take (it is a best-effort model of scheduling, and delivery may not happen at all).
If you are new to ZeroMQ, you may enjoy a quick read through the main conceptual elements in the [ ZeroMQ hierarchy in less than five seconds ] section.
I want ... to timeout after 10 seconds if ... receiving the request is not ...
You may design your .recv()-method call either as a pre-tested / protected call, screened first by a .poll( 10000 )-method, so that you explicitly detect the presence of a message actually delivered to your application code before ever issuing ( or not ) the call to the actual .recv()-method, reading only once a message has been positively confirmed to be ready to get locally read; or you may use a bit more "raw" approach, using a handler with the non-blocking form of the method, a call to the .recv( ZMQ_NOBLOCK )-flagged method that does not spend a millisecond "there" in cases when "there" are no messages to read right now from the local-side Context()-engine instance, and then handle each of the two cases accordingly in your code.
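A minimal sketch of the poll-first approach, assuming the JeroMQ API used in the answer above (the endpoint, payload and the exact Poller construction are illustrative and may differ slightly between binding versions):
import org.zeromq.ZMQ;

public class ReqWithPollTimeout {
    public static void main(String[] args) {
        ZMQ.Context context = ZMQ.context(1);
        ZMQ.Socket sock = context.socket(ZMQ.REQ);
        sock.connect("tcp://127.0.0.1:1234");
        sock.send("ping".getBytes(), 0);

        // Wait up to 10 seconds for a reply before ever calling recv().
        ZMQ.Poller poller = context.poller(1);
        poller.register(sock, ZMQ.Poller.POLLIN);
        poller.poll(10000); // milliseconds

        if (poller.pollin(0)) {
            byte[] reply = sock.recv(0); // guaranteed not to block here
            // ... process reply ...
        } else {
            // Timed out: the peer is unavailable; recover, e.g. close and recreate the socket.
        }

        sock.close();
        context.term();
    }
}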
A Bonus Point
Also be warned that using the REQ/REP Scalable Formal Communication Archetype pattern will not make this any easier, as there is a mandatory two-side step-dance ( unless intentionally relaxed via ZMQ_REQ_RELAXED ), so both of the FSA-back-to-back-connected FSAs will still have to wait for the next "expected" remote event before getting a chance to handle the next local event. If interested in details, you will find many posts on the unavoidable, unsalvageable mutual deadlock that is sure to happen for REQ/REP; we only do not know when it happens, but we are sure it will.
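For completeness, a minimal sketch of the usual recovery for a REQ socket stuck after an unanswered request (the "Lazy Pirate" approach from the ZeroMQ Guide): give up on the timed-out socket, close it and create a fresh one before retrying. The endpoint and retry count are placeholders.
import org.zeromq.ZMQ;

public class LazyPirateClient {
    static byte[] requestWithRetry(ZMQ.Context ctx, byte[] request, int retries) {
        for (int attempt = 0; attempt < retries; attempt++) {
            ZMQ.Socket req = ctx.socket(ZMQ.REQ);
            req.setReceiveTimeOut(10000); // 10 s, as in the question
            req.setLinger(0);             // do not hang in close() behind an un-replied request
            req.connect("tcp://127.0.0.1:1234");
            req.send(request, 0);
            byte[] reply = req.recv(0);   // null when the receive timeout fires
            req.close();                  // a timed-out REQ socket cannot simply send again
            if (reply != null) {
                return reply;
            }
        }
        return null; // the peer never answered
    }
}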

Related

Communication issues between two Netty socket applications (partial messages)

I'm experiencing communication issues while testing between two Netty socket applications (the main application, and the integration test application), where I've been receiving an unusual number of partial messages.
One pattern being noticed is that the first message sent from the test application (the application sends the message from outside the pipeline, using a sharable handler) tends to always be partial. This is also noticed during times of latency, when another issue occurs.
The other issue is that when a partial message is received at times, the decoder seems to be trapped in a loop, where it continues to try to read the partial message indefinitely. I have a unit test to simulate the partial message using EmbeddedChannel, but the unit test is not replicating what I am seeing during the integration test.
The main application is using the following pipeline:
ch.pipeline().addLast(<HeaderTrailerFrameDecoder>, <NettyMessageDecoder>, <HeaderTrailerFrameEncoder>, <NettyMessageEncoder>);
ch.pipeline().addLast(<IdleStateHandler>,<EventHandler>, <ChannelHandler>);
where:
HeaderTrailerFrameDecoder - Removes single-byte frame from beginning / end of packet
NettyMessageDecoder - Converts message from ByteBuf to domain object
HeaderTrailerFrameEncoder - Appends frame to message packets
NettyMessageEncoder - Converts message from domain object to ByteBuf
IdleStateHandler - Netty class, detects stale / idle connections
EventHandler - Sharable handler to send messages from outside of the pipeline
ChannelHandler - Main handler for all business logic
I'm thinking the problem could be with the encoders / decoders, perhaps not releasing my ByteBuf objects? I'm only seeing the issue in the first decoder, the HeaderTrailerFrameDecoder, so I'll provide a snippet below. For every connection, the first message coming through from a series of messages sent by the test application first produces the log msg="could not find trailer".
@Slf4j // Lombok logging
public class HeaderTrailerFrameDecoder extends ByteToMessageDecoder {

    private final byte header;
    private final byte trailer;

    HeaderTrailerFrameDecoder(byte header, byte trailer) {
        this.header = header;
        this.trailer = trailer;
    }

    @Override
    protected void decode(
            final ChannelHandlerContext ctx,
            final ByteBuf buf,
            final List<Object> out) {
        log.trace("msg=\"decoding message with header and trailer\", buf={}", buf);

        // Find header
        int headerIndex = buf.forEachByte(value -> value != header);
        if (headerIndex < 0) {
            String payload = debug(buf);
            log.error("msg=\"could not find header\", payload=\"{}\"", payload);
            buf.skipBytes(buf.readableBytes());
            return;
        }
        int beforeHeaderLen = headerIndex - buf.readerIndex();
        buf.skipBytes(beforeHeaderLen + 1);

        // Find trailer
        int trailerIndex = buf.forEachByte(value -> value != trailer);
        if (trailerIndex < 0) {
            String payload = debug(buf);
            log.error("msg=\"could not find trailer\", payload=\"{}\"", payload);
            buf.resetReaderIndex();
            return;
        }
        int insideFrameLen = trailerIndex - buf.readerIndex();
        ByteBuf frame = buf.readBytes(insideFrameLen);
        buf.skipBytes(1);

        // Pass message through
        out.add(frame);
    }
}
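For reference, this is roughly how a ByteToMessageDecoder is expected to cope with partial input: when a complete frame is not yet in the buffer, consume nothing and simply return, and Netty will call decode() again once more bytes arrive. This is only an illustrative sketch (not the decoder above); header and trailer are assumed to be single-byte markers.
import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.codec.ByteToMessageDecoder;
import java.util.List;

public class DelimitedFrameDecoder extends ByteToMessageDecoder {
    private final byte header;
    private final byte trailer;

    public DelimitedFrameDecoder(byte header, byte trailer) {
        this.header = header;
        this.trailer = trailer;
    }

    @Override
    protected void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) {
        int headerIndex = in.indexOf(in.readerIndex(), in.writerIndex(), header);
        if (headerIndex < 0) {
            // No header at all yet: these bytes can never start a frame, discard them.
            in.skipBytes(in.readableBytes());
            return;
        }
        int trailerIndex = in.indexOf(headerIndex + 1, in.writerIndex(), trailer);
        if (trailerIndex < 0) {
            // Partial frame: consume nothing and wait for the next read event.
            return;
        }
        in.readerIndex(headerIndex + 1);                              // drop the header byte
        ByteBuf frame = in.readRetainedSlice(trailerIndex - in.readerIndex());
        in.skipBytes(1);                                              // drop the trailer byte
        out.add(frame);
    }
}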

JNI: Does C++ call Java asynchronously?

I'm trying to call some Java classes from C++ code by using JNI. Today I experienced a very strange behaviour in my program. I suspect that the C++ code is not waiting until the Java side has finished its work, and I don't know why.
The C++ code (in a shared object library) is running in its own C++ thread. It is using an existing JavaVM of a Java app that is already up and running. The references to the VM and the ClassLoader were fetched while the Java application was loading the shared object library in JNI_OnLoad. Here I call the Java method of a Java object I created with JNI in the C++ thread:
env->CallVoidMethod(javaClassObject, javaReceiveMethod, intParam, byteParam, objectParam);
uint32_t applicationSideResponseCode = getResponseCode(objectClass, objectParam, env);
resp.setResponseCode(applicationSideResponseCode);
std::string applicationData = getApplicationData(serviceResultClass, serviceResultObject, env);
resp.setData(applicationData);
The Java javaReceiveMethod is accessing the database and fetching some applicationData which is stored in the objectParam. Unfortunately the C++ code fetches the applicationData before the Java class has completed its work. applicationData is null and the JNI crashes. Why? I can't find any documentation from Oracle stating that CallVoidMethod is executed asynchronously.
Edit
I verified that no exception has occurred in the Java method. Everything seems fine, except that Java is still busy while C++ is trying to access the data.
Edit
I can confirm that if I debug the Java application, two threads are shown: one main thread that is executing javaReceiveMethod and one thread that is fetching the applicationData. How can I solve this problem? Idle in the second thread until the data is available?
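For illustration, one way to do exactly that on the Java side would be to block receive() until the background work has finished, so the native caller never sees a half-filled ServiceResult. This is only a hedged sketch; executor and fetchApplicationData() are placeholders for whatever currently performs the database work on the second thread, and ServiceResult is the class from the code below.
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Receiver {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    public int receive(int methodId, byte[] data, ServiceResult result) throws InterruptedException {
        final CountDownLatch done = new CountDownLatch(1);
        executor.submit(() -> {
            try {
                result.setApplicationData(fetchApplicationData(methodId, data)); // placeholder for the real DB call
            } finally {
                done.countDown();
            }
        });
        done.await(); // keep the JNI-attached thread here until the worker has filled in the result
        return 0;
    }

    private byte[] fetchApplicationData(int methodId, byte[] data) {
        return new byte[0]; // placeholder
    }
}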
Edit
In my C++ code I'm creating a new object of the java class that I want to call:
jmethodID javaClassConstructor= env->GetMethodID(javaClass, "<init>", "()V");
jobject serviceObject = env->NewObject(javaClass, serviceConstructor);
jobject javaClassObject = env->NewObject(javaClass, javaClassConstructor);
After that I call the code as shown above. After more debugging I can say that the method is called in a thread named Thread-2 (I don't know if that is the C++ thread or a new one from JNI). It is definitely not the Java main thread. However, the work of the method is interrupted. That means if I debug the code, I can see that the data would be set soon, but in the next debug step the getApplicationData method is executed (which can only occur if C++ is calling it).
Edit
The Java method I call:
public int receive(int methodId, byte[] data, ServiceResult result){
    log.info("java enter methodID = " + methodId + ", data= " + data);
    long responseCode = SEC_ERR_CODE_SUCCESS;
    JavaMessageProto msg;
    try {
        msg = JavaMessageProto (data);
        log.info("principal: " + msg.getPrincipal());
        JavaMessage message = new JavaMessage (msg);
        if(methodId == GET_LISTS){
            //this is shown in console
            System.out.println("get lists");
            responseCode = getLists(message);
            //this point is not reached
            log.info("leave");
        }
        //[... different method calls here...]
        if(responseCode != SEC_ERR_CODE_METHOD_NOT_IMPLEMENTED){
            //ToDoListMessageProto response = message.getProtoBuf();
            JavaMessageProto response = JavaMessageProto.newBuilder()
                    .setToken(message.getToken())
                    .setPrincipal(message.getPrincipal()).build();
            byte[] res = response.toByteArray();
            result.setApplicationData(response.toByteArray());
        }
        else{
            result.setApplicationData("");
        }
    } catch (InvalidProtocolBufferException e) {
        responseCode = SEC_ERR_CODE_DATA_CORRUPTED;
        log.severe("Error: Could not parse Client message." + e.getMessage());
    }
    result.setResponseCode((int)responseCode);
    return 0;
}
The second method is
public long getLists(JavaMessage message) {
    log.info("getLists enter");
    String principal = message.getPrincipal();
    String token = message.getToken();
    if(principal == null || principal.isEmpty()){
        return SEC_ERR_CODE_PRINCIPAL_EMPTY;
    }
    if(token == null || token.isEmpty()){
        return SEC_ERR_CODE_NO_AUTHENTICATION;
    }
    //get user object for authorization
    SubjectManager manager = new SubjectManager();
    Subject user = manager.getSubject();
    user.setPrincipal(principal);
    long result = user.isAuthenticated(token);
    if(result != SEC_ERR_CODE_SUCCESS){
        return result;
    }
    try {
        //fetch all user list names and ids
        ToDoListDAO db = new ToDoListDAO();
        Connection conn = db.getConnection();
        log.info( principal + " is authenticated");
        result = db.getLists(conn, message);
        //this is printed
        log.info( principal + " is authenticated");
        conn.close(); //no exception here
        message.addId("testentry");
        //this not
        log.info("Fetched lists finished for " + principal);
    } catch (SQLException e) {
        log.severe("SQLException:" + e.getMessage());
        result = SEC_ERR_CODE_DATABASE_ERROR;
    }
    return result;
}
CallVoidMethod is executed synchronously.
Maybe you have an exception on the C++ side? Do you use C++ JNI exception checks?:
env->CallVoidMethod(javaClassObject, javaReceiveMethod, intParam, byteParam, objectParam);
if (env->ExceptionOccurred()) {
    // Print exception caused by CallVoidMethod
    env->ExceptionDescribe();
    env->ExceptionClear();
}
The C++ code (in a shared object library) is running in it's own C++ thread. It is using a existing JavaVM of a java app that is already up and running.
It's not clear whether you have attached the current thread to the virtual machine. Make sure env is coming from an AttachCurrentThread call. You will find an example here: How to obtain JNI interface pointer (JNIEnv *) for asynchronous calls.

Odd InetAddress.isReachable() issue

My work is developing software for network-capable cameras for retail environments. One of the pieces of software my team is developing is a webserver that retrieves various reports generated in HTML by the camera itself (which has its own embedded webserver) and stored on the camera. Our software will then GET these reports from the camera and store them on a central webserver.
While we are fine plugging in the IPs of the cameras into our software, I am developing a simple Java class that will query the network and locate all cameras on the network.
The problem though is that while it runs just fine on my PC, and my coworker's PC, when we attempt to run it on the actual webserver PC that will host our software... it runs, but says every IP in the subnet is offline / unreachable EXCEPT for the gateway IP.
For example, if I run it from my PC or my coworker's PC when plugged into the closed LAN, I get the following active IPs found, along with a flag telling me whether each is a camera or not.
(gateway is 192.168.0.1, subnet mask is 255.255.255.0, which means full range of 256 devices to be looked for)
IP:/192.168.0.1 Active:true Camera:false
IP:/192.168.0.100 Active:true Camera:true <- this is camera 1
IP:/192.168.0.101 Active:true Camera:true <- this is camera 2
IP:/192.168.0.103 Active:true Camera:false <- my PC
IP:/192.168.0.104 Active:true Camera:false <- this is our webserver
But for some reason, when running the same program from the webserver PC, using the same JRE, I only get the following found
IP:/192.168.0.1 Active:true Camera:false
Now my code, instead of enumerating through each IP in order on the main Thread, creates a separate Thread for each IP to be checked and runs them concurrently (otherwise it would take a little over 21 minutes to enumerate through the entire IP range at a timeout of 5000 ms / IP). The main Thread then re-runs these IP scan threads every 15 seconds, over and over.
I have checked that all the threads are running to completion on all the PCs, and no exceptions are being thrown. I even verified that none of the threads are getting stuck. Each Thread takes about 5001 to 5050 ms from start to complete, and those Threads that have an active IP finish sooner (< 5000 ms), so I know that it's correctly waiting the full 5000 ms in the ipAddr.isReachable(5000) method.
My coworker and I are stumped at this point: it reaches those active IPs fine when run on our PCs, yet gets no response when run from the webserver PC.
We have ruled out firewall issues, admin access issues, etc. The only difference is that our webserver is Embedded Win XP, and our PCs are Windows 7.
This has us stumped. Any ideas why?
Below is the code that is running each IP Thread:
public void CheckIP() {
    new Thread() {
        @Override
        public void run() {
            try {
                isActive = ipAddr.isReachable(5000);
                if (isActive) {
                    if (!isCamera) {
                        isCamera = new IpHttpManager().GetResponse(ipAddr.toString());
                    }
                } else {
                    isCamera = false;
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }.start();
}
EDIT: Here is the code that builds each IP to check after determining the range based on gateway and subnet...
for (int i = subMin; i <= subMax; i++) {
    byte[] ip = new byte[] {(byte) oct[0], (byte) oct[1], (byte) oct[2], (byte) i};
    try {
        scanners[subCount] = new IpScan(InetAddress.getByAddress(ip));
        subCount++;
    } catch (UnknownHostException e) {
        e.printStackTrace();
    }
}
Thanks everyone, but I never did figure out or pinpoint why this oddity was happening. Everything I checked for was not the cause, so this question can be closed.
In any case, I ended up working around it completely. Instead of using InetAddress, I went native and built my own ICMP ping class via JNA, invoking the Windows libraries IPHLPAPI.DLL and WSOCK32.DLL. Here is what I used...
public interface InetAddr extends StdCallLibrary {
    InetAddr INSTANCE = (InetAddr)
            Native.loadLibrary("wsock32.dll", InetAddr.class);

    ULONG inet_addr(String cp); //in_addr creator. Creates the in_addr C struct used below
}

public interface IcmpEcho extends StdCallLibrary {
    IcmpEcho INSTANCE = (IcmpEcho)
            Native.loadLibrary("iphlpapi.dll", IcmpEcho.class);

    int IcmpSendEcho(
            HANDLE IcmpHandle,        //Handle to the ICMP
            ULONG DestinationAddress, //Destination address, in the form of an in_addr C Struct defaulted to ULONG
            Pointer RequestData,      //Pointer to the buffer where my Message to be sent is
            short RequestSize,        //size of the above buffer. sizeof(Message)
            byte[] RequestOptions,    //OPTIONAL!! Can set this to NULL
            Pointer ReplyBuffer,      //Pointer to the buffer where the replied echo is written to
            int ReplySize,            //size of the above buffer. Normally its set to the sizeof(ICMP_ECHO_REPLY), but arbitrarily set it to 256 bytes
            int Timeout);             //time, as int, for timeout

    HANDLE IcmpCreateFile();                    //win32 ICMP Handle creator
    boolean IcmpCloseHandle(HANDLE IcmpHandle); //win32 ICMP Handle destroyer
}
And then using those to create the following method...
public void SendReply(String ipAddress) {
    final IcmpEcho icmpecho = IcmpEcho.INSTANCE;
    final InetAddr inetAddr = InetAddr.INSTANCE;
    HANDLE icmpHandle = icmpecho.IcmpCreateFile();
    byte[] message = new String("thisIsMyMessage!".toCharArray()).getBytes();
    Memory messageData = new Memory(32); //In C/C++ this would be: void *messageData = (void*) malloc(message.length);
    messageData.write(0, message, 0, message.length); //but ignored the length and set it to 32 bytes instead for now
    Pointer requestData = messageData;
    Pointer replyBuffer = new Memory(256);
    replyBuffer.clear(256);

    // HERE IS THE NATIVE CALL!!
    reply = icmpecho.IcmpSendEcho(icmpHandle,
            inetAddr.inet_addr(ipAddress),
            requestData,
            (short) 32,
            null,
            replyBuffer,
            256,
            timeout);
    // NATIVE CALL DONE, CHECK REPLY!!

    icmpecho.IcmpCloseHandle(icmpHandle);
}

public boolean IsReachable () {
    return (reply > 0);
}
My guess is that your iteration logic to determine the different IP addresses is based upon a different configuration, hence your PCs fetch all addresses but your webserver doesn't.
Try adding debug output in the logic where you build up the list of IP addresses to check.

Couchbase: net.spy.memcached.internal.CheckedOperationTimeoutException

I'm loading a local Couchbase instance with application-specific JSON objects.
Relevant code is:
CouchbaseClient getCouchbaseClient()
{
    List<URI> uris = new LinkedList<URI>();
    uris.add(URI.create("http://localhost:8091/pools"));
    CouchbaseConnectionFactoryBuilder cfb = new CouchbaseConnectionFactoryBuilder();
    cfb.setFailureMode(FailureMode.Retry);
    cfb.setMaxReconnectDelay(1500);   // to enqueue an operation
    cfb.setOpTimeout(10000);          // wait up to 10 seconds for an operation to succeed
    cfb.setOpQueueMaxBlockTime(5000); // wait up to 5 seconds when trying to
                                      // enqueue an operation
    return new CouchbaseClient(cfb.buildCouchbaseConnection(uris, "my-app-bucket", ""));
}
Method to store entry (I'm using suggestions from Bulk Load and Exponential Backoff):
void continuosSet(CouchbaseClient cache, String key, int exp, Object value, int tries)
{
    OperationFuture<Boolean> result = null;
    OperationStatus status = null;
    int backoffexp = 0;

    do
    {
        if (backoffexp > tries)
        {
            throw new RuntimeException(MessageFormat.format("Could not perform a set after {0} tries.", tries));
        }

        result = cache.set(key, exp, value);
        try
        {
            if (result.get())
            {
                break;
            }
            else
            {
                status = result.getStatus();
                LOG.warn(MessageFormat.format("Set failed with status \"{0}\" ... retrying.", status.getMessage()));
                if (backoffexp > 0)
                {
                    double backoffMillis = Math.pow(2, backoffexp);
                    backoffMillis = Math.min(1000, backoffMillis); // 1 sec max
                    Thread.sleep((int) backoffMillis);
                    LOG.warn("Backing off, tries so far: " + tries);
                }
                backoffexp++;
            }
        }
        catch (ExecutionException e)
        {
            LOG.error("ExecutionException while doing set: " + e.getMessage());
        }
        catch (InterruptedException e)
        {
            LOG.error("InterruptedException while doing set: " + e.getMessage());
        }
    }
    while (status != null && status.getMessage() != null && status.getMessage().indexOf("Temporary failure") > -1);
}
When the continuosSet method is called for a large number of objects to store (single thread), e.g.
CouchbaseClient cache = getCouchbaseClient();
do
{
    SerializableData data = queue.poll();
    if (data != null)
    {
        final String key = data.getClass().getSimpleName() + data.getId();
        continuosSet(cache, key, 0, gson.toJson(data, data.getClass()), 100);
...
it generates a CheckedOperationTimeoutException inside the continuosSet method, in the result.get() operation.
Caused by: net.spy.memcached.internal.CheckedOperationTimeoutException: Timed out waiting for operation - failing node: 127.0.0.1/127.0.0.1:11210
at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:160) ~[spymemcached-2.8.12.jar:2.8.12]
at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:133) ~[spymemcached-2.8.12.jar:2.8.12]
Can someone shed light on how to overcome and recover from this situation? Is there a good technique/workaround for bulk loading with the Java client for Couchbase? I have already explored the Performing a Bulk Set documentation, which is unfortunately for the PHP Couchbase client.
My suspicion is that you may be running this in a JVM spawned from the command line that doesn't have that much memory. If that's the case, you could hit longer GC pauses which could cause the timeout you're mentioning.
I think the best thing to do is to try a couple of things. First, raise the -Xmx argument to the JVM to use more memory. See if the timeout happens later or goes away. If so, then my suspicion about memory is correct.
If that doesn't work, raise the setOpTimeout() and see if that reduces the error or makes it go away.
Also, make sure you're using the latest client.
By the way, I don't think this is directly bulk loading related. It may happen owing to a lot of resource consumption during bulk loading, but it looks like the regular backoff must be working or you're not ever hitting it.

Are there C++ equivalents for the Protocol Buffers delimited I/O functions in Java?

I'm trying to read / write multiple Protocol Buffers messages from files, in both C++ and Java. Google suggests writing length prefixes before the messages, but there's no way to do that by default (that I could see).
However, the Java API in version 2.1.0 received a set of "Delimited" I/O functions which apparently do that job:
parseDelimitedFrom
mergeDelimitedFrom
writeDelimitedTo
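On the Java side, those three methods are used along these lines (a minimal sketch; AddressBook stands in for whatever generated message type you are writing):
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class DelimitedIoExample {
    static void roundTrip(AddressBook first, AddressBook second) throws IOException {
        try (FileOutputStream out = new FileOutputStream("messages.bin")) {
            first.writeDelimitedTo(out);   // varint length prefix, then the payload
            second.writeDelimitedTo(out);
        }
        try (FileInputStream in = new FileInputStream("messages.bin")) {
            AddressBook msg;
            while ((msg = AddressBook.parseDelimitedFrom(in)) != null) { // null at clean EOF
                System.out.println(msg);
            }
        }
    }
}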
Are there C++ equivalents? And if not, what's the wire format for the size prefixes the Java API attaches, so I can parse those messages in C++?
Update:
These now exist in google/protobuf/util/delimited_message_util.h as of v3.3.0.
I'm a bit late to the party here, but the below implementations include some optimizations missing from the other answers and will not fail after 64MB of input (though it still enforces the 64MB limit on each individual message, just not on the whole stream).
(I am the author of the C++ and Java protobuf libraries, but I no longer work for Google. Sorry that this code never made it into the official lib. This is what it would look like if it had.)
bool writeDelimitedTo(
    const google::protobuf::MessageLite& message,
    google::protobuf::io::ZeroCopyOutputStream* rawOutput) {
  // We create a new coded stream for each message.  Don't worry, this is fast.
  google::protobuf::io::CodedOutputStream output(rawOutput);

  // Write the size.
  const int size = message.ByteSize();
  output.WriteVarint32(size);

  uint8_t* buffer = output.GetDirectBufferForNBytesAndAdvance(size);
  if (buffer != NULL) {
    // Optimization:  The message fits in one buffer, so use the faster
    // direct-to-array serialization path.
    message.SerializeWithCachedSizesToArray(buffer);
  } else {
    // Slightly-slower path when the message is multiple buffers.
    message.SerializeWithCachedSizes(&output);
    if (output.HadError()) return false;
  }

  return true;
}

bool readDelimitedFrom(
    google::protobuf::io::ZeroCopyInputStream* rawInput,
    google::protobuf::MessageLite* message) {
  // We create a new coded stream for each message.  Don't worry, this is fast,
  // and it makes sure the 64MB total size limit is imposed per-message rather
  // than on the whole stream.  (See the CodedInputStream interface for more
  // info on this limit.)
  google::protobuf::io::CodedInputStream input(rawInput);

  // Read the size.
  uint32_t size;
  if (!input.ReadVarint32(&size)) return false;

  // Tell the stream not to read beyond that size.
  google::protobuf::io::CodedInputStream::Limit limit =
      input.PushLimit(size);

  // Parse the message.
  if (!message->MergeFromCodedStream(&input)) return false;
  if (!input.ConsumedEntireMessage()) return false;

  // Release the limit.
  input.PopLimit(limit);

  return true;
}
Okay, so I haven't been able to find top-level C++ functions implementing what I need, but some spelunking through the Java API reference turned up the following, inside the MessageLite interface:
void writeDelimitedTo(OutputStream output)
/* Like writeTo(OutputStream), but writes the size of
the message as a varint before writing the data. */
So the Java size prefix is a (Protocol Buffers) varint!
Armed with that information, I went digging through the C++ API and found the CodedStream header, which has these:
bool CodedInputStream::ReadVarint32(uint32 * value)
void CodedOutputStream::WriteVarint32(uint32 value)
Using those, I should be able to roll my own C++ functions that do the job.
They should really add this to the main Message API though; it's missing functionality considering Java has it, and so does Marc Gravell's excellent protobuf-net C# port (via SerializeWithLengthPrefix and DeserializeWithLengthPrefix).
I solved the same problem using CodedOutputStream/ArrayOutputStream to write the message (with the size) and CodedInputStream/ArrayInputStream to read the message (with the size).
For example, the following pseudo-code writes the message size followed by the message:
const unsigned bufLength = 256;
unsigned char buffer[bufLength];
Message protoMessage;
google::protobuf::io::ArrayOutputStream arrayOutput(buffer, bufLength);
google::protobuf::io::CodedOutputStream codedOutput(&arrayOutput);
codedOutput.WriteLittleEndian32(protoMessage.ByteSize());
protoMessage.SerializeToCodedStream(&codedOutput);
When writing you should also check that your buffer is large enough to fit the message (including the size). And when reading, you should check that your buffer contains a whole message (including the size).
It definitely would be handy if they added convenience methods to C++ API similar to those provided by the Java API.
IstreamInputStream is very fragile to EOFs and other errors that easily occur when used together with std::istream. After such an error the protobuf streams are permanently damaged and any already-used buffer data is destroyed. There is proper support for reading from traditional streams in protobuf.
Implement google::protobuf::io::CopyingInputStream and use that together with CopyingInputStreamAdapter. Do the same for the output variants.
In practice a parsing call ends up in google::protobuf::io::CopyingInputStream::Read(void* buffer, int size) where a buffer is given. The only thing left to do is read into it somehow.
Here's an example for use with Asio synchronized streams (SyncReadStream/SyncWriteStream):
#include <google/protobuf/io/zero_copy_stream_impl_lite.h>

using namespace google::protobuf::io;

template <typename SyncReadStream>
class AsioInputStream : public CopyingInputStream {
    public:
        AsioInputStream(SyncReadStream& sock);
        int Read(void* buffer, int size);
    private:
        SyncReadStream& m_Socket;
};

template <typename SyncReadStream>
AsioInputStream<SyncReadStream>::AsioInputStream(SyncReadStream& sock) :
    m_Socket(sock) {}

template <typename SyncReadStream>
int
AsioInputStream<SyncReadStream>::Read(void* buffer, int size)
{
    std::size_t bytes_read;
    boost::system::error_code ec;
    bytes_read = m_Socket.read_some(boost::asio::buffer(buffer, size), ec);

    if(!ec) {
        return bytes_read;
    } else if (ec == boost::asio::error::eof) {
        return 0;
    } else {
        return -1;
    }
}

template <typename SyncWriteStream>
class AsioOutputStream : public CopyingOutputStream {
    public:
        AsioOutputStream(SyncWriteStream& sock);
        bool Write(const void* buffer, int size);
    private:
        SyncWriteStream& m_Socket;
};

template <typename SyncWriteStream>
AsioOutputStream<SyncWriteStream>::AsioOutputStream(SyncWriteStream& sock) :
    m_Socket(sock) {}

template <typename SyncWriteStream>
bool
AsioOutputStream<SyncWriteStream>::Write(const void* buffer, int size)
{
    boost::system::error_code ec;
    m_Socket.write_some(boost::asio::buffer(buffer, size), ec);
    return !ec;
}
Usage:
AsioInputStream<boost::asio::ip::tcp::socket> ais(m_Socket); // Where m_Socket is an instance of boost::asio::ip::tcp::socket
CopyingInputStreamAdaptor cis_adp(&ais);
CodedInputStream cis(&cis_adp);

Message protoMessage;
uint32_t msg_size;

/* Read message size */
if(!cis.ReadVarint32(&msg_size)) {
    // Handle error
}

/* Make sure not to read beyond limit of message */
CodedInputStream::Limit msg_limit = cis.PushLimit(msg_size);
if(!msg.ParseFromCodedStream(&cis)) {
    // Handle error
}

/* Remove limit */
cis.PopLimit(msg_limit);
Here you go:
#include <google/protobuf/io/zero_copy_stream_impl.h>
#include <google/protobuf/io/coded_stream.h>

#include <cassert>
#include <fstream>
#include <iostream>
#include <string>

using namespace google::protobuf::io;

class FASWriter
{
    std::ofstream mFs;
    OstreamOutputStream *_OstreamOutputStream;
    CodedOutputStream *_CodedOutputStream;
public:
    FASWriter(const std::string &file) : mFs(file, std::ios::out | std::ios::binary)
    {
        assert(mFs.good());

        _OstreamOutputStream = new OstreamOutputStream(&mFs);
        _CodedOutputStream = new CodedOutputStream(_OstreamOutputStream);
    }

    inline void operator()(const ::google::protobuf::Message &msg)
    {
        _CodedOutputStream->WriteVarint32(msg.ByteSize());

        if ( !msg.SerializeToCodedStream(_CodedOutputStream) )
            std::cout << "SerializeToCodedStream error " << std::endl;
    }

    ~FASWriter()
    {
        delete _CodedOutputStream;
        delete _OstreamOutputStream;
        mFs.close();
    }
};

class FASReader
{
    std::ifstream mFs;
    IstreamInputStream *_IstreamInputStream;
    CodedInputStream *_CodedInputStream;
public:
    FASReader(const std::string &file) : mFs(file, std::ios::in | std::ios::binary)
    {
        assert(mFs.good());

        _IstreamInputStream = new IstreamInputStream(&mFs);
        _CodedInputStream = new CodedInputStream(_IstreamInputStream);
    }

    template<class T>
    bool ReadNext()
    {
        T msg;
        unsigned __int32 size;
        bool ret;
        if ( ret = _CodedInputStream->ReadVarint32(&size) )
        {
            CodedInputStream::Limit msgLimit = _CodedInputStream->PushLimit(size);
            if ( ret = msg.ParseFromCodedStream(_CodedInputStream) )
            {
                _CodedInputStream->PopLimit(msgLimit);
                std::cout << mFeed << " FASReader ReadNext: " << msg.DebugString() << std::endl;
            }
        }
        return ret;
    }

    ~FASReader()
    {
        delete _CodedInputStream;
        delete _IstreamInputStream;
        mFs.close();
    }
};
I ran into the same issue in both C++ and Python.
For the C++ version, I used a mix of the code Kenton Varda posted on this thread and the code from the pull request he sent to the protobuf team (because the version posted here doesn't handle EOF while the one he sent to github does).
#include <google/protobuf/message_lite.h>
#include <google/protobuf/io/zero_copy_stream.h>
#include <google/protobuf/io/coded_stream.h>

bool writeDelimitedTo(const google::protobuf::MessageLite& message,
                      google::protobuf::io::ZeroCopyOutputStream* rawOutput)
{
    // We create a new coded stream for each message.  Don't worry, this is fast.
    google::protobuf::io::CodedOutputStream output(rawOutput);

    // Write the size.
    const int size = message.ByteSize();
    output.WriteVarint32(size);

    uint8_t* buffer = output.GetDirectBufferForNBytesAndAdvance(size);
    if (buffer != NULL)
    {
        // Optimization:  The message fits in one buffer, so use the faster
        // direct-to-array serialization path.
        message.SerializeWithCachedSizesToArray(buffer);
    }
    else
    {
        // Slightly-slower path when the message is multiple buffers.
        message.SerializeWithCachedSizes(&output);
        if (output.HadError())
            return false;
    }

    return true;
}

bool readDelimitedFrom(google::protobuf::io::ZeroCopyInputStream* rawInput, google::protobuf::MessageLite* message, bool* clean_eof)
{
    // We create a new coded stream for each message.  Don't worry, this is fast,
    // and it makes sure the 64MB total size limit is imposed per-message rather
    // than on the whole stream.  (See the CodedInputStream interface for more
    // info on this limit.)
    google::protobuf::io::CodedInputStream input(rawInput);
    const int start = input.CurrentPosition();
    if (clean_eof)
        *clean_eof = false;

    // Read the size.
    uint32_t size;
    if (!input.ReadVarint32(&size))
    {
        if (clean_eof)
            *clean_eof = input.CurrentPosition() == start;
        return false;
    }

    // Tell the stream not to read beyond that size.
    google::protobuf::io::CodedInputStream::Limit limit = input.PushLimit(size);

    // Parse the message.
    if (!message->MergeFromCodedStream(&input)) return false;
    if (!input.ConsumedEntireMessage()) return false;

    // Release the limit.
    input.PopLimit(limit);

    return true;
}
And here is my python2 implementation:
from google.protobuf.internal import encoder
from google.protobuf.internal import decoder

#I had to implement this because the tools in google.protobuf.internal.decoder
#read from a buffer, not from a file-like object
def readRawVarint32(stream):
    mask = 0x80 # (1 << 7)
    raw_varint32 = []
    while 1:
        b = stream.read(1)
        #eof
        if b == "":
            break
        raw_varint32.append(b)
        if not (ord(b) & mask):
            #we found a byte starting with a 0, which means it's the last byte of this varint
            break
    return raw_varint32

def writeDelimitedTo(message, stream):
    message_str = message.SerializeToString()
    delimiter = encoder._VarintBytes(len(message_str))
    stream.write(delimiter + message_str)

def readDelimitedFrom(MessageType, stream):
    raw_varint32 = readRawVarint32(stream)
    message = None

    if raw_varint32:
        size, _ = decoder._DecodeVarint32(raw_varint32, 0)

        data = stream.read(size)
        if len(data) < size:
            raise Exception("Unexpected end of file")

        message = MessageType()
        message.ParseFromString(data)

    return message

#In place version that takes an already built protobuf object
#In my tests, this is around 20% faster than the other version
#of readDelimitedFrom()
def readDelimitedFrom_inplace(message, stream):
    raw_varint32 = readRawVarint32(stream)

    if raw_varint32:
        size, _ = decoder._DecodeVarint32(raw_varint32, 0)

        data = stream.read(size)
        if len(data) < size:
            raise Exception("Unexpected end of file")

        message.ParseFromString(data)

        return message
    else:
        return None
It might not be the best looking code and I'm sure it can be refactored a fair bit, but at least that should show you one way to do it.
Now the big problem: It's SLOW.
Even when using the C++ implementation of python-protobuf, it's one order of magnitude slower than in pure C++. I have a benchmark where I read 10M protobuf messages of ~30 bytes each from a file. It takes ~0.9s in C++, and 35s in python.
One way to make it a bit faster would be to re-implement the varint decoder to make it read from a file and decode in one go, instead of reading from a file and then decoding as this code currently does. (profiling shows that a significant amount of time is spent in the varint encoder/decoder). But needless to say that alone is not enough to close the gap between the python version and the C++ version.
Any idea to make it faster is very welcome :)
Just for completeness, I post here an up-to-date version that works with the master version of protobuf and Python3
For the C++ version it is sufficient to use the utils in delimited_message_util.h; here is an MWE:
#include <google/protobuf/io/zero_copy_stream_impl.h>
#include <google/protobuf/util/delimited_message_util.h>

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

template <typename T>
bool writeManyToFile(std::deque<T> messages, std::string filename) {
    int outfd = open(filename.c_str(), O_WRONLY | O_CREAT | O_TRUNC);
    google::protobuf::io::FileOutputStream fout(outfd);

    bool success;
    for (auto msg: messages) {
        success = google::protobuf::util::SerializeDelimitedToZeroCopyStream(
            msg, &fout);
        if (! success) {
            std::cout << "Writing Failed" << std::endl;
            break;
        }
    }

    fout.Close();
    close(outfd);
    return success;
}

template <typename T>
std::deque<T> readManyFromFile(std::string filename) {
    int infd = open(filename.c_str(), O_RDONLY);
    google::protobuf::io::FileInputStream fin(infd);

    bool keep = true;
    bool clean_eof = true;
    std::deque<T> out;

    while (keep) {
        T msg;
        keep = google::protobuf::util::ParseDelimitedFromZeroCopyStream(
            &msg, &fin, nullptr);
        if (keep)
            out.push_back(msg);
    }

    fin.Close();
    close(infd);
    return out;
}
For the Python3 version, building on @fireboot's answer, the only thing that needed modification is the decoding of raw_varint32:
def getSize(raw_varint32):
    result = 0
    shift = 0
    b = six.indexbytes(raw_varint32, 0)
    result |= ((ord(b) & 0x7f) << shift)
    return result

def readDelimitedFrom(MessageType, stream):
    raw_varint32 = readRawVarint32(stream)
    message = None

    if raw_varint32:
        size = getSize(raw_varint32)

        data = stream.read(size)
        if len(data) < size:
            raise Exception("Unexpected end of file")

        message = MessageType()
        message.ParseFromString(data)

    return message
Was also looking for a solution for this. Here's the core of our solution, assuming some java code wrote many MyRecord messages with writeDelimitedTo into a file. Open the file and loop, doing:
if(someCodedInputStream->ReadVarint32(&bytes)) {
    CodedInputStream::Limit msgLimit = someCodedInputStream->PushLimit(bytes);
    if(myRecord->ParseFromCodedStream(someCodedInputStream)) {
        //do your stuff with the parsed MyRecord instance
    } else {
        //handle parse error
    }
    someCodedInputStream->PopLimit(msgLimit);
} else {
    //maybe end of file
}
Hope it helps.
Working with an Objective-C version of protocol-buffers, I ran into this exact issue. On sending from the iOS client to a Java-based server that uses parseDelimitedFrom, which expects the length as the first byte, I needed to call writeRawByte on the CodedOutputStream first. Posting here to hopefully help others that run into this issue. While working through this issue, one would think that Google proto-bufs would come with a simple flag which does this for you...
Request* request = [rBuild build];
[self sendMessage:request];
}

- (void) sendMessage:(Request *) request {
    //** get length
    NSData* n = [request data];
    uint8_t len = [n length];

    PBCodedOutputStream* os = [PBCodedOutputStream streamWithOutputStream:outputStream];

    //** prepend it to message, such that Request.parseDelimitedFrom(in) can parse it properly
    [os writeRawByte:len];
    [request writeToCodedOutputStream:os];
    [os flush];
}
Since I'm not allowed to write this as a comment to Kenton Varda's answer above: I believe there is a bug in the code he posted (as well as in other answers which have been provided). The following code:
...
google::protobuf::io::CodedInputStream input(rawInput);

// Read the size.
uint32_t size;
if (!input.ReadVarint32(&size)) return false;

// Tell the stream not to read beyond that size.
google::protobuf::io::CodedInputStream::Limit limit =
    input.PushLimit(size);
...
sets an incorrect limit because it does not take into account the size of the varint32 which has already been read from input. This can result in data loss/corruption as additional bytes are read from the stream which may be part of the next message. The usual way of handling this correctly is to delete the CodedInputStream used to read the size and create a new one for reading the payload:
...
uint32_t size;
{
    google::protobuf::io::CodedInputStream input(rawInput);

    // Read the size.
    if (!input.ReadVarint32(&size)) return false;
}

google::protobuf::io::CodedInputStream input(rawInput);

// Tell the stream not to read beyond that size.
google::protobuf::io::CodedInputStream::Limit limit =
    input.PushLimit(size);
...
You can use getline for reading a string from a stream, using the specified delimiter:
istream& getline ( istream& is, string& str, char delim );
(defined in the <string> header)
