Hazelcast: right way to deploy on cluster - java

What is the right way to deploy Hazelcast on a cluster with one REST server and five worker machines? Should I start five Hazelcast server instances (one on each worker) and one HazelcastClient on the REST server?
I have
One REST server machine, which handles all user requests;
Five worker machines in a cluster; each machine keeps some data in its local file system. That data is definitely too big to keep in RAM, so I need Hazelcast only to distribute my search query across the cluster.
I want
On a user request, search through the data on each of the five worker machines and return the result to the user. The request is accepted by the REST server machine, which then sends a search MultiTask to each worker in the cluster. Something like:
public MySearchResult handleUserSearchRequest(String query) {
    MultiTask<String> task = new MultiTask<String>(query, Hazelcast.getCluster().getMembers());
    ExecutorService executorService = Hazelcast.getExecutorService();
    executorService.execute(task);
    Collection<String> results = task.get();
    return results.stream().reduce(/*some logic*/);
}
P.S.
How to launch all 6 Hazelcast instances from a single place (a Spring Boot application)?

You can simply have a script that runs your main class containing the node startup code as many times as needed.
Given your use case, here is sample code for creating a cluster and submitting a task to all the nodes from a driver class, in your case the REST server.
Run the class below five times to create a cluster of five nodes using the TCP/IP configuration.
public class WorkerNode {

    public static void main(String[] args) {
        /*
         * Create a new Hazelcast node.
         * The configuration is read from hazelcast.xml on the classpath, or the default one from the jar.
         */
        HazelcastInstance workerNode = Hazelcast.newHazelcastInstance();
        System.out.println("*********** Started a WorkerNode ***********");
    }
}
Here is the NodeTask containing your business logic to do the IO operations.
public class NodeTask implements Callable<Object>, HazelcastInstanceAware, Serializable {

    private transient HazelcastInstance hazelcastInstance;

    @Override
    public void setHazelcastInstance(HazelcastInstance hazelcastInstance) {
        this.hazelcastInstance = hazelcastInstance;
    }

    @Override
    public Object call() throws Exception {
        Object returnableObject = "testData";
        // Do all the IO operations here and set the returnable object
        System.out.println("Running the NodeTask on a Hazelcast node: " + hazelcastInstance.getName());
        return returnableObject;
    }
}
Here is the driver class from your REST client:
public class Driver {

    public static void main(String[] args) throws Exception {
        HazelcastInstance client = HazelcastClient.newHazelcastClient();
        IExecutorService executor = client.getExecutorService("executor");
        Map<Member, Future<Object>> result = executor.submitToAllMembers(new NodeTask());
        for (Future<Object> future : result.values()) {
            /*
             * Aggregation logic goes here.
             */
            System.out.println("Returned data from node: " + future.get());
        }
        client.shutdown();
        System.exit(0);
    }
}
Sample hazelcast.xml configuration:
<hazelcast xsi:schemaLocation="http://www.hazelcast.com/schema/config http://www.hazelcast.com/schema/config/hazelcast-config-3.8.xsd"
           xmlns="http://www.hazelcast.com/schema/config"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <network>
        <port auto-increment="true" port-count="100">5701</port>
        <join>
            <multicast enabled="false">
                <multicast-group>224.2.2.3</multicast-group>
                <multicast-port>54327</multicast-port>
            </multicast>
            <tcp-ip enabled="true">
                <!-- Replace this with the IP addresses of the servers -->
                <interface>127.0.0.1</interface>
            </tcp-ip>
            <aws enabled="false"/>
        </join>
        <interfaces enabled="false">
            <interface>127.0.0.1</interface>
        </interfaces>
    </network>
</hazelcast>
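Regarding the P.S. about launching everything from a single Spring Boot application: since the members live on different machines, each machine still runs its own JVM, but the startup code can live in one Spring Boot application and the role can be selected per machine. A minimal sketch, assuming Spring profiles named "worker" and "rest" (the profile names and class name are illustrative, not from the question):

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Profile;

// Sketch only: each worker machine runs this app with the "worker" profile to start an
// embedded member; the REST server runs it with the "rest" profile to get a client.
@SpringBootApplication
public class HazelcastNodeApplication {

    public static void main(String[] args) {
        SpringApplication.run(HazelcastNodeApplication.class, args);
    }

    // On each of the five worker machines: start an embedded Hazelcast member
    // (configuration is picked up from hazelcast.xml on the classpath).
    @Bean
    @Profile("worker")
    public HazelcastInstance hazelcastMember() {
        return Hazelcast.newHazelcastInstance();
    }

    // On the REST server: connect to the worker cluster as a client.
    @Bean
    @Profile("rest")
    public HazelcastInstance hazelcastClient() {
        return HazelcastClient.newHazelcastClient();
    }
}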

Related

Reading Kafka messages with Apache Storm in a Java Spring application causing NotSerializableException, why?

I'm new to Apache Storm and trying to get my feet wet.
Right now I simply want to log or print incoming Kafka messages which are received as byte arrays of ProtoBuf objects.
I need to do this within a Java Spring application.
I'm using Kafka 0.11.0.2
I'm using Storm 1.1.2 and have storm-core, storm-kafka, and storm-starters in my pom.
Main service class example
// annotations for spring
public class MyService {

    public static void main(String[] args) {
        SpringApplication.run(MyService.class, args);
    }

    @PostConstruct
    public void postConstruct() throws Exception {
        SpoutConfig spoutConfig = new SpoutConfig(new ZkHosts("localhost:9092"), "topic", "/topic", "storm-spout");
        KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("storm-spout", kafkaSpout);
        builder.setBolt("printer", new PrinterBolt())
               .shuffleGrouping("storm-spout");
        Config config = new Config();
        config.setDebug(true);
        config.setMaxTaskParallelism(3);
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("kafka", config, builder.createTopology());
        Thread.sleep(30000);
        cluster.shutdown();
    }

    private class PrinterBolt extends BaseBasicBolt {

        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            System.out.println("\n\n INPUT: " + input.toString() + "\n\n");
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {}
    }
}
I build a Docker image from this with a Dockerfile that I know works in my environment for other Spring apps, but when I run it in a container it throws an exception and hangs.
The exception is java.io.NotSerializableException,
and I see Caused by: java.lang.IllegalStateException: Bolt 'printer' contains a non-serializable field of type my.package.MyService$$EnhancerBySpringCGLIB$$696afb49, which was instantiated prior to topology creation. my.package.MyService$$EnhancerBySpringCGLIB$$696afb49 should be instantiated within the prepare method of 'printer' at the earliest.
I figure maybe it's because Storm is trying and failing to serialize the incoming byte array, but I'm not sure how to remedy that, and I haven't seen many people trying to do this.
I was using this as a reference. https://github.com/thehydroimpulse/storm-kafka-starter/blob/master/src/jvm/storm/starter/KafkaTopology.java
Either declare PrinterBolt in a new file, or make the class static. The problem you're running into is that PrinterBolt is a non-static inner class of MyService, which means it contains a reference to the outer MyService class. Since MyService isn't serializable, PrinterBolt isn't either. Storm requires bolts to be serializable.
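For illustration, a sketch of the static nested class variant, using the same bolt code as in the question (declaring PrinterBolt in its own top-level file works the same way):

// static: no implicit reference to the enclosing MyService bean, so Storm can serialize the bolt
public static class PrinterBolt extends BaseBasicBolt {

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        System.out.println("\n\n INPUT: " + input.toString() + "\n\n");
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // this bolt emits nothing
    }
}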
Also unrelated to the error you're seeing, you might want to consider using storm-kafka-client over storm-kafka, since the latter is deprecated.
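If you do switch, the newer spout is wired up roughly like this (a sketch against the storm-kafka-client 1.x API, not verified against your exact versions; the broker address and topic name are placeholders):

// Sketch only: build the spout from storm-kafka-client instead of storm-kafka.
// The consumer group id, deserializers, and offset strategy can also be set on the builder.
KafkaSpoutConfig<String, String> spoutConfig =
        KafkaSpoutConfig.builder("localhost:9092", "topic").build();

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("storm-spout", new KafkaSpout<>(spoutConfig));
builder.setBolt("printer", new PrinterBolt()).shuffleGrouping("storm-spout");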

How to change the port used by tcp-inbound-gateway on the fly

Is there a way to change the port used by a tcp-inbound-gateway on the fly? I'd like to set the port and timeout used by the tcp-inbound-gateway based on configuration persisted in the database, and be able to change them on the fly without restarting the application. In order to do so I decided to use the "publish-subscribe" pattern and extended the TcpInboundGateway class:
public class RuntimeInboundGateway extends TcpInboundGateway implements SettingsSubscriber {

    @Autowired
    private Settings settings;

    @PostConstruct
    public void subscribe() {
        settings.subscribe(this);
    }

    @Override
    public void onSettingsChanged(Settings settings) {
        this.stop();
        AbstractByteArraySerializer serializer = new ByteArrayLfSerializer();
        TcpNetServerConnectionFactory connectionFactory = new TcpNetServerConnectionFactory(settings.getPort());
        connectionFactory.setSerializer(serializer);
        connectionFactory.afterPropertiesSet();
        this.setConnectionFactory(connectionFactory);
        this.afterPropertiesSet();
        this.start();
    }
}
The settings object is a singleton bean, and when it is changed the tcp-inbound-gateway does indeed start listening on the new port, but it looks like it doesn't send inbound messages further along the flow. Here is an excerpt from the XML configuration:
<int-ip:tcp-connection-factory id="connFactory" type="server" port="${port}"
                               serializer="serializer"
                               deserializer="serializer"/>

<bean id="serializer" class="org.springframework.integration.ip.tcp.serializer.ByteArrayLfSerializer"/>

<bean id="inboundGateway" class="com.example.RuntimeInboundGateway">
    <property name="connectionFactory" ref="connFactory"/>
    <property name="requestChannel" ref="requestChannel"/>
    <property name="replyChannel" ref="responseChannel"/>
    <property name="errorChannel" ref="exceptionChannel"/>
    <property name="autoStartup" value="true"/>
</bean>
There is a logging-channel-adapter in the configuration which logs any requests to the service without issues until the settings are changed. After that it doesn't, and I see no messages received, even though I'm able to connect to the new port with telnet localhost <NEW_PORT>. Could somebody take a look and say how the desired behaviour can be achieved?
A quick look at your code indicated it should work ok, so I just wrote a quick Spring Boot app and it worked fine for me...
@SpringBootApplication
public class So40084223Application {

    public static void main(String[] args) throws Exception {
        ConfigurableApplicationContext ctx = SpringApplication.run(So40084223Application.class, args);
        Socket socket = SocketFactory.getDefault().createSocket("localhost", 1234);
        socket.getOutputStream().write("foo\r\n".getBytes());
        socket.close();
        QueueChannel queue = ctx.getBean("queue", QueueChannel.class);
        System.out.println(queue.receive(10000));
        ctx.getBean(MyInboundGateway.class).recycle(1235);
        socket = SocketFactory.getDefault().createSocket("localhost", 1235);
        socket.getOutputStream().write("fooo\r\n".getBytes());
        socket.close();
        System.out.println(queue.receive(10000));
        ctx.close();
    }

    @Bean
    public TcpNetServerConnectionFactory cf() {
        return new TcpNetServerConnectionFactory(1234);
    }

    @Bean
    public MyInboundGateway gate(TcpNetServerConnectionFactory cf) {
        MyInboundGateway gate = new MyInboundGateway();
        gate.setConnectionFactory(cf);
        gate.setRequestChannel(queue());
        return gate;
    }

    @Bean
    public QueueChannel queue() {
        return new QueueChannel();
    }

    public static class MyInboundGateway extends TcpInboundGateway implements ApplicationEventPublisherAware {

        private ApplicationEventPublisher applicationEventPublisher;

        @Override
        public void setApplicationEventPublisher(ApplicationEventPublisher applicationEventPublisher) {
            this.applicationEventPublisher = applicationEventPublisher;
        }

        public void recycle(int port) {
            stop();
            TcpNetServerConnectionFactory sf = new TcpNetServerConnectionFactory(port);
            sf.setApplicationEventPublisher(this.applicationEventPublisher);
            sf.afterPropertiesSet();
            setConnectionFactory(sf);
            afterPropertiesSet();
            start();
        }
    }
}
I would turn on DEBUG logging to see if it gives you any clues.
You might also want to explore using the new DSL dynamic flow registration instead. The tcp-dynamic-client sample shows how to use that technique to add/remove flow snippets on the fly. It's on the client side, but similar techniques can be used on the server side to register/unregister your gateway and connection factory.
The cause of the trouble was mine. Since the deserializer is not specified in the code above, the default one is used, and it couldn't demarcate inbound messages from the input byte stream. Just one line, connectionFactory.setDeserializer(serializer);, solved the issue I spent a day on.
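For completeness, the corrected onSettingsChanged() from the class above looks like this (the same code as in the question, plus the one added line):

@Override
public void onSettingsChanged(Settings settings) {
    this.stop();
    AbstractByteArraySerializer serializer = new ByteArrayLfSerializer();
    TcpNetServerConnectionFactory connectionFactory = new TcpNetServerConnectionFactory(settings.getPort());
    connectionFactory.setSerializer(serializer);
    connectionFactory.setDeserializer(serializer); // the missing line: without it the default
                                                   // deserializer cannot demarcate inbound messages
    connectionFactory.afterPropertiesSet();
    this.setConnectionFactory(connectionFactory);
    this.afterPropertiesSet();
    this.start();
}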

When should JMS connection be started? In its own Thread?

I have a Java Swing GUI client that communicates with a WildFly server.
standalone-full.xml
<jms-queue name="goReceiveFmSvrQueue">
<entry name="java:/jboss/exported/jms/goReceiveFmSvrQueue"/>
<durable>true</durable>
</jms-queue>
<jms-queue name="goSendToSvrQueue">
<entry name="java:jboss/exported/jms/goSendToSvrQueue"/>
<durable>true</durable>
</jms-queue>
My client has a Runnable MsgCenterSend class. It instantiates MsgCenterSend, then calls msgCenter.run() to start a connection, uses msgCenter.send() to send a message, and msgCenter.stop() to close it when the client closes.
Does that make sense?
Or should the client just create a connection, session, destination, and producer every time it needs to send a message? And if it does that, should it be done in a separate Thread?
public class MsgCenterSend implements Runnable {

    private Connection connection = null;
    private MessageProducer msgProducer = null;
    private Session session = null;

    public void run() {
        try {
            Context ctx = new InitialContext(/* connection properties */);
            HornetQJMSConnectionFactory jmsConnectionFactory =
                    (HornetQJMSConnectionFactory) ctx.lookup("jms/RemoteConnectionFactory");
            this.connection = jmsConnectionFactory.createConnection("jmsuser", "jmsuser#123");
            this.session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Destination sendToDestination = (Destination) ctx.lookup("jms/goSendToSvrQueue");
            this.msgProducer = this.session.createProducer(sendToDestination);
            this.connection.start();
        } catch (NamingException | JMSException e) {
            throw new RuntimeException("Could not start JMS connection", e);
        }
    }

    public boolean sendMsg(/* parameters */) throws JMSException {
        ObjectMessage message = this.session.createObjectMessage();
        // set MessageObject and Properties
        this.msgProducer.send(message);
        return true;
    }

    public void stop() throws JMSException {
        this.connection.stop();
    }
}
The client uses stop() on exit.
For now my MessageBean looks like:
@MessageDriven(
    activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
        @ActivationConfigProperty(propertyName = "maxSession", propertyValue = "1"),
        @ActivationConfigProperty(propertyName = "destination", propertyValue = "jms/goSendToSvrQueue")
    })
public class GoMsgBean implements MessageListener {

    @ApplicationScoped
    @Inject
    JMSContext jmsCtx;

    // This is the queue the client listens to. The server sends replies to it.
    @Resource(name = "java:jboss/exported/jms/goReceiveFmSvrQueue")
    private Queue svrSendQueue;

    public GoMsgBean() {
    }

    @PostConstruct
    public void myInit() {
        System.out.println("XXXXXXXXXX Post Construct - GoMsgBean XXXXXXXXXX");
    }

    @PreDestroy
    public void myDestroy() {
        System.out.println("XXXXXXXXXX Post Destroy - logger XXXXXXXXXX");
    }

    public void onMessage(Message msg) {
        System.out.println("XXXXXXXXXX MessageBean received a Message XXXXXXXXX");
    }
}
Even if sending is infrequent, I don't see a problem keeping the connection open, unless you have serious resource constraints; messaging protocols are usually lightweight enough to just keep the connection open and not worry about connect/disconnect/reconnect. ActiveMQ's documentation says exactly that, and though I can't find the exact per-connection memory overhead, it's not a lot. There's also server-side configuration that can help manage large volumes of messages, but again, I wouldn't worry about it.
One disadvantage of ActiveMQ is that it doesn't support true clustering, so if you're really dealing with tens or hundreds of thousands of connections, then you're going to have problems.
And in the end, you'll need to do performance analysis on your end to make sure the application behaves well with the server.
If your application sends messages frequently to the same destination, it is a best practice to create the connection, session, and producer once and re-use them, because creating connections, sessions, etc. are costly operations.
If messages are not sent frequently, then it's better to create the required objects, send the message, and close the objects. That way resources are freed up on the messaging provider.
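As an illustration of the second approach, a minimal sketch using the JMS 2.0 JMSContext API (which the MDB above already injects); the connection factory and queue are assumed to be injected or looked up, and this is not the asker's code:

// Hypothetical helper: create, send, and close per call. JMSContext is AutoCloseable,
// so try-with-resources closes the session and connection once the send completes.
public void sendOnce(ConnectionFactory connectionFactory, Queue queue, Serializable payload) {
    try (JMSContext context = connectionFactory.createContext("jmsuser", "jmsuser#123")) {
        context.createProducer().send(queue, payload);
    }
}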

Why are all my Kafka messages being replayed in Storm?

I'm trying to figure out why all my Kafka messages are getting replayed every time I restart my Storm topology.
My understanding of how it should work was that once the last bolt has ack'ed the tuple, the spout should commit the message on Kafka, and hence I should not see it replayed after a restart.
My code is a simple KafkaSpout and a bolt which just prints every message and then acks it.
private static KafkaSpout buildKafkaSpout(String topicName) {
    ZkHosts zkHosts = new ZkHosts("localhost:2181");
    SpoutConfig spoutConfig = new SpoutConfig(zkHosts,
            topicName,
            "/" + topicName,
            "mykafkaspout"); /* was: UUID.randomUUID().toString() */
    spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
    return new KafkaSpout(spoutConfig);
}

public static class PrintBolt extends BaseRichBolt {

    OutputCollector _collector;
    public static Logger LOG = LoggerFactory.getLogger(PrintBolt.class);

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        LOG.error("PrintBolt.0: {}", tuple.getString(0));
        _collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("nothing"));
    }
}

public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("kafka", buildKafkaSpout("mytopic"), 1);
    builder.setBolt("print1", new PrintBolt(), 1).shuffleGrouping("kafka");
}
I have not provided any config settings other than those in the code.
Am I missing a config setting, or what am I doing wrong?
UPDATE:
To clarify, everything works fine until I restart the pipeline. What I expected from the KafkaSpout is the behavior I get with other (non-Storm) consumers: messages that were ack'ed before a shutdown are not redelivered, and consumption resumes from where it left off.
However, the actual behavior I'm getting with the default settings is the following: the messages are processed fine up until I stop the pipeline, and then when I restart it I get a replay of all the messages, including those (A and B) which I believed I had already ack'ed.
As per the configuration options mentioned by Matthias, I can change startOffsetTime to Latest; however, that is literally the latest offset, so the pipeline drops the messages (message "C") that were produced while the pipeline was restarting.
I have a consumer written in Node.js (using npm kafka-node) which is able to ack messages to Kafka, and when I restart the Node.js consumer it does exactly what I expected (it catches up on message "C", which was produced while the consumer was down, and continues from there) -- so how do I get the same behavior with the KafkaSpout?
The problem was in the submit code -- the template code for submitting the topology creates an instance of LocalCluster if the storm jar is run without a topology name, and the local cluster does not keep the spout's state across runs, hence the replay.
So
$ storm jar myjar.jar storm.myorg.MyTopology topologyname
will launch it on my single-node development cluster, whereas
$ storm jar myjar.jar storm.myorg.MyTopology
will launch it on an instance of LocalCluster.
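For reference, the usual submit template looks roughly like this (a sketch, not the asker's actual submit code): with a topology name on the command line it goes through StormSubmitter to the real cluster, where the spout's offsets survive restarts; without one it falls back to a LocalCluster, whose in-process state is lost on shutdown, which is what caused the replay.

public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("kafka", buildKafkaSpout("mytopic"), 1);
    builder.setBolt("print1", new PrintBolt(), 1).shuffleGrouping("kafka");

    Config config = new Config();
    if (args != null && args.length > 0) {
        // A name was given: submit to the real cluster; offsets survive restarts.
        StormSubmitter.submitTopology(args[0], config, builder.createTopology());
    } else {
        // No name: run in an in-process LocalCluster; its state is lost on shutdown.
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("local-test", config, builder.createTopology());
    }
}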

How to run spring batch jobs simultaneously which share same readers and writers instances?

This is how my existing system works.
I have a batch written using Spring Batch which writes messages to queues ASYNCHRONOUSLY. Once the writer has sent a certain number of messages to the queue, it starts listening on a LinkedBlockingQueue for the same number of replies.
I have Spring AMQP listeners which consume the messages and process them. Once a message is processed, the consumer replies on a reply queue. There are listeners which listen to the reply queue to check whether messages were processed successfully or not. The reply listener retrieves the response and adds it to the LinkedBlockingQueue, which is then fetched by the writer. Once the writer has fetched all the responses, it finishes the batch. If there is an exception, it stops the batch.
These are my job configurations:
<beans:bean id="computeListener" class="com.st.symfony.Foundation"
p:symfony-ref="symfony" p:replyTimeout="${compute.reply.timeout}" />
<rabbit:queue name="${compute.queue}" />
<rabbit:queue name="${compute.reply.queue}" />
<rabbit:direct-exchange name="${compute.exchange}">
<rabbit:bindings>
<rabbit:binding queue="${compute.queue}" key="${compute.routing.key}" />
</rabbit:bindings>
</rabbit:direct-exchange>
<rabbit:listener-container
connection-factory="rabbitConnectionFactory" concurrency="${compute.listener.concurrency}"
requeue-rejected="false" prefetch="1">
<rabbit:listener queues="${compute.queue}" ref="computeListener"
method="run" />
</rabbit:listener-container>
<beans:beans profile="master">
<beans:bean id="computeLbq" class="java.util.concurrent.LinkedBlockingQueue" />
<beans:bean id="computeReplyHandler" p:blockingQueue-ref="computeLbq"
class="com.st.batch.foundation.ReplyHandler" />
<rabbit:listener-container
connection-factory="rabbitConnectionFactory" concurrency="1"
requeue-rejected="false">
<rabbit:listener queues="${compute.reply.queue}" ref="computeReplyHandler"
method="onMessage" />
</rabbit:listener-container>
<beans:bean id="computeItemWriter"
class="com.st.batch.foundation.AmqpAsynchItemWriter"
p:template-ref="amqpTemplate" p:queue="${compute.queue}"
p:replyQueue="${compute.reply.queue}" p:exchange="${compute.exchange}"
p:replyTimeout="${compute.reply.timeout}" p:routingKey="${compute.routing.key}"
p:blockingQueue-ref="computeLbq"
p:logFilePath="${spring.tmp.batch.dir}/#{jobParameters[batch_id]}/log.txt"
p:admin-ref="rabbitmqAdmin" scope="step" />
<job id="computeJob" restartable="true">
<step id="computeStep">
<tasklet transaction-manager="transactionManager">
<chunk reader="computeFileItemReader" processor="computeItemProcessor"
writer="computeItemWriter" commit-interval="${compute.commit.interval}" />
</tasklet>
</step>
</job>
</beans:beans>
This is my writer code,
public class AmqpAsynchRpcItemWriter<T> implements ItemWriter<T> {

    protected String exchange;
    protected String routingKey;
    protected String queue;
    protected String replyQueue;
    protected RabbitTemplate template;
    protected AmqpAdmin admin;
    BlockingQueue<Object> blockingQueue;
    String logFilePath;
    long replyTimeout;

    // Getters and Setters

    @Override
    public void write(List<? extends T> items) throws Exception {
        for (T item : items) {
            Message message = MessageBuilder
                    .withBody(item.toString().getBytes())
                    .setContentType(MessageProperties.CONTENT_TYPE_TEXT_PLAIN)
                    .setReplyTo(this.replyQueue)
                    .setCorrelationId(item.toString().getBytes()).build();
            template.send(this.exchange, this.routingKey, message);
        }

        for (T item : items) {
            Object msg = blockingQueue.poll(this.replyTimeout, TimeUnit.MILLISECONDS);
            if (msg instanceof Exception) {
                admin.purgeQueue(this.queue, true);
                throw (Exception) msg;
            } else if (msg == null) {
                throw new Exception("reply timeout...");
            }
        }

        System.out.println("All items are processed.. Command completed. ");
    }
}
Listener pojo
public class Foundation {

    Symfony symfony;
    long replyTimeout;

    // Getters and Setters

    public Object run(String command) {
        System.out.println("Running:" + command);
        try {
            symfony.run(command, this.replyTimeout);
        } catch (Exception e) {
            return e;
        }
        return "Completed : " + command;
    }
}
This is the reply handler:
public class ReplyHandler {

    BlockingQueue<Object> blockingQueue;

    public void onMessage(Object msgContent) {
        try {
            blockingQueue.put(msgContent);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
Now, the problem is, I want to run multiple batches with unique batch ids simultaneously, each processing different data (of the same type).
As the number of batches is going to increase in the future, I don't want to keep adding separate request and reply queues for each batch.
Also, to process messages simultaneously, I have multiple listeners (set with listener concurrency) listening to the queue. If I add a different queue for each batch, the number of running listeners will increase, which may overload the servers (CPU/memory usage goes up).
So I don't want to replicate the same infrastructure for each type of batch I am going to add. I want to use the same infrastructure, but the writer of a specific batch should get only its own responses, not the responses of other batches running simultaneously.
Can we use the same item writer instances, which use the same blocking queue instances, for multiple batch instances running in parallel?
You may want to look into JMS Message Selectors.
As per the docs:
The createConsumer and createDurableSubscriber methods allow you to specify a message selector as an argument when you create a message consumer.
The message consumer then receives only messages whose headers and properties match the selector.
There is no equivalent of a JMS message selector expression in the AMQP (RabbitMQ) world.
Each consumer has to have his own queue and you use an exchange to route to the appropriate queue, using a routing key set by the sender.
It is not as burdensome as you might think; you don't have to statically configure the broker; the consumers can use a RabbitAdmin to declare/delete exchanges, queues, bindings on demand.
See Configuring the Broker in the Spring AMQP documentation.
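A rough sketch of that approach (queue naming and method names here are illustrative, not from the question or the Spring AMQP docs): declare a per-batch reply queue and binding when a batch run starts, and delete it when the run finishes, so only the shared request queue stays statically configured.

// Hypothetical per-batch setup; the "compute.reply." prefix and batchId parameter are made up.
public String declareReplyQueueForBatch(RabbitAdmin admin, DirectExchange exchange, String batchId) {
    String replyQueueName = "compute.reply." + batchId;
    Queue replyQueue = new Queue(replyQueueName);  // durable by default
    admin.declareQueue(replyQueue);
    // Route replies for this batch id to its own queue via the existing direct exchange.
    admin.declareBinding(BindingBuilder.bind(replyQueue).to(exchange).with(replyQueueName));
    return replyQueueName;
}

// When the batch run completes, drop its reply queue again.
public void removeReplyQueueForBatch(RabbitAdmin admin, String replyQueueName) {
    admin.deleteQueue(replyQueueName);
}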
