How can I test Pivotal GemFire with more than 4,000 concurrent users inserting data into a GemFire region and the same number of concurrent users reading data from that region?
Reading from the region can happen after the insert operations or in parallel with them.
Can you please suggest a good approach?
The question is a bit ambiguous.
If you're looking at purely benchmarking GemFire, then the YCSB framework would be a good place to start, as it provides standardized tests across various IMDG and RDBMS products.
If you are looking for a tool for your own app, then I'd suggest looking at JMeter. You'll obviously need to write some custom code to do the puts and gets, but it gives you many other capabilities, such as scaling your test out and quantifying the results.
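As a rough sketch, a custom JMeter "Java Request" sampler for GemFire puts and gets could look like the code below. This assumes the Apache Geode client API (the open-source core of GemFire), a locator on localhost:10334, and a region named "testRegion" — all placeholders you'd adjust for your own setup:

```java
import org.apache.geode.cache.Region;
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;
import org.apache.geode.cache.client.ClientRegionShortcut;
import org.apache.jmeter.protocol.java.sampler.AbstractJavaSamplerClient;
import org.apache.jmeter.protocol.java.sampler.JavaSamplerContext;
import org.apache.jmeter.samplers.SampleResult;

// A minimal JMeter Java Request sampler that does one put and one get per iteration.
// Run it with 4000+ threads in a JMeter Thread Group to simulate your load.
public class GemFirePutGetSampler extends AbstractJavaSamplerClient {

    private ClientCache cache;
    private Region<String, String> region;

    @Override
    public void setupTest(JavaSamplerContext context) {
        // GemFire allows one ClientCache per JVM; create() returns the existing
        // instance when a compatible one was already created by another thread.
        cache = new ClientCacheFactory()
                .addPoolLocator(context.getParameter("locatorHost", "localhost"),
                        context.getIntParameter("locatorPort", 10334))
                .create();
        region = cache.<String, String>createClientRegionFactory(ClientRegionShortcut.PROXY)
                .create(context.getParameter("region", "testRegion"));
    }

    @Override
    public SampleResult runTest(JavaSamplerContext context) {
        SampleResult result = new SampleResult();
        result.sampleStart();
        try {
            String key = Thread.currentThread().getName() + "-" + System.nanoTime();
            region.put(key, "payload");   // write path
            region.get(key);              // read path (or read keys written earlier)
            result.setSuccessful(true);
        } catch (Exception e) {
            result.setSuccessful(false);
            result.setResponseMessage(e.getMessage());
        } finally {
            result.sampleEnd();
        }
        return result;
    }

    @Override
    public void teardownTest(JavaSamplerContext context) {
        // The cache is shared JVM-wide; the first close wins, later calls are no-ops.
        if (cache != null && !cache.isClosed()) cache.close();
    }
}
```

Package the class into a jar, drop it into JMeter's lib/ext directory, and JMeter's listeners will handle the throughput and latency reporting for you.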
If you're looking for suggestions on a GemFire architecture to support the scale of your test then you'll need to provide more details as to the functional and non-functional requirements of your application.
I would like to develop an application for ~500 active users (sessions at one time). The system would not perform any massive calculations; it would be a simple read/write-to-database solution. However, about 50 MB of data per user would be uploaded to the application daily. (It would be analysed and cleaned by another application every day, while no users are active.) I'm currently working on the design of this application and have a few questions about it.
Should I consider developing the application to run in a cluster with load balancing, or will one server handle this amount of usage?
If yes, are there any guidelines for developing an application to run in a cluster? Is it any different from developing a single-server application?
Should I be worried about the database for this application? What problems should I expect when two servers read/write data to a single database at the same time? Maybe it should also run in a cluster?
I would appreciate any help and/or articles about designing mid-size applications like this.
This depends on your NFRs (non-functional requirements). Besides load balancing, a cluster provides higher availability.
You'll have to make your back end stateless, so that requests from the same user can end up on another node without the user noticing. This makes scalable software more expensive to build, so consider your options carefully.
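To illustrate what "stateless" means in practice: instead of keeping session data in a server's memory, each node looks it up in a shared store on every request. A rough sketch using Redis via the Jedis client (the store choice, host name, and key scheme are just assumptions for illustration):

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

// Session data lives in a shared Redis instance, not in the web server's memory,
// so any node behind the load balancer can serve any request.
public class SharedSessionStore {
    private final JedisPool pool = new JedisPool("redis-host", 6379);

    public void save(String sessionToken, String userDataJson) {
        try (Jedis jedis = pool.getResource()) {
            // Expire the session after 30 minutes of inactivity.
            jedis.setex("session:" + sessionToken, 1800, userDataJson);
        }
    }

    public String load(String sessionToken) {
        try (Jedis jedis = pool.getResource()) {
            return jedis.get("session:" + sessionToken);
        }
    }
}
```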
Accessing a database from multiple servers is no different from accessing it from multiple threads.
To answer your first question, I think using an infrastructure provider that lets you easily scale your application (up or down) is always a big plus and can help you save money. My main experience with this kind of provider is with Amazon Web Services (AWS).
I don't know precisely what technology you are planning to use, but a general AWS setup that would make sense to me is:
A set of EC2 instances (= virtual servers) running behind an ELB (a load balancer)
An auto scaling group containing the EC2 instances. You can look it up, but an auto scaling group basically lets you automatically add and remove instances depending on various factors (server load, disk I/O, etc.)
The use of RDS for your database. It supports multiple DBMS such as MySQL and Oracle. It also provides you with nice features such as replication, automated backups and monitoring.
The use of CodeDeploy to deploy your application on the servers
(I'm deliberately using the AWS names so that you can read the documentation if you are interested.)
This would basically let you scale to a lot more than 500 concurrent users if needed, and could save you some money when you are handling fewer users. Note that auto scaling groups can also be scheduled. For instance: « I want at least 5 instances during the day (max 50), but you can go down to 2 (and still up to 50) between 1am and 4am »
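If you later want to manage that schedule from code rather than the console, it can be expressed with the AWS SDK for Java; a rough sketch (the group and action names are placeholders, and the recurrence fields are standard cron expressions in UTC):

```java
import com.amazonaws.services.autoscaling.AmazonAutoScaling;
import com.amazonaws.services.autoscaling.AmazonAutoScalingClientBuilder;
import com.amazonaws.services.autoscaling.model.PutScheduledUpdateGroupActionRequest;

public class ScheduledScaling {
    public static void main(String[] args) {
        AmazonAutoScaling autoScaling = AmazonAutoScalingClientBuilder.defaultClient();

        // Daytime: keep at least 5 instances (max 50), starting at 4am.
        autoScaling.putScheduledUpdateGroupAction(new PutScheduledUpdateGroupActionRequest()
                .withAutoScalingGroupName("my-app-asg")
                .withScheduledActionName("daytime-capacity")
                .withRecurrence("0 4 * * *")   // cron syntax, evaluated in UTC
                .withMinSize(5)
                .withMaxSize(50));

        // Night: allow scaling down to 2 instances between 1am and 4am.
        autoScaling.putScheduledUpdateGroupAction(new PutScheduledUpdateGroupActionRequest()
                .withAutoScalingGroupName("my-app-asg")
                .withScheduledActionName("night-capacity")
                .withRecurrence("0 1 * * *")
                .withMinSize(2)
                .withMaxSize(50));
    }
}
```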
The services I mentioned are quite widely documented, so you can look them up if you'd like some more specific details.
I won't discuss your two other questions in detail because I'm not an expert on the subject, but the database can indeed be a bottleneck, since it may involve a lot of I/O.
Hope this helps :)
I want to create a Java application for handling and analyzing live streaming logs. I also have to implement some complex filter functionality. I was doing research on finding the best-suited database for this.
I came across many portable databases like MongoDB, HBase, H2, and so on. Among them, MongoDB seems to be the better candidate. But for my requirements, insertion and selection may happen at the same time. Somewhere I read that MongoDB is not the best at handling concurrency.
I'm sure that, moving forward, the performance of the database is going to play a crucial role in the overall performance of the application.
I came across many Stack Overflow links regarding the same topic, but the thing is, all of them were asked 2 or more years ago.
Can MongoDB handle concurrency? Is there any other portable database that is better than MongoDB for this?
Please help.
Have you looked at a solution like, for instance, Elasticsearch coupled with Kibana and td-agent?
It provides asynchronous logging. I've used it to store and analyze 30 million events per day from several servers, but it depends on what you want to do in the end.
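As a taste of how simple ingestion can be, here's a rough sketch that indexes one log event, assuming the Elasticsearch low-level REST client; the index name and field names are made up for the example (in my setup td-agent did the shipping, so this only shows the shape of the data):

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class LogIndexer {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(
                new HttpHost("localhost", 9200, "http")).build()) {
            // One JSON document per log event; Kibana can then filter and
            // aggregate on these fields for the complex filtering you need.
            Request request = new Request("POST", "/logs/_doc");
            request.setJsonEntity(
                "{\"timestamp\":\"2016-05-01T12:00:00Z\"," +
                "\"level\":\"ERROR\"," +
                "\"host\":\"web-01\"," +
                "\"message\":\"connection timed out\"}");
            client.performRequest(request);
        }
    }
}
```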
Currently I am gathering information on what database service we should use.
I am still very new to web development, but we think we want a NoSQL database.
We are using Java with Play! 2.
We only need a database for user registration.
Now, I am already familiar with GAE ndb, which is a key-value store like DynamoDB. MongoDB is a document DB.
I am not sure what advantages each solution has.
I also know that DynamoDB runs on SSDs, while MongoDB memory-maps its data and relies on keeping the working set in RAM.
An advantage of MongoDB would be that Play! already "supports" it.
Now we don't expect too much database usage, but we would need to scale pretty fast if our app grows.
What alternatives do I have? What pros/cons do they have?
Considering:
Pricing
Scaling
Ease of use
Play! support?
(Disclosure: I'm a founder of MongoHQ, and would obviously prefer you choose us)
The biggest difference from a developer perspective is the querying capability. With DynamoDB, you need the exact key for a given document, or you need to build your keys in such a way that you can use them for range-based queries. In Mongo, you can query on the structure of the document, add secondary indexes, do aggregations, etc.
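To make that concrete, here's roughly what the two access patterns look like side by side; the table/collection and field names are invented for the example, and the APIs shown are the MongoDB Java driver and the AWS SDK for Java:

```java
// MongoDB: query by any field, add a secondary index as an afterthought.
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Indexes;
import org.bson.Document;
import static com.mongodb.client.model.Filters.eq;

// DynamoDB: you fetch by the exact key you designed the table around.
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.GetItemRequest;

import java.util.Collections;
import java.util.Map;

public class QueryComparison {
    public static void main(String[] args) {
        // Mongo: an ad-hoc query on a non-key field.
        MongoCollection<Document> users = MongoClients.create("mongodb://localhost")
                .getDatabase("app").getCollection("users");
        users.createIndex(Indexes.ascending("email"));   // secondary index
        Document byEmail = users.find(eq("email", "a@example.com")).first();

        // DynamoDB: a lookup by the exact primary key, nothing else.
        AmazonDynamoDB dynamo = AmazonDynamoDBClientBuilder.defaultClient();
        Map<String, AttributeValue> key =
                Collections.singletonMap("userId", new AttributeValue("user-42"));
        dynamo.getItem(new GetItemRequest().withTableName("users").withKey(key));
    }
}
```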
The advantage of doing it with k/v only is that it forces you to build your application in a way that DynamoDB can scale. The advantage of Mongo's flexible queries against your docs is that development goes much faster, even if you discount what the Play framework includes. It's always going to be quicker to do new development with something like Mongo, because you don't have to make your scaling decisions from the get-go.
Implementation-wise, both Mongo and DynamoDB can grow basically unbounded. Dynamo abstracts away most of the decisions on storage, RAM, and processor power. Mongo requires that you (or someone like us) make decisions about how much RAM to have, what kind of disks to use, how to manage bottlenecks, etc. The operational hurdles are different, but the end result is very similar. We run multiple Mongo DBs on top of very fast SSDs and it works phenomenally well.
Pricing is incredibly difficult to compare, unfortunately. DynamoDB pricing is based on a nominal per-GB fee, but you pay for data access. You need to be sure you understand how your costs are going to grow as your database gets more active. I'm not sure I can predict DynamoDB pricing effectively, but I know we've had customers who've been surprised (to say the least) at how expensive Dynamo ended up being for what they wanted to do.
Running Mongo is much more predictable cost-wise. You likely need 1 GB of RAM for every 10 GB of data; running a redundant setup doubles your price; etc. It's a much easier equation to wrap your head around, and you're not in for quite as nasty a shock if you have a huge amount of traffic one day.
By far the biggest advantage of Mongo (and MongoHQ) is this: you can leave your provider at any time. If you get irked at your Mongo provider, it's only a little painful to migrate away. If you get irked at Amazon, you're going to have to rewrite your app to work with an entirely different engine. This has huge implications for the support you should expect to receive: hosting Mongo is competitive enough that you get very good support from just about any Mongo-specific company you choose (or we'd die).
I addressed scaling a little bit above, but the simplest answer is this: if you define your data model well, either option will scale out just about as far as you can imagine you'd need to go. You are likely to not do this right with Mongo at first, though, since you'll probably be developing quickly. This means that once you can't scale vertically any more (by adding RAM, disk speed, etc to a single server) you will have to be careful about how you choose to shard. The biggest difference between Mongo and Dynamo scaling is when you choose to make your "how do I scale my data?" decisions, not overall scaling ability.
So I'd choose Mongo (duh!). I think you can build a fantastic app on top of DynamoDB, though.
As you said, MongoDB is one step ahead of the other options, because you can use the Morphia plugin to simplify DB interactions (you have JPA support as well). The Play framework provides a CRUD module (admin console) and a Secure module as well (for your overall login system), so I strongly suggest you have a look at them.
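For illustration, a Morphia-mapped entity and a save/query round trip look roughly like this, assuming the classic org.mongodb.morphia API (class and field names are made up):

```java
import com.mongodb.MongoClient;
import org.bson.types.ObjectId;
import org.mongodb.morphia.Datastore;
import org.mongodb.morphia.Morphia;
import org.mongodb.morphia.annotations.Entity;
import org.mongodb.morphia.annotations.Id;

@Entity("users")
class User {
    @Id ObjectId id;       // mapped to MongoDB's _id
    String email;
    String passwordHash;
}

public class MorphiaExample {
    public static void main(String[] args) {
        Morphia morphia = new Morphia();
        morphia.map(User.class);
        // The Datastore wraps the MongoClient and handles (de)serialization.
        Datastore ds = morphia.createDatastore(new MongoClient("localhost"), "app");

        User u = new User();
        u.email = "a@example.com";
        ds.save(u);

        // Query by field, no hand-written BSON needed.
        User found = ds.createQuery(User.class)
                .field("email").equal("a@example.com")
                .get();
    }
}
```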
I'm trying to figure out which Java RESTful framework is best suited for a heavily loaded identity manager server.
Has anyone run load tests on RESTful frameworks and is willing to share the conclusions?
Thanks a lot!
Great question! You'll probably find that the framework choice is not the primary determiner of performance and scalability. We've used Restlet, based on a very strong recommendation from a former colleague who used it to build Overstock.com (a very large e-commerce site). It has good performance, and it works fine for Overstock.com. But we didn't do any head-to-head comparisons.
One of the big drivers for REST is its scalability: the quality of a distributed system whereby you can accommodate an increase in usage with a proportional increase in system size and cost. Caching is a key technique for achieving scalability. So if you allow your representations to be cached, much of the load is actually not borne by the identity management system but by web caches downstream. This is independent of the REST framework.
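For instance, with JAX-RS (Restlet has an equivalent mechanism) you mark a representation as cacheable in one line, and downstream HTTP caches take the read load off your servers; the resource path, payload, and max-age value here are just placeholders:

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.core.CacheControl;
import javax.ws.rs.core.Response;

@Path("/users/{id}")
public class UserResource {

    @GET
    public Response getUser() {
        CacheControl cc = new CacheControl();
        cc.setMaxAge(300);  // let intermediaries cache this representation for 5 minutes

        // The Cache-Control header goes out with the response; proxies and
        // browser caches downstream can then answer repeat reads themselves.
        return Response.ok("{\"id\": 1, \"name\": \"example\"}")
                .cacheControl(cc)
                .build();
    }
}
```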
Your back-end database technology is likely another primary factor in system performance and scaling. Tuning the database system and optimizing queries may pay off here. Also consider whether adding a database cache layer makes sense (e.g., OpenSymphony).
We found that serialization costs were quite significant for us. Overall request rates were best when we used the Kryo or Smile binary serializations. If you need a textual serialization, we found that the Jackson JSON serializer was much faster than the XStream XML serializer, doubling the overall request rate. This might be an area to consider.
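If you want to compare serializers against your own payloads, a micro-test is easy to set up; here's a rough sketch using Jackson (JSON and Smile) and Kryo, with a made-up payload class standing in for your real DTOs:

```java
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Output;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.smile.SmileFactory;
import java.io.ByteArrayOutputStream;

public class SerializationBench {
    public static class Payload {            // stand-in for your real DTO
        public String name = "example";
        public int[] values = new int[100];
    }

    public static void main(String[] args) throws Exception {
        Payload p = new Payload();

        ObjectMapper json = new ObjectMapper();                    // textual JSON
        ObjectMapper smile = new ObjectMapper(new SmileFactory()); // binary JSON variant

        byte[] jsonBytes = json.writeValueAsBytes(p);
        byte[] smileBytes = smile.writeValueAsBytes(p);

        Kryo kryo = new Kryo();
        kryo.register(Payload.class);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (Output out = new Output(buf)) {
            kryo.writeObject(out, p);
        }

        System.out.printf("json=%d bytes, smile=%d bytes, kryo=%d bytes%n",
                jsonBytes.length, smileBytes.length, buf.size());
        // Wrap the calls above in a timing loop (or JMH) to measure throughput.
    }
}
```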
So if you haven't done so, examine your system from a scaling perspective. See http://www.highscalability.com, Richardson and Ruby's Restful Web Services (O'Reilly), Cal Henderson's Building Scalable Web Sites, and Theo Schlossnagle's Scalable Internet Architectures for a start.
Does anyone have experience with using Terracotta with Hibernate Search to satisfy application Queries?
If so:
1. What magnitude of "object updates" can it handle? (How's the performance?)
2. What kind of performance do the queries have?
3. Is it possible to use Terracotta Hibernate Search without even having a backing database, satisfying all "queries" in memory?
I am Terracotta's CTO. I spent some time last month looking at Hibernate Search. It is not built in a way that can be clustered transparently by Terracotta. Here's why, in a nutshell: Hibernate Search has a custom-built JMS replication of Lucene indexes across JVMs.
The basic idea in Search is that talking to local disk under Lucene works really well, whereas fragmenting or partitioning Lucene indexes across the network introduces so much latency as to make Lucene seem bad, when it is not Lucene's fault at all. To that end, Hibernate Search doesn't rely on JBossCache or any in-memory partitioning/caching schemes, and instead relies on JMS and each JVM's local disk in order to provide up-to-date indexing across a cluster with simultaneously low latency. Then, the beauty of Hibernate Search is that standard Hibernate queries and more can be launched through Hibernate at these natural-language indexes on each machine.
At Terracotta, it turns out we had a similar idea to Emmanuel's and built a SearchableMap product on top of Compass. Each machine gets its own Compass store, and the store is configured to spill to disk locally. Terracotta is used to create a multi-master writing capability where any JVM can add to the index, and the delta is sent through Terracotta to be replayed/reapplied locally to each disk. It works just like Hibernate Search, but with DSO as the networking protocol in place of JMS, and with Compass interfaces instead of the nice Hibernate ones.
I think we will support Hibernate Search, with help from JBoss (they would need to factor out the JMS implementation as pluggable), by the end of the year.
Now to your questions directly:
1. Object updates/sec in Hibernate Search or SearchableMap should be quite high, because both send only deltas. In Hibernate's case it is a function of your JMS provider; in Terracotta's it is scalable just by adding more Terracotta servers to the array.
2. Query performance in both is very fast: local-memory performance in most cases. And if you need to page in from disk, it turns out most OSes do a good job and can respond to queries far faster than any network-based clustering can.
3. It will be, I think, once we get JBoss to factor out their JMS assumptions, etc.
Cheers,
--Ari
Since people on the Hibernate forums keep referring to this post, I feel the need to point out that while Ari's comments were correct at the beginning of 2009, we have been developing and improving a lot since.
Hibernate Search provides a set of backend channels out of the box, like the already-mentioned JMS-based one and a more recent addition using JGroups, but we also made it pretty easy to plug in alternative implementations or to override parts of them.
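For reference, selecting a backend is a one-line configuration property; the sketch below assumes Hibernate Search 4.x property names (verify against the docs for your version), shown as plain Java Properties although they can equally go in persistence.xml or hibernate.cfg.xml:

```java
import java.util.Properties;

public class SearchBackendConfig {

    // JMS-based clustering: slave nodes send index work to a master over a queue.
    static Properties jmsBackend() {
        Properties props = new Properties();
        props.setProperty("hibernate.search.default.worker.backend", "jms");
        props.setProperty("hibernate.search.default.worker.jms.connection_factory",
                "ConnectionFactory");
        props.setProperty("hibernate.search.default.worker.jms.queue",
                "queue/hibernatesearch");
        return props;
    }

    // JGroups-based clustering (master side); slave nodes use "jgroupsSlave".
    static Properties jgroupsBackend() {
        Properties props = new Properties();
        props.setProperty("hibernate.search.default.worker.backend", "jgroupsMaster");
        return props;
    }
}
```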
In addition to using a custom backend, since version 4 it's possible to replace the whole strategy: instead of changing only the backend implementation, you can use an IndexManager, which follows a different design and doesn't use a backend at all. At this time we have only two IndexManagers, but we're working on more alternatives; again, the idea is to provide nice implementations for the most common use cases.
There is an Infinispan-based backend for very quick distribution of the index across different nodes, and it should be straightforward to contribute one based on Terracotta or any other clustering technology. More solutions are coming.