Restrict file download bandwidth/speed in a Servlet - Java

We have a high-load Java application which works in clustered mode.
I need to add the ability for our customers to download and upload files.
For storing files I'm going to use GridFS. I'm not sure it's the best choice, but MongoDB can be clustered and can replicate data between different nodes.
That's exactly what I need.
Different groups of users should be limited to different bandwidths. Based on some business rules, I need to restrict the download speed for certain users.
I've seen a few solutions for this, and most of them work the same way:
Read a chunk of bytes
Sleep the thread
Repeat
GridFS simply provides me with an InputStream, so I can read from that stream and write to the servlet output stream. I'm not sure this is a valid approach. I'm also afraid that users could tie up a lot of concurrent threads during downloads, and that could hurt performance.
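For reference, here is a minimal sketch of what I mean by the read/sleep/repeat loop over the GridFS stream (the class name and rate value are just placeholders):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public final class ThrottledCopy {
    // Naive throttled copy: read a chunk, write it, sleep, repeat.
    // 'in' would be the InputStream from GridFS, 'out' the servlet output stream.
    public static void copy(InputStream in, OutputStream out, long bytesPerSecond)
            throws IOException, InterruptedException {
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            out.flush();
            // Sleep long enough that this chunk amortizes to the target rate.
            Thread.sleep(read * 1000L / bytesPerSecond);
        }
    }
}
```

The obvious downside is that the calling thread is blocked in Thread.sleep() for the whole duration of the download.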
Could this be an issue for the servlet container?
If it could be an issue, how can it be avoided? Perhaps by using NIO?
I would prefer a pure Java solution.
Any help will be highly appreciated.

Leaky bucket or token bucket algorithms can be used to control the network bandwidth.
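For illustration, here is a minimal token bucket sketch (the class and method names are mine, not from any particular library). Tokens accumulate at the target rate up to a cap; a caller that wants to send N bytes must first acquire N tokens:

```java
// Minimal token bucket: refills at 'ratePerSecond' tokens/sec up to 'capacity'.
// Assumes each acquire() asks for at most 'capacity' tokens.
public final class TokenBucket {
    private final long capacity;
    private final double ratePerSecond;
    private double tokens;
    private long lastRefillNanos;

    public TokenBucket(long capacity, double ratePerSecond) {
        this.capacity = capacity;
        this.ratePerSecond = ratePerSecond;
        this.tokens = capacity;
        this.lastRefillNanos = System.nanoTime();
    }

    public synchronized void acquire(long n) throws InterruptedException {
        while (true) {
            refill();
            if (tokens >= n) {
                tokens -= n;
                return;
            }
            // Wait roughly until enough tokens have accumulated.
            long waitMillis = (long) Math.ceil((n - tokens) * 1000.0 / ratePerSecond);
            wait(Math.max(1, waitMillis));
        }
    }

    private void refill() {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefillNanos) / 1e9 * ratePerSecond);
        lastRefillNanos = now;
    }
}
```

A download loop would then call acquire(chunkSize) before each write; sharing one bucket across a user group caps that group's aggregate bandwidth.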
EDIT: I did some quick prototyping and implemented the algorithm leveraging Servlet 3.0 asynchronous processing. Results are pretty good. Full source code can be found on GitHub. Have fun!
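To give a rough idea of the async approach (this is my own sketch, not the code from the GitHub repo above): the request is put into async mode and a small shared scheduler writes one chunk per tick, so no container thread sits in Thread.sleep() for the duration of the download. The GridFS lookup is a placeholder:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.servlet.AsyncContext;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.*;

@WebServlet(urlPatterns = "/download", asyncSupported = true)
public class ThrottledDownloadServlet extends HttpServlet {
    // One small shared scheduler serves all downloads instead of one thread each.
    private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(4);

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        final AsyncContext ctx = req.startAsync();
        ctx.setTimeout(0); // downloads may legitimately take a long time
        final InputStream in = openGridFsStream(req); // placeholder for the GridFS lookup
        final byte[] buffer = new byte[8192]; // 8 KB every 100 ms -> roughly 80 KB/s
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                try {
                    int read = in.read(buffer);
                    if (read == -1) {
                        in.close();
                        ctx.complete();
                        // Throwing cancels this periodic task (documented behavior
                        // of scheduleAtFixedRate: an exception suppresses reruns).
                        throw new RuntimeException("done");
                    }
                    ctx.getResponse().getOutputStream().write(buffer, 0, read);
                } catch (IOException e) {
                    ctx.complete();
                    throw new RuntimeException(e);
                }
            }
        }, 0, 100, TimeUnit.MILLISECONDS);
        // doGet returns immediately; the container thread goes back to the pool.
    }

    private InputStream openGridFsStream(HttpServletRequest req) {
        throw new UnsupportedOperationException("look up the GridFS file here");
    }
}
```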

Also I'm afraid that users could tie up a lot of concurrent threads during downloads, and that could hurt performance.
Could this be an issue for the servlet container?
Yes, it could.
If it could be an issue, how can it be avoided? Perhaps by using NIO?
NIO won't help per se. It certainly won't prevent low-bandwidth responses from tying up threads for long periods of time.
I think what you would need to do is to implement downloads in a special web container. I'm not sure, but I think that Servlet 3.0 with async mode might do the trick.

Related

GAE/GWT server side data inconsistent / not persisting between instances

I'm writing a game app on GAE with GWT/Java and am having issues with server-side persistent data.
Players poll using RPC for active games and game states, all of which are stored on the server. Sometimes client polling fails to find game instances that I know should exist. This only happens when I deploy to Google appspot; locally everything is fine.
I understand this could be due to appspot being a cloud service that can spawn and use a new instance of my servlet at any point, with the existing data not persisting between instances.
Single games only last a minute or two and the data changes rapidly (multiple times a second), so what is the best way to ensure that RPC calls to different instances use the same server-side data?
I have had a look at the DataStore API and it seems to be database-like storage, which I'm guessing will be way too slow for what I need. Also, Memcache can be flushed at any point, so that's not useful either.
What am I missing here?
You have two issues here: persisting data between requests and polling data from clients.
When you have a distributed servlet environment (such as GAE), you cannot make a request to one instance, save data in memory, and expect that data to be available on other instances. This is true for GAE and any other servlet environment where you have multiple servers.
So you need to save data to some shared storage: the Datastore is costly, persistent, reliable and slow; Memcache is fast, free, but unreliable. Usually we use a combination of both. Some libraries even transparently combine both: NDB, Objectify.
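The usual combination looks roughly like this read-through pattern over the low-level GAE APIs (the class name and entity usage are my own illustration):

```java
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

public class GameStateRepository {
    private final DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
    private final MemcacheService cache = MemcacheServiceFactory.getMemcacheService();

    // Fast path: memcache. Slow but reliable path: datastore.
    public Entity loadGame(Key gameKey) throws EntityNotFoundException {
        Entity cached = (Entity) cache.get(gameKey);
        if (cached != null) {
            return cached; // cache hit: no datastore round trip
        }
        Entity game = datastore.get(gameKey); // authoritative copy
        cache.put(gameKey, game);             // repopulate the cache
        return game;
    }

    public void saveGame(Entity game) {
        datastore.put(game);            // durable write first
        cache.put(game.getKey(), game); // then refresh the cache
    }
}
```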
On GAE there is also a third option for semi-persistent shared data: backends. Those are always-on instances where you control startup/shutdown.
Data polling: if you have multiple clients waiting for updates, it's best not to use polling. Polling makes a lot of unnecessary requests (when the data has not changed on the server), and there will still be a minimum delay (since you poll at some interval). Instead of polling, use push via the Channel API. There are even GWT libs for it: gwt-gae-channel, gwt-channel-api.
Short answer: You did not design your game to run on App Engine.
You sound like you've already answered your own question. You understand that data is not persisted across instances. The two mechanisms for persisting data on the server side are memcache and the datastore, but you also understand the limitations of these. You need to architect your game around this.
If you're not using memcache or the datastore, how are you persisting your data? (My best guess is that you aren't actually persisting it.) From the vague details, you have not architected your game to be able to run across multiple instances, which is essential for any app running on App Engine. It's a basic design principle that you don't know which instance any HTTP request will hit. You have to rearchitect to use the datastore plus memcache.
If you want to use a single server, you can use backends, which behave like single servers that stick around (if you limit them to one instance). Frankly, though, because of the cost, you're better off with Amazon or Rackspace if you go this route. You will also have to deal with scaling on your own, i.e. if a game is running on a particular server instance, you need to build a way such that playing the game consistently hits that instance.
Remember you can deploy GWT applications without GAE, see this explanation:
https://developers.google.com/web-toolkit/doc/latest/DevGuideServerCommunication#DevGuideRPCDeployment
You may want to ask yourself: Will your application ever NEED multiple server instances or GAE-specific features?
If so, then I agree with Peter Knego's reply regarding memcache etc.
If not, then you might be able to work around your problem by choosing a different hosting option (other than GAE), particularly one that lets you work with just a single instance. You could then indeed simply manage all your game data in server memory, as I understand you have been doing so far.
If this solution suits your purpose, then all you need to do is find a suitable hosting provider. This may well be a cloud-based PaaS offering, provided that they let you put a hard limit (unlike GAE) on the number of server instances, and that it goes as low as one. For example, Heroku (currently) lets you do that, as far as I understand, and apparently it's suitable for GWT applications, according to this thread:
https://stackoverflow.com/a/8583493/2237986
Note that the above solution involves a bit of fiddling and I don't know your needs well enough to make a strong recommendation. There may be easier and better solutions for what you're trying to do. In particular, have a look at non-cloud-based hosting options and server architectures that are optimized for highly time-critical, real-time multiplayer gaming.
Hope this helps! Keep us posted on your progress.

Does anyone use URL Rewriting in production?

I've used tuckey's UrlRewriteFilter in small projects, but I'm hesitant to use such a thing in a production environment that could touch tens of thousands of paying customers (it feels kludge-y). Is it fine to use a rule-based rewriting engine in production, and what are some alternatives I could use for clean URLs?
We're using the UrlRewriteFilter by Tuckey in our production environment without any noticeable issues or performance penalties. Our services are heavily used, with more than 10k hits per second.
If you're using UrlRewrite just to process RESTful URLs - think about switching to Spring 3.0 (http://blog.springsource.com/2009/03/08/rest-in-spring-3-mvc/).
Also, consider using JAX-RS, but I have no extensive knowledge of its performance vs. Spring.
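To illustrate the Spring 3.0 route: clean URLs fall out of annotated controllers, so no rewrite rules are needed at all. A minimal sketch (the controller and path names here are made up):

```java
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.ResponseBody;

@Controller
public class ProductController {

    // Serves a clean URL like /products/42 directly, no rewrite filter involved.
    @RequestMapping("/products/{id}")
    @ResponseBody
    public String product(@PathVariable("id") long id) {
        return "Product " + id;
    }
}
```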
Yes, it is fine. A lot of large sites are doing it, in one way or another.
I've used URL rewriting on mid-scale sites (10-20k visitors/day) and have never found it to be a bottleneck. I haven't used the rewriter you mention, though, so there's a chance it may prove problematic.
In general, unless you've got some REALLY convoluted rules, the overhead of rewriting is going to be negligible compared to, say, opening a database connection.
There are also benefits to the user in terms of usability and memorable URLs (friendly URLs also seem to make users feel more confident). It's also nicer when you're digging through error logs :)

Is there a Java equivalent to libevent?

I've written a high-throughput server that handles each request in its own thread. For requests coming in it is occasionally necessary to do RPCs to one or more back-ends. These back-end RPCs are handled by a separate queue and thread-pool, which provides some bounding on the number of threads created and the maximum number of connections to the back-end (it does some caching to reuse clients and save the overhead of constantly creating connections). Having done all this, though, I'm beginning to think an event-based architecture would be more efficient.
In searching around I haven't found any equivalents to libevent for Java, but maybe I'm not looking in the right place? Mina-statemachine from Apache was the closest thing I found, but it looks more verbose than I need and there's no real release available.
Any suggestions?
I am a bit late but:
Have you looked at Netty?
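As a taste of the programming model, here is a minimal Netty echo server (this assumes the Netty 4.x API; the class names differ in 3.x):

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

public class EchoServer {
    public static void main(String[] args) throws InterruptedException {
        // A handful of event-loop threads serve all connections (libevent-style).
        EventLoopGroup boss = new NioEventLoopGroup(1);
        EventLoopGroup workers = new NioEventLoopGroup();
        try {
            ServerBootstrap b = new ServerBootstrap()
                .group(boss, workers)
                .channel(NioServerSocketChannel.class)
                .childHandler(new ChannelInitializer<SocketChannel>() {
                    @Override
                    protected void initChannel(SocketChannel ch) {
                        ch.pipeline().addLast(new ChannelInboundHandlerAdapter() {
                            @Override
                            public void channelRead(ChannelHandlerContext ctx, Object msg) {
                                ctx.writeAndFlush(msg); // echo the bytes back
                            }
                        });
                    }
                });
            b.bind(8080).sync().channel().closeFuture().sync();
        } finally {
            boss.shutdownGracefully();
            workers.shutdownGracefully();
        }
    }
}
```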
Or Grizzly.
How about the Light Weight Event System? :) http://www.lwes.org/ and http://sourceforge.net/projects/lwes/files/
The answer seems to be 'no', though it looks like the Ruby EventMachine library provides a Java implementation for JRuby users that might be usable or at least serve as inspiration for writing my own:
http://github.com/eventmachine/eventmachine/tree/master/java/
You might be looking for a workflow engine like
JBPM or any other open source tool listed here.

PHP Java combination for multithreaded processing - good or bad?

I need to make multiple calls to different web services using PHP, and I was wondering if a PHP-Java combination would be more appropriate for dealing with this issue.
The multiple calls to the services, if made sequentially, will create a significant amount of delay, so I am looking for ways to overcome that.
I have read articles that 'simulate' concurrent processing in PHP to deal with this particular issue, but I was wondering if introducing, let's say, a Java socket server that accepts requests and creates worker threads would be more efficient (faster).
Any comments appreciated.
Regards,
Interestingly I've been thinking about this issue as well. You have a number of options:
Use PHP calls to fork new processes;
Use a worker framework like beanstalkd to create work requests and have something pick them up;
Use something else like memcache to create work requests.
(2) is the interesting one (to me). You could run CLI PHP scripts to process the beanstalkd requests, or you could use Java. Which one depends on a large number of factors. I'd generally favour a single-language environment over a multi-language one where possible and practical, but I can also envision instances where a Java backend would be a good idea.
That's exactly the reason why we switched from PHP to Java: multithreading. We had an app that reads RSS feeds over HTTP. Switching from a single-threaded PHP app to several threads in Java gave about a 10x boost. I can't say anything about PHP threading simulation, though.
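For comparison with the PHP side, parallel calls in Java are only a few lines with java.util.concurrent. A sketch (the endpoint URLs are placeholders):

```java
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;
import java.util.concurrent.*;

public class ParallelFetch {
    public static void main(String[] args) throws Exception {
        List<String> urls = Arrays.asList(
                "http://example.com/service1",  // placeholder endpoints
                "http://example.com/service2");
        ExecutorService pool = Executors.newFixedThreadPool(urls.size());
        List<Callable<String>> tasks = new ArrayList<Callable<String>>();
        for (final String url : urls) {
            tasks.add(new Callable<String>() {
                public String call() throws Exception {
                    InputStream in = new URL(url).openStream();
                    try {
                        // Read the whole response as a single string.
                        return new Scanner(in, "UTF-8").useDelimiter("\\A").next();
                    } finally {
                        in.close();
                    }
                }
            });
        }
        // All calls run concurrently; total latency ~ the slowest single call.
        for (Future<String> f : pool.invokeAll(tasks)) {
            System.out.println(f.get().length());
        }
        pool.shutdown();
    }
}
```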

Java concurrent recursive website download

I need to write a program to download web pages; that is, I give the software a web page and it downloads all the files on the website.
I would also pass a depth level, that is, how many levels deep the software goes when downloading the files of the website.
I will develop this software in Java, and I need to use concurrency as well.
Please tell me your opinion about how to do this.
Thanks for the help.
Thanks to everyone for the help.
I need to ask one more thing: how do I download a file from the website?
Thanks one more time. =D
A very useful library for spiders and bots: HtmlUnit
Well, this is a bit hard to answer without knowing how much guidance you need, but here's an overview. :)
Java makes such applications quite easy, actually, since both HTTP requests and threading are readily available. My solution would probably involve a global stack containing new URLs and a fixed-size pool of threads that pop URLs from the stack. I'd store the URLs as a custom object so that I could keep track of the depth.
I think your main issue here will be with sites that don't respond or don't follow the HTTP standard. I've noticed many times in similar applications that these sometimes don't time out properly, and eventually they end up blocking all the threads. Unfortunately I don't have any good solutions here.
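A sketch of that structure, using a blocking queue in place of the stack (all the names here are mine, and the actual fetching/parsing is left as a placeholder):

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class Crawler {
    // A URL paired with the depth at which it was discovered.
    static class Task {
        final String url;
        final int depth;
        Task(String url, int depth) { this.url = url; this.depth = depth; }
    }

    private final BlockingQueue<Task> queue = new LinkedBlockingQueue<Task>();
    private final Set<String> seen =
            Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
    private final int maxDepth;

    public Crawler(String startUrl, int maxDepth, int threads) {
        this.maxDepth = maxDepth;
        seen.add(startUrl);
        queue.add(new Task(startUrl, 0));
        for (int i = 0; i < threads; i++) {
            new Thread(new Runnable() {
                public void run() {
                    try {
                        while (true) {
                            Task task = queue.take(); // blocks until work is available
                            for (String link : downloadAndExtractLinks(task.url)) {
                                // seen.add() returns false for duplicates, so each
                                // URL is enqueued at most once.
                                if (task.depth + 1 <= Crawler.this.maxDepth && seen.add(link)) {
                                    queue.add(new Task(link, task.depth + 1));
                                }
                            }
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt(); // shut this worker down
                    }
                }
            }).start();
        }
    }

    // Placeholder: fetch the page, save it to disk, return the links it contains.
    List<String> downloadAndExtractLinks(String url) {
        throw new UnsupportedOperationException("HTTP fetch + HTML parsing goes here");
    }
}
```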
A few useful classes as a starting point:
http://java.sun.com/javase/6/docs/api/java/lang/Thread.html
http://java.sun.com/javase/6/docs/api/java/lang/ThreadGroup.html
http://java.sun.com/javase/6/docs/api/java/net/URL.html
http://java.sun.com/javase/6/docs/api/java/net/HttpURLConnection.html
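As for the follow-up question about downloading a single file, a minimal sketch using the URL and HttpURLConnection classes above (the URL and file name are just examples):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class FileDownload {
    public static void download(String url, String toFile) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(10000); // guard against sites that never respond
        conn.setReadTimeout(10000);
        InputStream in = conn.getInputStream();
        OutputStream out = new FileOutputStream(toFile);
        try {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read); // stream the body straight to disk
            }
        } finally {
            in.close();
            out.close();
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        download("http://example.com/logo.png", "logo.png"); // example values
    }
}
```

The explicit timeouts also address the blocked-thread problem mentioned earlier.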
I would look at these resources:
http://hc.apache.org/httpclient-3.x/
http://java.sun.com/javase/6/docs/api/java/util/concurrent/package-summary.html
http://java.sun.com/javase/6/docs/api/java/util/concurrent/locks/package-summary.html
I would have a look at the Java Executors package. You create a set of tasks (Runnables) and pass them to a suitably chosen Executor. You get a Future back, and you can then query it for its result.
The Executor will coordinate when each Runnable is executed. Implementations exist for single-threaded executors, executors with a pool of threads, etc. So you don't need to worry (too much) about the threading intricacies; the concurrency utilities will look after this for you.
Apache HTTP Client will look after the HTTP querying for you.
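As a minimal illustration of that submit/Future flow (the task body is a stand-in for the real page download):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutorDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(4);

        // Submit a task; the executor decides when and on which thread it runs.
        Future<String> future = executor.submit(new Callable<String>() {
            public String call() {
                return "page contents"; // stand-in for the actual HTTP fetch
            }
        });

        System.out.println(future.get()); // blocks until the task completes
        executor.shutdown();
    }
}
```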
