Does anyone use URL Rewriting in production? - java

I've used Tuckey's UrlRewriteFilter in small projects, but I'm hesitant to use such a thing in a production environment that could touch tens of thousands of paying customers (it feels kludgy). Is it fine to use a rule-based rewriting engine in production, and what are some alternatives I could use for clean URLs?

We're using the UrlRewriteFilter by Tuckey in our production environment without any noticeable issues or performance penalties. Our services are heavily used, with more than 10k hits per second.
If you're using UrlRewrite just to process RESTful URLs, think about switching to Spring 3.0 (http://blog.springsource.com/2009/03/08/rest-in-spring-3-mvc/).
Also consider JAX-RS, though I have no extensive knowledge of its performance versus Spring.

Yes, it is fine. A lot of large sites are doing it, in one way or another.

I've used URL rewriting in mid-scale sites (10-20k visitors/day) and have never found it to be a bottleneck. I haven't used the rewriter you mention, so there's a chance it may prove problematic.
In general, unless you've got some REALLY convoluted rules, the overhead of rewriting is going to be negligible compared to, say, opening a database connection.
There are also benefits to the user in terms of usability and remembering URLs (friendly URLs also seem to make users feel more confident). It's also nicer when you're digging through error logs :)
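To put the "negligible overhead" point in concrete terms, here is a rough sketch of what a rule-based rewriter does on each request: walk a short list of precompiled regular expressions and substitute the first one that matches. It is in Python for brevity rather than Java, the rule patterns and the target .jsp paths are invented for illustration, and it is not meant to describe how UrlRewriteFilter is implemented internally.

```python
import re

# Hypothetical rules: (compiled pattern, replacement template).
# Both the clean URL shapes and the internal targets are assumptions.
RULES = [
    (re.compile(r"^/products/(\d+)/?$"), r"/product.jsp?id=\1"),
    (re.compile(r"^/users/([a-z0-9_-]+)/?$"), r"/profile.jsp?name=\1"),
]


def rewrite(path):
    """Return the internal path for the first matching rule, else the path unchanged."""
    for pattern, target in RULES:
        if pattern.match(path):
            return pattern.sub(target, path)
    return path
```

With a handful of anchored patterns like these, the per-request work is a few regex matches, which is why rewriting rarely shows up next to database or network time.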

Penetration testing on Spring Security

What docs or links describe past bugs that turned into vulnerabilities in Spring Security, and how does one go about trying to hack into the framework?
I'm mainly interested in how to work around the login and authorization of the Spring Security framework. Any technique that would help me create automated penetration tests would be a great help.
Unrelated side-note (would like to know your opinion on it though):
One feature that would make the framework awesome is banning IP addresses that exceed a daily request limit or, even better, a request limit per hour. So far I have been doing this on Cisco ASA or IPS units, but it would be very useful to have an application log of the events integrated with Spring Security.
Any intel on this would be greatly appreciated.

XSS exploits are among the most infamous, but you've probably tried them.
Check the OWASP Top 10 (google it, please). You can try all of those.
Also, blocking requests seems like a very good idea until you realize that raw sockets allow any even remotely experienced hacker to forge their IP. I guess you could validate the IP (because the scripts used to attack a page often do not have a valid IP, as the addresses are randomly generated), but websites are slow enough anyway, and once you did that, you'd still be vulnerable to slightly more sophisticated attacks that use IP databases.
Also, check out Metasploit. It's VERY easy to use and should get you on the road really quickly. You can also script penetration tests, so once you deploy an update, you just run a Ruby script and it should test it for holes.
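For what it's worth, the per-IP hourly limit from the question can be sketched independently of any framework. The following is a rough in-memory sliding-window limiter in Python, not Spring Security code; the class name, the limit, and the injectable clock are all assumptions made for the example, and a real deployment would need shared storage across servers and some thought about clients behind proxies.

```python
import time
from collections import defaultdict, deque


class HourlyRateLimiter:
    """Sliding-window limiter: at most `limit` requests per IP per `window` seconds."""

    def __init__(self, limit, window=3600.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the sketch can be tested without sleeping
        self.hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip):
        now = self.clock()
        recent = self.hits[ip]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False  # over the limit: the caller would log and reject
        recent.append(now)
        return True
```

The point where `allow` returns False is where an application-level log entry would go, which is the part a firewall-only setup does not give you.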

Is Spring Security worth the effort?

I've been looking over Spring Security and noticing that it's a pretty big package. I'm wondering if it's worth the effort to take the time to ramp up on it.
Has Spring Security 2.0+ saved you a large amount of time, or has it simplified your project in any way?
I considered using it for a Spring project a couple of years back, and opted against it because it was a tremendously heavy and complex framework and the flexibility that it provides just wasn't necessary IMHO. It was (in my estimation) less effort to roll our own authentication/authorization. Don't misinterpret this as meaning that it was a trivial effort; effective security never is.
From a risk standpoint, I didn't understand it deep down after spending some time with the documentation, and decided that the complexity represented a significant risk of misconfiguration. It may be "better" than what we built, but if we didn't understand how to use and configure it properly, then it wasn't going to live up to its potential. A custom-implemented (and possibly "inferior") security module that I understand inside-out is less concerning.
Disclaimer: Spring Security was still called Acegi at the time, and the current technology may well have changed along with the name.
Has spring security 2.0+ saved you a large amount of time, or has it simplified your project in any way?
For my project, yes and yes.
It very much depends on how simple or complex your security requirements are.
If you only need to do simple things, you can get away with reading only the small part of the Spring Security documentation that is relevant to your problem. Or just borrow stuff from the samples.
If you are doing complicated things like talking to an enterprise LDAP service or using OpenID, then using Spring Security is going to be much simpler than implementing things yourself starting from (non-Spring) third-party libraries.
In my experience, decent website security is complicated and time-consuming, no matter how you implement it.
If the alternative is no security or writing and maintaining my own, I'd rather learn Spring Security.
If I can amortize the learning curve over several projects, so much the better.

What language (Java or Python) + framework for mid sized web project?

I plan to start a mid sized web project, what language + framework would you recommend?
I know Java and Python. I am looking for something simple.
Is App Engine a good option? I like the overall simplicity and free hosting, but I am worried about the datastore (how difficult is it to make it similarly fast as a standard SQL solution? + I need fulltext search + I need to filter objects by several parameters).
What about Java with Stripes? Should I use another framework in addition to Stripes (e.g. for the database)?
UPDATE:
Thanks for the advice, I finally decided to use Django with Eclipse/PyDev as an IDE.
Python/Django is simple and elegant, it's widely used, and the documentation is great. A small disadvantage is that I may have to buy a VPS, but it shouldn't be very hard to port the project to App Engine, which is free to some extent.
Since you mentioned python, I would suggest looking into Django. You may need to look harder for hosting options, however...
Is App Engine a good option? I like the overall simplicity and free hosting, but I am worried about the datastore (how difficult is it to make it similarly fast as a standard SQL solution? + I need fulltext search + I need to filter objects by several parameters).
App Engine is nice. It supports Python or Java (with some limitations), and it provides free hosting for small needs (rare, at least for Java). But I wouldn't expect exactly the same performance as with dedicated servers. The cloud is about scalability, not performance: you won't always get the fastest response time for a single hit, but GAE would handle gazillions of concurrent hits without any problem while your servers would be on fire. This scalability is not without cost, though; if you don't need it, the development tradeoffs may be too much trouble. Also note that it does not support full-text search out of the box (what an irony), so you will have to use extra tooling.
What about Java with Stripes? Should I use another framework besides Stripes (e.g. for the database)?
I like Stripes very much. I love its convention-over-configuration approach; it's a very elegant and simple framework (but still powerful). Definitely not a bad choice. For persistence, if you go for GAE, you will have to use JPA or JDO. If you don't, it's at your discretion (although I would go for JPA).
See also
Google AppEngine - A Second Look
As with many things in life, this depends on what your goals are. If you intend to learn a web framework that is used in corporate environments, then choose a Java solution. If not, don't. Python is certainly more elegant and generally more fun in pretty much every way.
As to which framework to use, Django has the most mindshare, as evidenced by the number of questions asked about it here. My understanding is that it's also pretty good. It's best suited for CMS-like web sites, though; at least that's where it comes from and what it's optimized for. You might also have a look at one of the simpler, nimbler ones, such as the relatively new Flask. All of these are enjoyable, though they may not all have all features on App Engine.
Kay and Tipfy are excellent Python framework choices when you target specifically GAE. Kay is modelled after and similar to Django, but is better suited to GAE.
I've been kicking App Engine around a little bit, and so far the datastore is pretty quick... there is a bit of a learning curve compared to SQL, but I've had no real issues. I'm not sure about full-text search, but filtering is simple: you just apply each filter one at a time.
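    from google.appengine.ext import db  # legacy App Engine datastore API

    class DBModel(db.Model):
        field1 = db.StringProperty()
        field2 = db.StringProperty()
        field3 = db.IntegerProperty()

    # Each filter() call narrows the same query; chained filters are ANDed.
    GQLObj = DBModel.all().filter('field1 =', 'Foo')
    GQLObj = GQLObj.filter('field2 =', 'Bar')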
As far as hosting goes, with GAE I'm not sure you even get a choice. I know you can register your own domain with Google, though.
I don't think the datastore is a problem. Many people will reject it out of hand because they want a standard relational database; if you are willing to consider a datastore in general then I doubt you will have any problems with the GAE datastore. Personally, I quite like it.
The thing that might trip you up is the operational limitations. For example, did you know that an HTTP request must complete within 10 seconds?
What if you get 50% of the way through a project and then find that a web service you are using sometimes takes 15 seconds to respond? Now you are toast. You can't pay extra to get the limit raised or anything like that.
So, my point is that you must approach GAE with great care. Learn about the limitations and make sure that they will not be a problem before you start using it.
It depends on your personality. There's no right answer to this question any more than there's a right answer to "what kind of car should I drive?"
If you're artistic and believe code should be beautiful, use Rails.
If you're a real hacker type, I think you'll find a full-stack framework such as Rails or Django to be unsatisfying. These frameworks are "opinionated" software, which means you have to really embrace the author's vision to be most productive.
The wonderful thing about web development in the Python world is that there are several great minimal frameworks. I've used several, including web.py, GAE's webapp, and CherryPy. These frameworks are like "here's a request, give me a string to serve up." It's raw. Don't think you'll be stuck in Python concatenating strings though, God no. There are also several excellent templating libraries for Python. I can personally recommend Cheetah, but Mako also looks good.
Google App Engine + GWT and you have a pretty powerful combination for developing web applications. The datastore is quite fast, and it has so far done the job quite nicely for me.
In my project I had to do a lot of redesigning of my database model, because it was made for a traditional relational database, and some things were not (directly) possible with the datastore.
GWT has a fairly moderate learning curve, but it gets the job done very well. The GUI code is really easy to get started with; it's the asynchronous way of thinking that's the hardest part.
As for search, I don't think it's supported in the framework. Filtering is possible on parameters.
There are some limitations to GAE, and you should consider them before putting all your eggs in that basket. The fact that GAE uses J2EE distribution standards makes the application very easy to move to a dedicated server, should the limitations of GAE become a problem. In fact, I think you would only have to refactor the part of your code that makes the queries and stores the data (which shouldn't be much more than 100 lines).
I've built several apps on GAE (with Python) over the last year. It's hard to beat the ease with which you can get an app up and running quickly. Don't discount the value in that alone.
While you may not understand the datastore yet, it is extremely well documented and there are great resources - including this one - to help you get past any problem you might have.

Need a simple CMS. Should I customise an existing CMS or build from scratch?

I'm building a VERY simple web hosting service that will cater to the ma and pa type small business.
Now my dilemma is whether I should build it from scratch or use an existing CMS. The CMS needs to be customisable, as I wish to build my own client. I only want the user to have to enter a title and content. Everything else will be hidden from the end user.
Later on I want to be able to associate my own custom built web applications to a given page, so the CMS needs to be able to cater for that.
The other requirement is that it needs to be Java based (Groovy acceptable).
Any ideas?
Even a "simple" CMS is a fairly complicated application once you start doing things like user authentication, security, scalability, etc. There are many many things you can get wrong. The major CMSes out there suffer from lots of maintenance problems and security bugs and there are a fair number of talented people working on them. If you think your CMS will be different, I suggest that it will not be. You will need to update and maintain it constantly. If you use an off-the-shelf solution you should be able to benefit from the work those other developers are doing. If you want to write your own code, consider customizing an off-the-shelf CMS or contributing new features or bug fixes.
Own CMS:
Total control over the features
Low cost
Easy for you to understand
Only you understand it and can fix it
No bug fixes from other developers
Off-the-shelf CMS:
You can simply install it, then concentrate on adding value
Steeper learning curve
Lots of hosted solutions and online help
Lots of people can admin it if you're not around
Bug fixes and security updates are released by the vendor
More limited in terms of customization, etc
Someone needs to keep on top of the updates and install them, or else the customer might be victim to a worm such as the WordPress worms. With your own CMS it's less likely that someone will create a worm just for you. (But your customers are still at risk of other security problems).
Consider the typical security issues that every website faces: XSS, CSRF, SQL injection, configuration errors, loose security, session hijacking, parameter validation errors, race conditions, etc. You need to handle all those cases, but the CMS vendors are already doing that for you.
As for your Java requirement, the Resin web server ships with a PHP interpreter which might allow you to deploy a PHP app in Java should you choose a PHP CMS. It should also be possible to port the PHP engine to another servlet container in a few hours (I think it's GPL).
You should definitely learn an existing CMS. As much as creating one from scratch would fit your needs, an existing CMS with a live community (such as Drupal or Django) will have solid foundations, based on the accumulated experience of others.
Doing so will help skip the common mistakes of creating a CMS, and will allow you to focus on functionality and making your product, versus building a framework.
Check out Alfresco Community Edition. It's Java-based and it's the first one to be compliant with CMIS 1.0. You can build a web client/application on top of it. It has FTP, WebDAV, CIFS, and HTTP interfaces.
Liferay? You'll be up and running in less than 5 minutes.
Java-based, supports Groovy.
If you want to DIY, take a look at CMIS. (Forget it, don't DIY.)
I can't imagine there isn't an existing CMS that would fit those requirements. Won't do the research for you though ;-)
If you want to figure out some Java EE technology aspects or advantages/disadvantages it is better to write your own CMS just for practice.
But before you do this go through some of the existing CMS to create list of pros and cons.
No existing CMS, framework, or whatever will fit your own problem perfectly; they are often too big because they try to cover as many cases as they can.
One more thing to think about: do you want your CMS to work well on Google App Engine?

Key factors for designing scalable web based application

I am currently working on a web-based application. What are the key factors a designer should take care of when designing a scalable web-based application?
That's a fairly vague and broad question and something you could write books about. How far do you take it? At some point the performance of SQL JOINs breaks down and you have to implement some sharding/partitioning strategy. Is that the level you mean?
General principles are:
Cache and version all static content (images, CSS, Javascript);
Put such content on another domain to stop needless cookie traffic;
GZip/deflate everything;
Only execute required Javascript;
Never do with Javascript what you can do on the serverside (eg style table rows with CSS rather than using fancy jQuery odd/even tricks, which can be a real time killer);
Keep external HTTP requests to a minimum. That means very few CSS, Javascript and image files. That may mean implementing some form of CSS spriting and/or combining CSS or JS files;
Use serverside caching where necessary but only after you find there's a problem. Memory is an expensive but often effective tradeoff for more performance;
Test and tune all database queries;
Minimize redirects.
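The "cache and version all static content" item can be sketched roughly as follows: put a short hash of the file's content into its name, so the file can be served with a far-future cache lifetime and any change produces a new URL. This is only a sketch in Python; the function name and the 8-character hash length are assumptions, and real build tools do the same thing with more care.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path, content):
    """Return `path` with a short hash of `content` inserted before the extension."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    p = PurePosixPath(path)
    # e.g. css/site.css -> css/site.<digest>.css
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))
```

Templates would then reference the versioned name, so browsers never need to revalidate an unchanged file and never see a stale one after a deploy.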
Having a good read of highscalability.com should give you some ideas. I highly recommend the Amazon articles.
Every application is different. You'll have to profile your application to see where you should concentrate your optimization efforts. Some web applications might require database access optimizations, while others have complicated business logic that causes the bottleneck.
Don't attempt to optimize arbitrary parts of your application without profiling first. You might end up having to support complicated optimized code that doesn't actually make your application snappier.
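As a rough illustration of "profile first", here is a minimal sketch using Python's standard cProfile and pstats modules. The helper name `profile_call` and the stand-in `slow_handler` are made up for the example; in a real application you would wrap an actual request handler or use whatever profiler your platform provides.

```python
import cProfile
import io
import pstats


def profile_call(fn, *args, top=5, **kwargs):
    """Run fn under cProfile and return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args, **kwargs)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, out.getvalue()


def slow_handler(n):
    # Stand-in for a request handler; purely illustrative.
    return sum(i * i for i in range(n))
```

Calling `profile_call(slow_handler, 100000)` returns the handler's result together with a report you can read before deciding what, if anything, to optimize.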
I get the sense from the other answers here that there is a general confusion between scalability and performance. High performance means that the response is quick. High scalability means that you get a response no matter how many others are also using the site at the same time. There's a big difference.
In fact, you actually have to sacrifice a little performance just to get good scalability. A general pattern to scalability is distributed computing. Factoring functionality out into separate tiers of clustered servers (web, business rules, database) is the usual approach to scalability. That extra round trip will slow down page load a little bit.
Everyone always wants to focus on high scalability but also don't forget that, for software vendors who sell licenses to customers who self host the application, scaling down can be just as important as scaling up. An application that can run on a single server for ten users but can also be configured to run on a ten server web cluster, a three server middle tier, and a four server database cluster for 10,000 users would be a system well designed for scalability.
None. Just code the application using proper design techniques (separation of concerns, etc) and then when the application is done or nearly done, do your performance testing. You'll find the real bottlenecks then - they won't be what you might have guessed in the beginning. This is where your proper design from the beginning comes into play - it makes it easy to make changes to fix the bottlenecks.
Sometimes, a specific answer is more helpful than just generic tips.
If you want to scale, the only things to target are SPEED (in hardware and software) and RESOURCES (in hardware).
The latter, hardware, is expensive (more servers, load balancers, etc.).
So, by carefully selecting your initial development framework you will save a lot of time and resources, up to several orders of magnitude.
For example, nginx is (much) faster than Apache.
Other solutions are faster than nginx (for both static and dynamic content), but I could not disclose them without being censored on StackOverflow (it was rated SPAM & advertising despite the fact that this is a FREE solution).
Those are the limits of "sharing": we must share only "acceptable" solutions rather than efficient solutions.
Cheers,
Pierre.
