Edit and sanitize user input in a servlet when Code is allowed? - java

The web page I'm building with JSP and a Java servlet needs to let users write comments and articles that contain text as well as code in various languages (including HTML and JavaScript).
The data is stored in a MySQL database and displayed later on the page.
For input, I was thinking of using one of the many WYSIWYG editors out there.
Those usually produce (X)HTML code for the database.
This means I need some kind of server-side sanitizing before inserting into the database, since the editor could easily be circumvented and malicious code displayed on the site (the database itself is protected by prepared statements).
What would be the best and simplest way to approach this?
And would it make more sense to switch to BBCode input instead of HTML?
I've found several threads around here, but most don't take into account that code actually needs to be displayed on the site, and most are several years old already.
Huge thanks in advance!

You can use KefirBB, either to process BBCode or to filter HTML.
https://github.com/kefirfromperm/kefirbb

Related

Edit, reload and redownload website in java?

I am planning a small application to gather some data from a website.
The website has a few text boxes in which you can enter different numerical values; you then click a button and an output value is written on the page.
What I want the application to do is fill in the text boxes, "click" the button and gather the output data.
I'm only really familiar with Java, but my guess is that it would be better to write such an application in JavaScript?
Also, if it's doable in Java, should I be looking at other libraries apart from jsoup, which I've already used?
I have more or less figured out how to download the HTML and extract the data I need using jsoup; it's writing the values back into the text boxes that troubles me.
Thank you
jsoup includes an implementation of the DOM (Document Object Model, a data structure that represents web pages as object trees) that can help you change the text boxes' values. If you're going to write your project in Java, jsoup is the better choice for the job.
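If the page is a plain HTML form, filling in the text boxes and clicking the button amounts to sending the same field values in an HTTP request; the response can then be parsed with jsoup as before. Below is a minimal sketch of that idea using only the JDK (Java 11+) rather than jsoup. It assumes the form is submitted via POST, and the URL and field names are hypothetical; the real ones have to be read from the page's <form> element.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

class FormSubmitSketch {

    // Encodes form fields the way a browser does for
    // application/x-www-form-urlencoded submissions.
    static String encodeForm(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Builds the POST request that "clicking the button" would send.
    // The URL and the field names passed in are hypothetical placeholders.
    static HttpRequest buildRequest(String url, Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(fields)))
                .build();
    }
}
```

The request can be sent with java.net.http.HttpClient and the response body handed to Jsoup.parse(...). jsoup's own Connection offers the same in one step via data(...) and post(). Note that this only works if the server computes the output; if the result is calculated by JavaScript in the browser, no request is sent and a different approach is needed.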

What technologies are there for formatted, structured data input and output?

I am working on a project here that ingests internal resumes from people at my company, strips out the skills and relevant content from them and stores it in a database. This was all done using docx4j and Grails. This required the resumes to first be submitted via a template that formatted everything just right so that the ingest tool knew what to look for to strip the data.
The second part of this is: what if we want to get a "reduced" resume out of the database? In other words, I want to search the uploaded content I now have and only print out new resumes for people who have, let's say, Java programming experience. So I can go into my database, find the people who originally had Java as a skill, and output a new set of resumes that are still in a nice templated format and only contain the relevant info instead of ALL the content.
I have been writing some software to do this in Java that basically uses a docx template, overwriting the items in the customXML which are bound to the content controls in the doc, so that the new data shows up and can be saved as a new docx with that custom data.
This seems really cumbersome to me and has some limitations. For one, let's say my template has a place for 3 skills and a particular person has 8 skills. There seems to be no good way to add those 5 additional skills to the docx other than painstakingly inserting the data with all of the formatting XML tags and such. This is a real pain, because if the template changes, I don't want to have to go back into my software and edit source code to change that additional data-input XML tag to bold instead of italic.
I was reading up on using InfoPath to create a form that I could use to get the input, connecting to some SharePoint data source or something similar to store the stripped-out data. However, I can't seem to find out whether it is possible to get the data back out of SharePoint in a nicely formatted way. What would the general steps for this be? I couldn't find very much about this topic with some quick googling.
Thanks
You could set up the skills as:
<skills>
<skill>..</skill>
<skill>..</skill>
</skills>
and use a "repeat" content control pointing to the container. This would handle any number of <skill> entries.
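On the Java side, the data for such a container can be generated for any number of skills instead of being fitted into fixed slots. Below is a minimal sketch using only the JDK's DOM and transformer APIs. The element names follow the example above; the class and method names are hypothetical, and injecting the result into the docx custom XML part (for example with docx4j) is not shown.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class SkillsXmlSketch {

    // Builds <skills><skill>..</skill>...</skills> for any number of skills.
    static String toXml(List<String> skills) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("skills");
            doc.appendChild(root);
            for (String skill : skills) {
                Element el = doc.createElement("skill");
                el.setTextContent(skill); // special characters are escaped on output
                root.appendChild(el);
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build skills XML", e);
        }
    }
}
```

Because the formatting lives in the template's repeating content control, a change from bold to italic stays a template change and does not touch this code.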

Looking for a question that combines the understanding of few web technologies [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I am teaching a web development course at a CS department. I have written most of the final test by now; each question focuses on a specific feature or a specific technology.
I wonder if you can think of or recommend a question that combines knowledge of a few technologies.
The course mostly covers: HTML, CSS, JS, HTTP, Servlets, JSP and JDBC
(as well as AJAX, ORM, basic security issues like SQL injection and XSS, HTML5, REST APIs).
EDIT: I would really appreciate questions with answers :-) thanks!
I'll give the bounty to the question with the highest rank, so please vote! I honestly like most of the questions here, thank you all :-)
Explain the relationship of the DOM to each of the following technologies: HTML, CSS, JavaScript.
The goal here is for the answer to make clear the student understands that HTML generates a DOM structure, CSS affects how that structure is rendered, and JavaScript affects how that structure is modified. If you understand how it all ties back into the DOM, all client-side coding becomes straightforward.
Fun question :-) How about...
In web development you need to separate content, style and behavior. Describe why this is done and which technologies you use in which layer. Every acronym should be written out in full on first use. (10 p)
or...
Describe what happens in a Web Browser (step by step) when a web page is transferred on the internet from a Web server through HyperText Transfer Protocol to a Client. Consider all the different technologies you have used in this course. (10 p)
Explain what happens, and which technologies could be used, when a user logs in to a protected web site using form based login that sets a HTTP cookie. (Starting from the HTML form all the way to the database and back to the browser.) Bonus question: What changes, when using AJAX for the login?
Answer (main points):
HTML: Form (using POST) with text input fields and a button. Security: The form is sent via HTTPS. The login page itself should also be an HTTPS page (otherwise, the form could be replaced by Mallory, i.e. a MITM attack).
JavaScript: Performs some basic validation (e.g. empty password) and displays an error message before sending to the server.
Servlet: Receives POST request, takes username/password parameters (in plaintext), calculates (salted) hash from password, discards plaintext password.
JDBC: Selects the stored password hash from the DB. It is compared with the hash of the transmitted password.
Servlet: On success, creates a new session (leads to the creation of a cookie header). Prepares objects that will be used in the JSP page (and stores them in the session or request scope).
JSP: Prepares the HTML page that will be sent to the browser.
Browser: Receives HTTP response, sets cookie and displays the page.
Bonus (AJAX): The server doesn't have to prepare the entire page, but only sends the necessary data and/or HTML snippets to the client. The browser doesn't reload the entire page, but modifies the current page using JavaScript. Security: AJAX requests are subject to the same-origin policy, so an HTTP page cannot submit the login data via HTTPS unless the server explicitly allows cross-origin requests (CORS).
Caution
Note that this is not meant to be used as a HOWTO for building a secure login mechanism. The description is simplified and doesn't cover every security aspect. On the other hand, as an exam question it should probably be simplified further and adjusted to the content of the curriculum.
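The hashing and comparison steps of the servlet and JDBC points can be sketched as follows. This is a minimal illustration using the JDK's PBKDF2 implementation, not a complete login mechanism: the class and method names, the iteration count and the salt length are assumptions, and the salt and hash would be stored per user in the database.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

class PasswordHashSketch {

    // A fresh random salt is generated once per user and stored with the hash.
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // Derives a 256-bit salted hash; the iteration count is an assumed value.
    static byte[] hash(char[] password, byte[] salt) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password, salt, 100_000, 256);
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hashes the transmitted password with the stored salt and compares
    // the result with the stored hash in constant time.
    static boolean matches(char[] candidate, byte[] salt, byte[] storedHash) {
        return MessageDigest.isEqual(hash(candidate, salt), storedHash);
    }
}
```

In the flow above, the servlet would call matches(...) with the salt and hash that the JDBC step selected for the given username, and create the session only if it returns true.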
You can ask them to explain how to implement the MVC pattern, and where each technology comes into use within it. Or rather: how and why?
Since the students have already developed a simplified Twitter during the course, you could ask what additional steps they would take to turn it into a real Twitter website or a clone of it, and ask them to describe each step, starting from HTML through to the ORM / database. You may explicitly specify the technologies to be used.
Well, putting on my "evil" hat for a moment, you could ask how the back end data model should dictate the layout of the front end, and any answer other than some variation of "It doesn't" gets to take the class over again. >:-)
Why should any framework you use generate HTML, CSS and JS?
DRY
Imagine you work for a security agency and were given the task of developing a web site. The field agents specifically requested that the site could swap colors so that they could use it both with night vision and at the office. With what you have learned, describe how you would separate content from structure to allow night/day switching and what security measures you would implement to prevent an enemy agency from stealing your data.
A spiced up question. I always find my students more interested when I put them in the middle of a plot.
Something along the lines of...
Explain how you would display the results of a call to an offsite XML feed when the user performs some action in the browser. The browser must not navigate.
A good answer would address the need for client-side scripting, the cross-site (same-origin) restriction, and the server-side component necessary to get around it, possibly with pseudocode or snippets.
Ask them to develop a student database system in which the user can search the database by date of birth.
Here the following technologies can be used and tested:
1. HTML for form controls
2. CSS for aesthetics
3. JavaScript for date validation
4. Very importantly, you can explain SQL injection
5. JSP
6. Servlets
7. JDBC
8. Any database
9. AJAX
10. The MVC design pattern can be used
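A model answer for the server-side part (points 3, 4 and 7) might look like the sketch below. It assumes dates are submitted in ISO format (yyyy-MM-dd) and uses a hypothetical students table with name and date_of_birth columns; the class and method names are also made up for illustration.

```java
import java.sql.Connection;
import java.sql.Date;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class StudentSearchSketch {

    // Server-side date validation: accepts only ISO dates such as 1995-04-23.
    // Client-side JavaScript validation can be bypassed, so this is repeated here.
    static Optional<LocalDate> parseDob(String input) {
        if (input == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(LocalDate.parse(input.trim()));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    // The date is bound as a parameter and never concatenated into the SQL text,
    // which is what prevents SQL injection. Table and column names are hypothetical.
    static List<String> findNamesByDob(Connection con, LocalDate dob) throws SQLException {
        String sql = "SELECT name FROM students WHERE date_of_birth = ?";
        List<String> names = new ArrayList<>();
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setDate(1, Date.valueOf(dob));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString("name"));
                }
            }
        }
        return names;
    }
}
```

An input such as 1995-04-23' OR '1'='1 already fails in parseDob, and even if validation were skipped, the bound parameter would be treated as data rather than as SQL.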

Java website protection solutions (especially XSS)

I'm developing a web application, and facing some security problems.
In my app users can send messages and see each other's (a bulletin-board-like app). I'm validating all the form fields that users can send to my app.
There are some very easy fields, like "nick name", which can be 6-10 alphabetical characters, or the message sending time, which is sent to the users as a string, and then (when users ask for messages that are "younger" or "older" than a date) I parse it with SimpleDateFormat (I'm developing in Java, but my question is not specific to Java).
The big problem is the message field. I can't restrict it to alphabetical characters only (upper or lowercase), because I have to deal with some frequently used characters like ",',/,{,} etc. (users would not be satisfied if the system didn't allow them to use these).
According to this http://ha.ckers.org/xss.html, there are a lot of ways people can "hack" my site. But I'm wondering, is there anything I can do to prevent that? Not everything, because there is no 100% protection, but I'd like a solution that can protect my site.
I'm using servlets on the server side and jQuery on the client side. My app is "full" AJAX, so users open 1 JSP, then all the data is downloaded and rendered by jQuery using JSON. (Yeah, I know it's not "users-without-JavaScript" friendly, but it's 2010, right? :-) )
I know front end validation is not enough. I'd like to use 3 layer validation:
- 1. front end: JavaScript validates the data, then sends it to the server
- 2. server side: the same validation; if there is anything that shouldn't be there (because of the client-side JavaScript), I BAN the user
- 3. if there is anything that I wasn't able to catch earlier, the rendering process handles it and renders it appropriately
Is there any "out of the box" solution, especially for java? Or other solution that I can use?
To minimize XSS attacks, the important thing is to encode any field data before putting it back on the page, for example changing > to &gt; and so on. Encoded this way, malicious markup is displayed as text instead of being executed when it is added to the page.
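A minimal sketch of such an encoder, using only the JDK, is shown here. The class and method names are hypothetical, and it covers text placed in HTML element content or quoted attribute values; other contexts (JavaScript, URLs, CSS) need their own encoding.

```java
class HtmlEscapeSketch {

    // Replaces the characters that are significant in HTML with entities,
    // so user-supplied text is displayed instead of interpreted as markup.
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```

In a JSP, the JSTL <c:out> tag performs this kind of escaping by default. When rendering JSON with jQuery, inserting values with .text() rather than .html() has the same effect on the client side.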
I think you are doing a lot of the right things by whitelisting the data you expect for the different fields. Beyond that, for fields that must allow other, potentially problematic characters, encoding would fix the issue for you.
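For a field like the nick name, whitelisting on the server can be as small as one regular expression. The sketch below assumes the rule from the question (6-10 alphabetical characters); the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class NicknameValidatorSketch {

    // Whitelist: only ASCII letters, between 6 and 10 of them.
    private static final Pattern NICKNAME = Pattern.compile("[A-Za-z]{6,10}");

    static boolean isValidNickname(String value) {
        // matches() requires the whole string to fit the pattern,
        // so anything outside the whitelist is rejected.
        return value != null && NICKNAME.matcher(value).matches();
    }
}
```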
Further since you are using Ajax it gives you some protection as people cannot override values in URL parameters etc.
Look at the AntiSamy library. It allows you to define rulesets for your application, then run your user input through AntiSamy to clean it per your rules.
The easiest way is to do a simple replacement for the following:
< with &lt;
> with &gt;
' with \'
That will solve most database vulnerabilities.

Java Web Application

I am interested in creating a simple web application that will take in user input, convert it to an XML file and send the file to a database.
Coding wise I feel I am okay, it is just the general setup and what implementation to use I am a bit unsure of.
At the moment I have a JSP page containing a form. The user fills out the form and, on submit, a POST request is sent to a servlet. In its doPost() method the servlet instantiates a Java object and passes it the user-supplied data. The Java object then writes that data to an XML file and sends it to the database via REST.
All I would be interested to know is whether this is the standard/optimal way of creating such a web application.
Any and all feedback is appreciated.
Thanks
For a "simple web application" this high-level approach looks fine in general. However, if you want more critical feedback, you'd need to give more details about the low-level approach. It may, for example, turn out not to be memory efficient and thus break when the web app is being used by more than 10 users concurrently.
I only question the choice of the GET method. You'd normally only use it to retrieve data (SELECT), not to create or alter data (INSERT/UPDATE/DELETE). For that you'd normally use POST, so that no one can execute it "accidentally" by just clicking a (bookmarked) link. Changing GET to POST isn't that hard: add method="post" to the <form> element and rename doGet() to doPost().
