Time
9 hours 31 minutes
Difficulty
Intermediate
CEU/CPE
10

Video Description

This lesson discusses a two-fold mitigation strategy:

* input validation: whitelists, blacklists, and regular expressions (regex)
* output encoding

The instructor offers samples and discusses what to include when doing defensive coding, with the emphasis that a whitelist is more desirable than a blacklist because a blacklist only covers the attacks known when it was written. It is also crucial not to run every field through a single regex, as that nullifies the validation.

Video Transcription

00:03
Hello and welcome to the Cybrary Secure Coding course. My name is Sunny Wear, and this is OWASP Top 10 for 2013, cross-site scripting mitigations, countermeasures, and defenses.
00:19
Now, for our defenses overview, we're basically going to look at the two-fold mitigation strategy,
00:28
and those two components are input validation and output encoding.
00:34
Now, the reason why I'm putting them together is because in addressing any of the cross-site scripting attacks that we looked at,
00:44
you need to incorporate both of these in order to properly mitigate the problem. Now, for input validation, we're speaking only in terms of server-side input validation. Never do we want to have
01:00
any validation done on our client side because, as we've seen in our demos,
01:06
it is very easy for attackers to manipulate that information,
01:11
to manipulate any kind of validation checks that are done within the browser.
01:17
We're going to look at whitelisting, blacklisting, and regular expressions, which are common ways that input validation is done if you don't have a built-in framework method or function available,
01:32
and then we'll take a look at a code sample showing some input validation.
01:38
The other main area is output encoding.
01:42
Now, output encoding is going to neutralize any kind of special characters that might be misinterpreted by either the web server or web browser that might be contained in an HTTP response.
01:57
So what output encoding essentially does is it forces a conversion of all characters
02:06
to some sort of character scheme or character set.
02:10
Now, why is this advantageous? Because that forces the characters to be treated as data instead of the web server or the web browser actually executing malicious scripts unintentionally.
02:28
Generally speaking, URL encoding is used for output encoding, and it usually conforms to UTF-8, and I'll explain why in just a moment.
02:38
We will also look at a code sample of that. And also I'm going to go through the output encoding web page contexts that you need to make sure you include when doing your defensive coding. Now, input validation techniques.
02:54
As I mentioned, if you do not have something built into your framework already, these are some alternatives: whitelists, blacklists, and regular expressions. Now, whitelists we spoke about in A1 Injection as another defense there.
03:13
These are the same ones. Basically, you're going to define your whitelist to be as specific and tight as you possibly can,
03:23
yet wide enough to still allow for the business functionality to be maintained in the page. Now, what do I mean by that? What I mean is, if you have a text box that only receives a name,
03:38
then you should only receive
03:40
alphabetic characters. You should not receive any numbers. That's just one example.
03:47
So obviously we've gone through whitelists and how what they do is they only
03:53
allow acceptable values to pass through,
03:59
and those acceptable values can actually be defined in an array, in an enumeration, possibly in constants, maybe defined in a header file or common area,
04:11
and, of course, in regular expression pattern matching.
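As a minimal sketch of a whitelist defined in an enumeration: the enum constants are the only acceptable values, and anything else is rejected. The `ShippingSpeed` enum, its values, and the `parse` helper are hypothetical names chosen for illustration and do not come from the lesson's slides.

```java
import java.util.Optional;

final class EnumWhitelist {
    // Hypothetical whitelist: these constants are the only acceptable values.
    enum ShippingSpeed { STANDARD, EXPRESS, OVERNIGHT }

    // Returns the matching constant, or empty if the input is not on the whitelist.
    static Optional<ShippingSpeed> parse(String input) {
        if (input == null) {
            return Optional.empty();
        }
        for (ShippingSpeed speed : ShippingSpeed.values()) {
            if (speed.name().equals(input)) {
                return Optional.of(speed);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("EXPRESS").isPresent());  // prints true
        System.out.println(parse("<script>").isPresent()); // prints false
    }
}
```

Comparing against the constants in a loop, rather than calling `ShippingSpeed.valueOf`, avoids using an exception for ordinary rejection of bad input.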
04:15
Now, blacklists are exactly the opposite. They're going to reject malicious characters that are defined in the listing, but realize that those are just the malicious characters that are known at the time of the writing,
04:30
and so that makes blacklists very limited in their effectiveness. And remember that every time there's a new exploit, a new way to get around a blacklist, that's going to require more manual updating of your code and recompiling of the code, et cetera.
04:47
So the preference is to always go with the whitelist over a blacklist.
04:54
And then, finally, you can use regular expressions. Of course,
04:57
regular expressions could be part of your white list if you like.
05:01
This does require some learning of regex pattern building. You can consult some very good books by O'Reilly on the subject, but to give an example, here I have a caret, the less-than sign, the greater-than sign, the ampersand, some slashes, and a tick.
05:19
This actually is a regular expression,
05:23
the caret meaning not,
05:26
and then the other characters being literal. And so this could be used in a blacklist fashion, for example, to not accept any of these characters. But as we talked about before, you can use URL encoding to easily get around a blacklist of this type.
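The bypass described here can be sketched as follows. The exact pattern on the slide is not reproduced in the transcript, so the negated character class below (less-than, greater-than, ampersand, forward slash, backslash, single quote) is an assumption, as are the class and method names. The sketch shows that a URL-encoded payload contains none of the blacklisted characters until something decodes it.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

final class BlacklistBypassDemo {
    // Assumed blacklist, written with a caret: the input may contain anything
    // except <, >, &, forward slash, backslash, and single quote.
    private static final Pattern NO_BLACKLISTED_CHARS = Pattern.compile("[^<>&/\\\\']*");

    // True if the input contains none of the blacklisted characters.
    static boolean passesBlacklist(String input) {
        return NO_BLACKLISTED_CHARS.matcher(input).matches();
    }

    public static void main(String[] args) {
        String encoded = "%3Cscript%3Ealert(1)%3C%2Fscript%3E";
        // Requires Java 10 or later for the Charset overload.
        String decoded = URLDecoder.decode(encoded, StandardCharsets.UTF_8);

        System.out.println(passesBlacklist(encoded)); // prints true: the payload slips through
        System.out.println(passesBlacklist(decoded)); // prints false: caught only after decoding
    }
}
```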
05:45
The illustration is just to show you what is involved in creating and understanding how to build regular expressions.
05:54
One other caveat.
05:56
Please do not use one regex pattern
06:00
to have all of the fields in your entire Web application go through.
06:04
That basically makes the
06:09
validation completely nullified. So that's why I was saying, Go back to the business purpose for a particular field.
06:19
See if you can identify what would be acceptable and then create a special whitelist for that field. And you have repeated fields in every application: name, address, et cetera.
06:34
And so you should probably come up with a common library that can address those common types so that you can reuse given regex patterns.
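One way such a common library might look is a small class holding one whitelist pattern per field type. This is only a sketch: the class name, the field types, and the patterns themselves are assumptions for illustration, and real patterns should come from each field's business rules.

```java
import java.util.regex.Pattern;

final class FieldWhitelists {
    // Illustrative patterns only; derive real ones from the business purpose of each field.
    static final Pattern NAME = Pattern.compile("[A-Za-z]{1,50}");
    static final Pattern US_ZIP = Pattern.compile("\\d{5}(-\\d{4})?");

    // True only if the whole value matches the whitelist for its field type.
    static boolean isValid(Pattern whitelist, String value) {
        return value != null && whitelist.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid(NAME, "Sunny"));   // prints true
        System.out.println(isValid(NAME, "12345"));   // prints false
        System.out.println(isValid(US_ZIP, "12345")); // prints true
    }
}
```

Keeping a separate pattern per field type is what preserves the tightness of each whitelist: a value that is acceptable as a ZIP code is still rejected as a name.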
06:46
So moving now to our input validation code sample, as we saw before in our explanation section, we have a variable from our client side called first name,
06:59
and we can get the value stored in that variable very easily. From our request object,
07:05
we can call the getParameter method, and then we can take that value, whatever might be stored in that variable, and assign it to a local variable. In this case, it's called first name parameter. Now, instead of immediately starting to use first name parameter,
07:24
what we would do instead is pass that value through some sort of input validation. And in this case, the whitelist is made from a regular expression. Now, the regular expression I have here is declared as my data validation whitelist.
07:44
It's set to receive only alpha characters,
07:47
so it will receive a through z lowercase and A through Z uppercase, every single letter.
07:56
That means that should there be some sort of character, number, or special character contained in the first name variable that does not conform to this pattern, it will be rejected.
08:11
And so, as you look at the method must pass whitelist check, you can see how that's done. We have a Pattern.matches call that passes in the value that we received and compares it against our regex. If something should go wrong and it does not match,
08:31
then I'm actually throwing
08:33
a custom exception, a whitelist failure exception.
08:37
So this is an example of how you could do the same in your code.
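The slide itself is not included in the transcript, so the following is a reconstruction from the narration rather than the instructor's actual code. The identifiers `DATA_VALIDATION_WHITELIST`, `mustPassWhiteListCheck`, and `WhiteListFailureException` are guesses at the names spoken aloud. A plain string stands in for the servlet's `request.getParameter` call so the sketch runs without a servlet container, and the exception is unchecked here purely to keep the example short.

```java
import java.util.regex.Pattern;

final class FirstNameValidation {
    // Whitelist regex: one or more ASCII letters, lowercase or uppercase.
    static final String DATA_VALIDATION_WHITELIST = "[a-zA-Z]+";

    // Custom exception thrown when a value fails the whitelist (name assumed from the narration).
    static class WhiteListFailureException extends RuntimeException {
        WhiteListFailureException(String message) {
            super(message);
        }
    }

    // Returns the value unchanged if it matches the whitelist; otherwise throws.
    static String mustPassWhiteListCheck(String value) {
        if (value == null || !Pattern.matches(DATA_VALIDATION_WHITELIST, value)) {
            throw new WhiteListFailureException("Input failed whitelist validation");
        }
        return value;
    }

    public static void main(String[] args) {
        // In a servlet, this value would come from request.getParameter("firstName").
        String firstNameParameter = "Sunny";
        System.out.println(mustPassWhiteListCheck(firstNameParameter)); // prints Sunny
    }
}
```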
08:41
Now let's talk about output encoding. So as I mentioned, output encoding helps to neutralize characters, particularly in our HTTP response that we send back out into the browser,
08:58
and URL encoding in particular is the general standard that's used by web servers for the display of characters
09:11
as well as the receiving of characters, which is why you see a lot of attacks done with URL encoding for the input, if you've noticed. And that is for this very same reason, just the flip side of the coin, if you will.
09:26
And that is to ensure that the Web server actually understands the special characters. And so the attacker will
09:35
send that input through URL encoding.
09:37
Well, it can be used in a defensive way as well, for the HTTP response.
09:45
And so URL encoding actually replaces characters in a string with one or more character triplets. Now, what do I mean by triplets? Basically, it's going to be a percent sign followed by two hexadecimal digits.
10:01
So, for example, %2E is actually the dot notation.
10:07
So dot meaning current directory. Now, you should realize that UTF-8 is the most predominant character set that's generally used these days
10:20
for URL encoding. And the major reason why is because the first 128 values of UTF-8 actually map directly to the US-ASCII codes. If you take a look at the excerpts I have below,
10:39
you can see the Unicode code point.
10:43
The actual character, which is what we recognize, right, the character literal.
10:48
And then you can see the UTF-8 hexadecimal number.
10:54
And so
10:54
what we have here is a less-than sign, which is going to be represented in UTF-8 as 3C, which means in URL encoding it's going to be %3C. If we look at the greater-than sign,
11:09
it's the same idea: we have the greater-than sign in
11:16
ASCII.
11:18
But in UTF-8, it's going to be represented as 3E. Of course, making it URL encoding means %3E.
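The triplets discussed above can be reproduced with a few lines of code. This is a hand-rolled illustration, not a production encoder: the class and method names are hypothetical, and it percent-encodes every UTF-8 byte so that the mapping is visible even for characters such as the dot, which the standard `java.net.URLEncoder` leaves unencoded.

```java
import java.nio.charset.StandardCharsets;

final class PercentTriplets {
    // Encodes every UTF-8 byte of the input as a percent sign plus two hex digits.
    static String toTriplets(String input) {
        StringBuilder out = new StringBuilder();
        for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
            out.append(String.format("%%%02X", b & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toTriplets("<")); // prints %3C
        System.out.println(toTriplets(">")); // prints %3E
        System.out.println(toTriplets(".")); // prints %2E
    }
}
```

Because the first 128 code points are identical in US-ASCII and UTF-8, each of these characters is a single byte and therefore a single triplet.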
11:26
And so taking that same example, and continuing with our output encoding, we can see how we can take the response that we're going to send back from this HTML page that's going to echo hello and someone's name
11:43
and see how we can instead send it through our regex whitelist to sanitize it
11:52
and also on the response on the way out, send it through some sort of encoder.
11:58
Now, the encoder that I'm showing on the slide is actually a Java encoder library that's available through OWASP,
12:09
but many
12:11
frameworks these days have built in encoders that you can call and so you would pass in your sanitized variable name into that encoding method or function
12:24
prior to sending it back out to your Web page.
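As a rough sketch of encoding on the way out, the example below builds the "hello" response and passes the name through an encoder before it reaches the page. The encoder here is a deliberately minimal, hand-written HTML entity encoder so the example has no external dependencies; it is an assumption standing in for the library shown on the slide, and the class and method names are hypothetical. In real code, a vetted encoder from your framework or from a library such as the OWASP one the instructor mentions is the safer choice.

```java
final class HelloResponse {
    // Minimal HTML-body encoder for illustration only; prefer a vetted library in production.
    static String encodeForHtml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Builds the echoed greeting, encoding the name just before it is written out.
    static String greeting(String name) {
        return "<p>Hello " + encodeForHtml(name) + "</p>";
    }

    public static void main(String[] args) {
        // prints <p>Hello &lt;script&gt;alert(1)&lt;/script&gt;</p>
        System.out.println(greeting("<script>alert(1)</script>"));
    }
}
```

Encoding at the point of output means that even a value that somehow got past input validation is rendered as text rather than executed.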
12:28
And then, finally, I want to make you aware of all the many contexts that need to be encoded,
12:35
and I know that this could be a bit dizzying because
12:39
you don't realize just how many attack vectors are available on a Web page until you start watching a lot of demos or seeing a lot of exploits done. But
12:52
this is to make you aware that it's very important that you provide encoding
13:00
for all of the different contexts that might be available in your web page.
13:03
So at a minimum, you should look at these contexts: HTML body,
13:09
HTML attributes, your URL,
13:13
any JavaScript that's in there, including any JSON or jQuery type calls,
13:22
and cascading style sheets.
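To illustrate why the context matters, the sketch below encodes the same input differently for three of the contexts listed above. Everything here is a simplified assumption: the `Context` enum, the method names, and the encoding rules are hand-written for illustration, cover only three contexts, and are much cruder than what a real encoding library does.

```java
import java.nio.charset.StandardCharsets;

final class ContextEncoders {
    // Hypothetical subset of the output contexts named in the lesson.
    enum Context { HTML_BODY, URL_COMPONENT, JAVASCRIPT_STRING }

    private static boolean isAsciiAlphanumeric(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    // Encodes the input according to the context it will be written into.
    static String encode(Context context, String input) {
        StringBuilder out = new StringBuilder();
        switch (context) {
            case HTML_BODY: // named entities for the HTML-significant characters
                for (char c : input.toCharArray()) {
                    if (c == '&') out.append("&amp;");
                    else if (c == '<') out.append("&lt;");
                    else if (c == '>') out.append("&gt;");
                    else if (c == '"') out.append("&quot;");
                    else if (c == '\'') out.append("&#39;");
                    else out.append(c);
                }
                break;
            case URL_COMPONENT: // percent triplets for every non-alphanumeric UTF-8 byte
                for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
                    char c = (char) (b & 0xFF);
                    if (isAsciiAlphanumeric(c)) out.append(c);
                    else out.append(String.format("%%%02X", b & 0xFF));
                }
                break;
            case JAVASCRIPT_STRING: // four-digit hex escapes for every non-alphanumeric character
                for (char c : input.toCharArray()) {
                    if (isAsciiAlphanumeric(c)) out.append(c);
                    else out.append(String.format("\\u%04X", (int) c));
                }
                break;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (Context context : Context.values()) {
            System.out.println(context + ": " + encode(context, "<b>"));
        }
    }
}
```

The same three characters come out as entities, percent triplets, or hex escapes depending on where they will land, which is why a single encoder applied everywhere is not enough.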
13:24
Now, generally, you would have some sort of library that would be able to encode for all of these different contexts for you. What I'm showing is the drop-down list for the encoder that I had just mentioned on the previous slide.
13:41
But look inside of your framework and make sure that the different contexts can be encoded for.
13:48
Now, we're going to move on to the lab portion of this module.

Up Next

Secure Coding

In the Secure Coding training course, Sunny Wear will show you how secure coding is important when it comes to lowering risk and vulnerabilities. Learn about XSS, Direct Object Reference, Data Exposure, Buffer Overflows, & Resource Management.

Instructed By

Sunny Wear
Instructor