Giter Site home page Giter Site logo

Comments (29)

vkuznet avatar vkuznet commented on July 17, 2024

valya: Usage of encodeURIComponent function here is inappropriate since it encodes entire URI, rather then parameter values. For instance, let consider two calls from JS code shown in das_table.tmpl (I made stand-alone app to show this feature):

In first case we encode all parameters with encodeURIComponent inside of das_table.tmpl, e.g.

{{{
initialRequest: encodeURIComponent("input=find data&idx=0&limit=10")

}}}

and in this case the server get the following list of arguments

{{{

input kwargs {'input=find data&idx=0&limit=10': ''}

127.0.0.1 - - 966a4d2676b7013bd1fb3e238092cbc550322419Oct/2010:15:05:42] "GET /asearch?input%3Dfind%20data%26idx%3D0%26limit%3D10 HTTP/1.1" 200 236 "" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-us) AppleWebKit/533.18.1 (KHTML, like Gecko) Version/5.0.2 Safari/533.18.5"

}}}

In this case, all parameters are smashed into a single string and passed along to the server, while using urllib.quote_plus

{{{
initialRequest: encodeURIComponent("input="+urllib.quote_plus("find data") + "&idx=0&limit=10")

}}}

we get correct argument list on a server side

{{{

input kwargs {'input': 'find data', 'limit': '10', 'idx': '0'}

127.0.0.1 - - 966a4d2676b7013bd1fb3e238092cbc550322419Oct/2010:15:06:30] "GET /asearch?input=find+data&idx=0&limit=10 HTTP/1.1" 200 236 "" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-us) AppleWebKit/533.18.1 (KHTML, like Gecko) Version/5.0.2 Safari/533.18.5"

}}}

So, without doubts parameters need to be checked and properly encoded, but in this case usage of JS function is inappropriate.

from das.

vkuznet avatar vkuznet commented on July 17, 2024

valya: Replying to [ticket:545 valya]:

Also I note here that the quoting wasn't added universally everywhere - not in all templates, and not even systematically in the one example I happened to quote, das_table.tmpl. As I wrote before, it looks like every template needs to be sanitised. I can't easily tell which values are safe.

Lassi, I walk through templates and don't see obvious places which concerns me. I also not sure what do you mean by asking to sanitize every template. The data passed to templates and internal data of DAS server. For instance if in server I define a variable foo=1 and pass it into template which will use as $foo, does it require sanitization? I would appreciate if you'll pick up two distinguished examples and ask specific question which bother you about parameters passed to templates. I provided a comment about usage encodeURIComponent in das_table template. All templates are used to build HTML pages, some of them construct HTML links and my understanding that only those parts need to be sanitized to ensure that constructed HTML links are properly encoded.

from das.

vkuznet avatar vkuznet commented on July 17, 2024

valya: Lassi I would appreciate if you provide your feedback to my comments. In fact, to make your life easier I did clean the code and remove das_table.tmpl and associated code from DAS web server due to the fact that I no longer use this template. If you can find another example of inappropriate encoding in templates it would be helpful to me.

from das.

 avatar commented on July 17, 2024

lat: Everywhere where you produce HTML, you should sanitise / quote the value unless you know for sure it cannot be unsafe (e.g. it's an integer). HTML has two quoting contexts, URLs ({{{%nn}}}) and HTML ({{{&}}}), with subtly different rules what to quote.

Equally whenever you use javascript to generate HTML, you need to quote.

Then finally whenever you make requests from javascript via XHR, you again need to quote the arguments, unless you can without doubt prove that the argument is safe without quoting (e.g. integer).

Note that lack of quoting has been used in numerous attacks. Remember the attacks where people used funny searches - which just happened to <script> tags - to cause later results for "most recent searches" to cause arbitrary javascript to run, and worse yet, it runs in the security context of the originating page. It's not enough to say "it came from core server, it must be safe", if it's possible for someone to inject the data to the core server in the first place, either in form of a query, or database used, or some upstream source (e.g. funny file name).

So the general rule is that everything must be quoted by default, and you need to show positive proof why it is not necessary to quote the material.

from das.

vkuznet avatar vkuznet commented on July 17, 2024

valya: Lassi I am well aware of this general principles. What I asked is slightly different. Let me make concrete example without asking to look at specific template.

All our HTML is constructed via template. The template parameters are passed by server in variety of forms, e.g. strings, ints, lists, dicts, etc.. All of them are not part of the request and rather the results of it. The server get/construct data upon request from its back-end/other sources. The data is the python data which are passed to the template. My question is do you want to sanitize all python data we pass to template or not?

For example, the server has the following part:

{{{
def bla(self, ...):
mydata = dict(mydict={...}, mylist=[...], mykey=something, myval=something)
return template('template_name', data=mydata)
}}}

The template on another hand has code logic:

{{{
#for key, val in $data.items()

$key$val #end for }}}

So what do you want to sanitize here? The ''$key, $val'' or you don't need to do that?

My understanding that you want to sanitize parts which construct either URLs or JavaScript, but not the HTML body. So if I have code like

{{{
<a href="http://a.b.com/method?$mykey=$myval
}}}

then both ''$mykey'' and ''$myval'' need to be sanitized.

It would be nice if we will discuss this particular example to clarify things. Feel free to modify it to specify patterns/problems.

from das.

 avatar commented on July 17, 2024

lat: Thanks for the example. You should sanitise everything. Only if you know for sure the data is safe you can omit it. In particular, if you are representing data from another server, then by definition you cannot be sure of the contents, and you should sanitise it.

An example of safe not to sanitise is that say you page the data, and you set the page number in your own server code, and it cannot ever be anything other than an integer, and it's 100% obvious from inspecting the code that is the case, then you could not sanitise it. But if the page number is coming from the client XHR request, then you do need to sanitise it.

So by default you sanitise absolutely everything - HTML, URL (when generating e.g. attributes) and javascript (if you generate any) and in browser side, anything entered into URL arguments when making XHR requests or building URLs to refer to (=~ use encodeURIComponent()).

from das.

 avatar commented on July 17, 2024

lat: So quoting your example:

{{{
#for key, val in $data.items()

$quote($key)$quote($val) #end for }}}

And:

{{{
<a href="http://a.b.com/method?$urlquote($mykey)=$urlquote($myval)
}}}

In javascript:

{{{
var name = ...;
var href = "/some/base?name=" + encodeURIComponent(name);
}}}

from das.

 avatar commented on July 17, 2024

lat: In short, always sanitise everything :-)

from das.

vkuznet avatar vkuznet commented on July 17, 2024

valya: I started this work and found it's going to be real fun. Here is example when I cannot quote:

{{{

#set total = $idx+$limit
#if $total>$nrows
#set total=$nrows
#end if
#set startidx=$idx+1
$startidx-$total

}}}
the idx/limit are passed to template as integers.

Another example

{{{

$results

}}}
here "$results" are passed by DAS server code from another template. So the flow is the following

  • DAS get some data and pass it to template #1
  • it assigns variable "results" to the template itself, such that results is a string which contains HTML snippet (and may be JS)
  • I pass "results" into new template where it gets inserted into div tag. If I'll quote results in aforementioned div tag I can screw my HTML snippet.

So, what I want to say that just looking at templates it is not obvious to blindly require to quote every single variable with dollar prefix. What I can do is provide comments in template about variable which does not require the quoting.

from das.

 avatar commented on July 17, 2024

lat: Yes, quoting is fun, especially avoiding double quoting.

If you can prove a value is without shadow of doubt an integer, it doesn't need to be quoted. That requires it's not tainted, e.g. the value didn't originate from the client request in any way (e.g. current page number), or from upstream data source. Looking at the source code it must be without any doubt clear it's obviously an integer, and cannot ever have non-integer values. As you suggest it's helpful to have comments explaining what the data is.

Your second example is a good example why it's better to design the application so it has single pass over data, and quite possibly, prefers json/xml over html. So I'd recommend to change the design so that it's always clear if the data is tainted or untainted. Mixing content like done in your example is a good example to avoid: it's unclear what state the data is in. It would be very much more difficult for anyone reading the code to understand the multi-layer templating.

from das.

vkuznet avatar vkuznet commented on July 17, 2024

valya: Replying to [comment:11 lat]:

Yes, quoting is fun, especially avoiding double quoting.

If you can prove a value is without shadow of doubt an integer, it doesn't need to be quoted. That requires it's not tainted, e.g. the value didn't originate from the client request in any way (e.g. current page number), or from upstream data source. Looking at the source code it must be without any doubt clear it's obviously an integer, and cannot ever have non-integer values. As you suggest it's helpful to have comments explaining what the data is.

The proof can be easily done inside template by using
{{{
assert isinstance(idx, int)
}}}
statements, I think it would be sufficient.

Your second example is a good example why it's better to design the application so it has single pass over data, and quite possibly, prefers json/xml over html. So I'd recommend to change the design so that it's always clear if the data is tainted or untainted. Mixing content like done in your example is a good example to avoid: it's unclear what state the data is in. It would be very much more difficult for anyone reading the code to understand the multi-layer templating.

I tend to agree, even it's very seducing to dynamically create content using nested templates.

from das.

 avatar commented on July 17, 2024

lat: Yes asserts are fine if they work inside the template engine. As for page contents, I tend to prefer to use javascript for rendering :-)

from das.

vkuznet avatar vkuznet commented on July 17, 2024

valya: One more question. I show on a web page the data-records via the following template (das_code.tmpl)

{{{

$code

}}}

with server code like this one:

{{{
record = self.dasmgr.mapping.api_info(name)
show = kwargs.get('show', 'json')
page = "DAS mapping record"
if show == 'json':
jsoncode = {'jsoncode': json2html(record, "")}
page += self.templatepage('das_json', **jsoncode)
elif show == 'code':
code = pformat(record, indent=1, width=100)
page += self.templatepage('das_code', code=code)
else:
code = yaml.dump(record, width=100, indent=4,
default_flow_style=False)
page += self.templatepage('das_code', code=code)
}}}

The "code" here represents DAS records (I should probably rename it for clarity, but it's not a point for this question), which I don't know a-priory, since DAS is data agnostic. All records are JSON and retrieved from various DAS collections (MongoDB analog of tables). I represents them in different format to allow users to see them appropriately. Obviously I don't want to quote them, but at the same time I don't know their structure. How to address this issue, does the usage of pre tag, e.g. "

$code
", is sufficient in this case?

from das.

 avatar commented on July 17, 2024

lat: I am not sure I understand the question, but if 'code', i.e. result of pformat, is arbitrary text data, you need to sanitise it. Using

 doesn't affect text interpretation, only appearance/formatting (mono-width, follow line breaks), so that's immaterial.

from das.

vkuznet avatar vkuznet commented on July 17, 2024

valya: Let me make example. DAS aggregates data from different data-services. One data-service, e.g. DBS, yields in DAS the following document

{{{
{'test': 1}
}}}

and another data-service, e.g. PhEDEx, yields in DAS the following record:

{{{
{'test': 1, 'site': [{'se':'A'}, {'se': 'B'}]
}}}

DAS does not know a-priory the structure of those documents, we just store records in DB.

On a web interface we want to see our aggregated records. The server gets data from its DB (MongoDB) and send to templates. I provided templates to view records in different representations (using pformat, etc.) The point is that I don't know in my code the structure of the record, it can be a dict, a list, a dict with lists, a dict with another dict which by itself can contain strings, integers, lists, etc. What I know is that all records are JSON. My question were:

  • how I can put JSON into our HTML?
  • should we sanitize the JSON data? if so, how to generalize sanitization of arbitrary JSON structure?

from das.

 avatar commented on July 17, 2024

lat: If you are putting arbitrary text into a html page, you have to sanitise the text string. In your case it happens the text is actually json. That doesn't really affect anything, as far as I can tell.

from das.

vkuznet avatar vkuznet commented on July 17, 2024

valya: Yes, but as a user you want to see this

{{{
{
"test":1
}
}}}

instead of this

{{{
%7B%0A%20%22test%22%3A1%20%0A%7D
}}}

So it is UI problem. May be we should reconsider what we should display to end-users and provide a link to method which will return JSON docs, instead of HTML.

from das.

 avatar commented on July 17, 2024

lat: I am very confused. Sanitising to HTML does not convert the text to the form you quote. It will replace " with ", < with < and so on.

from das.

PerilousApricot avatar PerilousApricot commented on July 17, 2024

meloam: Replying to [comment:19 lat]:

I am very confused. Sanitising to HTML does not convert the text to the form you quote. It will replace " with ", < with < and so on.

I think val is using urlencoding and not escaping html entities. One generates ampersands, the other makes percents. If he's using these strings to be displayed in html, he needs html escaping. If it goes in a url, it needs url encoding.

from das.

vkuznet avatar vkuznet commented on July 17, 2024

valya: I used python urllib.quote function.

{{{
print sss
{
"test":1
}
}}}

{{{
print urllib.quote(sss)
%7B%0A%20%22test%22%3A1%20%0A%7D
}}}

So, do we have python library for sanitization?

from das.

PerilousApricot avatar PerilousApricot commented on July 17, 2024

meloam: Replying to [comment:21 valya]:

I used python urllib.quote function.

{{{
print sss
{
"test":1
}
}}}

{{{
print urllib.quote(sss)
%7B%0A%20%22test%22%3A1%20%0A%7D
}}}

So, do we have python library for sanitization?

the cgi module has an escape() function, but I don't know if you want to pull that dependency in. I'm sure you and lassi could come up with a suitable replacement, the rules are pretty simple.

from das.

vkuznet avatar vkuznet commented on July 17, 2024

valya: I can use cgi.escape which says

Replace special characters "&", "<" and ">" to HTML-safe sequences.
If the optional flag quote is true, the quotation mark character (")
is also translated.

Do we need to escape (") character? If so all JSON will look like

{ "test" : 1}

on a page.

from das.

PerilousApricot avatar PerilousApricot commented on July 17, 2024

meloam: Replying to [comment:23 valya]:

I can use cgi.escape which says

Replace special characters "&", "<" and ">" to HTML-safe sequences.
If the optional flag quote is true, the quotation mark character (")
is also translated.

Do we need to escape (") character? If so all JSON will look like

{ "test" : 1}

on a page.

It won't. Web browsers universally interpret " to mean ' when rendering the body of a webpage. There's a whole slew of HTML special entities that get interpreted that way.

from das.

PerilousApricot avatar PerilousApricot commented on July 17, 2024

meloam: Replying to [comment:24 meloam]:

Replying to [comment:23 valya]:

I can use cgi.escape which says

Replace special characters "&", "<" and ">" to HTML-safe sequences.
If the optional flag quote is true, the quotation mark character (")
is also translated.

Do we need to escape (") character? If so all JSON will look like

{ "test" : 1}

on a page.

It won't. Web browsers universally interpret " to mean ' when rendering the body of a webpage. There's a whole slew of HTML special entities that get interpreted that way.

Sorry, I meant that " gets translated to ", not ' .. (how do you quote a quotation character??)

from das.

 avatar commented on July 17, 2024

lat: As per the earlier reply, there are two distinct contexts for sanitisation in HTML (html context, url context), and you need to use quoting primitives according to the context of contents. There's no particular need to sanitise double or single quotes, that was just an example. You should use existing library functions for sanitisation, not roll your own. IIRC the routines are cgi.escape() for HTML context and urllib.quote_plus() for URL context.

from das.

PerilousApricot avatar PerilousApricot commented on July 17, 2024

meloam: Replying to [comment:26 lat]:

As per the earlier reply, there are two distinct contexts for sanitisation in HTML (html context, url context), and you need to use quoting primitives according to the context of contents. There's no particular need to sanitise double or single quotes, that was just an example. You should use existing library functions for sanitisation, not roll your own. IIRC the routines are cgi.escape() for HTML context and urllib.quote_plus() for URL context.

There's certainly no need to escape quotes if you're just writing to the body of a tag, but it doesn't hurt anything, and it keeps you from having to be super careful about remembering to do the extra escaping when you put user-data into attributes of tags (for example, USERTEXT could end up being bad if the user puts the test as:

" /><script lang=blah blah blah,etc"

from das.

 avatar commented on July 17, 2024

lat: OK, right, so there's three different sanitisation contexts, one more for HTML non-URL attributes. Anyway, still, should use existing quoting functions if at all possible. Last thing I want us to debug is bugs in someone's sanitisation routines.

from das.

vkuznet avatar vkuznet commented on July 17, 2024

valya: Thanks for clarifications. Now it becomes clear how I can accomplish this. Here is layout. I made a new function

{{{
def quote(data):
"""
Sanitize the data using cgi.escape.
"""
if isinstance(data, int) or isinstance(data, float):
res = data
else:
res = cgi.escape(str(data), quote=True)
return res
}}}

which can be used to sanitize non-URL content. The templates will get it as simple as

{{{
#from DAS.web.utils import quote
#assert isinstance($jsoncode, str)

$quote($jsoncode)

}}}

and I'll use urlllib.quote_plus for sanitization of URL attributes in HTML and encodeURIComponent for JS parts. I will also provide appropriate assert statements to check passed arguments types.

from das.

vkuznet avatar vkuznet commented on July 17, 2024

valya: (In adb3d0e) Sanitize DAS templates, fixes #545

Signed-off-by: Valentin Kuznetsov [email protected]

from das.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.