Giter Site home page Giter Site logo

Comments (9)

timelyportfolio avatar timelyportfolio commented on May 26, 2024 2

This is entirely my opinion, but for what it is worth...

networkD3 whitelists columns but not for security, and I often find myself adding back lost data. As far as I know, no other htmlwidgets constrain passed data. leaflet might (but again unintentionally I believe) depending on the method. While users should be aware that all data might get passed as json, I don't think the htmlwidget author should bear responsibility for determining what columns might or might not be important and what columns might or might not contain sensitive information.

from timevis.

daattali avatar daattali commented on May 26, 2024

A bit more context:

In order to create a timevis widget, the user calls the timevis(data) function in R, where data is a data.frame containing the data to show, and it needs to have some specific columns in order for the visualization to work. For example:

data <- data.frame(content = c("event 1", "event 2"),
                   start = c("2016-01-10", "2016-01-12"),
                   end = c("2016-01-13", "2016-01-14"))
timevis::timevis(data)

results in

image

Since this package is a binding to a third party javascript library and that javascript library is responsible for building the visualization, the data dataframe that is given to the timevis() R function gets passed directly to the javascript layer. This dataframe is accessible in the javascript console as a json object. This means that if you have additional columns in the dataframe that are not actually needed to build the visualization, they will still be available, hence the potential "information leakage".

I want to hear from others if they think this is a big security issue and if timevis should explicitly remove any non standard columns before passing the data to javascript. I don't love that idea because 1. then the package will have to play catchup with visjs to make sure we always have a full list of all the supported
columns, and 2. it's possible that the names of columns is not a finite list, maybe there are some options that allow the user to define custom columns that are used (the documentation for visjs is long with a lot of options and I haven't gone through it in detail recently).

I think simply documenting the data parameter to say that the entire dataframe will be visible to the client is sufficient. Would like to hear some opinions.

I wonder if other htmlwidgets whitelist columns in the data they use?

from timevis.

daattali avatar daattali commented on May 26, 2024

@greg-minshall My thoughts are along the same as what @timelyportfolio said. For now I would feel better about just adding a few words in the parameter documentation rather than making assumptions about and changing the data being passed in

from timevis.

greg-minshall avatar greg-minshall commented on May 26, 2024

hi. so, i sort of convinced myself other than the two of you. let me see if i can explain, both my proposed solution, and (harder) my rationale. my proposed solution:

  1. timevis() drops unknown columns from the input dataframe. but, it does so "noisily", i.e., by issuing a warning. the warning includes a comment that hitherto unknown (to timevis()) columns will be allowed through if they are named in the (brand new) "passcolumns=" parameter.

  2. the "passcolumns=" parameter, nominally NULL, can contain a list ("passcolumns=c("mightynew", "newest")) naming additional columns to allow through.


the warning in #1 is to (try to) prevent mystification of the user when their "mightynew" column doesn't seem to be having any effect. "passcolumns=" then decouples, to some extent, releases of timevis() from upgrades in vis.js() Timeline.

my reasoning is that security is a hard problem. while we can't fix the problem, we can semi-plug one hole, maybe stopping some very embarrassing leak from happening in the future. (if we do prevent it, we'll never know it; if we don't, we'll possibly hear about it.) on the other hand, every small possibility increases the overall likelihood of security failures.

from timevis.

daattali avatar daattali commented on May 26, 2024

I do understand your viewpoint eg. this is a security issue that timevis can potentially take some measure against. But in my opinion this is overstepping what timevis is, it's simply a visualization, if someone has sensitive data then it's on the user to not pass that sensitive data around if not needed, to any function. You shouldn't pass it to R functions if it's truly sensitive information that is not needed for the function you're calling, because every function you call is a blackbox inside and you don't know what it does with it (unless you actually look at the source code). To me it doesn't seem like it makes sense for such a simplistic project that's an R binding to a JS library to try to solve this problem.

If others will agree that it's the correct thing to do, I'm open to changing my mind, which is why I specifically asked on Twitter for people to comment here if they have an opinion.

Or, alternatively, if you'll find several other htmlwidgets that also have that line of thinking that variables should be whitelisted for security reasons, then it would also prompt me to reconsider. But the argument of a security hole sounds too weak for me because it seems like the user should just not use a dataframe with information that they really don't want to share.

I'll leave this issue open for a while to see if others chime in!

from timevis.

greg-minshall avatar greg-minshall commented on May 26, 2024

Dean, @timelyportfolio,

one thing to keep in mind is that, by definition, the user we are protecting is the user who (after the warning is on the man page) is, in fact, doing the wrong thing, making a mistake. the user who reads, understands, and pays attention to the warning is not the person we are trying to protect. rather, it is the person who, for whatever reason, doesn't do all those things. but, you know how we are when we are trying to get something to work, we read this and that, look at google, stackexchange, etc. then, we try this and that, possibly forgetting -- certainly at my age! -- some things we previously read while doing so. the programmer who reads all the documentation, follows all the guidelines, imho, is a bit like homo economicus.

in a scenario, the department clerk is told to put together a presentation for the next City Council meeting of what how the department employees are scheduled for the next six months. remembering that Janet did something similar, he asks and she sends him some code, using timevis. working on that, the clerk notices Janet gets the employee task database using the office API (which maybe involves, behind the curtains, a few joins), and then forms a new datafram with "name", "group", etc. but, the clever clerk says, i already have a dataframe, i'll just rename a few columns ("Name" to "name"; "Section" to "group", etc.), pass it to timevis(), and wow! since the meeting is public, the resulting web page is published online for the council members, and interested members of the public, to see. to see (well, for hackers), including, the non-timevis, non-vis.js columns such as mobile phone numbers, salary, etc.

timevis is a really nice package, and hopefully it will get an expanding user base. but, the likelihood of a leak is, i would guess, proportional to the number of users. (of course, i already did a leak; luckily not that horrible a one, just some e-mail address that, presumably, nobody noticed.)

so, obviously it's you decision, but i still offer up my solution (which i'm happy to code up): new columns can be passed, but they must be explicitly specified as new columns.

from timevis.

greg-minshall avatar greg-minshall commented on May 26, 2024

i think i did a boo boo. reopening.

from timevis.

daattali avatar daattali commented on May 26, 2024

You did in fact boo boo, let's keep it open :) I completely understand your argument. However, I still personally think that would be overstepping the boundary of the package, and that if someone has sensitive info the burden of being careful with it and not giving it away to other services is on them, not on the service to try to throw away what's given to it. This is clearly not a black and white matter and involves some personal judgment on what the solution (or no solution) is, so it would help if I would see that other htmlwidgets do take this into consideration. I haven't looked, but I don't think they do, and I imagine it's from similar reasons to mine. But I may be wrong .

from timevis.

daattali avatar daattali commented on May 26, 2024

As mentioned in PR #42 , instead of keeping this open forever, I'm closing this issue as a non-fix

from timevis.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.