Comments (2)
Semantics of the causes and associates relationships & language design
Based on the user studies, I think it makes sense to clarify what the causes relationship stands for by providing several different flavors, such as may_cause, i_hypothesize_this_causes, and definitely_causes. (The names could use some work, but I am going for clarity here, not good naming.) My question is how this would change the implementation. Is there any way to distinguish between these in statistics?
I can see us applying some basic heuristics. For example, if we have a.may_cause(b) and a is our DV, then we certainly don't want to suggest b as an IV, or we should at least give a very strong warning against it (i.e., make users jump through hoops: if they check it off in the GUI, they have to go through a whole "Terms and Conditions"-style popup and say that they agree multiple times). (If I'm remembering my crash course on causal modeling correctly, we were grappling with this issue either the night of the submission or a few days before, and it was all a blur.) This may keep users who are wary of stating a.causes(b) from falling back on a.associates_with(b) instead and having Tisane suggest b as an IV.
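A minimal sketch of that heuristic, assuming causal relationships are stored as a plain dict that maps each variable name to the set of variables it may cause. The helpers descendants and suggest_ivs are hypothetical, not Tisane's API. The sketch also goes one step beyond the example above by flagging every descendant of the DV rather than only its direct effects, since anything downstream of the DV raises the same reverse-causation concern.

```python
# Hypothetical sketch; these helpers are not part of Tisane's API.
from typing import Dict, List, Set, Tuple


def descendants(graph: Dict[str, Set[str]], node: str) -> Set[str]:
    """Return every variable reachable from `node` by following causal edges."""
    seen: Set[str] = set()
    stack = list(graph.get(node, set()))
    while stack:
        current = stack.pop()
        if current not in seen:  # `seen` also guards against cycles
            seen.add(current)
            stack.extend(graph.get(current, set()))
    return seen


def suggest_ivs(
    graph: Dict[str, Set[str]], dv: str, candidates: List[str]
) -> Tuple[List[str], List[str]]:
    """Split candidate IVs into (suggested, flagged).

    Flagged variables are downstream of the DV (e.g. dv.may_cause(b)), so a
    GUI would only offer them behind a strong warning.
    """
    downstream = descendants(graph, dv)
    suggested = [v for v in candidates if v != dv and v not in downstream]
    flagged = [v for v in candidates if v in downstream]
    return suggested, flagged


# a.may_cause(b), b.may_cause(c), d.may_cause(a); a is the DV.
graph = {"a": {"b"}, "b": {"c"}, "d": {"a"}}
print(suggest_ivs(graph, "a", ["b", "c", "d"]))  # (['d'], ['b', 'c'])
```

Returning the flagged list, instead of silently dropping those variables, leaves room for the "jump through hoops" confirmation rather than a hard ban.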
The i_hypothesize_this_causes form seems like it would require no implementation changes; instead, it is more of a note for the researcher that "this is something I am hypothesizing about."
The definitely_causes form would be for things that everyone can agree on, like "drunk driving kills people" or "smoking causes cancer."
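One way to read the three flavors is as aliases that build the same causal edge and differ only in a strength tag, which later stages can act on (for example, the may_cause heuristic) or simply record (i_hypothesize_this_causes as a note to the researcher). This is a hypothetical sketch; Variable, Causes, and Strength are illustrative classes, not Tisane's actual API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Strength(Enum):
    """Hypothetical strength tags, one per proposed flavor of `causes`."""
    MAY = "may_cause"
    HYPOTHESIZED = "i_hypothesize_this_causes"
    DEFINITE = "definitely_causes"


@dataclass
class Causes:
    cause: str
    effect: str
    strength: Strength


@dataclass
class Variable:
    name: str
    relationships: List[Causes] = field(default_factory=list)

    def _causes(self, other: "Variable", strength: Strength) -> Causes:
        # Every alias builds the same kind of edge; only the tag differs.
        rel = Causes(self.name, other.name, strength)
        self.relationships.append(rel)
        return rel

    def may_cause(self, other: "Variable") -> Causes:
        return self._causes(other, Strength.MAY)

    def i_hypothesize_this_causes(self, other: "Variable") -> Causes:
        return self._causes(other, Strength.HYPOTHESIZED)

    def definitely_causes(self, other: "Variable") -> Causes:
        return self._causes(other, Strength.DEFINITE)


smoking, cancer = Variable("smoking"), Variable("cancer")
edge = smoking.definitely_causes(cancer)
print(edge.strength.value)  # definitely_causes
```

Because the graph structure is identical across aliases, any statistical difference has to come from code that explicitly reads the tag.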
Hidden Variables Confusion
I'm not entirely sure how someone would specify a hidden variable. Would the user hypothesize "there may be n hidden variables out there somewhere"? And what would we do with this information?
On Causal Modeling
I, for one, do not feel that I have a good enough grasp of causal modeling to make many confident statements about what the right thing to do would be. My brief impression is that it seems theoretically nice but is kind of impractical in actual use?
(I have to wonder whether there is a gender difference in how people would use the causes relationship. Perhaps womxn are less likely to want to assert a causal relationship, whereas men are more gung-ho about asserting one.) (In a similar vein, women tend to use a lot of modifiers in their language and downplay what they're actually doing. It's an interesting linguistics question, and also a PL/usability question, whether renaming causes into several aliases that all basically do the same thing, but with modifiers that qualify how strong a statement is being made, would make womxn more likely to actually use causes.)
Implementation Changes
I think these all sound reasonable. It would also make sense to give the user some feedback when they stipulate an associates_with relationship. In Jupyter notebooks (you know how much of a fan I am of Jupyter notebooks, after all, lol), some pandas functions emit warnings that are displayed differently from other output, and Jupyter seems to provide all sorts of hooks for this kind of thing, which may be worth looking into. In a REPL setting, we could also print a warning to stderr.
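For the feedback idea, Python's standard warnings module already covers both settings: pandas warnings such as SettingWithCopyWarning go through it, and by default warnings are written to stderr, which Jupyter displays with different styling from regular output. Below is a sketch with a hypothetical AssociationWarning category and a stand-in associates_with function (not Tisane's real one).

```python
import warnings


class AssociationWarning(UserWarning):
    """Hypothetical category for feedback on associates_with relationships."""


def associates_with(a: str, b: str) -> tuple:
    """Stand-in for recording an association; only the warning matters here."""
    warnings.warn(
        f"{a} associates_with {b} states no causal direction; consider "
        f"whether a causes-style relationship is what you mean.",
        AssociationWarning,
        stacklevel=2,  # point the warning at the caller's line
    )
    return (a, b)


associates_with("exercise", "sleep_quality")
```

A dedicated category lets users silence just these messages with warnings.filterwarnings("ignore", category=AssociationWarning) without hiding any other warnings.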
Regarding Semantics
In my opinion, causal definitions should be reserved for universal truths (like "drunk driving causes accidents"). For all other possible causal relationships, we should "nudge" the user toward something less strong. This could be done through Audrey's idea of providing varying degrees of confidence when defining causal relationships. On the implementation front, we could multiply by, or raise to the power of, some constant to make certain relationships more or less powerful. I am not sure how we would determine such a constant.
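As a rough illustration of the multiply/raise-to-a-power idea, each confidence level could map to a weight in (0, 1], and a causal path could be scored by multiplying its edge weights, each raised to an exponent k. The weights, the value of k, and path_confidence itself are made-up placeholders, since choosing them is exactly the open question.

```python
from typing import Iterable

# Placeholder weights: how to pick these is the open question.
CONFIDENCE_WEIGHT = {
    "definitely_causes": 1.0,
    "i_hypothesize_this_causes": 0.6,
    "may_cause": 0.3,
}


def path_confidence(edge_kinds: Iterable[str], k: float = 1.0) -> float:
    """Multiply the weights of each edge on a causal path, each raised to k.

    k > 1 penalizes weak links more heavily; 0 < k < 1 softens them.
    """
    score = 1.0
    for kind in edge_kinds:
        score *= CONFIDENCE_WEIGHT[kind] ** k
    return score


print(path_confidence(["definitely_causes", "may_cause"]))  # 0.3
print(path_confidence(["definitely_causes", "may_cause"], k=2))
```

A score like this could rank suggested IVs, with definitely_causes paths unaffected and may_cause paths pushed down the list.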
Regarding Hidden Variables
With an infinite number of hidden variables, there are infinitely many possible causal relationships.
Could Tisane provide a list of viable hidden variables? For example, consider two variables X and Y for which the user has defined a causal relationship. Tisane could list all other variables that have a relationship with both X and Y, presenting them as possible options for the hidden variable.
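A sketch of that candidate list, assuming relationships are available as (variable, variable) pairs and treating them as undirected, since any relationship with both X and Y counts here. candidate_hidden_variables is a hypothetical helper, not part of Tisane.

```python
from typing import Dict, Iterable, List, Set, Tuple


def candidate_hidden_variables(
    edges: Iterable[Tuple[str, str]], x: str, y: str
) -> List[str]:
    """Return variables related to both x and y, ignoring edge direction."""
    neighbors: Dict[str, Set[str]] = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    common = neighbors.get(x, set()) & neighbors.get(y, set())
    return sorted(common - {x, y})  # sorted for stable display


edges = [("age", "income"), ("age", "health"), ("income", "health"), ("region", "income")]
print(candidate_hidden_variables(edges, "income", "health"))  # ['age']
```

This only surfaces variables the user has already declared, so a truly unmeasured confounder will never appear in the list.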
Potential Problem: Asking users to pick a hidden variable might force them to make more assumptions or define more relationships than they're comfortable with.
Regarding the Working List
I think everything mentioned on the list is a great idea. In addition, do we want to explore having different workflows in Tisane? We could provide users with different options and paths depending on their use case. This would let us add features to Tisane that benefit a certain type of user without having to worry about the impact they could have on another type of user.
On Next Steps
I could re-implement Tisane in R so that it uses dagitty under the hood.
Python alternative to dagitty: https://github.com/pgmpy/pgmpy