Comments (25)
Yeah, I've been meaning to get to it. I'll probably push something in the next 1-2 weeks.
from ggextra.
I got it to work by adding the habillage argument to fviz_pca_ind(). Apologies for any inconvenience. Man, do those plots look good. Great work with ggMarginal.
from ggextra.
@kassambara thank you for your input
@crew102 and I are discussing this, and it seems like the likely API will indeed be without a list.
A few more items we agreed on:
- alpha will default to 1, as there is already an implicit alpha in every ggMarginal call. You can pass alpha into the ... argument, and it should also work for the case of grouped data. The documentation for this feature should make it clear that the user may find it useful to explicitly set alpha to a different fraction, but it does not need to be an enumerated argument
- since this feature will allow grouped data to have a "fill" colour for density plots, we should also add support for "fill" in non-grouped data (currently, "fill" is not supported in density plots)
We did not settle on whether the colourGroup/fillGroup will be boolean flags or the name of a variable, though leaning towards the former. Need to ensure whatever we choose is not too restrictive and will support these scenarios:
- ggplot(mtcars) + geom_point(aes(x = mpg, y = wt, colour = gear)) with fillGroup based on gear (even though original plot doesn't have a fill)
- ggplot(mtcars) + geom_point(aes(x = mpg, y = wt, colour = gear, fill = carb)) with both fillGroup and colourGroup based on gear (even though original plot has a different fill from colour)
from ggextra.
You are right, this would be useful to others as well. I unfortunately will probably not have time to look into this feature myself, but I would be happy to accept a pull request if someone wants to take the lead on this feature.
from ggextra.
This would be doable... It would be a little awkward given that we currently use geom_line for creating the density plots, which we would have to move over to geom_density so that we could fill the distributions with color (i.e., specify a fill param).
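As a rough illustration of that difference (a minimal sketch, not ggExtra's internal code; the `cars` copy and the plot names are made up for this example), a density drawn with geom_line is outline-only, while geom_density accepts a fill:

```r
library(ggplot2)

# Work on a copy so the built-in dataset is left untouched.
cars <- mtcars
cars$gear <- as.factor(cars$gear)

# Outline-only density: geom_line() draws the curve but has no fill aesthetic.
p_line <- ggplot(cars, aes(x = mpg, colour = gear)) +
  geom_line(stat = "density")

# Filled density: geom_density() accepts fill, so each group can be shaded.
p_fill <- ggplot(cars, aes(x = mpg, fill = gear)) +
  geom_density(alpha = 0.5)
```

Printing p_line and p_fill side by side shows why a fill mapping needs the second form.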
I actually think the API suggested by @kassambara is good. I.e., the call would look something like:
p <- ggplot(data = mtcars) + geom_point(aes(x = mpg, y = wt, colour = gear))
ggMarginal(p = p, margMapping = aes(colour = gear))
I think we'll want to require that the user specifies a color or fill mapping for the scatterplot if they also specify one for the marginal plots. We could rely on the xparams and yparams arguments for passing in alpha values of the filled marginal plots, too.
I'll take a stab at it sometime next week. @daattali , we should think about submitting a new version to CRAN after this as well, no?
from ggextra.
Yep, I already emailed the authors of packages using ggExtra and told them about an upcoming CRAN release and to check the package for any regression bugs. We're good to go for CRAN. If you're thinking to have a go at this within the next few weeks then the CRAN release can wait for that.
API: is the idea that the user can also specify a different mapping than the one in the plot? And would using the aes() function be required? In ggplot2, aes() is needed because without it you take the value literally rather than as a mapping. Would that be needed here as well?
from ggextra.
> If you're thinking to have a go at this within the next few weeks then the CRAN release can wait for that.
Yeah, let's wait until I take a stab at implementing this feature
> API: is the idea that the user can also specify a different mapping than the one in the plot?
Technically, yes, but the mapping should use the same variable. For example, this would be OK:
p <- ggplot(data = mtcars) + geom_point(aes(x = mpg, y = wt, colour = gear))
ggMarginal(p = p, margMapping = aes(fill = gear))
But we would not be supporting this:
p <- ggplot(data = mtcars) + geom_point(aes(x = mpg, y = wt, colour = gear))
ggMarginal(p = p, margMapping = aes(colour = cyl))
> And would using the aes() function be required?
We wouldn't have to use aes(). I was planning on parsing the aes call and going from there, instead of using it directly (so it would actually be easier to not use it). Do you think using aes would be confusing, given that we won't actually be making a call to it? I was actually originally thinking we should do something like this:
ggMarginal(p = p, margMapping = list(colour= cyl))
But I came around on the use of aes because we are doing something conceptually similar to using aes directly.
from ggextra.
I think if we're not actually using aes() then we shouldn't require the user to use it, because they might assume that they can write anything that works for aes() in there. Just like ggMarginal() already has x and y params that accept a variable name, and it's not wrapped in aes().
Would there be a technical limitation or any extra code to make something like
p <- ggplot(data = mtcars) + geom_point(aes(x = mpg, y = wt, colour = gear))
ggMarginal(p = p, margMapping = list(colour = cyl))
work? From an implementation point of view, does it matter that the grouping in the plot and in the margin is not the same?
from ggextra.
There would be two things that would make it awkward/more difficult if we tried to allow that:
- We would have to find a place to put the second legend
- We would have to add another param for manually mapping the colors of cyl to whatever the user wants to use. If we just have colour = gear in the call to ggMarginal, ggplot figures out the colors from any potential call to, say, scale_color_manual and puts them in the data frame that we are using.
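To illustrate the second point, here is a small sketch of how ggplot2 resolves a manual colour scale into the built layer data. Whether ggMarginal reads the colours in exactly this way is an assumption; the palette and the `cars`, `built`, and `group_colours` names are made up for the example.

```r
library(ggplot2)

# Work on a copy so the built-in dataset is left untouched.
cars <- mtcars
cars$gear <- as.factor(cars$gear)

p <- ggplot(cars) +
  geom_point(aes(x = mpg, y = wt, colour = gear)) +
  scale_colour_manual(values = c("3" = "red", "4" = "blue", "5" = "darkgreen"))

# layer_data() returns the computed data for a layer, including a `colour`
# column that already holds the colours chosen by the manual scale.
built <- layer_data(p, 1)
group_colours <- unique(built$colour)
```

Because the colours are already resolved per row, a marginal plot grouped on the same variable can reuse them without a second scale or legend.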
from ggextra.
Good point re: legend.
Would the only allowed values be either "colour" and "fill", or would it allow any kind of mapping? And what exactly would the enforcement on the variables be - would it only allow variables that already have some mapping in the original plot?
from ggextra.
I think the only relevant values for this would be colour or fill... Can you think of any others? The enforcement would basically just check that the variable specified in margMapping is mapped to either colour or fill in the scatter plot. Also note that we wouldn't be supporting the data param for this feature (i.e., if the user wants to use margMapping, they have to pass in p instead of passing in data, x, and y).
from ggextra.
If it's just colour and fill, then it feels wrong to me to have a parameter that claims to take a list of mappings when there are only 2 allowed elements.
What do you think of one of these two options instead? Which would be best for end users?
- Having two params like colourGroup and fillGroup (these might be terrible names - maybe colourVar and fillVar? I'm bad with naming things)
- Simply having a single boolean param such as marginalGroup = TRUE/FALSE with FALSE as default. When TRUE, the colour and fill mappings that exist in the original plot get copied over to the marginal plot. This sounds like it could be simpler code and simpler for users perhaps?
- (option 3: what you were suggesting above)
Let me know your thoughts.
from ggextra.
My first instinct was to do a combo of choices 1 and 2, so something like:
ggMarginal(p = p, marginalGroup = list(colourGroup = TRUE, colourAlpha = .4, fillGroup = FALSE, fillAlpha = NA))
The reason being that I think people will want to use different values for the alpha of the points vs the fill of the distributions. I don't have any strong feelings about whether we just have one marginalGroup argument (which would be a list with 4 elements) or two arguments (colourGroup and fillGroup, each with 2 elements). I think it's going to be awkward any way we do it, to be honest. What do you think is the most intuitive for users?
from ggextra.
Nvm, I forgot what I was planning to do for alpha, which was to just suggest that users specify it in the xparams or yparams argument... So I guess your option 2 would also work... I think I actually like that option the most, come to think of it!
But we should separate colour and fill... So either a single marginalGroup argument which takes a list of two bools, or two arguments (colourGroup and fillGroup, both of which take a single bool)
from ggextra.
I don't follow the whole alpha thing. Why is alpha needed for the marginal plots? I think alpha should always be 1 for the marginal density/histogram.
In the marginal plot, would it make sense to have mappings for both colour and fill into different variables? I don't even know what that would look like
from ggextra.
Alpha is needed (at least for fill) for the marginal plots because alpha = 1 will result in you not being able to see the distributions when they overlap. For example, in the example that kassambara posted, you get to see what the distributions look like across their entire support, even when there is another distribution that is overlapping. So we would want to set a default value for alpha somewhere around .5, I think.
I think we should just allow one variable to be mapped to fill or colour (or potentially both)...Using two different variables in the marginal map would bring up the two issues I mentioned above (e.g., adding an extra legend).
from ggextra.
Right, alpha <1 definitely needed. But let's just fix it at a value, doesn't need to be customized. You're right.
My second question was: would both colour AND fill be able to get a mapping? What would it look like when they both are used?
from ggextra.
I think we should allow users to specify the alpha level, given that it will be difficult to choose a default that looks good for all different scenarios (i.e., many vs few groups, lighter vs darker cols, etc.).
Regarding your second question, that's what I thought you meant...We could potentially map a single variable to both fill and colour (but again, there would be no support for two different variables mapped to fill and colour). When you specify a fill param but no colour, the distribution(s) is outlined in black:
library(ggplot2)
mtcars$gear <- as.factor(mtcars$gear)
ggplot(data = mtcars) +
  geom_density(aes(x = mpg, fill = gear), alpha = .3)
When you specify colour as well, the outline shares the same colour as the fill, and you only get one legend (at least for the current version of ggplot2 that I'm at):
ggplot(data = mtcars) +
  geom_density(aes(x = mpg, fill = gear, colour = gear), alpha = .3)
I just checked out the case for histograms, and fill looks pretty bad. It's too difficult to tell which bins refer to which groups:
ggplot(data = mtcars) +
  geom_histogram(aes(x = mpg, fill = gear), alpha = .3,
                 position = position_identity(), bins = 10)
The case for boxplot looks reasonable, though:
ggplot(data = mtcars) +
  geom_boxplot(aes(x = mpg, y = mpg, fill = gear, colour = gear), alpha = .3,
               position = position_identity())
I think we should support all three but just suggest that the user choose type to be either histogram or boxplot when they want to specify a marginal mapping.
from ggextra.
I think that fixing the default alpha = 0.5 is a good option. Having the possibility to use colourGroup = TRUE and/or fillGroup = TRUE will also be appreciated.
You might have also noted that, when type = "boxplot", the color/fill variable should be used as the x axis variable in the marginal box plot.
Thank you :-)!
from ggextra.
I'm wondering whether it wouldn't be better if the final format of ggMarginal looked like this:
# Basic usage
ggMarginal(p)
# Grouped data
# (Only) color by groups
ggMarginal(p, colourGroup = TRUE)
# or
# (Only) fill by groups
ggMarginal(p, fillGroup = TRUE, alpha = 0.5)
# or
# color and fill by groups
ggMarginal(p, colourGroup = TRUE, fillGroup = TRUE, alpha = 0.5)
Instead of this (more typing):
# Basic usage
ggMarginal(p)
# Grouped data
ggMarginal(p, margMapping = list(colourGroup = TRUE))
# or
ggMarginal(p, margMapping = list(fillGroup = TRUE, alpha = 0.5))
# or
ggMarginal(p, margMapping = list(colourGroup = TRUE, fillGroup = TRUE, alpha = 0.5))
from ggextra.
@crew102 I think we left this unresolved - do you have time/would like to come back to this?
from ggextra.
Closed?
from ggextra.
Indeed! @kassambara this exists now
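For readers landing here later, a minimal sketch of the released interface, assuming a ggExtra version that provides the groupColour and groupFill arguments of ggMarginal() (note the final names differ from the colourGroup/fillGroup names discussed above). The `cars`, `p`, and `m` names are made up for the example.

```r
library(ggplot2)
library(ggExtra)

# Work on a copy so the built-in dataset is left untouched.
cars <- mtcars
cars$gear <- as.factor(cars$gear)  # the grouping variable must be a factor or character

# The scatter plot maps colour to the grouping variable.
p <- ggplot(cars, aes(x = mpg, y = wt, colour = gear)) +
  geom_point()

# Marginal densities whose outline and fill follow the same grouping.
m <- ggMarginal(p, type = "density", groupColour = TRUE, groupFill = TRUE)
```

Printing m draws the scatter plot with one marginal density per gear group on each axis.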
from ggextra.
Way late to this but perhaps worthwhile - I cannot figure out how to combine the functionality of fviz_pca_ind with ggMarginal. Even after adding a grouping variable outside of the fviz_pca_ind() call using geom_point, ggMarginal doesn't appear to recognize the grouping variable. See code below - kind of ugly. Is this a communication breakdown from fviz_pca_ind to ggplot to ggMarginal? It recognizes that there are 3 groups but not the color or fill.
state <- fviz_pca_ind(move_pca,
                      # Individuals
                      fill.ind = dat$state,
                      # col.ind = "black",
                      # pointshape = 21,
                      # col = "black",
                      # fill = movevars1$state,
                      # pointsize = 2,
                      # labelsize = 5,
                      alpha = 0.5,
                      palette = cols,
                      addEllipses = TRUE,
                      ellipse.type = "confidence",
                      ellipse.level = 0.95,
                      mean.point = FALSE,
                      label = "var",
                      col.var = "black",
                      repel = TRUE,
                      legend.title = "",
                      ggtheme = theme_minimal(base_size = 16)) + # Close fviz_pca_ind
  labs(title = "",
       x = "Time (PC1)",
       y = "Energy (PC2)"
  ) +
  geom_point(aes(dat$pc1, dat$pc2, fill = dat$state), color = "black", size = 2, shape = 21) + # rewrite points
  scale_fill_manual(values = cols) + # rewrite colors
  theme_bw(base_size = 16) +
  theme(aspect.ratio = 1,
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.text = element_text(color = "black", size = 14),
        legend.position = c(.35, .95),
        legend.justification = c("right", "top"),
        legend.box.background = element_blank()
        #axis.title.y = element_text(color="black", size = 20),
        #axis.title.x = element_text(color="black", size = 20)
  )
state1 <- ggMarginal(state, type = "density", col = "black", groupFill = TRUE)
from ggextra.
Sorry, for context: even when I try to specify the data, it says the row counts are misaligned, even though the ggplot object stores all the data, including a fill variable. So even specifying my own data, x/y coords, and fill object throws an error, and I'm not sure why:
tail(state$data) # 143 observations with x, y, and fill variables
    name            x           y       coord        cos2      contrib     Fill.
138  138  0.029920576 -0.04905992 0.003302116 0.002063812 0.0005767548 Tennessee
139  139  0.469097338 -0.15273796 0.243381196 0.257227431 0.0425094847 Tennessee
140  140  0.384657694  0.19810512 0.187207182 0.319287372 0.0326980103 Tennessee
141  141 -2.198499773 -0.72048493 5.352499779 0.620500493 0.9348791579 Tennessee
142  142 -0.926122738  0.44988083 1.060096089 0.806648433 0.1851586697 Tennessee
143  143 -0.009075775 -0.56402287 0.318204172 0.205827671 0.0555782271 Tennessee
state1 <- ggMarginal(data = state$data, x = state$data$x, y = state$data$y, fill = state$data$Fill., type = "density")
Error in `ggplot2::geom_density()`:
! Problem while setting up geom aesthetics.
ℹ **Error occurred in the 1st layer.
Caused by error in `check_aesthetics()`:
! Aesthetics must be either length 1 or the same as the data (512)**
✖ Fix the following mappings: `fill`
Backtrace:
1. ggExtra::ggMarginal(...)
5. ggExtra:::addTopMargPlot(pGrob, top, size)
6. ggExtra:::getMargGrob(top)
7. ggplot2::ggplotGrob(margPlot)
12. ggplot2:::ggplot_build.ggplot(x)
...
21. l$compute_geom_2(d)
22. ggplot2 (local) compute_geom_2(..., self = self)
23. self$geom$use_defaults(data, self$aes_params, modifiers)
24. ggplot2 (local) use_defaults(..., self = self)
25. ggplot2:::check_aesthetics(params[aes_params], nrow(data))
Not sure where ggMarginal is pulling the data to get 512 for geom_density() when the data is clearly only 143 observations long. Thanks for any advice if/when convenient.
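One possible source of the 512, offered as a sketch rather than a diagnosis of the call above: ggplot2's density stat evaluates the curve on a grid of n = 512 points by default, so the data behind a density layer has 512 rows per group no matter how many observations went in. The `df` data frame below is a made-up stand-in for state$data.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for state$data: 143 observations of one variable.
df <- data.frame(x = rnorm(143))

p <- ggplot(df, aes(x = x)) +
  geom_density()

# The computed layer data holds the evaluated density curve, not the raw rows.
density_rows <- nrow(layer_data(p, 1))
input_rows <- nrow(df)
```

Under that reading, a fill vector of length 143 passed as a fixed parameter would be compared against the 512 computed rows, which would match the error message.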
from ggextra.