Comments (10)
@kgoldfeld You told me that you manually test the speed of certain functions before submission to CRAN.
How are you doing that? Just generating very large sets of data with each distribution, etc.?
from simstudy.
Yes, I test each function manually by creating very large data sets. I generally haven't done it systematically for all distributions and functions, but I've made sure that any new or affected functions work well with large data sets. It seems like a systematic approach is warranted here.
from simstudy.
The solution: https://github.com/lorenzwalthert/touchstone
I will test this on my fork and then prepare a PR to integrate it here.
from simstudy.
This looks pretty cool, but it still requires us to develop the large-scale tests - correct?
from simstudy.
Yes, {touchstone} provides the infrastructure, but we have to decide what we want to test for performance. (This is, by the way, separate from the unit tests using {testthat}, which test functionality.)
For now, my idea was to take the examples from the vignettes and create the tests from them, just with heavily increased sample sizes. Unless you have a more methodical idea?
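A minimal sketch of that idea, assuming a small definition table in the style of the vignettes. The variable names, formulas, and the sample size here are made up for illustration and are not taken from any particular vignette:

```r
library(simstudy)

# Hypothetical definition table, written in the style of the vignette examples.
def <- defData(varname = "x", dist = "normal", formula = 0, variance = 1)
def <- defData(def, varname = "y", dist = "binary",
               formula = "-1 + 0.5 * x", link = "logit")

# Assumed "large" sample size; adjust to the memory that is available.
n <- 1e6

set.seed(1)
# Time a single run of the data generation.
timing <- system.time(dd <- genData(n, def))

print(timing["elapsed"])
```

A single `system.time()` call is noisy, so this only gives a rough impression; repeated runs would be needed for a real comparison.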
from simstudy.
That sounds like a reasonable place to start.
from simstudy.
So I worked with touchstone a bit, contributed some code, and now have a better understanding of how to use it.
I think it makes sense to approach it kind of like unit tests: benchmark the small units (like the dist-generating functions) we are currently working on. We can still do large "integration" kinds of tests manually using bench. Due to the way touchstone's inference works, all tests are run multiple times for each branch, so the run time of these jobs can get very long. But maybe the def table with ALL dists in it was just overkill xD
On the other hand, I would like to have some broader tests in there as well, given the large changes we are going to make, so as not to miss some large, unintended slowdown somewhere... I will keep working on it and open a PR once I feel it is in a good place :)
Let me know what you think on the matter!
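A rough sketch of what one such unit-style benchmark could look like in a touchstone script. The function names follow recent touchstone releases as I understand them and may differ in other versions; the benchmark name `gen_normal`, the definition table, and the values of `n` are hypothetical:

```r
# touchstone/script.R (sketch; meant to be run by touchstone's GitHub Action)

# Install the branches that are being compared.
touchstone::branch_install()

# Benchmark one small unit: generating a single normal variable.
# `gen_normal`, the definition, and the sample size are made up for illustration.
touchstone::benchmark_run(
  expr_before_benchmark = {
    library(simstudy)
    def <- defData(varname = "x", dist = "normal", formula = 0, variance = 1)
  },
  gen_normal = genData(10000, def),
  n = 5  # number of runs per branch
)

# Create the artifacts used for the pull request comment.
touchstone::benchmark_analyze()
```

Each additional benchmark would be another `benchmark_run()` call, so keeping the units small keeps the total job time manageable.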
from simstudy.
Thanks for working through this. I'm not totally sure I understand how touchstone relates to testthat, or how it relates to benchmarking for speed. If we have a set of pre-established data generation processes that should remain stable after we make changes, I wouldn't mind if the tests took a couple of minutes to run; that should be plenty of time. I cannot imagine what we would do that would take much longer than that, and even that seems quite long, given that data generation is pretty much instantaneous. (This is obviously not the case if we get into model estimation, but I don't really see any need for that in these particular performance tests.)
Let me know if I am totally missing the point.
from simstudy.
Oh, a def table with all dists and 1,000,000 rows takes a while xD I guess the main issue there is actually memory (-> #50), but yeah, for a few thousand rows everything should be rather quick.
And in the end these tests will run on GitHub so as not to slow our machines down, so the duration is not that important anyway.
I will prepare some benchmarks on my fork to demonstrate :)
from simstudy.