Comments (5)
It is a way to distribute the tests. We want to make N groups that each last less than 1 hour, so we can start N VMs and run the tests in parallel.
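As a sketch of how such groups could be formed when per-test durations are available (e.g. recorded from a previous run): a greedy longest-processing-time split keeps the groups roughly balanced. The function name and the test names/timings below are made up for illustration, not real measurements or repo code.

```python
import heapq

def split_by_duration(durations, n):
    """Greedy longest-processing-time split: assign each test,
    longest first, to the currently least-loaded group so that
    all groups finish in roughly the same time."""
    heap = [(0.0, i) for i in range(n)]  # (total seconds, group index)
    heapq.heapify(heap)
    groups = [[] for _ in range(n)]
    for name, secs in sorted(durations.items(), key=lambda kv: -kv[1]):
        total, i = heapq.heappop(heap)
        groups[i].append(name)
        heapq.heappush(heap, (total + secs, i))
    return groups

# Hypothetical durations in seconds, not real measurements
durations = {
    "tests/test_deeprec.py::test_train": 1800,
    "tests/test_sar.py::test_fit": 900,
    "tests/test_timer.py::test_timer": 60,
    "tests/test_utils.py::test_split": 30,
}
groups = split_by_duration(durations, 2)
# -> one group gets the 1800s test; the other gets the remaining three
```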
from recommenders.
@miguelgfierro how different would it be if we "round-robin" split the tests into N groups, assuming the tests are already grouped into folders by heavy vs. light workloads?
E.g. tests in "examples/deepdive" should be heavy, while tests in "utils/..." are generally light.
If round-robin load balancing is not very different from manual grouping,
what we can do is get all the test names (ordered by test folder) using the following code, then round-robin them into N groups and execute them on N VMs (I assume AzureML?).
# collect pytest test functions without running them
import subprocess

result = subprocess.run(
    [
        "pytest",
        "--collect-only",
        "--quiet",
        "--disable-warnings",
    ],
    stdout=subprocess.PIPE,
)

# each line of the collected output corresponds to one test function
lines = result.stdout.decode("utf-8").split("\n")

tests = []
for line in lines:
    # some output lines are not tests, so we only keep the lines starting with "tests"
    # e.g. in the following output, only the first three lines are actual tests
    # """
    # tests/unit/recommenders/models/test_sar_singlenode.py::test_sar_item_similarity[3-lift-lift]
    # tests/unit/recommenders/models/test_sar_singlenode.py::test_sar_item_similarity[3-mutual information-mi]
    # tests/unit/recommenders/models/test_sar_singlenode.py::test_user_affinity
    #
    # 3 tests collected in 1.51s
    # """
    if line.startswith("tests"):
        # test arguments appear in a "[...]" suffix of the test function name,
        # so strip them to get the bare function name
        tests.append(line.split("[")[0])

# remove duplicate test function names caused by parametrized tests
unique_tests = list(set(tests))
for t in unique_tests:
    print(t)  # e.g. "tests/unit/recommenders/utils/test_timer.py::test_timer_format"
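The round-robin grouping step itself could then look like this (a minimal sketch; `round_robin_split` and the placeholder test names are mine, not existing repo code):

```python
def round_robin_split(tests, n):
    """Deal the collected test names into n groups like a deck of cards."""
    groups = [[] for _ in range(n)]
    for i, test in enumerate(tests):
        groups[i % n].append(test)
    return groups

# Placeholder test names standing in for the collected unique_tests
unique_tests = [f"tests/test_mod_{i}.py::test_func_{i}" for i in range(7)]
groups = round_robin_split(unique_tests, 3)
# groups[0] has 3 tests, groups[1] and groups[2] have 2 each;
# each group can then be passed to a separate pytest invocation on its own VM
```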
If we literally spin up N VMs, rather than using AzureML run execution, there's a package called pytest-xdist that we could consider for parallel execution. It does have remote-submit functionality, but I haven't used it, and the docs say ssh-based remote submission has been deprecated...
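For reference, pytest-xdist's local parallel mode (no remote submission involved) is just a flag; a minimal invocation, assuming the package is installed and the worker count is chosen arbitrarily:

```shell
# Run the suite across 4 local worker processes with pytest-xdist.
# This parallelizes on a single machine; it does not distribute across VMs.
pip install pytest-xdist
pytest -n 4 tests/unit
```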
pfff I tried pytest-xdist and it didn't work very well.
What you are proposing seems OK, but it is a huge amount of work that would only change a manual list into a programmatic solution. In addition, we recently discovered a way to get very cheap AzureML VMs: #1996
Up to you man if you want to try this and compare with the current solution. However, there are several things that are not working in the repo that might be a better use of your effort: #2011
Agreed that we want to prioritize things, and I'm okay with deferring this work until we find an easier alternative.
Then, what's the suggestion if I add new tests? Should I add their names to one of the test group lists?
Yes, just add them to one of the emptier lists.