jamesyang2333 / sam
A learning-based method for high-fidelity database generation.
License: Other
I am attempting to test the relationship between training dataset size and training time in the SAM repository. I set the train_queries variable in sam_multi/experiments.py to 1000 and ran the following command:
python run_uae.py --run job-light-ranges-mscn-workload
However, I encountered the following error:
Traceback (most recent call last):
  File "/root/anaconda3/envs/sam/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 471, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/root/anaconda3/envs/sam/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 430, in fetch_result
    result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT)
  File "/root/anaconda3/envs/sam/lib/python3.7/site-packages/ray/worker.py", line 1538, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::NeuroCard.train() (pid=81614, ip=172.17.0.5)
  File "python/ray/_raylet.pyx", line 479, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 432, in ray._raylet.execute_task.function_executor
  File "/root/anaconda3/envs/sam/lib/python3.7/site-packages/ray/tune/trainable.py", line 332, in train
    result = self.step()
  File "/root/anaconda3/envs/sam/lib/python3.7/site-packages/ray/tune/trainable.py", line 636, in step
    result = self._train()
  File "run_uae.py", line 1264, in _train
    q_weight=self.q_weight if self.semi_train else 0
  File "run_uae.py", line 542, in run_epoch_query_only
    all_loss.backward(retain_graph=True)
  File "/root/anaconda3/envs/sam/lib/python3.7/site-packages/torch/tensor.py", line 195, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/root/anaconda3/envs/sam/lib/python3.7/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: Function 'MmBackward' returned nan values in its 0th output.
In the job-light-ranges-mscn-workload configuration within sam_multi/experiments.py, are there any additional parameters or settings that need to be adjusted to properly test the relationship between training dataset size and training time?
I appreciate your time and assistance. Looking forward to your guidance on resolving this issue. Thank you!
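For the size-versus-time measurement itself, a minimal timing harness could look like the sketch below. It is not part of the SAM repository: `train_fn` is a hypothetical callable that would wrap one training run for a given number of training queries, and `fake_train` is only a stand-in workload so the example runs on its own. How `run_uae.py` should be wrapped is not known from this thread.

```python
import time
from typing import Callable, Dict, Iterable, List


def time_training_sizes(
    train_fn: Callable[[int], None],
    sizes: Iterable[int],
    repeats: int = 1,
) -> Dict[int, List[float]]:
    """Call train_fn(n) for each n in sizes and record wall-clock seconds."""
    results: Dict[int, List[float]] = {}
    for n in sizes:
        timings: List[float] = []
        for _ in range(repeats):
            start = time.perf_counter()
            train_fn(n)  # hypothetical: one training run with n queries
            timings.append(time.perf_counter() - start)
        results[n] = timings
    return results


def fake_train(num_queries: int) -> None:
    """Stand-in workload (assumption); replace with a real training call."""
    total = 0
    for i in range(num_queries * 100):
        total += i * i


if __name__ == "__main__":
    measured = time_training_sizes(fake_train, [1000, 5000, 20000], repeats=3)
    for n, secs in measured.items():
        # The minimum over repeats is less sensitive to background noise.
        print(f"train_queries={n}: best of {len(secs)} runs = {min(secs):.4f}s")
```

Repeating each size a few times and keeping all raw timings makes it easier to tell a real trend from run-to-run noise.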
After struggling with the OS issue, I switched to CentOS on x86, but still no luck.
(sam) [root@copy-of-vm-ee-centos76-v1 sam_multi]# uname -a
Linux copy-of-vm-ee-centos76-v1.05 3.10.0-1127.el7.x86_64 #1 SMP Tue Mar 31 23:36:51 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
jamesyang2333, could you please help?
Following the README on:
macOS Monterey 12.4
Apple M1 Pro
conda 4.13.0
/Users/{userid}/opt/anaconda3/bin/python
I got the error below when trying to run the 1000 test queries on the database generated by sam_single. It only occurs with the "dmv" dataset; the "census" run is OK.
python query_execute_single.py --dataset dmv --data-file ./generated_data_tables/dmv.csv --query-file ./queries/dmv_test.txt
Traceback (most recent call last):
  File "query_execute_single.py", line 43, in <module>
    cols = [sample_table.columns[sample_table.ColumnIndex(col)] for col in train_data_raw['column'][i]]
  File "query_execute_single.py", line 43, in <listcomp>
    cols = [sample_table.columns[sample_table.ColumnIndex(col)] for col in train_data_raw['column'][i]]
  File "/home/mltest/SAM/sam_single/common.py", line 152, in ColumnIndex
    assert name in self.name_to_index
AssertionError
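The assertion fails when a column name referenced by a query is not among the table's known column names. One way to narrow this down, sketched below under assumptions, is to compare the header of the generated CSV against the column names used in the queries. The helpers `read_header` and `missing_columns` are hypothetical and not part of SAM, the column names in the demo are made up, and the format of the query file is not known from this thread, so extracting the query column names is left out.

```python
import csv
import io
from typing import Iterable, List, TextIO


def read_header(csv_file: TextIO) -> List[str]:
    """Return the stripped column names from the first row of a CSV file."""
    reader = csv.reader(csv_file)
    return [name.strip() for name in next(reader, [])]


def missing_columns(header: Iterable[str], query_columns: Iterable[str]) -> List[str]:
    """Return query column names absent from the header, in first-seen order."""
    known = set(header)
    reported = set()
    missing: List[str] = []
    for col in query_columns:
        if col not in known and col not in reported:
            missing.append(col)
            reported.add(col)
    return missing


if __name__ == "__main__":
    # Made-up data for illustration only; not the real dmv schema.
    sample = io.StringIO("Record_Type,Reg_Valid_Date,Color\nVEH,2019-01-01,BK\n")
    header = read_header(sample)
    print(missing_columns(header, ["Color", "Record Type", "Record Type"]))
    # prints ['Record Type']
```

If this reports any names for the dmv files, the mismatch (for example a difference in spacing or spelling) would explain why ColumnIndex cannot find the column.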