secretflow / secretflow
A unified framework for privacy-preserving data analysis and machine learning
Home Page: https://www.secretflow.org.cn/docs/secretflow/en/
License: Apache License 2.0
Issue type: Bug | Source: binary | Version: 0.6 | OS: macos 12.4 | Python: 3.8.13
https://spu.readthedocs.io/en/beta/getting_started/quick_start.html
Running the "Move JAX program to SPU" step of this quick start raises ModuleNotFoundError: No module named '__mp_main__':
# run make_rand on P1, the value is visible for P1 only.
x = ppd.device("P1")(make_rand)()
# run make_rand on P2, the value is visible for P2 only.
y = ppd.device("P2")(make_rand)()
# run greater on SPU, it automatically fetches x/y from P1/P2 (as ciphertext), and compute the result securely.
ans = ppd.device("SPU")(greater)(x, y)
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Input In [22], in <cell line: 2>()
1 # run make_rand on P1, the value is visible for P1 only.
----> 2 x_ = ppd.device("P1")(make_rand)()
4 # run make_rand on P2, the value is visible for P2 only.
5 y_ = ppd.device("P2")(make_rand)()
File ~/anaconda3/envs/secretflow/lib/python3.8/site-packages/spu/binding/util/distributed.py:373, in PYU.Function.__call__(self, *args, **kwargs)
367 return pyfunc(*args, **kwargs)
369 args, kwargs = tree_map(prep_objref, (args, kwargs))
371 return tree_map(
372 partial(PYU.Object, self.device),
--> 373 self.device.node_client.run(server_fn, *args, **kwargs),
374 )
File ~/anaconda3/envs/secretflow/lib/python3.8/site-packages/spu/binding/util/distributed.py:152, in NodeClient.run(self, fn, *args, **kwargs)
150 """Run a function on the corresponding node server"""
151 self._check_args(*args, **kwargs)
--> 152 return self._call(self._stub.Run, fn, *args, **kwargs)
File ~/anaconda3/envs/secretflow/lib/python3.8/site-packages/spu/binding/util/distributed.py:143, in NodeClient._call(self, stub_method, fn, *args, **kwargs)
139 rsp_gen = stub_method(
140 RunRequest(data=split) for split in split_message(payload)
141 )
142 rsp_data = rebuild_messages(rsp_itr.data for rsp_itr in rsp_gen)
--> 143 result = pickle.loads(rsp_data)
144 if isinstance(result, Exception):
145 raise Exception("remote exception", result)
ModuleNotFoundError: No module named '__mp_main__'
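One plausible mechanism for this error (not confirmed in the report): on macOS, Python 3.8's `multiprocessing` uses the `spawn` start method by default, and a spawned child imports the parent's main script under the module name `__mp_main__`. Standard `pickle` stores classes and functions by module and qualified name, so if the SPU node servers run as such children (an assumption), anything they pickle that is defined in their main module is recorded as `__mp_main__.<name>`, and `pickle.loads` in the driver (here a Jupyter kernel, where `__main__` is the notebook) cannot import it. The sketch below reproduces only the unpickling half with a stand-in module; `reproduce_mp_main_error` and `Payload` are hypothetical names.

```python
import pickle
import sys
import types


def reproduce_mp_main_error() -> str:
    """Pickle an object whose class claims to live in '__mp_main__', then
    unpickle it after that module is gone, as a driver process would see it."""
    saved = sys.modules.pop("__mp_main__", None)  # leave any real module untouched

    class Payload:  # hypothetical stand-in for a user-defined class or function
        pass

    # Make pickle treat Payload as a top-level name of module '__mp_main__'.
    fake_main = types.ModuleType("__mp_main__")
    Payload.__module__ = "__mp_main__"
    Payload.__qualname__ = "Payload"
    fake_main.Payload = Payload

    sys.modules["__mp_main__"] = fake_main  # "child" side: the module exists
    try:
        blob = pickle.dumps(Payload())  # stores a reference: __mp_main__.Payload
    finally:
        del sys.modules["__mp_main__"]  # "driver" side: no such module

    try:
        pickle.loads(blob)
        return "no error"
    except ModuleNotFoundError as err:
        return str(err)
    finally:
        if saved is not None:
            sys.modules["__mp_main__"] = saved


if __name__ == "__main__":
    print(reproduce_mp_main_error())  # No module named '__mp_main__'
```

If this is the cause, workarounds worth trying are running the tutorial as a plain script instead of in a notebook, or moving `make_rand` and `greater` into a separate module that every process can import.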
Issue type: Bug | Source: binary | Version: latest | OS: ubuntu 18.04 | Python: 3.8.13
When I ran the demo with my own, much larger dataset, the following error occurred:
2022-07-14 09:14:38,001 WARNING worker.py:1416 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffff86b5b6b3ddbc49317ecc0c5801000000 Worker ID: 5e383a84459651f44846404f76675b3a8dd9e9ac626ff4e286e39455 Node ID: e0f67ccadcf8945daacf7665de1242222467fc5e633957d4a381300d Worker IP address: 172.17.0.2 Worker port: 41807 Worker PID: 10310
Traceback (most recent call last):
File "jax_fk.py", line 178, in <module>
params = sf.reveal(params_spu)
File "/usr/local/lib/python3.8/site-packages/secretflow/device/driver.py", line 158, in reveal
value_obj = ray.get(value_ref)
File "/usr/local/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/ray/worker.py", line 1845, in get
raise value
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
class_name: SPURuntime
actor_id: 86b5b6b3ddbc49317ecc0c5801000000
pid: 10310
namespace: 3b837c97-994a-4894-bdbc-f3e070b54a9e
ip: 172.17.0.2
The actor is dead because its worker process has died. Worker exit type: SYSTEM_ERROR_EXIT
(SPURuntime pid=10311) I0714 09:14:38.084491 10471 external/com_github_brpc_brpc/src/brpc/socket.cpp:2202] Checking Socket{id=0 addr=127.0.0.1:36067} (0x563ea08fe600)
The demo is at https://secretflow.readthedocs.io/en/latest/tutorial/nn_with_spu.html.
Dataset info:
x1 is : [[0.00291545 0.1 0. ... 0. 0.41176471 0.7852172 ]
[0.00291545 0.1 0. ... 0. 0.41176471 0.7852172 ]
[0.00291545 0.1 0.00980392 ... 0. 0.41176471 1. ]
...
[0. 0. 0. ... 0. 0.41176471 0.7852172 ]
[0. 0. 0. ... 0. 0.41176471 0.5 ]
[0.0058309 0.1 0.01960784 ... 0. 0.41176471 0.7852172 ]]
x1 type : <class 'numpy.ndarray'>
x1 dtype : float64
x1 shape : (106501, 400)
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import time
from jax.example_libraries import stax
from jax.example_libraries.stax import (
    Dense,
    Relu,
    Softplus,
    Dropout,
    Sigmoid,
)
import jax
import jax.numpy as jnp
from jax.example_libraries import optimizers, stax
def load_fk_dataset(party_id) -> (np.ndarray, np.ndarray):
    if party_id == 1:
        data = pd.read_csv("./data/train_fk_cu.csv")
        features = data.drop(["example_id"], axis=1)
        return features.to_numpy(), None
    else:
        data = pd.read_csv("./data/train_fk_jd.csv")
        labels = data["label"]
        features = data.drop(["label", "example_id"], axis=1)
        return features.to_numpy(), labels.to_numpy()
def load_train_dataset(party_id=None) -> (np.ndarray, np.ndarray):
    features, label = load_breast_cancer(return_X_y=True)
    scaler = StandardScaler()
    features = scaler.fit_transform(features)
    X_train, _, y_train, _ = train_test_split(
        features, label, test_size=0.8, random_state=42
    )
    if party_id:
        if party_id == 1:
            return X_train[:, 15:], _
        else:
            return X_train[:, :15], y_train
    else:
        return X_train, y_train
def load_test_dataset():
    features, label = load_breast_cancer(return_X_y=True)
    scaler = StandardScaler()
    features = scaler.fit_transform(features)
    _, X_test, _, y_test = train_test_split(
        features, label, test_size=0.8, random_state=42
    )
    return X_test, y_test
def MLP():
    nn_init, nn_apply = stax.serial(
        Dense(128*2),
        Relu,
        Dense(710),
        Relu,
        Dense(400),
        Relu,
        Dense(100),
        Softplus,
        Dense(50),
        Softplus,
        Dense(1),
        Sigmoid,
    )
    return nn_init, nn_apply
KEY = jax.random.PRNGKey(0)
INPUT_SHAPE = (-1, 731)
def init_state(learning_rate):
    init_fun, _ = MLP()
    _, params_init = init_fun(KEY, INPUT_SHAPE)
    opt_init, _, _ = optimizers.sgd(learning_rate)
    opt_state = opt_init(params_init)
    return opt_state
def train(
    train_x1,
    train_x2,
    train_y,
    opt_state,
    learning_rate,
    epochs,
    batch_size,
):
    train_x = jnp.concatenate([train_x1, train_x2], axis=1)
    _, predict_fun = MLP()
    _, opt_update, get_params = optimizers.sgd(learning_rate)
    def update_model(state, imgs, labels, i):
        def mse(y, pred):
            return jnp.mean(jnp.multiply(y - pred, y - pred) / 2.0)
        def loss_func(params):
            y = predict_fun(params, imgs)
            return mse(y, labels), y
        grad_fn = jax.value_and_grad(loss_func, has_aux=True)
        (loss, y), grads = grad_fn(get_params(state))
        return opt_update(i, grads, state)
    import time
    for i in range(1, epochs + 1):
        begin = time.time()
        imgs_batchs = jnp.array_split(train_x, len(train_x) / batch_size, axis=0)
        labels_batchs = jnp.array_split(train_y, len(train_y) / batch_size, axis=0)
        for batch_idx, (batch_images, batch_labels) in enumerate(
            zip(imgs_batchs, labels_batchs)
        ):
            opt_state = update_model(opt_state, batch_images, batch_labels, i)
        end = time.time()
        print("epoch-{} cost time:{} s".format(i, end - begin))
    return get_params(opt_state)
from sklearn.metrics import roc_auc_score
def validate_model(params, X_test, y_test):
    _, predict_fun = MLP()
    y_pred = predict_fun(params, X_test)
    return roc_auc_score(y_test, y_pred)
if __name__ == "__main__":
    import jax
    # Load the data
    x1, _ = load_fk_dataset(party_id=1)
    print("x1 is :", x1)
    print("x1 type :", type(x1))
    print("x1 dtype :", x1.dtype)
    print("x1 shape :", x1.shape)
    x2, y = load_fk_dataset(party_id=2)
    # Hyperparameters
    batch_size = 100
    epochs = 10
    learning_rate = 0.1
    # Load the data
    import secretflow as sf
    # In case you have a running secretflow runtime already.
    sf.shutdown()
    sf.init(['alice', 'bob'], num_cpus=16, log_to_driver=True)
    alice, bob = sf.PYU('alice'), sf.PYU('bob')
    spu = sf.SPU(sf.utils.testing.cluster_def(['alice', 'bob']))
    x1, _ = alice(load_fk_dataset)(party_id=1)
    x2, y = bob(load_fk_dataset)(party_id=2)
    device = spu
    x1_, x2_, y_ = x1.to(device), x2.to(device), y.to(device)
    init_params_ = sf.to(spu, lambda: init_state(learning_rate))
    begin = time.time()
    params_spu = spu(train, static_argnames=['learning_rate', 'epochs', 'batch_size'])(
        x1_, x2_, y_, init_params_, learning_rate=learning_rate, epochs=epochs, batch_size=batch_size
    )
    params = sf.reveal(params_spu)
    print("train cost time:", time.time() - begin)
    # print(params)
    X_test, y_test = load_test_dataset()
    auc = validate_model(params, X_test, y_test)
    print(f'auc={auc}')
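A rough size estimate suggests why the larger dataset may kill the `SPURuntime` actor (a hypothesis, not a confirmed diagnosis). `train` uses ordinary Python loops, so when SPU traces it with JAX every `update_model` call is unrolled into the compiled program: with 106501 rows and `batch_size = 100`, the split yields `int(106501 / 100) = 1065` batches (assuming NumPy-style truncation of the float section count), and 10 epochs give 10650 unrolled forward/backward passes through the six-layer MLP. The secret-shared input is also large: `x1` alone has 106501 × 400 elements. The sketch below redoes this arithmetic; the 16 bytes per element is an assumption (one 128-bit share per party, as with an FM128 field) and it ignores every intermediate tensor.

```python
def unrolled_steps(n_rows: int, batch_size: int, epochs: int) -> int:
    # Assumes NumPy-style array_split semantics: a float section count is
    # truncated with int(); the Python loops are unrolled at trace time.
    sections = int(n_rows / batch_size)
    return sections * epochs


def share_bytes(n_rows: int, n_cols: int, bytes_per_element: int = 16) -> int:
    # Assumption: each party holds one 128-bit (16-byte) share per element.
    return n_rows * n_cols * bytes_per_element


steps = unrolled_steps(106501, 100, 10)
x1_bytes = share_bytes(106501, 400)
print(steps)                       # 10650
print(round(x1_bytes / 2**30, 2))  # 0.63 (GiB per party, x1 only)
```

Shrinking `epochs` or the row count on a test run, and watching the memory of the `SPURuntime` processes, would be a cheap way to check whether program size or data size is the limiting factor.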
I see that requirements.txt pins secretflow-ray==2.0.0.dev0. What is the difference between secretflow-ray and ray, and is the related code open source?
Issue type: Documentation Bug | Source: source | Version: 0.6.13 | OS: Ubuntu18.04 | Python: 3.8.13
What is the difference between SS-LR/XGB and HESS-LR/XGB?
Issue type: Build/Install | Source: source | Version: latest
When I run `pip install -U secretflow`, I get:
ERROR: Could not find a version that satisfies the requirement secretflow (from versions: none)
ERROR: No matching distribution found for secretflow
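`from versions: none` usually means pip found no distribution of the package compatible with the running interpreter and platform, or that the configured package index does not host it at all. A first step is to confirm what pip is matching against. The sketch below prints the interpreter version and base platform string using only the standard library; which Python versions and platforms the secretflow wheels actually support is not stated in this report, so `SUPPORTED` is a hypothetical placeholder to fill in from the file list on PyPI.

```python
import platform
import sys
import sysconfig

# Hypothetical placeholder: replace with the (python, platform) pairs that the
# published wheels actually provide.
SUPPORTED = {("3.8", "linux-x86_64")}


def current_target() -> tuple:
    """Return (Python major.minor, base platform string) for this interpreter."""
    py = f"{sys.version_info.major}.{sys.version_info.minor}"
    return py, sysconfig.get_platform()


def is_listed(target: tuple, supported: set = SUPPORTED) -> bool:
    return target in supported


if __name__ == "__main__":
    target = current_target()
    print("interpreter:", platform.python_implementation(), target[0])
    print("platform:", target[1])
    print("listed as supported:", is_listed(target))
```

pip derives its wheel tags (for example the manylinux ones) from values like these; `pip debug --verbose` lists the full set of tags the current pip accepts.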
Issue type: Build/Install | Source: binary | Version: latest
I have three servers with public IPs: A, B and C.
A starts Ray as the head node: ray start --head --node-ip-address="172.17.0.12" --port="6379" --resources='{"alice": 8}'
B and C each connect to A's public IP and port:
B: ray start --address="<A public IP>:6379" --resources='{"bob": 8}'
C: ray start --address="<A public IP>:6379" --resources='{"charlie": 8}'
`ray status` shows that the cluster started normally:
(secretflow) ubuntu@VM-0-12-ubuntu:~$ ray status
======== Autoscaler status: 2022-07-19 11:55:11.912311 ========
Node status
---------------------------------------------------------------
Healthy:
1 node_1b2eea51cd2e55ee525b52f7ab9d1936e39cfe0836dfcf8879385c98
1 node_de20b9bdc340e6e008389ac8a1e014be3fe6a0d23d9cbe8e19cea057
1 node_8c9cd0fe5c5a53234aebcc11559d688a88f088dc47c272ee46e20782
Pending:
(no pending nodes)
Recent failures:
(no failures)
Resources
---------------------------------------------------------------
Usage:
0.0/2.0 CPU
0.0/8.0 alice
0.00/1.744 GiB memory
0.00/0.872 GiB object_store_memory
Demands:
(no resource demands)
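My understanding is that I should be able to run the code on node A and have the computation scheduled onto B's device, but I'm not sure that's right. In any case, `sf.reveal` hangs. The code is below (I'm not very familiar with Ray, so I can't tell where the problem is).
I also have two more questions:
1. How can I prove that the computation was done by B (for example, through some logs)?
2. How can I make sure that only C can see the final result?
Any help is appreciated, thanks!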
>>> import secretflow as sf
>>> sf.init(address='172.17.0.12:6379')
>>> b = sf.PYU('bob')
>>> import numpy as np
>>> data = b(np.random.rand)(3, 4)
>>> sf.reveal(data)
After exiting with Ctrl+C, it printed this traceback:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ubuntu/anaconda3/envs/secretflow/lib/python3.8/site-packages/secretflow/device/driver.py", line 158, in reveal
value_obj = ray.get(value_ref)
File "/home/ubuntu/anaconda3/envs/secretflow/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/home/ubuntu/anaconda3/envs/secretflow/lib/python3.8/site-packages/ray/worker.py", line 1837, in get
values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
File "/home/ubuntu/anaconda3/envs/secretflow/lib/python3.8/site-packages/ray/worker.py", line 364, in get_objects
data_metadata_pairs = self.core_worker.get_objects(
File "python/ray/_raylet.pyx", line 1200, in ray._raylet.CoreWorker.get_objects
File "python/ray/_raylet.pyx", line 169, in ray._raylet.check_status
KeyboardInterrupt
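One thing worth checking (a suggestion, not a confirmed diagnosis): the `ray status` output above lists three healthy nodes but only the `alice` custom resource and 2 CPUs. If `sf.PYU('bob')` needs the cluster to offer a `bob` resource (an assumption about how SecretFlow places PYU work), a Ray task whose resource request no node can satisfy stays pending, which would make `sf.reveal` block. Note also that the head was started with the private `--node-ip-address` 172.17.0.12 while B and C join through A's public IP; Ray nodes generally need to reach each other at the addresses they advertise. The sketch compares the custom resources the cluster reports with the party names you expect; `missing_parties` and `check_cluster` are hypothetical helpers, while `ray.init` and `ray.cluster_resources` are standard Ray APIs.

```python
from typing import Dict, Iterable, List


def missing_parties(resources: Dict[str, float], parties: Iterable[str]) -> List[str]:
    """Return the party names that have no (or zero) custom resource in the cluster."""
    return [p for p in parties if resources.get(p, 0) <= 0]


def check_cluster(address: str, parties=("alice", "bob", "charlie")) -> List[str]:
    """Connect to a running Ray cluster and report parties it cannot schedule on."""
    import ray  # imported lazily so missing_parties works without Ray installed

    ray.init(address=address)
    try:
        return missing_parties(ray.cluster_resources(), parties)
    finally:
        ray.shutdown()
```

Run `check_cluster("172.17.0.12:6379")` on A; with the status shown above one would expect it to report `bob` and `charlie`. For question 1, one way to see where a PYU function actually ran is to have it return its host name, for example `sf.reveal(b(lambda: socket.gethostname())())` after `import socket`; on a working cluster this should print B's host name.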
Issue type: Others | Source: source | Version: beta | OS: Centos7 | Python: 3.8
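In the official SecretFlow tutorials, both `ray start --resources` and the Ray initialization in code use names (alice, bob, charlie), but none of the examples in Ray's official tutorials put names in resources.
Is there any difference in usage between these two cases (named vs. unnamed)?
Here is my own understanding: without names, Ray is just one cluster with no notion of a party, and the functions in a task are scheduled across different nodes of the cluster, i.e. each node runs some of the functions.
With names, there is a notion of a party: a task is submitted to the different parties, and each party runs all the functions in the task?
Is this understanding correct? My setup: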
ray start --head --node-ip-address="192.168.137.3" --port="7000" --resources='{"alice": 8}'
ray start --address="172.16.4.140:6379" --resources='{"bob": 8}'
ray start --address="172.16.4.140:6379" --resources='{"charlie": 8}'
Is there an official inference example?
Issue type: Others | Source: binary | Version: beta
Following the example in the official docs, execution reaches `aggr = SecureAggregator(device=alice, participants=[alice, bob])` and then runs indefinitely without producing a result. How can I track down the problem?
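A generic way to turn an indefinite hang into an actionable error is to run the blocking call with a deadline, then look at `ray status` (pending resource demands show up there) and at the worker logs, which Ray writes under `/tmp/ray/session_latest/logs` by default. The sketch uses only the standard library; `call_with_timeout` is a hypothetical helper, and wrapping `SecureAggregator(...)` in it assumes the hang happens inside that constructor. A timed-out call keeps running in its background thread: the helper only stops the caller from waiting, and the interpreter will still wait for that thread at exit.

```python
import concurrent.futures
import time
from typing import Any, Callable

# Module-level pool; a timed-out call keeps occupying one of these threads.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_timeout(fn: Callable[..., Any], timeout_s: float, *args, **kwargs) -> Any:
    """Run fn(*args, **kwargs) in a worker thread; raise TimeoutError after timeout_s."""
    future = _POOL.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # Before Python 3.11 this is not the builtin TimeoutError, so convert it.
        name = getattr(fn, "__name__", repr(fn))
        raise TimeoutError(f"{name} did not finish within {timeout_s}s") from None


if __name__ == "__main__":
    try:
        call_with_timeout(time.sleep, 0.2, 1.0)
    except TimeoutError as err:
        print(err)  # sleep did not finish within 0.2s
```

Usage against the example would look like `aggr = call_with_timeout(SecureAggregator, 120, device=alice, participants=[alice, bob])`; the 120-second limit is arbitrary.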
Issue type: Others | Source: source | Version: 0.6.13b1 | OS: Rocky Linux release 8.5 | Python: 3.8.12
(_run pid=1977253) 2022-07-20 11:11:55.857356: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
(_run pid=1977254) 2022-07-20 11:11:56.079320: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
(pid=1977495) 2022-07-20 11:11:56.253085: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
(_run pid=1977250) 2022-07-20 11:11:56.365197: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
(pid=1977496) 2022-07-20 11:11:56.503680: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
(_run pid=1977253) 2022-07-20 11:11:56,939,939 WARNING [xla_bridge.py:backends:265] No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
(_run pid=1977254) 2022-07-20 11:11:57,120,120 WARNING [xla_bridge.py:backends:265] No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
(SPURuntime pid=1977495) I0720 11:11:57.312257 1977495 external/com_github_brpc_brpc/src/brpc/server.cpp:1065] Server[yasl::link::internal::ReceiverServiceImpl] is serving on port=46567.
(SPURuntime pid=1977495) I0720 11:11:57.312313 1977495 external/com_github_brpc_brpc/src/brpc/server.cpp:1068] Check out http://localhost.localdomain:46567 in web browser.
(_run pid=1977250) 2022-07-20 11:11:57,401,401 WARNING [xla_bridge.py:backends:265] No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
(SPURuntime pid=1977495) I0720 11:11:57.413011 1977723 external/com_github_brpc_brpc/src/brpc/socket.cpp:2202] Checking Socket{id=0 addr=127.0.0.1:58775} (0x65ef180)
(SPURuntime pid=1977496) I0720 11:11:57.516299 1977496 external/com_github_brpc_brpc/src/brpc/server.cpp:1065] Server[yasl::link::internal::ReceiverServiceImpl] is serving on port=58775.
(SPURuntime pid=1977496) I0720 11:11:57.516340 1977496 external/com_github_brpc_brpc/src/brpc/server.cpp:1068] Check out http://localhost.localdomain:58775 in web browser.
(SPURuntime pid=1977495) I0720 11:12:00.413251 1977722 external/com_github_brpc_brpc/src/brpc/socket.cpp:2262] Revived Socket{id=0 addr=127.0.0.1:58775} (0x65ef180) (Connectable)
2022-07-20 11:12:01,757 ERROR worker.py:94 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::SPURuntime.run() (pid=1977495, ip=10.180.216.25, repr=<secretflow.device.device.spu.SPURuntime object at 0x7fc1af31ea00>)
IndexError: tuple index out of range
The above exception was the direct cause of the following exception:
ray::SPURuntime.run() (pid=1977495, ip=10.180.216.25, repr=<secretflow.device.device.spu.SPURuntime object at 0x7fc1af31ea00>)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/ray/workers/default_worker.py", line 238, in <module>
ray.worker.global_worker.main_loop()
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/secretflow/device/device/spu.py", line 196, in run
cfn, output = jax.xla_computation(fn, return_shape=True)(*args, **kwargs)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
return fun(*args, **kwargs)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/_src/api.py", line 807, in computation_maker
jaxpr, out_avals, consts = pe.trace_to_jaxpr_dynamic(jaxtree_fun, avals)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/_src/profiler.py", line 206, in wrapper
return func(*args, **kwargs)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 1779, in trace_to_jaxpr_dynamic
jaxpr, out_avals, consts = trace_to_subjaxpr_dynamic(
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 1816, in trace_to_subjaxpr_dynamic
ans = fun.call_wrapped(*in_tracers_)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/linear_util.py", line 168, in call_wrapped
ans = self.f(*args, **dict(self.params, **kwargs))
File "/tmp/ipykernel_1290442/1368379063.py", line 2, in abc
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/core.py", line 528, in __len__
return self.aval._len(self)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/core.py", line 1335, in _len
raise TypeError("len() of unsized object") from err # same as numpy error
jax._src.traceback_util.UnfilteredStackTrace: TypeError: len() of unsized object
The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.
--------------------
The above exception was the direct cause of the following exception:
ray::SPURuntime.run() (pid=1977495, ip=10.180.216.25, repr=<secretflow.device.device.spu.SPURuntime object at 0x7fc1af31ea00>)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/secretflow/device/device/spu.py", line 196, in run
cfn, output = jax.xla_computation(fn, return_shape=True)(*args, **kwargs)
File "/tmp/ipykernel_1290442/1368379063.py", line 2, in abc
TypeError: len() of unsized object
---------------------------------------------------------------------------
RayTaskError(TypeError) Traceback (most recent call last)
Input In [273], in <cell line: 2>()
1 w = device(abc)(x1_,x2_,y_)
----> 2 model_w = sf.reveal(w)
3 model_w
File ~/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/secretflow/device/driver.py:158, in reveal(func_or_object)
155 value_ref.append(value.device.sk_keeper.decrypt.remote(value.data))
156 value_idx.append(i)
--> 158 value_obj = ray.get(value_ref)
159 idx = 0
160 for i in value_idx:
File ~/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/ray/_private/client_mode_hook.py:105, in client_mode_hook.<locals>.wrapper(*args, **kwargs)
103 if func.__name__ != "init" or is_client_mode_enabled_by_default:
104 return getattr(ray, func.__name__)(*args, **kwargs)
--> 105 return func(*args, **kwargs)
File ~/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/ray/worker.py:1843, in get(object_refs, timeout)
1841 worker.core_worker.dump_object_store_memory_usage()
1842 if isinstance(value, RayTaskError):
-> 1843 raise value.as_instanceof_cause()
1844 else:
1845 raise value
RayTaskError(TypeError): ray::SPURuntime.run() (pid=1977496, ip=10.180.216.25, repr=<secretflow.device.device.spu.SPURuntime object at 0x7fe6544a6a00>)
IndexError: tuple index out of range
The above exception was the direct cause of the following exception:
ray::SPURuntime.run() (pid=1977496, ip=10.180.216.25, repr=<secretflow.device.device.spu.SPURuntime object at 0x7fe6544a6a00>)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/ray/workers/default_worker.py", line 238, in <module>
ray.worker.global_worker.main_loop()
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/secretflow/device/device/spu.py", line 196, in run
cfn, output = jax.xla_computation(fn, return_shape=True)(*args, **kwargs)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
return fun(*args, **kwargs)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/_src/api.py", line 807, in computation_maker
jaxpr, out_avals, consts = pe.trace_to_jaxpr_dynamic(jaxtree_fun, avals)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/_src/profiler.py", line 206, in wrapper
return func(*args, **kwargs)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 1779, in trace_to_jaxpr_dynamic
jaxpr, out_avals, consts = trace_to_subjaxpr_dynamic(
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 1816, in trace_to_subjaxpr_dynamic
ans = fun.call_wrapped(*in_tracers_)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/linear_util.py", line 168, in call_wrapped
ans = self.f(*args, **dict(self.params, **kwargs))
File "/tmp/ipykernel_1290442/1368379063.py", line 2, in abc
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/core.py", line 528, in __len__
return self.aval._len(self)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/core.py", line 1335, in _len
raise TypeError("len() of unsized object") from err # same as numpy error
jax._src.traceback_util.UnfilteredStackTrace: TypeError: len() of unsized object
The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.
--------------------
The above exception was the direct cause of the following exception:
ray::SPURuntime.run() (pid=1977496, ip=10.180.216.25, repr=<secretflow.device.device.spu.SPURuntime object at 0x7fe6544a6a00>)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/secretflow/device/device/spu.py", line 196, in run
cfn, output = jax.xla_computation(fn, return_shape=True)(*args, **kwargs)
File "/tmp/ipykernel_1290442/1368379063.py", line 2, in abc
TypeError: len() of unsized object
(SPURuntime pid=1977495) WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
(SPURuntime pid=1977496) WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
2022-07-20 11:12:06,945 ERROR worker.py:94 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::SPURuntime.get_var() (pid=1977495, ip=10.180.216.25, repr=<secretflow.device.device.spu.SPURuntime object at 0x7fc1af31ea00>)
At least one of the input arguments for this task could not be computed:
ray.exceptions.RayTaskError: ray::SPURuntime.run() (pid=1977496, ip=10.180.216.25, repr=<secretflow.device.device.spu.SPURuntime object at 0x7fe6544a6a00>)
IndexError: tuple index out of range
The above exception was the direct cause of the following exception:
ray::SPURuntime.run() (pid=1977496, ip=10.180.216.25, repr=<secretflow.device.device.spu.SPURuntime object at 0x7fe6544a6a00>)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/ray/workers/default_worker.py", line 238, in <module>
ray.worker.global_worker.main_loop()
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/secretflow/device/device/spu.py", line 196, in run
cfn, output = jax.xla_computation(fn, return_shape=True)(*args, **kwargs)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
return fun(*args, **kwargs)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/_src/api.py", line 807, in computation_maker
jaxpr, out_avals, consts = pe.trace_to_jaxpr_dynamic(jaxtree_fun, avals)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/_src/profiler.py", line 206, in wrapper
return func(*args, **kwargs)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 1779, in trace_to_jaxpr_dynamic
jaxpr, out_avals, consts = trace_to_subjaxpr_dynamic(
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 1816, in trace_to_subjaxpr_dynamic
ans = fun.call_wrapped(*in_tracers_)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/linear_util.py", line 168, in call_wrapped
ans = self.f(*args, **dict(self.params, **kwargs))
File "/tmp/ipykernel_1290442/1368379063.py", line 2, in abc
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/core.py", line 528, in __len__
return self.aval._len(self)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/core.py", line 1335, in _len
raise TypeError("len() of unsized object") from err # same as numpy error
jax._src.traceback_util.UnfilteredStackTrace: TypeError: len() of unsized object
The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.
--------------------
The above exception was the direct cause of the following exception:
ray::SPURuntime.run() (pid=1977496, ip=10.180.216.25, repr=<secretflow.device.device.spu.SPURuntime object at 0x7fe6544a6a00>)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/secretflow/device/device/spu.py", line 196, in run
cfn, output = jax.xla_computation(fn, return_shape=True)(*args, **kwargs)
File "/tmp/ipykernel_1290442/1368379063.py", line 2, in abc
TypeError: len() of unsized object
2022-07-20 11:12:06,946 ERROR worker.py:94 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::SPURuntime.get_var() (pid=1977496, ip=10.180.216.25, repr=<secretflow.device.device.spu.SPURuntime object at 0x7fe6544a6a00>)
At least one of the input arguments for this task could not be computed:
ray.exceptions.RayTaskError: ray::SPURuntime.run() (pid=1977496, ip=10.180.216.25, repr=<secretflow.device.device.spu.SPURuntime object at 0x7fe6544a6a00>)
IndexError: tuple index out of range
The above exception was the direct cause of the following exception:
ray::SPURuntime.run() (pid=1977496, ip=10.180.216.25, repr=<secretflow.device.device.spu.SPURuntime object at 0x7fe6544a6a00>)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/ray/workers/default_worker.py", line 238, in <module>
ray.worker.global_worker.main_loop()
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/secretflow/device/device/spu.py", line 196, in run
cfn, output = jax.xla_computation(fn, return_shape=True)(*args, **kwargs)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
return fun(*args, **kwargs)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/_src/api.py", line 807, in computation_maker
jaxpr, out_avals, consts = pe.trace_to_jaxpr_dynamic(jaxtree_fun, avals)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/_src/profiler.py", line 206, in wrapper
return func(*args, **kwargs)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 1779, in trace_to_jaxpr_dynamic
jaxpr, out_avals, consts = trace_to_subjaxpr_dynamic(
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 1816, in trace_to_subjaxpr_dynamic
ans = fun.call_wrapped(*in_tracers_)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/linear_util.py", line 168, in call_wrapped
ans = self.f(*args, **dict(self.params, **kwargs))
File "/tmp/ipykernel_1290442/1368379063.py", line 2, in abc
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/core.py", line 528, in __len__
return self.aval._len(self)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/jax/core.py", line 1335, in _len
raise TypeError("len() of unsized object") from err # same as numpy error
jax._src.traceback_util.UnfilteredStackTrace: TypeError: len() of unsized object
The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.
--------------------
The above exception was the direct cause of the following exception:
ray::SPURuntime.run() (pid=1977496, ip=10.180.216.25, repr=<secretflow.device.device.spu.SPURuntime object at 0x7fe6544a6a00>)
File "/root/.pyenv/versions/3.8.12/envs/secret_flow/lib/python3.8/site-packages/secretflow/device/device/spu.py", line 196, in run
cfn, output = jax.xla_computation(fn, return_shape=True)(*args, **kwargs)
File "/tmp/ipykernel_1290442/1368379063.py", line 2, in abc
TypeError: len() of unsized object
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
def load_train_dataset(party_id=None) -> (np.ndarray, np.ndarray):
    features, label = load_breast_cancer(return_X_y=True)
    scaler = StandardScaler()
    features = scaler.fit_transform(features)
    X_train, _, y_train, _ = train_test_split(
        features, label, test_size=0.8, random_state=42
    )
    if party_id:
        if party_id == 1:
            return X_train[:, 15:], _
        else:
            return X_train[:, :15], y_train
    else:
        return X_train, y_train
def load_test_dataset():
    features, label = load_breast_cancer(return_X_y=True)
    scaler = StandardScaler()
    features = scaler.fit_transform(features)
    _, X_test, _, y_test = train_test_split(
        features, label, test_size=0.8, random_state=42
    )
    return X_test, y_test
import secretflow as sf
# In case you have a running secretflow runtime already.
sf.shutdown()
sf.init(['alice', 'bob'], num_cpus=8, log_to_driver=True)
alice, bob = sf.PYU('alice'), sf.PYU('bob')
spu = sf.SPU(sf.utils.testing.cluster_def(['alice', 'bob']))
x1, _ = alice(load_train_dataset)(party_id=1)
x2, y = bob(load_train_dataset)(party_id=2)
device = spu
x1_, x2_, y_ = (
    x1.to(device),
    x2.to(device),
    y.to(device),
)
x1_, x2_, y_
import torch
from torch import nn
import numpy
class Lr(nn.Module):
    def __init__(self):
        super(Lr, self).__init__()  # call the parent class __init__
        self.linear = nn.Linear(30, 1)
    def forward(self, x1, x2):
        x = numpy.concatenate([x1, x2], axis=1)
        out = self.linear(torch.tensor(x))
        return out
def abc(x1, x2, y):
    x1 = torch.tensor(x1, dtype=torch.float64)
    x2 = torch.tensor(x2, dtype=torch.float64)
    y = torch.tensor(y, dtype=torch.float64)
    model = Lr()
    criterion = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    for i in range(100):
        model = model.double()
        y_predict = model(x1, x2)
        loss = criterion(torch.tensor(np.expand_dims(y, axis=1), dtype=torch.float64), y_predict)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return dict(model.state_dict())['linear.weight'].numpy()
w = device(abc)(x1_, x2_, y_)
model_w = sf.reveal(w)
model_w
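The traceback points to the cause: the SPU device compiles the function by tracing it with JAX (`jax.xla_computation` in `spu.py`), so `x1`, `x2` and `y` arrive inside `abc` as JAX tracers. `torch.tensor(x1)` appears to treat a tracer as a nested Python sequence and ends up calling `len()` on a 0-d tracer, hence `len() of unsized object`. Code run on an SPU needs to be written with `jax.numpy` operations. Below is a minimal sketch of the same linear model (30 inputs, 1 output, MSE loss, learning rate 1e-3, 100 steps) trained by plain gradient descent in a form JAX can trace; `train_lr` is a hypothetical name, weights start at zero rather than PyTorch's random init, and whether every operation is supported by a given SPU release has not been verified.

```python
import jax
import jax.numpy as jnp


def train_lr(x1, x2, y, lr=1e-3, steps=100):
    """Linear regression with MSE loss and plain gradient descent, in jax.numpy."""
    x = jnp.concatenate([x1, x2], axis=1)
    y = jnp.reshape(y, (-1, 1)).astype(x.dtype)
    w = jnp.zeros((x.shape[1], 1), dtype=x.dtype)
    b = jnp.zeros((1,), dtype=x.dtype)

    def loss_fn(params):
        w, b = params
        pred = x @ w + b
        return jnp.mean((pred - y) ** 2)

    grad_fn = jax.grad(loss_fn)
    params = (w, b)
    for _ in range(steps):  # unrolled at trace time; fine for a small step count
        gw, gb = grad_fn(params)
        params = (params[0] - lr * gw, params[1] - lr * gb)
    return params[0].T  # shape (1, n_features), like nn.Linear's weight
```

On SecretFlow this would be called the same way as `abc`, e.g. `w = device(train_lr)(x1_, x2_, y_)` followed by `sf.reveal(w)`.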
Issue type: Others | Source: source | Version: 0.6.13b1 | OS: Rocky Linux release 8.5 | Python: 3.8.12
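Is there an example of assigning IPs and ports to an SPU? See [SPU `__init__`](https://github.com/secretflow/secretflow/blob/beta/secretflow/device/device/spu.py#L386).
I did not set a port when starting the bob node, but the example above requires bob's address. What should I fill in?
Currently I use bob's IP with alice's port, as follows: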
import secretflow as sf
import spu
sf.shutdown()
sf.init(address='<alice_ip>:8881')
alice = sf.PYU('alice')
bob = sf.PYU('bob')
device = sf.SPU({
    'nodes': [
        {
            'party': 'alice',
            'id': 'local:0',
            # The address for other peers.
            'address': '<alice_ip>:8881',
            # The listen address of this node.
            # Optional. Address will be used if listen_address is empty.
            'listen_address': ''
        },
        {
            'party': 'bob',
            'id': 'local:1',
            'address': '<bob_ip>:8881',
            'listen_address': ''
        },
    ],
    'runtime_config': {
        'protocol': spu.spu_pb2.SEMI2K,
        'field': spu.spu_pb2.FM128,
        'sigmoid_mode': spu.spu_pb2.RuntimeConfig.SIGMOID_REAL,
    }
})
data1 = alice(lambda x: x)(2).to(device)
data2 = bob(lambda x: x)(2).to(device)
def add(a, b):
    return a + b
data = device(add)(data1, data2)
sf.reveal(data)
Calling reveal raises the following error:
RayActorError Traceback (most recent call last)
Input In [7], in <cell line: 2>()
1 data = device(add)(data1,data2)
----> 2 sf.reveal(data)
File ~/.pyenv/versions/3.8.12/envs/secretflow/lib/python3.8/site-packages/secretflow/device/driver.py:158, in reveal(func_or_object)
155 value_ref.append(value.device.sk_keeper.decrypt.remote(value.data))
156 value_idx.append(i)
--> 158 value_obj = ray.get(value_ref)
159 idx = 0
160 for i in value_idx:
File ~/.pyenv/versions/3.8.12/envs/secretflow/lib/python3.8/site-packages/ray/_private/client_mode_hook.py:105, in client_mode_hook.<locals>.wrapper(*args, **kwargs)
103 if func.__name__ != "init" or is_client_mode_enabled_by_default:
104 return getattr(ray, func.__name__)(*args, **kwargs)
--> 105 return func(*args, **kwargs)
File ~/.pyenv/versions/3.8.12/envs/secretflow/lib/python3.8/site-packages/ray/worker.py:1845, in get(object_refs, timeout)
1843 raise value.as_instanceof_cause()
1844 else:
-> 1845 raise value
1847 if is_individual_id:
1848 values = values[0]
RayActorError: The actor died because of an error raised in its creation task, ray::SPURuntime.__init__() (pid=21797, ip=172.22.56.85, repr=<secretflow.device.device.spu.SPURuntime object at 0x7fc6d3326c10>)
File "/root/.pyenv/versions/3.8.12/envs/secretflow/lib/python3.8/site-packages/secretflow/device/device/spu.py", line 125, in __init__
self.link = link.create_brpc(desc, rank)
RuntimeError: what:
[external/yasl/yasl/link/context.cc:140] connect to mesh failed, failed to setup connection to rank=0
stacktrace:
#0 pybind11::cpp_function::initialize<>()::{lambda()#3}::_FUN()+0x7fc728c28ed7
#1 pybind11::cpp_function::dispatcher()+0x7fc728c150cb
#2 PyCFunction_Call+0x43be5a
Also, is there a more detailed example of deploying a cluster?
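The error above says the SPU link could not set up a connection to rank 0, which is presumably the first entry in `nodes` (alice). Each `address` has to be an `ip:port` that the other party can reach and that is free on its own host. In the snippet, alice's SPU address reuses the port passed to `sf.init` (8881); if the Ray head is listening there, the SPU link cannot use it, which is one plausible cause, though the report does not confirm it. A minimal sketch of a helper that builds the `nodes` list in the same shape as the config above while rejecting duplicate addresses and the Ray address; `make_cluster_def` is a hypothetical name, not a SecretFlow API:

```python
def make_cluster_def(party_addresses, runtime_config, ray_address=None):
    """Build an SPU cluster dict shaped like the example config.

    party_addresses: ordered mapping of party name -> "ip:port" that the
    other parties can reach. The order determines each node's rank.
    """
    seen = set()
    nodes = []
    for rank, (party, address) in enumerate(party_addresses.items()):
        host, sep, port = address.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"{party}: expected 'ip:port', got {address!r}")
        if address in seen:
            raise ValueError(f"{party}: {address} is already used by another party")
        if address == ray_address:
            raise ValueError(f"{party}: {address} is the Ray address; use another port")
        seen.add(address)
        nodes.append({
            "party": party,
            "id": f"local:{rank}",
            "address": address,
            # Left empty: per the comment in the example, `address` is then
            # also used as the listen address.
            "listen_address": "",
        })
    return {"nodes": nodes, "runtime_config": runtime_config}
```

For the two-party case this would be used as `sf.SPU(make_cluster_def({"alice": "<alice_ip>:9001", "bob": "<bob_ip>:9002"}, runtime_config={...}, ray_address="<alice_ip>:8881"))`, where the ports 9001/9002 are arbitrary examples and both must be open between the two hosts.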
Error downloading object: tests/datasets/adult/horizontal/adult.alice.npy (b67b7b2): Smudge error: Error downloading tests/datasets/adult/horizontal/adult.alice.npy (b67b7b234b61e53fbc989be38dd08c4f19dd5a910b986b70a8982f47e3c4465d): batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
Errors logged to /home/vscode/secretflow-modelinghub/.git/lfs/logs/20220615T083829.124020079.log
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: tests/datasets/adult/horizontal/adult.alice.npy: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry the checkout with 'git checkout -f HEAD'
In the developer-docs demo for federated learning on image classification, the code that imports the FL model appears to be wrong.
Original code:
from secretflow.security.aggregation import SpuAggregator, SecureAggregator
from secretflow.ml.nn import FLModelTF
Corrected code:
from secretflow.security.aggregation import SPUAggregator, SecureAggregator
from secretflow.ml.nn import FLModelTF
After switching to the code above, running it fails with:
File ~/miniconda3/envs/tensorflow2/lib/python3.8/site-packages/secretflow/security/privacy/mechanism/tensorflow/layers.py:23, in
17 from abc import ABC, abstractmethod
19 from secretflow.security.privacy.accounting.rdp_accountant import (
20 get_rdp,
21 get_privacy_spent_rdp,
22 )
---> 23 import secretflow.security.privacy._lib.random as random
26 class EmbeddingDP(tf.keras.layers.Layer, ABC):
27 def __init__(self) -> None:
ModuleNotFoundError: No module named 'secretflow.security.privacy._lib'
How can this be fixed?
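The failing import is the private subpackage `secretflow.security.privacy._lib`. One possible explanation (an assumption, not confirmed in the report) is that the installed SecretFlow build does not ship that subpackage, for example because the docs and the installed package are different versions. A small standard-library check can tell whether a module is importable in the current environment without running the whole demo; `module_available` is a hypothetical helper:

```python
import importlib.util


def module_available(name):
    """Return True if the module `name` can be located in this environment."""
    try:
        # find_spec imports parent packages first, so a missing or broken
        # parent raises ImportError instead of returning None.
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False
```

For example, `module_available("secretflow.security.privacy._lib")` returning `False` means either the subpackage is absent from the installed build or one of its parent packages fails to import.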
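The examples on the official website are all horizontal: "Federate Learning for Image Classification" and "Federate Xgboosts".
One example uses vertically partitioned data, but its algorithm is MPC-based: "Logistic Regression with SPU".
Is there an example of a vertical federated regression algorithm?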
Bug
source
secretflow-0.6.13b1
Linux ubuntu 18.04
python 3.8
bazel 5.1.1
gcc 12.1
Two-party PSI on an unbalanced dataset fails with an error:
alice: 10000000
bob: 100000000
intersection: 100000
logs:
Traceback (most recent call last):
File "/opt/mpc/secretflow/tests/psi2.py", line 16, in <module>
spu.psi_csv('id', input_path, output_path)
File "/opt/mpc/secretflow/secretflow/device/device/spu.py", line 491, in psi_csv
return dispatch('psi_csv', self, key, input_path, output_path, protocol, sort)
File "/opt/mpc/secretflow/secretflow/device/device/register.py", line 111, in dispatch
return _registrar.dispatch(self.device_type, name, self, *args, **kwargs)
File "/opt/mpc/secretflow/secretflow/device/device/register.py", line 80, in dispatch
return self._ops[device_type][name](*args, **kwargs)
File "/opt/mpc/secretflow/secretflow/device/kernels/spu.py", line 177, in psi_csv
return ray.get(res)
File "/opt/mpc/secretflow/venv/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/opt/mpc/secretflow/venv/lib/python3.8/site-packages/ray/worker.py", line 1843, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::SPURuntime.psi_csv() (pid=113075, ip=10.228.21.60, repr=<secretflow.device.device.spu.SPURuntime object at 0x7f2eac002880>)
File "/opt/mpc/secretflow/secretflow/device/device/spu.py", line 368, in psi_csv
libs.kkrt_2pc_psi(
RuntimeError: what:
[external/yasl/yasl/link/transport/channel.cc:86] Get data timeout, key=root:3:ALLGATHER
stacktrace:
#0 yasl::link::Context::RecvInternal()+0x7f2f02b100b2
#1 yasl::link::AllGatherImpl<>()+0x7f2f029c8785
#2 yasl::link::AllGather()+0x7f2f029c8cb4
#3 spu::psi::PsiExecutorBase::Run()+0x7f2f02411709
#4 pybind11::cpp_function::initialize<>()::{lambda()#3}::_FUN()+0x7f2f00c341e8
#5 pybind11::cpp_function::dispatcher()+0x7f2f00c150cb
#6 PyCFunction_Call+0x5eda96
Process finished with exit code 1
import time
import pandas as pd
import secretflow as sf
sf.shutdown()
sf.init(['alice', 'bob', 'carol'], log_to_driver=False)
alice, bob = sf.PYU('alice'), sf.PYU('bob')
spu = sf.SPU(sf.utils.testing.cluster_def(['alice', 'bob']))
input_path = {alice: '.data/alice.csv', bob: '.data/bob.csv'}
output_path = {alice: '.data/alice_psi.csv', bob: '.data/bob_psi.csv'}
time_psi2_begin = time.time()
spu.psi_csv('id', input_path, output_path)
time_psi2_end = time.time()
print('psi2 cost time:', time_psi2_end - time_psi2_begin, 's')
da_psi = pd.read_csv('.data/alice_psi.csv')
db_psi = pd.read_csv('.data/bob_psi.csv')
expected = pd.read_csv('.data/intersection')
print(da_psi.shape[0] == expected.shape[0])
print(db_psi.shape[0] == expected.shape[0])
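To reproduce the unbalanced setup, or to check at smaller sizes whether the timeout depends on the size gap, inputs in the layout the script above expects (an `id` column in `.data/alice.csv` and `.data/bob.csv`, plus `.data/intersection`) can be generated with the standard library. A sketch; `write_psi_inputs` is a hypothetical helper and the ID format is arbitrary:

```python
import csv
import itertools
import os


def _ids(prefix, n):
    # Lazily yield n distinct IDs such as "c0", "c1", ...
    return (f"{prefix}{i}" for i in range(n))


def _write_ids(path, ids):
    # Header row "id", then one ID per row; rows are streamed, not buffered.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id"])
        for i in ids:
            writer.writerow([i])


def write_psi_inputs(data_dir, n_alice, n_bob, n_common):
    """Write alice.csv, bob.csv and the expected intersection file.

    n_common IDs (prefix "c") are shared; the rest are unique to each
    party (prefixes "a" and "b"), so the true intersection is exactly
    the shared block.
    """
    if n_common > min(n_alice, n_bob):
        raise ValueError("n_common cannot exceed either party's size")
    os.makedirs(data_dir, exist_ok=True)
    _write_ids(os.path.join(data_dir, "alice.csv"),
               itertools.chain(_ids("c", n_common), _ids("a", n_alice - n_common)))
    _write_ids(os.path.join(data_dir, "bob.csv"),
               itertools.chain(_ids("c", n_common), _ids("b", n_bob - n_common)))
    _write_ids(os.path.join(data_dir, "intersection"), _ids("c", n_common))
```

`write_psi_inputs(".data", 10_000_000, 100_000_000, 100_000)` matches the reported sizes; starting with something like `(10_000, 100_000, 100)` gives a quick baseline before scaling up.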
Documentation Feature Request
source
latest
No response
No response
No response
No response
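From the source code, HEU initialization looks quite different from SPU's, and I could not find a dedicated tutorial. Could you provide a small example (such as addition) to make it easier to get started? Thanks.
Same as above.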
Others
source
beta
Centos7
3.8
No response
No response
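Should the Ray cluster be created first, or should the task be run first?
If the Ray cluster is created first, how can high availability be achieved when the head node goes down?
1. Create the Ray cluster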
ray start --head --node-ip-address="192.168.137.3" --port="7000" --resources='{"alice": 8}'
ray start --address="172.16.4.140:6379" --resources='{"bob": 8}'
2. Run the task
import secretflow as sf
import spu
sf.shutdown()
sf.init(address="alice's ip:8881")
alice = sf.PYU('alice')
bob = sf.PYU('bob')
device = sf.SPU({
……
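As written, the two `ray start` commands do not appear to target the same head (the head is started on `192.168.137.3:7000`, while the worker joins `172.16.4.140:6379`), and `sf.init` is given a third address; all three should refer to the same Ray head for both parties to end up in one cluster. After joining, it may help to confirm that the custom resources passed via `--resources` (`alice`, `bob`) are visible before creating the PYUs, assuming, as those flags suggest, that each PYU is scheduled onto the node advertising its party's resource. A sketch; `missing_party_resources` is a hypothetical helper:

```python
def missing_party_resources(cluster_resources, parties):
    """Return the parties that no node in the cluster advertises as a resource.

    cluster_resources: a dict such as the one returned by ray.cluster_resources().
    """
    return [p for p in parties if cluster_resources.get(p, 0) <= 0]
```

After `ray.init(address=...)` with the head's address, `missing_party_resources(ray.cluster_resources(), ["alice", "bob"])` should return an empty list; a non-empty result means that party's node has not joined the cluster with the expected `--resources`.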
Others
binary
beta
Centos7
3.8
No response
No response
#!/usr/bin/env python
# coding: utf-8
# In[9]:
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
def load_train_dataset(party_id=None) -> (np.ndarray, np.ndarray):
    features, label = load_breast_cancer(return_X_y=True)
    scaler = StandardScaler()
    features = scaler.fit_transform(features)
    X_train, _, y_train, _ = train_test_split(
        features, label, test_size=0.8, random_state=42
    )
    if party_id:
        if party_id == 1 or party_id == 3:
            return X_train[:, 15:], _
        else:
            return X_train[:, :15], y_train
    else:
        return X_train, y_train
def load_test_dataset():
    features, label = load_breast_cancer(return_X_y=True)
    scaler = StandardScaler()
    features = scaler.fit_transform(features)
    _, X_test, _, y_test = train_test_split(
        features, label, test_size=0.8, random_state=42
    )
    return X_test, y_test
# In[10]:
import jax.numpy as jnp
from jax import grad, jit, vmap
from jax import random
def sigmoid(x):
    return 1 / (1 + jnp.exp(-x))
# Outputs probability of a label being true.
def predict(W, b, inputs):
    return sigmoid(jnp.dot(inputs, W) + b)
# Training loss is the negative log-likelihood of the training examples.
def loss(W, b, inputs, targets):
    preds = predict(W, b, inputs)
    label_probs = preds * targets + (1 - preds) * (1 - targets)
    return -jnp.mean(jnp.log(label_probs))
# In[11]:
from jax import value_and_grad
def train_step(W, b, x1, x2, x3, y, learning_rate):
    x = jnp.concatenate([x1, x2, x3], axis=1)
    loss_value, Wb_grad = value_and_grad(loss, (0, 1))(W, b, x, y)
    W -= learning_rate * Wb_grad[0]
    b -= learning_rate * Wb_grad[1]
    return loss_value, W, b
# In[12]:
def fit(W, b, x1, x2, x3, y, epochs=1, learning_rate=1e-2):
    losses = jnp.array([])
    for _ in range(epochs):
        l, W, b = train_step(W, b, x1, x2, x3, y, learning_rate=learning_rate)
        losses = jnp.append(losses, l)
    return losses, W, b
# In[13]:
from sklearn.metrics import roc_auc_score
def validate_model(W, b, X_test, y_test):
    y_pred = predict(W, b, X_test)
    return roc_auc_score(y_test, y_pred)
# In[14]:
import matplotlib.pyplot as plt
def plot_losses(losses):
    plt.plot(np.arange(len(losses)), losses)
    plt.xlabel('epoch')
    plt.ylabel('loss')
# In[15]:
import secretflow as sf
import spu
# In case you have a running secretflow runtime already.
sf.shutdown()
sf.init(address='172.16.4.140:6379', _redis_password='')
alice, bob, charlie = sf.PYU('alice'), sf.PYU('bob'), sf.PYU('charlie')
device = sf.SPU({
    'nodes': [
        {
            'party': 'alice',
            'id': '140:0',
            # The address for other peers.
            'address': '172.16.4.140:8881',
            # The listen address of this node.
            # Optional. Address will be used if listen_address is empty.
            # 'listen_address': ''
        },
        {
            'party': 'bob',
            'id': '141:0',
            'address': '172.16.4.141:8881',
            # 'listen_address': ''
        },
        {
            'party': 'charlie',
            'id': '142:0',
            'address': '172.16.4.142:8881',
            # 'listen_address': ''
        }
    ],
    'runtime_config': {
        'protocol': spu.spu_pb2.ABY3,
        'field': spu.spu_pb2.FM128,
        'sigmoid_mode': spu.spu_pb2.RuntimeConfig.SIGMOID_REAL
    }
})
# sf.init(['alice', 'bob', 'charlie'], num_cpus=8, log_to_driver=True)
# alice, bob, charlie = sf.PYU('alice'), sf.PYU('bob'), sf.PYU('charlie')
# spu = sf.SPU(sf.utils.testing.cluster_def(['alice', 'bob', 'charlie']))
# In[16]:
x1, _ = alice(load_train_dataset)(party_id=1)
x2, y = bob(load_train_dataset)(party_id=2)
x3, _ = charlie(load_train_dataset)(party_id=3)
x1, x2, x3, y
# In[17]:
W = jnp.zeros((30,))
b = 0.0
W_, b_, x1_, x2_, x3_, y_ = (
sf.to(device, W),
sf.to(device, b),
x1.to(device),
x2.to(device),
x3.to(device),
y.to(device),
)
# In[18]:
losses, W_, b_ = device(fit, static_argnames=['epochs'], num_returns=3)(
W_, b_, x1_, x2_, x3_, y_, epochs=10, learning_rate=1e-2
)
losses, W_, b_
# In[19]:
get_ipython().run_line_magic('matplotlib', 'inline')
losses = sf.reveal(losses)
plot_losses(losses)
# In[ ]:
Execution fails with an exception on reaching the loss computation in In[18]:
2022-08-01 21:06:21,049 ERROR worker.py:94 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::SPURuntime.run() (pid=16468, ip=172.16.4.140, repr=<secretflow.device.device.spu.SPURuntime object at 0x7f38e1bd0be0>)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/secretflow/device/device/spu.py", line 196, in run
cfn, output = jax.xla_computation(fn, return_shape=True)(*args, **kwargs)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
return fun(*args, **kwargs)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/_src/api.py", line 807, in computation_maker
jaxpr, out_avals, consts = pe.trace_to_jaxpr_dynamic(jaxtree_fun, avals)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/_src/profiler.py", line 206, in wrapper
return func(*args, **kwargs)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 1779, in trace_to_jaxpr_dynamic
jaxpr, out_avals, consts = trace_to_subjaxpr_dynamic(
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 1816, in trace_to_subjaxpr_dynamic
ans = fun.call_wrapped(*in_tracers_)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/linear_util.py", line 168, in call_wrapped
ans = self.f(*args, **dict(self.params, **kwargs))
File "/tmp/ipykernel_15144/2203275946.py", line 4, in fit
File "/tmp/ipykernel_15144/4019579140.py", line 5, in train_step
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
return fun(*args, **kwargs)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/_src/api.py", line 982, in value_and_grad_f
ans, vjp_py = _vjp(f_partial, *dyn_args, reduce_axes=reduce_axes)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/_src/api.py", line 2441, in _vjp
out_primal, out_vjp = ad.vjp(
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/interpreters/ad.py", line 129, in vjp
out_primals, pvals, jaxpr, consts = linearize(traceable, *primals)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/interpreters/ad.py", line 116, in linearize
jaxpr, out_pvals, consts = pe.trace_to_jaxpr(jvpfun_flat, in_pvals)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/_src/profiler.py", line 206, in wrapper
return func(*args, **kwargs)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 606, in trace_to_jaxpr
jaxpr, (out_pvals, consts, env) = fun.call_wrapped(pvals)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/linear_util.py", line 168, in call_wrapped
ans = self.f(*args, **dict(self.params, **kwargs))
File "/tmp/ipykernel_15144/1730905303.py", line 17, in loss
File "/tmp/ipykernel_15144/1730905303.py", line 12, in predict
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
return fun(*args, **kwargs)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/_src/api.py", line 466, in cache_miss
out_flat = xla.xla_call(
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/core.py", line 1771, in bind
return call_bind(self, fun, *args, **params)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/core.py", line 1787, in call_bind
outs = top_trace.process_call(primitive, fun_, tracers, params)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/interpreters/ad.py", line 344, in process_call
result = call_primitive.bind(f_jvp, *primals, *nonzero_tangents, **new_params)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/core.py", line 1771, in bind
return call_bind(self, fun, *args, **params)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/core.py", line 1787, in call_bind
outs = top_trace.process_call(primitive, fun_, tracers, params)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 216, in process_call
out = primitive.bind(_update_annotation(f_, f.in_type, in_knowns),
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/core.py", line 1771, in bind
return call_bind(self, fun, *args, **params)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/core.py", line 1787, in call_bind
outs = top_trace.process_call(primitive, fun_, tracers, params)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 1534, in process_call
jaxpr, out_avals, consts = trace_to_subjaxpr_dynamic(
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 1816, in trace_to_subjaxpr_dynamic
ans = fun.call_wrapped(*in_tracers_)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/linear_util.py", line 168, in call_wrapped
ans = self.f(*args, **dict(self.params, **kwargs))
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 2692, in dot
return lax.dot(a, b, precision=precision)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 656, in dot
raise TypeError("Incompatible shapes for dot: got {} and {}.".format(
jax._src.traceback_util.UnfilteredStackTrace: TypeError: Incompatible shapes for dot: got (113, 45) and (30,).
The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.
--------------------
The above exception was the direct cause of the following exception:
ray::SPURuntime.run() (pid=16468, ip=172.16.4.140, repr=<secretflow.device.device.spu.SPURuntime object at 0x7f38e1bd0be0>)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/secretflow/device/device/spu.py", line 196, in run
cfn, output = jax.xla_computation(fn, return_shape=True)(*args, **kwargs)
File "/tmp/ipykernel_15144/2203275946.py", line 4, in fit
File "/tmp/ipykernel_15144/4019579140.py", line 5, in train_step
File "/tmp/ipykernel_15144/1730905303.py", line 17, in loss
File "/tmp/ipykernel_15144/1730905303.py", line 12, in predict
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 2692, in dot
return lax.dot(a, b, precision=precision)
TypeError: Incompatible shapes for dot: got (113, 45) and (30,).
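The shapes in the message point at the data split rather than at SPU: `load_train_dataset` returns columns `15:` to both party 1 and party 3 and columns `:15` to party 2, so `jnp.concatenate([x1, x2, x3], axis=1)` in `train_step` yields 15 + 15 + 15 = 45 columns, while `W = jnp.zeros((30,))` has 30 entries. A minimal sketch of a three-way vertical split in which every party gets a disjoint block of columns, so the blocks concatenate back to all 30 features; `split_features` is a hypothetical replacement for the `party_id` branch of `load_train_dataset`, and it keeps the labels with party 2 (bob) as the original does:

```python
import numpy as np


def split_features(x_train, y_train, party_id, n_parties=3, label_party=2):
    """Return (features, labels) for one party in a vertical split.

    Columns are divided into n_parties contiguous, non-overlapping blocks
    (sizes differ by at most one); party_id is 1-based. Only label_party
    receives the labels, the others get None.
    """
    blocks = np.array_split(x_train, n_parties, axis=1)
    x_part = blocks[party_id - 1]
    y_part = y_train if party_id == label_party else None
    return x_part, y_part
```

Inside `load_train_dataset`, the `if party_id:` branch would become `return split_features(X_train, y_train, party_id)`. With 30 features and 3 parties each block has 10 columns, which matches `W = jnp.zeros((30,))`.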
2022-08-01 21:06:21,305 ERROR worker.py:94 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::SPURuntime.run() (pid=32105, ip=172.16.4.141, repr=<secretflow.device.device.spu.SPURuntime object at 0x7fdfbc16cac0>)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/secretflow/device/device/spu.py", line 196, in run
cfn, output = jax.xla_computation(fn, return_shape=True)(*args, **kwargs)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
return fun(*args, **kwargs)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/_src/api.py", line 807, in computation_maker
jaxpr, out_avals, consts = pe.trace_to_jaxpr_dynamic(jaxtree_fun, avals)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/_src/profiler.py", line 206, in wrapper
return func(*args, **kwargs)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 1779, in trace_to_jaxpr_dynamic
jaxpr, out_avals, consts = trace_to_subjaxpr_dynamic(
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 1816, in trace_to_subjaxpr_dynamic
ans = fun.call_wrapped(*in_tracers_)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/linear_util.py", line 168, in call_wrapped
ans = self.f(*args, **dict(self.params, **kwargs))
File "/tmp/ipykernel_15144/2203275946.py", line 4, in fit
File "/tmp/ipykernel_15144/4019579140.py", line 5, in train_step
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
return fun(*args, **kwargs)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/_src/api.py", line 982, in value_and_grad_f
ans, vjp_py = _vjp(f_partial, *dyn_args, reduce_axes=reduce_axes)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/_src/api.py", line 2441, in _vjp
out_primal, out_vjp = ad.vjp(
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/interpreters/ad.py", line 129, in vjp
out_primals, pvals, jaxpr, consts = linearize(traceable, *primals)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/interpreters/ad.py", line 116, in linearize
jaxpr, out_pvals, consts = pe.trace_to_jaxpr(jvpfun_flat, in_pvals)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/_src/profiler.py", line 206, in wrapper
return func(*args, **kwargs)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 606, in trace_to_jaxpr
jaxpr, (out_pvals, consts, env) = fun.call_wrapped(pvals)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/linear_util.py", line 168, in call_wrapped
ans = self.f(*args, **dict(self.params, **kwargs))
File "/tmp/ipykernel_15144/1730905303.py", line 17, in loss
File "/tmp/ipykernel_15144/1730905303.py", line 12, in predict
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
return fun(*args, **kwargs)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/_src/api.py", line 466, in cache_miss
out_flat = xla.xla_call(
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/core.py", line 1771, in bind
return call_bind(self, fun, *args, **params)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/core.py", line 1787, in call_bind
outs = top_trace.process_call(primitive, fun_, tracers, params)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/interpreters/ad.py", line 344, in process_call
result = call_primitive.bind(f_jvp, *primals, *nonzero_tangents, **new_params)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/core.py", line 1771, in bind
return call_bind(self, fun, *args, **params)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/core.py", line 1787, in call_bind
outs = top_trace.process_call(primitive, fun_, tracers, params)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 216, in process_call
out = primitive.bind(_update_annotation(f_, f.in_type, in_knowns),
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/core.py", line 1771, in bind
return call_bind(self, fun, *args, **params)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/core.py", line 1787, in call_bind
outs = top_trace.process_call(primitive, fun_, tracers, params)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 1534, in process_call
jaxpr, out_avals, consts = trace_to_subjaxpr_dynamic(
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 1816, in trace_to_subjaxpr_dynamic
ans = fun.call_wrapped(*in_tracers_)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/linear_util.py", line 168, in call_wrapped
ans = self.f(*args, **dict(self.params, **kwargs))
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 2692, in dot
return lax.dot(a, b, precision=precision)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 656, in dot
raise TypeError("Incompatible shapes for dot: got {} and {}.".format(
jax._src.traceback_util.UnfilteredStackTrace: TypeError: Incompatible shapes for dot: got (113, 45) and (30,).
The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.
--------------------
The above exception was the direct cause of the following exception:
ray::SPURuntime.run() (pid=32105, ip=172.16.4.141, repr=<secretflow.device.device.spu.SPURuntime object at 0x7fdfbc16cac0>)
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/secretflow/device/device/spu.py", line 196, in run
cfn, output = jax.xla_computation(fn, return_shape=True)(*args, **kwargs)
File "/tmp/ipykernel_15144/2203275946.py", line 4, in fit
File "/tmp/ipykernel_15144/4019579140.py", line 5, in train_step
File "/tmp/ipykernel_15144/1730905303.py", line 17, in loss
File "/tmp/ipykernel_15144/1730905303.py", line 12, in predict
File "/opt/software/anaconda3/envs/secretflow/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 2692, in dot
return lax.dot(a, b, precision=precision)
TypeError: Incompatible shapes for dot: got (113, 45) and (30,).
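The traceback ends in `jnp.dot` receiving a (113, 45) matrix and a (30,) vector. The contracted dimensions (45 and 30) disagree, so tracing fails before anything runs on the SPU. The sketch below mirrors the NumPy/JAX `dot` shape rule in plain Python to make the check explicit. The source of `predict` is not shown, so the guess that the weight vector was sized for 30 features while the input reaching the SPU has 45 columns is an assumption.

```python
def dot_result_shape(a_shape, b_shape):
    """Return the shape of dot(a, b) under NumPy/JAX rules, or raise TypeError.

    Scalar (0-d) operands are out of scope for this sketch.
    """
    if len(a_shape) == 0 or len(b_shape) == 0:
        raise ValueError("scalar operands are not handled in this sketch")
    # dot always contracts the last axis of `a` ...
    a_k = a_shape[-1]
    # ... with the only axis of a 1-D `b`, or the second-to-last axis otherwise.
    b_k = b_shape[0] if len(b_shape) == 1 else b_shape[-2]
    if a_k != b_k:
        raise TypeError(
            "Incompatible shapes for dot: got {} and {}.".format(tuple(a_shape), tuple(b_shape))
        )
    if len(b_shape) == 1:
        return tuple(a_shape[:-1])
    return tuple(a_shape[:-1]) + tuple(b_shape[:-2]) + (b_shape[-1],)


if __name__ == "__main__":
    print(dot_result_shape((113, 45), (45,)))  # (113,)
    try:
        dot_result_shape((113, 45), (30,))
    except TypeError as exc:
        print(exc)  # the same mismatch as in the traceback
```

Checking the number of columns each party contributes against the length of the weight vector before calling the SPU function is a cheap way to catch this earlier.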
Build/Install
binary
secretflow latest version
Ubuntu18.04/Ubuntu22.04
python3.8.13
No response
No response
On Ubuntu 18.04:
Running pip install -U secretflow fails with:
Requirement already satisfied: aiohttp<=4 in ./.local/lib/python3.8/site-packages (from s3fs==2022.1.0->secretflow) (3.8.1)
Collecting aiobotocore~=2.1.0
Downloading http://pypi.doubanio.com/packages/4e/8d/01035d9b56893bd3b5d6eb4505d3ed1383d124b1c9c2b6024c175681c64b/aiobotocore-2.1.2.tar.gz (58 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.7/58.7 kB 3.0 MB/s eta 0:00:00
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [1 lines of output]
ERROR: Can not execute setup.py since setuptools is not available in the build environment.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Running pip install -r requirement in the source directory fails with:
Collecting jax==0.3.7
Downloading http://pypi.doubanio.com/packages/d7/ef/8ff361f49244956f48c3528a42c392c31bdbcbb9af5399eba19e153a5c26/jax-0.3.7.tar.gz (944 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 944.2/944.2 kB 3.0 MB/s eta 0:00:00
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [1 lines of output]
ERROR: Can not execute setup.py since setuptools is not available in the build environment.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
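Both failures report that setuptools is not available in the build environment. A quick first check is whether setuptools and wheel are importable in the active interpreter; the sketch below does that with the standard library only. Note the assumption: pip may build in an isolated environment and fetch setuptools from the configured index (the doubanio mirror here), so a mirror problem is another possibility. Upgrading the tooling with `python -m pip install -U pip setuptools wheel` is a common remedy, not a guaranteed one.

```python
import importlib.util


def missing_modules(names=("setuptools", "wheel")):
    """Return the names from `names` that the current interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing build tools:", ", ".join(missing))
    else:
        print("setuptools and wheel are importable in this interpreter")
```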
On Ubuntu 22.04:
Installation completed without errors, but running a simple example fails with:
(secretflow) shenghuo@shenghuo-machine:~/src$ python test.py
Traceback (most recent call last):
File "test.py", line 1, in <module>
import secretflow as sf
File "/home/shenghuo/anaconda3/envs/secretflow/lib/python3.8/site-packages/secretflow/__init__.py", line 15, in <module>
from . import data, device, ml, preprocessing, security, utils
File "/home/shenghuo/anaconda3/envs/secretflow/lib/python3.8/site-packages/secretflow/data/__init__.py", line 15, in <module>
from . import horizontal, vertical
File "/home/shenghuo/anaconda3/envs/secretflow/lib/python3.8/site-packages/secretflow/data/horizontal/__init__.py", line 15, in <module>
from .dataframe import HDataFrame
File "/home/shenghuo/anaconda3/envs/secretflow/lib/python3.8/site-packages/secretflow/data/horizontal/dataframe.py", line 21, in <module>
from secretflow.data.base import DataFrameBase, Partition
File "/home/shenghuo/anaconda3/envs/secretflow/lib/python3.8/site-packages/secretflow/data/base.py", line 23, in <module>
from secretflow.device import PYUObject, reveal
File "/home/shenghuo/anaconda3/envs/secretflow/lib/python3.8/site-packages/secretflow/device/__init__.py", line 15, in <module>
from .device import *
File "/home/shenghuo/anaconda3/envs/secretflow/lib/python3.8/site-packages/secretflow/device/device/__init__.py", line 16, in <module>
from .heu import HEU
File "/home/shenghuo/anaconda3/envs/secretflow/lib/python3.8/site-packages/secretflow/device/device/heu.py", line 17, in <module>
from secretflow.device.device.spu import PyTreeLeaf
File "/home/shenghuo/anaconda3/envs/secretflow/lib/python3.8/site-packages/secretflow/device/device/spu.py", line 31, in <module>
import spu
File "/home/shenghuo/anaconda3/envs/secretflow/lib/python3.8/site-packages/spu/__init__.py", line 33, in <module>
from .binding.api import Io, Runtime, compile
File "/home/shenghuo/anaconda3/envs/secretflow/lib/python3.8/site-packages/spu/binding/api.py", line 21, in <module>
from . import _lib
ImportError: /home/shenghuo/anaconda3/envs/secretflow/lib/python3.8/site-packages/spu/binding/_lib.so: undefined symbol: _dl_sym, version GLIBC_PRIVATE
import secretflow as sf
sf.init(['alice', 'bob', 'carol'], num_cpus=8, log_to_driver=True)
dev = sf.PYU('alice')
import numpy as np
data = dev(np.random.rand)(3, 4)
data
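The `ImportError` names `_dl_sym` with version `GLIBC_PRIVATE`. Private glibc symbols are not a stable ABI, so a prebuilt `_lib.so` that depends on one can break on a different glibc; Ubuntu 18.04 ships glibc 2.27 and Ubuntu 22.04 ships glibc 2.35. That this version difference is the cause here is an assumption. The sketch below reports the glibc version of the running interpreter, which is useful to include when reporting such issues.

```python
import platform


def glibc_version():
    """Return the glibc version string, or None if Python is not linked against glibc."""
    lib, version = platform.libc_ver()
    return version if lib == "glibc" else None


if __name__ == "__main__":
    print("glibc:", glibc_version() or "not detected")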
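At present, MPC protocol operators may be limited by communication volume and the number of communication rounds, and the performance loss is noticeable with many parties (n > 3) or over the public internet. Would TEE-based multi-party computation be a better solution for secure multi-party computation? What is your view?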
Build/Install
binary
secretflow latest version
Ubuntu 22.04
3.8.13
0
0
ERROR: Could not find a version that satisfies the requirement secretflow (from versions: none)
ERROR: No matching distribution found for secretflow
pip3 install -U secretflow
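"(from versions: none)" means pip found no secretflow distribution on the configured index that is compatible with this interpreter and platform. The sketch below prints the interpreter details to compare against the wheel filenames published for secretflow; `pip debug --verbose` additionally lists every compatible wheel tag. Whether the cause here is the interpreter, the platform, or a mirror that lacks the package is not known from the report.

```python
import sys
import sysconfig


def interpreter_summary():
    """Collect the interpreter details that decide which wheels pip will accept."""
    return {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "implementation": sys.implementation.name,
        "platform": sysconfig.get_platform(),
        "executable": sys.executable,
    }


if __name__ == "__main__":
    for key, value in interpreter_summary().items():
        print(f"{key}: {value}")
```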
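Is there an example using ABY3?
Does the Federated XGBoost function internally wrap any of SecretFlow's privacy-preserving algorithms?
SecretFlow's architecture appears to support secure aggregation analysis through SQL. Are there related examples and documentation?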
Bug
source
latest
centos 7.6
3.8.13
No response
No response
RayActorError: The actor died because of an error raised in its creation task, ray::SPURuntime.__init__() (pid=89408, ip=172.16.160.4, repr=<secretflow.device.device.spu.SPURuntime object at 0x7f6822262d60>)
File "/mnt/hgfs/code/secretflow/secretflow/device/device/spu.py", line 125, in __init__
self.link = link.create_brpc(desc, rank)
RuntimeError: what:
[external/yasl/yasl/link/context.cc:140] connect to mesh failed, failed to setup connection to rank=2
stacktrace:
#0 pybind11::cpp_function::initialize<>()::{lambda()#3}::_FUN()+0x7f6874c28ed7
#1 pybind11::cpp_function::dispatcher()+0x7f6874c150cb
#2 PyCFunction_Call+0x43bdca
This happens when running step 10 of the three-party PSI example in the documentation.
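"connect to mesh failed, failed to setup connection to rank=2" is raised while `SPURuntime.__init__` creates its brpc link, which suggests (an assumption) that the third party's link address was not reachable from this node: not started yet, blocked by a firewall, or configured with the wrong IP or port. The sketch below checks plain TCP reachability for each party; the party names, hosts and ports are hypothetical placeholders to replace with the addresses in your SPU cluster configuration.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical addresses for illustration only.
    parties = {
        "alice": ("127.0.0.1", 9100),
        "bob": ("127.0.0.1", 9101),
        "carol": ("127.0.0.1", 9102),
    }
    for name, (host, port) in parties.items():
        status = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{name} {host}:{port} {status}")
```

Because the link is only up once every party is listening, running this check after all parties have started gives the most meaningful result.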
Is there an example of federated learning (not split learning)?
Others
binary
latest
ubuntu18.04
3.8.13
No response
No response
I want to use 3PC for a neural network. I set up three parties the same way as for 2PC, but the setup does not work. Why? The machine has 8 cores and 16 GB of RAM.
sf.init(['alice', 'bob', 'adan'], num_cpus=12, log_to_driver=True)
alice, bob, adan = sf.PYU('alice'), sf.PYU('bob'), sf.PYU('adan')
spu = sf.SPU(sf.utils.testing.cluster_def(['alice', 'bob', 'adan']))
x1, _ = alice(load_train_dataset)(party_id=0)
x0, _ = adan(load_train_dataset)(party_id=1)
x2, y = bob(load_train_dataset)(party_id=2)
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import time
from jax.example_libraries import stax
from jax.example_libraries.stax import (
    Dense,
    Relu,
)
import jax
import jax.numpy as jnp
from jax.example_libraries import optimizers, stax
from sklearn.metrics import roc_auc_score


def load_train_dataset(party_id=None) -> (np.ndarray, np.ndarray):
    features, label = load_breast_cancer(return_X_y=True)
    scaler = StandardScaler()
    features = scaler.fit_transform(features)
    X_train, _, y_train, _ = train_test_split(
        features, label, test_size=0.2, random_state=42
    )
    if party_id == 0:
        return X_train[:, :10], _
    elif party_id == 1:
        return X_train[:, 10:20], _
    else:
        return X_train[:, 20:30], y_train


def load_test_dataset():
    features, label = load_breast_cancer(return_X_y=True)
    scaler = StandardScaler()
    features = scaler.fit_transform(features)
    _, X_test, _, y_test = train_test_split(
        features, label, test_size=0.2, random_state=42
    )
    return X_test, y_test


def MLP():
    nn_init, nn_apply = stax.serial(
        Dense(500),
        Relu,
        Dense(500),
        Relu,
        Dense(500),
        Relu,
        Dense(1),
    )
    return nn_init, nn_apply


KEY = jax.random.PRNGKey(42)
INPUT_SHAPE = (-1, 30)


def init_state(learning_rate):
    init_fun, _ = MLP()
    _, params_init = init_fun(KEY, INPUT_SHAPE)
    opt_init, _, _ = optimizers.sgd(learning_rate)
    opt_state = opt_init(params_init)
    return opt_state


def train(
    train_x0,
    train_x1,
    train_x2,
    train_y,
    opt_state,
    learning_rate,
    epochs,
    batch_size,
):
    train_x = jnp.concatenate([train_x0, train_x1, train_x2], axis=1)
    _, predict_fun = MLP()
    _, opt_update, get_params = optimizers.sgd(learning_rate)

    def update_model(state, imgs, labels, i):
        def mse(y, pred):
            return jnp.mean(jnp.multiply(y - pred, y - pred) / 2.0)

        def loss_func(params):
            y = predict_fun(params, imgs)
            return mse(y, labels), y

        grad_fn = jax.value_and_grad(loss_func, has_aux=True)
        (loss, y), grads = grad_fn(get_params(state))
        return opt_update(i, grads, state)

    import time

    for i in range(1, epochs + 1):
        begin = time.time()
        imgs_batchs = jnp.array_split(train_x, len(train_x) / batch_size, axis=0)
        labels_batchs = jnp.array_split(train_y, len(train_y) / batch_size, axis=0)
        for batch_idx, (batch_images, batch_labels) in enumerate(
            zip(imgs_batchs, labels_batchs)
        ):
            opt_state = update_model(opt_state, batch_images, batch_labels, i)
        end = time.time()
        print("epoch-{} cost time:{} s".format(i, end - begin))
    return get_params(opt_state)


def validate_model(params, X_test, y_test):
    _, predict_fun = MLP()
    y_pred = predict_fun(params, X_test)
    return roc_auc_score(y_test, y_pred)


if __name__ == "__main__":
    import jax

    # Hyperparameter
    batch_size = 100
    epochs = 1
    learning_rate = 0.1
    init_params = init_state(learning_rate)

    import secretflow as sf

    # In case you have a running secretflow runtime already.
    sf.shutdown()
    sf.init(['alice', 'bob', 'adan'], num_cpus=12, log_to_driver=True)
    alice, bob, adan = sf.PYU('alice'), sf.PYU('bob'), sf.PYU('adan')
    spu = sf.SPU(sf.utils.testing.cluster_def(['alice', 'bob', 'adan']))
    x1, _ = alice(load_train_dataset)(party_id=0)
    x0, _ = adan(load_train_dataset)(party_id=1)
    x2, y = bob(load_train_dataset)(party_id=2)
    device = spu
    x0_, x1_, x2_, y_ = x0.to(device), x1.to(device), x2.to(device), y.to(device)
    init_params_ = sf.to(spu, lambda: init_state(learning_rate))
    begin = time.time()
    params_spu = spu(train, static_argnames=['learning_rate', 'epochs', 'batch_size'])(
        x0_, x1_, x2_, y_, init_params_, learning_rate=learning_rate, epochs=epochs, batch_size=batch_size
    )
    params = sf.reveal(params_spu)
    print("train cost time:", time.time() - begin)
    print(params)
    X_test, y_test = load_test_dataset()
    auc = validate_model(params, X_test, y_test)
    print(f'auc={auc}')
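Separately from the reported failure, the script above assigns `x1` from `party_id=0` (columns 0-9) and `x0` from `party_id=1` (columns 10-19). `train()` therefore sees the columns in the order 10-19, 0-9, 20-29, while `validate_model` scores `X_test` in the original 0-29 order. The sketch below reproduces that ordering from the slice bounds in `load_train_dataset`; the helper name is hypothetical.

```python
# Column bounds used by load_train_dataset for party_id 0, 1 and 2.
BOUNDS = {0: (0, 10), 1: (10, 20), 2: (20, 30)}


def training_column_order(x0_party, x1_party, x2_party):
    """Column indices seen by train() after concat([train_x0, train_x1, train_x2])."""
    order = []
    for party_id in (x0_party, x1_party, x2_party):
        start, stop = BOUNDS[party_id]
        order.extend(range(start, stop))
    return order


if __name__ == "__main__":
    as_written = training_column_order(1, 0, 2)  # x0 <- party 1, x1 <- party 0
    print(as_written == list(range(30)))         # False: train and test orders differ
    print(training_column_order(0, 1, 2) == list(range(30)))  # True
```

Loading `x0` with `party_id=0` and `x1` with `party_id=1` (or swapping the concatenation order) keeps training and test features aligned.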
I have seen some articles online introducing SecretFlow, and some explain the PPU using a diagram of the SPU. I am unsure about the relationship between the PPU and the SPU.
Steps to reproduce:
- git clone https://github.com/secretflow/secretflow.git
- conda create -n secretflow python=3.8
- conda activate secretflow
- cd secretflow/
- pip install -r dev-requirements.txt -r requirements.txt
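Then I ran into a zlib-related problem:
Have you encountered this before? How did you solve it?
Thanks a lot!
Is there a community discussion group?
Does SecretFlow provide examples for verifying its trustworthiness? I noticed that on the SPU device the plaintext can be obtained simply by calling the reveal method.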
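In MPC, a value on a secure device is held as secret shares, and revealing it means combining the shares into the plaintext. Returning plaintext from `reveal` is therefore expected behavior; what MPC protects is that no single party learns anything from its own shares, not a result the program explicitly chooses to reveal. The sketch below shows the idea with simple additive sharing over the ring of integers modulo 2^64. It is an illustration only: SPU's actual protocols (for example SEMI2K and ABY3) are more involved, and the ring size and function names here are assumptions.

```python
import secrets

MOD = 2 ** 64  # illustrative ring size


def share(x, n_parties=3):
    """Split integer x into n random-looking shares that sum to x modulo MOD."""
    shares = [secrets.randbelow(MOD) for _ in range(n_parties - 1)]
    shares.append((x - sum(shares)) % MOD)
    return shares


def reconstruct(shares):
    """A 'reveal' in this toy model: add up every party's share."""
    return sum(shares) % MOD


if __name__ == "__main__":
    a, b = share(5), share(6)
    # Each party adds its own two shares locally; nobody sees 5 or 6 in the clear.
    c = [(sa + sb) % MOD for sa, sb in zip(a, b)]
    print(reconstruct(c))  # 11
```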
Documentation Bug
binary
0.6.13b1
Anolis OS release 8.6
3.8.12
No response
No response
I was looking at the tutorial's DataFrame.ipynb and noticed that the results of hdf.mean(numeric_only=True) and vdf.mean(numeric_only=True) differ. Is this expected behavior?
I thought the mean would be the same whether the data is partitioned horizontally or vertically. Am I misunderstanding something?
Suspecting a typo in the documentation, I tried to reproduce it using SPU's Dockerfile and confirmed that the output is as follows, exactly as in the documentation.
>>> hdf.mean(numeric_only=True)
sepal length (cm) 1.168667
sepal width (cm) 0.611467
petal length (cm) 0.751600
petal width (cm) 0.239867
target 0.200000
dtype: float64
>>> vdf.mean(numeric_only=True)
sepal length (cm) 5.843333
sepal width (cm) 3.057333
petal length (cm) 3.758000
petal width (cm) 1.199333
target 1.000000
dtype: float64
>>> data.mean(numeric_only=True)
sepal length (cm) 5.843333
sepal width (cm) 3.057333
petal length (cm) 3.758000
petal width (cm) 1.199333
target 1.000000
dtype: float64
Can be reproduced by running https://github.com/secretflow/secretflow/blob/beta/docs/tutorial/DataFrame.ipynb
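Every horizontal figure in the output above is exactly one fifth of the corresponding vertical/plain pandas figure (e.g. 5.843333 / 1.168667 ≈ 5.0, 1.000000 / 0.200000 = 5.0). For reference, a plain-Python sketch (not SecretFlow's implementation; all function names are hypothetical) of why both partitionings should give the same mean when partial results are combined correctly: horizontally, sum the per-party totals and row counts and divide once; vertically, the party that owns a column can average it locally. It also shows one common pitfall, an unweighted mean of per-party means, which diverges when partitions have different sizes. Whether either pitfall explains the factor of 5 is not established here.

```python
def horizontal_mean(partitions, column):
    # Each party holds some rows; combine sums and counts, then divide once.
    total = sum(sum(row[column] for row in part) for part in partitions)
    count = sum(len(part) for part in partitions)
    return total / count


def mean_of_means(partitions, column):
    # Pitfall: averaging per-party means ignores differing partition sizes.
    means = [sum(row[column] for row in part) / len(part) for part in partitions]
    return sum(means) / len(means)


def vertical_mean(column_owners, column):
    # Each party holds whole columns, so the owner can average locally.
    for owned in column_owners:
        if column in owned:
            values = owned[column]
            return sum(values) / len(values)
    raise KeyError(column)


rows = [{"x": 1.0, "y": 0.0}, {"x": 2.0, "y": 1.0},
        {"x": 3.0, "y": 1.0}, {"x": 6.0, "y": 2.0}]
h_parts = [rows[:1], rows[1:]]  # unequal horizontal split between two parties
v_parts = [{"x": [r["x"] for r in rows]}, {"y": [r["y"] for r in rows]}]

print(horizontal_mean(h_parts, "x"), vertical_mean(v_parts, "x"))
print(mean_of_means(h_parts, "x"))
```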
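Hello, are there any examples showing how to use custom TensorFlow or PyTorch algorithms and models with SecretFlow?
Are there examples of multiplication, subtraction, division, variance, and median?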
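A minimal sketch of these operations written with jax.numpy, the library the training script above imports. As written it runs on plaintext; the function name basic_stats is hypothetical. Running it on the SPU would presumably follow the same `spu(fn)(...)` then `sf.reveal(...)` pattern used for `train` in that script, but we have not verified that every operation (in particular `jnp.median`, which requires sorting) is supported by a given SPU version.

```python
import jax.numpy as jnp


def basic_stats(x, y):
    # Element-wise arithmetic followed by two aggregate statistics of x.
    mul = x * y
    sub = x - y
    div = x / y
    var = jnp.var(x)        # population variance (ddof=0)
    median = jnp.median(x)
    return mul, sub, div, var, median


x = jnp.array([1.0, 2.0, 3.0, 4.0])
y = jnp.array([2.0, 4.0, 5.0, 8.0])
mul, sub, div, var, median = basic_stats(x, y)
print(mul, sub, div, var, median)
```

Returning a tuple rather than a dict keeps the output structure simple; whether the SPU device accepts richer pytrees is not assumed here.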
Build/Install
binary
latest
Centos7.5
No response
No response
No response
When running the following command:
RAY_DISABLE_REMOTE_CODE=true \
RAY_SECURITY_CONFIG_PATH=config.yml \
RAY_USE_TLS=1 \
RAY_TLS_SERVER_CERT=servercert.pem \
RAY_TLS_SERVER_KEY=serverkey.pem \
RAY_TLS_CA_CERT=cacert.pem \
ray start --head --node-ip-address="192.168.137.4" --port="6379" --resources='{"alice": 8}' --include-dashboard=False --disable-usage-stats
the following exception is thrown:
(yinyu) [root@alice secretflow_cluster]# RAY_DISABLE_REMOTE_CODE=true RAY_SECURITY_CONFIG_PATH=config.yml RAY_USE_TLS=1 RAY_TLS_SERVER_CERT=servercert.pem RAY_TLS_SERVER_KEY=serverkey.pem RAY_TLS_CA_CERT=cacert.pem ray start --head --node-ip-address="192.168.137.4" --port="6379" --resources='{"alice": 8}' --include-dashboard=False --disable-usage-stats
Usage stats collection is disabled.
Local node IP: 192.168.137.4
E0714 07:48:09.783010834 8132 ssl_transport_security.cc:845] Invalid cert chain file.
E0714 07:48:09.783055834 8132 ssl_security_connector.cc:116] Handshaker factory creation failed with TSI_INVALID_ARGUMENT.
E0714 07:48:09.783061834 8132 chttp2_connector.cc:317] Failed to create secure subchannel for secure name '192.168.137.4:6379'
E0714 07:48:09.783066534 8132 chttp2_connector.cc:276] Failed to create channel args during subchannel creation.
E0714 07:48:10.785822614 8132 ssl_transport_security.cc:845] Invalid cert chain file.
E0714 07:48:10.785848214 8132 ssl_security_connector.cc:116] Handshaker factory creation failed with TSI_INVALID_ARGUMENT.
E0714 07:48:10.785853214 8132 chttp2_connector.cc:317] Failed to create secure subchannel for secure name '192.168.137.4:6379'
E0714 07:48:10.785857714 8132 chttp2_connector.cc:276] Failed to create channel args during subchannel creation.
E0714 07:48:11.787983491 8132 ssl_transport_security.cc:845] Invalid cert chain file.
E0714 07:48:11.788010492 8132 ssl_security_connector.cc:116] Handshaker factory creation failed with TSI_INVALID_ARGUMENT.
E0714 07:48:11.788015492 8132 chttp2_connector.cc:317] Failed to create secure subchannel for secure name '192.168.137.4:6379'
E0714 07:48:11.788019492 8132 chttp2_connector.cc:276] Failed to create channel args during subchannel creation.
E0714 07:48:12.790182069 8132 ssl_transport_security.cc:845] Invalid cert chain file.
E0714 07:48:12.790247570 8132 ssl_security_connector.cc:116] Handshaker factory creation failed with TSI_INVALID_ARGUMENT.
E0714 07:48:12.790256170 8132 chttp2_connector.cc:317] Failed to create secure subchannel for secure name '192.168.137.4:6379'
E0714 07:48:12.790260470 8132 chttp2_connector.cc:276] Failed to create channel args during subchannel creation.
E0714 07:48:13.792359247 8132 ssl_transport_security.cc:845] Invalid cert chain file.
E0714 07:48:13.792404647 8132 ssl_security_connector.cc:116] Handshaker factory creation failed with TSI_INVALID_ARGUMENT.
E0714 07:48:13.792410247 8132 chttp2_connector.cc:317] Failed to create secure subchannel for secure name '192.168.137.4:6379'
E0714 07:48:13.792423847 8132 chttp2_connector.cc:276] Failed to create channel args during subchannel creation.
E0714 07:48:14.794512125 8132 ssl_transport_security.cc:845] Invalid cert chain file.
E0714 07:48:14.794539725 8132 ssl_security_connector.cc:116] Handshaker factory creation failed with TSI_INVALID_ARGUMENT.
E0714 07:48:14.794545325 8132 chttp2_connector.cc:317] Failed to create secure subchannel for secure name '192.168.137.4:6379'
E0714 07:48:14.794549725 8132 chttp2_connector.cc:276] Failed to create channel args during subchannel creation.
2022-07-14 07:48:14,794 WARNING utils.py:1282 -- Unable to connect to GCS at 192.168.137.4:6379. Check that (1) Ray GCS with matching version started successfully at the specified address, and (2) there is no firewall setting preventing access.
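The gRPC message "Invalid cert chain file" suggests that the file passed as RAY_TLS_SERVER_CERT could not be parsed as a PEM certificate chain. A minimal pre-flight sketch, using Python's own ssl module rather than gRPC (so its verdict may not match gRPC's exactly), that tries to load the three files named by the environment variables in the command above. The function name check_ray_tls_files is hypothetical. Note that the paths in the command are relative, so they resolve against the directory `ray start` is run from.

```python
import os
import ssl

TLS_VARS = ("RAY_TLS_SERVER_CERT", "RAY_TLS_SERVER_KEY", "RAY_TLS_CA_CERT")


def check_ray_tls_files(env=os.environ):
    """Return a list of problems found with the TLS files named in env."""
    problems = []
    for name in TLS_VARS:
        path = env.get(name)
        if not path:
            problems.append(f"{name} is not set")
        elif not os.path.isfile(path):
            problems.append(f"{name}: file not found: {path}")
    if problems:
        return problems

    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        # Fails if the cert is not valid PEM or does not match the private key.
        ctx.load_cert_chain(certfile=env["RAY_TLS_SERVER_CERT"],
                            keyfile=env["RAY_TLS_SERVER_KEY"])
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        problems.append(f"server cert/key rejected: {exc}")
    try:
        # Fails if the CA file contains no parsable certificate.
        ctx.load_verify_locations(cafile=env["RAY_TLS_CA_CERT"])
    except OSError as exc:
        problems.append(f"CA cert rejected: {exc}")
    return problems
```

Running `print(check_ray_tls_files())` in the same shell and directory as the `ray start` command should print an empty list if Python's OpenSSL can read all three files.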
I would like to ask: which part of the SecretFlow code handles TEE, and which algorithms have been implemented on top of TEE?
Others
source
secretflow latest version
macos 13.0
3.8.13
No response
No response
(scheduler +27s) Error: No available node types can fulfill resource request {'bob': 1.0, 'CPU': 1.0}. Add suitable node types to this cluster to resolve this issue.
(scheduler +27s) Error: No available node types can fulfill resource request {'alice': 1.0, 'CPU': 1.0}. Add suitable node types to this cluster to resolve this issue.
import secretflow as sf
alice, bob = sf.PYU('alice'), sf.PYU('bob')
spu = sf.SPU(sf.utils.testing.cluster_def(['alice', 'bob']))
input_path = {alice: '.data/alice.csv', bob: '.data/bob.csv'}
output_path = {alice: '.data/alice_psi.csv', bob: '.data/bob_psi.csv'}
spu.psi_csv('uid', input_path, output_path)
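The scheduler errors say that no node offers the custom resource `alice` or `bob` that the PYU tasks request (`{'alice': 1.0, 'CPU': 1.0}`). The snippet does not show how the runtime was started; the three-party script above calls `sf.init(['alice', 'bob', 'adan'], ...)` before creating devices, and in cluster mode the deployment command gives a node `--resources='{"alice": 8}'`. A small, hypothetical helper that compares the party names against a resource dict, such as the one returned by Ray's `ray.cluster_resources()`, can show which party resources are missing:

```python
def missing_party_resources(cluster_resources, parties):
    """Return the parties for which no node offers a custom resource of that name."""
    return [p for p in parties if cluster_resources.get(p, 0) <= 0]


# A cluster started without party resources, e.g. a bare `ray start --head`.
print(missing_party_resources({"CPU": 8.0}, ["alice", "bob"]))

# Totals taken from a `ray status` output with alice/bob/charlie resources.
status = {"CPU": 72.0, "alice": 16.0, "bob": 16.0, "charlie": 16.0}
print(missing_party_resources(status, ["alice", "bob"]))
```

In a live session the first argument would be `ray.cluster_resources()`; an empty result means every party has at least one node registered under its name.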
Others
binary
latest
ubuntu 18.04
3.8.13
No response
No response
2022-07-28 16:16:13,219 ERROR worker.py:94 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::SPURuntime.run() (pid=13081, ip=10.100.82.74, repr=<secretflow.device.device.spu.SPURuntime object at 0x7f1fd47b1220>)
File "/home/ops/anaconda3/envs/secretflow/lib/python3.8/site-packages/secretflow/device/device/spu.py", line 224, in run
self.runtime.run(executable)
File "/home/ops/anaconda3/envs/secretflow/lib/python3.8/site-packages/spu/binding/api.py", line 43, in run
return self._vm.Run(executable.SerializeToString())
RuntimeError: what:
[external/yasl/yasl/link/transport/channel.cc:86] Get data timeout, key=root:110:ALLGATHER
stacktrace:
#0 yasl::link::Context::RecvInternal()+0x7f202eb100b2
#1 yasl::link::AllGatherImpl<>()+0x7f202e9c8785
#2 yasl::link::AllGather()+0x7f202e9c8cb4
#3 spu::mpc::Communicator::allReduce()+0x7f202e2c7a37
#4 spu::mpc::semi2k::B2A_Randbit::proc()::{lambda()#1}::operator()()::{lambda()#3}::operator()()+0x7f202e2bd9f2
#5 spu::mpc::semi2k::B2A_Randbit::proc()+0x7f202e2c0a89
#6 spu::mpc::UnaryKernel::evaluate()+0x7f202e19efdb
#7 spu::mpc::Object::call<>()+0x7f202e2c60b8
#8 spu::mpc::(anonymous namespace)::_Lazy2A()+0x7f202e2dfb19
#9 spu::mpc::ABProtAddSP::proc()+0x7f202e2e019b
#10 spu::mpc::BinaryKernel::evaluate()+0x7f202e19f2f2
#11 spu::mpc::Object::call<>()+0x7f202e2c6866
#12 spu::mpc::add_sp()+0x7f202e2c6994
#13 spu::hal::_add_sp()+0x7f202e171b63
#14 spu::hal::_add()+0x7f202e167486
#15 spu::hal::_popcount()+0x7f202e168b8c
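I hit the error above while running three-party logistic regression. It seems to be related to the amount of training data. Without changing the code, is upgrading the machines or adding compute nodes the only way to resolve this?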
Build/Install
binary
latest
No response
No response
No response
No response
Following the official documentation (https://secretflow.readthedocs.io/en/latest/getting_started/deployment.html):
RAY_DISABLE_REMOTE_CODE=true \
RAY_SECURITY_CONFIG_PATH=config.yml \
RAY_USE_TLS=1 \
RAY_TLS_SERVER_CERT=servercert.pem \
RAY_TLS_SERVER_KEY=serverkey.pem \
RAY_TLS_CA_CERT=cacert.pem \
ray start --head --node-ip-address="192.168.137.4" --port="6379" --resources='{"alice": 8}' --include-dashboard=False --disable-usage-stats
192.168.137.4 and 6379 are the IP and port of the local Redis server.
Running ray start throws the following exception:
(yinyu) [root@alice secretflow_cluster]# ray start --head --node-ip-address="192.168.137.4" --port="6379" --resources='{"alice": 8}' --include-dashboard=False --disable-usage-stats
Usage stats collection is disabled.
Local node IP: 192.168.137.4
2022-07-14 07:02:52,453 WARNING utils.py:1282 -- Unable to connect to GCS at 192.168.137.4:6379. Check that (1) Ray GCS with matching version started successfully at the specified address, and (2) there is no firewall setting preventing access.
2022-07-14 07:03:08,419 WARNING utils.py:1282 -- Unable to connect to GCS at 192.168.137.4:6379. Check that (1) Ray GCS with matching version started successfully at the specified address, and (2) there is no firewall setting preventing access.
2022-07-14 07:05:37,765 WARNING utils.py:1282 -- Unable to connect to GCS at 192.168.137.4:6379. Check that (1) Ray GCS with matching version started successfully at the specified address, and (2) there is no firewall setting preventing access.
What could be causing this? (Redis is definitely reachable.)
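One possible cause, not confirmed in this thread: the reporter says 192.168.137.4:6379 is the local Redis, and in Ray versions where the GCS no longer runs on Redis (an assumption about the installed version), `--port` on `ray start --head` is the port Ray's own GCS server listens on. If Redis already holds that port, the GCS cannot take it, and the client then finds no Ray GCS at that address. A minimal stdlib sketch (the function name port_in_use is hypothetical) to check whether something already accepts connections on a host and port before starting Ray:

```python
import socket


def port_in_use(host, port, timeout=1.0):
    """Return True if a TCP listener already accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising.
        return s.connect_ex((host, port)) == 0
```

For the command above, `port_in_use("192.168.137.4", 6379)` returning True while Ray is stopped would mean another process (here presumably Redis) owns the port, and a different `--port` value is worth trying.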
Bug
source
secretflow latest version
MacOS 13.0
3.8.13
No response
No response
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [8], in <cell line: 22>()
20 # Validate the model
21 X_test, y_test = load_test_dataset()
---> 22 auc=validate_model(W,b, X_test, y_test)
23 print(f'auc={auc}')
24 print(type(x1))
Input In [5], in validate_model(W, b, X_test, y_test)
4 def validate_model(W, b, X_test, y_test):
5 y_pred = predict(W, b, X_test)
----> 6 return roc_auc_score(y_test, y_pred)
File ~/opt/anaconda3/envs/secretflow/lib/python3.8/site-packages/sklearn/metrics/_ranking.py:560, in roc_auc_score(y_true, y_score, average, sample_weight, max_fpr, multi_class, labels)
553 raise ValueError(
554 "Partial AUC computation not available in "
555 "multiclass setting, 'max_fpr' must be"
556 " set to `None`, received `max_fpr={0}` "
557 "instead".format(max_fpr)
558 )
559 if multi_class == "raise":
--> 560 raise ValueError("multi_class must be in ('ovo', 'ovr')")
561 return _multiclass_roc_auc_score(
562 y_true, y_score, labels, multi_class, average, sample_weight
563 )
564 elif y_type == "binary":
ValueError: multi_class must be in ('ovo', 'ovr')
%matplotlib inline
# Load the data
x1, _ = load_train_dataset(party_id=1)
x2, y = load_train_dataset(party_id=2)
# Hyperparameter
W = jnp.zeros((4,))
b = 0.0
epochs = 10
learning_rate = 1e-2
# Train the model
losses, W, b = fit(W, b, x1, x2, y, epochs=100, learning_rate=1e-2)
# Plot the loss
plot_losses(losses)
# Validate the model
X_test, y_test = load_test_dataset()
auc=validate_model(W,b, X_test, y_test)
print(f'auc={auc}')
print(type(x1))
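`roc_auc_score` raises "multi_class must be in ('ovo', 'ovr')" when it treats the target as multiclass, typically because `y_test` contains more than two distinct labels, while the model here (`W` of shape `(4,)` and a scalar `b`) produces one score per sample. A rough, hypothetical check of the labels before scoring; `sklearn.utils.multiclass.type_of_target(y_test)` gives scikit-learn's authoritative classification:

```python
def label_summary(y_true):
    """Rough check of how many distinct labels ROC AUC would see."""
    distinct = sorted(set(y_true))
    if len(distinct) < 2:
        kind = "single class"   # ROC AUC is undefined
    elif len(distinct) == 2:
        kind = "binary"         # what a one-score-per-sample model needs
    else:
        kind = "multiclass"     # needs per-class scores and multi_class='ovr'/'ovo'
    return kind, distinct
```

Calling `print(label_summary(list(y_test)))` before `validate_model` shows whether the test labels need to be reduced to two classes, or whether a model that outputs per-class probabilities is required.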
Others
source
beta
Centos7
3.8
No response
No response
When setting up the Ray cluster, I deviated slightly from the official example: I gave each party 2 servers, 6 servers in total across the three parties:
(secretflow) [root@fateonspark140 ~]# ray status
======== Autoscaler status: 2022-07-28 12:21:29.600048 ========
Node status
---------------------------------------------------------------
Healthy:
1 node_4241b4efcb4c09ce8f883ce0d012b871afedfa5bb74fd4a5eb0076c4
1 node_ebf3803eb0f45c335b87b6f5545da48353aac0de806c7c06cd954528
1 node_367f61e86335ee89578d341a09b78f903ef6ebd4fdc32624e22874e3
1 node_3832e719372baff08ce3ea1c296a29ce3b2f4f4ba7bfb96faf8723ec
1 node_24628ec7113a604915dab58daf115b90f574807fd235c3b48789c0af
1 node_a41c00e81d5074eb5753d1e4b0f9fe183da7c677697664d4dc77214a
Pending:
(no pending nodes)
Recent failures:
(no failures)
Resources
---------------------------------------------------------------
Usage:
2.0/72.0 CPU
1.0/16.0 alice
1.0/16.0 bob
0.0/16.0 charlie
0.00/61.334 GiB memory
0.00/27.034 GiB object_store_memory
Demands:
(no resource demands)
However, when running MPC on the SPU, I found that each party can only specify a single address, as shown below:
import secretflow as sf
import spu

sf.init(address='172.16.4.140:6379', _redis_password='')
alice, bob = sf.PYU('alice'), sf.PYU('bob')
device = sf.SPU({
    'nodes': [
        {
            'party': 'alice',
            'id': '140:0',
            # The address for other peers.
            'address': '172.16.4.140:8881',
            # The listen address of this node.
            # Optional. Address will be used if listen_address is empty.
            # 'listen_address': ''
        },
        {
            'party': 'bob',
            'id': '141:0',
            'address': '172.16.4.141:8881',
            # 'listen_address': ''
        },
    ],
    'runtime_config': {
        'protocol': spu.spu_pb2.SEMI2K,
        'field': spu.spu_pb2.FM128,
        'sigmoid_mode': spu.spu_pb2.RuntimeConfig.SIGMOID_REAL,
    },
})
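My questions are:
1. Does one party support multiple servers?
2. Can a task on one party only run on a single server, given that address in the code is not an array?
3. Is it mandatory to specify address, as below? Can't Ray pick a server by itself? Since the party alice is already specified, can't it choose one of alice's servers on its own?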
{
    'party': 'alice',
    'id': '140:0',
    # The address for other peers.
    'address': '172.16.4.140:8881',
    # The listen address of this node.
    # Optional. Address will be used if listen_address is empty.
    # 'listen_address': ''
},
4. Can two tasks run at the same time? If so, what happens if both tasks are assigned to the same server?
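In the configuration shown above, each entry in `nodes` carries exactly one `address` string, so each SPU node is reached at one host:port. A minimal, hypothetical helper that builds that `nodes` list from a party-to-address mapping and rejects two nodes sharing a host:port, since two listeners generally cannot bind the same TCP port on one host (relevant to question 4: a second concurrent task on the same server would presumably need its own free ports). The `id` format it generates is an illustrative assumption; the snippet above uses ids like `'140:0'`.

```python
def build_spu_nodes(party_addrs):
    """Build the 'nodes' list of an SPU cluster definition.

    party_addrs: dict mapping party name -> "host:port" (one address per party).
    """
    nodes, seen = [], set()
    for i, (party, address) in enumerate(party_addrs.items()):
        host, sep, port = address.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"bad address for {party}: {address!r}")
        if address in seen:
            # Two SPU nodes cannot both listen on the same host:port.
            raise ValueError(f"address {address} is used by more than one node")
        seen.add(address)
        # Hypothetical id format, chosen only for illustration.
        nodes.append({"party": party, "id": f"{party}:{i}", "address": address})
    return nodes


nodes = build_spu_nodes({"alice": "172.16.4.140:8881", "bob": "172.16.4.141:8881"})
print(nodes)
```

The result has the same shape as the literal `'nodes'` list above, so it could be combined with the same `'runtime_config'` dict into the argument for `sf.SPU(...)`.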
Others
binary
0.6.13
No response
No response
No response
No response
Hello. Running pip install -U secretflow fails with the following error. How should I deal with it?
ERROR: Could not find a version that satisfies the requirement spu==0.1.0b1 (from secretflow) (from versions: none)
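pip's "(from versions: none)" means it found no spu distribution usable by the current interpreter and platform in the configured package index. Every report in this thread uses Python 3.8; treating 3.8 as the only supported version below is an assumption for illustration, so compare the output with the files actually published for spu 0.1.0b1 on PyPI. A minimal stdlib sketch (function names hypothetical) that reports some of the properties pip matches wheels against:

```python
import platform
import sys

# ASSUMPTION: illustrative only; check the wheels actually published on PyPI.
SUPPORTED_PYTHONS = ("3.8",)


def describe_interpreter():
    """Report the Python version, OS, and CPU architecture of this interpreter."""
    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "system": platform.system(),    # e.g. 'Linux' or 'Darwin'
        "machine": platform.machine(),  # e.g. 'x86_64' or 'arm64'
    }


def python_supported(info, supported=SUPPORTED_PYTHONS):
    return info["python"] in supported


if __name__ == "__main__":
    info = describe_interpreter()
    print(info, "supported:", python_supported(info))
```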