Comments (4)
Regarding one of our concerns, that we may need to change the current visibility of some bazel targets, I did some research on the RAL part.

RAL's dependencies

As we can see in RAL's BUILD, the following targets are needed (test dependencies such as //tensorflow/core:test_main are excluded):
"//tensorflow/core:framework",
"//tensorflow/core:lib",
"//tensorflow/core:protos_all_cc",
"//tensorflow/core:stream_executor_headers_lib",
"//tensorflow/stream_executor",
"//tensorflow/stream_executor/cuda:cuda_platform",
"//tensorflow/stream_executor:cuda_platform",
"//tensorflow/stream_executor:rocm_platform",
"//tensorflow/stream_executor/rocm:rocm_driver",
"//tensorflow/stream_executor/rocm:rocm_platform",
In the current tf community code, only //tensorflow/stream_executor/rocm:rocm_driver is not publicly visible. However, //tensorflow/stream_executor:rocm_platform depends on it and is a public target we already depend on, so we can replace the dependency on //tensorflow/stream_executor/rocm:rocm_driver with //tensorflow/stream_executor:rocm_platform.
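Concretely, the swap in RAL's BUILD could look like the following sketch (the rule name here is illustrative, not RAL's actual target):

```python
# Hypothetical excerpt of RAL's BUILD: swap the non-public driver target
# for the public platform target that already depends on it.
cc_library(
    name = "ral_gpu_impl",  # illustrative name
    deps = [
        # "//tensorflow/stream_executor/rocm:rocm_driver",  # not publicly visible
        "//tensorflow/stream_executor:rocm_platform",  # public; pulls in rocm_driver
    ],
)
```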
As we can see in tf_community/tensorflow/stream_executor/rocm/BUILD:

```python
cc_library(
    name = "rocm_platform",
    srcs = if_rocm_is_configured(["rocm_platform.cc"]),
    hdrs = if_rocm_is_configured(["rocm_platform.h"]),
    visibility = ["//visibility:public"],
    deps = if_rocm_is_configured([
        ":rocm_driver",
        # ...
    ]),
)
```
tao_compiler's dependencies

As for the tao_compiler part, most of the dependencies come from the tensorflow/compiler/mlir/hlo dir, and all targets in that dir are public. However, the targets under tensorflow/compiler/xla are only visible to friends. This is the part where we may need to change visibility. For now, we either need to add a visibility-change patch, or search for a public target that already wraps the friends-visible targets.
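If we go the patch route, one possible shape of the change is a small patch extending the friends package_group in tensorflow/compiler/xla/BUILD (the package path we add below is hypothetical):

```python
# Hypothetical patch to tensorflow/compiler/xla/BUILD: extend the existing
# "friends" package_group so our compiler packages can see the xla targets.
package_group(
    name = "friends",
    packages = [
        "//tensorflow/compiler/...",
        "//tao_compiler/...",  # our addition; the actual package path may differ
    ],
)
```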
from bladedisc.
As we have done in build tao bridge with bazel for cu110 device #231, we can successfully build tao_bridge for tensorflow-gpu versions. However, for cpu, and even more so for arm-cpu (aka aarch64), we have run into a problem with mkldnn and acl.

Currently we download and build the mkldnn- and acl-related pieces in common_setup.py:
BladeDISC/scripts/python/common_setup.py
Lines 257 to 313 in d5f085b
After this, the built mkldnn and acl are used by tao_bridge and tao_compiler via CMake and bazel. When used in tao_compiler, the tao dir is linked under tf_community. When we tried to support a bazel build for mkldnn, the newly added mkldnn bazel rules under third_party/bazel/mkldnn could not be used in the tf_community dir without patching code in tf_community. So for now, if we only support the bazel build for the tao_bridge part, the download-and-compile step for mkldnn cannot be removed. However, once tao_compiler becomes a standalone bazel workspace, we can use our own mkldnn bazel rules without these extra build actions in common_setup.py.
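Once tao_compiler is its own workspace, fetching mkldnn could move entirely into WORKSPACE. A sketch, where the oneDNN version, the archive URL, and the build-file path are all assumptions for illustration:

```python
# Hypothetical WORKSPACE fragment: fetch oneDNN (mkldnn) directly instead of
# pre-building it in common_setup.py. Version and paths are illustrative.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "mkl_dnn",
    build_file = "//third_party/bazel/mkldnn:mkldnn.BUILD",  # hypothetical file name
    strip_prefix = "oneDNN-2.6",
    urls = ["https://github.com/oneapi-src/oneDNN/archive/refs/tags/v2.6.tar.gz"],
)
```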
We have the following actions to follow:
1. bazel rule for mkldnn used by tao_bridge (for x86)
2. bazel rule for acl used by tao_bridge (for aarch64)
3. make tao_compiler a single bazel workspace
4. make the tao_compiler build depend on our mkldnn/acl bazel rules

1 && 2 are ongoing actions.
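For action 3, a minimal sketch of what a standalone tao_compiler WORKSPACE could look like (the workspace name and relative path are illustrative, not the repo's actual layout):

```python
# Hypothetical WORKSPACE for a standalone tao_compiler workspace.
workspace(name = "tao_compiler")

# Reference the tf_community checkout as an external repository instead of
# linking the tao dir under it.
local_repository(
    name = "org_tensorflow",
    path = "../tf_community",  # illustrative relative path
)
```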
Update:

In sprint2204, we have completed the work of build tao_bridge with bazel for both the internal and the open-source builds, and the internal build also uses the open-source tao_build.py now.

As of now, all of disc's targets can be built by bazel. The remaining work items are as follows:
- remove the now-unused CMake files and related scripts once we have a successful regression with the tao_bridge built by bazel.
- unify the configs in .bazelrc across tao_compiler, tao, pytorch_blade, and tensorflow_blade, and share script logic as much as possible.
- build the tao_compiler binary inside the tao_compiler workspace instead of the org_tensorflow workspace.
- make all the ral dependencies built by bazel, and stop using CMake in tao_build.py.
The last two items are somewhat longer-term work, since the current bazel build across multiple workspaces works fine for now. However, our final goal is still to make the entire repo build in one large workspace.
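For the .bazelrc unification item, one option is a single shared rc file at the repo root that each workspace's .bazelrc pulls in. A sketch, assuming the workspaces sit one level below the repo root and the shared file is named common.bazelrc (both assumptions, not the repo's current layout):

```
# Hypothetical per-workspace .bazelrc (tao_compiler, tao, pytorch_blade,
# tensorflow_blade). try-import silently skips the file if it is missing.
try-import %workspace%/../common.bazelrc

# workspace-specific options stay local below this line
```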
To refine the project structure of tensorflow_blade
- Currently, tensorflow_blade is fully managed by Bazel, while in pytorch_blade Bazel only controls the C++ code. After some discussion, we decided to let tensorflow_blade follow pytorch_blade's style. This will help free the python project from miscellaneous Bazel details. For example:
  - Python developers usually treat a whole directory as a module instead of using fine-grained Bazel targets.
  - Making a .whl package is also not straightforward in Bazel, since it requires developers to specify dependencies carefully to make sure everything is included in the final package, to write a wrapper shell script, and to make that script a Bazel target. This is simple and easy in plain python.
- Make the internal part of tensorflow_blade independent from the PAI-Blade project and build it from the public tensorflow_blade, just like torch_blade does. This will also help make PAI-Blade a pure python project (free from Bazel, too).
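Once Bazel is out of the python part, the .whl packaging reduces to standard python tooling. A minimal sketch, where the project name, version, and dependency list are all placeholders:

```toml
# Hypothetical pyproject.toml for tensorflow_blade's python part.
# All metadata values below are placeholders, not the real ones.
[build-system]
requires = ["setuptools>=61", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "tensorflow-blade"
version = "0.0.0"
dependencies = ["tensorflow"]
```

Building the wheel is then a plain `pip wheel .`, with no wrapper shell script or Bazel target needed.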
Tasks:
- remove Bazel from tensorflow_blade's python part
- make the above project structure work with the internal part of tensorflow_blade
- other internal refactors

update 2022-06-27:
- need to build tao_compiler_main from tensorflow_blade, see: #420