
To build all with Bazel. (bladedisc, open)

alibaba commented on May 18, 2024
To build all with Bazel.


Comments (4)

Orion34-lanbo commented on May 18, 2024

Regarding one of our concerns, that we may need to change the current visibility of some Bazel targets, I did some research on the RAL part.

RAL's dependencies

As we can find in RAL's BUILD file, the following targets are needed (test dependencies like //tensorflow/core:test_main are excluded).

        "//tensorflow/core:framework",
        "//tensorflow/core:lib",
        "//tensorflow/core:protos_all_cc",
        "//tensorflow/core:stream_executor_headers_lib",
        "//tensorflow/stream_executor",
        "//tensorflow/stream_executor/cuda:cuda_platform",
        "//tensorflow/stream_executor:cuda_platform",
        "//tensorflow/stream_executor:rocm_platform",
        "//tensorflow/stream_executor/rocm:rocm_driver",
        "//tensorflow/stream_executor/rocm:rocm_platform",

In the current tf community code, only //tensorflow/stream_executor/rocm:rocm_driver is not publicly visible. However, //tensorflow/stream_executor:rocm_platform depends on it and is a public target we already depend on, so we can replace the dependency on //tensorflow/stream_executor/rocm:rocm_driver with //tensorflow/stream_executor:rocm_platform.

As we can see from tf_community/tensorflow/stream_executor/rocm/BUILD

cc_library(
    name = "rocm_platform",
    srcs = if_rocm_is_configured(["rocm_platform.cc"]),
    hdrs = if_rocm_is_configured(["rocm_platform.h"]),
    visibility = ["//visibility:public"],
    deps = if_rocm_is_configured([
        ":rocm_driver",
        ...
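
Putting these two observations together, the RAL deps list would look roughly like the sketch below. This is only an illustration of the proposed change; the cc_library name and the omitted attributes are assumptions, not the actual RAL target definition:

cc_library(
    name = "ral",  # illustrative name only; the real RAL target may differ
    # srcs/hdrs and other attributes omitted
    deps = [
        "//tensorflow/core:framework",
        "//tensorflow/core:lib",
        "//tensorflow/core:protos_all_cc",
        "//tensorflow/core:stream_executor_headers_lib",
        "//tensorflow/stream_executor",
        "//tensorflow/stream_executor/cuda:cuda_platform",
        "//tensorflow/stream_executor:cuda_platform",
        # replaces the non-public //tensorflow/stream_executor/rocm:rocm_driver
        "//tensorflow/stream_executor:rocm_platform",
        "//tensorflow/stream_executor/rocm:rocm_platform",
    ],
)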

tao_compiler dependencies

As for the tao_compiler part, most of the dependencies come from the tensorflow/compiler/mlir/hlo dir, and all the targets from this dir are public.

However, the targets under tensorflow/compiler/xla are only visible to friends. This is the part where we may need to change the visibility. For now, adding a visibility-change patch is necessary; alternatively, we can look for public targets that already wrap the friends-visible ones.
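
A minimal sketch of what such a visibility patch could look like, assuming the XLA package guards its targets with a "friends" package_group (the exact group name and our package path below are assumptions for illustration):

# tensorflow/compiler/xla/BUILD, patched inside tf_community (sketch)
package_group(
    name = "friends",
    packages = [
        # ... existing friend packages kept as-is ...
        "//tao_compiler/...",  # hypothetical path of our packages
    ],
)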


Orion34-lanbo commented on May 18, 2024

As we have done in "build tao bridge with bazel for cu110 device" (#231), we can successfully build tao_bridge for tensorflow-gpu versions.
However, for CPU and even ARM CPU (aka aarch64), we have run into a problem with mkldnn and acl.
Currently we download and build mkldnn and the related acl in common_setup.py:

def config_mkldnn(root, args):
    build_dir = mkldnn_build_dir(root)
    ensure_empty_dir(build_dir, clear_hidden=False)
    mkl_dir = mkl_install_dir(root)
    acl_dir = acl_root_dir(root)
    ensure_empty_dir(mkl_dir, clear_hidden=False)
    ensure_empty_dir(acl_dir, clear_hidden=False)
    if args.x86:
        with cwd(mkl_dir):
            # download mkl-lib/include
            download_cmd = """
              unset HTTPS_PROXY
              curl -fsSL https://hlomodule.oss-cn-zhangjiakou.aliyuncs.com/mkl_package/mkl-static-2022.0.1-intel_117.tar.bz2 | tar xjv
              curl -fsSL https://hlomodule.oss-cn-zhangjiakou.aliyuncs.com/mkl_package/mkl-include-2022.0.1-h8d4b97c_803.tar.bz2 | tar xjv
            """
            execute(download_cmd)
    if args.aarch64:
        with cwd(acl_dir):
            # download and build acl for onednn
            cmd = '''
              readonly ACL_REPO="https://github.com/ARM-software/ComputeLibrary.git"
              MAKE_NP="-j$(grep -c processor /proc/cpuinfo)"
              ACL_DIR={}
              git clone --branch v22.02 --depth 1 $ACL_REPO $ACL_DIR
              cd $ACL_DIR
              scons --silent $MAKE_NP Werror=0 debug=0 neon=1 opencl=0 embed_kernels=0 os=linux arch=arm64-v8a build=native extra_cxx_flags="-fPIC"
              exit $?
            '''.format(acl_dir)
            execute(cmd)
            # a workaround for static linking
            execute('rm -f build/*.so')
            execute('mv build/libarm_compute-static.a build/libarm_compute.a')
            execute('mv build/libarm_compute_core-static.a build/libarm_compute_core.a')
            execute('mv build/libarm_compute_graph-static.a build/libarm_compute_graph.a')
    with cwd(build_dir):
        cc = which("gcc")
        cxx = which("g++")
        # always link patine statically
        flags = " -DMKL_ROOT={} ".format(mkl_dir)
        envs = " CC={} CXX={} ".format(cc, cxx)
        if args.aarch64:
            envs += " ACL_ROOT_DIR={} ".format(acl_dir)
            flags += " -DDNNL_AARCH64_USE_ACL=ON "
        if args.ral_cxx11_abi:
            flags += " -DUSE_CXX11_ABI=ON"
        cmake_cmd = "{} cmake .. {}".format(envs, flags)
        logger.info("configuring mkldnn ......")
        execute(cmake_cmd)
        logger.info("mkldnn configure success.")

After this, the built mkldnn and acl will be used by tao_bridge and tao_compiler with CMake and bazel.
When used in tao_compiler, the tao dir is linked under tf_community.

When trying to support a Bazel build for mkldnn, the newly added mkldnn Bazel rules under third_party/bazel/mkldnn cannot be used in the tf_community dir without patching code in tf_community. So for now, as long as we only support the Bazel build for the tao_bridge part, the download and compile steps for mkldnn will not be removed. However, once tao_compiler becomes a single Bazel workspace, we can use our own mkldnn Bazel rules without doing extra build actions in common_setup.py.
We have the following actions to follow:

  1. bazel rule for mkldnn used by tao_bridge (for x86)
  2. bazel rule for acl used by tao_bridge (for aarch64)
  3. make tao_compiler a single bazel workspace
  4. make the tao_compiler build depend on our mkldnn/acl bazel rules

Items 1 and 2 are ongoing actions.
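
As an illustration of items 1 and 2, here is a rough sketch of how Bazel could wrap the mkldnn/acl artifacts produced by config_mkldnn above. The file path, target names, and install layout are assumptions for illustration, not the actual rules under third_party/bazel/mkldnn:

# e.g. third_party/bazel/mkldnn/mkldnn.BUILD (sketch only)
cc_library(
    name = "mkldnn",
    srcs = ["lib/libdnnl.a"],  # assumed layout of the prebuilt oneDNN install
    hdrs = glob(["include/**/*.h", "include/**/*.hpp"]),
    includes = ["include"],
    visibility = ["//visibility:public"],
)

cc_library(
    name = "acl",
    # the static archives produced by the static-linking workaround above
    srcs = [
        "build/libarm_compute.a",
        "build/libarm_compute_core.a",
        "build/libarm_compute_graph.a",
    ],
    hdrs = glob(["arm_compute/**/*.h", "include/**/*.h"]),
    includes = ["include", "."],
    visibility = ["//visibility:public"],
)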


Orion34-lanbo commented on May 18, 2024

Update:
In sprint2204, we completed the work of building tao_bridge with Bazel for both the internal and open-source versions, and the internal build now also uses the open-source tao_build.py.
As of now, all of DISC's targets can be built with Bazel. The remaining work items are as follows:

  • remove the useless CMake files and related scripts once we have completed a successful regression with the tao_bridge built by Bazel.
  • unify the configs in .bazelrc across tao_compiler, tao, pytorch_blade and tensorflow_blade, and share script logic as much as possible.
  • build the tao_compiler binary inside the tao_compiler workspace instead of the org_tensorflow workspace.
  • make all the RAL dependencies built by Bazel, and stop using CMake in tao_build.py.

The last two items are longer-term work, since the current Bazel build across multiple workspaces works fine for now. However, our final goal is still to build the entire repo in one large workspace.
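
For reference, a minimal sketch of the multi-workspace setup that the last two items would eventually simplify, assuming tao_compiler pulls in the patched TensorFlow checkout as an external repository (the workspace name and relative path are illustrative):

# tao_compiler/WORKSPACE (sketch)
workspace(name = "tao_compiler")

# reference the tf_community checkout that lives next to this workspace
local_repository(
    name = "org_tensorflow",
    path = "../tf_community",
)

# targets here can then depend on e.g.
#   @org_tensorflow//tensorflow/compiler/mlir/hlo:hlo
# until the workspace split is removed, the tao_compiler binary itself is
# still built from inside the org_tensorflow workspace.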


qiuxiafei commented on May 18, 2024

To refine the project structure of tensorflow_blade

  1. Currently, tensorflow_blade is fully managed by Bazel, while in pytorch_blade Bazel only takes control of the C++ code. After some discussion, we decided to let tensorflow_blade follow pytorch_blade's style. This will help free the Python project from miscellaneous Bazel details. For example:
    1. Python developers usually treat a whole directory as a module instead of using fine-grained Bazel targets.
    2. Making a .whl package is also not straightforward in Bazel, since it requires developers to specify dependencies carefully to make sure they are included in the final package, to write a wrapper shell script, and to make that script a Bazel target. This is simple and easy in plain Python.
  2. Make the internal part of tensorflow_blade independent from the PAI-Blade project and build it on top of the public tensorflow_blade, just like torch_blade does. This will also help make PAI-Blade a pure Python project (free from Bazel, too).

Tasks:

  • remove Bazel from tensorflow_blade's Python part.
  • make the above project structure work with the internal part of tensorflow_blade.
  • other internal refactors.

Update 2022-06-27:

  • Need to build tao_compiler_main from tensorflow_blade, see: #420

