sap-samples / datasphere-fedml

This repository is a collection of sample code showing how data from SAP and non-SAP systems can be made available for training in any hyperscaler machine learning service. It does so through several layers of abstraction, from data connection to training, using our FedML Python libraries.

License: Apache License 2.0

Jupyter Notebook 82.89% Python 17.11%


datasphere-fedml's Introduction

FedML

Description

The SAP Federated ML Python libraries (FedML) apply the Data Federation architecture of SAP Datasphere to source SAP and non-SAP data for machine learning experiments run on ML platforms, removing the need to replicate or move data. By abstracting the data connection and data load (for all ML platforms), as well as model training (with provision for user-provided training scripts), model deployment, and inferencing (for hyperscaler machine learning platforms), the FedML library provides end-to-end integration with a few lines of code.

What's New

1. The new version of FedML (available as fedml-dsp on PyPI, V1.0.0):

Is independent of the machine learning platform and can be used on any of them.
Supports NVIDIA RAPIDS™, CUDA cuDF and CuPy, and can therefore be used for training models in GPU environments.
Supports sourcing data from SAP Datasphere models directly into PySpark and CuPy (for GPU) dataframes.
Supports SAP AI Core deployment: models that are trained on any ML platform (and containerized independently) can now be deployed in SAP GenAI Hub's AI Core with a couple of lines of code.
Supports writing inference results back to SAP Datasphere.

Solution Architecture

2. FedML (Original, V2.0) for hyperscaler platforms [AWS, GCP, Azure and Databricks]:

Is pip-installable from PyPI for each respective hyperscaler platform.
Supports model training and deployment to the hyperscaler environment.
Supports deployment to the SAP Business Technology Platform Kyma environment.
Supports inferencing with hyperscaler-deployed as well as Kyma-deployed models.
Supports writing inference results back to SAP Datasphere.

Solution Architecture - FedML Hyperscaler libraries

Requirements

An SAP Datasphere tenant instance with connectivity established to the remote data sources and with views exposed for consumption by FedML.
Access to the corresponding machine learning platforms with appropriate configurations. See the Configuration section.

Download and Installation

Try out examples from the samples-notebooks directory of corresponding library folders

Configuration

For FedML (platform-independent) library specific pre-requisites, configuration and documentation, please refer here
For AWS FedML library specific pre-requisites, configuration and documentation, please refer here
For GCP FedML library specific pre-requisites, configuration and documentation, please refer here
For Azure FedML library specific pre-requisites, configuration and documentation, please refer here
For Databricks FedML library specific pre-requisites, configuration and documentation, please refer here

Limitations

None

How to obtain support

This project is provided "as-is" with no expectation for major changes or support.
Create an issue in this repository if you find a bug or have questions about the content.
For additional support, ask a question in SAP Community.

Licensing


datasphere-fedml's Issues

Support Pandas 2 and up

The current solution does not support pandas 2 and up because it relies on iteritems(), which was removed in pandas 2.0.0. See https://pandas.pydata.org/docs/whatsnew/v2.0.0.html

The calls to iteritems() need to be replaced with items().
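A minimal sketch of the kind of change this issue asks for. In pandas 2.0, DataFrame.iteritems() and Series.iteritems() were removed in favor of items(), which yields the same (label, values) pairs. The helper name iter_pairs is hypothetical and not part of FedML, and the LegacyFrame class is only a stand-in so the example runs without pandas installed.

```python
def iter_pairs(obj):
    """Iterate (label, values) pairs on old and new pandas-like objects.

    Prefers items(), which exists in both pandas 1.x and 2.x, and falls
    back to iteritems() only for objects that lack items().
    """
    method = getattr(obj, "items", None)
    if method is None:
        method = obj.iteritems  # legacy-only objects
    return method()


class LegacyFrame:
    """Stand-in for an object that only offers the removed iteritems() API."""

    def __init__(self, columns):
        self._columns = columns

    def iteritems(self):
        return iter(self._columns.items())


# With pandas this would read: for name, column in iter_pairs(df): ...
modern = {"price": [1.0, 2.0], "qty": [3, 4]}               # has items()
legacy = LegacyFrame({"price": [1.0, 2.0], "qty": [3, 4]})  # only iteritems()

modern_names = [name for name, _ in iter_pairs(modern)]
legacy_names = [name for name, _ in iter_pairs(legacy)]
```

Because items() is also available in pandas 1.x, replacing the call directly is usually enough. The fallback only matters for objects that do not provide items() at all.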

[rl-reuse_tool-3] Violation against OSS Rules of Play

A violation against the OSS Rules of Play has been detected.

Rule ID: rl-reuse_tool-3
Explanation: Is it registered in REUSE? No

Find more information at: https://sap.github.io/fosstars-rating-core/oss_rules_of_play_rating.html

Incompatible packages--Unable to train sample notebook on Vertex AI prebuilt container

I am trying to use Vertex AI to train a linear regression model on data stored in SAP DWC with the fedml_gcp package. I am following the sample notebook and script provided here. I have attached the notebook (test_1.ipynb) and Python script (LinearRegressionScript.py) that I am running, along with the log files. They are in this zip file: fedml files.zip

This is the code I ran the first time:

from fedml_gcp import dwcgcp
import os

PROJECT_ID = 'project-name'
REGION = 'asia-southeast1'

BUCKET_NAME = 'sap_dwc_fed'
BUCKET_URI = "gs://"+BUCKET_NAME
BUCKET_FOLDER = 'linear_test'
MODEL_OUTPUT_DIR = BUCKET_URI+'/'+BUCKET_FOLDER

SCRIPT_PATH = 'LinearRegressionScript.py'
JOB_NAME = "linear-regression-training"

MODEL_DISPLAY_NAME = "linear-regression-model"
DEPLOYED_MODEL_DISPLAY_NAME = 'linear-regression-deployed-model'

params = {'project': PROJECT_ID,
          'location': REGION,
          'staging_bucket': BUCKET_URI}

dwc = dwcgcp.DwcGCP(params)

TRAIN_VERSION = "scikit-learn-cpu.0-23"
DEPLOY_VERSION = "sklearn-cpu.1-0"

TRAIN_IMAGE = "us-docker.pkg.dev/vertex-ai/training/{}:latest".format(TRAIN_VERSION)
DEPLOY_IMAGE = "us-docker.pkg.dev/vertex-ai/prediction/{}:latest".format(DEPLOY_VERSION)

table_name = 'sample_dataset'
table_size = 1
job_dir = 'gs://'+BUCKET_NAME

cmd_args = [
    "--table_name=" + str(table_name),
    "--table_size="+ str(table_size),
    "--job-dir=" + str(job_dir),
    "--bucket_name=" + str(BUCKET_NAME),
    "--bucket_folder=" + str(BUCKET_FOLDER)
]

required_packages = [
    'fedml_gcp',
    'matplotlib>=2.2.3',
    'seaborn>=0.9.0',
    'scikit-learn>=0.20.2',
    'pandas>=1.1.4',
    'numpy',
    'hdbcli',
    'pandas-gbq'
]

inputs2 = {
    'display_name':JOB_NAME,
    'script_path':SCRIPT_PATH,
    'container_uri':TRAIN_IMAGE,
    'model_serving_container_image_uri':DEPLOY_IMAGE,
    'requirements':required_packages
}

run_job_params2 = {'model_display_name': MODEL_DISPLAY_NAME,
                   'args': cmd_args,
                   'replica_count': 1,
                   'base_output_dir': MODEL_OUTPUT_DIR,
                   'sync': True}

model = dwc.train_model(training_inputs=inputs2,
                        training_type='custom',
                        params=run_job_params2)
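
For context, the entries in cmd_args above reach the training script as ordinary command-line flags. The sketch below shows one way a script such as LinearRegressionScript.py might read them with argparse. The function name parse_training_args, the defaults, and the choice of float for table_size are assumptions made for illustration; the attached script may handle its arguments differently.

```python
import argparse


def parse_training_args(argv=None):
    """Parse the flags that the notebook passes through cmd_args."""
    parser = argparse.ArgumentParser(description="Training script arguments")
    parser.add_argument("--table_name", required=True,
                        help="Name of the view or table to read")
    parser.add_argument("--table_size", type=float, default=1.0,
                        help="Portion of the table to load (assumed meaning)")
    # The hyphenated flag is stored under the attribute name job_dir.
    parser.add_argument("--job-dir", dest="job_dir", default=None,
                        help="Storage location used by the training job")
    parser.add_argument("--bucket_name", required=True,
                        help="Bucket that receives the model artifacts")
    parser.add_argument("--bucket_folder", default="",
                        help="Folder inside the bucket")
    return parser.parse_args(argv)


# Same values as in the notebook cell above.
args = parse_training_args([
    "--table_name=sample_dataset",
    "--table_size=1",
    "--job-dir=gs://sap_dwc_fed",
    "--bucket_name=sap_dwc_fed",
    "--bucket_folder=linear_test",
])
```

Passing argv explicitly keeps the parser easy to test locally before the script is submitted as a training job.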

This first run resulted in package dependency issues (please refer to Log_1.json in the attached zip file). Based on the error messages, I changed the required packages to:

required_packages = [
    'fedml_gcp',
    'matplotlib>=2.2.3',
    'seaborn>=0.9.0',
    'scikit-learn>=0.20.2',
    'pandas>=1.1.4',
    'numpy',
    'hdbcli',
    'pandas-gbq',
    'google-auth<2,>=1.25.0',
    'google-auth-oauthlib<0.5,>=0.4.1',
    'google-api-core[grpc]<2.0.0dev,>=1.34.0',
    'google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<2dev,>=1.31.5',
    'google-cloud-core<2.0dev,>=1.1.0',
    'googleapis-common-protos[grpc]<2.0.0dev,>=1.56.0',
    'grpcio<2.0dev,>=1.47.0',
    'packaging<22.0.0dev,>=14.3',
    'protobuf!=3.20.0,!=3.20.1,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5'
]

This resulted in more package dependency issues (please refer to Log_2.json in the attached zip file). The errors shown this time require package versions that contradict the versions specified in the errors in the first log file. Here are the contradicting package requirements:

First log: "google-cloud-logging 1.15.0 has requirement google-cloud-core<2.0dev,>=1.1.0", and the second log: "google-cloud-storage 2.7.0 has requirement google-cloud-core<3.0dev,>=2.3.0"
First log: "google-api-python-client 1.9.3 has requirement google-api-core<2dev,>=1.18.0", and the second log: "pandas-gbq 0.19.1 has requirement google-api-core<3.0.0dev,>=2.10.2"
First log: "tensorboard 2.2.2 has requirement google-auth<2,>=1.6.3" and the second log: "pandas-gbq 0.19.1 has requirement google-auth>=2.13.0"
First log: "tensorboard 2.2.2 has requirement google-auth-oauthlib<0.5,>=0.4.1" and the second log: "pandas-gbq 0.19.1 has requirement google-auth-oauthlib>=0.7.0"
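
The contradictions listed above can be checked mechanically. The following is a rough sketch, using only the standard library, that tests whether two version constraints leave any version in common. It understands only the >= and < operators and drops suffixes such as dev, which is a simplification of the real version-specifier rules; the function names are made up for this example.

```python
import re


def parse_version(text):
    """Turn '2.0dev' or '1.25.0' into a 3-part tuple of ints (suffix dropped)."""
    nums = [int(part) for part in re.findall(r"\d+", text.split("dev")[0])]
    return tuple(nums + [0] * (3 - len(nums)))


def allowed_range(spec):
    """Return (inclusive lower bound, exclusive upper bound or None)."""
    lower, upper = (0, 0, 0), None
    for clause in spec.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            lower = max(lower, parse_version(clause[2:]))
        elif clause.startswith("<") and not clause.startswith("<="):
            version = parse_version(clause[1:])
            upper = version if upper is None else min(upper, version)
        # Other operators (!=, ==, ~=, <=) are ignored in this sketch.
    return lower, upper


def compatible(spec_a, spec_b):
    """True if some version can satisfy both constraints at once."""
    lower_a, upper_a = allowed_range(spec_a)
    lower_b, upper_b = allowed_range(spec_b)
    lower = max(lower_a, lower_b)
    uppers = [u for u in (upper_a, upper_b) if u is not None]
    return not uppers or lower < min(uppers)


# Constraint pairs quoted from the two logs above.
conflicts = {
    "google-cloud-core": compatible("<2.0dev,>=1.1.0", "<3.0dev,>=2.3.0"),
    "google-api-core": compatible("<2dev,>=1.18.0", "<3.0.0dev,>=2.10.2"),
    "google-auth": compatible("<2,>=1.6.3", ">=2.13.0"),
    "google-auth-oauthlib": compatible("<0.5,>=0.4.1", ">=0.7.0"),
}
```

Every entry in conflicts evaluates to False, meaning no single version of these packages can satisfy both the packages preinstalled in the container and the newly requested ones.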

I am using the Vertex AI prebuilt container that is used in the sample notebook provided by SAP as a part of the documentation for fedml-gcp (this container). But the packages pandas-gbq, hdbcli, and fedml_gcp are not getting installed on this container.

Please help!

[rl-assigned_teams-3] Violation against OSS Rules of Play

A violation against the OSS Rules of Play has been detected.

Rule ID: rl-assigned_teams-3
Explanation: Does it have enough admins on GitHub? No

Find more information at: https://sap.github.io/fosstars-rating-core/oss_rules_of_play_rating.html

[rl-reuse_tool-4] Violation against OSS Rules of Play

A violation against the OSS Rules of Play has been detected.

Rule ID: rl-reuse_tool-4
Explanation: Is it compliant with REUSE rules? No

Find more information at: https://sap.github.io/fosstars-rating-core/oss_rules_of_play_rating.html

[rl-reuse_tool-1] Violation against OSS Rules of Play

A violation against the OSS Rules of Play has been detected.

Rule ID: rl-reuse_tool-1
Explanation: Does README mention REUSE? No

Find more information at: https://sap.github.io/fosstars-rating-core/oss_rules_of_play_rating.html

[rl-assigned_teams-2] Violation against OSS Rules of Play

A violation against the OSS Rules of Play has been detected.

Rule ID: rl-assigned_teams-2
Explanation: Does it have an admin team on GitHub? No

Find more information at: https://sap.github.io/fosstars-rating-core/oss_rules_of_play_rating.html

[rl-vulnerability_alerts-1] Violation against OSS Rules of Play

The OSPO bot created this issue by mistake. It did not have sufficient privileges to check the vulnerability alerts, so I am closing this issue now. Sorry for any inconvenience.
