ibm / monitor-wml-model-with-watson-openscale

Monitor performance, fairness, and quality of a WML model with AI OpenScale APIs

Home Page: https://developer.ibm.com/patterns/monitor-performance-fairness-and-quality-of-a-wml-model-with-ai-openscale-apis

License: Apache License 2.0

Languages: Jupyter Notebook 100.00%
Topics: ai watson-machine-learning jupyter-notebook ibm-watson-studio

monitor-wml-model-with-watson-openscale's Introduction

Monitor WML Model With Watson OpenScale

In this Code Pattern, we will use German Credit data to create, train, and deploy a machine learning model using Watson Machine Learning. We will create a data mart for this model with Watson OpenScale, configure OpenScale to monitor that deployment, and inject seven days' worth of historical records and measurements for viewing in the OpenScale Insights dashboard.

When the reader has completed this Code Pattern, they will understand how to:

  • Create and deploy a machine learning model using the Watson Machine Learning service
  • Set up the Watson OpenScale Data Mart
  • Bind Watson Machine Learning to the Watson OpenScale Data Mart
  • Add subscriptions to the Data Mart
  • Enable payload logging and the performance monitor for subscribed assets
  • Enable the Quality (Accuracy) monitor
  • Enable the Fairness monitor
  • Enable the Drift monitor
  • Score the German credit model using Watson Machine Learning
  • Insert historic payloads, fairness metrics, and quality metrics into the Data Mart
  • Use the Data Mart to access table data via subscriptions
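As a rough orientation, the sketch below strings together the main OpenScale calls that appear elsewhere on this page, using the legacy ibm_ai_openscale Python client that the notebook relies on. It is a minimal outline, not the full notebook: AIOS_CREDENTIALS, WML_CREDENTIALS, model_uid, and recordsList are placeholders, and the subscriptions.add / WatsonMachineLearningAsset step is an assumption about that SDK rather than a line quoted from this pattern.

# Minimal sketch of the OpenScale flow used in this pattern (not the full notebook).
# AIOS_CREDENTIALS, WML_CREDENTIALS, model_uid, and recordsList are placeholders.
from ibm_ai_openscale import APIClient
from ibm_ai_openscale.engines import WatsonMachineLearningInstance, WatsonMachineLearningAsset

ai_client = APIClient(AIOS_CREDENTIALS)

# Create the Data Mart (internal DB here; pass db_credentials=... for PostgreSQL)
ai_client.data_mart.setup(internal_db=True)

# Bind the Watson Machine Learning service to the Data Mart
binding_uid = ai_client.data_mart.bindings.add(
    'WML instance', WatsonMachineLearningInstance(WML_CREDENTIALS))

# Subscribe the deployed model (assumed call; see the notebook for the real cell)
subscription = ai_client.data_mart.subscriptions.add(WatsonMachineLearningAsset(model_uid))

# Payload logging and the Quality (Accuracy) monitor
subscription.payload_logging.store(records=recordsList)
subscription.quality_monitoring.enable(threshold=0.7, min_records=50)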

architecture

Flow

  1. The developer creates a Jupyter Notebook on Watson Studio.
  2. The Jupyter Notebook is connected to a PostgreSQL database, which is used to store Watson OpenScale data.
  3. The notebook is connected to Watson Machine Learning and a model is trained and deployed.
  4. Watson OpenScale is used by the notebook to log payload and monitor performance, quality, and fairness.

Prerequisites

Steps

  1. Clone the repository
  2. Use the free internal DB or create a Databases for PostgreSQL instance
  3. Create a Watson OpenScale service
  4. Create a Watson Machine Learning instance
  5. Create a notebook in IBM Watson Studio on Cloud Pak for Data
  6. Run the notebook in IBM Watson Studio
  7. Set up OpenScale to utilize the dashboard

1. Clone the repository

git clone https://github.com/IBM/monitor-wml-model-with-watson-openscale
cd monitor-wml-model-with-watson-openscale

2. Use the free internal DB or create a Databases for PostgreSQL instance

If you wish, you can use the free internal database with Watson OpenScale. To do this, make sure that the cell with KEEP_MY_INTERNAL_POSTGRES = True remains unchanged.

If you have or wish to use a paid Databases for PostgreSQL instance, follow these instructions:

Note: Services created must be in the same region and space as your Watson Studio service.

  • Using the IBM Cloud Dashboard catalog, search for PostgreSQL and choose the Databases for PostgreSQL service.
  • Wait for the database to be provisioned.
  • Click on the Service Credentials tab on the left and then click New credential + to create the service credentials. Copy them or leave the tab open to use later in the notebook.
  • Make sure that the cell in the notebook that has:
KEEP_MY_INTERNAL_POSTGRES = True

is changed to:

KEEP_MY_INTERNAL_POSTGRES = False

3. Create a Watson OpenScale service

Create Watson OpenScale, either on the IBM Cloud or using your On-Premise Cloud Pak for Data.

On IBM Cloud
  • If you do not have an IBM Cloud account, register for an account

  • Create a Watson OpenScale instance from the IBM Cloud catalog

  • Select the Lite (Free) plan, enter a Service name, and click Create.

  • Click Launch Application to start Watson OpenScale.

  • Click Auto setup to automatically set up your Watson OpenScale instance with sample data.

    Cloud auto setup

  • Click Start tour to tour the Watson OpenScale dashboard.

On IBM Cloud Pak for Data platform

Note: This assumes that your Cloud Pak for Data Cluster Admin has already installed and provisioned OpenScale on the cluster.

  • In the Cloud Pak for Data instance, go to the (☰) menu and, under the Services section, click on the Instances menu option.

    Service

  • Find the OpenScale-default instance from the instances table and click the three vertical dots to open the action menu, then click on the Open option.

    Openscale Tile

  • If you need to give other users access to the OpenScale instance, go to the (☰) menu and, under the Services section, click on the Instances menu option.

    Service

  • Find the OpenScale-default instance from the instances table and click the three vertical dots to open the action menu, then click on the Manage access option.

    Openscale Tile

  • To add users to the service instance, click the Add users button.

    Openscale Tile

  • For all of the user accounts, select the Editor role for each user and then click the Add button.

    Openscale Tile

4. Create a Watson Machine Learning instance

  • Under the Settings tab, scroll down to Associated services, click + Add service and choose Watson:

    Add service

  • Search for Machine Learning, verify that this service is being created in the same region and space as your other services, and click Create.

    Create Machine Learning

  • Alternately, you can choose an existing Machine Learning instance and click on Select.

  • The Watson Machine Learning service is now listed as one of your Associated Services.

  • In a different browser tab go to https://cloud.ibm.com/ and log in to the Dashboard.

  • Click on your Watson Machine Learning instance under Services, click on Service credentials and then on View credentials to see the credentials.

  • Save the credentials in a file. You will use them inside the notebook.
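The credentials you save typically end up in the notebook's WML_CREDENTIALS cell. The shape below is only an illustration of an IAM-style credential; copy the exact fields from View credentials, since older instances may instead expose username/password fields and the url depends on your region.

# Illustrative shape of the WML_CREDENTIALS cell (IAM-style credentials).
# Copy the real values from "Service credentials" -> "View credentials";
# the field names and url below are examples, not guaranteed for your instance.
WML_CREDENTIALS = {
    "apikey": "<your-iam-api-key>",
    "instance_id": "<your-wml-instance-id>",
    "url": "https://us-south.ml.cloud.ibm.com"
}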

5. Create a notebook in IBM Watson Studio on Cloud Pak for Data

  • In Watson Studio or your on-premise Cloud Pak for Data, click New Project + under Projects or, at the top of the page, click + New, choose the Data Science tile, and then click Create Project.

  • Using the project you've created, click on + Add to project and then choose the Notebook tile, or, in the Assets tab under Notebooks, choose + New notebook to create a notebook.

  • Select the From URL tab. [1]

  • Enter a name for the notebook. [2]

  • Optionally, enter a description for the notebook. [3]

  • For Runtime select the Default Spark Python 3.7 option. [4]

  • Under Notebook URL provide the following url: https://raw.githubusercontent.com/IBM/monitor-wml-model-with-watson-openscale/master/notebooks/WatsonOpenScaleMachineLearning.ipynb

Note: The current default (as of 8/11/2021) is Python 3.8. This will cause an error when importing SparkSession from pyspark.sql (see the snippet below), so make sure that you are using Python 3.7.

  • Click the Create notebook button. [6]

OpenScale Notebook Create
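For reference, the runtime note above is about the SparkSession import the notebook performs early on; a minimal check you can run in the first cell looks like this (nothing here is specific to this pattern beyond the SparkSession dependency):

# Quick runtime check: the notebook needs SparkSession from pyspark.sql.
# On a Python 3.8 runtime without a matching Spark setup this import fails,
# which is why the Default Spark Python 3.7 runtime is recommended above.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
print(spark.version)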

6. Run the notebook in IBM Watson Studio

Follow the instructions for Provision services and configure credentials:

Your Cloud API key can be generated by going to the Users section of the Cloud console.

  • From that page, click your name, scroll down to the API Keys section, and click Create an IBM Cloud API key.

  • Give your key a name and click Create, then copy the created key and paste it into the notebook.

Alternatively, from the IBM Cloud CLI:

ibmcloud login --sso
ibmcloud iam api-key-create 'my_key'
  • Enter the CLOUD_API_KEY in the cell 1.1 Cloud API key.

Create COS bucket and get credentials

  • In your IBM Cloud Object Storage instance, create a bucket with a globally unique name. The UI will let you know if there is a naming conflict. This will be used in cell 1.3.1 as BUCKET_NAME.

  • In your IBM Cloud Object Storage instance, get the Service Credentials for use as COS_API_KEY_ID, COS_RESOURCE_CRN, and COS_ENDPOINT:

    COS credentials

  • Add the COS credentials in cell 1.2 Cloud object storage details.

  • Insert your BUCKET_NAME in the cell 1.2.1 Bucket name.

  • Either use the internal database, which requires no changes, or add your DB_CREDENTIALS after reading the instructions preceding that cell and change the cell KEEP_MY_INTERNAL_POSTGRES = True to KEEP_MY_INTERNAL_POSTGRES = False. A sketch of these credential cells appears after this list.

  • Move your cursor to each code cell and run the code in it. Read the comments for each cell to understand what the code is doing. Important: when the code in a cell is still running, the label to the left changes to In [*]:. Do not continue to the next cell until the code is finished running.
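For orientation, the credential cells referenced above (1.1, 1.2, and 1.2.1) end up looking roughly like the sketch below. Every value is a placeholder to be replaced with your own, and the COS endpoint shown is only an example region endpoint.

# Placeholder values for the credential cells referenced above.
CLOUD_API_KEY = "<your-ibm-cloud-api-key>"          # cell 1.1 Cloud API key

COS_API_KEY_ID = "<cos-service-credential-apikey>"  # cell 1.2 Cloud object storage details
COS_RESOURCE_CRN = "<cos-instance-crn>"
COS_ENDPOINT = "https://s3.us-south.cloud-object-storage.appdomain.cloud"  # example endpoint

BUCKET_NAME = "<your-globally-unique-bucket-name>"  # cell 1.2.1 Bucket name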

7. Set up OpenScale to utilize the dashboard

Now that you have created a machine learning model, you can utilize the OpenScale dashboard to gather insights.

Sample Output

You can find a sample notebook with output in WatsonOpenScaleMachineLearning-with-outputs.ipynb.

Openscale Dashboard

  • Go to the instance of Watson OpenScale for an IBM Cloud deployment, or to your deployed instance on the on-premise Cloud Pak for Data version. Choose the Insights tab to get an overview of your monitored deployments, Accuracy alerts, and Fairness alerts.

WOS insights

  • Click on the tile for the Spark German Risk Deployment and you can see tiles for the Fairness, Accuracy, and Performance monitors.

OpenScale monitors

  • Click on one of the tiles, such as Drift, to view details. Click on a point on the graph for more information on that particular time slice.

Drift monitor

  • You will see which types of drift were detected. Click on the number to bring up a list of transactions that led to drift.

Drift transactions

  • Click on the Explain icon on the left-hand menu and you'll see a list of transactions that have been run using an algorithm to provide explainability. Choose one and click Explain.

Choose transaction to explain

  • You will see a graph showing all the most influential features with the relative weights of contribution to the Predicted outcome.

View feature weights

  • Click the Inspect tab and you can experiment with changing the values of various features to see how that would affect the outcome. Click the Run analysis button to see what changes would be required to change the outcome.

Inspect features and change

License

Apache 2.0

monitor-wml-model-with-watson-openscale's People

Contributors

dolph, imgbot[bot], ljbennett62, rhagarty, sandhya-nayak, sanjeevghimire, scottdangelo, stevemart


monitor-wml-model-with-watson-openscale's Issues

Typos in the notebook

Hi,
I found some typos in the notebook:

  1. At the "load tranning data from github" section the link is broken and need to be replaces with - !wget https://raw.githubusercontent.com/IBM/monitor-wml-model-with-watson-openscale/master/data/german_credit_data_biased_training.csv

Screenshot_1

  1. At the "set up datamart" section there is if statement about DB2_CREDENTIALS which never defined - I assume it should be DB_CREDENTIALS
    Screenshot_2

Having issues running the notebook in my environment

I have an instance of Spark created in my cloud and use it as the runtime for this notebook. I get the following error and am not able to resolve it. I would like to demo OpenScale; can someone please take a look at the error and fix it? I reached out to Lukasz Cmielowski and he asked me to open this issue. Thank you.

Here is the code I get the error in:
from pyspark.ml.classification import RandomForestClassifier
classifier = RandomForestClassifier(featuresCol="features")

pipeline = Pipeline(stages=[si_CheckingStatus, si_CreditHistory, si_EmploymentDuration, si_ExistingSavings, si_ForeignWorker, si_Housing, si_InstallmentPlans, si_Job, si_LoanPurpose, si_OthersOnLoan, si_OwnsProperty, si_Sex, si_Telephone, si_Label, va_features, classifier, label_converter])
model = pipeline.fit(train_data)

Here is the error:
Py4JJavaError Traceback (most recent call last)
/usr/local/src/spark21master/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
62 try:
---> 63 return f(*a, **kw)
64 except py4j.protocol.Py4JJavaError as e:

/usr/local/src/spark21master/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
327 "An error occurred while calling {0}{1}{2}.\n".
--> 328 format(target_id, ".", name), value)
329 else:

Py4JJavaError: An error occurred while calling o163.transform.
: java.lang.IllegalArgumentException: Data type StringType is not supported.
at org.apache.spark.ml.feature.VectorAssembler$$anonfun$transformSchema$1.apply(VectorAssembler.scala:121)
at org.apache.spark.ml.feature.VectorAssembler$$anonfun$transformSchema$1.apply(VectorAssembler.scala:117)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:117)
at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74)
at org.apache.spark.ml.feature.VectorAssembler.transform(VectorAssembler.scala:54)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:90)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
at java.lang.reflect.Method.invoke(Method.java:508)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:812)

During handling of the above exception, another exception occurred:

IllegalArgumentException Traceback (most recent call last)
in ()
3
4 pipeline = Pipeline(stages=[si_CheckingStatus, si_CreditHistory, si_EmploymentDuration, si_ExistingSavings, si_ForeignWorker, si_Housing, si_InstallmentPlans, si_Job, si_LoanPurpose, si_OthersOnLoan, si_OwnsProperty, si_Sex, si_Telephone, si_Label, va_features, classifier, label_converter])
----> 5 model = pipeline.fit(train_data)

/usr/local/src/spark21master/spark/python/pyspark/ml/base.py in fit(self, dataset, params)
62 return self.copy(params)._fit(dataset)
63 else:
---> 64 return self._fit(dataset)
65 else:
66 raise ValueError("Params must be either a param map or a list/tuple of param maps, "

/usr/local/src/spark21master/spark/python/pyspark/ml/pipeline.py in _fit(self, dataset)
104 if isinstance(stage, Transformer):
105 transformers.append(stage)
--> 106 dataset = stage.transform(dataset)
107 else: # must be an Estimator
108 model = stage.fit(dataset)

/usr/local/src/spark21master/spark/python/pyspark/ml/base.py in transform(self, dataset, params)
103 return self.copy(params)._transform(dataset)
104 else:
--> 105 return self._transform(dataset)
106 else:
107 raise ValueError("Params must be a param map but got %s." % type(params))

/usr/local/src/spark21master/spark/python/pyspark/ml/wrapper.py in _transform(self, dataset)
250 def _transform(self, dataset):
251 self._transfer_params_to_java()
--> 252 return DataFrame(self._java_obj.transform(dataset._jdf), dataset.sql_ctx)
253
254

/usr/local/src/spark21master/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in call(self, *args)
1255 answer = self.gateway_client.send_command(command)
1256 return_value = get_return_value(
-> 1257 answer, self.gateway_client, self.target_id, self.name)
1258
1259 for temp_arg in temp_args:

/usr/local/src/spark21master/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
77 raise QueryExecutionException(s.split(': ', 1)[1], stackTrace)
78 if s.startswith('java.lang.IllegalArgumentException: '):
---> 79 raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
80 raise
81 return deco

IllegalArgumentException: 'Data type StringType is not supported.'
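For context, this error usually means a raw string column reached the VectorAssembler: it only accepts numeric or vector inputs, so every string column must go through a StringIndexer (or encoder) first, and only the indexed columns should be listed in inputCols. A minimal illustration, with hypothetical column names rather than the notebook's actual ones:

# Illustration of the fix: VectorAssembler only accepts numeric or vector columns,
# so index/encode string columns first. Column names here are hypothetical.
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

si_sex = StringIndexer(inputCol="Sex", outputCol="Sex_IX")
va_features = VectorAssembler(
    inputCols=["Sex_IX", "LoanAmount", "LoanDuration"],  # indexed/numeric columns only
    outputCol="features")
classifier = RandomForestClassifier(featuresCol="features", labelCol="label")

pipeline = Pipeline(stages=[si_sex, va_features, classifier])
# model = pipeline.fit(train_data)  # train_data must contain the columns above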

The Jupyter notebook does not work

Below is the error seen when running the cell with this code:

wml_models = wml_client.repository.get_details()
model_uid = None
for model_in in wml_models['models']['resources']:
    if MODEL_NAME == model_in['entity']['name']:
        model_uid = model_in['metadata']['guid']
        break

if model_uid is None:
    print("Storing model ...")

    published_model_details = wml_client.repository.store_model(model=model, meta_props=model_props, training_data=train_data, pipeline=pipeline)
    model_uid = wml_client.repository.get_model_uid(published_model_details)
    print("Done")

2019-07-15 06:27:06,712 - watson_machine_learning_client.wml_client_error - WARNING - Publishing model failed.
Reason: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Server': 'nginx', 'Date': 'Mon, 15 Jul 2019 06:27:06 GMT', 'Content-Type': 'application/json', 'Content-Length': '166', 'Connection': 'keep-alive', 'X-Frame-Options': 'DENY', 'X-Content-Type-Options': 'nosniff', 'X-XSS-Protection': '1', 'Pragma': 'no-cache', 'Cache-Control': 'private, no-cache, no-store, must-revalidate', 'X-WML-User-Client': 'PythonClient', 'x-global-transaction-id': '423161n6m4d1d3f9f6c13759kew16c2e9f7b', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains'})
HTTP response body: {"trace":"423161n6m4d1d3f9f6c13759kew16c2e9f7b","errors":[{"code":"invalid_framework_input","message":"The framework value specified: mllib, 2.4 is not supported."}]}

Error while running FPGrowth Model in Spark

Getting this error while trying to run a piece of code.
Code runs fine when using a different sample dataset.


fp_growth = FPGrowth(itemsCol="country", minSupport=0.1, minConfidence=0.5)
model = fp_growth.fit(grouped_orders)

Py4JJavaError Traceback (most recent call last)
in
1 fp_growth = FPGrowth(itemsCol="country", minSupport=0.1, minConfidence=0.5)
----> 2 model = fp_growth.fit(grouped_orders)

5 frames
/usr/local/lib/python3.8/dist-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)

Py4JJavaError: An error occurred while calling o164.fit.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 35.0 failed 1 times, most recent failure: Lost task 0.0 in stage 35.0 (TID 33) (973340009f58 executor driver): org.apache.spark.SparkException: Items in a transaction must be unique but got WrappedArray(Germany, Germany, Germany, Germany, Germany, Germany, Germany, Germany, Germany, Germany, Germany, Germany, Germany, Germany, Germany, Germany).
at org.apache.spark.mllib.fpm.FPGrowth.$anonfun$genFreqItems$1(FPGrowth.scala:249)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:197)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:136)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2672)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2608)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2607)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2607)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1182)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1182)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1182)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2860)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2802)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2791)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:952)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2228)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2249)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2268)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2293)
at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1021)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:406)
at org.apache.spark.rdd.RDD.collect(RDD.scala:1020)
at org.apache.spark.mllib.fpm.FPGrowth.genFreqItems(FPGrowth.scala:254)
at org.apache.spark.mllib.fpm.FPGrowth.run(FPGrowth.scala:219)
at org.apache.spark.ml.fpm.FPGrowth.$anonfun$genericFit$1(FPGrowth.scala:180)
at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
at org.apache.spark.ml.fpm.FPGrowth.genericFit(FPGrowth.scala:162)
at org.apache.spark.ml.fpm.FPGrowth.fit(FPGrowth.scala:159)
at org.apache.spark.ml.fpm.FPGrowth.fit(FPGrowth.scala:129)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.spark.SparkException: Items in a transaction must be unique but got WrappedArray(Germany, Germany, Germany, Germany, Germany, Germany, Germany, Germany, Germany, Germany, Germany, Germany, Germany, Germany, Germany, Germany).
at org.apache.spark.mllib.fpm.FPGrowth.$anonfun$genFreqItems$1(FPGrowth.scala:249)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:197)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:136)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
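The failure above is FPGrowth rejecting duplicate items within a single transaction. One possible workaround, sketched against the reporter's grouped_orders DataFrame and assuming Spark 2.4+ for array_distinct, is to de-duplicate the item arrays before fitting:

# Possible workaround: FPGrowth requires the items of each transaction to be unique,
# so de-duplicate the array column before fitting. grouped_orders and the "country"
# column come from the report above.
from pyspark.sql import functions as F
from pyspark.ml.fpm import FPGrowth

grouped_orders_dedup = grouped_orders.withColumn("country", F.array_distinct("country"))

fp_growth = FPGrowth(itemsCol="country", minSupport=0.1, minConfidence=0.5)
model = fp_growth.fit(grouped_orders_dedup)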

Problem in the notebook steps sequence

At the "Enable quality monitoring" step there is a call to enable quality monitoring with threshold and minrecords, the problem is that it gives error when you try to run it
Screenshot_3

I found out that you need to run the "Insert historical payloads" section before that to make it work, as in the pictures below. To be more precise, what needs to run first is the payload logging store call: subscription.payload_logging.store(records=recordsList) (see the sketch below).
Screenshot_1

Screenshot_2
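In other words, the ordering the reporter describes looks roughly like this sketch; subscription and recordsList come from earlier cells in the notebook, and the ten-second pause mirrors the notebook's own wait before enabling the monitor.

# Ordering described above: store payload records first so the output schema is
# populated, then enable quality monitoring. subscription and recordsList come
# from earlier notebook cells.
import time

subscription.payload_logging.store(records=recordsList)

time.sleep(10)  # matches the notebook's pause before enabling the monitor
subscription.quality_monitoring.enable(threshold=0.7, min_records=50)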

Changes for new `Databases for PostgreSQL`

We no longer have Compose for PostgreSQL and will need to update the notebook and instructions to use Databases for PostgreSQL.
The version of the Watson OpenScale Python SDK will need to be updated.

Add instructions to define DB_CREDENTIALS

It is not clear that DB_CREDENTIALS must be defined, but it will fail if not:

under Set up datamart

try:
    data_mart_details = ai_client.data_mart.get_details()
    if 'internal_database' in data_mart_details and data_mart_details['internal_database']:
        if KEEP_MY_INTERNAL_POSTGRES:
            print('Using existing internal datamart.')
        else:
            if DB_CREDENTIALS is None:
                print('No postgres credentials supplied. Using existing internal datamart')
            else:
                print('Switching to external datamart')
                ai_client.data_mart.delete(force=True)
                ai_client.data_mart.setup(db_credentials=DB_CREDENTIALS)
    else:
        print('Using existing external datamart')
except:
    if DB_CREDENTIALS is None:
        print('Setting up internal datamart')
        ai_client.data_mart.setup(internal_db=True)
    else:
        print('Setting up external datamart')
        try:
            ai_client.data_mart.setup(db_credentials=DB_CREDENTIALS)
        except:
            print('Setup failed, trying Db2 setup')
            ai_client.data_mart.setup(db_credentials=DB_CREDENTIALS, schema=DB_CREDENTIALS['username'])
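A simple way to address this is to make sure DB_CREDENTIALS is always defined before that cell runs, for example with a cell along these lines (the commented structure is only a placeholder, not the actual credential format):

# Define DB_CREDENTIALS before the datamart cell above. Leave it as None to use
# the free internal database, or paste your Databases for PostgreSQL service
# credentials dict here (the commented structure is only a placeholder).
DB_CREDENTIALS = None
# DB_CREDENTIALS = {
#     "connection": {...},
#     ...
# }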

utils.py create_connection_string() fails on IBM Cloud `Databases for PostgreSQL` credentials

Previously, a user was able to use the Compose for PostgreSQL offering from IBM Cloud, but that offering is no longer available. I documented this with this diagram:
https://github.com/IBM/monitor-custom-ml-engine-with-watson-openscale/blob/master/doc/source/images/ChooseComposePostgres.png

Now, we need to use the Databases for PostgreSQL version.
The credential structure is different, however, and the utils.py:create_connection_string() function fails with this exception trace:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-6-df89728f4ac4> in <module>()
----> 1 create_postgres_schema(postgres_credentials=POSTGRES_CREDENTIALS, schema_name=SCHEMA_NAME)

/opt/conda/envs/DSX-Python35/lib/python3.5/site-packages/ibm_ai_openscale/utils/utils.py in create_postgres_schema(postgres_credentials, schema_name)
    273     import psycopg2
    274 
--> 275     conn_string = create_connection_string(postgres_credentials)
    276     conn = psycopg2.connect(conn_string)
    277     conn.autocommit = True

/opt/conda/envs/DSX-Python35/lib/python3.5/site-packages/ibm_ai_openscale/utils/utils.py in create_connection_string(postgres_credentials, db_name)
    291 
    292 def create_connection_string(postgres_credentials, db_name='compose'):
--> 293     hostname = postgres_credentials['uri'].split('@')[1].split(':')[0]
    294     port = postgres_credentials['uri'].split('@')[1].split(':')[1].split('/')[0]
    295     user = postgres_credentials['uri'].split('@')[0].split('//')[1].split(':')[0]

KeyError: 'uri'
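One hypothetical workaround, until the notebook and SDK are updated, is to build a Compose-style dict with the 'uri' key that create_connection_string() expects. The field path below assumes the Databases for PostgreSQL credentials expose a composed connection URI; verify it against your own service credentials JSON before relying on it.

# Hypothetical adapter: create_connection_string() expects a Compose-style dict
# with a 'uri' key. The field path below is an assumption about the
# "Databases for PostgreSQL" credentials JSON -- check your own credentials.
new_credentials = DB_CREDENTIALS  # the Databases for PostgreSQL credentials dict

POSTGRES_CREDENTIALS = {
    "uri": new_credentials["connection"]["postgres"]["composed"][0]
}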

Having issues with WML service credentials

Hi, I tried to run the Jupyter notebook; everything was fine until I reached the section Bind machine learning engines. I got the below error:


KeyError Traceback (most recent call last)
in
----> 1 binding_uid = ai_client.data_mart.bindings.add('WML instance', WatsonMachineLearningInstance(WML_CREDENTIALS))
2 if binding_uid is None:
3 binding_uid = ai_client.data_mart.bindings.get_details()['service_bindings'][0]['metadata']['guid']
4 bindings_details = ai_client.data_mart.bindings.get_details()
5 ai_client.data_mart.bindings.list()

/opt/conda/envs/Python36/lib/python3.6/site-packages/ibm_ai_openscale/engines/watson_machine_learning/instance.py in init(self, service_credentials)
31
32 validate_type(service_credentials['apikey'], 'service_credentials.apikey', str, True)
---> 33 validate_type(service_credentials['username'], 'service_credentials.username', str, True)
34 validate_type(service_credentials['password'], 'service_credentials.password', str, True)
35 AIInstance.init(self, service_credentials['instance_id'], service_credentials, WMLConsts.SERVICE_TYPE)

KeyError: 'username'

Note that my machine learning instance uses an IAM token (API key), so the credentials do not contain a username and password. Moreover, the Save and deploy the model section, using the same credentials, worked just fine.

from watson_machine_learning_client import WatsonMachineLearningAPIClient
import json

wml_client = WatsonMachineLearningAPIClient(WML_CREDENTIALS)
......

published_model_details = wml_client.repository.store_model(model=model, meta_props=model_props, training_data=train_data, pipeline=pipeline)

By the way, I ran the notebook in Watson Studio with Python 3.6; the ibm_ai_openscale and pyspark versions are the same as in the notebook. Could you please take a look? Thank you.

No longer works due to changes in IBM Cloud

When running this code in the notebook:

time.sleep(10)
subscription.quality_monitoring.enable(threshold=0.7, min_records=50)

Received errors:

MissingValue: No “output_data_schema” provided.
Reason: Column predictedLabel cannot be found in output_data_schema. Check if this column name is valid. Make sure that payload has been logged to populate schema.
