Hi @nidhidamodaran,
No, it's not tied to GCS access at all. Kubeflow Pipelines itself is designed to run on-premise as well as on GKE, so you shouldn't need to use GCP at all. The example TFX Chicago taxi pipeline on Kubeflow does use GCP services: GCS for storage, Dataflow for Beam jobs, and Cloud ML Engine for training at scale. You can, however, easily remove these dependencies and run the pipeline on-premise, though scalability will be a problem without a distributed runner for Beam. Hope that helps. Let me know if you have any more questions.
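To illustrate the point about removing the GCS dependency: switching a pipeline between GCS and on-premise storage is mostly a matter of changing its root paths. The sketch below is a hypothetical illustration (the variable names mirror the taxi example, but the values and the helper function are made up):

```python
# Hypothetical sketch: the same pipeline definition can point at local
# storage instead of GCS just by changing its root paths.
import os

def is_gcs_path(path: str) -> bool:
    """Return True if the path targets Google Cloud Storage."""
    return path.startswith('gs://')

# On GKE, artifacts typically live in a GCS bucket:
_pipeline_root_gcp = 'gs://my-bucket/tfx/pipelines'

# On-premise: any path visible to the workers (e.g. an NFS mount):
_pipeline_root_local = os.path.join('/var/tmp/tfx', 'pipelines')

assert is_gcs_path(_pipeline_root_gcp)
assert not is_gcs_path(_pipeline_root_local)
```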
from tfx.
Hi @neuromage thanks for the reply.
I was trying out the Chicago taxi sample to get hands-on with TFX.
Below is the sample I tried:
```python
@PipelineDecorator(
    pipeline_name='chicago_taxi_simple',
    log_root='/var/tmp/tfx/logs',
    enable_cache=True,
    additional_pipeline_args={'logger_args': logger_overrides},
    pipeline_root=_pipeline_root)
def _create_pipeline():
  examples = csv_input(_data_root)
  example_gen = CsvExampleGen(input_base=examples)
  statistics_gen = StatisticsGen(input_data=example_gen.outputs.examples)
  infer_schema = SchemaGen(stats=statistics_gen.outputs.output)
  validate_stats = ExampleValidator(
      stats=statistics_gen.outputs.output, schema=infer_schema.outputs.output)
  return [example_gen, statistics_gen, infer_schema, validate_stats]

pipeline = KubeflowRunner().run(_create_pipeline())
```
When I run the pipeline, pod creation fails with the error:
Unable to mount volumes for pod "chicago-taxi-simple-wwxmc-4030983742_kubeflow(438e1b67-4c73-11e9-a7e6-0273ce6a77d4)": timeout expired waiting for volumes to attach or mount for pod "kubeflow"/"chicago-taxi-simple-wwxmc-4030983742". list of unmounted volumes=[gcp-credentials]. list of unattached volumes=[podmetadata docker-lib docker-sock gcp-credentials pipeline-runner-token-gk4d7]
Could you help me understand what I am doing wrong here?
Are you running this in a Kubeflow cluster in GCP? It looks like that's not the case. The error indicates that it was unable to mount the GCP credentials.
I think this brings up an important issue though. We should allow non-GCP usage of the components, so mounting of GCP credentials should be user-configurable. I'll work on fixing this. In the meantime, if you're truly running on-prem, you can change the following line of code to remove the apply call, so that GCP authentication is not used:
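For reference, the change being suggested looked roughly like the following at the time (a reconstruction of the era's KubeflowRunner code, not a verbatim quote; `use_gcp_secret('user-gcp-sa')` was the KFP idiom for mounting GCP service-account credentials):

```diff
 component_op = dsl.ContainerOp(...)
-component_op.apply(gcp.use_gcp_secret('user-gcp-sa'))
```

Dropping the `.apply(...)` call removes the `gcp-credentials` volume mount that was failing in the error above.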
Yes @neuromage, I will try that. Thanks.
Also, is there any option of using a custom image for different pipeline stages in TFX?
> Also, is there any option of using a custom image for different pipeline stages in TFX?
Right now, this isn't possible without some work. You'd probably need to write the pipeline using the Kubeflow Pipelines SDK instead, which would let you insert custom images/steps into your pipeline. However, this isn't straightforward, as you need to figure out how to pass around the metadata artifacts and use them in your custom step. I am planning to enable this use-case soon though, and will document it as a sample within the Kubeflow Pipelines repo when it's done.
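The artifact-passing problem described above can be illustrated abstractly. The sketch below is not the KFP or TFX API; it just models each step as a function that consumes and produces named artifact URIs, which is the bookkeeping a custom step would have to replicate (real TFX components record these artifacts in ML Metadata):

```python
# Illustrative model only: each "step" takes and returns named artifact URIs.

def example_gen(data_uri):
    # Produces an "examples" artifact under the data root.
    return {'examples': data_uri + '/examples'}

def statistics_gen(artifacts):
    # Consumes "examples", produces "statistics".
    return {'statistics': artifacts['examples'] + '/stats'}

def custom_step(artifacts):
    # A custom image/step must know where upstream artifacts live.
    return {'report': artifacts['statistics'] + '/report'}

artifacts = example_gen('/var/tmp/tfx/data')
artifacts.update(statistics_gen(artifacts))
artifacts.update(custom_step(artifacts))
```

In a real pipeline this wiring is done for you by the TFX components; a custom step would need to read and write the same metadata records itself.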
/cc @krazyhaas
/cc @zhitaoli
/cc @ruoyu90
> You'd probably need to write the pipeline using the Kubeflow Pipelines SDK instead. … However, this isn't straightforward, as you need to figure out how to pass around the metadata artifacts.
I'd love to add that to the docs, or maybe even the README.md on the TFX examples in the Kubeflow Pipelines repo.
We had found this PR and were wondering whether it was already possible to pass these artifacts around, whether that was encouraged, or whether, to use the metadata store, the pipeline could only be constructed of TFX components. Your comment seems to answer all of these questions!
Thanks for all of these detailed responses @neuromage, they have been super helpful in connecting some dots while getting started.
Thanks @MattMorgis. Yeah, right now we record those artifacts, and TFX components know how to pass them around in a Kubeflow pipeline, but we haven't made this easily accessible to custom components just yet. I am planning on enabling this over the next few weeks. I'll update this thread with more info then.
We made some progress running this on AWS with Kubeflow, but we just hit one snag that is going to take a bit to overcome:
ValueError: Unable to get the Filesystem for path s3://<bucket>/data.csv
It's interesting because it is successfully connecting to S3 and reading the filename, data.csv; we simply specify the bucket.
However, I think the error that is raised is related to Apache Beam's Python SDK not having an S3 FileSystem: https://issues.apache.org/jira/browse/BEAM-2572
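The `ValueError` comes from Beam's filesystem registry: Beam's `FileSystems` facade looks up a handler by the path's scheme, and at the time no handler registered itself for `s3`. Below is a minimal sketch of that dispatch pattern; it is not Beam's actual code, and the registry contents are illustrative:

```python
# Toy registry mirroring the idea behind Beam's FileSystems dispatch.
_FILESYSTEMS = {'file': 'LocalFileSystem', 'gs': 'GCSFileSystem'}
# Note: no 's3' entry, just as in Beam before BEAM-2572 landed.

def get_filesystem(path):
    # Paths without a scheme fall back to the local filesystem.
    scheme = path.split('://', 1)[0] if '://' in path else 'file'
    try:
        return _FILESYSTEMS[scheme]
    except KeyError:
        raise ValueError('Unable to get the Filesystem for path %s' % path)

get_filesystem('gs://bucket/data.csv')   # resolves
# get_filesystem('s3://bucket/data.csv') would raise ValueError
```

This is why TensorFlow (which has its own S3-aware filesystem layer) could read the file while the Beam-based parts of the pipeline could not.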
> However, I think the error that is raised is related to Apache Beam's Python SDK not having an S3 FileSystem: https://issues.apache.org/jira/browse/BEAM-2572
That's correct. Until Beam's Python SDK supports S3, we can't run most of the TFX libraries on S3. We have a similar challenge with Azure Blob Storage.
I've been working on it. I'm about 50% complete and am working with the Beam project/team to get it merged.
According to the ticket, there is a Google Summer of Code student who may implement the Azure Blob Storage file system as well.
That is a very good point @zhitaoli. We realized TensorFlow itself had S3 support, and it was able to find the CSV file in the bucket we were pointing to; however, we then ran into the Beam unsupported S3 file system error.
I didn't realize Azure Blob Storage wasn't supported in TensorFlow itself either, in addition to Beam. I'll mention that in the ticket.
Looks like Beam support for S3 is close to being implemented (see https://issues.apache.org/jira/browse/BEAM-2572 and apache/beam#9955).
I would just like to second what has been discussed here. There is a pretty large user community who are interested in TFX and/or Kubeflow but are currently struggling to get into those frameworks due to a lack of non-GCP examples (and sometimes core functionality).
A TFX Chicago Taxi example on Kubeflow for AWS/Azure/on-prem would be a great starting point for those of us who are currently not on GCP!
taxi_pipeline_kubeflow_local.py does not depend on GCP or GKE; it only depends on a Kubeflow Pipelines deployment (backend) on any k8s cluster.
To capture the discussion: TFX examples can already use HDFS and GCS (although we don't have an example for the former), and after Beam 2.18 is picked up there will also be S3 support.
Azure Blob Storage support is tracked on the Beam side.
We will file a separate feature request (#1185) for using separate images in each stage and discuss it there.
We will close this issue as won't-fix. Please let us know if you think otherwise.
@zhitaoli I thought the Beam fix won't be integrated until 2.19, per the last update from Pablo. Can you confirm it will be available in 2.18?
@zhitaoli Can you please take a look at the above comment. Thanks!
@zhitaoli @gowthamkpr With Apache Beam 2.19, we still get
ValueError: Unable to get the Filesystem for path s3://<bucket>/test-data/*
Are there some TFX dependencies? We use TFX 0.21.
I think TFX installs Apache Beam with the `gcp` extras, and to use S3 you'll need to install with the `aws` ones. I also think the only way to do that right now is to rebuild from source.
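For reference, installing Beam with its AWS extras (which pull in the boto3-based S3 filesystem) looks like the following; whether TFX's own dependency pins allow this without rebuilding from source is version-dependent, as noted above:

```
pip install 'apache-beam[aws]'
```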