
Comments (4)

mbookman avatar mbookman commented on August 27, 2024

Hi @faruqsandi !

Thank you for reporting the problem you ran into. When you say "it is not working", can you clarify what you observed? Was there a job failure?

We have a test for recursive outputs that does include spaces. It doesn't include spaces at the end of the path, as your test case did. However, I just tested this scenario and did get outputs. I would expect you to see outputs in your target directory if you double-quote your path (and include a trailing slash):

gsutil ls "gs://mybucket/faruq/fastp_pe_manual /"

or you could use the extended wildcard support:

gsutil ls gs://mybucket/faruq/fastp_pe_manual**

dsub is technically doing the "right" thing here in supporting passing through strings that can have whitespace (uncommon on *nix systems, more common on Windows). That said, it would be very reasonable for dsub to try to detect "problematic" characters (such as spaces) in paths and emit a warning to users.
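As a rough illustration of the kind of check described above, here is a minimal shell sketch. It is not dsub code: `path_has_whitespace` is a hypothetical helper name, and the warning text is invented for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of dsub): succeeds if the path contains
# any whitespace character, fails otherwise.
path_has_whitespace() {
  case "$1" in
    *[[:space:]]*) return 0 ;;
    *) return 1 ;;
  esac
}

# Check one path with a trailing space and one without.
for path in "gs://mybucket/faruq/fastp_pe_manual /" "gs://mybucket/faruq/fastp_pe_manual/"; do
  if path_has_whitespace "$path"; then
    # Quote the path in the message so a trailing space is visible.
    echo "WARNING: path contains whitespace: '$path'" >&2
  fi
done
```

Only the first path should trigger the warning. Printing the path inside quotes is a deliberate choice here, since a trailing space is otherwise easy to miss.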

Please let us know what you find out.

Thanks!

from dsub.

faruqsandi avatar faruqsandi commented on August 27, 2024

Hello!
Thanks for the reply!

Yes, I agree that a warning makes sense, because this error (a typo) will probably only happen once in a blue moon. I also just realized that we can actually create a folder with trailing spaces on *nix. For example, mkdir "hello! there!" is possible, and we need to use cd hello\!\ \ \ \ \ \ \ \ \ \ \ \ there\!/ to get inside it.

So, while bash can run a script in which a path contains trailing whitespace, it is probably not possible in a GCP bucket. Anyway, I tried something that might be worth looking into.


OK, let me show you the success scenario with this whitespace_example.sh:

touch example.txt
mv example.txt $OUTPUT_DIR

using this dsub command (no double quotes, so the trailing whitespace doesn't matter):

dsub \
  --provider google-cls-v2 \
  --project xna-labs-uvaca-labs-workspace \
  --zones "us-central1-*" \
  --logging "gs://mybucket/faruqsandi_new/dsub_test/logging" \
  --output-recursive OUTPUT_DIR=gs://mybucket/faruqsandi_new/dsub_test  \
  --script ./whitespace_example.sh \
  --wait

the output of

gsutil ls -R  gs://mybucket/faruqsandi_new/

is

gs://mybucket/faruqsandi_new/:

gs://mybucket/faruqsandi_new/dsub_test/:
gs://mybucket/faruqsandi_new/dsub_test/example.txt

gs://mybucket/faruqsandi_new/dsub_test/logging/:
gs://mybucket/faruqsandi_new/dsub_test/logging/whitespace--bit--230318-125920-93-stderr.log
gs://mybucket/faruqsandi_new/dsub_test/logging/whitespace--bit--230318-125920-93-stdout.log
gs://mybucket/faruqsandi_new/dsub_test/logging/whitespace--bit--230318-125920-93.log

There is an example.txt, which is what we expected. I believe that using this parameter (double quotes, no trailing whitespace)

	--output-recursive OUTPUT_DIR="gs://mybucket/faruqsandi_new/dsub_test"  \

will yield the same thing.
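The quoting behavior behind this expectation can be checked locally without running dsub. The sketch below uses a hypothetical helper, `show_arg`, that prints its first argument between brackets so trailing whitespace becomes visible. It only shows what the shell hands to a command, not what dsub does with it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first argument inside brackets.
show_arg() { printf '[%s]\n' "$1"; }

# Unquoted, with extra spaces after the path: the shell drops them.
a=$(show_arg OUTPUT_DIR=gs://mybucket/faruqsandi_new/dsub_test  )
# Quoted, no trailing whitespace: the same argument as above.
b=$(show_arg "OUTPUT_DIR=gs://mybucket/faruqsandi_new/dsub_test")
# Quoted, with a trailing space: the space is part of the argument.
c=$(show_arg "OUTPUT_DIR=gs://mybucket/faruqsandi_new/dsub_test ")

echo "$a"
echo "$b"
echo "$c"
```

The first two lines printed should be identical, while the third keeps a space before the closing bracket.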


Let's move to the failing scenario, using this dsub command (double quotes and trailing whitespace):

dsub \
  --provider google-cls-v2 \
  --project xna-labs-uvaca-labs-workspace \
  --zones "us-central1-*" \
  --logging "gs://mybucket/faruqsandi/dsub_test/logging" \
  --output-recursive "OUTPUT_DIR=gs://mybucket/faruqsandi/dsub_test " \
  --script ./whitespace_example.sh \
  --wait

The log says success:

[
  {
    "job-name": "whitespace-example",
    "last-update": "2023-03-18 12:54:03.834733",
    "status-message": "Success",
    "job-id": "whitespace--bit--230318-125228-99",
    "user-id": "bit",
    "status": "SUCCESS",
    "status-detail": "Success",
    "create-time": "2023-03-18 12:52:32.074038",
    "start-time": "2023-03-18 12:52:48.026275",
    "end-time": "2023-03-18 12:54:03.834733",
    "internal-id": "projects/asdasdas/locations/us-central1/operations/adsadsa",
    "logging": "gs://mybucket/faruqsandi/dsub_test/logging/whitespace--bit--230318-125228-99.log",
    "labels": {},
    "envs": {},
    "inputs": {},
    "input-recursives": {},
    "outputs": {},
    "output-recursives": {
      "OUTPUT_DIR": "gs://mybucket/faruqsandi/dsub_test "
    },
    "mounts": {},
    "provider": "google-cls-v2",
    "provider-attributes": {
      "ssh": false,
      "block-external-network": null,
      "instance-name": "google-pipelines-worker-sdasdas",
      "zone": "us-central1-f",
      "regions": [],
      "zones": [
        "us-central1-a",
        "us-central1-b",
        "us-central1-c",
        "us-central1-f"
      ],
      "machine-type": "n1-standard-1",
      "preemptible": false,
      "boot-disk-size": 10,
      "network": "",
      "subnetwork": "",
      "use_private_address": false,
      "cpu_platform": "",
      "accelerators": [],
      "enable-stackdriver-monitoring": false,
      "service-account": "default",
      "disk-size": 200,
      "disk-type": "pd-standard",
      "volumes": []
    },
    "events": [
      {
        "name": "start",
        "start-time": "2023-03-18 05:52:48.026275+00:00"
      },
      {
        "name": "pulling-image",
        "start-time": "2023-03-18 05:53:42.370788+00:00"
      },
      {
        "name": "localizing-files",
        "start-time": "2023-03-18 05:53:53.424937+00:00"
      },
      {
        "name": "running-docker",
        "start-time": "2023-03-18 05:53:55.086473+00:00"
      },
      {
        "name": "delocalizing-files",
        "start-time": "2023-03-18 05:53:56.370611+00:00"
      },
      {
        "name": "ok",
        "start-time": "2023-03-18 05:54:03.834733+00:00"
      }
    ],
    "dsub-version": "v0-4-8",
    "script-name": "whitespace_example.sh",
    "script": "\ntouch example.txt\nmv example.txt $OUTPUT_DIR"
  }
]

However, when I run this command to see what is in the OUTPUT_DIR:

gsutil ls -R  gs://mybucket/faruqsandi/

the output is:

gs://mybucket/faruqsandi/dsub_test/:

gs://mybucket/faruqsandi/dsub_test/logging/:
gs://mybucket/faruqsandi/dsub_test/logging/whitespace--bit--230318-125228-99-stderr.log
gs://mybucket/faruqsandi/dsub_test/logging/whitespace--bit--230318-125228-99-stdout.log
gs://mybucket/faruqsandi/dsub_test/logging/whitespace--bit--230318-125228-99.log

There is no example.txt. Both stdout.log and stderr.log are empty. The content of the log is:

2023-03-18 05:53:53 INFO: gsutil -h Content-Type:text/plain  -mq cp /tmp/continuous_logging_action/output gs://mybucket/faruqsandi/dsub_test/logging/whitespace--bit--230318-125228-99.log
2023-03-18 05:53:53 INFO: mkdir -m 777 -p /mnt/data/output/gs/mybucket/faruqsandi/dsub_test /
2023-03-18 05:53:57 INFO: Delocalizing OUTPUT_DIR
2023-03-18 05:53:57 INFO: gsutil  -mq rsync -r /mnt/data/output/gs/mybucket/faruqsandi/dsub_test / gs://mybucket/faruqsandi/dsub_test /

The clue is probably in the last line of that log.


mbookman avatar mbookman commented on August 27, 2024

GCS does support whitespace in object paths.

The issue here is actually with the test whitespace_example.sh.

Rather than:

touch example.txt
mv example.txt $OUTPUT_DIR 

This should be:

touch example.txt
mv example.txt "$OUTPUT_DIR"

Otherwise, the "mv" command becomes:

mv example.txt /mnt/data/output/gs/mybucket/faruqsandi/dsub_test /

Instead of

mv example.txt "/mnt/data/output/gs/mybucket/faruqsandi/dsub_test /"

and so when dsub goes to rsync the output directory, that directory is empty.
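The difference between the two forms comes from shell word splitting, which can be reproduced locally. This is a minimal sketch: the path below is a stand-in under /tmp that mirrors the expanded path shown above, and `count_args` is a hypothetical helper that reports how many arguments it received.

```shell
#!/usr/bin/env bash
# Stand-in for the value dsub exports (note the space before the slash).
OUTPUT_DIR="/tmp/dsub_ws_demo/dsub_test /"

# Hypothetical helper: print the number of arguments received.
count_args() { echo "$#"; }

# Unquoted: the shell splits the value on the space into two words,
# ".../dsub_test" and "/".
unquoted_count=$(count_args $OUTPUT_DIR)
# Quoted: the value is passed as a single argument, space included.
quoted_count=$(count_args "$OUTPUT_DIR")

echo "unquoted: $unquoted_count argument(s)"   # prints "unquoted: 2 argument(s)"
echo "quoted:   $quoted_count argument(s)"     # prints "quoted:   1 argument(s)"
```

With the unquoted form, mv receives two destination-side words rather than one directory name, so nothing lands in the directory that dsub later syncs.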

With the output directory quoted, I do see example.txt showing up in my bucket:

$ gsutil ls -R "gs://mybucket/faruqsandi/dsub_test "
gs://mybucket/faruqsandi/dsub_test /:
gs://mybucket/faruqsandi/dsub_test /example.txt


faruqsandi avatar faruqsandi commented on August 27, 2024

