dvc-azure's Introduction

dvc-azure

Azure plugin for DVC

dvc-azure's Issues

pull/repro: filesystem-related commands crash with ImportError (cannot import name 'fsspec_loop' from 'fsspec.asyn') for dvc[azure]

Bug Report

Description

Working with dvc[azure] installed as of 22.01.2023 (dependencies equally fresh, in a generally fresh virtual environment) on a previously working codebase checked out from version control, executing DVC commands (such as pull/repro) results in the following error:
ERROR: unexpected error - azure is supported, but requires 'dvc-azure' to be installed: cannot import name 'fsspec_loop' from 'fsspec.asyn'

This issue appears to be caused by a recent change in the fsspec interface: a new fsspec version was released just three days ago (see the release history at https://pypi.org/project/fsspec/#history), and as far as I can tell DVC does not pin it to a specific release.
When fsspec is manually downgraded to the previous version (pip install fsspec==2022.11.0), DVC commands work as intended.
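
As a quick sanity check, the following sketch (my own, not part of DVC) tests whether the installed fsspec still exports the symbol that the dvc-azure stack tries to import:

# Hedged check: fsspec 2023.1.0 removed fsspec_loop from fsspec.asyn,
# the symbol whose absence triggers the ImportError above.
try:
    from fsspec.asyn import fsspec_loop  # noqa: F401  (present up to 2022.11.0)
    print("fsspec_loop is available; dvc-azure should import fine")
except ImportError:
    print("fsspec_loop is gone; pin fsspec==2022.11.0 as a temporary workaround")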

Reproduce

Run filesystem-related commands (e.g. pull/repro) with dvc[azure] and the most recent fsspec version (fsspec==2023.1.0) installed.

Expected

Commands "getting through"/following with their actual logic without the described error.

Environment information

Output of dvc doctor:

$ dvc doctor

DVC version: 2.37.0 (pip)
---------------------------------
Platform: Python 3.10.6 on Linux-5.15.0-56-generic-x86_64-with-glibc2.35
Subprojects:
        dvc_data = 0.28.4
        dvc_objects = 0.14.0
        dvc_render = 0.0.15
        dvc_task = 0.1.6
        dvclive = 1.3.2
        scmrepo = 0.1.4
Supports:
        azure (adlfs = 2023.1.0, knack = 0.10.1, azure-identity = 1.12.0),
        http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3)
Cache types: reflink, hardlink, symlink
Cache directory: xfs on /dev/sda1
Caches: local
Remotes: azure
Workspace directory: xfs on /dev/sda1
Repo: dvc, git

Additional Information (if any):

While I tested this on Azure specifically (and marked the title accordingly), other providers may well be affected too.
Besides the version shown in the dvc doctor output above, I tested dvc[azure] versions 2.8.3 and 2.42 (the latest on PyPI); the same error occurred in both, and on 2.8.3 even running bare dvc with no subcommand crashed.

fsspec update breaks dvc-azure

fsspec 2023.1.0 breaks dvc-azure with the following error:

ImportError: cannot import name 'fsspec_loop' from 'fsspec.asyn' (/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/site-packages/fsspec/asyn.py)

`dvc push`: gets stuck when using Azure CLI login

Bug Report

dvc push: gets stuck when using Azure CLI login

Description

Following this example:
https://dvc.org/doc/command-reference/remote/modify#example-some-azure-authentication-methods

I tried to log in with the az CLI, add the remote, and then push some files. The push gets stuck with:

2022-08-08 14:59:28,666 DEBUG: Preparing to transfer data from '/home/user/tests/dvc_data_registry/.dvc/cache' to 'dvc/cache'
2022-08-08 14:59:28,666 DEBUG: Preparing to collect status from 'dvc/cache'
2022-08-08 14:59:28,666 DEBUG: Collecting status from 'dvc/cache'
2022-08-08 14:59:28,667 DEBUG: Querying 1 oids via object_exists                                                               
  0% Querying remote cache|

It hangs somewhere in _indexed_dir_hashes in /home/user/envs/dvc/lib/python3.8/site-packages/dvc_data/status.py, when it tries to access the blob storage.

If you wait around five minutes, this shows up:

ERROR: unexpected error - DefaultAzureCredential failed to retrieve a token from the included credentials.                     
Attempted credentials:
        EnvironmentCredential: EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
Visit https://aka.ms/azsdk/python/identity/environmentcredential/troubleshoot to troubleshoot this issue.

It works when I use account_key authentication.
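
When the az CLI login path hangs like this, it can help to check outside DVC whether DefaultAzureCredential can obtain a storage token at all. A minimal sketch (my own, assuming azure-identity is installed):

from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
# Scope used for Azure Storage data-plane access.
token = credential.get_token("https://storage.azure.com/.default")
print("token acquired; expires at:", token.expires_on)

If this also hangs or fails with the same credential-chain error, the problem lies in the local credential setup rather than in DVC itself.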

Reproduce

  1. dvc init
  2. Copy dataset.zip to the directory
  3. dvc add dataset.zip
  4. dvc remote add -d myremote azure://mycontainer/object
  5. dvc remote modify myremote account_name 'myaccount'
  6. az login --use-device-code
  7. dvc push

Expected

Cache gets uploaded to blob storage.

Environment information

az CLI version (installed with pip):

{
  "azure-cli": "2.39.0",
  "azure-cli-core": "2.39.0",
  "azure-cli-telemetry": "1.0.6",
  "extensions": {
    "azure-devops": "0.23.0",
    "ml": "2.5.0"
  }
}

Output of dvc doctor:

DVC version: 2.17.0 (pip)
---------------------------------
Platform: Python 3.8.10 on Linux-5.8.0-63-generic-x86_64-with-glibc2.29
Supports:
        azure (adlfs = 2022.7.0, knack = 0.9.0, azure-identity = 1.10.0),
        webhdfs (fsspec = 2022.7.1),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.6.0),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.6.0)

Additional Information (if any):

Connection Timeout when calling DVC Push to Azure remote storage

Bug Report

Description

When trying to push tracked files to remote storage on Azure, I get an error where the connection times out. The process appears to hang while checking for the existence of objects on the remote storage.
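
To separate DVC from the network layer, a minimal connectivity probe with the same connection string may be useful (a sketch of mine; <connection_string> and container_name are placeholders):

from azure.storage.blob import BlobServiceClient

client = BlobServiceClient.from_connection_string("<connection_string>")
container = client.get_container_client("container_name")
# One round trip to the blob endpoint; if this also times out,
# the problem is connectivity (proxy/firewall), not DVC.
print("container reachable:", container.exists())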

Reproduce

  1. dvc init
  2. dvc remote add -d remoteregistry azure://container_name/path
  3. dvc add dataset.zip
  4. dvc remote modify remoteregistry --local connection_string <connection_string>
  5. dvc push

Output

2023-03-01 16:30:42,551 DEBUG: v2.45.1 (pip), CPython 3.9.16 on Windows-10-10.0.19044-SP0
2023-03-01 16:30:42,551 DEBUG: command: C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\Scripts\dvc push datasets/test_folder/cb54000282dd4e3891aa8057adc092ff.jpg.dvc -v
2023-03-01 16:30:44,192 DEBUG: Preparing to transfer data from 'C:\NLP_Project\autodocs_dataset_creation\.dvc\cache' to 'autodocs-classifiers-datasets/'
2023-03-01 16:30:44,192 DEBUG: Preparing to collect status from 'autodocs-classifiers-datasets/'
2023-03-01 16:30:44,192 DEBUG: Collecting status from 'autodocs-classifiers-datasets/'
2023-03-01 16:30:44,194 DEBUG: Querying 1 oids via object_exists
2023-03-01 16:33:26,160 ERROR: unexpected error - Connection timeout to host https://storage_account.blob.core.windows.net/autodocs-classifiers-datasets/39/6dfb4c4cbe20ce15cd7ad4a569dd95: Connection timeout to host https://storage_account.blob.core.windows.net/autodocs-classifiers-datasets/39/6dfb4c4cbe20ce15cd7ad4a569dd95:
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\aiohttp\connector.py", line 980, in _wrap_create_connection
    return await self._loop.create_connection(*args, **kwargs)  # type: ignore[return-value]  # noqa
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\asyncio\base_events.py", line 1050, in create_connection
    sock = await self._connect_sock(
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\asyncio\base_events.py", line 961, in _connect_sock
    await self.sock_connect(sock, address)
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\asyncio\selector_events.py", line 500, in sock_connect
    return await fut
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\aiohttp\client.py", line 536, in _request
    conn = await self._connector.connect(
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\aiohttp\connector.py", line 540, in connect
    proto = await self._create_connection(req, traces, timeout)
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\aiohttp\connector.py", line 901, in _create_connection
    _, proto = await self._create_direct_connection(req, traces, timeout)
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\aiohttp\connector.py", line 1175, in _create_direct_connection
    transp, proto = await self._wrap_create_connection(
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\aiohttp\connector.py", line 980, in _wrap_create_connection
    return await self._loop.create_connection(*args, **kwargs)  # type: ignore[return-value]  # noqa
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\async_timeout\__init__.py", line 129, in __aexit__
    self._do_exit(exc_type)
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\async_timeout\__init__.py", line 212, in _do_exit
    raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\pipeline\transport\_aiohttp.py", line 257, in send
    result = await self.session.request(  # type: ignore
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\aiohttp\client.py", line 540, in _request
    raise ServerTimeoutError(
aiohttp.client_exceptions.ServerTimeoutError: Connection timeout to host https://storage_account.blob.core.windows.net/autodocs-classifiers-datasets/39/6dfb4c4cbe20ce15cd7ad4a569dd95

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc\cli\__init__.py", line 210, in main
    ret = cmd.do_run()
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc\cli\command.py", line 26, in do_run
    return self.run()
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc\commands\data_sync.py", line 59, in run
    processed_files_count = self.repo.push(
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc\repo\__init__.py", line 58, in wrapper
    return f(repo, *args, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc\repo\push.py", line 89, in push
    result = self.cloud.push(
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc\data_cloud.py", line 154, in push
    return self.transfer(
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc\data_cloud.py", line 135, in transfer
    return transfer(src_odb, dest_odb, objs, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc_data\hashfile\transfer.py", line 203, in transfer
    status = compare_status(
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc_data\hashfile\status.py", line 178, in compare_status
    dest_exists, dest_missing = status(
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc_data\hashfile\status.py", line 149, in status
    odb.oids_exist(hashes, jobs=jobs, progress=pbar.callback)
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc_objects\db.py", line 412, in oids_exist
    return list(wrap_iter(remote_oids, callback))
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc_objects\db.py", line 36, in wrap_iter
    for index, item in enumerate(iterable, start=1):
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc_objects\db.py", line 358, in list_oids_exists
    in_remote = self.fs.exists(paths, batch_size=jobs)
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc_objects\fs\base.py", line 345, in exists
    return fut.result()
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\concurrent\futures\_base.py", line 446, in result
    return self.__get_result()
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\concurrent\futures\_base.py", line 391, in __get_result
    raise self._exception
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc_objects\executors.py", line 134, in batch_coros
    result = fut.result()
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\adlfs\spec.py", line 1410, in _exists
    if await bc.exists(version_id=version_id):
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\tracing\decorator_async.py", line 79, in wrapper_use_tracer
    return await func(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\storage\blob\aio\_blob_client_async.py", line 672, in exists
    await self._client.blob.get_properties(
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\tracing\decorator_async.py", line 79, in wrapper_use_tracer
    return await func(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\storage\blob\_generated\aio\operations\_blob_operations.py", line 473, in get_properties
    pipeline_response = await self._client._pipeline.run(  # type: ignore # pylint: disable=protected-access
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\pipeline\_base_async.py", line 200, in run
    return await first_node.send(pipeline_request)
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\pipeline\_base_async.py", line 68, in send
    response = await self.next.send(request)  # type: ignore
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\pipeline\_base_async.py", line 68, in send
    response = await self.next.send(request)  # type: ignore
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\pipeline\_base_async.py", line 68, in send
    response = await self.next.send(request)  # type: ignore
  [Previous line repeated 5 more times]
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\pipeline\policies\_redirect_async.py", line 62, in send
    response = await self.next.send(request)
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\pipeline\_base_async.py", line 68, in send
    response = await self.next.send(request)  # type: ignore
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\storage\blob\_shared\policies_async.py", line 137, in send
    raise err
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\storage\blob\_shared\policies_async.py", line 111, in send
    response = await self.next.send(request)
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\pipeline\_base_async.py", line 68, in send
    response = await self.next.send(request)  # type: ignore
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\storage\blob\_shared\policies_async.py", line 64, in send
    response = await self.next.send(request)
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\pipeline\_base_async.py", line 68, in send
    response = await self.next.send(request)  # type: ignore
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\pipeline\_base_async.py", line 68, in send
    response = await self.next.send(request)  # type: ignore
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\pipeline\_base_async.py", line 101, in send
    await self._sender.send(request.http_request, **request.context.options),
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\storage\blob\_shared\base_client_async.py", line 176, in send
    return await self._transport.send(request, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\pipeline\transport\_aiohttp.py", line 289, in send
    raise ServiceRequestError(err, error=err) from err
azure.core.exceptions.ServiceRequestError: Connection timeout to host https://<storage_account>.blob.core.windows.net/container_name/39/6dfb4c4cbe20ce15cd7ad4a569dd95

Expected

I expect the data to be pushed to the remote storage.

Environment information

Platform: Python 3.9.16 on Windows-10-10.0.19044-SP0

Subprojects:
dvc_data = 0.40.3
dvc_objects = 0.19.3
dvc_render = 0.2.0
dvc_task = 0.1.11
dvclive = 2.1.0
scmrepo = 0.1.11

Supports:
azure (adlfs = 2023.1.0, knack = 0.10.1, azure-identity = 1.12.0),
http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3)
Cache types: hardlink, symlink
Cache directory: NTFS on C:
Caches: local
Remotes: azure
Workspace directory: NTFS on C:
Repo: dvc, git

Additional Information (if any):

dvc connectivity with Azure Blob Storage

Linking to an existing closed bug without resolution: iterative/dvc#8309

I am experiencing the exact same issue with what seems like the same configuration. The storage account in question most definitely exists, and has been tested with Azure Storage Explorer using the same connection string.

Any suggestions would be appreciated.

Pin DVC version

Since the dvc version in dvc-azure isn't pinned, it is possible to end up with a different version of dvc even if you have pinned the dvc-azure version.
This can lead to issues when using dvc.

For example: dvc 3.* is available now. This version adds some extra fields to the config which an older dvc version can't handle, causing errors. We used dvc-azure in a new Python environment with dvc 3.*, pushed the data to the storage account, and tried to download the data in another Python environment. Both environments had the same dvc-azure version pinned, but since the second environment was created earlier, it still used dvc 2.* and couldn't handle the new fields.
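
Until the dependency is pinned upstream, an environment can guard against silent major-version drift with a check along these lines (my own sketch, not an official mechanism):

from importlib.metadata import version

dvc_version = version("dvc")
# Fail fast if this environment resolved an incompatible dvc major version
# (dvc 2.* cannot read the extra config fields written by dvc 3.*).
assert dvc_version.split(".")[0] == "3", f"expected dvc 3.*, got {dvc_version}"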

dvc push to Azure Blob storage is very slow

Bug Report

dvc push: very slow when pushing to an Azure Blob Storage remote

Description

When pushing larger files (~700 MB) to an Azure Blob Storage remote, I'm experiencing very slow speeds (3-4 min for a single file, i.e. ~4 MB/s). The same file takes around 10 s to upload (~70 MB/s) when using AzCopy (https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-blobs-upload).
Is this to be expected, or am I doing something wrong?
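
To narrow down where the time goes, one can time a raw upload through adlfs (the filesystem layer DVC uses for Azure) with the same connection string. A sketch of mine, with placeholder credentials:

import time
from adlfs import AzureBlobFileSystem

fs = AzureBlobFileSystem(connection_string="<connection_string>")
start = time.time()
fs.put_file("large_file.wav", "dvc/large_file.wav")  # container "dvc", as in the repro
print(f"adlfs upload took {time.time() - start:.1f}s")

If the raw adlfs upload is already slow, the bottleneck is below DVC; if it is fast, the overhead is in DVC's transfer layer (chunking, number of jobs, etc.).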

Reproduce

  1. dvc init
  2. dvc remote add -d myremote azure://dvc/
  3. dvc remote modify --local myremote connection_string 'BlobEndpoint=h...'
  4. add new large file (large_file.wav) to folder (~700 MB)
  5. dvc add large_file.wav
  6. dvc push

Expected

Fast upload speed ~ 70 MB/s.

Environment information

DVC Version: 3.1.0
Python Version: 3.8.10
OS: tried on macOS Ventura 13.1 and Ubuntu 20.04.6 LTS

Thank you very much for the help!

push: not updating anything on the remote

Bug Report

dvc push does not actually push newly changed files to the remote, even though it reports the changes as pushed.

Description

The remote is an Azure blob storage that has versioning enabled.

When I run dvc push, I do get the confirmation 1 file pushed, but in the end nothing has been pushed to the remote blob storage.

I can confirm this visually by browsing the files in the blob container (I have version_aware enabled and can see that the modified timestamps correspond to the old files), and also by trying dvc pull from another repo.
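
To verify independently of DVC whether any new blob versions actually landed, the versions can be listed through the Azure SDK (my own sketch; the connection string and container name are placeholders):

from azure.storage.blob import BlobServiceClient

client = BlobServiceClient.from_connection_string("<connection_string>")
container = client.get_container_client("my-blob")
# With versioning enabled, every successful upload creates a new version_id.
for blob in container.list_blobs(include=["versions"]):
    print(blob.name, blob.version_id, blob.last_modified)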

Reproduce

  1. Set remote to azure blob: dvc remote add -d my_azure azure://my-blob/
  2. dvc repro & dvc push
  3. Make changes
  4. dvc repro & dvc push again
  5. No changes are actually pushed to the remote though dvc.lock is updated accordingly

Expected

Files should be updated on remote.

Environment information

❯ dvc doctor
DVC version: 2.57.0 (pip)
-------------------------
Platform: Python 3.10.6 on Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Subprojects:
        dvc_data = 0.51.0
        dvc_objects = 0.22.0
        dvc_render = 0.5.2
        dvc_task = 0.2.1
        scmrepo = 1.0.3
Supports:
        azure (adlfs = 2023.4.0, knack = 0.10.1, azure-identity = 1.13.0),
        http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3)
Config:
        Global: /home/user/.config/dvc
        System: /etc/xdg/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sdb
Caches: local
Remotes: azure
Workspace directory: ext4 on /dev/sdb
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/32582e8b1552224ea25e5d697a41250a

asyncio issue "got Future <Future pending> attached to a different loop" possibly since https://github.com/iterative/dvc-objects/pull/180

We are using DVC in our Azure ML setup with Azure ML compute instances. Lately some of our data scientists have been experiencing strange errors with dvc which we cannot always reproduce. They seem to be related to the creation date of the compute instances (newly created instances have the problem, older ones don't, even though they use the same conda environment).
The error may be related to iterative/dvc-objects#180 (move threadpool usage out of odb and into fs).

Sample error output (from dvc status --cloud -v):

(<xxx-showcase-xxx-env>) azureuser@<xxx>:/mnt/batch/tasks/shared/LS_root/mounts/clusters/<xxx>/code/Users/<yyy>/<zzz>$ dvc status --cloud -v
2023-03-27 13:13:53,171 DEBUG: v2.51.0 (conda), CPython 3.8.16 on Linux-5.15.0-1031-azure-x86_64-with-glibc2.10
2023-03-27 13:13:53,171 DEBUG: command: /anaconda/envs/<xxx-showcase-xxx-env>/bin/dvc status --cloud -v
2023-03-27 13:13:56,481 DEBUG: Preparing to collect status from '<containerName>/<path>'
2023-03-27 13:13:56,482 DEBUG: Collecting status from '<containerName>/<path>'
2023-03-27 13:13:56,483 DEBUG: Querying 3 oids via object_exists                                                                                                                                                
2023-03-27 13:13:56,660 ERROR: unexpected error - Task <Task pending name='Task-3' coro=<AzureBlobFileSystem._exists() running at /anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/adlfs/spec.py:1410> cb=[_wait.<locals>._on_completion() at /anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/asyncio/tasks.py:518]> got Future <Future pending> attached to a different loop
Traceback (most recent call last):
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/dvc/cli/__init__.py", line 210, in main
    ret = cmd.do_run()
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/dvc/cli/command.py", line 26, in do_run
    return self.run()
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/dvc/commands/status.py", line 54, in run
    st = self.repo.status(
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/dvc/repo/__init__.py", line 67, in wrapper
    return f(repo, *args, **kwargs)
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/dvc/repo/status.py", line 121, in status
    return _cloud_status(
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/dvc/repo/status.py", line 96, in _cloud_status
    status_info = self.cloud.status(obj_ids, jobs, remote=remote)
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/dvc/data_cloud.py", line 213, in status
    return compare_status(
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/dvc_data/hashfile/status.py", line 178, in compare_status
    dest_exists, dest_missing = status(
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/dvc_data/hashfile/status.py", line 134, in status
    exists = hashes.intersection(
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/dvc_data/hashfile/status.py", line 42, in _indexed_dir_hashes
    indexed_dir_exists.update(hashes)
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/dvc_objects/db.py", line 351, in list_oids_exists
    in_remote = self.fs.exists(paths, batch_size=jobs)
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/dvc_objects/fs/base.py", line 337, in exists
    return fut.result()
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/dvc_objects/executors.py", line 132, in batch_coros
    result = fut.result()
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/adlfs/spec.py", line 1410, in _exists
    if await bc.exists(version_id=version_id):
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/azure/core/tracing/decorator_async.py", line 79, in wrapper_use_tracer
    return await func(*args, **kwargs)
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/azure/storage/blob/aio/_blob_client_async.py", line 672, in exists
    await self._client.blob.get_properties(
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/azure/core/tracing/decorator_async.py", line 79, in wrapper_use_tracer
    return await func(*args, **kwargs)
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/azure/storage/blob/_generated/aio/operations/_blob_operations.py", line 473, in get_properties
    pipeline_response = await self._client._pipeline.run(  # type: ignore # pylint: disable=protected-access
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/azure/core/pipeline/_base_async.py", line 200, in run
    return await first_node.send(pipeline_request)
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/azure/core/pipeline/_base_async.py", line 68, in send
    response = await self.next.send(request)  # type: ignore
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/azure/core/pipeline/_base_async.py", line 68, in send
    response = await self.next.send(request)  # type: ignore
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/azure/core/pipeline/_base_async.py", line 68, in send
    response = await self.next.send(request)  # type: ignore
  [Previous line repeated 3 more times]
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/azure/core/pipeline/policies/_authentication_async.py", line 82, in send
    await await_result(self.on_request, request)
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/azure/core/pipeline/_tools_async.py", line 37, in await_result
    return await result  # type: ignore
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/azure/core/pipeline/policies/_authentication_async.py", line 55, in on_request
    async with self._lock:
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/asyncio/locks.py", line 97, in __aenter__
    await self.acquire()
  File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/asyncio/locks.py", line 203, in acquire
    await fut
RuntimeError: Task <Task pending name='Task-3' coro=<AzureBlobFileSystem._exists() running at /anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/adlfs/spec.py:1410> cb=[_wait.<locals>._on_completion() at /anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/asyncio/tasks.py:518]> got Future <Future pending> attached to a different loop

2023-03-27 13:13:56,917 DEBUG: link type reflink is not available ([Errno 95] no more link types left to try out)
2023-03-27 13:13:56,918 DEBUG: Removing '/mnt/batch/tasks/shared/LS_root/mounts/clusters/<xxx>/code/Users/<yyy>/.LCq7gNtvSuquXHnxQBoqd6.tmp'
2023-03-27 13:13:56,942 DEBUG: link type hardlink is not available ([Errno 95] no more link types left to try out)
2023-03-27 13:13:56,943 DEBUG: Removing '/mnt/batch/tasks/shared/LS_root/mounts/clusters/<xxx>/code/Users/<yyy>/.LCq7gNtvSuquXHnxQBoqd6.tmp'
2023-03-27 13:13:56,978 DEBUG: Removing '/mnt/batch/tasks/shared/LS_root/mounts/clusters/<xxx>/code/Users/<yyy>/.LCq7gNtvSuquXHnxQBoqd6.tmp'
2023-03-27 13:13:56,998 DEBUG: Removing '/mnt/batch/tasks/shared/LS_root/mounts/clusters/<xxx>/code/Users/<yyy>/<zzz>/.dvc/cache/.7jDFtgav9UYJPWkFMvGVfq.tmp'
2023-03-27 13:13:57,084 DEBUG: Version info for developers:
DVC version: 2.51.0 (conda)
---------------------------
Platform: Python 3.8.16 on Linux-5.15.0-1031-azure-x86_64-with-glibc2.10
Subprojects:
        dvc_data = 0.44.1
        dvc_objects = 0.21.1
        dvc_render = 0.3.1
        dvc_task = 0.2.0
        scmrepo = 0.1.17
Supports:
        azure (adlfs = 2023.1.0, knack = 0.6.3, azure-identity = 1.12.0),
        http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3)
Cache types: symlink
Cache directory: cifs on //xxxxx.file.core.windows.net/yyyyy
Caches: local
Remotes: azure
Workspace directory: cifs on //xxxxx.file.core.windows.net/yyyyy
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/372c785fffbd7bee53ab6172b6125311

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2023-03-27 13:13:57,088 DEBUG: Analytics is enabled.
2023-03-27 13:13:57,197 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmp98p9vuh7']'
2023-03-27 13:13:57,199 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmp98p9vuh7']'

The following workarounds seem to be possible (a minimal reproduction of the error class follows the list):

  • Pinning dvc-objects to version 0.15.0 seems to solve the problem.
  • Using Python 3.10 instead of 3.8 (with the same dvc versions as mentioned above).
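
For reference, the error class itself is easy to trigger outside DVC: a coroutine that awaits a Future bound to a different event loop raises exactly this RuntimeError. A DVC-independent sketch:

import asyncio

async def main():
    other_loop = asyncio.new_event_loop()
    fut = other_loop.create_future()  # Future bound to other_loop
    await fut  # awaited on the loop created by asyncio.run()

try:
    asyncio.run(main())
except RuntimeError as err:
    print(err)  # "Task ... got Future <Future pending> attached to a different loop"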

push -r azure: aiohttp error

Bug Report

push -r azure: aiohttp error

Description

With DVC 2.8.1, I am not able to push the cache to remote storage on Azure Blob. I get aiohttp's ValueError: Cannot combine AUTHORIZATION header with AUTH argument or credentials encoded in URL with either of the auth methods.

I tried authenticating with a username via az login, as well as with an access key.
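
For context, aiohttp raises this ValueError whenever a request carries both an explicit Authorization header and credentials encoded in the URL (or an auth argument). A standalone sketch of the trigger (not DVC code; the URL and token are dummies):

import asyncio
import aiohttp

async def main():
    async with aiohttp.ClientSession() as session:
        # Credentials embedded in the URL plus an Authorization header:
        # aiohttp rejects the combination before any network traffic.
        await session.get(
            "https://user:pass@example.com/",
            headers={"Authorization": "Bearer dummy"},
        )

try:
    asyncio.run(main())
except ValueError as err:
    print(err)  # Cannot combine AUTHORIZATION header with AUTH argument ...

This suggests that somewhere in the adlfs/Azure SDK stack credentials ended up in the request URL while an Authorization header was also being set.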

Reproduce

  1. Install "dvc[all]" with python 3.9.7
  2. Produce some cache
  3. Log in with az login
  4. Add azure remote storage
  5. The error is there

Expected

Cache gets pushed to the remote without error

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.8.1 (pip)
---------------------------------
Platform: Python 3.9.7 on Linux-5.14.12-arch1-1-x86_64-with-glibc2.33
Supports:
        azure (adlfs = 2021.9.1, knack = 0.8.2, azure-identity = 1.7.0),
        gdrive (pydrive2 = 1.10.0),
        gs (gcsfs = 2021.10.0),
        hdfs (fsspec = 2021.10.0, pyarrow = 5.0.0),
        webhdfs (fsspec = 2021.10.0),
        http (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.6),
        s3 (s3fs = 2021.10.0, boto3 = 1.17.106),
        ssh (sshfs = 2021.9.0),
        oss (ossfs = 2021.8.0),
        webdav (webdav4 = 0.9.3),
        webdavs (webdav4 = 0.9.3)
Cache types: symlink
Cache directory: ext4 on /dev/sdc1
Caches: local
Remotes: local, azure
Workspace directory: ext4 on /dev/nvme0n1p1
Repo: dvc, git

Additional Information (if any):

Here is the full stacktrace:

Stacktrace
❯ dvc push -r azure -v
2021-10-20 12:05:28,463 DEBUG: Preparing to transfer data from '../../../storage/dvc' to 'azure://<link to container>'
2021-10-20 12:05:28,463 DEBUG: Preparing to collect status from 'azure://<link to container>'
2021-10-20 12:05:28,494 DEBUG: Collecting status from 'azure://<link to container>'
2021-10-20 12:05:28,495 DEBUG: Querying 2 hashes via object_exists
2021-10-20 12:05:28,517 ERROR: unexpected error - Cannot combine AUTHORIZATION header with AUTH argument or credentials encoded in URL
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.9/site-packages/dvc/main.py", line 55, in main
    ret = cmd.do_run()
  File "/home/user/.local/lib/python3.9/site-packages/dvc/command/base.py", line 45, in do_run
    return self.run()
  File "/home/user/.local/lib/python3.9/site-packages/dvc/command/data_sync.py", line 57, in run
    processed_files_count = self.repo.push(
  File "/home/user/.local/lib/python3.9/site-packages/dvc/repo/__init__.py", line 50, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/user/.local/lib/python3.9/site-packages/dvc/repo/push.py", line 48, in push
    pushed += self.cloud.push(obj_ids, jobs, remote=remote, odb=odb)
  File "/home/user/.local/lib/python3.9/site-packages/dvc/data_cloud.py", line 85, in push
    return transfer(
  File "/home/user/.local/lib/python3.9/site-packages/dvc/objects/transfer.py", line 153, in transfer
    status = compare_status(src, dest, obj_ids, check_deleted=False, **kwargs)
  File "/home/user/.local/lib/python3.9/site-packages/dvc/objects/status.py", line 160, in compare_status
    dest_exists, dest_missing = status(
  File "/home/user/.local/lib/python3.9/site-packages/dvc/objects/status.py", line 122, in status
    exists = hashes.intersection(
  File "/home/user/.local/lib/python3.9/site-packages/dvc/objects/status.py", line 48, in _indexed_dir_hashes
    dir_exists.update(odb.list_hashes_exists(dir_hashes - dir_exists))
  File "/home/user/.local/lib/python3.9/site-packages/dvc/objects/db/base.py", line 415, in list_hashes_exists
    ret = list(itertools.compress(hashes, in_remote))
  File "/usr/lib/python3.9/concurrent/futures/_base.py", line 608, in result_iterator
    yield fs.pop().result()
  File "/usr/lib/python3.9/concurrent/futures/_base.py", line 445, in result
    return self.__get_result()
  File "/usr/lib/python3.9/concurrent/futures/_base.py", line 390, in __get_result
    raise self._exception
  File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/user/.local/lib/python3.9/site-packages/dvc/objects/db/base.py", line 406, in exists_with_progress
    ret = self.fs.exists(path_info)
  File "/home/user/.local/lib/python3.9/site-packages/dvc/fs/fsspec_wrapper.py", line 136, in exists
    return self.fs.exists(self._with_bucket(path_info))
  File "/home/user/.local/lib/python3.9/site-packages/adlfs/spec.py", line 1350, in exists
    return sync(self.loop, self._exists, path)
  File "/home/user/.local/lib/python3.9/site-packages/fsspec/asyn.py", line 71, in sync
    raise return_result
  File "/home/user/.local/lib/python3.9/site-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
  File "/home/user/.local/lib/python3.9/site-packages/adlfs/spec.py", line 1372, in _exists
    if await bc.exists():
  File "/home/user/.local/lib/python3.9/site-packages/azure/core/tracing/decorator_async.py", line 74, in wrapper_use_tracer
    return await func(*args, **kwargs)
  File "/home/user/.local/lib/python3.9/site-packages/azure/storage/blob/aio/_blob_client_async.py", line 597, in exists
    await self._client.blob.get_properties(
  File "/home/user/.local/lib/python3.9/site-packages/azure/storage/blob/_generated/aio/operations/_blob_operations.py", line 394, in get_properties
    pipeline_response = await self._client._pipeline.run(request, stream=False, **kwargs)
  File "/home/user/.local/lib/python3.9/site-packages/azure/core/pipeline/_base_async.py", line 215, in run
    return await first_node.send(pipeline_request)
  File "/home/user/.local/lib/python3.9/site-packages/azure/core/pipeline/_base_async.py", line 83, in send
    response = await self.next.send(request)  # type: ignore
  File "/home/user/.local/lib/python3.9/site-packages/azure/core/pipeline/_base_async.py", line 83, in send
    response = await self.next.send(request)  # type: ignore
  File "/home/user/.local/lib/python3.9/site-packages/azure/core/pipeline/_base_async.py", line 83, in send
    response = await self.next.send(request)  # type: ignore
  [Previous line repeated 5 more times]
  File "/home/user/.local/lib/python3.9/site-packages/azure/core/pipeline/policies/_redirect_async.py", line 64, in send
    response = await self.next.send(request)
  File "/home/user/.local/lib/python3.9/site-packages/azure/core/pipeline/_base_async.py", line 83, in send
    response = await self.next.send(request)  # type: ignore
  File "/home/user/.local/lib/python3.9/site-packages/azure/storage/blob/_shared/policies_async.py", line 99, in send
    response = await self.next.send(request)
  File "/home/user/.local/lib/python3.9/site-packages/azure/core/pipeline/_base_async.py", line 83, in send
    response = await self.next.send(request)  # type: ignore
  File "/home/user/.local/lib/python3.9/site-packages/azure/storage/blob/_shared/policies_async.py", line 56, in send
    response = await self.next.send(request)
  File "/home/user/.local/lib/python3.9/site-packages/azure/core/pipeline/_base_async.py", line 83, in send
    response = await self.next.send(request)  # type: ignore
  File "/home/user/.local/lib/python3.9/site-packages/azure/core/pipeline/_base_async.py", line 83, in send
    response = await self.next.send(request)  # type: ignore
  File "/home/user/.local/lib/python3.9/site-packages/azure/core/pipeline/_base_async.py", line 116, in send
    await self._sender.send(request.http_request, **request.context.options),
  File "/home/user/.local/lib/python3.9/site-packages/azure/storage/blob/_shared/base_client_async.py", line 180, in send
    return await self._transport.send(request, **kwargs)
  File "/home/user/.local/lib/python3.9/site-packages/azure/core/pipeline/transport/_aiohttp.py", line 231, in send
    result = await self.session.request(    # type: ignore
  File "/home/user/.local/lib/python3.9/site-packages/aiohttp/client.py", line 468, in _request
    raise ValueError(
ValueError: Cannot combine AUTHORIZATION header with AUTH argument or credentials encoded in URL
------------------------------------------------------------
2021-10-20 12:05:28,949 DEBUG: Version info for developers:
DVC version: 2.8.1 (pip)
---------------------------------
Platform: Python 3.9.7 on Linux-5.14.12-arch1-1-x86_64-with-glibc2.33
Supports:
        azure (adlfs = 2021.9.1, knack = 0.8.2, azure-identity = 1.7.0),
        gdrive (pydrive2 = 1.10.0),
        gs (gcsfs = 2021.10.0),
        hdfs (fsspec = 2021.10.0, pyarrow = 5.0.0),
        webhdfs (fsspec = 2021.10.0),
        http (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.6),
        s3 (s3fs = 2021.10.0, boto3 = 1.17.106),
        ssh (sshfs = 2021.9.0),
        oss (ossfs = 2021.8.0),
        webdav (webdav4 = 0.9.3),
        webdavs (webdav4 = 0.9.3)
Cache types: symlink
Cache directory: ext4 on /dev/sdc1
Caches: local
Remotes: local, azure
Workspace directory: ext4 on /dev/nvme0n1p1
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2021-10-20 12:05:28,950 DEBUG: Analytics is enabled.
2021-10-20 12:05:29,039 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmprl9ltfue']'
2021-10-20 12:05:29,042 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmprl9ltfue']'

Deleting a blob file does not work

When we run dvc gc with the -c argument, the file is expected to be removed from the cloud as well.
However, the remove on dvc_azure.AzureFileSystem does not seem to work; only the local file is removed.
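
To confirm whether the object is in fact still on the remote after dvc gc -c, the blob can be queried directly (a sketch; the connection string, container, and oid path are hypothetical placeholders following DVC's two-character prefix layout):

from azure.storage.blob import BlobServiceClient

client = BlobServiceClient.from_connection_string("<connection_string>")
# DVC stores objects under <prefix>/<first two hash chars>/<rest of hash>.
blob = client.get_blob_client("mycontainer", "path/4a/bc123...")
print("still on remote:", blob.exists())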

push: Azure non-versioned remote not providing error message about not supporting versioning

Bug Report

push: Azure non-versioned remote not providing error message about not supporting versioning

Description

I have been trying out DVC for the first time, using an Azure remote. I mistakenly set version_aware true, but my Azure remote does not have versioning enabled. When trying dvc push, I got unexpected error rather than failed to push data to the cloud - remote 'myremote' does not support versioning. This cost me a lot of debugging time, so an informative error would have been better.

I can see that an informative error should have been produced by the test_versioning function, but this didn't happen because the function raised on the line info = dest_fs.info(dest_path) instead of producing the desired informative error message.

The full error is below.

dvc push -v
2023-03-16 14:37:19,723 DEBUG: v2.50.0 (pip), CPython 3.10.4 on Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
2023-03-16 14:37:19,723 DEBUG: command: /home/glenn/Projects/dvc/.venv/bin/dvc push -v
2023-03-16 14:37:19,955 DEBUG: Pushing version-aware files to 'dvs-test/dvc'
2023-03-16 14:37:20,227 DEBUG: '['dvs-test/dvc/data/data2.csv']' file already exists, skipping                                                                    
2023-03-16 14:37:20,548 ERROR: unexpected error                                                                                                                   
Traceback (most recent call last):
  File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/adlfs/spec.py", line 705, in _ls_blobs
    async for next_blob in blobs:
  File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/azure/core/async_paging.py", line 149, in __anext__
    return await self.__anext__()
  File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/azure/core/async_paging.py", line 152, in __anext__
    self._page = await self._page_iterator.__anext__()
  File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/azure/core/async_paging.py", line 96, in __anext__
    self._response = await self._get_next(self.continuation_token)
  File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/azure/storage/blob/aio/_list_blobs_helper.py", line 90, in _get_next_cb
    process_storage_error(error)
  File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/azure/storage/blob/_shared/response_handlers.py", line 189, in process_storage_error
    exec("raise error from None")   # pylint: disable=exec-used # nosec
  File "<string>", line 1, in <module>
  File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/azure/storage/blob/aio/_list_blobs_helper.py", line 83, in _get_next_cb
    return await self._command(
  File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/azure/core/tracing/decorator_async.py", line 79, in wrapper_use_tracer
    return await func(*args, **kwargs)
  File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/azure/storage/blob/_generated/aio/operations/_container_operations.py", line 1785, in list_blob_hierarchy_segment
    map_error(status_code=response.status_code, response=response, error_map=error_map)
  File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/azure/core/exceptions.py", line 109, in map_error
    raise error
azure.core.exceptions.ResourceNotFoundError: The specified container does not exist.
RequestId:887a78c3-c01e-0077-7fa7-570344000000
Time:2023-03-16T01:37:20.5036327Z
ErrorCode:ContainerNotFound
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>ContainerNotFound</Code><Message>The specified container does not exist.
RequestId:887a78c3-c01e-0077-7fa7-570344000000
Time:2023-03-16T01:37:20.5036327Z</Message></Error>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/dvc/cli/__init__.py", line 210, in main
    ret = cmd.do_run()
  File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/dvc/cli/command.py", line 26, in do_run
    return self.run()
  File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/dvc/commands/data_sync.py", line 60, in run
    processed_files_count = self.repo.push(
  File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/dvc/repo/__init__.py", line 67, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/dvc/repo/push.py", line 50, in push
    pushed += _push_worktree(
  File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/dvc/repo/push.py", line 119, in _push_worktree
    return push_worktree(repo, remote, targets=targets, jobs=jobs, **kwargs)
  File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/dvc/repo/worktree.py", line 176, in push_worktree
    pushed += checkout(
  File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/dvc_data/index/checkout.py", line 102, in checkout
    entry.meta = test_versioning(
  File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/dvc_data/index/checkout.py", line 38, in test_versioning
    info = dest_fs.info(dest_path)
  File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 467, in info
    return self.fs.info(path, **kwargs)
  File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/adlfs/spec.py", line 506, in info
    return sync(
  File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/fsspec/asyn.py", line 100, in sync
    raise return_result
  File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/fsspec/asyn.py", line 55, in _runner
    result[0] = await coro
  File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/adlfs/spec.py", line 561, in _info
    out = await self._ls(
  File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/adlfs/spec.py", line 812, in _ls
    output = await self._ls_blobs(
  File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/adlfs/spec.py", line 738, in _ls_blobs
    raise FileNotFoundError
FileNotFoundError

2023-03-16 14:37:20,623 DEBUG: link type reflink is not available ([Errno 95] no more link types left to try out)
2023-03-16 14:37:20,623 DEBUG: Removing '/home/glenn/Projects/.GKTi22QwBRBYvnZLnJQW8Y.tmp'
2023-03-16 14:37:20,623 DEBUG: Removing '/home/glenn/Projects/.GKTi22QwBRBYvnZLnJQW8Y.tmp'
2023-03-16 14:37:20,624 DEBUG: Removing '/home/glenn/Projects/.GKTi22QwBRBYvnZLnJQW8Y.tmp'
2023-03-16 14:37:20,624 DEBUG: Removing '/home/glenn/Projects/dvc/.dvc/cache/.oR9jKxwCthEwWrBWfHWEfp.tmp'
2023-03-16 14:37:20,631 DEBUG: Version info for developers:
DVC version: 2.50.0 (pip)
-------------------------
Platform: Python 3.10.4 on Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
Subprojects:
        dvc_data = 0.44.1
        dvc_objects = 0.21.1
        dvc_render = 0.2.0
        dvc_task = 0.2.0
        scmrepo = 0.1.15
Supports:
        azure (adlfs = 2023.1.0, knack = 0.10.1, azure-identity = 1.12.0),
        http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sdc
Caches: local
Remotes: azure
Workspace directory: ext4 on /dev/sdc
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/2b5af32cb49e0e43fcdab0056793557a

Reproduce

  1. dvc init
  2. Set up a non-versioned Azure storage account with hierarchical namespace enabled
  3. dvc remote add -d myremote azure://mycontainer/path
  4. dvc remote modify myremote version_aware true
  5. dvc remote modify myremote --local account_name <storage_account_name>
  6. dvc remote modify --local myremote connection_string <connection_string>
  7. create dummy file: data/data2.csv
  8. dvc add data/data2.csv
  9. dvc push

Expected

dvc push                          
  0% Pushing to remote 'myremote'  | dvs-test/dvc/data/data2.csv                                                                                                                                        
ERROR: while uploading 'dvs-test/dvc/data/data2.csv', support for versioning could not be detected                                                                
ERROR: failed to push data to the cloud - remote 'myremote' does not support versioning  

Environment information

Output of dvc doctor:

DVC version: 2.50.0 (pip)
-------------------------
Platform: Python 3.10.4 on Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
Subprojects:
        dvc_data = 0.44.1
        dvc_objects = 0.21.1
        dvc_render = 0.2.0
        dvc_task = 0.2.0
        scmrepo = 0.1.15
Supports:
        azure (adlfs = 2023.1.0, knack = 0.10.1, azure-identity = 1.12.0),
        http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sdc
Caches: local
Remotes: azure
Workspace directory: ext4 on /dev/sdc
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/2b5af32cb49e0e43fcdab0056793557a

Additional Information (if any):

The data file I was trying to push was actually uploaded to Azure during this process. Once it was uploaded, I was able to delete the data file and cache locally and use dvc pull to successfully pull the data.

As seen in the stack trace, I was also getting the error The specified container does not exist, even though I can confirm the container does exist. When setting version_aware false, dvc push worked as expected.
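
A quick way to check, independent of DVC, whether a storage account actually has blob versioning enabled is to upload a tiny probe blob and inspect the returned version id (my own sketch; names are placeholders):

from azure.storage.blob import BlobServiceClient

client = BlobServiceClient.from_connection_string("<connection_string>")
blob = client.get_blob_client("mycontainer", "dvc-versioning-probe")
result = blob.upload_blob(b"probe", overwrite=True)
# With versioning enabled the service returns an x-ms-version-id header,
# surfaced here as "version_id"; None means versioning is off.
print("version_id:", result.get("version_id"))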
