dvc-azure
Azure plugin for DVC
License: Apache License 2.0
It includes this fix: fsspec/adlfs#441
We need to coordinate this with the upcoming dvc-objects==4.0
release, since that release will remove the workaround in the callback for this.
Working with dvc[azure], installed as of 2023-01-22 (dependencies likewise, in a generally fresh virtual environment), on a previously working codebase downloaded from version control, executing DVC commands (such as pull/repro) results in the following error being thrown:
ERROR: unexpected error - azure is supported, but requires 'dvc-azure' to be installed: cannot import name 'fsspec_loop' from 'fsspec.asyn'
This issue appears to be caused by recent changes in the interface of fsspec; its new version was released just three days ago according to the release history (https://pypi.org/project/fsspec/#history), and as far as I can tell DVC does not pin it to a specific release.
When fsspec is manually downgraded to a previous version (pip install fsspec==2022.11.0), DVC commands work as intended.
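As a stopgap until the fsspec constraint is sorted out upstream, the downgrade above can be made reproducible by installing both packages in one resolver run; this is a sketch, and the verification one-liner simply mirrors the import that fails with fsspec 2023.1.0:

```shell
# Recreate the environment with a known-good fsspec pin
# (2022.11.0 predates the removal of fsspec_loop from fsspec.asyn).
pip install "dvc[azure]" "fsspec==2022.11.0"

# Verify that the symbol dvc-azure needs is importable again.
python -c "from fsspec.asyn import fsspec_loop; print('fsspec OK')"
```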
Run filesystem-related commands (e.g. pull/repro) with dvc[azure] and the most recent fsspec version (fsspec==2023.1.0) installed.
Commands proceed with their actual logic, without the described error.
Output of dvc doctor:
$ dvc doctor
DVC version: 2.37.0 (pip)
---------------------------------
Platform: Python 3.10.6 on Linux-5.15.0-56-generic-x86_64-with-glibc2.35
Subprojects:
dvc_data = 0.28.4
dvc_objects = 0.14.0
dvc_render = 0.0.15
dvc_task = 0.1.6
dvclive = 1.3.2
scmrepo = 0.1.4
Supports:
azure (adlfs = 2023.1.0, knack = 0.10.1, azure-identity = 1.12.0),
http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3)
Cache types: reflink, hardlink, symlink
Cache directory: xfs on /dev/sda1
Caches: local
Remotes: azure
Workspace directory: xfs on /dev/sda1
Repo: dvc, git
Additional Information (if any):
While I tested it on Azure specifically and marked the title as such, other providers may well be affected too.
Other than the version from the dvc doctor output above, I tested dvc[azure] versions 2.8.3 and 2.42 (latest on PyPI); the same error occurred in both, and with 2.8.3 it even crashed when running just dvc without any subcommand.
fsspec 2023.1.0 breaks dvc-azure with the following error:
ImportError: cannot import name 'fsspec_loop' from 'fsspec.asyn' (/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/site-packages/fsspec/asyn.py)
dvc push: gets stuck when using Azure CLI login
Following this example:
https://dvc.org/doc/command-reference/remote/modify#example-some-azure-authentication-methods
I tried to log in with the az CLI, add the remotes, and then push some files. This gets stuck with:
2022-08-08 14:59:28,666 DEBUG: Preparing to transfer data from '/home/user/tests/dvc_data_registry/.dvc/cache' to 'dvc/cache'
2022-08-08 14:59:28,666 DEBUG: Preparing to collect status from 'dvc/cache'
2022-08-08 14:59:28,666 DEBUG: Collecting status from 'dvc/cache'
2022-08-08 14:59:28,667 DEBUG: Querying 1 oids via object_exists
0% Querying remote cache|
somewhere in _indexed_dir_hashes in /home/user/envs/dvc/lib/python3.8/site-packages/dvc_data/status.py, when it tries to access the blob storage.
If you wait around five minutes, this shows up:
ERROR: unexpected error - DefaultAzureCredential failed to retrieve a token from the included credentials.
Attempted credentials:
EnvironmentCredential: EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
Visit https://aka.ms/azsdk/python/identity/environmentcredential/troubleshoot to troubleshoot this issue.
It works when I use account_key authentication.
Cache gets uploaded to blob storage.
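For reference, this is roughly the account_key configuration that worked for me; the remote name, storage account, and key are placeholders, and --local keeps the secret out of the committed config:

```shell
# Point the remote at the storage account and supply the key locally only.
dvc remote modify my_azure account_name mystorageaccount
dvc remote modify --local my_azure account_key 'BASE64KEYHERE=='

# Push with -v to confirm the transfer actually proceeds.
dvc push -v
```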
az cli version (installed with pip):
{
"azure-cli": "2.39.0",
"azure-cli-core": "2.39.0",
"azure-cli-telemetry": "1.0.6",
"extensions": {
"azure-devops": "0.23.0",
"ml": "2.5.0"
}
}
Output of dvc doctor:
DVC version: 2.17.0 (pip)
---------------------------------
Platform: Python 3.8.10 on Linux-5.8.0-63-generic-x86_64-with-glibc2.29
Supports:
azure (adlfs = 2022.7.0, knack = 0.9.0, azure-identity = 1.10.0),
webhdfs (fsspec = 2022.7.1),
http (aiohttp = 3.8.1, aiohttp-retry = 2.6.0),
https (aiohttp = 3.8.1, aiohttp-retry = 2.6.0)
Additional Information (if any):
The newest release of dvc-azure (3.1.0) is currently not available on conda-forge:
https://anaconda.org/conda-forge/dvc-azure/files
When trying to push tracked files to remote storage on Azure, I get an error where the connection times out. The process appears to hang while checking for the existence of objects on the remote storage.
2023-03-01 16:30:42,551 DEBUG: v2.45.1 (pip), CPython 3.9.16 on Windows-10-10.0.19044-SP0
2023-03-01 16:30:42,551 DEBUG: command: C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\Scripts\dvc push datasets/test_folder/cb54000282dd4e3891aa8057adc092ff.jpg.dvc -v
2023-03-01 16:30:44,192 DEBUG: Preparing to transfer data from 'C:\NLP_Project\autodocs_dataset_creation\.dvc\cache' to 'autodocs-classifiers-datasets/'
2023-03-01 16:30:44,192 DEBUG: Preparing to collect status from 'autodocs-classifiers-datasets/'
2023-03-01 16:30:44,192 DEBUG: Collecting status from 'autodocs-classifiers-datasets/'
2023-03-01 16:30:44,194 DEBUG: Querying 1 oids via object_exists
2023-03-01 16:33:26,160 ERROR: unexpected error - Connection timeout to host https://storage_account.blob.core.windows.net/autodocs-classifiers-datasets/39/6dfb4c4cbe20ce15cd7ad4a569dd95: Connection timeout to host https://storage_account.blob.core.windows.net/autodocs-classifiers-datasets/39/6dfb4c4cbe20ce15cd7ad4a569dd95:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\aiohttp\connector.py", line 980, in _wrap_create_connection
return await self._loop.create_connection(*args, **kwargs) # type: ignore[return-value] # noqa
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\asyncio\base_events.py", line 1050, in create_connection
sock = await self._connect_sock(
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\asyncio\base_events.py", line 961, in _connect_sock
await self.sock_connect(sock, address)
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\asyncio\selector_events.py", line 500, in sock_connect
return await fut
asyncio.exceptions.CancelledError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\aiohttp\client.py", line 536, in _request
conn = await self._connector.connect(
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\aiohttp\connector.py", line 540, in connect
proto = await self._create_connection(req, traces, timeout)
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\aiohttp\connector.py", line 901, in _create_connection
_, proto = await self._create_direct_connection(req, traces, timeout)
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\aiohttp\connector.py", line 1175, in _create_direct_connection
transp, proto = await self._wrap_create_connection(
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\aiohttp\connector.py", line 980, in _wrap_create_connection
return await self._loop.create_connection(*args, **kwargs) # type: ignore[return-value] # noqa
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\async_timeout\__init__.py", line 129, in __aexit__
self._do_exit(exc_type)
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\async_timeout\__init__.py", line 212, in _do_exit
raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\pipeline\transport\_aiohttp.py", line 257, in send
result = await self.session.request( # type: ignore
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\aiohttp\client.py", line 540, in _request
raise ServerTimeoutError(
aiohttp.client_exceptions.ServerTimeoutError: Connection timeout to host https://storage_account.blob.core.windows.net/autodocs-classifiers-datasets/39/6dfb4c4cbe20ce15cd7ad4a569dd95
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc\cli\__init__.py", line 210, in main
ret = cmd.do_run()
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc\cli\command.py", line 26, in do_run
return self.run()
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc\commands\data_sync.py", line 59, in run
processed_files_count = self.repo.push(
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc\repo\__init__.py", line 58, in wrapper
return f(repo, *args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc\repo\push.py", line 89, in push
result = self.cloud.push(
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc\data_cloud.py", line 154, in push
return self.transfer(
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc\data_cloud.py", line 135, in transfer
return transfer(src_odb, dest_odb, objs, **kwargs)
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc_data\hashfile\transfer.py", line 203, in transfer
status = compare_status(
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc_data\hashfile\status.py", line 178, in compare_status
dest_exists, dest_missing = status(
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc_data\hashfile\status.py", line 149, in status
odb.oids_exist(hashes, jobs=jobs, progress=pbar.callback)
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc_objects\db.py", line 412, in oids_exist
return list(wrap_iter(remote_oids, callback))
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc_objects\db.py", line 36, in wrap_iter
for index, item in enumerate(iterable, start=1):
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc_objects\db.py", line 358, in list_oids_exists
in_remote = self.fs.exists(paths, batch_size=jobs)
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc_objects\fs\base.py", line 345, in exists
return fut.result()
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\concurrent\futures\_base.py", line 446, in result
return self.__get_result()
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\concurrent\futures\_base.py", line 391, in __get_result
raise self._exception
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\dvc_objects\executors.py", line 134, in batch_coros
result = fut.result()
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\adlfs\spec.py", line 1410, in _exists
if await bc.exists(version_id=version_id):
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\tracing\decorator_async.py", line 79, in wrapper_use_tracer
return await func(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\storage\blob\aio\_blob_client_async.py", line 672, in exists
await self._client.blob.get_properties(
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\tracing\decorator_async.py", line 79, in wrapper_use_tracer
return await func(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\storage\blob\_generated\aio\operations\_blob_operations.py", line 473, in get_properties
pipeline_response = await self._client._pipeline.run( # type: ignore # pylint: disable=protected-access
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\pipeline\_base_async.py", line 200, in run
return await first_node.send(pipeline_request)
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\pipeline\_base_async.py", line 68, in send
response = await self.next.send(request) # type: ignore
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\pipeline\_base_async.py", line 68, in send
response = await self.next.send(request) # type: ignore
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\pipeline\_base_async.py", line 68, in send
response = await self.next.send(request) # type: ignore
[Previous line repeated 5 more times]
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\pipeline\policies\_redirect_async.py", line 62, in send
response = await self.next.send(request)
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\pipeline\_base_async.py", line 68, in send
response = await self.next.send(request) # type: ignore
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\storage\blob\_shared\policies_async.py", line 137, in send
raise err
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\storage\blob\_shared\policies_async.py", line 111, in send
response = await self.next.send(request)
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\pipeline\_base_async.py", line 68, in send
response = await self.next.send(request) # type: ignore
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\storage\blob\_shared\policies_async.py", line 64, in send
response = await self.next.send(request)
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\pipeline\_base_async.py", line 68, in send
response = await self.next.send(request) # type: ignore
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\pipeline\_base_async.py", line 68, in send
response = await self.next.send(request) # type: ignore
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\pipeline\_base_async.py", line 101, in send
await self._sender.send(request.http_request, **request.context.options),
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\storage\blob\_shared\base_client_async.py", line 176, in send
return await self._transport.send(request, **kwargs)
File "C:\ProgramData\Anaconda3\envs\autodocs_dataset_creation\lib\site-packages\azure\core\pipeline\transport\_aiohttp.py", line 289, in send
raise ServiceRequestError(err, error=err) from err
azure.core.exceptions.ServiceRequestError: Connection timeout to host https://<storage_account>.blob.core.windows.net/container_name/39/6dfb4c4cbe20ce15cd7ad4a569dd95
I expect the data to be pushed to the remote storage.
Platform: Python 3.9.16 on Windows-10-10.0.19044-SP0
Subprojects:
dvc_data = 0.40.3
dvc_objects = 0.19.3
dvc_render = 0.2.0
dvc_task = 0.1.11
dvclive = 2.1.0
scmrepo = 0.1.11
Supports:
azure (adlfs = 2023.1.0, knack = 0.10.1, azure-identity = 1.12.0),
http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3)
Cache types: hardlink, symlink
Cache directory: NTFS on C:
Caches: local
Remotes: azure
Workspace directory: NTFS on C:
Repo: dvc, git
Additional Information (if any):
Linking to an existing closed bug without resolution: iterative/dvc#8309
I am experiencing the exact same issue with what seems like the same configuration. The storage account in question most definitely exists, and has been tested with azure storage explorer using the same connection string.
Any suggestions would be appreciated.
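One way to separate a DVC problem from a network or credential problem is to probe the same blob path with the Azure CLI directly; the account, container, and blob names below are placeholders lifted from the log above:

```shell
# Query the exact object DVC was checking, with the signed-in identity.
# A clean true/false answer rules out connectivity; a hang points at the network.
az storage blob exists \
  --account-name storage_account \
  --container-name autodocs-classifiers-datasets \
  --name 39/6dfb4c4cbe20ce15cd7ad4a569dd95 \
  --auth-mode login
```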
Since the dvc version in dvc-azure isn't pinned, it is possible that you are using a different version of dvc even if you have pinned the dvc-azure version.
This may lead to issues when using dvc.
For example: dvc version 3.* is available now. With this version, some extra fields were added to the config which an older dvc version can't handle, causing errors. We used dvc-azure in a new Python environment with dvc version 3.*, pushed the data to the storage account, and tried to download the data in another Python environment. Both environments had pinned the same version of dvc-azure, but since the second environment was created earlier, it still used dvc 2.* and couldn't handle the new fields.
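A minimal sketch of the mitigation implied above: pin dvc itself alongside dvc-azure in every environment, so both resolve identically; the version numbers here are illustrative, not a recommendation:

```shell
# dvc-azure alone does not constrain dvc's major version, so pin both.
pip install "dvc~=3.0" "dvc-azure==3.1.0"

# Confirm both environments agree before pushing/pulling shared data.
dvc version
pip show dvc dvc-azure
```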
When pushing larger files (~700 MB) to an Azure Blob Storage remote, I'm experiencing very slow speeds (3-4 min for a single file = ~4 MB/s). The same file takes around ~10 s to upload (~70 MB/s) when using AzCopy (https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-blobs-upload).
Is this to be expected, or am I doing something wrong?
Fast upload speed ~ 70 MB/s.
DVC Version: 3.1.0
Python Version: 3.8.10
OS: tried on MacOS Ventura 13.1 and Ubuntu 20.04.6 LTS
Thank you very much for the help!
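Not a definitive answer to the speed gap above, but one knob worth ruling out before blaming the transport: DVC parallelizes transfers across jobs, and the default job count depends on the DVC version and remote type. The remote name below is a placeholder:

```shell
# One-off: push with more concurrent transfer jobs.
dvc push --jobs 16

# Or persist the setting on the remote itself.
dvc remote modify myremote jobs 16
```

Note that this mainly helps pushes of many files; a single large file is still one transfer, which is where AzCopy's chunked uploads tend to win.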
dvc push is not actually pushing newly changed files to the remote, even though it confirms the changes.
The remote is an Azure blob storage that has versioning enabled.
When I do dvc push I get the confirmation "1 file pushed", but in the end nothing has been pushed to the remote blob storage.
I can confirm this visually by browsing the files in the blob container (I have version_aware enabled and can see that the modified timestamp corresponds to the old files), but also by trying dvc pull in another repo.
dvc remote add -d my_azure azure://my-blob/
dvc repro & dvc push
dvc repro & dvc push again
dvc.lock is updated accordingly.
Files should be updated on the remote.
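For context, a version-aware Azure remote is configured roughly as follows; the remote name and container are the hypothetical ones from the steps above:

```shell
# Version-aware remotes store files under their original names and rely on
# blob versioning, so the container must have versioning enabled on Azure.
dvc remote add -d my_azure azure://my-blob/
dvc remote modify my_azure version_aware true

# -v shows whether any upload actually happens on the second push.
dvc push -v
```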
❯ dvc doctor
DVC version: 2.57.0 (pip)
-------------------------
Platform: Python 3.10.6 on Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Subprojects:
dvc_data = 0.51.0
dvc_objects = 0.22.0
dvc_render = 0.5.2
dvc_task = 0.2.1
scmrepo = 1.0.3
Supports:
azure (adlfs = 2023.4.0, knack = 0.10.1, azure-identity = 1.13.0),
http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3)
Config:
Global: /home/user/.config/dvc
System: /etc/xdg/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sdb
Caches: local
Remotes: azure
Workspace directory: ext4 on /dev/sdb
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/32582e8b1552224ea25e5d697a41250a
We are using DVC in our Azure ML setup with Azure ML Compute Instances. Lately some of our data scientists have been experiencing strange errors with dvc which we cannot always reproduce. They seem to be related to the creation date of the compute instances (newly created instances seem to have the problem, older ones don't, even though they use the same conda environment).
The error may be related to iterative/dvc-objects#180 (move threadpool usage out of odb and into fs)
Full output of dvc status --cloud -v:
(<xxx-showcase-xxx-env>) azureuser@<xxx>:/mnt/batch/tasks/shared/LS_root/mounts/clusters/<xxx>/code/Users/<yyy>/<zzz>$ dvc status --cloud -v
2023-03-27 13:13:53,171 DEBUG: v2.51.0 (conda), CPython 3.8.16 on Linux-5.15.0-1031-azure-x86_64-with-glibc2.10
2023-03-27 13:13:53,171 DEBUG: command: /anaconda/envs/<xxx-showcase-xxx-env>/bin/dvc status --cloud -v
2023-03-27 13:13:56,481 DEBUG: Preparing to collect status from '<containerName>/<path>'
2023-03-27 13:13:56,482 DEBUG: Collecting status from '<containerName>/<path>'
2023-03-27 13:13:56,483 DEBUG: Querying 3 oids via object_exists
2023-03-27 13:13:56,660 ERROR: unexpected error - Task <Task pending name='Task-3' coro=<AzureBlobFileSystem._exists() running at /anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/adlfs/spec.py:1410> cb=[_wait.<locals>._on_completion() at /anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/asyncio/tasks.py:518]> got Future <Future pending> attached to a different loop
Traceback (most recent call last):
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/dvc/cli/__init__.py", line 210, in main
ret = cmd.do_run()
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/dvc/cli/command.py", line 26, in do_run
return self.run()
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/dvc/commands/status.py", line 54, in run
st = self.repo.status(
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/dvc/repo/__init__.py", line 67, in wrapper
return f(repo, *args, **kwargs)
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/dvc/repo/status.py", line 121, in status
return _cloud_status(
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/dvc/repo/status.py", line 96, in _cloud_status
status_info = self.cloud.status(obj_ids, jobs, remote=remote)
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/dvc/data_cloud.py", line 213, in status
return compare_status(
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/dvc_data/hashfile/status.py", line 178, in compare_status
dest_exists, dest_missing = status(
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/dvc_data/hashfile/status.py", line 134, in status
exists = hashes.intersection(
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/dvc_data/hashfile/status.py", line 42, in _indexed_dir_hashes
indexed_dir_exists.update(hashes)
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/tqdm/std.py", line 1178, in __iter__
for obj in iterable:
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/dvc_objects/db.py", line 351, in list_oids_exists
in_remote = self.fs.exists(paths, batch_size=jobs)
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/dvc_objects/fs/base.py", line 337, in exists
return fut.result()
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/concurrent/futures/_base.py", line 444, in result
return self.__get_result()
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/dvc_objects/executors.py", line 132, in batch_coros
result = fut.result()
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/adlfs/spec.py", line 1410, in _exists
if await bc.exists(version_id=version_id):
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/azure/core/tracing/decorator_async.py", line 79, in wrapper_use_tracer
return await func(*args, **kwargs)
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/azure/storage/blob/aio/_blob_client_async.py", line 672, in exists
await self._client.blob.get_properties(
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/azure/core/tracing/decorator_async.py", line 79, in wrapper_use_tracer
return await func(*args, **kwargs)
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/azure/storage/blob/_generated/aio/operations/_blob_operations.py", line 473, in get_properties
pipeline_response = await self._client._pipeline.run( # type: ignore # pylint: disable=protected-access
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/azure/core/pipeline/_base_async.py", line 200, in run
return await first_node.send(pipeline_request)
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/azure/core/pipeline/_base_async.py", line 68, in send
response = await self.next.send(request) # type: ignore
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/azure/core/pipeline/_base_async.py", line 68, in send
response = await self.next.send(request) # type: ignore
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/azure/core/pipeline/_base_async.py", line 68, in send
response = await self.next.send(request) # type: ignore
[Previous line repeated 3 more times]
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/azure/core/pipeline/policies/_authentication_async.py", line 82, in send
await await_result(self.on_request, request)
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/azure/core/pipeline/_tools_async.py", line 37, in await_result
return await result # type: ignore
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/azure/core/pipeline/policies/_authentication_async.py", line 55, in on_request
async with self._lock:
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/asyncio/locks.py", line 97, in __aenter__
await self.acquire()
File "/anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/asyncio/locks.py", line 203, in acquire
await fut
RuntimeError: Task <Task pending name='Task-3' coro=<AzureBlobFileSystem._exists() running at /anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/site-packages/adlfs/spec.py:1410> cb=[_wait.<locals>._on_completion() at /anaconda/envs/<xxx-showcase-xxx-env>/lib/python3.8/asyncio/tasks.py:518]> got Future <Future pending> attached to a different loop
2023-03-27 13:13:56,917 DEBUG: link type reflink is not available ([Errno 95] no more link types left to try out)
2023-03-27 13:13:56,918 DEBUG: Removing '/mnt/batch/tasks/shared/LS_root/mounts/clusters/<xxx>/code/Users/<yyy>/.LCq7gNtvSuquXHnxQBoqd6.tmp'
2023-03-27 13:13:56,942 DEBUG: link type hardlink is not available ([Errno 95] no more link types left to try out)
2023-03-27 13:13:56,943 DEBUG: Removing '/mnt/batch/tasks/shared/LS_root/mounts/clusters/<xxx>/code/Users/<yyy>/.LCq7gNtvSuquXHnxQBoqd6.tmp'
2023-03-27 13:13:56,978 DEBUG: Removing '/mnt/batch/tasks/shared/LS_root/mounts/clusters/<xxx>/code/Users/<yyy>/.LCq7gNtvSuquXHnxQBoqd6.tmp'
2023-03-27 13:13:56,998 DEBUG: Removing '/mnt/batch/tasks/shared/LS_root/mounts/clusters/<xxx>/code/Users/<yyy>/<zzz>/.dvc/cache/.7jDFtgav9UYJPWkFMvGVfq.tmp'
2023-03-27 13:13:57,084 DEBUG: Version info for developers:
DVC version: 2.51.0 (conda)
---------------------------
Platform: Python 3.8.16 on Linux-5.15.0-1031-azure-x86_64-with-glibc2.10
Subprojects:
dvc_data = 0.44.1
dvc_objects = 0.21.1
dvc_render = 0.3.1
dvc_task = 0.2.0
scmrepo = 0.1.17
Supports:
azure (adlfs = 2023.1.0, knack = 0.6.3, azure-identity = 1.12.0),
http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3)
Cache types: symlink
Cache directory: cifs on //xxxxx.file.core.windows.net/yyyyy
Caches: local
Remotes: azure
Workspace directory: cifs on //xxxxx.file.core.windows.net/yyyyy
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/372c785fffbd7bee53ab6172b6125311
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2023-03-27 13:13:57,088 DEBUG: Analytics is enabled.
2023-03-27 13:13:57,197 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmp98p9vuh7']'
2023-03-27 13:13:57,199 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmp98p9vuh7']'
The following workarounds seem to be possible:
With DVC 2.8.1, I am not able to push the cache to remote storage at Azure Blob. I am getting aiohttp's ValueError: Cannot combine AUTHORIZATION header with AUTH argument or credentials encoded in URL with either of the auth methods.
I tried authentication using the username and az login, as well as an access key.
Reproduce: log in with az login, then run dvc push.
Expected: the cache gets pushed to the remote without error.
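One more authentication path that sidesteps combining az login credentials with a key is a connection string; this is a hedged sketch with placeholder values (remote name, account, key), stored with --local so it never lands in git:

```shell
# Supply the full connection string instead of separate name/key credentials.
dvc remote modify --local myremote connection_string \
  'DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=PLACEHOLDERKEY==;EndpointSuffix=core.windows.net'

dvc push -v
```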
Output of dvc doctor
:
$ dvc doctor
DVC version: 2.8.1 (pip)
---------------------------------
Platform: Python 3.9.7 on Linux-5.14.12-arch1-1-x86_64-with-glibc2.33
Supports:
azure (adlfs = 2021.9.1, knack = 0.8.2, azure-identity = 1.7.0),
gdrive (pydrive2 = 1.10.0),
gs (gcsfs = 2021.10.0),
hdfs (fsspec = 2021.10.0, pyarrow = 5.0.0),
webhdfs (fsspec = 2021.10.0),
http (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.6),
https (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.6),
s3 (s3fs = 2021.10.0, boto3 = 1.17.106),
ssh (sshfs = 2021.9.0),
oss (ossfs = 2021.8.0),
webdav (webdav4 = 0.9.3),
webdavs (webdav4 = 0.9.3)
Cache types: symlink
Cache directory: ext4 on /dev/sdc1
Caches: local
Remotes: local, azure
Workspace directory: ext4 on /dev/nvme0n1p1
Repo: dvc, git
Additional Information (if any):
Here is the full stacktrace:
❯ dvc push -r azure -v
2021-10-20 12:05:28,463 DEBUG: Preparing to transfer data from '../../../storage/dvc' to 'azure://<link to container>'
2021-10-20 12:05:28,463 DEBUG: Preparing to collect status from 'azure://<link to container>'
2021-10-20 12:05:28,494 DEBUG: Collecting status from 'azure://<link to container>'
2021-10-20 12:05:28,495 DEBUG: Querying 2 hashes via object_exists
2021-10-20 12:05:28,517 ERROR: unexpected error - Cannot combine AUTHORIZATION header with AUTH argument or credentials encoded in URL
------------------------------------------------------------
Traceback (most recent call last):
File "/home/user/.local/lib/python3.9/site-packages/dvc/main.py", line 55, in main
ret = cmd.do_run()
File "/home/user/.local/lib/python3.9/site-packages/dvc/command/base.py", line 45, in do_run
return self.run()
File "/home/user/.local/lib/python3.9/site-packages/dvc/command/data_sync.py", line 57, in run
processed_files_count = self.repo.push(
File "/home/user/.local/lib/python3.9/site-packages/dvc/repo/__init__.py", line 50, in wrapper
return f(repo, *args, **kwargs)
File "/home/user/.local/lib/python3.9/site-packages/dvc/repo/push.py", line 48, in push
pushed += self.cloud.push(obj_ids, jobs, remote=remote, odb=odb)
File "/home/user/.local/lib/python3.9/site-packages/dvc/data_cloud.py", line 85, in push
return transfer(
File "/home/user/.local/lib/python3.9/site-packages/dvc/objects/transfer.py", line 153, in transfer
status = compare_status(src, dest, obj_ids, check_deleted=False, **kwargs)
File "/home/user/.local/lib/python3.9/site-packages/dvc/objects/status.py", line 160, in compare_status
dest_exists, dest_missing = status(
File "/home/user/.local/lib/python3.9/site-packages/dvc/objects/status.py", line 122, in status
exists = hashes.intersection(
File "/home/user/.local/lib/python3.9/site-packages/dvc/objects/status.py", line 48, in _indexed_dir_hashes
dir_exists.update(odb.list_hashes_exists(dir_hashes - dir_exists))
File "/home/user/.local/lib/python3.9/site-packages/dvc/objects/db/base.py", line 415, in list_hashes_exists
ret = list(itertools.compress(hashes, in_remote))
File "/usr/lib/python3.9/concurrent/futures/_base.py", line 608, in result_iterator
yield fs.pop().result()
File "/usr/lib/python3.9/concurrent/futures/_base.py", line 445, in result
return self.__get_result()
File "/usr/lib/python3.9/concurrent/futures/_base.py", line 390, in __get_result
raise self._exception
File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/user/.local/lib/python3.9/site-packages/dvc/objects/db/base.py", line 406, in exists_with_progress
ret = self.fs.exists(path_info)
File "/home/user/.local/lib/python3.9/site-packages/dvc/fs/fsspec_wrapper.py", line 136, in exists
return self.fs.exists(self._with_bucket(path_info))
File "/home/user/.local/lib/python3.9/site-packages/adlfs/spec.py", line 1350, in exists
return sync(self.loop, self._exists, path)
File "/home/user/.local/lib/python3.9/site-packages/fsspec/asyn.py", line 71, in sync
raise return_result
File "/home/user/.local/lib/python3.9/site-packages/fsspec/asyn.py", line 25, in _runner
result[0] = await coro
File "/home/user/.local/lib/python3.9/site-packages/adlfs/spec.py", line 1372, in _exists
if await bc.exists():
File "/home/user/.local/lib/python3.9/site-packages/azure/core/tracing/decorator_async.py", line 74, in wrapper_use_tracer
return await func(*args, **kwargs)
File "/home/user/.local/lib/python3.9/site-packages/azure/storage/blob/aio/_blob_client_async.py", line 597, in exists
await self._client.blob.get_properties(
File "/home/user/.local/lib/python3.9/site-packages/azure/storage/blob/_generated/aio/operations/_blob_operations.py", line 394, in get_properties
pipeline_response = await self._client._pipeline.run(request, stream=False, **kwargs)
File "/home/user/.local/lib/python3.9/site-packages/azure/core/pipeline/_base_async.py", line 215, in run
return await first_node.send(pipeline_request)
File "/home/user/.local/lib/python3.9/site-packages/azure/core/pipeline/_base_async.py", line 83, in send
response = await self.next.send(request) # type: ignore
File "/home/user/.local/lib/python3.9/site-packages/azure/core/pipeline/_base_async.py", line 83, in send
response = await self.next.send(request) # type: ignore
File "/home/user/.local/lib/python3.9/site-packages/azure/core/pipeline/_base_async.py", line 83, in send
response = await self.next.send(request) # type: ignore
[Previous line repeated 5 more times]
File "/home/user/.local/lib/python3.9/site-packages/azure/core/pipeline/policies/_redirect_async.py", line 64, in send
response = await self.next.send(request)
File "/home/user/.local/lib/python3.9/site-packages/azure/core/pipeline/_base_async.py", line 83, in send
response = await self.next.send(request) # type: ignore
File "/home/user/.local/lib/python3.9/site-packages/azure/storage/blob/_shared/policies_async.py", line 99, in send
response = await self.next.send(request)
File "/home/user/.local/lib/python3.9/site-packages/azure/core/pipeline/_base_async.py", line 83, in send
response = await self.next.send(request) # type: ignore
File "/home/user/.local/lib/python3.9/site-packages/azure/storage/blob/_shared/policies_async.py", line 56, in send
response = await self.next.send(request)
File "/home/user/.local/lib/python3.9/site-packages/azure/core/pipeline/_base_async.py", line 83, in send
response = await self.next.send(request) # type: ignore
File "/home/user/.local/lib/python3.9/site-packages/azure/core/pipeline/_base_async.py", line 83, in send
response = await self.next.send(request) # type: ignore
File "/home/user/.local/lib/python3.9/site-packages/azure/core/pipeline/_base_async.py", line 116, in send
await self._sender.send(request.http_request, **request.context.options),
File "/home/user/.local/lib/python3.9/site-packages/azure/storage/blob/_shared/base_client_async.py", line 180, in send
return await self._transport.send(request, **kwargs)
File "/home/user/.local/lib/python3.9/site-packages/azure/core/pipeline/transport/_aiohttp.py", line 231, in send
result = await self.session.request( # type: ignore
File "/home/user/.local/lib/python3.9/site-packages/aiohttp/client.py", line 468, in _request
raise ValueError(
ValueError: Cannot combine AUTHORIZATION header with AUTH argument or credentials encoded in URL
------------------------------------------------------------
2021-10-20 12:05:28,949 DEBUG: Version info for developers:
DVC version: 2.8.1 (pip)
---------------------------------
Platform: Python 3.9.7 on Linux-5.14.12-arch1-1-x86_64-with-glibc2.33
Supports:
azure (adlfs = 2021.9.1, knack = 0.8.2, azure-identity = 1.7.0),
gdrive (pydrive2 = 1.10.0),
gs (gcsfs = 2021.10.0),
hdfs (fsspec = 2021.10.0, pyarrow = 5.0.0),
webhdfs (fsspec = 2021.10.0),
http (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.6),
https (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.6),
s3 (s3fs = 2021.10.0, boto3 = 1.17.106),
ssh (sshfs = 2021.9.0),
oss (ossfs = 2021.8.0),
webdav (webdav4 = 0.9.3),
webdavs (webdav4 = 0.9.3)
Cache types: symlink
Cache directory: ext4 on /dev/sdc1
Caches: local
Remotes: local, azure
Workspace directory: ext4 on /dev/nvme0n1p1
Repo: dvc, git
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2021-10-20 12:05:28,950 DEBUG: Analytics is enabled.
2021-10-20 12:05:29,039 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmprl9ltfue']'
2021-10-20 12:05:29,042 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmprl9ltfue']'
It appears that we actually support it: iterative/dvc#7566 (comment). Does it make sense to have specific tests for this?
When we run dvc gc with the -c argument, the file is expected to be cleaned up on the cloud as well.
However, remove on dvc_azure.AzureFileSystem does not seem to work; it only removes the file locally.
It will also affect users on a fresh install, but as a workaround we can suggest manually downgrading/pinning azure-storage-blob.
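A minimal sketch of that workaround as a requirements constraint (the version below is a placeholder, not a verified bound; substitute the last release known to work):

```
# requirements.txt sketch: pin azure-storage-blob until the upstream fix lands
# NOTE: the version is a placeholder, not a verified bound
azure-storage-blob==<last-working-version>
```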
For some reason this fails specifically on Linux: https://github.com/iterative/dvc-azure/actions/runs/6268969092
I have been trying out DVC for the first time, using an Azure remote. I mistakenly set version_aware true, but my Azure remote is not version-enabled. When trying dvc push, I got unexpected error rather than failed to push data to the cloud - remote 'myremote' does not support versioning. This cost me a lot of debugging time, so an informative error would have been better.
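For reference, the remote configuration that triggers this looks roughly like the following .dvc/config fragment (the URL is inferred from the push log and may differ; the remote name matches the error message):

```ini
['remote "myremote"']
    url = azure://dvs-test/dvc
    version_aware = true
```

Setting version_aware back to false (or removing the option) avoids the error, as noted below.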
I can see that an informative error should have been produced by the test_versioning function, but this did not happen because the function raised on the line info = dest_fs.info(dest_path) instead of producing the desired informative error message.
The full error is below.
dvc push -v
2023-03-16 14:37:19,723 DEBUG: v2.50.0 (pip), CPython 3.10.4 on Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
2023-03-16 14:37:19,723 DEBUG: command: /home/glenn/Projects/dvc/.venv/bin/dvc push -v
2023-03-16 14:37:19,955 DEBUG: Pushing version-aware files to 'dvs-test/dvc'
2023-03-16 14:37:20,227 DEBUG: '['dvs-test/dvc/data/data2.csv']' file already exists, skipping
2023-03-16 14:37:20,548 ERROR: unexpected error
Traceback (most recent call last):
File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/adlfs/spec.py", line 705, in _ls_blobs
async for next_blob in blobs:
File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/azure/core/async_paging.py", line 149, in __anext__
return await self.__anext__()
File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/azure/core/async_paging.py", line 152, in __anext__
self._page = await self._page_iterator.__anext__()
File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/azure/core/async_paging.py", line 96, in __anext__
self._response = await self._get_next(self.continuation_token)
File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/azure/storage/blob/aio/_list_blobs_helper.py", line 90, in _get_next_cb
process_storage_error(error)
File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/azure/storage/blob/_shared/response_handlers.py", line 189, in process_storage_error
exec("raise error from None") # pylint: disable=exec-used # nosec
File "<string>", line 1, in <module>
File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/azure/storage/blob/aio/_list_blobs_helper.py", line 83, in _get_next_cb
return await self._command(
File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/azure/core/tracing/decorator_async.py", line 79, in wrapper_use_tracer
return await func(*args, **kwargs)
File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/azure/storage/blob/_generated/aio/operations/_container_operations.py", line 1785, in list_blob_hierarchy_segment
map_error(status_code=response.status_code, response=response, error_map=error_map)
File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/azure/core/exceptions.py", line 109, in map_error
raise error
azure.core.exceptions.ResourceNotFoundError: The specified container does not exist.
RequestId:887a78c3-c01e-0077-7fa7-570344000000
Time:2023-03-16T01:37:20.5036327Z
ErrorCode:ContainerNotFound
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>ContainerNotFound</Code><Message>The specified container does not exist.
RequestId:887a78c3-c01e-0077-7fa7-570344000000
Time:2023-03-16T01:37:20.5036327Z</Message></Error>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/dvc/cli/__init__.py", line 210, in main
ret = cmd.do_run()
File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/dvc/cli/command.py", line 26, in do_run
return self.run()
File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/dvc/commands/data_sync.py", line 60, in run
processed_files_count = self.repo.push(
File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/dvc/repo/__init__.py", line 67, in wrapper
return f(repo, *args, **kwargs)
File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/dvc/repo/push.py", line 50, in push
pushed += _push_worktree(
File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/dvc/repo/push.py", line 119, in _push_worktree
return push_worktree(repo, remote, targets=targets, jobs=jobs, **kwargs)
File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/dvc/repo/worktree.py", line 176, in push_worktree
pushed += checkout(
File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/dvc_data/index/checkout.py", line 102, in checkout
entry.meta = test_versioning(
File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/dvc_data/index/checkout.py", line 38, in test_versioning
info = dest_fs.info(dest_path)
File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 467, in info
return self.fs.info(path, **kwargs)
File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/adlfs/spec.py", line 506, in info
return sync(
File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/fsspec/asyn.py", line 100, in sync
raise return_result
File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/fsspec/asyn.py", line 55, in _runner
result[0] = await coro
File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/adlfs/spec.py", line 561, in _info
out = await self._ls(
File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/adlfs/spec.py", line 812, in _ls
output = await self._ls_blobs(
File "/home/glenn/Projects/dvc/.venv/lib/python3.10/site-packages/adlfs/spec.py", line 738, in _ls_blobs
raise FileNotFoundError
FileNotFoundError
2023-03-16 14:37:20,623 DEBUG: link type reflink is not available ([Errno 95] no more link types left to try out)
2023-03-16 14:37:20,623 DEBUG: Removing '/home/glenn/Projects/.GKTi22QwBRBYvnZLnJQW8Y.tmp'
2023-03-16 14:37:20,623 DEBUG: Removing '/home/glenn/Projects/.GKTi22QwBRBYvnZLnJQW8Y.tmp'
2023-03-16 14:37:20,624 DEBUG: Removing '/home/glenn/Projects/.GKTi22QwBRBYvnZLnJQW8Y.tmp'
2023-03-16 14:37:20,624 DEBUG: Removing '/home/glenn/Projects/dvc/.dvc/cache/.oR9jKxwCthEwWrBWfHWEfp.tmp'
2023-03-16 14:37:20,631 DEBUG: Version info for developers:
DVC version: 2.50.0 (pip)
-------------------------
Platform: Python 3.10.4 on Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
Subprojects:
dvc_data = 0.44.1
dvc_objects = 0.21.1
dvc_render = 0.2.0
dvc_task = 0.2.0
scmrepo = 0.1.15
Supports:
azure (adlfs = 2023.1.0, knack = 0.10.1, azure-identity = 1.12.0),
http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sdc
Caches: local
Remotes: azure
Workspace directory: ext4 on /dev/sdc
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/2b5af32cb49e0e43fcdab0056793557a
dvc push
0% Pushing to remote 'myremote' | dvs-test/dvc/data/data2.csv
ERROR: while uploading 'dvs-test/dvc/data/data2.csv', support for versioning could not be detected
ERROR: failed to push data to the cloud - remote 'myremote' does not support versioning
Output of dvc doctor:
DVC version: 2.50.0 (pip)
-------------------------
Platform: Python 3.10.4 on Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
Subprojects:
dvc_data = 0.44.1
dvc_objects = 0.21.1
dvc_render = 0.2.0
dvc_task = 0.2.0
scmrepo = 0.1.15
Supports:
azure (adlfs = 2023.1.0, knack = 0.10.1, azure-identity = 1.12.0),
http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sdc
Caches: local
Remotes: azure
Workspace directory: ext4 on /dev/sdc
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/2b5af32cb49e0e43fcdab0056793557a
Additional Information (if any):
The data file I was trying to push was actually uploaded to Azure during this process. Once uploaded, I was able to delete the data file and cache locally and use dvc pull to successfully pull the data.
As seen in the stack trace, I was also getting the error The specified container does not exist. I can confirm the container does exist. When setting version_aware false, dvc push worked as expected.
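If versioning support is actually wanted, blob versioning has to be enabled on the storage account itself before version_aware true can work; a hedged sketch using the Azure CLI (account and resource-group names are placeholders):

```
# enable blob versioning on the storage account (placeholder names)
az storage account blob-service-properties update \
    --account-name <account> --resource-group <group> \
    --enable-versioning true
```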
The newest release of dvc-azure (2.23.0) is currently not available on conda-forge:
https://anaconda.org/conda-forge/dvc-azure/files