
azure-storage-net-data-movement's Introduction

Support Statement

  • We will only be making fixes related to data integrity and security for 2.0.XX.
  • We will not be adding new storage service version support for this SDK.
  • We will not be backporting fixes and features added to the current version to the versions in this repo.
  • We will not be making any changes to the performance characteristics of this SDK.

If you are looking for support for any feature in our new storage service versions (e.g. Blob, File, DataLake) please look to our V12 releases.

| SDK Name | Version | Description | NuGet/API Reference Links |
| --- | --- | --- | --- |
| Blob Storage SDK v12 for .NET | v12.0.0 | The next generation Blob Storage SDK. Supports sync and async IO. | NuGet - Reference |
| File Storage SDK v12 for .NET | 12.0.0-preview.5 | The next generation File Storage SDK. Supports sync and async IO. | NuGet - Reference |
| Data Lake Storage SDK v12 for .NET | 12.0.0-preview.6 | The next generation Data Lake Storage SDK. Supports sync and async IO. | NuGet |

Microsoft Azure Storage Data Movement Library (2.0.1)

The Microsoft Azure Storage Data Movement Library is designed for high-performance uploading, downloading, and copying of Azure Storage Blobs and Files. This library is based on the core data movement framework that powers AzCopy.

For more information about Azure Storage, please visit the Microsoft Azure Storage Documentation.

Note: As of 0.11.0, the namespace has changed to Microsoft.Azure.Storage.DataMovement from Microsoft.WindowsAzure.Storage.DataMovement.

Features

  • Blobs

    • Download/Upload/Copy Blobs.
    • Copy Blobs with synchronous copying, service side asynchronous copying and service side synchronous copying.
    • Concurrently transfer Blobs and Blob chunks, and define the number of concurrent operations
    • Download Specific Blob Snapshot
  • Files

    • Download/Upload/Copy Files.
    • Copy File with synchronous copying and service side asynchronous copying.
    • Concurrently transfer Files and File ranges, and define the number of concurrent operations
  • General

    • Track data transfer progress
    • Recover the data transfer
    • Set Access Condition
    • Set User Agent Suffix
    • Directory/recursive transfer

Getting started

For the best development experience, we recommend that developers use the official Microsoft NuGet packages for libraries. NuGet packages are regularly updated with new functionality and hotfixes.

Target Frameworks

  • .NET Framework 4.5.2 or above
  • .NET Standard 2.0

Requirements

To call Azure services, you must first have an Azure subscription. Sign up for a free trial or use your MSDN subscriber benefits.

Download & Install

Via Git

To get the source code of the SDK via git, just run:

git clone https://github.com/Azure/azure-storage-net-data-movement.git
cd azure-storage-net-data-movement

Via NuGet

To get the binaries of this library as distributed by Microsoft, ready for use within your project, you can install them with the .NET package manager NuGet.

Install-Package Microsoft.Azure.Storage.DataMovement

Dependencies

Azure Storage Blob Client Library

This version depends on Azure Storage Blob Client Library

Azure Storage File Client Library

This version depends on Azure Storage File Client Library

Code Samples

Find more samples in the sample folder.

Upload a blob

First, include the namespaces you need. Here we include the Storage client library, the Storage data movement library, and .NET threading, because the data movement library provides Task-based asynchronous interfaces for transferring storage objects:

using System;
using System.Threading;
using Microsoft.Azure.Storage;
using Microsoft.Azure.Storage.Blob;
using Microsoft.Azure.Storage.DataMovement;

Now use the interfaces provided by the Storage client library to set up the storage context (find more details in how to use Blob Storage from .NET):

string storageConnectionString = "myStorageConnectionString";
CloudStorageAccount account = CloudStorageAccount.Parse(storageConnectionString);
CloudBlobClient blobClient = account.CreateCloudBlobClient();
CloudBlobContainer blobContainer = blobClient.GetContainerReference("mycontainer");
blobContainer.CreateIfNotExists();
string sourcePath = "path\\to\\test.txt";
CloudBlockBlob destBlob = blobContainer.GetBlockBlobReference("myblob");

Once you have set up the storage blob context, you can use Microsoft.Azure.Storage.DataMovement.TransferManager to upload the blob and track the upload progress:

// Setup the number of the concurrent operations
TransferManager.Configurations.ParallelOperations = 64;
// Setup the transfer context and track the upload progress
SingleTransferContext context = new SingleTransferContext();
context.ProgressHandler = new Progress<TransferStatus>((progress) =>
{
	Console.WriteLine("Bytes uploaded: {0}", progress.BytesTransferred);
});
// Upload a local blob
var task = TransferManager.UploadAsync(
	sourcePath, destBlob, null, context, CancellationToken.None);
task.Wait();
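
The library can also record a checkpoint so that an interrupted transfer can be resumed later (the "Recover the data transfer" feature listed above). Below is a minimal sketch of that pattern, assuming it runs inside an async method and reuses the sourcePath and destBlob from the sample above; the 30-second cancellation used to simulate an interruption is illustrative only.

// Minimal resume sketch: keep retrying from the last checkpoint until the upload finishes.
TransferCheckpoint checkpoint = null;
while (true)
{
    // Reuse the checkpoint (if any) so chunks that have already transferred are skipped.
    SingleTransferContext resumeContext = checkpoint == null
        ? new SingleTransferContext()
        : new SingleTransferContext(checkpoint);
    try
    {
        // Illustrative only: cancel after 30 seconds to simulate an interruption.
        using (var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30)))
        {
            await TransferManager.UploadAsync(sourcePath, destBlob, null, resumeContext, cts.Token);
        }
        break; // upload completed
    }
    catch (OperationCanceledException)
    {
        checkpoint = resumeContext.LastCheckpoint; // save progress and retry
    }
}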

Copy a blob

First, include the namespaces you need; they are the same as in the Upload a blob sample.

using System;
using System.Threading;
using Microsoft.Azure.Storage;
using Microsoft.Azure.Storage.Blob;
using Microsoft.Azure.Storage.DataMovement;

Now use the interfaces provided by the Storage client library to set up the storage contexts (find more details in how to use Blob Storage from .NET):

string sourceStorageConnectionString = "sourceStorageConnectionString";
CloudStorageAccount sourceAccount = CloudStorageAccount.Parse(sourceStorageConnectionString);
CloudBlobClient sourceBlobClient = sourceAccount.CreateCloudBlobClient();
CloudBlobContainer sourceBlobContainer = sourceBlobClient.GetContainerReference("sourcecontainer");
CloudBlockBlob sourceBlob = sourceBlobContainer.GetBlockBlobReference("sourceBlobName");

string destStorageConnectionString = "destinationStorageConnectionString";
CloudStorageAccount destAccount = CloudStorageAccount.Parse(destStorageConnectionString);
CloudBlobClient destBlobClient = destAccount.CreateCloudBlobClient();
CloudBlobContainer destBlobContainer = destBlobClient.GetContainerReference("destinationcontainer");
CloudBlockBlob destBlob = destBlobContainer.GetBlockBlobReference("destBlobName");

Once you have set up the storage blob contexts, you can use Microsoft.Azure.Storage.DataMovement.TransferManager to copy the blob and track the copy progress:

// Setup the number of the concurrent operations
TransferManager.Configurations.ParallelOperations = 64;
// Setup the transfer context and track the copy progress
SingleTransferContext context = new SingleTransferContext();
context.ProgressHandler = new Progress<TransferStatus>((progress) =>
{
	Console.WriteLine("Bytes Copied: {0}", progress.BytesTransferred);
});

// Copy a blob
var task = TransferManager.CopyAsync(
    sourceBlob, destBlob, CopyMethod.ServiceSideSyncCopy, null, context, CancellationToken.None);
task.Wait();

DMLib supports three different copying methods: Synchronous Copy, Service Side Asynchronous Copy and Service Side Synchronous Copy. The above sample uses Service Side Synchronous Copy. See Choose Copy Method for details on how to choose the copy method.

Best Practice

Increase .NET HTTP connections limit

By default, the .NET HTTP connection limit is 2, which means only two concurrent connections can be maintained per host. This prevents your application from opening more parallel connections to Azure Blob storage.

By default, AzCopy sets ServicePointManager.DefaultConnectionLimit to eight times the number of cores. To achieve comparable performance when using the Data Movement Library on its own, we recommend setting this value as well.

ServicePointManager.DefaultConnectionLimit = Environment.ProcessorCount * 8;

Turn off 100-continue

When the property "Expect100Continue" is set to true, client requests that use the PUT and POST methods add an Expect: 100-continue header to the request and expect to receive a 100-Continue response from the server before sending the request body. This mechanism allows clients to avoid sending large amounts of data over the network when the server, based on the request headers, intends to reject the request.

However, other errors may still occur after the entire payload has been received on the server side. If you have tested your client well enough to ensure that it is not sending bad requests, you can turn off 100-continue so that the entire request is sent in one round trip. This is especially beneficial when sending small storage objects.

ServicePointManager.Expect100Continue = false;

Pattern/Recursive in DMLib

The following matrix explains how the DirectoryOptions.Recursive and DirectoryOptions.SearchPattern properties work in DMLib.

| Source | Search Pattern | Recursive | Search Pattern Example | Comments |
| --- | --- | --- | --- | --- |
| Local | Wildcard Match | TRUE | "foo*.png" | The search pattern is a standard wildcard match applied to the current directory and all subdirectories. |
| Local | Wildcard Match | FALSE | "foo*.png" | The search pattern is a standard wildcard match applied to the current directory only. |
| Azure Blob | Prefix Match | TRUE | <domainname>/<container>/<virtualdirectory>/<blobprefix>, e.g. "blah.blob.core.windows.net/ipsum/lorem/foo*" | The search pattern is a prefix match. |
| Azure Blob | Exact Match | FALSE | <domainname>/<container>/<virtualdirectory>/<fullblobname>, e.g. "blah.blob.core.windows.net/ipsum/lorem/foobar.png" | The search pattern is an exact match. If the search pattern is an empty string, no blobs will be matched. |
| Azure File | N/A | TRUE | N/A | Recursive search is not supported and will return an error. |
| Azure File | Exact Match | FALSE | <domainname>/<share>/<directory>/<fullfilename>, e.g. "blah.files.core.windows.net/ipsum/lorem/foobar.png" | The search pattern is an exact match. If the search pattern is an empty string, no files will be matched. |
  • Default pattern option:

    • Local: *
    • Blob: Empty string
    • File: Empty string
  • Default recursive option: false
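
As an illustration of the options above, a recursive directory upload can set these properties as follows. This is a minimal sketch, assuming a local source folder and an existing CloudBlobDirectory reference named destBlobDir; it mirrors the UploadDirectoryOptions usage shown in the samples in this repository.

// Minimal sketch: recursively upload all *.png files under a local folder.
UploadDirectoryOptions options = new UploadDirectoryOptions
{
    Recursive = true,          // apply the pattern to the current directory and all subdirectories
    SearchPattern = "*.png",   // standard wildcard match for a local source
    BlobType = BlobType.BlockBlob
};
DirectoryTransferContext context = new DirectoryTransferContext();
var task = TransferManager.UploadDirectoryAsync(@"path\to\localFolder", destBlobDir, options, context, CancellationToken.None);
task.Wait();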

Choose Copy Method

DMLib supports three copy methods:

  • Synchronous Copy
    DMLib downloads data from source to memory, and uploads the data from memory to destination.

  • Service Side Asynchronous Copy
    DMLib sends a request to the Azure Storage service to start the copy and monitors its status until the copy is completed.

  • Service Side Synchronous Copy
    DMLib leverages Put Block From URL, Append Block From URL, and Put Page From URL to copy Azure Storage Blobs.

The following are the suggested copy methods for different scenarios:

  • From a performance perspective:
    Based on Azure Storage Server's SLA for Storage Accounts, the following are suggestions for when copy performance is most important to you:
    • To copy between blobs or files within the same storage account, Service Side Asynchronous Copy is suggested.
    • To copy Blobs, Service Side Synchronous Copy is suggested.
    • To copy Files, or to copy between Blobs and Files, Synchronous Copy is suggested. To achieve the best copy performance, Synchronous Copy needs a powerful machine in Azure.
  • From a cost perspective, Service Side Asynchronous Copy costs the least.
  • From a data flow perspective, Synchronous Copy gives you the most control: the data goes through the network you configure.
  • In terms of supported directions, Synchronous Copy and Service Side Asynchronous Copy support more directions than Service Side Synchronous Copy. See details in the table below.

The following table shows the supported directions for each copy method.

| Source \ Destination | Append Blob | Block Blob | Page Blob | Azure File |
| --- | --- | --- | --- | --- |
| Append Blob | Synchronous Copy, Service Side Asynchronous Copy, Service Side Synchronous Copy | N/A | N/A | Synchronous Copy, Service Side Asynchronous Copy |
| Block Blob | N/A | Synchronous Copy, Service Side Asynchronous Copy, Service Side Synchronous Copy | N/A | Synchronous Copy, Service Side Asynchronous Copy |
| Page Blob | N/A | N/A | Synchronous Copy, Service Side Asynchronous Copy, Service Side Synchronous Copy | Synchronous Copy, Service Side Asynchronous Copy |
| Azure File | Synchronous Copy, Service Side Asynchronous Copy | Synchronous Copy, Service Side Asynchronous Copy | Synchronous Copy, Service Side Asynchronous Copy | Synchronous Copy, Service Side Asynchronous Copy |
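
For reference, the copy method is selected per call through the CopyMethod parameter of TransferManager.CopyAsync, as in the copy sample earlier. The following is a minimal sketch; CopyMethod.ServiceSideSyncCopy appears in the sample above, while SyncCopy and ServiceSideAsyncCopy are assumed to be the enum values for the other two methods.

// Minimal sketch: the third parameter of CopyAsync selects the copy method.
// CopyMethod.SyncCopy             - download to the local machine, then upload (Synchronous Copy)
// CopyMethod.ServiceSideAsyncCopy - server-side background copy; DMLib monitors its status
// CopyMethod.ServiceSideSyncCopy  - server-side copy via Put Block/Page/Append Block From URL
CopyMethod method = CopyMethod.ServiceSideAsyncCopy; // e.g. when copying within one storage account
var copyTask = TransferManager.CopyAsync(
    sourceBlob, destBlob, method, null, new SingleTransferContext(), CancellationToken.None);
copyTask.Wait();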

Need Help?

If you have trouble with the provided code, be sure to check out the Microsoft Azure Developer Forums on MSDN or ask on StackOverflow.

Collaborate & Contribute

We gladly accept community contributions.

  • Issues: Please report bugs using the Issues section of GitHub
  • Forums: Interact with the development teams on StackOverflow or the Microsoft Azure Forums
  • Source Code Contributions: Please follow the contribution guidelines for Microsoft Azure open source, which detail how to onboard as a contributor

For general suggestions about Microsoft Azure please use our UserVoice forum.

Learn More


azure-storage-net-data-movement's Issues

TransferSkippedException thrown when not overwriting

I'm trying to use this library to transfer multiple files into blob storage. In order to prevent re-transferring files that are already in blob storage, I compare the Azure MD5 with the file MD5 and return false from the should overwrite callback if they are the same.

I was then surprised to see that an exception was thrown. Would it make more sense to return a status code from the task indicating the transfer was skipped?

No sample for DownloadDirectoryAsync()

Any chance you could add a sample for how to use TransferManager.DownloadDirectoryAsync()? That is a scenario you are currently not covering (you have an example for upload, but not download). I'm having some difficulties, and an example would be most welcome.
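
For reference, here is a rough sketch of what I am trying, modeled on the upload samples; the DownloadDirectoryOptions property name and the parameter order are my assumptions.

// Rough sketch of the directory download I am attempting (names/order assumed).
CloudBlobDirectory sourceBlobDir = blobContainer.GetDirectoryReference("myDir");
DownloadDirectoryOptions options = new DownloadDirectoryOptions { Recursive = true };
DirectoryTransferContext context = new DirectoryTransferContext();
var task = TransferManager.DownloadDirectoryAsync(sourceBlobDir, @"path\to\localFolder", options, context, CancellationToken.None);
task.Wait();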

ProgressTransfer class removed - code broken

The ProgressTransfer class has been removed, causing unit tests to fail. Could you provide some explanation of how to get hold of the transfer progress within the context of a transfer, since most examples seem to refer to the old way (i.e. when ProgressTransfer was still available)?
Thanks

Synchronous copy between different storage accounts

AzCopy allows for synchronous copy between storage accounts:
http://blogs.msdn.com/b/windowsazurestorage/archive/2015/01/13/azcopy-introducing-synchronous-copy-and-customized-content-type.aspx

Is there a way to achieve this with azure-storage-net-data-movement?
i.e. through C# code and not the command line?

Scenario: I have 2 storage accounts. A 3rd party service places files on temp storage account A.
When I know the files are on A, I need to copy them to my main storage account B.
I don't want to give the 3rd party access to storage account B for security.

Ability to rename content as part of directory download/upload

AzCopy and DMLib account for some naming conflicts that arise when writing content to a file system, for example case sensitivity. Automatically handling this at download time is great, but I may want to honor case sensitivity if I ever need to upload the content back. Another case that is not accounted for is a file and a folder with the same name (e.g. B and B/A). Blob storage allows this, the Windows file system does not.

In cases like this, I need the control to handle the conflict as part of downloading and the ability to return the content back to its original naming convention as part of upload. I might not have the ability to change the schema/naming conventions of the service I am downloading from (maybe its a production Azure service). I need to account for the conflict during download and honor the schema during upload.

I can think of a few alternatives:

  • .NET SDK - this means I am rolling my own custom directory downloader/uploader to mitigate the name resolution logic. DMLib and AzCopy already do all the heavy lifting, have been battle hardened, and already have to deal with other name conflicts. A generic mechanism seems more appropriate.
  • FileFailed event - listening for this event and then initiating a single file transfer that handles the name conflict is reasonable for edge case conflicts. There are cases where I know every download will have a naming conflict. Also this path only addresses the download. I still need to write my own directory upload logic to modify the name to match the services schema.
  • Fork DMLib - Given that AzCopy and DMLib already handle some file system naming conflicts, I would prefer to see a generic mechanism. https://blogs.msdn.microsoft.com/windowsazurestorage/2012/12/03/azcopy-uploadingdownloading-files-for-windows-azure-blobs/

Are there others?

Table Support?

Is there currently open development on adding support for Table Migration using DMLib? I was thinking of looking to add this functionality after I fork, so I was just curious.

BlockBlob Metadata not getting updated when overwriting an existing one

Repro -

  1. Create a CloudBlockBlob with some Metadata.
  2. Get a reference to this blob.
  3. Execute blob.FetchAttributes()
  4. Set blob.Metadata["key"] = "value".
  5. Now execute UploadAsync.

Although the blob gets overwritten, the metadata remains the same.

PS - The metadata does get updated in case of a blob which does not exist.

In BlockBlobWriter.cs -

Based on my understanding, there is a FetchAttributes call; if the blob does not exist, this call fails. After that there is a call to UploadBlobAsync and then to CommitAsync, which calls SetAttributes to set the metadata.

Now, when the blob already exists, the FetchAttributes call overrides the metadata specified for UploadBlobAsync.
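
A minimal sketch of the repro steps above (container, blob, path, and metadata names are placeholders):

// Sketch of the repro: overwrite an existing block blob after changing its metadata.
CloudBlockBlob blob = blobContainer.GetBlockBlobReference("myblob"); // 2. get a reference to the existing blob
blob.FetchAttributes();                                              // 3. load its current attributes
blob.Metadata["key"] = "value";                                      // 4. change the metadata

SingleTransferContext context = new SingleTransferContext();
context.ShouldOverwriteCallback = (source, destination) => true;     // allow overwriting the existing blob
TransferManager.UploadAsync("path\\to\\test.txt", blob, null, context, CancellationToken.None).Wait(); // 5. upload

// Observed: the blob content is overwritten, but the metadata set above does not change.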

Add support for .net core

I'm trying to upload / download files from blob storage to Nano Servers on Azure, but found no tools that can be used directly. Since the Azure Storage assembly and Newtonsoft.Json already support .NET Core, please add .NET Core support!

"1 sub transfer(s) failed." on upload

Hello,

The below code works fine when the network bandwidth is good - uploads complete successfully. However, when the network is congested, the below "UploadDirectoryAsync" method throws the error "1 sub transfer(s) failed." frequently. Are there any settings that can be modified to fix this? Any other ideas?

ServicePointManager.DefaultConnectionLimit = Environment.ProcessorCount * 8;
ServicePointManager.Expect100Continue = false;
var task = TransferManager.UploadDirectoryAsync(path, directory, options, context);
task.Wait();

Here is the stack trace:

InnerException.StackTrace: at Microsoft.WindowsAzure.Storage.DataMovement.MultipleObjectsTransfer.d__5.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.WindowsAzure.Storage.DataMovement.DirectoryTransfer.d__1b.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.WindowsAzure.Storage.DataMovement.TransferManager.d__2.MoveNext()

OutOfMemoryException Copying From Azure Files to Azure Blob

We are copying 4.2 TB of small files from Azure File Storage to Azure Blob Storage. After a decent amount of runtime, the copy fails with an OutOfMemoryException coming from the await on CopyDirectoryAsync(). Debugging locally, we noticed that memory constantly increases indicating there may be a memory leak. I've included our relevant snippets of code below.

transfer parameters:

const int parallelTasks = 16;
ServicePointManager.DefaultConnectionLimit = parallelTasks;
ServicePointManager.Expect100Continue = false;
TransferManager.Configurations.ParallelOperations = parallelTasks;

var copyOptions = new CopyDirectoryOptions
{
   Recursive = true,
};

and the transfer code...

private async Task CopyDirectoryAsync(CloudFileDirectory sourceDirectory, CloudBlobDirectory destinationDirectory, CopyDirectoryOptions copyOptions,
TransferContext context)
{
   try
   {
         await TransferManager.CopyDirectoryAsync(sourceDirectory, destinationDirectory, true, copyOptions, context);
   }
   catch (TransferException)
   {

   }
   catch(Exception e)          // OutOfMemoryException wrapped in AggregateException here.
   { 
         var exceptions = new List<Exception> { e };
         var aggregateException = e as AggregateException;

         if (aggregateException != null && aggregateException.InnerExceptions != null)
            exceptions.AddRange(aggregateException.InnerExceptions);

         exceptions.ForEach(exc => this.LogException(exc, SeverityLevelEnum.Error));
         throw;
   }
}

403 (Not authorized) when trying to download newly created blob with SAS

I'm uploading small text file, to newly created container, that contains only that file. No problems.
Then I'm regenerating URL with SAS token to download it (this is integration tests verification code). Code that creates policy and token looks like following [access service]:

public async Task<string> GetSasTokenAsync(string containerName, bool isReadOnly)
{
    string policyName;

    // TODO: move to the config
    int expirationHours = 24;

    var blobClient = this.account.CreateCloudBlobClient();
    var container = blobClient.GetContainerReference(containerName);

    // Get the current permissions for the blob container.
    var blobPermissions = await container.GetPermissionsAsync();

    if (!isReadOnly)
    {
        policyName = $"Write_{DateTime.UtcNow.AddHours(-1).ToString("yyyyMMdd")}";
    }
    else
    {
        policyName = $"Read_{DateTime.UtcNow.ToString("yyyyMMdd")}";
    }

    if (blobPermissions.SharedAccessPolicies.ContainsKey(policyName))
    {
        return container.GetSharedAccessSignature(new SharedAccessBlobPolicy(), policyName);
    }

    this.RemoveExpiredPolicies(blobPermissions);

    // The new shared access policy provides read/write access to the container for 24 hours.
    var sharedAccessPolicy = new SharedAccessBlobPolicy()
    {
        // To ensure SAS is valid immediately, don't set the start time.
        // This way, you can avoid failures caused by small clock differences.
        SharedAccessExpiryTime = DateTime.UtcNow.AddHours(expirationHours),
        SharedAccessStartTime = DateTime.UtcNow.AddHours(-1)
    };

    if (!isReadOnly)
    {
        sharedAccessPolicy.Permissions = SharedAccessBlobPermissions.Write | SharedAccessBlobPermissions.Read;
    }
    else
    {
        sharedAccessPolicy.Permissions = SharedAccessBlobPermissions.Read;
    }

    blobPermissions.SharedAccessPolicies.Add(policyName, sharedAccessPolicy);

    // The public access setting explicitly specifies that
    // the container is private, so that it can't be accessed anonymously.
    blobPermissions.PublicAccess = BlobContainerPublicAccessType.Off;

    // Set the new stored access policy on the container.
    await container.SetPermissionsAsync(blobPermissions);

    // Get the shared access signature token to share with users.
    var sasToken = container.GetSharedAccessSignature(new SharedAccessBlobPolicy(), policyName);

    return sasToken;
}
(Please note that setting the start time one hour back is the newest addition, added in the hope of solving this issue... Previously there was nothing, as the comment above states.)

The download code [test code]:
using (Stream stream = new FileStream(info.TargetPath, FileMode.OpenOrCreate, FileAccess.Write, FileShare.None))
{
    var sourceBlob = new CloudBlockBlob(new Uri(info.DownloadUrl, UriKind.Absolute));
    var transferContext = new OverwritingContext()
    {
        LogLevel = LogLevel.Verbose
    };
    var downloadOptions = new DownloadOptions
    {
        DisableContentMD5Validation = true,
        SourceAccessCondition = AccessCondition.GenerateIfExistsCondition()
    };

    // .. here attaching event handlers...

    await TransferManager.DownloadAsync(sourceBlob, stream, downloadOptions, transferContext, cancellationToken);

    stream.Close();
}

Where info.DownloadUrl contains generated URL, like https://[domain].blob.core.windows.net/53f3d657-8926-41dc-b3b4-41d30272339a/86e3274c-34b7-4675-ac71-ed030e6dccbe?sv=2016-05-31&sr=c&si=Read_20170508&sig=N5XDYdRTWxIiYrTy9vlSezMiq0GFmAhNUApH1kAtRHk%3D.

So, the thing is: if I use this URL from a web browser it works perfectly. When I use TransferManager within less than 25 seconds of receiving the URL from the access service, it fails with 403. If I just sleep 25 seconds after receiving the URL and then try to use TransferManager, everything seems to work just fine.
So I'm guessing there is something wrong within TransferManager itself rather than our token-generation code, but who knows...

Timeout while DownloadDirectoryAsync is running

I am trying to get a synchronous blob listing while a large transfer is in progress, using TransferManager.DownloadDirectoryAsync(...)

I am seeing a timeout (StorageException) on the following method:

if (container.Exists(requestOptions: new BlobRequestOptions() { MaximumExecutionTime = _defaultTimeout }) == true) { /* get listing */ }

It doesn't seem to matter how large the timeout is.
If no transfer is running, the above code works just fine.

Add GzipStream support

I am now using the DM library to move big files from on-premises to Azure Storage; once the data is in the cloud, it will be consumed by Hive and Spark. Since both Hive and Spark support decompressing and reading gzip-compressed files, I am wondering if DM can support GzipStream when uploading to the cloud.

VS Project won't compile

When trying to build the sample project, I get the following error:

Severity Code Description Project File Line Suppression State
Error The command "..\DMLibTestCodeGen\bin\Debug\DMLibTestCodeGen.exe bin\Debug\DMLibTest.dll Generated" exited with code -532462766. DMLibTest C:\Users\larrywa\Downloads\DataMovement\test\DMLibTest\DMLibTest.csproj 214

In the project DMTestLib in file DMLibTest.csproj.

Actually I can't find any .csproj files in the project at all.

Can we have an async version of should overwrite callback?

As part of a project, I'm using this to upload files into azure blob storage. Part of the processing checks to see if the MD5 of a file on disk is the same as in blob storage, and if it is to skip the file.

Ideally I want to asynchronously calculate the MD5 for the files if they exist in azure, but the callback is not asynchronous.

Are there any plans to bring this more in line with other .NET Core APIs and make things async by default?

Upload sample with pause/resume

We're trying to upload a large file and want to support pause/resume. However the upload doesn't finish and is stuck in the while loop:

TransferManager.Configurations.BlockSize = 4 * 1024 * 1024; // 4 MB

// create stream with 100 MB
var content = Enumerable
    .Range(0, 100 * 1024 * 1024)
    .Select(i => (byte)'b')
    .ToArray();
var source = new MemoryStream(content);

TransferCheckpoint checkPoint = null;
while (true)
{
    Console.Write(".");
    // Resume if there is already a checkpoint
    var context = checkPoint != null
        ? new SingleTransferContext(checkPoint)
        : new SingleTransferContext();

    context.ShouldOverwriteCallback = (src, dest) => true;
    context.LogLevel = Microsoft.WindowsAzure.Storage.LogLevel.Verbose;
    var recorder = new ProgressRecorder();
    context.ProgressHandler = recorder;
    try
    {
        // Interrupt upload after 20 seconds
        using (var cts = new CancellationTokenSource(TimeSpan.FromSeconds(20)))
        {
            await TransferManager.UploadAsync(source, blob, null, context, cts.Token);
        }
        // break if no exception occured, i.e. the upload finished
        break;
    }
    catch (OperationCanceledException) { }
    checkPoint = context.LastCheckpoint; // save the last checkpoint
}

What are we doing wrong?

How to know which data slice is being copied

I have 40 million blobs totaling 10 TB in blob storage. I am using DML to copy these into another storage account for backup purposes. It took nearly 2 weeks to complete. Now I am worried about up to which date the blobs were copied to the target directory. Is it the date when the job started or the date the job finished?

Does DML use anything like data slices?

how to overwrite by datemodified?

I installed version 0.4.1 and there are some changes compared to the previous one. But I can't figure out how to overwrite my existing images if they have changed in my local directory. I believe I need to synchronize the modified date between source and destination, but how can I do that?

Is the ShouldOverwriteCallback function what I am looking for? How is it used?

thanks.
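
For illustration, here is a rough sketch of how ShouldOverwriteCallback might be used to overwrite only when the local file is newer; it assumes the callback receives the local file path as the source and the destination CloudBlob, which would need to be verified.

// Rough sketch (assumptions noted above): overwrite only when the local file is newer.
var context = new DirectoryTransferContext();
context.ShouldOverwriteCallback = (source, destination) =>
{
    string localPath = (string)source;           // assumed: local file path for uploads
    CloudBlob destBlob = (CloudBlob)destination; // assumed: destination blob reference
    destBlob.FetchAttributes();                  // read LastModified from the service
    DateTimeOffset localModified = File.GetLastWriteTimeUtc(localPath);
    return destBlob.Properties.LastModified == null
        || localModified > destBlob.Properties.LastModified.Value;
};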

ParallelOperations and DefaultConnectionLimit

I am implementing the ParallelOperations property on the TransferManager. I have run a number of tests with it set to different values (8, 16, 32, 64) and the results are confusing and inconsistent.

Is there any guidance for how to determine what this value should be?

Do I (or can I) set this value in conjunction with ServicePointManager.DefaultConnectionLimit where it is equal to Environment.ProcessorCount * 8? I want to make sure the ParallelOperations property is tuned to the VM it is running in.

Also, why is your guidance for DefaultConnectionLimit = Environment.ProcessorCount * 8? The guidance I have seen for this suggests it should be Environment.ProcessorCount * 12.

Thanks.

Setting CopyOptions for Incremental Copy

I'm having some trouble understanding the correct settings for CopyOptions to allow DML to overwrite a file when it exists on the transfer destination location.

I've tried multiple combinations of AccessConditions on the source and destination, but I am getting failures on the transfer.

What is the CopyOptions to allow for an incremental copy where the source is newer than the destination to force an overwrite?

Thanks much.

Performance Tips for blob copying

I'm doing some testing on a library that copies all the blobs from one storage account to another. I'm comparing the results of the copy operations to AzCopy and I'm finding AzCopy is orders of magnitude faster.

Are there any performance tips you can share which I can implement when using DML to get my library at or near AzCopy performance? I am already leveraging your guidance for DefaultConnectionLimit and Expect100Continue. Do I need to implement an async type of Parallel.ForEach() when iterating through blobs in a container and then calling CopyAsync() or is there another means I should look at?

Thanks,
Mark

[CoreCLR][Linux][KnownIssue]Got "The transfer failed: Cannot access a closed Stream." exception when retry an upload/download operation.

Got following exception when trying to upload to a page blob. This only happens on Linux when DMLib is retrying on a request.

Microsoft.WindowsAzure.Storage.DataMovement.TransferException: The transfer failed: Cannot access a closed Stream.. ---> Microsoft.WindowsAzure.Storage.StorageException: Cannot access a closed Stream. ---> System.ObjectDisposedException: Cannot access a closed Stream.

at System.IO.__Error.StreamIsClosed()

at System.IO.MemoryStream.Seek(Int64 offset, SeekOrigin loc)

at Microsoft.WindowsAzure.Storage.Shared.Protocol.HttpContentFactory.BuildContentFromStream[T](Stream stream, Int64 offset, Nullable1 length, String md5, RESTCommand1 cmd, OperationContext operationContext)

at Microsoft.WindowsAzure.Storage.Blob.CloudPageBlob.<>c__DisplayClass60_0.b__0(RESTCommand`1 cmd, OperationContext ctx)

at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.d__4`1.MoveNext()

--- End of inner exception stack trace ---

at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.d__4`1.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

at Microsoft.WindowsAzure.Storage.Blob.CloudPageBlob.<>c__DisplayClass47_0.<b__0>d.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.GetResult()

at Microsoft.WindowsAzure.Storage.DataMovement.TransferControllers.PageBlobWriter.d__12.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.GetResult()

at Microsoft.WindowsAzure.Storage.DataMovement.TransferControllers.RangeBasedWriter.d__20.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.GetResult()

at Microsoft.WindowsAzure.Storage.DataMovement.TransferControllers.RangeBasedWriter.d__19.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.GetResult()

at Microsoft.WindowsAzure.Storage.DataMovement.TransferControllers.RangeBasedWriter.d__13.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.GetResult()

at Microsoft.WindowsAzure.Storage.DataMovement.TransferControllers.SyncTransferController.d__13.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()

at Microsoft.WindowsAzure.Storage.DataMovement.TransferControllers.TransferControllerBase.d__30.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()

at Microsoft.WindowsAzure.Storage.DataMovement.TransferScheduler.d__22.MoveNext()

--- End of inner exception stack trace ---

at Microsoft.WindowsAzure.Storage.DataMovement.TransferScheduler.d__22.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.GetResult()

at Microsoft.WindowsAzure.Storage.DataMovement.SingleObjectTransfer.d__6.MoveNext() -> Microsoft.WindowsAzure.Storage.StorageException: Cannot access a closed Stream. ---> System.ObjectDisposedException: Cannot access a closed Stream.

at System.IO.__Error.StreamIsClosed()

at System.IO.MemoryStream.Seek(Int64 offset, SeekOrigin loc)

at Microsoft.WindowsAzure.Storage.Shared.Protocol.HttpContentFactory.BuildContentFromStream[T](Stream stream, Int64 offset, Nullable1 length, String md5, RESTCommand1 cmd, OperationContext operationContext)

at Microsoft.WindowsAzure.Storage.Blob.CloudPageBlob.<>c__DisplayClass60_0.b__0(RESTCommand`1 cmd, OperationContext ctx)

at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.d__4`1.MoveNext()

--- End of inner exception stack trace ---

at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.d__4`1.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

at Microsoft.WindowsAzure.Storage.Blob.CloudPageBlob.<>c__DisplayClass47_0.<b__0>d.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.GetResult()

at Microsoft.WindowsAzure.Storage.DataMovement.TransferControllers.PageBlobWriter.d__12.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.GetResult()

at Microsoft.WindowsAzure.Storage.DataMovement.TransferControllers.RangeBasedWriter.d__20.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.GetResult()

at Microsoft.WindowsAzure.Storage.DataMovement.TransferControllers.RangeBasedWriter.d__19.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.GetResult()

at Microsoft.WindowsAzure.Storage.DataMovement.TransferControllers.RangeBasedWriter.d__13.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.GetResult()

at Microsoft.WindowsAzure.Storage.DataMovement.TransferControllers.SyncTransferController.d__13.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()

at Microsoft.WindowsAzure.Storage.DataMovement.TransferControllers.TransferControllerBase.d__30.MoveNext()

--- End of stack trace from previous location where exception was thrown ---

at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)

at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()

at Microsoft.WindowsAzure.Storage.DataMovement.TransferScheduler.d__22.MoveNext()

Request Information

RequestID:

RequestDate:

StatusMessage:

Upload throughput drops when uploading larger files

Hi,

I'm unsure if this is the right place to voice this. I also have a thread open on MSDN.

I'm currently using ADML (0.5.0) on a machine with 6 physical cores @ 3.5GHz and a gigabit internet connection.

TransferManager.UploadAsync will upload a 1-2 GB file at an average rate of >800 Mbps; however, when I try to upload a 2TB file, the average upload rate struggles to break 100 Mbps. I have tried this on a couple of other machines with similar specs with the same results. AzCopy also struggles to break the 100 Mbps upload rate with the 2TB file, but will upload smaller files close to gigabit speeds.

All uploads are made after setting:

ServicePointManager.DefaultConnectionLimit = Environment.ProcessorCount * 8; ServicePointManager.Expect100Continue = false;

Am I neglecting to set some configuration to upload files this large? Is anyone else able to reproduce this?

[Known issue] Build fail for DataMovement_k.sln in Visual studio

repro step:

  1. Sync the DMLib GitHub repo to local
  2. Open and build DataMovement.sln in Visual Studio; the build passes
  3. Open and build DataMovement_k.sln in Visual Studio; it fails with:

Severity Code Description Project File Line Suppression State
Error An item with the same key has already been added. Key: Microsoft.WindowsAzure.Storage.DataMovement DMLibTest

Workaround:

  1. In a cmd window, cd netcore
  2. Run build.cmd
  3. In Visual Studio, with DataMovement_k.sln opened, clean the solution and rebuild; the build then passes

How to enable FIPS compliant MD5 setting in this library?

When we use DataMovement library to do blob copy, it generates “This implementation is not part of the Windows Platform FIPS validated cryptographic algorithms.” exception. How can we enable FIPS compliant MD5 setting in this library?

Copying Containers Functionality?

Is there a task in the pipeline to add support to copy all blobs from an existing container? Using AzCopy, we can pass the container name as part of the Uri and it will move all blobs in that container to another defined container. Is this supported by the new Data Movement library?
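
As a possible workaround today, here is a rough sketch that treats the container root as a CloudBlobDirectory and copies it with CopyDirectoryAsync; the GetDirectoryReference("") trick and the exact overload are my assumptions.

// Rough sketch: copy every blob in one container to another container.
CloudBlobDirectory sourceDir = sourceContainer.GetDirectoryReference(""); // container root (assumption)
CloudBlobDirectory destDir = destContainer.GetDirectoryReference("");
CopyDirectoryOptions options = new CopyDirectoryOptions { Recursive = true };
DirectoryTransferContext context = new DirectoryTransferContext();
var task = TransferManager.CopyDirectoryAsync(sourceDir, destDir, true /* isServiceCopy */, options, context);
task.Wait();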

NullReferenceException from MultipleObjectsTransfer when using UploadDirectoryAsync (possible hang on upload completion)

Using 0.2.0 of DMLib to facilitate transferring a set of files from a local physical disk to Azure Blob storage (on Azure Gov) via UploadDirectoryAsync. Testing with 12 files totaling roughly ~500 MB.

I'd guess about 50% of the time, we catch this exception on our app's unobserved handler:

TaskScheduler.UnobservedTaskException: Task1. Message: A Task's exception(s) were not observed either by Waiting on the Task or accessing its Exception property. As a result, the unobserved exception was rethrown by the finalizer thread. Exception: 1 System.NullReferenceException: Object reference not set to an instance of an object. at Microsoft.WindowsAzure.Storage.DataMovement.MultipleObjectsTransfer.<DoTransfer>d__16.MoveNext()

We've put the upload method in its own method in our app:

private static Task UploadToDirectory(TransferLocation source, TransferContext context, CancellationToken cToken, CloudBlobDirectory destination)
{
    var uploadOptions = new UploadDirectoryOptions
    {
        BlobType = BlobType.BlockBlob,
        Recursive = true,
        SearchPattern = "*"
    };
    return XferManager.UploadDirectoryAsync(source.SourceUri.LocalPath, destination, uploadOptions, context, cToken);
}

Nothing exotic there. Just aliasing TransferManager as XferManager as we have an existing type named that.

I pulled the UploadDirectoryAsync logic out of our production application into a test application to isolate it from our code. I wired up all the transfer context events to see all the file behaviour, as well as the operation context to monitor the requests. All files transfer without skips or fails, but the method's task never completes. I don't see any lingering requests. I think our production application is more aggressive in terminating the task, as I was able to leave the test app running overnight without the method ever completing. Just hangs and never completes.

Tried swapping the 6.2 Azure Storage library for 7.0 (and 7.0.1-preview) without any difference. Not sure if this an Azure Gov vs Commercial issue, a bug in the Data Movement library, or some interaction between the two.

Inject data transform logic

I have some scenarios where data needs to be transformed prior to uploading to storage. One example is using my own hash algorithm to encrypt a few columns of data. I am wondering if DM could provide an interface where we can pass in functions for data transformation, and DM would call them during the streaming process, for example applying the function to each line or each file, etc.

Blob download gets stuck

In our prod environment, I sometimes see a behavior where a file downloads to about 60% and then nothing happens. It does not error out or return, so the download is stuck indefinitely. The handle to the file being written is not released. I am assuming the task created in the StartSchedule() method of the TransferScheduler class is running indefinitely. The dumps do not reveal anything. Is there a way to figure out what the issue might be?

Update:
I am using version 0.5.0

[CoreCLR][KnownIssue]MacOS stores Hangul syllables in decomposed way

When downloading a blob whose name includes Korean Hangul syllables, its destination name may look different from the source name, because Azure may store Hangul syllables in precomposed form while macOS only stores Hangul syllables in decomposed form. For example, when the blob name includes 쮥, the destination file in the macOS file system will be 쮥.

Documentation Quality

Code samples / examples should be supplementary to reference quality documentation, especially for a class library. The documentation should focus first on providing conceptual help, and then code samples can be used to demonstrate those concepts.

This problem is rampant across Microsoft.

Cheers,
Trevor Sullivan
Microsoft MVP PowerShell
http://trevorsullivan.net
http://twitter.com/pcgeek86

Cancelling UploadAsync using CancellationToken can take 3 minutes for 100MB file.

Hi,
I'm using a CancellationToken to enable users to cancel a file upload, however for a 128MB file upload, it can take over 3 minutes for the UploadAsync exception to be raised, after the _cancellationToken.Cancel() has been called.
Note that I'm also using a TransferProgress object to track the progress of the upload, and after cancelling the upload via the token, I am still seeing my Report() method on my TransferProgress object being called multiple times before the UploadAsync exception is raised.
I've tried having a look through the DML code, but I need some help here finding out why it's taking so long. I would have hoped that the cancellation token is checked before the progress is updated? Can any improvements be made here?

Network disconnection in the middle of a download operation doesn't throw any exceptions

Using code like:

try
{
    ...
    await TransferManager.DownloadAsync(latestUpd, fs, options, context);
    ...
}
catch (Exception e)
{
    // logging code, etc
    ...
}

If I disconnect my wifi interface while it's running, no exceptions are thrown and the Task never completes, so it's stuck indefinitely. Is this a bug?

I know I can implement timeout functionality with CancellationToken, but because of variable network conditions and variable file sizes, it's extremely difficult to choose good timeout values.

It's slow when trying with code example

@nowenL I tried running this example and it's very slow; also, it's using my internet connection to download and upload the blobs to the destination container.
What is wrong?

Linked issue

[CoreCLR][KnownIssue]Hang in when cancelling a request due a dead lock in .Net Core framework

Encountered a hang during function tests against DMLib. It's a known issue in .Net Core framework: https://github.com/dotnet/corefx/issues/10040

Found from DMLib with System.Net.Http:4.1.0

Not Flagged 12708 37 Worker Thread Worker Thread System.Private.CoreLib.ni.dll!System.Threading.SpinWait.SpinOnce Normal
System.Private.CoreLib.ni.dll!System.Threading.SpinWait.SpinOnce()
System.Private.CoreLib.ni.dll!System.Threading.CancellationTokenSource.WaitForCallbackToComplete(System.Threading.CancellationCallbackInfo callbackInfo)
System.Net.Http.dll!System.Net.Http.WinHttpRequestState.DisposeCtrReadFromResponseStream()
System.Net.Http.dll!System.Net.Http.WinHttpRequestCallback.OnRequestReadComplete(System.Net.Http.WinHttpRequestState state, uint bytesRead)
System.Net.Http.dll!System.Net.Http.WinHttpRequestCallback.RequestCallback(System.IntPtr handle, System.Net.Http.WinHttpRequestState state, uint internetStatus, System.IntPtr statusInformation, uint statusInformationLength)
System.Net.Http.dll!System.Net.Http.WinHttpRequestCallback.WinHttpCallback(System.IntPtr handle, System.IntPtr context, uint internetStatus, System.IntPtr statusInformation, uint statusInformationLength)
[Native to Managed Transition]
[Managed to Native Transition]
System.Net.Http.dll!System.Net.Http.WinHttpResponseStream.ReadAsync.AnonymousMethod__0(System.Threading.Tasks.Task previousTask)
System.Private.CoreLib.ni.dll!System.Threading.Tasks.Task.Execute()
System.Private.CoreLib.ni.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state)
System.Private.CoreLib.ni.dll!System.Threading.Tasks.Task.ExecuteWithThreadLocal(ref System.Threading.Tasks.Task currentTaskSlot)
System.Private.CoreLib.ni.dll!System.Threading.Tasks.Task.ExecuteEntry(bool bPreventDoubleExecution)
System.Private.CoreLib.ni.dll!System.Threading.ThreadPoolWorkQueue.Dispatch()

Not Flagged 40036 45 Worker Thread Worker Thread System.Net.Http.dll!System.Net.Http.WinHttpResponseStream.CancelPendingResponseStreamReadOperation Normal
System.Net.Http.dll!System.Net.Http.WinHttpResponseStream.CancelPendingResponseStreamReadOperation()
System.Net.Http.dll!System.Net.Http.WinHttpResponseStream.ReadAsync.AnonymousMethod__17_1(object s)
System.Private.CoreLib.ni.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state)
System.Private.CoreLib.ni.dll!System.Threading.CancellationTokenSource.ExecuteCallbackHandlers(bool throwOnFirstException)
System.Private.CoreLib.ni.dll!System.Threading.CancellationTokenSource.ExecuteCallbackHandlers(bool throwOnFirstException)
System.Private.CoreLib.ni.dll!System.Threading.CancellationTokenSource.ExecuteCallbackHandlers(bool throwOnFirstException)
System.Private.CoreLib.ni.dll!System.Threading.CancellationTokenSource.Cancel()
Microsoft.WindowsAzure.Storage.DataMovement.dll!Microsoft.WindowsAzure.Storage.DataMovement.TransferControllers.TransferControllerBase.CancelWork.AnonymousMethod__34_0() Line 234
System.Private.CoreLib.ni.dll!System.Threading.Tasks.Task.Execute()
System.Private.CoreLib.ni.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state)
System.Private.CoreLib.ni.dll!System.Threading.Tasks.Task.ExecuteWithThreadLocal(ref System.Threading.Tasks.Task currentTaskSlot)
System.Private.CoreLib.ni.dll!System.Threading.Tasks.Task.ExecuteEntry(bool bPreventDoubleExecution)
System.Private.CoreLib.ni.dll!System.Threading.ThreadPoolWorkQueue.Dispatch()

How do you shut down TransferManager?

I have a windows service which use TransferManager to upload a backup file (as a blob) every night.

This works fine, but when the process has completed, the CPU for the service does not return to zero and can be as high as 10%.

The problem is not as noticeable on my development machine, but attempting to profile the service still shows it spends quite a lot of time in the TransferScheduler methods StartSchedule and FillInQueue.

Is there some way I can stop all activity until the next backup?

CopyOptions: blobs not copied when using AccessCondition on the destination

I am trying to perform a blob copy only if the destination blob does not already exist (I don't want to overwrite existing destination blobs, even if the source changed).

The following successfully copies my blob from one storage container to the other:

var copyTask = TransferManager.CopyAsync(srcBlobRef, destBlobRef, true);

However, when I use CopyOptions, the blob does not get copied to the destination container, even when the destination blob does not exist:

var opts = new CopyOptions
{
    DestinationAccessCondition = AccessCondition.GenerateIfNotExistsCondition()
};
var copyTask = TransferManager.CopyAsync(srcBlobRef, destBlobRef, true, opts, null);

Because the destination blob does not exist, I would expect the copy operation to execute.

Support for client side encryption

Not sure if it's possible, but it would sure be nice to support the client side encryption available in the normal Azure storage SDK. Since most of the encryption stuff has fixed size result blocks it should be mathematically doable just very unsure how you would hook it all together for parallel type operations.

Copy directory from File Storage to Blob Storage is much slower than AzCopy

When copying a File Storage directory using DML, I would expect performance comparable to what I get when I do the same with AzCopy.

When I copy around 4500 files sized between 5KB and 150KB within the same Storage Account, it takes around 90 seconds with AzCopy. With DML it takes around 400 seconds. Sometimes it even times out inside an Azure Storage library.

I set the parameter isServiceCopy to true in my method call. But I have a hypothesis that it actually downloads everything to my computer and uploads it again. I base this on inspecting my network activity during operation. It is around 400-500 Kbps during the transfer and it drops to almost zero when it is done.

The command I run from AzCopy is this.
> azcopy /Source:"https://<myAccount>.file.core.windows.net/fileshare/folder/" /SourceKey:<key> /Dest:"https://<myAccount>.blob.core.windows.net/container/folder" /DestKey:<key> /S

This is taken from my source code using DML:

TransferManager.Configurations.ParallelOperations = 64;
ServicePointManager.DefaultConnectionLimit = Environment.ProcessorCount * 8;
ServicePointManager.Expect100Continue = false;
var result = await TransferManager.CopyDirectoryAsync(fileDirectory, cloudBlobDirectory, true, new CopyDirectoryOptions {Recursive = true}, new DirectoryTransferContext());

404 when using a SAS in local storage

When I try to use a SAS to write a blob from the client library to development storage, I get a 404 error saying that blob cannot be found. I can change the permissions of the SAS but to no effect. It worked fine when I used UploadAsync but now that I am trying the movement library it doesn't work. I'm wondering if the SAS is not getting sent when using PutBlock? Or is there something that I need to do first?

UploadAsync Blob SAS Issue

I am trying to use a SAS to upload via DM Library. I get a 404 and XML Exception when doing so and nothing is uploaded.
I had a similar result for #15. The SAS that I use is generated for the Blob on a server and then returned to application to be used. Which permissions do I need for the SAS in order to upload?

The exception is:
"System.Xml.XmlException":"Root element is missing."
Source: "System.Xml.ReaderWriter"
StackTrace: at System.Xml.XmlTextReaderImpl.Throw(Exception e)

It occurs in Executor.cs

Where it breaks at the exception, the execution state has a response: "Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature."

Here is my SAS example:
?sv=2015-12-11&sr=b&si=some-uploads&sig=somesignature&se=2016-12-20T17:34:31Z&sp=rw&api-version=2015-12-11&timeout=300

Here is my code:


var cred = new StorageCredentials(sas);

var blob = new CloudBlockBlob(uri, cred);

TransferManager.Configurations.ParallelOperations = 64;

var context = new SingleTransferContext();

context.ProgressHandler = new Progress<TransferStatus>((res) =>
{
	//progress
});

context.LogLevel = LogLevel.Verbose;

context.FileFailed += (obj, args) =>
{
  //handle failed (not called by the way)
};

await TransferManager.UploadAsync(fullFilePath, blob, null, context, token);

Request for Tech Review

Hi guys. I have created an update to the Azure WebJobs & AzCopy solution I built last year to use DML.

I am planning on writing a blog post on this solution but before I do I wanted to ask if you guys could review the solution I built to make sure I implemented the DML pieces correctly. I am particularly interested if my use of ForEachAsync() is how you parallelize copies in AzCopy.

If you could take a look that would be great. https://github.com/markjbrown/AzureDmlBackup The DML specific code is all within the DmlExec project.

Thanks much,
Mark

Rely on CloudBlob.BlobType rather than blob concrete type

I'm copying blobs using the following functionality:
await TransferManager.CopyAsync(sourceBlob, targetBlob, isServiceCopy: false);
Ideally, I would like to pass to CopyAsync the base class CloudBlob, as I don't care of the blob concrete type.
Internally, SyncTransferController class relies on the blobs concrete types, CloudBlockBlob/CloudPageBlob/CloudAppendBlob, rather than the BlobType property of the CloudBlob base class.

If the developer does not care about the blob's concrete type (for example, when implementing a copy service), this forces them to resolve the blob's concrete type prior to copying the blob, something like:

switch (blobType)
{
    case BlobType.AppendBlob:
        targetBlob = container.GetAppendBlobReference(targetBlobName);
        break;

    case BlobType.BlockBlob:
        targetBlob = container.GetBlockBlobReference(targetBlobName);
        break;

    case BlobType.PageBlob:
        targetBlob = container.GetPageBlobReference(targetBlobName);
        break;

    default:
        throw new Exception(string.Format("Unexpected blob type: {0}.", blobType));
}

Is it really required to internally use the concrete type rather than the BlobType property?

"Failed to validate destination." when using UploadDirectoryAsync

I tried UploadAsync to upload 1 file, everything works.

But when I tried to use UploadDirectoryAsync to recursively upload a folder, I hit this "Failed to validate destination" error, and the inner exception is "The remote server returned an error: (403) Forbidden."

CloudBlobContainer blobContainer = new CloudBlobContainer(new Uri(sasUrl));
CloudBlobDirectory destBlobDir = blobContainer.GetDirectoryReference("myDir");
UploadDirectoryOptions options = new UploadDirectoryOptions();
options.BlobType = BlobType.BlockBlob;
var task = TransferManager.UploadDirectoryAsync(sourceFolder, destBlobDir, options, context, CancellationToken.None);
task.Wait();

[CoreCLR][KnownIssue]Authentication failure when sending requests to a blob whose name includes special chars

There is an issue when trying to access a blob whose name includes special chars like "blob[name" on .NET Core: it gets an error of "Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature." This only occurs on Windows.
It can be easily reproduced with the following code:

            CloudBlobContainer container = new CloudBlobContainer(new Uri("http://account.blob.core.windows.net/container1"), new StorageCredentials("account", "***"));

            CloudBlockBlob blob = container.GetBlockBlobReference("blobname[blobname");

            blob.FetchAttributesAsync().Wait();

This is a known issue on the CoreFx side.
