Comments (16)

Jinming-Hu commented on August 24, 2024

@blueww This seems to be a bug in Azurite. Azurite doesn't clear all blob properties when a blob is overwritten.

from azurite.

blueww commented on August 24, 2024

@mikamins, @Jinming-Hu

Thanks for the investigation!
I will look into it and update later.

blueww commented on August 24, 2024

@mikamins

The Azurite debug log shows no error, so the error you hit should be reported by the C++ SDK.

However, from the code above, it looks like you are using the old C++ SDK, which is already deprecated. (see link)

Could you please check whether you can repro this issue with the latest C++ SDK?
If so, we will continue investigating it.
Here's a migration guide from the old deprecated C++ SDK to the latest C++ SDK: https://github.com/Azure/azure-sdk-for-cpp/blob/main/sdk/storage/MigrationGuide.md

BTW, from the code, it looks like you first upload a blob that is 3 bytes long, then download 6 bytes from it.
Currently Azurite just returns the 3 bytes in the blob, which matches the server behavior per my test, so I'm not sure why the error happens. Please try to repro the issue with the latest C++ SDK. If it reproduces, we can ask the SDK team to look into why the error happens.
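
To make the "request 6 bytes, get 3" behavior concrete, here is a minimal sketch of clamping an inclusive byte range to the blob length. The function name `clamp_range` and the 416 branch for an unsatisfiable range are assumptions for illustration; this is not code from Azurite or the service.

```python
def clamp_range(start, end, blob_length):
    """Clamp an inclusive byte range to the blob size.

    Returns (status, content_range, content_length).
    """
    if blob_length == 0 or start >= blob_length:
        # Assumption: a range that starts past the end is unsatisfiable.
        return 416, f"bytes */{blob_length}", 0
    last = min(end, blob_length - 1)
    return 206, f"bytes {start}-{last}/{blob_length}", last - start + 1


# x-ms-range: bytes=0-5 against a 3-byte blob
print(clamp_range(0, 5, 3))  # (206, 'bytes 0-2/3', 3)
```

The result lines up with the `Content-Range: bytes 0-2/3` and `Content-Length: 3` values reported later in this thread.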

mikamins commented on August 24, 2024

Azurite is returning the correct status code 206 and returning the partial content as expected. The issue is in the response headers.

The latest C++ SDK performs all operations synchronously, so it can never be adopted by our team within XStore. Could you please investigate why Azurite does not work with Microsoft.Azure.Storage.CPP.v140 v7.5.0?

The MD5 and version headers are the only major differences I see between the responses from Azurite and Azure/Storage Emulator. One of them is causing issues with the SDK. Considering the exception message says "Calculated MD5 does not match existing property.", I suspect the MD5 header.

blueww commented on August 24, 2024

@mikamins

I compared the response headers from the server (captured with Fiddler) and from Azurite (taken from the Azurite debug log) for a GetBlob request with "x-ms-range: bytes=0-5" on a blob whose length is 3 bytes.
They are very similar.

Besides the same status code and the same content, they also have the same Content-Range, Content-Length, and x-ms-blob-content-md5 headers. Azurite has one additional header, content-md5, whose value is also correct.
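
That comparison can be checked mechanically. A minimal sketch, with the header names copied from the Azure and Azurite responses listed further down in this comment; the helper name `header_names` is hypothetical. HTTP header names are case-insensitive, so they are lowercased before the sets are compared.

```python
AZURE_HEADERS = [
    "Content-Length", "Content-Type", "Content-Range", "Last-Modified",
    "Accept-Ranges", "ETag", "Server", "x-ms-request-id",
    "x-ms-client-request-id", "x-ms-version", "x-ms-creation-time",
    "x-ms-blob-content-md5", "x-ms-lease-status", "x-ms-lease-state",
    "x-ms-blob-type", "x-ms-server-encrypted", "Date",
]

AZURITE_HEADERS = [
    "server", "last-modified", "x-ms-creation-time", "content-length",
    "content-type", "content-range", "etag", "content-md5",
    "x-ms-blob-type", "x-ms-lease-state", "x-ms-lease-status",
    "x-ms-client-request-id", "x-ms-request-id", "x-ms-version",
    "accept-ranges", "date", "x-ms-server-encrypted",
    "x-ms-blob-content-md5",
]


def header_names(names):
    """Normalize header names for a case-insensitive comparison."""
    return {name.lower() for name in names}


only_azurite = sorted(header_names(AZURITE_HEADERS) - header_names(AZURE_HEADERS))
only_azure = sorted(header_names(AZURE_HEADERS) - header_names(AZURITE_HEADERS))
print(only_azurite)  # ['content-md5']
print(only_azure)    # []
```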

So I'm not sure why the deprecated C++ SDK reports this error.
I can't repro this issue with other SDKs such as .NET.

It will need SDK team support to look into the deprecated SDK code and find the issue.
If you would like to continue the investigation, could you please file a GitHub issue against the C++ SDK and ask why the error is reported?
Once we know why the error happens, we will know how to fix it in Azurite.

Azure Server

HTTP/1.1 206 Partial Content
Content-Length: 3
Content-Type: application/octet-stream
Content-Range: bytes 0-2/3
Last-Modified: Wed, 12 Jun 2024 02:52:43 GMT
Accept-Ranges: bytes
ETag: "0x8DC8A8ABB866A3A"
Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
x-ms-request-id: cc35a03b-301e-0057-6173-bced59000000
x-ms-client-request-id: a2141546-1af3-4e05-b763-e4f220100d5a
x-ms-version: 2019-07-07
x-ms-creation-time: Wed, 12 Jun 2024 02:52:43 GMT
x-ms-blob-content-md5: kAFQmDzST7DWlj99KOF/cg==
x-ms-lease-status: unlocked
x-ms-lease-state: available
x-ms-blob-type: BlockBlob
x-ms-server-encrypted: true
Date: Wed, 12 Jun 2024 02:52:47 GMT

Azurite

Headers={
"server":"Azurite-Blob/3.30.0",
"last-modified":"Wed, 12 Jun 2024 02:50:58 GMT",
"x-ms-creation-time":"Wed, 12 Jun 2024 02:50:58 GMT",
"content-length":"3",
"content-type":"application/octet-stream",
"content-range":"bytes 0-2/3",
"etag":"\"0x1F4502392A75EB0\"",
"content-md5":"kAFQmDzST7DWlj99KOF/cg==",
"x-ms-blob-type":"BlockBlob",
"x-ms-lease-state":"available",
"x-ms-lease-status":"unlocked",
"x-ms-client-request-id":"4ac1e003-92a8-4722-b5db-819b53abf9fe",
"x-ms-request-id":"579187d7-dd0d-4c3f-9ab5-9197d75ff924",
"x-ms-version":"2024-05-04",
"accept-ranges":"bytes",
"date":"Wed, 12 Jun 2024 02:51:32 GMT",
"x-ms-server-encrypted":"true",
"x-ms-blob-content-md5":"kAFQmDzST7DWlj99KOF/cg=="}

Jinming-Hu commented on August 24, 2024

Hi @mikamins, we were not able to reproduce this issue with the latest versions of Azurite and the C++ SDK. Was the attached log generated with your sample code?

We found

2024-06-07T15:04:19.988Z 60120bfb-be88-425d-8db6-5d7a9eda1537 info: BlobStorageContextMiddleware: RequestMethod=PUT RequestURL=http://127.0.0.1/devstoreaccount1/unittest/43221676-0E2C-4EF8-AEDD-7FB73B1E18CA?comp=blocklist RequestHeaders:{"connection":"Keep-Alive","content-type":"","authorization":"SharedKey devstoreaccount1:4vOY+2gpZE7ww3MmewE83WaD4yiRuAgPdpEscZLzq2Y=","user-agent":"Azure-Storage/7.5.0 (Native; Windows; MSC_VER 1900)","x-ms-blob-content-md5":"1B2M2Y8AsgTpgAmY7PhCfg==","x-ms-client-request-id":"00eaef02-b1b5-48f1-b585-647d4ec2975f","x-ms-date":"Fri, 07 Jun 2024 15:04:19 GMT","x-ms-version":"2019-12-12","content-length":"90","host":"127.0.0.1:10000"} ClientIP=127.0.0.1 Protocol=http HTTPVersion=1.1

in your log. The header "x-ms-blob-content-md5":"1B2M2Y8AsgTpgAmY7PhCfg==" indicates that you set the blob content MD5 yourself, but your sample code doesn't do that.
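
For anyone inspecting an Azurite debug log the same way, here is a minimal sketch that pulls the request headers out of such a line. The `LOG_LINE` value is an abbreviated copy of the line above (several headers are omitted for brevity), and the helper name `request_headers` and the regular expression are assumptions based only on the layout of that one line.

```python
import json
import re

# Abbreviated copy of the log line quoted above; some headers are omitted.
LOG_LINE = (
    "2024-06-07T15:04:19.988Z 60120bfb-be88-425d-8db6-5d7a9eda1537 info: "
    "BlobStorageContextMiddleware: RequestMethod=PUT "
    "RequestURL=http://127.0.0.1/devstoreaccount1/unittest/"
    "43221676-0E2C-4EF8-AEDD-7FB73B1E18CA?comp=blocklist "
    'RequestHeaders:{"user-agent":"Azure-Storage/7.5.0 (Native; Windows; MSC_VER 1900)",'
    '"x-ms-blob-content-md5":"1B2M2Y8AsgTpgAmY7PhCfg==",'
    '"x-ms-version":"2019-12-12","content-length":"90"} '
    "ClientIP=127.0.0.1 Protocol=http HTTPVersion=1.1"
)


def request_headers(log_line):
    """Return the RequestHeaders JSON object of a log line as a dict."""
    match = re.search(r"RequestHeaders:(\{.*\}) ClientIP=", log_line)
    if match is None:
        return {}
    return json.loads(match.group(1))


headers = request_headers(LOG_LINE)
print(headers.get("x-ms-blob-content-md5"))  # 1B2M2Y8AsgTpgAmY7PhCfg==
```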

blueww commented on August 24, 2024

Hi @mikamins,

If you need the new C++ SDK to support async calls before you can upgrade to it, you can open an issue at https://github.com/Azure/azure-sdk-for-cpp/issues to request it.

mikamins commented on August 24, 2024

@Jinming-Hu Yes, the attached log was created with the sample code, using Azurite 3.30 and azure-storage-cpp 7.5.0. If you are unable to reproduce, could you provide the exact versions that you used and attach the log?

mikamins commented on August 24, 2024

I stepped through the sample code in more detail, and the SDK is doing the right thing. Azurite is returning an incorrect MD5 when downloading the blob.

Blob setup:

  • An empty blob is created
    • Note: md5("") = d41d8cd98f00b204e9800998ecf8427e, in base64 1B2M2Y8AsgTpgAmY7PhCfg==
  • A block with content "abc" is committed
    • Note: md5("abc") = 900150983cd24fb0d6963f7d28e17f72, in base64 kAFQmDzST7DWlj99KOF/cg==
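
The two digests above can be reproduced with a few lines of Python. This is only a sanity check of the values quoted in this comment; the helper name `md5_hex_and_base64` is made up for the example.

```python
import base64
import hashlib


def md5_hex_and_base64(data: bytes):
    """Return the MD5 of data as (hex string, base64 string)."""
    digest = hashlib.md5(data).digest()
    return digest.hex(), base64.b64encode(digest).decode("ascii")


print(md5_hex_and_base64(b""))
# ('d41d8cd98f00b204e9800998ecf8427e', '1B2M2Y8AsgTpgAmY7PhCfg==')
print(md5_hex_and_base64(b"abc"))
# ('900150983cd24fb0d6963f7d28e17f72', 'kAFQmDzST7DWlj99KOF/cg==')
```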

Failing download:

  • When downloading the byte range 0-5, Azurite correctly returns 206 Partial Content with content "abc". However, the response header content-md5: 1B2M2Y8AsgTpgAmY7PhCfg== is not correct
    • As noted above, 1B2M2Y8AsgTpgAmY7PhCfg== is the MD5 of an empty string, but the blob content is "abc" at this point
    • Since the response has a content-md5 header, the SDK verifies it against the expected MD5 (kAFQmDzST7DWlj99KOF/cg==), and throws on the mismatch
    • Note: Neither Azure nor the Storage Emulator includes a content-md5 header in this response
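
The check described in these bullets can be sketched as follows. This is an assumption about the behavior as observed here, not the SDK's actual code: `verify_download` and `Md5MismatchError` are hypothetical names, and only the error message text is taken from this thread.

```python
import base64
import hashlib


class Md5MismatchError(Exception):
    """Raised when the content-md5 header does not match the body."""


def verify_download(headers, body):
    """Validate the body against content-md5 only when the header is present."""
    expected = headers.get("content-md5")
    if expected is None:
        return  # no header, nothing to verify (the Azure case above)
    calculated = base64.b64encode(hashlib.md5(body).digest()).decode("ascii")
    if calculated != expected:
        raise Md5MismatchError("Calculated MD5 does not match existing property.")


# Header from the failing Azurite response: MD5 of "", body is "abc".
try:
    verify_download({"content-md5": "1B2M2Y8AsgTpgAmY7PhCfg=="}, b"abc")
except Md5MismatchError as error:
    print("download failed:", error)

# Without a content-md5 header the same body passes.
verify_download({"x-ms-blob-content-md5": "1B2M2Y8AsgTpgAmY7PhCfg=="}, b"abc")
```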

Log is attached:
azurite-2024-06-17.log

Download HTTP Request:

GET http://127.0.0.1:10000/devstoreaccount1/unittest/E2642A3C-58CF-4CA4-A7C5-2CE4C7A29B91 HTTP/1.1
Connection: Keep-Alive
Accept-Encoding: peerdist
Authorization: SharedKey devstoreaccount1:JhenggacHCvhOxTnO7qcK8+OaibtuQcSzPTkZ8zu6zw=
User-Agent: Azure-Storage/7.5.0 (Native; Windows; MSC_VER 1900)
x-ms-client-request-id: 31a5ac86-f1a3-458a-ba22-cc4400be02a9
x-ms-date: Tue, 18 Jun 2024 00:17:02 GMT
x-ms-range: bytes=0-5
x-ms-version: 2019-12-12
X-P2P-PeerDist: Version=1.1
X-P2P-PeerDistEx: MinContentInformation=1.0, MaxContentInformation=2.0
Host: 127.0.0.1:10000

Response with bad content-md5:

HTTP/1.1 206 Partial Content
Server: Azurite-Blob/3.30.0
last-modified: Tue, 18 Jun 2024 00:17:02 GMT
x-ms-creation-time: Tue, 18 Jun 2024 00:17:02 GMT
content-length: 3
content-type: application/octet-stream
content-range: bytes 0-2/3
etag: "0x22AF68371ECD940"
content-md5: 1B2M2Y8AsgTpgAmY7PhCfg==
x-ms-blob-type: BlockBlob
x-ms-lease-state: available
x-ms-lease-status: unlocked
x-ms-client-request-id: 31a5ac86-f1a3-458a-ba22-cc4400be02a9
x-ms-request-id: bff087cc-05f0-4c98-996f-0a39ccd4838e
x-ms-version: 2024-05-04
accept-ranges: bytes
date: Tue, 18 Jun 2024 00:17:02 GMT
x-ms-server-encrypted: true
x-ms-blob-content-md5: 1B2M2Y8AsgTpgAmY7PhCfg==
Connection: keep-alive
Keep-Alive: timeout=5

abc

blueww commented on August 24, 2024

@mikamins

I can't repro this with Azurite.
Azurite returns the correct content MD5 "kAFQmDzST7DWlj99KOF/cg==" after committing a block list with a block that contains "abc".

After looking into the debug log you shared in the comment above, I see that you set the header "x-ms-blob-content-md5":"1B2M2Y8AsgTpgAmY7PhCfg==" when committing the block list, so the wrong content MD5 is sent from the client side.
If the client sets the content MD5, Azurite respects it; otherwise Azurite should have the correct MD5.

2024-06-18T00:17:02.633Z 63c975e5-7b5d-4755-ac9a-a65e617053c7 info: BlobStorageContextMiddleware: RequestMethod=PUT RequestURL=http://127.0.0.1/devstoreaccount1/unittest/E2642A3C-58CF-4CA4-A7C5-2CE4C7A29B91?comp=blocklist RequestHeaders:{"connection":"Keep-Alive","content-type":"","authorization":"SharedKey devstoreaccount1:bprNXXG2v3W9YXS4l8Z9KS6A4MYYoiOhsMWQWqSoKd0=","user-agent":"Azure-Storage/7.5.0 (Native; Windows; MSC_VER 1900)","x-ms-blob-content-md5":"1B2M2Y8AsgTpgAmY7PhCfg==","x-ms-client-request-id":"31a5ac86-f1a3-458a-ba22-cc4400be02a9","x-ms-date":"Tue, 18 Jun 2024 00:17:02 GMT","x-ms-version":"2019-12-12","content-length":"90","host":"127.0.0.1:10000"} ClientIP=127.0.0.1 Protocol=http HTTPVersion=1.1

Jinming-Hu commented on August 24, 2024

@blueww The Track1 SDK keeps the state of a blob on the client side (the state includes blob properties). Is it possible that when we get the properties of the old blob (empty content), the local state is populated, and then the MD5 is sent out over the wire when calling CommitBlocks?

This cannot be reproed with public Azure because the public Azure service doesn't return blob-md5 for a partial read. Hmm, that explains everything.

blueww commented on August 24, 2024

Thanks @Jinming-Hu for the investigation!

Per the REST API doc, Put Blob should return Content-MD5, and Azurite is aligned with the REST API doc.
Besides, Azurite is returning the correct MD5. (If the user sets it, Azurite returns the user-set value.)

@mikamins
The suggested long-term fix for this issue is to upgrade to the latest C++ SDK.
Otherwise, a workaround is to clear the blob object's contentMD5 property before you run blob.upload_block_list(). Could you please try it and see if it works in your scenario?
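
To illustrate why clearing the cached MD5 helps, here is a toy model of the interaction discussed in this thread. Everything in it is hypothetical: `FakeService` and `FakeBlobClient` are stand-ins written for this example, not Azurite or SDK code. The model assumes only what the comments above describe: Put Blob returns Content-MD5, the client caches it, and the service stores a client-supplied MD5 as-is when a block list is committed.

```python
import base64
import hashlib


def md5_b64(data: bytes) -> str:
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


class FakeService:
    """Hypothetical stand-in for the blob service."""

    def __init__(self):
        self.content = b""
        self.content_md5 = md5_b64(b"")

    def put_blob(self, data):
        self.content = data
        self.content_md5 = md5_b64(data)
        return {"content-md5": self.content_md5}

    def commit_block_list(self, data, headers):
        self.content = data
        # A client-supplied MD5 is respected; otherwise it is computed.
        self.content_md5 = headers.get("x-ms-blob-content-md5") or md5_b64(data)


class FakeBlobClient:
    """Hypothetical client that caches blob properties locally."""

    def __init__(self, service):
        self.service = service
        self.cached_md5 = ""

    def create_empty(self):
        response = self.service.put_blob(b"")
        self.cached_md5 = response["content-md5"]  # local state is populated

    def upload_block_list(self, data):
        headers = {}
        if self.cached_md5:
            headers["x-ms-blob-content-md5"] = self.cached_md5
        self.service.commit_block_list(data, headers)


def stored_md5_after_overwrite(clear_cached_md5):
    service = FakeService()
    client = FakeBlobClient(service)
    client.create_empty()
    if clear_cached_md5:
        client.cached_md5 = ""  # the workaround: clear before committing
    client.upload_block_list(b"abc")
    return service.content_md5


print(stored_md5_after_overwrite(False))  # 1B2M2Y8AsgTpgAmY7PhCfg== (stale, MD5 of "")
print(stored_md5_after_overwrite(True))   # kAFQmDzST7DWlj99KOF/cg== (MD5 of "abc")
```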

Jinming-Hu commented on August 24, 2024

@blueww

Per the REST API doc:

If the request is to read a specified range and the x-ms-range-get-content-md5 is set to true, the request returns an MD5 hash for the range, as long as the range size is less than or equal to 4 MiB.
If neither of these sets of conditions is true, no value is returned for the Content-MD5 header.

Azurite should fix its wrong behavior.
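
The rule quoted from the REST API doc can be sketched as a small decision function. This is a sketch under stated assumptions, not Azurite's implementation: the function name and parameters are hypothetical, and the full-read branch (return the stored MD5) is my reading of the first set of conditions, which the quote above does not include.

```python
import base64
import hashlib

FOUR_MIB = 4 * 1024 * 1024


def content_md5_for_get_blob(blob, stored_md5, byte_range=None,
                             range_get_content_md5=False):
    """Return the Content-MD5 value for a Get Blob response, or None."""
    if byte_range is None:
        # Assumption: a full read returns the MD5 stored with the blob.
        return stored_md5
    start, end = byte_range
    chunk = blob[start:end + 1]  # inclusive range, clamped by slicing
    if range_get_content_md5 and len(chunk) <= FOUR_MIB:
        # MD5 of the returned range, not of the whole blob.
        return base64.b64encode(hashlib.md5(chunk).digest()).decode("ascii")
    return None  # range read without x-ms-range-get-content-md5: true


stale = "1B2M2Y8AsgTpgAmY7PhCfg=="
print(content_md5_for_get_blob(b"abc", stale, byte_range=(0, 5)))
# None
print(content_md5_for_get_blob(b"abc", stale, byte_range=(0, 5),
                               range_get_content_md5=True))
# kAFQmDzST7DWlj99KOF/cg==
```

Under this rule the range download from this issue carries no Content-MD5 header at all, so a stale stored value never reaches the client-side check.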

blueww commented on August 24, 2024

@Jinming-Hu

The REST API doc you shared is for the Get Blob API.
But the API that returns the content MD5 here is Put Blob (per the C++ code and the Azurite debug log in this issue, the blob object gets the Content-MD5 "1B2M2Y8AsgTpgAmY7PhCfg==" from Put Blob with size 0).
The Put Blob API doc should be the one I shared: REST API doc.
And Content-MD5 should be returned per this API doc.

Jinming-Hu commented on August 24, 2024

@blueww I don't think we're on the same page. Anyway, the workaround you proposed does sound good to me.

blueww commented on August 24, 2024

Closing, as the fix PR (#2417, "Download blob range only return ContentMD5 when request has header x-ms-range-get-content-md5") has been merged. It will be in the next Azurite release.
