Comments (13)
Sorry, I don't know much about casing. @MichaelTsengLZ @jeffmcaffer or @jeffmendoza may be able to help here.
from curated-data.
I've proposed a PR at #7838.
from curated-data.
Thanks @sschuberth for finding this. I vaguely recall doing a deep dive on case preservation and sensitivity and feel like there's a doc or issue around that talks about our handling strategy. Seem to recall that Dan did this work back in the day. Can't find it right now. On the surface it would appear that this curation is the right thing to do. I am however concerned about the root cause and whether this this change will either perpetuate or exacerbate the problem (the latter seems unlikely).
It is entirely possible that the actual package changed their casing at some point in time. That seems especially possible in the NuGet world (seen it before). In that case (hah! pun) we'd have to fall back on the aforementioned casing analysis. My vague recollection is that we ended up at case preserving but case insensitive. The basic assumption being that we did not find any providers that were case sensitive (There were some old cases in npm land I think but they were old and appeared to addressed).
/cc @nellshamrell
from curated-data.
clearlydefined/service#157 talks about this somewhat at the implementation level. Still feel like there was an analysis of all the supported (and some prime potential) package managers with some summarization / discussion.
from curated-data.
At least for NuGet, given that the API requires the package ID to be lower case, maybe it makes sense to align on all lower case file names?
Or actually, does the casing of the file name ever matter? I guess not, as the name
from the coordinates
in the YAML should be taken for any lookups. Which means we could always (and for any package manager) align on lower case file names.
from curated-data.
I'm hesitant to make any concrete/general call on this without consulting the prior research. Casing is fraught with oddities. For this particular PR it should be possible to validate that with the changes merged, the curations are still applied to the relevant component/version.
Stepping back a bit, @sschuberth, how did this curation come about with different casing? Is this coming from ORT tooling using lowercase? or some other workflow?
from curated-data.
how did this curation come about with different casing?
I have no idea. To the best of my knowledge none of these curations were initially created by ORT or by myself in person. My brief checks just showed that the initial commit each was done by @clearlydefinedbot.
from curated-data.
yeah, that likely means that the PR was entered via the website and we are unable to do proper attribution (there was a technical challenge IIRC)
Anyway, I believe that NuGet is case preserving but not case sensitive. The metadata for DiscUtils.Core has mixed case (see below) while the protocol/service, as you point out, lowercases everything. So the user view of the package is mixed case.
<package xmlns="http://schemas.microsoft.com/packaging/2013/05/nuspec.xsd">
<metadata>
<id>DiscUtils.Core</id>
<version>0.15.1-ci0007</version>
Reaching way back I believe we our goal was to present users with the most authentic naming so are attempting to preserve the casing. So while the protocol is all lower, the package itself and the user view is mixed.
That still leaves us with the challenge of what casing to use in the underlying implementation given that Git is case-sensitive (your point here). I would have said that if the system is case IN-sensitive, we should toLower() the coordinates for storage. Again, reaching back, I recall there may have been a few places where we look at the storage layer and reverse engineer coordinates. That's not a great and we may have stopped doing that.
HAH! I found the doc
https://github.com/clearlydefined/clearlydefined/blob/533afead4c95e3ace9dbd727e2a166d89d229af8/docs/providers.md
Unfortunately that doesn't quite help in as much it captures observed behavior, not quite the reasoning why.
Path forward.
- This is an awesome discussion and it's likely a good thing to be revisited and properly documented (with reasoning). That should likely be done in either a
service
orclearlydefined
(near the doc) repo issue. Might get lost here - The PR cited here IMO should NOT be merged as it would vary from the stated behavior. Better to be consistently wrong (if we are at all) than inconsistently right. While that may make it LOOK like we split the curation, if we go by the casing in the package itself, we'll always have curation casing that matches the targeted package's casing.
from curated-data.
I would have said that if the system is case IN-sensitive, we should toLower() the coordinates for storage. Again, reaching back, I recall there may have been a few places where we look at the storage layer and reverse engineer coordinates. That's not a great and we may have stopped doing that.
So, what's missing for me in the path forward is:
- Find out whether there still are place where you reverse engineer coordinates from file path in the storage layer.
- If there are such places, get rid of them by looking at the file contents instead of the file path to deduce coordinates.
- Align on all lower case paths, but keep file contents to be case preserving.
- Ensure that no conflicting casing can be added via the website. That is, existing curations must be amended even if the casing is different but the system is case-preserving and case-insensitive.
The PR cited here IMO should NOT be merged
Then please feel free to close my PR (with a rationale pointing back to here) so that it does not accidentally get merged.
from curated-data.
@nellshamrell - unclear if this Issue can be closed?
from curated-data.
It's easy to verify the problem persists. Just try to clone this repo on Windows and you'll still get
Updating files: 100% (6359/6359), done.
warning: the following paths have collided (e.g. case-sensitive paths
on a case-insensitive filesystem) and only one from the same
colliding group is in the working tree:
'curations/git/github/Azure/azure-sdk-for-go.yaml'
'curations/git/github/azure/azure-sdk-for-go.yaml'
'curations/nuget/nuget/-/7Zip4PowerShell.yaml'
'curations/nuget/nuget/-/7Zip4Powershell.yaml'
'curations/nuget/nuget/-/CodeProject.ObjectPool.yaml'
'curations/nuget/nuget/-/codeproject.objectpool.yaml'
'curations/nuget/nuget/-/DiscUtils.yaml'
'curations/nuget/nuget/-/Discutils.yaml'
'curations/nuget/nuget/-/Elmah.MVC.yaml'
'curations/nuget/nuget/-/Elmah.Mvc.yaml'
'curations/nuget/nuget/-/EMGU.CV.yaml'
'curations/nuget/nuget/-/Emgu.CV.yaml'
'curations/nuget/nuget/-/EnvDTE.yaml'
'curations/nuget/nuget/-/envdte.yaml'
'curations/nuget/nuget/-/EnvDTE100.yaml'
'curations/nuget/nuget/-/envdte100.yaml'
'curations/nuget/nuget/-/EnvDTE80.yaml'
'curations/nuget/nuget/-/envdte80.yaml'
'curations/nuget/nuget/-/EnvDTE90.yaml'
'curations/nuget/nuget/-/envdte90.yaml'
'curations/nuget/nuget/-/EnvDTE90a.yaml'
'curations/nuget/nuget/-/envdte90a.yaml'
'curations/nuget/nuget/-/GitLink.yaml'
'curations/nuget/nuget/-/gitlink.yaml'
'curations/nuget/nuget/-/Jint.yaml'
'curations/nuget/nuget/-/jint.yaml'
'curations/nuget/nuget/-/LINQKit.yaml'
'curations/nuget/nuget/-/LinqKit.yaml'
'curations/nuget/nuget/-/Microsoft.AspNetCore.Authentication.JwtBearer.yaml'
'curations/nuget/nuget/-/microsoft.aspnetcore.authentication.jwtbearer.yaml'
'curations/nuget/nuget/-/MongoDB.LibMongocrypt.yaml'
'curations/nuget/nuget/-/MongoDB.Libmongocrypt.yaml'
'curations/nuget/nuget/-/MSBuildTasks.yaml'
'curations/nuget/nuget/-/MsBuildTasks.yaml'
'curations/nuget/nuget/-/PDFsharp-MigraDoc-GDI.yaml'
'curations/nuget/nuget/-/PDFsharp-MigraDoc-gdi.yaml'
'curations/nuget/nuget/-/React.Core.yaml'
'curations/nuget/nuget/-/react.core.yaml'
'curations/nuget/nuget/-/Semver.yaml'
'curations/nuget/nuget/-/semver.yaml'
'curations/nuget/nuget/-/StructureMap.yaml'
'curations/nuget/nuget/-/structuremap.yaml'
'curations/nuget/nuget/-/System.Data.HashFunction.SpookyHash.yaml'
'curations/nuget/nuget/-/system.data.hashfunction.spookyhash.yaml'
'curations/nuget/nuget/-/System.Windows.Interactivity.WPF.yaml'
'curations/nuget/nuget/-/system.windows.interactivity.wpf.yaml'
'curations/nuget/nuget/-/TinyMCE.JQuery.yaml'
'curations/nuget/nuget/-/TinyMCE.jQuery.yaml'
'curations/nuget/nuget/-/ValueInjecter.yaml'
'curations/nuget/nuget/-/valueinjecter.yaml'
'curations/nuget/nuget/-/VsWhere.yaml'
'curations/nuget/nuget/-/vswhere.yaml'
'curations/nuget/nuget/-/Xamarin.GooglePlayServices.Gcm.yaml'
'curations/nuget/nuget/-/xamarin.googleplayservices.gcm.yaml'
'curations/pypi/pypi/-/Pillow.yaml'
'curations/pypi/pypi/-/pillow.yaml'
'curations/pypi/pypi/-/Resource.yaml'
'curations/pypi/pypi/-/resource.yaml'
'curations/pypi/pypi/-/wxPython.yaml'
'curations/pypi/pypi/-/wxpython.yaml'
from curated-data.
@ariel11 let's keep it open. I don't know when I will have time to address this, but it's good to keep it on our backlog.
from curated-data.
From what I do I use lower case for my files one it's faster to write and to the mainframe will except lower case for filing. And it keeps it from working harder. You start doing fancy stuff then there become problems.as long as there is great security for the files
from curated-data.
Related Issues (20)
- How to add validation for kubernetes/apimachinery
- How to deal with derived/binary files? HOT 13
- Validation failing on valid curation HOT 16
- Declared license not recognized due to capitalization? HOT 1
- Update the default branch
- License information not updated HOT 1
- License info still not updated HOT 1
- License info still not updated HOT 2
- Support Version Wildcards HOT 1
- Implement PRs checks to validate that curated license expressions are actually valid SPDX HOT 2
- Help me
- License info still not updated for HotChocolate.Language 11.0.0-preview.58
- Help me amazing
- not found golang packages HOT 5
- Pending curation on an already merged curation.
- Issue contributing to curated data for DPDK stable HOT 1
- [Idea] Add harvest button to component pages
- Question: Does clearlydefined analyze go-lang dependency transitively? HOT 4
- Google.Protobuf.Tools NuGet packages may not be BSD 3-Clause only HOT 2
- Facets should not be arrays HOT 4
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from curated-data.