
Introduction

Green Web Foundation API

In this repo you can find the source code for the API and the checking code that the Green Web Foundation servers use to determine whether a domain runs on green power.


Overview

Following Simon Brown's C4 model, this repo includes the API server code, along with the green check worker code in packages/greencheck.


Apps - API Server at api.thegreenwebfoundation.org

This repository contains the code served to you when you visit http://api.thegreenwebfoundation.org.

When requests come in, Symfony accepts and validates the request, and creates a job for enqueue to service with a worker.


The greenweb API application runs at https://api.thegreenwebfoundation.org.

It provides a backend for the browser extensions and the website at https://www.thegreenwebfoundation.org.

This needs:

  • an enqueue adapter, such as fs for development or amqp for production
  • php 7.3
  • nginx
  • redis for the greencheck library
  • ansible and ssh access to the server for deploys

Currently runs on Symfony 5.x.

To start development:

  • Clone the monorepo: git clone git@github.com:thegreenwebfoundation/thegreenwebfoundation.git
  • Configure .env.local (copy from .env) for a local mysql database - see the example below
  • composer install
  • bin/console server:run
  • Check the fixtures in packages/greencheck/src/TGWF/Fixtures to set up a fixture database
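
For illustration, a minimal .env.local might look like the following - the credentials, database name, and queue path here are placeholders, not values from this repo:

# .env.local - local overrides, not committed to the repo
DATABASE_URL=mysql://db_user:db_password@127.0.0.1:3306/greencheck
# 'fs' enqueue adapter for development; amqp would be used in production
ENQUEUE_DSN=file://var/queue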

To deploy:

  • bin/deploy

To test locally:

Packages - Greencheck

In packages/greencheck is the library used for carrying out checks against the Green Web Foundation database. Workers take jobs from a RabbitMQ queue and call the greencheck code to compute the result quickly, before passing the result back, RPC-style, to the original calling code in the Symfony API server.
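
The workers in this repo are PHP, but the queue handling follows the standard RabbitMQ RPC pattern. As a minimal sketch of that pattern in Python with the pika client - the queue name and the greencheck stub are assumptions for illustration, not this repo's code:

import pika

def greencheck(domain):
    # placeholder: the real library looks the domain up in the green domains database
    return f"{domain}: unknown"

def on_request(ch, method, props, body):
    result = greencheck(body.decode())
    # send the result back, RPC-style, to the queue named in reply_to
    ch.basic_publish(
        exchange="",
        routing_key=props.reply_to,
        properties=pika.BasicProperties(correlation_id=props.correlation_id),
        body=result.encode(),
    )
    ch.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="greencheck")  # hypothetical queue name
channel.basic_consume(queue="greencheck", on_message_callback=on_request)
channel.start_consuming()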


Packages - public suffix

In packages/publicsuffix is a library that provides helpers for retrieving the public suffix of a domain name, based on the Mozilla Public Suffix List. It is used by the API server.
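
The package here is PHP, but to show the kind of helper it provides, this is the equivalent lookup with Python's tldextract library, which also works from the Public Suffix List:

import tldextract

ext = tldextract.extract("forums.bbc.co.uk")
print(ext.suffix)             # "co.uk" - the public suffix
print(ext.registered_domain)  # "bbc.co.uk" - the suffix plus one label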

Contributors

fershad, hanopcan, mrchrisadams, ross-spencer


Issues

Extending to present more green credentials

Good work, Chris! While I totally understand that this is meant for presenting metadata for websites (like robots.txt) that can be crawled, I wonder if there's an opportunity to extend the idea to also present more green credentials for an organisation, such as the providers they choose to use.

For example, to show that the organisation chose a green energy supplier, you might have something like:

[electricity]
www.ecotricity.co.uk

[gas]
www.ecotricity.co.uk

Allowing for supporting documents to existing bodies requesting relevant data

There seems to be no discernible pattern for finding structured information on company websites about their actions relating to environmental or climate responsibility, even when they have invested significant amounts of time making shiny reports like Google has here:

https://storage.googleapis.com/gweb-sustainability.appspot.com/pdf/Google_2018-Environmental-Report.pdf

Or when Scout24 does something worth sharing:

https://csrbericht.scout24.com/wp-content/uploads/2018/04/180418_Scout24_GRI_Chapter_Environment.pdf

Or when companies share detailed info about their DC use:
https://storage.googleapis.com/gweb-sustainability.appspot.com/pdf/24x7-carbon-free-energy-data-centers.pdf

Or when they've made submissions to groups like the CDP, as Google did here:

https://storage.googleapis.com/gweb-environment.appspot.com/pdf/alphabet-2017-cdp-climate-change-response.pdf

Or when they've made filings in line with the SASB, where Etsy lists the steps they're taking:

See page 25 in this SEC filing. This stuff is really hard to find!

https://investors.etsy.com/financials/sec-filings/sec-filings-details/default.aspx?FilingId=13261228

Someone who's checking the information in a carbon.txt can reasonably be assumed to care about this too, so it's worth making it easier to find.

Define a set of accepted values for the "service" key

The current carbon.txt syntax allows upstream providers to be listed as an object with the keys domain and service. For example:

{ domain="cloud.google.com", service = "infrastructure" },

Currently, the service key is not used/referenced anywhere. However, it is easy to imagine a future where it might come in handy for reporting, checks, or other instances where data might be dissected.

The purpose of this issue is to define a set of accepted values for the service key.
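
As a purely hypothetical strawman - assuming the provider objects sit in a list, as the trailing comma above suggests - a set of accepted values might be used like this; none of these values are in the spec yet:

[upstream]
providers = [
    { domain = "cloud.google.com", service = "infrastructure" },
    { domain = "cdn.example.com", service = "cdn" },
    { domain = "buckets.example.com", service = "object-storage" },
]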

Support a way to confirm IP addresses without exposing them, to avoid DDoS attacks

As Matthew mentions, the web's shift from relying on transit to relying on CDNs is making it harder to use the public-facing IP of a site to tell whether it is using green power or not:

https://mobile.twitter.com/dracos/status/1268915142621835269

An example is Cloudflare: when they were offsetting for their North American network operations, it made things unclear when they were in the middle handling bandwidth. This was resolved somewhat when they switched over to accounting for the emissions in all their regions with RECs and so on, but as more people use them, it becomes much harder to tell if the origin server was running on green infra.

Listing the real IP might work in a carbon.txt file, but one of the key ideas behind DDoS protection, or using a CDN, is not exposing the origin server to attack; if you know the IP address, it's possible to target that server directly.

Define a way for provider locations to be specified

There are cases when one might use a provider's service in just one or a couple of regions. For example, you might use Object Storage services from Provider X, but only provision those services in that provider's us-east location.

In the current carbon.txt specification, there is no way to capture this detail. To provide more granularity and transparency through carbon.txt, there should be a method through which implementors can specify the provider regions they use as part of their service.
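
One hypothetical shape for this, reusing the syntax above - the region key and its value are illustrative, not part of the spec:

[upstream]
providers = [
    { domain = "cloud.google.com", service = "object-storage", region = "us-east1" },
]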

Work out how to make it discoverable - `well-known`, TXT records or root domains

We have loads of prior work to look to for establishing a convention for this.

  • The .well-known convention has been around for ages, and stops us polluting the root namespace

  • Google uses DNS TXT records as a way to tie information to a domain as well.

  • Amazon and others use email address conventions to check SSL certificates, by sending an email to confirm that information ought to be associated with a domain. This page outlines how it works.
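
For a feel of the DNS TXT route, a lookup and a possible record value might look like this - the domain and record value are illustrative:

$ dig +short TXT example.com
"carbon-txt=https://example.com/carbon.txt"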

What would this look like?

What about individual websites?

This issue has been created to track future discussion on how the carbon.txt specification can be adopted by individual website owners for their own sites.

Document flow for checking a carbon.txt file

The flow is as follows (a sketch in code follows the list):

  1. Check the domain name is a valid one.
  2. Check if there is a carbon-txt DNS TXT record for the given domain.
  3. Perform an HTTP request at https://domain.com/carbon.txt, OR at the override URL given as the value in the DNS TXT lookup.
  4. If there is a valid 200 response and a parseable file, parse the file.
  5. If there is no valid 200/OK response at https://domain.com/carbon.txt (i.e. a 404 or 403), check the HTTP response for a Via header naming a new domain to check.
  6. Repeat steps 1 through 5 until we end up with a 200 response with a parseable carbon.txt payload, or a failed request (i.e. 40x, 50x) with no HTTP Via header.
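
A minimal sketch of that loop in Python, using the dnspython and requests libraries - step 1's domain validation is omitted, and Via parsing is simplified:

import dns.exception
import dns.resolver
import requests

def find_carbon_txt(domain, max_hops=5):
    for _ in range(max_hops):
        # step 3's default location, unless a TXT record overrides it (step 2)
        url = f"https://{domain}/carbon.txt"
        try:
            for record in dns.resolver.resolve(domain, "TXT"):
                text = record.to_text().strip('"')
                if text.startswith("carbon-txt="):
                    url = text.split("=", 1)[1]
        except dns.exception.DNSException:
            pass  # no TXT override; use the default URL
        response = requests.get(url)
        if response.ok:
            return response.text  # step 4: hand off to the carbon.txt parser
        # step 5: follow a Via header to a new domain, if one is present
        via = response.headers.get("Via")
        if via is None:
            return None
        domain = via.split()[-1]  # e.g. "1.1 alternative-domain.com"
    return None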

Why do it this way?

This flow is designed to allow CDNs and managed service providers to serve information in a default carbon.txt file, whilst allowing "downstream" providers to share their own, more detailed information if need be.

Why support the carbon.txt DNS TXT record?

Supporting the DNS lookup allows an organisation that owns or operates multiple domains to point them all at a single URL to maintain.

If you served traffic from a domain like cdn-domain.com, you would add a TXT record to cdn-domain.com with the following content:

carbon-txt=https://actual-domain.com/carbon.txt

This sets an override URL, allowing multiple domains to point to the one carbon.txt file for an organisation.

The "override URL" also allows for organisations that prefer to serve their file from a .well-known directory to do so:

carbon-txt=https://actual-domain.com/.well-known/carbon.txt

This allows folks to support the .well-known convention of storing files in a clearly identified place where it makes sense to do so, without requiring it of people who do not know what a .well-known directory is, or who do not have control over what may be written to the .well-known directory on a server.

Why use the Via header?

Consider the case where managed-service-provider.com is hosting customer-a.com's website.

The managed service provider may be offering a CDN or managed hosting service, but they may not have control over the customer-a.com domain. They may not have, or want, direct control over what a downstream user is sharing at a given URL. However, because they are offering some service "in front" of customer-a's website, and serving it over a secure connection, they are able to add headers to HTTP responses.

The HTTP Via header exists specifically to serve this purpose, and provides a well-specified way to pass along the domain of the organisation providing a managed service, when that domain is different.

The link above outlines the spec, but for convenience you would add a header looking like so:

Via: 1.1 alternative-domain.com
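
For example, a provider using nginx could add the header with a single directive in their server block - the domain here is illustrative:

add_header Via "1.1 alternative-domain.com";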

Why use domain/carbon.txt as the path?

Defaulting to a root carbon.txt makes it possible to implement a carbon.txt file without needing to know about .well-known directories, which by convention are normally hidden files. Having a single default place to look avoids needing to support a hierarchy of potential locations, and precedence rules for where to look - there is either one default place to make an HTTP request, OR the single override.

Create examples in a directory for common use cases

Ideally we'd have some examples to show what this might look like for common stacks, so people who run them can understand how this would look, and to make it easier to adopt.

Own sites, and small-medium hosting company

  • Own static site on own server -> hosting co -> DC provider -> Energy Co -> Carbon Credits
  • Own wordpress.org on own server -> hosting co -> DC provider -> Energy Co -> Carbon Credits
  • Managed hosting of wordpress.org -> DC Provider -> Energy Co -> Carbon Credits

Own sites, and cloud giant (Digital Ocean, M$, GCP, AWS, etc.)

The main difference here is that the bigger cloud companies tend to act as both hosting provider and DC operator, as they often own their own DCs. They often have more complicated products in their portfolio, like hosted database as a service, object storage, and so on (smaller ones have these too, but it's less common).

  • Own static site on own server -> hosting co -> DC provider -> Energy Co -> Carbon Credits
  • Own dynamic site on own server -> hosting co -> DC provider -> Energy Co -> Carbon Credits
  • Own static site on object storage -> hosting co -> DC provider -> Energy Co -> Carbon Credits

Larger hosting company, providing carbon.txt data for all their customer's sites

  • hosted wordpress.com style site -> xxx -> Energy Co -> Carbon Credits

To think about

  • PaaS
  • Hosted static sites like Netlify etc. (would it be different to the static-site-on-own-server case? Not sure)
  • Distributed web cases (this sounds really hard - Dat acts like a kind of collaborative CDN, for example, so in theory every peer would need to be able to declare where its own power came from…)

Define how to include comments

I might have missed it, but there doesn't seem to be any mention of how to include human-readable comments in the carbon.txt format. Ignoring anything from a "#" char to the end of line would be fairly standard, and is what robots.txt uses.
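
If that convention were adopted, a commented carbon.txt might look like this - the contents are illustrative:

# green credentials for example.com; comments run from "#" to end of line
[upstream]
providers = [
    { domain = "cloud.google.com", service = "infrastructure" }, # our main host
]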

Supporting multiple domains

Websites rely on content from multiple domains. How do we handle this?

We could do something like:

6 out of 8 domains run on green power on this site.

use robots.txt-style 'Name: value' formatting

I think this is a fantastic approach and well worth doing. One thing that would improve it, IMO, would be to use a "Header: value" style more similar to robots.txt.

So for instance, instead of

[upstream]
krystal.co.uk

it'd use something like

[upstream]
Domain: krystal.co.uk

Or other kinds of identifier:

[upstream]
Name: Ecotricity Group Ltd

This gives more room for extensibility in future, where new forms of addressing or relationships can be supported.
