Comments (32)

robertbastian commented on September 15, 2024

All questions of the form "what if a timezone wants to do something different than the rest of the metazone" should be answered by creating a new metazone. My expectation is that all zones in a metazone fully agree on offsets today and in the future, but maybe that's not guaranteed.

from icu4x.

sffc commented on September 15, 2024

One other note: I very frequently encounter people using "PST" to mean Pacific Time, not specifically Pacific Standard Time, and similarly with EST and CST and others. For example, it is very common to see people say "let's meet in San Francisco on September 7 at 10am PST", and if you show up at that time according to the TZDB/CLDR definition, unless it is a time zone nerds meetup, you will be an hour late.

What this means: this is all so imprecise anyway, so let's just land something reasonable and otherwise encourage people to use city-based time zone names. Maybe CLDR can focus on adding a short location format, such as "LA Time" or "NYC Time" to use instead of the ambiguous things it currently uses.

srl295 commented on September 15, 2024

Else, if the Standard Offset is 1 less than the IXDTF string's Offset: set zone_variant to Daylight.

Instead of '1 less' couldn't you query the tz data to look for a transition from that data and use it? In other words, couldn't your table have both a standard offset and a daylight offset?

| Time Zone ID | Standard Offset | Daylight Offset |
| --- | --- | --- |
| America/Los_Angeles | 8 | 7 |

Actually, querying the offset table for that exact time 2024-08-29T11:53:18 for America/Los_Angeles should result in an offset of -07:00 from GMT.
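
For illustration, here is a minimal sketch of the two-column table idea: store both offsets per zone and compare the offset from the IXDTF string against them, rather than assuming a fixed one-hour shift. The names `ZoneVariant`, `ZoneOffsets`, and `variant_for_offset` are hypothetical, and offsets are assumed to be stored in seconds east of UTC (so negative for America/Los_Angeles), which differs from the unsigned 8 and 7 in the table above.

```rust
/// Which named variant of a zone an offset corresponds to.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum ZoneVariant {
    Standard,
    Daylight,
}

/// Hypothetical per-zone record; both offsets are in seconds east of UTC.
#[derive(Debug, Clone, Copy)]
pub struct ZoneOffsets {
    pub standard: i32,
    /// `None` if the zone never observes DST.
    pub daylight: Option<i32>,
}

/// Returns the variant whose stored offset matches `offset`,
/// or `None` if the offset matches neither stored value.
pub fn variant_for_offset(zone: &ZoneOffsets, offset: i32) -> Option<ZoneVariant> {
    if offset == zone.standard {
        Some(ZoneVariant::Standard)
    } else if zone.daylight == Some(offset) {
        Some(ZoneVariant::Daylight)
    } else {
        None
    }
}
```

Because the comparison is against stored values, a zone whose daylight offset is not exactly one hour ahead of standard needs no special handling.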

sffc commented on September 15, 2024

My goal is, assuming that an IXDTF string is correct (has the correct offset for the given date, time, and time zone), format that data without relying directly on the TZDB at runtime.

I can store both the standard offset and daylight offset for each time zone. I guess my questions then would be:

  1. Does each IANA zone have a stable mapping of what offset is "standard" and which offset is "daylight"?
  2. Is the daylight offset ever not 1 hour more than the standard offset?

srl295 commented on September 15, 2024

@sffc

  1. Yes. In tzdb it's the SAVE column.
  2. In modern zones I'm not sure, but that's not a good reason to hard-code it.

sffc commented on September 15, 2024

Actually I guess the counter example is when a city switches from one metazone to another metazone, not just changing its transition dates, such as what happened last year in Chihuahua, Mexico, which switched from Mountain Time to Central Time

https://www.timeanddate.com/time/zone/mexico/chihuahua

So maybe this mapping needs to be from metazones, not time zones, to what their standard and daylight offsets are?

srl295 commented on September 15, 2024

Actually I guess the counter example is when a city switches from one metazone to another metazone, not just changing its transition dates, such as what happened last year in Chihuahua, Mexico, which switched from Mountain Time to Central Time

https://www.timeanddate.com/time/zone/mexico/chihuahua

So maybe this mapping needs to be from metazones, not time zones, to what their standard and daylight offsets are?

A metazone's offsets are valid for that zone only for a certain time period, so the Mexico_Pacific and America_Central offsets will be different.

https://github.com/eggert/tz/blob/main/northamerica#L2731-L2732

			<timezone type="America/Chihuahua">
				<usesMetazone to="1998-04-05 09:00" mzone="America_Central"/>
				<usesMetazone to="2022-10-30 08:00" from="1998-04-05 09:00" mzone="Mexico_Pacific"/>
				<usesMetazone from="2022-10-30 08:00" mzone="America_Central"/>
			</timezone>
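
To make the period lookup concrete, here is a rough sketch of resolving which metazone applies at a given instant, using the America/Chihuahua entries above. `MetazonePeriod` and `metazone_at` are made-up names, and the sketch assumes the query timestamp uses the same "YYYY-MM-DD HH:MM" form and time scale as the `from`/`to` attributes.

```rust
/// One `usesMetazone` entry: the metazone applies while `from <= t < to`.
/// Timestamps are zero-padded "YYYY-MM-DD HH:MM" strings, so comparing them
/// lexicographically also compares them chronologically.
pub struct MetazonePeriod {
    pub from: Option<&'static str>,
    pub to: Option<&'static str>,
    pub mzone: &'static str,
}

/// The America/Chihuahua periods from the XML above.
pub const CHIHUAHUA: [MetazonePeriod; 3] = [
    MetazonePeriod { from: None, to: Some("1998-04-05 09:00"), mzone: "America_Central" },
    MetazonePeriod {
        from: Some("1998-04-05 09:00"),
        to: Some("2022-10-30 08:00"),
        mzone: "Mexico_Pacific",
    },
    MetazonePeriod { from: Some("2022-10-30 08:00"), to: None, mzone: "America_Central" },
];

/// Returns the metazone in effect at `t`, if any period covers it.
/// A missing `from` or `to` means the period is open-ended on that side.
pub fn metazone_at(periods: &[MetazonePeriod], t: &str) -> Option<&'static str> {
    periods
        .iter()
        .find(|p| p.from.map_or(true, |f| f <= t) && p.to.map_or(true, |e| t < e))
        .map(|p| p.mzone)
}
```

The same zone resolves to different metazones, and therefore potentially different offset pairs, depending on the date.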

sffc commented on September 15, 2024

Does a particular metazone always have the same offsets corresponding to its standard and daylight variants?

sffc commented on September 15, 2024

It seems that ICU4C determines the zone variant by reading "is the current datetime DST or not" from the TZDB.

That bit appears fetchable from tzif, and it is in the tzif crate:

https://unicode-org.github.io/icu4x/rustdoc/tzif/data/tzif/struct.LocalTimeTypeRecord.html

I think my previous question though is still a valid question to ask. Does a particular metazone always have the same offsets corresponding to its standard and daylight variants? That could perhaps be data that could be added to CLDR.

Also, regarding whether the DST shift should be fixed at 1 hour: it seems that the ICU4C code currently assumes this in multiple places, such as https://github.com/unicode-org/icu/blob/eda184e6af63d6eee1b3a59c61d1695eef44fcb4/icu4c/source/i18n/timezone.cpp#L1241

BurntSushi commented on September 15, 2024

Also, regarding whether the DST shift should be fixed at 1 hour: it seems that the ICU4C code currently assumes this in multiple places

My favorite counter-example to this is Antarctica/Troll, which uses a DST shift of 2 hours:

$ tail -n1 /usr/share/zoneinfo/Antarctica/Troll
<+00>0<+02>-2,M3.5.0/1,M10.5.0/3

And then there is also the case of Ireland, whose DST shift is inverted from what's typical:

$ tail -n1 /usr/share/zoneinfo/Europe/Dublin
IST-1GMT0,M10.5.0,M3.5.0/1

As you noted, TZ strings invert the sign. So Europe/Dublin uses +0100 for standard time and +0000 for DST.
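
A rough sketch of reading the two offsets out of strings like these, flipping the POSIX sign so the result is in seconds east of UTC. The function names are hypothetical, only the part before the first comma is read, and this is a simplified reading of the POSIX TZ syntax rather than a complete parser.

```rust
/// Parses the leading `std offset [dst [offset]]` part of a POSIX TZ string
/// and returns `(standard, daylight)` in seconds east of UTC.
/// POSIX offsets are positive west of Greenwich, so the sign is flipped.
pub fn parse_posix_offsets(tz: &str) -> Option<(i32, Option<i32>)> {
    let head = tz.split(',').next()?; // ignore the transition rules
    let rest = skip_name(head)?;
    let (std_posix, rest) = parse_offset(rest)?;
    if rest.is_empty() {
        return Some((-std_posix, None)); // no DST abbreviation at all
    }
    let rest = skip_name(rest)?;
    // POSIX default: with no explicit DST offset, DST is one hour ahead.
    let dst_posix = if rest.is_empty() {
        std_posix - 3600
    } else {
        let (dst_posix, rest) = parse_offset(rest)?;
        if !rest.is_empty() {
            return None;
        }
        dst_posix
    };
    Some((-std_posix, Some(-dst_posix)))
}

/// Skips an abbreviation: either `<...>` or a run of ASCII letters.
fn skip_name(s: &str) -> Option<&str> {
    if let Some(quoted) = s.strip_prefix('<') {
        let end = quoted.find('>')?;
        Some(&quoted[end + 1..])
    } else {
        let end = s.find(|c: char| !c.is_ascii_alphabetic()).unwrap_or(s.len());
        if end == 0 { None } else { Some(&s[end..]) }
    }
}

/// Parses `[+-]hh[:mm[:ss]]`; returns (seconds, remaining input).
fn parse_offset(s: &str) -> Option<(i32, &str)> {
    let (sign, s) = match s.strip_prefix('-') {
        Some(r) => (-1, r),
        None => (1, s.strip_prefix('+').unwrap_or(s)),
    };
    let end = s.find(|c: char| !(c.is_ascii_digit() || c == ':')).unwrap_or(s.len());
    let (mut secs, mut unit, mut fields) = (0, 3600, 0);
    for field in s[..end].split(':') {
        if fields == 3 {
            return None; // more than hh:mm:ss
        }
        secs += i32::from(field.parse::<u8>().ok()?) * unit;
        unit /= 60;
        fields += 1;
    }
    Some((sign * secs, &s[end..]))
}
```

With the two strings above this gives `(0, Some(7200))` for Antarctica/Troll and `(3600, Some(0))` for Europe/Dublin, i.e. shifts of +2 hours and -1 hour respectively.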

nekevss commented on September 15, 2024

FWIW, here's a markdown table of the output of find -L /usr/share/zoneinfo/ -maxdepth 3 -type f,l | xargs tail -n1. Although, I think it does pull in some noise from /usr/share/zoneinfo/right/.

nekevss commented on September 15, 2024

The sign in the POSIX TZ string has already been noted, but I just found the quote below in the TZ Variable section of the GNU C Library manual.

This is positive if the local time zone is west of the Prime Meridian and negative if it is east. The hour must be between 0 and 24, and the minute and seconds between 0 and 59.

robertbastian commented on September 15, 2024

I think the question "how to set the ZoneVariant" is an XY problem. For formatting, we need a way to look up a time zone name given an offset (this is the only use case for ZoneVariant). The straightforward solution to this would be to instead of

"ampa": {
   "dt": "Pacific Daylight Time",
   "st": "Pacific Standard Time"
}

store

"ampa": {
   "-7:00": "Pacific Daylight Time",
   "-8:00": "Pacific Standard Time"
}

This doesn't require any additional lookup at runtime, as we already have the offset, and naturally handles any kind of DST (even multiple).
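
As a sketch of what that lookup could look like at format time, assuming the names are stored as (offset, name) pairs per metazone. `AMPA` and `specific_name` are illustrative names only, and offsets are assumed to be in minutes east of UTC.

```rust
/// Hypothetical offset-keyed name table for one metazone:
/// (offset in minutes east of UTC, display name).
pub const AMPA: &[(i16, &str)] = &[
    (-7 * 60, "Pacific Daylight Time"),
    (-8 * 60, "Pacific Standard Time"),
];

/// Looks up the specific name for `offset_minutes`. Returns `None` when the
/// metazone has no name for that offset, in which case a caller could fall
/// back to some other format such as localized GMT.
pub fn specific_name(
    names: &[(i16, &'static str)],
    offset_minutes: i16,
) -> Option<&'static str> {
    names
        .iter()
        .find(|(offset, _)| *offset == offset_minutes)
        .map(|(_, name)| *name)
}
```

The table can hold any number of entries, so a metazone with more than two named offsets would need no structural change.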

nordzilla commented on September 15, 2024

From @robertbastian

I think the question "how to set the ZoneVariant" is an XY problem. For formatting, we need a way to look up a time zone name given an offset (this is the only use case for ZoneVariant). The straightforward solution to this would be to instead of

"ampa": {
   "dt": "Pacific Daylight Time",
   "st": "Pacific Standard Time"
}

store

"ampa": {
   "-7:00": "Pacific Daylight Time",
   "-8:00": "Pacific Standard Time"
}

This doesn't require any additional lookup at runtime, as we already have the offset, and naturally handles any kind of DST (even multiple).


I agree; data in this format would be ideal.

"ampa": {
   "-7:00": "Pacific Daylight Time",
   "-8:00": "Pacific Standard Time"
}

This data could be added to supplemental/metaZones.xml in CLDR.

However, there are a few things to consider:


1) Has a metazone ever changed its associated time variants?

If not, the data is straightforward, exactly as shown above.

If so, this data could still reasonably be captured and added to the file.

Consider a hypothetical situation where America_Central (amce) decided to move its standard-time offset for all of its associated time zones by half an hour for one year, and then changed it back to the way it was before:

"ampa": {
   "-7:00": "Pacific Daylight Time",
   "-8:00": "Pacific Standard Time"
},
"amce": {
  "usesTimeVariants": {
    "-5:00": "Central Daylight Time",
    "-6:00": "Central Standard Time",
    "_to": "2024-09-06 00:00"
  },
  "usesTimeVariants": {
    "-5:00": "Central Daylight Time",
    "-5:30": "Central Standard Time",
    "_from": "2024-09-06 00:00",
    "_to": "2025-09-06 00:00"
  },
  "usesTimeVariants": {
    "-5:00": "Central Daylight Time",
    "-6:00": "Central Standard Time",
    "_from": "2025-09-06 00:00"
  },
},

This format seems reasonable and has the same structure as the mapping from time zone IDs to metazones in the same file.


2) What would happen if a time zone within an associated metazone observes the same time-variant offsets, but transitions among them at different datetimes than other zones within that metazone?

One relevant example of this is the recent proposal for some of the West Coast states to observe permanent Daylight Saving Time:

https://www.opb.org/article/2024/02/20/oregon-bill-to-end-daylight-saving-time-fails-legislature/

If this were the case, then the offset would remain UTC-7 year round, and those time zones, e.g. America/Los_Angeles would just format to Pacific Daylight Time year round.

This all seems okay to me.


3) What would happen if an individual time zone wants to use different offsets than the current time-variant offsets established by the metazone?

I am not aware of any existing case like this, but I think there are two reasonable solutions:

A) That time zone could switch to a new metazone (either new or preexisting) that matches its desired offsets. This happens all the time.

B) We could add that offset data to CLDR.

"ampa": {
   "-7:00": "Pacific Daylight Time",
   "-7:30": "Pacific Cool New Time",
   "-8:00": "Pacific Standard Time"
},

The time zones that use the prior offsets would go on as usual, and the time zone with the new offset would have its new localized name.

I recall a conversation with @sffc years ago that perhaps daylight_time and standard_time are not great identifiers within the icu4x code base, because sometimes it's formatted as "Summer Time" for example, and in the future it may be possible that there are more than 2 variants.

A format such as this would allow us to be agnostic of naming conventions, instead tying the internationalized name of the variant to an offset.

However, there are a few more considerations to take into account in this case:

3.1) What if a time zone wants to add a new offset, but have the same localized name as another offset?

"ampa": {
   "-7:00": "Pacific Daylight Time",
   "-7:30": "Pacific Standard Time",
   "-8:00": "Pacific Standard Time"
},

This probably wouldn't cause a data ambiguity issue, but I think it would be incredibly confusing, as "Pacific Standard Time" would now be semantically ambiguous.

This should not be allowed.

3.2) What if a metazone wants to add a new localized name for an offset that is already present?

"ampa": {
   "-7:00": "Pacific Daylight Time",
   "-7:00": "Pacific Cool New Time",
   "-8:00": "Pacific Standard Time"
},

This would cause a data issue and should not be allowed.


Conclusion

I don't feel that I have the cycles to take on this work myself right now, but I would support collaborating on making this data available (if people agree it is sound).

Here is an example of when the short metazone identifiers were added to that same CLDR file: https://unicode-org.atlassian.net/browse/CLDR-14607

Filing an issue on Jira would be a good next step if we reach a consensus here.

nordzilla commented on September 15, 2024

All questions of the form "what if a timezone wants to do something different than the rest of the metazone" should be answered by creating a new metazone. My expectation is that all zones in a metazone fully agree on offsets today and in the future, but maybe that's not guaranteed.

That would be much simpler and more stringent. I would agree with imposing these restrictions. I was just trying to think of all the cases.

sffc commented on September 15, 2024

My favorite counter-example to this is Antarctica/Troll, which uses a DST shift of 2 hours:

Another counter-example to the 60-minute transition: https://www.atlasobscura.com/places/lord-howe-islands-time

sffc commented on September 15, 2024

I agree with the workaround of creating a new metazone if the offset invariants ever break down. Metazones are purely a CLDR/ICU construction, not TZDB, so we have a lot of latitude for how we handle them.

For example, if all US West Coast states decided to abolish daylight saving time and that Pacific Time should be GMT-7 instead of GMT-8 (a proposal I don't support but which is good for illustrative purposes), then we would need to create a new metazone such as amp2, meaning "version 2 of ampa".

It is highly likely that such changes already occurred in the last 50 years, and we should probably look for them in datagen.

sffc commented on September 15, 2024

As far as data sources are concerned, it seems perfectly fine to me for this data to be derived from TZDB. Currently ICU4C uses TZDB to determine which zone variant to use when formatting, so if ICU4X used TZDB during datagen, then we should be able to guarantee consistency with ICU4C. ICU4X could manually spawn new "private use" metazones as needed.

sffc commented on September 15, 2024

OK, one other issue I realized. There are numerous countries that use their own country name as the metazone. The first one I pulled is "kyrg", Kyrgyzstan:

https://en.wikipedia.org/wiki/Kyrgyzstan_Time

Kyrgyzstan has switched between UTC+5 and UTC+6 multiple times, but presumably the metazone has not changed.

justingrant commented on September 15, 2024

https://en.wikipedia.org/wiki/Kyrgyzstan_Time

Kyrgyzstan has switched between UTC+5 and UTC+6 multiple times, but presumably the metazone has not changed.

Yeah, this was gonna be my concern: cases where oddball metazones are tidally locked to a country. I assume this fact means that the "use the offset only" idea won't work?

sffc commented on September 15, 2024

Yeah, this was gonna be my concern: cases where oddball metazones are tidally locked to a country. I assume this fact means that the "use the offset only" idea won't work?

I think it can still "work"; it's just something we need to factor in. A few ways of resolving this:

  1. Should Kyrgyzstan even have a specific (offset-based) time zone name, since it doesn't have a useful meaning? It is a generic (location-based) time zone name, not a specific time zone name. We could just remove it and fall back to the generic time zone name.
  2. If we need to have a specific time zone name, we could just add both UTC+5 and UTC+6 as offsets with the same name.
  3. Or, we could split it into two metazones.

justingrant commented on September 15, 2024

Maybe CLDR can focus on adding a short location format, such as "LA Time" or "NYC Time" to use instead of the ambiguous things it currently uses.

Normal people (other than those who are super-familiar with how IANA timezones work, which is a very small Venn diagram overlap with "normal people") don't use "LA Time" or "NYC time". So I'm not sure it'd make sense to add that to CLDR. I understand the desire for consistency, but this seems to be a case where there's no evading the inconsistency of human language use.

sffc commented on September 15, 2024

My hypothesis is that "normal people" would understand what you meant by "LA Time", even if they haven't often seen it before, and it is also the most unambiguous definition for an i18n library to produce.

yumaoka commented on September 15, 2024

Random comment for earlier replies.

  • IANA TZ Database files have DST flags, but that information is lost in standard zone data binaries. If you just look at the content of a zone data binary file, you cannot tell whether a given time is in DST or not. Of course, you can guess by looking at the offsets around that time. For example, the UTC offset of America/Los_Angeles on 2024-09-01T00:00:00Z is UTC-07:00, but the zone data binary has no info about whether that is DST or not. ICU wants to keep the info to support the old TimeZone API, so the ICU zone compiler was modified to store the flag along with the zone offset transition data.

  • The IANA TZ Database contains DST offsets that are not exactly 1 hour. For example, Australia/Lord_Howe advances 30 minutes in DST. Historically, many other zones have used DST changes other than 1 hour.

  • A metazone is not associated with specific UTC offsets; it is associated with a set of names. Because North America and Europe assign names associated with standard offsets, you might think standard offsets and metazones are related. Someone commented that metazones with multiple historic standard offsets are oddballs, but I would say North America and Europe are actually the exceptions.

I think the concept of ZoneVariant in the struct is problematic.

robertbastian commented on September 15, 2024

Random observation:

| Same time | Formatted with generic TZ |
| --- | --- |
| 2024-07-01T12:00:00-06:00[America/Denver] | 12:00 Mountain Time |
| 2024-07-01T11:00:00-07:00[America/Phoenix] | 11:00 Mountain Time |
| 2024-07-01T18:00:00Z | 18:00 UTC |

nordzilla commented on September 15, 2024

From @robertbastian:

Random observation:

| Same time | Formatted with generic TZ |
| --- | --- |
| 2024-07-01T12:00:00-06:00[America/Denver] | 12:00 Mountain Time |
| 2024-07-01T11:00:00-07:00[America/Phoenix] | 11:00 Mountain Time |
| 2024-07-01T18:00:00Z | 18:00 UTC |

These are all technically correct, though confusing. They're both Mountain Time. It's just that Denver is in Mountain Daylight Time and Phoenix is in Mountain Standard Time because Arizona does not observe DST.

I would argue that this is a reason why populating the ZoneVariant struct whenever possible is worthwhile.


EDIT:

Though, to clarify, the above "Mountain Time" formats are "Generic non-location format".

The UTS-35 spec defines several formats with fallbacking:

Generic non-location format

Examples: "Pacific Time" (long), "PT" (short)

Generic partial location format

Examples: "Pacific Time (Canada)" (long), "PT (Whitehorse)" (short)

Generic location format

Examples: "France Time", "Italy Time"

Specific non-location format

Examples: "Pacific Standard Time" (long), "PST" (short), "Pacific Daylight Time" (long), "PDT" (short)

Localized GMT format

Examples: "GMT+03:30" (long), "GMT+3:30" (short), "UTC-03.00" (long), "UTC" (for zero offset)

ISO 8601 time zone formats

Examples: "-0800" (basic), "-08:00" (extended), "Z" (for UTC)

It was years ago, so I'm not sure if the current implementations within ICU4X are exactly the same, but I tried to implement the fallbacking rules according to the spec.

The above strings have enough information available to utilize either Generic location format e.g. Phoenix Time, or Generic partial location format e.g. Mountain Time (Denver).
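
For the last entry in the list, the ISO 8601 styles ("-0800" basic, "-08:00" extended, "Z" for UTC) can be sketched as below. `format_iso8601_offset` is a hypothetical helper, not an ICU4X function; it assumes the offset is given in seconds east of UTC and simply drops any sub-minute part.

```rust
/// Formats an offset (seconds east of UTC) in the ISO 8601 styles listed
/// above. `extended` selects "-08:00" over "-0800"; a zero offset is "Z".
pub fn format_iso8601_offset(offset_seconds: i32, extended: bool) -> String {
    if offset_seconds == 0 {
        return "Z".to_string();
    }
    let sign = if offset_seconds < 0 { '-' } else { '+' };
    // Work on the magnitude so hours and minutes are both non-negative.
    let minutes = offset_seconds.unsigned_abs() / 60;
    let (h, m) = (minutes / 60, minutes % 60);
    if extended {
        format!("{sign}{h:02}:{m:02}")
    } else {
        format!("{sign}{h:02}{m:02}")
    }
}
```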

justingrant commented on September 15, 2024

Generic partial location format

Examples: "Pacific Time (Canada)" (long), "PT (Whitehorse)" (short)

FWIW, I think this is a nice solution to the problem described above: if there's a colloquial name for a time zone like "Pacific Time", it's still used, but with a disambiguator for less common cases like Arizona.

sffc commented on September 15, 2024

The observation about generic non-location being ambiguous is well known and largely working as intended. It should only be used if the location of the event is known from context. Here is the language I wrote for how to select your time zone style in semantic skeleta:

  • Specific: A time zone that unambiguously maps the time of day to an instant, which can be understood independently of the location or time of year. This field could resolve to specific non-location (pattern symbols "z", "zzzz") or offset (pattern symbols "O", "OOOO"), depending on the locale, length, and time zone identity.
  • Generic: A time zone based on the location of an event. This field could resolve to generic non-location (pattern symbols "v", "vvvv"), generic partial-location, or location (pattern symbol "VVVV"), depending on the locale, length, and time zone identity. Do not use this field if the location of the event is unknown from context, because doing so could lead to ambiguity.
  • Location: A time zone based on the identity of the IANA time zone. This field always resolves to the location format (pattern symbol "VVVV").
  • Offset: A time zone based on the time offset from UTC.

sffc commented on September 15, 2024

Example use cases where generic time zone style is acceptable:

  • Meet me at the Google San Francisco office at 11:00am Pacific Time.
  • All year round, the bells at St. John's Cathedral strike at 12:00pm Mountain Time.
  • It's best to hike the Grand Canyon before 4:00pm Mountain Time.
  • Your flight departs St. Louis Lambert airport at 6:25pm Central Time.

Note: In most or all of these cases, it would be acceptable to say "local time" or simply drop the qualifier.

Example where generic time is not acceptable and a different style should be used, unless the location is otherwise known from context:

  • The TV show starts at 6:00pm Mountain Time.
  • The teleconference starts at 8:00am Eastern Time.

My point is that there are enough legitimate use cases for generic non-location format, but since it could introduce ambiguity, it should only be used if the developer opts in.

robertbastian commented on September 15, 2024

Generic partial location format

Examples: "Pacific Time (Canada)" (long), "PT (Whitehorse)" (short)

This seems to be the non-ambiguous version of the generic non-location format. We don't seem to support this in ICU4X, however?


What we need for full correctness is a ZoneVariantCalculator that maps (TimeZoneBcp47Id, DateTime<Iso>) -> (UtcOffset, Option<UtcOffset>). It would do this by storing a sequence of ISO minutes with associated offsets for each zone, similar to MetaZonePeriodsV1.

If there is sufficient overlap between the offset list and the metazone list for each location, they could be combined, as the bulk of these structures will be the keys.
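
A minimal sketch of that shape, assuming each zone stores a list of period start times (in minutes on the ISO calendar from some fixed epoch) sorted ascending, each with its offset pair. `ZoneVariantCalculatorSketch` and `offsets_at` are illustrative names, the pair is read here as (standard offset, daylight offset if any), and the actual data layout would presumably be something more compact than a `HashMap`.

```rust
use std::collections::HashMap;

/// Offsets in seconds east of UTC: (standard, daylight if the zone has one).
pub type Offsets = (i32, Option<i32>);

/// Hypothetical calculator: for each zone ID, a list of (start, offsets)
/// sorted by `start`, where `start` is in minutes since a fixed epoch.
pub struct ZoneVariantCalculatorSketch {
    periods: HashMap<&'static str, Vec<(i32, Offsets)>>,
}

impl ZoneVariantCalculatorSketch {
    pub fn new(periods: HashMap<&'static str, Vec<(i32, Offsets)>>) -> Self {
        Self { periods }
    }

    /// Returns the offsets in effect for `zone` at `iso_minutes`, i.e. those
    /// of the last period starting at or before that instant. Returns `None`
    /// for an unknown zone or an instant before the first period.
    pub fn offsets_at(&self, zone: &str, iso_minutes: i32) -> Option<Offsets> {
        let list = self.periods.get(zone)?;
        // Binary search: number of periods whose start is <= iso_minutes.
        let idx = list.partition_point(|(start, _)| *start <= iso_minutes);
        idx.checked_sub(1).map(|i| list[i].1)
    }
}
```

The formatter would then compare the input's offset against the returned pair to decide which name to use.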

robertbastian commented on September 15, 2024

Re generic partial location format, it sounds like we're meant to detect when a metazone is ambiguous, and add the location to it. We can do that; I've found a lot of ambiguous metazones in #5515. We can extend the return value of MetazoneCalculator with an is_ambiguous flag, in which case the formatter would add the location (or the offset if locations aren't available).
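
Roughly, the formatter side of that could look like the sketch below. `MetazoneInfo` and `format_generic` are hypothetical, the "{0} ({1})" layout is assumed from the English examples ("Pacific Time (Canada)"), and using the offset as the qualifier is just one reading of the fallback described here.

```rust
/// Hypothetical result of a metazone lookup, extended with the
/// `is_ambiguous` flag proposed above.
pub struct MetazoneInfo {
    pub generic_name: &'static str,
    pub is_ambiguous: bool,
}

/// Formats a generic time zone name. Ambiguous metazones get a qualifier
/// in parentheses: the location if known, otherwise the offset string.
pub fn format_generic(
    mz: &MetazoneInfo,
    location: Option<&str>,
    gmt_offset: &str,
) -> String {
    if !mz.is_ambiguous {
        return mz.generic_name.to_string();
    }
    let qualifier = location.unwrap_or(gmt_offset);
    format!("{} ({})", mz.generic_name, qualifier)
}
```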

sffc commented on September 15, 2024

What we need for full correctness is a ZoneVariantCalculator that maps (TimeZoneBcp47Id, DateTime<Iso>) -> (UtcOffset, Option<UtcOffset>). It would do this by storing a sequence of ISO minutes with associated offsets for each zone, similar to MetaZonePeriodsV1.

If there is sufficient overlap between the offset list and the metazone list for each location, they could be combined, as the bulk of these structures will be the keys.

LGTM
