Giter Site home page Giter Site logo

Comments (3)

ppKrauss avatar ppKrauss commented on August 17, 2024

Introducing the internal 1 byte ID for country. Each country (or set of "sister countries") is a face in the DNGS mosaic.

PS: the numeric ISO Country code needs 10 bits. This was the problem, to express with 4 bytes all subdivisions, and encode all the hierarchical structures in a hCount int (27 bits).

country_1byte_id

The List of sovereign states have ~200 states. In a technical context, for Census and similar local applications, is possible to join some countries with its "sister country", as Vatican City State (VA), that can be joined with Italy (IT). Perhaps others, examples:

  • Åland Islands (AX), can be expressed as part of Finland.
  • Gibraltar (GI), with less than 10 km2, but it have ~33000 people, not so "insignificant".
COPY (
select *, 1+row_number() over (order by jurisd_base_id) as country_1byte_id 
   -- 1 for future inclusion of Antarctica
from (
  select DISTINCT  jurisd_base_id, substring(isolabel_ext,1,2) as iso2 
  from optim.jurisdiction order by 1
) t
where jurisd_base_id>1 and iso2!='VA' order by jurisd_base_id
)  -- 239 rows in 2024
to '/tmp/country_1byte_id.csv' CSV HEADER;

Some disputed nations like the Republic of Kosovo (XK) can be considered. Other disputed nations, like Diego Garcia (DG) are insignificant in its area (32 km2) and population (~ 4,200).

Status

At this moment removing only DG, XK and VA.
Check on updated (2022) list, https://github.com/datasets/un-locode/blob/main/data/country-codes.csv
But pending iso numeric code.

select l.* from lix l left join optim.jurisdiction j ON j.isolabel_ext=iso2 where j.isolabel_ext is null;
iso2 name
AQ Antarctica
AS American Samoa
PF French Polynesia
PM Saint Pierre and Miquelon
TF French Southern Territories
WF Wallis and Futuna
XZ Installations in International Waters
BL Saint Barthélemy
MF Saint Martin (French Part)
(9 rows)

So, pending to join AQ and perhaps more 8+3=11 nations.

Result

jurisd_base_id iso2 country_1byte_id
4 AF 2
8 AL 3
12 DZ 4
...
32 AR 9
...
68 BO 20
...
76 BR 23
...
858 UY 235
...
887 YE 239
894 ZM 240

from ggeohash.

ppKrauss avatar ppKrauss commented on August 17, 2024

Ok, "ID de face DNGS" está em https://github.com/osm-codes/WS/blob/main/data/country_1byte_id.csv
Ver osm-codes/WS#52

Como osm_id é bigint, não deveriamos e aparentemente não precisamos reduzir o ID de jurisdição ao um inteiro de 4 bytes, podem ser 8. Aí dá e sobra para códigos ISO e códigos locais.

from ggeohash.

ppKrauss avatar ppKrauss commented on August 17, 2024

Retomando o optim.jurisdiction.id mas agora como BigInt e reversível (retoma iso_country e local_id).

Funções de interpretação por país

Primeiros 12 bits para acomodar 10 bits do padrão ISO corrente e reserva de 2 bits para expansão na ISO, apesar de não ser visivel risco em horizonte de uma década. Hoje temos max=894, com 10 bits ainda restariam 130 países. Para os restantes, a meta é a reversibilidade, ou seja, com funções de encode/decode podemos resgatar os IDs oficiais de jurisdição.

Abaixo só revisar para os IDs da ISO, e resolvido.

-- Avaliador de perfil:
SELECT substring(isolabel_ext,1,2) as pais, isolevel, bit_MSB(jurisd_local_id) as bits,
       count(*) n 
FROM optim.jurisdiction where jurisd_local_id>0
GROUP by 1,2,3 order by 1, 4 desc
; -- Chile (CL) cabe em 27 bits, Brasil (BR) não cabe, mas não passa de 23 bits por município (isolevel 3). 
-- BR   |        3 |   22 | 3861
-- BR   |        3 |   23 | 1259

-- Função indepente da escolha de encode/decode:
CREATE or replace FUNCTION optim.jurisd_local_to_vbparts(local_id int, base_id int  default 1, add_face boolean default true) RETURNS varbit[] AS $f$
  SELECT CASE WHEN add_face THEN base_id::bit(12)::varbit || i ELSE i END
  FROM (
   SELECT CASE base_id 
     WHEN 1 THEN  -- BR BRASIL. 3 + 4 + 18 bits = 25 bits.  18+1 para reserva IBGE  para MG e TO . ex.3170701 ou 3171303.
-- usar 4 no lugar de 3 para a representação base16h ficar hierárquica, como colômbia
       CASE 
      WHEN l=1 AND local_id=0 THEN array[ ''::varbit ]
      WHEN l=1 AND local_id>0 THEN array[ ((x::int)::bit(3))::varbit ]
      WHEN l=2 THEN  array[ ((substr(x,1,1)::int)::bit(3))::varbit, ((substr(x,2,1)::int)::bit(4))::varbit ]
      ELSE array[ ((substr(x,1,1)::int)::bit(3))::varbit, ((substr(x,2,1)::int)::bit(4))::varbit, ((substr(x,3)::int)::bit(19))::varbit ]
      END

    WHEN 2 THEN  -- CO COLOMBIA. 4 + 4 + 12 bits. 12+2 para garantir reserva DANE.
      CASE
      WHEN l=1 AND local_id=0 THEN array[ ''::varbit ]
      WHEN l=1 THEN array[ ((0::int)::bit(4))::varbit, ((x::int)::bit(4))::varbit ]
      WHEN l=2 THEN array[ ((substr(x,1,1)::int)::bit(4))::varbit, ((substr(x,2)::int)::bit(4))::varbit ]
      WHEN local_id=26855 THEN array[ ((2::int)::bit(4))::varbit, ((6::int)::bit(4))::varbit ] --  CO-SAP insular.
      WHEN l=4 THEN array[ ((0::int)::bit(4))::varbit, ((substr(x,1,1)::int)::bit(4))::varbit,  ((substr(x,2)::int)::bit(14))::varbit ] -- ex 5125
      ELSE array[ ((substr(x,1,1)::int)::bit(4))::varbit, ((substr(x,2,1)::int)::bit(4))::varbit, ((substr(x,3)::int)::bit(14))::varbit ] -- ex 91798
      END

    -- MEXICO tem ID oficial. 
    -- ... Todos os demais apenas 3 níveis, partindo de contadores isolabel.
    -- Chiele bem equilibrado e até cabe em 27 bits.
    
    ELSE NULL
    END
   FROM (SELECT x, length(x) as l FROM (SELECT local_id::text) t1(x) ) t2
  ) t3(i)
$f$ LANGUAGE SQL IMMUTABLE;
COMMENT ON FUNCTION optim.jurisd_local_to_vbparts(int,int,boolean)
  IS 'Transforma ID inteiro da jurisdição local em varbit para identificação hierárquica, quando for cabível. Quando indefinido retorna NULL. O país é bitstring vazia.'
;
-- SELECT isolabel_ext, jurisd_local_id, optim.jurisd_local_to_vbparts(jurisd_local_id,int_country_id) as bits FROM optim.jurisdiction order by 1;

CREATE or replace FUNCTION optim.jurisd_vbparts_to_dec(p varbit[]) RETURNS text AS $f$
  SELECT CASE natcod.vBit_to_intval(p[1])
     WHEN 1 THEN CASE l
       WHEN 0 THEN '0'  -- 'len '||l::text
       WHEN 3 THEN natcod.vBit_to_intval(b'0'||p[2])::text
       WHEN 7 THEN natcod.vBit_to_intval(p[2])::text || natcod.vBit_to_intval(p[3])::text
       ELSE natcod.vBit_to_intval(p[2])::text || natcod.vBit_to_intval(p[3])::text || TO_CHAR(natcod.vBit_to_intval(p[4]),'fm00000')
       END
 
    WHEN 2 THEN CASE l
       WHEN 0 THEN '0'
       WHEN 4 THEN natcod.vBit_to_intval(p[2])::text
       WHEN 8 THEN iif( p[2]||p[3]=b'00100110', '26855',  natcod.vBit_to_intval(p[2])::text || natcod.vBit_to_intval(p[3])::text )
       ELSE natcod.vBit_to_intval(p[2])::text || natcod.vBit_to_intval(p[3])::text || TO_CHAR(natcod.vBit_to_intval(p[4]),'fm000')
       END
       
    ELSE NULL --old 'country '||natcod.vBit_to_intval(p[1])::text
    END
  FROM ( SELECT length(array_to_string(p::text[],''))-8 ) t(l) -- depois será -10 
$f$ LANGUAGE SQL IMMUTABLE;

-- Ida e volta!
select *, optim.jurisd_vbparts_to_dec(bits) ret 
from (
  SELECT isolabel_ext, jurisd_local_id, optim.jurisd_local_to_vbparts(jurisd_local_id,int_country_id) as bits
  FROM optim.jurisdiction
) t order by 1;

from ggeohash.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.