openspending / fiscal-data-package

MOVED TO https://github.com/frictionlessdata/specs/issues?q=is%3Aopen+is%3Aissue+label%3A%22Fiscal+Data+Package%22

fiscal-data-package's Introduction

Fiscal Data Package

Fiscal Data Package is a simple, open technical specification for government budget and spending data.

It is a lightweight, user-oriented specification that aims to be extremely easy to use, both for those publishing data (e.g. governments) and for those wanting to use the data (such as researchers and journalists).

Get started

Read the full RFC-style specification for the Fiscal Data Package format.

Additional Materials

fiscal-data-package.json (hosted in the frictionlessdata/schemas repository) contains a JSON schema for Fiscal Data Package metadata.

Note: this JSON schema only specifies the basic structure of the metadata descriptor. It does not check fine-grained properties like the required fields associated with different dataset types, and it does not specify the well-formedness of CSV datasets.

Contributing

Fiscal Data Package is an open specification. Development is led by Open Knowledge in collaboration with the World Bank and GIFT, the Global Initiative on Fiscal Transparency.

Development closely involves the community of users -- including data producers, intermediaries, and consumers.

You can already contribute to the development process by leaving suggestions and queries in the issue tracker.

fiscal-data-package's Issues

dimension and dimensionType

The current spec for this is:

...
"mapping": {
  "ownDimensionName": {
    "dimensionType": "datetime|entity|classification|project|{null which means custom}"
  }
}
...

But why not:

...
"mapping": {
  "dimensions": { // see https://github.com/openspending/fiscal-data-package/issues/43
    "datetime|entity|classification|project|{null which means custom}": { # the identifier for the dimension is the type
      "name": "ownDimensionName" // required
    }
  }
}
...

dimensions could be an array and not a hash if we want to support multiple dimensions of the same type.
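
To make the trade-off concrete, here is a minimal Python sketch of the two layouts. The descriptors and the `dimension_names` helper are hypothetical illustrations, not part of the spec, and the shape of an array entry (`dimensionType` plus `name`) is an assumption. It shows that a hash keyed by type can hold only one dimension per type, while an array can hold several.

```python
# Hypothetical descriptors; neither layout is confirmed by the spec.
hash_layout = {
    "mapping": {
        "dimensions": {
            # Keyed by type: a second "entity" dimension would need the same key.
            "entity": {"name": "payee"},
            "datetime": {"name": "date"},
        }
    }
}

array_layout = {
    "mapping": {
        "dimensions": [
            # Assumed entry shape: the type moves into the entry itself.
            {"dimensionType": "entity", "name": "payee"},
            {"dimensionType": "entity", "name": "payer"},
            {"dimensionType": "datetime", "name": "date"},
        ]
    }
}


def dimension_names(mapping, dimension_type):
    """Return the names of all dimensions of the given type, for either layout."""
    dimensions = mapping["dimensions"]
    if isinstance(dimensions, dict):
        entry = dimensions.get(dimension_type)
        return [entry["name"]] if entry else []
    return [d["name"] for d in dimensions if d.get("dimensionType") == dimension_type]


print(dimension_names(hash_layout["mapping"], "entity"))
print(dimension_names(array_layout["mapping"], "entity"))
```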

Make COFOG and GFSM recommended

COFOG and GFSM are not actually suitable as requirements for data publication.

We've approached these classification systems as though they were politically neutral, and as though adopting them were an apolitical matter of technical capacity. This is not correct. COFOG is the creation of the OECD; GFSM is the creation of the IMF. For a country that does not already use these systems, adopting either of them means building a relationship with those groups, which is a significant political decision. This spec should not demand that its users make such decisions.

COFOG and GFSM should therefore become recommendations rather than requirements. Any country that already implements those classification systems is asked to use them; any that does not is required to use functional, economic, etc. (the fields used to report country-specific functional and economic classification systems).

Looking forward, we need universal systems of budget and spending data classification that are as close to value-neutral as possible—which means that we need systems that come from below, from the community of data users.

Clarification of status field

Status can be "proposed", "approved", "adjusted", or "executed". Can a "typical" budget process be described so that these statuses are clear? Right now there's just:

Data can come from any stage in the budget cycle (proposal, approval, adjustment, execution). This includes three different types of planned / projected budget items (proposal, approval, adjustment) and one representing actual completed transactions (execution).

Elsewhere in the spec I see dateBudgeted, amountBudgeted, dateAdjusted, amountAdjusted and dateReported. Is "budgeted" here what's been "approved", and is "reported" what's been "executed"? Should "Budgeted" be changed to "Approved" and "Reported" to "Executed" to better align with the status field?

Related: #7 recommends adding some descriptive text in cases where the budget package skips all earlier statuses and is simply executed.

Dimension example improvements

I'm a bit confused by the examples of dimensions, for example:

We illustrate here some common dimensions.

date

"date": {
# note the list of fields is for illustration - you can have any fields you like
  "fields": {
    "year": "source field name"
  }
}

Assuming that the enclosing element is mapping (it'd also be good if this was clearer - sometimes it's confusing trying to guess where a snippet of code sits exactly), then shouldn't this be:

"date": {
  "dimensionType": "datetime",
# note the list of fields is for illustration - you can have any fields you like
  "fields": {
    "year": "source field name"
  }
}

Understood that dimensionType is "optional", but surely it should be present in the example.

Also, IMHO it's confusing starting with a list of 4 dimensionTypes, and then having examples that don't really match them - how about an example of the project dimensionType for instance?

Or maybe the dimensionType concept just hasn't been fully fleshed out yet.

On "granularity"

granularity is currently declared at the package level.

Should we consider it as a property of resources?
- Should we allow both, where a declaration on a resource overrides that of the package?
If there is no default, how should the data be understood in the absence of the granularity property?
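
One possible reading of "a declaration on a resource overrides that of the package" is sketched below in Python. The `resolve_property` helper and the sample descriptor are hypothetical, and the fallback default is an assumption rather than anything the spec defines.

```python
def resolve_property(package, resource, name, default=None):
    """Look up a property on the resource first, then the package, then a default."""
    if name in resource:
        return resource[name]
    if name in package:
        return package[name]
    return default


# Hypothetical descriptor: the package declares a value, one resource overrides it.
package = {
    "granularity": "aggregated",
    "resources": [
        {"name": "budget-2014"},
        {"name": "payments-2014", "granularity": "transactional"},
    ],
}

for resource in package["resources"]:
    print(resource["name"], resolve_property(package, resource, "granularity"))
```

The same lookup would apply unchanged to any other property that can be declared at both levels.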

Possible new fields: Performance/Result information

IATI and others are interested in performance measures. Could that be added to the specification as a field?

Data package spec allows simple license field

Rather than the plural licenses, just allow a single license, as per the Data Package spec.

Hierarchical links between budget items

Sarah Bird from Open Contracting pointed out that budget data won't necessarily always be published at the most granular possible level and that it would be advantageous to be able to link budget items in a hierarchy. She proposed adding a parent_item ID field in order to make this possible.
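
As a rough illustration of what a parent_item field would enable, the Python sketch below rolls leaf amounts up a hierarchy. The rows, the `id` and `amount` column names, and both helper functions are hypothetical; only the parent_item field name comes from the proposal.

```python
from collections import defaultdict

# Hypothetical budget rows; only leaf items carry an amount.
rows = [
    {"id": "A", "parent_item": None, "amount": None},
    {"id": "A1", "parent_item": "A", "amount": 100},
    {"id": "A2", "parent_item": "A", "amount": None},
    {"id": "A2a", "parent_item": "A2", "amount": 30},
    {"id": "A2b", "parent_item": "A2", "amount": 20},
]


def build_children(rows):
    """Map each parent ID to the list of IDs of its direct children."""
    children = defaultdict(list)
    for row in rows:
        children[row["parent_item"]].append(row["id"])
    return children


def rolled_up_amount(item_id, rows_by_id, children):
    """Sum the amounts of all leaf items below (or equal to) the given item."""
    kids = children.get(item_id, [])
    if not kids:
        return rows_by_id[item_id]["amount"]
    return sum(rolled_up_amount(kid, rows_by_id, children) for kid in kids)


rows_by_id = {row["id"]: row for row in rows}
children = build_children(rows)
print(rolled_up_amount("A", rows_by_id, children))
```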

Links to ID schemas

Sarah Bird from Open Contracting observed that the BDP spec's numerical ID fields currently aren't really good for much: they're just strings, and without any further context, such strings can't be used for linking up datasets (etc.).

She walked me through the Open Contracting spec's approach to ID fields, which divides them up into four:

freeform text
freeform identifier
link to schema
URI

She suggested that we should take a similar approach with our spec, at least adding in some kind of link to the schema (which I suggested we could put in the metadata for the relevant field). This would facilitate joining up datasets by turning freeform IDs into meaningful identifiers.

Including a feature in the spec that allows budgets and contracts to be linked up would be a huge win for both specifications, so it's important that this feature be included in the next revision.

How exactly this should work on a technical level is still up for discussion (holding a conversation on the stairs outside a bar while jetlagged, which is how this idea was floated, is not really the best way to nail down technical details)—and clearly Sarah Bird should be invited to be the leading voice in this discussion.

Conceivably the "schema link" could point to some sort of vocabulary or codesheet, and the link URL could serve as a namespace prefix on the ID itself. (This would obviate any need for an additional URI field.)
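
A minimal Python sketch of the namespace-prefix idea follows. The `idSchema` metadata property, the example URL, and the `qualify_id` helper are all hypothetical; the technical details are still open, as noted above.

```python
def qualify_id(field_metadata, raw_id):
    """Prefix a freeform ID with its schema URL, if the field metadata declares one."""
    schema_url = field_metadata.get("idSchema")  # hypothetical property name
    if not schema_url:
        return raw_id  # no schema link: the ID stays a freeform string
    return schema_url.rstrip("/") + "/" + raw_id


# Hypothetical field metadata from two different datasets.
budget_field = {"name": "purchaserID", "idSchema": "https://example.org/org-codes/"}
contract_field = {"name": "buyer_id", "idSchema": "https://example.org/org-codes"}

# Both datasets now yield the same qualified identifier for the same organisation.
print(qualify_id(budget_field, "GB-123"))
print(qualify_id(contract_field, "GB-123"))
```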

Use of the budget data package for simple spending data

These fields may cause problems:

status - would this be executed or ...?

"Form" vs "Content" in v0.3

The form vs content discussion made in the overview is a very useful one.

Combined with the "physical" vs "logical" model distinction introduced in v0.3 this could be a powerful way to structure the spec and to allow us to reintroduce the "content" ideas (currently not in v0.3). Specifically:

Form is "required"
Content is optional. We provide content structure in the form of defined properties / fields in the logical model that you CAN map to, but if you don't have them, that's fine.

MUST vs SHOULD (and defaulting) fields

Could we default:

granularity to transactional
type to expenditure
standard - to latest version (??) - no-one ever remembers to add this sort of field ;-)
- also couldn't / shouldn't this be part of profiles entry - see frictionlessdata/datapackage#87

SHOULD vs MUST

granularity (why is this a MUST)?

On "direction"

direction is currently declared at the package level, indicating one of revenue or expenditure.

Should we consider it as a property of resources?
- Should we allow both, where a declaration on a resource overrides that of the package?
If there is no default, how should the data be understood in the absence of the direction property?

Deploy mini-website at nice url e.g. spec.openspending.org

Why: A URL to point people to that is not a GitHub repo, looks nice, and is permanent.

theme it - https://github.com/okfn/jekyll-template
add home page
google analytics
DNS

One side-effect of this change is that we should switch this repo to run off gh-pages and treat gh-pages as the main branch ...

dateLastUpdated

If we are going to have this, I suggest using modified or dateModified, following DCMI terms / schema.org.

Note modified was in Data Package main spec until taken out - see frictionlessdata/datapackage#87

Hierarchies for organisational structure, functional classifications, ... ?

(migrated from OS forum)

Why do we want to explicitly model organisational hierarchies?

As a data consumer, I want to progressively dive into a dataset by aggregating it initially at levels I'm familiar with (federal departments) before discovering the internal structure of those departments.
As a data storyteller, I want to use hierarchies in the visualisations that I build.

For example:

Declaring type and status on each resource: why?

The spec requires a type for each resource, where type is one of "expenditure" or "revenue".

The spec requires a status for each resource, where status is one of "proposed", "approved", "adjusted" or "executed".

This pattern can lead to large amounts of repeated data.

Example:

I have a large dataset of local budgets in a country. For each "item" in a budget, my structure is like:

code: the unique identifier of the item
name: the name of the item
description: a text description of the item
type: "revenue" or "expenditure"
budget_amount: "approved" (the approved proposed budget)
actual_amount: "executed" (what was actually spent)

Note that, with Budget Data Package, I'd be required to:

have four distinct CSV files (and/or data packages) instead of one
repeat code, name, and description in 4 different places (more room for error, etc.)

I'm wondering:

Is there a use case that the spec aims to support that I don't grok?
If not, how could the spec be adjusted to deal with my use case (which I do not believe is unique)
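
To show the duplication being described, here is a Python sketch that splits rows of the structure above into one group per (type, status) pair. The sample rows and the `split_by_type_and_status` helper are hypothetical; the column names are the ones listed in this issue.

```python
from collections import defaultdict

# Column in the source data -> status it represents, as described above.
AMOUNT_COLUMNS = {"budget_amount": "approved", "actual_amount": "executed"}

# Hypothetical source rows.
rows = [
    {"code": "R1", "name": "Local taxes", "description": "Tax income",
     "type": "revenue", "budget_amount": 500, "actual_amount": 480},
    {"code": "E1", "name": "Road repairs", "description": "Maintenance",
     "type": "expenditure", "budget_amount": 200, "actual_amount": 230},
]


def split_by_type_and_status(rows):
    """Return {(type, status): [rows]}, repeating code/name/description in each group."""
    groups = defaultdict(list)
    for row in rows:
        for column, status in AMOUNT_COLUMNS.items():
            groups[(row["type"], status)].append({
                "code": row["code"],
                "name": row["name"],
                "description": row["description"],
                "amount": row[column],
            })
    return groups


groups = split_by_type_and_status(rows)
for key in sorted(groups):
    print(key, groups[key])
```

Each source row ends up in two groups, so its code, name and description are written twice; across both types that gives the four separate files mentioned above.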

On "fiscalPeriod"

fiscalPeriod is currently declared at the package level.

Should we consider it as a property of resources?
- Should we allow both, where a declaration on a resource overrides that of the package?
If there is no default, how should the data be understood in the absence of the fiscalPeriod property?

Rename 'type' to 'flow' or 'direction'?

type in the metadata could mean anything, not just revenue/exp. Wouldn't a name such as flow be more descriptive?

Multiple years/granularity/status on a single data package

Disclaimer: I'll point out a specific use case with which separating the budget by year/granularity/status makes it more difficult to produce the data. I'm not suggesting that it should be changed in the BDP, but just explaining my use case.

On http://cameroon.openspending.org and http://orcamento.inesc.org.br, we have a few visualizations using fiscal data. Everything on these sites is coming from OpenSpending: there's no other database. Taking the Cameroon site as an example, we have two main datasets: https://openspending.org/cm-budgets and https://openspending.org/cm-pib .

The cm-budgets dataset contains the budget data for every council in Cameroon, for every year, granularity, and status. To sum it up, it contains everything related to council budgets. We've done this because when a new year or a new council is added, the site automatically uses the new data. There's no need to configure it to use a new dataset name, as it's all in a single place. So, the process for adding data is just to add stuff to that dataset. The website automatically uses whatever data is available.

If we followed the current BDP proposal, we would have a single dataset for each council, year, granularity, etc. There can't be a BDP for multiple years right now, as far as I can see. This would make it more difficult to update the site, because we would need to add in some configuration file the datasets available (e.g. cm-tignere-2009-expenses, cm-tignere-2009-revenue, cm-tignere-2010-expenses, ...).

"properties" or "fields" for dimension attributes?

Both are used in the current examples

mapping.date.properties.year
mapping.payee.fields.id

Coordination with related efforts - Open Contracting, DATA Act, etc

I'm not intimately familiar with this effort or the technical details of Open Contracting, but I've seen mention of Open Contracting in a few other issues and I wondered if there was an overview of how the efforts relate to one another and where there might be common elements.

I also wanted to bring up a new related effort to create the data standard to be used for tracking spending within the US Federal Government (through the DATA Act), which is also being managed on GitHub. See fedspendingtransparency.github.io and their issue tracker.

Consider renaming status to "phase"

Less ambiguous than status. Used in the current RDF Budget Model for Open Budgets

Dimension types are inconsistent in the specification

From an example in the spec where dimension types are first discussed:

dimensionType is optional

it can be used to indicate this is a standard types e.g. entity, classification, program etc

"dimensionType": "...",

Then later in the spec the dimension types are listed (without a description of what they are):

Dimension Types:

datetime

entity

classification

project

Note that "program" from the list in the example has been dropped and instead "datetime" and "project" have been introduced. I suspect there's a mix up between "program" and "project" but I don't really understand either of those dimension types (and description is missing) so I don't know which one is the right one and under which circumstances it should be used so I can't make a pull request to fix this.

What is the contribution model for this standard?

I've been trying to engage in the design of this standard over the last year or so, but it's entirely unclear to me how contributions actually make it into the spec. Tickets from non-OKFN contributors are closed randomly, and I can't tell if much (if any) external feedback is ever incorporated.

Right now, I can see two scenarios:

a) This standard is not intended for public consultation. It is built for its customers (the World Bank, IMF) and the implementing organisation (OKFN) as an internal measure.

or:

b) If this is to be an open standard, a clear channel for contributions, which is fully valid for all members of the project and defines how changes are incorporated into the standard, is required. This can be restrictive (i.e. contributions are accepted only from accredited contributors), but it would need to be spelled out.

Right now, I see a real risk that this standard will in effect be made through method (a) but then advertised as if it were the outcome of an open process (b). This would undermine the notion of "open standards" and discredit OKFN as a participant in standards processes.

cc @rgrp @pwalsh @jpmckinney

Nesting dimensions under mapping

https://github.com/openspending/fiscal-data-package/blob/master/spec/index.md#details

We nest measures under mapping.measures. I think we should do the same for dimensions: mapping.dimensions.

It is more consistent in terms of the spec, and means that code that works with the spec does not need to special case "measures" in order to work out what under mapping is a dimension.
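
The special-casing can be seen in a short Python sketch. Both mappings and both helper functions are hypothetical illustrations of the flat and nested layouts, not code from any existing tool.

```python
# Hypothetical flat layout: dimensions sit beside "measures" under mapping.
flat_mapping = {
    "measures": {"amount": {"source": "amount"}},
    "date": {"dimensionType": "datetime"},
    "payee": {"dimensionType": "entity"},
}

# Hypothetical nested layout: dimensions get their own key.
nested_mapping = {
    "measures": {"amount": {"source": "amount"}},
    "dimensions": {
        "date": {"dimensionType": "datetime"},
        "payee": {"dimensionType": "entity"},
    },
}


def dimensions_flat(mapping):
    """Everything under mapping is a dimension, except the reserved 'measures' key."""
    return {name: spec for name, spec in mapping.items() if name != "measures"}


def dimensions_nested(mapping):
    """No special case needed: dimensions are read from their own key."""
    return dict(mapping.get("dimensions", {}))


print(sorted(dimensions_flat(flat_mapping)))
print(sorted(dimensions_nested(nested_mapping)))
```

With the flat layout, every new reserved key under mapping would need another exclusion in `dimensions_flat`; the nested layout avoids that.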

cc @trickvi

Create dedicated examples section

This can also illustrate key patterns e.g.

how to specify different types of classification

Change field label "project" to "task" ?

In the description it is written: "Name of the project underwriting the budget item. A project is an indivisible activity with a dedicated budget and fixed schedule." However, I have come across real budget classifications where the project was not the most granular activity. I would therefore suggest using a more neutral term such as "task" to identify the most granular activity in the Budget Data Package, and so avoid confusion when the term project is used.

budgetLineItem clarifications

I'm not sure what an example value would be, especially if a package's status is "approved" or "proposed".

Versioning of information

Gisele Craveiro has pointed out that sometimes mappings change, for instance a local functional classification mapping to COFOG might get changed between years. The specification should have a way of dealing with such versioning of data and possibly updates.

Explain differences between aggregated and transactional

I'm not sure why the following differences exist in expenditures:

cofog is required in aggregated but not transactional
financialSource and type exists in aggregated but not transactional

(I'd be happy to drop financialSource and type entirely)

Describing sources inconsistency

In "Describing Sources", the pattern for describing sources is described as:

"source": "name-of-field-on-the-resource",
"resource": "name-of-resource"

Elsewhere, I see examples like:

"source": "entities/about"

and

"source": "budget/budget"

I assume the pattern was "resource/field", but as there is no description of this pattern, I'm guessing it is a holdover from a previous spec?
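
Until the spec settles on one form, a consumer would have to accept both. The Python sketch below does that, treating "resource/field" as the meaning of the slash form, which is only the guess made above and not something the spec states. The `parse_source` helper is hypothetical.

```python
def parse_source(entry):
    """Return (resource, field) for either way of describing a source."""
    source = entry["source"]
    if "resource" in entry:
        # Documented form: separate "source" and "resource" properties.
        return entry["resource"], source
    if "/" in source:
        # Assumed form: "resource/field" packed into a single string.
        resource, field = source.split("/", 1)
        return resource, field
    # Only a field name was given; the resource is unknown.
    return None, source


print(parse_source({"source": "amount", "resource": "budget"}))
print(parse_source({"source": "entities/about"}))
print(parse_source({"source": "budget/budget"}))
```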

Change granularity to boolean

If we only anticipate two values, aggregated and transactional, then change the field to a boolean, and rename the field to either aggregated or transactional. I prefer aggregated, such that its omission means that the data is transactional.

Move aggregated data's dates to metadata; split it into two fields

There are two problems with the way dates are currently handled in the spec:

Our "date" field does not distinguish between the fiscal year represented by aggregate data and the date at which a data point was created in a given status (e.g. the date when an adjusted budget dataset was published).
By the principle that values that are the same for every data point belong in the metadata rather than the dataset, dates don't belong in aggregate data. Unlike transactional data, which is generated transaction-by-transaction and therefore may be associated with a wide range of times, aggregate data is typically generated in chunks, and a whole aggregate dataset will be generated in a single creation event.

A sensible way to handle these flaws is:

Add fiscalYear and dateUpdated to the resource metadata. (Make it required for all data types.)
Remove date from aggregate data.
Add dateCompleted to transactional data. Make it mandatory.
Add other fields from the old draft spec to transactional data as optional fields: dateBudgeted, dateAllocated, dateReported.

Add more guidance into datapackage section for versioning / maintaining historical data packages.

In open contracting we want to be able to link to budget data packages and items in a budget data package to provide a link between the budget and the procurement against it.

The current way we're accomplishing this, although input is gratefully received, is simply by specifying a URI for the budgetDataPackage and the id of the budget item.

However, I can imagine a scenario where each year a publisher overwrites their annual budget data package.

Obviously we would want links to budget data packages to live on for historical referencing and so it would be great to see this documented in the specification as an area for compliance.

Geocode field needs a better explanation

Currently the geocode field is just a string with "name" of area. That needs better explanation because this just invites too many interpretations.

Representing dimensions which are trees, normalised across multiple resources

The classic case is functional classification, but it could apply to any dimension potentially (projects, entities, etc.)

How do we use mapping.{dimension} to represent a tree, when budget line data has a single field that points to an "object" in another resource? The reference from one resource to another is already provided by the schema property on the resource (JSON Table Schema).

Here is an example from an older version of this spec work (as OpenSpending Data Package)

Reduce COFOG/GFSM requirement

Requiring COFOG is likely to slow the adoption of the specification. A lot of countries have already put in place a different functional classification tailored to that country.

It has been proposed to move COFOG into a recommended field, not required.
Instead we can use a functional classification, and perhaps a mapping of how that classification can be mapped to COFOG.

The same applies to GFSM and economic classifications.

"Budget" vs Fiscal vs Spending

I note that the "budget" data package is also suitable for publishing spending information like the information published by UK departments and local bodies (though this is not obviously budget-related data).

Do we need to tweak naming or, at least, make this clear in the introduction?

More phase field options

I just wanted to forward the WB concern that the proposed options of status/moment/phase/etc. ("proposed", "approved", "adjusted", or "executed") don't necessarily map well to all datasets. WB suggested that this is mainly a problem on the expenditure side, whose cycles can vary even between different Latin American countries. cc: @cecilaki

Programme/project fields clarification needed

The programme and project fields are not clear on what they should contain. A better explanation is needed to avoid misinterpretations.

fiscalYear isn't a date

In its simplest case, it's a date range (e.g. one year, two years starting in July, or seven years as in the case of the MFF). But a budget can also contain different data for different years, e.g. a budget for the next year and a financial framework for the one after that. Maybe too complex to model at the beginning, though.
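
A date range can be modelled with very little code. The Python sketch below is one hypothetical shape for it; the `FiscalPeriod` class, its inclusive end date, and the example dates are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FiscalPeriod:
    """A fiscal period as an inclusive date range rather than a single date."""
    start: date
    end: date

    def contains(self, day):
        """True if the given day falls inside the period (both ends inclusive)."""
        return self.start <= day <= self.end


# A one-year period starting in July, and a seven-year framework.
july_year = FiscalPeriod(date(2014, 7, 1), date(2015, 6, 30))
seven_years = FiscalPeriod(date(2014, 1, 1), date(2020, 12, 31))

print(july_year.contains(date(2015, 1, 15)))
print(july_year.contains(date(2015, 7, 1)))
print(seven_years.contains(date(2017, 3, 1)))
```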

Representing dimensions which are trees, denormalised in a single resource

The classic case is functional classification, but it could apply to any dimension potentially (projects, entities, etc.)

How do we use mapping.{dimension} to represent a tree, when the tree is denormalised in a single resource?

Some recent discussion around this:

Property name suggestions

If location is restricted to country, then I would rename to countryCode (as in the Geonames ontology).
admin: I find this a strange way to refer to a government entity. If we're referring to the administrator of an account, why not be expressive and name it administrator?
cofog: To ensure people enter 01.1 instead of "Executive and legislative organs, financial and fiscal affairs, external affairs", rename it to cofogCode, which is how the UN refers to these.
economic, functional: Whenever I read these, I think "economic what?" Why not expand to economicClassification and functionalClassification?
financialSource: this field is defined to be a classification. "financial source" makes me think of specific sources like "IMF", not classes of sources like "aid". A clearer name is needed.
fund (fundID): account is a more common accounting term.
geocode: too easily confused with geocoding (finding lat/lngs). Why not the more expressive geographicCode (if it's meant to be a code)?
purchaserID (purchaserOrgID): Not all expenditures are purchases. procuringEntityID (used by Open Contracting) would be more appropriate. Further, in Canada, the purchaser is literally Her Majesty the Queen in Right of Canada, but the organizations we care about are the procuring organization (usually Public Works and Government Services Canada) and the administering organization (whichever department).
type: with valid values "personnel", "non-personnel recurrent", "capital", "other". Not sure what these values mean. Once clarified, I might suggest a better term than the generic type.

On "status"

status is currently declared at the package level.

Should we consider it as a property of resources?
- Should we allow both, where a declaration on a resource overrides that of the package?
If there is no default, how should the data be understood in the absence of the status property?

Also, see #35

Organisational IDs

According to a mailing list thread at IATI they are dropping support for Organisational IDs. This creates a problem for us and we need to find a replacement.

Suggestions:

Annelise Parr says they will bring Organisational IDs back in version 2
Tom Lee asks if Open Civic IDs can work

Is the amount reported assumed to be the same as amount adjusted?

There are three date fields (budgeted, adjusted, reported) but two amounts.

"OpenSpending data package" mentioned in a confusing way

An OpenSpending data package MUST contain a datapackage.json - it is the central file in an Budget Data Package.

Switch to GFSM 2014 once published

It's currently a pre-publication draft.

http://www.imf.org/external/np/sta/gfsm/

Allowing arbitrary names of dimensions and their fields might lead to difficulties

It seems to me that the current version of Fiscal Data Package (0.3.0-alpha2) defines only the form of dimensions and measures in the mapping, but not their content or semantics: the creator of a data package can use any, arbitrary, names for dimensions and their fields. If that is so, it could be quite difficult or almost impossible to automatically process the data package. For example, one package could contain a dimension called "payee" with field "name", while another would have a dimension "payment_target" with field "id", even though both would have the same meaning. The former OpenSpending Data Package specification clearly defined a set of mapping properties (payee, payor, date etc.) and their meanings. I think Fiscal Data Package should stick to that.
