alphagov / publishing-api

API to publish content on GOV.UK

Home Page: https://docs.publishing.service.gov.uk/apps/publishing-api.html

License: MIT License

Languages: Ruby 79.46%, Dockerfile 0.04%, Procfile 0.01%, Jsonnet 20.49%
Topics: govuk, publishing, container

publishing-api's Introduction

Publishing API

The Publishing API aims to provide workflow as a service so that common publishing features can be written once and used by all publishing applications across Government. Content can be stored and retrieved using the API and workflow actions can be performed, such as creating a new draft or publishing an existing piece of content.

Publishing API sends content downstream to the draft and live Content Stores, and also publishes messages onto a RabbitMQ queue, which enables things like sending emails to users subscribed to that content. Read "Downstream Sidekiq background processing triggered by publishing".

Nomenclature

  • Document: A document is a piece of content in a particular locale. It is associated with editions that represent the versions of the document.
  • Edition: An edition represents a distinct version of a Document and holds that version's content (see the sketch after this list).
  • Content Item: A representation of content that can be sent to a content store.
  • Links: Used to capture relationships between pieces of content (e.g. parent/child). A link can be either a link set link or an edition link.
  • Unpublishing: An object indicating a previously published edition which has been removed from the live site.
  • User: A user of the system, which is used to track who initiated requests and to restrict access to draft content.
  • Path Reservation: An object that attributes a path on GOV.UK to a piece of content. It is used when paths need to be reserved before that content enters the system.
  • Event Log: A log of all requests to the Publishing API that have the potential to mutate its internal state.
  • Action: A record of activity on a particular edition, used to assist custom workflows of publishing applications.
  • Link Expansion: A process that converts the stored and automatic links for an edition into a JSON representation.
  • Dependency Resolution: A process that determines other editions that require updating downstream as a result of a change to an edition.
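
The relationship between the first two concepts can be pictured as follows; this is a conceptual sketch only, not the Publishing API's actual models:

# Conceptual sketch of the Document/Edition relationship described above.
# These Structs are illustrative, not the real models.
Document = Struct.new(:content_id, :locale, :editions)
Edition  = Struct.new(:title, :state)

document = Document.new("5e5dd324-7631-11e4-a3cb-005056011aef", "en", [])
document.editions << Edition.new("First draft", "draft")
document.editions << Edition.new("Revised and published", "published")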

Technical documentation

This is a Ruby on Rails app, and should follow our Rails app conventions.

You can use the GOV.UK Docker environment to run the application and its tests with all the necessary dependencies. Follow the usage instructions to get started.

Use GOV.UK Docker to run any commands that follow.

Running the test suite

You can run the tests locally with: bundle exec rake.

The Publishing API also has contract tests with GDS API Adapters (where it acts as the "provider") and with Content Store (where it acts as the contract "consumer"). Read the guidance for how to run the tests locally.

Further documentation

Licence

MIT License

publishing-api's People

Contributors

1pretz1, alext, barrucadu, beckal, boffbowsh, brucebolt, carlosmartinez, cbaines, chrisbashton, cristinaro, danielroseman, dependabot-preview[bot], dependabot-support, dependabot[bot], dougdroper, elliotcm, emmabeynon, erino, gpeng, heathd, jamiecobbett, jkempster34, kevindew, murilodalri, steventux, suzannehamilton, theseanything, thomasleese, tijmenb, tuzz


publishing-api's Issues

Proposal for new version of lookup-by-base-path

Background

In regular publishing apps, we refer to documents by content ID, and using base paths as an identifier is discouraged. GET /v2/content can be used to search for editions.

In content tagger, things work a bit differently. We work at the document level rather than with specific editions, and use PATCH /v2/links to add links via the LinkSet.

For the basic tagging workflow, users enter a URL or base path of anything on GOV.UK, and we look up the content id with POST /lookup-by-base-path.

This request can take multiple base paths to look up simultaneously, which we use for bulk tagging workflows.

Problem

Currently, if a document has not been published yet, we can't look it up in content tagger.

We want to allow content tagger to work with any draft content, while continuing to support live content (which may be unpublished and withdrawn). We should distinguish between the two so the user can either view their changes immediately (live) or preview them on the draft-origin (draft).

For error handling, it would also be useful to distinguish between content that doesn't exist, and content that used to exist, but has been redirected/removed.

We noticed this when bulk tagging to the education taxonomy: while we were identifying content to tag to the taxonomy, education publishers were also improving the content and consolidating a lot of pages, so the bulk imports triggered a lot of "Content not on GOV.UK" errors.

Proposal

I'd like to create a new version of /lookup-by-base-path that is less restrictive and returns more context about which content store each item is in and whether it has been unpublished.

Request

POST /v2/lookup-by-base-path

base_path[]=/foo&base_path[]=/bar
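
For illustration, the proposed request could be made from Ruby roughly like this (a sketch only: the host is invented and authentication is omitted):

require "json"
require "net/http"
require "uri"

# Sketch of the proposed v2 lookup; the host below is an assumption.
uri = URI("https://publishing-api.example.gov.uk/v2/lookup-by-base-path")
response = Net::HTTP.post_form(uri, "base_path[]" => ["/foo", "/bar"])
lookup = JSON.parse(response.body)

lookup.each do |base_path, stores|
  draft_id = stores.dig("draft", "content_id")
  live_id  = stores.dig("live", "content_id")
  puts "#{base_path}: draft=#{draft_id.inspect} live=#{live_id.inspect}"
end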

Response

The response now contains, for each base path, an object representing the current state of each content store.

{
  "/foo": {
    "draft": {
      "content_id": "123123-12312-12312"
    },
    "live": {
      "content_id": "1213123-12312-12552",
      "unpublishing": {
        "type": "redirect",
        "alternative_path": "/foo"
      }
    }
  },
  "/bar": {
    "draft": {
      "content_id": "123123-12312-12312"
    },
    "live": {
      "content_id": "1213123-12312-12552"
    }
  }
}
  • If the content id never existed, the base path is not returned in the response.
  • draft returns details of the content item currently in the draft content store. There should always be a draft object.
  • live returns details of the content item currently in the live content store. There will only be a live object for published and unpublished content.
  • unpublishing is returned for the live content item if the live content item is an unpublished document.
  • Anything with a nil value is omitted from the response.

Usage

We'd add a new gds-api-adapters method to return the response.

In content tagger we'd use it like this:

if response.include?('live')
  live = response['live']
  unpublishing_type = live.dig('unpublishing', 'type')

  if unpublishing_type.nil? || unpublishing_type == 'withdrawn'
    state = 'live'
    content_id = live['content_id']
  else
    # error: you can't tag this page any more
  end
else
  state = 'draft'
  content_id = response['draft']['content_id']
end

Rejected Alternatives

  • We could combine the two objects into a single one, with the current state as a field.

    I don't like this solution as much because a state field is easily confused with the publishing state of an edition. I think for this request we are thinking more in terms of content store content items than Publishing API editions. I'm also not convinced this is flexible enough, as it assumes the content id will never change for a base path, and that we'll never be interested in additional fields that differ between draft and live.

  • We could use edition states instead of content store states

    This would make the responses more difficult to work with.

  • We could keep using v1 and use a query parameter to toggle the new response

    I'd prefer to indicate the version more explicitly, but I'm not sure if there are other considerations here.

POST vs GET

In the proposal above I've left it as POST (matching the existing request), although I think GET would convey the semantics better.

IIRC, the request was POST rather than GET to avoid hitting problems with limits on the length of a URI due to a long query string.

Alternatively, we could use GET with a body. I'm happy to go with whichever approach is preferred.

Dependabot can't resolve your Ruby dependency files

Dependabot can't resolve your Ruby dependency files.

As a result, Dependabot couldn't update your dependencies.

The error Dependabot encountered was:

Bundler::VersionConflict with message: Bundler could not find compatible versions for gem "link_header":
  In Gemfile:
    gds-api-adapters (~> 55.0.2) was resolved to 55.0.2, which depends on
      link_header

Could not find gem 'link_header', which is required by gem 'gds-api-adapters (~> 55.0.2)', in any of the sources.

If you think the above is an error on Dependabot's side please don't hesitate to get in touch - we'll do whatever we can to fix it.

You can mention @dependabot in the comments below to contact the Dependabot team.

Tests still run if the compile fails

If there is a compile error, the tests still run and produce their pretty output. If you're not paying attention you might think your changes are OK.

➜  make 
gom install
downloading github.com/codegangsta/negroni
downloading github.com/dghubble/warp
downloading gopkg.in/unrolled/render.v1
downloading github.com/alext/tablecloth
downloading github.com/onsi/ginkgo
downloading github.com/onsi/gomega
rm -rf _vendor/src/github.com/alphagov
mkdir -p _vendor/src/github.com/alphagov
ln -s /var/govuk/publishing-api _vendor/src/github.com/alphagov/publishing-api
gom test -v ./...
# _/var/govuk/publishing-api_test
./integration_test.go:29: undefined: testPublishingAPI
FAIL    _/var/govuk/publishing-api [build failed]
=== RUN TestURLArbiter
Running Suite: URL arbiter client
=================================
Random Seed: 1423584627
Will run 4 of 4 specs

••••
Ran 4 of 4 Specs in 0.012 seconds
SUCCESS! -- 4 Passed | 0 Failed | 0 Pending | 0 Skipped --- PASS: TestURLArbiter (0.01 seconds)
PASS
ok      _/var/govuk/publishing-api/contentstore 0.094s
?       _/var/govuk/publishing-api/request_logger   [no test files]
=== RUN TestURLArbiter
Running Suite: URL arbiter client
=================================
Random Seed: 1423584628
Will run 3 of 3 specs

•••
Ran 3 of 3 Specs in 0.003 seconds
SUCCESS! -- 3 Passed | 0 Failed | 0 Pending | 0 Skipped --- PASS: TestURLArbiter (0.00 seconds)
PASS
ok      _/var/govuk/publishing-api/urlarbiter   0.039s
gom:  exit status 2
make: *** [test] Error 1

Prevent new content from being created with a trailing period (.) in the route

As a user with a letter from the government containing a URL
I want to be able to type that URL as I see it
So that I can get to the page I am looking for

"As a Core team member I want to implement a rule whereby if a user adds a full stop to the end of the URL (most likely because they are sent a printed letter that shows the URL at the end of a sentence and therefore with a '.' appended) that the full stop is automatically stripped out so that the user does not see a 404."

Over 6 months, 107,000 404s were for URLs ending in a full stop. This is about 3% of all 404s recorded during that time frame: a small percentage of the overall total, but still a significant number of errors. Implementing this change would address the bulk of these URL problems, though some of them could involve other issues that we can't see from the report.

A letter will be going out soon to 4m people which will include a link at the end of a sentence.

We're now automatically redirecting all URLs containing a trailing period in Varnish, having ascertained that no content legitimately uses one, so we should ensure that new content cannot be created with a trailing period in its route.
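
As a sketch of the rule being asked for (illustrative only, not the Publishing API's actual path validation):

# Rejects base paths that end in a full stop, per the issue above.
# Illustrative; the real check would live alongside the existing
# base path validation.
def validate_no_trailing_full_stop!(base_path)
  return if base_path.nil?

  if base_path.end_with?(".")
    raise ArgumentError, "base_path #{base_path.inspect} must not end with a full stop"
  end
end

validate_no_trailing_full_stop!("/new-vat-rates")   # passes
validate_no_trailing_full_stop!("/new-vat-rates.")  # raises ArgumentError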

If dependent apps are not running, unhelpful error is returned

Actual:

// url-arbiter, content-store are not running
$ make run
$ curl http://localhost:3000/content/new-vat-rates -X PUT -H 'Content-type: application/json' -d '<content-item-json>'
> {"Offset":1}

Expected error message:

> FATAL: Couldn't connect to URL Arbiter at http://url-arbiter.dev.gov.uk.

Links and expanded links oddities

I spoke to @danielroseman about this in person.

For this content item:
https://www.gov.uk/api/content/government/collections/national-driving-and-riding-standards

I noticed a couple of counter-intuitive things.

Expanded links appears to have less information than links

{
  "analytics_identifier": null,
  "api_url": "https://www.gov.uk/api/content/government/publications/car-and-light-van-driver-competence-framework",
  "base_path": "/government/publications/car-and-light-van-driver-competence-framework",
  "content_id": "5e5dd324-7631-11e4-a3cb-005056011aef",
  "description": "The research, statistics and professional opinions which form the basis of the 'National standard for driving cars and light vans'.",
  "locale": "en",
  "title": "Car and light van driver competence framework",
  "web_url": "https://www.gov.uk/government/publications/car-and-light-van-driver-competence-framework",
  "expanded_links": {}
}

Compared to

{
  "content_id": "5e5dd324-7631-11e4-a3cb-005056011aef",
  "title": "Car and light van driver competence framework",
  "base_path": "/government/publications/car-and-light-van-driver-competence-framework",
  "description": "The research, statistics and professional opinions which form the basis of the 'National standard for driving cars and light vans'.",
  "api_url": "https://www.gov.uk/api/content/government/publications/car-and-light-van-driver-competence-framework",
  "web_url": "https://www.gov.uk/government/publications/car-and-light-van-driver-competence-framework",
  "locale": "en",
  "schema_name": "placeholder_publication",
  "document_type": "guidance",
  "analytics_identifier": null,
  "links": {
    "organisations": [
      "d39237a5-678b-4bb5-a372-eb2cb036933d"
    ],
    "alpha_taxons": [
      "47f86db5-4641-4aca-b3fc-c15fcfa4da46"
    ]
  }
}

Links contain properties that don't validate

When copying and pasting the links block into an example.json file in content schemas, there are some "additional properties" that cause validation to fail.

For example:

The property '#/links/document_collections/0/links' contains additional properties ["organisations", "alpha_taxons"] outside of the schema when none are allowed in schema file:///var/govuk-sites/govuk-content-schemas/dist/formats/publication/frontend/schema.json#
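
As a throwaway workaround when building example files, something like this could strip the offending nested hashes first (assuming the structure implied by the error message; this is not part of any app):

# Drops the nested "links" hashes from each expanded link item so the
# pasted example validates. Illustrative only.
def strip_nested_links(links)
  links.transform_values do |items|
    items.map { |item| item.is_a?(Hash) ? item.reject { |key, _| key == "links" } : item }
  end
end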

Dependabot can't resolve your Ruby dependency files

Dependabot can't resolve your Ruby dependencies.

As a result, Dependabot couldn't update any of your dependencies.

This could have been caused by a git reference having been deleted at the source, by an out-of-sync lockfile, or by a bug in Dependabot.

To help diagnose the issue, please try running bundle update --patch locally. If no errors occur, get in touch and we'll help dig into it.

You can mention @dependabot in the comments below to contact the Dependabot team.

cannot run a clean migration

what happens

I get the following error when I try to run migrations:

StandardError: An error has occurred, this and all later migrations canceled:

Table 'live_content_items' has no foreign key for draft_content_item
/Users/andrewhilton/Code/publishing-api/db/migrate/20151124113325_allow_null_draft_content_id.rb:3:in `change'
/Users/andrewhilton/.rbenv/versions/2.3.1/bin/bundle:23:in `load'
/Users/andrewhilton/.rbenv/versions/2.3.1/bin/bundle:23:in `<main>'
ArgumentError: Table 'live_content_items' has no foreign key for draft_content_item
/Users/andrewhilton/Code/publishing-api/db/migrate/20151124113325_allow_null_draft_content_id.rb:3:in `change'
/Users/andrewhilton/.rbenv/versions/2.3.1/bin/bundle:23:in `load'
/Users/andrewhilton/.rbenv/versions/2.3.1/bin/bundle:23:in `<main>'
Tasks: TOP => db:migrate
(See full trace by running task with --trace)

what do you expect to happen

migrations run cleanly

steps to reproduce

rake db:create
rake db:migrate

phone_numbers optional fields are never used

None of the documents currently in the live content store use any of the optional fields of phone_number. Is it possible to use them, or are they effectively deprecated?

I wonder whether the fact that the number field is required encourages users to put fax numbers into it, and then to set the title to "Fax", instead of using the optional fax field, for example.

phone_numbers: {
  type: "array",
  items: {
    type: "object",
    additionalProperties: false,
    required: [
      "title",
      "number",
    ],
    properties: {
      title: {
        type: "string",
      },
      number: {
        type: "string",
      },
      textphone: {
        type: "string",
      },
      international_phone: {
        type: "string",
      },
      fax: {
        type: "string",
      },
      description: {
        type: "string",
      },
      open_hours: {
        type: "string",
      },
      best_time_to_call: {
        type: "string",
      },
    },
  },
},
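
For comparison, an entry that did use the optional fields would look something like this (values invented for illustration):

# Invented example values showing the optional fields in use.
phone_number = {
  "title" => "International enquiries",
  "number" => "+44 (0)20 7946 0000",
  "textphone" => "+44 (0)20 7946 0001",
  "fax" => "+44 (0)20 7946 0002",
  "description" => "For callers outside the UK",
  "open_hours" => "Monday to Friday, 9am to 5pm",
  "best_time_to_call" => "Mornings"
}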
