jedcunningham commented on August 16, 2024

Yeah, that's more or less what I was thinking, but I hadn't quite connected the dots. The good news is I don't think we need to worry about any extra complexity now 🍺.

from airflow-site.

pankajkoti commented on August 16, 2024

We're exploring an approach that moves older versions of the docs to the newly created https://github.com/apache/airflow-site-archive repository (thanks to @potiuk for creating it), while still keeping them in the github-pages branch so the docs get published. We still need to figure out how to handle redirects back and forth when a user picks a version from the drop-down.

potiuk commented on August 16, 2024

Yeah, I like it too, though it would be great to modernize things a bit as well :) But I agree: if we can do it WITHOUT involving another repo/redirection, we seem to be good for now.

What we could do, though, is rewrite the history of the whole airflow-site repo and keep maybe only the last few commits, and we could do that periodically. I think that would free up even more space, and all operations on the repo would be much faster (right now, just getting my liquidprompt to show the version takes noticeable time).

potiuk commented on August 16, 2024

I think this is caused by the wrong Hugo version; look at our CI for the versions it uses. I had a very similar issue when I tried to build the docs on a Mac, and I could not solve it at the time. One way around it was to build the docs in a Debian-based Docker container. You could potentially use the Breeze image for that (for example, if you check out the site in the "files" folder), but that approach has its own problem: the filesystem mounted into Docker on a Mac is slow.

Going Linux/Debian first (maybe using a remote build machine for it) is likely the fastest way to solve the problem.

There is also an idea to modernise our build pipeline for the sites, which could solve the problem.

potiuk commented on August 16, 2024

I do not know the system that well, but I think that at least in some cases building the docs history was skipped to reduce the size of the generated images. Look at the CI steps; I think there was even a comment about it.

pankajkoti commented on August 16, 2024

"Since build-site is copying stuff from docs-archive into dist, and we upload dist, can we not just delete docs-archive once we are done building? That'll give us 10gb of extra room without having any negatives or extra complexity?"

Yes, @jedcunningham. I tried your suggestion and have created a draft PR.

I pushed 2 commits that print the output of the disk-free command (df -h) after each significant step in our CI job: bf25b69 keeps the existing setup and only adds df -h, and 6bc8ab7 additionally removes the docs-archive directory before printing the df -h output.
CI job with first commit - https://github.com/apache/airflow-site/actions/runs/4830163412/jobs/8606043658?pr=777
CI job with second commit - https://github.com/apache/airflow-site/actions/runs/4830346989/jobs/8606461864?pr=777

I searched the repo and found no other reference to docs-archive after the site is built, so I believe it is safe to remove it in the CI job once the site is built. Running this step before the "Deploy website on asf-site branch" step should ensure (🤞🏽, thanks to the 10GB+ reclaimed) that the CI build does not fail at that step for lack of disk space, as it did last time.
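The cleanup step described above boils down to: once build-site has copied docs-archive into dist, delete docs-archive and report free space, as df -h does in the CI job. As an illustration of that logic, here is a minimal Python sketch. It assumes docs-archive/ and dist/ sit side by side at the repository root, and the function names are hypothetical, not part of site.sh or the workflow.

```python
import shutil
from pathlib import Path


def free_gib(path: Path) -> float:
    """Free space on the filesystem holding `path`, in GiB (roughly what df -h shows)."""
    return shutil.disk_usage(path).free / 1024**3


def remove_docs_archive(repo_root: Path) -> tuple[float, float]:
    """Delete docs-archive after the site build; return free GiB before and after.

    Assumption: the built site lives in repo_root/dist and no later step reads docs-archive.
    """
    archive = repo_root / "docs-archive"
    dist = repo_root / "dist"
    # Guard: build-site reads from docs-archive, so only delete it once dist exists.
    if not dist.is_dir():
        raise RuntimeError("dist/ not found; run ./site.sh build-site first")
    before = free_gib(repo_root)
    if archive.is_dir():
        shutil.rmtree(archive)
    after = free_gib(repo_root)
    print(f"free space: {before:.1f} GiB -> {after:.1f} GiB")
    return before, after
```

Checking for dist before deleting keeps the step safe to reorder: if it ever ran before the build, it would fail loudly instead of removing the build input.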

I am attaching the outputs of the CI job for the 2 commits mentioned above, as both PDF and PNG (unfortunately you will need to zoom in, since I captured the full page).

PDF output:
first commit: before_pdf.pdf
second commit: after_pdf.pdf

PNG output:
first commit: Before (screenshot)
second commit: after (screenshot)

If this solution is accepted, I guess it would be a quick win for us for now :)

jedcunningham commented on August 16, 2024

I'd be 100% on board if squashing it all helps. It's become really painful!

pankajkoti commented on August 16, 2024

When I run ./site.sh build-site after following the steps outlined in README.md and installing Hugo with brew install hugo, I get the error below and am currently stuck resolving it:

WARNING in asset size limit: The following asset(s) exceed the recommended size limit (244 KiB).
This can impact web performance.
Assets:
  chunk-4.3d5f5.js (1.53 MiB)
$ cross-env HUGO_ENV=production hugo -d ../dist -s site -v
Start building sites …
hugo v0.111.3+extended darwin/arm64 BuildDate=unknown
INFO 2023/04/26 19:02:52 syncing static files to /
ERROR 2023/04/26 19:02:54 render of "page" failed: "/Users/pankajkoti/airflow-site/landing-pages/site/layouts/_default/baseof.html:23:7": execute of template failed: template: _default/search.html:23:7: executing "_default/search.html" at <partial "head.html" .>: error calling partial: execute of template failed: html/template:partials/head.html:15:17: no such template "_internal/google_news.html"
ERROR 2023/04/26 19:02:54 render of "page" failed: "/Users/pankajkoti/airflow-site/landing-pages/site/layouts/blog/baseof.html:23:7": execute of template failed: template: blog/single.html:23:7: executing "blog/single.html" at <partial "head.html" .>: error calling partial: execute of template failed: html/template:partials/head.html:15:17: no such template "_internal/google_news.html"
ERROR 2023/04/26 19:02:54 render of "page" failed: "/Users/pankajkoti/airflow-site/landing-pages/site/layouts/blog/baseof.html:23:7": execute of template failed: template: blog/single.html:23:7: executing "blog/single.html" at <partial "head.html" .>: error calling partial: execute of template failed: html/template:partials/head.html:15:17: no such template "_internal/google_news.html"
ERROR 2023/04/26 19:02:54 render of "page" failed: "/Users/pankajkoti/airflow-site/landing-pages/site/layouts/blog/baseof.html:23:7": execute of template failed: template: blog/single.html:23:7: executing "blog/single.html" at <partial "head.html" .>: error calling partial: execute of template failed: html/template:partials/head.html:15:17: no such template "_internal/google_news.html"
Error: Error building site: failed to render pages: render of "page" failed: "/Users/pankajkoti/airflow-site/landing-pages/site/layouts/blog/baseof.html:23:7": execute of template failed: template: blog/single.html:23:7: executing "blog/single.html" at <partial "head.html" .>: error calling partial: execute of template failed: html/template:partials/head.html:15:17: no such template "_internal/google_news.html"
Total in 2060 ms
error Command failed with exit code 255.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
error Command failed with exit code 255.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

pankajkoti commented on August 16, 2024

Yes, it looks like the _internal/google_news.html template was removed in newer Hugo versions: https://discourse.gohugo.io/t/page-render-error/43594

pankajkoti commented on August 16, 2024

Our CI seems to use Hugo version 0.91.2 as per https://github.com/apache/airflow-site/actions/runs/4253003040/jobs/7397288933, so I am now installing that version with
go install -tags extended github.com/gohugoio/hugo@v0.91.2 and will see how it goes from there.
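To avoid running into the version mismatch again, a small pre-build check could compare the local Hugo version with the one CI uses. This is a minimal Python sketch, assuming `hugo version` prints a line like the one in the log above (`hugo v0.111.3+extended darwin/arm64 ...`); `PINNED_HUGO`, `parse_hugo_version` and `check_local_hugo` are hypothetical names, not part of site.sh.

```python
import re
import shutil
import subprocess

# Version used by our CI, per the CI run linked above (assumed to stay pinned).
PINNED_HUGO = (0, 91, 2)


def parse_hugo_version(output: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from `hugo version` output.

    Handles suffixes such as '+extended' or '-1798BD3F+extended' after the patch number.
    """
    match = re.search(r"v(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognised hugo version output: {output!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def check_local_hugo() -> None:
    """Warn when the hugo on PATH differs from the version CI uses."""
    if shutil.which("hugo") is None:
        raise SystemExit("hugo is not on PATH")
    result = subprocess.run(["hugo", "version"], capture_output=True, text=True, check=True)
    local = parse_hugo_version(result.stdout)
    if local != PINNED_HUGO:
        print(f"warning: local hugo {local} differs from CI version {PINNED_HUGO}")
```

Comparing integer tuples rather than strings keeps the check correct for versions like 0.91.2 vs 0.111.3, which would compare the wrong way as text.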

pankajkoti commented on August 16, 2024

That helped, @potiuk, thank you! The site builds and I can load the index page; however, the docs are not loading. Do I need to do something additional while building?
Screenshot 2023-04-26 at 7 37 56 PM

Screenshot 2023-04-26 at 7 38 06 PM

cc: @jedcunningham @phanikumv

pankajkoti commented on August 16, 2024

Today I got a chance to read more about our setup in the repo and studied the CI build.yml and the site.sh script.

Understanding so far

The build job creates a dist folder when we run ./site.sh build-site. When I build the current main branch locally, the dist folder is roughly 10.3GB. Almost all of that is the docs folder inside dist, which also reads ~10.3GB at the moment, so all other directories are negligible in comparison.

du on root folder
Screenshot 2023-04-27 at 7 44 06 PM

du on the dist folder
Screenshot 2023-04-27 at 7 44 30 PM

du on the dist/docs folder
Screenshot 2023-04-27 at 8 17 37 PM
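To pick which providers to split out (option a below), it helps to get the per-directory sizes as data rather than screenshots. This is a minimal Python sketch of a `du`-like report for the immediate subdirectories of a folder such as dist/docs. It sums apparent file sizes and skips symlinks, so the totals may differ slightly from `du`, which counts allocated disk blocks; the function names are hypothetical.

```python
import os
from pathlib import Path


def dir_size(path: Path) -> int:
    """Total apparent size in bytes of regular files under `path` (symlinks skipped)."""
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            file_path = Path(dirpath) / name
            if not file_path.is_symlink():
                total += file_path.stat().st_size
    return total


def largest_subdirs(root: Path, top: int = 10) -> list[tuple[str, int]]:
    """Immediate subdirectories of `root` with their sizes, largest first."""
    sizes = [(child.name, dir_size(child)) for child in root.iterdir() if child.is_dir()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]


# Example usage (run from the repo root after building the site):
#   for name, size in largest_subdirs(Path("dist/docs")):
#       print(f"{size / 1024**3:6.2f} GiB  {name}")
```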

GitHub-hosted runners guarantee at least 14GB of disk space for runs (see the comment on actions/runner-images#2840).

What I understood is that when we create a PR, a line in our CI build removes the docs folder before proceeding to the next steps, so the build job for PRs rarely fails.
But when the PR is merged to main, this huge docs folder is not removed, and when we tried to deploy the website here:

- name: 🚀 Deploy website on asf-site branch
the "Deploy website on asf-site branch" GitHub Actions step failed with an out-of-disk-space error while copying the dist folder to the gh-pages branch of our repository using the wrapper action apache/airflow-JamesIves-github-pages-deploy-action (the base action is https://github.com/JamesIves/github-pages-deploy-action).

I believe our website is deployed from the gh-pages branch and that all the content in it gets published, as per my chat with ChatGPT :)

Solution Proposal (Theory)

We could replicate this repo's setup, including the CI and files/folders, in https://github.com/apache/airflow-site-archive with the following tweaks:

  1. Split the contents of our docs-archive folder (which becomes the docs folder during the build and occupies the huge ~10.3GB) and copy part of it to the new repo, using either of these approaches:
    a. Keep some providers in this repo and move the rest to the new repo, based on how much space each provider occupies (see the screenshot of the dist/docs directory above).
    b. Keep all providers in both repos but split them by version: keep the latest versions here and the older versions in the new repo.
  2. Have the site build / CI build generate the dist folder only with the docs we plan to keep in each repo.
  3. In the new repo, point the deploy target of the build job at the gh-pages branch of this repo. From reading the action's options, I believe we can set repository-name (with the required token) in the new repo to point to this repo.
  4. Ensure that
    CLEAN: true # Automatically remove deleted files from the deploy branch
    is set to false in this repo; otherwise, docs that live in the new repo but not in this one will be cleaned out whenever CI runs here on merge to main. Alternatively, set clean-exclude (again, one of the GitHub action's options) in this repo's CI build so that files coming from the new repo are not cleaned.
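Option 1b can be sketched as a small planning script that decides, per package, which version directories stay in this repo and which move to airflow-site-archive. This is a minimal sketch assuming a docs-archive/&lt;package&gt;/&lt;version&gt;/ layout and a hypothetical "keep the N latest versions" rule; `plan_split` is an illustrative name, and the script only computes the plan without moving anything.

```python
import re
from pathlib import Path

# Directory names like "2.5.3" or "1.10.15"; anything else (e.g. "stable.txt") is always kept.
VERSION_RE = re.compile(r"^\d+(\.\d+)*$")


def version_key(name: str) -> tuple[int, ...]:
    """Numeric sort key so that 1.10.15 sorts after 1.9.0."""
    return tuple(int(part) for part in name.split("."))


def plan_split(archive_root: Path, keep_latest: int = 3) -> dict[str, list[Path]]:
    """Partition each package's version directories into 'keep' (this repo) and 'move'."""
    plan: dict[str, list[Path]] = {"keep": [], "move": []}
    for package in sorted(p for p in archive_root.iterdir() if p.is_dir()):
        versions = []
        for child in package.iterdir():
            if child.is_dir() and VERSION_RE.match(child.name):
                versions.append(child)
            else:
                # Non-version entries stay with the package in this repo.
                plan["keep"].append(child)
        versions.sort(key=lambda p: version_key(p.name), reverse=True)
        plan["keep"].extend(versions[:keep_latest])
        plan["move"].extend(versions[keep_latest:])
    return plan
```

Sorting on a tuple of integers rather than the raw directory name matters here: compared as strings, "1.10.15" sorts before "1.9.0", which would move the wrong versions.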

With the above steps, I believe we could keep all the docs in this repo's gh-pages branch without worrying about additional changes to the repo's JS/CSS files, handling redirects, etc.

Next steps

The above proposal is purely theoretical, based on my understanding so far, and I would like to hear opinions on it. In particular, I'd like to know whether anyone already knows if this approach makes sense and is feasible, or foresees any issues/blockers.

I really appreciate your time reading this comment, and would also appreciate any pointers on whom else we could reach out to for feedback or additional expert advice.

@potiuk @jedcunningham @phanikumv @mik-laj

jedcunningham commented on August 16, 2024

Since build-site is copying stuff from docs-archive into dist, and we upload dist, can we not just delete docs-archive once we are done building? That'll give us 10gb of extra room without having any negatives or extra complexity?

ashb commented on August 16, 2024

Did we talk about keeping all the old versions somewhere other than the main branch (separate detached branches in this repo)?

pankajkoti commented on August 16, 2024

"Did we talk about keeping all the old versions somewhere other than the main branch (separate detached branches in this repo)?"

Yes @ashb, thank you for your comment. Jed had suggested this idea too, but we then decided to try the separate-repository approach first. Sorry, I don't remember the reasoning, but maybe @jedcunningham can say more about his thought process.

pankajkoti commented on August 16, 2024

Okay, thanks a lot for the suggestions, the feedback, and the go-ahead! I will create a first PR with this approach of removing the docs-archive directory, and we can iterate on it later. I will also look into creating another PR later for squashing the commits (keeping only the latest ones), as suggested by Jarek.

pankajkoti commented on August 16, 2024

PR #777 was merged, which reclaims an additional 10GB+ of space for the previously failing CI build job step, so we do not need to move the docs out to the new repo in the near future. Closing this issue for now.

@potiuk, should we keep the new repo, or do we need to archive airflow-site-archive?
