
eidf-docs's Introduction

EIDF Documentation

The Edinburgh International Data Facility (EIDF) is built and operated by EPCC at the University of Edinburgh. EIDF is a place to store, find and work with data of all kinds. You can find more information on the service and the research it supports on the EIDF website.

This repository contains the documentation for the service and is linked to a rendered version currently hosted on GitHub Pages.

Rendered documentation

How to contribute

We welcome contributions from the EIDF community and beyond. Contributions can take many different forms, for example:

  • Raising Issues if you spot a mistake or something that could be improved
  • Adding/updating material via a Pull Request
  • Adding your thoughts and ideas to any open issues

To find out how to contribute, please read CONTRIBUTING.

eidf-docs's People

Contributors

agngrant, akrause2014, aturner-epcc, aturner-test, awat31, clairbarrass, claoidek, davidhenty, dimmestp, dscobbie, dvalters, eleanor-broadway, hayjohnny2000, holly-t, jbeechb, jhay-epcc, jsindt, juanfrh, kavousan, kevinstratford, lcebaman, markgbeckett, mbareford, nickaj, otbrown, pbartholomew08, rmbaxter, welucas2, wood-chris, xguo-epcc


eidf-docs's Issues

[Documentation]: Missing MATLAB docs

What documentation issue are you reporting

Missing Documentation

Issue

No MATLAB docs. It's not always clear when users should opt for a high tasks-per-node with a single cpus-per-task or the opposite for their jobs.

[Documentation]: The first time you log in to the VM you MUST go through the VDI

What documentation issue are you reporting

Incomplete Documentation

Issue

If you log in directly using:

$ ssh -J [SAFE-username]@eidf-gateway.epcc.ed.ac.uk [VM-username]@[VM/IP]

it will fail with a slightly cryptic:

Enter passphrase for key '/Users/mario/.ssh/id_rsa':
[email protected]'s password:
channel 0: open failed: administratively prohibited: open failed
stdio forwarding failed
kex_exchange_identification: Connection closed by remote host
Connection closed by UNKNOWN port 65535

This happens even though the correct passphrase and password have been used. For SSH access to succeed, you apparently MUST first go through the VDI, where your password will be reset, BEFORE you can log in to the VM directly. The documentation should state this.

[Documentation]: Missing docs for SDF-CS1

What documentation issue are you reporting

Missing Documentation

Issue

As the Cerebras system is already available to people, it would be good to have some documentation as soon as possible that covers:

  • How to request an account on SDF-CS1 via EPCC SAFE (may be useful to point to SAFE docs at: https://epcced.github.io/safe-docs/)
  • How to log on and transfer data to SDF-CS1 via SSH
  • How to set up models for use with CS-1
  • How to use Slurm to submit jobs to use CS-1

Section on policies re customer-run, outward-facing services

What documentation issue are you reporting

No response

Issue

Please add the following section to https://epcced.github.io/eidf-docs/services/virtualmachines/policies/

Customer-run Outward Facing Services

PIs can apply to run an outward-facing service; that is a webservice on port 443, running on a project-owned VM. The policy requires the customer to accept the following conditions:

  • Agreement that the customer will automatically apply security patches, run regular maintenance, and have named contacts who can act should we require it.
  • Agreement that should EPCC detect any problematic behaviour (of users or code), we reserve the right to remove web access.
  • Agreement that the customer understands all access is filtered and gated by EPCC’s Firewalls and NGINX (or other equivalent software) server such that there is no direct exposure to the internet of their application.
  • Agreement that the customer owns the data, has permission to expose it, and that it will not bring UoE into disrepute.

PIs can apply for such a service on application and also at any time by contacting the EIDF Service Desk.

[Documentation]: Suggestion - enable https://pre-commit.ci/

What documentation issue are you reporting

Formatting

Issue

We already have a .pre-commit-config.yaml in the repo, but this is not run if contributors don't have pre-commit installed locally or if they submit a PR directly in GitHub.

We could add https://pre-commit.ci to the repo, which will ensure the checks are run on every PR. Any formatting changes will also be automatically committed back to the branch by the tool, which is handy.

[Documentation]: Lack of clarity about uniqueness of usernames across projects

What documentation issue are you reporting

Missing Documentation

Issue

Users expect to have a single EIDF username across services, which can be true only if their project spans those services (or the services do not share an authentication system, i.e. LDAP vs IPA).

This should be added to the FAQs so we can point users to it.

[Documentation]:

What documentation issue are you reporting

Incorrect Documentation

Issue

"Known Issues" section is incorrect:

  1. Users can be given sudo rights by PIs if requested
  2. There is a known issue with Firefox on current Ubuntu builds due to an issue with snapd not working with non-standard home directories (though this may be fixed soon)

[Documentation]: SLURM time examples missing

What documentation issue are you reporting

Incomplete Documentation

Issue

Lack of time-based examples with Ultra2 and CS-1 codes. The default is an hour, which can be a bit short for some large/complex CS-1 codes.

Requires adding a note about it and putting the code in the 3 or 4 example snippets of SLURM batch.
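
A minimal sketch of the kind of snippet this issue asks for, assuming a standard Slurm installation. The `--time` option accepts `HH:MM:SS` or `D-HH:MM:SS`; the job name and the four-hour limit are illustrative values only, not settings taken from the Ultra2 or CS-1 documentation.

```shell
#!/bin/bash
# Illustrative job name (hypothetical).
#SBATCH --job-name=time-limit-example
# Request 4 hours of walltime instead of relying on the default limit.
# Format is HH:MM:SS; use D-HH:MM:SS for multi-day jobs, e.g. 1-12:00:00.
#SBATCH --time=04:00:00

# SLURM_JOB_ID is set by Slurm inside a job; fall back when run outside one.
job_id="${SLURM_JOB_ID:-none}"
echo "Running with Slurm job ID: ${job_id}"
```

The same limit can also be given on the command line, for example `sbatch --time=04:00:00 job.sh`, where `job.sh` is a hypothetical script name.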

[Documentation]: Missing VM sizes for DSC

What documentation issue are you reporting

Missing Documentation

Issue

We should tell users the stock sizes for VMs to make it easier for them to request changes etc.

[Documentation]: missing info about ProxyCommand

What documentation issue are you reporting

Missing Documentation

Issue

The docs don't highlight that the -i flag isn't honoured by the jump host. It would be useful for the docs to explain how to get around this, both for SSH and SCP
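
One possible workaround, sketched here under assumptions: replace `-J` with an explicit `ProxyCommand` so the key is passed to the gateway hop as well as to the VM. The key path and the helper names `proxy_command`, `eidf_ssh` and `eidf_scp` are hypothetical; only the gateway hostname comes from the issues above.

```shell
#!/bin/bash
# Gateway hostname as used elsewhere in these issues.
GATEWAY="eidf-gateway.epcc.ed.ac.uk"
# Assumed key path (hypothetical); replace with your own private key.
KEY="${HOME}/.ssh/id_example"

# Build a ProxyCommand that passes the key to the gateway hop explicitly.
# -W %h:%p forwards stdio to the final target host and port.
proxy_command() {
    local safe_user="$1"
    printf 'ssh -i %s -W %%h:%%p %s@%s' "$KEY" "$safe_user" "$GATEWAY"
}

# SSH to a VM through the gateway, using the same key for both hops.
eidf_ssh() {
    local safe_user="$1" vm_user="$2" vm_host="$3"
    ssh -i "$KEY" -o ProxyCommand="$(proxy_command "$safe_user")" \
        "${vm_user}@${vm_host}"
}

# Copy a file to a VM through the gateway; scp accepts the same -i and -o options.
eidf_scp() {
    local safe_user="$1" vm_user="$2" vm_host="$3" src="$4" dest="$5"
    scp -i "$KEY" -o ProxyCommand="$(proxy_command "$safe_user")" \
        "$src" "${vm_user}@${vm_host}:${dest}"
}
```

Usage would look like `eidf_ssh [SAFE-username] [VM-username] [VM/IP]`. An equivalent setup can be kept in `~/.ssh/config` with per-host `IdentityFile` entries, which avoids repeating the options on every command.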

[Documentation]: Missing Namespace Documentation

What documentation issue are you reporting

Incomplete Documentation

Issue

An issue was reported about the lack of information about namespaces on new projects, which causes an error when operations target the wrong namespace.

[Documentation]: Short Video Examples

What documentation issue are you reporting

Missing Documentation

Issue

It would be good to have a video walkthrough of registering, logging in and changing password.

[Documentation]: GPU Service FAQ typo

What documentation issue are you reporting

Incorrect Documentation

Issue

Under "Access to GPU Service resources in default namespace is Forbidden" change

This arises when you forgot to specify you are submitting job/pods to your project namespace, not the "default" namespace which you do not have permissions to use.

to

This arises when the project namespace is not included in the kubectl command for submitting job/pods and kubectl tries to use the "default" namespace which projects do not have permissions to use.
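
To illustrate the proposed wording, here is a small sketch of passing the project namespace on every `kubectl` call. The namespace value, the `EIDF_NAMESPACE` variable and the `kubectl_ns` wrapper are hypothetical names chosen for this example; `--namespace` is the standard kubectl flag.

```shell
#!/bin/bash
# Hypothetical project namespace; replace with the one assigned to your project.
EIDF_NAMESPACE="${EIDF_NAMESPACE:-example-project-ns}"

# Wrapper that always passes the project namespace to kubectl,
# so commands never fall back to the "default" namespace.
kubectl_ns() {
    kubectl --namespace "$EIDF_NAMESPACE" "$@"
}

# Example usage (job.yaml is a hypothetical manifest file):
#   kubectl_ns create -f job.yaml
#   kubectl_ns get pods
```

Without the flag, kubectl uses the namespace from the current context, which is "default" unless it has been changed.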

[Documentation]: EIDFGPU Service K8s Lessons

What documentation issue are you reporting

Incomplete Documentation

Issue

Overarching issues to ensure documentation is broadly self-contained and accessible.

  • An overview of what K8s is and the justification for using it in the EIDFGPU service.
  • Ensure standard K8s terms are defined before use:
    • YAML manifest file
    • Service/deployment/job
    • Helm Chart
  • Add that Docker image caching is node dependent and that there is no cluster-wide image repo leading to varying PodInitializing times

[Documentation]: Update VD Policy to refer to patching

What documentation issue are you reporting

Incomplete Documentation

Issue

In response to SPACe EPCC PL (Mark Sawyer) comment, can we please update the "EIDF Data Science Cloud Policies", with (something like) the following section at the top:

Patching of user VMs

While the EIDF updates and patches the hypervisors and the Cloud manager software as part of the EIDF Maintenance sessions, it is the responsibility of project PIs to keep the VMs in their projects up to date. To help with this endeavour, the Ubuntu operating system patches for security and alerts users on log-on to the VMs to reboot them as necessary for the changes to take effect. It also encourages users to update packages.
