
ai-guide's Introduction

Mozilla AI Guide

Mozilla champions an open, accessible Internet where people have the reins. We welcome these recent amazing new AI breakthroughs.

However, with substantial corporate dollars being invested in AI R&D, it is hard for engineers new to the scene, junior and senior alike, to identify which paths lead to sustainable open software. We've seen this story before.

Mozilla's efforts in AI are more than just technical - they're a call to action and unity across the currently fragmented open source AI Community.

Our AI Guide is a living, breathing resource, rooted in collaboration and community input for experts and newcomers alike, and we invite you to build alongside us.

We are currently welcoming contributions for:

  • Fine-tuning LLMs
  • Building LLMs from scratch
  • Multi-modal LLMs
  • Audio & Video models
  • Image models

Contribution Guide

Join our Discord →

PRs are open. Email [email protected] if you need to reach us.

Installation

Getting Started

This project is a static website built with HTML, CSS, and JS. We use Webpack to bundle everything into the dist/ folder. The project also uses Nunjucks templates, which the html-bundler-webpack-plugin converts to HTML.

Installation

These instructions assume you have NodeJS installed.

To build the AI Guide from source and run the site locally, clone the repo from GitHub, then install the dependencies:

npm install

Running npm install will install the NPM dependencies.

Make it run

Build the site and start the web server with:

npm start

That will run the webpack dev server.

View the site at http://localhost:8000/

Build static HTML files

npm run build

Webpack will output all HTML files to a folder called dist.

AI Guide-specific instructions

The AI Guide is hosted in this repo, and uses a slightly different Markdown-flavored templating system, but the same npm steps above. It also uses Tailwind for CSS and doesn't use Protocol.

Content for the guide is generated from Markdown files in templates/ai/content using scripts in tools/. To generate fresh content:

tools/build_ai_guide.sh

Note that pages in /pages/ai/content should always be generated using the script above.

To run the server:

tools/build_ai_guide.sh
npx tailwindcss -w
npm run start

Go to /

Folder Hierarchy

All Nunjucks files are either located in the templates/ folder or the pages/ folder. The templates/ folder contains base templates that can be extended, or partials, which can be included in the files in the pages/ folder.

The pages/ folder contains the Nunjucks files which will be compiled to HTML and used on the MIECO site.

Deploy

Branches in the pull request queue will be given a demo server by Netlify. The bot will comment on the PR with the link.

The main branch is automatically deployed to the staging server https://mozilla-ai-guide.netlify.app/

To deploy to production, push the main branch to the prod branch:

git push origin main:prod

ai-guide's People

Contributors

couci, heaversm, johnshaughnessy, lmorchard, pmac, skyfallsin, webdiscus

ai-guide's Issues

New Topic Idea: On-device AI development environments

Please describe your issue

Setting up a local development environment that uses on-device hardware for machine learning can be difficult to navigate, given the wide range of projects, matching hardware to software, and understanding chains of dependencies for various workflows.

Several team members within Mozilla Innovation are building (or have already built) multi-GPU hardware setups for doing small model training and ML application on local hardware (instead of within cloud environments). On-device ML can be especially compelling for exploring privacy and security-sensitive personal computing use cases, such as referencing documents on the local file system or browser history augmentation.

Describe the solution you'd like to see

A section that explores the considerations for doing on-device machine learning could include tested projects and hardware combinations, a rough idea of hardware specifications for different types of work (e.g. inference, RAG, fine-tuning), an overview of how to work with CUDA, recommended operating system / development environment best practices for configuring a system so that dependencies for individual projects are kept separate, and guidance on multi-user environments.

AI Basics: Missing word "in"

Sub-section title

WHAT ARE THE PROS & CONS OF USING AN LLM?

Please describe your issue

The word "in" is missing from the first sentence in the section.

Describe the solution you'd like to see

The sentence should state (missing "in" bolded for clarity):

"Although LLMs have made it possible for computers to process human language in ways that have been previously difficult, if not impossible, they are not without trade-offs:"

Include a references section in each guide document

Please describe your problem or observation

While Mozilla is, I think in general, considered a trustworthy source, including a references section would help to reinforce that this guide is part of a larger research and educational effort.

The first two chapters of the guide (AI Basics and Language Models 101) are especially concerning: given the large volume of technical terminology, references would serve to further legitimize the guide. For anyone new to the space, a references section is a valuable tool for deeper learning.

Corollary: I don't think in-text citations are necessary. While they might be a nice addition (or even in-text hyperlinks for technical terms), the guide is not meant to be an academic paper, and including them may serve to distract the audience from the overall purpose of the guide.

Describe the solution you'd like to see

A detailed references section for each chapter pointing to valuable external resources that support the claims made in the guide.

Missing section about ML Monitoring

Sub-section title

Monitoring ML Models

Please describe your issue

I think the current topic list is missing a section on ML monitoring. It can be about techniques and algorithms to monitor language and vision models.

Describe the solution you'd like to see

I would like the section to touch on points like:

  • Model performance estimation for language and vision models.
  • Monitoring estimated and realized performance of AI models.
  • How to detect data drift in language and vision models.
  • Issue resolution techniques to address model failures.

Language Models 101 confusing spelling of "top_k" vs "top-k"

Sub-section title

WHAT IS 'TOP_K' SAMPLING?

Please describe your issue

https://ai-guide.future.mozilla.org/content/llms-101/#what-is-top_k-sampling
The sidebar nav says top_k (underscore).
The page heading id says top_k (underscore).
The content switches to top-k (hyphen).

Same w/ top_p vs top-p in the nav and section below it.

Describe the solution you'd like to see

🤷
I'm not an AI person so I'm not sure what the correct term is. I'd assume hyphenated.

Language Models 101 - Missing word "on"

Sub-section title

HOW DOES A TYPICAL TRAINING RUN WORK?

Please describe your issue

The third sentence of the first paragraph is missing the word "on" between "errors" and "these".

Describe the solution you'd like to see

The third sentence in the first paragraph should read (missing "on" bolded for clarity):

"The model then uses its prediction errors on these inputs to repeatedly update its parameters through an optimization algorithm like stochastic gradient descent (SGD) or Adam."

Language Models 101 - Lowercase typo in first paragraph

Sub-section title

WHAT'S THE DIFFERENCE BETWEEN A "LANGUAGE MODEL" AND A "LARGE LANGUAGE MODEL"?

Please describe your issue

The third sentence of the section leads with a lowercase "usually".

Describe the solution you'd like to see

"usually" should be capitalized:

"Usually, a LLM provides higher quality results than smaller LMs due to its ability to capture more complex patterns in the data."

Typo: "bleeding age" instead of "bleeding edge"

Sub-section title

Our First Project - Summarization

Please describe your issue

The last paragraph in the "A brief pause for context" section reads:

However, once we have a workflow to address all of the above, you will have the means to forever be on the bleeding age of published AI research.

"bleeding age" should probably be "bleeding edge".

Describe the solution you'd like to see

"bleeding age" should probably be "bleeding edge".

AI Basics: Replace scare quotes with italics

Sub-section title

WHEN I SEND A TRANSFORMER-BASED LLM A “PROMPT”, WHAT HAPPENS INTERNALLY IN MORE TECHNICAL TERMS?

Please describe your issue

Throughout this section (and possibly elsewhere) technical terms are wrapped in single quotes. For example, the following:

It uses a Transformer architecture which allows it to pay varying levels of 'attention' to different parts of the input sequence at each step of the encoding process.

Additionally, single quotes have been used to wrap example natural language phrases:

For instance, a word-level tokenizer will convert the sentence "I love coding" into 'I', 'love', 'coding'.

And in at least one instance (in section "WHY ARE PEOPLE CONCERNED ABOUT LLMS?") double quotes are actually used as scare quotes:

The tendency of users to view LLM-powered tools as “officious oracles” can lead humans to make flawed or harmful decisions based on the biases and misinformation these systems can produce.

Describe the solution you'd like to see

Technical terms should be written with an alternative typography (i.e. italics or boldface) to make clear that these aren't terms that are used loosely, but that are technical terms used in practice. For example:

It uses a Transformer architecture which allows it to pay varying levels of attention to different parts of the input sequence at each step of the encoding process.

Choosing ML Models doesn't have a button to proceed to the next section (Notable Projects)

Sub-section title

No response

Please describe your issue

https://ai-guide.future.mozilla.org/content/choosing-ml-models/ then scroll to the bottom of the page.
I expected a consistent button to take me to the next section (Notable Projects, according to the sidebar). But there is no such button. I'm lost and scared in an abyss, surrounded by despair.

There is, however, a "Share your Feedback" button that takes me to a Google Form. Not sure if that is intentional (for survey needs) or a leftover from the old system, where the "Feedback form" has since been replaced by the Contribution Guide link in the sidebar.

Describe the solution you'd like to see

No response

AI Basics has confusing label on "Next" button

Sub-section title

No response

Please describe your issue

The button at the bottom of the page says "LLMs 101", but the sidebar seems to show that we renamed that section to "Language Models 101". The button label should be renamed to match.

Describe the solution you'd like to see

- LLMS 101
+ Language Models 101

Language Models 101 - Font change associated with <code> elements

Sub-section title

WHAT IS FREQUENCY?

Please describe your issue

The text in the WHAT IS FREQUENCY? section is inheriting a different font-family from the rest of the text in the document. Each of the text sections related to a figure has additional nested elements, one of which is passing CSS classes to the text, including the ff-montserrat class, which is overriding the document font-family.

Perhaps this is intentional, but I don't read it as such given the rest of the formatting. If the font is intended to be different, perhaps some additional formatting is warranted.

This also applies to the sections WHAT IS TEMPERATURE? and WHAT IS 'TOP_P' SAMPLING?

Describe the solution you'd like to see

All text in the document should have a consistent font style.

AI Basics inconsistent numbering for "Attention Mechanism"

Sub-section title

AI Basics: When I send a Transformer-based LLM a “prompt”, what happens internally in more technical terms?

Please describe your issue

I'm unclear on what "Attention Mechanism" is. It seems to break the numbering, although the numbering continues at "6. Output Generation" below. Should it be "6. Attention Mechanism" and then "7. Output Generation"?

Describe the solution you'd like to see

Fix indentation/numbering? I'm unclear if "Attention Mechanism" is part of "Decoder" or not.

AI Basics: What are the pros & cons of using an LLM

Sub-section title

AI Basics

Please describe your issue

https://ai-guide.future.mozilla.org/content/ai-basics/#what-are-the-pros-cons-of-using-an-llm something is wrong with that slug. IIRC we looked at it before and the & wasn’t encoding so it is probably pros--cons instead of pros-cons.

a href="/content/ai-basics/#what-are-the-pros-cons-of-using-an-llm"
h4 id="what-are-the-pros--cons-of-using-an-llm"

Describe the solution you'd like to see

Change the href to use pros--cons so it finds the anchor and fixes the sidebar nav.
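
One plausible way the mismatch above can arise is sketched below. Both functions are hypothetical and are not taken from the AI Guide's build scripts; they only illustrate how two slug rules can disagree on a heading containing "&". The first drops the "&" but converts each remaining space to a hyphen, and the second also collapses runs of separators.

```javascript
// Hypothetical slug helpers for illustration only; the site's real
// slug logic may differ.

// Variant A: strip punctuation, then turn every whitespace character
// into a hyphen. Removing "&" leaves two adjacent spaces, so the
// result contains "--".
function slugifyKeepRuns(heading) {
  return heading
    .toLowerCase()
    .replace(/[^a-z0-9\s-]/g, "")
    .replace(/\s/g, "-");
}

// Variant B: same as A, but collapse any run of whitespace or hyphens
// into a single hyphen.
function slugifyCollapseRuns(heading) {
  return heading
    .toLowerCase()
    .replace(/[^a-z0-9\s-]/g, "")
    .replace(/[\s-]+/g, "-");
}

const heading = "What are the pros & cons of using an LLM?";
console.log(slugifyKeepRuns(heading));     // what-are-the-pros--cons-of-using-an-llm
console.log(slugifyCollapseRuns(heading)); // what-are-the-pros-cons-of-using-an-llm
```

If the heading id and the sidebar href come from different rules like these, using a single shared function for both would keep them in sync.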

Just wanted to say thank you for this resource

Please describe your problem or observation

I'm upskilling in AI and I found this guide very interesting.

Also using "colab" is very convenient.

Describe the solution you'd like to see

No response

Bad link on homepage for Evaluating Models > measure summarization performance

Sub-section title

Evaluating-models

Please describe your issue

https://ai-guide.future.mozilla.org/#Evaluating-models

the 4th paragraph has a link saying "measure summarization performance" which seems to take me to a disambiguation page on wikipedia: https://en.wikipedia.org/wiki/ROUGE_(metric

I think that trailing comma in the rendered page is supposed to be part of the link itself.

- https://en.wikipedia.org/wiki/ROUGE_(metric
+ https://en.wikipedia.org/wiki/ROUGE_(metric)

Describe the solution you'd like to see

I need to clone the site and see if it appears anywhere else.
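
A rough way to check for other occurrences, assuming the built HTML is available as a string (for example, read from files under dist/), is to flag any href whose parentheses do not balance. The function names and the sample markup are hypothetical, and the regex is a simple sketch that assumes double-quoted href attributes rather than a full HTML parser.

```javascript
// Returns true when the URL has a ")" without a matching "(" or an
// "(" that is never closed.
function hasUnbalancedParens(url) {
  let depth = 0;
  for (const ch of url) {
    if (ch === "(") depth += 1;
    else if (ch === ")") depth -= 1;
    if (depth < 0) return true; // closing paren with no opener
  }
  return depth !== 0; // leftover unclosed "("
}

// Collects double-quoted href values and keeps the suspicious ones.
function findSuspectLinks(html) {
  const hrefs = [...html.matchAll(/href="([^"]*)"/g)].map((m) => m[1]);
  return hrefs.filter(hasUnbalancedParens);
}

// Sample markup made up for this sketch.
const sample =
  '<a href="https://en.wikipedia.org/wiki/ROUGE_(metric">broken</a>' +
  '<a href="https://en.wikipedia.org/wiki/ROUGE_(metric)">fixed</a>';

console.log(findSuspectLinks(sample));
// [ 'https://en.wikipedia.org/wiki/ROUGE_(metric' ]
```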

Q: Should sidebar navigation auto-open depending on the current page?

Please describe your problem or observation

  1. Go to https://ai-guide.future.mozilla.org/
  2. Expand the "AI Basics" sidebar on the left and click on "What exactly is an LLM?"

Page refreshes to https://ai-guide.future.mozilla.org/content/ai-basics/#what-exactly-is-an-llm, as expected, but the sidebar navigation is closed now and I have to reopen it. Not sure if keeping the navigation open when the page loads makes it any more obvious. Could possibly be confusing if the navigation wasn't open and they clicked top level nav and now we show 20 new H4 nav bullets.
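
If auto-opening were wanted, one minimal sketch is to match the current path against each section's path and expand only the matching one. The section list and function name below are assumptions for illustration, using the paths that appear in the URLs above; the real sidebar markup and data may look different.

```javascript
// Hypothetical section list; the real one would come from the sidebar.
const sections = [
  { title: "AI Basics", path: "/content/ai-basics/" },
  { title: "Language Models 101", path: "/content/llms-101/" },
];

// Returns the title of the section whose path prefixes the current
// pathname, or null when no section matches (e.g. on the homepage).
function sectionToOpen(pathname, sectionList) {
  const match = sectionList.find((s) => pathname.startsWith(s.path));
  return match ? match.title : null;
}

console.log(sectionToOpen("/content/ai-basics/", sections)); // AI Basics
console.log(sectionToOpen("/", sections));                   // null
```

In a browser this would be called with window.location.pathname. Because the hash is not part of the pathname, clicking a sub-heading link such as #what-exactly-is-an-llm would still resolve to the same section.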

Describe the solution you'd like to see

No response
