gangagsoc2024's Introduction

Challenge for Ganga projects in GSoC 2024

The challenge that forms part of the Ganga projects in GSoC is divided up into two pieces.

  1. A part that demonstrates basic proficiency in Ganga and the ability to work with Python.
  2. A part that demonstrates that you can integrate communication with an LLM from inside a Python script.

Setup

Following the steps below will ensure that you can work freely on your project and submit code by pushing it to GitHub, while keeping it private. Please avoid making your repository public, as we want each student to work on this independently.

  • Create a duplicate of the repository following the instructions. Then make that duplicate private. Please do not make a normal fork as that will then prevent you from making the repository private.
  • Give the GitHub users egede, alexanderrichards, mesmith75 access to your repository as collaborators.

For the actual work on the challenge you need a Linux environment. This can be a native Linux machine, a virtual machine or WSL. Your Python version should be 3.8 or higher. To set up the working environment, we suggest using a virtual environment in the following way:

python3 -m venv GSoC
cd GSoC/
. bin/activate
python -m pip install --upgrade pip wheel setuptools
python -m pip install -e git+https://github.com/YOUR-GITHUB-USERNAME-HERE/GangaGSoC2024#egg=gangagsoc

Through the dependency, this will install Ganga as well, such that you can work with it directly inside the virtualenv.
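
As a quick sanity check of the environment described above, a small script along the following lines could confirm the Python version and whether a virtual environment is active. This is only a sketch: the function names are made up for illustration and are not part of Ganga or of the challenge repository.

```python
import sys

MIN_VERSION = (3, 8)  # minimum Python version stated in the setup instructions


def python_is_supported(version_info=None):
    """Return True if the given (or running) interpreter is at least MIN_VERSION."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_VERSION


def in_virtualenv():
    """Return True inside a venv, where sys.prefix differs from sys.base_prefix."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Python version OK:", python_is_supported())
    print("Inside virtual environment:", in_virtualenv())
```

Running it after `. bin/activate` should report that a virtual environment is active.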

Communication

Communication is an important part of working in GSoC. We use the CERN Mattermost server for instant messaging about the project. Please

  • Create a CERN external account by following the instructions
  • Follow the link to join the Ganga Team in Mattermost. You can install Mattermost as an application, or you can just use it inside your browser.
  • When you have joined the Ganga Team, please join the GSoC2024 channel.
  • Introduce yourself with a few words to the channel.

It is by far best if most communication is public in the Mattermost channel, but you can also instant message @egede, @masmith and @arichard about more specific issues. Please do not post solutions to the challenge in the public channel, but you are welcome to discuss issues regarding the challenge there.

Completing challenge

To complete the challenge you should push everything to the main branch of your private duplicate of the repository. In the repository, we expect

  • That the setup.py file is updated with any extra Python package dependencies that you may have introduced.
  • That the file PROJECT.md is updated to document what you have done and how we can test it. You can also include images or short screen-grabbed movies that illustrate functionality.
  • That a file CV.pdf containing your CV is added.
  • That you have implemented tests of the code that can be run with python -m pytest to illustrate that everything works as expected. Tests should be placed in the directory test and have self-explanatory names.

Ganga initial task

Start by performing this task.

  1. Demonstrate that you can run a simple Hello World Ganga job that executes on a Local backend.
  2. Create a job in Ganga that demonstrates splitting a job into multiple pieces and then collates the results at the end.
     • Use the included file LHC.pdf.
     • Create a job in Ganga that, in Python (or through system calls), splits the PDF file into individual pages.
     • Create a second job in Ganga that counts the number of occurrences of the word "it" in the text of the PDF file. It should be counted whether it is capitalised or not. Make sure not to count other words that have the letters "it" inside them. So "It is best when it uses Ganga" should have a count of two, while "The initial test did no work" should have a count of zero.
     • Using the ArgSplitter, create subjobs that each count the occurrences for a single page.
     • Create a merger that adds up the numbers extracted from each page and places the total into a file.
  3. Create test cases that demonstrate what you have done and that it is working. In the test directory you will find an example of a trivial test. All tests can be executed with python -m unittest when run from the gangagsoc directory. To write tests that include Ganga objects, take inspiration from the existing tests. Note the use of sleep_until_completed.
  4. Confirm that your new tests are running successfully within the defined GitHub action. Click on "Actions" in the web interface of your repository. The tests will run each time you push a file to the "main" branch.
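
The counting rule and the final summation described in the task can be illustrated outside Ganga with plain Python. The sketch below is one possible reading of the rule, not a reference solution: the names count_it and sum_count_files are invented for this example, the per-page counts are assumed to be stored as one integer per text file, and how these pieces are attached to a Ganga job, splitter or merger is deliberately left out.

```python
import re
import tempfile
from pathlib import Path

# \b marks a word boundary, so "initial" and "split" do not match.
IT_PATTERN = re.compile(r"\bit\b", re.IGNORECASE)


def count_it(text):
    """Count stand-alone occurrences of the word "it", ignoring case."""
    return len(IT_PATTERN.findall(text))


def sum_count_files(file_list, output_file):
    """Add up the integer stored in each per-page file and write the total."""
    total = sum(int(Path(name).read_text().strip()) for name in file_list)
    Path(output_file).write_text(f"{total}\n")
    return total


if __name__ == "__main__":
    # The two example sentences from the task stand in for two PDF pages.
    pages = ["It is best when it uses Ganga", "The initial test did no work"]
    with tempfile.TemporaryDirectory() as workdir:
        count_files = []
        for number, page_text in enumerate(pages):
            path = Path(workdir) / f"count_{number}.txt"
            path.write_text(str(count_it(page_text)))
            count_files.append(path)
        print(sum_count_files(count_files, Path(workdir) / "total.txt"))  # prints 2
```

Note that with this pattern a contraction such as "it's" also counts as an occurrence of "it", because the apostrophe is a word boundary. Whether that is desired is a choice to document in PROJECT.md.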

Interfacing Ganga

The purpose of this challenge is to demonstrate that you can communicate with a Large Language Model in a programmatic way. Which LLM you use is for you to decide and the performance of the LLM itself (in terms of the quality of the replies) is irrelevant.

  1. Create code that sets up a conversation with an LLM of your choice.
  2. Craft a set of questions to the LLM that will make it write code that can execute a job in Ganga that will calculate an approximation to the number pi using an accept-reject simulation method with one million simulations. The job should be split into a number of subjobs that each do a thousand simulations.
  3. Write a test that executes the proposed code. The test should succeed if the test tries to execute the code in Ganga. It doesn't matter if the code proposed by the LLM works.
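
For orientation, the accept-reject method mentioned above can be sketched in plain Python, independent of Ganga and of any LLM. Random points are thrown into the unit square, and the fraction that lands inside the quarter circle of radius one approximates pi/4. One million simulations in pieces of a thousand gives 1000 pieces. The function names and the use of the piece index as a random seed are assumptions made for this illustration only.

```python
import random

TOTAL_SIMULATIONS = 1_000_000
SIMULATIONS_PER_SUBJOB = 1_000


def count_accepted(n_points, seed):
    """Throw n_points random points in the unit square; count those inside the quarter circle."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            accepted += 1
    return accepted


def estimate_pi(total=TOTAL_SIMULATIONS, per_subjob=SIMULATIONS_PER_SUBJOB):
    """Combine the accepted counts from total // per_subjob independent pieces."""
    n_subjobs = total // per_subjob
    # Each piece gets its own seed so the pieces do not repeat the same points.
    accepted = sum(count_accepted(per_subjob, seed) for seed in range(n_subjobs))
    return 4.0 * accepted / (n_subjobs * per_subjob)


if __name__ == "__main__":
    print(TOTAL_SIMULATIONS // SIMULATIONS_PER_SUBJOB, "subjobs")  # prints "1000 subjobs"
    print(estimate_pi())
```

In a Ganga version of this, each call to count_accepted would correspond to one subjob and the summation to the merging step.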

Feedback on challenge

You are welcome to seek feedback on your solution to the challenges while they are still work in progress. For feedback, send a private message to @egede, @masmith and @arichard in Mattermost. Make sure that you have given us access in GitHub as directed above.

Submission

When you are ready to submit, then please use the Google form to let us know.

gangagsoc2024's Issues

Clarification on GUI Task

You are welcome to do all three parts, but if you are only interested in applying for the GUI project, you will not be disadvantaged by not attempting the one related to the persistent storage.

Is the GUI task to be attempted by everyone, even those who are primarily looking to apply for the persistent storage task?
