
agnostiqhq / covalent


Pythonic tool for orchestrating machine-learning/high performance/quantum-computing workflows in heterogeneous compute environments.

Home Page: https://www.covalent.xyz

License: Apache License 2.0

Python 73.78% Dockerfile 0.30% HTML 0.07% CSS 0.06% JavaScript 24.49% Jupyter Notebook 1.24% Mako 0.04% Shell 0.02%
hpc workflow workflow-automation workflow-management quantum-computing hpc-applications python machine-learning pipelines covalent

covalent's People

Contributors

alejandroesquivel, andrew-s-rosen, annagwen, araghukas, aravind-psiog, arunpsiog, cjao, dependabot[bot], dwelsch-esi, emmanuel289, filipbolt, fyzhsn, haimhorowitzagnostiq, jkanem, kessler-frost, madhur-tandon, mpvgithub, poojithurao, prasy12, pre-commit-ci[bot], ravipsiog, ruihao-li, sayandipdutta, scottwn, sriranjanivenkatesan, udayan853, valkostadinov, venkatbala, wingcode, wjcunningham7


covalent's Issues

Deprecate and possibly remove unused functions and conditions

There are certain functions in covalent which are not currently used and could be removed. Examples include check_consumable and check_constraint_specific_sum in lattice.py, some utility functions, and the schedule condition in _plan_workflow (_plan_workflow itself could also be removed, though that is less important).

Plotting inside lattice creates matplotlib process that hangs indefinitely

@FyzHsn commented on Thu Dec 30 2021

The following code snippet can be used to recreate the issue:

import covalent as ct
import matplotlib.pyplot as plt

@ct.electron
def get_plot():
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3, 4], [5, 6, 7, 8])
    return ax

@ct.lattice
def workflow():
    p1 = get_plot()
    p2 = get_plot()
    return p1, p2

This will start a matplotlib process that hangs. The following modifications to the code snippet did not resolve the issue:

@ct.electron
def get_plot():
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3, 4], [5, 6, 7, 8])
    plt.close()
    return ax

@ct.lattice
def workflow():
    p1 = get_plot()
    plt.close()
    p2 = get_plot()
    plt.close()
    return p1, p2

Edit:

This hangs when doing:

id = workflow.dispatch()
res = ct.get_result(dispatch_id=id, wait=True)
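One possible workaround, not verified against the issue: force matplotlib's non-interactive "Agg" backend before pyplot is imported, so no GUI event loop is started inside the electron process, and close the specific figure explicitly. The @ct.electron decorator is omitted here so the sketch runs standalone.

```python
# Workaround sketch (an assumption, not from the issue): use the
# non-interactive "Agg" backend so no GUI process is spawned.
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

def get_plot():
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3, 4], [5, 6, 7, 8])
    plt.close(fig)  # close this specific figure to release its resources
    return ax
```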

Remove dependence on sentinel module

The sentinel module (or a sentinel object) is used in a few places where a dictionary would work just fine. For simplification purposes, and because the sentinel module isn't widely available (it's not on conda-forge), the dependency on it should be removed.

This should be as simple as:

  • replacing the one sentinel object (_DEFAULT_CONSTRAINT_VALUES in covalent/_shared_files/defaults.py) with a dictionary with the same information
  • changing any access method used on _DEFAULT_CONSTRAINT_VALUES to one suitable for dictionaries.
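A hypothetical sketch of the proposed change in covalent/_shared_files/defaults.py: the sentinel object becomes a plain dictionary carrying the same defaults, and attribute-style access becomes ordinary dictionary access. The field names below are illustrative, taken from metadata shown elsewhere in these issues.

```python
# Illustrative replacement of the sentinel object with a dictionary.
# Field names here are examples only, not the full real default set.
_DEFAULT_CONSTRAINT_VALUES = {
    "backend": "local",
    "schedule": False,
}

# Attribute-style access such as _DEFAULT_CONSTRAINT_VALUES.backend
# then becomes ordinary dictionary access:
backend = _DEFAULT_CONSTRAINT_VALUES["backend"]
default_schedule = _DEFAULT_CONSTRAINT_VALUES.get("schedule", False)
```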

Replace `Optional[T]` with `T` for default args that expect `T`, not `None`

In quite a few places, function parameters with defaults are annotated with typing.Optional[T] (where T could be str, int, ...). I propose changing these to T wherever T is the expected type of the variable for the rest of the block and an explicit None is not valid.

The typing.Optional docs say the following about default-value type annotations.

Note that this is not the same concept as an optional argument, which is one that has a default. An optional argument with a default does not require the Optional qualifier on its type annotation just because it is optional.


This is not merely an aesthetic concern; in my experience, Optional qualifiers can create annoying type-checking issues with mypy. For example, in test.py:

from typing import Optional
def foo(arg: str) -> None:
    pass

def bar(arg: Optional[str] = "") -> None:
    foo(arg)
    return

Running mypy:

> mypy .\test.py
.\test.py:6: error: Argument 1 to "foo" has incompatible type "Optional[str]"; expected "str"
Found 1 error in 1 file (checked 1 source file)

Then you will need to do one of the following:

from typing import cast
def bar(arg: Optional[str] = "") -> None:
    foo(cast(str, arg))
    return

or

def bar(arg: Optional[str] = "") -> None:
    assert arg is not None
    foo(arg)
    return

or

def bar(arg: Optional[str] = "") -> None:
    if isinstance(arg, str):
        foo(arg)
    return

Although a single cast, assert, or isinstance check may not be a big deal, having to use them repeatedly, especially in loops, looks unclean and reduces performance.


Originally posted by @sayandipdutta in #75 (comment)

Related: #75 (comment)

OSError Too many open files error while using server dispatch

@FyzHsn commented on Thu Dec 09 2021

When using the lattice.dispatch method to dispatch a workflow with many electrons, we get the following error:

distributed.worker - WARNING - Compute Failed
Function:  to_run
args:      ('use-nice-word', '/Users/faiyaz/Code/nimbus/doc/source/tutorials/quantum_chemistry/results/')
kwargs:    {}
Exception: "OSError(24, 'Too many open files')"

To recreate this issue:

import covalent as ct

# !pip install ase
# !pip install ase-notebook
from ase import Atoms
from ase.calculators.emt import EMT
from ase.constraints import FixAtoms
from ase.optimize import QuasiNewton
from ase.build import fcc111, add_adsorbate
from ase.io import read
from ase.io.trajectory import Trajectory
from ase_notebook import AseView, ViewConfig

@ct.electron
def construct_cu_slab(
    unit_cell=(4, 4, 2),
    vacuum=10.0,
):
    slab = fcc111("Cu", size=unit_cell, vacuum=vacuum)
    return slab


@ct.electron
def compute_system_energy(system):
    system.calc = EMT()
    return system.get_potential_energy()


@ct.electron
def construct_n_molecule(d=0):
    return Atoms("2N", positions=[(0.0, 0.0, 0.0), (0.0, 0.0, d)])


@ct.electron
def get_relaxed_slab(slab, molecule, height=1.85):
    slab.calc = EMT()
    add_adsorbate(slab, molecule, height, "ontop")
    constraint = FixAtoms(mask=[a.symbol != "N" for a in slab])
    slab.set_constraint(constraint)
    dyn = QuasiNewton(slab, trajectory="/tmp/N2Cu.traj", logfile="/tmp/temp")
    dyn.run(fmax=0.01)
    return slab

@ct.lattice
def compute_energy(initial_height=3, distance=1.10):
    N2 = construct_n_molecule(d=distance)
    e_N2 = compute_system_energy(system=N2)

    slab = construct_cu_slab(unit_cell=(4, 4, 2), vacuum=10.0)
    e_slab = compute_system_energy(system=slab)

    relaxed_slab = get_relaxed_slab(slab=slab, molecule=N2, height=initial_height)
    e_relaxed_slab = compute_system_energy(system=relaxed_slab)
    final_result = e_slab + e_N2 - e_relaxed_slab

    return final_result

optimize_height = ct.electron(compute_energy)

import numpy as np


@ct.lattice
def vary_distance(distance):
    result = []
    for i in distance:
        result.append(optimize_height(initial_height=3, distance=i))
    return result

distance = np.linspace(1, 1.5, 30)
dispatch_id = vary_distance.dispatch(distance=distance)

result = ct.get_result(dispatch_id, wait=True)
print(result.status)

The error message can be found in ~/.cache/covalent/dispatcher.log.

This issue is possibly related to issue #184.

Actionables:

  1. If possible, write a test that fails before implementing the bug fix.
  2. Fix the Quantum Chemistry tutorial and change the serverless dispatch to the server dispatch once the bug is fixed.

Improve Contributing.md guide to be more user friendly

This issue is dedicated to making the CONTRIBUTING.md file more user friendly.

  • Add a documentation section and instructions on (Faiyaz):
    • Building the documentation locally (including installation).
    • Checking the HTML files in the browser.
    • Exposing the API documentation.
    • Adding new modules.
    • Installing pre-commit hooks.
  • Add instructions on how to create new branches (@HaimHorowitzAgnostiq).
  • Add instructions on linking a PR to its issue number (@HaimHorowitzAgnostiq).
  • Add instructions and links on how/when to open pull requests (@HaimHorowitzAgnostiq).
  • Add links on resolving merge conflicts.

is_dispatcher_running and is_ui_running functions for better error handling.

@santoshkumarradha commented on Sun Jan 02 2022

Throughout the codebase, we make API requests to local servers without proper error handling, leading to the issues listed in #224. We should add the usual pattern of checking whether the Flask servers are actually running in each respective module, e.g.

from covalent_dispatcher import is_dispatcher_running
from covalent_ui import is_ui_running
if is_dispatcher_running():  ...
if is_ui_running():  ...

and then raise a meaningful error. Currently, if the dispatcher is not running and we call dispatch, we wait and then throw an HTTP request error for the API.


@santoshkumarradha commented on Mon Jan 03 2022

Related to #224


@FyzHsn commented on Mon Jan 31 2022

This task can be broken down into the following subparts:

  • Determine where the two methods should be implemented.
  • Write test for is_ui_running().
  • Write test for is_dispatcher_running.
  • Implement is_ui_running().
  • Implement is_server_running().
  • The status of whether the servers are running can be checked using the _read_pid methods in covalent_dispatcher/_cli/service.py.
  • The filepaths for the pid can be read from covalent_dispatcher/_cli/service.py as well. The relevant variables are DISPATCHER_PIDFILE and UI_PIDFILE.
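The steps above can be sketched as follows. This is a minimal, hypothetical implementation, not Covalent's actual code: the simplified _read_pid here stands in for the one in covalent_dispatcher/_cli/service.py, and the pidfile path is passed in rather than read from DISPATCHER_PIDFILE or UI_PIDFILE.

```python
import os

def _read_pid(pidfile: str) -> int:
    """Return the pid stored in pidfile, or -1 if the file is absent."""
    try:
        with open(pidfile) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return -1

def is_server_running(pidfile: str) -> bool:
    """True if the pidfile exists and its process is alive."""
    pid = _read_pid(pidfile)
    if pid < 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except OSError:
        return False
    return True
```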

@FyzHsn commented on Mon Jan 31 2022

@santoshkumarradha @wjcunningham7 How do you feel about the interface for the is_server_running functionality being:

from covalent import is_ui_running, is_dispatcher_running

and also place the functionalities inside _cli/service.py?

Also, another alternate suggestion:

import covalent as ct

ct.is_server_running(type="ui")
ct.is_server_running(type="dispatcher")

Lastly, in terms of code placement, I could also put is_dispatcher_running inside covalent_dispatcher/app.py, and similarly for the UI server.

Add CI/CD checks to remind about, or catch, missing version and changelog updates in PRs

What should we add?

Post updates to PR contributors similar to:

Hello. You may have forgotten to update the changelog!
Please edit changelog.md with the changes you have made

for all incoming PRs, to reduce and automate the code review process. Potential things to check include:

  • version
  • changelog
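The check above can be sketched as a small pure function that a CI step could run over the list of files a PR touches (e.g. the output of `git diff --name-only`). The file names and reminder wording are assumptions, not an existing Covalent workflow.

```python
# Hedged sketch of the proposed CI check. REQUIRED_UPDATES and the
# reminder text are illustrative assumptions.
REQUIRED_UPDATES = ("CHANGELOG.md", "VERSION")

def missing_updates(changed_files):
    """Return the required files that do not appear in changed_files."""
    changed = set(changed_files)
    return [f for f in REQUIRED_UPDATES if f not in changed]

# A CI step could fail the job and post a reminder comment such as:
# "Hello. You may have forgotten to update the changelog!"
```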

Describe alternatives you've considered.

None.

Unnecessary `electron` metadata

Environment

  • Covalent version: 0.22.6
  • Python version: 3.8.12
  • Operating system: macOS 11.6

What is happening?

electron metadata contains many unnecessary fields such as num_cpu, num_gpu, memory, etc. This is due to the default_constraints_dict in covalent/_shared_files/defaults.py.

How can we reproduce the issue?

import covalent as ct

@ct.electron
def sample_task(x):
    return x ** 2

@ct.lattice
def sample_workflow(a, b):
    c = sample_task(a)
    d = sample_task(b)
    return c, d


sample_workflow.build_graph(1, 2)

graph = sample_workflow.transport_graph.get_internal_graph_copy()
node_metadatas = [node_dict["metadata"] for _, node_dict in graph.nodes(data=True)]

print(node_metadatas)

This will output:

[
{'backend': 'local'}, {'schedule': False, 'num_cpu': 1, 'cpu_feature_set': [], 'num_gpu': 0, 'gpu_type': '', 'gpu_compute_capability': [], 'memory': '1G', 'backend': 'local', 'time_limit': '00-00:00:00', 'budget': 0, 'conda_env': ''}, 
{'backend': 'local'}, {'schedule': False, 'num_cpu': 1, 'cpu_feature_set': [], 'num_gpu': 0, 'gpu_type': '', 'gpu_compute_capability': [], 'memory': '1G', 'backend': 'local', 'time_limit': '00-00:00:00', 'budget': 0, 'conda_env': ''}
]

What should happen?

These fields can be moved to a different dictionary and commented. Further, that dictionary can be used as an example on how users can define their custom metadata fields and their default values. default_constraints_dict should only contain backend as a metadata field for now.

Any suggestions?

No response

Fix results directory behavior

@wjcunningham7 commented on Fri Jan 21 2022

Currently write_streams_to_file will attach the stdout and stderr files to the results directory if those files are specified using relative paths. However, this always queries the results directory in the config file when instead it should be looking at the electron metadata.

Edit:

Which stdout/stderr log files are written to is determined by priority:

  1. Top priority goes to the stdout/stderr log file paths in the covalent.conf file IF they are absolute paths. At the moment these files are overwritten, but ideally they should be appended to.
  2. If the stdout/stderr log file paths in the covalent.conf file are relative, we first check whether results_dir has been specified in the lattice metadata. If it has, that takes the second-highest priority, and the log files should be written to the results_dir/dispatch_id/ folder.
  3. Lastly, if the log file paths are relative and results_dir has not been specified for electrons, we should use the results_dir from the covalent.conf file and store the log files in the results_dir/dispatch_id folder.

In this issue, we ensure that priority 2 is indeed being checked and implemented.

More informative responses from dispatcher

What should we add?

Currently, for any error that occurs while dispatching, the dispatcher server throws a generic exception that may not make sense to the user, who then might not be able to resolve it on their own. We need the server to reply with exact details and error messages explaining why it failed to, say, start a dispatch or cancel a job.

For instance, if the server isn't running and someone tries to dispatch a workflow, the following exception is thrown:

requests.exceptions.ConnectionError: HTTPConnectionPool(host='0.0.0.0', port=48008): Max retries exceeded with url: /api/submit (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fca0a6cea90>: Failed to establish a new connection: [Errno 61] Connection refused'))

which is not sufficient to explain that the server has not been started yet. A similar exception is thrown when a cancellation request is sent for a dispatch id that doesn't exist. Hence the need for more robust and informative responses.
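One possible client-side pattern, sketched here as an assumption rather than Covalent's actual code: translate the low-level connection failure into a message the user can act on. The URL, wording, and function name are illustrative.

```python
import requests

def submit(payload, url="http://0.0.0.0:48008/api/submit"):
    """Submit a payload, converting connection failures into a clear hint."""
    try:
        response = requests.post(url, json=payload, timeout=5)
    except requests.exceptions.ConnectionError as err:
        raise RuntimeError(
            f"Could not reach the Covalent dispatcher at {url}. "
            "Is the server running? Start it with `covalent start` "
            "and try again."
        ) from err
    response.raise_for_status()
    return response.json()
```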

Describe alternatives you've considered.

No response

Ensure that covalent purge stops server before deleting pid files

Currently, when the covalent servers have been started, running covalent purge deletes the corresponding server pid files without actually stopping the servers. This creates a discrepancy in the reported server status. Furthermore, we can run endless loops of covalent start and covalent purge and start more and more servers this way. The fix is to ensure the servers are stopped before the pid files are deleted.

Remove draw function from tutorials, howtos

@wjcunningham7 commented on Wed Jan 05 2022

We want to remove the graphviz dependency. In this issue, replace any references to draw in the documentation with a UI screenshot. Then deprecate the draw function and confirm that the package is pip installable without graphviz.


@santoshkumarradha commented on Thu Jan 06 2022

@wjcunningham7 instead of deprecating, could we raise an error when graphviz is not installed, asking the user to install it? Viewing the graph without the UI might still be important in a few cases (places like these tutorials, quick prototyping before sending to the cloud without having the UI, and debugging, since the plot will not go beyond an electron that does unsupported things inside a lattice, etc.). Ideally, our dependency on graphviz comes not from truly needing it but only from plotting the coordinates, so in theory we could write our own code to replace it.

One feature we are including in the UI (not in the current version but the next one) is displaying the graph before dispatching. Until we are able to do this, I think the draw function is necessary even in tutorials (with a warning that it's not absolutely required), as there is no way yet to look at the workflow without submitting it.
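The suggestion above could be sketched as a lazy import: graphviz is only imported when draw() is actually called, so the package installs and imports fine without it, and the user gets an actionable error instead of a bare ImportError. This is a hypothetical sketch, not the shipped covalent code.

```python
def draw(graph):
    """Render the workflow graph, or explain how to enable rendering."""
    try:
        import graphviz  # optional dependency, imported lazily
    except ImportError as err:
        raise ImportError(
            "Drawing the workflow graph requires graphviz. "
            "Install it with `pip install graphviz`, or view the "
            "workflow in the Covalent UI instead."
        ) from err
    # ... render `graph` with graphviz here ...
```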

Tests for PyPI releases

What should we add?

As we saw recently, even though things worked as expected when covalent was installed from the GitHub repo, anyone who ran pip install cova would not have been able to use covalent. We should add tests/actions in pypi.yml to check that installing from pip works as intended before every merge to develop or master.

Describe alternatives you've considered.

For starters, we can add a diff check between what is built by python setup.py sdist and what is in the repo, barring files/dirs containing docs, tests, tutorials, etc.

Support arm64 architecture in Docker images

Building Docker images for the arm64 architecture fails. This can be seen by running a simple docker build on an M1 Mac, or docker build -t covalent-test --platform linux/arm64 . on an x86 Linux machine.

The failure occurs in Step 4/6 : RUN pip install --no-cache-dir --use-feature=in-tree-build /opt/covalent

The error output is very long, but it starts with

Building wheels for collected packages: cova, matplotlib, tailer, psutil
  Building wheel for cova (PEP 517): started
  Building wheel for cova (PEP 517): finished with status 'done'
  Created wheel for cova: filename=cova-0.22.6-py3-none-any.whl size=13620098 sha256=98a540f4448907e0967a8b755b2ee05b44f3bb8d7d69bb66646a0099b634863c
  Stored in directory: /tmp/pip-ephem-wheel-cache-ph74pyyw/wheels/19/65/fd/7e179e81c8b96897f364759844dc110384ba868ac7aa8a97bd
  Building wheel for matplotlib (setup.py): started
  Building wheel for matplotlib (setup.py): finished with status 'error'
  ERROR: Command errored out with exit status 1:
   command: /usr/local/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-babip5km/matplotlib_47f7efd011e745eb90af7a5745205214/setup.py'"'"'; __file__='"'"'/tmp/pip-install-babip5km/matplotlib_47f7efd011e745eb90af7a5745205214/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-w4c8bdop
       cwd: /tmp/pip-install-babip5km/matplotlib_47f7efd011e745eb90af7a5745205214/
  Complete output (617 lines):

  Edit setup.cfg to change the build options; suppress output with --quiet.

  BUILDING MATPLOTLIB
    matplotlib: yes [3.3.1]
        python: yes [3.8.12 (default, Jan 26 2022, 15:15:26)  [GCC 8.3.0]]
      platform: yes [linux]
   sample_data: yes [installing]
         tests: no  [skipping due to configuration]
        macosx: no  [Mac OS-X only]

and ends with

ERROR: Command errored out with exit status 1: /usr/local/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-babip5km/psutil_c8872f416e734ac7b9580be584ede140/setup.py'"'"'; __file__='"'"'/tmp/pip-install-babip5km/psutil_c8872f416e734ac7b9580be584ede140/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-ybj7jtfb/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.8/psutil Check the logs for full command output.
WARNING: You are using pip version 21.2.4; however, version 21.3.1 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
The command '/bin/sh -c pip install --no-cache-dir --use-feature=in-tree-build /opt/covalent' returned a non-zero code: 1

The first step should be to see what's breaking the matplotlib installation.

Improve automation for versioning and changelog

@scottwn commented on Wed Nov 24 2021

In GitLab by @scottwn on Nov 15, 2021, 14:11

Overview

It might be helpful to use tagged commit messages to automatically generate changelog and semantic version.

Technical Details

Currently GitLab CI has limitations that cause issues with this. It is not possible for the squash commit to automatically include all the commit messages from the branch. See https://gitlab.com/gitlab-org/gitlab/-/issues/26303

If the squash commit issue can be resolved, then we can use a tool like https://github.com/relekang/python-semantic-release

Expected Results

Versioning and changelog will be controlled automatically by commit messages rather than by hand.

Minor `README` fixes

What should we add?

In README.md, if the user directly copy-pastes the example code as written, it might not work as expected. Places where I found this:

  1. In README.md, for the "Without and With Covalent" example: result = ct.get_result(dispatch_id). Since the default value of the wait parameter is False, this might not give the expected result because it doesn't wait for the calculation to complete. Simply changing it to result = ct.get_result(dispatch_id, wait=True) will do the trick.

  2. In the same file, we would ideally increase the width of the "Without and With Covalent" table, using some markdown magic, in order to fit dispatch_id = run_experiment.dispatch on one line.

  3. In covalent_ui/README.md the port number is mentioned as 48008, whereas the UI server now runs at 47007 by default.

Add type hint for base executor

Some fields in the covalent/executor/base.py module are missing type hints. This issue is to ensure that all fields have type hints.

Covalent dispatcher OSError - too many open files

@FyzHsn commented on Tue Jan 04 2022


@codecov[bot] commented on Tue Jan 04 2022

Codecov Report

Merging #246 (3e78d08) into develop (434a95b) will increase coverage by 0.56%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #246      +/-   ##
===========================================
+ Coverage    70.76%   71.32%   +0.56%     
===========================================
  Files           23       23              
  Lines         1245     1245              
===========================================
+ Hits           881      888       +7     
+ Misses         364      357       -7     
Impacted Files Coverage Δ
covalent_dispatcher/_core/__init__.py 91.12% <0.00%> (+5.64%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 434a95b...3e78d08. Read the comment docs.

version workflow is reporting false failure

@scottwn commented on Thu Jan 27 2022

Example of the problem: 89f21cd

This is a squashed merge commit to develop. The version workflow should not run on it, since develop is ignored in the workflow.

However, you can see the red ❌ showing "some checks were not successful". Those checks are version runs triggered when the commit is pushed as part of a rebased branch after it has been merged to develop. They are expected to fail. We need to find a way to prevent the check from running on such pushes.

UI: post-release fixes and features

@valkostadinov commented on Tue Jan 25 2022

This will track the development of post-release fixes and additional UI features:

  • re-order sidebar sections according to latest figma designs
  • update favicon
  • remove dispatch id from tab title
  • fit new uuids
  • show lattice and electron inputs
  • display error message on failed status for lattice and electron
  • extraction and display of executor attributes (log_stdout, conda_env, etc.)
  • log file output - display in Output tab of all available log file output. Initial solution will "tail" the files by periodically making an API call that fetches last n lines to simulate near real-time tailing.
  • auto-refresh result state on initial render of listing and graph pages
  • adjust theme text primary and secondary colors
  • truncate long electron/param names inside graph nodes at 70 chars, show full name elsewhere

Switch from `workflow.dispatch` to `covalent.dispatch`

What should we add?

Currently we use workflow.dispatch to send a lattice workflow to the dispatcher server for execution. We could use a more intuitive approach for dispatching, similar to how we obtain the result of a particular dispatch: the proposal is a function covalent.dispatch, analogous to covalent.get_result, so that users send a job the same way they get its result.

Resolving this issue may also involve updating the READMEs, tutorials, documentation, etc., so it will turn into a big issue.
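A hypothetical sketch of the proposed interface (not an existing Covalent function at the time of this issue): a module-level dispatch() that simply delegates to the lattice's own dispatch method, mirroring covalent.get_result. The exact signature is an assumption.

```python
def dispatch(lattice, *args, **kwargs):
    """Dispatch `lattice` with the given arguments and return its dispatch id."""
    return lattice.dispatch(*args, **kwargs)

# Usage would then read, analogously to ct.get_result:
#   dispatch_id = ct.dispatch(workflow, a=1, b=2)
#   result = ct.get_result(dispatch_id, wait=True)
```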

Describe alternatives you've considered.

No response

Replacing dask with a suitable alternative

What should we add?

There are several things that make the use of dask in our scenario not worth it, or almost unreasonable:

  1. We already create a custom graph for deciding the order of execution.
  2. Dask creates additional servers for its scheduler, UI, etc., which occupy further ports on the local network.
  3. We mainly use dask to implement multithreading; our purposes are much simpler than how dask is meant to be used.
  4. Considering how custom-built covalent is, we might need a higher level of control over how tasks are executed.
  5. Jobs start failing when the CPU is under load; instead, tasks should wait for resource availability before being started.

Considering these issues, I propose we implement our own multithreading and remove the dask dependency.

Describe alternatives you've considered.

No response

Unreasonable amount of time spent in getting function string

After a simple profiling of a covalent example, it was found that almost 98% of the total time (to dispatch) is spent in get_serialized_function_str, and within it, getting the imports seems to be the main culprit. We need to modify or move this function so that dispatching a job doesn't take a long time when many electrons are involved.

Example used for profiling:

import covalent as ct
import numpy as np
import pennylane as qml

@ct.electron
@qml.qnode(qml.device('default.qubit', wires=1))
def calculate(x):
    for _ in range(30):
        qml.RX(x, wires=0)
        qml.RY(x, wires=0)
    return qml.expval(qml.PauliZ(0))

@ct.lattice
def parallalize(x_list):
    values = []
    for x in x_list:
        values.append(calculate(x))
    return values

job_id = parallalize.dispatch(x_list=np.arange(50))

Time taken to dispatch: ~2-3 minutes.

Afterwards, the following lines in covalent/_shared_files/utils.py : get_serialized_function_str were commented out:

# imports = _get_imports_from_source()
ct_decorators = []  # _get_cova_imports(imports)

The dispatch time is then reduced to ~0.005 seconds.

This issue blocks PR #66, as it will most likely amplify the impact.

Add heartbeat file to get dispatcher / UI server updates

Make the covalent status method more robust by checking the server status against a heartbeat file in addition to checking the existence of the dispatcher/ui .pid files. This will ensure we report the correct status when a server crashes but its pid file has not been deleted.
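A minimal sketch of the heartbeat idea. The file name, interval, and tolerance are assumptions: the server would touch the heartbeat file periodically, and the status check would treat a stale file as a crashed server even when the pid file still exists.

```python
import os
import time

HEARTBEAT_INTERVAL = 5  # assumed seconds between server heartbeats

def server_alive(heartbeat_file: str, tolerance: float = 3.0) -> bool:
    """True if the heartbeat file was touched recently enough."""
    try:
        age = time.time() - os.path.getmtime(heartbeat_file)
    except OSError:
        return False  # missing heartbeat file => server not running
    return age < tolerance * HEARTBEAT_INTERVAL
```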


Testing Docker image before deploying to ECR

We need to add a test in the GitHub workflow to check that the image runs before pushing it to ECR. I imagine this will go in publish_master.yml.

The test should start the container using the method specified in Getting Started:
https://github.com/AgnostiqHQ/covalent/blob/master/doc/source/getting_started/index.rst?plain=1#L48

and then run a very simple dispatch example.

Some ideas on how to do this properly are on Stackoverflow:

https://stackoverflow.com/questions/65330029/access-a-container-by-hostname-in-github-actions-from-within-an-action

https://stackoverflow.com/questions/67134410/how-to-test-my-dockerfile-for-my-python-project-using-github-actions

Add troubleshooting section

Acceptance Criteria

  • Should be an accessible section in Covalent RTD
  • Should cover common issues and how to resolve them (see below)

Guide

Should cover typical issues users may face including but not limited to:

  1. Issues using covalent without virtual environments (resolution: use pyenv, conda, or poetry)
  2. For long-running workflows, what to do if you see the max-depth issue
  3. For large workflows, how to increase the open-file limit
  4. Issues faced when running compute-intensive workflows (how to increase dask cluster compute)
  5. Also see: https://agnostiqworkspace.slack.com/archives/C02JS6NAFV3/p1643123017371300

Handle reading the result better in get_result(wait=True).

@santoshkumarradha commented on Wed Dec 15 2021

Overview

Currently we check whether the result has been updated by doing https://github.com/AgnostiqHQ/covalent-staging/blob/9348253dbb3037616e18c12b7b5b0d83c3f21b62/covalent/_results_manager/results_manager.py#L68. This is a bit of a bad design: when running in wait mode, we read a file that is actively being written to by the dispatcher, without proper flushing. Ideally we would wait until the file has been modified, then read it (properly flushed), rather than re-reading it endlessly in a while loop. We should use more standard observer tools/patterns for file watching, such as watchdog - https://github.com/gorakhargosh/watchdog
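A sketch of the watchdog-based pattern suggested above, using the polling observer for portability. The function name and file layout are assumptions; instead of re-reading the result file in a tight loop, the caller blocks until the file is actually modified.

```python
import os
import threading
from watchdog.events import FileSystemEventHandler
from watchdog.observers.polling import PollingObserver

class _ResultModified(FileSystemEventHandler):
    """Sets an event when the watched result file is modified."""
    def __init__(self, path):
        self.path = os.path.abspath(path)
        self.modified = threading.Event()

    def on_modified(self, event):
        if os.path.abspath(event.src_path) == self.path:
            self.modified.set()

def wait_for_update(result_file: str, timeout: float = 60.0) -> bool:
    """Block until result_file changes; True if it did within timeout."""
    handler = _ResultModified(result_file)
    observer = PollingObserver(timeout=0.2)
    observer.schedule(handler, os.path.dirname(handler.path) or ".")
    observer.start()
    try:
        return handler.modified.wait(timeout)
    finally:
        observer.stop()
        observer.join()
```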

Verify and modify `redirect_stdout` behavior in executor

Environment

  • Covalent version: 0.22.9
  • Python version: 3.8.12
  • Operating system: macOS Big Sur

What is happening?

In covalent/executor/executor_plugins/local.py we use the contextlib.redirect_stdout and contextlib.redirect_stderr functions to redirect any print statements executed inside an electron to an io.StringIO object. Those strings are then written to a file, if the user requested one, and stored in the results object. If no file is given, the output goes to stdout/stderr, but the strings are still stored in the result object.

The main issue with using redirect_stdout and redirect_stderr is:

Note that the global side effect on sys.stdout means that this context manager is not suitable for use in library code and most threaded applications. It also has no effect on the output of subprocesses. However, it is still a useful approach for many utility scripts.

as mentioned in the contextlib documentation. As a result, those print statements sometimes appear in the dispatcher.log file, which is undesirable. We need to find a way to keep this useful redirection feature while making it thread-safe.
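
One possible thread-safe direction (a sketch, not a proposed final implementation) is to install a single stdout proxy whose `write()` dispatches to a per-thread buffer, so each electron's output lands in its own stream while other threads keep the real stdout:

```python
import threading

class ThreadLocalStdout:
    """Sketch of a thread-safe alternative to contextlib.redirect_stdout:
    a proxy installed once as sys.stdout that routes writes to a per-thread
    buffer when one is registered, and to the real stream otherwise."""

    def __init__(self, real_stream):
        self._real = real_stream
        self._local = threading.local()

    def register(self, buffer):
        # Called in the worker thread before running an electron
        self._local.buffer = buffer

    def unregister(self):
        self._local.buffer = None

    def write(self, s):
        target = getattr(self._local, "buffer", None) or self._real
        return target.write(s)

    def flush(self):
        (getattr(self._local, "buffer", None) or self._real).flush()
```

Usage: `sys.stdout = ThreadLocalStdout(sys.stdout)` would be done once at dispatcher startup; each worker thread registers its own `io.StringIO` before executing an electron and unregisters afterwards. Unlike `redirect_stdout`, this avoids the global swap-per-task that lets output leak into dispatcher.log.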

How can we reproduce the issue?

Since this issue is intermittent, a reliable reproduction is hard to produce. One can try dispatching a job in which an electron's function definition calls print("Something") multiple times and a log_stdout argument is passed to the electron's executor. Then check whether Something appears in the dispatcher log file at ~/.cache/covalent/dispatcher.log. If it is printed there even once, the issue is confirmed.

What should happen?

Printing anything in an electron should only write that string to the log_stdout file mentioned in that electron's executor and not in the global stdout of the dispatcher process.

Any suggestions?

No response

Fix results directory behavior

@wjcunningham7 commented on Fri Jan 21 2022

Currently write_streams_to_file attaches the stdout and stderr files to the results directory when those files are specified using relative paths. However, it always queries the results directory in the config file, when it should instead look at the electron metadata.

Edit:

The stdout/stderr log files that are written to are determined by the following priority:

  1. Top priority goes to the stdout/stderr log file paths in the covalent.conf file IF they are absolute paths. At the moment these files are overwritten, but ideally they should be appended to.
  2. If the stdout/stderr log file paths in the covalent.conf file are relative, we first check whether results_dir has been specified in the electron metadata. If it has, that takes second-highest priority, and the log files should be written to the results_dir/dispatch_id/ folder.
  3. Lastly, when the log file paths are relative and results_dir has not been specified for the electrons, we should use the results_dir from the covalent.conf file and store the log files in the results_dir/dispatch_id folder.

In this issue, we ensure that priority 2 is indeed being checked and implemented.
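
The three-tier priority above can be sketched as follows (the function name and parameters are hypothetical, not Covalent's actual API):

```python
from pathlib import Path

def resolve_log_path(conf_log_path, electron_results_dir, conf_results_dir, dispatch_id):
    """Hypothetical resolution of where a stdout/stderr log file goes,
    following the three priorities described above."""
    p = Path(conf_log_path)
    if p.is_absolute():
        # Priority 1: an absolute path in covalent.conf always wins
        return p
    # Priority 2: electron metadata's results_dir, if specified;
    # Priority 3: otherwise fall back to results_dir from covalent.conf
    base = Path(electron_results_dir) if electron_results_dir else Path(conf_results_dir)
    return base / dispatch_id / p
```

The fix for this issue amounts to making sure the `electron_results_dir` branch (priority 2) is actually consulted rather than always falling through to the config file's results_dir.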

PyPI Packaging Problem

Environment

  • Covalent version: 0.22.6
  • Python version: 3.8
  • Operating system: Ubuntu 18.04

What is happening?

The directories covalent/executor/executor_plugins and covalent_dispatcher/_service are not being properly included. This is because neither has an __init__.py file inside, so setuptools treats them as "data" directories. From my understanding, MANIFEST.in as well as package_data_dirs in setup.py control what's included.
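
For illustration, the inclusion mechanics described above might look like this (a sketch; the project's actual MANIFEST.in entries may differ):

```
# MANIFEST.in (sketch)
recursive-include covalent/executor/executor_plugins *.py
recursive-include covalent_dispatcher/_service *.py
```

Alternatively, adding an empty __init__.py to each directory makes setuptools discover them as regular packages via find_packages(), with no data-file configuration needed.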

How can we reproduce the issue?

Run pip install cova and examine what's installed into the site-packages directory. Also look at the tarball hosted on PyPI.

What should happen?

Those directories should be included.

Any suggestions?

No response

Tests for iterability of electrons inside lattice

What should we add?

We currently allow reading electrons like this inside a lattice:

@electron
def task():
    ...

@lattice
def workflow():
    ...
    a, b = task()
    c = task(a[0])
    d = task(b.test)
    ...

And similar ways of reading the electrons a and b, but we don't have tests written for them yet. We need these tests to ensure future changes don't break this functionality.
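
For context, the unpacking and access patterns above rely on the electron object deferring `__iter__`, `__getitem__`, and `__getattr__` until execution. A toy stand-in (not Covalent's real Electron class, and simplified to two-element unpacking) shows the behavior the tests would exercise:

```python
class FakeNode:
    """Toy stand-in for an electron's deferred-access behavior: attribute
    and index access return new deferred nodes instead of real values."""

    def __init__(self, op=None):
        self.op = op

    def __getitem__(self, i):
        # a[0] becomes a new node recording the index access
        return FakeNode(("getitem", i))

    def __getattr__(self, name):
        # b.test becomes a new node recording the attribute access;
        # dunders are excluded so protocol lookups behave normally
        if name.startswith("__"):
            raise AttributeError(name)
        return FakeNode(("getattr", name))

    def __iter__(self):
        # a, b = task() unpacks into indexed child nodes (fixed length
        # here for simplicity)
        return iter((self[0], self[1]))
```

The real tests would dispatch lattices using each pattern and assert on the workflow results, rather than on node internals as this toy does.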

Describe alternatives you've considered.

No response

Covalent's executor in API docs RTD

@santoshkumarradha commented on Tue Jan 18 2022

Edit: Moving the preceding subtask to a different issue.
