star-whale / starwhale
an MLOps/LLMOps platform
Home Page: https://starwhale.ai
License: Apache License 2.0
Current Behavior
No health checks for the agent daemonset.
Proposed Behavior
Add livenessProbe/readinessProbe/startupProbe for agent daemonset.
Proposed Behavior
Solution Proposal
The input.json structure can be used by swds and jsonline, so we should refactor the swds field.
Current Behavior
Uses the UTC timezone.
Proposed Behavior
The controller can configure a timezone for the cluster, and tasks and agents then use it.
Is your feature request related to a problem? Please describe.
Users don't know which version of SW they are running when they try to file a bug report.
Describe the solution you'd like
Is your feature request related to a problem? Please describe.
Today swcli only uses a pre-defined pip-req.txt or pip freeze in a venv environment, or conda export in a conda environment. Can we auto-detect Python dependencies by static code analysis? It would be simpler for the end user.
Describe the solution you'd like
WIP.
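As a rough illustration of what static detection could look like, the sketch below walks a source file's syntax tree with the standard `ast` module and collects top-level imported module names. It is not swcli's implementation; the function names are hypothetical, and it does not map import names to distribution names (for example `cv2` vs `opencv-python`), which a real tool would need to handle.

```python
import ast
import sys


def detect_imports(source):
    """Return the set of top-level module names imported by Python source text."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Relative imports (level > 0) point at local code, so skip them.
            if node.level == 0 and node.module:
                modules.add(node.module.split(".")[0])
    return modules


def third_party_imports(source):
    """Drop standard-library modules; sys.stdlib_module_names needs Python 3.10+."""
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return {name for name in detect_imports(source) if name not in stdlib}


example = (
    "import os\n"
    "import numpy as np\n"
    "from torch.nn import Linear\n"
    "from . import utils\n"
)
print(sorted(detect_imports(example)))
```

Static analysis of this kind misses dynamic imports (`importlib.import_module`, plugins), so it would likely complement pip freeze or conda export rather than replace them.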
Current Behavior
swmp 1.0 has a very simple structure: a single tar file. Changing one line produces a whole new swmp tar, which is uneconomical.
Proposed Behavior
blake2b checksum into meta file.
Current Behavior
Only swcli can build swmp.
Proposed Behavior
Users can use the web UI to create and edit swmp.
Current Behavior
The controller and agent have independent Docker images.
Proposed Behavior
Use a single Docker image to deploy both the controller and the agent.
Describe the bug
When we run swcli model list, it takes a long time because it reads the meta from each swmp tar file.
Expected behavior
A quick, efficient method for listing.
Current Behavior
Only a very long version string is available to track swmp/swds.
Proposed Behavior
Tags can reduce users' short-term memory load and provide a more human-friendly interaction.
Solution Proposal
WIP.
Current Behavior
Only an on-premise version.
Proposed Behavior
Provide a cloud version, including some features:
Current Behavior
Uses the full version name.
Proposed Behavior
Use a short version name, such as only the first seven characters of the version.
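One way to support short names is to resolve a prefix against the known full versions and reject ambiguous matches, much like abbreviated commit hashes. The sketch below assumes versions are plain strings; `resolve_version`, the `min_len` default, and the sample version strings are hypothetical.

```python
def resolve_version(prefix, versions, min_len=7):
    """Resolve a short prefix to exactly one full version string."""
    if len(prefix) < min_len:
        raise ValueError(f"prefix must be at least {min_len} characters")
    matches = [v for v in versions if v.startswith(prefix)]
    if not matches:
        raise KeyError(f"no version starts with {prefix!r}")
    if len(matches) > 1:
        raise KeyError(f"prefix {prefix!r} is ambiguous: {matches}")
    return matches[0]


# Made-up full version strings, for illustration only.
known = ["gnstmntggi4tinrtmftdgyjzo5wwy2y", "gnstmntbmyzwknrtmftdgyjzna3dc"]
print(resolve_version("gnstmntg", known))
```

Raising on ambiguity rather than picking the first match keeps a short name from silently pointing at the wrong swmp or swds.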
Current Behavior
no file index field.
Proposed Behavior
add index files for ppl output
Current Behavior
Proposed Behavior
Ref
Current Behavior
no validator.
Proposed Behavior
add output validator.
Current Behavior
setup.py and requirements.txt include the same Python requirements.
Proposed Behavior
Only one place includes the Python requirements.
Describe the bug
Users can write a runtime field that does not match the actual Python runtime. This leads to running with the wrong Python version.
To Reproduce
Write runtime: 3.9, but venv local mode uses python3.7.
Expected behavior
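A minimal sketch of one possible guard, assuming the runtime field holds a "major.minor" string: compare the declared value with the running interpreter and warn on a mismatch. `runtime_matches` is a hypothetical helper, not an existing swcli function.

```python
import sys


def runtime_matches(declared, actual=sys.version_info[:2]):
    """Compare a declared 'major.minor' runtime with an interpreter version tuple."""
    major, minor = (int(part) for part in declared.strip().split(".")[:2])
    return (major, minor) == tuple(actual[:2])


declared = "3.9"  # hypothetical value read from the runtime field
if not runtime_matches(declared):
    print(
        f"warning: runtime declares Python {declared}, but the interpreter is "
        f"{sys.version_info.major}.{sys.version_info.minor}"
    )
```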
Describe the bug
Expected behavior
The conda export MUST include starwhale; if it does not, it will lead to an import error in the ppl/cmp phase.
Proposal
Add some docs and console warnings.
Current Behavior
no text classification.
Proposed Behavior
Solution Proposal
ref: https://github.com/pytorch/serve/blob/master/examples/text_classification
Current Behavior
no object detection example.
Proposed Behavior
Solution Proposal
Current Behavior
Run the push cmd and wait for it to exit. There is no upload progress and no auto retry.
Proposed Behavior
createTime vs createdTime vs startTime
/api/v1/project/{pid}/dataset/{did} may return the serialized meta field directly.
/api/v1/project/{pid}/model/{mid}/version add meta field.
/api/v1/login return user's details, such as role, createdtime.
owner field in model/dataset list api? It will bring a lot of redundant fields.
/api/v1/project api add username query string to show the specific projects? By default, only show the projects of the current login user.
/api/v1/project/{pid}/job/{jid}/task finishTime and duration field.
task type field, which can describe cmp or ppl task.
modelName is required for /api/v1/project/{pid}/model?
swcli model list --remote cmd in local environment. Users may specify multi-dimensional parameters, such as --project, --model-name, --self (own projects' models).
/api/v1/project/{pid}/model/{mid} api add version into url path or params.
Describe the bug
job/result is expected to return job metrics, but log content is returned.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Job metrics are expected.
Additional context
none
Is your feature request related to a problem? Please describe.
We currently enforce a unique constraint on agents by IP address, but an agent's IP address changes constantly in a K8s context. A new uniqueness field for agents is required.
Describe the solution you'd like
Current Behavior
Manual trigger
Proposed Behavior
automation
Solution Proposal
based on github actions
Current Behavior
Shows 0.1.0-dev15 only.
Proposed Behavior
show more details for swcli version, such as build date, git commit sha.
Is your feature request related to a problem? Please describe.
We need to write exclude_pkg_data in dataset.yaml or model.yaml twice. In addition, a large number of ignore fields may clutter model.yaml/dataset.yaml.
Describe the solution you'd like
Define a .swignore file and REMOVE the exclude_pkg_data field.
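The issue does not specify the .swignore syntax. The sketch below assumes a gitignore-like file with one glob per line, blank lines and `#` comments skipped, matched with the standard `fnmatch` module. It does not implement full gitignore semantics (no negation, no anchoring), and both helper names are hypothetical.

```python
import fnmatch
from pathlib import PurePosixPath


def load_patterns(text):
    """Parse ignore-file text: one glob per line, skipping blanks and # comments."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def is_ignored(path, patterns):
    """True if the relative path, or any single component of it, matches a pattern."""
    parts = PurePosixPath(path).parts
    return any(
        fnmatch.fnmatch(path, pattern)
        or any(fnmatch.fnmatch(part, pattern) for part in parts)
        for pattern in patterns
    )


patterns = load_patterns("# caches\n__pycache__\n*.pyc\ndata/raw/*\n")
files = ["model/net.py", "model/__pycache__/net.cpython-39.pyc", "data/raw/a.bin"]
print([f for f in files if not is_ignored(f, patterns)])
```

Keeping the patterns in one file means the same ignore rules could apply to both dataset and model builds without repeating them in each yaml.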
Current Behavior
too large.
Proposed Behavior
Remove useless dependencies.
Describe the bug
/api/v1/project/{pid}/job/{jid}/task/result uses a misspelled word.
To Reproduce
visit result api.
Expected behavior
mutlilabel -> multilabel
Describe the bug
raise CreateFailed: root path '/home/xxx/.cache/starwhale/pkg' does not exist exception.
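The report does not say where the error is raised, so the following is only a guess at a workaround: create the cache directory, including missing parents, before anything tries to open it. `ensure_dir` is a hypothetical helper, and the demo uses a temporary directory rather than the real cache path.

```python
import tempfile
from pathlib import Path


def ensure_dir(path):
    """Create the directory and any missing parents; do nothing if it already exists."""
    root = Path(path).expanduser()
    root.mkdir(parents=True, exist_ok=True)
    return root


# Demonstrate with a throwaway location instead of the user's home directory.
with tempfile.TemporaryDirectory() as tmp:
    pkg_root = ensure_dir(f"{tmp}/.cache/starwhale/pkg")
    print(pkg_root.is_dir())
```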
Current Behavior
only mnist.
Proposed Behavior
Solution Proposal
Current Behavior
docker-in-docker
Proposed Behavior
Current Behavior
multi_classification does not include auc/roc data.
Proposed Behavior
auc/roc is very common and important for multi_classification problems.
Solution Proposal
add methods for multi_classification decorator.
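For reference, a common way to get auc/roc for a multi-class problem is one-vs-rest: treat each class in turn as positive, build its ROC curve from the predicted probabilities, and integrate with the trapezoidal rule. The pure-Python sketch below shows that technique only. It is not the multi_classification decorator's code, all function names are hypothetical, and it assumes every class has at least one positive and one negative sample.

```python
def roc_points(labels, scores):
    """ROC points (fpr, tpr) for binary labels (1 = positive), highest score first."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve, using the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


def one_vs_rest_auc(y_true, y_prob, n_classes):
    """Per-class AUC: class c is positive, all other classes are negative."""
    result = {}
    for c in range(n_classes):
        labels = [1 if y == c else 0 for y in y_true]
        scores = [row[c] for row in y_prob]
        result[c] = auc(roc_points(labels, scores))
    return result


y_true = [0, 1, 2, 1]
y_prob = [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6], [0.3, 0.5, 0.2]]
print(one_vs_rest_auc(y_true, y_prob, 3))
```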
Describe the bug
If minio/s3 cannot be reached, the load program blocks until the user kills the process.
Expected behavior
Current Behavior
One input.json includes only one dataset.
Proposed Behavior
Include a dataset name dimension in input.json.
Current Behavior
The agent generates fixed OSS connections from properties at bootstrap.
Proposed Behavior
dynamically generated according to the connection information sent by the controller
Solution Proposal
dynamically generated
Current Behavior
We only run the complete-flow evaluation in controller.
Proposed Behavior
When we run swcli eval run --local, the cli will use the local swmp and swds to run the complete-flow evaluation. This feature is a great help for debugging.
Design Proposal
swcli eval run [--local/--remote] --model xx --dataset xx --dataset yy [--project xx] [--baseimage xx] [--resource gpu:1] [--name xx] [--description xx] [--phase]
--local/--remote: optional, in local or remote cluster, remote cluster is the default option.
--model: required, model id or model name:version(local mode)
--dataset: required, dataset id or dataset name:version(local mode)
--project: optional, project id, only for remote cluster mode.
--baseimage: optional, task run image. if omitted, starwhale will use the latest baseimage, name is: starwhaleai/starwhale:latest
--resource: optional, only for remote cluster mode, fmt is [resource:gpu|cpu]:[cnt:int >0], default is cpu:1
--name: optional, eval job name, the username-timestamp-randomstr is the default name.
--desc: optional.
--gencmd: optional, only generate docker run cmd in local mode.
--phase: optional, only for local mode. choices: all|ppl|cmp, default is all.
docker run cmd for ppl
docker run cmd for cmp
{@snapshot_dir}/run/eval/{version}/, we will store all result artifacts.
swcli eval list --local will show local eval list.
swcli eval inspect xxx --local will show local result and report.
swcli dataset fuse xxx: generate fuse input.json
swcli model extract xxx: extract swmp tar
Current Behavior
A very simple mnist example uses more than 2GB for its swmp. Should we make some optimizations to reduce the swmp size?
Proposed Behavior
Current Behavior
no ci
Proposed Behavior
Current Behavior
no ci
Proposed Behavior
Is your feature request related to a problem? Please describe.
Describe the solution you'd like
Subtasks
Current Behavior
only mnist example
Proposed Behavior
Solution Proposal
Current Behavior
no homepage
Proposed Behavior
use github-pages to host starwhale.ai homepage
Current Behavior
The controller does not support a proxy or cache. When an agent reproduces the swmp Python packages or pulls an image, it takes a lot of time.
Proposed Behavior
Current Behavior
Only http is supported.
Proposed Behavior
Both cloud and on-premise use https.