burlachenkok / flpytorch

FL_PyTorch: Optimization Research Simulator for Federated Learning

License: Apache License 2.0

Python 99.98% Batchfile 0.01% Shell 0.02%
federated-learning optimization-algorithms

Contributors: burlachenkok, techwizrd

flpytorch's Issues

Compactly store experimental results

Experimental results take up a lot of storage space (for neural nets, it can easily be 20-100 gigabytes).
Investigate how to reduce the size of what is, in the end, the serialized server state "H".
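
One possible direction is to compress the serialized state before it is written out. The sketch below is only an illustration with the Python standard library: the dictionary `H` is a hypothetical stand-in for the real server state, and the helper names `dump_compressed` and `load_compressed` are made up for this example. How much space is saved depends heavily on the data; dense floating-point weights usually compress far less than logs or metrics.

```python
import lzma
import pickle
import random


def dump_compressed(obj, preset=6):
    """Pickle `obj`, then compress the resulting bytes with LZMA."""
    raw = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    return lzma.compress(raw, preset=preset)


def load_compressed(blob):
    """Inverse of dump_compressed: decompress, then unpickle."""
    return pickle.loads(lzma.decompress(blob))


# Hypothetical stand-in for the server state "H": per-round metrics.
rng = random.Random(0)
H = {"history": [{"round": r, "loss": round(rng.random(), 3)} for r in range(2000)]}

raw_size = len(pickle.dumps(H, protocol=pickle.HIGHEST_PROTOCOL))
blob = dump_compressed(H)
print("raw bytes:", raw_size, "compressed bytes:", len(blob))
```

The round trip is lossless, so `load_compressed(blob)` gives back an object equal to `H`. Lossy options (for example, storing values at lower precision) would save more but change the results.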

Support for mac osx

Hi,
I am trying to install fl_pytorch on macOS Ventura 13.0.1 with an arm64 processor.
I know there is no CUDA/GPU support on macOS, so does one need to edit the run.py file and comment out all the CUDA/GPU lines of code?
If not, please suggest how this should be done!
Many thanks,
Haimonti
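
For what it is worth, the log in a later issue on this page shows a macOS run without CUDA that passes `--gpu "-1"` on the command line. A common way to avoid commenting out GPU code is to resolve the device once and fall back to the CPU. The sketch below is a generic pattern, not the project's actual code: the function `resolve_device` is hypothetical, and treating a negative flag as "use the CPU" is an assumption.

```python
def resolve_device(gpu_flag: int, cuda_available: bool) -> str:
    """Return a PyTorch-style device string.

    Assumption: a negative gpu_flag means "run on the CPU".
    """
    if gpu_flag < 0 or not cuda_available:
        return "cpu"
    return f"cuda:{gpu_flag}"


print(resolve_device(-1, cuda_available=False))
print(resolve_device(0, cuda_available=False))
print(resolve_device(0, cuda_available=True))
```

In real code, `cuda_available` would come from `torch.cuda.is_available()`, and the returned string could be passed to `torch.device(...)`.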

Runtime Error: expected scalar type Long but found Double

Hi @burlachenkok,
I installed the project successfully, but I keep getting the runtime error "expected scalar type Long but found Double", which occurs with different configurations.
Another error I found while trying different datasets: "Using a target size (torch.Size([500])) that is different to the input size (torch.Size([500, 10])) is deprecated. Please ensure they have the same size."
I would greatly appreciate any help.

Here is the log of the project:

Job '{simcounter}job_id{now}' with algorithm 'FEDAVG' has been sumbitted

Command line for experiment with job_id=1_job_id_1679617053

python run.py
--rounds "3000"
--client-sampling-type "uniform"
--num-clients-per-round "10"
--global-lr "0.1"
--global-optimiser "sgd"
--global-weight-decay "0.0"
--number-of-local-iters "1"
--batch-size "500"
--local-lr "0.01"
--local-optimiser "sgd"
--local-weight-decay "0.0"
--dataset "cifar10_fl"
--loss "crossentropy"
--model "tv_resnet18"
--use-pretrained
--train-last-layer
--metric "top_1_acc"
--global-regulizer "none"
--global-regulizer-alpha "0.0"
--checkpoint-dir "../check_points"
--do-not-save-eval-checkpoints
--data-path "../data/"
--compute-type "fp64"
--gpu "-1"
--log-gpu-usage
--num-workers-train "0"
--num-workers-test "0"
--deterministic
--manual-init-seed "123"
--manual-runtime-seed "456"
--group-name ""
--comment ""
--hostname "nfl"
--eval-every "100"
--eval-async-threads "0"
--save-async-threads "0"
--threadpool-for-local-opt "0"
--run-id "1_job_id_1679617053"
--algorithm "fedavg"
--algorithm-options "internal_sgd:full-gradient"
--logfile "../logs/1_log_1679617053.txt"
--client-compressor "ident:5%"
--extra-track "full_gradient_norm_train,full_objective_value_train"
--allow-use-nv-tensorcores
--initialize-shifts-policy "zero"
--wandb-key ""
--wandb-project-name "fl_pytorch_simulation"
--loglevel "debug"
--logfilter ".*"
--out "1_job_id_1679617053.bin"

===========================================================

2023-03-24 11:19:48.269860

===========================================================

Command line for currently selected configuration in GUI

python run.py
--rounds "3000"
--client-sampling-type "uniform"
--num-clients-per-round "10"
--global-lr "0.1"
--global-optimiser "SGD"
--global-weight-decay "0.0"
--number-of-local-iters "1"
--batch-size "500"
--local-lr "0.01"
--local-optimiser "SGD"
--local-weight-decay "0.0"
--dataset "cifar10_fl"
--loss "CROSSENTROPY"
--model "tv_resnet18"
--use-pretrained
--train-last-layer
--metric "top_1_acc"
--global-regulizer "none"
--global-regulizer-alpha "0.0"
--checkpoint-dir "../check_points"
--do-not-save-eval-checkpoints
--data-path "../data/"
--compute-type "fp64"
--gpu "-1"
--log-gpu-usage
--num-workers-train "0"
--num-workers-test "0"
--deterministic
--manual-init-seed "123"
--manual-runtime-seed "456"
--group-name ""
--comment ""
--hostname "nfl"
--eval-every "100"
--eval-async-threads "0"
--save-async-threads "0"
--threadpool-for-local-opt "0"
--run-id "{simcounter}job_id{now}"
--algorithm "FEDAVG"
--algorithm-options "internal_sgd:full-gradient"
--logfile "../logs/{simcounter}log{now}.txt"
--client-compressor "ident:5%"
--extra-track "full_gradient_norm_train,full_objective_value_train"
--allow-use-nv-tensorcores
--initialize-shifts-policy "zero"
--wandb-key ""
--wandb-project-name "fl_pytorch_simulation"
--loglevel "DEBUG"
--logfilter ".*"
--out "current.bin"

===========================================================

Release unoccupied cache memory from PyTorch...
Running the garbage collector...
Done. 0.01 MB was removed from Virtual and Resident memory of interpreter. Current used amount of memory is 38366.88 MBytes
PyTorch version: 1.10.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.2.1 (x86_64)
GCC version: Could not collect
Clang version: 14.0.0 (clang-1400.0.29.202)
CMake version: Could not collect
Libc version: N/A

Python version: 3.9.2 (v3.9.2:1a79785e3e, Feb 19 2021, 09:06:10) [Clang 6.0 (clang-600.0.57)] (64-bit runtime)
Python platform: macOS-10.16-x86_64-i386-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.24.2
[pip3] torch==1.10.0
[pip3] torchaudio==0.10.0
[pip3] torchvision==0.11.0
[conda] Could not collect
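
Both messages in this issue are typical of a mismatch between the targets and what the loss function expects. The cause inside fl_pytorch is not confirmed here; one plausible guess, given `--compute-type "fp64"` and `--loss "crossentropy"` in the log, is that the labels end up as Double tensors. The sketch below shows the general PyTorch pattern only; the tensors are synthetic and the variable names are made up for this example.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(500, 10, dtype=torch.float64)       # model outputs in fp64
labels = torch.randint(0, 10, (500,)).to(torch.float64)  # labels stored as Double (assumed cause)

# cross_entropy with class-index targets expects integer (Long) labels,
# so cast the labels back before computing the loss.
ce = F.cross_entropy(logits, labels.long())

# Element-wise losses need a target with the same shape as the input,
# here [500, 10]; one-hot encoding the class indices provides that.
one_hot = F.one_hot(labels.long(), num_classes=10).to(logits.dtype)
bce = F.binary_cross_entropy_with_logits(logits, one_hot)

print(ce.item(), bce.item())
```

The first fix matches the "expected scalar type Long but found Double" message; the second matches the target-size message, which appears when a `[500]` target is given to a loss that compares it element by element with a `[500, 10]` input.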

Offline centralized storage of experimental results

  • It would be nice to have an SQL database (e.g. MySQL, PostgreSQL) to store experiment configurations.
  • That database could be augmented with file storage for the experimental results (Redis, Cassandra, or just plain files transferred in some way).
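
The two bullets above can be sketched as a small index table that holds the configuration and points to a result file. This is a minimal illustration, not a proposed design: it uses SQLite from the Python standard library in place of MySQL or PostgreSQL, and the table layout and the helpers `register_experiment` and `find_by_algorithm` are hypothetical. The sample values are taken from the log quoted earlier on this page.

```python
import json
import sqlite3

# In-memory SQLite keeps the sketch self-contained; a shared server
# database would be used for real centralized storage.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE experiments (
           run_id      TEXT PRIMARY KEY,
           algorithm   TEXT NOT NULL,
           config_json TEXT NOT NULL,
           result_path TEXT NOT NULL
       )"""
)


def register_experiment(conn, run_id, config, result_path):
    """Store the configuration and the location of the result file."""
    conn.execute(
        "INSERT INTO experiments VALUES (?, ?, ?, ?)",
        (run_id, config["algorithm"], json.dumps(config, sort_keys=True), result_path),
    )
    conn.commit()


def find_by_algorithm(conn, algorithm):
    """Return (run_id, config, result_path) for every run of an algorithm."""
    rows = conn.execute(
        "SELECT run_id, config_json, result_path FROM experiments "
        "WHERE algorithm = ? ORDER BY run_id",
        (algorithm,),
    ).fetchall()
    return [(run_id, json.loads(cfg), path) for run_id, cfg, path in rows]


config = {"algorithm": "fedavg", "rounds": 3000, "dataset": "cifar10_fl"}
register_experiment(conn, "1_job_id_1679617053", config, "1_job_id_1679617053.bin")
print(find_by_algorithm(conn, "fedavg"))
```

Keeping only a path in the database means the large result files can live in any file or object store, while queries over configurations stay cheap.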

Refactor code of GUI application

When we started, the GUI was simple, but the GUI code has since grown heavily in number of lines.
Refactor that code by separating classes into different source files; that may make it easier to read.

Running Error

@burlachenkok Hi, I am really interested in your project and have set up the environment. However, when I ran it, I could not get a result. One of the errors I got was "ValueError: Cannot take a larger sample than population when 'replace=False'." Do you have a solution for this? Thank you!
(Screenshots of the error and of the configuration were attached to the original issue.)
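
The quoted message is the one NumPy raises when asked to sample without replacement more items than exist. The configuration for this run is not readable here, so the following is only a guess at the cause: the number of clients requested per round may exceed the number of clients the dataset provides. The sketch reproduces the error and shows one way to guard against it; `sample_clients` is a hypothetical helper, not the project's code.

```python
import numpy as np


def sample_clients(num_clients, clients_per_round, seed=0):
    """Pick distinct client indices, never more than the population size."""
    rng = np.random.RandomState(seed)
    k = min(clients_per_round, num_clients)
    return rng.choice(num_clients, size=k, replace=False)


# Reproduce the failure: 10 distinct picks from a population of 5.
try:
    np.random.RandomState(0).choice(5, size=10, replace=False)
except ValueError as err:
    print("ValueError:", err)

# Guarded version: the request is clamped to the 5 available clients.
picked = sample_clients(num_clients=5, clients_per_round=10)
print(sorted(picked.tolist()))
```

Clamping silently changes the experiment, so in practice it may be better to lower the number of clients per round in the configuration or to fail early with a clearer message.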
