Fastai
If you're using fastai to train your models, W&B has an easy integration using the WandbCallback. Explore the details in the interactive docs with examples.
Log with W&B
a) Sign up for a free account at https://wandb.ai/site and then log in to your wandb account.
b) Install the wandb library on your machine in a Python 3 environment using pip.
c) Log in to the wandb library on your machine. You will find your API key here: https://wandb.ai/authorize.
- Command Line
pip install wandb
wandb login
- Notebook
!pip install wandb
import wandb
wandb.login()
Then add the WandbCallback to the learner or fit method:
import wandb
from fastai.callback.wandb import *
# start logging a wandb run
wandb.init(project="my_project")
# To log only during one training phase
learn.fit(..., cbs=WandbCallback())
# To log continuously for all training phases
learn = learner(..., cbs=WandbCallback())
If you use version 1 of Fastai, refer to the Fastai v1 docs.
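As an illustration of the difference between the two attachment points shown above, here is a toy sketch in plain Python. `ToyLearner` and `RecordingCallback` are hypothetical stand-ins, not fastai classes. The sketch only assumes that callbacks given to the learner stay active for every fit call, while callbacks given to fit apply to that call alone.

```python
class RecordingCallback:
    """Stand-in for a logging callback; it only records the epochs it sees."""

    def __init__(self, name):
        self.name = name
        self.seen = []

    def after_epoch(self, epoch):
        self.seen.append(epoch)


class ToyLearner:
    """Hypothetical learner: `cbs` given here are active in every fit call."""

    def __init__(self, cbs=None):
        self.cbs = list(cbs or [])

    def fit(self, n_epoch, cbs=None):
        # Callbacks passed to fit are added for this call only.
        active = self.cbs + list(cbs or [])
        for epoch in range(n_epoch):
            for cb in active:
                cb.after_epoch(epoch)


always = RecordingCallback("attached to the learner")
once = RecordingCallback("attached to a single fit call")

learn = ToyLearner(cbs=[always])
learn.fit(2, cbs=[once])  # both callbacks see epochs 0 and 1
learn.fit(1)              # only the learner-level callback sees this phase

print(always.seen)
print(once.seen)
```

In this toy model the learner-level callback records all three epochs, while the fit-level one records only the first training phase.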
WandbCallback Arguments
WandbCallback accepts the following arguments:
Args | Description |
---|---|
log | Whether to log the model's "gradients", "parameters", "all", or None (default). Losses & metrics are always logged. |
log_preds | Whether to log prediction samples (default True). |
log_preds_every_epoch | Whether to log predictions every epoch or only at the end (default False). |
log_model | Whether to log the model (default False). This also requires SaveModelCallback. |
model_name | The name of the file to save; overrides SaveModelCallback. |
log_dataset | Whether to log the dataset: False (default), True to log the folder referenced by learn.dls.path, or an explicit path to the folder to log. Note: subfolder "models" is always ignored. |
dataset_name | Name of the logged dataset (defaults to the folder name). |
valid_dl | DataLoaders containing items used for prediction samples (defaults to random items from learn.dls.valid). |
n_preds | Number of logged predictions (default 36). |
seed | Used for defining random samples. |
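A small sketch of how a few of these arguments might be collected and sanity-checked before building the callback. The helper `wandb_callback_kwargs` is hypothetical and not part of wandb or fastai; the allowed values and defaults are taken from the table above, and the check on `n_preds` is our own assumption.

```python
# Allowed values for `log`, as listed in the arguments table.
VALID_LOG_VALUES = (None, "gradients", "parameters", "all")


def wandb_callback_kwargs(log=None, log_preds=True, log_preds_every_epoch=False,
                          log_model=False, n_preds=36):
    """Validate a few callback arguments and return them as a dict (hypothetical helper)."""
    if log not in VALID_LOG_VALUES:
        raise ValueError(f"log must be one of {VALID_LOG_VALUES}, got {log!r}")
    if n_preds < 1:
        # Assumption: a non-positive number of prediction samples is a mistake.
        raise ValueError("n_preds must be a positive integer")
    return {
        "log": log,
        "log_preds": log_preds,
        "log_preds_every_epoch": log_preds_every_epoch,
        "log_model": log_model,
        "n_preds": n_preds,
    }


kwargs = wandb_callback_kwargs(log="all", n_preds=16)
print(kwargs)
```

With fastai and wandb installed, the resulting dictionary could then be passed on as `WandbCallback(**kwargs)`.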
For custom workflows, you can manually log your datasets and models:
log_dataset(path, name=None, metadata={})
log_model(path, name=None, metadata={})
Note: any subfolder "models" will be ignored.
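To make the note about the "models" subfolder concrete, here is a minimal standard-library sketch of that kind of filter. The helper name `entries_to_log` is hypothetical, and the assumption that only the direct children of the dataset folder are checked is ours. The sketch does not call wandb or fastai and may differ from the integration's actual behavior.

```python
import tempfile
from pathlib import Path


def entries_to_log(path):
    """Return the names of the direct children of `path` that would be kept,
    skipping a subfolder named "models" (hypothetical helper)."""
    root = Path(path)
    if not root.is_dir():
        raise ValueError(f"path must be a valid directory: {root}")
    kept = []
    for child in sorted(root.iterdir()):
        if child.is_dir() and child.name == "models":
            continue  # mirrors the note: the "models" subfolder is ignored
        kept.append(child.name)
    return kept


# Build a throwaway dataset folder to try the filter on.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "images").mkdir()
    (root / "models").mkdir()
    (root / "labels.csv").write_text("name,label\n")
    demo_entries = entries_to_log(root)

print(demo_entries)
```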
Distributed Training
fastai supports distributed training by using the context manager distrib_ctx. W&B supports this automatically and enables you to track your multi-GPU experiments out of the box.
A minimal example is shown below:
- Script
import wandb
from fastai.vision.all import *
from fastai.distributed import *
from fastai.callback.wandb import WandbCallback

wandb.require(experiment="service")
# Download the dataset on rank 0 first so the other processes reuse it
path = rank0_first(lambda: untar_data(URLs.PETS) / "images")

def train():
    dls = ImageDataLoaders.from_name_func(
        path,
        get_image_files(path),
        valid_pct=0.2,
        label_func=lambda x: x[0].isupper(),
        item_tfms=Resize(224),
    )
    # Every process starts its own run here
    wandb.init(project="fastai_ddp", entity="capecape")
    cb = WandbCallback()
    learn = vision_learner(dls, resnet34, metrics=error_rate, cbs=cb).to_fp16()
    with learn.distrib_ctx(sync_bn=False):
        learn.fit(1)

if __name__ == "__main__":
    train()
Then, in your terminal, run:
$ torchrun --nproc_per_node 2 train.py
In this case, the machine has 2 GPUs.
- Notebook
You can now run distributed training directly inside a notebook!
import wandb
from fastai.vision.all import *
from accelerate import notebook_launcher
from fastai.distributed import *
from fastai.callback.wandb import WandbCallback

wandb.require(experiment="service")
path = untar_data(URLs.PETS) / "images"

def train():
    dls = ImageDataLoaders.from_name_func(
        path,
        get_image_files(path),
        valid_pct=0.2,
        label_func=lambda x: x[0].isupper(),
        item_tfms=Resize(224),
    )
    # Every process starts its own run here
    wandb.init(project="fastai_ddp", entity="capecape")
    cb = WandbCallback()
    learn = vision_learner(dls, resnet34, metrics=error_rate, cbs=cb).to_fp16()
    with learn.distrib_ctx(in_notebook=True, sync_bn=False):
        learn.fit(1)

# Launch train() in 2 processes from the notebook
notebook_launcher(train, num_processes=2)
Logging only on the main process
In the examples above, wandb launches one run per process. At the end of training, you will end up with two runs. This can be confusing, and you may want to log only on the main process. To do so, you have to manually detect which process you are in and avoid creating runs (calling wandb.init) in all other processes.
- Script
import wandb
from fastai.vision.all import *
from fastai.distributed import *
from fastai.callback.wandb import WandbCallback

wandb.require(experiment="service")
# Download the dataset on rank 0 first so the other processes reuse it
path = rank0_first(lambda: untar_data(URLs.PETS) / "images")

def train():
    cb = []  # no logging callback by default
    dls = ImageDataLoaders.from_name_func(
        path,
        get_image_files(path),
        valid_pct=0.2,
        label_func=lambda x: x[0].isupper(),
        item_tfms=Resize(224),
    )
    # Only the main process (rank 0) creates a run and logs to W&B
    if rank_distrib() == 0:
        run = wandb.init(project="fastai_ddp", entity="capecape")
        cb = WandbCallback()
    learn = vision_learner(dls, resnet34, metrics=error_rate, cbs=cb).to_fp16()
    with learn.distrib_ctx(sync_bn=False):
        learn.fit(1)

if __name__ == "__main__":
    train()
Then, in your terminal, run:
$ torchrun --nproc_per_node 2 train.py
- Notebook
import wandb
from fastai.vision.all import *
from accelerate import notebook_launcher
from fastai.distributed import *
from fastai.callback.wandb import WandbCallback

wandb.require(experiment="service")
path = untar_data(URLs.PETS) / "images"

def train():
    cb = []  # no logging callback by default
    dls = ImageDataLoaders.from_name_func(
        path,
        get_image_files(path),
        valid_pct=0.2,
        label_func=lambda x: x[0].isupper(),
        item_tfms=Resize(224),
    )
    # Only the main process (rank 0) creates a run and logs to W&B
    if rank_distrib() == 0:
        run = wandb.init(project="fastai_ddp", entity="capecape")
        cb = WandbCallback()
    learn = vision_learner(dls, resnet34, metrics=error_rate, cbs=cb).to_fp16()
    with learn.distrib_ctx(in_notebook=True, sync_bn=False):
        learn.fit(1)

# Launch train() in 2 processes from the notebook
notebook_launcher(train, num_processes=2)
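The main-process check used in the examples above can be sketched with only the standard library. This is a stand-in for the `rank_distrib() == 0` test, not fastai's implementation. It assumes the launcher exposes each process's global rank through the `RANK` environment variable, as torchrun does, and the helper names `process_rank`, `is_main_process`, and `ranks_that_log` are hypothetical.

```python
import os


def process_rank(environ=None):
    """Read the global rank from the environment; default to 0 when not distributed."""
    env = os.environ if environ is None else environ
    return int(env.get("RANK", 0))


def is_main_process(environ=None):
    """True only for the process that should create the W&B run."""
    return process_rank(environ) == 0


def ranks_that_log(world_size):
    """Simulate `world_size` processes and list the ranks that would start a run."""
    return [
        rank
        for rank in range(world_size)
        if is_main_process({"RANK": str(rank)})
    ]


# With 2 processes, only rank 0 would start a run.
print(ranks_that_log(2))
```

Guarding the run creation with such a check leaves a single run for the whole job instead of one per process.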
Examples
- Visualize, track, and compare Fastai models: A thoroughly documented walkthrough
- Image Segmentation on CamVid: A sample use case of the integration