Artifacts FAQs
The following questions are commonly asked about W&B Artifacts and W&B Artifact workflows.
Questions about Artifacts
Who has access to my artifacts?
Artifacts inherit the access settings of their parent project:
- If the project is private, then only members of the project's team have access to its artifacts.
- For public projects, all users have read access to artifacts but only members of the project's team can create or modify them.
- For open projects, all users have read and write access to artifacts.
Questions about Artifacts workflows
This section describes workflows for managing and editing Artifacts. Many of these workflows use the W&B API, the component of our client library which provides access to data stored with W&B.
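As a minimal sketch, the Public API is typically used like this; the entity, project, and run ID below are placeholders:

import wandb

api = wandb.Api()
run = api.run("my-entity/my-project/abc123")  # placeholder run path
print(run.summary)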
How do I log an artifact to an existing run?
Occasionally, you may want to mark an artifact as the output of a previously logged run. In that scenario, you can reinitialize the old run and log new artifacts to it as follows:
import wandb

# Resume the existing run, then log the new artifact to it
with wandb.init(id="existing_run_id", resume="allow") as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")
    artifact.add_file("my_data/file.txt")
    run.log_artifact(artifact)
How do I set a retention or expiration policy on my artifact?
If you have artifacts that are subject to data privacy regulations such as dataset artifacts containing PII, or want to schedule the deletion of an artifact version to manage your storage, you can set a TTL (time-to-live) policy. Learn more in this guide.
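As a rough sketch, assuming a client version that supports the Artifact.ttl property, you can set a TTL programmatically; the project and artifact names below are placeholders:

from datetime import timedelta

import wandb

run = wandb.init(project="my-project")                   # placeholder project
artifact = run.use_artifact("dataset-with-pii:latest")   # placeholder artifact
artifact.ttl = timedelta(days=30)                        # delete this version ~30 days after creation
artifact.save()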
How can I find the artifacts logged or consumed by a run? How can I find the runs that produced or consumed an artifact?
W&B automatically tracks the artifacts a given run has logged as well as the artifacts a given run has used, and it uses this information to construct an artifact graph: a bipartite, directed, acyclic graph whose nodes are runs and artifacts (click "Explode" in the graph view to see the full graph).
You can walk this graph programmatically with the Public API, starting from either a run or an artifact.
From an artifact:
import wandb

api = wandb.Api()
artifact = api.artifact("project/artifact:alias")

# Walk up the graph from an artifact:
producer_run = artifact.logged_by()
# Walk down the graph from an artifact:
consumer_runs = artifact.used_by()

# Walk down the graph from a run:
next_artifacts = consumer_runs[0].logged_artifacts()
# Walk up the graph from a run:
previous_artifacts = producer_run.used_artifacts()
From a run:

import wandb

api = wandb.Api()
run = api.run("entity/project/run_id")

# Walk down the graph from a run:
produced_artifacts = run.logged_artifacts()
# Walk up the graph from a run:
consumed_artifacts = run.used_artifacts()

# Walk up the graph from an artifact:
earlier_run = consumed_artifacts[0].logged_by()
# Walk down the graph from an artifact:
consumer_runs = produced_artifacts[0].used_by()
How do I best log models from runs in a sweep?
One effective pattern for logging models in a sweep is to have a model artifact for the sweep, where the versions will correspond to different runs from the sweep. More concretely, you would have:
wandb.Artifact(name="sweep_name", type="model")
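For example, inside the training function that each sweep run executes, you can log that run's model file to an artifact that shares the sweep's name, so each run contributes a new version of the same artifact. A sketch, where the model file path is a placeholder:

import wandb

def train():
    with wandb.init() as run:  # one run launched by the sweep agent
        # ... train and save the model to "model.pt" (placeholder path) ...
        artifact = wandb.Artifact(name="sweep_name", type="model")
        artifact.add_file("model.pt")
        run.log_artifact(artifact)  # each run adds a new version: v0, v1, ...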
How do I find an artifact from the best run in a sweep?
You can use the following code to retrieve the artifacts associated with the best performing run in a sweep:
import wandb

api = wandb.Api()
sweep = api.sweep("entity/project/sweep_id")

# Rank the sweep's runs by a summary metric (here, "val_acc"), best first
runs = sorted(sweep.runs, key=lambda run: run.summary.get("val_acc", 0), reverse=True)
best_run = runs[0]

for artifact in best_run.logged_artifacts():
    artifact_path = artifact.download()
    print(artifact_path)
How do I save code?
Use save_code=True in wandb.init to save the main script or notebook where you're launching the run. To save all of your code to a run, version code with Artifacts. Here's an example:
import wandb

wandb.init(save_code=True)
code_artifact = wandb.Artifact(name="source-code", type="code")  # a name is required; "source-code" is a placeholder
code_artifact.add_file("./train.py")
wandb.log_artifact(code_artifact)
Using artifacts with multiple architectures and runs?
There are many ways to think about versioning a model, and Artifacts give you a tool to implement model versioning however you see fit. One common pattern for projects that explore multiple model architectures over a number of runs is to separate artifacts by architecture. For example, you could do the following:
- Create a new artifact for each model architecture. You can use the metadata attribute of artifacts to describe the architecture in more detail (similar to how you would use config for a run).
- For each model, periodically log checkpoints with log_artifact. W&B automatically builds a history of those checkpoints, annotating the most recent one with the latest alias, so you can refer to the latest checkpoint for any given model architecture as architecture-name:latest. See the sketch after this list.
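A minimal sketch of this pattern, with placeholder architecture names, metadata fields, and file paths:

import wandb

with wandb.init(config={"architecture": "resnet50"}) as run:
    # One artifact per architecture; its versions are that architecture's checkpoints.
    artifact = wandb.Artifact(
        name="resnet50",                 # placeholder architecture name
        type="model",
        metadata={"num_layers": 50},     # placeholder description of the architecture
    )
    artifact.add_file("checkpoint.pt")   # placeholder checkpoint path
    run.log_artifact(artifact)

# Later, fetch the most recent checkpoint for that architecture:
api = wandb.Api()
latest_checkpoint = api.artifact("entity/project/resnet50:latest")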
Reference Artifact FAQs
How can I fetch these Version IDs and ETags in W&B?
If you logged an artifact reference with W&B and versioning is enabled on your bucket, the version IDs can be seen in the S3 UI. To fetch these version IDs and ETags in W&B, fetch the artifact and then get the corresponding manifest entries. For example:
import wandb

run = wandb.init()
artifact = run.use_artifact("my_table:latest")
for entry in artifact.manifest.entries.values():
    versionID = entry.extra.get("versionID")
    etag = entry.extra.get("etag")