Adding training functionalities to Toolkit #108
Open
laserkelvin wants to merge 378 commits into
Signed-off-by: Kelvin Lee <kinlongkelvi@nvidia.com>
Add TrainingUpdateHook framework and orchestrator
Signed-off-by: Kelvin Lee <kinlongkelvi@nvidia.com>
Add `MixedPrecisionHook`
Signed-off-by: Kelvin Lee <kinlongkelvi@nvidia.com> # Conflicts: # nvalchemi/training/hooks/update.py # test/training/test_strategy.py
Signed-off-by: Kelvin Lee <kinlongkelvi@nvidia.com> # Conflicts: # docs/modules/training/hooks.rst # test/training/test_strategy.py
Signed-off-by: Kelvin Lee <kinlongkelvi@nvidia.com> # Conflicts: # nvalchemi/training/hooks/update.py # test/training/test_strategy.py
Add EMAHook for exponential moving average of model weights
Signed-off-by: Kelvin Lee <kinlongkelvi@nvidia.com>
…t-loading Signed-off-by: Kelvin Lee <kinlongkelvi@nvidia.com>
This reverts commit 22ecded. Signed-off-by: Kelvin Lee <kinlongkelvi@nvidia.com>
Collaborator Author
/ok to test 8073ecf
Collaborator Author
/ok to test bda67ad
Collaborator Author
/ok to test 8aa39b4
8aa39b4 to 4d4e024
Collaborator Author
/ok to test 4d49093
Collaborator Author
/ok to test 9d99fc3
Collaborator Author
/ok to test b66aeaf
Comment on lines +260 to +269

  Use :class:`~nvalchemi.dynamics.hooks.StageTimingHook` for lightweight stage
  timing and optional NVTX ranges.

  .. code-block:: python

-     from nvalchemi.dynamics.hooks import ProfilerHook
+     from nvalchemi.dynamics.hooks import StageTimingHook

-     hook = ProfilerHook(enable_nvtx=True, enable_timer=True, frequency=10)
+     hook = StageTimingHook("step", frequency=10, log_path="stage_timing.csv")
      dynamics = DemoDynamics(model=model, n_steps=1_000, dt=0.5, hooks=[hook])
      dynamics.run(batch)
Collaborator
This example doesn't really tell me what the heck this hook is doing, what "step" refers to, or what "frequency" means. We don't need full API docs here, but I would expect at least one more sentence of exposition explaining what is going on.
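To make the request concrete, here is a minimal sketch of the kind of behaviour the docs sentence could describe. It is not the `StageTimingHook` from this PR: the class `SketchStageTimingHook`, its `begin`/`end`/`to_csv` methods, and the readings of "step" (a label for the stage being timed) and "frequency" (record one sample every N steps) are all assumptions made for illustration.

```python
import csv
import io
import time


class SketchStageTimingHook:
    """Hypothetical stand-in; NOT the nvalchemi StageTimingHook."""

    def __init__(self, stage, frequency=1, clock=time.perf_counter):
        self.stage = stage          # assumed meaning: label of the timed stage
        self.frequency = frequency  # assumed meaning: sample every N-th step
        self.clock = clock
        self.rows = []              # (step index, stage label, elapsed seconds)
        self._start = None

    def begin(self, step):
        # Start the timer only on steps that are multiples of `frequency`.
        self._start = self.clock() if step % self.frequency == 0 else None

    def end(self, step):
        # Record a row only if this step was being timed.
        if self._start is not None:
            self.rows.append((step, self.stage, self.clock() - self._start))
            self._start = None

    def to_csv(self):
        # Serialise the recorded samples as CSV text with a header row.
        buffer = io.StringIO()
        writer = csv.writer(buffer)
        writer.writerow(["step", "stage", "seconds"])
        writer.writerows(self.rows)
        return buffer.getvalue()


def run_demo(n_steps=30, frequency=10):
    hook = SketchStageTimingHook("step", frequency=frequency)
    for step in range(n_steps):
        hook.begin(step)
        sum(i * i for i in range(1000))  # stand-in for one dynamics step
        hook.end(step)
    return hook
```

Under these assumptions, `run_demo(30, 10)` times steps 0, 10 and 20 only, which is the sort of one-sentence statement the docs example could carry if the real hook behaves similarly.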
ALCHEMI Toolkit Pull Request
Description
This PR introduces the core functionalities required to support training and fine-tuning of models in `nvalchemi-toolkit`.
Type of Change
Related Issues
Changes Made
- `create_model_spec` methods and dynamic pydantic model creation for `pickle`-less serialization of configuration
- `TrainingStrategy` pydantic model as a recipe validation and loop executor. The execution is highly modular and extendible, allowing for (hopefully) arbitrarily complex training workflows to be built, and not limited to MLIPs
- `FineTuningStrategy` that specializes `TrainingStrategy` for...fine-tuning workflows by making pre-existing checkpoints and layer addition/modification integral to the workflow
Testing
- `make pytest`
- `make lint`
Checklist
Additional Notes
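As a rough illustration of the hook-driven loop mentioned under Changes Made, and of the `EMAHook` commit in this PR, the sketch below runs a toy loop that calls every hook after each weight update. None of the names come from `nvalchemi`: `EMAHookSketch`, `run_training`, `after_update`, and `quadratic_grad` are hypothetical, and the loop uses plain Python lists and gradient descent rather than the real `TrainingStrategy`.

```python
class EMAHookSketch:
    """Hypothetical hook keeping an exponential moving average of weights."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.shadow = None  # averaged copy of the weights

    def after_update(self, weights):
        if self.shadow is None:
            # First call: initialise the average with the current weights.
            self.shadow = list(weights)
        else:
            # shadow <- decay * shadow + (1 - decay) * weights
            self.shadow = [
                self.decay * s + (1.0 - self.decay) * w
                for s, w in zip(self.shadow, weights)
            ]


def run_training(weights, grads_fn, n_steps, lr, hooks):
    """Toy loop executor: one gradient-descent update, then each hook in order."""
    for _ in range(n_steps):
        grads = grads_fn(weights)
        weights = [w - lr * g for w, g in zip(weights, grads)]
        for hook in hooks:
            hook.after_update(weights)
    return weights


def quadratic_grad(weights):
    # Gradient of sum(w**2), which is minimised at w = 0.
    return [2.0 * w for w in weights]
```

The point of the design in this sketch is that the loop knows nothing about averaging: extra behaviour is added by appending objects to `hooks`, for example `run_training([1.0], quadratic_grad, 2, 0.25, [EMAHookSketch(0.5)])`.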
Tip
This repository uses Greptile, an AI code review service, to help conduct
pull request reviews. We encourage contributors to read and consider suggestions
made by Greptile, but note that human maintainers will provide the necessary
reviews for merging. Greptile's comments are not a qualitative judgement
of your code, nor are they an indication that the PR will be accepted or rejected.
We encourage the use of emoji reactions to Greptile comments, depending on
their usefulness and accuracy.