Fix torch multiprocessing error in gptneox conversion script #587
rohithkrn wants to merge 1 commit into NVIDIA:main from
Conversation
Can you provide some buggy cases that the current script cannot deal with?
@AkiyamaYummy The command I am using is: … It fails at the above line #L141 with an error because I don't have that directory. I see the gptneox_guide suggests cloning the model. To get around the error, I added the check and ran the same command. It then runs into an error similar to #443. Therefore, I introduced the change in this PR to fix these errors.
@byshiue @AkiyamaYummy
By using … Of course, you can also make this script more convenient by making it compatible with this scenario. PS: Some of us, including myself, prefer to maintain Hugging Face model files ourselves, because Hugging Face defaults to downloading models to its cache folder, and I don't want large files taking up hard-disk space without being visible in my workspace.
@AkiyamaYummy That makes sense; my change will not break that behavior.
@byshiue @AkiyamaYummy Reminder for review.
Fixes:
- `Context has already been set` error from torch multiprocessing, similar to [BUG FIX] place multi-processing init to main method #443
- `device_count` is incorrectly set in the example script.
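A minimal sketch of the first fix, using the stdlib `multiprocessing` module (whose start-method API `torch.multiprocessing` mirrors); the function name is hypothetical, not the conversion script's actual code:

```python
import multiprocessing  # torch.multiprocessing mirrors this stdlib API

def init_start_method() -> None:
    # Calling set_start_method() at module import time can crash with
    # "RuntimeError: context has already been set" if a context was
    # created earlier. Moving the call into the main entry point, and
    # passing force=True, avoids the crash.
    multiprocessing.set_start_method("spawn", force=True)

if __name__ == "__main__":
    init_start_method()
```

Without `force=True`, a second `set_start_method("spawn")` call raises the `RuntimeError` this PR works around, which is why the initialization belongs inside the main method rather than at import time.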