Silent Failure:
I'm attempting to use llama.py to reduce the size of the Airochronos L2 13B model.
```
sparsegpt on master via python v3.10.12
❯ python ./llama.py kingbri/airochronos-l2-13B c4 --sparsity 0.5 --save ./airochronos-l2-13B-sparse --wbits 4
config.json: 100%|████████████████████████████████████████████████████████████████████████████| 640/640 [00:00<?, ?B/s]
pytorch_model.bin.index.json: 100%|███████████████████████████████████████████████████████| 29.9k/29.9k [00:00<?, ?B/s]
pytorch_model-00001-of-00003.bin: 100%|███████████████████████████████████████████| 12.9G/12.9G [05:56<00:00, 36.2MB/s]
pytorch_model-00002-of-00003.bin: 100%|███████████████████████████████████████████| 12.8G/12.8G [10:34<00:00, 20.2MB/s]
pytorch_model-00003-of-00003.bin: 100%|█████████████████████████████████████████████| 328M/328M [00:08<00:00, 40.3MB/s]
Downloading shards: 100%|███████████████████████████████████████████████████████████████| 3/3 [16:42<00:00, 334.24s/it]
Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]
```
Then the process simply exits, with no error message or traceback. The same thing happens without the --wbits 4 flag.
I am able to use opt.py without any problems, and Task Manager shows that no resource is being maxed out, so it does not appear to be a hardware limitation.
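Since the script dies during checkpoint loading with no traceback, one possibility worth ruling out (if this is running under Linux or WSL) is the kernel OOM killer, which terminates a process with SIGKILL and leaves no Python error behind. A minimal sketch of how to detect that from a wrapper; the self-killing child process here is purely illustrative, standing in for llama.py:

```python
import signal
import subprocess
import sys

# Illustrative diagnostic: a process killed by the OOM killer receives
# SIGKILL and exits silently, which matches the behaviour described above.
# Running the script as a child and inspecting the return code distinguishes
# a kill from a clean exit. The child here SIGKILLs itself to simulate it.
proc = subprocess.run(
    [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
)

# subprocess reports death-by-signal as a negative return code.
if proc.returncode == -signal.SIGKILL:
    print("child was killed by SIGKILL (consistent with an OOM kill)")
```

On Linux, `dmesg | grep -i "killed process"` right after the silent exit would confirm an OOM kill; checking `echo $?` after running llama.py directly is a cheaper first clue.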

System instability:
While running the command above and trying to capture a screenshot of the Task Manager usage graphs, an Electron app restarted itself, and Task Manager displayed some black artifacts and closed. This is not a regular occurrence on this machine, and it happened only while that command was running. On another occasion, Firefox froze for a few seconds while llama.py was attempting to run.
Unfortunately, I have been unable to reproduce either of these behaviours. Even so, I feel they are worth mentioning.