add LinearWarmupScheduler #1537
Conversation
```python
        return self.value_at_epoch[old_index], self.value_at_epoch[index]


class LinearWarmupScheduler:
```
Hi and thanks! Is this resumable? I see a `current_step`; shouldn't the scheduler be saved as well in case of resuming an experiment? This can be easily done with hooks (see the other schedulers with states). What do you think?
I agree. On a side note, StepScheduler also does not have hooks; we should fix that in a separate PR.
Hi, this is a very good question. TBH, I am not very familiar with the concept of hooks. But I will take a look at how other schedulers are implemented.
I have added the checkpoint hooks. Please take a look at it.
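For context, the usual way a stateful scheduler becomes resumable in SpeechBrain is via the checkpoint decorators. A minimal sketch of that pattern follows; the `current_step` attribute comes from the discussion above, while the constructor arguments and the exact loader signature are assumptions, not necessarily what the PR implements:

```python
import torch
from speechbrain.utils import checkpoints


@checkpoints.register_checkpoint_hooks
class LinearWarmupScheduler:
    """Sketch: only the checkpoint hooks, not the schedule math."""

    def __init__(self, initial_value, num_warmup_steps, num_training_steps):
        self.value = initial_value
        self.num_warmup_steps = num_warmup_steps
        self.num_training_steps = num_training_steps
        self.current_step = 0

    @checkpoints.mark_as_saver
    def save(self, path):
        # Persist the step counter so training can resume mid-schedule.
        torch.save({"current_step": self.current_step}, path)

    @checkpoints.mark_as_loader
    def load(self, path, end_of_epoch=False, device=None):
        # end_of_epoch and device are part of the loader interface but unused here.
        del end_of_epoch, device
        data = torch.load(path)
        self.current_step = data["current_step"]
```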
Huge thanks!
I notice the design is quite different from PyTorch's native schedulers, which have a step() function as well as state_dict() and load_state_dict() functions. We also ended up changing the interface a bit, as I wanted something where you could step on both minibatches and epochs. [In our case it's not part of a unified interface, though, because for now, for flexibility during early development, our model is to put most of the complexity in local scripts rather than in any central place.]
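For comparison, the torch-native pattern mentioned above relies on step() together with state_dict()/load_state_dict() for resuming; roughly as below (the LambdaLR multiplier is only an illustrative warmup, not the PR's code):

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(10, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Any multiplier works here; the point is the step()/state_dict() interface.
scheduler = LambdaLR(optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / 1000))

# Training loop: one scheduler.step() per minibatch (or per epoch, depending on the schedule).
optimizer.step()
scheduler.step()

# Checkpointing: the scheduler serializes just like the optimizer.
ckpt = {"optimizer": optimizer.state_dict(), "scheduler": scheduler.state_dict()}

# Resuming restores the internal step counter, so the schedule continues where it left off.
scheduler.load_state_dict(ckpt["scheduler"])
```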
Agreed, Torch schedulers are a bit rigid. That said, you can use them natively with SB as well. As you may have seen, we follow the opposite direction for now: more central places and less complexity in local scripts. I guess it's a balance between how much maintenance you can put in from a coordinated team (central) vs how much you wish to rely on the community for (local scripts). At least, that is my personal opinion; I find it hard to properly maintain recipes as they tend to grow way too rapidly in number :p
Hm yes, for now we are aiming to get the best possible WER with reasonable latency before we add lots of recipes; at a later time we might consider centralizing things a bit. I figure if people really need recipes that work for a specific dataset, they can always get them from speechbrain or ESPNet.
The numbers you get with Transducers are really impressive; I really hope we soon obtain enough resources to put someone on this full-time. The last intern who tried did not succeed, but he had other things to do as well (the PR where he tried your nice pruned transducer loss).
Create a schedule with a learning rate that decreases linearly from the initial lr set in the optimizer to 0, after a warmup period during which it increases linearly from 0 to the initial lr set in the optimizer.
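A minimal sketch of that schedule as a per-step multiplier on the initial lr (the parameter names `num_warmup_steps` and `num_training_steps` are placeholders, not necessarily the PR's argument names):

```python
def linear_warmup_decay(current_step, num_warmup_steps, num_training_steps):
    """Multiplier on the initial lr: rises 0 -> 1 during warmup, then falls 1 -> 0 linearly."""
    if current_step < num_warmup_steps:
        return current_step / max(1, num_warmup_steps)
    remaining = num_training_steps - current_step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


# Example with 10 warmup steps out of 100 total:
# linear_warmup_decay(5, 10, 100)   -> 0.5   (halfway through warmup)
# linear_warmup_decay(10, 10, 100)  -> 1.0   (peak, full initial lr)
# linear_warmup_decay(55, 10, 100)  -> 0.5   (halfway through decay)
# linear_warmup_decay(100, 10, 100) -> 0.0
```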