This is an open question. A newly proposed category may not yet have enough support to justify adding it.
How do we determine whether a category is of sufficient quality to be added? One criterion could be its number of samples; another could be based on its F1 score, although F1 alone may be somewhat misleading.
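One possible way to combine the two criteria is sketched below: compute per-category support and F1, then require both to clear a threshold. This is only an illustration, not a settled rule. The function names `per_category_stats` and `is_admissible` and the threshold values `MIN_SUPPORT` and `MIN_F1` are hypothetical and would need to be chosen for the actual dataset.

```python
from collections import Counter

# Hypothetical thresholds; nothing in the discussion above fixes these values.
MIN_SUPPORT = 30
MIN_F1 = 0.7


def per_category_stats(y_true, y_pred):
    """Return {category: (support, f1)} from two parallel lists of labels."""
    support = Counter(y_true)    # TP + FN per category
    predicted = Counter(y_pred)  # TP + FP per category
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    stats = {}
    for cat in support:
        # F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (predicted + support)
        f1 = 2 * true_pos[cat] / (predicted[cat] + support[cat])
        stats[cat] = (support[cat], f1)
    return stats


def is_admissible(support, f1, min_support=MIN_SUPPORT, min_f1=MIN_F1):
    """Accept a category only if it has enough samples AND a high enough F1."""
    return support >= min_support and f1 >= min_f1


# Toy data: category "b" is predicted perfectly but has only 3 samples.
y_true = ["a"] * 40 + ["b"] * 3
y_pred = ["a"] * 40 + ["b"] * 3
for cat, (n, f1) in per_category_stats(y_true, y_pred).items():
    print(cat, n, f1, is_admissible(n, f1))
```

In this toy data, category "b" reaches an F1 of 1.0 on only 3 samples and is still rejected by the support check, which illustrates one way F1 on its own can mislead.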