fleshing this out a bit more, I'm pretty sure the idea here was to intelligently add capacity to a network mid-training?
use a stick-breaking-style prior (Chinese restaurant process, Indian buffet process, etc.) to sample feature weights into correlated groups
each "table" has its own weight-sampling parameters, which stay fixed once drawn. so whenever we break a stick or create a new table (etc.), we sample e.g. a mean and variance to use for weight initialization at that table
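rough sketch of what I think this looks like, using the Chinese restaurant process version. everything named here is made up for illustration (`CRPInitializer`, `add_unit`, `alpha`, and especially the hyperpriors for the per-table mean and std, which are arbitrary placeholders), so treat it as a guess at the idea rather than the actual method:

```python
import numpy as np


class CRPInitializer:
    """Hypothetical helper: assigns each new unit to a CRP "table" and
    initializes its weights from that table's fixed (mean, std)."""

    def __init__(self, fan_in, alpha=1.0, seed=0):
        self.fan_in = fan_in          # incoming weights per unit
        self.alpha = alpha            # CRP concentration: higher -> more new tables
        self.rng = np.random.default_rng(seed)
        self.counts = []              # number of units seated at each table
        self.table_params = []        # fixed (mean, std) per table

    def add_unit(self):
        """Add one unit (e.g. mid-training). Returns (table index, weight vector)."""
        # CRP seating rule: existing table k with probability proportional to
        # counts[k], a brand-new table with probability proportional to alpha.
        probs = np.array(self.counts + [self.alpha], dtype=float)
        probs /= probs.sum()
        k = int(self.rng.choice(len(probs), p=probs))

        if k == len(self.counts):
            # New table: sample its parameters once; they are never resampled.
            # ASSUMPTION: these hyperpriors are placeholders, not from the notes.
            mean = self.rng.normal(0.0, 0.1)
            std = self.rng.uniform(0.01, 0.1)
            self.counts.append(0)
            self.table_params.append((mean, std))

        self.counts[k] += 1
        mean, std = self.table_params[k]
        weights = self.rng.normal(mean, std, size=self.fan_in)
        return k, weights


init = CRPInitializer(fan_in=4, alpha=1.0, seed=0)
units = [init.add_unit() for _ in range(8)]
assignments = [k for k, _ in units]
W = np.stack([w for _, w in units])   # shape (8, 4): one row per added unit
```

units that land on the same table share a mean and variance, which is where the "correlated groups" would come from. growing the network is then just calling `add_unit()` again and appending the returned row to the layer's weight matrix.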