I'm writing a repo that simply takes all the toy models in Pints, and all the methods (optimisers and inference), and then tests every method against every model. This will take a while, so it's all running on arcus-b (there's a lot of machine-specific stuff in there, so it's not suitable to put into Pints itself).
When I'm comparing optimisers, I compare using the following criteria:
- final score
- time taken to reach the final score (this is the time taken using all the cores of a whole node on arcus-b)
I might also average these results over multiple runs, since some of the optimisers are stochastic.
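As a rough sketch of the averaging idea, something like the harness below could record the final score and wall-clock time for each repeat and report the means. Note this is illustrative only: `benchmark` and `random_search` are hypothetical stand-ins I've made up, not Pints API, and a real run would call a Pints optimiser on a Pints error measure instead.

```python
import time
import numpy as np

def benchmark(optimiser, objective, x0, repeats=10, seed=1):
    """Run `optimiser` several times; return mean final score and mean wall time.

    `optimiser` is any callable (objective, x0, rng) -> final_score. This is a
    generic stand-in for one optimiser run, not a Pints interface.
    """
    rng = np.random.default_rng(seed)
    scores, times = [], []
    for _ in range(repeats):
        t0 = time.perf_counter()
        scores.append(optimiser(objective, x0, rng))
        times.append(time.perf_counter() - t0)
    return float(np.mean(scores)), float(np.mean(times))

def random_search(objective, x0, rng, iters=1000):
    """Toy stand-in optimiser: random perturbations around x0."""
    best = objective(x0)
    for _ in range(iters):
        best = min(best, objective(x0 + rng.normal(size=len(x0))))
    return best

score, seconds = benchmark(random_search,
                           lambda x: float(np.sum(x ** 2)),
                           np.array([5.0, 5.0]))
```

The per-(method, model) pair of numbers `(score, seconds)` is exactly what would fill one cell of the heat maps.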
I'm less sure how to compare the inference methods, perhaps:
- Effective sample size (in a given time limit?)
- Rhat
- anything else?
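For reference, the two criteria above can be computed directly from the chains. Below is a minimal NumPy sketch of the standard Gelman–Rubin Rhat and an autocorrelation-based ESS (truncating the autocorrelation sum at the first negative lag). These are textbook formulas written from scratch, not the implementations Pints would actually use.

```python
import numpy as np

def rhat(chains):
    """Gelman-Rubin Rhat for one parameter. `chains` has shape (m, n)."""
    m, n = chains.shape
    means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # mean within-chain variance
    B = n * means.var(ddof=1)               # between-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled posterior variance estimate
    return float(np.sqrt(var_hat / W))

def ess(x):
    """Effective sample size of one chain, truncating the ACF sum
    at the first negative autocorrelation."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    x = x - x.mean()
    # acf[k] = lag-k autocorrelation (acf[0] == 1)
    acf = np.correlate(x, x, mode='full')[n - 1:] / (n * x.var())
    tau = 1.0
    for k in range(1, n):
        if acf[k] < 0:
            break
        tau += 2.0 * acf[k]
    return n / tau
```

For well-mixed independent chains Rhat should be close to 1 and ESS close to the chain length; "ESS per second" (or ESS within a fixed time budget) would then be a natural single number per cell of the heat map.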
What other criteria do you all think are necessary @MichaelClerx @ben18785 @sanmitraghosh @mirams @chonlei? I'm hoping this will give a set of heat maps comparing the performance of all of our methods, which will go into the first paper.