Change host norm implementation to use thrust and split the vector into chunks small enough to fit on the GPU
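A minimal sketch of the chunked approach, assuming "norm" means the Euclidean (L2) norm of a host-resident `std::vector<double>`. The names `host_norm`, `square` and `chunk_elems` are hypothetical, not taken from the actual change. Each chunk is copied into one reused device buffer and reduced with `thrust::transform_reduce`. The per-chunk sums of squares are accumulated on the host, and the square root is taken once at the end.

```cuda
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/transform_reduce.h>

// Unary transform for the reduction: squares one element (runs on the device).
struct square {
    __host__ __device__ double operator()(double x) const { return x * x; }
};

// Hypothetical helper: L2 norm of a host vector, computed on the GPU in chunks.
// chunk_elems is an assumed tuning parameter: the number of elements that
// fit in device memory at once.
double host_norm(const std::vector<double>& h, std::size_t chunk_elems) {
    assert(chunk_elems > 0 && "chunk size must be positive");
    if (h.empty()) return 0.0;

    // One device buffer, sized to the largest chunk, reused for every chunk.
    thrust::device_vector<double> d(std::min(chunk_elems, h.size()));
    double sum_sq = 0.0;  // running sum of squares across all chunks

    for (std::size_t off = 0; off < h.size(); off += chunk_elems) {
        const std::size_t n = std::min(chunk_elems, h.size() - off);
        auto first = h.begin() + static_cast<std::ptrdiff_t>(off);

        // Host -> device copy of this chunk only.
        thrust::copy(first, first + static_cast<std::ptrdiff_t>(n), d.begin());

        // Sum of squares of the chunk, reduced on the device.
        sum_sq += thrust::transform_reduce(d.begin(),
                                           d.begin() + static_cast<std::ptrdiff_t>(n),
                                           square(), 0.0, thrust::plus<double>());
    }
    return std::sqrt(sum_sq);
}
```

For example, `host_norm({3.0, 4.0}, 1)` processes two one-element chunks (9 + 16) and returns 5. Reusing a single buffer keeps peak device memory at `chunk_elems` elements regardless of the input size, at the cost of one host-to-device copy per chunk.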