Implement parallel prefix-sum with OpenMP

The specialization of PrefixSum for the host is sequential only; no parallelization is implemented.
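
One possible approach is the usual two-pass block scan: each thread scans its own contiguous block, the block totals are scanned sequentially, and each thread then adds its block's offset. The following is only a minimal standalone sketch, not the actual PrefixSum interface. The function name `inclusivePrefixSumOpenMP` and the use of `std::vector` are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper (not part of the library): in-place inclusive
// prefix sum using a two-pass block scan.
template <typename T>
void inclusivePrefixSumOpenMP(std::vector<T>& data)
{
    const std::size_t n = data.size();
    if (n == 0)
        return;

#ifdef _OPENMP
    const int maxThreads = omp_get_max_threads();
#else
    const int maxThreads = 1;  // pragmas are ignored without OpenMP
#endif

    // blockSums[t + 1] receives the total of block t; after the scan in
    // the single region, blockSums[t] is the offset to add to block t.
    std::vector<T> blockSums(static_cast<std::size_t>(maxThreads) + 1, T{});

    #pragma omp parallel num_threads(maxThreads)
    {
#ifdef _OPENMP
        const int threads = omp_get_num_threads();
        const int tid = omp_get_thread_num();
#else
        const int threads = 1;
        const int tid = 0;
#endif
        assert(threads <= maxThreads);

        // Contiguous block [begin, end) owned by this thread.
        const std::size_t chunk = (n + threads - 1) / threads;
        const std::size_t begin = std::min(n, static_cast<std::size_t>(tid) * chunk);
        const std::size_t end = std::min(n, begin + chunk);

        // Pass 1: sequential scan within the block.
        T sum{};
        for (std::size_t i = begin; i < end; ++i) {
            sum += data[i];
            data[i] = sum;
        }
        blockSums[tid + 1] = sum;

        #pragma omp barrier

        // One thread scans the block totals (short, so done sequentially).
        #pragma omp single
        {
            for (int t = 1; t <= threads; ++t)
                blockSums[t] += blockSums[t - 1];
        }
        // Implicit barrier at the end of the single region.

        // Pass 2: shift the block by the sum of all preceding blocks.
        const T offset = blockSums[tid];
        for (std::size_t i = begin; i < end; ++i)
            data[i] += offset;
    }
}
```

Compiled without OpenMP support, the sketch falls back to a single sequential pass. For floating-point types the parallel result may differ slightly from the sequential one, because the additions are grouped differently.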

Edited by Jakub Klinkovský