* master: Tuning Ellpack formats. Tuning Ellpack format. Fixing bug in tnlVector::operator !=. Optimizing CUDA L2 norm. Adding benchmark for CUDA lp norm.