Commit 206aa116 authored by Tomáš Oberhuber

Writting redcution tutorial.

parent 2d0b5ea5
+2 −6
@@ -9,13 +9,9 @@ IF( BUILD_CUDA )
   ADD_CUSTOM_COMMAND( COMMAND MaximumNormExample > MaximumNormExample.out OUTPUT MaximumNormExample.out )
   CUDA_ADD_EXECUTABLE( ComparisonExample ComparisonExample.cu )
   ADD_CUSTOM_COMMAND( COMMAND ComparisonExample > ComparisonExample.out OUTPUT ComparisonExample.out )
-#   CUDA_ADD_EXECUTABLE( UpdateAndResidueExample UpdateAndResidueExample.cu )
-#   ADD_CUSTOM_COMMAND( COMMAND UpdateAndResidueExample > UpdateAndResidueExample.out OUTPUT UpdateAndResidueExample.out )
-ENDIF()
-
-ADD_EXECUTABLE( UpdateAndResidueExample UpdateAndResidueExample.cpp )
+   CUDA_ADD_EXECUTABLE( UpdateAndResidueExample UpdateAndResidueExample.cu )
   ADD_CUSTOM_COMMAND( COMMAND UpdateAndResidueExample > UpdateAndResidueExample.out OUTPUT UpdateAndResidueExample.out )
-
+ENDIF()

IF( BUILD_CUDA )
ADD_CUSTOM_TARGET( TutorialsReduction-cuda ALL DEPENDS
+1 −1
@@ -18,7 +18,7 @@ double updateAndResidue( Vector< double, Device >& u, const Vector< double, Devi
      return add * add; };
   auto reduce = [] __cuda_callable__ ( double& a, const double& b ) { a += b; };
   auto volatileReduce = [=] __cuda_callable__ ( volatile double& a, const volatile double& b ) { a += b; };
-   return Reduction< Device >::reduce( u_view.getSize(), reduce, volatileReduce, fetch, 0.0 );
+   return sqrt( Reduction< Device >::reduce( u_view.getSize(), reduce, volatileReduce, fetch, 0.0 ) );
}

int main( int argc, char* argv[] )
+9 −6
@@ -11,6 +11,8 @@ This tutorial introduces flexible parallel reduction in TNL. It shows how to eas
   3. [Scalar product](#flexible_parallel_reduction_scalar_product)
   4. [Maxium norm](#flexible_parallel_reduction_maximum_norm)
   5. [Vectors comparison](#flexible_parallel_reduction_vector_comparison)
+   6. [Update and Residue](#flexible_parallel_reduction_update_and_residue)
+   7. [Simple Mask and Reduce](#flexible_parallel_reduction_simple_mask_and_reduce)

## Flexible parallel reduction<a name="flexible_parallel_reduction"></a>

@@ -106,7 +108,7 @@ In iterative solvers we often need to update a vector and compute the update nor
\bf u^{k+1} = \bf u^k + \tau \Delta \bf u.
\f]

-Except the vector addition, we may want to compute \f$L_p\f$-norm of \f$\Delta \bf u\f$ which may indicate convergence. Computing first the addition and then the norm would be inefficient because we would have to fetch the vector \f$\Delta \bf u\f$ twice from the memory. The following example shows how to do the addition and norm computation at the same time.
+Together with the vector addition, we may want to compute also \f$L_2\f$-norm of \f$\Delta \bf u\f$ which may indicate convergence. Computing first the addition and then the norm would be inefficient because we would have to fetch the vector \f$\Delta \bf u\f$ twice from the memory. The following example shows how to do the addition and norm computation at the same time.

\include UpdateAndResidueExample.cpp

@@ -114,4 +116,5 @@ The result reads as:

\include UpdateAndResidueExample.out
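
For readers who want to see the fused update and norm without the TNL machinery, the idea can be sketched in plain sequential C++. This is only an illustration under stated assumptions: `updateAndResidueSketch` is a hypothetical helper that is not part of TNL, it uses `std::vector` instead of TNL vectors and views, and it runs on the host only. It mirrors the role of the `fetch` lambda, which updates `u` and returns the squared entry of \f$\Delta \bf u\f$, and it applies the square root to the accumulated sum, as this commit does for the reduction result.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of TNL: performs u += tau * deltaU and
// returns the L2-norm of deltaU, reading each deltaU[ i ] only once.
double updateAndResidueSketch( std::vector< double >& u,
                               const std::vector< double >& deltaU,
                               double tau )
{
   assert( u.size() == deltaU.size() );
   double sumOfSquares = 0.0;
   for( std::size_t i = 0; i < u.size(); i++ ) {
      const double add = deltaU[ i ];  // the only fetch of deltaU[ i ]
      u[ i ] += tau * add;             // vector update
      sumOfSquares += add * add;       // contribution to the squared norm
   }
   // The square root turns the sum of squares into the L2-norm.
   return std::sqrt( sumOfSquares );
}
```

With \f$\bf u = (1, 2)\f$, \f$\Delta \bf u = (3, 4)\f$ and \f$\tau = 0.5\f$, the sketch leaves \f$\bf u = (2.5, 4)\f$ and returns 5.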

### Simple Mask and Reduce<a name="flexible_parallel_reduction_simple_mask_and_reduce"></a>