Writing reduction tutorial. (206aa116) · Commits · TNL / tnl-dev

Documentation/Tutorials/Reduction/CMakeLists.txt

+2 −6

Original line number	Diff line number	Diff line
		@@ -9,13 +9,9 @@ IF( BUILD_CUDA )
		ADD_CUSTOM_COMMAND( COMMAND MaximumNormExample > MaximumNormExample.out OUTPUT MaximumNormExample.out )
		CUDA_ADD_EXECUTABLE( ComparisonExample ComparisonExample.cu )
		ADD_CUSTOM_COMMAND( COMMAND ComparisonExample > ComparisonExample.out OUTPUT ComparisonExample.out )
		- # CUDA_ADD_EXECUTABLE( UpdateAndResidueExample UpdateAndResidueExample.cu )
		- # ADD_CUSTOM_COMMAND( COMMAND UpdateAndResidueExample > UpdateAndResidueExample.out OUTPUT UpdateAndResidueExample.out )
		- ENDIF()
		-
		- ADD_EXECUTABLE( UpdateAndResidueExample UpdateAndResidueExample.cpp )
		+ CUDA_ADD_EXECUTABLE( UpdateAndResidueExample UpdateAndResidueExample.cu )
		ADD_CUSTOM_COMMAND( COMMAND UpdateAndResidueExample > UpdateAndResidueExample.out OUTPUT UpdateAndResidueExample.out )
		-
		+ ENDIF()

		IF( BUILD_CUDA )
		ADD_CUSTOM_TARGET( TutorialsReduction-cuda ALL DEPENDS

Documentation/Tutorials/Reduction/UpdateAndResidueExample.cpp

+1 −1

		@@ -18,7 +18,7 @@ double updateAndResidue( Vector< double, Device >& u, const Vector< double, Devi
		return add * add; };
		auto reduce = [] __cuda_callable__ ( double& a, const double& b ) { a += b; };
		auto volatileReduce = [=] __cuda_callable__ ( volatile double& a, const volatile double& b ) { a += b; };
		- return Reduction< Device >::reduce( u_view.getSize(), reduce, volatileReduce, fetch, 0.0 );
		+ return sqrt( Reduction< Device >::reduce( u_view.getSize(), reduce, volatileReduce, fetch, 0.0 ) );
		}

		int main( int argc, char* argv[] )

Documentation/Tutorials/Reduction/tutorial_03_Reduction.md

+9 −6

		@@ -11,6 +11,8 @@ This tutorial introduces flexible parallel reduction in TNL. It shows how to eas
		3. [Scalar product](#flexible_parallel_reduction_scalar_product)
		4. [Maxium norm](#flexible_parallel_reduction_maximum_norm)
		5. [Vectors comparison](#flexible_parallel_reduction_vector_comparison)
		+ 6. [Update and Residue](#flexible_parallel_reduction_update_and_residue)
		+ 7. [Simple Mask and Reduce](#flexible_parallel_reduction_simple_mask_and_reduce)

		## Flexible parallel reduction<a name="flexible_parallel_reduction"></a>

		@@ -106,7 +108,7 @@ In iterative solvers we often need to update a vector and compute the update nor
		\bf u^{k+1} = \bf u^k + \tau \Delta \bf u.
		\f]

		- Except the vector addition, we may want to compute \f$L_p\f$-norm of \f$\Delta \bf u\f$ which may indicate convergence. Computing first the addition and then the norm would be inefficient because we would have to fetch the vector \f$\Delta \bf u\f$ twice from the memory. The following example shows how to do the addition and norm computation at the same time.
		+ Together with the vector addition, we may want to compute also \f$L_2\f$-norm of \f$\Delta \bf u\f$ which may indicate convergence. Computing first the addition and then the norm would be inefficient because we would have to fetch the vector \f$\Delta \bf u\f$ twice from the memory. The following example shows how to do the addition and norm computation at the same time.

		\include UpdateAndResidueExample.cpp
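
Note on the hunk above: the fused update and L2-norm computation it describes can be sketched in plain sequential C++, without TNL or CUDA. This is only an illustration of the idea, not the contents of `UpdateAndResidueExample.cpp`. The function name `updateAndResidueSketch` and the use of `std::vector` are assumptions made for this sketch, and the residue is taken to be the norm of the unscaled `delta_u`, which is how the `return add * add;` line in the diff reads. The final `sqrt` corresponds to the change made in this commit.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of TNL: performs u += tau * delta_u and
// returns the L2 norm of delta_u, reading each delta_u[ i ] only once.
double updateAndResidueSketch( std::vector< double >& u,
                               const std::vector< double >& delta_u,
                               double tau )
{
   assert( u.size() == delta_u.size() );
   double sum = 0.0;                       // plays the role of the reduction result
   for( std::size_t i = 0; i < u.size(); i++ ) {
      const double add = delta_u[ i ];     // single fetch of delta_u[ i ]
      u[ i ] += tau * add;                 // vector update
      sum += add * add;                    // accumulate the squared entry
   }
   return std::sqrt( sum );                // square root turns the sum into the L2 norm
}
```

Calling it with `u = {1, 2}`, `delta_u = {3, 4}` and `tau = 0.5` leaves `u = {2.5, 4}` and returns 5. The loop body does the work of the `fetch` lambda and the `sum +=` line does the work of the `reduce` lambda in the TNL version.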

		@@ -114,4 +116,5 @@ The result reads as:

		\include UpdateAndResidueExample.out

		+ ### Simple Mask and Reduce<a name="flexible_parallel_reduction_simple_mask_and_reduce"></a>