Documentation/Examples/Algorithms/CMakeLists.txt  +3 −3

@@ -9,11 +9,11 @@ ENDIF()
 ADD_EXECUTABLE(staticForExample staticForExample.cpp)
 ADD_CUSTOM_COMMAND( COMMAND staticForExample > ${TNL_DOCUMENTATION_OUTPUT_SNIPPETS_PATH}/staticForExample.out OUTPUT staticForExample.out )

-ADD_EXECUTABLE(UnrolledForExample UnrolledForExample.cpp)
-ADD_CUSTOM_COMMAND( COMMAND UnrolledForExample > ${TNL_DOCUMENTATION_OUTPUT_SNIPPETS_PATH}/UnrolledForExample.out OUTPUT UnrolledForExample.out )
+ADD_EXECUTABLE(unrolledForExample unrolledForExample.cpp)
+ADD_CUSTOM_COMMAND( COMMAND unrolledForExample > ${TNL_DOCUMENTATION_OUTPUT_SNIPPETS_PATH}/unrolledForExample.out OUTPUT unrolledForExample.out )

 ADD_CUSTOM_TARGET( RunAlgorithmsExamples ALL DEPENDS
    ParallelForExample.out
-   UnrolledForExample.out
+   unrolledForExample.out
    staticForExample.out )
Documentation/Examples/Algorithms/UnrolledForExample.cpp → Documentation/Examples/Algorithms/unrolledForExample.cpp  +7 −8

 #include <iostream>
 #include <TNL/Containers/StaticVector.h>
-#include <TNL/Algorithms/UnrolledFor.h>
+#include <TNL/Algorithms/unrolledFor.h>

 using namespace TNL;
 using namespace TNL::Containers;

@@ -19,13 +19,12 @@ int main( int argc, char* argv[] )
    /****
     * Compute an addition of a vector and a constant number.
     */
-   auto addition = [&]( int i, const double& c ) {
-      a[ i ] = b[ i ] + c;
+   Algorithms::unrolledFor< int, 0, Size >(
+      [&]( int i ) {
+         a[ i ] = b[ i ] + 3.14;
          sum += a[ i ];
-   };
-   Algorithms::UnrolledFor< 0, Size >::exec( addition, 3.14 );
+      }
+   );

    std::cout << "a = " << a << std::endl;
    std::cout << "sum = " << sum << std::endl;
 }
Documentation/Tutorials/ForLoops/tutorial_ForLoops.md  +12 −12

@@ -65,24 +65,24 @@ For completeness, we show modification of the previous example into 3D:

 ## Unrolled For

-\ref TNL::Algorithms::UnrolledFor is a for-loop that it is explicitly unrolled via C++ templates when the loop is short (up to eight iterations). The bounds of `UnrolledFor` loops must be constant (i.e. known at the compile time).
+\ref TNL::Algorithms::unrolledFor is a for-loop that it is explicitly unrolled via C++ templates when the loop is short (up to eight iterations). The bounds of `unrolledFor` loops must be constant (i.e. known at the compile time).
 It is often used with static arrays and vectors. See the following example:

-\include UnrolledForExample.cpp
+\include unrolledForExample.cpp

 Notice that the unrolled for-loop works with a lambda function similar to parallel for-loop.
-The bounds of the loop are passed as template parameters in the statement `Algorithms::UnrolledFor< 0, Size >`. The parameters of the static method `exec` are the lambda functions to be performed in each iteration and auxiliary data to be passed to the function. The function gets the loop index `i` first followed by the auxiliary data `sum` in this example.
+The bounds of the loop are passed as template parameters in the statement `Algorithms::unrolledFor< int, 0, Size >`. The parameter of the `unrolledFor` function is the functor to be called in each iteration. The function gets the loop index `i` only, see the following example:

 The result looks as:

-\include UnrolledForExample.out
+\include unrolledForExample.out

-The effect of `UnrolledFor` is really the same as usual for-loop.
+The effect of `unrolledFor` is really the same as usual for-loop.
 The following code does the same as the previous example:

 ```cpp
@@ -93,14 +93,14 @@ for( int i = 0; i < Size; i++ )
 };
 ```

-The benefit of `UnrolledFor` is mainly in the explicit unrolling of short loops which can improve performance in some situations. `UnrolledFor` can be forced to do the loop-unrolling in any situations using the third template parameter as follows:
+The benefit of `unrolledFor` is mainly in the explicit unrolling of short loops which can improve performance in some situations. The maximum length of loops that will be fully unrolled can be specified using the fourth template parameter as follows:

 ```cpp
-Algorithms::UnrolledFor< 0, Size, true >::exec( addition, 3.14 );
+Algorithms::unrolledFor< int, 0, Size, 16 >( ... );
 ```

-`UnrolledFor` can be used also in CUDA kernels.
+`unrolledFor` can be used also in CUDA kernels.

 ## Static For
src/TNL/Algorithms/UnrolledFor.h  deleted  100644 → 0  +0 −88

-/***************************************************************************
-                          UnrolledFor.h  -  description
-                             -------------------
-    begin                : Jul 16, 2019
-    copyright            : (C) 2019 by Tomas Oberhuber
-    email                : tomas.oberhuber@fjfi.cvut.cz
- ***************************************************************************/
-
-/* See Copyright Notice in tnl/Copyright */
-
-#pragma once
-
-#include <utility>
-
-#include <TNL/Cuda/CudaCallable.h>
-
-namespace TNL {
-namespace Algorithms {
-
-/**
- * \brief UnrolledFor is a wrapper for common for-loop with explicit unrolling.
- *
- * UnrolledFor can be used only for for-loops bounds of which are known at the
- * compile time. UnrolledFor performs explicit loop unrolling for better performance.
- * This, however, does not make sense for loops with a large iterations
- * count. For a very large iterations count it could trigger the compiler's
- * limit on recursive template instantiation. Also note that the compiler
- * will (at least partially) unroll loops with static bounds anyway. For theses
- * reasons, the explicit loop unrolling can be controlled by the third template
- * parameter.
- *
- * \tparam Begin the loop will iterate over indexes [Begin,End)
- * \tparam End the loop will iterate over indexes [Begin,End)
- * \tparam unrolled controls the explicit loop unrolling. If it is true, the
- *   unrolling is performed.
- *
- * \par Example
- * \include Algorithms/UnrolledForExample.cpp
- * \par Output
- * \include UnrolledForExample.out
- */
-template< int Begin, int End, bool unrolled = (End - Begin <= 8) >
-struct UnrolledFor;
-
-template< int Begin, int End >
-struct UnrolledFor< Begin, End, true >
-{
-   static_assert( Begin < End, "Wrong index interval for UnrolledFor. Begin must be less than end." );
-
-   /**
-    * \brief Static method for the execution of the UnrolledFor.
-    *
-    * \param f is a (lambda) function to be performed in each iteration.
-    * \param args are auxiliary data to be passed to the function f.
-    */
-   template< typename Function, typename... Args >
-   __cuda_callable__
-   static void exec( const Function& f, Args&&... args )
-   {
-      f( Begin, args... );
-      UnrolledFor< Begin + 1, End >::exec( f, std::forward< Args >( args )... );
-   }
-};
-
-template< int End >
-struct UnrolledFor< End, End, true >
-{
-   template< typename Function, typename... Args >
-   __cuda_callable__
-   static void exec( const Function& f, Args&&... args )
-   {}
-};
-
-template< int Begin, int End >
-struct UnrolledFor< Begin, End, false >
-{
-   static_assert( Begin <= End, "Wrong index interval for UnrolledFor. Begin must be less than or equal to end." );
-
-   template< typename Function, typename... Args >
-   __cuda_callable__
-   static void exec( const Function& f, Args&&... args )
-   {
-      for( int i = Begin; i < End; i++ )
-         f( i, std::forward< Args >( args )... );
-   }
-};
-
-} // namespace Algorithms
-} // namespace TNL
src/TNL/Algorithms/unrolledFor.h  0 → 100644  +100 −0

+/***************************************************************************
+                          unrolledFor.h  -  description
+                             -------------------
+    begin                : Jul 16, 2019
+    copyright            : (C) 2019 by Tomas Oberhuber
+    email                : tomas.oberhuber@fjfi.cvut.cz
+ ***************************************************************************/
+
+/* See Copyright Notice in tnl/Copyright */
+
+#pragma once
+
+#include <utility>
+
+namespace TNL {
+namespace Algorithms {
+namespace detail {
+
+template< typename Index, Index begin, Index end >
+struct UnrolledFor
+{
+   static_assert( begin < end, "internal error - wrong iteration index for UnrolledFor" );
+
+   template< typename Func >
+   static constexpr void exec( Func&& f )
+   {
+      f( begin );
+      UnrolledFor< Index, begin + 1, end >::exec( std::forward< Func >( f ) );
+   }
+};
+
+template< typename Index, Index end >
+struct UnrolledFor< Index, end, end >
+{
+   template< typename Func >
+   static constexpr void exec( Func&& f ) {}
+};
+
+// specialization for short loops - unrolling
+template< typename Index, Index begin, Index end, Index unrollFactor, typename Func >
+constexpr std::enable_if_t< (begin < end && end - begin <= unrollFactor) >
+unrolled_for_dispatch( Func&& f )
+{
+   UnrolledFor< Index, begin, end >::exec( std::forward< Func >( f ) );
+}
+
+// specialization for long loops - normal for-loop
+template< typename Index, Index begin, Index end, Index unrollFactor, typename Func >
+constexpr std::enable_if_t< (begin < end && end - begin > unrollFactor) >
+unrolled_for_dispatch( Func&& f )
+{
+   for( Index i = begin; i < end; i++ )
+      f( i );
+}
+
+// specialization for empty loop
+template< typename Index, Index begin, Index end, Index unrollFactor, typename Func >
+constexpr std::enable_if_t< (begin >= end) >
+unrolled_for_dispatch( Func&& f )
+{}
+
+} // namespace detail
+
+/**
+ * \brief Generic for-loop with explicit unrolling.
+ *
+ * \e unrolledFor performs explicit loop unrolling of short loops which can
+ * improve performance in some cases. The bounds of the for-loop must be constant
+ * (i.e. known at the compile time). Loops longer than \e unrollFactor are not
+ * unrolled and executed as a normal for-loop.
+ *
+ * The unroll factor is configurable, but note that full unrolling does not
+ * make sense for very long loops. It might even trigger the compiler's limit
+ * on recursive template instantiation. Also note that the compiler will (at
+ * least partially) unroll loops with static bounds anyway.
+ *
+ * \tparam Index is the type of the loop indices.
+ * \tparam begin is the left bound of the iteration range `[begin, end)`.
+ * \tparam end is the right bound of the iteration range `[begin, end)`.
+ * \tparam unrollFactor is the maximum length of loops to fully unroll via
+ *                      recursive template instantiation.
+ * \tparam Func is the type of the functor (it is usually deduced from the
+ *              argument used in the function call).
+ *
+ * \param f is the functor to be called in each iteration.
+ *
+ * \par Example
+ * \include Algorithms/unrolledForExample.cpp
+ * \par Output
+ * \include unrolledForExample.out
+ */
+template< typename Index, Index begin, Index end, Index unrollFactor = 8, typename Func >
+constexpr void unrolledFor( Func&& f )
+{
+   detail::unrolled_for_dispatch< Index, begin, end, unrollFactor >( std::forward< Func >( f ) );
+}
+
+} // namespace Algorithms
+} // namespace TNL