Commit 776af14a authored by Tomáš Oberhuber

Writing vector tutorial.
# Arrays tutorial
## Introduction
This tutorial introduces arrays in TNL. ```Array``` is one of the most important structures for memory management. Methods implemented in arrays are particularly useful for GPU programming. From this point of view, the reader will learn how to easily allocate memory on GPU, transfer data between GPU and CPU, but also how to initialise data allocated on GPU. In addition, the resulting code is hardware platform independent, so it can be run on CPU without any changes.
## Arrays
```Array``` is a templated class defined in the namespace ```TNL::Containers``` having three template parameters:
* ```Value``` is the type of data to be stored in the array
* ```Device``` is the device where the array is allocated. Currently it can be either ```Devices::Host``` for CPU or ```Devices::Cuda``` for GPU supporting CUDA.
* ```Index``` is the type to be used for indexing the array elements.
The following example shows how to allocate arrays on CPU and GPU and how to manipulate the data.
\include ArrayAllocation.cpp
The result looks as follows:
\include ArrayAllocation.out
## Arrays binding
Arrays can share data with each other or with data allocated elsewhere. This is called binding and it can be done using the method ```bind```. The following example shows how to bind data allocated on the host using the ```new``` operator. In this case, the TNL array does not free this data at the end of its life cycle.
\include ArrayBinding-1.cpp
It generates output like this:
\include ArrayBinding-1.out
One may also bind another TNL array. In this case, the data can be shared between multiple arrays. A reference counter ensures that the data is freed after the last array sharing the data ends its life cycle.
\include ArrayBinding-2.cpp
The result is:
\include ArrayBinding-2.out
Binding may also serve for data partitioning. Both CPU and GPU prefer data allocated in large contiguous blocks rather than in many fragmented pieces of allocated memory. Another reason why one might want to partition the allocated data is demonstrated in the following example. Consider a situation of solving incompressible flow in 2D. The degrees of freedom consist of density and two components of velocity. Mostly, we want to manipulate either the density or the velocity. But some numerical solvers may need to have all degrees of freedom in one array. It can be managed like this:
\include ArrayBinding-3.cpp
The result is:
\include ArrayBinding-3.out
## Array views
Because of the data sharing, TNL ```Array``` is a relatively complicated structure. In many situations, we prefer a lightweight structure which only encapsulates the data pointer and keeps information about the data size. Passing an array structure to a GPU kernel is one example. For this purpose there is ```ArrayView``` in TNL. It is a templated structure having the same template parameters as ```Array``` (that is, ```Value```, ```Device``` and ```Index```). In fact, it is recommended to use ```Array``` only for the data allocation and to use ```ArrayView``` for most of the operations with the data, since array views offer better functionality (for example, ```ArrayView``` can be captured by lambda functions in CUDA while ```Array``` cannot). The following code snippet shows how to create an array view.
\include ArrayView-1.cpp
Its output is:
\include ArrayView-1.out
Of course, one may also bind one's own data into an array view:
\include ArrayView-2.cpp
Output:
\include ArrayView-2.out
An array view never allocates or deallocates the memory it manages. Therefore it can be created even in CUDA kernels, which is not true for ```Array```.
## Accessing the array elements
There are two ways to work with the array (or array view) elements - using the indexing operator (```operator[]```), which is more efficient, or the methods ```setElement``` and ```getElement```, which are more flexible.
### Accessing the array elements with ```operator[]```
The indexing operator ```operator[]``` is implemented in both ```Array``` and ```ArrayView``` and it is defined as ```__cuda_callable__```. This means that it can be called even in CUDA kernels if the data is allocated on the GPU, i.e. the ```Device``` parameter is ```Devices::Cuda```. This operator returns a reference to the given array element and so it is very efficient. However, calling this operator from the host for data allocated on the device (or vice versa) leads to a segmentation fault (on the host system) or a broken state of the device. This means:
* You may call the ```operator[]``` on the **host** only for data allocated on the **host** (with device ```Devices::Host```).
* You may call the ```operator[]``` on the **device** only for data allocated on the **device** (with device ```Devices::Cuda```).
The following example shows the use of ```operator[]```.
\include ElementsAccessing-1.cpp
Output:
\include ElementsAccessing-1.out
In general in TNL, each method defined as ```__cuda_callable__``` can be called from CUDA kernels. The method ```ArrayView::getSize``` is another example. We would also like to point the reader to better ways of array initialization, for example with the method ```ArrayView::evaluate``` or with ```ParallelFor```.
### Accessing the array elements with ```setElement``` and ```getElement```
On the other hand, the methods ```setElement``` and ```getElement``` can be called **from the host only** no matter where the array is allocated. Neither of these methods can be used in CUDA kernels. ```getElement``` returns a copy of an element rather than a reference, therefore it is slightly slower. If the array is on the GPU, the array element is copied from the device to the host (or vice versa), which is significantly slower. In those parts of code where performance matters, these methods shall not be called. Their use is, however, much easier and they allow writing one simple code for both CPU and GPU. Both methods are good candidates for:
* reading/writing of only a few elements in the array
* array initialization which is done only once and is not a time-critical part of the code
* debugging purposes
The following example shows the use of ```getElement``` and ```setElement```:
\include ElementsAccessing-2.cpp
Output:
\include ElementsAccessing-2.out
## Array initialization with lambdas
A more efficient and still quite simple method of array initialization is the use of C++ lambda functions together with the method ```evaluate```. This method is implemented in ```ArrayView``` only. A lambda function is passed as an argument and it is then evaluated for all elements. Optionally, one may define only a subinterval of element indexes where the lambda shall be evaluated. If the underlying array is allocated on the GPU, the lambda function is called from a CUDA kernel. This is why it is more efficient than using ```setElement```. On the other hand, one must be careful to use only ```__cuda_callable__``` methods inside the lambda. The following example demonstrates the use of the method ```evaluate```.
\include ArrayViewEvaluate.cpp
Output:
\include ArrayViewEvaluate.out
## Checking the array contents
The methods ```containsValue``` and ```containsOnlyValue``` serve for testing the contents of arrays. ```containsValue``` returns ```true``` if there is at least one element in the array with the given value. ```containsOnlyValue``` returns ```true``` only if all elements of the array equal the given value. The test can be restricted to a subinterval of array elements. Both methods are implemented in ```Array``` as well as in ```ArrayView```. See the following code snippet for an example of use.
\include ContainsValue.cpp
Output:
\include ContainsValue.out
## IO operations with Arrays
The methods ```save``` and ```load``` serve for storing/restoring the array to/from a file in binary form. In the case of ```Array```, loading an array from a file causes data reallocation. ```ArrayView``` cannot reallocate, therefore the data loaded from a file is copied into the memory managed by the ```ArrayView```. The number of elements managed by the array view and the number of elements loaded from the file must be equal. See the following example.
\include ArrayIO.cpp
Output:
\include ArrayIO.out
#IF( BUILD_CUDA )
# CUDA_ADD_EXECUTABLE( ArrayAllocation ArrayAllocation.cu )
# ADD_CUSTOM_COMMAND( COMMAND ArrayAllocation > ArrayAllocation.out OUTPUT ArrayAllocation.out )
#ENDIF()
IF( BUILD_CUDA )
#ADD_EXECUTABLE( Expressions Expressions.cpp )
CUDA_ADD_EXECUTABLE( Expressions Expressions.cu )
ADD_CUSTOM_COMMAND( COMMAND Expressions > Expressions.out OUTPUT Expressions.out )
ENDIF()
#IF( BUILD_CUDA )
#ADD_CUSTOM_TARGET( TutorialsVectors-cuda ALL DEPENDS
# ArrayViewEvaluate.out )
#ENDIF()
IF( BUILD_CUDA )
ADD_CUSTOM_TARGET( TutorialsVectors-cuda ALL DEPENDS
Expressions.out )
ENDIF()
# set input and output files
set(DOXYGEN_IN ${CMAKE_CURRENT_SOURCE_DIR}/Doxyfile.in)
add_custom_target( doc_doxygen_tutorial_vectors ALL
COMMENT "Generating API documentation with Doxygen"
VERBATIM )
INSTALL( DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/html/ DESTINATION ${CMAKE_INSTALL_PREFIX}/share/doc/tnl/html/Tutorials/Arrays )
INSTALL( DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/html/ DESTINATION ${CMAKE_INSTALL_PREFIX}/share/doc/tnl/html/Tutorials/Vectors )
#include <iostream>
#include <TNL/Containers/Vector.h>
#include <TNL/Containers/VectorView.h>
using namespace TNL;
using namespace TNL::Containers;
template< typename Device >
void expressions()
{
using VectorType = Vector< float, Device >;
using ViewType = VectorView< float, Device >;
/****
* Create vectors
*/
const int size = 6;
VectorType a_v( size ), b_v( size ), c_v( size );
ViewType a = a_v.getView();
ViewType b = b_v.getView();
ViewType c = c_v.getView();
a.evaluate( [] __cuda_callable__ ( int i )->float { return i - 3;} );
b = abs( a );
c = sign( b );
std::cout << "a = " << a << std::endl;
std::cout << "b = " << b << std::endl;
std::cout << "c = " << c << std::endl;
std::cout << "a + 3 * b + c * min( c, 0 ) = " << a + 3 * b + c * min( c, 0 ) << std::endl;
}
int main( int argc, char* argv[] )
{
/****
* Perform test on CPU
*/
std::cout << "Expressions on CPU ..." << std::endl;
expressions< Devices::Host >();
/****
* Perform test on GPU
*/
std::cout << "Expressions on GPU ..." << std::endl;
expressions< Devices::Cuda >();
}
Expressions.cpp
## Introduction
This tutorial introduces vectors in TNL. `Vector`, in addition to `Array`, also offers basic operations from linear algebra. Methods implemented in arrays and vectors are particularly useful for GPU programming. From this point of view, the reader will learn how to easily allocate memory on GPU, transfer data between GPU and CPU, but also how to initialise data allocated on GPU and perform parallel reduction and vector operations without writing low-level CUDA kernels. In addition, the resulting code is hardware platform independent, so it can be run on CPU without any changes.
# Table of Contents
1. [Vectors](#vectors)
2. [Static vectors](#static_vectors)
## Vectors <a name="vectors"></a>
`Vector` is, similarly to `Array`, a templated class defined in the namespace `TNL::Containers` having three template parameters:
* `Real` is the type of data to be stored in the vector
* `Device` is the device where the vector is allocated. Currently it can be either `Devices::Host` for CPU or `Devices::Cuda` for GPU supporting CUDA.
* `Index` is the type to be used for indexing the vector elements.
`Vector`, unlike `Array`, requires that the `Real` type is numeric or a type for which basic algebraic operations are defined. What kind of algebraic operations is required depends on what vector operations the user will call. `Vector` is derived from `Array`, so it inherits all its methods. In the same way as `Array` has its counterpart `ArrayView`, `Vector` has `VectorView`, which is derived from `ArrayView`. We refer to the [Arrays tutorial](../Arrays/index.html) for more details.
### Vector expressions
Vector expressions in TNL are processed by [Expression Templates](https://en.wikipedia.org/wiki/Expression_templates). This makes algebraic operations with vectors easy to write and very efficient at the same time. In some cases, one gets even more efficient code compared to [BLAS](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) and [cuBLAS](https://developer.nvidia.com/cublas). See the following example to learn how simple it is.
\include Expressions.cpp
Output is:
\include Expressions.out
## Static vectors <a name="static_vectors"></a>