@@ -0,0 +1,20 @@
if(UNIX)
# Direct CMake to use dpcpp rather than the default C++ compiler/linker
set(CMAKE_CXX_COMPILER dpcpp)
else() # Windows
# Force CMake to use dpcpp rather than the default C++ compiler/linker
# (needed on Windows only)
include (CMakeForceCompiler)
CMAKE_FORCE_CXX_COMPILER (dpcpp IntelDPCPP)
include (Platform/Windows-Clang)
endif()

cmake_minimum_required (VERSION 3.4)

project(DataBundle CXX)

set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR})
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR})
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR})

add_subdirectory (src)
@@ -0,0 +1,23 @@
Copyright Intel Corporation

SPDX-License-Identifier: MIT
https://opensource.org/licenses/MIT

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

@@ -0,0 +1,327 @@
# Data Transfers Using Pipes
This FPGA tutorial shows how to use pipes to transfer data between kernels.

***Documentation***: The [DPC++ FPGA Code Samples Guide](https://software.intel.com/content/www/us/en/develop/articles/explore-dpcpp-through-intel-fpga-code-samples.html) helps you to navigate the samples and build your knowledge of DPC++ for FPGA. <br>
The [oneAPI DPC++ FPGA Optimization Guide](https://software.intel.com/content/www/us/en/develop/documentation/oneapi-fpga-optimization-guide) is the reference manual for targeting FPGAs through DPC++. <br>
The [oneAPI Programming Guide](https://software.intel.com/en-us/oneapi-programming-guide) is a general resource for target-independent DPC++ programming.

| Optimized for | Description
--- |---
| OS | Linux* Ubuntu* 18.04/20.04, RHEL*/CentOS* 8, SUSE* 15; Windows* 10
| Hardware | Intel® Programmable Acceleration Card (PAC) with Intel Arria® 10 GX FPGA <br> Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX) <br> Intel® FPGA 3rd party / custom platforms with oneAPI support <br> *__Note__: Intel® FPGA PAC hardware is only compatible with Ubuntu 18.04*
| Software | Intel® oneAPI DPC++ Compiler <br> Intel® FPGA Add-On for oneAPI Base Toolkit
| What you will learn | The basics of the DPC++ pipes extension for FPGA<br> How to declare and use pipes in a DPC++ program
| Time to complete | 15 minutes



## Purpose
This tutorial demonstrates how a kernel in a DPC++ FPGA program transfers

Suggested change
This tutorial demonstrates how a kernel in a DPC++ FPGA program transfers
This tutorial demonstrates how a kernel in a DPC++ FPGA program streams

I would like to lean on the idea that pipes are streams, particularly with an eye to IP Authoring.

data to or from another kernel using the pipe abstraction.

### Definition of a Pipe
The primary goal of pipes is to allow concurrent execution of kernels that need
to exchange data.

A pipe is a FIFO data structure connecting two endpoints that communicate
using the pipe's `read` and `write` operations. An endpoint can be either a kernel
or an external I/O on the FPGA. Therefore, there are three types of pipes:
* kernel-kernel
* kernel-I/O
* I/O-kernel

This tutorial focuses on kernel-kernel pipes, but
the concepts discussed here apply to other kinds of pipes as well.

The `read` and `write` operations have two variants:
* Blocking variant: Blocking operations may not return immediately but are always successful.
* Non-blocking variant: Non-blocking operations take an extra boolean parameter
that is set to `true` if the operation happened successfully.

Data flows in a single direction inside pipes. In other words, for a pipe `P`
and two kernels using `P`, one of the kernels is exclusively going to perform
`write` to `P` while the other kernel is exclusively going to perform `read` from
`P`. Bidirectional communication can be achieved using two pipes.

Each pipe has a configurable `capacity` parameter describing the number of `write`
operations that may be performed without any `read` operations being performed. For example,
consider a pipe `P` with capacity 3, and two kernels `K1` and `K2` using
`P`. Assume that `K1` performed the following sequence of operations:

`write(1)`, `write(2)`, `write(3)`

The pipe is now full because three `write` operations (the `capacity` of
`P`) were performed without any `read` operation. A `read` must occur
before any further `write` is allowed.

If a `write` is attempted to a full pipe, one of two behaviors occurs:

* If the operation is non-blocking, it returns immediately, and its
boolean parameter is set to `false`. The `write` does not have any effect.
* If the operation is blocking, it does not return until a `read` is
performed by the other endpoint. Once the `read` is performed, the `write`
takes place.

The blocking and non-blocking `read` operations have analogous behaviors when
the pipe is empty.
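
The full and empty cases described above can be modeled on the host in plain C++. The following is a minimal sketch of the semantics only, not the DPC++ pipe API: `BoundedFifo`, `try_write`, `try_read`, and `accepted_writes_before_read` are hypothetical names introduced for illustration, and only the non-blocking variants are modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Host-side model of pipe semantics only; this is NOT the DPC++ pipe API.
// All names in this block are hypothetical.
template <typename T, std::size_t Capacity>
class BoundedFifo {
 public:
  // Non-blocking write: sets `success` to false and leaves the FIFO
  // unchanged when it already holds `Capacity` items.
  void try_write(const T &value, bool &success) {
    success = items_.size() < Capacity;
    if (success) items_.push_back(value);
  }

  // Non-blocking read: sets `success` to false and returns a
  // value-initialized T when the FIFO is empty.
  T try_read(bool &success) {
    success = !items_.empty();
    if (!success) return T{};
    T front = items_.front();
    items_.pop_front();
    return front;
  }

  std::size_t size() const { return items_.size(); }

 private:
  std::deque<T> items_;  // Oldest item at the front (first in, first out).
};

// Replays the capacity-3 scenario from the text: four writes are attempted
// before any read, and the number of accepted writes is returned.
inline int accepted_writes_before_read() {
  BoundedFifo<int, 3> p;
  int accepted = 0;
  for (int v = 1; v <= 4; ++v) {
    bool ok = false;
    p.try_write(v, ok);
    if (ok) ++accepted;
  }
  return accepted;
}
```

In this model, a blocking operation would correspond to retrying the non-blocking call until `success` becomes `true`.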

### Defining a Pipe in DPC++

In DPC++, pipes are defined as a class with static members. To declare a pipe that
transfers integer data and has `capacity=4`, use a type alias:

```c++
using ProducerToConsumerPipe = pipe<  // Defined in the DPC++ headers.
    class ProducerConsumerPipe,       // An identifier for the pipe.
    int,                              // The type of data in the pipe.
    4>;                               // The capacity of the pipe.
```

The `class ProducerConsumerPipe` template parameter is important to the
uniqueness of the pipe. This class need not be defined but must be distinct
for each pipe. Consider another type alias with the exact same parameters:

```c++
using ProducerToConsumerPipe2 = pipe<  // Defined in the DPC++ headers.
    class ProducerConsumerPipe,        // An identifier for the pipe.
    int,                               // The type of data in the pipe.
    4>;                                // The capacity of the pipe.
```

The uniqueness of a pipe is derived from a combination of all three template
parameters. Since `ProducerToConsumerPipe` and `ProducerToConsumerPipe2` have
the same template parameters, they define the same pipe.
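
This follows from ordinary C++ template rules and can be checked on the host without the DPC++ headers. The sketch below uses a stand-in template, `pipe_model`, which is a hypothetical name and not the real `pipe` class; `OtherPipeId` and `write_count` are likewise introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Stand-in for illustration only; the real `pipe` comes from the DPC++ headers.
template <typename Id, typename T, std::size_t Capacity>
struct pipe_model {
  // One counter exists per distinct template instantiation.
  static int &write_count() {
    static int count = 0;
    return count;
  }
  static void write(const T &) { ++write_count(); }
};

using PipeA = pipe_model<class ProducerConsumerPipe, int, 4>;
using PipeB = pipe_model<class ProducerConsumerPipe, int, 4>;  // Same parameters.
using PipeC = pipe_model<class OtherPipeId, int, 4>;           // Different identifier.

// Identical template arguments name the same type, and therefore the same
// static state; a different identifier class yields a separate type.
static_assert(std::is_same<PipeA, PipeB>::value, "same parameters, same type");
static_assert(!std::is_same<PipeA, PipeC>::value, "distinct identifier, distinct type");
```

Writing through `PipeA` and `PipeB` updates the same counter in this model, which is the reason each pipe that is meant to be separate needs its own identifier class.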

### Using a Pipe in DPC++

This code sample defines a `Consumer` and a `Producer` kernel connected
by the pipe `ProducerToConsumerPipe`. Kernels use the
`ProducerToConsumerPipe::write` and `ProducerToConsumerPipe::read` methods for
communication.

The `Producer` kernel reads integers from the global memory and writes those integers
into `ProducerToConsumerPipe`, as shown in the following code snippet:

```c++
void Producer(queue &q, buffer<int, 1> &input_buffer) {
  std::cout << "Enqueuing producer...\n";

  q.submit([&](handler &h) {
    accessor input_accessor(input_buffer, h, read_only);
    auto num_elements = input_buffer.get_count();

    h.single_task<ProducerTutorial>([=]() {
      // Stream every input element into the pipe (blocking write).
      for (size_t i = 0; i < num_elements; ++i) {
        ProducerToConsumerPipe::write(input_accessor[i]);
      }
    });
  });
}
```

The `Consumer` kernel reads integers from `ProducerToConsumerPipe`, processes
the integers (`ConsumerWork(i)`), and writes the result into the global memory.

```c++
void Consumer(queue &q, buffer<int, 1> &output_buffer) {
  std::cout << "Enqueuing consumer...\n";

  q.submit([&](handler &h) {
    accessor output_accessor(output_buffer, h, write_only, no_init);
    size_t num_elements = output_buffer.get_count();

    h.single_task<ConsumerTutorial>([=]() {
      // Read each element from the pipe (blocking read), process it,
      // and store the result in global memory.
      for (size_t i = 0; i < num_elements; ++i) {
        int input = ProducerToConsumerPipe::read();
        int answer = ConsumerWork(input);
        output_accessor[i] = answer;
      }
    });
  });
}
```

**NOTE:** The `read` and `write` operations used are blocking. If
`ConsumerWork` is an expensive operation, then `Producer` might fill
`ProducerToConsumerPipe` faster than `Consumer` can read from it, causing
`Producer` to block occasionally.
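
The effect described in this note can be illustrated with a small cycle-stepped host model. This is a sketch under simplifying assumptions, not a prediction of hardware timing: the producer attempts one write per cycle, the consumer completes one read every `consumer_period` cycles, and the function name and parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Counts the cycles in which the producer finds the pipe full.
// Hypothetical model: one write attempt per cycle, one read every
// `consumer_period` cycles (a larger period models a slower ConsumerWork).
inline std::size_t count_producer_stalls(std::size_t num_items,
                                         std::size_t capacity,
                                         std::size_t consumer_period) {
  assert(capacity > 0 && consumer_period > 0);
  std::size_t occupancy = 0;  // Items currently held in the pipe.
  std::size_t produced = 0, consumed = 0, stalls = 0;

  for (std::size_t cycle = 0; consumed < num_items; ++cycle) {
    // Consumer side: take one item when it is ready for the next one.
    if (cycle % consumer_period == 0 && occupancy > 0) {
      --occupancy;
      ++consumed;
    }
    // Producer side: write if there is room, otherwise stall this cycle.
    if (produced < num_items) {
      if (occupancy < capacity) {
        ++occupancy;
        ++produced;
      } else {
        ++stalls;
      }
    }
  }
  return stalls;
}
```

In this model the producer never stalls when the consumer keeps pace (`consumer_period` of 1), and stalls begin once a slower consumer lets the pipe fill to its capacity. A larger capacity absorbs a longer burst before the first stall.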

## Key Concepts
* The basics of the DPC++ pipes extension for FPGA
* How to declare and use pipes in a DPC++ program

## License
Code samples are licensed under the MIT license. See
[License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details.

Third party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt)

## Building the `pipes` Tutorial

### Include Files
The included header `dpc_common.hpp` is located at `%ONEAPI_ROOT%\dev-utilities\latest\include` on your development system.

### Running Samples in DevCloud
If running a sample in the Intel DevCloud, remember that you must specify the type of compute node and whether to run in batch or interactive mode. Compiling to FPGA is supported only on fpga_compile nodes. Executing programs on FPGA hardware is supported only on fpga_runtime nodes of the appropriate type, such as fpga_runtime:arria10 or fpga_runtime:stratix10. Neither compiling nor executing programs on FPGA hardware is supported on the login nodes. For more information, see the Intel® oneAPI Base Toolkit Get Started Guide ([https://devcloud.intel.com/oneapi/documentation/base-toolkit/](https://devcloud.intel.com/oneapi/documentation/base-toolkit/)).

When compiling for FPGA hardware, it is recommended to increase the job timeout to 12h.


### Using Visual Studio Code* (Optional)

You can use Visual Studio Code (VS Code) extensions to set your environment, create launch configurations,
and browse and download samples.

The basic steps to build and run a sample using VS Code include:
- Download a sample using the extension **Code Sample Browser for Intel oneAPI Toolkits**.
- Configure the oneAPI environment with the extension **Environment Configurator for Intel oneAPI Toolkits**.
- Open a Terminal in VS Code (**Terminal>New Terminal**).
- Run the sample in the VS Code terminal using the instructions below.

To learn more about the extensions and how to configure the oneAPI environment, see
[Using Visual Studio Code with Intel® oneAPI Toolkits](https://software.intel.com/content/www/us/en/develop/documentation/using-vs-code-with-intel-oneapi/top.html).

After learning how to use the extensions for Intel oneAPI Toolkits, return to this readme for instructions on how to build and run a sample.

### On a Linux* System

1. Generate the `Makefile` by running `cmake`.
```
mkdir build
cd build
```
To compile for the Intel® PAC with Intel Arria® 10 GX FPGA, run `cmake` using the command:
```
cmake ..
```
Alternatively, to compile for the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX), run `cmake` using the command:

```
cmake .. -DFPGA_BOARD=intel_s10sx_pac:pac_s10
```
You can also compile for a custom FPGA platform. Ensure that the board support package is installed on your system. Then run `cmake` using the command:
```
cmake .. -DFPGA_BOARD=<board-support-package>:<board-variant>
```

2. Compile the design through the generated `Makefile`. The following build targets are provided, matching the recommended development flow:

* Compile for emulation (fast compile time, targets emulated FPGA device):
```
make fpga_emu
```
* Generate the optimization report:
```
make report
```
* Compile for FPGA hardware (longer compile time, targets FPGA device):
```
make fpga
```
3. (Optional) As the above hardware compile may take several hours to complete, FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) can be downloaded <a href="https://iotdk.intel.com/fpga-precompiled-binaries/latest/pipes.fpga.tar.gz" download>here</a>.

### On a Windows* System

1. Generate the `Makefile` by running `cmake`.
```
mkdir build
cd build
```
To compile for the Intel® PAC with Intel Arria® 10 GX FPGA, run `cmake` using the command:
```
cmake -G "NMake Makefiles" ..
```
Alternatively, to compile for the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX), run `cmake` using the command:

```
cmake -G "NMake Makefiles" .. -DFPGA_BOARD=intel_s10sx_pac:pac_s10
```
You can also compile for a custom FPGA platform. Ensure that the board support package is installed on your system. Then run `cmake` using the command:
```
cmake -G "NMake Makefiles" .. -DFPGA_BOARD=<board-support-package>:<board-variant>
```

2. Compile the design through the generated `Makefile`. The following build targets are provided, matching the recommended development flow:

* Compile for emulation (fast compile time, targets emulated FPGA device):
```
nmake fpga_emu
```
* Generate the optimization report:
```
nmake report
```
* Compile for FPGA hardware (longer compile time, targets FPGA device):
```
nmake fpga
```

*Note:* The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support.<br>
*Note:* If you encounter any issues with long paths when compiling under Windows*, you may have to create your `build` directory in a shorter path, for example `c:\samples\build`. You can then run `cmake` from that directory, and provide `cmake` with the full path to your sample directory.

### In Third-Party Integrated Development Environments (IDEs)

You can compile and run this tutorial in the Eclipse* IDE (in Linux*) and the Visual Studio* IDE (in Windows*). For instructions, refer to the following link: [Intel® oneAPI DPC++ FPGA Workflows on Third-Party IDEs](https://software.intel.com/en-us/articles/intel-oneapi-dpcpp-fpga-workflow-on-ide)

## Examining the Reports
Locate `report.html` in the `pipes_report.prj/reports/` directory. Open the report in any of Chrome*, Firefox*, Edge*, or Internet Explorer*.

Navigate to the "System Viewer" to visualize the structure of the kernel system. Identify the pipe connecting the two kernels.

## Running the Sample

1. Run the sample on the FPGA emulator (the kernel executes on the CPU):
```
./pipes.fpga_emu (Linux)
pipes.fpga_emu.exe (Windows)
```
2. Run the sample on the FPGA device:
```
./pipes.fpga (Linux)
```

### Example of Output
You should see the following output in the console:

1. When running on the FPGA emulator
```
Input Array Size: 8192
Enqueuing producer...
Enqueuing consumer...

Profiling Info
Producer:
Start time: 0 ms
End time: +8.18174 ms
Kernel Duration: 8.18174 ms
Consumer:
Start time: +7.05307 ms
End time: +8.18231 ms
Kernel Duration: 1.12924 ms
Design Duration: 8.18231 ms
Design Throughput: 4.00474 MB/s

PASSED: The results are correct
```
NOTE: The FPGA emulator accurately represents neither the performance nor the kernels' relative timing (i.e., the start and end times).

2. When running on the FPGA device
```
Input Array Size: 1048576
Enqueuing producer...
Enqueuing consumer...

Profiling Info
Producer:
Start time: 0 ms
End time: +4.481 ms
Kernel Duration: 4.481 ms
Consumer:
Start time: +0.917 ms
End time: +4.484 ms
Kernel Duration: 3.568 ms
Design Duration: 4.484 ms
Design Throughput: 935.348 MB/s

PASSED: The results are correct
```
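
The reported throughput figures appear consistent with dividing the input size in bytes by the design duration. The sketch below reproduces that arithmetic under stated assumptions: 4-byte `int` elements and 1 MB = 10^6 bytes. This is inferred from the numbers above, not taken from the sample source, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Throughput in MB/s, assuming 1 MB = 1e6 bytes (an assumption inferred
// from the sample output, not confirmed by the sample source).
inline double throughput_mb_per_s(std::size_t num_elements,
                                  std::size_t bytes_per_element,
                                  double duration_ms) {
  const double bytes = static_cast<double>(num_elements) *
                       static_cast<double>(bytes_per_element);
  const double seconds = duration_ms / 1000.0;
  return bytes / seconds / 1.0e6;
}
```

With the emulator values (8192 elements, 8.18231 ms) this gives about 4.0047 MB/s, matching the output above. With the hardware values (1048576 elements, 4.484 ms) it gives about 935.4 MB/s, which agrees with the reported 935.348 MB/s to within the rounding of the displayed duration.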
@@ -0,0 +1,25 @@

Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio 15
VisualStudioVersion = 15.0.28307.705
MinimumVisualStudioVersion = 10.0.40219.1
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "data_bundle", "data_bundle.vcxproj", "{BE9E5E70-F644-4119-9A1F-E2B75C85B9E2}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|x64 = Debug|x64
Release|x64 = Release|x64
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{BE9E5E70-F644-4119-9A1F-E2B75C85B9E2}.Debug|x64.ActiveCfg = Debug|x64
{BE9E5E70-F644-4119-9A1F-E2B75C85B9E2}.Debug|x64.Build.0 = Debug|x64
{BE9E5E70-F644-4119-9A1F-E2B75C85B9E2}.Release|x64.ActiveCfg = Release|x64
{BE9E5E70-F644-4119-9A1F-E2B75C85B9E2}.Release|x64.Build.0 = Release|x64
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {47B77939-C7AE-44EC-AD38-EF8459A9C41A}
EndGlobalSection
EndGlobal