2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -1,6 +1,6 @@
cmake_minimum_required(VERSION 3.14)

project(embedded_function VERSION 2.1.5 LANGUAGES CXX)
project(embedded_function VERSION 2.1.6 LANGUAGES CXX)
add_library(embedded_function INTERFACE)
add_library(embedded_function::functions ALIAS embedded_function)
set_target_properties(embedded_function PROPERTIES EXPORT_NAME functions)
168 changes: 77 additions & 91 deletions README.md
@@ -1,7 +1,7 @@
# Embedded Function

<p align="center">
<img src="https://img.shields.io/badge/Version-2.1.5-yellow?style=for-the-badge&logo=github" alt="Version - 2.1.5">
<img src="https://img.shields.io/badge/Version-2.1.6-yellow?style=for-the-badge&logo=github" alt="Version - 2.1.6">
<img src="https://img.shields.io/badge/License-MIT-orange?style=for-the-badge" alt="License - MIT">
<img src="https://img.shields.io/badge/C++-11/14/17/20/23-blue?style=for-the-badge&logo=c%2B%2B" alt="C++ - 11/14/17/20/23">
</p>
@@ -21,9 +21,11 @@

## 📌 Overview

*Embedded Function* is a **lightweight** and **no-heap-allocation** function wrapper collection implemented based on the C++11 standard, tailored specifically for embedded systems.
*Embedded Function* is a **lightweight**, **no-heap-allocation** collection of function wrappers written in standard C++11 and optimized ([see below](#-performance-optimization)) for resource-constrained or high-performance environments.

In only **one** [header file](./include/embed/embed_function.hpp), **4** function wrappers are provided as follows:
The library is [freestanding](https://en.cppreference.com/w/cpp/freestanding), which makes it suitable for embedded development and operating-system kernels.

In a [single header file](./include/embed/embed_function.hpp), **four** function wrappers are provided as follows:

```cpp
namespace ebd {
Expand Down Expand Up @@ -76,18 +78,28 @@ auto main() -> int {

```cpp
/// A function wrapper is declared as follows:
ebd::fn<int (int, float, char) const, 3*sizeof(void*)> fn_;
// ^ ^ ^ ^ ^ ^
// | | | | | |
// Return type | | | | |
// Parameters ~|~~~~~|~~~~~| | |
// Qualifier ~~~~~~~~~~~~~~~~~~~~| |
// Buffer size ~~~~~~~~~~~~~~~~~~~~~~~~~~~|
FnWrapper <void(int, char) const, 3*sizeof(void*)> fn_ = +[](int, char) {};
// ^ ^ ^~~~~~~ ^ ^~~~~~ ^~~~~~~~~~~~~
// | | | | | |
// Function wrapper | | | | |
// Return type ~~~~~| | | | |
// Parameters ~~~~~~~~~~| | | |
// Qualifier ~~~~~~~~~~~~~~~~~~~~~~~~| | |
// Buffer size ~~~~~~~~~~~~~~~~~~~~~~~~~~~~| |
// Callable object ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
```

> The *`Qualifier`* is used to restrict the callable objects wrapped within `ebd::fn`, rather than `ebd::fn` itself. In other words, the `operator()` of the `ebd::fn` object will be qualified with the `Qualifier` modifier.
- *`Function wrapper`*: One of `ebd::fn`, `ebd::unique_fn`, `ebd::safe_fn` and `ebd::fn_ref`.

- *`Return type`*: A type to which the return type of the *`Callable object`* is implicitly convertible.

- *`Parameters`*: Types that are implicitly convertible to the parameter types of the *`Callable object`*.

> The *`Buffer size`* is the size used to store the callable object, which can be omitted. If omitted, this parameter will be set to `detail::default_buffer_size::value` by default, which is sufficient to store most common callable objects, including function pointers, simple non-capturing and capturing lambdas, and lightweight custom classes.
- *`Qualifier`*: Applies to the wrapper's `operator()` (e.g., `const`, `noexcept`, `&`, `&&`), restricting which callable objects can be stored.

- *`Buffer size`*: Size (in bytes) of the internal storage. Defaults to `DefaultSize`. If the callable object does not fit, a `static_assert` fails at compile time; the wrapper never falls back to heap allocation.

- *`Callable object`*: Any entity callable with the target signature (function pointer, lambda, function object, `std::reference_wrapper`). Copied or moved into the buffer depending on wrapper type.
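
To make the roles of *`Qualifier`* and *`Buffer size`* concrete, here is a minimal sketch of a fixed-buffer wrapper. It is not the library's implementation: `tiny_fn`, its default buffer size, and its restriction to trivially copyable callables are assumptions made only for illustration.

```cpp
#include <cstddef>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical illustration: `tiny_fn` is NOT part of Embedded Function.
template <typename Signature, std::size_t BufferSize = 3 * sizeof(void*)>
class tiny_fn;

// Specialization for a `const`-qualified signature such as `int(int, int) const`.
template <typename R, typename... Args, std::size_t BufferSize>
class tiny_fn<R(Args...) const, BufferSize> {
public:
    template <typename F>
    tiny_fn(F f) : invoke_(&call<F>) {
        // A callable that does not fit is rejected at compile time, never heap-allocated.
        static_assert(sizeof(F) <= BufferSize, "callable does not fit in the buffer");
        static_assert(alignof(F) <= alignof(void*), "callable is over-aligned for the buffer");
        // Simplification of this sketch: no destructor or copy bookkeeping is needed.
        static_assert(std::is_trivially_copyable<F>::value,
                      "this sketch only stores trivially copyable callables");
        ::new (static_cast<void*>(buffer_)) F(std::move(f));
    }

    // The qualifier lands on the wrapper's own operator().
    R operator()(Args... args) const {
        return invoke_(buffer_, std::forward<Args>(args)...);
    }

private:
    // Type-erased entry point: restores the stored type and calls it as const.
    template <typename F>
    static R call(const void* p, Args... args) {
        return (*static_cast<const F*>(p))(std::forward<Args>(args)...);
    }

    alignas(void*) unsigned char buffer_[BufferSize];
    R (*invoke_)(const void*, Args...);
};

// Usage: a capturing lambda stored inline in the default-sized buffer.
inline int tiny_fn_example() {
    int offset = 10;
    tiny_fn<int(int, int) const> add = [offset](int a, int b) { return a + b + offset; };
    return add(1, 2);
}
```

In this sketch, a lambda whose captures exceed `BufferSize` stops compilation at the first `static_assert`, which is the behaviour the *`Buffer size`* item describes.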

## 🧠 Design goals driving the design

Expand Down Expand Up @@ -138,25 +150,19 @@ ebd::fn<int (int, float, char) const, 3*sizeof(void*)> fn_;

4. **Triviality**: `fn_ref` is trivially copyable (same as `std::function_ref`).

## 🚀 Performance optimization

### Branch elimination

`ebd::fn` / `ebd::unique_fn` / `ebd::safe_fn` / `ebd::fn_ref` completely eliminate runtime checks for empty function states during invocation, significantly boosting performance of frequent function calls.

### Smart forwarding

`ebd::fn` / `ebd::unique_fn` / `ebd::safe_fn` / `ebd::fn_ref` enable scalar arguments and small-sized trivial arguments to be passed via registers instead of having to be passed via the stack as in `std::function`. This significantly reduces the memory access overhead during parameter passing.

### Zero-stack overhead

`ebd::fn_ref` occupies no stack space when used as a function parameter; it is passed entirely in registers. This allows the compiler to directly tail-call the wrapped target, removing the cost of an extra stack frame. See [x86_64-asm](./docs/perf/x86_64_gcc_fn_ref_zero_stack.md).

### Stateless elimination
### Convertibility

`ebd::fn` / `ebd::unique_fn` / `ebd::safe_fn` / `ebd::fn_ref` do not store the functor or its pointer if the functor is stateless (e.g., empty classes with trivial operations). This reduces memory access operations and improves cache efficiency.
- `Yes-D`: Convertible, direct wrapping (requires `To.BufferSize` >= `From.BufferSize`).
- `Yes-I`: Convertible, indirect wrapping (requires `To.BufferSize` >= `sizeof(From)`).
- `Yes-R`: Convertible, non-owning wrapping.
- `No`: Not convertible.

> Click [x64-asm](./docs/perf/x86_64_msvc_asm_analysis.md), [rv32-asm](./docs/perf/riscv_gcc_asm_analysis.md) and [arm32-asm](./docs/perf/arm_gcc_asm_analysis.md) to see more details.
| From \ To | `ebd::fn` | `ebd::unique_fn` | `ebd::safe_fn` | `ebd::fn_ref` |
| :---: | :---: | :---: | :---: | :---: |
| `ebd::fn` | Yes-D | Yes-D | No | Yes-R |
| `ebd::unique_fn` | No | Yes-D | No | Yes-R |
| `ebd::safe_fn` | Yes-I | Yes-I | Yes-D | Yes-R |
| `ebd::fn_ref` | Yes-I | Yes-I | Yes-I | Yes-D |
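
The `Yes-R` column can be pictured with a small non-owning reference wrapper. This is a sketch under assumptions, not the code of `ebd::fn_ref`: the names `ref_sketch` and `accumulator` are hypothetical, and the sketch binds only to callable objects (not to plain functions).

```cpp
#include <type_traits>
#include <utility>

// Hypothetical illustration: `ref_sketch` is NOT the library's `ebd::fn_ref`.
template <typename Signature>
class ref_sketch;

template <typename R, typename... Args>
class ref_sketch<R(Args...)> {
public:
    // Binds to an existing callable object without copying it.
    // The enable_if keeps this template from hijacking copy construction.
    template <typename F,
              typename = typename std::enable_if<
                  !std::is_same<typename std::remove_cv<F>::type, ref_sketch>::value>::type>
    ref_sketch(F& target)
        : object_(const_cast<void*>(static_cast<const void*>(&target))),
          invoke_(&call<F>) {}

    R operator()(Args... args) const {
        return invoke_(object_, std::forward<Args>(args)...);
    }

private:
    // Restores the original (possibly const) type before calling.
    template <typename F>
    static R call(void* object, Args... args) {
        return (*static_cast<F*>(object))(std::forward<Args>(args)...);
    }

    void* object_;                 // points at the caller's object, never owns it
    R (*invoke_)(void*, Args...);  // knows how to call through object_
};

// Two plain pointers, so copying the reference is a trivial copy.
static_assert(std::is_trivially_copyable<ref_sketch<int(int)>>::value,
              "ref_sketch should be trivially copyable");

// A stateful callable used to show that no copy of the target is made.
struct accumulator {
    int total;
    int operator()(int x) { total += x; return total; }
};

inline int ref_sketch_example() {
    accumulator acc{0};
    ref_sketch<int(int)> first = acc;     // refers to acc
    first(5);
    ref_sketch<int(int)> second = first;  // copies the reference, still refers to acc
    second(7);
    return acc.total;
}
```

Because nothing is copied into a buffer, the conversion imposes no size requirement, but the referenced object must outlive every reference to it.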

## 🧩 Automatic deduction

Expand Down Expand Up @@ -193,11 +199,11 @@ auto f = ebd::make_fn<Signature>(Ambiguous_Callable_Object);

```cpp
// Create specified function wrapper and automatically deduce the template arguments.
// The Callable_Object should be unambiguously callable (non-overload).
auto f = ebd::make_fn<ebd::fn>(Callable_Object);
auto f = ebd::make_fn<ebd::unique_fn>(Callable_Object);
auto f = ebd::make_fn<ebd::safe_fn>(Callable_Object);
auto f = ebd::make_fn<ebd::fn_ref>(Callable_Object);
// If `Signature` is omitted, the Callable_Object must be unambiguously callable (not overloaded).
auto f = ebd::make_fn<ebd::fn[, Signature]>(Callable_Object);
auto f = ebd::make_fn<ebd::unique_fn[, Signature]>(Callable_Object);
auto f = ebd::make_fn<ebd::safe_fn[, Signature]>(Callable_Object);
auto f = ebd::make_fn<ebd::fn_ref[, Signature]>(Callable_Object);
```

```cpp
Expand Down Expand Up @@ -291,73 +297,53 @@ Every compiler with modern C++11 support should work.

Go to the `<root>/test/` directory, and follow the instructions in [`HOW-TO-TEST.md`](./test/HOW-TO-TEST.md) to run the tests.

## ⏱️ Benchmark
## 🚀 Performance optimization

Go to the `<root>/benchmark/` directory, and follow the instructions in [`HOW-TO-BENCHMARK.md`](./benchmark/HOW-TO-BENCHMARK.md) to run the tests.
### Branch elimination

> *( Compiler: `MSVC` Standard: `C++14` Config: `Release` Tool: [picobench](https://github.com/iboB/picobench) )*
`ebd::fn` / `ebd::unique_fn` / `ebd::safe_fn` / `ebd::fn_ref` perform no runtime check for an empty wrapper during invocation, which removes a conditional branch from every call.
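
One common way to avoid an emptiness check is to make sure the stored call target is never null. The sketch below assumes that technique; this README does not state how the library handles an empty wrapper, and `no_branch_fn`, `empty_stub`, and the choice of `std::abort` are hypothetical.

```cpp
#include <cstdlib>
#include <utility>

// Hypothetical illustration: NOT the library's implementation.
template <typename Signature>
class no_branch_fn;

template <typename R, typename... Args>
class no_branch_fn<R(Args...)> {
public:
    using pointer = R (*)(Args...);

    // An empty wrapper points at a stub instead of holding a null pointer.
    no_branch_fn() : target_(&empty_stub) {}

    // The only null check happens once, at construction time.
    no_branch_fn(pointer f) : target_(f ? f : &empty_stub) {}

    // No `if (target_ == nullptr)` here: the call is a single indirect jump.
    R operator()(Args... args) const {
        return target_(std::forward<Args>(args)...);
    }

private:
    // Assumption of this sketch: calling an empty wrapper terminates the program.
    [[noreturn]] static R empty_stub(Args...) { std::abort(); }

    pointer target_;
};

// A plain function used as the call target in the usage example.
inline int twice(int x) { return 2 * x; }

inline int no_branch_example() {
    no_branch_fn<int(int)> f(&twice);
    return f(21);
}
```

The cost of the emptiness check moves from every invocation to the single point where the wrapper is constructed.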

> **std**: `std::function`, **ebd**: `ebd::fn`, **fu2**: [`fu2::function`](https://github.com/Naios/function2)
### Smart forwarding

```md
## FreeFunction.ScalarParameters:
`ebd::fn` / `ebd::unique_fn` / `ebd::safe_fn` / `ebd::fn_ref` pass scalar arguments and small trivial arguments in registers rather than through the stack, as `std::function` does. This reduces memory accesses during parameter passing.
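
The idea can be sketched as a trait that picks, per parameter, between passing by value and forwarding by reference. The trait name `pass_t`, the helper `stub_signature`, and the size threshold of two pointers are assumptions for illustration; the library's actual rule is not given in this README.

```cpp
#include <string>
#include <type_traits>

// Hypothetical trait: pass scalars and small trivially copyable types by value
// (so the ABI can place them in registers); forward everything else by reference.
template <typename T>
using pass_t = typename std::conditional<
    std::is_scalar<T>::value ||
        (std::is_trivially_copyable<T>::value && sizeof(T) <= 2 * sizeof(void*)),
    T, T&&>::type;

// Signature of a type-erased call stub built from the trait above.
template <typename R, typename... Args>
struct stub_signature {
    using type = R (*)(const void*, pass_t<Args>...);
};

struct point { float x; float y; };

static_assert(std::is_same<pass_t<int>, int>::value, "scalars go by value");
static_assert(std::is_same<pass_t<point>, point>::value, "small trivial types go by value");
static_assert(std::is_same<pass_t<std::string>, std::string&&>::value,
              "non-trivial types are forwarded by reference");
static_assert(std::is_same<stub_signature<void, int, std::string>::type,
                           void (*)(const void*, int, std::string&&)>::value,
              "the stub mixes by-value and by-reference parameters");
```

A stub whose parameters are all `Args&&` forces even an `int` to live in memory so that its address can be passed; the by-value branch of the trait avoids that.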

Name (* = baseline) | Dim | Total ms | ns/op |Baseline| Ops/second
--------------------------|--------:|----------:|--------:|-------:|----------:
free_scalar_std * | 10000 | 0.030 | 3 | - |332225913.6
free_scalar_ebd | 10000 | 0.028 | 2 | 0.930 |357142857.1
free_scalar_fu2 | 10000 | 0.052 | 5 | 1.731 |191938579.7
free_scalar_std * | 100000 | 0.301 | 3 | - |332667997.3
free_scalar_ebd | 100000 | 0.265 | 2 | 0.881 |377643504.5
free_scalar_fu2 | 100000 | 0.523 | 5 | 1.742 |191021967.5
free_scalar_std * | 1000000 | 3.006 | 3 | - |332712270.4
free_scalar_ebd | 1000000 | 2.708 | 2 | 0.901 |369317132.6
free_scalar_fu2 | 1000000 | 5.264 | 5 | 1.751 |189958778.9

## FreeFunction.TrivialParameters:
### Zero-stack overhead

Name (* = baseline) | Dim | Total ms | ns/op |Baseline| Ops/second
--------------------------|--------:|----------:|--------:|-------:|----------:
free_trivial_std * | 10000 | 0.032 | 3 | - |311526479.8
free_trivial_ebd | 10000 | 0.024 | 2 | 0.754 |413223140.5
free_trivial_fu2 | 10000 | 0.052 | 5 | 1.626 |191570881.2
free_trivial_std * | 100000 | 0.322 | 3 | - |310366232.2
free_trivial_ebd | 100000 | 0.240 | 2 | 0.746 |415800415.8
free_trivial_fu2 | 100000 | 0.510 | 5 | 1.583 |196001568.0
free_trivial_std * | 1000000 | 3.222 | 3 | - |310375865.2
free_trivial_ebd | 1000000 | 2.508 | 2 | 0.778 |398692289.3
free_trivial_fu2 | 1000000 | 5.792 | 5 | 1.798 |172660876.8

## FreeFunction.CopyHardParameters:
`ebd::fn_ref` occupies no stack space when used as a function parameter; it is passed entirely in registers. This allows the compiler to directly tail-call the wrapped target, removing the cost of an extra stack frame. See [x86_64-asm](./docs/perf/x86_64_gcc_fn_ref_zero_stack.md).

Name (* = baseline) | Dim | Total ms | ns/op |Baseline| Ops/second
--------------------------|--------:|----------:|--------:|-------:|----------:
free_copyhard_std * | 10000 | 0.197 | 19 | - | 50684237.2
free_copyhard_ebd | 10000 | 0.198 | 19 | 1.004 | 50505050.5
free_copyhard_fu2 | 10000 | 0.303 | 30 | 1.537 | 32981530.3
free_copyhard_std * | 100000 | 1.976 | 19 | - | 50604726.5
free_copyhard_ebd | 100000 | 1.982 | 19 | 1.003 | 50456632.5
free_copyhard_fu2 | 100000 | 3.044 | 30 | 1.541 | 32849352.9
free_copyhard_std * | 1000000 | 19.898 | 19 | - | 50256307.2
free_copyhard_ebd | 1000000 | 20.052 | 20 | 1.008 | 49870088.4
free_copyhard_fu2 | 1000000 | 31.358 | 31 | 1.576 | 31889890.6

## FreeFunction.CallTrivialParameters:
### Stateless elimination

`ebd::fn` / `ebd::unique_fn` / `ebd::safe_fn` / `ebd::fn_ref` do not store the functor or its pointer if the functor is stateless (e.g., empty classes with trivial operations). This reduces memory access operations and improves cache efficiency.
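
As a rough sketch of the idea, assuming the functor is an empty, trivially default-constructible class: the type-erased stub can recreate the functor on the spot, so the wrapper keeps only a function pointer. `stateless_sketch` and `doubler` are hypothetical names and this is not the library's code.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical illustration: NOT the library's implementation.
template <typename Signature>
class stateless_sketch;

template <typename R, typename... Args>
class stateless_sketch<R(Args...)> {
public:
    // The functor argument is used only for its type; nothing is stored.
    template <typename F>
    explicit stateless_sketch(F) : invoke_(&stub<F>) {
        static_assert(std::is_empty<F>::value, "functor must have no state");
        static_assert(std::is_trivially_default_constructible<F>::value,
                      "functor must be trivially default-constructible");
    }

    R operator()(Args... args) const {
        return invoke_(std::forward<Args>(args)...);
    }

private:
    // No object pointer parameter: the stub materializes the empty functor itself.
    template <typename F>
    static R stub(Args... args) {
        return F{}(std::forward<Args>(args)...);
    }

    R (*invoke_)(Args...);
};

// An empty functor with a trivial default constructor.
struct doubler {
    int operator()(int x) const { return 2 * x; }
};

// The wrapper is exactly one function pointer wide.
static_assert(sizeof(stateless_sketch<int(int)>) == sizeof(int (*)(int)),
              "no storage beyond the stub pointer");

inline int stateless_example() {
    stateless_sketch<int(int)> f{doubler{}};
    return f(21);
}
```

Since the stub never reads an object pointer, a call touches no wrapper storage other than the stub pointer itself.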

> Click [x64-asm](./docs/perf/x86_64_msvc_asm_analysis.md), [rv32-asm](./docs/perf/riscv_gcc_asm_analysis.md) and [arm32-asm](./docs/perf/arm_gcc_asm_analysis.md) to see more details.

## ⏱️ Benchmark

**In these benchmarks, Embedded Function is 5% to 30% faster than `std::function`.**

> *( `Compiler`: GCC-14 `Standard`: C++14 `Config`: -Os `Tool`: [picobench](https://github.com/iboB/picobench) `fu2`: [function2](https://github.com/Naios/function2) )*

### StdOperatorWrapper.FunctionWrapperAsParams:

Name (* = baseline) | Dim | Total ms | ns/op |Baseline| Ops/second
--------------------------|--------:|----------:|--------:|-------:|----------:
free_calltrivial_std * | 10000 | 0.032 | 3 | - |311526479.8
free_calltrivial_ebd | 10000 | 0.024 | 2 | 0.751 |414937759.3
free_calltrivial_fu2 | 10000 | 0.056 | 5 | 1.757 |177304964.5
free_calltrivial_std * | 100000 | 0.320 | 3 | - |312597686.8
free_calltrivial_ebd | 100000 | 0.257 | 2 | 0.802 |389711613.4
free_calltrivial_fu2 | 100000 | 0.584 | 5 | 1.827 |171115674.2
free_calltrivial_std * | 1000000 | 3.223 | 3 | - |310269934.8
free_calltrivial_ebd | 1000000 | 2.407 | 2 | 0.747 |415506710.4
free_calltrivial_fu2 | 1000000 | 5.934 | 5 | 1.841 |168517551.1
```

> See [here](https://github.com/Kim-J-Smith/Embedded-Function/actions/workflows/benchmark.yml) for more benchmark results.
`std::function` * | 10000 | 0.090 | 8 | - |111671952.5
`fu2::function` | 10000 | 0.176 | 17 | 1.968 | 56744349.7
**`ebd::fn`** | 10000 | 0.068 | 6 | 0.758 |147412179.2
`fu2::function_view` | 10000 | 0.034 | 3 | 0.379 |294602875.3
**`ebd::fn_ref`** | 10000 | 0.034 | 3 | 0.375 |297424305.5
`std::function` * | 100000 | 0.895 | 8 | - |111756442.5
`fu2::function` | 100000 | 1.765 | 17 | 1.973 | 56644386.5
**`ebd::fn`** | 100000 | 0.678 | 6 | 0.758 |147444347.1
`fu2::function_view` | 100000 | 0.340 | 3 | 0.380 |294061429.4
**`ebd::fn_ref`** | 100000 | 0.308 | 3 | 0.345 |324361494.4
`std::function` * | 1000000 | 9.952 | 9 | - |100481295.4
`fu2::function` | 1000000 | 17.733 | 17 | 1.782 | 56391833.9
**`ebd::fn`** | 1000000 | 6.832 | 6 | 0.686 |146378186.5
`fu2::function_view` | 1000000 | 3.420 | 3 | 0.344 |292392274.6
**`ebd::fn_ref`** | 1000000 | 3.249 | 3 | 0.326 |307826614.8

> See [here](https://github.com/Kim-J-Smith/Embedded-Function/actions/workflows/benchmark.yml) for more benchmark results. Follow [`HOW-TO-BENCHMARK.md`](./benchmark/HOW-TO-BENCHMARK.md) to run the benchmark on your platform.

## 🧭 Future learning & evolution reference
