Introduction

I've been having some intrusive thoughts: while async embedded Rust is great, it could also be better and more transparent, and its best practices should be documented.

This book serves two main purposes:

  • To demystify some parts of the current embedded Rust ecosystem and provide example solutions to some pain points that exist today.
  • To serve as a notebook for my ideas. Note that these are just ideas, not a definitive source of truth. These ideas may be presented in a very raw form and important parts may be missing.

Embedded Rust is the result of the work of many extremely talented and hardworking people. I have the utmost respect for them and for what they have achieved. This book is not about complaining about the problems of the ecosystem, but rather about providing some of the missing pieces.

My intrusive thoughts revolve around the following ideas (in no particular order):

  • Tooling improvements - making common tasks easy: measuring binary size and BSS use, inspecting crashes, inspecting logs.
  • Explanation of async on embedded by developing a simple async executor.
  • Exploration of intrusive linked lists as an alternative to static or fixed-size allocation.
  • Tracing for embedded async.
  • Standardization of reading and writing of firmware metadata.
  • Developing best practices for panic/hardfault handling and post-mortem debugging.
  • Developing a limited example RP2350 HAL with primitives for more low-level DMA and drivers (something like lilos's Notify).

Some of the aforementioned rough edges IMHO are:

  • It is unclear how to do some common things (e.g. static mut handling, especially in the context of 2024 edition changes).
  • Writing hardware independent/HAL independent drivers requires a lot of "infectious" generics.
  • HALs lock users into specific ways of using peripherals, because it is often impractical to implement all of a peripheral IP's features. As a result, doing highly specialized things is hard - an example is abusing the double-buffered DMA to support reading from the DCMI peripheral on STM32, to allow DMA reads consisting of more than 65535 transfers.
  • Debugging why things don't work (for example, before defmt output is even available) is not well documented.

liltcp

liltcp is a demo project concerned with developing a basic glue library for connecting smoltcp, a HAL, and an async runtime. The name is sort of a pun on both smoltcp and Cliff L. Biffle's lilos, because both of these are used as a basis for the glue. The goal of the project is to produce a working yet very basic alternative to embassy-net, thereby documenting how it works and how to use smoltcp's async capabilities. To avoid a dependency on embassy itself, stm32h7xx-hal is used as the HAL.

The demo project is developed for the STM32H743ZI Nucleo devkit, but it should work with any other H7 board, provided the pin mappings are corrected.

Getting started

Before diving into the networking code, let's first make an LED-blinking smoke test. This is just to make sure that the environment is set up correctly and nothing is broken (devkit, cables, etc.). The smoke test also makes sure that lilos works together with the HAL.

The code below implements such a smoke test.

#![no_main]
#![no_std]

use liltcp as _;

#[cortex_m_rt::entry]
fn main() -> ! {
    let mut cp = cortex_m::Peripherals::take().unwrap();
    let dp = stm32h7xx_hal::pac::Peripherals::take().unwrap();

    let ccdr = liltcp::initialize_clock(dp.PWR, dp.RCC, &dp.SYSCFG);

    let gpio = liltcp::init_gpio(
        dp.GPIOA,
        ccdr.peripheral.GPIOA,
        dp.GPIOB,
        ccdr.peripheral.GPIOB,
        dp.GPIOC,
        ccdr.peripheral.GPIOC,
        dp.GPIOE,
        ccdr.peripheral.GPIOE,
        dp.GPIOG,
        ccdr.peripheral.GPIOG,
    );

    lilos::time::initialize_sys_tick(&mut cp.SYST, ccdr.clocks.sysclk().to_Hz());
    lilos::exec::run_tasks(
        &mut [core::pin::pin!(liltcp::led_task(gpio.led))],
        lilos::exec::ALL_TASKS,
    )
}

First, it initializes the clock, then the GPIO. These are initialized with functions created to make code sharing easier, so they include more code than strictly necessary here. Next, we initialize the SysTick timer and spawn an LED-blinking task.

The LED blinking task itself is pretty bare:

pub async fn led_task(mut led: ErasedPin<Output>) -> Infallible {
    let mut gate = PeriodicGate::from(lilos::time::Millis(500));
    loop {
        led.toggle();
        gate.next_time().await;
    }
}

If everything went well, you should see a blinking LED (amber on the Nucleo devkit). We can now move on to initializing the Ethernet peripheral to do some basic link-state polling.

Initializing and polling the Ethernet peripheral

At this point, we know that the devkit is able to run our code, but it doesn't yet do anything network related, so let's change that.

First, we need to initialize the Ethernet peripheral driver from the HAL.

    let (_eth_dma, eth_mac) = ethernet::new(
        dp.ETHERNET_MAC,
        dp.ETHERNET_MTL,
        dp.ETHERNET_DMA,
        gpio.eth_pins,
        unsafe { liltcp::take_des_ring() },
        liltcp::MAC,
        ccdr.peripheral.ETH1MAC,
        &ccdr.clocks,
    );

    let mut lan8742a = ethernet::phy::LAN8742A::new(eth_mac.set_phy_addr(0));
    lan8742a.phy_reset();
    lan8742a.phy_init();

The initialization itself is pretty bare; the only remotely interesting part is the initialization of the PHY at address 0.

The Ethernet peripheral internally sets up DMA for receiving and transmitting data and lets the user know that something happened via an interrupt handler.

#[cortex_m_rt::interrupt]
fn ETH() {
    unsafe {
        ethernet::interrupt_handler();
    }
}

The interrupt must also be enabled in the NVIC, which is done using the following function, called just before lilos spawns the tasks.

pub unsafe fn enable_eth_interrupt(nvic: &mut pac::NVIC) {
    ethernet::enable_interrupt();
    nvic.set_priority(stm32h7xx_hal::stm32::Interrupt::ETH, NVIC_BASEPRI - 1);
    cortex_m::peripheral::NVIC::unmask(stm32h7xx_hal::stm32::Interrupt::ETH);
}

Once this is done, the peripheral is ready to send and receive data. That, however, is a topic for the next chapter. For now, we only want to check whether the link is up, which is done by polling the PHY. Let's add a new async task that will periodically poll the PHY and print the link state on change. To also see the link state on the devkit, let's turn an LED on when the link is up.

// Periodically poll if the link is up or down
async fn poll_link<MAC: StationManagement>(
    mut phy: LAN8742A<MAC>,
    mut link_led: ErasedPin<Output>,
) -> Infallible {
    let mut gate = PeriodicGate::from(Millis(1000));
    let mut eth_up = false;
    loop {
        gate.next_time().await;

        let eth_last = eth_up;
        eth_up = phy.poll_link();

        link_led.set_state(eth_up.into());

        if eth_up != eth_last {
            if eth_up {
                defmt::info!("UP");
            } else {
                defmt::info!("DOWN");
            }
        }
    }
}

The final thing left to do is to spawn the task and run the binary on our devkit.

    unsafe {
        liltcp::enable_eth_interrupt(&mut cp.NVIC);

        lilos::exec::run_tasks_with_preemption(
            &mut [
                core::pin::pin!(liltcp::led_task(gpio.led)),
                core::pin::pin!(poll_link(lan8742a, gpio.link_led)),
            ],
            lilos::exec::ALL_TASKS,
            Interrupts::Filtered(liltcp::NVIC_BASEPRI),
        );
    }

When you plug in an Ethernet cable, there should be a log visible in the terminal and an LED should light up.

We are now ready to move on to actually receiving and transmitting data via the Ethernet.

Polled TCP

When developing a classic embedded Rust application that uses smoltcp for networking (either with RTIC or with no executor at all), a common approach is to handle networking as part of the Ethernet interrupt. This has a few problems:

  • Dependencies of the interrupt handler have to be declared as global statics.
  • The IRQ must never block.
  • It is harder to add another source that forces the stack to be polled.
  • It is up to the developer to handle the state machine properly. (This will be solved in the next chapter with async.)

Let's try to solve the first two problems by adding a simple async task, which will periodically poll the smoltcp interface and handle a TCP client.

For reference, an RTIC example can be found here.

Configuring the IP address

At this point, we will be using the network layer, so the first thing we need to do is to configure an IP address for our smoltcp interface.

    let config = smoltcp::iface::Config::new(liltcp::MAC.into());
    let mut interface = Interface::new(config, &mut eth_dma, liltcp::smoltcp_lilos::smol_now());
    interface.update_ip_addrs(|addrs| {
        let _ = addrs.push(IpCidr::new(
            liltcp::IP_ADDR.into_address(),
            liltcp::PREFIX_LEN,
        ));
    });

    let mut storage = [SocketStorage::EMPTY; 1];
    let mut sockets = SocketSet::new(&mut storage[..]);

The IP address and PREFIX_LEN are defined in lib.rs as follows:

pub const IP_ADDR: Ipv4Address = Ipv4Address::new(10, 106, 0, 251);
pub const PREFIX_LEN: u8 = 24;

In theory, it should be possible to initialize the whole CIDR address in a single constant, but the patch has only landed recently and is not released yet.

Another thing included in the snippet is the allocation of a SocketStorage and a SocketSet, which is smoltcp's way of storing active sockets. In this case, we will add only one socket, so the storage array length is 1.

Network task

Now that the preparations are out of the way, we can define our net_task. This task will handle both polling the stack and handling TCP (even though the TCP handling will be simplified).

async fn net_task(
    mut interface: Interface,
    mut dev: ethernet::EthernetDMA<4, 4>,
    sockets: &mut SocketSet<'_>,
    mut phy: LAN8742A<impl StationManagement>,
    mut link_led: ErasedPin<Output>,
) -> Infallible {
    static mut RX: [u8; 1024] = [0u8; 1024];
    static mut TX: [u8; 1024] = [0u8; 1024];

    let rx_buffer = unsafe { RingBuffer::new(&mut RX[..]) };
    let tx_buffer = unsafe { RingBuffer::new(&mut TX[..]) };

    let client = smoltcp::socket::tcp::Socket::new(rx_buffer, tx_buffer);

    let handle = sockets.add(client);

    let mut eth_up = false;

    loop {
        'worker: {
            let eth_last = eth_up;
            eth_up = phy.poll_link();

            link_led.set_state(eth_up.into());

            if eth_up != eth_last {
                if eth_up {
                    defmt::info!("UP");
                } else {
                    defmt::info!("DOWN");
                }
            }
            if !eth_up {
                break 'worker;
            }

            let ready = interface.poll(liltcp::smoltcp_lilos::smol_now(), &mut dev, sockets);

            if !ready {
                break 'worker;
            }

            let socket = sockets.get_mut::<smoltcp::socket::tcp::Socket>(handle);
            if !socket.is_open() {
                defmt::info!("not open, issuing connect");
                defmt::unwrap!(socket.connect(
                    interface.context(),
                    liltcp::REMOTE_ENDPOINT,
                    liltcp::LOCAL_ENDPOINT,
                ));

                break 'worker;
            }

            let mut buffer = [0u8; 10];
            if socket.can_recv() {
                let len = defmt::unwrap!(socket.recv_slice(&mut buffer));
                defmt::info!("recvd: {} bytes {}", len, buffer[..len]);
            }
            if socket.can_send() {
                defmt::unwrap!(socket.send_slice(b"world"));
            }
        }

        // NOTE: Not performant, doesn't handle interrupt signal, cancel the wait on IRQ, etc.
        // NOTE: In async code, this will be replaced with a more elaborate calling of poll_at.
        lilos::time::sleep_for(lilos::time::Millis(1)).await;
    }
}

First, we define buffers that the TCP socket will use internally. These are defined as mutable statics, because they need to have the same lifetime as, or outlive, the 'a lifetime defined for the SocketSet. Next, we create a TCP socket and add it to our SocketSet. This call gives us a handle that can later be used to access the socket through the SocketSet.

Now the polling itself takes place. This is done in a loop with a labeled block called 'worker. First, we check that the link is up; if it is not, we just break out of the 'worker block. If the link is up, we poll the interface to check whether there are any new data to be processed by our socket. When there are, we can access our socket using the aforementioned handle and operate on it. In this case, we check whether it is open; if it is not, we attempt to connect to a remote endpoint and break out of the 'worker block to let the interface be polled again. On the next polls, if the socket is open, we attempt a read and subsequently a write.

Whether the 'worker block runs to completion or is exited early with break 'worker, the task then sleeps for a millisecond.

This implementation is not meant to showcase a complete TCP socket. Right now, there are many unhandled states and it is very likely that it will panic if you look at it wrong.

Another big problem here is performance: the polling loop runs with a fixed period of 1 ms.

Spawning the network task

Now we can simply spawn our task and let it do the polling and TCP handling.

        lilos::exec::run_tasks_with_preemption(
            &mut [
                core::pin::pin!(liltcp::led_task(gpio.led)),
                core::pin::pin!(net_task(
                    interface,
                    eth_dma,
                    &mut sockets,
                    lan8742a,
                    gpio.link_led
                )),
            ],
            lilos::exec::ALL_TASKS,
            Interrupts::Filtered(liltcp::NVIC_BASEPRI),
        );

Conclusions

This solution is probably good enough for simple tests, but apart from not being async, it has one big problem - the TCP handling will quickly become a hassle with every addition.

This is caused by these factors:

  • It is tightly coupled with smoltcp stack polls.
  • Adding more sockets will clutter the code even more.
  • Adding any kind of timeout would block the entire task, or you'd need to implement some sort of a state machine that will handle this - but this is what we want to use async for.

Let's now have a quick intermezzo about decoupling polling from socket handling: let's share the smoltcp stack across tasks.

Intermezzo - sharing smoltcp stack between tasks

Sharing data between tasks usually depends on the executor and the rest of the environment. For example, in embassy, sharing can be done with references with a static lifetime, since tasks are allocated in statics. In a std environment, you'd typically use something like an Arc.

In our environment (the lilos executor), tasks are allocated on the stack. This means that for sharing data, we don't need references with a static lifetime; a generic lifetime is enough. This is important, as we don't have to deal with either static muts or initialization of statics with local data.

A simple example of this can be seen in the following snippet.

fn main() -> ! {
    let shared_resource = 0;

    lilos::exec::run_tasks(
        &mut [
            pin!(task_a(&shared_resource)),
            pin!(task_b(&shared_resource)),
        ],
        lilos::exec::ALL_TASKS,
    )
}

async fn task_a(res: &i32) -> Infallible { .. }
async fn task_b(res: &i32) -> Infallible { .. }

Mutating the shared resources

This basically solves the problem of sharing data between tasks, but one problem still remains - how can we mutate the shared data? We can't have multiple mutable references at the same time, so we need to utilize some kind of interior mutability pattern. This is usually done with the Cell or RefCell types. Cell is not very useful for our use case, since it provides mutability by moving values in and out of it. RefCell is much more interesting, because it allows us to obtain mutable and immutable references to our data. Without going into much detail, RefCell basically enforces the borrow checker's rules at runtime instead of at compile time.

In embedded systems, there is one more thing we care about, and that is sharing data between our tasks and interrupt handlers. This is usually done with something along the lines of mutexes that protect data access using critical sections. This has been omitted here on purpose, since our system doesn't require it.
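
For completeness, a minimal sketch of that usual pattern, using cortex-m's critical-section Mutex together with a RefCell (again, not needed in this tutorial), could look like this:

use core::cell::RefCell;
use cortex_m::interrupt::Mutex;

// Shared between a task and an interrupt handler; every access happens
// inside a critical section, so the ISR can never observe a torn update.
static SHARED: Mutex<RefCell<u32>> = Mutex::new(RefCell::new(0));

fn bump_from_task() {
    cortex_m::interrupt::free(|cs| {
        *SHARED.borrow(cs).borrow_mut() += 1;
    });
}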

When we wrap our shared resource with RefCell, our example code will look like the following snippet.

fn main() -> ! {
    let shared_resource = RefCell::new(0);

    lilos::exec::run_tasks(
        &mut [
            pin!(task_a(&shared_resource)),
            pin!(task_b(&shared_resource)),
        ],
        lilos::exec::ALL_TASKS,
    )
}

async fn task_a(res: &RefCell<i32>) -> Infallible { .. }
async fn task_b(res: &RefCell<i32>) -> Infallible { .. }

Now, when we want to access some data in a task we can do:

async fn task_a(res: &RefCell<i32>) -> Infallible {
    loop {
        {
            let mut r = res.borrow_mut();
            *r += 1;
        }
        yield_cpu().await;
    }
}

Notice that the shared resource access is done in its own block. That is to ensure that r, which is actually a "smart" pointer to the underlying data, is dropped before we yield control to the executor. If it weren't dropped before the yield (or, in fact, any await point), the code could panic when another mutable borrow is taken from the RefCell.

Hiding the implementation details and providing a nice API

This code works quite well until there are more shared resources, or until the need arises to implement methods on the shared resource. Ideally, we'd like to wrap the shared state into a structure and not expose the implementation details of the shared reference and interior mutability.

The approach I have chosen for this is to create a wrapper around the shared reference. Until we add some more fields to the wrapper, it will be trivially copyable - meaning it can be passed into as many tasks as required - and with it, we can build a nice API that hides the aforementioned implementation detail. This pattern is widely used; embassy-net, which this tutorial is based on, also uses it.

Let's implement it:

We'll define our shared state as an InnerStack struct.

pub struct InnerStack {
  // stack fields
}

Now, let's create a wrapper struct that we'll implement our API on.

pub struct Stack<'a> {
  pub inner: &'a RefCell<InnerStack>,
}

We want to avoid handling the RefCell in every function call, so let's create an accessor function.

impl<'a> Stack<'a> {
    pub fn with<F, U>(&mut self, f: F) -> U
    where
        F: FnOnce(&mut InnerStack) -> U,
    {
        f(&mut self.inner.borrow_mut())
    }
}

Now, we can implement methods on the Stack that look like this:

impl<'a> Stack<'a> {
  pub fn poll(&mut self) -> bool {
    self.with(|stack| stack.poll())
  }
}

This is much more readable, hides the RefCell, and most importantly limits the scope of the RefCell borrows.

Sharing a smoltcp stack

This implementation works for the simpler cases, but there is a problem with smoltcp: for some calls, you need mutable references to two fields of the InnerStack at once - to the SocketSet and to the Interface.

This seems simple at first, but is a bit involved as it goes against the borrow checker's rules on mutable borrows. Trying it out is left as an exercise to the reader.

The solution to this is to use the RefMut::map_split function to effectively split one RefMut into two RefMuts.
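
Before applying it to the stack, here is a minimal, standalone sketch of RefMut::map_split on a hypothetical two-field struct:

use core::cell::{RefCell, RefMut};

struct Pair {
    a: i32,
    b: i32,
}

fn split_demo(cell: &RefCell<Pair>) {
    // Split one RefMut<Pair> into two disjoint RefMuts, so that both
    // fields can be mutably borrowed at the same time.
    let (mut a, mut b) = RefMut::map_split(cell.borrow_mut(), |p| (&mut p.a, &mut p.b));
    *a += 1;
    *b -= 1;
}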

Combining all the above together and modifying it to fit the needs of a smoltcp wrapper, we get the following code.

use core::cell::{RefCell, RefMut};

use smoltcp::iface::{Interface, SocketSet, SocketStorage};

pub struct InnerStack<'a> {
    sockets: SocketSet<'a>,
    interface: Interface,
}

impl<'a> InnerStack<'a> {
    pub fn new(storage: &'a mut [SocketStorage<'a>], interface: Interface) -> Self {
        Self {
            sockets: SocketSet::new(storage),
            interface,
        }
    }
}

#[derive(Clone, Copy)]
pub struct Stack<'a> {
    inner: &'a RefCell<InnerStack<'a>>,
}

impl<'a> Stack<'a> {
    pub fn new(inner: &'a RefCell<InnerStack<'a>>) -> Self {
        Self { inner }
    }

    pub fn with<F, U>(&mut self, f: F) -> U
    where
        F: FnOnce((&mut SocketSet<'a>, &mut Interface)) -> U,
    {
        let (mut interface, mut sockets) = RefMut::map_split(self.inner.borrow_mut(), |r| {
            (&mut r.interface, &mut r.sockets)
        });
        f((&mut sockets, &mut interface))
    }
}

Cleaning up the API

The code now implements everything we need from it, but it still has a problem: we are leaking the RefCell to the creator of the stack, which in turn requires us to make InnerStack public.

A possible solution to this is the following:

use core::{cell::RefCell, mem::MaybeUninit};

pub struct StackResources {
    inner: MaybeUninit<RefCell<InnerStack>>,
}

struct InnerStack {
    resource_a: i32,
}

struct Stack<'a> {
    inner: &'a RefCell<InnerStack>,
}

impl<'a> Stack<'a> {
    fn new(resources: &'a mut StackResources) -> Self {
        let inner = resources
            .inner
            .write(RefCell::new(InnerStack { resource_a: 42 }));
        Self { inner }
    }
}

This code is a heavily distilled version of how embassy-net does this. You can find the original solution here.

This approach will not be used in the remainder of the tutorial because I believe it complicates things and doesn't add much value to the goal of the tutorial, which is to write an async glue between smoltcp and any HAL.

With this out of the way, we can now finally go and implement an asynchronous TCP socket.

Fully asynchronous TCP client

In the previous chapter, we managed to share a wrapper around smoltcp between tasks. That means that we are now ready to separate polling the stack and handling sockets.

Polling the stack

Let's start by implementing the stack polling. There are two signals that should trigger polling:

  1. The Ethernet interrupt
  2. smoltcp's internal timers

There can be many more signals that could, in theory, improve performance - such as triggering a poll whenever a buffer is filled with data, or whenever a new buffer is read from, or written to, the peripheral's descriptor ring. However, adding these sources is out of scope for this tutorial. In the case of the descriptor ring buffers, it'd require hacking the HAL itself.

As for signaling from the Ethernet interrupt, we can use lilos's Notify synchronization primitive.

static IRQ_NOTIFY: lilos::exec::Notify = lilos::exec::Notify::new();

We must declare it statically, so that it can be accessed from the interrupt handler. Luckily, it has a const new() function, so nothing special needs to be done to initialize it.

Now, whenever the interrupt handler is called, we can notify that something happened.

#[cortex_m_rt::interrupt]
fn ETH() {
    unsafe {
        ethernet::interrupt_handler();
    }
    // NOTE: embassy_net wakes polling task any time RX or TX tokens are consumed, resulting in 3x
    // throughput
    IRQ_NOTIFY.notify();
}

We can wait for the signal in our polling task using the Notify::until_next method.

Now, let's go back to the polling signaled by smoltcp's internal timers. smoltcp's Interface contains a mechanism for letting the polling code know when, or after how much time, it should be polled next. To delay the polling, we can use the lilos::time::sleep_for async function. So we now have two futures that need to be combined, and whenever either of them completes, we can poll the interface. For this we can use the select(A, B) async function from embassy-futures, which does exactly what we need: it takes two futures and returns whenever one of them resolves.

The whole polling task is in the following snippet.

async fn net_task(
    mut stack: Stack<'_>,
    mut dev: ethernet::EthernetDMA<4, 4>,
    mut phy: LAN8742A<impl StationManagement>,
    mut link_led: ErasedPin<Output>,
) -> Infallible {
    let mut eth_up = false;

    loop {
        let poll_delay = stack.with(|(sockets, interface)| {
            interface
                .poll_delay(smol_now(), sockets)
                .unwrap_or(Duration::from_millis(1))
        });

        match embassy_futures::select::select(
            lilos::time::sleep_for(lilos::time::Millis(poll_delay.millis())),
            IRQ_NOTIFY.until_next(),
        )
        .await
        {
            select::Either::First(_) => {}
            select::Either::Second(_) => {}
        }

        let eth_last = eth_up;
        eth_up = phy.poll_link();

        link_led.set_state(eth_up.into());

        if eth_up != eth_last {
            if eth_up {
                defmt::info!("UP");
            } else {
                defmt::info!("DOWN");
            }
        }
        if !eth_up {
            continue;
        }

        stack.with(|(sockets, interface)| interface.poll(smol_now(), &mut dev, sockets));
    }
}

Apart from just polling, it also handles the link state.

Adding a TCP client socket

With polling out of the way, we can now focus on adding a task that will handle a TCP connection. What we want is to connect to a TCP server and loop back the data the server sends us. This time, let's take the top-down approach and write the body of the task first, without worrying about the implementation.

async fn tcp_client_task(stack: Stack<'_>) -> Infallible {
    static mut TX: [u8; 1024] = [0u8; 1024];
    static mut RX: [u8; 1024] = [0u8; 1024];

    let mut client = TcpClient::new(stack, unsafe { &mut RX[..] }, unsafe { &mut TX[..] });

    client
        .connect(liltcp::REMOTE_ENDPOINT, liltcp::LOCAL_ENDPOINT)
        .await
        .unwrap();

    defmt::info!("Connected.");

    // loopback
    loop {
        let mut buffer = [0u8; 5];
        let len = defmt::unwrap!(client.recv(&mut buffer).await);
        // Let's not care about the number of sent bytes,
        // with the current buffer settings, it should always write full buffer.
        defmt::unwrap!(client.send(&buffer[..len]).await);
    }
}

We can see that we first initialize the transmit and receive buffers. Then we create a new socket on our stack and pass it the buffers. The unsafe here is unavoidable without a lot of extra code, because static muts are inherently unsafe and taking references to them will not even be possible in the future.
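
For illustration, one way to get rid of the static muts would be a helper like the static_cell crate (not used in this tutorial; this is just a sketch of the idea):

use static_cell::StaticCell;

static RX: StaticCell<[u8; 1024]> = StaticCell::new();
static TX: StaticCell<[u8; 1024]> = StaticCell::new();

// init() hands out a &'static mut exactly once (and panics on a second call),
// so we get 'static buffers without any unsafe.
let rx_buf: &'static mut [u8; 1024] = RX.init([0u8; 1024]);
let tx_buf: &'static mut [u8; 1024] = TX.init([0u8; 1024]);

let mut client = TcpClient::new(stack, &mut rx_buf[..], &mut tx_buf[..]);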

Socket definition and initialization

Let's have a look at the socket definition and initialization.

pub struct TcpClient<'a> {
    pub stack: Stack<'a>,
    pub handle: SocketHandle,
}

Here, the TcpClient struct contains our Stack wrapper and a handle pointing into the Stack's SocketSet.

    pub fn new(mut stack: Stack<'a>, rx_buffer: &'a mut [u8], tx_buffer: &'a mut [u8]) -> Self {
        let rx_buffer = RingBuffer::new(rx_buffer);
        let tx_buffer = RingBuffer::new(tx_buffer);

        let socket = smoltcp::socket::tcp::Socket::new(rx_buffer, tx_buffer);
        let handle = stack.with(|(sockets, _interface)| sockets.add(socket));

        Self { stack, handle }
    }

What happens here is that the raw buffers are wrapped into smoltcp's ring buffers. Then a new socket is initialized with them and added to the Stack's SocketSet. The SocketSet::add call returns a SocketHandle, which we can later use to access the socket.

Accessing the socket

The TcpClient is basically a wrapper around the Stack plus a SocketHandle, together forming a "wrapper" around smoltcp::socket::tcp::Socket, which can be accessed indirectly through these two values.

That means that whenever we want to do something with the raw TCP socket, we need to obtain a reference to it via a handle.

To do this, we can utilize a similar pattern as in the previous chapter with the Stack.

    fn with<F, U>(&mut self, f: F) -> U
    where
        F: FnOnce(&mut tcp::Socket, &mut Context) -> U,
    {
        self.stack.with(|(sockets, interface)| {
            let socket = sockets.get_mut(self.handle);

            f(socket, interface.context())
        })
    }

This way, when doing anything with the socket, we don't need to write the boilerplate needed to access it via the Stack and SocketHandle combo.

Connecting

Let's now connect to the server. This will be the first async function utilizing smoltcp's async support.

    pub async fn connect(
        &mut self,
        remote_endpoint: impl Into<IpEndpoint>,
        local_endpoint: impl Into<IpListenEndpoint>,
    ) -> Result<(), ConnectError> {
        self.with(|socket, context| socket.connect(context, remote_endpoint, local_endpoint))?;

        poll_fn(|cx| {
            self.with(|socket, _context| {
                // shamelessly copied from embassy
                match socket.state() {
                    tcp::State::Closed | tcp::State::TimeWait => {
                        Poll::Ready(Err(ConnectError::InvalidState))
                    }
                    tcp::State::Listen => unreachable!(), // marks invalid state
                    tcp::State::SynSent | tcp::State::SynReceived => {
                        socket.register_send_waker(cx.waker());
                        socket.register_recv_waker(cx.waker());
                        Poll::Pending
                    }
                    _ => Poll::Ready(Ok(())),
                }
            })
        })
        .await
    }

Here, we first initiate the connection process and then create a future using poll_fn. poll_fn creates a future that, upon being polled, calls a closure returning core::task::Poll. The closure also has access to the future's Context, meaning that we can register its Waker with the socket.

That means that after the connection process is initiated, the closure is called once and then again whenever it is woken by smoltcp. In the body of the closure, the state of the socket is checked for possible failures or success. In case there is nothing to be done yet, it registers its waker with the socket (this is done every time, because some executors may change the waker over time).

This is the working principle of all the async smoltcp glue code.
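
To see this principle stripped of all the smoltcp specifics, here is a minimal, self-contained poll_fn future that simply completes the second time it is polled:

use core::future::poll_fn;
use core::task::Poll;

/// Completes the second time it is polled.
async fn yield_once() {
    let mut polled = false;
    poll_fn(|cx| {
        if polled {
            Poll::Ready(())
        } else {
            polled = true;
            // A real resource would store this waker and call wake() when
            // progress becomes possible; here we immediately ask to be
            // polled again.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    })
    .await;
}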

Sending data

Sending data utilizes the same working principle as connecting. When polled, it attempts to write as much data to the socket buffers as possible and postpones its execution if the buffers are full.

    pub async fn send(&mut self, buf: &[u8]) -> Result<usize, SendError> {
        poll_fn(|cx| {
            self.with(|socket, _context| match socket.send_slice(buf) {
                Ok(0) => {
                    socket.register_send_waker(cx.waker());
                    Poll::Pending
                }
                Ok(n) => Poll::Ready(Ok(n)),
                Err(e) => Poll::Ready(Err(e)),
            })
        })
        .await
    }

Receiving data

Receiving data is similar to sending data. When polled, it attempts to read some bytes, and when no data is available, it waits for the next poll.

    pub async fn recv(&mut self, buf: &mut [u8]) -> Result<usize, RecvError> {
        poll_fn(|cx| {
            self.with(|socket, _context| match socket.recv_slice(buf) {
                // return 0 doesn't mean EOF when buf is empty
                Ok(0) if buf.is_empty() => Poll::Ready(Ok(0)),
                Ok(0) => {
                    socket.register_recv_waker(cx.waker());
                    Poll::Pending
                }
                Ok(n) => Poll::Ready(Ok(n)),
                // EOF
                Err(RecvError::Finished) => Poll::Ready(Ok(0)),
                Err(RecvError::InvalidState) => Poll::Ready(Err(RecvError::InvalidState)),
            })
        })
        .await
    }

Conclusion

And that is all there is to it. We now have a working async networking stack with quite a nice API.

The TCP socket is by no means complete, but adding more functionality to it should not be much of a problem.
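
As an example of where this could go, an async flush method could follow the same poll_fn pattern as send and recv. The following is only a rough sketch (embassy-net's real flush also tracks the connection teardown states):

    /// Wait until all data queued in the TX buffer has been sent and acknowledged.
    /// NOTE: a simplified sketch - it does not handle connection teardown or errors.
    pub async fn flush(&mut self) {
        poll_fn(|cx| {
            self.with(|socket, _context| {
                if socket.send_queue() > 0 {
                    // Still data in flight; wake us up when the TX side makes progress.
                    socket.register_send_waker(cx.waker());
                    Poll::Pending
                } else {
                    Poll::Ready(())
                }
            })
        })
        .await
    }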

Conclusion

The goal of this tutorial was to explore the way to implement an asynchronous networking stack and to show how embassy-net works under the hood. Huge kudos to @dirbaio for all the work he did to make this possible.

The tutorial went from strictly blocking code all the way to a fully asynchronous TCP client socket. I did some measurements of its throughput: the maximum on the Nucleo devkit was around 8 Mbit/s, while embassy-net achieves 24 Mbit/s, likely because it polls each time a buffer is dispatched through the peripheral. Adding support for this would require significant changes to the stm32h7xx-hal crate.

The whole source code for this tutorial is available in the intrusive-thoughts repo. Don't hesitate to open issues or pull requests with improvements.

It should be possible to make these wrappers HAL agnostic and have an async stack that can be shared across many HALs, but that is out of scope for this tutorial.

Sharing resources in no-std environments

Work in Progress

This article describes ways to share some resources across multiple tasks.

fn main() {
    println!("hello")
}

// hides interior mutability implementation detail (creation of the inner state), infects code with
// references
// Can't be copy
// PROs
// - hides interior mutability primitive (is this desired though? - embassy-mutex flexibility)
// CONs
// - can't have &mut self receiver
mod reference_outside {
    use std::cell::RefCell;

    struct Inner {
        a: i32,
    }

    struct Outer(RefCell<Inner>);

    impl Outer {
        fn new() -> Self {
            Self(RefCell::new(Inner { a: 0 }))
        }

        fn describe(&self) {
            println!("a: {}", self.0.borrow().a)
        }

        // Can't pass &mut
        fn modify(&self, a: i32) {
            self.0.borrow_mut().a = a;
        }
    }

    fn main() {
        let outer = Outer::new();

        a(&outer);
        b(&outer);
    }

    fn a(outer: &Outer) {
        outer.describe();
    }

    fn b(outer: &Outer) {
        outer.modify(1);
    }
}

// shows interior mutability implementation detail (creation of the inner state), infects code with
// lifetimes
// Is meant to be copied
// PROs
// - can have &mut self receiver - API shows intent better
// CONs
// - internal implementation detail is shown
mod reference_inside {
    use std::cell::RefCell;

    struct Inner {
        a: i32,
    }

    #[derive(Clone, Copy)]
    struct Outer<'a>(&'a RefCell<Inner>);

    impl<'a> Outer<'a> {
        fn new(inner: &'a RefCell<Inner>) -> Self {
            Self(inner)
        }

        fn describe(&self) {
            println!("a: {}", self.0.borrow().a)
        }

        fn modify(&mut self, a: i32) {
            self.0.borrow_mut().a = a;
        }
    }

    fn a(outer: Outer) {
        outer.describe();
    }

    fn b(mut outer: Outer) {
        outer.modify(1);
    }

    fn main() {
        let inner = RefCell::new(Inner { a: 0 });
        let outer = Outer::new(&inner);

        a(outer);
        b(outer);
    }
}

// Hides interior mutability implementation detail (creation of the inner state), infects code with
// lifetimes
//
// Is meant to be copied
//
// Makes need to allocate resources still visible
//
// PROs
// - can have &mut self receiver - API shows intent better
// - internal implementation detail is hidden
// - handling of init with multiple resources is easier
// - still shows that there is some shared state
// CONs
// - Boilerplate, that should be removable with a macro
mod reference_inside_hide_state {
    use std::{cell::RefCell, mem::MaybeUninit};

    struct Inner {
        a: i32,
    }

    struct OuterAllocations {
        inner: MaybeUninit<RefCell<Inner>>,
    }

    impl Default for OuterAllocations {
        fn default() -> Self {
            OuterAllocations {
                inner: MaybeUninit::uninit(),
            }
        }
    }

    #[derive(Clone, Copy)]
    struct Outer<'a> {
        inner: &'a RefCell<Inner>,
    }

    impl<'a> Outer<'a> {
        // &'a mut here makes sure that allocations is not used multiple times
        fn new(allocations: &'a mut OuterAllocations) -> Self {
            let inner = &*allocations.inner.write(RefCell::new(Inner { a: 0 }));

            Self { inner }
        }

        fn describe(&self) {
            println!("a: {}", self.inner.borrow().a)
        }

        fn modify(&mut self, a: i32) {
            self.inner.borrow_mut().a = a;
        }
    }

    fn a(outer: Outer) {
        outer.describe();
    }

    fn b(mut outer: Outer) {
        outer.modify(1);
    }

    fn main() {
        let mut allocations = OuterAllocations::default();
        let outer = Outer::new(&mut allocations);

        a(outer);
        b(outer);
    }
}