How to Write Parallel Multitasking Applications for ESP32 using FreeRTOS & Arduino

Learn how to take advantage of the multitasking features of FreeRTOS for ESP32 dual-core SoC using your favorite Arduino IDE. Use our examples to learn about mutex, semaphore and critical section code.

ESP32 is a powerful and feature-rich SoC from Espressif. It is undoubtedly one of our favorites. Part of the reason why we like it so much is that it is a dual-core SoC with a fast operating clock and large memory capacity. Two cores mean there are two separate processors inside the SoC die. To take full advantage of this hardware, the usual sequential programming method is not enough. Sequential programming or execution is when one task is executed while all other tasks wait for the running task to finish. In contrast, if you think about your personal computer, you know that you can do multiple tasks at a time as if each task/application is running in parallel and independently of the others. While a single core can create that impression by switching between tasks fast enough, there are real advantages to making use of “true parallelism” with multiple cores. Fortunately, we can write true parallel applications for ESP32 within the Arduino environment, thanks to the excellent support from Espressif engineers. You don’t need to install anything to accomplish this. FreeRTOS is already present in the Arduino core for ESP32 and you have knowingly or unknowingly used it if you have ever programmed an ESP32 with Arduino. In this tutorial, we will learn how to properly write parallel tasks for ESP32 using FreeRTOS and the Arduino Development Framework (ADF).

If you are new to ESP32, we have a great getting-started tutorial where we use the DOIT ESP32 DevKit V1 board for the demo.

Getting Started with Espressif ESP32 Wi-Fi & Bluetooth SoC using DOIT-ESP32-DevKit-V1 Development Board

Learn how to use Espressif ESP32 SoC for Wi-Fi and Bluetooth development using DOIT ESP32 DevKit V1 development board. Use Arduino, ESP-IDF, PlatformIO and VS Code for software development.

Sequential vs Parallel Programming

This is a complex computer science topic and we won’t be covering the entirety of it here. Instead, we will cover just enough detail to understand the concept and goals of this tutorial. We will stick with familiar examples and use the Arduino environment as much as possible. We know that a basic Arduino sketch has at least two functions: setup() and loop(). These functions are two logical constructs that help us implement the logic of our application program. The setup() function runs only once after powering up and it initializes every pin and interface we need. The loop() function, on the other hand, contains a sequence of instructions that restarts after finishing each iteration/cycle. Let’s take the Blinky sketch as an example.

In the setup() function we set the LED pin to output. This has to be done only once as the effect of the operation remains the same until we change it further, and setting it again has no effect beyond the first one (such an operation is called an idempotent operation). But in order to blink an LED we need to turn it ON and OFF in a cycle with some finite delay between each operation. We know how to write that inside the loop() function. We write the statements to turn ON and OFF the LED with delay() calls in between.

All good. But what if we need to blink two or more LEDs at different rates? You must have wondered about that at least once. If you try to use the delay() function for each rate, it will mess up the timing logic, because every delay() call blocks the whole loop() and the delay of one LED adds to the timing of the other. So what’s the solution? One workaround is to use the millis() function. It returns a counter value that represents the number of milliseconds that have passed since startup. We can check the current value of millis() and decide when to turn multiple LEDs ON or OFF. But we have just complicated the program, right? What if we need to write many more tasks beyond just blinking LEDs? The program will get unmanageably complicated before you know it.
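To make the millis() workaround concrete, here is a minimal sketch of it. The Blinker struct, the pin numbers, and the intervals are our own illustrative assumptions and are not part of the Arduino API. The timing logic is plain C++, and the Arduino-specific part is guarded so that it is only compiled inside the Arduino environment.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the Arduino API): tracks the timing of one LED.
struct Blinker {
  uint32_t intervalMs;    // time between toggles
  uint32_t lastToggleMs;  // timestamp of the previous toggle
  bool isOn;              // current LED state

  // Call often with the current millis() value.
  // Returns true when the LED state changed on this call.
  bool update (uint32_t nowMs) {
    // Unsigned subtraction stays correct when the millis() counter wraps around.
    if (nowMs - lastToggleMs >= intervalMs) {
      lastToggleMs = nowMs;
      isOn = !isOn;
      return true;
    }
    return false;
  }
};

#if defined(ARDUINO)
#include <Arduino.h>

const uint8_t LED_A = 2;  // assumed pin numbers; adjust for your board
const uint8_t LED_B = 4;

Blinker blinkerA = {500, 0, false};  // toggles every 500 ms
Blinker blinkerB = {300, 0, false};  // toggles every 300 ms

void setup() {
  pinMode (LED_A, OUTPUT);
  pinMode (LED_B, OUTPUT);
}

void loop() {
  uint32_t now = millis();  // never blocks, unlike delay()
  if (blinkerA.update (now)) digitalWrite (LED_A, blinkerA.isOn ? HIGH : LOW);
  if (blinkerB.update (now)) digitalWrite (LED_B, blinkerB.isOn ? HIGH : LOW);
}
#endif
```

It works, but every new activity needs its own bookkeeping variables and its own check inside loop(), which is exactly the kind of complexity that grows out of hand.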

So what’s the best solution? One solution is a form of abstraction called multitasking, where multiple tasks run concurrently as if they are separate loop() functions. If you had two loop() functions, you could blink two LEDs at different rates without the overhead of using millis(), right? That’s what parallel/concurrent programming can do for us. It abstracts away the complicated scheduling details and lets us write application code without worrying about one task interfering with the timing of another.

Multitasking, Multiprocessing and Multithreading

Yes, these terms all look alike and can be confusing. Do they refer to the same thing? It depends on when and where you ask. Multitasking usually refers to a user running multiple applications (high-level user tasks) at the same time instead of opening and closing one application at a time. Applications may run on single or multiple cores. Multiprocessing usually refers to a system that uses more than one processor or core, so that several processes, each doing a different operation, can execute at the same time. Finally, multithreading refers to a single process running multiple paths of execution, each called a “thread”, which share the memory of the process and can be used to speed up the work.

In this tutorial, we will stick with the terms task and multitasking. Task refers to a single function doing some operation. Multitasking refers to running multiple tasks together either in a single core or multiple cores.

There are a few ways multitasking is achieved depending on the execution environment. If there is only one processor core, we can execute tasks in either a preemptive or a cooperative fashion, which are two of the many methods. In preemptive multitasking, the currently running task is stopped after a definite interval to allow the next task to run in an interleaved fashion. The number of tasks, their priorities, and the length of the time slice determine how seamlessly your tasks appear to run in parallel. If there are too many tasks to run, all of them slow down. This might be familiar to you from trying to run too many applications at once and everything becoming unresponsive.

On the other hand, cooperative multitasking allows each task to take its time to finish executing while all other tasks wait in a queue to be executed next. If the tasks do not take much time to finish, then everything runs smoothly. But if any task takes more than the expected time, all other tasks will be delayed and you will notice the delay in execution. As a computer programmer implementing logic, you don’t want your tasks to get delayed for any reason.
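The cooperative model is simple enough to sketch in a few lines of plain C++. The CooperativeScheduler class below is a hypothetical illustration of the idea and not how FreeRTOS works internally. Each task is a short function, and returning from the function is how a task hands control to the next one.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical cooperative scheduler: tasks run one after another and
// nothing interrupts a task until it returns on its own.
class CooperativeScheduler {
public:
  void addTask (std::function<void()> task) {
    tasks_.push_back (std::move (task));
  }

  // Run every task once, in the order they were added.
  void runOnce() {
    for (auto& task : tasks_) task();
  }

  // Run the whole task list for a fixed number of rounds.
  void run (std::size_t rounds) {
    for (std::size_t i = 0; i < rounds; ++i) runOnce();
  }

private:
  std::vector<std::function<void()>> tasks_;
};
```

If any one of the added functions takes a long time or never returns, runOnce() cannot move on to the next task. That is the weakness described above, and it is what a preemptive scheduler avoids by interrupting the running task.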

ESP32 SoC has two processor cores (three in fact, if you also count the ULP core). Most modern processors have multiple cores on a single die. If we have multiple cores, then we can tell the processor to execute our tasks in different cores thus achieving true parallelism. Even if one core gets hung, all other cores would be running just fine. This is called parallel execution. You can think of it as having multiple independent loop() functions. We will see how we can write parallel tasks in the next step.

Concurrent vs Parallel Execution

Concurrent execution refers to when multiple tasks make progress by sharing the time of a single CPU. The effect will be as if the tasks are running simultaneously. Parallel execution is when tasks use multiple processor cores to execute without sharing the CPU time with other tasks. This is what we refer to as “true parallelism”. Read more about it here. You can use a mix of both styles to improve the system efficiency and throughput.

Writing Concurrent Tasks

Adding a new task to run in parallel to other tasks is easy. Try uploading the following code to your ESP32 board. We are using the DOIT ESP32 DEVKIT V1.

// the setup function runs once when you press reset or power the board
void setup() {
  // initialize digital pin LED_BUILTIN as an output.
  pinMode (LED_BUILTIN, OUTPUT);

  Serial.begin (115200);

  xTaskCreatePinnedToCore (
    loop2,     // Function to implement the task
    "loop2",   // Name of the task
    1000,      // Stack size in bytes
    NULL,      // Task input parameter
    0,         // Priority of the task
    NULL,      // Task handle.
    0          // Core where the task should run
  );
}

// the loop function runs over and over again forever
void loop() {
  digitalWrite (LED_BUILTIN, HIGH);  // turn the LED on (HIGH is the voltage level)
  delay (1000);  // wait for a second
  digitalWrite (LED_BUILTIN, LOW); // turn the LED off by making the voltage LOW
  delay (1000);  // wait for a second
}

// the loop2 function also runs forever but as a parallel task
void loop2 (void* pvParameters) {
  while (1) {
    Serial.print ("Hello");
    delay (500); // wait for half a second
    Serial.println (" World");
    delay (500); // wait for half a second
  }
}

In the setup() function there are two familiar lines: one for setting the LED pin as OUTPUT and another for initializing the default USB serial port (Serial) with a baud rate of 115200. The next function call, xTaskCreatePinnedToCore(), allows us to define a new task. That function is part of the FreeRTOS port for ESP32 and allows the creation of tasks that are pinned to the processor core of our choice. We will explain more about it later in this post. The function xTaskCreatePinnedToCore() accepts the following parameters/arguments:

  1. loop2 – This is the name of the function you want to run as a parallel task. In our case we have named the function loop2.
  2. "loop2" – This is a user-readable name for the function. This can be any string you like. But keep it short and readable.
  3. 1000 – This is the stack size (memory size) for the task. The task will use this amount of memory when it needs to store temporary variables and results. On ESP32 the value is in bytes (unlike standard FreeRTOS, where it is in words), and 1000 bytes is enough for our simple task. Memory-intensive tasks will require more memory, so increase this value if a task crashes with a stack overflow.
  4. NULL – This is a pointer to the parameter that will be passed to the new task. We are not using it here and therefore it is set to NULL.
  5. 0 – This is the priority of the task. We are setting it to 0, the lowest priority. Higher numbers mean higher priority.
  6. NULL – This is a pointer to a task handle that will be filled in when the task is created. The handle can be used to refer to the task later, for example to suspend or delete it. We don’t need it for our example and therefore it can be NULL.
  7. 0 – This is the ID of the processor core we want our task to run at. ESP32 has two cores identified as 0 and 1. So we are pinning our task to core 0.

When you execute the program, you can see the LED blinking every second and at the same time, “Hello World” string is printed to the serial monitor. Now if you change the delay values of loop() and loop2() to any values you like, you can observe that changes made to one task have no effect on the other task. Essentially, we have achieved parallelism. This small example is enough for you to implement complex parallel applications. But there are a few things to keep in mind.

You can see that the function loop2() has a single argument, void* pvParameters, which we are not even using anywhere inside the function. That argument is required for any function that is going to be run as a task. The parameter is a void pointer named pvParameters. The name can be anything you like, but for the sake of clarity, keep it as is. The return type of a task function should always be void. Otherwise, it will generate a compilation error. If you are wondering why the default loop() function does not have any arguments, that is because loop() is not a task function itself. The Arduino core for ESP32 creates a task of its own and calls your loop() from inside it.

Another thing to note is the presence of the while loop inside loop2(). When we convert a function to a task, we are only asking the operating system to run it, not to run it repeatedly. That’s why we need an infinite loop inside the loop2() function. Without the while loop, the body of loop2() would be executed only once and the function would then return, which a FreeRTOS task function must never do. A task that has finished its work should delete itself by calling vTaskDelete (NULL) instead. The reason why the default loop() does not have such a loop inside is that the Arduino environment already calls it repeatedly.
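Here is a minimal sketch of how the pvParameters argument can carry data into a task, and how a task that runs only a limited number of times can end cleanly. The BlinkParams struct, the paramsFrom() helper, the blinkTask name, the 2048-byte stack, and the blink count are all illustrative assumptions of ours. Only xTaskCreatePinnedToCore() and vTaskDelete() are actual FreeRTOS calls.

```cpp
#include <cstdint>

// Hypothetical parameter block handed to the task through pvParameters.
struct BlinkParams {
  uint8_t pin;        // LED pin to blink
  uint32_t periodMs;  // ON time and OFF time in milliseconds
};

// Recover the typed parameters from the untyped pointer the task receives.
inline const BlinkParams& paramsFrom (void* pvParameters) {
  return *static_cast<const BlinkParams*> (pvParameters);
}

#if defined(ARDUINO)
#include <Arduino.h>

// The parameter block must outlive the task, so it is a global and not a local of setup().
static BlinkParams fastBlink = {LED_BUILTIN, 250};

// Blinks the LED ten times and then ends the task.
void blinkTask (void* pvParameters) {
  const BlinkParams& p = paramsFrom (pvParameters);
  pinMode (p.pin, OUTPUT);

  for (int i = 0; i < 10; i++) {
    digitalWrite (p.pin, HIGH);
    delay (p.periodMs);
    digitalWrite (p.pin, LOW);
    delay (p.periodMs);
  }

  vTaskDelete (NULL);  // delete this task instead of returning from the function
}

void setup() {
  xTaskCreatePinnedToCore (
    blinkTask,   // Function to implement the task
    "blink",     // Name of the task
    2048,        // Stack size in bytes (assumed value)
    &fastBlink,  // Task input parameter
    1,           // Priority of the task
    NULL,        // Task handle
    1            // Core where the task should run
  );
}

void loop() {
  delay (1000);  // nothing to do here in this sketch
}
#endif
```

Passing a pointer to a struct is a common way to give one task function several settings at once. The same function could be started twice with two different parameter blocks.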

Which core to use?

Since we have two cores, you might start thinking about which core to use for your tasks. Is one core better than the other? Well, as we said, each core is essentially a standalone processor. The cores execute instructions in parallel but share common resources such as memory and peripherals. By default, all your Arduino code runs on core 1 and the Wi-Fi and RF functions (these are usually hidden from the Arduino environment) use core 0. To know which core your current task is running on, you can use the function xPortGetCoreID(). Try running the following sketch.

// the setup function runs once when you press reset or power the board
void setup() {
  // initialize digital pin LED_BUILTIN as an output.
  pinMode (LED_BUILTIN, OUTPUT);

  Serial.begin (115200);

  xTaskCreatePinnedToCore (
    loop2,     // Function to implement the task
    "loop2",   // Name of the task
    1000,      // Stack size in bytes
    NULL,      // Task input parameter
    0,         // Priority of the task
    NULL,      // Task handle.
    0          // Core where the task should run
  );
}

// the loop function runs over and over again forever
void loop() {
  digitalWrite (LED_BUILTIN, HIGH);  // turn the LED on (HIGH is the voltage level)
  delay (1000);  // wait for a second
  digitalWrite (LED_BUILTIN, LOW); // turn the LED off by making the voltage LOW
  delay (1000);  // wait for a second
  Serial.print ("loop() running in core ");
  Serial.println (xPortGetCoreID());
}

// the loop2 function also runs forever but as a parallel task
void loop2 (void* pvParameters) {
  while (1) {
    Serial.print ("Hello");
    delay (500); // wait for half a second
    Serial.print (" World from loop2() at ");
    Serial.println (xPortGetCoreID());
    delay (500); // wait for half a second
  }
}

This will print the ID of the core each function is running on. Deciding which core to use depends on how much workload each core already handles. RF operations of the ESP32 SoC require time-sensitive, interrupt-based software which can be complex. Since core 0 is already used for those tasks, it is usually best to run your own tasks on core 1. But this is only applicable if you are using RF features such as Wi-Fi or BLE. If you don’t need any of those features, then choose whichever core you like.

You can try changing the core for loop2() to 1 in our example. That will make both loop() and loop2() run on the same core. In that case, our tasks become concurrent rather than parallel. But how does that work? To understand that, we need to learn a little about FreeRTOS.

Printing core ID with ESP32

FreeRTOS

FreeRTOS is a lightweight Real-Time Operating System (RTOS) designed for low-performance processors like microcontrollers. It is an open-source project widely adopted for commercial and industrial applications. An RTOS is a type of operating system with deterministic behavior. As an example of deterministic behavior, suppose a task takes 1 second to execute inside an RTOS. That task will take 1 second, within a known bound, every time you run it. That means its timing behavior is predictable and does not vary. In contrast, a General Purpose Operating System (GPOS) can adjust the timing of tasks dynamically to give you better throughput and response when you run multiple tasks in parallel. An example of a GPOS is Microsoft Windows. There are advantages and disadvantages to both RTOS and GPOS depending on the type of application. For low-power microcontrollers used mainly for control applications, we need real-time behavior and so an RTOS is usually the better choice.

FreeRTOS organization

The core part of an RTOS is a kernel (a supervising program with the highest system privileges). The kernel takes care of all the dirty work for us, including timing, queuing, task synchronization, priorities, interrupts, and more. FreeRTOS can work in either a single-core or a multi-core environment. In cases where we have multiple cores, we can execute tasks on different cores. But if there is only one core, FreeRTOS slices the processor time into quanta (a quantum is a minimum unit of time, usually a very small interval) and allows each task to consume the time slices in turn. That means we can run multiple concurrent RTOS tasks on the same core without any worries, just like how we run loop() and loop2() on the same core.

When you write an Arduino application for ESP32, FreeRTOS is used in the underlying layers to schedule your tasks without you even knowing. The default loop() function is run inside a FreeRTOS task called loopTask(). The function xTaskCreatePinnedToCore() is one of several functions for creating a task in FreeRTOS on ESP32. The more generic version of the function is called xTaskCreate(). This function does not pin the task to any core, so the scheduler is free to run the task on whichever core is available. You can read about all available FreeRTOS functions in the Espressif documentation.

Resource Sharing

Now is the best time to take a look at the internal block diagram of the ESP32 SoC. As you can see, there are two (one for some variants) Xtensa 32-bit LX6 microprocessor cores inside the SoC. But every other resource is shared. For example, both cores share the same ROM and SRAM space. This is also true for the peripherals. Either core can access any peripheral it wants to. But if that is the case, then wouldn’t there be conflicts if one core tries to do some operation with one peripheral and the other core tries to do something else with the same peripheral? Yes, such a situation can arise in a multi-core environment. For that reason, it is your responsibility as a programmer to write conflict-free code when you are developing applications that require parallel tasks running on multiple cores. Do not be deterred though. There are enough tools and techniques to help you write clean and efficient code for your multitasking applications.

ESP32 internal block diagram. Source: Espressif

One example of a resource-sharing conflict in a multi-core environment is that of a race condition that occurs during memory access. Suppose we have a memory location with an address 0xF00. We can write two parallel tasks; both of them will try to write a random number to that location and read it afterward. So what happens when one task is writing a value and at the same time the other task is also trying to do the same? Which task gets the chance to write to the memory location? The answer is undefined. That means, we can not predict which one of the tasks gets to write to the memory location because they both are racing to achieve the same.

Race conditions give rise to weird bugs that may not show up during normal testing. So whenever you need to share a variable, object, function, or peripheral between tasks, make sure that only one task accesses the resource at a time. The accessing task must also leave the resource in a determined state after the operation. But how do we do it? Read about it in the next section.
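A real race depends on timing and cannot be reproduced on demand, so the following sketch replays one unlucky interleaving by hand. It assumes two tasks that update a shared counter, one adding 1 and the other adding 1000. The function names are ours and purely illustrative. The key point is that counter = counter + n is not a single step. It is a read, an add, and a write, and another task can slip in between them.

```cpp
// Replays an interleaving where both tasks read the counter before either writes it back.
int lostUpdate() {
  int counter = 0;

  int seenByTask1 = counter;     // task 1 reads 0
  int seenByTask2 = counter;     // task 2 also reads 0, before task 1 has written
  counter = seenByTask1 + 1;     // task 1 writes 1
  counter = seenByTask2 + 1000;  // task 2 writes 1000 and overwrites the update of task 1

  return counter;                // 1000: the +1 from task 1 is lost
}

// The same two updates when each task finishes its whole update before the other starts.
int serializedUpdate() {
  int counter = 0;

  counter = counter + 1;         // task 1 reads, adds, and writes without interruption
  counter = counter + 1000;      // task 2 then does the same

  return counter;                // 1001: both updates are kept
}
```

Making sure that only one task accesses the resource at a time means forcing the second ordering, no matter how the tasks are actually scheduled.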

Task Synchronization

Whenever we need to share a resource or make two tasks/processes communicate with each other, we need a mechanism to prevent both tasks from accessing and modifying the resource at the same time. When a task accesses a resource we need to signal all other tasks that the resource is being used at the moment. This is called task synchronization. There are two common techniques to achieve this: Mutex and Semaphore. Mutex stands for “Mutual Exclusion”. A mutex is simply a shared variable that is always accessed through atomic operations. What is an atomic operation, you ask? It’s a way of executing a sequence of instructions without anything interrupting the operation until it finishes. A mutex variable can be in one of two states: locked or unlocked (0 or 1). You can think of it as a boolean variable. It is used to protect a common resource from simultaneous access by multiple tasks. A process/task can acquire a mutex lock by calling a function called acquire(). Once a lock is acquired the task can continue its operations on the object protected by the mutex without worrying about other tasks modifying the resource. If any other task tries to acquire the mutex when it is already locked, the requesting task has to wait. In the simplest implementation, the waiting task repeatedly tries to acquire the lock until it succeeds. This is called busy waiting, and a lock that works this way is called a spinlock. An RTOS such as FreeRTOS instead puts the waiting task into a blocked state so that other tasks can use the CPU until the mutex is released or a timeout expires.

Mutex functional illustration

The task that initially acquired the lock will release the lock when it finishes all of its operations. Releasing the mutex lock can be done by release(). Now, other tasks waiting to get a lock can acquire it this time. Both acquire() and release() are generic function names used for explaining the concept and the actual names can be platform-specific.
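To show what “a shared variable accessed through atomic operations” can look like, here is a minimal sketch of a lock built on a single atomic flag from the C++ standard library. The names tryAcquire() and release() are hypothetical and simply mirror the generic names used above. This is not the FreeRTOS mutex and it is only meant to illustrate the concept.

```cpp
#include <atomic>

// Tries to take the lock without waiting.
// test_and_set() atomically sets the flag and returns the value it had before,
// so the lock was free (and is now ours) only if the previous value was false.
bool tryAcquire (std::atomic_flag& lock) {
  return !lock.test_and_set (std::memory_order_acquire);
}

// Gives the lock back so that another caller can take it.
void release (std::atomic_flag& lock) {
  lock.clear (std::memory_order_release);
}
```

A caller that keeps calling tryAcquire() in a loop until it returns true is busy waiting. Because the flag is read and set in one indivisible step, two callers can never both see the lock as free at the same moment.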

The other type of task synchronization is called a Semaphore, which is a method of signaling. The difference between a semaphore and a mutex is that a semaphore acts as a signal, like an LED indication, while a mutex is a lock that is analogous to a padlock with a key. A semaphore is also a variable shared between the tasks. But it can be in more than two states, unlike a mutex. Semaphores come in two types: counting semaphores and binary semaphores. A counting semaphore is an integer variable whose value is initialized with the number of instances of a particular resource to be shared. For example, if you have 10 instances of a data structure, then the semaphore is initialized with the value 10. This means all 10 structs are free at the beginning. A task can then request one instance of that data and decrement the semaphore variable. The semaphore value will then become 9. When all instances are taken up by the tasks, the semaphore becomes 0. When a task has finished using a resource, it can release the resource and increment the semaphore by 1. If a task requests access when the semaphore is 0, it can either wait until at least one resource is free, or do some other operations without wasting time waiting.

Example of a semaphore. Multiple tasks trying to acquire three instances of a resource. Only the first three tasks get access and the other tasks are in a wait state.
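The counting behavior described above can be modeled in a few lines. The CountingSemaphoreModel class is a hypothetical, single-threaded model of the bookkeeping only. It is not thread-safe and it is not the FreeRTOS implementation. In FreeRTOS, a real counting semaphore with the same limits would be created with xSemaphoreCreateCounting (10, 10).

```cpp
// Single-threaded model of a counting semaphore (illustration only, not thread-safe).
class CountingSemaphoreModel {
public:
  // All instances of the resource are free at the beginning.
  explicit CountingSemaphoreModel (unsigned maxCount)
    : max_ (maxCount), count_ (maxCount) {}

  // Request one instance. Fails when no instance is free.
  bool tryTake() {
    if (count_ == 0) return false;
    --count_;
    return true;
  }

  // Return one instance. Fails if everything was already free.
  bool give() {
    if (count_ == max_) return false;
    ++count_;
    return true;
  }

  // Number of instances that are currently free.
  unsigned available() const { return count_; }

private:
  unsigned max_;
  unsigned count_;
};
```

With a maximum count of 10, the first tryTake() leaves 9 free instances, the eleventh tryTake() in a row fails, and a give() makes one instance available again. A binary semaphore is the same model with a maximum count of 1.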

A binary semaphore, as the name suggests, can only count to 1, which means it has only two states (0 and 1). The difference between a binary semaphore and a mutex is that the binary semaphore only acts as a signal of whether a resource is being used or not, while the mutex locks a resource against other tasks and is meant to be released by the same task that acquired it. Both have their own use cases. Let’s summarize the concepts of mutex and semaphore in case you feel confused.

  1. A mutex is a mutual exclusion lock that is used when multiple tasks compete for a single resource.
  2. A mutex can only be in two states; locked and unlocked.
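  3. A task requesting a locked mutex has to wait until it is available, either by “spinning” (busy waiting) or, in an RTOS such as FreeRTOS, by being blocked so that other tasks can run.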
  4. Semaphore is a signaling method. Tasks are free to do something else without wasting time by waiting.
  5. Semaphores can be in N number of states and the maximum value indicates the total number of resources available.
  6. Semaphores keep track of resource usage and are good for splittable or multi-instance data.
  7. Semaphore does not give exclusive access to a resource, but instead gives access to one of the many instances of a resource.
  8. A binary semaphore indicates the availability of only one instance of a resource.

Examples

Mutex

Let’s see how to write an Arduino sketch to implement a mutex for ESP32 using FreeRTOS.

//================================================================================//
/*
  ESP32 Mutex Example
  Read more at https://circuitstate.com/tutorials/how-to-write-parallel-multitasking-applications-for-esp32-with-freertos-arduino
*/
//================================================================================//

SemaphoreHandle_t xMutex = NULL;  // Handle for the mutex; the mutex itself is created in setup()

int counter = 0;  // A shared variable

//================================================================================//

// the setup function runs once when you press reset or power the board
void setup() {
  // initialize digital pin LED_BUILTIN as an output.
  pinMode (LED_BUILTIN, OUTPUT);

  Serial.begin (115200);

  xMutex = xSemaphoreCreateMutex();  // create the mutex

  xTaskCreatePinnedToCore (
    task1,     // Function to implement the task
    "task1",   // Name of the task
    1000,      // Stack size in bytes
    NULL,      // Task input parameter
    10,        // Priority of the task
    NULL,      // Task handle.
    0          // Core where the task should run
  );

  xTaskCreatePinnedToCore (
    task2,     // Function to implement the task
    "task2",   // Name of the task
    1000,      // Stack size in bytes
    NULL,      // Task input parameter
    10,        // Priority of the task
    NULL,      // Task handle.
    0          // Core where the task should run
  );
}

//================================================================================//

// the loop function runs over and over again forever
void loop() {
  digitalWrite (LED_BUILTIN, HIGH);  // turn the LED on (HIGH is the voltage level)
  delay (1000);  // wait for a second
  digitalWrite (LED_BUILTIN, LOW); // turn the LED off by making the voltage LOW
  delay (1000);  // wait for a second
}

//================================================================================//

// this task will periodically lock the mutex, increment the counter by 1 and unlock the mutex
void task1 (void *pvParameters) {
  while (1) {
    if (xSemaphoreTake (xMutex, portMAX_DELAY)) {  // take the mutex
      Serial.print ("Task 1: Mutex acquired at ");
      Serial.println (xTaskGetTickCount());
      counter = counter + 1;  // increment the counter
      Serial.print ("Task 1: Counter = ");
      Serial.println (counter);
      delay (1000);
      xSemaphoreGive (xMutex);  // release the mutex
      delay (100);
    }
  }
}

//================================================================================//

// this task will periodically lock the mutex, increment the counter by 1000 and unlock the mutex
void task2 (void *pvParameters) {
  while (1) {
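    // The timeout is given in ticks. pdMS_TO_TICKS() converts milliseconds to ticks;
    // 200 * portTICK_PERIOD_MS gives the same value only when the tick period is 1 ms.
    if (xSemaphoreTake (xMutex, pdMS_TO_TICKS (200))) {  // try to acquire the mutex for up to 200 ms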
      Serial.print ("Task 2: Mutex acquired at ");
      Serial.println (xTaskGetTickCount());
      counter = counter + 1000;
      Serial.print ("Task 2: Counter = ");
      Serial.println (counter);
      xSemaphoreGive (xMutex);  // release the mutex
      delay (100);
    }
    else {  // if the mutex was not acquired within 200ms
      Serial.print ("Task 2: Mutex not acquired at ");
      Serial.println (xTaskGetTickCount());
    }
  }
}

//================================================================================//

First, we declare a mutex handle xMutex of type SemaphoreHandle_t. But why does it say “semaphore handle” when we need a mutex? Well, in FreeRTOS both mutexes and semaphores are implemented on top of the same underlying routines. This is because of the similarities between the two concepts, and it gives the programmer a common abstraction. A semaphore handle can refer to any type of semaphore or mutex. At this point, xMutex is only a handle initialized with a NULL value. It is not usable yet. We create the actual mutex in the setup() function using xSemaphoreCreateMutex().

The counter is a global variable that will act as a common resource. We have two tasks: task1 and task2. Both these tasks can access the counter variable. However, since it is a shared resource and our tasks run concurrently, we need the mutex mechanism to prevent conflicts such as race conditions.

In the setup() function we also create our tasks by pinning them to core 0. The default loop() will continue to blink our LED as if nothing is happening. Our first task, task1, tries to get a lock on our mutex xMutex with the xSemaphoreTake (xMutex, portMAX_DELAY) function call. We need to pass two parameters here: the first is our mutex handle and the second is a timeout value. We are using the portMAX_DELAY macro, which corresponds to an indefinite wait. That means task1 will wait for the lock for as long as it takes. When it gets the lock, it will print some information and increment the counter by 1. The task will then print the value of the counter and wait for 1 second before releasing the mutex lock by calling xSemaphoreGive (xMutex). It will wait for another 100 milliseconds before repeating the whole operation.

Meanwhile, our second task, task2, also competes for the same counter variable. It also tries to get a lock on our mutex xMutex, but with a definite timeout of 200 milliseconds passed as the second argument of xSemaphoreTake(). That argument is counted in RTOS ticks. The Arduino core for ESP32 is typically configured with a 1 ms tick, and the pdMS_TO_TICKS() macro is the portable way to convert milliseconds to ticks. If the task cannot acquire the lock within that time period, it gives up for the moment and prints a message instead. Remember that task2 can only acquire the lock if it is in an unlocked state or in other words, not locked by task1. Because task1 holds the mutex for 1 second before releasing it, task2 may have to wait up to about 1 second before it can acquire the lock. This will become obvious to you when you see the serial monitor output.
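Since the timeout of xSemaphoreTake() is counted in ticks and not in milliseconds, it helps to see the conversion worked out. The two functions below are hypothetical helpers of ours that mirror the arithmetic behind the FreeRTOS macros pdMS_TO_TICKS() and portTICK_PERIOD_MS, with the tick rate passed in as an argument. We are assuming the usual definitions, where the tick rate comes from configTICK_RATE_HZ.

```cpp
#include <cstdint>

// Mirrors pdMS_TO_TICKS(): ticks = milliseconds * tick rate / 1000.
constexpr uint32_t msToTicks (uint32_t ms, uint32_t tickRateHz) {
  return static_cast<uint32_t> ((static_cast<uint64_t> (ms) * tickRateHz) / 1000u);
}

// Mirrors portTICK_PERIOD_MS: the number of milliseconds in one tick.
constexpr uint32_t tickPeriodMs (uint32_t tickRateHz) {
  return 1000u / tickRateHz;
}
```

At a tick rate of 1000 Hz, msToTicks (200, 1000) and 200 * tickPeriodMs (1000) both give 200 ticks, so multiplying by the tick period happens to work. At 100 Hz, the correct value is msToTicks (200, 100) = 20 ticks, while 200 * tickPeriodMs (100) = 2000 ticks, which would be a 20 second timeout. Dividing by the tick period, or using pdMS_TO_TICKS(), stays correct at any tick rate.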

While task1 increments the counter by 1, task2 increments it by 1000. This is so that we can easily see who is doing what to the counter. Also, unlike task1, task2 releases the mutex as soon as it is done with the operation. This means that task1 rarely has to wait for the lock when it tries to acquire it again.

Try uploading the code to your ESP32 board and see the following result in the serial monitor.

ESP32 mutex example

Binary Semaphore

Let’s see how we can implement a Binary Semaphore for ESP32 using FreeRTOS and Arduino. Try uploading the following code to your ESP32 board and open the serial monitor.

//================================================================================//
/*
  ESP32 Binary Semaphore Example
  Read more at https://circuitstate.com/tutorials/how-to-write-parallel-multitasking-applications-for-esp32-with-freertos-arduino
*/
//================================================================================//

SemaphoreHandle_t xSemaphore = NULL;  // Handle for the semaphore; the semaphore itself is created in setup()

//================================================================================//

// the setup function runs once when you press reset or power the board
void setup() {
  // initialize digital pin LED_BUILTIN as an output.
  pinMode (LED_BUILTIN, OUTPUT);

  Serial.begin (115200);

  xSemaphore = xSemaphoreCreateBinary();  // Create a binary semaphore (initially not available)

  xTaskCreatePinnedToCore (
    task1,     // Function to implement the task
    "task1",   // Name of the task
    1000,      // Stack size in bytes
    NULL,      // Task input parameter
    10,        // Priority of the task
    NULL,      // Task handle.
    0          // Core where the task should run
  );

  xTaskCreatePinnedToCore (
    task2,     // Function to implement the task
    "task2",   // Name of the task
    1000,      // Stack size in bytes
    NULL,      // Task input parameter
    10,        // Priority of the task
    NULL,      // Task handle.
    0          // Core where the task should run
  );
}

//================================================================================//

// the loop function runs over and over again forever
void loop() {
  digitalWrite (LED_BUILTIN, HIGH);  // turn the LED on (HIGH is the voltage level)
  delay (1000);  // wait for a second
  digitalWrite (LED_BUILTIN, LOW); // turn the LED off by making the voltage LOW
  delay (1000);  // wait for a second
}

//================================================================================//

// this task will periodically release the binary semaphore
void task1 (void *pvParameters) {
  while (1) {
    Serial.print ("Binary Semaphore released at ");
    Serial.println (xTaskGetTickCount());
    xSemaphoreGive (xSemaphore);  // Release the semaphore
    delay (1000);
  }
}

//================================================================================//

// this task will wait for the binary semaphore to be released
void task2 (void *pvParameters) {
  while (1) {
    if (xSemaphoreTake (xSemaphore, (200 / portTICK_PERIOD_MS))) {  // try to acquire the semaphore, waiting up to 200 ms
      Serial.print ("Binary Semaphore acquired at ");
      Serial.println (xTaskGetTickCount());
    }
    else {  // if the semaphore was not acquired within 200ms
      Serial.print ("Binary Semaphore not acquired at ");
      Serial.println (xTaskGetTickCount());
    }
  }
}

//================================================================================//

First, we declare a semaphore handle called xSemaphore with the type SemaphoreHandle_t. A handle of this type can refer to any kind of semaphore. In the setup() function, we create a binary semaphore with xSemaphoreCreateBinary() and store it in this handle. After that, we create two tasks called task1 and task2. We are pinning both tasks to core 0, but the choice of core doesn’t matter for this example. The default loop() simply blinks our LED and does nothing else. Since it runs as a separate task, other parts of our code have no effect on it.

We are using task1 to release, or give, the semaphore, which signals any waiting task that the semaphore is available. task1 repeats this operation every second. The function used to release a semaphore is xSemaphoreGive(), and we pass our semaphore xSemaphore to it. The function puts the semaphore into the released (available) state; you can think of it as setting a flag to true.

task2 is a separate task running concurrently with task1. task2 waits for the semaphore xSemaphore to become available using the xSemaphoreTake() function. We need to pass two arguments to this function: the semaphore handle and the timeout for the operation, expressed in ticks. If the semaphore is not acquired within the timeout, the function returns pdFALSE. In that case the if block is skipped and the else block prints a “Binary Semaphore not acquired” message. If the semaphore is acquired within the timeout, the message “Binary Semaphore acquired” is printed along with the tick count.

xTaskGetTickCount() returns the tick count, which is the number of RTOS tick interrupts that have occurred since the scheduler started (it is not the processor clock count). portTICK_PERIOD_MS is a macro that gives the duration of one tick in milliseconds. With the default ESP32 Arduino configuration one tick lasts 1 ms, so a 200-tick timeout is 200 ms; for other tick rates, a time in milliseconds is converted to ticks by dividing it by portTICK_PERIOD_MS or by using pdMS_TO_TICKS(). Since we only release the semaphore once every second, task2 acquires it once per second and times out on every other attempt. Try executing the program and you will see the result shown below.

[Figure: ESP32 Binary Semaphore example]

We can see that task2 fails to acquire the semaphore four times before it acquires it on the fifth attempt. The most important difference you need to understand is that the binary semaphore we used did not lock any resource against other tasks. Instead, task1 was only signaling task2 that the semaphore was available. That’s the difference between a binary semaphore and a mutex.

Counting Semaphore

Now let’s write an Arduino sketch to demonstrate a counting semaphore on ESP32. Try uploading the following code.

//================================================================================//
/*
  ESP32 Counting Semaphore Example
  Read more at https://circuitstate.com/tutorials/how-to-write-parallel-multitasking-applications-for-esp32-with-freertos-arduino
*/
//================================================================================//

SemaphoreHandle_t countingSem = NULL;  // counting semaphore
SemaphoreHandle_t semFull = NULL;  // semaphore indicating that the counting semaphore is full

//================================================================================//

// the setup function runs once when you press reset or power the board
void setup() {
  // initialize digital pin LED_BUILTIN as an output.
  pinMode (LED_BUILTIN, OUTPUT);

  Serial.begin (115200);

  semFull = xSemaphoreCreateBinary();  // Create a binary semaphore
  countingSem = xSemaphoreCreateCounting (3, 3);  // create a counting semaphore (maximum count 3, initial count 3)

  if (countingSem != NULL) {
    Serial.println ("Semaphore created successfully");
  }

  xTaskCreatePinnedToCore (
    task1,     // Function to implement the task
    "task1",   // Name of the task
    1000,      // Stack size in bytes
    NULL,      // Task input parameter
    10,        // Priority of the task
    NULL,      // Task handle.
    0          // Core where the task should run
  );

  xTaskCreatePinnedToCore (
    task2,     // Function to implement the task
    "task2",   // Name of the task
    1000,      // Stack size in bytes
    NULL,      // Task input parameter
    10,        // Priority of the task
    NULL,      // Task handle.
    0          // Core where the task should run
  );
}

//================================================================================//

// the loop function runs over and over again forever
void loop() {
  digitalWrite (LED_BUILTIN, HIGH);  // turn the LED on (HIGH is the voltage level)
  delay (1000);  // wait for a second
  digitalWrite (LED_BUILTIN, LOW); // turn the LED off by making the voltage LOW
  delay (1000);  // wait for a second
}

//================================================================================//

// this task will take the counting semaphore twice and then wait for a signal from task2
void task1 (void *pvParameters) {
  while(1) {
    if (uxSemaphoreGetCount (countingSem) == 3) { // check if the counting semaphore is free
      if (xSemaphoreTake (countingSem, portMAX_DELAY) && xSemaphoreTake (countingSem, portMAX_DELAY)) { // acquire two semaphores
        Serial.print ("Task 1: 2 semaphores acquired at ");
        Serial.println (xTaskGetTickCount());
        Serial.print ("Task 1: Semaphores left = ");
        Serial.println (uxSemaphoreGetCount (countingSem)); // print the number of semaphores left
        delay (1000);
      }
    }

    if (xSemaphoreTake (semFull, portMAX_DELAY)) {  // wait for signal from task 2
      Serial.println ("Task 1: Semaphore is full. Releasing 2 semaphores..");
      xSemaphoreGive (countingSem); // release the two semaphores
      xSemaphoreGive (countingSem);
      delay (1000);
    }
  }
}

//================================================================================//

// this task will acquire only the last remaining semaphore
void task2 (void *pvParameters) {
  while (1) {
    if (uxSemaphoreGetCount (countingSem) == 1) {  // check when only 1 semaphore is left
      if (xSemaphoreTake (countingSem, portMAX_DELAY)) {  //try to acquire the semaphore
        Serial.println ("Task 2: Acquiring last semaphore..");
        Serial.print ("Task 2: Semaphore acquired at ");
        Serial.println (xTaskGetTickCount());
        Serial.print ("Task 2: Semaphores left = ");
        Serial.println (uxSemaphoreGetCount (countingSem));
        delay (1000);
        xSemaphoreGive (semFull); // signal that the counting semaphore is full
        delay (1000);
        Serial.print ("Task 2: Releasing 1 semaphore at ");
        Serial.println (xTaskGetTickCount());
        xSemaphoreGive (countingSem); // release the semaphore
      }
    }
  }
}

//================================================================================//

We first create two semaphore handles: a counting semaphore countingSem, and a binary semaphore semFull that signals when the counting semaphore has been fully taken. semFull is created as a binary semaphore using xSemaphoreCreateBinary(). To create a counting semaphore we use xSemaphoreCreateCounting() and pass two values: the maximum count and the initial count. The maximum count can represent the number of resources you have, for example. In this example, we set the maximum count to 3, which means countingSem can be taken at most three times before something is given back. A fourth xSemaphoreTake() will wait until its timeout expires and then fail, or wait indefinitely if the timeout is portMAX_DELAY.

The initial count can be any value from 0 up to the maximum, and the two common choices are the maximum (3) and 0. For example, if you have a finite number of resources that you want to distribute among your tasks, you can set it to the maximum value. But if you are counting events, you can set the initial value to 0. In both cases, you can use the function uxSemaphoreGetCount() to read the current count, which is the number of times the semaphore can still be taken.

In task1, we first check whether the count of countingSem is 3. Since the initial value of countingSem is 3, this check succeeds the first time. Then we take countingSem twice using the now-familiar xSemaphoreTake(), with the maximum timeout portMAX_DELAY. Once the two semaphores are acquired, task1 waits for task2 to raise the semFull signal.

Meanwhile, task2 keeps checking for the count of countingSem to become 1, in other words, for task1 to have acquired two semaphores. Now it’s task2’s turn to acquire the one remaining semaphore. After acquiring it, task2 waits for 1 second before raising the semFull signal using xSemaphoreGive(). This signals task1 that all counting semaphores have been acquired. When task1 gets this signal, by taking semFull with xSemaphoreTake(), it releases the two semaphores it initially acquired. One second later, which is two seconds after acquiring it, task2 also releases its semaphore. At that instant, countingSem is completely free again, and the whole process repeats from there.

[Figure: Counting semaphore in action (serial monitor output)]
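
To make the two arguments of xSemaphoreCreateCounting() more concrete, here is a small model of how the count behaves, written so that it can be compiled on a PC. It is a sketch for illustration only and not FreeRTOS code: CountingSemaphoreModel, fourthTakeSucceeds() and countEvents() are hypothetical names, there are no tasks and no blocking, and take() behaves like xSemaphoreTake() with a zero timeout.

```cpp
#include <cassert>
#include <cstdint>

// Model of a counting semaphore (illustration only). It tracks the count the
// way xSemaphoreCreateCounting (maxCount, initialCount) defines it.
class CountingSemaphoreModel {
public:
  CountingSemaphoreModel (uint32_t maxCount, uint32_t initialCount)
    : maxCount_ (maxCount), count_ (initialCount) {
    assert (initialCount <= maxCount);  // the initial count may not exceed the maximum
  }

  // Like xSemaphoreTake() with a zero timeout: fails when the count is 0.
  bool take () {
    if (count_ == 0) return false;
    --count_;
    return true;
  }

  // Like xSemaphoreGive(): fails when the count is already at the maximum.
  bool give () {
    if (count_ == maxCount_) return false;
    ++count_;
    return true;
  }

  // Like uxSemaphoreGetCount().
  uint32_t count () const { return count_; }

private:
  uint32_t maxCount_;
  uint32_t count_;
};

// Resource pool: start full (3, 3) as in the sketch above, take three times,
// then report whether a fourth take would succeed.
bool fourthTakeSucceeds () {
  CountingSemaphoreModel pool (3, 3);
  pool.take ();
  pool.take ();
  pool.take ();
  return pool.take ();  // nothing left to take
}

// Event counting: start empty (3, 0), give once per event, and return how
// many events are recorded. The count saturates at the maximum.
uint32_t countEvents (uint32_t events) {
  CountingSemaphoreModel pending (3, 0);
  for (uint32_t i = 0; i < events; ++i) {
    pending.give ();
  }
  return pending.count ();
}
```

With a maximum of 3, fourthTakeSucceeds() returns false, and countEvents() never reports more than 3 events no matter how many are given. On the ESP32, the difference is that a failed take waits for its timeout instead of returning immediately.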

FreeRTOS Critical Section

So far, all our examples and discussions revolved around a calm and predictable environment of sequential and parallel tasks that are easy to picture. The coding was also relatively easy. But there is an important concept we haven’t taken into account yet: interrupts. Interrupts disturb the normal flow of a program, and this applies to both sequential and concurrent programming. Interrupt service routines (ISR) are designed to respond as fast as possible to an asynchronous event (one whose timing is not predictable). They must also finish their work as quickly as possible. Depending on the priority of the interrupt, the currently running code is suspended until the interrupt routine completes. Interrupts are already complicated in a single-core environment, and they get even more complicated in a multi-core environment.

Interrupts can also interrupt FreeRTOS tasks, depending on how they are configured. So what happens when you have a piece of code that is so critical that you don’t want interrupts to disturb it even for a short time? Keeping interrupts disabled for long stretches is one method, but that defeats the purpose of using interrupts. Instead, FreeRTOS allows us to mark a short critical section of code using the taskENTER_CRITICAL() and taskEXIT_CRITICAL() macros, and interrupts are held off only while that section runs. You call taskENTER_CRITICAL() just before your critical code and taskEXIT_CRITICAL() once done. Always make sure that your critical section code is as short as possible. Otherwise, it will adversely affect the interrupt response time.

Let’s see how you can write critical section code for ESP32.

//================================================================================//
/*
  ESP32 Critical Section Example
  Read more at https://circuitstate.com/tutorials/how-to-write-parallel-multitasking-applications-for-esp32-with-freertos-arduino
*/
//================================================================================//

portMUX_TYPE taskMux = portMUX_INITIALIZER_UNLOCKED;  // critical section mutex (a spinlock)

int counter = 0;  // A shared variable

//================================================================================//

// the setup function runs once when you press reset or power the board
void setup() {
  // initialize digital pin LED_BUILTIN as an output.
  pinMode (LED_BUILTIN, OUTPUT);

  Serial.begin (115200);

  xTaskCreatePinnedToCore (
    task1,     // Function to implement the task
    "task1",   // Name of the task
    1000,      // Stack size in bytes
    NULL,      // Task input parameter
    10,        // Priority of the task
    NULL,      // Task handle.
    0          // Core where the task should run
  );

  xTaskCreatePinnedToCore (
    task2,     // Function to implement the task
    "task2",   // Name of the task
    1000,      // Stack size in bytes
    NULL,      // Task input parameter
    10,        // Priority of the task
    NULL,      // Task handle.
    0          // Core where the task should run
  );
}

//================================================================================//

// the loop function runs over and over again forever
void loop() {
  digitalWrite (LED_BUILTIN, HIGH);  // turn the LED on (HIGH is the voltage level)
  delay (1000);  // wait for a second
  digitalWrite (LED_BUILTIN, LOW); // turn the LED off by making the voltage LOW
  delay (1000);  // wait for a second
}

//================================================================================//

// this task will periodically lock the mutex, increment the counter by 1 and unlock the mutex
void task1 (void *pvParameters) {
  while(1) {
    Serial.print ("Task 1: Trying to increment the counter at ");
    Serial.println (xTaskGetTickCount());

    portENTER_CRITICAL (&taskMux);  // lock the mutex (busy waiting)
      counter = counter + 1;  // increment the counter
      int value = counter;  // copy the new value so that it can be printed later
    portEXIT_CRITICAL (&taskMux); // unlock the mutex

    // print outside the critical section; Serial calls are too slow to run with interrupts masked
    Serial.print ("Task 1: Counter = ");
    Serial.println (value);

    delay (1000);
  }
}

//================================================================================//

// this task will periodically lock the mutex, increment the counter by 1000 and unlock the mutex
void task2 (void *pvParameters) {
  while (1) {
    Serial.print ("Task 2: Trying to increment the counter at ");
    Serial.println (xTaskGetTickCount());

    portENTER_CRITICAL (&taskMux);  // lock the mutex (busy waiting)
      counter = counter + 1000;  // increment the counter
      int value = counter;  // copy the new value so that it can be printed later
    portEXIT_CRITICAL (&taskMux); // unlock the mutex

    // print outside the critical section, for the same reason as in task1
    Serial.print ("Task 2: Counter = ");
    Serial.println (value);

    delay (500);
  }
}

//================================================================================//

On ESP32, a critical section is entered with portENTER_CRITICAL(), which is equivalent to taskENTER_CRITICAL(). Similarly, portEXIT_CRITICAL() is equivalent to taskEXIT_CRITICAL(). To use these macros, we need to pass a mutex object of the type portMUX_TYPE, which is implemented as a spinlock. We create a mutex object called taskMux and initialize it with portMUX_INITIALIZER_UNLOCKED, which marks the mutex as free/unlocked initially. The rest of the code is similar to our previous mutex example. Since counter is our shared variable, we access it inside a critical section to prevent other tasks from interfering. So just before accessing the counter variable, we call portENTER_CRITICAL (&taskMux). Notice that we are passing the address of our mutex object. Any task on the other core that tries to acquire the same mutex busy-waits until we are done. This also allows us to create multiple mutexes and thus protect critical sections in many places independently.

After we are done modifying the counter value, we exit the critical section by calling portEXIT_CRITICAL (&taskMux). From then on, other tasks can lock the mutex. Interrupts on the current core are masked while the processor is inside a critical section, so slow calls such as Serial.print() are best kept outside it; otherwise they can trigger a watchdog reset. Critical sections are extremely useful for sharing hardware registers in a multi-core environment without conflicts. See the output below.

[Figure: Critical section code example (serial monitor output)]

Since task1 waits for 1 second and task2 only 0.5 seconds after their critical section operations, task2 gets to increment the counter twice consecutively.
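
The same pattern is commonly used when a variable is shared with an interrupt service routine, which is the situation described at the start of this section. The sketch below is a minimal illustration and not part of the original example: pendingEvents, onEvent(), takePendingEvents() and the EVENT_LOCK macros are hypothetical names, and portENTER_CRITICAL_ISR() and portEXIT_CRITICAL_ISR() are the ISR-side variants of the macros used above. So that the logic can also be compiled on a PC, the locking macros fall back to empty stand-ins when the code is not built for the ESP32 Arduino core.

```cpp
#include <cstdint>

#ifdef ARDUINO_ARCH_ESP32
  #include <Arduino.h>
  static portMUX_TYPE eventMux = portMUX_INITIALIZER_UNLOCKED;
  #define EVENT_LOCK()        portENTER_CRITICAL (&eventMux)
  #define EVENT_UNLOCK()      portEXIT_CRITICAL (&eventMux)
  #define EVENT_LOCK_ISR()    portENTER_CRITICAL_ISR (&eventMux)
  #define EVENT_UNLOCK_ISR()  portEXIT_CRITICAL_ISR (&eventMux)
#else
  // Stand-ins for a PC build: there are no interrupts and no second core,
  // so there is nothing to lock.
  #define IRAM_ATTR
  #define EVENT_LOCK()        ((void) 0)
  #define EVENT_UNLOCK()      ((void) 0)
  #define EVENT_LOCK_ISR()    ((void) 0)
  #define EVENT_UNLOCK_ISR()  ((void) 0)
#endif

static volatile uint32_t pendingEvents = 0;  // shared between the ISR and tasks

// Interrupt handler: kept short, it only updates the shared counter.
void IRAM_ATTR onEvent () {
  EVENT_LOCK_ISR ();
  pendingEvents = pendingEvents + 1;
  EVENT_UNLOCK_ISR ();
}

// Called from a task: reads and clears the counter inside the critical
// section, so that no event is lost between the read and the reset.
// Slow work such as printing is left to the caller, outside the section.
uint32_t takePendingEvents () {
  EVENT_LOCK ();
  uint32_t events = pendingEvents;
  pendingEvents = 0;
  EVENT_UNLOCK ();
  return events;
}
```

On the board, onEvent() would be registered with attachInterrupt() for a pin of your choice, and a task would call takePendingEvents() periodically and print the returned value.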

What’s Next?

There are still many things left to explain in detail, such as the Watchdog Timer, Interrupts, and Queues. But I think this tutorial has covered enough to help you write parallel multitasking applications that take advantage of the dual cores of ESP32 and manage task synchronization. You can find more detailed tutorials on these topics all over the internet. We have two suggestions: https://techtutorialsx.com/ from Nuno Santos, and https://esp32tutorials.com/. Both websites have a long list of tutorials covering different features of the ESP32. Finally, you can read everything about using FreeRTOS with ESP32 in the official documentation. If you run into issues, you can always try debugging your code with the official ESP-Prog debug probe. Check out our tutorial below to learn more. Happy coding 😀

Debugging ESP32 Arduino & ESP-IDF Projects using ESP-Prog and PlatformIO

Learn how to use the official Espressif ESP-Prog to debug your ESP32 Arduino and ESP-IDF projects with the help of PlatformIO.

Vishnu Mohanan


Founder and CEO at CIRCUITSTATE Electronics


10 Comments

  1. Hi,
    Thanks a lot for your tutorial. However, when I ran the code of the critical section example, my ESP32 repeatedly reset and could not finish printing out the counter value for task 2. Is it because the program in the critical section takes too much time to process? Because when I removed the Serial.print() inside the critical section, the code worked fine.

    • Thanks for your feedback. We just ran the code on a FireBeetle-ESP32E board and we observed the same problem. We are using the latest versions of all packages. I think something must have changed in the packages after we published the post. We are currently trying to find out what the issue is.

    • The error message we got included the following line.

      rst:0x8 (TG1WDT_SYS_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)

      This indicates that the Timer Group 1 WDT was triggered. We tried disabling the Serial.print() lines on task2 and it was working correctly. So you are right, the print() function was taking more time for some reason, and triggering the watchdog.

      In actual applications, we must keep the critical section code as short as possible and avoid calling functions with busy waits. Since the code provided is only an example, let’s ignore the error for now.

  2. Hi,

    src/main.cpp: In function 'void setup()':
    src/main.cpp:13:5: error: 'loop2' was not declared in this scope
    loop2, // Function to implement the task
    ^~~~~
    src/main.cpp:13:5: note: suggested alternative: 'loop'
    loop2, // Function to implement the task
    ^~~~~
    loop
    *** [.pio\build\lolin_s3_mini\src\main.cpp.o] Error 1

    What is missing and how did you get it working in platformio without
    #include
    #include

    ?

    • Hi Tom. The required header files are automatically included by the compiler when the Arduino sketch is converted to a proper C/C++ file. Regarding the error you are getting, please check if you have selected the right board type and platform in the configuration.

      Also, if you are using PlatformIO, you need to include the forward declarations of all functions before all of the function definitions. Otherwise, PIO will throw errors.

  3. I was wondering if the shared variables in the code segments should be declared as volatile and why not if they do not need to be?

    • The volatile keyword is used to prevent the compiler from optimizing away reads of a variable by using a cached copy instead of reading from memory every time. This is useful when declaring variables that are directly connected to underlying asynchronous hardware logic, for example a GPIO register of a microcontroller. The value of the GPIO register can change at any time due to external inputs, and the program must always read it directly from memory instead of using a recent copy of it. But in a multi-threaded and multi-core environment, it is not necessary to use the volatile keyword unless it is a hardware register. Using volatile doesn’t give access protection in a multi-threaded environment. Instead, we must always use mutexes and semaphores to synchronize access to shared variables between tasks, just like we have shown.

  4. Great article. I wish it had gone a bit further and covered when things like WiFi is enabled.

    I have a timer function running every 20 microseconds (also tried an interrupt function on state change of a GPIO pin) to measure the state and duration of a QAM_RX 433 receiver. When the message is received it displays on the serial port.

    Works a treat until I run WiFi.begin() and then my function doesn’t trigger (all the time).

    For now I’ve gone back to the Arduino Nano doing the 433 stuff and then via Serial talking to the ESP32 so via WiFi it can update a MQTT queue. I was hoping to do it all in the ESP32 but I just don’t seem to be able to get it working.

    • Hi Ian. Thanks for the feedback. Does your timer function utilize one of the ESP32’s hardware timers? If not, for precise and deterministic timing, we recommend using a hardware timer. You can load the timer with a preset and wait for the interrupt to call your function.

      Regarding the issue of Wi-Fi interrupting your timing function, could you please confirm if you are using the same or a different core for your timer function? ESP32 has two cores and Core 0 is used by the Wi-Fi and Bluetooth stack by default. Running timing-critical functions on the same core while Wi-Fi is running can cause issues. We suspect that to be the issue. But we can’t be sure without seeing your code. ESP32 and its software framework are very complex, and so if we do not do things the right way, things will fail. Please let us know if you are able to narrow down the problem.

      • Hi

        Thanks for your reply. I’ve put my example code on github for you to have a look at. Welcome any suggestions.

        I’m rapidly coming to the conclusion I can’t do 433Mhz stuff and WiFi stuff on the same ESP32. I might go back to the Arduino Nano for 433Mhz and let it talk to the ESP32 via serial. 🙁

        https://github.com/ichilver/esp32-433-wifi
