STM32 Jump to System Memory Bootloader
All STM32 MCUs have a built-in bootloader stored in so-called system memory. The system memory is a ROM (read-only memory) which is programmed during production of the MCU and can never be changed (it can however be disabled - see Read Out Protection). When the MCU starts up, the BOOT0 pin is sampled, and if it is high (pulled up to VCC) the MCU will execute the bootloader.
Videos
We have made a video describing this approach.
Watch on Youtube here: https://www.youtube.com/watch?v=qXi6o8hhwUE
Another video documenting an alternative and less naughty approach is here: [1]
The method in that video is a bit naughty - watch how to fix it here: [2]
The "problem" (or challenge)
But what if we would like to execute this bootloader programmatically, based on some other event such as the press of a user button or a command in a serial console? While this is entirely possible, it is surprisingly difficult. The problem is that the built-in bootloader makes a lot of assumptions, and all of them must be met before jumping to it. In general that would look something like:
- Find system memory location for specific STM32 in AN2606 (see Miscellaneous Links)
- Set RCC to default values (the same as on startup: internal clock, no PLL, etc.)
- Disable SysTick interrupt and reset it to default
- Disable all interrupts
- Map system memory to 0x00000000 location
- Set the jump location to the address stored at the system memory location + 4 bytes (the reset handler)
- Set the main stack pointer to the value stored at the system memory location
- Call the jump location as a function
Depending on which peripherals are being used, the above can be quite complicated, and changes in code can easily break it.
The easy solution
Fortunately, there is in fact an easier way to do this. If we look at the code generated by STM32CubeMX the startup is like this:
int main(void)
{
  /* USER CODE BEGIN 1 */
  /* USER CODE END 1 */
  /* MCU Configuration--------------------------------------------------------*/
  /* Reset of all peripherals, Initializes the Flash interface and the Systick. */
  HAL_Init();
  /* USER CODE BEGIN Init */
  /* USER CODE END Init */
  /* Configure the system clock */
  SystemClock_Config();
  /* USER CODE BEGIN SysInit */
  /* USER CODE END SysInit */
  /* Initialize all configured peripherals */
  MX_GPIO_Init();
  MX_USB_DEVICE_Init();
...
By the time the main function is called, only the memory has been initialized (BSS has been set to 0x00 throughout), so if we were to do the jump immediately (in the USER CODE BEGIN 1 section) it would work just fine. That would look something like this:
  /* USER CODE BEGIN 1 */
	#define BOOTLOADER_ADDRESS 0x1FFF0000
	typedef void (*pFunction)(void);
	pFunction JumpToApplication;
	uint32_t JumpAddress;
	/* Jump to system memory bootloader */
	JumpAddress = *(__IO uint32_t*) (BOOTLOADER_ADDRESS + 4);
	JumpToApplication = (pFunction) JumpAddress;
	JumpToApplication();
  /* USER CODE END 1 */
Of course the result of that would be an application that always executed the bootloader, and that is not really useful, nor is it what we wanted to do.
An eagle-eyed viewer of one of the videos linked above asked why the "+ 4", which is a very good question. STM32 binaries contain the initial stack pointer as the first word (4 bytes) of the binary, followed by the address of the reset handler. At this point in the code the stack has already been initialised, so we skip the first word and read the reset handler address stored at offset + 4, which is where we jump.
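To make the "+ 4" concrete, here is a minimal host-side sketch of the first two vector table entries. The type name vector_table_head_t and the function read_vector_table_head are made up for illustration; only the layout (initial stack pointer at offset 0, reset handler address at offset 4) is taken from the explanation above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* First two words of a Cortex-M vector table (illustrative type name). */
typedef struct {
    uint32_t initial_sp;     /* offset 0: initial main stack pointer */
    uint32_t reset_handler;  /* offset 4: address of the reset handler */
} vector_table_head_t;

/* Copy the first two words out of an image that starts with a vector table. */
vector_table_head_t read_vector_table_head(const void *image)
{
    vector_table_head_t head;
    memcpy(&head, image, sizeof head);
    return head;
}
```

On the target, the image pointer would be BOOTLOADER_ADDRESS cast to a pointer. The jump code in this article only uses the second word, because the stack is already set up when main runs.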
What we need is a way to store a flag which will survive a system reset of the MCU. This is a little bit tricky since the user memory (BSS) is set to zero during startup. There are multiple examples online of how to store variables/flags which will survive a processor reset. A common approach is to use the backup registers of the Real Time Clock (RTC); however, this approach is quite complicated (it requires the RTC to be initialized) and it is quite hard to make generic for all STM32s, since the RTC differs quite a lot between the STM32 variants.
Fortunately there's a hack which can be utilized here. If we look at the memory (RAM) of a typical STM32 application it looks something like:
The DATA and BSS sections are re-initialized during restart, but everything above them (heap + stack) is left alone. Not only is it left alone, but the linker also provides symbols which tell us exactly where they are, independent of the actual MCU. Looking at a typical linker script (generated by STM32CubeMX), we find the following:
/* Entry Point */
ENTRY(Reset_Handler)
/* Highest address of the user mode stack */
_estack = ORIGIN(RAM) + LENGTH(RAM);	/* end of "RAM" Ram type memory */
_Min_Heap_Size = 0x400 ;	/* required amount of heap  */
_Min_Stack_Size = 0x800 ;	/* required amount of stack */
/* Memories definition */
MEMORY
{
  RAM    (xrw)    : ORIGIN = 0x20000000,   LENGTH = 128K
  FLASH    (rx)    : ORIGIN = 0x8000000,   LENGTH = 512K
}
...
In other words, _estack points to the top of RAM, one byte past the highest RAM address (the stack grows down, while the heap grows up).
Of course under "normal" circumstances one shouldn't mess with the stack, but in our case immediately after messing with it we perform a system reset, so we can afford to lose whatever is stored somewhere in the stack. If we debug the application the MCU registers contain the following at the start of the main function:
The interesting register here is the stack pointer SP, which contains 0x2001fff0. The highest RAM address on the particular MCU used in this example (Black Pill STM32F411) is 0x2001ffff, so _estack is 0x20020000 and the stack has used 16 bytes at this point. Anything after that (remember the stack grows down, so lower addresses) is unused at this point. If we create a pointer like this:
#define BOOTLOADER_FLAG_OFFSET 100
uint32_t *bootloader_flag;
bootloader_flag = (uint32_t*) (&_estack - BOOTLOADER_FLAG_OFFSET); // 100 elements below top of stack (400 bytes if _estack is declared as a 4-byte type)
we can store a "known" value at that location, reset the MCU and then check that value immediately at startup.
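One detail worth checking is the unit of BOOTLOADER_FLAG_OFFSET: C pointer arithmetic counts elements of the pointed-to type, not bytes. The host-side sketch below shows both variants; the function names are illustrative and not part of the original code. Which variant applies on the target depends on how _estack is declared, which is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

#define BOOTLOADER_FLAG_OFFSET 100

/* Distance in bytes when the offset is subtracted from a pointer to a
 * 4-byte type, as happens with &_estack if _estack is a 32-bit symbol. */
size_t offset_bytes_word_pointer(uint32_t *top)
{
    uint32_t *flag = top - BOOTLOADER_FLAG_OFFSET;
    return (size_t)((uint8_t *)top - (uint8_t *)flag);
}

/* Distance in bytes when the offset is subtracted from a byte pointer. */
size_t offset_bytes_byte_pointer(uint32_t *top)
{
    uint8_t *flag = (uint8_t *)top - BOOTLOADER_FLAG_OFFSET;
    return (size_t)((uint8_t *)top - flag);
}
```

Either distance works for the flag as long as the result stays 4-byte aligned and well below the stack in use; the point is only to know which one you have.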
Our initial code shown earlier can now be expanded like this:
  /* USER CODE BEGIN 1 */
	bootloader_flag = (uint32_t*) (&_estack - BOOTLOADER_FLAG_OFFSET); // 100 elements below top of stack (400 bytes if _estack is declared as a 4-byte type)
	if (*bootloader_flag == BOOTLOADER_FLAG_VALUE) {
		*bootloader_flag = 0;
		/* Jump to system memory bootloader */
		JumpAddress = *(__IO uint32_t*) (BOOTLOADER_ADDRESS + 4);
		JumpToApplication = (pFunction) JumpAddress;
		JumpToApplication();
	}
	*bootloader_flag = 0; // So next boot won't be affected
  /* USER CODE END 1 */
If our bootloader flag contains anything but BOOTLOADER_FLAG_VALUE, the application will move on with its normal initialization of interrupts, timers, peripherals etc., but if our flag contains BOOTLOADER_FLAG_VALUE it will jump - successfully - to the internal bootloader. Triggering this with a user button could look like this:
void HAL_GPIO_EXTI_Callback(uint16_t GPIO_Pin) {
	if (GPIO_Pin == BTN_Pin) // Only react to the user button
	{
		GPIO_PinState pinState = HAL_GPIO_ReadPin(BTN_GPIO_Port, BTN_Pin);
		if (pinState == GPIO_PIN_RESET) {
			push_count = HAL_GetTick();
		} else {
			if (HAL_GetTick() - push_count > 1000) {
				*bootloader_flag = BOOTLOADER_FLAG_VALUE;
				HAL_NVIC_SystemReset();
			}
			push_count = 0;
		}
	}
}
In this case, if a button connected to BTN_Pin is pressed for more than 1 second, the device will reboot in bootloader mode.
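The other trigger mentioned at the start, a command in a serial console, could follow the same pattern. The sketch below is an assumption-laden illustration: the function name handle_console_line, the command word "bootloader" and the value chosen for BOOTLOADER_FLAG_VALUE are all hypothetical, and the reset itself is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed magic value; any pattern unlikely to appear by chance will do. */
#define BOOTLOADER_FLAG_VALUE 0xDEADBEEFu

/* Arms the flag and returns true when the received line is the
 * (hypothetical) "bootloader" command; otherwise leaves the flag alone. */
bool handle_console_line(const char *line, volatile uint32_t *bootloader_flag)
{
    if (strcmp(line, "bootloader") != 0) {
        return false;
    }
    *bootloader_flag = BOOTLOADER_FLAG_VALUE;
    return true;
}
```

In firmware, a true return value would be followed by HAL_NVIC_SystemReset(), exactly as in the button callback above.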
Evaluation
Now, I am fully aware that messing with the stack like I do in this example is quite naughty. The question is: is there any way this could go wrong? Well, the answer is "yes, but it is unlikely". If we look at the point where we do mess with the stack, it looks like this:
*bootloader_flag = BOOTLOADER_FLAG_VALUE;
HAL_NVIC_SystemReset();
I guess that if an interrupt happened between setting the bootloader flag and calling the reset, something unexpected could happen. Another possible issue would be if the call to HAL_NVIC_SystemReset pushed something (such as its return address) onto the stack at exactly the location where the flag is stored. This might require a bit more analysis.
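One way to reason about both concerns is to compare the current stack pointer with the flag address before writing the flag. The helper below is a host-side sketch and not part of the original code; the name flag_slot_is_clear and the margin parameter are illustrative. On the target, the current stack pointer could be read with the CMSIS function __get_MSP(), and the margin would need to cover whatever may still be pushed before the reset (a basic Cortex-M exception frame alone is 32 bytes).

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the 4-byte flag slot lies entirely below the current
 * stack pointer, with at least `margin` unused bytes in between.
 * The stack grows down, so anything pushed later moves towards the flag. */
bool flag_slot_is_clear(uint32_t sp, uint32_t flag_addr, uint32_t margin)
{
    uint32_t slot_end = flag_addr + 4u;  /* first byte above the flag */
    return sp >= slot_end && (sp - slot_end) >= margin;
}
```

If the check fails, the flag would either land inside a live stack frame or sit close enough to the stack pointer to be overwritten by the next push.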
Alternative - less naughty - approach
After using the approach described above for a long time, I realized that there is an alternative and much less naughty approach. Rather than dumping the flag at a fixed offset in the middle of the stack somewhere, one could simply move the stack 8 bytes "down" and reserve the top 8 bytes for this boot flag.
First step is to edit the linker script like this:
/* Highest address of the user mode stack */
_bflag = ORIGIN(RAM) + LENGTH(RAM) - 8; /* end of "RAM" Ram type memory */
_estack = _bflag;
_bflag now points to the top 8 bytes of RAM, while the actual _estack is shifted down by 8 bytes. The _only_ other change is to declare _bflag in the code and initialise the boot flag pointer to that address:
extern int _bflag;
dfu_boot_flag = (uint32_t*) (&_bflag); // set in linker script
And that is about it - quite a lot more elegant and definitely less risky.
Notice that the stack needs to be 8-byte aligned or some of the ARM libraries start failing, which is why 8 bytes are reserved rather than 4.
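The alignment requirement can be checked with a little arithmetic. The sketch below uses the RAM origin and length from the linker script shown earlier (0x20000000 and 128K); the function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address of the reserved flag area: the top `reserved` bytes of RAM.
 * With the linker script change above, _estack gets the same value. */
uint32_t boot_flag_address(uint32_t ram_origin, uint32_t ram_length,
                           uint32_t reserved)
{
    return ram_origin + ram_length - reserved;
}

/* The initial stack pointer must be a multiple of 8. */
bool is_8_byte_aligned(uint32_t address)
{
    return (address % 8u) == 0u;
}
```

Reserving 8 bytes puts the flag and the new top of stack at 0x2001FFF8, which is 8-byte aligned. Reserving only the 4 bytes the flag actually needs would give 0x2001FFFC, which is not.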
Source
The source for this example can be found on our github.

