STM32 Bare Metal Development
This page describes how to do bare metal development on STM32 MCUs.
There is some disagreement about what the term bare metal refers to. In this article, and everywhere else on this site and on our YouTube channel, we use the term to mean as little abstraction as possible. Some would call this register-level programming.
A lot of the initial inspiration came from this page and all credit goes to the author Sergey Lyubka.
Prerequisites
Even though we do not use any pre-made libraries such as HAL or libopencm3, some prerequisites are required:
- Make tools - either make or cmake but we will be using make
- ARM C compiler
- ARM debugger gdb-multiarch
- ST-Link CLI
- Debug interface - openocd
- Some form of editor or IDE - we will be using CLion and Visual Studio Code
On Debian all these tools are available as standard packages.
Make tools
On Debian, GNU Make exists as a separate tool, which can be installed with the following command:
# apt install make
ARM C compiler
Multiple C compilers exist that target ARM MCUs, for example GNU C and Clang. On these pages and in the videos we will primarily be using GNU C. On Debian it can be installed thus:
# apt install gcc-arm-none-eabi
The installation pulls in the C compiler itself as well as the tools needed to handle binary files.
Videos
Announcement of the new bare metal video series:
Absolutely minimal bare-bones project
The first project is an absolutely minimal bare-bones project that ends up in a main super loop which does nothing except maintain a counter.
Linker Script
The linker script is MCU dependent as it defines the memory regions in the MCU. For STM32F407 the minimal linker script is:
ENTRY(_reset); /* entry point */
MEMORY {
flash(rx) : ORIGIN = 0x08000000, LENGTH = 512k
sram(rwx) : ORIGIN = 0x20000000, LENGTH = 128k /* remaining 64k in a separate address space */
}
_estack = ORIGIN(sram) + LENGTH(sram); /* stack points to end of SRAM */
SECTIONS {
.vectors : { KEEP(*(.vectors)) } > flash
.text : { *(.text*) } > flash
.rodata : { *(.rodata*) } > flash
.data : {
_sdata = .; /* .data section start */
*(.first_data)
*(.data SORT(.data.*))
_edata = .; /* .data section end */
} > sram AT > flash
_sidata = LOADADDR(.data);
.bss : {
_sbss = .; /* .bss section start */
*(.bss SORT(.bss.*) COMMON)
_ebss = .; /* .bss section end */
} > sram
. = ALIGN(8);
_end = .; /* for cmsis_gcc.h */
}
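The `_estack` expression in the linker script is plain address arithmetic, and it can be checked on a host machine. The following is a minimal sketch, assuming the 128 KiB of main SRAM at 0x20000000 found on the STM32F407; the helper names `region_end` and `initial_stack_pointer` are hypothetical and not part of the project.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for an STM32F407: main SRAM at 0x20000000, 128 KiB long. */
#define SRAM_ORIGIN 0x20000000u
#define SRAM_LENGTH (128u * 1024u)

/* Hypothetical helper mirroring the linker expression ORIGIN(r) + LENGTH(r). */
uint32_t region_end(uint32_t origin, uint32_t length)
{
    assert(length <= UINT32_MAX - origin); /* the region must not wrap around */
    return origin + length;
}

/* The Cortex-M stack is full-descending: a push decrements SP before storing,
   so the initial SP may point one byte past the end of SRAM. */
uint32_t initial_stack_pointer(void)
{
    return region_end(SRAM_ORIGIN, SRAM_LENGTH);
}
```

With these assumed values, `initial_stack_pointer()` yields 0x20020000, which is the value the linker would place in the first word of the vector table.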
Bootstrapping
Most STM32 bare metal programming guides include an assembler file for bootstrapping the MCU. That is not necessary, however, as the entire bootstrapping can be implemented in C.
The first step is to define the interrupt vectors. In our bare-bones example that looks like this:
extern void _estack(void); // Defined in the linker script (f407.ld)
// 16 standard and 91 STM32-specific handlers
__attribute__((section(".vectors"))) void (*const tab[16 + 91])(void) = {
_estack, _reset
};
Note that only the first two words are defined here: the initial stack pointer (defined in the linker script) and a pointer to _reset, which will be resolved by the linker. The remaining entries are implicitly initialized to zero.
Second, _reset must be defined:
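The effect of listing only two initializers can be seen in a small host-side sketch. It uses two empty functions, `fake_estack` and `fake_reset`, as hypothetical stand-ins for the linker symbol and the reset handler. All other names are illustrative as well.

```c
#include <assert.h>
#include <stddef.h>

#define CORE_VECTORS 16  /* initial SP, reset and 14 further system exception slots */
#define IRQ_VECTORS  91  /* device-specific interrupt slots, as in the table above */
#define VECTOR_COUNT (CORE_VECTORS + IRQ_VECTORS)

typedef void (*handler_t)(void);

/* Hypothetical host-side stand-ins for _estack and _reset. */
void fake_estack(void) {}
void fake_reset(void) {}

/* Only the first two entries are listed; C zero-initializes the rest. */
const handler_t vector_table[VECTOR_COUNT] = { fake_estack, fake_reset };

/* Bounds-checked access to one table entry. */
handler_t vector_at(size_t index)
{
    assert(index < VECTOR_COUNT);
    return vector_table[index];
}

/* Count the entries that were left unset (null). */
size_t count_empty_vectors(void)
{
    size_t empty = 0;
    for (size_t i = 0; i < VECTOR_COUNT; i++) {
        if (vector_at(i) == NULL) {
            empty++;
        }
    }
    return empty;
}
```

All entries apart from the first two are null pointers. On the real MCU this means that any exception or interrupt other than reset would fetch a zero handler address, so interrupts must stay disabled until proper handlers are added.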
// Startup code
__attribute__((naked, noreturn)) void _reset(void) {
extern long _sbss, _ebss, _sdata, _edata, _sidata;
for (long* dst = &_sbss; dst < &_ebss; dst++) *dst = 0;
for (long *dst = &_sdata, *src = &_sidata; dst < &_edata;) *dst++ = *src++;
main();
for (;;) (void)0; // Infinite loop - should never be reached
}
This function does little more than zero the .bss section, copy the initialized .data section from flash to SRAM, and finally call main(). The infinite loop after the call to main() should never be reached, because main() does not return, and the compiler may even optimize it out.
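The two loops in _reset can be tried out on a host machine by replacing the linker symbols with ordinary arrays. This is only a sketch under that assumption: `flash_image`, `ram_data` and `ram_bss` are hypothetical stand-ins for the regions delimited by `_sidata`, `_sdata`/`_edata` and `_sbss`/`_ebss`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Zero a .bss-like range word by word, as the first loop in _reset does. */
void zero_words(uint32_t *dst, const uint32_t *end)
{
    assert(dst <= end);
    while (dst < end) {
        *dst++ = 0;
    }
}

/* Copy a .data-like range from its load image, as the second loop does. */
void copy_words(uint32_t *dst, const uint32_t *end, const uint32_t *src)
{
    assert(dst <= end);
    while (dst < end) {
        *dst++ = *src++;
    }
}

/* Hypothetical stand-ins for the flash load image and the two SRAM regions.
   The RAM arrays start with garbage, like SRAM after power-up. */
static const uint32_t flash_image[3] = { 0x11111111u, 0x22222222u, 0x33333333u };
static uint32_t ram_data[3] = { 0xDEADBEEFu, 0xDEADBEEFu, 0xDEADBEEFu };
static uint32_t ram_bss[4] = { 0xDEADBEEFu, 0xDEADBEEFu, 0xDEADBEEFu, 0xDEADBEEFu };

/* Run both startup steps; returns 1 if RAM then holds what C code expects. */
int simulate_startup(void)
{
    zero_words(ram_bss, ram_bss + 4);
    copy_words(ram_data, ram_data + 3, flash_image);
    for (size_t i = 0; i < 4; i++) {
        if (ram_bss[i] != 0) {
            return 0;
        }
    }
    for (size_t i = 0; i < 3; i++) {
        if (ram_data[i] != flash_image[i]) {
            return 0;
        }
    }
    return 1;
}
```

The copy step is what gives initialized globals their values: the linker stores them in flash (`AT > flash`) but assigns them addresses in SRAM, so something has to move them there before main() runs.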
We can finally define our main() thus:
// main (uint32_t comes from <stdint.h>, which must be included at the top of main.c)
int main(void) {
uint32_t cnt = 0, half;
while (1) {
cnt += 2;
half = cnt / 2;
++half;
}
}
As it stands, main maintains two counters (with only one, the compiler would figure out that it is never used and optimize it away).
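A common alternative for keeping a counter alive at any optimization level is to declare it `volatile`. The sketch below is not what the project above does; it is a host-side illustration with hypothetical names (`tick_count`, `run_super_loop`), and the loop is bounded so that it terminates.

```c
#include <assert.h>
#include <stdint.h>

/* volatile marks every read and write as observable, so the compiler
   must keep these accesses in the generated code at any -O level. */
volatile uint32_t tick_count = 0;

/* Bounded variant of the super loop so that it terminates on a host. */
uint32_t run_super_loop(uint32_t iterations)
{
    assert(iterations <= UINT32_MAX - tick_count); /* no wrap-around in this sketch */
    for (uint32_t i = 0; i < iterations; i++) {
        tick_count = tick_count + 1u;
    }
    return tick_count;
}
```

On the target the bounded `for` would again become `while (1)`, and the counter could then be watched from the debugger.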
Makefile
The final step is to build the project. While we could do that manually it is much easier to throw together a simple Makefile:
CFLAGS ?= -W -Wall -Wextra -Werror -Wundef -Wshadow -Wdouble-promotion \
	-Wformat-truncation -fno-common -Wconversion \
	-g3 -O0 -ffunction-sections -fdata-sections -I. \
	-mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 $(EXTRA_CFLAGS)
LDFLAGS ?= -Tf407.ld -nostartfiles -nostdlib --specs nano.specs -lc -lgcc -Wl,--gc-sections -Wl,-Map=$@.map
SOURCES = main.c
all: firmware.bin
firmware.elf: $(SOURCES)
	arm-none-eabi-gcc $(SOURCES) $(CFLAGS) $(LDFLAGS) -o $@
firmware.bin: firmware.elf
	arm-none-eabi-objcopy -O binary $< $@
flash: firmware.bin
	st-flash --reset write $< 0x8000000
clean:
	rm -rf firmware.*
.PHONY: all clean flash
We can now build our project and check the resulting binary:
lth@nb7:~/src/stm32fun_bare-metal/examples/1st-minimal-vscode$ make clean
rm -rf firmware.*
lth@nb7:~/src/stm32fun_bare-metal/examples/1st-minimal-vscode$ make
arm-none-eabi-gcc main.c -W -Wall -Wextra -Werror -Wundef -Wshadow -Wdouble-promotion -Wformat-truncation -fno-common -Wconversion -g3 -O0 -ffunction-sections -fdata-sections -I. -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -Tf407.ld -nostartfiles -nostdlib --specs nano.specs -lc -lgcc -Wl,--gc-sections -Wl,-Map=firmware.elf.map -o firmware.elf
arm-none-eabi-objcopy -O binary firmware.elf firmware.bin
lth@nb7:~/src/stm32fun_bare-metal/examples/1st-minimal-vscode$ ls -l
total 48
-rw-rw-r-- 1 lth lth 946 Jan 24 17:10 f407.ld
-rwxrwxr-x 1 lth lth 528 Jan 24 17:52 firmware.bin
-rwxrwxr-x 1 lth lth 86364 Jan 24 17:52 firmware.elf
-rw-rw-r-- 1 lth lth 3903 Jan 24 17:52 firmware.elf.map
-rw-rw-r-- 1 lth lth 745 Jan 24 16:54 main.c
-rw-rw-r-- 1 lth lth 657 Jan 24 16:54 Makefile
As can be seen above, the resulting binary is 528 bytes. The vector table, whose first entry is the initial stack pointer, takes (16 + 91) * 4 = 428 bytes, so the code itself takes only about 100 bytes.
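The size breakdown can be double-checked with a few lines of C. This sketch assumes that the vector table is exactly the `tab[16 + 91]` array of 32-bit pointers from the bootstrapping code, with the initial stack pointer as its first entry; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VECTOR_ENTRIES (16u + 91u) /* entries in the vector table array */
#define WORD_BYTES     4u          /* a pointer on the Cortex-M4 is 32 bits */

/* Size of the vector table in bytes. */
uint32_t vector_table_bytes(void)
{
    return VECTOR_ENTRIES * WORD_BYTES;
}

/* Bytes left for everything that follows the vector table in a flat binary. */
uint32_t bytes_after_vectors(uint32_t binary_bytes)
{
    assert(binary_bytes >= vector_table_bytes());
    return binary_bytes - vector_table_bytes();
}
```

For the 528-byte firmware.bin listed above, `bytes_after_vectors(528)` leaves 100 bytes for the startup code and main().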