STM32 Bare Metal Development

From Stm32World Wiki

This page describes how to do bare metal development on STM32 MCUs.

There is some disagreement about what the term "bare metal" refers to. In this article, and indeed everywhere on this site and on our YouTube channel, we use the term to mean as little abstraction as possible. Some would call this register-level programming.

A lot of the initial inspiration came from this page and all credit goes to the author Sergey Lyubka.

Prerequisites

Even though we do not use any pre-made libraries such as HAL or libopencm3, some prerequisites are required:

  • Make tools - either make or cmake but we will be using make
  • ARM C compiler
  • ARM debugger gdb-multiarch
  • ST-Link CLI
  • Debug interface - openocd
  • Some form of editor or IDE - we will be using CLion and Visual Studio Code

On Debian all these tools are available as standard packages.

Make tools

On Debian, GNU Make exists as a separate tool, which can be installed with the following command:

# apt install make 

ARM C compiler

Multiple C compilers target ARM MCUs, for example GNU C (GCC) and Clang. On these pages and in the videos we will primarily be using GNU C. On Debian it can be installed with:

# apt install gcc-arm-none-eabi

The installation pulls in the C compiler itself as well as the tools needed to handle binary files, such as arm-none-eabi-objcopy.

Videos

Announcement of the new bare metal video series:

Absolutely minimal bare-bones project

The first project is an absolutely minimal bare-bones project that ends up in a main super loop, which does nothing except maintain a counter.

Linker Script

The linker script is MCU dependent as it defines the memory regions in the MCU. For STM32F407 the minimal linker script is:


ENTRY(_reset);                                   /* entry point */

MEMORY {
  flash(rx) : ORIGIN = 0x08000000, LENGTH = 512k
  sram(rwx) : ORIGIN = 0x20000000, LENGTH = 128k /* 112k + 16k main SRAM; the 64k CCM RAM is at 0x10000000 */
}

_estack     = ORIGIN(sram) + LENGTH(sram);       /* stack points to end of SRAM */

SECTIONS {
  .vectors  : { KEEP(*(.vectors)) }   > flash
  .text     : { *(.text*) }           > flash
  .rodata   : { *(.rodata*) }         > flash

  .data : {
    _sdata = .;                                  /* .data section start */
    *(.first_data)
    *(.data SORT(.data.*))
    _edata = .;                                  /* .data section end */
  } > sram AT > flash
  _sidata = LOADADDR(.data);

  .bss : {
    _sbss = .;                                   /* .bss section start */
    *(.bss SORT(.bss.*) COMMON)
    _ebss = .;                                   /* .bss section end */
  } > sram

  . = ALIGN(8);
  _end = .;     /* for cmsis_gcc.h  */
}
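
To make the _estack calculation concrete, here is a small host-side sketch in C (it is not part of the firmware). The mem_region_t type and both helper functions are hypothetical names introduced only for illustration. The sketch shows that ORIGIN + LENGTH is the first address past the region, which is where a Cortex-M full-descending stack starts:

```c
#include <stdint.h>

/* Hypothetical model of one MEMORY region from the linker script. */
typedef struct {
    uint32_t origin;  /* ORIGIN(region) */
    uint32_t length;  /* LENGTH(region), in bytes */
} mem_region_t;

/* Mirrors "_estack = ORIGIN(sram) + LENGTH(sram)":
   returns the first address past the end of the region. */
uint32_t region_end(mem_region_t r) {
    return r.origin + r.length;
}

/* Returns 1 if addr lies inside the region (the end address is exclusive). */
int region_contains(mem_region_t r, uint32_t addr) {
    return addr >= r.origin && addr < region_end(r);
}
```

For example, a region with ORIGIN = 0x20000000 and LENGTH = 64k ends at 0x20010000. The first push stores below that address, so the initial stack pointer value itself is never written to.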
 
 

Bootstrapping

In most STM32 bare metal programming guides, an assembly file is included for bootstrapping the MCU. It is, however, not necessary, as the entire bootstrapping can be implemented in C.

The first step is to define the interrupt vectors. In our bare-bones example that will look like this:

extern void _estack(void);  // Defined in f407.ld
void _reset(void);          // Defined below

// 16 standard and 91 STM32-specific handlers
__attribute__((section(".vectors"))) void (*const tab[16 + 91])(void) = {
    _estack, _reset
};

Notice that only the first two words are defined here: the initial stack pointer (defined in the linker script) and a pointer to _reset, which is resolved by the linker. The remaining entries are zero-initialized.

Second, _reset must be defined:
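
As a hedged illustration of how further entries could be filled in later, the sketch below models the table on a host PC using C designated initializers. The names handler_t, demo_reset, demo_systick, demo_tab and count_set_vectors are hypothetical and not part of the project. Index 15 is the SysTick slot in the Cortex-M exception layout, and every entry that is not named stays zero:

```c
#include <stddef.h>

#define N_VECTORS (16 + 91)  /* same number of entries as tab[] */

typedef void (*handler_t)(void);

/* Hypothetical handlers, for illustration only. */
void demo_reset(void) {}
void demo_systick(void) {}

/* Host-side model of the table. On the target, entry 0 holds the initial
   stack pointer; it is left empty here because there is no _estack symbol. */
const handler_t demo_tab[N_VECTORS] = {
    [1]  = demo_reset,    /* Reset */
    [15] = demo_systick,  /* SysTick */
};

/* Counts the entries that were explicitly set (non-NULL). */
size_t count_set_vectors(const handler_t *tab, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (tab[i] != NULL) count++;
    }
    return count;
}
```

Designated initializers keep the table readable as it grows, since each handler sits next to its exception number instead of relying on position alone.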

// Startup code
__attribute__((naked, noreturn)) void _reset(void) {
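    extern long _sbss, _ebss, _sdata, _edata, _sidata;  // Defined in f407.ld
    extern int main(void);

    // Zero the .bss section
    for (long* dst = &_sbss; dst < &_ebss; dst++) *dst = 0;

    // Copy the initial values of .data from flash to SRAM
    for (long *dst = &_sdata, *src = &_sidata; dst < &_edata;) *dst++ = *src++;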

    main();

    for (;;) (void)0;  // Infinite loop - should never be reached
}

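This function zeroes the .bss section, copies the initial values of the .data section from flash to SRAM, and finally calls main(). The infinite loop after the call to main() is only a safeguard in case main() ever returns. It should not really be necessary and might even be optimized out by the compiler.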
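
The two loops can be tried out on a host PC without any hardware. The sketch below is an assumption-laden model, not the firmware itself: flash_image, sram, startup_init and startup_demo_ok are hypothetical names, and small arrays stand in for the regions that the linker symbols delimit on the target.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side stand-ins for the linker-defined regions. */
#define DATA_WORDS 4
#define BSS_WORDS  4

static const uint32_t flash_image[DATA_WORDS] = {1, 2, 3, 4}; /* role of _sidata */
static uint32_t sram[DATA_WORDS + BSS_WORDS];                 /* .data, then .bss */

/* Same two loops as in _reset(), with the bounds passed as parameters. */
void startup_init(uint32_t *sdata, uint32_t *edata, const uint32_t *sidata,
                  uint32_t *sbss, uint32_t *ebss) {
    for (uint32_t *dst = sbss; dst < ebss; dst++) *dst = 0;
    for (uint32_t *dst = sdata; dst < edata;) *dst++ = *sidata++;
}

/* Fills the fake SRAM with garbage, runs the startup loops, and returns 1
   if .data matches the flash image and .bss is all zero, otherwise 0. */
int startup_demo_ok(void) {
    for (size_t i = 0; i < DATA_WORDS + BSS_WORDS; i++) sram[i] = 0xDEADBEEFu;

    startup_init(sram, sram + DATA_WORDS, flash_image,
                 sram + DATA_WORDS, sram + DATA_WORDS + BSS_WORDS);

    for (size_t i = 0; i < DATA_WORDS; i++) {
        if (sram[i] != flash_image[i]) return 0;
    }
    for (size_t i = DATA_WORDS; i < DATA_WORDS + BSS_WORDS; i++) {
        if (sram[i] != 0) return 0;
    }
    return 1;
}
```

Pre-filling the array with garbage mimics the fact that SRAM contents are undefined after power-up, which is why the startup code cannot skip either loop.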

We can finally define our main() thus:

#include <stdint.h>  // uint32_t; belongs at the top of main.c

// main
int main(void) {
    uint32_t cnt = 0, half;

    while (1) {
        cnt += 2;
        half = cnt / 2;
        ++half;
    }
}

As is, main() maintains two counters (with only one, the compiler would figure out that it is not used and optimize it away).
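
A common alternative, which is not what the code above does, is to declare a single counter volatile so the compiler has to perform every access. The sketch below assumes a bounded loop so that it can run on a host PC. The function name run_iterations is hypothetical:

```c
#include <stdint.h>

/* volatile tells the compiler that every access is observable, so the
   increments cannot be removed even though nothing else reads the value. */
static volatile uint32_t cnt = 0;

/* Bounded stand-in for the super loop: increments cnt n times and
   returns its value afterwards. */
uint32_t run_iterations(uint32_t n) {
    for (uint32_t i = 0; i < n; i++) {
        cnt += 1;
    }
    return cnt;
}
```

On the target the loop would be while (1) as before. The volatile qualifier also makes the counter easy to watch from the debugger, because its value always lives in memory.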

Makefile

The final step is to build the project. While we could do that manually, it is much easier to throw together a simple Makefile:

CFLAGS  ?=  -W -Wall -Wextra -Werror -Wundef -Wshadow -Wdouble-promotion \
	-Wformat-truncation -fno-common -Wconversion \
	-g3 -O0 -ffunction-sections -fdata-sections -I. \
	-mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 $(EXTRA_CFLAGS)

LDFLAGS ?= -Tf407.ld -nostartfiles -nostdlib --specs nano.specs -lc -lgcc -Wl,--gc-sections -Wl,-Map=$@.map

SOURCES = main.c

all: firmware.bin

firmware.elf: $(SOURCES)
	arm-none-eabi-gcc $(SOURCES) $(CFLAGS) $(LDFLAGS) -o $@

firmware.bin: firmware.elf
	arm-none-eabi-objcopy -O binary $< $@

flash: firmware.bin
	st-flash --reset write $< 0x8000000

clean:
	rm -rf firmware.*

.PHONY: all clean flash

We can now build our project and check the resulting binary:

lth@nb7:~/src/stm32fun_bare-metal/examples/1st-minimal-vscode$ make clean
rm -rf firmware.*
lth@nb7:~/src/stm32fun_bare-metal/examples/1st-minimal-vscode$ make 
arm-none-eabi-gcc main.c -W -Wall -Wextra -Werror -Wundef -Wshadow -Wdouble-promotion -Wformat-truncation -fno-common -Wconversion -g3 -O0 -ffunction-sections -fdata-sections -I. -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16  -Tf407.ld -nostartfiles -nostdlib --specs nano.specs -lc -lgcc -Wl,--gc-sections -Wl,-Map=firmware.elf.map -o firmware.elf
arm-none-eabi-objcopy -O binary firmware.elf firmware.bin
lth@nb7:~/src/stm32fun_bare-metal/examples/1st-minimal-vscode$ ls -l
total 48
-rw-rw-r-- 1 lth lth   946 Jan 24 17:10 f407.ld
-rwxrwxr-x 1 lth lth   528 Jan 24 17:52 firmware.bin
-rwxrwxr-x 1 lth lth 86364 Jan 24 17:52 firmware.elf
-rw-rw-r-- 1 lth lth  3903 Jan 24 17:52 firmware.elf.map
-rw-rw-r-- 1 lth lth   745 Jan 24 16:54 main.c
-rw-rw-r-- 1 lth lth   657 Jan 24 16:54 Makefile

As can be seen above, the resulting binary is 528 bytes. The vector table, whose first entry is the initial stack pointer, takes (16 + 91) * 4 = 428 bytes, so the code itself takes only 100 bytes.
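
The size breakdown can be double-checked with a few lines of C. This is a sketch that assumes 4-byte vectors and the tab[16 + 91] declaration from the bootstrapping code, where the stack pointer is entry 0 of the 16 core entries. The function names are hypothetical:

```c
#include <stdint.h>

enum {
    CORE_VECTORS = 16,  /* includes the initial stack pointer at entry 0 */
    IRQ_VECTORS  = 91,  /* as declared for tab[] */
    WORD_SIZE    = 4    /* bytes per vector on a 32-bit Cortex-M */
};

/* Size of the vector table in bytes. */
uint32_t vector_table_bytes(void) {
    return (CORE_VECTORS + IRQ_VECTORS) * WORD_SIZE;
}

/* Bytes left for code and constants in a binary of the given size. */
uint32_t code_bytes(uint32_t binary_size) {
    return binary_size - vector_table_bytes();
}
```

With the 528-byte firmware.bin from the listing, vector_table_bytes() gives 428 and code_bytes(528) gives 100.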

Miscellaneous Links