| .. SPDX-License-Identifier: GPL-2.0+ |
| .. Copyright (c) 2013 The Chromium OS Authors. |
| |
| Tracing in U-Boot |
| ================= |
| |
| U-Boot supports a simple tracing feature which allows a record of execution |
| to be collected and sent to a host machine for analysis. At present the |
| main use for this is to profile boot time. |
| |
| |
| Overview |
| -------- |
| |
| The trace feature uses GCC's -finstrument-functions option to trace all |
| function entry/exit points. These are then recorded in a memory buffer. |
| The memory buffer can be saved to the host over a network link using |
| tftpput or by writing to an attached memory device such as MMC. |
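
As a rough illustration of this mechanism (a minimal sketch, not U-Boot's
actual trace code): when code is compiled with GCC's
``-finstrument-functions`` option, the compiler inserts calls to
``__cyg_profile_func_enter()`` and ``__cyg_profile_func_exit()`` around every
function. The record layout, the tiny buffer size and all ``example_`` names
below are assumptions made for illustration only.

```c
#include <stddef.h>

/* Keep the hooks themselves from being instrumented */
#define notrace __attribute__((no_instrument_function))

/* Hypothetical buffer size, kept tiny so overflow is easy to see */
#define EXAMPLE_MAX_RECORDS 4

enum example_kind { EXAMPLE_ENTRY, EXAMPLE_EXIT };

/* Hypothetical record layout; the real binary format differs */
struct example_record {
	void *func;		/* address of the traced function */
	void *caller;		/* address of the call site */
	enum example_kind kind;
};

static struct example_record example_buf[EXAMPLE_MAX_RECORDS];
static size_t example_count;	/* records stored so far */
static size_t example_dropped;	/* records lost once the buffer is full */

static notrace void example_add(void *func, void *caller,
				enum example_kind kind)
{
	if (example_count >= EXAMPLE_MAX_RECORDS) {
		/* Buffer exhausted: count the loss, store nothing */
		example_dropped++;
		return;
	}
	example_buf[example_count].func = func;
	example_buf[example_count].caller = caller;
	example_buf[example_count].kind = kind;
	example_count++;
}

/* Called by GCC-instrumented code on every function entry */
notrace void __cyg_profile_func_enter(void *func, void *caller)
{
	example_add(func, caller, EXAMPLE_ENTRY);
}

/* Called by GCC-instrumented code on every function exit */
notrace void __cyg_profile_func_exit(void *func, void *caller)
{
	example_add(func, caller, EXAMPLE_EXIT);
}
```

Once the sketch's buffer is full, further records are only counted as
dropped, in the same spirit as the "dropped due to overflow" figure that
'trace stats' reports.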
| |
| On the host, the file is first converted with a tool called 'proftool', |
| which extracts useful information from it. The resulting trace output |
| resembles that emitted by Linux's ftrace feature, so it can be displayed |
| visually by pytimechart. |
| |
| |
| Quick-start using Sandbox |
| ------------------------- |
| |
| Sandbox is a build of U-Boot that can run under Linux so it is a convenient |
| way of trying out tracing before you use it on your actual board. To do |
| this, follow these steps: |
| |
| Add the following to configs/sandbox_defconfig: |
| |
| .. code-block:: c |
| |
| CONFIG_TRACE=y |
| |
| Build sandbox U-Boot with tracing enabled: |
| |
| .. code-block:: console |
| |
| $ make FTRACE=1 O=sandbox sandbox_config |
| $ make FTRACE=1 O=sandbox |
| |
| Run sandbox, wait for a bit of trace information to appear, and then capture |
| a trace: |
| |
| .. code-block:: console |
| |
| $ ./sandbox/u-boot |
| |
| U-Boot 2013.04-rc2-00100-ga72fcef (Apr 17 2013 - 19:25:24) |
| |
| DRAM: 128 MiB |
| trace: enabled |
| Using default environment |
| |
| In: serial |
| Out: serial |
| Err: serial |
| =>trace stats |
| 671,406 function sites |
| 69,712 function calls |
| 0 untracked function calls |
| 73,373 traced function calls |
| 16 maximum observed call depth |
| 15 call depth limit |
| 66,491 calls not traced due to depth |
| =>trace stats |
| 671,406 function sites |
| 1,279,450 function calls |
| 0 untracked function calls |
| 950,490 traced function calls (333217 dropped due to overflow) |
| 16 maximum observed call depth |
| 15 call depth limit |
| 1,275,767 calls not traced due to depth |
| =>trace calls 0 e00000 |
| Call list dumped to 00000000, size 0xae0a40 |
| =>print |
| baudrate=115200 |
| profbase=0 |
| profoffset=ae0a40 |
| profsize=e00000 |
| stderr=serial |
| stdin=serial |
| stdout=serial |
| |
| Environment size: 117/8188 bytes |
| =>host save host 0 trace 0 ${profoffset} |
| 11405888 bytes written in 10 ms (1.1 GiB/s) |
| =>reset |
| |
| |
| Then run proftool to convert the trace information to ftrace format: |
| |
| .. code-block:: console |
| |
| $ ./sandbox/tools/proftool -m sandbox/System.map -p trace dump-ftrace >trace.txt |
| |
| Finally, run pytimechart to display it: |
| |
| .. code-block:: console |
| |
| $ pytimechart trace.txt |
| |
| Using this tool you can zoom and pan across the trace, with the function |
| calls on the left and little marks representing the start and end of each |
| function. |
| |
| |
| CONFIG Options |
| -------------- |
| |
| CONFIG_TRACE |
| Enables the trace feature in U-Boot. |
| |
| CONFIG_CMD_TRACE |
| Enables the trace command. |
| |
| CONFIG_TRACE_BUFFER_SIZE |
| Size of trace buffer to allocate for U-Boot. This buffer is |
| used after relocation, as a place to put function tracing |
| information. The address of the buffer is determined by |
| the relocation code. |
| |
| CONFIG_TRACE_EARLY |
| Define this to start tracing early, before relocation. |
| |
| CONFIG_TRACE_EARLY_SIZE |
| Size of 'early' trace buffer. Before U-Boot has relocated |
| it doesn't have a proper trace buffer. On many boards |
| you can define an area of memory to use for the trace |
| buffer until the 'real' trace buffer is available after |
| relocation. The contents of this buffer are then copied to |
| the real buffer. |
| |
| CONFIG_TRACE_EARLY_ADDR |
| Address of the early trace buffer. |
| |
| |
| Building U-Boot with Tracing Enabled |
| ------------------------------------ |
| |
| Pass 'FTRACE=1' to the U-Boot Makefile to actually instrument the code. |
| This is kept as a separate option so that it is easy to enable/disable |
| instrumenting from the command line instead of having to change board |
| config files. |
| |
| |
| Collecting Trace Data |
| --------------------- |
| |
| When you run U-Boot on your board it will collect trace data up to the |
| limit of the trace buffer size you have specified. Once that is exhausted |
| no more data will be collected. |
| |
| Collecting trace data has an effect on execution time/performance. You |
| will notice this particularly with trivial functions - the overhead of |
| recording their execution may even exceed their normal execution time. |
| In practice this doesn't matter much so long as you are aware of the |
| effect. Once you have done your optimizations, turn off tracing before |
| doing end-to-end timing. |
| |
| The best time to start tracing is right at the beginning of U-Boot. The |
| best time to stop tracing is right at the end. In practice it is hard |
| to achieve these ideals. |
| |
| This implementation enables tracing early in board_init_f(). This means |
| that it captures most of the board init process, missing only the |
| early architecture-specific init. However, it also misses the entire |
| SPL stage if there is one. |
| |
| U-Boot typically ends with a 'bootm' command which loads and runs an |
| OS. There is useful trace data in the execution of that bootm |
| command. Therefore this implementation provides a way to collect trace |
| data after bootm has finished processing, but just before it jumps to |
| the OS. In practical terms, U-Boot runs the 'fakegocmd' environment |
| variable at this point. This variable should have a short script which |
| collects the trace data and writes it somewhere. |
| |
| Trace data collection relies on a microsecond timer, accessed through |
| timer_get_us(). So the first thing you should do is make sure that |
| this produces sensible results for your board. Suitable sources for |
| this timer include high resolution timers, PWMs or profile timers if |
| available. Most modern SoCs have a suitable timer for this. Make sure |
| that you mark this timer (and anything it calls) with |
| __attribute__((no_instrument_function)) so that the trace library can |
| use it without causing an infinite loop. |
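
A minimal sketch of such a timer is shown below. It assumes a hypothetical
free-running 24 MHz counter; the ``example_`` names, the register variable and
the clock rate are illustrative and not taken from any real board. Both the
timer function and the helper it calls carry the attribute, so the trace
hooks can call them without re-entering themselves.

```c
#include <stdint.h>

#define notrace __attribute__((no_instrument_function))

/* Hypothetical counter frequency; a real board defines its own rate */
#define EXAMPLE_TIMER_HZ 24000000

/* Stand-in for a memory-mapped hardware counter register */
static volatile uint32_t example_counter_reg;

/* Helper called by the timer: it must not be instrumented either */
static notrace uint32_t example_read_counter(void)
{
	return example_counter_reg;
}

/* Microsecond timer that is safe to call from the trace hooks */
notrace unsigned long example_timer_get_us(void)
{
	/* Widen to 64 bits before scaling so the multiply cannot overflow */
	uint64_t ticks = example_read_counter();

	return (unsigned long)(ticks * 1000000 / EXAMPLE_TIMER_HZ);
}
```

If either function were instrumented, each call from the entry hook would
trigger the hook again, which is the infinite loop described above.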
| |
| |
| Commands |
| -------- |
| |
| The trace command has several sub-commands: |
| |
| stats |
| Display tracing statistics |
| |
| pause |
| Pause tracing |
| |
| resume |
| Resume tracing |
| |
| funclist [<addr> <size>] |
| Dump a list of functions into the buffer |
| |
| calls [<addr> <size>] |
| Dump function call trace into buffer |
| |
| If the address and size are not given, these are obtained from environment |
| variables (see below). In any case the environment variables are updated |
| after the command runs. |
| |
| |
| Environment Variables |
| --------------------- |
| |
| The following are used: |
| |
| profbase |
| Base address of trace output buffer |
| |
| profoffset |
| Offset of first unwritten byte in trace output buffer |
| |
| profsize |
| Size of trace output buffer |
| |
| All of these are set by the 'trace calls' command. |
| |
| These variables keep track of the amount of data written to the trace |
| output buffer by the 'trace' command. The trace commands which write data |
| to the output buffer can use these to specify the buffer to write to, and |
| update profoffset each time. This allows successive commands to append data |
| to the same buffer, for example:: |
| |
| => trace funclist 10000 e00000 |
| => trace calls |
| |
| (the latter command appends more data to the buffer). |
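
The append behaviour can be pictured with the following sketch. It is an
illustration under assumptions, not the code behind the 'trace' command: the
structure and function names are hypothetical, and the three fields simply
mirror profbase, profoffset and profsize.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the three environment variables */
struct example_out {
	unsigned char *base;	/* profbase: start of the output buffer */
	size_t offset;		/* profoffset: first unwritten byte */
	size_t size;		/* profsize: total size of the buffer */
};

/*
 * Append len bytes at the current offset and advance it.
 * Returns 0 on success, or -1 if the data would not fit, in which
 * case the buffer and the offset are left unchanged.
 * Assumes out->offset never exceeds out->size.
 */
static int example_append(struct example_out *out, const void *data,
			  size_t len)
{
	if (len > out->size - out->offset)
		return -1;

	memcpy(out->base + out->offset, data, len);
	out->offset += len;

	return 0;
}
```

Because every write starts at the stored offset, a second command that
reuses the same base and size lands directly after the first command's data.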
| |
| |
| fakegocmd |
| Specifies commands to run just before booting the OS. This |
| is a useful time to write the trace data to the host for |
| processing. |
| |
| |
| Writing Out Trace Data |
| ---------------------- |
| |
| Once the trace data is in an output buffer in memory there are various ways |
| to transmit it to the host. Notably you can use tftpput to send the data |
| over a network link:: |
| |
| fakegocmd=trace pause; usb start; set autoload n; bootp; |
| trace calls 10000000 1000000; |
| tftpput ${profbase} ${profoffset} 192.168.1.4:/tftpboot/calls |
| |
| This starts up USB (to talk to an attached USB Ethernet dongle), writes |
| a trace log to address 10000000 and sends it to a host machine using |
| TFTP. After this, U-Boot will boot the OS normally, albeit a little |
| later. |
| |
| |
| Converting Trace Output Data |
| ---------------------------- |
| |
| The trace output data is kept in a binary format which is not documented |
| here. To convert it into something useful, you can use proftool. |
| |
| This tool must be given the U-Boot map file and the trace data received |
| from running that U-Boot. It produces a text output file. |
| |
| Options: |
| |
| -m <map_file> |
| Specify U-Boot map file |
| |
| -p <trace_file> |
| Specify profile/trace file |
| |
| Commands: |
| |
| dump-ftrace |
| Write a text dump of the file in Linux ftrace format to stdout |
| |
| |
| Viewing the Trace Data |
| ---------------------- |
| |
| You can use pytimechart for this (sudo apt-get install pytimechart might |
| work on your Debian-style machine, and use your favourite search engine to |
| obtain documentation). It expects the file to have a .txt extension. The |
| program has a terse user interface but is very convenient for viewing |
| U-Boot profile information. |
| |
| |
| Workflow Suggestions |
| -------------------- |
| |
| The following suggestions may be helpful if you are trying to reduce boot |
| time: |
| |
| 1. Enable CONFIG_BOOTSTAGE and CONFIG_BOOTSTAGE_REPORT. This should give |
| you a helpful overall snapshot of the boot time. |
| |
| 2. Build U-Boot with tracing and run it. Note the difference in boot time |
| (it is common for tracing to add 10% to the time). |
| |
| 3. Collect the trace information as described above. Use this to find where |
| all the time is being spent. |
| |
| 4. Take a look at that code and see if you can optimize it. Perhaps it is |
| possible to speed up the initialization of a device, or remove an unused |
| feature. |
| |
| 5. Rebuild, run and collect again. Compare your results. |
| |
| 6. Keep going until you run out of steam, or your boot is fast enough. |
| |
| |
| Configuring Trace |
| ----------------- |
| |
| There are a few parameters in the code that you may want to consider. |
| There is a function call depth limit (set to 15 by default). When the |
| call depth goes above this limit, no tracing information is recorded. |
| The maximum depth reached is recorded and displayed by the 'trace stats' |
| command. |
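
The effect of the depth limit can be sketched as follows. This is a simplified
model, not the actual trace implementation: the state structure, the counter
names and the exact comparison are assumptions, and only the default limit of
15 comes from the text above.

```c
/* Default limit quoted in the text; everything else is hypothetical */
#define EXAMPLE_DEPTH_LIMIT 15

struct example_trace_state {
	int depth;		/* current call depth */
	int max_depth;		/* deepest call depth observed so far */
	unsigned long traced;	/* entries that were recorded */
	unsigned long too_deep;	/* entries skipped because of the limit */
};

/* Model of a function-entry event */
static void example_enter(struct example_trace_state *st)
{
	st->depth++;
	if (st->depth > st->max_depth)
		st->max_depth = st->depth;

	if (st->depth <= EXAMPLE_DEPTH_LIMIT)
		st->traced++;	/* within the limit: record the call */
	else
		st->too_deep++;	/* beyond the limit: only count it */
}

/* Model of a function-exit event */
static void example_exit(struct example_trace_state *st)
{
	st->depth--;
}
```

In this model the maximum depth keeps being tracked beyond the limit, which
is how an observed depth of 16 can appear next to a limit of 15.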
| |
| |
| Future Work |
| ----------- |
| |
| Tracing could be a little tidier in some areas, for example providing |
| run-time configuration options for trace. |
| |
| Some other features that might be useful: |
| |
| - Trace filter to select which functions are recorded |
| - Sample-based profiling using a timer interrupt |
| - Better control over trace depth |
| - Compression of trace information |
| |
| |
| Simon Glass <sjg@chromium.org> |
| April 2013 |