# Performance with Native Code
How to assess performance of Dart and native code, and how to improve it.
## Profiling performance
| Tool | Platform | Primary Use Case | Measures (Dart CPU) | Measures (Native CPU) | Measures (Dart Heap) | Measures (Native Heap) |
| --------------------------------------- | --------- | --------------------------------------- | ---------------------------- | ------------------------ | -------------------- | ---------------------------------------------------------------- |
| [Dart DevTools] | All | Profiles Dart VM, UI jank, Dart heap | Yes | Opaque "Native" block | Yes | Tracks "External" VM-aware memory only; misses native-heap leaks |
| [Xcode Instruments (Time Profiler)] | iOS/macOS | Profiles native CPU call stacks | No | Yes (full symbolication) | No | No |
| [Xcode Instruments (Leaks/Allocations)] | iOS/macOS | Profiles native heap (malloc, mmap) | No | No | No | Yes |
| [Android Studio Profiler (CPU)] | Android | Profiles native C/C++ CPU execution | No | Yes (traces C++ calls) | No | No |
| [Perfetto (heapprofd)] | Android | Advanced native heap profiling | No | No | No | Yes (traces malloc/free call stacks) |
| [Linux perf] | Linux | Unified Dart AOT + Native CPU profiling | Yes (requires special flags) | Yes | No | No |
| [Visual Studio CPU Usage Profiler] | Windows | Profiles native C/C++ CPU execution | No | Yes (traces C++ calls) | No | No |
| [WPA (Heap Analysis)] | Windows | Advanced native heap profiling | No | No | No | Yes (traces malloc/free call stacks) |
<!-- TODO: Add documentation for the other tools. -->
### Dart DevTools
To assess only the performance of the Dart code, treating native code as a
black box, use the Dart performance tooling.
See the documentation at https://dart.dev/tools/dart-devtools and
https://docs.flutter.dev/perf. For FFI specifically, the most relevant tools
are the CPU profiler (https://docs.flutter.dev/tools/devtools/cpu-profiler) and
the timeline events tab
(https://docs.flutter.dev/tools/devtools/performance#timeline-events-tab).
For synchronous FFI calls you can add synchronous timeline events, and for
asynchronous code (using async callbacks or helper isolates) you can use async
events.
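
As a minimal sketch of both kinds of events, the following uses
`Timeline.timeSync` and `TimelineTask` from `dart:developer`. The functions
`nativeChecksum` and `nativeChecksumAsync` are hypothetical stand-ins for FFI
calls, written in pure Dart so that the sketch runs on its own.

```dart
import 'dart:developer';

// Hypothetical stand-in for a synchronous FFI call.
int nativeChecksum(List<int> bytes) =>
    bytes.fold<int>(0, (sum, b) => sum + b);

// Hypothetical stand-in for native work that completes asynchronously
// (for example, via an async callback or a helper isolate).
Future<int> nativeChecksumAsync(List<int> bytes) async =>
    nativeChecksum(bytes);

// Wraps the synchronous call in a synchronous timeline event.
int checksumWithSyncEvent(List<int> bytes) =>
    Timeline.timeSync('nativeChecksum', () => nativeChecksum(bytes));

// Wraps the asynchronous call in an async timeline event.
Future<int> checksumWithAsyncEvent(List<int> bytes) async {
  final task = TimelineTask()..start('nativeChecksumAsync');
  try {
    return await nativeChecksumAsync(bytes);
  } finally {
    // Always close the event, even if the call throws.
    task.finish();
  }
}

Future<void> main() async {
  print(checksumWithSyncEvent([1, 2, 3]));
  print(await checksumWithAsyncEvent([1, 2, 3]));
}
```

The events should then appear under the chosen names in the timeline when the
program runs with DevTools attached.
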
### `perf` on Linux
To see both Dart and native symbols in a flame graph, you can use `perf` on
Linux.
To run the [FfiCall benchmark] in JIT mode with `perf`:
```
$ perf record -g dart --generate-perf-events-symbols benchmarks/FfiCall/dart/FfiCall.dart && \
perf report --hierarchy
```
Note that Flutter apps are deployed in AOT mode, so prefer profiling in AOT
mode.
For AOT, there is currently [no single
command](https://github.com/dart-lang/sdk/issues/54254). You need to use the
`precompiler2` and `dart_precompiled_runtime2` commands from the Dart SDK. See
[building the Dart SDK] for how to build the Dart SDK.
```
$ pkg/vm/tool/precompiler2 benchmarks/FfiCall/dart/FfiCall.dart benchmarks/FfiCall/dart/FfiCall.dart.bin && \
perf record -g pkg/vm/tool/dart_precompiled_runtime2 --generate-perf-events-symbols benchmarks/FfiCall/dart/FfiCall.dart.bin && \
perf report --hierarchy
```
To analyze a performance issue in Flutter, it is best to reproduce the issue in
standalone Dart.
## Improving performance
There are some typical patterns to improve performance:
* To avoid dropped frames, move long-running FFI calls to a helper isolate.
* Avoid copying data where possible:
  * Keep data in native memory, operating on [`Pointer`][]s and using
    [`asTypedList`][] to convert the pointers into [`TypedData`][].
  * For short calls, if the memory is in Dart, avoid copying by using leaf calls
    ([`isLeaf`][], [`isLeaf` (2)][], [`isLeaf` (3)][]) and [`address`][]. (Leaf
    calls prevent the Dart GC from running on any isolate for their duration,
    which makes it safe to give native code a pointer into a Dart object.)
  * Use [`Isolate.exit`][] to send large data from a helper isolate to the main
    isolate after a large computation.
* For many small calls, limit the overhead per call. This makes a significant
  difference for calls shorter than 1 us (one millionth of a second), and is
  worth considering for calls of up to 10 us.
  * Use leaf calls ([`isLeaf`][], [`isLeaf` (2)][], [`isLeaf` (3)][]).
  * Prefer using [build hooks][] with [`Native`][] `external`
    functions over [`DynamicLibrary.lookupFunction`][] and
    [`Pointer.asFunction`][].
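
The helper-isolate and `Isolate.exit` patterns from the list above can be
sketched as follows. This is an illustration under assumptions, not a
prescribed implementation: `produceLargeResult`, `_helperMain` and
`computeOnHelperIsolate` are hypothetical names, and the pure-Dart loop stands
in for a long-running FFI call.

```dart
import 'dart:isolate';
import 'dart:typed_data';

// Hypothetical stand-in for a long-running computation (for example, a
// blocking FFI call) that produces a large result.
Uint8List produceLargeResult(int length) {
  final result = Uint8List(length);
  for (var i = 0; i < length; i++) {
    result[i] = i & 0xff;
  }
  return result;
}

// Entry point of the helper isolate.
void _helperMain((SendPort, int) args) {
  final (replyTo, length) = args;
  final result = produceLargeResult(length);
  // Terminates this isolate and sends `result` as its final message, which
  // lets the runtime hand the data over more cheaply than a regular send.
  Isolate.exit(replyTo, result);
}

// Runs the computation off the calling isolate and returns its result.
Future<Uint8List> computeOnHelperIsolate(int length) async {
  final port = ReceivePort();
  await Isolate.spawn(_helperMain, (port.sendPort, length));
  // `first` completes with the helper's final message and closes the port.
  return await port.first as Uint8List;
}

Future<void> main() async {
  final data = await computeOnHelperIsolate(1 << 20);
  print(data.length);
}
```

The calling isolate stays free to render frames while the helper runs, and
only receives the finished buffer.
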
For reference, the [FfiCall benchmark][] reports the following times per 1000
FFI calls in AOT mode on Linux x64:
```
FfiCall.Uint8x01(RunTime): 234.61104068226345 us.
FfiCall.Uint8x01Leaf(RunTime): 71.9994712538334 us.
FfiCall.Uint8x01Native(RunTime): 216.07292770828917 us.
FfiCall.Uint8x01NativeLeaf(RunTime): 27.64136415181509 us.
```
A single native leaf call takes about 28 ns, while a non-leaf `asFunction` call
takes about 235 ns. So for calls taking ~1000 ns, the roughly 207 ns saved per
call amounts to a speedup of about 20%.
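
The call styles compared above can be sketched as follows. This is a sketch
under assumptions: it binds to libc's `abs`, assumes a Linux or macOS host
where that symbol is visible in the running process, and assumes the `Native`
declaration resolves against the process when no build hook provides an asset.
The names `AbsNative`, `AbsDart` and `absNativeLeaf` are chosen for
illustration only.

```dart
import 'dart:ffi';

// C signature and Dart signature of libc's `int abs(int)`.
typedef AbsNative = Int Function(Int);
typedef AbsDart = int Function(int);

// `Native` external function marked as a leaf call.
// Assumption: the symbol is found in the running process.
@Native<AbsNative>(symbol: 'abs', isLeaf: true)
external int absNativeLeaf(int value);

void main() {
  final pointer =
      DynamicLibrary.process().lookup<NativeFunction<AbsNative>>('abs');

  // Non-leaf call through `asFunction` (the default).
  final absNonLeaf = pointer.asFunction<AbsDart>();

  // Leaf call through `asFunction`.
  final absLeaf = pointer.asFunction<AbsDart>(isLeaf: true);

  print(absNonLeaf(-1));
  print(absLeaf(-2));
  print(absNativeLeaf(-3));
}
```

All three bindings call the same native function; they differ only in the
per-call overhead of the transition between Dart and native code.
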
## Community sources
* (Video) Using Dart FFI for Compute-Heavy Tasks:
  https://www.youtube.com/watch?v=eJR5C0VRCjU
* (Video) Maximize Speed with Dart FFI: Beginners Guide to High-Performance
  Integration: https://www.youtube.com/watch?v=HF8gHAakb1Q
[`address`]: https://api.dart.dev/dart-ffi/StructAddress/address.html
[`asTypedList`]: https://api.dart.dev/dart-ffi/Uint8Pointer/asTypedList.html
[`DynamicLibrary.lookupFunction`]: https://api.dart.dev/dart-ffi/DynamicLibraryExtension/lookupFunction.html
[`isLeaf` (2)]: https://api.dart.dev/dart-ffi/NativeFunctionPointer/asFunction.html
[`isLeaf` (3)]: https://api.dart.dev/dart-ffi/DynamicLibraryExtension/lookupFunction.html
[`isLeaf`]: https://api.dart.dev/dart-ffi/Native/isLeaf.html
[`Isolate.exit`]: https://api.dart.dev/dart-isolate/Isolate/exit.html
[`Native`]: https://api.dart.dev/dart-ffi/Native-class.html
[`Pointer.asFunction`]: https://api.dart.dev/dart-ffi/NativeFunctionPointer/asFunction.html
[`Pointer`]: https://api.dart.dev/dart-ffi/Pointer-class.html
[`TypedData`]: https://api.dart.dev/dart-typed_data/TypedData-class.html
[Android Studio Profiler (CPU)]: https://developer.android.com/studio/profile
[build hooks]: https://dart.dev/tools/hooks
[building the Dart SDK]: https://github.com/dart-lang/sdk/blob/main/docs/Building.md
[Dart DevTools]: https://dart.dev/tools/dart-devtools
[FfiCall benchmark]: https://github.com/dart-lang/sdk/blob/main/benchmarks/FfiCall/dart/FfiCall.dart
[Linux perf]: https://perfwiki.github.io/main/
[Perfetto (heapprofd)]: https://perfetto.dev/
[Visual Studio CPU Usage Profiler]: https://learn.microsoft.com/en-us/visualstudio/profiling/cpu-usage
[WPA (Heap Analysis)]: https://learn.microsoft.com/en-us/windows-hardware/test/wpt/windows-performance-analyzer
[Xcode Instruments (Leaks/Allocations)]: https://developer.apple.com/documentation/xcode/gathering-information-about-memory-use
[Xcode Instruments (Time Profiler)]: https://developer.apple.com/tutorials/instruments