| > [!IMPORTANT] | 
 | > This page was copied from https://github.com/dart-lang/sdk/wiki and needs review. | 
 | > Please [contribute](../CONTRIBUTING.md) changes to bring it up-to-date - | 
 | > removing this header - or send a CL to delete the file. | 
 |  | 
 | --- | 
 |  | 
A programming language and its core libraries sit near the bottom of a
developer's technology stack. Because of that, developers expect them to be
reliable and bug free. They expect new releases of the Dart SDK to not break any
of their existing programs, even in rare, strange edge cases. At the scale of
Dart today, even the darkest corner of the language or libraries still has users
who care about its behavior.
 |  | 
 | We engineer that reliability using automated tests. The tests for the language | 
 | and core libraries, which is what this doc is about, live in the SDK repo under | 
 | `tests/`. As of this writing, there are 25,329 Dart files in there, containing | 
 | over 1.5 million lines of code and over 95,000 individual assertions. Making | 
 | this even harder is that Dart includes a variety of tools and | 
 | implementations. We have multiple compilers, runtimes, and static analyzers. | 
 | Each supports several platforms and options. All told, we currently have 476 | 
 | supported, tested configurations. | 
 |  | 
 | Since our testing needs are both complex and unique, we have several custom test | 
 | formats and our own test runner for executing those tests. This doc explains how | 
 | to work with tests and use the test runner. | 
 |  | 
 | ## Concepts | 
 |  | 
 | There are several workflows that share the same concepts and terms, so let's get | 
 | those out of the way first. | 
 |  | 
 | *   **"test.py"**, **"test.dart"**, **test_runner** - The tool that runs tests. | 
 |     For many years, it had no real name, so was simply called "test.py" after | 
 |     the main entrypoint script used to invoke it. (It was originally written in | 
 |     Python before we had any functioning Dart implementations.) When we started | 
 |     migrating away from status files, we created a new entrypoint script, | 
 |     "test.dart". The pub package that contains all of the real code is named | 
 |     "test_runner" and lives at `pkg/test_runner`. | 
 |  | 
 | *   **Test suite** - A collection of test files that can be run. Most suites | 
 |     correspond to a top level directory under `tests/`: `language_2`, `lib_2`, | 
 |     and `co19_2` are each test suites. There are a couple of special suites | 
 |     like `pkg` whose files live elsewhere and have special handling in the test | 
 |     runner. | 
 |  | 
 |     Directories with the `_2` suffix contain pre-null-safety tests, while the | 
 |     corresponding suites without `_2` are the null-safe tests. (The `_2` | 
 |     suffix is a vestige of the migration from Dart 1.0 to Dart 2.0.) | 
 |  | 
 | *   **Test** - A test is a Dart file (which may reference other files) that the | 
 |     test runner will send to a Dart implementation and then validate the | 
 |     resulting behavior. Most test files are named with an `_test.dart` suffix. | 
 |  | 
*   **Configuration** - A specific combination of tools and options that can be
    used to invoke a test. For example, "run on the VM on Linux in debug mode",
    "compile to JavaScript with DDC and then run in Chrome", and "analyze using
    the analyzer package with its asserts enabled" are each configurations.
    Each configuration is a combination of architecture, mode, operating system,
    compiler, runtime and many other options. There are thousands of potential
    combinations. We officially support hundreds of points in that space by
    testing them on our bots.
 |  | 
 | *   **Bots**, **BuildBots**, **builders** - The infrastructure for automatically | 
 |     running the tests in the cloud across a number of configurations. Whenever a | 
 |     change lands on main, the bots run all of the tests to determine what | 
 |     behavior changed. | 
 |  | 
 | ## Expectations, outcomes, status files, and the results database | 
 |  | 
 | This concept is so important it gets its own section. The intent of the test | 
 | corpus is to ensure that the behavior we ship is the behavior we intend to ship. | 
 | In a perfect world, at every commit, every configuration of every tool would | 
 | pass every test. Alas, there are many complications: | 
 |  | 
 | *   Some tests deliberately test the error-reporting behavior of a tool. Compile | 
 |     errors are also specified behavior, so we have tests to guarantee things | 
 |     like "this program reports a compile error". In that case "pass" does *not* | 
 |     mean "executes the program without error" because the intent of the test is | 
 |     to report the error. | 
 |  | 
 | *   Some tests validate behavior that is not supported by some configurations. | 
 |     We have tests for "dart:io", but that library is not supported on the web. | 
 |     That means all of the "dart:io" tests "fail" on DDC and dart2js, but we of | 
 |     course intend and expect that to be true. | 
 |  | 
 | *   Some tests are for features that aren't implemented yet. We're constantly | 
 |     evolving and at any point in time, there is often some lag between the | 
 |     state of various tests and implementations. | 
 |  | 
 | *   Some tests are failing that shouldn't be, but they have been for some time | 
 |     and we don't currently intend to fix what's causing the failure. We should | 
 |     at some point, but we don't want this constant failure to drown out other, | 
 |     newer, unexpected failures. | 
 |  | 
All this means that it's not as simple as "run the test and if there are no
errors then great, otherwise blow up." At a high level, what we really want is:

* To encode the behavior we *intend* to ship.
* To know what behavior we *are* shipping.
* To know when that behavior *changes* and what commits cause it.
 |  | 
 | ### Expectation | 
 |  | 
 | You can think of each test as having an intent. A human reading the test's code | 
 | (and any special comments in it) should be able to predict the behavior the test | 
 | produces. For most tests, the intent is to run to completion without error. For | 
 | example: | 
 |  | 
 | ```dart | 
 | import 'package:expect/expect.dart'; | 
 |  | 
 | main() { | 
 |   Expect.equals(3, 1 + 2); | 
 | } | 
 | ``` | 
 |  | 
 | The `Expect.equals()` method throws an exception if the two arguments aren't | 
 | equal. So this test will complete silently if `1 + 2` is `3`. Otherwise, it will | 
 | throw an uncaught exception, which the test runner detects. So the intent is to | 
 | "pass" where "pass" means "run to completion". | 
 |  | 
 | Other tests exist to validate the error reporting behavior of the system: | 
 |  | 
 | ```dart | 
 | main() { | 
 |   int i = "not int"; //# 01: compile-time error | 
 | } | 
 | ``` | 
 |  | 
This test (a "multitest"; see below) says that a conforming tool should produce
a compile-time error when the marked line of code is included.
 |  | 
 | Before running a test, the test runner parses the test file and infers an | 
 | **expectation** from it—what the human author says a tool should do when | 
 | it runs the test. By default, the expectation is "pass", but some tests are | 
 | expected to produce runtime errors or compile-time errors. | 
 |  | 
 | ### Outcome | 
 |  | 
When the test runner runs a test on some tool, the tool does something. It can
report a compile error or report a runtime error. Maybe it hangs indefinitely
and the test runner has to shut it down and consider it a timeout. If it does
none of those things and exits cleanly, it "passes".
 |  | 
 | We call this the **outcome**. It's the actual observed behavior of the tool. | 
 |  | 
 | ### Status | 
 |  | 
 | In order to tell if a behavior *changes*, we need to record what the previous | 
 | behavior was the last time we ran the test under some configuration. We call | 
 | this the test's **status**. | 
 |  | 
 | In the past, status was recorded in a separate set of | 
 | [**status files**](Status-files.md). These live in the repo under `tests/`. | 
 | In order to change the expected | 
 | behavior of a test, you had to manually change the status file and then commit | 
 | that change. In practice, we found this doesn't scale to the number of tools and | 
 | configurations we have today. | 
 |  | 
 | Instead, we now store the results of previous test runs in a **results | 
 | database**. This database is updated automatically by the bots when tests | 
 | complete. Every time a bot runs a test suite for some configuration it gathers | 
 | up the outcome of every test and records that in the database, associated with | 
 | the commit that it ran against. That database is queried by tools (including the | 
 | test runner itself) to determine the status of any test for any supported | 
 | configuration, at any point in the commit stream. | 
 |  | 
 | #### Viewing the current status | 
 |  | 
 | The [current results app](https://dart-current-results.web.app/) shows the | 
 | current status of all tests, with the ability to filter by test name (path | 
 | prefix). | 
 |  | 
 | The [results feed](https://dart-ci.firebaseapp.com/) shows the most recent | 
 | changes to test statuses, with a tab for unapproved changes. | 
 |  | 
 | ### Skips and slows | 
 |  | 
 | However, we have not yet completely eliminated status files. They are still | 
 | around because they are still the source of truth for some hand-authored data | 
 | about what a test *should* do. In particular, some tests don't make sense for a | 
 | certain configuration at all and should be **skipped**. For example, there's no | 
 | point in running the "dart:io" tests on dart2js. | 
 |  | 
 | Today, the place a human says "skip these 'dart:io' tests if the configuration | 
 | uses dart2js" is in the *status* files. Likewise, some tests are particularly | 
 | slow on some configurations and we want to give the test runner more time before | 
 | it declares them timing out. That data is also stored in the status files. | 
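
For example, a status file section looks roughly like this (the test names here
are hypothetical; see the status file docs linked above for the full syntax):

```
[ $compiler == dart2js ]
io_test: SkipByDesign # "dart:io" is not supported on the web.
huge_program_test: Slow
```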
 |  | 
 | Eventually, we hope to move this data into the test itself. | 
 |  | 
 | ### Comparison and confusion | 
 |  | 
The previous three sections lay things out in a nice clean way, but the reality
is much more muddled. This is largely because we end up overloading words and
don't have clear ways to talk about the *combinations* of outcome, expectation,
and status.
 |  | 
 | For example, if a test "passes", it could mean: | 
 |  | 
 | *   The *outcome* is that it completes without error, even though the | 
 |     *expectation* is that it should not. | 
 | *   The *outcome* is that it completes without error and the *expectation* is | 
 |     that it does that. | 
 | *   The *expectation* is that it produces a compile error and the outcome is | 
 |     that it correctly produces that error. | 
 | *   The *outcome* matches the *status*, regardless of what they are. | 
 |  | 
 | If a test "fails", it could be: | 
 |  | 
 | *   The *outcome* is that it reports an error. | 
 | *   The *outcome* is that it does *not* and the *expectation* is that it | 
 |     *should*. | 
 | *   The *outcome* is that it reports an error, the *expectation* is that it | 
 |     should *not*, but the *status* is that it *does*. | 
 |  | 
 | Ugh, I could go on. All this really means is that the combination of outcome, | 
 | expectation, and status makes things confusing and you have to be careful when | 
 | talking about tests and trying to understand the output of tools. Now that | 
 | you're nice and confused... | 
 |  | 
 | ## How the test runner works | 
 |  | 
 | When you invoke the test runner, it walks all of the files in the specified | 
 | test suites. For each test file: | 
 |  | 
1.  **Parse the test file to figure out its expectation.** As you'll see below,
    there are a number of special marker comments you can use in a test to tell
    the test runner what the intent of the test is.
 |  | 
 | 2.  **Figure out what commands to execute with what arguments.** For a simple VM | 
 |     test, that may be just "run the VM and give it the path to the test file". | 
 |     For a web test, it's multiple commands: One to compile the test to | 
 |     JavaScript, and then a separate command to run the result in a JavaScript | 
 |     environment. There is custom code in the test runner for each pair of | 
 |     compiler and runtime to do this. | 
 |  | 
 | 3.  **Run the commands and wait for them to complete.** There's a whole little | 
 |     dependency graph build system in there that knows which commands depend on | 
 |     which ones and tries to run stuff in parallel when it can. | 
 |  | 
 | 4.  **Analyze the result of the commands to determine the test's outcome.** For | 
 |     each command the test runner invokes, it has code to parse the command's | 
 |     output. This is usually some combination of looking at the exit code, | 
 |     stdout, and stderr. | 
 |  | 
5.  **Compare the expectation to the outcome.** This produces a new value that
    the test runner vaguely calls "actual". It's sort of a higher-order outcome
    relative to the expectation. We should figure out a better word for it. Some
    examples:

    ```text
    expectation      + outcome          -> actual
    --------------------------------------------------
    Pass             + Pass             -> Pass
    CompileTimeError + CompileTimeError -> Pass
    Pass             + CompileTimeError -> CompileTimeError
    CompileTimeError + Pass             -> MissingCompileTimeError
    ```

    In other words, if the outcome and the expectation align, that's a "pass" in
    that the tool does what a human says it's supposed to. If they disagree,
    there is some kind of "failure": either an error was supposed to be reported
    and wasn't, or an unexpected error occurred.
 |  | 
6.  **Compare actual to the status and report any difference.** Now we know what
    the tool did relative to what a human said it's supposed to do. Next is
    figuring out how that result compares to the tool's *previous* behavior.

    If the result and status are the same, the test runner reports the test as
    passing. Otherwise, it reports a failure and shows you both the result and
    the status. To make things profoundly confusing, it refers to the status as
    "expectation". (A rough sketch of these last two steps follows this list.)
 |  | 
 | ### A confusing example | 
 |  | 
The fact that we have three levels of "result" which then get mixed together is
what makes this so confusing, but each level does serve a useful function. The
tools could definitely be clearer about how they present it.
 |  | 
 | Here's a maximally tricky example. Let's say we decide to change Dart and make | 
 | it a static error to add doubles and ints. You create this test: | 
 |  | 
```dart
void main() {
  1 + 2.3; //# 00: compile-time error
}
```
 |  | 
 | You run it on analyzer before the analyzer team has had a chance to implement | 
 | the new behavior. This test is brand new, so it has no existing status and | 
 | defaults to "Pass". You'll get: | 
 |  | 
 | *   Expectation: CompileTimeError. That's what the multitest marker comment | 
 |     means. | 
 | *   Outcome: Pass. In Dart today, this code has no errors, so the analyzer | 
 |     doesn't report any compile error. | 
 | *   Actual: MissingCompileTimeError. There was supposed to be an error reported, | 
 |     but the tool didn't report any, so the result was a failure to report an | 
 |     expected error. | 
 | *   Status: Pass. This is the default status since it's a new test. | 
 |  | 
 | Then the test runner prints out something like: | 
 |  | 
 | ``` | 
 | FAILED: dart2analyzer-none release_x64 language_2/int_double_plus_test/00 | 
 | Expected: Pass | 
 | Actual: MissingCompileTimeError | 
 | ``` | 
 |  | 
 | If you change the status to `MissingCompileTimeError` then it will "pass" and | 
 | not print a failure. If, instead, the analyzer team implements the new error, | 
 | then the outcome will become CompileTimeError. Since that aligns with the | 
 | expectation, the actual becomes Pass. That in turn matches the status, so the | 
 | whole test will report as successful. | 
 |  | 
 | In short: **When the test runner reports a test as succeeding it means the | 
 | difference between the tool's actual behavior and intended behavior has not | 
 | changed since the last time a human looked at it.** | 
 |  | 
 | ## Running tests locally | 
 |  | 
 | There are two ways to run tests (cue surprisingly accurate "the old deprecated | 
 | way and the way that doesn't work yet" joke). | 
 |  | 
 | ### Using test.py | 
 |  | 
 | The old entrypoint is "test.py": | 
 |  | 
 | ```sh | 
 | ./tools/test.py -n analyzer-mac language_2/spread_collections | 
 | ``` | 
 |  | 
The `-n analyzer-mac` means "run the tests using the named configuration
'analyzer-mac'". Configurations are defined in the "test matrix", which is a
giant JSON file at `tools/bots/test_matrix.json`. The
`language_2/spread_collections` argument is a "selector". The selector syntax is
a little strange but it's basically a test suite name followed by an optional
somewhat glob-like path for the subset of tests you want to run.
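
A few more selector examples (the test paths here are illustrative):

```sh
# Run an entire suite:
./tools/test.py -n analyzer-mac language_2

# Run all tests whose paths match a glob-like selector:
./tools/test.py -n analyzer-mac "language_2/spread_collections/*"
```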
 |  | 
When running the test runner through the `test.py` entrypoint, it does not look
up the current test status from the results database. Instead it just uses the
old status files. This is dubious because those status files are no longer being
maintained, so you will likely get spurious failures simply because the status
is out of date even if the tool is doing what it should.
 |  | 
 | Eventually, this way of running tests should be removed, along with the status | 
 | files. | 
 |  | 
 | ### Using test.dart | 
 |  | 
 | The new entrypoint is "test.dart": | 
 |  | 
 | ```sh | 
 | $ ./tools/sdks/dart-sdk/bin/dart tools/test.dart -n analyzer-asserts-mac | 
 | ``` | 
 |  | 
This ultimately uses the same test runner and works similarly to "test.py",
except that it reads test status from the **results database**.
 |  | 
 | ### Finding a configuration | 
 |  | 
 | Reading a several thousand line JSON file to find the name of the configuration | 
 | that matches the way you want to run a batch of tests is not fun. Usually, you | 
 | know what compiler and runtime you want to test, and you want something that | 
 | can run on your local machine. To help you find a configuration that matches | 
 | that, the test runner has a `--find-configurations` option. | 
 |  | 
 | Pass that, and the test runner prints out the list of configuration names that | 
 | match an optional set of filters you provide, which can be any combination of: | 
 |  | 
 | ``` | 
 | -m, --mode | 
 |       Mode in which to run the tests. | 
 |  | 
 | -c, --compiler | 
 |       How the Dart code should be compiled or statically processed. | 
 |  | 
 | -r, --runtime | 
 |       Where the tests should be run. | 
 |  | 
 |     --nnbd | 
 |       Which set of non-nullable type features to use. | 
 |  | 
 | -a, --arch | 
 |       The architecture to run tests for. | 
 |  | 
 | -s, --system | 
 |       The operating system to run tests on. | 
 | ``` | 
 |  | 
 | If you don't provide them, then `mode` defaults to `release`, `system` to your | 
 | machine's OS, and `arch` to `x64`. If you don't want those default filters, you | 
 | can use `all` for any of those options to not filter by that. | 
 |  | 
 | Pass `--help` to the test runner to see the allowed values for all of these. | 
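
For example, to list the configurations that compile with dart2js and run in
Chrome (using the defaults for the other filters), something like:

```sh
$ ./tools/sdks/dart-sdk/bin/dart tools/test.dart --find-configurations \
    -c dart2js -r chrome
```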
 |  | 
 | ### Local configurations | 
 |  | 
 | You typically want to run the tests locally using the same configuration that | 
 | the status results are pulled from. It would be strange, for example, to use the | 
 | status of some dart2js configuration to define the expected outcome of a batch | 
 | of VM tests. | 
 |  | 
 | But the bots don't always test every configuration and sometimes there is a | 
 | "nearby" configuration to what you want to run that will work for what you're | 
 | testing locally. The common case is if you're running tests on a Mac but the bot | 
 | only tests a Linux config. Results don't often vary by platform, so you can just | 
 | use the Linux results for your local run. To do that, there is an extra flag: | 
 |  | 
 | ```sh | 
 | $ ./tools/sdks/dart-sdk/bin/dart tools/test.dart -n analyzer-asserts-linux \ | 
 |     -N analyzer-asserts-mac | 
 | ``` | 
 |  | 
 | The `-n` (lowercase) flag says "use the results for this config". The `-N` | 
 | (uppercase) flag says "run this configuration locally". By default, when you | 
 | omit `-N`, it runs the same config locally that it gets results from. | 
 |  | 
 | ## Running tests on the bots and commit queue | 
 |  | 
 | Once you have some code working to your satisfaction, the next step is getting | 
 | it landed. This is where the bots come into play. Your first interaction with | 
 | them will be the **trybots** on the **commit queue**. When you send a change out | 
 | for code review, a human must approve it before you can land it. But it must | 
 | also survive a gauntlet of robots. | 
 |  | 
 | The commit queue is a system that automatically runs your not-yet-landed change | 
 | on a subset of all of the bots. This subset is the trybots. They run a selected | 
 | set of configurations to balance good coverage against completing in a timely | 
manner. If all of those tests pass, the change can land on main. Here "pass"
 | means that the test result is *the same as it was before your change.* A | 
 | *change* in status is what's considered "failure". | 
 |  | 
 | Of course, many times getting something working and changing the outcome of a | 
 | test from failing to passing is your exact goal! In that case, you can *approve* | 
 | the changes. This is how a human tells the bots that the change in status is | 
 | deliberate and desired. | 
 |  | 
 | This workflow is still in flux as part of the move away from status files. | 
 | **TODO: Rewrite or remove this once the approval workflow is gone or complete.** | 
 |  | 
 | Once you please all of the trybots, the change can land on main. After that | 
 | *all* of the bots pick up the change and run the tests on all of the supported | 
 | configurations against your change. Assuming we've picked a good set of trybots, | 
 | these should all pass. But sometimes you encounter a test that behaves as | 
 | expected on the subset of trybots but still changes the behavior on some other | 
 | configuration. So failures can happen here and the bots "turn red". When this | 
 | happens, you'll need to either approve the new outcomes, revert the change, or | 
 | land a fix. | 
 |  | 
 | ## Working on tests | 
 |  | 
 | Your job may mostly entail working on a Dart implementation, but there's still | 
 | a good chance you'll end up touching the tests themselves. If you are designing | 
 | or implementing a new feature, you will likely edit or add new tests. Our tests | 
 | have an unfortunate amount of technical debt, including bugs, so you may end up | 
 | finding and needing to fix some of that too. | 
 |  | 
 | SDK tests are sort of like "integration tests". They are complete Dart programs | 
 | that validate some Dart tool's behavior by having the tool run the program and | 
 | then verifying the tool's externally visible behavior. (Most tools also have | 
 | their own set of white box unit tests, but those are out of scope for this doc.) | 
 |  | 
The simplest and most typical test is a Dart script that the tool should run
without error and then exit with exit code zero. For example:
 |  | 
 | ```dart | 
 | import 'package:expect/expect.dart'; | 
 |  | 
 | main() { | 
 |   Expect.equals(3, 1 + 2); | 
 | } | 
 | ``` | 
 |  | 
 | We don't use the ["test"][test pkg] package for writing language and core | 
 | library tests. Since the behavior we're testing is so low level and fundamental, | 
 | we try to minimize the quantity and complexity of the code under test. The | 
 | "test" package is great but it uses tons of language and core library | 
 | functionality. If you're trying to fix a bug in that same functionality, it's no | 
 | fun if your test framework itself is broken by the same bug. | 
 |  | 
 | [test pkg]: https://pub.dev/packages/test | 
 |  | 
 | Instead, we have a much simpler package called ["expect"][expect pkg]. It's [a | 
 | single Dart file][expect lib] that exposes a rudimentary JUnit-like API for | 
 | making assertions about behavior. Fortunately, since the behavior we're testing | 
 | is also pretty low-level, that API is usually sufficient. | 
 |  | 
 | [expect pkg]: https://github.com/dart-lang/sdk/tree/main/pkg/expect | 
 | [expect lib]: https://github.com/dart-lang/sdk/blob/main/pkg/expect/lib/expect.dart | 
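
Here is a small, non-exhaustive sample of the kinds of assertions the package
provides (see `expect.dart` itself for the full list):

```dart
import 'package:expect/expect.dart';

main() {
  Expect.isTrue(1 < 2);
  Expect.equals('ab', 'a' + 'b');
  Expect.listEquals([1, 2, 3], [1, 2, 3]);
  Expect.throws(() => throw StateError('oops'));
}
```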
 |  | 
The way it works is that if an assertion fails, it throws an exception. That
exception unwinds the entire call stack, and the Dart implementation then exits
in a way that the test runner can detect and interpret as a runtime error.
 |  | 
 | If you are writing asynchronous tests, there is a separate tiny | 
 | ["async_helper"][async helper] library that talks to the test runner to ensure all | 
 | asynchronous operations performed by the test have a chance to complete. | 
 |  | 
 | [async helper]: https://github.com/dart-lang/sdk/tree/main/pkg/expect/lib/async_helper.dart | 
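
A minimal asynchronous test looks something like this, using `asyncStart()` and
`asyncEnd()` from the library linked above (a sketch of typical usage; the
import path follows that location):

```dart
import 'package:expect/async_helper.dart';
import 'package:expect/expect.dart';

main() {
  asyncStart(); // Tell the test runner an async operation is still pending.
  Future.delayed(Duration(milliseconds: 1)).then((_) {
    Expect.equals(3, 1 + 2);
    asyncEnd(); // All async work is done; the test may now complete.
  });
}
```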
 |  | 
 | With these two libraries, it's straightforward to write tests of expected correct | 
 | runtime behavior. You can also write tests for validating runtime *failures* by | 
 | using the helper functions for checking that certain exceptions are thrown: | 
 |  | 
 | ```dart | 
 | import 'package:expect/expect.dart'; | 
 |  | 
 | main() { | 
 |   var list = [0, 1]; | 
 |   Expect.throwsRangeError(() { | 
 |     list[2]; | 
 |   }); | 
 | } | 
 | ``` | 
 |  | 
 | This test correctly passes if executing `list[2]` throws a RangeError. | 
 |  | 
 | We also need to pin down the *static* behavior of Dart programs. The set of | 
 | compile errors are specified by the language and tools are expected to report | 
 | them correctly. Plain tests aren't enough for that because a test containing a | 
 | static error can't be run at all. To handle that, the test runner has built-in | 
 | support for other kinds of tests. They are: | 
 |  | 
 | ### Multitests | 
 |  | 
 | A simple multitest looks like this: | 
 |  | 
 | ```dart | 
 | // language_2/abstract_syntax_test.dart | 
 | class A { | 
 |   foo(); //# 00: compile-time error | 
 |   static bar(); //# 01: compile-time error | 
 | } | 
 | ``` | 
 |  | 
 | Each line containing `//#` marks that line as being owned by a certain multitest | 
 | section. The identifier after that is the name of the section. It's usually just | 
 | a number like `00` and `01` here, but can be anything. Then, after the colon, | 
 | you have an expected outcome. | 
 |  | 
The test runner takes this file and splits it into several new test files: one
for each section, plus an extra "none" file that includes no sections. Each file
contains all of the unmarked lines as well as the lines marked for its section.
So the above test gets split into three files:
 |  | 
 | ```dart | 
 | // language_2/abstract_syntax_test/none.dart | 
 | class A { | 
 |  | 
 |  | 
 | } | 
 | ``` | 
 |  | 
 | ```dart | 
 | // language_2/abstract_syntax_test/00.dart | 
 | class A { | 
 |   foo(); //# 00: compile-time error | 
 |  | 
 | } | 
 | ``` | 
 |  | 
 | ```dart | 
 | // language_2/abstract_syntax_test/01.dart | 
 | class A { | 
 |  | 
 |   static bar(); //# 01: compile-time error | 
 | } | 
 | ``` | 
 |  | 
 | This is literally done textually. Then the test runner runs each of those files | 
 | separately. The expectation for the file is whatever was set in its section's | 
 | marker. The "none" file is always expected to pass. | 
 |  | 
 | So, in this case, it expects `none.dart` to compile and run without error. It | 
 | expects `00.dart` and `01.dart` to report some kind of compile-time error | 
 | anywhere in the file. | 
 |  | 
 | A single file can contain multiple distinct sections which lets you test for | 
 | multiple different errors in a single file. It can distinguish between reporting | 
 | a compile-time error versus a runtime error. It's the best we had for a long | 
 | time, and works pretty well, so there are many many multitests. | 
 |  | 
But they aren't great. A single test file containing 20 multitest sections gets
split into 21 files, each of which has to pass through the entire compilation
and execution pipeline independently. That's pretty slow. You get better
granularity, but it's still not perfect: as long as *some* error is reported
*somewhere* in the file that contains the section's lines, the test runner
considers that good enough.
 |  | 
 | We see a lot of multitests that incorrectly pass because they are supposed to | 
 | detect some interesting static type error reported by the type checker. But they | 
 | actually pass because the author had a typo somewhere, which gets reported as a | 
 | syntax error at compile time. | 
 |  | 
To try to more precisely and easily write tests for static errors, there is yet
another kind of test...
 |  | 
 | ### Static error tests | 
 |  | 
 | This is the newest form of test. These tests are *only* for validating static | 
 | errors reported by one of the two front ends: CFE and analyzer. For all other | 
 | configurations, they are automatically skipped. | 
 |  | 
 | The previous multitest converted to a static error test looks like this: | 
 |  | 
 | ```dart | 
 | class A { | 
 | //    ^ | 
 | // [cfe] The non-abstract class 'A' is missing implementations for these members: | 
 |   foo(); | 
 | //^^^^^^ | 
 | // [analyzer] STATIC_WARNING.CONCRETE_CLASS_WITH_ABSTRACT_MEMBER | 
 |  | 
 |   static bar(); | 
 | //            ^ | 
 | // [analyzer] SYNTACTIC_ERROR.MISSING_FUNCTION_BODY | 
 | // [cfe] Expected a function body or '=>'. | 
 | } | 
 | ``` | 
 |  | 
 | Each group of adjacent line comments here defines an **error expectation**. The | 
 | first comment line defines the error's location. The line is the preceding line, | 
 | the column is the column containing the first caret, and the length is the | 
 | number of carets. If the preceding line is itself part of some other error | 
 | expectation, it will be skipped over, so you can define multiple errors that | 
 | are reported on the same line: | 
 |  | 
 | ```dart | 
 | int i = "not int" / 345; | 
 | //      ^^^^^^^^^ | 
 | // [analyzer] STATIC_WARNING.SOME_ERROR | 
 | //                  ^^^ | 
 | // [analyzer] STATIC_WARNING.ANOTHER_ERROR | 
 | ``` | 
 |  | 
 | In cases where the location doesn't neatly fit into this syntax—it either | 
 | starts before column 2 or spans multiple lines—an explicit syntax can be | 
 | used instead: | 
 |  | 
 | ```dart | 
 | var obj1 = [...(123)]; | 
 | // [error line 1, column 17, length 3] | 
 | // [analyzer] CompileTimeErrorCode.AMBIGUOUS_SET_OR_MAP_LITERAL_BOTH | 
 | // [cfe] Error: Unexpected type 'int' of a spread.  Expected 'dynamic' or an Iterable. | 
 | ``` | 
 |  | 
After the location comment line are lines defining how the analyzer and CFE
should report the error. First, a line comment starting with `[analyzer]`
followed by an analyzer error code specifies that the analyzer should report an
error at this location with this error code. If omitted, the analyzer is not
expected to report this error.
 |  | 
Finally, a line comment starting with `[cfe]` followed by an error message
specifies that the CFE should report an error with the given text at this
location. If omitted, the CFE is not expected to report an error here. If the
CFE error message is longer than a single line, you can have further line
comments after the initial `// [cfe]` one:
 |  | 
 | ```dart | 
 | var obj1 = [...(123)]; | 
 | //             ^^^^^ | 
 | // [cfe] Error: Unexpected type 'int' of a spread. | 
 | // Expected 'dynamic' or an Iterable. | 
 | // Another line of error message text. | 
 | ``` | 
 |  | 
 | When the test runner runs a test, it looks for and parses all error | 
 | expectations. If it finds any, the test is a static error test. It runs the test | 
 | once under the given configuration and validates that it produces all of the | 
 | expected errors at the expected locations. The test passes if all errors are | 
 | reported at the right location. With the analyzer front end, all expected error | 
 | codes must match. For the CFE, error messages must match. | 
 |  | 
 | Using a single invocation to report all errors requires our tools to do decent | 
 | error recovery and detect and report multiple errors. But our users expect that | 
 | behavior anyway, and this lets us validate that and get a significant | 
 | performance benefit. | 
 |  | 
 | #### Web static errors | 
 |  | 
 | Most static errors are reported by all Dart tools. The implementation of that | 
 | error reporting either comes from analyzer or the CFE and the above two static | 
 | error categories `[analyzer]` and `[cfe]` cover almost all static error tests. | 
 |  | 
 | However, the web compilers DDC and dart2js also have a few additional static | 
 | restrictions they place on Dart. Code that fails to meet those restrictions | 
 | produces compile-time errors. We have static error tests for those too, and | 
 | they use a separate `[web]` marker comment. Those tests are run on DDC and | 
 | dart2js to ensure the error is reported. | 
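
For example, a `[web]` expectation looks something like this (the message text
here is illustrative, not necessarily the exact wording the compilers produce):

```dart
const int x = 9007199254740993;
//            ^^^^^^^^^^^^^^^^
// [web] The integer literal 9007199254740993 can't be represented exactly in JavaScript.
```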
 |  | 
 | #### Divergent errors | 
 |  | 
 | Since we don't validate analyzer error messages or CFE error codes, there is | 
 | already some flexibility when the two front ends don't report identical errors. | 
 | But sometimes one front end may not report an error at all, or at a different | 
 | location. | 
 |  | 
 | To support that, the error code or message can be omitted. An error expectation | 
 | with no error code is ignored on analyzer. Conversely one with no message is | 
 | ignored on CFE. Note in the above example that CFE reports the error about | 
 | `foo()` on the class declaration while analyzer reports it at the declaration of | 
 | `foo()` itself. Those are two separate error expectations. | 
 |  | 
 | #### Unspecified errors | 
 |  | 
 | A good tool takes into account the processes around it. One challenge for us is | 
 | that tests for new errors are often authored before the implementations exist. | 
 | In the case of co19, those tests are authored by people outside of the team in a | 
 | separate Git repo. At that point in time, we don't know what the precise error | 
 | code or message will be. | 
 |  | 
 | To make this a little less painful, we allow an error to be "unspecified": | 
 |  | 
 | ```dart | 
 | var obj1 = [...(123)]; | 
 | //             ^^^^^^ | 
 | // [analyzer] unspecified | 
 | // [cfe] unspecified | 
 | ``` | 
 |  | 
The special word `unspecified` in place of an error code or message means the
error must be reported on the given line, but any code or message is acceptable.
Also, the column and length information is ignored. That enables this workflow:
 |  | 
 | 1.  Someone adds a test for an unimplemented feature using `unspecified` lines. | 
 | 2.  The analyzer or CFE team implements the feature and lands support. The tests | 
 |     start passing. | 
 | 3.  In order to pin down the precise behavior now that it's known, someone goes | 
 |     back and replaces `unspecified` with the actual error code or message. Now, | 
 |     any minor change will cause the test to break so we notice if, for example, | 
 |     a syntax error starts masking a type error. | 
 | 4.  The other front end team does the same. | 
 | 5.  Now the test is fully precise and passing. | 
 |  | 
The important points are that the tests start passing at step 2 and continue to
pass through the remaining steps. The syntax above is chosen to be easily
greppable so we can keep track of how many tests need to be made more precise.
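
For instance, continuing the example above, after steps 3 and 4 the
`unspecified` lines might be replaced with the precise code and message (reusing
the error code and message shown earlier in this doc; the exact location details
come from the tool's actual output):

```dart
var obj1 = [...(123)];
//             ^^^^^^
// [analyzer] CompileTimeErrorCode.AMBIGUOUS_SET_OR_MAP_LITERAL_BOTH
// [cfe] Error: Unexpected type 'int' of a spread.  Expected 'dynamic' or an Iterable.
```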
 |  | 
 | #### Automatic error generation | 
 |  | 
 | Step three in the above workflow is a chore. Run a test on some front end. Copy | 
 | the error output. Paste it into the test file in the right place. Turn it into a | 
 | comment. This needs to happen both for new tests and any time the error | 
 | reporting for a front end changes, which is frequent. The analyzer and CFE folks | 
 | are always improving the usability of their tools by tweaking error messages and | 
 | locations. | 
 |  | 
To make this less painful, we have a tool that will automatically take the
current output of either front end and insert the corresponding error
expectations into a test file for you. It can also remove existing error
expectations.
 |  | 
 | The tool is: | 
 |  | 
 | ```sh | 
 | $ dart pkg/test_runner/tool/update_static_error_tests.dart -u "**/abstract_syntax_test.dart" | 
 | ``` | 
 |  | 
 | It takes a couple of flags for what operation to perform (insert, remove, | 
 | update) and for what front end (analyzer, CFE, both) followed by a glob for | 
 | which files to modify. Then it goes through and updates all of the matching | 
 | tests. | 
 |  | 
This should give us very precise testing of static error behavior with little
manual effort and good execution performance.