|  | > [!IMPORTANT] | 
|  | > This page was copied from https://github.com/dart-lang/sdk/wiki and needs review. | 
|  | > Please [contribute](../CONTRIBUTING.md) changes to bring it up-to-date - | 
|  | > removing this header - or send a CL to delete the file. | 
|  |  | 
|  | --- | 
|  |  | 
|  | A programming language and its core libraries sit near the bottom of a | 
|  | developer's technology stack. Because of that, developers expect them to be reliable and | 
|  | bug free. They expect new releases of the Dart SDK to not break any of their | 
|  | existing programs, even in rare, strange edge cases. At the scale of Dart today, | 
|  | even the darkest corner of the language or libraries still has users who care | 
|  | about its behavior. | 
|  |  | 
|  | We engineer that reliability using automated tests. The tests for the language | 
|  | and core libraries, which is what this doc is about, live in the SDK repo under | 
|  | `tests/`. As of this writing, there are 25,329 Dart files in there, containing | 
|  | over 1.5 million lines of code and over 95,000 individual assertions. Making | 
|  | this even harder is that Dart includes a variety of tools and | 
|  | implementations. We have multiple compilers, runtimes, and static analyzers. | 
|  | Each supports several platforms and options. All told, we currently have 476 | 
|  | supported, tested configurations. | 
|  |  | 
|  | Since our testing needs are both complex and unique, we have several custom test | 
|  | formats and our own test runner for executing those tests. This doc explains how | 
|  | to work with tests and use the test runner. | 
|  |  | 
|  | ## Concepts | 
|  |  | 
|  | There are several workflows that share the same concepts and terms, so let's get | 
|  | those out of the way first. | 
|  |  | 
|  | *   **"test.py"**, **"test.dart"**, **test_runner** - The tool that runs tests. | 
|  | For many years, it had no real name, so it was simply called "test.py" after | 
|  | the main entrypoint script used to invoke it. (It was originally written in | 
|  | Python before we had any functioning Dart implementations.) When we started | 
|  | migrating away from status files, we created a new entrypoint script, | 
|  | "test.dart". The pub package that contains all of the real code is named | 
|  | "test_runner" and lives at `pkg/test_runner`. | 
|  |  | 
|  | *   **Test suite** - A collection of test files that can be run. Most suites | 
|  | correspond to a top level directory under `tests/`: `language_2`, `lib_2`, | 
|  | and `co19_2` are each test suites. There are a couple of special suites | 
|  | like `pkg` whose files live elsewhere and have special handling in the test | 
|  | runner. | 
|  |  | 
|  | Directories with the `_2` suffix contain pre-null-safety tests, while the | 
|  | corresponding suites without `_2` are the null-safe tests. (The `_2` | 
|  | suffix is a vestige of the migration from Dart 1.0 to Dart 2.0.) | 
|  |  | 
|  | *   **Test** - A test is a Dart file (which may reference other files) that the | 
|  | test runner will send to a Dart implementation and then validate the | 
|  | resulting behavior. Most test files are named with an `_test.dart` suffix. | 
|  |  | 
|  | *   **Configuration** - A specific combination of tools and options that can be | 
|  | used to invoke a test. For example, "run on the VM on Linux in debug mode" | 
|  | is a configuration, as are "compile to JavaScript with DDC and then run in | 
|  | Chrome" and "analyze using the analyzer package with its asserts enabled". | 
|  | Each configuration is a combination of architecture, mode, operating system, | 
|  | compiler, runtime and many other options. There are thousands of potential | 
|  | combinations. We officially support hundreds of points in that space by | 
|  | testing them on our bots. | 
|  |  | 
|  | *   **Bots**, **BuildBots**, **builders** - The infrastructure for automatically | 
|  | running the tests in the cloud across a number of configurations. Whenever a | 
|  | change lands on main, the bots run all of the tests to determine what | 
|  | behavior changed. | 
|  |  | 
|  | ## Expectations, outcomes, status files, and the results database | 
|  |  | 
|  | This concept is so important it gets its own section. The intent of the test | 
|  | corpus is to ensure that the behavior we ship is the behavior we intend to ship. | 
|  | In a perfect world, at every commit, every configuration of every tool would | 
|  | pass every test. Alas, there are many complications: | 
|  |  | 
|  | *   Some tests deliberately test the error-reporting behavior of a tool. Compile | 
|  | errors are also specified behavior, so we have tests to guarantee things | 
|  | like "this program reports a compile error". In that case "pass" does *not* | 
|  | mean "executes the program without error" because the intent of the test is | 
|  | to report the error. | 
|  |  | 
|  | *   Some tests validate behavior that is not supported by some configurations. | 
|  | We have tests for "dart:io", but that library is not supported on the web. | 
|  | That means all of the "dart:io" tests "fail" on DDC and dart2js, but we of | 
|  | course intend and expect that to be true. | 
|  |  | 
|  | *   Some tests are for features that aren't implemented yet. We're constantly | 
|  | evolving and at any point in time, there is often some lag between the | 
|  | state of various tests and implementations. | 
|  |  | 
|  | *   Some tests are failing that shouldn't be, but they have been for some time | 
|  | and we don't currently intend to fix what's causing the failure. We should | 
|  | at some point, but we don't want this constant failure to drown out other, | 
|  | newer, unexpected failures. | 
|  |  | 
|  | All this means that it's not as simple as "run test and if no errors then great, | 
|  | otherwise blow up." At a high level, what we really want is: | 
|  |  | 
|  | * To encode the behavior we *intend* to ship. | 
|  | * To know what behavior we *are* shipping. | 
|  | * To know when that behavior *changes* and what commits cause it. | 
|  |  | 
|  | ### Expectation | 
|  |  | 
|  | You can think of each test as having an intent. A human reading the test's code | 
|  | (and any special comments in it) should be able to predict the behavior the test | 
|  | produces. For most tests, the intent is to run to completion without error. For | 
|  | example: | 
|  |  | 
|  | ```dart | 
|  | import 'package:expect/expect.dart'; | 
|  |  | 
|  | main() { | 
|  | Expect.equals(3, 1 + 2); | 
|  | } | 
|  | ``` | 
|  |  | 
|  | The `Expect.equals()` method throws an exception if the two arguments aren't | 
|  | equal. So this test will complete silently if `1 + 2` is `3`. Otherwise, it will | 
|  | throw an uncaught exception, which the test runner detects. So the intent is to | 
|  | "pass" where "pass" means "run to completion". | 
|  |  | 
|  | Other tests exist to validate the error reporting behavior of the system: | 
|  |  | 
|  | ```dart | 
|  | main() { | 
|  | int i = "not int"; //# 01: compile-time error | 
|  | } | 
|  | ``` | 
|  |  | 
|  | This test (a "multitest", see below) says that a conforming tool should produce | 
|  | a compile-time error when the second line of code is included. | 
|  |  | 
|  | Before running a test, the test runner parses the test file and infers an | 
|  | **expectation** from it—what the human author says a tool should do when | 
|  | it runs the test. By default, the expectation is "pass", but some tests are | 
|  | expected to produce runtime errors or compile-time errors. | 
|  |  | 
|  | ### Outcome | 
|  |  | 
|  | When the test runner runs a test on some tool, the tool does something. It can | 
|  | report a compile error or report a runtime error. Maybe it hangs indefinitely | 
|  | and the test runner has to shut it down and consider it a time out. If it does | 
|  | none of those things and exits cleanly, it "passes". | 
|  |  | 
|  | We call this the **outcome**. It's the actual observed behavior of the tool. | 
|  |  | 
|  | ### Status | 
|  |  | 
|  | In order to tell if a behavior *changes*, we need to record what the previous | 
|  | behavior was the last time we ran the test under some configuration. We call | 
|  | this the test's **status**. | 
|  |  | 
|  | In the past, status was recorded in a separate set of | 
|  | [**status files**](Status-files.md). These live in the repo under `tests/`. | 
|  | In order to change the expected | 
|  | behavior of a test, you had to manually change the status file and then commit | 
|  | that change. In practice, we found this doesn't scale to the number of tools and | 
|  | configurations we have today. | 
|  |  | 
|  | Instead, we now store the results of previous test runs in a **results | 
|  | database**. This database is updated automatically by the bots when tests | 
|  | complete. Every time a bot runs a test suite for some configuration it gathers | 
|  | up the outcome of every test and records that in the database, associated with | 
|  | the commit that it ran against. That database is queried by tools (including the | 
|  | test runner itself) to determine the status of any test for any supported | 
|  | configuration, at any point in the commit stream. | 
|  |  | 
|  | #### Viewing the current status | 
|  |  | 
|  | The [current results app](https://dart-current-results.web.app/) shows the | 
|  | current status of all tests, with the ability to filter by test name (path | 
|  | prefix). | 
|  |  | 
|  | The [results feed](https://dart-ci.firebaseapp.com/) shows the most recent | 
|  | changes to test statuses, with a tab for unapproved changes. | 
|  |  | 
|  | ### Skips and slows | 
|  |  | 
|  | However, we have not yet completely eliminated status files. They are still | 
|  | around because they are still the source of truth for some hand-authored data | 
|  | about what a test *should* do. In particular, some tests don't make sense for a | 
|  | certain configuration at all and should be **skipped**. For example, there's no | 
|  | point in running the "dart:io" tests on dart2js. | 
|  |  | 
|  | Today, the place a human says "skip these 'dart:io' tests if the configuration | 
|  | uses dart2js" is in the *status* files. Likewise, some tests are particularly | 
|  | slow on some configurations and we want to give the test runner more time before | 
|  | it declares them timing out. That data is also stored in the status files. | 
|  |  | 
|  | Eventually, we hope to move this data into the test itself. | 
|  |  | 
|  | ### Comparison and confusion | 
|  |  | 
|  | The previous three sections lay things out in a nice clean way, but the reality | 
|  | is much more muddled. This is largely because we end up overloading words and | 
|  | don't have clear ways to talk about the *combinations* of outcome, expectation, | 
|  | and status. | 
|  |  | 
|  | For example, if a test "passes", it could mean: | 
|  |  | 
|  | *   The *outcome* is that it completes without error, even though the | 
|  | *expectation* is that it should not. | 
|  | *   The *outcome* is that it completes without error and the *expectation* is | 
|  | that it does that. | 
|  | *   The *expectation* is that it produces a compile error and the *outcome* is | 
|  | that it correctly produces that error. | 
|  | *   The *outcome* matches the *status*, regardless of what they are. | 
|  |  | 
|  | If a test "fails", it could be: | 
|  |  | 
|  | *   The *outcome* is that it reports an error. | 
|  | *   The *outcome* is that it does *not* and the *expectation* is that it | 
|  | *should*. | 
|  | *   The *outcome* is that it reports an error, the *expectation* is that it | 
|  | should *not*, but the *status* is that it *does*. | 
|  |  | 
|  | Ugh, I could go on. All this really means is that the combination of outcome, | 
|  | expectation, and status makes things confusing and you have to be careful when | 
|  | talking about tests and trying to understand the output of tools. Now that | 
|  | you're nice and confused... | 
|  |  | 
|  | ## How the test runner works | 
|  |  | 
|  | When you invoke the test runner, it walks all of the files in the specified | 
|  | test suites. For each test file: | 
|  |  | 
|  | 1.  **Parse the test file to figure out its expectation.** As you'll see below, | 
|  | there are a number of special marker comments you can use in a test to tell | 
|  | the test runner what the intent of the test is. | 
|  |  | 
|  | 2.  **Figure out what commands to execute with what arguments.** For a simple VM | 
|  | test, that may be just "run the VM and give it the path to the test file". | 
|  | For a web test, it's multiple commands: One to compile the test to | 
|  | JavaScript, and then a separate command to run the result in a JavaScript | 
|  | environment. There is custom code in the test runner for each pair of | 
|  | compiler and runtime to do this. | 
|  |  | 
|  | 3.  **Run the commands and wait for them to complete.** There's a whole little | 
|  | dependency graph build system in there that knows which commands depend on | 
|  | which ones and tries to run stuff in parallel when it can. | 
|  |  | 
|  | 4.  **Analyze the result of the commands to determine the test's outcome.** For | 
|  | each command the test runner invokes, it has code to parse the command's | 
|  | output. This is usually some combination of looking at the exit code, | 
|  | stdout, and stderr. | 
|  |  | 
|  | 5.  **Compare the expectation to the outcome.** This produces a new value that | 
|  | the test runner vaguely calls "actual". It's sort of a higher-order outcome | 
|  | relative to the expectation. We should figure out a better word for it. Some | 
|  | examples: | 
|  |  | 
|  | ```text | 
|  | expectation      + outcome          -> actual | 
|  | -------------------------------------------------- | 
|  | Pass             + Pass             -> Pass | 
|  | CompileTimeError + CompileTimeError -> Pass | 
|  | Pass             + CompileTimeError -> CompileTimeError | 
|  | CompileTimeError + Pass             -> MissingCompileTimeError | 
|  | ``` | 
|  |  | 
|  | In other words, if the outcome and the expectation align, that's a "pass" in | 
|  | that the tool does what a human says it's supposed to. If they disagree, | 
|  | there is some kind of "failure": either an error was supposed to be | 
|  | reported and wasn't, or an unexpected error occurred. | 
|  |  | 
|  | 6.  **Compare actual to the status and report any difference.** Now we know what | 
|  | the tool did relative to what a human said it's supposed to do. Next is | 
|  | figuring out how that result compares to the tool's *previous* behavior. | 
|  |  | 
|  | If the result and status are the same, the test runner reports the test as | 
|  | passing. Otherwise, it reports a failure and shows you both the result | 
|  | and status. To make things profoundly confusing, it refers to the status as | 
|  | "expectation". | 
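
Steps 5 and 6 can be sketched in a few lines of Dart. This is a rough illustration, not the test runner's real code: the `Result` enum, `computeActual()`, `label()`, and `describeChange()` are all hypothetical names, and only the combinations shown in the table above are modeled.

```dart
/// Hypothetical result values, named after the labels in the table above.
enum Result { pass, compileTimeError, runtimeError, missingCompileTimeError }

/// Step 5: combine what the test's author intended with what the tool did.
Result computeActual(Result expectation, Result outcome) {
  // The tool did what the test says it should, whatever that was.
  if (expectation == outcome) return Result.pass;
  // An error was supposed to be reported, but the tool accepted the program.
  if (expectation == Result.compileTimeError && outcome == Result.pass) {
    return Result.missingCompileTimeError;
  }
  // Otherwise something unexpected happened; report what the tool did.
  return outcome;
}

/// Formats an enum value the way the table spells it, e.g. "CompileTimeError".
String label(Result result) =>
    result.name[0].toUpperCase() + result.name.substring(1);

/// Step 6: report only when "actual" differs from the recorded status.
/// Returns null when the two agree, meaning there is nothing to report.
String? describeChange(String test, Result actual, Result status) {
  if (actual == status) return null;
  return 'FAILED: $test\nExpected: ${label(status)}\nActual: ${label(actual)}';
}

void main() {
  // The test expects a compile error, but the tool accepted the program.
  final actual = computeActual(Result.compileTimeError, Result.pass);
  // The recorded status is "Pass", so the difference gets reported.
  print(describeChange('some_suite/some_test', actual, Result.pass));
}
```

Note that `describeChange()` labels the status as "Expected", which mirrors the confusing naming described in step 6.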
|  |  | 
|  | ### A confusing example | 
|  |  | 
|  | The fact that we have three levels of "result" which then get mixed together is | 
|  | what makes this so confusing, but each level does serve a useful function. The | 
|  | tools could definitely be clearer about how it's presented. | 
|  |  | 
|  | Here's a maximally tricky example. Let's say we decide to change Dart and make | 
|  | it a static error to add doubles and ints. You create this test: | 
|  |  | 
|  | ```dart | 
|  | void main() { | 
|  | 1 + 2.3; //# 00: compile-time error | 
|  | } | 
|  | ``` | 
|  |  | 
|  | You run it on analyzer before the analyzer team has had a chance to implement | 
|  | the new behavior. This test is brand new, so it has no existing status and | 
|  | defaults to "Pass". You'll get: | 
|  |  | 
|  | *   Expectation: CompileTimeError. That's what the multitest marker comment | 
|  | means. | 
|  | *   Outcome: Pass. In Dart today, this code has no errors, so the analyzer | 
|  | doesn't report any compile error. | 
|  | *   Actual: MissingCompileTimeError. There was supposed to be an error reported, | 
|  | but the tool didn't report any, so the result was a failure to report an | 
|  | expected error. | 
|  | *   Status: Pass. This is the default status since it's a new test. | 
|  |  | 
|  | Then the test runner prints out something like: | 
|  |  | 
|  | ```text | 
|  | FAILED: dart2analyzer-none release_x64 language_2/int_double_plus_test/00 | 
|  | Expected: Pass | 
|  | Actual: MissingCompileTimeError | 
|  | ``` | 
|  |  | 
|  | If you change the status to `MissingCompileTimeError` then it will "pass" and | 
|  | not print a failure. If, instead, the analyzer team implements the new error, | 
|  | then the outcome will become CompileTimeError. Since that aligns with the | 
|  | expectation, the actual becomes Pass. That in turn matches the status, so the | 
|  | whole test will report as successful. | 
|  |  | 
|  | In short: **When the test runner reports a test as succeeding it means the | 
|  | difference between the tool's actual behavior and intended behavior has not | 
|  | changed since the last time a human looked at it.** | 
|  |  | 
|  | ## Running tests locally | 
|  |  | 
|  | There are two ways to run tests (cue surprisingly accurate "the old deprecated | 
|  | way and the way that doesn't work yet" joke). | 
|  |  | 
|  | ### Using test.py | 
|  |  | 
|  | The old entrypoint is "test.py": | 
|  |  | 
|  | ```sh | 
|  | ./tools/test.py -n analyzer-mac language_2/spread_collections | 
|  | ``` | 
|  |  | 
|  | The `-n analyzer-mac` means "run the tests using the 'analyzer-mac' named | 
|  | configurations". Configurations are defined in the "test matrix", which is a | 
|  | giant JSON file at `tools/bots/test_matrix.json`. The | 
|  | `language_2/spread_collections` argument is a "selector". The selector syntax is | 
|  | a little strange but it's basically a test suite name followed by an optional | 
|  | somewhat glob-like path for the subset of tests you want to run. | 
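
To make the shape of a selector concrete, here is a loose sketch of splitting one into its suite name and path. The `Selector` class is hypothetical, and the matching rule (a `*` wildcard plus prefix matching) is only an assumption standing in for the runner's real "somewhat glob-like" behavior.

```dart
/// Hypothetical model of a selector such as `language_2/spread_collections`.
class Selector {
  final String suite;

  /// Null when the selector names a whole suite.
  final String? pathPattern;

  Selector(this.suite, this.pathPattern);

  /// Splits at the first `/`: the suite name comes first, the rest is the path.
  factory Selector.parse(String text) {
    final slash = text.indexOf('/');
    if (slash == -1) return Selector(text, null);
    return Selector(text.substring(0, slash), text.substring(slash + 1));
  }

  /// Assumption: `*` matches any run of characters, and the pattern only has
  /// to match the start of the test's path within the suite.
  bool matches(String suiteName, String testPath) {
    if (suiteName != suite) return false;
    final pattern = pathPattern;
    if (pattern == null) return true;
    final regExp =
        RegExp('^' + pattern.split('*').map(RegExp.escape).join('.*'));
    return regExp.hasMatch(testPath);
  }
}

void main() {
  final selector = Selector.parse('language_2/spread_collections');
  print(selector.matches('language_2', 'spread_collections/syntax_test'));
  print(selector.matches('lib_2', 'spread_collections/syntax_test'));
}
```

The first call prints `true` and the second prints `false`, because the suite name has to match before the path is even considered.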
|  |  | 
|  | When running the test runner through the `test.py` entrypoint, it does not look | 
|  | up the current test status from the results database. Instead it just uses the | 
|  | old status files. This is dubious because those status files are no longer being | 
|  | maintained, so you will likely get spurious failures simply because the status | 
|  | is out of date even though the tool is doing what it should. | 
|  |  | 
|  | Eventually, this way of running tests should be removed, along with the status | 
|  | files. | 
|  |  | 
|  | ### Using test.dart | 
|  |  | 
|  | The new entrypoint is "test.dart": | 
|  |  | 
|  | ```sh | 
|  | $ ./tools/sdks/dart-sdk/bin/dart tools/test.dart -n analyzer-asserts-mac | 
|  | ``` | 
|  |  | 
|  | This ultimately uses the same test runner and works similarly to "test.py", | 
|  | except that it reads test status from the **results database**. | 
|  |  | 
|  | ### Finding a configuration | 
|  |  | 
|  | Reading a several-thousand-line JSON file to find the name of the configuration | 
|  | that matches the way you want to run a batch of tests is not fun. Usually, you | 
|  | know what compiler and runtime you want to test, and you want something that | 
|  | can run on your local machine. To help you find a configuration that matches | 
|  | that, the test runner has a `--find-configurations` option. | 
|  |  | 
|  | Pass that, and the test runner prints out the list of configuration names that | 
|  | match an optional set of filters you provide, which can be any combination of: | 
|  |  | 
|  | ```text | 
|  | -m, --mode | 
|  | Mode in which to run the tests. | 
|  |  | 
|  | -c, --compiler | 
|  | How the Dart code should be compiled or statically processed. | 
|  |  | 
|  | -r, --runtime | 
|  | Where the tests should be run. | 
|  |  | 
|  | --nnbd | 
|  | Which set of non-nullable type features to use. | 
|  |  | 
|  | -a, --arch | 
|  | The architecture to run tests for. | 
|  |  | 
|  | -s, --system | 
|  | The operating system to run tests on. | 
|  | ``` | 
|  |  | 
|  | If you don't provide them, then `mode` defaults to `release`, `system` to your | 
|  | machine's OS, and `arch` to `x64`. If you don't want those default filters, you | 
|  | can use `all` for any of those options to not filter by that. | 
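
The default filters and the `all` escape hatch can be pictured with a small sketch. The `Config` class, `findConfigurations()` function, and the sample configuration names below are made up for illustration and do not come from the real test matrix. One simplification: `system` is a required argument here, where the real option falls back to the host machine's OS.

```dart
/// Hypothetical, pared-down description of one named configuration.
class Config {
  final String name, mode, system, arch;
  const Config(this.name, this.mode, this.system, this.arch);
}

/// Returns the names of the configurations that satisfy every filter.
/// A filter value of `all` accepts anything.
List<String> findConfigurations(
  List<Config> configs, {
  String mode = 'release',
  required String system,
  String arch = 'x64',
}) {
  bool accepts(String filter, String value) =>
      filter == 'all' || filter == value;
  return [
    for (final config in configs)
      if (accepts(mode, config.mode) &&
          accepts(system, config.system) &&
          accepts(arch, config.arch))
        config.name,
  ];
}

// Made-up configurations, not entries from the real test matrix.
const sampleConfigs = [
  Config('example-vm-linux-release-x64', 'release', 'linux', 'x64'),
  Config('example-vm-linux-debug-x64', 'debug', 'linux', 'x64'),
  Config('example-vm-mac-release-arm64', 'release', 'mac', 'arm64'),
];

void main() {
  // Defaults apply: release mode and x64, so only the first entry matches.
  print(findConfigurations(sampleConfigs, system: 'linux'));
  // `all` switches off the mode filter, so the debug entry shows up too.
  print(findConfigurations(sampleConfigs, system: 'linux', mode: 'all'));
}
```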
|  |  | 
|  | Pass `--help` to the test runner to see the allowed values for all of these. | 
|  |  | 
|  | ### Local configurations | 
|  |  | 
|  | You typically want to run the tests locally using the same configuration that | 
|  | the status results are pulled from. It would be strange, for example, to use the | 
|  | status of some dart2js configuration to define the expected outcome of a batch | 
|  | of VM tests. | 
|  |  | 
|  | But the bots don't always test every configuration and sometimes there is a | 
|  | "nearby" configuration to what you want to run that will work for what you're | 
|  | testing locally. The common case is if you're running tests on a Mac but the bot | 
|  | only tests a Linux config. Results don't often vary by platform, so you can just | 
|  | use the Linux results for your local run. To do that, there is an extra flag: | 
|  |  | 
|  | ```sh | 
|  | $ ./tools/sdks/dart-sdk/bin/dart tools/test.dart -n analyzer-asserts-linux \ | 
|  | -N analyzer-asserts-mac | 
|  | ``` | 
|  |  | 
|  | The `-n` (lowercase) flag says "use the results for this config". The `-N` | 
|  | (uppercase) flag says "run this configuration locally". By default, when you | 
|  | omit `-N`, it runs the same config locally that it gets results from. | 
|  |  | 
|  | ## Running tests on the bots and commit queue | 
|  |  | 
|  | Once you have some code working to your satisfaction, the next step is getting | 
|  | it landed. This is where the bots come into play. Your first interaction with | 
|  | them will be the **trybots** on the **commit queue**. When you send a change out | 
|  | for code review, a human must approve it before you can land it. But it must | 
|  | also survive a gauntlet of robots. | 
|  |  | 
|  | The commit queue is a system that automatically runs your not-yet-landed change | 
|  | on a subset of all of the bots. This subset is the trybots. They run a selected | 
|  | set of configurations to balance good coverage against completing in a timely | 
|  | manner. If all of those tests pass, the change can land on main. Here "pass" | 
|  | means that the test result is *the same as it was before your change.* A | 
|  | *change* in status is what's considered "failure". | 
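
Put differently, the commit queue cares about a diff of results, not about the results themselves. A minimal sketch of that idea follows, using a hypothetical `changedResults()` function and plain strings for results; the real infrastructure stores and compares results in its own way.

```dart
/// Lists every test whose result after a change differs from its earlier one.
/// Results are plain strings such as "Pass" or "CompileTimeError".
Map<String, String> changedResults(
    Map<String, String> before, Map<String, String> after) {
  final changes = <String, String>{};
  after.forEach((test, result) {
    // Assumption: a test with no earlier result counts as "Pass", matching
    // the default status this document describes for brand new tests.
    final previous = before[test] ?? 'Pass';
    if (previous != result) changes[test] = '$previous -> $result';
  });
  return changes;
}

void main() {
  final before = {'a_test': 'Pass', 'b_test': 'RuntimeError'};
  final after = {'a_test': 'Pass', 'b_test': 'Pass'};
  // b_test was fixed, yet it is still flagged because its result changed.
  print(changedResults(before, after)); // {b_test: RuntimeError -> Pass}
}
```

In this sketch, a fix shows up as a change just like a regression does, which is why deliberate changes need a human's approval.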
|  |  | 
|  | Of course, many times getting something working and changing the outcome of a | 
|  | test from failing to passing is your exact goal! In that case, you can *approve* | 
|  | the changes. This is how a human tells the bots that the change in status is | 
|  | deliberate and desired. | 
|  |  | 
|  | This workflow is still in flux as part of the move away from status files. | 
|  | **TODO: Rewrite or remove this once the approval workflow is gone or complete.** | 
|  |  | 
|  | Once you please all of the trybots, the change can land on main. After that | 
|  | *all* of the bots pick up the change and run the tests on all of the supported | 
|  | configurations against your change. Assuming we've picked a good set of trybots, | 
|  | these should all pass. But sometimes you encounter a test that behaves as | 
|  | expected on the subset of trybots but still changes the behavior on some other | 
|  | configuration. So failures can happen here and the bots "turn red". When this | 
|  | happens, you'll need to either approve the new outcomes, revert the change, or | 
|  | land a fix. | 
|  |  | 
|  | ## Working on tests | 
|  |  | 
|  | Your job may mostly entail working on a Dart implementation, but there's still | 
|  | a good chance you'll end up touching the tests themselves. If you are designing | 
|  | or implementing a new feature, you will likely edit or add new tests. Our tests | 
|  | have an unfortunate amount of technical debt, including bugs, so you may end up | 
|  | finding and needing to fix some of that too. | 
|  |  | 
|  | SDK tests are sort of like "integration tests". They are complete Dart programs | 
|  | that validate some Dart tool's behavior by having the tool run the program and | 
|  | then verifying the tool's externally visible behavior. (Most tools also have | 
|  | their own set of white box unit tests, but those are out of scope for this doc.) | 
|  |  | 
|  | The simplest and most typical test is simply a Dart script that the tool should | 
|  | run without error and then exit with exit code zero. For example: | 
|  |  | 
|  | ```dart | 
|  | import 'package:expect/expect.dart'; | 
|  |  | 
|  | main() { | 
|  | Expect.equals(3, 1 + 2); | 
|  | } | 
|  | ``` | 
|  |  | 
|  | We don't use the ["test"][test pkg] package for writing language and core | 
|  | library tests. Since the behavior we're testing is so low level and fundamental, | 
|  | we try to minimize the quantity and complexity of the code under test. The | 
|  | "test" package is great but it uses tons of language and core library | 
|  | functionality. If you're trying to fix a bug in that same functionality, it's no | 
|  | fun if your test framework itself is broken by the same bug. | 
|  |  | 
|  | [test pkg]: https://pub.dev/packages/test | 
|  |  | 
|  | Instead, we have a much simpler package called ["expect"][expect pkg]. It's [a | 
|  | single Dart file][expect lib] that exposes a rudimentary JUnit-like API for | 
|  | making assertions about behavior. Fortunately, since the behavior we're testing | 
|  | is also pretty low-level, that API is usually sufficient. | 
|  |  | 
|  | [expect pkg]: https://github.com/dart-lang/sdk/tree/main/pkg/expect | 
|  | [expect lib]: https://github.com/dart-lang/sdk/blob/main/pkg/expect/lib/expect.dart | 
|  |  | 
|  | The way it works is that if an assertion fails, it throws an exception. That | 
|  | exception unwinds the entire callstack and a Dart implementation then exits in | 
|  | some failed way that the test runner can detect and determine that a runtime | 
|  | error occurred. | 
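
The mechanism is small enough to imitate without the package. The following stand-in is not the real "expect" code: `ExpectationFailure` and `expectEquals()` are hypothetical names, used only to show how a failed assertion turns into an exception.

```dart
/// Hypothetical stand-in for the exception thrown by a failed assertion.
class ExpectationFailure implements Exception {
  final String message;
  ExpectationFailure(this.message);

  @override
  String toString() => 'ExpectationFailure: $message';
}

/// Simplified imitation of an equality assertion: returns normally when the
/// values are equal and throws otherwise.
void expectEquals(Object? expected, Object? actual) {
  if (expected != actual) {
    throw ExpectationFailure('expected <$expected> but got <$actual>');
  }
}

void main() {
  expectEquals(3, 1 + 2); // Equal, so the program finishes silently.
}
```

Changing the first argument to `4` makes `expectEquals()` throw. Nothing in `main()` catches the exception, so the program terminates abnormally, and that abnormal exit is what a test runner can observe.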
|  |  | 
|  | If you are writing asynchronous tests, there is a separate tiny | 
|  | ["async_helper"][async helper] library that talks to the test runner to ensure all | 
|  | asynchronous operations performed by the test have a chance to complete. | 
|  |  | 
|  | [async helper]: https://github.com/dart-lang/sdk/tree/main/pkg/expect/lib/async_helper.dart | 
|  |  | 
|  | With these two libraries, it's straightforward to write tests of expected correct | 
|  | runtime behavior. You can also write tests for validating runtime *failures* by | 
|  | using the helper functions for checking that certain exceptions are thrown: | 
|  |  | 
|  | ```dart | 
|  | import 'package:expect/expect.dart'; | 
|  |  | 
|  | main() { | 
|  | var list = [0, 1]; | 
|  | Expect.throwsRangeError(() { | 
|  | list[2]; | 
|  | }); | 
|  | } | 
|  | ``` | 
|  |  | 
|  | This test correctly passes if executing `list[2]` throws a RangeError. | 
|  |  | 
|  | We also need to pin down the *static* behavior of Dart programs. The set of | 
|  | compile errors is specified by the language, and tools are expected to report | 
|  | them correctly. Plain tests aren't enough for that because a test containing a | 
|  | static error can't be run at all. To handle that, the test runner has built-in | 
|  | support for other kinds of tests. They are: | 
|  |  | 
|  | ### Multitests | 
|  |  | 
|  | A simple multitest looks like this: | 
|  |  | 
|  | ```dart | 
|  | // language_2/abstract_syntax_test.dart | 
|  | class A { | 
|  | foo(); //# 00: compile-time error | 
|  | static bar(); //# 01: compile-time error | 
|  | } | 
|  | ``` | 
|  |  | 
|  | Each line containing `//#` marks that line as being owned by a certain multitest | 
|  | section. The identifier after that is the name of the section. It's usually just | 
|  | a number like `00` and `01` here, but can be anything. Then, after the colon, | 
|  | you have an expected outcome. | 
|  |  | 
|  | The test runner takes this file and splits it into several new test files, one | 
|  | for each section, with an extra one for the "no sections". Each file contains | 
|  | all of the unmarked lines as well as the lines marked for a certain section. So | 
|  | the above test gets split into three files: | 
|  |  | 
|  | ```dart | 
|  | // language_2/abstract_syntax_test/none.dart | 
|  | class A { | 
|  |  | 
|  |  | 
|  | } | 
|  | ``` | 
|  |  | 
|  | ```dart | 
|  | // language_2/abstract_syntax_test/00.dart | 
|  | class A { | 
|  | foo(); //# 00: compile-time error | 
|  |  | 
|  | } | 
|  | ``` | 
|  |  | 
|  | ```dart | 
|  | // language_2/abstract_syntax_test/01.dart | 
|  | class A { | 
|  |  | 
|  | static bar(); //# 01: compile-time error | 
|  | } | 
|  | ``` | 
|  |  | 
|  | This is literally done textually. Then the test runner runs each of those files | 
|  | separately. The expectation for the file is whatever was set in its section's | 
|  | marker. The "none" file is always expected to pass. | 
|  |  | 
|  | So, in this case, it expects `none.dart` to compile and run without error. It | 
|  | expects `00.dart` and `01.dart` to report some kind of compile-time error | 
|  | anywhere in the file. | 
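
The textual split can be sketched roughly as follows. This is a simplified illustration, not the runner's splitter: `splitMultitest()` is a hypothetical function, and it assumes every marker has the form `//# name: outcome`.

```dart
/// Splits multitest source text into one variant per section, plus "none".
Map<String, String> splitMultitest(String source) {
  final marker = RegExp(r'//#\s*(\w+)\s*:');
  final lines = source.split('\n');

  // Collect section names in order of first appearance.
  final sections = <String>['none'];
  for (final line in lines) {
    final name = marker.firstMatch(line)?.group(1);
    if (name != null && !sections.contains(name)) sections.add(name);
  }

  // Each variant keeps the unmarked lines and its own marked lines. Lines
  // owned by other sections become blank, as in the split files shown above.
  return {
    for (final section in sections)
      section: lines.map((line) {
        final owner = marker.firstMatch(line)?.group(1);
        return owner == null || owner == section ? line : '';
      }).join('\n')
  };
}

void main() {
  const source = '''
class A {
  foo(); //# 00: compile-time error
  static bar(); //# 01: compile-time error
}''';
  splitMultitest(source)
      .forEach((name, text) => print('--- $name ---\n$text'));
}
```

Running it prints three variants, `none`, `00`, and `01`, each with the other sections' lines blanked out.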
|  |  | 
|  | A single file can contain multiple distinct sections which lets you test for | 
|  | multiple different errors in a single file. It can distinguish between reporting | 
|  | a compile-time error versus a runtime error. It's the best we had for a long | 
|  | time, and works pretty well, so there are many many multitests. | 
|  |  | 
|  | But they aren't great. A single test file containing 20 multitest sections gets | 
|  | split into 21 files, each of which has to pass through the entire compilation | 
|  | and execution pipeline independently. That's pretty slow. You get better | 
|  | granularity, but still not perfect. As long as *some* error is reported | 
|  | *somewhere* in the file that contains the section's lines, it considers that | 
|  | good enough. | 
|  |  | 
|  | We see a lot of multitests that incorrectly pass because they are supposed to | 
|  | detect some interesting static type error reported by the type checker. But they | 
|  | actually pass because the author had a typo somewhere, which gets reported as a | 
|  | syntax error at compile time. | 
|  |  | 
|  | To write tests for static errors more precisely and easily, there is | 
|  | yet another kind of test... | 
|  |  | 
|  | ### Static error tests | 
|  |  | 
|  | This is the newest form of test. These tests are *only* for validating static | 
|  | errors reported by one of the two front ends: CFE and analyzer. For all other | 
|  | configurations, they are automatically skipped. | 
|  |  | 
|  | The previous multitest converted to a static error test looks like this: | 
|  |  | 
|  | ```dart | 
|  | class A { | 
|  | //    ^ | 
|  | // [cfe] The non-abstract class 'A' is missing implementations for these members: | 
|  | foo(); | 
|  | //^^^^^^ | 
|  | // [analyzer] STATIC_WARNING.CONCRETE_CLASS_WITH_ABSTRACT_MEMBER | 
|  |  | 
|  | static bar(); | 
|  | //            ^ | 
|  | // [analyzer] SYNTACTIC_ERROR.MISSING_FUNCTION_BODY | 
|  | // [cfe] Expected a function body or '=>'. | 
|  | } | 
|  | ``` | 
|  |  | 
|  | Each group of adjacent line comments here defines an **error expectation**. The | 
|  | first comment line defines the error's location. The line is the preceding line, | 
|  | the column is the column containing the first caret, and the length is the | 
|  | number of carets. If the preceding line is itself part of some other error | 
|  | expectation, it will be skipped over, so you can define multiple errors that | 
|  | are reported on the same line: | 
|  |  | 
|  | ```dart | 
|  | int i = "not int" / 345; | 
|  | //      ^^^^^^^^^ | 
|  | // [analyzer] STATIC_WARNING.SOME_ERROR | 
|  | //                  ^^^ | 
|  | // [analyzer] STATIC_WARNING.ANOTHER_ERROR | 
|  | ``` | 
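
The location rule can be sketched in a few lines of Dart. This is only an illustration of the rule as described here, not the test runner's actual parser: `parseCaretLine` is a hypothetical name, and the sketch assumes 1-based columns (so the `//` occupies columns 1 and 2) and Dart 3 records.

```dart
/// Matches a location comment such as `//      ^^^^^^^^^`.
final _caretLine = RegExp(r'^//( *)(\^+)\s*$');

/// Returns the 1-based column and the length marked by [commentLine], or null
/// if the line is not a caret location comment.
///
/// Hypothetical helper for illustration; not part of the test runner.
({int column, int length})? parseCaretLine(String commentLine) {
  final match = _caretLine.firstMatch(commentLine);
  if (match == null) return null;
  final spaces = match[1]!.length;
  final carets = match[2]!.length;
  // Columns 1 and 2 hold the `//`, so the first caret sits at spaces + 3.
  return (column: spaces + 3, length: carets);
}

void main() {
  // First expectation from the example above: column 9, length 9, which is
  // the string literal "not int" including its quotes.
  print(parseCaretLine('//      ^^^^^^^^^'));

  // An `[analyzer]` line is not a location comment, so this prints null.
  print(parseCaretLine('// [analyzer] STATIC_WARNING.SOME_ERROR'));
}
```

Because the `//` always occupies the first two columns, the earliest column a caret can point at in this sketch is column 3.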
|  |  | 
|  | In cases where the location doesn't neatly fit into this syntax—it either | 
|  | starts before column 2 or spans multiple lines—an explicit syntax can be | 
|  | used instead: | 
|  |  | 
|  | ```dart | 
|  | var obj1 = [...(123)]; | 
|  | // [error line 1, column 17, length 3] | 
|  | // [analyzer] CompileTimeErrorCode.AMBIGUOUS_SET_OR_MAP_LITERAL_BOTH | 
|  | // [cfe] Error: Unexpected type 'int' of a spread.  Expected 'dynamic' or an Iterable. | 
|  | ``` | 
|  |  | 
|  | After the location comment line are lines defining how analyzer and CFE | 
|  | should report the error. First, a line comment starting with `[analyzer]` | 
|  | followed by an analyzer error code specifies that analyzer should report an | 
|  | error at this location with this error code. If omitted, analyzer is not | 
|  | expected to report this error. | 
|  |  | 
|  | Finally, a line comment starting with `[cfe]` followed by an error message | 
|  | specifies that CFE should report an error with the given text at this location. | 
|  | If omitted, the CFE is not expected to report an error here. If the CFE error | 
|  | message is longer than a single line, you can have further line comments after | 
|  | the initial `// [cfe]` one: | 
|  |  | 
|  | ```dart | 
|  | var obj1 = [...(123)]; | 
|  | //             ^^^^^ | 
|  | // [cfe] Error: Unexpected type 'int' of a spread. | 
|  | // Expected 'dynamic' or an Iterable. | 
|  | // Another line of error message text. | 
|  | ``` | 
|  |  | 
|  | When the test runner runs a test, it looks for and parses all error | 
|  | expectations. If it finds any, the test is a static error test. It runs the test | 
|  | once under the given configuration and validates that it produces all of the | 
|  | expected errors at the expected locations. The test passes if all errors are | 
|  | reported at the right location. With the analyzer front end, all expected error | 
|  | codes must match. For the CFE, error messages must match. | 
|  |  | 
|  | Using a single invocation to report all errors requires our tools to do decent | 
|  | error recovery and detect and report multiple errors. But our users expect that | 
|  | behavior anyway, and this lets us validate that and get a significant | 
|  | performance benefit. | 
|  |  | 
|  | #### Web static errors | 
|  |  | 
|  | Most static errors are reported by all Dart tools. The implementation of that | 
|  | error reporting comes from either analyzer or the CFE, so the two static error | 
|  | categories above, `[analyzer]` and `[cfe]`, cover almost all static error tests. | 
|  |  | 
|  | However, the web compilers DDC and dart2js also have a few additional static | 
|  | restrictions they place on Dart. Code that fails to meet those restrictions | 
|  | produces compile-time errors. We have static error tests for those too, and | 
|  | they use a separate `[web]` marker comment. Those tests are run on DDC and | 
|  | dart2js to ensure the error is reported. | 
|  |  | 
|  | #### Divergent errors | 
|  |  | 
|  | Since we don't validate analyzer error messages or CFE error codes, there is | 
|  | already some flexibility when the two front ends don't report identical errors. | 
|  | But sometimes one front end may not report an error at all, or may report it at | 
|  | a different location. | 
|  |  | 
|  | To support that, the error code or message can be omitted. An error expectation | 
|  | with no error code is ignored on analyzer. Conversely, one with no message is | 
|  | ignored on CFE. Note in the first example of this section that CFE reports the | 
|  | error about `foo()` on the class declaration while analyzer reports it at the | 
|  | declaration of `foo()` itself. Those are two separate error expectations. | 
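
As a rough sketch of that rule, the following Dart models an expectation with an optional analyzer code and an optional CFE message, then filters the list per front end. The `ErrorExpectation` class and the `forAnalyzer` and `forCfe` functions are hypothetical names chosen for illustration; the test runner's real data model may look quite different.

```dart
/// A hypothetical model of one error expectation parsed from a test file.
class ErrorExpectation {
  final int line;

  /// Null when the expectation has no `// [analyzer]` line.
  final String? analyzerCode;

  /// Null when the expectation has no `// [cfe]` line.
  final String? cfeMessage;

  const ErrorExpectation(this.line, {this.analyzerCode, this.cfeMessage});
}

/// Keeps only the expectations that apply when running on analyzer.
List<ErrorExpectation> forAnalyzer(List<ErrorExpectation> all) =>
    [for (final e in all) if (e.analyzerCode != null) e];

/// Keeps only the expectations that apply when running on the CFE.
List<ErrorExpectation> forCfe(List<ErrorExpectation> all) =>
    [for (final e in all) if (e.cfeMessage != null) e];

void main() {
  // The two expectations about `foo()` from the first example in this
  // section: CFE reports on line 1, analyzer on line 4.
  const expectations = [
    ErrorExpectation(1,
        cfeMessage: "The non-abstract class 'A' is missing implementations "
            'for these members:'),
    ErrorExpectation(4,
        analyzerCode: 'STATIC_WARNING.CONCRETE_CLASS_WITH_ABSTRACT_MEMBER'),
  ];

  print(forAnalyzer(expectations).map((e) => e.line).toList()); // [4]
  print(forCfe(expectations).map((e) => e.line).toList()); // [1]
}
```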
|  |  | 
|  | #### Unspecified errors | 
|  |  | 
|  | A good tool takes into account the processes around it. One challenge for us is | 
|  | that tests for new errors are often authored before the implementations exist. | 
|  | In the case of co19, those tests are authored by people outside of the team in a | 
|  | separate Git repo. At that point in time, we don't know what the precise error | 
|  | code or message will be. | 
|  |  | 
|  | To make this a little less painful, we allow an error to be "unspecified": | 
|  |  | 
|  | ```dart | 
|  | var obj1 = [...(123)]; | 
|  | //             ^^^^^^ | 
|  | // [analyzer] unspecified | 
|  | // [cfe] unspecified | 
|  | ``` | 
|  |  | 
|  | The special word `unspecified` in place of an error code or message means the | 
|  | error must be reported on the given line, but that any code or message is | 
|  | acceptable. Also, the column and length information is ignored. That enables | 
|  | this workflow: | 
|  |  | 
|  | 1.  Someone adds a test for an unimplemented feature using `unspecified` lines. | 
|  | 2.  The analyzer or CFE team implements the feature and lands support. The tests | 
|  | start passing. | 
|  | 3.  In order to pin down the precise behavior now that it's known, someone goes | 
|  | back and replaces `unspecified` with the actual error code or message. Now, | 
|  | any minor change will cause the test to break so we notice if, for example, | 
|  | a syntax error starts masking a type error. | 
|  | 4.  The other front end team does the same. | 
|  | 5.  Now the test is fully precise and passing. | 
|  |  | 
|  | The important points are that the tests start passing on step 2 and continue to | 
|  | pass through the remaining steps. The syntax above is chosen to be easily | 
|  | greppable so we can keep track of how many tests need to be made more precise. | 
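
The matching rule for `unspecified` can be sketched as follows. This is an assumption-laden illustration rather than the test runner's code: `StaticError`, `satisfies`, and the placeholder code `SOME_CODE` are all hypothetical, and a single `code` field stands in for either an analyzer error code or a CFE message.

```dart
/// A simplified error, used here for both expectations and reported errors.
class StaticError {
  final int line;
  final int column;
  final int length;
  final String code;

  const StaticError(this.line, this.column, this.length, this.code);
}

/// Whether the reported error [actual] satisfies the expectation [expected].
bool satisfies(StaticError expected, StaticError actual) {
  if (expected.line != actual.line) return false;
  // An unspecified expectation accepts any code or message, and ignores the
  // column and length. Only the line has to match.
  if (expected.code == 'unspecified') return true;
  return expected.column == actual.column &&
      expected.length == actual.length &&
      expected.code == actual.code;
}

void main() {
  const reported = StaticError(1, 16, 5, 'SOME_CODE');

  // Steps 1 and 2 of the workflow: the length differs, but the expectation
  // is unspecified, so the test passes once the error is implemented.
  print(satisfies(const StaticError(1, 16, 6, 'unspecified'), reported)); // true

  // Step 3: with a precise code, the location must now match exactly.
  print(satisfies(const StaticError(1, 16, 6, 'SOME_CODE'), reported)); // false
  print(satisfies(const StaticError(1, 16, 5, 'SOME_CODE'), reported)); // true
}
```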
|  |  | 
|  | #### Automatic error generation | 
|  |  | 
|  | Step three in the above workflow is a chore. Run a test on some front end. Copy | 
|  | the error output. Paste it into the test file in the right place. Turn it into a | 
|  | comment. This needs to happen both for new tests and any time the error | 
|  | reporting for a front end changes, which is frequent. The analyzer and CFE folks | 
|  | are always improving the usability of their tools by tweaking error messages and | 
|  | locations. | 
|  |  | 
|  | To make this less painful, we have a tool that will automatically take the | 
|  | current output of either front end and insert the corresponding error | 
|  | expectation into a test file for you. It can also remove existing error | 
|  | expectations. | 
|  |  | 
|  | The tool is: | 
|  |  | 
|  | ```sh | 
|  | $ dart pkg/test_runner/tool/update_static_error_tests.dart -u "**/abstract_syntax_test.dart" | 
|  | ``` | 
|  |  | 
|  | It takes a couple of flags for what operation to perform (insert, remove, | 
|  | update) and for what front end (analyzer, CFE, both) followed by a glob for | 
|  | which files to modify. Then it goes through and updates all of the matching | 
|  | tests. | 
|  |  | 
|  | This should give us very precise testing of static error behavior with little | 
|  | manual effort and with good execution performance. | 