> [!IMPORTANT]
> This page was copied from https://github.com/dart-lang/sdk/wiki and needs review.
> Please [contribute](../CONTRIBUTING.md) changes to bring it up-to-date -
> removing this header - or send a CL to delete the file.
---
A programming language and its core libraries sit near the bottom of a
developer's technology stack. Because of that, developers expect them to be
reliable and bug-free. They expect new releases of the Dart SDK not to break any of their
existing programs, even in rare, strange edge cases. At the scale of Dart today,
even the darkest corner of the language or libraries still has users who care
about its behavior.
We engineer that reliability using automated tests. The tests for the language
and core libraries, which is what this doc is about, live in the SDK repo under
`tests/`. As of this writing, there are 25,329 Dart files in there, containing
over 1.5 million lines of code and over 95,000 individual assertions. Making
this even harder is that Dart includes a variety of tools and
implementations. We have multiple compilers, runtimes, and static analyzers.
Each supports several platforms and options. All told, we currently have 476
supported, tested configurations.
Since our testing needs are both complex and unique, we have several custom test
formats and our own test runner for executing those tests. This doc explains how
to work with tests and use the test runner.
## Concepts
There are several workflows that share the same concepts and terms, so let's get
those out of the way first.
* **"test.py"**, **"test.dart"**, **test_runner** - The tool that runs tests.
For many years, it had no real name, so it was simply called "test.py" after
the main entrypoint script used to invoke it. (It was originally written in
Python before we had any functioning Dart implementations.) When we started
migrating away from status files, we created a new entrypoint script,
"test.dart". The pub package that contains all of the real code is named
"test_runner" and lives at `pkg/test_runner`.
* **Test suite** - A collection of test files that can be run. Most suites
correspond to a top level directory under `tests/`: `language_2`, `lib_2`,
and `co19_2` are each test suites. There are a couple of special suites
like `pkg` whose files live elsewhere and have special handling in the test
runner.
Directories with the `_2` suffix contain pre-null-safety tests, while the
corresponding suites without `_2` are the null-safe tests. (The `_2`
suffix is a vestige of the migration from Dart 1.0 to Dart 2.0.)
* **Test** - A test is a Dart file (which may reference other files) that the
test runner will send to a Dart implementation and then validate the
resulting behavior. Most test files are named with an `_test.dart` suffix.
* **Configuration** - A specific combination of tools and options that can be
used to invoke a test. For example, "run on the VM on Linux in debug mode"
is a configuration, as are "compile to JavaScript with DDC and then run in
Chrome" and "analyze using the analyzer package with its asserts enabled".
Each configuration is a combination of architecture, mode, operating system,
compiler, runtime and many other options. There are thousands of potential
combinations. We officially support hundreds of points in that space by
testing them on our bots.
* **Bots**, **BuildBots**, **builders** - The infrastructure for automatically
running the tests in the cloud across a number of configurations. Whenever a
change lands on main, the bots run all of the tests to determine what
behavior changed.
## Expectations, outcomes, status files, and the results database
This concept is so important it gets its own section. The intent of the test
corpus is to ensure that the behavior we ship is the behavior we intend to ship.
In a perfect world, at every commit, every configuration of every tool would
pass every test. Alas, there are many complications:
* Some tests deliberately test the error-reporting behavior of a tool. Compile
errors are also specified behavior, so we have tests to guarantee things
like "this program reports a compile error". In that case "pass" does *not*
mean "executes the program without error" because the intent of the test is
to report the error.
* Some tests validate behavior that is not supported by some configurations.
We have tests for "dart:io", but that library is not supported on the web.
That means all of the "dart:io" tests "fail" on DDC and dart2js, but we of
course intend and expect that to be true.
* Some tests are for features that aren't implemented yet. We're constantly
evolving and at any point in time, there is often some lag between the
state of various tests and implementations.
* Some tests are failing that shouldn't be, but they have been for some time
and we don't currently intend to fix what's causing the failure. We should
at some point, but we don't want this constant failure to drown out other,
newer, unexpected failures.
All this means that it's not as simple as "run test and if no errors then great,
otherwise blow up." At a high level, what we really want is:
* To encode the behavior we *intend* to ship.
* To know what behavior we *are* shipping.
* To know when that behavior *changes* and what commits cause it.
### Expectation
You can think of each test as having an intent. A human reading the test's code
(and any special comments in it) should be able to predict the behavior the test
produces. For most tests, the intent is to run to completion without error. For
example:
```dart
import 'package:expect/expect.dart';
main() {
  Expect.equals(3, 1 + 2);
}
```
The `Expect.equals()` method throws an exception if the two arguments aren't
equal. So this test will complete silently if `1 + 2` is `3`. Otherwise, it will
throw an uncaught exception, which the test runner detects. So the intent is to
"pass" where "pass" means "run to completion".
Other tests exist to validate the error reporting behavior of the system:
```dart
main() {
  int i = "not int"; //# 01: compile-time error
}
```
This test (a "multitest", see below) says that a conforming tool should produce
a compile-time error when the second line of code is included.
Before running a test, the test runner parses the test file and infers an
**expectation** from it—what the human author says a tool should do when
it runs the test. By default, the expectation is "pass", but some tests are
expected to produce runtime errors or compile-time errors.
### Outcome
When the test runner runs a test on some tool, the tool does something. It can
report a compile error or report a runtime error. Maybe it hangs indefinitely
and the test runner has to shut it down and consider it a time out. If it does
none of those things and exits cleanly, it "passes".
We call this the **outcome**. It's the actual observed behavior of the tool.
### Status
In order to tell if a behavior *changes*, we need to record what the previous
behavior was the last time we ran the test under some configuration. We call
this the test's **status**.
In the past, status was recorded in a separate set of
[**status files**](Status-files.md). These live in the repo under `tests/`.
In order to change the expected
behavior of a test, you had to manually change the status file and then commit
that change. In practice, we found this doesn't scale to the number of tools and
configurations we have today.
Instead, we now store the results of previous test runs in a **results
database**. This database is updated automatically by the bots when tests
complete. Every time a bot runs a test suite for some configuration it gathers
up the outcome of every test and records that in the database, associated with
the commit that it ran against. That database is queried by tools (including the
test runner itself) to determine the status of any test for any supported
configuration, at any point in the commit stream.
#### Viewing the current status
The [current results app](https://dart-current-results.web.app/) shows the
current status of all tests, with the ability to filter by test name (path
prefix).
The [results feed](https://dart-ci.firebaseapp.com/) shows the most recent
changes to test statuses, with a tab for unapproved changes.
### Skips and slows
However, we have not yet completely eliminated status files. They are still
around because they are still the source of truth for some hand-authored data
about what a test *should* do. In particular, some tests don't make sense for a
certain configuration at all and should be **skipped**. For example, there's no
point in running the "dart:io" tests on dart2js.
Today, the place a human says "skip these 'dart:io' tests if the configuration
uses dart2js" is in the *status* files. Likewise, some tests are particularly
slow on some configurations and we want to give the test runner more time before
it declares them timing out. That data is also stored in the status files.
Eventually, we hope to move this data into the test itself.
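For reference, a status file entry that skips and slows tests looks roughly
like the following sketch (the test names are made up for illustration; see
[Status-files.md](Status-files.md) for the real syntax):

```
[ $compiler == dart2js ]
io_test: SkipByDesign # dart:io is not supported on the web.
big_loop_test: Slow
```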
### Comparison and confusion
The previous three sections lay things out in a nice clean way, but the reality
is much more muddled. This is largely because we end up overloading words and don't have clear ways to talk about the *combinations* of outcome, expectation, and status.
For example, if a test "passes", it could mean:
* The *outcome* is that it completes without error, even though the
*expectation* is that it should not.
* The *outcome* is that it completes without error and the *expectation* is
that it does that.
* The *expectation* is that it produces a compile error and the outcome is
that it correctly produces that error.
* The *outcome* matches the *status*, regardless of what they are.
If a test "fails", it could be:
* The *outcome* is that it reports an error.
* The *outcome* is that it does *not* and the *expectation* is that it
*should*.
* The *outcome* is that it reports an error, the *expectation* is that it
should *not*, but the *status* is that it *does*.
Ugh, I could go on. All this really means is that the combination of outcome,
expectation, and status makes things confusing and you have to be careful when
talking about tests and trying to understand the output of tools. Now that
you're nice and confused...
## How the test runner works
When you invoke the test runner, it walks all of the files in the specified
test suites. For each test file:
1. **Parse the test file to figure out its expectation.** As you'll see below,
there are a number of special marker comments you can use in a test to tell
the test runner what the intent of the test is.
2. **Figure out what commands to execute with what arguments.** For a simple VM
test, that may be just "run the VM and give it the path to the test file".
For a web test, it's multiple commands: One to compile the test to
JavaScript, and then a separate command to run the result in a JavaScript
environment. There is custom code in the test runner for each pair of
compiler and runtime to do this.
3. **Run the commands and wait for them to complete.** There's a whole little
dependency graph build system in there that knows which commands depend on
which ones and tries to run stuff in parallel when it can.
4. **Analyze the result of the commands to determine the test's outcome.** For
each command the test runner invokes, it has code to parse the command's
output. This is usually some combination of looking at the exit code,
stdout, and stderr.
5. **Compare the expectation to the outcome.** This produces a new value that
the test runner vaguely calls "actual". It's sort of a higher-order outcome
relative to the expectation. We should figure out a better word for it. Some
examples:
```text
expectation      + outcome          -> actual
------------------------------------------------------------
Pass             + Pass             -> Pass
CompileTimeError + CompileTimeError -> Pass
Pass             + CompileTimeError -> CompileTimeError
CompileTimeError + Pass             -> MissingCompileTimeError
```
In other words, if the outcome and the expectation align, that's a "pass" in
that the tool does what a human says it's supposed to. If they disagree,
there is some kind of "failure": either an error was supposed to be
reported and wasn't, or an unexpected error occurred.
6. **Compare actual to the status and report any difference.** Now we know what
the tool did relative to what a human said it's supposed to do. Next is
figuring out how that result compares to the tool's *previous* behavior.
If the result and status are the same, the test runner reports the test as
passing. Otherwise, it reports a failure and shows you both the result
and status. To make things profoundly confusing, it refers to the status as
"expectation".
### A confusing example
The fact that we have three levels of "result" which then get mixed together is
what makes this so confusing, but each level does serve a useful function. The
tools could definitely be clearer about how they present it.
Here's a maximally tricky example. Let's say we decide to change Dart and make
it a static error to add doubles and ints. You create this test:
```dart
void main() {
  1 + 2.3; //# 00: compile-time error
}
```
You run it on analyzer before the analyzer team has had a chance to implement
the new behavior. This test is brand new, so it has no existing status and
defaults to "Pass". You'll get:
* Expectation: CompileTimeError. That's what the multitest marker comment
means.
* Outcome: Pass. In Dart today, this code has no errors, so the analyzer
doesn't report any compile error.
* Actual: MissingCompileTimeError. There was supposed to be an error reported,
but the tool didn't report any, so the result was a failure to report an
expected error.
* Status: Pass. This is the default status since it's a new test.
Then the test runner prints out something like:
```
FAILED: dart2analyzer-none release_x64 language_2/int_double_plus_test/00
Expected: Pass
Actual: MissingCompileTimeError
```
If you change the status to `MissingCompileTimeError` then it will "pass" and
not print a failure. If, instead, the analyzer team implements the new error,
then the outcome will become CompileTimeError. Since that aligns with the
expectation, the actual becomes Pass. That in turn matches the status, so the
whole test will report as successful.
In short: **When the test runner reports a test as succeeding it means the
difference between the tool's actual behavior and intended behavior has not
changed since the last time a human looked at it.**
## Running tests locally
There are two ways to run tests (cue surprisingly accurate "the old deprecated
way and the way that doesn't work yet" joke).
### Using test.py
The old entrypoint is "test.py":
```sh
./tools/test.py -n analyzer-mac language_2/spread_collections
```
The `-n analyzer-mac` means "run the tests using the 'analyzer-mac' named
configuration". Configurations are defined in the "test matrix", which is a
giant JSON file at `tools/bots/test_matrix.json`. The
`language_2/spread_collections` argument is a "selector". The selector syntax is
a little strange but it's basically a test suite name followed by an optional
somewhat glob-like path for the subset of tests you want to run.
When running the test runner through the `test.py` entrypoint, it does not look
up the current test status from the results database. Instead it just uses the
old status files. This is dubious because those status files are no longer being
maintained, so you will likely get spurious failures simply because the status
is out of date even though the tool is doing what it should.
Eventually, this way of running tests should be removed, along with the status
files.
### Using test.dart
The new entrypoint is "test.dart":
```sh
$ ./tools/sdks/dart-sdk/bin/dart tools/test.dart -n analyzer-asserts-mac
```
This ultimately uses the same test runner and works much like "test.py",
except that it reads test status from the **results database**.
### Finding a configuration
Reading a several thousand line JSON file to find the name of the configuration
that matches the way you want to run a batch of tests is not fun. Usually, you
know what compiler and runtime you want to test, and you want something that
can run on your local machine. To help you find a configuration that matches
that, the test runner has a `--find-configurations` option.
Pass that, and the test runner prints out the list of configuration names that
match an optional set of filters you provide, which can be any combination of:
```
-m, --mode
    Mode in which to run the tests.
-c, --compiler
    How the Dart code should be compiled or statically processed.
-r, --runtime
    Where the tests should be run.
--nnbd
    Which set of non-nullable type features to use.
-a, --arch
    The architecture to run tests for.
-s, --system
    The operating system to run tests on.
```
If you don't provide them, then `mode` defaults to `release`, `system` to your
machine's OS, and `arch` to `x64`. If you don't want those default filters, you
can use `all` for any of those options to not filter by that.
Pass `--help` to the test runner to see the allowed values for all of these.
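For example, to look for a configuration that compiles with dart2js and runs in
Chrome on your machine, something along these lines should work (the filter
values here are just an illustration):

```sh
$ ./tools/sdks/dart-sdk/bin/dart tools/test.dart --find-configurations -c dart2js -r chrome
```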
### Local configurations
You typically want to run the tests locally using the same configuration that
the status results are pulled from. It would be strange, for example, to use the
status of some dart2js configuration to define the expected outcome of a batch
of VM tests.
But the bots don't always test every configuration and sometimes there is a
"nearby" configuration to what you want to run that will work for what you're
testing locally. The common case is if you're running tests on a Mac but the bot
only tests a Linux config. Results don't often vary by platform, so you can just
use the Linux results for your local run. To do that, there is an extra flag:
```sh
$ ./tools/sdks/dart-sdk/bin/dart tools/test.dart -n analyzer-asserts-linux \
-N analyzer-asserts-mac
```
The `-n` (lowercase) flag says "use the results for this config". The `-N`
(uppercase) flag says "run this configuration locally". By default, when you
omit `-N`, it runs the same config locally that it gets results from.
## Running tests on the bots and commit queue
Once you have some code working to your satisfaction, the next step is getting
it landed. This is where the bots come into play. Your first interaction with
them will be the **trybots** on the **commit queue**. When you send a change out
for code review, a human must approve it before you can land it. But it must
also survive a gauntlet of robots.
The commit queue is a system that automatically runs your not-yet-landed change
on a subset of all of the bots. This subset is the trybots. They run a selected
set of configurations to balance good coverage against completing in a timely
manner. If all of those tests pass the change can land on main. Here "pass"
means that the test result is *the same as it was before your change.* A
*change* in status is what's considered "failure".
Of course, many times getting something working and changing the outcome of a
test from failing to passing is your exact goal! In that case, you can *approve*
the changes. This is how a human tells the bots that the change in status is
deliberate and desired.
This workflow is still in flux as part of the move away from status files.
**TODO: Rewrite or remove this once the approval workflow is gone or complete.**
Once you please all of the trybots, the change can land on main. After that
*all* of the bots pick up the change and run the tests on all of the supported
configurations against your change. Assuming we've picked a good set of trybots,
these should all pass. But sometimes you encounter a test that behaves as
expected on the subset of trybots but still changes the behavior on some other
configuration. So failures can happen here and the bots "turn red". When this
happens, you'll need to either approve the new outcomes, revert the change, or
land a fix.
## Working on tests
Your job may mostly entail working on a Dart implementation, but there's still
a good chance you'll end up touching the tests themselves. If you are designing
or implementing a new feature, you will likely edit or add new tests. Our tests
have an unfortunate amount of technical debt, including bugs, so you may end up
finding and needing to fix some of that too.
SDK tests are sort of like "integration tests". They are complete Dart programs
that validate some Dart tool's behavior by having the tool run the program and
then verifying the tool's externally visible behavior. (Most tools also have
their own set of white box unit tests, but those are out of scope for this doc.)
The simplest and most typical test is simply a Dart script that the tool should
run without error and then exit with exit code zero. For example:
```dart
import 'package:expect/expect.dart';
main() {
  Expect.equals(3, 1 + 2);
}
```
We don't use the ["test"][test pkg] package for writing language and core
library tests. Since the behavior we're testing is so low level and fundamental,
we try to minimize the quantity and complexity of the code under test. The
"test" package is great but it uses tons of language and core library
functionality. If you're trying to fix a bug in that same functionality, it's no
fun if your test framework itself is broken by the same bug.
[test pkg]: https://pub.dev/packages/test
Instead, we have a much simpler package called ["expect"][expect pkg]. It's [a
single Dart file][expect lib] that exposes a rudimentary JUnit-like API for
making assertions about behavior. Fortunately, since the behavior we're testing
is also pretty low-level, that API is usually sufficient.
[expect pkg]: https://github.com/dart-lang/sdk/tree/main/pkg/expect
[expect lib]: https://github.com/dart-lang/sdk/blob/main/pkg/expect/lib/expect.dart
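To give a feel for the API, here are a few of the assertions it provides (an
illustrative sample, not the full list; see `expect.dart` for everything):

```dart
import 'package:expect/expect.dart';

main() {
  // Each of these throws if the check fails.
  Expect.isTrue(1 < 2);
  Expect.equals('ab', 'a' + 'b');
  Expect.listEquals([1, 2, 3], [1, 2, 3]);
  Expect.throws(() => throw StateError('oops'));
}
```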
The way it works is that if an assertion fails, it throws an exception. That
exception unwinds the entire call stack, and the Dart implementation then exits
in a way that the test runner can detect and interpret as a runtime error.
If you are writing asynchronous tests, there is a separate tiny
["async_helper"][async pkg] package that talks to the test runner to ensure all
asynchronous operations performed by the test have a chance to complete.
[async pkg]: https://github.com/dart-lang/sdk/tree/main/pkg/async_helper
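For example, a minimal asynchronous test might look something like this (a
sketch assuming the `asyncStart()`/`asyncEnd()` helpers that the package
exports):

```dart
import 'dart:async';

import 'package:async_helper/async_helper.dart';
import 'package:expect/expect.dart';

main() {
  // Tell the test runner an asynchronous operation is in progress...
  asyncStart();
  Future.delayed(Duration(milliseconds: 1)).then((_) {
    Expect.equals(3, 1 + 2);
    // ...and that it has now completed.
    asyncEnd();
  });
}
```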
With these two packages, it's straightforward to write tests of expected correct
runtime behavior. You can also write tests for validating runtime *failures* by
using the helper functions for checking that certain exceptions are thrown:
```dart
import 'package:expect/expect.dart';
main() {
  var list = [0, 1];
  Expect.throwsRangeError(() {
    list[2];
  });
}
```
This test correctly passes if executing `list[2]` throws a RangeError.
We also need to pin down the *static* behavior of Dart programs. The set of
compile errors is specified by the language, and tools are expected to report
them correctly. Plain tests aren't enough for that because a test containing a
static error can't be run at all. To handle that, the test runner has built-in
support for other kinds of tests. They are:
### Multitests
A simple multitest looks like this:
```dart
// language_2/abstract_syntax_test.dart
class A {
  foo(); //# 00: compile-time error
  static bar(); //# 01: compile-time error
}
```
Each line containing `//#` marks that line as being owned by a certain multitest
section. The identifier after that is the name of the section. It's usually just
a number like `00` and `01` here, but can be anything. Then, after the colon,
you have an expected outcome.
The test runner takes this file and splits it into several new test files, one
for each section, plus an extra "none" file with no sections at all. Each file contains
all of the unmarked lines as well as the lines marked for a certain section. So
the above test gets split into three files:
```dart
// language_2/abstract_syntax_test/none.dart
class A {
}
```
```dart
// language_2/abstract_syntax_test/00.dart
class A {
  foo(); //# 00: compile-time error
}
```
```dart
// language_2/abstract_syntax_test/01.dart
class A {
  static bar(); //# 01: compile-time error
}
```
This is literally done textually. Then the test runner runs each of those files
separately. The expectation for the file is whatever was set in its section's
marker. The "none" file is always expected to pass.
So, in this case, it expects `none.dart` to compile and run without error. It
expects `00.dart` and `01.dart` to report some kind of compile-time error
anywhere in the file.
A single file can contain multiple distinct sections, which lets you test for
multiple different errors in a single file. It can distinguish between reporting
a compile-time error versus a runtime error. It's the best we had for a long
time, and it works pretty well, so there are many, many multitests.
But they aren't great. A single test file containing 20 multitest sections gets
split into 21 files, each of which has to pass through the entire compilation
and execution pipeline independently. That's pretty slow. You get better
granularity, but still not perfect. As long as *some* error is reported
*somewhere* in the file that contains the section's lines, it considers that
good enough.
We see a lot of multitests that incorrectly pass because they are supposed to
detect some interesting static type error reported by the type checker. But they
actually pass because the author had a typo somewhere, which gets reported as a
syntax error at compile time.
To try to more precisely and easily write tests for static errors, there is
yet another kind of test...
### Static error tests
This is the newest form of test. These tests are *only* for validating static
errors reported by one of the two front ends: CFE and analyzer. For all other
configurations, they are automatically skipped.
The previous multitest converted to a static error test looks like this:
```dart
class A {
//    ^
// [cfe] The non-abstract class 'A' is missing implementations for these members:
  foo();
//^^^^^^
// [analyzer] STATIC_WARNING.CONCRETE_CLASS_WITH_ABSTRACT_MEMBER
  static bar();
//            ^
// [analyzer] SYNTACTIC_ERROR.MISSING_FUNCTION_BODY
// [cfe] Expected a function body or '=>'.
}
```
Each group of adjacent line comments here defines an **error expectation**. The
first comment line defines the error's location. The line is the preceding line,
the column is the column containing the first caret, and the length is the
number of carets. If the preceding line is itself part of some other error
expectation, it will be skipped over, so you can define multiple errors that
are reported on the same line:
```dart
int i = "not int" / 345;
//      ^^^^^^^^^
// [analyzer] STATIC_WARNING.SOME_ERROR
//                  ^^^
// [analyzer] STATIC_WARNING.ANOTHER_ERROR
```
In cases where the location doesn't neatly fit into this syntax—it either
starts before column 2 or spans multiple lines—an explicit syntax can be
used instead:
```dart
var obj1 = [...(123)];
// [error line 1, column 17, length 3]
// [analyzer] CompileTimeErrorCode.AMBIGUOUS_SET_OR_MAP_LITERAL_BOTH
// [cfe] Error: Unexpected type 'int' of a spread. Expected 'dynamic' or an Iterable.
```
After the location comment line come lines defining how the analyzer and CFE
should report the error. First, a line comment starting with `[analyzer]`
followed by an analyzer error code specifies that analyzer should report an
error at this location with this error code. If omitted, analyzer is not
expected to report this error.
Finally, a line comment starting with `[cfe]` followed by an error message
specifies that CFE should report an error with the given text at this location.
If omitted, the CFE is not expected to report an error here. If the CFE error
message is longer than a single line, you can have further line comments after
the initial `// [cfe]` one:
```dart
var obj1 = [...(123)];
//             ^^^^^
// [cfe] Error: Unexpected type 'int' of a spread.
// Expected 'dynamic' or an Iterable.
// Another line of error message text.
```
When the test runner runs a test, it looks for and parses all error
expectations. If it finds any, the test is a static error test. It runs the test
once under the given configuration and validates that it produces all of the
expected errors at the expected locations. The test passes if all errors are
reported at the right location. With the analyzer front end, all expected error
codes must match. For the CFE, error messages must match.
Using a single invocation to report all errors requires our tools to do decent
error recovery and detect and report multiple errors. But our users expect that
behavior anyway, and this lets us validate that and get a significant
performance benefit.
#### Web static errors
Most static errors are reported by all Dart tools. The implementation of that
error reporting either comes from analyzer or the CFE and the above two static
error categories `[analyzer]` and `[cfe]` cover almost all static error tests.
However, the web compilers DDC and dart2js also have a few additional static
restrictions they place on Dart. Code that fails to meet those restrictions
produces compile-time errors. We have static error tests for those too, and
they use a separate `[web]` marker comment. Those tests are run on DDC and
dart2js to ensure the error is reported.
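The shape is the same as the other markers. A purely illustrative sketch (the
code and message here are invented, and whether a `[web]` entry carries a
message the way a `[cfe]` one does is an assumption):

```dart
int x = webOnly();
//      ^^^^^^^^^
// [web] Hypothetical message reported by DDC and dart2js.
```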
#### Divergent errors
Since we don't validate analyzer error messages or CFE error codes, there is
already some flexibility when the two front ends don't report identical errors.
But sometimes one front end may not report an error at all, or may report it at
a different location.
To support that, the error code or message can be omitted. An error expectation
with no error code is ignored by the analyzer. Conversely, one with no message is
ignored by the CFE. Note in the above example that the CFE reports the error about
`foo()` on the class declaration while analyzer reports it at the declaration of
`foo()` itself. Those are two separate error expectations.
#### Unspecified errors
A good tool takes into account the processes around it. One challenge for us is
that tests for new errors are often authored before the implementations exist.
In the case of co19, those tests are authored by people outside of the team in a
separate Git repo. At that point in time, we don't know what the precise error
code or message will be.
To make this a little less painful, we allow an error to be "unspecified":
```dart
var obj1 = [...(123)];
//             ^^^^^^
// [analyzer] unspecified
// [cfe] unspecified
```
The special word `unspecified` in place of an error code or message means the
error must be reported on the given line, but that any code or message is
acceptable. Also, the column and length information is ignored. That enables
this workflow:
1. Someone adds a test for an unimplemented feature using `unspecified` lines.
2. The analyzer or CFE team implements the feature and lands support. The tests
start passing.
3. In order to pin down the precise behavior now that it's known, someone goes
back and replaces `unspecified` with the actual error code or message. Now,
any minor change will cause the test to break so we notice if, for example,
a syntax error starts masking a type error.
4. The other front end team does the same.
5. Now the test is fully precise and passing.
The important points are that the tests start passing on step 2 and continue to
pass through the remaining steps. The syntax above is chosen to be easily
greppable so we can keep track of how many tests need to be made more precise.
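Concretely, once both front ends implement the error, the `unspecified` lines
in the example above get replaced with the real code and message, ending up
much like the earlier spread example:

```dart
var obj1 = [...(123)];
//             ^^^^^^
// [analyzer] CompileTimeErrorCode.AMBIGUOUS_SET_OR_MAP_LITERAL_BOTH
// [cfe] Error: Unexpected type 'int' of a spread. Expected 'dynamic' or an Iterable.
```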
#### Automatic error generation
Step three in the above workflow is a chore. Run a test on some front end. Copy
the error output. Paste it into the test file in the right place. Turn it into a
comment. This needs to happen both for new tests and any time the error
reporting for a front end changes, which is frequent. The analyzer and CFE folks
are always improving the usability of their tools by tweaking error messages and
locations.
To make this less painful, we have a tool that will automatically take the
current output of either front end and insert the corresponding error
expectation into a test file for you. It can also remove existing error
expectations.
The tool is:
```sh
$ dart pkg/test_runner/tool/update_static_error_tests.dart -u "**/abstract_syntax_test.dart"
```
It takes a couple of flags for what operation to perform (insert, remove,
update) and for what front end (analyzer, CFE, both) followed by a glob for
which files to modify. Then it goes through and updates all of the matching
tests.
This should give us very precise testing of static error behavior without
much manual effort and good execution performance.