2017-10-13 @floitschG
Welcome to the Dart Language and Library Newsletter.
Last week we discussed the evaluation order of method calls, like o.foo(bar())
. As some readers correctly pointed out, the section was partially misleading and sometimes based on wrong assumptions. Thanks for the feedback.
We claimed that there was an exception to the left-to-right evaluation in the spec. That's misleading: the is spec is careful (and it uses a lot of otherwise unnecessary machinery) to say that in e.foo(e1, ..., eN)
, e.foo
is not an expression that is evaluated. As such, it is meaningless to talk about the “order of evaluation” of method calls, when there are no expressions that are evaluated.
Even after our change - evaluating getters first when they are used as call-targets - it is not meaningful to talk about the evaluation order of expressions, since o.foo
still isn't an expression.
We also claimed that the expected behavior was to evaluate from left to right. We missed the fact that some method-based languages only look up the method before doing the call. C# and Java, both evaluate the arguments before reporting null
errors. In contrast, JavaScript and C++ (at least for virtual functions in gcc) report null
errors before evaluating the arguments to the call.
JavaScript actually changed their behavior in ECMAScript 5.0 (see es5). The change was that GetValue(ref)
in step 2 is now evaluated before evaluating the arguments. The reason for this change was for similar reasons: initially, ECMAScript did not have getters, but with getters it made sense to evaluate the GetValue(ref)
first, so that the evaluation went from left to right.
Fundamentally, the question boils down to: Is o.foo(bar())
the same as (o.foo)(bar())
?
With the proposed change it will be for most programs. There are some issues with noSuchMethod
when a function is invoked with the wrong number of arguments, but this corner-case is only relevant in broken programs.
Dart has two number types: integers and floating-point numbers. For the latter, Dart uses the IEEE double-precision floating-point type. There are some interesting alternatives, like unums, but these are not directly supported by our CPUs and are not yet universally available.
In this section I will focus on Dart's toString
methods of double
. There are 4 methods:
toString
toStringAsExponential
toStringAsFixed
toStringAsPrecision
The double.toString
method is the most commonly used way of converting a double to a string. It uses a mixture of toStringAsExponential
and toStringAsFixed
to provide the most visibly appealing output. For large numbers (more than 20 digits) it emits an exponential representation as in 1.3e+21
. The same is true for very small floating-point numbers, strictly smaller than the one represented by 0.000001
. For example 0.000000998
is printed as 9.98e-7
. All other numbers are printed in the “normal” decimal representation: 5.0
, 1.23
, or 0.001
.
The other methods just force a specific representation. I encourage readers to experiment with them, but I won't explain them in this section.
This is, however, only the easy part of printing a floating-point number. The much harder part is to compute the individual digits. Converting floating-point numbers to strings is surprisingly hard. Not only do floating-point numbers have multiple possible string representations, computing them requires some thought.
In general, the minimal requirement we want is that the output of toString
can be parsed back into a floating-point number with double.parse
and yields the same number. This is called the internal identity requirement. For a long time, C libraries were known to not even satisfy this requirement. Lack of communication (no Internet) was probably a big reason why it took the libraries so long.
The first known publication of a correct algorithm was alread in 1984. In J. T. Coonen's there is a correct (and very efficient) algorithm for correct conversions. That thesis (or at least the algorithm in it) went largely unnoticed...
It took until 1990 when G. L. Steele Jr. and J. L. White published their printing-algorithm (called “Dragon”) at a big conference (PLDI) that C libraries and languages finally started to do the “right” thing. (Note that a draft of the paper existed for a long time and was even mentioned in “Knuth, Volume II” in 1981).
There are two reasons why converting floating-point numbers to strings is difficult:
The term “floating-point” refers to the fact that the decimal point can “float”. It‘s helpful to look at examples in decimal representation. Imagine a floating-point number type that allows for 5 decimal digits, and 2 exponent digits. This would allow for numbers in the range 00001e-99
to 99999e99
(and, of course, 0
itself). This covers a really big range, but the precision is often pretty poor. The best approximation of the number 123456789
would be 12346e04
. Let’s see what happens, if we add 1 to that number. The correct result would be 123460001
. Since that number doesn't fit, we need round again, and end up with 12346e04
again. This means that the 1
was lost due to imprecision. Readers can try the same experiment in IEEE double floating point by writing 1e20 + 1
. The 1
is simply lost.
For a long time algorithms used doubles themselves to compute the digits of another double. However, this can lead to imprecision, because every operation on the input introduces rounding errors.
The correct way to compute the digits of a double is to use a number type that has a higher precision than the input. There are algorithms that manage to compute the correct output with extended precision floating-point numbers (aka “long doubles”). Other algorithms use 64 bit integers (which is more precise than the 53 bit precision of doubles).
While a slightly bigger data type is enough to compute the correct result, it doesn't always yield the optimal string representation. There are multiple representations for every floating-point number and some require more work than others.
Floating-point numbers have a precise value, but, for a specific configuration, are spaced out, so that two consecutive numbers always have a gap between them. Getting back to our decimal floating point numbers, we can see that the two consecutive numbers 12345e42
and 12346e42
are at a distance of 1e42
from each other. For simplicity we think of floating-point numbers as representing intervals. Any real value x
that lies closer to one floating-point number f
than to the neighboring numbers is part of the interval represented by f
. One of the qualities of floating-point numbers is that the gap between two consecutive numbers (and thus the interval) is dependent on the value of the number: the smaller the number, the smaller the gap.
For a string-conversion this means that any output that lies within the interval satisfies the internal identity requirement. Programming languages then have to decide which properties they value when a user asks for the conversion.
C libraries tend to provide the most precise representation, but let users decide how many of those digits they want.
Languages that were designed after 1990 generally tend to prefer representations that use the shortest amount of precision digits. This includes Java, C#, ECMAScript, and Dart.
Let's look at an example: 0.3
.
The most precise representation of this number is: 0.299999999999999988897769753748434595763683319091796875
.
In C users can now ask for a specific amount of digits (6 by default):
#include <stdio.h> int main() { printf("%f\n", 0.3); // => 0.300000 printf("%.54f\n", 0.3); // => 0.299999999999999988897769753748434595763683319091796875 }
Modern C libraries make it possible to print floating-point numbers with any precision, but they don't tell users where they can stop. Modern (C11) libraries provide macros (like DBL_DECIMAL_DIG
) to give users a way to emit at least enough digits to be able to read them in again, but in many cases (like 0.3
) there are shorter valid representations, and there is no way in C to just emit the shortest valid representation.
In Dart, 0.3.toString()
simply returns "0.3"
. That doesn't mean that Dart (like ECMAScript) will always return the shortest representation. For example 0.00001
is converted to "0.00001"
and not "1e-5"
which would be shorter. However, the number of significant digits is chosen to be minimal.
While the algorithm for finding these digits efficiently is non-trivial, the idea is straightforward: given that the floating-point number represents an interval, one just needs to look through every possible string representation in that interval and look for the one that is:
This sounds complicated, but is actually very simple for humans. Let's take a few examples:
double d = 0.3. Lower boundary: 0.2999999999999999611421941381195210851728916168212890625 Exact value: 0.299999999999999988897769753748434595763683319091796875 Upper boundary: 0.3000000000000000166533453693773481063544750213623046875 double d = 0.1234567 Lower boundary: 0.123466999999999986481480362954243901185691356658935546875 Exact value: 0.1234669999999999934203742668614722788333892822265625 Upper boundary: 0.123467000000000000359268170768700656481087207794189453125
Given the lower and upper boundary for these numbers, humans have no trouble finding the shortest that lies inside the interval. Once the leading digits differ, we can discard all remaining digits. The reason the algorithms are complicated is due to corner cases (like when the shortest representation lies exactly on the boundary), and because of the desire to use data types that are small and efficient.
As part of the language and library team we frequently get requests like “can we not just simply add feature X” to the language or libraries. In this section I will explain why even simple requests often require much more thought and why they often take so much time.
One of the fundamental qualities of a language and library is a consistent feel. This requires that library and language design is done holistically. Features that work in other languages might need a different design to fit into Dart, or should be provided through completely different means. I will provide some examples in the remainder of this section. I want to emphasize that these examples are chosen because they show how many features require to think about many other unrelated and often more complex features. They are not necessarily planned, and some might have already been rejected.
Let's start with the simple request that some classes should have a way to serialize themselves to JSON.
There are immediately several questions:
value
type that is serializable and let class
types stay non-serializable.In some sense, Dart already supports one way of serializing to JSON: the toJson
method. For instances that the JSON encoder doesn't recognize, it simply invokes toJson
to make the instance encode itself in a way that is suitable for JSON.
Looking at the questions above, we have:
JSON.encode(randomObject)
works as long as the randomObject
has a toJson
.toEncodable
parameter, but that's an additional hook and not toJson
.toJson
specifies.Dart features another way of serializing instances: mirrors. Using reflection libraries can inspect any object. This makes it trivial to write a library that looks at each instance, gets its fields and emits them. The responses to the questions are now completely different:
We could imagine a better version of mirrors, let‘s call it dart:reflect
that is more suitable for compilation. If it is similar to the reflect
package it would require annotations on a class to make it reflectable (and thus serializable). Depending on the concrete implementation this could solve the performance problem, but might come with the limitation that external users wouldn’t be able to provide serializations for classes they didn't write. A new dart:reflect
library would, in any case, be a major undertaking.
An alternative would be to provide a new “class” type that automatically provides serialization methods. For example, we could add a value
keyword that takes the place of class
:
value Point { int x; int y; }
This value
classes would not need any toJson
and (at least in the example) would also come automatically with a constructor. We could even add a hashCode
and ==
implementation that uses the fields.
Once we have such a class, we need to think about value types. That is, should these instances have an identity or not? Should we specify these value
classes in such a way that it would be possible to pass them on the stack?
If yes, would these classes have mutable or non-mutable fields? If they were mutable, then their semantic would differ from normal classes:
var p = new Point(1, 2); var p2 = p; // Value type -> copy. p2.x = 3; print(p); // 1, 2 print(p2); // 3, 2
Without mutable fields the behavior for value types and non-value types would only be visible when using identical
.
Similarly, value types have lots of questions with respect to Object
(are they Object
s or not?), and how lists of them should be implemented in the VM.
When talking about value types we should also think about tuples. Tuples are (for this document) just a collection of values with different types. For example a Tuple<int, String>
would be a class with two fields (let's just call them fst
and snd
) of types int
and String
. It would make sense to make tuples value types.
One could even imagine that value
is just a layer on top of tuples. That is, that every value
class is just a different view of a similarly typed tuple.
value Point { int x; int y; bool get isOrigin => x == 0 && y == 0; } value Vector { int x; int y; int get length => sqrt(x*x + y*y); } var p = new Point(3, 4); print(p.isOrigin); // false. Tuple<int, int> t = p; // It's the same thing. Vector v = t; // Still the same thing. v.print(v.length); // 5.
This approach would reduce the output code size since many value types could be mapped to the same tuple. However, they would make our serialization go out of the window. The serialization function would only be able to see that the receive object was a tuple, but not which one.
String serialize(Object o) { if (o is Tuple) { /* now what? */ } ... }
That doesn't mean that the value/tuple approach is doomed. We could provide static descriptions for classes that users could pass to the serialization function. Every value
class could have a static field shape
that returns the shape of the value type (or even for every class):
var p = new Point(3, 4); var shape = Point.shape; serialize(p, shape); String serialize(Object o, Shape shape) { if (o is int) print(o); if (o is String) ...; // A non-trivial object: print("{"); for (field in shape.fields) { print("${field.name}: ${serialize(field.value(o), field.shape)},"); } print("}"); }
This would require that every call to serialize
knew the static type of the object that should be serialized. It wouldn't work if the serialization needed to do dynamic is
-checks. This sounds like a big restriction, but in practice, serialization often happens on concrete types, or the type of the object can be determined without is
checks.
In some sense this static value
type would also be very similar to extension methods. They would almost be extension classes. This means that we should look at extension methods too, and make sure that they would fit.
The exploration doesn‘t really stop here, but let’s have a look at another, completely different alternative for serialization: macros.
If Dart had macros, then users could just write the value
type themselves.
define-macro value ID { CLASS_BODY }; Ast value(Identifier id, ClassBody body) { ... }
This is, admittedly, just a very hypothetical macro system, but it would immediately remove the need for a value
type that provides the toJson
method. In fact, a good macro system makes many language features unnecessary, since they can just be implemented on top of macros.
The disadvantage of macros is that they can get complicated to get right (even Scheme has two variants) and that they are almost too powerful. They would definitely be a major undertaking for Dart.
This section is getting long, so I won't cover other alternatives, but I hope I showed how language features can have dependencies that overlap. For each of the related features we would now explore other alternatives and see if they would conflict with a specific direction.
As mentioned in the beginning of this section, we value a consistent feel of the language. This means that these explorations are very important to make sure that things fit together. This is also the reason that features that work well in other languages might not be a great fit for Dart. We need to evaluate each feature and see how it behaves with the current and the future planned features of Dart.