Dart Language and Library Newsletter

2017-09-29 @floitschG

Welcome to the Dart Language and Library Newsletter.

Did You Know?

JSON Encoding

The dart:convert library can convert Dart objects to JSON strings. By default only the usual JSON types are supported, but with a toEncodable function (see the JsonEncoder constructor) any object can be encoded.

Example:

import 'dart:convert';

class Contact {
  final String name;
  final String mail;
  const Contact(this.name, this.mail);
}

main() {
  dynamic toEncodable(dynamic o) {
    if (o is DateTime) return ["DateTime", o.millisecondsSinceEpoch];
    if (o is Contact) return ["Contact", o.name, o.mail];
    return o;
  }

  var data = [
    new DateTime.now(),
    new Contact("Sundar Pichai", "sundar@google.com")
  ];

  var encoder = new JsonEncoder(toEncodable);
  print(encoder.convert(data));
}

This program prints: [["DateTime",1506703358743],["Contact","Sundar Pichai","sundar@google.com"]] (at least if you print it at the exact same time as I did).

Another option is to provide a toJson method to classes that should be encodable, but that has two downsides:

  1. It doesn't work for classes one can't modify (like the DateTime in our example).
  2. It forces a specific encoding. Users might want to encode the same object differently depending on where the encoding happens. For the same object, one RPC call might require different JSON than another.
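
For comparison, here is a minimal sketch of the toJson approach, assuming a Dart 2 SDK. The map layout returned by toJson and the nameOnly function are illustrative choices, not part of the original example. It also shows that passing a toEncodable function replaces the default one, so the same class can still be encoded differently at another call site.

```dart
import 'dart:convert';

class Contact {
  final String name;
  final String mail;
  const Contact(this.name, this.mail);

  // Picked up by the default toEncodable function of JsonEncoder.
  // The map layout is an illustrative choice.
  Map<String, String> toJson() => {"name": name, "mail": mail};
}

// Hypothetical alternative encoding for a call site that only needs the name.
dynamic nameOnly(dynamic o) => o is Contact ? o.name : o;

void main() {
  var contact = const Contact("Sundar Pichai", "sundar@google.com");

  // Uses Contact.toJson.
  print(const JsonEncoder().convert([contact]));
  // prints [{"name":"Sundar Pichai","mail":"sundar@google.com"}]

  // A custom toEncodable replaces the default one, so toJson is not called.
  print(JsonEncoder(nameOnly).convert([contact]));
  // prints ["Sundar Pichai"]
}
```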

When the JSON encoding fails, a JsonUnsupportedObjectError is thrown.

This error class provides rich information to help find the object that caused the failure.

Let's modify the previous example:

main() {
  var data = {
    "key": ["nested", "list", new Contact("Sundar Pichai", "sundar@google.com")]
  };
  var encoder = new JsonEncoder();
  print(encoder.convert(data));
}

Without a toEncodable function, the encoding now fails:

// On 1.24
Unhandled exception:
Converting object to an encodable object failed.


// On 2.0.0
Unhandled exception:
Converting object to an encodable object failed: Instance of 'Contact'

We recently (with Dart 2.0.0-dev) improved the output of the error message so it contains the name of the class that was the culprit of the failed conversion. For older versions, this information was already available in the cause field:

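main() {
  var data = {
    "key": ["nested", "list", new Contact("Sundar Pichai", "sundar@google.com")]
  };
  var encoder = new JsonEncoder();
  try {
    print(encoder.convert(data));
  } on JsonUnsupportedObjectError catch (e) {
    print("Conversion failed");
    print("Root cause: ${e.cause}");
  }
}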

which outputs:

Conversion failed
Root cause: NoSuchMethodError: Class 'Contact' has no instance method 'toJson'.
Receiver: Instance of 'Contact'
Tried calling: toJson()

The cause field is the error the JSON encoder caught while it tried to convert the object. The default toEncodable function tries to call toJson on any object that isn't one of the JSON types and, in this case, the Contact class doesn't have that method.

The unsupportedObject field contains the culprit. Aside from simply printing the object, this also makes it possible to inspect it with a debugger.
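
Because unsupportedObject holds the object itself rather than its string form, a handler can also look at its members. The following is a small sketch, assuming a Dart 2 SDK; describeFailure is a hypothetical helper name.

```dart
import 'dart:convert';

class Contact {
  final String name;
  final String mail;
  const Contact(this.name, this.mail);
}

// Hypothetical helper: tries to encode [data] and describes the culprit
// if the encoding fails.
String describeFailure(dynamic data) {
  try {
    const JsonEncoder().convert(data);
    return "ok";
  } on JsonUnsupportedObjectError catch (e) {
    var culprit = e.unsupportedObject;
    // The field holds the object itself, so its members can be inspected.
    if (culprit is Contact) return "unencodable contact: ${culprit.name}";
    return "unencodable object: $culprit";
  }
}

void main() {
  var data = {
    "key": ["nested", "list", const Contact("Sundar Pichai", "sundar@google.com")]
  };
  print(describeFailure(data)); // prints unencodable contact: Sundar Pichai
}
```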

Finally, there is a new field in Dart 2.0.0: partialResult.

// Only in Dart 2.0.0:
print("Partial result: ${e.partialResult}");

yields: Partial result: {"key":["nested","list",.

Usually, this really simplifies finding the offender in the source code.

Fixed-Size Integers

As part of our efforts to improve the output of AoT-compiled Dart programs we are limiting the size of integers. With Dart 2.0, integers in the VM will be 64-bit integers, and no longer arbitrary-sized. Among all the investigated options, 64-bit integers have the smallest migration cost, and provide the most consistent and future-proof API capabilities.

Motivation

Dart 1 has arbitrary-precision integers (aka bignums). On the VM, almost every arithmetic operation must check if the result overflowed and, if so, allocate the next-bigger number type. In practice this means that most numbers are represented as SMIs (Small Integers), a tagged number type, that overflow into “mint”s (medium integers), and finally overflow into arbitrary-size bigints.

In a jitted environment, the code for mints and bigints can be generated lazily during a bailout that is invoked when the overflow is detected. This means that almost all compiled code simply contains the SMI assembly and just checks for overflows. Only in the rare case where more than 31/63 bits (the SMI size on 32-bit and 64-bit architectures) are needed does the JIT generate the code for more number types. For precompilation it's not possible to generate the mint/bigint code lazily, so all code must be emitted eagerly, increasing the size of the output.
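
The check-then-promote scheme described above can be imitated at the library level. The sketch below is not how the VM implements it: it assumes Dart 2 on the VM (64-bit wrapping ints), uses BigInt from dart:core as a stand-in for the VM-internal bigint, and addPromoting is a hypothetical name.

```dart
// Adds two ints. Returns an int when the sum fits into 64 bits and a BigInt
// otherwise, mimicking the "overflow into a bigger type" behavior of Dart 1.
Object addPromoting(int a, int b) {
  var sum = a + b; // wraps around on the Dart 2 VM
  // Signed overflow happened iff both operands have the same sign and the
  // sign of the wrapped sum differs from it.
  var overflowed = ((a ^ sum) & (b ^ sum)) < 0;
  return overflowed ? BigInt.from(a) + BigInt.from(b) : sum;
}

void main() {
  print(addPromoting(1, 2)); // prints 3
  print(addPromoting(0x7FFFFFFFFFFFFFFF, 1)); // prints 9223372036854775808
}
```

Every call pays for the overflow check, and both the fast path and the slow path have to exist in the compiled output, which is the cost the paragraph above refers to.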

Also, fixed-size integers play very nicely with non-nullability. Contrary to the JIT, the AoT compiler can do global analyses (or simply use the provided information when non-nullability makes it into Dart) and optimize programs accordingly.

If an integer is known to be non-null, the AoT compiler can remove the null check. With fixed-size integers, the compiler can furthermore always pass the value in a register or on the stack (assuming both sides agree). Not only does this remove the SMI check, it also makes the GC faster, since it knows that the value cannot be a pointer. Without fixed-size integers the value could be in a mint or bigint box.

Experiments on Vipunen (an experimental Dart AoT compilation pipeline) have shown that the combination of non-nullability and fixed-size integers (unsurprisingly) yields the best results.

Semantics

An int represents a signed 64-bit two's complement integer. Integers have the following properties:

  • Integers wrap around, or worded differently, all operations are done modulo 2^64.
  • Integer literals must fit into the signed 64-bit range. For convenience, hexadecimal literals, such as 0xFFFFFFFFFFFFFFFF are also valid if they fit the unsigned 64-bit range.
  • The << operator is specified to shift “out” bits that leave the 64-bit range.
  • A new >>> operator is added to support “unsigned” right-shifts and is added as const-operation.
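
The properties above can be observed directly. This is a small sketch that assumes it runs on the Dart VM (not compiled to JavaScript) and uses an SDK recent enough to ship the >>> operator (Dart 2.14 or later).

```dart
void main() {
  var maxInt = 0x7FFFFFFFFFFFFFFF;

  // Wrap-around: adding 1 to the largest value gives the smallest one.
  print(maxInt + 1); // prints -9223372036854775808

  // A hexadecimal literal in the unsigned 64-bit range sets all 64 bits.
  print(0xFFFFFFFFFFFFFFFF == -1); // prints true

  // Bits shifted past bit 63 are dropped.
  print(1 << 64); // prints 0
  print(maxInt << 1); // prints -2

  // >> keeps the sign, >>> fills with zero bits.
  print(-1 >> 60); // prints -1
  print(-1 >>> 60); // prints 15
}
```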

Dart 2.0's integers wrap around when they over/underflow. We considered alternatives, such as saturation, unspecified behavior, or exceptions, but found that wrap-around provides the most convenient properties:

  1. It's efficient (one CPU instruction on 64-bit machines).
  2. It makes it possible to do unsigned int64 operations without too much code. For example, addition, subtraction, and multiplication can be done on int64s (representing unsigned int64 values) and the bits of the result can then simply be interpreted as unsigned int64.
  3. Some architectures (for example RISC-V) don't support overflow checks. (See https://www.slideshare.net/YiHsiuHsu/riscv-introduction slide 12).
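
To illustrate the second point, the sketch below does unsigned 64-bit arithmetic on plain ints and only reinterprets the bits for printing. It assumes the Dart 2 VM; toUnsignedString is a hypothetical helper built on BigInt from dart:core.

```dart
// Hypothetical helper: renders the 64 bits of [value] as an unsigned number.
String toUnsignedString(int value) =>
    BigInt.from(value).toUnsigned(64).toString();

void main() {
  // 2^64 - 2 written as a hex literal. As a signed int this is -2.
  var a = 0xFFFFFFFFFFFFFFFE;
  var sum = a + 1;
  print(sum); // prints -1
  print(toUnsignedString(sum)); // prints 18446744073709551615

  // Multiplication wraps the same way: 2^63 * 3 modulo 2^64 is 2^63.
  var product = 0x8000000000000000 * 3;
  print(toUnsignedString(product)); // prints 9223372036854775808
}
```

Operations such as division and comparison do depend on the sign and would need extra care; they are left out of this sketch.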

Compatibility & JavaScript

When compiling to JavaScript, we continue to use JavaScript's numbers. Implementing real 64-bit integers in JavaScript would degrade performance and make interop with existing JavaScript and the DOM much harder and slower.

With the exception of possible restrictions on the size of literals (only allowing literals that don't lose precision), there are no plans to change the behavior of numbers when compiling to JavaScript. These limitations are still under discussion.

Unfortunately, this also means that packages need to pay attention when developing for both the VM and the web. Their code might not behave the same on both platforms. These problems already exist today, and unfortunately there is no good solution.
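
A short sketch of such a difference, and of one common way to sidestep it by masking results to 32 bits. The add32 helper is a hypothetical name; the comments state what is expected on each platform.

```dart
// Hypothetical helper: keeps the result within 32 bits, where the VM and
// JavaScript numbers agree.
int add32(int a, int b) => (a + b) & 0xFFFFFFFF;

void main() {
  var big = 9007199254740992; // 2^53, exactly representable on both platforms

  // On the VM, big + 1 is a distinct integer. As a JavaScript number it is
  // rounded back to 2^53.
  print(big + 1 == big); // false on the VM, true when compiled to JavaScript

  print(add32(0xFFFFFFFF, 1)); // prints 0 on both platforms
}
```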

In terms of backwards compatibility, this change has very positive properties: it is non-breaking for all web applications, and only affects VM programs that use integers of 65+ bits. These are relatively rare. In fact, many common operations will get simpler on the VM, since users don't need to think about SMIs anymore. For example, users often bit-and their numbers to ensure that the compiler can see that a number will never need more than an SMI. A typical example would be the JenkinsHash which has been modified to fit into SMIs:

/**
 * Jenkins hash function, optimized for small integers.
 * Borrowed from sdk/lib/math/jenkins_smi_hash.dart.
 */
class JenkinsSmiHash {
  static int combine(int hash, int value) {
    hash = 0x1fffffff & (hash + value);
    hash = 0x1fffffff & (hash + ((0x0007ffff & hash) << 10));
    return hash ^ (hash >> 6);
  }

  static int finish(int hash) {
    hash = 0x1fffffff & (hash + ((0x03ffffff & hash) << 3));
    hash = hash ^ (hash >> 11);
    return 0x1fffffff & (hash + ((0x00003fff & hash) << 15));
  }
  ...
}

For applications that compile to JavaScript this function would still be useful, but in a pure VM/AoT context the hash function could be simplified or updated to use a performant 64-bit hash instead.
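
As an illustration of a hash that relies on 64-bit wrap-around, here is a sketch of the 64-bit FNV-1a hash. FNV-1a is a swapped-in example, not something the SDK uses or the text proposes, and the code assumes the Dart 2 VM: the hex literal for the offset basis does not fit into a JavaScript number.

```dart
import 'dart:convert';

// FNV-1a, 64-bit variant. Not part of the Dart SDK; shown only as an example
// of a hash that needs no masking when ints wrap at 64 bits.
int fnv1a64(List<int> bytes) {
  var hash = 0xcbf29ce484222325; // FNV offset basis
  for (var byte in bytes) {
    hash ^= byte & 0xFF;
    hash *= 0x100000001b3; // FNV prime; the product wraps modulo 2^64
  }
  return hash;
}

void main() {
  var hash = fnv1a64(utf8.encode("a"));
  // Print the 64 bits as an unsigned hexadecimal number.
  print(BigInt.from(hash).toUnsigned(64).toRadixString(16));
  // prints af63dc4c8601ec8c
}
```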

Comparison to other Platforms

Among the common and popular languages we observe two approaches (different from bigints and ECMAScript's Number type):

  1. int having 32 bits.
  2. architecture-specific integer sizes.

Java, C#, and all languages that compile onto their VMs use 32-bit integers. Given that Java was released in 1995 (JDK Beta), and C# first appeared in 2000, it is not surprising that they chose a 32-bit integer as default size for their ints. At that time, 64-bit processors were uncommon (the Athlon 64 was released in 2003), and a 32-bit int corresponds to the equivalent int type in the popular languages at that time (C, C++ and Pascal/Delphi).

32 bits are generally not big enough for most applications, so Java supports a long type. It also supports smaller sizes. However, contrary to C#, it only features signed numbers.

C, C++, Go, and Swift support a wide range of integer types, going from uint8 to int64. In addition to the specific types (imposing a specific size), Swift also supports Int and UInt which are architecture dependent: on 32-bit architectures an Int/UInt is 32 bits, whereas on a 64-bit architecture they are 64 bits.

C and C++ have a more complicated number hierarchy. Not only do they provide more architecture-specific types (short, int, long and long long), they also provide fewer guarantees. For example, an int simply has to be at least 16 bits. In practice shorts are exactly 16 bits, ints exactly 32 bits, and long longs exactly 64 bits wide. However, there are no reasonable guarantees for the long type. See the cppreference for a detailed table. This is why many developers use typedefs like int8 and uint32 instead of the builtin integer types.

Python 2 uses architecture-dependent types, too. Its int type is mapped to C's long type (thus guaranteeing at least 32 bits, but otherwise being dependent on the architecture and the OS). Its long type is an unlimited-precision integer.

Looking forward, Swift will probably converge towards a 64-bit integer world since more and more architectures are 64 bits. Apple's iPhone 5S (2013) was the first 64-bit smartphone, and 64-bit Android devices started shipping in 2014 with the Nexus 9.

Further Reading

The corresponding informal specification for this change contains more details and discussions of other alternatives that were considered.