Learn to Tango with D Excerpt: Chapter 2

Published June 05, 2008 by Bell, Igesund, Kelly, Parker, posted by Myopic Rhino
A movie director once made the observation that Charlie Sheen and Emilio Estevez both resemble their father, Martin Sheen, but look nothing like each other. Something similar might be said of programming languages that belong to the C family. They all share certain features of C, but beyond that, they have several differences. For the most part, the differences are to be found in expanded features, such as inherent support for object-oriented or generic programming, automatic memory management, or features intended to make programs more secure and robust. But at the core of each of these languages is a set of features that vary little from one language to the next. D, too, follows this pattern, but does so in a way that makes it stand out from the rest.

As you'll see in later chapters, some of D's more advanced features improve on ideas already implemented in other modern languages derived from C; some were inspired by languages outside the family; and a few are not to be found in any other mainstream programming language. While these features all contribute to D's unique identity, many users are first drawn to the language by the core feature set. In this chapter, we'll look at these core features.

We'll start out with declarations before moving on to the basic types. Next, we'll look at the different kinds of arrays and array operations. Then we'll get into flow-control constructs. Finally, we'll discuss functions and error handling.


Declarations
When declaring variables in D, the syntax varies depending on the type of the variable. When we discuss basic types, arrays, and pointers, we'll look at the syntax for variable declarations of each. Before we get that far though, it is helpful to understand some general rules about declarations in D.


Declaration Syntax and Variable Initialization
D is a statically typed language, which means that the type of a variable must be known at compile time. Therefore, D variable declarations usually require the type to be a part of the declaration. We say "usually," because there is one exception to this rule, which we'll get to in a moment. Declarations read from right to left and must be terminated by a semicolon, as in the following examples:

```d
int x;
int y = 1;
char[] myString = "Hello";
float[5] fiveFloats;
long* pointer = null;
```

This code declares one variable x of type int that is not explicitly initialized, and another variable y of type int that is explicitly initialized to 1. The variable myString is an example of declaring an initialized, dynamic array. All strings in D are arrays of one of three character types. fiveFloats is a static array, which is not explicitly initialized in this example. Finally, the pointer variable is an example of a pointer declaration in D.

Notice that we said that the variables x and fiveFloats are "not explicitly initialized," rather than "uninitialized." This is because no D variable is ever left uninitialized at the point of declaration. If you do not explicitly initialize the variable to some value, it will automatically be initialized to a specific default value by the compiler. The value used for initialization depends on the variable's type, but it is guaranteed to be the same for all variables of the same type. This is a very useful feature for debugging. You'll see the different default initialization values when we examine each type.
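The following minimal sketch makes the default values concrete. It declares variables without initializers and checks what they hold. It is an illustration, not a listing from the book. The name numbers is made up for the example. The checks use plain assert statements instead of Tango's Stdout, so the sketch depends on no library.

```d
// Module-scope declarations with no explicit initializers.
int x;
float[5] fiveFloats;
long* pointer;
int[] numbers;   // an illustrative dynamic array that starts out empty

void main()
{
    assert(x == 0);               // int defaults to 0
    assert(pointer is null);      // pointers default to null
    assert(numbers.length == 0);  // dynamic arrays default to empty
    foreach (f; fiveFloats)
        assert(f != f);           // each float defaults to NaN, which never equals itself
}
```

If any default differed, the program would stop with an assertion failure. A silent run means every check passed.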

Caution: Automatic variable initialization is intended to catch uninitialized variables, a common source of bugs. However, don't consider it an opportunity to avoid initializing variables yourself. As you'll see when we discuss floating-point numbers, it is not a good idea to get into the habit of relying on automatic variable initialization to do your job for you.

Another important part of a variable declaration is the name, or identifier, used to represent the variable. When creating any identifier (whether it is the name of a variable, function, class, struct, or anything else), you need to keep a few rules in mind:

  • Identifiers can begin with a letter, an underscore (_), or a universal alpha character.
  • The first character can be followed by any number of letters, digits, underscores, or universal alpha characters.
  • You can use as many underscores in an identifier as you like, but avoid using them for both the first and second characters: identifiers beginning with two underscores are reserved for use by the compiler.
  • Identifiers are case-sensitive, so x and X are not the same.

Note: Universal alphas are characters from several different languages. They are defined, using hexadecimal codes, in Appendix D of the C99 standard as being legal for use in C identifiers. Because D is derived from C, it accepts the same characters in identifier names.

An optional part of a variable declaration is the storage class. The storage class of a construct determines when it is allocated, where it is stored, how long it lives, how it is accessed, and, in some cases, how the compiler views it. D reserves several keywords for indicating the storage class of different language constructs. Because using a storage class alters the syntax of a declaration, in this chapter we are concerned only with the storage classes that affect variables and functions.

A storage class commonly used with individual variables is const. This tells the compiler that a given variable is to be treated as a constant expression, meaning that its value should not change at runtime. Another commonly used storage class is extern, which indicates that a variable is defined and initialized outside the current module, typically in separately compiled C code. This is frequently used when creating D modules that interact with C libraries. When using a storage class in a variable declaration, it must precede the type:

```d
const int x = 1;
```

However, in some cases, the type can be omitted:

```d
const y = 1;
```

Here, the type of y is omitted. This form uses a feature of D called automatic type inference. As long as a declaration contains a storage class, the type can be omitted, and the compiler will infer it automatically.

Because a storage class is intended to affect the variable in some way, D provides a special storage class, auto, for those cases where you want to use automatic type inference but don't want any storage class side effects. In other words, auto does not affect the variable in any way at all and indicates only that type inference is to be used. Using auto together with the type in the declaration is not an error, but has no meaning. Here's an example of using the auto storage class:

```d
auto x = 1;     // The type will automatically be inferred as int.
auto int y = 1; // auto has no effect here, since the type is specified.
```

There's quite a bit more to say about declarations. We'll get to the specifics for various constructs as the chapter progresses. First, we need to lay some more groundwork and talk about D's scoping rules.
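Before we get to scope, here is a small sketch of type inference with auto. The variable names are illustrative. Each inferred type is checked with is(typeof(...)) inside a static assert, so a wrong expectation about an inferred type would be reported at compile time rather than at runtime.

```d
// Module-scope declarations that rely on type inference.
auto count = 1;     // inferred as int
auto big = 1L;      // inferred as long, because of the L suffix
auto ratio = 1.5;   // inferred as double
auto letter = 'a';  // inferred as char
auto flag = true;   // inferred as bool

void main()
{
    // typeof yields the inferred type; static assert checks it while compiling.
    static assert(is(typeof(count) == int));
    static assert(is(typeof(big) == long));
    static assert(is(typeof(ratio) == double));
    static assert(is(typeof(letter) == char));
    static assert(is(typeof(flag) == bool));
}
```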
Declarations and Scope
The term scope describes the context in which a particular declaration resides. Scope affects variable declarations in two ways:

  • It determines when and how you can initialize your variables.
  • Because scope controls which variables are visible, it also affects how you can name your identifiers.

In this chapter, we are concerned with two basic types of scope: module scope and block scope. When you create a new D source file, you are working in module scope by default. You usually create a new block scope with each matching pair of curly braces you add to the file.

Note: Module scope is also referred to as global scope. Block scope is often called local scope. D also has a special scope that is unique to classes and structs, generally referred to as class scope. You'll learn about classes and structs in Chapter 3.

The following example shows module scope and block scope.

```d
// This is module scope. Here, we declare x and initialize it with a constant
// expression.
int x = 1;

void main()
{
    // A new block scope starts here. It is a child of the module scope.
    // y is declared inside main's block scope, meaning it is local to main.
    // It can see x, but x can't see it.
    int y = x;

    if(1 < 2)
    {
        // A new block scope starts here. It is a child of main's scope.
        // Because x is visible in main's scope, it is also visible here. And
        // because main's scope is this scope's parent, y is visible, too.
        // However, z is visible neither in main's scope nor in the module
        // scope.
        int z = x + y;
    } // The end of the if block scope
} // The end of main's block scope

void someFunc()
{
    // A new block scope starts here. It is a child of the module scope and a
    // sibling of main's scope.
    // This y is declared inside someFunc's scope. It can see x, but x can't
    // see it. Also, this y and the y in main's scope are not visible to each
    // other.
    int y = x;
} // The end of someFunc's block scope
```

Note: The example of using module and block scope employs some features that we haven't yet discussed, such as functions. For now, you just need to focus on the meaning of scope demonstrated by the code. You'll learn about the other features later in this chapter and in upcoming chapters.

In this example, the variable x is declared outside any curly braces. This indicates that it is in module scope. The curly braces in the main function introduce a new block scope. It is in this scope that a variable y resides. Similarly, the function someFunc creates a new block scope with its own y variable.

The code comments in the listing explain scope visibility. Essentially, children can see identifiers that are visible in, or declared in, their parent, but parents can never see identifiers declared in their children. Neither can siblings see each other's identifiers.

As noted, both main and someFunc declare a variable y. They can do this because neither scope is visible to the other. However, the if block inside main could not declare its own variable named y, because main's y is already visible there. Identifier names must be unique within the scope in which they are declared. Within a function, a local declaration also may not hide another local declaration from an enclosing block; if you try, the compiler reports an error. Module-scope names are treated differently. main or someFunc could declare a local variable named x, which would hide the module-scope x inside that function. The module-scope variable would still be reachable by prefixing its name with a dot, as in .x.
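The visibility rules can be sketched in compilable form. The functions first and second are hypothetical stand-ins for main and someFunc from the listing. A bare pair of braces plays the role of the if block. The commented-out line shows a declaration that the compiler would reject.

```d
int x = 1;                   // module scope

int first()                  // stands in for main in the listing
{
    int y = x;               // local to first(); it can see the module-scope x
    int result;
    {
        // A nested block scope: both x and y are visible here.
        int z = x + y;
        // int y = 5;        // Error if uncommented: a local y is already visible
        result = z;
    }
    // z is not visible here; it belonged to the nested block.
    return result;
}

int second()                 // stands in for someFunc in the listing
{
    int y = x * 10;          // a different y: sibling scopes cannot see each other
    return y;
}

void main()
{
    assert(first() == 2);
    assert(second() == 10);
}
```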

This example also explicitly initializes the variables it declares. Within a block scope, there aren't any restrictions on how you explicitly initialize a variable. At module scope, however, you can initialize variables only with constant expressions. Using a nonconstant expression to explicitly initialize a variable at module scope results in a compiler error. The following shows examples of legal and illegal module scope variable initialization.

```d
int x = 1;     // OK: 1 is a constant expression.
int y = x;     // Error: x is not a constant expression.
int z = 1 + 1; // OK: 1 + 1 is a constant expression.
int a = 1 + x; // Error: 1 is constant, x is not, so 1 + x is a
               // nonconstant expression.
```

Because both y and a depend on x, which is a nonconstant expression, neither can be explicitly initialized at module scope. Instead, they should be assigned their values elsewhere in the program in a block scope, such as a function. Assignments, other than those made at the point of declaration, are illegal at module scope. Again, none of these restrictions apply to block scope.

Tip: One way to "fake" initialization with nonconstant expressions at module scope is to declare the variable as normal, without explicitly initializing it, and then assign the variable its value inside a static module constructor. A module constructor is a unique D feature that is quite handy for this purpose. You will learn about module constructors in Chapter 4.
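Here is a minimal sketch of the approach described in the tip. It assumes the static this() form of a module constructor, which Chapter 4 covers properly. The module-scope variables y and a are declared without initializers. The constructor, which runs once before main, assigns them values computed from x.

```d
int x = 1;       // OK: 1 is a constant expression
int z = 1 + 1;   // OK: 1 + 1 is folded at compile time
int y;           // declared without an initializer; starts as int.init (0)
int a;           // writing int a = 1 + x; here would be rejected

// A module constructor runs once, before main. Its body is an ordinary
// block scope, so nonconstant expressions are allowed.
static this()
{
    y = x;
    a = 1 + x;
}

void main()
{
    assert(z == 2);
    assert(y == 1);
    assert(a == 2);
}
```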
Constant and Nonconstant Expressions
The term expression is often used in language specifications. It is bandied about by compiler writers, who tend to use a great deal of vocabulary that average programmers forgot about after their last comp-sci course, or never knew at all. The D Programming Language specification defines the term as follows:

Expressions are used to compute values with a resulting type. These values can then be assigned, tested, or ignored. Expressions can also have side effects.

Any single value, such as 1 or x, is an expression. An arithmetic operation, such as 1+1 or 1+x, is an expression. A function call that returns a value is an expression. All of these can be used anywhere the language specification says an expression is legal, such as on the right side of an assignment to a variable.

A constant expression is one whose value can be known at compile time and is never going to change. For example, 1 is a constant because it is always 1, just as 1+1 always results in the value 2. Because the values of constant expressions never change, the compiler can make certain optimizations with constant expressions that it otherwise wouldn't be able to do.

One popular feature of D is compile-time function execution (CTFE). This feature is built around constant expressions, allowing programmers to create functions and other expressions that are evaluated at compile time rather than at runtime. This allows the D compiler to make optimizations that would not be possible otherwise. You'll learn about CTFE in Chapter 5.

A nonconstant expression is one whose value can change, and usually does. For example, the declaration int x = 1 assigns a constant expression to the variable x, meaning that the initial value of x is known at compile time, but x itself is nonconstant because it can change at runtime. The compiler is unable to make the same optimizations for x that it could make for a constant. This also means that x cannot be used where a constant expression is required.
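One place where the difference matters is the length of a static array, which must be a constant expression. The following sketch uses the illustrative names size, buffer, and n. It contrasts a const declaration with an ordinary variable that merely starts out with a constant value.

```d
const int size = 2 * 3;  // a constant: its value is fixed at compile time
int[size] buffer;        // OK: a static array length must be a constant expression

void main()
{
    int n = 6;           // n starts with a constant value, but n itself can change
    // int[n] bad;       // Error if uncommented: n is not a constant expression
    static assert(buffer.length == 6);
    n = n + 1;           // legal, which is exactly why n can't size a static array
    assert(n == buffer.length + 1);
}
```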

For the basic types, which we'll discuss next, the declaration syntax doesn't change from what we've looked at so far. Things do change a bit for pointers, arrays, and functions, as you'll see in the sections about those constructs.


Basic Types
D supports the same basic types as other languages in the C family, but goes beyond those languages with its own unique twists. For example, D has three character types, nine floating-point types, and two reserved 128-bit integral types that currently don't exist in other C-family languages. All of these are a core part of the language, not types defined in a library. In practice, most programmers do not need to be concerned with every data type D supports, but some programmers will put them all to good use. For example, numerical or scientific application programmers will appreciate the variety of floating-point types available.

Something that nearly all of D's basic data types have in common is that the specification explicitly defines the bit size of each. One of the primary benefits of such an approach is that data of a given type will remain the same size across platforms. But there are times when a fixed-size type is not the best option, and you may require a type that is best suited for the current platform. Types with sizes that vary across CPU architectures are available as part of Tango's C modules (which are briefly described in Chapter 8). In fact, the C type size_t is available automatically, without the need to import any additional modules.

In this section, we'll look at integral types, floating-point types, and character types. Most programming books categorize characters as integers. However, characters in D are quite different from other integers and deserve their own section. Before we discuss any of the data types, however, we should first talk about properties.


Properties
All data types in D expose certain properties that can be queried for specific information about the type or, in some cases, used to perform a specific operation on an instance of the type. For example, all types have the sizeof property, which gives the size, in bytes, of a given type. Arrays have a special sort property, which can be used to sort the contents of the array in place. Properties can be queried by using dot notation with the name of the type, as in int.sizeof.

One potentially confusing point about properties is that they can also be accessed through a variable, or instance, of a certain type. If you have a variable x declared as type int, you can query its sizeof property in the same manner: x.sizeof. Keep in mind that, depending on the property, the result you get by querying through the type may not be the same as the result you get by querying through the variable. Furthermore, some default properties, such as the sort property of arrays, work only on type instances and not on the types themselves. In the majority of cases, however, the default type properties and instance properties are the same. Table 2-1 lists properties that are common to all data types and their instances.

Table 2-1. Properties Common to All Data Types and Their Instances

| Property | Description |
| --- | --- |
| init | The default initialization value |
| sizeof | The size in bytes |
| alignof | The byte boundary upon which the type or instance is aligned |
| mangleof | A string representing the "mangled" name |
| stringof | A string representing the name of the type or instance as it appears in source code |

Note: When compilers parse source code files, they usually convert the names of functions, variables, and other entities into an internal format that incorporates information about the type or signature of the entity. This form is called the mangled name.

The following code queries each of these properties for the type int and for an instance of that type.

```d
import tango.io.Stdout;

void main()
{
    Stdout.formatln("int.init is {}", int.init);
    Stdout.formatln("int.sizeof is {}", int.sizeof);
    Stdout.formatln("int.alignof is {}", int.alignof);
    Stdout.formatln("int.mangleof is '{}'", int.mangleof);
    Stdout.formatln("int.stringof is '{}'", int.stringof);

    int x;
    Stdout.formatln("x.init is {}", x.init);
    Stdout.formatln("x.sizeof is {}", x.sizeof);
    Stdout.formatln("x.alignof is {}", x.alignof);
    Stdout.formatln("x.mangleof is '{}'", x.mangleof);
    Stdout.formatln("x.stringof is '{}'", x.stringof);
}
```

The result of compiling and executing this code is as follows:

```
int.init is 0
int.sizeof is 4
int.alignof is 4
int.mangleof is 'i'
int.stringof is 'int'
x.init is 0
x.sizeof is 4
x.alignof is 4
x.mangleof is 'i'
x.stringof is 'x'
```

As you can see from this example, the only property that shows different results for the type and instance queries is stringof, which is inherently different from instance to instance due to what it represents. You may encounter other properties that differ between a type and an instance. So before you use a particular property, make sure you understand what it represents so that you can make the appropriate query.
Integral Types
Integral types are types that represent integer values. Nearly all integral types in D come in two flavors:

  • Signed types can represent both positive and negative numbers.
  • Unsigned types can represent only nonnegative numbers (zero and up). They have the same name as the corresponding signed type, but with a u prefix.

Table 2-2 shows each integral type, its size in both bits and bytes, and the minimum and maximum values it can represent.

Table 2-2. Integral Types

| Type | Size (Bits) | Size (Bytes) | Min | Max |
| --- | --- | --- | --- | --- |
| byte | 8 | 1 | -128 | 127 |
| ubyte | 8 | 1 | 0 | 255 |
| short | 16 | 2 | -32768 | 32767 |
| ushort | 16 | 2 | 0 | 65535 |
| int | 32 | 4 | -2147483648 | 2147483647 |
| uint | 32 | 4 | 0 | 4294967295 |
| long | 64 | 8 | -9223372036854775808 | 9223372036854775807 |
| ulong | 64 | 8 | 0 | 18446744073709551615 |

The bool data type is not listed in Table 2-2 because it has only two possible values and is neither signed nor unsigned. A bool is 1 byte in size and can be either true or false. It can be converted to any integral type, in which case true is converted to 1 and false to 0. The default value of a bool is false.
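The conversions described for bool can be checked with a few asserts, and so can the fact that unsigned types have no negative values. This minimal sketch uses made-up variable names. The wraparound when decrementing a uint reflects D's modular arithmetic for unsigned types.

```d
void main()
{
    bool flag;                  // bool defaults to false
    assert(flag == false);

    int one = true;             // true converts to 1
    int zero = false;           // false converts to 0
    assert(one == 1 && zero == 0);

    // Unsigned types have no negative values: decrementing 0 wraps around
    // to the largest value the type can hold.
    uint u = 0;
    u--;
    assert(u == 4294967295u);
}
```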

Also missing from Table 2-2 are cent and ucent, both of which are intended to be 128 bits, or 16 bytes, in size. Currently, neither is implemented. However, the keywords cent and ucent are reserved for future use.

With the exception of long and ulong, the default initialization value of all integral types is 0. In a perfect world, the default would be an invalid value. Unfortunately, no value in the range an integral type can hold is inherently invalid. So, we have to settle for 0.

In D, an integer literal without a suffix is of type int, as long as its value fits in an int; a decimal literal too large for an int is given type long. D supports a special suffix, L, that can be appended to an integer literal to make it type long instead of int. The initialization value of both long and ulong is 0L.

Note: The distinction may seem minor, but internally there is a big difference between 0 and 0L. The former is a 32-bit value and is treated as an int, whereas the latter is a 64-bit value and is treated as a long. Appending L to any constant values you assign to long variables is a good habit.

In addition to the properties listed in Table 2-1, integral types expose the two properties shown in Table 2-3. The values these properties return are listed in the Min and Max columns of Table 2-2.

Table 2-3. Properties Specific to Integral Types

| Property | Description |
| --- | --- |
| min | Minimum value this type can represent |
| max | Maximum value this type can represent |
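The literal typing rules and the min and max properties can all be checked at compile time with static assert. This sketch relies only on the core language. The values come from Table 2-2. The 3_000_000_000 case illustrates that a decimal literal too large for an int is given type long.

```d
// Integer literals get their type from their suffix and their magnitude.
static assert(is(typeof(1) == int));
static assert(is(typeof(1L) == long));             // the L suffix forces long
static assert(is(typeof(3_000_000_000) == long));  // too large for int, so long

// min and max match the Min and Max columns of Table 2-2.
static assert(byte.min == -128 && byte.max == 127);
static assert(ushort.max == 65535);
static assert(int.min == -2147483648);
static assert(long.max == 9223372036854775807L);
static assert(ulong.max == 18446744073709551615UL);

void main()
{
    long big = 0L;               // the suffix habit recommended in the note
    assert(big == long.init);
}
```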
Floating-Point Types
In layman's terms, a floating-point type is any type whose values contain a decimal point. In most programming languages, the built-in floating-point types are used to represent what mathematicians call real numbers. The real numbers include the integers and all of the numbers between them. For example, the number 1.011 is a real number in mathematical terms and a floating-point value in programming terms.

Although D supports nine floating-point types, the average user of D will probably be interested in only two of the three basic floating-point types: float and double. These two types, and one more called real, represent real numbers in the mathematical sense. But D goes beyond built-in support for real numbers. It offers three floating-point types that represent imaginary numbers and three more that represent complex numbers. If you don't know the difference between real, imaginary, and complex numbers, you'll likely have no use for these other types.

After reading about the integral types in the previous section, you might think that the default initializer for floating-point types is 0.0. Many newcomers to D make that incorrect assumption. The default initializer for floating-point types is actually a form of a special value called Not a Number (NaN), which helps to detect errors caused by uninitialized variables. Each floating-point type has a nan property, which holds the value used to initialize it. Outside initialization, if a calculation leaves you with a NaN value, you can be fairly certain that you have a programming error.
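A short sketch shows why NaN is a useful default. A variable that was never given a value poisons every calculation it takes part in. The result is easy to detect, because NaN is the only value that compares unequal to itself. The variable names are illustrative. A library NaN test could replace the d != d comparison, but this version avoids any imports.

```d
void main()
{
    double d;                   // never explicitly initialized: holds NaN
    assert(d != d);             // NaN is the only value not equal to itself

    double result = d * 2.0 + 1.0;
    assert(result != result);   // NaN spreads through every calculation

    double fixed = 0.0;         // an explicit initializer avoids the problem
    assert(fixed * 2.0 + 1.0 == 1.0);
}
```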

Table 2-4 lists all of D's floating-point types, their size in bits and bytes, and the default initializer as defined in the language specification.

Table 2-4. Floating-Point Types

| Type | Size (Bits) | Size (Bytes) | Initializer |
| --- | --- | --- | --- |
| float | 32 | 4 | float.nan |
| double | 64 | 8 | double.nan |
| real | Platform dependent | Platform dependent | real.nan |
| ifloat | 32 | 4 | float.nan * 1.0i |
| idouble | 64 | 8 | double.nan * 1.0i |
| ireal | Platform dependent | Platform dependent | real.nan * 1.0i |
| cfloat | 64 | 8 | float.nan + float.nan * 1.0i |
| cdouble | 128 | 16 | double.nan + double.nan * 1.0i |
| creal | Platform dependent | Platform dependent | real.nan + real.nan * 1.0i |

The types float and double are the standard fare found in any C-family programming language. The real type is defined to be the largest floating-point size implemented in hardware on the target platform. For example, on x86 platforms, a real uses the 80-bit (10-byte) extended-precision format, although real.sizeof may report a larger, padded size on some systems. The types prefixed with an i represent imaginary numbers. The types prefixed with a c represent complex numbers.

In addition to the properties listed in Table 2-1, floating-point types expose all of the properties shown in Table 2-5. The average programmer will likely be concerned with only the nan, max, and min properties.

Table 2-5. Properties Specific to Floating-Point Types

| Property | Description |
| --- | --- |
| infinity | Positive infinity, a value larger than any finite value the type can represent |
| nan | Value used to represent NaN |
| dig | Number of decimal digits of precision |
| epsilon | Smallest increment to the value 1 |
| mant_dig | Number of bits in the mantissa |
| max_10_exp | Maximum power of 10 exponent this type can represent |
| max_exp | Maximum power of 2 exponent this type can represent |
| min_10_exp | Minimum power of 10 exponent this type can represent as a normalized value |
| min_exp | Minimum power of 2 exponent this type can represent as a normalized value |
| max | Largest value this type can represent that is not infinity |
| min | Smallest normalized value this type can represent that is not 0 |
| re | Real part of the number |
| im | Imaginary part of the number |

Tip: If you find some of the terminology used in this section puzzling, you might want to visit the Wikipedia page about floating-point numbers. There, you can learn what a mantissa is, what normalization means, and a great deal more. At the very least, it should give you a better understanding of what some of the floating-point properties represent.

Another important thing to mention here is that the re and im properties are instance properties, rather than type properties. If you try to access either of them through a type, such as float.re, the compiler will report an error. This is because, by definition, both properties require a number to exist. There's no such thing as the real or imaginary part of the type float, but these parts do exist in the number 1.11. So assuming that you have a variable of type float named f, you can access both properties through it: f.re and f.im.
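Several of the properties in Table 2-5 have well-known values for the IEEE formats behind float and double, so they can be checked with static assert. The following is a minimal sketch that sticks to float and double. The variable f is illustrative.

```d
// A few Table 2-5 properties, checked at compile time.
static assert(float.dig == 6);
static assert(double.dig == 15);
static assert(float.mant_dig == 24);
static assert(double.mant_dig == 53);
static assert(double.infinity > double.max);  // infinity lies beyond the largest finite value
static assert(1.0f + float.epsilon > 1.0f);   // epsilon is a visible step above 1

void main()
{
    float f = 1.5f;
    assert(f.max == float.max);  // properties can also be queried through an instance
}
```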

You can read more about D's floating-point types, including the rationale behind including complex and imaginary types in the language, here.


Character Types
Whereas most languages in the C family support only one character type, D supports three. What makes D's character types different from integrals is that they don't necessarily represent integer values. Instead, they are intended to represent Unicode text. Several different Unicode encodings exist, and D's character types provide built-in support for the three most common ones: a char holds a UTF-8 code unit, a wchar holds a UTF-16 code unit, and a dchar holds a UTF-32 code unit. Only a dchar is guaranteed to hold a complete Unicode code point; in UTF-8 and UTF-16, a single code point may need more than one code unit.

Another big difference between integrals and characters is the default initializer. Remember that the goal of automatic initialization is to aid in debugging. Ideally, all built-in data types would be initialized to an invalid value. Integrals do not have invalid values. Floating-point types do (NaN). Because D's character types represent Unicode text, they also have values that are invalid: certain values are not valid Unicode. D uses one such value as the default initializer for each character type.

Table 2-6 lists each character type, its size in both bits and bytes, the Unicode encoding it represents, its minimum and maximum values, and its default initializer.

Table 2-6. Character Types

| Type | Size (Bits) | Size (Bytes) | Unicode Encoding | Min | Max | Initializer |
| --- | --- | --- | --- | --- | --- | --- |
| char | 8 | 1 | UTF-8 | 0 | 255 | 0xFF |
| wchar | 16 | 2 | UTF-16 | 0 | 65535 | 0xFFFF |
| dchar | 32 | 4 | UTF-32 | 0 | 1114111 (0x10FFFF) | 0x0000FFFF |

Individual characters can be used anywhere an integral can. Each character type also exposes the same properties that integrals do, as listed in Table 2-3.

When we look at strings later in this chapter (in the "Strings" section), you'll see that they are sequences of characters. D programmers generally work with strings more frequently than with individual characters, but character types come in handy when you're modifying strings or searching them for a specific character. A character variable can be assigned any integer value that is valid for an integral of the same bit size, such as char c = 10, or a single-quoted character literal, such as char c = 'a'.
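The following sketch checks the default initializers from Table 2-6 and shows characters behaving like integers. It also counts how many elements one code point takes in each encoding. For UTF-16 and UTF-32 it uses string literals with the w and d suffixes. The particular characters, U+00E9 and U+1F600, are arbitrary choices for illustration.

```d
// Default initializers from Table 2-6: values that are not valid characters.
static assert(char.init == 0xFF);
static assert(wchar.init == 0xFFFF);
static assert(dchar.init == 0x0000FFFF);

// A string's length counts elements of its character type, so one code point
// can take a different number of elements in each encoding.
static assert("\u00E9".length == 2);       // UTF-8: two chars
static assert("\u00E9"w.length == 1);      // UTF-16: one wchar
static assert("\U0001F600"w.length == 2);  // UTF-16: a surrogate pair
static assert("\U0001F600"d.length == 1);  // UTF-32: always one dchar

void main()
{
    char c = 'a';
    assert(c == 97);            // a character compares like an integer
    char ten = 10;              // and can be assigned an integer value
    assert(ten + 1 == 11);
}
```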

That wraps up the three kinds of basic data types. Now it's time to take a quick look at pointers, and then move on to arrays.


Pointers
In short, a pointer is a variable that represents a memory address rather than actual program data. Often, the address "points" to program data, such as the beginning of a series of integers or of values of some other type. We mention pointers here because those with experience in C or C++ need to know that pointers work a bit differently in the world of D.

First, pointer declarations are subtly different in D than in C. Consider the following line of code:

```d
int *x, y;
```

This code is valid in both the C and D languages, but has different results in each. In C, x is a pointer to int, and y is an int, not a pointer. In D, both x and y are pointers to int. This is a subtle, but very important, distinction. In D, the asterisk belongs to the type rather than to an individual variable: when you use D's typeof operator on an int, the type returned is int, but use the typeof operator on a pointer to int, and the type returned is int*. Because of this, it is more common for D programmers to move the asterisk over to the left, transforming the preceding declaration into the following:

```d
int* x, y;
```

This syntax makes it clear that you are declaring two int pointers. It is good practice to follow this convention for all pointer declarations.

Another thing to know about D pointers is that there is no -> operator. Struct and class pointers are manipulated using dot notation. However, you can still use the syntax *x to dereference the pointer.
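A compilable sketch covers both pointer points. The declaration int* p, q; gives two pointers, and a struct is reached through a pointer with plain dot notation. Point and its fields are hypothetical names for this example.

```d
struct Point
{
    int x;
    int y;
}

int* p, q;   // in D, both p and q have type int*

void main()
{
    int a = 1, b = 2;
    p = &a;
    q = &b;
    *p = 10;                        // dereference with the unary * operator
    assert(a == 10 && *q == 2);

    Point pt;
    Point* pp = &pt;
    pp.x = 3;                       // no -> operator: dot notation goes through the pointer
    (*pp).y = 4;                    // an explicit dereference also works
    assert(pt.x == 3 && pt.y == 4);
}
```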

Note: If you have no idea what pointers are, a good place to start learning about them and some of the terminology used in this section is the Wikipedia page about pointers. Not only does this page discuss pointers in general terms, but it also gives a little background about pointer usage in different languages.

Finally, because D has built-in garbage collection, you need to adhere to a few restrictions when using pointers that point to garbage-collected memory. Most of these restrictions relate to pointer tricks that C programmers have used over the years, such as storing extra data in the low-order bits of a pointer. In a nutshell, don't do anything that depends on a garbage-collected object staying at the same address, and don't disguise a pointer so that the collector can't recognize it. Pointers that point to memory not managed by the garbage collector are free from these restrictions.


Arrays
An array is a sequence of data of the same type, usually stored in a contiguous block of memory; depending on the kind of array, its length may be fixed or may vary. D supports four types of arrays as part of the core language: static arrays, dynamic arrays, strings, and associative arrays. We'll look at each in turn. We'll also discuss array operations. But first, let's examine some things that all arrays have in common.

Conceptually, you can think of a D array as an item that is made up of two components: a pointer and a length. The pointer points to the memory address that contains the first element in the array, and the length represents the number of elements in the array. Both the length and the pointer are accessible as properties. Because each array knows how many elements it contains, the compiler can perform automatic bounds checking (DMD does this by default, but it can be turned off by passing -release on the command line).
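The pointer-and-length view of an array is visible through the ptr and length properties. In this minimal sketch, the name values is illustrative. Indexing through ptr is raw pointer access, so it bypasses the bounds check that indexing the array itself receives.

```d
int[4] values = [10, 20, 30, 40];

void main()
{
    assert(values.length == 4);       // the length component
    int* first = values.ptr;          // the pointer component
    assert(*first == 10);
    assert(first[3] == values[3]);    // indexing through ptr skips the bounds check

    // values[4] is out of bounds: a constant index like this is rejected at
    // compile time, and an out-of-range index computed at runtime raises an
    // error while bounds checking is enabled.
}
```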

D supports the same syntax as C for array declarations, called postfix syntax, but only for backward compatibility.

```d
// C-style, or postfix, declarations (each line is an alternative way to declare x)
int x[3];        // Declares an array of 3 ints
int x[3][5];     // Declares 3 arrays of 5 ints
int (*x[5])[3];  // Declares an array of 5 pointers to arrays of 3 ints
```

The preferred syntax is called prefix syntax.

```d
// D-style, or prefix, declarations (each line is an alternative way to declare x)
int[3] x;        // Declares an array of 3 ints
int[3][5] x;     // Declares 5 arrays of 3 ints
int[3]*[5] x;    // Declares an array of 5 pointers to arrays of 3 ints
```

The syntactical differences between the two styles are obvious, but prefix multidimensional array declarations can be confusing to those with a C background. You'll notice that the order is reversed from the postfix declarations. However, indexing values from the multidimensional arrays is done in the same way, no matter how they are declared, as in the following example.

```d
int x[5][3];   // Postfix declaration of 5 arrays of 3 ints
int[3][5] y;   // Prefix declaration of 5 arrays of 3 ints
x[0][2] = 1;   // The third element of the first array is set to 1.
y[0][2] = 2;   // The third element of the first array is set to 2.
```

As you can see, postfix and prefix come into play only in array declarations. You index them both in the same way.
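The reversed order of prefix declarations is easy to get wrong, so this small sketch lets the compiler confirm which dimension is which. The name grid is illustrative, and only prefix syntax is used.

```d
int[3][5] grid;   // prefix syntax: read right to left, 5 arrays of 3 ints

void main()
{
    static assert(grid.length == 5);              // the outer dimension is the [5]
    static assert(is(typeof(grid[0]) == int[3])); // each element is an int[3]
    static assert(grid.sizeof == 15 * int.sizeof);

    grid[4][2] = 7;   // the third element of the last array
    assert(grid[4][2] == 7);
}
```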
        Static Arrays
Static arrays are the simplest to understand. In fact, all of the arrays declared in the previous examples are static. A static array has a fixed length that is established at compile time and can never change during program execution. Static arrays declared as local variables are allocated on the stack. Table 2-7 lists the properties that are unique to static arrays (remember that these are in addition to the properties listed in Table 2-1, which are common to all data types).

| Property | Description |
|----------|-------------|
| length | Number of elements in the array; cannot be modified |
| ptr | Pointer to the first element in the array |
| dup | Creates and returns a dynamic array that is an exact duplicate of the array |
| reverse | Reverses the array in place and returns the array |
| sort | Sorts the array in place and returns the array |

Table 2-7. Properties Specific to Static Arrays

Unlike most of the properties that you have seen so far, each of those listed in Table 2-7 is exclusively an instance property, and three of the static array properties have side effects. The dup property allocates enough memory to hold an exact duplicate of the array. The reverse and sort properties change the order of the items in the array. While these properties are very convenient, be aware that they could have performance implications when used on large arrays or in performance-critical code.

        Note: The sizeof property of a static array (see Table 2-1) returns the length of the array multiplied by the number of bytes per array element. This means that the result varies based on the type and number of elements in the array.
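The following is a minimal sketch of the Table 2-7 properties and the sizeof rule from the note. It assumes a current D2 compiler. One substitution to flag: the built-in sort and reverse array properties were later removed from D2, so this version calls Phobos' std.algorithm sort and reverse on a slice of the static array instead.

```d
import std.algorithm : reverse, sort;
import std.stdio;

void main()
{
    int[4] s = [3, 1, 4, 2];

    int[] copy = s.dup;   // dup: a new dynamic array holding the same elements
    sort(s[]);            // sorts s in place -> [1, 2, 3, 4]
    reverse(s[]);         // reverses s in place -> [4, 3, 2, 1]

    writeln(s, " ", copy);            // copy keeps the original order
    writeln(s.length, " ", s.sizeof); // 4 elements; 4 * int.sizeof = 16 bytes
}
```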
        Dynamic Arrays
        Unlike with static arrays, the length of a dynamic array does not need to be known at compile time. Dynamic arrays are allocated on the heap. Because the length is not fixed, a dynamic array can be resized as needed. Up to now, all of the properties you have seen have been read-only. The length property of dynamic arrays is actually writable. To resize the array, simply set its length property to the new size.

        You make an array dynamic by declaring it without any numbers to indicate the size of the array. Instead, you use empty brackets, as follows:

```d
int x[];   // Postfix dynamic array declaration
int[] y;   // Prefix dynamic array declaration
```

Neither of the arrays in this example allocates any space for its elements. Both arrays are empty, meaning each has a length of 0 and a null pointer. You can allocate space for a dynamic array in three ways:

• Use D's new keyword.
• Explicitly set the length property to the number of elements required.
• Use an array literal. An array literal is a sequence of values contained within brackets, such as [0,2,5,6] or [2.0, 5.0, 3.0].

The following shows examples of these methods.

```d
int[] x = new int[10];    // A dynamic array of 10 ints, all initialized to int.init
int[] y;                  // An empty dynamic array of ints
y.length = 10;            // y can now hold 10 ints.
int[] z = [0,1,2,3,4];    // A dynamic array that holds 5 ints initialized to
                          // the values 0, 1, 2, 3, and 4
int[5] z1 = [0,1,2,3,4];  // A static array of 5 ints initialized to the values
                          // 0, 1, 2, 3, and 4
int[] a;                  // An empty dynamic array
a = new int[10];          // a can now hold 10 ints. All values are now set
                          // to int.init
a.length = 5;             // a has been resized to hold only 5 ints.
a = [0,1,2,3,4,5];        // a now has a length of 6 and contains the values
                          // 0, 1, 2, 3, 4, and 5
```

No matter how the space for a dynamic array is initially allocated, whether via new or its length property, the array can be resized at any time. Typically, you resize an array by adjusting its length property, by appending new values to it, by copying one array to another, or by assigning an array literal to the reference. If the new length is greater than the old length, the array is extended, in place if its memory block has room or by reallocating and copying the elements otherwise. If the new length is less than the old length, no memory operations are performed, meaning nothing is allocated, reallocated, or freed. In addition to the properties listed in Table 2-1, dynamic arrays expose all of the properties that static arrays do, as listed in Table 2-7. The only difference is that the length property of a dynamic array is not constant, so it is both readable and writable.

          Note: The sizeof property of a dynamic array (see Table 2-1) returns the size, in bytes, of the array reference rather than the amount of memory used by the array.
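The allocation and resizing rules above can be checked with a short program. This is a minimal sketch assuming a current D2 compiler and Phobos' std.stdio. On such compilers, growing an array through its length property fills the new elements with int.init (0).

```d
import std.stdio;

void main()
{
    int[] x = new int[10];        // 10 ints, all int.init
    int[] y;                      // empty: length 0 and a null ptr
    writeln(y.length, " ", y.ptr is null);

    y.length = 10;                // y can now hold 10 ints
    int[] z = [0, 1, 2, 3, 4];    // allocated from an array literal

    z.length = 3;                 // shrink: z is now [0, 1, 2]
    z.length = 5;                 // grow: new elements are int.init -> [0, 1, 2, 0, 0]
    z ~= 9;                       // appending also grows the array

    writeln(x.length, " ", y.length, " ", z);
    // sizeof measures the (ptr, length) reference, not the elements.
    writeln(z.sizeof == 2 * size_t.sizeof);
}
```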
          Strings
          In D, strings are not so much a separate array type as they are a special case of normal static and dynamic arrays. Strings are arrays that are of type char, wchar, or dchar and represent UTF-8, UTF-16, and UTF-32 sequences, respectively.

String literals can be used to initialize a new string. A string literal, frequently just referred to as a string, is a sequence of characters enclosed in double quotation marks, such as "Hello World". The characters of a string literal are immutable, so a string that refers directly to a literal must not be modified. Because strings are arrays, they can be static or dynamic.

          Here are examples of initializing strings:

```d
char[] s1 = "abcd";   // s1 is a dynamic array that refers to the literal's data.
s1[0] = 'x';          // Since the literal is immutable, this is an error. Although
                      // the compiler does not complain, this could cause a crash.
char[4] s2 = "abcd";  // s2 is a static array, so the literal's characters are
                      // copied into it.
s2[0] = 'x';          // Safe: this modifies s2's own copy, not the literal.
```

Another special feature of strings is that a postfix can be attached to a string literal to specify how the compiler should treat it. By default, the type of a string literal is inferred from context during compilation. However, you can use the postfix character c, w, or d to force a literal to be treated as an array of char, wchar, or dchar, respectively. For example, the literal "hello"w is treated as an array of wchar because of the w postfix.

Tango provides several library functions that operate on strings, including functions that convert between one string type and another. While it is possible to cast between string types, such as from wchar[] to dchar[] and vice versa, this technique should be used cautiously. Each character type represents a different Unicode encoding, and a cast only reinterprets the underlying data without transcoding it, so the result is usually not the text you expect. A safer approach is to use Tango's Unicode conversion routines instead. You'll learn about Tango's text-processing functions in Chapter 6.

          There are no special properties exposed by strings, other than those exposed by static and dynamic arrays. Which properties are available depends on how the string was allocated.
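In current D2, the type system enforces the immutability of literals. A literal has type string, an alias for immutable(char)[], so the declaration char[] s1 = "abcd" does not compile there. The sketch below assumes a D2 compiler. In place of the Tango routines mentioned in the text, it uses Phobos' std.conv.to for the UTF-16 to UTF-32 conversion.

```d
import std.conv : to;
import std.stdio;

void main()
{
    string lit = "abcd";          // immutable(char)[]: cannot be modified
    char[] s1 = lit.dup;          // mutable copy on the heap
    s1[0] = 'x';                  // modifies the copy only

    char[4] s2 = "abcd";          // static array: the characters are copied in
    s2[0] = 'y';

    auto w = "hello"w;            // w postfix: immutable(wchar)[] (a wstring)
    dstring d = to!dstring(w);    // transcodes UTF-16 to UTF-32; no cast involved

    writeln(lit, " ", s1, " ", s2[], " ", d);
}
```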


          Associative Arrays
          Associative arrays are distinct from static and dynamic arrays. What sets them apart is that they are allowed to be indexed by types other than integers and they can be sparsely populated. Associative arrays must be declared to hold values of a certain type and to have keys of a certain type. D's associative arrays are analogous to hash maps in other languages. They are dynamic by nature and always reside on the heap. The following example shows some different associative array declarations.

```d
int[char[]] x;      // An associative array with values of type int and keys of
                    // type char[]
float[double] y;    // An associative array with values of type float and keys of
                    // type double
char[][char[]] z;   // An associative array with values of type char[] and keys of
                    // type char[]
```

Once an associative array has been declared, you can associate values with keys using the following syntax:

```d
aa[key] = value;
```

If a value is already associated with the given key, it will be overwritten. You can retrieve values from an associative array using similar syntax:

```d
value = aa[key];
```

If the key does not exist in the associative array, the D runtime throws an error indicating that the array index is out of bounds. One way to avoid this is to test whether the key exists using the in operator, which we'll look at shortly. Associative arrays have a few unique properties, which are listed in Table 2-8.

| Property | Description |
|----------|-------------|
| length | Number of values in the associative array; as with static arrays, this is read-only |
| keys | Dynamic array containing all of the keys in the associative array |
| values | Dynamic array containing all of the values in the associative array |
| rehash | Reorganizes the associative array in place to make lookups more efficient and returns the reorganized array |

Table 2-8. Properties Specific to Associative Arrays

The number of key/value pairs in an associative array changes only when you add or remove entries, so the length property is read-only. The rehash property is an expensive operation, but it can make lookups more efficient after a large number of keys have been added.
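Here is a minimal sketch of declaring, filling, and inspecting an associative array, assuming a current D2 compiler with Phobos. It uses string (immutable(char)[]) keys, the usual D2 choice, in place of the char[] keys shown earlier. Because the order of the keys is unspecified, the sketch sorts them before printing.

```d
import std.algorithm : sort;
import std.stdio;

void main()
{
    int[string] ages;             // values of type int, keys of type string
    ages["alice"] = 30;
    ages["bob"] = 25;
    ages["alice"] = 31;           // overwrites the value for an existing key

    string[] names = ages.keys;   // a new dynamic array of the keys
    sort(names);                  // key order is unspecified, so sort for display
    writeln(names, " ", ages.values.length, " ", ages.length);

    ages.rehash();                // reorganizes the table in place for faster lookups
    writeln(ages["alice"]);       // 31
}
```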

Note: The sizeof property of an associative array (see Table 2-1) returns the size, in bytes, of the array reference, rather than the amount of memory used by the array.

In the next section, we'll look at some operations that can be performed on dynamic and static arrays. Since associative arrays have their own unique set of operations, we'll cover two handy associative array operations here:

in: This operator can be used to determine whether a value has been associated with a given key in an associative array. This is handy, for example, to avoid overwriting the value of a key that already exists. The in operator returns a pointer to the value if the key is present and null if it isn't.

          remove: This function is called through an associative array much like a property, but you pass a key value as an argument (you'll learn more about functions later in this chapter, in the "Functions" section). When the remove function is called, it removes the key and its associated value from the array.

          Here is an example that uses both in and remove:

```d
int[char[]] aa;          // Declare an associative array of int values and char[] keys.
aa["One"] = 1;           // Associate the value 1 with the key "One".
int x = aa["One"];       // Retrieve the value associated with the key "One".
int* y = ("Two" in aa);  // See if the key "Two" has been set.
if(y is null)
{
    aa["Two"] = 2;       // Only set "Two" if it isn't set already.
}
aa.remove("One");        // Remove the key "One" and its associated value.
```
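Because in returns a pointer to the stored value, you can also update an entry in place without looking the key up a second time. This is a minimal sketch assuming a current D2 compiler. The countWords function and its whitespace splitting via Phobos' std.array.split are illustrative assumptions, not part of the text.

```d
import std.array : split;
import std.stdio;

// Hypothetical helper: counts how often each whitespace-separated word appears.
int[string] countWords(string text)
{
    int[string] counts;
    foreach (word; text.split())
    {
        if (auto p = word in counts)
            *p += 1;              // key present: update the value through the pointer
        else
            counts[word] = 1;     // key absent: add it
    }
    return counts;
}

void main()
{
    auto counts = countWords("one two one three one");
    writeln(counts["one"], " ", counts["two"], " ", "four" in counts);
}
```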
          Array Operations
          Both static and dynamic arrays (and, as such, strings) have certain operations that can be performed using different operators. For example, as you have already seen, the [] operator is used in declarations, as well as to set and get array values. It is also used in slicing and copying, which we will look at now, before moving on to concatenation.


          Slicing
Slicing is an operation that creates a new view of an array. It does not copy any array data. It simply creates a new array reference that shares a portion of an existing array's data.

          A slice is performed using the following syntax:

```d
a[start .. end]
```

Here, start is the index of the first element in the slice, and end is one past the index of the last element. If you are slicing an entire array, you can use the shorthand [] without specifying the start and end points, thereby creating a view of the whole existing array.

          Here are examples of slicing:

```d
int[] x = [0, 1, 2, 3, 4];
int[] y = x[1 .. x.length];   // y is a view of x from the second element through
                              // the last, i.e., 1, 2, 3, 4.
int[] z;
z = x[1 .. x.length - 1];     // z is a view of x from the second element through
                              // the next-to-last, i.e., 1, 2, 3.
int[] all = x[];              // all is a view of all of x, from the first
                              // element to the last, i.e., 0, 1, 2, 3, 4.
```

In the next chapter, you'll see how to implement custom slicing operations on your own data types using operator overloading.

Tip: In a slice operation, when you use the length of the array you are slicing as one of the end points, you can substitute the special $ symbol for the length property. For example, x[1 .. x.length] can be rewritten as x[1 .. $].
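Because a slice shares memory with the array it came from, writing through the slice changes the original. This is a minimal sketch assuming a current D2 compiler and Phobos' std.stdio. It also uses the $ shorthand for the length.

```d
import std.stdio;

void main()
{
    int[] x = [0, 1, 2, 3, 4];
    int[] y = x[1 .. $];          // same as x[1 .. x.length]: views 1, 2, 3, 4
    int[] z = x[1 .. $ - 1];      // views 1, 2, 3

    z[0] = 100;                   // no copy was made, so x and y see the change
    writeln(x);                   // [0, 100, 2, 3, 4]
    writeln(y);                   // [100, 2, 3, 4]
    writeln(y.ptr == x.ptr + 1);  // y starts one element into x's memory
}
```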
          Copying
In D, you can copy an array in three ways:

• Manually fetch the values from an array and assign them all to another array of the same length. This is rather inefficient and should never be the first choice.
• Use the dup property common to all static and dynamic arrays.
• Use the slice operator ([]). In the previous section, you learned that an array slice does not copy an array, but instead creates a new reference that shares part of the same data. However, you can use the slice operator in a copying operation by placing it on the left side of an assignment and another array on the right side. The destination must already have room for the data: the lengths on both sides must match, and the two ranges must not overlap.

Note that a plain assignment such as y = x copies only the array reference, not the elements. The following example demonstrates how to copy arrays with dup and with the slice operator.

```d
int[] x = [0, 1, 2, 3, 4];
int[] y = new int[5];    // Allocate a destination of the same length.
int[] z = x.dup;         // z is a new array holding copies of all 5 elements of x.
y[] = x[];               // All 5 elements of x are copied into y.
y[0 .. 2] = x[1 .. 3];   // Same as y[0] = x[1]; y[1] = x[2];
```

Although the example uses only dynamic arrays, the same operations can be performed using static arrays.
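The difference between copying a reference and copying the elements is easy to miss. This minimal sketch assumes a current D2 compiler and Phobos' std.stdio for output. It contrasts plain assignment, dup, and slice assignment.

```d
import std.stdio;

void main()
{
    int[] x = [0, 1, 2, 3, 4];

    int[] same = x;               // copies only the reference: same data as x
    int[] d = x.dup;              // new array with copies of the elements
    int[] s = new int[x.length];  // slice copy needs a destination of equal length
    s[] = x[];                    // element-wise copy into s

    x[0] = 42;
    writeln(same[0], " ", d[0], " ", s[0]);  // 42 0 0
}
```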
            Concatenation
D has a special binary operator, ~, which is used to perform array concatenation. This operator works with any array, but it is most often used with strings. You can also use the ~= operator, which appends the right-hand operand to the end of the left-hand array. Here are some examples of array concatenation:

```d
char[] x = "Hello";
char[] y = "World";
char[] z = x ~ " " ~ y;   // z is a new string, "Hello World", while x and y are
                          // unchanged.
int[] a = [0, 1, 2, 3];
int[] b = [4, 5];
a ~= b;                   // a now contains the values 0, 1, 2, 3, 4, and 5, while
                          // b is unchanged.
```

As with the slice operator, custom concatenation behavior can be implemented using operator overloading, as described in the next chapter.
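In current D2, string literals are immutable, so the concatenation example is usually written with string rather than char[]. This is a minimal sketch under that assumption, with Phobos' std.stdio for output. Note that ~ always produces a new array, while ~= extends the left operand: in place when its memory block has spare capacity, and by reallocating otherwise.

```d
import std.stdio;

void main()
{
    string x = "Hello";
    string y = "World";
    string z = x ~ " " ~ y;       // new string; x and y are unchanged

    int[] a = [0, 1, 2, 3];
    int[] b = [4, 5];
    a ~= b;                       // append a whole array
    a ~= 6;                       // ~= also accepts a single element

    writeln(z);                   // Hello World
    writeln(a, " ", b);           // [0, 1, 2, 3, 4, 5, 6] [4, 5]
}
```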
            Flow Control
            In computer programming, flow control refers to language constructs that direct the flow of program execution. D supports three basic types of flow-control constructs: conditionals, loops, and goto.


            Conditionals
            A conditional is a construct that branches based on a true or false condition. D supports three different types of conditionals: if-else blocks, the ternary operator, and switch-case constructs. All of these can be found in other C-family programming languages, with very few differences.


            if-else Blocks
Perhaps the most commonly used conditional is the form often referred to as an if-else block. An if-else block always begins with an if statement, whose body executes only if its condition evaluates to true. The if statement can be followed by any number of else if statements, each testing another condition; the body of an else if executes only if its own condition is true and all of the preceding conditions were false. Optionally, a final else statement executes if none of the preceding conditions evaluated to true. The following are examples of if-else blocks.

```d
bool x = true;
if(x)
{
    // This will execute because x is true.
}
else
{
    // This will not execute because x is true.
}

bool y = false;
int a = 1;
int b = 2;
if(y)
{
    // This will not execute because y is false.
}
else if(a > b)
{
    // This will not execute because a > b evaluates to false.
}
else
{
    // This will execute because both of the preceding conditions evaluated
    // to false.
}
```

if-else blocks can be nested inside each other many levels deep. Realistically, deeply nested if-else blocks are hard to read and follow, so most programmers rarely go beyond two or three levels. If you find yourself going deeper, you probably need to rethink your design.
            The Ternary Operator
            Often, you don't need to perform any overly complicated action based on an if-else conditional. For example, you might need to simply assign a value to a single variable based on a specific condition. This is where the ternary operator comes in handy. If you need to perform more than one operation for each statement, however, you should stick with the if-else block.

            The ternary operator consists of two characters: ? and :. The entire expression takes the following form:

            condition ? true action : false action

            where true action and false action are expressions that will be evaluated if condition is true or false, respectively.

            The following example shows how you can replace a single assignment in an if-else block with the ternary operator.

```d
int x;
int y = 1;
int z = 2;

// The if-else block
if(y < z)
{
    x = 1;
}
else
{
    x = 0;
}

// The same code using a ternary operator
x = y < z ? 1 : 0;
```

The ternary operator should be used sparingly, because overuse can make code confusing and ugly.

Caution: When using a ternary operator to initialize an auto variable, the result may not be what you expect. The type of a ternary expression is the common type to which both of its result expressions can be implicitly converted, no matter which one is selected at run time. So in auto a = b ? c : d, if c is an int and d is a double, a will be a double even when b is true. If c and d are the same type, this doesn't matter, but it is something to be aware of when mixing types in a ternary expression.
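To see which type an auto variable receives from a ternary expression, you can ask the compiler directly. This is a minimal sketch assuming a current D2 compiler and Phobos' std.stdio. typeof and static assert appear only to make the inferred types visible at compile time.

```d
import std.stdio;

void main()
{
    bool b = true;

    auto a = b ? 1 : 2.5;         // int and double: the common type is double
    static assert(is(typeof(a) == double));

    auto c = b ? 1 : 2;           // both int: the result is a plain int
    static assert(is(typeof(c) == int));

    writeln(a, " ", typeof(a).stringof);  // 1 double
}
```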
            switch-case Statements
            switch-case is another conditional construct that is often considered to be an ideal alternative to if-else blocks when you have several conditions to test or when you need to test for specific values rather than Boolean conditions.

You start with a switch statement that evaluates an expression. The switch statement opens a new scope containing several case statements, each of which specifies a value that the switch expression might produce. You may also end the switch with a special default statement, which specifies no value and executes only if no case matched the result.

            You need to keep in mind a few restrictions on both switch and case statements:

• The expression evaluated by the switch must result in an integral type (including character types) or a string.
• The expression in each case statement must evaluate to a compile-time constant value or string.
• The value in each case statement must be implicitly convertible to the type of the expression evaluated by the switch.
• Each value in the case statements must be unique. It is illegal to have two case statements whose expressions evaluate to the same value.

The following shows the switch-case statement in action.

```d
int x = 1;
int y;
int z;
switch(x)
{
    case 2 - 1:   // Evaluates to case 1:
        y = 2;
        break;
    case 1 + 1:   // Evaluates to case 2:
        y = 3;
        break;
    case 3:
        y = 5;    // No break, so execution falls through to case 4.
    case 4:
        z = 2;
        break;
    default:
        break;
}
```

Notice the break statements here. break is a statement that can be used in a switch or a loop as a quick exit. A case containing a break statement causes the switch to exit as soon as the code before the break has executed. For example, in the first case, after y = 2 executes, the switch exits. Notice, however, that the third case has no break statement. This means that after y = 5 executes, execution "falls through" to the next case, so z = 2 is executed as well. Every case without a break statement falls through to the next until a break is encountered.

Much of what you have seen here is the same in D as it is in other C-family languages. However, one thing that D does differently is to accept strings in both the switch and case statements. For example, the following is possible in D:

```d
char[] str;
switch(str)
{
    case "Hello":
        . . .
    case "World":
        . . .
    . . .
}
```

You can always use library functions to compare different strings, but the ability to use strings in switch-case blocks can greatly simplify code.
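The following sketch shows string cases and fall-through in current D2, using Phobos' std.stdio for output. Two D2 rules differ from the D1-era behavior described above: D2 rejects implicit fall-through from a non-empty case, so this sketch writes the fall-through explicitly with goto case;, and a non-final switch must have a default label. The function names classify and fallThrough are hypothetical.

```d
import std.stdio;

// Hypothetical helper: maps a word to a number using string cases.
int classify(string word)
{
    switch (word)
    {
        case "Hello":
            return 1;
        case "World":
            return 2;
        default:
            return 0;
    }
}

// Hypothetical helper: case 3 deliberately continues into case 4.
int fallThrough(int x)
{
    int z = 0;
    switch (x)
    {
        case 3:
            z += 5;
            goto case;            // explicit fall-through into the next case
        case 4:
            z += 2;
            break;
        default:
            break;
    }
    return z;
}

void main()
{
    writeln(classify("World"), " ", fallThrough(3), " ", fallThrough(4));  // 2 7 2
}
```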
              Looping Constructs
              A loop is a block of code that executes repeatedly until a certain condition is met. D supports five different loops, but for our purposes, we will split them up into three categories: for, while, and foreach.

In loops, the break and continue statements let you cut execution short. You've already seen break in the discussion of switch-case constructs, and in loops it serves the same function: an early exit. If you are performing an operation in a loop and determine that you no longer need to repeat it, you can execute a break statement to exit the loop immediately rather than waiting for it to run its course. This is often used as an optimization in loops that search a data structure for a specific value, because once the value is found, you don't need to continue searching.

              The continue statement is also handy. Rather than exiting the entire loop as break does, it exits the current iteration and continues to the next. The effect is that any code following the continue statement will not be executed in the same iteration in which continue is called. This statement is usually wrapped in a conditional, as it would be rather silly to call it on every iteration. The result is that the code following a continue will execute on some iterations, and it won't execute on other iterations.
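Here is a minimal sketch of both statements, assuming a current D2 compiler and Phobos' std.stdio. It uses foreach loops because they need the least setup. findIndex and sumEvens are hypothetical helpers.

```d
import std.stdio;

// Hypothetical helper: index of the first element equal to target, or -1.
ptrdiff_t findIndex(const int[] haystack, int target)
{
    ptrdiff_t found = -1;
    foreach (i, v; haystack)
    {
        if (v == target)
        {
            found = cast(ptrdiff_t) i;
            break;                // value found: no need to examine the rest
        }
    }
    return found;
}

// Hypothetical helper: sum of the even elements only.
int sumEvens(const int[] values)
{
    int total = 0;
    foreach (v; values)
    {
        if (v % 2 != 0)
            continue;             // odd value: skip the rest of this iteration
        total += v;
    }
    return total;
}

void main()
{
    int[] data = [3, 8, 5, 8, 6];
    writeln(findIndex(data, 8), " ", sumEvens(data));  // 1 22
}
```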


              for Loops
              The venerable for

