How Ot Get Data Out Of An Associative Array

Chapter 10 Associative Arrays

CONTENTS

Limitations of Array Variables
Definition
Referring to Associative Assortment Elements
Adding Elements to an Associative Array
Creating Associative Arrays
Copying Associative Arrays from Array Variables
Adding and Deleting Array Elements
Listing Array Indexes and Values
Looping Using an Associative Array
Creating Data Structures Using Associative Arrays
- Linked Lists
- Structures
- Copse
- Databases
- Example: A Calculator Program
Summary
Q&A
Workshop
- Quiz
- Exercises

Today's lesson shows you lot how to use associative arrays. Yous'll larn the following:

What an associative array is
How to access and create an associative assortment
How to re-create to and from an associative assortment
How to add and delete associative assortment elements
How to list assortment indexes and values
How to loop using an associative assortment
How to build data structures using associative arrays

To start, have a expect at some of the problems that using array variables creates. In one case yous have seen some of the difficulties created by array variables in certain contexts, you'll meet how associative arrays can eliminate these difficulties.

Limitations of Array Variables

In the array variables you've seen so far, you tin access an element of a stored list by specifying a subscript. For case, the following statement accesses the third element of the list stored in the array variable @array:

$scalar = $array[2];

The subscript 2 indicates that the tertiary element of the array is to be referenced.

Although array variables are useful, they take 1 meaning drawback: it'due south often hard to remember which element of an array stores what. For example, suppose you want to write a programme that counts the number of occurrences of each capitalized discussion in an input file. You tin can exercise this using array variables, but it'southward very difficult. Listing 10.i shows you what you lot take to go through to do this.

List ten.i. A programme that uses assortment variables to keep track of capitalized words in an input file.

1:  #!/usr/local/bin/perl  2:  3:  while ($inputline = <STDIN>) {  4:          while ($inputline =~ /\b[A-Z]\Southward+/grand) {  v:                  $word = $&;  half-dozen:                  $word =~ s/[;.,:-]$//;  # remove punctuation  vii:                  for ($count = i; $count <= @wordlist;  8:                                   $count++) {  9:                          $constitute = 0;  x:                         if ($wordlist[$count-1] eq $word) {  xi:                                 $found = 1;  12:                                 $wordcount[$count-ane] += 1;  13:                                 concluding;  14:                         }  15:                 }  sixteen:                 if ($found == 0) {  17:                         $oldlength = @wordlist;  18:                         $wordlist[$oldlength] = $word;  19:                         $wordcount[$oldlength] = 1;  20:                 }  21:         }  22: }  23: print ("Capitalized words and number of occurrences:\northward");  24: for ($count = one; $count <= @wordlist; $count++) {  25:         impress ("$wordlist[$count-1]: $wordcount[$count-1]\due north");  26: }

$ program10_1  Here is a line of Input.  This Input contains some Capitalized words.  ^D  Capitalized words and number of occurrences:  Here: 1  Input: 2  This: one  Capitalized: 1  $

This program reads 1 line of input at a time from the standard input file. The loop starting on line four matches each capitalized word in the line; the loop iterates once for each match, and it assigns the lucifer beingness examined in this particular iteration to the scalar variable $discussion.

In one case any closing punctuation has been removed by line half dozen, the program must then check whether this word has been seen before. Lines 7-fifteen do this by examining each element of the list @wordlist in turn. If an element of @wordlist is identical to the word stored in $give-and-take, the corresponding element of @wordcount is incremented.

If no element of @wordlist matches $word, lines sixteen-xx add a new chemical element to @wordlist and @wordcount.

Definition

As you can see, using array variables creates several issues. First, it's not obvious which chemical element of @wordlist in List 10.one corresponds to which capitalized word. In the example shown, $wordlist[0] contains Here because this is the kickoff capitalized word in the input file, only this is non obvious to the reader.

Worse however, the program has no way of knowing which element of @wordlist contains which discussion. This ways that every time the program reads a new word, it has to check the entire list to see if the word has already been institute. This becomes time-consuming as the list grows larger.

All of these problems with assortment variables be because elements of array variables are accessed by numeric subscripts. To get effectually these problems, Perl defines another kind of array, which enables you to admission assortment variables using any scalar value you like. These arrays are called associative arrays.

To distinguish an associative assortment variable from an ordinary array variable, Perl uses the % grapheme as the first grapheme of an associative array-variable proper name, instead of the @ character. Every bit with other variable names, the first character following the % must be a alphabetic character, and subsequent characters can be letters, digits, or underscores.

The post-obit are examples of associative array-variable names:

%assocarray  %a1  %my_really_long_but_legal_array_variable_name

Notation

Utilise the same name for an associative assortment variable and an ordinary array variable. For example, you can define an array variable named @arrayname and an associative array variable named %arrayname.

The @ and % characters ensure that the Perl interpreter tin tell one variable proper noun from another.

Referring to Associative Array Elements

The main difference between associative arrays and ordinary arrays is that associative array subscripts can be whatever scalar value. For example, the following statement refers to an element of the associative array %fruit:

$fruit{"bananas"} = 1;

The subscript for this array element is bananas. Any scalar value tin be a subscript. For example:

$fruit{"black_currant"}  $number{3.14159}  $integer{-vii}

A scalar variable tin be used as a subscript, equally follows:

$fruit{$my_fruit}

Here, the contents of $my_fruit become the subscript into the associative array %fruit.

When an array chemical element is referenced, as in the previous example, the name of the array element is preceded by a $ character, not the % character. Equally with array variables, this tells the Perl interpreter that this is a single scalar particular and is to exist treated as such.

NOTE

Subscripts for associative array elements are always enclosed in caryatid brackets ({}), not square brackets ([]). This ensures that the Perl interpreter is always able to distinguish associative array elements from other array elements.

Adding Elements to an Associative Array

The easiest way to create an associative array item is just to assign to it. For example, the argument

$fruit{"bananas"} = one;

assigns one to the element bananas of the associative array %fruit. If this element does not exist, it is created. If the array %fruit has not been referred to before, it also is created.

This feature makes it easy to use associative arrays to count occurrences of items. For example, Listing 10.ii shows how you can utilise associative arrays to count the number of capitalized words in an input file. Note how much simpler this plan is than the one in Listing 10.1, which accomplishes the aforementioned chore.

Listing 10.2. A program that uses an associative array to count the number of capitalized words in a file.

ane:  #!/usr/local/bin/perl  ii:  three:  while ($inputline = <STDIN>) {  4:          while ($inputline =~ /\b[A-Z]\S+/g) {  5:                  $word = $&;  6:                  $word =~ southward/[;.,:-]$//;  # remove punctuation  seven:                  $wordlist{$word} += 1;  8:          }  9:  }  10: print ("Capitalized words and number of occurrences:\n");  11: foreach $capword (keys(%wordlist)) {  12:         print ("$capword: $wordlist{$capword}\n");  xiii: }

$ program10_2  Here is a line of Input.  This Input contains some Capitalized words.  ^D  Capitalized words and number of occurrences:  This: 1  Input: 2  Hither: 1  Capitalized: 1  $

As you can encounter, this program is much simpler than the i in Listing 10.1. The previous program required 20 lines of code to read input and store the counts for each word; this program requires merely vii.

As before, this program reads 1 line of input at a time from the standard input file. The loop starting in line 4 iterates once for each capitalized discussion institute in the input line; each match is assigned, in plow, to the scalar variable $word.

Line 7 uses the associative array %wordlist to keep track of the capitalized words. Because associative arrays tin can use any value as a subscript for an chemical element, this line uses the word itself as a subscript. So, the element of the assortment corresponding to the word has ane added to its value.

For instance, when the word Hither is read in, the associative array element $wordlist{"Here"} has 1 added to its value.

Lines 11-13 impress the elements of the associative array. Line 11 contains a telephone call to a special built-in function, keys. This function returns a list consisting of the subscripts of the associative array; the foreach statement then loops through this list, iterating once for each element of the associative assortment. Each subscript of the associative array is assigned, in plough, to the local variable $capword; in this instance, this means that $capword is assigned Here, Input, Capitalized, and This-one per each iteration of the for each loop.

An important fact to think is that associative arrays e'er are stored in "random" order. (Really, it's the society that ensures fastest admission, but, effectively, it is random.) This means that if yous use keys to access all of the elements of an associative array, there is no guarantee that the elements will announced in whatsoever given social club. In particular, the elements exercise non always appear in the club in which they are created.

To control the order in which the associative assortment elements appear, use sort to sort the elements returned by keys.

foreach $capword (sort keys(%wordlist)) {          print ("$capword: $wordlist{$capword}\n");  }

When line 10 of List 10.2 is modified to include a call to sort, the associative assortment elements announced in sorted social club.

Creating Associative Arrays

Y'all tin create an associative array with a single assignment. To exercise this, alternate the array subscripts and their values. For example:

%fruit = ("apples", 17, "bananas", nine, "oranges", "none");

This assignment creates an associative array of three elements:

An element with subscript apples, whose value is 17
An chemical element with subscript bananas, whose value is ix
An element with subscript oranges, whose value is none

Again, it is important to recollect that the elements of associative arrays are not guaranteed to be in any item gild, even if you create the entire array at once.

Note

Perl version five enables you to use either => or , to separate assortment subscripts and values when you assign a list to an associative array. For example:

%fruit = ("apples" => 17, "bananas" => 9, "oranges" => "none");

This statement is identical to the previous one, but is easier to understand; the utilize of => makes information technology easier to run into which subscript is associated with which value.

Equally with any associative array, you e'er can add together more elements to the assortment later. For instance:

$fruit{"cherries"} = 5;

This adds a fourth element, cherries, to the associative array %fruit, and gives it the value 5.

Copying Associative Arrays from Assortment Variables

The list of subscripts and values assigned to %fruit in the previous example is an ordinary list similar any other. This means that yous can create an associative array from the contents of an array variable. For instance:

@fruit = ("apples", 6, "cherries", 8, "oranges", xi);  %fruit = @fruit;

The second statement creates an associative array of iii elements-apples, cherries, and oranges-and assigns information technology to %fruit.

If you are assigning a list or the contents of an array variable to an associative array, brand sure that the listing contains an even number of elements, considering each pair of elements corresponds to the subscript and the value of an associative assortment element.

Similarly, you can re-create one associative array into some other. For example:

%fruit1 = ("apples", 6, "cherries", 8, "oranges", 11);  %fruit2 = %fruit1;

You can assign an associative assortment to an ordinary array variable in the same way. For example:

%fruit = ("grapes", 11, "lemons", 27);  @fruit = %fruit;

However, this might not exist as useful, because the order of the array elements is not defined. Hither, the assortment variable @fruit is assigned either the 4-element listing

("grapes", xi, "lemons", 27)

or the list

("lemons", 27, "grapes", 11)

depending on how the associative array is sorted.

You can as well assign to several scalar variables and an associative array at the aforementioned time.

($var1, $var2, %myarray) = @listing;

Here, the showtime element of @listing is assigned to $var1, the 2nd to $var2, and the residue to %myarray.

Finally, an associative array can exist created from the return value of a built-in office or user-defined subroutine that returns a listing. List 10.iii is an instance of a simple program that does just that. It takes the return value from split, which is a list, and assigns it to an associative array variable.

Listing 10.iii. A program that uses the return value from a congenital-in part to create an associative array.

1:  #!/usr/local/bin/perl  ii:  3:  $inputline = <STDIN>;  4:  $inputline =~ south/^\s+|\southward+\due north$//one thousand;  five:  %fruit = divide(/\due south+/, $inputline);  six:  impress ("Number of bananas: $fruit{\"bananas\"}\n");

$ program10_3  oranges 5 apples vii bananas xi cherries 6  Number of bananas: xi  $

This program reads a line of input from the standard input file and eliminates the leading and trailing white infinite. Line 5 so calls split, which breaks the line into words. In this example, separate returns the following list:

("oranges", 5, "apples", 7, "bananas", xi, "cherries", vi)

This list is then assigned to the associative array %fruit. This assignment creates an associative assortment with four elements:

Chemical element Value

oranges 5

apples 7

bananas eleven

cherries half dozen

Line vi and so prints the value of the element bananas, which is 11.

Adding and Deleting Assortment Elements

As you've seen, you can add an element to an associative array by assigning to an element non previously seen, as follows:

$fruit{"lime"} = 1;

This argument creates a new element of %fruit with index lime and gives information technology the value 1.

To delete an element, use the congenital-in function delete. For example, the following argument deletes the element orange from the array %fruit:

delete($fruit{"orange"});

DO utilize the delete part to delete an element of an associative array; it's the merely style to delete elements.

DON'T use the built-in functions push, popular, shift, or splice with associative arrays considering the position of any item chemical element in the array is not guaranteed.

Listing Array Indexes and Values

Equally you saw in List 10.2, the keys function retrieves a list of the subscripts used in an associative array. The following is an example:

%fruit = ("apples", 9,             "bananas", 23,             "cherries", 11);  @fruitsubs = keys(%fruits);

Here, @fruitsubs is assigned the list consisting of the elements apples, bananas, and cherries. Annotation again that this list is in no particular order. To recall the list in alphabetical lodge, utilise sort on the list.

@fruitindexes = sort keys(%fruits));

This produces the list ("apples", "bananas", "cherries").

To retrieve a listing of the values stored in an associative array, utilize the built-in function values. The following is an instance:

%fruit = ("apples", 9,             "bananas", 23,             "cherries", eleven);  @fruitvalues = values(%fruits);

Here, @fruitvalues contains the listing (ix, 23, 11), non necessarily in this society.

Looping Using an Associative Array

Equally you've seen, you can use the congenital-in function keys with the foreach statement to loop through an associative array. The following is an example:

%records = ("Maris", 61, "Aaron", 755, "Young", 511);  foreach $holder (keys(%records)) {          # stuff goes hither  }

The variable $holder is assigned Aaron, Maris, and Young on successive iterations of the loop (although not necessarily in that order).

This method of looping is useful, but it is inefficient. To call back the value associated with a subscript, the program must look it up in the assortment again, equally follows:

foreach $holder (keys(%records)) {          $tape = %records{$holder};  }

Perl provides a more than efficient way to piece of work with associative array subscripts and their values, using the congenital-in part each, every bit follows:

%records = ("Maris", 61, "Aaron", 755, "Immature", 511);  while (($holder, $record) = each(%records)) {          # stuff goes here  }

Every fourth dimension the each part is called, it returns a two-element list. The offset chemical element of the list is the subscript for a particular chemical element of the associative assortment. The second element is the value associated with that particular subscript.

For example, the offset time each is called in the preceding code fragment, the pair of scalar variables ($holder, $record) is assigned one of the lists ("Maris", 61), ("Aaron", 755), or ("Young", 511). (Because associative arrays are not stored in any particular order, whatsoever of these lists could be assigned beginning.) If ("Maris", 61) is returned by the kickoff call to each, Maris is assigned to $holder and 61 is assigned to $record.

When each is called over again, it assigns a different list to the pair of scalar variables specified. Subsequent calls to each assign further lists, and so on until the associative assortment is exhausted. When at that place are no more elements left in the associative assortment, each returns the empty listing.

NOTE

Don't add a new chemical element to an associative array or delete an chemical element from information technology if you are using the each statement on it. For example, suppose you are looping through the associative assortment %records using the post-obit loop:

while (($holder, $record) = each(%records)) { # code goes here }

Adding a new record to %records, such equally

$records{"Rose"} = 4256;

or deleting a record, as in

delete $records{"Cobb"};

makes the beliefs of each unpredictable. This should be avoided.

Creating Data Structures Using Associative Arrays

You tin can utilize associative arrays to simulate a wide variety of data structures found in high-level programming languages. This section describes how you can implement the following information structures in Perl using associative arrays:

Linked lists
Structures
Trees
Databases

NOTE

The remainder of today'southward lesson describes applications of associative arrays but does non introduce any new features of Perl. If you are not interested in applications of associative arrays, you can skip to the next affiliate without suffering whatsoever loss of general instruction.

Linked Lists

A linked list is a simple data structure that enables you to shop items in a particular order. Each element of the linked listing contains two fields:

The value associated with this element
A reference, or pointer, to the next element in the list

Also, a special header variable points to the first element in the list.

Pictorially, a linked list can be represented every bit in Effigy 10.1. As you can run across, each element of the list points to the next.

Figure 10.one: A linked list.

In Perl, a linked list can easily be implemented using an associative array because the value of one associative array chemical element tin can exist the subscript for the next. For example, the post-obit associative array is actually a linked list of words in alphabetical order:

%words = ("abel", "baker",            "baker", "charlie",            "charlie", "delta",            "delta", "");  $header = "abel";

In this example, the scalar variable $header contains the commencement word in the listing. This word, abel, is likewise the subscript of the offset element of the associative assortment. The value of the outset chemical element of this array, bakery, is the subscript for the second element, and so on, equally illustrated in Figure 10.two.

Effigy 10.ii: A linked listing of words in alphabetical order.

The value of the terminal chemical element of the subscript, delta, is the null string. This indicates the cease of the listing.

Linked lists are most useful in applications where the amount of data to be candy is not known, or grows as the program is executed. Listing 10.4 is an example of one such application. Information technology uses a linked list to impress the words of a file in alphabetical order.

Listing x.4. A program that uses an associative array to build a linked list.

i:  #!/usr/local/bin/perl  ii:  three:  # initialize list to empty  4:  $header = "";  5:  while ($line = <STDIN>) {  6:          # remove leading and trailing spaces  seven:          $line =~ s/^\s+|\s+$//g;  8:          @words = split(/\s+/, $line);  9:          foreach $discussion (@words) {  x:                 # remove closing punctuation, if whatever  11:                 $word =~ s/[.,;:-]$//;  12:                 # catechumen all words to lower example  13:                 $give-and-take =~ tr/A-Z/a-z/;  fourteen:                 &add_word_to_list($discussion);  15:         }  16: }  17: &print_list;  18:  19: sub add_word_to_list {  20:         local($discussion) = @_;  21:         local($pointer);  22:  23:         # if list is empty, add first item  24:         if ($header eq "") {  25:                 $header = $give-and-take;  26:                 $wordlist{$discussion} = "";  27:                 render;  28:         }  29:         # if word identical to outset element in listing,  30:         # do nothing  31:         return if ($header eq $give-and-take);  32:         # encounter whether word should exist the new  33:         # first word in the list  34:         if ($header gt $word) {  35:                 $wordlist{$discussion} = $header;  36:                 $header = $word;  37:                 return;  38:         }  39:         # find place where word belongs  40:         $arrow = $header;  41:         while ($wordlist{$pointer} ne "" &&  42:                 $wordlist{$pointer} lt $word) {  43:                 $pointer = $wordlist{$pointer};  44:         }  45:         # if word already seen, do nothing  46:         return if ($give-and-take eq $wordlist{$pointer});  47:         $wordlist{$word} = $wordlist{$arrow};  48:         $wordlist{$pointer} = $word;  49: }  l:  51: sub print_list {  52:         local ($arrow);  53:         print ("Words in this file:\n");  54:         $pointer = $header;  55:         while ($pointer ne "") {  56:                 print ("$pointer\n");  57:                 $arrow = $wordlist{$pointer};  58:         }  59: }

$ program10_4  Here are some words.  Here are more words.  Here are nevertheless more words.  ^D  Words in this file:  are  hither  more  some  nonetheless  words  $

The logic of this program is a little complicated, but don't despair. Once you empathise how this works, you take all the information you lot demand to build whatsoever information construction you like, no thing how complicated.

This program is divided into iii parts, as follows:

The main plan, which reads input and transforms it into the desired format
The subroutine add_word_to_list, which builds the linked list of sorted words
The subroutine print_list, which prints the list of words

Lines 3-17 contain the main program. Line 4 initializes the list of words past setting the header variable $header to the null string. The loop starting time in line 5 reads ane line of input at a time. Line 7 removes leading and abaft spaces from the line, and line viii splits the line into words.

The inner foreach loop in lines 9-15 processes one discussion of the input line at a time. If the final character of a word is a punctuation character, line 11 removes information technology; this ensures that, for instance, discussion. (word with a period) is considered identical to discussion (without a menstruation). Line 13 converts the word to all lowercase characters, and line xiv passes the give-and-take to the subroutine add_word_to_list.

This subroutine first executes line 24, which checks whether the linked list of words is empty. If it is, line 25 assigns this word to $header, and line 26 creates the outset element of the list, which is stored in the associative array %wordlist. In this case, the kickoff discussion read in is here (Here converted to lowercase), and the list looks like Figure 10.iii.

Figure 10.3: The linked list with ane element in it.

At this signal, the header variable $header contains the value hither, which is too the subscript for the chemical element of %wordlist that has merely been created. This means that the programme can reference %wordlist by using $header as a subscript, as follows:

$wordlist{$header}

Variables such as $header that comprise a reference to another data item are called pointers. Hither, $header points to the first element of %wordlist.

If the list is non empty, line 31 checks whether the first item of the listing is identical to the word currently being checked. To practise this, information technology compares the current word to the contents of $header, which is the first item in the list. If the two are identical, there is no need to add together the new word to the list, because it is already in that location; therefore, the subroutine returns without doing anything.

The adjacent step is to check whether the new discussion should be the first discussion in the list, which is the example if the new word is alphabetically ahead of the existing kickoff word. Line 34 checks this.

If the new discussion is to go starting time, the list is adjusted equally follows:

A new list element is created. The subscript of this element is the new word, and its value is the existing first word.
The new discussion is assigned to the header variable.

To see how this adjustment works, consider the sample input provided. In this example, the 2nd word to be candy is are. Considering are belongs before hither, the array element $wordlist{"are"} is created, and is given the value hither. The header variable $header is assigned the value are. This ways the list now looks like Figure 10.4.

Figure 10.iv: The linked list with ii elements in it.

The header variable $header now points to the list element with the subscript are, which is $wordlist{"are"}. The value of $wordlist{"are"} is here, which ways that the programme can admission $wordlist{"here"} from $wordlist{"are"}. For example:

$reference = $wordlist{"are"};  print ("$wordlist{$reference}\n");

The value here is assigned to $reference, and print prints $wordlist{$reference}, which is $wordlist{"hither"}.

Because you can access $wordlist{"here"} from $wordlist{"are"}, $wordlist{"are"} is a pointer to $wordlist{"here"}.

If the word does not belong at the front of the list, lines 40-44 search for the place in the list where the give-and-take does vest, using a local variable, $pointer. Lines 41-44 loop until the value stored in $wordlist{$pointer} is greater than or equal to $discussion. For case, Figure 10.v illustrates where line 42 is true when the subroutine processes more.

Figure 10.5: The linked list when more is candy.

Note that because the list is in alphabetical gild, the value stored in $pointer is always less than the value stored in $word.

If the word being added is greater than whatsoever word in the list, the conditional expression in line 41 somewhen becomes true. This occurs, for instance, when the subroutine processes some, every bit in Figure 10.6.

Figure x.6: The linked list when some is processed.

Once the location of the new word has been determined, line 46 checks whether the word already is in the list. If it is, there is no need to practice anything.

If the word does not exist, lines 47 and 48 add the word to the list. First, line 47 creates a new element of %wordlist, which is $wordlist{$word}; its value is the value of $wordlist{$pointer}. This means that $wordlist{$give-and-take} and $wordlist{$pointer} now point to the same word, every bit in Figure 10.7.

Effigy 10.seven: The linked list every bit a new word is existence added.

Side by side, line 48 sets the value of $wordlist{$pointer} to the value stored in $word. This means that $wordlist{$pointer} now points to the new chemical element, $wordlist{$word}, that was just created, as in Effigy x.8.

Figure 10.8: The linked list after the new discussion is added.

Once the input file has been completely processed, the subroutine print_list prints the list, one element at a time. The local variable $pointer contains the current value beingness printed, and $wordlist{$pointer} contains the next value to be printed.

NOTE

Ordinarily, you won't want to utilize a linked list in a program. It's easier just to employ sort and keys to loop through an associative array in alphabetical lodge, every bit follows:

foreach $word (sort keys(%wordlist)) { # print the sorted list, or whatever }

However, the basic idea of a pointer, which is introduced hither, is useful in other information structures, such as copse, which are described subsequently.

Structures

Many programming languages enable you to define collections of data called structures. Like lists, structures are collections of values; each element of a structure, however, has its ain name and can exist accessed by that proper noun.

Perl does non provide a fashion of defining structures directly. However, you tin simulate a structure using an associative assortment. For case, suppose you want to simulate the following variable definition written in the C programming language:

struct {          int field1;          int field2;          int field3;  } mystructvar;

This C statement defines a variable named mystructvar, which contains three elements, named field1, field2, and field3.

To simulate this using an associative array, all you lot need to practise is define an associative array with three elements, and set the subscripts for these elements to field1, field2, and field3. The post-obit is an example:

%mystructvar = ("field1", "",               "field2", "",               "field3", "");

Like the preceding C definition, this associative array, named %mystructvar, has 3 elements. The subscripts for these elements are field1, field2, and field3. The definition sets the initial values for these elements to the null string.

As with any associative array, yous can reference or assign the value of an element past specifying its subscript, as follows:

$mystructvar{"field1"} = 17;

To define other variables that use the same "structure," all y'all need to do is create other arrays that utilize the aforementioned subscript names.

Trees

Some other data construction that is often used in programs is a tree. A tree is similar to a linked listing, except that each element of a tree points to more than one other element.

The simplest example of a tree is a binary tree. Each element of a binary tree, chosen a node, points to two other elements, chosen the left child and the right child. Each of these children points to two children of its own, and so on, as illustrated in Effigy 10.9.

Figure ten.nine: A binary tree.

Note that the tree, like a linked list, is a i-way structure. Nodes indicate to children, just children don't betoken to their parents.

The post-obit terminology is used when describing trees:

Because each of the children of a node is a tree of its own, the left child and the right kid are often called the left subtree and the right subtree of the node. (The terms left branch and right branch are likewise used.)
The "first" node of the tree (the node that is not a kid of another node), is called the root of the tree.
Nodes that take no children are chosen leaf nodes.

There are several ways of implementing a tree structure using associative arrays. To illustrate i way of doing then, suppose that you wish to create a tree whose root has the value alpha and whose children accept the values beta and gamma, as in Figure 10.x.

Figure x.x: A binary tree with three nodes.

Here, the left kid of alpha is beta, and the correct child of alpha is gamma.

The problem to exist solved is this: How can a program associate both beta and gamma with blastoff? If the associative array that is to correspond the tree is named %tree, do you assign the value of $tree{"blastoff"} to be beta, or gamma, or both? How practise you show that an element points to two other elements?

There are several solutions to this problem, but one of the most elegant is equally follows: Append the character strings left and correct, respectively, to the name of a node in order to retrieve its children. For case, define alphaleft to point to beta and alpharight to point to gamma. In this scheme, if beta has children, betaleft and betaright bespeak to their locations; similarly, gammaleft and gammaright point to the locations of the children of gamma, and so on.

List 10.5 is an example of a program that creates a binary tree using this method and and so traverses it (accesses every node in the tree).

Listing 10.v. A program that uses an associative array to represent a binary tree.

one:  #!/usr/local/bin/perl  two:  three:  $rootname = "parent";  four:  %tree = ("parentleft", "child1",  five:           "parentright", "child2",  half-dozen:           "child1left", "grandchild1",  seven:           "child1right", "grandchild2",  8:           "child2left", "grandchild3",  nine:           "child2right", "grandchild4");  10: # traverse tree, printing its elements  xi: &print_tree($rootname);  12:  13: sub print_tree {  fourteen:         local ($nodename) = @_;  15:         local ($leftchildname, $rightchildname);  sixteen:  17:         $leftchildname = $nodename . "left";  18:         $rightchildname = $nodename . "correct";  xix:         if ($tree{$leftchildname} ne "") {  20:                 &print_tree($tree{$leftchildname});  21:         }  22:         print ("$nodename\n");  23:         if ($tree{$rightchildname} ne "") {  24:                 &print_tree($tree{$rightchildname});  25:         }  26: }

$ program10_5  grandchild1  child1  grandchild2  parent  grandchild3  child2  grandchild4  $

This plan creates the tree depicted in Figure 10.eleven.

Figure 10.11: The tree created by Listing x.v.

The associative array %tree stores the tree, and the scalar variable $rootname holds the name of the root of the tree. (Note that the grandchild nodes, such as grandchild1, are leafage nodes. At that place is no need to explicitly create grandchild1left, grandchild1right, and so on because the value of whatever undefined associative array chemical element is, by default, the null string.)

Subsequently the tree has been created, the program calls the subroutine print_tree to traverse information technology and print its values. print_tree does this as follows:

Line 17 appends left to the proper name of the node beingness examined to produce the name of the left child, which is stored in $leftchildname. For example, if the root node, parent, is being examined, the value stored in $leftchildname is parentleft.
Similarly, line xviii appends right to the node name and stores the issue in $rightchildname.
Line nineteen checks whether the current node has a left child, which is true if $tree{$leftchildname} is divers. (For instance, parent has a left child, because $tree{"parentleft"} is defined.) If the current node has a left child, line twenty recursively calls print_tree to impress the left subtree (the left child and its children).
Line 22 prints the proper name of the current node.
Line 23 checks whether the electric current node has a right child. If information technology does, line 24 recursively calls print_tree to print the right subtree.

Note that print_tree prints the names of the nodes of the tree in the following order: left subtree, node, right subtree. This order of traversal is called infix way or infix traversal. If y'all move line 22 to precede line 19, the node is printed first, followed by the left subtree and the right subtree; this guild of traversal is chosen prefix mode. If yous motility line 22 to follow line 25, the node is printed afterwards the subtrees are printed; this is called postfix mode.

Databases

As you have seen, you can build a tree using an associative array. To do this, you build the associative array subscripts by joining character strings together (such as joining the node name and "left"). Y'all can utilize this technique of joining strings together to use associative arrays to build other data structures.

For example, suppose yous desire to create a database that contains the lifetime records of baseball game players. Each tape is to consist of the post-obit:

For non-pitchers, a record consists of games played (GP), home runs (60 minutes), runs batted in (RBI) and batting boilerplate (AVG). For instance, the tape on Lou Gehrig would read as follows:
Gehrig: 2164 GP, 493 HR, 1991 RBI, .340 BA
For pitchers, a record consists of games pitched (GP), wins (W), and earned run average (ERA). For case, the record on Lefty Grove would read as follows:
Grove: 616 GP, 300 Westward, three.05 ERA

To create a database containing histrion and pitcher records, you need the post-obit fields:

A proper noun field, for the thespian'southward proper name
A key indicating whether the player was a pitcher
The fields defined to a higher place

You tin can use an associative array to simulate this in Perl. To exercise this, build the subscripts for the associative array by concatenating the name of the histrion with the proper noun of the field beingness stored by this element of the array. For example, if the associative assortment is named %playerbase, $playerbase{"GehrigRBI"}, it contains the career RBI full for Lou Gehrig.

Listing 10.6 shows how to build a player database and how to sequentially impress fields from each of the player records.

List 10.6. A program that builds and prints a database.

i:  #!/usr/local/bin/perl  2:  three:  @pitcherfields = ("Proper name", "KEY", "GP", "W", "ERA");  4:  @playerfields = ("Proper noun", "Central", "GP", "HR", "RBI", "BA");  5:  6:  # Build the role player database by reading from standard input.  vii:  # %playerbase contains the database, @playerlist the list of  8:  # players (for later sequential access).  9:  $playercount = 0;  x: while ($input = <STDIN>) {  eleven:         $input =~ s/^\due south+|\s+$//g;  12:         @words = split up (/\due south+/, $input);  13:         $playerlist[$playercount++] = $words[0];  14:         if ($words[one] eq "player") {  15:                 @fields = @playerfields;  16:         } else {  17:                 @fields = @pitcherfields;  eighteen:         }  19:         for ($count = i; $count <= @words; $count++) {  20:                 $playerbase{$words[0].$fields[$count-ane]} =  21:                         $words[$count-1];  22:         }  23: }  24:  25: # now, print out pitcher win totals and player home run totals  26: foreach $role player (@playerlist) {  27:         print ("$player: ");  28:         if ($playerbase{$histrion."KEY"} eq "player") {  29:                 $value = $playerbase{$histrion."60 minutes"};  30:                 impress ("$value dwelling house runs\north");  31:         } else {  32:                 $value = $playerbase{$role player."W"};  33:                 print ("$value wins\n");  34:         }  35: }

$ program10_6  Gehrig        player      2164    493    1991   .340  Ruth          actor      2503    714    2217   .342  Grove         pitcher     616     300    iii.05  Williams      player      2292    521    1839   .344  Koufax        pitcher     397     165    2.76  ^D  Gehrig: 493 dwelling house runs  Ruth: 714 dwelling house runs  Grove: 300 wins  Williams: 521 home runs  Koufax: 165 wins  $

This program has been designed so that it is easy to add new fields to the database. With this in mind, lines iii and 4 ascertain the fields that are to be used when building the role player and pitcher records.

Lines 9-23 build the database. First, line 9 initializes $playercount to 0; this global variable keeps rails of the number of players in the database.

Lines x-12 read a line from the standard input file, check whether the file is empty, remove leading and trailing white infinite from the line, and divide the line into words.

Line xiii adds the player name (the first word in the input line) to the list of player names stored in @playerlist. The counter $playercount then has one added to it; this reflects the new total number of players stored in the database.

Lines 14-18 determine whether the new player is a pitcher or non. If the player is a pitcher, the names of the fields to be stored in this player record are to be taken from @pitcherfields; otherwise, the names are to be taken from @playerfields. To simplify processing later on, another assortment variable, @fields, is used to shop the listing of fields actually being used for this role player.

Lines nineteen-22 re-create the fields into the associative assortment, one at a time. Each array subscript is made up of two parts: the name of the player and the name of the field existence stored. For example, Sandy Koufax's pitching wins are stored in the array element KoufaxW. Note that neither the role player proper noun nor the field names appear in this loop; this ways that you can add together new fields to the list of fields without having to alter this code.

Lines 26-35 now search the database for all the win and home run totals merely read in. Each iteration of the foreach loop assigns a different player name to the local variable $player. Line 28 examines the contents of the array element named $thespian."Primal" to determine whether the player is a bullpen.

If the player is not a pitcher, lines 29-thirty impress out the player's abode-run total by accessing the array element $role player."HR". If the actor is a pitcher, the pitcher's win full is printed out by lines 32-33; these lines access the array chemical element $actor."W".

Notation that the database can be accessed randomly as well as sequentially. To retrieve, for example, Babe Ruth'southward lifetime batting average, you lot would admission the array chemical element $playerbase{"RuthAVG"}. If the record for a detail player is not stored in the database, attempting to access it volition render the zip string. For instance, the following assigns the null string to $cobbavg considering Ty Cobb is non in the actor database:

$cobbavg = $playerbase{"CobbAVG"};

Every bit you tin can see, associative arrays enable yous to define databases with variable record lengths, accessible either sequentially or randomly. This gives you all the flexibility you need to apply Perl as a database language.

Example: A Estimator Program

Listing 10.vii provides an example of what y'all tin practice with associative arrays and recursive subroutines. This program reads in an arithmetics expression, possibly spread over several lines, and builds a tree from it. The program then evaluates the tree and prints the event. The operators supported are +, -, *, /, and parentheses (to force precedence).

This program is longer and more complicated than the programs you have seen so far, but stick with information technology. In one case you lot empathise this program, y'all volition know enough to be able to write an entire compiler in Perl!

List 10.7. A calculator program that uses trees.

1:  #!/usr/local/bin/perl  2:  # statements which initialize the program  3:  $nextnodenum = 1;  # initialize node name generator  four:  &get_next_item;    # read start value from file  v:  $treeroot = &build_expr;  6:  $upshot = &get_result ($treeroot);  7:  print ("the result is $result\n");  8:  # Build an expression.  ix:  sub build_expr {  x:         local ($currnode, $leftchild, $rightchild);  11:         local ($operator);  12:         $leftchild = &build_add_operand;  13:         if (&is_next_item("+") || &is_next_item("-")) {  fourteen:                 $operator = &get_next_item;  fifteen:                 $rightchild = &build_expr;  sixteen:                 $currnode = &get_new_node ($operator,  17:                         $leftchild, $rightchild);  18:         } else {  19:                 $currnode = $leftchild;  20:         }  21: }  22: # Build an operand for a + or - operator.  23: sub build_add_operand {  24:         local ($currnode, $leftchild, $rightchild);  25:         local ($operator);  26:         $leftchild = &build_mult_operand;  27:         if (&is_next_item("*") || &is_next_item("/")) {  28:                 $operator = &get_next_item;  29:                 $rightchild = &build_add_operand;  30:                 $currnode = &get_new_node ($operator,  31:                         $leftchild, $rightchild);  32:        } else {  33:                 $currnode = $leftchild;  34:        }  35: }  36: # Build an operand for the * or / operator.  37: sub build_mult_operand {  38:         local ($currnode);  39:         if (&is_next_item("(")) {  40:                 # handle parentheses  41:                &get_next_item;  # get rid of "("  42:                 $currnode = &build_expr;  43:                 if (! &is_next_item(")")) {  44:                         die ("Invalid expression");  45:                 }  46:                 &get_next_item;  # get rid of ")"  47:         } else {  48:                 $currnode = &get_new_node(&get_next_item,  49:                             "", "");  50:        }  51:        $currnode;  # ensure correct render value  52: }  53: # Check whether the concluding item read matches  54: # a particular operator.  55: sub is_next_item {  56:         local ($expected) = @_;  57:         $curritem eq $expected;  58: }  59: # Return the last detail read; read some other particular.  60: sub get_next_item {  61:         local ($retitem);  62:         $retitem = $curritem;  63:         $curritem = &read_item;  64:         $retitem;  65: }  66: # This routine actually handles reading from the standard  67: # input file.  68: sub read_item {  69:         local ($line);  70:         if ($curritem eq "EOF") {  71:                 # nosotros are already at end of file; practise nix  72:                 render;  73:         }  74:         while ($wordsread == @words) {  75:                 $line = <STDIN>;  76:                 if ($line eq "") {  77:                         $curritem = "EOF";  78:                         return;  79:                 }  80:                 $line =~ southward/\(/ ( /k;  81:                 $line =~ s/\)/ ) /g;  82:                 $line =~ s/^\s+|\due south+$//g;  83:                 @words = divide(/\south+/, $line);  84:                 $wordsread = 0;  85:         }  86:         $curritem = $words[$wordsread++];  87: }  88: # Create a tree node.  89: sub get_new_node {  ninety:         local ($value, $leftchild, $rightchild) = @_;  91:         local ($nodenum);  92:         $nodenum = $nextnodenum++;  93:         $tree{$nodenum} = $value;  94:         $tree{$nodenum . "left"} = $leftchild;  95:         $tree{$nodenum . "right"} = $rightchild;  96:         $nodenum;   # return value  97: }  98: # Summate the outcome.  99: sub get_result {  100:         local ($node) = @_;  101:         local ($nodevalue, $result);  102:        $nodevalue = $tree{$node};  103:        if ($nodevalue eq "") {  104:                die ("Bad tree");  105:        } elsif ($nodevalue eq "+") {  106:                $result = &get_result($tree{$node . "left"}) +  107:                          &get_result($tree{$node . "correct"});  108:        } elsif ($nodevalue eq "-") {  109:                $event = &get_result($tree{$node . "left"}) -  110:                          &get_result($tree{$node . "right"});  111:        } elsif ($nodevalue eq "*") {  112:                $result = &get_result($tree{$node . "left"}) *  113:                          &get_result($tree{$node . "correct"});  114:        } elsif ($nodevalue eq "/") {  115:                $result = &get_result($tree{$node . "left"}) /  116:                          &get_result($tree{$node . "right"});  117:        } elsif ($nodevalue =~ /^[0-9]+$/) {  118:                $issue = $nodevalue;  119:        } else {  120:                die ("Bad tree");  121:        }  122:}

$ program10_7  xi + 5 *  (four - three)  ^D  the result is sixteen  $

This program is divided into two main parts: a part that reads the input and produces a tree, and a office that calculates the result by traversing the tree.

The subroutines build_expr, build_add_operand, and build_mult_operand build the tree. To see how they practise this, first await at Figure 10.12 to meet what the tree for the case, 11 + 5 * (iv - 3), should look like.

Figure 10.12: The tree for the example in Listing 10.vii.

When this tree is evaluated, the nodes are searched in postfix society. First, the left subtree of the root is evaluated, so the right subtree, and finally the operation at the root.

The rules followed past the iii subroutines are spelled out in the following clarification:

An expression consists of one of the following:
1. An add_operand
2. An add_operand, a + or - operator, and an expression

An add_operand consists of ane of the post-obit:
1. A mult_operand
2. A mult_operand, a * or / operator, and an add_operand

A mult_operand consists of one of the following:
1. A number (a group of digits)
2. An expression enclosed in parentheses

The subroutine build_expr handles all occurrences of condition 1; it is called (perhaps recursively) whenever an expression is processed. Status 1a covers the instance in which the expression contains no + or - operators (unless they are enclosed in parentheses). Condition 1b handles expressions that contain one or more than + or - operators.

The subroutine build_add_operand handles condition 2; information technology is called whenever an add_operand is processed. Status 2a covers the example in which the add operand contains no * or / operators (except mayhap in parentheses). Status 2b handles add together operands that contain ane or more * or / operators.

The subroutine build_mult_operand handles condition three and is chosen whenever a mult_operand is processed. Condition 3a handles multiplication operands that consist of a number. Condition 3b handles multiplication operators that consist of an expression in parentheses; to obtain the subtree for this expression, build_mult_operand calls build_expr recursively and then treats the returned subtree as a child of the node currently being built.

Note that the tree built past build_expr, build_mult_operand, and build_add_operand is slightly unlike from the tree yous saw in Listing x.five. In that tree, the value of the node could also exist used as the subscript into the associative array. In this tree, the value of the node might not be unique. To get effectually this trouble, a split counter creates numbers for each node, which are used when building the subscripts. For each node numbered n (where north is an integer), the following are truthful:

$tree{northward} contains the value of the node, which is the number or operator associated with the node.
$tree{due north."left"} contains the number of the left kid of the node.
$tree{north."right"} contains the number of the right child of the node.

The subroutines is_next_item, get_next_item, and read_item read the input from the standard input file and pause it into numbers and operators. The subroutine get_next_item "pre-reads" the next item and stores information technology in the global variable $curritem; this lets is_next_item cheque whether the side by side detail to be read matches a particular operator. To read an item, get_next_item calls read_item, which reads lines of input, breaks them into words, and returns the next word to exist read.

The subroutine get_new_node creates a tree node. To do this, it uses the contents of the global variable $nextnodenum to build the associative array subscripts associated with the node. $nextnodenum always contains a positive integer n, which means that the value associated with this node (which is a number or operator) is stored in $tree{n}. The location of the left and right children, if any, are stored in $tree{n."left"} and $tree {n."correct"}.

The subroutine get_result traverses the tree built past build_expr in postfix order (subtrees first), performing the arithmetics operations as it does and then. get_result returns the concluding result, which is then printed.

Notation that the main office of this program is merely viii lines long! This often happens in more than complex programs. The chief part of the program just calls subroutines, and the subroutines do the actual piece of work.

NOTE

This programme is only the tip of the iceberg: you can use associative arrays to simulate any information structure in any programming language.

Summary

In today's lesson, yous learned near associative arrays, which are arrays whose subscripts tin be any scalar value.

You can copy a list to an associative assortment, provided there is an even number of elements in the listing. Each pair of elements in the list is treated as an associated array subscript and its value. You also can copy an associative assortment to an ordinary array variable.

To add an chemical element to an associative array, but assign a value to an element whose subscript has not been previously seen. To delete an element, call the built-in function delete.

The post-obit 3 born functions enable you to use associative arrays with foreach loops:

The congenital-in function keys retrieves each associative array subscript in turn.
The built-in function values retrieves each associative array value in turn.
The born function each retrieves each subscript-value pair in turn (equally a two-element list).

Associative arrays are not guaranteed to be stored in whatever particular order. To guarantee a particular social club, employ sort with keys, values, or each.

Associative arrays tin can be used to simulate a wide variety of information structures, including linked lists, structures, trees, and databases.

Q&A

Q:	Are pointers implemented in Perl?
A:	Aye, if you lot are using Perl 5; they are discussed on Twenty-four hour period xviii, "References in Perl 5." Perl 4 does not support pointers.
Q:	How can I implement more complicated data structures using associative arrays?
A:	All you need to do is design the structure yous want to implement, proper noun each of the fields in the construction, and use the proper name-concatenation pull a fast one on to build your associative array subscript names.
Q:	What practise I exercise if I want to build a tree that has multiple values at each node?
A:	There are many ways to do this. One way is to suspend `value1`, `value2`, and and then on, to the name of each node; for instance, if the node is named `n7`, `n7value1` could be the associative array subscript for the starting time value associated with the node, `n7value2` could be the subscript for the second, and and then on.
Q:	What do I practice if I want to build a tree that has more than 2 children per node?
A:	Over again, in that location are many ways to practise this. A possible solution is to utilize `child1`, `child2`, `child3`, so on, instead of `left` and `right`.
Q:	How practise I destroy a tree that I have created?
A:	To destroy a tree, write a subroutine that traverses the tree in postfix order (subtrees first). Destroy each subtree (by recursively calling your subroutine), and so destroy the node you are looking at by calling `delete`. Annotation that you shouldn't apply `keys` or `each` to access each element of the loop before deleting it. Deleting an element affects how the associative array is stored, which ways that `keys` and `each` might non comport the way you want them to. If you desire to destroy the entire associative array in which the tree is stored, you lot tin can utilise the `undef` function, which is described on Day xiv, "Scalar-Conversion and List-Manipulation Functions."

Workshop

The Workshop provides quiz questions to assist you solidify your understanding of the fabric covered and exercises to give y'all experience in using what you've learned. Try and understand the quiz and exercise answers before you keep to tomorrow'southward lesson.

Quiz

Define the following terms:
1. associative array
2. pointer
3. linked list
4. binary tree
5. node
6. child

What are the elements of the associative array created by the following assignment?
%listing = ("17.2", "hello", "there", "46", "due east+6", "88");
What happens when yous assign an associative array to an ordinary array variable?
How can you create a linked list using an associative array?
How many times does the post-obit loop iterate?
%list = ("first", "one", "second", "2", "third", "3"); foreach $word (keys(%list)) { last if ($word == "second"); }

Exercises

Write a program that reads lines of input consisting of two words per line, such as
bananas 16and creates an associative assortment from these lines. (The commencement discussion is to be the subscript and the second word the value.)
Write a program that reads a file and searches for lines of the course
alphabetize give-and-takewhere word is a give-and-take to exist indexed. Each indexed word is to be stored in an associative array, along with the line number on which it first occurs. (Subsequent occurrences tin be ignored.) Print the resulting index.
Alter the program created in Exercise 2 to store every occurrence of each alphabetize line. (Hint: Try building the associative array subscripts using the indexed word, a non-printable character, and a number.) Print the resulting index.
Write a program that reads input lines consisting of a student name and five numbers representing the educatee'due south marks in English, history, mathematics, scientific discipline, and geography, equally follows:
Jones 61 67 75 80 72Utilise an associative array to store these numbers in a database, and then impress out the names of all students with failing grades (less than l) along with the subjects they failed.
BUG BUSTER: What is incorrect with the following lawmaking?
%list = ("Fred", 61, "John", 72, "Jack", 59, "Mary", 80); $surname = "Smith"; foreach $firstname (keys (%listing)) { %listing{$firstname." ".$surname} = %list{$firstname}; }