This lesson is still being designed and assembled (Pre-Alpha version)

Arrays and Vectors

Overview

Teaching: 20 min
Exercises: 20 min
Questions
  • How can I create collections of things in C++

  • How can I get and set values in a collection?

  • If I don’t know the size of the collection in advance, what do I do?

Objectives
  • Understand static arrays of objects in C++ for creating collections.

  • Know how to use the standard library vector to create variable sized collections.

  • Be able to access members of the collection and update, insert and delete values.

N.B.

I feel after having written this episode that it would be very useful to have covered for loops at this point, even just to be able to write an example where one can loop over the container.

It’s also appropriate to introduce container iteration loops alongside containers.

Also, any iterator type really, really should be handled with auto

Introduction

We saw previously how to create variables that represented the basic types of C++. However, usually in scientific and data intensive problems we need to deal a huge amount of data and these are usually grouped into collections of numbers rather than being created and accessed one by one (which would be incredibly tedious!).

Fixed Size Containers: Arrays

In C++ we have the possibility to create arrays of the basic types (well, we’ll see later this can be extended to virtually any object in C++). Here’s a simple example:

#include <iostream>
#include <array>

int main() {
  std::array<double, 4> four_vector {2., 3., 4., -10.};
  std::cout << "This 4-vector has time component " << four_vector[3] << std::endl;

  four_vector[3] = 7.5;
  std::cout << "This 4-vector now has time component " << four_vector[3] << std::endl;

  return 0;
}

What do we see here?

First, arrays in modern C++ are part of the standard library and first we need to tell the compiler we want to use that functionality by including the header file array.

Second, we define an array in C++ using a syntax like this:

array<TYPE, NUMBER>

This is actually an example of what’s called a template in C++, which we’ll study in more detail later, but in this simple example it’s pretty clear how to use it passing these two arguments in.

We’re also showing you a way to initialise multiple value objects like arrays using a standard C++ syntax called uniform initialization: {a, b, c, ... }.

Third, data in the array is accessed using the [n] notation where n counts from 0. So an array of size N has elements 0, 1, 2, ..., N-1.

When we compile and run we get:

$ g++ -std=c++17 array-basic.cpp -o array-basic
$ ./array-basic
This 4-vector has time component -10
This 4-vector now has time component 7.5
$

Note that arrays are always a fixed size in C++.

Array indexes

As we saw, array indexes run from 0 to N-1 for an array of size N. If you accidentally try to access a value outside of this range then the results are undefined behavior - you’re program might crash, or worse it might silently just produce wrong answers. So this is something to be very careful of.

Arrays also support an access method called at which will check that a valid data element is being accessed:

four_vector.at(3)

will access the third element, but will fail (throwing an exception - more on them latter) if the third element wasn’t valid.

If you needed to know the number of elements in an array, this code will return that value to you:

four_vector.size()

EXERCISE FOR ARRAYS HERE

Blah blah blah

Variable Size Containers: Vectors

Arrays are great to use when the data size is known up-front. However, in many cases we might not know how large a container we need at the beginning. In this case C++ comes to our aid with a variable sized container type called a vector.

One can use a vector in much the same way as an array:

#include <iostream>
#include <vector>

int main() {
  std::vector<double> n_vector {2., 3., 4., -10.};
  std::cout << "This n-vector has time component " << n_vector[3] << std::endl;

  n_vector[3] = 7.5;
  std::cout << "This n-vector now has time component " << n_vector[3] << std::endl;

  n_vector[0]++;
  std::cout << "The first element of the vector is " << n_vector.front() <<
  " and the last is " << n_vector.back() << std::endl;
  std::cout << "The vector size is " << n_vector.size() << std::endl;

  return 0;
}

This looks very like our array code above and one of the nice things about these C++ containers is that they are used in very similar ways!

Compiling and running the code produces the expected results:

$ g++ --std=c++17 -Wall vector-basic.cpp -o vector-basic
$ ./vector-basic
This n-vector has time component -10
This n-vector now has time component 7.5
The first element of the vector is 3 and the last is 7.5
The vector size is 4
$

Adding and Deleting Elements from Vectors

Adding

To add a new element to a vector there is a the push_back() method:

#include <iostream>
#include <vector>

int main() {
  std::vector<double> n_vector {};
  n_vector.push_back(1.5);
  n_vector.push_back(2.7);

  std::cout << "Vector elements are " << n_vector[0] << " and " << n_vector[1] <<
  " and the size is " << n_vector.size() << std::endl;

  n_vector.push_back(3.9);
  std::cout << "Vector now has " << n_vector.size() << " elements and the last value is " << n_vector[n_vector.size()-1] << std::endl;

  return 0;
}

The example shows that the push_back() extends the container by one element, adding the argument to as the last value.

$ g++ --std=c++17 -Wall vector-push-back.cpp -o vector-push-back
$ ./vector-push-back
Vector elements are 1.5 and 2.7 and the size is 2
Vector now has 3 elements; and the last value is 3.9

If you need to add an element to an arbitrary position in a vector then you can use the insert() method:

#include <iostream>
#include <vector>

int main() {
  std::vector<double> n_vector {-1.0, -2.0, -3.0};
  n_vector.insert(n_vector.begin(), 100.0);

  std::cout << "The first vector element is " << n_vector[0] <<
  " and the size is " << n_vector.size() << std::endl;

  n_vector.insert(n_vector.begin()+1, 200.0);
  std::cout << "The second vector element is " << n_vector[1] <<
  " and the size is " << n_vector.size() << std::endl;

  return 0;
}

Here we used the method begin() to get the start of the vector and you can see with the second insert that adding values from this starting point work as you would expect (begin()+3, for example, is the 4th element position).

$ g++ --std=c++17 -Wall vector-insert.cpp -o vector-insert
$ ./vector-insert
The first vector element is 100 and the size is 4
The second vector element is 200 and the size is 5

Iterator arithmetic

Formally in C++ the vector v.begin() method returns what is called an iterator that can be used to identify a position inside a container. Adding and subtracting then moves the access point up and down the vector, respectively, e.g.,

auto second_element = v.begin()+1

Just be careful that you don’t try to insert into an invalid place in the vector as bad things will happen, i.e., inserting into a location before the beginning of the vector or inserting into a place more than one step beyond the end. Inserting right at the end is valid and makes insert() behave like push_back():

v.insert(v.end(), VALUE)

behaves the same as

v.push_back(VALUE)

Note that v.end() is used a lot in C++ and returns the position in the container one step beyond the last valid value. (Thus in loops the comparison is always less than: still_valid < v.end().)

EXERCISE FOR VECTORS HERE

Blah blah blah

Deleting

If you want to delete the final element of the vector the pop_back() method will do this. Note this method returns nothing, i.e., it’s a void function in C++.

#include <iostream>
#include <vector>

int main() {
  std::vector<double> n_vector {-1.0, -2.0, -3.0};
  n_vector.pop_back();

  std::cout << "After pop the last vector element is " << n_vector.back() <<
  " and the size is " << n_vector.size() << std::endl;

  return 0;
}

$ g++ --std=c++17 -Wall vector-pop-back.cpp -o vector-pop-back
$ ./vector-pop-back
After pop the last vector element is -2 and the size is 2

And to delete an arbitrary position, use the iterator position that we saw above with erase().

#include <iostream>
#include <vector>

int main() {
  std::vector<double> n_vector {-1.0, -2.0, -3.0, -4.0, -5.0};

  n_vector.erase(n_vector.begin()+2);
  std::cout << "After single erase the third vector element is " << n_vector[2] <<
  " and the size is " << n_vector.size() << std::endl;

  n_vector.erase(n_vector.begin()+2, n_vector.end());
  std::cout << "After a block erase the vector size is " << n_vector.size() << std::endl;

  return 0;
}

$ g++ --std=c++17 -Wall vector-erase.cpp -o vector-erase
$ ./vector-erase
After single erase the third vector element is -4 and the size is 4
After a block erase the vector size is 2

Here we illustrated two ways to use erase, first with a single element and then with a range where all the elements between the starting point and before the end point are removed.

If you want to delete all of the current entries in a vector then use the clear() method:

#include <iostream>
#include <vector>

int main() {
  std::vector<double> n_vector {-1.0, -2.0, -3.0, -4.0, -5.0};

  std::cout << "Vector initial size is " << n_vector.size() << std::endl;
  n_vector.clear();
  std::cout << "After a clear() the vector size is " << n_vector.size() << std::endl;

  return 0;
}

$ g++ --std=c++17 -Wall vector-clear.cpp -o vector-clear
$ ./vector-clear
Vector initial size is 5
After a clear() the vector size is 0

EXERCISE FOR VECTORS EXTENSION HERE

Blah blah blah

How vector storage works in C++

It’s useful to know they way in which a vector works in C++ so you understand how to use it most effectively. Vectors are guaranteed in C++ to occupy contiguous memory, which means that processing a vector’s elements one after another is usually rather efficient on modern CPUs.

However, that does mean that if elements are inserted at the beginning or in the middle of a vector then all the elements after it have to be moved, which is slower. The advice is always try to fill up vectors at the end!

As a vector’s size is mutable the library will usually allocate some more space for the vector elements than are currently used. If there is spare space in the underlying allocation then push_back() will be very quick. However, when the underlying storage is exhausted the library has to reallocate memory in order to add a new element. Roughly this process is:

  1. Call to push_back()
  2. Realise that the current storage if full.
  3. Allocate a new, larger, storage block.
  4. Copy each element in turn from the old block to the new one.
  5. Add the final, new, element.
  6. Free the old storage.

This is slow, and gets slower as the vector grows in size. To offset this problem, use the reserve(N) method of a vector that will tell the library to pre-allocate enough space for N elements in advance. e.g.,

vector<double> v; v.reserve(1000000); for (unsigned int i=0; i<1000000; ++i) v.push_back(i);

will avoid a lot of unnecessary reallocations and copying compared to the same code without the reserve() call.

Key Points

  • C++ can assign fixed sized collections as arrays.

  • C++ can manage variable sized collections as vectors of objects.

  • For vectors, data elements can be added and removed, as well as updated.