Progress Log 107 (R): Vector Subsetting
I’m taking an edX course entitled “Introduction to R for Data Science” and all of the concepts described below come from that course.
Subset by index
Suppose you want to select the first element from the vector `remain`. You can use square brackets to do this: `remain[1]` selects the first element, and `remain[3]` selects the third.
The number in the brackets indicates which element you want from the vector. So, in the first example, the number 1 indicates that you want the first element (spades = 11). The second example chooses the third element (diamonds = 11).
The result of each selection is a vector of length 1. In both examples, the new vector contains the number 11.
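Here is a minimal R sketch of the two index selections. The contents of `remain` are an assumption: the course's data is not reproduced in this post, so the values were chosen to match the counts mentioned here (spades at index 1, clubs = 13 at index 4).

```r
# Assumed data: cards remaining per suit (illustrative values)
remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)

first <- remain[1]  # select the element at position 1
third <- remain[3]  # select the element at position 3

print(first)   # a named vector of length 1: spades = 11
print(third)   # a named vector of length 1: diamonds = 11
length(first)  # 1
```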
Subset by name
If you are dealing with named vectors, you can also use the names to perform the selection.
Instead of using index 1 to select the first element, you can use its name: `remain["spades"]`.
The result is exactly the same as using the numeric index 1.
The same goes for the third element: you can select it by its name instead of by index 3.
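Using an assumed `remain` vector (illustrative values), this sketch shows that name-based and index-based selection return identical results:

```r
# Assumed data: cards remaining per suit (illustrative values)
remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)

by_name  <- remain["spades"]  # select by name
by_index <- remain[1]         # select by position
identical(by_name, by_index)  # TRUE

# The third element can be reached by its name as well
identical(remain["diamonds"], remain[3])  # TRUE
```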
Subset multiple elements
Suppose you now want to select the elements of the vector that give the remaining spades and clubs in a one-liner and store them in a variable, `remain_black`. Instead of using a single number inside the brackets, you can use a vector to specify which indices you want to select. Because spades are at index 1 and clubs at index 4, you put a vector containing 1 and 4 inside the square brackets: `remain_black <- remain[c(1, 4)]`.
The order in the selection vector matters! If you change `c(1, 4)` to `c(4, 1)`, you will get a vector where the clubs come first.
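A short R sketch of selecting several elements by position, and of how the order of the indices carries over to the result. The values in `remain` are assumed for illustration:

```r
# Assumed data: cards remaining per suit (illustrative values)
remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)

remain_black <- remain[c(1, 4)]  # spades (index 1) and clubs (index 4)
print(remain_black)              # spades = 11, clubs = 13

reversed <- remain[c(4, 1)]      # same elements, opposite order
names(reversed)                  # "clubs" "spades"
```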
Subsetting multiple elements can also be done using names, at least if you are dealing with a named vector. To get the same result as the index-based selection above, you write `remain[c("spades", "clubs")]`.
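As a quick check, this sketch (with assumed `remain` values) confirms that a character vector of names selects the same elements as the corresponding indices:

```r
# Assumed data: cards remaining per suit (illustrative values)
remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)

# A character vector of names works like a vector of indices
remain_black <- remain[c("spades", "clubs")]
identical(remain_black, remain[c(1, 4)])  # TRUE
```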
Subset all but some
Suppose you want to create a vector that contains all the information in the `remain` vector except for the spades count.
You can write `remain[-1]`.
The minus sign drops the element at index 1 from the result; `remain` itself is left unchanged.
Of course, you can remove multiple elements as well, for example with `remain[-c(1, 2)]`.
This minus operator does NOT work with names, though.
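A sketch of negative indexing on an assumed `remain` vector. The last line is an extra workaround, not taken from the course: to drop an element by name, you can compare the names and keep everything that does not match.

```r
# Assumed data: cards remaining per suit (illustrative values)
remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)

no_spades <- remain[-1]               # everything except index 1
names(no_spades)                      # "hearts" "diamonds" "clubs"

without_first_two <- remain[-c(1, 2)] # drop indices 1 and 2
names(without_first_two)              # "diamonds" "clubs"

length(remain)                        # still 4: remain itself is unchanged

# Negating a name fails with "invalid argument to unary operator"
neg_name <- tryCatch(remain[-"spades"], error = function(e) conditionMessage(e))

# Workaround (not from the course): a logical test on the names
no_spades_by_name <- remain[names(remain) != "spades"]
```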
Subset using logical vector
You can also subset with a logical vector, which typically has the same length as the vector you want to subset. The elements for which the corresponding value in the selection vector is `TRUE` are kept in the subset; the elements that correspond to `FALSE` are dropped.
Let’s try to select the second and fourth elements from the vector using a logical vector: `remain[c(FALSE, TRUE, FALSE, TRUE)]`.
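Here is a minimal sketch of logical subsetting, again with assumed `remain` values. Each `TRUE` keeps the element in the same position:

```r
# Assumed data: cards remaining per suit (illustrative values)
remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)

# TRUE keeps the element at that position, FALSE drops it
second_and_fourth <- remain[c(FALSE, TRUE, FALSE, TRUE)]
names(second_and_fourth)  # "hearts" "clubs"
```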
Of course, you could also have created the logical vector first, stored it in a variable, and then used that variable to perform the selection.
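Storing the logical vector first might look like this. `selection_vector` is a hypothetical name and the `remain` values are assumed. The final line is an extra illustration, not from the course: in practice, such a vector often comes from a comparison.

```r
# Assumed data: cards remaining per suit (illustrative values)
remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)

selection_vector <- c(FALSE, TRUE, FALSE, TRUE)  # hypothetical helper variable
selected <- remain[selection_vector]
identical(selected, remain[c(2, 4)])  # TRUE

# For these values, remain > 11 evaluates to FALSE TRUE FALSE TRUE
identical(remain[remain > 11], selected)  # TRUE
```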
Now, you might expect R to throw an error if you use a logical vector that is shorter than the vector you want to subset. Trying it shows otherwise.
Suppose you use a vector containing only two logicals instead of four: `remain[c(FALSE, TRUE)]`.
No error whatsoever! That is because R performs something called ‘recycling’. R sees that the vector of logicals you passed is shorter than `remain`, so it repeats its contents until it has the same length as `remain`. This means that, behind the scenes, `remain[c(FALSE, TRUE, FALSE, TRUE)]` is executed, giving the result we observed before.
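The sketch below makes the recycling visible, assuming the same illustrative `remain` values. `rep_len()` is a base R function that repeats a vector to a given length, which mirrors what happens to the short selection vector:

```r
# Assumed data: cards remaining per suit (illustrative values)
remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)

short <- remain[c(FALSE, TRUE)]               # length-2 logical: no error
full  <- remain[c(FALSE, TRUE, FALSE, TRUE)]  # the recycled equivalent
identical(short, full)                        # TRUE

# rep_len() shows the recycled selection vector explicitly
rep_len(c(FALSE, TRUE), length(remain))       # FALSE TRUE FALSE TRUE
```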
Even if you use a vector of length 3 to do the selection, it is recycled to length 4 by reusing its first value as the fourth.
So a statement such as `remain[c(TRUE, FALSE, TRUE)]`
gets converted to `remain[c(TRUE, FALSE, TRUE, TRUE)]` behind the scenes.
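Here is a sketch of the length-3 case, with a hypothetical selection vector `sel3` and assumed `remain` values. Because the first value of `sel3` is `TRUE`, the recycled fourth value is `TRUE` too, so clubs is kept:

```r
# Assumed data: cards remaining per suit (illustrative values)
remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)

sel3 <- c(TRUE, FALSE, TRUE)   # hypothetical length-3 selection vector
rep_len(sel3, length(remain))  # TRUE FALSE TRUE TRUE: first value reused

picked <- remain[sel3]
names(picked)                  # "spades" "diamonds" "clubs"
identical(picked, remain[c(TRUE, FALSE, TRUE, TRUE)])  # TRUE
```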
Selecting multiple successive elements with `c(2, 3, 4)` is not very convenient. Many statisticians are lazy people by nature, so they created an easier way to do this: `c(2, 3, 4)` can be abbreviated to `2:4`, which generates a vector with all integers from 2 up to 4.
So, another way to find the mid-week results is `poker_vector[2:4]`. Notice how the vector `2:4` is placed between the square brackets to select elements 2 up to 4. You don’t have to use the `c()` function if you’re using the colon shortcut.
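This post doesn't define `poker_vector`. The sketch below assumes it holds daily winnings named Monday to Friday; the amounts are made up for illustration.

```r
# Hypothetical data: daily poker winnings (amounts are made up)
poker_vector <- c(140, -50, 20, -120, 240)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

2:4                            # 2 3 4, an integer sequence
mid_week <- poker_vector[2:4]  # Tuesday, Wednesday, Thursday
identical(mid_week, poker_vector[c(2, 3, 4)])  # TRUE
```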
Just as selecting single elements by number extends naturally to selecting multiple elements, you can also use a vector of names, for example `poker_vector[c("Monday", "Tuesday", "Wednesday")]`.
However, you can’t use the colon trick here:
`"Monday":"Wednesday"` will generate an error.
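A sketch of selecting by a vector of names, using the same hypothetical `poker_vector`. It also shows the colon error being caught rather than halting the script. The `which()` lookup at the end is one possible workaround, not something from the course: it converts the names to positions first so that `:` can be used.

```r
# Hypothetical data: daily poker winnings (amounts are made up)
poker_vector <- c(140, -50, 20, -120, 240)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# A vector of names works, but each name must be spelled out
first_three <- poker_vector[c("Monday", "Tuesday", "Wednesday")]
names(first_three)  # "Monday" "Tuesday" "Wednesday"

# ":" needs numbers: the strings are coerced to NA and R stops with an error.
# suppressWarnings() hides the accompanying coercion warning.
colon_result <- suppressWarnings(
  tryCatch("Monday":"Wednesday", error = function(e) conditionMessage(e))
)

# Workaround (not from the course): look up the positions first
from <- which(names(poker_vector) == "Monday")
to   <- which(names(poker_vector) == "Wednesday")
identical(poker_vector[from:to], first_three)  # TRUE
```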