Strings and Data Frames

by Karl-Kuno Kunze

This post addresses a common recommendation about adding rows to data frames.

Adding rows to data frames might not be straightforward

Let us create a very small data frame with two cities and their populations (in millions).
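A minimal sketch of such a data frame follows. The object name cities, the column names, and the rounded population figures are illustrative assumptions, not taken from the original post. stringsAsFactors = TRUE is written out explicitly so that the example behaves the same in every R version.

```r
# Hypothetical example data: names and rounded values are assumptions.
# stringsAsFactors = TRUE is spelled out so the result does not
# depend on the default of the installed R version.
cities <- data.frame(
  city = c("Berlin", "Munich"),
  inhabitants = c(3.5, 1.4),
  stringsAsFactors = TRUE
)

# Inspect the column types: city is a factor, inhabitants is numeric.
str(cities)
```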

Now we add one row using the function rbind().
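The attempt might look like the sketch below, again using the hypothetical cities data frame with assumed names and values. The new row is passed as a plain vector built with c().

```r
# Hypothetical example data with the city column stored as a factor.
cities <- data.frame(
  city = c("Berlin", "Munich"),
  inhabitants = c(3.5, 1.4),
  stringsAsFactors = TRUE
)

# Try to append a row given as a plain vector.
# "Hamburg" is not an existing level of the factor column, so R
# issues an "invalid factor level" warning for this call.
result <- rbind(cities, c("Hamburg", 1.8))

print(result)
is.na(result$city[3])  # the new city name was not stored
```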

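Here, a default feature of R comes to light: strings are converted to factors before being transferred to a data frame. (This was the default in R versions before 4.0.0; since R 4.0.0, stringsAsFactors defaults to FALSE.) R does not let you add a string to a factor variable unless that string is already one of the factor's levels; instead, it issues a warning and stores NA. A factor is a categorical variable that can take one of a fixed set of values, like eye color.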

Common advice:

Often, you may hear the advice to suppress this default behavior of R through the option stringsAsFactors = FALSE when creating the data frame.
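As a sketch, with the same assumed names and values as before, the option is passed directly to data.frame():

```r
# Hypothetical example data; strings are kept as character vectors.
cities <- data.frame(
  city = c("Berlin", "Munich"),
  inhabitants = c(3.5, 1.4),
  stringsAsFactors = FALSE
)

# city is now of type character instead of factor.
str(cities)
```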

Now, R does not convert strings to factors.

If we now repeat the above addition of a row, everything runs smoothly.
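A sketch of the repeated attempt, using the hypothetical cities data frame created without factors:

```r
# Hypothetical example data; strings are kept as character vectors.
cities <- data.frame(
  city = c("Berlin", "Munich"),
  inhabitants = c(3.5, 1.4),
  stringsAsFactors = FALSE
)

# Append the row as a plain vector; the city name is now stored.
result <- rbind(cities, c("Hamburg", 1.8))

print(result)
str(result)  # check the column types after the append
```

Note that c("Hamburg", 1.8) is itself a character vector, so it is worth checking with str() whether the inhabitants column is still numeric afterwards.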

The more data-scientific approach

The way we saw above is certainly correct and works nicely, but it deprives us of a structure that may be in the data. In addition, there are some functions available in R that rely on factors. Therefore, we might try again without the option stringsAsFactors = FALSE.

If we rewrite the code for adding a row by replacing c() with data.frame(), everything runs smoothly again and the column remains a factor.
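A minimal sketch of this variant, with the same assumed names and values. The new row is itself a one-row data frame whose column names match those of the existing one, and stringsAsFactors = TRUE is written out so the example does not depend on the R version.

```r
# Hypothetical example data with the city column stored as a factor.
cities <- data.frame(
  city = c("Berlin", "Munich"),
  inhabitants = c(3.5, 1.4),
  stringsAsFactors = TRUE
)

# The new row is a one-row data frame with matching column names.
new_row <- data.frame(
  city = "Hamburg",
  inhabitants = 1.8,
  stringsAsFactors = TRUE
)

# rbind() merges the factor levels of both data frames.
result <- rbind(cities, new_row)

print(result)
levels(result$city)  # the new city has been added as a level
str(result)          # city is a factor, inhabitants is numeric
```

Because each value is passed in its own column, the numeric column keeps its type and the city column gains the new level instead of receiving NA.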