Combining Character Columns in Census Data

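When working with Census data, you often need a unique identifier built from several character columns. This guide shows how to concatenate the STATE, COUNTY, TRACT, and BLOCK columns into a new column called BLOCKID. With a 2-digit state code, 3-digit county code, 6-digit tract code, and 4-digit block code, the result is a 15-character block identifier.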

Example Data

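Consider the following data frame, in which the geographic codes are stored as character strings so that their leading zeros are preserved: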

AL_Blocks <- data.frame(
  LOGRECNO = c(60, 61, 62, 63, 64, 65),
  STATE = c('01', '01', '01', '01', '01', '01'),
  COUNTY = c('001', '001', '001', '001', '001', '001'),
  TRACT = c('021100', '021100', '021100', '021100', '021100', '021100'),
  BLOCK = c('1053', '1054', '1055', '1056', '1057', '1058')
)
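
The codes above are character strings rather than numbers, and that matters for concatenation. The sketch below illustrates why, using hypothetical vectors (state_num, county_num) that stand in for codes read in as numeric; it is one possible way to restore the padding with sprintf, assuming the standard 2-digit state and 3-digit county widths.

```r
# Hypothetical case: the codes were read in as numbers, so leading zeros are lost.
state_num  <- c(1, 1)
county_num <- c(1, 1)

# Pasting the numeric values directly gives "11", not "01001".
unpadded <- paste0(state_num, county_num)

# Zero-pad to the assumed fixed widths (2 for state, 3 for county) before pasting.
state_chr  <- sprintf("%02d", as.integer(state_num))
county_chr <- sprintf("%03d", as.integer(county_num))
padded <- paste0(state_chr, county_chr)

print(unpadded)  # "11" "11"
print(padded)    # "01001" "01001"
```

If the columns are already character, as in AL_Blocks, no padding step is needed.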

Creating the Combined Column

To create the BLOCKID column from STATE, COUNTY, TRACT, and BLOCK, use R's paste0 function, which concatenates its arguments element by element with no separator (equivalent to paste with sep = ""). Wrapping the call in with() lets you refer to the columns by name:

AL_Blocks$BLOCKID <- with(AL_Blocks, paste0(STATE, COUNTY, TRACT, BLOCK))

Resulting Data Frame

After executing the above command, your data frame will look like this:

print(AL_Blocks)
  LOGRECNO STATE COUNTY  TRACT BLOCK         BLOCKID
1       60    01    001 021100  1053 010010211001053
2       61    01    001 021100  1054 010010211001054
3       62    01    001 021100  1055 010010211001055
4       63    01    001 021100  1056 010010211001056
5       64    01    001 021100  1057 010010211001057
6       65    01    001 021100  1058 010010211001058
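
A quick sanity check can confirm that the new column is well formed. The sketch below rebuilds a two-row version of the example (named blocks here, a hypothetical name, so that the snippet runs on its own) and verifies that every identifier has 2 + 3 + 6 + 4 = 15 characters and that none repeats. It assumes the component columns are already zero-padded character strings.

```r
# Two-row stand-in for AL_Blocks so this snippet is self-contained.
blocks <- data.frame(
  STATE  = c("01", "01"),
  COUNTY = c("001", "001"),
  TRACT  = c("021100", "021100"),
  BLOCK  = c("1053", "1054"),
  stringsAsFactors = FALSE
)
blocks$BLOCKID <- with(blocks, paste0(STATE, COUNTY, TRACT, BLOCK))

# Every ID should be exactly 15 characters long.
stopifnot(all(nchar(blocks$BLOCKID) == 15))

# anyDuplicated() returns 0 when no value repeats.
stopifnot(anyDuplicated(blocks$BLOCKID) == 0)
```

Both checks stop with an error if a code has lost its leading zeros or if two rows collapse to the same identifier.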

Conclusion

The paste0 function combines multiple character columns into a single identifier in one vectorized call. The resulting BLOCKID can then serve as a key, for example when joining block-level records to other tables that use the same 15-character code.