How to carry column metadata in pivot_longer


Pivoting data is often a pain point in bioinformatics workflows. Many bioinformatics software packages are tied to the wide format, with data spread out among multiple columns, whereas the whole tidyverse/ggplot system requires long data with as few columns as possible. Becoming proficient at switching your data to long format has several benefits: (1) it provides a unified format for any required data manipulations and summarizations, making them faster to write and easier to read, and (2) it is the required input format for the ggplot system. In R, the tidyverse provides the tools to interchange wide and long data.

The Problem:

Typically the subjects in bioinformatics datasets (columns) will have associated metadata such as treatments and indicators of groups or replicates. Any metadata that corresponds to rows can easily be added to the data.frame to be pivoted (e.g. with cbind). But column metadata have to be added manually after the pivot.

The Solution:

There are a few ways to do this. The way I’ve settled on is to keep a table of target metadata and use a join after the pivot to connect it to the data. I find myself doing this repeatedly in almost all of my analyses, but it’s a solution I arrived at by trial and error. I’ve never seen it spelled out explicitly anywhere, so here it is.

library(tidyverse)  # loads tidyr (relig_income, pivot_longer), dplyr (joins) and ggplot2
head(relig_income)
## # A tibble: 6 x 12
##   religion `<$10k` `$10-20k` `$20-30k` `$30-40k` `$40-50k` `$50-75k` `$75-100k` `$100-150k`
##   <chr>      <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>      <dbl>       <dbl>
## 1 Agnostic      27        34        60        81        76       137        122         109
## 2 Atheist       12        27        37        52        35        70         73          59
## 3 Buddhist      27        21        30        34        33        58         62          39
## 4 Catholic     418       617       732       670       638      1116        949         792
## 5 Don’t k~      15        14        15        11        10        35         21          17
## 6 Evangel~     575       869      1064       982       881      1486        949         723
## # ... with 3 more variables: `>150k` <dbl>, `Don't know/refused` <dbl>, religionClass <chr>

First, create the metadata.

I’ll use the relig_income dataset as an example. I’ll demonstrate how to add both row metadata (easy) and column metadata (a bit tricky). For row metadata I’ll add a new column for religion class that will be defined randomly, and for column metadata I’ll group income ranges into low, medium, high and unknown, listed in a separate data.frame. Note that this method relies on linking data column names to metadata, so check the metadata table carefully!

To add the row metadata I simply add a new column to the relig_income table with my random values. For the column metadata I’ll make a new data.frame.

## Row metadata
set.seed(10)
relig_income$religionClass <- 
  sample(c("A", "B", "C"), nrow(relig_income), replace = TRUE)

## Column metadata
columnMetadata <- data.frame(
  income = c(colnames(relig_income)[
    grepl("0", colnames(relig_income))],
    "Don't know/refused"),
  incomeGroup = c(rep("low", 3), rep("medium", 3), 
            rep("high", 3), "Don't know/refused"))
columnMetadata
##                income        incomeGroup
## 1               <$10k                low
## 2             $10-20k                low
## 3             $20-30k                low
## 4             $30-40k             medium
## 5             $40-50k             medium
## 6             $50-75k             medium
## 7            $75-100k               high
## 8           $100-150k               high
## 9               >150k               high
## 10 Don't know/refused Don't know/refused
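
Because the join relies on exact matches between column names and the metadata table, it can help to compare the two before pivoting. The following is a minimal sketch, not part of the original workflow: check_column_metadata is a hypothetical helper written for this example, and the metadata table is typed out by hand under the assumption that the labels match the relig_income column names shipped with tidyr.

```r
library(tidyr)  # provides the relig_income example data

# Hypothetical helper (not a tidyverse function): compares the columns that
# will be pivoted against the key column of a metadata table.
check_column_metadata <- function(data, meta, key, id_cols) {
  pivot_cols <- setdiff(colnames(data), id_cols)
  list(
    missing_from_meta = setdiff(pivot_cols, meta[[key]]),  # columns without metadata
    unused_in_meta    = setdiff(meta[[key]], pivot_cols)   # metadata without a column
  )
}

# The same metadata as above, with the income labels written out explicitly
meta <- data.frame(
  income = c("<$10k", "$10-20k", "$20-30k", "$30-40k", "$40-50k",
             "$50-75k", "$75-100k", "$100-150k", ">150k",
             "Don't know/refused"),
  incomeGroup = c(rep("low", 3), rep("medium", 3),
                  rep("high", 3), "Don't know/refused"))

# id_cols lists the columns that are NOT pivoted
result <- check_column_metadata(relig_income, meta, key = "income",
                                id_cols = c("religion", "religionClass"))
result
```

If both elements of the result are empty, every pivoted column has a metadata row and no metadata row is orphaned.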

Step 1: pivot_longer as usual

Don’t forget to exclude the new religionClass column from the pivot.

relig_income %>%
  pivot_longer(-c(religion, religionClass), names_to = "income", values_to = "count")
## # A tibble: 180 x 4
##    religion religionClass income             count
##    <chr>    <chr>         <chr>              <dbl>
##  1 Agnostic C             <$10k                 27
##  2 Agnostic C             $10-20k               34
##  3 Agnostic C             $20-30k               60
##  4 Agnostic C             $30-40k               81
##  5 Agnostic C             $40-50k               76
##  6 Agnostic C             $50-75k              137
##  7 Agnostic C             $75-100k             122
##  8 Agnostic C             $100-150k            109
##  9 Agnostic C             >150k                 84
## 10 Agnostic C             Don't know/refused    96
## # ... with 170 more rows

Step 2: join the column metadata

All metadata columns will be added automatically with this step.

relig_income %>%
  pivot_longer(-c(religion, religionClass), names_to = "income", values_to = "count") %>%
  inner_join(columnMetadata, by = "income")
## # A tibble: 180 x 5
##    religion religionClass income             count incomeGroup
##    <chr>    <chr>         <chr>              <dbl> <chr>
##  1 Agnostic C             <$10k                 27 low
##  2 Agnostic C             $10-20k               34 low
##  3 Agnostic C             $20-30k               60 low
##  4 Agnostic C             $30-40k               81 medium
##  5 Agnostic C             $40-50k               76 medium
##  6 Agnostic C             $50-75k              137 medium
##  7 Agnostic C             $75-100k             122 high
##  8 Agnostic C             $100-150k            109 high
##  9 Agnostic C             >150k                 84 high
## 10 Agnostic C             Don't know/refused    96 Don't know/refused
## # ... with 170 more rows
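
One thing to keep in mind with this step is that inner_join silently drops any pivoted rows whose key has no match in the metadata table. The sketch below uses a small made-up table (the gene and sample names are hypothetical, not from the example above) to show the difference between inner_join and left_join, and how anti_join can list the rows that found no metadata.

```r
library(dplyr)
library(tidyr)

# Toy wide table: two samples stored as columns (hypothetical names)
wide <- tibble(gene = c("g1", "g2"), sampleA = c(5, 3), sampleB = c(8, 1))

# Metadata that is deliberately missing an entry for sampleB
meta <- tibble(sample = "sampleA", treatment = "control")

long <- wide %>%
  pivot_longer(-gene, names_to = "sample", values_to = "count")

# inner_join keeps only rows with a metadata match, so sampleB disappears
joined_inner <- inner_join(long, meta, by = "sample")

# left_join keeps every pivoted row and fills missing metadata with NA
joined_left <- left_join(long, meta, by = "sample")

# anti_join returns the pivoted rows that have no metadata match
unmatched <- anti_join(long, meta, by = "sample")

nrow(joined_inner)        # 2
nrow(joined_left)         # 4
unique(unmatched$sample)  # "sampleB"
```

If the row count after the join is smaller than after the pivot, some column names are probably missing from (or misspelled in) the metadata table.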

Step 3 (optional): Convert character data to ordered factors to control plotting order

relig_income %>%
  pivot_longer(-c(religion, religionClass), names_to = "income", values_to = "count") %>%
  inner_join(columnMetadata, by = "income") %>%
  mutate(income = ordered(income, levels = columnMetadata$income))
## # A tibble: 180 x 5
##    religion religionClass income             count incomeGroup
##    <chr>    <chr>         <ord>              <dbl> <chr>
##  1 Agnostic C             <$10k                 27 low
##  2 Agnostic C             $10-20k               34 low
##  3 Agnostic C             $20-30k               60 low
##  4 Agnostic C             $30-40k               81 medium
##  5 Agnostic C             $40-50k               76 medium
##  6 Agnostic C             $50-75k              137 medium
##  7 Agnostic C             $75-100k             122 high
##  8 Agnostic C             $100-150k            109 high
##  9 Agnostic C             >150k                 84 high
## 10 Agnostic C             Don't know/refused    96 Don't know/refused
## # ... with 170 more rows
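
To see why the explicit levels matter, here is a small base R sketch on a handful of income labels (a shortened, shuffled vector made up for illustration). Without explicit levels, factor() sorts the labels as strings, which does not follow the income order; ordered() with the levels taken from the metadata table keeps the intended order.

```r
# Income labels in the order we want on the plot axis
income_levels <- c("<$10k", "$10-20k", "$100-150k", ">150k")

# The same labels in the shuffled order they might arrive in
x <- c(">150k", "$100-150k", "<$10k", "$10-20k")

# Default factor(): levels are sorted as strings, not by income
levels(factor(x))

# ordered() with explicit levels preserves the intended order
x_ord <- ordered(x, levels = income_levels)
levels(x_ord)

# Sorting now follows the level order rather than the alphabet
sort(x_ord)
```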

Finally, look at the mapping to ensure it worked. Unfortunately table doesn’t play well with the %>% operator, so this step is a bit inelegant.

test <- relig_income %>%
  pivot_longer(-c(religion, religionClass), names_to = "income", values_to = "count") %>%
  inner_join(columnMetadata, by = "income") %>%
  mutate(income = ordered(income, levels = columnMetadata$income))

table(test$income, test$incomeGroup)
##
##                      Don't know/refused high low medium
##   <$10k                               0    0  18      0
##   $10-20k                             0    0  18      0
##   $20-30k                             0    0  18      0
##   $30-40k                             0    0   0     18
##   $40-50k                             0    0   0     18
##   $50-75k                             0    0   0     18
##   $75-100k                            0   18   0      0
##   $100-150k                           0   18   0      0
##   >150k                               0   18   0      0
##   Don't know/refused                 18    0   0      0
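
If staying inside the pipe is preferred, one possible alternative to table is dplyr's count, which returns one row per observed income/incomeGroup combination. This is a sketch rather than the approach used above; the metadata table is rebuilt by hand so the block runs on its own, and the row metadata column is left out for brevity.

```r
library(dplyr)
library(tidyr)

# Column metadata typed out explicitly (same content as the table above)
columnMetadata <- data.frame(
  income = c("<$10k", "$10-20k", "$20-30k", "$30-40k", "$40-50k",
             "$50-75k", "$75-100k", "$100-150k", ">150k",
             "Don't know/refused"),
  incomeGroup = c(rep("low", 3), rep("medium", 3),
                  rep("high", 3), "Don't know/refused"))

# count() tallies rows per income/incomeGroup pair in a new column n
mapping <- relig_income %>%
  pivot_longer(-religion, names_to = "income", values_to = "count") %>%
  inner_join(columnMetadata, by = "income") %>%
  count(income, incomeGroup)

mapping
```

Each income label should appear exactly once, paired with a single incomeGroup, and n should equal the number of rows in relig_income (18).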

The metadata columns are now available

We can plot the data summarized by our arbitrary grouping of religions and colored by our grouped income levels. Order the income classes to make a sensible presentation.

relig_income %>%
  pivot_longer(-c(religion, religionClass), names_to = "income", values_to = "count") %>%
  inner_join(columnMetadata, by = "income") %>%
  mutate(income = ordered(income, levels = columnMetadata$income)) %>%
  mutate(incomeGroup = ordered(incomeGroup, levels = c("low", "medium", "high", "Don't know/refused"))) %>%
  group_by(religionClass, income, incomeGroup) %>%
  summarize(meanCount = mean(count), .groups =  "drop_last") %>%
  ggplot(aes(x = income, y = meanCount, fill = incomeGroup)) +
  geom_col() +
  facet_wrap(vars(religionClass)) + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
