How to carry column metadata in pivot_longer
Pivoting data is often a pain point in bioinformatics workflows. Many bioinformatics tools are tied to the wide format, with data spread across multiple columns, whereas the whole tidyverse/ggplot system requires long data with as few columns as possible. Becoming proficient at switching your data to long format has several benefits: (1) it provides a unified format for any required data manipulations and summarizations, making them faster to write and easier to read, and (2) it is the required input format for the ggplot system. In R, the tidyverse provides the tools to convert between wide and long data.
The Problem:
Often the subjects in bioinformatics datasets (columns) have associated metadata such as treatments and indicators of groups or replicates. Any metadata that corresponds to rows can easily be added to the data.frame to be pivoted (e.g. with cbind). But column metadata have to be added manually after the pivot.
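As a minimal sketch of the easy case, row metadata can be attached with cbind before pivoting. The toy table and the names wide, rowMeta, and pathway are hypothetical and only serve as an illustration.

```r
# Hypothetical wide table: one row per gene, one column per sample
wide <- data.frame(
  gene    = c("geneA", "geneB", "geneC"),
  sample1 = c(5, 0, 12),
  sample2 = c(7, 1, 9)
)

# Hypothetical row metadata: one value per row of `wide`, in the same order
rowMeta <- data.frame(pathway = c("p1", "p1", "p2"))

# cbind() binds by position, so the row order must already match
wideWithMeta <- cbind(wide, rowMeta)
wideWithMeta
```

Because cbind matches by position rather than by key, this only works when the metadata rows are already in the same order as the data rows.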
The Solution:
There are a few ways to do this. The way I’ve settled on is to keep a table of target metadata and use a join after the pivot to attach it to the data. I find myself doing this repeatedly in almost all of my analyses, but it’s a solution I arrived at by trial and error. I’ve never seen it spelled out explicitly anywhere, so here it is.
library(tidyverse)  # provides relig_income, pivot_longer, the joins and ggplot
head(relig_income)
## # A tibble: 6 x 12
## religion `<$10k` `$10-20k` `$20-30k` `$30-40k` `$40-50k` `$50-75k` `$75-100k` `$100-150k`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Agnostic 27 34 60 81 76 137 122 109
## 2 Atheist 12 27 37 52 35 70 73 59
## 3 Buddhist 27 21 30 34 33 58 62 39
## 4 Catholic 418 617 732 670 638 1116 949 792
## 5 Don’t k~ 15 14 15 11 10 35 21 17
## 6 Evangel~ 575 869 1064 982 881 1486 949 723
## # ... with 3 more variables: `>150k` <dbl>, `Don't know/refused` <dbl>, religionClass <chr>
First, create the metadata.
I’ll use the relig_income dataset as an example. I’ll demonstrate how to add both row metadata (easy) and column metadata (a bit tricky). For row metadata I’ll add a new column for religion class that will be assigned randomly, and for column metadata I’ll group income levels into low, medium, high and unknown, listed in a separate data.frame. Note that this method relies on linking data column names to metadata, so check the metadata table carefully!
To add the row metadata I simply add a new column to the relig_income table with my random values. For the column metadata I’ll make a new data.frame.
## Row metadata
set.seed(10)
relig_income$religionClass <-
  sample(c("A", "B", "C"), nrow(relig_income), replace = TRUE)
## Column metadata
columnMetadata <- data.frame(
  income = c(colnames(relig_income)[
    grepl("0", colnames(relig_income))],
    "Don't know/refused"),
  incomeGroup = c(rep("low", 3), rep("medium", 3),
    rep("high", 3), "Don't know/refused"))
columnMetadata
## income incomeGroup
## 1 <$10k low
## 2 $10-20k low
## 3 $20-30k low
## 4 $30-40k medium
## 5 $40-50k medium
## 6 $50-75k medium
## 7 $75-100k high
## 8 $100-150k high
## 9 >150k high
## 10 Don't know/refused Don't know/refused
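Because the join depends on exact name matches, it can help to compare the two sets of names before pivoting. The following is a sketch of such a check, not part of the original workflow; it rebuilds the metadata table so it can run on its own, and the names pivotCols, missingMeta, and unusedMeta are my own.

```r
library(tidyr)  # provides the relig_income dataset

# Rebuild the column metadata table from above
columnMetadata <- data.frame(
  income = c(colnames(relig_income)[grepl("0", colnames(relig_income))],
             "Don't know/refused"),
  incomeGroup = c(rep("low", 3), rep("medium", 3),
                  rep("high", 3), "Don't know/refused"))

# Columns that will be pivoted: everything except the id/row-metadata columns
# (religionClass is listed in case it has already been added)
pivotCols <- setdiff(colnames(relig_income), c("religion", "religionClass"))

# Data columns with no metadata row (an inner join would drop these)
missingMeta <- setdiff(pivotCols, columnMetadata$income)

# Metadata rows that match no data column (often a typo in the table)
unusedMeta <- setdiff(columnMetadata$income, pivotCols)

# Stop early if the two sets of names disagree
stopifnot(length(missingMeta) == 0, length(unusedMeta) == 0)
```

If either vector is non-empty, printing it shows exactly which names need fixing in the metadata table.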
Step 1: pivot_longer as usual
Don’t forget to exclude the new religionClass column from the pivot.
relig_income %>%
  pivot_longer(-c(religion, religionClass), names_to = "income", values_to = "count")
## # A tibble: 180 x 4
## religion religionClass income count
## <chr> <chr> <chr> <dbl>
## 1 Agnostic C <$10k 27
## 2 Agnostic C $10-20k 34
## 3 Agnostic C $20-30k 60
## 4 Agnostic C $30-40k 81
## 5 Agnostic C $40-50k 76
## 6 Agnostic C $50-75k 137
## 7 Agnostic C $75-100k 122
## 8 Agnostic C $100-150k 109
## 9 Agnostic C >150k 84
## 10 Agnostic C Don't know/refused 96
## # ... with 170 more rows
Step 2: join the column metadata
All metadata columns will be added automatically in this step.
relig_income %>%
  pivot_longer(-c(religion, religionClass), names_to = "income", values_to = "count") %>%
  inner_join(columnMetadata, by = "income")
## # A tibble: 180 x 5
## religion religionClass income count incomeGroup
## <chr> <chr> <chr> <dbl> <chr>
## 1 Agnostic C <$10k 27 low
## 2 Agnostic C $10-20k 34 low
## 3 Agnostic C $20-30k 60 low
## 4 Agnostic C $30-40k 81 medium
## 5 Agnostic C $40-50k 76 medium
## 6 Agnostic C $50-75k 137 medium
## 7 Agnostic C $75-100k 122 high
## 8 Agnostic C $100-150k 109 high
## 9 Agnostic C >150k 84 high
## 10 Agnostic C Don't know/refused 96 Don't know/refused
## # ... with 170 more rows
Step 3 (optional): Convert character data to ordered factors to control plotting order
relig_income %>%
  pivot_longer(-c(religion, religionClass), names_to = "income", values_to = "count") %>%
  inner_join(columnMetadata, by = "income") %>%
  mutate(income = ordered(income, levels = columnMetadata$income))
## # A tibble: 180 x 5
## religion religionClass income count incomeGroup
## <chr> <chr> <ord> <dbl> <chr>
## 1 Agnostic C <$10k 27 low
## 2 Agnostic C $10-20k 34 low
## 3 Agnostic C $20-30k 60 low
## 4 Agnostic C $30-40k 81 medium
## 5 Agnostic C $40-50k 76 medium
## 6 Agnostic C $50-75k 137 medium
## 7 Agnostic C $75-100k 122 high
## 8 Agnostic C $100-150k 109 high
## 9 Agnostic C >150k 84 high
## 10 Agnostic C Don't know/refused 96 Don't know/refused
## # ... with 170 more rows
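To see why the conversion matters: ggplot orders character values alphabetically on a discrete axis, which scrambles the income brackets. A small base R sketch with three bracket labels (the vector name brackets is hypothetical) shows the difference between default and explicit levels.

```r
# Three income labels, listed in the order we want them plotted
brackets <- c("<$10k", "$10-20k", ">150k")

# As a plain factor, levels are sorted alphabetically (locale dependent),
# which need not match the intended bracket order
levels(factor(brackets))

# With explicit levels, the intended order is kept
levels(ordered(brackets, levels = brackets))
```

Taking the levels from columnMetadata$income, as in Step 3, works for the same reason: the metadata table was written in the intended order.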
Finally, look at the mapping to make sure it worked. Unfortunately table doesn’t play nicely with the %>% operator, so this step is a bit inelegant.
test <- relig_income %>%
  pivot_longer(-c(religion, religionClass), names_to = "income", values_to = "count") %>%
  inner_join(columnMetadata, by = "income") %>%
  mutate(income = ordered(income, levels = columnMetadata$income))
table(test$income, test$incomeGroup)
##
## Don't know/refused high low medium
## <$10k 0 0 18 0
## $10-20k 0 0 18 0
## $20-30k 0 0 18 0
## $30-40k 0 0 0 18
## $40-50k 0 0 0 18
## $50-75k 0 0 0 18
## $75-100k 0 18 0 0
## $100-150k 0 18 0 0
## >150k 0 18 0 0
## Don't know/refused 18 0 0 0
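If staying inside the pipe matters, one alternative is dplyr's count(), which returns the same tallies in long form. This is a sketch of a substitute check rather than the method used above; it rebuilds the metadata table so it can run on its own, and the name mappingCheck is my own.

```r
library(tidyr)
library(dplyr)

# Rebuild the column metadata table from above
columnMetadata <- data.frame(
  income = c(colnames(relig_income)[grepl("0", colnames(relig_income))],
             "Don't know/refused"),
  incomeGroup = c(rep("low", 3), rep("medium", 3),
                  rep("high", 3), "Don't know/refused"))

mappingCheck <- relig_income %>%
  # any_of() tolerates religionClass being absent from a fresh relig_income
  pivot_longer(-any_of(c("religion", "religionClass")),
               names_to = "income", values_to = "count") %>%
  inner_join(columnMetadata, by = "income") %>%
  # one row per income/incomeGroup pair that occurs, with its tally in `n`
  count(income, incomeGroup)

mappingCheck
```

Each income level should appear in exactly one row; a level that shows up with two different groups points to a duplicated or mistyped entry in the metadata table.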
The metadata columns are now available
We can plot the data summarized by our arbitrary grouping of religions and colored by our grouped income levels. Order the income classes to make a sensible presentation.
relig_income %>%
  pivot_longer(-c(religion, religionClass), names_to = "income", values_to = "count") %>%
  inner_join(columnMetadata, by = "income") %>%
  mutate(income = ordered(income, levels = columnMetadata$income)) %>%
  mutate(incomeGroup = ordered(incomeGroup, levels = c("low", "medium", "high", "Don't know/refused"))) %>%
  group_by(religionClass, income, incomeGroup) %>%
  summarize(meanCount = mean(count), .groups = "drop_last") %>%
  ggplot(aes(x = income, y = meanCount, fill = incomeGroup)) +
  geom_col() +
  facet_wrap(vars(religionClass)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))