dcast may incorrectly assign column names when the argument 'drop = F' #85

@hzarkoob

When the argument 'drop' is set to FALSE, in some cases dcast incorrectly assigns the column names.

Note 1. I understand that the reshape2 package is retired, but I think this might be important because, if I am not mistaken, some data manipulation tasks, including those that use formulas, are currently available only through the reshape2 package and not through the newer tidyr package.

Note 2. If anyone knows an alternative way to accomplish the task in the example below, I would appreciate it if they could let me know.

Example to demonstrate the issue:

a = data.frame(Year = c(2000, 2001, 2000, 2001), Country = c("A", "B", "B", NA), City = c("A1", "B1", NA, "C1"), Cost = c(10, 20, 50, 30))

print(a)

  Year Country City Cost
1 2000       A   A1   10
2 2001       B   B1   20
3 2000       B <NA>   50
4 2001    <NA>   C1   30

Now we apply the dcast function with the argument 'drop = F':

dcast(a, "Year ~ Country + City", aggregate.fun = sum, value.var = "Cost", drop = F)

Output:
  Year A_A1 A_B1 A_C1 B_A1 B_B1 B_C1 NA NA NA NA NA NA
1 2000   10   NA   NA   NA   NA   NA NA 50 NA NA NA NA
2 2001   NA   NA   NA   NA   NA   20 NA NA NA NA 30 NA

Note that the value for B_C1 in year 2001 is incorrectly reported as 20. The data contain no (B, C1) row; the 20 belongs to (B, B1). I think this happens only because the column names are assigned to the wrong columns.
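To make the suspected mismatch concrete, here is a minimal base R sketch of the labels I would expect for the 12 value columns. It rests on my own assumption, not on anything taken from the reshape2 source, that with 'drop = F' NA is treated as an extra level of Country and City, and that Country varies slowest because it comes first in the formula. The names country, city, grid and expected_names are only illustrative.

```r
# Assumption: NA counts as an extra level of each column variable, and
# Country varies slowest because it comes first in the formula.
country <- c("A", "B", NA)
city    <- c("A1", "B1", "C1", NA)

# expand.grid varies its first argument fastest, so City is listed first.
grid <- expand.grid(City = city, Country = country, stringsAsFactors = FALSE)

# paste() turns NA into the string "NA", giving labels such as B_NA.
expected_names <- paste(grid$Country, grid$City, sep = "_")

# Positions of the four non-missing values in the drop = F output
# (counting value columns only, Year excluded).
expected_names[c(1, 6, 8, 11)]
```

Under that assumption, positions 1, 6, 8 and 11 would be labelled A_A1, B_B1, B_NA and NA_C1, which are the four column names that 'drop = T' produces. That is consistent with the values being in the right place and only the names being off.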

Things look good with the argument 'drop = T'.
dcast(a, "Year ~ Country + City", aggregate.fun = sum, value.var = "Cost", drop = T)

Output:

  Year A_A1 B_B1 B_NA NA_C1
1 2000   10   NA   50    NA
2 2001   NA   20   NA    30
