data:image/s3,"s3://crabby-images/02e38/02e3834d773a313858605758d2c3afdf23fc1792" alt="Dplyr summarize n lines"
This is why one wants to avoid needless complexity in the first place. At some point you no longer correctly guess what intent the interpolation between the documentation, the examples, and the observed behavior actually represents. The summary is: when you end up filing 3 or more issues just to try and count rows (while in the middle of something else), you get tired.

Instead of group_by and summarise, you can use count. count() lets you quickly count the unique values of one or more variables: df %>% count(a, b) is roughly equivalent to df %>% group_by(a, b) %>% summarise(n = n()). count() is paired with tally(), a lower-level helper that is equivalent to df %>% summarise(n = n()). Supply wt to perform weighted counts, switching the summary from n = n() to n = sum(wt).

If summarise returns a single row even though the data is grouped, the problem is that it's using plyr::summarise, not dplyr::summarise: mtcars %>% group_by(cyl) %>% plyr::summarise(mean(disp), mean(hp)) ignores the grouping and returns one row, with mean(disp) = 230.7219 and mean(hp) = 146.6875. You can also just be explicit and say which package to pull summarise from, by writing dplyr::summarise in the same pipeline.

R displays only the data that fits onscreen: dplyr::glimpse(iris).

If tally() is meant to sum an existing n column, as help(tally) suggests, then the sparklyr example that appears to count rows correctly is not in fact a correct implementation of tally(), as it did not sum the n column as stated in the documentation.
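The equivalences described above can be checked directly. The following is a minimal sketch on the built-in mtcars data, assuming dplyr is installed; the result names (by_count, by_summarise, weighted, by_hand, explicit) and the output column names mean_disp and mean_hp are chosen here for illustration and do not come from the original text.

```r
library(dplyr)

# count() versus the explicit group_by() + summarise(n = n()) form.
by_count <- mtcars %>% count(cyl, gear)
by_summarise <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(n = n()) %>%
  ungroup()

# Weighted count: wt switches the summary from n = n() to n = sum(wt).
weighted <- mtcars %>% count(cyl, wt = hp)
by_hand <- mtcars %>%
  group_by(cyl) %>%
  summarise(n = sum(hp))

# Namespacing summarise guards against plyr masking dplyr when both
# packages are attached: the grouping is respected, one row per cyl.
explicit <- mtcars %>%
  group_by(cyl) %>%
  dplyr::summarise(mean_disp = mean(disp), mean_hp = mean(hp))

print(by_count)
print(weighted)
print(explicit)
```

Writing dplyr::summarise costs a few extra characters, but it makes the result independent of the order in which plyr and dplyr were loaded.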
I thought a bit more about the line from help(tally): “tally() is a convenient wrapper for summarise that will either call n() or sum(n) depending on whether you’re tallying for the first time”. I now assume one is to read “whether you’re tallying for the first time” to mean “if there is a column named n present” (and not “if you have called tally() more than once”, my first interpretation). So I guess it is to be expected that if there is an “n” column present, tally() will sum it instead of counting rows. However, under this interpretation the bulk of my observations remain true: you have to avoid the “n” column to get a count.
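One way to sidestep the ambiguity is to spell out the intended summary with summarise() instead of relying on tally() to pick between n() and sum(n). The sketch below assumes dplyr is installed; the data frame pre_counted and the output column names rows and total are hypothetical, and how tally() itself treats an existing n column may depend on the dplyr version in use.

```r
library(dplyr)

# Hypothetical pre-aggregated data: one row per group, with a column named n.
pre_counted <- data.frame(g = c("a", "b", "c"), n = c(10, 20, 30))

# Explicit row count: n() counts rows regardless of the existing n column.
row_count <- pre_counted %>% summarise(rows = n())

# Explicit total: sum the existing n column on purpose.
n_total <- pre_counted %>% summarise(total = sum(n))

print(row_count)
print(n_total)
```

Naming the outputs rows and total rather than n also keeps the result from being mistaken for a fresh count by later steps.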