dplyr

From Wikipedia, the free encyclopedia

Original authors: Hadley Wickham, Romain François, Lionel Henry, Kirill Müller, Davis Vaughan
Initial release: January 7, 2014
Stable release: 1.1.4[1] / 17 November 2023
Written in: R
License: MIT License
Website: dplyr.tidyverse.org

dplyr is an R package whose functions are designed to make manipulating a dataframe (a spreadsheet-like data structure) intuitive and user-friendly. It is one of the core packages of the tidyverse, a popular collection of packages for the R programming language.[2] Data analysts typically use dplyr to transform existing datasets into a format better suited to a particular type of analysis or data visualization.[3][4]

For instance, someone analyzing a large dataset may wish to view only a smaller subset of the data. Alternatively, a user may wish to rearrange the data to see the rows ranked by some numerical value, or by a combination of values from the original dataset. Functions in the dplyr package allow a user to perform such tasks.

dplyr was launched in 2014.[5] On the dplyr web page, the package is described as "a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges."[6]

While dplyr includes several dozen functions that enable various forms of data manipulation, the package features five primary verbs, or actions:[7]

  • filter(), which is used to extract rows from a dataframe, based on conditions specified by a user;
  • select(), which is used to subset a dataframe by its columns;
  • arrange(), which is used to sort rows in a dataframe based on the values in one or more columns;
  • mutate(), which is used to create new variables, by altering and/or combining values from existing columns; and
  • summarize(), also spelled summarise(), which is used to collapse values from a dataframe into a single summary.
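
The five verbs are commonly chained together. The following is a minimal sketch, assuming the dplyr package is installed and using mtcars, a dataset that ships with base R. The object names (cars, model, wt_lb, result, overall) are illustrative choices and not part of dplyr.

```r
library(dplyr)

# mtcars stores car names as row names; copy them into a regular column
# (the column name "model" is an illustrative choice).
cars <- mtcars
cars$model <- rownames(mtcars)

result <- cars %>%
  filter(cyl == 4) %>%            # keep only rows for four-cylinder cars
  select(model, mpg, wt) %>%      # keep three columns
  mutate(wt_lb = wt * 1000) %>%   # new column: wt is recorded in 1000 lb units
  arrange(desc(mpg))              # sort rows from highest to lowest mpg

# Collapse the whole dataframe into a single-row summary.
overall <- cars %>%
  summarize(mean_mpg = mean(mpg), n_cars = n())
```

Each verb takes a dataframe as its first argument and returns a new dataframe, which is what allows the calls to be chained with the %>% pipe operator that dplyr re-exports.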

Additional functions

In addition to its five main verbs, dplyr also includes several other functions that enable exploration and manipulation of dataframes. Included among these are:

  • count(), which is used to count the number of observations for each unique value of a variable, such as each level of a categorical attribute;
  • rename(), which enables a user to change the column names of variables, often to make a dataset easier to use and understand;
  • slice_max(), which returns a data subset that contains the rows with the highest values of some particular variable;
  • slice_min(), which returns a data subset that contains the rows with the lowest values of some particular variable.
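
These helper functions can be sketched in the same way. The example below again assumes dplyr is installed and uses the base R mtcars dataset. The new column names (miles_per_gallon, weight) and object names are hypothetical and chosen only for illustration.

```r
library(dplyr)

# Copy the row names of mtcars into a regular column for readability.
cars <- mtcars
cars$model <- rownames(mtcars)

# Number of rows for each distinct value of cyl; the count column is named n.
cyl_counts <- cars %>% count(cyl)

# rename() uses new_name = old_name.
renamed <- cars %>% rename(miles_per_gallon = mpg, weight = wt)

# Rows with the three highest and three lowest mpg values.
# with_ties = FALSE caps the result at exactly n rows even when values tie.
most_efficient  <- cars %>% slice_max(mpg, n = 3, with_ties = FALSE)
least_efficient <- cars %>% slice_min(mpg, n = 3, with_ties = FALSE)
```

By default slice_max() and slice_min() keep tied rows, so they may return more than n rows; with_ties = FALSE is set here to make the size of the result predictable.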

Built-in datasets

References
