Modern Data Science with R

Name: Modern Data Science with R
Price: 102,58 € EUR
Availability: InStock
Author: Baumer Benjamin S.; Kaplan Daniel T.; Horton Nicholas J.
ISBN: 9780367191498

Availability: Usually available in 20 days
Due to Brexit-related supply issues, deliveries may be delayed.

PRICE
107,98 €

NICEPRICE
102,58 €

DISCOUNT
5%

This product qualifies for FREE SHIPPING
when you select the Corriere Veloce (express courier) option at checkout.

Can also be paid for with Carta della cultura giovani e del merito, 18App Bonus Cultura, and Carta del Docente

Details

Type: Book

Language: English

Publisher: Chapman and Hall/CRC

Publication: 04/2021

Edition: New edition, 2nd edition

Publisher's Notes

From a review of the first edition: "Modern Data Science with R… is rich with examples and is guided by a strong narrative voice. What’s more, it presents an organizing framework that makes a convincing argument that data science is a course distinct from applied statistics" (The American Statistician). Modern Data Science with R is a comprehensive data science textbook for undergraduates that incorporates statistical and computational thinking to solve real-world data problems. Rather than focus exclusively on case studies or programming syntax, this book illustrates how statistical programming in the state-of-the-art R/RStudio computing environment can be leveraged to extract meaningful information from a variety of data in the service of addressing compelling questions. The second edition is updated to reflect the growing influence of the tidyverse set of packages. All code in the book has been revised and styled to be more readable and easier to understand. New functionality from packages like sf, purrr, tidymodels, and tidytext is now integrated into the text. All chapters have been revised, and several have been split, re-organized, or re-imagined to meet the shifting landscape of best practice.

Contents

Preface
   Background and motivation; Intended audience; Key features of this book; Changes in the second edition; Key role of technology; How to use this book; Acknowledgments

Part I: Introduction to Data Science

1. Prologue: Why data science?
   What is data science?; Case study: The evolution of sabermetrics; Datasets; Further resources
2. Data visualization
   The federal election cycle; Composing data graphics; Importance of data graphics: Challenger; Creating effective presentations; The wider world of data visualization; Further resources; Exercises; Supplementary exercises
3. A grammar for graphics
   A grammar for data graphics; Canonical data graphics in R; Extended example: Historical baby names; Further resources; Exercises; Supplementary exercises
4. Data wrangling on one table
   A grammar for data wrangling; Extended example: Ben’s time with the Mets; Further resources; Exercises; Supplementary exercises
5. Data wrangling on multiple tables
   inner_join(); left_join(); Extended example: Manny Ramirez; Further resources; Exercises; Supplementary exercises
6. Tidy data
   Tidy data; Reshaping data; Naming conventions; Data intake; Further resources; Exercises; Supplementary exercises
7. Iteration
   Vectorized operations; Using across() with dplyr functions; The map() family of functions; Iterating over a one-dimensional vector; Iteration over subgroups; Simulation; Extended example: Factors associated with BMI; Further resources; Exercises; Supplementary exercises
8. Data Science Ethics
   Introduction; Truthful falsehoods; Role of data science in society; Some settings for professional ethics; Some principles to guide ethical action; Algorithmic bias; Data and disclosure; Reproducibility; Ethics, collectively; Professional guidelines for ethical conduct; Further resources; Exercises; Supplementary exercises

Part II: Statistics and Modeling

9. Statistical foundations
   Samples and populations; Sample statistics; The bootstrap; Outliers; Statistical models: Explaining variation; Confounding and accounting for other factors; The perils of p-values; Further resources; Exercises; Supplementary exercises
10. Predictive modeling
   Predictive modeling; Simple classification models; Evaluating models; Extended example: Who has diabetes?; Further resources; Exercises; Supplementary exercises
11. Supervised learning
   Non-regression classifiers; Parameter tuning; Example: Evaluation of income models redux; Extended example: Who has diabetes this time?; Regularization; Further resources; Exercises; Supplementary exercises
12. Unsupervised learning
   Clustering; Dimension reduction; Further resources; Exercises; Supplementary exercises
13. Simulation
   Reasoning in reverse; Extended example: Grouping cancers; Randomizing functions; Simulating variability; Random networks; Key principles of simulation; Further resources; Exercises; Supplementary exercises

Part III: Topics in Data Science

14. Dynamic and customized data graphics
   Rich Web content using D3.js and htmlwidgets; Animation; Flexdashboard; Interactive Web apps with Shiny; Customization of ggplot2 graphics; Extended example: Hot dog eating; Further resources; Exercises; Supplementary exercises
15. Database querying using SQL
   From dplyr to SQL; Flat-file databases; The SQL universe; The SQL data manipulation language; Extended example: FiveThirtyEight flights; SQL vs R; Further resources; Exercises; Supplementary exercises
16. Database administration
   Constructing efficient SQL databases; Changing SQL data; Extended example: Building a database; Scalability; Further resources; Exercises; Supplementary exercises
17. Working with geospatial data
   Motivation: What’s so great about geospatial data?; Spatial data structures; Making maps; Extended example: Congressional districts; Effective maps: How (not) to lie; Projecting polygons; Playing well with others; Further resources; Exercises; Supplementary exercises
18. Geospatial computations
   Geospatial operations; Geospatial aggregation; Geospatial joins; Extended example: Trail elevations at MacLeish; Further resources; Exercises; Supplementary exercises
19. Text as data
   Regular expressions using Macbeth; Extended example: Analyzing textual data from arXiv.org; Ingesting text; Further resources; Exercises; Supplementary exercises
20. Network science
   Introduction to network science; Extended example: Six degrees of Kristen Stewart; PageRank; Extended example: men’s college basketball; Further resources; Exercises; Supplementary exercises
21. Epilogue: Towards "big data"
   Notions of big data; Tools for bigger data; Alternatives to R; Closing thoughts; Further resources

Part IV: Appendices

A. Packages used in this book
   The mdsr package; Other packages; Further resources
B. Introduction to R and RStudio
   Installation; Learning R; Fundamental structures and objects; Add-ons: Packages; Further resources; Exercises; Supplementary exercises
C. Algorithmic thinking
   Introduction; Simple example; Extended example: Law of large numbers; Non-standard evaluation; Debugging and defensive coding; Further resources; Exercises; Supplementary exercises
D. Reproducible analysis and workflow
   Scriptable statistical computing; Reproducible analysis with R Markdown; Projects and version control; Further resources; Exercises; Supplementary exercises
E. Regression modeling
   Multiple regression; Inference for regression; Assumptions underlying regression; Logistic regression; Further resources; Exercises; Supplementary exercises
F. Setting up a database server
   SQLite; MySQL; PostgreSQL; Connecting to SQL

Authors

Benjamin S. Baumer is an associate professor in the Statistical & Data Sciences program at Smith College. He has been a practicing data scientist since 2004, when he became the first full-time statistical analyst for the New York Mets. Ben is a co-author of The Sabermetric Revolution and Analyzing Baseball Data with R. He received the 2019 Waller Education Award and the 2016 Significant Contributor Award from the Society for American Baseball Research.

Daniel T. Kaplan is the DeWitt Wallace emeritus professor of mathematics and computer science at Macalester College. He is the author of several textbooks on statistical modeling and statistical computing. Danny received the 2006 Macalester Excellence in Teaching award and the 2017 CAUSE Lifetime Achievement Award.

Nicholas J. Horton is Beitzel Professor of Technology and Society (Statistics and Data Science) at Amherst College. He is a Fellow of the ASA and the AAAS, co-chair of the National Academies Committee on Applied and Theoretical Statistics, recipient of a number of national teaching awards, author of a series of books on statistical computing, and actively involved in data science curriculum efforts to help students "think with data".