Shalabh, Modern Data Science with R, Journal of the Royal Statistical Society Series A: Statistics in Society, Volume 185, Issue 2, April 2022, Pages 735–736, https://doi.org/10.1111/rssa.12784
Data science is the latest trend and a fascination for many established and upcoming academicians, statisticians and professionals engaged in teaching, research and analytics. A few popular questions that many people ponder are, for example, ‘How do I become a data scientist?’, ‘What should I learn to become a data scientist?’ and ‘What should I teach to train students as data scientists?’. It must be accepted that knowledge from several areas and subjects is required to become a successful data scientist. The book addresses such concerns and is the second edition of a text first published in 2017. Its contents are carefully organized into chapters that illuminate the various skills and expertise required to become a data scientist, ranging from technical competence, such as knowledge of software and computational capability, to statistical learning and database management. Without going into in-depth theoretical detail, it outlines various concepts, explains the challenges using data-based examples and demonstrates how to extract the information hidden in the data.
The topics in the book are partitioned into four parts comprising 21 chapters and 6 appendices. The first part consists of eight chapters and introduces the essential ingredients required to learn data science concepts. It explains the meaning of data science and covers data visualization, graphics, data wrangling, data scraping and data cleaning techniques. The second part is divided into five chapters on concepts related to statistics and modelling. They explain some basic topics such as predictive modelling, supervised and unsupervised learning, and simulation. The book’s third part has eight chapters on further topics required for data science. It illustrates and describes the process of creating dynamic graphics, managing databases using SQL, handling geospatial and text data, and network analysis, and provides a brief discussion of big data. The last part of the book consists of six appendices detailing the packages used in the book, R and RStudio software, algorithms, reproducible analysis, regression modelling and setting up database servers.
The authors have succeeded in choosing relevant topics, deciding the extent of knowledge to be delivered and, finally, arranging the material in an understandable sequence. This is a well-written book and does not cover much theory. A reader needs to have in-depth background knowledge of statistics, database management, R software and SQL from other resources to follow the book’s contents. The organization of the text will probably help newcomers to familiarize themselves with the different areas, topics and extent of knowledge required to become an expert in data science. Although the book does not provide a thorough understanding of any specific topic, it explains very well what to expect from data science when the same problem is viewed from a statistical point of view. Exercises are given in every chapter, and a bookdown version of the book is available.
The contents of the second edition have been updated, expanded, revised, split, rewritten and rearranged compared to the first edition. The key changes are the use of recently developed R packages, the splitting of three chapters from the earlier edition into two chapters each, updated exercises in the chapters and the availability of a bookdown version. Beginners who already have the first edition may not find much advantage in buying the second, but libraries are recommended to have it on their shelves. The second edition will be more helpful to those who are already working in the field of data science.