
Cleaning Data for Effective Data Science: Doing the other 80% of the work with Python, R, and command-line tools

Product ID: 46424041


Shipping Dimension Unknown Dimensions
-
3,387

*Price and Stocks may change without prior notice
*Packaging of actual item may differ from photo shown


About Cleaning Data for Effective Data Science: Doing the other 80% of the work with Python, R, and command-line tools

Product Description
A comprehensive guide for data scientists to master effective data cleaning tools and techniques.

Key Features
Think about your data intelligently and ask the right questions.
Master data cleaning techniques using hands-on examples drawn from diverse domains.
Work with detailed, commented, well-tested code samples in Python and R.

Book Description
In data science, data analysis, or machine learning, most of the effort needed to achieve your actual purpose lies in cleaning your data. Using Python, R, and command-line tools, you will learn the essential cleaning steps performed in every production data science or data analysis pipeline. This book teaches you not only how to prepare data but also what questions to ask of it.

The book dives into the practical application of tools and techniques needed for data ingestion, anomaly detection, value imputation, and feature engineering. It also offers long-form exercises at the end of each chapter so you can practice the skills you have acquired.

You will begin by ingesting data in a range of formats. Moving on, you will impute missing values, detect unreliable data and statistical anomalies, and generate the synthetic features needed to achieve your data analysis and visualization goals. By the end of this book, you will have a firm understanding of the data cleaning process necessary to perform real-world data science and machine learning tasks.

What you will learn
Ingest and work with common tabular, hierarchical, and other data formats
Apply useful rules and heuristics for assessing data quality and detecting bias
Identify and handle unreliable data and outliers in their many forms
Impute sensible values into missing data and use sampling to fix imbalances
Generate synthetic features that help to draw out patterns in your data
Prepare data competently and correctly for analytic and machine learning tasks

Who this book is for
This book is designed to benefit software developers, data scientists, aspiring data scientists, and students who are interested in data analysis or scientific computing. Basic familiarity with statistics, general concepts in machine learning, knowledge of a programming language (Python or R), and some exposure to data science are helpful. The text will also be helpful to intermediate and advanced data scientists who want to make their data hygiene more rigorous and would like a refresher on data preparation issues.

Table of Contents
Data Ingestion - Tabular Formats
Data Ingestion - Hierarchical Formats
Data Ingestion - Repurposing Data Sources
The Vicissitudes of Error - Anomaly Detection
The Vicissitudes of Error - Data Quality
Rectification and Creation - Value Imputation
Rectification and Creation - Feature Engineering
Ancillary Matters - Closure/Glossary

Review
"Far more time is usually spent in extracting, cleaning, normalizing, or fixing data that ultimately feeds a data scientist's models than is spent on the 'data science' itself. Despite this, data cleaning has so far lacked a comprehensive resource to teach newcomers about the practices that some of us have had to learn the hard way over many years. Cleaning Data for Effective Data Science is the first book I've seen that really meets that need. It's well-written and literate, with coherent and understandable explanations of both the structures used in handling real-world data and the many ways things can go wrong.
When I give talks about data cleaning, I'm often asked to recommend a book on this topic, and I've never had a really good answer. No more! I predict that this book will be a standard for a rising generation of data engineers, and deservedly so." -- Naomi Ceder, Former Chair, Python Software Foundation, Co-Founder/Organizer Trans*Code Hackday "The subject of Cleaning Data for Effective Data Science is vital yet, sadly, neglected in the literature. I and my fellow practitioners have learned most of what this book teaches on the job by t