Practical Data Quality

Recorded content: 10 hours total
Duration: 3 months (50 hours)
LIVE sessions: 4 workshops
Hands-on learning: with practice modules
Certificate: with license

The integrity of data (or lack thereof) affects the success of any analytical work. This Practical Data Quality training course teaches attendees how to maintain high data quality standards so that they can make sound tactical and strategic business decisions. Participants learn how to resolve errors and flaws in datasets, monitor data quality, build data workflows, and more.

Objective

  • Understand the factors that contribute to poor data quality
  • Measure data quality
  • Validate and normalize data
  • Perform unit testing
  • Implement best practices to ensure data quality

Outline

  • Data Quality Defined
  • Data Quality Dimensions/Properties
  • Interpreting Data Quality Properties
  • The Typical Data Analytics (Machine Learning) Pipeline
  • Data Quality Assurance
  • Common Factors Contributing to Poor Data Quality
  • Is Bad Data Quality a Good or a Bad Thing?
  • Data Quality is a Shared Concern
  • Data Governance
  • Common Issues that can be Prevented Through Effective Governance
  • The Data Steward Role
  • Common Steps to Overcome Data Quality Issues
  • Data Observability
  • Application Performance Monitoring (APM) and Observability Magic Quadrant
  • Example of (Operational) Observability Dashboard
  • Data Quality and Data Observability Relationship
  • Example of an Observability-Enabling Service
  • A Glossary of Business Terms
  • Data Dictionaries
  • Example of a Data Dictionary
  • SLAs
  • SLAs and Non-Functional Requirements
  • The Great, Fast, and Cheap Quality Diagram
  • Examples of Data Quality Metrics
  • Measuring Data Quality
  • Common Corrective Measures for Data Quality Problems
  • Descriptive Statistics
  • Correlation
  • Normal Distribution and Z-Score
  • Non-uniformity of a Probability Distribution
  • Shannon Entropy
  • Gini Impurity
  • Example of Using Gini Impurity Formula
  • Confusion Matrix
  • The Binary Classification Confusion Matrix
  • A Binary Classification Confusion Matrix Visually
  • Example of a Confusion Matrix
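Several of the measures listed above (z-scores, Shannon entropy, Gini impurity, and the binary confusion matrix) can be illustrated with a short Python sketch. The function names below are hypothetical helpers written for illustration, not part of the course materials or of any library, and the |z| > 3 outlier threshold is a common rule of thumb assumed here rather than a fixed standard.

```python
import math
from collections import Counter
from statistics import mean, pstdev
from typing import Hashable, List, Sequence, Tuple


def shannon_entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    n = len(values)
    # H = sum(p * log2(1/p)), equivalent to -sum(p * log2(p))
    return sum((c / n) * math.log2(n / c) for c in Counter(values).values())


def gini_impurity(values: Sequence[Hashable]) -> float:
    """Gini impurity: 1 - sum(p_i ** 2) over the category proportions."""
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())


def z_scores(xs: Sequence[float]) -> List[float]:
    """Standard scores (x - mean) / sigma, using the population std. deviation."""
    mu, sigma = mean(xs), pstdev(xs)
    if sigma == 0:
        return [0.0] * len(xs)  # all values identical: nothing deviates
    return [(x - mu) / sigma for x in xs]


def flag_outliers(xs: Sequence[float], threshold: float = 3.0) -> List[int]:
    """Indices of values whose absolute z-score exceeds `threshold`."""
    return [i for i, z in enumerate(z_scores(xs)) if abs(z) > threshold]


def binary_confusion_matrix(
    actual: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) counts for paired boolean labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    tn = sum(1 for a, p in pairs if not a and not p)
    return tp, fp, fn, tn


# A balanced two-category column has 1 bit of entropy and Gini impurity 0.5.
print(shannon_entropy(["a", "b", "a", "b"]), gini_impurity(["a", "b", "a", "b"]))  # prints: 1.0 0.5
# A single extreme reading among 20 is flagged by its index.
print(flag_outliers([10] * 19 + [100]))  # prints: [19]
```

With the population standard deviation, one outlier among n values can reach at most |z| = (n - 1)/sqrt(n), so a threshold of 3 can only fire on samples of 11 or more values; smaller samples need a lower threshold or a different test.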

  • Connecting to the Digital Realm
  • States of Digital Data
  • Maintenance
  • Automation
  • Workflow (Pipeline) Orchestration Systems
  • Example of a Workflow Orchestration System: Apache NiFi
  • NiFi Processor Types
  • Building a Simple Data Flow in the NiFi Designer
  • Logging Levels
  • Data Formats
  • Interoperable Data
  • Timeliness
  • Efficient Storage with Columnar Formats
  • Storage and Querying Efficiencies of the Parquet Columnar Storage Format
  • Assertions
  • The assert Expression in Python
  • Two Types of Errors
  • Runtime Errors/Exceptions
  • Life after an Exception
  • Assertions vs. Errors (Exceptions)
  • Data Validation
  • Data Normalization
  • DDL-based Data Validation
  • An SQL DDL Schema with Constraints Example
  • Apache Hive and Schema-on-Demand
  • An Example of Hive DDL
  • XML and JSON Schemas
  • The Schema Production and Consumption Diagram
  • Example of an XSD Schema Authoring Editor
  • Regular Expression Elements
  • What is Unit Testing and Why Should I Care?
  • Unit Testing and Test-Driven Development
  • TDD Benefits
  • Testing for Failure
  • Logging and Monitoring
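The assertion, validation, normalization, regular-expression, and unit-testing topics listed above fit together roughly as in this minimal Python sketch. The record layout and the "ORD- followed by six digits" ID rule are hypothetical, chosen only for illustration. Bad input raises `ValueError` because it is an expected runtime condition, while `assert` guards only an internal invariant: Python skips `assert` statements when run with `-O`, so they should not be relied on for input validation. The tests use the standard-library `unittest` module, including `assertRaises` to test for failure.

```python
import re
import unittest

# Hypothetical rule for illustration: an order ID is "ORD-" plus six ASCII digits.
ORDER_ID_PATTERN = re.compile(r"ORD-[0-9]{6}")


def validate_record(record: dict) -> dict:
    """Validate and normalize one input record; raise ValueError on bad data."""
    # Normalize first (trim whitespace, upper-case), then validate the result.
    order_id = str(record.get("order_id", "")).strip().upper()
    if not ORDER_ID_PATTERN.fullmatch(order_id):
        raise ValueError(f"malformed order_id: {record.get('order_id')!r}")

    raw_qty = record.get("quantity")
    try:
        # Converting via str rejects None, booleans, and floats such as 2.5
        # instead of silently truncating them.
        quantity = int(str(raw_qty).strip())
    except ValueError as exc:
        raise ValueError(f"missing or non-integer quantity: {raw_qty!r}") from exc
    if quantity <= 0:
        raise ValueError(f"quantity must be positive, got {quantity}")

    normalized = {"order_id": order_id, "quantity": quantity}
    # Internal invariant: should always hold if the checks above are correct.
    assert normalized["quantity"] > 0
    return normalized


class TestValidateRecord(unittest.TestCase):
    def test_normalizes_valid_record(self):
        self.assertEqual(
            validate_record({"order_id": " ord-000123 ", "quantity": "7"}),
            {"order_id": "ORD-000123", "quantity": 7},
        )

    def test_rejects_malformed_id(self):
        # Testing for failure: bad input must raise, not pass silently.
        with self.assertRaises(ValueError):
            validate_record({"order_id": "ORD-12", "quantity": 1})

    def test_rejects_missing_quantity(self):
        with self.assertRaises(ValueError):
            validate_record({"order_id": "ORD-000001"})

    def test_rejects_non_positive_quantity(self):
        with self.assertRaises(ValueError):
            validate_record({"order_id": "ORD-000001", "quantity": 0})
```

Saved as a module (for example, a hypothetical `validate_sketch.py`), the tests can be run with `python -m unittest validate_sketch`.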

  • The Consistency Consensus
  • The Two-phase Commit (2PC) Protocol Diagram
  • The CAP Theorem
  • Mechanisms for Guaranteeing a Single CAP Property
  • The CAP Triangle
  • Eventual Consistency
  • Example of the Consistency vs. Availability Gap
  • How eBay Preempts Possible Database Corruption
  • The Saga Pattern
  • Saga Log and Execution Coordinator
  • The Saga Happy Path
  • A Saga Compensatory Requests Example
  • The Event Sourcing Pattern
  • Event Sourcing Example
  • Applying Efficiencies to Event Sourcing
  • Time Accuracy and Consistency
  • Network Time Protocol (NTP)
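The saga topics listed above (saga log, execution coordinator, happy path, and compensatory requests) can be sketched in Python as follows. `SagaStep` and `SagaCoordinator` are hypothetical names, and the travel-booking steps are invented for illustration. This in-memory version shows only the control flow; a production saga would typically persist its log and make each compensation idempotent so that it can be retried after a crash.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SagaStep:
    """One local transaction plus the request that semantically undoes it."""
    name: str
    action: Callable[[], None]
    compensation: Callable[[], None]


@dataclass
class SagaCoordinator:
    """Runs steps in order and records progress in an in-memory saga log."""
    steps: List[SagaStep]
    log: List[Tuple[str, str]] = field(default_factory=list)

    def run(self) -> bool:
        completed: List[SagaStep] = []
        for step in self.steps:
            self.log.append((step.name, "started"))
            try:
                step.action()
            except Exception:
                self.log.append((step.name, "failed"))
                # Issue compensatory requests for completed steps, newest first.
                for done in reversed(completed):
                    done.compensation()
                    self.log.append((done.name, "compensated"))
                return False
            self.log.append((step.name, "completed"))
            completed.append(step)
        return True  # happy path: every step completed


# Hypothetical example: payment fails after two reservations succeed.
bookings: List[str] = []


def charge_card() -> None:
    raise RuntimeError("card declined")


saga = SagaCoordinator([
    SagaStep("flight", lambda: bookings.append("flight"), lambda: bookings.remove("flight")),
    SagaStep("hotel", lambda: bookings.append("hotel"), lambda: bookings.remove("hotel")),
    SagaStep("payment", charge_card, lambda: None),
])
succeeded = saga.run()
print(succeeded, bookings)  # prints: False []
```

Because no step holds a lock across services, other readers can briefly observe the flight and hotel reservations before they are compensated; this is the eventual-consistency trade-off that sagas accept in exchange for avoiding a two-phase commit.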

Training Materials

All Data Quality training attendees receive comprehensive courseware.

Software Requirements

  • Computer with Internet connectivity
  • Ability to install software on the computer
  • Recent 64-bit OS, such as Windows 10, macOS, or Linux

Why Online Bootcamps

Develop skills for real career growth

Cutting-edge curriculum designed with guidance from industry and academia to develop job-ready skills

Learn by working on real-world problems

Capstone projects involving real-world datasets, with virtual labs for hands-on learning

Learn from experts active in their field, not out-of-touch trainers

Leading practitioners who bring current best practices and case studies to sessions that fit into your work schedule.

Structured guidance ensuring learning never stops

24x7 learning support from mentors and a community of like-minded peers to resolve any conceptual doubts

FAQs

    SkillsMatrix JobAssist is an India-specific program offered in partnership with IIMJobs.com to help you land your dream job. Through JobAssist, we offer extended support to Lean Six Sigma certified learners who are looking to switch jobs or are starting their first job. After successfully completing the Lean Six Sigma Master’s Program, you will be eligible to apply for the JobAssist program, and your details will be shared with IIMJobs. As part of the JobAssist program, IIMJobs offers the following exclusive benefits:

  • IIMJobs Pro-Membership for 6 months
  • Resume-building assistance
  • Spotlight on IIMJobs to highlight your profile to recruiters

    The iimjobs.com Pro-Membership offers learners exclusive features that aren’t available to free members.

  • Dedicated career experts from IIMJobs guide learners through registering on the portal and provide tips on improving their profiles and using the right keywords.
  • Access to Insights: Compare your application with other applications received for similar jobs.
  • Improved profile visibility: Your application is promoted to the top of the applications list.
  • Shortlist notifications: Receive a notification if a recruiter shortlists your application, then chat with the recruiter directly and follow up on the application.

    To participate in the JobAssist program you need to:

  • Be a graduate (engineering or equivalent)
  • Successfully complete our Lean Six Sigma program and earn the certificate

    In career mentoring sessions, Subject Matter Experts (SMEs) or industry experts answer questions related to career growth and opportunities.

    No, the JobAssist program is designed to help you find your dream job by maximizing your potential and your chances of landing a job. The final selection always depends on the recruiter.

    No, neither SkillsMatrix nor IIMJobs (our JobAssist program partner) will forward your resume to recruiters directly. Pro-Membership gives you access to thousands of jobs to apply for on the portal, as well as job fairs that are held from time to time.