Carnegie Mellon University

Applied Data Analytics 

Instructor: Ravi Starzl
CEU Units: 4.8
Number of Lectures: 12
Hours per Week: 8-10
Tuition: $2,700

Course Objectives

This course prepares students to become entry-level technical team players on data analysis projects and trains them to use pre-built analytic libraries to solve business intelligence, regression, classification, and clustering problems on the R platform. Students will learn to program in the R environment and to connect R to a cluster for large-scale data processing. The course is suitable for expert Excel users who are not developers. No Java programming is required.

Upon course completion students will:

  • Understand basic principles of data preparation and analysis
  • Fluently program R scripts to conduct a wide range of analyses
  • Develop techniques to home in on the information needs of stakeholders
  • Understand the application of machine learning techniques to various problems
  • Understand how to use statistics to describe data and fulfill stakeholder information needs

Prerequisites

Familiarity with the Unix/Linux command line, a creative and inquisitive mind, and determination. Familiarity with basic statistical concepts or prior machine learning experience will be helpful but is not a prerequisite. This course requires a computer capable of installing and running the R computing environment, along with an IDE such as RStudio, available at: https://www.rstudio.com

Required Textbook

"An Introduction to Statistical Learning"
James, G., Witten, D., Hastie, T., and Tibshirani, R. 2013, XIV, 426 p., 150 illus. Available at: http://www-bcf.usc.edu/~gareth/ISL/

Topics

Lecture 1: Introduction to Machine Learning and Data Analysis

Lecture 1-2:   Evaluating Information Needs and Conducting Regression Analysis
Lecture 3:  Structuring Problems, Data, and Features for Regression Analysis
Evaluating Regression Models and Presenting the Results
Lecture 4-5: Evaluating Information Needs and Conducting Classification Analysis
Lecture 6:  Structuring Problems, Data, and Features for Classification Analysis 
Evaluation of Classification Models and Presentation of Classification Results to Non-Technical Audiences
Lecture 7-8: Evaluating Information Needs and Conducting Clustering Analysis
Lecture 9: Structuring Problems, Data, and Features for Clustering Analysis
Evaluation of Clustering Models and Presentation of Clustering Results to Non-Technical Audiences
Lecture 10: Conceptual Introduction to Big Data Systems and Analysis
Lecture 11-12: Conduct Business Intelligence Analysis on AWS Redshift and Present Results to Non-Technical Audience