DATA 201 (2020) - Home Page

Announcements

30/3/2020: The assessment criteria and course information below have been updated to reflect the current situation.

22/3/2020: Assignment 2 is now available (under Assignments). Submit your completed Python notebook using the ECS assignments system.

For all students enrolling in DATA201 in 2020:

  • The Python computer language is used in DATA201. If you have taken a course like COMP132 then you already have the required background.
  • If you've never done a course in Python before, you should work through an online tutorial introduction to Python. There are many free courses available: for example there is this one at datacamp.com.

Course Information

  • General course information is available here.

  • Lectures will be made available as a combination of written notes, videos, and Python notebooks as they are ready. Lecturers will hold a 1-hour Zoom session for discussions when the course restarts. Announcements will be made via Blackboard.

  • There will be drop-in Zoom sessions for lab advice when the course restarts.

  • The mid-term test will be in week 7. It will be made available on Blackboard between 4pm Wednesday 20th May and 4pm Friday 22nd May. You will have one hour from the time you download it to complete it and upload your solution. A mock test will be made available in advance.

  • Assessment is based on:
      Two submitted or available fortnightly assignments (5% each): 10%
      Three upcoming fortnightly assignments (20% each): 60%
      Mid-term test (1 hour): 10%
      One project (10 hours): 20%
    Students are required to achieve 50% of the overall marks.

  • Assignments. Assignments will become available on the website about 10-14 days before they are due. They will be submitted and returned electronically through the ECS Assessment System.

Assignment 1.ipynb: Assignment 1

A2.ZIP: Assignment 2

  • Paper Free. This is a paperless course. All course materials will be placed on the website in advance of the lecture.

  • Software. This course will use Python 3 in Jupyter Notebooks. This is free software that should run on most computers (but not tablets or phones). The guide to running a notebook here is a good place to start with installing the software, which will also be available on Victoria computers.

To open one of the notebooks below, save it to your computer, then start Jupyter Notebook and open the saved notebook from the Jupyter page that appears in your web browser.
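Once Jupyter is running, a quick check in a new notebook cell can confirm that the environment is ready. The following is a minimal sketch, not an official course script: the helper names `check_python` and `missing_packages` are made up for illustration, and the package list assumes the libraries named in the Week 2 material (NumPy, Matplotlib, Pandas) are installed under their usual import names.

```python
import importlib.util
import sys


def check_python(minimum=(3, 0)):
    """Return True if the running interpreter is at least version `minimum`."""
    return sys.version_info >= minimum


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the libraries mentioned in the course plan.
course_packages = ["numpy", "matplotlib", "pandas"]

print("Python version:", sys.version.split()[0])
print("Python 3 available:", check_python())
print("Missing packages:", missing_packages(course_packages))
```

An empty list on the last line suggests the libraries are installed. Any names listed there would need to be installed before the corresponding notebooks will run.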

Course Plan

Week | Topic | Lecturer | Material | Other
1 | Intro to Data Science | Stephen | Slides | No lab
 | | | Lecture1 Python notebook |
2 | Python Programming | Binh | Python Programming Basics |
 | | | Introduction to NumPy |
 | | | Introduction to Matplotlib |
 | | | Introduction to Pandas |
 | | | Notebooks.zip |
 | | | BN - Lectures 1-2 slides |
 | | | BN - Lecture 3 Notebook |
 | | | | Tutorial 1
 | | | | Tutorial 1 solution
3 | An End-to-End Machine Learning Example | Binh | BN - Lecture 4 slides |
 | | | BN - Lecture 4 Notebook and Data |
 | | | BN - Lectures 5-6 slides (updated 20/3) |
 | | | BN - Lectures 5-6 Notebook and Data (updated 20/3) |
 | | | | Tutorial 2
 | | | | Tutorial 2 solution
 | | | | Assgt 1 due
(University break)
4 | Statistical review | Richard | | Tutorial 3
 | | | | Assgt 2 due
5 | Privacy, security and ethics | Richard | |
6 | Classification | Binh | BN - Lectures (5-6-)7 slides (updated 1/4) |
 | | | BN - Lectures (5-6-)7 Notebook and Data | Assgt 3 due
7 | Algorithms I | Binh | | Mid-term test
8 | Linear Algebra | Stephen | | Assgt 4 due
9 | Dimensionality Reduction and Data Visualisation | Stephen | |
10 | Algorithms II | Binh | | Assgt 5 due
11 | Basic optimisation | Stephen | |
12 | No new material | Binh | | Project due

Datasets

default_credit.csv
default_credit.xls
daily_flask_co2_nzd.csv: CO2 data for NZ
daily_flask_co2_mlo.csv: CO2 data for Mauna Loa
TaranakiStWharf.csv: Water quality at Taranaki St
electricity-statistics.xlsx
LifeExpectancy.csv: Life Expectancy Dataset
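After downloading one of the CSV files above, it can help to look at the first few rows before doing any analysis. The sketch below uses only Python's standard `csv` module. The helper name `preview_csv` is made up for illustration, and the code assumes the file is UTF-8 text whose first row is a header, which may not hold for every dataset listed here.

```python
import csv
import os


def preview_csv(path, n=5):
    """Return the header row and the first n data rows of a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # assumed: the first row holds column names
        rows = [row for _, row in zip(range(n), reader)]
    return header, rows


# Example use, assuming the file has been saved in the current folder.
example_file = "LifeExpectancy.csv"
if os.path.exists(example_file):
    header, rows = preview_csv(example_file)
    print(header)
    for row in rows:
        print(row)
```

All values come back as strings. Converting columns to numbers, or loading the file with Pandas instead, would be the next step in a notebook.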

Mid-term Test

The mid-term test will cover material from the first half of the course. You will not need to write any code, but will need to consider what a data scientist does, with respect to possible data problems.

A note on Assignments

The purpose of the lab session and the assignments is to help you learn. Attending labs and lectures, and working seriously on assignments, is strongly correlated with success in mathematics courses. Ignore this at your peril.

Class representatives

The class representative for this course is Elyse Smaill (smaillelys@myvuw.ac.nz). The Facebook page is here.

University policies and statutes

It is worthwhile becoming familiar with the following information. Other relevant policies can be found at the academic policy website.