DATA 201 (2020) - Home Page

Announcements

14/6/2020: The Assignment 5 sheet below has been updated to make things a little easier for you. I recommend downloading this version and pasting your previous answers into it.

9/6/2020: The project is now available. It is due before 6pm on Friday 26th June, the last possible date in the semester.

2/6/2020: Assignment 5 is now available. It is due before 6pm on the extended date of Wednesday 17th June.

Course Information

  • General course information is available here.

  • Lectures will be made available as a combination of written notes, videos, and Python notebooks as they are ready. Lecturers will hold a one-hour Zoom session for discussions when the course restarts. Announcements will be made via Blackboard.

  • There will be drop-in Zoom sessions for lab advice when the course restarts.

  • The term test was in week 7.

  • Assessment is based on:
      Two submitted or available fortnightly assignments (5% each): 10%
      Three upcoming fortnightly assignments (20% each): 60%
      Mid-term test (1 hour): 10%
      One project (10 hours): 20%
    Students are required to achieve at least 50% of the overall marks.

  • Assignments. Assignments will become available on the website about 10-14 days before they are due. They will be submitted and returned electronically through the ECS Assessment System.

Assignments and Tests | Solutions
Assignment 1: Assignment 1.ipynb | Assignment 1_Solution.ipynb: Assignment 1 Solution
Assignment 2: A2.ZIP | Suggested solution (uploaded 13/5): A2_sol.ipynb
Assignment 3: Questions and the data required: EuropeanBirds.csv, and some useful background information: EuropeanBirds-Information.txt. Submit via the ECS submission system by 6pm on Friday 15 May. | Assignment 3 Solutions
Term test: Questions are in this zipfile: DATA201-MidtermTest.zip. Submit via the ECS submission system by Fri 22 May at 4pm. | DATA201 Test Solutions
Assignment 4: Questions and the data required: A4.ZIP. Submit via the ECS submission system by midnight on Saturday 30 May. | A4-solution.zip
Assignment 5: Questions and data. Submit via the ECS submission system by 6pm on Wednesday 17th June. |
Project: DATA201proj.pdf, due by 6pm on Friday 26th June. |

  • Paper Free This is a paperless course. All course materials will be placed on the website in advance of the lecture.

  • Software This course will use Python 3 in Jupyter Notebooks. This is free software that should run on most computers (but not tablets or phones). The guide to running a notebook here is a good place to start with installing the software, which will also be available on Victoria computers.

To open one of the notebooks below, save it to your computer, start Jupyter Notebook, and then open the saved file from the Jupyter page that appears in your web browser.
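Before opening a notebook, it can help to confirm that the software is installed. The following is a minimal sketch, not part of the official course material: the function name `check_environment` is hypothetical, and the list of package names (`notebook`, `numpy`, `pandas`, `matplotlib`) is an assumption based on the topics covered in Week 2. Adjust it to match your own installation.

```python
import importlib.util
import sys


def check_environment(modules=("notebook", "numpy", "pandas", "matplotlib")):
    """Report whether Python 3 is running and which packages can be found.

    The default module names are assumptions, not a list supplied by the course.
    """
    report = {"python3": sys.version_info.major == 3}
    for name in modules:
        # find_spec returns None when a top-level package is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, found in check_environment().items():
        print(f"{item}: {'found' if found else 'MISSING'}")
```

If any entry is reported as missing, the installation guide linked above is the place to start.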

Course Plan

Week | Topic | Lecturer | Material | Other
1 | Intro to Data Science | Stephen | Slides; Lecture1 Python notebook | No lab
2 | Python Programming | Binh | Python Programming Basics; Introduction to NumPy; Introduction to Matplotlib; Introduction to Pandas; Notebooks.zip; BN - Lectures 1-2 slides; BN - Lecture 3 Notebook | Tutorial 1; Tutorial 1 solution
3 | An End-to-End Machine Learning Example | Binh | BN - Lecture 4 slides; BN - Lecture 4 Notebook and Data; BN - Lectures 5-6 slides (updated 20/3); BN - Lectures 5-6 Notebook and Data (updated 20/3) | Tutorial 2; Tutorial 2 solution; Assgt 1 due
(University break)
4 | Statistical review | Richard | Probability 1; Probability 2; Probability 3; Statistics 1; Statistics 2 | Tutorial 3; Tutorial 3 solution; Assgt 2 due
5 | Privacy, security and ethics | Richard | Privacy Security Confidentiality; Ethics of Data Science | Tutorial 4; Tutorial 4 solution
6 | Classification | Binh | BN - Lectures (5-6-)7 slides; BN - Lectures (5-6-)7 Notebook and Data; BN - Lectures 8-9 slides; BN - Lectures 8-9 Notebook | Tutorial 5; Tutorial 5 Solution; Assgt 3 due
7 | Algorithms I | Binh | BN - Lecture 10 slides; BN - Lecture 10 Notebook; BN - Lecture 11 slides; BN - Lecture 11 Notebook; BN - Lecture 12 slides; BN - Lecture 12 Notebook | Tutorial 6; Tutorial 6 Solution; Midterm test
8 | Linear Algebra | Stephen | SM - handwritten notes; SM - iPython notebook | Assgt 4 due; Week 8 tutorial; cute.jpg: Picture for week 8 tutorial; Tutorial solution
9 | Dimensionality Reduction and Data Visualisation | Stephen | SM - handwritten notes; SM - iPython notebook | Week 9 tutorial; Tutorial solution
10 | Algorithms II | Binh | BN - Lecture 13 slides; BN - Lecture 13 Notebook; BN - Lecture 14 slides; BN - Lecture 14 Notebook; BN - Lecture 15 slides; BN - Lecture 15 Notebook | Assgt 5 due; Week 10 tutorial; Tutorial solution
11 | Basic optimisation | Stephen | SM - handwritten notes; SM - iPython notebook | Week 11 tutorial; Tutorial solution
12 | No new material | | | Project due

Datasets

projectdata.zip Data for the project
default_credit.csv default_credit.csv
default_credit.xls default_credit.xls
daily_flask_co2_nzd.csv CO2 data for NZ
daily_flask_co2_mlo.csv CO2 data for Mauna Loa
TaranakiStWharf.csv Water quality at Taranaki St
electricity-statistics.xlsx electricity-statistics.xlsx
LifeExpectancy.csv Life Expectancy Dataset
Olympic100m.csv Olympic Games 100m times
SAheart.csv Heart Health data
SURFIncomeSurvey.csv SURF Income Survey from Stats NZ
fishdata.csv Fish weights and lengths
titanic.csv Passenger list of the Titanic
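
Once a dataset has been downloaded, a quick look at its first few rows is a sensible first step. The sketch below uses only the Python standard library. The function name `preview_csv` is hypothetical, and it assumes the file has a header row and is UTF-8 encoded, which has not been checked for each file listed above. In the course notebooks you may prefer to load the same files with Pandas instead.

```python
import csv
from itertools import islice


def preview_csv(path, n_rows=5):
    """Return (header, first n_rows data rows) from a CSV file.

    Assumes the first line is a header row and the file is UTF-8 encoded.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])           # empty list if the file is empty
        rows = list(islice(reader, n_rows))  # at most n_rows data rows
    return header, rows


# Example use, assuming titanic.csv has been saved in the working directory:
#   header, rows = preview_csv("titanic.csv")
#   print(header)
#   for row in rows:
#       print(row)
```

All values are returned as strings; converting columns to numbers is left to whatever analysis follows.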

Mid-term Test

The mid-term test will cover material from the first half of the course (up to Week 6). You will need to consider how a data scientist would approach possible data problems. You will not need to write complete programs, but you will be expected to understand given code and/or fill in a few missing lines of it. The format of most of the questions will be similar to last year's exam questions: 2019_1_DATA201.pdf (for example, Question 4(a-d) and Question 5(a-e, g-j)).

A note on Assignments

The purpose of the lab session and the assignments is to help you learn. Attending labs and lectures, and working seriously on assignments, is strongly correlated with success in mathematics courses. Ignore this at your peril.

Class representatives

The class representative for this course is Elyse Smaill (smaillelys@myvuw.ac.nz). The Facebook page is here.

University policies and statutes

It is worthwhile becoming familiar with the following information. Other relevant policies can be found at the academic policy website.

Topic attachments
Attachment | Size | Date | Who | Comment
A4-solution.zip | 69 K | 16 Jun 2020 - 23:21 | Main.nguyenb5 | A4 solution
Assignment 1_Solution.ipynb | 94 K | 15 May 2020 - 09:45 | Main.marslast | Assignment 1 Solution
Assignment5.ipynb | 202 K | 14 Jun 2020 - 22:03 | Main.marslast | A
DATA201proj.pdf | 122 K | 09 Jun 2020 - 14:49 | Main.marslast | Project Specification
Introduction to Pandas.ipynb | 36 K | 25 Feb 2020 - 16:49 | Main.marslast | Introduction to Pandas
Introduction_to_Matplotlib.pdf | 996 K | 11 Mar 2020 - 23:41 | Main.nguyenb5 | Introduction to Matplotlib
Introduction_to_NumPy.pdf | 252 K | 09 Mar 2020 - 22:08 | Main.nguyenb5 | Introduction to NumPy
Introduction_to_Pandas.pdf | 412 K | 11 Mar 2020 - 23:42 | Main.nguyenb5 | Introduction to Pandas
Lecture 8.ipynb | 88 K | 18 May 2020 - 21:29 | Main.marslast | Lecture 8 Python Notebook
Lecture11.ipynb | 83 K | 16 Jun 2020 - 18:25 | Main.marslast |
Lecture9.ipynb | 43 K | 30 May 2020 - 10:11 | Main.marslast | Lecture 9 Python Notebook
Linear Algebra.pdf | 2 MB | 18 May 2020 - 21:27 | Main.marslast | Notes for Week 8
Notebooks.zip | 898 K | 11 Mar 2020 - 23:43 | Main.nguyenb5 | Week 2 - Notebook files
Optimisation.pdf | 6 MB | 14 Jun 2020 - 22:32 | Main.marslast | Notes
Principal Components Analysis.pdf | 4 MB | 30 May 2020 - 10:10 | Main.marslast | Lecture 9
Python_Programming_Basics.pdf | 334 K | 09 Mar 2020 - 22:08 | Main.nguyenb5 | Python Programming Basics
Tutorial 2-Solution.ipynb | 409 K | 02 Jun 2020 - 08:33 | Main.marslast | Tutorial solution
Tutorial 2.ipynb | 407 K | 24 May 2020 - 20:49 | Main.marslast | Week 8 tutorial
Tutorial3.ipynb | 64 K | 30 May 2020 - 10:16 | Main.marslast | Week 9 tutorial
Tutorial3_solution.ipynb | 66 K | 08 Jun 2020 - 12:37 | Main.marslast | Tutorial solution
Tutorial9.ipynb | 4 K | 14 Jun 2020 - 22:35 | Main.marslast |
Tutorial9_solutions.ipynb | 26 K | 22 Jun 2020 - 21:44 | Main.marslast |
cute.jpg | 67 K | 24 May 2020 - 20:50 | Main.marslast | Picture for week 8 tutorial
landmark_faces.txt | 4 MB | 02 Jun 2020 - 16:28 | Main.marslast | Face landmark dataset
lecture_4.pdf | 3 MB | 17 Mar 2020 - 01:07 | Main.nguyenb5 | Lecture 4 slides
lectures_5_6.pdf | 5 MB | 20 Mar 2020 - 03:10 | Main.nguyenb5 | Lectures 5-6 slides
lectures_5_6_nb.zip | 930 K | 20 Mar 2020 - 03:10 | Main.nguyenb5 | Lectures 5-6 notebook
projectdata.zip | 1 MB | 09 Jun 2020 - 14:48 | Main.marslast | Data for the project
tutorial_3_sol.zip | 27 K | 09 Apr 2020 - 20:28 | Main.nguyenb5 | Tutorial 3 Solution
w2_lecture_2.pdf | 2 MB | 11 Mar 2020 - 23:47 | Main.nguyenb5 | Week 2 - Lecture 2