Syllabus Scientific Programming / Data Processing

General info

Welcome to this programming track! In the weeks ahead, you’ll use the Python programming language to solve problems from several fields of science. This track is intended for students who have no programming experience at all. It comprises three courses (Scientific Programming 1, Scientific Programming 2 and Data Processing) in which you learn the Python language by working on programming problems from several scientific areas. The three courses are designed to be taken together, but you can choose to follow only one or two of them.


Prerequisites

Scientific Programming 1 assumes no prior programming experience. If you have already done a course in Python, or if you have extensive experience in another programming language, this course might not be your best option—but we’re happy to refer you to other courses if you’d like!

Other than that, some modules assume high school mathematics or physics, but in those cases you can choose an alternative module that doesn’t. If you feel overwhelmed, don’t hesitate to contact the course staff! We can explain the course’s philosophy and requirements, and make recommendations on how to approach problems.

Scientific Programming 2 and Data Processing only assume the preceding course as prior knowledge.

Learning goals

Scientific Programming 1 is a beginner’s course. We will teach you the basics of Python programming as well as several different ways of solving computational problems. After this course, we envision that you:

After Scientific Programming 2 you should be able to independently tackle typical programming challenges that you might encounter in your field of study or research. We will teach you more intermediate Python concepts, as well as some more advanced concepts pertaining to data analysis. After this course, we envision that you:

For Data Processing you’ll build your own toolkit of useful programs with which you can read, transform and analyze data that you might find in various scientific areas. After this course we envision that you:

Course materials

All the reading and video material is available on this website. You do not need to purchase any books or software. Every module consists of short explanations (written and in the form of videos) and assignments. You do need to bring your own laptop.

Contact

Contact: scientific@proglab.nl

Getting started

Your entry to the course is the sidebar, where you can leaf through all modules (levels) that you have to complete. To get started:

Beware that these are all the modules for all the Scientific Programming courses! Don’t start working on all of them until you’ve read below which specific modules you need to do for the course you are following.

The track

The Scientific Programming track consists of three courses:

You can follow each course at a different pace to fit your schedule. You don’t have to follow the entire track; you can also follow a single course. If you already have some programming experience and would like to skip the first (two) course(s), please contact us.

Course specific information can be found further down.

Structure

The track is designed to be very flexible, so there are no compulsory sessions. There are no lectures (except for a kick-off meeting at the start of Scientific Programming 1) and the tutorials are optional. You can follow the courses at any pace that fits your schedule.

Help

The fact that the track is flexible doesn’t mean you are on your own. We provide a lot of help with the programming assignments throughout the course, but it is up to you to seek out this help.

There are two ways to get help: the tutorials and the forum.

  • Tutorials: you can find the tutorial schedule here: datanose
  • Forum (online): we use Ed as an online discussion platform. Here you can discuss the assignments with other students and with the staff. You will be invited to Ed when you are enrolled in the course.

Programming modules

You’re going to learn programming through a number of programming modules. Each module consists of:

The modules are grouped into levels, and you have to complete one module per level. For some levels you can choose between two different modules. When there is such a choice, both modules teach the same programming concepts, but often in a different thematic context (i.e. a different scientific field).

Below is an overview of all modules for all courses.

Scientific Programming 1 (Level 1 - Level 3)

Level 0 Python Installation
Level 1 (choose one of the modules) ALGORITHMS. Learn to think like a computer. Things that we intuitively know how to do, like drawing a pyramid or computing change for a payment, are hard to get a computer to do right. In this module you’ll learn how to break down such intuitive problems into steps that even a computer can understand. or NUMBERS. How do you know if a number is a prime number? Number theory is the study of the properties of numbers. In this mathematically oriented module you create a series of programs that compute this and other properties of numbers. No math knowledge is required for this module. (You will learn some, though.)
Level 2 (choose one of the modules) TEXT. Natural language processing is the science of making a computer understand (something about) natural human language. You will learn how you can get a computer to understand the sentiment of tweets: is the tone of a tweet positive or negative? or NUMERICAL INTEGRATION. In many scientific fields you need to determine the area under the curve of a function. Integration is a mathematical tool for doing so. However, this tool doesn't always work, and in such cases we can use numerical integration techniques to let the computer do the work for us. You will learn two important techniques for numerical integration.
Level 3 (no choice) BIG-DATA. In this module you will learn to work with data. You will, for example, analyze weather data from the Netherlands and answer questions like: When was the first heat wave? What was the longest freezing period?
Bonus (this module is optional) MOVEMENT. What happens if you dig a tunnel from one side of the planet to the other and then fall into it? In this module you’re going to simulate that situation. In physics you often run into problems that are too laborious to compute by hand. In this module you’ll learn how to use your computer instead.

Scientific Programming 2 (Level 4 - Level 6)

Level 4 MONOPOLY. When playing Monopoly, the starting player's advantage seems unfair. To verify this, you could play many (millions of) real games, but this would take way too much time. Instead, you'll write a computer simulation. This also allows you to experiment with game adjustments to make it fair. You're doing all this for a board game, but the same simulation principle applies to various scientific fields (economics, chemistry, biology...).
Level 5 COMPLEXITY. What is an efficient algorithm? When you want to run large simulations, analyze large datasets, or perform any other computationally intensive task, writing efficient algorithms can in some cases mean the difference between a run time of a couple of minutes and one of weeks. The theory of computational complexity gives you a way to reason about the efficiency of algorithms and make them run (much) faster.
Level 6 SURVIVAL. Python is very popular for analyzing and processing data. And Pandas is an important reason why. Pandas is the most used Python package for handling data. You will learn how to use this package to analyze and visualize geographical data.

Data Processing (Level 7 - Level 11)

Level 7 POPULATIONS. Predator-prey simulations are models used in ecology and computer science to study the dynamics between populations of predators and their prey within an ecosystem. What's particularly interesting about these simulations is how they can reveal emergent patterns and complex behaviors that arise from relatively simple rules. To make it easier to program such a simulation you will learn a programming technique called object-oriented programming (OOP).
Level 8 ACQUISITION. What was the best year for movies? This is often debated on the internet. You're going to write a bot that extracts information from websites to find an answer to this question. This process is called web scraping. When you're doing research, the data you need is often out there on the internet, but no one has gone to the effort of collecting it in a form that you can use directly. In that case you will need to know how to acquire this data yourself.
Level 9 TRANSFORMATION. Does a restaurant pass health code inspections? Could you gauge this by analyzing reviews of this restaurant? You will use information from two different data sources (Yelp and the Washington State Department of Health) to answer these questions. The problem, as you will see quite often when analyzing data, is that there is no straightforward way to combine the two data sources. They were never made to be used together, so you will need to transform the datasets before you can combine them.
Level 10 DATABASES. When working with really large amounts of data, you typically won't store it in simple (text) files on your computer. Instead, you'd use something like a relational database. To get information from a database you'll need a specific language called Structured Query Language (SQL). You're going to practice SQL by solving a mystery...
Level 11 FINAL PROJECT. Do you have data from your own studies or research that you would like to analyze? Do this with our help for the final project of this course. The goal here is to work on something that you find interesting and care about.

Dates and deadlines

Deadlines

Deadlines for each level are listed below. The deadlines are our recommendation. If you follow these deadlines you’ll have all the assignments finished in time for the corresponding examination moment. You can occasionally diverge a bit from the deadlines, but if you notice that you’re structurally behind please contact us.

The deadlines depend on the course you’re starting this period and on the pace you have chosen: relaxed (finish SP1 and SP2 in one year), standard (finish all courses in one year), or fast (finish all courses in one semester). Most students follow the standard schedule, which corresponds to an investment of about 10 hours per week, although this varies a lot depending on the student and their educational background.

The proposed schedules are worked out for only the most common cases. If none of those seems to apply to you, please contact us.


Grading

The grading for Scientific Programming 1 and 2 is different from the grading for Data Processing. The main differences are:

Grading Scientific Programming 1 and Scientific Programming 2


The grading for Scientific Programming 1 and 2 is exactly the same. For both, the grading is based on three modules and a final exam.

Final grade

The course’s final result will be “pass” or “fail”, which means that no grades are assigned. You pass by:

  • submitting sufficient coursework (as detailed below)
  • passing the final exam

Coursework (modules)

For each module you will receive one of the following grades:

  • completely correct
  • mostly correct
  • insufficient

In principle we expect everything you hand in to be completely correct. However, it is easy to miss a detail, so the requirements are relaxed a bit:

  • All your modules need to be at least mostly correct.
  • At least one module needs to be completely correct.

You may not re-submit (variations of) solutions that you wrote for any other course’s problems. In case you have done similar assignments before, discuss with the course staff whether this is the right course for you.

Final exam

The final exam is an on-campus programming exam in a controlled setting. It takes about 3 hours and contains a couple of small programming assignments. You’ll pass the exam when your answers are sufficiently correct.

You can participate in the final exam if you’re done with the coursework, meaning:

  • The first two modules are graded and at least mostly correct.
  • The last module is handed in (with reasonable expectation to be correct).

If you do not meet these requirements (for example, when you still have an insufficient grade or one of the first two modules hasn’t been graded yet), you cannot take the exam yet. You will have to finish the coursework first and take the exam at a later date.

You need to make sure that you leave enough time for the first two modules to be graded before the exam, so hand them in no later than two weeks before the exam. If you hand in everything at the last minute, you risk not being able to take the exam.

Grading Data Processing


Final grade

For Data Processing you will receive a regular grade (1-10).

Your final grade will be determined by the grades for Levels 7 and 9 and your final project, as follows:

Module          Weight
Level 7         25%
Level 9         25%
Final Project   50%
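To make the weighting concrete, here is a minimal Python sketch of the calculation. It assumes each component grade is already expressed on the 1-10 scale; the syllabus does not say how the 5 to 25 assignment points are converted to a grade, or how the final result is rounded, so neither step is modelled here. The `WEIGHTS` dictionary and the `final_grade` function are hypothetical illustrations, not course tooling.

```python
# Hypothetical sketch: weights copied from the Data Processing weighting table.
# Assumes every component grade is already on the 1-10 scale (not stated in the syllabus).
WEIGHTS = {"Level 7": 0.25, "Level 9": 0.25, "Final Project": 0.50}


def final_grade(grades: dict[str, float]) -> float:
    """Return the weighted average of the component grades (no rounding applied)."""
    missing = WEIGHTS.keys() - grades.keys()
    if missing:
        raise ValueError(f"missing grades for: {sorted(missing)}")
    return sum(weight * grades[module] for module, weight in WEIGHTS.items())


if __name__ == "__main__":
    # 0.25 * 8 + 0.25 * 6 + 0.50 * 7 = 2.0 + 1.5 + 3.5
    print(final_grade({"Level 7": 8.0, "Level 9": 6.0, "Final Project": 7.0}))  # 7.0
```

Because the final project carries half of the weight, one extra point on the project moves the final grade by 0.5, while one extra point on Level 7 or Level 9 moves it by 0.25.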

Note that, even though Levels 8 and 10 do not contribute to the grade, you still need to hand them in correctly in order to pass the course.

Coursework (modules)

Some of the assignments are graded (Level 7, Level 9 and the final project) and some are pass/fail (Level 8 and Level 10).

  • For the pass/fail assignments the grading works the same as for Scientific Programming 1 and 2. You can get one of three possible marks: completely correct, mostly correct, or insufficient. You need to get these assignments at least mostly correct.
  • For the graded assignments you get a grade based on the code quality, or design (how well written is the code?), and the correctness (does it meet the specifications?) of your assignment. Both are graded on a scale from 1 to 5, with correctness weighing more than design: $\textrm{points} = \textrm{correctness} \times 3 + \textrm{design} \times 2$. So you can get at most 25 and at least 5 points.
  • The final project is graded based on novelty and quality. For both you get points on a scale from 1 to 5.
    • Novelty: As an independent programmer you often need to learn a new skill, library or concept by yourself. The main purpose of the final project is to practice this. This is reflected in the novelty part: to what extent did you do or learn something new (not yet discussed during the course)?
    • Quality: Just like for the other graded assignments: how well structured and written is your code?
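The point formula for graded assignments can be restated as a small Python function, for example to check a score you received. This is only a sketch: the name `assignment_points` is hypothetical, and the range check simply enforces the 1 to 5 scale described above.

```python
def assignment_points(correctness: int, design: int) -> int:
    """Hypothetical helper: combine two 1-5 scores into a 5-25 point total."""
    for name, score in (("correctness", correctness), ("design", design)):
        if not 1 <= score <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {score}")
    # Correctness is weighted 3, design (code quality) is weighted 2.
    return correctness * 3 + design * 2


if __name__ == "__main__":
    print(assignment_points(5, 5))  # 25, the maximum
    print(assignment_points(1, 1))  # 5, the minimum
    print(assignment_points(4, 3))  # 4 * 3 + 3 * 2 = 18
```

Since correctness is weighted 3, raising it by one step adds 3 points, whereas improving design by one step adds 2.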

Final project grade

The final project is graded on four aspects:

Aspect         Weight  Notes
final_result   30%
complexity     30%     How complex is your project? How many data sources? Do you need to transform the data?
code_quality   20%     Is your code well designed? Is it easy to understand?
process        20%     How well did you document the process of the project?

Doing your own work

This course’s philosophy on academic honesty is best stated as “be reasonable.” The course recognizes that interactions with classmates and others can facilitate mastery of the course’s material. However, there remains a line between enlisting the help of another and submitting the work of another. This policy characterizes both sides of that line.

The essence of all work that you submit to this course must be your own (unless explicitly stated otherwise). Collaboration on problem sets is not permitted except to the extent that you may ask classmates and others for help so long as that help does not reduce to another doing your work for you. Generally speaking, when asking for help, you may show your code to others, but you may not view theirs, so long as you and they respect this policy’s other constraints. Collaboration on the course’s test and quiz is not permitted at all.

Below are rules of thumb that (inexhaustively) characterize acts that the course considers reasonable and not reasonable. If in doubt as to whether some act is reasonable, do not commit it until you solicit and receive approval in writing from the course’s heads. Acts considered not reasonable by the course are handled harshly.

Reasonable

Not Reasonable

In all cases we follow the directives regarding fraud and plagiarism of the University of Amsterdam and of the Computer Science BSc programme. Find them here in English and Dutch.

Acknowledgements

This course has been designed by Simon Pauw, Martijn Stegeman, Wouter Vrielink, Tim Doolan and Ivo van Vulpen.

It is partially based on many great programming resources that have been published as Open Courseware under a Creative Commons license. The resulting work itself is also published under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Feel free to re-use it! If you would like to use the work commercially, please send us an e-mail to arrange a license.

We have had lots of help from students as well as teaching assistants who tried the course or added ideas of their own. We especially thank:

We have used many programming resources for inspiration: