Syllabus Scientific Programming / Data Processing
General info
Welcome to this programming track! In the weeks ahead, you’ll use the Python programming language to solve problems from several fields of science. This track is intended for students who have no programming experience at all. It comprises three courses, Scientific Programming 1, Scientific Programming 2 and Data Processing, in which you learn the Python language by working on programming problems from several scientific areas. The three courses are designed to be taken together, but you can also choose to follow only one or two of them.
Prerequisites
Scientific Programming 1 assumes no prior programming experience. If you have already done a course in Python, or if you have extensive experience in another programming language, this course might not be your best option, but we’re happy to refer you to other courses if you’d like!
Other than that, some modules assume high school mathematics or physics, but in those cases you can choose an alternative module that doesn’t. If you feel overwhelmed, don’t hesitate to contact the course staff! We can explain the course’s philosophy and requirements, and make recommendations on how to approach problems.
Scientific Programming 2 and Data Processing assume only the preceding course in the track as prior knowledge.
Learning goals
Scientific Programming 1 is a beginner’s course. We will teach you the basics of Python programming as well as several different ways of solving computational problems. After this course, we envision that you:
- can transform the description of a simple algorithm into working code by combining basic program elements;
- can apply several scientific programming techniques from different areas of study;
- can use a couple of libraries in your program and know how to find and read documentation on other libraries;
- can make your programs simpler and easier to read by employing a few standard tactics;
- can trace and fix several common programming errors.
After Scientific Programming 2 you should be able to independently tackle typical programming challenges that you might encounter in your field of study or research. We will teach you intermediate Python concepts, as well as some more advanced concepts related to data analysis. After this course, we envision that you:
- can use Python’s built-in data structures (like sets, dictionaries, and tuples);
- can analyze the complexity of an algorithm;
- can quickly learn to use new Python packages and know how to find documentation for them;
- can import and analyze data;
- can create advanced plots.
For Data Processing you’ll build your own toolkit of useful programs with which you can read, transform and analyze data that you might find in various scientific areas. After this course we envision that you:
- can read data into your programs from several structured standard formats;
- can transform data into a form suitable for further analysis by combining basic operators;
- can build meaningful visualizations of your data;
- know how to write programs that are easy to understand for yourself and other programmers;
- can use advanced programming concepts (like object-oriented programming).
Course materials
All the reading and video material is available on this website. You do not need to purchase any books or software. Every module consists of short explanations (written and in the form of videos) and assignments. You do need to bring your own laptop.
Contact
Contact: scientific@proglab.nl
Getting started
Your entry point to the course is the sidebar, where you can browse through all the modules (levels) that you have to complete. To get started:
- Read the rest of the syllabus, below.
- Install Python.
- Choose one of the level 1 modules (Algorithms or Numbers) to get started!
Note that the sidebar lists all the modules for all the Scientific Programming courses! Don’t start working on all of them until you’ve read below which specific modules you need to complete for the course you are following.
The track
The Scientific Programming track consists of three courses:
- Scientific Programming 1 (3 EC, not graded but pass/fail).
- Scientific Programming 2 (3 EC, not graded but pass/fail).
- Data Processing (6 EC, graded).
You can follow each course at a pace that fits your schedule. You don’t have to follow the entire track; you can also follow a single course. If you already have some programming experience and would like to skip the first course (or the first two courses), please contact us.
Course specific information can be found further down.
Structure
The track is designed to be very flexible, so there are no compulsory sessions. There are no lectures (except for a kick-off meeting at the start of Scientific Programming 1) and the tutorials are optional. You can follow the courses at any pace that fits your schedule.
Help
The fact that the track is flexible doesn’t mean you are on your own. We provide a lot of help with the programming assignments throughout the course, but it is up to you to seek out this help.
There are two ways to get help, the tutorials and the forum:
- Tutorials: you can find the tutorial schedule on datanose.
- Forum (online): we use Ed as an online discussion platform. Here you can discuss the assignments with other students and with the staff. You will be invited to Ed once you are enrolled in the course.
Programming modules
You’re going to learn programming through a number of programming modules. Each module consists of:
- Theory: Explanations, both written and in the form of videos.
- Practice: Exercises to test your understanding of the theory.
- Assignments: Bigger programming problems that require combining multiple programming concepts.
The modules are grouped into levels, and you have to complete one module per level. For some levels you have the choice between two different modules. When there is such a choice, you will learn the same programming concepts, but often in a different thematic context (i.e. a different scientific field).
Below is an overview of all modules for all courses.
Scientific Programming 1 (Level 1 - Level 3)
Level 0 | Python Installation | ||
Level 1 (choose one of the modules) | ALGORITHMS. Learn to think like a computer. Things that we intuitively know how to do, like drawing a pyramid or computing change for a payment, are hard to get a computer to do right. In this module you’ll learn how to break down such intuitive problems into steps that even a computer can understand. | or | NUMBERS. How do you know if a number is a prime number? Number theory is the study of the properties of numbers. In this mathematically oriented module you create a series of programs that compute this and other properties of numbers. No prior math knowledge is required for this module. (You will learn some, though.) |
Level 2 (choose one of the modules) | TEXT. Natural language processing is the science of making a computer understand (something about) natural human language. You will learn how you can get a computer to understand the sentiment of tweets. Is the tone of the tweet positive or negative? | or | NUMERICAL INTEGRATION. In many scientific fields you need to determine the area under the graph of a function. Integration is a mathematical tool for doing so. However, not every function can be integrated by hand, and in such cases we can use numerical integration techniques to let the computer do the work for us. You will learn two important techniques for numerical integration. |
Level 3 (no choice) | BIG-DATA. In this module you will learn to work with data. You will, for example, analyze weather data from the Netherlands and answer questions like: When was the first heat wave? What was the longest freezing period? | ||
Bonus (this module is optional) | MOVEMENT. What happens if you dig a tunnel through the planet to the other side and fall into it? In this module you’re going to simulate that situation. In physics you often run into problems that are too laborious to compute by hand. In this module you’ll learn how to use your computer instead. |
Scientific Programming 2 (Level 4 - Level 6)
Level 4 | MONOPOLY. When playing Monopoly, the starting player seems to have an unfair advantage. To verify this, you could play many (millions of) real games, but that would take far too much time. Instead, you’ll write a computer simulation. This also allows you to experiment with game adjustments to make it fair. You’re doing all this for a board game, but the same simulation principle applies to various scientific fields (economics, chemistry, biology...). | ||
Level 5 | COMPLEXITY. What is an efficient algorithm? When you want to run large simulations, analyze large datasets, or do any other computationally intensive task, writing efficient algorithms can mean the difference between a run time of a few minutes and one of weeks. The theory of computational complexity gives you a way to reason about the efficiency of algorithms and make them run (much) faster. | or | SHAKESPEARE. Was the play “Arden of Faversham” (1592) written by Shakespeare? A.C. Swinburne thought it was, but T.S. Eliot didn’t. Could we create a computer program that could settle the debate once and for all? It turns out that the answer is: yes… maybe? |
Level 6 | SURVIVAL. Python is very popular for analyzing and processing data, and Pandas is an important reason why. Pandas is the most widely used Python package for handling data. You will learn how to use this package to analyze and visualize geographical data. |
Data Processing (Level 7 - Level 11)
Level 7 | POPULATIONS. Predator-prey simulations are models used in ecology and computer science to study the dynamics between populations of predators and their prey within an ecosystem. What’s particularly interesting about these simulations is how they can reveal emergent patterns and complex behaviors that arise from relatively simple rules. To make it easier to program such a simulation you will learn a programming technique called object-oriented programming (OOP). | ||
Level 8 | ACQUISITION. What was the best year for movies? This question is often debated on the internet. You’re going to write a bot that extracts information from websites to find an answer to this question. This process is called web scraping. When you’re doing research, the data you need is often available on the internet, but no one has gone to the effort of collecting it in a form that you can use directly. In that case you will need to know how to acquire this data yourself. | ||
Level 9 | TRANSFORMATION. Does a restaurant pass health code inspections? Could you gauge this by analyzing reviews of this restaurant? You will use information from two different data sources (Yelp and the Washington State Department of Health) to answer these questions. The problem is (as you will see quite often when analyzing data) that there is no straightforward way to combine the two data sources. They were never made to be used together, so you will need to transform the datasets before you can combine them. | ||
Level 10 | DATABASES. When working with really large amounts of data, you typically won’t store it in simple (text) files on your computer. Instead, you’d use something like a relational database. To get information from a database you’ll need a specific language called Structured Query Language (SQL). You’re going to practice SQL by solving a mystery... | ||
Level 11 | FINAL PROJECT. Do you have data from your own studies or research that you would like to analyze? Do this with our help for the final project of this course. The goal here is to work on something that you find interesting and care about. |
Dates and deadlines
Deadlines
Deadlines for each level are listed below. The deadlines are our recommendation. If you follow these deadlines you’ll have all the assignments finished in time for the corresponding examination moment. You can occasionally diverge a bit from the deadlines, but if you notice that you’re structurally behind please contact us.
The deadlines depend on the course you’re starting this period and on the pace you have chosen: relaxed (finish SP1 and SP2 in one year), standard (finish all courses in one year), or fast (finish all courses in one semester). Most students follow the standard schedule, which corresponds to an investment of about 10 hours per week, although this varies a lot between students and depends on their educational background.
The proposed schedules are worked out for only the most common cases. If none of those seems to apply to you, please contact us.
Grading
The grading for Scientific Programming 1 and 2 is different from the grading for Data Processing. The main differences are:
- Scientific Programming 1 and 2 are pass/fail courses and Data Processing is a graded course.
- Scientific Programming 1 and 2 have a final exam; Data Processing does not.
Grading Scientific Programming 1 and Scientific Programming 2
The grading for Scientific Programming 1 and 2 is exactly the same. For both, the grading is based on three modules and a final exam.
Final grade
The course’s final result will be “pass” or “fail”, which means that no grades are assigned. You pass by:
- submitting sufficient coursework (as detailed below)
- passing the final exam
Coursework (modules)
For each module you will receive one of the following marks:
- completely correct
- mostly correct
- insufficient
In principle we expect everything you hand in to be completely correct. However, it is easy to miss a detail, so to relax the requirements a bit:
- All your modules need to be at least mostly correct.
- At least one module needs to be completely correct.
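The two coursework requirements above can be written down as a small Python check. This is only an illustrative sketch: the function name `coursework_sufficient` and the string labels used for the marks are our own assumptions, not part of any course tooling.

```python
# Hypothetical labels for the three possible module marks.
COMPLETELY_CORRECT = "completely correct"
MOSTLY_CORRECT = "mostly correct"
INSUFFICIENT = "insufficient"


def coursework_sufficient(marks):
    """Return True if a list of module marks meets both coursework requirements."""
    # Requirement 1: every module is at least mostly correct.
    all_at_least_mostly = all(
        mark in (COMPLETELY_CORRECT, MOSTLY_CORRECT) for mark in marks
    )
    # Requirement 2: at least one module is completely correct.
    one_completely = any(mark == COMPLETELY_CORRECT for mark in marks)
    return all_at_least_mostly and one_completely


print(coursework_sufficient([COMPLETELY_CORRECT, MOSTLY_CORRECT, MOSTLY_CORRECT]))  # True
print(coursework_sufficient([MOSTLY_CORRECT, MOSTLY_CORRECT, MOSTLY_CORRECT]))  # False
```

Note that three "mostly correct" modules are not enough: the second requirement fails because no module is completely correct.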
You may not re-submit (variations of) solutions that you wrote for any other course’s problems. In case you have done similar assignments before, discuss with the course staff whether this is the right course for you.
Final exam
The final exam is an on-campus programming exam in a controlled setting. It takes about 3 hours and contains a couple of small programming assignments. You pass the exam when your answers are sufficiently correct.
You can participate in the final exam if you’re done with the coursework, meaning:
- The first two modules are graded and at least mostly correct.
- The last module is handed in (with reasonable expectation to be correct).
If you do not meet these requirements (for example, when you still have an insufficient mark, or one of the first two modules hasn’t been graded yet), you cannot take the exam yet. You will have to finish the coursework first and take the exam at a later date.
Make sure that you leave enough time for the first two modules to be graded before the exam: hand them in no later than two weeks before the exam. If you hand in everything at the last minute, you risk not being able to take the exam.
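The exam entry conditions listed above can be sketched in Python as well. This is an illustration under assumptions: we represent a module that has not been graded yet as `None`, and the names `may_take_exam`, `first_two_marks` and `last_handed_in` are hypothetical.

```python
# Marks that count as "at least mostly correct".
PASSING_MARKS = ("completely correct", "mostly correct")


def may_take_exam(first_two_marks, last_handed_in):
    """Return True if the exam entry conditions are met.

    first_two_marks: marks for the first two modules; None means "not graded yet".
    last_handed_in: True if the last module has been handed in.
    """
    # Both of the first two modules must be graded and at least mostly correct.
    first_two_ok = len(first_two_marks) == 2 and all(
        mark in PASSING_MARKS for mark in first_two_marks
    )
    # The last module only needs to be handed in, not graded yet.
    return first_two_ok and last_handed_in


print(may_take_exam(["completely correct", "mostly correct"], True))  # True
print(may_take_exam(["completely correct", None], True))  # False: not graded yet
```

Because an ungraded module (`None`) never counts as a passing mark, handing in the first two modules late directly blocks exam entry.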
Grading Data Processing
Final grade
For Data Processing you will receive a regular grade (1-10).
Your final grade is determined by the grades for Level 7, Level 9 and your final project as follows:
Module | Weight |
---|---|
Level 7 | 25% |
Level 9 | 25% |
Final Project | 50% |
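As a worked example of the weights in the table, here is a minimal Python sketch. It assumes that each component is already expressed as a grade on the 1-10 scale; how points are converted into grades and how the result is rounded is not specified here, so `final_grade` is only an illustration.

```python
# Weights from the grading table; the module names are used as dictionary keys.
WEIGHTS = {"Level 7": 0.25, "Level 9": 0.25, "Final Project": 0.50}


def final_grade(grades):
    """Return the weighted final grade for a dict mapping module name to grade (1-10)."""
    return sum(weight * grades[module] for module, weight in WEIGHTS.items())


print(final_grade({"Level 7": 8.0, "Level 9": 6.0, "Final Project": 7.0}))  # 7.0
```

Since the final project counts for 50%, it carries as much weight as Levels 7 and 9 combined.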
Note that even though Levels 8 and 10 do not contribute to the grade, you still need to hand them in and get them at least mostly correct in order to pass the course.
Coursework (modules)
Some of the assignments are graded (Level 7, Level 9 and the final project) and some are pass/fail (Level 8 and Level 10).
- For the pass/fail assignments the grading works the same as for Scientific Programming 1 and 2. You can get one of three possible marks: completely correct, mostly correct, or insufficient. You need to get these assignments at least mostly correct.
- For the graded assignments you get a score based on the code quality (how well is the code written?) and the correctness (does it meet the specifications?) of your assignment. Both are scored on a scale from 1 to 5, and correctness weighs more heavily than code quality. In total you can get at most 25 and at least 5 points.
- The final project is graded based on novelty and quality. For both you get points on a scale from 1 to 5.
- Novelty: As an independent programmer you often need to learn a new skill, library or concept by yourself. The main purpose of the final project is to practice this. This is reflected in the novelty part: to what extent did you do or learn something new (not yet discussed during the course)?
- Quality: Just like the other graded assignments: how well structured and written is your code?
Final project grade
The final project is graded on four aspects:
aspect | weight | notes |
---|---|---|
final_result | 30% | |
complexity | 30% | How complex is your project? How many data sources? Do you need to transform the data? |
code_quality | 20% | Is your code well designed? Is it easy to understand? |
process | 20% | How well did you document the process of the project? |
Doing your own work
This course’s philosophy on academic honesty is best stated as “be reasonable.” The course recognizes that interactions with classmates and others can facilitate mastery of the course’s material. However, there remains a line between enlisting the help of another and submitting the work of another. This policy characterizes both sides of that line.
The essence of all work that you submit to this course must be your own (unless explicitly stated otherwise). Collaboration on problem sets is not permitted except to the extent that you may ask classmates and others for help so long as that help does not reduce to another doing your work for you. Generally speaking, when asking for help, you may show your code to others, but you may not view theirs, so long as you and they respect this policy’s other constraints. Collaboration on the course’s test and quiz is not permitted at all.
Below are rules of thumb that (inexhaustively) characterize acts that the course considers reasonable and not reasonable. If in doubt as to whether some act is reasonable, do not commit it until you solicit and receive approval in writing from the course’s heads. Acts considered not reasonable by the course are handled harshly.
Reasonable
- Communicating with classmates about problem sets’ problems in English (or some other spoken language).
- Discussing the course’s material with others in order to understand it better.
- Helping a classmate identify a bug in his or her code at office hours, elsewhere, or even online, as by viewing, compiling, or running his or her code, even on your own computer.
- Incorporating a few lines of code that you find online or elsewhere into your own code, provided that those lines are not themselves solutions to assigned problems and that you cite the lines’ origins.
- Reviewing past semesters’ quizzes and solutions thereto.
- Sending or showing code that you’ve written to someone, possibly a classmate, so that he or she might help you identify and fix a bug.
- Sharing a few lines of your own code online so that others might help you identify and fix a bug.
- Turning to the course’s heads for help or receiving help from the course’s heads during the quiz or test.
- Turning to the web or elsewhere for instruction beyond the course’s own, for references, and for solutions to technical difficulties, but not for outright solutions to problem set’s problems or your own final project.
- Whiteboarding solutions to problem sets with others using diagrams or pseudocode but not actual code.
- Working with (and even paying) a tutor to help you with the course, provided the tutor does not do your work for you.
Not Reasonable
- Accessing a solution to some problem prior to (re-)submitting your own.
- Asking a classmate to see his or her solution to a problem set’s problem before (re-)submitting your own.
- Decompiling, de-obfuscating, or disassembling the staff’s solutions to problem sets.
- Failing to cite (as with comments) the origins of code or techniques that you discover outside of the course’s own lessons and integrate into your own work, even while respecting this policy’s other constraints.
- Giving or showing to a classmate a solution to a problem set’s problem when it is he or she, and not you, who is struggling to solve it.
- Looking at another individual’s work during the test or quiz.
- Paying or offering to pay an individual for work that you may submit as (part of) your own.
- Providing or making available solutions to problem sets to individuals who might take this course in the future.
- Searching for or soliciting outright solutions to problem sets online or elsewhere. So, avoid sources like Stack Overflow, Google, ChatGPT, GitHub, Copilot, etc.
- Splitting a problem set’s workload with another individual and combining your work.
- Submitting (after possibly modifying) the work of another individual beyond the few lines allowed herein.
- Submitting the same or similar work to this course that you have submitted or will submit to another.
- Submitting work to this course that you intend to use outside of the course (e.g., for a job) without prior approval from the course’s heads.
- Turning to humans (besides the course’s heads) for help or receiving help from humans (besides the course’s heads) during the quiz or test.
- Viewing another’s solution to a problem set’s problem and basing your own solution on it.
In all cases we follow the directives regarding fraud and plagiarism of the University of Amsterdam and of the Computer Science BSc programme. These directives are available in English and Dutch.
Acknowledgements
This course has been designed by Simon Pauw, Martijn Stegeman, Wouter Vrielink, Tim Doolan and Ivo van Vulpen.
It is partially based on many great programming resources that have been published as Open Courseware under a Creative Commons license. The resulting work itself is also published under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Feel free to re-use it! If you would like to use the work commercially, please send an e-mail to arrange a license.
We have had lots of help from students as well as teaching assistants who tried the course or added ideas of their own. We especially thank:
- Jelle van Assema (assignments and checkpy)
- Roan van Blanken (checkpy tests)
- Natasja Wezel (videos, revisions)
- Iris Luden (video)
- Marianne de Heer Kloots (revisions and testing)
- Maarten Inja (DNA assignment)
- Quinten Post (translations)
- Marleen Rijksen (revisions)
- Huub Rutjes (films)
- Vera Schild (checkpy tests)
- Luca Verhees (artwork “semester of code”)
We have used many programming resources for inspiration:
- 6.189 A Gentle Introduction to Programming Using Python by Sarina Canelake at MIT http://ocw.mit.edu
- 6.00 Introduction to Computer Science and Programming, Fall 2008 by Eric Grimson and John Guttag at MIT http://ocw.mit.edu
- CS50 Introduction to Computer Science I by David Malan at Harvard http://cs50.tv/
- 6.0001 Introduction to Computer Science and Programming in Python by Ana Bell, Eric Grimson and John Guttag at MIT http://ocw.mit.edu
- Think Python by Allen B. Downey http://greenteapress.com/wp/think-python/