Privacy settings

This website protects your privacy by adhering to the European Union General Data Protection Regulation (GDPR). We will not use your data for any purpose that you do not consent to and only to the extent not exceeding data which is necessary in relation to a specific purpose(s) of processing. You can grant your consent(s) to use your data for specific purposes below or by clicking “Agree to all”.

donate to support journalism
i
Python for journalists

Video

Python for journalists

The course Python for Journalists is meant for journalists looking to learn the most common uses of Python for data journalism. During four modules the course teaches you how to set up Python and all Python-related tools on your own computer. Next you'll learn how to clean up messy datasets using the Pandas library. In the third module you'll learn how to analyse data, again using the Pandas library. In the fourth and final module you'll learn how to automatically download data from the web, by using both the Beautiful Soup and Requests libraries to dabbling in webscraping.

This Python for Journalists course is meant for those who dabbled in Python, but somehow didn't persevere; and for those who can't wait to dive in head first... Though no programming knowledge is required, it helps if you know what a terminal or command prompt is and if you are familiar with Excel.

For all modules except module 1 Set up, there is a [Jupyter Notebook available](https://github.com/winnydejong/pythonforjournalists) to follow along during the course. Each notebook contains exercises and explanations. Next to these notebooks, you'll find cheatsheets with the used Python commands for modules 2, 3 and 4. Happy Pythoning!

1 Set Up

This module revolves around installing the right tools on your laptop. To follow along in the coming modules, you'll need Python 3, and several Python libraries like Requests, Pandas and BeautifulSoup installed. Jupyter Notebooks come highly recommended. It's recommended that you install all of this software in one go, using the [Anaconda distribution](https://anaconda.org/). This first module does not include a Jupyter Notebook.

On your computer:

Install the [Anaconda distribution](https://anaconda.org/) to install Python 3, libraries Requests, Pandas, and BeautifulSoup, and Jupyter Notebooks all at once on your computer.

Note: choose for the Anaconda installation that includes Python 3, at the time of writing that would be Python 3.6.

Extra preparation: If you want to make sure you have a solid foundation to build up on, you might want to learn about the Python syntax first. Here are some places where you can learn about different data types in Python, which might help before continuing with this course: (Since the following tutorials overlap, choosing one is highly recommended.)

Online beginner tutorials at [LearnPython.org](https://www.learnpython.org/)

Digital book [Python for you and me](https://pymbook.readthedocs.io/en/py3/)

2 Clean data

In this second module we'll show you how to into your Python conda environment; how to start a Jupyter Notebook. Once that's out of the way, you'll learn how to import a CSV-file into your Jupyter Notebook, to get ready for some data cleaning. Among other things you'll learn how to search and replace values inside a column; how to change the datatype of a column; and how to extract data from a column to populate a new column. This module includes both a Jupyter Notebook (empty and completed) and a cheatsheet - all named 'clean data'.

3 Analyse data

In this thirth module, you'll learn how to analyse data using the Pandas library. You'll learn how to explore your dataset, looking at summary statistics - count, median, mean, percentiles, standard deviation etc. - for each column. Next we'll look into how to sort, filter, sum and count values in columns. Finally you'll learn how to group data, creating (for those familiar with Excel) pivot tables, using the Pandas library. This module includes both a Jupyter Notebook (empty and completed) and a cheatsheet - all named 'analyse data'.

4 Scrape data

The final module revolves around scraping data using both the Requests and the BeautifulSoup libraries. Though in practice you'll likely first want to scrape data, to later clean and analyse those numbers, this module is last for training purposes. The modules on cleaning and analysing data introduced you to Python, Pandas and Jupyter Notebooks. Paving the way for some basic webscraping, including a for loop to collect data as efficient as possible. Finishing this module you should be able to write some basic webscrapers to collect data from the internet. This module includes both a Jupyter Notebook (empty and completed) and a cheatsheet - all named 'scrape data'.

Teacher

Related

Receive insights, knowledge and updates on funding opportunities.
Receive our monthly update, delivered straight to your inbox.