Workshop

A Hands-On Workshop on Parsing Wikitext

PyCon Austria
6 and 7 April 2025 | Eisenstadt

In this workshop, you will learn how to fetch, parse, and extract data from the German Wiktionary to gather linguistic information such as parts of speech, inflections, example sentences, and definitions. You will also learn where to find the data and how to process it using HTTP requests, as well as XML and Wikitext parsers.
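As a rough preview of that workflow, the sketch below runs the XML and wikitext steps on a small hand-written sample instead of live Wiktionary data. It is not the workshop's code: it uses only the standard library (xml.etree.ElementTree and regular expressions) as simplified stand-ins for lxml and mwparserfromhell, and the sample entry, the function names, and the namespace-free XML are all illustrative assumptions. Real export files declare an XML namespace, so tag lookups there need to account for it.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written sample that imitates a German Wiktionary entry (assumption:
# heavily shortened, and without the XML namespace that real exports declare).
SAMPLE_XML = """\
<mediawiki>
  <page>
    <title>Haus</title>
    <revision>
      <text>== Haus ({{Sprache|Deutsch}}) ==
=== {{Wortart|Substantiv|Deutsch}}, {{n}} ===

{{Bedeutungen}}
:[1] Bauwerk, in dem Menschen wohnen

{{Beispiele}}
:[1] Wir wohnen in einem kleinen Haus.
</text>
    </revision>
  </page>
</mediawiki>
"""


def extract_page(xml_string):
    """Return (title, wikitext) of the first page element in the XML."""
    root = ET.fromstring(xml_string)
    page = root.find("page")
    title = page.findtext("title")
    wikitext = page.findtext("revision/text")
    return title, wikitext


def parts_of_speech(wikitext):
    """Return (part of speech, language) pairs from {{Wortart|...|...}} templates."""
    return re.findall(r"\{\{Wortart\|([^|}]+)\|([^|}]+)\}\}", wikitext)


def section_lines(wikitext, marker):
    """Collect the ':[n] ...' lines that directly follow a {{marker}} line."""
    collected = []
    inside = False
    for line in wikitext.splitlines():
        stripped = line.strip()
        if stripped == "{{" + marker + "}}":
            inside = True
            continue
        if inside:
            if not stripped.startswith(":"):
                break  # a blank line or a new template ends the section
            # Drop the leading ':[1] ' style numbering.
            collected.append(re.sub(r"^:+\[[^\]]*\]\s*", "", stripped))
    return collected


if __name__ == "__main__":
    title, wikitext = extract_page(SAMPLE_XML)
    print(title)
    print(parts_of_speech(wikitext))
    print(section_lines(wikitext, "Bedeutungen"))
    print(section_lines(wikitext, "Beispiele"))
```

Regular expressions are enough for this tiny sample, but they do not cope well with nested templates or links, which is where a dedicated wikitext parser becomes useful.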

Fetching XML data

Parsing XML

Parsing Wikitext

Using Google Colab

In this workshop, we will use Google Colab, a free, cloud-based platform for running Python code in a Jupyter Notebook environment.

Follow these steps to get started:

  1. Make sure you are signed in with your Google account.
  2. Ensure that you have an active Internet connection on your device.
  3. You can follow along and run the code on a tablet, but if you plan to edit the code as you go, I recommend using a laptop for a smoother experience.
  4. To access the workshop material, click the links in the guide's table of contents.

Running on Your Own Machine

If you prefer not to use Google Colab but still want to participate in the workshop, you can download the source code and run it on your local machine.

To get started, make sure you install the necessary dependencies for the workshop:

pip install requests lxml mwparserfromhell
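
After installing, a quick way to confirm that the three packages are visible to your Python interpreter is a check like the one below. This is only a sketch: the helper name missing_modules is made up for illustration, and it only tests whether each package can be found, not whether its version is suitable.

```python
import importlib.util

# Import names of the packages installed with the pip command above.
REQUIRED_MODULES = ("requests", "lxml", "mwparserfromhell")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All workshop dependencies are importable.")
```

If any package is reported as missing, make sure pip installed into the same Python environment that you use to run the notebooks.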

Download the Workshop Materials

You can download a zip file with all the workshop materials, including:

  • Jupyter Notebooks
  • Python files
  • Sample data

Download zip file