Special Export tool
The Special Export tool fetches specific pages with their raw content (wikitext) in real time, without needing to download the entire dataset. The content is provided in XML format.
Using the Special Export Tool
You can use Special Export to retrieve pages from any wiki site. On the German Wiktionary, the tool is labelled Spezial:Exportieren, but it works the same way.
Examples
Exporting Pages from Any Wiki Site
To access the XML content of the page titled "Austria" from English Wikipedia, you can use the following Python code. When you run it, the export link opens in your default browser:
import webbrowser

title = 'Austria'
domain = 'en.wikipedia.org'
url = f'https://{domain}/wiki/Special:Export/{title}'
webbrowser.open_new_tab(url)
Exporting Pages from the German Wiktionary
For the German Wiktionary, the export tool uses Spezial:Exportieren instead of Special:Export. You can use similar Python code to open the export link for the page titled "schön" (German for "beautiful"):
import webbrowser

title = 'schön'
domain = 'de.wiktionary.org'
url = f'https://{domain}/wiki/Spezial:Exportieren/{title}'
webbrowser.open_new_tab(url)
Using the requests Library
To fetch and download XML content programmatically, you can use Python's requests library. The following example shows how to build the URL, make a request, and get the XML content of a Wiktionary page by its title.
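A minimal sketch of such a helper, assuming a fetch function that takes the page title plus an optional domain (defaulting here to de.wiktionary.org) and returns the export XML as text, could look like this:

```python
import requests

def fetch(title, domain='de.wiktionary.org'):
    """Fetch the raw XML export of a wiki page by its title."""
    # Build the Spezial:Exportieren URL for the given page title.
    url = f'https://{domain}/wiki/Spezial:Exportieren/{title}'
    # Request the export and fail loudly on HTTP errors.
    response = requests.get(url)
    response.raise_for_status()
    # Return the XML document as a string.
    return response.text
```

Exposing domain as a parameter keeps the helper flexible; note that the Spezial:Exportieren path used here is the German name of the export page.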
Next, let us retrieve the XML content for the page titled "hoch" and print the first 500 characters for a glimpse of the XML.
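Assuming the fetch helper sketched above, the call might look like this:

```python
# Fetch the XML export for the page "hoch" and preview its beginning.
xml = fetch('hoch')
print(xml[:500])
```

The preview should start with the opening <mediawiki ...> root element of the export, followed by the beginning of the <siteinfo> metadata block.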
We will continue to use the fetch function throughout this tutorial.