Special Export tool
The Special Export tool fetches specific pages with their raw content (wikitext) in real time, so you do not need to download the entire dataset. The content is returned in XML format.
Importing Packages
The examples on this page use Python's requests library.
Using the Special Export Tool
You can use Special:Export to retrieve pages from any wiki site that runs MediaWiki. On the German Wiktionary the tool is named Spezial:Exportieren, but it works the same way.
Exporting Pages from Any Wiki Site
To access the XML content of the page titled "Austria" from English Wikipedia, append the page title to the site's Special:Export path: https://en.wikipedia.org/wiki/Special:Export/Austria
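As a minimal sketch, the export URL for this page can be assembled in Python. The variable names and the use of urllib.parse.quote to percent-encode the title are assumptions made for illustration; only the domain, the tool name, and the page title come from the text.

```python
from urllib.parse import quote

# Assumed variable names; the values come from the example in the text.
domain = "en.wikipedia.org"
title = "Austria"

# MediaWiki accepts underscores in place of spaces; quote() percent-encodes
# any remaining characters that are not URL-safe.
url = f"https://{domain}/wiki/Special:Export/{quote(title.replace(' ', '_'))}"
print(url)
```

Opening the printed URL in a browser should download or display the XML export for the page.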
Exporting Pages from the German Wiktionary
For the German Wiktionary, the export tool uses Spezial:Exportieren instead of Special:Export.
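The difference between the two sites comes down to the domain and the localized tool name, which a small helper can make explicit. This is only a sketch: export_url and its parameters are hypothetical names, and the title "hoch" is borrowed from later on this page.

```python
from urllib.parse import quote


def export_url(title, domain="de.wiktionary.org", tool="Spezial:Exportieren"):
    """Return the export URL for a page title (hypothetical helper)."""
    # Replace spaces with underscores, then percent-encode the rest of the title.
    encoded_title = quote(title.replace(" ", "_"))
    return f"https://{domain}/wiki/{tool}/{encoded_title}"


# German Wiktionary uses the localized tool name by default in this sketch.
print(export_url("hoch"))

# Other wikis only need a different domain and tool name.
print(export_url("Austria", domain="en.wikipedia.org", tool="Special:Export"))
```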
Fetching XML Data with requests
To fetch and download XML content programmatically, you can use Python's requests library. The fetch function used in this tutorial builds the URL, makes the request, and returns the XML content of a Wiktionary page given its title.
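One way such a fetch function might look is sketched below. It is not the tutorial's original code: the EXPORT_URL constant, the timeout parameter, the User-Agent string, and the choice to raise on HTTP errors are all assumptions. Only the name fetch, the use of requests, and the Spezial:Exportieren URL come from the text.

```python
import requests

# Assumed constant: URL template for the German Wiktionary export tool.
EXPORT_URL = "https://de.wiktionary.org/wiki/Spezial:Exportieren/{title}"

# Assumed header: a descriptive User-Agent (placeholder value, replace with your own).
HEADERS = {"User-Agent": "special-export-tutorial/0.1 (example script)"}


def fetch(title, timeout=10):
    """Return the export XML for a page title as text (illustrative sketch)."""
    # Build the URL for the requested page.
    url = EXPORT_URL.format(title=title)
    # Make the request; the timeout keeps the call from hanging indefinitely.
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of returning an error page.
    response.raise_for_status()
    # The decoded response body is the XML document.
    return response.text
```

With this in place, `fetch("hoch")[:500]` would give the first 500 characters of the export for that page. No output is shown here because it depends on the live wiki.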
Next, let us retrieve the XML content for the page titled "hoch" and print the first 500 characters to get a glimpse of it.
We will continue to use the fetch function throughout this tutorial.