Parsing XML

In this section, we use the lxml package to parse XML content from wiki sites. Our goal is to extract the wikitext, which we will process in the next section.

How we parse the XML content depends on how we retrieve it:

  • With the Special Export tool
  • From a dump file
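
To make the distinction concrete, here is a minimal sketch of the two parsing styles these sources tend to call for. It assumes that an export is small enough to load in one go, while a dump file is large enough that streaming is preferable. The sample XML, the namespace version in it, and the helper names (local_name, wikitext_from_export, wikitext_from_dump) are hypothetical and chosen only for illustration. The import falls back to the standard library's compatible API so that the sketch still runs where lxml is not installed.

```python
import io

try:
    from lxml import etree  # the package used in this section
except ImportError:  # assumption: fall back to the stdlib's compatible API
    import xml.etree.ElementTree as etree

# Hypothetical, heavily trimmed sample of wiki export XML.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision>
      <text>'''Example''' is a [[page]].</text>
    </revision>
  </page>
</mediawiki>"""


def local_name(elem):
    """Return the tag name without its '{namespace}' prefix."""
    tag = elem.tag
    # Comments and processing instructions have a non-string tag in lxml.
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def wikitext_from_export(xml_bytes):
    """Small input: parse the whole document into memory at once."""
    root = etree.fromstring(xml_bytes)
    return [e.text or "" for e in root.iter() if local_name(e) == "text"]


def wikitext_from_dump(stream):
    """Large input: stream elements and clear each page once it is read."""
    for _, elem in etree.iterparse(stream, events=("end",)):
        name = local_name(elem)
        if name == "text":
            yield elem.text or ""
        elif name == "page":
            elem.clear()  # release the finished page's subtree


if __name__ == "__main__":
    print(wikitext_from_export(SAMPLE))
    print(list(wikitext_from_dump(io.BytesIO(SAMPLE))))
```

Both functions return the same wikitext for the same input. The difference is memory use: the first holds the whole tree, while the second keeps roughly one page at a time, which matters for a multi-gigabyte dump.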

Let us begin by parsing XML content retrieved using the Special Export tool.