
Reading data from any website with Python, using talk shows as an example

The Internet. Infinite vastness. Almost any conceivable piece of information can be found online: the weather forecast for the day after tomorrow at 2:45 p.m., the board-meeting dates of the Billard Sportverein Wuppertal 1929 e. V., or the definition of the term "roughage-consuming large livestock unit". But the sheer amount of data can quickly overwhelm you, and before long you find yourself visiting the same websites again and again to look up the same kind of information. There is a more efficient way: a small Python program can do this work for you, automatically pulling the data you want from the web and preparing it for you. We show how to write such a program, called Who Talks Where, which collects upcoming broadcast dates, topics, guests and guest descriptions from the talk shows' websites.

This works because the web is machine-readable: the markup language HTML (Hypertext Markup Language) structures the text, images, videos and other content of a website, while the style sheet language CSS (Cascading Style Sheets) lays down the rules for its visual formatting. Together they determine what a website looks like. A browser such as Firefox or Chrome interprets both languages and displays the result. The page you are reading right now is built from HTML and CSS as well.
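To illustrate what "machine-readable" means in practice, here is a minimal sketch that uses Python's standard-library `html.parser` to collect the text of every element carrying a given CSS class. The markup and the class names `show`, `title`, `date` and `guest` are made up for illustration; real talk-show pages use their own structure. The sketch also assumes reasonably well-formed HTML with closing tags; third-party parsers such as BeautifulSoup cope better with messy markup.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text inside every element that carries a given CSS class."""

    def __init__(self, css_class: str) -> None:
        super().__init__()
        self.css_class = css_class
        self.depth = 0                  # > 0 while inside a matching element
        self.buffer: list[str] = []     # text pieces of the current match
        self.results: list[str] = []    # one whitespace-normalized string per match

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:                  # a tag nested inside a match, e.g. <b>
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:             # the matching element just closed
            self.results.append(" ".join(" ".join(self.buffer).split()))

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


# Made-up markup standing in for a talk-show page (hypothetical class names).
page = """
<div class="show">
  <h2 class="title">Example Talk</h2>
  <p class="date">Thursday, 10:30 p.m.</p>
  <p class="guest">Jane Doe<br>author</p>
</div>
"""

parser = ClassTextExtractor("guest")
parser.feed(page)
print(parser.results)  # ['Jane Doe author']
```

The CSS classes that a site defines for its own styling double as convenient hooks for a scraper: the browser uses them to decide how things look, the program uses them to decide what to extract.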

If a browser can read this data, so can other programs. As long as you do not sell the data you collect but use it privately, you can treat the web as your personal data pool, from which you can extract useful information with the Python programming language. This is known as web scraping, screen scraping or simply scraping.
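A scraper like Who Talks Where essentially does two things: download a page, then pull structured fields out of its HTML. The following sketch shows both steps using only the standard library (`urllib.request` and `html.parser`). It is not the program from the article: the URL, the `episode`/`date`/`topic`/`guest` class names and the `Episode` type are hypothetical placeholders that you would replace after inspecting the real page's markup, for example in your browser's developer tools.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from urllib.request import Request, urlopen

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
FIELDS = {"date", "topic", "guest"}   # hypothetical CSS class names


@dataclass
class Episode:
    date: str = ""
    topic: str = ""
    guests: list[str] = field(default_factory=list)


class EpisodeParser(HTMLParser):
    """Build Episode objects from hypothetical <article class="episode"> blocks."""

    def __init__(self) -> None:
        super().__init__()
        self.episodes: list[Episode] = []
        self.current_field = ""         # "date", "topic" or "guest" while reading one
        self.depth = 0                  # nesting depth inside that field's element
        self.buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:                  # tag nested inside a field, e.g. <em>
            self.depth += 1
            return
        classes = set((dict(attrs).get("class") or "").split())
        if tag == "article" and "episode" in classes:
            self.episodes.append(Episode())
        elif self.episodes and classes & FIELDS:
            # Fields only count once an episode block has started.
            self.current_field = sorted(classes & FIELDS)[0]
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth:
            return
        # The field element just closed: store its normalized text.
        text = " ".join(" ".join(self.buffer).split())
        episode = self.episodes[-1]
        if self.current_field == "guest":
            episode.guests.append(text)
        else:
            setattr(episode, self.current_field, text)

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def parse_episodes(html: str) -> list[Episode]:
    """Parse an HTML string into a list of Episode objects."""
    parser = EpisodeParser()
    parser.feed(html)
    parser.close()
    return parser.episodes


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it using the charset the server announces."""
    request = Request(url, headers={"User-Agent": "who-talks-where/0.1 (private use)"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    # Offline demo with made-up markup; for a real page you would call
    # parse_episodes(fetch_html("https://example.com/talkshow")) instead.
    sample = """
    <article class="episode">
      <p class="date">Sunday, 9:45 p.m.</p>
      <h3 class="topic">Energy <em>prices</em></h3>
      <li class="guest">Jane Doe</li>
      <li class="guest">John Roe</li>
    </article>
    """
    for ep in parse_episodes(sample):
        print(ep.date, "|", ep.topic, "|", ", ".join(ep.guests))
```

Keeping download and parsing in separate functions means the parser can be tested offline: you can save a page once and feed it to `parse_episodes` repeatedly without sending further requests to the broadcaster's server.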
