Running a Web Crawler in a Docker Container

Introduction

A website may have hundreds, thousands, or even millions of public-facing pages. When we are responsible for maintaining such a website, it’s impractical to traverse it manually looking for broken links. We need an automated testing tool: one that can scan the whole website and log any broken links, so we can get them fixed sooner rather than later.

In this blog post, I am going to describe a web crawler project that achieves this goal easily and efficiently. The primary technologies used in this project are Scrapy and Docker.