Running a Web Crawler in a Docker Container

Introduction

A website may have hundreds, thousands, or even millions of public-facing pages. When we are responsible for maintaining such a website, it's impractical to traverse it manually looking for broken links. We need an automated testing tool: one which can scan the whole website and log any broken links, so we can get them fixed sooner rather than later. In this blog post, I am going to describe a web crawler project which can easily and efficiently achieve this goal. The primary technologies used in this project are Scrapy and Docker.
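
To make the idea of "scan the whole website and log any broken links" concrete, here is a minimal sketch. It is not the Scrapy spider from the project: it uses only the Python standard library, and every name in it (`LinkExtractor`, `fetch`, `find_broken_links`, `max_pages`) is a hypothetical choice made for illustration. It assumes that a link is "broken" when the request fails outright or returns a status of 400 or above, and that external links should be checked but not crawled.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Return (status, html); status is None if the request fails outright."""
    try:
        with urlopen(url, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        return err.code, ""
    except URLError:
        return None, ""


def find_broken_links(start_url, fetch=fetch, max_pages=100):
    """Breadth-first crawl of one site; returns [(page, link, status), ...]."""
    site = urlparse(start_url).netloc
    queue = deque([(start_url, None)])  # (url, page that linked to it)
    seen = {start_url}
    broken = []
    fetched = 0
    while queue and fetched < max_pages:
        url, referrer = queue.popleft()
        status, html = fetch(url)
        fetched += 1
        if status is None or status >= 400:
            broken.append((referrer, url, status))
            continue
        if urlparse(url).netloc != site:
            continue  # assumption: check external links, but do not crawl them
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before de-duplicating.
            link = urldefrag(urljoin(url, href)).url
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append((link, url))
    return broken
```

Calling `find_broken_links("https://example.com/")` would return a list of `(page, link, status)` tuples, one per broken link found. The `fetch` parameter is injectable so the crawl logic can be exercised against a fake site without touching the network. A framework such as Scrapy adds what this sketch leaves out, for example concurrency, retries, and politeness controls.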

13 September, 2018 / 0 Comments

TEL highlights for 2016

Shine's Technical Excellence Leadership Group (TEL) has had a stellar year! In this post we've pulled together our top picks from 2016 that we think deserve a special shout-out before the year comes to a close. But first, a quick recap on what the TEL group actually is. TEL was established in 2011 with the aim of publicising the great technical work that Shine does, and to raise the company's profile as a technical thought-leader through blogs, local meetup talks, and conference presentations. TEL is allocated a yearly budget from the super-duper generous Shine directors, and the members of the TEL group are put in charge of overseeing how it is spent. The budget comprises two parts: money and time. The monetary portion of the budget goes to prizes and bonuses for producing material. The time portion is for staff to draw upon to get away from their day-to-day work commitments and to produce their material. So, now that you know what TEL is all about, let's have a look at the highlight reel from 2016, shall we?

21 December, 2016 / 0 Comments

License to Queue

A few months ago I joined an exciting project with fellow Shiner Graham Polley, who you might know from such hits as "Put on your streaming shoes". This is a follow-up article, discussing the elegant way in which we solved a hideous asynchronous limitation in PHP. My role on the project was DevOps-based, and I was there to build some infrastructure using Amazon Web Services. As the cool kids would put it, I was there to put our client's enterprise application INTO the cloud, or, more succinctly, to build a solution coupling services from two rival cloud service providers and provide a new league of scalability and flexibility. The solution was pretty simple but, like any simple solution, the little complexities came out along the way, just when we were least expecting them.

19 December, 2014 / 7 Comments
