404 and redirects in Playwright

Check for 404s, find broken links and inspect redirect chains

Links are a fundamental part of the web. As websites change, broken links inevitably appear.

Redirects on old URLs keep users from hitting 404s and tell Google where the new URL lives. That works as a short-term solution, but in the long run it is better to clean up redirects and remove links to old URLs.

How do you find 404s and redirects with Playwright?

With Playwright it's possible to scrape all the links on a page. Looping through the links and checking the status codes shows which links are broken and which are redirected.

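const { chromium } = require('playwright-chromium');

(async () => {
    // headless: false and slowMo make the run easy to watch; remove both for a faster scan.
    const browser = await chromium.launch({ headless: false, slowMo: 1050 });
    const page = await browser.newPage();
    await page.goto('https://playwright.dev');

    // Collect the absolute URL of every <a> and <area> element that has an href.
    const hrefs = await page.evaluate(() => {
        return Array.from(document.links).map(item => item.href);
    });
    console.log(hrefs);

    for (const href of hrefs) {
        try {
            const response = await page.goto(href);

            // goto() resolves to null when no new document loads,
            // for example a link that only changes the hash of the current URL.
            if (!response) {
                console.log('no response, same-page anchor?', href);
                continue;
            }

            // Walk back from the final request through every redirect that led to it.
            for (let request = response.request(); request; request = request.redirectedFrom()) {
                const hop = await request.response();
                console.log(hop ? hop.status() : 'no response', request.url());
            }
        } catch (error) {
            // goto() throws on network failures and timeouts, among other errors.
            console.log('no status code, offline? check url:', href, error.message);
        }
    }

    await page.close();
    await browser.close();
})();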
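
The inner loop in the script walks from the final request back to the first one, so the chain is printed in reverse. As a minimal sketch, the helper below collects the same hops in the order the browser followed them. The names `redirectChain` and `formatChain` are made up for this example, and the only Playwright methods assumed are `request()`, `redirectedFrom()`, `response()`, `url()` and `status()`.

```javascript
// Sketch: collect a redirect chain in the order the browser followed it.
// `finalResponse` is expected to behave like a Playwright Response.
async function redirectChain(finalResponse) {
    const chain = [];
    // Walk backwards from the final request to the one that started it,
    // adding each hop to the front so the result ends up in forward order.
    for (let request = finalResponse.request(); request; request = request.redirectedFrom()) {
        const response = await request.response();
        chain.unshift({
            url: request.url(),
            // response() can resolve to null, so the status may be null too.
            status: response ? response.status() : null,
        });
    }
    return chain;
}

// Hypothetical formatter that puts the whole chain on a single line.
function formatChain(chain) {
    return chain.map(hop => `${hop.status} ${hop.url}`).join(' -> ');
}
```

Inside the loop of the script, `console.log(formatChain(await redirectChain(response)))` could take the place of the inner `for` loop if one line per link is easier to read.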

Results

Scanning the playwright.dev homepage gave the following result: no 404s, and only external URLs that redirect.
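
Since the redirects all came from external URLs, it can help to split the scraped links into internal and external before checking them. The sketch below is one possible way to do that, not part of the original script. `groupLinks` is a hypothetical helper, and it assumes "internal" means "same hostname as the scanned page", so a `www.` variant of the site would count as external.

```javascript
// Sketch: dedupe scraped links and split them into internal and external.
// `hrefs` is an array of absolute URLs, `baseUrl` is the page that was scanned.
function groupLinks(hrefs, baseUrl) {
    const baseHost = new URL(baseUrl).hostname;
    const seen = new Set();
    const groups = { internal: [], external: [], skipped: [] };

    for (const href of hrefs) {
        let url;
        try {
            url = new URL(href);
        } catch {
            groups.skipped.push(href); // not a parseable absolute URL
            continue;
        }
        if (url.protocol !== 'http:' && url.protocol !== 'https:') {
            groups.skipped.push(href); // for example mailto: or tel:
            continue;
        }
        url.hash = ''; // "#section" variants point at the same document
        const key = url.href;
        if (seen.has(key)) continue; // already queued
        seen.add(key);
        (url.hostname === baseHost ? groups.internal : groups.external).push(key);
    }
    return groups;
}
```

Passing the `hrefs` array from the script together with `'https://playwright.dev'` would give three lists, so the internal links (the ones you can fix yourself) can be checked first.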

Most common status codes

Code  Meaning
1xx   Informational/temporary status code
2xx   Great, success!
200   OK
3xx   Redirecting
301   Permanent redirect
302   Temporary redirect
4xx   Client error
400   Bad request
401   Unauthorized
403   Forbidden
404   Not found
410   Gone
5xx   Server error
500   Internal server error
502   Bad gateway
503   Service unavailable
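
To turn the table into something a link check can act on, the status code can be mapped to a coarse label. This is only a sketch: `describeStatus` and its labels are invented for this example, and treating 404 and 410 as "broken" is an assumption about what a link report should flag.

```javascript
// Sketch: map an HTTP status code to a coarse label for a link report.
function describeStatus(status) {
    if (!Number.isInteger(status) || status < 100 || status > 599) return 'unknown';
    if (status === 404 || status === 410) return 'broken'; // page is gone
    if (status >= 500) return 'server error';
    if (status >= 400) return 'client error';
    if (status >= 300) return 'redirect';
    if (status >= 200) return 'ok';
    return 'informational'; // 1xx
}
```

In the script, `describeStatus(hop.status())` could be logged next to the raw status code so broken and redirected links stand out in the output.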
