PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT doesn't work #349

@ynkmyz233

Description
Some websites can freeze crawling, for example http://www.hemehealth.com.
When image requests are aborted, the crawl can freeze: PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT does not take effect, and the errback is never called to close the page.

Plain Playwright raises playwright._impl._errors.TimeoutError as expected:

import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as pw:
        async def abort(route):
            # Abort image requests; this is what triggers the freeze on some sites.
            if route.request.resource_type == "image":
                await route.abort()

        browser = await pw.chromium.launch()
        context = await browser.new_context()
        page = await context.new_page()
        page.set_default_timeout(3000)
        try:
            await page.route("**/*", abort)
            await page.goto("http://www.hemehealth.com")
            title = await page.title()
            print(title)
        finally:
            await page.close()
            await context.close()
            await browser.close()

asyncio.run(main())

File "/usr/local/lib/python3.13/site-packages/playwright/_impl/_connection.py", line 558, in wrap_api_call
raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None
playwright._impl._errors.TimeoutError: Page.goto: Timeout 3000ms exceeded.
Call log:
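Since the timeout does fire in plain Playwright, one defensive workaround (my own suggestion, not something scrapy-playwright provides) is to wrap any awaitable that might hang in `asyncio.wait_for`, so a stuck navigation still unblocks the caller and cleanup can run. A minimal, self-contained sketch, with a dummy hanging coroutine standing in for `page.goto`:

```python
import asyncio

async def hangs_forever():
    # Stand-in for a navigation that never settles (e.g. page.goto on a
    # site that freezes after image requests are aborted).
    await asyncio.sleep(3600)

async def main():
    try:
        # A second, outer timeout guards against the inner one never firing.
        await asyncio.wait_for(hangs_forever(), timeout=0.1)
        return "loaded"
    except asyncio.TimeoutError:
        # Cleanup (page.close(), context.close(), ...) would go here.
        return "timed out"

result = asyncio.run(main())
print(result)
```

Here `asyncio.wait_for` cancels the hanging coroutine and raises `TimeoutError`, so the `finally`/cleanup path is always reached even if the library's own timeout is lost.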

With scrapy-playwright, however, the crawl freezes and the errback is never called to close the page:

import scrapy
from scrapy.crawler import CrawlerProcess

class ExampleSpider(scrapy.Spider):

    name = "example"
    custom_settings = {
        "PLAYWRIGHT_ABORT_REQUEST": lambda req: req.resource_type in ["image"],
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
        "DOWNLOAD_HANDLERS": {
            "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
            "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        },
        "CONCURRENT_REQUESTS": 1,
        "PLAYWRIGHT_LAUNCH_OPTIONS": {"headless": True},
        "RETRY_ENABLED": False,
        "PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT": 3000,
    }

    def start_requests(self):
        urls = ["http://www.hemehealth.com"]
        for url in urls:
            yield scrapy.Request(
                url=url,
                meta={
                    "playwright": True,
                    "playwright_context": url,
                    "playwright_include_page": True,
                },
                callback=self.parse,
                errback=self.close,
            )

    async def parse(self, response):
        page = response.meta["playwright_page"]
        title = await page.title()
        html = await page.content()
        await page.close()
        await page.context.close()

    async def close(self, failure):
        page = failure.request.meta["playwright_page"]
        await page.close()
        await page.context.close()


process = CrawlerProcess()
process.crawl(ExampleSpider)
process.start()

2025-08-07 14:18:54 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-08-07 14:18:54 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2025-08-07 14:18:54 [scrapy-playwright] INFO: Starting download handler
2025-08-07 14:18:54 [scrapy-playwright] INFO: Starting download handler
2025-08-07 14:18:54 [scrapy-playwright] INFO: Launching browser chromium
2025-08-07 14:18:55 [scrapy-playwright] INFO: Browser chromium launched
2025-08-07 14:18:55 [scrapy-playwright] DEBUG: Browser context started: 'http://www.hemehealth.com' (persistent=False, remote=False)
2025-08-07 14:18:55 [scrapy-playwright] DEBUG: [Context=http://www.hemehealth.com] New page created, page count is 1 (1 for all contexts)
2025-08-07 14:18:55 [scrapy-playwright] DEBUG: [Context=http://www.hemehealth.com] Request: <GET http://www.hemehealth.com/> (resource type: document)
2025-08-07 14:18:55 [scrapy-playwright] DEBUG: [Context=http://www.hemehealth.com] Response: <200 http://www.hemehealth.com/>
........
2025-08-07 14:18:55 [scrapy-playwright] DEBUG: [Context=http://www.hemehealth.com] Response: <200 http://www.hemehealth.com/js/jquery.min.js>
2025-08-07 14:18:55 [scrapy-playwright] DEBUG: [Context=http://www.hemehealth.com] Response: <200 http://www.hemehealth.com/js/bootstrap.bundle.js>
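For reference, `PLAYWRIGHT_ABORT_REQUEST` is just a predicate over the outgoing request, so the lambda from the spider above can be exercised in isolation. The `SimpleNamespace` objects here are illustrative stand-ins for Playwright request objects, not real ones:

```python
from types import SimpleNamespace

# The same predicate used in the spider's custom_settings.
abort_request = lambda req: req.resource_type in ["image"]

# Dummy stand-ins for Playwright request objects (illustrative only).
image_req = SimpleNamespace(resource_type="image")
doc_req = SimpleNamespace(resource_type="document")

print(abort_request(image_req))  # True  -> request is aborted
print(abort_request(doc_req))    # False -> request goes through
```

This shows the abort logic itself is trivial; the freeze happens in how the aborted responses are handled during navigation, not in the predicate.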

This may be the same issue as #266 (comment).
