browsertrix-crawler/package.json

{
"name": "browsertrix-crawler",
"version": "1.1.4",
"main": "browsertrix-crawler",
"type": "module",
"repository": "https://github.com/webrecorder/browsertrix-crawler",
"author": "Ilya Kreymer <ikreymer@gmail.com>, Webrecorder Software",
"license": "AGPL-3.0-or-later",
"scripts": {
"tsc": "tsc",
"format": "prettier src/ --check",
"format:fix": "prettier src/ --write",
"lint": "eslint src/",
"lint:fix": "yarn format:fix && eslint src/ --fix",
"test": "yarn node --experimental-vm-modules $(yarn bin jest --bail 1)",
"prepare": "husky install"
},
"dependencies": {
"@novnc/novnc": "^1.4.0",
"@types/sax": "^1.2.7",
"@webrecorder/wabac": "^2.16.12",
"browsertrix-behaviors": "^0.6.0",
"crc": "^4.3.2",
"get-folder-size": "^4.0.0",
"husky": "^8.0.3",
"ioredis": "^5.3.2",
"js-levenshtein": "^1.1.6",
"js-yaml": "^4.1.0",
"minio": "^7.1.3",
"p-queue": "^7.3.4",
"pixelmatch": "^5.3.0",
"pngjs": "^7.0.0",
"puppeteer-core": "^22.6.1",
"sax": "^1.3.0",
"sharp": "^0.32.6",
"tsc": "^2.0.4",
"uuid": "8.3.2",
"warcio": "^2.2.1",
"ws": "^7.4.4",
"yargs": "^17.7.2"
},
"devDependencies": {
"@types/js-levenshtein": "^1.1.3",
"@types/js-yaml": "^4.0.8",
"@types/node": "^20.8.7",
"@types/pixelmatch": "^5.2.6",
"@types/pngjs": "^6.0.4",
"@types/uuid": "^9.0.6",
"@types/ws": "^8.5.8",
"@typescript-eslint/eslint-plugin": "^6.10.0",
"@typescript-eslint/parser": "^6.10.0",
"eslint": "^8.53.0",
"eslint-config-prettier": "^9.0.0",
"eslint-plugin-react": "^7.22.0",
"jest": "^29.7.0",
"md5": "^2.3.0",
"prettier": "3.0.3",
"typescript": "^5.2.2"
},
"jest": {
"transform": {},
"testTimeout": 90000
}
}