Introduction

SecretAgent is a free and open-source headless browser that's written in Node.js, built on top of Chrome, and nearly impossible for websites to detect.

Why SecretAgent?

  • Built for scraping - It's the first modern headless browser designed specifically for scraping instead of just automated testing.
  • Designed for web developers - We've recreated a fully compliant DOM directly in Node.js, allowing you to bypass the headaches of previous scraper tools.
  • Powered by Chrome - The powerful Chrome engine sits under the hood, allowing for lightning-fast rendering.
  • Emulates any modern browser - Browser emulators make it easy to disguise your script as practically any browser.
  • Avoids detection along the entire stack - Don't be blocked because of TLS fingerprints in your networking stack.

How It Works

We started by challenging ourselves to create the ultimate scraper detection tool, which we named DoubleAgent. Along the way we discovered 76,697 checks that any website can implement to block practically all known scrapers. Then we designed SecretAgent to bypass detection by emulating real users.

SecretAgent uses Chrome as its core rendering engine under the hood, with DevTools Protocol as its glue layer.

Instead of creating another complex Puppeteer-like API that requires nested callbacks and running code in remote contexts, we designed the AwaitedDOM. AwaitedDOM is a W3C-compliant DOM written for Node.js that allows you to write scraper scripts as if you were inside the webpage.

Installation

To use SecretAgent in your project, install it with npm or yarn:

npm i --save secret-agent

or

yarn add secret-agent

When you install SecretAgent, it also downloads a recent version of Chrome and an app called Replay for debugging and troubleshooting sessions.

More details about installation can be found on the troubleshooting page.

Usage Example

SecretAgent's API should be familiar to web developers everywhere. We created a W3C-compliant DOM library for Node.js, which allows you to use the exact same DOM selector and traversal commands as you do in modern web browsers like Chromium, Firefox, and Safari.

For example, here's how you might extract the title and intro paragraph from example.org:

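import agent from 'secret-agent';

(async () => {
  // Load the page in the headless browser.
  await agent.goto('https://example.org');

  // Read values with standard DOM properties and selectors; each read is awaited.
  const title = await agent.document.title;
  const intro = await agent.document.querySelector('p').textContent;

  // Close the agent once the data has been extracted.
  await agent.close();

  console.log('Retrieved from https://example.org', {
    title,
    intro,
  });
})();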

As shown in the example above, the document object exposed on agent follows the standard DOM specification, but with a twist that we call the AwaitedDOM.
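The example chains agent.document.querySelector('p').textContent and awaits only the end of the chain, which suggests that property reads and method calls are recorded first and resolved when awaited. The following is a toy sketch of how such an interface could be built with a JavaScript Proxy. It is not SecretAgent's implementation: remotePage, runRemotely, awaited, and demo are hypothetical names, and the "remote" page is just an in-memory object with made-up content.

```javascript
// Toy illustration only: NOT SecretAgent's actual implementation.
// Hypothetical stand-in for a page that lives in a remote browser.
const remotePage = {
  document: {
    title: 'Example Domain',
    querySelector: (selector) =>
      selector === 'p' ? { textContent: 'This domain is for use in examples.' } : null,
  },
};

// Replays a recorded chain of property reads and method calls against the page.
async function runRemotely(steps) {
  let current = remotePage;
  for (const step of steps) {
    // A step with `args` is a method call; otherwise it is a property read.
    current = step.args ? current[step.key](...step.args) : current[step.key];
  }
  return current;
}

// Returns a proxy that records each access and only runs the chain on `await`.
function awaited(steps = []) {
  // The target is a function so that the `apply` trap can intercept method calls.
  return new Proxy(() => {}, {
    get(_target, key) {
      // `await` looks up `then`; answering it makes the proxy behave as a thenable.
      if (key === 'then') {
        const promise = runRemotely(steps);
        return promise.then.bind(promise);
      }
      return awaited([...steps, { key }]);
    },
    apply(_target, _thisArg, args) {
      // Turn the most recent property read into a method call with arguments.
      const last = steps[steps.length - 1];
      return awaited([...steps.slice(0, -1), { key: last.key, args }]);
    },
  });
}

// Hypothetical agent object for this sketch.
const agent = awaited();

async function demo() {
  // Nothing is evaluated until `await`; before that the chain is only recorded.
  const title = await agent.document.title;
  const intro = await agent.document.querySelector('p').textContent;
  return { title, intro };
}

demo().then((result) => console.log(result));
```

The design choice this sketch highlights is that a whole chain costs a single round trip: in a real remote setup, runRemotely would send the recorded steps to the browser once instead of awaiting each intermediate node.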
