SecretAgent is a free and open-source headless browser written in Node.js, built on top of Chrome, and nearly impossible for websites to detect.
We started by challenging ourselves to create the ultimate scraper detection tool, which we coined DoubleAgent. Along the way we discovered 76,697 checks that any website can implement to block practically all known scrapers. Then we designed SecretAgent to bypass detection by emulating real users.
Under the hood, SecretAgent uses Chrome as its core rendering engine, with the DevTools Protocol as its glue layer.
Instead of creating another complex puppeteer-like API that requires nested callbacks and running code in remote contexts, we designed the AwaitedDOM. AwaitedDOM is a W3C-compliant DOM written for Node.js that lets you write scraper scripts as if you were inside the webpage.
To use SecretAgent in your project, install it with npm or yarn:
npm i --save secret-agent
or
yarn add secret-agent
When you install SecretAgent, it also downloads a recent version of Chrome and an app called Replay for debugging and troubleshooting sessions.
More details about installation can be found on the troubleshooting page.
SecretAgent's API should be familiar to web developers everywhere. We created a W3C-compliant DOM library for Node.js, which lets you use the exact same DOM selector and traversal commands as you do in modern web browsers like Chromium, Firefox, and Safari.
For example, here's how you might extract the title and intro paragraph from example.org:
import agent from 'secret-agent';
(async () => {
  await agent.goto('https://example.org');
  const title = await agent.document.title;
  const intro = await agent.document.querySelector('p').textContent;
  agent.output = { title, intro };
  await agent.close();
  console.log('Retrieved from https://example.org', agent.output);
})();
As the example above shows, agent.document follows the standard DOM specification, but with a cool twist we call the AwaitedDOM: DOM properties and methods are resolved with await from your Node.js script.
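To give a feel for how an awaitable DOM chain can work, here is a minimal, hypothetical sketch in plain Node.js (this is an illustration of the idea, not SecretAgent's actual implementation): a Proxy records each property access and resolves the whole recorded path only when the chain is awaited.

```javascript
// Minimal sketch of an "awaited" property chain. A Proxy records each
// property access; awaiting the chain resolves the whole path against
// state produced by an async function. Illustrative only -- this is
// NOT SecretAgent's implementation.
function awaited(fetchState) {
  function makeNode(path) {
    return new Proxy(function () {}, {
      get(_target, prop) {
        if (prop === 'then') {
          // Awaiting the chain: fetch the (possibly remote) state and
          // walk the recorded path to produce the final value.
          const promise = fetchState().then(state =>
            path.reduce((obj, key) => obj[key], state),
          );
          return promise.then.bind(promise);
        }
        // Any other property access just extends the recorded path.
        return makeNode([...path, prop]);
      },
    });
  }
  return makeNode([]);
}

// Usage: pretend this async function fetches page state from a browser.
const fakeDocument = awaited(async () => ({
  title: 'Example Domain',
  body: { firstChild: { textContent: 'Hello' } },
}));

(async () => {
  // Each chained property is recorded, then resolved in a single await:
  console.log(await fakeDocument.title); // Example Domain
  console.log(await fakeDocument.body.firstChild.textContent); // Hello
})();
```

The key design choice this sketches is lazy resolution: nothing touches the page until you await, so a whole chain like `document.querySelector('p').textContent` can be shipped to the browser and resolved in one round trip.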