This is the primary class to interact with SecretAgent. The following is a simple example:
const agent = require('secret-agent');
(async () => {
await agent.goto('https://www.google.com');
// other actions...
await agent.close();
})();
An Agent instance can be thought of as a single user-browsing session. A default instance is automatically initialized and available as the default export of secret-agent. Each additional instance you create has the following attributes:
An instance has a replayable Session that will record all commands, DOM changes, interactions and page events.
Instances are very lightweight, sharing a pool of browsers underneath. To manage concurrent scrapes in a single script, you can create one Agent for each scrape, or manage load and concurrency with a Handler.
Agent instances can have multiple Tabs, but only a single tab can be focused at a time. Clicks and other user interaction will go to the active tab (interacting with multiple tabs at once by a single user is easily detectable).
Each Agent instance creates a private environment with its own cache, cookies, session data and BrowserEmulator. No data is shared between instances -- each operates within an airtight sandbox to ensure no identities leak across requests.
The default instance (the default export of secret-agent) can receive configuration via command line arguments. Any args starting with --input.* will be processed, and the resulting JSON object is available as agent.input.
// script.js
const agent = require('secret-agent');
console.log(agent.input); // { secret: "true", agent: "true" }
$ node script.js --input.secret=true --input.agent=true
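To make the translation from command line args to agent.input concrete, here is a minimal sketch. It is not SecretAgent's actual parser: the helper name parseInputArgs is hypothetical, and it assumes the `--input.a.b=value` syntax, ignores args without `=`, and keeps every value as a string (matching the output shown above).

```javascript
// Hypothetical sketch: turn `--input.<json path>=<value>` args into a nested object.
// Not SecretAgent's implementation; values are kept as strings.
function parseInputArgs(argv) {
  const input = {};
  for (const arg of argv) {
    const match = /^--input\.([^=]+)=(.*)$/.exec(arg);
    if (!match) continue; // ignore args that are not --input.*=value
    const path = match[1].split('.');
    let target = input;
    // Walk/create intermediate objects for dotted paths like a.b.c
    for (const key of path.slice(0, -1)) {
      if (typeof target[key] !== 'object' || target[key] === null) target[key] = {};
      target = target[key];
    }
    target[path[path.length - 1]] = match[2];
  }
  return input;
}

// In a real script you would pass process.argv.slice(2).
const input = parseInputArgs(['--input.secret=true', '--input.agent=true', '--verbose']);
console.log(input); // { secret: 'true', agent: 'true' }
```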
Creates a new sandboxed browser instance with a unique user session and fingerprints. Alternatively, pass in an existing UserProfile to reconstruct a previously used user session.
You can optionally await an instance (or constructor) to cause the connection to the underlying SecretAgent to be initialized. If you don't await, the connection will be established on the first call.
Note: If you provide a name that has already been used to name another instance, a counter will be appended to your string to ensure its uniqueness. However, it's only unique within a single Node.js process (i.e., rerunning your script will reset the counter).
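The uniqueness rule above can be pictured as an in-memory counter keyed by name. This is a hypothetical sketch: makeUniqueName is an illustrative helper, and the exact suffix format SecretAgent appends is an assumption. It shows why the counter resets on every run, since the Map only lives as long as the process.

```javascript
// Hypothetical sketch of per-process name de-duplication.
// The suffix format (name + counter) is an assumption, not SecretAgent's exact output.
const nameCounts = new Map(); // lives only for this Node.js process

function makeUniqueName(name) {
  const count = nameCounts.get(name) ?? 0;
  nameCounts.set(name, count + 1);
  // First use keeps the name as-is; later uses get a counter appended.
  return count === 0 ? name : `${name}${count}`;
}

const first = makeUniqueName('scrape');
const second = makeUniqueName('scrape');
console.log(first, second); // scrape scrape1
```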
const { Agent } = require('secret-agent');

(async () => {
  // connection established here
  const agent = await new Agent({
    userAgent: '~ mac 13.1 & chrome > 14',
  });
  // ... use the agent
  await agent.close();
})();
object. Accepts any of the following:
options | ConnectionToCore. An object containing IConnectionToCoreOptions used to connect, or an already created ConnectionToCore instance. Defaults to automatically booting up and connecting to a local Core.
string. This is used to generate a unique sessionName.
string. This sets your browser's user agent string. Prefixing this string with a tilde (~) allows for dynamic options.
string, defaults to default-browser-emulator. Chooses the BrowserEmulator plugin which emulates the properties that help SecretAgent look like a normal browser.
string, defaults to default-human-emulator. Chooses the HumanEmulator plugin which drives the mouse/keyboard movements.
IGeolocation. Overrides the geolocation of the user. Will automatically grant permissions to all origins for geolocation.
  number. Latitude between -90 and 90.
  number. Longitude between -180 and 180.
  number. Non-negative accuracy value. Defaults to a random number between 40 and 50.
string. Overrides the host timezone. A list of valid ids is available at unicode.org.
string. Overrides the host language settings (eg, en-US). Locale will affect the navigator.language value, the Accept-Language request header value, and number and date formatting rules.
IViewport. Sets the emulated screen size, window position in the screen, and inner/outer width and height. If not provided, the most popular resolution from statcounter.com is used.
  number. The page width in pixels (minimum 0, maximum 10000000).
  number. The page height in pixels (minimum 0, maximum 10000000).
  number, defaults to 1. Specifies the device scale factor (can be thought of as dpr).
  number. The optional screen width in pixels (minimum 0, maximum 10000000).
  number. The optional screen height in pixels (minimum 0, maximum 10000000).
  number. Optional override of the browser X position on screen in pixels (minimum 0, maximum 10000000).
  number. Optional override of the browser Y position on screen in pixels (minimum 0, maximum 10000000).
BlockedResourceType[]. Controls browser resource loading. Valid options are listed here.
IUserProfile. Previous user's cookies, session, etc.
object. An object containing properties to attach to the agent. NOTE: if using the default agent, this object will be populated with command line variables starting with --input.{json path}. The {json path} will be translated into an object set to agent.input.
boolean. Whether or not to show the Replay UI. Can also be set with an env variable: SA_SHOW_REPLAY=true.
string. A socks5 or http proxy url (and optional auth) to use for all HTTP requests in this session. The optional "auth" should be included in the UserInfo section of the url, eg: http://username:password@proxy.com:80.
object. Optional settings to mask the Public IP Address of a host machine when using a proxy. This is used by the default BrowserEmulator to mask WebRTC IPs.
  string. The URL of an http based IpLookupService. A list of common options can be found in plugins/default-browser-emulator/lib/helpers/lookupPublicIp.ts. Defaults to ipify.org.
  string. The optional IP address of your proxy, if known ahead of time.
  string. The optional IP address of your host machine, if known ahead of time.
Returns a reference to the currently active tab.
Tab
The connectionToCore host address to which this Agent has connected. This is useful in scenarios where a Handler is round-robining connections between multiple hosts.
Promise<string>
Returns a reference to the main Document for the active tab.
SuperDocument
Alias for activeTab.document
Returns a list of FrameEnvironments loaded for the active tab.
Promise<FrameEnvironment[]>
Contains the input configuration (if any) for this agent, such as the object passed in the constructor's input option.
NOTE: if using the default agent, this object will be populated with command line variables starting with --input.*. The parameters will be translated into an object set to agent.input.
An execution point that refers to a command run on this instance (waitForElement, click, type, etc). Command ids can be passed to select waitFor* methods to indicate a starting point to listen for changes.
Promise<number>
Alias for activeTab.lastCommandId
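The idea of a command id as a "starting point" can be modeled as filtering events by the command that preceded them. This is a hypothetical model, not SecretAgent's implementation: the event shape and the eventsSince helper are illustrative, and whether the real boundary is inclusive or exclusive is an assumption (exclusive here).

```javascript
// Hypothetical model: each event is tagged with the id of the last command run
// before it happened. Passing a command id means "only look at what came after".
const events = [
  { commandId: 1, type: 'location-change', url: 'https://example.com/a' },
  { commandId: 3, type: 'location-change', url: 'https://example.com/b' },
  { commandId: 5, type: 'location-change', url: 'https://example.com/c' },
];

function eventsSince(allEvents, sinceCommandId) {
  // Exclusive boundary is an assumption made for this sketch.
  return allEvents.filter(event => event.commandId > sinceCommandId);
}

const sinceCommandId = 3; // e.g. a value previously read from lastCommandId
console.log(eventsSince(events, sinceCommandId).map(event => event.url));
```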
Returns a reference to the document of the mainFrameEnvironment of the active tab.
Alias for tab.mainFrameEnvironment.document.
SuperDocument
Retrieves metadata about the agent configuration:
string. The session identifier.
string. The unique session name that will be visible in Replay.
string. The id of the Browser Emulator in use.
string. The id of the Human Emulator in use.
string. The configured unicode TimezoneId or host default (eg, America/New_York).
string. The configured locale in use (eg, en-US).
IGeolocation. The configured geolocation of the user (if set).
IViewport. The emulated viewport size and location.
BlockedResourceType[]. The blocked resource types.
string. A socks5 or http proxy url (and optional auth) to use for all HTTP requests in this session. The optional "auth" should be included in the UserInfo section of the url, eg: http://username:password@proxy.com:80.
object. Optional settings to mask the Public IP Address of a host machine when using a proxy. This is used by the default BrowserEmulator to mask WebRTC IPs.
  string. The URL of an http based IpLookupService. A list of common options can be found in plugins/default-browser-emulator/lib/helpers/lookupPublicIp.ts. Defaults to ipify.org.
  string. The optional IP address of your proxy, if known ahead of time.
  string. The optional IP address of your host machine, if known ahead of time.
string. The user agent string used in Http requests and within the DOM.
Promise<IAgentMeta>
Agent output is an object used to track any data you collect during your session. Output is shown in Replay during playback, so you can visually follow your data collection.
Output is able to act like an Array or an Object. It will serialize properly in either use-case.
NOTE: any object you assign into Output is "copied" into the Output object. You should not expect further changes to the source object to synchronize.
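The "copied" behavior can be illustrated with plain JavaScript. This is a hypothetical stand-in, not SecretAgent's Output class: a regular array plus a JSON round-trip (pushCopy is an illustrative helper) shows why editing the source object afterwards does not change what was recorded. Note that a JSON round-trip also drops functions and undefined values.

```javascript
// Hypothetical illustration of copy-on-assign; not SecretAgent's Output class.
const output = [];

function pushCopy(target, value) {
  // Store a deep copy so later edits to `value` do not reach `target`.
  target.push(JSON.parse(JSON.stringify(value)));
}

const record = { text: 'Home', href: '/' };
pushCopy(output, record);
record.text = 'Changed later'; // mutate the source after it was copied

console.log(output[0].text); // Home
```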
const agent = require('secret-agent');
(async () => {
await agent.goto('https://www.google.com');
const document = agent.document;
for (const link of await document.querySelectorAll('a')) {
agent.output.push({ // will display in Replay UI.
text: await link.textContent,
href: await link.href,
});
}
console.log(agent.output);
await agent.close();
})();
Output. An array-like object.
An identifier used for storing logs, snapshots, and other assets associated with the current session.
Promise<string>
A human-readable identifier of the current Agent session.
You can set this property when calling Handler.dispatchAgent() or Handler.createAgent().
Promise<string>
Returns all open browser tabs.
Promise<Tab[]>
The url of the active tab.
Promise<string>
Alias for Tab.url
Returns a constructor for a Request object bound to the activeTab. Proxies to tab.Request. These objects can be used to run browser-native tab.fetch requests from the context of the Tab document.
Request
Alias for Tab.Request
Executes a click interaction. This is a shortcut for agent.interact({ click: mousePosition }). See the Interactions page for more details.
MousePosition
Promise
Closes the current instance and any open tabs.
Promise
Close a single Tab. The first opened Tab will become the focused tab.
Tab
The Tab to close.
Promise<void>
Alias for Tab.close()
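The focus rule for closing a tab can be pictured with a small state model. This is a hypothetical sketch (closeTabModel is illustrative, and tabs are plain strings): tabs are kept in the order they were opened, and after a close the first remaining tab becomes the focused one, as the description states.

```javascript
// Hypothetical model of the focus rule: not SecretAgent's implementation.
// Tabs are stored in opening order; closing one focuses the first remaining tab.
function closeTabModel(state, tabToClose) {
  const tabs = state.tabs.filter(tab => tab !== tabToClose);
  return { tabs, activeTab: tabs[0] ?? null };
}

let state = { tabs: ['tab1', 'tab2', 'tab3'], activeTab: 'tab3' };
state = closeTabModel(state, 'tab3');
console.log(state.activeTab); // tab1
```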
Update existing configuration settings.
object. Accepts any of the following:
IUserProfile. Previous user's cookies, session, etc.
string. Overrides the host timezone. A list of valid ids is available at unicode.org.
string. Overrides the host language settings (eg, en-US). Locale will affect the navigator.language value, the Accept-Language request header value, and number and date formatting rules.
IViewport. Sets the emulated screen size, window position in the screen, and inner/outer width and height (see the constructor for parameters).
BlockedResourceType[]. Controls browser resource loading. Valid options are listed here.
string. A socks5 or http proxy url (and optional auth) to use for all HTTP requests in this session. The optional "auth" should be included in the UserInfo section of the url, eg: http://username:password@proxy.com:80.
object. Optional settings to mask the Public IP Address of a host machine when using a proxy. This is used by the default BrowserEmulator to mask WebRTC IPs.
  string. The URL of an http based IpLookupService. A list of common options can be found in plugins/default-browser-emulator/lib/helpers/lookupPublicIp.ts. Defaults to ipify.org.
  string. The optional IP address of your proxy, if known ahead of time.
  string. The optional IP address of your host machine, if known ahead of time.
options | ConnectionToCore. An object containing IConnectionToCoreOptions used to connect, or an already created ConnectionToCore instance. Defaults to booting up and connecting to a local Core.
Promise
See the Configuration page for more details on options
and its defaults. You may also want to explore BrowserEmulators and HumanEmulators.
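The proxy url options above expect credentials in the UserInfo section of the url. A minimal sketch of building such a url with the WHATWG URL class built into Node.js follows; buildProxyUrl is a hypothetical helper, and the host, port and credentials are placeholders. Using the URL setters means characters such as @ and : in a password are percent-encoded instead of breaking the url.

```javascript
// Sketch: build a proxy url with "auth" in the userinfo section.
// buildProxyUrl and all values below are placeholders for illustration.
function buildProxyUrl({ protocol, username, password, host, port }) {
  const url = new URL(`${protocol}://${host}:${port}`);
  // The username/password setters percent-encode reserved characters.
  url.username = username;
  url.password = password;
  return url.href;
}

const proxyUrl = buildProxyUrl({
  protocol: 'http',
  username: 'username',
  password: 'p@ss:word',
  host: 'proxy.com',
  port: 8080,
});
console.log(proxyUrl); // http://username:p%40ss%3Aword@proxy.com:8080/
```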
Detach the given tab into a "Frozen" state. The FrozenTab contains a replica of the DOM and layout at the moment of detachment, and supports all the readonly activities of a normal Tab (eg, querySelectors, getComputedVisibility, getComputedStyle).
FrozenTabs have a unique attribute in that any queries you run against them will be "learned" on an initial run, and pre-fetched on subsequent runs. This means you can very quickly iterate through all the data you want on a page after you've loaded it into your desired state.
NOTE: you can detach the same Tab multiple times per script. Each instance will contain DOM frozen at the time it was detached.
Tab. An existing tab loaded to the point you wish to freeze.
string. Optional extra identifier to differentiate between runs in a loop. This can be useful if you are looping through a list of links and detaching each Tab but have specific extraction logic for each link. NOTE: if your looping logic is the same, changing this key will decrease performance.
FrozenTab
const agent = require('secret-agent');
const { LocationStatus } = require('secret-agent');

(async () => {
  await agent.goto('https://chromium.googlesource.com/chromium/src/+refs');
  await agent.activeTab.waitForLoad(LocationStatus.DomContentLoaded);
  const frozenTab = await agent.detach(agent.activeTab);
  const { document } = frozenTab;
  const versions = agent.output;

  // 1. First run will run as normal.
  // 2+. Next runs will pre-fetch everything run against the frozenTab
  // NOTE: Every time your script changes, SecretAgent will re-learn what to pre-fetch.
  const wrapperElements = await document.querySelectorAll('.RefList');
  for (const elem of wrapperElements) {
    const innerText = await elem.querySelector('.RefList-title').innerText;
    if (innerText === 'Tags') {
      const aElems = await elem.querySelectorAll('ul.RefList-items li a');
      for (const aElem of aElems) {
        const version = await aElem.innerText;
        versions.push(version);
      }
    }
  }
  await agent.close();
})();
Returns a JSON representation of the underlying browser state for saving. This can later be restored into a new instance using agent.configure({ userProfile: serialized }). See the UserProfile page for more details.
Promise<IUserProfile>
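Because the exported profile is plain JSON, saving it between runs can be as simple as writing it to disk and parsing it back. This sketch uses only Node.js built-ins; the profile object is a stand-in rather than a real export (its shape is illustrative, see the UserProfile page for the actual fields), and the file name is a placeholder. In a real script the parsed value would be passed back via agent.configure({ userProfile }).

```javascript
// Sketch: persist a user profile as JSON and read it back on the next run.
// `profile` is a stand-in for the value returned by exportUserProfile().
const fs = require('fs');
const os = require('os');
const path = require('path');

const profileFile = path.join(os.tmpdir(), 'secret-agent-profile.json'); // placeholder path
const profile = { cookies: [] }; // illustrative stand-in, not a real export

fs.writeFileSync(profileFile, JSON.stringify(profile));

// Later (eg, in a new run): read the saved profile back.
const restored = JSON.parse(fs.readFileSync(profileFile, 'utf8'));
console.log(JSON.stringify(restored) === JSON.stringify(profile)); // true
```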
Bring a tab to the forefront. This will route all interaction (click, type, etc) methods to the tab provided as an argument.
Tab
The Tab which will become the activeTab.
Promise<void>
Alias for Tab.focus()
Executes a series of mouse and keyboard interactions.
Interaction
Promise
Refer to the Interactions page for details on how to construct an interaction.
Executes a scroll interaction. This is a shortcut for agent.interact({ scroll: mousePosition }). See the Interactions page for more details.
MousePosition
Promise
Executes a keyboard interaction. This is a shortcut for agent.interact({ type: string | KeyName[] }).
KeyboardInteraction
Promise
Refer to the Interactions page for details on how to construct keyboard interactions.
Add a plugin to the current instance. This must be called before any other agent methods.
ClientPlugin
| array
| object
| string
this
The same Agent instance (so calls can be chained).
If an array is passed, then any client plugins found in the array are registered. If an object, then any client plugins found in the object's values are registered. If a string, it must be a valid npm package name available in the current environment or it must be an absolute path to a file that exports one or more plugins -- Agent will attempt to dynamically require it.
Also, if a string is passed -- regardless of whether it's an npm package or absolute path -- the same will also be registered in Core (however, the same is not true for arrays or objects). For example, you can easily register a Core plugin directly from Client:
import agent from 'secret-agent';
agent.use('@secret-agent/tattle-plugin');
The following three examples all work:
Use an already-imported plugin:
import agent from 'secret-agent';
import ExecuteJsPlugin from '@secret-agent/execute-js-plugin';
agent.use(ExecuteJsPlugin);
Use an NPM package name (if it's publicly available):
import agent from 'secret-agent';
agent.use('@secret-agent/execute-js-plugin');
Use an absolute path to file that exports one or more plugins:
import agent from 'secret-agent';
agent.use(require.resolve('./CustomPlugins'));
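The argument rules for use() can be summarized as a small dispatch function. This is a hypothetical sketch: resolveUseArgument and isClientPlugin are illustrative helpers, and detecting a client plugin via a static type of 'ClientPlugin' is an assumption about how plugins are marked. It captures the documented behavior: a single plugin is registered directly, arrays and objects contribute only the client plugins they contain, and only strings are also registered in Core.

```javascript
// Hypothetical sketch of the documented agent.use() argument rules.
// The `type === 'ClientPlugin'` check is an assumption for illustration.
function isClientPlugin(value) {
  return value != null && value.type === 'ClientPlugin';
}

function resolveUseArgument(arg) {
  if (typeof arg === 'string') {
    // Package name or absolute path: required dynamically and also registered in Core.
    return { plugins: [], requireSpecifier: arg, registerInCore: true };
  }
  let candidates = [];
  if (isClientPlugin(arg)) candidates = [arg];
  else if (Array.isArray(arg)) candidates = arg;
  else if (arg !== null && typeof arg === 'object') candidates = Object.values(arg);
  // Arrays and objects: keep only client plugins; nothing is sent to Core.
  return { plugins: candidates.filter(isClientPlugin), requireSpecifier: null, registerInCore: false };
}

class ExamplePlugin { static type = 'ClientPlugin'; } // hypothetical plugin

const fromArray = resolveUseArgument([ExamplePlugin, 'not-a-plugin']);
const fromObject = resolveUseArgument({ ExamplePlugin, helper: () => {} });
const fromString = resolveUseArgument('@secret-agent/execute-js-plugin');
console.log(fromArray.plugins.length, fromObject.plugins.length, fromString.registerInCore); // 1 1 true
```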
Wait for a new tab to be created. This can occur either via a window.open from within the page javascript, or a Link with a target opening in a new tab or window.
Promise<Tab>
const agent = require('secret-agent');

(async () => {
  await agent.goto('http://example.com');
  const { document } = agent;
  // Assumes the loaded page contains a link that opens in a new tab, eg:
  // <a id="newTabLink" href="/newPage" target="_blank">Link to new target</a>
  await document.querySelector('#newTabLink').click();
  const newTab = await agent.waitForNewTab();
  await newTab.waitForPaintingStable();
  await agent.close();
})();
Agent instances have aliases to all top-level Tab methods. They will be routed to the activeTab.
Alias for Tab.fetch()
Alias for Tab.getFrameEnvironment()
Alias for Tab.getComputedStyle()
Alias for Tab.getJsValue()
Alias for Tab.goBack
Alias for Tab.goForward
Alias for Tab.goto
Alias for Tab.getComputedVisibility
Alias for Tab.reload
Alias for Tab.takeScreenshot
Alias for Tab.waitForFileChooser()
Alias for Tab.waitForElement
Alias for Tab.waitForLocation
Alias for Tab.waitForMillis
Alias for Tab.waitForLoad(PaintingStable)
Alias for Tab.waitForResource