Handlers provide a simple interface to load-balance many concurrent Agent sessions across one or more SecretAgent Cores.
import { Agent, Handler } from 'secret-agent';

(async () => {
  const handler = new Handler();

  // createAgent() returns a Promise, so it must be awaited
  const agent = await handler.createAgent();
  await agent.goto('https://ulixee.org');

  // Runs once per dispatched Agent; the Agent is closed automatically on return
  async function getDatasetCost(agent: Agent) {
    const dataset = agent.input;
    await agent.goto(`https://ulixee.org${dataset.href}`);
    const cost = agent.document.querySelector('.cost .large-text');
    agent.output.cost = await cost.textContent;
  }

  const links = await agent.document.querySelectorAll('a.DatasetSummary');
  for (const link of links) {
    const name = await link.querySelector('.title').textContent;
    const href = await link.getAttribute('href');
    handler.dispatchAgent(getDatasetCost, {
      name,
      input: {
        name,
        href,
      },
    });
  }
  // Agents from createAgent() must be closed explicitly to free their slot
  await agent.close();

  const results = await handler.waitForAllDispatches();
  for (const result of results) {
    const cost = result.output.cost;
    const name = result.input.name;
    console.log('Cost of %s is %s', name, cost);
  }
  await handler.close();
})();
Handlers allow you to queue up actions to take as Agents become available. They'll automatically round-robin between available connections. It's a simple way to complete all your scrapes without overloading the local machine or remote browsers.
The Handler constructor takes one or more "connections" to SecretAgent Core instances. Cores can be located remotely or in the same process. A remote connection includes a "host" parameter that is connected to over TCP (the port needs to be open on any firewalls).
Every connection controls the maximum number of concurrent Agents that can be open at any given time. Requests for Agents are round-robined between all connections.
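The round-robin behavior described above can be pictured with a small model. This is a minimal sketch, not SecretAgent's implementation: the ConnectionSlot interface and RoundRobinPool class are hypothetical names invented for illustration, and the sketch only assumes that each connection has a host and a maxConcurrency limit.

```typescript
// Hypothetical model of a connection; not a SecretAgent type.
interface ConnectionSlot {
  host: string;
  maxConcurrency: number;
  active: number;
}

// Illustrative only: hands out slots in round-robin order, skipping full connections.
class RoundRobinPool {
  private cursor = 0;

  constructor(private readonly connections: ConnectionSlot[]) {}

  // Returns the host that receives the next Agent, or null if every connection is full.
  acquire(): string | null {
    for (let i = 0; i < this.connections.length; i += 1) {
      const index = (this.cursor + i) % this.connections.length;
      const connection = this.connections[index];
      if (connection.active < connection.maxConcurrency) {
        connection.active += 1;
        // Start the next search after the connection that was just used.
        this.cursor = (index + 1) % this.connections.length;
        return connection.host;
      }
    }
    return null;
  }

  // Frees one slot, as closing an Agent would.
  release(host: string): void {
    const connection = this.connections.find(c => c.host === host);
    if (connection && connection.active > 0) connection.active -= 1;
  }
}

const pool = new RoundRobinPool([
  { host: '10.10.1.1:1588', maxConcurrency: 1, active: 0 },
  { host: '172.234.22.2:1586', maxConcurrency: 2, active: 0 },
]);

// Four requests against three total slots: the fourth finds no free slot.
const assigned = [pool.acquire(), pool.acquire(), pool.acquire(), pool.acquire()];
console.log(assigned);
```

In this model a request that finds no free slot gets null back; a real Handler instead keeps the request pending until an Agent closes.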
Connections can be either:
object. A set of settings that controls the creation of a connection to a SecretAgent Core. It accepts:
  host (string). An optional hostname:port url that will be used to establish a connection to a SecretAgent Core running on another machine. If no host is provided, a connection to a "locally" running Core will be attempted.
  maxConcurrency (number). The max number of Agents to allow to be dispatched and created at the same time. Agents are "active" until the dispatchAgent callback is complete, or the created Agent is closed. If not provided, this number will match the max allowed by a Core.
  number. The number of milliseconds to give each Agent in this connection to complete a session. A TimeoutError will be thrown if this time is exceeded.
  number, defaults to any open port. Starting internal port to use for the mitm proxy.
  string, defaults to os.tmpdir()/.secret-agent. Directory to store session files and mitm certificates.
  number. Port to start a live replay server on. Defaults to "any open port".
ConnectionToCore. A pre-initialized connection to a SecretAgent Core. You can use this option to pre-check your connection to a remote Core, or to customize the connection.

const { Handler, RemoteConnectionToCore } = require('secret-agent');

(async () => {
  // Pre-check the connection to the first remote Core
  const remote = new RemoteConnectionToCore({
    host: '10.10.1.1:1588',
  });
  await remote.connect();

  // Pass the pre-initialized connection plus settings for a second Core
  const handler = new Handler(remote, {
    host: '172.234.22.2:1586',
    maxConcurrency: 5,
  });

  const agent = await handler.createAgent();
})();
Readonly property returning the resolved list of coreHosts.
Type: Promise<string[]>
Sets default properties to apply to any new Agent created. Accepts any of the configurations that can be provided to createAgent().
Type: IAgentCreateOptions
See the Configuration page for more details on options and its defaults. You may also want to explore BrowserEmulators and HumanEmulators.
Adds a connection to the handler. This method will call connect on the underlying connection.
Connection arguments are the same as the constructor arguments for a single connection.
Can be either:
  object. A set of settings that controls the creation of a connection to a SecretAgent Core (see constructor).
  ConnectionToCore. A pre-initialized connection to a SecretAgent Core.
Returns: Promise<void>
Closes and disconnects a connection from core. Agents "in-process" will throw DisconnectedFromCoreError on active commands.
string. The coreHost of the connection to close.
Returns: Promise<void>
Closes all underlying connections. NOTE: this function will "abort" any pending processes. You might want to call waitForAllDispatches() first.
Returns: Promise
Creates a new Agent using one of the Core connections initialized in this Handler. If no connection has availability (based on its maxConcurrency setting), the returned promise will not resolve until one is free.
NOTE: when using this method, you must call agent.close() explicitly to allow future Agents to be dispatched or created as needed.
object. Accepts any of the following:
  name (string). This is used to generate a unique sessionName.
  string, defaults to default-browser-emulator. Chooses the BrowserEmulator plugin which emulates the properties that help SecretAgent look like a normal browser.
  string, defaults to default-human-emulator. Chooses the HumanEmulator plugin which drives the mouse/keyboard movements.
  string. Overrides the host timezone. A list of valid ids is available at unicode.org.
  string. Overrides the host languages settings (eg, en-US). Locale will affect the navigator.language value, the Accept-Language request header value, as well as number and date formatting rules.
  IViewport. Sets the emulated screen size, window position in the screen, inner/outer width and height. If not provided, the most popular resolution is used from statcounter.com.
  BlockedResourceType[]. Controls browser resource loading. Valid options are listed here.
  IUserProfile. Previous user's cookies, session, etc.
  boolean. Whether or not to show the Replay UI. Can also be set with an env variable: SA_SHOW_REPLAY=true.
  input (object). An object containing properties to attach to the agent (more frequently used with dispatchAgent).
  string. A socks5 or http proxy url (and optional auth) to use for all HTTP requests in this session. The optional "auth" should be included in the UserInfo section of the url, eg: http://username:password@proxy.com:80.
  object. Optional settings to mask the Public IP Address of a host machine when using a proxy. This is used by the default BrowserEmulator to mask WebRTC IPs.
    string. The URL of an http based IpLookupService. A list of common options can be found in plugins/default-browser-emulator/lib/helpers/lookupPublicIp.ts. Defaults to ipify.org.
    string. The optional IP address of your proxy, if known ahead of time.
    string. The optional IP address of your host machine, if known ahead of time.
See the Configuration page for more details on options and its defaults. You may also want to explore BrowserEmulators and HumanEmulators.
Returns: Promise<Agent>
const { Handler } = require('secret-agent');

(async () => {
  const handler = new Handler({ maxConcurrency: 2 });

  const agent1 = await handler.createAgent();
  const agent2 = await handler.createAgent();

  setTimeout(() => agent2.close(), 100);

  // will be available in 100 ms when agent2 closes
  const agent3 = await handler.createAgent();
})();
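The proxy url option listed above expects any credentials in the UserInfo section of the url. Usernames or passwords containing characters such as @ or : need percent-encoding to keep the url unambiguous. The following is a minimal sketch of that encoding step; buildProxyUrl and its parameters are hypothetical names for illustration and are not part of SecretAgent. Only the standard encodeURIComponent function is used.

```typescript
// Hypothetical helper (not a SecretAgent API): builds a proxy url with optional
// credentials placed in the UserInfo section, percent-encoding reserved characters.
function buildProxyUrl(
  scheme: 'http' | 'socks5',
  host: string,
  port: number,
  username?: string,
  password?: string,
): string {
  let auth = '';
  if (username) {
    auth = encodeURIComponent(username);
    if (password) auth += `:${encodeURIComponent(password)}`;
    auth += '@';
  }
  return `${scheme}://${auth}${host}:${port}`;
}

// Matches the shape of the example url given in the option description
console.log(buildProxyUrl('http', 'proxy.com', 80, 'username', 'password'));
// Reserved characters in credentials are escaped
console.log(buildProxyUrl('http', 'proxy.com', 8080, 'user@corp', 'p:ss'));
```

The resulting string would then be passed as the proxy url when creating or dispatching an Agent.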
This method allows you to queue up functions that should be called as soon as a connection can allocate a new Agent. All configurations available to createAgent are available here.
NOTE: you do not need to call close on an Agent when using this method. It will automatically be called when your callback returns.
On Disconnecting: if a Core is shut down or the handler closes a coreConnection while work is still in progress, the agent commands will throw a DisconnectedFromCoreError.
(agent) => Promise. An asynchronous function that will be passed an initialized Agent with the given createAgentOptions configuration.
object. Options used to create a new agent. Takes all options available to createAgent().

const { Handler } = require('secret-agent');

(async () => {
  const handler = new Handler({ maxConcurrency: 2 });

  handler.dispatchAgent(
    async agent => {
      const { url } = agent.input;
      await agent.goto(url);
      const links = await agent.document.querySelectorAll('a');
      for (const link of links) {
        const href = await link.getAttribute('href');
        // queue a follow-up Agent for each link found on the page
        handler.dispatchAgent(
          async agent0 => {
            // read the same key that was passed in via `input` below
            await agent0.goto(agent0.input.href);
            const body = await agent0.document.body.textContent;
          },
          { input: { href } },
        );
      }
    },
    // send in data
    { input: { url: 'https://dataliberationfoundation.org' } },
  );

  // resolves when all dispatched agents are completed or an error occurs
  await handler.waitForAllDispatches();
  await handler.close();
})();
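The href values collected in the example above come straight from getAttribute('href'), so they may be relative paths, missing, or non-web links such as mailto:. One possible way to clean them before dispatching follow-up Agents is sketched below. The toAbsoluteUrls helper is a hypothetical name, not part of SecretAgent; it relies only on the standard URL class.

```typescript
// Hypothetical helper (not a SecretAgent API): resolves hrefs against the page url,
// keeps only http(s) links, drops fragments, and removes duplicates.
function toAbsoluteUrls(hrefs: Array<string | null>, baseUrl: string): string[] {
  const unique = new Set<string>();
  for (const href of hrefs) {
    if (!href) continue; // attribute missing or empty
    let url: URL;
    try {
      url = new URL(href, baseUrl);
    } catch (error) {
      continue; // not a parseable url
    }
    if (url.protocol !== 'http:' && url.protocol !== 'https:') continue;
    url.hash = ''; // '#section' variants point at the same document
    unique.add(url.href);
  }
  return Array.from(unique);
}

const cleaned = toAbsoluteUrls(
  ['/about', 'https://ulixee.org/', null, 'mailto:hello@example.org', '/about#team'],
  'https://dataliberationfoundation.org/',
);
console.log(cleaned);
```

Each entry of the returned array could then be passed as input to a dispatched Agent, which also avoids queueing the same page twice.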
Waits for all agents which have been created using dispatchAgent to complete. If any errors are thrown by Agents, the first exception will be thrown upon awaiting this method.
Returns: Promise<DispatchResult[]>
Each DispatchResult contains:
  string key. The session id assigned to the dispatched Agent.
  string. The name assigned to this session.
  input (any). Any input arguments passed to the dispatched Agent.
  output (any?). The object set to agent.output if no error was thrown.
  Error?. An error if one has been thrown during dispatch.
  CreateAgentOptions. Any arguments passed to the dispatched Agent.

Waits for all agents which have been created using dispatchAgent to complete or throw an error. This method will always wait for all dispatches to finish, regardless of errors thrown. This is different from waitForAllDispatches, which will throw on any dispatch errors.
Returns: Promise<DispatchResult[]>
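The difference between the throwing and the "settled" style of waiting can be illustrated with plain promises. This is a minimal sketch, not the Handler's implementation: Dispatch, SketchResult, settleAll, and throwOnFirstError are hypothetical names, and the sketch assumes the throwing variant rethrows the first recorded error after collecting results.

```typescript
// Hypothetical shapes for illustration; these are not SecretAgent types.
interface Dispatch {
  name: string;
  run: () => Promise<unknown>;
}

interface SketchResult {
  name: string;
  output?: unknown;
  error?: Error;
}

// "Settled" style: always waits for every dispatch and records errors per result.
async function settleAll(dispatches: Dispatch[]): Promise<SketchResult[]> {
  return Promise.all(
    dispatches.map(async (dispatch): Promise<SketchResult> => {
      try {
        return { name: dispatch.name, output: await dispatch.run() };
      } catch (error) {
        return { name: dispatch.name, error: error as Error };
      }
    }),
  );
}

// Throwing style: surfaces the first recorded error instead of returning it.
async function throwOnFirstError(dispatches: Dispatch[]): Promise<SketchResult[]> {
  const results = await settleAll(dispatches);
  const failed = results.find(result => result.error);
  if (failed) throw failed.error;
  return results;
}

const dispatches: Dispatch[] = [
  { name: 'a', run: async () => 1 },
  { name: 'b', run: async () => { throw new Error('navigation failed'); } },
  { name: 'c', run: async () => 3 },
];

async function demo(): Promise<string[]> {
  const log: string[] = [];
  for (const result of await settleAll(dispatches)) {
    log.push(`${result.name}: ${result.error ? 'error' : 'ok'}`);
  }
  try {
    await throwOnFirstError(dispatches);
  } catch (error) {
    log.push(`threw: ${(error as Error).message}`);
  }
  return log;
}

demo().then(log => console.log(log.join('\n')));
```

With the settled style, each result can be inspected for an error; with the throwing style, a single try/catch around the wait is enough when any failure should stop processing.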