Web Scrapers Generator BrowserExt

loadpage

Loads the page at the given url and executes the operators and functions defined in the loadpage body.

[@multi] loadpage(url, [method = 'get', [params = dict(), [encoding = 'UTF-8', [headers = array()]]]]) {
    loadpage body
}
    

Parameters:

url Page address string, or an array of address strings.
method String specifying how the page is loaded. Can be 'get' or 'post'.
params Dictionary of parameters passed when the page is loaded. If method = 'get', the parameters are automatically appended to the url.
encoding Original encoding of the page on the server. Needed to pass parameters correctly in a post request.
headers Array of strings with the headers passed when the page is loaded. For example, cookies can be passed via this parameter.
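
The handling of params and encoding described above can be pictured with a short Python sketch. It is not the tool's implementation: the helper names build_get_url and encode_post_body are hypothetical, and the exact escaping rules the tool applies are an assumption.

```python
from urllib.parse import urlencode

def build_get_url(url, params):
    """Append params to url as a query string (hypothetical helper)."""
    if not params:
        return url
    # Use '&' if the url already carries a query string, otherwise '?'.
    separator = '&' if '?' in url else '?'
    return url + separator + urlencode(params)

def encode_post_body(params, encoding='UTF-8'):
    """Encode post parameters using the page's original encoding (assumed behavior)."""
    return urlencode(params, encoding=encoding)

print(build_get_url('http://site.com/search', {'q': 'abc', 'page': '2'}))
print(encode_post_body({'q': 'café'}, encoding='latin-1'))
print(encode_post_body({'q': 'café'}))
```

The last two calls show why encoding matters for post: the same value produces different bytes on the wire depending on the encoding the server expects.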

If url is an array, loadpage loads the pages one by one and executes the body for each page. Inside the body, the pageparams dictionary with the parameters of the loaded page is available:

pageparams['page'] String containing the page's html code.
pageparams['url'] Url of the loaded page. If the page was loaded with the get method, the url also contains the parameters.
pageparams['effectiveurl'] Final url of the loaded page. It differs from the given url when, for example, the page was redirected. This parameter is not supported when the script is launched from the Editor.
pageparams['domain'] Domain of the loaded page. For example, for the loaded page http://site.com/123.html the domain will be http://site.com/
pageparams['encoding'] Original encoding of the page on the server. A page loaded by loadpage is always converted to UTF-8.
pageparams['headers'] String with the headers passed when the page was loaded. This parameter is not supported when the script is launched from the Script Editor.
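
As an illustration of the pageparams['domain'] value, the following Python sketch derives the domain from a page url in the form shown above. The function name page_domain is hypothetical, and the tool may compute the value differently.

```python
from urllib.parse import urlsplit

def page_domain(url):
    """Return scheme and host of url with a trailing slash (hypothetical helper)."""
    parts = urlsplit(url)
    return f'{parts.scheme}://{parts.netloc}/'

print(page_domain('http://site.com/123.html'))
```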

The loadpage body has local scope: variables defined inside it are visible only within that body, not outside it and not in a nested loadpage body.

If the @multi directive is used and the url parameter is an array of links, the pages are downloaded in parallel.
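
The difference between the default mode and @multi can be sketched in Python. This is only a model under stated assumptions: fake_fetch stands in for a real download, the function names are hypothetical, and how the tool schedules the body in @multi mode is not stated in this reference.

```python
from concurrent.futures import ThreadPoolExecutor

def fake_fetch(url):
    """Stand-in for a page download; returns a fake page without network access."""
    return f'<html>{url}</html>'

def load_sequential(urls, body):
    # Default mode: download one page, run the body, then move to the next url.
    return [body(url, fake_fetch(url)) for url in urls]

def load_parallel(urls, body, workers=4):
    # @multi-like mode: the downloads overlap in a thread pool.
    # pool.map returns results in the same order as the input urls.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = list(pool.map(fake_fetch, urls))
    # In this sketch the body runs after all downloads finish (an assumption).
    return [body(url, page) for url, page in zip(urls, pages)]

def page_length(url, page):
    """Example body: report the size of each loaded page."""
    return (url, len(page))

urls = ['http://site.com/1', 'http://site.com/2', 'http://site.com/3']
print(load_sequential(urls, page_length) == load_parallel(urls, page_length))
```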

To collect data from the loaded page, the get-functions are used (gettext, getlink and others). To add addresses to the download queue, use the continue function. To generate an array of page addresses from a pattern, you can use the generateurl function.
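
The idea of a download queue that the body keeps feeding can be sketched in Python. The SITE data, the crawl function and the de-duplication of addresses are assumptions made for illustration; they do not describe how the tool manages its queue internally.

```python
from collections import deque

# Fake site: each url maps to the links found on that page (illustrative data).
SITE = {
    'http://site.com/': ['http://site.com/a', 'http://site.com/b'],
    'http://site.com/a': ['http://site.com/b'],
    'http://site.com/b': [],
}

def crawl(start_url):
    queue = deque([start_url])  # the download queue
    seen = {start_url}          # skipping repeated addresses is an assumption
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)
        # Reading the links plays the role of a get-function such as getlink.
        for link in SITE.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)  # plays the role of continue
    return visited

print(crawl('http://site.com/'))
```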

Example 1. Simple page loading and acquisition of all the links:

Example 2. Page loading with parameters:

A parameter can be passed to the loadpage body: specify a value or a variable in square brackets after the parameter list. Its value is available in the loadpage body as the variable passparams. A parameter can also be passed through the second parameter of the continue function.
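
The way a passed value reaches the body can be modeled in Python as follows. This is a sketch of the concept only: load_page and category_body are hypothetical names, the page is faked, and the sketch does not reproduce the square-bracket syntax of the tool.

```python
def load_page(url, body, passparams=None):
    """Hypothetical stand-in for loadpage: 'loads' url and runs body with passparams."""
    page = f'<html>{url}</html>'  # fake page, no network access
    return body(page, passparams)

def category_body(page, passparams):
    # The passed value is visible inside the body under the name passparams.
    return {'category': passparams, 'size': len(page)}

result = load_page('http://site.com/shoes', category_body, passparams='shoes')
print(result['category'])
```

Passing the value explicitly keeps the body independent of outer variables, which fits the local scope of the loadpage body.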

Example 3. Passing parameter to the loadpage body: