You can switch to browser mode by clicking the "Browser" tab on the right side of the Editor.
To load a page, type its address in the address field and
press Enter. Scraping rules for the loaded page can be set up
with the "Text", "Image",
"Link", "HTML",
and "RegExp" buttons. Use these buttons to select the relevant elements of the page.
By default, clicking a link in the loaded page does not load the linked page.
To follow a link, click the "Load Link" button and then click the link.
Attention! Some websites do not allow being displayed in a frame. Such websites cannot
be loaded in the built-in browser; they can be processed only with a script.
An example is the yandex.ru website and its subpages.
Let's take a detailed look at the buttons:
- The "Text" button creates a Text rule and is used to extract text elements of the page,
for example, the anchor text of a link. It is similar to the
gettext function in the script.
- The "Image" button creates an Image rule and is used to get the link to
an image or the image itself. It is similar to the
getimglink function in the script.
- The "Select Link" button creates a Link rule and is used to get the URL of a link.
It is similar to the getlink function in the script.
- The "HTML"
button creates an HTML rule and is used to get the HTML code of a document
element. It can also be used to reach elements inside this
HTML code: click "Load" to load the HTML code on a separate page and create rules
there. It is similar to the gethtml function
in the script.
- The "RegExp" button creates a regular expression rule and is used
to extract text with a regular expression. It is similar to the
getregexp function in the script.
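To make the difference between the rule types concrete, here is a minimal Python sketch of what a Text, Link or Image rule extracts from an element matched by a path. It uses the standard `xml.etree.ElementTree` module on a small well-formed fragment, so it supports only ElementTree's limited XPath subset rather than the full XPath the tool stores. The helpers `text_rule`, `link_rule` and `image_rule` are hypothetical illustrations, not the tool's gettext, getlink and getimglink functions.

```python
import xml.etree.ElementTree as ET

# A tiny well-formed page fragment; real pages usually need a lenient HTML parser.
PAGE = """<html><body>
<div class="item"><a href="/p/1">First product</a><img src="/img/1.png"/></div>
</body></html>"""

root = ET.fromstring(PAGE)

def text_rule(root, path):
    """Text rule analogue: all text inside the matched element."""
    el = root.find(path)
    return "".join(el.itertext()) if el is not None else None

def link_rule(root, path):
    """Link rule analogue: the href attribute of the matched <a> element."""
    el = root.find(path)
    return el.get("href") if el is not None else None

def image_rule(root, path):
    """Image rule analogue: the src attribute of the matched <img> element."""
    el = root.find(path)
    return el.get("src") if el is not None else None

print(text_rule(root, ".//div/a"))     # First product
print(link_rule(root, ".//div/a"))     # /p/1
print(image_rule(root, ".//div/img"))  # /img/1.png
```

The explicit `is not None` checks matter: an ElementTree element without children is falsy, so `if el:` would wrongly treat a matched leaf element as "not found".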
When one of the buttons listed above is active and you click an element of the loaded page,
a window appears where you can type the rule name.
The selected element of the page is highlighted with a red dashed line, and
the corresponding rule, with its name, type, and XPath, appears in the table below the page.
When you hover the mouse cursor over a rule in the table,
the corresponding element is outlined with a green dashed line in the browser.
For a RegExp rule, a special window shows the HTML code of the page, with fields to enter
the rule name, the regular expression, and the number of the group to return.
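The group number follows the usual regular expression convention: group 0 is the whole match, and groups 1 and up are the parenthesized parts of the pattern. The sketch below uses Python's standard `re` module and assumes the rule returns the chosen group for every match in the page; `regexp_rule` is a hypothetical helper, and the tool's own regular expression dialect may differ in details.

```python
import re

def regexp_rule(html, pattern, group=1):
    """Return the given capture group from every match of pattern in html."""
    return [m.group(group) for m in re.finditer(pattern, html)]

html = '<span class="price">12.50</span><span class="price">7.99</span>'
print(regexp_rule(html, r'<span class="price">([\d.]+)</span>', group=1))
# ['12.50', '7.99']
```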
There are several buttons above the table at the bottom of the page.
- To merge rules, tick two rules and press "Merge". Merging compares the XPath expressions
of the two rules and builds a generalized path. This is needed to create
one rule for all elements of a list: first create a rule
for the first element of the list, then one for the second, then merge the two rules;
as a result, all elements of the list are selected.
- Use the "Delete" button to delete the selected rules.
- The "Group Filter" drop-down box is used to filter rules
by group.
- Use "Export rule" to open a window for managing the export rules
of the page.
- Use "Extracted data" to open a window showing the data
extracted from the page, divided into groups.
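The manual does not spell out the merge algorithm. A plausible minimal sketch, assuming that merging keeps the steps the two paths share and drops the positional predicate from steps that name the same tag with different positions, is shown below in Python; `merge_xpaths` is a hypothetical helper, not the tool's implementation.

```python
import re

# One XPath step: a node test such as "div", optionally followed by a predicate "[2]".
STEP = re.compile(r"^([^\[]+)(\[.*\])?$")

def merge_xpaths(a: str, b: str) -> str:
    """Generalize two XPath expressions that differ only in step predicates.

    Simplification: splitting on "/" breaks for predicates that contain "/",
    e.g. [@href='/x'].
    """
    steps_a, steps_b = a.split("/"), b.split("/")
    if len(steps_a) != len(steps_b):
        raise ValueError("paths have different depths")
    merged = []
    for sa, sb in zip(steps_a, steps_b):
        if sa == sb:
            merged.append(sa)  # identical step, keep as-is
            continue
        ma, mb = STEP.match(sa), STEP.match(sb)
        if not (ma and mb) or ma.group(1) != mb.group(1):
            raise ValueError(f"steps {sa!r} and {sb!r} name different nodes")
        merged.append(ma.group(1))  # same tag, different predicate: drop it
    return "/".join(merged)

print(merge_xpaths("/html/body/div[1]/a", "/html/body/div[2]/a"))
# /html/body/div/a
```

Dropping the predicate is what turns "the link in the first div" and "the link in the second div" into "the link in every div", which is why two sample elements are enough to select a whole list.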
Every rule in the table has an "Actions" button, which opens a drop-down list of the
options available for that rule.
- The "Edit XPath" action lets you change the XPath of the rule: you can select other elements
in the browser, but the rule type will not change.
- The "Parameters" action opens a window with the settings for the rule.
Each rule type has its own set of settings; they are described below.
- The "Load" action is available for Link and Form rules. If you choose this action,
the page at the link address, or the page returned by the form, is loaded. You can set up rules
on the newly opened page too. To go back, press the "Back" button above the page,
next to the page address.
- The "Grouping" action opens a window where you can change the group of the rule.
By default, all rules belong to group1.
- The "Filter" action opens a window where you can set data filtering for the rule.
For a description of the filter types, refer to the function
The results of filtering can be viewed in the "Extracted data" window; rules with a filter
have an asterisk in their names.
Each rule type has its own set of parameters. Let's look at the full list of parameters:
- Just text node. The rule returns only the text nodes that belong directly to the selected
node. Otherwise, the text of all nested elements
is returned.
- Next element. The rule selects the element that follows the element
matched by the XPath. The Type field sets the type of the new rule.
For example, you can first select some text and then choose the next element
with the Link type.
- Select the word. Only particular words are taken
from the text, from word k to word n. Counting starts from 1.
- Concatenate strings into one. Concatenate all the elements
of the rule into one string using the delimiter specified in the input field.
- Replace characters. The first field is a regular expression that specifies
the characters to search for; the second is the replacement string.
- Save the file specified with link. The file
the link points to will be saved. Similar to
- Add to download queue. The link will be added
to the download queue. Similar to the continue function.
- Save the image file. The image the link points to
will be saved. Similar to the storefile function.
- Returns only the contents (innerHTML). The rule returns
the HTML code without the enclosing top-level tag, only its contents.
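Several of the parameters above can be illustrated with Python's standard `re` and `xml.etree.ElementTree` modules. This is a minimal sketch of the described behavior, not the tool's code: all function names are hypothetical, "Select the word" is assumed to split on whitespace and keep words k to n inclusive, and "Next element" is assumed to mean the next sibling element.

```python
import re
import xml.etree.ElementTree as ET

def direct_text(el):
    """Just text node: only text nodes that are direct children of el."""
    return "".join([el.text or ""] + [child.tail or "" for child in el])

def all_text(el):
    """Default behavior: text of el and all of its nested elements."""
    return "".join(el.itertext())

def next_element(root, el):
    """Next element: the sibling element that directly follows el, or None."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    parent = parent_of.get(el)
    if parent is None:  # el is the root, so it has no siblings
        return None
    siblings = list(parent)
    i = siblings.index(el)
    return siblings[i + 1] if i + 1 < len(siblings) else None

def select_words(text, k, n):
    """Select the word: words k..n inclusive, counting from 1."""
    return " ".join(text.split()[k - 1:n])

def concatenate(values, delimiter):
    """Concatenate strings into one, using the given delimiter."""
    return delimiter.join(values)

def replace_chars(text, pattern, replacement):
    """Replace characters: regex in the first field, replacement in the second."""
    return re.sub(pattern, replacement, text)

def inner_html(el):
    """innerHTML: the markup inside el, without el's own tag."""
    inner = el.text or ""
    for child in el:
        inner += ET.tostring(child, encoding="unicode")  # includes child.tail
    return inner

p = ET.fromstring("<p>Price: <b>12</b> USD<i>net</i></p>")
print(repr(direct_text(p)))        # 'Price:  USD'
print(all_text(p))                 # Price: 12 USDnet
print(inner_html(p))               # Price: <b>12</b> USD<i>net</i>
print(next_element(p, p[0]).tag)   # i
print(select_words("one two three four", 2, 3))  # two three
print(concatenate(["a", "b"], "; "))              # a; b
print(replace_chars("12,50 USD", r"[^\d,]", ""))  # 12,50
```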
Export Rule For The Page
Above the table of rules at the bottom of the page there is an
"Export rule" button. Pressing it opens a window for setting up
the export rules of the page. The window contains a table of
export rules. Each rule corresponds to a call of the store function
in the script. You can add new rules and edit, delete, or move existing rules up and down.
Each export rule is based on an export profile (the Export
tab in the Editor): values of variables or scraping rules are assigned
to the profile's parameters. You also set more general parameters, such as the name of the file
to save to and the name of the variable that will store the result.
When you click "Add" or "Edit", a new window with
the export rule parameters opens. Select the export profile name in the
"Select the profile" drop-down box. The parameters table then lists
the parameters of the selected export profile.
For csv profiles, you must add the parameters and move them into the required
order. For other profiles, the list of parameters is already formed, and you only need
to set each parameter's value by selecting a scraping rule or a variable from the list.
You can also define a constant (a string must be enclosed in quotation marks).
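As an analogy for why parameter order matters in a csv profile, the sketch below writes one CSV row with Python's standard `csv` module, taking values in the order of a column list. The value dictionary (standing in for scraping rules and a constant) and the helper `export_csv_row` are hypothetical; the tool's actual export format and quoting rules may differ.

```python
import csv
import io

# Hypothetical values: two from scraping rules, one constant ("shop").
values = {"name": "First product", "price": "12.50", "source": "shop"}

# The order of the parameters determines the order of the CSV columns.
columns = ["price", "name", "source"]

def export_csv_row(values, columns):
    """Return one CSV line with the values arranged in column order."""
    buf = io.StringIO()
    csv.writer(buf).writerow([values[c] for c in columns])
    return buf.getvalue()  # csv.writer ends rows with "\r\n" by default

print(export_csv_row(values, columns), end="")  # 12.50,First product,shop
```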
You must also specify the file name to save to, without the extension (for RDB
profiles this is not required, because the data is stored in a database rather than in a file).
You can also set a variable in which the result will be stored. If the data must be appended
to an array, specify the append-to-end-of-array operator,
and use the @global directive to make the variable global. For instance, @global prod.
Also, for RDB profiles, clicking the name of a parameter
adds an asterisk, which means that this parameter will be updated if the record
is found in the database. To remove the asterisk, click the name
of the parameter again.
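The asterisk behaves like a selective "update if found, otherwise insert". The manual does not say how a record is found, so the sketch below assumes a hypothetical key column `url`; it uses Python's standard `sqlite3` module with an in-memory database, and `store_record` is a hypothetical helper, not the tool's storage code.

```python
import sqlite3

def store_record(conn, table, record, key, update_fields):
    """Insert record, or update only update_fields if a row with the same key exists.

    Table and column names are interpolated into SQL, so they must be trusted
    identifiers; values are always passed as parameters.
    """
    found = conn.execute(f"SELECT 1 FROM {table} WHERE {key} = ?", (record[key],)).fetchone()
    if found:
        if update_fields:  # only "starred" parameters are updated
            sets = ", ".join(f"{f} = ?" for f in update_fields)
            conn.execute(
                f"UPDATE {table} SET {sets} WHERE {key} = ?",
                [record[f] for f in update_fields] + [record[key]],
            )
    else:
        cols = ", ".join(record)
        marks = ", ".join("?" for _ in record)
        conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})", list(record.values()))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (url TEXT PRIMARY KEY, name TEXT, price TEXT)")
store_record(conn, "products", {"url": "/p/1", "name": "Old", "price": "10"}, "url", ["price"])
store_record(conn, "products", {"url": "/p/1", "name": "New", "price": "12"}, "url", ["price"])
print(conn.execute("SELECT name, price FROM products").fetchall())  # [('Old', '12')]
```

Only `price` is marked for update here, so the second call changes the price of the existing record but keeps its original name.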
If you fill in a form on the loaded page and press Enter, a Form rule with the form's XPath
is generated automatically, and the page returned by the form is loaded.
You can go "Back" and submit the form again with the same settings by choosing "Load"
in the Actions menu.