Browser interaction with WebDriver
In the previous chapter, we scraped data from a page whose content is created by JavaScript. In this chapter, we go a step further and show how to interact with WebDriver to automate the browser.
Browser Scrolling and Ajax
Quote Example 2 scrapes quotations from the "Scroll - infinite scrolling pagination" page, which loads the next set of quotes through an Ajax call when we scroll to the bottom of the browser window.
The locator and task snippets from this example are shown below.
defs/examples/quote/jsoup/ex-1/job.yml
locatorGroups:
  quoteGroup:
    locators: [
      { name: quotes, url: "http://quotes.toscrape.com/scroll" }
    ]
taskGroups:
  quoteGroup:
    quoteTask:
      dataDef: quote
      steps:
        jsoupDefault:
          loader:
            class: "org.codetab.scoopi.step.extract.DomLoader"
            previous: seeder
            next: parser
            plugins: [
              plugin: {
                name: script,
                class: "org.codetab.scoopi.plugin.script.BasicScript",
                script: "/defs/examples/quote/jsoup/ex-2/script.js",
                entryPoint: "execute", }
            ]
The quoteTask uses the jsoupDefault steps and overrides the loader step to
use the DomLoader class, as we saw in the last chapter. In this example,
however, we also add a script plugin that calls the execute function defined
in the file defs/examples/quote/jsoup/ex-2/script.js.
This script contains the Selenium WebDriver calls and JavaScript code that scroll the window, as shown below.
defs/examples/quote/jsoup/ex-2/script.js
function execute(webDriver) {
    var Select = Java.type('org.openqa.selenium.support.ui.Select');
    var By = Java.type('org.openqa.selenium.By');
    var pagesToScroll = 4;
    while (true) {
        // scroll
        webDriver
            .executeScript("window.scrollTo(0, document.body.scrollHeight)");
        // wait
        webDriver
            .executeAsyncScript("window.setTimeout(arguments[arguments.length - 1], 500);");
        var eles = webDriver.findElements(By.className("quote"));
        if (eles.size() >= pagesToScroll * 10) {
            break;
        }
    }
}
When the DomLoader step calls the script's execute() function, it passes in the WebDriver instance. Inside the while loop, the WebDriver method executeScript() scrolls the browser window to the bottom, which triggers the page's Ajax call to fetch the next page of quotes, and executeAsyncScript() then waits 500 ms to give the DOM time to update. After that, findElements() returns the list of HTML elements with the class name quote. The loop breaks once the list holds at least 40 elements, that is, pagesToScroll * 10.
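Note that the while loop in script.js has no upper bound, so it keeps scrolling forever if the page stops delivering quotes before the target count is reached. The following is a minimal sketch of a bounded variant, not part of the Scoopi example: the helper name scrollUntil and the parameters targetCount and maxScrolls are hypothetical, and the limit of 20 scrolls is an arbitrary assumption.

```javascript
// Hypothetical helper (not part of Scoopi): scrolls until at least
// targetCount elements match the locator, or maxScrolls attempts are used up.
// Returns the number of matching elements found on the last attempt.
function scrollUntil(webDriver, locator, targetCount, maxScrolls) {
    var count = 0;
    for (var i = 0; i < maxScrolls; i++) {
        // scroll to the bottom, which triggers the Ajax call for the next page
        webDriver.executeScript("window.scrollTo(0, document.body.scrollHeight)");
        // wait 500 ms for the DOM to update
        webDriver.executeAsyncScript(
            "window.setTimeout(arguments[arguments.length - 1], 500);");
        count = webDriver.findElements(locator).size();
        if (count >= targetCount) {
            break;
        }
    }
    return count;
}

// Entry point called by the script plugin, as in the original script.js.
function execute(webDriver) {
    var By = Java.type('org.openqa.selenium.By');
    var pagesToScroll = 4;
    // assumption: give up after 20 scroll attempts
    scrollUntil(webDriver, By.className("quote"), pagesToScroll * 10, 20);
}
```

Passing the locator in as an argument keeps the helper independent of Java.type(), so the same function could be reused for other class names or pages.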
The script engine can call any Java method on the WebDriver instance directly. To use any other Selenium class, such as Select or By, we first need to map that class to a JavaScript variable with Java.type(), as is done at the start of the script.
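As an illustration of why a class such as Select has to be mapped before use, the sketch below picks an option from a drop-down. It is not taken from the Scoopi examples: the helper name selectOption, the element id "author" and the option text "Albert Einstein" are hypothetical placeholders. The Select constructor and its selectByVisibleText() method are part of Selenium's support.ui package.

```javascript
// Hypothetical helper (not part of Scoopi): wraps the element with the given
// id in a Selenium Select and chooses the option with the given visible text.
// The Select and By classes are passed in after being mapped with Java.type().
function selectOption(webDriver, Select, By, elementId, visibleText) {
    var dropdown = new Select(webDriver.findElement(By.id(elementId)));
    dropdown.selectByVisibleText(visibleText);
    return dropdown;
}

// Entry point called by the script plugin.
function execute(webDriver) {
    // map the Selenium classes to JavaScript variables
    var Select = Java.type('org.openqa.selenium.support.ui.Select');
    var By = Java.type('org.openqa.selenium.By');
    // assumption: the page has a <select id="author"> with this option
    selectOption(webDriver, Select, By, "author", "Albert Einstein");
}
```

Without the two Java.type() calls, the names Select and By would be undefined in the script, while methods on the webDriver argument remain callable because it is already a Java object.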
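WebDriver has an API that is easy to explore and understand for selecting elements, navigating pages and executing page scripts. Refer to the Selenium WebDriver Documentation (https://www.seleniumhq.org/docs/03_webdriver.jsp#introducing-the-selenium-webdriver-api-by-example) to learn more about it.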
In the next section, we explain features to manage Scoopi.