Filters
The data extracted in the previous example contains many junk items; filters remove them from the output.
A filter is normally applied after parsing is complete and the data has been created. Example 5 applies filters to items/item. The filter snippet is shown below.
defs/examples/fin/jsoup/ex-5/job.yml
dataDefs:
  bs:
    query:
      block: "table:contains(Sources Of Funds)"
      selector: "tr:nth-child(%{item.index}) > td:nth-child(%{dim.year.index})"
    items:
      - item:
          name: item
          selector: "tr:nth-child(%{index}) > td:nth-child(1)"
          index: 5
          breakAfter:
            - "Book Value (Rs)"
          filters:
            - filter: { type: value, pattern: "" }
            - filter: { type: value, pattern: "Sources Of Funds" }
            - filter: { type: value, pattern: "Application Of Funds" }
    dims:
      - item:
          name: year
          selector: "tr:nth-child(1) > td:nth-child(%{index})"
          indexRange: 2-6
The filter definition removes the members whose item axis value is:
- blank
- null
- Sources Of Funds
- Application Of Funds
Because the filter specifies type: value, the pattern is applied to the axis's value field. The type property can be value or match; with type: match, the pattern is compared with the axis's match field. The pattern property can be plain text for a simple comparison or a regex for complex pattern matching.
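The rules above can be modelled in a few lines of Python. This is only an illustrative sketch, not Scoopi's implementation: the `Axis` and `Filter` classes and the `filter_matches` function are hypothetical names, and it assumes that a null field is treated like a blank one and that a regex pattern must match the whole field.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Axis:
    """Hypothetical stand-in for an axis with its value and match fields."""
    name: str
    value: Optional[str] = None
    match: Optional[str] = None


@dataclass
class Filter:
    """Hypothetical stand-in for a filter definition."""
    type: str     # "value" or "match"
    pattern: str  # plain text or regex


def filter_matches(flt: Filter, axis: Axis) -> bool:
    # The filter type selects which axis field the pattern is compared with.
    if flt.type == "value":
        field = axis.value
    elif flt.type == "match":
        field = axis.match
    else:
        raise ValueError(f"unknown filter type: {flt.type}")

    # Assumption: a null field is treated like a blank one.
    text = field or ""

    # Plain-text comparison first, so text such as "Book Value (Rs)"
    # matches literally even though it is not a useful regex.
    if text == flt.pattern:
        return True

    # Otherwise treat the pattern as a regex (assumed: whole-field match).
    try:
        return re.fullmatch(flt.pattern, text) is not None
    except re.error:
        return False
```

For instance, `filter_matches(Filter("value", ""), Axis("item"))` is true for a null value, while a filter with pattern "Sources Of Funds" leaves an item axis whose value is "Equity" untouched.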
When a filter matches an axis, the enclosing data item is removed from the output. For example, when the filter for the dim axis (year) is
dims:
  - item:
      name: year
      ...
      filters: [
        filter: { type: value, pattern: "Dec 16" },
      ]
and the axes of a data item are
dim : Dec 16
item : Equity
fact : 20.00
Then the dim axis matches the pattern, so the whole data item is removed from the data even though no filter is specified for the other two axes.
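The removal rule can be sketched in Python as follows. This is a simplified model rather than Scoopi code: `apply_filters` is a hypothetical helper, a data item is represented as a plain dict keyed by axis name, only plain-text patterns are compared, and the second data item is made up purely for illustration.

```python
from typing import Dict, List

DataItem = Dict[str, str]


def apply_filters(items: List[DataItem],
                  filters: Dict[str, List[str]]) -> List[DataItem]:
    """Drop every data item in which any filtered axis equals a pattern.

    `filters` maps an axis name to the plain-text patterns defined for it.
    """
    kept = []
    for item in items:
        # One matching axis is enough to remove the whole data item.
        removed = any(
            (item.get(axis) or "") == pattern
            for axis, patterns in filters.items()
            for pattern in patterns
        )
        if not removed:
            kept.append(item)
    return kept


data = [
    {"dim": "Dec 16", "item": "Equity", "fact": "20.00"},
    # Hypothetical second item, added only to show what survives.
    {"dim": "Dec 15", "item": "Equity", "fact": "18.00"},
]

# Only the dim axis has a filter; item and fact have none.
result = apply_filters(data, {"dim": ["Dec 16"]})
```

Here `result` holds only the made-up "Dec 15" item: the "Dec 16" item is dropped as a whole, including its item and fact axes.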
The next chapter shows how to find the Selector or XPath with the Google Chrome or Firefox browsers, or with the Scoopi Query Analyzer tool.