Technology

The crawler and source library are fully accessible through our API (web service) and require no local installation. The user interface can be managed without technical expertise.

CyberWatcher has developed several crawlers for retrieving information, each designed to meet specific requirements. Via an online dashboard, the editor can adjust retrieval according to the type of source, the link structure of the web site, and the content elements on each web site. The patented content-element crawler filters out irrelevant content and spam.
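CyberWatcher's content-element crawler is patented and its method is not described here. To illustrate the general idea of separating article text from menus and link lists, the sketch below uses a different, widely known heuristic instead: link density. The page is split into text blocks, and blocks that are very short or consist mostly of link text are discarded. The names `BlockCollector` and `extract_core_text` and the thresholds `max_link_density=0.3` and `min_chars=40` are hypothetical choices for this example.

```python
from html.parser import HTMLParser


class BlockCollector(HTMLParser):
    """Splits a page into text blocks and counts how much of each block is link text."""

    BLOCK_TAGS = {"p", "div", "li", "td", "article", "section", "nav", "header", "footer", "aside"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.blocks = []      # (text, non-space chars, non-space chars inside <a>)
        self._parts = []
        self._link_chars = 0
        self._in_link = 0
        self._skip = 0

    def flush(self):
        # Close the current block at every block-level boundary.
        text = " ".join("".join(self._parts).split())
        if text:
            self.blocks.append((text, len(text.replace(" ", "")), self._link_chars))
        self._parts = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip += 1
        elif tag in self.BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self._in_link += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip:
            self._skip -= 1
        elif tag in self.BLOCK_TAGS:
            self.flush()
        elif tag == "a" and self._in_link:
            self._in_link -= 1

    def handle_data(self, data):
        if self._skip:          # ignore script and style contents
            return
        self._parts.append(data)
        if self._in_link:
            self._link_chars += len("".join(data.split()))


def extract_core_text(html, max_link_density=0.3, min_chars=40):
    """Keep only blocks that are long enough and not dominated by links."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    collector.flush()           # emit any trailing text after the last tag
    kept = [text for text, total, links in collector.blocks
            if total >= min_chars and links / total <= max_link_density]
    return "\n".join(kept)
```

A real crawler would combine many more signals (site-specific layout rules, spam detection, per-site configuration from the dashboard); this sketch only shows why navigation bars and link lists are easy to recognize and drop.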

When web sites are added to the source library, the source editors attach metadata to each source to keep the library well structured. This helps ensure consistently high-quality input.

On the user side, this means that users can select sources based on any of the included metadata parameters, giving them an initial filter on the content.
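The metadata-based selection described above can be sketched as a simple filter over source records. This is a minimal illustration, not the CyberWatcher API: the `Source` record, the `select_sources` function and the metadata keys `type`, `country` and `language` are all hypothetical, since the actual metadata parameters are not listed here.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    url: str
    # Hypothetical metadata keys; the real library's parameters are not listed.
    metadata: dict = field(default_factory=dict)


def _matches(value, wanted):
    # A set/list/tuple means "any of these values"; anything else must match exactly.
    if isinstance(wanted, (set, frozenset, list, tuple)):
        return value in wanted
    return value == wanted


def select_sources(sources, **criteria):
    """Return the sources whose metadata satisfies every criterion."""
    return [s for s in sources
            if all(_matches(s.metadata.get(key), wanted)
                   for key, wanted in criteria.items())]


library = [
    Source("https://news.example.no", {"type": "newspaper", "country": "NO", "language": "no"}),
    Source("https://blog.example.se", {"type": "blog", "country": "SE", "language": "sv"}),
    Source("https://news.example.se", {"type": "newspaper", "country": "SE", "language": "sv"}),
]

nordic_papers = select_sources(library, type="newspaper", country={"SE", "DK"})
print([s.url for s in nordic_papers])  # ['https://news.example.se']
```

Because every source carries the same structured metadata, an initial selection like this is a cheap lookup that can run before any content is retrieved or indexed.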

CyberWatcher also offers single-source monitoring, which lets the user receive content from only one specific web site, or even from a specific section of a web site.
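One simple way to restrict monitoring to a single site or site section is to keep only URLs on the same host whose path lies at or below the section's path. The sketch below assumes that scoping by URL prefix is sufficient, which may not match how CyberWatcher defines a section; `in_scope` and `monitor` are hypothetical names, and the scheme (http vs https) is deliberately ignored.

```python
from urllib.parse import urlsplit


def in_scope(url, scope):
    """True if url is on the scope's host and at or below the scope's path."""
    u, s = urlsplit(url), urlsplit(scope)
    if u.hostname != s.hostname:     # hostname is lower-cased by urlsplit
        return False
    prefix = s.path.rstrip("/")
    # Match whole path segments, so "/sport" does not match "/sports-betting".
    return u.path == prefix or u.path.startswith(prefix + "/")


def monitor(urls, scope):
    """Keep only the discovered URLs that fall inside the monitored site or section."""
    return [url for url in urls if in_scope(url, scope)]


found = [
    "https://www.example.com/sport/football/123",
    "https://www.example.com/sports-betting/ad",
    "https://www.example.com/culture/456",
    "https://cdn.example.net/sport/789",
]
print(monitor(found, "https://www.example.com/sport/"))
# ['https://www.example.com/sport/football/123']
```

Passing the site root (for example `https://www.example.com`) as the scope gives whole-site monitoring with the same function.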

Clean & Enriched XML

Our patented content crawler filters out spam, external feeds, menu elements and any other irrelevant content presented on the same page as the article, so that only the core content of the article is extracted and indexed. Each content item is further enriched with its headline, abstract, body text, publish date and author.

Combined, these steps produce output in a clean, enriched XML format, with the best quality available among crawler feed providers today.
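The actual XML schema is not reproduced in this text, so the sketch below only illustrates the kind of enriched item described above: one element per field (headline, abstract, body text, publish date, author). The element names `item`, `headline`, `abstract`, `body`, `published`, `author`, the `url` attribute, and the `build_item` function are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def build_item(url, headline, abstract, body, published, author):
    """Serialize one crawled article as an XML <item> (hypothetical element names)."""
    item = ET.Element("item", {"url": url})
    fields = (
        ("headline", headline),
        ("abstract", abstract),
        ("body", body),
        ("published", published.isoformat()),  # ISO 8601 timestamp
        ("author", author),
    )
    for tag, text in fields:
        ET.SubElement(item, tag).text = text
    return ET.tostring(item, encoding="unicode")


item_xml = build_item(
    url="https://www.example.com/news/harbour-plan",
    headline="Council approves harbour plan & budget",
    abstract="The plan passed after a long debate.",
    body="The city council approved the new harbour plan on Tuesday.",
    published=datetime(2024, 1, 15, 8, 30, tzinfo=timezone.utc),
    author="Jane Doe",
)
print(item_xml)
```

Building the document with `ElementTree` rather than string concatenation means characters such as `&` and `<` in headlines or body text are escaped automatically, so every item stays well-formed XML.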

Example of XML Metadata

All photographs by Kai Fischer