Online press monitor
Summary
This customizable application lets users receive news updates matching given keywords from a list or range of public news portals, government sites, competitors’ sites, or even subscription-based sites. Alerts are then sent, with the relevant context and links, to the specified e-mail address, newsgroup, or mailing list. The system can replace traditional manual daily press monitoring, and a well-managed press monitoring process can give a company a definite competitive advantage. The application has been deployed by a U.S. investment bank and a Hungarian ministry.
Technical overview
The Online Press Monitoring system is written entirely in Java. Its main components are:
- Retrieval Engine
- Web Monitor
- Conventional Search
- Management Interface
Retrieval Engine
This engine visits the sites defined on the Management Interface at a set time, crawling them recursively in multi-threaded mode and looking for pages it has not visited before. The program continuously maintains a URL pool from which the next page to download is chosen. Which URL is chosen is governed by a Retrieval Definition: a set of Retrieval Rules configured and grouped on the Management Interface.
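The URL pool and multi-threaded retrieval described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the product's code: the class name `UrlPoolCrawler`, the injected `fetchLinks` function (which would download a page and return its links), and the pending-task counter used to detect the end of the crawl are all hypothetical. A concurrent set ensures each URL enters the pool only once, and a fixed thread pool provides the multi-threaded mode.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical, single-use URL pool: each URL is downloaded at most once by a fixed number of worker threads. */
final class UrlPoolCrawler {
    private final Set<String> seen = ConcurrentHashMap.newKeySet(); // every URL ever added to the pool
    private final AtomicInteger pending = new AtomicInteger();       // URLs queued or still being processed
    private final CountDownLatch finished = new CountDownLatch(1);
    private final ExecutorService workers;
    private final Function<String, List<String>> fetchLinks;         // assumed: downloads a page, returns its links

    UrlPoolCrawler(int threads, Function<String, List<String>> fetchLinks) {
        this.workers = Executors.newFixedThreadPool(threads);
        this.fetchLinks = fetchLinks;
    }

    /** Adds a URL only if it has never been seen, so cycles between pages are not followed again. */
    private void offer(String url) {
        if (seen.add(url)) {
            pending.incrementAndGet();
            workers.execute(() -> visit(url));
        }
    }

    private void visit(String url) {
        try {
            for (String link : fetchLinks.apply(url)) {
                offer(link); // children are counted before this page is marked as done
            }
        } finally {
            taskDone();
        }
    }

    private void taskDone() {
        if (pending.decrementAndGet() == 0) {
            finished.countDown(); // nothing queued and nothing running: the crawl is complete
        }
    }

    /** Crawls from the seed URLs and returns every URL that was visited. */
    Set<String> crawl(List<String> seeds) {
        pending.incrementAndGet(); // guard: keeps the counter above zero while seeds are being added
        seeds.forEach(this::offer);
        taskDone();
        try {
            finished.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            workers.shutdown();
        }
        return Set.copyOf(seen);
    }
}
```

Passing a `fetchLinks` function backed by an in-memory map of links lets the sketch run without network access; in a real deployment it would wrap the HTTP download and link extraction.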
Retrieval Rules help users to:
- Avoid recursive references (which can otherwise trap the crawler in an endless loop)
- Avoid download of unnecessary pages
- Filter after session
- Filter at the first level only (for example, the main page and the pages directly reachable from it); this is especially useful for press monitoring, to avoid following cross-references
- Extend rules and filters
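One way to model a Retrieval Definition is as a named group of predicates that must all accept a link before it enters the URL pool. The sketch below is hypothetical: `Candidate`, `RetrievalDefinition`, and `RetrievalRules` are illustrative names, and the sample rules cover only the first-level filter, skipping unnecessary downloads by file extension, and a same-host restriction (an assumed extra rule). Because each rule is an ordinary `Predicate<Candidate>`, new rules can be added without changing the engine, which is one way to support extending rules and filters.

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

/** A discovered link and its distance from the site's main page (hypothetical model; main page = depth 0). */
record Candidate(URI url, int depth) {}

/** Hypothetical Retrieval Definition: a named group of rules that must all accept a link. */
record RetrievalDefinition(String name, List<Predicate<Candidate>> rules) {
    boolean accepts(Candidate c) {
        return rules.stream().allMatch(rule -> rule.test(c));
    }
}

/** Sample Retrieval Rules; further rules are just additional predicates. */
final class RetrievalRules {
    private RetrievalRules() {}

    /** Only the main page (depth 0) and pages linked directly from it (depth 1). */
    static Predicate<Candidate> firstLevelOnly() {
        return c -> c.depth() <= 1;
    }

    /** Stay on the monitored host, so links to other sites are not followed (assumed rule). */
    static Predicate<Candidate> sameHost(String host) {
        return c -> host.equalsIgnoreCase(c.url().getHost());
    }

    /** Skip resources not worth downloading, such as images or archives, by file extension. */
    static Predicate<Candidate> skipExtensions(String... extensions) {
        return c -> {
            String path = c.url().getPath() == null ? "" : c.url().getPath().toLowerCase(Locale.ROOT);
            for (String ext : extensions) {
                if (path.endsWith("." + ext.toLowerCase(Locale.ROOT))) {
                    return false;
                }
            }
            return true;
        };
    }
}
```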
Downloaded pages are cleaned and filtered by the retriever, and a checksum is generated for each one; a page is saved only if no duplicate already exists in the database.
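The checksum-based duplicate check could look like the following minimal sketch. It assumes SHA-256 as the checksum algorithm and whitespace normalization as the cleaning step, and an in-memory set stands in for the database; the system description does not specify any of these details, and `PageStore` is a hypothetical name. `HexFormat` requires Java 17 or later.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.HexFormat;
import java.util.Set;

/** Hypothetical dedup step: an in-memory set of checksums stands in for the database. */
final class PageStore {
    private final Set<String> checksums = new HashSet<>();

    /** Assumed cleaning step: trims and collapses whitespace so formatting changes do not create "new" pages. */
    static String clean(String text) {
        return text.strip().replaceAll("\\s+", " ");
    }

    /** Hex-encoded SHA-256 digest of the cleaned text. */
    static String checksum(String cleanedText) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(sha.digest(cleanedText.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    /** Returns true if the page was saved, false if an identical page is already stored. */
    boolean saveIfNew(String rawText) {
        return checksums.add(checksum(clean(rawText)));
    }
}
```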
Functions and components
Web Monitor
Users can define their queries on the Management Interface according to their fields of interest, using the following types of search terms:
- Exact phrase
- Obligatory keywords (AND relation)
- Optional keywords (OR relation)
- Forbidden words (NOT relation)
- Any combination of above
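The four kinds of search terms can be combined into a single predicate, as in this hypothetical `MonitorQuery` sketch. It assumes case-insensitive matching on whole words, treats each keyword as a single word, and lets an empty list mean no constraint, except that a non-empty OR list requires at least one of its keywords to appear. All groups are combined with AND.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical query: exact phrases, AND, OR and NOT keyword lists, all of which must be satisfied. */
record MonitorQuery(List<String> exactPhrases, List<String> allOf, List<String> anyOf, List<String> noneOf) {

    /** Lower-cases the text and splits it into words on anything that is not a letter or digit. */
    private static List<String> tokenize(String s) {
        return Arrays.stream(s.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(w -> !w.isEmpty())
                .toList();
    }

    boolean matches(String text) {
        List<String> words = tokenize(text);
        Set<String> vocabulary = new HashSet<>(words);
        String joined = " " + String.join(" ", words) + " "; // padded so phrases match whole words only
        boolean phrasesFound = exactPhrases.stream()
                .allMatch(p -> joined.contains(" " + String.join(" ", tokenize(p)) + " "));
        boolean allRequired = allOf.stream()
                .allMatch(k -> vocabulary.contains(k.toLowerCase(Locale.ROOT)));
        boolean anyOptional = anyOf.isEmpty() || anyOf.stream()
                .anyMatch(k -> vocabulary.contains(k.toLowerCase(Locale.ROOT)));
        boolean noneForbidden = noneOf.stream()
                .noneMatch(k -> vocabulary.contains(k.toLowerCase(Locale.ROOT)));
        return phrasesFound && allRequired && anyOptional && noneForbidden;
    }
}
```

For example, a query with the exact phrase "central bank", the obligatory keyword "rate", the optional keywords "Hungary" or "Poland", and the forbidden word "sport" matches "The Central Bank of Hungary cut its base rate." but not the same sentence with "Hungary" removed.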
Web Monitor runs daily and checks whether any new material in the database contains text satisfying the predefined queries. If so, it sends an e-mail with the results to the user(s) associated with the query, with the search phrases highlighted in the body of the text. The e-mail contains hypertext, so the results are clickable. The system also offers an auto-highlight function: in this case the page shown in the results is served from our own server, with the matching words highlighted in purple.
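The highlighting step could be implemented roughly as below. This is a sketch under assumptions: the `Highlighter` class, the inline `<span>` style used for the purple highlight, and whole-word, case-insensitive matching are illustrative choices, not the system's documented behavior. Text between matches is HTML-escaped separately, so characters such as `<` or `&` in an article cannot break the e-mail markup, and longer terms are tried first so a phrase wins over a single word it contains.

```java
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper: turns plain result text into HTML with the query terms highlighted in purple. */
final class Highlighter {
    static final String OPEN = "<span style=\"background-color: purple; color: white\">";
    static final String CLOSE = "</span>";

    private Highlighter() {}

    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    static String highlight(String text, List<String> terms) {
        // Drop blank terms and try longer terms first, e.g. "central bank" before "bank".
        List<String> usable = terms.stream()
                .filter(t -> !t.isBlank())
                .sorted(Comparator.comparingInt(String::length).reversed())
                .toList();
        if (usable.isEmpty()) {
            return escapeHtml(text);
        }
        String alternation = usable.stream().map(Pattern::quote).collect(Collectors.joining("|"));
        // Lookarounds require that a match is not embedded in a longer word.
        Pattern pattern = Pattern.compile("(?<![\\p{L}\\p{N}])(?:" + alternation + ")(?![\\p{L}\\p{N}])",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher m = pattern.matcher(text);
        StringBuilder html = new StringBuilder();
        int last = 0;
        while (m.find()) {
            html.append(escapeHtml(text.substring(last, m.start())))
                .append(OPEN).append(escapeHtml(m.group())).append(CLOSE);
            last = m.end();
        }
        html.append(escapeHtml(text.substring(last)));
        return html.toString();
    }
}
```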
Conventional Search
Queries are defined the same way as for Web Monitor. Here, however, the search is not time-limited, and results are visible immediately.
Management Interface
This is the web interface for system configuration. Sites can be uploaded, activated, or deactivated. Queries can be defined, and Retrieval Definitions can be set up and assigned to specific sites.
References
Hungary.Network Inc.
Azimuth Trust Fund – a US asset management firm
Ministry of Cultural Heritage, Secretariat of National Anniversaries