Date of Award

12-2007

Type

Thesis

Major

Computer Science - Applied Computing Track

Department

TSYS School of Computer Science

First Advisor

Christopher C. Whitehead

Abstract

In the early 2000s, RSS (Really Simple Syndication) was launched into cyberspace and rapidly gained popularity as the underlying technology that fueled millions of web logs (blogs). Soon RSS feeds appeared for news, multimedia podcasting, and many other types of information on the Internet. RSS introduced a new way to syndicate information that allowed anyone interested to subscribe to published content and pull the information to an aggregator (an RSS reader application) at their discretion. RSS made it simple for people to keep up with online content without having to continuously check websites for new material. The new technology quickly showed its shortcomings, though. Aggregators were set to check a feed periodically for new content; when new content existed, the entire feed might be downloaded again, and content filtering was either absent entirely or performed only after the file had already been downloaded. Users who may have only occasionally checked a site for new content were now equipped with the ability to subscribe to content all over the web and have an aggregator poll the sites periodically for updates. However, this presented a serious scalability problem in terms of bandwidth utilization. The same users who had been checking a site once a day for new content were now checking sites with an aggregator on a fixed interval, such as every hour. Bandwidth utilization increased dramatically wherever RSS was involved. The aim of this thesis is to design a better RSS aggregator that effectively and efficiently polls, downloads, and filters RSS content while using a minimal amount of bandwidth and resources. To meet these needs, an RSS aggregator named FeedWorks has been developed that allows users to create subscriptions to content and set an interval at which each subscription is polled for newly published material. The aggregator uses HTTP (Hypertext Transfer Protocol) header information to check for new content before downloading a file; if new content is found, it downloads the file and filters it against user-created filter criteria before writing the information to disk. Filtering and searching algorithms have been researched to tune the performance and limit the strain on the processor. Caching mechanisms have also been used to enhance the performance of the application. The aggregator contains content management functionality that allows users to create subscriptions and subscription groups and to apply filters to a specific subscription or group of subscriptions. This thesis compares the aggregator with other currently available products and services. It provides detailed information regarding the end-user interface and the content management functionality it provides. Descriptive information is also presented that explains the content filtering and feed polling functionality and their respective algorithms.
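The header-based check described in the abstract corresponds to standard HTTP conditional requests (Last-Modified/If-Modified-Since and ETag/If-None-Match), which let a poller skip the download entirely when the server answers 304 Not Modified. The sketch below is only an illustration of that general technique, not the FeedWorks code itself; the function name poll_feed, the keyword-filter step, and the naive splitting of the feed on <item> tags are assumptions made for the example.

```python
# Minimal sketch (not the FeedWorks implementation) of conditional feed polling
# with filtering applied before anything is written to disk.
import urllib.request
import urllib.error


def poll_feed(url, last_modified=None, etag=None, keywords=None):
    """Fetch a feed only if it changed since the last poll.

    Returns (body_or_None, new_last_modified, new_etag); body is None
    when the server reports 304 Not Modified and nothing is downloaded.
    """
    request = urllib.request.Request(url)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    if etag:
        request.add_header("If-None-Match", etag)

    try:
        with urllib.request.urlopen(request) as response:
            body = response.read().decode("utf-8", errors="replace")
            new_last_modified = response.headers.get("Last-Modified", last_modified)
            new_etag = response.headers.get("ETag", etag)
    except urllib.error.HTTPError as err:
        if err.code == 304:  # feed unchanged: no bandwidth spent on the body
            return None, last_modified, etag
        raise

    # Hypothetical filter step: keep only items whose text matches a keyword,
    # so unwanted content is discarded before the feed is persisted.
    if keywords:
        items = body.split("<item>")
        kept = [items[0]] + [i for i in items[1:]
                             if any(k.lower() in i.lower() for k in keywords)]
        body = "<item>".join(kept)

    return body, new_last_modified, new_etag
```

In this sketch the caller stores the returned Last-Modified and ETag values alongside the subscription and passes them back on the next poll, so an unchanged feed costs only a small request/response exchange rather than a full re-download.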
