Every piece of content on a news site should have a defined structure, with an XML representation. The XML documents could be built according to a strict DTD or RDF schema that defines the structure of each type of content: articles, recipes, classified ads, yellow pages entries, etc.
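As a rough illustration of per-type structure, here is a minimal Python sketch. The element names (`content`, `headline`, `byline`, `body`) and the `REQUIRED_FIELDS` table are assumptions made for this example; a real site would state these rules in its DTD or schema rather than in code.

```python
import xml.etree.ElementTree as ET

# Hypothetical required fields per content type; a real site would
# express these rules in a DTD or schema instead.
REQUIRED_FIELDS = {
    "article": ["headline", "byline", "body"],
    "recipe": ["title", "ingredients", "steps"],
}

def build_content(content_type, fields):
    """Build an XML element for one piece of content."""
    root = ET.Element("content", {"type": content_type})
    for name, value in fields.items():
        ET.SubElement(root, name).text = value
    return root

def missing_fields(element):
    """Return the required fields that the element lacks."""
    required = REQUIRED_FIELDS[element.get("type")]
    return [name for name in required if element.find(name) is None]

# Illustrative sample data only.
article = build_content("article", {
    "headline": "City council approves budget",
    "byline": "Staff reporter",
    "body": "The council voted on Tuesday.",
})
article_xml = ET.tostring(article, encoding="unicode")
```

Here `missing_fields(article)` comes back empty because all three assumed article fields are present; a recipe built with only a title would be reported as missing its ingredients and steps.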
The search engine on the site should index each XML file instead of the "display" version that the reader sees. This would allow each file to be stripped to just the relevant content and delivered to the search engine in a machine-readable format.
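Indexing the XML rather than the display page might look something like the sketch below. The sample documents, the element names, and the simple in-memory inverted index are all assumptions for illustration; a production search engine would have its own ingestion format.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Illustrative sample documents, keyed by a hypothetical document id.
DOCUMENTS = {
    "article-1": (
        "<content type='article'><headline>Apple harvest begins</headline>"
        "<body>Growers expect a large crop.</body></content>"
    ),
    "recipe-1": (
        "<content type='recipe'><title>Apple pie</title>"
        "<ingredients>apples, flour, butter</ingredients></content>"
    ),
}

def extract_fields(xml_text):
    """Return (content type, {field name: text}) for one XML document."""
    root = ET.fromstring(xml_text)
    fields = {child.tag: (child.text or "").strip() for child in root}
    return root.get("type"), fields

def build_index(documents):
    """Map each lowercase word to the (document id, field) pairs containing it."""
    index = defaultdict(set)
    for doc_id, xml_text in documents.items():
        _, fields = extract_fields(xml_text)
        for field, text in fields.items():
            for word in re.findall(r"[a-z0-9]+", text.lower()):
                index[word].add((doc_id, field))
    return index

index = build_index(DOCUMENTS)
```

Because the index records the field each word came from, a match in a headline can later be treated differently from a match in body text, which is not possible when indexing the rendered page.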
With the search engine beefed up with content that has been defined in a strict structure, searching becomes more than just a needle-in-a-haystack exercise. All of the different types of search on a site could be combined, with slightly different displays based on the type of content. Searches could be made across types, and finding similar content becomes much simpler and more valuable.
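A combined search with per-type displays could be sketched roughly as follows. The records, field names, and `DISPLAY` formatters are hypothetical, and the plain substring match stands in for whatever ranking a real search engine would use.

```python
from collections import defaultdict

# Illustrative records, as they might look after being extracted from XML.
RECORDS = [
    {"id": "article-1", "type": "article",
     "headline": "Apple harvest begins", "body": "Growers expect a large crop."},
    {"id": "recipe-1", "type": "recipe",
     "title": "Apple pie", "ingredients": "apples, flour, butter"},
    {"id": "ad-1", "type": "classified",
     "title": "Used bicycle", "description": "Good condition."},
]

# Hypothetical display rules: one formatter per content type.
DISPLAY = {
    "article": lambda r: f"[Article] {r['headline']}",
    "recipe": lambda r: f"[Recipe] {r['title']} (ingredients: {r['ingredients']})",
    "classified": lambda r: f"[Classified] {r['title']}",
}

def search(records, term, content_type=None):
    """Search all records, or one type only, and group results by type."""
    term = term.lower()
    grouped = defaultdict(list)
    for record in records:
        if content_type and record["type"] != content_type:
            continue
        # Join every content field; skip the bookkeeping keys.
        text = " ".join(v for k, v in record.items() if k not in ("id", "type"))
        if term in text.lower():
            grouped[record["type"]].append(DISPLAY[record["type"]](record))
    return dict(grouped)

results = search(RECORDS, "apple")
```

One query here returns both the article and the recipe, each rendered by its own formatter, and passing `content_type="recipe"` narrows the same search to a single type.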