This is a rough idea. It would check for a set of problems such as:
- Duplicate content available via both www.domain.com and domain.com, i.e., the site does not have a clear canonical hostname.
- A robots.txt that blocks the site.
- HTML meta tags (e.g., a robots noindex directive) that block search spiders.
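
To make the idea a bit more concrete, here is a minimal sketch of what these checks might look like in Python, using only the standard library. The function names (`has_canonical_hostname`, `robots_blocks_site`, `meta_blocks_indexing`) are hypothetical, and the sketch assumes the pages and robots.txt have already been fetched elsewhere. It also simplifies things: the hostname check only compares where the two variants end up after redirects and ignores `rel="canonical"` links.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def has_canonical_hostname(final_url_bare: str, final_url_www: str) -> bool:
    """Hypothetical check: both arguments are the URLs reached after
    following redirects from domain.com and www.domain.com. If they end
    on the same hostname, one variant redirects to the other."""
    return urlsplit(final_url_bare).hostname == urlsplit(final_url_www).hostname


def robots_blocks_site(robots_txt: str, user_agent: str = "*") -> bool:
    """Return True if the robots.txt text disallows the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def meta_blocks_indexing(html: str) -> bool:
    """Return True if a robots meta tag asks spiders not to index the page."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return bool(parser.directives & {"noindex", "none"})
```

A real tool would also need to fetch the URLs, follow redirects, and probably look at the `X-Robots-Tag` response header, but the three functions above map one-to-one onto the problems in the list.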