## Deliverables
Google is great for being able to search for words or phrases in the visual end product of code, but I need a tool that would allow me to search within the code of sites. Unlike Google, though, I wouldn't need this tool to crawl the entire net, just the URLs that I ask it to index. I'd just like to be able to give it a list of URLs, have it spider through them, and then have the results be searchable. Over time the program could be asked to index new URLs and make those results searchable along with previously indexed sites.
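To make the spidering requirement concrete, here is a minimal sketch of the crawl-and-store loop, assuming Python with only the standard library and SQLite; the `crawls` table and its column names are placeholders chosen for illustration (one possible schema appears under Crawl Database below).

```python
import sqlite3
import urllib.request
from datetime import datetime, timezone

def index_urls(db_path, urls):
    """Fetch each URL's raw source and store it with a crawl timestamp."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crawls "
        "(id INTEGER PRIMARY KEY, url TEXT, indexed_at TEXT, source TEXT)"
    )
    for url in urls:
        # Fetch the page source itself, not the rendered text Google sees.
        with urllib.request.urlopen(url, timeout=30) as resp:
            source = resp.read().decode("utf-8", errors="replace")
        # Every crawl gets its own timestamp, so a later re-crawl of the
        # same URL becomes a separate, independently searchable entry.
        conn.execute(
            "INSERT INTO crawls (url, indexed_at, source) VALUES (?, ?, ?)",
            (url, datetime.now(timezone.utc).isoformat(), source),
        )
    conn.commit()
    conn.close()

index_urls("crawls.db", ["https://example.com/"])
```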
The envisioned pages:
Login page
-the site would need to be secure, requiring a login to access
Search interface
-a page where a search string can be entered for the program to search through indexed sites. Would need to support common search engine nomenclature (i.e. NOT, AND, quotes, etc.)
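One way to get the NOT/AND/quoted-phrase syntax nearly for free is to mirror the stored source into an SQLite FTS5 full-text index, since FTS5's MATCH query language already understands AND, OR, NOT, and "quoted phrases". A sketch under that assumption follows; note that FTS5's default tokenizer is word-oriented, so punctuation-heavy code searches might need a custom tokenizer.

```python
import sqlite3

conn = sqlite3.connect("crawls.db")
# A full-text index over the crawled source (a real tool would add new
# crawls incrementally instead of re-copying the whole table each time).
conn.execute(
    "CREATE VIRTUAL TABLE IF NOT EXISTS crawl_index "
    "USING fts5(url, indexed_at, source)"
)
conn.execute("DELETE FROM crawl_index")
conn.execute(
    "INSERT INTO crawl_index (url, indexed_at, source) "
    "SELECT url, indexed_at, source FROM crawls"
)

# The user's search string can be handed to MATCH nearly verbatim:
# FTS5 itself parses AND / OR / NOT and "quoted phrases".
query = '"shopping cart" AND checkout NOT paypal'
for url, indexed_at in conn.execute(
    "SELECT url, indexed_at FROM crawl_index WHERE crawl_index MATCH ?",
    (query,),
):
    print(url, indexed_at)
```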
Search result list
-the program would output a list of results; clicking one would go to an individual search result entry. Results would need to be sortable alphabetically by domain name and chronologically by index date
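The two sort orders could be derived from the stored rows like this (a sketch, again assuming the placeholder `crawls` table; the domain is parsed out of the stored URL):

```python
import sqlite3
from urllib.parse import urlparse

conn = sqlite3.connect("crawls.db")
rows = conn.execute("SELECT url, indexed_at FROM crawls").fetchall()

# Alphabetically by domain name.
by_domain = sorted(rows, key=lambda row: urlparse(row[0]).netloc)
# Chronologically by index date (ISO-8601 timestamps sort as strings).
by_date = sorted(rows, key=lambda row: row[1])
```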
Individual search result entries
-would display the indexed site code with the specific search string results highlighted.
-the page header would need to display the URL of what was indexed as well as a timestamp of when it was indexed.
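A sketch of that entry view, assuming the matched term arrives as a plain string: the stored source is HTML-escaped so it displays as code rather than rendering, each hit is wrapped in a `<mark>` tag, and the header carries the URL and timestamp.

```python
import html
import re

def render_entry(url, indexed_at, source, term):
    """Return HTML for one result entry with hits highlighted."""
    escaped = html.escape(source)
    # Escape the term the same way so it matches inside the escaped source.
    pattern = re.compile(re.escape(html.escape(term)), re.IGNORECASE)
    highlighted = pattern.sub(r"<mark>\g<0></mark>", escaped)
    # Header shows the indexed URL and the crawl timestamp, per the spec.
    return "<h1>{}</h1><p>Indexed: {}</p><pre>{}</pre>".format(
        html.escape(url), indexed_at, highlighted
    )

# Illustrative placeholder values only.
print(render_entry("https://example.com/", "2024-01-05T09:12:44+00:00",
                   "<div class=cart>cart</div>", "cart"))
```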
Admin page
-from here, new sites could be input into the program for crawling; it needs to accept more than one new URL at a time, e.g. a text box where each URL goes on its own line
-ability to remove URLs indexed by the program
-ability to have an already-indexed URL recrawled, with both the old and new results searchable
-alerts: ability to enter strings that will trigger emails when they come up in newly indexed pages; each alert would need the search string, the email address to send it to, and an editable subject line (see the sketch after this list)
-logins: ability to add or delete user logins for the program; two levels of login would be needed, general user and admin, where only admins can access the admin page
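The alert check could run right after each new crawl. A minimal sketch, assuming Python's standard smtplib and a plain substring match (the real tool would reuse the full search syntax); the SMTP host, sender address, and `alerts` table layout are placeholders mirroring the fields listed above.

```python
import smtplib
import sqlite3
from email.message import EmailMessage

def send_alerts(db_path, url, indexed_at, source, smtp_host="localhost"):
    """Email every alert whose search string appears in a fresh crawl."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS alerts "
        "(search_string TEXT, recipient TEXT, subject TEXT)"
    )
    with smtplib.SMTP(smtp_host) as smtp:
        for search_string, recipient, subject in conn.execute(
            "SELECT search_string, recipient, subject FROM alerts"
        ):
            if search_string.lower() in source.lower():
                msg = EmailMessage()
                msg["From"] = "crawler@example.com"  # placeholder sender
                msg["To"] = recipient
                msg["Subject"] = subject             # editable per alert
                msg.set_content(
                    "'{}' matched {} (indexed {})".format(
                        search_string, url, indexed_at
                    )
                )
                smtp.send_message(msg)
    conn.close()
```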
Crawl Database
-all crawls need to be time-stamped
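One possible shape for that table, assuming SQLite; the names are placeholders consistent with the earlier sketches.

```python
import sqlite3

conn = sqlite3.connect("crawls.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS crawls (
    id         INTEGER PRIMARY KEY,
    url        TEXT NOT NULL,
    indexed_at TEXT NOT NULL,  -- UTC timestamp of this crawl
    source     TEXT NOT NULL   -- raw page source as fetched
);
-- Re-crawls of a URL add new rows rather than replacing old ones,
-- so old and new versions of a site stay searchable side by side.
CREATE INDEX IF NOT EXISTS crawls_by_url ON crawls (url, indexed_at);
""")
conn.close()
```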