Tiny Search Engine
February 2023 – March 2023 | Systems
C | Data Structures | Systems
A tiny search engine written in C for CS 50, Software Design and Implementation, at Dartmouth. The project consists of three modules: crawler, indexer, and querier.
The crawler retrieves webpages starting from a seed URL. It parses the seed page, extracts the embedded URLs, and retrieves each of those pages recursively, stopping at a given depth and crawling only pages under the seed domain. The indexer reads the page files produced by the crawler, builds an inverted index, and writes it to a file. The querier loads the index file and the page files to answer search queries read from stdin. For efficient lookups, the index is a hashtable that maps each word to a set of counters recording how many times the word appears in each document.
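
The hashtable-to-counter structure described above could be sketched roughly as follows. This is a minimal illustration, not the course's actual modules: the names `index_t`, `index_new`, `index_add`, `index_get`, and `index_delete`, the fixed `NUM_SLOTS` table size, and the hash function are all assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NUM_SLOTS 64 /* hypothetical fixed table size */

/* One (document ID, occurrence count) pair for a word. */
typedef struct counter {
    int doc_id;
    int count;
    struct counter *next;
} counter_t;

/* One hashtable entry: a word and its list of counters. */
typedef struct entry {
    char *word;
    counter_t *counters;
    struct entry *next; /* next entry in the same slot */
} entry_t;

typedef struct index {
    entry_t *slots[NUM_SLOTS];
} index_t;

/* Simple multiplicative string hash (an assumption, any hash would do). */
static unsigned long hash_word(const char *word)
{
    unsigned long h = 5381;
    for (; *word != '\0'; word++) {
        h = h * 33 + (unsigned char)*word;
    }
    return h % NUM_SLOTS;
}

/* Allocate an empty index; returns NULL if allocation fails. */
index_t *index_new(void)
{
    index_t *idx = malloc(sizeof *idx);
    if (idx != NULL) {
        for (int i = 0; i < NUM_SLOTS; i++) {
            idx->slots[i] = NULL;
        }
    }
    return idx;
}

/* Record one occurrence of word in document doc_id.
 * Returns the updated count, or 0 if allocation fails. */
int index_add(index_t *idx, const char *word, int doc_id)
{
    assert(idx != NULL && word != NULL);
    unsigned long slot = hash_word(word);

    entry_t *e = idx->slots[slot];
    while (e != NULL && strcmp(e->word, word) != 0) {
        e = e->next;
    }
    if (e == NULL) { /* first time this word is seen */
        e = malloc(sizeof *e);
        if (e == NULL) return 0;
        e->word = malloc(strlen(word) + 1);
        if (e->word == NULL) { free(e); return 0; }
        strcpy(e->word, word);
        e->counters = NULL;
        e->next = idx->slots[slot];
        idx->slots[slot] = e;
    }

    counter_t *c = e->counters;
    while (c != NULL && c->doc_id != doc_id) {
        c = c->next;
    }
    if (c == NULL) { /* first occurrence in this document */
        c = malloc(sizeof *c);
        if (c == NULL) return 0;
        c->doc_id = doc_id;
        c->count = 0;
        c->next = e->counters;
        e->counters = c;
    }
    return ++c->count;
}

/* Return how many times word occurs in doc_id (0 if never seen). */
int index_get(const index_t *idx, const char *word, int doc_id)
{
    assert(idx != NULL && word != NULL);
    for (const entry_t *e = idx->slots[hash_word(word)]; e != NULL; e = e->next) {
        if (strcmp(e->word, word) != 0) continue;
        for (const counter_t *c = e->counters; c != NULL; c = c->next) {
            if (c->doc_id == doc_id) return c->count;
        }
        return 0;
    }
    return 0;
}

/* Free every entry, its word, and its counters. */
void index_delete(index_t *idx)
{
    if (idx == NULL) return;
    for (int i = 0; i < NUM_SLOTS; i++) {
        for (entry_t *e = idx->slots[i]; e != NULL; ) {
            entry_t *next_e = e->next;
            for (counter_t *c = e->counters; c != NULL; ) {
                counter_t *next_c = c->next;
                free(c);
                c = next_c;
            }
            free(e->word);
            free(e);
            e = next_e;
        }
    }
    free(idx);
}
```

In a design like this, an indexer would call `index_add` once per word per page, and a querier would call `index_get` for each query word and document. Chaining keeps both operations close to constant time as long as the table is not heavily overloaded.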