Simple web scraper written in Python 3
- extract data using multiple XPaths from multiple URLs
- save the response in MongoDB
- exception and error handling
- intended only for basic web scraping of static HTML pages
{
    "url": "https://www.technology.pitt.edu/blog/zoom10faq",
    "xpaths": [
        {
            "questions": '//div[@class="field-item even"]/h2/text()',
            "answers": '//div[@class="field-item even"]/p/text()',
            "correct_answer": '//div[@class="field-item even"]/p[1]/text()'
        }
    ]
}
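As a rough illustration of how one "xpaths" group from an entry like the one above could be evaluated, here is a minimal sketch. It assumes the lxml library, which this README does not confirm is what main.py uses. The function name extract_fields, the inline sample page, and the field name first_answer are hypothetical and exist only for this example. XPath positions are 1-based, so p[1] selects the first paragraph.

```python
from lxml import html as lxml_html  # assumption: lxml is used for XPath evaluation

# Hypothetical static page, used instead of a live request so the example runs offline.
SAMPLE_HTML = """
<html><body>
  <div class="field-item even">
    <h2>Question one?</h2>
    <p>First answer.</p>
    <h2>Question two?</h2>
    <p>Second answer.</p>
  </div>
</body></html>
"""


def extract_fields(page_source, xpath_group):
    """Evaluate every XPath in one group and return {field_name: [matched strings]}."""
    tree = lxml_html.fromstring(page_source)
    results = {}
    for field, expression in xpath_group.items():
        # A text() expression yields a list of strings; drop whitespace-only matches.
        matches = tree.xpath(expression)
        results[field] = [value.strip() for value in matches if value.strip()]
    return results


# One group in the same shape as the "xpaths" entries of the configuration.
xpath_group = {
    "questions": '//div[@class="field-item even"]/h2/text()',
    "answers": '//div[@class="field-item even"]/p/text()',
    "first_answer": '//div[@class="field-item even"]/p[1]/text()',
}

extracted = extract_fields(SAMPLE_HTML, xpath_group)
```

An XPath that matches nothing simply produces an empty list for that field, so a caller can tell "no match" apart from a parsing failure.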
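import pymongo

# Replace host and port, or pass a full MongoDB connection URL instead.
myclient = pymongo.MongoClient("mongodb://host:port/")
mydb = myclient["database"]      # database name
mycol = mydb["collection"]       # collection that receives the scraped data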
pip3 install -r requirements.txt
python3 main.py