How do I use Open Source scrapers? (Selenium, Scrapy, etc.)

Noah@lemmy.dbzer0.com · edit-2 11 months ago

How do I use Open Source scrapers? (Selenium, Scrapy, etc.)

umami_wasabi@lemmy.ml · edit-2 11 months ago

I don’t a single guide for you but I can layout a road map.

A programming language. I prefer Python.
Basic HTML syntax and CSS selectors
HTTP, specifically methods, status code (no need to memorize all cuz you can go look it up), and cookies

After you got those foundation ready, you can go on and try to build a webscraper. I advice aginst using Scrapy. Not because it is bad but too overwhelming and abstracted for any beginner. I will instead advice you use requests for HTTP, and BeautifulSoup4 for HTML parsing. You will build a more solid foundation and transition to scrapy later when you need those advanced function.

When you get stuck, don’t afraid to pause on your attempt and read tutorials again. Head to the Python Community on Discord to get interactive help. We welcome noobs as we once were noobs too. Just don’t ever mention scraping there as they can’t help if they suspect you’re trying to do something inappropriate, malicious, or illegal. They are notoriously aginst yt-dlp which frustrates me a bit. Phrase it nicely and in an generic way. I will be there occasionally offering help.

Noah@lemmy.dbzer0.com · 11 months ago

The discord thing is a no-go since I don’t really know how to make my issue palatable. That’s why I used lemmy. Thanks again!

How do I use Open Source scrapers? (Selenium, Scrapy, etc.)

How do I use Open Source scrapers? (Selenium, Scrapy, etc.)

I have been trying for hours to figure this out. From a building tutorial to just trying to find prebuilt ones, I can’t seem to make it click.