
Scrapoxy Documentation
Release 3.1.1
Fabien Vauchelles
Aug 17, 2018

Get Started

1 What is Scrapoxy?
2 Documentation
3 Prerequisite
4 Contribute
5 License

1 What is Scrapoxy?

http://scrapoxy.io

Scrapoxy hides your scraper behind a cloud. It starts a pool of proxies to send your requests. Now, you can crawl without thinking about blacklisting!

It is written in JavaScript (ES6) with Node.js & AngularJS, and it is open source!

1.1 How does Scrapoxy work?

1. When Scrapoxy starts, it creates and manages a pool of proxies.
2. Your scraper uses Scrapoxy as a normal proxy.
3. Scrapoxy routes all requests through a pool of proxies.

1.2 What does Scrapoxy do?

• Create your own proxies
• Use multiple cloud providers (AWS, DigitalOcean, OVH, Vscale)
• Rotate IP addresses
• Impersonate known browsers
• Exclude blacklisted instances
• Monitor the requests
• Detect bottlenecks
• Optimize the scraping

1.3 Why doesn't Scrapoxy support anti-blacklisting?

Anti-blacklisting is a job for the scraper. When the scraper detects blacklisting, it asks Scrapoxy to remove the proxy from the proxy pool (through a REST API).

1.4 What is the best scraper framework to use with Scrapoxy?

You could use the open source Scrapy framework (Python).

1.5 Does Scrapoxy have a SaaS mode or a support plan?

Scrapoxy is an open source tool. The source code is highly maintained. You are very welcome to open an issue for features or bugs.

If you are looking for a commercial product in SaaS mode or with a support plan, we recommend checking the ScrapingHub products (ScrapingHub is the company that maintains the Scrapy framework).

2 Documentation

You can begin with the Quick Start or look at the Changelog. Then you can continue with Standard, and become an expert with Advanced. Finish with the Tutorials.

2.1 Quick Start

This tutorial works on AWS / EC2, with region eu-west-1. See AWS / EC2 - Copy an AMI from a region to another if you want to change the region.
2.1.1 Step 1: Get AWS credentials

See Get AWS credentials.

2.1.2 Step 2: Create a security group

See Create a security group.

2.1.3 Step 3A: Run Scrapoxy with Docker

Run the container:

    sudo docker run -e COMMANDER_PASSWORD='CHANGE_THIS_PASSWORD' \
        -e PROVIDERS_AWSEC2_ACCESSKEYID='YOUR ACCESS KEY ID' \
        -e PROVIDERS_AWSEC2_SECRETACCESSKEY='YOUR SECRET ACCESS KEY' \
        -it -p 8888:8888 -p 8889:8889 fabienvauchelles/scrapoxy

Warning: Replace PROVIDERS_AWSEC2_ACCESSKEYID and PROVIDERS_AWSEC2_SECRETACCESSKEY with your AWS credentials and parameters.

2.1.4 Step 3B: Run Scrapoxy without Docker

Install Node.js

See the Node Installation Manual. The minimum required version is 4.2.1.

Install Scrapoxy from NPM

Install make:

    sudo apt-get install build-essential

And Scrapoxy:

    sudo npm install -g scrapoxy

Generate configuration

    scrapoxy init conf.json

Edit configuration

1. Edit conf.json
2. In the commander section, replace password with a password of your choice.
3. In the providers/awsec2 section, replace accessKeyId, secretAccessKey and region with your AWS credentials and parameters.

Start Scrapoxy

    scrapoxy start conf.json -d

2.1.5 Step 4: Open Scrapoxy GUI

The Scrapoxy GUI is reachable at http://localhost:8889

2.1.6 Step 5: Connect Scrapoxy to your scraper

Scrapoxy is reachable at http://localhost:8888
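As an illustration of Step 5, here is a minimal Python sketch of a scraper that uses Scrapoxy as a normal HTTP proxy. It relies only on the standard library. The helper names (build_proxy_settings, build_opener, fetch) are hypothetical and not part of Scrapoxy; the only value taken from this guide is the proxy address http://localhost:8888. The sketch handles plain HTTP URLs only, and HTTPS configuration is deliberately left out.

```python
import urllib.request

# From the Quick Start: Scrapoxy listens for scraper traffic on port 8888.
# Assumption: Scrapoxy runs on the same machine as the scraper.
SCRAPOXY_PROXY_URL = "http://localhost:8888"


def build_proxy_settings(proxy_url=SCRAPOXY_PROXY_URL):
    """Return the scheme -> proxy URL mapping expected by urllib.

    Only plain HTTP is mapped; HTTPS is out of scope for this sketch.
    """
    return {"http": proxy_url}


def build_opener(proxy_url=SCRAPOXY_PROXY_URL):
    """Create a urllib opener that sends HTTP requests through the proxy."""
    handler = urllib.request.ProxyHandler(build_proxy_settings(proxy_url))
    return urllib.request.build_opener(handler)


def fetch(url, timeout=30.0):
    """Download a page through Scrapoxy and return the raw response body."""
    if not url.startswith("http://"):
        # Refuse other schemes so that no request silently bypasses the proxy.
        raise ValueError("this sketch only proxies plain http:// URLs")
    opener = build_opener()
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

With Scrapoxy started as in Step 3A or 3B, calling fetch on an http:// URL would send the request to localhost:8888, and Scrapoxy would forward it through one of the proxies in its pool. A Scrapy project would configure the same address through its own proxy settings instead of urllib.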
