Hello,
I wanted to test how a crawler works, so I used Scrapy. The crawl ran, but I expected the robot to save its results to some file, and it seems it didn't. Can someone explain what the bot actually did? I pointed it at the domain wp.pl, and from what I can see it only checked the robots.txt file.
test@test-VirtualBox:~/projekt_test$ scrapy crawl scrapy_test
2017-07-05 10:01:40 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: projekt_test)
2017-07-05 10:01:40 [scrapy.utils.log] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'projekt_test.spiders', 'SPIDER_MODULES': ['projekt_test.spiders'], 'ROBOTSTXT_OBEY': True, 'BOT_NAME': 'projekt_test'}
2017-07-05 10:01:40 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2017-07-05 10:01:40 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-07-05 10:01:40 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-07-05 10:01:40 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-07-05 10:01:40 [scrapy.core.engine] INFO: Spider opened
2017-07-05 10:01:40 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-07-05 10:01:40 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-07-05 10:01:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.wp.pl/robots.txt> (referer: None)
2017-07-05 10:01:40 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.wp.pl/> from <GET http://www.wp.pl/>
2017-07-05 10:01:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.wp.pl/> (referer: None)
2017-07-05 10:01:41 [scrapy.core.engine] INFO: Closing spider (finished)
2017-07-05 10:01:41 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 669,
'downloader/request_count': 3,
'downloader/request_method_count/GET': 3,
'downloader/response_bytes': 54988,
'downloader/response_count': 3,
'downloader/response_status_count/200': 2,
'downloader/response_status_count/301': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2017, 7, 5, 8, 1, 41, 21077),
'log_count/DEBUG': 4,
'log_count/INFO': 7,
'memusage/max': 49946624,
'memusage/startup': 49946624,
'response_received_count': 2,
'scheduler/dequeued': 2,
'scheduler/dequeued/memory': 2,
'scheduler/enqueued': 2,
'scheduler/enqueued/memory': 2,
'start_time': datetime.datetime(2017, 7, 5, 8, 1, 40, 523312)}
2017-07-05 10:01:41 [scrapy.core.engine] INFO: Spider closed (finished)
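To see what the bot actually fetched, the log can be filtered for the engine's `Crawled (status) <GET url>` lines and the RedirectMiddleware's `Redirecting` lines. The sketch below uses only the standard library. It assumes the log text is available as a string; the names `summarize` and `SAMPLE` are hypothetical, and the regexes are written against the line format shown above (Scrapy 1.4), not against any official log schema. It also relies on the fact that the `item_scraped_count` stat only shows up once at least one item has been scraped, so its absence means zero items.

```python
import re

# Engine DEBUG line for every downloaded response, e.g.
# "[scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.wp.pl/> (referer: None)"
CRAWLED = re.compile(r"\[scrapy\.core\.engine\] DEBUG: Crawled \((\d{3})\) <\w+ ([^\s>]+)>")
# RedirectMiddleware DEBUG line, e.g.
# "Redirecting (301) to <GET https://www.wp.pl/> from <GET http://www.wp.pl/>"
REDIRECT = re.compile(r"Redirecting \((\d{3})\) to <\w+ ([^\s>]+)> from <\w+ ([^\s>]+)>")
# Entry in the final stats dump; missing entirely when no item was scraped.
ITEM_COUNT = re.compile(r"'item_scraped_count': (\d+)")


def summarize(log_text):
    """Return fetched (status, url) pairs, redirects and the scraped item count."""
    fetched, redirects, items = [], [], 0
    for line in log_text.splitlines():
        m = CRAWLED.search(line)
        if m:
            fetched.append((int(m.group(1)), m.group(2)))
            continue
        m = REDIRECT.search(line)
        if m:
            # Stored as (status, from_url, to_url).
            redirects.append((int(m.group(1)), m.group(3), m.group(2)))
            continue
        m = ITEM_COUNT.search(line)
        if m:
            items = int(m.group(1))
    return {"fetched": fetched, "redirects": redirects, "items_scraped": items}


# The three DEBUG lines copied from the log above.
SAMPLE = """\
2017-07-05 10:01:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.wp.pl/robots.txt> (referer: None)
2017-07-05 10:01:40 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.wp.pl/> from <GET http://www.wp.pl/>
2017-07-05 10:01:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.wp.pl/> (referer: None)
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

On this log the summary lists two successful fetches (robots.txt, then the homepage after the 301 redirect from http to https) and zero scraped items. That matches the stats dump (3 requests: robots.txt, the 301, the final 200) and suggests the page was downloaded but the spider's callback yielded nothing. A file is only written when items are yielded and a feed export is requested, for example with `scrapy crawl scrapy_test -o wyniki.json` (the file name here is just an illustration).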