Commit da492bb9 (parent 3e313ffb) in huangkq/Python-100-Days, authored Jun 09, 2018 by jackfrued: "Updated the Scrapy documentation" (更新了Scrapy部分文档)

1 changed file: Day66-75/Scrapy爬虫框架分布式实现.md (+19, −5)
### Scrapy Distributed Implementation

1. Install Scrapy-Redis.
2. Configure the Redis server.
3. Modify the configuration file:
   - `SCHEDULER = 'scrapy_redis.scheduler.Scheduler'`
   - `DUPEFILTER_CLASS = 'scrapy_redis.dupefilter.RFPDupeFilter'`
   - `REDIS_HOST = '1.2.3.4'`
   - `REDIS_PORT = 6379`
   - `REDIS_PASSWORD = '1qaz2wsx'`
   - `SCHEDULER_QUEUE_CLASS = 'scrapy_redis.queue.FifoQueue'`
   - `SCHEDULER_PERSIST = True` (persist the request queue in Redis so an interrupted crawl can be resumed)
   - `SCHEDULER_FLUSH_ON_START = True` (flush the queue on start so every run crawls from scratch)

### Bloom Filter
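The default `RFPDupeFilter` keeps every request fingerprint in a Redis set, so its memory use grows linearly with the number of URLs seen. A Bloom filter caps memory at the cost of a small false-positive rate (some unseen URLs may be reported as seen, but never the reverse). A minimal in-memory sketch of the idea; the `BloomFilter` class, its parameters, and the MD5-based hashing here are illustrative, not part of scrapy_redis:

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k hash probes into a fixed-size bit array."""

    def __init__(self, size=2 ** 20, hash_count=5):
        self.size = size                      # number of bits
        self.hash_count = hash_count          # probes per item
        self.bits = bytearray(size // 8)      # bit array packed into bytes

    def _positions(self, item):
        # Derive hash_count independent positions by salting one hash.
        for seed in range(self.hash_count):
            digest = hashlib.md5(f"{seed}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # All probed bits set -> "probably seen"; any bit clear -> definitely new.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

In a real distributed crawl the bit array would live in Redis (e.g. via `SETBIT`/`GETBIT`) and be wired into a custom dupefilter class referenced by `DUPEFILTER_CLASS`.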
### Scrapyd Distributed Deployment

1. Install Scrapyd.
2. Modify the configuration file:
   - `mkdir /etc/scrapyd`
   - `vim /etc/scrapyd/scrapyd.conf`
3. Install Scrapyd-Client:
   - Package the project into an Egg file.
   - Deploy the packaged Egg file to Scrapyd through the `addversion.json` endpoint.
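Step 2 above edits `/etc/scrapyd/scrapyd.conf`. A sketch of what that file might contain; the option names follow Scrapyd's documented settings, but the values shown are assumptions for a single-box setup:

```ini
[scrapyd]
eggs_dir         = eggs
logs_dir         = logs
; 0 means "derive the limit from max_proc_per_cpu"
max_proc         = 0
max_proc_per_cpu = 4
; listen on all interfaces so remote machines can deploy and schedule jobs
bind_address     = 0.0.0.0
http_port        = 6800
```

Once Scrapyd is running, step 3's deployment can be done with scrapyd-client's `scrapyd-deploy` command (which packages and uploads in one go), or manually by POSTing the egg to `addversion.json` with the `project`, `version`, and `egg` fields.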