Commit b8a0a6c1 authored by jackfrued

Updated some documentation

parent 9ff9250e
......@@ -1345,4 +1345,4 @@
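Python also has many third-party libraries for parallel tasks, such as joblib and PyMP. In practice, there are two ways to improve a system's scalability and concurrency: vertical scaling (increasing the processing power of a single node) and horizontal scaling (turning a single node into multiple nodes). A message queue can be used to decouple applications. It can be thought of as a scaled-up version of the synchronized queue used between threads: the applications running on different machines play the role of the threads, and the shared distributed message queue plays the role of the `Queue` in the original program. The most popular and most standardized protocol for message queues (message-oriented middleware) is AMQP (Advanced Message Queuing Protocol). AMQP originated in the financial industry and provides queuing, routing, reliable delivery, and security. Its best-known implementations include Apache ActiveMQ and RabbitMQ.
Celery is a distributed task queue written in Python. It works by passing distributed messages and can use RabbitMQ or Redis as its backend message broker; we will cover it in the project.
To make tasks asynchronous, you can use the third-party library Celery. Celery is a distributed task queue written in Python. It works by passing distributed messages and can use RabbitMQ or Redis as its backend message broker.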
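The analogy between a distributed message queue and a thread-safe queue can be illustrated with the standard library. The following is a minimal sketch under two assumptions: everything runs in a single process, `queue.Queue` stands in for the message broker, and threads stand in for the applications on different machines. The functions `producer`, `consumer`, and `run` are illustrative names only.

```python
import queue
import threading


def producer(q, items):
    """Publish each item to the shared queue (the stand-in for the broker)."""
    for item in items:
        q.put(item)


def consumer(q, results, lock):
    """Take tasks off the queue until a None sentinel arrives."""
    while True:
        item = q.get()
        if item is None:  # sentinel: no more work for this consumer
            q.task_done()
            break
        with lock:
            results.append(item * item)  # "process" the task
        q.task_done()


def run(items, num_consumers=3):
    q = queue.Queue()
    results = []
    lock = threading.Lock()
    workers = [threading.Thread(target=consumer, args=(q, results, lock))
               for _ in range(num_consumers)]
    for w in workers:
        w.start()
    producer(q, items)
    for _ in workers:  # one sentinel per consumer
        q.put(None)
    for w in workers:
        w.join()
    return sorted(results)


if __name__ == '__main__':
    print(run(range(5)))  # [0, 1, 4, 9, 16]
```

With Celery, the in-process `Queue` would be replaced by a broker such as RabbitMQ or Redis, and the threads would be replaced by worker processes that may run on other machines. The split between putting tasks and getting tasks stays the same.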
## Common Anti-Scraping Measures and Countermeasures
1. Construct reasonable HTTP request headers.
- Accept
- User-Agent - the third-party library fake-useragent
```Python
from fake_useragent import UserAgent
ua = UserAgent()
ua.ie
# Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US);
ua.msie
# Mozilla/5.0 (compatible; MSIE 10.0; Macintosh; Intel Mac OS X 10_7_3; Trident/6.0)
ua['Internet Explorer']
# Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; GTB7.4; InfoPath.2; SV1; .NET CLR 3.3.69573; WOW64; en-US)
ua.opera
# Opera/9.80 (X11; Linux i686; U; ru) Presto/2.8.131 Version/11.11
ua.chrome
# Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.2 (KHTML, like Gecko) Chrome/22.0.1216.0 Safari/537.2
ua.google
# Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.13 (KHTML, like Gecko) Chrome/24.0.1290.1 Safari/537.13
ua['google chrome']
# Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11
ua.firefox
# Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1
ua.ff
# Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:15.0) Gecko/20100101 Firefox/15.0.1
ua.safari
# Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25
# and the best one, random via real world browser usage statistic
ua.random
```
- Referer
- Accept-Encoding
- Accept-Language
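Here is a minimal sketch of assembling these headers with the standard library's `urllib.request`. It assumes a small hand-maintained User-Agent pool in place of fake-useragent. The URLs, the pool contents, and the helper name `build_request` are hypothetical. The request object is only built here, not sent.

```python
import random
import urllib.request

# Hypothetical pool; fake-useragent's ua.random could supply these instead
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 '
    '(KHTML, like Gecko) Version/17.0 Safari/605.1.15',
]


def build_request(url, referer=None):
    """Build a Request carrying browser-like headers (not sent here)."""
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'User-Agent': random.choice(USER_AGENTS),
        # urllib does not decompress gzip bodies automatically, so ask for none
        'Accept-Encoding': 'identity',
        'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
    }
    if referer:
        headers['Referer'] = referer  # pretend we navigated from the previous page
    return urllib.request.Request(url, headers=headers)


req = build_request('https://example.com/page/2', referer='https://example.com/page/1')
# urllib normalizes header names with str.capitalize(), e.g. 'User-agent'
print(req.get_header('User-agent'))
```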
2. Inspect the cookies the site generates.
- A useful browser extension: [EditThisCookie](http://www.editthiscookie.com/)
- How to handle cookies generated dynamically by scripts
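The sketch below uses the standard library's `http.cookies.SimpleCookie` in three steps: it parses the cookie a server sent back, adds a value that a page script would have computed, and produces the `Cookie` header for the next request. It assumes the script's value is already known. The helper name `merge_cookies`, the cookie name `js_token`, and all values are hypothetical.

```python
from http.cookies import SimpleCookie


def merge_cookies(set_cookie_header, script_cookies):
    """Combine server-issued cookies with script-generated ones into a Cookie header value."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)  # cookies from the Set-Cookie response header
    for name, value in script_cookies.items():
        jar[name] = value  # cookies a page script would set via document.cookie
    return '; '.join(f'{name}={morsel.value}' for name, morsel in jar.items())


header = merge_cookies('sessionid=abc123; Path=/', {'js_token': 'xyz789'})
print(header)  # sessionid=abc123; js_token=xyz789
```

How the script computes its value still has to be worked out, either by reading the page's JavaScript or by reading the browser-side result back through Selenium's `get_cookies()`. This sketch only covers sending the value.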
3. Scrape dynamic content.
- Selenium + WebDriver
- Chrome / Firefox drivers (ChromeDriver / geckodriver)
4. Limit the crawl rate.
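Here is a minimal sketch of a per-crawler throttle, assuming a single-threaded crawler. `Throttle`, `min_interval`, and `jitter` are illustrative names. The random jitter keeps the spacing between requests from looking perfectly regular.

```python
import random
import time


class Throttle:
    """Ensure at least `min_interval` seconds (plus random jitter) between requests."""

    def __init__(self, min_interval, jitter=0.0):
        self.min_interval = min_interval
        self.jitter = jitter
        self._last = None  # monotonic timestamp of the previous request

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            delay = self.min_interval + random.uniform(0, self.jitter)
            remaining = self._last + delay - now
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


throttle = Throttle(min_interval=1.0, jitter=0.5)
# for url in urls:
#     throttle.wait()
#     ... fetch url ...
```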
5. Handle hidden fields in forms.
- Don't submit the form until you have read its hidden fields
- Use a tool such as RoboBrowser to help submit forms
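Here is a minimal sketch of reading hidden fields before submitting, using only the standard library's `html.parser`. The sample HTML, the field names, and the class `HiddenFieldParser` are hypothetical. The collected values would be merged with the visible fields and posted together.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attrs = dict(attrs)
        if (attrs.get('type') or '').lower() == 'hidden' and attrs.get('name'):
            self.fields[attrs['name']] = attrs.get('value') or ''


page = '''
<form action="/login" method="post">
  <input type="hidden" name="csrf_token" value="a1b2c3">
  <input type="hidden" name="ts" value="1700000000">
  <input type="text" name="username">
</form>
'''
parser = HiddenFieldParser()
parser.feed(page)
# Merge the hidden values with the visible fields before posting the form
payload = dict(parser.fields, username='alice', password='secret')
print(payload)
```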
6. Handle CAPTCHAs in forms.
- OCR (Tesseract) - generally not used in commercial projects
- Professional recognition platforms - 超级鹰 (Chaojiying) / 云打码 (Yundama)
```Python
from hashlib import md5

import requests


class ChaoClient:
    """Minimal client for the Chaojiying CAPTCHA-recognition service."""

    def __init__(self, username, password, soft_id):
        self.username = username
        # The service expects the MD5 hex digest of the password (field "pass2")
        self.password = md5(password.encode('utf-8')).hexdigest()
        self.soft_id = soft_id
        self.base_params = {
            'user': self.username,
            'pass2': self.password,
            'softid': self.soft_id,
        }
        self.headers = {
            'Connection': 'Keep-Alive',
            'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)',
        }

    def post_pic(self, im, codetype):
        """Upload the CAPTCHA image bytes `im`; `codetype` is the platform's CAPTCHA type code."""
        params = {'codetype': codetype}
        params.update(self.base_params)
        files = {'userfile': ('captcha.jpg', im)}
        resp = requests.post(
            'http://upload.chaojiying.net/Upload/Processing.php',
            data=params, files=files, headers=self.headers,
        )
        return resp.json()


if __name__ == '__main__':
    client = ChaoClient('username', 'password', 'soft_id')
    with open('captcha.jpg', 'rb') as file:
        print(client.post_pic(file.read(), 1902))
```
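7. Avoid "traps".
- Web pages may contain hidden links meant to lure crawlers into following them (traps or honeypots)
- Use Selenium + WebDriver + Chrome to check whether a link is visible or inside the viewport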
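When a browser is not available, a rough static pre-filter can catch the most obvious traps. This is a simplified heuristic and is not equivalent to the Selenium check. It assumes traps are hidden by an inline style or the `hidden` attribute on the `<a>` tag itself. It ignores CSS classes, stylesheets, and hidden parent elements. The class `LinkCollector` and the sample HTML are hypothetical.

```python
import re
from html.parser import HTMLParser

# Inline styles that make an element invisible
HIDDEN_STYLE = re.compile(r'display\s*:\s*none|visibility\s*:\s*hidden', re.I)


class LinkCollector(HTMLParser):
    """Split <a href> links into visible ones and likely traps (inline checks only)."""

    def __init__(self):
        super().__init__()
        self.visible, self.suspicious = [], []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        attrs = dict(attrs)
        href = attrs.get('href')
        if not href:
            return
        style = attrs.get('style') or ''
        if 'hidden' in attrs or HIDDEN_STYLE.search(style):
            self.suspicious.append(href)
        else:
            self.visible.append(href)


page = '''
<a href="/list?page=2">Next</a>
<a href="/trap-1" style="display: none">x</a>
<a href="/trap-2" hidden>y</a>
'''
collector = LinkCollector()
collector.feed(page)
print(collector.visible)     # ['/list?page=2']
print(collector.suspicious)  # ['/trap-1', '/trap-2']
```

With Selenium itself, `WebElement.is_displayed()` reports whether an element is visible, and that check also takes stylesheet rules into account.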
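8. Hide your identity.
- Proxy services - 快代理 / 讯代理 / 芝麻代理 / 蘑菇代理 / 云代理
[Which Crawler Proxy Is Best? A Detailed Comparison of Ten Paid Proxy Services](https://cuiqingcai.com/5094.html)
- Onion routing (Tor) - in mainland China it can only be reached through a firewall-circumvention tool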
```Shell
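# tor is provided by the EPEL repository on CentOS
yum -y install epel-release
yum -y install tor
# create an unprivileged user to run tor
useradd admin -d /home/admin
passwd admin
chown -R admin:admin /home/admin
chown -R admin:admin /var/run/tor
# start tor as that user
su admin -c tor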
```
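For the proxy services above, here is a minimal sketch of rotating proxies with the standard library's `urllib.request.ProxyHandler`. The proxy addresses are placeholders from the documentation range 203.0.113.0/24, and `make_opener` is a hypothetical helper. Only HTTP(S) proxies are handled, because `urllib` has no built-in SOCKS support.

```python
import random
import urllib.request

# Placeholder proxies; a paid proxy service would supply real ones
PROXY_POOL = [
    'http://203.0.113.10:8080',
    'http://203.0.113.11:8080',
    'http://203.0.113.12:3128',
]


def make_opener():
    """Build an opener that routes HTTP and HTTPS traffic through a randomly chosen proxy."""
    proxy = random.choice(PROXY_POOL)
    handler = urllib.request.ProxyHandler({'http': proxy, 'https': proxy})
    return urllib.request.build_opener(handler), proxy


opener, proxy = make_opener()
print('using', proxy)
# resp = opener.open('http://example.com/', timeout=10)
```

Once Tor is running, it listens for SOCKS5 connections on 127.0.0.1:9050 by default. Reaching it from Python needs a SOCKS-capable client, for example requests installed with its `socks` extra and a proxy URL such as `socks5h://127.0.0.1:9050`.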
......@@ -1830,7 +1830,7 @@ CSRF tokens and small utilities
>>> str1 = '我爱你们!'
>>> str2 = AES.new(key, AES.MODE_CFB, iv).encrypt(str1.encode())
>>> str2
b'p\x96o\x85\x0bq\xc4-Y\xc4\xbcp\n)&'
>>> str3 = AES.new(key, AES.MODE_CFB, iv).decrypt(str2).decode
>>> str3 = AES.new(key, AES.MODE_CFB, iv).decrypt(str2).decode()
>>> str3
'我爱你们!'
```
......@@ -2014,7 +2014,6 @@ TOTAL 427 367 14%
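- stub: a stand-in for a function that supplies canned responses during testing
- mock: an object that replaces a real object (and that object's API)
- fake: a lightweight object that is not production-grade
#### Integration Testing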
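The three kinds of test double can be contrasted with the standard library's `unittest.mock`. This is a minimal sketch. `fetch_price` and `FakeCache` are hypothetical names made up for illustration. Note that `mock.Mock` itself does not distinguish stubs from mocks; the difference lies in whether the test verifies the calls.

```python
from unittest import mock


def fetch_price(client, symbol):
    """Code under test: asks a client for a quote and rounds it."""
    return round(client.get_quote(symbol), 2)


class FakeCache:
    """Fake: a working but lightweight in-memory stand-in for, e.g., Redis."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


# Stub: only supplies a canned response; nothing about the call is verified
stub_client = mock.Mock()
stub_client.get_quote.return_value = 12.3456
assert fetch_price(stub_client, 'ABC') == 12.35

# Mock: replaces the real object and lets the test verify how its API was used
mock_client = mock.Mock(spec=['get_quote'])
mock_client.get_quote.return_value = 1.0
fetch_price(mock_client, 'XYZ')
mock_client.get_quote.assert_called_once_with('XYZ')

# Fake: real behavior, just not production-grade
cache = FakeCache()
cache.set('ABC', 12.35)
assert cache.get('ABC') == 12.35
```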
......
## English Interviews
Below, I denotes the interviewer and C denotes the candidate.
### Opening Small Talk
1. I: Thanks for waiting. (Please follow me.)
C: It's no problem.
2. I: How are you doing this morning?
C: I'm great. / I'm doing fine. Thank you. / How about you?
3. I: How did you get here?
C: I took the subway here. / I drove here.
4. I: Glad to meet you.
C: Glad to meet you. / It's great to finally meet you in person. (if you have already spoken on the phone)
### The Formal Interview
#### HR Interview
1. I: Can you tell me a little bit about yourself? (introduce yourself)
Principle: don't talk about your private life or odd hobbies (such as grinding your way to Diamond rank in League of Legends), because the interviewer mainly wants to know about your professional skills (qualifications) and work experience (experience). So focus on your previous companies (company name), positions (title), tenure (years), and major responsibilities.
C: Thank you for having me. My name is Dachui WANG. I'm 25 years old, and I'm single. I have a Bachelor's degree in Computer Science from Tsinghua University. During college, I worked as a junior Java programmer at ABC Technologies. For the last two years, I have been an intermediate Java engineer at XYZ Corporation. Programming is part of my everyday life, and it is where my passion lies. I think I have a good knowledge of Java enterprise application development using lightweight frameworks like Spring, Guice, and Hibernate, and other open-source middleware like Dubbo, Mycat, RocketMQ, and so on. I love reading, travelling, and playing basketball in my spare time. That's all! Thank you!
2. I: How would you describe your personality? (your personality)
C: I'm hardworking, eager to learn, and very serious about my work. I enjoy working with other people, and I love challenges.
3. I: What do you know about our company? (what you know about the company)
(Do your homework: learn about the company's current situation and culture, its standing in the industry, its core business, and its main competitors.)
C: The one thing that I like the most about your company is your core values. I think they're very important in this industry because …(your own words)… I personally really believe in the cause as well. Of course, I'm also very interested in your products, such as …(from your homework)…, and the technology behind them.
4. I: Why are you leaving your last job? (why you are leaving)
C: I want to advance my career, and I think this job offers more challenges and opportunities for me to do that.
5. I: Where do you see yourself in 3 to 5 years? (your 3-5 year career plan)
C: My long-term goals involve growing with the company, where I can continue to learn, take on additional responsibilities, and contribute as much value as I can. I intend to take advantage of all of these opportunities.
6. I: What's your salary expectation? (expected salary)
C: My salary expectation is in line with my experience and qualifications. I believe your company will pay me and every other employee fairly. (Kick the ball back to the interviewer first to see what they offer; only say the rest if they insist that you name a figure.) I think 15,000 RMB or above would be appropriate for me to live in Chengdu.
7. I: Do you have any questions for me? (questions to ask the interviewer)
C: What's the growth potential for this position?
#### Technical Interview
1. I: What's the difference between an interface and an abstract class?
2. I: What are pass-by-reference and pass-by-value?
3. I: What's the difference between processes and threads?
4. I: Explain the available thread states at a high level.
5. I: What's a deadlock? How can deadlocks be avoided?
6. I: How does HashMap work in Java?
7. I: What's the difference between ArrayList and LinkedList? (There are many similar questions, such as comparing HashSet with TreeSet, or HashMap with Hashtable.)
8. I: Tell me what you know about garbage collection in Java.
9. I: What are the two types of exceptions in Java?
10. I: What's the advantage of PreparedStatement over Statement?
11. I: What's the use of CallableStatement?
12. I: What is a connection pool?
13. I: Explain the life cycle of a Servlet.
14. I: What's the difference between redirect and forward?
15. I: What's EL? What are the implicit objects of EL?
16. I: Tell me what you know about the Spring Framework and its benefits.
17. I: What are the different types of dependency injection?
18. I: Are singleton beans thread-safe in the Spring Framework?
19. I: What are the benefits of the Spring Framework's transaction management?
20. I: Explain what AOP is.
21. I: What's a proxy, and how do you implement the proxy pattern?
22. I: How does Spring MVC work?
23. I: In what scenarios would you use Hibernate or MyBatis?
24. I: How would you implement SOA?
25. I: Give a brief introduction to the projects you have been involved in.
The questions above are mainly for Java programmer interviews, but the overall process is roughly the same.
......@@ -59,7 +59,7 @@
1. Install the underlying dependency libraries.
```Shell
yum -y install wget gcc zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel libffi-devel
yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel libffi-devel
```
2. Download the Python source code.
......@@ -88,11 +88,11 @@
make && make install
```
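6. Configure the PATH environment variable (user environment variable) and activate it.
6. Configure the PATH environment variable (user or system environment variable) and activate it.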
```Shell
cd ~
vim .bash_profile
vim ~/.bash_profile
vim /etc/profile
```
```INI
......@@ -104,7 +104,8 @@
```
```Shell
source .bash_profile
source ~/.bash_profile
source /etc/profile
```
7. Register a soft link (symbolic link). This step is not required, but it is usually quite useful.
......@@ -122,7 +123,7 @@
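### Project Directory Structure
Assume the project folder is `project`. Its five subdirectories, `code`, `conf`, `logs`, `stat`, and `venv`, hold the project's code, configuration files, log files, static assets, and virtual environment, respectively. The `cert` subdirectory under `conf` holds the certificate and key needed to configure HTTPS; the project code in `code` can be checked out from the code repository with a version-control tool; and the virtual environment can be created with a tool such as venv or virtualenv.
Assume the project folder is `project`. Its five subdirectories, `code`, `conf`, `logs`, `stat`, and `venv`, hold the project's code, configuration files, log files, static assets, and virtual environment, respectively. The `cert` subdirectory under `conf` holds the certificate and key needed to configure HTTPS; the project code in `code` can be checked out from the code repository with a version-control tool; and the virtual environment can be created with a tool such as venv, virtualenv, or pyenv.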
```
project
......@@ -211,12 +212,32 @@ project
![](./res/aliyun-certificate.png)
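You can use a tool such as sftp to upload the certificate to the `conf/cert` directory, and then use git to clone the project code into the `code` directory.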
```Shell
cd code
git clone <url>
```
Go back to the project directory, then create and activate the virtual environment.
```Shell
python3 -m venv venv
source venv/bin/activate
```
Reinstall the project's dependencies.
```Shell
pip install -r code/teamproject/requirements.txt
```
### Configuring uWSGI
1. Install uWSGI.
```Shell
pip3 install uwsgi
pip install uwsgi
```
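Before pointing uWSGI at the full project, it can help to confirm that the installation serves a trivial application. The following is a minimal sketch of a WSGI callable; the file name `hello.py` is hypothetical.

```python
def application(environ, start_response):
    """Smallest useful WSGI app; uWSGI looks for a callable named `application` by default."""
    body = b'Hello, uWSGI!'
    start_response('200 OK', [
        ('Content-Type', 'text/plain; charset=utf-8'),
        ('Content-Length', str(len(body))),
    ])
    return [body]


if __name__ == '__main__':
    # Local check without uWSGI: call the app directly with a minimal environ
    from wsgiref.util import setup_testing_defaults

    environ = {}
    setup_testing_defaults(environ)
    statuses = []
    body = application(environ, lambda status, headers: statuses.append(status))
    print(statuses[0], b''.join(body))  # 200 OK b'Hello, uWSGI!'
```

Saved as `hello.py`, it can then be served with `uwsgi --http :8000 --wsgi-file hello.py` and checked in a browser at http://127.0.0.1:8000/.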
2. Edit the uWSGI configuration file (`/root/project/conf/uwsgi.ini`).
......@@ -226,7 +247,7 @@ project
# Base path of the project
base=/root/project
# Project name
name=fangtx
name=teamproject
# Enable the master process
master=true
# Number of worker processes
......@@ -320,7 +341,7 @@ project
uwsgi_pass 172.18.61.250:8000;
}
location /static/ {
alias /root/project/static/;
alias /root/project/stat/;
expires 30d;
}
}
......@@ -497,7 +518,7 @@ root
+------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000001 | 590 | | | |
| mysql-bin.000003 | 590 | | | |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
......@@ -511,25 +532,25 @@ root
3. Create and configure the slaves.
```Shell
docker run -d -p 3308:3306 --name mysql57-slave-1 \
docker run -d -p 3308:3306 --name mysql-slave-1 \
-v /root/mysql/slave-1/conf:/etc/mysql/mysql.conf.d \
-v /root/mysql/slave-1/data:/var/lib/mysql \
-e MYSQL_ROOT_PASSWORD=123456 \
--link mysql-master:mysql-master mysql:5.7
docker run -d -p 3309:3306 --name mysql57-slave-2 \
docker run -d -p 3309:3306 --name mysql-slave-2 \
-v /root/mysql/slave-2/conf:/etc/mysql/mysql.conf.d \
-v /root/mysql/slave-2/data:/var/lib/mysql \
-e MYSQL_ROOT_PASSWORD=123456 \
--link mysql-master:mysql-master mysql:5.7
docker run -d -p 3310:3306 --name mysql57-slave-3 \
docker run -d -p 3310:3306 --name mysql-slave-3 \
-v /root/mysql/slave-3/conf:/etc/mysql/mysql.conf.d \
-v /root/mysql/slave-3/data:/var/lib/mysql \
-e MYSQL_ROOT_PASSWORD=123456 \
--link mysql-master:mysql-master mysql:5.7
docker exec -it mysql57-slave-1 /bin/bash
docker exec -it mysql-slave-1 /bin/bash
```
```Shell
......@@ -547,7 +568,7 @@ root
mysql> reset slave;
Query OK, 0 rows affected (0.02 sec)
mysql> change master to master_host='mysql-master', master_user='slave', master_password='iamslave', master_log_file='mysql-bin.000001', master_log_pos=590;
mysql> change master to master_host='mysql-master', master_user='slave', master_password='iamslave', master_log_file='mysql-bin.000003', master_log_pos=590;
Query OK, 0 rows affected, 2 warnings (0.03 sec)
mysql> start slave;
......
......@@ -186,5 +186,5 @@ Why does Python make this design choice? In the words of a widely circulated maxim
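So in Python there is really no need to give class attributes or methods a leading double underscore to make them private. The interpreter only renames such members (name mangling) and does not truly hide them, so doing this has no real protective effect. If you want to protect an attribute or method, we recommend a protected member with a single leading underscore. It cannot truly protect the attribute or method either, but it hints to callers that they should not access it directly, and it does not stop subclasses from inheriting it.
Note that the magic methods in Python classes, such as \_\_str\_\_ and \_\_repr\_\_, are not private members. Their names do begin with double underscores, but they also end with double underscores, and names of that form are not private-member names. This is a real pitfall for beginners.
Note that the magic methods in Python classes, such as `__str__` and `__repr__`, are not private members. Their names do begin with double underscores, but they also end with double underscores, and names of that form are not private-member names. This is a real pitfall for beginners.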
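The following short demonstration covers both points, using a hypothetical `Account` class. The double-underscore attribute is merely renamed by name mangling and stays reachable. `__repr__`, which both starts and ends with double underscores, is not mangled at all.

```python
class Account:
    def __init__(self, owner, balance):
        self._owner = owner        # protected by convention only
        self.__balance = balance   # actually stored as _Account__balance

    def __repr__(self):            # a magic method, not a private member
        return f'Account({self._owner!r})'


acct = Account('alice', 100)
print(acct._owner)                 # alice (accessible; the underscore is only a hint)
print(acct._Account__balance)      # 100 (the mangled name is still accessible)
print(hasattr(acct, '__balance'))  # False
print(repr(acct))                  # Account('alice')
```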