leetcode-98-Validate Binary Search Tree

Posted on 2019-05-06 | In leetcode , top-100-liked-questions |

Words count in article: 369 | Reading time ≈ 2

Given a binary tree, determine if it is a valid binary search tree (BST).
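One common way to approach this problem is to pass down the range of values each subtree is allowed to hold. The sketch below assumes a hypothetical `TreeNode` class and a strict BST definition (no duplicate keys); neither comes from the post itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeNode:
    # Hypothetical node type, defined here only for illustration.
    val: int
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def is_valid_bst(root: Optional[TreeNode]) -> bool:
    """Return True if every node lies strictly inside the bounds set by its ancestors."""

    def check(node: Optional[TreeNode], low: Optional[int], high: Optional[int]) -> bool:
        if node is None:
            return True
        if low is not None and node.val <= low:
            return False
        if high is not None and node.val >= high:
            return False
        # Left subtree must stay below node.val, right subtree above it.
        return check(node.left, low, node.val) and check(node.right, node.val, high)

    return check(root, None, None)
```

Comparing a node only with its direct children is not enough, because a deeper node can still violate a bound set by a distant ancestor; carrying `low` and `high` down the recursion covers that case.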

leetcode-84-Largest Rectangle in Histogram

Posted on 2019-05-05 | In leetcode , top-100-liked-questions |

Words count in article: 634 | Reading time ≈ 2

Given n non-negative integers representing the histogram’s bar heights, where the width of each bar is 1, find the area of the largest rectangle in the histogram.
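A frequently used technique for this problem is a monotonic stack of bar indices. The following is a minimal sketch of that idea, not necessarily the approach taken in the full post; the function name is an assumption.

```python
from typing import List


def largest_rectangle_area(heights: List[int]) -> int:
    """Largest rectangle under a histogram, using a stack of increasing bar heights."""
    stack: List[int] = []  # indices of bars whose heights are strictly increasing
    best = 0
    # A trailing bar of height 0 forces every remaining bar to be popped.
    for i, h in enumerate(heights + [0]):
        while stack and heights[stack[-1]] >= h:
            height = heights[stack.pop()]
            # The popped bar extends left to just after the new stack top.
            left = stack[-1] + 1 if stack else 0
            best = max(best, height * (i - left))
        stack.append(i)
    return best
```

Each index is pushed and popped at most once, so the running time is linear in the number of bars.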

leetcode-85-Maximal Rectangle

Posted on 2019-05-05 | In leetcode , top-100-liked-questions |

Words count in article: 708 | Reading time ≈ 3

Given a 2D binary matrix filled with 0’s and 1’s, find the largest rectangle containing only 1’s and return its area.
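This problem can be reduced to the histogram problem by treating each row as the base of a histogram of consecutive 1's. The sketch below assumes cells are given as `"0"`/`"1"` characters or as integers, and the helper names are hypothetical.

```python
from typing import List, Sequence


def _largest_in_histogram(heights: List[int]) -> int:
    """Monotonic-stack solution for the largest rectangle in a histogram."""
    stack: List[int] = []
    best = 0
    for i, h in enumerate(heights + [0]):  # trailing 0 flushes the stack
        while stack and heights[stack[-1]] >= h:
            height = heights[stack.pop()]
            left = stack[-1] + 1 if stack else 0
            best = max(best, height * (i - left))
        stack.append(i)
    return best


def maximal_rectangle(matrix: Sequence[Sequence[object]]) -> int:
    """Area of the largest all-1 rectangle in a binary matrix."""
    if not matrix or not matrix[0]:
        return 0
    heights = [0] * len(matrix[0])
    best = 0
    for row in matrix:
        for j, cell in enumerate(row):
            # Count consecutive 1's ending at this row; a 0 resets the column.
            heights[j] = heights[j] + 1 if str(cell) == "1" else 0
        best = max(best, _largest_in_histogram(heights))
    return best
```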

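Learning Web Scraping Part 6 - Dynamic Scraping Solutions: selenium

Posted on 2019-05-05 | In python , Learning Web Scraping , Web Scraping from Scratch series |

Words count in article: 2.2k | Reading time ≈ 8

Dynamic Scraping Solutions: selenium

Driving a real browser gives the crawler the ability to execute JavaScript.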


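Learning Web Scraping Part 5 - Dynamic Scraping Solutions: Manual Analysis

Posted on 2019-05-05 | In python , Learning Web Scraping , Web Scraping from Scratch series |

Words count in article: 1.1k | Reading time ≈ 4

Dynamic Scraping Solutions: Manual Analysis

Signs of a dynamic page

You open a page, click around, scroll up and down, and all kinds of panels and information pop up, yet the URL never changes and the page never reloads.

For example, when you browse comments on the web version of NetEase Cloud Music, the URL stays the same no matter which page of comments you turn to. On Zhihu, as you keep scrolling down, more answers keep loading as long as there are any left, and again the URL does not change.

Pages like these, which load new information without navigating away or refreshing, use dynamic loading.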

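Learning Web Scraping Part 4 - First Encounter with JSON & Scraping Product Information from a Certain E-commerce Site

Posted on 2019-05-05 | In python , Learning Web Scraping , Web Scraping from Scratch series |

Words count in article: 2.4k | Reading time ≈ 9

First Encounter with JSON & Scraping Product Information from a Certain E-commerce Site

JSON

What it is

JSON is a lightweight text format for data interchange. A string that follows the JSON format is called a JSON string. It looks like a Python dict converted to a string, sometimes with lists and dicts nested inside, except that keys and string values are always wrapped in double quotes. Here is an example: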

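'''{"Africa": [
    { "name":"蜜獾" , "nickname":"平头哥" },
    { "name":"虫子" , "nickname":"小辣条" },
    { "name":"毒蛇" , "nickname":"大面筋" }
]}'''
# Idealized data: real-world JSON usually arrives packed together without line breaks, and more often than not the Chinese text is replaced by Unicode escapes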

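To carry text in any language reliably, JSON producers often Unicode-escape non-ASCII characters, so the JSON we see usually contains \uxxxx sequences rather than Chinese characters. The data also tends to be packed together with no line breaks, which makes it hard to analyze. Fortunately, the json module can turn the escapes back into Chinese, and https://www.bejson.com/ both decodes and pretty-prints the data; use that tool often when analyzing JSON.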
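The round trip described above can be sketched with the standard `json` module. The `record` below reuses the sample data from the example; the variable names are chosen only for illustration.

```python
import json

record = {"Africa": [{"name": "蜜獾", "nickname": "平头哥"}]}

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
# which is the \uXXXX form usually seen in raw responses.
escaped = json.dumps(record)

# json.loads turns the escapes back into ordinary Python strings.
decoded = json.loads(escaped)

# ensure_ascii=False keeps the Chinese text readable; indent adds line breaks.
pretty = json.dumps(decoded, ensure_ascii=False, indent=2)

print(escaped)
print(pretty)
```

Printing `escaped` and `pretty` side by side shows the same data in the compact escaped form and in the human-readable form.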

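Learning Web Scraping Part 3 - The Powerful Regular Expression and the re Module

Posted on 2019-05-05 | In python , Learning Web Scraping , Web Scraping from Scratch series |

Words count in article: 3.5k | Reading time ≈ 13

The Powerful Regular Expression and the re Module

Features

In fact, BeautifulSoup uses regular expressions in its own implementation, and the arguments to its find_all also accept regular expressions.
BeautifulSoup locates nodes, so several nodes may satisfy the condition (without containing the target information); a regular expression targets the information itself, matching character by character, and can filter out the right result in one pass.
Sometimes you do not want the complete text, only a part of it. A regular expression can extract that part directly, while BeautifulSoup has to fetch the complete text first and then split it.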

If you want to match an IP address, the regular expression looks like this:
Match an IP address: ((?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d))
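As a rough illustration, the pattern can be compiled with the `re` module and used both to validate a whole string and to pull addresses out of longer text. The dot between octets is written as `\.` so that it matches a literal dot only; the helper names are hypothetical.

```python
import re
from typing import List

# Each octet is 250-255, 200-249, or 0-199; three "octet." groups, then a final octet.
IP_PATTERN = re.compile(
    r"((?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d))"
)


def is_ipv4(text: str) -> bool:
    """True only if the whole string is a dotted-quad IPv4 address."""
    return IP_PATTERN.fullmatch(text) is not None


def find_ips(text: str) -> List[str]:
    """Return every substring that matches the pattern."""
    return IP_PATTERN.findall(text)
```

`fullmatch` is the safer choice for validation; `findall` searches inside the text, so it can also pick up a match that sits inside a longer run of digits.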

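Learning Web Scraping Part 2 - Hands-on: Static Scraping with requests + BeautifulSoup

Posted on 2019-05-05 | In python , Learning Web Scraping , Web Scraping from Scratch series |

Words count in article: 1.1k | Reading time ≈ 4

Hands-on: Static Scraping with requests + BeautifulSoup

A static web page loads all of its content at once, so a crawler gets all the information in a single request, which makes such pages very crawler-friendly.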

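Learning Web Scraping Part 1 - Scraping Workflow & Using the requests Module

Posted on 2019-05-04 | In python , Learning Web Scraping , Web Scraping from Scratch series |

Words count in article: 2.9k | Reading time ≈ 10

Scraping Workflow & Using the requests Module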

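Notes on the HTTP Protocol

Posted on 2019-05-04 | In python , Learning Web Scraping , Web Scraping Topics |

Words count in article: 1.3k | Reading time ≈ 5

The HTTP Protocol

1. What is the HTTP protocol?

HTTP is an application-layer protocol that defines how browsers and servers communicate. It usually runs over a TCP connection.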
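To make the "text over TCP" idea concrete, here is a minimal sketch that builds the bytes of a simple HTTP/1.1 GET request and parses the first line of a response. The function names are hypothetical, and no network connection is made.

```python
from typing import Tuple


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, path, version
        f"Host: {host}",          # Host header is required in HTTP/1.1
        "Connection: close",
        "",                       # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes) -> Tuple[str, int, str]:
    """Split the first response line into (version, status code, reason)."""
    first_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = first_line.split(" ", 2)
    return version, int(code), reason
```

Sending these bytes over a TCP socket, for example one opened with `socket.create_connection((host, 80))`, and reading the reply is essentially what a browser or an HTTP client library does under the hood.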
