Quick Start

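Before using colly, make sure your development environment is set up as described in the previous section. The simple examples below will get you started with colly quickly.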

First, import the colly package in your code:

import "github.com/gocolly/colly"

Next, here are a few key concepts in colly.

Collector

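Colly's main entry point is the Collector object. A Collector manages network communication and executes the attached callbacks while a collector job is running. To use colly, you must first initialize a Collector: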

c := colly.NewCollector()

Adding callbacks to a Collector

You can attach different types of callback functions to a Collector to control a collecting job and retrieve information:

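// Called before a request is sent
c.OnRequest(func(r *colly.Request) {
    fmt.Println("Visiting", r.URL)
})

// Called if an error occurs during the request
c.OnError(func(_ *colly.Response, err error) {
    log.Println("Something went wrong:", err)
})

// Called after a response is received
c.OnResponse(func(r *colly.Response) {
    fmt.Println("Visited", r.Request.URL)
})

// Called for every HTML element matching the CSS selector "a[href]":
// visit each link (relative links are resolved against the current page)
c.OnHTML("a[href]", func(e *colly.HTMLElement) {
    e.Request.Visit(e.Attr("href"))
})

// Print the first cell of every table row
c.OnHTML("tr td:nth-of-type(1)", func(e *colly.HTMLElement) {
    fmt.Println("First column of a table row:", e.Text)
})

// Called for every node matching the XPath query "//h1"
c.OnXML("//h1", func(e *colly.XMLElement) {
    fmt.Println(e.Text)
})

// Called after all OnHTML and OnXML callbacks have run
c.OnScraped(func(r *colly.Response) {
    fmt.Println("Finished", r.Request.URL)
})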

Order of callback execution

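1. OnRequest: called before a request is sent
2. OnError: called if an error occurs during the request
3. OnResponse: called after a response is received
4. OnHTML: called right after OnResponse if the received content is HTML
5. OnXML: called right after OnHTML if the received content is HTML or XML
6. OnScraped: called after the OnXML callbacks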