    Technical Tutorial · 6 min read

    How to extract XPath in Golang

    XPath is a powerful tool for selecting nodes in an XML document. In this article, we will show you how to extract XPath in Golang.

    Written by Andrew
    Published on Dec 29, 2025

    Table of Contents

    • How it works in Go
    • Extracting multiple XPath elements in Golang


    Sometimes you need to extract data points from HTML using XPath (read What is XPath first if you are new to it). First, install the htmlquery package:

    go get github.com/antchfx/htmlquery
    

    Here is the full code; the explanation follows after:

    package main
    
    import (
    	"fmt"
    	"strings"
    
    	"github.com/antchfx/htmlquery"
    )
    
    // ExtractXPath accepts an HTML string and a single XPath expression
    // and returns the text content of all matching nodes as one string.
    func ExtractXPath(htmlStr string, xpathExpr string) (string, error) {
    	// Load the HTML document
    	doc, err := htmlquery.Parse(strings.NewReader(htmlStr))
    	if err != nil {
    		return "", err
    	}
    
    	// Find the nodes matching the XPath expression
    	nodes := htmlquery.Find(doc, xpathExpr)
    	var content []string
    
    	// Iterate over the nodes and extract the text content
    	for _, node := range nodes {
    		content = append(content, htmlquery.InnerText(node))
    	}
    
    	// Join the extracted content if multiple nodes were found
    	return strings.Join(content, " "), nil
    }
    
    func main() {
    	htmlStr := `
    		<html>
    			<head>
    				<title>Test Page</title>
    			</head>
    			<body>
    				<div class="content">
    					<p>Hello, World!</p>
    					<p>This is a test.</p>
    				</div>
    			</body>
    		</html>`
    
    	xpathExpr := "//div[@class='content']/p"
    
    	content, err := ExtractXPath(htmlStr, xpathExpr)
    	if err != nil {
    		fmt.Println("Error:", err)
    	} else {
    		fmt.Println("Extracted content:", content)
    	}
    }
    

    Running it prints the following output:

    Extracted content: Hello, World! This is a test.
    

    How it works in Go

    Luckily, there is an open-source library, htmlquery, for exactly that. Install it first:

    go get github.com/antchfx/htmlquery
    

    Then, do a basic query against the document:

    // QueryAll returns an error if the expression cannot be parsed
    nodes, err := htmlquery.QueryAll(doc, "//a")
    if err != nil {
    	panic("not a valid XPath expression")
    }
    

    See more examples in the htmlquery documentation.
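    To make this concrete, here is a small, self-contained sketch. The HTML and the link targets are invented for the example; it queries every anchor with QueryAll and reads the href attribute of each match with htmlquery.SelectAttr:

    ```go
    package main

    import (
    	"fmt"
    	"strings"

    	"github.com/antchfx/htmlquery"
    )

    func main() {
    	htmlStr := `<html><body>
    		<a href="/docs">Docs</a>
    		<a href="/blog">Blog</a>
    	</body></html>`

    	doc, err := htmlquery.Parse(strings.NewReader(htmlStr))
    	if err != nil {
    		panic(err)
    	}

    	// QueryAll returns an error for an invalid expression,
    	// unlike Find, which panics.
    	nodes, err := htmlquery.QueryAll(doc, "//a")
    	if err != nil {
    		panic(err)
    	}

    	for _, node := range nodes {
    		// SelectAttr reads an attribute value from the matched node.
    		fmt.Printf("%s -> %s\n", htmlquery.InnerText(node), htmlquery.SelectAttr(node, "href"))
    	}
    }
    ```

    This prints each link's text and target, one per line: "Docs -> /docs" and "Blog -> /blog".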

    Extracting multiple XPath elements in Golang

    package extract
    
    import (
    	"strings"
    
    	"github.com/antchfx/htmlquery"
    )
    
    // Rules maps a result key to the XPath expression that extracts it.
    type Rules = map[string]string
    
    // Content maps a result key to the extracted text.
    type Content = map[string]string
    
    func XPath(htmlStr string, filter Rules) (Content, error) {
    	// Load the HTML document
    	doc, err := htmlquery.Parse(strings.NewReader(htmlStr))
    	if err != nil {
    		return nil, err
    	}
    
    	result := make(Content)
    
    	// Iterate over the filter to apply each XPath expression
    	for key, xpathExpr := range filter {
    		// Find the nodes matching the XPath expression
    		nodes := htmlquery.Find(doc, xpathExpr)
    		var content []string
    
    		// Iterate over the nodes and extract the text content
    		for _, node := range nodes {
    			content = append(content, htmlquery.InnerText(node))
    		}
    
    		// Join the extracted content if multiple nodes were found
    		result[key] = strings.Join(content, " ")
    	}
    
    	return result, nil
    }
    

    Extracting multiple XPath elements requires a bit more code. First, define two map types: one for the extraction rules and one for the result. Each rule has its own key, which is reused as the key in the result map after extraction.

    Then, iterate over the filter rules and find the elements for each rule. Extract the text and put it into the result map under the corresponding key.

    Here is a usage example (the html variable holds the page source):

    
    filter := Rules{
    	"Title":          "//title/text()",
    	"Header":         "//h1/text()",
    	"link_more_info": "//a[contains(text(),'More information')]/@href",
    	"link_fb":        "//a[contains(text(),'Another link fb')]/@href",
    }
    
    content, err := XPath(html, filter)
    if err != nil {
    	fmt.Printf("Error: %s\n", err)
    }
    fmt.Printf("Extracted content: %v\n", content)
    

    The result map will be (formatted here for readability; fmt prints a map on a single line with sorted keys):

    map[
    	Header:Example Domain
    	Title:Example Domain
    	link_fb:https://fb.com/test
    	link_more_info:https://www.iana.org/domains/example
    ]
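    One caveat with the code above: htmlquery.Find panics if a rule contains an invalid XPath expression. When the rules come from configuration or user input, you may want the error back instead. Here is a sketch of a safer variant; SafeXPath is a name introduced here, not part of the article's code, and it redeclares the Rules and Content aliases so the snippet compiles on its own:

    ```go
    package extract

    import (
    	"fmt"
    	"strings"

    	"github.com/antchfx/htmlquery"
    )

    // Rules and Content mirror the types defined earlier in the article.
    type Rules = map[string]string
    type Content = map[string]string

    // SafeXPath behaves like XPath above but returns an error instead of
    // panicking when a rule contains an invalid XPath expression.
    func SafeXPath(htmlStr string, filter Rules) (Content, error) {
    	doc, err := htmlquery.Parse(strings.NewReader(htmlStr))
    	if err != nil {
    		return nil, err
    	}

    	result := make(Content)
    	for key, xpathExpr := range filter {
    		// QueryAll validates the expression and reports an error,
    		// where Find would panic on invalid input.
    		nodes, err := htmlquery.QueryAll(doc, xpathExpr)
    		if err != nil {
    			return nil, fmt.Errorf("rule %q: %w", key, err)
    		}

    		var content []string
    		for _, node := range nodes {
    			content = append(content, htmlquery.InnerText(node))
    		}
    		result[key] = strings.Join(content, " ")
    	}
    	return result, nil
    }
    ```

    With this variant, a bad rule surfaces as a normal Go error annotated with the offending key instead of crashing the whole extraction.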