lazytiger / gumbo-query

C++ library that provides a jQuery-style API for the Gumbo library

License: MIT License

Shell 2.26% C++ 46.90% CMake 17.45% Makefile 14.44% Ruby 0.49% HTML 18.45%

gumbo-query's People

falven zenden2k d0rc iwangxl zhangf911 sarrow104 tychus tsh185 luobenyu jadderbao keithyipkw ruipires jacklicn chkob danyplay natcoder grobx henryzxj great-bug xianlimei dingjingmaster liyoung1992 kylinxh atom9j hotercyc nekodragonica f34nk linecode altairwei rogersguedes chai21cn yuliswe o0white0o presleyhank rudals uramanfj mrkr derofim snolkmg anclark dartlogas bayonetta5 wangchuangang yanick-salzmann tpnndhqc xuhuajie-2021 fjb2080 seh9306 drivestudy cejutue getsync donglu saddalim seafitliu mesteriis sigil-ebook echozzj actondev ichdy gerhobbelt gsioteam mzjaworski iomeone yueyingyehua cppcooper wangdan9454 galeone rzvc hernan-gonzalez batistasilva yucie williamqf-ai cjiezhao whitestone7 mliesmot elpresedente liunix61

gumbo-query's Issues

Unable to install Gumbo Query because it cannot find the Gumbo parser shared library

Problems while encoding Russian symbols

I am trying to parse HTML in UTF-8 format that contains Russian symbols. As a result I get ?????? instead of the normal symbols.

nth-child(odd) skips first node

Machine: Linux (GCC 7.1) and Windows (GCC 6.3)
Sample Code

    std::string page("<div><p>1</p><p>2</p><p>3</p><p>4</p><p>5</p><p>6</p></div>");
    CDocument doc;
    doc.parse(page.c_str());
    CSelection c = doc.find("p:nth-child(odd)");
    CNode node = c.nodeAt(0);

    std::cout << c.nodeNum() << std::endl;
    for (int i = 0; i < c.nodeNum(); i++)
    {
        CNode node = c.nodeAt(i);
        std::cout << "  " + node.text() << std::endl;
    }

Expected Output

3
  1
  3
  5

Actual output

2
  3
  5
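
For reference, the rule that `:nth-child(an+b)` is expected to follow can be sketched in a few lines. This is a standalone illustration of the CSS definition, not gumbo-query's implementation; `matchesNth` and `matchingPositions` are hypothetical helper names. Sibling positions are 1-based, and "odd" corresponds to a = 2, b = 1.

```cpp
#include <vector>

// Returns true when a 1-based sibling index equals a*n + b for some integer n >= 0.
// Hypothetical helper for illustration; "odd" is a = 2, b = 1 and "even" is a = 2, b = 0.
bool matchesNth(int a, int b, int index)
{
    if (a == 0)
        return index == b;
    const int diff = index - b;
    // diff must be a multiple of a, and the quotient n must not be negative.
    return diff % a == 0 && diff / a >= 0;
}

// Collects the 1-based positions among `count` siblings that match a*n + b.
std::vector<int> matchingPositions(int a, int b, int count)
{
    std::vector<int> out;
    for (int i = 1; i <= count; ++i)
        if (matchesNth(a, b, i))
            out.push_back(i);
    return out;
}
```

With six sibling paragraphs, `matchingPositions(2, 1, 6)` yields positions 1, 3 and 5, so the first `<p>` should be part of the selection.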

CNode::startPos and CNode::endPos appear to be swapped

Scenario:

Given a page like this:

std::string page = "<!DOCTYPE html><html lang=\"en\"><head><meta charset=\"UTF-8\"><title>Title</title></head><body><img src=\"file:///d:/test.png\" /><img src=\"file:///d:/test2.png\" /><img src=\"http://blablabla.png\" /><div><div><img src=\"http://asdfasfdsaf.com/asdf.png\" alt=\"alternative text\"/><img src=\"file:///d:/test3.png\" /></div><div1><div2><div3><img id=\"imgId2\" src=\"http://www.taobao.com\" /><pre><img src=\"file:///foo\" id=\"bar\" id=\"notexists\"/></pre></div3></div2></div1></div></body></html>";

Find all nodes whose tag is img:

auto nodes = doc.find("IMG");

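Then read the startPos and endPos of these nodes.
It turns out that startPos points to the end of the tag and endPos points to the start of the tag. Are the implementations of these two functions swapped?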

Setting text contents of a node?

I'm assuming this wasn't intended, but would it be possible to create a way to set the text contents of a CNode? I'm in a situation where I need to update parts of a DOM on the fly, and I need such a feature.

If it were to be implemented, I'd imagine overloading .text() for a CNode to accept a std::string would work well and be similar to the jQuery function .html().

CSS3 selectors not supported

A selector like a[src*="runoob"] is not supported. Could support for it be added?

.text() including HTML?

Would it be possible to have .text() include the HTML inside a node as well as the text content?

For example:

std::string te = "<div><span>1</span>2</div>";
CDocument cdo;
cdo.parse(te);
// cdo.find("div").nodeAt(0).text() should be "<span>1</span>2" not "12"

License

Since this library is based heavily on the cascadia Go library, it's recommended to include the original author's copyright notice in your license file. Example:

Presently, your license doesn't reflect that your work is derived from another copyrighted work. Not trying to be nitpicky, just a suggestion. :)

References:
https://github.com/andybalholm/cascadia/blob/master/LICENSE
http://programmers.stackexchange.com/a/22261/22018

Trimming strings for advanced datasets

After some digging around I found a way to trim the unformatted strings (containing '\r', '\v', '\f', '\n', '\t', ' ') that this library returns when parsing HTML files. A file with runs of multiple spaces can be very annoying when, for example, you try to train an ML algorithm on data fetched with libcurl. The function 'reduce' below will transform a string like this:

You can modify the text in the box to the left any way you like, and                               ss
        then click the "Show Page" button below the box to display the
        result here. Go ahead and do this as often and as long as you like.

To something like this:

You can modify the text in the box to the left any way you like, and ss then click the "Show Page" button below the box to display the result here. Go ahead and do this as often and as long as you like.

The code:

#include <string>

// Strips leading and trailing whitespace characters.
std::string trim(
    const std::string& str,
    const std::string& whitespace = " \t \n \r \v \f"
){
    const auto strBegin = str.find_first_not_of(whitespace);
    if (strBegin == std::string::npos)
        return ""; // no content

    const auto strEnd = str.find_last_not_of(whitespace);
    const auto strRange = strEnd - strBegin + 1;

    return str.substr(strBegin, strRange);
}

std::string reduce(
    const std::string& str,
    const std::string& fill = " ",
    const std::string& whitespace = " \t \n \r \v \f")
{
    // trim first
    auto result = trim(str, whitespace);

    // replace sub ranges
    auto beginSpace = result.find_first_of(whitespace);
    while (beginSpace != std::string::npos)
    {
        const auto endSpace = result.find_first_not_of(whitespace, beginSpace);
        const auto range = endSpace - beginSpace;

        result.replace(beginSpace, range, fill);

        const auto newStart = beginSpace + fill.length();
        beginSpace = result.find_first_of(whitespace, newStart);
    }

    return result;
}

I got this from a reddit post, but it did not have an author.
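
A single-pass variant of the same idea is sketched below, in case a shorter helper is preferred. It is an alternative technique, not the code from the reddit post: it relies on std::isspace (which covers the same six characters in the default "C" locale) and always uses a single space as the fill. The name `collapseWhitespace` is made up for this example.

```cpp
#include <cctype>
#include <string>

// Collapses every run of whitespace into one space and drops leading and
// trailing whitespace, in a single pass over the input.
std::string collapseWhitespace(const std::string& in)
{
    std::string out;
    bool pendingSpace = false;
    for (unsigned char ch : in) {
        if (std::isspace(ch)) {
            // Remember the gap, but only once some content has been written.
            pendingSpace = !out.empty();
        } else {
            if (pendingSpace)
                out.push_back(' ');
            out.push_back(static_cast<char>(ch));
            pendingSpace = false;
        }
    }
    return out; // a trailing gap is never emitted
}
```

Calling `collapseWhitespace(node.text())` on text returned by the library would be one way to apply it, assuming the text is a std::string as in the snippets above.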

Document.h includes gumbo.h which is missing

I built *.deb with

cmake ..
cmake --build .
sudo checkinstall

then compiling my application fails with this error:

Document.h:19:10: fatal error: gumbo.h: No such file or directory
#include <gumbo.h>

And it's true. There is no gumbo.h in /usr/local/include/gq

Document.h
Node.h
Object.h
Parser.h
QueryUtil.h
Selection.h
Selector.h

How to resolve?

add more examples

I hope gumbo-query will get more usage examples, so I prepared some examples of my own :)

https://gist.github.com/derofim/517b60c637dc2d8e0f680610ffd8722f

I still don't know how to properly use things like nextSibling().

Ubuntu 14.04 x64 LTS compile gumbo-query error.

-- The CXX compiler identification is GNU 4.8.4
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at cmake/LibFindMacros.cmake:259 (message):
REQUIRED PACKAGE NOT FOUND

We only found some files of Gumbo, not all of them. Perhaps your
installation is incomplete or maybe we just didn't look in the right place?
This package is REQUIRED and you need to install it or adjust CMake
configuration in order to continue building gumbo_query.

Relevant CMake configuration variables:

Gumbo_INCLUDE_DIR=/usr/local/include
Gumbo_LIBRARY=<not found>
Gumbo_static_LIBRARY=/usr/local/lib/libgumbo.a

You may use CMake GUI, cmake -D or ccmake to modify the values. Delete
CMakeCache.txt to discard all values and force full re-detection if
necessary.

Call Stack (most recent call first):
cmake/FindGumbo.cmake:39 (libfind_process)
CMakeLists.txt:20 (find_package)

-- Configuring incomplete, errors occurred!

Have you specified a license for your code?

Hi,
Sigil, the open-source EPUB editor, has decided to adopt Google's gumbo parser to help with the HTML5 used in EPUB3. I found your project and have to do something similar to use it with Qt. Do you have a license for your code yet, so I know whether or not I could use it inside our GPL3 project?

Thanks,
Kevin

Building on linux

Could you add libgumbo.so or libgumbo.so.1 to cmake/FindGumbo.cmake? It would allow a successful build on Linux.

Patch example:

--- cmake/FindGumbo.cmake.orig 2015-08-06 20:37:13.000000000 -0300
+++ cmake/FindGumbo.cmake 2015-08-08 05:46:41.021517785 -0300
@@ -23,7 +23,7 @@

 # Finally the library itself
 find_library(Gumbo_LIBRARY
-  NAMES libgumbo.dylib libgumbo.dll gumbo.dylib gumbo.dll
+  NAMES libgumbo.dylib libgumbo.dll gumbo.dylib gumbo.dll libgumbo.so
   PATHS ${Gumbo_PKGCONF_LIBRARY_DIRS}
 )

No Unicode support?

It seems like there's no Unicode support, because CDocument::parse only accepts std::string, which doesn't seem Unicode friendly (at least under Windows).
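
A std::string can still carry Unicode text as long as the bytes are UTF-8, which is the encoding the underlying Gumbo parser expects for its input buffer. The sketch below shows one way to turn code points into such a string before parsing. `appendUtf8` and `toUtf8` are hypothetical helpers, not part of gumbo-query, and the sketch does not validate surrogates or out-of-range values.

```cpp
#include <string>

// Appends the UTF-8 encoding of one Unicode code point to `out`.
// Hypothetical helper; assumes cp is a valid scalar value (no validation).
void appendUtf8(std::string& out, char32_t cp)
{
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Encodes a whole UTF-32 string as UTF-8 bytes held in a std::string.
std::string toUtf8(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text)
        appendUtf8(out, cp);
    return out;
}
```

The resulting string could then be handed to `doc.parse(...)` as in the other snippets on this page. Garbled output on a Windows console may also come from the console code page rather than from parsing, so that is worth checking separately.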

Crash when the selector string includes a '(' character

I am searching for a "script" node in Facebook's HTML source. The node looks like
<.s.c.r.i.p.t.>require("TimeSlice").guard(function() ... ///< The dots in "script" are only there so this line displays normally on the issue page.

So I use this selector to find the node:
CSelection c = doc.find("script:contains(require(\"TimeSlice\"))");

But the app crashes with the error "terminate called after throwing an instance of 'std::string'", and GDB says the crash is in the doc.find function.

If I use CSelection c = doc.find("script:contains(require)"), it works well, but those nodes are not the ones I want. So I think gumbo-query's "contains" filter does not support '(' inside its argument.
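
One way a selector parser could accept such an argument is to track parenthesis depth and quoting while scanning for the closing ")". The sketch below is a standalone illustration of that technique, not gumbo-query's actual parser; `readParenArgument` is a hypothetical name.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Given a selector and the index just past an opening "(", returns the raw
// text up to the matching ")". Nested parentheses and quoted strings are
// skipped over. Hypothetical helper for illustration only.
std::string readParenArgument(const std::string& sel, std::size_t start)
{
    int depth = 1;
    char quote = '\0'; // the active quote character, or '\0' outside quotes
    for (std::size_t i = start; i < sel.size(); ++i) {
        const char ch = sel[i];
        if (quote != '\0') {
            if (ch == quote)
                quote = '\0';
        } else if (ch == '"' || ch == '\'') {
            quote = ch;
        } else if (ch == '(') {
            ++depth;
        } else if (ch == ')') {
            if (--depth == 0)
                return sel.substr(start, i - start);
        }
    }
    throw std::invalid_argument("unterminated '(' in selector");
}
```

For the selector in this report, the function returns `require("TimeSlice")` as the argument of `:contains`. Separately, the error text suggests that the library throws a std::string, so wrapping `doc.find(...)` in a try block with `catch (const std::string&)` may at least avoid the terminate call; that is an inference from the message, not something verified against the library.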

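Possible memory leak in CSelection?

Hello. For a particular reason I need to create a CSelection object with new, like this: CSelection* sel = new CSelection(selection), which calls the copy constructor. I delete this pointer at the end of the program, but valgrind still reports a memory leak. When freeing a CSelection object, should I delete it directly or call the release() function? Or is this a potential memory leak in the library? My English is not good, so I originally wrote this in Chinese. I hope you can answer, thanks!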

.find("tag[attribute*='somethingwithasinglequote']")

If I have a span with the id "that's", I can't seem to use .find() to get to it:

.find("span[id*='that's']") doesn't work (error)

An unhandled exception of type 'System.StackOverflowException' occurred in Parser.cpp

std::string ret = message + " at:";
// i is unsigned, so i >= 0 is always true and the loop never terminates normally
for (unsigned int i = ds.size() - 1; i >= 0; i--)
{
    ret.push_back(ds[i]);
}

Where is <gumbo.h> located in the src folder? I have my own CMake workspace and just need the source code

Getting nodes with a specific class

Hi,

I want to get all nodes whose class contains (rather than exactly matches) a specific value.
I used 'find', but it returns only the nodes whose class matches the value exactly.
For example, if I have:

<div class="the-doc the-row">
      Hello world
</div>

This does not work:

    CSelection c = doc.find("div[class=' the-row']");
    for (int i = 0; i < c.nodeNum(); i++) {
        qDebug() << c.nodeAt(i).text().c_str();
    }

This works:

    CSelection c = doc.find("div[class='the-doc the-row']");
    for (int i = 0; i < c.nodeNum(); i++) {
        qDebug() << c.nodeAt(i).text().c_str();
    }

Thanks.
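
In CSS, `[class='...']` compares the whole attribute value, which matches the behaviour described above. Matching one class among several is a token comparison, which is what the `.the-row` and `[class~='the-row']` forms mean in the CSS specification. Whether gumbo-query accepts those forms is not confirmed here, although class-selector syntax such as `a.special` does appear in another report on this page. The token rule itself can be sketched as a standalone helper (`hasClassToken` is a hypothetical name):

```cpp
#include <sstream>
#include <string>

// Returns true when `wanted` is one of the whitespace-separated tokens in a
// class attribute value. Hypothetical helper illustrating the token rule.
bool hasClassToken(const std::string& classAttr, const std::string& wanted)
{
    std::istringstream tokens(classAttr);
    std::string token;
    while (tokens >> token) {
        if (token == wanted)
            return true;
    }
    return false;
}
```

With the markup above, `hasClassToken("the-doc the-row", "the-row")` is true, while a partial name such as "row" does not match.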

Static library not found

Just tried 3 different package installations of gumbo-parser (Archlinux)

pacman -Sy gumbo-parser
yay -Sy gumbo-parser
yay -Sy gumbo-git

However, none of them provides a static library, not even gumbo-git, which built and installed the package from source. So the CMake configuration step always fails. Ideally, gumbo-query should accept a package installed with the default configuration, one that has not been customized to produce a static library.

CMake Error at extern/gumbo-query/cmake/LibFindMacros.cmake:259 (message):
  REQUIRED PACKAGE NOT FOUND

  We only found some files of Gumbo, not all of them.  Perhaps your
  installation is incomplete or maybe we just didn't look in the right place?
  This package is REQUIRED and you need to install it or adjust CMake
  configuration in order to continue building nxm.

  Relevant CMake configuration variables:

    Gumbo_INCLUDE_DIR=/usr/include
    Gumbo_LIBRARY=/usr/lib/libgumbo.so
    Gumbo_static_LIBRARY=<not found>

  You may use CMake GUI, cmake -D or ccmake to modify the values.  Delete
  CMakeCache.txt to discard all values and force full re-detection if
  necessary.

CObject destructor can throw, but shouldn't

CObject can throw inside destructor, but destructors should never throw. This can cause problems with stack unwinding.
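
Since C++11 destructors are implicitly noexcept, so an exception that escapes one ends in std::terminate. A common way to keep a destructor from throwing is to catch and report inside it, as in the minimal sketch below. `Resource` is a hypothetical class used only for illustration; it is not CObject and does not reflect what CObject actually checks.

```cpp
#include <exception>
#include <iostream>
#include <stdexcept>

// Hypothetical resource wrapper whose cleanup step may fail.
class Resource {
public:
    explicit Resource(bool failOnCleanup) : failOnCleanup_(failOnCleanup) {}

    ~Resource() noexcept
    {
        try {
            cleanup();
        } catch (const std::exception& e) {
            // Report the failure instead of letting it escape the destructor.
            std::cerr << "cleanup failed: " << e.what() << '\n';
        } catch (...) {
            std::cerr << "cleanup failed with an unknown error\n";
        }
    }

private:
    void cleanup()
    {
        if (failOnCleanup_)
            throw std::runtime_error("simulated cleanup failure");
    }

    bool failOnCleanup_;
};

// Creates and destroys a Resource; returns normally even if cleanup fails.
bool destroyQuietly(bool failOnCleanup)
{
    {
        Resource r(failOnCleanup);
    } // destructor runs here and swallows any cleanup error
    return true;
}
```

If the failure signals a programming error, such as an unbalanced reference count, an assertion in debug builds is another option worth considering.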

Brew formula is wrong

In readme.md, "brew install gumbo-query" does not work, so I am using brew tap instead. There are several problems in gumbo-query.rb, as indicated in Homebrew/legacy-homebrew#50276:

1. Maybe the repo should point to this repo instead of Falven's.
2. No need to have sha256. I do not know much about Homebrew formulae, but the sha256 seems to be for installing from an archive.
3. "install" does not work at all. "system" should be inside "cd".
4. The test does not work because the build folder does not exist when using "brew test". I do not know how to properly write such a test in the formula, so I removed the test in the formula below. Examples I found always create an ad hoc file for testing, e.g. https://github.com/Homebrew/homebrew/blob/master/Library/Formula/tinyxml2.rb
5. The version of gumbo-query is always "query". I guess that is because there are no versions and the last word accidentally matches the brew formula naming convention. It is not a big problem compared with problem 3.

class GumboQuery < Formula
  homepage "https://github.com/lazytiger/gumbo-query"
  url "https://github.com/lazytiger/gumbo-query", :using => :git

  depends_on "cmake" => :build

  def install
    cd "build" do
      system "cmake", "..", "-DCMAKE_INSTALL_PREFIX=#{prefix}"
      system "make"
      system "make", "install"
    end
  end

end

Conan package

Hi,

do you plan to create a conan package?

Thanks,
Dario

nth-of-type always select nothing

The logic of skipping nodes should be the opposite
keithyipkw@0efee4b

memory leak when using class selectors

void test_parser() {
    std::string page("<h1><a>wrong link</a><a class=\"special\"\\>some link</a></h1>");
    CDocument doc;
    doc.parse(page.c_str());

    CSelection c = doc.find("h1 a.special");
    printf("Node: %s\n", c.nodeAt(0).text().c_str());
}

I've checked that each iteration of test_parser adds more and more allocated memory. When I was trying to identify where memory leaks I've tried valgrind:

==89424== 98,304 bytes in 1,024 blocks are definitely lost in loss record 77 of 77
==89424==    at 0x66BB: malloc (in /usr/local/Cellar/valgrind/3.10.1/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==89424==    by 0x9A28D: operator new(unsigned long) (in /usr/lib/libc++.1.dylib)
==89424==    by 0x1000061EB: CParser::parseClassSelector() (in ./parser)
==89424==    by 0x100004CFC: CParser::parseSimpleSelectorSequence() (in ./parser)
==89424==    by 0x100003C9C: CParser::parseSelector() (in ./parser)
==89424==    by 0x100003664: CParser::parseSelectorGroup() (in ./parser)
==89424==    by 0x1000035E3: CParser::create(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) (in ./parser)
==89424==    by 0x10001503E: CSelection::find(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) (in ./parser)
==89424==    by 0x100002935: CDocument::find(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) (in ./parser)
==89424==    by 0x1000022B3: test_parser() (in ./parser)
==89424==    by 0x1000025C2: main (in ./parser)

I think it happens here: https://github.com/lazytiger/gumbo-query/blob/master/src/Parser.cpp#L625
Result is never freed. But I can't guess right place to delete this object, and it seems like it's not the only thing to delete after selection is done.

Brew unable to install and unable to make

I followed the instructions and tried to install through brew

Tuzis-MacBook:build tuzi$ brew install gumbo-paerser
Error: No available formula with the name "gumbo-paerser"
==> Searching for a previously deleted formula (in the last month)...
Warning: homebrew/core is shallow clone. To get complete history run:
git -C "$(brew --repo homebrew/core)" fetch --unshallow

Error: No previously deleted formula found.
==> Searching for similarly named formulae...
==> Searching local taps...
Error: No similarly named formulae found.
==> Searching taps...
==> Searching taps on GitHub...
Error: No formulae found in taps.

Thanks for your help

Homebrew

Hi,

I tried to install via homebrew today and it told me that the package wasn't found. I checked via Braumeister and same result. Only the parser package is available.

Thanks, Daniel

crashed when syntax error

Parser.cpp, line 961: an unsigned int is never less than zero, so the loop condition i >= 0 never becomes false.
Suggested modification:
for (unsigned int i = 0; i < ds.size(); i++)
{
    ret.push_back(ds[ds.size() - i - 1]);
}
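
Two other common ways to walk a container backwards with an unsigned index are sketched below. The sketch assumes that `ds` is a std::string, which the snippet does not show, and the function names are made up for illustration.

```cpp
#include <cstddef>
#include <string>

// Reverse copy using the "i-- > 0" idiom: the test happens before the
// decrement takes effect, so the index never wraps around inside the body.
std::string reversedCopy(const std::string& ds)
{
    std::string ret;
    ret.reserve(ds.size());
    for (std::size_t i = ds.size(); i-- > 0; )
        ret.push_back(ds[i]);
    return ret;
}

// Same result using reverse iterators, with no index arithmetic at all.
std::string reversedCopyIter(const std::string& ds)
{
    return std::string(ds.rbegin(), ds.rend());
}
```

Both versions also handle an empty `ds`, where `ds.size() - 1` in the original loop would wrap around to a very large value.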

Memory usage?

If I have the following code in a managed C++/CLI project:

for (int i = 0; i < 5000; i++)
{
    std::string html = HttpRequest(......);

    CDocument d;
    d.parse(html);
    CSelection c = d.find("#something");

    if (c.nodeNum() != 0)
    { ,,,, }
}

Will this cause a memory leak, as variables d and c are not being destructed? What can I do to remedy this if that's the case?
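
As far as the language is concerned, objects declared inside a loop body are destroyed at the end of every iteration, so `d` and `c` do get destructed. The sketch below demonstrates this with a stand-in type, assuming CDocument and CSelection are ordinary native C++ classes with working destructors. `Tracked` and `maxAliveDuringLoop` are hypothetical names.

```cpp
// Stand-in type that counts how many instances currently exist.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    Tracked(const Tracked&) { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Runs a loop shaped like the one above and returns the largest number of
// objects that existed at the same time.
int maxAliveDuringLoop(int iterations)
{
    int maxAlive = 0;
    for (int i = 0; i < iterations; ++i) {
        Tracked d; // stands in for CDocument
        Tracked c; // stands in for CSelection
        if (Tracked::alive > maxAlive)
            maxAlive = Tracked::alive;
    } // c, then d, are destroyed here on every iteration
    return maxAlive;
}
```

Only two objects are ever alive at once, no matter how many iterations run. This shows the scoping rule only; other reports on this page describe allocations inside the selector parser that are never freed, which would leak regardless of how the loop is written.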

Memory leaks in the provided example, and when exception is thrown

std::string page("<h1><a>some link</a></h1>");
CDocument doc;
doc.parse(page.c_str());

CSelection c = doc.find("h1 a");
std::cout << c.nodeAt(0).text() << std::endl; // some link

VLD output:

WARNING: Visual Leak Detector detected memory leaks!
---------- Block 2228 at 0x009F5B88: 32 bytes ----------
  Leak Hash: 0xCF2E3225, Count: 1, Total 32 bytes
  Call Stack (TID 2592):
    0x773B1020 (File and line number not available): ntdll.dll!RtlAllocateHeap
    f:\dd\vctools\crt\crtw32\heap\malloc.c (58): Tests.exe!_heap_alloc_base
    f:\dd\vctools\crt\crtw32\misc\dbgheap.c (431): Tests.exe!_heap_alloc_dbg_impl + 0x9 bytes
    f:\dd\vctools\crt\crtw32\misc\dbgheap.c (239): Tests.exe!_nh_malloc_dbg_impl + 0x19 bytes
    f:\dd\vctools\crt\crtw32\misc\dbgheap.c (302): Tests.exe!_nh_malloc_dbg + 0x1D bytes
    f:\dd\vctools\crt\crtw32\misc\dbgmalloc.c (56): Tests.exe!malloc + 0x15 bytes
    f:\dd\vctools\crt\crtw32\heap\new.cpp (59): Tests.exe!operator new + 0x9 bytes
    d:\develop\imageuploader-1.3.2-vs2013\image-uploader\contrib\source\gumbo-query\src\parser.cpp (640): Tests.exe!CParser::parseTypeSelector + 0x7 bytes
    d:\develop\imageuploader-1.3.2-vs2013\image-uploader\contrib\source\gumbo-query\src\parser.cpp (134): Tests.exe!CParser::parseSimpleSelectorSequence + 0x8 bytes
    d:\develop\imageuploader-1.3.2-vs2013\image-uploader\contrib\source\gumbo-query\src\parser.cpp (58): Tests.exe!CParser::parseSelector + 0x8 bytes
    d:\develop\imageuploader-1.3.2-vs2013\image-uploader\contrib\source\gumbo-query\src\parser.cpp (38): Tests.exe!CParser::parseSelectorGroup + 0x8 bytes
    d:\develop\imageuploader-1.3.2-vs2013\image-uploader\contrib\source\gumbo-query\src\parser.cpp (33): Tests.exe!CParser::create + 0x8 bytes
    d:\develop\imageuploader-1.3.2-vs2013\image-uploader\contrib\source\gumbo-query\src\selection.cpp (37): Tests.exe!CSelection::find + 0x1C bytes
    d:\develop\imageuploader-1.3.2-vs2013\image-uploader\contrib\source\gumbo-query\src\document.cpp (42): Tests.exe!CDocument::find + 0x20 bytes
    d:\develop\imageuploader-1.3.2-vs2013\image-uploader\source\core\3rdpart\tests\gumbotest.cpp (16): Tests.exe!GumboTest_Simple_Test::TestBody + 0x27 bytes
   ....
  Data:
    80 3A 47 01    01 00 00 00    04 00 00 00    00 CD CD CD     .:G..... ........
    00 00 00 00    00 00 00 00    00 CD CD CD    0F 00 00 00     ........ ........


---------- Block 2233 at 0x009F5BE8: 32 bytes ----------
  Leak Hash: 0x39E52DD7, Count: 1, Total 32 bytes
  Call Stack (TID 2592):
    0x773B1020 (File and line number not available): ntdll.dll!RtlAllocateHeap
    f:\dd\vctools\crt\crtw32\heap\malloc.c (58): Tests.exe!_heap_alloc_base
    f:\dd\vctools\crt\crtw32\misc\dbgheap.c (431): Tests.exe!_heap_alloc_dbg_impl + 0x9 bytes
    f:\dd\vctools\crt\crtw32\misc\dbgheap.c (239): Tests.exe!_nh_malloc_dbg_impl + 0x19 bytes
    f:\dd\vctools\crt\crtw32\misc\dbgheap.c (302): Tests.exe!_nh_malloc_dbg + 0x1D bytes
    f:\dd\vctools\crt\crtw32\misc\dbgmalloc.c (56): Tests.exe!malloc + 0x15 bytes
    f:\dd\vctools\crt\crtw32\heap\new.cpp (59): Tests.exe!operator new + 0x9 bytes
    d:\develop\imageuploader-1.3.2-vs2013\image-uploader\contrib\source\gumbo-query\src\parser.cpp (640): Tests.exe!CParser::parseTypeSelector + 0x7 bytes
    d:\develop\imageuploader-1.3.2-vs2013\image-uploader\contrib\source\gumbo-query\src\parser.cpp (134): Tests.exe!CParser::parseSimpleSelectorSequence + 0x8 bytes
    d:\develop\imageuploader-1.3.2-vs2013\image-uploader\contrib\source\gumbo-query\src\parser.cpp (89): Tests.exe!CParser::parseSelector + 0x8 bytes
    d:\develop\imageuploader-1.3.2-vs2013\image-uploader\contrib\source\gumbo-query\src\parser.cpp (38): Tests.exe!CParser::parseSelectorGroup + 0x8 bytes
    d:\develop\imageuploader-1.3.2-vs2013\image-uploader\contrib\source\gumbo-query\src\parser.cpp (33): Tests.exe!CParser::create + 0x8 bytes
    d:\develop\imageuploader-1.3.2-vs2013\image-uploader\contrib\source\gumbo-query\src\selection.cpp (37): Tests.exe!CSelection::find + 0x1C bytes
    d:\develop\imageuploader-1.3.2-vs2013\image-uploader\contrib\source\gumbo-query\src\document.cpp (42): Tests.exe!CDocument::find + 0x20 bytes
    d:\develop\imageuploader-1.3.2-vs2013\image-uploader\source\core\3rdpart\tests\gumbotest.cpp (16): Tests.exe!GumboTest_Simple_Test::TestBody + 0x27 bytes
...
  Data:
    80 3A 47 01    01 00 00 00    04 00 00 00    00 CD CD CD     .:G..... ........
    00 00 00 00    00 00 00 00    00 CD CD CD    27 00 00 00     ........ ....'...


Visual Leak Detector detected 2 memory leaks (21900 bytes).
Largest number used: 82245 bytes.
Total allocations: 287293 bytes.
Visual Leak Detector is now exiting.

I have fixed this in CSelector* CParser::parseSelector():

CSelector* ret_old = ret;   // <-----
CSelector* sel = parseSimpleSelectorSequence();
// ....
else if (combinator == '~')
{
    ret = new CBinarySelector(ret, sel, true);
}
else
{
    throw error("impossible");
}

ret_old->release();  // <---
sel->release();   // <---

but I'm not sure whether this is the correct way to fix it.
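
The reasoning behind such a fix can be sketched with a small intrusive reference-counting model. It assumes that objects start with a count of one and that a combining object retains the children passed to it, which is what the fix above implies but is not verified against the library. `RefCounted`, `Leaf` and `Pair` are hypothetical classes, not gumbo-query's.

```cpp
// Hypothetical intrusive reference counting: objects start with one reference.
class RefCounted {
public:
    static int live; // number of objects currently allocated
    RefCounted() : refs_(1) { ++live; }
    void retain() { ++refs_; }
    void release() { if (--refs_ == 0) delete this; }
protected:
    virtual ~RefCounted() { --live; }
private:
    int refs_;
};
int RefCounted::live = 0;

class Leaf : public RefCounted {};

// Combines two children and keeps its own reference to each of them.
class Pair : public RefCounted {
public:
    Pair(RefCounted* a, RefCounted* b) : a_(a), b_(b) { a_->retain(); b_->retain(); }
protected:
    ~Pair() override { a_->release(); b_->release(); }
private:
    RefCounted* a_;
    RefCounted* b_;
};

// Builds a pair, optionally drops the builder's own references to the
// children, releases the pair, and reports how many objects are still alive.
int liveAfterBuild(bool releaseLocals)
{
    RefCounted* left = new Leaf();
    RefCounted* right = new Leaf();
    RefCounted* pair = new Pair(left, right);
    if (releaseLocals) {
        left->release();  // the pair still holds its own reference
        right->release();
    }
    pair->release();
    return RefCounted::live;
}
```

Under these assumptions the children are freed only when the builder gives up its initial references, which matches the two leaked blocks in the VLD output. An exception thrown between the allocation and the release calls would still leak, so a scope guard or smart pointer may be needed for that case.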

Escape method won't match correctly

So, going over the code for parsing selectors, there's the method parseEscape(...) which appears to be intended to convert attribute values that contain escaped characters and code points to literal values. However, doing some quick tests embedding such values in HTML, parsing then matching, it appears that gumbo_parser directly embeds these values untouched. So if I'm right, unescaping these sequences in supplied selectors will actually cause the selectors which should match, to not match.

I'm not familiar with the Go html package, but from what searching I've done, it appears that this might have been a requirement for cascadia (since it worked against the Go html package), but is invalid for use with gumbo_parser. On that note, gumbo_parser does appear to convert numeric character references and named character references, but gumbo_query doesn't of course, so matching a selector with an id containing something like " " will fail.

Will try to post some tests to confirm, I'm just reading over all of this figuring it out.

References:
http://www.w3.org/International/questions/qa-escapes
https://mathiasbynens.be/notes/css-escapes
http://www.w3.org/TR/html5/syntax.html#named-character-references

Error: No available formula with the name "gumbo-query"

─$ brew install gumbo-query
Updating Homebrew...
Error: No available formula with the name "gumbo-query"
==> Searching for a previously deleted formula (in the last month)...
Error: No previously deleted formula found.
==> Searching for similarly named formulae...
Error: No similarly named formulae found.
==> Searching taps...
==> Searching taps on GitHub...
Error: No formulae found in taps.

Getting the OuterHTML

Hi,

How can I get the OuterHTML (not just the InnerText) of a specific tag?
For example, I want to get the div tag with class="content" from this HTML source:

<html>

   <head>
      <title>My Title</title>
      <meta content="">
      <style></style>
   </head>
   
   <body>
      <h3>First header</h3>
      <p>text text text</p>
      <div class="content">
         <h3>My Text 0<a href="https://www.google.com">The site of google</a></h3>
      </div>
      <div>
         My Text 1
         <div class="cls">My Text 2
         <h1>My Text 3</h1>
      </div>
      </div>
   </body>
   
</html>

The result I want to extract:

<div class="content">
    <h3>My Text 0<a href="https://www.google.com">The site of google</a></h3>
</div>

Thanks.

Consider wrapping library's code in a namespace

It is common practice for C++ libraries to do this to avoid naming collisions.
For example, CDocument is a very common class name in C++ projects.
It could be namespace GumboQuery { ... } or something like that.
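
A minimal sketch of the effect follows, using the namespace name suggested above. Both classes are empty stand-ins written for this example; the `origin` member exists only to show which class is which.

```cpp
#include <string>

// Stand-in for the library's class, wrapped in the proposed namespace.
namespace GumboQuery {
class CDocument {
public:
    std::string origin() const { return "library"; }
};
} // namespace GumboQuery

// An application class with the same name can now coexist with it.
class CDocument {
public:
    std::string origin() const { return "application"; }
};
```

Code that wants the short name can still opt in locally with `using GumboQuery::CDocument;` inside a function or source file where no other CDocument is visible.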