The uri module from the standard library looks useful here, together with http and xml.
For example, to read a page and get the contents of all its table cells:
require "http"
require "xml"
require "uri"
# Provide the URL on the CLI
uri = URI.parse(ARGV[0])
# Fetch the HTML page
response = HTTP::Client.get(uri)
# Parse the HTML document
doc = XML.parse_html(response.body)
# Find all `td` elements that are inside of `table` elements
table_cells = doc.xpath_nodes("//table//td")
pp table_cells.map(&.text)
The string passed to xpath_nodes can be customized for the content you’re scraping. But it’s important to keep in mind that it’s XPath syntax rather than CSS selectors.
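To give an idea of how the XPath string can be varied, here is a small sketch that groups cells by row and pulls out link targets. The inlined HTML is a made-up sample so the snippet runs without a network request; the variable names rows and links are only for illustration.

```crystal
require "xml"

# Hypothetical sample page, inlined so the sketch runs offline.
html = <<-HTML
  <html><body>
    <table>
      <tr><td>Ada</td><td><a href="/ada">profile</a></td></tr>
      <tr><td>Grace</td><td><a href="/grace">profile</a></td></tr>
    </table>
  </body></html>
  HTML

doc = XML.parse_html(html)

# One array of cell texts per table row; "./td" is evaluated relative to each row node.
rows = doc.xpath_nodes("//table//tr").map do |row|
  row.xpath_nodes("./td").map(&.text)
end

# The href attribute of every link found inside a table cell.
# []? returns nil for links without an href, and compact_map drops those.
links = doc.xpath_nodes("//table//td//a").compact_map { |a| a["href"]? }

pp rows
pp links
```

Calling xpath_nodes on a node rather than on the document is what makes the relative "./td" expression work per row.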
If you want to use CSS selectors, you can use the kostya/lexbor shard. It’s easy enough to use, and I use it in one of my own apps.
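For comparison, the same table-cell query with a CSS selector might look roughly like this. It is a sketch based on the lexbor README as I remember it (Lexbor::Parser.new, css, inner_text), so check the shard's docs for the current API. It assumes the shard is already installed, and the sample HTML is made up.

```crystal
require "lexbor" # third-party shard: kostya/lexbor, not part of the standard library

# Hypothetical sample page, inlined so no HTTP request is needed.
html = <<-HTML
  <html><body>
    <table>
      <tr><td>Ada</td><td>Lovelace</td></tr>
      <tr><td>Grace</td><td>Hopper</td></tr>
    </table>
  </body></html>
  HTML

parser = Lexbor::Parser.new(html)

# "table td" is the CSS counterpart of the XPath "//table//td".
# css returns a lazy iterator, hence the to_a at the end.
cells = parser.css("table td").map(&.inner_text).to_a

pp cells
```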
I get this error:
/home/drhuffman12/.cache/crystal/crystal-run-web_scraper.tmp: error while loading shared libraries: libssl.so.1.1: cannot open shared object file: No such file or directory
and this doesn’t help:
sudo apt-get install libssl-dev
nor
sudo apt-get install libssl3.0-dev
Also,
libssl-dev is already the newest version (3.0.2-0ubuntu1.10).
Also:
require "lexbor"
tells me Error: can't find file 'lexbor', even though I added it to my shard.yml like
dependencies:
  lexbor:
    github: kostya/lexbor
and ran shards install
It sounds like you’ve got v3 of LibSSL/OpenSSL installed and Crystal is trying to use v1.1. I ran into a similar issue recently with Ruby on macOS. To work around it, I had to make sure the default OpenSSL version was 1.1 and not 3. On Homebrew that involved running brew link --overwrite openssl@1.1, but I don’t remember how to do that with apt.
Are you maybe using an old version of Crystal? I am on Debian unstable with just libssl 3.0.10-1, no version 1.1, and the latest Crystal, 1.9.2, works fine with it.
Apparently, I have:
$ openssl version -a
OpenSSL 1.1.1s 1 Nov 2022
built on: Tue Nov 1 12:36:10 2022 UTC
platform: linux-x86_64
options: bn(64,64) md2(char) rc4(8x,int) des(int) idea(int) blowfish(ptr)
compiler: gcc-11 -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG
OPENSSLDIR: "/home/linuxbrew/.linuxbrew/etc/openssl@1.1"
ENGINESDIR: "/home/linuxbrew/.linuxbrew/Cellar/openssl@1.1/1.1.1s/lib/engines-1.1"
Seeding source: os-specific
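The OPENSSLDIR above points at Linuxbrew's openssl@1.1 rather than Ubuntu's 3.0.2 package, so there seem to be two OpenSSL installs on this machine. As far as I know, Crystal's OpenSSL bindings ask pkg-config for the libssl version at compile time, so one way to see what the compiler is likely to pick up is to run the same kind of query from a macro. This is only a rough sketch: the constant name is made up, and it assumes pkg-config is on the PATH.

```crystal
# The backticks inside {{ }} run at compile time, so this program does not
# link against libssl and still runs when the scraper binary fails to load.
# LIBSSL_FROM_PKG_CONFIG is a made-up name for this sketch.
LIBSSL_FROM_PKG_CONFIG = {{ `pkg-config --modversion libssl 2>/dev/null || echo unknown`.chomp.stringify }}

puts "Crystal #{Crystal::VERSION}"
puts "pkg-config reports libssl #{LIBSSL_FROM_PKG_CONFIG}"
```

If this prints a 1.1.x version while apt reports 3.0.2, pkg-config is probably resolving to the Linuxbrew copy, which would match the libssl.so.1.1 error.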