uri looks useful maybe?
e.g. read a table and get all table cell contents
require "http"
require "xml"
require "uri"

# Provide the URL on the CLI (ARGV is an array, so take its first element)
uri = URI.parse(ARGV[0])

# Fetch the HTML page
response = HTTP::Client.get(uri)

# Parse the HTML document
doc = XML.parse_html(response.body)

# Find all `td` elements that are inside of `table` elements
table_cells = doc.xpath_nodes("//table//td")

pp table_cells.map(&.text)
The string passed to xpath_nodes can be customized for the content you’re scraping. But it’s important to keep in mind that it’s XPath syntax rather than CSS selectors.
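To give a feel for what "customizing" the XPath string looks like, here is a small sketch that runs against an inline HTML snippet instead of a fetched page. The table, its id ("prices") and the class names are made up for illustration; only XML.parse_html, xpath_nodes and text come from the example above.

```crystal
require "xml"

# Hypothetical sample markup, standing in for a fetched page
html = <<-HTML
<html><body>
<table id="prices">
<tr><th>Item</th><th>Price</th></tr>
<tr><td class="name">Apple</td><td class="price">1.20</td></tr>
<tr><td class="name">Pear</td><td class="price">0.90</td></tr>
</table>
</body></html>
HTML

doc = XML.parse_html(html)

# Every data cell in any table
all_cells = doc.xpath_nodes("//table//td").map(&.text)

# Only cells with class="price" inside the table whose id is "prices"
prices = doc.xpath_nodes("//table[@id='prices']//td[@class='price']").map(&.text)

# Header cells instead of data cells
headers = doc.xpath_nodes("//table//th").map(&.text)

pp all_cells, prices, headers
```

Note that `[@class='price']` matches the whole attribute value exactly, so a cell with `class="price sale"` would not match; that is one of the places where XPath differs from a CSS selector like `td.price`.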
If you want to use CSS selectors, you can use the kostya/lexbor shard. It’s easy enough to use, and I use it in one of my own apps.
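For comparison, this is roughly what the CSS-selector version might look like. It is a sketch based on my reading of the shard's README (Lexbor::Parser.new, css and inner_text), so check it against the version you actually install. The sample markup is hypothetical, and the shard has to be listed in shard.yml and installed before the require will resolve.

```crystal
require "lexbor"

# Hypothetical sample markup, standing in for a fetched page
html = <<-HTML
<table id="prices">
<tr><td class="name">Apple</td><td class="price">1.20</td></tr>
<tr><td class="name">Pear</td><td class="price">0.90</td></tr>
</table>
HTML

# Assumption: Lexbor::Parser.new accepts the HTML as a String
parser = Lexbor::Parser.new(html)

# All table cells, selected with CSS instead of XPath
cells = parser.css("table td").map(&.inner_text).to_a

# Only the price cells of the table with id "prices"
prices = parser.css("#prices td.price").map(&.inner_text).to_a

pp cells, prices
```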
I get this error:
/home/drhuffman12/.cache/crystal/crystal-run-web_scraper.tmp: error while loading shared libraries: libssl.so.1.1: cannot open shared object file: No such file or directory
and this doesn’t help:
sudo apt-get install libssl-dev
sudo apt-get install libssl3.0-dev
libssl-dev is already the newest version (3.0.2-0ubuntu1.10).
Error: can't find file 'lexbor', even though I added it to my shard like
dependencies:
  lexbor:
    github: kostya/lexbor
It sounds like you’ve got v3 of LibSSL/OpenSSL installed and Crystal is trying to use v1.1. I ran into a similar issue recently with Ruby on macOS. To work around it, I had to ensure that the default OpenSSL version was 1.1 and not 3. On Homebrew that involved running
brew link --overwrite email@example.com but I don’t remember how to do that with
Are you maybe using an old version of Crystal? I am on Debian unstable with just libssl 3.0.10-1, no version 1.1, and the latest Crystal, 1.9.2, is working fine with it.
Apparently, I have:
$ openssl version -a
OpenSSL 1.1.1s 1 Nov 2022
built on: Tue Nov 1 12:36:10 2022 UTC
platform: linux-x86_64
options: bn(64,64) md2(char) rc4(8x,int) des(int) idea(int) blowfish(ptr)
compiler: gcc-11 -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG
OPENSSLDIR: "/firstname.lastname@example.org"
ENGINESDIR: "/home/linuxbrew/.linuxbrew/Cellaremail@example.com/1.1.1s/lib/engines-1.1"
Seeding source: os-specific