My question is: is it possible to use only HTTP::Client or crest-like tools, wait for www.ip111.cn to finish loading its iframe tags, and then use a simple parser, e.g. lexbor, to parse the expected response?
If you mean how does it know whether you’re in China, it probably uses something like MaxMind to geolocate the user’s IP address. It can also use ASNs as an even more performant way to determine whether they’re in a given network. At my company, we use ASNs in the algorithm that detects bot traffic to determine whether to block the request before they even reach the app.
Sure, that can be done with just the stdlib.
require "http/client"
require "xml"

url = "http://www.ip111.cn/"
if iframe = XML.parse_html(HTTP::Client.get(url).body).xpath_node("//iframe")
  if url = iframe["src"]?
    uri = URI.parse(url)
    ip = uri.host
    pp ip
  else
    puts "No IP address in the iframe's `src` attribute."
  end
else
  puts "no iframe"
end
Sorry for the confusion. What I want is to get my public IP as seen when visiting a site inside China and a site outside China.
This is almost exactly what I do now (check this code). What I really want is to get the result directly from the website I am visiting, instead of making the requests on my side.
If you wait a while, the page shows the IP seen when visiting (1) a site in China, (2) a site outside China, and (3) a site blocked by China.
If we used Selenium instead (although it is a bit overkill for this simple tool), we could query those elements until they return the expected result (the IP addresses above).
My question is: is it possible to do the same work with a simple HTTP::Client?
@zw963 I think what your code is doing is correct. When a browser receives an HTML payload from a server that contains an iframe tag with a src attribute, the browser also makes a request to that provided URL and puts the newly returned HTML into the iframe tag in the original page (i.e. the browser is also making two requests).
Selenium is essentially a headless (or not headless, maybe) browser, so it runs through those 2+ requests for you. I don’t think it would necessarily make sense for the HTTP::Client to do this for you, since the behavior of the iframe tag is part of the HTML specification, not the HTTP protocol itself.
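To illustrate the "two requests" point, here is a minimal sketch of doing by hand what the browser does for iframes, using only the stdlib. The method names `iframe_urls` and `fetch_iframes` are made up for this example, and it assumes the iframe content is plain server-rendered HTML; nothing here executes JavaScript, so it will not help with pages that fill in the value from a script.

```crystal
require "http/client"
require "uri"
require "xml"

# Hypothetical helper: returns the absolute URL of every <iframe> that has
# a `src` attribute, resolved against `base` (the URL the page came from).
def iframe_urls(html : String, base : URI) : Array(URI)
  XML.parse_html(html).xpath_nodes("//iframe[@src]").map do |iframe|
    base.resolve(iframe["src"])
  end
end

# Hypothetical helper: makes the first request for the page, then one
# follow-up request per iframe, and returns the raw bodies keyed by URL.
def fetch_iframes(page_url : String) : Hash(String, String)
  base = URI.parse(page_url)
  html = HTTP::Client.get(base).body
  iframe_urls(html, base).to_h do |uri|
    {uri.to_s, HTTP::Client.get(uri).body}
  end
end
```

Calling `fetch_iframes("http://www.ip111.cn/")` should then give you one response body per iframe, which you can feed to whatever parser you like. Keeping the parsing step (`iframe_urls`) separate from the network step makes it easy to check against a saved copy of the page.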
the browser also makes a request to that provided URL and puts the newly returned HTML into the iframe tag in the original page (i.e. the browser is also making two requests).
The situation becomes more complex because some sites probably don't use an iframe, and even visiting the provided URL directly (to get the real IP) may not be possible, because it requires a token. https://ip.skk.moe/ is an example; see the screenshot for a token example.