Andrew Yourtchenko and Dr. Tony Przygienda left great feedback on my Screen Scraping in 2025 blog post. Unfortunately, they prefer commenting on a closed platform with ephemeral content, so the only way to make their thoughts available to a wider audience is to repost them. Andrew first:
I keep saying CLI is an API. However, it is much simpler, and adapting to changes is easier, if these three conditions are met:
- regexes are written in a defensive yet permissive fashion (generously ignore the spaces and lines that don’t match, but make sure to ignore both spaces and tabs)
- for the data that you do capture, be very conservative, so that the likely outcome if something goes awry is no data rather than garbage data.
- treat all the parsed data as an Option enum in a language that allows for that, and always check whether it’s Some(value) or None before using it.
You’ll have to do (3) anyway, even with a structured API, when handling the changes. As for (1) and (2), if (in some mythical universe) vendors were to publish the regexes, screen scraping would be indistinguishable from the other transports. (I’m of course leaving aside the question of data conversions, because they’re just as much a problem with the “structured” APIs, only of a different kind.)
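A minimal Python sketch of Andrew’s three rules might look like the one below. The IOS-like output, the `Interface` record, and the `parse_interface` function are hypothetical, made up for illustration. Python’s `Optional` is only a type hint, so rule (3) is enforced by convention rather than by the compiler, as it would be with Rust’s `Option`.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical IOS-like "show interface" output, made up for illustration.
SAMPLE = (
    "GigabitEthernet0/1 is up, line protocol is up\n"
    "  Hardware is Gigabit Ethernet, address is 0011.2233.4455\n"
    "\t  MTU 1500 bytes, BW 1000000 Kbit/sec\n"
    "  a line added in a newer software release\n"
)

# Rule 1: permissive about whitespace ([ \t]+ accepts any mix of spaces
# and tabs); lines that match neither pattern are simply skipped.
# Rule 2: conservative about what is captured (a single token or digits
# only, anchored to the expected keywords), so a format change yields
# no match rather than garbage.
NAME_RE = re.compile(
    r"^(\S+)[ \t]+is[ \t]+(?:up|down|administratively[ \t]+down)[ \t]*,"
)
MTU_RE = re.compile(r"^[ \t]*MTU[ \t]+(\d+)[ \t]+bytes\b")


@dataclass
class Interface:
    # Rule 3: every parsed field may be missing (hypothetical record).
    name: Optional[str] = None
    mtu: Optional[int] = None


def parse_interface(text: str) -> Interface:
    """Hypothetical parser: fills in only the fields it can match exactly."""
    result = Interface()
    for line in text.splitlines():
        if result.name is None and (m := NAME_RE.match(line)):
            result.name = m.group(1)
        elif result.mtu is None and (m := MTU_RE.match(line)):
            result.mtu = int(m.group(1))
    return result


iface = parse_interface(SAMPLE)
# Rule 3: check for None before using any parsed value.
if iface.name is not None and iface.mtu is not None:
    print(f"{iface.name}: MTU {iface.mtu}")  # prints "GigabitEthernet0/1: MTU 1500"
else:
    print("MTU not found; the output format may have changed")
```

Feeding it a changed format such as `  MTU: 9216` leaves `mtu` set to `None` instead of producing a wrong number, which is the “no data rather than garbage data” outcome of rule (2).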
It looks like Cisco (having to deal with a historic codebase with printf statements sprinkled all over it) did something like what Andrew suggested at least once (pyATS/Genie), if not twice (ConfD on Cisco IOS/XE).
Not surprisingly, Tony disagreed (probably based on his battle scars):
Sorry, it’s mostly putting lipstick on you know what. As a vendor, it’s impossible to know what kind of “smart regexes” some customer put in that can deal with “any change” until they can’t. Because whatever the “smart regex” is, it’s still something that fundamentally does not understand the semantic structure of the underlying output. And having dealt with some of it, maintaining such “super smart regexes” with backtracking and whatever else is about the third circle of hell …
I have to agree with Tony: regexes suck, and I always prefer working with structured data… if only the vendors didn’t make it so cumbersome that it’s easier to deal with the pain of screen-scraping.