Essential International Standards and Registries for Web Developers
- Programming, Quality Assurance, Security
Latest revision:
The following is a collection of free international standards, registries and references that I gathered over the years while developing websites and web services. These references, while very precise and technical by nature, are extremely useful to ensure that a specific implementation is actually correct, and to mitigate unexpected interoperability issues between systems on the Internet.
As it's not always clear how a technology is used based on its name or acronym, I included the primary use case for each reference along with its name and/or acronym.
Also, many of these standards build on one another, so I tried to list them in that order as much as possible while maintaining clarity.
Note that many of these standards use a metalanguage defined by the following:
- Augmented Backus-Naur Form (ABNF) base definition
- Augmented Backus-Naur Form (ABNF) case-sensitive extension
Real world stuff
The following provides IDs and critical information about worldwide social, political and cultural concepts often referenced.
- Country code ISO database (ISO 3166 codes)
- Country code United Nations list (M49)
- Currency code list (ISO 4217 codes)
- Language code list (ISO 639 codes)
- Language tags
- Matching language tags
- Language subtag list
- Language tag extension list
- Phone numbering (E.164)
- Phone numbers notation (E.123)
- Time zone database
- Timestamps
Plain text
The following explains how text is handled by a computer.
- Character ID list (Unicode)
- Normalizing equivalent character strings (UAX #15)
- Unicode normalization charts
- Character encoding list for general use
- Character encoding list for web pages
- Encoding binary data as text (Base64)
Note that the most popular character encoding is UTF-8, a superset of ASCII.
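As a rough illustration of the concepts listed above, here is a minimal Python sketch using only the standard library. It shows two equivalent character strings being normalized, UTF-8 producing the same bytes as ASCII for ASCII characters, and Base64 carrying binary data as text. The sample characters are arbitrary choices for the example.

```python
import base64
import unicodedata

# Two visually identical spellings of "é":
# a single precomposed character vs. "e" followed by a combining accent.
composed = "\u00e9"
decomposed = "e\u0301"

# The raw strings differ, but they are equal once normalized to NFC.
assert composed != decomposed
assert unicodedata.normalize("NFC", decomposed) == composed

# UTF-8 encodes ASCII characters as the same single bytes ASCII uses,
# and other characters as multi-byte sequences.
assert "A".encode("utf-8") == "A".encode("ascii") == b"A"
utf8_bytes = composed.encode("utf-8")  # b'\xc3\xa9'

# Base64 turns arbitrary bytes into ASCII-only text and back.
encoded = base64.b64encode(utf8_bytes)  # b'w6k='
decoded = base64.b64decode(encoded).decode("utf-8")
```

Normalizing before comparing or storing user-provided text is one way to avoid treating equivalent strings as different values.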
IP addresses
The following explains how computers can identify and talk to each other on the Internet.
Domain names
The following explains how to find information about a particular domain on the Internet, including the IP addresses of its services.
Note that host names are domain names on which a website can be hosted.
- Host name original definition
- Host name extended definition (section 2.1)
- Domain names (DNS) part 1
- Domain names (DNS) part 2
- Domain name resource record types (RR TYPEs) list
- Wildcards in domain names
- Punycode
- International domain names (IDNA) part 1
- International domain names (IDNA) part 2
- International domain names (IDNA) part 3
- International domain names (IDNA) part 4
- International domain names (IDNA) part 5
- International domain names (IDNA) contextual rules list
- Special domain name list
- Domain name root zone list
- Domain name public suffix list
Note that the original version of IDNA is not fully backwards-compatible with the current version. While the current version is used in all major browsers nowadays, some other clients may still be in transition. For more information, refer to UTS #46.
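To make the compatibility issue more concrete, here is a small Python sketch. It relies on the standard library's built-in "idna" codec, which implements the original version of IDNA, so it should be read as an illustration of the older behavior rather than of what current browsers do. The domain names are made up for the example.

```python
# Punycode alone: the ASCII-compatible encoding of a single label.
label = "bücher".encode("punycode")  # b'bcher-kva'

# The built-in "idna" codec applies the original IDNA rules to a whole
# domain name, adding the "xn--" prefix to each encoded label.
domain = "bücher.example".encode("idna")  # b'xn--bcher-kva.example'

# Under the original rules, "ß" is mapped to "ss" before encoding,
# so the resulting name no longer contains the original character.
mapped = "straße.example".encode("idna")  # b'strasse.example'

# Decoding restores the Unicode form of an encoded name.
restored = b"xn--bcher-kva.example".decode("idna")  # 'bücher.example'
```

The current version of IDNA treats "ß" as a valid character in its own right, which is one example of how the same name can resolve differently depending on which version a client implements.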
TLS
The following explains how an encrypted connection can be established between two machines over a network.
- Sharing public encryption keys (X.509 certificates) (often misnamed "SSL certificates")
- Secure communication protocol (TLS)
- Secure communication protocol (TLS) parameter lists
- X.509 certificate management automation (ACME)
Note that SSL is an obsolete technology that was superseded by TLS.
Also note that another standard, DANE, enables TLS public keys to be stored securely in the DNS. This technically makes certificates obsolete and also enables mandatory secure connections to servers, but it currently suffers from operational issues that prevent widespread support and deployment.
Emails
The following explains how emails work.
Note that Pluralsight subscribers can watch my course Configuring and Managing SPF, DKIM, and DMARC, which covers some of these topics.
- Transmission of emails (SMTP)
- Email base definition
- Required email addresses
- Emails with multiple senders
- Email extensions part 1
- Email extensions part 2
- Email extensions part 3
- International emails
- Email message header list
- Domain-based email sources authorization (SPF)
- Updating email source during forwarding (SRS)
- Cryptographic signatures on emails (DKIM)
- Cryptographic signatures on emails (DKIM) parameter list
- Domain-based email authentication policy (DMARC)
- Email authentication for international emails
- Email authentication parameter lists
- SMTP over TLS reporting (TLSRPT)
- Strict SMTP over TLS (MTA-STS)
XML
The following explains how to use XML, a data format on which several web technologies (such as SVG, MathML and Atom) are based.
- XML 1.0
- Navigating XML documents (XPath 1.0)
- Defining XML schemas (XSD) part 1
- Defining XML schemas (XSD) part 2
Note that newer versions of XML and XPath exist, but are seldom used.
JSON
The following explains how to use JSON, a data format commonly used by websites.
URLs
The following explains how to interpret URLs.
- URLs
- URL scheme list
- about URL scheme token list
- Using URLs as data
- Well-known URLs
- Well-known URLs suffix list
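As a quick illustration of what interpreting a URL involves, here is a minimal Python sketch that splits a URL into its components and decodes a URL that carries data inline. The URLs are hypothetical, and Python's urllib.parse follows its own parsing rules, which may differ from the referenced standards in edge cases.

```python
import base64
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL, used only for illustration.
url = "https://user@example.com:8443/path/page?lang=en&lang=fr#top"
parts = urlsplit(url)

scheme = parts.scheme          # 'https'
host = parts.hostname          # 'example.com'
port = parts.port              # 8443
path = parts.path              # '/path/page'
query = parse_qs(parts.query)  # {'lang': ['en', 'fr']}
fragment = parts.fragment      # 'top'

# A "data" URL carries its content inline instead of pointing at a server.
data_url = "data:text/plain;base64,SGVsbG8="
header, _, payload = data_url.partition(",")
text = base64.b64decode(payload).decode("utf-8")  # 'Hello'
```

Relying on a parser rather than manual string splitting avoids common mistakes, such as confusing the user information with the host name.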
HTTP
The following explains how web clients interact with websites.
- HTTP and HTTPS fundamentals
- HTTP/1.1
- HTTP/2
- HTTP/3
- HTTP method list
- HTTP parameter lists
- HTTP status code list
- HTTP header list
- Media type list
- Cache directive list
- Cookies
- Strict HTTP over TLS (HSTS)
- Strict HTTP over TLS (HSTS) preload directive
Static web
The following explains how to write a web page.
- Web fundamentals
- Web interface definition language (Web IDL)
- Web document representation (DOM)
- Fetching web resources (Fetch)
- Hypertext (HTML)
- Styling (CSS)
- Raster graphics with lossless compression (PNG)
- Scalable vector graphics (SVG)
- Mathematical formulas (MathML)
- Restricting unauthorized content (CSP)
- News feeds (Atom) (often misnamed "RSS")
- Security research management (security.txt)
- Security research management (security.txt) field list
Note that RSS is an obsolete technology that was superseded by Atom.
Metadata
The following describes non-standard HTML meta tags found on the Internet. Note that standard ones are documented in the HTML specifications.
- Meta tags understood by Facebook (Open Graph)
- Meta tags understood by Twitter
- Publicly-known non-standard meta tag list
Client-side programming
The following explains how to write and automatically interact with dynamic web pages. Note that some API implementations are already described in the HTML definition listed in the previous section.
- High-level programming (ECMAScript) (often misnamed "JavaScript")
- API to display notifications
- API to display elements fullscreen
- API to fetch data from a server (XMLHttpRequest) (not related to XML)
- API to communicate with a server (WebSockets)
- API to communicate with other web clients (WebRTC)
- APIs for data streams
- API to manage local or session storage
- API to access the file system
- Low-level programming (WebAssembly)
- API to access low-level code
- Low-level code integration with the web
- HTTP API to access a web browser's UI (WebDriver)
External APIs
The following describes common ways web servers implement publicly-facing APIs.
- APIs using HTTP (REST)
- Defining REST APIs (OpenAPI) (often misnamed "Swagger")
- Data model query and manipulation language (GraphQL)
Note that I am not including SOAP, WSDL or any other technology used on top of them due to their many competing versions and extensions, and as they are rarely used outside of complex financial transactions. Many API providers that do use them generally offer REST APIs anyway.
Data access management
The following explains how websites should manage access to secure data, including cases where authentication is done by a third party.
- Usernames and passwords containing international characters
- HTTP authentication scheme list
- One-time passwords (OTP)
- Time-based one-time passwords (TOTP)
- Passwordless authentication (WebAuthn)
- Passwordless authentication (WebAuthn) attestation statement formats and extensions
- XML-based single sign-on (SAML)
- JSON-based authorization (OAuth 2.0)
- JSON-based authorization (OAuth 2.0) bearer tokens
- JSON-based authorization (OAuth 2.0) parameter list
- JSON-based identity validation (OpenID Connect)
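To give an idea of how time-based one-time passwords work in practice, here is a minimal Python sketch. It assumes the HMAC-based construction with SHA-1 that TOTP is commonly paired with, a 30-second time step, and hypothetical function names. It is meant as an illustration only, not as a replacement for a vetted library.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Compute an HMAC-based one-time password for a counter value."""
    # Sign the counter, encoded as an 8-byte big-endian integer.
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte select an offset.
    offset = digest[-1] & 0x0F
    # Read 4 bytes at that offset and clear the sign bit.
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    # Keep only the requested number of decimal digits, zero-padded.
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute a time-based one-time password for a Unix timestamp."""
    # The counter is the number of whole time steps since the Unix epoch.
    return hotp(secret, unix_time // step, digits)


# Sample shared secret, used only for illustration.
sample_secret = b"12345678901234567890"
sample_code = totp(sample_secret, 59, digits=8)
```

Since both sides derive the code from a shared secret and the current time, a server typically also accepts codes from adjacent time steps to tolerate clock drift.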
Accessibility
The following explains how to write web pages that are accessible to people with disabilities.
- Accessibility guidelines for web pages (WCAG)
- Adding accessibility information to web pages (ARIA)
- Implicit accessibility information in HTML
Markdown
The following defines a human-readable plain text format that can be easily converted to hypertext with tools.
Other common data formats
The following defines a few other common data formats that can be found on the web.
- Spreadsheet table data (CSV)
- End-to-end encryption or signing (OpenPGP) (often misnamed "PGP" or "GPG")
- End-to-end encryption or signing (OpenPGP) parameter lists
- Human-readable JSON superset (YAML) (surprisingly complex)
- Compressed archives (ZIP)
- Semantic version numbering (optional)