Ontology Had Its Moment. Then Another One. We Were There for Both.

2024: Conferences Again
SEMANTiCS 2024 in Amsterdam, ISWC 2024 with its "Large Language Models for Ontology Learning Challenge," the Semantic Web Journal's special issue on "Large Language Models, Generative AI and Knowledge Graphs." Twenty years after OWL became a W3C standard, ontology is having a moment again.
The Word
Ontos (being) plus logos (study). Aristotle's Categories around 350 BCE offered ten primitives for describing what exists: substance, quantity, quality, relation, place, date, posture, state, action, passion. In the 1970s, AI researchers borrowed the term because it fit, and Tom Gruber codified the computer science definition in 1993: "an explicit specification of a conceptualization," meaning a formal way to describe what things are and how they relate.
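Gruber's phrase is abstract, but what it names is concrete: enumerate the kinds of things that exist and the relations among them, explicitly enough for a machine to check. A minimal sketch, with class and relation names invented for illustration rather than taken from any standard ontology:

```python
# An "explicit specification of a conceptualization" in miniature:
# the classes, the subclass links, and the relations are all written
# down rather than left implicit. Names are invented for illustration.
ONTOLOGY = {
    "classes": {"Instrument", "Bond", "Equity", "Issuer"},
    "subclass_of": {("Bond", "Instrument"), ("Equity", "Instrument")},
    "relations": {("Instrument", "issued_by", "Issuer")},
}

def is_a(cls: str, ancestor: str) -> bool:
    """True if cls equals ancestor or is a transitive subclass of it."""
    if cls == ancestor:
        return True
    return any(
        child == cls and is_a(parent, ancestor)
        for child, parent in ONTOLOGY["subclass_of"]
    )

assert is_a("Bond", "Instrument")  # the machine can now answer "what is a Bond?"
```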
First Wave: 1994-2006
Tim Berners-Lee outlined the Semantic Web at the first WWW Conference in 1994, describing a web readable by machines. The 2001 Scientific American article with Hendler and Lassila crystallized the vision: everyone marks up their pages with semantic metadata, and intelligent agents traverse the structured web.
The problem was adoption. Healthcare had HIPAA requirements, so medical ontologies like SNOMED CT got built. Search engines had business models, so Google built knowledge graphs. The open web had neither incentive nor mandate, which meant JSON-LD survived for SEO purposes and knowledge panels survived for search, but the universal semantic web failed to materialize.
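The surviving fragment is easy to picture. A hedged sketch of the kind of schema.org JSON-LD that pages still embed for search engines; the organization and URLs below are invented:

```python
import json

# The piece of the Semantic Web that stuck: schema.org markup embedded
# in pages as JSON-LD so search engines can build knowledge panels.
# The organization name and URLs are invented for illustration.
markup = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Corp",
    "url": "https://example.com",
    "sameAs": ["https://en.wikipedia.org/wiki/Example_Corp"],
}

print(json.dumps(markup, indent=2))
# Served inside a <script type="application/ld+json"> tag on the page.
```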
Second Wave: 2023-Present
GPT-4 was trained on essentially the entire internet, and it is fluent and confident even when it is wrong. The initial fix was Retrieval-Augmented Generation: rather than relying on what the model memorized, retrieve relevant documents at inference time and include them in the prompt. Text chunks work up to a point, but the relationships between facts stay implicit, and multi-hop reasoning over them is fragile.
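Stripped to its core, the pattern is short enough to sketch. Everything below is illustrative: the word-overlap retriever and the stubbed llm stand in for a real embedding index and model, and the toy corpus shows where multi-hop breaks, since the answer lives in two chunks the retriever has no structural reason to connect.

```python
# Minimal RAG loop. The retriever is a toy word-overlap scorer and
# `llm` is a stub; both are stand-ins, not any specific library's API.

def overlap(a: str, b: str) -> int:
    """Crude relevance score: count of shared lowercase words."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def llm(prompt: str) -> str:
    """Stand-in for a model call; a real system would generate here."""
    return f"[model answers from]\n{prompt}"

def rag_answer(question: str, corpus: list[str], k: int = 2) -> str:
    top = sorted(corpus, key=lambda d: overlap(question, d), reverse=True)[:k]
    context = "\n".join(top)  # retrieved chunks; relationships stay implicit
    return llm(f"Context:\n{context}\nQuestion: {question}\nAnswer:")

docs = ["Bond X was issued by City Y.", "City Y is in State Z."]
print(rag_answer("Which state is behind Bond X?", docs))
# The second hop (City Y -> State Z) is retrieved only because its chunk
# happens to share words with the question. That's the fragility.
```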
Microsoft's GraphRAG in 2024 integrated knowledge graphs into the retrieval pipeline, where the graph structure explicitly captures entity relationships. The model traverses connections instead of hoping they're implicit in text. Gartner now predicts graph technologies will appear in 80% of data and analytics innovations by end of 2025, up from 10% in 2021. Structured knowledge matters because LLM reliability demands it.
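The difference is visible in miniature. A hedged sketch of graph-guided retrieval, with invented data, and not a rendering of GraphRAG's actual pipeline: instead of hoping two chunks co-occur, follow explicit edges outward from the entities in the question.

```python
# Graph-guided retrieval in miniature: collect facts by walking explicit
# edges from a seed entity instead of hoping relationships co-occur in
# retrieved text chunks. Data is illustrative, not GraphRAG's pipeline.
from collections import deque

EDGES = {
    "Bond X": [("issued_by", "City Y")],
    "City Y": [("located_in", "State Z")],
}

def hop_context(start: str, hops: int = 2) -> list[str]:
    """Collect facts reachable within `hops` edges of a seed entity."""
    facts, frontier = [], deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for rel, target in EDGES.get(node, []):
            facts.append(f"{node} --{rel}--> {target}")
            frontier.append((target, depth + 1))
    return facts

print(hop_context("Bond X"))  # both hops retrieved explicitly, in order
```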
2014
Same municipal bond, three different prices across systems at one institution. These were material discrepancies affecting portfolio decisions, not rounding errors. The problem haunted me for years as I watched every platform speak its own language.
Simon Property Group: "Real Estate" in one system, "Financials" in another (old GICS), "Alternatives" in a third (endowment model), "Discretionary" in a fourth (consumer exposure lens). Same REIT, four classifications, four risk models, four different answers to simple questions. DTCC and Oliver Wyman estimated the industry cost at $20-40 billion annually in reconciliation, failed trades, and manual mapping. You cannot surface cross-asset patterns when systems disagree on what assets are.
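The failure mode is mechanical rather than subtle. A minimal sketch, with simplified system names and labels:

```python
# The reconciliation problem in miniature: four systems, one REIT,
# four sector labels. System names and labels are simplified stand-ins.
CLASSIFICATIONS = {
    "risk_system":     {"SPG": "Real Estate"},
    "accounting":      {"SPG": "Financials"},     # pre-2016 GICS
    "endowment_model": {"SPG": "Alternatives"},
    "exposure_lens":   {"SPG": "Discretionary"},
}

sectors = {system: m["SPG"] for system, m in CLASSIFICATIONS.items()}
print(sectors)
# Any cross-system query ("total Real Estate exposure?") returns a
# different answer depending on which system you ask.
assert len(set(sectors.values())) == 4  # four answers to one question
```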
2016-2020
Early prototypes classified bonds by cash flows rather than legal names. The key was a directed acyclic graph that let instruments inherit multiple properties: a convertible bond is both debt and equity-linked without the system choking, and acyclicity rules out circular dependencies. By 2020, machine learning classified new securities with 99.7% accuracy across 10+ million instruments, and the models found patterns humans missed, including munis behaving like corporate debt and REITs trading like fixed income.
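A minimal sketch of the mechanism, not the production system's actual schema: nodes may have multiple parents, classification flows down through all of them, and an edge that would create a cycle is rejected at insertion time.

```python
# Classification DAG in miniature (illustrative, not the real schema):
# a node may have several parents, so a convertible bond inherits from
# both Debt and EquityLinked, and cycle-creating edges are refused.
PARENTS: dict[str, set[str]] = {}

def ancestors(node: str) -> set[str]:
    """All nodes reachable by walking parent links upward."""
    out: set[str] = set()
    for parent in PARENTS.get(node, ()):
        out |= {parent} | ancestors(parent)
    return out

def add_edge(child: str, parent: str) -> None:
    """Link child under parent, refusing edges that would form a cycle."""
    if parent == child or child in ancestors(parent):
        raise ValueError(f"{child} -> {parent} would create a cycle")
    PARENTS.setdefault(child, set()).add(parent)

add_edge("Bond", "Debt")
add_edge("ConvertibleBond", "Bond")
add_edge("ConvertibleBond", "EquityLinked")  # multiple parents, legal here
assert {"Debt", "EquityLinked"} <= ancestors("ConvertibleBond")
```

Multiple parents are what a tree cannot give you; the acyclicity check is what keeps inheritance well-defined.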
This became ReferenceModel, the classification backbone for Aaim's lending infrastructure and the asset taxonomy for SocketCloud's distributed systems.
Convergence
We weren't tracking LLM papers in 2014; we were solving a concrete problem: financial systems couldn't agree on what things were. The LLM researchers needed structured knowledge to ground model outputs; we needed it to make incompatible systems interoperable. Same destination, different routes.
Unstructured information has limits, and at some point you need explicit relationships and formal categories. Aristotle used substance and quality, Gruber used conceptualizations, the LLM researchers use knowledge graphs, and we use directed acyclic graphs of financial instruments.
