
ontology & what palantir does

given that we're building the palantir killer it's probably good to share a primer on what palantir does that isn't just "gov stuff + consulting"

this article aims to explain the origin of ontology and how the world developed w/o it. why we need it. how Palantir developed it. why the world needs one.

it was also adapted from an internal textql memo that was horribly written by a total moron — your mileage may vary

how the world developed w/o ontology [setting]

the state of the world today

  • there are two types of data companies

    • horizontal data companies sell to IT & R&D, can be customized, can handle generalized data, and can be built on AWS

    • vertical data companies have rigid, specialized data models, cannot be customized, sell to business teams, and have to deliver value

no product is both customizable and built for business value delivery

how it came about

  • if you sell to IT teams [general]

    • they don’t care about business value

    • they want generalizability

    • results in thinner and thinner layers on the cloud providers

    • you build it for as flexible a data model as possible, w/ no templates

  • if you sell to business teams [opinionated]

    • they don’t care about generalizability [in the absolute sense]

    • they want business value per unit cost to be high

    • results in competition forcing you to forfeit the long tail of generalizable business use cases

    • you build as little data model as possible - cannot add to it

these budgets are divided by the cloud providers

  • if your user has to think about the cloud providers, it’s an IT team

  • if your user doesn’t, it’s a business unit

so how did this ecosystem evolve?

over time, this resulted in both apps and infra becoming thinner and thinner layers of pointier and pointier point solutions

  • for infra

    • you have storage and compute [snowflake]

      • but then transformations on that [dbt]

        • and then monitoring of those pipelines [monte carlo]

          • and then catalogs of those monitors [atlan]

  • for apps

    • netsuite: need customer object… to run business

      • salesforce: need specific customer object for sales

        • gong: need specific customer object for sales call recording

          • clari: need specific customer object for using notes from sales call records to help forecast sales comp planning

  • over time this means a combinatorial explosion of data models

    • there’s nothing connecting them

    • there is no shared ontology that you can use to bring things together

      • the apps aren’t built on top of the data
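to make the "combinatorial explosion" concrete, here is a minimal sketch. the field names below are made up for illustration (they are not the real schemas of any of these products). it just counts how many translations you need when every app carries its own customer object, versus when each app maps once to a shared object.

```python
from itertools import combinations

# Hypothetical field names, for illustration only; not real vendor schemas.
app_customer_fields = {
    "netsuite": "customer_id",
    "salesforce": "account_id",
    "gong": "company_ref",
    "clari": "opportunity_account",
}


def pairwise_mappings(apps):
    """With nothing shared, every pair of apps needs its own translation."""
    return list(combinations(sorted(apps), 2))


def shared_mappings(apps, shared_object="ontology.Customer"):
    """With a shared object, each app maps exactly once to the common definition."""
    return [(app, shared_object) for app in sorted(apps)]


pairs = pairwise_mappings(app_customer_fields)
hub = shared_mappings(app_customer_fields)

print(f"pairwise translations: {len(pairs)}")
print(f"mappings to a shared object: {len(hub)}")
```

with 4 apps that is 6 pairwise translations versus 4 shared mappings; with 10 apps it is 45 versus 10. the gap grows quadratically, which is one way to read "nothing connecting them".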

why we need ontology [problem]

this seems fine - why is this a problem?

it’s a frog boiling in water: each point solution feels fine, but the company pays a growing debt of complexity

entity resolution is impossible to disentangle

source of truth is unresolvable

ontology / semantic layer is unresolvable

resulting in:

  1. you don’t know what’s going on - too many competing answers

    • this results in companies taking 10^5 times longer to answer questions than they need to

      • ^this also means AI cannot meaningfully find the right answer

  2. you pay way more money, in both # of apps & amount of data stored

    • this results in companies paying 5-10x more for software than they need to
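a toy illustration of "too many competing answers", assuming made-up records and a deliberately naive name-normalization rule (real entity resolution is much harder than stripping suffixes). the same question, "how many customers do we have?", gets a different answer depending on whether the systems agree on what a customer is.

```python
import re

# Made-up records: (source system, customer name as that system stores it).
records = [
    ("salesforce", "Acme Corp."),
    ("netsuite", "ACME Corporation"),
    ("gong", "acme corp"),
    ("salesforce", "Globex Inc"),
    ("netsuite", "Globex, Inc."),
]

# Assumed list of legal suffixes to ignore; purely illustrative.
SUFFIXES = {"corp", "corporation", "inc", "llc", "ltd"}


def normalize(name):
    """Lowercase, drop punctuation, and drop common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", "", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


# Answer 1: count distinct names exactly as each system stores them.
naive_count = len({name for _, name in records})

# Answer 2: count distinct names after resolving them to one entity each.
resolved_count = len({normalize(name) for _, name in records})

print(naive_count, resolved_count)
```

here the naive count is 5 and the resolved count is 2. neither a person nor an AI can tell which is right without a shared definition of the entity.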

who benefits from this trend?

  • every vendor that doesn’t deserve to exist benefits from this trend

    • but in a combinatorial way, the underlying infra benefits [cloud providers]

      • the cloud providers get 10x more compute as “account info” is stored 100x redundantly

      • and every calculation is done 100x more times than necessary

so why can’t we build a generalizable business solution for all of this?

it’s cost-prohibitive to get off the ground

to reach business value parity w/ a point solution but a generalizable data model, you need to build out 10x as much

you cannot get a business team to pay 10x as much to subsidize the early R&D

you cannot motivate an IT team to pay 10x as much

if you observe the

how Palantir developed ontology [solution]

so that’s it? everyone’s fucked against a future of more and more layers?

unless there is an industry sector where budgets like this are more integrated

where you can bundle a huge amount of use cases under a large enough value prop that you can justify building the whole thing in a generalized way

this would be able to subsidize the R&D costs of building out so much, and all the unknown unknowns

a solution like this once built could then make its way down the market since the R&D costs are already fronted

the government is a huge use case, w/ huge budgets, and no centralized IT spending but rather end-to-end contracts

served by a company like… Palantir

who’s been making its way into the commercial sector

after years of false starts and failures

w/ a generalized system of record

which it started selling through $100M contracts like the one w/ Swiss Re

and is recently releasing PLG versions of their platform… via AIP

Palantir has this?

yes.

palantir was basically in a hyperbaric chamber training like Vegeta — and subsidized by $4B of losses and costs from peter thiel

to develop a platform that can connect the app layer to build directly on top of a generalized storage

thus creating a composable system of record

Palantir has every component of the modern data stack

  • ingest

  • storage

  • transform

  • query

  • model

  • notebook

  • ML

but also a generalized framework for business app building, for apps like

  • customer service engine [general]

  • accounts payable automation [finance]

  • dynamic scheduling [provider healthcare]

  • inventory rebalancing [supply chain]

  • warranty claims [fraud and insurance]

because they owned both sides of the equation - the business case and the general data infra, they needed to build the connective tissue to connect both pieces. so they developed Ontology
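as a rough mental model of that connective tissue, here is a minimal sketch. this is not Palantir's API; every class, function, and field name below is hypothetical. the idea it shows: a business object is defined once on top of general storage, and multiple apps read that same object instead of each keeping its own copy.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectType:
    """A business concept defined once; `rows` stands in for general storage."""
    name: str
    primary_key: str
    rows: list = field(default_factory=list)

    def get(self, key):
        # Linear scan is fine for a sketch; a real system would index this.
        for row in self.rows:
            if row[self.primary_key] == key:
                return row
        return None


@dataclass
class Ontology:
    """Registry of object types that every app builds on."""
    object_types: dict = field(default_factory=dict)

    def register(self, object_type):
        self.object_types[object_type.name] = object_type

    def lookup(self, type_name, key):
        return self.object_types[type_name].get(key)


ontology = Ontology()
ontology.register(ObjectType("Customer", "customer_id", [
    {"customer_id": "c1", "name": "Acme", "open_invoices": 2, "open_tickets": 1},
]))


# Two hypothetical apps built on the same Customer object.
def billing_view(ont, customer_id):
    c = ont.lookup("Customer", customer_id)
    return f"{c['name']}: {c['open_invoices']} open invoices"


def support_view(ont, customer_id):
    c = ont.lookup("Customer", customer_id)
    return f"{c['name']}: {c['open_tickets']} open tickets"


print(billing_view(ontology, "c1"))
print(support_view(ontology, "c1"))
```

both views resolve "c1" to the same record, so there is one source of truth for the customer and adding a third app does not add a third customer model.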

Part 2

this is a 101 for what palantir does, how it relates to the modern data stack

this article assumes that you know what the modern data stack is

if you don’t know - do not look at matt turk’s trash; use this https://a16z.com/emerging-architectures-for-modern-data-infrastructure/

  • it’s a meme that no one knows what palantir does

    • palantir is the apple of the modern data stack, in that it has built an integrated ecosystem with tightly coupled inter-operability across all the pieces of the data stack

in order of furthest from the user to closest to the user, the world is

| category | MDS Winner | Palantir Foundry Equivalent |
|---|---|---|
| Data Infra | | |
| Ingestion and Transport | Fivetran | HyperAuto |
| Orchestration & Transformation | Airflow / dbt | Pipeline Builder |
| Storage & Compute | Snowflake / Databricks | Foundry |
| Semantic Layer | [no winner, LookML] | Ontology |
| Analytics | | |
| Business Intelligence | Tableau | Quiver |
| GraphDB UI | Neo4J | Vertex |
| Notebooks | Hex / Jupyter | Contour / Code Workbook |
| Office | | |
| Document Processing | Word / Notion | Notepad |
| Spreadsheets | Excel / Equals | Fusion |
| Workflows | | |
| App Building [No Code] | Retool | Workshop |
| App Building [Low Code] | Webflow | Slate |
| Robotic Process Automation | Zapier / UIPath | Automate / Workflow Builder |

  • claiming something is the “apple” of X is a big claim, it implies that there is very strong ecosystem synergy that results from combining it, that no point solution can replicate

    • we believe this claim is true

    • the glue that holds a lot of the apple products together is the core suite of apple apps, and how they feel

    • in the case of palantir - it is ontology

  • recapping the need for ontology

    • palantir was the only company that served use cases requiring unique combinations of data integration → user-facing applications

    • as such they had to build a composable data stack [not unlike the modern data stack] that’s tightly integrated with their composable application stack [which is very different from the rest of the world]

    • this tight integration is held together by a flexible system of record called “ontology”

  • parker conrad’s compound startup principle is here - by bundling everything together

    • they can make use of shared abstractions for interoperability guarantees

      • [in this case, the ontology]

    • they can also charge much more by bundling the costs of a ton of components underneath the parent

  • going to war with SAP

    • ripping and replacing systems of record by replacing workflows on top, before ripping out the whole thing

    • palantir’s internal teams view the world this way