title | subtitle | author | institute | date | lang | keywords | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Exploring FIBO Complexity With Crunchbase |
Representing Crunchbase IPOs in FIBO |
Vladimir Alexiev |
Ontotext Corp |
14-Apr-2024 |
en |
|
The Financial Industry Business Ontology (FIBO) by the Enterprise Data Management Council (EDMC) is a family of ontologies and a reference model for representing data in the financial world using semantic technologies. It is used in fintech Knowledge Graph (KG) projects because it offers a comprehensive and principled approach to representing financial data, and a wide set of predefined models that can be used to implement data harmonization and financial data integration. The 2022Q2 FIBO release consisted of 290 ontologies using 380 prefixes (see [@allemangInfrastructureCollaborativeOntology2021; @garbaczReasoningFIBOOntology2022] for details) that cover topics such as legal entities, contracts, agency, trusts, regulators, securities, loans, derivatives, etc. FIBO's reach and flexible ontological approach allow the integration of a wide variety of financial data, but it comes at the price of more complex representation.
Crunchbase (CB) is a well-known dataset by TechCrunch that includes companies, key people, funding rounds, acquisitions, Initial Public Offerings (IPOs), etc. It has about 2M companies with a good mix of established enterprises (including 47k public companies), mid-range companies and startups. We (Ontotext and other Wikidata contributors) have matched 72k CB companies to Wikidata, see this query.
I explore the representation of Crunchbase data (more specifically IPOs) in FIBO and compare it to the simplest possible semantic representation. I therefore illustrate the complexity of FIBO, and explain its flexibility along the way. I finish with some discussion and conclusions as to when FIBO can bring value to fintech KG projects.
This example is available as open source at https://github.com/VladimirAlexiev/crunchbase-fibo and includes the following files:
Makefile
: orchestrate file generation withmake
ipos-sample.csv
: 4 sample CSV rowscb-model.ttl, cb-model.png
: simple CB model (all CB data) and generated imageipos-agents.ttl, ipos-agents.png
: part of model and generated imageipos-offering.png, ipos-offering.ttl
: part of model and generated imageipos-financials.png, ipos-financials.ttl
: part of model and generated imageipos-currencies.ttl, ipos-currencies.png
: part of model and generated imageipos-fibo.ttl, ipos-fibo.png
: full FIBO IPO model (concatenated) and generated imageipos-fibo.ru
: OntoRefine SPARQL UPDATE transformation generated from the full FIBO IPO modelcommon.h
: C preprocessor file with SPARQL "functions" used in generating transformationprefixes.ttl, prefixes.rq
: all used prefixes in Turtle and SPARQL formatREADME.md, README.html
: this writeupbibliography.bib, bibliography.html
: bibliography source and rendered HTMLacm-sig-proceedings-long-author-list.csl
: bibliography stylepandoc-defaults.yaml
: pandoc settingspandoc-header.html
: JavaScript to enable syntax highlightingcss/*
: styles for syntax highlighting
Crunchbase consists of 18 tables (available as CSV and a JSON API) that cover companies, universities, persons, financial transactions, events (conferences and workshops), etc. The gist Crunchbase Challenge describes the complete database, but in this blog post I focus on Initial Public Offerings (IPOs). An IPO is when a company goes public (is listed at a stock exchange and starts trading), which is considered one possible "exit" for its investors and founders.
Crunchbase has a table ipos
with the following fields:
field | type | descr |
---|---|---|
uuid | string | Unique identifier, never changes |
name | string | Entity name (often empty for IPOs) |
type | string | Entity type (always "ipo" for IPOs) |
permalink | string | Suffix of cb_url. Despite the name, sometimes changes |
cb_url | anyURI | Full Crunchbase URL of the IPO event |
rank | integer | Crunchbase rank (smaller is "more important") |
created_at | dateTime | When the Crunchbase record was created |
updated_at | dateTime | When the Crunchbase record was updated |
org_uuid | string | Points to the company that was listed |
stock_exchange_symbol | string | Exchange code. Uses internal Crunchbase codes that are ambiguous |
stock_symbol | string | Ticker on the exchange |
went_public_on | dateTime | When the company went public (IPO date) |
share_price_usd | decimal | The share price for the stock at the time of IPO, in US dollars |
share_price | decimal | The share price for the stock at the time of IPO, in local currency |
share_price_currency_code | string | Local currency of share price |
valuation_price_usd | decimal | Valuation of the Organization at IPO, in US dollars |
valuation_price | decimal | Valuation of the Organization at IPO, in local currency |
valuation_price_currency_code | string | Local currency of the valuation |
money_raised_usd | decimal | Total amount raised from the IPO, in US dollars |
money_raised | decimal | Total amount raised from the IPO, in local currency |
money_raised_currency_code | string | Local currency of the total amount raised |
There are also organization attributes (org_name, org_cb_url, country_code, state_code, region, city
) that are redundant, thus not shown above.
- The first 8 fields are present in pretty much every CB entity
- The next 13 fields are specific to the IPO entity
stock_exchange_symbol
uses internal Crunchbase exchange codes that are sometimes ambiguous (for example, "MSE" may mean the Madrid Stock Exchange, Metropolitan Exchange in India, Mongolia Stock Exchange, etc). We (Ontotext) have mapped all 156 codes to Wikidata, together with the unambiguous Market Identification Codes (ISO 10383 MIC), see this query:
select ?item ?itemLabel ?cb ?mic {
?item wdt:P7534 ?mic; # ISO MIC
p:P528 [ps:P528 ?cb; # catalog code
pq:P972 wd:Q10846831]. # catalog
service wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
stock_symbol
is the company code on that exchange. This is a valuable field for coreferencing public companies across datasets, but one should beware that:- Stock symbols are only 85% unique world-wide, so they need to be interpreted in conjunction with the Exchange field
- The same company may have several stock symbols (even on the same exchange over time), and symbols may be sold and bought between companies over time
- "Share price" is the stock price at the time of IPO (not sure whether the opening or closing price)
- "Valuation" (market capitalization) is the product of all outstanding shares times the share price
- "Money raised" is the product of share price times the shares sold during the IPO
- These 3 financial factors are represented with 3 fields each, to account for local currencies and USD
For example, here are the first 4 CB IPO records
field | example1 | example2 | example3 | example4 |
---|---|---|---|---|
uuid | 72d30ebd-53ef-2486-6c29-22785c5173ce | 3ad2b068-2d97-f646-0b80-1e5f3d7adfc4 | a265c6f6-4b96-4079-096a-967a37f3da2b | ee426509-826e-5dd0-9309-e79c8f384904 |
type | ipo | ipo | ipo | ipo |
permalink | microsoft-ipo--72d30ebd | the-walt-disney-company-ipo--3ad2b068 | divx-ipo--a265c6f6 | xo-group-ipo--ee426509 |
cb_url | https://www.crunchbase.com/ipo/microsoft-ipo--72d30ebd | https://www.crunchbase.com/ipo/the-walt-disney-company-ipo--3ad2b068 | https://www.crunchbase.com/ipo/divx-ipo--a265c6f6 | https://www.crunchbase.com/ipo/xo-group-ipo--ee426509 |
rank | 31712 | 44186 | 14752 | 19369 |
created_at | 2008-02-09 05:25:18 | 2008-02-09 05:40:32 | 2008-02-25 23:52:11 | 2008-02-29 00:31:34 |
updated_at | 2018-02-12 23:11:05 | 2019-02-25 22:31:49 | 2018-02-12 23:57:54 | 2018-02-12 23:41:42 |
org_uuid | fd80725f-53fc-7009-9878-aeecf1e9ffbb | 756936c0-c335-f0ae-0a3d-fe26bdff5695 | 73296f0d-85a5-78d5-90b3-86c5f8981ba9 | ff8439cf-097c-a88a-9bb9-dd83d23aa14b |
stock_exchange_symbol | nasdaq | nyse | nasdaq | nyse |
stock_symbol | MSFT | DIS | DIVX | XOXO |
went_public_on | 1986-03-13 | 1978-01-13 | 2006-10-22 | 1999-12-02 |
share_price_usd | 16.0 | |||
share_price | 16.0 | |||
share_price_currency_code | USD | |||
valuation_price_usd | 160000000 | |||
valuation_price | 160000000 | |||
valuation_price_currency_code | USD | |||
money_raised_usd | 300000000 | 145000000 | 35000000 | |
money_raised | 300000000 | 145000000 | 35000000 | |
money_raised_currency_code | USD | USD | USD |
The simplest possible semantic representation of CB uses one class per table
(subsidiary tables like org_descriptions
and org_parent
are merged to other classes; and org_parents
is a direct RDF relation).
I make URLs based on entity type and uuid
.
I leverage the fact that UUIDs are globally unique (no conflicts between classes) to put Organizations
and People
under the abstract superclass Agent
and in the same URL namespace cb/agent
:
this is needed since some financial transactions (in particular funding_round
) can involve a mix of persons and organizations.
The following diagram shows the complete CB semantic model of 18 entities:
It is generated from a semantic model in Turtle format (cb-model.ttl
) using rdfpuml [@Alexiev2016-rdfpuml-rdf2rml] and PlantUML.
This model is concatenated from 18 individual per-table models that include source field names in parentheses (also seen on the overall model).
The individual models are used to generate semantic transformations for Ontotext Refine in the form of SPARQL Update queries.
The same updates are used for both initial ingest and update, leveraging a global timestamp and using named graphs per each table row (over 10M graphs).
See gist Crunchbase Challenge and [@Alexiev-ENDORSE-2023] for more details,
including the model source cb-model.ttl
and timing (performance) information of the semantic conversions.
Important: Please note that this model covers all of CB, not just CB IPOs. The IPO class (node) is shown with red border.
Now I turn to representing IPOs in FIBO by using the ontologies from the FIBO 2024Q1 release (see gist Converting FIBO from RDF to Turtle).
I use the following colors in the diagrams below.
- lightblue: agents (Offeror=Issuer, Exchange=RegistrationAuthority)
- yellow: Shares and share-related events (Offering, Listing)
- red: Financial factors (share price, market capitalization, money raised)
- green: currencies (and their codes)
- lightgreen: other codes and codesets
See the next sections for entity diagrams of different kinds of entities (partial models), followed by an overall model diagram, and a binding of the colors to specific RDF classes.
First I represent the agents involved in the IPO: Exchange and Issuer. (Perhaps the main agent is the public that invests money into the shares, but they are not individually represented in FIBO).
- Stock exchanges are identified by internal CB codes.
(As mentioned above, some CB Exchange codes are ambiguous, unlike the ISO 10383 MIC standard).
- FIBO distinguishes between entities and their identifiers/codes in various coding schemes, which allows us to capture the CB exchange code scheme.
- A FIBO related project is working on representing the MIC coding scheme, which involves not just exchanges and their primary markets, but also secondary markets on these exchanges.
- A future effort could coreference CB and MIC exchanges precisely in FIBO, leveraging our work in Wikidata.
- FIBO distinguishes the share Issuer/Offeror (a role) from the Agent playing that role (a CB organization)
- As you can see, many entities obtain multiple types (classes). This allows to capture not only typical situations (as per CB) but also to cover exceptional corner cases
- The
Issuer
of shares is typically also theirOfferor
, but I guess it's possible for the Offeror to be a third party - The
Exchange
is also typically theRegistrationAuthority
of tickers and shares on that exchange, but I guess exceptions are possible
- The
Now I represent the main entities of the IPO event, which in FIBO are:
- Share
- Offering of the share to the general public
- Listing of the share at a stock exchange
- Ticker that is allocated to the share
Please note that PublicOffering
includes a number of non-FIBO properties (i.e. CB custom properties):
cb:uuid '(uuid)';
cb:name '(name)';
cb:permalink '(permalink)';
cb:url '(cb_url)'^^xsd:anyURI;
cb:rank '(rank)'^^xsd:integer;
cb:createdAt 'fixDate(created_at)'^^xsd:dateTime;
cb:updatedAt 'fixDate(updated_at)'^^xsd:dateTime.
Truth be told, these are not very semantic:
- They include CB data about an event (
name, permalink, url, rank
) that is not properly attributed to CB;url
could be mapped to a standard FIBO property. FIBO has a property for "home page" but it's not fair to say that CB's page about an IPO is its home page, because in all likelihood there are better pages at the Exchange and Company websites describing the IPO- If you need a proper ontology for representing web pages and their relation to real-world things, use
schema.org
. It has no less than 5 properties for relations of varying strength and nuance:url, sameAs, mainEntity, about, mentions
. (And there's alsokeywords
used for free-text keywords.)
- They also include book-keeping information (
uuid, createdAt, updatedAt
) that is not about the event, but about CB's record of the event.
However, I didn't want to complicate the model even further by placing these fields in yet more FIBO nodes.
Furthermore, we'd have to repeat the same event nodes for both Offering
and Listing
.
The 3 financial factors (share price, market capitalization, money raised) are expressed in 2 currencies each:
Of the 3 factors, Market Capitalization is represented using a more complex pattern:
- The
<.../marketCap/...>
node beingMarketCapitalization
, which is based on thepricePerShare
and has value<.../marketCapValue/...>
- The
<.../marketCapValue/...>
node beingMonetaryAmount
Please note the URL patterns used by the financial nodes in "national currency" vs in "USD":
<cb/ipo/(uuid)/pricePerShare/(share_price_currency_code)> vs <cb/ipo/(uuid)/pricePerShare/USD>
<cb/ipo/(uuid)/marketCap/(valuation_price_currency_code)> vs <cb/ipo/(uuid)/marketCap/USD>
<cb/ipo/(uuid)/marketCapValue/(valuation_price_currency_code)> vs <cb/ipo/(uuid)/marketCapValue/USD>
In an initial version of the mapping, I used a simpler pattern:
<cb/ipo/(uuid)/pricePerShare> vs <cb/ipo/(uuid)/pricePerShareUsd>
<cb/ipo/(uuid)/marketCap> vs <cb/ipo/(uuid)/marketCapUsd>
<cb/ipo/(uuid)/marketCapValue> vs <cb/ipo/(uuid)/marketCapValueUsd>
The difference is subtle but crucial: the current URL pattern effectively merges the financial nodes where the "national currency" is "USD". Since many IPOs are denominated in USD, I can save a significant number of nodes by using URL patterns that incorporate the currency code.
I assume that the monetary fields (eg (money_raised)
vs (money_raised_usd)
) are identical for US IPOs.
If that's true, then the multiple instances of the same statement will be collapsed by the semantic repository on data ingestion,
so we won't have duplicate statements in the repository.
Finally, I represent the currencies used for the Financials. This includes USD plus up to 3 national currencies for the 3 financials. (I can't imagine an IPO that would use different national currencies for its 3 financials, but Crunchbase has used separate fields, so I represent separate nodes).
Please note that these nodes are shared between all IPOs, so they are not a large number. And for IPOs that use the same currency for their 3 financials, only 1 pair of nodes will be generated.
FIBO distinguishes betwen currencies and their codes in a particular code set (in this case "CrunchBase currency code set").
But CB uses ISO 4217 standard currency codes, and FIBO already includes such data in the ontology FND/Accounting/ISO4217-CurrencyCodes.rdf
, e.g.:
fibo-fnd-acc-4217:USD
rdf:type fibo-fnd-acc-cur:CurrencyIdentifier , owl:NamedIndividual ;
fibo-fnd-rel-rel:hasTag "USD" ;
rdfs:label "USD" ;
cmns-dsg:denotes fibo-fnd-acc-4217:USDollar ;
cmns-id:identifies fibo-fnd-acc-4217:USDollar ;
fibo-fnd-acc-4217:USDollar
rdf:type fibo-fnd-acc-cur:Currency , owl:NamedIndividual ;
cmns-dsg:hasName "US Dollar" .
rdfs:label "US Dollar" ;
fibo-fnd-acc-cur:hasNumericCode "840" ;
cmns-cxtdsg:isUsedBy lcc-3166-1:VirginIslandsBritish ...
So why didn't I reuse the FIBO currency nodes and instead made CB currency nodes? The reason is subtle and is described in fibo/issues/1816:
- As you can see above, currency nodes use the currency name, eg
fibo-fnd-acc-4217:USDollar
- But in CB we have the currency code, eg
USD
- The financial nodes must connect to the currency nodes, not the currency code nodes
- Given
fibo-fnd-acc-4217:USD
, we can very easily findfibo-fnd-acc-4217:USDollar
with a simple SPARQL query - However, most semantic ETL tools (e.g. TARQL, RML, Karma) cannot run a SPARQL query (or even access a simple RDF pattern) while performing ETL over tabular data
- Only tools like OntoRefine, SPARQL Generate and XSPARQL can join tabular source data to RDF data in a repository.
So I decided to play it safe and use the CB currency nodes. The EDM Council has agreed to change the node names a bit to:
fibo-fnd-acc-4217:CurrencyCode-USD, fibo-fnd-acc-4217:Currency-USDollar
And to add a separate coreferencing file that will have sameAs
statements for each currency like this:
fibo-fnd-acc-4217:Currency-USDollar owl:sameAs fibo-fnd-acc-4217:Currency-USD
Generated with the following SPARQL query:
construct {
?curr owl:sameAs ?currAsCode
} where {
?code a fibo-fnd-acc-cur:CurrencyIdentifier; fibo-fnd-rel-rel:hasTag ?c; cmns-id:identifies ?curr.
bind(iri(concat(str(fibo-fnd-acc-4217:),"Currency-",?c)) as ?currAsCode)
}
This will effectively add alias URLs for each currency (the two sameAs
URLs shown above),
so now external data like CB can safely refer to currency nodes like fibo-fnd-acc-4217:Currency-USD
.
Finally, I present the overall FIBO IPOs model:
It's a complex graph:
- It has 25 nodes, of which 11 are shared with other IPOs (the currencies and currency codes) and 14 are per-IPO
- It has 136 triples; compare to the simplest model that uses 1 node and 22 triples
- It uses terms from many ontologies: 13 from FIBO and 8 from the Object Management Group (OMG)'s Commons Ontology Library (see prefixes.ttl).
@prefix cb: <https://ontotext.com/crunchbase/ontology/> .
@prefix cmns-cds: <https://www.omg.org/spec/Commons/CodesAndCodeSets/> .
@prefix cmns-col: <https://www.omg.org/spec/Commons/Collections/> .
@prefix cmns-cxtdsg: <https://www.omg.org/spec/Commons/ContextualDesignators/> .
@prefix cmns-dsg: <https://www.omg.org/spec/Commons/Designators/> .
@prefix cmns-dt: <https://www.omg.org/spec/Commons/DatesAndTimes/> .
@prefix cmns-id: <https://www.omg.org/spec/Commons/Identifiers/> .
@prefix cmns-qtu: <https://www.omg.org/spec/Commons/QuantitiesAndUnits/> .
@prefix cmns-rlcmp: <https://www.omg.org/spec/Commons/RolesAndCompositions/> .
@prefix fibo-fbc-fct-mkt: <https://spec.edmcouncil.org/fibo/ontology/FBC/FunctionalEntities/Markets/> .
@prefix fibo-fbc-fct-ra: <https://spec.edmcouncil.org/fibo/ontology/FBC/FunctionalEntities/RegistrationAuthorities/> .
@prefix fibo-fbc-fi-fi: <https://spec.edmcouncil.org/fibo/ontology/FBC/FinancialInstruments/FinancialInstruments/> .
@prefix fibo-fbc-pas-fpas: <https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/FinancialProductsAndServices/> .
@prefix fibo-fnd-acc-cur: <https://spec.edmcouncil.org/fibo/ontology/FND/Accounting/CurrencyAmount/> .
@prefix fibo-fnd-agr-ctr: <https://spec.edmcouncil.org/fibo/ontology/FND/Agreements/Contracts/> .
@prefix fibo-fnd-arr-id: <https://spec.edmcouncil.org/fibo/ontology/FND/Arrangements/IdentifiersAndIndices/> .
@prefix fibo-fnd-rel-rel: <https://spec.edmcouncil.org/fibo/ontology/FND/Relations/Relations/> .
@prefix fibo-ind-mkt-bas: <https://spec.edmcouncil.org/fibo/ontology/IND/MarketIndices/BasketIndices/> .
@prefix fibo-sec-eq-eq: <https://spec.edmcouncil.org/fibo/ontology/SEC/Equities/EquityInstruments/> .
@prefix fibo-sec-sec-id: <https://spec.edmcouncil.org/fibo/ontology/SEC/Securities/SecuritiesIdentification/> .
@prefix fibo-sec-sec-iss: <https://spec.edmcouncil.org/fibo/ontology/SEC/Securities/SecuritiesIssuance/> .
@prefix fibo-sec-sec-lst: <https://spec.edmcouncil.org/fibo/ontology/SEC/Securities/SecuritiesListings/> .
What is the reason for this complexity?
- FIBO distinguishes between Offering (the IPO), Share, Listing (of the share at an exchange), Ticker (a
ReassignableIdentifier
) - FIBO represents the financials as explicit nodes (and Market Capitalization as 2 nodes).
- FIBO represents the stock exchange and its code as two separate nodes
- Finally, FIBO represents currencies and their codes as two separate nodes
I have validated this representation with the EDM Council (see fibo/issues/1808): performed 3 iterations and implemented all suggested additions/corrections.
The initial version of this paper (5-Sep-2023) used FIBO 2022Q2. But there were numerous changes to FIBO since then. This includes:
- Integration of a number of patterns that are now part of the OMG's Commons Ontology Library Specification v1.1 beta, many of which were derived from FIBO and improved on.
- Additional contentw as added to FIBO in securities and derivatives that was not available when this example was initially created.
Elisa Kendall graciously updated the example to use the new ontology terms. For the curious, a detailed diff can be seen on Github. The following namespaces were changed:
- ADDED
@prefix cmns-cds: <https://www.omg.org/spec/Commons/CodesAndCodeSets/> .
@prefix cmns-col: <https://www.omg.org/spec/Commons/Collections/> .
@prefix cmns-dsg: <https://www.omg.org/spec/Commons/Designators/> .
@prefix cmns-dt: <https://www.omg.org/spec/Commons/DatesAndTimes/> .
@prefix cmns-cxtdsg: <https://www.omg.org/spec/Commons/ContextualDesignators/> .
@prefix cmns-id: <https://www.omg.org/spec/Commons/Identifiers/> .
@prefix cmns-qtu: <https://www.omg.org/spec/Commons/QuantitiesAndUnits/> .
@prefix cmns-rlcmp: <https://www.omg.org/spec/Commons/RolesAndCompositions/> .
- DELETED
@prefix fibo-fnd-utl-alx: <https://spec.edmcouncil.org/fibo/ontology/FND/Utilities/Analytics/> .
@prefix fibo-fnd-dt-fd: <https://spec.edmcouncil.org/fibo/ontology/FND/DatesAndTimes/FinancialDates/> .
@prefix lcc-lr: <https://www.omg.org/spec/LCC/Languages/LanguageRepresentation/> .
I made the overall model by simply concatenating the Turtle files of the individual models presented in earlier sections.
Concatenating any Turtle files produces a valid turtle file, since @prefix
and @base
can occur anywhere in Turtle.
(But don't abuse this feature by redefining prefixes mid-way!)
If you have played with UML XMI files, the simplicity of composing semantic models by simply concatenating them should be refreshing. It mirrors the simplicity of semantic data integration by "simply" converting any kind of data to RDF, using consistent URLs, and "pouring" all data into a semantic repository.
I have used some rdfpuml
(PlantUML) layout instructions in the models to improve the look of the overall model.
These specify the direction and length of a few arrows, and set colored circles for the classes (called "UML stereotype").
cmns-cxtdsg:appliesTo puml:arrow puml:up.
cmns-id:identifies puml:arrow puml:up.
cmns-qtu:hasArgument puml:arrow puml:up.
cmns-rlcmp:isPlayedBy puml:arrow puml:up.
fibo-fbc-fi-fi:isDenominatedIn puml:arrow puml:down-4.
fibo-fnd-acc-cur:isPriceFor puml:arrow puml:up.
fibo-fnd-rel-rel:isIssuedBy puml:arrow puml:up.
cmns-cds:CodeSet puml:stereotype "(C,lightgreen)".
cmns-id:IdentificationScheme puml:stereotype "(S,lightgreen)". # also cmns-cds:CodeSet
cmns-id:Identifier puml:stereotype "(I,lightgreen)".
fibo-fbc-fct-mkt:Exchange puml:stereotype "(X,lightblue)". # also fibo-fbc-fct-ra:RegistrationAuthority
fibo-fbc-fi-fi:Issuer puml:stereotype "(I,lightblue)". # also fibo-fbc-pas-fpas:Offeror
fibo-fnd-acc-cur:Currency puml:stereotype "(C,green)".
fibo-fnd-acc-cur:CurrencyIdentifier puml:stereotype "(I,green)".
fibo-fnd-acc-cur:MonetaryAmount puml:stereotype "(A,red)".
fibo-fnd-acc-cur:MonetaryPrice puml:stereotype "(P,red)".
fibo-ind-mkt-bas:MarketCapitalization puml:stereotype "(C,red)".
fibo-sec-eq-eq:PricePerShare puml:stereotype "(P,red)".
fibo-sec-sec-id:TickerSymbol puml:stereotype "(T,lightgreen)".
fibo-sec-sec-iss:PublicOffering puml:stereotype "(O,yellow)".
fibo-sec-sec-lst:ListedSecurity puml:stereotype "(S,yellow)". # also fibo-sec-eq-eq:Share
fibo-sec-sec-lst:Listing puml:stereotype "(L,yellow)".
Although the FIBO IPO model is considerably more complex than the CB IPO model,
we can use the rdf2sparql
script (part of the RDF by Example open source project)
to generate a semantic transformation automatically [@Alexiev-ENDORSE-2023].
ipos-fibo.ru
is an OntoRefine SPARQL UPDATE transformation (240 lines)
generated from the FIBO IPO model that has the following parts:
- 22 prefixes, collected manually from the various model parts
delete graph
that empties the named graph for the row, specified with a comment in the first model file:# GRAPH <cb/ipos/(uuid)>
:
delete {graph ?cb_ipos_uuid_URL {?_s_ ?_p_ ?_o_}}
where {
service <rdf-mapper:ontorefine:PROJECT_ID> {
bind(?c_uuid as ?uuid)
bind(iri(concat("cb/ipos/",?uuid)) as ?cb_ipos_uuid_URL)
bind(?c_updated_at as ?updated_at)
}
<cb> cb:updatedAt ?UPDATED_AT_DT
bind(replace(str(?UPDATED_AT_DT),'T',' ') as ?UPDATED_AT) filter(?updated_at > ?UPDATED_AT)
graph ?cb_ipos_uuid_URL {?_s_ ?_p_ ?_o_}};
CrunchBase has about 18 tables and each row has uuid
and updated_at
timestamp.
This allows us to process the whole dump and daily updates using the same generated set of scripts,
by storing each row of each table in a separate named graph (about 14M graps).
We clear only graphs that have been updated since the last ingest.
The service <rdf-mapper:ontorefine:PROJECT_ID>
pattern is executed against an OntoRefine virtual SPARQL endpoint,
whereas cb:updatedAt
fetches a global timestamp recorded in the real RDF repository.
See rdf2sparql: Global Filtering for details.
- 144 lines in the
insert graph
part that come from the IPO model - 57
bind
to compute values from source CSV fields.
Binds come in several varieties:
- Simple data cleaning, eg
bind(REPLACE(?created_at,' ','T') as ?created_at_FIXDATE)
- Attach datatypes, eg
bind(strdt(?created_at_FIXDATE,xsd:dateTime) as ?created_at_FIXDATE_xsd_dateTime)
- Compute URLs, eg
bind(iri(concat("cb/ipo/",?uuid,"/pricePerShare/USD")) as ?cb_ipo_uuid_pricePerShare_USD_URL)
Here are some of the binds. Hopefully you can appreciate how a simple declarative model is used to generate a complex (even cryptic) transformation:
bind(REPLACE(?created_at,' ','T') as ?created_at_FIXDATE)
bind(strdt(?created_at_FIXDATE,xsd:dateTime) as ?created_at_FIXDATE_xsd_dateTime)
bind(REPLACE(?updated_at,' ','T') as ?updated_at_FIXDATE)
bind(strdt(?updated_at_FIXDATE,xsd:dateTime) as ?updated_at_FIXDATE_xsd_dateTime)
bind(iri(concat("cb/ipo/",?uuid,"/listing")) as ?cb_ipo_uuid_listing_URL)
bind(iri(concat("cb/currency/",?share_price_currency_code)) as ?cb_currency_share_price_currency_code_URL)
bind(iri(concat("cb/ipo/",?uuid,"/ticker")) as ?cb_ipo_uuid_ticker_URL)
bind(iri(concat("cb/currency/",?valuation_price_currency_code)) as ?cb_currency_valuation_price_currency_code_URL)
bind(iri(concat("cb/currency/",?money_raised_currency_code)) as ?cb_currency_money_raised_currency_code_URL)
bind(strdt(?money_raised,xsd:decimal) as ?money_raised_xsd_decimal)
bind(strdt(?money_raised_usd,xsd:decimal) as ?money_raised_usd_xsd_decimal)
bind(iri(concat("cb/ipo/",?uuid,"/marketCap/",?valuation_price_currency_code)) as ?cb_ipo_uuid_marketCap_valuation_price_currency_code_URL)
bind(iri(concat("cb/ipo/",?uuid,"/marketCapValue/",?valuation_price_currency_code)) as ?cb_ipo_uuid_marketCapValue_valuation_price_currency_code_URL)
bind(iri(concat("cb/ipo/",?uuid,"/pricePerShare/",?share_price_currency_code)) as ?cb_ipo_uuid_pricePerShare_share_price_currency_code_URL)
bind(iri(concat("cb/ipo/",?uuid,"/marketCap/USD")) as ?cb_ipo_uuid_marketCap_USD_URL)
bind(iri(concat("cb/ipo/",?uuid,"/marketCapValue/USD")) as ?cb_ipo_uuid_marketCapValue_USD_URL)
bind(iri(concat("cb/ipo/",?uuid,"/pricePerShare/USD")) as ?cb_ipo_uuid_pricePerShare_USD_URL)
FIBO is good for capturing real-world complexity and integrating data from numerous sources, e.g.
- If you need to integrate "tape ticker" information (data about stock prices),
these will mesh nicely with the CB data on "price per share" on the day of the IPO.
The FIBO IPO model uses single
pricePerShare
nodes, but it's easy to multiply these by adding a timestamp to the node URL. In contrast, the simple CB model has only one slot for "price per share". - You can represent evolutions like a ticker being reassigned to another company, or the exchange listing being purchased by another company
However, that flexibility comes at the price of higher complexity. Compare:
- The first figure that shows all of CB in the simplest possible semantic form, to
- The last figure that shows only IPOs represented in FIBO
Should you use FIBO in fintech applications?
- As a definitive reference model, yes.
- As the "physical" semantic integration model: depends on your requirements including:
- Number and variety of sources
- Ability to ingest data sources in the future, without changing
- Complexity of competence questions
- Ability of data consumers (query writers) to navigate the complex graph efficiently
- Possibility to introduce some query abstractions or "facades" to simplify querying. As such examples I can point out to GraphQL (e.g. as implemented over RDF by Ontotext Semantic Objects) and to CIDOC CRM Fundamental Relations search [@Alexiev2012-CRM-search; @Alexiev2013-CRM-reasoning].
Such calls rest ultimately with the Semantic Data Architect, who should evaluate the tradeoffs of different representations.
Finally, I close with a couple of quips regarding FIBO:
- FIBO definitely comes "With Batteries Included".
- But as the Romans said, "Caveat Emptor".
- And as Americans say, "Your Mileage May Vary".
- I am grateful to Elisa Kendall and Pawel Garbacz for their feedback on my IPO representation in FIBO.
- I am especially grateful to Elisa Kendall or updating the example to FIBO version 2024Q1.
::: {#refs} :::