Make AI Good


Te Hiku Media Kaitiakitanga License — initial GitHub publication (2 October 2018)

01 · In focus

One event, in the field.

The structured facts the source records about Te Hiku Media Kaitiakitanga License — initial GitHub publication (2 October 2018), the count of declared adjacencies in the corpus, and the federation map zoomed on this node and its neighbours.

event

1 declared connection

Kind: Event
Status: historical
Confidence: high
Type: license publication
Date: 2018-10-02
Location: Kaitaia, Aotearoa New Zealand (license published to GitHub)
Entity ID: event-te-hiku-media-kaitiakitanga-license-release-2018-10-02

Tags: aotearoa, new-zealand, kaitaia, far-north, te-hiku-o-te-ika, oceania, pacific, polynesia, indigenous-led, maori, te-reo, te-reo-maori, language-revitalisation, indigenous-data-sovereignty, kaitiakitanga, kaitiakitanga-license, license-publication, data-governance, ai-ethics, community-data-governance, speech-recognition, automatic-speech-recognition, digital-colonialism, data-colonialism, anti-colonial, github-release, indigenous-ai, tikanga, community-benefit, stewardship


02 · Connections

1 adjacency, by relation.

Split by direction. Direct links are those named in this event's source record; inferred backlinks are records elsewhere in the corpus that point at this entity.

Direct from this record

1 link

Links named in this entity's structured fields.

03 · Background

From the source record.

Body prose as it appears in movement-graph’s published markdown for this entity. Links to other corpus entities resolve to their graph page; links to deeper repo paths are kept as text so the page does not invent a route.

On 2 October 2018, Te Hiku Media — the Kaitaia-based Māori iwi broadcaster and indigenous-AI organisation — published the Kaitiakitanga License to a public GitHub repository, formalising the tikanga-grounded data-stewardship framework governing how te reo Māori language data collected under its Kōrero Māori project could be accessed, used, and commercialised. The publication marked the first point at which a legally operative indigenous-language data licence built on an explicit anti-colonial governance principle existed as a version-controlled open document — not an open licence in the traditional open-source sense, but a stewardship licence requiring permission and community-benefit conditions that open-source norms structurally cannot provide.

Background: Kōrero Māori and the Lionbridge controversy

The immediate context of the Kaitiakitanga License's publication was the 2017-2018 Kōrero Māori project, a crowdsourcing campaign funded under New Zealand's Ka Hao: Māori Digital Technology Fund. Through it, Te Hiku Media collected te reo Māori speech recordings from community contributors as the basis for training the first te reo Māori automatic speech recognition system. The project demonstrated both the technical viability of community-led indigenous-language AI and the vulnerability of that model to commercial extraction. In May 2018, Lionbridge, an American translation-services company, launched a Facebook recruitment campaign offering Māori speakers US$45 per hour for te reo audio recordings, with the stated aim of training commercial speech-recognition technology. Te Hiku Media refused the approach and publicly characterised it as a form of digital colonialism: the company sought to extract the language corpus for commercial development with no mechanism for community oversight, consent, or benefit-sharing.

Peter-Lucas Jones, then General Manager (later Chief Executive) of Te Hiku Media, framed the Lionbridge model as one in which the company would "translate apps of any description into the native languages of indigenous people, and sell that back", describing digital misappropriation of indigenous language and cultural data as "the last frontier of colonisation". Keoni Mahelona, Te Hiku Media's Native Hawaiian Chief Technology Officer, characterised the prospect of corporate control over indigenous language data as "the icing on the cake of colonisation", noting that the issue was not whether te reo should be digitised but who should lead that digitisation and on what terms. Te Hiku Media published its "Indigenous Data Theft" article on 10 August 2018; 1News carried the controversy on 27 August 2018; the Kaitiakitanga License followed on 2 October 2018, about five weeks after the 1News report.

The licence: kaitiakitanga as a legal-and-tikanga framework

The word kaitiakitanga has no direct English translation but carries the sense of guardianship, protection, and custodianship: a stewardship relationship rather than an ownership one. The licence's preamble frames the distinction from open-source licensing explicitly: by simply open sourcing data and knowledge, it states, indigenous peoples "further allow ourselves to be colonised digitally in the modern world", because open-source norms were built on an assumption of symmetric access to the means of commercial development that indigenous communities structurally do not have. The resulting data relationship, in which an open-source project draws on an indigenous-community corpus, would reproduce the classical extraction pattern: community-contributed data enriches an external actor's commercial product, with no consent boundary, no benefit-sharing mechanism, and no way for the contributing community to revoke or condition access.

The Kaitiakitanga License addresses this by operating on a stewardship model. Its four core terms are: users must contact Te Hiku Media and obtain permission before accessing, using, contributing to, or modifying content in the repository; commercial use without explicit authorisation is prohibited; code derived from the repository remains bound by the same licence; and works utilising the repository's content are subject to the same terms downstream. The governing principle, drawn from the licence's name, is that data is not owned but is cared for under kaitiakitanga, and any benefit derived from that data flows back to the source community. The preamble states that Te Hiku Media will always make time to help indigenous and other underrepresented groups, while noting the licence is unlikely to be applicable within a non-indigenous context. It is explicitly positioned as indigenous infrastructure, not a general-purpose alternative to Creative Commons or the MIT License.

The framing that Peter-Lucas Jones has used in multiple subsequent public contexts makes the political stake concrete: "In the digital world, data is like land. If we do not have control, governance, and ongoing guardianship of our data as indigenous people, we will be landless in the digital world, too." Keoni Mahelona has framed the same stake as: "Data is the last frontier of colonization." The licence was described at publication as a "living license" intended to evolve — it would eventually branch into multiple application-specific variants (covering the Papa Reo API, the Rongo platform, and others) and serve as a template for five further Māori Data Sovereignty licences.

Adoption and impact

The Kaitiakitanga License has become a reference point in the indigenous data-sovereignty literature. Te Hiku Media has been invited internationally to speak on the licence, and the licence has been adopted by a New Zealand government department and a social enterprise. Five further Māori Data Sovereignty licences have since been developed incorporating Te Hiku Media's original framework. The licence governs the Kōrero Māori crowdsourcing campaign, the data-collection exercise in which more than 2,500 contributors provided over 300 hours of labelled te reo Māori speech for training Te Hiku Media's automatic speech recognition models. This shows that the stewardship model does not prevent community-scale data contribution; it governs the terms on which that contribution is held and used. The licence also explicitly prohibits use of Te Hiku Media's tools for surveillance, discrimination, or tracking, carrying a human-rights provision that the open-source licensing tradition does not.

Significance for the corpus

The 2 October 2018 publication of the Kaitiakitanga License is the corpus's first event anchored in Aotearoa New Zealand and its first event in the Pacific, closing what the inbox framed as a zero-coverage gap across New Zealand and Pacific events. It is also the corpus's first licence-publication event — expanding the event-type range to cover the community-governance tooling layer alongside the protests, court rulings, and research launches the event slice already carries.

The event sits at the intersection of two of the corpus's movement areas — indigenous data sovereignty and algorithmic accountability — and connects them through a concrete legal instrument. The licence is the apparatus through which Te Hiku Media's theory of change becomes operational: the argument that indigenous communities should be the makers, not merely the users, of AI built from their language and knowledge requires not only the technical capability to build models locally but the legal-and-tikanga framework to keep them under indigenous stewardship, and the Kaitiakitanga License is that framework's founding document. Peter-Lucas Jones, who spearheaded its development, was named to TIME magazine's TIME100 AI 2024 list specifically for this work — a recognition that the corpus's indigenous-data-sovereignty register traces directly back to the licence published on this date.

04 · Sources

Where this came from.

6 sources listed from the pinned corpus. Links are shown only when the source URL is a valid HTTP(S) address.

  1. github.com

    Checked 2026-05-26

    TeHikuMedia/Kaitiakitanga-License GitHub repository — primary source for the initial commit date of 2 October 2018 (commit 17787df8, "Initial commit", 2018-10-02T12:51:02Z confirmed via GitHub commits API), for the license preamble verbatim framing that "by simply open sourcing our data and knowledge, we further allow ourselves to be colonised digitally in the modern world", for the four core terms (permission required before access or use; no commercial use without authorisation; derivative works remain bound; downstream works remain bound), and for the repository's self-description as a "living license" aiming to become "an international example for indigenous people's retention of mana over data and other intellectual property in a Western construct"

  2. tehiku.nz

    Checked 2026-05-26

    Te Hiku Media's own blog post on the Kaitiakitanga License — primary source for the organisation's critique that "artificial intelligence in its current form is based on the wholesale appropriation of existing culture", for the license as a stewardship framework grounded in tikanga rather than commercial open-source norms, and for the account of the licence being adopted by a government department and a social enterprise and of five further Māori Data Sovereignty licences being developed from Te Hiku Media's original

  3. tehiku.nz

    Checked 2026-05-26

    Te Hiku Media "Indigenous Data Theft" article (published 10 August 2018) — primary source for the Lionbridge controversy: Lionbridge's May 2018 Facebook recruitment campaign offering Māori speakers $45 USD/hour for voice recordings to train speech-recognition technology, Peter-Lucas Jones's characterisation of digital misappropriation of indigenous language and cultural data as "the last frontier of colonisation", and Keoni Mahelona's framing of corporate control over indigenous language data as "the icing on the cake of colonisation"

  4. 1news.co.nz

    Checked 2026-05-26

    1News article (27 August 2018) on the Lionbridge controversy — primary source for Peter-Lucas Jones's framing of Lionbridge's business model as aiming to "translate apps of any description into the native languages of indigenous people, and sell that back", for Keoni Mahelona's framing that the sovereignty question is "whether an American corporate company is the right fit to lead the data initiative", and for the parallel Jones drew with Hawaiian resistance to cultural trademarking

  5. papareo.nz

    Checked 2026-05-26

    Papa Reo home page — primary source for the verbatim kaitiakitanga-license principle that "data is not owned but is cared for under the principle of kaitiakitanga and any benefit derived from data flows to the source of the data", for the Kōrero Māori crowdsourcing campaign figures (more than 2,500 contributors, more than 300 hours of labelled te reo Māori speech), and for the license's role as the data-governance instrument governing access to Papa Reo's training corpus

  6. technologyreview.com

    Checked 2026-05-26

    MIT Technology Review feature by Karen Hao (22 April 2022) — secondary source for the Kaitiakitanga License as grounding future collaborations in the Māori principle of kaitiakitanga (guardianship) and as requiring organisations to "respect Māori values, honor consent boundaries, and share benefits with the Māori community", for Keoni Mahelona's verbatim framing "Data is the last frontier of colonization", and for Peter-Lucas Jones's verbatim framing that absent indigenous-stewardship structures Māori-language data would be "used by the very people that beat that language out of our mouths to sell it back to us as a service"
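
Source 1 notes that the initial commit date was confirmed via the GitHub commits API. A minimal sketch of how such a check might be reproduced is below. It assumes the public REST endpoint `GET /repos/{owner}/{repo}/commits` and the response shape it returns (a list of objects with `sha` and `commit.committer.date`). The helper names `fetch_commits` and `earliest_commit` are hypothetical, and the sketch reads only one page of up to 100 commits, so it assumes the repository's full history fits in that page. Whether the timestamp quoted in source 1 is the author date or the committer date is not stated; the sketch uses the committer date as an assumption.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Assumption: GitHub's public REST endpoint for listing commits (newest first).
API_URL = "https://api.github.com/repos/{owner}/{repo}/commits?per_page=100"


def fetch_commits(owner: str, repo: str) -> list:
    """Fetch a single page of commits from the GitHub REST API (needs network)."""
    request = urllib.request.Request(
        API_URL.format(owner=owner, repo=repo),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def earliest_commit(commits: list) -> tuple:
    """Return (sha, UTC datetime) for the oldest commit by committer date."""

    def committed_at(entry: dict) -> datetime:
        stamp = entry["commit"]["committer"]["date"]  # e.g. "2018-10-02T12:51:02Z"
        parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
        return parsed.replace(tzinfo=timezone.utc)

    oldest = min(commits, key=committed_at)
    return oldest["sha"], committed_at(oldest)


# Usage (performs a live request, so it is left commented out):
# sha, when = earliest_commit(fetch_commits("TeHikuMedia", "Kaitiakitanga-License"))
# print(sha, when.isoformat())
```

The API returns full 40-character commit hashes, so a result would be compared against the abbreviated hash in source 1 by prefix. A repository with more than 100 commits would need pagination, which this sketch leaves out.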

Source: entities/events/event-te-hiku-media-kaitiakitanga-license-release-2018-10-02.md in movement-graph at pin 3cc1a36.