Metacat: Netflix Makes Their Big Data Accessible and Useful
Actian Germany GmbH
March 29, 2019
Like many other companies, Netflix holds a large amount of data that comes from many different data sources in a variety of formats. For the leading subscription video-on-demand (SVoD) streaming provider, analyzing that data is naturally an important strategic advantage. Given the diversity of its data sources, the streaming platform looked for a way to federate this data and interact with it through a single tool. That is how Metacat came about.
Netflix’s Key Figures
Netflix has come a long way since its beginnings as a DVD rental company in the 1990s. Video consumption on Netflix accounts for 15% of global internet traffic. But Netflix today is also:
- 130 million paying subscribers worldwide (400% increase since 2011).
- $10 billion turnover, including $403 million in profits.
- $100 billion market capitalization, or the sum of all the leading television groups in Europe.
- $6 billion investment in original creations (TV shows and movies).
Netflix also runs a data warehouse of 60 petabytes (60 million billion bytes), and exploiting and federating this data is a real challenge for the company.
Netflix’s Big Data Platform Architecture
The platform's basic architecture includes three key services: the Execution Service (Genie), the Metadata Service (Metacat), and the Event Service (Microbot).
Metacat was born so that Netflix could work across its different languages and data sources, which are not very compatible with each other. The tool acts as a data and metadata access layer over Netflix's data sources: a centralized service that any data user can reach, which makes discovering, processing, and managing data easier.
Metacat and its Features
Netflix runs query and processing engines, such as Hive, Pig, and Spark, that do not interoperate with one another. By introducing a common abstraction layer, Netflix can give its users access to data regardless of the storage system behind it.
In addition, Metacat simplifies moving a dataset from one datastore to another.
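To make the idea of a common abstraction layer concrete, here is a minimal sketch in Python. It is not Metacat's actual API: `TableInfo`, `Connector`, `InMemoryConnector`, and `MetadataService` are hypothetical names invented for illustration, and the in-memory connector stands in for a real storage system.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field, replace


@dataclass
class TableInfo:
    """Store-independent description of a table (hypothetical shape)."""
    name: str
    columns: dict = field(default_factory=dict)  # column name -> type name


class Connector(ABC):
    """Hypothetical adapter; one subclass per storage system."""

    @abstractmethod
    def get_table(self, name: str) -> TableInfo: ...

    @abstractmethod
    def put_table(self, table: TableInfo) -> None: ...


class InMemoryConnector(Connector):
    """Stand-in for a real store; keeps table definitions in a dict."""

    def __init__(self) -> None:
        self._tables = {}

    def get_table(self, name: str) -> TableInfo:
        return self._tables[name]

    def put_table(self, table: TableInfo) -> None:
        self._tables[table.name] = table


class MetadataService:
    """Single entry point that routes each request to the right connector."""

    def __init__(self) -> None:
        self._catalogs = {}

    def register(self, catalog: str, connector: Connector) -> None:
        self._catalogs[catalog] = connector

    def get_table(self, catalog: str, name: str) -> TableInfo:
        return self._catalogs[catalog].get_table(name)

    def put_table(self, catalog: str, table: TableInfo) -> None:
        self._catalogs[catalog].put_table(table)

    def copy_table(self, source: str, target: str, name: str) -> None:
        """Copy a table definition from one catalog to another."""
        original = self.get_table(source, name)
        # Copy the columns dict so the two stores do not share state.
        self.put_table(target, replace(original, columns=dict(original.columns)))


# Usage: callers talk to one service, whatever store sits behind a catalog.
service = MetadataService()
service.register("hive", InMemoryConnector())
service.register("warehouse", InMemoryConnector())
service.put_table("hive", TableInfo("plays", {"title_id": "bigint"}))
service.copy_table("hive", "warehouse", "plays")
copied = service.get_table("warehouse", "plays")
```

The design choice shown is that callers only ever see `MetadataService`; adding a new storage system means adding one connector class, not changing the callers.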
Business Metadata
Hand-written, user-defined, business-oriented metadata can be added in free form via Metacat. It mainly covers the connections, configurations, metrics, and life cycle of each dataset.
Data Discovery
With Metacat, Netflix makes it easy for data consumers to find business datasets. The tool publishes schema metadata and user-defined business metadata to Elasticsearch, which enables full-text search for information across its data sources.
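The publish-then-search flow can be sketched with a tiny in-memory inverted index. This is an illustration only: it does not use Elasticsearch, and `MetadataIndex`, its methods, and the sample table names are all hypothetical.

```python
import re
from collections import defaultdict


class MetadataIndex:
    """Tiny in-memory inverted index; a stand-in for a real search engine."""

    def __init__(self) -> None:
        self._postings = defaultdict(set)  # token -> set of table names

    @staticmethod
    def _tokens(text: str) -> set:
        # Lowercase and split on anything that is not a letter or digit.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def publish(self, table: str, schema: dict, business: dict) -> None:
        """Index the table name, schema, and free-form business metadata."""
        parts = [table, *schema, *schema.values(), *business, *business.values()]
        for token in self._tokens(" ".join(parts)):
            self._postings[token].add(table)

    def search(self, query: str) -> list:
        """Return the tables whose metadata contains every query term."""
        terms = self._tokens(query)
        if not terms:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(hits)


index = MetadataIndex()
index.publish(
    "prod.plays",
    {"title_id": "bigint", "country": "string"},
    {"owner": "streaming analytics", "description": "daily playback events"},
)
index.publish(
    "prod.billing",
    {"account_id": "bigint"},
    {"owner": "finance", "description": "monthly invoices"},
)
matches = index.search("playback events")
```

Here `matches` contains only `prod.plays`, because it is the only table whose published metadata includes both query terms.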
Data Modification and Audit
As a cross-functional tool for all data stores, Metacat records every change made to the metadata and to the data itself in its storage systems, and sends notifications about those changes.
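A record-and-notify mechanism of this kind can be sketched with a simple observer pattern. The names below (`ChangeEvent`, `AuditedCatalog`, `subscribe`) are hypothetical and say nothing about how Metacat delivers its notifications in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    table: str
    kind: str    # "create", "update", or "delete"
    detail: str


class AuditedCatalog:
    """Keeps an audit log and notifies subscribers on every change."""

    def __init__(self) -> None:
        self._tables = {}
        self._listeners = []
        self.audit_log = []

    def subscribe(self, listener) -> None:
        """Register a callable that receives each ChangeEvent."""
        self._listeners.append(listener)

    def _emit(self, event: ChangeEvent) -> None:
        self.audit_log.append(event)       # record first
        for listener in self._listeners:   # then notify
            listener(event)

    def save_table(self, name: str, schema: dict) -> None:
        kind = "update" if name in self._tables else "create"
        self._tables[name] = dict(schema)
        self._emit(ChangeEvent(name, kind, f"columns={sorted(schema)}"))

    def delete_table(self, name: str) -> None:
        del self._tables[name]
        self._emit(ChangeEvent(name, "delete", "table removed"))


catalog = AuditedCatalog()
received = []
catalog.subscribe(received.append)
catalog.save_table("plays", {"title_id": "bigint"})
catalog.save_table("plays", {"title_id": "bigint", "country": "string"})
catalog.delete_table("plays")
```

After these three calls, both `received` and `catalog.audit_log` hold a create, an update, and a delete event for `plays`, in that order.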
Metacat and the Future of Netflix
According to Netflix, the current version of Metacat is a step towards the new features the team is working on. The team still wants to improve the visualization of its metadata, which would be very useful for restoration purposes.
According to Netflix, Metacat should also gain a plug-in architecture so that the tool can validate and maintain all of its metadata. Because users define metadata in free form, Netflix needs a validation step that runs before the metadata is stored.
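As a rough sketch of what validating free-form metadata before storage could look like, the example below registers validator functions as plug-ins and rejects metadata that fails any of them. All names (`ValidatingStore`, `require_owner`, `retention_is_positive`) and the `owner` and `retention_days` keys are assumptions made for illustration, not part of Metacat.

```python
class ValidatingStore:
    """Runs every registered validator before metadata is stored."""

    def __init__(self) -> None:
        self._validators = []
        self._metadata = {}

    def add_validator(self, validator) -> None:
        """Register a plug-in: a callable taking a dict, returning error strings."""
        self._validators.append(validator)

    def save(self, table: str, metadata: dict) -> None:
        errors = [msg for check in self._validators for msg in check(metadata)]
        if errors:
            # Nothing is stored when any validator complains.
            raise ValueError("; ".join(errors))
        self._metadata[table] = dict(metadata)

    def get(self, table: str) -> dict:
        return self._metadata[table]


def require_owner(metadata: dict) -> list:
    """Hypothetical rule: every dataset must name an owner."""
    return [] if metadata.get("owner") else ["missing 'owner'"]


def retention_is_positive(metadata: dict) -> list:
    """Hypothetical rule: retention_days, when present, is a positive int."""
    days = metadata.get("retention_days")
    if days is None:
        return []
    if not isinstance(days, int) or days <= 0:
        return ["'retention_days' must be a positive integer"]
    return []


store = ValidatingStore()
store.add_validator(require_owner)
store.add_validator(retention_is_positive)
store.save("plays", {"owner": "analytics", "retention_days": 90})

try:
    store.save("billing", {"retention_days": -1})
    rejected = None
except ValueError as exc:
    rejected = str(exc)
```

The second `save` call fails both rules, so `rejected` carries both messages and `billing` is never stored.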
As a centralizing tool for multi-source and multi-format data, Netflix’s Metacat has clearly made progress.
Developed in-house, the service has adapted to all the tools the company uses, helping Netflix become data-driven.