Data Intelligence

Marquez: The Metadata Discovery Solution at WeWork

Actian Corporation

December 10, 2020


Founded in 2010, WeWork is a global office and workspace leasing company. Their objective is to provide space where teams of any size, including startups, SMEs, and major corporations, can collaborate. What WeWork provides can be broken down into three categories:

  • Space: To provide companies with optimal space, WeWork must supply the appropriate infrastructure, from rooms booked for interviews or one-on-ones to entire buildings for large corporations. They must also make sure each space is equipped with the appropriate facilities, such as kitchens for lunch and coffee breaks, bathrooms, etc.
  • Community: Via WeWork’s internal application, the firm enables WeWork members to connect with one another, whether locally within their own WeWork space or globally. For example, if a company needs feedback on a project from people with specific job titles (such as a developer or UX designer), it can ask any member for feedback and suggestions directly via the application, regardless of their location.
  • Services: WeWork also provides their members with full IT services if there are any problems, as well as other services such as payroll services, utility services, etc.

In 2020, WeWork represents:

  • More than 600,000 memberships.
  • Locations in 127 cities in 33 different countries.
  • 850 offices worldwide.
  • $1.82 billion in revenue.

WeWork clearly works with all sorts of data from their staff and customers, whether individuals or companies. The firm therefore needed a platform where their data experts could view, collect, aggregate, and visualize their data ecosystem’s metadata. That need led to the creation of Marquez.

This article looks at WeWork’s implementation of Marquez, drawing mainly on freely accessible documentation from various websites, to illustrate why an enterprise-wide metadata platform matters for becoming truly data-driven.

Why Manage and Utilize Metadata?

In his talk “A Metadata Service for Data Abstraction, Data Lineage & Event-based Triggers” at Data Council in 2018, Willy Lulciuc, Software Engineer for the Marquez project at WeWork, explained that metadata is crucial for three reasons:

  • Ensuring Data Quality: When data has no context, it is hard for data citizens to trust their data assets: are there fields missing? Is the documentation up to date? Who is the data owner and are they still the owner? These questions are answered through the use of metadata.
  • Understanding Data Lineage: Knowing your data’s origins and transformations is key to understanding what stages your data went through over time.
  • Democratization of Datasets: According to Willy Lulciuc, democratizing data in the enterprise is critical. A central portal or UI where users can search for and explore their datasets is one of the most important ways for a company to create a true self-service data culture.

To sum up, the goal is to create a healthy data ecosystem. Willy explains that being able to manage and utilize metadata creates a sustainable data culture where individuals no longer need to ask for help to find and work with the data they need. On his slide, he goes through three characteristics that make up a healthy data ecosystem:

  1. Being a self-service ecosystem, where data and business users can discover the data and metadata they need, and explore the enterprise’s data assets even when they don’t know exactly what they are searching for. Providing data with context gives all users and data citizens the ability to work effectively on their data use cases.
  2. Being self-sufficient, by giving data users the freedom to experiment with their datasets and the flexibility to work on every aspect of them, whether as input or as output datasets.
  3. And finally, instead of relying on certain individuals or groups, a healthy data ecosystem makes all employees accountable for their own data. Each user has the responsibility to know their data and its costs (is this data producing enough value?), and to keep its documentation up to date in order to build trust around their datasets.

Room Booking Pipeline Before

As mentioned above, utilizing metadata is crucial for data users to be able to find the data they need. In his presentation, Willy shared a real situation to prove metadata is essential: WeWork’s data pipeline for booking a room.

For a “WeWorker”, the steps are as follows:

  1. Find a location (the example was a building complex in San Francisco).
  2. Choose the appropriate room size (usually split by number of attendees; in this case they chose a room that could accommodate 1 to 4 people).
  3. Choose the date for when the booking will take place.
  4. Decide on the time slot the room is booked for as well as the duration of the meeting.
  5. Confirm the booking.

Now that we have an example of how their booking pipeline works, Willy proceeds to demonstrate how a typical data team would operate when wanting to pull out data on WeWork’s bookings. In this case, the example exercise was to find the building that held the most room bookings, and extract that data to send over to management. The steps he stated were the following:

  • Read the room bookings from a data source (usually unknown).
  • Sum up all of the room bookings and return the top locations.
  • Once the top location is calculated, the next step is to write it into some output data source.
  • Run the job once an hour.
  • Process the data through .csv files and store it somewhere.
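
The job described in these steps can be sketched in a few lines of Python. This is only a minimal illustration, not WeWork’s actual pipeline: the `building` column name, the sample rows, and the function names `top_locations` and `write_report` are all assumptions made for the example, and the hourly scheduling is left to whatever scheduler is in use.

```python
import csv
import io
from collections import Counter


def top_locations(bookings_csv, n=1):
    """Count room bookings per building and return the n busiest buildings."""
    reader = csv.DictReader(bookings_csv)
    # "building" is a hypothetical column name chosen for this sketch.
    counts = Counter(row["building"] for row in reader)
    return counts.most_common(n)


def write_report(rows, out):
    """Write (building, bookings) pairs to an output CSV."""
    writer = csv.writer(out)
    writer.writerow(["building", "bookings"])
    writer.writerows(rows)


# Made-up sample input standing in for the (usually unknown) data source.
SAMPLE = """building,room,date
SF-Market-St,4A,2018-11-01
SF-Market-St,2B,2018-11-01
NY-Broadway,1C,2018-11-01
"""

report = io.StringIO()
write_report(top_locations(io.StringIO(SAMPLE)), report)
print(report.getvalue())
```

Notice that nothing in this code says where the input really lives, who owns it, or how fresh it is, which is exactly the gap the next part of the talk addresses.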

However, Willy noted that even though these steps seem good enough, problems usually occur. He goes over three questions that typically arise during the job process:

  1. Where can I find the job input’s dataset?
  2. Does the dataset have an owner? Who is it?
  3. How often is the dataset updated?

Most of these questions are difficult to answer, and jobs end up failing. Without trustworthy answers, it is hard to present numbers to management with confidence. These problems are what led WeWork to develop Marquez.
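
To make the three questions concrete, here is a small sketch of the kind of record that would answer them. It is a hypothetical structure written for this article, not Marquez’s data model: the `DatasetMetadata` class, its field names, and the sample values are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DatasetMetadata:
    """Hypothetical record; the field names are illustrative only."""
    name: str
    location: str                # where the job's input can be found
    owner: Optional[str]         # who is responsible, if anyone
    update_interval: timedelta   # how often the dataset is refreshed
    last_updated: datetime

    def is_stale(self, now):
        """True when the dataset has missed its expected refresh."""
        return now - self.last_updated > self.update_interval


bookings = DatasetMetadata(
    name="room_bookings",
    location="s3://example-bucket/room_bookings/",  # made-up location
    owner="data-platform-team",                     # made-up owner
    update_interval=timedelta(hours=1),
    last_updated=datetime(2018, 11, 1, 9, 0),
)

print(bookings.is_stale(datetime(2018, 11, 1, 9, 30)))
print(bookings.is_stale(datetime(2018, 11, 1, 11, 0)))
```

With a record like this, a job can check the freshness of its input before running instead of failing halfway through.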

What is Marquez?

Willy defines the platform as an “open-sourced solution for the aggregation, collection, and visualization of metadata of [WeWork’s] data ecosystem”. Marquez is a modular system, designed as a highly scalable, highly extensible, platform-agnostic solution for metadata management. It consists of the following components:

  • Metadata Repository: Stores all job and dataset metadata, including a complete history of job runs and job-level statistics (e.g. total runs, average runtimes, successes/failures).
  • Metadata API: RESTful API enabling a diverse set of clients to begin collecting metadata around dataset production and consumption.
  • Metadata UI: Used for dataset discovery, connecting multiple datasets and exploring their dependency graph.
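
As a rough illustration of what “a complete history of job runs and job-level statistics” could look like, the sketch below keeps runs in memory and derives the statistics named above. It is not Marquez’s implementation: the `JobRun` and `MetadataRepository` classes are hypothetical, and a real repository would sit on top of a database.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class JobRun:
    job: str
    runtime_seconds: float
    succeeded: bool


class MetadataRepository:
    """In-memory stand-in for illustration; names are assumptions."""

    def __init__(self):
        self._runs = defaultdict(list)  # job name -> full run history

    def record_run(self, run):
        self._runs[run.job].append(run)

    def job_stats(self, job):
        """Derive job-level statistics from the stored run history."""
        runs = self._runs.get(job, [])
        if not runs:
            return {"total_runs": 0, "average_runtime": None,
                    "successes": 0, "failures": 0}
        successes = sum(1 for r in runs if r.succeeded)
        return {
            "total_runs": len(runs),
            "average_runtime": mean(r.runtime_seconds for r in runs),
            "successes": successes,
            "failures": len(runs) - successes,
        }


repo = MetadataRepository()
repo.record_run(JobRun("top_locations", 10.0, True))
repo.record_run(JobRun("top_locations", 20.0, True))
repo.record_run(JobRun("top_locations", 30.0, False))
print(repo.job_stats("top_locations"))
```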

Marquez’s Design

Marquez provides language-specific clients that implement the Metadata API. This enables a diverse set of data processing applications to collect metadata. In its initial release, Marquez provided support for both Java and Python.

The Metadata API extracts information around the production and consumption of datasets. It’s a stateless layer responsible for specifying both metadata persistence and aggregation. Through the API, clients can write dataset information to the Metadata Repository and read it back.
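
One way to picture “production and consumption” is to record, for each job, which datasets it reads and which it writes, and then derive lineage from those records. The sketch below does that in memory. It is a hypothetical illustration and does not use or mirror the Marquez client API: `LineageRecorder`, `record_job`, `upstream`, and the job and dataset names are all invented for the example.

```python
from collections import defaultdict


class LineageRecorder:
    """Hypothetical sketch; not the Marquez client API."""

    def __init__(self):
        self._inputs = defaultdict(set)  # job -> datasets it consumes
        self._producer = {}              # dataset -> job that produces it

    def record_job(self, job, inputs, outputs):
        """Record which datasets a job reads and which it writes."""
        self._inputs[job].update(inputs)
        for dataset in outputs:
            self._producer[dataset] = job

    def upstream(self, dataset):
        """Return every dataset that `dataset` transitively depends on."""
        seen = set()
        stack = [dataset]
        while stack:
            current = stack.pop()
            job = self._producer.get(current)
            if job is None:
                continue  # a source dataset: no recorded producer
            for parent in self._inputs[job]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


lineage = LineageRecorder()
lineage.record_job("clean_bookings", ["raw_bookings"], ["room_bookings"])
lineage.record_job("top_locations", ["room_bookings", "buildings"],
                   ["top_locations_report"])
print(sorted(lineage.upstream("top_locations_report")))
```

Walking the records backwards from a report to its sources is the same idea as exploring a dataset’s dependency graph in a UI.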

Metadata needs to be collected, organized, and stored in a way that allows for rich exploratory queries via the Metadata UI. The Metadata Repository serves as a catalog of dataset information, encapsulated and cleanly abstracted away by the Metadata API.

According to Willy, what makes a data ecosystem strong is the ability to search for information and datasets. Datasets in Marquez are indexed and ranked by a search engine according to the keyword or phrase searched as well as the dataset’s documentation: the more context a dataset has, the more likely it is to appear first in the search results. Examples of a dataset’s documentation are its description, owner, schema, and tags.
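
The ranking idea can be sketched as follows. This is a toy illustration under stated assumptions, not Marquez’s search implementation: the scoring rule (one point per documented field), the field names, and the sample datasets are invented for the example.

```python
def context_score(dataset):
    """Count how many documentation fields are filled in (assumed rule)."""
    fields = ("description", "owner", "schema", "tags")
    return sum(1 for field in fields if dataset.get(field))


def search(datasets, keyword):
    """Return datasets matching the keyword, best-documented first."""
    keyword = keyword.lower()
    matches = [
        d for d in datasets
        if keyword in d["name"].lower()
        or keyword in (d.get("description") or "").lower()
    ]
    # Higher context score first; the name breaks ties for stable output.
    return sorted(matches, key=lambda d: (-context_score(d), d["name"]))


DATASETS = [
    {"name": "bookings_tmp"},
    {"name": "buildings", "description": "Building directory",
     "owner": "real-estate"},
    {"name": "room_bookings", "description": "One row per room booking",
     "owner": "data-platform-team", "schema": ["building", "room", "date"],
     "tags": ["bookings"]},
]

print([d["name"] for d in search(DATASETS, "booking")])
```

Here the fully documented `room_bookings` dataset outranks the undocumented `bookings_tmp`, even though both match the keyword.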

For more detail on Marquez’s data model, see the presentation itself: https://www.youtube.com/watch?v=dRaRKob-lRQ&ab_channel=DataCouncil

The Future of Data Management at WeWork

Two years on, Marquez has proven to be a big help for the giant leasing firm. Their long-term roadmap is to focus solely on their solution’s UI, adding more visualizations and graphical representations in order to provide simpler and more fun ways for users to interact with their data.

They also host online communities via their GitHub page, as well as groups on LinkedIn, where those interested in Marquez can ask questions, get advice, or report issues with the current Marquez version.

Sources

A Metadata Service for Data Abstraction, Data Lineage & Event-Based Triggers, WeWork. YouTube: https://www.youtube.com/watch?v=dRaRKob-lRQ&ab_channel=DataCouncil

29 Stunning WeWork Statistics – The New Era Of Coworking, TechJury.com: https://techjury.net/blog/wework-statistics/

Marquez: Collect, aggregate, and visualize a data ecosystem’s metadata, https://marquezproject.github.io/marquez/

Marquez: An Open Source Metadata Service for ML Platforms, Willy Lulciuc