Data Intelligence

Air France: Their Big Data Strategy in a Hybrid Cloud Context

Actian Corporation

October 22, 2020


Air France-KLM is the leading group in terms of international traffic departing from Europe. The airline is a member of the SkyTeam alliance, which consists of 19 airlines and offers access to a global network of more than 14,500 daily flights serving over 1,150 destinations worldwide. In 2019, Air France represented:

  • 104.2 million passengers.
  • 312 destinations.
  • 119 countries.
  • 546 aircraft.
  • 15 million members enrolled in their “Flying Blue” loyalty program.
  • 2,300 flights per day.

At Big Data Paris 2020, Eric Poutrin, Lead Enterprise Architect Data Management & Analytics at Air France, explained how the airline business works, how Air France’s Big Data journey began, and what their data architecture looks like today in the context of a hybrid cloud structure.

How Does an Airline Company Work?

Before we start talking about data, it is imperative to understand how an airline company works, from the creation of a flight schedule to the aircraft’s landing.

Before planning a route, the first step for an airline such as Air France is to establish its flight schedule. Note that in times of health crisis, schedules are likely to change quite frequently. Once the flight schedule is set, three entirely separate flows must come together for a flight to depart at a given date and time:

  • The flow of passengers, which represents different forms of services to facilitate the traveler’s experience along the way, from buying tickets on their various platforms (web, app, physical) to the provision of staff or automatic kiosks in various airports to help travelers check in, drop off their luggage, etc.
  • The flow of crew management, with profiles adapted to the qualifications required to operate or pilot the aircraft, as well as the management of flight attendant schedules.
  • The engineering flow which consists of getting the right aircraft with the right configuration at the right parking point.

However, Eric tells us that all this… is in an ideal world:

“The ‘product’ of an airline goes through the customer, so all of the hazards are visible. And they all impact each other’s flows! So the closer you get to the date of the flight, the more critical these hazards become.”

Following these observations, 25 years ago, Air France decided to set up a “service-oriented” architecture which, among other things, notifies subscribers in the event of a disruption in any flow. These real-time notifications are pushed either to agents or to passengers according to their needs: warnings about technical difficulties (an aircraft breakdown), weather hazards, expected delays, etc.

“The objective was to bridge the gap between a traditional analytical approach and a modern analytical approach based on omnipresent, predictive and prescriptive analysis on a large scale,” affirmed Eric.

Air France’s Big Data Journey

The Timeline

In 1998, Air France began their data strategy by setting up an enterprise data warehouse on the commercial side, gathering customer, crew and technical data that allowed the company’s IT teams to build analysis reports. 

Eric tells us that in 2001, following the September 11 attacks and the resulting ban on incoming flights to the United States, Air France had to redeploy its aircraft. It was the firm’s data warehouse, together with machine learning and artificial intelligence algorithms, that allowed them to find other sources of revenue. This way of working with data served the company well for 10 years and even allowed the firm to overcome several other difficulties, including the SARS (Severe Acute Respiratory Syndrome) epidemic and the crisis of rising oil prices.

In 2012, Air France’s data teams decided to implement a Hadoop platform in order to perform predictive or prescriptive analysis (depending on individual needs) in real time, as the data warehouse no longer met these new needs or the high volume of information to be managed. Within just a few months of implementing Hadoop, Kafka, and other new-generation technologies, the firm was able to obtain much “fresher” and more relevant data.
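
To make the streaming side concrete, here is a minimal sketch of how one flow could publish a flight-disruption event to Kafka from Node.js. It uses the open-source kafkajs client; the broker address, topic name, and event fields are hypothetical illustrations, not Air France’s actual implementation.

```js
// Minimal sketch: publishing a flight-disruption event to Kafka with kafkajs.
// Broker address, topic name, and event fields are hypothetical.
const { Kafka } = require("kafkajs");

const kafka = new Kafka({ clientId: "ops-publisher", brokers: ["localhost:9092"] });
const producer = kafka.producer();

async function publishDisruption() {
  await producer.connect();
  await producer.send({
    topic: "flight-events", // hypothetical topic name
    messages: [
      {
        key: "AF1234", // flight number as the key keeps one flight's events ordered
        value: JSON.stringify({
          flight: "AF1234",
          type: "TECHNICAL_DELAY",
          estimatedDelayMinutes: 45,
          timestamp: new Date().toISOString(),
        }),
      },
    ],
  });
  await producer.disconnect();
}

publishDisruption().catch(console.error);
```

Downstream, each subscribing flow (passenger notifications, crew management, engineering) would consume the same topic with its own consumer group, which is what lets a single disruption fan out to every affected flow.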

Since then, the teams have been constantly improving and optimizing their data ecosystem in order to stay up to date with new technologies and thus, allow data users to work efficiently with their analysis.

Air France’s Data Challenges

During the conference, Eric also presented the firm’s data challenges in the implementation of a data strategy:

  • Delivering a reliable analytics ecosystem with quality data.
  • Implementing technologies adapted for all profiles and their use cases regardless of their line of business.
  • Having an infrastructure that supports all types of data in real time.

Air France was able to resolve some of these issues through the implementation of a robust architecture (which notably enabled the firm to withstand the COVID-19 crisis), the setting up of dedicated teams, the deployment of applications, and security structures, particularly regarding the GDPR and other applicable regulations.

However, Air France-KLM has not finished working to meet its data challenges. With data volumes ever increasing and the number of business and data users growing, managing data and its flows across the enterprise’s different channels is a constant work of governance:

“We must always be at the service of the business, and as people and trends change, it is imperative to make continuous efforts to ensure that everyone can understand the data”.

Air France’s Unified Data Architecture

The Unified Data Architecture (UDA) is the cornerstone of Air France’s data strategy. Eric explains that it comprises four types of platforms:

The Data Discovery Platform

Separated into two different platforms, these are the applications of choice for data scientists and citizen data scientists. Among other things, they allow users to:

    • Extract the “knowledge” from the data.
    • Process unstructured data (text, images, voice, etc.).
    • Have predictive analytics support to understand customer behaviors.

A Data Lake

Air France’s data lake is a logical instance and is accessible to all the company’s employees, regardless of their profession. However, Eric specifies that the data is well secured: “The data lake is not an open bar at all! Everything is done under the control of the data officers and data owners.” The data lake:

    • Stores structured and unstructured data.
    • Combines the different data sources from various businesses.
    • Provides a complete view of a situation, a topic or a data environment.
    • Is very scalable.

“Real Time Data Processing” Platforms

To put this data to work, Air France has implemented eight real-time data processing platforms, one for each “high priority” business use case. For example, there are platforms for predictive maintenance, customer behavior knowledge, and process optimization during stopovers.

Eric confirms that when an event or disruption occurs, these platforms are able to push recommendations in “real time,” within just 10 seconds.
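
For the consuming side, here is a minimal sketch, again with kafkajs and hypothetical names: a processing platform subscribes to the event stream and turns each disruption into a recommendation within its time budget.

```js
// Minimal sketch: consuming disruption events and pushing recommendations.
// Topic, group id, and the recommendation logic are hypothetical.
const { Kafka } = require("kafkajs");

const kafka = new Kafka({ clientId: "recommender", brokers: ["localhost:9092"] });
const consumer = kafka.consumer({ groupId: "stopover-optimization" });

async function run() {
  await consumer.connect();
  await consumer.subscribe({ topic: "flight-events", fromBeginning: false });
  await consumer.run({
    eachMessage: async ({ message }) => {
      const event = JSON.parse(message.value.toString());
      // Placeholder for a real predictive/prescriptive model.
      if (event.type === "TECHNICAL_DELAY") {
        console.log(`Recommend rebooking options for ${event.flight}`);
      }
    },
  });
}

run().catch(console.error);
```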

Data Warehouses

As mentioned above, Air France had also already set up data warehouses to store external data, such as customer and partner data, as well as data from operational systems. These data warehouses allow users to query these datasets in complete security, and they serve as an excellent vehicle for communicating the data strategy across the company’s different business lines.

The Benefits of Implementing a Hybrid Cloud Architecture

Air France’s initial considerations regarding the move to the cloud were:

  • Air France-KLM aims to standardize its compute and storage services as much as possible.
  • Not all data is eligible to leave Air France’s premises, due to regulations or data sensitivity.
  • All the tools already used in the UDA platforms are available both on-premises and in the public cloud.

Eric says that a hybrid cloud architecture would give the firm more flexibility to meet today’s challenges:

“Putting our UDA on the Public Cloud would give greater flexibility to the business and more options in terms of data deployment.”

According to Air France, here is the checklist of best practices before migrating to a Hybrid Cloud:

  • Check if the data has a good reason to be migrated to the Public Cloud.
  • Check the level of sensitivity of the data (according to internal data management policies).
  • Verify compliance with the UDA implementation guidelines.
  • Verify data stream designs.
  • Configure the right network connection.
  • For each implementation tool, choose the right level of service management.
  • For each component, evaluate the level of vendor lock-in and the exit conditions.
  • Monitor and forecast possible costs.
  • Adopt a security model that allows Hybrid Cloud security to be as transparent as possible.
  • Extend data governance in the Cloud.

Where is Air France Today?

It’s clear that the COVID-19 crisis has completely changed the aviation sector. Every day, Air France has to take the time to understand new passenger behavior and adapt flight schedules in real time, in line with the travel restrictions put in place by various governments. By the end of summer 2020, Air France will have served nearly 170 destinations, or 85% of their regular network.

Air France’s data architecture has therefore been a key catalyst in the airline’s recovery:

“A huge thanks to our business users (data scientists) who every day try to optimize services in real time so that they can understand how passengers are behaving in the midst of a health crisis. Even if we are working on artificial intelligence, the human factor is still an essential resource in the success of a data strategy”. 

Data Management

Speedier Interactions With Actian Zen From Node.js

Actian Corporation

October 15, 2020


Make Faster Calls Using the High-Speed Btrieve 2 API

Developers building real-time, data-intensive edge applications are increasingly turning to Node.js. It’s not itself a programming language but an open-source, cross-platform runtime environment that leverages JavaScript and its ecosystem, and it turns out to be quite well suited for today’s data streaming and JSON API applications.

If you’re using Actian Zen as your edge data management platform — and naturally we think you should — you’ll find that Node.js pairs well with Zen. However, there’s more than one way to pair them. You can easily interact with Actian Zen from Node.js using SQL via ODBC, for example, and when the complexity of your interactions warrants the use of SQL, that’s a perfect option.
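
If you want to see what the ODBC route looks like, here is a minimal sketch using the open-source odbc package from npm. The DSN name and table are hypothetical placeholders; it assumes you have configured an ODBC data source pointing at your Zen database.

```js
// Minimal sketch: querying Actian Zen via SQL over ODBC.
// Assumes the npm "odbc" package and an ODBC DSN named "ZENDB" (hypothetical).
const odbc = require("odbc");

async function querySensors() {
  const connection = await odbc.connect("DSN=ZENDB");
  // "sensors" is a hypothetical table name.
  const rows = await connection.query("SELECT TOP 10 * FROM sensors");
  console.log(rows);
  await connection.close();
}

querySensors().catch(console.error);
```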

But SQL via ODBC isn’t the fastest way to interact with Zen, and when you need speed there’s a better option: from Node.js you can access Zen data via the Btrieve 2 API. Let’s talk conceptually about how this works, and then we’ll dive into the practicalities. You’ll need certain software components to facilitate interaction with the Btrieve 2 API – including PHP, Python3, C++, and a few others that are easily downloaded – but let’s skip over the setup for now and focus on how you can speed up access to the Zen data you need.

Using the Btrieve 2 API

Conceptually, your JavaScript program is going to push a call through a special Node.js interface to the Btrieve 2 API, which is a C++ library that interacts directly with the Zen database engine.

From the standpoint of a JavaScript program, the interactions are relatively straightforward. Here’s the procedural logic:

  • Define the libraries and components to be loaded.
  • Set up any variables to be used.
  • Define the name, location, and record characteristics of the data file to hold the results of a query.
  • Instantiate the BtrieveClient class, which is used for engine-wide operations such as creating, deleting, opening, and closing files.
  • Prepare information defining the key segment.
  • Set the created key segment information into the index attribute.
  • Create a file attributes object and set the fixed record length.
  • Create a new Btrieve file based on the information set (the BtrieveFile class handles Btrieve data files).
  • Open the file.
  • Perform the database operations that your application requires.
  • Close the Btrieve File.

You can download a sample .js file here that will enable you to see the logic in action. The 43-line sample application creates a Btrieve file and populates it with 10,000 10-byte records (each record consisting of an 8-byte timestamp and a 2-byte integer that, in this instance, might represent input from, say, an IoT sensor). The sample program also stores for later use the timestamp index of every 200th record and, ultimately, extracts the last written record from the data file and displays the value it recorded. Naturally, your use case may be far more involved, but you’ll see how easy it is to create JavaScript that provides a high-performance interaction with Zen.
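
Before you download it, here is a condensed sketch of the same flow so you can see its shape. It assumes the SWIG-generated btrieve2 module for Node.js exposes the documented Btrieve 2 classes (BtrieveClient, BtrieveFile, and their attribute helpers) under these names; treat it as an outline and check the sample file and your SDK documentation for the exact signatures.

```js
// Condensed sketch of the Btrieve 2 flow from Node.js.
// Assumes the SWIG-generated "btrieve2" module; method names follow the
// Btrieve 2 API documentation, but exact signatures may vary by SDK version.
const btrieve = require("btrieve2");

const client = new btrieve.BtrieveClient();

// Describe a fixed-length record file (the 10-byte length matches the sample).
const fileAttributes = new btrieve.BtrieveFileAttributes();
fileAttributes.SetFixedRecordLength(10);

// Define a key segment over the first 8 bytes (the timestamp).
const keySegment = new btrieve.BtrieveKeySegment();
keySegment.SetField(0, 8, btrieve.Btrieve.DATA_TYPE_TIMESTAMP);
const indexAttributes = new btrieve.BtrieveIndexAttributes();
indexAttributes.AddKeySegment(keySegment);

// Create and open the file, insert one record, then close.
client.FileCreate(fileAttributes, indexAttributes, "sensor.btr",
                  btrieve.Btrieve.CREATE_MODE_OVERWRITE);
const file = new btrieve.BtrieveFile();
client.FileOpen(file, "sensor.btr", null, btrieve.Btrieve.OPEN_MODE_NORMAL);
file.RecordCreate(Buffer.alloc(10)); // a real app would pack timestamp + reading here
client.FileClose(file);
```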

Putting the Sample Through its Paces

Care to run the aforementioned .js file to experience the performance of the Btrieve 2 API? Give it a try! Net-net, Node.js and Zen can provide a powerful array of options when it comes to developing mobile and IoT applications.

Data Intelligence

What is the Difference Between a Data Owner and a Data Steward?

Actian Corporation

October 12, 2020

There are many different definitions associated with data management and data governance on the internet. Moreover, depending on the company, these definitions and responsibilities can vary significantly. To clarify the situation, we’ve written this article to shed light on these two profiles and establish their potential complementarity.

Above all, we firmly believe that there is no ideal or standard framework. These definitions are specific to each company because of its organization, culture, and “legacy.”

Data Owners and Data Stewards: Two Roles With Different Maturities

The recent appointment of CDOs was largely driven by the digital transformations undertaken in recent years: mastering the data life cycle, from collection to value creation. To achieve this, a simple – yet complex – objective has emerged: first and foremost, to know the company’s information assets, which are all too often siloed.

Thus, the first step for many CDOs was to reference these assets. Their mission was to document them from a business perspective, along with the processes that transform them and the technical resources used to exploit them.

This founding principle of data governance was also evoked by Christina Poirson, CDO of Société Générale, during a roundtable discussion at Big Data Paris 2020. She explained the importance of knowing your data environment and the associated risks in order to ultimately create value. During her presentation, Christina Poirson also described the role of the Data Owner and the challenge of sharing data knowledge. As business-side roles, Data Owners are responsible for defining their datasets as well as their uses and their quality level:

“The data in our company belongs either to the customer or to the whole company, but not to a particular BU or department. We manage to create value from the moment the data is shared”. 

It is evident that the role of “Data Owner” has been present in organizations longer than that of the “Data Steward.” Data Owners are stakeholders in the collection, accessibility, and quality of datasets. We can describe a Data Owner as the person ultimately in charge of the data. For example, a marketing manager can take on this role for customer data; they then have the responsibility and duty to control its collection, protection, and uses.

More recently, the democratization of data stewardship has led to the creation of dedicated positions in organizations. Unlike the Data Owner, the Data Steward is more broadly involved in a challenge that has been regaining popularity for some time now: data governance.

In our article “Who are data stewards,” we go further in explaining this profile, which is involved in the referencing and documenting phases of enterprise assets (we are talking about data, of course!) to simplify their comprehension and use.

Data Steward and Data Owners: Two Complementary Roles?

In reality, companies do not always have the means to open new positions for Data Stewards. In an ideal organization, the complementarity of these profiles could look like the following:

The Data Owner is responsible for the data within their perimeter in terms of its collection, protection, and quality. The Data Steward, in turn, is responsible for referencing and aggregating the information, definitions, and any other business context needed to simplify the discovery and understanding of these assets.

Let’s take the example of the quality level of a dataset. If a data quality problem occurs, you would expect the Data Steward to relay the problems encountered by data consumers to the Data Owner, who is then responsible for investigating and proposing corrective measures.

To illustrate this complementarity, Chafika Chettaoui, CDO at Suez – also present at the Big Data Paris 2020 roundtable – confirms that they added another role to their organization: the Data Steward. According to her, the Data Steward is the person who makes sure the data flows work. She explains:

“The Data Steward is the person who will lead the so-called Data Producers (the people who collect the data in the systems), make sure they are well trained, and ensure they understand the quality and context of the data to create their reporting and analysis dashboards. In short, it’s a business profile, but with a real affinity for data and an understanding of data and its value.”

To conclude, two notions differentiate the two roles: the Data Owner is “accountable” for the data, while the Data Steward is “responsible” for the day-to-day data activity.

Data Management

What is a Modern Data Platform?

Actian Corporation

October 9, 2020


What is a modern data platform? When we ask this question, we get two different groups of answers. The first group centers on the technology needed for data processing in today’s business environment: cloud services, containers, and on-premises systems. The second group involves sources of data, types of data, and the management of that data, with a focus on using the data that is collected. The fact is that both perspectives need to work together to provide a modern data platform.

Modern Data Platform Technology

The modern data platform has to deal with three different aspects of business data.

1. Volume – This is perhaps the easiest aspect to deal with, using storage systems that utilize cloud technologies. Actian products can be delivered on hybrid systems that span private infrastructure and public cloud service providers, including Amazon Web Services, Microsoft Azure, and Google Cloud. These platforms all offer the capability to grow to support increasing volumes of data and, in many cases, the ability to retain that data for long periods.

2. The Variety of Data Types and Input Sources – Most people have multiple digital devices and interact with a business through multiple applications. A single individual may use a mobile phone, a tablet, and a computer to interact with a given business. Beyond direct interaction, inputs may come from social media and mobile phone apps in the form of structured or semi-structured data. One of the keys to a successful modern data platform is reliance on standard protocols rather than proprietary connections.

3. The Velocity of Data Accrual – Because data arrives from a variety of sources, it tends to arrive quickly as well. The consolidation of disparate data sources is another key to the modern data platform. Bringing all of the data together into a single repository or format is difficult at best, so virtual unification through data management and operations is necessary. One way to accomplish this virtual consolidation is through adaptive indexing and the use of metadata. Replacing or augmenting traditional taxonomies with faceted classification enables the data to be searched and organized in multiple ways, which supports different kinds of analytics and helps organizations understand their data.
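
As a toy illustration of faceted classification, the sketch below (in JavaScript, with invented field names) tags each dataset record with several metadata facets and builds one inverted index per facet, so the same data can be found and grouped along multiple dimensions instead of through a single rigid taxonomy.

```js
// Toy sketch of faceted metadata indexing; field names are invented.
const records = [
  { id: 1, source: "mobile-app", type: "clickstream", region: "EU" },
  { id: 2, source: "web",        type: "transaction", region: "US" },
  { id: 3, source: "mobile-app", type: "transaction", region: "EU" },
];

// Build one inverted index per facet: facet value -> list of record ids.
function buildFacetIndex(rows, facets) {
  const index = {};
  for (const facet of facets) {
    index[facet] = new Map();
    for (const row of rows) {
      const value = row[facet];
      if (!index[facet].has(value)) index[facet].set(value, []);
      index[facet].get(value).push(row.id);
    }
  }
  return index;
}

const index = buildFacetIndex(records, ["source", "type", "region"]);
console.log(index.type.get("transaction")); // -> [2, 3]
console.log(index.region.get("EU"));        // -> [1, 3]
```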

Beyond the technology is the need for a business to understand the data that they have, what additional data they might need, and how all of that data can be used.

Data Usage

The goal for any organization is data-driven insight into business operations and business needs. Those insights are only achievable with data pipelines that cleanse, format, and organize the data. The data pipeline is fed from the business data operations, which are in turn fed by raw data collection and the data repositories. That data should be accessed via standard protocols for things like workflows, data quality measures, and data governance.
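
As a small sketch of what cleansing, formatting, and organizing can look like in code (the field names are invented for illustration), a pipeline can be composed from small, testable steps:

```js
// Minimal data-pipeline sketch: each stage is a pure function, and the
// pipeline is their composition. Field names are invented for illustration.
const cleanse  = (row)  => ({ ...row, email: row.email?.trim().toLowerCase() ?? null });
const format   = (row)  => ({ ...row, amount: Number(row.amount) });
const organize = (rows) => [...rows].sort((a, b) => a.amount - b.amount);

const run = (rows) =>
  organize(rows.map(cleanse).filter((r) => r.email !== null).map(format));

console.log(run([
  { email: " Ana@Example.COM ", amount: "42.5" },
  { email: null,                amount: "7"    },
]));
// -> [{ email: "ana@example.com", amount: 42.5 }]
```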

The insights derived from the process need, in turn, to be distributed to the appropriate consumers in order to provide value. That distribution can be in the form of some combination of reports, service layer APIs for automated action, and mobile device alerts. The delivery of value based on the processes of a modern data platform needs to align with the needs of the users of the processed data. Dashboards and reports on things like financial and operational performance can be actionable data products. That means that reports and dashboards are important, but so also is information in a form that takes a step beyond reporting. For example, linking regulatory or previously stored data with current data would be useful when processing invoice data in a financial system.

From a business perspective, the types of data are an important aspect of the data platform. Customer data was ranked in a recent survey as the most important data in a data warehouse. Businesses want to turn prospects into customers, customers into loyal customers, and loyal customers into product advocates. To do that, good customer data collection and processing is necessary to build aggregated customer data that identifies customer groups and segments.

Best practices for managing customer data include the following (a sketch of the “single source record” practice follows the list):

  • Each customer has a single source record, no matter how many data sources there are.
  • Access to customer data should have standard governance for storage, retrieval, and usage.
  • Data that is not relevant to the business relationship should not be stored or processed.
  • Customer privacy needs to be protected.
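
Here is a toy sketch of the first practice, the single source record: duplicate customer rows from several sources are merged into one golden record per key. Matching on a normalized email address is a deliberate simplification; real systems use more robust identity resolution.

```js
// Toy sketch of "single source record": merge duplicates into a golden record.
// Keying on normalized email is a simplification of real identity resolution.
const sources = [
  { email: "Ana@Example.com", name: "Ana",      phone: null },
  { email: "ana@example.com", name: "Ana Diaz", phone: "+33 1 23 45 67 89" },
];

function mergeCustomers(rows) {
  const golden = new Map();
  for (const row of rows) {
    const key = row.email.trim().toLowerCase();
    const existing = golden.get(key) ?? {};
    golden.set(key, {
      email: key,
      // Prefer the most complete (longest non-empty) name seen so far.
      name: [existing.name, row.name].filter(Boolean)
              .sort((a, b) => b.length - a.length)[0] ?? null,
      // Keep the first non-null phone number.
      phone: existing.phone ?? row.phone,
    });
  }
  return [...golden.values()];
}

console.log(mergeCustomers(sources));
// -> [{ email: "ana@example.com", name: "Ana Diaz", phone: "+33 1 23 45 67 89" }]
```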

The modern data platform needs to manage multiple types of data. In the survey, transactional data is listed as the second most important type of data. Transactional data is often the result of customer interactions, such as product purchases, product returns, payments, subscriptions to newsletters and other recurring information, and where applicable, donations. These types of data often have legal significance in addition to business significance. Other common types of data include operational data, contact center data, marketing data, and reservations.

Once a business has decided on what data is important to collect, then the data platform needs to be deployed and monitored. Data needs to be processed in a timely manner, appropriately governed, and protected. The data storage needs to be managed as well as data access and usage.

Learn more about Actian’s hybrid data warehouse platform here.


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.