Data Intelligence

How You’re Going to Fail Your Data Catalog Project (or Not…)

Actian Corporation

March 25, 2020

Many solutions on the data catalog market offer an overview of all enterprise data, all thanks to the efforts conducted by data teams.

However, after a short period of use, due to the approaches undertaken by enterprises and the solutions that were chosen, data catalog projects often fall into disuse.

Here are some of the things that can make a data catalog project fail…or not:

Your Objectives Were Not Defined

Many data catalog projects are launched using a Big Bang approach to document assets, but without truly knowing their objectives.

Fear not! To avoid a poor implementation, we advocate a model based on iteration and value generation. This approach allows for better risk control and the possibility of a faster return on investment.

The first effects should be observable at the end of each iteration. In other words, the objective must be set to produce concrete value for the company, especially for your data users.

For example, if your goal is data compliance, start documentation focused on these properties and target a particular domain, geographic area, business unit, or business process.

Your Troop’s Motivation Will Wear Off Over Time

While it is possible to gain adherence and support regarding your company’s data inventory efforts in its early stages, it is impossible to maintain this support and commitment over time without automation capabilities.

We believe that descriptive documentation work should be kept to a minimum to keep your teams motivated. The implementation of a data catalog must be a progressive project, and it will only last if the effort required of each individual is less than the value they will get in the near future.

You Won’t Have the Critical Mass of Information Needed

For a data catalog to bring value to your organization, it must be richly populated.

In other words, when a user searches for information in a data catalog, they must be able to find it most of the time.

At the start of your data catalog implementation project, the chances that the information requested by a user is not available are quite high.

However, this transition period should be as short as possible so that your users can quickly see the value generated by the data catalog. By choosing a solution for its technology and its connectivity to information sources, you get a pre-filled data catalog as soon as it is implemented.
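To make the idea of a pre-filled catalog concrete, here is a minimal sketch of automated metadata harvesting. It assumes a SQLite database as the information source; the `harvest_catalog` function, the table names, and the entry layout are hypothetical and only illustrate the principle, not any particular product.

```python
import sqlite3

def harvest_catalog(conn):
    """Return one catalog entry per table: name, columns, and row count."""
    entries = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = [
            {"name": row[1], "type": row[2]}
            for row in conn.execute(f'PRAGMA table_info("{table}")')
        ]
        row_count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        entries.append({"table": table, "columns": columns, "row_count": row_count})
    return entries

# Hypothetical source database standing in for a real business system.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT)")
source.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
source.execute("INSERT INTO customers VALUES (1, 'FR')")

catalog = harvest_catalog(source)
for entry in catalog:
    print(entry["table"], [c["name"] for c in entry["columns"]], entry["row_count"])
```

Because the entries are read from the source itself, nobody has to type them in by hand, and rerunning the harvest keeps the catalog aligned with the source.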

Your Data Catalog Does Not Reflect Your Operational Reality

In addition to these challenges, data catalogs must have a set of automated features that are useful and effective over time. Surprisingly, many solutions do not offer these minimum requirements for a viable project and are unfortunately destined for a slow and painful death.

Connecting your data catalog to your sources gives your data consumers:

  • Reliability: they can trust the information made available in the data catalog for analysis and use in their projects.
  • Fresh information: the catalog stays up to date, ideally in real time.

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Actian Life

A Letter Regarding COVID-19 From Actian CEO Rohit De Souza

Actian Corporation

March 20, 2020

A letter to our customers and partners:

At Actian, our hearts and thoughts go out to the people who have been affected by this unprecedented global event and we appreciate the healthcare workers, local communities, and governments around the world who are on the front line working to contain this pandemic. Our focus is on the health and safety of our employees, family and the communities we engage with socially and professionally.

Please know that we are vigilantly monitoring the COVID-19 (coronavirus) situation around the clock and we are confident we have taken the necessary precautions to ensure our ability to continue running our business, operating our platform, and providing continuous high quality support to all our customers globally.

To date, Actian does not have any confirmed cases of COVID-19 and we are dedicated to minimizing the risk of exposure. We have not had any disruptions in our services and offerings, and as part of our pandemic response we have implemented the following measures — with the goal of ensuring our service to you will be uninterrupted, while protecting in every way possible the health and well-being of our personnel.

We are utilizing social distancing as a worldwide policy. All Actian employees around the world have been strongly encouraged to work from home. With our highly distributed workforce, and many of our employees typically working remotely, this shift has been extremely smooth, and we will continue to deliver the highest levels of service and support to our customers. We have global collaboration tools and a weekly company-wide “All Hands Call” (along with numerous regular team calls) for continuous feedback and updates to our employees on any enhanced guidelines. We have also mandated travel restrictions and visitor guidelines to reduce the risk of infection.

Our teams have been instructed to work with customers through digital channels as much as possible in support of social distancing and keeping in-person interactions to a minimum. We have postponed our in-person marketing events and implemented a shift to virtual events to keep our customers updated and connected with the technology community. To assist our customers with the increased needs for remote expertise during these challenging times, we are offering our Remote DBA and Customer Success services at a discounted rate.

Actian is committed to doing our part to stem the spread of the COVID-19 virus, and to heed the best practices directed by public health authorities and government guidelines. This dynamic and rapidly moving health crisis will present challenges for everyone at home and in the workplace. Our early preparedness efforts give us confidence that we will continue providing excellent services to you without interruption or compromise. We at Actian are wishing you and your families and your colleagues good health and well-being.

With the actions we have taken, we remain confident and committed to supporting all our customers and partners as they work through these trying times.

Rohit de Souza
President & CEO

Data Intelligence

How Spotify Improved Their Data Discovery for Their Data Scientists

Actian Corporation

March 19, 2020

As the world leader in the music streaming market, Spotify is without question driven by data.

Spotify has access to the biggest collections of music in the world, along with podcasts and other audio content.

Whether they’re considering a shift in product strategy or deciding which tracks they should add, Spotify says that “data provides a foundation for sound decision-making”.

Spotify in Numbers

Founded in 2006 in Stockholm, Sweden, by Daniel Ek and Martin Lorentzon, the leading music app’s goal was to create a legal music platform in order to fight the challenge of online music piracy in the early 2000s.

Here are some statistics & facts about Spotify in 2020:

  • 248 million active users worldwide.
  • 20,000 songs are added per day on their platform.
  • Spotify has a 40% share of the global music streaming market.
  • 20 billion hours of music were streamed in 2015.

These numbers represent not only Spotify’s success, but also the colossal amounts of data that are generated each year, let alone each day! To enable their employees, or as they call them, Spotifiers, to make faster and smarter decisions, Spotify developed Lexikon.

Lexikon is a library of data and insights that helps employees find and understand the data and knowledge generated by their expert community.

What Were the Data Issues at Spotify?

In their article “How We Improved Data Discovery for Data Scientists at Spotify”, Spotify explains that they started their data strategy by migrating data to the Google Cloud Platform and saw an explosion of their datasets. They were also in the process of hiring many data specialists, such as data scientists and analysts. However, they explain that datasets lacked clear ownership and had little-to-no documentation, making it difficult for these experts to find them.

The next year, they released Lexikon as a solution to this problem.

Their first release allowed their Spotifiers to search and browse through available BigQuery tables, as well as discover past research and analyses. However, months after the launch, their data scientists were still reporting data discovery as a major pain point: they spent most of their time trying to find their datasets, which delayed informed decision-making.

Spotify then decided to focus on this specific issue by iterating on Lexikon, with the sole goal of improving the data discovery experience for data scientists.

How Does Lexikon Data Discovery Work?

In order for Lexikon to work, Spotify started out by conducting research on their users, their needs, and their pain points. In doing so, the firm was able to gain a better understanding of their users’ intent and use this understanding to drive product development.

Low Intent Data Discovery

For example, you’ve been in a foul mood so you’d like to listen to music to lift your spirits. So, you open Spotify, browse through different mood playlists and put on the “Mood Booster” playlist.

Tah-dah! This is an example of low-intent data discovery, meaning your goal was reached without extremely strict demands.

To put this into the context of Spotify’s data scientists, especially new ones, their low intent data discovery would be:

  • Find popular datasets used widely across the company.
  • Find datasets that are relevant to the work my team is doing.
  • Find datasets that I might not be using, but I should know about.

So in order to satisfy these needs, Lexikon has a customizable homepage to serve personalized recommendations to users. The homepage recommends potentially relevant, automatically generated suggestions for datasets such as:

  • Popular datasets used within the company.
  • Datasets recently used by the user.
  • Datasets widely used by the team the user belongs to.
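The three homepage sections above can be sketched from a simple usage log. This is only an illustration under stated assumptions: the log format, the `homepage_sections` function, and the dataset names are hypothetical, and Spotify's article does not describe how Lexikon computes its recommendations.

```python
from collections import Counter

# Hypothetical usage log: (user, team, dataset), oldest event first.
usage_log = [
    ("ana", "growth", "daily_streams"),
    ("ben", "growth", "daily_streams"),
    ("ben", "growth", "signup_funnel"),
    ("cho", "ads", "ad_impressions"),
    ("cho", "ads", "daily_streams"),
    ("ana", "growth", "signup_funnel"),
    ("ana", "growth", "playlist_adds"),
]

def homepage_sections(log, user, team, limit=2):
    """Build the three recommendation lists for one user."""
    # Popular across the company: most frequently used datasets overall.
    popular = [d for d, _ in Counter(d for _, _, d in log).most_common(limit)]
    # Widely used by the user's team: same count, restricted to the team.
    team_top = [
        d for d, _ in Counter(d for _, t, d in log if t == team).most_common(limit)
    ]
    # Recently used by the user: walk the log backwards, skipping repeats.
    recent = []
    for u, _, d in reversed(log):
        if u == user and d not in recent:
            recent.append(d)
    return {"popular": popular, "recent": recent[:limit], "team": team_top}

sections = homepage_sections(usage_log, user="ana", team="growth")
print(sections)
```

All three lists come from the same log, so they are generated automatically and need no manual curation.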

High Intent Data Discovery

To explain this in simple terms, Spotify uses the example of hearing a song, searching for it over and over in the app until you finally find it, and then listening to it on repeat. This is high intent data discovery.

A data scientist at Spotify with high intent has specific goals and is likely to know exactly what they are looking for. For example they might want to:

  • Find a dataset by its name.
  • Find a dataset that contains a specific schema field.
  • Find a dataset related to a particular topic.
  • Find a dataset that a colleague used but whose name they can’t remember.
  • Find the top datasets that a team has used for collaborative purposes.

To fulfill their data scientists’ needs, Spotify focused first on their search experience.

They built a search ranking algorithm based on popularity. As a result, data scientists reported that their search results were more relevant and that they had more confidence in the datasets they discovered, because they could see which datasets were more widely used across the company.
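A popularity-based ranking can be sketched in a few lines. The article does not detail Spotify's actual algorithm, so everything here is an assumption for illustration: the `search` function, the `queries_last_30d` popularity signal, and the dataset names are hypothetical.

```python
def search(datasets, query, limit=5):
    """Return names of datasets matching the query, most used first."""
    q = query.lower()
    matches = [
        d for d in datasets
        if q in d["name"].lower() or q in d["description"].lower()
    ]
    # Popularity is the primary sort key; the name breaks ties deterministically.
    matches.sort(key=lambda d: (-d["queries_last_30d"], d["name"]))
    return [d["name"] for d in matches[:limit]]

# Hypothetical catalog entries with a usage counter as the popularity signal.
datasets = [
    {"name": "track_plays_daily", "description": "Daily play counts per track",
     "queries_last_30d": 420},
    {"name": "track_plays_raw", "description": "Unaggregated play events",
     "queries_last_30d": 35},
    {"name": "artist_profile", "description": "Artist metadata",
     "queries_last_30d": 900},
    {"name": "tmp_track_test", "description": "Scratch copy of track plays",
     "queries_last_30d": 2},
]

results = search(datasets, "track")
print(results)
```

The rarely used scratch table still matches the query, but it sinks to the bottom, which is the kind of signal that helps a user judge which dataset is safe to rely on.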

In addition to improving their search rank, they introduced new types of properties (schemas, fields, contact, team, etc.) to Lexikon to better represent their data landscape.

These properties open up new pathways for data discovery. For example, a data scientist searching for “track_uri” can navigate to the “track_uri” schema field page and see the top tables containing this field. Since its introduction, this feature has proven to be a critical pathway for data discovery, with 44% of Lexikon users visiting these types of pages.
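A schema field page of this kind amounts to an inverted index from field names to the tables that contain them. The sketch below is a guess at the general idea, not Lexikon's implementation; the `build_field_index` function, the table names, and the usage counts are hypothetical.

```python
from collections import defaultdict

# Hypothetical table metadata: schema fields plus a usage counter.
tables = {
    "track_plays_daily": {"fields": ["track_uri", "date", "plays"],
                          "queries_last_30d": 420},
    "track_metadata": {"fields": ["track_uri", "title", "artist_uri"],
                       "queries_last_30d": 310},
    "artist_profile": {"fields": ["artist_uri", "name"],
                       "queries_last_30d": 900},
}

def build_field_index(tables):
    """Map each schema field to the tables containing it, most used first."""
    index = defaultdict(list)
    for table, info in tables.items():
        for field in info["fields"]:
            index[field].append(table)
    for field in index:
        index[field].sort(key=lambda t: -tables[t]["queries_last_30d"])
    return dict(index)

field_index = build_field_index(tables)
top_tables = field_index["track_uri"]
print(top_tables)
```

Looking up a field then returns its "top tables" directly, without the user needing to know any table name in advance.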

Final Thoughts on Lexikon

Since making these improvements, the use of Lexikon amongst data scientists has increased from 75% to 95%, putting it in the top 5 tools used by data scientists!

Data discovery is thus no longer a major pain point for their Spotifiers.

Sources:

Spotify Usage and Revenue Statistics (2019): https://www.businessofapps.com/data/spotify-statistics/
How We Improved Data Discovery for Data Scientists at Spotify: https://labs.spotify.com/2020/02/27/how-we-improved-data-discovery-for-data-scientists-at-spotify/
75 Amazing Spotify Statistics and Facts (2020): https://expandedramblings.com/index.php/spotify-statistics/

Data Analytics

Why You Should Offload Analytics into a Data Warehouse

Actian Corporation

March 17, 2020

Modern businesses are fueled by data. The insights that your data brings are what power decision-making and enable you to optimize business processes and respond to changing market conditions. The organizations that have data and manage it well excel. Those that lack data or struggle to harvest actionable insights from their data have a tougher time.

One of the most significant challenges IT faces as the steward of company data is striking a balance between the high-performance, real-time data processing that individual business processes need and the deep, enterprise-scale analytics required to solve the company’s biggest problems. By offloading analytics from Online Transaction Processing (OLTP) systems into a cloud data warehouse like the Actian Data Platform, your company can achieve both objectives at the same time.

Sustaining High-Performance in Your Business Systems

OLTP systems are your transactional business systems – the tools that your employees, partners, and customers interact with in the course of normal day-to-day business activities. These systems are optimized for real-time data processing (as they should be). Any impact on performance has a direct impact on your process cycle times and employee productivity. With each new business transaction, you create more data.

As the size of your OLTP database grows, the applications that run on it begin to slow down. Adding an analytics load on top of the transactional processing makes the problem even worse. Sustaining a high-performing business system requires continuous active tuning of the OLTP system to eliminate any non-essential activities. A key technique that IT teams employ is to offload analytics processing into a data warehouse, freeing up compute capacity in the OLTP system, so business software has more system resources from which to draw.

Leveraging Change Data Capture for Real-Time Analytics

Change data capture (CDC) is a capability available in nearly all databases and is mostly used to populate data warehouses. It monitors changes in your transactional data that might correspond to business events representing opportunities or threats for your business. Some changes in business transactions are good, such as an incremental increase in sales transaction values. Others are adverse, such as a sudden drop in the number of users logged in to your website. Change data capture can help you understand when something is awry, so you can assess the impact and determine whether any corrective action is required.

Running change data capture operations on OLTP systems can be problematic. The overhead it places on the system has to be monitored to minimize the performance impact on your business systems. Change data capture is most valuable with larger data sets for analyzing trends. If you have good log archive management in place, performance overheads can be contained. So, it makes sense to use change data capture to populate your data warehouse system with near-real-time business operational data.
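As a rough illustration of populating a warehouse from captured changes, the sketch below uses two in-memory SQLite databases. It is trigger-based for simplicity, whereas production systems commonly read the database's transaction log instead; the `orders_changes` table, the `sync` function, and the schema are all hypothetical.

```python
import sqlite3

# Hypothetical OLTP database with a change table filled by triggers.
oltp = sqlite3.connect(":memory:")
oltp.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);
CREATE TABLE orders_changes (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT, id INTEGER, amount REAL
);
CREATE TRIGGER orders_ins AFTER INSERT ON orders BEGIN
    INSERT INTO orders_changes (op, id, amount) VALUES ('I', NEW.id, NEW.amount);
END;
CREATE TRIGGER orders_upd AFTER UPDATE ON orders BEGIN
    INSERT INTO orders_changes (op, id, amount) VALUES ('U', NEW.id, NEW.amount);
END;
CREATE TRIGGER orders_del AFTER DELETE ON orders BEGIN
    INSERT INTO orders_changes (op, id, amount) VALUES ('D', OLD.id, NULL);
END;
""")

# Hypothetical warehouse copy that analytics queries would run against.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")

def sync(oltp, warehouse, last_seq):
    """Apply changes newer than last_seq; return the new high-water mark."""
    changes = oltp.execute(
        "SELECT seq, op, id, amount FROM orders_changes WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    for seq, op, row_id, amount in changes:
        if op == "D":
            warehouse.execute("DELETE FROM orders WHERE id = ?", (row_id,))
        else:
            # Inserts and updates are both applied as an upsert.
            warehouse.execute(
                "INSERT OR REPLACE INTO orders (id, amount) VALUES (?, ?)",
                (row_id, amount),
            )
        last_seq = seq
    warehouse.commit()
    return last_seq

oltp.execute("INSERT INTO orders VALUES (1, 10.0)")
oltp.execute("INSERT INTO orders VALUES (2, 25.0)")
oltp.execute("UPDATE orders SET amount = 12.5 WHERE id = 1")
oltp.execute("DELETE FROM orders WHERE id = 2")

last_seq = sync(oltp, warehouse, 0)
warehouse_rows = warehouse.execute("SELECT id, amount FROM orders ORDER BY id").fetchall()
print(last_seq, warehouse_rows)
```

Only the changes since the last sync are read, so the transactional system is never scanned in full. Note that the triggers themselves add write overhead, which is exactly the kind of load the paragraph above says must be monitored.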

Extend the Useful Life of Your Business Systems

Business systems like Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), Human Resource Management (HRM), IT Service Management (ITSM), and eCommerce systems are costly to install and disruptive to the business when you need to replace them. If the systems you are using today run on-premises in your data center, upgrades to hardware infrastructure to add compute capacity may require new capital outlays and/or migration to the cloud. Offloading analytics from these systems into a data warehouse can help you keep these systems running longer with existing resources – postponing the impacts of system upgrades.

Over the next few years, new business systems offerings that are cloud-native, integrate Artificial Intelligence (AI) capabilities, and have enhanced support for streaming data are poised to come to market, creating a natural time to upgrade. Extending the useful life of your existing systems gives your company the flexibility to wait for the new features that are “coming soon” and catch the next wave of emerging technology to maximize the return on investment of your upgrade projects.

Offloading analytics from your OLTP system into a data warehouse is a smart IT decision. It helps keep your business systems running faster, gives you the real-time data insights you need for agile decision-making, and extends the useful lifespan of your existing systems, so you can capture the next wave of technology innovations that are just over the horizon. Actian Data Platform can help. As a data warehouse solution, the Actian Data Platform can run on-premises, in the cloud, or even as a hybrid split across different environments, giving you the analytics capabilities and scale that you will need to manage your company’s data successfully.

To learn more, visit www.actian.com/data-platform
