Actian Life

Introducing Our 2021 Interns

Actian Corporation

July 28, 2021


Summer is in full swing, which can only mean one thing… Actian’s internship program is back! For the second year in a row, Actian is hosting a 12-week virtual internship program designed for college students to explore careers in tech. This year, our interns will get the chance to gain experience with teams in Engineering, Finance, Technical Course Development, Marketing, and People. Get to know a little more about each intern below:

Adesh Kumar

IT Applications Intern

Adesh is a graduate student pursuing a Master of Fine Arts in Visual Effects from the Savannah College of Art and Design. After receiving his bachelor’s degree in Computer Science from UC Santa Cruz, Adesh became fascinated with visual design. It was not long before he discovered a field that would allow him to leverage his CS skills without having to leave his artistic side behind. When Adesh learned about user interface and user experience design, he knew this was the type of work he was looking to explore. Luckily, through Actian, Adesh has been able to further develop his skills in these areas and aspires to one day lead a UX team of his own.

Now, you might be wondering what exactly an IT Applications Intern does at Actian. After waking up at 7 am, sipping on some coffee, and going through a few morning meetings, Adesh is ready to work on his summer project. His project focuses on designing an internal application for the company. In just five weeks, Adesh is amazed at how much he has learned in such little time. Already he has iterated through different designs, contributed his ideas in meetings, and has come to appreciate the importance of documenting everything he does.

One word you can use to describe Actian’s work environment is collaborative—and this just happens to be Adesh’s favorite part about interning here. Not only is Adesh able to work with his team towards weekly goals, but he is also able to partner with people outside his immediate team. This allows him to see how different teams work together to better support the company.

After a long week of work, Adesh shuts down his computer, turns off the lights in his studio (a garage he and his brother converted into a workspace), and begins his relaxing weekend. Usually, you can find him watching Formula One races, catching up on the latest NBA games, or practicing his newly learned quarantine trick—juggling any three objects in sight.

Iris Lee  

Product Marketing Intern

Iris is a rising junior studying Economics and Data Science at UC Berkeley. When it came to deciding on her major, Iris knew she had always been curious about consumers’ decision-making behavior, particularly in a business setting. However, after taking an introductory data science course, Iris gained a greater interest in the intersection of business and technology. In fact, the reason she applied to Actian was that she found it to be a perfect blend of the two. Upon doing further research on the company, Iris became intrigued by the concept behind cloud data warehouses. As someone who loves data, growth, and innovation, reading about Actian’s products made her excited about the future of analytics—and it was an avenue she wanted to learn more about.

Knowing that Actian’s products were the first thing that caught her eye, working with the marketing team is the perfect fit for Iris. As the product marketing intern, Iris is researching different buyer personas to better understand customers in industries such as health care, manufacturing, and financial services. Through this research, she hopes to gain insight into the types of data visualization tools these industries use and how they can operate in conjunction with Actian products. During this internship, Iris hopes to strengthen her market research approach and get comfortable with external applications in this area. She is also looking forward to learning all the tips and tricks of becoming a great presenter, which she will put to use during the internship showcase.

One of the things that makes Actian’s internship program so great is that as much as interns get to work on meaningful projects, they can also take time to de-stress by attending fun weekly intern events. This is Iris’s favorite part of the internship. Meeting other interns, laughing at Fibbage responses, and creating random PowerPoints are just a few of the perks of attending these events, which, in Iris’s opinion, are a great way to wind down the week.

When Iris is not working or attending an intern event, you can find her cruising through her neighborhood on her penny board, reading fiction books from the New York Times Best Sellers, or indulging in the greatest snack of all time…drum roll please…Hot Cheetos.

Mollie Kendall  

Digital and Demand Generation Intern

Mollie is a senior at Texas State University, where she will be graduating this fall with a Bachelor of Business Administration in Marketing. Mollie decided to concentrate her degree in marketing when she noticed how crucial it is to the success of any business—especially her own. Yes, you read that right: prior to entering college, Mollie was already a successful business owner! In 2008, she began her own photography business. What first began as family portraits eventually progressed to all different types of photography, including commercial, weddings, and sports. It was sports photography that eventually led Mollie to land a job with the Texas Stars as their marketing photographer. Although shooting ice hockey is not an easy task, capturing such a competitive sport is something Mollie truly enjoys and hopes to continue doing in the future.

Mollie first heard about Actian’s internship program through a close mentor, who connected her with an employee at Actian. Wanting to go beyond the classroom setting, Mollie was looking to gain real-world experience in marketing, which was exactly what the Digital and Demand Generation intern role offered.

Currently, Mollie is exploring different facets of social media to see how Actian can stand out in today’s competitive market. Although marketing can be labeled as a “creative field”, Mollie has already learned the importance of collecting, analyzing, and researching data in this area. By the end of the internship, she hopes to gain a greater understanding of the analytics behind marketing and discover what strategies increase brand recognition.

We all know the phrase “you learn something new every day,” but we rarely take time to reflect on how true it is. For Mollie, this is one of the reasons Actian is a great company. Whether it’s new software, a new skill, or marketing lingo, Mollie appreciates how dedicated her team is to ensuring she is always learning and developing in new ways.

When she is not at work or shooting for the Texas Stars, you can most likely find Mollie playing on her local ice hockey team, spending time with her adorable Aussie puppy, or watching any type of gangster mob movie out there (with Goodfellas being her favorite).

Geethika Bonthala  

Finance Intern

Geethika is an incoming sophomore at the University of Texas at Austin majoring in Finance, minoring in Economics, and working towards a certificate in Applied Statistical Modeling. Finance first caught her attention when she joined DECA in high school, an organization dedicated to developing students in leadership and business roles. Through DECA, Geethika had the opportunity to shadow a mentor who worked in finance. It was through this experience where she noticed that many of her strengths closely aligned with the skills needed to succeed in this field.

Geethika’s journey to Actian was a bit different than other interns. She first applied to the Sumeru Equity Partners Fellows Program, which seeks to place fellows at one of its portfolio companies. After going through a few rounds of interviews, she landed herself a spot on Actian’s finance team, which was a top choice for Geethika as she was looking to gain exposure to corporate finance.

Working on the FP&A team, Geethika is currently building a KPI dashboard for the finance team. One of her favorite parts about this project is working collaboratively with her team. Additionally, Geethika enjoys the work environment at Actian. There is enough space for her to do her research and independently try to tackle a task, but there is also enough support that she knows she can reach out to people if she needs help. Given that this is her first internship, Geethika loves that she is doing real work, as her project will have a direct impact on the company.

When Geethika first decided to study finance, she never thought about it in the context of technology. However, after this internship, she would love to explore future opportunities that incorporate both finance and tech. When she is not crunching numbers or working on an Excel spreadsheet, you can find Geethika playing the violin (which she has played for 14 years!), spending afternoons on the tennis court, or making trips to one of the most therapeutic places in the world—Target.

Bianca Lawson  

Technical Course Developer Intern

Bianca is currently a graduate student at the University of Central Florida pursuing her master’s degree in Instructional Design and Technology. Now, for a quick short story on how Bianca came across this internship opportunity… One day, Bianca decided to open her laptop and browse through her advisor’s bulletin page to see what internship opportunities were available for the summer. While scrolling through the page, one particular company stood out… blue lettering, Montserrat font, great-looking logo… it was Actian. Although this was her first time reading about the company, the internship description matched the skills Bianca was eager to learn. Most importantly, she noticed that the internship program offered a dedicated buddy—something she valued, especially for getting acclimated to a new industry. Today, Bianca is grateful she took a few seconds of her day to scroll through that page, since it led her to a first tech internship that has exceeded her expectations.

As the technical course developer intern, Bianca is helping revamp courses in Actian Academy, the company’s on-demand training platform. During the week, Bianca is busy creating storyboards, typing up scripts, and getting the chance to see first-hand what it’s like to produce educational content for internal and external use.

One of Bianca’s favorite parts about being on the education team is how encouraging and patient everyone is. She loves having the opportunity to work on things independently, present her work to her team, and then get feedback. Additionally, she knows that if she ever has questions, someone on her team is always willing to answer them. One of Bianca’s biggest goals this summer is to gain competence and confidence in a new skill, which she is certain she’ll reach with such a great team by her side.

Outside of work, you can usually find Bianca with a mystery book or a controller in hand, most likely playing The Sims. During her free time, she also loves to cosplay (her favorite character to dress up as is Tina Belcher) and attend conventions to meet new people. In the future, Bianca sees herself having a career in instructional design, working in the tech sector, and perhaps even living in a state where it snows (since she has never seen snow before!).

Josh Steigerwald

Software Engineering Intern

Josh is a rising senior at Stony Brook University studying Computer Science and Finance. Growing up, Josh didn’t know much about computers (apart from playing the educational CD-ROM games) and only had real exposure to them when he was in school. However, this all changed when his parents introduced a new addition to the family—the iMac. This instantly piqued Josh’s curiosity about computers and naturally led him to take a few computer science courses in high school. It wasn’t long before Josh realized that he enjoyed the content he was learning and could see himself pursuing a career in this field. In the future, after leaving his mark on the engineering world, Josh would like to explore the business side of tech, which is a big reason why he is also studying finance.

If there is one word to describe Josh, it would definitely be entrepreneurial. At an early age, Josh started a business… which down the road turned into several businesses. From doing handyman work to running a landscaping business to teaching private music lessons, Josh has worn multiple hats and gained valuable skills throughout the years. All this business work has taught Josh how to handle his finances, manage his time, and build trust with customers.

What drew Josh to this internship program was Actian’s culture. Every person he met during the interview stage was friendly and willing to extend themselves, even before he was hired. Now that he is working with Vector, Josh can see how this welcoming atmosphere is fostered within his team, which has ultimately made this internship one of his best learning experiences so far.

Something you might not know about Josh is that he collects instruments and vinyl records. He has collected over 50 instruments and at one point had over 4,000 vinyl records. In his free time, you can find Josh spending time with his cats, shooting some digital and film photography, or performing jazz music.

Alba Lokaj  

Software Engineering Intern

Alba is a rising third year at Jacobs University Bremen studying Computer Science. Like many others, Alba was first introduced to computer science through her education. After taking some computer science classes in high school, she was amazed at how writing a few lines of code could bring an application or website to life. Alba saw computer science as a way to build and drive innovation in the future, so when it came time to choose her area of study, computer science was the ideal fit for her.

Alba first heard about Actian through LinkedIn. Looking to get experience in managing and manipulating data, Alba saw Actian as the ideal data-driven company to help her achieve this. Currently, Alba sits in Actian’s office in Ilmenau, Germany, and works directly with the Vector product. Already, she has researched external libraries, worked with new programming languages, and partnered with people on her team to help her project succeed. Alba’s main goal coming into this internship was to gain practical experience in software, something she can confidently say she achieved as soon as she started.

After this internship, Alba knows she would like to continue working in a software engineering role and would one day like to obtain her master’s degree in Computer Science. She hopes to take her skills and work on purposeful projects in both her personal and professional life.

In her free time, you can find Alba going on a run in the green countryside around the city, playing on her university’s soccer team (where she takes on the leadership role of captain), or listening to alternative/indie music. A key strength of Alba’s is her attention to detail, which is why she has a gift for impersonations. Not only can she impersonate her friends almost perfectly, but she can also do this with characters she sees on TV, catching onto every small gesture and behavior.

And now for some rather familiar faces…

Sam Nichols

Software Engineering Intern

Sam is a returning intern from the University of Wisconsin-Madison, where he will be graduating this fall with a degree in Electrical Engineering and Computer Science. Returning to Actian was an easy choice for Sam. Last summer, he felt that he had grown tremendously, both personally and professionally, throughout the internship program. It’s hard to choose just one favorite thing about Actian, but if Sam really had to choose, it would be the amount of support offered by everyone he meets. At Actian, he never felt unsure or lost about what he needed to do. As an intern, this is huge. Having confidence in your work is an awesome feeling, but knowing you have people who care about you and your work is an even better one.

This summer, Sam is looking forward to working with Vector, the industry’s fastest analytic database. After spending the prior summer learning the ins and outs of the Actian Data Platform, Sam is excited to focus on another product and learn a slightly different skillset, all while being at a company that has been so vital to his professional development.

At the end of the 12 weeks, interns have the opportunity to present their capstone project to the entire company. For Sam, this was a great experience because you are presenting to employees beyond your team who all have a genuine interest in what each intern has to say. After presenting his project and being showered with supportive messages in the chat, it became clear to Sam that Actian is a place that values the input of all interns.

Working remotely comes with its perks, one of them being the ability to create your ideal workstation. So I asked Sam to describe his workstation, and this is what he described: Imagine you walk into a room, and in the corner there is an L-shaped desk. On one side of the desk rest Sam’s personal laptop and personal desktop. On the other side are his work laptop and work desktop. Sam is quite literally surrounded by screens, which is the perfect setup for him, as all his work can be displayed at once.

Amy Vides

Employee Experience Intern

Amy is a returning intern studying Economics and Administrative Studies at the University of California, Riverside. Working with Actian’s people team for two summers in a row, Amy can now confidently say that the best part about interning at Actian is indeed the people. Attending coffee chats, playing trivia during virtual happy hours, and connecting with individuals from different departments are just a few things that Amy loves about Actian.

Amy’s favorite Actian memory from last summer would have to be the greenlight brainstorming sessions she had with her team. During these sessions, Amy was able to advocate for certain ideas, listen to others’ perspectives, and ultimately learn how collaboration can drive the success of a project.

One fundamental skill that Amy has been able to develop during her time here is project management. Before her internship, she didn’t know much about this process other than having to finish a project by a certain date. Today, she has a better understanding of the stages within project management, the constraints that need to be considered, and how to measure progress to ensure that things are moving along.

One thing that Amy deeply cares about is bringing equity to the tech space. Working with a team that values diversity, equity, inclusion, and belonging has been such a rewarding experience for her. Every day she is amazed by the steps and initiatives her team is taking towards making everyone at Actian feel respected and empowered.

This summer, Amy is excited to learn more about employee engagement, professional development, and employer branding. She loves being in a space that encourages her to try new things, lets her take the lead on different projects, and challenges her to think outside the box. When not working, you can find Amy looking for the next big food trend, browsing abstract art on Pinterest, or trying to convince her friends why NorCal is so much better than SoCal.


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Integration

Comprehensive Guide to Effective Data Preparation

Actian Corporation

July 28, 2021


Abraham Lincoln might easily have been discussing data preparation steps for analytics when he said, “If I had 8 hours to chop down a tree, I would spend 6 sharpening my axe.”  Spending 75% of the allotted time on preparation may seem like a lot. But in fact, most industry observers report that data preparation steps for business analysis or machine learning consume 70 to 80% of the time spent by data scientists and analysts.

Data Preparation Steps in Detail

The data preparation pipeline consists of the following steps:

  1. Access the data.
  2. Ingest (or fetch) the data.
  3. Cleanse the data.
  4. Format the data.
  5. Combine the data.
  6. And finally, analyze the data.

Access

There are many sources of business data within any organization. Examples include endpoint data, customer data, marketing data, and all their associated repositories. This first essential data preparation step involves identifying the necessary data and its repositories. This is not simply identifying all possible data sources and repositories, but identifying all that apply to the desired analysis. This means that there must first be a plan that includes the specific questions to be answered by the data analysis.

Ingest

Once the data is identified, it needs to be brought into the analysis tools. The data will likely be some combination of structured and semi-structured data in different types of repositories. Importing it all into a common repository is necessary for the subsequent steps in the pipeline. Access and ingest tend to be manual processes with significant variations in exactly what needs to be done. Both data preparation steps require a combination of business and IT expertise and are therefore best done by a small team. This step is also the first opportunity for data validation.

Cleanse

Cleansing the data ensures that the data set can provide valid answers when the data is analyzed. This step could be done manually for small data sets but requires automation for most realistically sized data sets. There are software tools available for this processing. If custom processing is needed, many data engineers rely on applications coded in Python. There are many different problems possible with the ingested data. There could be missing values, out-of-range values, nulls, and whitespaces that obfuscate values, as well as outlier values that could skew analysis results. Outliers are particularly challenging when they are the result of combining two or more variables in the data set. Data engineers need to plan carefully for how they are going to cleanse their data.
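
To make these checks concrete, here is a minimal sketch in Python with pandas of the kinds of cleansing steps described above. The file name and columns are hypothetical; a real pipeline would parameterize them:

```python
import pandas as pd
import numpy as np

# Hypothetical ingested data; column names are illustrative only.
df = pd.read_csv("ingested_sales.csv")

# Strip whitespace that can obfuscate values ("  42 " vs. "42").
df["region"] = df["region"].str.strip()

# Treat empty strings as missing, then drop rows lacking a key field.
df = df.replace("", np.nan).dropna(subset=["order_id"])

# Flag out-of-range values for review rather than silently deleting them.
bad_qty = ~df["quantity"].between(1, 10_000)
print(f"{bad_qty.sum()} rows with out-of-range quantities")

# Mark univariate outliers (beyond 3 standard deviations) for inspection.
z_scores = (df["amount"] - df["amount"].mean()) / df["amount"].std()
df["amount_outlier"] = z_scores.abs() > 3
```

Flagging suspect rows instead of deleting them keeps the decision about outliers, especially multivariate ones, with the data engineer.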

Format

Once the data set has been cleansed, it needs to be formatted. This step includes resolving issues like multiple date formats in the data or inconsistent abbreviations. It is also possible that some data variables are not needed for the analysis and should therefore be deleted from the analysis data set. This is another data preparation step that will benefit from automation. The cleansing and formatting steps should be saved as a repeatable recipe that data scientists or engineers can apply to similar data sets in the future. For example, a monthly analysis of sales and support data would likely have the same sources that need the same cleansing and formatting steps each month.
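
One way to make the recipe repeatable is to wrap the formatting steps in a function that can be re-applied to next month’s extract. A minimal sketch, with invented column names:

```python
import pandas as pd

def format_recipe(df: pd.DataFrame) -> pd.DataFrame:
    """Formatting steps saved as a reusable recipe for similar data sets."""
    out = df.copy()
    # Collapse multiple date formats into one canonical datetime type
    # (format="mixed" requires pandas 2.0 or later).
    out["order_date"] = pd.to_datetime(out["order_date"], format="mixed")
    # Normalize inconsistent abbreviations to a single convention.
    out["state"] = out["state"].replace({"Calif.": "CA", "Tex.": "TX"})
    # Delete variables that are not needed for the analysis.
    return out.drop(columns=["internal_notes"], errors="ignore")
```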

Combine

When the data set has been cleansed and formatted, it may be transformed by merging, splitting, or joining the input sets. Once the combining step is complete, the data is ready to be moved to the data warehouse staging area. Once data is loaded into the staging area, there is a second opportunity for validation.
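
In practice, the combining step is often a join followed by a write to the staging area. A minimal sketch, assuming hypothetical file paths and a shared key:

```python
import pandas as pd

# Hypothetical cleansed-and-formatted inputs.
sales = pd.read_parquet("clean/sales.parquet")
support = pd.read_parquet("clean/support.parquet")

# Join the input sets on a shared key; a left join keeps every sale
# even when no support ticket exists for it.
combined = sales.merge(support, on="customer_id", how="left")

# Move the result to the staging area, where the second round of
# validation can run before analysis begins.
combined.to_parquet("staging/sales_with_support.parquet")
```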

Analyze

Once the analysis has begun, changes to the data set should only be made with careful consideration. During analysis, algorithms are often adjusted and compared to other results. Changes to the data can skew analysis results and make it impossible to determine whether the different results are caused by changes to the data or the algorithms.

Data Preparation Principles and Best Practices

Many of the principles of functional programming can be applied to data preparation. It is not necessary to use a functional programming language to automate data preparation, but such languages are often used to do so.

  1. Understand the data consumer – who is going to use the data and what questions do they need answered.
  2. Understand the data – where it is coming from and how it was generated.
  3. Save the raw data. If the data engineer has the raw data, then all the data transformations can be recreated. Additionally, don’t move or delete the raw data once it is saved.
  4. If possible, store all the data, raw and processed. Of course, privacy regulations like the European Union (EU)’s General Data Protection Regulation (GDPR) will influence what data can be saved and for how long.
  5. Ensure that transforms are reproducible, deterministic, and idempotent. Each transform must produce the same results each time it is executed given the same input data set, without harmful effects (see the sketch after this list).
  6. Future proof your data pipeline. Version not only the data and the code that performs the analysis, but also the transforms that have been applied to the data.
  7. Ensure that there is adequate separation between the online system and the offline analysis so that the ingest step does not impact user-facing services.
  8. Monitor the data pipeline for consistency across data sets.
  9. Employ Data Governance early, and be proactive. IT’s need for security and compliance means that governance capabilities like data masking, retention, lineage, and role-based permissions are all important aspects of the pipeline.
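
To make point 5 concrete, here is a small sketch of a transform that is deterministic (the same input always yields the same output) and idempotent (applying it twice changes nothing); the column and mapping are invented for illustration:

```python
import pandas as pd

def normalize_currency(df: pd.DataFrame) -> pd.DataFrame:
    """Deterministic and idempotent: re-running it is a harmless no-op."""
    out = df.copy()  # never mutate the raw input
    out["currency"] = out["currency"].str.upper().replace({"US$": "USD"})
    return out

raw = pd.DataFrame({"currency": ["usd", "US$", "eur"]})
once = normalize_currency(raw)
twice = normalize_currency(once)
assert once.equals(twice)  # idempotence check
```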

Know your data, know your customers’ needs, and set up a reproducible process for constructing your data preparation pipeline.

Making Data Integration Easier

Actian DataConnect is a versatile hybrid integration solution. It allows you to connect to virtually any data source, regardless of format or location, using virtually any protocol, and it empowers business users, integration specialists, SaaS admins, and line-of-business owners alike. Users can design and manage integrations and move data quickly while IT maintains corporate governance. Find out how Actian can help with all your data integration, data management, and data storage needs here.

Data Intelligence

7 Lies of Data Catalogs #7: Complex, Not Complicated

Actian Corporation

July 20, 2021


The Data Catalog market has developed rapidly, and it is now deemed essential when deploying a data-driven strategy. Victim of its own success, this market has attracted a number of players from adjacent markets.

 These players have rejigged their marketing positioning in order to present themselves as Data Catalog solutions.

The reality is that, while relatively weak on the data catalog functionalities themselves, these companies attempt to convince, with degrees of success proportional to their marketing budgets, that a Data Catalog is not merely a high-performance search tool for data teams, but an integrated solution likely to address a host of other topics.

The purpose of this blog series is to deconstruct the pitch of these eleventh-hour Data Catalog vendors.

A Data Catalog is Complex…but isn’t Complicated

This closing statement seems to us a fitting summary of all of the above, and it will serve as our conclusion.

We have seen too many Data Catalog initiatives morph into endless data governance projects that try to solve a laundry list of issues, ignoring those easily solved by a Data Catalog. Once you have removed the extra baggage, the deployment of a Data Catalog takes only a few days, rather than months, to produce value.

The services rendered by a Data Catalog are simple. In its leanest form, a Data Catalog presents as a search bar in which any user can type a few keywords (or even pose a question in natural language) and obtain a list of results with the first five elements being the most relevant, thus providing them with all the information they need to use the data (just like a web search engine or an online retailer).

This ease of use is crucial to guarantee adoption by the data teams. On the user front, the Data Catalog should be a simple affair with a clean design. Like any other search or recommendations engine, however, the underlying complexity is substantial.

The good news for the customer is that this complexity is nothing for you to worry about: it’s on us.

Actian Data Intelligence Platform has invested enormously in the structure of the information (building a knowledge graph), in automation, and in the search and recommendations engine. This complexity isn’t visible, but it is what constitutes the value of a Data Catalog.

The obsession for simplicity is at the heart of our values. Each functionality we choose to add to the product has to tick one of the two boxes below:

  • Does this functionality help deploy the catalog faster in the organization?
  • Does this functionality enable the data teams to find the information more quickly in order to get on with their projects?

If the answer to both questions is no, the functionality is discarded.

The result is that you can connect the Actian Data Intelligence Platform to your operational systems, configure and feed your first metamodel, and open the catalog to the end users within a matter of days.

Of course, going forward, you’ll need to complete the metamodel, integrate other sources, etc. But the value creation is instant.

Data Management

It’s Time for Data Historians to Become…History

Actian Corporation

July 17, 2021


Data Historians…History?

Why a modern time-series capable database can simplify yet enhance time-series data analysis.

Despite the professorial image the term suggests, a data historian is not an instructor or researcher, but a purpose-built software solution. And evolutions in how operational data is used and managed have eclipsed the need for data historian software solutions.

What is a Data Historian?

There are many Operational Technology (OT) environments within manufacturing, oil and gas, engineering research, and countless other industries. In these environments, complex equipment, machinery, and networks of sensors and devices generate time-series data. These time-series streams range from sensor data representing pressure, volume and temperature to video streams for machine vision and surveillance.

Initially, these streams were ignored or sampled only at low periodic rates. As time-series streams increased in volume and local data processing incorporated multiple feed reconciliation, OT engineers began to build data collection, aggregation, and minimal processing systems to better handle these time-series data streams. Eventually, these proprietary and bespoke systems were collectively labeled data historians.

The Data Historian Process Gap

The use and users of OT data have both changed much during the past few years. Increasingly, OT data is leveraged by a host of other players within an organization beyond OT professionals. These newer users include developers, business analysts and data scientists supporting the OT, and product and service managers driving the business.

However, no data historian software solution was ever designed for use with a range of external systems or by users who were not OT professionals. Instead, the typical data historian platform was little more than libraries of data collected by and intended only for the use of OT professionals. And they typically built each data historian software solution from the ground up, directly or by proxy through vendors of manufacturing or other specialized equipment. In essence, data historian solutions are libraries built only for the librarians.

In addition, much data historian software was implemented on expensive legacy hardware. Resource constraints and lack of standards meant that functionality was pared down and focused only on the localized and immediate requirements of the OT infrastructure and process at hand. The result is that data historian software solutions are not easily extended for functions such as localized analytics and visualization or sharing data across local systems. It is also difficult or impossible for the typical data historian platform to easily and securely exchange data with modern backend systems for further analytics and visualization.

Technology That Empowers Historical Data to Shape the Future

As with any other part of the business and IT industries, the technology for data management is continuously evolving, with new capabilities emerging every day. Currently, three primary technology shifts are combining to move beyond the capabilities and expected outcomes of data historian software.

Modern Time-Series Databases: Beyond the Data Historian

Outside of the OT domain, the rest of your company data is likely stored in traditional relational databases and data warehouses. Data historian solutions were focused on capturing largely structured data in time-series formats. Today’s data is a vast superset of the data captured by these legacy systems.

Modern time-series databases include traditional time-series data capabilities. However, those modern solutions are designed and optimized for capturing data chronology and ingesting data from unstructured and multi-variate streaming data sources. These can range from Binary Large Objects (BLOBs) and data compliant with the JavaScript Object Notation (JSON) open standard to the latest in Internet of Things (IoT) connectivity.

Ad-Hoc Analysis and Reporting: the Right Data for Everyone

Data historians tend to rely upon NoSQL application programming interfaces (APIs). These store and access data based on so-called “key values,” rather than in the rows and columns of traditional databases. NoSQL APIs are great for data collection and local data management. However, they are not readily accessible for post-collection ad hoc analysis and reporting – particularly by business analysts and data scientists outside the OT domain.

Modern time-series databases provide both a NoSQL API and APIs compliant with the American National Standards Institute (ANSI) Structured Query Language (SQL) standard. The latter feature enables easy extraction of data to support remote ad-hoc analysis, reporting and visualization through widely used business intelligence and reporting tools that rely on standard IT connectivity mechanisms such as Java Database Connectivity (JDBC) and Open Database Connectivity (ODBC).
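
For example, an analyst outside the OT domain could pull aggregates with a few lines of Python over ODBC. The DSN, table, and column names below are placeholders rather than any specific vendor’s schema:

```python
import pyodbc

# Placeholder DSN; any ANSI-SQL-compliant time-series database
# exposed over ODBC would look much the same.
conn = pyodbc.connect("DSN=timeseries_db")

cursor = conn.execute(
    """
    SELECT sensor_id, AVG(pressure) AS avg_pressure
    FROM sensor_readings
    WHERE reading_ts >= ?
    GROUP BY sensor_id
    ORDER BY avg_pressure DESC
    """,
    "2021-07-01 00:00:00",
)
for sensor_id, avg_pressure in cursor.fetchall():
    print(sensor_id, avg_pressure)
```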

Artificial Intelligence (AI): Enabling History to Support Predicting the Future

Traditional data historian solutions can enable operations managers in the field to catch problems with their infrastructure, such as when pressure is too high or a part has failed. But these alerts always come after the fact. The collection and processing speed of the specific data historian solution determines how quickly afterwards, but hindsight is always the default.

AI, powered by modern Machine Learning (ML) capabilities, can deliver alerts that are more insightful. Depending on the combinations of data, past patterns, and the ability to analyze them, AI-driven successors to data historian solutions can even deliver predictive guidance about when a part is likely to fail. Modern, integrated time-series databases can support AI and ML capabilities locally at the point of action within the OT domain by integrating OT with backend IT. The result is that data scientists and engineers can craft AI and ML capabilities for backend IT systems. Developers and front-end OT engineers can then invoke those capabilities in the OT environment. This approach provides a new and modern way of interacting with your company’s data to generate more useful insights and improved outcomes.
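
As a sketch of what the backend step might look like, the snippet below trains a simple scikit-learn classifier on hypothetical rolling-window features derived from sensor streams; the file, feature, and label names are invented:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical training set: rolling-window features computed from
# time-series sensor data, labeled with whether the part failed
# within the following 24 hours.
df = pd.read_parquet("sensor_features.parquet")
X = df[["pressure_mean", "pressure_std", "temp_max", "vibration_rms"]]
y = df["failed_within_24h"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)

# The fitted model can then be invoked near the point of action in the
# OT environment to raise alerts before a failure rather than after it.
print("holdout accuracy:", model.score(X_test, y_test))
```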

Respect the Legacy, But Move Into the Future

Data historian solutions have been crucial to the evolution of OT and the IT industry since the 1980s and earlier, and their contributions should be acknowledged and respected. Their time has passed, however, and modern technology solutions are replacing them. These allow you to better manage the data your company needs today and have faster, more complete, and more accurate information insights for the future.

Actian is the industry leader in operational data warehouse and edge data management solutions for modern businesses. With a complete set of solutions to help you manage data on-premises, in the cloud, and at the edge, including mobile and IoT devices, Actian can help you develop the technical foundation you need to support true business agility. To learn more, visit www.actian.com.

Data Intelligence

7 Lies of Data Catalogs #6: Must Rely on Automation

Actian Corporation

July 9, 2021


The Data Catalog market has developed rapidly, and it is now deemed essential when deploying a data-driven strategy. Victim of its own success, this market has attracted a number of players from adjacent markets.

 These players have rejigged their marketing positioning in order to present themselves as Data Catalog solutions.

The reality is that, while relatively weak on the data catalog functionalities themselves, these companies attempt to convince, with degrees of success proportional to their marketing budgets, that a Data Catalog is not merely a high-performance search tool for data teams, but an integrated solution likely to address a host of other topics.

The purpose of this blog series is to deconstruct the pitch of these eleventh-hour Data Catalog vendors.

A Data Catalog Must Rely on Automation

Some Data Catalog vendors, who hail from the world of cartography, have developed the rhetoric that automation is a secondary topic, which can be addressed at a later stage.

They will tell you that a few manual file imports suffice, along with a generous user community collaborating on their tool to feed and use the catalog. A little arithmetic is enough to understand why this approach is doomed to failure in a data-centric organization.

An active Data Lake, even a modest one, quickly hoovers up, in its different layers, hundreds or even thousands of datasets. To these can be added datasets from other systems (database applications, various APIs, CRMs, ERPs, NoSQL stores, etc.) that we usually want to integrate into the catalog.

The orders of magnitude quickly go beyond thousands, sometimes tens of thousands, of datasets. Each dataset contains dozens of fields. Datasets and fields alone represent several hundred thousand objects (we could also include other assets: ML models, dashboards, reports, etc.). For the catalog to be useful, inventorying those objects isn’t enough.

You also need to attach to them all the properties (metadata) that will enable end users to find, understand, and exploit these assets. There are several types of metadata: technical information, business classification, semantics, security, sensitivity, quality, norms, uses, popularity, contacts, etc. Here again, for each asset, there are dozens of properties.

Back to the arithmetic: overall, we are dealing with millions of attributes that need to be maintained.
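
A quick back-of-the-envelope check of that arithmetic, with deliberately conservative, illustrative figures:

```python
# All figures are illustrative, at the low end of the ranges above.
datasets = 10_000          # datasets across the lake and other systems
fields_per_dataset = 30    # "each dataset contains dozens of fields"
props_per_asset = 20       # "for each asset, there are dozens of properties"

assets = datasets * (1 + fields_per_dataset)   # datasets plus their fields
attributes = assets * props_per_asset
print(f"{assets:,} assets, {attributes:,} attributes to maintain")
# -> 310,000 assets, 6,200,000 attributes: millions, as stated.
```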

Such volumes alone should disqualify any temptation to choose the manual approach. But there is more. The stock of informational assets isn’t static. It is constantly growing. In a data-centric organization, datasets are created daily, others are moved or changed.

The Data Catalog Needs to Reflect These Changes.

Otherwise, its content will be permanently obsolete and end users will reject it. Who is going to trust a Data Catalog that is incomplete and wrong? If you feel that your organization can absorb the load and keep your catalog up to date, that’s wonderful. Otherwise, we would suggest you evaluate as early as possible the level of automation provided by the different solutions you are considering.

What can we Automate in a Data Catalog?

In terms of automation, the most important capability is the inventory.

A Data Catalog should be able to regularly scan all your data sources and automatically update the asset inventory (datasets, structures, and technical metadata at a minimum) to reflect the day-to-day reality of the hosting systems.

Believe us: a Data Catalog that cannot connect to your data sources will quickly become useless, because its content will always be in doubt.
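
For a relational source, such a scan can be as simple as walking the schema catalog on a schedule and diffing the result against the current inventory. A minimal sketch using SQLAlchemy’s inspector, with a placeholder connection string:

```python
from sqlalchemy import create_engine, inspect

# Placeholder connection string for one relational source.
engine = create_engine("postgresql://catalog_reader@warehouse/sales")
inspector = inspect(engine)

inventory = []
for schema in inspector.get_schema_names():
    for table in inspector.get_table_names(schema=schema):
        columns = inspector.get_columns(table, schema=schema)
        inventory.append({
            "dataset": f"{schema}.{table}",
            "fields": [(col["name"], str(col["type"])) for col in columns],
        })
# Re-run on a schedule and diff against the catalog so that new, moved,
# or changed datasets are reflected day to day.
```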

Once the inventory is completed, the next challenge is to automate the metamodel feed.

Here, beyond the technical metadata, complete automation seems a little hard to imagine. It is still possible, however, to significantly reduce the workload needed to maintain the metamodel. The values of certain properties can be determined by simply applying rules when objects are integrated into the catalog.

It is also possible to suggest property values using more or less sophisticated algorithms (semantic analysis, pattern matching, etc.).
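
As a toy illustration of the rule-based and pattern-matching approaches, field names alone can already drive useful suggestions. The patterns and tags below are invented:

```python
import re

# Pattern-matching rules that suggest property values when an asset
# enters the catalog; patterns and tags are illustrative only.
RULES = [
    (re.compile(r"email"), {"semantic_type": "email", "sensitivity": "PII"}),
    (re.compile(r"(^|_)iban|account_no"), {"sensitivity": "banking"}),
    (re.compile(r"(_ts|_date)$"), {"semantic_type": "timestamp"}),
]

def suggest_properties(field_name: str) -> dict:
    suggestions = {}
    for pattern, properties in RULES:
        if pattern.search(field_name.lower()):
            suggestions.update(properties)
    return suggestions

print(suggest_properties("customer_email"))  # email + PII
print(suggest_properties("last_login_ts"))   # timestamp
```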

Lastly, it’s often possible to feed part of the catalog by integrating the systems that produce or contain metadata. This applies, for instance, to quality measurement, lineage information, business ontologies, etc.

For this approach to work, the Data Catalog must be open and offer a complete set of APIs that allow the metadata to be updated from other systems.
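
Such an integration might look like the sketch below: a data-quality tool pushing scores into the catalog over REST. The endpoint, payload shape, and auth scheme are purely illustrative, not a documented API:

```python
import requests

# Hypothetical metadata-update call; endpoint and payload are invented.
payload = {
    "dataset": "sales.orders",
    "properties": {"quality_score": 0.97, "last_profiled": "2021-07-01"},
}
response = requests.post(
    "https://catalog.example.com/api/v1/assets/metadata",
    json=payload,
    headers={"Authorization": "Bearer <token>"},
    timeout=10,
)
response.raise_for_status()
```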

Take Away

A Data Catalog handles millions of pieces of information in a constantly shifting landscape.

Maintaining this information manually is virtually impossible, or extremely costly. Without automation, the content of the catalog will always be in doubt, and the data teams will not use it.

Data Intelligence

7 Lies of Data Catalogs #5: Not a Business Modeling Solution

Actian Corporation

July 9, 2021


The Data Catalog market has developed rapidly, and it is now deemed essential when deploying a data-driven strategy. Victim of its own success, this market has attracted a number of players from adjacent markets.

 These players have rejigged their marketing positioning to present themselves as Data Catalog solutions.

The reality is that, while relatively weak on the data catalog functionalities themselves, these companies attempt to convince, with degrees of success proportional to their marketing budgets, that a Data Catalog is not merely a high-performance search tool for data teams, but an integrated solution likely to address a host of other topics.

The purpose of this blog series is to deconstruct the pitch of these eleventh-hour Data Catalog vendors.

A Data Catalog is NOT a Business Modeling Solution

Some organizations, usually large ones, have invested for years in the modeling of their business processes and information architecture.

They have developed several layers of models (conceptual, logical, physical) and have put in place an organization that helps the maintenance and sharing of these models with specific populations (business experts and IT people mostly).

We do not question the value of these models. They play a key role in urbanization, schema blueprints, and IS management, as well as in regulatory compliance. But we seriously doubt that these modeling tools can provide a decent Data Catalog.

There is also a market phenomenon at play here: certain historical business modeling players are looking to widen the scope of their offer by positioning themselves on the Data Catalog market. After all, they do already manage a great deal of information on physical architecture, business classifications, glossaries, ontologies, information lineage, processes and roles, etc. But we can identify two major flaws in their approach.

The first is organic. By their nature, modeling tools produce top-down models to outline the information in an IS. However accurate it may be, a model remains a model: a simplified representation of reality.

They are very useful communication tools in a variety of domains, but they are not an exact reflection of the day-to-day operational reality which, for us, is crucial to keeping the promises of a Data Catalog (enabling teams to find data, understand it, and know how to use the datasets).

The second flaw? It is not user-friendly.

A modeling tool is complex and handles a large number of abstract concepts, which makes for a steep learning curve. It’s a tool for experts.

We could, of course, consider improving user-friendliness to open it up to a wider audience. But the built-in complexity of the information won’t go away.

Understanding the information provided by these tools requires a solid understanding of modeling principles (object classes, logical levels, nomenclatures, etc.). It is quite a challenge for data teams, and one that seems difficult to justify from an operational perspective.

The truth is, modeling tools that have been turned into Data Catalogs face significant adoption issues with the teams (they have to make huge efforts to learn how to use the tool, only to not find what they are looking for).

A prospective client recently presented us with a metamodel they had built and asked us whether it was possible to implement it in the Actian Data Intelligence Platform. Derived from their business models, the metamodel had several dozen classes of objects and thousands of attributes. To their question, the official answer was yes (the platform metamodel is very flexible). But instead, we tried to dissuade them from taking that path: A metamodel that sophisticated ran the risk, in our opinion, of losing the end users, and turning the Data Catalog project into a failure…

Should we Therefore Abandon Business Models When Putting a Data Catalog in Place? Absolutely Not.

It must, however, be remembered that business models are there to handle some issues, and the Data Catalog others. Some of the information contained within the models helps structure the catalog and enrich its content in a very useful way (for instance responsibilities, classifications, and of course business glossaries).

The best approach, in our view, is therefore to design the catalog metamodel by focusing exclusively on the added value to the data teams (always with the same underlying question: does this information help find, localize, understand, and correctly use the data?), and then to integrate the modeling tool and the Data Catalog in order to automate the supply of certain elements of the metamodel already present in the business model.

Take Away

 As useful and complete as they may be, business models are still just models: they are an imperfect reflection of the operational reality of the systems and therefore they struggle to provide a useful Data Catalog.

Modeling tools, as well as business models, are too complex and too abstract to be adopted by data teams. Our recommendation is that you define the metamodel of your catalog with a view to answering the questions of the data teams and supply some aspects of the metamodel with the business model.

Data Intelligence

7 Lies of Data Catalogs #4: Not a Query Solution

Actian Corporation

July 2, 2021


The Data Catalog market has developed rapidly, and it is now deemed essential when deploying a data-driven strategy. Victim of its own success, this market has attracted a number of players from adjacent markets.

 These players have rejigged their marketing positioning to present themselves as Data Catalog solutions.

The reality is that, while relatively weak on the data catalog functionalities themselves, these companies attempt to convince, with degrees of success proportional to their marketing budgets, that a Data Catalog is not merely a high-performance search tool for data teams, but an integrated solution likely to address a host of other topics.

The purpose of this blog series is to deconstruct the pitch of these eleventh-hour Data Catalog vendors.

A Data Catalog is NOT a Query Solution

Here is another oddity of the Data Catalog market. Several vendors, whose initial aim was to allow users to query several data sources simultaneously, have “pivoted” towards a Data Catalog positioning on the market.

There is a reason for them to pivot.

The emergence of Data Lakes and Big Data has cornered them in a technological cul-de-sac that has weakened the market segment they were initially in.

A Data Lake is typically segmented into several layers. The “raw” layer integrates data without transformation, in formats that are more or less structured and in great quantities. A second layer, which we’ll call “clean,” contains roughly the same data but in normalized formats, after a dust-down. After that, there can be one or several “business” layers ready for use: a data warehouse and visualization tool for analytics, a Spark cluster for data science, a storage system for commercial distribution, etc. Within these layers, data is transformed, aggregated, and optimized for use, along with the tools supporting that use (data visualization tools, notebooks, massive processing, etc.).

In This Landscape, a Universal Self-Service Query Tool isn’t Suitable.

It is of course possible to set up an SQL interpretation layer on top of the “clean” layer (like Hive) but query execution remains a domain for specialists. The volumes of data are huge and rarely indexed.

Allowing users to define their own queries is very risky: on on-prem systems, they run the risk of collapsing the cluster by running a very expensive query, and in the cloud, the bill could run very high indeed. Not to mention security and data sensitivity issues.

As for the “business” layers, they are generally coupled with more specialized solutions (such as a combination of Snowflake and Tableau for analytics) that offer very complete and secured tooling, offering great performance for self-service queries. With their market space shrinking like snow in the sun, some multi-source query vendors have pivoted towards Data Catalogs.

Their pitch is now to convince customers that the ability to execute queries makes their solution the Rolls-Royce of Data Catalogs (in order to justify their six-figure pricing). We would invite you to think twice about it.

Take Away

On a modern data architecture, the capacity to execute queries from a Data Catalog isn’t just unnecessary, it’s also very risky (performance, cost, security, etc.).

Data teams already have their own tools to execute queries on data, and if they haven’t, it may be a good idea to equip them. Integrating data access issues in the deployment of a catalog is the surest way to make it a long, costly, and disappointing project.

Data Intelligence

What is a Data Mesh?

Actian Corporation

June 28, 2021

In this new era of information, new terms are used in organizations working with data: Data Management Platform, Data Quality, Data Lake, Data Warehouse…

Behind each of these words, we find specificities, technical solutions, etc. Let’s decipher them.

Did you say: “Data Mesh”? Don’t be embarrassed if you’re not familiar with the concept. The term wasn’t used until 2019 as a response to the growing number of data sources and the need for business agility.

The Data Mesh model is based on the principle of a decentralized or distributed architecture exploiting a literal mesh of data.

While a Data Lake can be thought of as a storage space for raw data, and the Data Warehouse is designed as a platform for collecting and analyzing heterogeneous data, Data Mesh responds to a different use case.

On paper, a Data Warehouse and Data Mesh have a lot in common, especially when it comes to their main purpose, which is to provide permanent, real-time access to the most up-to-date information possible. But Data Mesh goes further. The freshness of the information is only one element of the system.

Because it is part of a distributed model, Data Mesh is designed to address each business line in your company with the key information that it concerns.

To meet this challenge, Data Mesh is based on the creation of data domains. 

The advantages? More autonomous teams through local data management, a decentralized enterprise able to aggregate more and more data, and, finally, more control over the overall organization of your data assets.

Data Mesh: Between Logic and Organization

If a Data Lake is ultimately a single reservoir for all your data, Data Mesh is the opposite. Forget the monolithic dimension of a Data Lake. Data is a living, evolving asset, a tool for understanding your market and your ecosystem and an instrument of knowledge and understanding. 

Therefore, in order to appropriate the concept of meshing data, you need to think differently about data. How can we do this? By laying the foundations for a multi-domain organization. Each type of data has its own use, its own target, and its own exploitation. From then on, all the business areas of your company will have to base their actions and decisions on the data that is really useful to them to accomplish their missions. The data used by marketing is not the same as the data used by sales or your production teams. 

The implementation of a Data Catalog is therefore the essential prerequisite for the creation of a Data Mesh. Without a clear vision of your data’s governance, it will be difficult to initiate your company’s transformation. Data quality is also a central element. But ultimately, Data Mesh will help you by decentralizing the responsibility for data to the domain level and by delivering high-quality transformed data.

The Challenges

Does adopting Data Mesh seem impossible because the project appears both complex and technical? No cause for panic! Data Mesh, beyond its technicality, its requirements, and the rigor that goes with it, is above all a new paradigm. It must lead all the stakeholders in your organization to think of data as a product built for the business.

In other words, by moving towards a Data Mesh model, the technical infrastructure of the data environment is centralized, while the operational management of the data is decentralized and entrusted to the business.

With Data Mesh, you create the conditions for data acculturation across all your teams, so that each employee can base his or her daily actions on data.

The Data Mesh Paradox

Data Mesh is meant to put data at the service of the business. This means that your teams must be able to access it easily, at any time, and to manipulate the data to make it the basis of their daily activities.

But in order to preserve the quality of your data, or to guarantee compliance with governance rules, change management is crucial and the definition of each person’s prerogatives is decisive. When deploying Data Mesh, you will have to lay a sound foundation in the organization. 

On one hand, free access to data for every employee (what we call functional governance). On the other, management and administration, in other words, technical governance, in the hands of the data teams.

Decompartmentalizing uses by compartmentalizing roles: that is the paradox of Data Mesh.
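
A minimal sketch of this role split, with hypothetical action names, might look like this:

# Functional governance: open data access for every employee.
FUNCTIONAL_ACTIONS = {"search", "read", "annotate", "export"}

# Technical governance: management and administration, reserved for data teams.
TECHNICAL_ACTIONS = {"ingest", "transform", "grant_access", "delete"}

def allowed_actions(role: str) -> set:
    """Return what a role may do under the functional/technical split."""
    if role == "data_team":
        return FUNCTIONAL_ACTIONS | TECHNICAL_ACTIONS
    return FUNCTIONAL_ACTIONS  # business users: free access, no administration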

Data Intelligence

7 Lies of Data Catalogs #3: Not a Compliance Solution

Actian Corporation

June 25, 2021

The Data Catalog market has developed rapidly, and it is now deemed essential when deploying a data-driven strategy. Victim of its own success, this market has attracted a number of players from adjacent markets. These players have rejigged their marketing positioning to present themselves as Data Catalog solutions.

The reality is that, while relatively weak on the data catalog functionalities themselves, these companies attempt to convince, with degrees of success proportional to their marketing budgets, that a Data Catalog is not merely a high-performance search tool for data teams, but an integrated solution likely to address a host of other topics.

The purpose of this blog series is to deconstruct the pitch of these eleventh-hour Data Catalog vendors.

A Data Catalog is NOT a Compliance Solution

As with governance, regulatory compliance is a crucial issue for any data-centric organization.

There is a plethora of data handling regulations spanning all sectors of activity and countries. On the subject of personal data alone, GDPR is mandatory across all EU countries, but each Member State has a lot of wiggle room in how it is implemented, and most have a large arsenal of legislation to complement, reinforce, and adapt it (Germany alone, for instance, has several dozen regulations related to personal data across different sectors of activity).

In the US, there are hundreds of laws and regulations across states and sectors of activity (with varying degrees of adherence). And here we are only referring to personal data... Rules and regulations also exist for financial data, medical data, biometric data, banking data, risk data, insurance data, etc. Put simply, every organization has some regulation it has to comply with.

So What Does Compliance Mean in this Case?

The vast majority of regulatory audits center on the following:

  • The ability to provide complete and up-to-date documentation on the procedures and controls put in place to meet the standards.
  • The ability to prove that the procedures described in the documentation are rolled out in the field.
  • The ability to supervise all the measures deployed with a view towards continuous improvement.

A Data Catalog is neither a procedures library nor an evidence consolidation system, and even less a process supervision solution.

It strikes us as obvious that assigning those responsibilities to a Data Catalog will make it considerably harder to use (regulatory standards are too obscure for most people) and will jeopardize adoption by those most likely to benefit from it (data teams).

Should we Therefore Forget About Data Catalogs in our Quest for Compliance?

No, of course not. Again, in terms of compliance, it would be much wiser to use the Data Catalog to build the data teams' literacy, and to tag the data appropriately, enabling the teams to quickly identify any norm or procedure they need to adhere to before using the data. The Catalog can even help place the tags using a variety of approaches. It can, for example, automatically detect sensitive or personal data.

That said, even with the help of ML, detection will never work perfectly (the notion of "personal data" defined by GDPR, for instance, is much broader and harder to detect than the North American notion of PII). The Catalog's ability to manage these tags is therefore critical.
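
As an illustration of what automated detection can (and cannot) catch, here is a minimal rule-based sketch with hypothetical patterns; it flags obvious identifiers, while GDPR-style personal data often needs context that such rules will miss:

import re

# Hypothetical patterns: they catch obvious identifiers, not the broader
# notion of personal data defined by GDPR.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d .-]{8,}\d"),
}

def suggest_tags(sample_values):
    """Suggest personal-data tags from a sample of column values."""
    tags = set()
    for value in sample_values:
        for label, pattern in PATTERNS.items():
            if pattern.search(str(value)):
                tags.add(f"personal_data:{label}")
    return tags  # a human steward should validate before tags are applied

print(suggest_tags(["alice@example.com", "+33 6 12 34 56 78"]))
# -> suggests both an email tag and a phone tag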

Take Away

Regulatory compliance is above all a matter of documentation and proof and has no place in a Data Catalog.

However, the Data Catalog can help identify (more or less automatically) data that is subject to regulations. The Data Catalog plays a key role in the acculturation of the data teams with respect to the importance of regulations.

Data Intelligence

Data Lakes: The Benefits and Challenges

Actian Corporation

June 24, 2021

Data Lakes are increasingly used by companies for storing their enterprise data. However, storing large quantities of data in a variety of formats can lead to data chaos! Let’s take a look at the pros and cons of Data Lakes.

To understand what a Data Lake is, let’s imagine a reservoir or a water retention basin that runs alongside the road. Regardless of the type of data, its origin, its purpose, everything, absolutely everything, ends up in the Data Lake. Whether that data is raw or refined, cleansed or not, all of this information ends up in this single place where it isn’t modified, filtered, or deleted before being stored.

Sounds a bit messy, doesn’t it? But that’s the whole point of the Data Lake.

It is precisely because it frees data from any preconceived use that a Data Lake offers real added value. How? By allowing data teams to constantly reinvent the use and exploitation of your company's data.

Improving customer experience with a 360° analysis of the customer journey, detecting personas to refine marketing strategies, rapidly integrating new data flows from IoT: the Data Lake is an agile response to highly structured business problems.

Data Lakes: The Undeniable Advantages

The first advantage of a Data Lake is that it allows you to store considerable volumes of protean data. Structured or unstructured, data from NoSQL databases... a Data Lake is, by nature, agnostic to the type of information it contains. It is precisely because it has no strict data exploitation scheme that the Data Lake is a valuable tool. And for good reason: none of the data it contains is ever altered, degraded, or distorted.

This is not the only advantage of a Data Lake. Indeed, since the data is raw, it can be analyzed on an ad-hoc basis.

The objective: to detect trends and generate reports according to business needs without it being a vast project involving another platform or another data repository. 

Thus, the data available in the Data Lake can be easily exploited, in real time, and allows you to place your company in a data-centric approach, so that your decisions, your choices, and your strategies are never disconnected from the reality of your market or your activities.
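
As a minimal sketch (assuming hypothetical raw CSV and JSON files under an example lake path), an ad-hoc analysis can start directly from the raw files, with no intermediate platform:

import json
from pathlib import Path

import pandas as pd

LAKE = Path("/data/lake/raw")  # hypothetical landing zone for raw files

# Raw files are read as-is: nothing was filtered or altered at ingestion.
orders = pd.read_csv(LAKE / "sales" / "orders_2021.csv")
with open(LAKE / "iot" / "events.jsonl") as f:
    events = pd.DataFrame(json.loads(line) for line in f)

# Today's ad-hoc question: how many orders per connected device type?
report = orders.merge(events, on="device_id").groupby("device_type").size()
print(report)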

Nevertheless, the raw data stored in your Data Lake can (and should!) be processed in a specific way as part of larger, more structured projects. Your company's data teams will know that they have, just a click away, unrefined ore that can be put to use for further analysis.

The Challenges of a Data Lake

When you think of a Data Lake, poetic mental images come to mind: crystalline waters rippling in the winds of success that carry you away... but beware! A Data Lake carries the seeds of murky, muddy waters. This receptacle of data demands particular attention, because without rigorous governance, the risk of sinking into data chaos is real.

In order for your Data Lake to reveal its full potential, you must have a clear and standardized vision of your data sources.

Controlling these flows is a first essential safeguard to guarantee the proper exploitation of data that is heterogeneous by nature. You must also be very vigilant about data security and the organization of your data.

The fact that the data in a Data Lake is raw does not mean it should lack the minimum structure needed to at least identify and find the data you want to exploit.

Finally, a Data Lake often requires significant computing power in order to refine masses of raw data in a very short time. This power must be adapted to the volume of data that will be hosted in the Data Lake.

Between method, rigor and organization, a Data Lake is a tool that serves your strategic decisions.

Data Intelligence

7 Lies of Data Catalogs #2: Not a Quality Solution

Actian Corporation

June 21, 2021

The Data Catalog market has developed rapidly, and it is now deemed essential when deploying a data-driven strategy. Victim of its own success, this market has attracted several players from adjacent markets. These players have rejigged their marketing positioning in order to present themselves as Data Catalog solutions.

The reality is that, while relatively weak on the data catalog functionalities themselves, these companies attempt to convince, with degrees of success proportional to their marketing budgets, that a Data Catalog is not merely a high-performance search tool for data teams, but an integrated solution likely to address a host of other topics.

The purpose of this blog series is to deconstruct the pitch of these eleventh-hour Data Catalog vendors.

A Data Catalog is NOT a Data Quality Management (DQM) Solution

We do not underestimate the importance of data quality in successfully delivering a data project, quite the contrary. It just seems absurd to us to put it in the hands of a solution that, by its very nature, cannot run the controls at the right time.

Let us explain: there is a very elementary rule of quality control, a rule that applies in virtually any domain where quality is an issue, be it an industrial production chain, software development, or the cuisine of a 5-star restaurant: the sooner a problem is detected, the less it costs to correct.

To demonstrate the point: a car manufacturer does not wait until a new vehicle is fully built, when all the production costs have been incurred and fixing a defect would cost the most, to test its battery. No. Each part is closely inspected, each step of production is tested, defective parts are removed before ever entering the production circuit, and the entire production chain can be halted if quality issues are detected at any stage. Quality issues are corrected at the earliest possible stage of the production process, where fixes are the least costly and the most durable.

“In a modern data organization, data production rests on the same principles. We are dealing with an assembly chain whose aim is to deliver uses with high added value. Quality control and correction must happen at each step. The nature and level of controls will depend on what the data is used for.”

If you are handling data, you obviously have at your disposal pipelines to feed your uses. These pipelines can involve dozens of steps – data acquisition, data cleaning, various transformations, mixing various data sources, etc.

To develop these pipelines, you probably have a number of technologies at play, anything from in-house scripts to costly ETLs and exotic middleware tools. It is within those pipelines that you need to insert and pilot your quality controls, as early as possible, adapting them to what is at stake with the end product. Measuring data quality levels only at the end of the chain isn't just absurd, it's totally inefficient.
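
A minimal sketch of the principle, with hypothetical steps and data: each stage of the pipeline validates its output before the next stage runs, so defects are caught where they are cheapest to fix:

def acquire():
    # Hypothetical source extraction; one record is already defective.
    return [{"id": 1, "amount": "42.0"}, {"id": 2, "amount": None}]

def check_completeness(records):
    missing = [r for r in records if r["amount"] is None]
    if missing:
        raise ValueError(f"{len(missing)} record(s) missing 'amount'; halting early")

def clean(records):
    return [{**r, "amount": float(r["amount"])} for r in records]

def check_ranges(records):
    if any(r["amount"] < 0 for r in records):
        raise ValueError("negative amounts found after cleaning")

# Each control runs right after the step it guards, never only at the end.
records = acquire()
check_completeness(records)  # fails here, before any cleaning cost is incurred
records = clean(records)
check_ranges(records)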

It is therefore difficult to see how a Data Catalog (whose purpose is to inventory and document all potentially usable datasets in order to facilitate data discovery and usage) can be a useful tool to measure and manage quality.

A Data Catalog operates on available datasets, on any system that contains data, and should be as minimally invasive as possible in order to be deployed quickly throughout the organization.

A DQM solution works on the data feeds (the pipelines), focuses on production data, and is, by design, intrusive and time-consuming to deploy. We cannot think of any software architecture that can tackle both issues without compromising the quality of either one.

Data Catalog vendors promising to solve your data quality issues are, in our opinion, in a bind and it seems unlikely they can go beyond a “salesy” demo.

As for DQM vendors (who also often sell ETLs), their solutions are often too complex and costly to deploy as credible Data Catalogs.

The good news is that the orthogonal nature of data quality and data cataloging makes it easy for specialized solutions in each domain to coexist without encroaching on each other's turf.

Indeed, while a Data Catalog isn't purposed for quality control, it can exploit the information on the quality of the datasets it contains, which provides many benefits.

The Data Catalog can use this metadata, for example, to share the information (and any alerts it may identify) with data consumers. It can also use it to adjust its search and recommendation engine and thus steer users towards higher-quality datasets.

And both solutions can be integrated at little cost with a couple of APIs here and there.
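
For instance, here is a minimal sketch of such an integration (the endpoint and field names are hypothetical, not any specific product's API): the DQM tool pushes its latest results, and the catalog attaches them to the dataset's metadata:

import requests

CATALOG_API = "https://catalog.example.com/api/v1"  # hypothetical endpoint

def push_quality_results(dataset_id: str, score: float, failed_checks: int):
    """Attach the latest DQM run results to a cataloged dataset."""
    response = requests.patch(
        f"{CATALOG_API}/datasets/{dataset_id}/metadata",
        json={"quality_score": score, "failed_checks": failed_checks},
        timeout=10,
    )
    response.raise_for_status()

# Called by the DQM tool at the end of each control run.
push_quality_results("sales.orders", score=0.97, failed_checks=1)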

Take Away

Data quality needs to be assessed as early as possible in the data pipelines.

The role of the Data Catalog is not to perform quality control but to share the results of these controls as widely as possible. By their nature, Data Catalogs are bad DQM solutions, and DQM solutions are mediocre and overly complex Data Catalogs.

An integration between a DQM solution and a Data Catalog is very straightforward and is the most pragmatic approach.

Events

Hybrid Data Conference Recap and Highlights

Traci Curran

June 17, 2021

That’s a Wrap!

Wow! What a wonderful time we had at the 2021 Hybrid Data Conference! Over two days, we showcased amazing demos, customer stories and technology advancements across the Actian portfolio. For those in attendance, we hope you enjoyed the event and the opportunity to see a few of the ways Actian is innovating and enabling our customers to gain greater value from their data at a fraction of the time and cost of other cloud data platforms.

For those who missed the event, here’s a quick recap of some of our most popular sessions.

Some of Our Favorite Sessions from the 2021 Hybrid Data Conference

Delivering on the Vision – Actian Hybrid Data Platform, presented by Emma McGrattan, Actian VP of Engineering

Emma McGrattan, Actian's VP of Engineering, gave an in-depth overview of how Actian products are delivering on the vision of hybrid cloud. Highlighting the Actian Data Platform, Emma showcased how Actian's product portfolio is accelerating cloud adoption and changing the way customers advance along their cloud journey. Whether you're looking to make the shift right away or to modernize while preserving investments in critical applications, this session is a great overview of the many options and use cases that support your unique path to the cloud.

Actian on Google Cloud, Presented by Lak Lakshmanan, Google’s Director of Analytics

This brief 15-minute session, presented by Lak Lakshmanan, Google's Director of Analytics and AI Solutions, is a great introduction to why Actian has chosen Google as our preferred cloud. We all love a better-together story, and Lak provides a glimpse from the cloud provider's perspective.

Of course, no conference would be complete without perspectives from our customers. Actian would like to thank all of the customers and partners that made the 2021 Hybrid Data Conference a success.

Actian Customer Panel Featuring Key Customer Speakers from Sabre, Finastra, and Goldstar Software

One Final Highlight

We were delighted to have Greg Williams, Editor-in-Chief of WIRED, deliver his thoughts on why data-driven insights are no longer optional in today's modern world. Greg summarized it best in his presentation: every company is a data company.

Please visit the on-demand conference to hear more of his outstanding commentary on the future of data and how companies are creating advantage in a global economy.

Once again, we want to thank everyone that attended this year’s Hybrid Data Conference. We hope you found the networking and content valuable and we can’t wait to see you in 2022 – hopefully in person! Stay safe, and enjoy your summer!

About Traci Curran

Traci Curran is Director of Product Marketing at Actian, focusing on the Actian Data Platform. With 20+ years in tech marketing, Traci has led launches at startups and established enterprises like CloudBolt Software. She specializes in communicating how digital transformation and cloud technologies drive competitive advantage. Traci's articles on the Actian blog demonstrate how to leverage the Data Platform for agile innovation. Explore her posts to accelerate your data initiatives.