Data Intelligence

DataOps: Data Catalogs Enable Better Data Discovery in a Big Data Project

Actian Corporation

May 6, 2020


In today’s world, Big Data environments are increasingly complex and difficult to manage. We believe that Big Data architectures should, among other things:

  • Retrieve information on a wide spectrum of data.
  • Use advanced analytics techniques such as statistical algorithms, machine learning, and artificial intelligence.
  • Enable the development of data-oriented applications such as a recommendation system on a website.

To put a successful Big Data architecture in place, enterprise data is stored in a centralized data lake intended to serve various purposes. However, the massive, continuous influx of diverse data from different sources can turn a data lake into a data swamp. So, as business functions increasingly work with data, how can we help them find their way?

For your Big Data to be exploited to its full potential, your data must be well documented.

Data documentation is key here. However, documenting attributes such as a data set's business name, description, owner, tags, and level of confidentiality can be an extremely time-consuming task, especially with millions of data points available in your lake!

With a DataOps approach, an agile framework focused on improving communication, integration, and automation of data flows between data managers and data consumers across an organization, enterprises can carry out their projects incrementally. Supported by a data catalog solution, they can easily map and leverage their data assets in an agile, collaborative, and intelligent way.

How Does a Data Catalog Support a DataOps Approach in Your Big Data Project?

Let’s go back to the basics…what is a data catalog?

A data catalog automatically captures and updates technical and operational metadata from an enterprise's data sources and stores it in a single source of truth. Its purpose is to democratize data understanding: to allow your collaborators to find the data they need via one easy-to-use platform that sits above the data systems. No technical expertise is required to discover what is new and seize opportunities.
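
To make the idea concrete, here is a minimal, hypothetical sketch in Python of what automated metadata capture can look like, assuming a SQLite source for simplicity. The function name and record layout are illustrative assumptions, not the Actian platform's API.

```python
# Illustrative sketch: harvest technical metadata from one SQLite source
# into catalog-style records. Business metadata (owner, tags, confidentiality)
# is layered on later by data stewards.
import sqlite3
from datetime import datetime, timezone

def harvest_metadata(db_path: str) -> list[dict]:
    """Collect table and column metadata from a single data source."""
    entries = []
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    for (table,) in cur.fetchall():
        cols = cur.execute(f"PRAGMA table_info({table})").fetchall()
        entries.append({
            "source": db_path,
            "dataset": table,
            "columns": [{"name": c[1], "type": c[2]} for c in cols],
            "harvested_at": datetime.now(timezone.utc).isoformat(),
            "owner": None,   # filled in by a data steward
            "tags": [],      # filled in by a data steward
        })
    conn.close()
    return entries
```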

Effective Data Lake Documentation for Your Big Data

Think of Legos. Legos can be built into anything you want, but at their core, they are still just a set of bricks. These blocks can be shaped to any need, desire, or resource.

In your quest to facilitate your data lake journey, it is important to create effective documentation through the following:

  • Customizable layouts.
  • Interactive components.
  • A set of pre-created templates.

By offering modular templates, a data catalog lets Data Stewards simply and efficiently configure documentation templates according to their business users' data lake search queries.

Monitor Big Data With Automated Capabilities

Through an innovative architecture and connectors, data catalogs can connect to your Big Data sources so the IT department can monitor the data lake. They are able to map new incoming datasets, be notified of any deleted or modified datasets, or even report errors to the referring contacts.

Users are able to access up-to-date information in real time.

These automated capabilities notify users when new datasets appear, when datasets are deleted or modified, when errors occur, when datasets were last updated, and so on.
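
As an illustration of how this kind of monitoring can work, here is a minimal Python sketch that diffs two catalog snapshots and emits notifications. The function names, snapshot layout, and notification channel are all illustrative assumptions.

```python
# Illustrative sketch: detect new, deleted, and modified datasets by comparing
# two catalog snapshots, then notify a referring contact.
def diff_snapshots(previous: dict, current: dict) -> list[str]:
    """Each snapshot maps dataset name -> schema/statistics fingerprint."""
    events = []
    for name in current.keys() - previous.keys():
        events.append(f"NEW dataset detected: {name}")
    for name in previous.keys() - current.keys():
        events.append(f"DELETED dataset: {name}")
    for name in current.keys() & previous.keys():
        if current[name] != previous[name]:
            events.append(f"MODIFIED dataset: {name}")
    return events

def notify(events: list[str], contact: str = "data-steward@example.com") -> None:
    for event in events:
        print(f"notify {contact}: {event}")  # stand-in for an email or webhook

notify(diff_snapshots(
    previous={"sales_2019": "v1", "clickstream": "v3"},
    current={"sales_2019": "v1", "clickstream": "v4", "sales_2020": "v1"},
))
```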

Support Big Data Documentation With Augmented Capabilities

Intelligent data catalogs are essential for data documentation. They rely on artificial intelligence and machine learning techniques, one being “fingerprinting” technology. This feature offers the data users responsible for a particular data set suggestions for its documentation. These recommendations can, for example, suggest the tags, contacts, or business terms of other data sets based on (see the sketch after this list):

  • Analysis of the data itself (statistical analysis).
  • Schema similarity with other data sets.
  • Links to other data sets’ fields.
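
To illustrate just the schema-similarity part, here is a hypothetical Python sketch that suggests tags from the most similar already-documented data set. The similarity threshold and all names are assumptions; real fingerprinting also analyzes the data values themselves and the links between fields.

```python
# Illustrative sketch: suggest tags for a new data set based on how closely
# its column names resemble those of already-documented data sets.
def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def suggest_tags(new_columns: set[str], documented: dict[str, dict]) -> list[str]:
    best_name, best_score = None, 0.0
    for name, meta in documented.items():
        score = jaccard(new_columns, meta["columns"])
        if score > best_score:
            best_name, best_score = name, score
    # Only suggest when the schemas overlap substantially (threshold is arbitrary).
    return documented[best_name]["tags"] if best_score >= 0.5 else []

documented = {
    "customers_eu": {"columns": {"customer_id", "email", "country"},
                     "tags": ["customer", "PII"]},
}
print(suggest_tags({"customer_id", "email", "region"}, documented))
# -> ['customer', 'PII']
```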

An intelligent data catalog also detects personal or private data in any given data set and reports it in its interface. This feature helps enterprises comply with the GDPR requirements that came into force in May 2018, and alerts potential users to a data set’s sensitivity level.
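
As a heavily simplified sketch of what such detection can look like, the snippet below flags columns whose values match simple regex heuristics. The patterns and threshold are assumptions for illustration; production detection is far more sophisticated.

```python
# Illustrative sketch: flag columns that look like personal data so the
# catalog can surface a sensitivity warning.
import re

PII_PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$"),
    "phone": re.compile(r"^\+?[\d\s().-]{7,}$"),
}

def detect_personal_data(column_values: list[str]) -> set[str]:
    hits = set()
    for label, pattern in PII_PATTERNS.items():
        matches = sum(1 for v in column_values if pattern.match(v))
        # Flag the column if most values match the pattern (threshold is arbitrary).
        if column_values and matches / len(column_values) > 0.8:
            hits.add(label)
    return hits

print(detect_personal_data(["ana@example.com", "bob@example.org"]))  # {'email'}
```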

Enrich Your Big Data Documentation With a Data Catalog

Enrich your data’s documentation with the Actian Data Intelligence Platform. Our metadata management platform was designed for Data Stewards and centralizes all data knowledge in a single, easy-to-use interface.

Whether metadata is automatically imported, generated, or added by an administrator, data stewards are able to efficiently document their data directly within our data catalog. Give meaning to your data with metadata.


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Integration

HIE Rule Changes and How it Impacts Healthcare Companies

Traci Curran

May 5, 2020


Health Information Exchanges (HIEs) play a critical role in enabling healthcare companies to share data by getting clinical information to where it is needed. HIEs gather electronic medical information or EMR (Electronic Medical Records) from hospitals, doctors, and other healthcare providers across a region so the data can be shared securely. This enables the industry to deliver better patient care, improve personal and public health, and lower overall costs.

The COVID-19 pandemic has caused significant changes to the way healthcare information is shared. The Centers for Medicare & Medicaid Services (CMS) announced a new rule that requires hospitals to notify doctors if their patients are undergoing coronavirus treatment. Health Information Exchanges play a vital orchestration role in these notifications, so to make compliance easier, many HIEs have relaxed their confidentiality rules to remove the patient consent requirement and expand data sharing to include epidemiology, surveillance, and related efforts. These changes mean hospitals and other providers must update their processes and IT systems. The effort required to do this varies based on where the provider is located, the services provided by the HIE, and the company’s strategy for data integration.

Impact to Organizations That are Part of a Health Information Exchange

Many states and regions have Health Information Exchanges in place to orchestrate the data-sharing process. Providers make their data available to the exchange, which then handles distributing it to each of the consuming organizations. If your company takes part in an HIE, there are two main impacts the recent changes will have on you.

  1. Change processes and system rules to follow the HIE rules about patient consent. You likely have a control in place to prevent data transmission if the patient has not supplied explicit consent. This control may need to be removed; a sketch of treating it as a configurable policy follows this list.
  2. Increased incoming data volume. With more providers sending more data to the HIEs, consuming organizations are likely to see more data coming into their IT systems to support patient care. Real-time data integration is critical to avoid delays in supplying information to practitioners.
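
As a rough illustration of the first point, here is a hypothetical Python sketch of a consent gate expressed as a configurable policy rather than a hard-coded block. Every name is invented for illustration; this is not DataConnect, HL7, or any HIE's actual interface.

```python
# Illustrative sketch: the consent requirement becomes a policy switch,
# so it can be turned off where an HIE has relaxed the rule.
from dataclasses import dataclass

@dataclass
class ExchangePolicy:
    require_patient_consent: bool = True   # previous default behavior
    allow_surveillance_use: bool = False   # epidemiology/surveillance sharing

def may_transmit(record: dict, policy: ExchangePolicy) -> bool:
    if policy.require_patient_consent and not record.get("consent"):
        return False
    if record.get("purpose") == "surveillance":
        return policy.allow_surveillance_use
    return True

relaxed = ExchangePolicy(require_patient_consent=False, allow_surveillance_use=True)
print(may_transmit({"consent": False, "purpose": "treatment"}, relaxed))  # True
```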

Impact to Organizations That are not Part of a Health Information Exchange

Healthcare companies must follow the CMS rule on integration and patient access regardless of whether they take part in an HIE. If you don’t have an HIE available or have direct relationships with partner organizations outside of the HIE, your company is responsible for ensuring COVID-19-related data is distributed to the correct external parties in a safe, secure, and consistent way. That is likely going to mean deploying new data integrations and updating your current ones.

Depending on your company’s integration strategy, how many organizations you need to integrate with, and what investments you have made in the past, the new CMS rule may require significant IT effort. The Actian DataConnect platform can help make this process easier by providing your IT staff with a robust set of tools for designing, deploying, and managing integrations across your company’s partner ecosystem. With HIPAA- and HL7-compliant solutions designed for the healthcare industry, Actian DataConnect can help you deliver more integrations, faster.

Why Healthcare Companies Need a Data Integration Platform

Pandemic situations like COVID-19 cause tremendous strain on the healthcare industry. The number of patients the healthcare system can handle at one time is often determined by how efficiently it exchanges patient information. Healthcare practitioners do not know what data is available to them unless it is pushed to their screens. Health Information Exchanges play an essential role in this process, but the last mile depends on your internal IT staff. Actian DataConnect gives them the tools they need to manage your data integrations effectively.

Actian DataConnect provides a robust data integration platform that can enable hospitals and other healthcare companies to manage their integrations with HIEs and other partners in a scalable, safe, and controlled way. Aggregate data from all your internal data sources and ensure that your partners have a complete, correct, and up-to-date set of information about the patient so they can coordinate care. Connect with the HIE, governmental organizations like the CDC and FDA, research institutions, and industry partners to give your practitioners access to the best information available in the industry to aid in diagnosis and treatment. To learn more, visit DataConnect.


About Traci Curran

Traci Curran is Director of Product Marketing at Actian, focusing on the Actian Data Platform. With 20+ years in tech marketing, Traci has led launches at startups and established enterprises like CloudBolt Software. She specializes in communicating how digital transformation and cloud technologies drive competitive advantage. Traci's articles on the Actian blog demonstrate how to leverage the Data Platform for agile innovation. Explore her posts to accelerate your data initiatives.
Data Management

SQLite – the Banana Slug of Embedded Databases

Actian Corporation

April 29, 2020


Earlier this month, I kicked off a series of blogs about SQLite. The first looked at the advantages that SQLite offers over both flat files and the heavier lift of an enterprise-grade SQL database – and it does offer distinct advantages over both. To a point. And that point was five years ago.

Here’s the thing: If you’re a mobile or IoT developer, or if you’re extending out from the cloud to the edge with distributed applications and data, local embedded data management is a critical capability, and that’s where SQLite stood out for years. But while local and embedded data management is necessary – indeed critical – for modern edge data management systems, the way it is implemented in SQLite is insufficient. Modern edge data management demands an ability to process and analyze data locally, to share it peer-to-peer, and to move data between gateways, other intelligent machines, and even back into the cloud – and SQLite was never built to meet those demands.

Putting aside the challenges posed by the shared and distributed data requirements – we’ll touch on those in the next installment – let’s just examine SQLite’s limitations in the area of local data processing, starting with one of the most important, performance.

SQLite is just plain slow.

Eighteen months ago, we ran performance tests of Actian Zen – our Zero-DBA Embedded Nano-footprint database – against the latest SQLite distribution and found Zen to be faster by two orders of magnitude, depending on the operation being run. Okay, for indexed deletes, it was three orders of magnitude faster. We made an apples-to-apples comparison, running both on a Raspberry Pi 3, a small ARM-based single-board computer that you can buy from Amazon for under $50. Zen Core and SQLite are both free, and you can run this test for yourself.
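
If you want to time the SQLite side of such a comparison yourself, a rough sketch like the one below (Python's built-in sqlite3 module, arbitrary row counts) is enough to get ballpark numbers on a Pi. It is illustrative only and omits the Actian Zen side of the test.

```python
# Illustrative sketch: time a bulk insert and an indexed delete against SQLite.
import sqlite3
import time

def timed(label, fn):
    start = time.perf_counter()
    fn()
    print(f"{label}: {time.perf_counter() - start:.3f}s")

conn = sqlite3.connect("bench.db")
conn.executescript(
    "DROP TABLE IF EXISTS readings;"
    "CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL);"
    "CREATE INDEX idx_value ON readings (value);")
rows = [(i, i * 0.1) for i in range(100_000)]

# Bulk insert, then a delete that goes through the secondary index.
timed("insert 100k rows", lambda: (conn.executemany(
    "INSERT INTO readings VALUES (?, ?)", rows), conn.commit()))
timed("indexed delete", lambda: (conn.execute(
    "DELETE FROM readings WHERE value > ?", (5_000.0,)), conn.commit()))
conn.close()
```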

Okay, so SQLite’s pokiness is not really news. Everyone in the SQLite community knows that SQLite is painfully slow. So why has it remained so popular? Practically speaking, it has largely been the only game in town. There are three reasons for this. First, it’s an open-source offering that’s been around for over two decades, so it is widely known. Second, it’s bundled with a lot of open-source developer kits, most notably Android. Finally, many database vendors at one time or another have packaged and rebranded SQLite as their “mobile” edition (read: MongoDB and Couchbase). There are even a few startups that literally slap their label on SQLite and sell services around it as their only market offering.

Here’s where we get back to those limitations: The sluggish performance that was good enough five years ago is simply not going to be good enough for the next five years (let alone any period after that). And the performance problems in SQLite are only going to get worse: The data management tasks at the Edge are going to grow more challenging over time, even for embedded applications.

Consider the increased need for local data persistence. It’s not just for simple caching anymore. Now data persistence is needed for computationally intensive local data processing and unsupervised machine learning. In these scenarios, we see a firehose of inbound streaming data that SQLite is just not robust enough to handle. Moreover, we see an increase in compute demands involving query, extraction, and analysis of existing patterns from the local database and/or those of external peers or upstream gateways.

And it’s not just an issue of the volume of streaming data or the sophistication of the analytics taking place. There’s also the issue of multiple applications using the same set of data simultaneously – or even a single upstream consumer subscribing to and copying data from multiple downstream publishers. Both cases require a level of concurrency that is architecturally out of scope for SQLite. At best, it can be said that SQLite simulates concurrency with a lock on the entire database file that shuts out every user other than the one currently reading or writing – even if that read or write involves only a single row – instead of the granular, row-level locking an enterprise-grade SQL database provides. The end result is that SQLite creates serious bottlenecks as data demands and volumes increase in practical IoT and mobile use cases.
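
A minimal demonstration of that coarse-grained locking, using Python's built-in sqlite3 module: while one connection holds a write transaction, a second writer is refused even though it targets a different row.

```python
# Illustrative sketch: SQLite's write lock covers the whole database,
# so two writers cannot touch different rows at the same time.
import sqlite3

setup = sqlite3.connect("demo.db")
setup.executescript(
    "CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, v TEXT);"
    "INSERT OR IGNORE INTO t VALUES (1, 'a'), (2, 'b');")
setup.close()

writer1 = sqlite3.connect("demo.db", isolation_level=None)  # manage transactions explicitly
writer1.execute("BEGIN IMMEDIATE")                # take the write lock now
writer1.execute("UPDATE t SET v = 'x' WHERE id = 1")

writer2 = sqlite3.connect("demo.db", timeout=0)   # fail fast instead of waiting
try:
    writer2.execute("UPDATE t SET v = 'y' WHERE id = 2")  # different row, same lock
except sqlite3.OperationalError as err:
    print(err)  # "database is locked" -- the lock covers the whole database
finally:
    writer1.execute("COMMIT")
    writer1.close()
    writer2.close()
```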

Ratcheting up the horsepower of the compute platform behind SQLite will improve the performance of this aging campaigner only so much. What modern edge data management needs is a cheetah, and what we have in SQLite is a banana slug.

And none of these scenarios are scenes from some distant future. Enterprises are encountering these challenges today in situations involving IoT grids with thousands of sensors and upstream gateways. Some of our competitors are seeing these challenges too, as more and more are looking to jettison SQLite as their mobile engine.

If you’ve tested SQLite against other databases or documented a change in the performance of your application when you moved from using flat files to SQLite, let us know. Feel free to drop me a line at lewis.carr@actian.com.

In the meantime, please read my next blog on what the near-ubiquitous use of SQLite as a mobile database has meant for data sharing and movement from mobile and IoT at the edge to the cloud and data center.

Finally, if you’re ready to reconsider SQLite, learn more about Actian Zen. Or, you can just kick the tires for free with Zen Core, which is royalty-free for development and distribution.


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Platform

RTDM in Times of Market Uncertainty, and What is it Anyway?

Actian Corporation

April 29, 2020


I recently started a new blog series on why Real-Time Decision-Making is required during periods of market uncertainty. In this next blog, let’s level set on exactly what Real-Time Decision-Making is and why it matters so much more during periods of market uncertainty and business disruption. Naturally, what got me thinking about all of this is COVID-19 and the likelihood that peaks and valleys of this pandemic will generate intermittent business disruptions, possibly through spring 2021. And even if this episode mercifully does simply go away, it is unlikely to be the last coronavirus we will see in our lifetimes.

If it weren’t a pandemic, it would be floods, political instability, wars, earthquakes – you name it – that create unexpected business challenges for organizations large and small. Upon further reflection, I realized my thinking around business-as-usual should be adjusted as well; after all, mergers and acquisitions, significant regulatory changes, and even labor disputes and strikes can create similarly devastating impacts.

With this broader set of disruptors as a backdrop, I believe a simple definition of Real-Time Decision-Making is in order. Real-Time Decision-Making is the ability to deliver decision support within the shortest possible time frame, using the best possible set of data and decision-making models, to direct and report on business operations and interactions. Some qualifiers need to be applied, though; essentially, everything is case by case, with the key factors being (an abstract sketch follows this list):

  • Time Frame: The time constraints of your business, organizational, and operational processes.
  • Best Possible Set of Data: Comprehensive, reliable, fresh.
  • Decision-Making Model: Built by and for decision support across your organization, with results that are clear, prescriptive, actionable, and extensible to new players on your virtual team.
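
As an abstract illustration of how those three factors fit together (and nothing more than that), the Python sketch below bounds a decision step by a business-defined time budget, feeds it the best data available, and returns a prescriptive action. Every name and value in it is invented for illustration.

```python
# Illustrative sketch: a decision step with a time budget, a data snapshot,
# and a pluggable decision model that returns a prescriptive action.
import time
from typing import Callable

def decide(collect: Callable[[], dict], model: Callable[[dict], str],
           time_budget_s: float) -> str:
    deadline = time.monotonic() + time_budget_s
    snapshot = collect()                      # best possible data available now
    if time.monotonic() > deadline:
        return "ESCALATE: data arrived too late for this decision window"
    return model(snapshot)                    # clear, prescriptive output

action = decide(
    collect=lambda: {"open_orders": 120, "supplier_delay_days": 9},
    model=lambda d: "REROUTE to backup supplier" if d["supplier_delay_days"] > 7
                    else "HOLD current plan",
    time_budget_s=0.5,
)
print(action)
```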

It is worth noting what Real-Time Decision-Making (RTDM) is not. Most notably, it is not the actual business process being executed, although data extracted from those processes, and metadata about them, is part of the collection and aggregation that feeds the analysis behind RTDM. Instead, based on the definition above, think of it as the closed feedback loop in a control system. It is also not the individual data collection events, data queries, or other “steps” within an intelligence-gathering or reporting process. Think about a query on any search engine: the engine aggregates and maps all the data the query runs against long before the query is made.

Defining RTDM narrowly against the real-time part of the name misses the forest for the trees and locks us into a more technical definition of real-time execution. Not that speed isn’t important – it is – but it should be bounded by the time constraints of the business as implemented by IT, and not the other way around. Further, RTDM should not be equated with real-time analytics. While digital transformation and the rise of mobile, IoT, web, and social media streams have made streaming data commonplace, there is still a place for batch processing and analytics, as they will often satisfy time constraints from a business perspective.

Well, that’s my definition. Why does it matter more in times of market uncertainty and business disruption? First, let’s establish a baseline: RTDM always matters, and the more a company has evolved along its digital transformation path, the more RTDM is a part of its normal business operations. However, during periods of market uncertainty, the speed of business is compressed. Compounding this difficulty, several course corrections are often needed – sometimes in parallel – to accurately respond to changing market and business conditions. RTDM serves as the feedback loop, and its capabilities and performance must rise to the occasion, optimizing what are often negatively correlated parameters: speed, accuracy, and cost.

There are three key traits that define world-class RTDM in support of strategic capabilities:

Complete Common Operational Picture (COP): Some of you may recognize this as a term used in net-centric operations, but it really applies to almost any scenario where your situational awareness is a function of how complete your dataset is and its ability to paint a picture understood by all parties in your virtual team. The more complete your COP, the higher your situational awareness or, in non-militaristic terms, your situational IQ.

Data-Guided: I could’ve used the term data-driven, but it often means that the decision is made solely on the data, with no weight given to prior experience or to external parties that bring in their own point of view (often without an ability to see, let alone fully vet, their data). The completeness of the COP, and how easily it is visualized by each role in the context of the operational task, will determine the extent to which data-guided equates to data-driven and how high your situational IQ is.

Prescriptive and Executable: RTDM generates decision support, with speed and accuracy, in a format that works for the specific business process or for reviews of one or more such processes. Additionally, this support must be actionable: the intelligence informs humans or machines which path they should take, which operation they should execute, or how to combine it with other intelligence to do the same.

All three of these characteristics are strained by market turmoil. The complete COP for normal business is incomplete when faced with the need for a new supply chain partner, a significant segment of your workforce unable to show up to work, and so forth. There will be a need to incorporate data from external sources and new entries into your virtual team – new business partners, government, customer feedback, and so forth – to help fill gaps in your COP, hence the point above on data-guided instead of data-driven. And finally, how do you take these new relationships and missing pieces to the data puzzle and deliver your situational intelligence into your now not-so-normal business process, adjusting your execution to respond better to uncertainty and disruption?

Underpinning your RTDM with the Actian Real-Time Connected Data Warehouse will determine whether you can build up that situational IQ and turn it into executable recommendations at the operational level. More on that in the next blog.


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.