DataOps

In any AI or ML problem, the data challenge is usually, if not always, at least as hard as the analytics themselves.

Total Runtime: 14:38
May 18, 2019

NEXT Talks Abstract

In any AI/ML challenge, the data are usually harder to handle than the analytics. Data are messy, unlabeled, hard to access, and often incomplete.

The data pipelines we build are the NT Concepts framework for taking care of this problem so that model development can actually happen.

So what is DataOps? We want our data to be discoverable, accessible, and intelligible. This means data needs to be easy to find and easy for analysts and data scientists to use, and the schemas and visuals we make need to make sense and be useful to them.

This whole process is about preparing data sets. In a data set, we need:

Data that contain enough information to solve whatever problem is at hand
Data that are complete and accurate
Buy-in upfront for work to be done this way every time
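
As a rough illustration of checking the "complete" requirement, here is a minimal Python sketch that reports what fraction of records carry a usable value for each field we care about. The function name `completeness_report`, the sample records, and the rule that `None` and empty strings count as missing are all assumptions made for this example, not part of the NT Concepts framework.

```python
from typing import Any, Iterable, Mapping


def completeness_report(
    records: Iterable[Mapping[str, Any]], required_fields: list[str]
) -> dict[str, float]:
    """Return the fraction of records with a usable value for each field."""
    rows = list(records)
    if not rows:
        return {field: 0.0 for field in required_fields}
    report = {}
    for field in required_fields:
        # Assumption: None and the empty string both count as missing.
        present = sum(1 for row in rows if row.get(field) not in (None, ""))
        report[field] = present / len(rows)
    return report


# Hypothetical sample data, for illustration only.
sample = [
    {"id": 1, "label": "cat", "source": "sensor_a"},
    {"id": 2, "label": None, "source": "sensor_b"},
    {"id": 3, "label": "dog"},
    {"id": 4, "label": "", "source": "sensor_a"},
]
report = completeness_report(sample, ["id", "label", "source"])
```

A report like this only measures completeness. Accuracy still needs a comparison against a trusted reference, which depends on the problem at hand.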

We all know that no data comes like that. So we have a process to get it into an acceptable form where it can be used in the ways that we need. To do this, we first have to explore our data sets, which means asking:

What data do we actually care about? (You don’t want to take everything.)
Is there a schema that we can use? Or are we dealing with something that is unstructured and needs to have some sort of structure added to it?
How will this data be accessed and how often?
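
One way to start answering the schema question is to scan a sample of records and note which fields appear and which value types each one holds. The sketch below is a hedged example in Python; `infer_schema` and the sample records are hypothetical names and data chosen for illustration.

```python
from collections import defaultdict
from typing import Any, Iterable, Mapping


def infer_schema(records: Iterable[Mapping[str, Any]]) -> dict[str, set[str]]:
    """Map each field name to the set of Python type names observed for it."""
    schema: dict[str, set[str]] = defaultdict(set)
    for record in records:
        for field, value in record.items():
            schema[field].add(type(value).__name__)
    return dict(schema)


# Hypothetical semi-structured records with inconsistent types.
records = [
    {"id": 1, "ts": "2019-05-18", "value": 3.2},
    {"id": "2", "ts": "2019-05-19"},
    {"id": 3, "ts": None, "value": 4},
]
schema = infer_schema(records)
```

Fields that show more than one type, such as `id` here, are usually the first places where some structure has to be imposed before processing.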

Once we answer those questions, we can move on to the processing step, where we transform our data. We want to identify the end state and then figure out what processing is needed to get from point A to point B.

A lot of times this is where you have to start thinking about scale. What kind of processing do you need, and how often will new data need to be [X]? QA is also crucial: you want to make sure your output looks the way it is supposed to look.
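
To make the QA step concrete, here is a minimal sketch in Python that transforms a raw record toward a declared end state and then checks the output against it. The schema in `EXPECTED_SCHEMA`, the `transform` rules, and the `qa_errors` helper are assumptions invented for this example; a real pipeline would define its own end state.

```python
from typing import Any, Mapping

# Hypothetical end state: field name -> required Python type.
EXPECTED_SCHEMA = {"id": int, "label": str, "score": float}


def transform(raw: Mapping[str, Any]) -> dict[str, Any]:
    """Move a raw record from point A (strings) to point B (typed fields)."""
    return {
        "id": int(raw["id"]),
        "label": str(raw["label"]).strip().lower(),
        "score": float(raw["score"]),
    }


def qa_errors(row: Mapping[str, Any], expected: Mapping[str, type]) -> list[str]:
    """List every way a row deviates from the expected schema."""
    errors = []
    for field, expected_type in expected.items():
        if field not in row:
            errors.append(f"missing field: {field}")
        elif not isinstance(row[field], expected_type):
            got = type(row[field]).__name__
            errors.append(f"{field}: expected {expected_type.__name__}, got {got}")
    for field in row:
        if field not in expected:
            errors.append(f"unexpected field: {field}")
    return errors


good = transform({"id": "7", "label": " Cat ", "score": "0.9"})
good_errors = qa_errors(good, EXPECTED_SCHEMA)
bad_errors = qa_errors({"id": "7", "label": "cat"}, EXPECTED_SCHEMA)
```

Running a check like this on every batch, rather than once, is what lets it scale with the volume of new data.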

Lastly, you want to get your data ready for production. You should be thinking about this the whole time, because it is what keeps you from taking on extra technical debt. The questions to ask are:

How should data be stored based on schema and access needs?
How do we monitor access, and what logins do we need?
And how are we going to make this data accessible and useful to others, whether through an API, a dashboard, or an SDK?

These are all methods you can use to make your data easy to use.
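
As a small illustration of pairing access with monitoring, the Python sketch below wraps a data set in an accessor that logs every read using the standard `logging` module. The class `LoggedDataset`, its `get` method, and the in-memory storage are hypothetical; a production system would sit behind a real API, dashboard, or SDK with proper authentication.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger("dataset_access")


class LoggedDataset:
    """Hypothetical SDK-style accessor that records who read what."""

    def __init__(self, name: str, rows: dict[int, dict[str, Any]]) -> None:
        self._name = name
        self._rows = rows
        self.access_count = 0

    def get(self, user: str, row_id: int) -> Optional[dict[str, Any]]:
        # Log every read so access can be monitored and audited later.
        self.access_count += 1
        logger.info("user=%s dataset=%s row_id=%s", user, self._name, row_id)
        return self._rows.get(row_id)


dataset = LoggedDataset("demo", {1: {"label": "cat"}})
found = dataset.get("analyst", 1)
missing = dataset.get("analyst", 2)
```

Routing all reads through one accessor is a design choice: it gives a single place to add logging, permissions, or rate limits without touching the stored data.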
