How We Used AI/ML To Ease Debt Risk Management For An Acclaimed Cancer Center

Customer:
A Prominent Cancer Center in the US

Industry:
Healthcare

Services:
CI/CD | Cloud | Containers

Technologies:
CI/CD | Containers | Cloud

Tools:
GoCD | Docker | Rancher

Platform:
Amazon Web Services | Rancher | Cattle Orchestrator

Enabling Early Debt Risk Identification

Our team used AI/ML techniques along with DevOps principles to help a prominent US-based cancer center quickly identify patients with high debt risk. This enabled the center to intervene early and offer payment plans that help it minimize bad debt.

Solution

The solution we offered implements a “training module” and a “prediction module” that run every week with new sets of data. Training is the phase where new data is ingested into an ML model. The prediction module later uses the trained model to identify and forecast debt risk cases among new patients. The process also involves getting data from third-party sources like credit institutions.

01. Gathering Data from Vendors: To feed the training model, we need data from different sources. These sources are trusted vendors who share data via subscriptions to a pull-based system. The collected data is stored as Big Data.

02. Training the Model: The Big Data from the previous step is unclassified. To train the model, we need to classify it and feed it to the training model.

03. Generating Predictions: The same ML model is applied to generate predictions based on the trained data. These predictions are also handed over to the application.

04. Failure Scenarios: Each step can take hours or even days, and is prone to failure due to network issues, broken data, and downtime of different components.

We broke the entire problem down into three distinct phases.

Automating: The first phase was to set up automation for getting data from third-party vendors. Each vendor provides data through a different mechanism. Some vendors run cron jobs and push data onto a shared machine in the cloud. Others offer API endpoints that pipelines can pull from. The data is stored in different databases, including Hadoop containers. All of this runs within GoCD as pipeline stages that cover every endpoint needed. The containers run sequentially, each one depending on the success of the previous step. We used Rancher with Cattle to facilitate this flow.
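The two delivery mechanisms described above, vendor-pushed files and vendor API endpoints, can be put behind one common interface so that the rest of the pipeline does not care how a vendor delivers data. The following is a minimal sketch under stated assumptions, not the project's actual code: the class names, the JSON Lines file format, and the `client` callable are all hypothetical.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Protocol

Record = Dict[str, object]


class VendorSource(Protocol):
    """Anything that has a name and can return a batch of records."""
    name: str

    def fetch(self) -> List[Record]: ...


@dataclass
class DropFolderSource:
    """Vendor pushes files into a shared folder (assumed here: JSON Lines)."""
    name: str
    folder: Path

    def fetch(self) -> List[Record]:
        records: List[Record] = []
        # Read files in name order so repeated runs see the same sequence.
        for path in sorted(self.folder.glob("*.jsonl")):
            with path.open() as handle:
                records.extend(json.loads(line) for line in handle if line.strip())
        return records


@dataclass
class ApiSource:
    """Vendor exposes an endpoint; `client` is any callable returning records."""
    name: str
    client: Callable[[], Iterable[Record]]

    def fetch(self) -> List[Record]:
        return list(self.client())


def gather(sources: Iterable[VendorSource]) -> Dict[str, List[Record]]:
    """Collect one batch per vendor, keyed by vendor name."""
    return {source.name: source.fetch() for source in sources}
```

Keeping the HTTP call behind a plain callable means each source can be exercised in isolation, for example with a stub in place of a real vendor endpoint.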

Modeling: In the second phase, the training model is applied to these different data sets, which equips it with enough data to run its predictions. This runs as a separate stack within Rancher.

Predicting: Finally, the third phase generates the predictions. It depends on new registrations in the patient data. As this data is highly confidential, we employed vaults and encryption of the attributes, which Rancher facilitates by default.

The GoCD pipeline is triggered every week and works through its stages in sequence to fetch a new set of data. Each job runs asynchronously in its own container, so the failure of one job doesn’t impede the next. Failures are reported to the respective Slack channels, and jobs can be re-triggered once the problem is identified.
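The failure-isolation idea in this paragraph can be illustrated in a few lines. This is only a sketch with hypothetical function names: it uses a thread pool in place of containers, and it builds alert messages as plain strings rather than calling Slack, so it shows the pattern and not the GoCD or Rancher setup itself.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Job = Callable[[], object]


def run_jobs(jobs: Dict[str, Job]) -> Tuple[Dict[str, object], Dict[str, str]]:
    """Run every job independently and return (results, failures) by job name."""
    results: Dict[str, object] = {}
    failures: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        futures = {name: pool.submit(job) for name, job in jobs.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:
                # Record the failure and keep going: other jobs are unaffected.
                failures[name] = f"{type(exc).__name__}: {exc}"
    return results, failures


def format_alerts(failures: Dict[str, str]) -> List[str]:
    """Build one message per failed job, e.g. to post to a chat channel."""
    return [
        f"[weekly-run] job '{name}' failed: {reason}"
        for name, reason in sorted(failures.items())
    ]


def retry_failed(jobs: Dict[str, Job], failures: Dict[str, str]):
    """Re-trigger only the jobs that failed, once the cause has been fixed."""
    return run_jobs({name: jobs[name] for name in failures})
```

Because failures are collected per job rather than raised, a broken vendor feed shows up as one alert while the remaining jobs still deliver their data.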

At the time of implementing this project, Rancher was a robust orchestrator with an intuitive way of scheduling containers and defining a flow based on simple “docker-compose” files. Rancher’s scheduler, host-agnostic scaling capabilities, and visible logging of each container service gave us an easy way to run the “train” and “prediction” models.

The workloads, including Rancher, the databases, and GoCD, were hosted on AWS and continuously monitored using CloudWatch so that we could scale them accordingly. A comprehensive notification system alerted us to all changes.

With asynchronous data gathering via pipelines, we gained the opportunity to track multiple vendors along with any data inconsistencies and address them individually.

Previously, it was a challenge to have the models and algorithms correctly versioned. Applying DevOps principles brought everything under control and facilitated higher value creation.

Containerizing every component—from pipeline data scraping to launching the train and predict programs—created isolated steps which helped us debug the system as a whole.

Versioned ML Models

Previously, the ML models had been maintained as Jupyter Notebooks saved in folders, which, over time, resulted in multiple unorganized folders.

By applying software engineering principles, we were able to find ways to version control the models, and educate data scientists and engineers about the need for doing so. Version control allowed us to keep track of how an ML model evolved right from day one, while also simplifying collaboration over particular ML models.
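The source does not describe the versioning mechanism itself, so the following is purely an illustrative sketch of one way to track how a model evolves. It records a content hash of each trained artifact together with the code revision and training batch that produced it. Every name here (`ModelRegistry`, `code_revision`, `trained_on`) is hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ModelVersion:
    model_name: str
    version: int
    artifact_sha256: str  # fingerprint of the serialized model
    code_revision: str    # assumed: an ID from the source control system
    trained_on: str       # assumed: a label for the training data batch


class ModelRegistry:
    """Keeps an ordered version history per model name, in memory."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[ModelVersion]] = {}

    def register(self, model_name: str, artifact: bytes,
                 code_revision: str, trained_on: str) -> ModelVersion:
        digest = hashlib.sha256(artifact).hexdigest()
        history = self._versions.setdefault(model_name, [])
        if history and history[-1].artifact_sha256 == digest:
            return history[-1]  # identical artifact: no new version
        entry = ModelVersion(model_name, len(history) + 1, digest,
                             code_revision, trained_on)
        history.append(entry)
        return entry

    def latest(self, model_name: str) -> ModelVersion:
        return self._versions[model_name][-1]

    def to_json(self) -> str:
        """Serialize the full history, e.g. to commit alongside the code."""
        return json.dumps(
            {name: [asdict(v) for v in versions]
             for name, versions in self._versions.items()},
            indent=2, sort_keys=True,
        )
```

Storing the hash next to the code revision makes it possible to tell which code and which data batch produced the model behind a given week's predictions.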
