Automated Data Matching Solution for Darnytsia

Accurate Data Matching Solution to Accelerate Sales Analytics for Darnytsia

Location:

Ukraine

Industry:

Pharmaceuticals

Employees:

1,000+

About the Customer:

Pharmaceutical Company “Darnytsia” is Ukraine’s largest producer of medicinal products, with long-standing pharmaceutical traditions. With 90+ years of history, Darnytsia has led the domestic pharmaceutical market by volume for more than 20 years, as confirmed by the company’s year-over-year industrial and market performance indicators and by authoritative industry ratings. Darnytsia is making significant progress in its digital transformation and has successfully implemented market-leading technologies.

Executive Summary

Goal: Design a solution to match data tables from different sources and bring the data into a single format for further sales and market analytics, accelerating data processing and reducing the risk of human error.

Solution: Infopulse data scientists implemented a hybrid data matching algorithm with above 96% matching accuracy for drug names and above 85% accuracy for pharmacy names and addresses.

Benefits: Automated data matching with zero human error risk, faster data analytics and decision-making, and streamlined data processing with reduced labor and cost.

Services delivered: Innovation Services, Intelligent Business, Intelligent Automation, Smart Insights.

Business Challenge

Darnytsia and Infopulse have a long history of collaboration, having implemented numerous digital transformation projects together. After successfully implementing a GenAI data analytics bot and a sales prediction solution in cooperation with Infopulse, the company requested our assistance in addressing another challenge related to data used for sales and market analytics:

  • Matching the drug information between the company’s internal database and external tables received from pharmacies collaborating with Darnytsia
  • Matching the pharmacies’ names and addresses between a table from the internal database and external sources

Initially, data matching was done manually, an approach that is time-consuming and prone to human errors that may distort the results. The pharma giant was not satisfied with a ready-made solution it had previously tested and asked Infopulse to develop a custom matching algorithm.

Solution & Business Value

Together with Darnytsia, Infopulse developed a hybrid matching algorithm that matches records between data tables (internal and external) with a total error of less than 5%.

The business value of this solution for Darnytsia includes:

  • Complete automation of the data matching process, requiring 20x less employee time
  • Reduced time and cost for the related operations: matching tens of thousands of records takes less than 10 minutes
  • 85-96% matching accuracy
  • Zero human error risk, as the process is automated and involves no manual work
  • Faster data analytics, decision-making, and insights generation

The project implementation took two weeks.

Technical Details

The client provided us with two Excel data tables containing drug information (300-400 records each) and three tables containing pharmacy data (up to 10,000 records each). The data was inconsistent and stored in different formats, which complicated the task. The goal was to design a solution that would find matches between records, taking into account disparate formats and typos.

Using these initial datasets, we searched for suitable matching metrics and tested various approaches and available algorithms. As a result, Infopulse created a hybrid matching algorithm that looks for similarities between two data strings and calculates the total similarity score.

The following metrics were used:

  • Ratcliff similarity
  • Levenshtein distance
  • TF-IDF (term frequency-inverse document frequency)
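
As an illustration only, here is a minimal sketch of how these three metrics could be computed and blended into one similarity score using just the Python standard library. The function names, the equal weights, the character-trigram tokenization for TF-IDF, and the sample drug names are all assumptions made for this example; they are not the client's data or the production algorithm. Libraries such as RapidFuzz and scikit-learn provide optimized implementations of Levenshtein distance and TF-IDF.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting noise matters less."""
    return " ".join(text.lower().split())


def ratcliff_similarity(a: str, b: str) -> float:
    """Ratcliff/Obershelp-style similarity in [0, 1], as computed by difflib."""
    return SequenceMatcher(None, a, b).ratio()


def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Rescale the edit distance to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein_distance(a, b) / longest


def char_ngrams(text: str, n: int = 3) -> list:
    """Character n-grams of the normalized, space-padded text."""
    padded = f" {normalize(text)} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_idf(corpus: list) -> dict:
    """Smoothed inverse document frequency for every n-gram in the corpus."""
    doc_freq = Counter()
    for text in corpus:
        doc_freq.update(set(char_ngrams(text)))
    total = len(corpus)
    return {g: math.log((1 + total) / (1 + df)) + 1.0 for g, df in doc_freq.items()}


def tfidf_vector(text: str, idf: dict) -> dict:
    """Sparse TF-IDF vector; n-grams unseen in the corpus get zero weight."""
    return {g: c * idf.get(g, 0.0) for g, c in Counter(char_ngrams(text)).items()}


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return 0.0 if norm_u == 0 or norm_v == 0 else dot / (norm_u * norm_v)


def hybrid_score(a: str, b: str, idf: dict, weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted average of the three metrics (the weights are hypothetical)."""
    a, b = normalize(a), normalize(b)
    scores = (
        ratcliff_similarity(a, b),
        levenshtein_similarity(a, b),
        cosine(tfidf_vector(a, idf), tfidf_vector(b, idf)),
    )
    return sum(w * s for w, s in zip(weights, scores))


# Invented sample values, for illustration only.
reference_names = [
    "Paracetamol 500 mg tablets",
    "Ibuprofen 200 mg capsules",
    "Loratadine 10 mg tablets",
]
idf = build_idf(reference_names)
query = "Paracetamol 500mg tabl."
ranked = sorted(reference_names, key=lambda name: hybrid_score(query, name, idf), reverse=True)
```

In this sketch, `ranked[0]` is the reference name most similar to the misspelled query. Blending several metrics is one way to compensate for their individual weaknesses: edit-based metrics react to typos, while TF-IDF downweights fragments that appear in almost every record.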

As initially agreed with Darnytsia, the solution was delivered as Python code that our client can run when needed.

Matching algorithm tests showed the following results:

  • 96% accuracy for matching two drug tables of 300-400 records each
  • Above 85% accuracy for matching three pharmacy tables (names and addresses matched) of 10,000 records each

The matching algorithm runs in under one minute for drug tables and under 10 minutes for pharmacy tables. The solution is not tied to a specific data scale and can work with any data volume.
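
To illustrate how records from an external table might be matched against an internal one, here is a minimal sketch. It assumes a single similarity metric (difflib's `SequenceMatcher`) instead of the full hybrid score, a hypothetical acceptance threshold of 0.8, and invented pharmacy records; the function names are likewise made up for this example.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before comparing."""
    return " ".join(text.lower().split())


def best_match(query: str, candidates: list, threshold: float = 0.8):
    """Return (candidate, score) for the most similar candidate,
    or (None, score) when the best score is below the threshold."""
    best_candidate, best_score = None, 0.0
    normalized_query = normalize(query)
    for candidate in candidates:
        score = SequenceMatcher(None, normalized_query, normalize(candidate)).ratio()
        if score > best_score:
            best_candidate, best_score = candidate, score
    if best_score < threshold:
        return None, best_score
    return best_candidate, best_score


def match_tables(external: list, internal: list, threshold: float = 0.8) -> dict:
    """Map every external record to its best internal match (or None)."""
    return {record: best_match(record, internal, threshold) for record in external}


# Invented sample records, for illustration only.
internal = [
    "Pharmacy No. 5, 12 Central Street",
    "Green Cross Pharmacy, 3 Park Avenue",
]
external = [
    "pharmacy no 5, 12 central str.",
    "Hardware Store, 99 Industrial Road",
]
matches = match_tables(external, internal)
```

Records that fall below the threshold are returned as unmatched rather than forced onto the nearest candidate. This sketch compares each external record with every internal one, so its runtime grows with the product of the two table sizes.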

Technologies

  • Python
  • Difflib
  • SequenceMatcher
  • RapidFuzz
  • Sklearn
