Mining manufacturing data for discovery of high productivity process characteristics

Salim Charaniya, Huong Le, Huzefa Rangwala, Keri Mills, Kevin Johnson, George Karypis, and Wei-Shou Hu
Journal of Biotechnology, Vol. 147, pp. 186—197, 2010
Download Paper
Modern manufacturing facilities for bioproducts are highly automated with advanced process monitoring and data archiving systems. The time dynamics of hundreds of process parameters and outcome variables over a large number of production runs are archived in the data warehouse. This vast amount of data is a vital resource to comprehend the complex characteristics of bioprocesses and enhance production robustness. Cell culture process data from 108 ‘trains’ comprising production as well as inoculum bioreactors from Genentech’s manufacturing facility were investigated. Each run constitutes over one-hundred on-line and off-line temporal parameters. A kernel-based approach combined with a maximum margin based support vector regression algorithm was used to integrate all the process parameters and develop predictive models for a key cell culture performance parameter. The model was also used to identify and rank process parameters according to their relevance in predicting process outcome. Evaluation of cell culture stage-specific models indicates that production performance can be reliably predicted days prior to harvest. Strong associations between several temporal parameters at various manufacturing stages and final process outcome were uncovered. This model-based data mining represents an important step forward in establishing a process data-driven knowledge discovery in bioprocesses. Implementation of this methodology on the manufacturing floor can facilitate a real-time decision making process and thereby improve the robustness of large scale bioprocesses.
Research topics: Bioinformatics | Data mining