Applying a text mining framework to the extraction of numerical parameters from scientific literature in the biotechnology domain

General information

Authors:
André SANTOS, Regina NOGUEIRA, Anália LOURENÇO

DOI:
10.14201/ADCAIJ20121118

Volume:
Regular Issue 1 (1), 2012

Keywords:

Text mining; Biotechnology applications; Procedure optimization

Scientific publications are the main vehicle for disseminating information in the field of biotechnology for wastewater treatment. New research paradigms and the application of high-throughput technologies have considerably increased the rate of publication. As a consequence, manual curation becomes harder, more error-prone and more time-consuming, which leads to a probable loss of information and inefficient knowledge acquisition. Research outputs therefore rarely reach engineers, hampering the calibration of the mathematical models used to optimize the stability and performance of biotechnological systems. In this context, we have developed a data curation workflow, based on text mining techniques, to extract numerical parameters from scientific literature, and applied it to the biotechnology domain. The workflow processes wastewater-related articles with the main goal of identifying the physico-chemical parameters mentioned in the text. This work describes the implementation of the workflow, identifies achievements and current limitations in the overall process, and presents the results obtained for a corpus of 50 full-text documents.
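The abstract does not say how parameters are recognized in the text. Purely as an illustration of the general idea of pairing a parameter mention with a nearby number and unit, the following Python sketch assumes a simple dictionary-and-pattern approach. The parameter vocabulary (`PARAMETER_ALIASES`), the unit list (`UNIT_PATTERN`), the 30-character search window and the function `extract_parameters` are all hypothetical and are not taken from the authors' workflow.

```python
import re
from typing import List, NamedTuple


class Mention(NamedTuple):
    parameter: str  # canonical parameter name
    value: float
    unit: str  # empty string when no unit follows the number


# Hypothetical vocabulary: canonical name -> regex of surface forms.
PARAMETER_ALIASES = {
    "pH": r"pH",
    "temperature": r"temperature",
    "hydraulic retention time": r"hydraulic retention time|HRT",
    "solids retention time": r"solids retention time|SRT",
}

# Hypothetical unit list; longer alternatives come first so "hours" wins over "h".
UNIT_PATTERN = r"°C|mg/L|hours?|days?|h|d|%"


def build_pattern(alias: str) -> re.Pattern:
    # A surface form, then up to 30 characters containing no digits, dots or
    # semicolons, then a number and an optional unit not followed by a letter.
    return re.compile(
        rf"\b(?:{alias})\b[^0-9.;]{{0,30}}?"
        rf"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>(?:{UNIT_PATTERN})(?![A-Za-z]))?",
        re.IGNORECASE,
    )


PATTERNS = {name: build_pattern(alias) for name, alias in PARAMETER_ALIASES.items()}


def extract_parameters(text: str) -> List[Mention]:
    """Return every (parameter, value, unit) triple the patterns find in text."""
    mentions = []
    for name, pattern in PATTERNS.items():
        # finditer yields non-overlapping matches, so "hydraulic retention
        # time (HRT) of 12 h" is reported once, not once per alias.
        for match in pattern.finditer(text):
            mentions.append(
                Mention(name, float(match.group("value")), match.group("unit") or "")
            )
    return mentions
```

For a sentence such as "The reactor was operated at a temperature of 25 °C and pH 7.5, with a hydraulic retention time (HRT) of 12 h and an SRT of 20 days.", this sketch returns one `Mention` per parameter. A realistic curation pipeline would also need sentence splitting, value ranges (e.g. "20–25 °C"), unit normalization and a much larger, domain-curated vocabulary.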

