Selection of projects (3)

An introduction to data mining in epidemiology as applied to an ongoing study of treatment choice for newly diagnosed type 2 diabetics.

The project was conducted with the Worldwide Epidemiology project team of GSK to explore the potential value of data mining tools in epidemiologic research. Further objectives were to train the Epidemiology department in data mining strategies and to demonstrate S-Plus as a data mining tool by applying the methods to a specific project evaluating treatments for diabetes.

Advanced multivariate methods in QSAR and ecotoxicology.

The toxicity of large groups of chemical substances is usually assessed with quantitative structure-activity relationships (QSAR). For this purpose, the compounds are represented by chemical descriptors such as electronic interaction or the Taft steric substituent constant. The feature space of descriptors and the measured ecotoxicology of the training sets are then investigated with statistical methods. We compared the potential of different multivariate approaches, including neural nets, CART, cluster analysis and various linear models, and were thereby able to select from over a hundred descriptors the most relevant parameters (6), which describe the ecotoxicology of more than 1000 chemical compounds.
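
To make the idea of descriptor selection concrete, the following is a minimal sketch only. It is not the method used in the project (which compared neural nets, CART, cluster analysis and linear models); it swaps in a much simpler univariate filter that ranks descriptors by their absolute Pearson correlation with the measured response. The data are synthetic, and all names (`make_synthetic`, `rank_descriptors`, the choice of informative columns) are hypothetical and chosen for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information
    return cov / (var_x * var_y) ** 0.5


def rank_descriptors(X, y, k):
    """Return the indices of the k descriptors with the largest |r| against y."""
    n_descriptors = len(X[0])
    scores = []
    for j in range(n_descriptors):
        column = [row[j] for row in X]
        scores.append((abs(pearson(column, y)), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def make_synthetic(n_compounds=200, n_descriptors=20,
                   informative=(2, 7, 11), seed=0):
    """Synthetic stand-in for a training set (assumption, not real data).

    Each row is one compound, each column one descriptor. The response
    depends only on the 'informative' columns plus a little noise.
    """
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_descriptors)]
         for _ in range(n_compounds)]
    y = [sum(row[j] for j in informative) + rng.gauss(0, 0.1) for row in X]
    return X, y


X, y = make_synthetic()
selected = rank_descriptors(X, y, k=3)
print(sorted(selected))
```

A filter like this looks at one descriptor at a time, so it cannot detect interactions or redundancy between descriptors; that limitation is one reason to compare it against multivariate approaches such as those named above.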