Showing posts from November, 2014

Performance of a popular docking code

This post is a quick note on the performance of a commercial docking code, as measured across the entire DUD-E dataset. For around 100 protein targets, the code is supposed to rank active compounds higher than decoy compounds; separate poppy seeds from sand, if you like. Let me start by saying that I'm pretty impressed with what I've seen. Starting this side-project, I assumed that any docking code could be expected to have an AUC of around 0.6-0.7 measured on a standard benchmarking set (such as DUD-E). I think that's largely true of free codes such as AutoDock Vina or rDock. But here we're looking at a piece of code from a major commercial vendor, and it performs beyond my expectations. I'm not disclosing the name of the code, since there may have been something in the academic license that prevents such benchmarking studies from being published (a "gag order" effectively, in case somebody is measuring the code "incorrectly"). AUC is estima
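To make the AUC figures above concrete, here is a minimal sketch of how a ROC AUC can be computed for one target from docking scores. It is not necessarily how the numbers in this post were estimated. The function name `roc_auc` and the example scores are made up for illustration, and the sketch assumes that a higher score means a better-ranked compound (negate the scores first if your docking code reports lower-is-better energies).

```python
def roc_auc(active_scores, decoy_scores):
    """Probability that a random active outranks a random decoy.

    Ties count as half a win. Assumes higher score = better rank.
    """
    if not active_scores or not decoy_scores:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


# Hypothetical scores for a single target, for illustration only.
actives = [9.1, 7.4, 6.0]
decoys = [8.0, 5.5, 5.0, 4.2]
auc = roc_auc(actives, decoys)
print("AUC = %.3f" % auc)
```

An AUC of 0.5 means the ranking is no better than random, and 1.0 means every active is ranked above every decoy. The pairwise loop is quadratic in the number of compounds, which is fine for a sketch but slow for large decoy sets.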

SDF to Excel file, in an automated fashion

Sharing SDF files between chemists is often a pain. The format is supposed to be vanilla and super-standard, but it sometimes still gives everyone involved a headache, especially when moving SDF files between two chemistry codes, and especially if hydrogens are involved... For this reason, and because some people ONLY work with Excel files, it's good to be able to automatically convert an SDF file to an Excel file (especially xlsx). With pandas and RDKit, it's possible to make such moves easily. Example below. Pandas uses the xlsxwriter module to support the Excel format. There is no easy way to pass image objects, embedded in the pandas.DataFrame, down to xlsxwriter. The writer itself supports the insert_image functionality, which takes a filename as its argument. The easiest way is to make pandas detect that a cell contains a string ending with .png and use 'insert_image'; see the hack below: And here you go: molecule_data.xlsx has a beautiful col
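The idea of spotting cells that hold a .png filename and routing them to insert_image can be sketched as follows. This is not the pandas hook described above: it bypasses pandas and drives xlsxwriter directly, which keeps the example short. The helper names (`is_image_cell`, `records_to_table`, `sdf_to_records`, `write_xlsx`), the "Structure" column, and the image and cell sizes are all assumptions made for illustration. RDKit and xlsxwriter are imported inside the functions that need them.

```python
import os


def is_image_cell(value):
    """Return True when a cell value looks like a path to a PNG file."""
    return isinstance(value, str) and value.lower().endswith(".png")


def records_to_table(records, image_col="Structure"):
    """Turn a list of dicts into (header, rows), image column first."""
    other = sorted({key for rec in records for key in rec} - {image_col})
    header = [image_col] + other
    rows = [[rec.get(key, "") for key in header] for rec in records]
    return header, rows


def sdf_to_records(sdf_path, image_dir, image_col="Structure"):
    """Read an SDF file with RDKit and draw one PNG per molecule."""
    from rdkit import Chem
    from rdkit.Chem import Draw

    os.makedirs(image_dir, exist_ok=True)
    records = []
    for i, mol in enumerate(Chem.SDMolSupplier(sdf_path)):
        if mol is None:  # RDKit yields None for records it cannot parse
            continue
        png_path = os.path.join(image_dir, "mol_%d.png" % i)
        Draw.MolToFile(mol, png_path, size=(200, 200))
        record = {name: mol.GetProp(name) for name in mol.GetPropNames()}
        record[image_col] = png_path
        records.append(record)
    return records


def write_xlsx(header, rows, xlsx_path, row_height=150, col_width=28):
    """Write the table, inserting an image wherever a cell names a PNG."""
    import xlsxwriter

    # SDF properties arrive as strings; let xlsxwriter store numeric ones as numbers.
    workbook = xlsxwriter.Workbook(xlsx_path, {"strings_to_numbers": True})
    worksheet = workbook.add_worksheet()
    worksheet.write_row(0, 0, header)
    for r, row in enumerate(rows, start=1):
        for c, value in enumerate(row):
            if is_image_cell(value):
                worksheet.set_row(r, row_height)       # height in points
                worksheet.set_column(c, c, col_width)  # width in characters
                worksheet.insert_image(r, c, value)
            else:
                worksheet.write(r, c, value)
    workbook.close()
```

A typical call would be `write_xlsx(*records_to_table(sdf_to_records("molecules.sdf", "images")), "molecule_data.xlsx")`, where `molecules.sdf` and `images` are placeholder names. The PNG files must still exist on disk when the workbook is closed, because xlsxwriter reads them at that point.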