shntnu
6/1/2015 - 7:57 PM

Creating DMSO-normalized per-cell data using data from PMID 24045582

Setup:

$ pip freeze
MySQL-python==1.2.5
numpy==1.9.2
progressbar==2.3
scipy==0.15.1
verlib==0.1
$ python --version
Python 2.7.9
$ git clone git@github.com:CellProfiler/CellProfiler-Analyst.git
$ cd CellProfiler-Analyst/
$ git rev-parse HEAD
1e16c195bfbc98c9c14f017370fa38c47a15eca1
$ export PYTHONPATH=<path-to-CellProfiler-Analyst-directory>

Upload the database:

$ unzip Data_S2_database.zip
$ unzip Data_S3_reproduce.zip
$ cd database/
$ . load_supplement_tables.sh . | mysql -h <hostname> -u <username> -p<password> <databasename>

Create a view in the database that attaches each image's mechanism of action (MOA) from the ground-truth table:

CREATE VIEW supplement_Image_withmoa
AS
  SELECT TableNumber,
         ImageNumber,
         Image_Metadata_Plate_DAPI,
         Image_Metadata_Well_DAPI,
         Image_Metadata_Compound,
         Image_Metadata_Concentration,
         moa AS Image_Metadata_MOA
  FROM   supplement_Image AS image
         JOIN `supplement_GroundTruth` AS ground
           ON image.`Image_Metadata_Compound` = ground.`compound`
              AND image.`Image_Metadata_Concentration` = ground.`concentration`;
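Because the view uses an inner JOIN, an image appears in supplement_Image_withmoa only if its (compound, concentration) pair has a row in supplement_GroundTruth. The Python sketch below illustrates this with sqlite3 and a few made-up rows (cmpd_A, moa_X and the values are hypothetical); it drops the MySQL backquotes and the plate and well columns, but the join condition is the same. One consequence: the DMSO filters used later only find DMSO images if the ground-truth table has a matching DMSO row.

```python
import sqlite3

# Toy stand-ins for the real tables; every row here is hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE supplement_Image (TableNumber INTEGER, ImageNumber INTEGER, "
            "Image_Metadata_Compound TEXT, Image_Metadata_Concentration REAL)")
cur.execute("CREATE TABLE supplement_GroundTruth (compound TEXT, concentration REAL, moa TEXT)")
cur.executemany("INSERT INTO supplement_Image VALUES (?, ?, ?, ?)", [
    (0, 1, "cmpd_A", 0.3),
    (0, 2, "cmpd_A", 3.0),  # no ground-truth row for this concentration
    (0, 3, "DMSO", 0.0),
])
cur.executemany("INSERT INTO supplement_GroundTruth VALUES (?, ?, ?)", [
    ("cmpd_A", 0.3, "moa_X"),
    ("DMSO", 0.0, "DMSO"),
])

# Same inner join as the MySQL view above, minus the plate/well columns.
cur.execute("""
    CREATE VIEW supplement_Image_withmoa AS
    SELECT TableNumber, ImageNumber, Image_Metadata_Compound,
           Image_Metadata_Concentration, moa AS Image_Metadata_MOA
    FROM supplement_Image AS image
    JOIN supplement_GroundTruth AS ground
      ON image.Image_Metadata_Compound = ground.compound
     AND image.Image_Metadata_Concentration = ground.concentration
""")
rows = cur.execute("SELECT ImageNumber, Image_Metadata_MOA "
                   "FROM supplement_Image_withmoa ORDER BY ImageNumber").fetchall()
# Image 2 (cmpd_A at 3.0) has no ground-truth row, so the join drops it.
print(rows)
```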

Edit the properties file:

Set these fields in properties/supplement.properties to match your MySQL server:

  • db_host
  • db_name
  • db_user
  • db_passwd

Replace every occurrence of supplement_Image with supplement_Image_withmoa.
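After these edits, a quick check can catch an empty database field or a leftover reference to the old table. The sketch below uses a simplified, hypothetical reader for key = value lines (it is not CPA's own properties parser); it flags empty db_* fields and any value that still names supplement_Image. The \b word boundaries keep it from matching supplement_Image_withmoa, because "_" counts as a word character.

```python
import re

# The four connection fields listed above.
REQUIRED_DB_FIELDS = ("db_host", "db_name", "db_user", "db_passwd")

def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments.

    Simplified for a sanity check; not CPA's own parser.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props

def check_properties(text):
    """Return a list of problems found in the properties text (empty if none)."""
    problems = []
    props = parse_properties(text)
    for field in REQUIRED_DB_FIELDS:
        if not props.get(field):
            problems.append("missing or empty: " + field)
    # Bare table name only; supplement_Image_withmoa does not match.
    for key, value in sorted(props.items()):
        if re.search(r"\bsupplement_Image\b", value):
            problems.append("still uses supplement_Image: " + key)
    return problems
```

Usage: check_properties(open("../properties/supplement.properties").read()) returns an empty list when the edits are complete.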

Change the definition of group_SQL_Well so that it also returns the MOA column:

group_SQL_Well = SELECT TableNumber, ImageNumber, Image_Metadata_Plate_DAPI, Image_Metadata_Well_DAPI, Image_Metadata_Compound, Image_Metadata_Concentration, Image_Metadata_MOA FROM supplement_Image_withmoa

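Add this line at the end of the file. It defines a filter named test that selects the wells in columns 02 and 11 of plate Week10_40111: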

filter_SQL_test = SELECT TableNumber, ImageNumber from supplement_Image_withmoa where Image_Metadata_Plate_DAPI = 'Week10_40111' and substr(Image_Metadata_Well_DAPI from 2 for 2) IN ('02', '11')
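The test filter keeps wells whose name has 02 or 11 as its second and third characters. Assuming well names of the form B02 (a row letter followed by a two-digit column), the same predicate in Python is a slice; SQL substr(x from 2 for 2) is 1-based, so it corresponds to x[1:3]. The function name is hypothetical.

```python
def is_test_well(plate, well):
    """Mirror filter_SQL_test for one image's plate and well.

    Assumes well names like 'B02': one row letter, then a two-digit column.
    SQL substr(well from 2 for 2) == Python well[1:3].
    """
    return plate == "Week10_40111" and well[1:3] in ("02", "11")

wells = ["B02", "B03", "C11", "G11", "D10"]
print([w for w in wells if is_test_well("Week10_40111", w)])  # ['B02', 'C11', 'G11']
```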

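Create the cache:

The quoted condition at the end of the command selects the DMSO (vehicle control) images, which are the reference cells for normalization.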

$ cd src
$ python -m cpa.profiling.cache -r ../properties/supplement.properties ../inputs/cache "Image_Metadata_Compound = 'DMSO'"

Create DMSO-normalized data:

First, test the creation of normalized per-cell data on a small subset, using the test filter defined in the properties file:

$ mkdir ../output
$ python -m cpa.profiling.profile_percell \
         -o ../output/percell -c \
         -f test \
         --normalization=RobustLinearNormalization \
         ../properties/supplement.properties \
         ../inputs/cache/ \
         Well
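The exact formula behind RobustLinearNormalization is defined in CPA's profiling code. As an illustration of the general idea only, one robust linear scheme rescales each feature so that a low and a high percentile of the DMSO cells map to 0 and 1, which makes it insensitive to a few extreme cells. The percentiles (1 and 99), the function name, and applying it once per plate are assumptions here, not taken from CPA.

```python
import numpy as np

def robust_linear_normalize(cells, controls, low=1, high=99):
    """Rescale each feature so the low/high percentiles of the control
    (e.g. DMSO) cells map to 0 and 1.

    cells, controls: 2-D arrays, one row per cell, one column per feature.
    Illustrative sketch; percentile choices are assumptions.
    """
    lower = np.percentile(controls, low, axis=0)
    upper = np.percentile(controls, high, axis=0)
    scale = upper - lower
    scale[scale == 0] = 1.0  # constant features: avoid division by zero
    return (cells - lower) / scale
```

Under the per-plate assumption, you would call it once per plate, passing that plate's cells and that plate's DMSO cells.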

Once you are satisfied that everything is working (check the output in ../output/), run it on the full dataset by removing the filter. This produces 632 CSV files, one per well:

$ python -m cpa.profiling.profile_percell \
         -o ../output/percell -c \
         --normalization=RobustLinearNormalization \
         ../properties/supplement.properties \
         ../inputs/cache/ \
         Well
$ ls -1 ../output/ | wc -l
632
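The ls | wc -l check counts every entry in ../output/, including anything that is not a per-well CSV file. A slightly stricter check, sketched below, counts only files ending in .csv; point it at whichever directory holds the per-well files. The function name is hypothetical.

```python
import glob
import os

def count_csv_files(directory):
    """Count *.csv files directly inside `directory` (not recursive)."""
    return len(glob.glob(os.path.join(directory, "*.csv")))
```

For the full run, the count should match the 632 wells.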

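Perform PCA on the per-cell data:

This takes three steps: draw a subsample of 306144 DMSO cells, fit a 50-component PCA to that subsample, and re-run profile_percell with the fitted PCA as a preprocessor.

Add this line to the properties file if it isn't already there:

filter_SQL_dmso = SELECT TableNumber, ImageNumber from supplement_Image_withmoa WHERE Image_Metadata_Compound = 'DMSO'

The commands below use Makefile-style variables ($(...)). Define these variables in a Makefile, or substitute their values by hand when typing the commands into a shell, where $(...) would instead mean command substitution:

SUBSAMPLE_SIZE_DMSO_ALL = 306144
SUBSAMPLE_DMSO_ALL  = ../output/$(SUBSAMPLE_SIZE_DMSO_ALL).dmso.subsample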


python -m cpa.profiling.subsample \
       --multiprocessing \
       -p -v \
       --normalization RobustLinearNormalization \
       -f dmso \
       ../properties/supplement.properties \
       ../inputs/cache \
       $(SUBSAMPLE_DMSO_ALL) \
       $(SUBSAMPLE_SIZE_DMSO_ALL)
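The subsample step draws a fixed number of DMSO cells (SUBSAMPLE_SIZE_DMSO_ALL = 306144). A standard way to take a uniform fixed-size sample from a stream too large to hold in memory is reservoir sampling (Algorithm R), sketched below. This only illustrates the idea; cpa.profiling.subsample may draw its sample differently, and the function name is hypothetical.

```python
import random

def reservoir_sample(rows, k, seed=None):
    """Uniformly sample k items from an iterable of unknown length in one
    pass (Algorithm R). Memory use is O(k); returns all items if there
    are fewer than k."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)          # fill the reservoir first
        else:
            j = rng.randint(0, i)       # inclusive on both ends
            if j < k:
                sample[j] = row         # keep item i with probability k/(i+1)
    return sample
```

Every item ends up in the sample with the same probability k/n, without knowing n in advance.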

python -m cpa.profiling.pca \
       --normalization RobustLinearNormalization \
       $(SUBSAMPLE_DMSO_ALL) 50 \
       ../output/$(SUBSAMPLE_SIZE_DMSO_ALL).dmso.50.pca.preprocessor
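Conceptually, the pca step fits principal axes to the normalized DMSO subsample, keeping 50 components, and the later profile_percell --preprocess step projects every cell onto those axes. The numpy sketch below shows that fit-then-project idea via SVD. The function names are hypothetical, and this sketch does not claim that cpa.profiling.pca centers, scales, or stores the model in the same way.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit PCA on X (rows = cells, columns = features) via SVD.

    Returns the feature means and the top n_components principal axes,
    one axis per row.
    """
    mean = X.mean(axis=0)
    # Rows of Vt are principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def apply_pca(X, mean, components):
    """Project cells onto the fitted axes (the 'preprocessing' step)."""
    return (X - mean).dot(components.T)
```

Fitting on DMSO cells only means the axes describe variation among untreated cells, and all cells are then expressed in that coordinate system.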

python -m cpa.profiling.profile_percell \
       -o ../output/percell_pca -c \
       --normalization=RobustLinearNormalization \
       --preprocess ../output/$(SUBSAMPLE_SIZE_DMSO_ALL).dmso.50.pca.preprocessor \
       ../properties/supplement.properties \
       ../inputs/cache/ \
       Well