Creating DMSO-normalized per-cell data using data from PMID 24045582
$ pip freeze
MySQL-python==1.2.5
numpy==1.9.2
progressbar==2.3
scipy==0.15.1
verlib==0.1
$ python --version
Python 2.7.9
$ git clone git@github.com:CellProfiler/CellProfiler-Analyst.git
$ cd CellProfiler-Analyst/
$ git rev-parse HEAD
1e16c195bfbc98c9c14f017370fa38c47a15eca1
$ export PYTHONPATH=<path-to-CellProfiler-Analyst-directory>
$ unzip Data_S2_database.zip
$ unzip Data_S3_reproduce.zip
$ cd database/
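Load the supplement tables into your MySQL database (there is no space between -p and the password):
$ . load_supplement_tables.sh . | mysql -h <hostname> -u <username> -p<password> <databasename>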
In the same database, create a view that attaches the mechanism-of-action (MOA) annotation from supplement_GroundTruth to each image:
CREATE VIEW supplement_Image_withmoa
AS
SELECT TableNumber,
ImageNumber,
Image_Metadata_Plate_DAPI,
Image_Metadata_Well_DAPI,
Image_Metadata_Compound,
Image_Metadata_Concentration,
moa AS Image_Metadata_MOA
FROM supplement_Image AS image
JOIN `supplement_GroundTruth` AS ground
ON image.`Image_Metadata_Compound` = ground.`compound` AND
image.`Image_Metadata_Concentration` = ground.`concentration`;
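The view is an inner JOIN, so an image appears in supplement_Image_withmoa only if its (compound, concentration) pair has a matching row in supplement_GroundTruth; unmatched images are dropped. As a hedged illustration of that behavior outside MySQL, the sketch below performs the same composite-key inner join with awk on two small comma-separated files. The function name join_moa and the column layouts are hypothetical, not part of the supplement, and unlike MySQL it compares keys as exact strings (so 0.5 and 0.50 would not match).

```shell
# Hypothetical illustration of the view's inner JOIN, using awk instead of MySQL.
# ground.csv columns: compound,concentration,moa
# images.csv columns: ImageNumber,compound,concentration
# Output columns:     ImageNumber,compound,concentration,moa
join_moa() (
    awk -F, -v OFS=, '
        # First file (ground truth): remember the MOA for each
        # (compound, concentration) pair.
        NR == FNR { moa[$1, $2] = $3; next }
        # Second file (images): keep only rows whose pair was seen above,
        # appending the MOA as a new last column.
        ($2, $3) in moa { print $1, $2, $3, moa[$2, $3] }
    ' "$1" "$2"
)
```

Usage: join_moa ground.csv images.csv prints only the images that have an annotation, in the order they appear in images.csv.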
Edit these fields in properties/supplement.properties to match your MySQL server and database:
db_host
db_name
db_user
db_passwd
Replace supplement_Image with supplement_Image_withmoa throughout the properties file.
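The replacement in the previous step can be scripted. A minimal sketch, assuming a sed that supports -E and that the table name only appears as a whole word in the properties file: the hypothetical function use_moa_view swaps every standalone supplement_Image for supplement_Image_withmoa and leaves supplement_Image_withmoa itself untouched, so running it twice is harmless. It writes through a temporary file instead of sed -i, whose syntax differs between GNU and BSD sed.

```shell
# Hypothetical helper: point the properties file at the MOA view.
# Replaces supplement_Image only when it is not followed by a letter, digit or
# underscore, so supplement_Image_withmoa is never rewritten a second time.
use_moa_view() (
    props=$1
    tmp="$props.tmp.$$"
    sed -E 's/supplement_Image([^[:alnum:]_]|$)/supplement_Image_withmoa\1/g' \
        "$props" > "$tmp" && mv "$tmp" "$props"
)
```

Usage: use_moa_view properties/supplement.properties, then review the file before continuing.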
Edit the definition of group_SQL_Well so that it reads:
group_SQL_Well = SELECT TableNumber, ImageNumber, Image_Metadata_Plate_DAPI, Image_Metadata_Well_DAPI, Image_Metadata_Compound, Image_Metadata_Concentration, Image_Metadata_MOA FROM supplement_Image_withmoa
Add this line at the end of the properties file:
filter_SQL_test = SELECT TableNumber, ImageNumber from supplement_Image_withmoa where Image_Metadata_Plate_DAPI = 'Week10_40111' and substr(Image_Metadata_Well_DAPI from 2 for 2) IN ('02', '11')
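The predicate substr(Image_Metadata_Well_DAPI from 2 for 2) takes the two characters starting at position 2 of the well name. Assuming well names of the form row letter plus two-digit column (for example B02), the test filter therefore keeps columns 02 and 11 of plate Week10_40111. The hypothetical shell function below applies the same rule to one well name, which can help you predict which wells the test run should produce.

```shell
# Hypothetical check mirroring the test filter's well predicate:
# substr(well FROM 2 FOR 2) IN ('02', '11'), with 1-based positions as in SQL.
# Assumes well names like B02: one row letter followed by a two-digit column.
is_test_well() (
    col=$(printf '%s\n' "$1" | cut -c2-3)
    [ "$col" = 02 ] || [ "$col" = 11 ]
)
```

For example, is_test_well B11 succeeds and is_test_well B12 fails.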
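Change to the src/ directory, which sits next to properties/ and inputs/ (the commands below refer to ../properties and ../inputs), and build the per-cell cache. The quoted last argument is an SQL predicate selecting the DMSO control wells:
$ cd src
$ python -m cpa.profiling.cache -r ../properties/supplement.properties ../inputs/cache "Image_Metadata_Compound = 'DMSO'"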
Test the creation of normalized per-cell data on a small subset of wells, using the test filter defined above (filter_SQL_test, selected with -f test):
$ mkdir ../output
$ python -m cpa.profiling.profile_percell \
-o ../output/percell -c \
-f test \
--normalization=RobustLinearNormalization \
../properties/supplement.properties \
../inputs/cache/ \
Well
Once you are satisfied that everything works (inspect the output in ../output/), run it on the whole dataset by dropping the filter. This produces 632 CSV files, one per well:
$ python -m cpa.profiling.profile_percell \
-o ../output/percell -c \
--normalization=RobustLinearNormalization \
../properties/supplement.properties \
../inputs/cache/ \
Well
$ ls -1 ../output/ | wc -l
632
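The ls | wc -l check counts every entry in ../output/, so it stops matching 632 once the later steps write the DMSO subsample and the PCA preprocessor into the same directory. A slightly stricter sketch, assuming the per-well files end in .csv, counts only those files and fails loudly on a mismatch; check_csv_count is a hypothetical name.

```shell
# Hypothetical helper: fail unless DIR contains exactly EXPECTED *.csv files.
# Only regular files directly inside DIR are counted (no recursion).
check_csv_count() (
    dir=$1
    expected=$2
    n=$(find "$dir" -maxdepth 1 -type f -name '*.csv' | wc -l | tr -d ' ')
    if [ "$n" -ne "$expected" ]; then
        echo "expected $expected CSV files in $dir, found $n" >&2
        exit 1    # leaves only the function's subshell
    fi
)
```

Usage: check_csv_count ../output 632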
Add this line to the properties file if it doesn't already exist:
filter_SQL_dmso = SELECT TableNumber, ImageNumber from supplement_Image_withmoa WHERE Image_Metadata_Compound = 'DMSO'
Set shell variables for the number of DMSO cells to subsample and the file the subsample is written to:
$ SUBSAMPLE_SIZE_DMSO_ALL=306144
$ SUBSAMPLE_DMSO_ALL=../output/${SUBSAMPLE_SIZE_DMSO_ALL}.dmso.subsample
Draw a subsample of normalized cells from the wells selected by the dmso filter:
$ python -m cpa.profiling.subsample \
--multiprocessing \
-p -v \
--normalization RobustLinearNormalization \
-f dmso \
../properties/supplement.properties \
../inputs/cache \
${SUBSAMPLE_DMSO_ALL} \
${SUBSAMPLE_SIZE_DMSO_ALL}
Fit a PCA preprocessor with 50 components on the DMSO subsample:
$ python -m cpa.profiling.pca \
--normalization RobustLinearNormalization \
${SUBSAMPLE_DMSO_ALL} 50 \
../output/${SUBSAMPLE_SIZE_DMSO_ALL}.dmso.50.pca.preprocessor
Finally, create per-cell data preprocessed with that PCA model, writing the output to ../output/percell_pca:
$ python -m cpa.profiling.profile_percell \
-o ../output/percell_pca -c \
--normalization=RobustLinearNormalization \
--preprocess ../output/${SUBSAMPLE_SIZE_DMSO_ALL}.dmso.50.pca.preprocessor \
../properties/supplement.properties \
../inputs/cache/ \
Well
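The last three steps (subsample, PCA, PCA-preprocessed per-cell data) can be collected into one rerunnable function. This is a sketch, not part of the supplement: the function name dmso_pca_pipeline and the OUT and DRY_RUN variables are hypothetical, it must be run from src/ like the commands above, OUT defaults to ../output (where the pca step writes the preprocessor), and the subsample size defaults to 306144 unless SUBSAMPLE_SIZE_DMSO_ALL is already set. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Hypothetical wrapper for the subsample -> PCA -> per-cell PCA steps.
# Uses the same cpa.profiling commands and flags as above; run it from src/.
dmso_pca_pipeline() (
    set -e
    out=${OUT:-../output}
    size=${SUBSAMPLE_SIZE_DMSO_ALL:-306144}
    subsample="$out/$size.dmso.subsample"
    preprocessor="$out/$size.dmso.50.pca.preprocessor"

    # With DRY_RUN set, print each command instead of executing it.
    run() {
        if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
    }

    run python -m cpa.profiling.subsample --multiprocessing -p -v \
        --normalization RobustLinearNormalization -f dmso \
        ../properties/supplement.properties ../inputs/cache \
        "$subsample" "$size"
    run python -m cpa.profiling.pca --normalization RobustLinearNormalization \
        "$subsample" 50 "$preprocessor"
    run python -m cpa.profiling.profile_percell -o "$out/percell_pca" -c \
        --normalization=RobustLinearNormalization --preprocess "$preprocessor" \
        ../properties/supplement.properties ../inputs/cache/ Well
)
```

For example, DRY_RUN=1 dmso_pca_pipeline prints the three commands with the paths expanded, which is a quick way to confirm that the preprocessor path written by the pca step is the one passed to --preprocess.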