leemengtaiwan
10/6/2017 - 4:58 AM

Build an image dataset: resize the images in a folder and return the corresponding X, y ndarrays (for Dogs vs Cats)
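The code calls a helper, `read_image_and_resize`, that this snippet does not define. Here is a rough, NumPy-only sketch of the resize half of such a helper. Decoding the file is left out. A nearest-neighbour resize maps an (H, W, C) array to the requested (height, width). `nearest_resize` is a hypothetical name, and a real pipeline would more likely use an image library's resampling.

```python
import numpy as np


def nearest_resize(img, size):
    """Hypothetical stand-in: nearest-neighbour resize of an (H, W, C) array.

    size is (height, width), matching the `size` argument of load_image_dataset.
    """
    src_h, src_w = img.shape[:2]
    dst_h, dst_w = size
    # For each output row/column, pick the source row/column it falls into
    rows = np.arange(dst_h) * src_h // dst_h
    cols = np.arange(dst_w) * src_w // dst_w
    # Index rows first, then columns; the channel axis is carried along
    return img[rows][:, cols]


# Usage: a 4x6 RGB image shrunk to 2x3
img = np.arange(4 * 6 * 3, dtype=np.uint8).reshape(4, 6, 3)
small = nearest_resize(img, (2, 3))
print(small.shape)  # (2, 3, 3)
```

Because every output image has the same (height, width), the resized arrays can be stacked into a single X of shape (#images, height, width, #channels).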

The number of images to use can be adjusted.
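Picking the subset is a sampling-without-replacement step. `np.random.choice` draws with replacement by default, so `replace=False` is needed to keep every selected file distinct. The sketch below runs on made-up file names in the Dogs vs Cats style (`cat.0.jpg`, `dog.0.jpg`, ...). The seeded `np.random.default_rng` generator is a choice made here for reproducibility; the original does not use it.

```python
import numpy as np

# Made-up file names in the Dogs vs Cats naming style
all_img_files = [f'{animal}.{i}.jpg' for animal in ('cat', 'dog') for i in range(5)]

rng = np.random.default_rng(seed=0)  # seeded so the subset is reproducible
dataset_size = 4

# replace=False: each file is picked at most once
img_files = rng.choice(all_img_files, size=dataset_size, replace=False)
# Same labelling rule as the snippet: 0 for dog, 1 for cat
labels = np.array([0 if 'dog' in f else 1 for f in img_files]).reshape(-1, 1)

print(img_files)
print(labels.shape)  # (4, 1)
```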

def load_image_dataset(dir_path='datasets/train/', dataset_size=None,
                       size=(300, 300)):
    """
    Resize all the images in the specified directory to the specified
    (height, width) and return them as X, with their corresponding labels
    as y, where `y = 0` indicates a dog image and `y = 1` a cat image.
    
    Parameters:
    -----------
    dir_path: relative path to the image folder
    dataset_size: total number of images to include in the result,
        useful when there are too many images in the folder; if None,
        every image in the folder is used
    size: final (height, width) of each image after resizing
    
    Returns:
    --------
    X: ndarray of shape (#images, height, width, #channels)
    y: ndarray of shape (#images, 1) holding the 0/1 labels
    """
    import os
    import numpy as np
    
    X, y = [], []
    all_img_files = os.listdir(dir_path)
    
    # if dataset_size is not specified, use all the images
    dataset_size = dataset_size if dataset_size else len(all_img_files)
    
    # randomly pick dataset_size distinct files in the folder
    # (replace=False, so no image is picked twice)
    img_files = np.random.choice(all_img_files, dataset_size, replace=False)
    for img_file in img_files:
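        # read_image_and_resize is not defined in this snippet; it is assumed
        # to load an image file and resize it to (height, width)
        img = read_image_and_resize(os.path.join(dir_path, img_file), size=size)
        label = 0 if 'dog' in img_file else 1
        X.append(img)
        y.append(label)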
        
    return (np.array(X), np.array(y).reshape(-1, 1))
    
# example usage
X, y = load_image_dataset(dir_path='datasets/train/', dataset_size=100, size=(300, 300))
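The labelling rule `0 if 'dog' in img_file else 1` treats any file whose name lacks the substring `dog` as a cat. A stray file in the folder, such as a hidden system file, would therefore silently become a cat. A stricter variant reads the prefix before the first dot and rejects anything else. It assumes the Dogs vs Cats training files are named `dog.<n>.jpg` / `cat.<n>.jpg`. `label_from_filename` is a hypothetical helper and is not part of the original.

```python
import os

LABELS = {'dog': 0, 'cat': 1}  # same encoding as load_image_dataset


def label_from_filename(path):
    """Hypothetical helper: map 'dog.123.jpg' -> 0, 'cat.7.jpg' -> 1."""
    # Take the file name only, then everything before the first dot
    prefix = os.path.basename(path).split('.', 1)[0].lower()
    if prefix not in LABELS:
        raise ValueError(f'cannot infer a label from {path!r}')
    return LABELS[prefix]


print(label_from_filename('datasets/train/dog.123.jpg'))  # 0
print(label_from_filename('cat.7.jpg'))  # 1
```

Raising on unknown names makes a mislabelled or unexpected file fail loudly instead of skewing the class balance of y.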