
Pandas Basics


[TOC]


Series -----------------------------------------------------------

Create a pandas Series from a dict, an ndarray, or a scalar value

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. A Series can be created from a dict, an ndarray, or a scalar value.

s = pd.Series(np.array([1, 2, 3]), index=['a', 'b', 'c']) # optional index must be the same length as data if provided
s = pd.Series({'a' : 1, 'b' : 2, 'c' : 3})
s = pd.Series(5., index=['a', 'b', 'c']) # scalar values will be repeated

https://pandas.pydata.org/pandas-docs/stable/dsintro.html#series

Series behaves like an ndarray and a dict.

s[0]                # index
s[:3]               # slice
s[s > s.median()]   # index conditionally
s[[2, 1, 0]]        # index with an array of positional indices
s['a']              # index by key
s['e'] = 12         # insert key/value
'e' in s            # inclusion
s.get('f', np.nan)  # index by key, return default if not found
s * 2               # vectorized multiplication
s + s               # vectorized addition

Note, slicing also slices the index.

https://pandas.pydata.org/pandas-docs/stable/dsintro.html#series

Unlike ndarray, Series operations align data by label

Being able to write code without doing any explicit data alignment grants immense freedom and flexibility in interactive data analysis and research. The integrated data alignment features of the pandas data structures set pandas apart from the majority of related tools for working with labeled data.

>>> a = np.array([1,2,3])
>>> s = pd.Series(a)

>>> a[1:] + a[:-1]
array([3, 5])

>>> s[1:] + s[:-1]
0    NaN
1    4.0
2    NaN

https://pandas.pydata.org/pandas-docs/stable/dsintro.html#series

Dataframes --------------------------------------------------------

A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. Along with the data, you can optionally pass index (row labels) and columns (column labels) arguments. If axis labels are not passed, they will be constructed from the input data based on common sense rules.
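
For example, a minimal sketch showing the optional index and columns arguments (the data and labels here are made up):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2),
                  index=['a', 'b', 'c'],       # row labels
                  columns=['one', 'two'])      # column labels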




Dataframe from dict of nd arrays/lists

Given feature vectors of the same length (represented as ndarrays or lists), create a dictionary where each key/value pair corresponds to a feature label and feature vector (ndarray or list).

d = {'one' : [1., 2., 3., 4.],
     'two' : [4., 3., 2., 1.]}

df = pd.DataFrame(d)

---
   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0

Dataframe from dict of Series/dicts

Given feature vectors of unequal length, cast each as a Series and assign it to an entry in a dictionary, where each key/value pair corresponds to a feature label/vector; indices are optionally passed as a list to the index keyword arg. If no index is passed, the result will be range(n), where n is the array length. Then create a DataFrame: pandas aligns the series on the union of their indices, filling gaps with NaN.

d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)

---
   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0

https://pandas.pydata.org/pandas-docs/stable/dsintro.html#from-dict-of-series-or-dicts

Python numpy


[TOC]


Array Creation

  1. Use NumPy for fixed-length homogeneous multidimensional arrays
  2. NumPy supports way more numerical types than Python does
  3. Specify type, with the optional dtype argument, when creating arrays
  4. Prefer the astype() method over static type casting functions.
  5. Create n-dimensional numpy arrays with np.array()
  6. Initialize arrays with zeros(), ones(), and empty()
  7. Create regular sequences with arange() and linspace()
  8. Create logarithmic sequences with logspace()
  9. Generate indices of a grid with indices()
  10. Partition a 1d vector into an nd array with reshape()
  11. Inspect properties of a numpy array with the shape, ndim, itemsize, and size attributes.

Use Numpy for fixed-length homogeneous multidimensional arrays

NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of positive integers. In NumPy dimensions are called axes. The number of axes is rank. (quickstart tutorial)


(If you try to create a heterogeneous numpy array, it will convert everything to numbers if possible, or strings if not.)

np.array([True, 1, 2]) + np.array([3, 4, False])    # array([4,5,2])
np.array(['Cat', 1, 2])                             # array(['Cat', '1', '2'])
np.array(['Cat', 1, 2]) + np.array([3, 4, False])   # TypeError

Note, you cannot do element-wise addition on an np array of strings.

NumPy supports way more numerical types than Python does

NumPy numerical types are instances of dtype (data-type) objects, available as np.bool_, np.float32, etc.

Specify type, with the optional dtype argument, when creating arrays

The datatype is typically specified when creating arrays, but data-types can themselves be used as casting functions to convert python lists/numbers to np arrays/scalars. If not specified, the data type is inferred (a platform-dependent integer type such as int32/int64, or float64).

a = np.array( [1,2,3] )
b = np.array( [1,2,3], dtype=np.float32 )
c = np.array( [1,2,3], dtype='f' )
d = np.float32(1.0)
e = np.int_([1,2,4])

Prefer the astype() method over static type casting functions.

You can also cast an existing np array with a datatype function, but it is preferable to use astype().

y = np.int_([1,2,4])
z = y.astype(float)


Create n-dimensional numpy array with np.array()

Create a numpy array by converting a python list, or list of lists, etc., into an n-dimensional array. array() converts lists, tuples or any object that supports the array protocol.

a = np.array( [1,2,3] )
b = np.array( [[1,2,3], [4,5,6]] )
c = np.array( [ [[1,2,3], [4,5,6]], [[7,8,9], [0,1,2]] ] )

Initialize array with zeros(), ones(), and empty()

Arrays are fixed-length, so initialize them instead of growing them as you would a list. zeros() and ones() create arrays of zeros/ones, and empty() creates an array of uninitialized (arbitrary) content.

np.zeros( (3,4) )
np.zeros( (3,4), dtype=np.int16)
np.ones( (2,3,4) )
np.empty( (2,3) )

Create regular sequences with arange() and linspace()

Numpy's arange() functions just like python range(), but returns a numpy array and also accepts optional dtype argument.

np.arange(10)                       # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
np.arange(2, 10, dtype=np.float64)  # array([ 2., 3., 4., 5., 6., 7., 8., 9.])
np.arange(2, 2.5, 0.1)              # array([ 2. , 2.1, 2.2, 2.3, 2.4])

linspace() takes the first and last (inclusive) elements and the total number of elements, and calculates the spacing for you.

np.linspace(1., 4., 6)          # array([ 1. ,  1.6,  2.2,  2.8,  3.4,  4. ])

Create logarithmic sequences with logspace()

logspace() works like linspace(), but the first and last arguments are the base-10 logarithms of the endpoint values (an optional base argument changes the base).

np.logspace(1, 2, 10, base=10.0)
np.logspace(np.log10(10), np.log10(100), 10, base=10.0)

Generate indices of a grid with indices()

np.indices() takes the shape of an n-dimensional grid and generates the indices!

grid = np.indices((3,4))

# equivalent
rows, cols = [], []
for r in range(3):
    for c in range(4):
        rows.append(r)
        cols.append(c)
rows = np.array(rows).reshape(3,4)
cols = np.array(cols).reshape(3,4)
grid = np.stack( (rows, cols), axis=0 )

Partition a 1d vector into nd array with reshape()

Create/initialize an nd array with a numeric sequence (or any pattern) by first creating a vector representation of the unfolded nd array, then simply reshaping it.

a = np.arange(15).reshape(3, 5)

Inspect properties of a numpy array with shape, ndim, itemsize, and size attributes.

type(a)         # <type 'numpy.ndarray'>
a.dtype.name    # 'int64'
a.shape         # (3, 5)
a.ndim          # 2 (rank, i.e. number of dimensions)
a.itemsize      # 8 (size in bytes of each element, i.e. 64/8 = 8)
a.size          # 15 (total number of elements)

Print arrays

https://docs.scipy.org/doc/numpy-dev/user/quickstart.html#printing-arrays


Basic Operations


Universal Functions

https://docs.scipy.org/doc/numpy-dev/user/quickstart.html#universal-functions


Indexing

Indexing, Slicing and Iterating

Indexing with Arrays of Indices

Indexing with Boolean Arrays

Index np array like a list or better yet with comma indexing notation

np arrays are indexed like python lists, BUT multiple elements of an np array can be indexed with a list/tuple/array of positional indices.

>>> b = np.array([1,2,3,4,5])
>>> b[[1,3]]
array([2, 4])

a = np.array([[1,2,3,4],
              [6,7,8,9]])

a[0][2]     # 3
a[0,2]      # 3
a[:,1:3]    # array([[2,3],[7,8]])
a[1,:]      # array([6,7,8,9])

np arrays can also be indexed by a boolean array (also called a logical array or mask), where elements corresponding to True values get selected. Boolean arrays can be constructed by applying a comparison to an np array.

>>> b[[False, True, True, False, False]]
array([2,3])
>>> b[b > 3]
array([4,5]) 

Shape Manipulation


Changing the shape of an array


Stacking together different arrays


Splitting one array into several smaller ones


Copies and Views


This should be moved to a new file; it's basically a command reference.

np.mean(a)          # arithmetic mean
np.median(a)        # median
np.std(a)           # standard deviation
np.corrcoef(a, b)   # correlation coefficient matrix

Functions and Methods Overview


Broadcasting-Quick Start

Broadcasting


The ix_() function


I/O with NumPy


Byte-swapping


Structured Arrays


Subclassing Arrays


unsorted

np.argsort() returns the indices of array elements in sorted order. Recall you can index a numpy array with an array of indices, so you can use this to get the k smallest elements of an array in sorted order (e.g., to implement kNN).

a = np.array([4,7,3,5,6,8,1,0,2])
i = np.argsort(a)           # array([7, 6, 8, ...])
a[i[0:3]]                   # array([0, 1, 2])

gotchas overview

'+' operator concatenates python lists, but performs element-wise addition on np arrays
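
A minimal illustration of the difference:

[1, 2] + [3, 4]                          # [1, 2, 3, 4] (concatenation)
np.array([1, 2]) + np.array([3, 4])      # array([4, 6]) (element-wise)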

When you slice a python list you get a copy! When you slice an np array you get a VIEW! When you fancy-index an np array (with an array of indices or a boolean mask) you get a copy! So fancy-indexing an np array is actually more similar to slicing a list.
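
A quick sketch of the copy/view behavior (values made up):

lst = [1, 2, 3]
lst_slice = lst[:2]      # copy
lst_slice[0] = 99        # lst unchanged: [1, 2, 3]

arr = np.array([1, 2, 3])
view = arr[:2]           # view
view[0] = 99             # arr changed: array([99, 2, 3])

fancy = arr[[1, 2]]      # copy (fancy indexing)
fancy[0] = 0             # arr unchanged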

Use np.any(), np.all() and np.where() to apply a condition to all elements in an array. any/all return True/False if any/all elements meet the condition. np.where() (with a single argument) returns a tuple of index arrays for the elements that meet the condition. Compare that with applying a comparison directly to the array, which returns a boolean mask.
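
For example:

a = np.array([1, 5, 2, 8])
np.any(a > 4)       # True
np.all(a > 4)       # False
np.where(a > 4)     # (array([1, 3]),)
a > 4               # array([False,  True, False,  True])
a[a > 4]            # array([5, 8])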

data munging


  1. Create a dictionary from lists of keys & values with dict(zip(keys, values)) (unzip with the keys() & values() dictionary methods)
  2. Get a list of tuples from a dictionary with d.items() (useful for iterating over a dictionary, i.e., for k, v in d.items(): ...)
  3. Create a dictionary with auto-generated keys with dict(enumerate(values))
  4. Implement an empty bag for counting (i.e., a multiset) with d = collections.defaultdict(int) (e.g., word count)
  5. Generate a full bag with collections.Counter(list_of_items)
  6. Count named tuples by type with collections.Counter(tpl.typ for tpl in named_tpls), e.g., collections.Counter(medal.team for medal in medals) (see Exploit Python Collections)
  7. Count the number of non-zero elements in array a with np.count_nonzero(a).
  8. Count the number of elements greater than x with np.count_nonzero(np.array(a) > x)
  9. Zipping uneven lists truncates to the shorter, e.g., list(zip((1,2,3,4,5),('a','b','c')))
  10. Rotate a matrix (90 degrees clockwise) with list(zip(*reversed(m))) or [[row[i] for row in reversed(m)] for i in range(len(m[0]))] (see the sketch after this list)
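
A quick sketch of a few of these recipes (values made up):

import collections

# 1. dict from parallel lists of keys & values
d = dict(zip(['a', 'b', 'c'], [1, 2, 3]))        # {'a': 1, 'b': 2, 'c': 3}

# 3. dict with auto-generated keys
dict(enumerate(['x', 'y']))                      # {0: 'x', 1: 'y'}

# 5. full bag (multiset)
collections.Counter(['cat', 'dog', 'cat'])       # Counter({'cat': 2, 'dog': 1})

# 10. rotate a matrix 90 degrees clockwise
m = [[1, 2],
     [3, 4]]
list(zip(*reversed(m)))                          # [(3, 1), (4, 2)]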

generating synthetic data

Generate 5 x 2 matrix with values from standard normal distribution with mean 0 and standard deviation 1.

from scipy import stats as ss
ss.norm(0,1).rvs((5,2))
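
Equivalently with numpy (same distribution, different generator):

import numpy as np
np.random.normal(0, 1, size=(5, 2))   # 5 x 2 draws from N(0, 1)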

Data Analysis

description


[TOC]


Acquiring Data

Download data with urllib.request.urlretrieve(<url>, <new file name>)

import urllib.request
urllib.request.urlretrieve('ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt', 'stations.txt')

# now read/parse local file

open(...) returns an iterable object, so peek at the data with open(...).readlines()[:5] or simply cast it to a list: list(open(...))[:5]

open('stations.txt','r').readlines()[:5]

or

list(open('stations.txt','r'))[:5]

which is equivalent to:

with open('stations.txt','r') as f:
    f.readlines()[:5]
    # or list(f)[:5]

Read in text file data to dataframe with parsing functions, e.g., read_csv, read_table, read_json, etc.

Use read_csv if data is comma-separated, use read_table if tab-separated, or specify an arbitrary delimiter with the sep option.

df = pd.read_csv('examples/ex1.csv')
pd.read_table('examples/ex1.csv', sep=',')  # read_table defaults to tabs; override sep for commas

Note, sep accepts regular expressions, e.g., \s+ for variable whitespace.

Inspect data with the info() and head()/tail() dataframe methods, and df[col].unique() for a column's distinct values.

If there is no header line in the file, specify column names with names=[...] or accept defaults with the header=None option.

pd.read_csv('examples/ex2.csv', header=None)
pd.read_csv('examples/ex2.csv', names=['a', 'b', 'c', 'd', 'message'])

Note, if both the header and names options are omitted, the first line will be used as the header. If names is passed but the file contains a header line, that line will NOT be consumed as a header; it will become the first line of data (pass header=0 to replace it).
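
For example, assuming a hypothetical data.csv whose first line is a header:

pd.read_csv('data.csv', names=['a', 'b', 'c'], header=0)  # discard file header, use names
pd.read_csv('data.csv', names=['a', 'b', 'c'])            # file header becomes a data row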

Specify the index column with index_col, or accept the default integer row ids.

columns = ['a','b','c']
pd.read_csv('examples/ex2.csv', names=columns, index_col='c')
pd.read_csv('examples/ex2.csv', names=columns, index_col=['a','b']) # hierarchical index
pd.read_csv('examples/ex2.csv', names=columns) # default index

Note, by passing a list of columns names, i.e., index_col=['a','b'], you can specify a hierarchical index.

Skip n rows from the beginning/end with the skiprows/skipfooter options, or pass a list to skiprows to specify line numbers to skip.
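
For example (the file path here is a placeholder):

pd.read_csv('examples/ex4.csv', skiprows=2)          # skip the first 2 lines
pd.read_csv('examples/ex4.csv', skiprows=[0, 2, 3])  # skip specific line numbers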

Specify which sentinel strings should be treated as missing (NaN) with na_values; common sentinels such as 'NA' and empty strings are treated as missing by default.
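
For example (placeholder path; here 'NULL' and '-1' are treated as missing):

pd.read_csv('examples/ex5.csv', na_values=['NULL', '-1'])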

Cull rows with NaN values by indexing the data with ~np.isnan(column) for each (numeric) column, or simply use data.dropna().

for col in data.columns:
    data = data[~np.isnan(data[col])]

Specify encoding with encoding='...'

from io import BytesIO

data = b'word,length\nTr\xc3\xa4umen,7\nGr\xc3\xbc\xc3\x9fe,5'.decode('utf8').encode('latin-1')
df = pd.read_csv(BytesIO(data), encoding='latin-1')

(from Dealing with Unicode Data)

See Encodings and Unicode for full list of python encodings.

Adjust the number of rows pandas displays with the pd.options.display.max_rows setting.

Read in first n number of rows with nrows, e.g. nrows=5.

Read file chunks into an iterable by specifying number of rows per chunk with chunksize

chunks = pd.read_csv('in.csv', chunksize=1000)
for chunk in chunks:
    # ...
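
For example, aggregating a result across chunks (the column name 'key' here is a stand-in):

tot = pd.Series([], dtype=float)
for chunk in pd.read_csv('in.csv', chunksize=1000):
    tot = tot.add(chunk['key'].value_counts(), fill_value=0)
tot = tot.sort_values(ascending=False)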

Write a dataframe (OR a series) to file with to_csv, specifying a filename and a delimiter with sep (or accepting commas by default). Pass sys.stdout instead of a filename to write to the console.

import sys

data.to_csv(sys.stdout)
data.to_csv('out.csv')
data.to_csv('out.csv', sep='|')

Note, missing values are represented as empty strings unless you specify otherwise with na_rep, e.g., na_rep='NULL'

Omit column / row(index) names with header=False and index=False.

Write a subset of columns in a specified order with columns=['b','a',...]
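
Putting those options together (a sketch; assumes data has columns 'a' and 'b'):

data.to_csv(sys.stdout, index=False, header=False, na_rep='NULL', columns=['b', 'a'])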

Get a list of lines from a file with list(csv.reader(open_file)).

import csv

# get list of lines
with open('in.csv') as f:
    lines = list(csv.reader(f))

# wrangle/fix/etc. data manually here
# ...

# recreate dataframe
head, vals = lines[0], lines[1:]
data_dict = {h:v for h,v in zip(head,zip(*vals))}
dataframe = pd.DataFrame(data_dict)

Build feature vectors (columns) from rows, i.e., transpose rows to columns, with zip(*rows).

rows = [['1','2','3'],
        ['1','2','3'],
        ['1','2','3']]

# transpose rows to cols
cols = list(zip(*rows))     # list() needed in Python 3; zip returns an iterator
names = range(len(cols))

# create dictionary with entries of form: feature:feature_vector
data_dict = {h: v for h, v in zip(names, cols)}

# which is equivalent to
data_dict = dict(enumerate(zip(*rows)))

Create a custom CSV format by defining a subclass of csv.Dialect

class my_dialect(csv.Dialect):
    lineterminator = '\n'
    delimiter = ';'
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL

reader = csv.reader(f, dialect=my_dialect)   # f is an open file object

(McKinney 177)

Write delimited files manually with csv.writer(open_file)

with open('out.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(('a', 'b', 'c'))
    writer.writerow(('1', '2', '3'))
    writer.writerow(('4', '5', '6'))

Convert to/from a JSON string and a python dictionary with json.loads() and json.dumps() from the standard library.

Convert to/from JSON and a dataframe with pandas.read_json() and the to_json() dataframe/series method.
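
A minimal sketch of both round trips (the data here is made up):

import json
import pandas as pd

obj = json.loads('{"a": [1, 2], "b": [3, 4]}')   # JSON string -> dict
s = json.dumps(obj)                              # dict -> JSON string

df = pd.DataFrame(obj)
js = df.to_json()                                # dataframe -> JSON string
df2 = pd.read_json(js)                           # JSON string -> dataframe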


McKinney, Wes. Python for Data Analysis. 2nd ed., O’Reilly, 2018.

Data Cleaning and Preparation

Data Science Workflow


[TOC]


1_Acquiring Data - making, downloading, api requesting, html scraping

2_Data Analysis - parsing, cleaning, munging

3_Predictive Modeling - machine learning w/ Python

4_Static Visualizing w/ Python - matplotlib, seaborn, bokeh, etc.

5_Getting data into databases - mysql, nosql, graph

6_Creating/Deploying endpoints (API)

7_Interactive/Web Visualization w/ javascript - d3.js, p5.js

8_Analytic Apps/Dapps

Data Science Resources

ton of resources: https://blog.peoplemaven.com/best-data-science-books-articles-f2fa755f2b9d

coursework:

No. | Course                                     | Institution | Effort | Status
----|---------------------------------------------|-------------|--------|-------
--- | Using Python for Research                   | Harvard/edX |        |
--- | pandas for datascience                      | lynda       | ---    | ---
--- | Numpy Data Science Essential Training       | lynda       | ---    | ---
--- | Python for Data Science Essential Training  | lynda       | ---    | ---
--- | xxx                                         | ---         | ---    | ---
--- | xxx                                         | ---         | ---    | ---

see also:
https://www.datacamp.com/tracks/data-scientist-with-python
https://www.coursera.org/specializations/data-science-python
https://www.coursera.org/specializations/data-science
https://www.udemy.com/web-scraping-with-python-beautifulsoup/

Local Bootcamps (12wks/$16,000):
https://www.thisismetis.com/data-science-bootcamps
https://www.galvanize.com/seattle/data-science

topics:

WEEK 1: Introduction to the Data Science Toolkit. Exploratory Data Analysis, Bash, Git & GitHub, Python, pandas, matplotlib, Seaborn

WEEK 2: Linear Regression and Machine Learning Intro. Web scraping via BeautifulSoup and Selenium, regression with statsmodels and scikit-learn, feature selection, overfitting and train/test splits, probability theory.

WEEK 3: Linear Regression and Machine Learning Continued. Regularization, hypothesis testing, intro to Bayes Theorem

WEEK 4: Databases and Introduction to Machine Learning Concepts. Classification and regression algorithms (kNN, logistic regression, SVM, decision trees, and random forest), SQL concepts, cloud servers

WEEK 5: More supervised learning algorithms & web tools. Naive Bayes, stochastic gradient descent and intro to Deep Learning; full stack in a nutshell: Python Flask, Javascript and D3.js

WEEK 6: Statistical Fundamentals. MLE, GLM, distributions; databases (RESTful APIs, NoSQL databases, MongoDB, pymongo); Natural Language Processing techniques

WEEK 7: Unsupervised Machine Learning. Various clustering algorithms, including K-means and DBSCAN; dimension reduction techniques (PCA, SVD, LDA, NMF)

WEEK 8: More Deep Learning & Unsupervised Learning. Deep Learning via Keras, Recommender Systems

WEEK 9: Big Data. Hadoop, Hive & Spark; final project initiated

WEEK 10-12: Final Project

from https://www.thisismetis.com/data-science-bootcamps

supplementary (tutorials, etc):

tbd...

Texts

McKinney, Wes. Python for Data Analysis. 2nd ed., O'Reilly, 2018. Print.
Mitchell, Ryan. Web Scraping with Python. O'Reilly, 2015. Print.
VanderPlas, Jake. Python Data Science Handbook. O'Reilly, 2017. Print.
Grus, Joel. Data Science from Scratch. O'Reilly, 2015. Print.
Skiena, Steven S. The Data Science Design Manual. Springer, 2017. Print.
Cielen, Davy, Arno D. B. Meysman, and Mohamed Ali. Introducing Data Science: Big Data, Machine Learning, and More, Using Python Tools. Manning, 2016. Print.
Bruce, Peter C., and Andrew Bruce. Practical Statistics for Data Scientists: 50 Essential Concepts. O'Reilly, 2017. Internet resource.
Downey, Allen. Think Stats: Exploratory Data Analysis in Python. Version 2.0.35, O'Reilly, 2014. Internet resource.

readings:

http://www.datasciencecentral.com/profiles/blogs/25-timeless-data-science-articles

Kaggle

Getting Started:
https://www.kaggle.com/dansbecker/learning-materials-on-kaggle
https://www.kaggle.com/rtatman/the-5-day-data-challenge/
https://www.kaggle.com/rtatman/fun-beginner-friendly-datasets

People

Wes McKinney - author of "Python for Data Analysis"
Jake VanderPlas - author of "Python Data Science Handbook"

data sources

web scraping

visualization