Today, give a try to Techtonique web app, a tool designed to help you make informed, data-driven decisions using Mathematics, Statistics, Machine Learning, and Data Visualization
The querier is a query language for Python pandas Data Frames, inspired by relational databases querying. If you like SQL, Structured Query Language, you’ll like the querier
. If you haven’t had a taste of SQL yet, no problem: the querier
’s language is intuitive, and contains 9 verbs in its current form. You can see how these verbs work individually in the following notebooks:
concat
: concatenates two Data Frames, either horizontally or verticallydelete
: deletes rows from a Data Frame based on given criteriadrop
: drops columns from a Data Framefiltr
: filters rows of the Data Frame based on given criteriajoin
: joins two Data Frames based on given criteriaselect
: selects columns from the Data Framesummarize
: obtains summaries of data based on grouping columnsupdate
: updates a column, using an operation given by the userrequest
: for operations more complex than the previous 8 ones, makes it possible to use a SQL query on the Data Frame
It is now possible to compose the querier
’s verbs, to construct more powerful queries for your Data Frames. Here is how to do it:
Installing the package
From command line:
!pip install git+https://github.com/Techtonique/querier.git
Import packages and dataset
import pandas as pd
import querier as qr
# Import data -----
url = ('https://raw.github.com/pandas-dev'
'/pandas/master/pandas/tests/data/tips.csv')
df = pd.read_csv(url)
print(df.head())
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4
Example1:
- select columns
tip, sex, smoker, time
from tips dataset - filter rows in which
smoker == No
only - obtain cumulated tips by
sex
andtime
of the day
# Example 1 -----
qrobj = qr.Querier(df)
df1 = qrobj\
.select(req="tip, sex, smoker, time")\
.filtr(req="smoker == 'No'")\
.summarize(req="sum(tip), sex, time", group_by="sex, time")\
.df
print(df1)
sum_tip sex time
0 88.28 Female Dinner
1 61.49 Female Lunch
2 243.17 Male Dinner
3 58.83 Male Lunch
The query could be written in one line, but it would be less readable (hence the “\” for line continuation).
Example2:
- select columns
tip, sex, day, size
from tips dataset - filter rows corresponding to weekends only
- obtain average tips by
sex
andday
# Example 2 -----
df2 = qr.Querier(df)\
.select(req='tip, sex, day, size')\
.filtr(req="(day == 'Sun') | (day == 'Sat')")\
.summarize(req="avg(tip), sex, day", group_by="sex, day")\
.df
print(df2)
avg_tip sex day
0 2.801786 Female Sat
1 3.367222 Female Sun
2 3.083898 Male Sat
3 3.220345 Male Sun
A notebook containing these results can be found here. Contributions/remarks are welcome as usual, you can submit a pull request on Github.
Note: I am currently looking for a gig. You can hire me on Malt or send me an email: thierry dot moudiki at pm dot me. I can do descriptive statistics, data preparation, feature engineering, model calibration, training and validation, and model outputs’ interpretation. I am fluent in Python, R, SQL, Microsoft Excel, Visual Basic (among others) and French. My résumé? Here!
Comments powered by Talkyard.