The querier is a query language that helps you to retrieve data from Python Data Frames. In this post from October 25, we presented the querier and different verbs constituting its grammar for wrangling data: concat
, delete
, drop
, filtr
, join
, select
, summarize
, update
, request
. In this other post from November 22, we showed how our querier verbs can be composed to form data wrangling pipelines.
The querier is now available on Pypi, and can be installed from the command line as:
pip install querier
We now show how to import data from csv and SQL databases. In example 1, we import data from csv. Then in example 2 we import data from a relational database (sqlite3).
Example 1
Import data from csv, and chain the querier’s operations select
, filtr
, summarize
.
import pandas as pd
import querier as qr
import sqlite3
import sys
# data -----
url = ('https://raw.github.com/pandas-dev'
'/pandas/master/pandas/tests/data/tips.csv')
# Example 1 - Import from csv -----
qrobj1 = qr.Querier(source=url)
df1 = qrobj1\
.select(req="tip, sex, smoker, time")\
.filtr(req="smoker == 'No'")\
.summarize(req="sum(tip), sex, time", group_by="sex, time")\
.df
print(df1)
sum_tip sex time
0 88.28 Female Dinner
1 61.49 Female Lunch
2 243.17 Male Dinner
3 58.83 Male Lunch
Example 2
Import data from an sqlite3 database, and chain the querier’s operations select
, filter
, summarize
.
# Example 2 - Import from sqlite3 -----
# an sqlite3 database connexion
con = sqlite3.connect('people.db')
with con:
cur = con.cursor()
cur.execute("CREATE TABLE Population(id INTEGER PRIMARY KEY, name TEXT, age INT, sex TEXT)")
cur.execute("INSERT INTO Population VALUES(NULL,'Michael',19, 'M')")
cur.execute("INSERT INTO Population VALUES(NULL,'Sandy', 41, 'F')")
cur.execute("INSERT INTO Population VALUES(NULL,'Betty', 34, 'F')")
cur.execute("INSERT INTO Population VALUES(NULL,'Chuck', 12, 'M')")
cur.execute("INSERT INTO Population VALUES(NULL,'Rich', 24, 'M')")
# create querier object from the sqlite3 database
qrobj2 = qr.Querier(source='people.db', table="Population")
# filter on people with age >= 20
df2 = qrobj2.select(req="name, age, sex").filtr(req="age >= 20").df
print("df2: ")
print(df2)
print("\n")
# avg. age for people with age >= 20, groupped by sex
qrobj3 = qr.Querier(source='people.db', table="Population")
df3 = qrobj3.select(req="name, age, sex").filtr(req="age >= 20")\
.summarize("avg(age), sex", group_by="sex").df
print("df3: ")
print(df3)
print("\n")
df2:
name age sex
1 Sandy 41 F
2 Betty 34 F
4 Rich 24 M
df3:
avg_age sex
0 37.5 F
1 24.0 M
Note: I am currently looking for a gig. You can hire me on Malt or send me an email: thierry dot moudiki at pm dot me. I can do descriptive statistics, data preparation, feature engineering, model calibration, training and validation, and model outputs’ interpretation. I am fluent in Python, R, SQL, Microsoft Excel, Visual Basic (among others) and French. My résumé? Here!
Comments powered by Talkyard.