Data Frames are a widely used way to represent tabular data, and are particularly useful for Statistical Learning. Basically, a Data Frame = Tabular data + Named columns, and there are different implementations of this data structure, notably in R, Python and Apache Spark. The querier exposes a query language to retrieve data from Python pandas Data Frames, inspired by SQL’s relational database querying. Currently, the querier can be installed from GitHub as:
pip install git+https://github.com/thierrymoudiki/querier.git
There are 9 types of operations available in the querier, with no plan to extend that list much further (to maintain a relatively simple mental model). These verbs will look familiar to dplyr users, but the implementation (SQLite3 is used) and the functions’ signatures are different:
concat: concatenates 2 Data Frames, either horizontally or vertically
delete: deletes rows from a Data Frame based on given criteria
drop: drops columns from a Data Frame
filtr: filters rows of the Data Frame based on given criteria
join: joins 2 Data Frames based on given criteria (available for completeness of the interface, this operation is already straightforward in pandas)
select: selects columns from the Data Frame
summarize: obtains summaries of data based on grouping columns
update: updates a column/creates a new column, using an operation given by the user
request: for operations more complex than the previous 8, makes it possible to run a SQL query directly on the Data Frame
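To give a feel for the idea behind these verbs, here is a minimal sketch of running SQL on a pandas Data Frame through an in-memory SQLite3 database. This is not the querier's code or API: the helper name `sql_on_frame`, its parameters, and the small `tips` data set are all made up for illustration, and the querier's own functions may work quite differently.

```python
import sqlite3

import pandas as pd


def sql_on_frame(df, query, table_name="df"):
    """Run a SQL query on a pandas Data Frame via in-memory SQLite3.

    Hypothetical helper for illustration; NOT part of the querier's API.
    """
    con = sqlite3.connect(":memory:")
    try:
        # Copy the Data Frame into a temporary SQLite table
        df.to_sql(table_name, con, index=False)
        # Execute the query and read the result back as a Data Frame
        return pd.read_sql_query(query, con)
    finally:
        con.close()


# Made-up data, for illustration only
tips = pd.DataFrame({
    "day": ["Sun", "Sun", "Sat", "Sat"],
    "smoker": ["No", "Yes", "No", "No"],
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 3.0, 4.5, 6.0],
})

# select-like: keep only some columns
selected = sql_on_frame(tips, "SELECT day, tip FROM df")

# filtr-like: keep only rows matching given criteria
filtered = sql_on_frame(tips, "SELECT * FROM df WHERE day = 'Sat' AND tip > 5")

# summarize-like: aggregate values by grouping column
summary = sql_on_frame(
    tips,
    "SELECT day, AVG(tip) AS avg_tip, SUM(total_bill) AS sum_bill "
    "FROM df GROUP BY day ORDER BY day",
)
```

Each call copies the Data Frame into a throwaway database, which keeps the sketch simple but would be wasteful on large data.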
The following notebooks present multiple examples of use of the querier:
Contributions/remarks are welcome as usual; you can submit a pull request on GitHub.
Note: I am currently looking for a gig. You can hire me on Malt or send me an email: thierry dot moudiki at pm dot me. I can do descriptive statistics, data preparation, feature engineering, model calibration, training and validation, and model outputs’ interpretation. I am fluent in Python, R, SQL, Microsoft Excel, Visual Basic (among others) and French. My résumé? Here!
Under License Creative Commons Attribution 4.0 International.