Data Frames are a widely used way to represent tabular data, and they are particularly useful for Statistical Learning. Basically, a Data Frame = tabular data + named columns, and there are different implementations of this data structure, notably in R, Python and Apache Spark. The querier exposes a query language for retrieving data from Python pandas Data Frames, inspired by SQL querying of relational databases. Currently, the querier can be installed from GitHub with:

pip install git+https://github.com/thierrymoudiki/querier.git

The querier offers 9 types of operations, and there is no plan to extend that list much further, in order to keep the mental model relatively simple. These verbs will look familiar to dplyr users, but the implementation (which relies on numpy, pandas and SQLite3) and the function signatures are different:

  • concat: concatenates 2 Data Frames, either horizontally or vertically

  • delete: deletes rows from a Data Frame based on given criteria

  • drop: drops columns from a Data Frame

  • filtr: filters rows of the Data Frame based on given criteria

  • join: joins 2 Data Frames based on given criteria (provided for completeness of the interface, since this operation is already straightforward in pandas)

  • select: selects columns from the Data Frame

  • summarize: obtains summaries of data based on grouping columns

  • update: updates a column/creates a new column, using an operation given by the user

  • request: for operations more complex than the previous eight, makes it possible to run a SQL query directly on the Data Frame
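To give a feel for what each of the nine verbs listed above does, here is a rough sketch written in plain pandas and sqlite3. It deliberately does not call the querier itself, since its function signatures are not reproduced in this post; the toy data, the column names and the variable names are all made up for illustration, and the querier's own implementation may well differ.

```python
import sqlite3

import pandas as pd

# Hypothetical toy data, invented for this illustration only.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Dee"],
    "dept": ["ops", "ops", "dev", "dev"],
    "salary": [50, 60, 70, 80],
})
extra = pd.DataFrame({"dept": ["ops", "dev"], "floor": [1, 2]})

# concat: stack two Data Frames vertically (axis=0) or side by side (axis=1)
stacked = pd.concat([df, df], axis=0, ignore_index=True)

# delete: remove the rows that match a criterion
deleted = df[~(df["salary"] < 60)]

# drop: remove columns
dropped = df.drop(columns=["salary"])

# filtr: keep only the rows that match a criterion
filtered = df.query("salary >= 60 and dept == 'dev'")

# join: combine two Data Frames on a key column
joined = df.merge(extra, on="dept", how="left")

# select: keep only some columns
selected = df[["name", "salary"]]

# summarize: aggregate values by grouping columns
summary = df.groupby("dept", as_index=False)["salary"].mean()

# update: overwrite a column or create a new one from an operation
updated = df.assign(salary_k=lambda d: d["salary"] * 1000)

# request: run a SQL query, here by copying the frame into in-memory SQLite
conn = sqlite3.connect(":memory:")
df.to_sql("df", conn, index=False)
requested = pd.read_sql_query(
    "SELECT dept, MAX(salary) AS max_salary FROM df GROUP BY dept ORDER BY dept",
    conn,
)
conn.close()
```

Each verb maps to one pandas idiom here, which is the kind of uniform, SQL-flavoured vocabulary the querier aims to provide in a single interface.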

The following notebooks present multiple examples of use of the querier:

Contributions and remarks are welcome as usual: you can submit a pull request on GitHub.

Note: I am currently looking for a gig. You can hire me on Malt or send me an email: thierry dot moudiki at pm dot me. I can do descriptive statistics, data preparation, feature engineering, model calibration, training and validation, and model outputs’ interpretation. I am fluent in Python, R, SQL, Microsoft Excel, Visual Basic (among others) and French. My résumé? Here!

Licensed under Creative Commons Attribution 4.0 International.