DataTypeSystem


This Python package provides a type system for different data structures that are coercible to full arrays. It is a Python translation of the code of the Raku package "Data::TypeSystem", [AAp1].


Installation

Install from GitHub

pip install -e git+https://github.com/antononcube/Python-packages.git#egg=DataTypeSystem-antononcube\&subdirectory=DataTypeSystem

Install from PyPI

pip install DataTypeSystem

Usage examples

The type system conventions follow those of Mathematica's Dataset -- see the presentation "Dataset improvements".

Here we get the Titanic dataset, keep four of its columns, rename the "pclass" column to "class", and show the dataset's dimensions:

import pandas

dfTitanic = pandas.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv')
dfTitanic = dfTitanic[["sex", "age", "pclass", "survived"]]
dfTitanic = dfTitanic.rename(columns={"pclass": "class"})
dfTitanic.shape
(891, 4)

Here is a sample of the dataset's records:

from DataTypeSystem import *

dfTitanic.sample(3)
      sex   age  class  survived
555  male  62.0      1         0
278  male   7.0      3         0
266  male  16.0      3         0

Here is the type of a single record:

deduce_type(dfTitanic.iloc[12].to_dict())
Struct([age, class, sex, survived], [float, int, str, int])
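
Judging by the output above, a `Struct` pairs the record's keys, listed in sorted order, with the types of the corresponding values. The following is a minimal plain-Python sketch of that idea. It does not use the package, the helper name `sketch_struct` is hypothetical, and it only imitates the displayed form rather than the package's actual implementation:

```python
def sketch_struct(record: dict) -> str:
    """Hypothetical helper: imitate the Struct(...) display for a flat record."""
    # Sort the keys, as the displayed Struct appears to do.
    keys = sorted(record)
    # Pair each key with the name of its value's Python type.
    types = [type(record[k]).__name__ for k in keys]
    return f"Struct([{', '.join(keys)}], [{', '.join(types)}])"


# A hand-written record with the same columns as the Titanic rows (assumed values).
record = {"sex": "male", "age": 2.0, "class": 3, "survived": 0}
print(sketch_struct(record))
```

Sorting the keys makes two records with the same fields produce the same description regardless of insertion order, which is what allows records to be compared by type.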

Here is the type of a single record's values:

deduce_type(dfTitanic.iloc[12].to_dict().values())
Tuple([Atom(<class 'str'>), Atom(<class 'float'>), Atom(<class 'int'>), Atom(<class 'int'>)])

Here is the type of the whole dataset:

deduce_type(dfTitanic.to_dict())
Assoc(Atom(<class 'str'>), Assoc(Atom(<class 'int'>), Atom(<class 'str'>), 891), 4)

Here is the type of "values only" records:

valArr = dfTitanic.transpose().to_dict().values()
deduce_type(valArr)
Vector(Struct([age, class, sex, survived], [float, int, str, int]), 891)
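
The `Vector` result suggests that when every record has the same type, the collection is summarized as that one type plus a count. Below is a small plain-Python sketch of that reasoning. The helper names `struct_signature` and `sketch_vector` are hypothetical, the package is not used, and the fallback to a `Tuple`-style listing for mixed records is an assumption borrowed from the display format shown earlier, not confirmed behavior:

```python
def struct_signature(record: dict) -> str:
    """Hypothetical helper: describe a flat record by sorted keys and value types."""
    keys = sorted(record)
    types = [type(record[k]).__name__ for k in keys]
    return f"Struct([{', '.join(keys)}], [{', '.join(types)}])"


def sketch_vector(records) -> str:
    """Hypothetical helper: summarize homogeneous records as Vector(type, count)."""
    records = list(records)
    signatures = [struct_signature(r) for r in records]
    if len(set(signatures)) == 1:
        # All records share one type: report it once, with the number of records.
        return f"Vector({signatures[0]}, {len(records)})"
    # Assumption: mixed records are listed element by element.
    return f"Tuple([{', '.join(signatures)}])"


# Three hand-written records mirroring the sample rows shown earlier.
rows = [
    {"sex": "male", "age": 62.0, "class": 1, "survived": 0},
    {"sex": "male", "age": 7.0, "class": 3, "survived": 0},
    {"sex": "male", "age": 16.0, "class": 3, "survived": 0},
]
print(sketch_vector(rows))
```

Collapsing identical record types into a single description keeps the summary short even for large datasets, as with the 891 rows above.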

References

[AAp1] Anton Antonov, Data::TypeSystem Raku package, (2023), GitHub/antononcube.