Tento notebook je výukovým materiálem v předmětu BI-JUL.21 vyučovaném v zimním semestru akademického roku 2023/2024 Tomášem Kalvodou. Tvorba těchto materiálů byla podpořena NVS FIT.
Hlavní stránkou předmětu, kde jsou i další notebooky a zajímavé informace, je jeho Course Pages stránka.
versioninfo()
DataFrames.jl
Pro práci s daty jistě znáte Pythonovský nástroj pandas.
DataFrames.jl
je v podstatě Julia analog tohoto nástroje.
Pokud jste zvyklí pandas používat, může pro vás být užitečné porovnání pandas s Dataframes.jl
.
using DataFrames
DataFrame
?DataFrame
lze vytvořit mnoha způsoby.
Můžeme začít s prázdnou tabulkou a postupně ji naplnit daty, nebo využít existující matici, nebo data načíst z externího souboru.
Prázdný DataFrame
vytvoříme velmi snadno:
DataFrame()
NamedTuple
Data v pojmenovaných sloupcích můžeme předat pomocí keyword argumentů (klíčové slovo je název sloupce, hodnota data):
df = DataFrame(course=["BI-LA1", "BI-DML", "BI-MA1", "BI-MA2"], semester=[1, 1, 2, 2], department=18105)
df.course
Tímto způsobem bychom ovšem měli problém zadat data se sloupcích, jejichž názvy obsahují třeba speciální znaky jako mezery.
K tomu můžeme použít slovník, resp. dvojice (klíče mohou být řetězce nebo symboly -- ty jsou doporučené, místo mezer je vhodnější použít podtžítka :slovo_slovo
):
df = DataFrame("fiktivní postava" => ["Gandalf", "Harry Potter"], "kniha" => ["Pán prstenů", "Harry Potter a kámen mudrců"])
df.var"fiktivní postava"
df[:, "fiktivní postava"]
Další možností je použít NamedTuple
:
nt = (a = 1, b = 3)
nt[1], nt[2]
nt[:a], nt[:b]
Po sloupcích (pozor na jemný rozdíl od výše uvedeného způsobu):
DataFrame((a = [1,2,3], b = [4,5,6]))
Po řádcích:
DataFrame([(a = 1, b = 4), (a = 2, b = 5), (a = 3, b = 6)])
DataFrame([(a = 1, b = 4, c = nothing), (a = 2, b = 5, c = nothing), (a = 3, b = nothing, c = 6)])
K vytvoření DataFrame
můžeme použít i matici, jen musíme vyřešit pojmenování sloupců.
Automaticky (nutno zadat jako druhý argument :auto
) budou označeny jako x1
, x2
, atd.:
DataFrame(rand(2, 3), :auto)
DataFrame(rand(Int64, 2, 3), :auto)
V druhém argumentu případně můžeme zadat naše názvy sloupců:
DataFrame(rand(2, 3), ["col1", "col2", "col3"])
CSV.jl
V tomto notebooku budeme pro některé ukázky používat CSV anonymizovaný export z Grades předmětu BI-MA1 v semestru B212 (2021/2022).
Importu dat z CSV lze snadno docílit pomocí balíčku CSV.jl
, který přidáme standardně ] add CSV
a pak importujeme:
using CSV
Nyní stačí použít metodu read
z modulu CSV
, v prvním argumentu zadáme cestu k souboru a v druhém uvedeme požadovaný výstupní "formát", v našem případě DataFrame
:
df = CSV.read(joinpath("files", "bi-ma1-b232.csv"), DataFrame)
Row | username | test1 | test2 | test3 | second_chance | activity | tests_total | gitlab | assessment | exam_test | date | oral_exam | veto | points_total | mark | tutor | percentil |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
String15 | Float64? | Float64? | Float64? | Float64? | Float64? | Float64 | Float64? | Bool | Int64? | String7? | Float64? | Bool? | Float64? | String1? | String7 | Int64 | |
1 | student001 | 3.0 | 11.5 | 13.5 | missing | 5.0 | 28.0 | missing | true | 17 | 13:15 | 34.0 | missing | 67.0 | D | Irena | 58 |
2 | student002 | 2.0 | 3.0 | 18.5 | 19.5 | 3.0 | 25.0 | missing | true | 18 | 12:45 | 48.0 | missing | 76.0 | C | Irena | 43 |
3 | student003 | missing | missing | missing | missing | missing | 0.0 | missing | false | missing | missing | missing | missing | missing | missing | Ondra | 6 |
4 | student004 | 3.0 | missing | missing | missing | 4.0 | 3.0 | missing | false | missing | missing | missing | missing | missing | missing | Jan V | 13 |
5 | student005 | 3.0 | 18.0 | 17.0 | missing | 4.0 | 38.0 | missing | true | 19 | 17:15 | 56.0 | missing | 98.0 | A | Jarda | 93 |
6 | student006 | 3.0 | 10.5 | 15.0 | missing | missing | 28.5 | missing | true | 15 | 16:30 | 50.0 | missing | 78.5 | C | Jan S | 59 |
7 | student007 | 2.0 | 19.0 | 16.0 | missing | 2.0 | 37.0 | missing | true | 16 | 11:30 | 55.0 | missing | 94.0 | A | Jan V | 91 |
8 | student008 | missing | missing | missing | missing | missing | 0.0 | missing | false | missing | missing | missing | missing | missing | missing | Jitka | 6 |
9 | student009 | 3.0 | 8.0 | 14.0 | missing | 3.0 | 25.0 | missing | true | 18 | 13:15 | 42.0 | missing | 70.0 | C | Irena | 46 |
10 | student010 | 1.0 | 6.0 | 7.0 | missing | missing | 14.0 | missing | false | missing | missing | missing | missing | missing | missing | Jan S | 27 |
11 | student011 | 3.0 | 14.5 | 19.0 | missing | 4.0 | 36.5 | missing | true | 17 | 15:00 | 57.0 | missing | 97.5 | A | Jakub | 89 |
12 | student012 | 3.0 | 18.0 | 15.0 | missing | missing | 36.0 | missing | true | 17 | 11:30 | 54.0 | missing | 90.0 | A | Jan V | 88 |
13 | student013 | 2.0 | missing | missing | missing | missing | 2.0 | missing | false | missing | missing | missing | missing | missing | missing | Irena | 11 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
648 | student648 | 1.0 | 13.5 | 14.5 | missing | missing | 29.0 | missing | true | 15 | 11:00 | 45.0 | missing | 74.0 | C | Jan V | 62 |
649 | student649 | 1.0 | missing | missing | missing | missing | 1.0 | missing | false | missing | missing | missing | missing | missing | missing | Jitka | 9 |
650 | student650 | 2.5 | 1.0 | 1.0 | missing | missing | 4.5 | missing | false | missing | missing | missing | missing | missing | missing | Ondra | 17 |
651 | student651 | 0.0 | 8.0 | 8.0 | missing | 1.0 | 16.0 | missing | false | missing | missing | missing | missing | missing | missing | Jitka | 30 |
652 | student652 | 2.0 | 5.5 | 12.5 | 12.5 | 1.0 | 20.0 | missing | false | missing | missing | missing | missing | missing | missing | Jitka | 35 |
653 | student653 | 3.0 | 8.5 | 17.5 | missing | missing | 29.0 | missing | true | 16 | 14:45 | 55.0 | missing | 84.0 | B | Jitka | 62 |
654 | student654 | 1.0 | 14.0 | 15.0 | missing | missing | 30.0 | missing | true | 16 | 14:30 | 50.0 | missing | 80.0 | B | Jan S | 65 |
655 | student655 | 1.5 | 5.0 | missing | missing | missing | 6.5 | missing | false | missing | missing | missing | missing | missing | missing | Irena | 19 |
656 | student656 | 2.5 | 5.5 | 17.5 | missing | missing | 25.5 | missing | true | 16 | 10:45 | 54.5 | missing | 80.0 | B | Ondra | 49 |
657 | student657 | 1.0 | 15.0 | 6.5 | 16.0 | 1.0 | 25.0 | missing | true | 15 | 14:15 | 42.0 | missing | 68.0 | D | Irena | 41 |
658 | student658 | 1.0 | 15.0 | 17.5 | missing | 0.5 | 33.5 | missing | true | 17 | 13:45 | 56.0 | missing | 90.0 | A | Jarda | 79 |
659 | student659 | missing | missing | missing | missing | missing | 0.0 | missing | false | missing | missing | missing | missing | missing | missing | Jakub | 6 |
K těmto datům se vrátíme podrobněji později. Samotnou tabulku občas použijeme v různých ukázkách.
DataFrame
Z částečného výpisu dat výše vyvstává hned několik otázek:
Pojďme se vydat na průzkum. První, co můžeme zkusit, je podívat se na atributy naší instance (interaktivně pomocí TAB):
df.
ParseError: # Error @ ]8;;file:///home/kalvin/documents/fit/B241-BI-JUL/bi-jul/tutorials/In[4]#1:4\In[4]:1:4]8;;\ df. # └ ── premature end of input Stacktrace: [1] top-level scope @ In[4]:1
df.username
659-element Vector{String15}: "student001" "student002" "student003" "student004" "student005" "student006" "student007" "student008" "student009" "student010" "student011" "student012" "student013" ⋮ "student648" "student649" "student650" "student651" "student652" "student653" "student654" "student655" "student656" "student657" "student658" "student659"
df.assessment
659-element Vector{Bool}: 1 1 0 0 1 1 1 0 1 0 1 1 0 ⋮ 1 0 0 0 0 1 1 0 1 1 1 0
Nebo k tomu můžeme použít metodu names
z modulu DataFrames
:
names(df)
17-element Vector{String}: "username" "test1" "test2" "test3" "second_chance" "activity" "tests_total" "gitlab" "assessment" "exam_test" "date" "oral_exam" "veto" "points_total" "mark" "tutor" "percentil"
Metoda eachcol
vrátí iterátor přes sloupce. Můžeme tak zjistit typ prvků sloupců:
eltype.(eachcol(df))
17-element Vector{Type}: String15 Union{Missing, Float64} Union{Missing, Float64} Union{Missing, Float64} Union{Missing, Float64} Union{Missing, Float64} Float64 Union{Missing, Float64} Bool Union{Missing, Int64} Union{Missing, String7} Union{Missing, Float64} Union{Missing, Bool} Union{Missing, Float64} Union{Missing, String1} String7 Int64
Odtud vidíme, co znamenají otazníky u typů prvků sloupců.
Prvky sloupce jsou složeného typu Union{Missing, T}
, v některých sloupcích mohou hodnoty chybět, obsahují hodnotu missing
.
Stručné informace můžeme získat pomocí metody describe
.
Samozřejmě občas uvedené statistiky nemají moc smysl.
describe(df)
Row | variable | mean | min | median | max | nmissing | eltype |
---|---|---|---|---|---|---|---|
Symbol | Union… | Any | Union… | Any | Int64 | Type | |
1 | username | student001 | student659 | 0 | String15 | ||
2 | test1 | 2.02801 | 0.0 | 2.0 | 3.0 | 70 | Union{Missing, Float64} |
3 | test2 | 11.5804 | 0.0 | 12.0 | 20.0 | 68 | Union{Missing, Float64} |
4 | test3 | 13.5825 | 0.0 | 14.5 | 20.0 | 150 | Union{Missing, Float64} |
5 | second_chance | 12.6964 | 2.5 | 14.0 | 22.5 | 589 | Union{Missing, Float64} |
6 | activity | 2.38564 | 0.0 | 2.0 | 5.0 | 334 | Union{Missing, Float64} |
7 | tests_total | 22.8437 | 0.0 | 26.5 | 43.0 | 0 | Float64 |
8 | gitlab | 0.814815 | 0.5 | 0.5 | 4.0 | 632 | Union{Missing, Float64} |
9 | assessment | 0.616085 | false | 1.0 | true | 0 | Bool |
10 | exam_test | 16.9113 | -1 | 17.0 | 20 | 253 | Union{Missing, Int64} |
11 | date | --- | 17:30 | 254 | Union{Missing, String7} | ||
12 | oral_exam | 47.7257 | 0.0 | 49.0 | 57.0 | 278 | Union{Missing, Float64} |
13 | veto | 1.0 | true | 1.0 | true | 657 | Union{Missing, Bool} |
14 | points_total | 81.5019 | 55.0 | 81.5 | 105.0 | 279 | Union{Missing, Float64} |
15 | mark | A | F | 253 | Union{Missing, String1} | ||
16 | tutor | Irena | Ondra | 0 | String7 | ||
17 | percentil | 50.3612 | 0 | 50.0 | 100 | 0 | Int64 |
Výpis můžeme kontrolovat uvedením konkrétního rozsahu.
describe(df, cols=["test1", "test2", "test3"])
Row | variable | mean | min | median | max | nmissing | eltype |
---|---|---|---|---|---|---|---|
Symbol | Float64 | Float64 | Float64 | Float64 | Int64 | Union | |
1 | test1 | 2.02801 | 0.0 | 2.0 | 3.0 | 70 | Union{Missing, Float64} |
2 | test2 | 11.5804 | 0.0 | 12.0 | 20.0 | 68 | Union{Missing, Float64} |
3 | test3 | 13.5825 | 0.0 | 14.5 | 20.0 | 150 | Union{Missing, Float64} |
DataFrame
je tabulka, máme pro ní k dispozici podobné metody jako pro matice, na které se také lze dívat jako na "tabulky".
size(df)
(659, 17)
nrow(df)
659
ncol(df)
17
DataFrame
lze kopírovat pomocí metody copy
, nebo ho vyprázdnit pomocí metod empty
, resp. empty!
.
ex1 = DataFrame(rand(3, 2), :auto)
ex2 = copy(ex1)
Row | x1 | x2 |
---|---|---|
Float64 | Float64 | |
1 | 0.781833 | 0.0730139 |
2 | 0.689409 | 0.292322 |
3 | 0.356713 | 0.156782 |
ex1
Row | x1 | x2 |
---|---|---|
Float64 | Float64 | |
1 | 0.781833 | 0.0730139 |
2 | 0.689409 | 0.292322 |
3 | 0.356713 | 0.156782 |
empty!(ex2)
empty(ex1)
Row | x1 | x2 |
---|---|---|
Float64 | Float64 |
ex1
Row | x1 | x2 |
---|---|---|
Float64 | Float64 | |
1 | 0.781833 | 0.0730139 |
2 | 0.689409 | 0.292322 |
3 | 0.356713 | 0.156782 |
ex2
Row | x1 | x2 |
---|---|---|
Float64 | Float64 |
To by bylo pro začátek vše k základním vlastnostem. Pojďme nyní tabulky modifikovat, upravovat.
Indexování vychází z maticového zápisu a má podobné vlastnosti. Ke sloupcům můžeme přistupovat několika způsoby:
df.mark
, df."mark"
df[!, :mark]
, df[!, "mark"]
df[:, :mark]
, df[:, "mark"]
Tímto způsobem můžeme data číst, ale i je modifikovat.
df.mark
659-element PooledArrays.PooledVector{Union{Missing, String1}, UInt32, Vector{UInt32}}: "D" "C" missing missing "A" "C" "A" missing "C" missing "A" "A" missing ⋮ "C" missing missing missing missing "B" "B" missing "B" "D" "A" missing
df.mark === df[!, :mark]
true
df.mark === df[:, :mark] # kopie!
false
df.mark == df[!, :mark]
missing
df.mark == df[:, :mark]
missing
missing == missing
missing
missing == true
missing
missing == false
missing
Vedle toho nemusíme používat název sloupce, v pořádku je použít i pořadí. Například prvních deset řádků v druhém a třetím sloupci:
df[1:10, ["test1", "test2", "test3"]]
Row | test1 | test2 | test3 |
---|---|---|---|
Float64? | Float64? | Float64? | |
1 | 3.0 | 11.5 | 13.5 |
2 | 2.0 | 3.0 | 18.5 |
3 | missing | missing | missing |
4 | 3.0 | missing | missing |
5 | 3.0 | 18.0 | 17.0 |
6 | 3.0 | 10.5 | 15.0 |
7 | 2.0 | 19.0 | 16.0 |
8 | missing | missing | missing |
9 | 3.0 | 8.0 | 14.0 |
10 | 1.0 | 6.0 | 7.0 |
Not
, Between
, Cols
, All
Dále můžeme vybírat pouze potřebné sloupce.
ex = DataFrame(rand(3, 5), :auto)
Row | x1 | x2 | x3 | x4 | x5 |
---|---|---|---|---|---|
Float64 | Float64 | Float64 | Float64 | Float64 | |
1 | 0.6686 | 0.53519 | 0.180508 | 0.815861 | 0.49856 |
2 | 0.676451 | 0.548124 | 0.644572 | 0.724846 | 0.151916 |
3 | 0.683903 | 0.441108 | 0.745621 | 0.340693 | 0.95443 |
ex[:, Not(:x2)]
Row | x1 | x3 | x4 | x5 |
---|---|---|---|---|
Float64 | Float64 | Float64 | Float64 | |
1 | 0.6686 | 0.180508 | 0.815861 | 0.49856 |
2 | 0.676451 | 0.644572 | 0.724846 | 0.151916 |
3 | 0.683903 | 0.745621 | 0.340693 | 0.95443 |
Not(:x2)
InvertedIndex{Symbol}(:x2)
ex[:, Not([:x2, :x3])]
Row | x1 | x4 | x5 |
---|---|---|---|
Float64 | Float64 | Float64 | |
1 | 0.6686 | 0.815861 | 0.49856 |
2 | 0.676451 | 0.724846 | 0.151916 |
3 | 0.683903 | 0.340693 | 0.95443 |
ex[:, Between(:x1, :x3)]
Row | x1 | x2 | x3 |
---|---|---|---|
Float64 | Float64 | Float64 | |
1 | 0.6686 | 0.53519 | 0.180508 |
2 | 0.676451 | 0.548124 | 0.644572 |
3 | 0.683903 | 0.441108 | 0.745621 |
ex[:, Cols(:x2, :x4)]
Row | x2 | x4 |
---|---|---|
Float64 | Float64 | |
1 | 0.53519 | 0.815861 |
2 | 0.548124 | 0.724846 |
3 | 0.441108 | 0.340693 |
ex[:, All()]
Row | x1 | x2 | x3 | x4 | x5 |
---|---|---|---|---|---|
Float64 | Float64 | Float64 | Float64 | Float64 | |
1 | 0.6686 | 0.53519 | 0.180508 | 0.815861 | 0.49856 |
2 | 0.676451 | 0.548124 | 0.644572 | 0.724846 | 0.151916 |
3 | 0.683903 | 0.441108 | 0.745621 | 0.340693 | 0.95443 |
Výše uvedené ukázky využívají funkcionalitu z DataFrames.jl
balíčku.
"Staré maticové" postupy samozřejmě fungují také, např.:
ex[:, [:x2, :x3]]
Row | x2 | x3 |
---|---|---|
Float64 | Float64 | |
1 | 0.53519 | 0.180508 |
2 | 0.548124 | 0.644572 |
3 | 0.441108 | 0.745621 |
Údaje můžeme měnit několika způsoby. Vezměme si jednoduchou tabulku:
m = [ "α" 1 0.5; "β" 2 1.5; "γ" 3 2.5]
3×3 Matrix{Any}: "α" 1 0.5 "β" 2 1.5 "γ" 3 2.5
ex = DataFrame(m, ["a", "b", "c"])
Row | a | b | c |
---|---|---|---|
Any | Any | Any | |
1 | α | 1 | 0.5 |
2 | β | 2 | 1.5 |
3 | γ | 3 | 2.5 |
Změna jedné položky ( je velké ):
ex[1, :a] = "Ξ"
ex
Row | a | b | c |
---|---|---|---|
Any | Any | Any | |
1 | Ξ | 1 | 0.5 |
2 | β | 2 | 1.5 |
3 | γ | 3 | 2.5 |
Přepsání všech hodnot ve sloupci c
:
ex[:, :c] = ["tau", "pi", "omega"]
3-element Vector{String}: "tau" "pi" "omega"
ex
Row | a | b | c |
---|---|---|---|
Any | Any | Any | |
1 | Ξ | 1 | tau |
2 | β | 2 | pi |
3 | γ | 3 | omega |
Přepsání jednou hodnotou.
ex[:, :c] .= "ř"
3-element view(::Vector{Any}, :) with eltype Any: "ř" "ř" "ř"
ex
Row | a | b | c |
---|---|---|---|
Any | Any | Any | |
1 | Ξ | 1 | ř |
2 | β | 2 | ř |
3 | γ | 3 | ř |
z = ex[:, :c]
3-element Vector{Any}: "ř" "ř" "ř"
z[2] = "pí"
"pí"
z
3-element Vector{Any}: "ř" "pí" "ř"
ex
Row | a | b | c |
---|---|---|---|
Any | Any | Any | |
1 | Ξ | 1 | ř |
2 | β | 2 | ř |
3 | γ | 3 | ř |
z = ex[!, :c]
3-element Vector{Any}: "ř" "ř" "ř"
z[2] = "¿"
"¿"
ex
Row | a | b | c |
---|---|---|---|
Any | Any | Any | |
1 | Ξ | 1 | ř |
2 | β | 2 | ¿ |
3 | γ | 3 | ř |
ex.c = [true, false, missing]
ex
Row | a | b | c |
---|---|---|---|
Any | Any | Bool? | |
1 | Ξ | 1 | true |
2 | β | 2 | false |
3 | γ | 3 | missing |
Přepsání hodnot ve sloupci a
danou hodnotou, ale pouze v řádcích, kde je hodnota ve sloupci b
lichá (zde opět používáme indexování bitovým vektorem, nebo "maskování", s kterým jsme se setkali dříve během semestru):
isodd.(ex.b)
3-element BitVector: 1 0 1
ex[isodd.(ex.b), :a] .= "λ"
ex
Row | a | b | c |
---|---|---|---|
Any | Any | Bool? | |
1 | λ | 1 | true |
2 | β | 2 | false |
3 | λ | 3 | missing |
ex[iseven.(ex.b), :]
Row | a | b | c |
---|---|---|---|
Any | Any | Bool? | |
1 | β | 2 | false |
Občas je nutné projít tabulku řádek po řádku.
V tom případě můžeme iterovat přes index.
Například se pokusme změnit první sloupec na řetězec obsahující daný symbol se spodním indexem daným číslem ve sloupci b
v LaTeX notaci (tj. λ_1
atd.).
for row_id = axes(ex, 1) # 1:nrow(df)
ex[row_id, :a] = ex[row_id, :a] * "_" * string(ex[row_id, :b])
end
ex
Row | a | b | c |
---|---|---|---|
Any | Any | Bool? | |
1 | λ_1 | 1 | true |
2 | β_2 | 2 | false |
3 | λ_3 | 3 | missing |
for row in eachrow(ex)
println(row)
println(row[:a])
end
DataFrameRow Row │ a b c │ Any Any Bool? ─────┼───────────────── 1 │ λ_1 1 true λ_1 DataFrameRow Row │ a b c │ Any Any Bool? ─────┼───────────────── 2 │ β_2 2 false β_2 DataFrameRow Row │ a b c │ Any Any Bool? ─────┼─────────────────── 3 │ λ_3 3 missing λ_3
show
, first
, last
, view
)Pokud je DataFrame
příliš velký, tak při jeho zobrazení dojde k ořezání (řádků i sloupců). Toto chování můžeme přebít pomocí parametrů metody show
.
U našeho BI-ZMA příkladu je to lehce overkill.
show(df)
659×17 DataFrame Row │ username test1 test2 test3 second_chance activity ⋯ │ String15 Float64? Float64? Float64? Float64? Float64? ⋯ ─────┼────────────────────────────────────────────────────────────────────────── 1 │ student001 3.0 11.5 13.5 missing 5.0 ⋯ 2 │ student002 2.0 3.0 18.5 19.5 3.0 3 │ student003 missing missing missing missing missing 4 │ student004 3.0 missing missing missing 4.0 5 │ student005 3.0 18.0 17.0 missing 4.0 ⋯ 6 │ student006 3.0 10.5 15.0 missing missing 7 │ student007 2.0 19.0 16.0 missing 2.0 8 │ student008 missing missing missing missing missing 9 │ student009 3.0 8.0 14.0 missing 3.0 ⋯ 10 │ student010 1.0 6.0 7.0 missing missing 11 │ student011 3.0 14.5 19.0 missing 4.0 ⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱ 650 │ student650 2.5 1.0 1.0 missing missing 651 │ student651 0.0 8.0 8.0 missing 1.0 ⋯ 652 │ student652 2.0 5.5 12.5 12.5 1.0 653 │ student653 3.0 8.5 17.5 missing missing 654 │ student654 1.0 14.0 15.0 missing missing 655 │ student655 1.5 5.0 missing missing missing ⋯ 656 │ student656 2.5 5.5 17.5 missing missing 657 │ student657 1.0 15.0 6.5 16.0 1.0 658 │ student658 1.0 15.0 17.5 missing 0.5 659 │ student659 missing missing missing missing missing ⋯ 11 columns and 638 rows omitted
show(df, allcols=true)
659×17 DataFrame Row │ username test1 test2 test3 second_chance activity tests_total gitlab assessment exam_test date oral_exam veto points_total mark tutor percentil │ String15 Float64? Float64? Float64? Float64? Float64? Float64 Float64? Bool Int64? String7? Float64? Bool? Float64? String1? String7 Int64 ─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 1 │ student001 3.0 11.5 13.5 missing 5.0 28.0 missing true 17 13:15 34.0 missing 67.0 D Irena 58 2 │ student002 2.0 3.0 18.5 19.5 3.0 25.0 missing true 18 12:45 48.0 missing 76.0 C Irena 43 3 │ student003 missing missing missing missing missing 0.0 missing false missing missing missing missing missing missing Ondra 6 4 │ student004 3.0 missing missing missing 4.0 3.0 missing false missing missing missing missing missing missing Jan V 13 5 │ student005 3.0 18.0 17.0 missing 4.0 38.0 missing true 19 17:15 56.0 missing 98.0 A Jarda 93 6 │ student006 3.0 10.5 15.0 missing missing 28.5 missing true 15 16:30 50.0 missing 78.5 C Jan S 59 7 │ student007 2.0 19.0 16.0 missing 2.0 37.0 missing true 16 11:30 55.0 missing 94.0 A Jan V 91 8 │ student008 missing missing missing missing missing 0.0 missing false missing missing missing missing missing missing Jitka 6 9 │ student009 3.0 8.0 14.0 missing 3.0 25.0 missing true 18 13:15 42.0 missing 70.0 C Irena 46 10 │ student010 1.0 6.0 7.0 missing missing 14.0 missing false missing missing missing missing missing missing Jan S 27 11 │ student011 3.0 14.5 19.0 missing 4.0 36.5 missing true 17 15:00 57.0 missing 97.5 A Jakub 89 ⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ 650 │ student650 2.5 1.0 1.0 missing missing 4.5 missing false missing missing missing missing missing missing Ondra 17 651 │ student651 0.0 8.0 8.0 missing 1.0 16.0 missing false missing missing missing missing missing missing Jitka 30 652 │ student652 2.0 5.5 12.5 12.5 1.0 20.0 missing false missing missing missing missing missing missing Jitka 35 653 │ student653 3.0 8.5 17.5 missing missing 29.0 missing true 16 14:45 55.0 missing 84.0 B Jitka 62 654 │ student654 1.0 14.0 15.0 missing missing 30.0 missing true 16 14:30 50.0 missing 80.0 B Jan S 65 655 │ student655 1.5 5.0 missing missing missing 6.5 missing false missing missing missing missing missing missing Irena 19 656 │ student656 2.5 5.5 17.5 missing missing 25.5 missing true 16 10:45 54.5 missing 80.0 B Ondra 49 657 │ student657 1.0 15.0 6.5 16.0 1.0 25.0 missing true 15 14:15 42.0 missing 68.0 D Irena 41 658 │ student658 1.0 15.0 17.5 missing 0.5 33.5 missing true 17 13:45 56.0 missing 90.0 A Jarda 79 659 │ student659 missing missing missing missing missing 0.0 missing false missing missing missing missing missing missing Jakub 6 638 rows omitted
Toto by byl ještě větší overkill.
# show(df, allrows=true)
Případně můžeme prozkoumávat začátek a konec tabulky.
first(df, 10)
Row | username | test1 | test2 | test3 | second_chance | activity | tests_total | gitlab | assessment | exam_test | date | oral_exam | veto | points_total | mark | tutor | percentil |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
String15 | Float64? | Float64? | Float64? | Float64? | Float64? | Float64 | Float64? | Bool | Int64? | String7? | Float64? | Bool? | Float64? | String1? | String7 | Int64 | |
1 | student001 | 3.0 | 11.5 | 13.5 | missing | 5.0 | 28.0 | missing | true | 17 | 13:15 | 34.0 | missing | 67.0 | D | Irena | 58 |
2 | student002 | 2.0 | 3.0 | 18.5 | 19.5 | 3.0 | 25.0 | missing | true | 18 | 12:45 | 48.0 | missing | 76.0 | C | Irena | 43 |
3 | student003 | missing | missing | missing | missing | missing | 0.0 | missing | false | missing | missing | missing | missing | missing | missing | Ondra | 6 |
4 | student004 | 3.0 | missing | missing | missing | 4.0 | 3.0 | missing | false | missing | missing | missing | missing | missing | missing | Jan V | 13 |
5 | student005 | 3.0 | 18.0 | 17.0 | missing | 4.0 | 38.0 | missing | true | 19 | 17:15 | 56.0 | missing | 98.0 | A | Jarda | 93 |
6 | student006 | 3.0 | 10.5 | 15.0 | missing | missing | 28.5 | missing | true | 15 | 16:30 | 50.0 | missing | 78.5 | C | Jan S | 59 |
7 | student007 | 2.0 | 19.0 | 16.0 | missing | 2.0 | 37.0 | missing | true | 16 | 11:30 | 55.0 | missing | 94.0 | A | Jan V | 91 |
8 | student008 | missing | missing | missing | missing | missing | 0.0 | missing | false | missing | missing | missing | missing | missing | missing | Jitka | 6 |
9 | student009 | 3.0 | 8.0 | 14.0 | missing | 3.0 | 25.0 | missing | true | 18 | 13:15 | 42.0 | missing | 70.0 | C | Irena | 46 |
10 | student010 | 1.0 | 6.0 | 7.0 | missing | missing | 14.0 | missing | false | missing | missing | missing | missing | missing | missing | Jan S | 27 |
last(df, 10)
Row | username | test1 | test2 | test3 | second_chance | activity | tests_total | gitlab | assessment | exam_test | date | oral_exam | veto | points_total | mark | tutor | percentil |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
String15 | Float64? | Float64? | Float64? | Float64? | Float64? | Float64 | Float64? | Bool | Int64? | String7? | Float64? | Bool? | Float64? | String1? | String7 | Int64 | |
1 | student650 | 2.5 | 1.0 | 1.0 | missing | missing | 4.5 | missing | false | missing | missing | missing | missing | missing | missing | Ondra | 17 |
2 | student651 | 0.0 | 8.0 | 8.0 | missing | 1.0 | 16.0 | missing | false | missing | missing | missing | missing | missing | missing | Jitka | 30 |
3 | student652 | 2.0 | 5.5 | 12.5 | 12.5 | 1.0 | 20.0 | missing | false | missing | missing | missing | missing | missing | missing | Jitka | 35 |
4 | student653 | 3.0 | 8.5 | 17.5 | missing | missing | 29.0 | missing | true | 16 | 14:45 | 55.0 | missing | 84.0 | B | Jitka | 62 |
5 | student654 | 1.0 | 14.0 | 15.0 | missing | missing | 30.0 | missing | true | 16 | 14:30 | 50.0 | missing | 80.0 | B | Jan S | 65 |
6 | student655 | 1.5 | 5.0 | missing | missing | missing | 6.5 | missing | false | missing | missing | missing | missing | missing | missing | Irena | 19 |
7 | student656 | 2.5 | 5.5 | 17.5 | missing | missing | 25.5 | missing | true | 16 | 10:45 | 54.5 | missing | 80.0 | B | Ondra | 49 |
8 | student657 | 1.0 | 15.0 | 6.5 | 16.0 | 1.0 | 25.0 | missing | true | 15 | 14:15 | 42.0 | missing | 68.0 | D | Irena | 41 |
9 | student658 | 1.0 | 15.0 | 17.5 | missing | 0.5 | 33.5 | missing | true | 17 | 13:45 | 56.0 | missing | 90.0 | A | Jarda | 79 |
10 | student659 | missing | missing | missing | missing | missing | 0.0 | missing | false | missing | missing | missing | missing | missing | missing | Jakub | 6 |
Pro rychlé prozkoumávání částí tabulky slouží metoda view
, resp. makro @view
, která jen část tabulky zobrazí.
Nevytváří nový objekt, měla by být efektivnější.
view(df, 100:150, [:username, :points_total, :mark])
Row | username | points_total | mark |
---|---|---|---|
String15 | Float64? | String1? | |
1 | student100 | 91.5 | A |
2 | student101 | 80.5 | B |
3 | student102 | missing | missing |
4 | student103 | 73.0 | C |
5 | student104 | 75.5 | C |
6 | student105 | missing | missing |
7 | student106 | 81.5 | B |
8 | student107 | 76.0 | C |
9 | student108 | missing | missing |
10 | student109 | 76.5 | C |
11 | student110 | missing | missing |
12 | student111 | 57.0 | E |
13 | student112 | missing | missing |
⋮ | ⋮ | ⋮ | ⋮ |
40 | student139 | 89.0 | B |
41 | student140 | 74.0 | C |
42 | student141 | 90.0 | A |
43 | student142 | 64.5 | D |
44 | student143 | missing | missing |
45 | student144 | missing | missing |
46 | student145 | 78.0 | C |
47 | student146 | missing | missing |
48 | student147 | 80.5 | B |
49 | student148 | missing | F |
50 | student149 | missing | missing |
51 | student150 | 100.5 | A |
@view df[10:15, [:username, :mark]]
Row | username | mark |
---|---|---|
String15 | String1? | |
1 | student010 | missing |
2 | student011 | A |
3 | student012 | A |
4 | student013 | missing |
5 | student014 | missing |
6 | student015 | D |
K přidávání sloupců slouží metoda insertcols!
:
ex = DataFrame(rand(2, 2), :auto)
Row | x1 | x2 |
---|---|---|
Float64 | Float64 | |
1 | 0.894499 | 0.176588 |
2 | 0.0582584 | 0.709324 |
insertcols!(ex, 1, :a => [pi, pi])
Row | a | x1 | x2 |
---|---|---|---|
Irration… | Float64 | Float64 | |
1 | π | 0.894499 | 0.176588 |
2 | π | 0.0582584 | 0.709324 |
insertcols!(ex, 1, :a2 => [pi, -pi])
Row | a2 | a | x1 | x2 |
---|---|---|---|---|
Float64 | Irration… | Float64 | Float64 | |
1 | 3.14159 | π | 0.894499 | 0.176588 |
2 | -3.14159 | π | 0.0582584 | 0.709324 |
insertcols!(ex, 3, :b => "⊕")
Row | a2 | a | b | x1 | x2 |
---|---|---|---|---|---|
Float64 | Irration… | String | Float64 | Float64 | |
1 | 3.14159 | π | ⊕ | 0.894499 | 0.176588 |
2 | -3.14159 | π | ⊕ | 0.0582584 | 0.709324 |
insertcols!(ex, 5, :c => [1, 2])
Row | a2 | a | b | x1 | c | x2 |
---|---|---|---|---|---|---|
Float64 | Irration… | String | Float64 | Int64 | Float64 | |
1 | 3.14159 | π | ⊕ | 0.894499 | 1 | 0.176588 |
2 | -3.14159 | π | ⊕ | 0.0582584 | 2 | 0.709324 |
Ale lze použít i prosté indexování:
ex[:, :d] = [42, 42]
ex
Row | a2 | a | b | x1 | c | x2 | d |
---|---|---|---|---|---|---|---|
Float64 | Irration… | String | Float64 | Int64 | Float64 | Int64 | |
1 | 3.14159 | π | ⊕ | 0.894499 | 1 | 0.176588 | 42 |
2 | -3.14159 | π | ⊕ | 0.0582584 | 2 | 0.709324 | 42 |
Mazání sloupců musíme provést pomocí indexace, neexistuje metoda "dropcolumns!
".
ex[:, Not(:b)]
Row | a2 | a | x1 | c | x2 | d |
---|---|---|---|---|---|---|
Float64 | Irration… | Float64 | Int64 | Float64 | Int64 | |
1 | 3.14159 | π | 0.894499 | 1 | 0.176588 | 42 |
2 | -3.14159 | π | 0.0582584 | 2 | 0.709324 | 42 |
Vytvořme si zase testovací tabulku:
ex = DataFrame(rand(2, 2), :auto)
Row | x1 | x2 |
---|---|---|
Float64 | Float64 | |
1 | 0.0686912 | 0.786712 |
2 | 0.263486 | 0.322541 |
Přidat řádek lze opět několika způsoby. Nejpřirozenější je asi metoda push!
:
push!(ex, [1.0, 2.0])
Row | x1 | x2 |
---|---|---|
Float64 | Float64 | |
1 | 0.0686912 | 0.786712 |
2 | 0.263486 | 0.322541 |
3 | 1.0 | 2.0 |
push!(ex, [1, 2])
Row | x1 | x2 |
---|---|---|
Float64 | Float64 | |
1 | 0.0686912 | 0.786712 |
2 | 0.263486 | 0.322541 |
3 | 1.0 | 2.0 |
4 | 1.0 | 2.0 |
push!(ex, (0.5, 2.3))
Row | x1 | x2 |
---|---|---|
Float64 | Float64 | |
1 | 0.0686912 | 0.786712 |
2 | 0.263486 | 0.322541 |
3 | 1.0 | 2.0 |
4 | 1.0 | 2.0 |
5 | 0.5 | 2.3 |
push!(ex, ["a", "b"])
┌ Error: Error adding value to column :x1. Maybe you forgot passing `promote=true`? └ @ DataFrames ~/.julia/packages/DataFrames/kcA9R/src/dataframe/insertion.jl:810
MethodError: Cannot `convert` an object of type String to an object of type Float64 The function `convert` exists, but no method is defined for this combination of argument types. Closest candidates are: convert(::Type{T}, ::T) where T<:Number @ Base number.jl:6 convert(::Type{T}, ::CartesianIndex{1}) where T<:Number @ Base multidimensional.jl:136 convert(::Type{T}, ::AbstractChar) where T<:Number @ Base char.jl:185 ... Stacktrace: [1] push!(a::Vector{Float64}, item::String) @ Base ./array.jl:1249 [2] _row_inserter!(df::DataFrame, loc::Int64, row::Vector{String}, mode::Val{:push}, promote::Bool) @ DataFrames ~/.julia/packages/DataFrames/kcA9R/src/dataframe/insertion.jl:776 [3] #push!#339 @ ~/.julia/packages/DataFrames/kcA9R/src/dataframe/insertion.jl:545 [inlined] [4] push!(df::DataFrame, row::Vector{String}) @ DataFrames ~/.julia/packages/DataFrames/kcA9R/src/dataframe/insertion.jl:538 [5] top-level scope @ In[98]:1
push!(ex, (x2 = 0.5, x1 = 0.3))
Row | x1 | x2 |
---|---|---|
Float64 | Float64 | |
1 | 0.0686912 | 0.786712 |
2 | 0.263486 | 0.322541 |
3 | 1.0 | 2.0 |
4 | 1.0 | 2.0 |
5 | 0.5 | 2.3 |
6 | 0.3 | 0.5 |
push!(ex, (x2 = 7, x1 = 5))
Row | x1 | x2 |
---|---|---|
Float64 | Float64 | |
1 | 0.0686912 | 0.786712 |
2 | 0.263486 | 0.322541 |
3 | 1.0 | 2.0 |
4 | 1.0 | 2.0 |
5 | 0.5 | 2.3 |
6 | 0.3 | 0.5 |
7 | 5.0 | 7.0 |
Pomocí append!
můžeme spojovat několik tabulek "vertikálně" dohromady:
append!(ex, DataFrame(rand(10, 2), :auto))
Row | x1 | x2 |
---|---|---|
Float64 | Float64 | |
1 | 0.0686912 | 0.786712 |
2 | 0.263486 | 0.322541 |
3 | 1.0 | 2.0 |
4 | 1.0 | 2.0 |
5 | 0.5 | 2.3 |
6 | 0.3 | 0.5 |
7 | 5.0 | 7.0 |
8 | 0.320972 | 0.87072 |
9 | 0.110998 | 0.615922 |
10 | 0.113922 | 0.459498 |
11 | 0.603263 | 0.847671 |
12 | 0.347329 | 0.584542 |
13 | 0.505099 | 0.781451 |
14 | 0.466649 | 0.651833 |
15 | 0.208853 | 0.79092 |
16 | 0.544047 | 0.636684 |
17 | 0.590173 | 0.677212 |
Mazat řádky můžeme explicitně pomocí delete!
:
delete!(ex, 3)
Row | x1 | x2 |
---|---|---|
Float64 | Float64 | |
1 | 0.0686912 | 0.786712 |
2 | 0.263486 | 0.322541 |
3 | 1.0 | 2.0 |
4 | 0.5 | 2.3 |
5 | 0.3 | 0.5 |
6 | 5.0 | 7.0 |
7 | 0.320972 | 0.87072 |
8 | 0.110998 | 0.615922 |
9 | 0.113922 | 0.459498 |
10 | 0.603263 | 0.847671 |
11 | 0.347329 | 0.584542 |
12 | 0.505099 | 0.781451 |
13 | 0.466649 | 0.651833 |
14 | 0.208853 | 0.79092 |
15 | 0.544047 | 0.636684 |
16 | 0.590173 | 0.677212 |
delete!(ex, [1, 2, 3])
Row | x1 | x2 |
---|---|---|
Float64 | Float64 | |
1 | 0.5 | 2.3 |
2 | 0.3 | 0.5 |
3 | 5.0 | 7.0 |
4 | 0.320972 | 0.87072 |
5 | 0.110998 | 0.615922 |
6 | 0.113922 | 0.459498 |
7 | 0.603263 | 0.847671 |
8 | 0.347329 | 0.584542 |
9 | 0.505099 | 0.781451 |
10 | 0.466649 | 0.651833 |
11 | 0.208853 | 0.79092 |
12 | 0.544047 | 0.636684 |
13 | 0.590173 | 0.677212 |
ex.x1 .> 0.5
13-element BitVector: 0 0 1 0 0 0 1 0 1 0 0 1 1
delete!(ex, ex.x1 .> 0.5)
Row | x1 | x2 |
---|---|---|
Float64 | Float64 | |
1 | 0.5 | 2.3 |
2 | 0.3 | 0.5 |
3 | 0.320972 | 0.87072 |
4 | 0.110998 | 0.615922 |
5 | 0.113922 | 0.459498 |
6 | 0.347329 | 0.584542 |
7 | 0.466649 | 0.651833 |
8 | 0.208853 | 0.79092 |
Výše jsme shrnuli zřejmě ty nejužitečnější metody a techniky.
Tím ale možnosti DataFrames.jl
zdaleka nekončí.
Zvídavému čtenáři doporučujeme prolétnout dokumentaci.
Na tomto místě snad jen upozorněme na následující:
Query.jl
umožňující pracovat s různými datovými zdroji.DataFramesMeta.jl
poskytující užitečná makra pro práci s tabulkami.Velmi sofistikované transformace dat v tabulkách lze také provádět pomocí následujících metod:
combine
select
/select!
transform
/transform!
Jejich podrobný výklad je už nad rámec tohoto kurzu.
Pojďme se podrobněji podívat na data z předmětu BI-MA1.
Nejprve znovu načteme data z CSV souboru, který je syrovým exportem z Grades, kde jsou anonymizováni uživatelská jména studentů.
using CSV
df1 = CSV.read(joinpath("files", "bi-ma1-b212.csv"), DataFrame)
df2 = CSV.read(joinpath("files", "bi-ma1-b222.csv"), DataFrame)
df3 = CSV.read(joinpath("files", "bi-ma1-b232.csv"), DataFrame);
describe(df1)
Row | variable | mean | min | median | max | nmissing | eltype |
---|---|---|---|---|---|---|---|
Symbol | Union… | Any | Union… | Any | Int64 | Type | |
1 | username | student001 | student528 | 0 | String15 | ||
2 | test1 | 8.45199 | 0.0 | 8.0 | 20.0 | 101 | Union{Missing, Float64} |
3 | test2 | 13.3188 | 0.0 | 14.0 | 20.0 | 219 | Union{Missing, Float64} |
4 | second_chance | 15.0588 | 0.0 | 15.0 | 23.5 | 460 | Union{Missing, Float64} |
5 | activity | 1.95629 | 0.0 | 2.0 | 5.0 | 242 | Union{Missing, Float64} |
6 | tests_total | 14.928 | 0.0 | 15.5 | 39.5 | 0 | Float64 |
7 | gitlab | 1.30769 | 0.0 | 1.0 | 7.0 | 489 | Union{Missing, Float64} |
8 | assessment | 0.395833 | false | 0.0 | true | 0 | Bool |
9 | exam_test | 16.7081 | 0 | 17.0 | 20 | 319 | Union{Missing, Int64} |
10 | date | --- | 16:50 | 319 | Union{Missing, String7} | ||
11 | oral_exam | 48.3026 | 28.0 | 50.0 | 60.0 | 338 | Union{Missing, Float64} |
12 | veto | 1.0 | true | 1.0 | true | 525 | Union{Missing, Bool} |
13 | points_total | 79.6217 | 55.0 | 80.0 | 103.5 | 339 | Union{Missing, Float64} |
14 | mark | A | F | 319 | Union{Missing, String1} | ||
15 | tutor | Honza | Tomáš | 8 | Union{Missing, String7} | ||
16 | percentil | 53.1136 | 23 | 50.0 | 100 | 0 | Int64 |
size(df1)
(528, 16)
nrow(df1)
528
describe(df2)
Row | variable | mean | min | median | max | nmissing | eltype |
---|---|---|---|---|---|---|---|
Symbol | Union… | Any | Union… | Any | Int64 | Type | |
1 | username | student0001 | student0760 | 0 | String15 | ||
2 | test1 | 1.69818 | 0.0 | 1.5 | 3.0 | 99 | Union{Missing, Float64} |
3 | test2 | 10.6672 | 0.0 | 11.0 | 20.0 | 132 | Union{Missing, Float64} |
4 | test3 | 12.6893 | 0.0 | 13.5 | 20.0 | 245 | Union{Missing, Float64} |
5 | second_chance | 14.5402 | 5.0 | 15.0 | 22.0 | 673 | Union{Missing, Float64} |
6 | activity | 2.27454 | -6.0 | 2.0 | 5.0 | 383 | Union{Missing, Float64} |
7 | tests_total | 19.1148 | 0.0 | 24.5 | 43.0 | 2 | Union{Missing, Float64} |
8 | gitlab | 2.23684 | 1 | 1.0 | 12 | 722 | Union{Missing, Int64} |
9 | assessment | 0.497361 | false | 0.0 | true | 2 | Union{Missing, Bool} |
10 | exam_test | 16.4377 | -1 | 17.0 | 20 | 383 | Union{Missing, Int64} |
11 | date | --- | CT 14:30 | 383 | Union{Missing, String15} | ||
12 | oral_exam | 46.8988 | 20.0 | 48.0 | 57.0 | 419 | Union{Missing, Float64} |
13 | veto | 1.0 | true | 1.0 | true | 757 | Union{Missing, Bool} |
14 | points_total | 80.3235 | 54.5 | 80.5 | 105.0 | 420 | Union{Missing, Float64} |
15 | mark | A | F | 383 | Union{Missing, String1} | ||
16 | tutor | Irena | Petr | 2 | Union{Missing, String7} | ||
17 | percentil | 50.9513 | 12 | 50.0 | 100 | 0 | Int64 |
size(df2)
(760, 17)
nrow(df2)
760
describe(df3)
Row | variable | mean | min | median | max | nmissing | eltype |
---|---|---|---|---|---|---|---|
Symbol | Union… | Any | Union… | Any | Int64 | Type | |
1 | username | student001 | student659 | 0 | String15 | ||
2 | test1 | 2.02801 | 0.0 | 2.0 | 3.0 | 70 | Union{Missing, Float64} |
3 | test2 | 11.5804 | 0.0 | 12.0 | 20.0 | 68 | Union{Missing, Float64} |
4 | test3 | 13.5825 | 0.0 | 14.5 | 20.0 | 150 | Union{Missing, Float64} |
5 | second_chance | 12.6964 | 2.5 | 14.0 | 22.5 | 589 | Union{Missing, Float64} |
6 | activity | 2.38564 | 0.0 | 2.0 | 5.0 | 334 | Union{Missing, Float64} |
7 | tests_total | 22.8437 | 0.0 | 26.5 | 43.0 | 0 | Float64 |
8 | gitlab | 0.814815 | 0.5 | 0.5 | 4.0 | 632 | Union{Missing, Float64} |
9 | assessment | 0.616085 | false | 1.0 | true | 0 | Bool |
10 | exam_test | 16.9113 | -1 | 17.0 | 20 | 253 | Union{Missing, Int64} |
11 | date | --- | 17:30 | 254 | Union{Missing, String7} | ||
12 | oral_exam | 47.7257 | 0.0 | 49.0 | 57.0 | 278 | Union{Missing, Float64} |
13 | veto | 1.0 | true | 1.0 | true | 657 | Union{Missing, Bool} |
14 | points_total | 81.5019 | 55.0 | 81.5 | 105.0 | 279 | Union{Missing, Float64} |
15 | mark | A | F | 253 | Union{Missing, String1} | ||
16 | tutor | Irena | Ondra | 0 | String7 | ||
17 | percentil | 50.3612 | 0 | 50.0 | 100 | 0 | Int64 |
size(df3)
(659, 17)
Nejprve prozkoumejte dostupné sloupce a pokuste se zjistit základní údaje jako
Počet studentů.
nrow(df1)
528
nrow(df2)
760
nrow(df3)
659
Kolik studentů má zápočet?
sum(df1.assessment)
209
length(filter(isone, df1.assessment))
209
sum(df2.assessment)
missing
sum(skipmissing(df2.assessment))
377
sum(df2[(!ismissing).(df2.assessment), :assessment])
377
filter(x -> !ismissing(x), df2.assessment)
758-element Vector{Union{Missing, Bool}}: 1 1 0 0 0 0 1 1 1 1 0 1 0 ⋮ 1 0 0 1 1 1 1 0 1 1 0 1
filter(x -> !ismissing(x), df2.assessment) |> sum
377
sum(dropmissing(df2, :assessment).assessment)
377
coalesce.(df2.assessment, false) |> sum
377
sum(df3.assessment)
406
Tj. průchodnost zápočtu.
sum(df1.assessment) / nrow(df1)
0.3958333333333333
sum(coalesce.(df2.assessment, 0)) / nrow(df2)
0.49605263157894736
sum(df3.assessment) / nrow(df3)
0.6160849772382397
Počet vet v tabulce.
sum(df1.veto)
missing
missing + 1 # aha! pozor na missing hodnoty
missing
sum(dropmissing(df1[:, Cols(:veto)]).veto)
3
sum(filter(x -> !ismissing(x), df1.veto))
3
sum(dropmissing(df1, :veto).veto)
3
sum(dropmissing(df2, :veto).veto)
3
Získalo zápočet a úspěšně dokončilo předmět:
length(filter(x -> x != "F", dropmissing(df1, :mark).mark))
189
sum(dropmissing(df1, :mark).mark .!= "F")
189
sum(dropmissing(df2, :mark).mark .!= "F")
340
sum(dropmissing(df3, :mark).mark .!= "F")
380
Počet studentů, kteří získali zápočet a současně neuspěli u zkoušky.
sum(df1.assessment .&& df1.mark .== "F")
20
sum(coalesce.(df2.assessment, false) .&& df2.mark .== "F")
37
sum(coalesce.(df3.assessment, false) .&& df3.mark .== "F")
26
Poté se podívejme podrobněji jak probíhalo získávání zápočtů.
using PyPlot, Statistics
Histogram prvního zápočtové testu.
plt.grid()
hist(collect(skipmissing(df3.test1)), bins=[ -0.49 + j for j in 0:4]);
Pozor, se samotným skipmissing
nefunguje.
skipmissing(df3.test1)
skipmissing(Union{Missing, Float64}[3.0, 2.0, missing, 3.0, 3.0, 3.0, 2.0, missing, 3.0, 1.0 … 2.5, 0.0, 2.0, 3.0, 1.0, 1.5, 2.5, 1.0, 1.0, missing])
collect(skipmissing(df3.test1))
589-element Vector{Float64}: 3.0 2.0 3.0 3.0 3.0 2.0 3.0 1.0 3.0 3.0 2.0 2.5 2.5 ⋮ 3.0 1.0 1.0 2.5 0.0 2.0 3.0 1.0 1.5 2.5 1.0 1.0
mean(skipmissing(df3.test1))
2.0280135823429544
median(skipmissing(df3.test1))
2.0
length(collect(skipmissing(df3.test1)))
589
nrow(df3) - length(df3.test1[ismissing.(df3.test1)])
589
Třetí běh lze porovnat s druhým během!
plt.grid()
hist(collect(skipmissing(df2.test1)), bins=[ -0.49 + j for j in 0:4]);
Histogram druhého zápočtového testu.
plt.grid()
hist(collect(skipmissing(df3.test2)), bins=[ -0.49 + j for j in 0:21]);
plt.grid()
hist(collect(skipmissing(df2.test2)), bins=[ -0.49 + j for j in 0:21]);
Třetí test.
plt.grid()
hist(collect(skipmissing(df2.test3)), bins=[ -0.49 + j for j in 0:21]);
plt.grid()
hist(collect(skipmissing(df2.test3)), bins=[ -0.49 + j for j in 0:21]);
Počet studentů píšících testy v druhém běhu.
df2.test1 |> skipmissing |> collect |> length
661
df2.test2 |> skipmissing |> collect |> length
628
df2.test3 |> skipmissing |> collect |> length
515
A v třetím běhu.
df3.test1 |> skipmissing |> collect |> length
589
df3.test2 |> skipmissing |> collect |> length
591
df3.test3 |> skipmissing |> collect |> length
509
Počet studentů, kteří získali zápočet bez opravné zápočtové písemky.
sum(coalesce.(df1.test1 + df1.test2, 0) .>= 25.)
165
sum(coalesce.(df2.test1 + df2.test2 + df2.test3, 0) .>= 25.)
325
sum(coalesce.(df3.test1 + df3.test2 + df3.test3, 0) .>= 25.)
367
Počet studentů s alespoň 20 body ze zápočtových písemek.
nrow(df1[coalesce.(df1.tests_total .>= 20, false), :])
234
nrow(df2[coalesce.(df2.tests_total .>= 20, false), :])
418
nrow(df3[coalesce.(df3.tests_total .>= 20, false), :])
444
plt.grid()
hist(collect(skipmissing(df3.test1 + df3.test2 + df3.test3)), bins=[ -0.49 + j for j in 0:44]);
plt.grid()
hist(collect(skipmissing(df2.test1 + df2.test2 + df2.test3)), bins=[ -0.49 + j for j in 0:44]);
Histogram opravné zápočtové písemky.
plt.grid()
hist(filter(x -> !ismissing(x), df1.second_chance), bins=[ -0.49 + j for j in 0:25]);
plt.grid()
hist(filter(x -> !ismissing(x), df2.second_chance), bins=[ -0.49 + j for j in 0:25]);
Histogram celkového počtu bodů ze semestru.
plt.grid()
hist(filter(x -> !ismissing(x), df1.tests_total), bins=[ -0.49 + j for j in 0:41]);
plt.grid()
hist(filter(x -> !ismissing(x), df2.tests_total), bins=[ -0.49 + j for j in 0:44]);
plt.grid()
hist(filter(x -> !ismissing(x), df3.tests_total), bins=[ -0.49 + j for j in 0:44]);
Úspěšnost zisku zápočtu podle cvičících.
groupby(df1, :tutor)
GroupedDataFrame with 7 groups based on key: tutor
Row | username | test1 | test2 | second_chance | activity | tests_total | gitlab | assessment | exam_test | date | oral_exam | veto | points_total | mark | tutor | percentil |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
String15 | Float64? | Float64? | Float64? | Float64? | Float64 | Float64? | Bool | Int64? | String7? | Float64? | Bool? | Float64? | String1? | String7? | Int64 | |
1 | student001 | 2.0 | missing | missing | missing | 2.0 | missing | false | missing | missing | missing | missing | missing | missing | Honza | 29 |
2 | student007 | 9.5 | 6.0 | missing | missing | 15.5 | missing | false | missing | missing | missing | missing | missing | missing | Honza | 50 |
3 | student010 | 0.0 | missing | missing | missing | 0.0 | missing | false | missing | missing | missing | missing | missing | missing | Honza | 23 |
4 | student022 | missing | missing | missing | missing | 0.0 | missing | false | missing | missing | missing | missing | missing | missing | Honza | 23 |
5 | student023 | 2.0 | 8.5 | missing | missing | 10.5 | missing | false | missing | missing | missing | missing | missing | missing | Honza | 46 |
6 | student037 | 2.5 | missing | missing | missing | 2.5 | missing | false | missing | missing | missing | missing | missing | missing | Honza | 30 |
7 | student038 | missing | missing | missing | missing | 0.0 | missing | false | missing | missing | missing | missing | missing | missing | Honza | 23 |
8 | student044 | 12.0 | 16.5 | missing | 1.5 | 28.5 | missing | true | 18 | 11:20 | 60.0 | missing | 90.0 | A | Honza | 83 |
9 | student045 | 9.0 | 13.5 | 15.0 | missing | 25.0 | missing | true | 15 | 11:20 | 51.0 | missing | 76.0 | C | Honza | 71 |
10 | student047 | missing | missing | missing | missing | 0.0 | missing | false | missing | missing | missing | missing | missing | missing | Honza | 23 |
11 | student068 | missing | missing | missing | missing | 0.0 | missing | false | missing | missing | missing | missing | missing | missing | Honza | 23 |
12 | student073 | 10.0 | 10.0 | 15.0 | missing | 25.0 | missing | true | 15 | 10:20 | 47.0 | missing | 72.0 | C | Honza | 71 |
13 | student077 | 11.0 | 15.0 | missing | missing | 26.0 | missing | true | 17 | 10:45 | 50.0 | missing | 76.0 | C | Honza | 75 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
77 | student473 | 4.5 | missing | missing | missing | 4.5 | missing | false | missing | missing | missing | missing | missing | missing | Honza | 37 |
78 | student477 | 3.0 | missing | missing | missing | 3.0 | missing | false | missing | missing | missing | missing | missing | missing | Honza | 33 |
79 | student482 | missing | missing | missing | missing | 0.0 | missing | false | missing | missing | missing | missing | missing | missing | Honza | 23 |
80 | student484 | 0.0 | missing | missing | missing | 0.0 | missing | false | missing | missing | missing | missing | missing | missing | Honza | 23 |
81 | student488 | 13.0 | 16.0 | missing | 2.5 | 29.0 | missing | true | 19 | 12:20 | 54.0 | missing | 85.5 | B | Honza | 84 |
82 | student500 | 5.0 | 12.0 | missing | missing | 17.0 | missing | false | missing | missing | missing | missing | missing | missing | Honza | 52 |
83 | student505 | 6.0 | 14.5 | 23.5 | 0.0 | 25.0 | missing | true | 17 | 10:20 | 31.0 | missing | 56.0 | E | Honza | 58 |
84 | student506 | 1.0 | missing | missing | missing | 1.0 | missing | false | missing | missing | missing | missing | missing | missing | Honza | 26 |
85 | student518 | 8.0 | 14.0 | 17.0 | missing | 25.0 | missing | true | 16 | 16:50 | 42.0 | missing | 67.0 | D | Honza | 71 |
86 | student520 | 1.0 | missing | missing | missing | 1.0 | missing | false | missing | missing | missing | missing | missing | missing | Honza | 26 |
87 | student524 | 5.5 | 19.5 | missing | missing | 25.0 | missing | true | 15 | 11:00 | 30.0 | missing | 55.0 | E | Honza | 71 |
88 | student526 | 10.5 | 17.0 | missing | 2.0 | 27.5 | missing | true | 13 | --- | missing | missing | missing | F | Honza | 80 |
⋮
Row | username | test1 | test2 | second_chance | activity | tests_total | gitlab | assessment | exam_test | date | oral_exam | veto | points_total | mark | tutor | percentil |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
String15 | Float64? | Float64? | Float64? | Float64? | Float64 | Float64? | Bool | Int64? | String7? | Float64? | Bool? | Float64? | String1? | String7? | Int64 | |
1 | student020 | 11.5 | 9.5 | 10.5 | missing | 21.0 | missing | false | missing | missing | missing | missing | missing | missing | Ivo | 59 |
2 | student032 | missing | missing | missing | missing | 0.0 | missing | false | missing | missing | missing | missing | missing | missing | Ivo | 23 |
3 | student041 | missing | missing | missing | missing | 0.0 | missing | false | missing | missing | missing | missing | missing | missing | Ivo | 23 |
4 | student058 | 5.0 | missing | missing | missing | 5.0 | missing | false | missing | missing | missing | missing | missing | missing | Ivo | 38 |
5 | student060 | 1.0 | missing | missing | missing | 1.0 | missing | false | missing | missing | missing | missing | missing | missing | Ivo | 26 |
6 | student070 | 0.0 | missing | missing | missing | 0.0 | missing | false | missing | missing | missing | missing | missing | missing | Ivo | 23 |
7 | student075 | missing | missing | missing | missing | 0.0 | missing | false | missing | missing | missing | missing | missing | missing | Ivo | 23 |
8 | student095 | missing | missing | missing | missing | 0.0 | missing | false | missing | missing | missing | missing | missing | missing | Ivo | 23 |
9 | student098 | 4.0 | 6.5 | missing | missing | 10.5 | missing | false | missing | missing | missing | missing | missing | missing | Ivo | 46 |
10 | student100 | 5.0 | missing | missing | missing | 5.0 | missing | false | missing | missing | missing | missing | missing | missing | Ivo | 38 |
11 | student111 | 7.0 | missing | missing | missing | 7.0 | missing | false | missing | missing | missing | missing | missing | missing | Ivo | 42 |
12 | student112 | 9.0 | 10.0 | missing | missing | 19.0 | missing | false | missing | missing | missing | missing | missing | missing | Ivo | 55 |
13 | student120 | 9.0 | 11.0 | 10.5 | missing | 20.0 | missing | false | missing | missing | missing | missing | missing | missing | Ivo | 57 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
37 | student381 | 5.0 | missing | missing | missing | 5.0 | missing | false | missing | missing | missing | missing | missing | missing | Ivo | 38 |
38 | student391 | 9.0 | 13.5 | 17.0 | missing | 25.0 | missing | true | 10 | --- | missing | missing | missing | F | Ivo | 71 |
39 | student414 | 12.5 | 20.0 | missing | missing | 32.5 | 0.5 | true | 17 | 15:30 | 41.0 | missing | 74.0 | C | Ivo | 92 |
40 | student425 | 4.0 | missing | missing | missing | 4.0 | missing | false | missing | missing | missing | missing | missing | missing | Ivo | 36 |
41 | student432 | 9.0 | 6.5 | missing | missing | 15.5 | missing | false | missing | missing | missing | missing | missing | missing | Ivo | 50 |
42 | student451 | 0.5 | 5.0 | missing | missing | 5.5 | missing | false | missing | missing | missing | missing | missing | missing | Ivo | 39 |
43 | student454 | 13.0 | 9.0 | 11.0 | missing | 22.0 | missing | false | missing | missing | missing | missing | missing | missing | Ivo | 59 |
44 | student478 | 3.5 | missing | missing | missing | 3.5 | missing | false | missing | missing | missing | missing | missing | missing | Ivo | 34 |
45 | student479 | 9.5 | 14.0 | 13.0 | missing | 23.5 | missing | false | missing | missing | missing | missing | missing | missing | Ivo | 61 |
46 | student511 | missing | missing | missing | missing | 0.0 | missing | false | missing | missing | missing | missing | missing | missing | Ivo | 23 |
47 | student512 | 2.5 | 6.0 | missing | 1.0 | 8.5 | missing | false | missing | missing | missing | missing | missing | missing | Ivo | 44 |
48 | student528 | 13.0 | 13.0 | missing | 1.0 | 26.0 | 0.0 | true | 15 | 10:00 | 42.0 | missing | 69.0 | D | Ivo | 75 |
df = combine(groupby(df1, :tutor),
nrow => "studenti",
:assessment => sum => "zápočty",
:test1 => (x -> sum(coalesce.(x, 0) .> 0)) => "psali 1. test",
:activity => (x -> mean(coalesce.(x, 0))) => "průměrná aktivita"
)
df[:, "propustnost"] = df[:, "zápočty"] ./ df[:, "studenti"]
df
Row | tutor | studenti | zápočty | psali 1. test | průměrná aktivita | propustnost |
---|---|---|---|---|---|---|
String7? | Int64 | Int64 | Int64 | Float64 | Float64 | |
1 | Honza | 88 | 26 | 58 | 0.375 | 0.295455 |
2 | Irena | 95 | 30 | 63 | 1.02105 | 0.315789 |
3 | Jitka | 95 | 34 | 78 | 1.23158 | 0.357895 |
4 | Pavel | 96 | 38 | 78 | 1.75 | 0.395833 |
5 | missing | 8 | 0 | 0 | 0.0 | 0.0 |
6 | Tomáš | 98 | 69 | 90 | 1.38265 | 0.704082 |
7 | Ivo | 48 | 12 | 39 | 0.1875 | 0.25 |
df = combine(groupby(df2, :tutor),
nrow => "studenti",
:assessment => (x -> sum(collect(skipmissing(x)))) => "zápočty",
:test1 => (x -> sum(coalesce.(x, 0) .> 0)) => "psali 1. test",
:activity => (x -> mean(coalesce.(x, 0))) => "průměrná aktivita"
)
df[:, "propustnost"] = df[:, "zápočty"] ./ df[:, "studenti"]
df
Row | tutor | studenti | zápočty | psali 1. test | průměrná aktivita | propustnost |
---|---|---|---|---|---|---|
String7? | Int64 | Int64 | Int64 | Float64 | Float64 | |
1 | Jan V | 146 | 64 | 115 | 0.424658 | 0.438356 |
2 | Pavel | 149 | 84 | 120 | 1.07383 | 0.563758 |
3 | Jitka | 73 | 41 | 62 | 1.76027 | 0.561644 |
4 | Petr | 99 | 39 | 64 | 0.459596 | 0.393939 |
5 | Jiřina | 68 | 36 | 54 | 1.49265 | 0.529412 |
6 | Irena | 74 | 41 | 68 | 1.5 | 0.554054 |
7 | Jarda | 100 | 57 | 90 | 2.46 | 0.57 |
8 | Jan S | 49 | 15 | 33 | 0.0612245 | 0.306122 |
9 | missing | 2 | 0 | 0 | 0.0 | 0.0 |
df = combine(groupby(df3, :tutor),
nrow => "studenti",
:assessment => (x -> sum(collect(skipmissing(x)))) => "zápočty",
:test1 => (x -> sum(coalesce.(x, 0) .> 0)) => "psali 1. test",
:activity => (x -> mean(coalesce.(x, 0))) => "průměrná aktivita"
)
df[:, "propustnost"] = df[:, "zápočty"] ./ df[:, "studenti"]
df
Row | tutor | studenti | zápočty | psali 1. test | průměrná aktivita | propustnost |
---|---|---|---|---|---|---|
String7 | Int64 | Int64 | Int64 | Float64 | Float64 | |
1 | Irena | 99 | 66 | 90 | 2.33333 | 0.666667 |
2 | Ondra | 94 | 51 | 79 | 0.205676 | 0.542553 |
3 | Jan V | 144 | 77 | 126 | 0.673611 | 0.534722 |
4 | Jarda | 95 | 76 | 89 | 2.59474 | 0.8 |
5 | Jan S | 47 | 36 | 42 | 0.255319 | 0.765957 |
6 | Jitka | 96 | 60 | 87 | 1.20312 | 0.625 |
7 | Jakub | 84 | 40 | 48 | 0.642857 | 0.47619 |
V BI-MA1 hodně rozdílné vůči BI-ZMA.
Histogram známek.
combine(groupby(df1, :mark), nrow => "studenti") |> sort
Row | mark | studenti |
---|---|---|
String1? | Int64 | |
1 | A | 42 |
2 | B | 57 |
3 | C | 53 |
4 | D | 32 |
5 | E | 5 |
6 | F | 20 |
7 | missing | 319 |
combine(groupby(df2, :mark), nrow => "studenti") |> sort
Row | mark | studenti |
---|---|---|
String1? | Int64 | |
1 | A | 79 |
2 | B | 104 |
3 | C | 110 |
4 | D | 41 |
5 | E | 6 |
6 | F | 37 |
7 | missing | 383 |
combine(groupby(df3, :mark), nrow => "studenti") |> sort
Row | mark | studenti |
---|---|---|
String1? | Int64 | |
1 | A | 108 |
2 | B | 121 |
3 | C | 107 |
4 | D | 35 |
5 | E | 9 |
6 | F | 26 |
7 | missing | 253 |
sort(combine(groupby(df1[coalesce.(df1.gitlab, 0) .> 0 ,:], :mark), nrow => "studenti"), :mark)
Row | mark | studenti |
---|---|---|
String1? | Int64 | |
1 | A | 12 |
2 | B | 7 |
3 | C | 8 |
4 | D | 2 |
5 | F | 4 |
6 | missing | 5 |
sort(combine(groupby(df2[coalesce.(df2.gitlab, 0) .> 0 ,:], :mark), nrow => "studenti"), :mark)
Row | mark | studenti |
---|---|---|
String1? | Int64 | |
1 | A | 13 |
2 | B | 12 |
3 | C | 4 |
4 | D | 1 |
5 | F | 2 |
6 | missing | 6 |
sort(combine(groupby(df3[coalesce.(df3.gitlab, 0) .> 0 ,:], :mark), nrow => "studenti"), :mark)
Row | mark | studenti |
---|---|---|
String1? | Int64 | |
1 | A | 9 |
2 | B | 4 |
3 | C | 5 |
4 | D | 1 |
5 | F | 1 |
6 | missing | 7 |
sort(combine(groupby(df1[coalesce.(df1.activity, 0) .> 2 ,:], :mark), nrow => "studenti"), :mark)
Row | mark | studenti |
---|---|---|
String1? | Int64 | |
1 | A | 29 |
2 | B | 25 |
3 | C | 13 |
4 | D | 10 |
5 | F | 4 |
6 | missing | 15 |
sort(combine(groupby(df2[coalesce.(df2.activity, 0) .> 2 ,:], :mark), nrow => "studenti"), :mark)
Row | mark | studenti |
---|---|---|
String1? | Int64 | |
1 | A | 47 |
2 | B | 39 |
3 | C | 32 |
4 | D | 6 |
5 | F | 7 |
6 | missing | 28 |
sort(combine(groupby(df3[coalesce.(df3.activity, 0) .> 2 ,:], :mark), nrow => "studenti"), :mark)
Row | mark | studenti |
---|---|---|
String1? | Int64 | |
1 | A | 58 |
2 | B | 43 |
3 | C | 16 |
4 | D | 7 |
5 | F | 7 |
6 | missing | 16 |
plt.grid()
hist(filter(x -> !ismissing(x), df1.oral_exam), bins=[ -0.49 + j for j in 0:61]);
plt.grid()
hist(filter(x -> !ismissing(x), df2.oral_exam), bins=[ -0.49 + j for j in 0:58]);
plt.grid()
hist(filter(x -> !ismissing(x), df3.oral_exam), bins=[ -0.49 + j for j in 0:58]);
plt.grid()
hist(filter(x -> !ismissing(x), df1.exam_test), bins=[ -0.49 + j for j in 0:21]);
plt.grid()
hist(filter(x -> !ismissing(x), df2.exam_test), bins=[ -0.49 + j for j in 0:21]);
plt.grid()
hist(filter(x -> !ismissing(x), df3.exam_test), bins=[ -0.49 + j for j in 0:21]);
combine(groupby(df1, :exam_test), nrow => "počet") |> sort
Row | exam_test | počet |
---|---|---|
Int64? | Int64 | |
1 | 0 | 4 |
2 | 10 | 1 |
3 | 11 | 1 |
4 | 12 | 3 |
5 | 13 | 2 |
6 | 14 | 5 |
7 | 15 | 29 |
8 | 16 | 37 |
9 | 17 | 40 |
10 | 18 | 37 |
11 | 19 | 34 |
12 | 20 | 16 |
13 | missing | 319 |
combine(groupby(df1[df1.assessment .=== true, :], :exam_test), nrow => "počet") |> sort
Row | exam_test | počet |
---|---|---|
Int64? | Int64 | |
1 | 0 | 4 |
2 | 10 | 1 |
3 | 11 | 1 |
4 | 12 | 3 |
5 | 13 | 2 |
6 | 14 | 5 |
7 | 15 | 29 |
8 | 16 | 37 |
9 | 17 | 40 |
10 | 18 | 37 |
11 | 19 | 34 |
12 | 20 | 16 |
Další informace, detaily či tutoriály lze nalézt v dokumentaci DataFrames.jl
.
V notebooku jsme používali balíček CSV.jl
.