Statistical analysis – comparison of file-system benchmarks: Btrfs, Ext4, XFS, ZFS

I will process and compare data collected from fio benchmark JSON output for the following file systems – Btrfs, Ext4, XFS, and ZFS – using performance-based metrics.

Data origin

Each benchmark was run inside a virtual machine managed with libvirt, where a different operating system was used depending on the filesystem:

  • Btrfs, Ext4 – Arch Linux
  • XFS – Gentoo
  • ZFS – FreeBSD

All benchmarks were executed on my ThinkPad X230 using the fio command via a simple bash script that created all required fio job files, launched fio for every workload type, repeated each workload 10 times, and collected the results into the ./fio_results folder (the per-filesystem results were later gathered under ./benchmark_results/<filesystem>/fio_results/).

The random and mixed jobs used the following parameters (the sequential jobs differ only in block size and thread count, as the script below shows):

  1. ioengine=sync – synchronous I/O engine: each READ or WRITE blocks until it completes.
  2. bs=4k – block size set to 4 KiB (each READ/WRITE operates on a 4 KiB block).
  3. size=1G – total amount to be processed per job: 1 gigabyte.
  4. numjobs=4 – use 4 threads for the test.
  5. runtime=60s – run duration for each test.
  6. time_based – time-based run: the test runs for the specified duration regardless of whether the file system processed the entire data amount.

The parameter that selects the workload is rw. The workload types used were:

  1. randread – random read
  2. randwrite – random write
  3. seqread – sequential read (rw=read)
  4. seqwrite – sequential write (rw=write)
  5. mixed – mixed random read/write (rw=randrw with rwmixread=70)

References:

  • fio – https://github.com/axboe/fio
  • libvirt – https://libvirt.org/
  • bash – https://www.gnu.org/software/bash/manual/bash.html
  • ThinkPad X230 – https://thinkwiki.de/X230
  • Arch Linux – https://archlinux.org/
  • Gentoo – https://www.gentoo.org/
  • FreeBSD – https://www.freebsd.org/
InΒ [Β ]:
#!/bin/bash

RESULTS_DIR="./fio_results"
mkdir -p "$RESULTS_DIR"

declare -A JOBS
JOBS["randread"]="randread.fio"
JOBS["randwrite"]="randwrite.fio"
JOBS["seqread"]="seqread.fio"
JOBS["seqwrite"]="seqwrite.fio"
JOBS["mixed"]="mixed.fio"

cat <<EOF > "${JOBS["randread"]}"
[randread]
ioengine=sync
rw=randread
bs=4k
size=1G
numjobs=4
runtime=60s
time_based
EOF

cat <<EOF > "${JOBS["randwrite"]}"
[randwrite]
ioengine=sync
rw=randwrite
bs=4k
size=1G
numjobs=4
runtime=60s
time_based
EOF

cat <<EOF > "${JOBS["seqread"]}"
[seqread]
ioengine=sync
rw=read
bs=1M
size=1G
numjobs=1
runtime=60s
time_based
EOF

cat <<EOF > "${JOBS["seqwrite"]}"
[seqwrite]
ioengine=sync
rw=write
bs=1M
size=1G
numjobs=1
runtime=60s
time_based
EOF

cat <<EOF > "${JOBS["mixed"]}"
[mixed]
ioengine=sync
rw=randrw
bs=4k
size=1G
numjobs=4
runtime=60s
time_based
rwmixread=70
EOF

for i in {0..9}; do
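    # Each of the 10 iterations stores its JSON results in its own numbered sub-directory.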
    mkdir -p "$RESULTS_DIR/${i}"
    
    for JOB in "${!JOBS[@]}"; do
        echo "Running $JOB benchmark..."
        fio "${JOBS[$JOB]}" --output-format=json --output="$RESULTS_DIR/${i}/$JOB-result.json"
    done
done

echo "All benchmarks completed. Results are stored in $RESULTS_DIR."

First Phase of Processing

In the first phase of processing, fio is invoked with the argument --output-format=json, so all resulting data is saved in JSON format.
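As a quick preview, the job-level metrics can be pulled straight out of such a file with the json module (a minimal sketch; the path below is only an example and depends on how the results were collected):

import json

with open("./benchmark_results/btrfs/fio_results/0/mixed-result.json") as f:   # example path
    data = json.load(f)

for job in data["jobs"]:
    read = job["read"]
    print(job["jobname"],
          f"read IOPS={read['iops']:.1f}",
          f"read BW={read['bw_bytes']} B/s",
          f"mean clat={read['clat_ns']['mean'] / 1000:.1f} us")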

Below is an example of the raw output for one job section, taken from ./benchmark_results/btrfs/mixed_results.json:

InΒ [Β ]:
   {
      "jobname" : "mixed",
      "groupid" : 0,
      "job_start" : 1753822420164,
      "error" : 0,
      "eta" : 0,
      "elapsed" : 61,
      "job options" : {
        "ioengine" : "libaio",
        "rw" : "randrw",
        "bs" : "4k",
        "size" : "1G",
        "numjobs" : "4",
        "runtime" : "60s",
        "time_based" : "",
        "rwmixread" : "70"
      },
      "read" : {
        "io_bytes" : 677052416,
        "io_kbytes" : 661184,
        "bw_bytes" : 11284018,
        "bw" : 11019,
        "iops" : 2754.887419,
        "runtime" : 60001,
        "total_ios" : 165296,
        "short_ios" : 0,
        "drop_ios" : 0,
        "slat_ns" : {
          "min" : 60135,
          "max" : 169270733,
          "mean" : 327892.475353,
          "stddev" : 704004.012420,
          "N" : 165296
        },
        "clat_ns" : {
          "min" : 1581,
          "max" : 2301665,
          "mean" : 6175.911782,
          "stddev" : 34244.334279,
          "N" : 165296,
          "percentile" : {
            "1.000000" : 1880,
            "5.000000" : 1960,
            "10.000000" : 1992,
            "20.000000" : 2064,
            "30.000000" : 2096,
            "40.000000" : 2160,
            "50.000000" : 2224,
            "60.000000" : 2320,
            "70.000000" : 2576,
            "80.000000" : 2864,
            "90.000000" : 3152,
            "95.000000" : 15808,
            "99.000000" : 96768,
            "99.500000" : 136192,
            "99.900000" : 370688,
            "99.950000" : 708608,
            "99.990000" : 1449984
          }
        },
        "lat_ns" : {
          "min" : 62116,
          "max" : 169288876,
          "mean" : 334068.387136,
          "stddev" : 704772.019699,
          "N" : 165296
        },
        "bw_min" : 1780,
        "bw_max" : 16104,
        "bw_agg" : 22.005418,
        "bw_mean" : 11049.655462,
        "bw_dev" : 2798.273603,
        "bw_samples" : 119,
        "iops_min" : 445,
        "iops_max" : 4026,
        "iops_mean" : 2762.285714,
        "iops_stddev" : 699.522745,
        "iops_samples" : 119
      },
      "write" : {
        "io_bytes" : 289918976,
        "io_kbytes" : 283124,
        "bw_bytes" : 4831902,
        "bw" : 4718,
        "iops" : 1179.663672,
        "runtime" : 60001,
        "total_ios" : 70781,
        "short_ios" : 0,
        "drop_ios" : 0,
        "slat_ns" : {
          "min" : 5914,
          "max" : 3033011,
          "mean" : 42302.398016,
          "stddev" : 126243.650789,
          "N" : 70781
        },
        "clat_ns" : {
          "min" : 1042,
          "max" : 2893097,
          "mean" : 6552.313234,
          "stddev" : 48773.628443,
          "N" : 70781,
          "percentile" : {
            "1.000000" : 1112,
            "5.000000" : 1128,
            "10.000000" : 1160,
            "20.000000" : 1208,
            "30.000000" : 1240,
            "40.000000" : 1288,
            "50.000000" : 1560,
            "60.000000" : 1912,
            "70.000000" : 2160,
            "80.000000" : 2288,
            "90.000000" : 2544,
            "95.000000" : 3120,
            "99.000000" : 136192,
            "99.500000" : 183296,
            "99.900000" : 667648,
            "99.950000" : 1138688,
            "99.990000" : 1712128
          }
        },
        "lat_ns" : {
          "min" : 6996,
          "max" : 3078384,
          "mean" : 48854.711250,
          "stddev" : 135555.993167,
          "N" : 70781
        },
        "bw_min" : 790,
        "bw_max" : 7102,
        "bw_agg" : 21.932277,
        "bw_mean" : 4732.731092,
        "bw_dev" : 1164.468350,
        "bw_samples" : 119,
        "iops_min" : 197,
        "iops_max" : 1775,
        "iops_mean" : 1183.058824,
        "iops_stddev" : 291.144244,
        "iops_samples" : 119
      },
      "trim" : {
        "io_bytes" : 0,
        "io_kbytes" : 0,
        "bw_bytes" : 0,
        "bw" : 0,
        "iops" : 0.000000,
        "runtime" : 0,
        "total_ios" : 0,
        "short_ios" : 0,
        "drop_ios" : 0,
        "slat_ns" : {
          "min" : 0,
          "max" : 0,
          "mean" : 0.000000,
          "stddev" : 0.000000,
          "N" : 0
        },
        "clat_ns" : {
          "min" : 0,
          "max" : 0,
          "mean" : 0.000000,
          "stddev" : 0.000000,
          "N" : 0
        },
        "lat_ns" : {
          "min" : 0,
          "max" : 0,
          "mean" : 0.000000,
          "stddev" : 0.000000,
          "N" : 0
        },
        "bw_min" : 0,
        "bw_max" : 0,
        "bw_agg" : 0.000000,
        "bw_mean" : 0.000000,
        "bw_dev" : 0.000000,
        "bw_samples" : 0,
        "iops_min" : 0,
        "iops_max" : 0,
        "iops_mean" : 0.000000,
        "iops_stddev" : 0.000000,
        "iops_samples" : 0
      },
      "sync" : {
        "total_ios" : 0,
        "lat_ns" : {
          "min" : 0,
          "max" : 0,
          "mean" : 0.000000,
          "stddev" : 0.000000,
          "N" : 0
        }
      },
      "job_runtime" : 60000,
      "usr_cpu" : 1.750000,
      "sys_cpu" : 10.748333,
      "ctx" : 261501,
      "majf" : 0,
      "minf" : 13,
      "iodepth_level" : {
        "1" : 100.000000,
        "2" : 0.000000,
        "4" : 0.000000,
        "8" : 0.000000,
        "16" : 0.000000,
        "32" : 0.000000,
        ">=64" : 0.000000
      },
      "iodepth_submit" : {
        "0" : 0.000000,
        "4" : 100.000000,
        "8" : 0.000000,
        "16" : 0.000000,
        "32" : 0.000000,
        "64" : 0.000000,
        ">=64" : 0.000000
      },
      "iodepth_complete" : {
        "0" : 0.000000,
        "4" : 100.000000,
        "8" : 0.000000,
        "16" : 0.000000,
        "32" : 0.000000,
        "64" : 0.000000,
        ">=64" : 0.000000
      },
      "latency_ns" : {
        "2" : 0.000000,
        "4" : 0.000000,
        "10" : 0.000000,
        "20" : 0.000000,
        "50" : 0.000000,
        "100" : 0.000000,
        "250" : 0.000000,
        "500" : 0.000000,
        "750" : 0.000000,
        "1000" : 0.000000
      },
      "latency_us" : {
        "2" : 26.430360,
        "4" : 67.502552,
        "10" : 0.593874,
        "20" : 1.747735,
        "50" : 1.188595,
        "100" : 1.361844,
        "250" : 0.976376,
        "500" : 0.105474,
        "750" : 0.033040,
        "1000" : 0.016944
      },
      "latency_ms" : {
        "2" : 0.040241,
        "4" : 0.010000,
        "10" : 0.000000,
        "20" : 0.000000,
        "50" : 0.000000,
        "100" : 0.000000,
        "250" : 0.000000,
        "500" : 0.000000,
        "750" : 0.000000,
        "1000" : 0.000000,
        "2000" : 0.000000,
        ">=2000" : 0.000000
      },
      "latency_depth" : 1,
      "latency_target" : 0,
      "latency_percentile" : 100.000000,
      "latency_window" : 0
    },

Processing of the data

The collected raw data needs to be filtered to retain only relevant information, then organized and converted into CSV format.

For this, we will use the following pieces of Python code, which will load all results, accumulate them, and then write all relevant information to the corresponding CSV file.

InΒ [51]:
#!/usr/bin/env python

import pandas as pd
import numpy as np
import scipy.stats as stats
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns
import os
import json
import csv
import glob
import sklearn

from statsmodels.formula.api import ols
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import plot_tree
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error, r2_score

The processing function takes two paths: the source file with the raw JSON data and the destination file that will store the CSV result. The processing consists solely of selecting the parameters relevant for further analysis.

InΒ [52]:
def process_fio_json_to_csv(json_file_path, csv_file_path):
    with open(json_file_path, 'r') as file:
        data = json.load(file)

    with open(csv_file_path, 'w', newline='') as csvfile:
        fieldnames = [
            'Job_Name', 'Read_IOPS', 'Read_Bandwidth', 'Read_Latency_Mean', 
            'Read_Latency_Min', 'Read_Latency_Max', 'Read_Errors',
            'Write_IOPS', 'Write_Bandwidth', 'Write_Latency_Mean', 
            'Write_Latency_Min', 'Write_Latency_Max', 'Write_Errors'
        ]
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

        writer.writeheader()
        for job in data.get('jobs', []):
            job_name = job.get('jobname', 'Unknown_Job')
            
            read_metrics = job.get('read', {})
            write_metrics = job.get('write', {})

            writer.writerow({
                'Job_Name': job_name,
                'Read_IOPS': read_metrics.get('iops', 0),
                'Read_Bandwidth': read_metrics.get('bw_bytes', 0),
                'Read_Latency_Mean': read_metrics.get('clat_ns', {}).get('mean', 0) / 1000, # clat_ns is in ns; /1000 gives microseconds
                'Read_Latency_Min': read_metrics.get('clat_ns', {}).get('min', 0) / 1000,  # clat_ns is in ns; /1000 gives microseconds
                'Read_Latency_Max': read_metrics.get('clat_ns', {}).get('max', 0) / 1000,  # clat_ns is in ns; /1000 gives microseconds
                'Read_Errors': job.get('error', 0),
                'Write_IOPS': write_metrics.get('iops', 0),
                'Write_Bandwidth': write_metrics.get('bw_bytes', 0),
                'Write_Latency_Mean': write_metrics.get('clat_ns', {}).get('mean', 0) / 1000, # clat_ns is in ns; /1000 gives microseconds
                'Write_Latency_Min': write_metrics.get('clat_ns', {}).get('min', 0) / 1000,  # clat_ns is in ns; /1000 gives microseconds
                'Write_Latency_Max': write_metrics.get('clat_ns', {}).get('max', 0) / 1000,  # clat_ns is in ns; /1000 gives microseconds
                'Write_Errors': job.get('error', 0)
            })

Next, we simply apply it while iterating over the folder in which all fio benchmark results are stored.

InΒ [53]:
source_dir = './benchmark_results'
destination_dir = './semi_processed_results'

os.makedirs(destination_dir, exist_ok=True)

for filesystem in os.listdir(source_dir):
    filesystem_path = os.path.join(os.path.join(source_dir, filesystem), 'fio_results')
    
    if os.path.isdir(filesystem_path):
        for iteration in os.listdir(filesystem_path):
            iteration_file_path = os.path.join(filesystem_path, iteration)
            #print(iteration_file_path)
            for result_file in os.listdir(iteration_file_path):
                if result_file.endswith('.json'):
                    result_file_path = os.path.join(iteration_file_path, result_file)
                    processed_file_dir = os.path.join(os.path.join(destination_dir, filesystem), iteration)
                    os.makedirs(processed_file_dir, exist_ok=True)
                    processed_file_path = os.path.join(processed_file_dir, os.path.splitext(result_file)[0] + '.csv')
                
                    # print(f"result_file_path: {result_file_path}")
                    # print(f"processed_file_dir: {processed_file_dir}")
                    # print(f"processed_file_path: {processed_file_path}")
                    # print()
                
                
                    #print(f"Processing fio_result '{result_file}' from {filesystem}.")
                    process_fio_json_to_csv(result_file_path, processed_file_path)
                    #print(f"Saved processed fio_result to '{processed_file_path}'")
                    #print()

print("Processing complete.")
Processing complete.

Next, we combine all processed benchmark iterations into a single table for each file system.

InΒ [54]:
source_dir = './semi_processed_results'
destination_dir = './processed_results'

os.makedirs(destination_dir, exist_ok=True)

for filesystem in os.listdir(source_dir):
    filesystem_path = os.path.join(source_dir, filesystem)
    if not os.path.isdir(filesystem_path):
        continue
    #print("filesystem_path: ", filesystem_path)
    out_path = os.path.join(destination_dir, f'{filesystem}.csv')
    header_written = False

    for iteration in os.listdir(filesystem_path):
        iter_path = os.path.join(filesystem_path, iteration)

        #print("iter_path: ", iter_path)

        for filename in os.listdir(iter_path):
            if not filename.endswith('.csv'):
                continue

            src = os.path.join(iter_path, filename)
            with open(src, newline='') as src_csv:
                reader = csv.reader(src_csv)
                rows = list(reader)

            with open(out_path, 'a', newline='') as dst_csv:
                writer = csv.writer(dst_csv)
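                # Write the CSV header only once per filesystem; every data row gets an extra Iteration column.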
                if not header_written:
                    header = rows[0] + ['Iteration']
                    writer.writerow(header)
                    header_written = True
                for row in rows[1:]:
                    writer.writerow(row + [iteration])

print("Processing complete.")
Processing complete.

Structure of the data

The data consist of rows that contain information in several categories for every fio job that ran for the given workload.

The categories that are processed (separately for READ and WRITE):

  1. IOPS – input/output operations per second.
  2. Bandwidth (bytes/sec) – throughput in bytes per second.
  3. Latency Mean (µs) – mean latency over the whole duration of the given job (see the quick unit check after this list).
  4. Latency Min (µs) – minimum of the latency values measured over the whole duration of the given job.
  5. Latency Max (µs) – maximum of the latency values measured over the whole duration of the given job.
  6. Errors – the number of errors that occurred.
  7. Iteration – the iteration number of the benchmark for the given workload.
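
A note on the latency unit: fio reports clat_ns in nanoseconds, and the processing function divides these values by 1000, which yields microseconds. A quick check using the mean read completion latency from the JSON excerpt above:

mean_clat_ns = 6175.911782        # "clat_ns" mean of the btrfs "mixed" job shown earlier
print(mean_clat_ns / 1000, "us")  # ~6.18 microseconds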

An example of the folder ./benchmark_results/btrfs/fio_results/ processed into CSV by the script:

InΒ [55]:
df = pd.read_csv('processed_results/brtfs.csv')
df
Out[55]:
Job_Name Read_IOPS Read_Bandwidth Read_Latency_Mean Read_Latency_Min Read_Latency_Max Read_Errors Write_IOPS Write_Bandwidth Write_Latency_Mean Write_Latency_Min Write_Latency_Max Write_Errors Iteration
0 mixed 567.697743 2325289 537.2872390570001 60.455 53545.272 0 243.041899 995499 2831.60668333 8.28 465766.619 0 6
1 mixed 577.280757 2364541 547.535714591 59.938 57243.541 0 248.325056 1017139 2722.235577718 7.99 462862.361 0 6
2 mixed 562.773954 2305122 549.882368822 61.756 48186.059 0 239.846003 982409 2844.742298311 8.401 465344.578 0 6
3 mixed 560.590657 2296179 546.290920799 62.211 50474.684 0 248.21253 1016678 2762.9590984359997 8.82 450801.746 0 6
4 seqwrite 0.0 0 0.0 0.0 0.0 0 113.507986 119021749 8590.604406548999 382.103 16447037.637 0 6
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
417 randwrite 0.0 0 0.0 0.0 0.0 0 230.015472 942143 4335.342988066 6.995 480680.658 0 8
418 randwrite 0.0 0 0.0 0.0 0.0 0 230.747475 945141 4325.153932877 7.408 477860.984 0 8
419 randwrite 0.0 0 0.0 0.0 0.0 0 245.191988 1004306 4067.403516284 6.043 479348.142 0 8
420 randwrite 0.0 0 0.0 0.0 0.0 0 219.578744 899394 4541.969428095 7.506 479457.954 0 8
421 seqread 1310.717797 1374387225 646.892084237 82.43 15017.528 0 0.0 0 0.0 0.0 0.0 0 8

422 rows × 14 columns

Analysis of the Individual File Systems

The first analysis looks at each file system individually and examines how it responds to the different workloads.

First, we define a function that computes point estimates (mean and median) of the selected metrics for a given workload, together with a 95% confidence interval for Read_IOPS, plus helpers that plot the summaries and check whether the confidence intervals of different workloads overlap.
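
The interval itself is the usual t-based one: mean ± t(0.975, n-1) · s/√n. A minimal standalone sketch of the same computation (with made-up sample values standing in for the Read_IOPS of one workload):

import numpy as np
import scipy.stats as stats

sample = np.array([567.6, 574.2, 561.3, 570.9, 565.0, 572.8, 568.4, 563.7, 569.1, 566.2])  # toy data

confidence = 0.95
n = len(sample)
mean = sample.mean()
sem = sample.std(ddof=1) / np.sqrt(n)                 # standard error of the mean
h = sem * stats.t.ppf((1 + confidence) / 2., n - 1)   # half-width of the confidence interval

print(f"mean = {mean:.2f}, 95% CI = [{mean - h:.2f}, {mean + h:.2f}]")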

InΒ [56]:
def workload_summary(csv_path, workload):
    df = pd.read_csv(csv_path)
    df = df[df['Job_Name'] == workload]

    io_cols = [
        'Read_IOPS', 'Read_Bandwidth', 'Read_Latency_Mean',
        'Write_IOPS', 'Write_Bandwidth', 'Write_Latency_Mean'
    ]
    for col in io_cols:
        df[col] = pd.to_numeric(df[col], errors='coerce')
    
    stats_summary = df.agg(
        {
            'Read_IOPS': ['mean', 'median'],
            'Read_Bandwidth': ['mean'],
            'Read_Latency_Mean': ['mean'],
            'Write_IOPS': ['mean'],
            'Write_Bandwidth': ['mean'],
            'Write_Latency_Mean': ['mean']
        }
    ).reset_index()

    confidence = 0.95
    n = len(df)

    # Confidence interval for Read_IOPS
    mean = stats_summary.loc[0, 'Read_IOPS']
    std_dev = df['Read_IOPS'].std()
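    # Half-width of the 95% CI: the t-quantile (df = n-1) multiplied by the standard error of the mean.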
    h = std_dev * stats.t.ppf((1 + confidence) / 2., n - 1) / np.sqrt(n)

    stats_summary['Read_IOPS_CI_Lower'] = mean - h
    stats_summary['Read_IOPS_CI_Upper'] = mean + h

    return stats_summary

def filesystem_summary(processed_data_path, workloads, iterations):
    summaries = []
    
    for workload in workloads:
        for iteration in range(iterations):
            summary = workload_summary(processed_data_path, workload)
            summary['Workload'] = workload
            summary['Iteration'] = iteration
            summaries.append(summary)

    combined_summary = pd.concat(summaries, ignore_index=True)
    
    return combined_summary

def plot_fs_summary(combined_summary, filter_seq, metric, units, filesystem, description):

    mean_data = combined_summary[combined_summary['index'] == 'mean']
    if filter_seq:
        mean_data = mean_data[mean_data['Workload'].isin(['seqwrite', 'seqread'])]
    else:
        mean_data = mean_data[~mean_data['Workload'].isin(['seqwrite', 'seqread'])]
        
    mean_data = mean_data.drop(columns=['index'])
    final_data = mean_data.melt(id_vars=['Workload'], var_name='Metric', value_name='mean')
    final_data = final_data[final_data["Metric"].isin(metric)]
    
    plt.figure(figsize=(12, 8))
    sns.barplot(data=final_data, x='Workload', y='mean', hue='Metric', palette='viridis')

    plt.title(f'{filesystem} - {description}')
    plt.xlabel('Workload')
    plt.ylabel(f'Mean Value {units} across all jobs')
    plt.xticks(rotation=45)
    plt.legend(title=f'Metrics {units}')
    plt.tight_layout()

    plt.show()

def analyze_ci_overlaps(combined_summary):
    def check_overlap(row1, row2):
        return not (
            row1['Read_IOPS_CI_Upper'] < row2['Read_IOPS_CI_Lower'] or 
            row2['Read_IOPS_CI_Upper'] < row1['Read_IOPS_CI_Lower']
        )
        
    mean_data = combined_summary[combined_summary['index'] == 'mean']
    overlap_results = []
    for i in range(len(mean_data)):
        for j in range(i + 1, len(mean_data)):
            workload1 = mean_data.iloc[i]['Workload']
            workload2 = mean_data.iloc[j]['Workload']
            if mean_data.iloc[i]['Read_IOPS_CI_Lower'] is not None and mean_data.iloc[i]['Read_IOPS_CI_Upper'] is not None and \
               mean_data.iloc[j]['Read_IOPS_CI_Lower'] is not None and mean_data.iloc[j]['Read_IOPS_CI_Upper'] is not None:
                overlap = check_overlap(mean_data.iloc[i], mean_data.iloc[j])
                overlap_results.append((workload1, workload2, overlap))
                
    return overlap_results

fs_summaries = []
workloads = ['mixed', 'randread', 'randwrite', 'seqread', 'seqwrite']

We will calculate point estimates and a 95% confidence interval for the individual workloads. For visualization, because of the expected large difference in measured values, the workloads are split into two groups: sequential, and random plus mixed.

Btrfs

Btrfs (B-tree file system) is a modern file system for Linux; I use it personally, and it is suitable for both desktops and servers.

The main features that can affect benchmark results are:

  • Data Integrity Check - it uses checksums to ensure data integrity, meaning it can detect and repair corrupted files.

  • Copy-On-Write - with this mechanism, modified data is written to a new location instead of overwriting the original blocks. This influences write performance and allows for efficient snapshot creation without the need to duplicate data.

InΒ [57]:
csv_path = 'processed_results/brtfs.csv'
summaries = filesystem_summary(csv_path, workloads, 10)
fs_summaries.append([summaries, 'brtfs'])
summaries
Out[57]:
index Read_IOPS Read_Bandwidth Read_Latency_Mean Write_IOPS Write_Bandwidth Write_Latency_Mean Read_IOPS_CI_Lower Read_IOPS_CI_Upper Workload Iteration
0 mean 567.569589 2324764.575 666.100357 245.039915 1.003683e+06 2516.909446 562.811206 572.327972 mixed 0
1 median 574.166709 NaN NaN NaN NaN NaN 562.811206 572.327972 mixed 0
2 mean 567.569589 2324764.575 666.100357 245.039915 1.003683e+06 2516.909446 562.811206 572.327972 mixed 1
3 median 574.166709 NaN NaN NaN NaN NaN 562.811206 572.327972 mixed 1
4 mean 567.569589 2324764.575 666.100357 245.039915 1.003683e+06 2516.909446 562.811206 572.327972 mixed 2
... ... ... ... ... ... ... ... ... ... ... ...
95 median 0.000000 NaN NaN NaN NaN NaN 0.000000 0.000000 seqwrite 7
96 mean 0.000000 0.000 0.000000 124.603583 1.306563e+08 7760.988795 0.000000 0.000000 seqwrite 8
97 median 0.000000 NaN NaN NaN NaN NaN 0.000000 0.000000 seqwrite 8
98 mean 0.000000 0.000 0.000000 124.603583 1.306563e+08 7760.988795 0.000000 0.000000 seqwrite 9
99 median 0.000000 NaN NaN NaN NaN NaN 0.000000 0.000000 seqwrite 9

100 rows × 11 columns

InΒ [58]:
#results = analyze_ci_overlaps(summaries)
#results
InΒ [59]:
plot_fs_summary(summaries, filter_seq=False, metric=["Read_IOPS","Write_IOPS", "Read_IOPS_CI_Lower", "Read_IOPS_CI_Upper"], units="", filesystem="brtfs", description="Input/Output Operations per Second")
[figure: brtfs – Read/Write IOPS (with Read IOPS CI bounds) per workload]
InΒ [60]:
plot_fs_summary(summaries, filter_seq=True, metric=["Read_Bandwidth", "Write_Bandwidth"], units="(bytes/sec)", filesystem="brtfs", description="Bandwidth in Bytes per Second for Sequential Operations")
[figure: brtfs – Read/Write bandwidth (bytes/sec), sequential workloads]
InΒ [61]:
plot_fs_summary(summaries, filter_seq=False, metric=["Read_Bandwidth", "Write_Bandwidth"], units="(bytes/sec)", filesystem="brtfs", description="Bandwidth in Bytes per Second for Random and Mixed Operations")
[figure: brtfs – Read/Write bandwidth (bytes/sec), random and mixed workloads]
InΒ [62]:
plot_fs_summary(summaries, filter_seq=False, metric=["Read_Latency_Mean", "Write_Latency_Mean"], units="(ms)", filesystem="brtfs", description="Average Response in Milliseconds")
[figure: brtfs – mean Read/Write latency per workload]

The result is surprising to me, especially the values during sequential operations. What I find more interesting is that the ratio between READ/WRITE decreases when it comes to a random workload.

Ext4

Ext4 (Fourth Extended File System) is one of the most widely used file systems for Linux. It is an improved version of the previous file systems ext2 and ext3, suitable for desktop use.

The main features that can affect benchmark results are:

  • Journaling - all write operations are first recorded in a journal. This increases fault tolerance and allows for faster recovery after a system crash.

  • Extents - introduces the concept of extents, which are contiguous blocks on the disk that reduce fragmentation and improve performance during read and write operations.

  • Data Integrity Check - supports data integrity checking using checksums.

InΒ [63]:
csv_path = 'processed_results/ext4.csv'
summaries = filesystem_summary(csv_path, workloads, 10)
fs_summaries.append([summaries, 'ext4'])
summaries
Out[63]:
index Read_IOPS Read_Bandwidth Read_Latency_Mean Write_IOPS Write_Bandwidth Write_Latency_Mean Read_IOPS_CI_Lower Read_IOPS_CI_Upper Workload Iteration
0 mean 3705.347271 15177101.95 328.756066 1593.725267 6527898.2 11.719552 3411.27211 3999.422433 mixed 0
1 median 4064.973917 NaN NaN NaN NaN NaN 3411.27211 3999.422433 mixed 0
2 mean 3705.347271 15177101.95 328.756066 1593.725267 6527898.2 11.719552 3411.27211 3999.422433 mixed 1
3 median 4064.973917 NaN NaN NaN NaN NaN 3411.27211 3999.422433 mixed 1
4 mean 3705.347271 15177101.95 328.756066 1593.725267 6527898.2 11.719552 3411.27211 3999.422433 mixed 2
... ... ... ... ... ... ... ... ... ... ... ...
95 median 0.000000 NaN NaN NaN NaN NaN 0.00000 0.000000 seqwrite 7
96 mean 0.000000 0.00 0.000000 765.921937 803127360.0 965.512927 0.00000 0.000000 seqwrite 8
97 median 0.000000 NaN NaN NaN NaN NaN 0.00000 0.000000 seqwrite 8
98 mean 0.000000 0.00 0.000000 765.921937 803127360.0 965.512927 0.00000 0.000000 seqwrite 9
99 median 0.000000 NaN NaN NaN NaN NaN 0.00000 0.000000 seqwrite 9

100 rows × 11 columns

InΒ [64]:
#results = analyze_ci_overlaps(summaries)
#results
InΒ [65]:
plot_fs_summary(summaries, filter_seq=False, metric=["Read_IOPS","Write_IOPS", "Read_IOPS_CI_Lower", "Read_IOPS_CI_Upper"], units="", filesystem="ext4", description="Input/Output Operations per Second")
[figure: ext4 – Read/Write IOPS (with Read IOPS CI bounds) per workload]
InΒ [66]:
plot_fs_summary(summaries, filter_seq=True, metric=["Read_Bandwidth", "Write_Bandwidth"], units="(bytes/sec)", filesystem="ext4", description="Bandwidth in Bytes per Second for Sequential Operations")
[figure: ext4 – Read/Write bandwidth (bytes/sec), sequential workloads]
InΒ [67]:
plot_fs_summary(summaries, filter_seq=False, metric=["Read_Bandwidth", "Write_Bandwidth"], units="(bytes/sec)", filesystem="ext4", description="Bandwidth in Bytes per Second for Random and Mixed Operations")
[figure: ext4 – Read/Write bandwidth (bytes/sec), random and mixed workloads]
InΒ [68]:
plot_fs_summary(summaries, filter_seq=False, metric=["Read_Latency_Mean", "Write_Latency_Mean"], units="(ms)", filesystem="ext4", description="Average Response in Milliseconds")
[figure: ext4 – mean Read/Write latency per workload]

In the case of Ext4, the READ and WRITE results are faster than I expected; the bandwidth during random writes is especially impressive.

XFS

XFS is a high-performance file system that was originally developed by SGI (Silicon Graphics, Inc.) for the IRIX operating system. Today, XFS is widely used in Linux environments, especially for servers.

The main features that can affect benchmark results are:

  • Journaling - uses journaling to ensure data integrity, meaning that all write operations are recorded in a journal before they are executed.

  • Dynamic Allocation - supports dynamic block allocation, allowing files to grow without the need for prior planning. This improves performance when writing and reading large files.

  • Support for Large Files and Partitions - XFS supports files up to 8 exabytes in size and partitions up to 8 exabytes.

  • Fast File Processing - optimized for fast processing of large files and contiguous blocks.

  • Data Integrity Check - ensures data integrity using checksums.

InΒ [69]:
csv_path = 'processed_results/xfs.csv'
summaries = filesystem_summary(csv_path, workloads, 10)
fs_summaries.append([summaries, 'xfs'])
summaries
Out[69]:
index Read_IOPS Read_Bandwidth Read_Latency_Mean Write_IOPS Write_Bandwidth Write_Latency_Mean Read_IOPS_CI_Lower Read_IOPS_CI_Upper Workload Iteration
0 mean 2736.634934 1.120926e+07 431.161198 1177.463757 4.822891e+06 12.085936 2497.6283 2975.641568 mixed 0
1 median 2426.456113 NaN NaN NaN NaN NaN 2497.6283 2975.641568 mixed 0
2 mean 2736.634934 1.120926e+07 431.161198 1177.463757 4.822891e+06 12.085936 2497.6283 2975.641568 mixed 1
3 median 2426.456113 NaN NaN NaN NaN NaN 2497.6283 2975.641568 mixed 1
4 mean 2736.634934 1.120926e+07 431.161198 1177.463757 4.822891e+06 12.085936 2497.6283 2975.641568 mixed 2
... ... ... ... ... ... ... ... ... ... ... ...
95 median 0.000000 NaN NaN NaN NaN NaN 0.0000 0.000000 seqwrite 7
96 mean 0.000000 0.000000e+00 0.000000 1516.069565 1.589714e+09 519.398134 0.0000 0.000000 seqwrite 8
97 median 0.000000 NaN NaN NaN NaN NaN 0.0000 0.000000 seqwrite 8
98 mean 0.000000 0.000000e+00 0.000000 1516.069565 1.589714e+09 519.398134 0.0000 0.000000 seqwrite 9
99 median 0.000000 NaN NaN NaN NaN NaN 0.0000 0.000000 seqwrite 9

100 rows × 11 columns

InΒ [70]:
#results = analyze_ci_overlaps(summaries)
#results
InΒ [71]:
plot_fs_summary(summaries, filter_seq=False, metric=["Read_IOPS","Write_IOPS", "Read_IOPS_CI_Lower", "Read_IOPS_CI_Upper"], units="", filesystem="xfs", description="Input/Output Operations per Second")
[figure: xfs – Read/Write IOPS (with Read IOPS CI bounds) per workload]
InΒ [72]:
plot_fs_summary(summaries, filter_seq=False, metric=["Read_Bandwidth", "Write_Bandwidth"], units="(bytes/sec)", filesystem="xfs", description="Bandwidth in Bytes per Second for Random and Mixed Operations")
[figure: xfs – Read/Write bandwidth (bytes/sec), random and mixed workloads]
InΒ [73]:
plot_fs_summary(summaries, filter_seq=False, metric=["Read_Latency_Mean", "Write_Latency_Mean"], units="(ms)", filesystem="xfs", description="Average Response in Milliseconds")
[figure: xfs – mean Read/Write latency per workload]

In the case of XFS, its focus on high performance is noticeable. However, my hardware is quite limited and the virtual machine had only 2 processor cores available, so the parallelism for which XFS is known did not get the opportunity to shine. In the future, I would ideally like to collect data on a personal cluster.

ZFS

ZFS (Zettabyte File System) is an advanced file system and volume manager developed by Sun Microsystems. ZFS is suitable for servers and large data storage.

The main features that can affect benchmark results are:

  • Data Integrity Check - uses checksums to ensure integrity.

  • Copy-On-Write - with this mechanism, modified data is written to a new location instead of overwriting the original blocks. This influences write performance.

  • High Capacity - supports very large volumes and files, with theoretical limits of up to 256 zettabytes, making it ideal for large data storage.

InΒ [74]:
csv_path = 'processed_results/zfs.csv'
summaries = filesystem_summary(csv_path, workloads, 10)
fs_summaries.append([summaries, 'zfs'])
summaries
Out[74]:
index Read_IOPS Read_Bandwidth Read_Latency_Mean Write_IOPS Write_Bandwidth Write_Latency_Mean Read_IOPS_CI_Lower Read_IOPS_CI_Upper Workload Iteration
0 mean 338.054943 1384672.6 2039.391012 146.182581 5.987634e+05 13220.572681 300.31346 375.796425 mixed 0
1 median 318.069698 NaN NaN NaN NaN NaN 300.31346 375.796425 mixed 0
2 mean 338.054943 1384672.6 2039.391012 146.182581 5.987634e+05 13220.572681 300.31346 375.796425 mixed 1
3 median 318.069698 NaN NaN NaN NaN NaN 300.31346 375.796425 mixed 1
4 mean 338.054943 1384672.6 2039.391012 146.182581 5.987634e+05 13220.572681 300.31346 375.796425 mixed 2
... ... ... ... ... ... ... ... ... ... ... ...
95 median 0.000000 NaN NaN NaN NaN NaN 0.00000 0.000000 seqwrite 7
96 mean 0.000000 0.0 0.000000 104.092003 1.091484e+08 9749.291413 0.00000 0.000000 seqwrite 8
97 median 0.000000 NaN NaN NaN NaN NaN 0.00000 0.000000 seqwrite 8
98 mean 0.000000 0.0 0.000000 104.092003 1.091484e+08 9749.291413 0.00000 0.000000 seqwrite 9
99 median 0.000000 NaN NaN NaN NaN NaN 0.00000 0.000000 seqwrite 9

100 rows × 11 columns

InΒ [75]:
#results = analyze_ci_overlaps(summaries)
#results
InΒ [76]:
plot_fs_summary(summaries, filter_seq=False, metric=["Read_IOPS","Write_IOPS", "Read_IOPS_CI_Lower", "Read_IOPS_CI_Upper"], units="", filesystem="zfs", description="Input/Output Operations per Second")
[figure: zfs – Read/Write IOPS (with Read IOPS CI bounds) per workload]
InΒ [77]:
plot_fs_summary(summaries, filter_seq=True, metric=["Read_Bandwidth", "Write_Bandwidth"], units="(bytes/sec)", filesystem="zfs", description="Bandwidth in Bytes per Second for Sequential Operations")
[figure: zfs – Read/Write bandwidth (bytes/sec), sequential workloads]
InΒ [78]:
plot_fs_summary(summaries, filter_seq=False, metric=["Read_Bandwidth", "Write_Bandwidth"], units="(bytes/sec)", filesystem="zfs", description="Bandwidth in Bytes per Second for Random and Mixed Operations")
[figure: zfs – Read/Write bandwidth (bytes/sec), random and mixed workloads]
InΒ [79]:
plot_fs_summary(summaries, filter_seq=False, metric=["Read_Latency_Mean", "Write_Latency_Mean"], units="(ms)", filesystem="zfs", description="Average Response in Milliseconds")
[figure: zfs – mean Read/Write latency per workload]

Although I expected the ratio between READ/WRITE to be large, I did not anticipate that sequential reading would be so low. In the context of how ZFS operates and its features, the data makes sense.

Analysis of the File Systems in Relation to Each Other

My personal opinion, which I will be testing, is that regarding speed, there is no noticeable difference on the desktop for my use. When choosing a file system, I should focus on features such as snapshots, recovery, etc.

Thus, the null hypothesis I am testing is that there is no pair of file systems that would have a significant difference in the metrics of Inputs/Outputs per second (IOPS), Bandwidth, and Mean Latency for both READ and WRITE.

InΒ [80]:
final_summary = []
for fs_summary in fs_summaries:
    summary = fs_summary[0]
    summary['Filesystem'] = fs_summary[1]
    final_summary.append(summary)

final_data = pd.concat(final_summary, ignore_index=True)
final_data.ffill(inplace=True)
#print(final_data)

To start, we will try to create a simple regression model.

Independent Variables:

  1. Workload
  2. Filesystem

Dependent Variables:

  1. Read/Write IOPS
  2. Read/Write Latency Mean (µs)

The first version will use linear regression.

InΒ [81]:
# independent variables
x = final_data[['Filesystem', 'Workload']]
# dependent variables
y = final_data[['Read_IOPS', 'Write_IOPS', 'Read_Latency_Mean', 'Write_Latency_Mean']]

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

preprocessor = ColumnTransformer(
    transformers=[
        ('cat', OneHotEncoder(), ['Filesystem', 'Workload'])
    ])

model = Pipeline(steps=[('preprocessor', preprocessor),
                         ('regressor', LinearRegression())])

model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print('R^2 Score:', r2_score(y_test, y_pred))
print('MSE:', mean_squared_error(y_test, y_pred))
R^2 Score: 0.4826403430749351
MSE: 1836464.825941098

The resulting values are not good enough for me to conclude that the relationships are linear.

To be sure, I will try decision tree regression for the second version.

InΒ [82]:
model = Pipeline([
    ('preprocessor', preprocessor),
    ('regressor', DecisionTreeRegressor(
        max_depth=5,
        min_samples_leaf=5,
        random_state=42
    ))
])

model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print('R^2 Score:', r2_score(y_test, y_pred))
print('MSE:', mean_squared_error(y_test, y_pred))
R^2 Score: 0.91236505622809
MSE: 184343.8175389695

Decision tree regression is successful for the given data. However, it is possible that the model is overfitted and could be improved.

Visualization of the tree:

InΒ [83]:
regressor = model.named_steps['regressor']  # retrieve the fitted regressor from the pipeline
feature_names = model.named_steps['preprocessor'].transformers_[0][1].get_feature_names_out(['Filesystem', 'Workload'])

plt.figure(figsize=(20, 10))
plot_tree(regressor, feature_names=feature_names, filled=True)
plt.title('Decision Tree')
plt.show()
[figure: decision tree visualization]

To address the problem of overfitting, I can also try a Random Forest Regressor.

InΒ [84]:
model_rf = Pipeline([
    ('preprocessor', preprocessor),
    ('regressor', RandomForestRegressor(
        n_estimators=100,
        max_depth=10,
        min_samples_leaf=5,
        random_state=42,
        n_jobs=-1
    ))
])

model_rf.fit(X_train, y_train)
y_pred_rf = model_rf.predict(X_test)
print("RF R^2:", r2_score(y_test, y_pred_rf))
print("RF RMSE:", mean_squared_error(y_test, y_pred_rf))
RF R^2: 0.9982959626477509
RF RMSE: 9769.911615594548

The results look even better, so I can conclude that the relationships between the data are likely non-linear.
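
Because a single train/test split can be misleading, one way to probe for overfitting (a sketch, reusing the model_rf pipeline and the x/y frames defined above) is k-fold cross-validation:

from sklearn.model_selection import KFold, cross_val_score

# 5 shuffled folds; with the multi-output target, R^2 is averaged uniformly over the output columns.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(model_rf, x, y, cv=cv, scoring='r2')
print("CV R^2 per fold:", np.round(cv_scores, 3))
print("Mean CV R^2:", cv_scores.mean())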

Analysis of Variance (ANOVA)

Next, I will conduct an analysis of variance for the individual metrics across all tested file systems and use it to test the null hypothesis.
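
Formally, for each metric the one-way ANOVA with the file system as the factor tests

$$H_0:\ \mu_{\mathrm{btrfs}} = \mu_{\mathrm{ext4}} = \mu_{\mathrm{xfs}} = \mu_{\mathrm{zfs}} \qquad \text{vs.} \qquad H_1:\ \text{at least one } \mu_i \text{ differs},$$

and the test is only reliable when the residuals are approximately normal and the group variances are homogeneous. These two assumptions are checked below with the Shapiro–Wilk and Levene tests before each ANOVA is run.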

InΒ [85]:
def check_assumptions(data, metric, factor='Filesystem'):
    formula = f"{metric} ~ C({factor})"
    model = ols(formula, data=data).fit()
    resid = model.resid
    groups = [group[metric].values for name, group in data.groupby(factor)]

    print(f"\n=== Assumptions for metric '{metric}' ===")

    shapiro_stat, shapiro_p = stats.shapiro(resid)
    print(f"Shapiro-Wilk test: stat={shapiro_stat:.4f}, p={shapiro_p:.4f}")
    if shapiro_p < 0.05:
        print("!!! Residuals are likely not normally distributed (p < 0.05) !!!")

    sm.qqplot(resid, line='45', fit=True)
    plt.title(f"QQ-plot of residuals for {metric}")
    plt.show()

    # Homogeneity of variances
    levene_stat, levene_p = stats.levene(*groups, center='median')
    print(f"Levene's test: stat={levene_stat:.4f}, p={levene_p:.4f}")
    if levene_p < 0.05:
        print("!!! Variances between groups are not homogeneous (p < 0.05) !!!")

    plt.figure()
    sns.boxplot(x=factor, y=metric, data=data)
    plt.title(f"{metric} by {factor}")
    plt.show()

    return model

def perform_anova_with_checks(data, metrics, factor='Filesystem'):
    anova_results = {}
    for metric in metrics:
        model = check_assumptions(data, metric, factor)
        # ANOVA
        anova_table = sm.stats.anova_lm(model, typ=2)
        print("\nANOVA result:")
        print(anova_table)
        anova_results[metric] = anova_table
    return anova_results

Inputs/Outputs per second

InΒ [86]:
metrics_to_analyze = [
    'Read_IOPS',
    'Write_IOPS',
]
anova_results = perform_anova_with_checks(final_data, metrics_to_analyze, factor="Filesystem")
=== Assumptions for metric 'Read_IOPS' ===
Shapiro-Wilk test: stat=0.8510, p=0.0000
!!! Residuals are likely not normally distributed (p < 0.05) !!!
[figure: QQ-plot of residuals for Read_IOPS]
Levene's test: stat=6.3287, p=0.0003
!!! Variances between groups are not homogeneous (p < 0.05) !!!
[figure: boxplot of Read_IOPS by Filesystem]
ANOVA result:
                     sum_sq     df         F    PR(>F)
C(Filesystem)  3.100036e+07    3.0  2.706863  0.045019
Residual       1.511731e+09  396.0       NaN       NaN

=== Assumptions for metric 'Write_IOPS' ===
Shapiro-Wilk test: stat=0.6978, p=0.0000
!!! Residuals are likely not normally distributed (p < 0.05) !!!
[figure: QQ-plot of residuals for Write_IOPS]
Levene's test: stat=31.4532, p=0.0000
!!! Variances between groups are not homogeneous (p < 0.05) !!!
[figure: boxplot of Write_IOPS by Filesystem]
ANOVA result:
                     sum_sq     df          F        PR(>F)
C(Filesystem)  3.250426e+08    3.0  27.673086  2.883330e-16
Residual       1.550446e+09  396.0        NaN           NaN

For both READ and WRITE, the result of the assumption check indicates that the data are likely not normally distributed, so the ANOVA test cannot be considered reliable.

Bandwidth (bytes/sec)

InΒ [87]:
metrics_to_analyze = [
    'Read_Bandwidth',
    'Write_Bandwidth',
]
anova_results = perform_anova_with_checks(final_data, metrics_to_analyze, factor="Filesystem")
=== Assumptions for metric 'Read_Bandwidth' ===
Shapiro-Wilk test: stat=0.6402, p=0.0000
!!! Residuals are likely not normally distributed (p < 0.05) !!!
[figure: QQ-plot of residuals for Read_Bandwidth]
Levene's test: stat=2.9002, p=0.0348
!!! Variances between groups are not homogeneous (p < 0.05) !!!
[figure: boxplot of Read_Bandwidth by Filesystem]
ANOVA result:
                     sum_sq     df         F    PR(>F)
C(Filesystem)  7.406809e+18    3.0  2.883642  0.035613
Residual       3.390500e+20  396.0       NaN       NaN

=== Assumptions for metric 'Write_Bandwidth' ===
Shapiro-Wilk test: stat=0.6685, p=0.0000
!!! Residuals are likely not normally distributed (p < 0.05) !!!
[figure: QQ-plot of residuals for Write_Bandwidth]
Levene's test: stat=16.1055, p=0.0000
!!! Variances between groups are not homogeneous (p < 0.05) !!!
[figure: boxplot of Write_Bandwidth by Filesystem]
ANOVA result:
                     sum_sq     df          F        PR(>F)
C(Filesystem)  6.154461e+18    3.0  16.068696  7.079198e-10
Residual       5.055723e+19  396.0        NaN           NaN

For both READ and WRITE, the result of the assumption check indicates that the data are likely not normally distributed, so the ANOVA test cannot be considered reliable.

Latency (µs)

InΒ [88]:
metrics_to_analyze = [
    'Read_Latency_Mean',
    'Write_Latency_Mean',
]
anova_results = perform_anova_with_checks(final_data, metrics_to_analyze, factor="Filesystem")
=== Assumptions for metric 'Read_Latency_Mean' ===
Shapiro-Wilk test: stat=0.7896, p=0.0000
!!! Residuals are likely not normally distributed (p < 0.05) !!!
[figure: QQ-plot of residuals for Read_Latency_Mean]
Levene's test: stat=9.9822, p=0.0000
!!! Variances between groups are not homogeneous (p < 0.05) !!!
[figure: boxplot of Read_Latency_Mean by Filesystem]
ANOVA result:
                     sum_sq     df        F    PR(>F)
C(Filesystem)  3.266334e+06    3.0  5.21157  0.001538
Residual       8.273056e+07  396.0      NaN       NaN

=== Assumptions for metric 'Write_Latency_Mean' ===
Shapiro-Wilk test: stat=0.8692, p=0.0000
!!! Residuals are likely not normally distributed (p < 0.05) !!!
[figure: QQ-plot of residuals for Write_Latency_Mean]
Levene's test: stat=94.2678, p=0.0000
!!! Variances between groups are not homogeneous (p < 0.05) !!!
[figure: boxplot of Write_Latency_Mean by Filesystem]
ANOVA result:
                     sum_sq     df         F        PR(>F)
C(Filesystem)  1.677955e+09    3.0  58.20406  3.428079e-31
Residual       3.805405e+09  396.0       NaN           NaN

In this category, every assumption check failed as well.

Since at least one ANOVA assumption failed for every category, I will use a non-parametric alternative, the Kruskal–Wallis test.

InΒ [89]:
def perform_kruskal_wallis_tests(data, metrics, factor='Filesystem'):
    for metric in metrics:
        grouped = [group[metric].dropna().values
                   for _, group in data.groupby(factor)]
        
        stat, p = stats.kruskal(*grouped)
        print(f"Kruskal-Wallis for '{metric}': H = {stat:.4f}, p = {p:.4f}")
InΒ [90]:
metrics_to_test = ['Read_IOPS', 'Write_IOPS', 'Read_Bandwidth', 'Write_Bandwidth', 'Read_Latency_Mean', 'Write_Latency_Mean']
perform_kruskal_wallis_tests(final_data, metrics_to_test, factor='Filesystem')
Kruskal-Wallis for 'Read_IOPS': H = 2.4468, p = 0.4850
Kruskal-Wallis for 'Write_IOPS': H = 37.2742, p = 0.0000
Kruskal-Wallis for 'Read_Bandwidth': H = 0.6404, p = 0.8871
Kruskal-Wallis for 'Write_Bandwidth': H = 12.9371, p = 0.0048
Kruskal-Wallis for 'Read_Latency_Mean': H = 2.1775, p = 0.5364
Kruskal-Wallis for 'Write_Latency_Mean': H = 42.1416, p = 0.0000

P-values in the categories Write_IOPS, Write_Bandwidth, and Write_Latency_Mean are less than 0.05, indicating that there is a statistically significant difference for WRITE, so I reject the null hypothesis and accept the alternative.

In the case of READ, all p-values are above 0.05, so I cannot reject the null hypothesis.
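
The Kruskal–Wallis test only says that at least one pair of file systems differs; to locate the pairs, one could follow up with pairwise Mann–Whitney U tests and a multiple-comparison correction. This is a sketch that was not part of the original analysis; it reuses final_data and the already imported scipy.stats:

from itertools import combinations

def pairwise_mannwhitney(data, metric, factor='Filesystem', alpha=0.05):
    groups = {name: grp[metric].dropna().values for name, grp in data.groupby(factor)}
    pairs = list(combinations(groups, 2))
    corrected_alpha = alpha / len(pairs)   # Bonferroni correction for the number of comparisons
    for a, b in pairs:
        stat, p = stats.mannwhitneyu(groups[a], groups[b], alternative='two-sided')
        verdict = "significant" if p < corrected_alpha else "not significant"
        print(f"{metric}: {a} vs {b}: U={stat:.1f}, p={p:.4g} ({verdict} at alpha={corrected_alpha:.4f})")

pairwise_mannwhitney(final_data, 'Write_IOPS')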

Pairwise Analysis Between Btrfs and Ext4

The file systems ZFS and XFS are primarily intended for use on servers, while Ext4 and Btrfs are typically used on desktops. I would additionally like to test whether there is a difference within this desktop pair, as it is the most relevant for my setup.

Thus, the null hypothesis I am testing is that there is no significant difference in the metrics of Inputs/Outputs per second (IOPS), Bandwidth, and Mean Latency for READ and WRITE between ext4 and btrfs.

Since the data are likely not normally distributed, I will use the paired Wilcoxon signed-rank test.

InΒ [91]:
def wilcoxon_paired_test(
    data,
    metric,
    group_col = "Filesystem",
    group1 = "ext4",
    group2= "btrfs",
    index_cols = ["Workload", "Iteration"],
    plot_qq = True
):
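    # Pivot to wide format so each (Workload, Iteration) combination pairs the two file systems' values.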
    wide = (
        data
        .pivot_table(
            index=index_cols,
            columns=group_col,
            values=metric
        )
        .loc[:, [group1, group2]]
        .dropna()
    )

    diffs = wide[group1] - wide[group2]

    # normality of the differences
    shapiro_stat, shapiro_p = stats.shapiro(diffs)
    print(f"Shapiro-Wilk for {metric} diffs ({group1}-{group2}): "
          f"stat={shapiro_stat:.4f}, p={shapiro_p:.4f}")

    # QQ-plot
    if plot_qq:
        sm.qqplot(diffs, line="45", fit=True)
        plt.title(f"QQ-plot differences for {metric}")
        plt.show()

    # Wilcoxon
    try:
        w_stat, w_p = stats.wilcoxon(
            wide[group1],
            wide[group2],
            zero_method="wilcox",
            alternative="two-sided"
        )
    except ValueError as e:
        print(f"Wilcoxon test failed for {metric}: {e}")
        w_stat, w_p = None, None

    print(f"Wilcoxon for {metric}: stat={w_stat}, p={w_p}\n")

    return {
        "metric": metric,
        "shapiro_stat": shapiro_stat,
        "shapiro_p": shapiro_p,
        "wilcoxon_stat": w_stat,
        "wilcoxon_p": w_p,
        "n_pairs": len(diffs)
    }

def perform_wilcoxon_paired_tests(
    data,
    metrics,
    group_col = "Filesystem",
    group1 = "ext4",
    group2= "btrfs",
    index_cols = ["Workload", "Iteration"]
):
    results = []
    for metric in metrics:
        res = wilcoxon_paired_test(
            data=data,
            metric=metric,
            group_col=group_col,
            group1=group1,
            group2=group2,
            index_cols=index_cols
        )
        results.append(res)
    return pd.DataFrame(results)

First, we filter the final data down to Btrfs and Ext4 (keeping only the mean rows), and then we run the paired Wilcoxon test.

InΒ [92]:
final_data = final_data[final_data['Filesystem'].isin(['brtfs', 'ext4'])]
final_data = final_data[final_data['index'].isin(['mean'])]

#print(final_data)

Inputs/Outputs per second

InΒ [93]:
metrics_to_analyze = [
    'Read_IOPS',
    'Write_IOPS',
]
perform_wilcoxon_paired_tests(final_data, metrics_to_analyze, group_col = "Filesystem", group1= "ext4", group2 = "brtfs", index_cols = ["Workload", "Iteration"])
Shapiro-Wilk for Read_IOPS diffs (ext4-brtfs): stat=0.8203, p=0.0000
[figure: QQ-plot of Read_IOPS differences (ext4 - brtfs)]
Wilcoxon for Read_IOPS: stat=210.0, p=0.6390937131929089

Shapiro-Wilk for Write_IOPS diffs (ext4-brtfs): stat=0.6288, p=0.0000
[figure: QQ-plot of Write_IOPS differences (ext4 - brtfs)]
Wilcoxon for Write_IOPS: stat=0.0, p=1.2598486800387818e-06

Out[93]:
metric shapiro_stat shapiro_p wilcoxon_stat wilcoxon_p n_pairs
0 Read_IOPS 0.820291 2.659540e-06 210.0 0.639094 50
1 Write_IOPS 0.628792 5.341631e-10 0.0 0.000001 50

The p-value is less than 0.05 in the case of Write_IOPS, indicating that there is a statistically significant difference, so I reject the null hypothesis for Write_IOPS.

Bandwidth (bytes/sec)

InΒ [94]:
metrics_to_analyze = [
    'Read_Bandwidth',
    'Write_Bandwidth',
]
perform_wilcoxon_paired_tests(final_data, metrics_to_analyze, group_col = "Filesystem", group1= "ext4", group2 = "brtfs", index_cols = ["Workload", "Iteration"])
Shapiro-Wilk for Read_Bandwidth diffs (ext4-brtfs): stat=0.5277, p=0.0000
[figure: QQ-plot of Read_Bandwidth differences (ext4 - brtfs)]
Wilcoxon for Read_Bandwidth: stat=155.0, p=0.10623959199707807

Shapiro-Wilk for Write_Bandwidth diffs (ext4-brtfs): stat=0.5192, p=0.0000
[figure: QQ-plot of Write_Bandwidth differences (ext4 - brtfs)]
Wilcoxon for Write_Bandwidth: stat=0.0, p=1.2598486800387818e-06

Out[94]:
metric shapiro_stat shapiro_p wilcoxon_stat wilcoxon_p n_pairs
0 Read_Bandwidth 0.527711 1.890011e-11 155.0 0.106240 50
1 Write_Bandwidth 0.519150 1.457871e-11 0.0 0.000001 50

The p-value is less than 0.05 in the case of Write_Bandwidth, indicating that there is a statistically significant difference, so I reject the null hypothesis for Write_Bandwidth.

Latency (µs)

InΒ [95]:
metrics_to_analyze = [
    'Read_Latency_Mean',
    'Write_Latency_Mean',
]
perform_wilcoxon_paired_tests(final_data, metrics_to_analyze, group_col = "Filesystem", group1= "ext4", group2 = "brtfs", index_cols = ["Workload", "Iteration"])
Shapiro-Wilk for Read_Latency_Mean diffs (ext4-brtfs): stat=0.8330, p=0.0000
[figure: QQ-plot of Read_Latency_Mean differences (ext4 - brtfs)]
Wilcoxon for Read_Latency_Mean: stat=210.0, p=0.6390937131929089

Shapiro-Wilk for Write_Latency_Mean diffs (ext4-brtfs): stat=0.8199, p=0.0000
[figure: QQ-plot of Write_Latency_Mean differences (ext4 - brtfs)]
Wilcoxon for Write_Latency_Mean: stat=0.0, p=1.2598486800387818e-06

Out[95]:
metric shapiro_stat shapiro_p wilcoxon_stat wilcoxon_p n_pairs
0 Read_Latency_Mean 0.832988 0.000006 210.0 0.639094 50
1 Write_Latency_Mean 0.819868 0.000003 0.0 0.000001 50

The p-value is less than 0.05 in the case of Write_Latency_Mean, indicating that there is a statistically significant difference, so I reject the null hypothesis for Write_Latency_Mean.
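
Statistical significance alone says nothing about the size of the difference. A quick way to gauge the magnitude (a sketch using final_data, which at this point contains only the mean rows for brtfs and ext4) is to compare the per-filesystem medians of the WRITE metrics:

write_metrics = ['Write_IOPS', 'Write_Bandwidth', 'Write_Latency_Mean']
medians = final_data.groupby('Filesystem')[write_metrics].median()
print(medians)
print(medians.loc['ext4'] / medians.loc['brtfs'])   # relative difference of ext4 versus brtfs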

Conclusion of the Analysis of Btrfs vs. Ext4

For WRITE, a statistically significant difference was found in every category. This means there is a difference in write performance; however, for me and my setup, writing is the less critical part, so my personal conclusion is that I will focus on non-performance-based criteria when choosing a file system.