4 changes: 4 additions & 0 deletions docs/.vitepress/en.ts
@@ -82,6 +82,10 @@ function sidebar() {
text: "Simple Linear Regression",
link: "/en/statistics/simple-linear-regression",
},
{
text: "Variance and Standard Deviation",
link: "/en/statistics/variance-standard-deviation",
},
],
},
{
4 changes: 4 additions & 0 deletions docs/.vitepress/id.ts
@@ -71,6 +71,10 @@ function sidebar() {
text: "Simple Linear Regression",
link: "/statistics/simple-linear-regression",
},
{
text: "Variance and Standard Deviation",
link: "/statistics/variance-standard-deviation",
},
],
},
{
134 changes: 134 additions & 0 deletions docs/en/statistics/variance-standard-deviation.md
@@ -0,0 +1,134 @@
---
contributors:
- name: "Fadjar"
username: "fadjarrafi"
---
# Variance and Standard Deviation

Variance is a fundamental concept in statistics that measures how far a set of numbers spreads out from its average (mean) value. Think of variance as the "average" distance between the data points and their center, with one twist: the distances are squared before they are averaged, which has important mathematical consequences.

Imagine you're measuring tree heights in two different forests. In Forest A, most trees are around 30 meters tall with little variation. Meanwhile, in Forest B, the tree heights vary greatly, ranging from 10 meters to 50 meters. Although the average tree height in both forests is 30 meters, it's clear that the variation in tree heights differs between the two forests. Variance helps us quantify this difference.

## Mathematical Definition

Let's break down the variance formula step by step:

1. First, find the mean value (μ) of the data.
2. For each value in the data:
   - Subtract the mean (to get the distance from the center).
   - Square the result (to make all values positive and emphasize larger differences).
3. Calculate the average of all squared differences.

The formula is:

$$
\sigma^2 = \frac{\sum (x - \mu)^2}{N}
$$

Where:

| Symbol | Meaning |
|--------|---------|
| $\sigma^2$ | Variance |
| $x$ | Each value in the dataset |
| $\mu$ | Mean of the dataset, calculated as $\mu = \frac{\sum x}{N}$ |
| $N$ | Total number of values in the dataset |
| $\sum$ | Sigma notation, meaning "sum up everything that follows" |
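
As a worked example, take five tree heights of 28, 29, 30, 31, and 32 meters (the Forest A values used in the code on this page). Following the three steps by hand looks roughly like this:

$$
\begin{aligned}
\mu &= \frac{28 + 29 + 30 + 31 + 32}{5} = 30 \\
\sum (x - \mu)^2 &= (-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2 = 10 \\
\sigma^2 &= \frac{10}{5} = 2
\end{aligned}
$$

Because the heights are in meters, the variance of 2 is in square meters.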

## Programming Implementation

Here's a JavaScript implementation for calculating variance:

```javascript
// Calculate the mean of an array of numbers
// (returns NaN for an empty array, because 0 / 0 is NaN)
const calculateMean = (numbers) => {
  return numbers.reduce((sum, num) => sum + num, 0) / numbers.length;
};

// Calculate the population variance: the mean of the squared differences
const calculateVariance = (numbers) => {
  const mean = calculateMean(numbers);
  const squaredDifferences = numbers.map((num) => (num - mean) ** 2);
  return calculateMean(squaredDifferences);
};

// Calculate the standard deviation: the square root of the variance
const calculateStandardDeviation = (numbers) => {
  return Math.sqrt(calculateVariance(numbers));
};

// Print a short summary for one dataset
const exploreVariance = (datasetName, numbers) => {
  console.log(`\n${datasetName} Analysis:`);
  console.log(`Data: [${numbers.join(', ')}]`);
  console.log(`Mean: ${calculateMean(numbers).toFixed(2)}`);
  console.log(`Variance: ${calculateVariance(numbers).toFixed(2)}`);
  console.log(`Standard Deviation: ${calculateStandardDeviation(numbers).toFixed(2)}`);
};

// Example datasets with the same mean (30) but different variances
const forestA = [28, 29, 30, 31, 32];
const forestB = [10, 20, 30, 40, 50];

exploreVariance('Forest A (Uniform Heights)', forestA);
exploreVariance('Forest B (Varying Heights)', forestB);
// Forest A: variance 2.00, standard deviation 1.41
// Forest B: variance 200.00, standard deviation 14.14
```

## Why Square the Differences?

Squaring the differences has several important benefits:

1. **Handling Negative Values**: Differences can be negative or positive, so they would cancel each other out if summed directly. Squaring makes them all non-negative.
2. **Emphasizing Outliers**: Squaring magnifies large differences more than small ones, so values far from the mean contribute disproportionately and variance becomes sensitive to outliers.
3. **Mathematical Benefits**: The squared difference is a smooth function that can be differentiated everywhere (unlike the absolute value, which has a kink at zero), which is useful in many optimization problems.
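
The first two points can be checked numerically. The sketch below is only an illustration: it reuses the Forest B heights from this page, and the variable names are made up for this example. It shows the raw differences cancelling to zero, and compares the mean absolute difference with the mean squared difference.

```javascript
// Illustrative names only; the data is the Forest B example from this page.
const heights = [10, 20, 30, 40, 50];
const mean = heights.reduce((sum, h) => sum + h, 0) / heights.length;
const deviations = heights.map((h) => h - mean);

// Raw deviations cancel out: the negatives offset the positives.
const sumOfDeviations = deviations.reduce((sum, d) => sum + d, 0);

// Absolute values avoid the cancellation but weight every meter equally.
const meanAbsoluteDeviation =
  deviations.reduce((sum, d) => sum + Math.abs(d), 0) / heights.length;

// Squares also avoid the cancellation and weight distant points more heavily.
const variance =
  deviations.reduce((sum, d) => sum + d * d, 0) / heights.length;

console.log({ sumOfDeviations, meanAbsoluteDeviation, variance });
// sumOfDeviations: 0, meanAbsoluteDeviation: 12, variance: 200
```

The two trees that are 20 meters from the mean account for 800 of the 1000 squared units, but only 40 of the 60 absolute units.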

## Interpreting Variance and Standard Deviation in Statistics

Variance and standard deviation aren't just abstract numbers; they describe how the data is distributed. Here are some ways to interpret them in statistical analysis:

1. **Small Variance and Standard Deviation**

   - If a dataset has a small variance and standard deviation, its values tend to cluster around the mean.
   - Example: In car parts production, a small standard deviation indicates that the produced parts are very consistent in size.

2. **Large Variance and Standard Deviation**

   - If the variance and standard deviation are large, the data is spread far from the mean, indicating high variability.
   - Example: In stock investments, a large standard deviation indicates high volatility, meaning stock prices can fluctuate significantly.

3. **Comparison between Datasets**

   - Two datasets can have the **same mean but different variances**, indicating different levels of data spread.
   - Example: If two schools have the same average test score, but one school has a higher standard deviation, then the scores in that school are spread more widely between low and high results.

4. **Relationship with Normal Distribution**

   - In a normal distribution, **about 68% of the data falls within one standard deviation of the mean**, **about 95% within two standard deviations**, and **about 99.7% within three standard deviations**.
   - This gives a quick way to flag potential outliers: in roughly normal data, a point more than two or three standard deviations from the mean is unusual.
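
A minimal sketch of that outlier check, assuming the data is roughly normal. The helper name `findOutliers`, its default threshold, and the sample heights are illustrative assumptions rather than part of this page's own code. With only ten values, no point can lie more than three population standard deviations from the mean, so the example passes a threshold of 2.

```javascript
// Hypothetical helper: returns the values that lie more than `threshold`
// population standard deviations away from the mean.
const findOutliers = (numbers, threshold = 3) => {
  const mean = numbers.reduce((sum, n) => sum + n, 0) / numbers.length;
  const variance =
    numbers.reduce((sum, n) => sum + (n - mean) ** 2, 0) / numbers.length;
  const standardDeviation = Math.sqrt(variance);

  // All values identical: nothing can be an outlier, and we avoid dividing by 0.
  if (standardDeviation === 0) return [];

  return numbers.filter(
    (n) => Math.abs(n - mean) / standardDeviation > threshold
  );
};

// Made-up tree heights in meters, with one suspicious measurement.
const heights = [30, 31, 29, 30, 32, 28, 30, 31, 29, 60];
console.log(findOutliers(heights, 2)); // flags 60
```

Note that the outlier itself inflates the mean and standard deviation used to judge it, which is one reason such a rule is a rough screen rather than a formal test.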

## Types of Variance

There are two main types of variance in statistics, depending on whether we're analyzing the entire population or just a sample:

1. **Population Variance (σ²)**

   - Used when we have **all data in the population**.
   - Formula: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
   - Example: If we have weight data for **all** residents of a city, we use population variance.

2. **Sample Variance (s²)**

   - Used when we only have **partial data from the population** (a sample).
   - Formula: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$, where $\bar{x}$ is the sample mean and $n$ is the sample size.
   - The main difference from population variance is **dividing by (n - 1) instead of N**. This adjustment is known as **Bessel's correction**; $n - 1$ is the number of **degrees of freedom** left after the sample mean has been estimated from the same data. It compensates for the tendency of the divide-by-$n$ formula to underestimate the population variance.
   - Example: If we only have weight data from **100 people** in the city (not the entire population), we use sample variance.
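
The two formulas differ only in the divisor, so one function can cover both. The sketch below is one possible shape; the name `computeVariance` and its `sample` option are assumptions made for this example.

```javascript
// Hypothetical helper: population variance by default (divide by N),
// sample variance when `sample` is true (divide by n - 1).
const computeVariance = (numbers, { sample = false } = {}) => {
  const n = numbers.length;
  // The sample formula needs at least two values, otherwise n - 1 is 0.
  if (n === 0 || (sample && n < 2)) {
    throw new RangeError('Not enough data points to compute the variance');
  }
  const mean = numbers.reduce((sum, x) => sum + x, 0) / n;
  const sumOfSquares = numbers.reduce((sum, x) => sum + (x - mean) ** 2, 0);
  return sumOfSquares / (sample ? n - 1 : n);
};

const forestA = [28, 29, 30, 31, 32];
console.log(computeVariance(forestA));                   // 2   (population formula)
console.log(computeVariance(forestA, { sample: true })); // 2.5 (sample formula)
```

Throwing on too few data points is a design choice for this sketch; returning `NaN` would be another reasonable option.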

## When to Use Population vs. Sample Variance?

- Use **population variance** if you have **all the data** from the population you want to analyze.
- Use **sample variance** if you only have **partial data** and want to estimate the variance for the entire population.
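
To get a feel for how much the choice matters, note that both formulas share the same numerator, so on the same data the sample variance is just the divide-by-$n$ result scaled by $\frac{n}{n - 1}$. As an illustration, the five heights 28, 29, 30, 31, 32 give 2 when divided by $n$:

$$
\begin{aligned}
s^2 &= \frac{n}{n - 1} \cdot \frac{\sum (x - \bar{x})^2}{n} \\
n = 5: \quad s^2 &= \frac{5}{4} \cdot 2 = 2.5 \\
n = 100: \quad \frac{n}{n - 1} &= \frac{100}{99} \approx 1.01
\end{aligned}
$$

With five values the correction raises the result by 25%; with 100 values it is only about 1%, so the distinction matters most for small samples.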

## Relationship with Standard Deviation

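Standard deviation ($\sigma$) is the square root of the variance: $\sigma = \sqrt{\sigma^2}$. Variance is convenient in mathematical calculations, but it is expressed in squared units (square meters for the tree heights). Standard deviation is more intuitive because it has the same units as the original data.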
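
For example, the two forest datasets from the code above (heights in meters) have population variances of 2 and 200. Taking the square roots gives:

$$
\sigma_A = \sqrt{2\ \text{m}^2} \approx 1.41\ \text{m}, \qquad
\sigma_B = \sqrt{200\ \text{m}^2} \approx 14.14\ \text{m}
$$

Read this way, heights in Forest B deviate from the 30-meter average by roughly 14 meters, compared with about 1.4 meters in Forest A.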
132 changes: 132 additions & 0 deletions docs/id/statistics/variance-standard-deviation.md
@@ -0,0 +1,132 @@
---
contributors:
- name: "Fadjar"
username: "fadjarrafi"
---
# Variance and Standard Deviation

Variance is a basic concept in statistics that measures how far a set of numbers spreads out from its average (mean) value. Think of variance as the "average" distance between the data points and their center, with one twist: the distances are squared, which has important mathematical consequences.

Imagine you are measuring tree heights in two different forests. In Forest A, most trees are about 30 meters tall with little variation. In Forest B, the heights vary widely, from 10 meters to 50 meters. Although the average tree height in both forests is 30 meters, the amount of variation clearly differs between them. Variance lets us quantify this difference.

## Formula Definition

Let's break the variance formula down step by step:

1. First, find the mean (μ) of the data.
2. For each value in the data:
   - Subtract the mean (to get the distance from the center).
   - Square the result (so that all values are positive and larger differences are emphasized).
3. Take the average of all the squared differences.

The formula is:

$$
\sigma^2 = \frac{\sum (x - \mu)^2}{N}
$$

Where:

| Symbol | Meaning |
|--------|---------|
| $\sigma^2$ | Variance |
| $x$ | Each value in the dataset |
| $\mu$ | Mean of the dataset, calculated as $\mu = \frac{\sum x}{N}$ |
| $N$ | Total number of values in the dataset |
| $\sum$ | Sigma notation, meaning "sum up everything that follows" |

## Programming Implementation

Here is a JavaScript implementation for calculating variance:

```javascript
// Calculate the mean of an array of numbers
const calculateMean = (numbers) => {
  return numbers.reduce((sum, num) => sum + num, 0) / numbers.length;
};

// Main function: calculate the (population) variance
const calculateVariance = (numbers) => {
  const mean = calculateMean(numbers);
  const squaredDifferences = numbers.map((num) => (num - mean) ** 2);
  return calculateMean(squaredDifferences);
};

// Calculate the standard deviation
const calculateStandardDeviation = (numbers) => {
  return Math.sqrt(calculateVariance(numbers));
};

// Example usage
const exploreVariance = (datasetName, numbers) => {
  console.log(`\n${datasetName} Analysis:`);
  console.log(`Data: [${numbers.join(', ')}]`);
  console.log(`Mean: ${calculateMean(numbers).toFixed(2)}`);
  console.log(`Variance: ${calculateVariance(numbers).toFixed(2)}`);
  console.log(`Standard Deviation: ${calculateStandardDeviation(numbers).toFixed(2)}`);
};

// Example datasets with different variances
const hutanA = [28, 29, 30, 31, 32];
const hutanB = [10, 20, 30, 40, 50];

exploreVariance('Forest A (Uniform Heights)', hutanA);
exploreVariance('Forest B (Varying Heights)', hutanB);
// The output shows that Forest B has a larger variance than Forest A
```

## Why Square the Differences?

Squaring the differences has several important benefits:

1. **Handling Negative Values**: Differences can be negative or positive, but squaring makes them all positive.
2. **Emphasizing Outliers**: Values far from the mean produce larger differences once squared, so variance is more sensitive to outliers.
3. **Mathematical Benefits**: Squaring gives a smoother function that can be differentiated, which is useful in many optimization problems.

## Interpreting Variance and Standard Deviation in Statistics

Variance and standard deviation are not just mathematical numbers; they also carry important meaning for understanding how data is distributed. Here are some ways to interpret the two metrics in statistical analysis:

1. **Small Variance and Standard Deviation**

   - If the variance and standard deviation of a dataset are small, its values tend to cluster around the mean.
   - Example: In car parts production, a small standard deviation indicates that the size of the parts produced is very consistent.

2. **Large Variance and Standard Deviation**

   - If the variance and standard deviation are large, the data is spread far from the mean, indicating high variability.
   - Example: In stock investments, a large standard deviation indicates high volatility, meaning stock prices can fluctuate significantly.

3. **Comparison between Datasets**

   - Two datasets can have the **same mean but different variances**, which indicates different levels of data spread.
   - Example: If two schools have the same average exam score, but one school has a higher standard deviation, then the scores in that school differ more extremely than in the other school.

4. **Relationship with Normal Distribution**

   - In a normal distribution, **about 68% of the data lies within one standard deviation of the mean**, **95% within two standard deviations**, and **99.7% within three standard deviations**.
   - This is very useful in data analysis for deciding whether a data point is an outlier.

## Types of Variance

There are two main types of variance in statistics, depending on whether we are analyzing the whole population or only a sample of it:

1. **Population Variance (σ²)**

   - Used when we have **all the data in the population**.
   - Formula: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
   - Example: If we have body weight data for **all** residents of a city, we use population variance.

2. **Sample Variance (s²)**

   - Used when we only have **part of the data from the population** (a sample).
   - Formula: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
   - The main difference from population variance is **dividing by (n - 1) instead of N**. This adjustment is known as Bessel's correction; $n - 1$ is the number of **degrees of freedom**, and using it avoids bias when estimating the population variance.
   - Example: If we only have body weight data from **100 people** in the city (not the whole population), we use sample variance.

## When to Use Population vs. Sample Variance?

- Use **population variance** if you have **all the data** from the population you want to analyze.
- Use **sample variance** if you only have **part of the data** and want to estimate the variance of the whole population.

## Relationship with Standard Deviation

Standard deviation (σ) is the square root of variance. Although variance is useful in mathematical calculations, standard deviation is more intuitive because it has the same units as the original data.