
📈 Performance & Optimization

Monitoring Targets

To maintain and improve mining performance, monitor the following metrics regularly:

| Metric | Target | How to Check | Priority |
|---|---|---|---|
| Uptime | >99.5% | uptime, monitoring tool | ⭐⭐⭐⭐⭐ |
| Response Time | Under 5 seconds (subnet-dependent) | Miner logs, timing | ⭐⭐⭐⭐⭐ |
| Error Rate | Under 1% | Log analysis | ⭐⭐⭐⭐ |
| GPU Utilization | 70-95% | nvidia-smi | ⭐⭐⭐⭐ |
| GPU Temperature | Under 80°C | nvidia-smi | ⭐⭐⭐ |
| RAM Usage | Under 85% | free -h | ⭐⭐⭐ |
| Disk Usage | Under 80% | df -h | ⭐⭐⭐ |
| Network Latency | Under 100 ms to validators | ping | ⭐⭐⭐ |
| Weight/Rank | Increasing or stable | btcli subnet metagraph | ⭐⭐⭐⭐⭐ |
| Daily Earnings | On target | Taostats, wallet balance | ⭐⭐⭐⭐ |
Baseline First

Before optimizing, record your baseline metrics. Without a baseline, you cannot measure whether an optimization worked.
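
As a minimal sketch of recording that baseline (using only the commands that appear later on this page), you could append a timestamped snapshot to a log file:

#!/bin/bash
# baseline.sh - append a timestamped snapshot of key metrics to baseline.log
{
  echo "=== $(date) ==="
  nvidia-smi --query-gpu=temperature.gpu,utilization.gpu,memory.used,memory.total --format=csv,noheader
  free -h | grep Mem
  df -h / | tail -1
} >> baseline.log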


🔍 Monitoring Tools

Built-in Tools

# Real-time GPU monitoring
watch -n 1 nvidia-smi

# System resources
htop

# Disk I/O
iotop

# Network
iftop

Simple Monitoring Script

#!/bin/bash
# monitor.sh - Run with: bash monitor.sh

echo "=== Bittensor Miner Monitor ==="
echo "Waktu: $(date)"
echo ""

echo "--- GPU Status ---"
nvidia-smi --query-gpu=name,temperature.gpu,utilization.gpu,memory.used,memory.total --format=csv,noheader

echo ""
echo "--- Memory ---"
free -h | grep Mem

echo ""
echo "--- Disk ---"
df -h / | tail -1

echo ""
echo "--- Miner Process ---"
pgrep -a -f "miner.py" || echo "MINER NOT RUNNING!"

echo ""
echo "--- Network Latency ---"
ping -c 3 entrypoint-finney.opentensor.ai | tail -1
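
To run this check on a schedule, one option (an assumption beyond the original script) is a cron entry:

# Run monitor.sh every 5 minutes and append its output to a log
# (hypothetical paths; adjust to wherever you saved the script)
*/5 * * * * bash /path/to/monitor.sh >> /path/to/monitor.log 2>&1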

Taostats

Taostats is a web dashboard for monitoring:

| Feature | Description |
|---|---|
| Subnet Overview | Per-subnet statistics (emission, number of miners, etc.) |
| Miner Ranking | Your miner's position relative to competitors |
| Emission Tracker | TAO received per epoch |
| Validator Stats | Performance of the validators scoring you |
| Historical Data | Performance trends over time |

Prometheus + Grafana (Advanced)

For more serious monitoring, set up Prometheus + Grafana:

# Install Prometheus node exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
./node_exporter-1.7.0.linux-amd64/node_exporter &

# Install NVIDIA GPU exporter
# Exposes GPU metrics to Prometheus
docker run -d --gpus all -p 9400:9400 utkuozdemir/nvidia_gpu_exporter:1.2.0
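
A minimal scrape configuration tying both exporters into Prometheus might look like this (a sketch: it assumes node_exporter on its default port 9100 and the GPU exporter on port 9400 as mapped above):

# prometheus.yml (sketch): scrape both exporters
cat > prometheus.yml <<'EOF'
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['localhost:9100']  # node_exporter default port
  - job_name: gpu
    static_configs:
      - targets: ['localhost:9400']  # nvidia_gpu_exporter, mapped above
EOF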

🚀 Optimization Strategies

1. Model Caching

Avoid loading the model repeatedly:

# Example: cache the model in memory
import functools
import torch

@functools.lru_cache(maxsize=1)
def load_model():
    """Load the model once and cache it in memory."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    model = AutoModelForCausalLM.from_pretrained(
        "model-name",
        torch_dtype=torch.float16,
        device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained("model-name")
    return model, tokenizer

2. Response Caching

Cache responses for similar queries:

import hashlib

# Cache responses keyed by a hash of the query
response_cache = {}

def get_cached_response(query: str):
    query_hash = hashlib.md5(query.encode()).hexdigest()
    if query_hash in response_cache:
        return response_cache[query_hash]
    return None

def cache_response(query: str, response: str):
    query_hash = hashlib.md5(query.encode()).hexdigest()
    response_cache[query_hash] = response
    # Limit cache size
    if len(response_cache) > 10000:
        # Drop the 1000 oldest entries (dicts preserve insertion order in Python 3.7+)
        oldest = list(response_cache.keys())[:1000]
        for key in oldest:
            del response_cache[key]

Be Careful with Caching

Some subnets may detect and penalize cached (identical) responses. Use caching judiciously and make sure it complies with the subnet's rules.

3. Batch Processing

Process multiple queries at once when possible:

# Batch inference for better GPU efficiency
def batch_inference(queries: list, model, tokenizer, batch_size=8):
    results = []
    for i in range(0, len(queries), batch_size):
        batch = queries[i:i+batch_size]
        inputs = tokenizer(batch, return_tensors="pt", padding=True).to("cuda")
        outputs = model.generate(**inputs, max_new_tokens=256)
        results.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return results

4. Model Quantization

Reduce the model size for faster inference:

# 4-bit quantization with bitsandbytes
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "model-name",
    quantization_config=quantization_config,
    device_map="auto"
)

| Quantization | VRAM Savings | Quality Impact | Speed Impact |
|---|---|---|---|
| FP32 (baseline) | 0% | Best | Slowest |
| FP16 | ~50% | Minimal | Faster |
| 8-bit | ~75% | Small | Faster |
| 4-bit | ~87% | Moderate | Fastest |

5. Worker Scaling

Run multiple workers if your GPUs support it:

# Use multiprocessing for parallel inference
import torch.multiprocessing as mp

def worker(gpu_id, task_queue, result_queue):
    model = load_model_on_gpu(gpu_id)  # placeholder: load the model onto this GPU
    while True:
        task = task_queue.get()
        if task is None:  # sentinel value signals shutdown
            break
        result = inference(model, task)  # placeholder: run inference for the task
        result_queue.put(result)
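
A minimal sketch of wiring these workers together (assuming two GPUs; load_model_on_gpu and inference are the placeholders from the snippet above):

# Spawn one worker per GPU and feed tasks through shared queues
if __name__ == "__main__":
    mp.set_start_method("spawn")  # required when CUDA is used with multiprocessing
    task_queue, result_queue = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(gpu_id, task_queue, result_queue))
             for gpu_id in range(2)]
    for p in procs:
        p.start()
    tasks = ["query 1", "query 2", "query 3"]
    for task in tasks:
        task_queue.put(task)
    for _ in tasks:      # collect one result per task
        print(result_queue.get())
    for _ in procs:      # send the shutdown sentinel to each worker
        task_queue.put(None)
    for p in procs:
        p.join()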

🔄 Decision Framework: Upgrade Hardware vs Optimize Software

When to Upgrade Hardware?

| Situation | Action |
|---|---|
| GPU utilization consistently >95% | Upgrade the GPU |
| VRAM full and no further quantization possible | Upgrade to a GPU with more VRAM |
| Response time still slow after optimization | Upgrade to a newer GPU generation |
| Multiple subnets / scaling up | Add GPUs |
| Network bottleneck | Upgrade the internet connection / move VPS |

When to Optimize Software?

| Situation | Action |
|---|---|
| Low GPU utilization (under 50%) | Optimize batching |
| Slow model loading | Implement caching |
| Many duplicate responses | Response caching |
| Memory leak (RAM keeps climbing) | Profile and fix memory management (see the sketch below) |
| Long startup time | Preload the model, warm up |
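
For the memory-leak case, Python's built-in tracemalloc module is one way to find where allocations accumulate (a generic sketch, not tied to any specific miner code):

import tracemalloc

tracemalloc.start()

# ... exercise the suspected leaky code path here ...

snapshot = tracemalloc.take_snapshot()
# Print the five source lines holding the most allocated memory
for stat in snapshot.statistics("lineno")[:5]:
    print(stat)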

📊 Benchmarking Tips

Benchmark Response Time

import time

def benchmark_inference(model, tokenizer, prompt, n_runs=10):
    """Benchmark the average response time."""
    times = []
    for _ in range(n_runs):
        start = time.time()
        inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
        outputs = model.generate(**inputs, max_new_tokens=256)
        tokenizer.decode(outputs[0], skip_special_tokens=True)
        elapsed = time.time() - start
        times.append(elapsed)

    avg_time = sum(times) / len(times)
    min_time = min(times)
    max_time = max(times)
    print(f"Avg: {avg_time:.2f}s | Min: {min_time:.2f}s | Max: {max_time:.2f}s")
    return avg_time

Benchmark Checklist

  • Benchmark before and after every change
  • Test with a variety of query types (short, long, complex)
  • Monitor GPU memory and utilization during the benchmark (see the sketch below)
  • Compare against competitors' response times (via Taostats)
  • Repeat each benchmark at least 10 times for an accurate average
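
For the GPU-memory item above, PyTorch's built-in CUDA counters are one way to capture peak usage around a benchmark run (a sketch; it assumes a CUDA-enabled PyTorch install):

import torch

torch.cuda.reset_peak_memory_stats()

# ... call benchmark_inference(...) here ...

peak_mib = torch.cuda.max_memory_allocated() / 1024**2
print(f"Peak GPU memory allocated: {peak_mib:.0f} MiB")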

🔁 Iterative Improvement Loop

Best Practices

  1. One change per iteration: don't modify many things at once
  2. Measure before and after: without data, you are only guessing
  3. Roll back on failure: don't keep changes that bring no improvement
  4. Document: record what worked and what didn't
  5. Be patient: optimization takes time, so don't expect instant results

Summary

| Area | Recommendation |
|---|---|
| Monitoring | Routinely track uptime, response time, GPU, and weight |
| Caching | Cache the model in memory; cache responses for similar queries |
| Quantization | Use 4-bit/8-bit for large models on small GPUs |
| Decision | GPU under 50% = optimize software; GPU over 95% = upgrade hardware |
| Benchmarking | Always measure before and after a change |
| Iteration | Measure → Analyze → Plan → Implement → Measure |

Next: Business Logic & GTM Strategy →