# 🎬 Video Scraper Feature - Implementation Guide

## 📋 Overview

The **Video Scraper** feature lets an admin automatically extract video data from a folder URL and import it into the database. It uses the existing `extract_folder.py` script and integrates it into the admin panel.

## ✅ Components Created

### 1. **ScraperController** 
📁 **File:** `app/Http/Controllers/Admin/ScraperController.php`

**Methods:**
- `index()` - Displays the scraper form
- `scrape(Request $request)` - Handles form submission and runs the scraping process
- `extractVideosFromFolder()` - Executes the Python script and parses the resulting CSV
- `createVideoFromScrapedData()` - Creates a Video record from the scraped data
- `downloadThumbnail()` - Downloads and saves the thumbnail, with optimization

**Features:**
- ✅ Input validation (URL, provider, play mode)
- ✅ Duplicate check (videos that already exist are skipped)
- ✅ Thumbnail download and conversion to WebP
- ✅ Error handling and logging
- ✅ Supports unlimited or batched imports

### 2. **Routes**
📁 **File:** `routes/web.php`

```php
// GET  /admin/videos/scraper         -> ScraperController::index()  (show form)
// POST /admin/videos/scraper/scrape  -> ScraperController::scrape() (process scraping)
```

### 3. **View**
📁 **File:** `resources/views/admin/scraper/index.blade.php`

**UI Components:**
- 📝 Folder URL input
- 🎯 Provider dropdown (40+ providers)
- ▶️ Play mode selector (auto/hls/iframe/direct)
- 📊 Max videos limit
- 🏷️ Category multi-select
- #️⃣ Tags input
- ⚙️ Auto-taxonomy toggle
- 📈 Progress indicator
- ✅ Detailed result display

---

## 🚀 How to Use

### Step 1: Open the Scraper Page
```
https://play.bokeplah.me/admin/videos/scraper
```

### Step 2: Fill in the Form
```
1. Folder URL: https://vid30s.com/f/FOLDER_ID
2. Provider: Choose a provider (for vid30s, use "vid30s" if available, otherwise "generic")
3. Play Mode: auto/hls/iframe/direct
4. Optional:
   - Max Videos: 100 (for testing), or leave empty for unlimited
   - Categories: Choose categories to apply to all videos
   - Tags: Separate with commas
```

### Step 3: Click "Start Scraping"
- The controller executes `extract_folder.py`
- Video data is extracted from the folder URL
- Video records are created in the database
- Thumbnails are downloaded and converted to WebP
- The result is displayed (total found, imported, failed)

---

## 🔧 Technical Details

### Data Flow:
```
Admin Form Input
    ↓
ScraperController::scrape()
    ↓
extractVideosFromFolder() 
    → Execute Python script extract_folder.py
    → Parse CSV output
    → Return array of video data
    ↓
Loop: createVideoFromScrapedData()
    → Create Video record
    → downloadThumbnail()
    → Attach categories
    → Attach tags
    ↓
Response: JSON with summary
    - Total found
    - Imported count
    - Failed count
    - Error list
```
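The loop and summary in the flow above can be sketched in plain PHP. This is a minimal illustration, not the controller's actual code: `importScrapedRows()` and its callbacks are hypothetical names, the `skipped` counter follows the duplicate-check test case, and the sketch assumes the max-videos limit caps the number of imported videos rather than the number of rows inspected.

```php
<?php

/**
 * Summarise an import run (hypothetical helper, for illustration only).
 *
 * @param array    $rows      rows parsed from the CSV, each an associative array
 * @param callable $exists    fn(array $row): bool, true when the video is already stored
 * @param callable $create    fn(array $row): void, creates the record and may throw
 * @param int|null $maxVideos null means unlimited
 */
function importScrapedRows(array $rows, callable $exists, callable $create, ?int $maxVideos = null): array
{
    $summary = [
        'total_found' => count($rows),
        'imported'    => 0,
        'skipped'     => 0,
        'failed'      => 0,
        'errors'      => [],
    ];

    foreach ($rows as $row) {
        if ($maxVideos !== null && $summary['imported'] >= $maxVideos) {
            break; // assumed semantics: the limit caps imported videos
        }
        if ($exists($row)) {
            $summary['skipped']++; // duplicate: leave the existing record alone
            continue;
        }
        try {
            $create($row);
            $summary['imported']++;
        } catch (Throwable $e) {
            // One bad row should not abort the whole run.
            $summary['failed']++;
            $summary['errors'][] = ($row['title'] ?? 'unknown') . ': ' . $e->getMessage();
        }
    }

    return $summary;
}
```

Passing the duplicate check and the record creation as callbacks keeps the counting logic testable without a database; in the real controller these would presumably be Eloquent calls.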

### Python Script Integration:
- **Script:** `/home/rasamemek/public_html/player2027/extract_folder.py`
- **Executes:** `python3 extract_folder.py --columns url_folder nama_folder url_subfolder title url_video url_thumbnail`
- **Output:** CSV file in `/root/server/csv/`
- **CSV Format:** semicolon-delimited (`;`) with columns:
  ```
  url_folder;nama_folder;url_subfolder;title;url_video;url_thumbnail
  ```
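Parsing this format could look like the sketch below. It assumes the CSV starts with a header row matching the column list above, which this guide does not confirm, and `parseScraperCsv()` is a hypothetical name rather than a method of the real controller.

```php
<?php

/**
 * Parse semicolon-delimited CSV text into associative rows.
 * Assumption: the first non-empty line is the header row.
 * Limitation: quoted fields containing line breaks are not supported.
 */
function parseScraperCsv(string $csv): array
{
    $lines = preg_split('/\r\n|\n|\r/', trim($csv));
    $lines = array_values(array_filter($lines, fn(string $l): bool => trim($l) !== ''));
    if ($lines === []) {
        return [];
    }

    // Delimiter, enclosure and escape are passed explicitly.
    $header = str_getcsv(array_shift($lines), ';', '"', '\\');

    $rows = [];
    foreach ($lines as $line) {
        $fields = str_getcsv($line, ';', '"', '\\');
        if (count($fields) !== count($header)) {
            continue; // skip malformed lines rather than guessing
        }
        $rows[] = array_combine($header, $fields);
    }

    return $rows;
}
```

Each returned row is keyed by column name, so later steps can read `$row['url_video']` or `$row['url_thumbnail']` directly.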

### Thumbnail Processing:
```
Downloaded Image
    ↓
Validate (size > 100 bytes)
    ↓
Detect extension from URL/content-type
    ↓
Convert to WebP (85% quality) if GD is available
    ↓
Save to: storage/app/public/thumbnails/{slug}.webp
    ↓
Update Video record with the path
```
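The validation, extension detection and conversion steps can be sketched as follows. The function names are hypothetical, and checking the content type before the URL is an assumption; the guide only says both are used. The sketch falls back to the original bytes when GD or its WebP support is missing.

```php
<?php

const MIN_THUMBNAIL_BYTES = 100; // bodies of 100 bytes or fewer are rejected

/** Guess the source extension from the Content-Type header, then from the URL path. */
function detectImageExtension(string $url, ?string $contentType): string
{
    $byMime = ['image/jpeg' => 'jpg', 'image/png' => 'png', 'image/gif' => 'gif', 'image/webp' => 'webp'];
    $mime = strtolower(trim(explode(';', (string) $contentType)[0]));
    if (isset($byMime[$mime])) {
        return $byMime[$mime];
    }

    $path = (string) parse_url($url, PHP_URL_PATH);
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    if ($ext === 'jpeg') {
        $ext = 'jpg';
    }

    return in_array($ext, ['jpg', 'png', 'gif', 'webp'], true) ? $ext : 'jpg';
}

/** Returns [bytes, extension], or null when the download is too small to be an image. */
function toWebpIfPossible(string $bytes, string $fallbackExt): ?array
{
    if (strlen($bytes) <= MIN_THUMBNAIL_BYTES) {
        return null;
    }
    if (!function_exists('imagecreatefromstring') || !function_exists('imagewebp')) {
        return [$bytes, $fallbackExt]; // GD or WebP support missing: keep the original format
    }

    $image = @imagecreatefromstring($bytes);
    if ($image === false) {
        return [$bytes, $fallbackExt]; // not a decodable image
    }
    if (!imageistruecolor($image)) {
        imagepalettetotruecolor($image); // WebP encoding needs a truecolor image
    }

    ob_start();
    $ok = imagewebp($image, null, 85); // quality 85, as in the flow above
    $webp = (string) ob_get_clean();

    return ($ok && $webp !== '') ? [$webp, 'webp'] : [$bytes, $fallbackExt];
}
```

The returned extension tells the caller which file name to use under `storage/app/public/thumbnails/`, so a failed conversion still yields a usable file.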

---

## 📊 Database Changes

**Video Table:**
```php
Video::create([
    'slug'              => Str::slug($title) . '-' . Str::random(6),
    'title'             => $title,
    'embed_url'         => $embedUrl,
    'provider'          => $provider,
    'play_mode'         => $playMode,
    'status'            => 'ready',
    'extract_failed'    => false,
    'thumbnail_path'    => $storagePath,    // auto-set
    'storage_driver'    => 'local_public',  // auto-set
]);
```

**Categories:**
```php
$video->categories()->sync($categoryIds);  // Sync selected categories (replaces any existing set)
```

**Tags:**
```php
// Create or find tags & attach to video
$video->tags()->attach($tag->id);
```
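Before tags can be found or created, the comma-separated input has to be split into individual names. The helper below is a hypothetical sketch of that step; whether the real controller trims and de-duplicates in the same way is an assumption.

```php
<?php

/** Split a comma-separated tag string into trimmed, unique, non-empty names. */
function parseTagInput(string $input): array
{
    $names = array_map('trim', explode(',', $input));
    $names = array_filter($names, fn(string $n): bool => $n !== '');

    // Case-insensitive de-duplication that keeps the first spelling seen.
    $unique = [];
    foreach ($names as $name) {
        $unique[strtolower($name)] ??= $name;
    }

    return array_values($unique);
}
```

For example, `parseTagInput('action, movie,2024')` yields the three names `action`, `movie` and `2024`, each of which would then be passed to the find-or-create step.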

---

## ⚙️ Configuration

### Environment Requirements:
- ✅ Python 3.x installed
- ✅ `extract_folder.py` accessible
- ✅ `/root/server/` directory writable
- ✅ `/root/server/csv/` directory for output
- ✅ PHP GD extension (for thumbnail conversion)
- ✅ `file_get_contents()` allowed (for downloads)

### PHP Settings:
```
allow_url_fopen = On
max_execution_time >= 300 (for large scraping jobs)
memory_limit >= 256M
```
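A small preflight check can confirm these settings before a long scrape starts. The sketch below is an illustration only; `checkScraperRequirements()` and `iniSizeToBytes()` are hypothetical helpers, not part of the feature.

```php
<?php

/** Convert php.ini shorthand such as "256M" to bytes; "-1" means unlimited. */
function iniSizeToBytes(string $value): int
{
    $value = trim($value);
    if ($value === '-1') {
        return PHP_INT_MAX;
    }

    $number = (int) $value; // leading digits only, e.g. "256M" gives 256
    switch (strtoupper(substr($value, -1))) {
        case 'G':
            return $number * 1024 ** 3;
        case 'M':
            return $number * 1024 ** 2;
        case 'K':
            return $number * 1024;
        default:
            return $number;
    }
}

/** Returns a list of human-readable problems; an empty list means all checks passed. */
function checkScraperRequirements(): array
{
    $problems = [];

    if (!filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        $problems[] = 'allow_url_fopen is Off';
    }

    $maxTime = (int) ini_get('max_execution_time'); // 0 means no limit
    if ($maxTime !== 0 && $maxTime < 300) {
        $problems[] = "max_execution_time is {$maxTime}, expected >= 300";
    }

    if (iniSizeToBytes((string) ini_get('memory_limit')) < 256 * 1024 ** 2) {
        $problems[] = 'memory_limit is below 256M';
    }

    if (!extension_loaded('gd')) {
        $problems[] = 'GD extension is not loaded';
    }

    return $problems;
}
```

Note that the CLI and the web server often load different `php.ini` files, so the check is most meaningful when run in the same environment that serves the admin panel.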

---

## 🎯 Providers Supported

The scraper supports 40+ providers:

**Video Hosts:**
- FileMoon, StreamWish, VidHide, Dood
- MP4Upload, StreamTape, MixDrop, VOE
- Vid30s, Vidara, PutarVid, VidNest
- BySeqeKaho, Berbagi, PoopTV, PodJav
- IndoVidPlus, LuluVid, StreamHLS, HavenFile
- VidKeyX, BigWarp, YouVid, VidGuard
- Upstream, Strmup, SikatSaja, VidZP
- xHamster, xVideos, Pemersatu, VidsSt
- Videy, VidEQ, VidString, VidOes
- AcaImg, StreamFlash, and Generic/Unknown

---

## 🧪 Testing

### Test Case 1: Simple Scrape
```bash
# Input: https://vid30s.com/f/abc123 (folder with 5 videos)
# Provider: generic
# Play Mode: auto
# Expected: 5 videos created with thumbnails

# Result: ✅ 5 found, 5 imported, 0 failed
```

### Test Case 2: With Categories
```bash
# Add: Category selection (Action, Drama)
# Expected: All videos attached to the categories

# Result: ✅ Categories synced
```

### Test Case 3: With Tags
```bash
# Add: Tags input "action,movie,2024"
# Expected: All videos attached to the tags

# Result: ✅ Tags created & attached
```

### Test Case 4: Max Videos Limit
```bash
# Input: Folder with 100 videos
# Max Videos: 10
# Expected: Only 10 are imported

# Result: ✅ 100 found, 10 imported, 0 failed
```

### Test Case 5: Duplicate Check
```bash
# Run twice with the same folder
# First run: 5 videos created
# Second run: Same folder
# Expected: All skipped (duplicate)

# Result: ✅ 5 found, 0 imported, 5 skipped (existing)
```

---

## 📝 Logging

### Log Location:
- **Main Log:** `storage/logs/laravel.log`
- **Python Log:** written to stderr

### Log Examples:

```log
[2024-06-23 10:15:30] local.INFO: Scraper started {
  "folder_url": "https://vid30s.com/f/abc123",
  "provider": "generic",
  "play_mode": "auto",
  "max_videos": "unlimited"
}

[2024-06-23 10:15:35] local.INFO: Extracted videos from folder {
  "count": 5
}

[2024-06-23 10:15:40] local.INFO: Created video from scraped data {
  "video_id": 1234,
  "title": "Video Title",
  "provider": "generic"
}

[2024-06-23 10:15:45] local.INFO: Downloaded and saved thumbnail {
  "video_id": 1234,
  "url": "https://...",
  "path": "thumbnails/video-title-xyz123.webp"
}

[2024-06-23 10:16:00] local.INFO: Scraper completed {
  "total_found": 5,
  "imported": 5,
  "failed": 0
}
```

---

## ⚠️ Error Handling

### Common Errors:

**1. "CSV output directory tidak ditemukan"** (CSV output directory not found)
```
Solution: Create the directory /root/server/csv/
mkdir -p /root/server/csv
```

**2. "Script extract_folder.py tidak ditemukan"** (script extract_folder.py not found)
```
Solution: Make sure the file exists in base_path()
Check: ls -la /home/rasamemek/public_html/player2027/extract_folder.py
```

**3. "Gagal download gambar dari URL"** (failed to download image from URL)
```
Solution: Check the firewall and allow_url_fopen
Or skip the thumbnail by leaving its URL empty
```

**4. "Tidak ada video ditemukan di folder tersebut"** (no videos found in that folder)
```
Solution: Check that the folder URL is valid
Try opening it manually in a browser first
The folder may also simply be empty
```

---

## 🔐 Security

### Validation:
- ✅ CSRF token check
- ✅ URL validation (must be a valid URL)
- ✅ Provider whitelist check
- ✅ Category ID validation (must exist)
- ✅ Max videos limit enforced

### File Operations:
- ✅ `file_get_contents()` calls use a timeout
- ⚠️ SSL verification is disabled for downloads from external CDNs (a compatibility trade-off, not a safeguard)
- ✅ Safe temporary file handling
- ✅ Automatic cleanup of temp files
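
A download with a timeout can be sketched with a stream context, as below. The helper names, the 15-second default, the redirect limit and the user agent string are all assumptions made for illustration. The `$verifyTls` flag defaults to `true`; passing `false` mirrors the disabled verification described above.

```php
<?php

/**
 * Build a stream context for file_get_contents() with a hard timeout.
 * All option values here are illustrative defaults, not the controller's settings.
 */
function buildDownloadContext(int $timeoutSeconds = 15, bool $verifyTls = true)
{
    return stream_context_create([
        'http' => [
            'method'          => 'GET',
            'timeout'         => $timeoutSeconds, // seconds before the read gives up
            'follow_location' => 1,
            'max_redirects'   => 3,
            'user_agent'      => 'Mozilla/5.0 (thumbnail-fetcher)',
        ],
        'ssl' => [
            'verify_peer'      => $verifyTls,
            'verify_peer_name' => $verifyTls,
        ],
    ]);
}

/** Returns the response body, or null on any failure. */
function downloadWithTimeout(string $url, $context): ?string
{
    $body = @file_get_contents($url, false, $context);

    return $body === false ? null : $body;
}
```

Returning `null` instead of raising keeps a single failed thumbnail from aborting the import; the caller can log the URL and continue with the next video.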

### Database:
- ✅ Duplicate check before insert
- ✅ Transaction-safe operations
- ✅ Proper error handling and rollback

---

## 📈 Performance

### Optimization:
- Batch processing for multiple videos
- Concurrent requests to the CDN (if needed)
- Thumbnail conversion runs only once per video
- Efficient CSV parsing
- Non-blocking logging

### Estimated Time:
- 10 videos: ~10-15 seconds
- 100 videos: ~60-90 seconds
- 1000 videos: ~10-15 minutes

(Depends on: CDN speed, thumbnail size, server resources)

---

## 🎉 Usage Summary

**Admin Panel:**
```
Admin → Videos → Scraper
```

**Form Fields:**
| Field | Required | Type | Notes |
|-------|----------|------|-------|
| Folder URL | ✅ | Text | Must be valid URL |
| Provider | ✅ | Select | 40+ providers |
| Play Mode | ✅ | Select | auto/hls/iframe/direct |
| Max Videos | ❌ | Number | Optional limit |
| Categories | ❌ | Checkbox | Multi-select |
| Tags | ❌ | Text | Comma-separated |
| Auto-Taxonomy | ❌ | Checkbox | Enable auto-categorization |

**Result Display:**
- ✅ Imported count
- ⚠️ Failed count
- 📊 Errors list (if any)

---

## 🔗 Related Files

- **Controller:** [ScraperController.php](app/Http/Controllers/Admin/ScraperController.php)
- **View:** [admin/scraper/index.blade.php](resources/views/admin/scraper/index.blade.php)
- **Routes:** [routes/web.php](routes/web.php) (lines 57-59)
- **Python Script:** [extract_folder.py](extract_folder.py)
- **Video Model:** [app/Models/Video.php](app/Models/Video.php)

---

**Version:** 1.0
**Created:** June 23, 2024
**Last Updated:** June 23, 2024

