
Commit 236dd67

Implement comprehensive testing and monitoring scripts for Cloud Function data loading
1 parent 69d4d3f commit 236dd67

File tree: 5 files changed (+926 −11 lines)

CLOUD_FUNCTION_TESTING_GUIDE.md

Lines changed: 243 additions & 0 deletions
@@ -0,0 +1,243 @@
# Cloud Function Testing Guide

This guide provides comprehensive instructions for testing the Cloud Function that automatically loads the Titanic dataset into BigQuery.

## 🧪 Available Testing Scripts

### 1. **Comprehensive Test Suite** (PowerShell)
```powershell
.\scripts\test_cloud_function.ps1 -ProjectId "your-project-id" -TestMode "full"
```

**Test Modes:**
- `full` - Complete testing, including performance tests
- `quick` - Basic functionality test (recommended for regular use; see the example below)
- `logs-only` - Only check function logs
- `function-status` - Only verify function deployment

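For day-to-day checks, `quick` mode is the lightest-weight invocation of the same script:

```powershell
# Quick smoke test - uses the documented "quick" mode and skips the performance tests
.\scripts\test_cloud_function.ps1 -ProjectId "your-project-id" -TestMode "quick"
```
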
### 2. **Quick Bash Test**
```bash
./scripts/test_cloud_function_quick.sh your-project-id
```

### 3. **Enhanced Data Loading with Function Testing**
```bash
./scripts/check_and_load_titanic_data.sh your-project-id
```

### 4. **Live Monitoring**
```powershell
.\scripts\monitor_cloud_function.ps1 -ProjectId "your-project-id" -MonitorDuration 5
```

## 🔍 Manual Testing Steps

### Step 1: Verify Function Deployment
```bash
gcloud functions describe titanic-data-loader \
  --region=us-central1 \
  --project=your-project-id
```

**Expected Output** (a scripted version of this check follows the list):
- Status: `ACTIVE`
- Runtime: `python311`
- Trigger: Storage bucket event
- Entry point: `load_titanic_to_bigquery`

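A minimal sketch of that status check, assuming a 1st-gen function whose `describe` output exposes a top-level `status` field:

```bash
# Fail fast if the function is not ACTIVE (field name assumes a 1st-gen function)
STATUS=$(gcloud functions describe titanic-data-loader \
  --region=us-central1 \
  --project=your-project-id \
  --format="value(status)")
if [ "$STATUS" = "ACTIVE" ]; then
  echo "✅ Function is ACTIVE"
else
  echo "❌ Unexpected status: $STATUS"
  exit 1
fi
```
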
### Step 2: Check Function Logs
```bash
gcloud logging read \
  "resource.type=cloud_function AND resource.labels.function_name=titanic-data-loader" \
  --project=your-project-id \
  --limit=10
```

### Step 3: Test File Upload Trigger
```bash
# Download test data
curl -o titanic.csv "https://raw.githubusercontent.com/datasciencedojo/datasets/refs/heads/master/titanic.csv"

# Upload to trigger the function
gsutil cp titanic.csv gs://your-project-id-temp-bucket/titanic.csv
```

### Step 4: Monitor Function Execution
```bash
# Watch logs in real time
gcloud logging tail \
  "resource.type=cloud_function AND resource.labels.function_name=titanic-data-loader" \
  --project=your-project-id
```

### Step 5: Verify BigQuery Data Loading
```bash
# Check if the table exists
bq show --project_id=your-project-id test_dataset.titanic

# Get the row count
bq query --project_id=your-project-id \
  --use_legacy_sql=false \
  "SELECT COUNT(*) as total_rows FROM \`your-project-id.test_dataset.titanic\`"

# Sample the data
bq query --project_id=your-project-id \
  --use_legacy_sql=false \
  "SELECT * FROM \`your-project-id.test_dataset.titanic\` LIMIT 5"
```

## 🎯 What to Look For

### ✅ Success Indicators
1. **Function status**: `ACTIVE` in the function description
2. **Trigger configuration**: Event trigger on the correct bucket
3. **Log messages**:
   - "Processing file: titanic.csv"
   - "Successfully loaded X rows into..."
   - No ERROR-severity logs
4. **BigQuery table** (a combined scripted check follows this list):
   - Table exists at `test_dataset.titanic`
   - Contains ~891 rows (the Titanic dataset size)
   - A data sample shows passenger information

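A sketch that rolls the table checks into one snippet, assuming the `bq` CLI is already authenticated against the project:

```bash
# Combined check: the table exists and the row count matches the Titanic dataset
if bq show --project_id=your-project-id test_dataset.titanic >/dev/null 2>&1; then
  ROWS=$(bq query --project_id=your-project-id --use_legacy_sql=false --format=csv \
    "SELECT COUNT(*) FROM \`your-project-id.test_dataset.titanic\`" | tail -1)
  echo "✅ Table exists with $ROWS rows (expected ~891)"
else
  echo "❌ Table test_dataset.titanic not found"
fi
```
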
### ❌ Common Issues

#### Function Not Triggering
**Symptoms:** No logs after a file upload
**Causes:**
- Function not deployed
- Incorrect bucket trigger configuration
- Permission issues

**Debug:**
```bash
# Check that the function exists
gcloud functions list --project=your-project-id

# Verify the trigger bucket
gcloud functions describe titanic-data-loader --region=us-central1 --project=your-project-id --format="value(eventTrigger.resource)"

# Check bucket permissions
gsutil iam get gs://your-project-id-temp-bucket
```

#### Function Failing
**Symptoms:** ERROR logs during function execution
**Common Errors:**
- BigQuery permission denied
- Dataset doesn't exist
- Python package import errors

**Debug:**
```bash
# Check detailed error logs
gcloud logging read \
  "resource.type=cloud_function AND resource.labels.function_name=titanic-data-loader AND severity=ERROR" \
  --project=your-project-id \
  --limit=5
```

#### BigQuery Not Updating
**Symptoms:** Function executes but the table is not created or updated
**Causes:**
- BigQuery API not enabled
- Service account lacks BigQuery permissions
- Dataset doesn't exist

**Debug:**
```bash
# Check if the dataset exists
bq ls --project_id=your-project-id

# Create the dataset if it is missing
bq mk --project_id=your-project-id --description="Test dataset" test_dataset

# Check service account permissions
gcloud projects get-iam-policy your-project-id
```

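If the service account does turn out to lack BigQuery access, a hedged fix, assuming the function runs as the default 1st-gen runtime service account (`PROJECT_ID@appspot.gserviceaccount.com` - substitute your own if it differs):

```bash
# Grant the function's runtime service account write access to BigQuery
# (the member below assumes the default 1st-gen runtime service account)
gcloud projects add-iam-policy-binding your-project-id \
  --member="serviceAccount:your-project-id@appspot.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"
```
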
## 🔧 Troubleshooting Commands

### Check Function Status
```bash
gcloud functions describe titanic-data-loader \
  --region=us-central1 \
  --project=your-project-id \
  --format="table(status,updateTime,versionId,runtime)"
```

### View Recent Function Activity
```bash
gcloud logging read \
  "resource.type=cloud_function AND resource.labels.function_name=titanic-data-loader AND timestamp>=\"2024-01-01T00:00:00Z\"" \
  --project=your-project-id \
  --format="table(timestamp,severity,textPayload)" \
  --limit=20
```

### Test with a Different CSV Schema
```bash
# Upload a different CSV under the trigger filename
echo "name,age,city" > test.csv
echo "John,30,NYC" >> test.csv
gsutil cp test.csv gs://your-project-id-temp-bucket/titanic.csv
```

### Check Bucket Activity
```bash
# List bucket contents
gsutil ls -l gs://your-project-id-temp-bucket/

# Check bucket notifications
gsutil notification list gs://your-project-id-temp-bucket/
```

## 📊 Performance Expectations

### Normal Function Performance
- **Cold start**: 3-10 seconds for the first execution
- **Warm start**: 1-3 seconds for subsequent executions
- **Data processing**: ~5-15 seconds for the Titanic dataset (891 rows)
- **Total time**: usually completes within 30 seconds (actual times can be read from the logs, as shown below)

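A hedged way to compare these expectations against reality: 1st-gen Cloud Functions emit a system log line at the end of each run (e.g. "Function execution took 1234 ms, finished with status: 'ok'"), which can be filtered directly:

```bash
# Pull recent execution times from the function's system logs
gcloud logging read \
  "resource.type=cloud_function AND resource.labels.function_name=titanic-data-loader AND textPayload:\"Function execution took\"" \
  --project=your-project-id \
  --limit=5 \
  --format="value(timestamp,textPayload)"
```
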
### Memory and Timeout
- **Allocated memory**: 256 MB
- **Timeout**: 300 seconds (5 minutes)
- **Expected usage**: <50 MB for the Titanic dataset (the configured limits can be read back as shown below)

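A sketch for reading those limits back from the deployment, assuming 1st-gen field names:

```bash
# Read the configured memory (MB) and timeout from the deployed function
gcloud functions describe titanic-data-loader \
  --region=us-central1 \
  --project=your-project-id \
  --format="value(availableMemoryMb,timeout)"
```
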
## 🔄 Automated Testing Integration

### GitHub Actions Testing
```yaml
- name: Test Cloud Function
  run: |
    ./scripts/test_cloud_function_quick.sh ${{ secrets.GCP_PROJECT_ID }}
```

### Scheduled Health Checks
```bash
# Add to crontab for hourly checks
0 * * * * /path/to/test_cloud_function_quick.sh your-project-id >> /var/log/cf-health.log 2>&1
```

## 📝 Test Results Interpretation

### Sample Successful Test Output
```
✅ Cloud Function 'titanic-data-loader' is deployed
ℹ️ Status: ACTIVE
✅ File uploaded successfully
✅ New function execution detected!
✅ BigQuery table 'test_dataset.titanic' exists
ℹ️ Rows in table: 891
🎉 Quick Cloud Function test completed!
```

### Error Patterns to Watch
```
❌ Cloud Function 'titanic-data-loader' not found
⚠️ No new function execution logs found
❌ BigQuery table 'test_dataset.titanic' not found
```

This comprehensive testing approach ensures your Cloud Function is working correctly and provides detailed diagnostics when issues occur.

scripts/check_and_load_titanic_data.sh

Lines changed: 39 additions & 11 deletions
```diff
@@ -53,23 +53,51 @@ if [ "$NEED_DATA" = "true" ]; then
 
     # Check if the table was created by the Cloud Function
     echo "Verifying that data was loaded by Cloud Function..."
+
+    # First, wait and check logs for function execution
+    echo "Checking Cloud Function execution logs..."
+    UPLOAD_TIME=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
+    sleep 5  # Brief wait before checking logs
+
+    # Check for recent function execution
+    FUNCTION_LOGS=$(gcloud logging read "resource.type=cloud_function AND resource.labels.function_name=titanic-data-loader AND timestamp>=\"$(date -u -d '2 minutes ago' +%Y-%m-%dT%H:%M:%SZ)\"" --project="$PROJECT_ID" --limit=5 --format="value(textPayload)" 2>/dev/null || echo "")
+
+    if [ -n "$FUNCTION_LOGS" ]; then
+        echo "📋 Recent Cloud Function activity detected:"
+        echo "$FUNCTION_LOGS" | head -3
+    else
+        echo "⚠️ No recent Cloud Function logs found"
+    fi
+
+    # Wait for function to complete processing
+    echo "Waiting for Cloud Function processing to complete..."
+    sleep 25
+
+    # Check if BigQuery table was created
     if gcloud alpha bq tables describe "test_dataset.titanic" --project="$PROJECT_ID" >/dev/null 2>&1; then
         echo "✅ Cloud Function successfully loaded data to BigQuery table 'test_dataset.titanic'"
 
         # Get row count
         ROW_COUNT=$(gcloud alpha bq query --project="$PROJECT_ID" --use_legacy_sql=false --format="value(f0_)" "SELECT COUNT(*) FROM \`$PROJECT_ID.test_dataset.titanic\`")
         echo "📊 Table contains $ROW_COUNT rows"
-    # else
-    #     echo "❌ Cloud Function may have failed. Falling back to direct BigQuery load..."
-    #     gcloud alpha bq load \
-    #         --project="$PROJECT_ID" \
-    #         --source_format=CSV \
-    #         --skip_leading_rows=1 \
-    #         --autodetect \
-    #         "test_dataset.titanic" \
-    #         "gs://$BUCKET_NAME/titanic.csv"
-    #     echo "✅ Data loaded directly to BigQuery as fallback"
-    # fi
+
+        # Verify data quality
+        if [ "$ROW_COUNT" -gt 800 ]; then
+            echo "✅ Data loading successful - expected row count achieved"
+        else
+            echo "⚠️ Warning: Row count ($ROW_COUNT) seems low for Titanic dataset"
+        fi
+    else
+        echo "❌ Cloud Function may have failed. Falling back to direct BigQuery load..."
+        gcloud alpha bq load \
+            --project="$PROJECT_ID" \
+            --source_format=CSV \
+            --skip_leading_rows=1 \
+            --autodetect \
+            "test_dataset.titanic" \
+            "gs://$BUCKET_NAME/titanic.csv"
+        echo "✅ Data loaded directly to BigQuery as fallback"
+    fi
 
     # Clean up local file
     rm -f titanic.csv
```
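One design note on this change: the flat `sleep 25` trades simplicity for latency. A sketch of a polling alternative, reusing the commit's own table check (the `TIMEOUT`/`INTERVAL` values are illustrative, not part of the commit):

```bash
# Poll for the table instead of sleeping a fixed 25 seconds
TIMEOUT=60
INTERVAL=5
ELAPSED=0
until gcloud alpha bq tables describe "test_dataset.titanic" --project="$PROJECT_ID" >/dev/null 2>&1; do
    if [ "$ELAPSED" -ge "$TIMEOUT" ]; then
        echo "⚠️ Timed out after ${TIMEOUT}s waiting for table creation"
        break
    fi
    sleep "$INTERVAL"
    ELAPSED=$((ELAPSED + INTERVAL))
done
```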
