components oss_distillation_generate_data_batch_postprocess
github-actions[bot] edited this page Nov 13, 2024 · 1 revision
Component to prepare the data returned from the teacher model endpoint in batch mode.
Version: 0.0.1
View in Studio: https://ml.azure.com/registries/azureml/components/oss_distillation_generate_data_batch_postprocess/version/0.0.1
Inputs
| Name | Description | Type | Default | Optional | Enum |
|---|---|---|---|---|---|
| train_file_path | Path to the registered training data asset. The supported data formats are jsonl, json, csv, tsv and parquet. | uri_file | | | |
| validation_file_path | Path to the registered validation data asset. The supported data formats are jsonl, json, csv, tsv and parquet. | uri_file | | True | |
| hash_train_data | JSONL file containing the hash for each training payload. | uri_file | | False | |
| hash_validation_data | JSONL file containing the hash for each validation payload. | uri_file | | True | |
| batch_score_train_result | Path to the directory containing the JSONL file(s) with the result for each training payload. | uri_folder | | | |
| batch_score_validation_result | Path to the directory containing the JSONL file(s) with the result for each validation payload. | uri_folder | | True | |
| min_endpoint_success_ratio | The minimum ratio successful_requests / total_requests required for inference to be classified as successful. If successful_requests / total_requests < min_endpoint_success_ratio, the experiment is marked as failed. Defaults to 0.7; 0 allows every request to fail, while 1 requires every request to succeed. | number | 0.7 | | |
| enable_chain_of_thought | Enable chain of thought for data generation. | string | false | True | |
| enable_chain_of_density | Enable chain of density for text summarization. | string | false | True | |
| data_generation_task_type | Data generation task type. Supported values: 1. NLI: generate natural language inference data; 2. CONVERSATION: generate conversational data (single or multi-turn); 3. NLU_QA: generate natural language understanding data for question answering; 4. MATH: generate math data with numerical responses; 5. SUMMARIZATION: generate a key summary of an article. | string | | | ['NLI', 'CONVERSATION', 'NLU_QA', 'MATH', 'SUMMARIZATION'] |
| connection_config_file | Connection config file for batch scoring. | uri_file | | | |
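
The non-file inputs in the table above could be interpreted roughly as follows. This is a minimal sketch, not the component's implementation: the helper names `parse_bool_flag`, `validate_task_type` and `inference_succeeded` are hypothetical, case-insensitive flag parsing is an assumption, and rejecting `total_requests == 0` is an assumption because the table does not define that case. The threshold check follows the `min_endpoint_success_ratio` description: a ratio equal to the threshold passes, and anything below it fails.

```python
# Hypothetical helpers illustrating the documented inputs; not the component's code.

# Enum values copied from the data_generation_task_type row of the Inputs table.
SUPPORTED_TASK_TYPES = ("NLI", "CONVERSATION", "NLU_QA", "MATH", "SUMMARIZATION")


def parse_bool_flag(value: str) -> bool:
    """Interpret a string-typed flag such as enable_chain_of_thought.

    Assumption: "true"/"false" are accepted case-insensitively.
    """
    normalized = value.strip().lower()
    if normalized not in ("true", "false"):
        raise ValueError(f"expected 'true' or 'false', got {value!r}")
    return normalized == "true"


def validate_task_type(task_type: str) -> str:
    """Reject values outside the documented enum (exact, case-sensitive match)."""
    if task_type not in SUPPORTED_TASK_TYPES:
        raise ValueError(f"unsupported data_generation_task_type: {task_type!r}")
    return task_type


def inference_succeeded(successful_requests: int, total_requests: int,
                        min_endpoint_success_ratio: float = 0.7) -> bool:
    """Apply the documented rule: fail if successful / total < threshold."""
    if not 0.0 <= min_endpoint_success_ratio <= 1.0:
        raise ValueError("min_endpoint_success_ratio must be between 0 and 1")
    if total_requests <= 0:
        # Assumption: the ratio is undefined without requests, so refuse to decide.
        raise ValueError("total_requests must be positive")
    return successful_requests / total_requests >= min_endpoint_success_ratio


if __name__ == "__main__":
    print(inference_succeeded(7, 10))  # True: 0.7 is not below the 0.7 default
    print(inference_succeeded(6, 10))  # False: 0.6 < 0.7
    print(parse_bool_flag("false"))    # False
```

With the default threshold of 0.7, 7 successes out of 10 requests pass and 6 out of 10 fail; setting the threshold to 0 accepts any outcome, and 1 accepts only a run with no failed requests.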

Outputs

| Name | Description | Type |
|---|---|---|
| generated_batch_train_file_path | Generated train data | uri_file |
| generated_batch_validation_file_path | Generated validation data | uri_file |
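
This page does not describe the component's internals. One plausible reading of the inputs is that each payload sent to the teacher endpoint is identified by a hash (`hash_train_data`), the batch-score output folder (`batch_score_train_result`) holds one result per payload across one or more JSONL files, and post-processing matches results back to payloads by hash before writing the generated file. The sketch below assumes that reading; the function names and the JSONL field names `hash`, `status` and `response` are hypothetical, and the real schema may differ.

```python
import json
from pathlib import Path


def read_jsonl(path: Path) -> list[dict]:
    """Load one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def join_results(hash_file: Path, result_dir: Path) -> tuple[list[dict], int, int]:
    """Match batch-score results to payload hashes (hypothetical schema).

    Returns (joined_records, successful_requests, total_requests).
    """
    # hash -> record from the hash file; the "hash" key is an assumption.
    payloads = {rec["hash"]: rec for rec in read_jsonl(hash_file)}

    # The result input is a folder that may hold several JSONL files.
    # Files are read in sorted order; a later duplicate hash overwrites an earlier one.
    results = {}
    for result_file in sorted(result_dir.glob("*.jsonl")):
        for rec in read_jsonl(result_file):
            results[rec["hash"]] = rec

    joined = []
    successful = 0
    for payload_hash, payload in payloads.items():
        result = results.get(payload_hash)
        # Assumption: a result counts as successful when status == "success".
        if result is not None and result.get("status") == "success":
            successful += 1
            joined.append({**payload, "response": result["response"]})
    return joined, successful, len(payloads)
```

The two counts returned correspond to `successful_requests` and `total_requests` in the `min_endpoint_success_ratio` description; payloads with a missing or failed result are left out of the joined records rather than written with an empty response.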

Environment

azureml://registries/azureml/environments/acft-hf-nlp-gpu/versions/76