Commit 61bfb09

improve the COPY INTO table (#2683)
1 parent 584b768 commit 61bfb09

File tree: 1 file changed (+109, -39 lines)

docs/en/sql-reference/10-sql-commands/10-dml/dml-copy-into-table.md

Lines changed: 109 additions & 39 deletions
@@ -64,7 +64,8 @@ externalLocation ::=
 /* Amazon S3-like Storage */
 's3://<bucket>[/<path>]'
 CONNECTION = (
-[ ENDPOINT_URL = '<endpoint-url>' ]
+[ CONNECTION_NAME = '<connection-name>' ]
+| [ ENDPOINT_URL = '<endpoint-url>' ]
 [ ACCESS_KEY_ID = '<your-access-key-ID>' ]
 [ SECRET_ACCESS_KEY = '<your-secret-access-key>' ]
 [ ENABLE_VIRTUAL_HOST_STYLE = TRUE | FALSE ]
@@ -78,21 +79,24 @@ externalLocation ::=
 /* Azure Blob Storage */
 | 'azblob://<container>[/<path>]'
 CONNECTION = (
-ENDPOINT_URL = '<endpoint-url>'
+[ CONNECTION_NAME = '<connection-name>' ]
+| ENDPOINT_URL = '<endpoint-url>'
 ACCOUNT_NAME = '<account-name>'
 ACCOUNT_KEY = '<account-key>'
 )

 /* Google Cloud Storage */
 | 'gcs://<bucket>[/<path>]'
 CONNECTION = (
-CREDENTIAL = '<your-base64-encoded-credential>'
+[ CONNECTION_NAME = '<connection-name>' ]
+| CREDENTIAL = '<your-base64-encoded-credential>'
 )

 /* Alibaba Cloud OSS */
 | 'oss://<bucket>[/<path>]'
 CONNECTION = (
-ACCESS_KEY_ID = '<your-ak>'
+[ CONNECTION_NAME = '<connection-name>' ]
+| ACCESS_KEY_ID = '<your-ak>'
 ACCESS_KEY_SECRET = '<your-sk>'
 ENDPOINT_URL = '<endpoint-url>'
 [ PRESIGN_ENDPOINT_URL = '<presign-endpoint-url>' ]
@@ -101,7 +105,8 @@ externalLocation ::=
 /* Tencent Cloud Object Storage */
 | 'cos://<bucket>[/<path>]'
 CONNECTION = (
-SECRET_ID = '<your-secret-id>'
+[ CONNECTION_NAME = '<connection-name>' ]
+| SECRET_ID = '<your-secret-id>'
 SECRET_KEY = '<your-secret-key>'
 ENDPOINT_URL = '<endpoint-url>'
 )
@@ -183,13 +188,18 @@ For remote files, you can use glob patterns to specify multiple files. For examp

 The `FILE_FORMAT` parameter supports different file types, each with specific formatting options. Below are the available options for each supported file format:

-### Common Options for All Formats
+<Tabs>
+<TabItem value="common" label="Common Options" default>
+
+These options are available for all file formats:

 | Option | Description | Values | Default |
 |--------|-------------|--------|--------|
 | COMPRESSION | Compression algorithm for data files | AUTO, GZIP, BZ2, BROTLI, ZSTD, DEFLATE, RAW_DEFLATE, XZ, NONE | AUTO |

-### TYPE = CSV
+</TabItem>
+
+<TabItem value="csv" label="CSV">

 | Option | Description | Default |
 |--------|-------------|--------|
@@ -204,39 +214,52 @@ The `FILE_FORMAT` parameter supports different file types, each with specific fo
 | EMPTY_FIELD_AS | How to handle empty fields | null |
 | BINARY_FORMAT | Encoding format(HEX or BASE64) for binary data | HEX |

-### TYPE = TSV
+</TabItem>
+
+<TabItem value="tsv" label="TSV">

 | Option | Description | Default |
 |--------|-------------|--------|
 | RECORD_DELIMITER | Character(s) separating records | newline |
 | FIELD_DELIMITER | Character(s) separating fields | tab (\t) |

-### TYPE = NDJSON
+</TabItem>
+
+<TabItem value="ndjson" label="NDJSON">

 | Option | Description | Default |
 |--------|-------------|--------|
 | NULL_FIELD_AS | How to handle null fields | NULL |
 | MISSING_FIELD_AS | How to handle missing fields | ERROR |
 | ALLOW_DUPLICATE_KEYS | Allow duplicate object keys | FALSE |

-### TYPE = PARQUET
+</TabItem>
+
+<TabItem value="parquet" label="PARQUET">

 | Option | Description | Default |
 |--------|-------------|--------|
 | MISSING_FIELD_AS | How to handle missing fields | ERROR |

-### TYPE = ORC
+</TabItem>
+
+<TabItem value="orc" label="ORC">

 | Option | Description | Default |
 |--------|-------------|--------|
 | MISSING_FIELD_AS | How to handle missing fields | ERROR |

-### TYPE = AVRO
+</TabItem>
+
+<TabItem value="avro" label="AVRO">

 | Option | Description | Default |
 |--------|-------------|--------|
 | MISSING_FIELD_AS | How to handle missing fields | ERROR |

+</TabItem>
+</Tabs>
+
 ## Copy Options

 | Parameter | Description | Default |
@@ -270,6 +293,10 @@ If `RETURN_FAILED_ONLY` is set to `true`, the output will only contain the files

 ## Examples

+:::tip Best Practice
+For external storage sources, it's recommended to use pre-created connections with the `CONNECTION_NAME` parameter instead of specifying credentials directly in the COPY statement. This approach provides better security, maintainability, and reusability. See [CREATE CONNECTION](../00-ddl/13-connection/create-connection.md) for details on creating connections.
+:::
+
 ### Example 1: Loading from Stages

 These examples showcase data loading into Databend from various types of stages:
@@ -314,16 +341,19 @@ These examples showcase data loading into Databend from various types of externa
 <Tabs groupId="external-example">
 <TabItem value="Amazon S3" label="Amazon S3">

-This example establishes a connection to Amazon S3 using AWS access keys and secrets, and it loads 10 rows from a CSV file:
+This example uses a pre-created connection to load data from Amazon S3:

 ```sql
--- Authenticated by AWS access keys and secrets.
+-- First create a connection (you only need to do this once)
+CREATE CONNECTION my_s3_conn
+STORAGE_TYPE = 's3'
+ACCESS_KEY_ID = '<your-access-key-ID>'
+SECRET_ACCESS_KEY = '<your-secret-access-key>';
+
+-- Use the connection to load data
 COPY INTO mytable
 FROM 's3://mybucket/data.csv'
-CONNECTION = (
-ACCESS_KEY_ID = '<your-access-key-ID>',
-SECRET_ACCESS_KEY = '<your-secret-access-key>'
-)
+CONNECTION = (CONNECTION_NAME = 'my_s3_conn')
 FILE_FORMAT = (
 TYPE = CSV,
 FIELD_DELIMITER = ',',
@@ -333,19 +363,20 @@ COPY INTO mytable
 SIZE_LIMIT = 10;
 ```

-This example connects to Amazon S3 using AWS IAM role authentication with an external ID and loads CSV files matching the specified pattern from 'mybucket':
+**Using IAM Role (Recommended for Production)**

 ```sql
--- Authenticated by AWS IAM role and external ID.
+-- Create connection using IAM role (more secure, recommended for production)
+CREATE CONNECTION my_iam_conn
+STORAGE_TYPE = 's3'
+ROLE_ARN = 'arn:aws:iam::123456789012:role/my_iam_role';
+
+-- Load CSV files using the IAM role connection
 COPY INTO mytable
 FROM 's3://mybucket/'
-CONNECTION = (
-ENDPOINT_URL = 'https://<endpoint-URL>',
-ROLE_ARN = 'arn:aws:iam::123456789012:role/my_iam_role',
-EXTERNAL_ID = '123456'
-)
+CONNECTION = (CONNECTION_NAME = 'my_iam_conn')
 PATTERN = '.*[.]csv'
-FILE_FORMAT = (
+FILE_FORMAT = (
 TYPE = CSV,
 FIELD_DELIMITER = ',',
 RECORD_DELIMITER = '\n',
@@ -360,18 +391,46 @@ COPY INTO mytable
 This example connects to Azure Blob Storage and loads data from 'data.csv' into Databend:

 ```sql
+-- Create connection for Azure Blob Storage
+CREATE CONNECTION my_azure_conn
+STORAGE_TYPE = 'azblob'
+ENDPOINT_URL = 'https://<account_name>.blob.core.windows.net'
+ACCOUNT_NAME = '<account_name>'
+ACCOUNT_KEY = '<account_key>';
+
+-- Use the connection to load data
 COPY INTO mytable
 FROM 'azblob://mybucket/data.csv'
-CONNECTION = (
-ENDPOINT_URL = 'https://<account_name>.blob.core.windows.net',
-ACCOUNT_NAME = '<account_name>',
-ACCOUNT_KEY = '<account_key>'
-)
+CONNECTION = (CONNECTION_NAME = 'my_azure_conn')
 FILE_FORMAT = (type = CSV);
 ```

 </TabItem>

+<TabItem value="Google Cloud Storage" label="Google Cloud Storage">
+
+This example connects to Google Cloud Storage and loads data:
+
+```sql
+-- Create connection for Google Cloud Storage
+CREATE CONNECTION my_gcs_conn
+STORAGE_TYPE = 'gcs'
+CREDENTIAL = '<your-base64-encoded-credential>';
+
+-- Use the connection to load data
+COPY INTO mytable
+FROM 'gcs://mybucket/data.csv'
+CONNECTION = (CONNECTION_NAME = 'my_gcs_conn')
+FILE_FORMAT = (
+TYPE = CSV,
+FIELD_DELIMITER = ',',
+RECORD_DELIMITER = '\n',
+SKIP_HEADER = 1
+);
+```
+
+</TabItem>
+
 <TabItem value="Remote Files" label="Remote Files">

 This example loads data from three remote CSV files and skips a file in case of errors.
@@ -411,13 +470,16 @@ COPY INTO mytable
 This example loads a GZIP-compressed CSV file on Amazon S3 into Databend:

 ```sql
+-- Create connection for compressed data loading
+CREATE CONNECTION compressed_s3_conn
+STORAGE_TYPE = 's3'
+ACCESS_KEY_ID = '<your-access-key-ID>'
+SECRET_ACCESS_KEY = '<your-secret-access-key>';
+
+-- Load GZIP-compressed CSV file using the connection
 COPY INTO mytable
 FROM 's3://mybucket/data.csv.gz'
-CONNECTION = (
-ENDPOINT_URL = 'https://<endpoint-URL>',
-ACCESS_KEY_ID = '<your-access-key-ID>',
-SECRET_ACCESS_KEY = '<your-secret-access-key>'
-)
+CONNECTION = (CONNECTION_NAME = 'compressed_s3_conn')
 FILE_FORMAT = (
 TYPE = CSV,
 FIELD_DELIMITER = ',',
@@ -432,8 +494,16 @@ COPY INTO mytable
 This example demonstrates how to load CSV files from Amazon S3 using pattern matching with the PATTERN parameter. It filters files with 'sales' in their names and '.csv' extensions:

 ```sql
+-- Create connection for pattern-based file loading
+CREATE CONNECTION pattern_s3_conn
+STORAGE_TYPE = 's3'
+ACCESS_KEY_ID = '<your-access-key-ID>'
+SECRET_ACCESS_KEY = '<your-secret-access-key>';
+
+-- Load CSV files with 'sales' in their names using pattern matching
 COPY INTO mytable
 FROM 's3://mybucket/'
+CONNECTION = (CONNECTION_NAME = 'pattern_s3_conn')
 PATTERN = '.*sales.*[.]csv'
 FILE_FORMAT = (
 TYPE = CSV,
@@ -445,19 +515,19 @@ COPY INTO mytable

 Where `.*` is interpreted as zero or more occurrences of any character. The square brackets escape the period character `.` that precedes a file extension.

-To load from all the CSV files:
+To load from all the CSV files using a connection:

 ```sql
 COPY INTO mytable
 FROM 's3://mybucket/'
+CONNECTION = (CONNECTION_NAME = 'pattern_s3_conn')
 PATTERN = '.*[.]csv'
 FILE_FORMAT = (
 TYPE = CSV,
 FIELD_DELIMITER = ',',
 RECORD_DELIMITER = '\n',
 SKIP_HEADER = 1
 );
-
 ```

 When specifying the pattern for a file path including multiple folders, consider your matching criteria:
@@ -605,7 +675,7 @@ DESC t2;
 An error would occur when attempting to load the data into a table:

 ```sql
-root@localhost:8000/default> COPY INTO t2 FROM @~/invalid_json_string.parquet FILE_FORMAT = (TYPE = PARQUET) ON_ERROR = CONTINUE;
+COPY INTO t2 FROM @~/invalid_json_string.parquet FILE_FORMAT = (TYPE = PARQUET) ON_ERROR = CONTINUE;
 error: APIError: ResponseError with 1006: EOF while parsing a value, pos 3 while evaluating function `parse_json('[1,')`
 ```

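The examples rewritten in this commit cover Amazon S3 (access keys and IAM role), Azure Blob Storage, and Google Cloud Storage. The updated syntax block adds `CONNECTION_NAME` to the other backends as well, so the same workflow should carry over. Below is a minimal sketch for Alibaba Cloud OSS that is not part of this diff: `my_oss_conn` and `mytable` are placeholder names, and it assumes `CREATE CONNECTION` accepts `STORAGE_TYPE = 'oss'` together with the `ACCESS_KEY_ID`, `ACCESS_KEY_SECRET`, and `ENDPOINT_URL` parameters shown in the OSS connection syntax above.

```sql
-- Illustrative sketch only (not part of this commit):
-- pre-create an OSS connection, then reference it by name in COPY INTO,
-- mirroring the S3/Azure/GCS examples added in this change.
CREATE CONNECTION my_oss_conn
STORAGE_TYPE = 'oss'
ACCESS_KEY_ID = '<your-ak>'
ACCESS_KEY_SECRET = '<your-sk>'
ENDPOINT_URL = '<endpoint-url>';

-- The connection is created once and reused by every COPY INTO statement.
COPY INTO mytable
FROM 'oss://mybucket/data.csv'
CONNECTION = (CONNECTION_NAME = 'my_oss_conn')
FILE_FORMAT = (TYPE = CSV, SKIP_HEADER = 1);
```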