Labels
- P2: Important issue, but not time-critical
- community-backlog
- data: Ray Data-related issues
- enhancement: Request for new feature and/or capability
- usability
Description
I have a function that splits the values of certain columns and produces a new column in which each element holds all the parts of the split. However, there is no easy way to flatten that output into one row per part. Currently, the only option is to handle the data manually with map_batches or flat_map.
Use cases:
- string split
- audio split
- video split
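For reference, the flat_map workaround mentioned above might look roughly like the sketch below for the string-split case. The helper name `explode_sentence` and the sample rows are made up for illustration; the per-row logic is exercised in plain Python here, and the Ray call is only indicated in a comment.

```python
from typing import Any


def explode_sentence(row: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one output row per space-separated word in row["sentence"]."""
    # Copy every existing column and add the new "word" column.
    return [{**row, "word": word} for word in row["sentence"].split(" ")]


# Hypothetical sample rows, mirroring the example data in this issue.
rows = [
    {"id": 1, "sentence": "lorem ipsum"},
    {"id": 2, "sentence": "foo bar baz"},
    {"id": 3, "sentence": "hi"},
]

# Pure-Python check of the per-row logic: flatten the per-row lists.
exploded = [out for row in rows for out in explode_sentence(row)]
for out in exploded:
    print(out)

# With Ray Data, the same function would be passed to flat_map, e.g.:
#   import ray
#   ds = ray.data.from_items(rows).flat_map(explode_sentence)
#   ds.show()
```

This works, but every pipeline has to hand-write a function like this and copy the other columns itself, which is the boilerplate an explode expression would remove.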
What I want is an explode expression. Here is the Daft example:
import daft
from daft.functions import explode
df = daft.from_pydict({"id": [1, 2, 3], "sentence": ["lorem ipsum", "foo bar baz", "hi"]})
df.with_column("word", explode(df["sentence"].split(" "))).show()
╭───────┬─────────────┬────────╮
│ id    ┆ sentence    ┆ word   │
│ ---   ┆ ---         ┆ ---    │
│ Int64 ┆ String      ┆ String │
╞═══════╪═════════════╪════════╡
│ 1     ┆ lorem ipsum ┆ lorem  │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 1     ┆ lorem ipsum ┆ ipsum  │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2     ┆ foo bar baz ┆ foo    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2     ┆ foo bar baz ┆ bar    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2     ┆ foo bar baz ┆ baz    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 3     ┆ hi          ┆ hi     │
╰───────┴─────────────┴────────╯
(Showing first 6 of 6 rows)
Use case
import pandas as pd
import ray.data
from ray.data.expressions import explode, col
df = pd.DataFrame({"id": [1, 2, 3], "sentence": ["lorem ipsum", "foo bar baz", "hi"]})
df = ray.data.from_pandas(df)
df.with_column("word", explode(col("sentence").split(" "))).show()
╭───────┬─────────────┬────────╮
│ id    ┆ sentence    ┆ word   │
│ ---   ┆ ---         ┆ ---    │
│ Int64 ┆ String      ┆ String │
╞═══════╪═════════════╪════════╡
│ 1     ┆ lorem ipsum ┆ lorem  │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 1     ┆ lorem ipsum ┆ ipsum  │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2     ┆ foo bar baz ┆ foo    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2     ┆ foo bar baz ┆ bar    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2     ┆ foo bar baz ┆ baz    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 3     ┆ hi          ┆ hi     │
╰───────┴─────────────┴────────╯
(Showing first 6 of 6 rows)
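Until such an expression exists, the same six rows can be produced batch-wise. The sketch below uses pandas' own `DataFrame.explode` inside a hypothetical helper, `explode_words`, which is an illustration and not part of Ray; it is run directly on a pandas DataFrame here, and the map_batches call is only indicated in a comment.

```python
import pandas as pd


def explode_words(batch: pd.DataFrame) -> pd.DataFrame:
    """Split "sentence" on spaces and emit one row per resulting word."""
    # Build a list-valued "word" column, then flatten it to one row per element.
    with_lists = batch.assign(word=batch["sentence"].str.split(" "))
    return with_lists.explode("word", ignore_index=True)


batch = pd.DataFrame(
    {"id": [1, 2, 3], "sentence": ["lorem ipsum", "foo bar baz", "hi"]}
)
result = explode_words(batch)
print(result)

# With Ray Data, the helper could be applied per batch, e.g.:
#   import ray
#   ds = ray.data.from_pandas(batch)
#   ds.map_batches(explode_words, batch_format="pandas").show()
```

This requires converting batches to pandas, which is part of why a native explode expression would be more convenient.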