Backfill job fails when sharding on datetime[ns] #1291

@FirefoxMetzger

This is likely a user error on my part, but I couldn't work it out from the docs. I figured I'd ask here so we can see whether the docs can be improved and/or whether there is an actual issue.

I'm trying to create a sharded/partitioned feature group that uses both a primary key and a partition key. The feature group is created successfully, but inserting data into it fails:

import pandas as pd
import numpy as np
import hopsworks
from datetime import datetime, timedelta

rng = np.random.default_rng(1234)

# insertion using numpy types works fine
np_data = (
    pd.DataFrame({
        "index": np.arange(10),
        "feature": np.arange(10, 0, -1),
    })
    .astype(np.int8)
    .assign(event_time=[datetime.now()+timedelta(seconds=int(x)) for x in rng.integers(0, 100, 10)])
)

ctx = hopsworks.login()
fs = ctx.get_feature_store()

feature_group = fs.get_or_create_feature_group(
    name="foo",
    version="1",
    description="an example",
    primary_key=["index"],
    partition_key=["event_time"]
)

feature_group.insert(np_data)  # FeatureStoreException

Here is a link to the (failed) backfill job: https://c.app.hopsworks.ai/p/16549/jobs/named/foo_1_offline_fg_backfill/executions
(I can also share the logs if necessary.)
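
For context, here is a minimal pandas-only sketch of what the partition column in the example above looks like. It does not call Hopsworks and does not establish the cause of the failure. It only shows that `event_time` is a fine-grained timestamp column, and how a coarser column could be derived from it if one wanted to test partitioning on something else. The column name `event_date` and the fixed base time are assumptions made for illustration.

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd

rng = np.random.default_rng(1234)

# Same shape as the frame in the report, but with a fixed base time
# (an assumption, so the result does not depend on when this is run).
base = datetime(2024, 1, 1, 12, 0, 0)
df = (
    pd.DataFrame({
        "index": np.arange(10),
        "feature": np.arange(10, 0, -1),
    })
    .astype(np.int8)
    .assign(event_time=[base + timedelta(seconds=int(x)) for x in rng.integers(0, 100, 10)])
)

# pandas stores the list of datetime objects as a datetime64 column.
print(df["event_time"].dtype)

# Hypothetical coarser column: one string value per calendar day.
df["event_date"] = df["event_time"].dt.strftime("%Y-%m-%d")

# Compare how many distinct values each candidate partition column has.
print(df["event_time"].nunique(), df["event_date"].nunique())
```

Whether partitioning on a column like `event_date` instead of `event_time` avoids the backfill failure is untested here; the sketch is only meant to make the shape of the data explicit.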
