Adding of reduce index over existing documents does not work #660

@AndreySurkov

Hi. I have the following situation in an OrchardCore application: there is an existing database with a large number of ContentItems. At some point I decided to introduce a new ReduceIndex to calculate some statistics. I am OK for now with previously saved items not being included in the index, but I at least expect that an item will appear in the index if I re-save it.
But when I re-save a document, the index's Reduce is called and then its Delete, so the results are incorrect. I reproduced the same behaviour in the \YesSql.Samples.FullText.csproj project with some adjustments to Program.cs:

 public class Program
 {
     public static async Task Main(string[] args)
     {
         var filename = "yessql.db";

         if (File.Exists(filename))
         {
             File.Delete(filename);
         }

         var configuration = new Configuration()
             .UseSqLite($"Data Source={filename};Cache=Shared");

         var store = await StoreFactory.CreateAndInitializeAsync(configuration);

         // Create an article before any index is registered
         await using (var session = store.CreateSession())
         {
             await session.SaveAsync(new Article { Content = "This is a green fox" });
             await session.SaveChangesAsync();
         }

         // Recreate the store to emulate an index being added later
         store.Dispose();
         store = await StoreFactory.CreateAndInitializeAsync(configuration);
         await using (var connection = store.Configuration.ConnectionFactory.CreateConnection())
         {
             await connection.OpenAsync();

             await using var transaction = await connection.BeginTransactionAsync(store.Configuration.IsolationLevel);
             var builder = new SchemaBuilder(store.Configuration, transaction);

             await builder.CreateReduceIndexTableAsync<ArticleByWord>(table => table
                 .Column<int>("Count")
                 .Column<string>("Word")
             );

             await transaction.CommitAsync();
         }

         // Register the available indexes
         store.RegisterIndexes<ArticleIndexProvider>();

         // Update the existing document
         await using (var session = store.CreateSession())
         {
             var someArticle = await session.Query<Article>().FirstOrDefaultAsync();
             someArticle.Content = "This is a green wolf";
             await session.SaveAsync(someArticle);
             await session.SaveChangesAsync();
         }

         // Query documents through the new index
         await using (var session = store.CreateSession())
         {
             Console.WriteLine("Simple term: 'green'");
             var simple = await session
                 .Query<Article, ArticleByWord>(x => x.Word == "green")
                 .ListAsync();

             foreach (var article in simple)
             {
                 Console.WriteLine(article.Content);
             }
         }
     }
 }

The ArticlesByWords table then contains:

Id  Count  Word
1   0      This
2   0      is
3   0      green
4   1      wolf

The ArticlesByWords_Document table contains 4 records.
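
One possible reading of these counts, sketched as plain dictionary arithmetic. This is a hypothetical model, not YesSql code: it assumes the re-save adds one count per word of the new content and then subtracts one count per word of the old content, even though the old content was never indexed. The `ReduceModel` class and `Resave` method are names made up for illustration, and the word "a" is left out because it does not appear in the table above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the observed behaviour, NOT the YesSql implementation.
public static class ReduceModel
{
    public static Dictionary<string, int> Resave(
        IEnumerable<string> oldWords, IEnumerable<string> newWords)
    {
        // The index table starts empty: the old version was never indexed.
        var counts = new Dictionary<string, int>();

        // Assumed step 1: the new version's words are reduced into the index.
        foreach (var word in newWords)
        {
            counts.TryGetValue(word, out var current);
            counts[word] = current + 1;
        }

        // Assumed step 2: the old version's words are subtracted, although
        // they were never added. Words without a row are skipped.
        foreach (var word in oldWords)
        {
            if (counts.ContainsKey(word))
            {
                counts[word] -= 1;
            }
        }

        return counts;
    }

    public static void Main()
    {
        var result = Resave(
            new[] { "This", "is", "green", "fox" },   // old content
            new[] { "This", "is", "green", "wolf" }); // new content

        foreach (var pair in result)
        {
            Console.WriteLine($"{pair.Key}: {pair.Value}");
        }
    }
}
```

Under these assumptions the shared words end at 0 and only "wolf" keeps a count of 1, which matches the table, so the subtraction for the never-indexed old version would explain the wrong totals.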

This seems like a fairly common situation to me. Are there any recommendations for handling it?
