Conversation

Copilot AI commented Nov 4, 2025

Repositories with deeply nested directories (e.g., blanchon/ETCI-2021-Flood-Detection) time out due to unbounded recursive API calls via Promise.all.

Changes

Added depth limiting

  • Max recursion depth of 10 levels to prevent excessive traversal
  • Returns empty array with warning when limit reached

Added concurrency control

  • Limits parallel API requests to 5 concurrent operations
  • Uses Set-based tracking for O(1) promise lifecycle management
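
The PR calls a `limitConcurrency` helper but the diff excerpt does not show its body. A minimal sketch of what a Set-based implementation along these lines might look like (the body here is an assumption based on the call site, not the PR's actual code):

```javascript
// Sketch of a concurrency limiter: at most `limit` handlers run at once.
// Each in-flight promise is tracked in a Set; when the Set is full we
// Promise.race until one settles and frees a slot.
async function limitConcurrency(items, limit, handler) {
  const results = [];
  const executing = new Set();
  for (const item of items) {
    const promise = Promise.resolve()
      .then(() => handler(item))
      .finally(() => executing.delete(promise));
    executing.add(promise); // added synchronously, before any cleanup can run
    results.push(promise);
    if (executing.size >= limit) {
      await Promise.race(executing); // wait for one in-flight task to settle
    }
  }
  return Promise.all(results); // preserves input order
}
```

Because `results` collects the promises in input order, `Promise.all` returns results in the same order the items were given, regardless of which handlers finished first.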

Implementation

// Before: unbounded parallel recursion
const result = await Promise.all(
  data.map(async (item) => {
    if (item.type === 'directory') return await getMetaData(basePath, item.path);
    return item;
  })
);

// After: depth-limited with concurrency control
async function getMetaDataInternal(basePath, filepath, depth = 0, maxDepth = 10) {
  if (depth >= maxDepth) return []; // stop (with a warning) when the tree is too deep

  // `data` is the directory listing fetched for basePath/filepath (fetch omitted here)
  const result = await limitConcurrency(data, 5, async (item) => {
    if (item.type === 'directory') {
      return await getMetaDataInternal(basePath, item.path, depth + 1, maxDepth);
    }
    return item;
  });

  return result.flat(Infinity);
}
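
The trailing `flat(Infinity)` is needed because each directory entry resolves to a nested array of its own entries. A hypothetical illustration (the paths here are made up):

```javascript
// Each recursion level wraps its entries in another array, so the raw
// result mixes file objects with nested arrays; flat(Infinity) collapses
// arbitrarily deep nesting into a single flat list of file entries.
const nested = [
  { path: 'README.md' },
  [{ path: 'data/a.tif' }, [{ path: 'data/sub/b.tif' }]],
];
const flattened = nested.flat(Infinity);
console.log(flattened.map((f) => f.path));
// → [ 'README.md', 'data/a.tif', 'data/sub/b.tif' ]
```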

Documentation

  • Updated README files (EN, zh-CN, zh-TW) documenting the 10-level depth limit and the 5-request concurrency limit

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • hf-mirror.com
    • Triggering command: curl -s REDACTED (dns block)
    • Triggering command: curl -H Accept: application/json REDACTED (dns block)
    • Triggering command: node --experimental-vm-modules --trace-warnings node_modules/jest/bin/jest.js (dns block)
  • huggingface.co
    • Triggering command: curl -H Accept: application/json REDACTED (dns block)
  • registry.npmmirror.com
    • Triggering command: npm install (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

This section details the original issue you should resolve

<issue_title>Parsing timeout on repositories with extremely deep directory trees and many files</issue_title>
<issue_description>For example: https://huggingface.co/datasets/blanchon/ETCI-2021-Flood-Detection</issue_description>

<agent_instructions>Guarantee that the previous features are available</agent_instructions>

Comments on the Issue (you are @copilot in this section)



@DSYZayn DSYZayn requested review from DSYZayn and Copilot November 4, 2025 17:22
Copilot AI left a comment


Pull Request Overview

This PR adds concurrency control and recursion depth limiting to the metadata fetching functionality to prevent timeouts when parsing extremely deep repository structures on HuggingFace.

Key changes:

  • Implements a limitConcurrency function to control concurrent API requests (max 5 concurrent)
  • Adds maximum recursion depth limit (10 levels) with early termination
  • Refactors the main function to use an internal implementation with depth tracking

Reviewed Changes

Copilot reviewed 4 out of 6 changed files in this pull request and generated 1 comment.

File                        Description
src/api/getMetaData.js      Adds concurrency control and depth limiting to prevent timeouts
README.md                   Documents the new 10-level depth limit and 5-concurrent subdirectory processing
readme/README.zh-CN.md      Simplified Chinese documentation for depth and concurrency limits
readme/README.zh-TW.md      Traditional Chinese documentation for depth and concurrency limits
dist/index.js               Minified production build reflecting the source code changes

Comment on lines 13 to 19
const promise = handler(item).then((result) => {
  executing.delete(promise);
  return result;
});

results.push(promise);
executing.add(promise);

Copilot AI Nov 4, 2025


The cleanup handler is created before the promise is added to the executing set, so the removal may run against an entry that was never tracked. This creates a race condition where the promise may not be properly removed from the tracking set. Store the promise first, then create the cleanup handler, or use a different approach such as tracking with a Map or wrapping in an immediately resolved promise.

Suggested change
Remove:

const promise = handler(item).then((result) => {
  executing.delete(promise);
  return result;
});
results.push(promise);
executing.add(promise);

Add:

let resolveWrapper;
const wrapperPromise = new Promise((resolve, reject) => {
  resolveWrapper = resolve;
});
executing.push(wrapperPromise);
const promise = handler(item)
  .then((result) => {
    executing.splice(executing.indexOf(wrapperPromise), 1);
    resolveWrapper(result);
    return result;
  })
  .catch((err) => {
    executing.splice(executing.indexOf(wrapperPromise), 1);
    // propagate error to wrapperPromise
    resolveWrapper(); // resolve to allow Promise.race to proceed
    throw err;
  });
results.push(promise);
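
The review also mentions tracking with a Map as an alternative. A hypothetical sketch of that approach (not the PR's code; names and structure are assumptions) keys each in-flight promise by a token created before the handler runs, so cleanup can never miss its entry:

```javascript
// Map-based tracking: the key exists before the handler's cleanup can fire,
// so deletion is always O(1) and cannot race with insertion.
async function limitConcurrencyMap(items, limit, handler) {
  const results = [];
  const executing = new Map(); // token -> in-flight promise
  let token = 0;
  for (const item of items) {
    const key = token++;
    const promise = Promise.resolve()
      .then(() => handler(item))
      .finally(() => executing.delete(key));
    executing.set(key, promise); // runs before the finally callback can fire
    results.push(promise);
    if (executing.size >= limit) {
      await Promise.race(executing.values()); // free a slot before continuing
    }
  }
  return Promise.all(results);
}
```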

Copilot AI changed the title from "[WIP] Fix repository parsing timeout for large datasets" to "Fix timeout on deep repository parsing with depth limiting and concurrency control" Nov 4, 2025
Copilot finished work on behalf of DSYZayn November 4, 2025 17:33
*/
export default async function getMetaData(basePath, filepath) {
  return getMetaDataInternal(basePath, filepath, 0, 10);
}
Owner


It looks good to me, but you need to adapt my unit test scripts.

Successfully merging this pull request may close this issue:

超深超多文件仓库解析超时 (parsing timeout on repositories with extremely deep directory trees and many files)
