6 changes: 6 additions & 0 deletions .gitignore
@@ -6,3 +6,9 @@ coverage/
.nyc_output
*~
\#*#

# dotenv environment variables file
.env
env.json

*.env.list
66 changes: 66 additions & 0 deletions docs/add-new-harvester.md
@@ -0,0 +1,66 @@
# Add a new harvest source

This is a document in progress detailing the steps necessary to add a new harvest source.

## Community Engagement
When adding a new harvest source, the points to be considered are located at https://github.com/clearlydefined/clearlydefined/blob/master/docs/adding-sources.md#adding-a-new-harvest-source. Please include these considerations and document them in your GitHub issue, similar to https://github.com/clearlydefined/service/issues/882.

## Crawler
Need to implement:

Review comment from @elrayle (Collaborator), Feb 15, 2024:

Suggested change
- Need to implement:
+ In the crawler, each harvest source must implement fetch and extract functions. The functions are prefixed with a name that identifies the source (e.g. mavenFetch, mavenExtract).
+ Implement:

- xxFetch, which downloads package and registry information.
- xxExtract, which creates a document from the fetched information, queues scan tools, and invokes source discovery.
- To enable the queuing of scan tools, config/map.js needs to be updated.
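
The division of work between the two functions can be sketched as follows. This is a minimal illustration, not the crawler's actual API: the `foo` prefix, the `makeFooFetch` and `makeFooExtract` factories, the `canHandle`/`handle` shape, the `toolMap` layout, and the example URL are all assumptions made for the sketch. The download step is injected and synchronous so the sketch runs without network access; a real fetch would be asynchronous.

```javascript
// Hypothetical sketch: none of these names come from the crawler's real API.
// "foo" stands in for the source prefix (the "xx" in xxFetch / xxExtract).

// fooFetch: downloads package and registry information for a request.
function makeFooFetch(downloadRegistryData) {
  return {
    canHandle: request => request.type === 'foo',
    handle: request => {
      const registryData = downloadRegistryData(request.name, request.revision)
      return { ...request, registryData }
    }
  }
}

// fooExtract: builds a document from the fetched data, queues scan tools,
// and triggers source discovery.
function makeFooExtract(toolMap, queue) {
  return {
    canHandle: request => request.type === 'foo',
    handle: request => {
      const harvested = {
        type: request.type,
        name: request.name,
        revision: request.revision,
        registryData: request.registryData
      }
      // Queue one task per scan tool configured for this source type.
      for (const tool of toolMap[request.type] || []) {
        queue.push({ tool, name: request.name, revision: request.revision })
      }
      // Source discovery: if the registry data names a repository, queue it too.
      const sourceUrl = request.registryData.repository
      if (sourceUrl) queue.push({ tool: 'source', url: sourceUrl })
      return harvested
    }
  }
}

// Stand-in for the entry that config/map.js would need for the new source
// (assumed layout; the real file may be structured differently).
const toolMap = { foo: ['scancode', 'licensee'] }

const queue = []
const fooFetch = makeFooFetch((name, revision) => ({
  name,
  revision,
  repository: 'https://example.com/foo/bar' // made-up URL for illustration
}))
const fooExtract = makeFooExtract(toolMap, queue)

const request = { type: 'foo', name: 'bar', revision: '1.0.0' }
const harvested = fooExtract.handle(fooFetch.handle(request))
```

In this sketch, a source type with no entry in `toolMap` would still produce a document but would queue no scan tools, which is why the map update is listed as a separate step.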

Example commit:
```
Commit: ea1618de0de4663b4d9aeecc9cdbc392edb2feba [ea1618d]
Parents: 968422b174
Author: Nell Shamrell
Date: October 27, 2021 4:16:41 PM
Committer: Nell Shamrell
adds support for fetching and extracting go packages to the crawler
```

## Service

1. ClearlyDescribedSummarizer

Example commit:
```
Commit: 5e8b305f108b8cc9bd18c35ad5c626f71c081ef2 [5e8b305]
Parents: 6906969865
Author: Nell Shamrell
Date: August 3, 2021 2:55:44 PM
Committer: Nell Shamrell
Commit Date: October 27, 2021 3:56:37 PM
adds in code and test for determining urls for a go package
```
2. /origin endpoint (for UI queries)

Example commit:
```
Commit: 8c057670781451cfd7ca22337cecacae2124ac85 [8c05767]
Parents: 542c02763d
Author: Nell Shamrell
Date: October 28, 2021 4:17:40 PM
Committer: Nell Shamrell
adds ability to get go package revisions through the service API
```
3. Update validation schemas

Example commit:
```
Commit: 21e11c45b97c06170f436db498c534ff079443d3 [21e11c4]
Parents: 90c8414909
Author: Nell Shamrell
Date: July 29, 2021 3:17:35 PM
Committer: Nell Shamrell
Commit Date: October 27, 2021 3:55:33 PM
adds go as a type in schemas
```
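
As a rough illustration of the schema step, the sketch below adds a new type to an enum and checks values against it. The `typeSchema` fragment, the `foo` type, and the `isValidType` helper are hypothetical; the actual schema files in the service and their layout may differ, and a real setup would use a full JSON Schema validator rather than this hand-written check.

```javascript
// Hypothetical fragment: the real schema files and their layout may differ.
const typeSchema = {
  type: 'string',
  enum: ['npm', 'maven', 'go', 'foo'] // 'foo' is the newly added source type
}

// Minimal enum check, standing in for a full JSON Schema validator.
function isValidType(value) {
  return typeof value === 'string' && typeSchema.enum.includes(value)
}
```

Until the new type is listed, `isValidType` rejects it, which mirrors how requests for an unregistered source type would fail validation.
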

## Documentation
Update the following documents to reflect the new harvest source:
- service/README.md
- service/swagger.yaml
- service/docs/determining-declared-license.md
- clearlydefined/docs/adding-sources.md