How to get the scraper to not store original HTML files? #233

nishalsach · 2022-07-25T19:30:35Z

Hi, the documentation says that with the default config, the tool also stores the original HTML files from websites when running in CLI mode. Is there a way to switch this off and store only the extracted JSON? I couldn't find how to do this in the config file. Any help would be appreciated!

nishalsach · 2024-03-05T22:13:51Z

In case anybody else is curious, I worked around this by running a cron job, set up from a separate command-line session, that deletes all .html files in the news-please subdirectory for the chosen date roughly every 2 minutes.
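
For anyone who wants to try the same workaround, here is a minimal sketch of such a cleanup script. It is not the script used above. The function name `delete_html_files` and the idea of passing the target directory as a command-line argument are assumptions made for illustration, and the script simply removes every `.html` file below whatever directory it is given.

```python
import sys
from pathlib import Path


def delete_html_files(root: Path) -> int:
    """Delete every *.html file under root (recursively); return how many were removed.

    Hypothetical helper for illustration; it is not part of news-please.
    """
    removed = 0
    # Materialize the matches first so the tree is not modified while it is being walked.
    for path in list(root.rglob("*.html")):
        if not path.is_file():
            continue
        try:
            path.unlink()
            removed += 1
        except FileNotFoundError:
            # The file disappeared between listing and deletion; nothing left to do.
            pass
    return removed


if __name__ == "__main__" and len(sys.argv) == 2:
    # The directory to clean is passed as the only argument (an assumed convention).
    count = delete_html_files(Path(sys.argv[1]))
    print(f"removed {count} .html file(s)")
```

To run it every 2 minutes, a crontab entry along the lines of `*/2 * * * * python3 /path/to/delete_html.py /path/to/data/dir` should work, where both paths are placeholders to be replaced with the script location and the news-please subdirectory for the chosen date. Note that this only cleans up after the fact; the crawler still writes the HTML files first.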
