13 changes: 7 additions & 6 deletions README.md
@@ -211,13 +211,14 @@ Community Scripts are a bunch of scripts made to solve specific issues. They are

They can be found in the [./community_scripts](https://github.com/thiswillbeyourgithub/karakeep_python_api/tree/main/community_scripts) folder. Don't hesitate to submit yours!

| Community Script | Description | Documentation |
|----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------|
| **Karakeep-Time-Tagger** | Automatically adds time-to-read tags (`0-5m`, `5-10m`, etc.) to bookmarks based on content length analysis. Includes systemd service and timer files for automated periodic execution. | [`Link`](https://github.com/thiswillbeyourgithub/karakeep_python_api/tree/main/community_scripts/karakeep-time-tagger) |
| **Karakeep-List-To-Tag** | Converts a Karakeep list into tags by adding a specified tag to all bookmarks within that list. | [`Link`](https://github.com/thiswillbeyourgithub/karakeep_python_api/tree/main/community_scripts/karakeep-list-to-tag) |
| **Omnivore2Karakeep-Highlights** | Imports highlights from Omnivore export data to Karakeep, with intelligent position detection and bookmark matching. Supports dry-run mode for testing. | [`Link`](https://github.com/thiswillbeyourgithub/karakeep_python_api/tree/main/community_scripts/omnivore2karakeep-highlights) |
| **Omnivore2Karakeep-Archived** | (Should not be needed anymore) Fixes the archived status of bookmarks imported from Omnivore by reading export data and updating Karakeep accordingly. | [`Link`](https://github.com/thiswillbeyourgithub/karakeep_python_api/tree/main/community_scripts/omnivore2karakeep-archived) |
| **pocket2karakeep-archived** | (Should not be needed anymore) Fixes the archived status of bookmarks imported from Pocket by reading export data and updating Karakeep accordingly. | [`Link`](https://github.com/thiswillbeyourgithub/karakeep_python_api/tree/main/community_scripts/pocket2karakeep-archived) |
| **karakeep-archive-before-date** | Archives all unarchived bookmarks created before a given date. | [`Link`](https://github.com/thiswillbeyourgithub/karakeep_python_api/tree/main/community_scripts/karakeep-archive-before-date) |

## Development

22 changes: 22 additions & 0 deletions community_scripts/karakeep-archive-before-date/README.md
@@ -0,0 +1,22 @@
# Karakeep Archive before date

A small cleanup script that archives old articles left unarchived after an import from another read-later app.

## Prerequisites

N/A

## Usage

Pass a cutoff date. All unarchived bookmarks created before this date will be archived.

```bash
python archiving_before_date.py --before-date 2023-12-24
```

The `--before-date` value must use the `YYYY-MM-DD` format.
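As a rough illustration of how the cutoff is applied, the sketch below mirrors the comparison done in the script: the date is parsed as midnight of the given day, and a bookmark qualifies when its creation timestamp is not later than that. The helper names `parse_cutoff` and `should_archive` are hypothetical and exist only for this example; the timestamp format is the one the script parses from `createdAt`.

```python
from datetime import datetime


def parse_cutoff(before_date: str) -> datetime:
    """Parse a YYYY-MM-DD string into a naive datetime at midnight."""
    return datetime.strptime(before_date, "%Y-%m-%d")


def should_archive(created_at: str, cutoff: datetime) -> bool:
    """Return True when the creation timestamp is not later than the cutoff."""
    created = datetime.strptime(created_at, "%Y-%m-%dT%H:%M:%S.%fZ")
    return created <= cutoff


cutoff = parse_cutoff("2023-12-24")
print(should_archive("2023-12-23T18:30:00.000Z", cutoff))  # True
print(should_archive("2023-12-24T08:00:00.000Z", cutoff))  # False
```

Because the cutoff is midnight, a bookmark created later on the cutoff day itself is left untouched.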

You might need to set up environment variables for the Karakeep API client or pass them as arguments if the script supports it (e.g., `KARAKEEP_PYTHON_API_BASE_URL` and `KARAKEEP_PYTHON_API_KEY`). Refer to the script's help or the `karakeep-python-api` documentation for more details on authentication.



@@ -0,0 +1,93 @@
"""
Small script that archives old articles left unarchived after an import from another read-later app.

Parameters:
    before_date: Date in YYYY-MM-DD format. Articles created before this date will be archived.
"""
import time
from datetime import datetime

from Levenshtein import ratio
import pickle
from fire import Fire
from typing import Optional
from pathlib import Path
import json
import csv
from karakeep_python_api import KarakeepAPI
from tqdm import tqdm

VERSION: str = "1.0.0"

karakeep = KarakeepAPI(verbose=False)


def main(before_date: str) -> None:
    """Archive articles created before the specified date.

    Args:
        before_date: Date string in YYYY-MM-DD format
    """
    before_date = datetime.strptime(before_date, "%Y-%m-%d")

    n = karakeep.get_current_user_stats()["numBookmarks"]
    pbar = tqdm(total=n, desc="Fetching bookmarks")
    all_bm = []
    batch_size = 100  # if you set it too high, you can crash the karakeep instance, 100 being the maximum allowed
    page = karakeep.get_all_bookmarks(
        include_content=False,
        limit=batch_size,
    )
    all_bm.extend(page.bookmarks)
    pbar.update(len(all_bm))
    while page.nextCursor:
        page = karakeep.get_all_bookmarks(
            include_content=False,
            limit=batch_size,
            cursor=page.nextCursor,
        )
        all_bm.extend(page.bookmarks)
        pbar.update(len(page.bookmarks))

    assert (
        len(all_bm) == n
    ), f"Only retrieved {len(all_bm)} bookmarks instead of {n}"
    pbar.close()

    failed = []
    for bookmark in all_bm:

        # skip already archived
        if bookmark.archived:
            continue


        #tqdm.write(f"Creation Date: {bookmark.createdAt}")
        creation_date = datetime.strptime(bookmark.createdAt, "%Y-%m-%dT%H:%M:%S.%fZ")

        if creation_date > before_date:
            continue

        # do the archiving
        retries = 3
        for attempt in range(retries):
            try:
                res_arch = karakeep.update_a_bookmark(
                    bookmark_id=bookmark.id,
                    update_data={"archived": True},
                )
                break
            except Exception as e:
                if attempt == retries - 1:
                    raise e
                tqdm.write(f"Update failed, retrying ({attempt + 1}/{retries})")
                time.sleep(1)
        if isinstance(res_arch, dict):
            assert res_arch["archived"], res_arch
        else:
            assert res_arch.archived, res_arch
        tqdm.write(f"Successfully archived: {bookmark.title}")


if __name__ == "__main__":
    Fire(main)
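The retry loop used for the update call could be factored into a small reusable helper. The following is only a sketch under that assumption: `with_retries` and the `flaky` stand-in function are hypothetical names introduced for illustration and are not part of this PR or of `karakeep_python_api`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], retries: int = 3, delay: float = 1.0) -> T:
    """Run `call`, retrying on any exception for up to `retries` attempts."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            # Re-raise the original error once the last attempt has failed.
            if attempt == retries - 1:
                raise
            print(f"Call failed, retrying ({attempt + 1}/{retries})")
            time.sleep(delay)
    raise AssertionError("unreachable")


# Hypothetical stand-in for a flaky API call: fails twice, then succeeds.
attempts = {"count": 0}


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


result = with_retries(flaky, retries=3, delay=0.0)
```

In the script above, the wrapped call would be something like `lambda: karakeep.update_a_bookmark(bookmark_id=bookmark.id, update_data={"archived": True})`, which keeps the archiving loop itself short.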
@@ -7,7 +7,6 @@
It identifies entries with status "archive" and updates their status in Karakeep.

"""

import time

from Levenshtein import ratio
@@ -20,6 +19,8 @@
from karakeep_python_api import KarakeepAPI
from tqdm import tqdm

VERSION: str = "1.1.0"

karakeep = KarakeepAPI(verbose=False)


@@ -137,7 +138,9 @@ def main(
elif hasattr(content, "sourceUrl"):
found_url = content.sourceUrl
else:
- breakpoint()
+ found_url = ""



if found_url == url:
found_it = True
@@ -175,7 +178,7 @@ def main(
r = ratio(pocket["title"].lower(), content.title.lower())
if r >= threshold:
found_it = True
- breakpoint()
+ #breakpoint()
break

if (
@@ -189,11 +192,12 @@ def main(
found_it = True
break


# couldn't be found
if not found_it:
failed.append(pocket)
tqdm.write(f"Failed to find {url}")
- breakpoint()
+ #breakpoint()
with open("./omnivore_archiver_failed.txt", "a") as f:
f.write(f"\n{pocket}")
continue
@@ -202,15 +206,21 @@ def main(
if bookmark.archived:
tqdm.write(f"Already archived: {url}")
continue
- fresh = karakeep.get_a_single_bookmark(
-     bookmark_id=bookmark.id, include_content=False
- )
+ for attempt in range(5):
+     try:
+         fresh = karakeep.get_a_single_bookmark(bookmark_id=bookmark.id, include_content=False)
+         break
+     except Exception as e:
+         if attempt == 4:
+             raise e
+         tqdm.write(f"Get single bookmark failed, retrying ({attempt + 1}/5)")
+         time.sleep(1)
if fresh.archived:
tqdm.write(f"Already archived: {url}")
continue

# do the archiving
- retries = 3
+ retries = 10
for attempt in range(retries):
try:
res_arch = karakeep.update_a_bookmark(