Instagram Archiver


Commands

instagram-archiver

Archive a profile (USERNAME) or your saved posts (--saved).

Pass exactly one of: a USERNAME positional argument, or --saved/-s.

Usage

instagram-archiver [OPTIONS] [USERNAME]

Options

-o, --output-dir <output_dir>

Output directory. Defaults to the username (profile mode) or . (saved mode).

-b, --browser <browser>

Browser to read cookies from.

Options:

brave | chrome | chromium | edge | opera | vivaldi | firefox | safari

-p, --profile <profile>

Browser profile.

-d, --debug

Enable debug output.

-q, --quiet

Disable progress display updates.

-S, --sleep-time <sleep_time>

Number of seconds yt-dlp waits between requests.

--no-log

Ignore log (re-fetch everything).

-C, --include-comments

Also download all comments (extends download time significantly).

-R, --include-child-comments

Also recursively download child (reply) comments. Implies --include-comments.

-s, --saved

Archive your saved posts instead of a profile (mutually exclusive with USERNAME).

-u, --unsave

Unsave posts after successful archive (only with --saved).

Arguments

USERNAME

Instagram username of the profile to archive. Optional: omit it when using --saved.

instagram-archiver -o ~/instagram-backups/username username
instagram-archiver --saved -o ~/instagram-backups/saved

In profile mode the default output path is the username under the current working directory; in --saved mode it is the current working directory.
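
The argument and default-path rules above can be sketched in a few lines. This is only an illustration of the documented behaviour, not the tool's own code; default_output_dir is a hypothetical helper name.

```python
from __future__ import annotations

from pathlib import Path


def default_output_dir(
    username: str | None, *, saved: bool, output_dir: str | None = None
) -> Path:
    """Hypothetical helper mirroring the documented defaults."""
    # Exactly one of USERNAME or --saved must be supplied.
    if saved == (username is not None):
        raise ValueError('Pass exactly one of USERNAME or --saved.')
    # An explicit --output-dir always wins.
    if output_dir is not None:
        return Path(output_dir)
    # Saved mode defaults to the current directory, profile mode to ./USERNAME.
    return Path('.') if saved else Path(username)
```

Returning a relative Path leaves resolution against the current working directory to the caller, which matches the wording of the defaults.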

Videos are saved using yt-dlp and its respective configuration.

Library

Generic client.

exception instagram_archiver.client.CSRFTokenNotFound

CSRF token not found in cookies.

class instagram_archiver.client.InstagramClient(browser: BrowserName = 'chrome', browser_profile: str = 'Default')

Generic asynchronous client for Instagram.

add_csrf_token_header() None

Add CSRF token header to the session.

Raises:

CSRFTokenNotFound – If the CSRF token is not found in the cookies.

add_video_url(url: str) None

Add a video URL to the list of video URLs.

Parameters:
url : str

URL to enqueue for yt-dlp.

async dispatch_edges(edges: Iterable[Edge], image_queue: asyncio.Queue[Edge | None], comments_queue: asyncio.Queue[Edge | None], video_queue: asyncio.Queue[str | None], *, parent_edge: Edge | None = None, stats: Stats | None = None, yt_dlp_state: YTDLPState | None = None) None

Dispatch edges to the appropriate worker queue.

Parameters:
edges : Iterable[Edge]

Edges to dispatch.

image_queue : asyncio.Queue[Edge | None]

Queue receiving non-video edges.

comments_queue : asyncio.Queue[Edge | None]

Queue receiving edges whose comments should also be saved.

video_queue : asyncio.Queue[str | None]

Queue receiving video URLs.

parent_edge : Edge | None

Optional parent edge used as a fallback for the shortcode lookup.

stats : Stats | None

Optional live statistics object whose POSTS_HANDLED counter is incremented for every dispatched edge.

yt_dlp_state : YTDLPState | None

Optional yt-dlp progress state whose total_urls counter is incremented for every URL routed to the video worker.
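
A rough sketch of the routing idea follows. It is not the library's implementation: the rule used to detect a video (a non-empty video_dash_manifest) and the permalink format are assumptions, and dispatch_sketch is a hypothetical name.

```python
from __future__ import annotations

import asyncio
from collections.abc import Iterable
from typing import Any


async def dispatch_sketch(
    edges: Iterable[dict[str, Any]],
    image_queue: asyncio.Queue,
    comments_queue: asyncio.Queue,
    video_queue: asyncio.Queue,
    *,
    save_comments: bool = False,
) -> int:
    """Route each edge to a worker queue and return how many were handled."""
    handled = 0
    for edge in edges:
        node = edge['node']
        if node.get('video_dash_manifest'):
            # Assumption: a DASH manifest marks a video post, and the URL
            # handed to the video worker is built from the shortcode.
            await video_queue.put(f"https://www.instagram.com/p/{node['code']}/")
        else:
            await image_queue.put(edge)
        if save_comments:
            await comments_queue.put(edge)
        handled += 1
    return handled
```

The returned count plays the role that the POSTS_HANDLED counter plays in the description above.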

failed_urls : set[str]

Set of failed URLs.

async get_json(url: str, *, cast_to: type[T], headers: Mapping[str, str] | None = None, params: Mapping[str, str] | None = None) T

Get JSON data from a URL.

Parameters:
url : str

URL to fetch.

cast_to : type[T]

Expected type of the decoded JSON body.

headers : Mapping[str, str] | None

Optional per-call headers. When None (the default), API_HEADERS is used. Passing an explicit dict (typically API_HEADERS plus a Referer) lets callers like save_comments() mirror the per-post Referer the browser sends.

params : Mapping[str, str] | None

Optional query string parameters.

Returns:

Response body decoded from JSON.

Return type:

T
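
The "API_HEADERS plus a Referer" pattern mentioned for the headers parameter might look like the sketch below. The header values are placeholders (the real API_HEADERS lives in instagram_archiver.constants), and headers_with_referer is a hypothetical helper.

```python
from collections.abc import Mapping

# Placeholder only; not the contents of the real API_HEADERS constant.
API_HEADERS_PLACEHOLDER: Mapping[str, str] = {'Accept': '*/*'}


def headers_with_referer(base: Mapping[str, str], referer: str) -> dict[str, str]:
    """Return a copy of base with a Referer header added; base is not mutated."""
    return {**base, 'Referer': referer}
```

Building a new dict per call keeps the shared default headers unchanged between requests.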

async get_text(url: str, *, params: Mapping[str, str] | None = None) str

Get text from a URL.

Parameters:
url : str

URL to fetch.

params : Mapping[str, str] | None

Optional query string parameters.

Returns:

Response body as text.

Return type:

str

async graphql_query(variables: Mapping[str, Any], *, cast_to: type[T], doc_id: str = '9806959572732215') T | None

Make a GraphQL query.

Parameters:
variables : Mapping[str, Any]

Variables passed to the query.

cast_to : type[T]

Expected type of the data field in a successful response.

doc_id : str

GraphQL document identifier.

Returns:

The data payload, or None if the request failed or the response was invalid.

Return type:

T | None

async highlights_tray(user_id: int | str) HighlightsTray

Get the highlights tray data for a user.

Parameters:
user_id : int | str

Instagram user identifier.

Returns:

Highlights tray payload from the API.

Return type:

HighlightsTray

is_saved(url: str) bool

Check if a URL is already saved.

Parameters:
url : str

URL to check.

Returns:

False in the base implementation.

Return type:

bool

Fetch a page of the PolarisStoriesV3 reel gallery.

Used to retrieve full story metadata (image and video items) for the supplied reels. The first page uses the ReelPageGalleryQuery document; subsequent pages (when after is supplied) use the pagination document.

Parameters:
reel_ids : Sequence[str]

Numeric reel identifiers (user IDs for current stories or numeric highlight IDs).

after : str | None

Cursor returned by a previous page, or None for the first page.

first : int

Maximum number of reels to return per page.

initial_reel_id : str | None

Reel that the user “opened first”. Defaults to the first entry of reel_ids.

is_highlight : bool

True when reel_ids refer to highlights, False for current stories.

Returns:

The connection payload, or None when the request fails or the response shape is unexpected.

Return type:

XDTStoriesV3ReelPageGalleryConnection | None
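
Cursor-based paging of this kind is usually driven by a loop like the one sketched here. The fetch_page callable stands in for the method above, and the page_info field names (has_next_page, end_cursor) are assumptions, since PageInfo is not described on this page.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator, Awaitable, Callable
from typing import Any


async def iter_pages(
    fetch_page: Callable[[str | None], Awaitable[dict[str, Any] | None]],
) -> AsyncIterator[dict[str, Any]]:
    """Yield pages until the fetcher fails or reports no further page."""
    after: str | None = None  # None requests the first page
    while True:
        page = await fetch_page(after)
        if page is None:  # request failed or response shape was unexpected
            return
        yield page
        info = page.get('page_info', {})
        # Assumed field names; stop when there is no usable cursor.
        if not info.get('has_next_page') or not info.get('end_cursor'):
            return
        after = info['end_cursor']
```

Stopping when the cursor is missing avoids re-requesting the first page forever if the API reports another page without a cursor.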

async save_comments(edge: Edge) None

Save comments for an edge node.

When should_save_child_comments is True, replies are also fetched for every top-level comment that reports having any (child_comment_count > 0) and embedded back into the saved JSON under the parent’s child_comments key.

Parameters:
edge : Edge

Edge whose comments should be saved.
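
The reply-embedding step described above can be sketched as follows. The fetch_replies callable is a stand-in for whatever request the library makes, and embed_replies is a hypothetical name; only the child_comment_count and child_comments keys come from the description.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable
from typing import Any


async def embed_replies(
    comments: list[dict[str, Any]],
    fetch_replies: Callable[[str], Awaitable[list[dict[str, Any]]]],
) -> None:
    """Attach replies to every top-level comment that reports having any."""
    for comment in comments:
        # Skip comments without replies so no extra request is made for them.
        if comment.get('child_comment_count', 0) > 0:
            comment['child_comments'] = await fetch_replies(comment['id'])
```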

async save_edges(edges: Iterable[Edge], parent_edge: Edge | None = None) None

Save edge node media.

Parameters:
edges : Iterable[Edge]

Edges to process.

parent_edge : Edge | None

Optional parent edge used as a fallback for the shortcode lookup.

async save_image_versions2(sub_item: CarouselMedia | MediaInfoItem | StoryReelItem, timestamp: int) None

Save images in the image_versions2 dictionary.

Parameters:
sub_item : CarouselMedia | MediaInfoItem | StoryReelItem

Source item containing image_versions2 candidates.

timestamp : int

Timestamp to apply to the saved file.

async save_media(edge: Edge) None

Save media for an edge node.

Parameters:
edge : Edge

Edge whose media should be saved.

Raises:

UnexpectedRedirect – If a redirect occurs unexpectedly.

async save_reel_item(item: StoryReelItem, video_queue: asyncio.Queue[str | None] | None = None, *, username: str | None = None, yt_dlp_state: YTDLPState | None = None) None

Save a single story item.

Image-only items are written via save_image_versions2(); items with a video are routed to video_queue for the yt-dlp worker (or appended to video_urls when no queue is supplied, mirroring the synchronous helper used elsewhere).

Parameters:
item : StoryReelItem

Story item payload from a reel page gallery response.

video_queue : asyncio.Queue[str | None] | None

Optional queue receiving permalinks for the yt-dlp worker. When None, video URLs are appended to video_urls instead.

username : str | None

Username of the reel owner. Used to build the stories/{username}/{pk}/ permalink for video items. Falls back to the literal "_" when not available, which yt-dlp still accepts because it identifies the story by pk.

yt_dlp_state : YTDLPState | None

Optional yt-dlp progress state whose total_urls counter is incremented when a video URL is enqueued.
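
The permalink fallback for username can be illustrated like this. The https://www.instagram.com/ host prefix is an assumption, and story_permalink is a hypothetical helper rather than part of the library.

```python
from __future__ import annotations


def story_permalink(pk: str, username: str | None = None) -> str:
    """Build a stories/{username}/{pk}/ permalink, using "_" when no username is known."""
    # Assumption: the permalink is rooted at https://www.instagram.com/.
    return f"https://www.instagram.com/stories/{username or '_'}/{pk}/"
```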

save_to_log(url: str) None

Save a URL to the log.

Parameters:
url : str

URL to record.

session : AsyncSession

The niquests AsyncSession used for all HTTP calls.

should_save_child_comments : bool

Whether to recursively fetch child (reply) comments.

should_save_comments : bool

Whether to fetch comments. Subclasses or mixins flip this on.

video_urls : list[str]

List of video URLs to download.

exception instagram_archiver.client.UnexpectedRedirect

Unexpected redirect in a request.

Instagram profile scraper.

class instagram_archiver.profile_scraper.ProfileScraper(username: str, *, log_file: str | Path | None = None, output_dir: str | Path | None = None, disable_log: bool = False, browser: 'brave' | 'chrome' | 'chromium' | 'edge' | 'firefox' | 'opera' | 'safari' | 'vivaldi' = 'chrome', browser_profile: str = 'Default', child_comments: bool = False, comments: bool = False)

Scrape an Instagram profile timeline.

is_saved(url: str) bool

Check if a URL is already saved.

Parameters:
url : str

URL to check.

Returns:

False in the base implementation.

Return type:

bool

async process(ydl: AsyncYoutubeDL, *, fail: bool = False, on_cleanup: OnMessage | None = None, on_message: OnMessage | None = None, stats: Stats | None = None, yt_dlp_idle_event: asyncio.Event | None = None, yt_dlp_state: YTDLPState | None = None) None

Process posts in parallel using producer/consumer queues.

Parameters:
ydl : AsyncYoutubeDL

Configured yt-dlp wrapper.

fail : bool

Whether yt-dlp failures should abort processing.

on_cleanup : OnMessage | None

Optional callback that receives cleanup status updates.

on_message : OnMessage | None

Optional callback that receives progress text updates.

stats : Stats | None

Optional live statistics object.

yt_dlp_idle_event : asyncio.Event | None

Optional event that the video worker sets when idle.

yt_dlp_state : YTDLPState | None

Optional yt-dlp progress state shared with the video worker.

Raises:

asyncio.CancelledError – Re-raised when the producer is cancelled (typically from a termination signal).

save_to_log(url: str) None

Save a URL to the log.

Parameters:
url : str

URL to record.

Saved posts scraper.

class instagram_archiver.saved_scraper.SavedScraper(browser: BrowserName = 'chrome', browser_profile: str = 'Default', output_dir: str | Path | None = None, *, child_comments: bool = False, comments: bool = False, disable_log: bool = False, log_file: str | Path | None = None)

Scrape saved posts.

is_saved(url: str) bool

Check if a URL is already saved.

Parameters:
url : str

URL to check.

Returns:

False in the base implementation.

Return type:

bool

async process(ydl: AsyncYoutubeDL, *, fail: bool = False, on_cleanup: OnMessage | None = None, on_message: OnMessage | None = None, stats: Stats | None = None, unsave: bool = False, yt_dlp_idle_event: asyncio.Event | None = None, yt_dlp_state: YTDLPState | None = None) None

Process the saved posts in parallel using producer/consumer queues.

Parameters:
ydl : AsyncYoutubeDL

Configured yt-dlp wrapper.

fail : bool

Whether yt-dlp failures should abort processing.

on_cleanup : OnMessage | None

Optional callback that receives cleanup status updates.

on_message : OnMessage | None

Optional callback that receives progress text updates.

stats : Stats | None

Optional live statistics object.

unsave : bool

If True, unsave each post after dispatching it.

yt_dlp_idle_event : asyncio.Event | None

Optional event that the video worker sets when idle.

yt_dlp_state : YTDLPState | None

Optional yt-dlp progress state shared with the video worker.

Raises:

asyncio.CancelledError – Re-raised when the producer is cancelled (typically from a termination signal).

save_to_log(url: str) None

Save a URL to the log.

Parameters:
url : str

URL to record.

async unsave(items: Iterable[str]) None

Unsave saved posts.

Parameters:
items : Iterable[str]

Shortcodes to unsave.

Worker orchestration for asynchronous edge processing.

exception instagram_archiver.workers.WorkerAbort

Worker-level abort signal for graceful CLI handling.

async instagram_archiver.workers.comments_worker(comments_queue: asyncio.Queue[Edge | None], first_exception: list[BaseException], save_comments: collections.abc.Callable[[Edge], Awaitable[None]], stop_event: asyncio.Event, *, on_cleanup: OnMessage | None = None, on_message: OnMessage | None = None, stats: Stats | None = None) None

Save comments for posts sequentially.

Parameters:
comments_queue : asyncio.Queue[Edge | None]

Queue containing edge payloads whose comments should be saved. None is a shutdown sentinel.

first_exception : list[BaseException]

Mutable container for the first observed fatal exception.

save_comments : Callable[[Edge], Awaitable[None]]

Coroutine factory invoked once per edge to fetch comments.

stop_event : asyncio.Event

Event indicating that workers should stop.

on_cleanup : OnMessage | None

Optional callback that receives cleanup status updates.

on_message : OnMessage | None

Optional callback that receives progress text updates.

stats : Stats | None

Optional live statistics object updated after each comment thread.

async instagram_archiver.workers.image_worker(image_queue: asyncio.Queue[Edge | None], first_exception: list[BaseException], save_media: collections.abc.Callable[[Edge], Awaitable[None]], stop_event: asyncio.Event, *, on_cleanup: OnMessage | None = None, on_message: OnMessage | None = None, stats: Stats | None = None) None

Save image/post media sequentially.

Parameters:
image_queue : asyncio.Queue[Edge | None]

Queue containing edge payloads to save. None is a shutdown sentinel.

first_exception : list[BaseException]

Mutable container for the first observed fatal exception.

save_media : Callable[[Edge], Awaitable[None]]

Coroutine factory invoked once per edge to perform the download.

stop_event : asyncio.Event

Event indicating that workers should stop.

on_cleanup : OnMessage | None

Optional callback that receives cleanup status updates.

on_message : OnMessage | None

Optional callback that receives progress text updates.

stats : Stats | None

Optional live statistics object updated after each saved post.
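
The worker parameters above (a queue with a None sentinel, a first_exception list, and a stop_event) suggest a consumer loop along these lines. This is a generic sketch of that pattern, not the library's code; worker_sketch and handle are hypothetical names, and how the real workers react to errors may differ.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable
from typing import Any


async def worker_sketch(
    queue: asyncio.Queue,
    first_exception: list[BaseException],
    handle: Callable[[Any], Awaitable[None]],
    stop_event: asyncio.Event,
) -> None:
    """Consume items one at a time until the None sentinel arrives."""
    while True:
        item = await queue.get()
        try:
            if item is None:  # shutdown sentinel
                return
            if stop_event.is_set():
                continue  # drain the queue without doing more work
            await handle(item)
        except Exception as exc:
            # Record only the first failure and ask the other workers to stop.
            if not first_exception:
                first_exception.append(exc)
            stop_event.set()
        finally:
            queue.task_done()
```

Passing a list for first_exception lets several workers share one slot for the first fatal error without returning values from their tasks.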

async instagram_archiver.workers.video_worker(video_queue: asyncio.Queue[str | None], first_exception: list[BaseException], failed_urls: set[str], stop_event: asyncio.Event, *, fail: bool, idle_event: asyncio.Event | None = None, is_saved: collections.abc.Callable[[str], bool], on_cleanup: OnMessage | None = None, on_message: OnMessage | None = None, save_to_log: collections.abc.Callable[[str], None], stats: Stats | None = None, ydl: AsyncYoutubeDL, yt_dlp_state: YTDLPState | None = None) None

Process video URLs one yt-dlp download at a time.

Parameters:
video_queue : asyncio.Queue[str | None]

Queue containing video URLs. None is a shutdown sentinel.

first_exception : list[BaseException]

Mutable container for the first observed fatal exception.

failed_urls : set[str]

Set updated with URLs whose download did not produce any media.

stop_event : asyncio.Event

Event indicating that workers should stop.

fail : bool

Whether yt-dlp failures should abort processing.

idle_event : asyncio.Event | None

Optional event that is set when the worker is idle and cleared while a download is in progress.

is_saved : Callable[[str], bool]

Callback returning True if a URL has already been archived.

on_cleanup : OnMessage | None

Optional callback that receives cleanup status updates.

on_message : OnMessage | None

Optional callback that receives progress text updates.

save_to_log : Callable[[str], None]

Callback used to record a successfully downloaded URL.

stats : Stats | None

Optional live statistics object updated after each video URL.

ydl : AsyncYoutubeDL

Configured yt-dlp wrapper instance.

yt_dlp_state : YTDLPState | None

Optional yt-dlp progress state updated with the current URL and index.
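
How the is_saved, save_to_log, failed_urls, and idle_event parameters could interact for a single URL is sketched below. It assumes a download coroutine that returns True when media was produced; that return convention and the name process_url_sketch are illustrative, not the AsyncYoutubeDL API.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable


async def process_url_sketch(
    url: str,
    *,
    download: Callable[[str], Awaitable[bool]],
    is_saved: Callable[[str], bool],
    save_to_log: Callable[[str], None],
    failed_urls: set[str],
    idle_event: asyncio.Event | None = None,
) -> None:
    """Download one URL unless it is already archived."""
    if is_saved(url):
        return
    if idle_event is not None:
        idle_event.clear()  # a download is now in progress
    try:
        ok = await download(url)  # assumption: True means media was produced
    finally:
        if idle_event is not None:
            idle_event.set()  # idle again, even if the download raised
    if ok:
        save_to_log(url)
    else:
        failed_urls.add(url)
```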

SQLite-backed dedup log shared by the scrapers.

class instagram_archiver.dedup.LogDB(path: Path, *, disabled: bool = False)

SQLite-backed dedup log.

close() None

Close the underlying cursor and connection.

is_saved(url: str) bool

Check whether url has previously been recorded.

Parameters:
url : str

URL to check.

Returns:

True if the URL is in the log, False otherwise (or always when the log is disabled).

Return type:

bool

save(url: str) None

Record url in the log.

Parameters:
url : str

URL to record.
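
For readers unfamiliar with the idea, a SQLite-backed dedup log can be as small as the sketch below. The table name, schema, and class name are assumptions made for illustration; the real LogDB may store its data differently.

```python
import sqlite3


class DedupLogSketch:
    """Minimal SQLite dedup log; schema and names are illustrative only."""

    def __init__(self, path: str = ':memory:', *, disabled: bool = False) -> None:
        self.disabled = disabled
        self._conn = sqlite3.connect(path)
        # Assumption: a single table keyed by URL.
        self._conn.execute('CREATE TABLE IF NOT EXISTS log (url TEXT PRIMARY KEY)')

    def is_saved(self, url: str) -> bool:
        if self.disabled:
            return False  # a disabled log never reports a hit
        row = self._conn.execute('SELECT 1 FROM log WHERE url = ?', (url,)).fetchone()
        return row is not None

    def save(self, url: str) -> None:
        # INSERT OR IGNORE makes recording the same URL twice harmless.
        self._conn.execute('INSERT OR IGNORE INTO log (url) VALUES (?)', (url,))
        self._conn.commit()

    def close(self) -> None:
        self._conn.close()
```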

instagram_archiver.dedup.clean_url(url: str) str

Normalise a URL for dedup lookup by stripping its query string and fragment.

Parameters:
url : str

URL to normalise.

Returns:

URL with only its scheme, netloc, and path.

Return type:

str
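
The normalisation described for clean_url can be reproduced with the standard library. The function below is a sketch of that behaviour under the name clean_url_sketch, not the library's source.

```python
from urllib.parse import urlsplit, urlunsplit


def clean_url_sketch(url: str) -> str:
    """Keep only scheme, netloc, and path; drop the query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, '', ''))
```

Two links to the same post that differ only in tracking parameters then map to the same key in the dedup log.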

Constants

Constants.

instagram_archiver.constants.API_HEADERS

Headers to use for API requests.

instagram_archiver.constants.BROWSER_CHOICES

Possible browser choices to get cookies from.

instagram_archiver.constants.PAGE_FETCH_HEADERS

Headers to use for fetching HTML pages.

instagram_archiver.constants.SHARED_HEADERS

Headers to use for requests.

The Sec-CH-UA* family must agree with USER_AGENT. Instagram’s edge runs a User-Agent ↔ client-hint consistency check on the /api/v1/... endpoints and falls back to the React app shell (HTML) when they disagree. The strings here mirror a captured browser request verbatim, including the GREASE entry and version-list ordering.

instagram_archiver.constants.USER_AGENT

User agent.

Modern Chrome on Linux. Must be sent together with the matching Sec-CH-UA* client-hint headers in SHARED_HEADERS; Instagram’s edge cross-references the two and serves the React app shell (HTML) for /api/v1/media/<pk>/comments/ and similar endpoints if they disagree. The exact strings here are taken verbatim from a captured browser request that Instagram successfully routed to the JSON API.
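
A simple self-check for the agreement described above could compare the Chrome major version in the User-Agent with the versions listed in Sec-CH-UA. The helper names and the brand strings matched here are assumptions for illustration; this is not the check Instagram performs, and the example strings in the usage note are not the values shipped in the library.

```python
from __future__ import annotations

import re


def chrome_major_from_user_agent(user_agent: str) -> str | None:
    """Extract the Chrome major version from a User-Agent string, if present."""
    match = re.search(r'Chrome/(\d+)\.', user_agent)
    return match.group(1) if match else None


def hints_agree(user_agent: str, sec_ch_ua: str) -> bool:
    """Return True when a Chrome or Chromium brand entry carries the same major version."""
    major = chrome_major_from_user_agent(user_agent)
    if major is None:
        return False
    brands = re.findall(r'"([^"]+)";v="(\d+)"', sec_ch_ua)
    return any(
        brand in ('Google Chrome', 'Chromium') and version == major
        for brand, version in brands
    )
```

For example, a User-Agent containing Chrome/131.0.0.0 agrees with a Sec-CH-UA value listing "Google Chrome";v="131", and disagrees with one listing only version 120.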

Typing

Typing helpers.

instagram_archiver.typing.BrowserName

Possible browser choices to get cookies from.

alias of Literal['brave', 'chrome', 'chromium', 'edge', 'firefox', 'opera', 'safari', 'vivaldi']

instagram_archiver.typing.COMMENTS_PROCESSED

Counter key for posts whose comments have been saved successfully.

class instagram_archiver.typing.CarouselMedia
id : str

Identifier.

image_versions2 : MediaInfoItemImageVersions2

Image versions.

class instagram_archiver.typing.ChildCommentsPage

One page of replies under a top-level comment.

child_comments : Sequence[Mapping[str, Any]]

Replies returned on this page.

has_more_head_child_comments : NotRequired[bool]

Whether more replies exist forward of the current cursor.

has_more_tail_child_comments : NotRequired[bool]

Whether more replies exist behind the current cursor.

next_min_id : NotRequired[str]

Cursor for fetching the next page when paging forward.

class instagram_archiver.typing.Comments

Comments container.

can_view_more_preview_comments : bool

Whether more preview comments can be viewed.

comments : Sequence[HasID]

List of comments.

next_min_id : str

Next minimum ID for pagination.

class instagram_archiver.typing.Edge

Edge of a graph.

node : XDTMediaDict

Node at this edge.

class instagram_archiver.typing.HasID

Dictionary with an id field.

id : str

Identifier.

class instagram_archiver.typing.HighlightsTray
tray : Sequence[HighlightItem]

Highlights tray items.

instagram_archiver.typing.IMAGES_PROCESSED

Counter key for image posts that have been saved successfully.

class instagram_archiver.typing.MediaInfo

Media information.

class instagram_archiver.typing.MediaInfoItem

Media information item.

carousel_media : NotRequired[Sequence[CarouselMedia] | None]

Carousel media items.

id : str

Identifier.

image_versions2 : MediaInfoItemImageVersions2

Image versions.

taken_at : int

Timestamp when the media was taken.

user : HasID

User who posted the media.

video_dash_manifest : NotRequired[str | None]

URL of the video DASH manifest.

video_duration : float

Duration of the video in seconds.

video_versions : Sequence[MediaInfoItemVideoVersion]

Video versions.

class instagram_archiver.typing.MediaInfoItemImageVersions2Candidate
height : int

Height of the image.

url : str

URL of the image.

width : int

Width of the image.

instagram_archiver.typing.OnMessage

Callback used to report human-readable progress updates.

alias of Callable[[str], None]

instagram_archiver.typing.POSTS_HANDLED

Counter key for posts routed by the producer.

class instagram_archiver.typing.Stats

Live pipeline statistics shown in the progress spinner.

class instagram_archiver.typing.StoryReel

A reel (collection of story items belonging to a single user/highlight).

id : str

Reel identifier (numeric user ID for stories, numeric highlight ID for highlights).

user : NotRequired[HasID]

Owner of the reel.

class instagram_archiver.typing.StoryReelEdge

Edge wrapping a StoryReel.

node : StoryReel

Node at this edge.

class instagram_archiver.typing.StoryReelItem

A single story media item inside a reel.

code : NotRequired[str]

Optional shortcode of the story item.

id : str

Identifier.

image_versions2 : NotRequired[MediaInfoItemImageVersions2]

Image versions, when the item carries a still image.

media_type : NotRequired[int]

Instagram media type (1=image, 2=video, etc).

pk : str

Primary key.

taken_at : int

Timestamp when the media was taken.

user : NotRequired[HasID]

User who posted the story.

video_dash_manifest : NotRequired[str | None]

Video DASH manifest URL, if available.

video_versions : NotRequired[Sequence[MediaInfoItemVideoVersion]]

Video versions, if the item is a video.

class instagram_archiver.typing.UserInfo

User information.

edge_owner_to_timeline_media : EdgeOwnerToTimelineMedia

Timeline media edge.

id : str

User ID.

profile_pic_url_hd : str

Profile picture URL.

instagram_archiver.typing.VIDEOS_PROCESSED

Counter key for video URLs handed to yt-dlp successfully.

class instagram_archiver.typing.WebProfileInfo

Profile information container.

data : NotRequired[WebProfileInfoData]

Profile data.

class instagram_archiver.typing.WebProfileInfoData
user : UserInfo

User information.

class instagram_archiver.typing.XDTAPIV1FeedUserTimelineGraphQLConnection
edges : Sequence[Edge]

Edges of the graph.

page_info : PageInfo

Pagination information.

class instagram_archiver.typing.XDTAPIV1FeedUserTimelineGraphQLConnectionContainer

Container for XDTAPIV1FeedUserTimelineGraphQLConnection.

xdt_api__v1__feed__user_timeline_graphql_connection : XDTAPIV1FeedUserTimelineGraphQLConnection

User timeline data.

class instagram_archiver.typing.XDTMediaDict
code : str

Short code.

id : str

Media ID.

owner : Owner

Owner information.

pk : str

Primary key. Also carousel ID.

video_dash_manifest : NotRequired[str | None]

Video DASH manifest URL, if available.

class instagram_archiver.typing.XDTStoriesV3ReelPageGalleryConnection

Connection for the PolarisStoriesV3 reel page gallery query.

edges : Sequence[StoryReelEdge]

Edges of the connection.

page_info : PageInfo

Pagination information.

class instagram_archiver.typing.XDTStoriesV3ReelPageGalleryQueryResponse

Container for XDTStoriesV3ReelPageGalleryConnection.

xdt_api__v1__feed__reels_media : XDTStoriesV3ReelPageGalleryConnection

Reels media connection payload.

class instagram_archiver.typing.YTDLPState(current_index: int = 0, current_url: str | None = None, total_urls: int = 0)

Mutable yt-dlp progress state shared between the producer and the yt-dlp worker.

current_index : int = 0

1-based index of the URL currently being processed.

current_url : str | None = None

URL yt-dlp is currently downloading, or None when idle.

render() str | None

Build the YT_DLP_STATUS value from the current state.

Returns:

Rendered status string, or None when no URL is active.

Return type:

str | None

total_urls : int = 0

Running total of URLs enqueued for the yt-dlp worker.
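
A shared progress object of this shape can be modelled with a dataclass, as sketched below. The class name and the rendered text format are assumptions; only the three fields and the "None when no URL is active" rule come from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ProgressStateSketch:
    """Illustrative stand-in for a mutable yt-dlp progress state."""

    current_index: int = 0
    current_url: str | None = None
    total_urls: int = 0

    def render(self) -> str | None:
        if self.current_url is None:
            return None  # nothing is being downloaded
        # Assumption: the real status line may be formatted differently.
        return f'[{self.current_index}/{self.total_urls}] {self.current_url}'
```

Because the object is mutable, the producer can bump total_urls while the worker updates current_index and current_url, and a display can call render() at any time.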

instagram_archiver.typing.YT_DLP_STATUS

Status-line key for the current yt-dlp URL.

Utilities

Utility functions.

class instagram_archiver.utils.JSONFormattedString(formatted: str, original: Any)

Contains a formatted version of the JSON str and the original value.

formatted

Formatted JSON string.

original_value

Original value.

exception instagram_archiver.utils.UnknownMimetypeError

Raised when an unknown mimetype is encountered in get_extension().

instagram_archiver.utils.dump_json(target: Path | str, obj: Any, *, mode: str = 'w') None

Dump obj to target as sorted, indented JSON.

Parameters:
target : Path | str

File path to write to.

obj : Any

Object to serialise.

mode : str

File open mode (typically 'w' or 'w+').
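
Writing "sorted, indented JSON" with the standard library looks roughly like this. The indent width of 2 and the UTF-8 encoding are assumptions, and dump_json_sketch is a stand-in name rather than the library function.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def dump_json_sketch(target: Path | str, obj: Any, *, mode: str = 'w') -> None:
    """Write obj to target as JSON with sorted keys and indentation."""
    with Path(target).open(mode, encoding='utf-8') as f:
        # sort_keys gives stable output, which keeps diffs between runs small.
        json.dump(obj, f, sort_keys=True, indent=2)
```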

instagram_archiver.utils.get_extension(mimetype: str) str

Get the appropriate extension for a mimetype.

Parameters:
mimetype : str

Mimetype to be converted.

Returns:

File extension without the leading dot.

Return type:

str

Raises:

UnknownMimetypeError – If the mimetype is not recognised.

instagram_archiver.utils.json_dumps_formatted(obj: Any) JSONFormattedString

Return a special object with the formatted version of the JSON str and the original.

Parameters:
obj : Any

The object to be formatted.

Returns:

Formatted JSON text together with the original value.

Return type:

JSONFormattedString

instagram_archiver.utils.write_bytes(target: Path | str, content: bytes) None

Write bytes to a file.

Parameters:
target : Path | str

File path to write to.

content : bytes

Bytes to write.

instagram_archiver.utils.write_failed_urls(target: Path | str, urls: Iterable[str]) None

Write a newline-separated list of URLs to target.

Parameters:
target : Path | str

File path to write to.

urls : Iterable[str]

URLs to write, one per line.

instagram_archiver.utils.write_if_new(target: Path | str, content: str | bytes, mode: str = 'w') None

Write a file only if it will be a new file.
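
The "only if new" behaviour can be sketched as follows. Unlike the documented function, this illustrative version returns a bool so the caller can see whether anything was written; write_if_new_sketch is a hypothetical name.

```python
from __future__ import annotations

from pathlib import Path


def write_if_new_sketch(target: Path | str, content: str | bytes, mode: str = 'w') -> bool:
    """Write content to target unless the file already exists."""
    path = Path(target)
    if path.exists():
        return False  # leave the existing file untouched
    # The caller picks the mode: 'w' for str content, 'wb' for bytes.
    with path.open(mode) as f:
        f.write(content)
    return True
```

Opening with mode 'x' instead would make the existence check and the creation a single atomic step, at the cost of raising FileExistsError rather than returning quietly.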
