Instagram Archiver¶
Commands¶
instagram-archiver¶
Archive a profile (USERNAME) or your saved posts (--saved).
Pass exactly one of: a USERNAME positional argument, or --saved/-s.
Usage
instagram-archiver [OPTIONS] [USERNAME]
Options
- -o, --output-dir <output_dir>¶
Output directory. Defaults to the username (profile mode) or . (saved mode).
- -b, --browser <browser>¶
Browser to read cookies from.
- Options:¶
brave | chrome | chromium | edge | opera | vivaldi | firefox | safari
- -p, --profile <profile>¶
Browser profile.
- -d, --debug¶
Enable debug output.
- -q, --quiet¶
Disable progress display updates.
- -S, --sleep-time <sleep_time>¶
Number of seconds yt-dlp waits between requests.
- --no-log¶
Ignore log (re-fetch everything).
- -C, --include-comments¶
Also download all comments (extends download time significantly).
- -R, --include-child-comments¶
Also recursively download child (reply) comments. Implies --include-comments.
- -s, --saved¶
Archive your saved posts instead of a profile (mutually exclusive with USERNAME).
- -u, --unsave¶
Unsave posts after successful archive (only with --saved).
Arguments
- USERNAME¶
Optional argument
instagram-archiver -o ~/instagram-backups/username username
instagram-archiver --saved -o ~/instagram-backups/saved
In profile mode the default output path is the username under the current working directory; in
--saved mode it is the current working directory.
Videos are downloaded with yt-dlp, which applies its own configuration.
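The rule that exactly one of USERNAME or --saved must be passed can be sketched as a small wrapper that assembles the argument list. `build_command` is a hypothetical helper, not part of instagram-archiver; it only uses the `-o` and `--saved` options listed above.

```python
from __future__ import annotations

import shlex


def build_command(username: str | None = None, *, saved: bool = False,
                  output_dir: str | None = None) -> list[str]:
    """Build an instagram-archiver argument list (hypothetical helper)."""
    # Exactly one of USERNAME or --saved must be supplied.
    if (username is None) == (not saved):
        raise ValueError('Pass exactly one of username or saved=True.')
    command = ['instagram-archiver']
    if saved:
        command.append('--saved')
    if output_dir is not None:
        command += ['-o', output_dir]
    if username is not None:
        command.append(username)
    return command


# Print a shell-quoted command line for profile mode.
print(shlex.join(build_command('username', output_dir='backups/username')))
```

The resulting list can be handed to `subprocess.run` as-is, which avoids shell quoting issues in output paths.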
Library¶
Generic client.
- exception instagram_archiver.client.CSRFTokenNotFound¶
CSRF token not found in cookies.
- class instagram_archiver.client.InstagramClient(browser: BrowserName = 'chrome', browser_profile: str = 'Default')¶
Generic asynchronous client for Instagram.
- add_csrf_token_header() None¶
Add CSRF token header to the session.
- Raises:¶
CSRFTokenNotFound – If the CSRF token is not found in the cookies.
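As a rough illustration of the behaviour described for add_csrf_token_header(), the sketch below copies a token from a cookie mapping into a header mapping. The cookie name `csrftoken`, the header name `X-CSRFToken`, and the plain-dict interface are assumptions made for this example; the library's method works on its own session object.

```python
from __future__ import annotations


class CSRFTokenNotFound(Exception):
    """Stand-in for the library exception of the same name."""


def add_csrf_token_header(cookies: dict[str, str], headers: dict[str, str]) -> None:
    """Copy the CSRF token from the cookies into the request headers."""
    # Assumed cookie name; a missing or empty value is treated as "not found".
    token = cookies.get('csrftoken')
    if not token:
        raise CSRFTokenNotFound('CSRF token not found in cookies.')
    # Assumed header name.
    headers['X-CSRFToken'] = token


headers: dict[str, str] = {}
add_csrf_token_header({'csrftoken': 'abc123'}, headers)
print(headers)
```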
- async dispatch_edges(edges: Iterable[Edge], image_queue: asyncio.Queue[Edge | None], comments_queue: asyncio.Queue[Edge | None], video_queue: asyncio.Queue[str | None], *, parent_edge: Edge | None = None, stats: Stats | None = None, yt_dlp_state: YTDLPState | None = None) None¶
Dispatch edges to the appropriate worker queue.
- Parameters:¶
- edges : Iterable[Edge]¶
Edges to dispatch.
- image_queue : asyncio.Queue[Edge | None]¶
Queue receiving non-video edges.
- comments_queue : asyncio.Queue[Edge | None]¶
Queue receiving edges whose comments should also be saved.
- video_queue : asyncio.Queue[str | None]¶
Queue receiving video URLs.
- parent_edge : Edge | None¶
Optional parent edge used as a fallback for the shortcode lookup.
- stats : Stats | None¶
Optional live statistics object whose POSTS_HANDLED counter is incremented for every dispatched edge.
- yt_dlp_state : YTDLPState | None¶
Optional yt-dlp progress state whose total_urls counter is incremented for every URL routed to the video worker.
- async get_json(url: str, *, cast_to: type[T], headers: Mapping[str, str] | None = None, params: Mapping[str, str] | None = None) T¶
Get JSON data from a URL.
- Parameters:¶
- url : str¶
URL to fetch.
- cast_to : type[T]¶
Expected type of the decoded JSON body.
- headers : Mapping[str, str] | None¶
Optional per-call headers. When None (the default), API_HEADERS is used. Passing an explicit dict (typically API_HEADERS plus a Referer) lets callers like save_comments() mirror the per-post Referer the browser sends.
- params : Mapping[str, str] | None¶
Optional query string parameters.
- Returns:¶
Response body decoded from JSON.
- Return type:¶
T
- async graphql_query(variables: Mapping[str, Any], *, cast_to: type[T], doc_id: str = '9806959572732215') T | None¶
Make a GraphQL query.
- async highlights_tray(user_id: int | str) HighlightsTray¶
Get the highlights tray data for a user.
- async reel_page_gallery(reel_ids: collections.abc.Sequence[str], *, after: str | None = None, first: int = 5, initial_reel_id: str | None = None, is_highlight: bool = True) XDTStoriesV3ReelPageGalleryConnection | None¶
Fetch a page of the PolarisStoriesV3 reel gallery.
Used to retrieve full story metadata (image and video items) for the supplied reels. The first page uses the ReelPageGalleryQuery document; subsequent pages (when after is supplied) use the pagination document.
- Parameters:¶
- reel_ids : Sequence[str]¶
Numeric reel identifiers (user IDs for current stories or numeric highlight IDs).
- after : str | None¶
Cursor returned by a previous page, or None for the first page.
- first : int¶
Maximum number of reels to return per page.
- initial_reel_id : str | None¶
Reel that the user “opened first”. Defaults to the first entry of reel_ids.
- is_highlight : bool¶
True when reel_ids refer to highlights, False for current stories.
- Returns:¶
The connection payload, or None when the request fails or the response shape is unexpected.
- Return type:¶
XDTStoriesV3ReelPageGalleryConnection | None
- async save_comments(edge: Edge) None¶
Save comments for an edge node.
When should_save_child_comments is True, replies are also fetched for every top-level comment that reports having any (child_comment_count > 0) and embedded back into the saved JSON under the parent’s child_comments key.
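The reply-embedding step described above can be sketched as follows. The comment dictionaries, the `pk` field, and the `fetch_replies` callback are hypothetical stand-ins for illustration; only the `child_comment_count` and `child_comments` keys come from the description.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable

Comment = dict[str, Any]


async def embed_replies(
        comments: list[Comment],
        fetch_replies: Callable[[Comment], Awaitable[list[Comment]]]) -> list[Comment]:
    """Attach replies to every top-level comment that reports having any."""
    for comment in comments:
        if comment.get('child_comment_count', 0) > 0:
            # Replies are stored under the parent's child_comments key.
            comment['child_comments'] = await fetch_replies(comment)
    return comments


async def fake_fetch_replies(comment: Comment) -> list[Comment]:
    """Hypothetical fetcher; a real one would page through an HTTP API."""
    return [{'pk': f"{comment['pk']}-reply", 'text': 'reply'}]


async def main() -> list[Comment]:
    comments = [{'pk': '1', 'child_comment_count': 1},
                {'pk': '2', 'child_comment_count': 0}]
    return await embed_replies(comments, fake_fetch_replies)


print(asyncio.run(main()))
```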
- async save_edges(edges: Iterable[Edge], parent_edge: Edge | None = None) None¶
Save edge node media.
- async save_image_versions2(sub_item: CarouselMedia | MediaInfoItem | StoryReelItem, timestamp: int) None¶
Save images in the image_versions2 dictionary.
- Parameters:¶
- sub_item : CarouselMedia | MediaInfoItem | StoryReelItem¶
Source item containing image_versions2 candidates.
- timestamp : int¶
Timestamp to apply to the saved file.
- async save_media(edge: Edge) None¶
Save media for an edge node.
- Parameters:¶
- Raises:¶
UnexpectedRedirect – If a redirect occurs unexpectedly.
- async save_reel_item(item: StoryReelItem, video_queue: asyncio.Queue[str | None] | None = None, *, username: str | None = None, yt_dlp_state: YTDLPState | None = None) None¶
Save a single story item.
Image-only items are written via save_image_versions2(); items with a video are routed to video_queue for the yt-dlp worker (or appended to video_urls when no queue is supplied, mirroring the synchronous helper used elsewhere).
- Parameters:¶
- item : StoryReelItem¶
Story item payload from a reel page gallery response.
- video_queue : asyncio.Queue[str | None] | None¶
Optional queue receiving permalinks for the yt-dlp worker. When None, video URLs are appended to video_urls instead.
- username : str | None¶
Username of the reel owner. Used to build the stories/{username}/{pk}/ permalink for video items. Falls back to the literal "_" when not available, which yt-dlp still accepts because it identifies the story by pk.
- yt_dlp_state : YTDLPState | None¶
Optional yt-dlp progress state whose total_urls counter is incremented when a video URL is enqueued.
- session : AsyncSession¶
The niquests AsyncSession used for all HTTP calls.
- exception instagram_archiver.client.UnexpectedRedirect¶
Unexpected redirect in a request.
Instagram profile scraper.
- class instagram_archiver.profile_scraper.ProfileScraper(username: str, *, log_file: str | Path | None = None, output_dir: str | Path | None = None, disable_log: bool = False, browser: 'brave' | 'chrome' | 'chromium' | 'edge' | 'firefox' | 'opera' | 'safari' | 'vivaldi' = 'chrome', browser_profile: str = 'Default', child_comments: bool = False, comments: bool = False)¶
Scrape an Instagram profile timeline.
- async process(ydl: AsyncYoutubeDL, *, fail: bool = False, on_cleanup: OnMessage | None = None, on_message: OnMessage | None = None, stats: Stats | None = None, yt_dlp_idle_event: asyncio.Event | None = None, yt_dlp_state: YTDLPState | None = None) None¶
Process posts in parallel using producer/consumer queues.
- Parameters:¶
- ydl : AsyncYoutubeDL¶
Configured yt-dlp wrapper.
- fail : bool¶
Whether yt-dlp failures should abort processing.
- on_cleanup : OnMessage | None¶
Optional callback that receives cleanup status updates.
- on_message : OnMessage | None¶
Optional callback that receives progress text updates.
- stats : Stats | None¶
Optional live statistics object.
- yt_dlp_idle_event : asyncio.Event | None¶
Optional event that the video worker sets when idle.
- yt_dlp_state : YTDLPState | None¶
Optional yt-dlp progress state shared with the video worker.
- Raises:¶
asyncio.CancelledError – Re-raised when the producer is cancelled (typically from a termination signal).
Saved posts scraper.
- class instagram_archiver.saved_scraper.SavedScraper(browser: BrowserName = 'chrome', browser_profile: str = 'Default', output_dir: str | Path | None = None, *, child_comments: bool = False, comments: bool = False, disable_log: bool = False, log_file: str | Path | None = None)¶
Scrape saved posts.
- async process(ydl: AsyncYoutubeDL, *, fail: bool = False, on_cleanup: OnMessage | None = None, on_message: OnMessage | None = None, stats: Stats | None = None, unsave: bool = False, yt_dlp_idle_event: asyncio.Event | None = None, yt_dlp_state: YTDLPState | None = None) None¶
Process the saved posts in parallel using producer/consumer queues.
- Parameters:¶
- ydl : AsyncYoutubeDL¶
Configured yt-dlp wrapper.
- fail : bool¶
Whether yt-dlp failures should abort processing.
- on_cleanup : OnMessage | None¶
Optional callback that receives cleanup status updates.
- on_message : OnMessage | None¶
Optional callback that receives progress text updates.
- stats : Stats | None¶
Optional live statistics object.
- unsave : bool¶
If True, unsave each post after dispatching it.
- yt_dlp_idle_event : asyncio.Event | None¶
Optional event that the video worker sets when idle.
- yt_dlp_state : YTDLPState | None¶
Optional yt-dlp progress state shared with the video worker.
- Raises:¶
asyncio.CancelledError – Re-raised when the producer is cancelled (typically from a termination signal).
Worker orchestration for asynchronous edge processing.
- exception instagram_archiver.workers.WorkerAbort¶
Worker-level abort signal for graceful CLI handling.
- async instagram_archiver.workers.comments_worker(comments_queue: asyncio.Queue[Edge | None], first_exception: list[BaseException], save_comments: collections.abc.Callable[[Edge], Awaitable[None]], stop_event: asyncio.Event, *, on_cleanup: OnMessage | None = None, on_message: OnMessage | None = None, stats: Stats | None = None) None¶
Save comments for posts sequentially.
- Parameters:¶
- comments_queue : asyncio.Queue[Edge | None]¶
Queue containing edge payloads whose comments should be saved. None is a shutdown sentinel.
- first_exception : list[BaseException]¶
Mutable container for the first observed fatal exception.
- save_comments : Callable[[Edge], Awaitable[None]]¶
Coroutine factory invoked once per edge to fetch comments.
- stop_event : asyncio.Event¶
Event indicating that workers should stop.
- on_cleanup : OnMessage | None¶
Optional callback that receives cleanup status updates.
- on_message : OnMessage | None¶
Optional callback that receives progress text updates.
- stats : Stats | None¶
Optional live statistics object updated after each comment thread.
- async instagram_archiver.workers.image_worker(image_queue: asyncio.Queue[Edge | None], first_exception: list[BaseException], save_media: collections.abc.Callable[[Edge], Awaitable[None]], stop_event: asyncio.Event, *, on_cleanup: OnMessage | None = None, on_message: OnMessage | None = None, stats: Stats | None = None) None¶
Save image/post media sequentially.
- Parameters:¶
- image_queue : asyncio.Queue[Edge | None]¶
Queue containing edge payloads to save. None is a shutdown sentinel.
- first_exception : list[BaseException]¶
Mutable container for the first observed fatal exception.
- save_media : Callable[[Edge], Awaitable[None]]¶
Coroutine factory invoked once per edge to perform the download.
- stop_event : asyncio.Event¶
Event indicating that workers should stop.
- on_cleanup : OnMessage | None¶
Optional callback that receives cleanup status updates.
- on_message : OnMessage | None¶
Optional callback that receives progress text updates.
- stats : Stats | None¶
Optional live statistics object updated after each saved post.
- async instagram_archiver.workers.video_worker(video_queue: asyncio.Queue[str | None], first_exception: list[BaseException], failed_urls: set[str], stop_event: asyncio.Event, *, fail: bool, idle_event: asyncio.Event | None = None, is_saved: collections.abc.Callable[[str], bool], on_cleanup: OnMessage | None = None, on_message: OnMessage | None = None, save_to_log: collections.abc.Callable[[str], None], stats: Stats | None = None, ydl: AsyncYoutubeDL, yt_dlp_state: YTDLPState | None = None) None¶
Process video URLs one yt-dlp download at a time.
- Parameters:¶
- video_queue : asyncio.Queue[str | None]¶
Queue containing video URLs. None is a shutdown sentinel.
- first_exception : list[BaseException]¶
Mutable container for the first observed fatal exception.
- failed_urls : set[str]¶
Set updated with URLs whose download did not produce any media.
- stop_event : asyncio.Event¶
Event indicating that workers should stop.
- fail : bool¶
Whether yt-dlp failures should abort processing.
- idle_event : asyncio.Event | None¶
Optional event that is set when the worker is idle and cleared while a download is in progress.
- is_saved : Callable[[str], bool]¶
Callback returning True if a URL has already been archived.
- on_cleanup : OnMessage | None¶
Optional callback that receives cleanup status updates.
- on_message : OnMessage | None¶
Optional callback that receives progress text updates.
- save_to_log : Callable[[str], None]¶
Callback used to record a successfully downloaded URL.
- stats : Stats | None¶
Optional live statistics object updated after each video URL.
- ydl : AsyncYoutubeDL¶
Configured yt-dlp wrapper instance.
- yt_dlp_state : YTDLPState | None¶
Optional yt-dlp progress state updated with the current URL and index.
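The worker conventions described above (a None shutdown sentinel, a stop event, and an idle event that is set while waiting and cleared while working) can be sketched with plain asyncio. This is a simplified illustration under those assumptions, not the library's implementation: the `worker` function and the `handled` list are hypothetical, and appending to the list stands in for a yt-dlp download.

```python
from __future__ import annotations

import asyncio


async def worker(queue: asyncio.Queue[str | None], handled: list[str],
                 stop_event: asyncio.Event, idle_event: asyncio.Event) -> None:
    """Consume URLs one at a time until the None sentinel arrives."""
    while not stop_event.is_set():
        idle_event.set()          # Nothing in progress while waiting.
        url = await queue.get()
        if url is None:           # Shutdown sentinel.
            break
        idle_event.clear()        # A "download" is now in progress.
        handled.append(url)       # Stand-in for the real download.
        await asyncio.sleep(0)    # Yield control, as real I/O would.
    idle_event.set()


async def main() -> tuple[list[str], bool]:
    queue: asyncio.Queue[str | None] = asyncio.Queue()
    handled: list[str] = []
    stop_event = asyncio.Event()
    idle_event = asyncio.Event()
    task = asyncio.create_task(worker(queue, handled, stop_event, idle_event))
    # Producer side: enqueue work, then the sentinel.
    for url in ('https://example.com/a', 'https://example.com/b'):
        await queue.put(url)
    await queue.put(None)
    await task
    return handled, idle_event.is_set()


print(asyncio.run(main()))
```

Processing one item per loop iteration keeps downloads strictly sequential, which matches the "one yt-dlp download at a time" behaviour described for the video worker.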
SQLite-backed dedup log shared by the scrapers.
- class instagram_archiver.dedup.LogDB(path: Path, *, disabled: bool = False)¶
SQLite-backed dedup log.
- instagram_archiver.dedup.clean_url(url: str) str¶
Normalise a URL for dedup lookup by stripping its query string and fragment.
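A minimal sketch of this kind of normalisation, using only the standard library. It shows one way to strip the query string and fragment and is not necessarily how the library's clean_url() is written.

```python
from urllib.parse import urlsplit, urlunsplit


def clean_url(url: str) -> str:
    """Return the URL without its query string and fragment."""
    parts = urlsplit(url)
    # Keep scheme, host and path; blank out query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, '', ''))


print(clean_url('https://www.instagram.com/p/ABC123/?img_index=2#top'))
```

Normalising before lookup means two links to the same post that differ only in tracking parameters map to a single dedup entry.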
Constants¶
Constants.
- instagram_archiver.constants.API_HEADERS¶
Headers to use for API requests.
- instagram_archiver.constants.BROWSER_CHOICES¶
Possible browser choices to get cookies from.
- instagram_archiver.constants.PAGE_FETCH_HEADERS¶
Headers to use for fetching HTML pages.
- instagram_archiver.constants.SHARED_HEADERS¶
Headers to use for requests.
The Sec-CH-UA* family must agree with USER_AGENT. Instagram’s edge runs a User-Agent ↔ client-hint consistency check on the /api/v1/... endpoints and falls back to the React app shell (HTML) when they disagree. The strings here mirror a captured browser request verbatim, including the GREASE entry and version-list ordering.
- instagram_archiver.constants.USER_AGENT¶
User agent.
Modern Chrome on Linux. Must be sent together with the matching Sec-CH-UA* client-hint headers in SHARED_HEADERS; Instagram’s edge cross-references the two and serves the React app shell (HTML) for /api/v1/media/<pk>/comments/ and similar endpoints if they disagree. The exact strings here are taken verbatim from a captured browser request that Instagram successfully routed to the JSON API.
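To make the consistency requirement concrete, the sketch below checks that the Chrome major version in a User-Agent matches the Chromium entry in Sec-CH-UA. The header values are illustrative placeholders, not the strings shipped in instagram_archiver.constants, and `client_hints_agree` is a hypothetical helper; the exact checks Instagram performs are not documented here.

```python
from __future__ import annotations

import re

# Illustrative values only; the library ships its own captured strings.
USER_AGENT = ('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36')
HEADERS = {
    'User-Agent': USER_AGENT,
    'Sec-CH-UA': '"Google Chrome";v="131", "Chromium";v="131", "Not_A Brand";v="24"',
    'Sec-CH-UA-Platform': '"Linux"',
}


def client_hints_agree(headers: dict[str, str]) -> bool:
    """Check that the UA's Chrome major version matches the Sec-CH-UA brand list."""
    match = re.search(r'Chrome/(\d+)\.', headers.get('User-Agent', ''))
    if match is None:
        return False
    # Parse entries of the form "Brand";v="123" into a brand -> version mapping.
    brands = dict(re.findall(r'"([^"]+)";v="(\d+)"', headers.get('Sec-CH-UA', '')))
    return brands.get('Chromium') == match.group(1)


print(client_hints_agree(HEADERS))
```

A check like this is useful as a unit test whenever the User-Agent string is bumped, so the two sets of headers cannot drift apart unnoticed.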
Typing¶
Typing helpers.
- instagram_archiver.typing.BrowserName¶
Possible browser choices to get cookies from.
alias of Literal['brave', 'chrome', 'chromium', 'edge', 'firefox', 'opera', 'safari', 'vivaldi']
- instagram_archiver.typing.COMMENTS_PROCESSED¶
Counter key for posts whose comments have been saved successfully.
- class instagram_archiver.typing.CarouselMedia¶
- image_versions2 : MediaInfoItemImageVersions2¶
Image versions.
- class instagram_archiver.typing.ChildCommentsPage¶
One page of replies under a top-level comment.
- has_more_head_child_comments : NotRequired[bool]¶
Whether more replies exist forward of the current cursor.
- class instagram_archiver.typing.Comments¶
Comments container.
- class instagram_archiver.typing.Edge¶
Edge of a graph.
- node : XDTMediaDict¶
Node at this edge.
- class instagram_archiver.typing.HighlightsTray¶
- tray : Sequence[HighlightItem]¶
Highlights tray items.
- instagram_archiver.typing.IMAGES_PROCESSED¶
Counter key for image posts that have been saved successfully.
- class instagram_archiver.typing.MediaInfo¶
Media information.
- class instagram_archiver.typing.MediaInfoItem¶
Media information item.
- carousel_media : NotRequired[Sequence[CarouselMedia] | None]¶
Carousel media items.
- image_versions2 : MediaInfoItemImageVersions2¶
Image versions.
- video_versions : Sequence[MediaInfoItemVideoVersion]¶
Video versions.
- class instagram_archiver.typing.MediaInfoItemImageVersions2Candidate¶
- instagram_archiver.typing.OnMessage¶
Callback used to report human-readable progress updates.
- instagram_archiver.typing.POSTS_HANDLED¶
Counter key for posts routed by the producer.
- class instagram_archiver.typing.Stats¶
Live pipeline statistics shown in the progress spinner.
- class instagram_archiver.typing.StoryReel¶
A reel (collection of story items belonging to a single user/highlight).
- class instagram_archiver.typing.StoryReelItem¶
A single story media item inside a reel.
- image_versions2 : NotRequired[MediaInfoItemImageVersions2]¶
Image versions, when the item carries a still image.
- video_versions : NotRequired[Sequence[MediaInfoItemVideoVersion]]¶
Video versions, if the item is a video.
- class instagram_archiver.typing.UserInfo¶
User information.
- edge_owner_to_timeline_media : EdgeOwnerToTimelineMedia¶
Timeline media edge.
- instagram_archiver.typing.VIDEOS_PROCESSED¶
Counter key for video URLs handed to yt-dlp successfully.
- class instagram_archiver.typing.WebProfileInfo¶
Profile information container.
- data : NotRequired[WebProfileInfoData]¶
Profile data.
- class instagram_archiver.typing.XDTAPIV1FeedUserTimelineGraphQLConnection¶
- page_info : PageInfo¶
Pagination information.
- class instagram_archiver.typing.XDTAPIV1FeedUserTimelineGraphQLConnectionContainer¶
Container for XDTAPIV1FeedUserTimelineGraphQLConnection.
- xdt_api__v1__feed__user_timeline_graphql_connection : XDTAPIV1FeedUserTimelineGraphQLConnection¶
User timeline data.
- class instagram_archiver.typing.XDTMediaDict¶
- owner : Owner¶
Owner information.
- video_dash_manifest : NotRequired[str | None]¶
Video dash manifest URL, if available.
- class instagram_archiver.typing.XDTStoriesV3ReelPageGalleryConnection¶
Connection for the PolarisStoriesV3 reel page gallery query.
- edges : Sequence[StoryReelEdge]¶
Edges of the connection.
- page_info : PageInfo¶
Pagination information.
- class instagram_archiver.typing.XDTStoriesV3ReelPageGalleryQueryResponse¶
Container for XDTStoriesV3ReelPageGalleryConnection.
- xdt_api__v1__feed__reels_media : XDTStoriesV3ReelPageGalleryConnection¶
Reels media connection payload.
- class instagram_archiver.typing.YTDLPState(current_index: int = 0, current_url: str | None = None, total_urls: int = 0)¶
Mutable yt-dlp progress state shared between the producer and the yt-dlp worker.
- render() str | None¶
Build the YT_DLP_STATUS value from the current state.
- instagram_archiver.typing.YT_DLP_STATUS¶
Status-line key for the current yt-dlp URL.
Utilities¶
Utility functions.
- class instagram_archiver.utils.JSONFormattedString(formatted: str, original: Any)¶
Contains a formatted version of the JSON str and the original value.
- formatted¶
Formatted JSON string.
- original_value¶
Original value.
- exception instagram_archiver.utils.UnknownMimetypeError¶
Raised when an unknown mimetype is encountered in get_extension().
- instagram_archiver.utils.dump_json(target: Path | str, obj: Any, *, mode: str = 'w') None¶
Dump obj to target as sorted, indented JSON.
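A minimal sketch of writing sorted, indented JSON with the standard library. The two-space indent and UTF-8 encoding are assumptions for this example; the library's dump_json() may use different settings.

```python
import json
from pathlib import Path
from typing import Any, Union


def dump_json(target: Union[Path, str], obj: Any, *, mode: str = 'w') -> None:
    """Write obj to target as JSON with sorted keys and indentation."""
    with Path(target).open(mode, encoding='utf-8') as f:
        # sort_keys gives stable output, which keeps diffs between runs small.
        json.dump(obj, f, sort_keys=True, indent=2)
```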
- instagram_archiver.utils.get_extension(mimetype: str) str¶
Get the appropriate extension for a mimetype.
- instagram_archiver.utils.json_dumps_formatted(obj: Any) JSONFormattedString¶
Return a special object with the formatted version of the JSON str and the original.
- instagram_archiver.utils.write_bytes(target: Path | str, content: bytes) None¶
Write bytes to a file.
- instagram_archiver.utils.write_failed_urls(target: Path | str, urls: Iterable[str]) None¶
Write a newline-separated list of URLs to target.
- instagram_archiver.utils.write_if_new(target: Path | str, content: str | bytes, mode: str = 'w') None¶
Write a file only if it will be a new file.
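The write-only-if-new behaviour can be sketched as an existence check before opening the file. This is an illustration of the idea rather than the library's code; in this sketch the caller is assumed to pass a binary mode such as 'wb' when content is bytes.

```python
from pathlib import Path
from typing import Union


def write_if_new(target: Union[Path, str], content: Union[str, bytes],
                 mode: str = 'w') -> None:
    """Write content to target unless the file already exists."""
    path = Path(target)
    if path.exists():
        # Leave existing files untouched so re-runs do not overwrite archives.
        return
    with path.open(mode) as f:
        f.write(content)
```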