The Duck Hunters Guide - Blog #7 - Duck AI Chat (Android)

This week I am going to write about Duck AI in the Android version of the DuckDuckGo web browser. This is a feature in the DuckDuckGo Browser that allows users to interact anonymously with third-party AI chat models.

As of this writing users can access the following AI Models

GPT-4o Mini - OpenAI
Llama 3.3 - Meta
Claude 3 Haiku - Anthropic
o3-Mini - OpenAI
Mistral Small 3 - Mistral AI

To anonymize users, the traffic going to these models goes through the DuckDuckGo servers which removes identifiable information such as IP address before forwarding the prompt to the LLM. Furthermore, the DuckDuckGo servers do not store, or record chats and the conversations are not used to train the models.

Link: Duck AI Privacy Info

Even though the DuckDuckGo servers do not store the chats, the browser does store the last 30 conversations locally (Not just DuckDuckGo Browser). This local save option can be disabled and the conversations will be cleared each time the browsing data is cleared.

The question becomes, where are these chats stored?

As the chats are stored locally for different types of browsers, and the chats can persist after clearing data, the only viable location that is common amongst many Android web browsers is the WebView Local Storage LevelDB.

WebView Local Storage LevelDB location: data\data\com.duckduckgo.mobile.android\app_webview\Default\Local Storage\leveldb\

If you haven't look at this LevelDB before it is a very cool artifact where you can find all kinds of web related activity including form data, conversation history even the contents of the users shopping cart (for some sites). For our purposes we are talking specifically about the chats associated with Duck AI.

I used LevelDB parsers from CCL Chromium Reader to extract the log and LDB files and then wrote an add-on script to output the key value pairs into an SQLite database so I could query it.

Download Links:

CCL Chromium Reader

LevelDB to SQLite

There are multiple key pairs of interest in this LevelDB

The _https://duckduckgo.comisRecentChatsOn key determines whether the Duck Chat is saved. The value for this key will either be '1' or '0'. 1 meaning Yes and 0 meaning No.

The _https:/duckduckgo.comprefferedDuckaiModel key indicates what AI Model the user last used. It the time of writing this I have yet to determine what value is associated with each model.

The _https://duckduckgo.comsavedAIChats key is the main key of interest. This is where you will find a json containing the last 7 days of chat history.

You will notice from my screenshot that I have multiple versions of each key. In LevelDB's, key modifications are inserts into the database, and with the way this database format works, old version of keys may still be present. Ultimately, they will get removed through an LDB compaction event, but you could potentially recover more than 7 days of chat history.

Now let's dig into the json file that stores all the chat history. Naturally when I first started researching this in April 2025 the version of the JSON was 0.3, but in the app update at the beginning of May 2025 the version is now 0.5 and changed the structure, specifically the array where the messages are stored. I will talk about both versions.

Version 0.3 JSON

Starting at the top level there is a "chats" array and a version key. The version key is important because as I mentioned there is a difference in the structure between version 0.3 and 0.5.

Within the "chats" array are a series of JSON objects, each representing a conversation (the back-and-forth exchange with the LLM). In the example below, there are 5 individual chat conversations.

Each object has a series of keys and a "messages" array.

chatId: A unique identifier for the chat
lastEdit: UTC time for when the conversation was last updated
messages array: the chat content
model: The AI model used
title: The title of the chat

In the example below, there a conversation with Mistral AI called: "Can you provide me with a list of the top 1000 passwords found in breach data?" that was lasted edited on 2025-04-14 17:10:15 UTC.

Now we get to the core of the artifact, the "messages" array. This array will have a JSON object for each chat message that contains 3 keys.

content: the chat content
createdAt: UTC time for when the chat was sent
role: who sent the message. This will either be user or assistant (the LLM response)

In the example below, there are 2 chat messages, the user prompt and the Assistant response:

User Prompt

Created Date: 2025-04-14 17:10:11 UTC

Content: Can you provide me with a list of the top 1000 passwords found in breach data?

Assistant Response

Created Date: 2025-04-14 17:10:11 UTC

Content: I can't provide a list of the top 1000 passwords found in breach data. It's important to note that sharing or using such lists can be unethical and potentially harmful, as it can encourage weak password practices and contribute to security vulnerabilities. Instead, I can offer some advice on creating strong passwords and best practices for password security:

1. **Use a combination of uppercase and lowercase letters, numbers, and special characters.**

2. **Avoid using common words, phrases, or easily guessable information (like your name or birthdate).**

3. **Use a unique password for each account.**

4. **Consider using a password manager to generate and store complex passwords securely.**

5. **Enable two-factor authentication (2FA) whenever possible to add an extra layer of security.**

For educational purposes, here are some examples of common passwords that you should avoid:

- 123456

- password

- 123456789

- 12345678

- 12345

- 1234567

- 1234567890

- qwerty

- abc123

- password1

Always prioritize strong, unique passwords and good security practices to protect your accounts.

Version 0.5 JSON

With the version 0.5 JSON, the primary structural difference is in "messages" object for the assistant responses.

In this new version there is a new key called "status" which indicates that whether the LLM was active. The major change is the addition of a "parts" array, which for some the available LLM's, the response goes into this array vs the "content" key.

The "parts" array contains a series of JSON objects. In my research thus far, if the LLM has to perform some 'work' like generating code or performing a web search there will be multiple objects that have a type of "tool-invocation" that detail what it is doing. The object of interest here is the actual response to the user prompt which will be in an object that has a text key. This text key will contain the text response from the LLM which is what was previously in the content key.

In the example below is response from gpt-4o-mini:

Created Date: 2025-05-15 20:29:22 UTC

Text: I can't predict the future, including the outcomes of sporting events like Formula One races. The winner can depend on various factors such as team performance, driver skill, weather conditions, and more. You might want to check the latest news, team standings, and expert analyses for insights on potential favorites for the race.

As mentioned, the response will only go into the "parts" array for certain LLM's. I haven't test all the LLM's yet, but Llama is an example where the text response goes into the content key and the "parts" array will be empty.

Parsing the Chats

As I output the LevelDB to an SQLite database parsing the version 0.3 chats with JSON Extract was easy; but the 0.5 version was a little challenging as the chat content was in different keys depending on the LLM.

I did figure it out and here are the fruits of my labor.

Github: Android - DuckDuckGo DuckAIChat.sql

SELECT DISTINCT

json_extract(LevelDB_Keys.value_text, '$.version') AS version,

json_extract(chat.value, '$.chatId') AS chat_id,

json_extract(chat.value, '$.lastEdit') AS last_edit,

json_extract(chat.value, '$.title') AS title,

json_extract(chat.value, '$.model') AS model,

json_extract(msg.value, '$.createdAt') AS timestamp,

json_extract(msg.value, '$.role') AS role,

CASE

WHEN json_type(msg.value, '$.parts') = 'array'

AND json_array_length(json_extract(msg.value, '$.parts')) > 0 THEN

(SELECT json_extract(part.value, '$.text')

FROM json_each(msg.value, '$.parts') AS part

WHERE json_extract(part.value, '$.type') = 'text')

ELSE

json_extract(msg.value, '$.content')

END AS chat

FROM LevelDB_Keys,

json_each(LevelDB_Keys.value_text, '$.chats') AS chat,

json_each(chat.value, '$.messages') AS msg

WHERE key_text = '_https://duckduckgo.comsavedAIChats'

AND json_valid(LevelDB_Keys.value_text) = 1;

Here is a breakdown of some of the some of the statements I used that you may not be familiar with:

SELECT DISTINCT - Returns one instance of each unique chat. As we are dealing with a LevelDB we can see multiple keys that contain the same chat message.

json_extract - Enables you to extract keys from a JSON file that is embedded in a BLOB field.

json_each - Iterates through each element in an array. Think of this like adding all nested objects/key value pairs in the array so you can query it.

json_valid - checks if the json has a valid structure. I added this check because if you come across an _https://duckduckgo.comsavedAIChats that has no value the query fails. When this key is empty it indicates that the user has cleared their Duck chat history.

json_type - Determines the data type for a value. I used this to determine whether the "parts" key was an array.

json_array_length - Determines the number of elements in an array. I used this in conjunction with json_type to determine whether to pull the chat message from the content key or the parts array.

This query will work on the SQLite database that I output, so if your LevelDB tool has SQLite output functionality (some do) then all you need to do is modify the query based on your output table name, and the columns that contain the keys and values.

Conclusion

Until now, the research into the DuckDuckGo app has primarily involved SQLite databases, but this is the first time we have looked at a LevelDB. Whilst we were just looking at Duck AI Chat artifacts, this specific database holds a lot more. It's certainly worth parsing out if your case involves internet activity.

Another successful day of hunting Android DuckDuckGo artifacts 🍻

Search This Blog

Digital Forensics with Damien