source/web-app/poet/prompts.rst
Lines changed: 39 additions & 33 deletions
@@ -1,7 +1,7 @@
1
-
Prompt and prompt sampling methods
1
+
Prompt and Prompt Sampling Methods
2
2
===================================
3
3
4
-
What is a prompt?
4
+
What is a Prompt?
5
5
-----------------
6
6
7
7
A prompt is an input that directs a generative AI model to produce the desired protein sequences. For PoET-2, a prompt can include sequences and/or structures that define the target protein subspace. In contrast, PoET-1 uses a prompt composed of a set of related sequences. These sequences can be homologs, family members, or other groupings that capture the characteristics of the protein of interest.
@@ -13,11 +13,12 @@ A PoET-2 prompt is made up of two components, either of which can be included or
13
13
14
14
**Note:** PoET-1 is still available for some use cases and legacy workflows, but we recommend PoET-2 for most scenarios.
15
15
16
-
Creating a query
16
+
Creating a Query
17
17
-----------------
18
18
A query allows you to specify precise constraints for PoET-2 to follow during sequence generation.
19
19
20
-
Query components
20
+
21
+
Query Components
21
22
~~~~~~~~~~~~~~~~
22
23
23
24
- **Reference sequence:** Baseline sequence for comparison and edits.
@@ -26,7 +27,8 @@ Query components
26
27
27
28
The query enables targeted generation tasks such as sequence in-filling, inverse folding, or motif scaffolding. Only **one** sequence or structure can be entered per query.
28
29
29
-
Uploading a query
30
+
31
+
Uploading a Query
30
32
~~~~~~~~~~~~~~~~~
31
33
32
34
You can type a query directly into the sequence editor or upload one in the following formats:
@@ -44,6 +46,7 @@ You also have the option to skip entering a query by toggling the disable query
- Replace highlighted positions with a character (e.g., highlight positions 1–50 and press `X` to mask that region)
62
65
63
-
These tools allow precise control over the query, enabling you to define exactly which residues or structural positions should guide PoET-2’s generation.
66
+
These tools allow precise control over the query, enabling you to define exactly which residues or structural positions should guide PoET-2's generation.
64
67
65
68
.. image:: ../../_static/tools/poet/query-2.png
66
69
:alt: Sequence editor tools
67
70
71
+
68
72
Creating a Context
69
73
-------------------
70
74
71
-
You can either use an existing prompt or create a new custom prompt context or build from a Multiple Sequence Alignment (MSA).
75
+
You can create a prompt context in three ways:
72
76
73
-
1. Use Existing Prompt
74
-
^^^^^^^^^^^^^^^^^^^^^^^
77
+
Use Existing Prompt
78
+
^^^^^^^^^^^^^^^^^^^^
75
79
76
-
If you already uploaded some prompts, in the **Prompt Type** dropdown, select one. It will load the sequences from the selected prompt.
80
+
If you've previously uploaded prompts, you can reuse them. In the **Prompt Type** dropdown,
81
+
select an existing prompt. The sequences from that prompt will automatically load.
To create a custom prompt context, in the **Prompt Type** dropdown, select **Create New Prompt** option, and select **Custom** option from the toggle buttons. There are 2 ways to add sequences to your custom context:
90
+
To create a custom prompt context, select the **Create New Prompt** option in the **Prompt Type** dropdown, then select the **Custom** option from the toggle buttons. You can add sequences to your custom context in two ways:
86
91
87
-
1. Click **Choose Files** to select files for your context, we support .fa, .fasta for FASTA files, and .pdb, .cif for structure files.
88
-
2. Manually enter sequencesin CSV or Fasta format, then click **Upload** button. If you choose to paste CSV content, please note the following requirements:
92
+
1. **Upload files**: Click **Choose Files** to select files for your context. We support ``.fa`` and ``.fasta`` for FASTA files, and ``.pdb`` and ``.cif`` for structure files.
93
+
2. **Manually enter sequences**: Paste sequences in CSV or FASTA format, then click **Upload**. If you use CSV content, please note the following requirements:
89
94
- It must not include a header row.
90
95
- It can contain a maximum of 2 columns.
91
96
- If there are 2 columns, the first one must be the sequence names.
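The CSV requirements visible above can be checked before pasting. The following is a minimal sketch, not the application's own validation: the function name ``validate_pasted_csv``, the header heuristic, and the fallback names ``seq1``, ``seq2``, ... are all assumptions made for illustration.

```python
import csv
import io


def validate_pasted_csv(text, header_words=("name", "sequence")):
    """Return a list of (name, sequence) rows, or raise ValueError.

    Hypothetical helper: it mirrors the documented rules (no header row,
    at most 2 columns, names first when there are 2 columns).
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows:
        raise ValueError("no rows found")

    # Heuristic header check (an assumption; the real check is not documented).
    first = [cell.strip().lower() for cell in rows[0]]
    if any(cell in header_words for cell in first):
        raise ValueError("header row detected; remove it before uploading")

    parsed = []
    for i, row in enumerate(rows, start=1):
        if len(row) > 2:
            raise ValueError(f"row {i} has {len(row)} columns; maximum is 2")
        if len(row) == 2:
            # With 2 columns, the first column is the sequence name.
            name, seq = row[0].strip(), row[1].strip()
        else:
            # With 1 column, invent a placeholder name (illustrative only).
            name, seq = f"seq{i}", row[0].strip()
        parsed.append((name, seq))
    return parsed
```

For example, ``validate_pasted_csv("a,MKT\nb,MKV\n")`` returns two named rows, while a row with three columns raises ``ValueError``.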
After uploading the first prompt, a files list will appear to let you preview and manage the prompts. You can upload more prompts by dragging the files into the list, or click **Add Files** to manually enter sequences for the selected prompt. You can also drag a file to move to another prompt.
101
+
After uploading the first prompt, a file list will appear where you can preview and manage your prompts. You can upload more prompts by dragging additional files into the list, or click **Add Files** to manually enter sequences for the selected prompt. You can also drag and drop files within the list to move them between prompts.
97
102
98
-
If a structure contains multiple chains, you can choose which chain to use for the prompt.
103
+
If a structure file contains multiple chains, you can select which chain to use for the prompt.
There are several options to create a context from an MSA:
108
113
109
-
1. **Use Existing MSA**: Users can select an existing MSA from the current project.
110
-
2. **Upload MSA**: Users can upload an MSA file directly.
111
-
3. **Run Homology Search Using a Seed Sequence**: Users input a single seed sequence, and PoET builds an MSA by searching for homologs. Please note that when multiple sequences are entered, sequences after the first are ignored.
114
+
1. **Use Existing MSA**: Select an existing MSA from the current project.
115
+
2. **Upload MSA**: Upload an MSA file directly.
116
+
3. **Run Homology Search Using a Seed Sequence**: Enter a single seed sequence, and PoET will generate an MSA by searching for homologs. Note: If multiple sequences are entered, only the first one will be used.
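Because only the first sequence is used as the seed, it can help to check locally which record that will be. The sketch below assumes FASTA-style input and a hypothetical helper name, ``first_fasta_sequence``; it does not reproduce how the application parses input.

```python
def first_fasta_sequence(text):
    """Return (name, sequence) for the first record in FASTA-style text.

    Hypothetical helper: any records after the first are ignored,
    matching the documented seed-sequence behavior.
    """
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                break  # a second record starts here; ignore the rest
            name = line[1:].strip()
        else:
            if name is None:
                name = ""  # bare sequence with no header line
            chunks.append(line)
    return name, "".join(chunks)
```

Calling ``first_fasta_sequence(">a\nMKT\nAA\n>b\nGGG\n")`` keeps record ``a`` and drops record ``b``.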
- Choose the number of prompts to ensemble: Select 1 to sample a single prompt, or increase the diversity of generated outputs by ensembling over 2-15 prompts. We suggest using 3-5 prompts.
117
-
- Set sampling method fields: We suggest you start with the default settings, then adjust subsequent jobs based on your results.
121
+
You can further customize your analysis with the following settings:
122
+
123
+
- **Number of prompts to ensemble**: Choose 1 to sample a single prompt, or 2-15 to increase diversity. We recommend 3-5 prompts for most use cases.
124
+
- **Prompt Sampling Method**: Start with the default settings and fine-tune them based on your results.
118
125
119
126
120
127
Uploading and Saving a Sequence-Only Prompt
@@ -182,11 +189,9 @@ MSAs can be uploaded via:
182
189
.. image:: ../../_static/tools/poet/prompt-7.png
183
190
:alt: Uploading MSA on project page
184
191
185
-
![Uploading MSA popup on pr]()
186
192
187
-
188
-
Uploading and saving a sequence only-prompt
189
-
-----------------
193
+
Uploading and Saving a Sequence-Only Prompt
194
+
--------------------------------------------
190
195
191
196
Without a Project
192
197
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -222,14 +227,15 @@ What is a Multiple Sequence Alignment?
222
227
223
228
Multiple sequence alignment (MSA) is a technique for biological sequence analysis. It consists of a sequence alignment of three or more biological sequences that usually have an evolutionary relationship.
224
229
225
-
Why is MSA useful?
230
+
Why is MSA Useful?
226
231
------------------
227
232
228
233
The resulting MSA can be used to infer sequence homology and conduct phylogenetic analysis to assess the sequences’ shared evolutionary origins. Biologically sound and accurate alignments show homology and relationships, allowing for new member identification and the comparison of similar sequences. Because subsequent analysis depends on the results of an MSA, accuracy is vital.
229
234
230
235
When building a prompt from an MSA, you should include sequences you want to optimize for. The model learns the patterns of the proteins and predicts sequences that best fit that list. Since the model views proteins in their entirety, you cannot optimize for a specific property or activity.
231
236
232
-
Creating a Prompt using a MSA
237
+
238
+
Creating a Prompt Using an MSA
233
239
------------------------------
234
240
235
241
Without a Project
@@ -301,10 +307,10 @@ Homology search from a seed sequence can be initiated via:
301
307
:alt: Single sequence popup sidebar
302
308
303
309
304
-
Prompt sampling parameters
310
+
Prompt Sampling Parameters
305
311
--------------------------
306
312
307
-
Prompt sampling definitions
313
+
Prompt Sampling Definitions
308
314
~~~~~~~~~~~~~~~~~~~~~~~~~~~
309
315
310
316
- **Sampling method**: defines the sampling strategy used for selecting prompt sequences from the homologs found by homology search, or from the provided MSA. The following strategies are available:
@@ -318,7 +324,7 @@ Prompt sampling definitions
318
324
- **Maximum number of sequences**: The number of sequences sampled from the MSA to form the prompt. The same sequence will not be sampled from the MSA more than once, so the number of sequences in the prompt will never be greater than the number of sequences in the MSA.
319
325
- **Maximum total number of residues**: The maximum total number of residues in all sequences sampled from the MSA to form the prompt. For example, if this is set to 1000, sequences will be sampled from the MSA up to a maximum cumulative length of 1000 residues.
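The two limits above can be illustrated with a small sketch. It is an assumption-laden illustration, not PoET's sampler: uniform random order stands in for the sampling strategies the documentation lists, gap characters (``-``) are assumed not to count as residues, and sequences that would exceed the residue budget are assumed to be skipped. The function name ``sample_prompt`` is hypothetical.

```python
import random


def sample_prompt(msa_sequences, max_sequences, max_residues, seed=0):
    """Sample prompt sequences without replacement under two limits.

    Illustrative only: uniform random order replaces the real sampling
    strategies, and gaps are stripped before counting residues.
    """
    rng = random.Random(seed)
    order = list(range(len(msa_sequences)))
    rng.shuffle(order)  # each MSA row is visited at most once

    prompt, total = [], 0
    for idx in order:
        if len(prompt) >= max_sequences:
            break
        ungapped = msa_sequences[idx].replace("-", "")
        if total + len(ungapped) > max_residues:
            continue  # assumption: skip sequences that do not fit
        prompt.append(ungapped)
        total += len(ungapped)
    return prompt
```

Because each row is visited at most once, the prompt can never contain more sequences than the MSA, even when ``max_sequences`` is larger.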
320
326
321
-
Prompt sampling explained
327
+
Prompt Sampling Explained
322
328
-------------------------
323
329
324
330
The selection of prompt sequences from the MSA is controlled by several prompt sampling parameters.