PDB ID Extension FAQ

What will happen to PDB format files once four-character PDB IDs have been consumed?

PDB format files will not be provided for PDB entries deposited after four-character PDB IDs have been consumed. Best-effort PDB bundle files will no longer be provided for entries issued with extended PDB IDs.

The format of extended PDB IDs is the prefix “pdb_” followed by eight alphanumeric characters, e.g., pdb_10021abc. This PDB ID format will enable text mining detection of PDB entries in the published literature and allow for more informative and transparent delivery of revised data files.

All existing four-character PDB IDs will be extended by prefixing “pdb_0000” to the IDs, e.g., PDB ID “1abc” would be listed as “pdb_00001abc” in the _database_2.pdbx_database_accession data item.
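The prefixing rule above can be sketched in a few lines of Python. This is a minimal illustration, not wwPDB code: the function name extend_pdb_id is hypothetical, and lowercasing the input is an assumption based on the lowercase examples in this FAQ.

```python
import re

# Four ASCII alphanumeric characters, e.g. "1abc".
_FOUR_CHAR_ID = re.compile(r"[0-9a-z]{4}")


def extend_pdb_id(four_char_id: str) -> str:
    """Return the extended form of a four-character PDB ID.

    Hypothetical helper: lowercasing the ID is an assumption,
    not something the FAQ states explicitly.
    """
    code = four_char_id.strip().lower()
    if not _FOUR_CHAR_ID.fullmatch(code):
        raise ValueError(f"not a four-character PDB ID: {four_char_id!r}")
    # Existing IDs are extended by prefixing "pdb_0000".
    return "pdb_0000" + code


print(extend_pdb_id("1abc"))
```

Rejecting anything that is not exactly four alphanumeric characters keeps an ID that is already extended from being prefixed a second time.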

Which part of extended PDB IDs should be cited in manuscripts or provided to journals?

The entire PDB ID, e.g., “pdb_1001ba3c”, should be cited or provided to journals. Users should not omit the prefix or the leading zeros. Users and journals should parse and recognize PDB IDs using the prefix “pdb_”.
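As an illustration of recognizing PDB IDs by their prefix, the sketch below scans free text for “pdb_” followed by eight alphanumeric characters. The pattern and the function name find_extended_pdb_ids are hypothetical. Case-insensitive matching and lowercase normalization are assumptions, not wwPDB requirements.

```python
import re

# "pdb_" followed by exactly eight alphanumeric characters.
# Assumption: match case-insensitively and normalize hits to lowercase.
PDB_ID_PATTERN = re.compile(r"\bpdb_[0-9a-z]{8}\b", re.IGNORECASE)


def find_extended_pdb_ids(text: str) -> list[str]:
    """Return extended PDB IDs in order of first appearance, without duplicates."""
    found = []
    for match in PDB_ID_PATTERN.finditer(text):
        pdb_id = match.group(0).lower()
        if pdb_id not in found:
            found.append(pdb_id)
    return found


sample = "Models pdb_00001abc and pdb_1001ba3c were compared; 1abc alone is not matched."
print(find_extended_pdb_ids(sample))
```

Because the pattern requires the full prefix and all eight characters, abbreviated forms such as a bare four-character code are deliberately not matched.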

The wwPDB OneDep tool will assign extended PDB IDs that start with “pdb_1” followed by seven alphanumeric characters (pdb_1xxxxxxx). When referencing PDB IDs in manuscripts that will be submitted to journals, authors should refrain from abbreviating prefix characters or omitting leading zeros in issued PDB IDs.

The extended PDB ID is currently stored in the mmCIF format file as the _database_2.pdbx_database_accession data item value. Once four-character PDB IDs have been consumed, extended PDB IDs will be stored as values for both _database_2.database_code and _database_2.pdbx_database_accession data items.

After the depletion of four-character PDB IDs, filenames for both existing and new entries will be based on extended PDB IDs. For example, the current filename for PDB entry 1abc, which is 1abc.cif, will become pdb_00001abc.cif.

wwPDB strongly encourages journals to transition swiftly to the new PDB ID format (“pdb_xxxxxxxx”). All existing PDB entries with four-character codes will have extended PDB IDs, e.g., the four-character PDB ID “1abc” will have the extended PDB ID “pdb_00001abc”.

Data blocks will be named by appending the extended PDB ID to the data_ token, e.g., data_pdb_10021abc.

Data block names in structure factor files will follow the pattern data_sf_pdb_xxxxxxxx. In the case of multiple data blocks, they will be labeled as data_sf_pdb_xxxxxxxxA for the initial data block, data_sf_pdb_xxxxxxxxB for the second data block, and so forth.
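The naming pattern for structure factor data blocks can be sketched as follows. The function name sf_data_block_names is hypothetical. The FAQ does not say what happens beyond the letter Z, so this sketch raises an error in that case rather than guessing.

```python
from string import ascii_uppercase


def sf_data_block_names(extended_id: str, n_blocks: int) -> list[str]:
    """Return data block names for a structure factor file.

    Hypothetical helper: a single block gets no suffix; multiple blocks
    get A, B, C, ... Behavior past 26 blocks is not described in the FAQ.
    """
    if n_blocks < 1:
        raise ValueError("n_blocks must be at least 1")
    base = f"data_sf_{extended_id}"
    if n_blocks == 1:
        return [base]
    if n_blocks > len(ascii_uppercase):
        raise ValueError("suffixes beyond 'Z' are not specified")
    return [base + letter for letter in ascii_uppercase[:n_blocks]]


print(sf_data_block_names("pdb_10021abc", 1))
print(sf_data_block_names("pdb_10021abc", 3))
```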

Example files that contain extended PDB IDs can be downloaded at https://github.com/wwPDB/extended-wwPDB-identifier-examples .

All data files for a particular entry will be stored in a single directory, placed under a two-character hash directory generated from the two characters preceding the last character of the PDB code. To aid users in adapting to this change, a PDB “beta” archive will be provided during the transition phase. The beta archive is expected in 2026. The directory structure will mirror the data organization of the PDB Versioned Archive, i.e.,
https://files-beta.org/pub/pdb/data/entries/<two-letter-hash>/<pdb_accession_code>/<entry_data_File_names>.
The two-letter hash will consist of the third-to-last and second-to-last characters of the PDB ID, i.e., the two characters immediately preceding the last character. For example, PDB entry pdb_1abc5678 will be under /67/.
This will maintain consistency with the current PDB archive, where PDB entry 1abc is under /ab/.
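The hash rule can be illustrated with a short Python sketch. The function names two_letter_hash and beta_entry_dir are hypothetical. The relative path follows the layout quoted above, and lowercasing the ID is an assumption.

```python
import re

_EXTENDED_ID = re.compile(r"pdb_[0-9a-z]{8}")


def two_letter_hash(extended_id: str) -> str:
    """Return the two characters immediately preceding the last character."""
    pdb_id = extended_id.strip().lower()  # assumption: IDs are lowercase
    if not _EXTENDED_ID.fullmatch(pdb_id):
        raise ValueError(f"not an extended PDB ID: {extended_id!r}")
    return pdb_id[-3:-1]


def beta_entry_dir(extended_id: str) -> str:
    """Return the entry directory relative to the archive root (hypothetical helper)."""
    pdb_id = extended_id.strip().lower()
    return f"pub/pdb/data/entries/{two_letter_hash(pdb_id)}/{pdb_id}/"


print(two_letter_hash("pdb_1abc5678"))
print(beta_entry_dir("pdb_00001abc"))
```

For an extended legacy ID such as pdb_00001abc, the same slice yields "ab", which matches the hash directory used for 1abc in the current archive.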

Yes. The four-character PDB IDs of existing entries will still be available in the _database_2.database_code data item in the mmCIF structure files.

Yes. The existing legacy PDB format files will not be removed from the main PDB archive when the “beta” archive becomes the main archive.