How to Import Records in Salesforce Using Data Loader

Salesforce Data Loader is a free client application from Salesforce for moving records in bulk through the Salesforce API. It reads and writes CSV files and supports insert, update, upsert, delete, hard delete, and export — up to 5 million records in a single job. That makes it the standard tool for importing client data such as Accounts, Contacts, Leads, Opportunities, and custom objects into an org.

Reach for Data Loader when you need volume, automation, or objects the browser-based Data Import Wizard does not handle. The Wizard is the simpler choice for one-off loads of common objects (Accounts, Contacts, Leads, Solutions, Campaign Members, Person Accounts, and custom objects) up to roughly 50,000 records, with built-in duplicate matching and nothing to install. Data Loader wins for everything larger, every standard and custom object, and any load you want to script and schedule.

If you would rather hand the migration off, our Salesforce consulting team has run data migrations across 50+ projects over 12+ years — but the steps below let you run a clean import yourself.

Data Loader vs Data Import Wizard

Both tools live under Setup, but they solve different problems. Use this table to pick the right one before you start.

Capability	Data Import Wizard	Data Loader
Where it runs	In the browser (Setup)	Installed desktop app + CLI
Max records per job	~50,000	Up to 5,000,000
Objects supported	Accounts, Contacts, Leads, Solutions, Campaign Members, Person Accounts, custom objects	All standard and custom objects
Operations	Insert, update, upsert	Insert, update, upsert, delete, hard delete, export, export all
Upsert / external ID	Yes (limited)	Yes (full control)
Built-in duplicate matching	Yes	No (use upsert + external ID)
Bulk API / large volumes	No	Yes (SOAP, Bulk API, Bulk API 2.0)
Automation / scheduling	No	Yes (command-line + process-conf.xml)
Install required	No	Yes (bundles Java)
Best for	Quick, ad-hoc loads of common objects	Large, repeatable, or scripted data jobs

Rule of thumb: under ~50,000 records of a supported object with no automation, use the Wizard. Anything bigger, any unsupported object, or any load you need to repeat, use Data Loader.
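The rule of thumb is simple enough to write down as a function. This is a minimal sketch. The object names are API-style labels chosen for illustration: `PersonAccount` is only a stand-in, because person accounts are really Account records. The 50,000 cut-off is the approximate Wizard limit from the table.

```python
# Objects the Wizard supports, per the table above. "PersonAccount" is a
# hypothetical stand-in label: person accounts are really Account records.
WIZARD_OBJECTS = {"Account", "Contact", "Lead", "Solution", "CampaignMember", "PersonAccount"}
WIZARD_LIMIT = 50_000  # approximate per-job ceiling for the Wizard


def choose_tool(object_name: str, record_count: int, needs_automation: bool) -> str:
    """Apply the rule of thumb: the Wizard only for small, supported, one-off loads."""
    # Custom object API names end in "__c", and the Wizard handles those too.
    supported = object_name.endswith("__c") or object_name in WIZARD_OBJECTS
    if supported and record_count <= WIZARD_LIMIT and not needs_automation:
        return "Data Import Wizard"
    return "Data Loader"


print(choose_tool("Account", 12_000, needs_automation=False))   # Data Import Wizard
print(choose_tool("Opportunity", 500, needs_automation=False))  # Data Loader
print(choose_tool("Contact", 200_000, needs_automation=False))  # Data Loader
```

Deletes, hard deletes, and exports also point to Data Loader. You could add an operation parameter that forces that branch.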

Install and set up Data Loader

Data Loader runs on Windows and macOS and ships with a bundled Zulu OpenJDK (Java) runtime in recent versions, so you no longer install Java separately.

Go to Setup, type "Data Loader" in the Quick Find box, then click Data Loader.
Download the build for your OS — Download for Windows or Download for macOS.
Run the installer and launch Data Loader.

Because Data Loader works through the Salesforce API, your org must have API access enabled. API access is included in Enterprise, Unlimited, Performance, and Developer editions. Professional Edition needs the API add-on enabled before Data Loader can connect. The running user also needs the API Enabled permission plus the object and field permissions for whatever you are loading.

Step by step: import client records (Insert)

This example inserts new Account records; the same flow applies to Contacts, Leads, and custom objects.

1. Prepare the CSV. Use one column per field and a header row, include every required field, and use exact picklist values. A minimal Account file:

Name,Type,Industry,BillingCity,BillingCountry,Phone
Acme Health Ltd,Customer,Healthcare,London,United Kingdom,+44 20 7946 0000
Northwind Retail,Prospect,Retail,Sydney,Australia,+61 2 8123 4567
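Missing required values and picklist mismatches cause most first-run failures, and both are cheap to catch locally. This is a minimal pre-flight sketch. It assumes Name is the only required column and uses a hypothetical set of Type picklist values. Replace both with your object's real required fields and the picklist values shown in Setup.

```python
import csv
import io

# Hypothetical rules for illustration: take the real ones from your org's Setup.
REQUIRED = ["Name"]
PICKLISTS = {"Type": {"Customer", "Prospect", "Partner"}}


def preflight(csv_text: str) -> list:
    """Return readable problems found before the file reaches Data Loader."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    for field in REQUIRED:
        if field not in header:
            problems.append(f"missing required column: {field}")
    # Row numbers start at 2 because row 1 is the header.
    for row_num, row in enumerate(reader, start=2):
        for field in REQUIRED:
            if field in header and not (row[field] or "").strip():
                problems.append(f"row {row_num}: {field} is blank")
        for field, allowed in PICKLISTS.items():
            value = row.get(field)
            # Exact, case-sensitive comparison against the allowed values.
            if value and value not in allowed:
                problems.append(f"row {row_num}: {field}={value!r} not in picklist")
    return problems


sample = "Name,Type,Industry\nAcme Health Ltd,Customer,Healthcare\n,Prospect,Retail\nGlobex,customer,Energy\n"
print(preflight(sample))
# ['row 3: Name is blank', "row 4: Type='customer' not in picklist"]
```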

2. Choose the operation. Launch Data Loader and click Insert.

3. Log in. Complete the OAuth login in the browser window that opens and allow access. Modern Data Loader uses OAuth, so you no longer append a security token for interactive logins.

4. Select object and file. Pick the target object (for example Account) and browse to your CSV.

5. Map the fields. Click Create or Edit a Map, then Auto-Match Fields to Columns to match by name. Fix any unmapped columns manually and save the mapping as an .sdl file so you can reuse it.
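A saved map is a small text file. The sketch below approximates Auto-Match with a case-insensitive name comparison. It writes the result in the `CsvColumn=FieldApiName` properties style that saved .sdl files typically use. `build_mapping`, `to_sdl`, and `_escape_key` are hypothetical helpers. Compare the output with a map saved from your own Data Loader version before relying on it.

```python
def _escape_key(text: str) -> str:
    # Java-properties keys escape backslash, space, '=' and ':' with a backslash.
    for ch in ("\\", " ", "=", ":"):
        text = text.replace(ch, "\\" + ch)
    return text


def build_mapping(csv_headers, field_names):
    """Match CSV headers to field API names, ignoring case and surrounding spaces."""
    by_lower = {name.lower(): name for name in field_names}
    mapped, unmapped = {}, []
    for header in csv_headers:
        field = by_lower.get(header.strip().lower())
        if field:
            mapped[header] = field
        else:
            unmapped.append(header)
    return mapped, unmapped


def to_sdl(mapping):
    """Serialize {csv_column: field_api_name} as CsvColumn=FieldApiName lines."""
    lines = ["# Account import map (generated)"]
    lines += [f"{_escape_key(col)}={field}" for col, field in mapping.items()]
    return "\n".join(lines) + "\n"


mapped, unmapped = build_mapping(
    ["Name", "type", "Billing City"], ["Name", "Type", "BillingCity", "Industry"]
)
print(unmapped)                         # ['Billing City'], so map it by hand
mapped["Billing City"] = "BillingCity"
print(to_sdl(mapped))
```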

6. Set output directories. Choose a folder for the success and error result files.

7. Run and review. Click Finish. When the job completes, open the success file, which lists each new record's 18-character Salesforce Id, and the error file, which lists each failed row with the reason. Correct the failed rows in your source file and reload only those.
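When the error file is long, it helps to see which errors dominate and to strip the error column before reloading. This sketch assumes the error file repeats your source columns plus an `ERROR` column whose messages begin with a status code followed by a colon. `split_error_file` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def split_error_file(error_csv: str, error_column: str = "ERROR"):
    """Return (reload_csv_text, Counter of status codes) from an error file.

    Assumes the file holds the source columns plus one error column whose
    messages look like "REQUIRED_FIELD_MISSING:...".
    """
    reader = csv.DictReader(io.StringIO(error_csv))
    source_columns = [c for c in (reader.fieldnames or []) if c != error_column]
    codes = Counter()
    out = io.StringIO()
    # extrasaction="ignore" drops the error column from the reload file.
    writer = csv.DictWriter(out, fieldnames=source_columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        message = row.get(error_column) or ""
        codes[message.split(":", 1)[0].strip() or "UNKNOWN"] += 1
        writer.writerow(row)
    return out.getvalue(), codes
```

Fix the rows in the returned CSV text, then load it with the same mapping. The counter shows which row in the error table below to start with.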

Insert, Update, and Upsert: how to avoid duplicates

Choosing the right operation is what keeps re-imports clean:

Insert — always creates new records. Re-running an insert creates duplicates.
Update — changes existing records; every row must include the Salesforce record Id.
Upsert — updates a record when it matches an existing one, otherwise inserts it. This is the safe, idempotent choice for repeatable loads.

Upsert needs a key to match on. You can match on the Salesforce Id, or — better for data coming from another system — a custom External ID field:

Create a custom field on the object (e.g. Legacy_Account_Id__c) and tick both External ID and Unique.
Put that external key in your CSV.
In Data Loader choose Upsert, select the object, then select your external ID field as the match field.
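Before the first upsert, check that every row has a key and that no key repeats within the file. A blank key gives the row nothing to match on, and repeated keys all target the same record. This minimal check assumes the external ID column is `Legacy_Account_Id__c`, as in the example above. `check_external_ids` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def check_external_ids(csv_text: str, key: str = "Legacy_Account_Id__c") -> dict:
    """Report blank and repeated external ID values before an upsert."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = [(row.get(key) or "").strip() for row in rows]
    # Data rows start at line 2 of the file (line 1 is the header).
    blank_rows = [line for line, v in enumerate(values, start=2) if not v]
    counts = Counter(v for v in values if v)
    repeated = sorted(v for v, n in counts.items() if n > 1)
    return {"blank_rows": blank_rows, "repeated_keys": repeated}
```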

Now you can re-run the same file as often as you like: matched rows update in place and only genuinely new keys insert, so you never create duplicate clients. This is the single most important technique for migrations you expect to repeat or reconcile.
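To see why upsert is idempotent and insert is not, here is a toy in-memory model. `FakeOrg` is purely illustrative and does not call Salesforce. It ignores everything except the match-or-create logic, including validation, sharing, and triggers. The key values `A-1` and `B-7` are made up.

```python
from itertools import count


class FakeOrg:
    """Toy in-memory model of one object's records; it does not call Salesforce."""

    def __init__(self):
        self.records = {}       # record id -> dict of field values
        self._ids = count(1)    # stand-in for Salesforce-generated Ids

    def insert(self, rows):
        # Insert never looks for a match, so every row becomes a new record.
        for row in rows:
            self.records[next(self._ids)] = dict(row)

    def upsert(self, rows, external_id):
        # Index existing records by their external ID value.
        index = {rec[external_id]: rid for rid, rec in self.records.items()
                 if rec.get(external_id)}
        for row in rows:
            rid = index.get(row[external_id])
            if rid is None:                  # no match: create a new record
                rid = next(self._ids)
                self.records[rid] = {}
                index[row[external_id]] = rid
            self.records[rid].update(row)    # matched or new: write the row's values


rows = [
    {"Legacy_Account_Id__c": "A-1", "Name": "Acme Health Ltd"},
    {"Legacy_Account_Id__c": "B-7", "Name": "Northwind Retail"},
]

inserted = FakeOrg()
inserted.insert(rows)
inserted.insert(rows)                            # re-run the same file
print(len(inserted.records))                     # 4: every row duplicated

upserted = FakeOrg()
upserted.upsert(rows, "Legacy_Account_Id__c")
upserted.upsert(rows, "Legacy_Account_Id__c")    # re-run the same file
print(len(upserted.records))                     # 2: matched rows updated in place
```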

SOAP API vs Bulk API settings

Open Settings → Settings in Data Loader to control how records are sent:

SOAP API (default). Processes records synchronously in batches of up to 200. Fine for small to medium loads.
Use Bulk API. Submits records asynchronously in larger batches (up to 10,000 per batch), which is far faster for high volumes and is required for hard delete.
Bulk API 2.0. A simpler, newer mode that auto-manages batching; enable Use Bulk API 2.0 for the largest jobs.
Serial vs parallel. Bulk batches run in parallel by default; switch to serial if parallel processing causes record-lock errors (common when many child rows point at the same parent).
Batch size. SOAP allows up to 200 per batch; Bulk API allows up to 10,000. Smaller batches isolate errors; larger batches run faster.

For a one-off load of a few thousand rows, the defaults are fine. For hundreds of thousands of rows, enable Bulk API (or 2.0) and test batch size in a sandbox first.
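Batch size mostly determines how many round trips a job makes. This back-of-the-envelope sketch uses the ceilings above: 200 for SOAP and 10,000 for Bulk API. Bulk API 2.0 is left out because it chunks the job itself.

```python
import math
from typing import Optional

# Per-batch ceilings from the settings described above.
MAX_BATCH = {"soap": 200, "bulk": 10_000}


def batch_count(records: int, mode: str, batch_size: Optional[int] = None) -> int:
    """Approximate number of batches for a record count at a given batch size."""
    size = batch_size if batch_size is not None else MAX_BATCH[mode]
    if not 0 < size <= MAX_BATCH[mode]:
        raise ValueError(f"{mode} batch size must be between 1 and {MAX_BATCH[mode]}")
    return math.ceil(records / size)


for mode in ("soap", "bulk"):
    print(mode, batch_count(250_000, mode))
# soap 1250
# bulk 25
```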

Automate loads from the command line

Data Loader can run headless for scheduled, repeatable jobs — a nightly sync from another system, for example. Instead of the UI you define the job in a process-conf.xml file and run it from the command line, scheduled with cron (macOS/Linux) or Task Scheduler (Windows).

The password is never stored in plain text: you encrypt it first with the bundled encrypt utility and reference the encrypted value. A trimmed process-conf.xml for an upsert looks like this:

<!-- process-conf.xml (trimmed) -->
<bean id="accountUpsert"
      class="com.salesforce.dataloader.process.ProcessRunner"
      singleton="false">
  <property name="name" value="accountUpsert"/>
  <property name="configOverrideMap">
    <map>
      <entry key="sfdc.endpoint"             value="https://login.salesforce.com"/>
      <entry key="sfdc.username"             value="integration@example.com"/>
      <!-- value produced by: encrypt -e "<password><securityToken>" /opt/dataloader/key.txt (password and token joined, no separator) -->
      <entry key="sfdc.password"             value="e8a7c...encrypted..."/>
      <entry key="process.encryptionKeyFile" value="/opt/dataloader/key.txt"/>
      <entry key="sfdc.entity"               value="Account"/>
      <entry key="process.operation"         value="upsert"/>
      <entry key="sfdc.externalIdField"      value="Legacy_Account_Id__c"/>
      <entry key="process.mappingFile"       value="/opt/dataloader/accountMap.sdl"/>
      <entry key="dataAccess.name"           value="/opt/dataloader/accounts.csv"/>
      <entry key="dataAccess.type"           value="csvRead"/>
      <entry key="sfdc.useBulkApi"           value="true"/>
    </map>
  </property>
</bean>

# Run it, then schedule the same command with cron or Task Scheduler:
# process.bat "/opt/dataloader/conf" accountUpsert
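If you maintain several jobs, generating each bean from a dictionary keeps the keys consistent. This sketch uses Python's standard `xml.etree.ElementTree` and the same property keys as the example above. `build_process_bean` is a hypothetical helper. Its output still has to go inside the file's `<beans>` root, and `sfdc.password` must always hold the encrypted value.

```python
import xml.etree.ElementTree as ET


def build_process_bean(bean_id: str, settings: dict) -> str:
    """Render one ProcessRunner <bean> whose configOverrideMap holds `settings`."""
    bean = ET.Element("bean", {
        "id": bean_id,
        "class": "com.salesforce.dataloader.process.ProcessRunner",
        "singleton": "false",
    })
    ET.SubElement(bean, "property", {"name": "name", "value": bean_id})
    override = ET.SubElement(bean, "property", {"name": "configOverrideMap"})
    entry_map = ET.SubElement(override, "map")
    for key, value in settings.items():
        ET.SubElement(entry_map, "entry", {"key": key, "value": value})
    ET.indent(bean, space="  ")   # pretty-print; needs Python 3.9+
    return ET.tostring(bean, encoding="unicode")


print(build_process_bean("accountUpsert", {
    "sfdc.entity": "Account",
    "process.operation": "upsert",
    "sfdc.externalIdField": "Legacy_Account_Id__c",
    "dataAccess.name": "/opt/dataloader/accounts.csv",
    "dataAccess.type": "csvRead",
    "sfdc.useBulkApi": "true",
}))
```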

Common Data Loader errors and fixes

Error	What it means	Fix
`REQUIRED_FIELD_MISSING`	A required field is blank or unmapped	Add the field to the CSV and map it; check required custom fields too
`INVALID_CROSS_REFERENCE_KEY`	A lookup/relationship Id is wrong, malformed, or points to a record you can't see	Use a valid 15/18-character Id (or upsert on an external ID); confirm the parent exists and is shared with you
`DUPLICATE_VALUE` / `DUPLICATES_DETECTED`	A unique-field value (such as an external ID) already exists, or an active duplicate rule blocked the save	Use upsert on an external ID, or deactivate/relax the duplicate rule for the load
`INVALID_OR_NULL_FOR_RESTRICTED_PICKLIST`	A picklist value isn't in the allowed set	Match values exactly (case-sensitive) or add them to the picklist
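Date / locale format error	A date or date/time value is not in a format Data Loader can parse	Use `yyyy-MM-dd` for dates and `yyyy-MM-dd'T'HH:mm:ss.SSS'Z'` for date/times, or enable Use European date format in Settings for dd/MM/yyyy input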
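`INSUFFICIENT_ACCESS_OR_READONLY` / `INVALID_FIELD_FOR_INSERT_UPDATE`	Object permissions, sharing, or field-level security block the write, or the field is never writable (e.g. a formula field)	Grant edit/FLS access via a permission set or profile; unmap fields that can never be written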
`UNABLE_TO_LOCK_ROW`	Parallel batches contend for the same parent record	Switch Bulk API to serial mode or sort the file by parent
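Two of these fixes, date formats and lock contention, can be handled in the file before the load. This sketch assumes the source dates are day/month/year and the file is a Contact load whose `AccountId` column holds the parent. Sorting groups the children of one parent together, so fewer parallel batches compete for it. It does not guarantee that lock errors disappear; serial mode remains the fallback. `to_iso_date` and `prepare_rows` are hypothetical helpers.

```python
import csv
import io
from datetime import datetime


def to_iso_date(value: str, source_format: str = "%d/%m/%Y") -> str:
    """Convert a day/month/year string such as 31/01/1990 to 1990-01-31; blanks stay blank."""
    value = (value or "").strip()
    if not value:
        return ""
    return datetime.strptime(value, source_format).strftime("%Y-%m-%d")


def prepare_rows(csv_text: str, date_fields, parent_field: str) -> str:
    """Rewrite date columns as yyyy-MM-dd and group rows that share a parent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    for row in rows:
        for field in date_fields:
            row[field] = to_iso_date(row.get(field))
    # sort() is stable, so rows keep their original order within each parent.
    rows.sort(key=lambda r: r.get(parent_field) or "")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```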

Best practices for a clean import

Test in a sandbox first. Validate mappings and record counts before touching production.
Always keep a backup. Run an Export of the object before any update, upsert, or delete so you can roll back.
Use upsert + external ID for anything you might re-run — it is the cleanest defense against duplicates.
Mind 15- vs 18-character IDs. The 15-character Id is case-sensitive; the 18-character Id is case-safe. Use 18-character Ids in lookups to avoid mismatches, especially after editing in Excel.
Deactivate automation for huge loads. Temporarily turn off non-essential workflows, flows, validation rules, and triggers, then run data-quality checks afterward — automation firing per row slows large jobs and can hit governor limits.
Plan large volumes. For multi-million-row loads, enable Bulk API and review our guide to managing large data volumes.
Read the result files. The error CSV tells you exactly which rows failed and why — fix and reload only those.
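The three extra characters in an 18-character Id are a checksum of the capitalization in the 15-character Id. That is why the longer form survives case-insensitive tools. This sketch follows the commonly documented conversion and is written from that description, not from Salesforce source. Each 5-character chunk becomes a 5-bit number, with bit i set when character i is an uppercase letter. That number indexes a 32-character alphabet of A to Z followed by 0 to 5. The sample Id is arbitrary.

```python
# Indexes 0-25 map to A-Z and 26-31 map to 0-5.
SUFFIX_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ012345"


def to_18(record_id: str) -> str:
    """Return the case-safe 18-character form of a 15-character Salesforce Id."""
    if len(record_id) == 18:
        return record_id
    if len(record_id) != 15 or not (record_id.isascii() and record_id.isalnum()):
        raise ValueError(f"expected a 15-character alphanumeric Id, got {record_id!r}")
    suffix = ""
    for start in range(0, 15, 5):
        chunk = record_id[start:start + 5]
        # Bit i is set when the i-th character of the chunk is an uppercase letter.
        flags = sum(1 << i for i, ch in enumerate(chunk) if "A" <= ch <= "Z")
        suffix += SUFFIX_ALPHABET[flags]
    return record_id + suffix


print(to_18("00D50000000IZ3Z"))  # 00D50000000IZ3ZEAW
```

Two 15-character Ids that differ only in letter case get different suffixes. Their 18-character forms therefore stay distinct even when Excel or a lookup formula compares them case-insensitively.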

Frequently Asked Questions

When should I use Data Loader instead of the Data Import Wizard?

Use the Data Import Wizard for quick, one-off loads of common objects (Accounts, Contacts, Leads, custom objects) up to about 50,000 records — it runs in the browser and matches duplicates for you. Use Data Loader when you need to load more than 50,000 records (up to 5 million), work with objects the Wizard does not support, run deletes or exports, or automate and schedule loads from the command line.

How do I avoid creating duplicate records on import?

Use the Upsert operation with an External ID field instead of Insert. Mark a custom field (for example Legacy_Account_Id__c) as External ID and Unique, include that key in your CSV, and select it as the match field. Upsert updates existing matches and only inserts genuinely new keys, so re-running the same file never creates duplicates.

What is the maximum number of records Data Loader can import?

Data Loader handles up to 5 million records per job. For volumes that large, enable the Bulk API (or Bulk API 2.0) in Settings for asynchronous, higher-throughput processing.

Do I need a specific Salesforce edition or API access?

Yes. Data Loader works through the Salesforce API, so your org needs API access — included in Enterprise, Unlimited, Performance, and Developer editions. Professional Edition requires the API add-on. The running user also needs the API Enabled permission and appropriate object and field access.

Can I schedule automated imports with Data Loader?

Yes. Data Loader can run headless from the command line using a process-conf.xml file with an encrypted password, then be scheduled with cron (macOS/Linux) or Task Scheduler (Windows) for repeatable nightly loads.

Why am I getting INVALID_CROSS_REFERENCE_KEY?

That error means a lookup or relationship Id in your file is wrong, malformed, or points to a record your user cannot access. Confirm you are using a valid 15- or 18-character Id, that the referenced parent record exists, and that it is shared with the loading user. Upserting on an external ID instead of a raw Id avoids most of these.