Use-case: “File import”-compatible bulk import script

Hi all,

first things first: YASS, an YNAB API! After years of waiting, it's like magic!

TL;DR: it's hard to do bulk import that's compatible with the existing file import.

---

So. For context: my bank doesn't provide any nice or stable integrations. My two options are parsing SMS notifications (horrible idea) or using manually downloaded XML reports. Those XML reports are almost OFX, so I hacked up https://github.com/abesto/ofxstatement-otp to convert them to proper OFX. Then, until today, I was just importing the OFX files using the file import functionality.

Using the API I'm looking to make this second manual step be a bit less manual. Maybe I could then start looking into how to further automate the first part (Selenium maybe?). I built a quick PoC in Node.js that reads an OFX file, converts the transactions to the YNAB API format, and got hit a wall.

I want to keep this import script compatible with the file import functionality of YNAB – if something goes wrong with my script, or the API changes, or whatever, I still want to be able to directly upload an OFX and not have duplicates between transactions created by the script and file import. To achieve that, the script would need to exactly replicate the functionality of the file import feature, most notable when it comes to generating `import_id`s.

It turns out that's not completely trivial. It seems that transactions with the same import_id-sans-occurrance in a single file import are all imported (as they should be). In subsequent uploads, deduplication seems to happen based on something like “the number of existing transactions with the same import_id-sans-occurrance plus memo”.  Duplicating an entry in the OFX and importing it resulted in two entries (one duplicate detected, a new one created). Removing one, and importing again, no new entries were created. After changing the memo of the one entry in the OFX (which doesn't change the import_id), a new entry was created.

So a script like this would need to fetch relevant existing transactions (for which, by the way, an extra “until date” would be nice to limit bandwidth usage), and implement some kind of heuristic, which I don't really want to guess at.

(If it helps, I'm happy to share the code so far - I'm unsure about whether I should push it to a public GitHub repo)

---

Ideas for a clean solution:

  • Expose a stable, supported API endpoint that triggers a file import (this would be my preference)
  • Document, provide reference implementation for the deduplication heuristic
10replies Oldest first
  • Oldest first
  • Newest first
  • Active threads
  • Popular
  • (Quick correction for completness' sake: the OTP exported XML is not OFX-like; it's like another, existing, non-open standard. ofxstatement-otp converts it to an OFX)

    Reply Like
  • Hey

    I have built a CSV import into https://github.com/scottrobertson/fintech-to-ynab

    If you have a specific bank bank format you want to import, i can add a script for that. It will handle all the deduplication for you, so you don't need to worry about duplicates.

    Feel free to open a ticket over there with an example file you want to import if you would like that.

    As for using the API directly, all you need to do is define an `import_id` for each transaction and it will handle all the deduplication on the API end.

    Reply Like
  • Zoltán Nagy Thanks for trying out the API!  We appreciate your feedback.

    First of all, you mentioned a way to fetch transactions with an "until date".  The GET budgets/{budget_id}/transactions endpoint supports a since_date query parameter, so a request like GET budgets/{budget_id}/transactions?since_date=2018-02-18 should give you transactions dated on or after 2018-02-18.

    As for deduplication: When you import a file through the web interface, we will automatically assign an import_id in the format "YNAB:[milliunit_amount]:[iso_date]:[occurrence]".  The only transaction factors included in this import_id are the amount and date.  Memo is not used for determining the import_id.

    Here is a simple example:

    If you have an OFX file with the following 3 transactions and imported it through the web interface:

    <STMTTRN>
    <TRNTYPE>DEBIT
    <DTPOSTED>20180201160000
    <TRNAMT>-72.96
    <NAME>CITY OF BOSTON  DES:WATER BILL
    </STMTTRN>
    <STMTTRN>
    <TRNTYPE>DEBIT
    <DTPOSTED>20180202160000
    <TRNAMT>-34.96
    <NAME>KROGER
    </STMTTRN> <STMTTRN>
    <TRNTYPE>DEBIT
    <DTPOSTED>20180202160000
    <TRNAMT>-34.96
    <NAME>COMCAST INTERNET
    </STMTTRN>

    These transactions would be created:

    Date Amount import_id
    2018-02-01   -72.96 YNAB:-72960:2018-02-01:1
    2018-02-02   -34.96 YNAB:-34960:2018-02-02:1
    2018-02-02   -34.96 YNAB:-34960:2018-02-02:2

    If you submit a transaction through the API, specifying an import_id matching one of the above import_ids, the transaction would not be created.

    If you are seeing something different it would be really helpful if you could provide a simple example.  For example: "Here is an OFX file (sample file included) which has 2 transactions.  I imported through the web and both transactions were created.  I then sent a request with 2 transactions to the API (sample request JSON included), specifying an import_id on both and they were both created as well when I was expecting them to not be created since they were already imported with the OFX file." 

    Thanks again for the feedback!

    Reply Like 1
    • Brady at YNAB Thank you for the detailed response!

      First of all, you mentioned a way to fetch transactions with an "until date".  The GET budgets/{budget_id}/transactions endpoint supports a since_date query parameter, so a request like GET budgets/{budget_id}/transactions?since_date=2018-02-18 should give you transactions dated on or after 2018-02-18.

      Indeed, that exists, and it makes downloading transactions that happened not too long ago feasible. However, that's a since date, not an until date. An API call with both a “since” and an “until” date would look something like this: GET budgets/{budget_id}/transactions?since_date=2017-01-09&until_date=2017-01-10

      In this specific example, without the until_date, I put unneeded stress on both your server and my client by downloading over a year of transactions, even though I only care about a couple of days from January last year.

      As for deduplication: When you import a file through the web interface, we will automatically assign an import_id in the format "YNAB:[milliunit_amount]:[iso_date]:[occurrence]".  The only transaction factors included in this import_id are the amount and date.  Memo is not used for determining the import_id.

      It seems I managed to confuse myself somewhere along the line in my experiment last week. It appeared to me that even though the memo is not in import_id, it was still used for deduplication on the server side. I just tried to reproduce what I saw, and it appears you're totally right, along with Scott Robertson : memo is not used for generating import_id or for deduplication either. For reference, below's my experiment. Feel free to mark this thread as resolved; I'll revisit my import script in a few weeks. Thank you!

      ---

      I extended your OFX fragment to be importable:

      <OFX>
          <BANKMSGSRSV1>
              <STMTTRNRS>
                  <STMTRS>
                      <BANKTRANLIST>
                          <STMTTRN>
                              <TRNTYPE>DEBIT</TRNTYPE>
                              <DTPOSTED>20180201160000</DTPOSTED>
                              <TRNAMT>-72.96</TRNAMT>
                              <NAME>CITY OF BOSTON  DES:WATER BILL</NAME>
                          </STMTTRN>
                          <STMTTRN>
                              <TRNTYPE>DEBIT</TRNTYPE>
                              <DTPOSTED>20180202160000</TDPOSTED>
                              <TRNAMT>-34.96</TRNAMNT>
                              <NAME>KROGER</NAME>
                          </STMTTRN>
                          <STMTTRN>
                              <TRNTYPE>DEBIT</TRNTYPE>
                              <DTPOSTED>20180202160000</DTPOSTED>
                              <TRNAMT>-34.96</TRNAMNT>
                              <NAME>COMCAST INTERNET</NAME>
                          </STMTTRN>
                      </BANKTRANLIST>
                  </STMTRS>
              </STMTRNRS>
          </BANKMSGSRSV1>
      </OFX>

       

      • I create a brand-new budget to make sure we have a clean slate (UUID, if you care to investigate: e526ce19-6302-4374-84a0-c0c9060c8277)
      • I create a new account in the new budget (UUID b3fe4552-6f83-44b3-a1d4-dcefbb64a852)
      • I import the OFX above, and see the three new transactions as expected
      • I import the OFX again, and get a notification about three duplicate transactions, as expected
      • I modify the OFX to add a MEMO field to one of the transactions.
      • I import the modified OFX, and get another notification about three duplicate transactions.

      After those I went ahead and changed the amounts and added a memo to get new imports with non-empty memos, imported, changed the memo again, imported again, and got duplicate transactions, like you said I would.

      Reply Like
  • As a follow-up: It'd still be super nice to have a dedicated “import” endpoint where OFX (or even CSV) files could be posted. That way I could always be sure that imports going through the API always do exactly the same thing (ideally through the same code-path on the server) as imports through the UI. Which would in turn make me not waste your time with my false assumptions :P

    Reply Like
  • Thanks for the feedback Zoltán Nagy ! I will add your requests to our feature request list!

    Reply Like
  • Another thought that came to mind on the same topic: the confirmation dialog when importing on the UI provides a sense of safety. It's a last sanity check before applying mass changes. I realized I use that every month to verify I'm on the right account, and that the entries look at least reasonable at a glance. With an automated import, that's kinda gone. One one hand, that might be just the nature of automated imports. On the other, it might be possible to design around this (revertable imports, import plan that needs an extra confirmation via the API are some ideas that come to mind, returning the import_id so that the client side can implement a "revert").

    Reply Like 1
  • Big +1 to an API endpoint for POSTing OFX/QFX/CSV files. Many banks provide data in no other format. As a coder, that means I have two options:

    * parse the CSV, re-implement payee matching, category matching, and transaction ID, and hope I get similar results with /budgets/{budget_id}/transactions to what I expect from the Web UI

    * use a headless browser to login and use the web UI

    Looking around github, the second option seems like the way most small projects are going, and I can understand why.

    7 months ago you mentioned adding this feature to the requests list. How's it looking?

    Reply Like
  • The API still doesn't support importing and parsing OFX/QFX/CSV natively but I wanted to cross post a crude example Node.js script that parses an OFX and imports it through the API, in case it helps anyone landing here on this post. See: https://support.youneedabudget.com/r/360w2l

    Reply Like
Like1 Follow
  • Status Added to feature requests
  • 1 Likes
  • 4 mths agoLast active
  • 10Replies
  • 1425Views
  • 5 Following