Skip to content

Add more brokers (Degiro, Revolut, eToro) + interest rate taxes from abroad (revolut, VUB)#10

Open
BJMg wants to merge 31 commits intodstrupl:mainfrom
BJMg:more-brokers
Open

Add more brokers (Degiro, Revolut, eToro) + interest rate taxes from abroad (revolut, VUB)#10
BJMg wants to merge 31 commits intodstrupl:mainfrom
BJMg:more-brokers

Conversation

@BJMg
Copy link
Copy Markdown

@BJMg BJMg commented Apr 26, 2026

  • No support for SALES data from new brokers (didn't have data for it this year).
  • extended support for E-Trade (Morgan Stanley) to load up also xls files which is easier to download than PDFs (don't need to do it one by one - no support for dividend - no data - only ESPP and RSU)
  • interest from abroad support was missing (Revolut, VUB)
  • Also added automatic downloading of conversion rates from CNB
  • Support for more employers (not just CISCO, especialy useful for former employees, when you want to combine more than just CISCO data)

BJMg added 7 commits April 26, 2026 18:12
The upstream fix in a42f282 added a test that asserted on the
single-currency DividendReport shape (printableDividendList,
totalTaxDollar, totalTaxCrown directly on the report). After rebasing
the multi-currency refactor on top, those fields live on
DividendReport.sections[Currency.USD] instead, and the
ExchangeRateProvider SAM now takes (LocalDate, Currency).

The fix itself (mutable per-date tax candidate list, remove on match)
is retained in DividentReportPreparation.kt across both
buildCurrencySection and buildCzkSection.
BJMg added 15 commits April 26, 2026 18:52
Previously dividends without a same-date tax record were silently dropped
from the §7 report, leading to under-reporting (e.g. Revolut Stocks
parsed with whtRate=0.0 emits no TaxRecord, so the dividend would never
have appeared on the form).

Now both buildCurrencySection and buildCzkSection throw an
IllegalStateException with a descriptive message identifying the
offending dividend (date, broker, symbol, amount, currency) and
suggesting two concrete remedies: emit an explicit zero TaxRecord for
genuine 0% WHT cases, or check that the broker statement's tax row
matches the dividend date.

Test added: dividendWithoutMatchingTaxRecordFailsWithDescriptiveMessage
CNB occasionally publishes the very last fixings of December a few
business days late, and any retroactive corrections appear in early
January. Caching a year file on Jan 8 risked freezing in a still-growing
year.

Treat year N as complete only once today() is at least 30 days into
year N+1 (i.e. ~Jan 31). That comfortably clears any late-December
corrections while still being well before the late-March / early-April
Czech tax-filing window, so previous-year rates are downloaded fresh at
most a handful of times per filing season.
The 'fixed' branch printed a wrong diff value: the last subtrahend was
profit2024WhenUsedDynamicRate instead of profitWhenUsedFixedRate, so the
displayed difference between the two strategies was nonsense (it
contained the dynamic 2024 figure twice and never subtracted the fixed
post-2024 figure).

The else branch already had the correct expression
(fixedTotal - dynamicTotal); this commit makes the if branch
symmetrical (dynamicTotal - fixedTotal) so the printed diff is
consistent with the comparison that selected the branch.
The literal '2024' was duplicated across ReportGenerator (twice, in the
DateInterval.year() arguments for the rsuReport2024/esppReport2024 fields)
and CockroachMain (twice in PDF output filenames and twice in the
recommendation banner strings). The number is not arbitrary: it is the
Czech tax-law cutover year for unsold RSU/ESPP equity, after which the
old vs. new methodology comparison no longer needs to be emitted.

Introduce ReportGenerator.LEGISLATIVE_TRANSITION_YEAR and route every
caller through it. Variable and report-field names that mention 2024
are intentionally left alone -- they refer to the cutover year
specifically, so a literal name carries useful information.

Behaviour is unchanged.
…Record

The production data classes deliberately require broker / symbol / product /
country to be supplied at every call site, so that real parsers cannot
silently lose attribution information. Several test call sites were still
using the pre-change zero-arg form and stopped compiling.

Introduce TestRecords.kt with two tiny factory functions (dividendRecord,
interestRecord) that supply harmless defaults for the metadata fields tests
do not care about (symbol=TEST, broker=TestBroker, product=TestProduct,
country=IE). Migrate the affected test files to use them; tests that
intentionally exercise specific metadata (e.g. country=SK for VÚB) keep
their explicit arguments.

No production code is changed.
The previous usage message only listed the positional Schwab/E-Trade form
and gave a one-line nod to the YAML form, despite the YAML config being
the only way to feed Degiro, Revolut, eToro and VUB into the report.
Users that ran 'cockroach' with no arguments could not discover those
options, and a YAML file (or empty Schwab export) with no broker sources
crashed with an IllegalArgumentException stack trace from require() in
CockroachMain.report().

- Rewrite the help text to lead with the YAML form, list every supported
  broker key (schwab, etrade, etradeBenefitHistory, degiro, revolut.stocks,
  revolut.savings, revolut.whtRate, etoro, vub) and keep the legacy
  positional form clearly marked as Schwab/E-Trade only.
- Wrap main() so any IllegalArgumentException (including the existing
  'No input sources provided. Specify at least one of: schwab, etrade,
  degiro, revolut, etoro, vub.' guard) is reported as a one-line stderr
  message and the process exits with code 1, instead of dumping a stack
  trace.
- Switch System.exit calls to kotlin.system.exitProcess for consistency.

No production behaviour changes for valid inputs; tests still pass.
broker (Schwab, E-Trade, Degiro, Revolut, eToro, VUB), each owning its
own input shape (single file / list / directory / WHT rate / year). The
small file-loading helper that several of them need moves to a top-level
loadText() in BrokerSource.kt; the matching helper inside CockroachMain
is removed.
Copy link
Copy Markdown
Owner

@dstrupl dstrupl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Reviewed PR #10 end-to-end. The PR is a substantial and well-organized expansion of the tool: a clean BrokerSource interface, four new broker integrations (Degiro, Revolut, eToro, VÚB), interest-income reporting, a YAML config, and an HTTP CNB rate fetcher. Tests build and pass locally on mvn clean install (a couple of tests assumeTrue against private files and skip — that is by design).

I'm posting this as a comment review, not request-changes — none of the findings block the merge, but several are worth fixing before tax season because they will produce wrong numbers or fail silently for users other than yourself.

What I liked

  • BrokerSource is a clean abstraction; CockroachMain becomes a thin orchestrator and the YAML branch is easy to follow.
  • CockroachConfig + kaml is a much better UX than the positional-argument mode and is well-documented in the README.
  • HttpCnbYearRatesSource correctly distinguishes complete vs. incomplete years and uses an atomic temp-file rename when caching. The chunk-splitting for the 2022 HRK transition is a nice touch.
  • Test coverage is good for the new parsers; the redacted XLS/XLSX/CSV/PDF fixtures under src/test/resources are a great pattern to keep.
  • README updates are thorough — every broker gets clear acquisition steps + the new input/ layout.

Findings

Likely-incorrect-output (recommend fixing)

  1. Wrong broker name on E-Trade RSU/ESPP PDF reports (pre-existing, but more visible now).
    RsuPdfParser and EsppPdfParser hardcode BROKER_NAME = "Charles Schwab & Co." even when invoked on E-Trade PDFs. After this PR these records are reported through ETradeBrokerSource, so dividends/sales correctly say E-Trade but RSU/ESPP rows still say Schwab. Pass the broker name into the parser (or detect from PDF header).

  2. VubInterestPdfParser uses the YAML year instead of the statement year.
    VubBrokerSource(files, year = config.year) means a December 2024 → January 2025 statement bundled into the 2024 run will book January interest with a 2024 date. Either parse the year from the statement header (Period: row) or require single-year statements explicitly.

  3. eToro dividends are unconditionally classified as US-source and grossed up at 15%.
    The README warns about this, but the code itself doesn't refuse non-US tickers (e.g. VOD.L). Two safer options: (a) require the user to confirm a per-symbol country map in the YAML, or (b) at least log a clear warning per non-US ticker rather than silently labeling it US.

  4. Revolut whtRate is per-broker, not per-issuer.
    This is fine for accounts that only hold US-domiciled stock under W-8BEN. If the user holds a non-US dividend payer through Revolut, the gross-up will be wrong. Worth a note in RevolutBrokerSource and in the report so the user spots it.

  5. DividentReportPreparation 15% match heuristic.
    The same-day tax-to-dividend matching uses a 15% ratio. With Degiro/eToro/Revolut now in the mix, dividends with 0% (treaty exemption) or non-15% WHT can be mis-paired. Consider matching by (symbol, payDate, currency) when available and using the ratio only as a tiebreaker.

Robustness / correctness

  1. HttpCnbYearRatesSource has no HTTP timeouts.
    URI(...).toURL().openStream() with no connectTimeout/readTimeout will block indefinitely if cnb.cz is slow. Use URLConnection.setConnectTimeout(...) / setReadTimeout(...) (e.g. 10s/60s).

  2. No offline fallback to ClasspathCnbYearRatesSource.
    You ship rates_*.txt for completed past years, but the HTTP source ignores them and downloads fresh. If the user is offline or CNB is down, the run fails. Consider: try classpath first for completed years; download only when missing.

  3. No content validation on the CNB download.
    If CNB returns an HTML error page (5xx, captive portal, …), splitByHeader will throw no header line found. That's acceptable but don't cache the HTML accidentally — currently you only cache via downloadToFile which is fine, but a sanity check that the response starts with Date|/Datum| would prevent caching garbage if CNB ever changes the URL semantics.

  4. RevolutParser savings parser only recognizes English Interest PAID.
    Localized statements (the user lives in CZ/SK) would silently report zero interest. Either accept a small allow-list of localized strings or fail loudly when an Interest * row is encountered that the parser doesn't understand.

  5. EtoroXlsxParser uses hand-written regex over the unzipped XLSX.
    The README explains why (POI choked on the file), but the parser hard-codes column letters. A brittle approach. At minimum, validate the header row (column titles) and fail with a clear message if the layout differs from the tested fixture.

Code hygiene

  1. Unused imports (CI doesn't fail on these, but easy nits):

    • ETradeBrokerSource.ktorg.apache.poi.openxml4j.opc.internal.FileHelper (this also pulls in a POI internal package — fragile across POI versions, please remove)
    • JsonExportParser.ktkotlinx.serialization.builtins.serializer
    • CockroachMain.ktjava.util.logging.Logger
  2. CockroachMain.runCockroach is large and procedural.
    It does YAML detection, validation ("at least one source must be present"), and then both the YAML and legacy paths inline. Splitting into two private functions (runFromYaml, runFromArgs) plus a small "detect mode" helper would be easier to test.

  3. Mixed naming. DividentReportPreparation keeps the original typo (Divident / Dividend). Fine to leave for diff hygiene; if you ever do a rename pass, fix this and the matching templates together.

  4. Numbers as Double.
    Pre-existing, not introduced here. Worth flagging for the future: the gross-up + matching logic in Revolut/Degiro accumulates floating-point error; the existing tests pin specific values, but BigDecimal at money boundaries would be the proper fix once the layout stabilizes.

Tests

  1. Build passes (mvn clean install) on kotlin.version=2.3.20 with JUnit 5 + vintage engine. Surefire reports 50 tests run, 3 skipped (the assumeTrue(file.exists()) ones that need private statements). No failures.

  2. There's no end-to-end test that runs CockroachMain.report against a fully-populated ParsedExport and asserts the PDF/HTML output is generated without error. Given how many new code paths this PR touches, an integration test that wires up two or three brokers using the existing fixtures would catch template breakage.

CLAUDE.md

I added a CLAUDE.md at the repo root that summarizes the project, build, conventions, per-broker patterns, and the known issues above. This is intended for future AI agents (or a fresh re-read after a year). Feel free to amend / move it to AGENTS.md if you prefer that name.

— Reviewed using dstrupl (gh auth confirmed before posting).

@dstrupl
Copy link
Copy Markdown
Owner

dstrupl commented Apr 27, 2026

Zdar, nezajdem nekdy na obed to probrat? Sorry, ze jsem nechal udelat to review automatem ....

BJMg added 5 commits April 27, 2026 18:45
…unused imports

* HttpCnbYearRatesSource: switch to URLConnection with explicit
  connectTimeout=10s / readTimeout=60s so a CNB outage no longer
  hangs the whole run; require the response to contain a
  'Date|'/'Datum|' header line before returning or caching it,
  so a captive-portal page or HTML error body fails loudly
  instead of being silently cached as empty rates.
* EtoroXlsxParser: log a per-row WARNING when an instrument name
  carries a non-US exchange suffix (e.g. VOD.L, SAP.DE, MC.PA),
  since country is hard-coded to US and such tickers would be
  misclassified as US-source dividends.
* ETradeBrokerSource, CockroachMain: drop genuinely unused imports
  (org.apache.poi.openxml4j.opc.internal.FileHelper and
  java.util.logging.Logger respectively).
…t rows

A1 (likely-incorrect output): RsuPdfParser and EsppPdfParser previously
hardcoded broker = "Charles Schwab & Co." because the PDF templates are
identical for Schwab and E-Trade/Morgan Stanley. After the E-Trade source
was added, RSU/ESPP records imported via ETradeBrokerSource still showed
'Charles Schwab & Co.' on Priloha c. 3 / sec. 6, conflicting with the
sales/dividend rows from the same source. Pass brokerName explicitly
through parse / parseDirectory / parseFromText; ETradeBrokerSource now
stamps 'Morgan Stanley & Co.' to match what ETradeBenefitHistoryParser
and ETradeGainLossParser already emit.

B2 (robustness): RevolutParser.parseSavings only recognises English
'Interest PAID' / 'Interest Reinvested' descriptions. A localised
statement (CZ/SK) or a new Revolut row type whose description starts
with 'Interest ' would previously fall into the generic 'unrecognised
row' warning branch and be silently dropped, under-reporting sec. 8
interest income. Add a fail-loud check that throws IllegalStateException
on any 'Interest *' row not matching the known prefixes, with guidance
to re-export the statement in English.

Tests: RsuPdfParserTest / EsppPdfParserTest updated to thread brokerName
through; new RevolutParserTest case asserts the IllegalStateException
fires on 'Interest Accrued ...'. mvn test: 51 run, 0 failures, 1 skipped
(external-PDF assumeTrue).
…e year

A2 (likely-incorrect output): VubInterestPdfParser stamped every posting
with the year passed in by the caller (typically tax_YYYY.yaml's year),
even though VUB's table only prints DD/MM. Pointing the configuration at
the wrong year (e.g. running with year=2024 against a 2025 statement, or
vice versa around the December->January boundary) would silently
mis-stamp every InterestRecord.

Fix: extract the statement year from the 'Account balance as at
DD/MM/YYYY' lines in the PDF header (max year across opening and closing
balances = closing year = statement year), and require it to match the
caller's configured year. Throw IllegalStateException with actionable
guidance on mismatch.

Tests: new VubInterestPdfParserTest covers extractStatementYear (synthetic
text fixture for the closing-balance-pick rule and the no-balance-line
error path) and exercises the cross-check via the bundled real PDF using
assumeTrue, asserting both the mismatch (year=2024 -> throws) and the
happy path (year=2025 -> records all carry 2025, CZK, broker='VUB').
mvn test: 55 run, 0 failures, 1 skipped.
…ot per-issuer

A3 (likely-incorrect output): Revolut Stocks reports only net dividends
and the parser performs a mathematical gross-up at a single per-broker
whtRate (default 0.15 for the US/CZ treaty rate). Today this is correct
because Revolut Stocks is a US-only product, but if Revolut adds non-US
listings or pays out an ADR whose underlying issuer-country withholds at
a different rate, the per-row gross-up will not match what was actually
withheld and the user will silently mis-report taxes credited.

Fix:
- RevolutParser.parseStocks now throws IllegalStateException on any
  ticker carrying a non-US exchange suffix (.L, .DE, .PA, ...) instead
  of silently treating it as US, mirroring the existing fail-loud
  behaviour for unrecognised Interest rows in parseSavings. The error
  spells out the offending ticker, the per-broker whtRate that would
  have been mis-applied, and the actionable workaround (split the row
  out manually, or extend the parser with per-issuer routing).
- KDoc on parseStocks spells out the per-broker (vs per-issuer)
  limitation.
- README 'Revolut Stocks' notes call out the same limitation explicitly,
  next to the existing whtRate / gross-up explanation.

Tests: new stocksFailsLoudlyOnNonUsTickerSuffix in RevolutParserTest
asserts the exception type and message contents for a VOD.L row.
mvn test: 56 run, 0 failures.
…) explicitly

Replace the 15%-of-amount heuristic that picked the closest tax row on the same date
with an explicit composite key (broker, symbol, date) within the per-currency section.
Multiple tax rows that share the same key (e.g. Schwab's same-day correction adjustments)
are now summed before being attached to the dividend bucket, and multiple dividend rows
sharing the same key collapse into a single printable line with summed amounts.

Two new fail-loud invariants:
  - any dividend without a matching tax bucket throws IllegalStateException with
    broker/symbol/date/amount and a remediation hint;
  - any tax row without a matching dividend bucket throws IllegalStateException
    with the same diagnostic.

To support keying, TaxRecord and TaxReversalRecord gain non-nullable symbol+broker
fields; all production parsers (Degiro, eToro, Revolut, Morgan Stanley DividendXlsx,
Schwab JsonExport) populate them from the underlying statement, and TestRecords.kt
provides defaulted helpers so existing tests do not have to spell them out.

Updated DividentReportPreparationTest with three scenarios covering the new key:
two brokers paying the same symbol on the same day, multiple tax rows summed against
one dividend, and an orphaned tax row failing loudly.

mvn test: 58 run, 0 failures, 3 skipped.
BJMg added 3 commits April 27, 2026 19:50
Past tax years remain reproducible offline and survive cnb.cz outages
once their year.txt is shipped under
src/main/resources/cz/solutions/cockroach/rates_<year>.txt.

* ClasspathCnbYearRatesSource: expose hasYear(year).
* New ClasspathOrHttpCnbYearRatesSource: prefers bundled snapshot when
  available; only consults HTTP for years not on the classpath
  (typically the current/future year).
* CockroachMain.report wraps HttpCnbYearRatesSource with the composite.
* README: documents the lookup order and how to pin a new year.
* CnbYearRatesSourceTest covers hasYear, bundled-preferred, HTTP
  delegation, and propagated HTTP failure.

mvn test: 62 run, 0 failures, 3 skipped.
Pulls YAML and positional CLI source-list construction out of the
top-level dispatcher into invocationFromYaml() and
invocationFromPositionalArgs(), each returning a small private
Invocation(year, outputDir, sources). runCockroach now only picks the
form, parses, and calls CockroachMain.report. No behavior change.

mvn test: 62 run, 0 failures, 3 skipped.
Replaces the assumeTrue dependency on a private input/BenefitHistory.xlsx
with a checked-in synthetic fixture under
src/test/resources/cz/solutions/cockroach/. The fixture is built by
BenefitHistoryFixtureGenerator, which mirrors the structural layout of
a real Morgan Stanley Stock Plan Connect export (sheet names, headers
including the duplicated 'Vested Qty.' column, and the
Grant/Vest Schedule/Tax Withholding row sequence) but uses anonymised
symbol (ACME), grant ids (G-1001, G-1002, G-1003), quantities, and
amounts. G-1003 has zero vested shares and must be excluded by the
parser, exercising the same path the production file does.

mvn test: 62 run, 0 failures, 1 skipped (was 3 skipped).
@BJMg
Copy link
Copy Markdown
Author

BJMg commented Apr 27, 2026

Zdar, nezajdem nekdy na obed to probrat? Sorry, ze jsem nechal udelat to review automatem ....

V pohode. Uz jsem na to poslal sveho automata a vyresil vetsinu tech poznamek. Co jsem koukal, tak nic z toho by nemelo byt uplne zasadni.

Jinak je hlavne uprava pro me, abych z toho dostal kompletni danove priznani. Hodil jsem to sem jako PR, jestli to nahodou nebude chtit pouzit jeste nekdo.

Obed klidne muzem dat. Treba ctvrtek bych mel mit relativne volny zatim.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants