Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add a multicodec to identify Ethereum Solidity ABI-encoded data #363

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

raulk
Copy link
Member

@raulk raulk commented Nov 21, 2024

See https://docs.soliditylang.org/en/latest/abi-spec.html. Solidity ABI encoding is the defacto standard by which call data, return data, events, and more are represented in Ethereum. So given its prevalence, it's worth introducing a dedicated multicodec instead of being relegated to using RAW.

@raulk raulk requested review from darobin, rvagg and vmx as code owners November 21, 2024 09:50
@raulk
Copy link
Member Author

raulk commented Nov 21, 2024

@rvagg Given you've been dipping your toes in Ethereum standards as a member of FilOz, I think you'd see the value in this quite clearly ;-)

@@ -67,6 +67,7 @@ eth-account-snapshot, ipld, 0x97, permanent, Ethe
eth-storage-trie, ipld, 0x98, permanent, Ethereum Contract Storage Trie (Eth-Secure-Trie)
eth-receipt-log-trie, ipld, 0x99, draft, Ethereum Transaction Receipt Log Trie (Eth-Trie)
eth-receipt-log, ipld, 0x9a, draft, Ethereum Transaction Receipt Log (RLP)
eth-solidity-abi, ipld, 0x9b, draft, Ethereum Solidity ABI-encoded data
Copy link

@bumblefudge bumblefudge Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think something more concrete for the description might help, since it's currently just redundant to the name:

Suggested change
eth-solidity-abi, ipld, 0x9b, draft, Ethereum Solidity ABI-encoded data
eth-solidity-abi, ipld, 0x9b, draft, ABI-encoded contract calls from EVM environments

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not just "calls", returns can also be encoded in this form too

Copy link
Member Author

@raulk raulk Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

none of this is correct.
(a) as mentioned above, Solidity ABI encoding is used for more than just calls.
(b) the EVM does not care about the ABI.
(c) it is Ethereum-related, so let's keep Ethereum in the description.

@bumblefudge
Copy link

Also, forgive me if I'm offbase here, but doesn't Vyper generate the same ABIs? I think it's probably more accurate to scope this to EVM than to Solidity specifically.

@aschmahmann
Copy link

aschmahmann commented Nov 21, 2024

A couple thoughts:

  1. Why is this an IPLD codec? AFAICT it has no content addressable links / hashes in it, but maybe I didn't read the spec carefully enough. I see some stuff around events (e.g. https://docs.soliditylang.org/en/latest/abi-spec.html#encoding-of-indexed-event-parameters) although I'm not sure if I've got it correct.
  2. Who wants to use this and are there any implementations that leverage the codec? IMO we're past the stage where we allocate codes for unused / prospective purposes (looking at you excessive blake2b/s and skein multihash variants, bencode, rlp, etc.)

@rvagg
Copy link
Member

rvagg commented Nov 22, 2024

I can see historically how this might have been useful in early fevm days, perhaps to signal better than this: https://github.com/filecoin-project/lotus/blob/e2c46f1af1e088535d2bb5a2ec04362b9b0e5009/itests/kit/evm.go#L155-L162 on the way in and this: https://github.com/filecoin-project/lotus/blob/e2c46f1af1e088535d2bb5a2ec04362b9b0e5009/node/impl/full/eth.go#L1503-L1511 on the way out. (Although I can't quite see what it would buy us there.)

The tag column should just be serialization and that resolves point 1 from @aschmahmann above.

@raulk do you have a use-case in mind where this kind of signalling is needed? It's in the two-byte range so pretty cheap to add and I don't think this is particularly controversial, unless it's just a thought-bubble entry without a concrete use-case ready to use it.

@raulk
Copy link
Member Author

raulk commented Nov 25, 2024

Appreciate the quick feedback here!

Also, forgive me if I'm offbase here, but doesn't Vyper generate the same ABIs? I think it's probably more accurate to scope this to EVM than to Solidity specifically.

@bumblefudge This ABI scheme was introduced by Solidity, and other high-level Ethereum programming languages followed in adoption in order to interoperate. Ethereum and the EVM are unopinionated at the transaction data layer. From I can tell, it is spec'ed and canonically maintained by the Solidity team. I suspect your question is aimed at clarifying if it's correct to term this "Solidity ABI". My reflection is "yes", as there is technically no standardised Ethereum ABI (nor I expect there to be one).

do you have a use-case in mind where this kind of signalling is needed?
Who wants to use this and are there any implementations that leverage the codec?

@rvagg @aschmahmann Our immediate use case is enabling FVM Wasm actors to expose Ethereum-compatible interfaces. More concretely, in IPC we have anchoring contracts written in Solidity deployed to the parent, and native system actors that run inside IPC subnets. Sometimes we need to make the latter return data that can opaquely be passed to the former as call arguments, without client re-encoding.

The invoke() entrypoint returns a block id. The ipld::block_create syscall demands to know the codec. Right now we're forced to demote the codec to RAW due to the lack of a more appropriate multicodec to designate the actual encoding.

On a more general level, if we ever want to use CID-based systems like IPFS/Filecoin/etc. to store, address, and handle Ethereum calldata, return data, and events (which arguably is already happening today anyway), it's worth being able to express the format type in CIDs to aid downstream interpretation (e.g. indexing, discovery, hashing).

Why is this an IPLD codec?

@aschmahmann I don't know 🤷 I just copied from the row above. Is the schema for this CSV column defined somewhere? I'm fine with @rvagg's suggestion:

The tag column should just be serialization and that resolves point 1 from @aschmahmann above.

@Stebalien
Copy link
Member

In terms of code allocations, IMO, allocating 2-3 byte codes for "maybes" is totally fine.

In terms of serialization v. IPLD, "this is solidity ABI encoded" tells the user exactly nothing unless they also know the schema. It's definitely serialization, not IPLD, because it's not self describing.

IMO, there's no harm in adding this (as long as the type is serialization).


However, separate from the discussion of whether or not this codec should exist, I'm not sure how useful this codec is given that solidity ABI encoded data isn't self-describing. Either:

  1. I already know the method being invoked, so I already know the schema.
  2. I don't know the schema and knowing that the data was solidity ABI encoded is kind of useless.

I'm curious to know how you plan on using it.

@raulk
Copy link
Member Author

raulk commented Dec 10, 2024

@Stebalien the issue is that multicodecs get used for all kinds of things, at many levels of the stack, e.g. on a user-facing level we use them for content type signalling in CIDs; on a deep systems level we use them in the ref-fvm to convey the content type of parameters and return values (blocks) to (a) induce link scanning or not, (b) signal to higher APIs how to interpret the content.

Given the generality and centrality of multicodec, I lean towards a policy evaluating proposals based on opportunities unlocked and directional vision, versus gatekeeping concrete immediate use cases.

For example, if I ever wanted to build a multi-blockchain IPFS-based indexing system, down the the call semantics level, with the ability to address and interpret any blob (params, return values, errors, events, traces, etc.) through a CID, it would be very valuable to be able to express the blob's content type within its CID. Alternatively I could use a an application-level enveloping data structure together with a RAW codec or another standardised custom codec, and put the content type in the envelope, but that is rather wasteful, poorly interoperable, and somewhat of a workaround.

In this particular case, we were creating a custom Wasm FVM actor that behaved externally like a Solidity smart contract, exposing a Solidity ABI interface. The invoke entrypoint requires us to represent return values as blocks, and creating a block necessitates a codec. Using CBOR is a no-go since it fails at link scanning, and using RAW loses information. Hence this proposal, even if functionally speaking the behaviour required today within ref-fvm would be identical to RAW, it seems like a useful evolution to be able to return the content type that something actually is.

Re: Solidity ABI and self-describability, it could make sense to create a multicodec point for the Solidity 4byte call convention as well. You don't need to know the method ahead of time; it's very common to use directories like https://www.4byte.directory/ (for which public APIs and libraries exist) to parse the call and discover its schema at runtime.

@rvagg
Copy link
Member

rvagg commented Dec 11, 2024

I'm interested to hear @Stebalien's response to that, but I'm fine with this PR since it's not in the double-byte range and is being requested for a concrete and plausible use-case. I'd just like the label column changed to serialization instead of ipld.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants