Page MenuHomePhabricator

[ML] Add final reward logging facility.
ClosedPublic

Authored by mtrofin on Oct 17 2020, 9:08 AM.

Details

Summary

Allow logging final rewards. A final reward is logged only once, and is
serialized as all-zero values, except for the last one.

Diff Detail

Event Timeline

mtrofin created this revision.Oct 17 2020, 9:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 17 2020, 9:08 AM
mtrofin requested review of this revision.Oct 17 2020, 9:08 AM
yundiqian requested changes to this revision.Oct 18 2020, 10:10 PM
yundiqian added inline comments.
llvm/include/llvm/Analysis/Utils/TFUtils.h
142–145

It seems over-complicated to pass the flag FinalReward to writeRawTensorsAsFeatureLists function and treat the case separately.

How about making the RawLogData ready-to-print (reward vector is <0, 0, ..., 0, reward>) so that we don't need to change writeRawTensorsAsFeatureLists function? basically the user is supposed to make sure the data in RawLogData is ready-to-print and writeRawTensorsAsFeatureLists only takes care of printing format.

We can either:

  1. logReward(0) in each step
  2. have a function in Logger called logFinalReward(T value) or overwriteFinalReward(T value), which overwrites the value in RawLogData.back()->back()

or:

  1. don't logReward(0) in each step
  2. have a function in Logger called logFinalReward(T value) that fills in 0 in RawLogData.back() except putting reward in the last, we can tell the length by looking at feature length already logged in RawLogData
This revision now requires changes to proceed.Oct 18 2020, 10:10 PM
mtrofin marked an inline comment as done.Oct 18 2020, 10:15 PM
mtrofin added inline comments.
llvm/include/llvm/Analysis/Utils/TFUtils.h
142–145

That would mean keeping around a potentially large array containing 0 (except the last value).

I would rather not add memory overhead if it can be avoided, and the extra consideration in code isn't that hard to grok.

yundiqian accepted this revision.Oct 18 2020, 10:41 PM
yundiqian added inline comments.
llvm/include/llvm/Analysis/Utils/TFUtils.h
142–145

Since it's for development mode, I guess it is debatable whether it worth adding code complexity to trade for better efficiency. imo it does not worth it, but I'm open to either options.

This revision is now accepted and ready to land.Oct 18 2020, 10:41 PM
This revision was landed with ongoing or failed builds.Oct 19 2020, 8:49 AM
This revision was automatically updated to reflect the committed changes.
mtrofin marked 2 inline comments as done.